Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify the five severe weather event types that cause the most fatalities, injuries, property damage, and crop damage.
The NOAA storm database tracks characteristics of major storms and weather events, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The project addresses two questions:
1. Which types of events are most harmful with respect to population health across the United States?
2. Which types of events have the greatest economic consequences across the United States?
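library(ggplot2)
library(dplyr)
library(R.utils)   # provides bunzip2()
library(cowplot)   # provides plot_grid()
fileName <- "stormData.csv.bz2"
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download and decompress only if the CSV is not already present;
# bunzip2() stops with an error if the destination file already exists
if (!file.exists("stormData.csv")) {
  download.file(fileUrl, destfile = fileName, mode = "wb")
  bunzip2(fileName)   # writes stormData.csv and deletes the .bz2 archive
}
Storm_Data <- read.csv("stormData.csv")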
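As an aside, R's `file()` connection recognises bzip2 compression when a file is opened for reading, and `read.csv()` opens character paths through `file()`, so the separate decompression step is optional. The sketch below demonstrates this on a tiny made-up CSV written to a temporary file, not on the NOAA data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, not the NOAA file)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "HAIL,0"), con)
close(con)

# read.csv() opens the path with file(), which decompresses bzip2 transparently
toy <- read.csv(tmp)
print(toy)
```

Applied to the real download this would presumably be `read.csv("stormData.csv.bz2")`, trading a one-time decompression for decompressing on every read.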
names(Storm_Data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
str(Storm_Data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
As shown above, the data set contains 902,297 observations of 37 variables.
The histogram below shows how many events were recorded each year. The database starts in 1950 and ends in November 2011, and far fewer events were recorded in the earlier years.
# Extract the four-digit year from BGN_DATE (stored as "m/d/yyyy h:mm:ss")
eventYear <- as.numeric(format(as.Date(Storm_Data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
hist(eventYear, xlab = "Year", col = "lightblue", breaks = 60,
     main = "Frequency of Storm Events from 1950 to 2011")
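The year extraction relies on `BGN_DATE` having the form month/day/year followed by a time. A minimal check on a few hand-written date strings in that format (invented, not taken from the data set) is sketched below; `table()` then gives exact per-year counts, which can be easier to read than histogram bins.

```r
# Hypothetical BGN_DATE values in the same format as the NOAA file
bgn_date <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "1/22/1951 0:00:00")

# Parse the date and keep only the four-digit year
event_year <- as.numeric(format(as.Date(bgn_date, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
print(event_year)

# Exact number of events per year
year_counts <- table(event_year)
print(year_counts)
```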
Of these 37 variables, the following are of interest for this analysis:
1. Health variables: FATALITIES and INJURIES
2. Economic variables: PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP
3. Event type variable: EVTYPE
SelectedData <- c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
Sel_StormData <- Storm_Data[SelectedData]
dim(Sel_StormData)
## [1] 902297 8
The following code answers the questions posed in the synopsis. For each measure, the top five event types are presented as plots.
2.4.1. Which types of events are most harmful with respect to population health across the United States?
Health Impact
1. Total fatalities by event type, and the five event types with the most fatalities
T_Fatality <- aggregate(FATALITIES ~ EVTYPE, data = Sel_StormData, FUN="sum")
Top5Fatalities <- T_Fatality[order(-T_Fatality$FATALITIES) , ][1:5, ]
Top5Fatalities
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
2. Total injuries by event type, and the five event types with the most injuries
T_Injury <- aggregate(INJURIES ~ EVTYPE, data = Sel_StormData, FUN="sum")
Top5Injuries <- T_Injury[order(-T_Injury$INJURIES) , ][1:5, ]
Top5Injuries
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
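The fatality and injury tables follow the same aggregate-then-sort pattern. A small helper, `top_events()` (a hypothetical name, not part of the original analysis), can capture it once; it is shown here on a toy data frame with invented counts.

```r
# Hypothetical helper: sum one column by EVTYPE and return the n largest totals
top_events <- function(data, column, n = 5) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  head(totals[order(-totals[[column]]), ], n)
}

# Toy data (invented values) with repeated event types
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 2, 1, 1)
)
top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

On the real data, `top_events(Sel_StormData, "FATALITIES")` should give the same ranking as `Top5Fatalities`. Using `head()` rather than `[1:5, ]` also avoids rows of `NA` when there are fewer event types than requested.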
2.4.2. Which types of events have the greatest economic consequences across the United States?
Economic Impact
The economic variables and their meanings are:
- PROPDMG = property damage (base amount)
- PROPDMGEXP = property damage exponent (the magnitude applied to PROPDMG)
- CROPDMG = crop damage (base amount)
- CROPDMGEXP = crop damage exponent (the magnitude applied to CROPDMG)
The distinct exponent symbols present in the data are listed with:
unique(Sel_StormData$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(Sel_StormData$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
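`unique()` shows which symbols occur but not how often. Before choosing multipliers, a frequency count with `table()` shows how many records each symbol affects. The sketch below uses a short made-up vector standing in for the real `PROPDMGEXP` column.

```r
# Made-up exponent codes standing in for Sel_StormData$PROPDMGEXP
exp_codes <- c("K", "K", "M", "", "K", "B", "?", "m")

# Count how many records use each symbol (blank codes are counted under "")
symbol_counts <- table(exp_codes)
print(sort(symbol_counts, decreasing = TRUE))
```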
To clean the property and crop damage exponent variables (PROPDMGEXP and CROPDMGEXP), each symbol is mapped to a numeric multiplier:
- + -> 1
- -, ? and blank -> 0
- 0 to 8 -> 10
- H, h -> 100
- K, k -> 1,000
- M, m -> 1,000,000
- B, b -> 1,000,000,000
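# Map each exponent symbol to a numeric multiplier; anything not listed
# (blank, "-", "?") falls through to 0
exp_multiplier <- function(exp) {
  exp <- as.character(exp)   # works for both factor and character columns
  ifelse(exp == "+", 1,
  ifelse(exp %in% as.character(0:8), 10,
  ifelse(exp %in% c("H", "h"), 10^2,
  ifelse(exp %in% c("K", "k"), 10^3,
  ifelse(exp %in% c("M", "m"), 10^6,
  ifelse(exp %in% c("B", "b"), 10^9, 0))))))
}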
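The nested `ifelse()` calls can also be written as a named lookup vector, which keeps the symbol-to-multiplier table in one place. This is a swapped-in technique, not the author's code, sketched under the assumption that it should reproduce the mapping of `exp_multiplier()`; `exp_lookup` and `exp_multiplier_lookup()` are hypothetical names.

```r
# Hypothetical lookup table: symbol -> multiplier (same mapping as exp_multiplier)
exp_lookup <- c(
  "+" = 1,
  setNames(rep(10, 9), as.character(0:8)),
  "H" = 1e2, "h" = 1e2,
  "K" = 1e3, "k" = 1e3,
  "M" = 1e6, "m" = 1e6,
  "B" = 1e9, "b" = 1e9
)

exp_multiplier_lookup <- function(exp) {
  mult <- unname(exp_lookup[as.character(exp)])
  # Symbols missing from the table ("", "-", "?", or NA) index to NA; map them to 0
  mult[is.na(mult)] <- 0
  mult
}

print(exp_multiplier_lookup(c("K", "m", "5", "+", "?", "", "-")))
```

With either version, a record whose exponent is blank, `-` or `?` contributes zero dollars regardless of its `PROPDMG` or `CROPDMG` value.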
Sel_StormData$PropDmgMult <- exp_multiplier(Sel_StormData$PROPDMGEXP)
Sel_StormData$CropDmgMult <- exp_multiplier(Sel_StormData$CROPDMGEXP)
Two new variables, TotalCropDMG and TotalPropDMG, are computed by multiplying each damage value by its corresponding multiplier. Their sum, TotalDamage, gives the total economic damage.
The overall health impact, T_HealthImpct, is the sum of the FATALITIES and INJURIES variables.
Sel_StormData$TotalCropDMG<- Sel_StormData$CROPDMG * Sel_StormData$CropDmgMult
Sel_StormData$TotalPropDMG<- Sel_StormData$PROPDMG * Sel_StormData$PropDmgMult
Sel_StormData$TotalDamage<- Sel_StormData$TotalCropDMG + Sel_StormData$TotalPropDMG
Sel_StormData$T_HealthImpct<- Sel_StormData$FATALITIES + Sel_StormData$INJURIES
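As a worked example with illustrative values (not a record from the data set), suppose PROPDMG = 2.5 with PROPDMGEXP = "M", and CROPDMG = 10 with CROPDMGEXP = "K". The multipliers listed above give:

$$
\begin{aligned}
\text{TotalPropDMG} &= 2.5 \times 10^{6} = 2{,}500{,}000 \\
\text{TotalCropDMG} &= 10 \times 10^{3} = 10{,}000 \\
\text{TotalDamage} &= 2{,}500{,}000 + 10{,}000 = 2{,}510{,}000
\end{aligned}
$$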
Crop Damage:
CropDamageByEvent <- aggregate(TotalCropDMG ~ EVTYPE, data = Sel_StormData, FUN="sum")
Top5EventsCROPDMG <- CropDamageByEvent[order(-CropDamageByEvent$TotalCropDMG) , ][1:5, ]
Top5EventsCROPDMG
## EVTYPE TotalCropDMG
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954650
Property Damage:
PropDamageByEvent <- aggregate(TotalPropDMG ~ EVTYPE, data = Sel_StormData, FUN="sum")
Top5EventsPropDMG <- PropDamageByEvent[order(-PropDamageByEvent$TotalPropDMG) , ][1:5, ]
Top5EventsPropDMG
## EVTYPE TotalPropDMG
## 170 FLOOD 144657709800
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56937162897
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16140815011
Total Damage:
TotalDamageByEvent <- aggregate(TotalDamage ~ EVTYPE, data = Sel_StormData, FUN="sum")
Top5EventsTotalDMG <- TotalDamageByEvent[order(-TotalDamageByEvent$TotalDamage) , ][1:5, ]
Top5EventsTotalDMG
## EVTYPE TotalDamage
## 170 FLOOD 150319678250
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57352117607
## 670 STORM SURGE 43323541000
## 244 HAIL 18758224527
Health Impact:
HealthDamageByEvent <- aggregate(T_HealthImpct ~ EVTYPE, data = Sel_StormData, FUN="sum")
Top5EventsHealthImpact <- HealthDamageByEvent[order(-HealthDamageByEvent$T_HealthImpct) , ][1:5, ]
Top5EventsHealthImpact
## EVTYPE T_HealthImpct
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
The top five event types by total fatalities and by total injuries are plotted below:
Top5FatalityPlot<-ggplot(Top5Fatalities, aes(x = reorder(EVTYPE,-FATALITIES), y = FATALITIES)) + geom_bar(stat = "identity", fill = "red") +
theme(axis.text.x = element_text(angle = 45,hjust = 1), text = element_text(size = 10)) +
xlab("Event Type") + ylab("Total Number of Fatalities") +
ggtitle("Top five Severe Weather Events\n causing Fatalities in USA")
Top5InjuryPlot<- ggplot(Top5Injuries, aes(x = reorder(EVTYPE,-INJURIES), y = INJURIES)) + geom_bar(stat = "identity", fill = "green") +
theme(axis.text.x = element_text(angle = 45,hjust = 1), text = element_text(size = 10)) +
xlab("Event Type") + ylab("Total Number of Injuries") +
ggtitle("Top five Severe Weather Events\n causing Injuries in USA")
plot_grid(Top5FatalityPlot,Top5InjuryPlot, ncol = 2)
The following plots show the top five event types by property damage, crop damage, total economic damage, and combined health impact:
p1 <- ggplot(Top5EventsPropDMG, aes(x = reorder(EVTYPE, -TotalPropDMG), y = TotalPropDMG / 10^9)) + geom_bar(stat = "identity", fill = "blue") +
  theme(text = element_text(size = 8), axis.text.x = element_text(angle = 35, hjust = 1)) + labs(x = "Event Type", y = "Property Damage (Billion Dollars)", title = "Top Five Events with the Highest Property Damage")
p2 <- ggplot(Top5EventsCROPDMG, aes(x = reorder(EVTYPE, -TotalCropDMG), y = TotalCropDMG / 10^9)) + geom_bar(stat = "identity", fill = "green") +
  theme(text = element_text(size = 8), axis.text.x = element_text(angle = 35, hjust = 1)) + labs(x = "Event Type", y = "Crop Damage (Billion Dollars)", title = "Top Five Events with the Highest Crop Damage")
p3 <- ggplot(Top5EventsTotalDMG, aes(x = reorder(EVTYPE, -TotalDamage), y = TotalDamage / 10^9)) + geom_bar(stat = "identity", fill = "red") +
  theme(text = element_text(size = 8), axis.text.x = element_text(angle = 35, hjust = 1)) + labs(x = "Event Type", y = "Economic Damage (Billion Dollars)", title = "Top Five Events with the Highest Economic Impact")
p4 <- ggplot(Top5EventsHealthImpact, aes(x = reorder(EVTYPE, -T_HealthImpct), y = T_HealthImpct)) + geom_bar(stat = "identity", fill = "purple") +
  theme(text = element_text(size = 8), axis.text.x = element_text(angle = 35, hjust = 1)) + labs(x = "Event Type", y = "Health Impact", title = "Top Five Events with the Highest Health Impact\n(Fatalities & Injuries)")
plot_grid(p1, p2, p3, p4, nrow =2, ncol = 2, align = 'vh')