Synopsis: Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This analysis investigates the effects of major weather events across the United States by examining their consequences for population health and the economy.
library(plyr)
library(dplyr)
library(ggplot2)
#install.packages("gdata")
library(gdata)
The Storm Data can be downloaded at: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
if (!file.exists('stormdata.csv.bz2')) {
download.file(url, 'stormdata.csv.bz2')
}
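# stringsAsFactors = TRUE keeps text columns as factors, matching the str() output below (the default is FALSE from R 4.0.0)
stormdata <- read.csv('stormdata.csv.bz2', stringsAsFactors = TRUE)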
The downloaded storm data contains 902,297 observations of 37 variables:
str(stormdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
However, since we are only interested in the impact of events on population health and the economy, only the following variables are extracted into a new data frame: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
relevantData <- select(stormdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
str(relevantData)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
To answer the two questions, we split the data into two subsets, keeping only the rows with non-zero values:
popHealthData <- filter(relevantData, (INJURIES > 0 | FATALITIES > 0))
economicData <- filter(relevantData, (PROPDMG > 0 | CROPDMG > 0 ))
For population health, only non-zero values are kept; they are grouped by EVTYPE, and the fatalities and injuries are summed for each event type.
popHealthDataFatal <- relevantData %>%
  filter(FATALITIES > 0) %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES)) %>%
  arrange(desc(Fatalities))
popHealthDataInjured <- relevantData %>%
  filter(INJURIES > 0) %>%
  group_by(EVTYPE) %>%
  summarise(Injuries = sum(INJURIES)) %>%
  arrange(desc(Injuries))
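The filter, group, sum, and sort steps can be checked by hand on a small example. The sketch below is only an illustration: it uses base R's aggregate() instead of dplyr, and the toy data frame and its values are invented, not taken from the storm data.

```r
# Toy data: values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 3, 0, 1)
)

# Keep only the rows with at least one fatality.
nonzero <- toy[toy$FATALITIES > 0, ]

# Sum fatalities per event type.
totals <- aggregate(FATALITIES ~ EVTYPE, data = nonzero, FUN = sum)

# Sort from the most to the least fatal event type.
totals <- totals[order(-totals$FATALITIES), ]
print(totals)
```

HAIL drops out because it has no non-zero rows, which is also why the summarised tables contain fewer event types than the 985 factor levels.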
However, since 158 event types caused injuries, we will plot only the top 10 events:
str(popHealthDataInjured)
## Classes 'tbl_df', 'tbl' and 'data.frame': 158 obs. of 2 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 856 170 130 464 275 427 153 760 244 ...
## $ Injuries: num 91346 6957 6789 6525 5230 ...
popHealthDataInjured <- popHealthDataInjured[1:10,]
print(popHealthDataInjured)
## Source: local data frame [10 x 2]
##
## EVTYPE Injuries
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
str(popHealthDataFatal)
## Classes 'tbl_df', 'tbl' and 'data.frame': 168 obs. of 2 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 130 153 275 464 856 170 585 359 19 ...
## $ Fatalities: num 5633 1903 978 937 816 ...
popHealthDataFatal <- popHealthDataFatal[1:10,]
print(popHealthDataFatal)
## Source: local data frame [10 x 2]
##
## EVTYPE Fatalities
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
A new economic loss variable, ECONLOSS, is defined as the sum of property and crop damage, scaled by the magnitude given in PROPDMGEXP and CROPDMGEXP. The magnitude variables must have a valid level of “B” (billions), “M” (millions), or “K” (thousands) (lower case accepted); an empty magnitude is taken as a multiplier of 1, and records with non-zero damage and any other level are ignored.
# toupper() makes the magnitude match case-insensitive, so "k", "m" and "b" are accepted as well
propExp <- toupper(as.character(stormdata$PROPDMGEXP))
cropExp <- toupper(as.character(stormdata$CROPDMGEXP))
# Zero damage stays 0; "" means no scaling; any other magnitude code yields NA
stormdata$PROPDAMAGE <- ifelse(stormdata$PROPDMG == 0, 0, ifelse(propExp == "", stormdata$PROPDMG, ifelse(propExp == "K", 1000*stormdata$PROPDMG,
ifelse(propExp == "M", 1000000*stormdata$PROPDMG, ifelse(propExp == "B", 1000000000*stormdata$PROPDMG, NA)))))
stormdata$CROPDAMAGE <- ifelse(stormdata$CROPDMG == 0, 0, ifelse(cropExp == "", stormdata$CROPDMG, ifelse(cropExp == "K", 1000*stormdata$CROPDMG,
ifelse(cropExp == "M", 1000000*stormdata$CROPDMG, ifelse(cropExp == "B", 1000000000*stormdata$CROPDMG, NA)))))
stormdata$ECONLOSS <- stormdata$PROPDAMAGE + stormdata$CROPDAMAGE
popHealthDataInjured$EVTYPE <- reorder.factor(popHealthDataInjured$EVTYPE, -popHealthDataInjured$Injuries)
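The magnitude conversion for PROPDMGEXP and CROPDMGEXP could also be written as a lookup rather than nested ifelse() calls. The following is a minimal sketch under the rules stated above; the helper name to_dollars and the sample values are hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): scale a damage
# value by its magnitude code.
to_dollars <- function(value, exp_code) {
  code <- toupper(as.character(exp_code))      # accept lower-case codes
  # Look up the multiplier; unknown codes (and "") come back as NA.
  mult <- unname(c(K = 1e3, M = 1e6, B = 1e9)[code])
  mult[code == ""] <- 1                        # empty code: no scaling
  out <- value * mult
  out[value == 0] <- 0                         # zero damage stays zero
  out
}

# Invented sample values for illustration only.
res <- to_dollars(c(25, 2.5, 3, 0, 4, 1),
                  c("K", "m", "", "?", "?", "B"))
print(res)
```

Records with an unrecognised code and non-zero damage come back as NA, so they are left out of any later sum that uses na.rm=TRUE.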
Impact of events on population injuries
ggplot(popHealthDataInjured,aes(EVTYPE,Injuries))+
geom_bar(stat="identity", fill="#FF0000", width=.3)+
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
xlab("Event type") +
ylab("Number of injuries")
In conclusion, tornadoes are the event type with the greatest impact on injuries, with 91,346 recorded, far more than any other event type.
Impact of events on population fatalities
popHealthDataFatal$EVTYPE <- reorder.factor(popHealthDataFatal$EVTYPE, -popHealthDataFatal$Fatalities)
ggplot(popHealthDataFatal,aes(EVTYPE,Fatalities))+
geom_bar(stat="identity", fill="#0000FF", width=.3)+
theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) +
xlab("Event type") +
ylab("Number of fatalities")
In conclusion, tornadoes are also the event type with the greatest impact on fatalities, with 5,633 recorded.
Impact of events on the economy
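Both bar charts rely on reordering the EVTYPE factor so that the bars appear from the largest to the smallest count. As a sketch of how that ordering works, the example below uses reorder() from base R's stats package instead of gdata's reorder.factor; the toy values are invented for illustration.

```r
# Toy data: invented values for illustration only.
toy <- data.frame(
  EVTYPE   = factor(c("FLOOD", "HAIL", "TORNADO")),
  Injuries = c(30, 10, 50)
)

# Levels are sorted by ascending score, so negating the counts
# puts the event type with the most injuries first.
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$Injuries)
print(levels(toy$EVTYPE))
```

ggplot2 draws a discrete axis in level order, which is why changing the levels is enough to sort the bars.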
eventloss <- ddply(stormdata, .(EVTYPE), summarise, sumloss=sum(ECONLOSS, na.rm=TRUE))
sortloss <- eventloss[order(-eventloss$sumloss),]
topeconimpact <- sortloss[1:10,]
barplot(topeconimpact$sumloss,names.arg=topeconimpact$EVTYPE, cex.names=0.5, cex.axis=0.5, las=2, main="Events with Greatest Historic Economic Loss", ylab="sum of economic loss ($)")
In conclusion, floods are the event type with the greatest economic impact.
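The ranking step behind this result (sum the loss per event type, skip missing values, keep the largest totals) can be illustrated with base R alone. This is a sketch on invented toy values, not the plyr code used above.

```r
# Toy data: invented values; NA stands for a record with an invalid magnitude.
toy <- data.frame(
  EVTYPE   = c("FLOOD", "HAIL", "FLOOD", "TORNADO", "HAIL"),
  ECONLOSS = c(5e6, NA, 2e6, 4e6, 1e3)
)

# Sum the loss per event type, ignoring NA records.
sumloss <- tapply(toy$ECONLOSS, toy$EVTYPE, sum, na.rm = TRUE)

# Keep the two largest totals (the analysis keeps ten).
top <- head(sort(sumloss, decreasing = TRUE), 2)
print(top)
```

Unlike indexing with [1:10,], head() does not pad the result with NA rows when fewer event types are available than requested.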