Synopsis: Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This analysis investigates the effects of major weather events across the United States on population health and on the economy.

library(plyr)
library(dplyr)
library(ggplot2)
#install.packages("gdata")
library(gdata)

Data Processing

The Storm Data can be downloaded at: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'

if (!file.exists('stormdata.csv.bz2')) {
  download.file(url, 'stormdata.csv.bz2')
}
stormdata <- read.csv('stormdata.csv.bz2')
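read.csv decompresses the .bz2 file transparently, so no separate unzip step is needed. The sketch below illustrates this round trip on a made-up two-row data frame (the object names `toy`, `path`, and `toy_back` are illustrative assumptions, not part of the analysis). It also passes stringsAsFactors = TRUE explicitly, because the Factor columns shown in this report's str() output depend on it: since R 4.0.0 the default is FALSE and text columns are read as character.

```r
# Toy stand-in for the storm file (made-up values, not the NOAA data).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

# Write it bzip2-compressed through a bzfile() connection.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back like a plain CSV; the compression is detected automatically.
# stringsAsFactors = TRUE gives Factor columns as in this report's output.
toy_back <- read.csv(path, stringsAsFactors = TRUE)
str(toy_back)
```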

The downloaded storm data contains 902,297 observations of 37 variables:

str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

However, since we are only interested in the impact of events on population health and the economy, the following variables are extracted into a new data frame: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.

relevantData <- select(stormdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
str(relevantData)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...

To address the two questions, we split the data into two subsets, keeping only the rows with non-zero values:

popHealthData <- filter(relevantData, (INJURIES > 0 | FATALITIES > 0))
economicData <- filter(relevantData, (PROPDMG > 0 | CROPDMG > 0 ))

For population health, only rows with non-zero values are kept; they are grouped by EVTYPE, and fatalities and injuries are summed per event type.

popHealthDataFatal <- relevantData %>%
  filter(FATALITIES > 0) %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES)) %>%
  arrange(desc(Fatalities))

popHealthDataInjured <- relevantData %>%
  filter(INJURIES > 0) %>%
  group_by(EVTYPE) %>%
  summarise(Injuries = sum(INJURIES)) %>%
  arrange(desc(Injuries))
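The filter, group, sum, and sort steps above can be sketched in base R on a made-up toy data frame. This is only an illustration of the same aggregation pattern under the assumption that base R's aggregate() and order() are acceptable stand-ins for the dplyr verbs; the object names `toy`, `nonzero`, and `totals` are hypothetical.

```r
# Made-up toy records; the values are illustrative only.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES = c(15, 2, 6, 0, 1)
)

# Keep rows with at least one injury, then sum injuries per event type.
nonzero <- toy[toy$INJURIES > 0, ]
totals  <- aggregate(INJURIES ~ EVTYPE, data = nonzero, FUN = sum)

# Sort from most to fewest injuries.
totals <- totals[order(-totals$INJURIES), ]
print(totals)
```

Dropping the zero rows first does not change any sum; it only removes event types whose total would be zero (HAIL in the toy data), which is why the summaries have far fewer rows than EVTYPE has levels.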

Since 158 event types caused injuries and 168 caused fatalities, only the top 10 of each are kept for plotting.

str(popHealthDataInjured)
## Classes 'tbl_df', 'tbl' and 'data.frame':    158 obs. of  2 variables:
##  $ EVTYPE  : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 856 170 130 464 275 427 153 760 244 ...
##  $ Injuries: num  91346 6957 6789 6525 5230 ...
popHealthDataInjured <- popHealthDataInjured[1:10,]
print(popHealthDataInjured)
## Source: local data frame [10 x 2]
## 
##               EVTYPE Injuries
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
str(popHealthDataFatal)
## Classes 'tbl_df', 'tbl' and 'data.frame':    168 obs. of  2 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 130 153 275 464 856 170 585 359 19 ...
##  $ Fatalities: num  5633 1903 978 937 816 ...
popHealthDataFatal <- popHealthDataFatal[1:10,]
print(popHealthDataFatal)
## Source: local data frame [10 x 2]
## 
##            EVTYPE Fatalities
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224
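The injuries table lists TSTM WIND and THUNDERSTORM WIND as separate event types, and the str() output shows a level padded with leading spaces, so some totals are likely split across label variants. The analysis in this report does not merge them. As a hedged sketch only, a hypothetical helper `normalize_evtype` (not part of the original code) could clean labels before grouping; the "Tornado" entry is a made-up mixed-case example.

```r
# Labels seen in this report's output, plus one made-up mixed-case label.
labels <- c("TSTM WIND", "THUNDERSTORM WIND", "   HIGH SURF ADVISORY", "Tornado")

# Hypothetical helper: trim whitespace, upper-case, and expand "TSTM".
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)
}

normalize_evtype(labels)
```

With such a rule, the two thunderstorm wind rows in the injuries table alone would combine to 8,445 injuries, which would still rank well below tornadoes.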

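A new economic loss variable, ECONLOSS, is defined as the sum of property and crop damage, scaled by the magnitude codes in PROPDMGEXP and CROPDMGEXP. The code below recognises "K" (thousands), "M" (millions), and "B" (billions) in upper case only; a blank code leaves the amount unscaled. Any other code, including lower-case letters, yields NA when the damage amount is non-zero, and those records are ignored when the losses are summed.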

stormdata$PROPDAMAGE <- ifelse(stormdata$PROPDMG == 0, 0, ifelse(stormdata$PROPDMGEXP == "", stormdata$PROPDMG, ifelse(stormdata$PROPDMGEXP == "K", 1000*stormdata$PROPDMG,
                               ifelse(stormdata$PROPDMGEXP == "M", 1000000*stormdata$PROPDMG, ifelse(stormdata$PROPDMGEXP == "B", 1000000000*stormdata$PROPDMG, NA)))))

stormdata$CROPDAMAGE <- ifelse(stormdata$CROPDMG == 0, 0, ifelse(stormdata$CROPDMGEXP == "", stormdata$CROPDMG, ifelse(stormdata$CROPDMGEXP == "K", 1000*stormdata$CROPDMG,
                               ifelse(stormdata$CROPDMGEXP == "M", 1000000*stormdata$CROPDMG, ifelse(stormdata$CROPDMGEXP == "B", 1000000000*stormdata$CROPDMG, NA)))))

stormdata$ECONLOSS <- stormdata$PROPDAMAGE + stormdata$CROPDAMAGE

# Order the injury event types by descending injury count for the first plot in Results
popHealthDataInjured$EVTYPE <- reorder.factor(popHealthDataInjured$EVTYPE, -popHealthDataInjured$Injuries)
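The exponent handling in the nested ifelse() calls above can also be written as a lookup table, which may be easier to read and extend. The following is a minimal sketch, not the code used for the results in this report: `exp_multiplier` is a hypothetical helper, the input values are made up, and it assumes lower-case codes should be folded to upper case, which the ifelse() version does not do.

```r
# Hypothetical helper: map an exponent code to a dollar multiplier.
# Assumption: lower-case codes are folded to upper case.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(code))
  mult <- unname(lookup[code])   # NA for unrecognised codes
  mult[code == ""] <- 1          # blank code: amount taken as-is
  mult
}

# Made-up toy values, not rows from the storm data.
dmg  <- c(25, 2.5, 1, 10, 3)
code <- c("K", "m", "B", "", "?")
dmg * exp_multiplier(code)
```

One behavioural difference to keep in mind: here a zero amount with an unrecognised code becomes NA, whereas the ifelse() chain returns 0 for any zero amount before looking at the code.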

Results

Impact of events on population injuries

ggplot(popHealthDataInjured,aes(EVTYPE,Injuries))+
  geom_bar(stat="identity", fill="#FF0000", width=.3)+
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) + 
  xlab("Event type") + 
  ylab("Number of injuries") 
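The bars are drawn in factor-level order, which is why EVTYPE was reordered with reorder.factor from gdata before plotting. As a sketch of an alternative, base R's reorder() gives the same descending order when gdata is not attached; the data frame `top` below is a hypothetical stand-in holding three rows copied from the injuries table.

```r
# Three rows copied from the injuries table, deliberately out of order.
top <- data.frame(
  EVTYPE   = factor(c("FLOOD", "TORNADO", "TSTM WIND")),
  Injuries = c(6789, 91346, 6957)
)

# reorder() sorts levels by ascending score; negating gives descending bars.
top$EVTYPE <- reorder(top$EVTYPE, -top$Injuries)
levels(top$EVTYPE)
```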

In conclusion, tornadoes caused by far the most injuries (91,346), more than ten times as many as the next event type, TSTM WIND (6,957).

Impact of events on population fatalities

popHealthDataFatal$EVTYPE <- reorder.factor(popHealthDataFatal$EVTYPE, -popHealthDataFatal$Fatalities)

ggplot(popHealthDataFatal,aes(EVTYPE,Fatalities))+
  geom_bar(stat="identity", fill="#0000FF", width=.3)+
  theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5)) + 
  xlab("Event type") + 
  ylab("Number of fatalities") 

In conclusion, tornadoes also caused the most fatalities (5,633), followed by excessive heat (1,903).

Impact of events on the economy

eventloss <- ddply(stormdata, .(EVTYPE), summarise, sumloss=sum(ECONLOSS, na.rm=TRUE))
sortloss <- eventloss[order(-eventloss$sumloss),]
topeconimpact <- sortloss[1:10,]
barplot(topeconimpact$sumloss,names.arg=topeconimpact$EVTYPE, cex.names=0.5, cex.axis=0.5, las=2, main="Events with Greatest Historic Economic Loss", ylab="sum of economic loss ($)")

In conclusion, floods caused the greatest total economic loss.