Reproducible Research Project #2, NOAA Storm Data

This project examines the impact of severe weather in the United States from 1950 through November 2011. The analysis investigates which types of severe weather events are most harmful to population health, measured by injuries and fatalities. Economic consequences are assessed through the financial damage done to property and to agriculture (crops).

## Load libraries into R
library(plyr)
library(ggplot2)

Data Processing

Read the data into R, then subset it to the variables needed for this analysis: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.

stormyWeather<- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"))

stormyWeathertoStudy<- stormyWeather[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

Weather effects on Population Health

The next commands total fatalities and injuries for each event type, then sort the totals in decreasing order, once by fatalities and once by injuries.

hurtbyNature <- ddply(stormyWeathertoStudy, .(EVTYPE), summarize, fatalities = sum(FATALITIES), injuries = sum(INJURIES))
deadly <- hurtbyNature[order(hurtbyNature$fatalities, decreasing = TRUE), ]
injury <- hurtbyNature[order(hurtbyNature$injuries, decreasing = TRUE), ]


## Here are the top 6 event types ranked by injuries
head(injury)
##             EVTYPE fatalities injuries
## 834        TORNADO       5633    91346
## 856      TSTM WIND        504     6957
## 170          FLOOD        470     6789
## 130 EXCESSIVE HEAT       1903     6525
## 464      LIGHTNING        816     5230
## 275           HEAT        937     2100
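
To make the group-and-sort step concrete, here is a minimal sketch on made-up toy data, using base R's aggregate() in place of ddply(). The data frame `toy_health` and its values are hypothetical and only stand in for the storm records; the result should have the same shape as the table above.

```r
# Toy data standing in for the storm records (values are made up for illustration)
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 25, 3, 2),
  stringsAsFactors = FALSE
)

# Sum both measures within each event type (a base R alternative to the ddply call)
toy_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                        data = toy_health, FUN = sum)

# Sort by injuries, largest first
toy_by_injury <- toy_totals[order(toy_totals$INJURIES, decreasing = TRUE), ]
print(toy_by_injury)
```

In this toy example TORNADO ends up in the first row, since its two records sum to the largest injury count.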
## Here is a nice plot of the injury data

ggplot(injury[1:6, ], aes(EVTYPE, injuries, fill = EVTYPE)) + geom_bar(stat = "identity") + 
  xlab("Event Type") + ylab("Number of Injuries") + ggtitle("Injuries by Event type") + coord_flip()

Tornados cause the most injuries.

Here is the plot for deaths:

ggplot(deadly[1:6, ], aes(EVTYPE, fatalities, fill = EVTYPE)) + geom_bar(stat = "identity") + 
  xlab("Event Type") + ylab("Number of Deaths") + ggtitle("Deaths by Event type") + coord_flip()

Tornados cause both the most deaths and the most injuries.
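
One optional refinement for the two bar charts: ggplot2 draws a discrete axis in factor-level order, which for EVTYPE is alphabetical rather than by value. The sketch below shows one possible way to re-level the factor with reorder() before plotting. The data frame `top_injury` is a hypothetical three-row stand-in that reuses injury counts from the table above.

```r
# Hypothetical three-row stand-in for the top of the `injury` table
top_injury <- data.frame(
  EVTYPE   = factor(c("TORNADO", "TSTM WIND", "FLOOD")),
  injuries = c(91346, 6957, 6789)
)

# Factor levels default to alphabetical order
levels_before <- levels(top_injury$EVTYPE)

# reorder() re-levels the factor by the injuries value, ascending
top_injury$EVTYPE <- reorder(top_injury$EVTYPE, top_injury$injuries)
levels_after <- levels(top_injury$EVTYPE)

print(levels_before)
print(levels_after)
```

Passing the re-leveled column to the same ggplot call with coord_flip() should then put the largest bar at the top.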

Economic Impacts of Severe Weather Events

We examine the structure of the property damage and crop damage exponent variables (PROPDMGEXP and CROPDMGEXP) and see that the data needs additional cleaning so that every exponent is expressed in the same way. We convert lowercase characters to uppercase and replace blanks and symbolic characters (+, -, ?) with 1.

unique(stormyWeathertoStudy$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
##See what I mean? CROPDMGEXP has the same type of issue
stormyWeathertoStudy$PROPDMGEXP <- toupper(stormyWeathertoStudy$PROPDMGEXP)
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("", "+", "-", "?")] = "1"
stormyWeathertoStudy$CROPDMGEXP <- toupper(stormyWeathertoStudy$CROPDMGEXP)
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("", "+", "-", "?")] = "1"

Convert the letter codes for exponents to their numeric powers of ten for both property and crop damage: B (billions) to 9, M (millions) to 6, K (thousands) to 3, and H (hundreds) to 2.

stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("B")] = "9"
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("M")] = "6"
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("K")] = "3"
stormyWeathertoStudy$PROPDMGEXP[stormyWeathertoStudy$PROPDMGEXP %in% c("H")] = "2"
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("B")] = "9"
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("M")] = "6"
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("K")] = "3"
stormyWeathertoStudy$CROPDMGEXP[stormyWeathertoStudy$CROPDMGEXP %in% c("H")] = "2"

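Convert each exponent to a power-of-ten multiplier (so K, coded as 3, becomes 1,000, while a blank or symbol, coded as 1, becomes 10), set any missing damage values to zero, and then multiply the property and crop damage columns by their corresponding multipliers.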

stormyWeathertoStudy$PROPDMGEXP<-10^(as.numeric(stormyWeathertoStudy$PROPDMGEXP))
stormyWeathertoStudy$CROPDMGEXP<-10^(as.numeric(stormyWeathertoStudy$CROPDMGEXP))
stormyWeathertoStudy[is.na(stormyWeathertoStudy$PROPDMG), "PROPDMG"]<- 0
stormyWeathertoStudy[is.na(stormyWeathertoStudy$CROPDMG), "CROPDMG"]<- 0
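
The exponent cleaning above can also be written with a single named lookup vector instead of one assignment per letter. The sketch below is one possible alternative, run on made-up values. The names `raw_exp`, `dmg`, and `letter_to_power` are hypothetical, and the handling of blanks and symbols mirrors the choice made in this analysis.

```r
# Hypothetical raw exponent codes and damage amounts, mimicking the mix seen in PROPDMGEXP
raw_exp <- c("K", "m", "", "B", "5", "?", "h")
dmg     <- c(2.5, 1, 10, 3, 4, 7, 6)

# Named vector mapping each letter code to its power of ten
letter_to_power <- c(H = "2", K = "3", M = "6", B = "9")

clean_exp <- toupper(raw_exp)
# Blanks and symbols become "1", matching the treatment used in this analysis
clean_exp[clean_exp %in% c("", "+", "-", "?")] <- "1"
# Replace letters via the lookup; digit codes are left untouched
is_letter <- clean_exp %in% names(letter_to_power)
clean_exp[is_letter] <- letter_to_power[clean_exp[is_letter]]

# Turn each exponent into a multiplier and scale the damage amounts
multiplier <- 10^as.numeric(clean_exp)
dollars    <- dmg * multiplier
print(data.frame(raw_exp, clean_exp, dmg, dollars))
```

For instance, 2.5 with code K works out to 2.5 × 10^3 = 2,500 dollars, and 3 with code B to 3 × 10^9.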

damage.property = stormyWeathertoStudy$PROPDMG *stormyWeathertoStudy$PROPDMGEXP
data=as.data.frame(cbind(stormyWeathertoStudy,damage.property))
Damage.property = ddply(data, .(EVTYPE), summarize, damage.property = sum(damage.property, na.rm = TRUE))
Damage.property = Damage.property[order(Damage.property$damage.property, decreasing = TRUE), ]
head(Damage.property)
##                EVTYPE damage.property
## 170             FLOOD    144657709870
## 411 HURRICANE/TYPHOON     69305840000
## 834           TORNADO     56947381244
## 670       STORM SURGE     43323536000
## 153       FLASH FLOOD     16822675842
## 244              HAIL     15735268026

Plot for property damage

##Only 3 figures are allowed in this analysis, so the plotting code is left commented out
##ggplot(Damage.property[1:6, ], aes(EVTYPE, damage.property, fill = EVTYPE)) + geom_bar(stat = "identity", alpha = 0.5) +
##  xlab("Event Type") + ylab("Property damages") + ggtitle("Property damages by Event type") + coord_flip()

Now, follow a similar procedure with our data to look at strictly crop damage.

damage.crop = stormyWeathertoStudy$CROPDMG * stormyWeathertoStudy$CROPDMGEXP
data2 = as.data.frame(cbind(stormyWeathertoStudy, damage.crop))
Damage.crop = ddply(data2, .(EVTYPE), summarize, damage.crop = sum(damage.crop, na.rm = TRUE))
Damage.crop = Damage.crop[order(Damage.crop$damage.crop, decreasing = TRUE), ]
head(Damage.crop)
##          EVTYPE damage.crop
## 95      DROUGHT 13972566000
## 170       FLOOD  5661968450
## 590 RIVER FLOOD  5029459000
## 427   ICE STORM  5022113500
## 244        HAIL  3025954500
## 402   HURRICANE  2741910000

Drought is the most damaging event type for agriculture, at roughly $14B in crop losses. The plotting code is again left commented out:

##ggplot(Damage.crop[1:6, ], aes(EVTYPE, damage.crop, fill = EVTYPE)) + geom_bar(stat = "identity") +
##  xlab("Event Type") + ylab("Crop damages") + ggtitle("Crop damages by Event type") + coord_flip()

We can combine property damage and crop damage to determine total damages:

stormyWeathertoStudy <- within(stormyWeathertoStudy, TOTALDMG <- PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
DamageByType <- aggregate(stormyWeathertoStudy$TOTALDMG, by = list(EVTYPE = stormyWeathertoStudy$EVTYPE), 
                          FUN = sum)
DamageByType <- DamageByType[order(DamageByType$x, decreasing = TRUE), ]
##Display the top 6 most damaging event types
head(DamageByType)
##                EVTYPE            x
## 170             FLOOD 150319678320
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57362334514
## 670       STORM SURGE  43323541000
## 244              HAIL  18761222526
## 153       FLASH FLOOD  18243992942
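
The summed column above is named x because aggregate() was called with a bare vector. As a hedged alternative, the formula interface keeps the original column name. The sketch below uses a made-up data frame, `toy_dmg`, with hypothetical values.

```r
# Toy data with made-up totals; column names follow the analysis above
toy_dmg <- data.frame(
  EVTYPE   = c("FLOOD", "HAIL", "FLOOD", "TORNADO"),
  TOTALDMG = c(100, 20, 50, 80)
)

# The formula interface names the summed column TOTALDMG instead of x
toy_damage <- aggregate(TOTALDMG ~ EVTYPE, data = toy_dmg, FUN = sum)

# Sort by total damage, largest first
toy_damage <- toy_damage[order(toy_damage$TOTALDMG, decreasing = TRUE), ]
print(toy_damage)
```

Note that the formula interface drops rows containing NA by default, so results could differ from the vector form if TOTALDMG had missing values.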

Once property and crop damages are combined, floods rank first by a wide margin, and drought no longer appears among the top six event types.

TopDamage <- DamageByType[1:5, ]
ggplot(TopDamage, aes(EVTYPE, y = x, fill=EVTYPE)) + geom_bar(stat = "identity") + xlab("Event Type") + 
  ylab("Damage in Dollars") + ggtitle("Damage by Event type") + coord_flip() 

Floods cause about $150B in damages, followed by hurricanes/typhoons at about $72B.

Results

In summary, this basic analysis reveals that tornados have the greatest effect on health with 5,633 deaths and 91,346 injuries. Floods have the greatest overall economic impact, with approximately $150B in damages.