Synopsis

In this report, we analyze NOAA severe weather event data to determine which types of weather events cause the greatest human loss and the greatest economic loss. We measure human loss by injuries and fatalities, and economic loss by property damage and crop damage.

Our results show that tornadoes are the most severe weather events in terms of human loss. One possible reason is that tornadoes are hard to predict: they occur suddenly and strike people within a very short period of time.

Hurricanes and floods, on the other hand, are the two weather types that cause the highest economic loss. A single hurricane is on average far more damaging than a single flood, while floods occur much more frequently than hurricanes.

Load the data

The data set is quite large, so after reading the compressed file with read.csv we store it as a data.table for processing.

library(data.table)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:data.table':
## 
##     between, last
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
StormData <- data.table(read.csv(
  bzfile('~/Google Drive/courses/Reproducible Research/PA2/repdata-data-StormData.csv.bz2'),
  stringsAsFactors = FALSE, header = TRUE))

Description of the dataset

The following variables are the ones we use to assess the severity of storms.

##    Column_Name  Description
## 1  STATE__      State
## 2  EVTYPE       Event Type
## 3  FATALITIES   Fatalities
## 4  INJURIES     Injuries
## 5  PROPDMG      Property Damage
## 6  PROPDMGEXP   Property Damage Exponent (unit code: K, M, B)
## 7  CROPDMG      Crop Damage
## 8  CROPDMGEXP   Crop Damage Exponent (unit code: K, M, B)
## 9  F            Fujita Tornado Intensity Scale
## 10 MAG          Hail in Inches
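
The step that creates StormData1 is not shown in this report. A minimal sketch follows, assuming it simply keeps the ten variables listed above. The toy `StormData` rows and the `keep_cols` name are hypothetical and serve only as illustration:

```r
# Toy stand-in for the full data set (hypothetical values, not real records)
StormData <- data.frame(
  STATE__ = c(1, 1), BGN_DATE = c("4/18/1950", "2/20/1951"),
  EVTYPE = c("TORNADO", "TORNADO"),
  F = c(3, 2), MAG = c(0, 0),
  FATALITIES = c(0, 0), INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "K"),
  CROPDMG = c(0, 0), CROPDMGEXP = c("", ""),
  stringsAsFactors = FALSE
)

# Assumed selection: keep only the ten variables of interest
keep_cols <- c("STATE__", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
               "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "F", "MAG")
StormData1 <- StormData[, keep_cols]
names(StormData1)
```

This sketch uses a plain data.frame. With a data.table, the equivalent selection would be `StormData[, keep_cols, with = FALSE]`.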

Preprocessing Data

We are interested in looking at:
* The length of the storm
* The injuries and fatalities caused by each storm event
* The economic loss, primarily property and crop damage, for each storm event

Property and crop damage amounts are recorded with a unit code: ‘K’ (thousands), ‘M’ (millions), or ‘B’ (billions) of dollars. We therefore convert each amount to dollars accordingly.

# Convert damage amounts to dollars; unit codes other than K, M, B leave the amount unchanged
StormData1 <- StormData1 %>%
  mutate(PropDamageCost = ifelse(toupper(PROPDMGEXP) == 'K', PROPDMG * 10^3,
                          ifelse(toupper(PROPDMGEXP) == 'M', PROPDMG * 10^6,
                          ifelse(toupper(PROPDMGEXP) == 'B', PROPDMG * 10^9, PROPDMG)))) %>%
  mutate(CropDamageCost = ifelse(toupper(CROPDMGEXP) == 'K', CROPDMG * 10^3,
                          ifelse(toupper(CROPDMGEXP) == 'M', CROPDMG * 10^6,
                          ifelse(toupper(CROPDMGEXP) == 'B', CROPDMG * 10^9, CROPDMG))))
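
The nested ifelse calls above apply one multiplier per unit code. As an illustration only, the same rule can be sketched with a named lookup vector. `unit_multiplier` and `to_dollars` are hypothetical names that do not appear in the original analysis, and codes other than K, M, or B are assumed to leave the amount unchanged, as in the code above:

```r
# Multipliers for the unit codes described in the text
unit_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Convert damage amounts to dollars; unknown or empty codes use a multiplier of 1
to_dollars <- function(amount, unit_code) {
  mult <- unit_multiplier[toupper(unit_code)]
  mult[is.na(mult)] <- 1
  unname(amount * mult)
}

# Example call with mixed-case and empty unit codes
to_dollars(c(25, 2.5, 5, 10), c("K", "m", "B", ""))
```

A lookup vector keeps the mapping in one place, so a new unit code would need only one extra entry rather than another level of nesting.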

Results

In this section, we list the top five events by fatalities, injuries, property damage, and crop damage.

The results show the following:

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
fat_rank_5 <- StormData1[order(FATALITIES,decreasing=TRUE),.(EVTYPE,FATALITIES)][1:5]
inj_rank_5 <- StormData1[order(INJURIES,decreasing=TRUE),.(EVTYPE,INJURIES)][1:5]
PropDmg_rank_5 <- StormData1[order(PropDamageCost,decreasing=TRUE),.(EVTYPE,PropDamageCost)][1:5]
CropDmg_rank_5 <- StormData1[order(CropDamageCost,decreasing=TRUE),.(EVTYPE,CropDamageCost)][1:5]

We also plot the distribution of injuries and damage costs for these severe weather types to see how often each type produces the most severe events.

Among events that caused more than 100 injuries, heat-related events are nearly as severe on average as tornadoes (about 203 versus 260 injuries per event). However, the number of tornado events above that threshold is far higher than the number of heat-related events. This indicates that tornadoes are the extreme weather type that causes the greatest human loss.

In terms of economic loss, we compare all flood-related events with all hurricane-related events. On average, a hurricane causes far more property and crop damage than a flood. However, the histograms show that flood-related events occur far more frequently than hurricanes.

#Heat vs Tornado
par(mfrow = c(1,1))
tornado_heat_injuries <- data.frame(subset(StormData1, (EVTYPE == 'TORNADO'|grepl('HEAT',EVTYPE))&INJURIES>100)[,.(EVTYPE,INJURIES)])
tornado_heat_injuries[which(grepl('HEAT',tornado_heat_injuries$EVTYPE)),1] <- 'HEAT'
ggplot(tornado_heat_injuries,aes(x=INJURIES)) + geom_histogram(aes(fill=EVTYPE)) + ggtitle('Heat and Tornado Injuries Distribution')
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

summarise(group_by(tornado_heat_injuries,EVTYPE),mean_injuries = mean(INJURIES))
## Source: local data frame [2 x 2]
## 
##    EVTYPE mean_injuries
## 1    HEAT      203.2308
## 2 TORNADO      259.7153
#Hurricane/Typhoon vs Flood
Flood_Hurricane_PropDmg <- data.frame(subset(StormData1, (grepl('FLOOD',EVTYPE)|grepl('HURRICANE',EVTYPE))&PropDamageCost>0)[,.(EVTYPE,PropDamageCost)])
Flood_Hurricane_PropDmg[which(grepl('FLOOD',Flood_Hurricane_PropDmg$EVTYPE)),1] <- 'FLOOD'
Flood_Hurricane_PropDmg[which(grepl('HURRICANE',Flood_Hurricane_PropDmg$EVTYPE)),1] <- 'HURRICANE'

ggplot(Flood_Hurricane_PropDmg,aes(x=PropDamageCost)) + geom_histogram(aes(fill=EVTYPE)) + ggtitle('Flood and Hurricane Property Damage Cost comparison')
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

summarise(group_by(Flood_Hurricane_PropDmg,EVTYPE),mean_prop_dmg = mean(PropDamageCost))
## Source: local data frame [2 x 2]
## 
##      EVTYPE mean_prop_dmg
## 1     FLOOD       5302514
## 2 HURRICANE     405531962
Flood_Hurricane_CropDmg <- data.frame(subset(StormData1, (grepl('FLOOD',EVTYPE)|grepl('HURRICANE',EVTYPE))&CropDamageCost>0)[,.(EVTYPE,CropDamageCost)])
Flood_Hurricane_CropDmg[which(grepl('FLOOD',Flood_Hurricane_CropDmg$EVTYPE)),1] <- 'FLOOD'
Flood_Hurricane_CropDmg[which(grepl('HURRICANE',Flood_Hurricane_CropDmg$EVTYPE)),1] <- 'HURRICANE'
ggplot(Flood_Hurricane_CropDmg,aes(x=CropDamageCost)) + geom_histogram(aes(fill=EVTYPE)) + ggtitle('Flood and Hurricane Crop Damage Cost comparison')
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

summarise(group_by(Flood_Hurricane_CropDmg,EVTYPE),mean_crop_dmg = mean(CropDamageCost))
## Source: local data frame [2 x 2]
## 
##      EVTYPE mean_crop_dmg
## 1     FLOOD       2985038
## 2 HURRICANE      61281031
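
The summaries above report only the mean damage per event type, while the frequency comparison rests on the histograms. A minimal sketch of how event counts could be tabulated alongside the means, using base R and a small made-up data frame (the values below are illustrative and not taken from the NOAA data; `events` and `summary_tbl` are hypothetical names):

```r
# Hypothetical toy data: one row per event, with its relabelled type and cost
events <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "FLOOD", "HURRICANE"),
  PropDamageCost = c(1e6, 2e6, 3e6, 4e8),
  stringsAsFactors = FALSE
)

# Number of events and mean cost per event type
n_events <- tapply(events$PropDamageCost, events$EVTYPE, length)
mean_cost <- tapply(events$PropDamageCost, events$EVTYPE, mean)
summary_tbl <- data.frame(EVTYPE = names(n_events),
                          n_events = as.vector(n_events),
                          mean_cost = as.vector(mean_cost))
summary_tbl
```

With dplyr, a similar table could be obtained by adding `n = n()` to the existing summarise calls.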