In this report, we analyzed NOAA severe weather event data to identify which types of weather events cause the highest human loss and the highest economic loss. We primarily looked at injuries and fatalities as measures of human loss, and at property damage and crop damage as measures of economic loss.
Our results show that tornadoes are the most severe weather events in terms of human loss. This is possibly because tornadoes are less predictable: they occur suddenly and strike people within a very short period of time.
On the other hand, hurricanes and floods are the two weather types that incur the highest economic loss. A single hurricane is much more damaging than a single flood, while floods occur much more frequently than hurricanes.
The data.table package is used to store and process the data, since the data set is quite large.
library(data.table)
library(dplyr)
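# Read the bzip2-compressed CSV and convert it to a data.table
StormData <- data.table(read.csv(bzfile('~/Google Drive/courses/Reproducible Research/PA2/repdata-data-StormData.csv.bz2'),
                                 stringsAsFactors = FALSE, header = TRUE))
# Keep only the variables relevant to storm severity (listed in the table below)
StormData1 <- StormData[, c('STATE__', 'EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP',
                            'CROPDMG', 'CROPDMGEXP', 'F', 'MAG'), with = FALSE]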
Here is a list of the variables we are concerned with regarding the severity of storms.

| Column name | Description |
|---|---|
| STATE__ | State |
| EVTYPE | Event type |
| FATALITIES | Fatalities |
| INJURIES | Injuries |
| PROPDMG | Property damage |
| PROPDMGEXP | Property damage exponent (unit multiplier, e.g. K, M, B) |
| CROPDMG | Crop damage |
| CROPDMGEXP | Crop damage exponent (unit multiplier, e.g. K, M, B) |
| F | Fujita tornado intensity scale |
| MAG | Hail size in inches |
We are interested in looking at:
* The length of the storm
* The injuries and fatalities caused by each storm event
* The economic loss, primarily property and crop damage, for each storm event
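The units of property and crop damage are encoded in PROPDMGEXP and CROPDMGEXP: 'K' for thousands, 'M' for millions and 'B' for billions. We therefore scale PROPDMG and CROPDMG accordingly; any other code is left unscaled.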
# Scale damage values by their exponent codes (case-insensitive):
# 'K' = 10^3, 'M' = 10^6, 'B' = 10^9; any other code is left unscaled
StormData1 <- StormData1 %>%
  mutate(PropDamageCost = ifelse(toupper(PROPDMGEXP) == 'K', PROPDMG * 10^3,
                          ifelse(toupper(PROPDMGEXP) == 'M', PROPDMG * 10^6,
                          ifelse(toupper(PROPDMGEXP) == 'B', PROPDMG * 10^9,
                                 PROPDMG)))) %>%
  mutate(CropDamageCost = ifelse(toupper(CROPDMGEXP) == 'K', CROPDMG * 10^3,
                          ifelse(toupper(CROPDMGEXP) == 'M', CROPDMG * 10^6,
                          ifelse(toupper(CROPDMGEXP) == 'B', CROPDMG * 10^9,
                                 CROPDMG))))
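The nested ifelse() calls above can also be written as a lookup table. The sketch below uses a hypothetical helper, exp_multiplier (not part of the original analysis), that applies the same rule: K, M and B scale by 10^3, 10^6 and 10^9, and any other code, including a blank, leaves the value unscaled. It runs on a small made-up data frame rather than the NOAA file.

```r
# Hypothetical helper: map exponent codes to numeric multipliers.
# Same rule as the report: only K/M/B (case-insensitive) scale the value;
# every other code (blank, NA, anything else) maps to a multiplier of 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # unknown codes give NA here
  mult[is.na(mult)] <- 1
  mult
}

# Toy data for illustration only (not from the NOAA file)
toy <- data.frame(PROPDMG = c(25, 2.5, 1.5, 10),
                  PROPDMGEXP = c("K", "m", "B", ""),
                  stringsAsFactors = FALSE)
toy$PropDamageCost <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Adding a new code (for example 'H' for hundreds) then only means adding one entry to `lookup`, instead of another level of nesting.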
In this section, we list the top five individual events (records) by fatalities, injuries, property damage and crop damage.
The results show that:
library(ggplot2)
# Top five individual event records (rows) for each metric
fat_rank_5 <- StormData1[order(FATALITIES, decreasing = TRUE), .(EVTYPE, FATALITIES)][1:5]
inj_rank_5 <- StormData1[order(INJURIES, decreasing = TRUE), .(EVTYPE, INJURIES)][1:5]
PropDmg_rank_5 <- StormData1[order(PropDamageCost, decreasing = TRUE), .(EVTYPE, PropDamageCost)][1:5]
CropDmg_rank_5 <- StormData1[order(CropDamageCost, decreasing = TRUE), .(EVTYPE, CropDamageCost)][1:5]
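These rankings order individual records, so one event type can fill several of the top five slots, and a type with one extreme event can outrank a type with many moderate ones. The base R sketch below, on made-up records (not NOAA data), contrasts ranking records with ranking event types by their summed totals via aggregate(); the summed ranking is an alternative view, not the method used above.

```r
# Toy event records for illustration only (not NOAA data):
# one extreme heat event, several moderate tornadoes
events <- data.frame(
  EVTYPE = c("HEAT", "TORNADO", "TORNADO", "TORNADO", "FLOOD", "TORNADO"),
  FATALITIES = c(60, 20, 15, 25, 5, 10),
  stringsAsFactors = FALSE
)

# Ranking records: each row is a single event
top_records <- head(events[order(events$FATALITIES, decreasing = TRUE), ], 3)

# Ranking types: sum fatalities within each EVTYPE, then order the totals
totals <- aggregate(FATALITIES ~ EVTYPE, data = events, FUN = sum)
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

print(top_records)
print(totals)
```

Here HEAT has the single deadliest record, while TORNADO has the largest total.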
We also plot the distribution of events for these severe weather types, to see how often each type produces the most severe outcomes.
Among events with more than 100 injuries, the average number of injuries from heat-related events (about 203) is comparable to that from tornadoes (about 260). However, the number of tornado events that caused more than 100 injuries is far higher than the number of such heat-related events. This indicates that tornadoes are the most severe weather events in terms of human loss.
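In terms of economic loss, we compare all flood-related events with all hurricane-related events that caused nonzero damage. On the one hand, hurricanes cause much more property and crop damage on average: the mean property damage per event is about $406 million for hurricanes versus about $5.3 million for floods, and the mean crop damage is about $61 million versus about $3.0 million. On the other hand, the histograms show that flood-related events occur far more frequently than hurricanes.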
#Heat vs Tornado
par(mfrow = c(1,1))
tornado_heat_injuries <- data.frame(subset(StormData1, (EVTYPE == 'TORNADO'|grepl('HEAT',EVTYPE))&INJURIES>100)[,.(EVTYPE,INJURIES)])
tornado_heat_injuries[which(grepl('HEAT',tornado_heat_injuries$EVTYPE)),1] <- 'HEAT'
ggplot(tornado_heat_injuries,aes(x=INJURIES)) + geom_histogram(aes(fill=EVTYPE)) + ggtitle('Heat and Tornado Injuries Distribution')
summarise(group_by(tornado_heat_injuries,EVTYPE),mean_injuries = mean(INJURIES))
## Source: local data frame [2 x 2]
##
## EVTYPE mean_injuries
## 1 HEAT 203.2308
## 2 TORNADO 259.7153
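The histogram conveys how many tornado and heat events exceed 100 injuries; the same comparison can be made explicit by counting. The base R sketch below uses made-up records (not NOAA data), collapses heat-related labels into 'HEAT' with grepl() as the report does, keeps events with more than 100 injuries, and reports both the count and the mean per type with table() and tapply().

```r
# Toy data standing in for tornado/heat records (not NOAA data)
injuries <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TORNADO", "HEAT WAVE", "TORNADO", "TORNADO"),
  INJURIES = c(150, 300, 120, 110, 90, 400),
  stringsAsFactors = FALSE
)

# Collapse every heat-related label into 'HEAT'
injuries$EVTYPE[grepl("HEAT", injuries$EVTYPE)] <- "HEAT"

# Keep only the events with more than 100 injuries
severe <- injuries[injuries$INJURIES > 100, ]

# How many such events per type, and how severe they are on average
counts <- table(severe$EVTYPE)
means <- tapply(severe$INJURIES, severe$EVTYPE, mean)
print(counts)
print(means)
```

Reporting the count next to the mean keeps frequency and per-event severity as separate, visible quantities.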
#Hurricane/Typhoon vs Flood
Flood_Hurricane_PropDmg <- data.frame(subset(StormData1, (grepl('FLOOD',EVTYPE)|grepl('HURRICANE',EVTYPE))&PropDamageCost>0)[,.(EVTYPE,PropDamageCost)])
Flood_Hurricane_PropDmg[which(grepl('FLOOD',Flood_Hurricane_PropDmg$EVTYPE)),1] <- 'FLOOD'
Flood_Hurricane_PropDmg[which(grepl('HURRICANE',Flood_Hurricane_PropDmg$EVTYPE)),1] <- 'HURRICANE'
ggplot(Flood_Hurricane_PropDmg,aes(x=PropDamageCost)) + geom_histogram(aes(fill=EVTYPE)) + ggtitle('Flood and Hurricane Property Damage Cost comparison')
summarise(group_by(Flood_Hurricane_PropDmg,EVTYPE),mean_prop_dmg = mean(PropDamageCost))
## Source: local data frame [2 x 2]
##
## EVTYPE mean_prop_dmg
## 1 FLOOD 5302514
## 2 HURRICANE 405531962
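The FLOOD/HURRICANE relabelling is repeated for the property and crop tables, so it could be factored into one function. The sketch below is a hypothetical helper, collapse_evtype (not part of the original code), assuming the same rule as the two which(grepl(...)) lines: a label is replaced by the first pattern it contains, so a label containing both words ends up as FLOOD, and labels matching no pattern are kept unchanged. It uses literal matching (fixed = TRUE), which is equivalent for plain words like these.

```r
# Hypothetical helper: collapse raw EVTYPE labels into broader categories.
# The first matching pattern wins; unmatched labels are left unchanged.
collapse_evtype <- function(evtype, patterns) {
  out <- evtype
  done <- rep(FALSE, length(evtype))
  for (p in patterns) {
    hit <- !done & grepl(p, evtype, fixed = TRUE)
    out[hit] <- p
    done <- done | hit
  }
  out
}

# Toy labels for illustration only
labels <- c("FLASH FLOOD", "HURRICANE/TYPHOON", "COASTAL FLOOD",
            "HURRICANE OPAL", "HURRICANE FLOOD", "HAIL")
print(collapse_evtype(labels, c("FLOOD", "HURRICANE")))
```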
Flood_Hurricane_CropDmg <- data.frame(subset(StormData1, (grepl('FLOOD',EVTYPE)|grepl('HURRICANE',EVTYPE))&CropDamageCost>0)[,.(EVTYPE,CropDamageCost)])
Flood_Hurricane_CropDmg[which(grepl('FLOOD',Flood_Hurricane_CropDmg$EVTYPE)),1] <- 'FLOOD'
Flood_Hurricane_CropDmg[which(grepl('HURRICANE',Flood_Hurricane_CropDmg$EVTYPE)),1] <- 'HURRICANE'
ggplot(Flood_Hurricane_CropDmg,aes(x=CropDamageCost)) + geom_histogram(aes(fill=EVTYPE)) + ggtitle('Flood and Hurricane Crop Damage Cost comparison')
summarise(group_by(Flood_Hurricane_CropDmg,EVTYPE),mean_crop_dmg = mean(CropDamageCost))
## Source: local data frame [2 x 2]
##
## EVTYPE mean_crop_dmg
## 1 FLOOD 2985038
## 2 HURRICANE 61281031
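The flood-versus-hurricane comparison weighs two quantities: how often each type causes damage and how much damage a typical event causes. One way to combine them is total damage per type, which equals the number of damaging events times the mean damage per event. The base R sketch below shows this arithmetic with tapply() on a small made-up data frame; the numbers are illustrative, not results from the NOAA data.

```r
# Toy damage records for illustration only (not NOAA data):
# many small floods and a few large hurricanes
dmg <- data.frame(
  EVTYPE = c(rep("FLOOD", 8), rep("HURRICANE", 2)),
  PropDamageCost = c(rep(5e6, 8), 3e8, 1e8),
  stringsAsFactors = FALSE
)

# Per type: number of damaging events, mean damage per event, total damage
n_events <- tapply(dmg$PropDamageCost, dmg$EVTYPE, length)
mean_dmg <- tapply(dmg$PropDamageCost, dmg$EVTYPE, mean)
total_dmg <- tapply(dmg$PropDamageCost, dmg$EVTYPE, sum)

# total_dmg equals n_events * mean_dmg for each type
print(cbind(n_events, mean_dmg, total_dmg))
```

Which type dominates the total depends on whether its higher frequency outweighs its lower per-event damage.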