We explore the National Oceanic and Atmospheric Administration’s (NOAA) Storm Events Database to identify the severe weather events in the US that cause the most human harm and financial damage. The analysis shows that tornadoes are the most destructive in terms of human injuries and deaths. Excessive heat and flash floods are the next most fatal events, whereas thunderstorm winds and floods cause the next highest numbers of injuries. In terms of financial damage, floods, hurricanes and tornadoes are the top causes of property damage, whereas droughts, floods and river floods are the top causes of crop damage. The data, covering 1950 to 2011, record over 15,000 deaths due to weather events and close to US$500 billion in property and crop damage.
The analysis is done in R, starting with the loading of the file:
dt <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
To understand the effect on human life, we find the total number of fatalities and injuries by the different event types.
library("dplyr")
# Total fatalities and injuries per event type, with the columns named directly in summarise()
grpddt <- dt %>%
  group_by(EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES, na.rm = TRUE),
            injuries = sum(INJURIES, na.rm = TRUE)) %>%
  rename(evtype = EVTYPE)
head(arrange(grpddt, desc(fatalities)))
## Source: local data frame [6 x 3]
##
## evtype fatalities injuries
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
head(arrange(grpddt, desc(injuries)))
## Source: local data frame [6 x 3]
##
## evtype fatalities injuries
## 1 TORNADO 5633 91346
## 2 TSTM WIND 504 6957
## 3 FLOOD 470 6789
## 4 EXCESSIVE HEAT 1903 6525
## 5 LIGHTNING 816 5230
## 6 HEAT 937 2100
The results show that tornadoes, excessive heat and flash floods are the top killers, whereas tornadoes (again), thunderstorms and floods cause the most injuries.
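For readers without dplyr, the same kind of per-event totals can be sketched in base R. The snippet below is a minimal illustration on a made-up data frame (`toy` and its values are assumptions for demonstration, not rows from the NOAA file); it sums both columns by event type with `aggregate()` and ranks the result with `order()`.

```r
# Toy data standing in for the storm data set (values are invented).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, NA),
  INJURIES   = c(10, 0, 5, 2, 7)
)

# Sum fatalities and injuries within each event type, ignoring NAs.
totals <- aggregate(
  toy[, c("FATALITIES", "INJURIES")],
  by = list(evtype = toy$EVTYPE),
  FUN = sum, na.rm = TRUE
)

# Rank event types by total fatalities, largest first.
ranked <- totals[order(-totals$FATALITIES), ]
print(ranked)
```

Sorting the same `totals` table by `INJURIES` instead gives the injury ranking.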
Just to compare the magnitudes of the top 15 severe events in terms of human life, we draw a bar graph for fatalities and injuries.
library("reshape2")
library("ggplot2")
topfatal <- arrange(grpddt, desc(fatalities))
topfatal <- topfatal[1:15, ]
topfatal$evtype <- factor(topfatal$evtype, levels=topfatal$evtype) #Order the levels decreasingly
m<- melt(topfatal, id="evtype")
ggplot(m, aes(x=evtype, y=value, fill=variable)) +
geom_bar(stat="identity", position="dodge")+
labs(x = "Event", y = "Human Fatalities and Injuries", title = "Top Harmful Events to Population Health")+
theme(axis.text.x = element_text(angle = 50, vjust = 1, hjust=1))+
coord_cartesian(ylim = c(0, 20000))
Note that injuries from tornadoes (91,346) are much higher than those from any other event, so the y-axis has been cut off at 20,000 to show the variation among the other events.
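The cut-off relies on `coord_cartesian()`, which zooms the view without discarding data. Setting limits on the scale itself (for example with `ylim()`) would instead treat the out-of-range tornado bar as missing and drop it from the plot. A minimal sketch, assuming a small hand-typed data frame (`toy` is a hypothetical name; its values are copied from the injury column in the tables above):

```r
library(ggplot2)

# Three event types with their injury totals, typed in by hand.
toy <- data.frame(
  evtype   = c("TORNADO", "TSTM WIND", "FLOOD"),
  injuries = c(91346, 6957, 6789)
)

# Zoom the y-axis to 0-20000; the tornado bar is clipped at the top
# of the panel but remains part of the plotted data.
p_zoom <- ggplot(toy, aes(x = evtype, y = injuries)) +
  geom_col() +
  coord_cartesian(ylim = c(0, 20000))

print(p_zoom)
```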
Crop damage and property damage are our measures of financial damage. In the data, the CROPDMG and PROPDMG variables are accompanied by CROPDMGEXP and PROPDMGEXP variables, which give the multiplier for the corresponding amounts. The multipliers “h|H” (hundreds), “k|K” (thousands), “m|M” (millions) and “b|B” (billions) are self-explanatory, but others, such as digits, “+”, “-” and “?”, are not, so we leave out the rows with such values. Rows with any other multiplier value, including a blank, are kept and their amounts are taken at face value.
dt2 <- dt[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
leaveOut <- c(0:8, "?","+","-")
rowsToLeaveOut <- dt2$PROPDMGEXP %in% leaveOut | dt2$CROPDMGEXP %in% leaveOut
sum(rowsToLeaveOut)
## [1] 341
Only 341 of the 902,297 observations (about 0.04%) are dropped by this cleaning step.
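One way to see how the multiplier codes are distributed before dropping rows is to tabulate them. The following is a sketch on an invented vector (`exp_codes` is a hypothetical stand-in for a column such as `dt$PROPDMGEXP`), not output from the real file.

```r
# Invented multiplier codes, including a blank and a few ambiguous ones.
exp_codes <- c("K", "K", "M", "", "5", "?", "B", "k", "+", "K")

# Frequency of each distinct code.
code_counts <- table(exp_codes, useNA = "ifany")
print(code_counts)

# Codes whose meaning is unclear, mirroring the leaveOut vector above.
ambiguous <- c(as.character(0:8), "?", "+", "-")
is_ambiguous <- exp_codes %in% ambiguous

n_dropped <- sum(is_ambiguous)       # number of rows that would be dropped
share_dropped <- mean(is_ambiguous)  # fraction of rows that would be dropped
```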
Next, we calculate the actual amount in USD of the Property and Crop Damages after taking into consideration the multiplier. Then we sum them up by the different event types and see which are the top contributors.
value <- function(base, unit){
base * switch(as.character(unit),
"b" = 10^9, "B" = 10^9,
"m" = 10^6, "M" = 10^6,
"k" = 10^3, "K" = 10^3,
"h" = 100, "H" = 100,
1)
}
# Logical negation is safer than -which(): it still works if no rows are flagged
dt2 <- dt2[!rowsToLeaveOut, ]
dt2$PropertyDamage <- mapply(value, dt2$PROPDMG, dt2$PROPDMGEXP)
dt2$CropDamage <- mapply(value, dt2$CROPDMG, dt2$CROPDMGEXP)
# Total property and crop damage (in USD) per event type
grpddt2 <- dt2 %>%
  group_by(EVTYPE) %>%
  summarise(PropertyDamage = sum(PropertyDamage, na.rm = TRUE),
            CropDamage = sum(CropDamage, na.rm = TRUE)) %>%
  rename(evtype = EVTYPE)
head(arrange(grpddt2, desc(PropertyDamage)))
## Source: local data frame [6 x 3]
##
## evtype PropertyDamage CropDamage
## 1 FLOOD 144657709807 5661968450
## 2 HURRICANE/TYPHOON 69305840000 2607872800
## 3 TORNADO 56936985483 364950110
## 4 STORM SURGE 43323536000 5000
## 5 FLASH FLOOD 16140811717 1420727100
## 6 HAIL 15732262277 3000954453
head(arrange(grpddt2, desc(CropDamage)))
## Source: local data frame [6 x 3]
##
## evtype PropertyDamage CropDamage
## 1 DROUGHT 1046106000 13972566000
## 2 FLOOD 144657709807 5661968450
## 3 RIVER FLOOD 5118945500 5029459000
## 4 ICE STORM 3944927810 5022110000
## 5 HAIL 15732262277 3000954453
## 6 HURRICANE 11868319010 2741910000
We see that the top 3 causes of property damage are floods, hurricanes/typhoons and tornadoes, whereas the top 3 causes of crop damage are drought, floods and river floods.
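The row-by-row `mapply()` conversion used above can be slow on roughly 900,000 rows. A vectorized lookup is one possible alternative. The sketch below uses invented values, and the names `multiplier`, `base` and `unit` are illustrative assumptions rather than part of the original analysis. As in the `value()` function, any code not listed (including the empty string) is treated as a multiplier of 1.

```r
# Lookup table from lower-case multiplier code to numeric factor.
multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Invented damage amounts and their multiplier codes.
base <- c(25, 2.5, 1, 10, 7)
unit <- c("K", "m", "B", "", "h")

# Look up every code at once; unmatched codes come back as NA.
mult <- unname(multiplier[tolower(as.character(unit))])
mult[is.na(mult)] <- 1  # default multiplier, as in value()

amount <- base * mult
print(amount)
```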
We use a different visual representation this time: we compute the percentage of total financial damage (property plus crop) attributable to each weather event and show it as a pie chart.
grpddt2$Damage <- (grpddt2$CropDamage + grpddt2$PropertyDamage)/10^9
topdamage <- arrange(grpddt2, desc(Damage))
topdamage$pct <- round(topdamage$Damage/sum(topdamage$Damage)*100)
topdamage <- topdamage[1:10, ]
topdamage$evtype <- factor(topdamage$evtype, levels=topdamage$evtype)
# Build labels of the form "FLOOD 32%"
lbls <- paste0(as.character(topdamage$evtype), " ", topdamage$pct, "%")
library("plotrix")
pie3D(topdamage$Damage, theta = pi/4, radius = 1.2, labels = lbls, labelcex=0.75, explode=0.1, labelrad = 1.5, main="Top Weather Event Contributors to Financial Damage in US")
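Each percentage label is an event's share of the total across all event types, computed before the table is cut to the top ten, so the ten labels need not add up to 100. A small sketch with invented figures (`damage` is a hypothetical named vector, in billions of dollars):

```r
# Invented damage totals per event type, in billions of dollars.
damage <- c(FLOOD = 150, HURRICANE = 80, TORNADO = 50, HAIL = 15, OTHER = 5)

# Share of the grand total, rounded to whole percent.
pct <- round(damage / sum(damage) * 100)

# Keep only the three largest contributors for labelling.
top3 <- head(sort(damage, decreasing = TRUE), 3)
labels <- paste0(names(top3), " ", pct[names(top3)], "%")
print(labels)
```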
Total number of deaths:
sum(grpddt$fatalities)
## [1] 15145
Total financial damage (in billions of US dollars):
sum(grpddt2$Damage)
## [1] 476.346
Tornadoes are the most destructive events in terms of human injuries and deaths. Excessive heat and flash floods are the next most fatal events, whereas thunderstorm winds and floods cause the next highest numbers of injuries.
In terms of financial damage, floods, hurricanes and tornadoes are the top causes of property damage, whereas droughts, floods and river floods are the top causes of crop damage.
Weather-related incidents caused over 15,000 deaths in the US from 1950 to 2011, and the total damage to property and crops is close to 500 billion US dollars. These high figures point to the need to continue our efforts to prepare for natural calamities.