Synopsis

We explore the National Oceanic and Atmospheric Administration’s (NOAA) Storm Events Database to identify the severe weather events in the US that cause the most human distress and financial damage. The analysis shows that tornadoes are the most destructive in terms of human injuries and deaths. Excessive heat and flash floods are the next most fatal events, whereas thunderstorm winds and floods cause the next highest numbers of injuries. In terms of financial damage, floods, hurricanes and tornadoes are the top causes of property damage, whereas droughts, floods and river floods are the top causes of crop damage. The data available from 1950 to 2011 show over 15,000 deaths due to weather events and close to US$ 500 billion in property and crop damage.

Data Processing

The analysis is done in R, starting with the loading of the file:

dt <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
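
As a minimal sketch of how this loading step behaves, the example below writes a tiny CSV through a bzip2 connection and reads it back with the same `read.csv(bzfile(...))` pattern. The temporary file and its three rows are made up for illustration and are not the NOAA data.

```r
# Toy data standing in for the storm file (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
                  FATALITIES = c(2, 0, 1))

# Write the CSV through a bzip2-compressed connection
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report does
toy_back <- read.csv(bzfile(path), stringsAsFactors = FALSE)
dim(toy_back)
```

The file is decompressed on the fly, so there is no need to unpack the archive by hand first.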

Effect on Human Life

To understand the effect on human life, we find the total number of fatalities and injuries by the different event types.

library("dplyr")
# Total fatalities and injuries per event type
grpddt <- dt %>%
      group_by(EVTYPE) %>%
      summarise(fatalities = sum(FATALITIES, na.rm = TRUE),
                injuries = sum(INJURIES, na.rm = TRUE)) %>%
      rename(evtype = EVTYPE)
head(arrange(grpddt, desc(fatalities)))
## Source: local data frame [6 x 3]
## 
##           evtype fatalities injuries
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3    FLASH FLOOD        978     1777
## 4           HEAT        937     2100
## 5      LIGHTNING        816     5230
## 6      TSTM WIND        504     6957
head(arrange(grpddt, desc(injuries)))
## Source: local data frame [6 x 3]
## 
##           evtype fatalities injuries
## 1        TORNADO       5633    91346
## 2      TSTM WIND        504     6957
## 3          FLOOD        470     6789
## 4 EXCESSIVE HEAT       1903     6525
## 5      LIGHTNING        816     5230
## 6           HEAT        937     2100

The results show that tornadoes, excessive heat and flash floods are the top killers, whereas tornadoes (again), thunderstorms and floods cause the most injuries.
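
For readers who want to see the grouping step without dplyr, here is a minimal base R sketch of the same idea on a made-up five-row data frame. It is an illustration of the technique, not the code used for the results above, and the toy values are assumptions.

```r
# Made-up observations; one fatality count is missing on purpose
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, NA),
  INJURIES = c(10, 0, 5, 2, 7)
)

# Sum both columns within each event type, ignoring missing values
totals <- aggregate(toy[, c("FATALITIES", "INJURIES")],
                    by = list(evtype = toy$EVTYPE),
                    FUN = sum, na.rm = TRUE)
names(totals) <- c("evtype", "fatalities", "injuries")

# Rank event types by fatalities, highest first
ranked <- totals[order(totals$fatalities, decreasing = TRUE), ]
ranked
```

Passing `na.rm = TRUE` through to `sum` keeps the FLOOD row with the missing fatality count, so its injuries are still counted.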

To compare the magnitudes for the 15 deadliest event types, we draw a bar graph of their fatalities and injuries.

library("reshape2")
library("ggplot2")

topfatal <- arrange(grpddt, desc(fatalities))
topfatal <- topfatal[1:15, ]
topfatal$evtype <- factor(topfatal$evtype, levels=topfatal$evtype) # Keep the bars in decreasing order of fatalities
m <- melt(topfatal, id="evtype")
ggplot(m, aes(x=evtype, y=value, fill=variable)) + 
      geom_bar(stat="identity", position="dodge")+
      labs(x = "Event", y = "Human Fatalities and Injuries", title = "Top Harmful Events to Population Health")+
      theme(axis.text.x = element_text(angle = 50, vjust = 1, hjust=1))+
      coord_cartesian(ylim = c(0, 20000))

Note that injuries from tornadoes (91346) are much higher than those from any other event, hence the y-axis has been cut off at 20000 to show the variation among the other events.
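
The cutoff in the plot is done with `coord_cartesian()`, which only zooms the view and leaves the underlying data untouched. Setting limits on the scale instead (for example with `ylim()`) would treat out-of-range values as missing. The sketch below uses three rows taken from the injuries table above and inspects the layer data with `layer_data()` to confirm that the tornado value is still present after zooming.

```r
library(ggplot2)

# Three rows from the injuries table above
toy <- data.frame(event = c("TORNADO", "TSTM WIND", "FLOOD"),
                  injuries = c(91346, 6957, 6789))

# Zoom the y-axis without discarding the tallest bar
p_zoom <- ggplot(toy, aes(x = event, y = injuries)) +
  geom_bar(stat = "identity") +
  coord_cartesian(ylim = c(0, 20000))

# The data behind the bars is unchanged by the zoom
zoom_data <- layer_data(p_zoom, 1)
max(zoom_data$y)
```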

Financial Damage

Crop damage and property damage are our measures of financial damage. In the data, the CROPDMG and PROPDMG variables are accompanied by CROPDMGEXP and PROPDMGEXP variables, which give the multiplier for the corresponding amounts. The multipliers “h|H” (hundreds), “k|K” (thousands), “m|M” (millions) and “b|B” (billions) are self-explanatory, but others such as the digits 0-8, +, - and ? are not, so we leave out the rows with such values.

dt2 <- dt[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
leaveOut <- c(0:8, "?","+","-")
rowsToLeaveOut <- dt2$PROPDMGEXP %in% leaveOut | dt2$CROPDMGEXP %in% leaveOut
sum(rowsToLeaveOut)
## [1] 341

Only 341 out of 902297 observations are lost due to this data cleaning.
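
To put that loss in proportion, the arithmetic below computes the share of dropped rows from the two counts reported above. It comes to roughly 0.04% of the observations.

```r
dropped <- 341      # rows flagged for removal
total <- 902297     # rows in the full data set

# Share of observations lost to the cleaning step, in percent
pct_dropped <- dropped / total * 100
round(pct_dropped, 2)
```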

Next, we calculate the actual amount in USD of the property and crop damage, taking the multiplier into account. We then sum the amounts by event type to see which are the top contributors.

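# Convert a damage amount to USD using its exponent code (h, k, m or b);
# any other code leaves the amount unchanged
value <- function(base, unit){
      base * switch(as.character(unit),
            "b" = 10^9, "B" = 10^9,
            "m" = 10^6, "M" = 10^6,
            "k" = 10^3, "K" = 10^3,
            "h" = 100,  "H" = 100,
            1)
}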
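
As an alternative sketch (not the approach used in this report), the same multipliers can be applied to whole columns at once with a named lookup vector, which avoids one function call per row. The helper name `value_vec` and the toy inputs are hypothetical. Like the function above, any empty or unrecognized unit falls back to a multiplier of 1.

```r
# Hypothetical vectorized counterpart of value()
value_vec <- function(base, unit) {
  mult <- c(h = 100, k = 1e3, m = 1e6, b = 1e9)
  # Look up each unit case-insensitively; unknown units give NA
  factor_by_row <- mult[tolower(as.character(unit))]
  # Fall back to 1 where no multiplier was found
  factor_by_row[is.na(factor_by_row)] <- 1
  base * unname(factor_by_row)
}

# Made-up amounts and exponent codes
result <- value_vec(c(25, 2.5, 1, 3, 10), c("K", "M", "b", "", "h"))
result
```

On a data set of this size the lookup approach should be noticeably faster than `mapply`, though that has not been measured here.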

# Drop the flagged rows; logical indexing is safe even if nothing is flagged
dt2 <- dt2[!rowsToLeaveOut, ]
dt2$PropertyDamage <- mapply(value, dt2$PROPDMG, dt2$PROPDMGEXP)
dt2$CropDamage <- mapply(value, dt2$CROPDMG, dt2$CROPDMGEXP)
# Total property and crop damage per event type
grpddt2 <- dt2 %>%
      group_by(EVTYPE) %>%
      summarise(PropertyDamage = sum(PropertyDamage, na.rm = TRUE),
                CropDamage = sum(CropDamage, na.rm = TRUE)) %>%
      rename(evtype = EVTYPE)
head(arrange(grpddt2, desc(PropertyDamage)))
## Source: local data frame [6 x 3]
## 
##              evtype PropertyDamage CropDamage
## 1             FLOOD   144657709807 5661968450
## 2 HURRICANE/TYPHOON    69305840000 2607872800
## 3           TORNADO    56936985483  364950110
## 4       STORM SURGE    43323536000       5000
## 5       FLASH FLOOD    16140811717 1420727100
## 6              HAIL    15732262277 3000954453
head(arrange(grpddt2, desc(CropDamage)))
## Source: local data frame [6 x 3]
## 
##        evtype PropertyDamage  CropDamage
## 1     DROUGHT     1046106000 13972566000
## 2       FLOOD   144657709807  5661968450
## 3 RIVER FLOOD     5118945500  5029459000
## 4   ICE STORM     3944927810  5022110000
## 5        HAIL    15732262277  3000954453
## 6   HURRICANE    11868319010  2741910000

We see that the top 3 causes of property damage are floods, hurricanes/typhoons and tornadoes, whereas the top 3 causes of crop damage are droughts, floods and river floods.

This time we use a different visual representation. We compute the percentage of total financial damage (property plus crop) contributed by each weather event and show the top 10 contributors in a pie chart.

# Total damage per event type, in billions of USD
grpddt2$Damage <- (grpddt2$CropDamage + grpddt2$PropertyDamage)/10^9
topdamage <- arrange(grpddt2, desc(Damage))
# Share of the total over all event types, computed before keeping the top 10
topdamage$pct <- round(topdamage$Damage/sum(topdamage$Damage)*100)
topdamage <- topdamage[1:10, ]
topdamage$evtype <- factor(topdamage$evtype, levels=topdamage$evtype)
# Labels made of the event name followed by its percentage
lbls <- paste0(topdamage$evtype, " ", topdamage$pct, "%")
library("plotrix")
pie3D(topdamage$Damage, theta = pi/4, radius = 1.2, labels = lbls, labelcex=0.75, explode=0.1, labelrad = 1.5, main="Top Weather Event Contributors to Financial Damage in US")
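
One detail of the labels is worth illustrating: `pct` is computed against the total over all event types before the top 10 are selected, so the percentages printed on the pie need not add up to 100. A minimal sketch with made-up damage figures (in billions) shows the effect, keeping a top 2 instead of a top 10.

```r
# Made-up damage totals per event type, in billions of USD
damage <- c(FLOOD = 150, HURRICANE = 70, TORNADO = 57, HAIL = 19, DROUGHT = 4)
damage <- sort(damage, decreasing = TRUE)

# Shares are taken over the full total, then only the top entries are kept
pct <- round(damage / sum(damage) * 100)
top <- head(pct, 2)

lbls <- paste0(names(top), " ", top, "%")
lbls
sum(top)   # less than 100, because the remaining events are left out
```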

Total Loss

Total number of deaths:

sum(grpddt$fatalities)
## [1] 15145

Total financial damage (in billions of US dollars):

sum(grpddt2$Damage)
## [1] 476.346

Results

  1. Tornadoes are the most destructive in terms of human injuries and deaths. Excessive heat and flash floods are the next most fatal events, whereas thunderstorm winds and floods cause the next highest numbers of injuries.

  2. In terms of financial damage, floods, hurricanes and tornadoes are the top causes of property damage, whereas droughts, floods and river floods are the top causes of crop damage.

  3. There have been over 15,000 deaths in the US in weather-related incidents from 1950 to 2011. The total damage to property and crops is close to 500 billion US dollars. These high figures point to the need to continue our efforts to prepare for natural calamities.