Synopsis

This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The dataset used here tracks characteristics of major storms and weather events in the United States between 1950 and 2011, including when and where they occurred, as well as estimates of any fatalities, injuries, and property damage. We examine the most damaging types of weather events in terms of total injuries, fatalities, and economic impact. Our exploratory analysis finds that the event type with the highest economic impact is flood, which caused around 150 billion dollars of combined crop and property losses between 1950 and 2011. The event type responsible for the most fatalities and injuries over the same period is tornado, which caused more than ten times as many injuries as the second-ranked event type.

Data Processing

Load the data

url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("repdata_data_StormData.csv.bz2")){
  download.file(url, destfile = "repdata_data_StormData.csv.bz2")}
stormd<-read.csv("repdata_data_StormData.csv.bz2")

Processing of Economic Data

To convert the exponent codes in the PROPDMGEXP and CROPDMGEXP columns of the NOAA database (data prior to the year 2011) into numeric multipliers, we used the method suggested and tested in this article.

# Default multiplier of 10 for any code not listed below (the digits 0-8)
stormd$PROPEXP <- 10
stormd$CROPEXP <- 10
# Letter and symbol codes override the default
stormd$PROPEXP[stormd$PROPDMGEXP %in% c("H","h")] <- 100
stormd$PROPEXP[stormd$PROPDMGEXP %in% c("K","k")] <- 1000
stormd$PROPEXP[stormd$PROPDMGEXP %in% c("M","m")] <- 1e+06
stormd$PROPEXP[stormd$PROPDMGEXP %in% c("B","b")] <- 1e+09
stormd$PROPEXP[stormd$PROPDMGEXP == "+"] <- 1
stormd$PROPEXP[stormd$PROPDMGEXP %in% c("","-","?")] <- 0
stormd$CROPEXP[stormd$CROPDMGEXP %in% c("H","h")] <- 100
stormd$CROPEXP[stormd$CROPDMGEXP %in% c("K","k")] <- 1000
stormd$CROPEXP[stormd$CROPDMGEXP %in% c("M","m")] <- 1e+06
stormd$CROPEXP[stormd$CROPDMGEXP %in% c("B","b")] <- 1e+09
stormd$CROPEXP[stormd$CROPDMGEXP == "+"] <- 1
stormd$CROPEXP[stormd$CROPDMGEXP %in% c("","-","?")] <- 0

The code above adds two new columns, CROPEXP and PROPEXP, holding the numeric multiplier for each exponent code. We then calculate the actual damage values for each category by multiplication and store the results in two more new columns, PropDamage and CropDamage.

stormd$PropDamage<-stormd$PROPDMG * stormd$PROPEXP
stormd$CropDamage<-stormd$CROPDMG * stormd$CROPEXP
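
The mapping and multiplication steps can be checked on a small example before running them on the full table. The sketch below is illustrative only: the data frame `toy`, the lookup vector `exp_lookup`, and the helper `exp_to_multiplier` are hypothetical names, the damage figures are made up, and it assumes that lowercase "k" is treated like "K" and that every code not listed (such as a digit) maps to 10.

```r
# Hypothetical toy data; the values are made up for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 5, 1, 3),
  PROPDMGEXP = c("K", "M", "", "B", "5"),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent code -> numeric multiplier.
exp_lookup <- c(H = 100, h = 100, K = 1e3, k = 1e3, M = 1e6, m = 1e6,
                B = 1e9, "+" = 1, "-" = 0, "?" = 0)

# Hypothetical helper: translate a vector of codes into multipliers.
exp_to_multiplier <- function(code) {
  code <- as.character(code)
  mult <- unname(exp_lookup[code])  # NA for codes missing from the lookup
  mult[code == ""] <- 0             # blank code counts as 0
  mult[is.na(mult)] <- 10           # assumption: remaining codes (digits) -> 10
  mult
}

toy$PROPEXP    <- exp_to_multiplier(toy$PROPDMGEXP)
toy$PropDamage <- toy$PROPDMG * toy$PROPEXP
print(toy)
```

A lookup vector keeps the code-to-multiplier table in one place, which makes it easier to compare against the reference mapping than a long series of assignments.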

We first looked at total crop damage and total property damage by event type separately. We then combined the two categories, because recorded property losses tend to be much higher (about 10 times) than crop losses, so we saw no justification for analyzing them separately for the purposes of this study.

library(dplyr)
summary<-stormd%>%
  group_by(EVTYPE)%>%
  summarize(sum_pdamage=sum(PropDamage, na.rm = TRUE), sum_cdamage=sum(CropDamage,na.rm=TRUE))
summary$total<-summary$sum_pdamage+summary$sum_cdamage
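
The comparison between property and crop losses that motivates combining them can be computed directly. The following is a minimal base-R sketch on made-up numbers, not the report's actual figures; `toy`, `totals`, and `ratio` are hypothetical names, and `aggregate()` is used here in place of the dplyr pipeline only to keep the example dependency-free.

```r
# Hypothetical toy data; damage values are made up for illustration only.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
  PropDamage = c(100, 50, 80, 10),
  CropDamage = c(10, 5, 0, 9),
  stringsAsFactors = FALSE
)

# Sum both damage columns per event type, then add them into one total.
totals <- aggregate(cbind(PropDamage, CropDamage) ~ EVTYPE, data = toy, FUN = sum)
totals$total <- totals$PropDamage + totals$CropDamage

# Overall ratio of property losses to crop losses across all events.
ratio <- sum(toy$PropDamage) / sum(toy$CropDamage)
print(totals)
print(ratio)
```

Running the same ratio on the real PropDamage and CropDamage columns (with `na.rm = TRUE` in each `sum()`) would give the figure that the "about 10 times" statement refers to.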

Processing of Casualty Data

summary2<-stormd%>%
  group_by(EVTYPE)%>%
  summarize(sum_fatality=sum(FATALITIES, na.rm = TRUE), sum_injury=sum(INJURIES,na.rm=TRUE))

Results

Economic Impact

library(ggplot2)
ggplot(arrange(summary,desc(total))[1:10,],aes(x=toupper(EVTYPE), y=total/10^9))+
  geom_bar(stat="identity",fill="#c8d9ca")+
  theme(legend.position = "none",axis.title.x=element_blank(),
        axis.text.x = element_text(angle = 90,hjust=1,size=9),plot.margin = unit(c(0,0,0,0), "cm"))+
  ylim(0,200)+geom_text(aes(label=round(total/10^9,digits=0)),vjust=-0.3,size=3.5)+
  ylab("Cost of Damage ($ Billions)")+
  ggtitle("Top 10 Weather Events with Highest Total Property and Crop Damage")

As the bar plot above shows, flood causes the highest economic loss, at about 150 billion dollars over the roughly 60 years covered. It is followed by other severe weather events, such as hurricane/typhoon at 72 billion dollars and tornado at 57 billion dollars.
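
Because ggplot2 places a character x variable in alphabetical order, the bars do not appear in rank order. One possible way to sort them by value is to turn the event type into a factor ordered by the total, sketched here with base R's `reorder()` on made-up numbers; `toy` and `EVTYPE_ranked` are hypothetical names.

```r
# Hypothetical toy data; totals are made up for illustration only.
toy <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "TORNADO"),
  total  = c(3, 1, 2),
  stringsAsFactors = FALSE
)

# Order the factor levels by descending total (hence the minus sign).
toy$EVTYPE_ranked <- reorder(toy$EVTYPE, -toy$total)
print(levels(toy$EVTYPE_ranked))
```

In the plotting call, the corresponding change would be to map `x = reorder(toupper(EVTYPE), -total)` inside `aes()`.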

Health and Safety Impact

ggplot(arrange(summary2,desc(sum_fatality))[1:10,],aes(x=toupper(EVTYPE), y=sum_fatality))+
  geom_bar(stat="identity",fill="#c8d9ca")+
  theme(legend.position = "none",axis.title.x=element_blank(),
        axis.text.x = element_text(angle = 90,hjust=1,size=9),plot.margin = unit(c(0,0,0,0), "cm"))+
  geom_text(aes(label=sum_fatality),vjust=-0.3,size=3.5)+
  ylab("Number of Cases")+
  ggtitle("Top 10 Weather Events with Highest Fatalities")

The bar plot above shows the 10 weather event types with the most fatalities. Tornado ranks first with 5633 lives lost, followed by excessive heat with 1903, over a time span of about 60 years.

ggplot(arrange(summary2,desc(sum_injury))[1:10,],aes(x=toupper(EVTYPE), y=sum_injury))+
  geom_bar(stat="identity",fill="#c8d9ca")+
  theme(legend.position = "none",axis.title.x=element_blank(),
        axis.text.x = element_text(angle = 90,hjust=1,size=9),plot.margin = unit(c(0,0,0,0), "cm"))+
  geom_text(aes(label=sum_injury),vjust=-0.3,size=3.5)+
  ylab("Number of Cases")+ylim(0,100000)+
  ggtitle("Top 10 Weather Events with Highest Injuries")

The event type with the most injuries is again tornado, with 91346 cases recorded, which is more than 13 times the count for either the second- or third-ranked event type.
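
A "times higher" comparison like this one can be computed from the summary table rather than read off the plot. The sketch below uses a hypothetical helper, `lead_ratio`, and made-up counts in `toy_injuries`; it is an illustration of the calculation, not the report's figures.

```r
# Hypothetical helper: ratio of the largest total to the runner-up.
lead_ratio <- function(totals) {
  ranked <- sort(totals, decreasing = TRUE)
  unname(ranked[1] / ranked[2])
}

# Made-up injury counts, for illustration only.
toy_injuries <- c(TORNADO = 1300, FLOOD = 100, HEAT = 90)
print(lead_ratio(toy_injuries))
```

Applied to the real data, the call would be `lead_ratio(summary2$sum_injury)`.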

Conclusions

Finally, it is noteworthy that tornado also ranks among the top three event types for combined economic impact in the previous section. We therefore conclude that tornado is the most harmful weather event overall, posing both financial and health threats, while flood causes the highest financial losses.