Synopsis

Objective: To identify which weather events have the greatest impact on public health and cause the most economic damage, based on the NOAA storm database.

Methods: The NOAA storm dataset (1950 to 2011) was downloaded and read in as a csv file. Fatalities and injuries are the two variables in the dataset that gauge the impact on public health. The total numbers of fatalities and injuries were calculated by event type, and a third variable, the total health impact, was derived as the sum of fatalities and injuries. The top 5 events with the greatest number of fatalities, injuries and fatalities+injuries were summarized, and the % contribution of the top 5 events to the total health impact was estimated. To assess economic damage, property damage and crop damage were first transformed so that both represented damage in millions ($) and were then summed to give the total economic damage. The top 5 events that caused the most economic damage were summarized and their % contribution to the total economic damage was estimated.

Results: The events with the greatest number of injuries in the US over the 1950-2011 time span were tornado, thunderstorm winds, flood, excessive heat, and lightning, in that order, whereas the events that caused the most fatalities were tornado, excessive heat, flash flood, heat, and lightning. When the total health impact is considered as the sum of fatalities and injuries, the top 5 events are tornado, excessive heat, thunderstorm winds, flood and lightning. Among these events the impact of tornado is by far the greatest, accounting for nearly 60% of all fatalities + injuries. The top 5 events contribute about 80% of the total health impact. In terms of economic damage (estimated as the sum of crop damage and property damage), the events with the greatest impact were flood, hurricane/typhoon, tornado, storm surge and hail. These events contributed about 70% of all the economic damage.

Conclusion: The weather event with the greatest impact on public health in the United States is tornado, which accounts for more than half of fatalities+injuries in the 1950-2011 time span. The event that caused the most economic damage in that time span is flood, which accounts for ~32% of all the economic damage.

Data Processing

The data for the analysis of the impacts of weather events were downloaded from the NOAA storm database as a csv file. The file was read into R using the read.csv function.

setwd("H:/Personal/Continuing Education/Data Scientist Specialization/Reproducible Research")
noaadata<-read.csv("repdata%2Fdata%2FStormData.csv", stringsAsFactors = F)

Data processing - Assessing the health impact

For assessing the health impact of weather events, two variables from the dataset which represent the number of fatalities and injuries were considered. The fatalities and injuries were summed by event type over the entire time span of the data collection (1950-2011). A derived variable called FATINJ, which assesses the total health impact, was calculated as the sum of the fatalities and injuries. Using the cumulative sums of the fatalities, injuries and fatalities+injuries, the top 5 events for each of these three outcomes were identified. To assess the impact of the top 5 events in relation to the others, the % of total fatalities+injuries was estimated for each event. The top 5 events were used as is, whereas all the remaining events were grouped into an “Others” category. The results were then summarized using a pie chart.

totfat<-aggregate(FATALITIES~EVTYPE,data=noaadata,FUN="sum")# sum the fatalities by event type
totinj<-aggregate(INJURIES~EVTYPE,data=noaadata,FUN="sum")# sum the injuries by event type

totfat<-totfat[order(totfat$FATALITIES, decreasing = T),]# sort the dataset in descending order of fatalities
totinj<-totinj[order(totinj$INJURIES, decreasing = T),]# sort the dataset in descending order of injuries

tothealth<-merge(totfat,totinj)# merge the two datasets to create a total health impact dataset
tothealth$FATINJ<-tothealth$FATALITIES+tothealth$INJURIES # estimate the sum of fatalities and injuries

tothealth<-tothealth[order(tothealth$FATINJ, decreasing = T),]# sort the dataset in descending order of FATINJ

topfat<-totfat[1:5,]# subset the first 5 records to denote the top 5 events

names(topfat)<-c("Event", "Total Fatalities")
    
topinj<-totinj[1:5,]# subset the first 5 records to denote the top 5 events
names(topinj)<-c("Event", "Total Injuries")

tophealth<-tothealth[1:5,]# subset the first 5 records to denote the top 5 events
names(tophealth)<-c("Event","Total Fatalities", "Total Injuries","Total Health Impact (Fatalities + Injuries)")

tophealth$EVTYPE<-factor(c("TORNADO","EXCESSIVE HEAT", "TSTM WIND","FLOOD","LIGHTNING"),levels=c("TORNADO","EXCESSIVE HEAT", "TSTM WIND","FLOOD","LIGHTNING"), labels=c("T","EH","TW", "F","L"))

tothealth$PFATINJ<-round(tothealth$FATINJ*100/(sum(tothealth$FATINJ)),1) # calculate % total fatalities+injuries


# Create a factor variable depicting the top 5 events and combining all other events into a "Others" category
tothealth$IEVT<-ifelse(tothealth$EVTYPE=="TORNADO","Tornado",
                    ifelse(tothealth$EVTYPE=="EXCESSIVE HEAT","Excessive Heat",
                           ifelse(tothealth$EVTYPE=="TSTM WIND","Thunderstorm Wind",
                                  ifelse(tothealth$EVTYPE=="FLOOD","Flood",
                                      ifelse(tothealth$EVTYPE=="LIGHTNING","Lightning","Others" ))))) 

Phealth<-aggregate(PFATINJ~IEVT, data=tothealth,FUN="sum") # sum the percentages by event type factor variable
lab1<-paste(Phealth$IEVT,"(",Phealth$PFATINJ,"%",")", sep=" ") # create labels for the chart
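
The grouping step above names the top 5 events explicitly in a nested ifelse. As a hedged alternative, the sketch below derives the top categories from the data itself. The helper name group_top_n and the toy data frame are hypothetical and are not part of the original analysis. Unlike the code above, the sketch sums the values first and rounds the percentages afterwards, so its results could differ slightly in the last decimal.

```r
# Hypothetical helper (not part of the original analysis): keep the n largest
# categories by `value` and lump every other category into "Others".
group_top_n <- function(df, key, value, n = 5) {
  ranked <- df[order(df[[value]], decreasing = TRUE), ]
  top_keys <- head(ranked[[key]], n)
  df$group <- ifelse(df[[key]] %in% top_keys, as.character(df[[key]]), "Others")
  shares <- aggregate(df[[value]], by = list(group = df$group), FUN = sum)
  names(shares)[2] <- "total"
  # percentage share of each group, rounded after summing
  shares$pct <- round(100 * shares$total / sum(shares$total), 1)
  shares[order(shares$total, decreasing = TRUE), ]
}

# Toy data for illustration only; these are not values from the NOAA dataset.
toy <- data.frame(
  EVTYPE = c("A", "B", "C", "D"),
  FATINJ = c(50, 30, 15, 5),
  stringsAsFactors = FALSE
)
res <- group_top_n(toy, "EVTYPE", "FATINJ", n = 2)
print(res)
```

Under these assumptions, a call such as group_top_n(tothealth, "EVTYPE", "FATINJ") would return one row per top event plus an "Others" row, which could feed the pie chart labels directly.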

Data processing - Assessing the economic impact

For assessing the economic impact of weather events, property damage and crop damage were considered. The property damage in the dataset is presented as 2 variables: a numeric PROPDMG variable and a character variable PROPDMGEXP, which gives the exponent in terms of hundreds, thousands, millions, etc. There were also other characters like “1”, “2”, “?”, “-”, etc. which were not in the dataset description. The records corresponding to these undescribed exponents were set to 0 property damage. A new property damage variable, which expresses the property damage in $ millions, was constructed from the PROPDMG and PROPDMGEXP variables. A similar process was repeated for assessing crop damage in millions. Once the property and crop damage were expressed in the same units, they were added together to estimate the total economic damage. This total damage was then summed across the years by event type, and the top 5 events with the highest economic damage were identified. Similar to the health impact, in order to assess the economic damage from the top 5 events versus the others, the % of total damage was estimated for each event. The top 5 events were used as is, whereas all the remaining events were grouped into an “Others” category. The results were then summarized using a pie chart.

unique(noaadata$PROPDMGEXP)## find out the unique symbols used for the PROPDMG value
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
# As per the documentation M represents millions, B billions and K thousands. under the assumption that "k" and "K"
# are identical we can create a property damage value with units of million dollars.

# Undocumented exponents are set to 0; the final branch handles "h"/"H" (hundreds): x*100/1e6 = x/10000
noaadata$PROPDMG1<-ifelse(noaadata$PROPDMGEXP %in% c("", "+", "0","5", "6", "?", "4", "2", "3", "7","-", "1", "8"),noaadata$PROPDMG*0,
                          ifelse(noaadata$PROPDMGEXP %in% c("B","b"),noaadata$PROPDMG*1000,
                            ifelse(noaadata$PROPDMGEXP %in% c("M","m"),noaadata$PROPDMG,
                                   ifelse(noaadata$PROPDMGEXP %in% c("K","k"),noaadata$PROPDMG/1000,
                                          noaadata$PROPDMG/10000 ))))
# Repeat the process for assessing crop damage in millions
unique(noaadata$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
noaadata$CROPDMG1<-ifelse(noaadata$CROPDMGEXP %in% c("", "0", "?", "2"),noaadata$CROPDMG*0,
                          ifelse(noaadata$CROPDMGEXP %in% c("B","b"),noaadata$CROPDMG*1000,
                            ifelse(noaadata$CROPDMGEXP %in% c("M","m"),noaadata$CROPDMG,
                                   ifelse(noaadata$CROPDMGEXP %in% c("K","k"),noaadata$CROPDMG/1000,
                                          noaadata$CROPDMG/10000 )))) # fallback for hundreds ("h"/"H"); none occur in CROPDMGEXP
# Calculate the total economic impact by adding the property damage and crop damage
noaadata$TOTDMG<-noaadata$PROPDMG1+noaadata$CROPDMG1

noaadata1<-noaadata[noaadata$TOTDMG>0,]# subset the dataset where the damage is greater than $0

CumDMG<-aggregate(TOTDMG~EVTYPE, data=noaadata1, FUN="sum")# sum the total damage by event type
CumDMG<-CumDMG[order(CumDMG$TOTDMG, decreasing = T),]
topdmg<-CumDMG[1:5,]

topdmg$EVTYPE<-factor(c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE","HAIL"),levels=c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE","HAIL"), labels=c("Flood", "Hurricane/Typhoon","Tornado","Storm Surge","Hail"))
topdmg$TOTDMGBil<-topdmg$TOTDMG/1000 # Calculate the total damage value in billion $

# Calculate the % damage caused by each event in terms of the economic damage
CumDMG$PDMG<-round(CumDMG$TOTDMG*100/(sum(CumDMG$TOTDMG)),1)

# Create a factor variable depicting the top 5 events and combining all other events into a "Others" category
CumDMG$IEVT<-ifelse(CumDMG$EVTYPE=="FLOOD","Flood",
                    ifelse(CumDMG$EVTYPE=="HURRICANE/TYPHOON","Hurricane/Typhoon",
                           ifelse(CumDMG$EVTYPE=="TORNADO","Tornado",
                                  ifelse(CumDMG$EVTYPE=="STORM SURGE","Storm Surge",
                                      ifelse(CumDMG$EVTYPE=="HAIL","Hail","Others" )))))

Pdmg<-aggregate(PDMG~IEVT, data=CumDMG,FUN="sum")
lab<-paste(Pdmg$IEVT,"(",Pdmg$PDMG,"%",")", sep=" ")
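
The exponent conversion described above can also be written as a lookup table rather than nested ifelse calls. The sketch below is one possible form, not the code used for the reported results. The names exp_to_millions and damage_in_millions are hypothetical, and the input values are toy numbers rather than records from the dataset. It assumes H, K, M and B mean hundreds, thousands, millions and billions, and treats every other symbol as 0, mirroring the choice made in this analysis.

```r
# Multipliers that convert a damage figure to millions of dollars.
# Assumption: H = hundreds, K = thousands, M = millions, B = billions.
exp_to_millions <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9) / 1e6

# Hypothetical helper: `dmg` is numeric, `dmg_exp` is the matching symbol column.
damage_in_millions <- function(dmg, dmg_exp) {
  mult <- exp_to_millions[toupper(dmg_exp)]  # case-insensitive lookup by name
  mult[is.na(mult)] <- 0   # undocumented or empty exponents contribute nothing
  unname(dmg * mult)
}

# Toy values for illustration only, not records from the dataset.
toy_dmg <- damage_in_millions(c(25, 2.5, 1, 3, 7), c("K", "m", "B", "h", "?"))
print(toy_dmg)
```

With this form, a call such as damage_in_millions(noaadata$PROPDMG, noaadata$PROPDMGEXP) would give the property damage in millions, and changing how a symbol is treated only requires editing the lookup vector.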

Results

Which events are most harmful with respect to population health

The top 5 events which led to the greatest number of fatalities in the 1950-2011 time span are presented in Table 1.

library(knitr) # provides kable()
kable(topfat, caption= "Table 1- Top 5 weather events with greatest fatalities", row.names = F)
Table 1- Top 5 weather events with greatest fatalities

| Event          | Total Fatalities |
|:---------------|-----------------:|
| TORNADO        |             5633 |
| EXCESSIVE HEAT |             1903 |
| FLASH FLOOD    |              978 |
| HEAT           |              937 |
| LIGHTNING      |              816 |

The top 5 events which led to the greatest number of injuries in the 1950-2011 time span are presented in Table 2.

kable(topinj, caption= "Table 2- Top 5 weather events with greatest injuries", row.names = F)
Table 2- Top 5 weather events with greatest injuries

| Event          | Total Injuries |
|:---------------|---------------:|
| TORNADO        |          91346 |
| TSTM WIND      |           6957 |
| FLOOD          |           6789 |
| EXCESSIVE HEAT |           6525 |
| LIGHTNING      |           5230 |

The top 5 events which led to the greatest health impact (fatalities+injuries) in the 1950-2011 time span are presented in Table 3.

kable(tophealth, caption= "Table 3- Top 5 weather events with greatest health impact (fatalities + injuries)", row.names = F)
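Table 3- Top 5 weather events with greatest health impact (fatalities + injuries)

| Event          | Total Fatalities | Total Injuries | Total Health Impact (Fatalities + Injuries) | EVTYPE |
|:---------------|-----------------:|---------------:|--------------------------------------------:|:-------|
| TORNADO        |             5633 |          91346 |                                       96979 | T      |
| EXCESSIVE HEAT |             1903 |           6525 |                                        8428 | EH     |
| TSTM WIND      |              504 |           6957 |                                        7461 | TW     |
| FLOOD          |              470 |           6789 |                                        7259 | F      |
| LIGHTNING      |              816 |           5230 |                                        6046 | L      |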

Tornado ranks first for fatalities as well as injuries, and therefore also for the overall health impact. For injuries, the remaining top 5 events are thunderstorm winds, flood, excessive heat, and lightning, in decreasing order. For fatalities, the remaining top 5 events are excessive heat, flash flood, heat, and lightning.

The contribution of events to the total health impact (fatalities+injuries) is presented in Figure 1.

library(RColorBrewer) # provides brewer.pal()
pie(Phealth$PFATINJ,labels = lab1, col=brewer.pal(n=6,"Dark2"),
    main="Fig 1: Contribution of Events to Total Health Impact")

Tornado accounts for about 60% of all the fatalities and injuries combined, and the top 5 events together contribute ~80% of all the fatalities and injuries.

Which events have the greatest economic consequences

The top 5 events which caused the most economic damage in the 1950-2011 time span are presented in Figure 2.

library(ggpubr) # provides ggbarplot()
ggbarplot(topdmg, x = "EVTYPE", y = "TOTDMGBil",
   fill = "EVTYPE", color = "EVTYPE",
      label = FALSE,
   xlab= "Damaging Events",ylab="Cumulative Damage ($Billions)",
   main="Fig 2: Top 5 Events with the most Economic Impact in the US")

Flood causes the largest economic impact.

The contribution of the top 5 events to the total economic impact is presented in Figure 3.

pie(Pdmg$PDMG,labels = lab, col=brewer.pal(8,"Set3"),
    main="Fig 3: Contribution of Events to Total Economic Impact")

The top 5 events contribute almost 70% of all the damage, with the top event (flood) contributing approximately 32% of the total damage.

Conclusions and Limitations

The weather events with the most detrimental effects on public health are tornado, excessive heat, thunderstorm winds, flood and lightning. Together these events account for almost 80% of fatalities and injuries. The weather events with the most economic impact are flood, hurricane/typhoon, tornado, storm surge and hail, which together account for almost 70% of the economic damage.

There are certain limitations in the data analysis presented. There are 985 unique event types in the dataset, and some event types, though labelled with different names, could be describing the same event. For example, “TSTM WIND/HAIL”, “Thunderstorm Wind”, and “ TSTM WIND” could all be referring to the same event but are not treated as such in the analysis. As a result, each event is represented only by its most commonly used label, and its totals may be underestimated.
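
One way to reduce this limitation is to normalize the event labels before aggregating. The sketch below shows the idea only and was not applied to the results reported here. The helper normalize_evtype and its replacement rules are hypothetical assumptions, not an official NOAA mapping, and compound labels such as “TSTM WIND/HAIL” would still remain separate categories.

```r
# Hypothetical helper: collapse a few obvious spelling variants of event labels.
# The replacement rules are illustrative assumptions, not an official mapping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                                  # ignore case and surrounding spaces
  x <- gsub("\\s+", " ", x, perl = TRUE)                   # collapse repeated whitespace
  x <- gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)  # expand the common abbreviation
  x <- gsub("\\bWINDS\\b", "WIND", x, perl = TRUE)         # treat plural and singular alike
  x
}

# Example labels chosen for illustration.
labels <- c("TSTM WIND", " TSTM WIND", "Thunderstorm Wind", "THUNDERSTORM WINDS")
normalized <- unique(normalize_evtype(labels))
print(normalized)
```

Under these assumptions, the aggregation steps could be run on a normalized copy of EVTYPE, which would be expected to merge some of the variants that the current analysis counts separately.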