The data set contains two quantitative measures for each outcome: fatalities and injuries for health, and property and crop damage for economic impact. For each outcome, the two measures were summed by event type and merged into a new data set. The ten event types with the largest combined impact (fatalities plus injuries, or property plus crop damage) were plotted as bar graphs, with the fill distinguishing the two measures.
The raw data set was read into R using read.csv and cached for further use. No other processing was done.
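setwd("~/R Files")
# stringsAsFactors = TRUE (the default before R 4.0) keeps text columns as factors,
# which the levels() calls on PROPDMGEXP and CROPDMGEXP later rely on
stormdata <- read.csv("~/R Files/repdata%2Fdata%2FStormData.csv", header = TRUE, stringsAsFactors = TRUE)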
The two quantitative measures (Fatalities and Injuries) were each summed by event type and merged into a single data frame. The data frame was then ordered by the combined total of Fatalities and Injuries for each event type.
library(ggplot2)
library(gridExtra)
library(reshape2)
stormdata_health<-merge(aggregate(stormdata$FATALITIES, by=list(EVTYPE=stormdata$EVTYPE), FUN=sum), aggregate(stormdata$INJURIES, by=list(EVTYPE=stormdata$EVTYPE), FUN=sum), by="EVTYPE")
names(stormdata_health)<-c("EVTYPE", "FATALITIES", "INJURIES")
stormdata_health$EVTYPE<-as.character(stormdata_health$EVTYPE)
stormdata_health<-stormdata_health[order(stormdata_health$FATALITIES+stormdata_health$INJURIES, decreasing = TRUE),]
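As a compact illustration of the aggregation step, the sketch below applies the same sum-by-event-type logic to a small made-up data frame (the `toy` values are hypothetical and not taken from the storm data). It uses the formula interface of `aggregate`, which sums both columns in one call and so needs no separate merge; this is an alternative shown for illustration, not the code used in the analysis.

```r
# Toy data: hypothetical values, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum both measures per event type in a single call
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Order by combined impact, largest first
toy_health <- toy_health[order(toy_health$FATALITIES + toy_health$INJURIES,
                               decreasing = TRUE), ]
print(toy_health)
```

One difference to keep in mind: the formula interface drops rows with missing values by default, whereas the `by =` form used in the analysis does not.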
With the data aggregated by event type and ordered, the ten event types with the largest combined impact were saved as a list. The data frame was then melted into long format, so that Fatalities and Injuries appear as separate rows for each event type, and the list was used to subset the rows needed for the final graph.
event_list<-stormdata_health[1:10,1]
stormdata_health<-melt(stormdata_health, id.vars="EVTYPE", measure.vars=c("FATALITIES", "INJURIES"))
stormdata_health_top<-stormdata_health[which(stormdata_health$EVTYPE %in% event_list),]
##                EVTYPE   variable value
## 1             TORNADO FATALITIES  5633
## 2      EXCESSIVE HEAT FATALITIES  1903
## 3           TSTM WIND FATALITIES   504
## 4               FLOOD FATALITIES   470
## 5           LIGHTNING FATALITIES   816
## 6                HEAT FATALITIES   937
## 7         FLASH FLOOD FATALITIES   978
## 8           ICE STORM FATALITIES    89
## 9   THUNDERSTORM WIND FATALITIES   133
## 10       WINTER STORM FATALITIES   206
## 986           TORNADO   INJURIES 91346
## 987    EXCESSIVE HEAT   INJURIES  6525
## 988         TSTM WIND   INJURIES  6957
## 989             FLOOD   INJURIES  6789
## 990         LIGHTNING   INJURIES  5230
## 991              HEAT   INJURIES  2100
## 992       FLASH FLOOD   INJURIES  1777
## 993         ICE STORM   INJURIES  1975
## 994 THUNDERSTORM WIND   INJURIES  1488
## 995      WINTER STORM   INJURIES  1321
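To make the melt-and-subset step concrete, here is a minimal sketch on a made-up three-row table (the `wide` values are hypothetical). It assumes the `reshape2` package is installed, as in the analysis itself.

```r
library(reshape2)

# Toy wide table: hypothetical values, already ordered by combined impact
wide <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 4, 1),
  INJURIES   = c(15, 1, 2),
  stringsAsFactors = FALSE
)

# Keep the two event types with the largest combined impact
top_events <- wide$EVTYPE[1:2]

# Wide to long: one row per (event type, measure) pair
long <- melt(wide, id.vars = "EVTYPE", measure.vars = c("FATALITIES", "INJURIES"))

# Retain only the rows for the selected event types
long_top <- long[long$EVTYPE %in% top_events, ]
print(long_top)
```

Because `melt` stacks all rows for the first measure before all rows for the second, the surviving row names fall into two blocks. The same mechanism explains the row names 1 to 10 and 986 to 995 in the output above: the second block starts 985 rows after the first, one row per distinct event type.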
Using ggplot, the top ten results were plotted as a stacked bar graph, with the fill distinguishing counts of Injuries from counts of Fatalities.
g1<-ggplot(stormdata_health_top, aes(x=reorder(EVTYPE, -value), y=value, fill=variable))
g1+geom_bar(stat="identity")+theme(axis.text.x = element_text(size=8, angle=45))+xlab("Storm Event Type")+ylab("Combined Impact")+ggtitle("Health Outcomes from Storm Events (Top 10 Combined Impacts)")
The same methodology was used to answer this question, with one caveat: the two quantitative variables (CROPDMG and PROPDMG) first had to be multiplied by their corresponding orders of magnitude (stored in CROPDMGEXP and PROPDMGEXP, respectively) in order to measure the economic impact of each accurately.
expmatch_PROPDMG<-data.frame(EXP=c(levels(stormdata$PROPDMGEXP)), FACTOR=c(1,1,1,1,1,1e1,1e2,1e3,1e4,1e5,1e6,1e7,1e8,1e9,1e2,1e2,1e3,1e6,1e6))
stormdata$PROPDMG<-stormdata$PROPDMG*expmatch_PROPDMG[match(stormdata$PROPDMGEXP, expmatch_PROPDMG$EXP),2]
expmatch_CROPDMG<-data.frame(EXP=c(levels(stormdata$CROPDMGEXP)), FACTOR=c(1,1,1,1e2,1e9,1e3,1e3,1e6,1e6))
stormdata$CROPDMG<-stormdata$CROPDMG*expmatch_CROPDMG[match(stormdata$CROPDMGEXP, expmatch_CROPDMG$EXP),2]
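The lookup tables above pair each multiplier with a factor level by position, so they depend on the order in which `levels()` returns the codes. A minimal sketch of a more explicit alternative follows. The helper `exp_to_factor` is hypothetical and not part of the original analysis; it assumes the codes are the letters h, k, m and b (in either case), the digits 1 to 8 as powers of ten, and that every other code counts as a multiplier of 1, which matches the multipliers listed above.

```r
# Hypothetical helper (not part of the original analysis): convert an
# exponent code to a numeric multiplier without relying on factor level order
exp_to_factor <- function(code) {
  code <- toupper(as.character(code))
  letters_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- rep(1, length(code))            # default multiplier for all other codes
  is_letter <- code %in% names(letters_map)
  out[is_letter] <- letters_map[code[is_letter]]
  is_digit <- grepl("^[1-8]$", code)     # digits are read as powers of ten
  out[is_digit] <- 10^as.numeric(code[is_digit])
  out
}

# Toy check with made-up damage values and codes
dmg     <- c(2.5, 10, 3, 7, 1)
dmg_exp <- c("K", "m", "B", "2", "")
print(dmg * exp_to_factor(dmg_exp))
```

Applied to the storm data, the equivalent call would be `stormdata$PROPDMG * exp_to_factor(stormdata$PROPDMGEXP)`; it works whether the column is stored as a factor or as character.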
Once the damage values had been scaled by their corresponding orders of magnitude, the same aggregation, ordering, melting and subsetting steps were applied as for the health data.
stormdata_econ<-merge(aggregate(stormdata$PROPDMG, by=list(EVTYPE=stormdata$EVTYPE), FUN=sum), aggregate(stormdata$CROPDMG, by=list(EVTYPE=stormdata$EVTYPE), FUN=sum), by="EVTYPE")
names(stormdata_econ)<-c("EVTYPE", "PROPDMG", "CROPDMG")
stormdata_econ$EVTYPE<-as.character(stormdata_econ$EVTYPE)
stormdata_econ<-stormdata_econ[order(stormdata_econ$PROPDMG+stormdata_econ$CROPDMG, decreasing = TRUE),]
event_list2<-stormdata_econ[1:10,1]
stormdata_econ<-melt(stormdata_econ, id.vars="EVTYPE", measure.vars=c("PROPDMG", "CROPDMG"))
stormdata_econ_top<-stormdata_econ[which(stormdata_econ$EVTYPE %in% event_list2),]
##                EVTYPE variable        value
## 1               FLOOD  PROPDMG 144657709807
## 2   HURRICANE/TYPHOON  PROPDMG  69305840000
## 3             TORNADO  PROPDMG  56947380677
## 4         STORM SURGE  PROPDMG  43323536000
## 5                HAIL  PROPDMG  15735267513
## 6         FLASH FLOOD  PROPDMG  16822673979
## 7             DROUGHT  PROPDMG   1046106000
## 8           HURRICANE  PROPDMG  11868319010
## 9         RIVER FLOOD  PROPDMG   5118945500
## 10          ICE STORM  PROPDMG   3944927860
## 986             FLOOD  CROPDMG   5661968450
## 987 HURRICANE/TYPHOON  CROPDMG   2607872800
## 988           TORNADO  CROPDMG    414953270
## 989       STORM SURGE  CROPDMG         5000
## 990              HAIL  CROPDMG   3025954473
## 991       FLASH FLOOD  CROPDMG   1421317100
## 992           DROUGHT  CROPDMG  13972566000
## 993         HURRICANE  CROPDMG   2741910000
## 994       RIVER FLOOD  CROPDMG   5029459000
## 995         ICE STORM  CROPDMG   5022113500
The results were then plotted in a similar way.
g2<-ggplot(stormdata_econ_top, aes(x=reorder(EVTYPE, -value), y=value, fill=variable))
g2+geom_bar(stat="identity")+theme(axis.text.x = element_text(size=8, angle=45))+xlab("Storm Event Type")+ylab("Combined Impact")+ggtitle("Economic Outcomes from Storm Events (Top 10 Combined Impacts)")
In terms of both fatalities and injuries, Tornadoes were found to have the largest impact on health outcomes with 5,633 recorded fatalities and 91,346 injuries.
Regarding economic impact, Floods were found to have the greatest impact on property values, with $144,657,709,807 of damages reported over the course of this study, and Drought was found to have the greatest impact on crop values, with $13,972,566,000 of damages reported over the same period.
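Since the top-ten lists were ranked by combined impact, the combined totals for the leading event types can be cross-checked directly from the figures printed in the two tables. The short sketch below does that arithmetic; the variable names are illustrative only.

```r
# Figures copied from the printed tables
tornado_health <- c(FATALITIES = 5633, INJURIES = 91346)
flood_econ     <- c(PROPDMG = 144657709807, CROPDMG = 5661968450)
drought_econ   <- c(PROPDMG = 1046106000, CROPDMG = 13972566000)

# Combined impact is the sum of the two measures for each event type
print(sum(tornado_health))
print(format(sum(flood_econ), big.mark = ",", scientific = FALSE))
print(format(sum(drought_econ), big.mark = ",", scientific = FALSE))
```

The sums come to 96,979 combined fatalities and injuries for Tornadoes, $150,319,678,257 in combined damages for Floods, and $15,018,672,000 for Drought.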