As an exercise in reproducible research, this publication investigates the severity of natural disasters based on the harm to health and the economic damage they cause. The data for this exercise can be found at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 with the documentation at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.
The analysis considers injuries and deaths caused by weather events in the Health section, and crop and property damage in the Economy section.
The entire analysis was conducted in RStudio using R version 3.3.1 (2016-06-21) on platform x86_64-apple-darwin13.4.0.
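The version and platform line can be regenerated from inside R, which is one way to keep it in sync with the session that actually produced the results. A minimal sketch using only base R:

```r
# Print the R version and platform of the current session, in the same
# form as the environment line of this report
cat(R.version.string, "\n")
cat("Platform:", R.version$platform, "\n")
# sessionInfo() additionally lists the versions of attached packages such as ggplot2
```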
Downloading and loading the data and related libraries
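The download and loading code itself is not shown in this report. A minimal sketch of that step is below; the helper name `load_storm_data` and the local file name `StormData.csv.bz2` are hypothetical. It relies on the fact that `read.csv()` can read a bzip2-compressed file directly, because `file()` detects the compression when opening a file for reading. The plotting code further down also needs the ggplot2 and gridExtra packages (`grid.arrange()` comes from gridExtra).

```r
# Hypothetical helper: download the compressed CSV once, then read it into a data frame
load_storm_data <- function(
    url  = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
    dest = "StormData.csv.bz2") {
  if (!file.exists(dest)) {
    # binary mode keeps the archive intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  # read.csv() decompresses the .bz2 file transparently
  read.csv(dest, stringsAsFactors = FALSE)
}

# Usage:
# stormdata <- load_storm_data()
# library(ggplot2)    # ggplot(), geom_bar(), coord_flip(), ...
# library(gridExtra)  # grid.arrange()
```

Skipping the download when the file already exists keeps re-runs fast and avoids hitting the server every time the report is knitted.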
Data Conditioning
First, only the relevant columns of the data are kept. The “EXP” (exponent) column values are then applied to the crop and property damage columns so that all damage figures become sensible numeric amounts in dollars. According to the documentation, “K”, “M” and “B” stand for thousands, millions and billions respectively.
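As a small illustration of this rule on made-up values (not taken from the dataset):

```r
# Magnitude letters from the documentation: thousands, millions, billions
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Made-up damage values and their exponent codes; lower-case codes also occur in the data
damage <- c(25, 2.5, 1.5)
codes  <- c("K", "m", "B")

# Upper-case the codes, look up their multipliers and scale the values
usd <- unname(damage * multipliers[toupper(codes)])
print(usd)  # 25000, 2500000 and 1.5e9 dollars
```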
```r
trimmed <- stormdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# exponent codes are case-insensitive ("k" and "K" both mean thousands)
trimmed$CROPDMGEXP <- toupper(trimmed$CROPDMGEXP)
trimmed$PROPDMGEXP <- toupper(trimmed$PROPDMGEXP)
# multiplier per code: K/M/B, a single digit d as 10^d, anything else (blank, "?") as 1
exp_multiplier <- function(code) {
  mult <- unname(c(K = 1e3, M = 1e6, B = 1e9)[code])
  digit <- grepl("^[0-9]$", code)
  mult[digit] <- 10^as.numeric(code[digit])  # only digits are converted, so no NA warnings
  mult[is.na(mult)] <- 1
  mult
}
# drop rows with unrecognised property codes ("H", "+", "-", "?"); this also drops their health data
trimmed <- subset(trimmed, PROPDMGEXP %in% c("", "K", "M", "B", 0:8))
trimmed$cropnumeric <- trimmed$CROPDMG * exp_multiplier(trimmed$CROPDMGEXP)
trimmed$propnumeric <- trimmed$PROPDMG * exp_multiplier(trimmed$PROPDMGEXP)
```
Instead of the total numbers of injuries and deaths related to each weather event type, only the mean number per recorded event is reported. This also allows all of the data in the dataset to be used, since earlier years do not necessarily include records for all event types. Tropical Storm Gordon has been excluded from the figures because it is recorded as an event type of its own, even though it is a named storm rather than an event type per se.
```r
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = trimmed, FUN = mean)  # mean per recorded event
fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ]
injuries <- aggregate(INJURIES ~ EVTYPE, data = trimmed, FUN = mean)
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE), ]
# top 5 of each ranking, skipping the Tropical Storm Gordon row (row 3 for fatalities, row 2 for injuries)
plotfatal <- ggplot(data = fatalities[c(6, 5, 4, 2, 1), ], aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity") + coord_flip() + ggtitle("Mean Fatalities per Event") +
  theme(axis.title.y = element_blank())
plotinjure <- ggplot(data = injuries[c(6, 5, 4, 3, 1), ], aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity") + coord_flip() + ggtitle("Mean Injuries per Event") +
  theme(axis.title.y = element_blank())
grid.arrange(plotfatal, plotinjure, ncol = 2)  # side by side, from gridExtra
```
Top 5 most harmful weather events (mean per event)
As can be seen in the plots, heat waves appear to be the most dangerous events in terms of health risk. Bear in mind that these are mean values per event and do not take into account how frequently each event type occurs.
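The frequency caveat can be made concrete with a toy example: ranking by the mean per event and ranking by the total can single out different "worst" event types. The event names and numbers below are made up for illustration and are not from the dataset.

```r
# Hypothetical toy data: one rare, deadly record and five common, milder ones
events <- data.frame(
  EVTYPE     = c("RARE EVENT", rep("COMMON EVENT", 5)),
  FATALITIES = c(10, 3, 2, 4, 0, 5),
  stringsAsFactors = FALSE
)

# Per event type: number of records, mean per record and total
n_events  <- tapply(events$FATALITIES, events$EVTYPE, length)
mean_fat  <- tapply(events$FATALITIES, events$EVTYPE, mean)
total_fat <- tapply(events$FATALITIES, events$EVTYPE, sum)
summary_tbl <- data.frame(
  EVTYPE = names(n_events),
  n      = as.vector(n_events),
  mean   = as.vector(mean_fat),
  total  = as.vector(total_fat),
  stringsAsFactors = FALSE
)

# The two rankings disagree on which event type is "worst"
by_mean  <- summary_tbl[order(summary_tbl$mean,  decreasing = TRUE), ]
by_total <- summary_tbl[order(summary_tbl$total, decreasing = TRUE), ]
print(by_mean$EVTYPE[1])   # "RARE EVENT"   (mean 10 from a single record)
print(by_total$EVTYPE[1])  # "COMMON EVENT" (total 14 over five records, mean 2.8)
```

Reporting the record count `n` next to the mean is a cheap way to flag event types whose high average rests on only one or two records.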
A similar approach was taken to investigate the economic effects of weather events. As before, Hurricane Opal was removed from the plots due to misclassification.
```r
cropdmg <- aggregate(cropnumeric ~ EVTYPE, data = trimmed, FUN = mean)  # mean USD per recorded event
cropdmg <- cropdmg[order(cropdmg$cropnumeric, decreasing = TRUE), ]
propdmg <- aggregate(propnumeric ~ EVTYPE, data = trimmed, FUN = mean)
propdmg <- propdmg[order(propdmg$propnumeric, decreasing = TRUE), ]
# top 5 of each ranking; the property ranking skips the Hurricane Opal row (row 4)
plotcrop <- ggplot(data = cropdmg[c(5, 4, 3, 2, 1), ], aes(x = reorder(EVTYPE, cropnumeric), y = cropnumeric)) +
  geom_bar(stat = "identity") + coord_flip() + ggtitle("Mean Crop Damage per Event") +
  ylab("Damage in USD") + theme(axis.title.y = element_blank())
plotprop <- ggplot(data = propdmg[c(6, 5, 3, 2, 1), ], aes(x = reorder(EVTYPE, propnumeric), y = propnumeric)) +
  geom_bar(stat = "identity") + coord_flip() + ggtitle("Mean Property Damage per Event") +
  ylab("Damage in USD") + theme(axis.title.y = element_blank())
grid.arrange(plotprop, plotcrop, ncol = 2)  # side by side, from gridExtra
```
Top 5 most economically harmful weather events (mean damage per event, in USD)
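The misclassified rows (Tropical Storm Gordon, Hurricane Opal) are dropped by their position in the sorted tables, which silently picks the wrong rows if the data or the cleaning steps change. A sketch of an alternative that excludes event types by name is below; the helper `top_n_excluding` and the toy table are hypothetical.

```r
# Hypothetical helper: top-n rows of an aggregated table after dropping
# event types that should not be ranked (matched by name, ignoring case)
top_n_excluding <- function(df, value_col, n = 5, exclude = character(0)) {
  kept <- df[!(toupper(df$EVTYPE) %in% toupper(exclude)), ]
  kept <- kept[order(kept[[value_col]], decreasing = TRUE), ]
  head(kept, n)
}

# Toy table shaped like the output of aggregate(propnumeric ~ EVTYPE, ...)
toy <- data.frame(
  EVTYPE      = c("EVENT A", "EVENT B", "NAMED STORM X", "EVENT C"),
  propnumeric = c(5e8, 2e9, 3e9, 1e7),
  stringsAsFactors = FALSE
)
top <- top_n_excluding(toy, "propnumeric", n = 2, exclude = "Named Storm X")
print(top$EVTYPE)  # "EVENT B" "EVENT A"
```

Against the real tables this could be called as, say, `top_n_excluding(propdmg, "propnumeric", exclude = "HURRICANE OPAL")`, assuming that is how the event type is spelled in `EVTYPE`, and the result passed to `ggplot()` in place of the hard-coded row selection.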