Synopsis

In this very simple report we’ll look at some of the effects of weather events in the US form 1950 to 2011. Both on population health and also any economic repercussions. We’ll be analysing only the most detrimental events across the US and comparing them. We’ll be using data from NCDC DATA, who’s data they got from the National Weather service.

Documentation

data processing

Here we’ll download the data and a few dependent packages. The data is then shaped by pulling out the columns of interest, then grouping, summarising and ordering. Lets see how harmfull the weather has been over the last 61 years in the US

library(ggplot2)
library(dplyr)
library(reshape2)
#file url and download and unzip if necessary
fileUrl <-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("./repdata_data_StormData.csv.bz2")) {
        mydir<- paste0(getwd(),"/","repdata_data_StormData.csv.bz2")
        download.file(fileUrl, destfile = mydir)
}
#download data
storm_data<-read.csv("repdata_data_StormData.csv.bz2")

#filter data columns of interest, then group, summarise and order.
flt_data<-select(storm_data, "EVTYPE", "FATALITIES", "INJURIES","PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
event_grp<-group_by(flt_data, EVTYPE)
harm<-summarise(event_grp, fatalities = sum(FATALITIES), injuries = sum(INJURIES))
any_harm<-filter(harm, fatalities > 0 | injuries > 0)
most_harm<-arrange(any_harm, desc(fatalities))

Some results

Here We’ll take a look at total fatalities and injuries and the top contributers. we use melt and ggplot2 for plotting.

print(c(paste("Total fatalities from all sources were", sum(most_harm$fatalities), "and total injuries", sum(most_harm$injuries))))
## [1] "Total fatalities from all sources were 15145 and total injuries 140528"
df<-melt(most_harm[1:10,], id.vars="EVTYPE")
ggplot(df, aes(x=variable, y=value, fill=EVTYPE))+geom_histogram(stat = "identity", position = "dodge")+labs(x="", y = "Number of occurrences", title = "1950 to 2011 weather caused health issues")

As we can see, tornados far and away are responsible for more injuries and fatalities

library(xtable)
print(xtable(most_harm[1:10,]), type = "html")
EVTYPE fatalities injuries
1 TORNADO 5633.00 91346.00
2 EXCESSIVE HEAT 1903.00 6525.00
3 FLASH FLOOD 978.00 1777.00
4 HEAT 937.00 2100.00
5 LIGHTNING 816.00 5230.00
6 TSTM WIND 504.00 6957.00
7 FLOOD 470.00 6789.00
8 RIP CURRENT 368.00 232.00
9 HIGH WIND 248.00 1137.00
10 AVALANCHE 224.00 170.00

More results

Lets take a look at the weather that was most detrimental to property and crops.
To do this we’ll need to standardize the way we account for cost of damages. We do this by creating a new column that reflects the cost of damages taking the “damage key” into account.

#fix the weird way they wanted to keep track of damages by "mutating" the
#PROPDMG's columns to reflect the PROPDMGEXP value key... 
#ie "M" million, "B" billion etc. new row will correspond to num and letter in CROP/PROPDMG column.
dmgs <- mutate(flt_data, Property_Damages = ifelse(PROPDMGEXP == "B" | PROPDMGEXP == "b", PROPDMG*1000000000, ifelse(PROPDMGEXP == "M" | PROPDMGEXP == "m", PROPDMG*1000000, ifelse(PROPDMGEXP == "K" | PROPDMGEXP == "k", PROPDMG*1000,ifelse(PROPDMGEXP == "H" | PROPDMGEXP == "h", PROPDMG*100, PROPDMG)))))
#lets go ahead and do the same with CROPS, there are no hundreds (H's/h's)
dmgs <- mutate(dmgs, Crops_Damages = ifelse(CROPDMGEXP == "B" | CROPDMGEXP == "b", CROPDMG*1000000000, ifelse(CROPDMGEXP == "M" | CROPDMGEXP == "m", CROPDMG*1000000, ifelse(CROPDMGEXP == "K" | CROPDMGEXP == "k", CROPDMG*1000, CROPDMG))))

#now lets go ahead and group and order them by EVTYPE similar to what we did earlier
event_grp2<- group_by(dmgs, EVTYPE)
sum_dmgs<- summarise(event_grp2, Prop_Damage = sum(Property_Damages), Crop_Damage = sum(Crops_Damages))
any_dmgs<-filter(sum_dmgs, Prop_Damage > 0 | Crop_Damage > 0)
most_Property_dmgs<-arrange(any_dmgs, desc(Prop_Damage))
most_Ag_dmgs<-arrange(any_dmgs, desc(Crop_Damage))
#show totals
c(paste("Total Property damage from all sources was", round(sum(most_Property_dmgs$Prop_Damage)/1000000000, digits = 2),"billion dollars"),
paste("Total agricultural damages from all sources was", round(sum(most_Ag_dmgs$Crop_Damage)/1000000000, digits = 2),"billion dollars"))
## [1] "Total Property damage from all sources was 427.32 billion dollars"   
## [2] "Total agricultural damages from all sources was 49.1 billion dollars"
df1<-melt(most_Ag_dmgs[1:10,], id.vars="EVTYPE")
ggplot(df1, aes(x=variable, y=value/1000000000, fill=EVTYPE))+geom_histogram(stat = "identity", position = "dodge")+labs(x="Property and Crop Damage", y = "Billions of dollars", title = " 1950 to 2011 weather damage totals")

we can see here that flooding has had the greatest economic consequence overall, where as drought was the worst for crops.

Thanks for looking :)

this took me much longer than anticipated due to the subtle nuances of RMD knitting to html.