The following code reads in the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm data covering 1950 to 2011. The data are then cleaned. Values prior to 1993 are dropped because only a small number of event types were recorded in those years. Damage is then adjusted for inflation into 2014 dollars. The 25 worst event types in terms of total harm (injuries + fatalities) are displayed in tables and plots. Similarly, the 25 worst event types in terms of total damage (crop damage + property damage) are displayed in tables and plots.
The following reads in the data from the compressed csv file. The file must first be downloaded and saved to your working directory under the file name used below. It can be downloaded at this web address: Storm Data
library(lubridate)
library(ggplot2)
library(knitr)
data <- read.csv("repdata-data-StormData.csv.bz2")
The following sets the date format for the BGN_DATE field and calculates two new fields called harm (injuries + fatalities) and damage (crop damage + property damage).
data$BGN_DATE <- as.Date(data$BGN_DATE, "%m/%d/%Y")
data$harm <- data$FATALITIES + data$INJURIES
data$damage <- data$PROPDMG + data$CROPDMG
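The damage field above adds PROPDMG and CROPDMG exactly as recorded. The NOAA file also carries PROPDMGEXP and CROPDMGEXP columns holding magnitude codes such as K, M and B, which the code above does not apply. The following is only a sketch of how such codes could be applied, run on a made-up data frame. The helper name `exp_multiplier` is hypothetical, and treating every code other than K, M or B as a multiplier of 1 is an assumption of this sketch, not something taken from the analysis.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: only K, M and B (any case) are mapped; any other code counts as 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("", "k", "B"),
  stringsAsFactors = FALSE
)

# Scale each amount by its own multiplier before adding
toy$damage <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
```

Applying a step like this to the real data would change every damage total reported in this document, so it is offered as a possible refinement rather than a description of what was run.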
The following calculates the total damage and harm by year and event type in a new dataframe called summaryByYear.
summaryByYear <- aggregate(cbind(harm, damage) ~ EVTYPE + year(data$BGN_DATE),
                           data = data, sum)
names(summaryByYear) <- c("EVTYPE", "year", "harm", "nomdamage")
Prior to 1993 very few event types were recorded.
We will therefore discard data prior to 1993. This is Figure 1.
summaryByYear$counter <- 1
eventcount <- aggregate(counter ~ year, data = summaryByYear, sum)
plot(eventcount, ylab="# of Event Types")
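The number of event types per year can also be counted without the helper counter column. The following is a minimal sketch on a made-up data frame (the name `toy` and its values are illustrative only); it counts the distinct EVTYPE values within each year.

```r
# Made-up rows for illustration only
toy <- data.frame(
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  year   = c(1990, 1990, 1990, 1995, 1995, 1995),
  stringsAsFactors = FALSE
)

# Number of distinct event types recorded in each year
types_per_year <- tapply(toy$EVTYPE, toy$year, function(x) length(unique(x)))
types_per_year
```

Counting unique values gives the same answer whether or not the input has already been aggregated to one row per event type and year.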
Based on the plot of event types per year (Figure 1), the data are subset to years after 1992.
summaryByYear <- summaryByYear[summaryByYear$year>1992, ]
CPI numbers were obtained from here. These will be used to adjust for inflation.
cpi <- data.frame(
rbind( c(1993, 144.5)
,c(1994, 148.2)
,c(1995, 152.4)
,c(1996, 156.9)
,c(1997, 160.5)
,c(1998, 163.0)
,c(1999, 166.6)
,c(2000, 172.2)
,c(2001, 177.1)
,c(2002, 179.9)
,c(2003, 184.0)
,c(2004, 188.9)
,c(2005, 195.3)
,c(2006, 201.6)
,c(2007, 207.3)
,c(2008, 215.303)
,c(2009, 214.537)
,c(2010, 218.056)
,c(2011, 224.939)
,c(2014, 236.736)))
names(cpi) <- c("year", "cpi")
Adjust damage for inflation. Values will be adjusted into 2014 dollars.
summaryByYear$damage <- NA
for (i in 1993:2011) {
  oldcpi <- cpi[cpi$year==i, 2]
  realcpi <- cpi[cpi$year==2014, 2]
  summaryByYear[summaryByYear$year==i,]$damage <-
    (summaryByYear[summaryByYear$year==i,]$nomdamage / oldcpi) * realcpi
}
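The adjustment is real damage = nominal damage / CPI of the event year * CPI of 2014. For example, 100 dollars of 1993 damage becomes about 163.83 in 2014 dollars (100 / 144.5 * 236.736). The same calculation can be written without a loop; the following is a sketch on made-up damage rows, using three CPI values from the table above. The names `cpi_small` and `dmg` are illustrative only.

```r
# Three CPI values copied from the table above
cpi_small <- data.frame(year = c(1993, 2011, 2014),
                        cpi  = c(144.5, 224.939, 236.736))

# Made-up nominal damage rows for illustration only
dmg <- data.frame(year = c(1993, 2011), nomdamage = c(100, 100))

# Look up each row's CPI by year, then rescale to the 2014 price level
base_cpi <- cpi_small$cpi[cpi_small$year == 2014]
year_cpi <- cpi_small$cpi[match(dmg$year, cpi_small$year)]
dmg$damage <- dmg$nomdamage / year_cpi * base_cpi
dmg
```

Using `match()` returns NA for any year missing from the CPI table, which makes gaps visible instead of silently leaving old values in place.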
Total damage and harm over the period 1993 to 2011 are summed by event type in a new data frame called summaryByEventType.
summaryByEventType <- aggregate(cbind(harm, damage) ~ EVTYPE, data=summaryByYear, sum)
names(summaryByEventType) <- c("EVTYPE", "harm", "damage")
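The following creates 2 new dataframes:
Top25HarmByEventType, which holds the 25 most harmful event types sorted by total harm
Top25HarmByYear, which holds the yearly totals for those 25 event types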
Top25HarmByEventType <- head(summaryByEventType[order(-summaryByEventType$harm),],25)
Top25HarmByYear <- summaryByYear[summaryByYear$EVTYPE %in%
Top25HarmByEventType$EVTYPE,]
The following creates 2 new dataframes:
Top25DamageByEventType, which holds the 25 most damaging event types sorted by total damage
Top25DamageByYear, which holds the yearly totals for those 25 event types
Top25DamageByEventType <- head(summaryByEventType[order(-summaryByEventType$damage),],25)
Top25DamageByYear <- summaryByYear[summaryByYear$EVTYPE %in%
Top25DamageByEventType$EVTYPE,]
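The selection pattern used for both pairs of data frames is: sort the totals in descending order, keep the first rows, then keep only the yearly rows whose event type is in that short list. The following is a minimal sketch with made-up data and a top 2 instead of a top 25; all names and values are illustrative only.

```r
# Made-up totals by event type
totals <- data.frame(EVTYPE = c("A", "B", "C", "D"),
                     harm   = c(5, 50, 20, 1),
                     stringsAsFactors = FALSE)

# Made-up yearly rows for the same event types
yearly <- data.frame(EVTYPE = c("A", "B", "B", "C", "D"),
                     year   = c(2000, 2000, 2001, 2001, 2001),
                     harm   = c(5, 30, 20, 20, 1),
                     stringsAsFactors = FALSE)

# Sort descending by harm and keep the first 2 rows
top2 <- head(totals[order(-totals$harm), ], 2)

# Keep only the yearly rows that belong to those event types
top2_by_year <- yearly[yearly$EVTYPE %in% top2$EVTYPE, ]
```

Here `top2` contains B and C, and `top2_by_year` keeps the three yearly rows for those two types.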
The following displays a table for the 25 most harmful event types. This will be hidden to keep to the 3-figure rule.
kable(Top25HarmByEventType[,c(1,2)], row.names=FALSE)
The following displays a plot for the 25 most harmful event types. This is Figure 2.
p <- ggplot(Top25HarmByYear, aes(x=EVTYPE, y=harm))
p0 <- p + geom_point()
p0 <- p0 + geom_boxplot()
p1 <- p0 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p1 <- p1 + labs(x="Event Type", y="Total Harm")
p1
The following displays a table of the 25 most damaging event types in 2014 dollars.
This is Figure 3.
kable(Top25DamageByEventType[,c(1,3)], row.names=FALSE)
| EVTYPE | damage |
|---|---|
| FLASH FLOOD | 2054260.60 |
| TSTM WIND | 1957772.37 |
| TORNADO | 1920540.67 |
| HAIL | 1667983.62 |
| FLOOD | 1322883.59 |
| THUNDERSTORM WIND | 1036710.98 |
| LIGHTNING | 795616.37 |
| THUNDERSTORM WINDS | 742755.94 |
| HIGH WIND | 432793.19 |
| HEAVY SNOW | 172136.51 |
| WINTER STORM | 170384.90 |
| WILDFIRE | 101118.34 |
| HIGH WINDS | 92142.77 |
| ICE STORM | 89208.20 |
| HEAVY RAIN | 78331.63 |
| STRONG WIND | 75082.64 |
| TROPICAL STORM | 64725.62 |
| WILD/FOREST FIRE | 60627.64 |
| FLASH FLOODING | 53594.54 |
| DROUGHT | 47244.27 |
| URBAN/SML STREAM FLD | 41236.18 |
| BLIZZARD | 33105.57 |
| FLOOD/FLASH FLOOD | 32659.25 |
| HURRICANE | 28827.10 |
| STORM SURGE | 26786.73 |
The following displays a plot of the 25 most damaging event types in 2014 dollars. This will be hidden to keep to the 3-figure rule.
p <- ggplot(Top25DamageByYear, aes(x=EVTYPE, y=damage))
p0 <- p + geom_point()
p0 <- p0 + geom_boxplot()
p1 <- p0 + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p1 <- p1 + labs(x="Event Type", y="Total Damage")
p1
Note that some of the event types are recorded differently but in fact appear to be the same thing (e.g., “THUNDERSTORM WIND” and “THUNDERSTORM WINDS”). It would be advisable to clean those values and redo the analysis. This was not done due to time constraints created in large part (ironically enough) by copious amounts of New England snow.
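A first pass at that cleaning could look like the sketch below. It was not part of this analysis. The helper name `normalise_evtype` is hypothetical, and the two rewrite rules (expanding a leading TSTM to THUNDERSTORM and dropping a trailing plural on WINDS) are assumptions about which labels mean the same thing; a real cleanup would need to be checked against the NOAA event definitions.

```r
# Hypothetical helper: bring event type labels to one spelling.
normalise_evtype <- function(x) {
  x <- toupper(as.character(x))
  x <- gsub("^[[:space:]]+|[[:space:]]+$", "", x)  # trim outer whitespace
  x <- gsub("[[:space:]]+", " ", x)                # collapse repeated spaces
  x <- gsub("^TSTM", "THUNDERSTORM", x)            # assumed abbreviation
  x <- gsub("WINDS$", "WIND", x)                   # assumed plural variant
  x
}

# Made-up input showing the kinds of variants seen in the table above
raw <- c("TSTM WIND", "THUNDERSTORM WIND", " Thunderstorm Winds",
         "HIGH WINDS", "HAIL")
table(normalise_evtype(raw))
```

Running the aggregation on the normalised labels would merge rows that are currently ranked separately, so the top 25 lists would likely change.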
datetime <- now("UTC")
Code run on: 2015-02-22 18:30:36 UTC
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: i386-w64-mingw32/i386 (32-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.9 ggplot2_1.0.0 lubridate_1.3.3
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.2-4 digest_0.6.8 evaluate_0.5.5 formatR_1.0
## [5] grid_3.1.2 gtable_0.1.2 htmltools_0.2.6 labeling_0.3
## [9] MASS_7.3-39 memoise_0.2.1 munsell_0.4.2 plyr_1.8.1
## [13] proto_0.3-10 Rcpp_0.11.4 reshape2_1.4.1 rmarkdown_0.5.1
## [17] scales_0.2.4 stringr_0.6.2 tools_3.1.2 yaml_2.1.13