Weather-related events can cause significant losses in terms of lives, health, and the economy. Knowing which events cause the most damage can help guide the allocation of scarce resources to where they can be of most help. Here, I use data collected by the National Oceanic & Atmospheric Administration, which tracks fatalities, injuries, and monetary damages on a per-event basis, to discover which weather events are the major offenders.
My analysis finds that tornadoes cause by far the most injuries and the second-highest number of fatalities among weather events, and are one of the two leading causes of monetary damage. Heat waves cause the most fatalities and are a distant second in injuries. Floods cause the most damage to property and crops.
With this knowledge, policymakers must choose which events to manage most heavily based on potential damages, their capacity to handle rare, large events versus frequent, smaller ones, and the resources available.
Loading all the necessary libraries for the analysis:
library(data.table)
library(plyr)
library(lubridate)
library(ggplot2)
The NOAA data, covering the years 1950 to 2011, can be downloaded from the URL below. We do that in a cached operation, reading it into the ‘stormData’ variable, which holds 902,297 observations of 37 variables.
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, "StormData.csv.bz2", quiet=TRUE, method="curl")
stormData <- data.table(read.csv("StormData.csv.bz2"))
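How the operation is cached is not shown in the code. One simple way to avoid repeated downloads, sketched here with a hypothetical helper named `fetch_once` that is not part of the original analysis, is to skip the download whenever the file already exists locally.

```r
# Hypothetical helper (an assumption, not the report's own caching mechanism):
# download `url` to `dest` only when `dest` is not already on disk,
# then return the local path either way.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, dest, quiet = TRUE, mode = "wb")
  }
  dest
}

# Usage with the names from the text (commented out to avoid a large download):
# path <- fetch_once(url, "StormData.csv.bz2")
# stormData <- data.table(read.csv(path))
```

This only guards the download step; re-reading the compressed CSV on every run would still need a separate cache.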
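Then we run some cleanup operations: we parse the event start dates, keep only events from 1982 onward, convert event types to upper case, and merge variant spellings of the major event types under a single label.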
stormClean <- stormData
# Parse the start date, then drop everything before 1982.
stormClean$BGN_DATE <- parse_date_time(stormClean$BGN_DATE, orders="%m/%d/%Y %H:%M:%S")
stormClean <- stormClean[year(stormClean$BGN_DATE) > 1981,]
# Upper-case the labels so the patterns below match regardless of case.
stormClean$EVTYPE <- toupper(stormClean$EVTYPE)
# Collapse every label matching a pattern into one canonical event type.
stormClean[grepl("^HURRI",stormClean$EVTYPE),]$EVTYPE <- "HURRICANE"
stormClean[grepl("TORN",stormClean$EVTYPE),]$EVTYPE <- "TORNADO"
stormClean[grepl("HEAT",stormClean$EVTYPE),]$EVTYPE <- "HEAT"
stormClean[grepl("WARM",stormClean$EVTYPE),]$EVTYPE <- "HEAT"
stormClean[grepl("TSTM[ ]?W",stormClean$EVTYPE),]$EVTYPE <- "TSTM WIND"
stormClean[grepl("FLOOD",stormClean$EVTYPE),]$EVTYPE <- "FLOOD"
## Template for checking types:
## unique(stormClean[grepl("HURR",stormClean$EVTYPE),EVTYPE])
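To see what the pattern-based relabeling does without loading the full dataset, the same rules can be applied to a short character vector. This is a minimal sketch: the function name `normalize_evtype` and the sample labels are made up for illustration, while the patterns are the ones used above.

```r
# Hypothetical helper that applies the same relabeling rules to a plain vector.
normalize_evtype <- function(x) {
  x <- toupper(x)
  x[grepl("^HURRI", x)]    <- "HURRICANE"
  x[grepl("TORN", x)]      <- "TORNADO"
  x[grepl("HEAT", x)]      <- "HEAT"
  x[grepl("WARM", x)]      <- "HEAT"
  x[grepl("TSTM[ ]?W", x)] <- "TSTM WIND"
  x[grepl("FLOOD", x)]     <- "FLOOD"
  x
}

# Illustrative labels (invented for this example, not taken from the data).
sample_types <- c("Hurricane Opal", "TORNADO F2", "Excessive Heat",
                  "TSTM WIND/HAIL", "Flash Flood", "HAIL")
normalize_evtype(sample_types)
```

Labels that match none of the patterns, such as "HAIL" here, pass through unchanged apart from the upper-casing.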
Taking this data, we extract a few summary statistics about the various event types. First, we find total, mean, and median fatalities, injuries, and monetary damage (as a broad first look, we combine the values of property and crop damages).
byEventFat <- stormClean[, list(numEvents = .N,
totalFat = sum(FATALITIES),
meanFat = mean(FATALITIES),
medianFat = median(FATALITIES)), by = EVTYPE]
byEventInj <- stormClean[, list(numEvents = .N,
totalInj = sum(INJURIES),
meanInj = mean(INJURIES),
medianInj = median(INJURIES)), by = EVTYPE]
byEventDmg <- stormClean[, list(numEvents = .N,
totalDmg = sum(PROPDMG + CROPDMG),
meanDmg = mean(PROPDMG + CROPDMG),
medianDmg = median(PROPDMG + CROPDMG)), by = EVTYPE]
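The damage summaries above add the raw `PROPDMG` and `CROPDMG` figures. The dataset also has `PROPDMGEXP` and `CROPDMGEXP` columns that encode a magnitude for each figure. A possible refinement, which is not what this analysis does, is sketched below on made-up data. It assumes that the codes K, M, and B stand for thousands, millions, and billions and treats every other code as a multiplier of 1, which is a simplification; `damage_multiplier` is a hypothetical helper.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: K = 1e3, M = 1e6, B = 1e9; any other code is treated as 1.
damage_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows for illustration only.
toy <- data.frame(PROPDMG = c(25, 2.5, 1),
                  PROPDMGEXP = c("K", "M", ""),
                  stringsAsFactors = FALSE)
toy$propDollars <- toy$PROPDMG * damage_multiplier(toy$PROPDMGEXP)
toy
```

Scaling the figures this way before summing could change the ranking of event types, so the totals reported here are best read as sums of the raw recorded figures.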
Next, we find the number of events with fatalities, injuries, and property or crop damage, and merge that information into our prior tables.
fatRows <- stormClean[FATALITIES > 0, list(withFat = .N), by = EVTYPE]
injRows <- stormClean[INJURIES > 0, list(withInj = .N), by = EVTYPE]
dmgRows <- stormClean[PROPDMG > 0 | CROPDMG > 0, list(withDmg = .N), by = EVTYPE]
byEventFat <- merge(byEventFat, fatRows, by="EVTYPE", all=TRUE)
byEventInj <- merge(byEventInj, injRows, by="EVTYPE", all=TRUE)
byEventDmg <- merge(byEventDmg, dmgRows, by="EVTYPE", all=TRUE)
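Because the merges use `all=TRUE`, an event type with no qualifying rows has no match in the count table, so its count comes back as `NA` rather than 0. The base-R sketch below shows the effect on made-up data, together with one way to fill the gaps; the data frames and values are invented for illustration.

```r
# Made-up summary table and count table; "HAIL" has no fatal events.
totals <- data.frame(EVTYPE = c("FLOOD", "HAIL", "HEAT"),
                     numEvents = c(3, 2, 4),
                     stringsAsFactors = FALSE)
counts <- data.frame(EVTYPE = c("FLOOD", "HEAT"),
                     withFat = c(1, 2),
                     stringsAsFactors = FALSE)

# Full outer join: HAIL is kept, but its withFat value is NA.
merged <- merge(totals, counts, by = "EVTYPE", all = TRUE)

# Replace the missing counts with zero so later arithmetic does not yield NA.
merged$withFat[is.na(merged$withFat)] <- 0
merged
```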
In order to eliminate outliers caused by miscategorization, we limit our analysis to event types with more than five instances.
First, although hurricanes cause significant loss of life, the geographic spread of heat waves and the frequency of tornadoes make them the largest weather-related killers.
fatTop <- head(arrange(byEventFat[numEvents > 5,], totalFat, decreasing=TRUE), 10)
fatTop$EVTYPE <- with(fatTop, factor(EVTYPE, EVTYPE))
ggplot(fatTop, aes(x=EVTYPE, y=totalFat)) + geom_bar(stat="identity")
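The `factor(EVTYPE, EVTYPE)` call sets the factor levels to the current row order, which is what keeps the bars sorted by total instead of alphabetically. A base-R sketch with made-up numbers shows the difference in level order:

```r
# Made-up totals, sorted from largest to smallest.
toy <- data.frame(EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
                  total = c(10, 30, 20),
                  stringsAsFactors = FALSE)
toy <- toy[order(toy$total, decreasing = TRUE), ]

# Default levels are alphabetical and ignore the row order.
default_levels <- levels(factor(toy$EVTYPE))

# Passing the sorted values as the levels preserves the row order.
ordered_levels <- levels(factor(toy$EVTYPE, toy$EVTYPE))

default_levels
ordered_levels
```

This works only while each event type appears once in the table, since factor levels must be unique.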
As for injuries, tornadoes are once again a principal cause. In fact, they injure far more people than any other type of weather event.
injTop <- head(arrange(byEventInj[numEvents > 5,], totalInj, decreasing=TRUE), 10)
injTop$EVTYPE <- with(injTop, factor(EVTYPE, EVTYPE))
ggplot(injTop, aes(x=EVTYPE, y=totalInj)) + geom_bar(stat="identity")
Similarly, the frequency of flooding and tornadoes means that they hold the lead in terms of total recorded damage to property and crops.
dmgTop <- head(arrange(byEventDmg[numEvents > 5,], totalDmg, decreasing=TRUE), 10)
dmgTop$EVTYPE <- with(dmgTop, factor(EVTYPE, EVTYPE))
ggplot(dmgTop, aes(x=EVTYPE, y=totalDmg)) + geom_bar(stat="identity")
Compare these results to the large but infrequent fatalities, injuries, and monetary damages caused by hurricane events:
compEvents <- c("HURRICANE", "TORNADO", "FLOOD", "HEAT")
byEventFat[EVTYPE %in% compEvents,]
##       EVTYPE numEvents totalFat    meanFat medianFat withFat
## 1: HURRICANE       288      135 0.46875000         0      48
## 2:   TORNADO     37020     2250 0.06077796         0     745
## 3:     FLOOD     82731     1525 0.01843324         0     986
## 4:      HEAT      2975     3178 1.06823529         0     798
byEventInj[EVTYPE %in% compEvents,]
##       EVTYPE numEvents totalInj   meanInj medianInj withInj
## 1: HURRICANE       288     1328 4.6111111         0      30
## 2:   TORNADO     37020    36077 0.9745273         0    3474
## 3:     FLOOD     82731     8604 0.1039997         0     558
## 4:      HEAT      2975     9243 3.1068908         0     233
byEventDmg[EVTYPE %in% compEvents,]
##       EVTYPE numEvents   totalDmg    meanDmg medianDmg withDmg
## 1: HURRICANE       288   34570.04 120.034861    17.565     213
## 2:   TORNADO     37020 2125949.79  57.427061     2.500   21532
## 3:     FLOOD     82731 2800638.24  33.852344     0.000   32037
## 4:      HEAT      2975    4716.04   1.585224     0.000      66
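One way to make the "large but infrequent" comparison concrete is to compute, from the fatality table above, the share of recorded events that involved at least one death. The sketch below retypes the printed figures into a small data frame; the column names `shareWithFat` and `fatPerFatalEvent` are introduced here for illustration.

```r
# Figures copied from the fatality table printed above.
fatSummary <- data.frame(
  EVTYPE    = c("HURRICANE", "TORNADO", "FLOOD", "HEAT"),
  numEvents = c(288, 37020, 82731, 2975),
  totalFat  = c(135, 2250, 1525, 3178),
  withFat   = c(48, 745, 986, 798),
  stringsAsFactors = FALSE
)

# Share of events with at least one fatality.
fatSummary$shareWithFat <- fatSummary$withFat / fatSummary$numEvents

# Average fatalities among the events that had any.
fatSummary$fatPerFatalEvent <- fatSummary$totalFat / fatSummary$withFat

fatSummary[order(fatSummary$shareWithFat, decreasing = TRUE), ]
```

By this measure, exactly one in six hurricane records (48 of 288) involved a fatality, compared with about 2% of tornado records, even though tornadoes account for far more deaths in total.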
sessionInfo()
## R version 3.1.3 (2015-03-09)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.3 (Yosemite)
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_1.0.1 lubridate_1.3.3 plyr_1.8.1 data.table_1.9.4
##
## loaded via a namespace (and not attached):
## [1] chron_2.3-45 colorspace_1.2-6 digest_0.6.8 evaluate_0.5.5
## [5] formatR_1.0 grid_3.1.3 gtable_0.1.2 htmltools_0.2.6
## [9] knitr_1.9 labeling_0.3 MASS_7.3-39 memoise_0.2.1
## [13] munsell_0.4.2 proto_0.3-10 Rcpp_0.11.5 reshape2_1.4.1
## [17] rmarkdown_0.5.1 scales_0.2.4 stringr_0.6.2 tools_3.1.3
## [21] yaml_2.1.13