We set out to answer two questions about severe weather in the United States. First, which types of severe weather are most harmful to the health of the country's population? The data strongly suggests that tornadoes have caused significantly more deaths and injuries than any other type of severe weather event. Second, which types of severe weather are most harmful to the country economically? The data suggests that floods do more damage than any other type of severe weather by a wide margin.
We began by downloading the bzipped data file, unzipping it, and reading the resulting csv into a data frame:
library(R.utils)     # provides bunzip2()
library(data.table)
fileurl <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
if (!file.exists('data.csv')) {
  download.file(fileurl, destfile = 'data.csv.bz2', method = 'curl')
  bunzip2('data.csv.bz2', 'data.csv')
}
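# `wc -l` prints "<count> data.csv"; keep only the count so nrows is numeric.
# If quoted fields contain line breaks, this is an upper bound on the record count.
nrows <- as.integer(sub(' .*', '', trimws(system('wc -l data.csv', intern = TRUE))))
# stringsAsFactors = TRUE keeps the factor columns used below (the default changed in R 4.0)
data <- read.table('data.csv', nrows = nrows, header = TRUE, sep = ',', quote = '"',
                   stringsAsFactors = TRUE)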
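As an aside, base R can read bzip2-compressed files directly, so the explicit unzip and line count are optional. The following is a minimal sketch of that alternative, not the approach used in this report; it writes a small temporary file with hypothetical toy contents to stand in for the download.

```r
# Toy stand-in for the downloaded archive (hypothetical contents, not storm data)
toy <- data.frame(EVTYPE = c('TORNADO', 'FLOOD'), FATALITIES = c(3, 1))
path <- tempfile(fileext = '.csv.bz2')

# Write the toy data through a bzip2 connection
con <- bzfile(path, open = 'w')
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the compression when reading
toy_read <- read.csv(path, stringsAsFactors = FALSE)
```

Under the same assumption, `read.csv('data.csv.bz2')` would read the storm data without creating `data.csv` first.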
With this data, we set about answering the first question. From the National Weather Service Instruction book describing the data (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) and an inspection of the data, we identified FATALITIES and INJURIES as the recorded quantities which represented the impact of severe weather on health and the population.
colnames(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
For fatalities and injuries, we summed the totals for each severe weather event type (EVTYPE). We filtered out event types with no fatalities or injuries and computed quantiles of the remaining totals to find a reasonable cutoff. The event types at or above the 95th percentile are the ones plotted in the results further below:
xtfat <- xtabs(FATALITIES ~ EVTYPE, data = data)
xtinj <- xtabs(INJURIES ~ EVTYPE, data = data)
qfat <- quantile(xtfat[xtfat > 0], probs = c(0,.25,.5,.75,.9,.95,1))
qinj <- quantile(xtinj[xtinj > 0], probs = c(0,.25,.5,.75,.9,.95,1))
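To make the aggregation and cutoff concrete, here is a small sketch on hypothetical toy records (the values are invented for illustration and are not from the storm data). It follows the same steps: sum by event type, drop zero totals, and keep the types at or above the 95th percentile.

```r
# Hypothetical toy records, not taken from the storm data
toy <- data.frame(
  EVTYPE     = c('TORNADO', 'TORNADO', 'HEAT', 'FLOOD', 'HAIL'),
  FATALITIES = c(5, 7, 4, 1, 0)
)

# Sum fatalities within each event type
toy_fat <- xtabs(FATALITIES ~ EVTYPE, data = toy)

# Drop event types with no fatalities, then compute the 95th percentile
toy_pos <- toy_fat[toy_fat > 0]
cutoff  <- unname(quantile(toy_pos, probs = 0.95))

# Event types at or above the cutoff, largest first
toy_top <- sort(toy_pos[toy_pos >= cutoff], decreasing = TRUE)
```

With only three non-zero event types, the cutoff is interpolated between the two largest totals, so only the largest type survives. On the full data there are many more event types, and the top 5% contains several.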
To answer the second question, we looked at property damage (PROPDMG) and crop damage (CROPDMG). As described in the NWS document, these values are qualified with units of billions, millions, thousands, or hundreds, recorded as unit codes in the adjacent columns PROPDMGEXP and CROPDMGEXP, respectively. Conditioned on the code in the 'EXP' column, we converted the values in the 'DMG' columns to common dollar units. Some values in the 'EXP' columns did not fit the documented scheme: in PROPDMGEXP, the symbols and digits (-, ?, +, and 0 through 8) account for 314 of the 902,297 records, as the summary below shows. After looking at some summary data about them, we omitted them from our aggregate data; the conversion below maps them, along with records whose code is empty, to NA. Omitting the unrecognized codes could introduce some bias, but with so few records affected this did not seem likely. As with the data for the first question, we aggregated the dollars in damage for each event type, filtered out event types with no damage reported, and generated quantiles to identify the event types with the largest totals.
summary(data$PROPDMGEXP)
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
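One way to look at summary data for the unrecognized codes is to count them and total the raw damage values attached to each. The sketch below is a hedged illustration on hypothetical toy values, not the check actually run for this report; the names `recognized`, `odd`, and `share_unrecognized` are introduced here for illustration only.

```r
# Hypothetical toy records, not taken from the storm data
toy <- data.frame(
  PROPDMG    = c(25, 10, 5, 2, 0, 3),
  PROPDMGEXP = c('K', 'M', '0', '+', '', 'k'),
  stringsAsFactors = FALSE
)

# Codes that fit the documented scheme, ignoring case
recognized <- toupper(toy$PROPDMGEXP) %in% c('B', 'M', 'K', 'H')

# Records whose code is empty or unrecognized
odd <- toy[!recognized, ]

# Count of records and total raw (unscaled) damage for each such code
odd_counts <- table(odd$PROPDMGEXP)
odd_raw    <- tapply(odd$PROPDMG, odd$PROPDMGEXP, sum)

# Fraction of all records that would be mapped to NA
share_unrecognized <- mean(!recognized)
```

If the raw totals under the unrecognized codes are small relative to the recognized ones, dropping them is unlikely to change the ranking of event types.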
adjust_damage_dollars <- function(dmg_vect, dmgexp_vect) {
  # Convert one record to dollars; codes outside B/M/K/H (any case) yield NA
  adjust <- function(i) {
    dmg <- dmg_vect[[i]]
    dmgexp <- dmgexp_vect[[i]]
    if (dmgexp %in% c('B', 'b')) {
      dmg * 1000000000
    } else if (dmgexp %in% c('M', 'm')) {
      dmg * 1000000
    } else if (dmgexp %in% c('K', 'k')) {
      dmg * 1000
    } else if (dmgexp %in% c('h', 'H')) {
      dmg * 100
    } else {
      NA
    }
  }
  sapply(seq_along(dmg_vect), adjust)
}
property_damage_dollars <- adjust_damage_dollars(data$PROPDMG, data$PROPDMGEXP)
crop_damage_dollars <- adjust_damage_dollars(data$CROPDMG, data$CROPDMGEXP)
datadmg <- data.frame(evtype = data$EVTYPE, propdmg = property_damage_dollars,
cropdmg = crop_damage_dollars)
xtpdmg <- xtabs(propdmg ~ evtype, data = datadmg)
xtcdmg <- xtabs(cropdmg ~ evtype, data = datadmg)
qpdmg <- quantile(xtpdmg[xtpdmg > 0], probs = c(0,.25,.5,.75,.9,.95,1))
qcdmg <- quantile(xtcdmg[xtcdmg > 0], probs = c(0,.25,.5,.75,.9,.95,1))
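The record-by-record conversion can be slow on roughly 900,000 rows. A possible alternative is a vectorized lookup of the multiplier for each unit code. This is a sketch under the same rules as the function above (B, M, K, H in either case; everything else becomes NA); the name `adjust_damage_vectorized` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper, shown as an alternative to the element-wise conversion
adjust_damage_vectorized <- function(dmg_vect, dmgexp_vect) {
  # Dollar multiplier for each recognized unit code
  multiplier <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2)
  # Normalize case; as.character() also handles factor columns
  codes <- toupper(as.character(dmgexp_vect))
  # Codes with no matching name (empty, digits, symbols) index to NA
  dmg_vect * unname(multiplier[codes])
}

# Toy inputs covering each recognized code plus one unrecognized code
toy_dollars <- adjust_damage_vectorized(
  c(2.5, 10, 3, 7, 1),
  c('B', 'm', 'K', 'h', '?')
)
```

The lookup relies on R returning NA when a character subscript matches no name, which reproduces the NA branch of the original function.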
To answer the first question, we plotted the event types leading to the highest numbers of fatalities and injuries.
par(mfrow = c(1,2), cex = 0.7, mar = c(9,5,4,1))
maxinj <- xtinj[xtinj >= qinj[['95%']]]
barplot(maxinj[order(maxinj, decreasing = T)], las = 3, ylab = '# Injuries',
main = 'Injuries Caused by Event Type')
maxfat <- xtfat[xtfat >= qfat[['95%']]]
barplot(maxfat[order(maxfat, decreasing = T)], las = 3, ylab = '# Fatalities',
main = 'Fatalities Caused by Event Type')
By a wide margin, tornadoes have caused more injuries and fatalities than any other type of severe weather. Severe heat has also led to a large number of injuries and fatalities.
And to answer the second question, we plotted the event types leading to the highest dollar value lost to property damage and crop damage.
par(mfrow = c(1,2), cex = 0.6, mar = c(11,5,3,1))
maxpdmg <- xtpdmg[xtpdmg >= qpdmg[['95%']]]
barplot(maxpdmg[order(maxpdmg, decreasing = T)], las = 3,
ylab = '$ Property Damage', main = 'Property Damage by Event Type')
maxcdmg <- xtcdmg[xtcdmg >= qcdmg[['95%']]]
barplot(maxcdmg[order(maxcdmg, decreasing = T)], las = 3,
ylab = '$ Crop Damage', main = 'Crop Damage by Event Type')
This result is also striking. Floods have caused nearly twice as much property damage as the next most damaging event type. Hurricane/typhoon, tornado, and storm surge events also account for large amounts of property damage, but far less than floods. Also notable is the order-of-magnitude difference between the dollars lost to property damage and to crop damage. While drought is the most damaging event type for crops, its aggregate crop damage is less than the property damage caused by any of the aforementioned event types, as well as a few others.
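Comparisons such as "nearly twice as much" can be checked numerically instead of being read off the bar heights. The sketch below uses hypothetical totals (invented values, not results from the storm data) to show one way to rank event types, compare the top two, and combine property and crop damage per event type.

```r
# Hypothetical totals in arbitrary dollar units; illustrative values only
toy_pdmg <- c(FLOOD = 140, HURRICANE = 70, TORNADO = 55, HAIL = 15)
toy_cdmg <- c(DROUGHT = 14, FLOOD = 6, HAIL = 3)

# Rank event types by property damage, largest first
ranked <- sort(toy_pdmg, decreasing = TRUE)

# Ratio of the largest total to the runner-up
top_ratio <- ranked[[1]] / ranked[[2]]

# Combined property and crop damage per event type;
# a type missing from one vector contributes 0 from that vector
types <- union(names(toy_pdmg), names(toy_cdmg))
combined <- sapply(types, function(t) {
  sum(toy_pdmg[t], toy_cdmg[t], na.rm = TRUE)
})
```

With the real data, the same calls could be applied to `xtpdmg` and `xtcdmg` after converting them to named vectors.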