The following analysis uses National Weather Service storm data from 2002 to 2011 to determine which types of storm events have been most damaging to people and property. With respect to people, tornadoes have caused the most fatalities and the most injuries of any storm type. Among the ten deadliest storm types, tornadoes also have the highest average fatalities per fatal event, though other storm types such as hurricanes/typhoons cause more injuries per event on average (while occurring less often). With respect to property, water-based events, including floods, hurricanes/typhoons, and storm surges, cause the most damage. Floods caused over $133 billion in property damage in the ten years considered, nearly double the second most damaging storm type, hurricane/typhoon, which caused almost $70 billion. With respect to crops, drought is the most damaging, having caused about $5.4 billion in damage, followed by water-based storms, including floods and hurricanes/typhoons. In sum, tornadoes are most dangerous to personal safety, whereas drought and water-based storms are most damaging to property and crops.
First, we load the data, pulling only those columns we’ll need for our analysis, including the date on which the storm event began, the type of event, data on fatalities and injuries, and data on property and crop damage.
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
theURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
z <- tempfile()
download.file(theURL, z)
storm <- read.csv(z)[, c(2, 8, 23:28)]
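Selecting columns by position works only as long as the file's column order stays the same. A minimal sketch of selecting by name instead, run on a small made-up data frame rather than the real download. The names `PROPDMGEXP` and `CROPDMGEXP` do not appear elsewhere in this report and are assumed here to be the unit-code columns; `select_storm_columns` is a hypothetical helper.

```r
# Columns the analysis needs. BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG
# and CROPDMG are used later in this report; PROPDMGEXP and CROPDMGEXP are
# assumed names for the unit-code columns.
needed <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Hypothetical helper: keep only the needed columns, failing loudly if any is absent.
select_storm_columns <- function(df, cols = needed) {
  absent <- setdiff(cols, names(df))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

# Toy stand-in for the raw file, with one extra column that should be dropped.
toy_raw <- data.frame(
  STATE = c("AL", "TX"),
  BGN_DATE = c("4/27/2011 0:00:00", "6/1/2003 0:00:00"),
  EVTYPE = c("TORNADO", "HAIL"),
  FATALITIES = c(3, 0), INJURIES = c(10, 0),
  PROPDMG = c(2.5, 10), PROPDMGEXP = c("M", "K"),
  CROPDMG = c(0, 5), CROPDMGEXP = c("K", "K"),
  stringsAsFactors = FALSE
)
toy_storm <- select_storm_columns(toy_raw)
names(toy_storm)
```

If a column were renamed or dropped upstream, this version would stop with an error instead of silently picking up the wrong column.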
Because we’re interested in storms likely to cause damage in the coming years, we limit the analysis to storms from the most recent ten years. Population patterns and defenses against storms change over time, so a storm that took place long ago, were it to occur in the same place and at the same magnitude today, might have a very different impact. More recent events are likely to have a similar impact if they recur.
storm$BGN_DATE <- as.POSIXct(strptime(as.character(storm$BGN_DATE), "%m/%d/%Y %H:%M:%S"))
storm_2002 <- subset(storm, storm$BGN_DATE > "2002-01-01")
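Note that the comparison above is strict (`>`), so an event stamped exactly at midnight on January 1, 2002 would be left out. The following is a minimal sketch of an inclusive filter on date-only values, using made-up rows; the `0:00:00` time stamps in the toy data are an assumption about how the dates are written.

```r
# Toy begin dates in the "month/day/year time" layout the conversion above expects.
toy <- data.frame(
  BGN_DATE = c("12/31/2001 0:00:00", "1/1/2002 0:00:00", "5/22/2011 0:00:00"),
  EVTYPE = c("HAIL", "FLOOD", "TORNADO"),
  stringsAsFactors = FALSE
)

# Parse only the date part; text after the matched format is ignored.
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")

# Inclusive lower bound: an event on the cutoff date itself is kept.
cutoff <- as.Date("2002-01-01")
toy_recent <- toy[toy$BGN_DATE >= cutoff, ]
toy_recent$EVTYPE
```

Working with `Date` values also sidesteps time-zone handling, which is not needed when only the calendar day matters.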
Property and crop damage figures are recorded alongside a unit code: thousands (K), millions (M), or billions (B). The next code chunk converts the figures to a common unit (dollars) so they can be summed and averaged.
# Property damage: column 5 holds the amount, column 6 its unit code
for (i in 1:nrow(storm_2002)) {
  if (storm_2002[i, 6] == "K") {
    storm_2002[i, 5] <- storm_2002[i, 5] * 1000
  } else if (storm_2002[i, 6] == "M") {
    storm_2002[i, 5] <- storm_2002[i, 5] * 1000000
  } else if (storm_2002[i, 6] == "B") {
    storm_2002[i, 5] <- storm_2002[i, 5] * 1000000000
  }
}
# Crop damage: column 7 holds the amount, column 8 its unit code
for (i in 1:nrow(storm_2002)) {
  if (storm_2002[i, 8] == "K") {
    storm_2002[i, 7] <- storm_2002[i, 7] * 1000
  } else if (storm_2002[i, 8] == "M") {
    storm_2002[i, 7] <- storm_2002[i, 7] * 1000000
  } else if (storm_2002[i, 8] == "B") {
    storm_2002[i, 7] <- storm_2002[i, 7] * 1000000000
  }
}
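Row-by-row loops like the ones above can be slow on a large data frame. Below is a minimal sketch of the same conversion done in one vectorized step with a lookup table, shown on made-up values; `unit_multiplier` and `to_dollars` are hypothetical names, not part of the original analysis.

```r
# Multiplier for each unit code; codes not listed here leave the amount unchanged,
# matching the behaviour of the loops above.
unit_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert amounts to dollars in one vectorized step.
to_dollars <- function(amount, unit_code) {
  # as.character() matters when unit_code is a factor: indexing by a factor
  # would use its integer codes rather than its labels.
  multiplier <- unname(unit_multiplier[as.character(unit_code)])
  multiplier[is.na(multiplier)] <- 1
  amount * multiplier
}

toy_amount <- c(25, 2.5, 1.5, 7)
toy_code <- factor(c("K", "M", "B", ""))
toy_dollars <- to_dollars(toy_amount, toy_code)
toy_dollars
```

Applied to the report's data, this would amount to something like `storm_2002[, 5] <- to_dollars(storm_2002[, 5], storm_2002[, 6])`, and likewise for columns 7 and 8.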
Analysis of health impact focuses on fatalities and injuries. Below is a table of the ten storm types that have led to the most fatalities. Because the code keeps only events with at least one fatality, count is the number of fatal events of each type and mean is the average number of fatalities per fatal event. Tornadoes have led to the most fatalities by far, about 60% more than excessive heat.
Fatalities <- storm_2002 %>% filter(FATALITIES > 0) %>% group_by(EVTYPE) %>% summarize(count = n(), mean = mean(FATALITIES, na.rm = TRUE), sum = sum(FATALITIES)) %>% arrange(desc(sum))
Fatalities10 <- Fatalities[1:10, ]
Fatalities10
## # A tibble: 10 × 4
## EVTYPE count mean sum
## <fctr> <int> <dbl> <dbl>
## 1 TORNADO 315 3.530159 1112
## 2 EXCESSIVE HEAT 225 3.071111 691
## 3 FLASH FLOOD 368 1.464674 539
## 4 LIGHTNING 346 1.069364 370
## 5 RIP CURRENT 301 1.129568 340
## 6 FLOOD 178 1.387640 247
## 7 HEAT 126 1.817460 229
## 8 AVALANCHE 114 1.271930 145
## 9 THUNDERSTORM WIND 107 1.214953 130
## 10 EXTREME COLD/WIND CHILL 87 1.436782 125
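The same filter, group, summarize, and sort pattern is used for every impact measure in this report. The following is a minimal sketch of a reusable helper, written in base R so that it runs without extra packages and demonstrated on made-up rows; `top_events` is a hypothetical name, and the report itself uses dplyr pipelines rather than this function.

```r
# Hypothetical helper: for one impact column, keep events with a positive value,
# then report count, mean and sum per event type, largest sum first.
top_events <- function(df, value_col, n = 10) {
  hit <- df[which(df[[value_col]] > 0), ]
  groups <- split(hit[[value_col]], as.character(hit$EVTYPE))
  out <- data.frame(
    EVTYPE = names(groups),
    count = vapply(groups, length, integer(1)),
    mean = vapply(groups, mean, numeric(1)),
    sum = vapply(groups, sum, numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
  head(out[order(-out$sum), ], n)
}

toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "HAIL", "FLOOD"),
  FATALITIES = c(3, 1, 2, 0, 0),
  stringsAsFactors = FALSE
)
toy_top <- top_events(toy, "FATALITIES")
toy_top
```

In the toy data, the hail event and the second flood event drop out of the counts because they caused no fatalities, which mirrors how the `filter(FATALITIES > 0)` step shapes the table above.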
Next is a table of the ten storm types that have led to the most injuries. As the code filters on INJURIES > 0, count and mean cover only events with at least one injury. Tornadoes have also led to the most injuries, more than four times as many as excessive heat.
Injuries <- storm_2002 %>% filter(INJURIES > 0) %>% group_by(EVTYPE) %>% summarize(count = n(), mean = mean(INJURIES, na.rm = TRUE), sum = sum(INJURIES)) %>% arrange(desc(sum))
Injuries10 <- Injuries[1:10, ]
Injuries10
## # A tibble: 10 × 4
## EVTYPE count mean sum
## <fctr> <int> <dbl> <dbl>
## 1 TORNADO 1132 12.003534 13588
## 2 EXCESSIVE HEAT 70 39.957143 2797
## 3 LIGHTNING 1223 1.839738 2250
## 4 THUNDERSTORM WIND 587 2.385009 1400
## 5 HURRICANE/TYPHOON 12 106.250000 1275
## 6 HEAT 36 33.944444 1222
## 7 TSTM WIND 511 2.242661 1146
## 8 WILDFIRE 184 4.951087 911
## 9 FLASH FLOOD 155 3.335484 517
## 10 HIGH WIND 186 2.612903 486
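The table lists THUNDERSTORM WIND and TSTM WIND as separate rows. If these are two labels for the same kind of event, their combined total would be 1400 + 1146 = 2546 injuries, which would place thunderstorm wind ahead of lightning. Below is a minimal sketch of merging such labels before grouping, on made-up rows; `evtype_aliases` and `normalize_evtype` are hypothetical names, and treating the two labels as equivalent is an assumption.

```r
# Illustrative mapping from variant labels to a canonical label. Only the
# TSTM WIND entry is motivated by the table above; extend it with care.
evtype_aliases <- c("TSTM WIND" = "THUNDERSTORM WIND")

# Hypothetical helper: tidy case and whitespace, then replace known aliases.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))
  is_alias <- x %in% names(evtype_aliases)
  x[is_alias] <- unname(evtype_aliases[x[is_alias]])
  x
}

toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "tstm wind ", "TORNADO"),
  INJURIES = c(2, 3, 1, 12),
  stringsAsFactors = FALSE
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)
totals <- tapply(toy$INJURIES, toy$EVTYPE, sum)
totals
```

Whether to merge other similar-looking labels is a judgment call that depends on how the event categories are defined, so the mapping is kept explicit rather than pattern-based.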
The plot below shows the ten events that led to the most fatalities and the most injuries. As the tables above make clear, tornadoes are by far the deadliest and most injurious storm event.
Fatalities10$Impact <- "Fatalities"
Injuries10$Impact <- "Injuries"
Fatalities10$EVTYPE <- factor(Fatalities10$EVTYPE, levels = Fatalities10$EVTYPE[order(Fatalities10$sum)])
Injuries10$EVTYPE <- factor(Injuries10$EVTYPE, levels = Injuries10$EVTYPE[order(Injuries10$sum)])
FI <- rbind(Fatalities10, Injuries10)
ggplot(FI, aes(x = EVTYPE, y = sum)) + geom_bar(stat = "identity", width = .5) + coord_flip() + facet_wrap(~ Impact, scales = "free") + xlab("Event Type") + ylab("Sum of Fatalities/Injuries") + ggtitle("Sum of Fatalities and Injuries for Ten Most Damaging Events")
Analysis of damage focuses on property and crop damage. Below is a table of the ten storm types that have led to the most property damage, along with the number of events of each type that caused property damage and the average property damage per such event. The most damaging storm types to property are water-based, including floods, hurricanes/typhoons, and storm surges. The sum and mean are in billions of dollars.
Property <- storm_2002 %>% filter(PROPDMG > 0) %>% group_by(EVTYPE) %>% summarize(count = n(), mean = mean(PROPDMG, na.rm = TRUE), sum = sum(PROPDMG)) %>% arrange(desc(sum))
Property10 <- Property[1:10, ]
Property10$mean <- Property10$mean/1000000000
Property10$sum <- Property10$sum/1000000000
Property10
## # A tibble: 10 × 4
## EVTYPE count mean sum
## <fctr> <int> <dbl> <dbl>
## 1 FLOOD 6706 1.989079e-02 133.387649
## 2 HURRICANE/TYPHOON 69 1.004432e+00 69.305840
## 3 STORM SURGE 73 5.913468e-01 43.168315
## 4 TORNADO 7873 2.337981e-03 18.406923
## 5 FLASH FLOOD 13109 8.169504e-04 10.709403
## 6 HAIL 13281 6.907821e-04 9.174278
## 7 HIGH WIND 3612 1.338133e-03 4.833336
## 8 WILDFIRE 723 6.581835e-03 4.758667
## 9 STORM SURGE/TIDE 47 9.874868e-02 4.641188
## 10 THUNDERSTORM WIND 42726 7.917087e-05 3.382654
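Because the per-event means are expressed in billions, most of them print in scientific notation. As a small worked example, the sketch below re-expresses a few of the means from the table in millions of dollars; the values are copied from the table above, and the choice of rows is arbitrary.

```r
# Per-event means copied from the property table above, in billions of dollars.
mean_billion <- c(
  "FLOOD" = 1.989079e-02,
  "HURRICANE/TYPHOON" = 1.004432e+00,
  "TORNADO" = 2.337981e-03,
  "THUNDERSTORM WIND" = 7.917087e-05
)

# One billion is one thousand million.
mean_million <- round(mean_billion * 1000, 2)
mean_million
```

Read this way, a damaging flood averages roughly $20 million in property damage, while a damaging hurricane/typhoon averages roughly $1 billion.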
With respect to crop damage, drought is the most damaging event type. After that, water-based events are most damaging, including floods and hurricanes/typhoons. Figures below for mean and sum are in millions of dollars.
Crop <- storm_2002 %>% filter(CROPDMG > 0) %>% group_by(EVTYPE) %>% summarize(count = n(), mean = mean(CROPDMG, na.rm = TRUE), sum = sum(CROPDMG)) %>% arrange(desc(sum))
Crop10 <- Crop[1:10, ]
Crop10$mean <- Crop10$mean/1000000
Crop10$sum <- Crop10$sum/1000000
Crop10
## # A tibble: 10 × 4
## EVTYPE count mean sum
## <fctr> <int> <dbl> <dbl>
## 1 DROUGHT 153 35.4485294 5423.625
## 2 FLOOD 1233 2.9131447 3591.907
## 3 HURRICANE/TYPHOON 33 79.0264485 2607.873
## 4 HAIL 4880 0.2855098 1393.288
## 5 FROST/FREEZE 102 10.7263333 1094.086
## 6 FLASH FLOOD 1204 0.6748455 812.514
## 7 HIGH WIND 118 4.1887458 494.272
## 8 EXCESSIVE HEAT 2 246.2010000 492.402
## 9 HURRICANE 13 34.5007692 448.510
## 10 TROPICAL STORM 50 8.2012200 410.061
The plot below restates what we’ve seen in the tables above. Droughts are most damaging to crops. Beyond that, water-based storms, including floods and hurricanes/typhoons, are most damaging to crops and property in general.
Property10$Impact <- "Property"
Crop10$Impact <- "Crop"
Property10$EVTYPE <- factor(Property10$EVTYPE, levels = Property10$EVTYPE[order(Property10$sum)])
Crop10$EVTYPE <- factor(Crop10$EVTYPE, levels = Crop10$EVTYPE[order(Crop10$sum)])
PC <- rbind(Crop10, Property10)
ggplot(PC, aes(x = EVTYPE, y = sum)) + geom_bar(stat = "identity", width = .5) + coord_flip() + facet_wrap(~ Impact, scales = "free") + xlab("Event Type") + ylab("Sum of Damage") + ggtitle("Sum of Property and Crop Damage for Ten Most Damaging Events")