Synopsis

This analysis uses National Weather Service storm data from 2002 to 2011 to determine which types of storm events have been most harmful to people and property. For people, tornadoes caused the most fatalities and the most injuries of any storm type. Among the ten deadliest event types, tornadoes also had the highest average fatalities per fatal event, although rarer events such as hurricane/typhoons caused more injuries per event on average. For property, water-based events (floods, hurricane/typhoons, and storm surges) caused the most damage. Floods caused over $133 billion in property damage during the ten years considered, nearly double the second most damaging storm type, hurricane/typhoon, at almost $70 billion. For crops, drought was the most damaging at almost $5.5 billion, followed by water-based events such as floods and hurricane/typhoons. In sum, tornadoes are the most dangerous to personal safety, water-based storms are the most damaging to property, and drought and water-based storms are the most damaging to crops.

Data Processing

First, we load the data, pulling only those columns we’ll need for our analysis, including the date on which the storm event began, the type of event, data on fatalities and injuries, and data on property and crop damage.

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.3.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
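# Download the compressed storm data to a temporary file
theURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
z <- tempfile()
download.file(theURL, z, mode = "wb")  # binary mode keeps the .bz2 file intact on Windows
# Keep columns 2 (BGN_DATE), 8 (EVTYPE), and 23:28 (fatalities, injuries, property and crop damage with unit codes)
storm <- read.csv(z)[, c(2, 8, 23:28)]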

Because we’re interested in storms likely to cause damage in the coming years, we limit the analysis to the most recent ten years of data. Where people live and how well they are protected against storms change over time, so a storm from long ago might have a very different impact if it struck the same place with the same magnitude today. More recent events are likely to have a similar impact if they recur.

storm$BGN_DATE <- as.POSIXct(strptime(as.character(storm$BGN_DATE), "%m/%d/%Y %H:%M:%S"))
storm_2002 <- subset(storm, storm$BGN_DATE > "2002-01-01")
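
One detail of the filter above is worth checking: a strict `>` comparison against midnight on 1 January 2002 drops any event stamped exactly at that instant. The sketch below uses a small made-up vector of dates (not the storm data) and an explicit UTC time zone, both of which are assumptions for illustration, to show how `>` and `>=` differ at the boundary.

```r
# Toy begin dates in the same "%m/%d/%Y %H:%M:%S" layout (made-up values, not storm data)
toy_dates <- c("12/31/2001 0:00:00", "1/1/2002 0:00:00", "1/2/2002 0:00:00")

# Parse with an explicit time zone so the comparison does not depend on the local setting
parsed <- as.POSIXct(strptime(toy_dates, "%m/%d/%Y %H:%M:%S", tz = "UTC"))
cutoff <- as.POSIXct("2002-01-01", tz = "UTC")

strict    <- parsed > cutoff    # drops the event stamped at midnight on 1 January 2002
inclusive <- parsed >= cutoff   # keeps it

data.frame(date = toy_dates, strict, inclusive)
```

If the source file records begin times as midnight, the strict comparison would leave out events dated 1 January 2002. Whether that affects the totals reported here has not been checked.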

Property and crop damage figures are recorded with a unit code: thousands (K), millions (M), or billions (B). The next code chunk converts the figures to a common unit, dollars, so they can be summed and averaged. Rows with any other code are left unchanged.

# Columns 5 and 6 hold PROPDMG and its unit code (PROPDMGEXP)
for (i in 1:nrow(storm_2002)) {
    if (storm_2002[i, 6] == "K") {
        storm_2002[i, 5] <- storm_2002[i, 5] * 1000
    } else if (storm_2002[i, 6] == "M") {
        storm_2002[i, 5] <- storm_2002[i, 5] * 1000000
    } else if (storm_2002[i, 6] == "B") {
        storm_2002[i, 5] <- storm_2002[i, 5] * 1000000000
    }
}

# Columns 7 and 8 hold CROPDMG and its unit code (CROPDMGEXP)
for (i in 1:nrow(storm_2002)) {
    if (storm_2002[i, 8] == "K") {
        storm_2002[i, 7] <- storm_2002[i, 7] * 1000
    } else if (storm_2002[i, 8] == "M") {
        storm_2002[i, 7] <- storm_2002[i, 7] * 1000000
    } else if (storm_2002[i, 8] == "B") {
        storm_2002[i, 7] <- storm_2002[i, 7] * 1000000000
    }
}
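
The row-by-row loops above can be slow on a large data frame. As a minimal sketch of a vectorized alternative (not the code used to produce the results in this report), the same conversion can be written with a named lookup vector. The toy data frame, the `multiplier` and `scale_by` names, and the `PROPDMG_USD` column are made up for illustration.

```r
# Toy data frame standing in for the damage columns (values are made up)
toy <- data.frame(
    PROPDMG    = c(2.5, 10, 1.2, 7),
    PROPDMGEXP = c("K", "M", "B", ""),
    stringsAsFactors = FALSE
)

# Named lookup of multipliers for the three unit codes handled by the loops
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Codes not in the lookup come back as NA; treat them as a multiplier of 1,
# which mirrors the loops leaving such rows unchanged
scale_by <- unname(multiplier[as.character(toy$PROPDMGEXP)])
scale_by[is.na(scale_by)] <- 1

# Damage in dollars, computed for all rows at once
toy$PROPDMG_USD <- toy$PROPDMG * scale_by
toy
```

The `as.character()` call matters when the unit column is a factor, as it would be under `read.csv` defaults in older R versions: indexing by a factor would use its integer codes and not its labels.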

Results

Health

Analysis of health impact focuses on fatalities and injuries. The table below lists the ten storm types that caused the most fatalities, along with the number of events with at least one fatality (count) and the average fatalities per such event (mean). Tornadoes caused the most fatalities by far, about 60% more than excessive heat.

Fatalities <- storm_2002 %>% filter(FATALITIES > 0) %>% group_by(EVTYPE) %>%
    summarize(count = n(), mean = mean(FATALITIES, na.rm = TRUE), sum = sum(FATALITIES)) %>%
    arrange(desc(sum))
Fatalities10 <- Fatalities[1:10, ]
Fatalities10
## # A tibble: 10 × 4
##                     EVTYPE count     mean   sum
##                     <fctr> <int>    <dbl> <dbl>
## 1                  TORNADO   315 3.530159  1112
## 2           EXCESSIVE HEAT   225 3.071111   691
## 3              FLASH FLOOD   368 1.464674   539
## 4                LIGHTNING   346 1.069364   370
## 5              RIP CURRENT   301 1.129568   340
## 6                    FLOOD   178 1.387640   247
## 7                     HEAT   126 1.817460   229
## 8                AVALANCHE   114 1.271930   145
## 9        THUNDERSTORM WIND   107 1.214953   130
## 10 EXTREME COLD/WIND CHILL    87 1.436782   125

The next table lists the ten storm types that caused the most injuries, along with the number of events with at least one injury (count) and the average injuries per such event (mean). Tornadoes also caused the most injuries, nearly five times as many as excessive heat.

Injuries <- storm_2002 %>% filter(INJURIES > 0) %>% group_by(EVTYPE) %>%
    summarize(count = n(), mean = mean(INJURIES, na.rm = TRUE), sum = sum(INJURIES)) %>%
    arrange(desc(sum))
Injuries10 <- Injuries[1:10, ]
Injuries10
## # A tibble: 10 × 4
##               EVTYPE count       mean   sum
##               <fctr> <int>      <dbl> <dbl>
## 1            TORNADO  1132  12.003534 13588
## 2     EXCESSIVE HEAT    70  39.957143  2797
## 3          LIGHTNING  1223   1.839738  2250
## 4  THUNDERSTORM WIND   587   2.385009  1400
## 5  HURRICANE/TYPHOON    12 106.250000  1275
## 6               HEAT    36  33.944444  1222
## 7          TSTM WIND   511   2.242661  1146
## 8           WILDFIRE   184   4.951087   911
## 9        FLASH FLOOD   155   3.335484   517
## 10         HIGH WIND   186   2.612903   486
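
The table above lists THUNDERSTORM WIND and TSTM WIND as separate rows, although the labels appear to describe the same kind of event. On these two rows alone, combining them would give 1400 + 1146 = 2546 injuries. The sketch below shows one way such labels could be merged before grouping. It was not applied to the results in this report, and the toy data and the `recode` map are hypothetical.

```r
# Toy events (made-up values) with two spellings of the same event type
toy <- data.frame(
    EVTYPE   = c("TSTM WIND", "THUNDERSTORM WIND", "LIGHTNING", "TSTM WIND"),
    INJURIES = c(2, 3, 1, 4),
    stringsAsFactors = FALSE
)

# Hypothetical recoding map: name = label to replace, value = label to keep
recode <- c("TSTM WIND" = "THUNDERSTORM WIND")

# Replace only the labels that appear in the map; leave all others as they are
label <- toy$EVTYPE
hit <- label %in% names(recode)
label[hit] <- unname(recode[label[hit]])

# Total injuries per merged label, largest first
totals <- tapply(toy$INJURIES, label, sum)
sort(totals, decreasing = TRUE)
```

A full cleanup would need a longer map, since the raw data may contain other variant spellings that do not appear in the top ten.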

The plot below shows the ten events that led to the most fatalities and the ten that led to the most injuries. As the tables above make clear, tornadoes are by far the deadliest and most injurious storm event.

Fatalities10$Impact <- "Fatalities"
Injuries10$Impact <- "Injuries"
Fatalities10$EVTYPE <- factor(Fatalities10$EVTYPE, levels = Fatalities10$EVTYPE[order(Fatalities10$sum)])
Injuries10$EVTYPE <- factor(Injuries10$EVTYPE, levels = Injuries10$EVTYPE[order(Injuries10$sum)])
FI <- rbind(Fatalities10, Injuries10)
ggplot(FI, aes(x = EVTYPE, y = sum)) +
    geom_bar(stat = "identity", width = .5) +
    coord_flip() +
    facet_wrap(~ Impact, scales = "free") +
    xlab("Event Type") +
    ylab("Sum of Fatalities/Injuries") +
    ggtitle("Sum of Fatalities and Injuries for Ten Most Damaging Events")

Property and Crop Damage

Analysis of damage focuses on property and crop damage. The table below lists the ten storm types that caused the most property damage, along with the number of events with reported property damage (count) and the average property damage per such event (mean). The storm types most damaging to property are water-based, including floods, hurricane/typhoons, and storm surges. The sum and mean are in billions of dollars.

Property <- storm_2002 %>% filter(PROPDMG > 0) %>% group_by(EVTYPE) %>%
    summarize(count = n(), mean = mean(PROPDMG, na.rm = TRUE), sum = sum(PROPDMG)) %>%
    arrange(desc(sum))
Property10 <- Property[1:10, ]
Property10$mean <- Property10$mean/1000000000
Property10$sum <- Property10$sum/1000000000
Property10
## # A tibble: 10 × 4
##               EVTYPE count         mean        sum
##               <fctr> <int>        <dbl>      <dbl>
## 1              FLOOD  6706 1.989079e-02 133.387649
## 2  HURRICANE/TYPHOON    69 1.004432e+00  69.305840
## 3        STORM SURGE    73 5.913468e-01  43.168315
## 4            TORNADO  7873 2.337981e-03  18.406923
## 5        FLASH FLOOD 13109 8.169504e-04  10.709403
## 6               HAIL 13281 6.907821e-04   9.174278
## 7          HIGH WIND  3612 1.338133e-03   4.833336
## 8           WILDFIRE   723 6.581835e-03   4.758667
## 9   STORM SURGE/TIDE    47 9.874868e-02   4.641188
## 10 THUNDERSTORM WIND 42726 7.917087e-05   3.382654
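
The table separates total damage from damage per event, and the two rankings differ: floods have the largest total, but a single hurricane/typhoon is far more costly on average. As a quick consistency check, the sketch below re-enters the first three rows of the table by hand (the `top3` data frame and its column names are introduced here for illustration) and confirms that each mean equals the sum divided by the count.

```r
# First three rows of the property table above, in billions of dollars
top3 <- data.frame(
    EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "STORM SURGE"),
    count  = c(6706, 69, 73),
    mean   = c(1.989079e-02, 1.004432e+00, 5.913468e-01),
    sum    = c(133.387649, 69.305840, 43.168315),
    stringsAsFactors = FALSE
)

# Recompute the per-event average from the total and the event count
top3$mean_check <- top3$sum / top3$count

# Ratio of the largest total to the second largest (the "nearly double" claim)
flood_vs_hurricane <- top3$sum[1] / top3$sum[2]

top3
flood_vs_hurricane
```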

With respect to crop damage, drought is the most damaging event type. After that, water-based events, including floods and hurricane/typhoons, are the most damaging. The mean and sum below are in millions of dollars.

Crop <- storm_2002 %>% filter(CROPDMG > 0) %>% group_by(EVTYPE) %>%
    summarize(count = n(), mean = mean(CROPDMG, na.rm = TRUE), sum = sum(CROPDMG)) %>%
    arrange(desc(sum))
Crop10 <- Crop[1:10, ]
Crop10$mean <- Crop10$mean/1000000
Crop10$sum <- Crop10$sum/1000000
Crop10
## # A tibble: 10 × 4
##               EVTYPE count        mean      sum
##               <fctr> <int>       <dbl>    <dbl>
## 1            DROUGHT   153  35.4485294 5423.625
## 2              FLOOD  1233   2.9131447 3591.907
## 3  HURRICANE/TYPHOON    33  79.0264485 2607.873
## 4               HAIL  4880   0.2855098 1393.288
## 5       FROST/FREEZE   102  10.7263333 1094.086
## 6        FLASH FLOOD  1204   0.6748455  812.514
## 7          HIGH WIND   118   4.1887458  494.272
## 8     EXCESSIVE HEAT     2 246.2010000  492.402
## 9          HURRICANE    13  34.5007692  448.510
## 10    TROPICAL STORM    50   8.2012200  410.061

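The plot below restates what we’ve seen in the tables above. Droughts are most damaging to crops. Beyond that, water-based storms, including floods and hurricane/typhoons, are most damaging to crops and property in general. Note that the two panels use different units: crop damage is plotted in millions of dollars and property damage in billions.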

Property10$Impact <- "Property"
Crop10$Impact <- "Crop"
Property10$EVTYPE <- factor(Property10$EVTYPE, levels = Property10$EVTYPE[order(Property10$sum)])
Crop10$EVTYPE <- factor(Crop10$EVTYPE, levels = Crop10$EVTYPE[order(Crop10$sum)])
PC <- rbind(Crop10, Property10)
ggplot(PC, aes(x = EVTYPE, y = sum)) +
    geom_bar(stat = "identity", width = .5) +
    coord_flip() +
    facet_wrap(~ Impact, scales = "free") +
    xlab("Event Type") +
    ylab("Sum of Damage") +
    ggtitle("Sum of Property and Crop Damage for Ten Most Damaging Events")