Synopsis

In this report, we analyze the health and economic consequences of severe weather events using the NOAA Storm Database. Among all event type labels in the database, the combined label “TORNADOES, TSTM WIND, HAIL” (tornadoes, thunderstorm wind, and hail) has both the highest average number of fatalities per event and the highest average economic damage per event.

Data Processing

First we read the data in.

library(tidyverse)
data <- read_csv("repdata_data_StormData.csv.bz2")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   STATE__ = col_double(),
##   COUNTY = col_double(),
##   BGN_RANGE = col_double(),
##   COUNTY_END = col_double(),
##   END_RANGE = col_double(),
##   LENGTH = col_double(),
##   WIDTH = col_double(),
##   F = col_integer(),
##   MAG = col_double(),
##   FATALITIES = col_double(),
##   INJURIES = col_double(),
##   PROPDMG = col_double(),
##   CROPDMG = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double(),
##   LATITUDE_E = col_double(),
##   LONGITUDE_ = col_double(),
##   REFNUM = col_double()
## )
## See spec(...) for full column specifications.

One thing to note is that the EVTYPE (event type) labels in this data set are very inconsistent. Ideally, we would standardize them so that we could really see which types of severe weather events have large impacts, but I have not found an easy way to do that. Based on what I have seen, fixing the labels would require manually classifying hundreds of event types, as well as additional research into what some labels even mean. For instance, I could try to replace all the labels that contain “tstm” or “thunder” with a blanket “thunderstorm” label, but some labels appear to be aggregates, e.g. “TORNADOES, TSTM WIND, HAIL”, so I could easily count things twice. There is also a whole set of labels for monthly summaries, and there are many misspellings that are hard to predict and could probably only be handled fully by hand. Instead of taking a week to fix this, or making a half-hearted attempt to clean it up, I will leave the labels as given, with the caveat that the results of the following analyses depend on labels that have not been cleaned.
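The double-counting concern can be made concrete with a small sketch. It uses base R pattern matching on a handful of labels; apart from “TORNADOES, TSTM WIND, HAIL”, the labels here are illustrative stand-ins, not a listing taken from the data set.

```r
# Illustrative labels; only the aggregate label is quoted from this report.
labels <- c("TSTM WIND", "THUNDERSTORM WINDS", "TORNADO",
            "TORNADOES, TSTM WIND, HAIL", "HAIL")

# Case-insensitive matches for two candidate blanket categories.
is_thunderstorm <- grepl("tstm|thunder", labels, ignore.case = TRUE)
is_tornado <- grepl("tornado", labels, ignore.case = TRUE)

# Labels claimed by both categories would be counted twice.
overlap <- labels[is_thunderstorm & is_tornado]
print(overlap)
```

Any label that lands in `overlap` would have to be assigned to a single category by hand, which is the manual work described above.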

A preprocessing step we can do, however, is to create a new variable that combines PROPDMG with the multiplier code in PROPDMGEXP (h, k, m, or b, for hundreds, thousands, millions, or billions) into a single dollar amount. We will do the same for CROPDMG and CROPDMGEXP. Then we can sum the two types of damage into the total damage. Finally, we can group the data by event type and summarize each damage measure by its mean.

# Map exponent codes ("h", "k", "m", "b") to numeric multipliers.
# Unrecognized or missing codes map to 0, as in the original loop,
# and an empty input returns an empty vector.
convert <- function(s) {
    multipliers <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
    x <- unname(multipliers[s])
    x[is.na(x)] <- 0
    return(x)
}
econ_data <- data %>%
    # Normalize the exponent codes to lower case.
    mutate(PROPDMGEXP = str_to_lower(PROPDMGEXP), CROPDMGEXP = str_to_lower(CROPDMGEXP)) %>%
    # Keep only rows where both exponent codes are recognized.
    filter(PROPDMGEXP %in% c("h", "k", "m", "b"), CROPDMGEXP %in% c("h", "k", "m", "b")) %>%
    # Convert both damage measures to dollars and add them up.
    mutate(prop_dmg = PROPDMG * convert(PROPDMGEXP), crop_dmg = CROPDMG * convert(CROPDMGEXP)) %>%
    mutate(total_dmg = prop_dmg + crop_dmg) %>%
    # Average each damage measure by event type, largest total first.
    group_by(EVTYPE) %>%
    summarize(total_dmg = mean(total_dmg), prop_dmg = mean(prop_dmg), crop_dmg = mean(crop_dmg)) %>%
    arrange(desc(total_dmg))
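As a worked illustration of the conversion and of the filter's effect, the sketch below applies the same idea to three made-up rows using base R only. The data frame `toy` and the lookup vector `multiplier` are hypothetical names introduced for this example, and the values are invented.

```r
# Made-up rows: damage amounts and their exponent codes.
toy <- data.frame(
    PROPDMG = c(25, 2.5, 10),
    PROPDMGEXP = c("K", "M", "K"),
    CROPDMG = c(5, 1, 0),
    CROPDMGEXP = c("k", "K", NA),
    stringsAsFactors = FALSE
)

# Hypothetical lookup from exponent code to multiplier.
multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Lower-case the codes, then keep rows where BOTH codes are recognized.
prop_exp <- tolower(toy$PROPDMGEXP)
crop_exp <- tolower(toy$CROPDMGEXP)
keep <- prop_exp %in% names(multiplier) & crop_exp %in% names(multiplier)

# Total damage in dollars for the rows that survive the filter.
total_dmg <- toy$PROPDMG[keep] * unname(multiplier[prop_exp[keep]]) +
    toy$CROPDMG[keep] * unname(multiplier[crop_exp[keep]])
print(total_dmg)
```

The third row has property damage but a missing crop exponent, so it is dropped. The same thing happens to any such row in the pipeline above.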

We can also prepare the data for the health question. For this question we will just look at the average number of fatalities per event, by event type. Less processing is required here than above. We keep two separate data sets because the economic data set above drops every observation whose PROPDMGEXP or CROPDMGEXP is not one of h, k, m, or b (including observations where either code is missing). The data set below keeps those observations.

fatal_data <- data %>%
    group_by(EVTYPE) %>%
    summarize(fatalities = mean(FATALITIES)) %>%
    arrange(desc(fatalities))
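To illustrate what the grouping step computes, here is a base R equivalent on made-up data that also reports how many events stand behind each average. The data frame `toy`, the labels "A" and "B", and the column `n` are hypothetical and are not part of the analysis above.

```r
# Made-up events: label "A" has three events, label "B" has one.
toy <- data.frame(
    EVTYPE = c("A", "A", "A", "B"),
    FATALITIES = c(0, 1, 2, 25),
    stringsAsFactors = FALSE
)

# Mean fatalities and number of events for each label.
mean_fatal <- tapply(toy$FATALITIES, toy$EVTYPE, mean)
n_events <- tapply(toy$FATALITIES, toy$EVTYPE, length)

# Combine into one table, sorted by average fatalities, largest first.
summary_tbl <- data.frame(
    EVTYPE = names(mean_fatal),
    fatalities = as.vector(mean_fatal),
    n = as.vector(n_events)
)
summary_tbl <- summary_tbl[order(-summary_tbl$fatalities), ]
print(summary_tbl)
```

In this toy case, label "B" ranks first on the strength of a single event. A count column like `n` is one way to see when an average rests on very few observations.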

Results

Fatalities

Below we show the 10 event types with the highest average fatalities per event, first as a table and then as a bar chart. The three most fatal event types on average are (1) tornadoes, thunderstorm wind, and hail, with an average of 25 fatalities per event, (2) cold and snow, with an average of 14 fatalities per event, and (3) Tropical Storm Gordon, with an average of 8 fatalities per event.

head(fatal_data, n = 10)
## # A tibble: 10 x 2
##    EVTYPE                     fatalities
##    <chr>                           <dbl>
##  1 TORNADOES, TSTM WIND, HAIL      25.0 
##  2 COLD AND SNOW                   14.0 
##  3 TROPICAL STORM GORDON            8.00
##  4 RECORD/EXCESSIVE HEAT            5.67
##  5 EXTREME HEAT                     4.36
##  6 HEAT WAVE DROUGHT                4.00
##  7 HIGH WIND/SEAS                   4.00
##  8 MARINE MISHAP                    3.50
##  9 WINTER STORMS                    3.33
## 10 Heavy surf and wind              3.00
fatal_data %>%
    head(n = 10) %>%
    ggplot(aes(x = fct_reorder(EVTYPE, fatalities), y = fatalities)) +
    geom_col() +
    coord_flip() +
    ggtitle("Average Fatalities of Severe Weather Events") +
    ylab("Fatalities per Event") +
    xlab("Event Type")

Economic Damage

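Below we show the top 10 types of severe weather events in terms of average economic damage per event, where economic damage is the sum of property damage and crop damage. We also show the same information in a bar chart. The top three are (1) tornadoes, thunderstorm wind, and hail, at about 1.6 billion USD per event, (2) hurricane/typhoon, at about 889 million USD per event, and (3) Hurricane Opal, at about 729 million USD per event.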

head(econ_data, n = 10)
## # A tibble: 10 x 4
##    EVTYPE                       total_dmg    prop_dmg   crop_dmg
##    <chr>                            <dbl>       <dbl>      <dbl>
##  1 TORNADOES, TSTM WIND, HAIL 1602500000. 1600000000.   2500000.
##  2 HURRICANE/TYPHOON           889338418.  810311970.  79026448.
##  3 HURRICANE OPAL              729000000.  722666667.   6333333.
##  4 RIVER FLOOD                 631773062.  317477188. 314295875.
##  5 HURRICANE                   182430412.  142887618.  39542794.
##  6 HURRICANE OPAL/HIGH WINDS   110000000.  100000000.  10000000.
##  7 HURRICANE ERIN               87336667.   85333333.   2003333.
##  8 WINTER STORM HIGH WINDS      65000000.   60000000.   5000000.
##  9 River Flooding               44670000.   35330000.   9340000.
## 10 STORM SURGE/TIDE             34128625.   34122375.      6250.
econ_data %>%
    head(n = 10) %>%
    ggplot(aes(x = fct_reorder(EVTYPE, total_dmg), y = total_dmg / 1e9)) +
    geom_col() +
    coord_flip() +
    ggtitle("Average Economic Damage of Severe Weather Events") +
    ylab("Damage in USD (Billions)") +
    xlab("Event Type")