Synopsis

The US storm data between 1950 and 2011 was analysed to identify the types of event that have had the greatest impact across the US. Tornadoes have had the greatest impact on the population, killing about 5,600 people and injuring about 91,000. Winter storms have had the greatest economic impact, causing about 130 trillion dollars of property damage. The relative impact of the different event types is plotted on a logarithmic scale at the end of this report.

Data Processing

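storm.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("StormData.csv.bz2")) {
    download.file(storm.url, "StormData.csv.bz2", mode = "wb")
}
# read.csv() decompresses the bz2 file and handles quoted fields
storm.data <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)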

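The PROPDMGEXP and CROPDMGEXP fields hold exponent codes that scale PROPDMG and CROPDMG by powers of 10, so a function is needed to interpret them. damage(x, y) calculates the actual damage \(x \times 10^y\), where x is the coefficient (for example PROPDMG) and y is the exponent code (for example PROPDMGEXP). The letters H, K, M and B stand for exponents of 2, 3, 6 and 9; any other non-numeric code is treated as an exponent of 0.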

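damage <- function(x, y) {
    # Exponent codes may arrive as factors or strings; compare in upper case
    y <- toupper(as.character(y))
    letter.exp <- c(H = 2, K = 3, M = 6, B = 9)
    # Digit codes are used as the exponent directly
    exp <- suppressWarnings(as.numeric(y))
    # Letter codes are looked up element by element, so each row keeps
    # its own exponent
    is.letter <- y %in% names(letter.exp)
    exp[is.letter] <- letter.exp[y[is.letter]]
    # Blank or unrecognised codes ("", "+", "-", "?") leave x unscaled
    exp[is.na(exp)] <- 0
    x * (10 ^ exp)
}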
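
As a quick check of the exponent convention, the sketch below works through a few values by hand using `match()`. The sample codes and amounts are made up for illustration, and treating blank or unrecognised codes as an exponent of 0 is an assumption carried over from the description above.

```r
# Made-up coefficient / exponent-code pairs
codes <- c("K", "m", "B", "", "?")
amounts <- c(25, 2.5, 1, 10, 7)

# Position of each code in the recognised set; NA when not recognised
idx <- match(toupper(codes), c("H", "K", "M", "B"))
exponent <- c(2, 3, 6, 9)[idx]

# Assumption: unrecognised codes mean "no multiplier"
exponent[is.na(exponent)] <- 0

dollars <- amounts * 10^exponent
dollars  # 25 thousand, 2.5 million, 1 billion, 10, 7
```

The calculation is vectorised, so every row is scaled by its own exponent.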

To quantify the effect on population health (people) we will add the number of fatalities and injuries. To quantify the cost (cost) we will add the property and crop damages.

suppressMessages(library(dplyr))
storm.summary <- storm.data %>%
    group_by(EVTYPE) %>%
    mutate(property = damage(PROPDMG, PROPDMGEXP),
           crop = damage(CROPDMG, CROPDMGEXP)) %>%
    summarize(people = sum(FATALITIES + INJURIES), 
              fatalities = sum(FATALITIES),
              injuries = sum(INJURIES),
              property = sum(property),
              crop = sum(crop),
              cost = sum(property + crop))

There are 985 distinct event type values in the data, whereas section 2.1.1 of the Storm Data Documentation permits only 48 events. Rationalize these using one regular expression per permitted event. This does not need to be perfect, as long as it gives a unique set of values for the top 10 results. Any event type that matches none of the specific patterns falls through to the catch-all Other row.

First, define the regex for each event.

matrix.events <- matrix(c("Avalanche", "avalanche",
                          "Blizzard", "blizzard",
                          "Coastal Flood", "coastal",
                          "Cold/Wind Chill", "cold",
                          "Debris Flow", "debris|flow",
                          "Dense Fog", "fog",
                          "Dense Smoke", "smoke",
                          "Drought", "drought",
                          "Dust Devil", "devil",
                          "Dust Storm", "dust(.*)storm",
                          "Excessive Heat", "excessive(.*)heat",
                          "Extreme Cold/Wind Chill", "extreme(.*)cold",
                          "Flash Flood", "flash",
                          "Flood", "(^|urban |major )flood$",
                          "Frost/Freeze", "frost|freeze",
                          "Funnel Cloud", "funnel(.*)cloud",
                          "Freezing Fog", "freezing(.*)fog",
                          "Hail", "hail",
                          "Heat", "^heat",
                          "Heavy Rain", "rain",
                          "Heavy Snow", "heavy(.*)snow",
                          "High Surf", "surf",
                          "High Wind", "high(.*)wind",
                          "Hurricane (Typhoon)", "hurricane|typhoon",
                          "Ice Storm", "ice(.*)storm",
                          "Lake-Effect Snow", "lake(.*)effect",
                          "Lakeshore Flood", "lakeshore(.*)flood",
                          "Lightning", "lightning",
                          "Marine Hail", "marine(.*)hail",
                          "Marine High Wind", "marine(.*)high(.*)wind",
                          "Marine Strong Wind", "marine(.*)strong(.*)wind",
                          "Marine Thunderstorm Wind", "marine(.*)thunderstorm(.*)wind",
                          "Rip Current", "rip(.*)current",
                          "Seiche", "seiche",
                          "Sleet", "sleet",
                          "Storm Surge/Tide", "surge|tide",
                          "Strong Wind", "strong(.*)wind",
                          "Thunderstorm Wind", "thunderstorm|tstm",
                          "Tornado", "tornado",
                          "Tropical Depression", "depression",
                          "Tropical Storm", "tropical(.*)storm",
                          "Tsunami", "tsunami",
                          "Volcanic Ash", "volcanic|ash",
                          "Waterspout", "waterspout",
                          "Wildfire", "wild(.*)fire",
                          "Winter Storm", "winter(.*)storm",
                          "Winter Weather", "winter(.*)weather",
                          "Other","(.*)"),
                        ncol = 2,
                        byrow = TRUE)
permitted.events <- data.frame(matrix.events)
names(permitted.events) <- c("name", "regex")

The rationalize.event() function will loop through these values to find a match. This can then be applied to all 985 events, and the results summarized.

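event.count <- nrow(permitted.events)
rationalize.event <- function(x) {
    # Return the name of the first pattern that matches, in table order
    for (i in seq_len(event.count)) {
        pattern <- as.character(permitted.events$regex[i])
        if (grepl(pattern, as.character(x), ignore.case = TRUE)) {
            return(as.character(permitted.events$name[i]))
        }
    }
    NA_character_
}
storm.summary$event <- factor(sapply(as.character(storm.summary$EVTYPE),
                                     rationalize.event, USE.NAMES = FALSE))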

storm.summary <- storm.summary %>%
    group_by(event) %>%
    summarize(people = sum(people), 
              fatalities = sum(fatalities),
              injuries = sum(injuries),
              property = sum(property),
              crop = sum(crop),
              cost = sum(cost))
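
Because the loop returns the first pattern that matches, the order of the rows in the table matters. The sketch below illustrates this with three patterns copied from the table; the `first.match` helper and the sample strings are hypothetical and only stand in for the full loop.

```r
# Three patterns taken from the table, in the same relative order
patterns <- c("Flash Flood" = "flash",
              "Flood" = "(^|urban |major )flood$",
              "Other" = "(.*)")

# Hypothetical helper: name of the first pattern that matches x
first.match <- function(x) {
    hits <- vapply(patterns, grepl, logical(1), x = x, ignore.case = TRUE)
    names(patterns)[which(hits)[1]]
}

samples <- c("FLASH FLOOD", "URBAN FLOOD", "RIVER FLOOD", "FLOOD")
matched <- vapply(samples, first.match, character(1), USE.NAMES = FALSE)
data.frame(samples, matched)
```

Here "RIVER FLOOD" ends up under Other because the Flood pattern only accepts a bare "flood" or one prefixed by "urban " or "major ".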

Results

Population impact

The top 10 event types with the greatest impact on the population are shown in the table below. This is calculated as the sum of fatalities and injuries.

library(knitr)
storm.bypeople <- arrange(storm.summary, desc(people))
kable(storm.bypeople[1:10, 1:4],
      digits = 0,
      caption = "Event types with the greatest impact on the population")
Event types with the greatest impact on the population

| event             | people | fatalities | injuries |
|:------------------|-------:|-----------:|---------:|
| Tornado           |  97043 |       5636 |    91407 |
| Thunderstorm Wind |  10135 |        714 |     9421 |
| Excessive Heat    |   8445 |       1920 |     6525 |
| Flood             |   7259 |        470 |     6789 |
| Lightning         |   6049 |        817 |     5232 |
| Heat              |   3593 |       1114 |     2479 |
| Flash Flood       |   2837 |       1035 |     1802 |
| Ice Storm         |   2079 |         89 |     1990 |
| High Wind         |   1815 |        297 |     1518 |
| Wildfire          |   1696 |         90 |     1606 |

Economic impact

The top 10 event types with the greatest economic impact are shown in the table below. This is calculated as the sum of property and crop damage, in US dollars.

options(scipen = 10)
storm.bycost <- arrange(storm.summary, desc(cost))
kable(storm.bycost[1:10, c(1, 7, 5, 6)],
      digits = 0,
      caption = "Event types with the greatest economic impact")
Event types with the greatest economic impact

| event               |            cost |        property |       crop |
|:--------------------|----------------:|----------------:|-----------:|
| Winter Storm        | 132720591001979 | 132720590500000 |     501979 |
| Tropical Storm      |     49049686349 |     49033180400 |   16505949 |
| Hurricane (Typhoon) |     22850914613 |     22684579475 |  166335138 |
| Storm Surge/Tide    |     19393818716 |     19393817861 |        855 |
| Flood               |     13399068900 |     13398899938 |     168961 |
| Tornado             |      3215833901 |      3215732875 |     101026 |
| Heavy Rain          |      3022934190 |      2898258754 |  124675436 |
| Cold/Wind Chill     |      2707270350 |       611163461 | 2096106889 |
| Hail                |      1859219683 |      1851522544 |    7697138 |
| Frost/Freeze        |      1196875975 |         1705992 | 1195169984 |

Relative impact

To identify the event types for which the US should prioritize resources, the plot below combines the top 10 event types by population impact with the top 10 by economic impact. Each value is expressed relative to the largest value of its measure and plotted on a logarithmic scale, so that differences between the smaller values remain visible.

library(tidyr)
tenth.people <- storm.bypeople[[10, "people"]]
tenth.cost <- storm.bycost[[10, "cost"]]
max.people <- max(storm.summary$people)
max.cost <- max(storm.summary$cost)
storm.plot <- storm.summary %>%
    filter(people >= tenth.people | cost >= tenth.cost) %>%
    transmute(event, people, cost) %>%
    mutate(people = log10(people/max.people),
           cost = log10(cost/max.cost)) %>%
    arrange(people) %>%
    gather(type, damage, -event)
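
The scaling step can be checked by hand: dividing by the maximum puts the largest event at 0 on the log scale, and every factor of 10 below the maximum lowers the value by 1. The sketch below uses three cost totals copied from the economic impact table; the variable names are illustrative only.

```r
# Cost totals (US dollars) copied from the economic impact table
cost <- c("Winter Storm" = 132720591001979,
          "Tropical Storm" = 49049686349,
          "Frost/Freeze" = 1196875975)

# Relative impact on a log10 scale; the largest event maps to 0
relative <- log10(cost / max(cost))
round(relative, 2)

# Undo the transformation to recover each event's share of the maximum
share <- 10^relative
```

All values are zero or negative, which is why the plot limits only need to extend below zero.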

Order the levels of the event factor so that events are plotted in descending levels of population impact. Also update the levels of the type factor so that the facet titles are meaningful.

storm.plot$event <- with(storm.plot, factor(event, unique(event)))
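# gather() may return type as a character column, so build the factor explicitly
storm.plot$type <- factor(storm.plot$type,
                          levels = c("people", "cost"),
                          labels = c("Population", "Economic"))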
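
The reordering relies on `factor()` taking its level order from its second argument, and on `unique()` keeping values in order of first appearance. A minimal sketch with made-up event names:

```r
# Made-up events, sorted by ascending impact and then repeated once per
# measure, as gather() would stack them
event <- c("Hail", "Flood", "Tornado", "Hail", "Flood", "Tornado")

default.order <- factor(event)              # alphabetical levels
plot.order <- factor(event, unique(event))  # order of first appearance

levels(default.order)
levels(plot.order)
```

With `coord_flip()`, the first level is drawn at the bottom of the axis, so sorting in ascending order places the highest-impact event at the top.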

Use a dot plot rather than a bar chart, and remove the axis text and ticks to emphasise that this is a relative comparison. The logarithmic scale keeps all values visible on the same axis even though, according to the table above, the financial impact of Winter Storm is roughly 2,700 times greater than that of Tropical Storm, the next most expensive event type.

library(ggplot2)
g <- ggplot(storm.plot, aes(x = event, colour = type)) +
    geom_point(aes(y = damage), stat = "identity") +
    coord_flip(ylim = c(-6, 0.1)) +
    facet_wrap(~ type, ncol = 2) +
    scale_colour_brewer(type = "qual", palette = 6) +
    theme(legend.position = "none",
          axis.text.x = element_blank(),
          axis.ticks.x = element_blank()) +
    labs(title = "Relative impact of weather events across the US (logarithmic)",
         x = NULL,
         y = NULL)
g