Synopsis

The goal of this assignment is to explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and answer some basic questions about severe weather events. In particular, we focus on the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

To answer these questions, we process the data by classifying each free-text description of a documented weather event into exactly one of the 48 severe weather event categories. We then calculate mean property damage, mean crop damage, mean fatalities, and mean injuries for each of the 48 severe weather event categories and display these values in bar plots to visually access which severe weather events are most harmful with respect to population health (fatalities and injures) and economic consequences (property damage and crop damage).

Note: The table of contents on the left can be used to quickly access the various sections of this report.


Setup

We begin by setting global code-chunk options, loading libraries, and defining the “%notin%” function.

knitr::opts_chunk$set(echo = TRUE)                      # set global chunk options
suppressWarnings(suppressMessages(library(tidyverse)))  # tidyverse
suppressWarnings(suppressMessages(library(patchwork)))  # to use the + operator for panel plots
suppressWarnings(suppressMessages(library(kableExtra))) # for nice html tables
suppressWarnings(suppressMessages(library(DT)))         # for interactive html tables

# Function: notin
# Purpose: Negates the %in% operator
"%notin%"<-Negate("%in%")

Data Processing

To load the data, we first check if a “data” sub-directory exists in the user’s current working directory. If it does not, said sub-directory will be created to store the downloaded data. The data is then downloaded and read into R as the object “storm”.

# If "data" sub-dir doesn't exist, create it
if(!file.exists("./data")){dir.create("./data")} 

# Download file to "data" sub-dir
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2',
              destfile = './data/storm.csv')

# Read data into R
storm <- read.csv('./data/storm.csv')

# Check number of obs
nrow(storm)
## [1] 902297

To process the data, we first clean the free-text description of the severe weather event. We then define property damage (in dollars) and crop damage (in dollars) based on values and units entered. Note that observations with invalid unit entries will have corresponding values set to missing.

# Clean EVTYPE, the free-text description of the severe weather event
# Remove leading/trailing spaces + make lower + replace double spaces with single space
storm <- storm %>% mutate(EVTYPE=gsub("  "," ",tolower(trimws(EVTYPE))))

# Define property damage and crop damage
storm <- storm %>% mutate(
    
    propdam = case_when(
        PROPDMG == 0 ~ 0,
        PROPDMG > 0 & PROPDMGEXP %in% c("k","K") ~ PROPDMG*1000,         # thousands
        PROPDMG > 0 & PROPDMGEXP %in% c("m","M") ~ PROPDMG*1000000,      # millions
        PROPDMG > 0 & PROPDMGEXP %in% c("b","B") ~ PROPDMG*1000000000,   # billions
        PROPDMG > 0 & PROPDMGEXP %notin% c('k','K','m','M','b','B') ~ NA # invalid unit entry
    ),
    
    cropdam = case_when(
        CROPDMG == 0 ~ 0,
        CROPDMG > 0 & CROPDMGEXP %in% c("k","K") ~ CROPDMG*1000,         # thousands
        CROPDMG > 0 & CROPDMGEXP %in% c("m","M") ~ CROPDMG*1000000,      # millions
        CROPDMG > 0 & CROPDMGEXP %in% c("b","B") ~ CROPDMG*1000000000,   # billions
        CROPDMG > 0 & CROPDMGEXP %notin% c('k','K','m','M','b','B') ~ NA # invalid unit entry
    ),
)

# Summaries
summary(storm$propdam)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
## 0.000e+00 0.000e+00 0.000e+00 4.738e+05 5.000e+02 1.150e+11       327
summary(storm$cropdam)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
## 0.000e+00 0.000e+00 0.000e+00 5.442e+04 0.000e+00 5.000e+09        15

In total, there are 327 observations with missing property damage values and 15 observations with missing crop damage values. With almost a million observations in our dataset, the proportion of observations with said missing values appears negligible.


Classify Events

Table 1, Section 2.1.1, page 6, of the National Weather Service Storm Data Preparation Directive gives the 48 severe weather events permitted for entry into the NOAA storm database. Since the variable EVTYPE contains the free-text description of the documented severe weather event, we proceed with data preparation by classifying each of the free-text descriptions into exactly one of the 48 severe weather event categories.

To classify free-text descriptions, the grep()/grepl() functions are used to search for key words and strings that indicate a given severe weather event, in accordance with definitions provided in the National Weather Service Storm Data Preparation Directive. In addition, we use the following rules in our classification scheme:

Rule 1: If a specific storm event (avalanche, blizzard, flood, hurricane, etc.) is noted and other singular weather events (rain, wind, cold, etc.) are noted, then the observation is assigned to the specific storm event listed in the note.

  • Examples
    • “high wind/blizzard” = “Blizzard”
    • “rain/flood” = “Flood”

Rule 2: If more than one specific storm event is noted then the observation is assigned to the specific storm event listed first in the note.

  • Examples
    • “blizzard/avalanche” = “Blizzard”
    • “ice storm/flash flood” = “Ice Storm”
  • Exceptions
    • Since waterspouts can become tornadoes and vice versa, we assume entries of the form “waterspout/tornado” represent such phenomena and therefore classify “waterspout/tornado” = “Tornado”.

Rule 3: If no specific storm events are noted and more than one singular weather event is noted, then the observation is assigned to the singular weather event listed first in the note.

  • Examples
    • “extreme wind chill/blowing snow” = “Extreme Cold/Wind Chill”
    • “heavy rain/lightning” = “Heavy Rain”
    • “hail/wind” = “Hail”
    • “cold/snow” = “Cold/Wind Chill”
  • Exceptions
    • When multiple singular weather events are noted and meet the criteria for “Winter Weather”, then the observation is assigned to the “Winter Weather” event category.
      • e.g., “rain/snow/ice” = “Winter Weather”
    • When the first singular weather event cannot be classified into one of the 48 event categories, then the observation is assigned to the second singular weather event listed in the note. For example, “wind” or “gusty wind” does not provide enough information to classify as either “High Wind” or “Strong Wind”. Accordingly, we have…
      • “wind/hail” = “Hail”
      • “gusty wind/hvy rain” = “Heavy Rain”
      • “gusty wind/rain” = “Heavy Rain”

Invariably, some free-text descriptions will not be able to be classified into one of the 48 event categories. For such observations, we defined broad categories for non-classification, including:

  • Wind Event: Wind event that cannot be classified as either “High Wind” or “Strong Wind”
  • Coastal Event: Coastal event that cannot be classified as “Coastal Flood”
  • Precipitation: Precipitation event that cannot be classifed
  • Summary: Free-text description is “Summary of [date]”
  • Record Temp: Record temperature event but direction not specified
  • Unknown: No clear non-classification category

Any observation falling into any of the above six categories will be removed from the analytical dataset.

Note the event classification code below is unique to the dataset we are working with. While this classification code could be used as the basis for classifying free-text descriptions in other NOAA storm database datasets, one should NOT directly apply/replicate this classification code in other datasets.

### Event Classification Code

events <- as.data.frame(table(storm$EVTYPE)) # Cleaned free-text event descriptions, 883 distinct
names(events)[1]<-'name'                     # name = free-text event description

events$event<-NA # event category

# Astronomical Low Tide
events$event[grep("low tide",events$name)] <- "Astronomical Low Tide"

# Avalanche
events$event[is.na(events$event) & grepl("avalanche|avalance",events$name)]  <- "Avalanche"
    events$event[events$event=='Avalanche' & grepl("blizzard/",events$name)] <- "Blizzard"

# Blizzard
events$event[is.na(events$event) & grepl("blizzard",events$name)]            <- "Blizzard"
    events$event[events$event=='Blizzard' & grepl("icestorm/",events$name)]  <- "Ice Storm"   

# Coastal Flood
events$event[is.na(events$event) & grepl("coastal flood|cstl",events$name)]  <- "Coastal Flood"

# Cold/Wind Chill
events$event[is.na(events$event) & 
    grepl("cold|wind chill|windchill|hypothermia|
          |low temp|record low|cool",events$name)]                  <- "Cold/Wind Chill"

    events$event[events$event=="Cold/Wind Chill" & 
        grepl("snow-|snow/|snow and|snow\\\\cold",events$name)]     <- "Heavy Snow"     
    events$event[events$event=="Cold/Wind Chill" &
        grepl("tornado|air funnel",events$name)]                    <- "Tornado"        
    events$event[events$event=="Cold/Wind Chill" & 
        grepl("fog",events$name)]                                   <- "Freezing Fog"   
    events$event[events$event=="Cold/Wind Chill" & 
        grepl("rain",events$name)]                                  <- "Drought" # rec low rain=drgt
    events$event[events$event=="Cold/Wind Chill" & 
        grepl("high wind/|high winds/|high winds and",events$name)] <- "High Wind"
    events$event[events$event=="Cold/Wind Chill" & 
        grepl("extreme|record|bitter|excessive",events$name)]       <- "Extreme Cold/Wind Chill"

# Debris Flow        
events$event[is.na(events$event) & 
    grepl("landslide|landslump|mudslide|mud slide|rock slide",events$name)]      <- "Debris Flow"
    
    events$event[events$event=="Debris Flow" & grepl("flash flood",events$name)] <- "Flash Flood"
    events$event[events$event=="Debris Flow" 
        & grepl("urban flood landslide",events$name)]                            <- "Flood"

# Dense Fog
events$event[is.na(events$event) & grepl("fog",events$name)]                    <- "Dense Fog"
    
    events$event[events$event=="Dense Fog" & grepl("freezing|ice",events$name)] <- "Freezing Fog" 
    events$event[is.na(events$event) & grepl("ice storm",events$name)]          <- "Ice Storm"    
    events$event[is.na(events$event) & grepl("glaze",events$name)]              <- "Freezing Fog" 

# Dense Smoke    
events$event[is.na(events$event) & grepl("smoke",events$name)]              <- "Dense Smoke"

# Drought
events$event[is.na(events$event) & grepl("drought|dry|driest",events$name)] <- "Drought"
    events$event[events$event=="Drought" & 
        grepl("microburst|mircoburst",events$name)]                         <- "Thunderstorm Wind" 

# Dust Devil
events$event[is.na(events$event) & grepl("dust devil|dust devel",events$name)] <- "Dust Devil" 

# Dust Storm
events$event[is.na(events$event) & 
    grepl("dust storm|blowing dust|duststorm",events$name)] <- "Dust Storm"

# Heat
events$event[is.na(events$event) & 
    grepl("heat|hyperthermia|warm|record high|
          |hot|high temperature record",events$name)]  <- "Heat"
    
    events$event[events$event=="Heat" & 
        grepl("excessive|extreme|record",events$name)] <- "Excessive Heat"

# Flood
events$event[is.na(events$event) & 
    grepl("flood|fld|dam break|dam fail|floood",events$name)]       <- "Flood"

    events$event[events$event=="Flood" & 
        grepl("beach|coastal|tidal",events$name)]                   <- "Coastal Flood"    
    events$event[events$event=="Flood" & grepl("lake",events$name)] <- "Lakeshore Flood"    
    events$event[events$event=="Flood" & 
        grepl("thunderstorm wind",events$name)]                     <- "Thunderstorm Wind"   
    events$event[events$event=="Flood" & 
        grepl("flash|dam break|dam fail",events$name)]              <- "Flash Flood" # by defn
    events$event[is.na(events$event) & 
        grepl("high water|rapidly rising water",events$name)]       <- "Flash Flood" # by defn
        events$event[events$event=="Flash Flood" &
            grepl("ice storm",events$name)]                         <- "Ice Storm" 

# Frost/Freeze
events$event[is.na(events$event) & grepl("frost|freeze",events$name)] <- "Frost/Freeze"    

# Funnel Cloud
events$event[is.na(events$event) & grepl("funnel",events$name)] <- "Funnel Cloud"
    events$event[events$event=="Funnel Cloud" & 
        grepl("thunderstorm wind",events$name)]                 <- "Thunderstorm Wind"
    events$event[events$event=="Funnel Cloud" & 
        grepl("waterspout fun",events$name)]                    <- "Waterspout"

# Hail    
events$event[is.na(events$event) & grepl("hail",events$name)]         <- "Hail"
    events$event[events$event=="Hail" & grepl("tornado",events$name)] <- "Tornado"
    events$event[events$event=="Hail" & 
        grepl("thunderstorm wind|tstm wind",events$name)]             <- "Thunderstorm Wind" 
    events$event[events$event=="Hail" & grepl("marine",events$name)]  <- "Marine Hail"

# Heavy Rain
events$event[is.na(events$event) &
    grepl("rain|heavy rain|hvy rain|excessive rain|
          |rainstorm|record rain|torrential rain|
          |unseasonal rain|heavy shower|wet",events$name)]                    <- "Heavy Rain"

    events$event[events$event=="Heavy Rain" & grepl("high wind",events$name)] <- "High Wind" 
    events$event[events$event=="Heavy Rain" &
        grepl("lightning and|lightning/",events$name)]                        <- "Lightning" 
    events$event[events$event=="Heavy Rain" & 
        grepl("freezing|snow",events$name)]                                   <- "Winter Weather" 
        events$event[events$event=="Winter Weather" & 
            grepl("wet snow",events$name)]                                    <- "Heavy Snow"
    events$event[events$event=="Heavy Rain" & 
            grepl("microburst|micoburst|thunderstorm wind",events$name)]      <- "Thunderstorm Wind" 

# Heavy Snow    
events$event[is.na(events$event) & grepl("snow",events$name)]                 <- "Heavy Snow"

    events$event[events$event=="Heavy Snow" & grepl("lake",events$name)]      <- "Lake-Effect Snow" 
    events$event[events$event=="Heavy Snow" & grepl("ice storm",events$name)] <- "Ice Storm"        
    events$event[events$event=="Heavy Snow" & 
        grepl("heavy snow/high winds/freezing|ice|
              |sleet|freezing precip",events$name)]                           <- "Winter Weather"
    events$event[events$event=="Heavy Snow" & 
        grepl("winter storm|thundersnow",events$name)]                        <- "Winter Storm" 
    events$event[events$event=="Heavy Snow" & 
        grepl("high wind/|high winds/|high wind and",events$name)]            <- "High Wind"    
    events$event[events$event=="Heavy Snow" & 
        grepl("lack of snow",events$name)]                               <- "Not Classified/Unknown"    

# High Surf    
events$event[is.na(events$event) & 
    grepl("surf|high seas|rough seas|rogue wave|
          |high swells|high waves|heavy seas|heavy swells",events$name)]  <- "High Surf"
    
    events$event[events$event=="High Surf" & 
        grepl("rip",events$name)]                                         <- "Rip Current" 

# High Wind    
events$event[is.na(events$event) & grepl("high wind",events$name)]        <- "High Wind"
    events$event[events$event=="High Wind" &
        grepl("hurricane",events$name)]                                   <- "Hurricane/Typhoon" 
    events$event[events$event=="High Wind" &
        grepl("winter storm",events$name)]                                <- "Winter Storm"    
    events$event[events$event=="High Wind" & grepl("marine",events$name)] <- "Marine High Wind"  

# Hurricane/Typhoon
events$event[is.na(events$event) & 
    grepl("hurrican|typhoon|floyd",events$name)] <- "Hurricane/Typhoon"
# Ice Storm
events$event[is.na(events$event) & grepl("ice|icy",events$name)] <- "Ice Storm" 

# Lightning
events$event[is.na(events$event) & grepl("lightning|lighting|ligntning",events$name)] <- "Lightning"
    events$event[events$event=="Lightning" & 
        grepl("wind/light|winds light|tstm wind|thunderstorm",events$name)] <- "Thunderstorm Wind" 

# Strong Wind
events$event[is.na(events$event) & grepl("strong wind",events$name)]        <- "Strong Wind"
    events$event[events$event=="Strong Wind" & grepl("marine",events$name)] <- "Marine Strong Wind" 

# Thunderstorm Wind
events$event[is.na(events$event) & 
    grepl("thunderstorm|thuderstorm|thundeerstrom|thundestorm|
          |thunderstormw|thunderstrom|thundertorm|thundertsorm|
          |thunerstorm|tstm|tunderstorm|thundeerstorm|
          |thunderestorm|storm force|microburst|
          |mircoburst|gustnado|downburst",events$name)] <- "Thunderstorm Wind"    
    
    events$event[events$event=="Thunderstorm Wind" & 
        grepl("marine",events$name)]                    <- "Marine Thunderstorm Wind" 
    events$event[events$event=="Thunderstorm Wind" & 
        grepl("non",events$name)]                       <- "Not Classified/Wind Event"
    
# Rip Current  
events$event[is.na(events$event) & grepl("rip",events$name)] <- "Rip Current"      
# Seiche
events$event[is.na(events$event) & grepl("seiche",events$name)] <- "Seiche"     
# Sleet
events$event[is.na(events$event) & grepl("sleet",events$name)] <- "Sleet"    
# Storm Surge/Tide
events$event[is.na(events$event) & grepl("storm surge|tide",events$name)] <- "Storm Surge/Tide"   

# Tornado
events$event[is.na(events$event) & 
    grepl("tornado|torndao|landspout",events$name)] <- "Tornado" 

# Tropical Depression
events$event[is.na(events$event) & 
    grepl("tropical depression",events$name)] <- "Tropical Depression"

# Tropical Storm
events$event[is.na(events$event) & grepl("tropical storm",events$name)] <- "Tropical Storm" 
# Tsunami
events$event[is.na(events$event) & grepl("tsunami",events$name)] <- "Tsunami" 
# Volcanic Ash
events$event[is.na(events$event) & grepl("volcanic|ash|vog",events$name)] <- "Volcanic Ash" 

# Waterspout
events$event[is.na(events$event) & 
    grepl("waterspout|water spout|wayter",events$name)] <- "Waterspout"  

# Wildfire
events$event[is.na(events$event) & grepl("wild|fire",events$name)] <- "Wildfire"  
# Winter Storm
events$event[is.na(events$event) & grepl("winter storm",events$name)] <- "Winter Storm"  
# Winter Weather
events$event[is.na(events$event) & 
    grepl("winter weather|wintry mix|winter mix|
          |wintery mix|heavy mix|mixed|freezing",events$name)] <- "Winter Weather"

# Not Classified Categories
events$event[is.na(events$event) & 
    grepl("wind|wnd",events$name)]      <- "Not Classified/Wind Event"    # unknown wind event

events$event[is.na(events$event) & 
    grepl("coastal|beach",events$name)] <- "Not Classified/Coastal Event" # unknown coastal event

events$event[is.na(events$event) & 
    grepl("prec",events$name)]          <- "Not Classified/Precipitation" # unknown precipitation

events$event[is.na(events$event) & 
    grepl("summary",events$name)]       <- "Not Classified/Summary"       # "summary" entry

events$event[is.na(events$event) & 
    grepl("record",events$name)]        <- "Not Classified/Record Temp"   # unknown record temp

events$event[is.na(events$event)]       <- "Not Classified/Unknown"       # unknown 

# Order by event category, then free-text entry
events <- events %>% arrange(event,name) 

# Total number of distinct, cleaned free-text entries in the data
nrow(events)
## [1] 883
### DOWNLOADABLE DATA TABLE OF CLASSIFIED EVENTS ###

# NOTE 1: For peer reviewers - I do not consider this a figure. This is an embedded, interactive
#         table intended to facilitate reproducible research.

# NOTE 2: Frequency gives the number of times the free-text entry occurs in the storm dataset

# NOTE 3: This data table can be downloaded as either a .csv or excel file by clicking on 
#         the "CSV" or "Excel" buttons below

datatable(events, rownames = FALSE, colnames = c("Free Text Entry","Frequency","Event Category"), 
          filter = 'top', extensions = 'Buttons',
          options = list(dom = '1Bfrtip',buttons= c('pageLength','csv','excel'), 
                         columnDefs = list(list(className = 'dt-center',targets = 0:(ncol(events)-1))),
                         scrollX = TRUE, scrollCollapse = TRUE, lengthMenu = c(5,10,25,50)))

Create Analytical Dataset

# Number of obs in raw dataset
nrow(storm) 
## [1] 902297
### Merge event classifications into the storm dataset to create analytical dataset

mergeme <- events %>% rename(EVTYPE=name) %>% select(EVTYPE,event)    # rename to merge
analysis <- merge(storm,mergeme,by="EVTYPE",all.x=T)                  # merge

# Remove events that could not be classified
analysis <- analysis %>% filter(event %notin% c('Not Classified/Coastal Event',
                                             'Not Classified/Precipitation',
                                             'Not Classified/Record Temp','Not Classified/Summary',
                                             'Not Classified/Unknown','Not Classified/Wind Event'))

# Number of obs in analytical dataset
nrow(analysis) 
## [1] 901436

We see above that 902297-901436=861 observations were removed due to non-classification. This removal therefore represents only 0.1% of the observations in our raw dataset.


Analysis

We first calculate mean property damage (in millions), mean crop damage (in millions), mean fatalities, and mean injuries for each of the 48 severe weather events and then plot these values in a series of bar plots. We then identify the top ten most harmful events for each of the four outcomes.

# Calculate mean prop damage, crop damage, fatalities, and injuries by event
eventmeans <- analysis %>% group_by(event) %>% 
    # calculate means
    summarize(
        meanprop  = mean(propdam,na.rm=T)/1000000, # in millions
        meancrop  = mean(cropdam,na.rm=T)/1000000, # in millions
        meanfatal = mean(FATALITIES,na.rm=T),
        meaninjur = mean(INJURIES,na.rm=T)
    ) %>% 
    mutate(event=factor(event)) # make event a factor variable

Results

# Skeleton plot
g <- eventmeans %>% ggplot() + 
    
    xlab("Severe Weather Event") +
    
    # add black plot border & black margin border + remove all legends
    theme(plot.background = element_rect(color="black"), 
          panel.background = element_rect(color="black"),legend.position="none") +
    
    # use levels for axis labels
    scale_x_discrete(limits = rev(levels(eventmeans$event))) + 
    coord_flip()

# Bar plot of means for each outcome
p <- g + geom_bar(aes(x=event,y=meanprop,fill=levels(event)),stat="identity") +
         ylab("Mean Property Damage\n(in Millions)") +
         ggtitle("Property Damage") 

q <- g + geom_bar(aes(x=event,y=meancrop,fill=levels(event)),stat="identity") +
         ylab("Mean Crop Damage\n(in Millions)") +
         ggtitle("Crop Damage") 

r <- g + geom_bar(aes(x=event,y=meanfatal,fill=levels(event)),stat="identity") +
         ylab("Mean Fatalities") +
         ggtitle("Fatalities") 

s <- g + geom_bar(aes(x=event,y=meaninjur,fill=levels(event)),stat="identity") +
         ylab("Mean Injuries") +
         ggtitle("Injuries") 
# Figure 1
p + q + r + s + 
plot_annotation("Figure 1: Property Damage, Crop Damage, Fatalities, and Injuries by Severe Weather Event")

Figure 1 shows that the distribution of mean values across the 48 severe weather events is heavily skewed for each outcome. For each outcome, a large number of events have relatively small means, while a small number of events have relatively large means. Note that bars corresponding to the same severe weather event have the same color across the outcome plots. In each outcome plot we see five severe weather events that stand out and appear to dominate all others. In order to see the variation in means for the other 43 severe weather events, we remove the five most harmful events from each outcome plot, respectively.

# Remove 5 most harmful for each outcome + drop unused levels for plotting
a <- eventmeans %>% arrange(desc(meanprop))  %>% slice(-5:-1); a$event <- droplevels(a$event)
b <- eventmeans %>% arrange(desc(meancrop))  %>% slice(-5:-1); b$event <- droplevels(b$event)
c <- eventmeans %>% arrange(desc(meanfatal)) %>% slice(-5:-1); c$event <- droplevels(c$event) 
d <- eventmeans %>% arrange(desc(meaninjur)) %>% slice(-5:-1); d$event <- droplevels(d$event) 

# Bar plot of means for each outcome
p <- a %>% ggplot() + geom_bar(aes(x=event,y=meanprop),stat="identity") +
         scale_x_discrete(limits = rev(levels(a$event))) + coord_flip() + 
         xlab("Severe Weather Event") + 
         ylab("Mean Property Damage\n(in Millions)") +
         ggtitle("Property Damage") +
         theme(plot.background = element_rect(color="black"),
               panel.background = element_rect(color="black"))

q <- b %>% ggplot() + geom_bar(aes(x=event,y=meancrop),stat="identity") +
         scale_x_discrete(limits = rev(levels(b$event))) + coord_flip() +
         xlab("Severe Weather Event") +
         ylab("Mean Crop Damage\n(in Millions)") +
         ggtitle("Crop Damage") +
         theme(plot.background = element_rect(color="black"), 
               panel.background = element_rect(color="black"))

r <- c %>% ggplot() + geom_bar(aes(x=event,y=meanfatal),stat="identity") +
         scale_x_discrete(limits = rev(levels(c$event))) + coord_flip() +
         xlab("Severe Weather Event") +
         ylab("Mean Fatalities") +
         ggtitle("Fatalities") +
         theme(plot.background = element_rect(color="black"),
               panel.background = element_rect(color="black"))

s <- d %>% ggplot() + geom_bar(aes(x=event,y=meaninjur),stat="identity") +
         scale_x_discrete(limits = rev(levels(d$event))) + coord_flip() +
         xlab("Severe Weather Event") +
         ylab("Mean Injuries") +
         ggtitle("Injuries") +
         theme(plot.background = element_rect(color="black"), 
               panel.background = element_rect(color="black"))
# Figure 2
p + q + r + s + 
plot_annotation("Figure 2: Property Damage, Crop Damage, Fatalities, and Injuries by Severe Weather Event\nFive Most Harmful Events Removed")

Since there is now no guarantee the same set of events will appear across the outcome plots, Figure 2 omits color for clarity. As expected, Figure 2 more clearly shows the variation in means across the remaining 43 severe weather events and indeed we see that some events appear more harmful than others with respect to all four outcomes.

We conclude our analysis by identifying the ten most harmful events for each outcome and representing this information in the form of a table.

# Get 10 most harmful events for each outcome
a <- eventmeans %>% arrange(desc(meanprop))  %>% slice(1:10) %>% 
    mutate(rank=1:10) %>% select(rank,event,meanprop)
b <- eventmeans %>% arrange(desc(meancrop))  %>% slice(1:10) %>% 
    mutate(rank=1:10) %>% select(rank,event,meancrop)
c <- eventmeans %>% arrange(desc(meanfatal)) %>% slice(1:10) %>% 
    mutate(rank=1:10) %>% select(rank,event,meanfatal)
d <- eventmeans %>% arrange(desc(meaninjur)) %>% slice(1:10) %>% 
    mutate(rank=1:10) %>% select(rank,event,meaninjur)

# Merge to create table
dflist <- list(a,b,c,d)      
t <- dflist %>% reduce(full_join, by='rank') %>%
    mutate(
        meanprop  = round(meanprop,2), # round for table
        meancrop  = round(meancrop,2),
        meanfatal = round(meanfatal,2),
        meaninjur = round(meaninjur,2)
    )

# Generate table
kable(t,col.names=c("Rank","Event","Mean Cost (m)","Event","Mean Cost (m)",
                    "Event","Mean Fatalities","Event","Mean Injuries"),
      caption = "<font size=-1.5><b><FONT COLOR=black>Figure 3: Ten Most Harmful Severe Weather Events by Outcome</b></font>") %>% 
kable_styling(bootstrap_options = c("striped", "hover","condensed"),font_size = 10.5) %>%
add_header_above(c(" " = 1,"Property Damage"=2,"Crop Damage"=2,"Fatalities"=2,"Injuries"=2)) %>%
footnote(general = "m = millions")
Figure 3: Ten Most Harmful Severe Weather Events by Outcome
Property Damage
Crop Damage
Fatalities
Injuries
Rank Event Mean Cost (m) Event Mean Cost (m) Event Mean Fatalities Event Mean Injuries
1 Hurricane/Typhoon 283.58 Hurricane/Typhoon 18.33 Tsunami 1.65 Tsunami 6.45
2 Storm Surge/Tide 93.15 Drought 5.32 Heat 1.11 Hurricane/Typhoon 4.43
3 Tropical Storm 11.07 Ice Storm 2.33 Excessive Heat 1.03 Excessive Heat 3.44
4 Tsunami 7.20 Frost/Freeze 1.33 Rip Current 0.74 Heat 2.46
5 Flood 5.09 Tropical Storm 1.00 Avalanche 0.58 Freezing Fog 2.31
6 Wildfire 2.00 Extreme Cold/Wind Chill 0.67 Hurricane/Typhoon 0.45 Tornado 1.51
7 Ice Storm 1.84 Heat 0.40 Marine Strong Wind 0.29 Ice Storm 1.01
8 Tornado 0.97 Flood 0.37 Cold/Wind Chill 0.23 Dust Storm 1.01
9 Winter Storm 0.59 Excessive Heat 0.25 High Surf 0.16 Rip Current 0.68
10 Coastal Flood 0.51 Cold/Wind Chill 0.13 Extreme Cold/Wind Chill 0.15 Dense Fog 0.59
Note:
m = millions

To construct Figure 3, we rank-ordered (descending) the mean values across the 48 severe weather events for each outcome and listed the first ten ranks for each outcome. The mean value for each event is also given for each outcome in Figure 3. We see that some severe weather events appear in multiple outcome lists. Such events would be of particular interest to stakeholders as these events may impact multiple domains of harm accessed in this analysis.


Limitations

This work has several limitations, including:

  • We performed an exploratory analysis of the data and did not attempt any statistical inference or model-based approaches to answering the posed questions. Providing some measure(s) of uncertainty for our estimates would be crucial in a real-world setting.

  • We did not use any of the available geographic data to account for population size when assessing fatalities and injuries. The metrics of “mean fatalities” and “mean injuries” do not account for population size and other metrics such as “Fatalities per 100,000” may be more meaningful.

  • We did not consider the frequency with which the 48 severe weather events occurred. Consider two severe weather events, A and B. Suppose A occurs much more frequently than B but A is less harmful than B (on some outcome e.g., crop damage). Then it is possible that A, in the long run, costs more in terms of crop damage than B. Our analysis does not account for this difference in A and B, and such information may be valuable to stakeholders. In addition, identifying severe weather events which are increasing or decreasing in frequency over time may be important information.

  • For some of the 48 severe weather events, sample sizes may be low and naive estimators such as the sample mean may be unreliable for such events. As such, an empirical Bayes smoothing adjustment, or similar technique, may be appropriate.