Analysis of the most harmful and economically damaging weather events in the United States

The following analysis considers the data in the NOAA Storm Database to answer two key questions:

  1. Which types of weather events are most harmful with respect to population health, and
  2. Which types of events have the greatest economic consequences?

The data ranges from 1950 to 2011. There are known data issues, including missing data (more prevalent in earlier years), imprecise weather event definitions (e.g. “flood” and “river flood” are both coded), and typographical errors (e.g. “avalanch”).
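One way the labeling issues above could be reduced before aggregation is to normalize the event labels. The sketch below is illustrative only and is not applied in the analysis that follows. The helper `normalize_evtype`, its small correction table, and the sample labels are all hypothetical; in particular, folding “river flood” into “flood” is an assumption about how the categories should be merged.

```r
# Hypothetical helper: normalize free-text event labels (not part of the analysis below)
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))   # unify case, drop surrounding whitespace
  x <- gsub("\\s+", " ", x)               # collapse repeated internal whitespace
  # Illustrative corrections only; a real table would be built by inspecting the data
  fixes <- c("AVALANCH" = "AVALANCHE", "RIVER FLOOD" = "FLOOD")
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

# Made-up labels for demonstration
sample_labels <- c("Avalanch", " flood", "RIVER  FLOOD", "TORNADO")
normalize_evtype(sample_labels)
```

A lookup table like `fixes` keeps every merge decision explicit and reviewable, which matters because each merge changes the totals reported per event type.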

Data Processing

The data is downloaded directly from a CloudFront-hosted repository, decompressed, and loaded into R. Transformations are then performed to calculate damages (in raw form, each is stored as a number plus a magnitude suffix: K for thousand, M for million, B for billion), to summarize totals by type of weather event, and to reshape the data from wide to long format for plotting.

Two additional fields are calculated to facilitate interpretation: “Casualties”, the sum of injuries and fatalities, and “Economic Damage”, the sum of property and crop damages.
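As a small worked example of the suffix conversion, the sketch below applies the same idea to a toy data frame using base R only. The records are made up for illustration, and `toy`, `suffix_value` and `mult` are hypothetical names that do not appear in the analysis itself.

```r
# Toy records (made up), mimicking the PROPDMG / PROPDMGEXP columns
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Dollar multiplier for each suffix; anything unrecognized maps to 0
suffix_value <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- unname(suffix_value[toy$PROPDMGEXP])
mult[is.na(mult)] <- 0

# Express damages in millions of dollars
toy$PROPDMGVAL <- toy$PROPDMG * mult / 1e6
toy
```

Here 25 with suffix K becomes 0.025 million dollars, 2.5 with suffix M stays 2.5, and 1 with suffix B becomes 1000. The record with no suffix contributes 0.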

knitr::opts_chunk$set(cache = TRUE) # cache chunks because downloading and processing the data is slow

# Loading required libraries
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(reshape2)
library(ggplot2)

# Downloading compressed file from source, importing to R
data_temp <- tempfile()
download.file(
     "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
     data_temp
)

StormRaw <- read.csv(data_temp)
unlink(data_temp)

# Damage values are expressed in units of DMGMULTIPLIER dollars (1e6 = millions of USD)
DMGMULTIPLIER <- 1000000

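# Processing the raw data into a tidy set of data for further analysis.
# Only the suffixes "K", "M" and "B" are recognized (case-sensitive match);
# any other exponent code, including lowercase letters or digits, yields a multiplier of 0.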
StormProcessed <- StormRaw %>%
     mutate(PROPDMGMULT = case_when ( grepl("K",PROPDMGEXP,fixed=TRUE) ~ 1000/DMGMULTIPLIER,
               grepl("M",PROPDMGEXP,fixed=TRUE) ~ 1000000/DMGMULTIPLIER,
               grepl("B",PROPDMGEXP,fixed=TRUE) ~ 1000000000/DMGMULTIPLIER,
               TRUE ~ 0 ),
            CROPDMGMULT = case_when ( grepl("K",CROPDMGEXP,fixed=TRUE) ~ 1000/DMGMULTIPLIER,
               grepl("M",CROPDMGEXP,fixed=TRUE) ~ 1000000/DMGMULTIPLIER,
               grepl("B",CROPDMGEXP,fixed=TRUE) ~ 1000000000/DMGMULTIPLIER,
               TRUE ~ 0 ),
            PROPDMG = if_else(is.na(PROPDMG),0,PROPDMG),
            CROPDMG = if_else(is.na(CROPDMG),0,CROPDMG),
            PROPDMGVAL = PROPDMG * PROPDMGMULT,
            CROPDMGVAL = CROPDMG * CROPDMGMULT,
            EventType = EVTYPE) %>%
     group_by(EventType) %>%
     summarize(Fatalities = sum(FATALITIES),
               Injuries = sum(INJURIES),
               PropertyDamage = sum(PROPDMGVAL),
               CropDamage = sum(CROPDMGVAL)) %>%
     mutate( Casualties = Fatalities + Injuries,
             EconomicDamage = PropertyDamage + CropDamage) %>%
     select(EventType,Fatalities,Injuries,Casualties,PropertyDamage,CropDamage,EconomicDamage)

# Convert processed data to long format for plotting
StormProcessedPopHealth <- StormProcessed %>%
     select(EventType,Fatalities,Injuries,Casualties) %>%
     top_n(n = 20,wt = Casualties) %>%
     select(EventType,Fatalities,Injuries)
StormProcessedPopHealth <- melt(StormProcessedPopHealth,id.vars = "EventType")

StormProcessedEconomic <- StormProcessed %>%
     select(EventType,PropertyDamage,CropDamage,EconomicDamage) %>%
     top_n(n = 20,wt = EconomicDamage) %>%
     select(EventType,PropertyDamage,CropDamage)
StormProcessedEconomic <- melt(StormProcessedEconomic,id.vars = "EventType")
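To make the wide-to-long step concrete, the sketch below runs `melt` from reshape2 on a toy summary table. The event names and counts are made up for illustration, and `wide` and `long` are hypothetical names.

```r
library(reshape2)

# Toy summary table (made-up numbers), one row per event type
wide <- data.frame(
  EventType  = c("A", "B"),
  Fatalities = c(1, 2),
  Injuries   = c(10, 20),
  stringsAsFactors = FALSE
)

# One row per (EventType, measure) pair:
# the measure name lands in `variable`, the count in `value`
long <- melt(wide, id.vars = "EventType")
long
```

The long table has one row per event type and measure, which is the shape ggplot2 needs to map `variable` to the fill color of a stacked bar.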

Results

The most harmful weather events can be identified by the total number of individuals injured or killed. The following plot shows that tornadoes are by far the most harmful weather event type in aggregate. Excessive heat, wind, flood and lightning make up a second tier of harmful weather event types.

# Create plots answering key questions
ggplot(StormProcessedPopHealth, aes(x = reorder(EventType,-value),
                                    y = value,
                                    fill=variable)
       ) +
     geom_bar(stat = "identity") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1),
           legend.position = c(0.9, 0.8)) +
     labs(title = "Top 20 Event Types by Total Casualties",
          caption = "Based on data from the NOAA Storm Database from 1950 to 2011",
          x = "",
          y = "Total Casualties",
          fill = "Casualty Type"
          )

The most economically damaging weather events can be identified by the sum of the property and crop damages they cause. The following plot shows that floods cause more economic damage than any other weather event type. Hurricanes, tornadoes and storm surges make up a second tier of economically damaging event types. Drought stands out for causing more damage to crops than any other event type while causing comparatively little property damage.

ggplot(StormProcessedEconomic, aes(x = reorder(EventType,-value),
                                    y = value,
                                    fill=variable)
) +
     geom_bar(stat = "identity") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1),
           legend.position = c(0.9, 0.8)) +
     labs(title = "Top 20 Event Types by Total Economic Damage",
          caption = "Based on data from the NOAA Storm Database from 1950 to 2011",
          x = "",
          y = "Total Damage (millions of USD)",
          fill = "Damage Type"
     )