Public Health and Economic Impacts of Severe Weather Events in the USA

Executive Summary

The US Government’s National Oceanic and Atmospheric Administration (NOAA) publishes a database of severe weather events across the country. The database covers 1950 to 2011 and records the fatalities and injuries caused by each event, as well as the associated economic losses from property and crop damage.

Data quality varies across the dataset, with more recent years expected to be more complete. The key findings from this analysis are:

  • The public health impacts of severe weather, as measured by fatalities and injuries, have increased over time.
  • The public health impacts are highly concentrated in a small number of states and in a small number of event types.
  • The economic impacts of severe weather events have increased since the 1990s.
  • The economic impacts are highly concentrated both geographically and temporally.

Data Processing

The severe weather dataset was downloaded and examined. It consists of over 900,000 observations, each with up to 37 variables, including the date, time, and location of the event and the associated casualties and economic damage. The National Weather Service’s [Storm Data Documentation](https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf) was used to clarify the contents.

## Set up work environment and download the data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1     ✔ tibble  3.2.1
## ✔ purrr   1.0.4     ✔ tidyr   1.3.1
## ✔ readr   2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
fileurl <- 
    "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
## One-time download, left commented out; read.csv() below expects the file
## in the working directory as "data.bz2"
## download.file(fileurl, destfile = "data.bz2", mode = "wb")

extract <- read.csv("data.bz2")

Data Transformation

The date on which each event occurred is provided as a character string in the format m/d/yyyy h:mm:ss. The economic damage is provided as two pairs of columns (one pair for property, one for crops), each pair consisting of a numeric value and a scale code for that value. As the documentation did not provide an inventory of the scale codes, only the most commonly used ones were applied: “k” or “K” (multiply by 1,000), “m” or “M” (multiply by 1,000,000), and “b” or “B” (multiply by 1,000,000,000). Damage values carrying any other code are set to zero. The analysis can easily be rerun if the corresponding information becomes available at a later stage. The following transformations were applied to produce a dataset ready for analysis:

  • Extracting the year in which the event occurred.
  • Calculating the (numeric) value of the property and crop damage for each event.
  • Subsetting the data to only include the required fields.
extract$Year <- as.numeric(format(mdy_hms(extract$BGN_DATE), "%Y"))

## Define function to calculate damage $ values
dollarize <- function(input, column1, column2) {
    x <- input[, column1]    ## numeric damage value
    y <- input[, column2]    ## scale code for that value
    symb <- c("k", "K", "m", "M", "b", "B")
    mult <- c(1e3, 1e3, 1e6, 1e6, 1e9, 1e9)
    ## Codes not listed in symb yield NA here; the caller replaces NA with 0
    x * mult[match(y, symb)]
}

extract$PropDamage <- dollarize(extract, 25, 26) %>% replace_na(0)
extract$CropDamage <- dollarize(extract, 27, 28) %>% replace_na(0)
extract$TotalDamage <- extract$PropDamage + extract$CropDamage

## Subset the data needed for further analysis and clean up column names

data <- extract[,c(7, 8, 23, 24, 38, 39, 40, 41)]
colnames(data) <- c("State","Event_Type","fatalities","injuries", "Year",
                    "propertydamage","cropdamage","totaldamage")

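To make the two transformations above concrete, here is a minimal, self-contained sketch that applies the same logic to a hypothetical toy data frame. The rows are invented for illustration and are not NOAA records; the column names BGN_DATE, PROPDMG, and PROPDMGEXP are assumed to match the raw file, and base R is used in place of lubridate and tidyr so that the sketch runs without extra packages.

```r
# Hypothetical toy records (not NOAA data); column names assumed to mirror the raw file
toy <- data.frame(
    BGN_DATE   = c("4/18/1950 0:00:00", "8/29/2005 0:00:00",
                   "1/1/2006 0:00:00", "5/22/2011 0:00:00"),
    PROPDMG    = c(25, 2.5, 1, 10),
    PROPDMGEXP = c("K", "m", "B", "h"),
    stringsAsFactors = FALSE
)

# Year of the event, parsed from the m/d/yyyy h:mm:ss string
parsed <- as.POSIXct(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
toy$Year <- as.numeric(format(parsed, "%Y"))

# Same lookup as dollarize(): scale codes and their multipliers
symb <- c("k", "K", "m", "M", "b", "B")
mult <- c(1e3, 1e3, 1e6, 1e6, 1e9, 1e9)
dollars <- toy$PROPDMG * mult[match(toy$PROPDMGEXP, symb)]

# Codes outside symb (here "h") produce NA, which is then set to 0
toy$PropDamage <- ifelse(is.na(dollars), 0, dollars)

print(toy[, c("Year", "PropDamage")])
```

The last row shows the consequence of using only the common scale codes: an event whose code is not in the lookup contributes zero damage to every total.
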
Results - Public Health Impacts

The public health impacts of severe weather were assessed along three dimensions: event type, state, and year.

total_f <- sum(data$fatalities)
total_i <- sum(data$injuries)

The dataset records a total of 15,145 fatalities and 140,528 injuries.

Event type

The data was grouped by event type, and the top 10 causes of fatalities and injuries are presented below. Casualties are highly concentrated in a small set of event types: tornadoes alone account for 5,633 of the 15,145 fatalities (about 37%) and 91,346 of the 140,528 injuries (about 65%). Event types that cause many fatalities also tend to cause many injuries.

by_event <- data %>% group_by(Event_Type) %>% 
    summarize(Fatalities = sum(fatalities), Injuries = sum(injuries))

head(arrange(by_event[,1:2], desc(Fatalities)),10) ## Top causes of fatalities
## # A tibble: 10 × 2
##    Event_Type     Fatalities
##    <chr>               <dbl>
##  1 TORNADO              5633
##  2 EXCESSIVE HEAT       1903
##  3 FLASH FLOOD           978
##  4 HEAT                  937
##  5 LIGHTNING             816
##  6 TSTM WIND             504
##  7 FLOOD                 470
##  8 RIP CURRENT           368
##  9 HIGH WIND             248
## 10 AVALANCHE             224
head(arrange(by_event[,c(1,3)], desc(Injuries)),10) ## Top causes of injuries
## # A tibble: 10 × 2
##    Event_Type        Injuries
##    <chr>                <dbl>
##  1 TORNADO              91346
##  2 TSTM WIND             6957
##  3 FLOOD                 6789
##  4 EXCESSIVE HEAT        6525
##  5 LIGHTNING             5230
##  6 HEAT                  2100
##  7 ICE STORM             1975
##  8 FLASH FLOOD           1777
##  9 THUNDERSTORM WIND     1488
## 10 HAIL                  1361
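
The degree of concentration can be quantified directly from the fatalities table. The sketch below re-enters the ten counts shown above, together with the dataset total of 15145, and computes each event type's share and the running cumulative share. It uses base R only, and the variable names are illustrative rather than part of the original analysis.

```r
# Top-10 fatality counts copied from the table above
top10 <- c("TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
           "HEAT" = 937, "LIGHTNING" = 816, "TSTM WIND" = 504, "FLOOD" = 470,
           "RIP CURRENT" = 368, "HIGH WIND" = 248, "AVALANCHE" = 224)

# Total fatalities in the dataset, as reported in the text
total_f <- 15145

# Share of all fatalities per event type, and cumulative share down the ranking
share     <- round(100 * top10 / total_f, 1)
cum_share <- round(100 * cumsum(top10) / total_f, 1)

print(data.frame(Event_Type = names(top10),
                 Share_pct = unname(share),
                 Cumulative_pct = unname(cum_share)))
```

The same calculation applies unchanged to the injuries table, using 140528 as the total.
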

State

Similarly, the data was grouped by state. Casualties are again highly concentrated: the ten states with the most fatalities account for 8,073 of the 15,145 fatalities (about 53%).

by_state <- data %>% group_by(State) %>%
    summarize(Fatalities = sum(fatalities), Injuries = sum(injuries))

head(arrange(by_state[,1:2], desc(Fatalities)),10) ## States with highest fatalities
## # A tibble: 10 × 2
##    State Fatalities
##    <chr>      <dbl>
##  1 IL          1421
##  2 TX          1366
##  3 PA           846
##  4 AL           784
##  5 MO           754
##  6 FL           746
##  7 MS           555
##  8 CA           550
##  9 AR           530
## 10 TN           521
head(arrange(by_state[,c(1,3)], desc(Injuries)),10) ## States with highest injuries
## # A tibble: 10 × 2
##    State Injuries
##    <chr>    <dbl>
##  1 TX       17667
##  2 MO        8998
##  3 AL        8742
##  4 OH        7112
##  5 MS        6675
##  6 FL        5918
##  7 OK        5710
##  8 IL        5563
##  9 AR        5550
## 10 TN        5202

Over Time

Plots of casualties over time are presented below. The number of fatalities caused by severe weather events has increased sharply since the 1990s.

by_year <- data %>% group_by(Year) %>% summarize(Fatalities = sum(fatalities), 
                                                 Injuries = sum(injuries))

g1 <- ggplot(by_year, aes(x = Year, y = Fatalities, group = 1))
g1 + geom_line() + theme_minimal() + 
    scale_x_continuous(breaks=c(1950,1960,1970,1980,1990,2000,2010)) + 
    ggtitle("Fatalities Caused by Severe Weather Events in the USA")

g2 <- ggplot(by_year, aes(x = Year, y = Injuries, group = 1))
g2 + geom_line() + theme_minimal() + 
    scale_x_continuous(breaks=c(1950,1960,1970,1980,1990,2000,2010)) + 
    scale_y_continuous(breaks=c(2000, 4000, 6000, 8000, 10000)) + 
    ggtitle("Injuries Caused by Severe Weather Events in the USA")

Results - Economic Impacts

As with the public health impacts, the economic impacts were examined along three dimensions: event type, state, and year.

total_eco <- round(sum(data$totaldamage)/1000000000,1)

The total economic damage associated with severe weather events was 476.4 billion dollars.

Event Type

The top causes of property and crop damage are presented below (damage values are in $ millions). Most of the losses are concentrated in a few event types.

eco_event <- data %>% group_by(Event_Type) %>% 
    summarize(Property_Damage = sum(propertydamage)/1000000,
              Crop_Damage = sum(cropdamage)/1000000)

## Top causes of property damage
head(arrange(eco_event[,1:2], desc(Property_Damage)),10) 
## # A tibble: 10 × 2
##    Event_Type        Property_Damage
##    <chr>                       <dbl>
##  1 FLOOD                     144658.
##  2 HURRICANE/TYPHOON          69306.
##  3 TORNADO                    56937.
##  4 STORM SURGE                43324.
##  5 FLASH FLOOD                16141.
##  6 HAIL                       15732.
##  7 HURRICANE                  11868.
##  8 TROPICAL STORM              7704.
##  9 WINTER STORM                6688.
## 10 HIGH WIND                   5270.
## Top causes of crop damage
head(arrange(eco_event[,c(1,3)], desc(Crop_Damage)),10) 
## # A tibble: 10 × 2
##    Event_Type        Crop_Damage
##    <chr>                   <dbl>
##  1 DROUGHT                13973.
##  2 FLOOD                   5662.
##  3 RIVER FLOOD             5029.
##  4 ICE STORM               5022.
##  5 HAIL                    3026.
##  6 HURRICANE               2742.
##  7 HURRICANE/TYPHOON       2608.
##  8 FLASH FLOOD             1421.
##  9 EXTREME COLD            1293.
## 10 FROST/FREEZE            1094.
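
The tables list some closely related labels separately (for example HURRICANE and HURRICANE/TYPHOON here, or TSTM WIND and THUNDERSTORM WIND in the injuries table), so a ranking by event type can understate a category. One way this could be handled, as a sketch only, is to map such labels to a single name before summing. The mapping below is an assumption made for illustration and is not taken from the NOAA documentation; the figures are property damage values (in $ millions) copied from the table above.

```r
# Property damage in $ millions, copied from the top-10 table above
damage <- c("FLOOD" = 144658, "HURRICANE/TYPHOON" = 69306, "TORNADO" = 56937,
            "STORM SURGE" = 43324, "HURRICANE" = 11868)

# Hypothetical mapping from a variant label to the name it is merged into
canon <- c("HURRICANE" = "HURRICANE/TYPHOON")

# Replace labels found in the mapping; leave all others unchanged
labels <- names(damage)
merged <- ifelse(labels %in% names(canon), unname(canon[labels]), labels)

# Re-aggregate under the merged labels and rank from largest to smallest
totals <- sort(tapply(unname(damage), merged, sum), decreasing = TRUE)
print(totals)
```

In the full analysis the same idea would be applied to the Event_Type column before the group_by() step, with a mapping chosen after reviewing the distinct labels in the data.
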

State

The states with the highest losses can be seen in the following table, with just a few states accounting for a large share of the total losses (damage values are in $ millions).

eco_state <- data %>% group_by(State) %>%
    summarize(Property_Damage = sum(propertydamage)/1000000, 
              Crop_Damage = sum(cropdamage)/1000000)

## States with highest property damage
head(arrange(eco_state[,1:2], desc(Property_Damage)),10) 
## # A tibble: 10 × 2
##    State Property_Damage
##    <chr>           <dbl>
##  1 CA            123588.
##  2 LA             60074.
##  3 FL             41510.
##  4 MS             29810.
##  5 TX             26642.
##  6 AL             17241.
##  7 IL              8537.
##  8 NC              8230.
##  9 MO              7179.
## 10 OH              6856.
## States with highest crop damage
head(arrange(eco_state[,c(1,3)], desc(Crop_Damage)),10) 
## # A tibble: 10 × 2
##    State Crop_Damage
##    <chr>       <dbl>
##  1 TX          7301.
##  2 MS          6610.
##  3 IL          5551.
##  4 IA          4697.
##  5 FL          3903.
##  6 CA          3528.
##  7 NE          2171.
##  8 NC          2054.
##  9 LA          1229.
## 10 OK          1207.

Over Time

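As can be seen in the following plot, the economic damage associated with severe weather events has increased dramatically since the 1990s. In particular, 2006 (about 125.5 billion dollars) and 2005 (about 100.8 billion dollars) stand out, together accounting for almost half of the 476.4 billion dollar total.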

eco_year <- data %>% group_by(Year) %>% 
    summarize(Total_Damage = sum(totaldamage)/1000000)

g3 <- ggplot(eco_year, aes(x= Year, y = Total_Damage))
g3 + geom_line() + theme_minimal() + 
    scale_x_continuous(breaks=c(1950,1960,1970,1980,1990,2000,2010)) + 
    ylab("Total Economic Damage (million $)") + 
    ggtitle("Economic Damages from Severe Weather Events in the USA")

head(arrange(eco_year, desc(Total_Damage)),5)
## # A tibble: 5 × 2
##    Year Total_Damage
##   <dbl>        <dbl>
## 1  2006      125472.
## 2  2005      100825.
## 3  2004       26799.
## 4  1993       21987.
## 5  2011       21556.
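
As a quick check on how concentrated the damage is in time, the yearly totals listed above can be compared with the overall total of 476.4 billion dollars reported earlier. The sketch below copies the five values from the table (in $ millions) and computes each year's share of the total. It uses base R only, and the variable names are illustrative.

```r
# Five largest yearly totals in $ millions, copied from the table above
top_years <- c("2006" = 125472, "2005" = 100825, "2004" = 26799,
               "1993" = 21987, "2011" = 21556)

# Overall total from the text: 476.4 billion dollars, expressed in $ millions
total_musd <- 476.4 * 1000

# Each year's share of the total, in percent
share <- round(100 * top_years / total_musd, 1)

# Combined share of the two largest years
top2_share <- round(100 * sum(top_years[c("2006", "2005")]) / total_musd, 1)

print(share)
print(top2_share)
```
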

Note to reader: The above analysis can be easily replicated for narrower geographical scopes (e.g., at state or county level). Please contact the author of the report to request bespoke analysis.
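
A state-level version of the analysis might look like the sketch below. The function name summarise_state is hypothetical, and the toy data frame holds placeholder values (not NOAA figures) under the same column names as the analysis dataset. Base R is used so that the sketch runs without extra packages.

```r
# Hypothetical toy data with the same column names as `data`; values are placeholders
toy <- data.frame(
    State       = c("TX", "TX", "TX", "FL", "FL"),
    Event_Type  = c("TORNADO", "FLOOD", "TORNADO", "HURRICANE", "FLOOD"),
    fatalities  = c(3, 1, 2, 5, 0),
    injuries    = c(20, 4, 11, 30, 2),
    totaldamage = c(5e6, 2e6, 1e6, 9e8, 3e5),
    stringsAsFactors = FALSE
)

# Summarise casualties and damage by event type for a single state
summarise_state <- function(df, state) {
    sub <- df[df$State == state, ]
    out <- aggregate(cbind(fatalities, injuries, totaldamage) ~ Event_Type,
                     data = sub, FUN = sum)
    out[order(-out$fatalities), ]   # most fatalities first
}

tx <- summarise_state(toy, "TX")
print(tx)
```

A county-level version would additionally need a county column, which is not among the fields kept in the subsetting step above.
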