The Impact of Major Weather Events on the US Economy and Public Health

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe weather events result in fatalities, injuries, property damage, and crop damage, and preventing such outcomes to the extent possible is a key concern.

This paper explores the National Oceanic and Atmospheric Administration (NOAA) storm dataset. The dataset tracks the type and characteristics of major storms and weather events in the USA, including but not limited to where they occur, when they occur, and estimates of any fatalities, injuries, and property damage. The dataset runs from 1950 to November 2011.

Aim

The aim of this paper is to attempt to answer the following two key questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Based on the answers to these questions, decisions can be made on the most effective way to spend available funds to minimise the impact of storms and other severe weather events.

Executive Summary of Results

In response to the question “Across the United States, which types of events are most harmful with respect to population health?”, the analysis seems to show that tornadoes have the most impact on public health, followed by excessive heat.

In response to the question “Across the United States, which types of events have the greatest economic consequences?”, flooding seems to have the biggest effect on the economy when impact is measured using the estimated cost of the damage caused.

As advised in the Methodology section, further analysis is required to ensure the findings are robust and accurate.

Methodology

A potential method of calculating the effect on public health would be to use the Years of potential life lost (YPLL) metric (4). Using this metric, an estimate of the years of life lost due to fatalities can be calculated for a given storm event, and this can be added to the average number of months lost due to a specific type of injury during a storm event. Combined, these two measures give an estimate of the overall impact of a storm event on the health of a population. The dataset provided for analysis and used in this paper doesn’t appear to have the required granularity to complete the analysis in this manner.
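
As a minimal sketch of the YPLL idea described above, the following R snippet sums the years lost relative to a reference age. The reference age of 75, the ages at death and the function name `ypll` are illustrative assumptions; the storm dataset used in this paper does not record the ages of victims.

```r
# Years of potential life lost (YPLL): sum of (reference age - age at death),
# counting only deaths that occur before the reference age.
# ASSUMPTION: the reference age of 75 and the ages below are illustrative only.
ypll <- function(age_at_death, reference_age = 75) {
  sum(pmax(reference_age - age_at_death, 0))
}

ages <- c(30, 60, 80)   # hypothetical ages at death for one storm event
ypll(ages)              # 45 + 15 + 0 = 60
```

A death after the reference age contributes zero rather than a negative value, which is why `pmax` is used.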

Based on the data available and due to time constraints for this paper, a simple health impact metric (SHIM) will be constructed which weights injuries and fatalities by their perceived impact on the overall health of the population. An injury will be counted as one unit within the metric, and a fatality will be counted as twenty units. The fatality weighting is an arbitrary multiplier, applied on the grounds that a fatality has a greater impact on the health of a population than an injury. This metric will then be summed across all storms reported, grouped by the type of weather event. The data will then be sorted, and the event type with the largest SHIM will be considered the most impactful with regard to health.
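
The weighting can be checked by hand. The sketch below wraps the metric in a small helper (the name `shim` and the `fatality_weight` parameter are introduced here for illustration only) and applies it to the tornado and excessive heat totals reported in the Results tables of this paper.

```r
# Simple health impact metric (SHIM):
# one unit per injury, twenty units per fatality.
shim <- function(fatalities, injuries, fatality_weight = 20) {
  fatalities * fatality_weight + injuries
}

# Totals taken from the Results tables of this paper
shim(fatalities = 5633, injuries = 91346)  # TORNADO:        204006
shim(fatalities = 1903, injuries = 6525)   # EXCESSIVE HEAT:  44585
```

Keeping the weight as a parameter makes it easy to see how sensitive the ranking is to the arbitrary choice of twenty.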

A potential method of calculating the cost to the economy is to take the cost of damage due to a particular event and combine it with location. The cost can then be weighted by location, with damage to major metropolitan areas, infrastructure and industrial facilities weighted as having a larger impact on the overall economy than damage to rural areas. The justification is that damage to a major metropolitan area like New York City will have a greater impact on the economy than damage to a small town with limited output.
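
A rough sketch of such a location weighting is shown below. The location classes, the weights and the damage figures are all invented for illustration; the paper does not define them, and the dataset as used here does not classify locations in this way.

```r
# ASSUMPTION: hypothetical location classes and weights, for illustration only
location_weight <- c(metropolitan = 1.5, industrial = 1.3, rural = 1.0)

# ASSUMPTION: made-up events with damage already expressed in USD
events <- data.frame(
  damage_usd = c(1e6, 1e6, 2e6),
  location   = c("metropolitan", "rural", "industrial"),
  stringsAsFactors = FALSE
)

# Weighted cost = raw damage scaled by the weight of the affected location
events$weighted_usd <- events$damage_usd * unname(location_weight[events$location])
events
```

With these weights, the same one million USD of damage counts for more in a metropolitan area than in a rural one.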

Based on the data available and due to time constraints, the economic impact of a storm or weather event will be determined by summing the cost of damage to property and crops. The total damage per storm in USD will be calculated and then summed across all storms reported, grouped by the type of weather event. The data will then be sorted, and the event type with the largest sum total will be considered the most impactful.
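
The calculation described above can be sketched on a toy data frame. The records and the column names `PROPUSD` and `CROPUSD` are made up for illustration and assume the damage amounts have already been converted to USD.

```r
# ASSUMPTION: toy records with invented figures, already expressed in USD
storms <- data.frame(
  EVTYPE  = c("FLOOD", "TORNADO", "FLOOD"),
  PROPUSD = c(25000, 2500, 1000000),
  CROPUSD = c(0, 500, 50000),
  stringsAsFactors = FALSE
)

# Total damage per storm, then summed by event type
storms$TotalCost <- storms$PROPUSD + storms$CROPUSD
totals <- aggregate(TotalCost ~ EVTYPE, data = storms, FUN = sum)

# Sort so that the most costly event type comes first
totals <- totals[order(-totals$TotalCost), ]
totals
```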

The methods selected have a number of limitations: location weighting is not applied, Years of potential life lost (YPLL) caused by injury and fatality is not estimated, the effect of health on economic output is not taken into account, and the damage cost is not weighted.

A follow-on analysis is recommended, obtaining data to calculate an estimate of Years of potential life lost (YPLL) caused by injury and fatality and using actual data to inform the weightings. The data guide contains useful information on the typical types of injuries that occur due to a storm or weather event. It should also be considered that injury and fatality will have an economic impact due to the loss of output from an injured or deceased member of a population.

Data Processing

The following pre-processing is not specific to either question; it is required for both analyses.

#Install packages
#install.packages("ggplot2")
#install.packages("plyr")
#install.packages("R.utils")
#load libraries we will need later
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.2
library(plyr)
## Warning: package 'plyr' was built under R version 3.2.2
library(R.utils)
## Warning: package 'R.utils' was built under R version 3.2.2
## Loading required package: R.oo
## Warning: package 'R.oo' was built under R version 3.2.2
## Loading required package: R.methodsS3
## Warning: package 'R.methodsS3' was built under R version 3.2.2
## R.methodsS3 v1.7.0 (2015-02-19) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.19.0 (2015-02-27) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v2.1.0 (2015-05-27) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
#Download the file
file.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file.dest <- "./dataStorm/repdata_data_StormData.csv.bz2"
file.bzfile <- "./dataStorm/repdata_data_StormData.csv.bz2"
file.dataset <- "./dataStorm/repdata_data_StormData.csv"

#Check if data directory exists
if(!dir.exists("./dataStorm")) 
{
        dir.create("dataStorm", showWarnings = TRUE, recursive = FALSE, mode = "0777")
}

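# Download from the URL only when the archive is missing
# (the original call was disabled because it did not run when knitting the markdown;
# mode = "wb" keeps the binary archive intact on Windows)
if(!file.exists(file.dest)) 
{
        download.file(file.url, file.dest, mode = "wb", quiet = TRUE)
}

#Unzip the archive into the dataStorm folder in the working directory
bunzip2(file.bzfile, overwrite=TRUE, remove=FALSE, destname=file.dataset)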

#read the csv file
repdata <- read.csv(file=file.dataset, header=TRUE, sep=",",na.strings = "NA"  )

#Subset what we need for the analysis based on the cookbook
harm <- repdata[,c('STATE', 'EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP','CROPDMG', 'CROPDMGEXP')]

#Let's look at the data first to understand what's there
head(harm)
##   STATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1    AL TORNADO          0       15    25.0          K       0           
## 2    AL TORNADO          0        0     2.5          K       0           
## 3    AL TORNADO          0        2    25.0          K       0           
## 4    AL TORNADO          0        2     2.5          K       0           
## 5    AL TORNADO          0        2     2.5          K       0           
## 6    AL TORNADO          0        6     2.5          K       0
#Clean the Event Type up 
harm$EVTYPE <- toupper(harm$EVTYPE)
#Additional cleansing and standardization is recommended but not applied due to time constraints

#Check for missing data
sum (is.na (harm))
## [1] 0
#Produce initial set of summary statistics
summary(harm)
##      STATE           EVTYPE            FATALITIES      
##  TX     : 83728   Length:902297      Min.   :  0.0000  
##  KS     : 53440   Class :character   1st Qu.:  0.0000  
##  OK     : 46802   Mode  :character   Median :  0.0000  
##  MO     : 35648                      Mean   :  0.0168  
##  IA     : 31069                      3rd Qu.:  0.0000  
##  NE     : 30271                      Max.   :583.0000  
##  (Other):621339                                        
##     INJURIES            PROPDMG          PROPDMGEXP        CROPDMG       
##  Min.   :   0.0000   Min.   :   0.00          :465934   Min.   :  0.000  
##  1st Qu.:   0.0000   1st Qu.:   0.00   K      :424665   1st Qu.:  0.000  
##  Median :   0.0000   Median :   0.00   M      : 11330   Median :  0.000  
##  Mean   :   0.1557   Mean   :  12.06   0      :   216   Mean   :  1.527  
##  3rd Qu.:   0.0000   3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000  
##  Max.   :1700.0000   Max.   :5000.00   5      :    28   Max.   :990.000  
##                                        (Other):    84                    
##    CROPDMGEXP    
##         :618413  
##  K      :281832  
##  M      :  1994  
##  k      :    21  
##  0      :    19  
##  B      :     9  
##  (Other):     9
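
The code comments above recommend further cleansing of the event types, which was not applied in this paper. A minimal sketch of what that might look like is given below; the helper name `standardise_evtype` and the short list of synonyms are assumptions chosen for illustration, and none of this was used to produce the reported results.

```r
# ASSUMPTION: this short list of synonyms is illustrative, not exhaustive,
# and was not applied to produce the results in this paper.
standardise_evtype <- function(x) {
  x <- toupper(trimws(x))        # normalise case and outer whitespace
  x <- gsub("\\s+", " ", x)      # collapse repeated spaces
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)
  x
}

standardise_evtype(c(" tstm wind", "Rip Currents", "FLOOD"))
# "THUNDERSTORM WIND" "RIP CURRENT" "FLOOD"
```

Merging labels like these would combine rows that appear separately in the Results tables, so it could change the rankings.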

The following data processing is relevant to the question posed, “Across the United States, which types of events are most harmful with respect to population health?”, in that it prepares and aggregates the data used to answer that question.

#Pre Processing for the health impact
#Create the aggregates and the weighting
harmtotal <- aggregate(harm[, c('FATALITIES','INJURIES')], by=list(harm$EVTYPE), FUN=sum, na.rm=TRUE)
harmtotal$FATALITIESWEIGHTED <- harmtotal$FATALITIES * 20
harmtotal$Total <- harmtotal$FATALITIESWEIGHTED + harmtotal$INJURIES
harmtotal <- rename(harmtotal, c("Group.1"="Event"))

#Get the top 15 fatalities, injury and weighted health metric
fatal15 <- harmtotal[order(-harmtotal$FATALITIES),][1:15, ]
injury15 <- harmtotal[order(-harmtotal$INJURIES),][1:15, ]
weighted15 <- harmtotal[order(-harmtotal$Total),][1:15, ]

The following data processing is relevant to the question posed, “Across the United States, which types of events have the greatest economic consequences?”, in that it prepares and aggregates the data used to answer that question.

#Get unique values for property multiplier
unique(harm$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
#Set the multiplier values
harm$PROPMULT[harm$PROPDMGEXP ==  "K"]  <-  1000
harm$PROPMULT[harm$PROPDMGEXP == "M"]   <-  10 ^ 6
harm$PROPMULT[harm$PROPDMGEXP == ""]   <-   1
harm$PROPMULT[harm$PROPDMGEXP == "B"]   <-  10 ^ 9
harm$PROPMULT[harm$PROPDMGEXP == "m"]   <-  10 ^ 6
harm$PROPMULT[harm$PROPDMGEXP == "+"]   <-  0
harm$PROPMULT[harm$PROPDMGEXP == "0"]   <-  1
harm$PROPMULT[harm$PROPDMGEXP == "5"]   <-  10 ^ 5
harm$PROPMULT[harm$PROPDMGEXP == "6"]   <-  10 ^ 6
harm$PROPMULT[harm$PROPDMGEXP == "?"]   <-  0
harm$PROPMULT[harm$PROPDMGEXP == "4"]   <-  10000
harm$PROPMULT[harm$PROPDMGEXP == "2"]   <-  100
harm$PROPMULT[harm$PROPDMGEXP == "3"]   <-  1000
harm$PROPMULT[harm$PROPDMGEXP == "h"]   <-  100
harm$PROPMULT[harm$PROPDMGEXP == "7"]   <-  10 ^ 7
harm$PROPMULT[harm$PROPDMGEXP == "H"]   <-  100
harm$PROPMULT[harm$PROPDMGEXP == "-"]   <-  0
harm$PROPMULT[harm$PROPDMGEXP == "1"]   <-  10
harm$PROPMULT[harm$PROPDMGEXP == "8"]   <-  10 ^ 8

#Check for NA's to ensure that all multipliers have been mapped
sum(is.na(harm$PROPMULT))
## [1] 0
#Get unique values for crop multiplier
unique(harm$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
#Apply crop multipliers
harm$CROPMULT[harm$CROPDMGEXP == ""]   <-   1
harm$CROPMULT[harm$CROPDMGEXP == "M"]   <-  10 ^ 6
harm$CROPMULT[harm$CROPDMGEXP ==  "K"]  <-  1000
harm$CROPMULT[harm$CROPDMGEXP == "m"]   <-  10 ^ 6
harm$CROPMULT[harm$CROPDMGEXP == "B"]   <-  10 ^ 9
harm$CROPMULT[harm$CROPDMGEXP == "?"]   <-  0
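#Note: a crop code of 0 is mapped to a multiplier of 0 here, whereas a property code of 0 is mapped to 1 above
harm$CROPMULT[harm$CROPDMGEXP == "0"]   <-  0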
harm$CROPMULT[harm$CROPDMGEXP == "k"]   <-  1000
harm$CROPMULT[harm$CROPDMGEXP == "2"]   <-  100

#Check for NA's to ensure that all multipliers have been mapped
sum(is.na(harm$CROPMULT))
## [1] 0
#Now we go and multiply the values out and create a total
harm$CROPValue <- harm$CROPMULT * harm$CROPDMG
harm$PROPValue <- harm$PROPMULT * harm$PROPDMG
harm$TotalCost <- harm$CROPValue + harm$PROPValue

#Create a sum table
costtotal <- aggregate(harm[, c('CROPValue','PROPValue', 'TotalCost')], by=list(harm$EVTYPE), FUN=sum, na.rm=TRUE)
costtotal <- rename(costtotal, c("Group.1"="Event"))

#Get the top 15 values for crop destruction, property destruction and the total
crop15 <- costtotal[order(-costtotal$CROPValue),][1:15, ]
Property15 <- costtotal[order(-costtotal$PROPValue),][1:15, ]
TotalDamage15 <- costtotal[order(-costtotal$TotalCost),][1:15, ]
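
The one-assignment-per-code mapping used above could also be written with a named lookup vector. The sketch below is an alternative under stated assumptions, not the code used in this paper: it covers only a handful of the codes, and the names `prop_lookup`, `codes`, `amount` and `mult` are hypothetical.

```r
# Lookup from exponent code to multiplier, using the same values as the
# property mapping in this paper but abbreviated to a few codes
prop_lookup <- c("K" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9, "H" = 100, "h" = 100)

codes  <- c("K", "M", "", "B")   # example PROPDMGEXP values
amount <- c(25, 2.5, 10, 1)      # example PROPDMG values

mult <- unname(prop_lookup[codes])  # an empty code never matches a name, so it gives NA
mult[codes == ""] <- 1              # blank code: amount is taken as plain USD
amount * mult                       # 25000, 2500000, 10 and 1e9 USD
```

If the code column is stored as a factor, as it is in the `summary` output above, it would need converting with `as.character()` before being used as an index; otherwise the lookup would use the factor's integer codes.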

Results

Impact on public health

The question set out was “Across the United States, which types of events are most harmful with respect to population health?”

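The following table lists the 15 event types with the most fatalities, in descending order of fatalities. The columns displayed are the injury total and the weighted fatality total (fatalities multiplied by 20) for each event type. The event types at the top of the table are those with the most fatalities, and they therefore cause the most harm in relation to fatalities.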

fatal15[,c(1,3,4)]
##                 Event INJURIES FATALITIESWEIGHTED
## 758           TORNADO    91346             112660
## 116    EXCESSIVE HEAT     6525              38060
## 138       FLASH FLOOD     1777              19560
## 243              HEAT     2100              18740
## 418         LIGHTNING     5230              16320
## 779         TSTM WIND     6957              10080
## 154             FLOOD     6789               9400
## 524       RIP CURRENT      232               7360
## 320         HIGH WIND     1137               4960
## 19          AVALANCHE      170               4480
## 888      WINTER STORM     1321               4120
## 525      RIP CURRENTS      297               4080
## 245         HEAT WAVE      379               3440
## 125      EXTREME COLD      231               3240
## 685 THUNDERSTORM WIND     1488               2660

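The following table lists the 15 event types with the most injuries, in descending order of injuries. The column displayed is the fatality total for each of these event types; the injury totals used for the ordering are plotted in the bar chart that follows. The event types at the top of the table are those with the most injuries, and they therefore cause the most impact due to recovery time lost by the population.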

injury15[,c(1,2)]
##                 Event FATALITIES
## 758           TORNADO       5633
## 779         TSTM WIND        504
## 154             FLOOD        470
## 116    EXCESSIVE HEAT       1903
## 418         LIGHTNING        816
## 243              HEAT        937
## 387         ICE STORM         89
## 138       FLASH FLOOD        978
## 685 THUNDERSTORM WIND        133
## 212              HAIL         15
## 888      WINTER STORM        206
## 372 HURRICANE/TYPHOON         64
## 320         HIGH WIND        248
## 274        HEAVY SNOW        127
## 875          WILDFIRE         75

The following is a graphical representation of the top 15 injury and actual fatality data in the form of bar charts. Please note that the weighted value is not shown here.

#Set parameters to have a panel plot; margins are set through par()
#because mar is not an argument of barplot()
par(mfrow = c(1, 2), mar = c(10, 5, 5, 2))

#Injuries
barplot(injury15$INJURIES,
        names.arg=injury15$Event, 
        ylim= c(0,max(injury15$INJURIES) + 20000),
        col=heat.colors(15),
        cex.names= 0.8,
        las = 2, 
        ylab="Injuries Per Event Type", 
        main="Health Impact Due to Injuries")

#Fatalities (unweighted counts, with the axis scaled to the fatality data)
barplot(fatal15$FATALITIES,
        names.arg=fatal15$Event, 
        ylim= c(0,max(fatal15$FATALITIES) + 1000),
        col=heat.colors(15),
        cex.names= 0.8,
        las = 2, 
        ylab="Fatalities Per Event Type", 
        main="Health Impact Due to Fatalities")

The following data table details the top 15 summed weighted fatalities and injuries (i.e. the SHIM) grouped by Storm Event Type. The values at the top of the table are those with the largest SHIM and therefore the greatest overall impact on health.

weighted15[,c(1,5)]
##                 Event  Total
## 758           TORNADO 204006
## 116    EXCESSIVE HEAT  44585
## 418         LIGHTNING  21550
## 138       FLASH FLOOD  21337
## 243              HEAT  20840
## 779         TSTM WIND  17037
## 154             FLOOD  16189
## 524       RIP CURRENT   7592
## 320         HIGH WIND   6097
## 888      WINTER STORM   5441
## 19          AVALANCHE   4650
## 525      RIP CURRENTS   4377
## 685 THUNDERSTORM WIND   4148
## 245         HEAT WAVE   3819
## 387         ICE STORM   3755

The following is a graphical representation of the top 15 summed weighted fatalities and injuries (i.e. the SHIM) grouped by Storm Event Type. This gives the overall impact on health within the population per storm event type.

#Weighted (margins are set through par() because mar is not an argument of barplot())
par(mfrow = c(1, 1), mar = c(10, 5, 5, 2))
barplot(weighted15$Total,
        names.arg=weighted15$Event, 
        ylim= c(0,max(weighted15$Total) + 5000),
        col=heat.colors(15),
        cex.names= 0.8,
        las = 2, 
        ylab="Simple Health Impact Metric (SHIM)", 
        main="Aggregated Health Impact due to Injuries and Weighted Fatalities")

Impact on the Economy

The question set out was “Across the United States, which types of events have the greatest economic consequences?”

The following data table shows the estimated value of damage to crops per storm event type. The event type with the largest value caused the greatest amount of damage.

crop15[,c(1,2)]
##                 Event   CROPValue
## 84            DROUGHT 13972566000
## 154             FLOOD  5661968450
## 529       RIVER FLOOD  5029459000
## 387         ICE STORM  5022113500
## 212              HAIL  3025954453
## 363         HURRICANE  2741910000
## 372 HURRICANE/TYPHOON  2607872800
## 138       FLASH FLOOD  1421317100
## 125      EXTREME COLD  1312973000
## 187      FROST/FREEZE  1094186000
## 254        HEAVY RAIN   733399800
## 772    TROPICAL STORM   678346000
## 320         HIGH WIND   638571300
## 779         TSTM WIND   554007350
## 116    EXCESSIVE HEAT   492402000

The following data table shows the estimated value of damage to property per storm event type. The event type with the largest value caused the greatest amount of damage.

Property15[,c(1,3)]
##                 Event    PROPValue
## 154             FLOOD 144657709807
## 372 HURRICANE/TYPHOON  69305840000
## 758           TORNADO  56947380617
## 599       STORM SURGE  43323536000
## 138       FLASH FLOOD  16822673979
## 212              HAIL  15735267513
## 363         HURRICANE  11868319010
## 772    TROPICAL STORM   7703890550
## 888      WINTER STORM   6688497251
## 320         HIGH WIND   5270046260
## 529       RIVER FLOOD   5118945500
## 875          WILDFIRE   4765114000
## 600  STORM SURGE/TIDE   4641188000
## 779         TSTM WIND   4484958495
## 387         ICE STORM   3944927860

The following data table shows the sum of the estimated value of damage to property and crops per storm event type. The event type with the largest value caused the greatest amount of damage.

TotalDamage15[,c(1,4)]
##                 Event    TotalCost
## 154             FLOOD 150319678257
## 372 HURRICANE/TYPHOON  71913712800
## 758           TORNADO  57362333727
## 599       STORM SURGE  43323541000
## 212              HAIL  18761221966
## 138       FLASH FLOOD  18243991079
## 84            DROUGHT  15018672000
## 363         HURRICANE  14610229010
## 529       RIVER FLOOD  10148404500
## 387         ICE STORM   8967041360
## 772    TROPICAL STORM   8382236550
## 888      WINTER STORM   6715441251
## 320         HIGH WIND   5908617560
## 875          WILDFIRE   5060586800
## 779         TSTM WIND   5038965845

The following is a graphical representation of the sum of the estimated value of damage to property and crops per storm event type. This provides an overall measure of the impact on the economy for each event type.

#Margins are set through par() because mar is not an argument of barplot()
par(mar = c(10, 5, 5, 2))
barplot(TotalDamage15$TotalCost,
        names.arg=TotalDamage15$Event, 
        ylim= c(0,max(TotalDamage15$TotalCost) + 5000),
        col=heat.colors(15),
        cex.names= 0.8,
        las = 2, 
        ylab="Property and Crop Damage USD", 
        main="Top 15 Storm Events for Property and Crop Damage")
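
Because the totals run to hundreds of billions of USD, the axis can be easier to read when the values are expressed in billions. The sketch below illustrates this using the top three totals from the table above; the variable names `total_cost` and `total_bn` are introduced here for illustration, and this is an optional presentation choice rather than part of the original analysis.

```r
# Top three totals from the table above, in USD
total_cost <- c(FLOOD = 150319678257,
                "HURRICANE/TYPHOON" = 71913712800,
                TORNADO = 57362333727)

# Express in billions of USD so the axis labels stay short
total_bn <- total_cost / 1e9
round(total_bn, 1)   # 150.3, 71.9 and 57.4

barplot(total_bn, las = 2, cex.names = 0.8,
        ylab = "Property and Crop Damage (USD billions)")
```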

Citations and references

  1. The National Weather Service Storm Dataset used in its raw form in this paper was sourced from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
  2. The National Weather Service Storm Dataset guide is located at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
  3. The National Weather Service Storm Dataset FAQ can be found at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf
  4. Years of Potential Life Lost https://en.wikipedia.org/wiki/Years_of_potential_life_lost