Health and Economic Impacts of Storm and Weather Events in the US (1950-2011)

Synopsis

For this analysis, I examined the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of major storms and weather events. Specifically, I examined the events from 1950 to 2011 in order to understand which event types were most harmful to population health and to the economy. I found that tornadoes have caused the most total harm to population health (judged by the number of fatalities and injuries), and floods caused the most total harm to the economy (judged by property damage and crop damage). Although I focused on figuring out which event types caused the most total harm, I also analyzed which event types caused the most harm on a per-event basis. I did not take geography or changes over time into account in my analysis, which could form the basis for a useful follow-up study.

Data Processing

To begin, I downloaded the Storm Data file and saved it in my working directory. This file contains storm data from 1950 through 2011. More information about the data is available in the FAQ.

# read in the data and determine the dimensions
storm <- read.csv("repdata-data-StormData.csv.bz2", as.is = TRUE)
dim(storm)
## [1] 902297     37

Since this analysis is focused on event types, I checked the number of unique values in the event type field:

# determine the number of unique event types
events <- unique(storm$EVTYPE)
length(events)
## [1] 985

It contains 985 unique values. This was surprising, since the Storm Data Documentation listed only 48 event types.

Upon further exploration, I determined that many of those EVTYPE values were minor variations of the 48 “official” event types. For example:

# show event types containing 'FLASH'
events[grep("FLASH", events)]
##  [1] "ICE STORM/FLASH FLOOD"          "FLASH FLOOD"                   
##  [3] "FLASH FLOODING"                 "FLASH FLOODING/THUNDERSTORM WI"
##  [5] "FLASH FLOODS"                   "FLOOD/FLASH FLOOD"             
##  [7] "FLASH FLOOD WINDS"              "FLASH FLOOD/"                  
##  [9] "THUNDERSTORM WINDS/FLASH FLOOD" "LOCAL FLASH FLOOD"             
## [11] "FLOOD/FLASH FLOODING"           "FLASH FLOOD/FLOOD"             
## [13] "FLASH FLOOD FROM ICE JAMS"      "FLASH FLOOD - HEAVY RAIN"      
## [15] "FLASH FLOOD/ STREET"            "FLASH FLOOD/HEAVY RAIN"        
## [17] "FLOOD FLASH"                    "FLOOD FLOOD/FLASH"             
## [19] "FLOOD/FLASH"                    "FLASH FLOOD/ FLOOD"            
## [21] "FLASH FLOODING/FLOOD"           "FLASH FLOOODING"               
## [23] "FLOOD/FLASHFLOOD"               "FLASH FLOOD/LANDSLIDE"         
## [25] "FLASH FLOOD LANDSLIDES"         "FLOOD/FLASH/FLOOD"             
## [27] " FLASH FLOOD"

Although I considered attempting to “fix” every unofficial event type by hand, I decided that this was unnecessary since my analysis is focused on event types with the largest impacts, and 817 of the 985 event types had less than 10 occurrences in the dataset:

# calculate the number of event types with less than 10 occurrences
suppressMessages(library(dplyr))
storm %>% select(EVTYPE) %>% group_by(EVTYPE) %>% summarise(count = n()) %>% 
    filter(count < 10) %>% nrow()
## [1] 817

Instead, I focused on combining duplicate event types in which there were at least 50 occurrences in the dataset:

# display the event types with at least 50 occurrences
event.topcounts <- storm %>% select(EVTYPE) %>% group_by(EVTYPE) %>% summarise(count = n()) %>% 
    filter(count >= 50) %>% data.frame()
event.topcounts
##                      EVTYPE  count
## 1    ASTRONOMICAL HIGH TIDE    103
## 2     ASTRONOMICAL LOW TIDE    174
## 3                 AVALANCHE    386
## 4                  BLIZZARD   2719
## 5             COASTAL FLOOD    650
## 6          COASTAL FLOODING    143
## 7                      COLD     72
## 8           COLD/WIND CHILL    539
## 9                 DENSE FOG   1293
## 10                  DROUGHT   2488
## 11           DRY MICROBURST    186
## 12               DUST DEVIL    141
## 13               DUST STORM    427
## 14           EXCESSIVE HEAT   1678
## 15             EXTREME COLD    655
## 16  EXTREME COLD/WIND CHILL   1002
## 17        EXTREME WINDCHILL    204
## 18              FLASH FLOOD  54277
## 19           FLASH FLOODING    682
## 20                    FLOOD  25326
## 21        FLOOD/FLASH FLOOD    624
## 22                 FLOODING    120
## 23                      FOG    538
## 24                   FREEZE     74
## 25            FREEZING RAIN    250
## 26                    FROST     53
## 27             FROST/FREEZE   1342
## 28             FUNNEL CLOUD   6839
## 29            FUNNEL CLOUDS     87
## 30              GUSTY WINDS     53
## 31                     HAIL 288661
## 32                     HEAT    767
## 33                HEAT WAVE     74
## 34               HEAVY RAIN  11723
## 35               HEAVY SNOW  15708
## 36               HEAVY SURF     84
## 37     HEAVY SURF/HIGH SURF    228
## 38                HIGH SURF    725
## 39                HIGH WIND  20212
## 40               HIGH WINDS   1533
## 41                HURRICANE    174
## 42        HURRICANE/TYPHOON     88
## 43                      ICE     61
## 44                ICE STORM   2006
## 45         LAKE-EFFECT SNOW    636
## 46                LANDSLIDE    600
## 47               LIGHT SNOW    154
## 48                LIGHTNING  15754
## 49              MARINE HAIL    442
## 50         MARINE HIGH WIND    135
## 51 MARINE THUNDERSTORM WIND   5812
## 52         MARINE TSTM WIND   6175
## 53        MODERATE SNOWFALL    101
## 54              RECORD COLD     64
## 55              RECORD HEAT     81
## 56            RECORD WARMTH    146
## 57              RIP CURRENT    470
## 58             RIP CURRENTS    304
## 59              RIVER FLOOD    173
## 60                    SLEET     59
## 61                     SNOW    587
## 62              STORM SURGE    261
## 63         STORM SURGE/TIDE    148
## 64              STRONG WIND   3566
## 65             STRONG WINDS    196
## 66        THUNDERSTORM WIND  82563
## 67       THUNDERSTORM WINDS  20843
## 68  THUNDERSTORM WINDS HAIL     61
## 69      THUNDERSTORM WINDSS     51
## 70                  TORNADO  60652
## 71      TROPICAL DEPRESSION     60
## 72           TROPICAL STORM    690
## 73                TSTM WIND 219940
## 74           TSTM WIND/HAIL   1028
## 75         UNSEASONABLY DRY     56
## 76        UNSEASONABLY WARM    126
## 77              URBAN FLOOD    249
## 78           URBAN FLOODING     98
## 79     URBAN/SML STREAM FLD   3392
## 80               WATERSPOUT   3796
## 81         WILD/FOREST FIRE   1457
## 82                 WILDFIRE   2761
## 83                     WIND    340
## 84             WINTER STORM  11433
## 85           WINTER WEATHER   7026
## 86       WINTER WEATHER/MIX   1104
## 87               WINTRY MIX     90

I only combined clear duplicates, and generally used the name specified by the Storm Data documentation:

# fix duplicates
storm$EVTYPE[storm$EVTYPE == "COASTAL FLOODING"] <- "COASTAL FLOOD"
storm$EVTYPE[storm$EVTYPE == "COLD"] <- "COLD/WIND CHILL"
storm$EVTYPE[storm$EVTYPE == "EXTREME COLD"] <- "EXTREME COLD/WIND CHILL"
storm$EVTYPE[storm$EVTYPE == "EXTREME WINDCHILL"] <- "EXTREME COLD/WIND CHILL"
storm$EVTYPE[storm$EVTYPE == "FLASH FLOODING"] <- "FLASH FLOOD"
storm$EVTYPE[storm$EVTYPE == "FLOOD/FLASH FLOOD"] <- "FLASH FLOOD"
storm$EVTYPE[storm$EVTYPE == "FLOODING"] <- "FLOOD"
storm$EVTYPE[storm$EVTYPE == "URBAN FLOOD"] <- "FLOOD"
storm$EVTYPE[storm$EVTYPE == "URBAN FLOODING"] <- "FLOOD"
storm$EVTYPE[storm$EVTYPE == "URBAN/SML STREAM FLD"] <- "FLOOD"
storm$EVTYPE[storm$EVTYPE == "FREEZE"] <- "FROST/FREEZE"
storm$EVTYPE[storm$EVTYPE == "FROST"] <- "FROST/FREEZE"
storm$EVTYPE[storm$EVTYPE == "FUNNEL CLOUDS"] <- "FUNNEL CLOUD"
storm$EVTYPE[storm$EVTYPE == "HEAT WAVE"] <- "HEAT"
storm$EVTYPE[storm$EVTYPE == "HEAVY SURF"] <- "HIGH SURF"
storm$EVTYPE[storm$EVTYPE == "HEAVY SURF/HIGH SURF"] <- "HIGH SURF"
storm$EVTYPE[storm$EVTYPE == "HIGH WINDS"] <- "HIGH WIND"
storm$EVTYPE[storm$EVTYPE == "HURRICANE"] <- "HURRICANE/TYPHOON"
storm$EVTYPE[storm$EVTYPE == "TYPHOON"] <- "HURRICANE/TYPHOON"
storm$EVTYPE[storm$EVTYPE == "ICE"] <- "ICE STORM"
storm$EVTYPE[storm$EVTYPE == "MARINE TSTM WIND"] <- "MARINE THUNDERSTORM WIND"
storm$EVTYPE[storm$EVTYPE == "RIP CURRENTS"] <- "RIP CURRENT"
storm$EVTYPE[storm$EVTYPE == "STORM SURGE/TIDE"] <- "STORM SURGE"
storm$EVTYPE[storm$EVTYPE == "STRONG WINDS"] <- "STRONG WIND"
storm$EVTYPE[storm$EVTYPE == "THUNDERSTORM WINDS"] <- "THUNDERSTORM WIND"
storm$EVTYPE[storm$EVTYPE == "THUNDERSTORM WINDS HAIL"] <- "THUNDERSTORM WIND"
storm$EVTYPE[storm$EVTYPE == "THUNDERSTORM WINDSS"] <- "THUNDERSTORM WIND"
storm$EVTYPE[storm$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
storm$EVTYPE[storm$EVTYPE == "TSTM WIND/HAIL"] <- "THUNDERSTORM WIND"
storm$EVTYPE[storm$EVTYPE == "WILD/FOREST FIRE"] <- "WILDFIRE"
storm$EVTYPE[storm$EVTYPE == "WINTER WEATHER/MIX"] <- "WINTER WEATHER"

Next, I wanted to create a new variable for property damage amount (PROPDMGAMT) by multiplying the base figure (PROPDMG) by the magnitute (PROPDMGEXP). I also created a new variable for crop damage amount (CROPDMGAMT) by multiplying the base figure (CROPDMG) by the magnitute (CROPDMGEXP). Per page 12 of the documentation, “K” means thousands, “M” means millions, and “B” means billions.

However, the PROPDMGEXP and CROPDMGEXP variables do not always contain proper values:

table(storm$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
table(storm$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

I decided to treat everything other than K/k, M/m, and B/b as invalid. Thus, any record with an invalid magnitude would be assigned an amount of zero.

# copy data frame variables to vectors
PROPDMG <- storm$PROPDMG
PROPDMGEXP <- storm$PROPDMGEXP
CROPDMG <- storm$CROPDMG
CROPDMGEXP <- storm$CROPDMGEXP

# change lowercase k/m/b to uppercase K/M/B
PROPDMGEXP <- toupper(PROPDMGEXP)
CROPDMGEXP <- toupper(CROPDMGEXP)

# calculate PROPDMGAMT
PROPDMGAMT <- ifelse(PROPDMGEXP == "K", PROPDMG * 1000, ifelse(PROPDMGEXP == 
    "M", PROPDMG * 1e+06, ifelse(PROPDMGEXP == "B", PROPDMG * 1e+09, 0)))

# calculate CROPDMGAMT
CROPDMGAMT <- ifelse(CROPDMGEXP == "K", CROPDMG * 1000, ifelse(CROPDMGEXP == 
    "M", CROPDMG * 1e+06, ifelse(CROPDMGEXP == "B", CROPDMG * 1e+09, 0)))

# add new variables to data frame
storm$PROPDMGAMT <- PROPDMGAMT
storm$CROPDMGAMT <- CROPDMGAMT

Finally, I created a new variable called “HARMHEALTH” that combined the “FATALITIES” and “INJURIES” variables into a single number. The formula I decided to use was “HARMHEALTH = INJURIES + 20*FATALITIES”, meaning that fatalities were weighted 20 times as much as injuries.

# calculate HARMHEALTH for each event
storm$HARMHEALTH <- storm$INJURIES + 20 * storm$FATALITIES

Results

Harm to Population Health

The first question I had to answer was this: Across the United States, which types of events are most harmful with respect to population health? I answered this question by calculating the total “HARMHEALTH” score for every type of event, and then plotting the events with the top 10 scores:

# calculate the total HARMHEALTH for each event type and save it to a data
# frame
stormhealth <- storm %>% select(EVTYPE, HARMHEALTH) %>% group_by(EVTYPE) %>% 
    summarise(HARMHEALTHTOTAL = sum(HARMHEALTH), COUNT = n()) %>% arrange(desc(HARMHEALTHTOTAL)) %>% 
    data.frame()

# plot the top 10 event types
library(ggplot2)
ggplot(stormhealth[1:10, ]) + aes(x = reorder(EVTYPE, -HARMHEALTHTOTAL), y = HARMHEALTHTOTAL) + 
    geom_bar(stat = "identity") + labs(title = "Top 10 Storm Event Types Ranked by Total Harm to Human Health", 
    x = "Storm Event Type", y = "HARMHEALTH Score") + theme(axis.text.x = element_text(angle = 45, 
    hjust = 1, vjust = 1))

plot of chunk unnamed-chunk-10

As you can see, tornadoes have (by far) the highest impact on human health, followed by “excessive heat” and “heat”.

It is also interesting to see the average “harm per event” for these same 10 event types:

# calculate the average 'harm per event' for each event type
stormhealth$HARMPEREVENT <- round(stormhealth$HARMHEALTHTOTAL/stormhealth$COUNT, 
    2)
head(stormhealth, 10)
##                     EVTYPE HARMHEALTHTOTAL  COUNT HARMPEREVENT
## 1                  TORNADO          204006  60652         3.36
## 2           EXCESSIVE HEAT           44585   1678        26.57
## 3                     HEAT           24589    841        29.24
## 4        THUNDERSTORM WIND           23572 324486         0.07
## 5              FLASH FLOOD           22080  55583         0.40
## 6                LIGHTNING           21550  15754         1.37
## 7                    FLOOD           16950  29185         0.58
## 8              RIP CURRENT           11969    774        15.46
## 9                HIGH WIND            7099  21745         0.33
## 10 EXTREME COLD/WIND CHILL            6300   1861         3.39

On a per-event basis, you can see that heat and rip currents are the most dangerous out of these 10 event types.

Finally, let's take a look at which event types had the highest harm-per-event (limiting to event types with at least 50 occurrences):

# show the event types with the highest harm-per-event
stormhealth %>% filter(COUNT >= 50) %>% arrange(desc(HARMPEREVENT)) %>% head(10)
##                     EVTYPE HARMHEALTHTOTAL COUNT HARMPEREVENT
## 1                     HEAT           24589   841        29.24
## 2           EXCESSIVE HEAT           44585  1678        26.57
## 3              RIP CURRENT           11969   774        15.46
## 4        HURRICANE/TYPHOON            3826   273        14.01
## 5                AVALANCHE            4650   386        12.05
## 6          COLD/WIND CHILL            2660   611         4.35
## 7                      FOG            1974   538         3.67
## 8  EXTREME COLD/WIND CHILL            6300  1861         3.39
## 9                  TORNADO          204006 60652         3.36
## 10               HIGH SURF            3240  1037         3.12

Whether a government official should focus on events with the highest total harm or events with the highest harm-per-event depends on many factors. Additional factors that the government official might take into consideration include geography and changes over time, neither of which were considered in my analysis.

Harm to the Economy

The second question I had to answer was this: Across the United States, which types of events have the greatest economic consequences? I answered this question by calculating the total dollar amount of property and crop damages for every type of event, and then plotting the events with the top 10 damage totals:

# calculate the damage totals for each event type and save it to a data
# frame
stormecon <- storm %>% select(EVTYPE, PROPDMGAMT, CROPDMGAMT) %>% group_by(EVTYPE) %>% 
    summarise(Property = sum(PROPDMGAMT), Crop = sum(CROPDMGAMT), DMGTOTAL = Property + 
        Crop, COUNT = n()) %>% arrange(desc(DMGTOTAL)) %>% data.frame()

# save the top 10 event types to a new data frame
stormecon.top <- stormecon[1:10, ]

# reshape the data so that it can be plotted by damage type
library(reshape2)
stormecon.melt <- melt(stormecon.top, id = "EVTYPE", measure.vars = c("Property", 
    "Crop"))
stormecon.melt$EVTYPE <- factor(stormecon.melt$EVTYPE, levels = stormecon.top$EVTYPE)

# plot the top 10 event types
ggplot(stormecon.melt) + aes(x = EVTYPE, y = value/1e+09, fill = variable) + 
    geom_bar(stat = "identity") + labs(title = "Top 10 Storm Event Types Ranked by Total Damage to Property and Crops", 
    x = "Storm Event Type", y = "Total Damages (in billions of USD)", fill = "Type of Damage") + 
    theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1, 
        vjust = 1))

plot of chunk unnamed-chunk-13

The types of the events with the greatest economic harm are floods, hurricanes/typhoons, tornadoes, and storm surges. As you can see, almost all of the damage totals are from property damage (rather than crop damage), with the exception of droughts.

It is also interesting to see the average “damage per event” for these same 10 event types:

# calculate the average 'damage per event' for each event type
stormecon$DMGPEREVENT <- round(stormecon$DMGTOTAL/stormecon$COUNT, 0)
head(stormecon, 10)
##               EVTYPE  Property      Crop  DMGTOTAL  COUNT DMGPEREVENT
## 1              FLOOD 1.448e+11 5.681e+09 1.505e+11  29185     5157744
## 2  HURRICANE/TYPHOON 8.177e+10 5.351e+09 8.712e+10    273   319139182
## 3            TORNADO 5.694e+10 4.150e+08 5.735e+10  60652      945593
## 4        STORM SURGE 4.796e+10 8.550e+05 4.797e+10    409   117275254
## 5               HAIL 1.573e+10 3.026e+09 1.876e+10 288661       64984
## 6        FLASH FLOOD 1.662e+10 1.531e+09 1.815e+10  55583      326612
## 7            DROUGHT 1.046e+09 1.397e+10 1.502e+10   2488     6036444
## 8  THUNDERSTORM WIND 9.751e+09 1.224e+09 1.098e+10 324486       33823
## 9        RIVER FLOOD 5.119e+09 5.029e+09 1.015e+10    173    58661298
## 10         ICE STORM 3.958e+09 5.022e+09 8.980e+09   2067     4344314

On a per-event basis, you can see that hurricanes/typhoons and storm surges cause the most damage out of these 10 event types.

Finally, let's take a look at which event types had the highest damage-per-event (limiting to event types with at least 50 occurrences):

# show the event types with the highest damage-per-event
stormecon %>% filter(COUNT >= 50) %>% arrange(desc(DMGPEREVENT)) %>% head(10)
##               EVTYPE  Property      Crop  DMGTOTAL COUNT DMGPEREVENT
## 1  HURRICANE/TYPHOON 8.177e+10 5.351e+09 8.712e+10   273   319139182
## 2        STORM SURGE 4.796e+10 8.550e+05 4.797e+10   409   117275254
## 3        RIVER FLOOD 5.119e+09 5.029e+09 1.015e+10   173    58661298
## 4     TROPICAL STORM 7.704e+09 6.783e+08 8.382e+09   690    12148169
## 5            DROUGHT 1.046e+09 1.397e+10 1.502e+10  2488     6036444
## 6              FLOOD 1.448e+11 5.681e+09 1.505e+11 29185     5157744
## 7          ICE STORM 3.958e+09 5.022e+09 8.980e+09  2067     4344314
## 8           WILDFIRE 7.767e+09 4.023e+08 8.169e+09  4218     1936750
## 9       FROST/FREEZE 9.700e+06 1.606e+09 1.616e+09  1469     1100076
## 10           TORNADO 5.694e+10 4.150e+08 5.735e+10 60652      945593

As I mentioned in the section on Population Health: Whether a government official should focus on events with the highest total damage or events with the highest damage-per-event depends on many factors. Additional factors that the government official might take into consideration include geography and changes over time, neither of which were considered in my analysis.

< END >