Synopsis

In this analysis we will use the NOAA National Weather Service Storm Events Database to assess what types of storm events have the greatest impact on public health and the economy. The database provides information for various types of events and gives data on fatalities, injuries, property damage and crop damage. By analyzing which events have the highest costs in each of these categories we will identify which types of events have the greatest impact.

Data processing

Preliminary steps

Before working with the data, we set some display options for this document and load some libraries.

# Set some chunk options
library(knitr)
opts_chunk$set(echo = TRUE,
               message = FALSE, warning = FALSE,
               cache = TRUE,
               cache.path = "cache/", 
               fig.path = "figure/"
               )
library(dplyr)
library(lubridate)
library(lattice)

Loading and preprocessing the data

# Download the data, if needed.
if (!file.exists("StormData.csv.bz2")) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  destfile <- "StormData.csv.bz2"
  download.file(url, destfile, method = "curl")
}
rawdata <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)

Below we present the steps taken to transform the data.

stormdata <- rawdata

We are interested in event types (EVTYPE), harm to population health (FATALITIES, INJURIES) and economic consequences (PROPDMG, CROPDMG, as well as the modifiers PROPDMGEXP and CROPDMGEXP). We will also need the date of observation (BGN_DATE). Our first step is to reduce the data to only these variables.

stormdata <- stormdata %>%
  select(BGN_DATE, EVTYPE, FATALITIES:CROPDMGEXP)

Next, according to http://www.ncdc.noaa.gov/stormevents/details.jsp, “All Event Types (48 from Directive 10-1605): From 1996 to present, 48 event types are recorded as defined in NWS Directive 10-1605.” Since there is variation in how the data was recorded before this time, we only look at data from 1996 onward. Once we complete this step the date is no longer needed, so we discard it.

stormdata <- stormdata %>%
  mutate(BGN_DATE = as.POSIXct(strptime(BGN_DATE, "%m/%d/%Y %H:%M:%S"))) %>%
  filter(BGN_DATE > as.POSIXct("1996-01-01 0:00:00")) %>%
  select(-BGN_DATE)

We further reduce the data to only those instances where the impact on population health or the economic consequences are non-zero.

stormdata <- stormdata %>%
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

The values for PROPDMG and CROPDMG have corresponding modifier values PROPDMGEXP and CROPDMGEXP. At this point there are only three possible values in this field: “K”, “M”, “B”, and the empty string “”.

table(stormdata$PROPDMGEXP)
## 
##             B      K      M 
##   8447     32 185470   7364
table(stormdata$CROPDMGEXP)
## 
##             B      K      M 
## 102763      2  96786   1762

According to the NWS Storm Data Directive, the characters “K”, “M”, and “B” correspond to thousands, millions, and billions, respectively. We modify the PROPDMG and CROPDMG variables accordingly, and discard the PROPDMGEXP and CROPDMGEXP variables.

stormdata <- stormdata %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP=="K", PROPDMG*1000, PROPDMG)) %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP=="M", PROPDMG*1000000, PROPDMG)) %>%
  mutate(PROPDMG = ifelse(PROPDMGEXP=="B", PROPDMG*1000000000, PROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP=="K", CROPDMG*1000, CROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP=="M", CROPDMG*1000000, CROPDMG)) %>%
  mutate(CROPDMG = ifelse(CROPDMGEXP=="B", CROPDMG*1000000000, CROPDMG)) %>%
  select(-PROPDMGEXP, -CROPDMGEXP)

Inspection of the EVTYPE variable at this point shows there are some similar values up to variations in whitespace, capitalization, or punctuation. To normalize these values we force all letters to upper case and strip all non-alphanumeric characters. These condensed forms of the EVTYPE categories are stored in a new variable EVTYPECONDENSED.

stormdata <- stormdata %>%
  mutate(EVTYPECONDENSED = gsub("[^A-Z0-9]", "", toupper(EVTYPE)))

From this point we now have 175 unique values in EVTYPE. We would like to further classify these down to the 48 types defined in NWS Directive 10-1605, however it will require expert knowledge to this. For example, the data includes an EVTYPE of “BEACH EROSION”, but in reading the NWS document this may occur as a part of “Tropical Storm”, “High Surf”, “Seiche”, or other possible categories. For the purposes of this report, we will analyze the data as it stands at this point, and look for any possible areas where combining EVTYPEs might have a substantive impact on our results.

Results

We begin the analysis by grouping the data by EVTYPECONDENSED and summing the remaining numeric variables. All subsequent analyses will be based on subsets of this data. The entire table of values is displayed below.

groupstorm <- stormdata %>%
  group_by(EVTYPECONDENSED) %>%
  summarise_each(funs(sum), -EVTYPE)
kable(groupstorm, caption = "Table 1. Sums of all variables as grouped by EVTYPECONDENSED.")
Table 1. Sums of all variables as grouped by EVTYPECONDENSED.
EVTYPECONDENSED FATALITIES INJURIES PROPDMG CROPDMG
AGRICULTURALFREEZE 0 0 0 28820000
ASTRONOMICALHIGHTIDE 0 0 9425000 0
ASTRONOMICALLOWTIDE 0 0 320000 0
AVALANCHE 223 156 3711800 0
BEACHEROSION 0 0 100000 0
BLACKICE 1 24 0 0
BLIZZARD 70 385 525658950 7060000
BLOWINGDUST 0 0 20000 0
BLOWINGSNOW 1 1 15000 0
BRUSHFIRE 0 2 0 0
COASTALEROSION 0 0 766000 0
COASTALFLOOD 3 2 251400560 0
COASTALFLOODING 3 0 103809000 0
COASTALFLOODINGEROSION 0 5 35030000 0
COASTALSTORM 4 2 50000 0
COLD 18 12 554000 0
COLDANDSNOW 14 0 0 0
COLDTEMPERATURE 2 0 0 0
COLDWEATHER 2 0 0 0
COLDWINDCHILL 95 12 1990000 600000
DAMAGINGFREEZE 0 0 8000000 34130000
DAMBREAK 0 0 1002000 0
DENSEFOG 9 143 7319000 0
DENSESMOKE 0 0 100000 0
DOWNBURST 0 0 2000 0
DROUGHT 0 4 1046101000 13367566000
DROWNING 1 0 0 0
DRYMICROBURST 3 25 1732600 15000
DUSTDEVIL 2 39 663630 0
DUSTSTORM 11 376 5474000 3100000
EARLYFROST 0 0 0 42000000
EROSIONCSTLFLOOD 0 0 16200000 0
EXCESSIVEHEAT 1797 6391 7723700 492402000
EXCESSIVESNOW 0 2 1935000 0
EXTENDEDCOLD 1 0 100000 0
EXTREMECOLD 115 79 19760400 1308973000
EXTREMECOLDWINDCHILL 125 24 8648000 50000
EXTREMEWINDCHILL 17 5 755000 17000000
FALLINGSNOWICE 1 1 0 0
FLASHFLOOD 887 1674 15222253910 1334901700
FLASHFLOODFLOOD 0 0 5000 0
FLOOD 414 6758 143944833550 4974778400
FLOODFLASHFLOOD 0 0 10000 0
FOG 60 712 13145500 0
FREEZE 0 0 200000 156725000
FREEZINGDRIZZLE 2 13 105000 0
FREEZINGFOG 0 0 2182000 0
FREEZINGRAIN 2 0 626000 0
FREEZINGSPRAY 1 0 0 0
FROST 1 3 15000 0
FROSTFREEZE 0 0 10480000 1094186000
FUNNELCLOUD 0 1 134100 0
GLAZE 1 212 150000 0
GRADIENTWIND 0 0 37000 0
GUSTYWIND 1 1 360000 10000
GUSTYWINDHAIL 0 0 20000 0
GUSTYWINDHVYRAIN 0 0 2000 0
GUSTYWINDRAIN 0 0 2000 0
GUSTYWINDS 3 11 1276000 200000
HAIL 7 713 14595143420 2476029450
HARDFREEZE 0 0 0 12900000
HAZARDOUSSURF 0 1 0 0
HEAT 237 1222 1520000 176500
HEATWAVE 0 70 0 0
HEAVYRAIN 94 230 584864440 728169800
HEAVYRAINHIGHSURF 0 0 13500000 1500000
HEAVYSEAS 1 0 0 0
HEAVYSNOW 107 698 634417540 71122100
HEAVYSNOWSHOWER 0 2 10000 0
HEAVYSURF 6 40 1390000 0
HEAVYSURFANDWIND 3 0 0 0
HEAVYSURFHIGHSURF 42 48 9870000 0
HIGHSEAS 3 7 15000 0
HIGHSURF 90 150 83904500 0
HIGHSURFADVISORY 0 0 200000 0
HIGHSWELLS 1 0 5000 0
HIGHWATER 3 0 0 0
HIGHWIND 235 1083 5247860360 633561300
HIGHWINDG40 0 0 18000 0
HIGHWINDS 0 0 500000 0
HURRICANE 61 46 11812819010 2741410000
HURRICANEEDOUARD 0 2 0 0
HURRICANETYPHOON 64 1275 69305840000 2607872800
HYPERTHERMIAEXPOSURE 1 0 0 0
HYPOTHERMIAEXPOSURE 7 0 0 0
ICEJAMFLOODMINOR 0 0 1000 0
ICEONROAD 1 0 0 0
ICEROADS 0 1 12000 0
ICESTORM 82 318 3642248810 15660000
ICYROADS 4 22 331200 0
LAKEEFFECTSNOW 0 0 40182000 0
LAKESHOREFLOOD 0 0 7540000 0
LANDSLIDE 37 52 324578000 20017000
LANDSLIDES 1 1 5000 0
LANDSLUMP 0 0 570000 0
LANDSPOUT 0 0 7000 0
LATESEASONSNOW 0 0 180000 0
LIGHTFREEZINGRAIN 0 0 451000 0
LIGHTNING 650 4140 743077080 6898440
LIGHTSNOW 1 2 2513000 0
LIGHTSNOWFALL 0 0 85000 0
MARINEACCIDENT 1 2 50000 0
MARINEHAIL 0 0 4000 0
MARINEHIGHWIND 1 1 1297010 0
MARINESTRONGWIND 14 22 418330 0
MARINETHUNDERSTORMWIND 10 26 436400 50000
MARINETSTMWIND 9 8 5421000 0
MICROBURST 0 0 20000 0
MIXEDPRECIP 2 26 0 0
MIXEDPRECIPITATION 0 0 790000 0
MUDSLIDE 4 2 1325100 0
MUDSLIDES 1 0 0 0
NONSEVEREWINDDAMAGE 0 7 5000 0
NONTSTMWIND 0 1 40000 0
OTHER 0 4 55500 1034400
RAIN 0 0 300000 250000
RAINSNOW 4 2 0 0
RECORDHEAT 2 0 0 0
RIPCURRENT 340 209 1000 0
RIPCURRENTS 202 294 162000 0
RIVERFLOOD 1 0 20282000 1875000
RIVERFLOODING 1 1 106155000 28020000
ROCKSLIDE 0 0 150000 0
ROGUEWAVE 0 2 0 0
ROUGHSEAS 8 5 0 0
ROUGHSURF 4 1 10000 0
SEICHE 0 0 980000 0
SMALLHAIL 0 10 70000 20793000
SNOW 2 12 2554000 0
SNOWANDICE 1 0 0 0
SNOWSQUALL 2 35 30000 0
SNOWSQUALLS 1 0 70000 0
STORMSURGE 2 37 43193536000 5000
STORMSURGETIDE 11 5 4641188000 850000
STRONGWIND 103 278 174759450 64953500
STRONGWINDS 7 21 2234790 0
THUNDERSTORM 1 1 0 0
THUNDERSTORMWIND 130 1400 3382654440 398331000
THUNDERSTORMWINDG40 1 0 0 0
TIDALFLOODING 0 1 13000 0
TORNADO 1511 20667 24616905710 283425010
TORRENTIALRAINFALL 0 4 0 0
TROPICALDEPRESSION 0 0 1737000 0
TROPICALSTORM 57 338 7642475550 677711000
TSTMWIND 241 3629 4486156440 553915350
TSTMWIND40 0 0 1000 0
TSTMWIND41 0 0 8000 0
TSTMWIND45 0 0 10000 0
TSTMWINDANDLIGHTNING 0 0 80000 0
TSTMWINDG35 1 0 30000 0
TSTMWINDG40 0 1 45000 0
TSTMWINDG45 0 3 235500 0
TSTMWINDHAIL 5 95 44320500 64696250
TSUNAMI 33 129 144062000 20000
TYPHOON 0 5 600230000 825000
UNSEASONABLECOLD 0 0 0 5100000
UNSEASONABLYCOLD 0 0 0 25042500
UNSEASONABLYWARM 0 17 0 10000
UNSEASONALRAIN 0 0 0 10000000
URBANSMLSTREAMFLD 28 79 58309650 8488100
VOLCANICASH 0 0 500000 0
WARMWEATHER 0 2 0 0
WATERSPOUT 2 2 5730200 0
WETMICROBURST 0 0 35000 0
WHIRLWIND 1 0 12000 0
WILDFIRE 75 911 4758667000 295472800
WILDFORESTFIRE 12 545 3001782500 106782330
WIND 18 84 2289500 300000
WINDANDWAVE 0 0 1000000 0
WINDDAMAGE 0 0 10000 0
WINDS 1 0 0 0
WINTERSTORM 191 1292 1532733250 11944000
WINTERWEATHER 33 343 20866000 15000000
WINTERWEATHERMIX 28 140 6432000 0
WINTRYMIX 1 77 12500 0

Events most harmful to population health

In this section we will look at which events have been most harmful to population health. We will subset the data for the FATALITIES and INJURIES variables and eliminate cases where both of these values are zero.

pophealth <- groupstorm %>%
  select(EVTYPECONDENSED, FATALITIES, INJURIES) %>%
  filter(FATALITIES > 0 | INJURIES > 0)

Having done this, we examine first the highest fatality rates. Here we show a summary table of the EVETYPECONDENSED with the highest number of fatalities.

pophealthfatal <- pophealth %>%
  select(-INJURIES) %>%
  arrange(desc(FATALITIES))
summaryfatal <- rbind(head(pophealthfatal, 30),
                      c("All other types",
                        sum(pophealthfatal[31:nrow(pophealthfatal), 2]))) %>%
  mutate(FATALITIES = as.numeric(FATALITIES)) # Somehow above operation coerces string
par(las = 2)
par(mar = c(5,6,6,2)+0.1)
barplot(height = summaryfatal$FATALITIES, names.arg = summaryfatal$EVTYPECONDENSED, horiz = TRUE, cex.names = .5, space = .5, ylim = c(0,40), xlim = c(0,2000), main = "Figure 1. Greatest causes of fatalities")

From this table it appears that most fatalities occur as a result of excessive heat or tornados. One might consider combining some of the other items in the table as being a part of the same category, thus surpassing the totals for thes events. “Flood” and “Flash flood” combined total 1301 fatalities, but perhaps should not be considered together as they are in fact deemed separate categories by NWS Directive 10-1605. One might also look at the variations in events involving “wind”, but a cursory subsetting suggests that these do not combine in sufficient numbers to surpass either excessive heat or tornados:

arrange(pophealthfatal[grepl("WIND", pophealthfatal$EVTYPECONDENSED),], desc(FATALITIES))
## Source: local data frame [25 x 2]
## 
##           EVTYPECONDENSED FATALITIES
## 1                TSTMWIND        241
## 2                HIGHWIND        235
## 3        THUNDERSTORMWIND        130
## 4    EXTREMECOLDWINDCHILL        125
## 5              STRONGWIND        103
## 6           COLDWINDCHILL         95
## 7                    WIND         18
## 8        EXTREMEWINDCHILL         17
## 9        MARINESTRONGWIND         14
## 10 MARINETHUNDERSTORMWIND         10
## ..                    ...        ...

Again we must keep in mind this is a crude subsetting, because even NWS Directive 10-1605 separates wind events into at least 7 different categories.

We now turn our attention to injury rates. From the following chart we see tornados far surpass any other cause of injury.

pophealthinjury <- pophealth %>%
  select(-FATALITIES) %>%
  arrange(desc(INJURIES))
summaryinjury <- rbind(head(pophealthinjury, 30),
                      c("All other types",
                        sum(pophealthinjury[31:nrow(pophealthinjury), 2]))) %>%
  mutate(INJURIES = as.numeric(INJURIES)) # Somehow above operation coerces string
par(las = 2)
par(mar = c(5,6,6,2)+0.1)
barplot(height = summaryinjury$INJURIES, names.arg = summaryinjury$EVTYPECONDENSED, horiz = TRUE, cex.names = .5, space = .5, ylim = c(0,40), xlim = c(0,25000), main = "Figure 2. Greatest causes of injuries")

Events most harmful to the economy

In this section we will look at which events have been most harmful to U.S. economy, in terms of property and crop damage. We will subset the data for the PROPDMG and CROPDMG variables, sum their value, and eliminate cases where their sum are zero.

economy <- groupstorm %>%
  mutate(TOTALDMG = PROPDMG + CROPDMG) %>%
  select(EVTYPECONDENSED, TOTALDMG) %>%
  filter(TOTALDMG > 0) %>%
  arrange(desc(TOTALDMG))

We now look at what types of events cause the most economic damage. From the following figure we can see that floods cause by far the most devastating impact to the economy, distantly trailed by hurricanes and typhoons. We note that even if we had grouped together “TYPHOON” and “HURRICANE” with “HURRICANETYPHOON”, the total would still be less than that of floods. However, if we were to also include “STORMSURGE” with that group, then the total may in fact surpass flooding. It is at this point that the analysis would benefit from expert knowledge that could properly classify these sources of economic impact.

summaryeconomy <- rbind(head(economy, 30),
                      c("All other types",
                        sum(economy[31:nrow(economy), 2]))) %>%
  mutate(TOTALDMG = as.numeric(TOTALDMG) / 1000000000) # Somehow above operation coerces string
par(las = 2)
par(mar = c(5,6,6,2)+0.1)
barplot(height = summaryeconomy$TOTALDMG, names.arg = summaryeconomy$EVTYPECONDENSED, horiz = TRUE, cex.names = .5, space = .5, ylim = c(0,40), xlim = c(0,160), main = "Figure 3. Greatest impact on the economy, in billions of dollars")