In this analysis we will use the NOAA National Weather Service Storm Events Database to assess what types of storm events have the greatest impact on public health and the economy. The database provides information for various types of events and gives data on fatalities, injuries, property damage and crop damage. By analyzing which events have the highest costs in each of these categories we will identify which types of events have the greatest impact.
Before working with the data, we set some display options for this document and load some libraries.
# Set some chunk options
library(knitr)
opts_chunk$set(echo = TRUE,
message = FALSE, warning = FALSE,
cache = TRUE,
cache.path = "cache/",
fig.path = "figure/"
)
library(dplyr)
library(lubridate)
library(lattice)
# Download the data, if needed.
if (!file.exists("StormData.csv.bz2")) {
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2"
download.file(url, destfile, method = "curl")
}
rawdata <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)
Below we present the steps taken to transform the data.
stormdata <- rawdata
We are interested in event types (EVTYPE), harm to population health (FATALITIES, INJURIES) and economic consequences (PROPDMG, CROPDMG, as well as the modifiers PROPDMGEXP and CROPDMGEXP). We will also need the date of observation (BGN_DATE). Our first step is to reduce the data to only these variables.
stormdata <- stormdata %>%
select(BGN_DATE, EVTYPE, FATALITIES:CROPDMGEXP)
Next, according to http://www.ncdc.noaa.gov/stormevents/details.jsp, “All Event Types (48 from Directive 10-1605): From 1996 to present, 48 event types are recorded as defined in NWS Directive 10-1605.” Since there is variation in how the data was recorded before this time, we only look at data from 1996 onward. Once we complete this step the date is no longer needed, so we discard it.
stormdata <- stormdata %>%
mutate(BGN_DATE = as.POSIXct(strptime(BGN_DATE, "%m/%d/%Y %H:%M:%S"))) %>%
filter(BGN_DATE > as.POSIXct("1996-01-01 0:00:00")) %>%
select(-BGN_DATE)
We further reduce the data to only those instances where the impact on population health or the economic consequences are non-zero.
stormdata <- stormdata %>%
filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
The values for PROPDMG and CROPDMG have corresponding modifier values PROPDMGEXP and CROPDMGEXP. At this point there are only three possible values in this field: “K”, “M”, “B”, and the empty string “”.
table(stormdata$PROPDMGEXP)
##
## B K M
## 8447 32 185470 7364
table(stormdata$CROPDMGEXP)
##
## B K M
## 102763 2 96786 1762
According to the NWS Storm Data Directive, the characters “K”, “M”, and “B” correspond to thousands, millions, and billions, respectively. We modify the PROPDMG and CROPDMG variables accordingly, and discard the PROPDMGEXP and CROPDMGEXP variables.
stormdata <- stormdata %>%
mutate(PROPDMG = ifelse(PROPDMGEXP=="K", PROPDMG*1000, PROPDMG)) %>%
mutate(PROPDMG = ifelse(PROPDMGEXP=="M", PROPDMG*1000000, PROPDMG)) %>%
mutate(PROPDMG = ifelse(PROPDMGEXP=="B", PROPDMG*1000000000, PROPDMG)) %>%
mutate(CROPDMG = ifelse(CROPDMGEXP=="K", CROPDMG*1000, CROPDMG)) %>%
mutate(CROPDMG = ifelse(CROPDMGEXP=="M", CROPDMG*1000000, CROPDMG)) %>%
mutate(CROPDMG = ifelse(CROPDMGEXP=="B", CROPDMG*1000000000, CROPDMG)) %>%
select(-PROPDMGEXP, -CROPDMGEXP)
Inspection of the EVTYPE variable at this point shows there are some similar values up to variations in whitespace, capitalization, or punctuation. To normalize these values we force all letters to upper case and strip all non-alphanumeric characters. These condensed forms of the EVTYPE categories are stored in a new variable EVTYPECONDENSED.
stormdata <- stormdata %>%
mutate(EVTYPECONDENSED = gsub("[^A-Z0-9]", "", toupper(EVTYPE)))
From this point we now have 175 unique values in EVTYPE. We would like to further classify these down to the 48 types defined in NWS Directive 10-1605, however it will require expert knowledge to this. For example, the data includes an EVTYPE of “BEACH EROSION”, but in reading the NWS document this may occur as a part of “Tropical Storm”, “High Surf”, “Seiche”, or other possible categories. For the purposes of this report, we will analyze the data as it stands at this point, and look for any possible areas where combining EVTYPEs might have a substantive impact on our results.
We begin the analysis by grouping the data by EVTYPECONDENSED and summing the remaining numeric variables. All subsequent analyses will be based on subsets of this data. The entire table of values is displayed below.
groupstorm <- stormdata %>%
group_by(EVTYPECONDENSED) %>%
summarise_each(funs(sum), -EVTYPE)
kable(groupstorm, caption = "Table 1. Sums of all variables as grouped by EVTYPECONDENSED.")
| EVTYPECONDENSED | FATALITIES | INJURIES | PROPDMG | CROPDMG |
|---|---|---|---|---|
| AGRICULTURALFREEZE | 0 | 0 | 0 | 28820000 |
| ASTRONOMICALHIGHTIDE | 0 | 0 | 9425000 | 0 |
| ASTRONOMICALLOWTIDE | 0 | 0 | 320000 | 0 |
| AVALANCHE | 223 | 156 | 3711800 | 0 |
| BEACHEROSION | 0 | 0 | 100000 | 0 |
| BLACKICE | 1 | 24 | 0 | 0 |
| BLIZZARD | 70 | 385 | 525658950 | 7060000 |
| BLOWINGDUST | 0 | 0 | 20000 | 0 |
| BLOWINGSNOW | 1 | 1 | 15000 | 0 |
| BRUSHFIRE | 0 | 2 | 0 | 0 |
| COASTALEROSION | 0 | 0 | 766000 | 0 |
| COASTALFLOOD | 3 | 2 | 251400560 | 0 |
| COASTALFLOODING | 3 | 0 | 103809000 | 0 |
| COASTALFLOODINGEROSION | 0 | 5 | 35030000 | 0 |
| COASTALSTORM | 4 | 2 | 50000 | 0 |
| COLD | 18 | 12 | 554000 | 0 |
| COLDANDSNOW | 14 | 0 | 0 | 0 |
| COLDTEMPERATURE | 2 | 0 | 0 | 0 |
| COLDWEATHER | 2 | 0 | 0 | 0 |
| COLDWINDCHILL | 95 | 12 | 1990000 | 600000 |
| DAMAGINGFREEZE | 0 | 0 | 8000000 | 34130000 |
| DAMBREAK | 0 | 0 | 1002000 | 0 |
| DENSEFOG | 9 | 143 | 7319000 | 0 |
| DENSESMOKE | 0 | 0 | 100000 | 0 |
| DOWNBURST | 0 | 0 | 2000 | 0 |
| DROUGHT | 0 | 4 | 1046101000 | 13367566000 |
| DROWNING | 1 | 0 | 0 | 0 |
| DRYMICROBURST | 3 | 25 | 1732600 | 15000 |
| DUSTDEVIL | 2 | 39 | 663630 | 0 |
| DUSTSTORM | 11 | 376 | 5474000 | 3100000 |
| EARLYFROST | 0 | 0 | 0 | 42000000 |
| EROSIONCSTLFLOOD | 0 | 0 | 16200000 | 0 |
| EXCESSIVEHEAT | 1797 | 6391 | 7723700 | 492402000 |
| EXCESSIVESNOW | 0 | 2 | 1935000 | 0 |
| EXTENDEDCOLD | 1 | 0 | 100000 | 0 |
| EXTREMECOLD | 115 | 79 | 19760400 | 1308973000 |
| EXTREMECOLDWINDCHILL | 125 | 24 | 8648000 | 50000 |
| EXTREMEWINDCHILL | 17 | 5 | 755000 | 17000000 |
| FALLINGSNOWICE | 1 | 1 | 0 | 0 |
| FLASHFLOOD | 887 | 1674 | 15222253910 | 1334901700 |
| FLASHFLOODFLOOD | 0 | 0 | 5000 | 0 |
| FLOOD | 414 | 6758 | 143944833550 | 4974778400 |
| FLOODFLASHFLOOD | 0 | 0 | 10000 | 0 |
| FOG | 60 | 712 | 13145500 | 0 |
| FREEZE | 0 | 0 | 200000 | 156725000 |
| FREEZINGDRIZZLE | 2 | 13 | 105000 | 0 |
| FREEZINGFOG | 0 | 0 | 2182000 | 0 |
| FREEZINGRAIN | 2 | 0 | 626000 | 0 |
| FREEZINGSPRAY | 1 | 0 | 0 | 0 |
| FROST | 1 | 3 | 15000 | 0 |
| FROSTFREEZE | 0 | 0 | 10480000 | 1094186000 |
| FUNNELCLOUD | 0 | 1 | 134100 | 0 |
| GLAZE | 1 | 212 | 150000 | 0 |
| GRADIENTWIND | 0 | 0 | 37000 | 0 |
| GUSTYWIND | 1 | 1 | 360000 | 10000 |
| GUSTYWINDHAIL | 0 | 0 | 20000 | 0 |
| GUSTYWINDHVYRAIN | 0 | 0 | 2000 | 0 |
| GUSTYWINDRAIN | 0 | 0 | 2000 | 0 |
| GUSTYWINDS | 3 | 11 | 1276000 | 200000 |
| HAIL | 7 | 713 | 14595143420 | 2476029450 |
| HARDFREEZE | 0 | 0 | 0 | 12900000 |
| HAZARDOUSSURF | 0 | 1 | 0 | 0 |
| HEAT | 237 | 1222 | 1520000 | 176500 |
| HEATWAVE | 0 | 70 | 0 | 0 |
| HEAVYRAIN | 94 | 230 | 584864440 | 728169800 |
| HEAVYRAINHIGHSURF | 0 | 0 | 13500000 | 1500000 |
| HEAVYSEAS | 1 | 0 | 0 | 0 |
| HEAVYSNOW | 107 | 698 | 634417540 | 71122100 |
| HEAVYSNOWSHOWER | 0 | 2 | 10000 | 0 |
| HEAVYSURF | 6 | 40 | 1390000 | 0 |
| HEAVYSURFANDWIND | 3 | 0 | 0 | 0 |
| HEAVYSURFHIGHSURF | 42 | 48 | 9870000 | 0 |
| HIGHSEAS | 3 | 7 | 15000 | 0 |
| HIGHSURF | 90 | 150 | 83904500 | 0 |
| HIGHSURFADVISORY | 0 | 0 | 200000 | 0 |
| HIGHSWELLS | 1 | 0 | 5000 | 0 |
| HIGHWATER | 3 | 0 | 0 | 0 |
| HIGHWIND | 235 | 1083 | 5247860360 | 633561300 |
| HIGHWINDG40 | 0 | 0 | 18000 | 0 |
| HIGHWINDS | 0 | 0 | 500000 | 0 |
| HURRICANE | 61 | 46 | 11812819010 | 2741410000 |
| HURRICANEEDOUARD | 0 | 2 | 0 | 0 |
| HURRICANETYPHOON | 64 | 1275 | 69305840000 | 2607872800 |
| HYPERTHERMIAEXPOSURE | 1 | 0 | 0 | 0 |
| HYPOTHERMIAEXPOSURE | 7 | 0 | 0 | 0 |
| ICEJAMFLOODMINOR | 0 | 0 | 1000 | 0 |
| ICEONROAD | 1 | 0 | 0 | 0 |
| ICEROADS | 0 | 1 | 12000 | 0 |
| ICESTORM | 82 | 318 | 3642248810 | 15660000 |
| ICYROADS | 4 | 22 | 331200 | 0 |
| LAKEEFFECTSNOW | 0 | 0 | 40182000 | 0 |
| LAKESHOREFLOOD | 0 | 0 | 7540000 | 0 |
| LANDSLIDE | 37 | 52 | 324578000 | 20017000 |
| LANDSLIDES | 1 | 1 | 5000 | 0 |
| LANDSLUMP | 0 | 0 | 570000 | 0 |
| LANDSPOUT | 0 | 0 | 7000 | 0 |
| LATESEASONSNOW | 0 | 0 | 180000 | 0 |
| LIGHTFREEZINGRAIN | 0 | 0 | 451000 | 0 |
| LIGHTNING | 650 | 4140 | 743077080 | 6898440 |
| LIGHTSNOW | 1 | 2 | 2513000 | 0 |
| LIGHTSNOWFALL | 0 | 0 | 85000 | 0 |
| MARINEACCIDENT | 1 | 2 | 50000 | 0 |
| MARINEHAIL | 0 | 0 | 4000 | 0 |
| MARINEHIGHWIND | 1 | 1 | 1297010 | 0 |
| MARINESTRONGWIND | 14 | 22 | 418330 | 0 |
| MARINETHUNDERSTORMWIND | 10 | 26 | 436400 | 50000 |
| MARINETSTMWIND | 9 | 8 | 5421000 | 0 |
| MICROBURST | 0 | 0 | 20000 | 0 |
| MIXEDPRECIP | 2 | 26 | 0 | 0 |
| MIXEDPRECIPITATION | 0 | 0 | 790000 | 0 |
| MUDSLIDE | 4 | 2 | 1325100 | 0 |
| MUDSLIDES | 1 | 0 | 0 | 0 |
| NONSEVEREWINDDAMAGE | 0 | 7 | 5000 | 0 |
| NONTSTMWIND | 0 | 1 | 40000 | 0 |
| OTHER | 0 | 4 | 55500 | 1034400 |
| RAIN | 0 | 0 | 300000 | 250000 |
| RAINSNOW | 4 | 2 | 0 | 0 |
| RECORDHEAT | 2 | 0 | 0 | 0 |
| RIPCURRENT | 340 | 209 | 1000 | 0 |
| RIPCURRENTS | 202 | 294 | 162000 | 0 |
| RIVERFLOOD | 1 | 0 | 20282000 | 1875000 |
| RIVERFLOODING | 1 | 1 | 106155000 | 28020000 |
| ROCKSLIDE | 0 | 0 | 150000 | 0 |
| ROGUEWAVE | 0 | 2 | 0 | 0 |
| ROUGHSEAS | 8 | 5 | 0 | 0 |
| ROUGHSURF | 4 | 1 | 10000 | 0 |
| SEICHE | 0 | 0 | 980000 | 0 |
| SMALLHAIL | 0 | 10 | 70000 | 20793000 |
| SNOW | 2 | 12 | 2554000 | 0 |
| SNOWANDICE | 1 | 0 | 0 | 0 |
| SNOWSQUALL | 2 | 35 | 30000 | 0 |
| SNOWSQUALLS | 1 | 0 | 70000 | 0 |
| STORMSURGE | 2 | 37 | 43193536000 | 5000 |
| STORMSURGETIDE | 11 | 5 | 4641188000 | 850000 |
| STRONGWIND | 103 | 278 | 174759450 | 64953500 |
| STRONGWINDS | 7 | 21 | 2234790 | 0 |
| THUNDERSTORM | 1 | 1 | 0 | 0 |
| THUNDERSTORMWIND | 130 | 1400 | 3382654440 | 398331000 |
| THUNDERSTORMWINDG40 | 1 | 0 | 0 | 0 |
| TIDALFLOODING | 0 | 1 | 13000 | 0 |
| TORNADO | 1511 | 20667 | 24616905710 | 283425010 |
| TORRENTIALRAINFALL | 0 | 4 | 0 | 0 |
| TROPICALDEPRESSION | 0 | 0 | 1737000 | 0 |
| TROPICALSTORM | 57 | 338 | 7642475550 | 677711000 |
| TSTMWIND | 241 | 3629 | 4486156440 | 553915350 |
| TSTMWIND40 | 0 | 0 | 1000 | 0 |
| TSTMWIND41 | 0 | 0 | 8000 | 0 |
| TSTMWIND45 | 0 | 0 | 10000 | 0 |
| TSTMWINDANDLIGHTNING | 0 | 0 | 80000 | 0 |
| TSTMWINDG35 | 1 | 0 | 30000 | 0 |
| TSTMWINDG40 | 0 | 1 | 45000 | 0 |
| TSTMWINDG45 | 0 | 3 | 235500 | 0 |
| TSTMWINDHAIL | 5 | 95 | 44320500 | 64696250 |
| TSUNAMI | 33 | 129 | 144062000 | 20000 |
| TYPHOON | 0 | 5 | 600230000 | 825000 |
| UNSEASONABLECOLD | 0 | 0 | 0 | 5100000 |
| UNSEASONABLYCOLD | 0 | 0 | 0 | 25042500 |
| UNSEASONABLYWARM | 0 | 17 | 0 | 10000 |
| UNSEASONALRAIN | 0 | 0 | 0 | 10000000 |
| URBANSMLSTREAMFLD | 28 | 79 | 58309650 | 8488100 |
| VOLCANICASH | 0 | 0 | 500000 | 0 |
| WARMWEATHER | 0 | 2 | 0 | 0 |
| WATERSPOUT | 2 | 2 | 5730200 | 0 |
| WETMICROBURST | 0 | 0 | 35000 | 0 |
| WHIRLWIND | 1 | 0 | 12000 | 0 |
| WILDFIRE | 75 | 911 | 4758667000 | 295472800 |
| WILDFORESTFIRE | 12 | 545 | 3001782500 | 106782330 |
| WIND | 18 | 84 | 2289500 | 300000 |
| WINDANDWAVE | 0 | 0 | 1000000 | 0 |
| WINDDAMAGE | 0 | 0 | 10000 | 0 |
| WINDS | 1 | 0 | 0 | 0 |
| WINTERSTORM | 191 | 1292 | 1532733250 | 11944000 |
| WINTERWEATHER | 33 | 343 | 20866000 | 15000000 |
| WINTERWEATHERMIX | 28 | 140 | 6432000 | 0 |
| WINTRYMIX | 1 | 77 | 12500 | 0 |
In this section we will look at which events have been most harmful to population health. We will subset the data for the FATALITIES and INJURIES variables and eliminate cases where both of these values are zero.
pophealth <- groupstorm %>%
select(EVTYPECONDENSED, FATALITIES, INJURIES) %>%
filter(FATALITIES > 0 | INJURIES > 0)
Having done this, we examine first the highest fatality rates. Here we show a summary table of the EVETYPECONDENSED with the highest number of fatalities.
pophealthfatal <- pophealth %>%
select(-INJURIES) %>%
arrange(desc(FATALITIES))
summaryfatal <- rbind(head(pophealthfatal, 30),
c("All other types",
sum(pophealthfatal[31:nrow(pophealthfatal), 2]))) %>%
mutate(FATALITIES = as.numeric(FATALITIES)) # Somehow above operation coerces string
par(las = 2)
par(mar = c(5,6,6,2)+0.1)
barplot(height = summaryfatal$FATALITIES, names.arg = summaryfatal$EVTYPECONDENSED, horiz = TRUE, cex.names = .5, space = .5, ylim = c(0,40), xlim = c(0,2000), main = "Figure 1. Greatest causes of fatalities")
From this table it appears that most fatalities occur as a result of excessive heat or tornados. One might consider combining some of the other items in the table as being a part of the same category, thus surpassing the totals for thes events. “Flood” and “Flash flood” combined total 1301 fatalities, but perhaps should not be considered together as they are in fact deemed separate categories by NWS Directive 10-1605. One might also look at the variations in events involving “wind”, but a cursory subsetting suggests that these do not combine in sufficient numbers to surpass either excessive heat or tornados:
arrange(pophealthfatal[grepl("WIND", pophealthfatal$EVTYPECONDENSED),], desc(FATALITIES))
## Source: local data frame [25 x 2]
##
## EVTYPECONDENSED FATALITIES
## 1 TSTMWIND 241
## 2 HIGHWIND 235
## 3 THUNDERSTORMWIND 130
## 4 EXTREMECOLDWINDCHILL 125
## 5 STRONGWIND 103
## 6 COLDWINDCHILL 95
## 7 WIND 18
## 8 EXTREMEWINDCHILL 17
## 9 MARINESTRONGWIND 14
## 10 MARINETHUNDERSTORMWIND 10
## .. ... ...
Again we must keep in mind this is a crude subsetting, because even NWS Directive 10-1605 separates wind events into at least 7 different categories.
We now turn our attention to injury rates. From the following chart we see tornados far surpass any other cause of injury.
pophealthinjury <- pophealth %>%
select(-FATALITIES) %>%
arrange(desc(INJURIES))
summaryinjury <- rbind(head(pophealthinjury, 30),
c("All other types",
sum(pophealthinjury[31:nrow(pophealthinjury), 2]))) %>%
mutate(INJURIES = as.numeric(INJURIES)) # Somehow above operation coerces string
par(las = 2)
par(mar = c(5,6,6,2)+0.1)
barplot(height = summaryinjury$INJURIES, names.arg = summaryinjury$EVTYPECONDENSED, horiz = TRUE, cex.names = .5, space = .5, ylim = c(0,40), xlim = c(0,25000), main = "Figure 2. Greatest causes of injuries")
In this section we will look at which events have been most harmful to U.S. economy, in terms of property and crop damage. We will subset the data for the PROPDMG and CROPDMG variables, sum their value, and eliminate cases where their sum are zero.
economy <- groupstorm %>%
mutate(TOTALDMG = PROPDMG + CROPDMG) %>%
select(EVTYPECONDENSED, TOTALDMG) %>%
filter(TOTALDMG > 0) %>%
arrange(desc(TOTALDMG))
We now look at what types of events cause the most economic damage. From the following figure we can see that floods cause by far the most devastating impact to the economy, distantly trailed by hurricanes and typhoons. We note that even if we had grouped together “TYPHOON” and “HURRICANE” with “HURRICANETYPHOON”, the total would still be less than that of floods. However, if we were to also include “STORMSURGE” with that group, then the total may in fact surpass flooding. It is at this point that the analysis would benefit from expert knowledge that could properly classify these sources of economic impact.
summaryeconomy <- rbind(head(economy, 30),
c("All other types",
sum(economy[31:nrow(economy), 2]))) %>%
mutate(TOTALDMG = as.numeric(TOTALDMG) / 1000000000) # Somehow above operation coerces string
par(las = 2)
par(mar = c(5,6,6,2)+0.1)
barplot(height = summaryeconomy$TOTALDMG, names.arg = summaryeconomy$EVTYPECONDENSED, horiz = TRUE, cex.names = .5, space = .5, ylim = c(0,40), xlim = c(0,160), main = "Figure 3. Greatest impact on the economy, in billions of dollars")