Public Health and Economic Impacts of North American Weather Events, 1950-2011

Synopsis

This analysis uses data from the National Climatic Data Center of the United States Department of Commerce to assess the broad public health and economic impacts of severe weather events. The National Weather Service collected these severe weather event records between 1950 and 2011.

Disclaimer

This report represents my submission for Peer Assessment #2 for “Reproducible Research”, Course #5 of 9 in the Johns Hopkins University Data Science track on Coursera. It represents my work alone and is intended for use in this course only.

Data Processing

R Packages Required

This analysis relies on the following packages. The embedded code includes the library calls for each package, but it assumes these packages have already been installed.

dplyr
knitr
xtable
lubridate
ggplot2

The data was downloaded from the Course assignment page: https://class.coursera.org/repdata-034/human_grading/view/courses/975147/assessments/4/submissions

Reading Data

library(dplyr)

library(knitr)
library(xtable)
library(lubridate)
library(ggplot2)
# Uncomment to download; read.csv() can read the bzip2-compressed file directly
#download.file(
#        "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#        "StormData.csv.bz2", method="curl")
stormdata <- read.csv("StormData.csv.bz2")

Pre-Processing Data

The two questions this report seeks to answer both have to do with maximum impacts of different types of weather events:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

This analysis considers the two variables that represent health effects (INJURIES and FATALITIES) and the two sets of variables that represent economic effects (PROPDMG and CROPDMG). The health variables are fairly straightforward, but the economic variables require some additional pre-processing. Each has a separate exponent variable (PROPDMGEXP and CROPDMGEXP), but the exponents do not have a consistent structure.
Sometimes the exponent is a number and sometimes it is a letter, for example “M” or “m” for millions. This chunk of code converts all of the exponents to a common format and then multiplies through to get new, single numeric values for PROPCOST and CROPCOST. Any exponent codes that are still non-numeric after this mapping are coerced to NA, and the corresponding costs are set to zero.

# Clean up exponent for property damage and calculate PROPCOST
propexp <- tolower(stormdata$PROPDMGEXP)
propexp[propexp == "b"] <- 9   #billions  = 10^9
propexp[propexp == "m"] <- 6   #millions  = 10^6
propexp[propexp == "k"] <- 3   #thousands = 10^3
propexp[propexp == "h"] <- 2   #hundreds  = 10^2
propexp <- as.numeric(propexp) #remaining non-numeric codes become NA

## Warning: NAs introduced by coercion

stormdata$PROPCOST <- stormdata$PROPDMG * 10^(propexp)
stormdata$PROPCOST[is.na(stormdata$PROPCOST)] <- 0

# Clean up exponent for crop damage and calculate CROPCOST
cropexp <- tolower(stormdata$CROPDMGEXP)
cropexp[cropexp == "b"] <- 9   #billions  = 10^9
cropexp[cropexp == "m"] <- 6   #millions  = 10^6
cropexp[cropexp == "k"] <- 3   #thousands = 10^3
cropexp[cropexp == "h"] <- 2   #hundreds  = 10^2
cropexp <- as.numeric(cropexp) #remaining non-numeric codes become NA

## Warning: NAs introduced by coercion

stormdata$CROPCOST <- stormdata$CROPDMG * 10^(cropexp)
stormdata$CROPCOST[is.na(stormdata$CROPCOST)] <- 0

# Create total cost variable as sum of PROPCOST and CROPCOST
stormdata$TOTALCOST <- stormdata$PROPCOST + stormdata$CROPCOST
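The property and crop chunks above repeat the same steps, which makes it easy for a copy-paste slip, such as reading the wrong exponent column, to go unnoticed. One alternative is a pair of helpers that apply to both columns. This is a minimal sketch that assumes the same exponent conventions used above (h, k, m, b, or a numeric power of ten). The names `expToPower()` and `damageCost()` are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert an exponent code to a power of ten.
# Letter codes follow the mapping used above; numeric codes are used as-is;
# anything else (blank, "+", "?", ...) becomes NA.
expToPower <- function(code) {
  code    <- tolower(trimws(as.character(code)))
  codeMap <- c(h = 2, k = 3, m = 6, b = 9)
  power   <- suppressWarnings(as.numeric(code))
  isCode  <- code %in% names(codeMap)
  power[isCode] <- unname(codeMap[code[isCode]])
  power
}

# Hypothetical helper: dollar cost = amount * 10^power, with uninterpretable
# exponents treated as zero cost (the same convention as the chunk above).
damageCost <- function(amount, code) {
  cost <- amount * 10^expToPower(code)
  cost[is.na(cost)] <- 0
  cost
}

damageCost(c(25, 1.5, 10, 3), c("K", "m", "", "?"))
# returns c(25000, 1500000, 0, 0)
```

With the real data, the call would be `stormdata$CROPCOST <- damageCost(stormdata$CROPDMG, stormdata$CROPDMGEXP)`. Each cost is then tied to its own exponent column in a single call.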

High Level Summary

#Number of events by outcome
eventsTotal    <- nrow(stormdata)
eventsInjury   <- sum(stormdata$INJURIES > 0)
eventsFatal    <- sum(stormdata$FATALITIES > 0)
eventsProperty <- sum(stormdata$PROPCOST > 0)
eventsCrops    <- sum(stormdata$CROPCOST > 0)

Out of 902,297 events recorded, there were:

17,604 that included injuries (1.95%),
6,974 that included fatalities (0.77%),
239,092 that included damage to property (26.50%), and
17,813 that included damage to crops (1.97%).
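Each percentage above is a count divided by the total number of records. The sketch below shows that arithmetic, rounded to two decimal places. The counts are typed in from the text above so the snippet runs without the data. In the report itself, `eventsInjury`, `eventsFatal`, `eventsProperty`, `eventsCrops`, and `eventsTotal` would take their place.

```r
# Counts as reported above (typed in so the example runs without the data)
counts <- c(injuries = 17604, fatalities = 6974, property = 239092, crops = 17813)
total  <- 902297

# Share of all records with a non-zero value for each outcome, as a percentage
shares <- setNames(sprintf("%.2f%%", 100 * counts / total), names(counts))
print(shares)
# injuries "1.95%", fatalities "0.77%", property "26.50%", crops "1.97%"
```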

Summarizing Event Types

Weather is hard to categorize, often because single events are made up of multiple parts. Many different events can happen in the same span of time, which can make them difficult to separate. For example, thunderstorm events often include wind, rain, hail, and sometimes tornados. Winter events can include snow, sleet, freezing rain, ice, or all of the above, not to mention wind and cold temperatures. Once in a while you’ll even see all of the above: THUNDERSNOW!

The event type within the EVTYPE variable in the original dataset has many unique values, many of which overlap. Instead of just one value for ‘HURRICANE’, there are also values of ‘HURRICANE EMILY’ and ‘HURRICANE FELIX’. The term ‘THUNDERSTORM’ appears in many different values of EVTYPE, and is sometimes even abbreviated as ‘TSTM’ instead.

This analysis groups the various iterations of different types of weather events into a manageable number of discrete categories. These are stored in a new variable called EVCAT, for EVent CATegory. I’ve used the ‘grep’ function to search for characteristic text strings in the EVTYPE variable that correspond to each category. For example, the category ‘tropical’ includes events like hurricanes, typhoons, and tropical storms. It is made up of events that contain the strings ‘hurricane’, ‘tropical’, and ‘typhoon’. The category ‘tstorm’ is made up of the descriptive strings like ‘thunder’ and ‘tstm’, but also components of thunderstorms like ‘lightning’ and ‘hail’.

Wind and rain are part of thunderstorms, but you can also get wind and rain outside of a thunderstorm. So I’ve created a hierarchy of events, and assigned the value of EVCAT in the order of that hierarchy, with lower-order events like rain first and higher-order events like thunderstorms later. That way, if an event has two or more key text strings in its EVTYPE description, it will be assigned the lower-order EVCAT value first and then reassigned the higher-order EVCAT value later. In other words, an EVTYPE of ‘tstm and wind’ will first get the EVCAT value ‘wind’ but end up with the EVCAT value ‘tstorm’. Starting off with an EVCAT value of ‘other’ for all of the entries ensures that any event that does not have a matching text string will end up with an EVCAT value of ‘other’.

This is an approximation, but reducing the number of weather categories from 9xx to 18 (including ‘other’) makes for a more easily understandable analysis. In other words, it’s a reasonable first cut, and it could be refined later if the initial analysis led to more detailed questions (and enough time and resources were allocated to answer them).
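The hierarchy described above works because later assignments overwrite earlier ones. The sketch below applies the same idea with the patterns held in an ordered, named vector, so the hierarchy is explicit in one place. The helper `categorize()` and the rule subset shown are illustrative only, not the full set of rules used below.

```r
# Ordered rules: lower-order categories first, higher-order ones later,
# so a later match overwrites an earlier one (a subset of the rules below).
rules <- c(
  fog    = "fog",
  rain   = "rain",
  wind   = "wind|microburst",
  tstorm = "thunder|tstm|lightning|hail"
)

# Hypothetical helper: start every record as "other", then apply rules in order.
categorize <- function(evtype, rules) {
  evtype <- tolower(evtype)
  evcat  <- rep("other", length(evtype))
  for (category in names(rules)) {
    evcat[grepl(rules[[category]], evtype)] <- category
  }
  evcat
}

categorize(c("TSTM and WIND", "Heavy Rain", "Dense Fog", "Dust Devil"), rules)
# returns "tstorm" "rain" "fog" "other"
```

Reordering the entries of `rules` changes which category wins when a description matches more than one pattern.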

## Categorize events
stormdata$EVTYPE <- tolower(stormdata$EVTYPE)
stormdata$EVCAT <- "other"

#Basic Singular Events
stormdata$EVCAT[grep("fog", stormdata$EVTYPE)] <- "fog"
stormdata$EVCAT[grep("hot|heat|warm", stormdata$EVTYPE)] <- "heat"
stormdata$EVCAT[grep("cold|cool|frost|freeze", stormdata$EVTYPE)] <- "cold"
stormdata$EVCAT[grep("surf|swell|rip|tide", stormdata$EVTYPE)] <- "surf"
stormdata$EVCAT[grep("drought|dry", stormdata$EVTYPE)] <- "drought"

#Precipitation
stormdata$EVCAT[grep("rain", stormdata$EVTYPE)] <- "rain"
stormdata$EVCAT[grep("wind|microburst", stormdata$EVTYPE)] <- "wind"
stormdata$EVCAT[grep("snow|sleet|freezing|ice|icy|blizzard|winter|mix", stormdata$EVTYPE)] <- "snow"
stormdata$EVCAT[grep("flood|fld", stormdata$EVTYPE)] <- "flood"

#Storms
stormdata$EVCAT[grep("thunder|tstm|lightning|hail", stormdata$EVTYPE)] <- "tstorm"
stormdata$EVCAT[grep("tornado|waterspout|funnel", stormdata$EVTYPE)] <- "tornado"
stormdata$EVCAT[grep("tropical|hurricane|typhoon", stormdata$EVTYPE)] <- "tropical"
stormdata$EVCAT[grep("surge", stormdata$EVTYPE)] <- "surge"

#Catastrophes
stormdata$EVCAT[grep("fire", stormdata$EVTYPE)] <- "fire"
stormdata$EVCAT[grep("tsunami", stormdata$EVTYPE)] <- "tsunami"
stormdata$EVCAT[grep("volcanic", stormdata$EVTYPE)] <- "volcano"
stormdata$EVCAT[grep("avalanche|slide", stormdata$EVTYPE)] <- "landslide"

Results

Annual Time Series

The data is collected over a long period of time, from 1950 through 2011. A lot has changed over that time, including the way data is collected, particularly the amount and level of detail of the data. So we’ll first look at the annual totals of health impacts and economic costs for each of the categories we’ve created.

stormdata$YEAR <- year(mdy_hms(stormdata$BGN_DATE))

annualSum   <- aggregate(cbind(INJURIES, FATALITIES, TOTALCOST) ~ EVCAT + YEAR, sum,
                         data=stormdata)
annualAve   <- aggregate(cbind(INJURIES, FATALITIES, TOTALCOST) ~ EVCAT + YEAR, mean,
                         data=stormdata)

Population Health Impacts: Injuries

injPlot <- ggplot(annualSum, aes(YEAR, INJURIES, color=EVCAT))
print(injPlot + geom_line())

injTable   <- select(annualSum[annualSum$INJURIES > 5000,], EVCAT, YEAR, INJURIES)
kable(injTable, align=c('c','c','c','c'))

	EVCAT	YEAR	INJURIES
4	tornado	1953	5131
26	tornado	1965	5197
44	tornado	1974	6824
161	flood	1998	6445
376	tornado	2011	6163

This plot shows the total number of injuries per year for each of the weather categories we created earlier. Tornados have clearly had some of the highest health costs over the entire span of this data, with a number of years with over 2,000 injuries from that category of weather. In each of 1953, 1965, 1974, and 2011, tornados injured more than 5,000 people. You can also see a large flood year in 1998, when over 6,000 people were injured. It is also interesting to note that injuries from tornados seem to have decreased since the mid-1980s, which may suggest that our ability to identify tornados and warn people to take cover has improved since then.
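The apparent decline in tornado injuries could be checked more directly by comparing mean annual injuries before and after a cut-off year. This minimal sketch assumes a data frame shaped like `annualSum`, with EVCAT, YEAR, and INJURIES columns. The helper name, the 1985 cut-off, and the toy values are all hypothetical.

```r
# Hypothetical helper: mean annual injuries for one category, split at a cut-off year
periodMeans <- function(annual, category, cutoff) {
  rows   <- annual[annual$EVCAT == category, ]
  period <- ifelse(rows$YEAR < cutoff, "before", "from")
  tapply(rows$INJURIES, period, mean)
}

# Toy annual totals (invented for illustration, not from the storm data)
toy <- data.frame(
  EVCAT    = c("tornado", "tornado", "tornado", "tornado", "tstorm"),
  YEAR     = c(1980, 1984, 1990, 2000, 1990),
  INJURIES = c(100, 200, 50, 30, 999),
  stringsAsFactors = FALSE
)
periodMeans(toy, "tornado", 1985)
# before = 150, from = 40
```

On the real data, this would be `periodMeans(annualSum, "tornado", 1985)`. Note that `aggregate()` only creates rows for category-year combinations that have records, so years with no tornado records at all are left out of the means rather than counted as zero.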

Population Health Impacts: Fatalities

fatPlot <- ggplot(annualSum, aes(YEAR, FATALITIES, color=EVCAT))
print(fatPlot + geom_line())

fatTable   <- select(annualSum[annualSum$FATALITIES > 200,], EVCAT, YEAR, FATALITIES)
kable(fatTable, align=c('c','c','c','c'))

	EVCAT	YEAR	FATALITIES
3	tornado	1952	230
4	tornado	1953	519
26	tornado	1965	301
44	tornado	1974	366
116	heat	1995	1056
177	heat	1999	502
288	heat	2006	252
376	tornado	2011	587

This plot shows the total number of fatalities per year for each of the weather categories we created earlier. As with injuries, many of the years with the highest number of fatalities involve tornados: in each of 1953, 1965, 1974, and 2011, tornados killed more than 250 people. Heat is the other biggest killer, with heat waves in 1995, 1999, and 2006 each killing more than 250 people. In fact, the worst single year for fatalities in any category was 1995, when heat killed 1,056 people.

Economic Impacts: Damage to Property and Crops

dmgPlot <- ggplot(annualSum, aes(YEAR, TOTALCOST, color=EVCAT))
print(dmgPlot + geom_line())

dmgTable   <- select(annualSum[annualSum$TOTALCOST > 10000000000,], EVCAT, YEAR, TOTALCOST)
kable(dmgTable, align=c('c','c','c','c'))

	EVCAT	YEAR	TOTALCOST
85	flood	1993	17117336100
123	tropical	1995	18717932000
161	flood	1998	13777705050
170	tropical	1998	304767042000
185	tropical	1999	505101461000
264	tropical	2004	448295290000
278	surge	2005	43058565000
280	tropical	2005	352975473330
287	flood	2006	156779168070
304	flood	2007	10746512460
320	flood	2008	16515225310
368	flood	2011	17309877150
376	tornado	2011	12289675800

This plot shows the total amount of damage in dollars per year for each of the weather categories we created earlier. The total damage shown here includes damage to both property and crops. Far and away, the weather events that produce the highest economic costs are tropical storms and hurricanes. The tropical seasons of 1998, 1999, 2004, and 2005 each resulted in over $300 billion in damage, and the worst year, 1999, cost the US over half a trillion dollars in damage to property and crops. Floods are the other big source of economic damage, with six years of damages topping $10 billion each. The tornados of 2011, which killed nearly 600 people and injured over 6,000 more, also resulted in over $12 billion in total damages.

Comparison of Totals and Averages

From the plots above, it’s clear that most categories appear in the data only after 1993. In fact, before that year the only two categories collected were ‘tornado’ and ‘tstorm’. And before 1955, the only category collected was ‘tornado’. So to compare total health and economic costs across categories, we’ll focus the analysis here on data from 1993 onward. If we included the years before that in our totals, the results would include a big skew towards ‘tornado’ and ‘tstorm’.
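The observation that most categories only appear from 1993 onward can also be checked directly, by finding the first year in which each category has any record. This minimal sketch uses toy data with invented values. With the real data, the same `aggregate()` call would use `stormdata` and its YEAR and EVCAT columns.

```r
# Toy records (invented for illustration, not from the storm data)
toy <- data.frame(
  EVCAT = c("tornado", "tornado", "tstorm", "flood", "heat", "tstorm"),
  YEAR  = c(1950, 1994, 1955, 1993, 1995, 2000),
  stringsAsFactors = FALSE
)

# Earliest year with at least one record in each category, oldest first
firstYear <- aggregate(YEAR ~ EVCAT, data = toy, FUN = min)
firstYear <- firstYear[order(firstYear$YEAR), ]
print(firstYear)
```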

This section aggregates cumulative totals and per-record averages of each outcome across the EVCAT variable, concentrating on the years 1993-2011, when all of the categories were being recorded.

recent   <- stormdata[stormdata$YEAR >= 1993, ]
totals   <- aggregate(cbind(INJURIES, FATALITIES, PROPCOST, CROPCOST) ~ EVCAT, sum,
                      data=recent)
averages <- aggregate(cbind(INJURIES, FATALITIES, PROPCOST, CROPCOST) ~ EVCAT, mean,
                      data=recent)

Population Health: Injuries and Fatalities

mostInjuries   <- arrange(select(totals, EVCAT, INJURIES), desc(INJURIES))
mostFatalities <- arrange(select(totals, EVCAT, FATALITIES), desc(FATALITIES))

highestAveInjuries <- arrange(select(averages, EVCAT, INJURIES), desc(INJURIES))
highestAveFatalities <- arrange(select(averages, EVCAT, FATALITIES), desc(FATALITIES))

top10injuries   <- arrange(top_n(stormdata, 10, INJURIES)[c("EVTYPE","INJURIES")], 
                          desc(INJURIES))
top10fatalities <- arrange(top_n(stormdata, 10, FATALITIES)[c("EVTYPE","FATALITIES")], 
                          desc(FATALITIES))
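One caveat on `top_n()`: it keeps ties, so it can return more than ten rows when several records share the tenth-highest value. Newer dplyr versions also mark `top_n()` as superseded in favor of `slice_max()`. The minimal base-R sketch below always returns exactly n rows (or all rows, if there are fewer) and breaks ties by original row order. The `topRecords()` helper and the toy data are hypothetical.

```r
# Hypothetical helper: the n rows with the largest values of a numeric column,
# keeping exactly n rows (ties broken by original row order)
topRecords <- function(df, column, n = 10) {
  ord <- order(-df[[column]], seq_len(nrow(df)))
  df[head(ord, n), , drop = FALSE]
}

# Toy records (invented for illustration, not from the storm data)
toy <- data.frame(
  EVTYPE   = c("tornado", "flood", "heat", "hail"),
  INJURIES = c(5, 9, 9, 1),
  stringsAsFactors = FALSE
)
topRecords(toy, "INJURIES", n = 2)[c("EVTYPE", "INJURIES")]
```

Here `top_n(toy, 1, INJURIES)` would return both tied rows, while `topRecords(toy, "INJURIES", n = 1)` returns only the first of them ("flood").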

Which categories of events have caused the most injuries and the most fatalities since 1993?

EVCAT	INJURIES	EVCAT	FATALITIES
tornado	23403	heat	3143
tstorm	12420	tornado	1652
heat	9228	flood	1553
flood	8683	tstorm	1295
snow	6184	surf	736

Which categories have had the highest average number of injuries and fatalities per event record since 1993?

EVCAT	INJURIES	EVCAT	FATALITIES
tsunami	6.450	tsunami	1.650
heat	3.120	heat	1.063
tropical	1.625	surf	0.346
tornado	0.636	landslide	0.258
other	0.614	tropical	0.190

Which event types (EVTYPE) account for the ten individual records with the most injuries and the most fatalities overall (1950-2011)?

EVTYPE	INJURIES	EVTYPE	FATALITIES
tornado	1700	heat	583
ice storm	1568	tornado	158
tornado	1228	tornado	116
tornado	1150	tornado	114
tornado	1150	excessive heat	99
flood	800	tornado	90
tornado	800	tornado	75
tornado	785	excessive heat	74
hurricane/typhoon	780	excessive heat	67
flood	750	tornado	57

Tornados and thunderstorms injured the most people between 1993 and 2011, while heat events were the biggest killer and a significant source of injuries as well. The table of averages suggests that although tsunami events are infrequent, they take a great toll on population health, in the form of injuries and fatalities, when they do occur. Tropical events also fall into this category of large events with high average health costs.

The last table goes back to the EVTYPE variable from the original dataset to take a finer-grained look at the ten individual records with the most injuries and the most fatalities since 1950. These results should be taken with a grain of salt. As noted earlier, tornados and thunderstorms are likely to be over-represented here, because they were the only types of event recorded before 1993.
So the fact that six of the top ten records for injuries, and six of the top ten for fatalities, are tornados is interesting, but it may be a bit misleading. Also, this table will tend to under-represent events that cross state lines, since each record in the original data table is separated out by location. For example, if a flood caused injuries in several states, a row in this table may represent only the injuries from one of those states.

Economic Impacts: Property Damage and Crop Damage

mostPropDamage <- arrange(select(totals, EVCAT, PROPCOST), desc(PROPCOST))
mostCropDamage <- arrange(select(totals, EVCAT, CROPCOST), desc(CROPCOST))

highestAvePropDamage <- arrange(select(averages, EVCAT, PROPCOST), desc(PROPCOST))
highestAveCropDamage <- arrange(select(averages, EVCAT, CROPCOST), desc(CROPCOST))

top10propDamage <- arrange(top_n(stormdata, 10, PROPCOST)[c("EVTYPE","PROPCOST")], 
                          desc(PROPCOST))
top10cropDamage <- arrange(top_n(stormdata, 10, CROPCOST)[c("EVTYPE","CROPCOST")], 
                          desc(CROPCOST))

Which categories have caused the most property damage and most crop damage since 1993?

EVCAT	PROPCOST	EVCAT	CROPCOST
flood	168269944238	tropical	1.552691e+12
tropical	93072537560	flood	1.351411e+11
surge	47965224000	tstorm	3.269518e+10
tstorm	28098417355	tornado	3.076988e+10
tornado	28014883594	wind	9.260054e+09

Which categories have caused the highest average property and crop costs per event record since 1993?

EVCAT	PROPCOST	EVCAT	CROPCOST
surge	116703708	tropical	1470351052
tropical	88136873	fire	1832552
tsunami	7203100	flood	1569328
fire	2005101	drought	1317350
flood	1954037	tsunami	1000000

Which event types (EVTYPE) account for the ten individual records with the most property damage and the most crop damage overall (1950-2011)?

EVTYPE	PROPCOST	EVTYPE	CROPCOST
flood	1.150e+11	hurricane	5.00e+11
storm surge	3.130e+10	hurricane	3.01e+11
hurricane/typhoon	1.693e+10	hurricane/typhoon	3.00e+11
storm surge	1.126e+10	hurricane/typhoon	2.85e+11
hurricane/typhoon	1.000e+10	hurricane/typhoon	9.32e+10
hurricane/typhoon	7.350e+09	flood	3.25e+10
hurricane/typhoon	5.880e+09	hurricane/typhoon	2.50e+10
hurricane/typhoon	5.420e+09	hurricane/typhoon	2.50e+10
tropical storm	5.150e+09	hurricane opal/high winds	1.00e+10
winter storm	5.000e+09	wildfire	6.50e+09

Floods, tropical events, thunderstorms, and tornados all appear in the top five categories for both total property damage and total crop damage. Between 1993 and 2011, these four types of events consistently ranked among the most economically damaging. The table of averages suggests that fire events and tsunamis, although relatively rare, cause large economic costs when they occur. It is also interesting to note that ‘surge’ events top the list of highest average costs to property. These types of events tend to accompany tropical events such as hurricanes and tropical storms, and it was tempting to group the two together in the original categorization of events. Leaving them separate, however, suggests an interesting difference between them: while tropical events cause high health and economic costs, the impacts of surge events appear to be mostly economic.

The last table again examines the individual records with the highest economic impacts, labeled by the EVTYPE variable from the original dataset. This data reinforces the notion that tropical events like hurricanes and typhoons are the most damaging type to property and crops. Again, this data may under-represent some types of events that span multiple states. A more detailed assessment of the costliest single events would require additional aggregation of multiple records that occur over multiple locations in similar timeframes.
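One rough way to start that aggregation is to combine records that share an event category and a begin date. This assumes that same-day records in the same category often belong to one larger event. The heuristic is hypothetical: it will also merge unrelated events that happen to start on the same day, so a real analysis would need a better matching rule. This minimal sketch uses toy data and assumes BGN_DATE strings in the month/day/year format parsed earlier with `mdy_hms()`.

```r
# Toy records (invented for illustration, not from the storm data)
toy <- data.frame(
  EVCAT     = c("tropical", "tropical", "flood", "tropical"),
  BGN_DATE  = c("8/29/2005 0:00:00", "8/29/2005 0:00:00",
                "6/1/2008 0:00:00", "9/24/2005 0:00:00"),
  TOTALCOST = c(10e9, 5e9, 3e9, 2e9),
  stringsAsFactors = FALSE
)

# Calendar date of each record; as.Date() ignores the trailing time of day
toy$DATE <- format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y-%m-%d")

# Sum costs over records that share a category and a begin date
combined <- aggregate(TOTALCOST ~ EVCAT + DATE, data = toy, FUN = sum)
combined[order(-combined$TOTALCOST), ]
```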

Summary

A public official looking for guidance on how to reduce the population health effects and economic costs of weather events in the United States could take a few ideas from this report, along with some areas for further study.

In terms of impacts to population health, in the form of injuries and fatalities, it may be useful to focus on reducing overall impacts from tornados, heat waves, thunderstorms, and floods. In particular, heat waves tend to kill and injure many people in relatively few events. This could be an area where targeted intervention would make a significant impact.

In terms of economic impacts, in the form of damage to property and crops, it may be useful to focus on protection from large tropical events like hurricanes and typhoons, and from the storm surges that accompany them. The relatively low population health costs of these events may suggest that early warnings and evacuation orders have allowed people to escape the worst impacts. But protecting property and crops, which can’t so easily be moved out of harm’s way, may require significant changes in other areas of policy.

Categorizing weather events into a manageable number of groups is a good way to get an overall look at the data. But it may be worth parsing certain categories further in future analysis to provide more detailed policy advice. In addition, the analysis in this report didn’t join up events that happened concurrently across multiple states. A further analysis that does this may be more useful in determining how to plan for such large events in the future.

Storm Analysis

citywright

10 July 2016