Public Health and Economic Impacts of North American Weather Events, 1950-2011

Synopsis

This analysis uses data from the National Climatic Data Center of the United States Department of Commerce to address the broad public health and economic impacts of severe weather events. The data covers severe weather events recorded by the National Weather Service between 1950 and 2011.

Disclaimer

This report represents my submission for Peer Assessment #2 for “Reproducible Research”, Course #5 of 9 in the Johns Hopkins University Data Science track on Coursera. It represents my work alone, and is intended for use in this course only.

Data Processing

R Packages Required

This analysis relies on the following packages. The embedded code includes the library calls for each package, but it assumes these packages have already been installed.

  • dplyr
  • knitr
  • xtable
  • lubridate
  • ggplot2

The data was downloaded from the Course assignment page: https://class.coursera.org/repdata-034/human_grading/view/courses/975147/assessments/4/submissions

Reading Data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(knitr)
library(xtable)
library(lubridate)
library(ggplot2)
#download.file(
#        "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#        "StormData.csv.bz2", method="curl")
stormdata <- read.csv("StormData.csv.bz2")

Pre-Processing Data

The two questions this report seeks to answer both have to do with maximum impacts of different types of weather events:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

This analysis considers the two variables that represent health effects (INJURIES and FATALITIES) and the two sets of variables that represent economic effects (PROPDMG and CROPDMG). The health variables are fairly straightforward, but the economic variables require some additional pre-processing. There is a separate exponent variable for each (PROPDMGEXP and CROPDMGEXP), but they do not have a consistent structure.
Sometimes the exponent is a number and sometimes it is a letter, for example “M” or “m” for millions. This chunk of code converts all of the exponents to a common format and then multiplies through to get new, single numeric values for PROPCOST and CROPCOST.

# Clean up exponent for property damage and calculate PROPCOST
propexp <- tolower(stormdata$PROPDMGEXP)
propexp[propexp == "b"] <- 9   #billions  = 10^9
propexp[propexp == "m"] <- 6   #millions  = 10^6
propexp[propexp == "k"] <- 3   #thousands = 10^3
propexp[propexp == "h"] <- 2   #hundreds  = 10^2
propexp <- as.numeric(propexp) #coerce to numeric; non-numeric codes become NA
## Warning: NAs introduced by coercion
stormdata$PROPCOST <- stormdata$PROPDMG * 10^(propexp)
stormdata$PROPCOST[is.na(stormdata$PROPCOST)] <- 0

# Clean up exponent for crop damage and calculate CROPCOST
cropexp <- tolower(stormdata$CROPDMGEXP) #crop exponent, not the property exponent
cropexp[cropexp == "b"] <- 9   #billions  = 10^9
cropexp[cropexp == "m"] <- 6   #millions  = 10^6
cropexp[cropexp == "k"] <- 3   #thousands = 10^3
cropexp[cropexp == "h"] <- 2   #hundreds  = 10^2
cropexp <- as.numeric(cropexp) #coerce to numeric; non-numeric codes become NA
## Warning: NAs introduced by coercion
stormdata$CROPCOST <- stormdata$CROPDMG * 10^(cropexp)
stormdata$CROPCOST[is.na(stormdata$CROPCOST)] <- 0

# Create total cost variable as sum of PROPCOST and CROPCOST
stormdata$TOTALCOST <- stormdata$PROPCOST + stormdata$CROPCOST
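
The exponent clean-up above can also be expressed as a single lookup. The following is a minimal sketch in base R, not the code used to produce the results in this report. The helper name `exp_to_power` and the toy vectors are hypothetical. It follows the same convention as the code above: letters map to powers of ten, single digits are taken as the power, and anything else becomes NA.

```r
# Hypothetical helper: convert a damage exponent code to a power of ten.
exp_to_power <- function(code) {
  code <- tolower(as.character(code))
  letters_map <- c(h = 2, k = 3, m = 6, b = 9)
  power <- unname(letters_map[code])       # NA where the code is not a known letter
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  power
}

# Toy data standing in for PROPDMG and PROPDMGEXP
dmg  <- c(25, 2.5, 1, 10, 5)
code <- c("K", "m", "B", "2", "?")

cost <- dmg * 10^exp_to_power(code)
cost[is.na(cost)] <- 0   # same rule as above: unusable exponents count as zero cost
print(cost)
```

Keeping the mapping in one named vector makes it easier to see, and to change, how each exponent code is interpreted.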

High Level Summary

#Number of events by outcome
eventsTotal    <- nrow(stormdata)
eventsInjury   <- sum(stormdata$INJURIES > 0)
eventsFatal    <- sum(stormdata$FATALITIES > 0)
eventsProperty <- sum(stormdata$PROPCOST > 0)
eventsCrops    <- sum(stormdata$CROPCOST > 0)

Out of 902297 events recorded, there were:

  • 17604 that included injuries (1.95%),
  • 6974 that included fatalities (0.77%),
  • 239092 that included damage to property (26.50%), and
  • 17813 that included damage to crops (1.97%).
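
Each percentage above is a count divided by the total number of events. A minimal sketch of that arithmetic, using the counts reported above and rounding to two decimal places (the `pct` helper is a hypothetical name, not part of the original code):

```r
# Hypothetical helper: share of a total, as a rounded percentage
pct <- function(count, total, digits = 2) round(100 * count / total, digits)

eventsTotal <- 902297
counts <- c(injuries = 17604, fatalities = 6974, property = 239092, crops = 17813)

shares <- pct(counts, eventsTotal)
print(shares)
```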

Summarizing Event Types

Weather is hard to categorize, often because single events are made up of multiple parts. Many different events can happen in the same span of time which can make them difficult to separate out. For example, thunderstorm events often include wind, rain, hail, and sometimes tornados. Winter events can include snow, sleet, freezing rain, ice, or all of the above - not to mention wind and cold temperatures. Once in a while you’ll even see all of the above - THUNDERSNOW!

The event type within the EVTYPE variable in the original dataset has many unique values, many of which overlap. Instead of just one value for ‘HURRICANE’, there are also values of ‘HURRICANE EMILY’ and ‘HURRICANE FELIX’. The term ‘THUNDERSTORM’ appears in many different values of EVTYPE, and is sometimes even abbreviated as ‘TSTM’ instead.

This analysis groups the various iterations of different types of weather events into a manageable number of discrete categories. These are stored in a new variable called EVCAT, for EVent CATegory. I’ve used the ‘grep’ function to search for characteristic text strings in the EVTYPE variable that correspond to each category. For example, the category ‘tropical’ includes events like hurricanes, typhoons, and tropical storms. It is made up of events that contain the strings ‘hurricane’, ‘tropical’, and ‘typhoon’. The category ‘tstorm’ is made up of the descriptive strings like ‘thunder’ and ‘tstm’, but also components of thunderstorms like ‘lightning’ and ‘hail’.

Wind and rain are part of thunderstorms, but you can also get wind and rain outside of a thunderstorm. So I’ve created a hierarchy of events, and assigned the value of EVCAT in the order of that hierarchy, with lower-order events like rain first and higher-order events like thunderstorms later. That way, if an event has two or more key text strings in its EVTYPE description, it will be assigned the lower-order EVCAT value first and then reassigned the higher-order EVCAT value later. In other words, an EVTYPE of ‘tstm and wind’ will first get the EVCAT value ‘wind’ but end up with the EVCAT value ‘tstorm’. Starting off with an EVCAT value of ‘other’ for all of the entries ensures that any event that does not have a matching text string will end up with an EVCAT value of ‘other’.

This is an approximation, but reducing the number of weather categories from 9xx to 16 makes for a more easily understandable analysis. In other words, it’s pretty good for a first cut, and could be refined more later if the original analysis led to more detailed questions to be answered (and enough time and resources were allocated to do that more detailed analysis).

## Categorize events
stormdata$EVTYPE <- tolower(stormdata$EVTYPE)
stormdata$EVCAT <- "other"

#Basic Singular Events
stormdata$EVCAT[grep("fog", stormdata$EVTYPE)] <- "fog"
stormdata$EVCAT[grep("hot|heat|warm", stormdata$EVTYPE)] <- "heat"
stormdata$EVCAT[grep("cold|cool|frost|freeze", stormdata$EVTYPE)] <- "cold"
stormdata$EVCAT[grep("surf|swell|rip|tide", stormdata$EVTYPE)] <- "surf"
stormdata$EVCAT[grep("drought|dry", stormdata$EVTYPE)] <- "drought"

#Precipitation
stormdata$EVCAT[grep("rain", stormdata$EVTYPE)] <- "rain"
stormdata$EVCAT[grep("wind|microburst", stormdata$EVTYPE)] <- "wind"
stormdata$EVCAT[grep("snow|sleet|freezing|ice|icy|blizzard|winter|mix", stormdata$EVTYPE)] <- "snow"
stormdata$EVCAT[grep("flood|fld", stormdata$EVTYPE)] <- "flood"

#Storms
stormdata$EVCAT[grep("thunder|tstm|lightning|hail", stormdata$EVTYPE)] <- "tstorm"
stormdata$EVCAT[grep("tornado|waterspout|funnel", stormdata$EVTYPE)] <- "tornado"
stormdata$EVCAT[grep("tropical|hurricane|typhoon", stormdata$EVTYPE)] <- "tropical"
stormdata$EVCAT[grep("surge", stormdata$EVTYPE)] <- "surge"

#Catastrophes
stormdata$EVCAT[grep("fire", stormdata$EVTYPE)] <- "fire"
stormdata$EVCAT[grep("tsunami", stormdata$EVTYPE)] <- "tsunami"
stormdata$EVCAT[grep("volcanic", stormdata$EVTYPE)] <- "volcano"
stormdata$EVCAT[grep("avalanche|slide", stormdata$EVTYPE)] <- "landslide"
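
The order-dependent assignment above can also be written as a loop over an ordered set of patterns, where later entries overwrite earlier ones. This is a minimal sketch in base R on a toy vector, using only three of the categories; `rules` and `categorize` are hypothetical names rather than part of the original analysis.

```r
# Ordered rules: later entries take precedence over earlier ones.
rules <- c(rain   = "rain",
           wind   = "wind|microburst",
           tstorm = "thunder|tstm|lightning|hail")

categorize <- function(evtype, rules) {
  evtype <- tolower(evtype)
  evcat <- rep("other", length(evtype))   # default when nothing matches
  for (category in names(rules)) {
    evcat[grepl(rules[[category]], evtype)] <- category
  }
  evcat
}

evtype <- c("TSTM WIND", "HIGH WIND", "HEAVY RAIN",
            "Thunderstorm Winds/Hail", "DUST DEVIL")
evcat <- categorize(evtype, rules)
print(data.frame(evtype, evcat))
```

Here "TSTM WIND" is first labeled 'wind' and then relabeled 'tstorm', which is the behavior described in the text.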

Results

Annual Time Series

The data is collected over a long period of time, from 1950 through 2011. A lot has changed over that time, including the way data is collected - particularly the amount and level of detail of data that is collected. So we’ll take a look at the annual totals for impacts to health and economic costs for each of the categories we’ve created.

stormdata$YEAR <- year(mdy_hms(stormdata$BGN_DATE))

annualSum   <- aggregate(cbind(INJURIES, FATALITIES, TOTALCOST) ~ EVCAT + YEAR, sum,
                         data=stormdata)
annualAve   <- aggregate(cbind(INJURIES, FATALITIES, TOTALCOST) ~ EVCAT + YEAR, mean,
                         data=stormdata) 
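
To make the year extraction and annual aggregation concrete, here is a minimal sketch on toy records. It uses base R instead of lubridate, and it assumes BGN_DATE is text in month/day/year form followed by a time. The toy values are invented for illustration.

```r
# Toy records; BGN_DATE is assumed to look like "month/day/year hour:minute:second"
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "6/8/1953 0:00:00",
                 "6/9/1953 0:00:00", "7/13/1995 0:00:00"),
  EVCAT      = c("tornado", "tornado", "tornado", "heat"),
  INJURIES   = c(1, 2, 3, 4),
  FATALITIES = c(0, 1, 0, 2),
  stringsAsFactors = FALSE
)

# as.Date() stops reading once the format is matched, so the trailing time is ignored
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# One row per category and year, with each outcome summed
annual <- aggregate(cbind(INJURIES, FATALITIES) ~ EVCAT + YEAR, data = toy, FUN = sum)
print(annual)
```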

Population Health Impacts: Injuries

injPlot <- ggplot(annualSum, aes(YEAR, INJURIES, color=EVCAT))
print(injPlot + geom_line())

injTable   <- select(annualSum[annualSum$INJURIES > 5000,], EVCAT, YEAR, INJURIES)
kable(injTable, align=c('c','c','c','c'))
       EVCAT     YEAR   INJURIES
4      tornado   1953   5131
26     tornado   1965   5197
44     tornado   1974   6824
161    flood     1998   6445
376    tornado   2011   6163

This plot shows the total number of injuries per year for each of the weather categories we created earlier. It’s clear tornados have had some of the highest health costs over the entire span of this data, with a number of years with over 2,000 injuries from that category of weather. Tornados injured more than 5,000 people in each of 1953, 1965, 1974, and 2011. You can also see a large flood total in 1998, when over 6,000 people were injured. It is also interesting to note that injuries from tornados seem to have declined since the mid-1980s, which may suggest that our ability to identify tornados and warn people to take cover has improved since then.

Population Health Impacts: Fatalities

fatPlot <- ggplot(annualSum, aes(YEAR, FATALITIES, color=EVCAT))
print(fatPlot + geom_line())

fatTable   <- select(annualSum[annualSum$FATALITIES > 200,], EVCAT, YEAR, FATALITIES)
kable(fatTable, align=c('c','c','c','c'))
       EVCAT     YEAR   FATALITIES
3      tornado   1952   230
4      tornado   1953   519
26     tornado   1965   301
44     tornado   1974   366
116    heat      1995   1056
177    heat      1999   502
288    heat      2006   252
376    tornado   2011   587

This plot shows the total number of fatalities per year for each of the weather categories we created earlier. As with injuries, many of the highest annual fatality totals come from tornados. The same four years of 1953, 1965, 1974, and 2011 each saw more than 250 tornado deaths. Heat is the other biggest killer, with heat waves in 1995, 1999, and 2006 that each killed more than 250 people. In fact, the worst single year for any category was 1995, when heat killed 1,056 people.

Economic Impacts: Damage to Property and Crops

dmgPlot <- ggplot(annualSum, aes(YEAR, TOTALCOST, color=EVCAT))
print(dmgPlot + geom_line())

dmgTable   <- select(annualSum[annualSum$TOTALCOST > 10000000000,], EVCAT, YEAR, TOTALCOST)
kable(dmgTable, align=c('c','c','c','c'))
       EVCAT      YEAR   TOTALCOST
85     flood      1993   17117336100
123    tropical   1995   18717932000
161    flood      1998   13777705050
170    tropical   1998   304767042000
185    tropical   1999   505101461000
264    tropical   2004   448295290000
278    surge      2005   43058565000
280    tropical   2005   352975473330
287    flood      2006   156779168070
304    flood      2007   10746512460
320    flood      2008   16515225310
368    flood      2011   17309877150
376    tornado    2011   12289675800

This plot shows the total amount of damage in dollars per year for each of the weather categories we created earlier. The total damage shown here includes damage to both property and crops. Far and away the weather events that produce the highest economic costs are tropical storms and hurricanes. The tropical seasons of 1998, 1999, 2004, and 2005 all resulted in over $300 billion in damage each, with the worst year of 1999 costing the US over half a trillion dollars in damage to property and crops. Floods are the other big source of economic damage, with six years of damages topping $10 billion each. The tornado outbreak of 2011, which killed nearly 600 people and injured over 6,000 more, also resulted in over $12 billion in total damages.

Comparison of Totals and Averages

From the plots above, it’s clear that most categories appear in the data only after 1993. In fact, before that year the only two categories collected were ‘tornado’ and ‘tstorm’. And before 1955, the only category collected was ‘tornado’. So to compare total health and economic costs across categories, we’ll focus the analysis here on data from 1993 onward. If we included the years before that in our totals, the results would include a big skew towards ‘tornado’ and ‘tstorm’.

This section aggregates cumulative totals and per-event averages across the EVCAT variable, concentrating on the years 1993-2011 when all of the categories were being recorded.

recent   <- stormdata[stormdata$YEAR >= 1993, ]
totals   <- aggregate(cbind(INJURIES, FATALITIES, PROPCOST, CROPCOST) ~ EVCAT, sum,
                      data=recent)
averages <- aggregate(cbind(INJURIES, FATALITIES, PROPCOST, CROPCOST) ~ EVCAT, mean,
                      data=recent)
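
The choice of a cutoff year rests on when each category first appears in the data. A minimal sketch of that check in base R, on toy data (the years and categories below are invented for illustration and are not taken from the dataset; taking the latest first-appearance year as the cutoff is one possible rule, not necessarily how 1993 was chosen here):

```r
toy <- data.frame(
  EVCAT = c("tornado", "tornado", "tstorm", "flood", "heat", "tstorm"),
  YEAR  = c(2001, 2003, 2002, 2005, 2005, 2006),
  stringsAsFactors = FALSE
)

# Earliest year in which each category is recorded
firstYear <- tapply(toy$YEAR, toy$EVCAT, min)
print(sort(firstYear))

# Hypothetical rule: keep only years in which every category has started appearing
cutoff <- max(firstYear)
recentToy <- toy[toy$YEAR >= cutoff, ]
print(recentToy)
```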

Population Health: Injuries and Fatalities

mostInjuries   <- arrange(select(totals, EVCAT, INJURIES), desc(INJURIES))
mostFatalities <- arrange(select(totals, EVCAT, FATALITIES), desc(FATALITIES))

highestAveInjuries <- arrange(select(averages, EVCAT, INJURIES), desc(INJURIES))
highestAveFatalities <- arrange(select(averages, EVCAT, FATALITIES), desc(FATALITIES))

top10injuries   <- arrange(top_n(stormdata, 10, INJURIES)[c("EVTYPE","INJURIES")], 
                          desc(INJURIES))
top10fatalities <- arrange(top_n(stormdata, 10, FATALITIES)[c("EVTYPE","FATALITIES")], 
                          desc(FATALITIES))

Which categories of events have caused the most injuries and the most fatalities since 1993?

EVCAT     INJURIES     EVCAT     FATALITIES
tornado   23403        heat      3143
tstorm    12420        tornado   1652
heat      9228         flood     1553
flood     8683         tstorm    1295
snow      6184         surf      736

Which categories have the highest average number of injuries and fatalities since 1993?

EVCAT      INJURIES      EVCAT       FATALITIES
tsunami    6.4500000     tsunami     1.6500000
heat       3.1196755     heat        1.0625423
tropical   1.6250000     surf        0.3463529
tornado    0.6364180     landslide   0.2579403
other      0.6140231     tropical    0.1903409

Which event types were the top ten individual events with the most injuries and most fatalities overall?

EVTYPE              INJURIES     EVTYPE           FATALITIES
tornado             1700         heat             583
ice storm           1568         tornado          158
tornado             1228         tornado          116
tornado             1150         tornado          114
tornado             1150         excessive heat   99
flood               800          tornado          90
tornado             800          tornado          75
tornado             785          excessive heat   74
hurricane/typhoon   780          excessive heat   67
flood               750          tornado          57

Tornados and thunderstorms injured the most people between 1993 and 2011, while heat events were the biggest killer and a significant source of injuries as well. The table of averages suggests that while tsunami events do not happen all that often, when they do they take a great toll on population health in the form of injuries and fatalities. Tropical events also fall into this category of large events with high average health costs.
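
The difference between the totals table and the averages table comes down to how many events are in each category. A minimal sketch in base R on invented toy data shows how a rare category can have a low count but a high average:

```r
toy <- data.frame(
  EVCAT    = c("tstorm", "tstorm", "tstorm", "tstorm", "tsunami"),
  INJURIES = c(0, 1, 0, 3, 6),
  stringsAsFactors = FALSE
)

# Count, total, and per-event mean for each category (categories sort alphabetically)
summaryToy <- data.frame(
  events = as.vector(table(toy$EVCAT)),
  total  = as.vector(tapply(toy$INJURIES, toy$EVCAT, sum)),
  mean   = as.vector(tapply(toy$INJURIES, toy$EVCAT, mean)),
  row.names = sort(unique(toy$EVCAT))
)
print(summaryToy)
```

In this toy example the single 'tsunami' record has the highest mean even though 'tstorm' has four times as many events.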

The last table here looks back at the EVTYPE variable from the original dataset to take a finer-grained look at the top ten events for injuries and fatalities since 1950. This needs to be taken with a grain of salt: as we determined earlier, tornados and thunderstorms are likely to be over-represented here, since they were the only types of events being recorded from 1950 to 1992. So the fact that six of the top ten events that caused the most injuries and the most fatalities since 1950 were tornado outbreaks is interesting, but it may be a bit misleading. Also, this table will tend to under-represent events that cross state lines, since each record in the original data table is separated out by location. For example, if a flood caused injuries in several states, the events in this table may only represent the injuries from one of those states.

Economic Impacts: Property Damage and Crop Damage

mostPropDamage <- arrange(select(totals, EVCAT, PROPCOST), desc(PROPCOST))
mostCropDamage <- arrange(select(totals, EVCAT, CROPCOST), desc(CROPCOST))

highestAvePropDamage <- arrange(select(averages, EVCAT, PROPCOST), desc(PROPCOST))
highestAveCropDamage <- arrange(select(averages, EVCAT, CROPCOST), desc(CROPCOST))

top10propDamage <- arrange(top_n(stormdata, 10, PROPCOST)[c("EVTYPE","PROPCOST")], 
                          desc(PROPCOST))
top10cropDamage <- arrange(top_n(stormdata, 10, CROPCOST)[c("EVTYPE","CROPCOST")], 
                          desc(CROPCOST))

Which categories have caused the most property damage and most crop damage since 1993?

EVCAT      PROPCOST        EVCAT      CROPCOST
flood      168269944238    tropical   1.552691e+12
tropical   93072537560     flood      1.351411e+11
surge      47965224000     tstorm     3.269518e+10
tstorm     28098417355     tornado    3.076988e+10
tornado    28014883594     wind       9.260054e+09

Which categories have caused the highest average property costs and crop costs since 1993?

EVCAT      PROPCOST     EVCAT      CROPCOST
surge      116703708    tropical   1470351052
tropical   88136873     fire       1832552
tsunami    7203100      flood      1569328
fire       2005101      drought    1317350
flood      1954037      tsunami    1000000

Which event types were the top ten individual events with the most damage to property and to crops overall?

EVTYPE              PROPCOST     EVTYPE                      CROPCOST
flood               1.150e+11    hurricane                   5.00e+11
storm surge         3.130e+10    hurricane                   3.01e+11
hurricane/typhoon   1.693e+10    hurricane/typhoon           3.00e+11
storm surge         1.126e+10    hurricane/typhoon           2.85e+11
hurricane/typhoon   1.000e+10    hurricane/typhoon           9.32e+10
hurricane/typhoon   7.350e+09    flood                       3.25e+10
hurricane/typhoon   5.880e+09    hurricane/typhoon           2.50e+10
hurricane/typhoon   5.420e+09    hurricane/typhoon           2.50e+10
tropical storm      5.150e+09    hurricane opal/high winds   1.00e+10
winter storm        5.000e+09    wildfire                    6.50e+09

Floods, tropical events, thunderstorms, and tornados are all in the top five categories for total damage to property and crops. In the years between 1993 and 2011, these four types of events caused the most economic damage by far. The table of averages suggests that fire events and tsunamis are relatively rare, but cause large economic costs when they occur. It is also interesting to note that ‘surge’ events appear in the list of highest average costs to property. These types of events tend to accompany tropical events such as hurricanes and tropical storms. It was tempting to group them together in the original categorization of events. Leaving them separate suggests an interesting difference between the two, however. While tropical events cause high health and economic costs, the impacts of surge events tend to be mostly economic.

The last table here again examines the individual events that had the highest economic impacts, identified by the EVTYPE variable from the original dataset. This data reinforces the notion that tropical events like hurricanes and typhoons are the most damaging type to property and crops. Again, this data may under-represent some types of events that span multiple states. A more detailed assessment of the costliest single events would require additional aggregation of multiple records that occur over multiple locations in similar timeframes.
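
One way to approximate the cross-location aggregation described above is to sum records that share an event type and a start date. The sketch below uses invented toy data in base R. The STATE column, the placeholder state codes, and the counts are all assumptions for illustration. Matching on identical type and date is a simplification, since real events can span several days and be labeled inconsistently.

```r
toy <- data.frame(
  STATE    = c("AA", "BB", "CC", "AA"),   # placeholder state codes
  EVTYPE   = c("flood", "flood", "flood", "tornado"),
  BGN_DATE = as.Date(c("2000-05-01", "2000-05-01", "2000-05-01", "2000-05-01")),
  INJURIES = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Combine records of the same type that start on the same day, regardless of state
combined <- aggregate(INJURIES ~ EVTYPE + BGN_DATE, data = toy, FUN = sum)
combined <- combined[order(-combined$INJURIES), ]
print(combined)
```

In the toy data the tornado is the largest single record, but the flood becomes the largest event once its three state-level records are combined.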

Summary

A public official looking for guidance on how to reduce the population health effects and economic costs of weather events in the United States could take a few ideas from this report, including areas for further future study.

In terms of impacts to population health, in the form of injuries and fatalities, it may be useful to focus on reducing overall impacts from tornados, heat waves, thunderstorms, and floods. In particular, heat waves tend to kill and injure many people in relatively few events. This could be an area where targeted intervention could make a significant impact.

In terms of economic impacts, in the form of damage to property and crops, it may be useful to focus on protection from large tropical events like hurricanes, typhoons, and the resulting storm surges from such events. The fact that population health costs are relatively low for these events suggests that early warnings and evacuation orders have allowed people to escape the worst impacts of these events. But protecting property and crops, which can’t so easily be moved out of harm’s way, may require significant changes in other areas of policy.

The method of categorizing weather events into a manageable number is a good way to get an overall look at the data here. But it may be worth parsing certain categories further in future analysis to be able to provide more detailed policy advice. In addition, the analysis in this report didn’t join up events that happened concurrently across multiple states. A further analysis that does this may be more useful in determining how to plan for such large events in the future.