Severe Weather Impact:

An analysis of the human and economic costs of severe weather events in the United States from 1950 through 2011.

Michael Seelaus

Synopsis

The analysis was conducted using the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database. This database tracks severe weather events across the United States and includes data on casualties and economic loss in dollars. The data cover the period 1950 through November 2011. The analysis looked at the total casualties, average casualties per event, and total economic loss grouped by type of event. These event groupings were created specifically for this analysis, both to account for variation in reporting over time and to give a higher-level picture of the results. The data show a stark difference between casualties from Tornadoes and all other event types: Tornadoes exceed the second-highest category by almost an order of magnitude. For economic loss, Tropical events (defined as Tropical Storms/Hurricanes and associated Storm Surge) cost more than the second- and third-worst categories combined (Flood and Tornado, respectively). The rest of this report describes the data processing and presents the results in more detail.

Data Processing

Before any analysis can begin, the data must be cleaned. This particular dataset is quite large and covers a very long period of time (61 years!). As you may imagine, the style of entries in the database has changed and grown over time. As a result, there are 985 different types of severe weather events. Many of these are slight variations in wording of the same event, and others are related events, such as Storm Surge during a Hurricane. For this analysis, I have chosen to group these into 13 high-level categories, with a 14th “Other” category for the remaining events. The 13 named categories cover the vast majority of economic damage and casualties in the dataset.
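
As a rough illustration of how much of that count can come from wording variations alone, the sketch below compares the number of distinct labels before and after simple normalization. The labels are made up for illustration and are not taken from the database; the analysis itself relies on case-insensitive string matching rather than this normalization.

```r
# Toy event-type labels (made up for illustration, not taken from the database)
evtypes <- c("TSTM WIND", "Tstm Wind", " TSTM WIND", "HAIL", "Hail", "FLASH FLOOD")

# Number of distinct labels exactly as entered
n_raw <- length(unique(evtypes))

# Number of distinct labels after trimming whitespace and upper-casing
normalized <- toupper(trimws(evtypes))
n_clean <- length(unique(normalized))

print(c(raw = n_raw, normalized = n_clean))
```

Normalization of this kind only removes differences in case and spacing. Related events with different names, such as Storm Surge and Hurricane, still have to be grouped by hand.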

First, the packages needed for the rest of the analysis are loaded (in this case, “plyr” and “ggplot2”). The full dataset is then read in and immediately trimmed to contain only the variables relevant to this analysis. A new variable called “casualties” is added as the simple sum of “FATALITIES” and “INJURIES”. Finally, the original full dataset is removed from memory in preparation for the processing to come.

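library(plyr)     # ddply, mutate, summarize, join
library(ggplot2)  # plotting
# Read the compressed CSV
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
# Keep only the variables used in this analysis
my_data <- data[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP","REFNUM")]
# Casualties are the sum of fatalities and injuries
my_data$casualties <- my_data$FATALITIES + my_data$INJURIES
# Free the memory held by the full dataset
rm(data)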

A closer examination of the data reveals that the crop and property damage estimates are recorded as a number with a multiplier in a separate column. This column is meant to hold a single character representing the scale (e.g. k for thousands, m for millions, b for billions). There are a variety of other entries in this variable that are not clearly defined in the accompanying documentation for the dataset. In the absence of any information on those entries, the recorded number has been used without additional scaling for those observations. The “costing” function below was created to calculate the full dollar cost of each event. This custom function was used with mapply to create two new variables in the dataset representing the full property damage cost and full crop damage cost. A sum of these two was also added to represent the total cost.

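costing <- function(x, exp) {
        # Scale by the multiplier code: thousands, millions, or billions
        if (exp %in% c("k","K")) {
                return(x * 1000)
        } else if (exp %in% c("m","M")) {
                return(x * 1000000)
        } else if (exp %in% c("b","B")) {
                return(x * 1000000000)
        } else {
                # Unrecognized or blank codes: use the value without scaling
                return(x)
        }
}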
my_data$propcost <- mapply(costing, my_data$PROPDMG, my_data$PROPDMGEXP)
my_data$cropcost <- mapply(costing, my_data$CROPDMG, my_data$CROPDMGEXP)
my_data$totalcost <- my_data$propcost + my_data$cropcost
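
The same scaling can also be expressed as a vectorized lookup, which avoids calling a function once per row. The sketch below is an alternative under stated assumptions, not the code used for this analysis: the names `multipliers` and `scale_cost` and the toy inputs are hypothetical, and any code other than k, m, or b is assumed to mean "no scaling", as described above.

```r
# Hypothetical lookup table of multiplier codes (names are illustrative)
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

scale_cost <- function(x, code) {
  # Look up each multiplier by upper-cased code; unknown codes yield NA
  m <- multipliers[toupper(as.character(code))]
  # Unrecognized or blank codes fall back to a multiplier of 1
  m[is.na(m)] <- 1
  unname(x * m)
}

# Toy inputs (made up): damage amounts and their multiplier codes
dmg  <- c(25, 2.5, 1, 10, 3)
code <- c("K", "m", "B", "", "+")
cost <- scale_cost(dmg, code)
print(cost)
```

With this approach, the two new columns could be created by passing whole columns at once, for example `scale_cost(my_data$PROPDMG, my_data$PROPDMGEXP)`.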

Visually inspecting the data also revealed an outlier. A single flood was listed in the database as costing $113 billion. This was far outside all of the other observations and could not be verified, so it has been removed from the analysis.

# Drop the unverified outlier by its reference number
my_data <- my_data[my_data$REFNUM != 605943,]
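
One way to make such an outlier visible without plotting is to rank the events by total cost and compare the largest value with the next largest. The sketch below uses a made-up data frame (`toy`) with the same column names as the dataset; the values are illustrative only.

```r
# Toy data (made up): five events with a reference number and a total cost
toy <- data.frame(
  REFNUM    = 1:5,
  totalcost = c(2.0e6, 5.0e8, 1.2e11, 7.5e7, 3.1e9)
)

# Rank events from most to least costly
ranked <- toy[order(-toy$totalcost), ]
print(head(ranked, 3))

# How many times larger is the top event than the runner-up?
ratio <- ranked$totalcost[1] / ranked$totalcost[2]
print(ratio)

# Drop the suspect event by its reference number
cleaned <- toy[toy$REFNUM != ranked$REFNUM[1], ]
```

A large ratio is only a prompt to check the record against other sources. It does not by itself show that the entry is wrong.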

At this point, it is necessary to group the event types (EVTYPE) into more manageable chunks. The process begins by selecting all of the unique elements from the EVTYPE variable and placing them in a new data frame. mutate within ddply is then used to search each entry for a particular string and, if it is found, to enter a specific category name in a new column. After this column is first created, each later step assigns a category only to entries that match its string and have not yet been assigned one. In other words, the order of selection matters, since multiple strings are often found in one entry (e.g. Thunderstorm / Tornado / Hail). The final listing of events is as follows:

| Event Name | Description |
|---|---|
| TORNADO | Tornadoes, funnel clouds, waterspouts |
| WIND | Wind events not otherwise defined |
| HEAT | Excessive heat, dry conditions, drought |
| FLOOD | All varieties of flood, not including storm surge |
| WINTER | Winter storms, ice, blizzard, extreme cold |
| LIGHTNING | Lightning events not otherwise defined |
| TROPICAL | Hurricane, Tropical Storm/Depression, related Storm Surge |
| FIRE | Wildfire and other fires not otherwise defined |
| HAIL | Hail events not otherwise defined |
| FOG | Fog events not otherwise defined |
| RIP CURRENT | Rip currents |
| SURF | Surf advisories / high surf / rough surf |
| AVALANCHE | Avalanche |
| OTHER | All other events not defined above |

evmap <- as.data.frame(unique(my_data$EVTYPE))
colnames(evmap) <- "EVTYPE"
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("TORNADO",EVTYPE, ignore.case = TRUE)) {"TORNADO"})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("CLOUD", EVTYPE, ignore.case = TRUE) & is.na(category)){"TORNADO"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("SPOUT", EVTYPE, ignore.case = TRUE) & is.na(category)){"TORNADO"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("FUNNEL", EVTYPE, ignore.case = TRUE) & is.na(category)){"TORNADO"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("SURF", EVTYPE, ignore.case = TRUE) & is.na(category)){"SURF"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("FLOOD", EVTYPE, ignore.case = TRUE) & is.na(category)){"FLOOD"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("FLD", EVTYPE, ignore.case = TRUE) & is.na(category)){"FLOOD"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("HEAT", EVTYPE, ignore.case = TRUE) & is.na(category)){"HEAT"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("WARM", EVTYPE, ignore.case = TRUE) & is.na(category)){"HEAT"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("DRY", EVTYPE, ignore.case = TRUE) & is.na(category)){"HEAT"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("^DROUGHT", EVTYPE, ignore.case = TRUE) & is.na(category)){"HEAT"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("WINT", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("SNOW", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("ICE", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("FROST", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("FREEZE", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("COLD", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("BLIZZARD", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("CHILL", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("SLEET", EVTYPE, ignore.case = TRUE) & is.na(category)){"WINTER"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("FIRE", EVTYPE, ignore.case = TRUE) & is.na(category)){"FIRE"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("HAIL", EVTYPE, ignore.case = TRUE) & is.na(category)){"HAIL"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("LIGHTNING", EVTYPE, ignore.case = TRUE) & is.na(category)){"LIGHTNING"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("HURRICANE", EVTYPE, ignore.case = TRUE) & is.na(category)){"TROPICAL"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("TROPICAL", EVTYPE, ignore.case = TRUE) & is.na(category)){"TROPICAL"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("TYPHOON", EVTYPE, ignore.case = TRUE) & is.na(category)){"TROPICAL"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("SURGE", EVTYPE, ignore.case = TRUE) & is.na(category)){"TROPICAL"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("WIND", EVTYPE, ignore.case = TRUE) & is.na(category)){"WIND"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("RIP CURR", EVTYPE, ignore.case = TRUE) & is.na(category)){"RIP CURRENT"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("AVALANC", EVTYPE, ignore.case = TRUE) & is.na(category)){"AVALANCHE"} else {category})
evmap <- ddply(evmap, .(EVTYPE), mutate, category = if(grepl("FOG", EVTYPE, ignore.case = TRUE) & is.na(category)){"FOG"} else {category})
my_data <- join(my_data, evmap)
## Joining by: EVTYPE

The last step above was to join the new event table with the existing dataset in order to assign this new category column to every observation.
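
The ordered, first-match-wins assignment described above can also be written as a small rule table and a loop. The sketch below is a compact alternative, not the code used for this analysis: the `rules` table holds only four of the search strings, and the event labels are made up for illustration.

```r
# Hypothetical rule table: search strings in priority order with their category
rules <- data.frame(
  pattern  = c("TORNADO", "FLOOD", "HAIL", "WIND"),
  category = c("TORNADO", "FLOOD", "HAIL", "WIND"),
  stringsAsFactors = FALSE
)

# Toy event labels (made up), some of which match more than one rule
evtype <- c("TORNADOES, TSTM WIND, HAIL", "Flash Flood", "HAIL/WIND",
            "High Wind", "VOLCANIC ASH")
category <- rep(NA_character_, length(evtype))

for (i in seq_len(nrow(rules))) {
  # Only labels without a category yet can be claimed by this rule
  hit <- is.na(category) & grepl(rules$pattern[i], evtype, ignore.case = TRUE)
  category[hit] <- rules$category[i]
}

# Anything left unmatched falls into the catch-all category
category[is.na(category)] <- "OTHER"
print(data.frame(evtype, category))
```

Because earlier rows of the table take priority, reordering the rows changes the result for labels that match more than one string.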

Finally, ddply is used to create a dataset of summary statistics across the newly created event groupings. These statistics include the total fatalities, injuries, and casualties; the average fatalities, injuries, and casualties per event; and the crop damage cost, property damage cost, and total economic cost for each grouping.

my_data <- ddply(my_data, .(category), summarize,
        fatalities = sum(FATALITIES),
        fatalities_per = fatalities / length(category),
        injuries = sum(INJURIES),
        injuries_per = injuries / length(category),
        casualties = sum(casualties),
        casualties_per = casualties / length(category),
        cropcost = sum(cropcost),
        propcost = sum(propcost),
        totalcost = sum(propcost) + sum(cropcost))
# Events that matched none of the search strings have no category; label them OTHER
my_data[is.na(my_data$category),"category"] <- "OTHER"
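
To make the difference between a total and a per-event rate concrete, the sketch below works through the arithmetic in base R on a made-up set of six events. The numbers are illustrative only and are not results from the database.

```r
# Toy events (made up): category and casualties for each individual event
events <- data.frame(
  category   = c("HEAT", "HEAT", "TORNADO", "TORNADO", "TORNADO", "TORNADO"),
  casualties = c(12, 8, 30, 0, 0, 2)
)

# Total casualties per category
total <- tapply(events$casualties, events$category, sum)

# Number of recorded events per category
n <- tapply(events$casualties, events$category, length)

# Average casualties per event
per_event <- total / n

print(rbind(total, n, per_event))
```

In this toy example one category has the higher total while the other has the higher per-event average, which is why the two measures can rank categories differently.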

Results

The first section of this analysis looked at the events most harmful with respect to population health, specifically injuries and fatalities (here combined as casualties). As can be seen below from the five categories with the highest number of casualties, Tornadoes cause far more human suffering than any other event type. This may not be that surprising given the violence of these events and the difficulty of forecasting them far ahead of time.

ggplot(my_data, aes(x=casualties, y=reorder(category, casualties))) +
        geom_segment(aes(yend=category), xend=0, color="grey60") +
        geom_point(size=3) +
        theme_bw() +
        theme(panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(), panel.grid.major.y = element_blank()) +
        labs(x="Casualties", y = "Weather Event", title = "Casualties by Weather Event")
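
The plotting call as written draws every category in the summary table. If only the five categories with the most casualties are wanted, the table can be sorted and truncated first. The sketch below shows one way to do that in base R; the data frame `summary_toy`, its placeholder category names, and its values are made up for illustration.

```r
# Toy summary table (made up): one row per category
summary_toy <- data.frame(
  category   = c("A", "B", "C", "D", "E", "F", "G"),
  casualties = c(120, 4500, 87, 960, 15, 2300, 640),
  stringsAsFactors = FALSE
)

# Sort by casualties, largest first, and keep the first five rows
sorted <- summary_toy[order(summary_toy$casualties, decreasing = TRUE), ]
top5 <- head(sorted, 5)
print(top5)
```

The resulting data frame could then be passed to ggplot in place of the full summary table.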

However, from a casualty-per-occurrence standpoint, Tornadoes drop to third, and Heat and Rip Currents take the top two spots. While not as frequent as Tornadoes, Heat and Rip Current events cause more casualties per event on average. This highlights the need to understand the types of events prevalent in each location and the variance in types of risk associated with each.

ggplot(my_data, aes(x=casualties_per, y=reorder(category, casualties_per))) +
        geom_segment(aes(yend=category), xend=0, color="grey60") +
        geom_point(size=3) +
        theme_bw() +
        theme(panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(), panel.grid.major.y = element_blank()) +
        labs(x="Average Casualties per Event", y = "Weather Event", title = "Average Casualties per Weather Event")

The second part of this analysis looked at the events most harmful from an economic standpoint. On this dimension, TROPICAL now dominates the field. This is also not surprising given the vast property damage caused by Hurricanes in the United States. Flood and Tornado come in second and third, in a tier of their own. All other event types fall back to lower levels. Given the information analyzed in this report, both Tornadoes and Hurricanes (and related events) score high on both economic and human cost.

ggplot(my_data, aes(x=totalcost, y=reorder(category, totalcost))) +
        geom_segment(aes(yend=category), xend=0, color="grey60") +
        geom_point(size=3) +
        theme_bw() +
        theme(panel.grid.major.x = element_blank(), panel.grid.minor.x = element_blank(), panel.grid.major.y = element_blank()) +
        labs(x="Total Damage Cost (Billions)", y = "Weather Event", title = "Total Damage Cost by Weather Event") +
        scale_x_continuous(labels=function(x)x/1000000000)