Analysis of outcomes from harmful weather events

Synopsis

This report analyzes data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) database of severe weather events, which covers 1950 to 2011, to determine the impact of major storm events on population health and their economic consequences. The analysis itself is restricted to events recorded from 1996 onward.

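Based on the analysis provided below, tornadoes cause the largest share of severe weather-related injuries in the United States (about 36%), while excessive heat causes the largest share of fatalities (about 21%), followed by tornadoes (about 17%).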

Floods cause the greatest economic damage in the United States, accounting for about 37% of total property and crop damage. Two other event types among the ten most costly are also flood-related: storm surges and flash floods.

Data Processing

This analysis was based on the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States.

The data can be downloaded from the URL given in the code below. A documentation file in PDF format is also available for the database.

The details of the process are described in the following sections.

Downloading and reading the database


suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(ggthemes))
suppressPackageStartupMessages(library(xtable))
suppressPackageStartupMessages(library(plyr))
suppressPackageStartupMessages(library(reshape2))
suppressPackageStartupMessages(library(grid))
suppressPackageStartupMessages(library(gridExtra))

options(scipen = 9)

datafile <- "repdata_data_StormData.csv.bz2"

if (!file.exists(datafile)) {
    # download data, if it does not exist
    url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(url, datafile, mode = "wb")
}

data <- read.csv(bzfile(datafile), header = TRUE, stringsAsFactors = FALSE)
# transform dates
data$BGN_DATE <- as.Date(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

## SUBSETTING DATA
## keep only events from 1996 onward: this data is more homogeneous and a
## little cleaner, and inflation adjustment isn't as difficult
data <- data[data$BGN_DATE >= as.Date("1996-01-01"), ]
# throw out any events for which injuries, fatalities, property damage, and
# crop damage were all 0
data <- data[data$INJURIES > 0 | data$FATALITIES > 0 | data$PROPDMG > 0 | data$CROPDMG > 
    0, ]

## PROCESSING COST DATA adjust property and crop damage values based on
## exponent
data$PROPDMGEXP[data$PROPDMGEXP == ""] <- 1
data$PROPDMGEXP[data$PROPDMGEXP == "K"] <- 1000
data$PROPDMGEXP[data$PROPDMGEXP == "M"] <- 1000000
data$PROPDMGEXP[data$PROPDMGEXP == "B"] <- 1000000000
data$CROPDMGEXP[data$CROPDMGEXP == ""] <- 1
data$CROPDMGEXP[data$CROPDMGEXP == "K"] <- 1000
data$CROPDMGEXP[data$CROPDMGEXP == "M"] <- 1000000
data$CROPDMGEXP[data$CROPDMGEXP == "B"] <- 1000000000
data$PROPDMGEXP <- as.numeric(data$PROPDMGEXP)
data$CROPDMGEXP <- as.numeric(data$CROPDMGEXP)
# create two new columns containing actual costs
data$PROPDMGCOST <- data$PROPDMG * data$PROPDMGEXP
data$CROPDMGCOST <- data$CROPDMG * data$CROPDMGEXP

## CLEANING UP EVENT TYPES homogenize case for EVTYPE values
data$EVTYPE <- toupper(data$EVTYPE)

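# trim leading and trailing whitespace from EVTYPE, then collapse runs of
# internal whitespace to a single space
trim <- function(x) {
    # keep the result of the first substitution so both steps are applied
    x <- gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
    gsub("[[:space:]]{2,}", " ", x)
}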
data$EVTYPE <- trim(data$EVTYPE)
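
The exponent handling above replaces one code at a time. As a rough sketch of the same idea, the codes can also be mapped through a named lookup vector. The helper name exp_to_multiplier and the toy records are hypothetical and are not part of the original analysis. In this sketch, unrecognized codes become NA so that they can be inspected rather than silently converted.

```r
# Hypothetical helper (not part of the original analysis): map damage
# exponent codes to numeric multipliers through a named lookup vector.
exp_to_multiplier <- function(code) {
    lookup <- c(K = 1e3, M = 1e6, B = 1e9)
    code <- toupper(code)
    # a blank code is treated as a multiplier of 1, as in the code above;
    # any other unrecognized code yields NA
    ifelse(code == "", 1, unname(lookup[code]))
}

# Toy records, invented for illustration only
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 500),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)
toy$PROPDMGCOST <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Checking the rows that come back as NA would show which exponent codes, if any, fall outside the four handled here.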

Analysis

# sum fatalities by event type and calculate share of total severe
# weather-related fatalities
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = data, sum)
fatalities$percent <- fatalities$FATALITIES/sum(fatalities$FATALITIES) * 100

# rename and sort for tables and plots
names(fatalities) <- c("Event.Type", "Fatalities", "Percent.Total")
fatalities <- fatalities[order(fatalities[, 2], decreasing = TRUE), ]

To address the question of which types of severe weather events have the greatest human impact, I first determined the total number of fatalities for each event type in the dataset. To make it easier to grasp the impact, I determined the share of total severe weather-related fatalities caused by each event type. Then I ranked the event types by number of fatalities in descending order and looked for a natural break between event types that had a high share of fatalities and those with a lower share. The top four event types accounted for about 55% of severe weather-related fatalities.

    Event.Type       Fatalities   Percent.Total
 1  EXCESSIVE HEAT      1797.00           20.58
 2  TORNADO             1511.00           17.30
 3  FLASH FLOOD          887.00           10.16
 4  LIGHTNING            651.00            7.46
 5  FLOOD                414.00            4.74
 6  RIP CURRENT          340.00            3.89
 7  TSTM WIND            241.00            2.76
 8  HEAT                 237.00            2.71
 9  HIGH WIND            235.00            2.69
10  AVALANCHE            223.00            2.55
Table 1. Number of fatalities per event type
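
The combined share of the top event types can be checked with a running total. The sketch below is illustrative only: it copies the Percent.Total values from Table 1 by hand instead of reading them from the fatalities data frame, and the variable names pct and cum_pct are hypothetical.

```r
# Percent.Total values copied from Table 1 (top ten event types by fatalities)
pct <- c(20.58, 17.30, 10.16, 7.46, 4.74, 3.89, 2.76, 2.71, 2.69, 2.55)
names(pct) <- c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD", "LIGHTNING",
                "FLOOD", "RIP CURRENT", "TSTM WIND", "HEAT", "HIGH WIND",
                "AVALANCHE")

# running total of the shares, in rank order
cum_pct <- cumsum(pct)
print(round(cum_pct, 2))
```

The fourth entry of cum_pct is the combined share of the top four event types. The same check could be applied to the injury and damage tables.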


# sum injuries by event type and calculate share of total severe
# weather-related injuries
injuries <- aggregate(INJURIES ~ EVTYPE, data = data, sum)
injuries$percent <- injuries$INJURIES/sum(injuries$INJURIES) * 100

# rename and sort for tables and plots
names(injuries) <- c("Event.Type", "Injuries", "Percent.Total")
injuries <- injuries[order(injuries[, 2], decreasing = TRUE), ]

To further address the question of which types of severe weather events have the greatest human impact, I determined the total number of injuries for each event type in the dataset. I again determined the share of total severe weather-related injuries caused by each event type and ranked the event types by number of injuries in descending order, looking for a natural break between event types that had a high share of injuries and those with a lower share. The top five event types accounted for about 72% of severe weather-related injuries.

    Event.Type           Injuries   Percent.Total
 1  TORNADO              20667.00           35.65
 2  FLOOD                 6758.00           11.66
 3  EXCESSIVE HEAT        6391.00           11.02
 4  LIGHTNING             4141.00            7.14
 5  TSTM WIND             3629.00            6.26
 6  FLASH FLOOD           1674.00            2.89
 7  THUNDERSTORM WIND     1400.00            2.41
 8  WINTER STORM          1292.00            2.23
 9  HURRICANE/TYPHOON     1275.00            2.20
10  HEAT                  1222.00            2.11
Table 2. Number of Injuries per event type
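
Table 2 lists TSTM WIND and THUNDERSTORM WIND as separate event types. One possible refinement, which was not applied in this analysis, is to merge such label variants before aggregating. The sketch below uses the injury counts from Table 2 and assumes that TSTM WIND is simply an abbreviation of THUNDERSTORM WIND. The data frame names injuries_top and merged are hypothetical.

```r
# Injury counts copied from Table 2 for the two thunderstorm wind labels,
# plus TORNADO for comparison
injuries_top <- data.frame(
    Event.Type = c("TORNADO", "TSTM WIND", "THUNDERSTORM WIND"),
    Injuries = c(20667, 3629, 1400),
    stringsAsFactors = FALSE
)

# Assumption: "TSTM WIND" is an abbreviation of "THUNDERSTORM WIND"
injuries_top$Event.Type <- sub("^TSTM WIND$", "THUNDERSTORM WIND",
                               injuries_top$Event.Type)

# re-aggregate so the merged label appears once, then sort as in the tables
merged <- aggregate(Injuries ~ Event.Type, data = injuries_top, sum)
merged <- merged[order(merged$Injuries, decreasing = TRUE), ]
print(merged)
```

In the full analysis the substitution would be applied to data$EVTYPE before the aggregate() calls, so that every table and plot uses the merged label.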


Across the United States, which types of events have the greatest economic consequences?

# sum damage cost (property and crop) by event type and calculate share of
# total severe weather-related damage costs
cost <- aggregate(PROPDMGCOST + CROPDMGCOST ~ EVTYPE, data = data, sum)
cost$Percent <- cost[, 2]/sum(cost[, 2]) * 100

# rename, convert damage to billions of USD, and sort for tables and plots
names(cost) <- c("Event.Type", "Total.Damage", "Percent.Total")
cost[, 2] <- cost[, 2]/1000000000
cost <- cost[order(cost[, 2], decreasing = TRUE), ]

To address the question of which types of severe weather events have the greatest economic impact, I determined the total cost of property damage and crop damage for each event type in the dataset (in billion USD). To make it easier to grasp the impact, I determined the share of total severe weather-related damage costs caused by each event type. Then I ranked the event types by damage in descending order and looked for a natural break between event types that had a high share of damage and those with a lower share. The top four event types accounted for about 72% of damage.

    Event.Type           Total.Damage   Percent.Total
 1  FLOOD                      148.92           37.09
 2  HURRICANE/TYPHOON           71.91           17.91
 3  STORM SURGE                 43.19           10.76
 4  TORNADO                     24.90            6.20
 5  HAIL                        17.07            4.25
 6  FLASH FLOOD                 16.56            4.12
 7  HURRICANE                   14.55            3.62
 8  DROUGHT                     14.41            3.59
 9  TROPICAL STORM               8.32            2.07
10  HIGH WIND                    5.88            1.46
Table 3. Economic impact (in billion USD) per event type


Results

Human costs of severe weather events


# plot the four event types with the most fatalities; df.f is a copy of the
# sorted fatalities table built in the Analysis section
df.f <- fatalities
df.f$Event.Type <- factor(df.f$Event.Type)
df.f$Event.Type <- with(df.f, reorder(Event.Type, 1/Fatalities))

# scales::comma is called with its namespace because scales is not attached
ggplot(df.f[1:4, ]) +
    geom_bar(aes(x = Event.Type, y = Fatalities, fill = Percent.Total),
        alpha = 0.85, colour = "deepskyblue4", stat = "identity") +
    xlab("") + ylab("Fatalities") +
    scale_y_continuous(labels = scales::comma) +
    scale_fill_continuous(breaks = c(8, 12, 16, 20), labels = c("8", "12", "16", "20")) +
    ggtitle("Fatalities from 1/1996 - 11/2011 by Event Type") +
    labs(fill = "Percent: ") +
    theme_economist() +
    theme(axis.title.y = element_text(vjust = 1, size = 15),
        legend.title = element_text(face = "bold"),
        legend.key.width = unit(1, "cm"))

Figure 1. Fatalities from 1/1996 - 11/2011 by event type (top four event types)


# plot the five event types with the most injuries; df.i is a copy of the
# sorted injuries table built in the Analysis section
df.i <- injuries
df.i$Event.Type <- factor(df.i$Event.Type)
df.i$Event.Type <- with(df.i, reorder(Event.Type, 1/Injuries))

# scales::comma is called with its namespace because scales is not attached
ggplot(df.i[1:5, ]) +
    geom_bar(aes(x = Event.Type, y = Injuries, fill = Percent.Total),
        alpha = 0.85, colour = "deepskyblue4", stat = "identity") +
    xlab("") + ylab("Injuries") +
    scale_y_continuous(labels = scales::comma) +
    scale_fill_continuous(breaks = c(10, 20, 30, 40), labels = c("10", "20", "30", "40")) +
    ggtitle("Injuries from 1/1996 - 11/2011 by Event Type") +
    labs(fill = "Percent: ") +
    theme_economist() +
    theme(axis.title.y = element_text(vjust = 1, size = 15),
        legend.title = element_text(face = "bold"),
        legend.key.width = unit(1, "cm"))

Figure 2. Injuries from 1/1996 - 11/2011 by event type (top five event types)

The most damaging severe weather events, in terms of fatalities and injuries, are excessive heat, tornadoes, flooding (including flash floods), lightning, and thunderstorm winds.

Economic costs of severe weather events


# plot the four event types with the highest total damage; df.c is a copy of
# the sorted cost table built in the Analysis section
df.c <- cost
df.c$Event.Type <- factor(df.c$Event.Type)
df.c$Event.Type <- with(df.c, reorder(Event.Type, 1/Total.Damage))

ggplot(df.c[1:4, ]) +
    geom_bar(aes(x = Event.Type, y = Total.Damage, fill = Percent.Total),
        alpha = 0.85, colour = "deepskyblue4", stat = "identity") +
    xlab("") + ylab("Total Damage ($b)") +
    scale_fill_continuous(breaks = c(10, 20, 30, 40), labels = c("10", "20", "30", "40")) +
    ggtitle("Total Property and Crop Damage\nfrom 1/1996 - 11/2011 by Event Type") +
    labs(fill = "Percent: ") +
    theme_economist() +
    theme(axis.title.y = element_text(vjust = 1, size = 15),
        legend.title = element_text(face = "bold"),
        legend.key.width = unit(1, "cm"))

Figure 3. Total property and crop damage (in billion USD) from 1/1996 - 11/2011 by event type (top four event types)