This report analyzes data from the U.S. National Oceanic and Atmospheric Administration's database of severe weather events, recorded between 1950 and 2011, to determine the impact of major storm events on human health (fatalities and injuries) and their economic consequences.
Based on the analysis below, tornadoes cause the largest share of casualties in the United States, mainly through injuries (about 36% of all injuries), while excessive heat causes the most fatalities (about 21%).
Floods cause the greatest economic damage (about 37% of total property and crop damage). Several other event types among the top ten by damage are also flood-related (storm surges, flash floods).
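This analysis was based on the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States. Although the database spans 1950 to 2011, the analysis uses only events from 1996 onward, where the records are more homogeneous.
The data used can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 (the same URL used in the download step below). A documentation file in PDF format is also available.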
The details of the process are described in the following sections.
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(ggthemes))
suppressPackageStartupMessages(library(xtable))
suppressPackageStartupMessages(library(plyr))
suppressPackageStartupMessages(library(reshape2))
suppressPackageStartupMessages(library(grid))
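suppressPackageStartupMessages(library(gridExtra))
# scales provides the comma() label formatter used in the plots
suppressPackageStartupMessages(library(scales))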
options(scipen = 9)
datafile <- "repdata_data_StormData.csv.bz2"
if (!file.exists(datafile)) {
    # download data, if it does not exist
    url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(url, datafile, mode = "wb")
}
data <- read.csv(bzfile(datafile), header = TRUE, stringsAsFactors = FALSE)
# transform dates
data$BGN_DATE <- as.Date(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
## SUBSETTING DATA
# subset for data beginning in 1996 for more homogeneous data; this data is
# a little cleaner, and inflation adjustment isn't as difficult
data <- data[data$BGN_DATE >= as.Date("1996-01-01"), ]
# throw out any events for which injuries, fatalities, property damage, and
# crop damage were all 0
data <- data[data$INJURIES > 0 | data$FATALITIES > 0 |
    data$PROPDMG > 0 | data$CROPDMG > 0, ]
## PROCESSING COST DATA
# adjust property and crop damage values based on exponent
data$PROPDMGEXP[data$PROPDMGEXP == ""] <- 1
data$PROPDMGEXP[data$PROPDMGEXP == "K"] <- 1000
data$PROPDMGEXP[data$PROPDMGEXP == "M"] <- 1000000
data$PROPDMGEXP[data$PROPDMGEXP == "B"] <- 1000000000
data$CROPDMGEXP[data$CROPDMGEXP == ""] <- 1
data$CROPDMGEXP[data$CROPDMGEXP == "K"] <- 1000
data$CROPDMGEXP[data$CROPDMGEXP == "M"] <- 1000000
data$CROPDMGEXP[data$CROPDMGEXP == "B"] <- 1000000000
data$PROPDMGEXP <- as.numeric(data$PROPDMGEXP)
data$CROPDMGEXP <- as.numeric(data$CROPDMGEXP)
# create two new columns containing actual costs
data$PROPDMGCOST <- data$PROPDMG * data$PROPDMGEXP
data$CROPDMGCOST <- data$CROPDMG * data$CROPDMGEXP
## CLEANING UP EVENT TYPES
# homogenize case for EVTYPE values
data$EVTYPE <- toupper(data$EVTYPE)
# trim leading and trailing whitespace from EVTYPE and collapse runs of
# whitespace into a single space
trim <- function(x) {
    # keep the result of the first substitution; otherwise it is discarded
    x <- gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
    gsub("[[:space:]]{2,}", " ", x)
}
data$EVTYPE <- trim(data$EVTYPE)
# sum fatalities by event type and calculate share of total severe
# weather-related fatalities
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = data, sum)
fatalities$percent <- fatalities$FATALITIES/sum(fatalities$FATALITIES) * 100
# rename and sort for tables and plots
names(fatalities) <- c("Event.Type", "Fatalities", "Percent.Total")
fatalities <- fatalities[order(fatalities[, 2], decreasing = TRUE), ]
To address the question of which types of severe weather events have the greatest human impact, I first determined the total number of fatalities for each event type in the dataset. To make the impact easier to grasp, I calculated each event type's share of total severe weather-related fatalities. I then ranked the event types by number of fatalities in descending order and looked for a natural break between event types with a high share of fatalities and those with a lower share. The top four event types accounted for about 55% of severe weather-related fatalities.
| Rank | Event.Type | Fatalities | Percent.Total |
|---|---|---|---|
| 1 | EXCESSIVE HEAT | 1797 | 20.58 |
| 2 | TORNADO | 1511 | 17.30 |
| 3 | FLASH FLOOD | 887 | 10.16 |
| 4 | LIGHTNING | 651 | 7.46 |
| 5 | FLOOD | 414 | 4.74 |
| 6 | RIP CURRENT | 340 | 3.89 |
| 7 | TSTM WIND | 241 | 2.76 |
| 8 | HEAT | 237 | 2.71 |
| 9 | HIGH WIND | 235 | 2.69 |
| 10 | AVALANCHE | 223 | 2.55 |
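As a quick check on the cumulative share quoted for the top four event types, the sketch below recomputes running totals from the rounded percentages in the fatalities table. The data frame name `top_fatalities` is introduced here for illustration only and is not part of the original analysis.

```r
# Rounded fatality shares (percent), copied from the top-ten table
top_fatalities <- data.frame(
    Event.Type = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD", "LIGHTNING",
        "FLOOD", "RIP CURRENT", "TSTM WIND", "HEAT", "HIGH WIND", "AVALANCHE"),
    Percent.Total = c(20.58, 17.3, 10.16, 7.46, 4.74, 3.89, 2.76, 2.71, 2.69, 2.55),
    stringsAsFactors = FALSE
)
# running total of the share, in rank order
top_fatalities$Cumulative <- cumsum(top_fatalities$Percent.Total)
print(top_fatalities, row.names = FALSE)
```

The fourth row of `Cumulative` is the combined share of the top four event types; the same approach applies to the injury and damage tables.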
# sum injuries by event type and calculate share of total severe
# weather-related injuries
injuries <- aggregate(INJURIES ~ EVTYPE, data = data, sum)
injuries$percent <- injuries$INJURIES/sum(injuries$INJURIES) * 100
# rename and sort for tables and plots
names(injuries) <- c("Event.Type", "Injuries", "Percent.Total")
injuries <- injuries[order(injuries[, 2], decreasing = TRUE), ]
To further address the question of which types of severe weather events have the greatest human impact, I determined the total number of injuries for each event type in the dataset. I again calculated each event type's share of total severe weather-related injuries and ranked the event types by number of injuries in descending order, looking for a natural break between event types with a high share of injuries and those with a lower share. The top five event types accounted for about 72% of severe weather-related injuries.
| Rank | Event.Type | Injuries | Percent.Total |
|---|---|---|---|
| 1 | TORNADO | 20667 | 35.65 |
| 2 | FLOOD | 6758 | 11.66 |
| 3 | EXCESSIVE HEAT | 6391 | 11.02 |
| 4 | LIGHTNING | 4141 | 7.14 |
| 5 | TSTM WIND | 3629 | 6.26 |
| 6 | FLASH FLOOD | 1674 | 2.89 |
| 7 | THUNDERSTORM WIND | 1400 | 2.41 |
| 8 | WINTER STORM | 1292 | 2.23 |
| 9 | HURRICANE/TYPHOON | 1275 | 2.20 |
| 10 | HEAT | 1222 | 2.11 |
# sum damage cost (property and crop) by event type and calculate share of
# total severe weather-related damage costs
cost <- aggregate(PROPDMGCOST + CROPDMGCOST ~ EVTYPE, data = data, sum)
cost$Percent <- cost[, 2]/sum(cost[, 2]) * 100
# rename and sort for tables and plots
names(cost) <- c("Event.Type", "Total.Damage", "Percent.Total")
cost[, 2] <- cost[, 2]/1000000000
cost <- cost[order(cost[, 2], decreasing = TRUE), ]
To address the question of which types of severe weather events have the greatest economic impact, I determined the total cost of property damage and crop damage for each event type in the dataset (in billion USD). To make the impact easier to grasp, I calculated each event type's share of total severe weather-related damage costs. I then ranked the event types by damage in descending order and looked for a natural break between event types with a high share of damage and those with a lower share. The top four event types accounted for about 72% of damage.
| Rank | Event.Type | Total.Damage (billion USD) | Percent.Total |
|---|---|---|---|
| 1 | FLOOD | 148.92 | 37.09 |
| 2 | HURRICANE/TYPHOON | 71.91 | 17.91 |
| 3 | STORM SURGE | 43.19 | 10.76 |
| 4 | TORNADO | 24.90 | 6.20 |
| 5 | HAIL | 17.07 | 4.25 |
| 6 | FLASH FLOOD | 16.56 | 4.12 |
| 7 | HURRICANE | 14.55 | 3.62 |
| 8 | DROUGHT | 14.41 | 3.59 |
| 9 | TROPICAL STORM | 8.32 | 2.07 |
| 10 | HIGH WIND | 5.88 | 1.46 |
# df.f is not defined earlier in this report; it is assumed here to be the
# sorted fatalities table
df.f <- fatalities
df.f$Event.Type <- factor(df.f$Event.Type)
df.f$Event.Type <- with(df.f, reorder(Event.Type, 1/Fatalities))
ggplot(df.f[1:4, ]) +
    geom_bar(aes(x = Event.Type, y = Fatalities, fill = Percent.Total),
        alpha = 0.85, colour = "deepskyblue4", stat = "identity") +
    xlab("") + ylab("Fatalities") +
    scale_y_continuous(labels = scales::comma) +
    scale_fill_continuous(breaks = c(8, 12, 16, 20), labels = c("8", "12", "16", "20")) +
    ggtitle("Fatalities from 1/1996 - 11/2011 by Event Type") +
    labs(fill = "Percent: ") + theme_economist() +
    theme(axis.title.y = element_text(vjust = 1, size = 15),
        legend.title = element_text(face = "bold"),
        legend.key.width = unit(1, "cm"))
# df.i is not defined earlier in this report; it is assumed here to be the
# sorted injuries table
df.i <- injuries
df.i$Event.Type <- factor(df.i$Event.Type)
df.i$Event.Type <- with(df.i, reorder(Event.Type, 1/Injuries))
ggplot(df.i[1:5, ]) +
    geom_bar(aes(x = Event.Type, y = Injuries, fill = Percent.Total),
        alpha = 0.85, colour = "deepskyblue4", stat = "identity") +
    xlab("") + ylab("Injuries") +
    scale_y_continuous(labels = scales::comma) +
    scale_fill_continuous(breaks = c(10, 20, 30, 40), labels = c("10", "20", "30", "40")) +
    ggtitle("Injuries from 1/1996 - 11/2011 by Event Type") +
    labs(fill = "Percent: ") + theme_economist() +
    theme(axis.title.y = element_text(vjust = 1, size = 15),
        legend.title = element_text(face = "bold"),
        legend.key.width = unit(1, "cm"))
The most damaging severe weather events, in terms of fatalities and injuries, are excessive heat, tornadoes, flooding (including flash floods), lightning, and thunderstorm winds.
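To see how fatalities and injuries combine, the sketch below joins the two top-ten tables and sums them per event type. It assumes "casualties" means fatalities plus injuries, and it covers only event types that appear in both top-ten lists, so it is a rough cross-check rather than a full ranking. The names `fat`, `inj`, and `both` are introduced here for illustration.

```r
# Counts copied from the top-ten fatalities and injuries tables
fat <- data.frame(
    Event.Type = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD", "LIGHTNING",
        "FLOOD", "RIP CURRENT", "TSTM WIND", "HEAT", "HIGH WIND", "AVALANCHE"),
    Fatalities = c(1797, 1511, 887, 651, 414, 340, 241, 237, 235, 223),
    stringsAsFactors = FALSE
)
inj <- data.frame(
    Event.Type = c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
        "TSTM WIND", "FLASH FLOOD", "THUNDERSTORM WIND", "WINTER STORM",
        "HURRICANE/TYPHOON", "HEAT"),
    Injuries = c(20667, 6758, 6391, 4141, 3629, 1674, 1400, 1292, 1275, 1222),
    stringsAsFactors = FALSE
)
# inner join: keeps only event types present in both tables
both <- merge(fat, inj, by = "Event.Type")
both$Casualties <- both$Fatalities + both$Injuries
both <- both[order(both$Casualties, decreasing = TRUE), ]
print(both, row.names = FALSE)
```

Under these assumptions tornadoes lead the combined count, driven mostly by injuries, with excessive heat second.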
# df.c is not defined earlier in this report; it is assumed here to be the
# sorted damage cost table
df.c <- cost
df.c$Event.Type <- factor(df.c$Event.Type)
df.c$Event.Type <- with(df.c, reorder(Event.Type, 1/Total.Damage))
ggplot(df.c[1:4, ]) +
    geom_bar(aes(x = Event.Type, y = Total.Damage, fill = Percent.Total),
        alpha = 0.85, colour = "deepskyblue4", stat = "identity") +
    xlab("") + ylab("Total Damage ($b)") +
    scale_fill_continuous(breaks = c(10, 20, 30, 40), labels = c("10", "20", "30", "40")) +
    ggtitle("Total Property and Crop Damage\nfrom 1/1996 - 11/2011 by Event Type") +
    labs(fill = "Percent: ") + theme_economist() +
    theme(axis.title.y = element_text(vjust = 1, size = 15),
        legend.title = element_text(face = "bold"),
        legend.key.width = unit(1, "cm"))