This assessment uses data from the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database to identify trends in harm to public health and economic damage caused by storms in the United States between 1950 and 2011. The NOAA Storm Database records health and economic outcomes for 977 storm event types, along with each event's beginning and ending dates. Health outcomes include injuries and fatalities caused by storm events. Economic outcomes include property damage and crop damage, in dollars. The analysis code reads in the raw NOAA storm data, processes it, and calculates overall and annual health and economic outcomes for each event type. The assessment finds that tornadoes have caused the most injuries and fatalities in total, although the average number of injuries and fatalities per tornado event declined from 1950 to 2011. It also finds that floods have caused the most property damage and droughts the most crop damage. Average crop damage per drought event has declined in recent years, while average flood damage has remained relatively stable apart from a spike in property damage in 2006.
First, we load the required R packages and read in the raw storm data from the NOAA database, downloading the compressed CSV file only if a local copy does not already exist.
library(readr)
library(dplyr)
library(ggplot2)
library(data.table)
library(lubridate)
library(stringr)
library(purrr)
library(scales)
library(patchwork)
library(tidyr)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file <- file.path(getwd(),"stormdata.csv.bz2")
if (!file.exists(file)) {
  download.file(url, destfile = file)
}
data <- read_csv(file)
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl (1): COUNTYENDN
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Next, we process the data: we extract the beginning and ending year of each storm event, convert the property and crop damage values into total dollar amounts using their exponent codes (for example, "K" for thousands and "M" for millions), and subset the data to the event type, year, health outcome, and economic outcome columns of interest.
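The code below relies on lubridate's parse_date_time(), which tries several date orders, and then on year(). As a dependency-free cross-check of the year-extraction step, the following base R sketch assumes, without confirming it against the NOAA file, that date strings follow a single month/day/year layout followed by a time, such as "4/18/1950 0:00:00". The sample strings are hypothetical.

```r
# Hypothetical date strings in an assumed "month/day/year hour:minute:second" layout
bgn_raw <- c("4/18/1950 0:00:00", "11/15/2011 0:00:00", "")

# Parse only the leading date part; strptime() ignores the trailing time,
# and empty or malformed strings become NA
bgn_date <- as.Date(bgn_raw, format = "%m/%d/%Y")

# Extract the four-digit year as an integer (NA dates stay NA)
bgn_year <- as.integer(format(bgn_date, "%Y"))
bgn_year
```

Unlike parse_date_time(), this sketch accepts only one layout, so any record stored in a different order would come back as NA rather than being guessed.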
# Retrieve the year each event began
data$BGN_DATE <- parse_date_time(data$BGN_DATE,
                                 orders = c("mdy", "dmy", "ymd",
                                            "mdy HMS", "ymd HMS"))
data$BGN_YR <- year(data$BGN_DATE)
# Retrieve the year each event ended
data$END_DATE <- parse_date_time(data$END_DATE,
                                 orders = c("mdy", "dmy", "ymd",
                                            "mdy HMS", "ymd HMS"))
data$END_YR <- year(data$END_DATE)
# Subset to health data
health_data <- data %>%
  select(EVTYPE, BGN_YR, END_YR, INJURIES, FATALITIES)
# Subset to economic data
econ_data <- data %>%
  select(EVTYPE, BGN_YR, END_YR, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Clean the property and crop damage columns:
# map each exponent code to a dollar multiplier; missing or unrecognized codes default to 1
clean_damage <- function(df) {
  exp_map <- c(
    "H" = 100, "h" = 100,
    "K" = 1000, "k" = 1000,
    "M" = 1e6, "m" = 1e6,
    "B" = 1e9, "b" = 1e9,
    "0" = 1, "1" = 10, "2" = 100, "3" = 1000,
    "4" = 10000, "5" = 1e5, "6" = 1e6,
    "7" = 1e7, "8" = 1e8, "9" = 1e9,
    "+" = 1, "-" = 1, "?" = 1
  )
  df %>%
    mutate(
      PROPDMGEXP = str_trim(PROPDMGEXP),
      CROPDMGEXP = str_trim(CROPDMGEXP),
      prop_mult = exp_map[PROPDMGEXP],
      crop_mult = exp_map[CROPDMGEXP],
      prop_mult = coalesce(prop_mult, 1),
      crop_mult = coalesce(crop_mult, 1),
      PROPDMG_DOLLARS = PROPDMG * prop_mult,
      CROPDMG_DOLLARS = CROPDMG * crop_mult
    ) %>%
    select(-prop_mult, -crop_mult)
}
econ_data <- clean_damage(econ_data)
# Subset to final columns
econ_data <- econ_data %>%
  select(EVTYPE, BGN_YR, END_YR, PROPDMG_DOLLARS, CROPDMG_DOLLARS)
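To make the exponent conversion in clean_damage() concrete, the base R sketch below applies a subset of the same code-to-multiplier map to a few hypothetical damage records. It mirrors the three steps of that function (trim whitespace, look up the multiplier, fall back to 1 for missing or unknown codes) without depending on dplyr or stringr.

```r
# A subset of the exponent-to-multiplier map used by clean_damage()
exp_map <- c("K" = 1000, "k" = 1000, "M" = 1e6, "m" = 1e6, "B" = 1e9, "b" = 1e9)

# Hypothetical damage records: base value plus exponent code
propdmg    <- c(25, 2.5, 1.2, 40)
propdmgexp <- c("K", " M", "B", NA)

# Trim whitespace and look up each code; unmatched or NA codes yield NA
mult <- unname(exp_map[trimws(propdmgexp)])

# Treat missing or unknown codes as a multiplier of 1
mult[is.na(mult)] <- 1

prop_dollars <- propdmg * mult
prop_dollars
```

Here 25 with code "K" becomes $25,000 and 1.2 with code "B" becomes $1.2 billion, while the record with no code keeps its raw value of 40.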
Then, we calculate the overall and annual health and economic outcomes of each event type, producing one dataset of overall outcomes for events from 1950 to 2011 and one dataset of annual outcomes (grouped by the year each event began) over the same period. For each event type, both datasets contain the total, average (per event), and median number of fatalities; the total, average, and median number of injuries; the total, average, and median dollar amount of property damage; and the total, average, and median dollar amount of crop damage.
# Overall health outcomes
health_summary <- health_data %>%
  group_by(EVTYPE) %>%
  summarize(
    total_fatal = sum(FATALITIES, na.rm = TRUE),
    avg_fatal = mean(FATALITIES, na.rm = TRUE),
    med_fatal = median(FATALITIES, na.rm = TRUE),
    total_inj = sum(INJURIES, na.rm = TRUE),
    avg_inj = mean(INJURIES, na.rm = TRUE),
    med_inj = median(INJURIES, na.rm = TRUE),
    .groups = "drop"
  )
# Overall economic outcomes
econ_summary <- econ_data %>%
  group_by(EVTYPE) %>%
  summarize(
    total_prop = sum(PROPDMG_DOLLARS, na.rm = TRUE),
    avg_prop = mean(PROPDMG_DOLLARS, na.rm = TRUE),
    med_prop = median(PROPDMG_DOLLARS, na.rm = TRUE),
    total_crop = sum(CROPDMG_DOLLARS, na.rm = TRUE),
    avg_crop = mean(CROPDMG_DOLLARS, na.rm = TRUE),
    med_crop = median(CROPDMG_DOLLARS, na.rm = TRUE),
    .groups = "drop"
  )
# Joined overall dataset
full_data_summary <- full_join(
  health_summary,
  econ_summary,
  by = c("EVTYPE")
)
# Annual health outcomes
health_annual <- health_data %>%
  group_by(EVTYPE, BGN_YR) %>%
  summarize(
    total_fatal = sum(FATALITIES, na.rm = TRUE),
    avg_fatal = mean(FATALITIES, na.rm = TRUE),
    med_fatal = median(FATALITIES, na.rm = TRUE),
    total_inj = sum(INJURIES, na.rm = TRUE),
    avg_inj = mean(INJURIES, na.rm = TRUE),
    med_inj = median(INJURIES, na.rm = TRUE),
    .groups = "drop"
  )
# Annual economic outcomes
econ_annual <- econ_data %>%
  group_by(EVTYPE, BGN_YR) %>%
  summarize(
    total_prop = sum(PROPDMG_DOLLARS, na.rm = TRUE),
    avg_prop = mean(PROPDMG_DOLLARS, na.rm = TRUE),
    med_prop = median(PROPDMG_DOLLARS, na.rm = TRUE),
    total_crop = sum(CROPDMG_DOLLARS, na.rm = TRUE),
    avg_crop = mean(CROPDMG_DOLLARS, na.rm = TRUE),
    med_crop = median(CROPDMG_DOLLARS, na.rm = TRUE),
    .groups = "drop"
  )
# Joined annual dataset
full_data_annual <- full_join(
  health_annual,
  econ_annual,
  by = c("EVTYPE", "BGN_YR")
)
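Because each row of the raw data is one storm event, the "average" columns above are means per recorded event, not per year. The base R sketch below reproduces the total/average/median pattern with aggregate() on a handful of hypothetical events (not NOAA records) to show how a type with a large total can still have a modest per-event average and median.

```r
# Hypothetical event-level records (one row per storm event)
events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  FATALITIES = c(0, 1, 8, 0, 2)
)

# Total, mean (per event), and median fatalities for one group
fatality_stats <- function(x) c(total = sum(x), avg = mean(x), med = median(x))

# aggregate() returns one row per event type; the FATALITIES column
# holds a matrix with columns total, avg, and med
by_type <- aggregate(FATALITIES ~ EVTYPE, data = events, FUN = fatality_stats)
by_type
```

In this toy data the tornado total is 9 and its mean is 3, but its median is only 1, because a single severe event dominates the sum.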
Finally, we identify the three event types associated with the most fatalities, the most combined injuries and fatalities, the most property damage, and the most crop damage over the full period.
# Most dangerous events (fatalities)
top_num_fatal <- full_data_summary %>%
  slice_max(total_fatal, n = 3) %>%
  select(EVTYPE, total_fatal)
top_num_fatal
## # A tibble: 3 × 2
## EVTYPE total_fatal
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
# Most dangerous events (injuries and fatalities combined)
top_num_dangerous <- full_data_summary %>%
  mutate(num_inj_or_fatal = total_fatal + total_inj) %>%
  slice_max(num_inj_or_fatal, n = 3) %>%
  select(EVTYPE, num_inj_or_fatal)
top_num_dangerous
## # A tibble: 3 × 2
## EVTYPE num_inj_or_fatal
## <chr> <dbl>
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
# Most costly events for property (damage in billions of dollars)
top_prop_dmg <- full_data_summary %>%
  mutate(total_prop = total_prop / 1e9) %>%
  slice_max(total_prop, n = 3) %>%
  select(EVTYPE, total_prop)
top_prop_dmg
## # A tibble: 3 × 2
## EVTYPE total_prop
## <chr> <dbl>
## 1 FLOOD 145.
## 2 HURRICANE/TYPHOON 69.3
## 3 TORNADO 56.9
# Most costly events for crops (damage in billions of dollars)
top_crop_dmg <- full_data_summary %>%
  mutate(total_crop = total_crop / 1e9) %>%
  slice_max(total_crop, n = 3) %>%
  select(EVTYPE, total_crop)
top_crop_dmg
## # A tibble: 3 × 2
## EVTYPE total_crop
## <chr> <dbl>
## 1 DROUGHT 14.0
## 2 FLOOD 5.66
## 3 RIVER FLOOD 5.03
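The tables above each have exactly three rows, but dplyr's slice_max() keeps ties by default (with_ties = TRUE), so a tie at third place would return more than three event types. The base R sketch below, run on hypothetical totals rather than NOAA figures, imitates that tie-keeping behaviour so the effect is easy to see; the helper name top_n_with_ties() is illustrative only.

```r
# Hypothetical overall totals by event type (not real NOAA figures)
totals <- data.frame(
  EVTYPE      = c("A", "B", "C", "D"),
  total_fatal = c(50, 120, 80, 80)
)

# Keep every row whose value is at least the n-th largest value,
# so rows tied at the cutoff are all retained (like with_ties = TRUE)
top_n_with_ties <- function(df, col, n) {
  n <- min(n, nrow(df))
  cutoff <- sort(df[[col]], decreasing = TRUE)[n]
  out <- df[df[[col]] >= cutoff, ]
  out[order(-out[[col]]), ]   # largest first; ties keep their original order
}

top_n_with_ties(totals, "total_fatal", 2)
```

Asking for two rows returns three here (B, C, and D), because C and D are tied at the cutoff value of 80.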
We find that tornadoes are the most harmful events to population health, both in total fatalities and in combined fatalities and injuries. Floods have caused the most property damage and droughts the most crop damage.
p1a <- ggplot(top_num_fatal, aes(x = reorder(EVTYPE, total_fatal),
                                 y = total_fatal)) +
  geom_col(fill = "firebrick4", color = "black") +
  geom_text(
    aes(y = total_fatal / 2,
        label = comma(total_fatal)),
    hjust = 0.5,
    color = "white",
    size = 4.5
  ) +
  coord_flip() +
  labs(
    title = "Total Fatalities by Event Type",
    y = "Fatalities"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.title.y = element_blank())
p1b <- ggplot(top_num_dangerous, aes(x = reorder(EVTYPE, num_inj_or_fatal),
                                     y = num_inj_or_fatal)) +
  geom_col(fill = "firebrick4", color = "black") +
  geom_text(
    aes(y = num_inj_or_fatal / 2,
        label = comma(num_inj_or_fatal)),
    hjust = 0.5,
    color = "white",
    size = 4.5
  ) +
  coord_flip() +
  labs(
    title = "Total Fatalities and Injuries by Event Type",
    y = "Fatalities and Injuries"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.title.y = element_blank())
p1c <- ggplot(top_prop_dmg, aes(x = reorder(EVTYPE, total_prop),
                                y = total_prop)) +
  geom_col(fill = "firebrick4", color = "black") +
  geom_text(
    aes(y = total_prop / 2,
        label = paste0(dollar(total_prop), "B")),
    hjust = 0.5,
    color = "white",
    size = 4.5
  ) +
  coord_flip() +
  labs(
    title = "Total Property Damage by Event Type",
    y = "Property Damage (in Billion Dollars)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.title.y = element_blank())
p1d <- ggplot(top_crop_dmg, aes(x = reorder(EVTYPE, total_crop),
                                y = total_crop)) +
  geom_col(fill = "firebrick4", color = "black") +
  geom_text(
    aes(y = total_crop / 2,
        label = paste0(dollar(total_crop), "B")),
    hjust = 0.5,
    color = "white",
    size = 4.5
  ) +
  coord_flip() +
  labs(
    title = "Total Crop Damage by Event Type",
    y = "Crop Damage (in Billion Dollars)"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.title.y = element_blank())
plot1 <- (p1a | p1b) /
  (p1c | p1d) +
  plot_annotation(title = "Figure I. Overall Health and Economic Outcomes",
                  theme = theme(
                    plot.title = element_text(size = 18,
                                              face = "bold",
                                              hjust = 0.5)))
plot1
Figure I Caption: Figure I displays total fatalities, total combined fatalities and injuries, and total property and crop damage (in billions of dollars) caused by storm events in the United States from 1950 to 2011. For each measure, the three event types that caused the most harm are shown.
We also find that the average number of fatalities and injuries per tornado event has declined since 1950.
annual_tornado <- full_data_annual %>%
  filter(EVTYPE == "TORNADO") %>%
  select(EVTYPE, BGN_YR, avg_fatal, avg_inj) %>%
  pivot_longer(cols = c(avg_fatal, avg_inj),
               names_to = "variable",
               values_to = "value")
plot2 <- ggplot(annual_tornado, aes(x = BGN_YR,
                                    y = value,
                                    color = variable)) +
  geom_line(linewidth = 1.1) +
  geom_point() +
  scale_x_continuous(breaks = seq(min(annual_tornado$BGN_YR),
                                  max(annual_tornado$BGN_YR),
                                  by = 5)) +
  labs(color = "Measure") +
  scale_color_discrete(labels = c("avg_fatal" = "Fatalities",
                                  "avg_inj" = "Injuries")) +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.title.y = element_blank()) +
  theme(axis.title.x = element_blank())
plot2 <- plot2 +
  plot_annotation(title = "Figure II. Average Tornado Injuries and Fatalities Per Year",
                  theme = theme(
                    plot.title = element_text(size = 18,
                                              face = "bold",
                                              hjust = 0.5)))
plot2
Figure II Caption: Figure II displays the average number of fatalities and injuries per tornado event in each year in the United States from 1950 to 2011.
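Figure II supports the declining trend visually. One way to put a number on "declining", which the original analysis does not do, is to fit a straight-line trend to the annual per-event averages and check the sign of the slope. The base R sketch below does this with lm() on hypothetical values, not the NOAA tornado series; assuming a linear trend is itself a simplification.

```r
# Hypothetical annual averages (fatalities per tornado event), not NOAA values
yr <- 1950:1959
avg_fatal <- c(0.30, 0.28, 0.29, 0.25, 0.24, 0.22, 0.23, 0.20, 0.19, 0.18)

# Ordinary least squares fit of the per-event average against year
fit <- lm(avg_fatal ~ yr)

# Slope: estimated change in the per-event average per year
slope <- unname(coef(fit)["yr"])
slope
```

A negative slope indicates a downward trend. Because years with few recorded tornadoes produce noisier averages, weighting each year by its event count (the weights argument of lm()) is one possible refinement.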
Similarly, average crop damage per drought event has declined in recent years, while average property damage from droughts has remained relatively stable. Average property and crop damage from floods has also remained stable over time, apart from a spike in property damage in 2006.
annual_flood <- full_data_annual %>%
  filter(EVTYPE == "FLOOD") %>%
  mutate(avg_prop = avg_prop / 1000000) %>%
  mutate(avg_crop = avg_crop / 1000000) %>%
  select(EVTYPE, BGN_YR, avg_prop, avg_crop) %>%
  pivot_longer(cols = c(avg_prop, avg_crop),
               names_to = "variable",
               values_to = "value")
annual_drought <- full_data_annual %>%
  filter(EVTYPE == "DROUGHT") %>%
  mutate(avg_prop = avg_prop / 1000000) %>%
  mutate(avg_crop = avg_crop / 1000000) %>%
  select(EVTYPE, BGN_YR, avg_prop, avg_crop) %>%
  pivot_longer(cols = c(avg_prop, avg_crop),
               names_to = "variable",
               values_to = "value")
p3a <- ggplot(annual_flood, aes(x = BGN_YR,
                                y = value,
                                color = variable)) +
  geom_line(linewidth = 1.1) +
  geom_point() +
  scale_x_continuous(breaks = seq(min(annual_flood$BGN_YR),
                                  max(annual_flood$BGN_YR),
                                  by = 1)) +
  labs(title = "Average Flood Damage (in Million Dollars)",
       color = "Measure") +
  scale_color_discrete(labels = c("avg_prop" = "Property Damage",
                                  "avg_crop" = "Crop Damage")) +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.title.y = element_blank()) +
  theme(axis.title.x = element_blank()) +
  theme(legend.position = "none")
p3b <- ggplot(annual_drought, aes(x = BGN_YR,
                                  y = value,
                                  color = variable)) +
  geom_line(linewidth = 1.1) +
  geom_point() +
  scale_x_continuous(breaks = seq(min(annual_drought$BGN_YR),
                                  max(annual_drought$BGN_YR),
                                  by = 1)) +
  labs(title = "Average Drought Damage (in Million Dollars)",
       color = "Measure") +
  scale_color_discrete(labels = c("avg_prop" = "Property Damage",
                                  "avg_crop" = "Crop Damage")) +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.title.y = element_blank()) +
  theme(axis.title.x = element_blank())
plot3 <- (p3a) /
  (p3b) +
  plot_annotation(title = "Figure III. Average Economic Damage From Flood and Drought",
                  theme = theme(
                    plot.title = element_text(size = 18,
                                              face = "bold",
                                              hjust = 0.5)))
plot3
Figure III Caption: Figure III displays the average property damage and crop damage per event (in millions of dollars) caused by floods and droughts in the United States each year from 1993 to 2011.
Note that the NOAA dataset is more complete in recent years, and incomplete event data for earlier years may affect these results. For example, if events from earlier years that caused little health or economic harm are missing from the data, then the average harm per event will be overstated for those years. This would make the apparent decline in harm per tornado event look stronger than it is in reality. Ultimately, local health officials should continue to invest in preparing for tornadoes, as well as floods and droughts, to mitigate harm to population health and local economies.
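The inflation effect described above can be illustrated with a toy calculation. The numbers below are hypothetical, not NOAA data: they represent one year's tornado events, of which only the harmful ones are assumed to have been recorded.

```r
# Hypothetical fatalities for every tornado event in one year
all_events <- c(0, 0, 0, 0, 0, 0, 0, 1, 2, 7)

# Suppose only the events that caused fatalities were recorded that year
recorded <- all_events[all_events > 0]

true_avg     <- mean(all_events)  # average over all events
observed_avg <- mean(recorded)    # average from the incomplete record
c(true_avg = true_avg, observed_avg = observed_avg)
```

The total number of fatalities is the same in both cases (10), but the per-event average rises from 1 to about 3.33 when the harmless events are missing. A later, more complete year would then appear safer by comparison even if nothing had changed.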