This analysis examines how severe weather events in the United States affect public health and the economy, using data from the NOAA Storm Database (1950 - 2011). We measure public health impact by summing fatalities and injuries for each weather event type, and economic impact by summing property and crop damage. Tornadoes are the single most harmful event type for public health, causing the most fatalities and the most injuries. Excessive heat and heat waves are also major contributors to fatalities, while thunderstorm winds and floods account for a large number of injuries. For economic consequences, floods cause the greatest total damage, followed by hurricanes/typhoons and tornadoes, with storm surges ranking fourth. Drought is the leading cause of crop damage, but property damage from flooding remains the single largest economic contributor overall. Understanding these patterns can help guide public policy, disaster-response planning, and the allocation of safety resources.
The analysis starts from the raw storm data provided by the National Oceanic and Atmospheric Administration (NOAA). The data contains characteristics of major storms and weather events in the United States, including estimates of any fatalities, injuries, and property and crop damage.
We download the dataset directly from the source URL if it does not already exist in the working directory. Then, we read the CSV file directly from the compressed bzip2 archive.
# Define URL and destination file
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "repdata_data_StormData.csv.bz2"
# Download the file if it does not already exist
if (!file.exists(destfile)) {
  download.file(url, destfile, mode = "wb")
}
# Read the compressed csv file
storm_data <- read.csv(destfile)
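No separate decompression step is needed because `read.csv()` opens the path through a file connection that detects bzip2 compression. A minimal sketch on a throwaway file illustrates this; the toy rows and the names `tmp`, `con`, and `demo` are assumptions made for the example and are not part of the analysis.

```r
# Write a tiny CSV through a bzip2 connection (toy data, for illustration only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
  con,
  row.names = FALSE
)
close(con)

# read.csv() decompresses the file transparently when given the path
demo <- read.csv(tmp)
str(demo)
```

The same call pattern applies to the full storm file; only the file size differs.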
To reduce memory usage and speed up processing, we select only the columns relevant to the analysis:

* EVTYPE: Type of weather event.
* FATALITIES: Number of directly or indirectly related deaths.
* INJURIES: Number of directly or indirectly related injuries.
* PROPDMG: Property damage base estimate.
* PROPDMGEXP: Exponent indicator for property damage value.
* CROPDMG: Crop damage base estimate.
* CROPDMGEXP: Exponent indicator for crop damage value.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
# Select relevant columns
cleaned_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
The property and crop damage variables, PROPDMG and
CROPDMG, are accompanied by exponent variables,
PROPDMGEXP and CROPDMGEXP. These exponents
specify the magnitude of the damage values (e.g., ‘K’ for thousands, ‘M’
for millions, ‘B’ for billions).
To calculate the actual damage in USD, we map these exponents to their corresponding numerical multipliers:

* H or h (hundreds) -> \(10^2\)
* K or k (thousands) -> \(10^3\)
* M or m (millions) -> \(10^6\)
* B or b (billions) -> \(10^9\)
* Numeric values 0 to 8 -> \(10^{value}\)
* Characters +, -, ?, and empty string -> \(1\)
We then compute the actual damage values.
# Function to convert exponent codes to numeric multipliers
convert_exponent <- function(exp_col) {
  exp_col <- toupper(trimws(as.character(exp_col)))
  # Default multiplier of 1 covers "+", "-", "?" and the empty string
  multipliers <- rep(1, length(exp_col))
  multipliers[exp_col == "H"] <- 10^2
  multipliers[exp_col == "K"] <- 10^3
  multipliers[exp_col == "M"] <- 10^6
  multipliers[exp_col == "B"] <- 10^9
  # Numeric values 0-8
  numeric_idx <- exp_col %in% as.character(0:8)
  multipliers[numeric_idx] <- 10^as.numeric(exp_col[numeric_idx])
  return(multipliers)
}
# Calculate actual damage values in USD
cleaned_data <- cleaned_data %>%
  mutate(
    PropDamage = PROPDMG * convert_exponent(PROPDMGEXP),
    CropDamage = CROPDMG * convert_exponent(CROPDMGEXP),
    TotalDamage = PropDamage + CropDamage
  )
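As a quick sanity check of the exponent mapping described above, the sketch below applies the same rules to a handful of codes. It uses a named lookup vector rather than the function from the analysis; `exp_lookup` and `to_multiplier()` are hypothetical names introduced only for this illustration.

```r
# Letter codes and their multipliers (same mapping as described in the text)
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map exponent codes to numeric multipliers
to_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  mult <- unname(exp_lookup[code])           # NA for codes not in the lookup
  digit <- code %in% as.character(0:8)       # digits 0-8 mean 10^digit
  mult[digit] <- 10^as.numeric(code[digit])
  mult[is.na(mult)] <- 1                     # "+", "-", "?", "" fall back to 1
  mult
}

codes <- c("K", "m", "B", "3", "+", "", "?")
multipliers <- to_multiplier(codes)

# A base estimate of 25 with exponent "K" is 25,000 USD
damage_usd <- 25 * to_multiplier("K")
```

A lookup vector keeps the letter codes in one place, which can make the mapping easier to audit than a chain of assignments.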
The EVTYPE variable contains many inconsistencies due to
typos, case mismatches, and multiple naming conventions (e.g., “TSTM
WIND” vs “THUNDERSTORM WIND”). We clean and standardize the most
frequent and impactful event types using regular expressions and a
standard classification mapping.
# Clean and standardize event types
# case_when() uses the first matching rule, so specific patterns come before general ones
cleaned_data <- cleaned_data %>%
  mutate(EVTYPE_CLEAN = toupper(trimws(EVTYPE))) %>%
  mutate(EVTYPE_CLEAN = case_when(
    grepl("TORNADO", EVTYPE_CLEAN) ~ "TORNADO",
    grepl("TSTM WIND|THUNDERSTORM", EVTYPE_CLEAN) ~ "THUNDERSTORM WIND",
    grepl("EXCESSIVE HEAT|EXTREME HEAT|RECORD HEAT", EVTYPE_CLEAN) ~ "EXCESSIVE HEAT",
    grepl("HEAT", EVTYPE_CLEAN) ~ "HEAT",
    grepl("HURRICANE|TYPHOON", EVTYPE_CLEAN) ~ "HURRICANE",
    grepl("STORM SURGE|TIDE", EVTYPE_CLEAN) ~ "STORM SURGE",
    grepl("WILD/FOREST FIRE|WILDFIRE|WILD FIRE", EVTYPE_CLEAN) ~ "WILDFIRE",
    grepl("FLASH FLOOD", EVTYPE_CLEAN) ~ "FLASH FLOOD",
    grepl("FLOOD", EVTYPE_CLEAN) & !grepl("FLASH", EVTYPE_CLEAN) ~ "FLOOD",
    grepl("HAIL", EVTYPE_CLEAN) ~ "HAIL",
    grepl("RIP CURRENT", EVTYPE_CLEAN) ~ "RIP CURRENT",
    grepl("BLIZZARD|WINTER STORM|WINTER WEATHER|SNOW|ICE|FREEZING", EVTYPE_CLEAN) ~ "WINTER WEATHER/STORM",
    grepl("COLD|WIND CHILL|FREEZE|FROST", EVTYPE_CLEAN) ~ "EXTREME COLD/FROST",
    grepl("HIGH WIND|STRONG WIND", EVTYPE_CLEAN) ~ "HIGH WIND",
    grepl("LIGHTNING", EVTYPE_CLEAN) ~ "LIGHTNING",
    TRUE ~ EVTYPE_CLEAN
  ))
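The order of the rules matters because each label is assigned by the first pattern that matches. The base R sketch below mimics that first-match behaviour on a few made-up labels; `rules` and `classify()` are hypothetical names for this illustration, and only a small subset of the patterns is included.

```r
# Ordered rules: names are regular expressions, values are the standardized labels.
# Specific patterns ("FLASH FLOOD") must precede general ones ("FLOOD").
rules <- c(
  "FLASH FLOOD"                             = "FLASH FLOOD",
  "FLOOD"                                   = "FLOOD",
  "EXCESSIVE HEAT|EXTREME HEAT|RECORD HEAT" = "EXCESSIVE HEAT",
  "HEAT"                                    = "HEAT"
)

# Hypothetical helper: assign the label of the first matching rule
classify <- function(x, rules) {
  x <- toupper(trimws(x))
  out <- x                                  # unmatched labels stay as they are
  done <- rep(FALSE, length(x))
  for (i in seq_along(rules)) {
    hit <- !done & grepl(names(rules)[i], x)
    out[hit] <- rules[[i]]
    done <- done | hit
  }
  out
}

labels <- classify(
  c("Flash Flooding", "river flood", " HEAT WAVE", "Record Heat", "DROUGHT"),
  rules
)
```

If the general "FLOOD" rule were listed first, "Flash Flooding" would be absorbed into the FLOOD category, which is why the specific patterns are placed earlier.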
Finally, we aggregate the health and economic metrics by the cleaned event categories.
# Aggregate public health data
health_summary <- cleaned_data %>%
  group_by(EVTYPE_CLEAN) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE),
    TotalHealth = Fatalities + Injuries
  )
# Aggregate economic data
economic_summary <- cleaned_data %>%
  group_by(EVTYPE_CLEAN) %>%
  summarise(
    PropDamage = sum(PropDamage, na.rm = TRUE),
    CropDamage = sum(CropDamage, na.rm = TRUE),
    TotalDamage = sum(TotalDamage, na.rm = TRUE)
  )
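To make the aggregation step concrete, here is a minimal base R sketch of the same group-and-sum idea on three made-up rows. The data frame `toy` and its values are invented for illustration, and `aggregate()` stands in for the dplyr pipeline used in the analysis.

```r
# Toy records (invented values, not taken from the storm database)
toy <- data.frame(
  EVTYPE_CLEAN = c("TORNADO", "TORNADO", "FLOOD"),
  FATALITIES   = c(2, 1, 0),
  INJURIES     = c(10, 5, 3)
)

# Sum fatalities and injuries within each event category
agg <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_CLEAN, data = toy, FUN = sum)

# Combined health impact per category
agg$TotalHealth <- agg$FATALITIES + agg$INJURIES
agg
```

Each output row corresponds to one event category, with the two TORNADO records collapsed into a single total.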
To identify the weather events most harmful to public health, we analyze the top 10 event categories for both fatalities and injuries.
# Get top 10 events for fatalities and injuries
top_fatalities <- health_summary %>%
  arrange(desc(Fatalities)) %>%
  head(10)
top_injuries <- health_summary %>%
  arrange(desc(Injuries)) %>%
  head(10)
# Combine for plotting
top_health_plot <- bind_rows(
  top_fatalities %>% mutate(Count = Fatalities, Metric = "Fatalities"),
  top_injuries %>% mutate(Count = Injuries, Metric = "Injuries")
)
# Plot Figure 1: Population Health Impacts
ggplot(top_health_plot, aes(x = reorder(EVTYPE_CLEAN, Count), y = Count, fill = Metric)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_wrap(~Metric, scales = "free", ncol = 2) +
  labs(
    title = "Top 10 Severe Weather Events by Health Impact (1950 - 2011)",
    x = "Event Type",
    y = "Total Counts",
    caption = "Figure 1: Comparison of total fatalities and injuries for the top 10 severe weather events."
  ) +
  scale_fill_manual(values = c("Fatalities" = "#d9534f", "Injuries" = "#f0ad4e")) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    strip.text = element_text(face = "bold", size = 12),
    axis.text.y = element_text(size = 10)
  )
As shown in Figure 1, tornadoes are by far the leading cause of both fatalities and injuries in the United States, with over 5,600 fatalities and 91,000 injuries. Excessive heat is the second deadliest event type, with just over 2,000 deaths, while thunderstorm wind and floods account for the second and third highest numbers of injuries, respectively.
Table 1 provides the detailed counts for the top 10 event types by overall public health impact (Fatalities + Injuries).
# Display the top 10 events causing overall health impacts (Fatalities + Injuries)
top_overall_health <- health_summary %>%
  arrange(desc(TotalHealth)) %>%
  head(10)
knitr::kable(top_overall_health,
  col.names = c("Event Type", "Fatalities", "Injuries", "Total Health Impact (Fatalities + Injuries)"),
  caption = "Table 1: Top 10 Weather Events by Total Population Health Impact")
| Event Type | Fatalities | Injuries | Total Health Impact (Fatalities + Injuries) |
|---|---|---|---|
| TORNADO | 5661 | 91407 | 97068 |
| THUNDERSTORM WIND | 729 | 9544 | 10273 |
| EXCESSIVE HEAT | 2020 | 6730 | 8750 |
| FLOOD | 490 | 6802 | 7292 |
| WINTER WEATHER/STORM | 655 | 6052 | 6707 |
| LIGHTNING | 817 | 5231 | 6048 |
| HEAT | 1118 | 2494 | 3612 |
| FLASH FLOOD | 1035 | 1802 | 2837 |
| HIGH WIND | 416 | 1784 | 2200 |
| WILDFIRE | 90 | 1606 | 1696 |
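The figures in Table 1 can be cross-checked with a short sketch that re-enters the table by hand. The data frame `table1` is a transcription of the rows above, so the rankings it produces only cover these ten event types, not the full dataset; `tornado_share` is a name introduced for this illustration.

```r
# Table 1 transcribed by hand (only the ten rows shown in the table)
table1 <- data.frame(
  event = c("TORNADO", "THUNDERSTORM WIND", "EXCESSIVE HEAT", "FLOOD",
            "WINTER WEATHER/STORM", "LIGHTNING", "HEAT", "FLASH FLOOD",
            "HIGH WIND", "WILDFIRE"),
  fatalities = c(5661, 729, 2020, 490, 655, 817, 1118, 1035, 416, 90),
  injuries   = c(91407, 9544, 6730, 6802, 6052, 5231, 2494, 1802, 1784, 1606)
)

# Total health impact is the sum of the two columns
table1$total <- table1$fatalities + table1$injuries

# Rankings differ depending on the metric
top_by_fatalities <- table1$event[order(-table1$fatalities)][1:3]
top_by_injuries   <- table1$event[order(-table1$injuries)][1:3]

# Tornado share of the combined impact across these ten rows
tornado_share <- table1$total[table1$event == "TORNADO"] / sum(table1$total)
```

Within these ten rows, tornadoes account for roughly two thirds of the combined fatalities and injuries.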
To find the weather events with the greatest economic impact, we look at the total combined property and crop damages (in USD).
# Get top 10 events for total economic damage
top_economic <- economic_summary %>%
  arrange(desc(TotalDamage)) %>%
  head(10)
# Reshape for plotting property and crop damage breakdown
top_economic_long <- top_economic %>%
  pivot_longer(cols = c(PropDamage, CropDamage), names_to = "DamageType", values_to = "Amount") %>%
  mutate(AmountBillion = Amount / 1e9)
# Plot Figure 2: Economic Damage
ggplot(top_economic_long, aes(x = reorder(EVTYPE_CLEAN, TotalDamage), y = AmountBillion, fill = DamageType)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Top 10 Severe Weather Events by Economic Impact (1950 - 2011)",
    x = "Event Type",
    y = "Damage (Billions of USD)",
    fill = "Damage Component",
    caption = "Figure 2: Total economic damage in billions of USD, partitioned by property and crop damage."
  ) +
  scale_fill_manual(
    values = c("PropDamage" = "#337ab7", "CropDamage" = "#5cb85c"),
    labels = c("PropDamage" = "Property Damage", "CropDamage" = "Crop Damage")
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
    axis.text.y = element_text(size = 10)
  )
As shown in Figure 2, floods are responsible for the greatest economic consequences overall, causing about 161 billion USD in total damage, driven predominantly by property damage. Hurricanes rank second with approximately 91 billion USD, tornadoes third with about 59 billion USD, and storm surges fourth at around 48 billion USD.
For agricultural losses specifically, Drought is the leading cause of crop damage (over 13 billion USD), followed by floods and hurricanes.
Table 2 shows the breakdown in billions of USD for the top 10 event types.
# Format economic data for display
top_overall_economic <- top_economic %>%
  mutate(
    PropDamageBillion = PropDamage / 1e9,
    CropDamageBillion = CropDamage / 1e9,
    TotalDamageBillion = TotalDamage / 1e9
  ) %>%
  select(EVTYPE_CLEAN, PropDamageBillion, CropDamageBillion, TotalDamageBillion)
knitr::kable(top_overall_economic,
  digits = 2,
  col.names = c("Event Type", "Property Damage (Billions $)", "Crop Damage (Billions $)", "Total Damage (Billions $)"),
  caption = "Table 2: Top 10 Weather Events by Total Economic Impact")
| Event Type | Property Damage (Billions $) | Crop Damage (Billions $) | Total Damage (Billions $) |
|---|---|---|---|
| FLOOD | 150.62 | 10.85 | 161.47 |
| HURRICANE | 85.36 | 5.52 | 90.87 |
| TORNADO | 58.60 | 0.42 | 59.02 |
| STORM SURGE | 47.97 | 0.00 | 47.98 |
| FLASH FLOOD | 17.59 | 1.53 | 19.12 |
| HAIL | 15.98 | 3.05 | 19.02 |
| WINTER WEATHER/STORM | 12.44 | 5.32 | 17.75 |
| DROUGHT | 1.05 | 13.97 | 15.02 |
| THUNDERSTORM WIND | 11.18 | 1.27 | 12.46 |
| WILDFIRE | 8.49 | 0.40 | 8.89 |
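The ranking by total damage and the ranking by crop damage can be compared with a short sketch that re-enters Table 2 by hand. The data frame `table2` is a transcription of the rows above (in billions of USD), so the crop ranking only covers these ten event types; events outside the table are not considered.

```r
# Table 2 transcribed by hand, values in billions of USD
table2 <- data.frame(
  event = c("FLOOD", "HURRICANE", "TORNADO", "STORM SURGE", "FLASH FLOOD",
            "HAIL", "WINTER WEATHER/STORM", "DROUGHT", "THUNDERSTORM WIND",
            "WILDFIRE"),
  prop  = c(150.62, 85.36, 58.60, 47.97, 17.59, 15.98, 12.44, 1.05, 11.18, 8.49),
  crop  = c(10.85, 5.52, 0.42, 0.00, 1.53, 3.05, 5.32, 13.97, 1.27, 0.40),
  total = c(161.47, 90.87, 59.02, 47.98, 19.12, 19.02, 17.75, 15.02, 12.46, 8.89)
)

# Order the event types by each metric
by_total <- table2$event[order(-table2$total)]
by_crop  <- table2$event[order(-table2$crop)]

# Fraction of each event's total damage that is crop damage
table2$crop_share <- round(table2$crop / table2$total, 2)

head(by_total, 4)
head(by_crop, 4)
```

Property and crop values may differ from the listed total by 0.01 because each column was rounded separately. Drought stands out as the one event type in the table whose damage is almost entirely agricultural.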