Synopsis

This analysis explores the impact of severe weather events in the United States on public health and the economy, using data from the NOAA Storm Database (1950 - 2011). We measure public health impact by summing fatalities and injuries for each weather event type, and economic impact by summing property and crop damage. Tornadoes are the single most harmful event type for public health, causing the highest number of both fatalities and injuries. Excessive heat and heat waves are also major contributors to fatalities, while thunderstorm winds and floods cause large numbers of injuries. On the economic side, floods cause the greatest total financial damage, followed by hurricanes/typhoons, tornadoes, and storm surges. Drought is the leading cause of crop damage, but property damage from flooding remains the single largest economic contributor overall. Understanding these patterns is essential for guiding public policy, planning disaster response, and allocating safety resources.

Data Processing

The analysis starts from the raw storm data provided by the National Oceanic and Atmospheric Administration (NOAA). The data contains characteristics of major storms and weather events in the United States, including estimates of any fatalities, injuries, and property and crop damage.

Loading the Data

We download the dataset from the source URL if it does not already exist in the working directory. We then read the CSV file directly from the bzip2 archive; read.csv() decompresses it transparently, so no separate extraction step is needed.

# Define URL and destination file
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "repdata_data_StormData.csv.bz2"

# Download the file if it does not already exist
if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")
}

# Read the compressed csv file
storm_data <- read.csv(destfile)
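
As a quick illustration of why no manual decompression is needed, the following sketch round-trips a tiny data frame through a bzip2 file. The file and its two rows are made up for the example and are not part of the NOAA data.

```r
# Write a toy CSV through a bzip2 connection (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(
    data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
    con,
    row.names = FALSE
)
close(con)

# read.csv() detects the compression from the file contents
toy <- read.csv(tmp)
print(toy)
```

The same call works on the real archive, although reading the full file takes noticeably longer than this toy example.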

Preprocessing and Subset Selection

To optimize memory usage and processing speed, we select only the columns relevant to the analysis:

* EVTYPE: Type of weather event.
* FATALITIES: Number of directly or indirectly related deaths.
* INJURIES: Number of directly or indirectly related injuries.
* PROPDMG: Property damage base estimate.
* PROPDMGEXP: Exponent indicator for property damage value.
* CROPDMG: Crop damage base estimate.
* CROPDMGEXP: Exponent indicator for crop damage value.

library(dplyr)
library(tidyr)
library(ggplot2)

# Select relevant columns
cleaned_data <- storm_data %>%
    select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
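
If memory is a real constraint, the unused columns can also be skipped at read time rather than dropped afterwards. The sketch below is one possible approach, not the method used in this report; it runs on a toy CSV whose contents are invented for the example, and the helper names (csv_path, keep, col_classes) are hypothetical.

```r
# Toy CSV standing in for the storm file (rows are invented)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
    "STATE,EVTYPE,FATALITIES,REMARKS",
    "AL,TORNADO,0,long free text",
    "TX,FLOOD,2,more free text"
), csv_path)

keep <- c("EVTYPE", "FATALITIES")

# Read only the first row to learn the column order
header <- names(read.csv(csv_path, nrows = 1))

# "NULL" tells read.csv() to skip a column; NA lets it infer the type
col_classes <- rep("NULL", length(header))
col_classes[header %in% keep] <- NA

subset_data <- read.csv(csv_path, colClasses = col_classes)
names(subset_data)
```

For the storm data, keep would hold the seven column names listed above.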

Damage Exponent Mapping

The property and crop damage variables, PROPDMG and CROPDMG, are accompanied by exponent variables, PROPDMGEXP and CROPDMGEXP. These exponents specify the magnitude of the damage values (e.g., ‘K’ for thousands, ‘M’ for millions, ‘B’ for billions).

To calculate the actual damage in USD, we map these exponents to their corresponding numerical multipliers:

* H or h (hundreds) -> \(10^2\)
* K or k (thousands) -> \(10^3\)
* M or m (millions) -> \(10^6\)
* B or b (billions) -> \(10^9\)
* Numeric values 0 to 8 -> \(10^{value}\)
* Characters +, -, ?, and empty string -> \(1\)

We then compute the actual damage values.

# Function to convert exponent codes to numeric multipliers
convert_exponent <- function(exp_col) {
    exp_col <- toupper(trimws(as.character(exp_col)))
    multipliers <- rep(1, length(exp_col))
    
    multipliers[exp_col == "H"] <- 10^2
    multipliers[exp_col == "K"] <- 10^3
    multipliers[exp_col == "M"] <- 10^6
    multipliers[exp_col == "B"] <- 10^9
    
    # Numeric values 0-8
    numeric_idx <- exp_col %in% as.character(0:8)
    multipliers[numeric_idx] <- 10^as.numeric(exp_col[numeric_idx])
    
    return(multipliers)
}

# Calculate actual damage values in USD
cleaned_data <- cleaned_data %>%
    mutate(
        PropDamage = PROPDMG * convert_exponent(PROPDMGEXP),
        CropDamage = CROPDMG * convert_exponent(CROPDMGEXP),
        TotalDamage = PropDamage + CropDamage
    )
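
To make the mapping concrete, here is a small self-contained sketch that applies the same rules through a lookup table and checks them on a handful of made-up values. The function name exponent_multiplier and the sample vectors are assumptions for illustration; the report itself uses convert_exponent() above.

```r
# Lookup-table variant of the exponent mapping (illustrative sketch)
exponent_multiplier <- function(code) {
    code <- toupper(trimws(as.character(code)))
    lookup <- c(
        H = 1e2, K = 1e3, M = 1e6, B = 1e9,
        setNames(10^(0:8), as.character(0:8))
    )
    out <- unname(lookup[code])  # NA for codes not in the table
    out[is.na(out)] <- 1         # "+", "-", "?", "" and anything else
    out
}

# Made-up base values and exponent codes
base_value <- c(25, 2.5, 1, 10, 3, 7)
exp_code   <- c("K", "m", "B", "", "5", "?")

damage_usd <- base_value * exponent_multiplier(exp_code)
print(damage_usd)
```

For instance, a base value of 25 with code K becomes 25,000 USD, while a base value of 10 with an empty code stays at 10 USD.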

Event Type Standardization

The EVTYPE variable contains many inconsistencies due to typos, case mismatches, and multiple naming conventions (e.g., “TSTM WIND” vs “THUNDERSTORM WIND”). We clean and standardize the most frequent and impactful event types using regular expressions and a standard classification mapping.

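# Clean and standardize event types.
# case_when() applies the first rule that matches, so specific patterns
# (e.g. EXCESSIVE HEAT, FLASH FLOOD) must come before general ones (HEAT, FLOOD).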
cleaned_data <- cleaned_data %>%
    mutate(EVTYPE_CLEAN = toupper(trimws(EVTYPE))) %>%
    mutate(EVTYPE_CLEAN = case_when(
        grepl("TORNADO", EVTYPE_CLEAN) ~ "TORNADO",
        grepl("TSTM WIND|THUNDERSTORM", EVTYPE_CLEAN) ~ "THUNDERSTORM WIND",
        grepl("EXCESSIVE HEAT|EXTREME HEAT|RECORD HEAT", EVTYPE_CLEAN) ~ "EXCESSIVE HEAT",
        grepl("HEAT", EVTYPE_CLEAN) ~ "HEAT",
        grepl("HURRICANE|TYPHOON", EVTYPE_CLEAN) ~ "HURRICANE",
        grepl("STORM SURGE|TIDE", EVTYPE_CLEAN) ~ "STORM SURGE",
        grepl("WILD/FOREST FIRE|WILDFIRE|WILD FIRE", EVTYPE_CLEAN) ~ "WILDFIRE",
        grepl("FLASH FLOOD", EVTYPE_CLEAN) ~ "FLASH FLOOD",
        grepl("FLOOD", EVTYPE_CLEAN) & !grepl("FLASH", EVTYPE_CLEAN) ~ "FLOOD",
        grepl("HAIL", EVTYPE_CLEAN) ~ "HAIL",
        grepl("RIP CURRENT", EVTYPE_CLEAN) ~ "RIP CURRENT",
        grepl("BLIZZARD|WINTER STORM|WINTER WEATHER|SNOW|ICE|FREEZING", EVTYPE_CLEAN) ~ "WINTER WEATHER/STORM",
        grepl("COLD|WIND CHILL|FREEZE|FROST", EVTYPE_CLEAN) ~ "EXTREME COLD/FROST",
        grepl("HIGH WIND|STRONG WIND", EVTYPE_CLEAN) ~ "HIGH WIND",
        grepl("LIGHTNING", EVTYPE_CLEAN) ~ "LIGHTNING",
        TRUE ~ EVTYPE_CLEAN
    ))
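
Because the rules are order-dependent, it can help to see the first-match-wins behaviour in isolation. The following base R sketch reproduces that logic for a small subset of the rules; the function name standardize_evtype, the rule subset, and the sample labels are all assumptions for illustration, and the FLOOD rule is simplified relative to the one above.

```r
# Ordered rules: names are patterns, values are target categories.
# Only a hypothetical subset of the full mapping is shown.
rules <- c(
    "EXCESSIVE HEAT|EXTREME HEAT|RECORD HEAT" = "EXCESSIVE HEAT",
    "HEAT"                                    = "HEAT",
    "FLASH FLOOD"                             = "FLASH FLOOD",
    "FLOOD"                                   = "FLOOD"
)

standardize_evtype <- function(x, rules) {
    x <- toupper(trimws(x))
    out <- x                        # unmatched labels pass through unchanged
    done <- rep(FALSE, length(x))
    for (i in seq_along(rules)) {
        hit <- !done & grepl(names(rules)[i], x)
        out[hit] <- rules[[i]]
        done <- done | hit          # earlier rules take precedence
    }
    out
}

raw_labels <- c("Heat Wave", " excessive heat", "FLASH FLOODING", "River Flood", "DROUGHT")
cleaned_labels <- standardize_evtype(raw_labels, rules)
print(cleaned_labels)
```

Swapping the order of the two heat rules would send every heat-related label to HEAT, which is why the more specific pattern is listed first.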

Aggregating Data

Finally, we aggregate the health and economic metrics by the cleaned event categories.

# Aggregate public health data
health_summary <- cleaned_data %>%
    group_by(EVTYPE_CLEAN) %>%
    summarise(
        Fatalities = sum(FATALITIES, na.rm = TRUE),
        Injuries = sum(INJURIES, na.rm = TRUE),
        TotalHealth = Fatalities + Injuries
    )

# Aggregate economic data
economic_summary <- cleaned_data %>%
    group_by(EVTYPE_CLEAN) %>%
    summarise(
        PropDamage = sum(PropDamage, na.rm = TRUE),
        CropDamage = sum(CropDamage, na.rm = TRUE),
        TotalDamage = sum(TotalDamage, na.rm = TRUE)
    )
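
As a sanity check on the grouping logic, the same kind of per-category totals can be computed with base R on a toy data frame. This is only a sketch with invented numbers; note that aggregate() with a formula drops rows containing NA, which differs slightly from the na.rm = TRUE handling above.

```r
# Toy records (invented values) with already-cleaned event types
toy <- data.frame(
    EVTYPE_CLEAN = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
    FATALITIES   = c(3, 1, 2, 0, 0),
    INJURIES     = c(10, 4, 5, 1, 2),
    stringsAsFactors = FALSE
)

# Sum both health metrics within each event category
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_CLEAN, data = toy, FUN = sum)
totals$TotalHealth <- totals$FATALITIES + totals$INJURIES

# Rank categories by combined impact, largest first
totals <- totals[order(-totals$TotalHealth), ]
print(totals)
```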

Results

1. Events Most Harmful to Population Health

To identify the weather events most harmful to public health, we analyze the top 10 event categories for both fatalities and injuries.

# Get top 10 events for fatalities and injuries
top_fatalities <- health_summary %>%
    arrange(desc(Fatalities)) %>%
    head(10)

top_injuries <- health_summary %>%
    arrange(desc(Injuries)) %>%
    head(10)

# Combine for plotting
top_health_plot <- bind_rows(
    top_fatalities %>% mutate(Count = Fatalities, Metric = "Fatalities"),
    top_injuries %>% mutate(Count = Injuries, Metric = "Injuries")
)

# Plot Figure 1: Population Health Impacts
ggplot(top_health_plot, aes(x = reorder(EVTYPE_CLEAN, Count), y = Count, fill = Metric)) +
    geom_col() +
    coord_flip() +
    facet_wrap(~Metric, scales = "free", ncol = 2) +
    labs(
        title = "Top 10 Severe Weather Events by Health Impact (1950 - 2011)",
        x = "Event Type",
        y = "Total Counts",
        caption = "Figure 1: Comparison of total fatalities and injuries for the top 10 severe weather events."
    ) +
    scale_fill_manual(values = c("Fatalities" = "#d9534f", "Injuries" = "#f0ad4e")) +
    theme_minimal() +
    theme(
        legend.position = "none",
        plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
        strip.text = element_text(face = "bold", size = 12),
        axis.text.y = element_text(size = 10)
    )

As shown in Figure 1, tornadoes are by far the leading cause of both fatalities and injuries in the United States, causing over 5,600 fatalities and 91,000 injuries. Excessive heat is the second most deadly event type, causing just over 2,000 deaths, while thunderstorm wind and floods account for the second and third highest numbers of injuries, respectively.

Table 1 provides the detailed counts for the top 10 event types by overall public health impact (Fatalities + Injuries).

# Display the top 10 events causing overall health impacts (Fatalities + Injuries)
top_overall_health <- health_summary %>%
    arrange(desc(TotalHealth)) %>%
    head(10)

knitr::kable(top_overall_health, 
             col.names = c("Event Type", "Fatalities", "Injuries", "Total Health Impact (Fatalities + Injuries)"),
             caption = "Table 1: Top 10 Weather Events by Total Population Health Impact")
Table 1: Top 10 Weather Events by Total Population Health Impact

| Event Type | Fatalities | Injuries | Total Health Impact (Fatalities + Injuries) |
|:---|---:|---:|---:|
| TORNADO | 5661 | 91407 | 97068 |
| THUNDERSTORM WIND | 729 | 9544 | 10273 |
| EXCESSIVE HEAT | 2020 | 6730 | 8750 |
| FLOOD | 490 | 6802 | 7292 |
| WINTER WEATHER/STORM | 655 | 6052 | 6707 |
| LIGHTNING | 817 | 5231 | 6048 |
| HEAT | 1118 | 2494 | 3612 |
| FLASH FLOOD | 1035 | 1802 | 2837 |
| HIGH WIND | 416 | 1784 | 2200 |
| WILDFIRE | 90 | 1606 | 1696 |
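
The figures in Table 1 support two quick derived checks, sketched below with the table values typed in by hand: the tornado share of the combined top-10 health impact, and the fatality count if the two heat categories were merged. The variable names are hypothetical, and the share is relative to the ten listed categories only, not to all events in the database.

```r
# Values copied from Table 1 (top 10 categories only)
table1 <- data.frame(
    event = c("TORNADO", "THUNDERSTORM WIND", "EXCESSIVE HEAT", "FLOOD",
              "WINTER WEATHER/STORM", "LIGHTNING", "HEAT", "FLASH FLOOD",
              "HIGH WIND", "WILDFIRE"),
    fatalities = c(5661, 729, 2020, 490, 655, 817, 1118, 1035, 416, 90),
    injuries   = c(91407, 9544, 6730, 6802, 6052, 5231, 2494, 1802, 1784, 1606),
    stringsAsFactors = FALSE
)
table1$total <- table1$fatalities + table1$injuries

# Tornado share of the combined top-10 health impact
tornado_share <- table1$total[table1$event == "TORNADO"] / sum(table1$total)

# Fatalities if EXCESSIVE HEAT and HEAT were treated as one category
heat_fatalities <- sum(table1$fatalities[table1$event %in% c("EXCESSIVE HEAT", "HEAT")])

print(round(tornado_share, 3))
print(heat_fatalities)
```

With these figures, tornadoes account for roughly two thirds of the top-10 total, and the merged heat categories (3,138 deaths) would still rank second behind tornadoes for fatalities.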

2. Events with the Greatest Economic Consequences

To find the weather events with the greatest economic impact, we look at the total combined property and crop damages (in USD).

# Get top 10 events for total economic damage
top_economic <- economic_summary %>%
    arrange(desc(TotalDamage)) %>%
    head(10)

# Reshape for plotting property and crop damage breakdown
top_economic_long <- top_economic %>%
    pivot_longer(cols = c(PropDamage, CropDamage), names_to = "DamageType", values_to = "Amount") %>%
    mutate(AmountBillion = Amount / 1e9)

# Plot Figure 2: Economic Damage
ggplot(top_economic_long, aes(x = reorder(EVTYPE_CLEAN, TotalDamage), y = AmountBillion, fill = DamageType)) +
    geom_col() +
    coord_flip() +
    labs(
        title = "Top 10 Severe Weather Events by Economic Impact (1950 - 2011)",
        x = "Event Type",
        y = "Damage (Billions of USD)",
        fill = "Damage Component",
        caption = "Figure 2: Total economic damage in billions of USD, partitioned by property and crop damage."
    ) +
    scale_fill_manual(
        values = c("PropDamage" = "#337ab7", "CropDamage" = "#5cb85c"),
        labels = c("PropDamage" = "Property Damage", "CropDamage" = "Crop Damage")
    ) +
    theme_minimal() +
    theme(
        legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
        axis.text.y = element_text(size = 10)
    )

As shown in Figure 2, floods are responsible for the greatest economic consequences overall, causing over 160 billion USD in total damage, most of it property damage. Hurricanes rank second with approximately 91 billion USD, tornadoes third with about 59 billion USD, and storm surges fourth at around 48 billion USD.

For agricultural losses specifically, drought is the leading cause of crop damage (nearly 14 billion USD), followed by floods and hurricanes.

Table 2 shows the breakdown in billions of USD for the top 10 event types.

# Format economic data for display
top_overall_economic <- top_economic %>%
    mutate(
        PropDamageBillion = PropDamage / 1e9,
        CropDamageBillion = CropDamage / 1e9,
        TotalDamageBillion = TotalDamage / 1e9
    ) %>%
    select(EVTYPE_CLEAN, PropDamageBillion, CropDamageBillion, TotalDamageBillion)

knitr::kable(top_overall_economic, 
             digits = 2,
             col.names = c("Event Type", "Property Damage (Billions $)", "Crop Damage (Billions $)", "Total Damage (Billions $)"),
             caption = "Table 2: Top 10 Weather Events by Total Economic Impact")
Table 2: Top 10 Weather Events by Total Economic Impact

| Event Type | Property Damage (Billions $) | Crop Damage (Billions $) | Total Damage (Billions $) |
|:---|---:|---:|---:|
| FLOOD | 150.62 | 10.85 | 161.47 |
| HURRICANE | 85.36 | 5.52 | 90.87 |
| TORNADO | 58.60 | 0.42 | 59.02 |
| STORM SURGE | 47.97 | 0.00 | 47.98 |
| FLASH FLOOD | 17.59 | 1.53 | 19.12 |
| HAIL | 15.98 | 3.05 | 19.02 |
| WINTER WEATHER/STORM | 12.44 | 5.32 | 17.75 |
| DROUGHT | 1.05 | 13.97 | 15.02 |
| THUNDERSTORM WIND | 11.18 | 1.27 | 12.46 |
| WILDFIRE | 8.49 | 0.40 | 8.89 |
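
The contrast between total damage and crop damage can be checked directly from Table 2. The sketch below re-enters the rounded table values by hand, ranks the ten categories by crop damage, and computes the crop share of each category's damage; the variable names are hypothetical, and categories outside the top 10 are not considered.

```r
# Rounded values copied from Table 2, in billions of USD
table2 <- data.frame(
    event = c("FLOOD", "HURRICANE", "TORNADO", "STORM SURGE", "FLASH FLOOD",
              "HAIL", "WINTER WEATHER/STORM", "DROUGHT", "THUNDERSTORM WIND",
              "WILDFIRE"),
    property = c(150.62, 85.36, 58.60, 47.97, 17.59, 15.98, 12.44, 1.05, 11.18, 8.49),
    crop     = c(10.85, 5.52, 0.42, 0.00, 1.53, 3.05, 5.32, 13.97, 1.27, 0.40),
    stringsAsFactors = FALSE
)

# Rank the listed categories by crop damage, largest first
by_crop <- table2[order(-table2$crop), ]
top_crop_events <- head(by_crop$event, 3)

# Fraction of each category's damage that is crop damage
table2$crop_share <- table2$crop / (table2$property + table2$crop)

print(top_crop_events)
print(round(table2$crop_share[table2$event %in% c("DROUGHT", "FLOOD")], 2))
```

Among these ten categories, drought is the only one where crop losses dominate; for floods, crop damage is a small fraction of the total.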