Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify which types of severe weather events are most harmful to population health and which have the greatest economic consequences across the United States. The data span 1950 to November 2011 and record fatalities, injuries, property damage, and crop damage for each event. After converting the damage figures to dollar amounts and aggregating by event type, we found that tornadoes are by far the most harmful events to population health, causing 96,979 casualties (fatalities and injuries combined), more than eleven times the 8,428 attributed to the second-ranked event type, excessive heat. For economic consequences, floods lead with approximately $150.3 billion in total damage, followed by hurricanes/typhoons ($71.9 billion) and tornadoes ($57.4 billion). These findings suggest that emergency management resources aimed at protecting life should prioritize tornado preparedness and response, while efforts to limit economic losses should focus first on floods and hurricanes. The rankings can inform municipal and governmental decisions on severe weather preparedness and resource allocation.

Data Processing

Data Loading and Initial Setup

The analysis begins by loading the compressed storm data file and necessary R libraries for data manipulation and visualization:

# Load required libraries
library(data.table)
library(dplyr)
library(ggplot2)
library(knitr)

# Download and read the storm data file
if (!file.exists("StormData.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  "StormData.csv.bz2")
}

# Read the compressed CSV file (fread() decompresses .bz2 input via the R.utils package)
storm_data <- fread("StormData.csv.bz2")

# Display basic information about the dataset
cat("Dataset dimensions:", dim(storm_data), "\n")
## Dataset dimensions: 902297 37
cat("Number of variables:", ncol(storm_data), "\n")
## Number of variables: 37
cat("Number of observations:", nrow(storm_data), "\n")
## Number of observations: 902297

Data Exploration and Structure

# Examine the structure of key variables
str(storm_data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
## Classes 'data.table' and 'data.frame':   902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  - attr(*, ".internal.selfref")=<externalptr>
# Check the range of years in the data
storm_data$BGN_DATE <- as.Date(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
storm_data$YEAR <- as.numeric(format(storm_data$BGN_DATE, "%Y"))

cat("Data year range:", min(storm_data$YEAR, na.rm = TRUE), "-", max(storm_data$YEAR, na.rm = TRUE), "\n")
## Data year range: 1950 - 2011
cat("Total unique event types:", length(unique(storm_data$EVTYPE)), "\n")
## Total unique event types: 985

Processing Economic Damage Data

The dataset stores each damage figure as a number (PROPDMG, CROPDMG) plus a separate magnitude code (PROPDMGEXP, CROPDMGEXP), e.g., “K” for thousands, “M” for millions, “B” for billions. We need to convert these pairs to actual dollar values:

# Function to process damage exponentials
process_damage_exp <- function(exp) {
    if (is.na(exp) || exp == "" || exp == " ") {
        return(1)
    }
    exp <- toupper(as.character(exp))
    if (exp %in% c("K", "3")) {
        return(1000)
    } else if (exp %in% c("M", "6")) {
        return(1000000)
    } else if (exp %in% c("B", "9")) {
        return(1000000000)
    } else if (exp %in% c("H", "2")) {
        return(100)
    } else if (grepl("^[0-9]+$", exp)) {
        return(10^as.numeric(exp))
    } else {
        return(1)
    }
}

# Apply the function to convert exponentials
storm_data$PROPDMG_MULT <- sapply(storm_data$PROPDMGEXP, process_damage_exp)
storm_data$CROPDMG_MULT <- sapply(storm_data$CROPDMGEXP, process_damage_exp)

# Calculate actual damage values
storm_data$PROPERTY_DAMAGE <- storm_data$PROPDMG * storm_data$PROPDMG_MULT
storm_data$CROP_DAMAGE <- storm_data$CROPDMG * storm_data$CROPDMG_MULT
storm_data$TOTAL_DAMAGE <- storm_data$PROPERTY_DAMAGE + storm_data$CROP_DAMAGE

# Display summary of damage calculations
cat("Property damage calculation complete\n")
## Property damage calculation complete
cat("Crop damage calculation complete\n")
## Crop damage calculation complete
cat("Total economic damage range: $", min(storm_data$TOTAL_DAMAGE), " to $", max(storm_data$TOTAL_DAMAGE), "\n")
## Total economic damage range: $ 0  to $ 115032500000
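
Applying process_damage_exp() through sapply() calls the function once per row, which can be slow on 902,297 records. As a minimal sketch, assuming the same mapping rules as above, the conversion could instead be vectorized. The name damage_multiplier() is hypothetical and not part of the original analysis; unlike the original function, it also trims surrounding whitespace before matching.

```r
# Hypothetical vectorized alternative to process_damage_exp() (illustrative only).
damage_multiplier <- function(exp) {
    exp <- as.character(exp)
    exp[is.na(exp)] <- ""                      # treat missing codes like blanks
    exp <- toupper(trimws(exp))

    mult <- rep(1, length(exp))                # default for blank or unrecognized codes

    # Letter codes: hundreds, thousands, millions, billions
    lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    is_letter <- exp %in% names(lookup)
    mult[is_letter] <- lookup[exp[is_letter]]

    # Digit codes are read as powers of ten, as in the original function
    is_digit <- grepl("^[0-9]+$", exp)
    mult[is_digit] <- 10^as.numeric(exp[is_digit])

    mult
}

# Small check on made-up codes
codes <- c("K", "m", "B", "h", "", NA, "5", "?")
multipliers <- damage_multiplier(codes)
print(multipliers)
```

With such a helper, the two sapply() calls could be replaced by damage_multiplier(storm_data$PROPDMGEXP) and damage_multiplier(storm_data$CROPDMGEXP).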

Processing Health Impact Data

# Calculate total casualties (fatalities + injuries)
storm_data$TOTAL_CASUALTIES <- storm_data$FATALITIES + storm_data$INJURIES

# Clean up event types for consistency
storm_data$EVTYPE <- toupper(trimws(storm_data$EVTYPE))

# Summary of health impact data
cat("Total fatalities in database:", sum(storm_data$FATALITIES, na.rm = TRUE), "\n")
## Total fatalities in database: 15145
cat("Total injuries in database:", sum(storm_data$INJURIES, na.rm = TRUE), "\n")
## Total injuries in database: 140528
cat("Events with casualties:", sum(storm_data$TOTAL_CASUALTIES > 0, na.rm = TRUE), "\n")
## Events with casualties: 21929
cat("Events with economic damage:", sum(storm_data$TOTAL_DAMAGE > 0, na.rm = TRUE), "\n")
## Events with economic damage: 245031

Results

Question 1: Events Most Harmful to Population Health

To determine which weather events are most harmful to population health, we analyzed the total casualties (fatalities plus injuries) by event type:

# Aggregate health impact by event type
health_impact <- storm_data %>%
    group_by(EVTYPE) %>%
    summarise(
        FATALITIES = sum(FATALITIES, na.rm = TRUE),
        INJURIES = sum(INJURIES, na.rm = TRUE),
        TOTAL_CASUALTIES = sum(TOTAL_CASUALTIES, na.rm = TRUE),
        EVENT_COUNT = n(),
        .groups = 'drop'
    ) %>%
    arrange(desc(TOTAL_CASUALTIES)) %>%
    filter(TOTAL_CASUALTIES > 0) %>%
    head(10)

knitr::kable(health_impact, 
             caption = "Top 10 Weather Events Most Harmful to Population Health",
             col.names = c("Event Type", "Fatalities", "Injuries", "Total Casualties", "Event Count"))
Top 10 Weather Events Most Harmful to Population Health
Event Type          Fatalities  Injuries  Total Casualties  Event Count
TORNADO                   5633     91346             96979        60652
EXCESSIVE HEAT            1903      6525              8428         1678
TSTM WIND                  504      6957              7461       219946
FLOOD                      470      6789              7259        25327
LIGHTNING                  816      5230              6046        15755
HEAT                       937      2100              3037          767
FLASH FLOOD                978      1777              2755        54278
ICE STORM                   89      1975              2064         2006
THUNDERSTORM WIND          133      1488              1621        82564
WINTER STORM               206      1321              1527        11433

Key Findings for Population Health:

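  1. TORNADO is by far the most harmful weather event to population health, with 96,979 casualties (about 62% of all recorded casualties)
  2. EXCESSIVE HEAT ranks second with 8,428 casualties and has the second-highest fatality count (1,903)
  3. TSTM WIND is third with 7,461 casualties; THUNDERSTORM WIND is recorded under a separate label (1,621), and the two combined (9,082) would rank second
  4. The top three event types account for about 72.5% of all severe weather-related casualties (112,868 of 155,673)

# Create visualization for health impact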
health_plot <- ggplot(health_impact[1:10,], 
                     aes(x = reorder(EVTYPE, TOTAL_CASUALTIES), y = TOTAL_CASUALTIES)) +
    geom_bar(stat = "identity", fill = "steelblue", alpha = 0.8) +
    coord_flip() +
    labs(
        title = "Top 10 Weather Events Most Harmful to Population Health",
        subtitle = "Total Casualties (Fatalities + Injuries) by Event Type",
        x = "Weather Event Type",
        y = "Total Casualties",
        caption = "Data: NOAA Storm Database (1950-2011)"
    ) +
    geom_text(aes(label = TOTAL_CASUALTIES), hjust = -0.1, size = 3.5) +
    theme_minimal() +
    theme(
        plot.title = element_text(size = 14, face = "bold"),
        plot.subtitle = element_text(size = 12),
        axis.text = element_text(size = 10),
        axis.title = element_text(size = 11)
    )

print(health_plot)
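
The casualty table above lists TSTM WIND and THUNDERSTORM WIND as separate event types even though TSTM is an abbreviation for thunderstorm, so the ranking depends on how labels are spelled. The following is a minimal sketch of one possible normalization step, not something the analysis above performs. normalize_evtype() is a hypothetical helper, and the toy data frame reuses only the two casualty totals from the table.

```r
# Hypothetical helper: expand the TSTM abbreviation and tidy whitespace.
normalize_evtype <- function(evtype) {
    evtype <- toupper(trimws(evtype))
    evtype <- gsub("\\s+", " ", evtype)                      # collapse repeated spaces
    gsub("\\bTSTM\\b", "THUNDERSTORM", evtype, perl = TRUE)  # expand the abbreviation
}

# Toy input using two rows of the casualty table above
toy <- data.frame(
    EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND"),
    TOTAL_CASUALTIES = c(7461, 1621),
    stringsAsFactors = FALSE
)
toy$EVTYPE <- normalize_evtype(toy$EVTYPE)

# Re-aggregate after normalization so both labels fall into one group
merged <- aggregate(TOTAL_CASUALTIES ~ EVTYPE, data = toy, FUN = sum)
print(merged)
```

Other variant spellings may exist among the 985 raw labels, so a fuller cleanup would need a longer, manually reviewed mapping.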

Question 2: Events with Greatest Economic Consequences

To identify events with the greatest economic impact, we analyzed total damage (property damage plus crop damage) by event type:

# Aggregate economic impact by event type
economic_impact <- storm_data %>%
    group_by(EVTYPE) %>%
    summarise(
        PROPERTY_DAMAGE = sum(PROPERTY_DAMAGE, na.rm = TRUE),
        CROP_DAMAGE = sum(CROP_DAMAGE, na.rm = TRUE),
        TOTAL_DAMAGE = sum(TOTAL_DAMAGE, na.rm = TRUE),
        EVENT_COUNT = n(),
        .groups = 'drop'
    ) %>%
    arrange(desc(TOTAL_DAMAGE)) %>%
    filter(TOTAL_DAMAGE > 0) %>%
    head(10)

# Convert to millions for better readability
economic_display <- economic_impact %>%
    mutate(
        PROPERTY_DAMAGE_M = round(PROPERTY_DAMAGE / 1000000, 1),
        CROP_DAMAGE_M = round(CROP_DAMAGE / 1000000, 1),
        TOTAL_DAMAGE_M = round(TOTAL_DAMAGE / 1000000, 1)
    ) %>%
    select(EVTYPE, PROPERTY_DAMAGE_M, CROP_DAMAGE_M, TOTAL_DAMAGE_M, EVENT_COUNT)

knitr::kable(economic_display, 
             caption = "Top 10 Weather Events with Greatest Economic Consequences (Millions USD)",
             col.names = c("Event Type", "Property Damage", "Crop Damage", "Total Damage", "Event Count"))
Top 10 Weather Events with Greatest Economic Consequences (Millions USD)
Event Type          Property Damage  Crop Damage  Total Damage  Event Count
FLOOD                      144657.7       5662.0      150319.7        25327
HURRICANE/TYPHOON           69305.8       2607.9       71913.7           88
TORNADO                     56947.4        415.0       57362.3        60652
STORM SURGE                 43323.5          0.0       43323.5          261
HAIL                        15735.3       3026.0       18761.2       288661
FLASH FLOOD                 16822.7       1421.3       18244.0        54278
DROUGHT                      1046.1      13972.6       15018.7         2488
HURRICANE                   11868.3       2741.9       14610.2          174
RIVER FLOOD                  5118.9       5029.5       10148.4          173
ICE STORM                    3944.9       5022.1        8967.0         2006

Key Findings for Economic Impact:

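  1. FLOOD causes the highest economic damage (about $150.3 billion)
  2. HURRICANE/TYPHOON ranks second (about $71.9 billion), almost entirely from property damage
  3. TORNADO is third in total economic damage (about $57.4 billion)
  4. Property damage exceeds crop damage for most event types; DROUGHT and ICE STORM are the exceptions in the top 10

# Create visualization for economic impact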
economic_plot <- ggplot(economic_display[1:7,], 
                       aes(x = reorder(EVTYPE, TOTAL_DAMAGE_M), y = TOTAL_DAMAGE_M)) +
    geom_bar(stat = "identity", fill = "darkred", alpha = 0.8) +
    coord_flip() +
    labs(
        title = "Top Weather Events with Greatest Economic Consequences", 
        subtitle = "Total Economic Damage (Property + Crop) in Millions USD",
        x = "Weather Event Type",
        y = "Total Damage (Millions USD)",
        caption = "Data: NOAA Storm Database (1950-2011)"
    ) +
    geom_text(aes(label = paste0("$", TOTAL_DAMAGE_M, "M")), hjust = -0.1, size = 3.5) +
    theme_minimal() +
    theme(
        plot.title = element_text(size = 14, face = "bold"),
        plot.subtitle = element_text(size = 12),
        axis.text = element_text(size = 10),
        axis.title = element_text(size = 11)
    )

print(economic_plot)
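
The damage range reported during processing shows a single record with $115,032,500,000 in total damage, so a ranking by summed damage can be driven by very few events. As a hedged sketch, the share of each event type's total that comes from its largest record could be checked as follows. The data frame and its values are invented for illustration and are not drawn from the storm database.

```r
# Toy records with made-up damage values (not real storm data)
events <- data.frame(
    EVTYPE = c("FLOOD", "FLOOD", "FLOOD", "TORNADO", "TORNADO"),
    TOTAL_DAMAGE = c(115e9, 20e9, 15e9, 30e9, 27e9),
    stringsAsFactors = FALSE
)

# Sum and maximum of damage per event type
total_by_type <- tapply(events$TOTAL_DAMAGE, events$EVTYPE, sum)
max_by_type   <- tapply(events$TOTAL_DAMAGE, events$EVTYPE, max)

# Fraction of each type's total contributed by its single largest record
largest_share <- round(max_by_type / total_by_type, 3)
print(largest_share)
```

A share close to 1 would indicate that one record dominates the total for that event type, which is a cue to inspect that record before relying on the ranking.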

Summary Statistics

# Calculate overall summary statistics
total_fatalities <- sum(storm_data$FATALITIES, na.rm = TRUE)
total_injuries <- sum(storm_data$INJURIES, na.rm = TRUE)
total_property_damage <- sum(storm_data$PROPERTY_DAMAGE, na.rm = TRUE) / 1000000000
total_crop_damage <- sum(storm_data$CROP_DAMAGE, na.rm = TRUE) / 1000000000
total_events <- nrow(storm_data)

cat("=== NOAA Storm Database Summary Statistics ===\n")
## === NOAA Storm Database Summary Statistics ===
cat("Total Events Recorded:", format(total_events, big.mark = ","), "\n")
## Total Events Recorded: 902,297
cat("Total Fatalities:", format(total_fatalities, big.mark = ","), "\n")
## Total Fatalities: 15,145
cat("Total Injuries:", format(total_injuries, big.mark = ","), "\n")
## Total Injuries: 140,528
cat("Total Property Damage: $", round(total_property_damage, 2), "billion\n")
## Total Property Damage: $ 428.22 billion
cat("Total Crop Damage: $", round(total_crop_damage, 2), "billion\n")
## Total Crop Damage: $ 49.1 billion
cat("Data Coverage Period: 1950-2011 (", max(storm_data$YEAR, na.rm = TRUE) - min(storm_data$YEAR, na.rm = TRUE) + 1, "years)\n")
## Data Coverage Period: 1950-2011 ( 62 years)
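
To put the casualty ranking in proportion, the per-event totals can be divided by the database-wide total. This short sketch hard-codes figures reported above (the three highest casualty totals and the overall fatality and injury counts) rather than recomputing them from storm_data. The variable names are illustrative only.

```r
# Casualty totals copied from the top-10 health table above
casualties <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461)

# Database-wide casualties: total fatalities plus total injuries
all_casualties <- 15145 + 140528

# Percentage share of each event type, and of the top three combined
share_pct <- round(100 * casualties / all_casualties, 1)
top3_pct  <- round(100 * sum(casualties) / all_casualties, 1)

print(share_pct)
print(top3_pct)
```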

Conclusions

Based on this comprehensive analysis of the NOAA Storm Database, we can draw the following key conclusions for government and municipal managers responsible for severe weather preparedness:

For Population Health Protection:
- Tornadoes represent the single greatest threat to public safety, causing about 62% of recorded severe weather casualties
- Excessive heat, thunderstorm wind, floods, and lightning follow, each with roughly 6,000 to 8,500 casualties, and should be prioritized in emergency planning
- These event types should be the focus of public warning systems and emergency response capabilities

For Economic Protection:
- Floods cause the most economic damage, about $150.3 billion over the period covered
- Hurricanes/typhoons ($71.9 billion) and tornadoes ($57.4 billion) represent the next highest economic threats
- Property damage far exceeds crop damage for most event types, suggesting infrastructure protection should be prioritized; drought is the main exception, with losses concentrated in crops

Resource Allocation Recommendations:
- Tornado detection, warning systems, and response capabilities should receive the highest priority for protecting life
- Flood and hurricane mitigation should receive the highest priority for limiting economic losses
- Heat, flood, and lightning warning systems are important secondary investments for public safety
- Building codes and zoning regulations should account for flood, hurricane, and tornado damage potential
- Public education campaigns should focus primarily on tornado safety, followed by heat, flood, and lightning safety

This analysis provides a data-driven foundation for prioritizing severe weather preparedness efforts and can help ensure that limited emergency management resources are allocated to address the most significant threats to both public safety and economic stability.