This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It identifies which types of severe weather events are most harmful to population health and which have the greatest economic consequences across the United States. The data span 1950 to November 2011 and record fatalities, injuries, property damage, and crop damage for each event. Tornadoes are by far the most harmful event type for population health. They caused 96,979 total casualties (5,633 fatalities and 91,346 injuries), more than eleven times the 8,428 casualties recorded for the second-ranked type, excessive heat. For economic consequences, floods lead with approximately $150.3 billion in combined property and crop damage. Hurricanes/typhoons ($71.9 billion) and tornadoes ($57.4 billion) follow. These findings suggest that emergency management resources should prioritize tornado preparedness and response to protect human life, and flood and hurricane mitigation to limit economic losses. These patterns can inform municipal and governmental decision-making for severe weather preparedness and resource allocation.
The analysis begins by loading the compressed storm data file and necessary R libraries for data manipulation and visualization:
# Load required libraries
library(data.table)
library(dplyr)
library(ggplot2)
library(knitr)
# Download and read the storm data file
if (!file.exists("StormData.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"StormData.csv.bz2")
}
# Read the compressed CSV file
storm_data <- fread("StormData.csv.bz2")
# Display basic information about the dataset
cat("Dataset dimensions:", dim(storm_data), "\n")
## Dataset dimensions: 902297 37
cat("Number of variables:", ncol(storm_data), "\n")
## Number of variables: 37
cat("Number of observations:", nrow(storm_data), "\n")
## Number of observations: 902297
# Examine the structure of key variables
str(storm_data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
## Classes 'data.table' and 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## - attr(*, ".internal.selfref")=<externalptr>
# Check the range of years in the data
storm_data$BGN_DATE <- as.Date(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
storm_data$YEAR <- as.numeric(format(storm_data$BGN_DATE, "%Y"))
cat("Data year range:", min(storm_data$YEAR, na.rm = TRUE), "-", max(storm_data$YEAR, na.rm = TRUE), "\n")
## Data year range: 1950 - 2011
cat("Total unique event types:", length(unique(storm_data$EVTYPE)), "\n")
## Total unique event types: 985
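The record spans more than six decades, so it can help to check how observations are distributed over time before comparing totals across event types. The sketch below is a base-R illustration and is not part of the original analysis. `events_per_decade()` is a hypothetical helper, and `toy_years` is made-up input standing in for `storm_data$YEAR`.

```r
# Hypothetical helper: count records per decade from a numeric year vector.
# Assumes the years have already been extracted, as storm_data$YEAR is above.
events_per_decade <- function(year) {
  decade <- (year %/% 10) * 10   # e.g. 1953 -> 1950, 2011 -> 2010
  table(decade)                  # NA years (unparsed dates) are dropped by table()
}

# Toy input standing in for storm_data$YEAR
toy_years <- c(1950, 1953, 1961, 1995, 1996, 1999, 2011)
counts <- events_per_decade(toy_years)
print(counts)
```

On the real data the call would be `events_per_decade(storm_data$YEAR)`. Because `table()` silently drops `NA`, any begin dates that failed to parse would not be counted.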
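The damage amounts in PROPDMG and CROPDMG are paired with a magnitude code in PROPDMGEXP and CROPDMGEXP. These are letter codes rather than true exponential notation, such as “K” for thousands, “M” for millions, and “B” for billions, and some records use digits or other symbols instead. We convert each code to a numeric multiplier and apply it to obtain damage in U.S. dollars: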
# Function to process damage exponentials
process_damage_exp <- function(exp) {
  if (is.na(exp) || exp == "" || exp == " ") {
    return(1)
  }
  exp <- toupper(as.character(exp))
  if (exp %in% c("K", "3")) {
    return(1000)
  } else if (exp %in% c("M", "6")) {
    return(1000000)
  } else if (exp %in% c("B", "9")) {
    return(1000000000)
  } else if (exp %in% c("H", "2")) {
    return(100)
  } else if (grepl("^[0-9]+$", exp)) {
    return(10^as.numeric(exp))
  } else {
    return(1)
  }
}
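# Apply the function to convert magnitude codes to multipliers
# (vapply guarantees one numeric value per row and drops the names sapply would add)
storm_data$PROPDMG_MULT <- vapply(storm_data$PROPDMGEXP, process_damage_exp, numeric(1), USE.NAMES = FALSE)
storm_data$CROPDMG_MULT <- vapply(storm_data$CROPDMGEXP, process_damage_exp, numeric(1), USE.NAMES = FALSE)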
# Calculate actual damage values
storm_data$PROPERTY_DAMAGE <- storm_data$PROPDMG * storm_data$PROPDMG_MULT
storm_data$CROP_DAMAGE <- storm_data$CROPDMG * storm_data$CROPDMG_MULT
storm_data$TOTAL_DAMAGE <- storm_data$PROPERTY_DAMAGE + storm_data$CROP_DAMAGE
# Display summary of damage calculations
cat("Property damage calculation complete\n")
## Property damage calculation complete
cat("Crop damage calculation complete\n")
## Crop damage calculation complete
cat("Total economic damage range: $", min(storm_data$TOTAL_DAMAGE), " to $", max(storm_data$TOTAL_DAMAGE), "\n")
## Total economic damage range: $ 0 to $ 115032500000
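The conversion above calls an R function once per row, which means more than 900,000 calls per column. A vectorized lookup can express the same rules in a single pass. The sketch below is an assumption-labeled alternative and not the method used in this analysis. `damage_multiplier()` is a hypothetical name, and it is intended to mirror the rules in `process_damage_exp()`.

```r
# Hypothetical vectorized equivalent of process_damage_exp():
# H/2 -> 1e2, K/3 -> 1e3, M/6 -> 1e6, B/9 -> 1e9, any other digit string -> 10^n,
# blank, NA, or any other symbol -> 1.
damage_multiplier <- function(exp) {
  code <- toupper(as.character(exp))
  mult <- rep(1, length(code))                      # default multiplier
  is_digit <- !is.na(code) & grepl("^[0-9]+$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])   # numeric codes as powers of ten
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- !is.na(code) & code %in% names(letter_map)
  mult[is_letter] <- letter_map[code[is_letter]]    # letter codes via lookup
  mult
}

example_mult <- damage_multiplier(c("K", "m", "", NA, "2", "5", "+"))
print(example_mult)
```

Applied to the real data, `damage_multiplier(storm_data$PROPDMGEXP)` should produce the same values as the per-row version. The digit rule covers "2", "3", "6", and "9" on its own, so only the letter codes need an explicit lookup table.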
# Calculate total casualties (fatalities + injuries)
storm_data$TOTAL_CASUALTIES <- storm_data$FATALITIES + storm_data$INJURIES
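# Normalize event types (upper-case, trim whitespace); spelling variants such as
# "TSTM WIND" and "THUNDERSTORM WIND" are not merged by this step
storm_data$EVTYPE <- toupper(trimws(storm_data$EVTYPE))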
# Summary of health impact data
cat("Total fatalities in database:", sum(storm_data$FATALITIES, na.rm = TRUE), "\n")
## Total fatalities in database: 15145
cat("Total injuries in database:", sum(storm_data$INJURIES, na.rm = TRUE), "\n")
## Total injuries in database: 140528
cat("Events with casualties:", sum(storm_data$TOTAL_CASUALTIES > 0, na.rm = TRUE), "\n")
## Events with casualties: 21929
cat("Events with economic damage:", sum(storm_data$TOTAL_DAMAGE > 0, na.rm = TRUE), "\n")
## Events with economic damage: 245031
To determine which weather events are most harmful to population health, we analyzed the total casualties (fatalities plus injuries) by event type:
# Aggregate health impact by event type
health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE),
    TOTAL_CASUALTIES = sum(TOTAL_CASUALTIES, na.rm = TRUE),
    EVENT_COUNT = n(),
    .groups = 'drop'
  ) %>%
  arrange(desc(TOTAL_CASUALTIES)) %>%
  filter(TOTAL_CASUALTIES > 0) %>%
  head(10)
knitr::kable(health_impact,
caption = "Top 10 Weather Events Most Harmful to Population Health",
col.names = c("Event Type", "Fatalities", "Injuries", "Total Casualties", "Event Count"))
| Event Type | Fatalities | Injuries | Total Casualties | Event Count |
|---|---|---|---|---|
| TORNADO | 5633 | 91346 | 96979 | 60652 |
| EXCESSIVE HEAT | 1903 | 6525 | 8428 | 1678 |
| TSTM WIND | 504 | 6957 | 7461 | 219946 |
| FLOOD | 470 | 6789 | 7259 | 25327 |
| LIGHTNING | 816 | 5230 | 6046 | 15755 |
| HEAT | 937 | 2100 | 3037 | 767 |
| FLASH FLOOD | 978 | 1777 | 2755 | 54278 |
| ICE STORM | 89 | 1975 | 2064 | 2006 |
| THUNDERSTORM WIND | 133 | 1488 | 1621 | 82564 |
| WINTER STORM | 206 | 1321 | 1527 | 11433 |
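The cleaning step only upper-cases and trims EVTYPE, so the table still lists TSTM WIND and THUNDERSTORM WIND as separate rows. A regex-based consolidation step is one way to merge such variants. In the sketch below, `consolidate_evtype()` and its rules are hypothetical and cover only two families of variants. The casualty figures are copied from the table above.

```r
# Hypothetical consolidation rules: regex pattern -> canonical label (extend as needed)
consolidate_evtype <- function(evtype) {
  ev <- toupper(trimws(evtype))
  rules <- c(
    "^(TSTM|THUNDERSTORM) WINDS?$" = "THUNDERSTORM WIND",
    "^HURRICANE(/TYPHOON)?$"       = "HURRICANE (TYPHOON)"
  )
  for (pattern in names(rules)) {
    ev[grepl(pattern, ev)] <- rules[[pattern]]   # rules are applied in order
  }
  ev
}

consolidated <- consolidate_evtype(
  c("TSTM WIND", " thunderstorm winds", "HURRICANE", "HURRICANE/TYPHOON", "TORNADO")
)

# Worked example using three rows of the casualty table above
casualties <- c("TSTM WIND" = 7461, "THUNDERSTORM WIND" = 1621, "EXCESSIVE HEAT" = 8428)
merged <- tapply(casualties, consolidate_evtype(names(casualties)), sum)
print(merged)
```

Merging just these two rows gives thunderstorm wind 9,082 casualties, which would place it ahead of EXCESSIVE HEAT (8,428). Any further spellings outside the top ten would shift the totals again. Applying such rules to `storm_data$EVTYPE` before aggregating would therefore change the ranking.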
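Key Findings for Population Health:
- Tornadoes account for 96,979 casualties (5,633 fatalities and 91,346 injuries), more than eleven times the next-ranked type, excessive heat (8,428).
- Excessive heat ranks second in both total casualties and fatalities (1,903).
- Thunderstorm wind (TSTM WIND), floods, and lightning each caused between about 6,000 and 7,500 casualties. Flash floods have a lower casualty total but rank third by fatalities (978).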
# Create visualization for health impact
health_plot <- ggplot(health_impact,
                      aes(x = reorder(EVTYPE, TOTAL_CASUALTIES), y = TOTAL_CASUALTIES)) +
  geom_col(fill = "steelblue", alpha = 0.8) +
  coord_flip() +
  # Extra headroom so the value labels on the longest bars are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(
    title = "Top 10 Weather Events Most Harmful to Population Health",
    subtitle = "Total Casualties (Fatalities + Injuries) by Event Type",
    x = "Weather Event Type",
    y = "Total Casualties",
    caption = "Data: NOAA Storm Database (1950-2011)"
  ) +
  geom_text(aes(label = TOTAL_CASUALTIES), hjust = -0.1, size = 3.5) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 11)
  )
print(health_plot)
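Summing fatalities and injuries weights a death the same as an injury, and totals also grow with how often an event type is recorded. The sketch below offers two complementary views: a ranking by fatalities alone and a ranking by casualties per recorded event. It uses the ten rows of the table above, re-entered by hand for illustration, so event types outside the top ten by casualties are not considered. The `health` data frame is a hypothetical stand-in for `health_impact`.

```r
# The ten rows of the casualty table above, re-entered for illustration
health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND", "WINTER STORM"),
  FATALITIES = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  INJURIES = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  EVENT_COUNT = c(60652, 1678, 219946, 25327, 15755, 767, 54278, 2006, 82564, 11433),
  stringsAsFactors = FALSE
)

# Casualties per recorded event of each type
health$CASUALTIES_PER_EVENT <- (health$FATALITIES + health$INJURIES) / health$EVENT_COUNT

by_fatalities <- health[order(-health$FATALITIES), c("EVTYPE", "FATALITIES")]
by_rate <- health[order(-health$CASUALTIES_PER_EVENT), c("EVTYPE", "CASUALTIES_PER_EVENT")]
print(head(by_fatalities, 5))
print(head(by_rate, 3))
```

Among these rows, FLASH FLOOD, HEAT, and LIGHTNING move ahead of TSTM WIND and FLOOD when only deaths are counted. EXCESSIVE HEAT has the most casualties per recorded event, about 5.0 compared with about 1.6 for TORNADO.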
To identify events with the greatest economic impact, we analyzed total damage (property damage plus crop damage) by event type:
# Aggregate economic impact by event type
economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    PROPERTY_DAMAGE = sum(PROPERTY_DAMAGE, na.rm = TRUE),
    CROP_DAMAGE = sum(CROP_DAMAGE, na.rm = TRUE),
    TOTAL_DAMAGE = sum(TOTAL_DAMAGE, na.rm = TRUE),
    EVENT_COUNT = n(),
    .groups = 'drop'
  ) %>%
  arrange(desc(TOTAL_DAMAGE)) %>%
  filter(TOTAL_DAMAGE > 0) %>%
  head(10)
# Convert to millions for better readability
economic_display <- economic_impact %>%
  mutate(
    PROPERTY_DAMAGE_M = round(PROPERTY_DAMAGE / 1000000, 1),
    CROP_DAMAGE_M = round(CROP_DAMAGE / 1000000, 1),
    TOTAL_DAMAGE_M = round(TOTAL_DAMAGE / 1000000, 1)
  ) %>%
  select(EVTYPE, PROPERTY_DAMAGE_M, CROP_DAMAGE_M, TOTAL_DAMAGE_M, EVENT_COUNT)
knitr::kable(economic_display,
caption = "Top 10 Weather Events with Greatest Economic Consequences (Millions USD)",
col.names = c("Event Type", "Property Damage", "Crop Damage", "Total Damage", "Event Count"))
| Event Type | Property Damage | Crop Damage | Total Damage | Event Count |
|---|---|---|---|---|
| FLOOD | 144657.7 | 5662.0 | 150319.7 | 25327 |
| HURRICANE/TYPHOON | 69305.8 | 2607.9 | 71913.7 | 88 |
| TORNADO | 56947.4 | 415.0 | 57362.3 | 60652 |
| STORM SURGE | 43323.5 | 0.0 | 43323.5 | 261 |
| HAIL | 15735.3 | 3026.0 | 18761.2 | 288661 |
| FLASH FLOOD | 16822.7 | 1421.3 | 18244.0 | 54278 |
| DROUGHT | 1046.1 | 13972.6 | 15018.7 | 2488 |
| HURRICANE | 11868.3 | 2741.9 | 14610.2 | 174 |
| RIVER FLOOD | 5118.9 | 5029.5 | 10148.4 | 173 |
| ICE STORM | 3944.9 | 5022.1 | 8967.0 | 2006 |
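The conclusions below state that property damage exceeds crop damage for most event types. To check that claim against the table, we can compute each event type's crop share of total damage. The sketch re-enters the table rows (millions USD) by hand, and the `econ` data frame is a hypothetical stand-in for `economic_display`.

```r
# Rows of the economic table above (millions USD), re-entered for illustration
econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  PROPERTY_M = c(144657.7, 69305.8, 56947.4, 43323.5, 15735.3,
                 16822.7, 1046.1, 11868.3, 5118.9, 3944.9),
  CROP_M = c(5662.0, 2607.9, 415.0, 0.0, 3026.0,
             1421.3, 13972.6, 2741.9, 5029.5, 5022.1),
  stringsAsFactors = FALSE
)

# Fraction of each event type's damage that is crop damage
econ$CROP_SHARE <- econ$CROP_M / (econ$PROPERTY_M + econ$CROP_M)
crop_dominated <- as.character(econ$EVTYPE[econ$CROP_SHARE > 0.5])
print(data.frame(EVTYPE = econ$EVTYPE, CROP_SHARE = round(econ$CROP_SHARE, 3)))
print(crop_dominated)
```

Crop losses exceed property losses only for DROUGHT, at about 93% of its total, and ICE STORM, at about 56%. RIVER FLOOD is close to an even split.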
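Key Findings for Economic Impact:
- Floods cause the greatest total damage, about $150.3 billion, of which $144.7 billion is property damage.
- Hurricanes/typhoons ($71.9 billion) and tornadoes ($57.4 billion) rank second and third. The hurricane/typhoon total comes from only 88 recorded events.
- Drought and ice storms are the only top-ten event types whose crop damage exceeds their property damage. Drought alone caused about $14.0 billion in crop losses.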
# Create visualization for economic impact
economic_plot <- ggplot(economic_display[1:7,],
                        aes(x = reorder(EVTYPE, TOTAL_DAMAGE_M), y = TOTAL_DAMAGE_M)) +
  geom_col(fill = "darkred", alpha = 0.8) +
  coord_flip() +
  # Extra headroom so the value labels on the longest bars are not clipped
  scale_y_continuous(expand = expansion(mult = c(0, 0.2))) +
  labs(
    title = "Top 7 Weather Events with Greatest Economic Consequences",
    subtitle = "Total Economic Damage (Property + Crop) in Millions USD",
    x = "Weather Event Type",
    y = "Total Damage (Millions USD)",
    caption = "Data: NOAA Storm Database (1950-2011)"
  ) +
  geom_text(aes(label = paste0("$", TOTAL_DAMAGE_M, "M")), hjust = -0.1, size = 3.5) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    plot.subtitle = element_text(size = 12),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 11)
  )
print(economic_plot)
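The damage range printed during preprocessing shows a single record of $115,032,500,000, about $115.0 billion. Damage values are non-negative, so that record's event type must total at least $115.0 billion. In the economic table only FLOOD ($150.3 billion) meets this, which means this one record makes up roughly 77% of the FLOOD total (115,032.5 / 150,319.7 ≈ 0.765). Without it, FLOOD would total about $35.3 billion, which is below HURRICANE/TYPHOON, TORNADO, and STORM SURGE. The economic ranking therefore hinges on that record, and it is worth listing the largest individual records before relying on the ranking. `largest_records()` below is a hypothetical base-R helper, and `toy` is made-up data.

```r
# Hypothetical helper: the n largest single records by TOTAL_DAMAGE, with the
# fraction of their event type's total damage that each record represents.
largest_records <- function(df, n = 5) {
  type_totals <- tapply(df$TOTAL_DAMAGE, df$EVTYPE, sum, na.rm = TRUE)
  top <- head(df[order(-df$TOTAL_DAMAGE), c("EVTYPE", "TOTAL_DAMAGE")], n)
  top$SHARE_OF_TYPE <- as.vector(top$TOTAL_DAMAGE / type_totals[as.character(top$EVTYPE)])
  top
}

# Made-up example data
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL", "TORNADO"),
                  TOTAL_DAMAGE = c(9e9, 1e9, 5e8, 2e9),
                  stringsAsFactors = FALSE)
res <- largest_records(toy, n = 2)
print(res)
```

On the real data, `largest_records(storm_data, n = 5)` would list the records to review, for example by reading their REMARKS. Whether any of them is a data-entry error is something the analysis would still need to establish.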
# Calculate overall summary statistics
total_fatalities <- sum(storm_data$FATALITIES, na.rm = TRUE)
total_injuries <- sum(storm_data$INJURIES, na.rm = TRUE)
total_property_damage <- sum(storm_data$PROPERTY_DAMAGE, na.rm = TRUE) / 1000000000
total_crop_damage <- sum(storm_data$CROP_DAMAGE, na.rm = TRUE) / 1000000000
total_events <- nrow(storm_data)
cat("=== NOAA Storm Database Summary Statistics ===\n")
## === NOAA Storm Database Summary Statistics ===
cat("Total Events Recorded:", format(total_events, big.mark = ","), "\n")
## Total Events Recorded: 902,297
cat("Total Fatalities:", format(total_fatalities, big.mark = ","), "\n")
## Total Fatalities: 15,145
cat("Total Injuries:", format(total_injuries, big.mark = ","), "\n")
## Total Injuries: 140,528
cat("Total Property Damage: $", round(total_property_damage, 2), "billion\n")
## Total Property Damage: $ 428.22 billion
cat("Total Crop Damage: $", round(total_crop_damage, 2), "billion\n")
## Total Crop Damage: $ 49.1 billion
cat("Data Coverage Period: 1950-2011 (", max(storm_data$YEAR, na.rm = TRUE) - min(storm_data$YEAR, na.rm = TRUE) + 1, "years)\n")
## Data Coverage Period: 1950-2011 ( 62 years)
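The totals above put the leading event types in context. The short calculation below uses only figures printed in this report: the overall totals and the TORNADO and FLOOD rows of the two tables. FLOOD's $150,319.7 million is expressed in billions, and the totals carry the rounding of the printed output.

```r
# Overall totals printed above (casualties as counts, damage in billions USD)
total_casualties <- 15145 + 140528        # fatalities + injuries
total_damage_b   <- 428.22 + 49.1         # property + crop

# Shares held by the top event type in each ranking
tornado_casualty_share <- 96979 / total_casualties
flood_damage_share     <- 150.3197 / total_damage_b
round(c(tornado = tornado_casualty_share, flood = flood_damage_share), 3)
# tornado 0.623, flood 0.315
```

Tornadoes account for about 62% of all recorded casualties, while floods account for about 31% of all recorded damage.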
Based on this comprehensive analysis of the NOAA Storm Database, we can draw the following key conclusions for government and municipal managers responsible for severe weather preparedness:
For Population Health Protection:
- Tornadoes represent the single greatest threat to public safety. They caused about 62% of all recorded severe weather casualties (96,979 of 155,673 fatalities and injuries combined).
- Excessive heat, flash floods, and lightning are among the deadliest remaining event types (1,903, 978, and 816 fatalities respectively) and should also be prioritized in emergency planning.
- Tornadoes, heat, flash floods, and lightning should be the focus of public warning systems and emergency response capabilities.

For Economic Protection:
- Floods cause the most economic damage (about $150.3 billion), followed by hurricanes/typhoons ($71.9 billion) and tornadoes ($57.4 billion).
- Property damage far exceeds crop damage for most of the top ten event types, with drought and ice storms as the exceptions. This suggests infrastructure protection should be prioritized.

Resource Allocation Recommendations:
- Tornado detection, warning systems, and response capabilities should receive the highest priority among resources aimed at protecting human life.
- Flood and hurricane preparedness should receive the highest priority among resources aimed at reducing economic losses.
- Heat-health warnings, flash flood early warning systems, and lightning detection networks are critical secondary investments.
- Building codes and zoning regulations should account for flood, hurricane, and tornado damage potential.
- Public education campaigns should focus primarily on tornado safety, followed by heat, flash flood, and lightning safety.
This analysis provides a data-driven foundation for prioritizing severe weather preparedness efforts and can help ensure that limited emergency management resources are allocated to address the most significant threats to both public safety and economic stability.