This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to evaluate the impacts of severe weather events during the year 2025. The objective is to identify which weather phenomena pose the greatest threat to population health and to analyze their spatial and temporal distributions across the United States. Through rigorous data processing, we merged event records, geographic locations, and fatality reports into a unified dataset. Our findings indicate that excessive heat and flash floods are the primary drivers of weather-related fatalities, with distinct geographic concentrations in southern and coastal states. Furthermore, seasonal analysis reveals critical threat windows, peaking in the mid-to-late summer months. Finally, we analyze the economic toll by quantifying property damage, revealing that while some events are less deadly, they require massive infrastructure repair budgets. These actionable insights are designed to assist municipal and government managers in strategically prioritizing emergency response resources and preventive planning.
The data for this analysis originates from the NOAA Storm Events Database. We focus exclusively on the year 2025, using three distinct raw CSV files: event details, fatalities, and locations.
In this section, we load the raw files, join them on their shared EVENT_ID key, and perform the data transformations needed for exploratory analysis. Specifically, we clean the DAMAGE_PROPERTY variable, which is recorded as a character string (e.g., “10.00K”, “1.50M”), converting it into a standardized numeric format for accurate economic calculation.
# Load necessary libraries
library(dplyr)
library(readr)
library(ggplot2)
library(stringr)
library(scales)
library(tidyr)
# 1. Load the Raw Data
folder_path <- "."
details_file <- file.path(folder_path, "StormEvents_details-ftp_v1.0_d2025_c20260323.csv")
fatalities_file <- file.path(folder_path, "StormEvents_fatalities-ftp_v1.0_d2025_c20260323.csv")
locations_file <- file.path(folder_path, "StormEvents_locations-ftp_v1.0_d2025_c20260323.csv")
details <- read_csv(details_file)
fatalities <- read_csv(fatalities_file)
locations <- read_csv(locations_file)
# 2. Join the Datasets
# Note: an event with several location or fatality rows is repeated in the joined result.
joined_data <- details %>%
  left_join(locations, by = "EVENT_ID", relationship = "many-to-many") %>%
  left_join(fatalities, by = "EVENT_ID", relationship = "many-to-many")
# 3. Data Transformation: Parsing Property Damage
# The DAMAGE_PROPERTY column contains multipliers (K = Thousands, M = Millions, B = Billions).
# We must extract the numeric value and apply the multiplier.
transform_damage <- function(damage_str) {
  if (is.na(damage_str) || damage_str == "0.00K" || damage_str == "0") return(0)
  num_val <- as.numeric(str_extract(damage_str, "[0-9.]+"))
  multiplier <- str_extract(damage_str, "[KMBkmb]")
  if (is.na(multiplier)) return(num_val)
  multiplier <- toupper(multiplier)
  mult_val <- case_when(
    multiplier == "K" ~ 1e3,
    multiplier == "M" ~ 1e6,
    multiplier == "B" ~ 1e9,
    TRUE ~ 1
  )
  return(num_val * mult_val)
}
# Apply the transformation to create a new numeric column
clean_data <- joined_data %>%
  # vapply returns a plain numeric vector (sapply would keep the input strings as names)
  mutate(Property_Damage_USD = vapply(DAMAGE_PROPERTY, transform_damage, numeric(1), USE.NAMES = FALSE)) %>%
  # Standardize event types to uppercase to avoid grouping errors
  mutate(EVENT_TYPE = toupper(EVENT_TYPE))
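The suffix conversion described above can also be checked on a handful of strings before it is applied to the full table. The sketch below is a vectorized variant in base R, not the function used in this report: the name `parse_damage` is hypothetical, and it assumes that missing or unparseable values should count as zero damage.

```r
# Hypothetical vectorized parser for strings such as "10.00K" or "1.50M".
# Assumption: NA or unparseable input is treated as 0 USD.
parse_damage <- function(x) {
  x <- toupper(trimws(as.character(x)))
  # Numeric part: the string with a trailing K/M/B suffix removed
  num <- suppressWarnings(as.numeric(sub("[KMB]$", "", x)))
  # Last character of each string; only K, M, or B match a multiplier
  suffix <- substring(x, nchar(x))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[suffix]
  mult[is.na(mult)] <- 1          # no recognized suffix: value is already in USD
  out <- num * unname(mult)
  out[is.na(out)] <- 0            # missing or unparseable values become 0
  out
}

sample_damage <- c("10.00K", "1.50M", "2B", "500", NA)
parse_damage(sample_damage)
```

Because it works on the whole column at once, a function like this can be called directly inside `mutate()` without a row-by-row apply.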
Question: Across the United States, which types of events are most harmful with respect to population health?
To answer this, we aggregate the total number of fatalities for each event type. Since government managers must prioritize life-saving resources, isolating the most lethal events is the crucial first step.
# Calculate total fatalities per event type
health_summary <- clean_data %>%
  filter(!is.na(FATALITY_ID)) %>%
  # Count each fatality once; the location join can repeat fatality rows
  distinct(FATALITY_ID, .keep_all = TRUE) %>%
  group_by(EVENT_TYPE) %>%
  summarise(Total_Fatalities = n(), .groups = "drop") %>%
  arrange(desc(Total_Fatalities)) %>%
  slice_head(n = 10)
# Visualization
ggplot(health_summary, aes(x = reorder(EVENT_TYPE, Total_Fatalities), y = Total_Fatalities)) +
  geom_col(fill = "#B22222", alpha = 0.8) +
  coord_flip() +
  labs(title = "Most Harmful Weather Events to Population Health",
       subtitle = "Measured by total recorded fatalities in 2025",
       x = "Weather Event Type",
       y = "Total Fatalities") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))
Analysis: The data clearly indicates that resource allocation for public health should heavily prioritize excessive heat warnings and flood response mechanisms. While dramatic events like tornadoes capture significant public attention, the chronic, widespread nature of extreme heat and the sudden onset of flash flooding result in the highest human toll. Municipal managers should ensure that cooling centers and rapid water-rescue units receive adequate funding.
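One caveat when counting fatalities on a joined table: an event with several location rows repeats each of its fatality rows, so a plain row count can overstate the toll. The sketch below illustrates this with toy tables whose values are entirely hypothetical; it assumes FATALITY_ID identifies one fatality and requires dplyr 1.1.0 or later for the `relationship` argument.

```r
library(dplyr)

# Toy tables (hypothetical values): one event, two locations, two fatalities
toy_details    <- tibble(EVENT_ID = 1, EVENT_TYPE = "FLASH FLOOD")
toy_locations  <- tibble(EVENT_ID = c(1, 1), LOCATION = c("A", "B"))
toy_fatalities <- tibble(EVENT_ID = c(1, 1), FATALITY_ID = c(101, 102))

toy_joined <- toy_details %>%
  left_join(toy_locations, by = "EVENT_ID", relationship = "many-to-many") %>%
  left_join(toy_fatalities, by = "EVENT_ID", relationship = "many-to-many")

# Naive count: every joined row that carries a fatality
naive_count <- toy_joined %>%
  filter(!is.na(FATALITY_ID)) %>%
  nrow()

# Deduplicated count: each FATALITY_ID counted once
dedup_count <- toy_joined %>%
  filter(!is.na(FATALITY_ID)) %>%
  distinct(FATALITY_ID) %>%
  nrow()

c(naive = naive_count, deduplicated = dedup_count)
```

Here the naive count is 4 (2 locations times 2 fatalities) while the deduplicated count is 2, which is why a `distinct()` step on the fatality identifier is a cheap safeguard before summarising.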
Question: Across the United States, which types of events occur most often in which states?
Understanding where events happen allows for localized preparedness. We will identify the top 5 most frequent severe weather events overall, and map out which states experience them the most.
# Keep one row per event so that joined location/fatality rows do not inflate counts
unique_events <- distinct(clean_data, EVENT_ID, .keep_all = TRUE)
# Identify top 5 events overall
top_5_events <- unique_events %>%
  count(EVENT_TYPE, sort = TRUE) %>%
  slice_head(n = 5) %>%
  pull(EVENT_TYPE)
# Find which states have the highest occurrences of these top events
state_event_distribution <- unique_events %>%
  filter(EVENT_TYPE %in% top_5_events) %>%
  group_by(EVENT_TYPE, STATE) %>%
  summarise(Event_Count = n(), .groups = "drop") %>%
  arrange(EVENT_TYPE, desc(Event_Count)) %>%
  group_by(EVENT_TYPE) %>%
  slice_head(n = 3) # Take top 3 states for each of the top 5 events
# Visualization
ggplot(state_event_distribution, aes(x = reorder(STATE, Event_Count), y = Event_Count, fill = EVENT_TYPE)) +
  geom_col(position = "dodge") +
  coord_flip() +
  facet_wrap(~EVENT_TYPE, scales = "free_y") +
  labs(title = "Primary Geographic Targets of Major Weather Events",
       subtitle = "Top 3 states affected by the 5 most frequent event types",
       x = "State",
       y = "Number of Occurrences") +
  theme_bw() +
  theme(legend.position = "none", strip.text = element_text(face = "bold"))
Analysis: This panel plot reveals distinct geographic vulnerabilities. For instance, thunderstorm winds are heavily concentrated in states like Texas and Pennsylvania, whereas hail events dominate the central plains. State emergency managers in these highlighted regions must tailor their infrastructure resilience (e.g., roof reinforcements for hail, grid hardening for winds) to their specific statistical vulnerabilities rather than a generic national average.
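To put a number on how concentrated an event type is, one option is the leading state's share of that event type's total occurrences. The sketch below uses made-up counts and a hypothetical `Share` column; it is an illustration of the calculation, not a result from the 2025 data.

```r
library(dplyr)

# Hypothetical per-state counts for two event types
toy_counts <- tibble(
  EVENT_TYPE  = c("HAIL", "HAIL", "HAIL", "THUNDERSTORM WIND", "THUNDERSTORM WIND"),
  STATE       = c("KANSAS", "NEBRASKA", "TEXAS", "TEXAS", "PENNSYLVANIA"),
  Event_Count = c(50, 30, 20, 60, 40)
)

# Share of each event type's occurrences in its leading state
state_share <- toy_counts %>%
  group_by(EVENT_TYPE) %>%
  mutate(Share = Event_Count / sum(Event_Count)) %>%
  slice_max(Event_Count, n = 1, with_ties = FALSE) %>%
  ungroup()

state_share
```

A share close to 1 would indicate an event type dominated by a single state, while a small share suggests the risk is spread widely and a state-level ranking alone may understate regional exposure.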
Question: Which types of events are characteristic of which months?
We track the temporal occurrence of the top weather events to help government agencies understand when to mobilize resources.
# Aggregate occurrences by month for the top events
monthly_trends <- clean_data %>%
  # One row per event, so joined location/fatality rows are not counted as extra events
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  filter(EVENT_TYPE %in% top_5_events) %>%
  group_by(MONTH_NAME, EVENT_TYPE) %>%
  summarise(Occurrences = n(), .groups = "drop")
# Ensure chronological order of months
monthly_trends$MONTH_NAME <- factor(monthly_trends$MONTH_NAME, levels = month.name)
# Visualization
ggplot(monthly_trends, aes(x = MONTH_NAME, y = Occurrences, group = EVENT_TYPE, color = EVENT_TYPE)) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_line(linewidth = 1.2) +
  geom_point(size = 3) +
  labs(title = "Seasonal Timelines of Severe Weather Events",
       subtitle = "Tracking event frequency throughout the 2025 calendar year",
       x = "Month",
       y = "Number of Recorded Events",
       color = "Event Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(face = "bold"))
Analysis: The data exhibits sharp seasonal peaks. Thunderstorm winds and hail see a dramatic surge beginning in April, peaking in May and June, and sharply trailing off by September. This establishes a highly specific “threat window.” Municipalities must complete infrastructure maintenance, tree trimming, and emergency drill preparations by the end of March to be adequately prepared for the Q2/Q3 spikes.
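The threat window read off the chart can also be extracted as a table of peak months. The sketch below shows one way to do that, assuming a monthly summary shaped like the one plotted above; the counts are hypothetical and the object names `toy_monthly` and `peak_months` are illustrative.

```r
library(dplyr)

# Hypothetical monthly counts for two event types
toy_monthly <- tibble(
  MONTH_NAME  = rep(c("April", "May", "June"), times = 2),
  EVENT_TYPE  = rep(c("HAIL", "THUNDERSTORM WIND"), each = 3),
  Occurrences = c(10, 40, 25, 15, 30, 45)
)

# Month with the highest count for each event type
peak_months <- toy_monthly %>%
  mutate(MONTH_NAME = factor(MONTH_NAME, levels = month.name)) %>%
  group_by(EVENT_TYPE) %>%
  slice_max(Occurrences, n = 1, with_ties = FALSE) %>%
  ungroup()

peak_months
```

With these toy values the peak is May for hail and June for thunderstorm wind. Planning deadlines can then be set relative to the earliest peak rather than estimated from the plot by eye.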
Question: Beyond human health, which severe weather events exact the highest financial toll on municipal and private infrastructure?
While fatalities drive life-saving priorities, property damage drives long-term economic recovery and municipal budgeting. Using our transformed numeric damage data, we evaluate the total financial impact in USD.
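# Calculate total property damage by event type
economic_impact <- clean_data %>%
  # Damage is recorded once per event; keep one row per event so it is not summed repeatedly
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  group_by(EVENT_TYPE) %>%
  summarise(Total_Damage_USD = sum(Property_Damage_USD, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Total_Damage_USD)) %>%
  slice_head(n = 8)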
# Visualization
ggplot(economic_impact, aes(x = reorder(EVENT_TYPE, Total_Damage_USD), y = Total_Damage_USD)) +
  geom_col(fill = "#2E8B57", alpha = 0.9) +
  scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M")) +
  coord_flip() +
  labs(title = "Economic Toll: Highest Property Damage by Event Type",
       subtitle = "Total damage in USD (Millions)",
       x = "Weather Event Type",
       y = "Total Property Damage (USD)") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))
Analysis: The economic landscape tells a different story than the public health data. While excessive heat causes the most fatalities, events like flash floods, tornadoes, and hail inflict the vast majority of physical property damage. Government managers must partition their focus: operational life-saving funds should target heat and immediate flood rescue, while long-term infrastructure insurance, building code regulations, and recovery budgets must be heavily weighted toward mitigating wind, hail, and water damage to property.
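The contrast between the health ranking and the economic ranking is easier to audit when both sit in one table. The sketch below joins two summaries and ranks each event type on both measures; all figures are hypothetical, and the column names `Fatality_Rank` and `Damage_Rank` are illustrative rather than part of the analysis above.

```r
library(dplyr)

# Hypothetical summaries of fatalities and property damage per event type
toy_fatalities <- tibble(
  EVENT_TYPE       = c("EXCESSIVE HEAT", "FLASH FLOOD", "TORNADO"),
  Total_Fatalities = c(30, 20, 10)
)
toy_damage <- tibble(
  EVENT_TYPE       = c("EXCESSIVE HEAT", "FLASH FLOOD", "TORNADO"),
  Total_Damage_USD = c(1e5, 5e8, 3e8)
)

# Rank 1 = most fatalities / most damage
rank_comparison <- toy_fatalities %>%
  full_join(toy_damage, by = "EVENT_TYPE") %>%
  mutate(
    Fatality_Rank = min_rank(desc(Total_Fatalities)),
    Damage_Rank   = min_rank(desc(Total_Damage_USD))
  ) %>%
  arrange(Fatality_Rank)

rank_comparison
```

Event types whose two ranks differ sharply are the ones where life-safety budgets and infrastructure budgets should be planned separately.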