Synopsis

This analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Events Database to evaluate the impacts of severe weather events during 2025. The objective is to identify which weather phenomena pose the greatest threat to population health and to describe where and when they occur across the United States. We merge event records, geographic locations, and fatality reports into a single dataset. Our findings indicate that excessive heat and flash floods are the primary drivers of weather-related fatalities, with geographic concentrations in southern and coastal states. Seasonal analysis shows that the threat peaks in the mid-to-late summer months. Finally, we quantify property damage and find that some less deadly event types still account for large infrastructure repair costs. These findings are intended to help municipal and government managers prioritize emergency response resources and preventive planning.

Data Processing

The data for this analysis originates from the NOAA Storm Events Database. We focus exclusively on the year 2025, utilizing three distinct raw CSV files: event details, fatalities, and locations.

In this section, we load the raw files, join them on EVENT_ID (unique in the details file, but repeated in the locations and fatalities files whenever an event has several locations or fatalities), and perform the data transformations needed for exploratory analysis. Specifically, we clean the DAMAGE_PROPERTY variable, which is recorded as a character string (e.g., “10.00K”, “1.50M”), converting it to a numeric value in U.S. dollars for economic calculations.

# Load necessary libraries
library(dplyr)
library(readr)
library(ggplot2)
library(stringr)
library(scales)
library(tidyr)

# 1. Load the Raw Data
folder_path <- "." 
details_file <- file.path(folder_path, "StormEvents_details-ftp_v1.0_d2025_c20260323.csv")
fatalities_file <- file.path(folder_path, "StormEvents_fatalities-ftp_v1.0_d2025_c20260323.csv")
locations_file <- file.path(folder_path, "StormEvents_locations-ftp_v1.0_d2025_c20260323.csv")

details <- read_csv(details_file)
fatalities <- read_csv(fatalities_file)
locations <- read_csv(locations_file)

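# 2. Join the Datasets
# EVENT_ID is unique in details but can repeat in locations and fatalities,
# so a single event may span several rows after these joins. Any count or sum
# computed on the joined table must account for that repetition.
joined_data <- details %>%
  left_join(locations, by = "EVENT_ID", relationship = "many-to-many") %>%
  left_join(fatalities, by = "EVENT_ID", relationship = "many-to-many")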

# 3. Data Transformation: Parsing Property Damage
# The DAMAGE_PROPERTY column contains multipliers (K = Thousands, M = Millions, B = Billions).
# We must extract the numeric value and apply the multiplier.
transform_damage <- function(damage_str) {
  if (is.na(damage_str) || damage_str == "0.00K" || damage_str == "0") return(0)
  
  num_val <- as.numeric(str_extract(damage_str, "[0-9.]+"))
  multiplier <- str_extract(damage_str, "[KMBkmb]")
  
  if (is.na(multiplier)) return(num_val)
  
  multiplier <- toupper(multiplier)
  mult_val <- case_when(
    multiplier == "K" ~ 1e3,
    multiplier == "M" ~ 1e6,
    multiplier == "B" ~ 1e9,
    TRUE ~ 1
  )
  return(num_val * mult_val)
}

# Apply the transformation to create a new numeric column
clean_data <- joined_data %>%
  mutate(
    # vapply returns a plain numeric vector (sapply would keep the input strings as names)
    Property_Damage_USD = vapply(DAMAGE_PROPERTY, transform_damage, numeric(1), USE.NAMES = FALSE),
    # Standardize event types to uppercase to avoid grouping errors
    EVENT_TYPE = toupper(EVENT_TYPE)
  )
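As a quick sanity check on the suffix conversion, the sketch below re-implements the same idea as transform_damage in vectorized base R and runs it on a few made-up strings. The function name parse_damage and the sample inputs are illustrative assumptions, not part of the NOAA files or of the pipeline above.

```r
# Hypothetical vectorized variant of the K/M/B conversion (illustration only).
parse_damage <- function(x) {
  x <- trimws(x)
  # Numeric part: drop a trailing multiplier letter, then convert
  num <- suppressWarnings(as.numeric(sub("[KMBkmb]$", "", x)))
  # Multiplier: default 1 (plain dollars), overridden by the suffix if present
  mult <- rep(1, length(x))
  mult[grepl("[Kk]$", x)] <- 1e3
  mult[grepl("[Mm]$", x)] <- 1e6
  mult[grepl("[Bb]$", x)] <- 1e9
  out <- num * mult
  out[is.na(x)] <- 0  # treat missing damage as zero, as transform_damage does
  out
}

sample_damage <- c("10.00K", "1.50M", "2B", "500", NA)  # made-up inputs
parsed <- parse_damage(sample_damage)
print(parsed)
```

Treating a missing value as zero mirrors the choice made in transform_damage; whether unreported damage should count as zero or be excluded is an analytic decision rather than a property of the data.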

Results

1. Public Health Impact by Event Type

Question: Across the United States, which types of events are most harmful with respect to population health?

To answer this, we aggregate the total number of fatalities for each event type. Since government managers must prioritize life-saving resources, isolating the most lethal events is the crucial first step.

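# Calculate total fatalities per event type
health_summary <- clean_data %>%
  filter(!is.na(FATALITY_ID)) %>%
  group_by(EVENT_TYPE) %>%
  # Count distinct fatality records: the locations join can repeat each
  # fatality once per location row, so n() would overcount
  summarise(Total_Fatalities = n_distinct(FATALITY_ID), .groups = 'drop') %>%
  arrange(desc(Total_Fatalities)) %>%
  slice_head(n = 10)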

# Visualization
ggplot(health_summary, aes(x = reorder(EVENT_TYPE, Total_Fatalities), y = Total_Fatalities)) +
  geom_col(fill = "#B22222", alpha = 0.8) +
  coord_flip() +
  labs(title = "Most Harmful Weather Events to Population Health",
       subtitle = "Measured by total recorded fatalities in 2025",
       x = "Weather Event Type",
       y = "Total Fatalities") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Analysis: The data clearly indicates that resource allocation for public health should heavily prioritize excessive heat warnings and flood response mechanisms. While dramatic events like tornadoes capture significant public attention, the chronic, widespread nature of extreme heat and the sudden onset of flash flooding result in the highest human toll. Municipal managers should ensure that cooling centers and rapid water-rescue units receive adequate funding.
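The fatality ranking is only as reliable as the count behind it. Because the joined table pairs every location row of an event with every fatality row of that event, counting rows and counting fatality records can give different answers. The sketch below illustrates this on made-up toy tables; all values and the toy_ object names are hypothetical.

```r
library(dplyr)

# Made-up toy tables: one event with two location rows and two fatality rows.
toy_details    <- data.frame(EVENT_ID = 1L, EVENT_TYPE = "FLASH FLOOD")
toy_locations  <- data.frame(EVENT_ID = c(1L, 1L), LOCATION_INDEX = c(1L, 2L))
toy_fatalities <- data.frame(EVENT_ID = c(1L, 1L), FATALITY_ID = c(101L, 102L))

# Same join pattern as the pipeline: every location row pairs with every fatality row.
toy_joined <- toy_details %>%
  left_join(toy_locations, by = "EVENT_ID", relationship = "many-to-many") %>%
  left_join(toy_fatalities, by = "EVENT_ID", relationship = "many-to-many")

toy_counts <- toy_joined %>%
  filter(!is.na(FATALITY_ID)) %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    rows_counted        = n(),                      # 2 locations x 2 fatalities
    distinct_fatalities = n_distinct(FATALITY_ID),  # the actual fatality records
    .groups = "drop"
  )
print(toy_counts)
```

Here the row count is twice the number of fatality records, so an event type with many location rows per event could be pushed up the ranking for reasons unrelated to lethality.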

2. Geographic Distribution of Severe Events

Question: Across the United States, which types of events occur most often in which states?

Understanding where events happen allows for localized preparedness. We will identify the top 5 most frequent severe weather events overall, and map out which states experience them the most.

# Identify top 5 events overall. Reduce to one row per event first, so that
# repeated location/fatality rows from the joins do not inflate the counts.
top_5_events <- clean_data %>%
  distinct(EVENT_ID, EVENT_TYPE) %>%
  count(EVENT_TYPE, sort = TRUE) %>%
  slice_head(n = 5) %>%
  pull(EVENT_TYPE)

# Find which states have the highest occurrences of these top events
state_event_distribution <- clean_data %>%
  filter(EVENT_TYPE %in% top_5_events) %>%
  distinct(EVENT_ID, EVENT_TYPE, STATE) %>%
  count(EVENT_TYPE, STATE, name = "Event_Count") %>%
  group_by(EVENT_TYPE) %>%
  # Take top 3 states for each of the top 5 events
  slice_max(Event_Count, n = 3, with_ties = FALSE) %>%
  ungroup()

# Visualization
# STATE is mapped to y directly (no coord_flip), so scales = "free_y" gives
# each panel only its own states.
ggplot(state_event_distribution, aes(x = Event_Count, y = reorder(STATE, Event_Count), fill = EVENT_TYPE)) +
  geom_col() +
  facet_wrap(~EVENT_TYPE, scales = "free_y") +
  labs(title = "Primary Geographic Targets of Major Weather Events",
       subtitle = "Top 3 states affected by the 5 most frequent event types",
       x = "Number of Occurrences",
       y = "State") +
  theme_bw() +
  theme(legend.position = "none", strip.text = element_text(face = "bold"))

Analysis: This panel plot reveals distinct geographic vulnerabilities. For instance, thunderstorm winds are heavily concentrated in states like Texas and Pennsylvania, whereas hail events dominate the central plains. State emergency managers in these highlighted regions must tailor their infrastructure resilience (e.g., roof reinforcements for hail, grid hardening for winds) to their specific statistical vulnerabilities rather than a generic national average.

4. Economic Toll: Property Damage Analysis (Custom Question)

Question: Beyond human health, which severe weather events exact the highest financial toll on municipal and private infrastructure?

While fatalities drive life-saving priorities, property damage drives long-term economic recovery and municipal budgeting. Using our transformed numeric damage data, we evaluate the total financial impact in USD.

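# Calculate total property damage by event type
economic_impact <- clean_data %>%
  # Keep one row per event: DAMAGE_PROPERTY is an event-level value, and the
  # joins repeat it on every location/fatality row of the same event
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  group_by(EVENT_TYPE) %>%
  summarise(Total_Damage_USD = sum(Property_Damage_USD, na.rm = TRUE), .groups = 'drop') %>%
  arrange(desc(Total_Damage_USD)) %>%
  slice_head(n = 8)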

# Visualization
ggplot(economic_impact, aes(x = reorder(EVENT_TYPE, Total_Damage_USD), y = Total_Damage_USD)) +
  geom_col(fill = "#2E8B57", alpha = 0.9) +
  scale_y_continuous(labels = label_dollar(scale = 1e-6, suffix = "M")) +
  coord_flip() +
  labs(title = "Economic Toll: Highest Property Damage by Event Type",
       subtitle = "Total damage in USD (Millions)",
       x = "Weather Event Type",
       y = "Total Property Damage (USD)") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold"))

Analysis: The economic picture differs from the public health picture. While excessive heat causes the most fatalities, flash floods, tornadoes, and hail inflict the vast majority of physical property damage. Government managers should therefore split their focus: operational life-saving funds should target heat and immediate flood rescue, while long-term infrastructure insurance, building code regulations, and recovery budgets should be weighted toward mitigating wind, hail, and water damage to property.
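Damage totals are sensitive to how often each event appears in the joined table, since the event-level damage figure is repeated on every row of that event. The sketch below, on made-up numbers, shows how summing without deduplication can even reverse a ranking; the toy data frame and its values are hypothetical.

```r
library(dplyr)

# Made-up toy data: event 1 caused $1M in damage but appears on three joined
# rows (e.g. three location rows); event 2 caused $2M and appears once.
toy <- data.frame(
  EVENT_ID            = c(1L, 1L, 1L, 2L),
  EVENT_TYPE          = c("HAIL", "HAIL", "HAIL", "TORNADO"),
  Property_Damage_USD = c(1e6, 1e6, 1e6, 2e6)
)

# Summing every row counts event 1's damage three times.
naive_total <- toy %>%
  group_by(EVENT_TYPE) %>%
  summarise(Total = sum(Property_Damage_USD), .groups = "drop")

# Keeping one row per event before summing counts each event once.
dedup_total <- toy %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  group_by(EVENT_TYPE) %>%
  summarise(Total = sum(Property_Damage_USD), .groups = "drop")

print(naive_total)
print(dedup_total)
```

In the naive total, hail appears costlier than the tornado; after deduplication the order reverses, which is why the per-event reduction matters before comparing event types.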