This analysis explores the 2024 NOAA Storm Events database to identify patterns in storm-related fatalities, injuries, event frequencies, seasonal behaviors, and economic damages across the United States. The dataset, obtained from three interlinked files (event details, locations, and fatalities), was merged and cleaned for analysis.
Key findings include:
- The most harmful event types to population health
- The most common weather event types in each state
- Seasonal patterns of storm event occurrences
- The types of events with the greatest economic impact
This report is intended for emergency management professionals and analysts who may need to prioritize resources based on the type, location, and impact of severe weather events.
The raw data files were read from NOAA’s Storm Events archive, joined
by EVENT_ID, and cleaned to address missing values and
standardize variables. Below is the code for data loading, merging,
missingness visualization, and preprocessing.
# Clear the workspace — removes all existing variables to prevent conflicts
rm(list = ls())
# Load required packages
# dplyr: data manipulation
# readr: efficient CSV reading
# naniar: missing data visualization
# ggplot2: plotting
# scales: formatting numeric outputs (e.g., currency)
library(dplyr)
library(readr)
library(naniar)
library(ggplot2)
library(scales)
# Set the folder path where your unzipped data files are stored
# Update this path if your files are saved in a different location
folder_path <- "D:/MS Data Analytics/Fall 2024/DAT 511/Final Project/Data_Stewardship_Storm_Events_Project"
# Define file paths for the three NOAA data files: details, fatalities, locations
details_file <- file.path(folder_path, "data", "StormEvents_details-ftp_v1.0_d2024_c20250401.csv.gz")
fatalities_file <- file.path(folder_path, "data", "StormEvents_fatalities-ftp_v1.0_d2024_c20250401.csv.gz")
locations_file <- file.path(folder_path, "data", "StormEvents_locations-ftp_v1.0_d2024_c20250401.csv.gz")
# Read the CSV files directly from compressed .gz format
details <- read_csv(details_file)
fatalities <- read_csv(fatalities_file)
locations <- read_csv(locations_file)
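# Merge the three datasets using the common key: EVENT_ID
# - Each EVENT_ID identifies one storm event in 'details'
# - 'details' is the base file; we enrich it by joining with 'locations' and then 'fatalities'
# - 'locations' and 'fatalities' can hold several rows per EVENT_ID, so the joined
#   table may repeat an event's detail row; collapse to one row per EVENT_ID
#   before summing or counting event-level columns
joined_data <- details %>%
  left_join(locations, by = "EVENT_ID") %>%
  left_join(fatalities, by = "EVENT_ID")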
# Save the joined dataset to a local CSV file (optional, for inspection outside R)
output_file <- file.path(folder_path, "StormEvents_joined_data.csv")
write_csv(joined_data, output_file)
# View the first few rows to confirm successful loading and joining
print(head(joined_data))
## # A tibble: 6 × 71
## BEGIN_YEARMONTH BEGIN_DAY BEGIN_TIME END_YEARMONTH END_DAY END_TIME
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 202405 23 1947 202405 23 1947
## 2 202411 16 230 202411 18 1421
## 3 202405 19 1839 202405 19 1902
## 4 202405 23 2155 202405 23 2155
## 5 202405 24 1405 202405 24 1410
## 6 202411 1 0 202411 1 1600
## # ℹ 65 more variables: EPISODE_ID.x <dbl>, EVENT_ID <dbl>, STATE <chr>,
## # STATE_FIPS <dbl>, YEAR <dbl>, MONTH_NAME <chr>, EVENT_TYPE <chr>,
## # CZ_TYPE <chr>, CZ_FIPS <dbl>, CZ_NAME <chr>, WFO <chr>,
## # BEGIN_DATE_TIME <chr>, CZ_TIMEZONE <chr>, END_DATE_TIME <chr>,
## # INJURIES_DIRECT <dbl>, INJURIES_INDIRECT <dbl>, DEATHS_DIRECT <dbl>,
## # DEATHS_INDIRECT <dbl>, DAMAGE_PROPERTY <chr>, DAMAGE_CROPS <chr>,
## # SOURCE <chr>, MAGNITUDE <dbl>, MAGNITUDE_TYPE <chr>, FLOOD_CAUSE <chr>, …
# View a summary of column names, data types, and sample values
glimpse(joined_data)
## Rows: 89,860
## Columns: 71
## $ BEGIN_YEARMONTH <dbl> 202405, 202411, 202405, 202405, 202405, 202411, 202…
## $ BEGIN_DAY <dbl> 23, 16, 19, 23, 24, 1, 1, 14, 14, 17, 13, 17, 17, 1…
## $ BEGIN_TIME <dbl> 1947, 230, 1839, 2155, 1405, 0, 0, 1510, 1352, 1100…
## $ END_YEARMONTH <dbl> 202405, 202411, 202405, 202405, 202405, 202411, 202…
## $ END_DAY <dbl> 23, 18, 19, 23, 24, 1, 1, 14, 14, 18, 13, 18, 18, 1…
## $ END_TIME <dbl> 1947, 1421, 1902, 2155, 1410, 1600, 1600, 1515, 135…
## $ EPISODE_ID.x <dbl> 190907, 197838, 190905, 190907, 191916, 197531, 197…
## $ EVENT_ID <dbl> 1180619, 1223377, 1184919, 1180805, 1182348, 122190…
## $ STATE <chr> "OKLAHOMA", "OREGON", "OKLAHOMA", "OKLAHOMA", "MISS…
## $ STATE_FIPS <dbl> 40, 41, 40, 40, 28, 53, 41, 28, 47, 41, 53, 41, 41,…
## $ YEAR <dbl> 2024, 2024, 2024, 2024, 2024, 2024, 2024, 2024, 202…
## $ MONTH_NAME <chr> "May", "November", "May", "May", "May", "November",…
## $ EVENT_TYPE <chr> "Hail", "Heavy Snow", "Tornado", "Thunderstorm Wind…
## $ CZ_TYPE <chr> "C", "Z", "C", "C", "C", "Z", "Z", "C", "C", "Z", "…
## $ CZ_FIPS <dbl> 65, 509, 39, 51, 115, 211, 127, 141, 71, 127, 201, …
## $ CZ_NAME <chr> "JACKSON", "EAST SLOPES OF THE OREGON CASCADES", "C…
## $ WFO <chr> "OUN", "PDT", "OUN", "OUN", "MEG", "PQR", "PQR", "M…
## $ BEGIN_DATE_TIME <chr> "23-MAY-24 19:47:00", "16-NOV-24 02:30:00", "19-MAY…
## $ CZ_TIMEZONE <chr> "CST-6", "PST-8", "CST-6", "CST-6", "CST-6", "PST-8…
## $ END_DATE_TIME <chr> "23-MAY-24 19:47:00", "18-NOV-24 14:21:00", "19-MAY…
## $ INJURIES_DIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ INJURIES_INDIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DEATHS_DIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DEATHS_INDIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DAMAGE_PROPERTY <chr> NA, "0.00K", "150.00K", "10.00K", "1.00K", "0.00K",…
## $ DAMAGE_CROPS <chr> NA, "0.00K", "0.00K", NA, "0.00K", "0.00K", "0.00K"…
## $ SOURCE <chr> "Public", "SNOTEL", "NWS Storm Survey", "Other Fede…
## $ MAGNITUDE <dbl> 1.50, NA, NA, 61.00, 52.00, NA, NA, 1.00, 0.88, NA,…
## $ MAGNITUDE_TYPE <chr> NA, NA, NA, "EG", "EG", NA, NA, NA, NA, NA, "MG", N…
## $ FLOOD_CAUSE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CATEGORY <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_F_SCALE <chr> NA, NA, "EF1", NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ TOR_LENGTH <dbl> NA, NA, 6.70, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ TOR_WIDTH <dbl> NA, NA, 400, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ TOR_OTHER_WFO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_OTHER_CZ_STATE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_OTHER_CZ_FIPS <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_OTHER_CZ_NAME <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BEGIN_RANGE <dbl> 4, NA, 8, 2, 0, NA, NA, 1, 1, NA, NA, NA, NA, NA, N…
## $ BEGIN_AZIMUTH <chr> "S", NA, "WNW", "W", "N", NA, NA, "NW", "SSE", NA, …
## $ BEGIN_LOCATION <chr> "FRIENDSHIP", NA, "CUSTER CITY", "NINNEKAH", "ALGOM…
## $ END_RANGE <dbl> 4, NA, 5, 2, 0, NA, NA, 1, 1, NA, NA, NA, NA, NA, N…
## $ END_AZIMUTH <chr> "S", NA, "N", "W", "N", NA, NA, "NW", "SSE", NA, NA…
## $ END_LOCATION <chr> "FRIENDSHIP", NA, "CUSTER CITY", "NINNEKAH", "ALGOM…
## $ BEGIN_LAT <dbl> 34.6380, NA, 35.7100, 34.9501, 34.1800, NA, NA, 34.…
## $ BEGIN_LON <dbl> -99.2167, NA, -99.0010, -97.9523, -89.0300, NA, NA,…
## $ END_LAT <dbl> 34.6380, NA, 35.7370, 34.9501, 34.1800, NA, NA, 34.…
## $ END_LON <dbl> -99.2167, NA, -98.8910, -97.9523, -89.0300, NA, NA,…
## $ EPISODE_NARRATIVE <chr> "Two primary rounds of severe convection occurred o…
## $ EVENT_NARRATIVE <chr> "MPing report.", "The Hog Pass SNOTEL reported an e…
## $ DATA_SOURCE <chr> "CSV", "CSV", "CSV", "CSV", "CSV", "CSV", "CSV", "C…
## $ YEARMONTH <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ EPISODE_ID.y <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LOCATION_INDEX <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RANGE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ AZIMUTH <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LOCATION <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LATITUDE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LONGITUDE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LAT2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LON2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FAT_YEARMONTH <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FAT_DAY <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FAT_TIME <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_ID <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_TYPE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_DATE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_AGE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_SEX <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_LOCATION <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ EVENT_YEARMONTH <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
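Because `locations` and `fatalities` can contain several rows for one EVENT_ID, the joined table can hold more rows than there are events, and event-level columns such as DEATHS_DIRECT are repeated on every copy. The following is a minimal sketch of that effect on made-up toy tables (the `toy_*` names and all values are assumptions for illustration, not NOAA data), together with one way to guard against it using `distinct()`.

```r
library(dplyr)

# Toy stand-ins for the NOAA tables (illustrative values only)
toy_details <- data.frame(
  EVENT_ID = c(1, 2),
  DEATHS_DIRECT = c(2, 0)
)
toy_locations <- data.frame(
  EVENT_ID = c(1, 1, 2),        # event 1 has two location rows
  LOCATION_INDEX = c(1, 2, 1)
)

toy_joined <- toy_details %>%
  left_join(toy_locations, by = "EVENT_ID")

# The join yields one row per location, so a naive sum counts event 1 twice
naive_deaths <- sum(toy_joined$DEATHS_DIRECT)

# Collapsing to one row per event before summing gives the event-level total
event_deaths <- toy_joined %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  summarise(total = sum(DEATHS_DIRECT)) %>%
  pull(total)

print(c(rows = nrow(toy_joined), naive = naive_deaths, per_event = event_deaths))
```

The same check on the real data would compare `nrow(joined_data)` with `n_distinct(joined_data$EVENT_ID)`.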
# Sample 5,000 rows for plotting
set.seed(786)
sample_data <- joined_data %>% sample_n(5000)
# Calculate the proportion of missing values in each column
missing_prop <- colMeans(is.na(sample_data))
# Highlight variables with more than 80% missing
highlight_vars <- names(missing_prop[missing_prop > 0.8])
# patchwork: combining ggplot objects into one figure
library(patchwork)
# Plot 1: Visual missingness heatmap
plot1 <- vis_miss(sample_data) +
theme(axis.text.x = element_text(
angle = 90,
vjust = 0.5,
hjust = 1,
color = ifelse(names(sample_data) %in% highlight_vars, "red", "black"),
face = ifelse(names(sample_data) %in% highlight_vars, "bold", "plain")
)) +
labs(title = "Visual Map of Missing Data")
# Plot 2: Bar chart of missing values
plot2 <- gg_miss_var(sample_data) +
labs(title = "Missingness by Variable")
# Combine the two vertically
plot1 / plot2 +
plot_annotation(title = "Missing Data Overview: Heatmap + Bar Chart")
gg_miss_upset(sample_data, nsets = 10)
# Compute missingness percentage across the full dataset
missing_summary <- colSums(is.na(joined_data)) / nrow(joined_data) * 100
# Sort to view highest missingness first
missing_summary <- sort(missing_summary, decreasing = TRUE)
print(missing_summary)
## CATEGORY TOR_OTHER_WFO TOR_OTHER_CZ_STATE TOR_OTHER_CZ_FIPS
## 99.93657 99.41353 99.41353 99.41353
## TOR_OTHER_CZ_NAME FATALITY_AGE FATALITY_SEX FAT_YEARMONTH
## 99.41353 98.72023 98.63009 98.51547
## FAT_DAY FAT_TIME FATALITY_ID FATALITY_TYPE
## 98.51547 98.51547 98.51547 98.51547
## FATALITY_DATE FATALITY_LOCATION EVENT_YEARMONTH TOR_F_SCALE
## 98.51547 98.51547 98.51547 96.38549
## TOR_LENGTH TOR_WIDTH FLOOD_CAUSE MAGNITUDE_TYPE
## 96.38549 96.38549 72.40485 68.59448
## MAGNITUDE YEARMONTH EPISODE_ID.y LOCATION_INDEX
## 58.44425 46.27309 46.27309 46.27309
## RANGE AZIMUTH LOCATION LATITUDE
## 46.27309 46.27309 46.27309 46.27309
## LONGITUDE LAT2 LON2 BEGIN_RANGE
## 46.27309 46.27309 46.27309 31.23748
## BEGIN_AZIMUTH BEGIN_LOCATION END_RANGE END_AZIMUTH
## 31.23748 31.23748 31.23748 31.23748
## END_LOCATION BEGIN_LAT BEGIN_LON END_LAT
## 31.23748 31.23748 31.23748 31.23748
## END_LON DAMAGE_CROPS DAMAGE_PROPERTY EVENT_NARRATIVE
## 31.23748 17.84554 17.38037 13.45538
## BEGIN_YEARMONTH BEGIN_DAY BEGIN_TIME END_YEARMONTH
## 0.00000 0.00000 0.00000 0.00000
## END_DAY END_TIME EPISODE_ID.x EVENT_ID
## 0.00000 0.00000 0.00000 0.00000
## STATE STATE_FIPS YEAR MONTH_NAME
## 0.00000 0.00000 0.00000 0.00000
## EVENT_TYPE CZ_TYPE CZ_FIPS CZ_NAME
## 0.00000 0.00000 0.00000 0.00000
## WFO BEGIN_DATE_TIME CZ_TIMEZONE END_DATE_TIME
## 0.00000 0.00000 0.00000 0.00000
## INJURIES_DIRECT INJURIES_INDIRECT DEATHS_DIRECT DEATHS_INDIRECT
## 0.00000 0.00000 0.00000 0.00000
## SOURCE EPISODE_NARRATIVE DATA_SOURCE
## 0.00000 0.00000 0.00000
# Identify columns with more than 90% missing data
# These columns are typically not useful for analysis and clutter the dataset
cols_to_drop <- names(missing_summary[missing_summary > 90])
# Drop high-missingness columns from dataset
joined_data_cleaned <- joined_data %>%
select(-all_of(cols_to_drop))
# Display which columns were dropped for transparency
cat("Dropped columns with >90% missingness:\n")
## Dropped columns with >90% missingness:
print(cols_to_drop)
## [1] "CATEGORY" "TOR_OTHER_WFO" "TOR_OTHER_CZ_STATE"
## [4] "TOR_OTHER_CZ_FIPS" "TOR_OTHER_CZ_NAME" "FATALITY_AGE"
## [7] "FATALITY_SEX" "FAT_YEARMONTH" "FAT_DAY"
## [10] "FAT_TIME" "FATALITY_ID" "FATALITY_TYPE"
## [13] "FATALITY_DATE" "FATALITY_LOCATION" "EVENT_YEARMONTH"
## [16] "TOR_F_SCALE" "TOR_LENGTH" "TOR_WIDTH"
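Some of the dropped columns are missing by design rather than by omission: the fatality fields come from a left join and are empty for every event with no matching fatality record, and fields such as TOR_F_SCALE describe tornado events. One way to check this kind of structural missingness is to compute the missing share within each event type instead of overall. The sketch below does that on a small made-up data frame (the `toy` values are hypothetical).

```r
# Hypothetical mini dataset: TOR_F_SCALE is filled only for tornado rows
toy <- data.frame(
  EVENT_TYPE  = c("Tornado", "Tornado", "Hail", "Hail", "Heavy Snow"),
  TOR_F_SCALE = c("EF1", "EF0", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Overall share of missing TOR_F_SCALE values, in percent
overall_pct <- mean(is.na(toy$TOR_F_SCALE)) * 100

# Share of missing values within each event type, in percent
by_type_pct <- tapply(is.na(toy$TOR_F_SCALE), toy$EVENT_TYPE, mean) * 100

print(overall_pct)
print(by_type_pct)
```

If a later question needs tornado- or fatality-specific fields, they can still be read from `joined_data`, which keeps all columns.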
Below are the analyses to address the key research questions.
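# Select key columns related to fatalities and injuries, keeping one row per
# event. The joins above repeat an event's row for each matching location or
# fatality record, which would otherwise inflate the sums below.
health_data <- joined_data_cleaned %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  select(EVENT_ID, EVENT_TYPE, STATE, DEATHS_DIRECT, DEATHS_INDIRECT, INJURIES_DIRECT, INJURIES_INDIRECT)
# Build an uppercase list of state names (plus DC) to match the STATE column,
# then keep only rows for those states.
us_states <- c(state.name, "District of Columbia") %>% toupper()
health_data <- health_data %>%
  filter(STATE %in% us_states)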
# Calculate total harm by summing deaths and injuries.
health_data <- health_data %>%
mutate(TOTAL_HARM = DEATHS_DIRECT + DEATHS_INDIRECT + INJURIES_DIRECT + INJURIES_INDIRECT)
# Group data by event type and compute total fatalities, injuries, and overall harm.
event_harm_summary <- health_data %>%
group_by(EVENT_TYPE) %>%
summarise(
TOTAL_DEATHS = sum(DEATHS_DIRECT + DEATHS_INDIRECT, na.rm = TRUE),
TOTAL_INJURIES = sum(INJURIES_DIRECT + INJURIES_INDIRECT, na.rm = TRUE),
TOTAL_HARM = sum(TOTAL_HARM, na.rm = TRUE)
) %>%
arrange(desc(TOTAL_HARM))
# Plot the top 10 most harmful weather events.
event_harm_summary %>%
slice_max(TOTAL_HARM, n = 10) %>%
ggplot(aes(x = reorder(EVENT_TYPE, TOTAL_HARM), y = TOTAL_HARM, fill = TOTAL_HARM)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = TOTAL_HARM), hjust = -0.1, size = 3.5, color = "black") +
coord_flip() +
labs(
title = "Top 10 Most Harmful Weather Events (Population Health Impact)",
subtitle = "Based on total reported injuries and deaths",
x = "Event Type",
y = "Total Harm (Injuries + Deaths)"
) +
theme_minimal(base_size = 12) +
expand_limits(y = max(event_harm_summary$TOTAL_HARM) * 1.1)
The analysis above highlights the top 10 storm event types in the U.S.
during 2024 by combined count of direct and indirect deaths and injuries.
Tornadoes, excessive heat, and flash floods emerge as the most harmful in
terms of total reported harm to population health, with casualty totals
markedly higher than those of other event types in the dataset. The
results are based on aggregated event-level injury and fatality data for
the 50 U.S. states and the District of Columbia; any STATE value outside
that list (for example, a territory) is excluded by the filter.
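The combined harm measure adds deaths and injuries with equal weight, so an event type with many injuries can outrank one with more deaths. As a worked illustration on made-up numbers (the `toy_health` values are hypothetical, and base R `aggregate()` stands in here for the report's dplyr pipeline), the sketch below computes the same sums and shows that ranking by total harm and ranking by deaths can disagree.

```r
# Hypothetical event-level records (one row per event)
toy_health <- data.frame(
  EVENT_TYPE        = c("Tornado", "Tornado", "Excessive Heat", "Flash Flood"),
  DEATHS_DIRECT     = c(1, 0, 3, 2),
  DEATHS_INDIRECT   = c(0, 1, 0, 0),
  INJURIES_DIRECT   = c(10, 4, 0, 1),
  INJURIES_INDIRECT = c(0, 0, 2, 0)
)

# Row-level totals, mirroring TOTAL_HARM = deaths + injuries
toy_health$DEATHS     <- toy_health$DEATHS_DIRECT + toy_health$DEATHS_INDIRECT
toy_health$INJURIES   <- toy_health$INJURIES_DIRECT + toy_health$INJURIES_INDIRECT
toy_health$TOTAL_HARM <- toy_health$DEATHS + toy_health$INJURIES

# Sum each measure by event type
harm_by_type <- aggregate(
  cbind(DEATHS, INJURIES, TOTAL_HARM) ~ EVENT_TYPE,
  data = toy_health,
  FUN = sum
)

# Top event type under each ranking
top_by_harm   <- harm_by_type$EVENT_TYPE[which.max(harm_by_type$TOTAL_HARM)]
top_by_deaths <- harm_by_type$EVENT_TYPE[which.max(harm_by_type$DEATHS)]

print(harm_by_type[order(-harm_by_type$TOTAL_HARM), ])
```

Reporting deaths and injuries side by side, as `event_harm_summary` already does, lets readers see which component drives each total.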
# Deduplicate to the event level to avoid counting duplicates.
event_level_data <- health_data %>%
distinct(EVENT_ID, STATE, EVENT_TYPE, .keep_all = TRUE)
# Count unique events per state and event type.
event_counts_by_state <- event_level_data %>%
group_by(STATE, EVENT_TYPE) %>%
summarise(EVENT_COUNT = n(), .groups = "drop")
# For each state, identify the most common event type.
most_common_event_by_state <- event_counts_by_state %>%
group_by(STATE) %>%
slice_max(EVENT_COUNT, n = 1, with_ties = FALSE) %>%
arrange(STATE)
# Plot the most frequent event per state.
ggplot(most_common_event_by_state, aes(x = reorder(STATE, EVENT_COUNT), y = EVENT_COUNT)) +
geom_segment(aes(xend = STATE, y = 0, yend = EVENT_COUNT), color = "gray80") +
geom_point(aes(color = EVENT_TYPE), size = 4) +
coord_flip() +
labs(
title = "Most Common Weather Event by State (2024)",
subtitle = "Frequency of distinct storm events",
x = "State",
y = "Event Count",
color = "Event Type"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "bottom")
This figure shows the most frequently recorded storm event type in each
U.S. state during 2024. In the majority of states, Thunderstorm Wind was
the most common hazard, highlighting its widespread occurrence across the
country. Some states had a different dominant hazard: Hail was the most
common event type in Texas, while Drought led in New Mexico and Heat in
Georgia. The differences in top event types reflect regional climatic
patterns and hazard exposures. Each point represents the most common
event type for a state, with the horizontal axis showing the
corresponding number of distinct storm events. Where two event types tie
for first place in a state, only one of them is shown.
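The per-state selection uses `slice_max()` with `with_ties = FALSE`, which returns a single row even when two event types share the top count. A minimal sketch, on made-up counts (the `toy_counts` values are hypothetical), of how one might list the states where such a tie exists before settling on one label:

```r
library(dplyr)

# Hypothetical per-state counts; state "A" has a tie for first place
toy_counts <- data.frame(
  STATE       = c("A", "A", "B", "B"),
  EVENT_TYPE  = c("Hail", "Thunderstorm Wind", "Drought", "Flood"),
  EVENT_COUNT = c(5, 5, 7, 2)
)

# with_ties = TRUE keeps every event type tied for the top count
top_with_ties <- toy_counts %>%
  group_by(STATE) %>%
  slice_max(EVENT_COUNT, n = 1, with_ties = TRUE) %>%
  ungroup()

# States that appear more than once have a tie at the top
tied_states <- top_with_ties %>%
  count(STATE, name = "N_TOP") %>%
  filter(N_TOP > 1)

print(tied_states)
```

An empty `tied_states` table would mean the single label per state is unambiguous.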
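# Select the necessary columns for seasonal analysis, keeping one row per event
# so that events with several location or fatality rows are counted once.
seasonal_data <- joined_data_cleaned %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  select(EVENT_TYPE, MONTH_NAME) %>%
  filter(!is.na(MONTH_NAME), !is.na(EVENT_TYPE))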
# Count the frequency of each event type by month.
event_monthly_counts <- seasonal_data %>%
group_by(EVENT_TYPE, MONTH_NAME) %>%
summarise(EVENT_COUNT = n(), .groups = "drop")
# Convert MONTH_NAME to an ordered factor from January to December.
event_monthly_counts$MONTH_NAME <- factor(
event_monthly_counts$MONTH_NAME,
levels = month.name,
ordered = TRUE
)
# Plot a heatmap to show event frequency by month and event type.
ggplot(event_monthly_counts, aes(x = MONTH_NAME, y = reorder(EVENT_TYPE, -EVENT_COUNT), fill = EVENT_COUNT)) +
geom_tile(color = "white") +
scale_fill_viridis_c(option = "C") +
labs(
title = "Seasonality of Weather Events (2024)",
subtitle = "Heatmap of event frequency by type and month",
x = "Month",
y = "Event Type",
fill = "Event Count"
) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The seasonality heatmap shows the monthly distribution of storm events
by type throughout 2024. Clear temporal patterns emerge: for example,
thunderstorm wind events cluster heavily in spring and summer months,
while winter storms and ice storms are concentrated in colder months.
Because the data cover a single year, the heatmap shows the timing of
each hazard within 2024 rather than a multi-year cycle, but it still
indicates when different event types tend to occur most often.
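The heatmap reads chronologically only because MONTH_NAME is converted to a factor whose levels follow `month.name`; plain character values would be ordered alphabetically on the axis. A small base R sketch of the difference, using three example month names chosen for illustration:

```r
months_seen <- c("November", "April", "January")

# Plain character sorting is alphabetical, not chronological
alphabetical <- sort(months_seen)

# An ordered factor built on month.name sorts by calendar position
month_factor <- factor(months_seen, levels = month.name, ordered = TRUE)
chronological <- as.character(sort(month_factor))

print(alphabetical)
print(chronological)
```

Any value that does not exactly match an entry of `month.name` (an abbreviation such as "Nov", for instance) would become NA in the factor, so it is worth checking `anyNA()` after the conversion.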
# Define a function to convert damage strings (e.g., "10K", "1.5M") to numeric values.
convert_damage <- function(damage) {
damage <- toupper(trimws(damage))
numeric_value <- as.numeric(gsub("[KMB]", "", damage))
multiplier <- case_when(
grepl("K", damage) ~ 1e3,
grepl("M", damage) ~ 1e6,
grepl("B", damage) ~ 1e9,
TRUE ~ 1
)
numeric_value * multiplier
}
# Select columns relevant to economic damage, keeping one row per event so
# that damages are not counted once per joined location or fatality row.
economic_data <- joined_data_cleaned %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  select(EVENT_TYPE, DAMAGE_PROPERTY, DAMAGE_CROPS) %>%
  mutate(
    PROP_DMG_NUM = ifelse(is.na(DAMAGE_PROPERTY), NA, convert_damage(DAMAGE_PROPERTY)),
    CROP_DMG_NUM = ifelse(is.na(DAMAGE_CROPS), NA, convert_damage(DAMAGE_CROPS)),
    TOTAL_ECONOMIC_DAMAGE = rowSums(cbind(PROP_DMG_NUM, CROP_DMG_NUM), na.rm = TRUE)
  )
# Summarize total economic damage by event type.
economic_summary <- economic_data %>%
group_by(EVENT_TYPE) %>%
summarise(
TOTAL_PROPERTY_DAMAGE = sum(PROP_DMG_NUM, na.rm = TRUE),
TOTAL_CROP_DAMAGE = sum(CROP_DMG_NUM, na.rm = TRUE),
TOTAL_ECONOMIC_DAMAGE = sum(TOTAL_ECONOMIC_DAMAGE, na.rm = TRUE)
) %>%
arrange(desc(TOTAL_ECONOMIC_DAMAGE))
# Function to format dollar amounts for the labels.
format_dollars <- function(x) {
ifelse(x >= 1e9,
dollar(x, scale = 1e-9, suffix = "B"),
dollar(x, scale = 1e-6, suffix = "M"))
}
# Plot the top 10 event types by economic damage.
economic_summary %>%
slice_max(TOTAL_ECONOMIC_DAMAGE, n = 10) %>%
ggplot(aes(x = reorder(EVENT_TYPE, TOTAL_ECONOMIC_DAMAGE), y = TOTAL_ECONOMIC_DAMAGE, fill = TOTAL_ECONOMIC_DAMAGE)) +
geom_col(show.legend = FALSE) +
geom_text(aes(label = format_dollars(TOTAL_ECONOMIC_DAMAGE)), hjust = -0.1, size = 3.5, color = "black") +
coord_flip() +
labs(
title = "Top 10 Weather Event Types by Economic Damage (2024)",
subtitle = "Combined property and crop damages",
x = "Event Type",
y = "Total Economic Damage (USD)"
) +
scale_fill_gradient(low = "#FFECB3", high = "#BF360C") +
scale_y_continuous(labels = format_dollars) +
theme_minimal(base_size = 13) +
expand_limits(y = max(economic_summary$TOTAL_ECONOMIC_DAMAGE) * 1.12)
This bar chart summarizes the total reported property and crop damage
for each event type. Events like hurricanes, floods, and hail caused the
most substantial economic losses in 2024, often measured in billions of
dollars. The financial impact was derived from NOAA’s reported figures
using standardized conversion (e.g., “K” = thousand, “M” = million).
These results quantify the relative economic burden associated with each
hazard.
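To make the suffix conversion concrete, the sketch below works through a few damage strings in base R. `parse_damage` is a hypothetical helper written for this illustration, not the `convert_damage` function used in the report; it assumes the value is a plain number optionally followed by a single K, M, or B.

```r
# Hypothetical helper: convert strings such as "10.00K" or "1.5M" to dollars
parse_damage <- function(x) {
  x <- toupper(trimws(x))
  suffix <- sub("^[0-9.]+", "", x)            # text left after the numeric part
  value <- as.numeric(sub("[KMB]$", "", x))   # numeric part without the suffix
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)[suffix]
  multiplier[is.na(multiplier)] <- 1          # no recognised suffix: plain dollars
  unname(value * multiplier)
}

examples <- c("10.00K", "1.5M", "2B", "250", NA)
parsed <- parse_damage(examples)

print(data.frame(raw = examples, dollars = parsed))
```

Under these assumptions "10.00K" becomes 10,000 dollars and "1.5M" becomes 1,500,000 dollars, while a missing entry stays NA rather than being treated as zero.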
This analysis of the 2024 NOAA Storm Events dataset provides a comprehensive overview of storm-related impacts across the United States. By integrating data from event details, locations, and fatalities, we examined the most harmful event types to population health, identified state-specific frequency patterns, uncovered seasonal trends, and quantified the economic consequences of severe weather.
The findings reveal clear geographic and temporal variability in how different hazards manifest, with events like tornadoes, heat, hail, and thunderstorm winds recurring prominently across multiple dimensions of impact. These insights may support future situational awareness efforts by municipal and emergency management professionals.
All results were generated through reproducible code applied to the raw NOAA files, ensuring transparency and traceability of every analytical step.