This analysis explores 2024 storm event data from the NOAA Storm Events Database. The goal is to uncover national weather patterns in public safety, seasonal timing, state-level intensity, and economic consequences. The raw data include thousands of entries detailing event types, injuries, deaths, damages, and dates.
Using tidyverse tools, the data are cleaned, merged, and
analyzed entirely within this document. We identify the event types
most harmful to human health, the most common weather event in each of
the most affected states, monthly seasonality trends, and the event
types with the largest financial toll. Visualizations highlight key
findings. All processing is done in R from the raw .csv files, and no
more than five figures are presented in total.
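The raw NOAA data were loaded from three CSV files and merged on EVENT_ID. Because the locations and fatalities files can contain several rows per event, the merged table repeats an event's detail row once for each match. Date fields were parsed to extract event months, and state and event names were converted to title case to ensure consistent grouping. Damage values were converted from character strings to numeric with parse_number(), which keeps only the leading number and drops any trailing magnitude suffix (for example, the K in "10.00K"). A new column, Property_Damage, holds the sum of property and crop damage.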
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.4.2
## Warning: package 'forcats' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(forcats)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
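# Read the three raw NOAA Storm Events files (paths are local to the author's machine)
details <- read.csv("~/DAT 511/StormEvents_details.csv")
fatalities <- read.csv("~/DAT 511/StormEvents_fatalities.csv")
locations <- read.csv("~/DAT 511/StormEvents_locations.csv")
# Attach location and fatality records to each event by EVENT_ID.
# Both tables can hold several rows per event, so detail rows repeat after the joins.
storm_data <- details %>%
  left_join(locations, by = "EVENT_ID") %>%
  left_join(fatalities, by = "EVENT_ID")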
## Warning in left_join(., fatalities, by = "EVENT_ID"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 830 of `x` matches multiple rows in `y`.
## ℹ Row 441 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
## "many-to-many"` to silence this warning.
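The warning above indicates that detail rows are repeated after the joins. The following is a minimal sketch, using made-up toy tables (every `_toy` name and value is hypothetical, not taken from the NOAA files), of how that repetition can inflate a sum of per-event counts, along with one possible way to collapse back to one row per event before summarising:

```r
library(dplyr)

# Toy stand-ins for the three tables (hypothetical values, not real data)
details_toy <- tibble(
  EVENT_ID = c(1, 2),
  EVENT_TYPE = c("Tornado", "Heat"),
  DEATHS_DIRECT = c(2, 1)
)
locations_toy <- tibble(EVENT_ID = c(1, 1, 1, 2), LOCATION_INDEX = c(1, 2, 3, 1))
fatalities_toy <- tibble(EVENT_ID = c(1, 1, 2), FATALITY_ID = c(101, 102, 201))

# Same join pattern as in the analysis; the relationship is declared
# explicitly here only to keep the sketch free of warnings
joined <- details_toy %>%
  left_join(locations_toy, by = "EVENT_ID") %>%
  left_join(fatalities_toy, by = "EVENT_ID", relationship = "many-to-many")

nrow(joined)                # event 1: 3 locations x 2 fatalities, event 2: 1 x 1
sum(joined$DEATHS_DIRECT)   # larger than the true total of 3

# One possible remedy: keep a single row per event before summing
per_event <- joined %>% distinct(EVENT_ID, .keep_all = TRUE)
sum(per_event$DEATHS_DIRECT)
```

Whether deduplicating or summarising from the details table alone is the better choice depends on which columns from the other two tables the later steps actually need.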
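# Parse dates, standardize names, and convert damage strings to numbers
storm_data <- storm_data %>%
  mutate(
    BEGIN_DATE_TIME = mdy_hms(BEGIN_DATE_TIME),
    BEGIN_MONTH = month(BEGIN_DATE_TIME, label = TRUE, abbr = TRUE),
    STATE = str_to_title(STATE),
    EVENT_TYPE = str_to_title(EVENT_TYPE),
    # parse_number() keeps the leading number only; a K/M/B suffix, if present, is dropped
    DAMAGE_PROPERTY = parse_number(DAMAGE_PROPERTY),
    DAMAGE_CROPS = parse_number(DAMAGE_CROPS),
    # Combined property + crop damage (NA when either component is NA)
    Property_Damage = DAMAGE_PROPERTY + DAMAGE_CROPS
  )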
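NOAA's published files typically record damage as strings with a magnitude suffix, such as "10.00K" or "1.5M". If the files used here follow that convention, a suffix-aware conversion would be needed to obtain dollar amounts. The sketch below is one way to do it; `parse_damage()` is a hypothetical helper, not part of the original analysis, and it assumes that only the suffixes K, M, and B occur.

```r
library(readr)
library(dplyr)
library(stringr)

# Hypothetical helper: convert strings such as "10.00K" or "1.5M" to dollars.
# Assumes the only magnitude suffixes are K, M, and B.
parse_damage <- function(x) {
  x <- str_trim(as.character(x))
  # Numeric part of the string; as.numeric() strips readr's parse attributes
  value <- as.numeric(suppressWarnings(parse_number(x)))
  # Trailing letter, if any, upper-cased
  suffix <- str_to_upper(str_extract(x, "[A-Za-z]$"))
  multiplier <- case_when(
    suffix == "K" ~ 1e3,
    suffix == "M" ~ 1e6,
    suffix == "B" ~ 1e9,
    is.na(suffix) ~ 1,        # no suffix: value is already in dollars
    TRUE ~ NA_real_           # unknown suffix: flag as missing
  )
  value * multiplier
}

example <- c("10.00K", "1.5M", "2B", "250")
parse_number(example)   # numeric parts only: 10, 1.5, 2, 250
parse_damage(example)   # scaled to dollars
```

Returning NA for an unrecognised suffix is a deliberate choice in this sketch, so that unexpected values surface during checking instead of being silently treated as dollars.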
# Total deaths and injuries (direct + indirect) per event type; keep the ten largest
health_impact <- storm_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    Total_Fatalities = sum(DEATHS_DIRECT + DEATHS_INDIRECT, na.rm = TRUE),
    Total_Injuries = sum(INJURIES_DIRECT + INJURIES_INDIRECT, na.rm = TRUE)
  ) %>%
  mutate(Total_Harmed = Total_Fatalities + Total_Injuries) %>%
  arrange(desc(Total_Harmed)) %>%
  slice_head(n = 10)
# Horizontal bar chart with the largest total at the top
ggplot(health_impact, aes(x = reorder(EVENT_TYPE, Total_Harmed), y = Total_Harmed)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Weather Events (Health Impact)",
    x = "Event Type", y = "Total Fatalities + Injuries"
  ) +
  theme_minimal()
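Explanation:
Excessive heat was by far the most harmful event type in 2024, with
more than 4,000 combined injuries and fatalities in the summed data.
Tornadoes and flash floods followed, while winter storms, hurricanes,
and flooding had a much lower public health impact. These sums are
taken over the joined table, in which an event's casualty counts repeat
once per matching location and fatality row, so they can overstate the
true totals.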
# Keep the 50 states only (state.name excludes territories and D.C.)
real_states <- state.name
# Most frequent event type within each state, then the 15 largest of those counts.
# slice_max() keeps ties by default, so either step can return extra rows.
top_states <- storm_data %>%
  filter(STATE %in% real_states) %>%
  group_by(STATE, EVENT_TYPE) %>%
  summarise(Event_Count = n(), .groups = "drop") %>%
  group_by(STATE) %>%
  slice_max(Event_Count, n = 1) %>%
  ungroup() %>%
  slice_max(Event_Count, n = 15)
ggplot(top_states, aes(x = Event_Count, y = fct_reorder(STATE, Event_Count), fill = EVENT_TYPE)) +
  geom_col() +
  labs(
    title = "Most Frequent Weather Event in Top 15 U.S. States (2024)",
    x = "Number of Events", y = "State", fill = "Event Type"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.y = element_text(size = 10))
Explanation:
Thunderstorm wind and flash floods were tied as the most common events
across the top 15 states, each ranking first in 7 states. Thunderstorm
wind dominated the Midwest and Northeast, including New York and
Illinois. Flash floods were common in the South and Southeast, including
Texas, Georgia, and Florida. Only California saw general flooding as its
top event, and Oklahoma stood out for heat.
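The state ranking relies on `slice_max()`, which keeps tied rows by default, so a request for the top 15 may return more than 15 rows when counts are equal. A small sketch with made-up counts (the table `counts_toy` is hypothetical) illustrates the behaviour and the `with_ties = FALSE` option:

```r
library(dplyr)

# Hypothetical counts with a tie for second place
counts_toy <- tibble(
  STATE = c("A", "B", "C", "D"),
  Event_Count = c(10, 8, 8, 5)
)

# Default behaviour: both tied rows are kept, so n = 2 yields three rows
kept_ties <- counts_toy %>% slice_max(Event_Count, n = 2)

# with_ties = FALSE returns exactly n rows, breaking ties by row order
exact_n <- counts_toy %>% slice_max(Event_Count, n = 2, with_ties = FALSE)

nrow(kept_ties)
nrow(exact_n)
```

Breaking ties by row order is arbitrary, so keeping the tied rows and reporting them can be the more transparent choice.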
# Most frequent event type in each month (ties, if any, are kept)
event_months <- storm_data %>%
  group_by(BEGIN_MONTH, EVENT_TYPE) %>%
  summarise(Event_Count = n(), .groups = "drop") %>%
  group_by(BEGIN_MONTH) %>%
  slice_max(Event_Count, n = 1)
# Months are ordered on the y axis by event count, not by calendar position
ggplot(event_months, aes(x = Event_Count, y = fct_reorder(BEGIN_MONTH, Event_Count), fill = EVENT_TYPE)) +
  geom_col() +
  labs(
    title = "Most Common Weather Events by Month",
    x = "Number of Events", y = "Month", fill = "Event Type"
  ) +
  theme_minimal()
Explanation:
Thunderstorm wind was the dominant weather event from April to August,
peaking in May. Flash floods were the top events in April, September,
and November, while winter weather appeared in December and January.
Drought was most prevalent in October, and early spring months featured
hail and flooding.
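Since the discussion above follows the calendar, a chart with months in calendar order may be easier to read than one sorted by count. The sketch below uses made-up monthly counts (`toy_months` is hypothetical) and relies on the fact that `month(..., label = TRUE)` returns a factor whose levels are already in calendar order:

```r
library(dplyr)
library(lubridate)
library(forcats)
library(ggplot2)

# Toy monthly counts (made-up values, not from the NOAA data)
toy_months <- tibble(
  BEGIN_MONTH = month(c(1, 5, 10), label = TRUE, abbr = TRUE),
  EVENT_TYPE = c("Winter Weather", "Thunderstorm Wind", "Drought"),
  Event_Count = c(120, 900, 300)
)

# ggplot2 draws the first factor level at the bottom of a discrete y axis,
# so fct_rev() is used to put the earliest month at the top
p <- ggplot(toy_months, aes(x = Event_Count, y = fct_rev(BEGIN_MONTH), fill = EVENT_TYPE)) +
  geom_col() +
  labs(x = "Number of Events", y = "Month", fill = "Event Type") +
  theme_minimal()
```

Printing `p` renders the chart; the count-sorted version remains useful when the aim is to rank months by activity.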
# Sum combined property and crop damage per event type; keep the ten largest
top_damage <- storm_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(Total_Damage = sum(Property_Damage, na.rm = TRUE)) %>%
  arrange(desc(Total_Damage)) %>%
  slice_head(n = 10)
# Horizontal bar chart with comma-formatted values on the damage axis
ggplot(top_damage, aes(x = reorder(EVENT_TYPE, Total_Damage), y = Total_Damage)) +
  geom_col(fill = "firebrick") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Top 10 Weather Events by Property Damage",
    x = "Event Type", y = "Total Property + Crop Damage ($)"
  ) +
  theme_minimal()
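Explanation:
Flash floods had the highest total damage value in 2024, surpassing
$250,000 as computed. Thunderstorm winds and tornadoes followed as major
contributors to losses. Hail and general flooding also had notable
impacts, while lightning, wildfires, and drought caused smaller but
still significant property and crop damage. If the raw damage values
carry magnitude suffixes (such as K or M), parse_number() drops them,
and these sums understate the true dollar amounts; in that case they
are better read as a relative ranking than as literal totals.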