Synopsis

This analysis explores 2024 storm event data from the NOAA Storm Events Database. The goal is to uncover national weather patterns related to public safety, seasonal timing, state-level intensity, and economic consequences. The raw data includes thousands of entries detailing event types, injuries, deaths, damages, and dates.

The data is merged, cleaned, and analyzed with tidyverse tools entirely within this document. We identify the event types most harmful to human health, the most common weather events in the most impacted states, monthly seasonal trends, and the events with the largest financial toll. Visualizations highlight the key findings. All processing is done in R starting from the raw .csv files, and no more than five figures are presented in total.

Data Processing

The raw NOAA data was loaded from three CSV files (event details, fatalities, and locations) and merged on EVENT_ID. Event start times were parsed to extract the month of each event. State and event-type names were converted to title case so that grouping is consistent. Damage values were converted from character strings to numbers with parse_number(), and a new column, Property_Damage, combines property and crop damage. These steps produce a clean table ready for analysis.
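Title-casing matters in two places: it makes grouping by name consistent, and it lets STATE be compared with base R's `state.name`, as the Q2 filter does. The minimal sketch below uses made-up raw values written in NOAA's upper-case style. The vectors `raw_states` and `raw_types` are hypothetical illustrations, not rows from the 2024 files.

```r
library(stringr)  # str_to_title()

# Hypothetical raw values written in the upper-case style of the NOAA files
raw_states <- c("NEW YORK", "TEXAS", "DISTRICT OF COLUMBIA", "PUERTO RICO")
raw_types  <- c("THUNDERSTORM WIND", "Thunderstorm Wind")

# Title case makes the state names comparable with base R's state.name
clean_states <- str_to_title(raw_states)
print(clean_states)
print(clean_states %in% state.name)  # only names of the 50 states match

# Differently cased spellings of one event type collapse into a single group
print(unique(str_to_title(raw_types)))
```

Because `state.name` lists only the 50 states, a filter on it silently drops entries such as the District of Columbia and Puerto Rico.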

library(tidyverse)
library(lubridate)
library(forcats)
library(scales)
details <- read.csv("~/DAT 511/StormEvents_details.csv")
fatalities <- read.csv("~/DAT 511/StormEvents_fatalities.csv")
locations <- read.csv("~/DAT 511/StormEvents_locations.csv")

# locations and fatalities can hold several rows per EVENT_ID, so the joins repeat
# event rows; keep one row per event so later counts and sums are not inflated
storm_data <- details %>%
  left_join(locations, by = "EVENT_ID") %>%
  left_join(fatalities, by = "EVENT_ID", relationship = "many-to-many") %>%
  distinct(EVENT_ID, .keep_all = TRUE)
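storm_data <- storm_data %>%
  mutate(
    BEGIN_DATE_TIME = mdy_hms(BEGIN_DATE_TIME),
    BEGIN_MONTH = month(BEGIN_DATE_TIME, label = TRUE, abbr = TRUE),
    STATE = str_to_title(STATE),
    EVENT_TYPE = str_to_title(EVENT_TYPE),
    # NOAA damage strings carry a unit suffix ("10.00K", "1.50M"); parse_number()
    # keeps only the digits, so multiply by the suffix to express values in dollars
    across(
      c(DAMAGE_PROPERTY, DAMAGE_CROPS),
      ~ parse_number(.x) * case_when(
        str_detect(.x, "[Kk]$") ~ 1e3,
        str_detect(.x, "[Mm]$") ~ 1e6,
        str_detect(.x, "[Bb]$") ~ 1e9,
        .default = 1
      )
    ),
    # Combined property + crop damage; a missing value counts as zero instead of
    # turning the whole sum into NA
    Property_Damage = coalesce(DAMAGE_PROPERTY, 0) + coalesce(DAMAGE_CROPS, 0)
  )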
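Joining the details table to tables that hold several rows per EVENT_ID repeats each event row once per match. Every later count or sum is then inflated. The sketch below uses tiny hypothetical tables (`toy_details`, `toy_locations`, `toy_fatalities`), not the real files. It shows the inflation and one way to undo it: keep the first row per event with `distinct()`.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for the three NOAA tables
toy_details    <- tibble(EVENT_ID = c(1, 2), DEATHS_DIRECT = c(2, 0))
toy_locations  <- tibble(EVENT_ID = c(1, 1, 2), LATITUDE = c(35.1, 35.2, 40.0))
toy_fatalities <- tibble(EVENT_ID = c(1, 1), FATALITY_AGE = c(30, 45))

toy_joined <- toy_details %>%
  left_join(toy_locations, by = "EVENT_ID") %>%
  left_join(toy_fatalities, by = "EVENT_ID", relationship = "many-to-many")

# Event 1 now appears 2 locations x 2 fatalities = 4 times (5 rows in total)
print(nrow(toy_joined))
# Its 2 deaths are counted 4 times, giving 8 instead of 2
print(sum(toy_joined$DEATHS_DIRECT))

# Keeping the first row per event restores event-level totals
toy_deduped <- distinct(toy_joined, EVENT_ID, .keep_all = TRUE)
print(sum(toy_deduped$DEATHS_DIRECT))
```

Deduplicating is a reasonable choice when, as in this report, only details columns are analyzed. If fatality-level fields were needed, summarizing those tables to one row per EVENT_ID before joining would be the safer design.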

Results

Q1: Which weather events are most harmful to population health?

health_impact <- storm_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    Total_Fatalities = sum(DEATHS_DIRECT, na.rm = TRUE) + sum(DEATHS_INDIRECT, na.rm = TRUE),
    Total_Injuries = sum(INJURIES_DIRECT, na.rm = TRUE) + sum(INJURIES_INDIRECT, na.rm = TRUE)
  ) %>%
  mutate(Total_Harmed = Total_Fatalities + Total_Injuries) %>%
  arrange(desc(Total_Harmed)) %>%
  slice_head(n = 10)

ggplot(health_impact, aes(x = reorder(EVENT_TYPE, Total_Harmed), y = Total_Harmed)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Weather Events (Health Impact)",
    x = "Event Type", y = "Total Fatalities + Injuries"
  ) +
  theme_minimal()

Explanation:
Excessive heat was by far the most harmful weather event in 2024, causing over 4,000 total injuries and fatalities. Tornadoes and flash floods followed, while winter storms, hurricanes, and flooding had much lower public health impact.
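Adding direct and indirect counts before summing, as in an expression such as `sum(INJURIES_DIRECT + INJURIES_INDIRECT, na.rm = TRUE)`, has a pitfall. A single missing value discards that event's whole contribution. This report does not check whether the 2024 columns contain NAs. The toy rows below are hypothetical and only illustrate the difference between the two ways of aggregating.

```r
library(dplyr)
library(tibble)

# Hypothetical rows; one INJURIES_INDIRECT value is missing
toy_events <- tibble(
  EVENT_TYPE        = c("Heat", "Heat", "Tornado"),
  INJURIES_DIRECT   = c(10, 5, 7),
  INJURIES_INDIRECT = c(0, NA, 1)
)

toy_totals <- toy_events %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    # 5 + NA is NA, so na.rm drops that row's 5 direct injuries entirely
    add_then_sum = sum(INJURIES_DIRECT + INJURIES_INDIRECT, na.rm = TRUE),
    # Summing each column on its own keeps every recorded value
    sum_each = sum(INJURIES_DIRECT, na.rm = TRUE) + sum(INJURIES_INDIRECT, na.rm = TRUE)
  ) %>%
  arrange(desc(sum_each))

print(toy_totals)
```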


Q2: Which events are most frequent in the most impacted states?

real_states <- state.name

top_states <- storm_data %>%
  filter(STATE %in% real_states) %>%
  group_by(STATE, EVENT_TYPE) %>%
  summarise(Event_Count = n(), .groups = "drop") %>%
  group_by(STATE) %>%
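  slice_max(Event_Count, n = 1, with_ties = FALSE) %>%   # one top event per state
  ungroup() %>%
  slice_max(Event_Count, n = 15, with_ties = FALSE)      # 15 states with the largest top-event counts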

ggplot(top_states, aes(x = Event_Count, y = fct_reorder(STATE, Event_Count), fill = EVENT_TYPE)) +
  geom_col() +
  labs(
    title = "Most Frequent Weather Event in Top 15 U.S. States (2024)",
    x = "Number of Events", y = "State", fill = "Event Type"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.y = element_text(size = 10))

Explanation:
Thunderstorm wind and flash floods were tied as the most common events across the top 15 states, each ranking first in 7 states. Thunderstorm wind dominated the Midwest and Northeast, including New York and Illinois. Flash floods were common in the South and Southeast, including Texas, Georgia, and Florida. Only California saw general flooding as its top event, and Oklahoma stood out for heat.
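The Q2 ranking works in two stages. It first picks the top event type within each state, then keeps the states whose top count is largest. `slice_max()` keeps ties by default, so a state whose top two event types share a count contributes two bars. The sketch uses hypothetical states "A" to "C" and made-up counts to show the tie behaviour and the `with_ties = FALSE` variant.

```r
library(dplyr)
library(tibble)

# Hypothetical counts; state "B" has two event types tied for first place
toy_counts <- tibble(
  STATE       = c("A", "A", "B", "B", "C"),
  EVENT_TYPE  = c("Hail", "Flood", "Hail", "Flood", "Heat"),
  Event_Count = c(9, 4, 6, 6, 3)
)

# Default slice_max() keeps ties, so "B" contributes two rows
tied_rows <- toy_counts %>%
  group_by(STATE) %>%
  slice_max(Event_Count, n = 1) %>%
  ungroup()
print(nrow(tied_rows))

# with_ties = FALSE keeps one top event per state, then the top 2 states overall
top_states_toy <- toy_counts %>%
  group_by(STATE) %>%
  slice_max(Event_Count, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  slice_max(Event_Count, n = 2, with_ties = FALSE)
print(top_states_toy$STATE)
```

With `with_ties = FALSE`, the tied row that comes first in the data is kept, so the choice between tied event types is arbitrary rather than meaningful.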


Q3: Which weather events are most common by month?

event_months <- storm_data %>%
  group_by(BEGIN_MONTH, EVENT_TYPE) %>%
  summarise(Event_Count = n(), .groups = "drop") %>%
  group_by(BEGIN_MONTH) %>%
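  slice_max(Event_Count, n = 1, with_ties = FALSE) %>%
  ungroup()

# Keep the months in calendar order (January at the top) instead of sorting them by count
ggplot(event_months, aes(x = Event_Count, y = fct_rev(BEGIN_MONTH), fill = EVENT_TYPE)) +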
  geom_col() +
  labs(
    title = "Most Common Weather Events by Month",
    x = "Number of Events", y = "Month", fill = "Event Type"
  ) +
  theme_minimal()

Explanation:
Thunderstorm wind was the dominant weather event from April to August, peaking in May. Flash floods were the top events in April, September, and November, while winter weather appeared in December and January. Drought was most prevalent in October, and early spring months featured hail and flooding.
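A seasonal reading depends on the month axis following the calendar. `fct_reorder()` instead sorts levels by a numeric value. The sketch below uses a hypothetical three-month factor (`toy_months`, `month_counts`) to compare the two orderings. Note that ggplot2 draws the first factor level at the bottom of a discrete y axis.

```r
library(forcats)

# Hypothetical top-event counts for three months, stored in calendar order
toy_months   <- factor(c("Jan", "Feb", "Mar"), levels = month.abb[1:3])
month_counts <- c(40, 10, 25)

# fct_reorder() sorts the levels by count, so the axis stops following the calendar
print(levels(fct_reorder(toy_months, month_counts)))

# fct_rev() keeps calendar order but reverses it, which places January at the
# top of a horizontal bar chart
print(levels(fct_rev(toy_months)))
```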


Q4: Which weather events caused the most property and crop damage?

top_damage <- storm_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(Total_Damage = sum(Property_Damage, na.rm = TRUE)) %>%
  arrange(desc(Total_Damage)) %>%
  slice_head(n = 10)

ggplot(top_damage, aes(x = reorder(EVENT_TYPE, Total_Damage), y = Total_Damage)) +
  geom_col(fill = "firebrick") +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Top 10 Weather Events by Property and Crop Damage",
    x = "Event Type", y = "Total Property + Crop Damage ($)"
  ) +
  theme_minimal()

Explanation:
Flash floods caused the highest total economic damage in 2024, with thunderstorm winds and tornadoes following as major contributors to losses. Hail and general flooding also had notable impacts, while lightning, wildfires, and drought caused smaller but still meaningful property and crop damage.
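The combined damage column can go wrong when one of its inputs is missing. In R, `NA` plus a number is `NA`, so a plain `DAMAGE_PROPERTY + DAMAGE_CROPS` makes the whole row NA whenever crop damage is blank. A later `sum(..., na.rm = TRUE)` then drops the row's property damage as well. The two hypothetical flash-flood rows below compare plain addition with `coalesce()`, which treats a missing value as zero.

```r
library(dplyr)
library(tibble)

# Hypothetical events; the second has no crop-damage value recorded
toy_damage <- tibble(
  EVENT_TYPE      = c("Flash Flood", "Flash Flood"),
  DAMAGE_PROPERTY = c(50000, 20000),
  DAMAGE_CROPS    = c(1000, NA)
)

damage_totals <- toy_damage %>%
  mutate(
    plain_sum    = DAMAGE_PROPERTY + DAMAGE_CROPS,  # NA for the second event
    coalesce_sum = coalesce(DAMAGE_PROPERTY, 0) + coalesce(DAMAGE_CROPS, 0)
  ) %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    plain_total    = sum(plain_sum, na.rm = TRUE),  # loses the 20,000 in property damage
    coalesce_total = sum(coalesce_sum)
  )

print(damage_totals)
```

Treating a blank as zero assumes that an unrecorded damage value means no damage. That is a modelling choice, not something the NOAA files guarantee.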