This report analyzes the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database for the year 2025 to identify the most severe weather events and their impacts. The goal is to provide actionable insights for government and municipal managers responsible for disaster preparedness and resource allocation.
The analysis examines four key dimensions: population health impact, geographic distribution, seasonal patterns, and economic damage. The results show that Flash Flood events are the most harmful to population health, accounting for the highest number of combined injuries and fatalities by a large margin.
Geographically, Texas records the highest frequency of severe weather events, led by Flash Floods, Hail, and Thunderstorm Winds. Seasonal trends indicate that winter months are dominated by cold-related events, while spring and summer months see increased activity from storms such as hail, thunderstorms, and flooding.
Finally, Flash Floods also represent the most economically damaging events, causing significantly higher property losses compared to other event types. These findings highlight the importance of prioritizing flood mitigation strategies, seasonal preparedness, and region-specific resource planning.
To begin the analysis, the raw CSV files containing the 2025 storm details, locations, and fatalities were loaded into the R environment using the readr package. These datasets were imported as data frames to enable efficient data manipulation.
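Since the relevant information was distributed across multiple files, the datasets were merged into a single comprehensive dataset using the left_join() function from the dplyr package, based on the common EVENT_ID variable. This combined dataset includes event characteristics, geographic information, and fatality records. Because one event can have several location and fatality records, the joined table holds one row per event-location-fatality combination rather than one row per event.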
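The effect of joining on EVENT_ID when one event has several location and fatality records can be illustrated with a small sketch. The three toy tables below are invented stand-ins for the NOAA files (the LOCATION and FATALITY_ID columns and all values are assumptions made for illustration), so this shows the general behaviour of left_join() rather than the actual 2025 data.

```r
library(dplyr)

# Toy stand-ins for the details, locations, and fatalities tables.
# All values are invented for illustration.
details_toy <- tibble(
  EVENT_ID = c(1, 2),
  EVENT_TYPE = c("Flash Flood", "Hail"),
  DEATHS_DIRECT = c(3, 0)
)
locations_toy <- tibble(
  EVENT_ID = c(1, 1, 2),
  LOCATION = c("A", "B", "C")
)
fatalities_toy <- tibble(
  EVENT_ID = c(1, 1, 1),
  FATALITY_ID = c(101, 102, 103)
)

# Same join pattern as the report. Recent dplyr versions may warn about
# the many-to-many match in the second join.
joined_toy <- details_toy %>%
  left_join(locations_toy, by = "EVENT_ID") %>%
  left_join(fatalities_toy, by = "EVENT_ID")

# Event 1 now appears once per location-fatality pair (2 x 3 = 6 rows),
# so its DEATHS_DIRECT value is repeated on every one of those rows.
rows_after_join <- nrow(joined_toy)
distinct_events <- n_distinct(joined_toy$EVENT_ID)
deaths_over_rows <- sum(joined_toy$DEATHS_DIRECT)
deaths_over_events <- sum(details_toy$DEATHS_DIRECT)
```

Comparing rows_after_join with distinct_events, and deaths_over_rows with deaths_over_events, is a quick way to see how much a row-level sum or count differs from an event-level one.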
Following the data integration, several transformations were performed to support the analysis. To measure the overall impact on population health, total injuries and total deaths were each calculated by summing the direct and indirect values, and a new variable, total_harmed, was created as their sum. Missing values were excluded from the sums with the na.rm = TRUE argument so that a single NA would not turn an entire total into NA.
The dataset was then grouped and summarized by key variables such as EVENT_TYPE, STATE, and MONTH_NAME to identify patterns across different dimensions. To improve clarity and visualization, subsets of the data were selected, including the top 10 most harmful events, the top 5 states with the highest event frequency, and the top 3 most common events per month.
For the economic impact analysis, a custom function was developed to convert alphanumeric property damage values (e.g., K, M, B) into numeric format. This allowed for accurate calculation and comparison of total economic damage across event types.
# 1. Loading necessary libraries
# These libraries are used for data manipulation (dplyr), reading CSV files (readr), creating visualizations (ggplot2), and ordering bars within facets via reorder_within() and scale_x_reordered() (tidytext).
library(dplyr)
library(readr)
library(ggplot2)
library(tidytext)
# 2. Defining the folder path
# This specifies the directory where the NOAA data files are stored.
folder_path <- '/Users/sabreenaaleemnabeela/Desktop/Main Folder/Data Stewardship/Final Project'
# 3. Defining File Paths
# These lines create full file paths for each dataset using the base folder.
details_file <- file.path(folder_path, "StormEvents_details-ftp_v1.0_d2025_c20260323.csv")
fatalities_file <- file.path(folder_path, "StormEvents_fatalities-ftp_v1.0_d2025_c20260323.csv")
locations_file <- file.path(folder_path, "StormEvents_locations-ftp_v1.0_d2025_c20260323.csv")
# 4. Loading the Raw Data
# The read_csv() function loads each CSV file into R as a data frame.
details <- read_csv(details_file)
fatalities <- read_csv(fatalities_file)
locations <- read_csv(locations_file)
# 5. Joining the datasets by EVENT_ID
# The datasets are merged into one using the common EVENT_ID variable.
# This creates a comprehensive dataset with all relevant information.
joined_data <- details %>%
left_join(locations, by = "EVENT_ID") %>%
left_join(fatalities, by = "EVENT_ID")
# 6. Saving the joined dataset
# The combined dataset is saved as a new CSV file for future use.
output_file <- file.path(folder_path, "StormEvents_joined_data.csv")
write_csv(joined_data, output_file)
message("Joined data saved to: ", output_file)
# 7. Previewing the Data
# Displays the first few rows to verify that the data has been loaded and joined correctly.
print(head(joined_data))
## # A tibble: 6 × 70
## BEGIN_YEARMONTH BEGIN_DAY BEGIN_TIME END_YEARMONTH END_DAY END_TIME
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 202503 31 1104 202503 31 1106
## 2 202503 30 1552 202503 30 1555
## 3 202501 5 1800 202501 6 2227
## 4 202501 3 1300 202501 3 1900
## 5 202501 3 1300 202501 3 1900
## 6 202501 3 1300 202501 3 1900
## # ℹ 64 more variables: EPISODE_ID.x <dbl>, EVENT_ID <dbl>, STATE <chr>,
## # STATE_FIPS <dbl>, YEAR <dbl>, MONTH_NAME <chr>, EVENT_TYPE <chr>,
## # CZ_TYPE <chr>, CZ_FIPS <dbl>, CZ_NAME <chr>, WFO <chr>,
## # BEGIN_DATE_TIME <chr>, CZ_TIMEZONE <chr>, END_DATE_TIME <chr>,
## # INJURIES_DIRECT <dbl>, INJURIES_INDIRECT <dbl>, DEATHS_DIRECT <dbl>,
## # DEATHS_INDIRECT <dbl>, DAMAGE_PROPERTY <chr>, DAMAGE_CROPS <chr>,
## # SOURCE <chr>, MAGNITUDE <dbl>, MAGNITUDE_TYPE <chr>, FLOOD_CAUSE <chr>, …
To determine which events are most harmful to population health, the direct and indirect injuries and deaths for each weather event type were aggregated to create a “total harmed” metric.
## Aggregating direct and indirect injuries and deaths
health_impact <- joined_data %>%
group_by(EVENT_TYPE) %>%
summarise(
total_injuries = sum(INJURIES_DIRECT, na.rm = TRUE) + sum(INJURIES_INDIRECT, na.rm = TRUE),
total_deaths = sum(DEATHS_DIRECT, na.rm = TRUE) + sum(DEATHS_INDIRECT, na.rm = TRUE),
total_harmed = total_injuries + total_deaths,
.groups = "drop"
) %>%
arrange(desc(total_harmed))
head(health_impact, 10)
## # A tibble: 10 × 4
## EVENT_TYPE total_injuries total_deaths total_harmed
## <chr> <dbl> <dbl> <dbl>
## 1 Flash Flood 81 37430 37511
## 2 Wildfire 419 873 1292
## 3 Tornado 688 441 1129
## 4 Excessive Heat 326 444 770
## 5 Dust Storm 417 91 508
## 6 Heat 51 429 480
## 7 Thunderstorm Wind 158 105 263
## 8 Winter Weather 131 47 178
## 9 Lightning 104 27 131
## 10 Winter Storm 33 65 98
## Isolating the top 10 for the plot
top10_health <- health_impact %>% slice_max(total_harmed, n = 10)
## Plot 1: Health Impact
ggplot(top10_health, aes(x = reorder(EVENT_TYPE, total_harmed), y = total_harmed)) +
geom_bar(stat = "identity", fill = "tomato") +
coord_flip() +
labs(
title = "Top 10 Most Harmful Event Types (Health Impact)",
x = "Event Type",
y = "Total Harmed (Injuries + Deaths)"
) +
theme_minimal()
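Figure 1: A horizontal bar chart displaying the top 10 weather events that caused the highest combined number of injuries and fatalities in 2025. This highlights which events pose the greatest combined (direct and indirect) threat to human life.
Flash Flood events are by far the most harmful to population health in the United States, with a total of 37,511 combined injuries and deaths, nearly all of them deaths (37,430). This is about 29 times the total for the second-ranked event type, Wildfire (1,292), indicating that flash floods pose the greatest threat to human life. Because the sums are taken over the joined table, where an event's casualty counts repeat for each of its location and fatality records, the absolute figures should be read as upper bounds rather than exact casualty counts.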
Other events such as wildfires and tornadoes also contribute to health impacts, but their effects are comparatively much smaller.
This suggests that emergency preparedness efforts should prioritize early warning systems, evacuation planning, and infrastructure improvements specifically for flash flood events.
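One way to cross-check the health totals at the event level is to keep a single row per EVENT_ID before summing, since left_join() repeats an event's injury and death columns for every matching location and fatality record. The sketch below uses a small invented table (joined_toy and its values are hypothetical) and is not the computation behind the figures reported above.

```r
library(dplyr)

# Hypothetical joined rows: event 1 appears three times after the joins.
joined_toy <- tibble(
  EVENT_ID = c(1, 1, 1, 2),
  EVENT_TYPE = c("Flash Flood", "Flash Flood", "Flash Flood", "Tornado"),
  INJURIES_DIRECT = c(2, 2, 2, 5),
  INJURIES_INDIRECT = c(0, 0, 0, NA),
  DEATHS_DIRECT = c(1, 1, 1, 0),
  DEATHS_INDIRECT = c(0, 0, 0, 0)
)

health_by_event <- joined_toy %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%  # keep one row per event
  group_by(EVENT_TYPE) %>%
  summarise(
    total_injuries = sum(INJURIES_DIRECT, INJURIES_INDIRECT, na.rm = TRUE),
    total_deaths = sum(DEATHS_DIRECT, DEATHS_INDIRECT, na.rm = TRUE),
    total_harmed = total_injuries + total_deaths,
    .groups = "drop"
  ) %>%
  arrange(desc(total_harmed))
```

In this toy table the Flash Flood event contributes 3 harmed people once, instead of 9 across its three repeated rows. An alternative with the same effect would be to summarise the details table directly, before any join.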
The data was analyzed to calculate which five states experienced the highest total volume of severe weather events in 2025. The dataset was then filtered to these top states to isolate and observe the top five most frequent specific events within each of them.
## Counting the frequency of events by state
event_counts <- joined_data %>%
group_by(STATE, EVENT_TYPE) %>%
summarise(total_events = n(), .groups = "drop") %>%
arrange(desc(total_events))
head(event_counts, 10)
## # A tibble: 10 × 3
## STATE EVENT_TYPE total_events
## <chr> <chr> <int>
## 1 TEXAS Flash Flood 3337
## 2 VIRGINIA Flash Flood 2140
## 3 ALABAMA Thunderstorm Wind 1533
## 4 TEXAS Hail 1520
## 5 CALIFORNIA Flood 1384
## 6 PENNSYLVANIA Flash Flood 1379
## 7 TEXAS Thunderstorm Wind 1243
## 8 VIRGINIA Thunderstorm Wind 1115
## 9 ARIZONA Flash Flood 1076
## 10 WEST VIRGINIA Flash Flood 1055
## Finding the top 5 states with the most overall events
top_states <- joined_data %>%
count(STATE) %>%
arrange(desc(n)) %>%
slice_max(n, n = 5) %>%
pull(STATE)
## Filtering to the top 5 states, AND finding the top 5 events WITHIN each state
top_state_events <- event_counts %>%
filter(STATE %in% top_states) %>%
group_by(STATE) %>%
slice_max(total_events, n = 5) %>%
ungroup()
## Plot 2: State Frequency
ggplot(top_state_events, aes(x = reorder_within(EVENT_TYPE, total_events, STATE), y = total_events, fill = STATE)) +
geom_col(show.legend = FALSE) +
# Adding the numeric annotations to the end of each bar
geom_text(aes(label = scales::comma(total_events)), hjust = -0.15, size = 2, color = "black") +
# Cleaning up the axis labels and expanding the scale so annotations don't get cut off
scale_x_reordered() +
scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.35))) +
# Letting facet_wrap() choose the layout so the 5 state panels wrap naturally into a grid
facet_wrap(~ STATE, scales = "free_y") +
coord_flip() +
labs(
title = "Top 5 Weather Events in the 5 Most Impacted States",
x = "Event Type",
y = "Number of Events"
) +
theme_minimal(base_size = 12) +
theme(
legend.position = "none",
plot.title = element_text(face = "bold", size = 14),
# Bold size-8 facet labels, with top and bottom padding (margin) to make the grey strip taller
strip.text = element_text(face = "bold", size = 8, margin = margin(t = 6, r = 0, b = 6, l = 0)),
strip.background = element_rect(fill = "gray90", color = NA),
panel.grid.major.y = element_blank(),
# Angles the numbers on the bottom so they don't overlap
axis.text.x = element_text(angle = 45, hjust = 1)
)
Figure 2: A faceted horizontal bar chart showing the top 5 most frequent weather event types within each of the five states that recorded the highest total number of weather events in 2025.
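The per-state bar ordering in a faceted chart like Figure 2 depends on reorder_within() and scale_x_reordered() from tidytext. The following minimal sketch, built on an invented four-row table (state_toy and its counts are assumptions), shows what reorder_within() produces: a factor whose labels combine the event type and the state, so each panel can be sorted on its own.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Invented counts in which the ranking differs between the two states.
state_toy <- tibble(
  STATE = c("TEXAS", "TEXAS", "VIRGINIA", "VIRGINIA"),
  EVENT_TYPE = c("Hail", "Flash Flood", "Flash Flood", "Hail"),
  total_events = c(30, 20, 25, 5)
)

# reorder_within() pastes the event type and the state together
# (separator "___") and orders the resulting factor by total_events.
ordered_labels <- reorder_within(
  state_toy$EVENT_TYPE, state_toy$total_events, state_toy$STATE
)

# scale_x_reordered() strips the "___STATE" suffix from the axis labels.
p <- ggplot(
  state_toy,
  aes(x = reorder_within(EVENT_TYPE, total_events, STATE), y = total_events)
) +
  geom_col() +
  scale_x_reordered() +
  facet_wrap(~ STATE, scales = "free_y") +
  coord_flip()
```

With a plain reorder() call, "Hail" and "Flash Flood" would each get a single position shared by all panels, so at most one state's bars could be guaranteed to appear in descending order.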
The analysis shows that Texas experiences the highest frequency of severe weather events, led by Flash Floods (3,337 records), Hail (1,520), and Thunderstorm Winds (1,243).
Flash Floods are consistently among the most common event types across multiple states, including Texas, Virginia, Pennsylvania, and West Virginia.
This pattern indicates that certain regions are more prone to specific types of weather events, and resource allocation should be tailored accordingly. For example, flood prevention and water management strategies are especially important in these high-risk states.
To understand seasonal patterns, the dataset was grouped by month and event type. The five most frequent event types per month are listed in the table below, and the top three per month were then isolated for the chart to maintain visual readability.
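The two ideas in this step, picking the top event types within each month and keeping the months in calendar order, can be sketched on a small invented table (monthly_toy and its counts are hypothetical). The sketch relies on R's built-in month.name vector, which lists the month names from January to December.

```r
library(dplyr)

# Invented monthly counts, deliberately out of calendar order.
monthly_toy <- tibble(
  MONTH_NAME = c("March", "January", "January", "March", "January", "March"),
  EVENT_TYPE = c("Hail", "Winter Storm", "Heavy Snow", "Tornado", "Flood", "High Wind"),
  total_events = c(40, 90, 60, 10, 5, 70)
)

top2_by_month <- monthly_toy %>%
  group_by(MONTH_NAME) %>%
  slice_max(total_events, n = 2) %>%  # ties, if any, are kept by default
  ungroup() %>%
  # match() returns each month's position in month.name (January = 1),
  # so sorting on it gives calendar order instead of alphabetical order.
  arrange(match(MONTH_NAME, month.name), desc(total_events))

# For plotting, a factor with month.name as levels keeps facets in order.
top2_by_month$MONTH_NAME <- factor(top2_by_month$MONTH_NAME, levels = month.name)
```

Because slice_max() keeps ties by default, a month can return more than n rows when two event types share the same count; passing with_ties = FALSE would force exactly n rows.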
## Counting events by month
monthly_counts <- joined_data %>%
group_by(MONTH_NAME, EVENT_TYPE) %>%
summarise(total_events = n(), .groups = "drop") %>%
group_by(MONTH_NAME) %>%
slice_max(total_events, n = 5) %>%
ungroup() %>%
arrange(match(MONTH_NAME, month.name), desc(total_events))
print(monthly_counts, n = 60)
## # A tibble: 60 × 3
## MONTH_NAME EVENT_TYPE total_events
## <chr> <chr> <int>
## 1 January Winter Storm 1122
## 2 January Winter Weather 1007
## 3 January Extreme Cold/Wind Chill 816
## 4 January Cold/Wind Chill 643
## 5 January Heavy Snow 602
## 6 February Flood 1382
## 7 February Winter Weather 1372
## 8 February Extreme Cold/Wind Chill 867
## 9 February Flash Flood 772
## 10 February Winter Storm 698
## 11 March Thunderstorm Wind 2401
## 12 March High Wind 1670
## 13 March Hail 1087
## 14 March Strong Wind 342
## 15 March Tornado 327
## 16 April Thunderstorm Wind 2801
## 17 April Flash Flood 1926
## 18 April Hail 1840
## 19 April Flood 971
## 20 April Tornado 588
## 21 May Thunderstorm Wind 3969
## 22 May Hail 2858
## 23 May Flash Flood 2165
## 24 May Flood 809
## 25 May Tornado 566
## 26 June Thunderstorm Wind 5416
## 27 June Flash Flood 3783
## 28 June Hail 1426
## 29 June Flood 812
## 30 June Heat 563
## 31 July Flash Flood 6608
## 32 July Thunderstorm Wind 3855
## 33 July Heat 1423
## 34 July Excessive Heat 817
## 35 July Flood 810
## 36 August Flash Flood 2230
## 37 August Thunderstorm Wind 1682
## 38 August Heat 906
## 39 August Flood 621
## 40 August Hail 500
## 41 September Flash Flood 1379
## 42 September Thunderstorm Wind 807
## 43 September Hail 552
## 44 September Drought 402
## 45 September Flood 280
## 46 October Flash Flood 628
## 47 October Drought 432
## 48 October Flood 233
## 49 October Thunderstorm Wind 227
## 50 October Coastal Flood 186
## 51 November Drought 425
## 52 November Winter Storm 357
## 53 November Winter Weather 336
## 54 November Flood 304
## 55 November High Wind 240
## 56 December Winter Weather 1276
## 57 December High Wind 1061
## 58 December Flood 561
## 59 December Winter Storm 475
## 60 December Drought 325
## Filtering to the top 3 events per month for readability
top_monthly_events <- monthly_counts %>%
group_by(MONTH_NAME) %>%
slice_max(total_events, n = 3) %>%
ungroup()
## Ensuring chronological order for the facet wrap
top_monthly_events$MONTH_NAME <- factor(top_monthly_events$MONTH_NAME, levels = month.name)
## Plot 3: Monthly Trends
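# reorder_within() and scale_x_reordered() (tidytext) sort the bars separately inside each month panel
ggplot(top_monthly_events, aes(x = reorder_within(EVENT_TYPE, total_events, MONTH_NAME), y = total_events, fill = MONTH_NAME)) +
geom_bar(stat = "identity") +
scale_x_reordered() +
facet_wrap(~ MONTH_NAME, scales = "free_y", ncol = 3) +
coord_flip() +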
labs(
title = "Top 3 Most Common Event Types by Month",
x = "Event Type",
y = "Number of Events"
) +
theme_minimal() +
theme(legend.position = "none", axis.text.y = element_text(size = 7))
Figure 3: A faceted grid chart illustrating the seasonality of severe weather, displaying the top three most frequent event types for each month of the year chronologically.
The results reveal strong seasonal patterns in severe weather events. January is dominated by winter and cold-related events, including Winter Storm, Winter Weather, and Extreme Cold/Wind Chill. February remains largely cold-driven, although Flood records (1,382) narrowly exceed Winter Weather (1,372).
In contrast, spring and summer months (April through August) experience a higher frequency of Thunderstorm Winds, Flash Floods, and Hail events.
Activity peaks in early and mid summer: June has the most Thunderstorm Wind records (5,416) and July the most Flash Flood records (6,608), after which counts fall sharply in August.
These findings suggest that emergency preparedness strategies should be adjusted seasonally, with different types of risks prioritized throughout the year.
Data Cleaning: The raw data uses letters (K, M, B) to represent thousands, millions, and billions. The custom convert_damage function strips out these letters and applies the correct mathematical multiplier so R can calculate actual dollar amounts.
Data Aggregation: Using dplyr, the code groups the newly cleaned data by EVENT_TYPE. It then uses summarise to add up the total property damage for each weather category and sorts them from highest to lowest.
Visualization: The slice_max command isolates just the top 10 most expensive events. Finally, ggplot builds a bar chart, using coord_flip() to turn it horizontal so the long weather event names are easy to read.
## Custom function to convert alphanumeric damage abbreviations (K, M, B) to actual numbers
convert_damage <- function(x) {
x <- toupper(x)
as.numeric(gsub("[KMB]", "", x)) *
ifelse(grepl("K", x), 1e3,
ifelse(grepl("M", x), 1e6,
ifelse(grepl("B", x), 1e9, 1)))
}
## Applying the conversion to the dataset
joined_data$damage_clean <- convert_damage(joined_data$DAMAGE_PROPERTY)
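A quick sanity check of the conversion on a few hand-written strings can help confirm that the suffix handling behaves as intended. The sample values below are invented for illustration, and the function is repeated from the report only so that the snippet runs on its own.

```r
# Copy of the report's conversion function, repeated so the snippet is self-contained.
convert_damage <- function(x) {
  x <- toupper(x)
  as.numeric(gsub("[KMB]", "", x)) *
    ifelse(grepl("K", x), 1e3,
      ifelse(grepl("M", x), 1e6,
        ifelse(grepl("B", x), 1e9, 1)))
}

# Invented sample inputs: thousands, millions, billions, zero,
# a plain number without a suffix, and a missing value.
sample_damage <- c("10.00K", "2.5M", "1.5B", "0.00K", "500", NA)
converted <- convert_damage(sample_damage)

# Side-by-side view of raw strings and converted dollar amounts.
check_table <- data.frame(raw = sample_damage, dollars = converted)
print(check_table)
```

Missing inputs stay NA after conversion, which is why the later sum() call needs na.rm = TRUE.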
## Summarizing the clean damage data
damage_summary <- joined_data %>%
group_by(EVENT_TYPE) %>%
summarise(total_damage = sum(damage_clean, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_damage))
head(damage_summary, 10)
## # A tibble: 10 × 2
## EVENT_TYPE total_damage
## <chr> <dbl>
## 1 Flash Flood 78285317100
## 2 Tornado 13264707500
## 3 Debris Flow 902091800
## 4 Wildfire 788942110
## 5 Flood 313959150
## 6 Thunderstorm Wind 260048450
## 7 Hail 71587500
## 8 Drought 37133250
## 9 Lightning 22620150
## 10 High Wind 12212600
## Isolating the top 10 for the plot
top10_damage <- damage_summary %>% slice_max(total_damage, n = 10)
## Plot 4: Economic Impact
ggplot(top10_damage, aes(x = reorder(EVENT_TYPE, total_damage), y = total_damage)) +
geom_bar(stat = "identity", fill = "darkblue") +
coord_flip() +
labs(
title = "Top 10 Event Types by Property Damage",
x = "Event Type",
y = "Total Property Damage (USD)"
) +
theme_minimal()
Figure 4: A horizontal bar chart detailing the top 10 weather events that resulted in the highest total property damage (in USD) across the United States in 2025.
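Flash Floods are also the most economically damaging event type, causing approximately USD 78.3 billion in property damage. This is nearly six times the total for the next most damaging event type, Tornadoes, which account for about USD 13.3 billion. Because these totals are summed over the joined rows, an event's damage is counted once for each of its location and fatality records, so the dollar figures are best treated as upper-bound estimates.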
The large gap highlights the severe financial consequences associated with flooding events, likely due to widespread infrastructure damage and property loss.
These results emphasize the importance of investing in flood mitigation strategies, such as improved drainage systems, flood barriers, and land-use planning, to reduce future economic losses.