1 Synopsis

This report analyzes the NOAA Storm Events database for 2025 to identify the most harmful weather events, their geographic distribution, and their seasonal patterns across the United States. The analysis integrates three NOAA datasets—event details, fatalities, and locations—using a common event identifier.

Public health impact is measured as total deaths plus injuries, and economic impact as property plus crop damage estimates. Excessive heat and tornadoes caused the largest combined number of deaths and injuries, while flash floods caused the most deaths. Thunderstorm wind is by far the most frequent event type and leads the counts in many states. Seasonality is pronounced: winter events dominate the early months, and convective storms (thunderstorm wind, hail, flash flood) peak from late spring through midsummer.

The analysis also shows that a few event types, chiefly tornadoes, flash floods, and wildfires, account for most of the estimated economic damage. These findings highlight the importance of considering multiple dimensions of risk (frequency, severity, and timing) when planning for severe weather events.

2 Data Processing

This section describes how the raw NOAA Storm Events files were loaded, joined, cleaned, transformed, and prepared for analysis. The project uses the 2025 files because the assignment asks for the most recent matching details, fatalities, and locations files.

Three transformations matter most. Direct and indirect deaths and injuries are combined to capture total health impact. Damage values, which NOAA stores as text, are converted to numeric dollars so that event types can be compared. The locations and fatalities files are summarized to one row per event before joining, which prevents double counting when an event has several location or fatality records.

2.1 Load required packages

The packages below are used for reading CSV files, manipulating data, handling strings, formatting tables, and creating visualizations.

library(dplyr)
library(readr)
library(ggplot2)
library(stringr)
library(forcats)
library(scales)
library(knitr)
library(tidyr)

2.2 Define file paths and load raw CSV files

The analysis begins with the raw CSV files. The three files should be saved in the same folder as this R Markdown document.

folder_path <- "."

details_file <- file.path(folder_path, "StormEvents_details-ftp_v1.0_d2025_c20260323.csv")
fatalities_file <- file.path(folder_path, "StormEvents_fatalities-ftp_v1.0_d2025_c20260323.csv")
locations_file <- file.path(folder_path, "StormEvents_locations-ftp_v1.0_d2025_c20260323.csv")

details <- read_csv(details_file, show_col_types = FALSE)
fatalities <- read_csv(fatalities_file, show_col_types = FALSE)
locations <- read_csv(locations_file, show_col_types = FALSE)

dim(details)
## [1] 72241    51
dim(fatalities)
## [1] 895  10
dim(locations)
## [1] 51870    11

The details file contains the main event-level records, including state, event type, date, injuries, deaths, and damage estimates. The fatalities file contains fatality-level records where available. The locations file contains additional geographic information for events.
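Because the later joins depend on EVENT_ID identifying exactly one row in the details file, and possibly several rows in the other two files, it may be worth confirming that before joining. The sketch below uses small invented tibbles (toy_details and toy_locations are hypothetical stand-ins, not NOAA records); applying the same count() pattern to details, fatalities, and locations would be the real check.

```r
library(dplyr)

# Invented stand-ins for the real files; only EVENT_ID matters here.
toy_details <- tibble(EVENT_ID = c(101, 102, 103))
toy_locations <- tibble(
  EVENT_ID = c(101, 101, 103),
  LOCATION = c("A", "B", "C")
)

# Number of EVENT_ID values that occur more than once in each table.
dup_details <- toy_details %>% count(EVENT_ID) %>% filter(n > 1) %>% nrow()
dup_locations <- toy_locations %>% count(EVENT_ID) %>% filter(n > 1) %>% nrow()

dup_details    # 0: one row per event, so this is the event-level table
dup_locations  # 1: event 101 has two location records
```

A result of zero for the details file supports treating it as the event-level table; a non-zero result for the other files is expected.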

2.3 Inspect key columns

names(details)
##  [1] "BEGIN_YEARMONTH"    "BEGIN_DAY"          "BEGIN_TIME"        
##  [4] "END_YEARMONTH"      "END_DAY"            "END_TIME"          
##  [7] "EPISODE_ID"         "EVENT_ID"           "STATE"             
## [10] "STATE_FIPS"         "YEAR"               "MONTH_NAME"        
## [13] "EVENT_TYPE"         "CZ_TYPE"            "CZ_FIPS"           
## [16] "CZ_NAME"            "WFO"                "BEGIN_DATE_TIME"   
## [19] "CZ_TIMEZONE"        "END_DATE_TIME"      "INJURIES_DIRECT"   
## [22] "INJURIES_INDIRECT"  "DEATHS_DIRECT"      "DEATHS_INDIRECT"   
## [25] "DAMAGE_PROPERTY"    "DAMAGE_CROPS"       "SOURCE"            
## [28] "MAGNITUDE"          "MAGNITUDE_TYPE"     "FLOOD_CAUSE"       
## [31] "CATEGORY"           "TOR_F_SCALE"        "TOR_LENGTH"        
## [34] "TOR_WIDTH"          "TOR_OTHER_WFO"      "TOR_OTHER_CZ_STATE"
## [37] "TOR_OTHER_CZ_FIPS"  "TOR_OTHER_CZ_NAME"  "BEGIN_RANGE"       
## [40] "BEGIN_AZIMUTH"      "BEGIN_LOCATION"     "END_RANGE"         
## [43] "END_AZIMUTH"        "END_LOCATION"       "BEGIN_LAT"         
## [46] "BEGIN_LON"          "END_LAT"            "END_LON"           
## [49] "EPISODE_NARRATIVE"  "EVENT_NARRATIVE"    "DATA_SOURCE"
names(fatalities)
##  [1] "FAT_YEARMONTH"     "FAT_DAY"           "FAT_TIME"         
##  [4] "FATALITY_ID"       "EVENT_ID"          "FATALITY_TYPE"    
##  [7] "FATALITY_DATE"     "FATALITY_AGE"      "FATALITY_SEX"     
## [10] "FATALITY_LOCATION"
names(locations)
##  [1] "YEARMONTH"      "EPISODE_ID"     "EVENT_ID"       "LOCATION_INDEX"
##  [5] "RANGE"          "AZIMUTH"        "LOCATION"       "LATITUDE"      
##  [9] "LONGITUDE"      "LAT2"           "LON2"

2.4 Join the three NOAA files

The assignment requires the three NOAA files to be joined by EVENT_ID. The code below first creates that direct join and writes it to StormEvents_joined_data.csv. Because an event can have several location records and several fatality records, the direct join repeats event rows (94,364 rows for 72,241 events). For analysis, the locations and fatalities files are therefore summarized to one row per event before joining, which keeps each event on a single row and prevents double counting in event-frequency summaries.

StormEvents_joined_data <- details %>%
  left_join(locations, by = "EVENT_ID") %>%
  left_join(fatalities, by = "EVENT_ID")

write_csv(StormEvents_joined_data, "StormEvents_joined_data.csv")

locations_by_event <- locations %>%
  group_by(EVENT_ID) %>%
  summarise(
    location_record_count = n(),
    location_names = paste(sort(unique(LOCATION)), collapse = "; "),
    .groups = "drop"
  )

fatalities_by_event <- fatalities %>%
  group_by(EVENT_ID) %>%
  summarise(
    fatality_record_count = n(),
    fatality_types = paste(sort(unique(FATALITY_TYPE)), collapse = "; "),
    .groups = "drop"
  )

storm_event_level <- details %>%
  left_join(locations_by_event, by = "EVENT_ID") %>%
  left_join(fatalities_by_event, by = "EVENT_ID")

dim(StormEvents_joined_data)
## [1] 94364    70
dim(storm_event_level)
## [1] 72241    55
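The larger row count of the direct join comes from events that match more than one location or fatality record. A minimal sketch with invented data (the toy_* tibbles are hypothetical, not NOAA records) shows the mechanism and why sums computed on the direct join would be inflated.

```r
library(dplyr)

# Invented data: event 1 has three location records, event 2 has none.
toy_details <- tibble(EVENT_ID = c(1, 2), DEATHS_DIRECT = c(2, 0))
toy_locations <- tibble(EVENT_ID = c(1, 1, 1), LOCATION = c("A", "B", "C"))

# Direct join: event 1 is repeated once per location record.
direct <- toy_details %>% left_join(toy_locations, by = "EVENT_ID")

# Event-level join: collapse locations to one row per event first.
toy_locations_by_event <- toy_locations %>%
  group_by(EVENT_ID) %>%
  summarise(location_record_count = n(), .groups = "drop")
event_level <- toy_details %>%
  left_join(toy_locations_by_event, by = "EVENT_ID")

nrow(direct)                    # 4 rows for 2 events
sum(direct$DEATHS_DIRECT)       # 6: the 2 deaths are counted three times
nrow(event_level)               # 2 rows, one per event
sum(event_level$DEATHS_DIRECT)  # 2
```

This is why the summaries in this report are computed from storm_event_level rather than from StormEvents_joined_data.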

2.5 Create analysis variables

Several new variables are created to support the analysis:

  • total_deaths: direct deaths plus indirect deaths.
  • total_injuries: direct injuries plus indirect injuries.
  • total_health_harm: total deaths plus total injuries.
  • property_damage_value: property damage converted from NOAA abbreviations such as K, M, and B into numeric dollars.
  • crop_damage_value: crop damage converted into numeric dollars.
  • total_damage_value: property damage plus crop damage.
  • event_month: month name ordered from January through December.

These transformations are necessary because the raw files separate direct and indirect health outcomes and store damage estimates as text values.
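The ordering of event_month matters because month names stored as plain text sort alphabetically. A small base R illustration, using an invented vector of month names, shows the difference that factor(levels = month.name) makes.

```r
# Invented sample of month names, deliberately out of order.
months_seen <- c("March", "January", "December", "June")

# Plain character vectors sort alphabetically.
alphabetical <- sort(months_seen)

# A factor with levels = month.name sorts in calendar order.
chronological <- as.character(sort(factor(months_seen, levels = month.name)))

alphabetical   # "December" "January" "June" "March"
chronological  # "January" "March" "June" "December"
```

Tables and plot axes built from the factor follow the calendar order in the same way.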

parse_damage <- function(x) {
  x_chr <- as.character(x)
  x_chr <- str_trim(x_chr)

  value <- parse_number(x_chr)

  multiplier <- case_when(
    str_detect(str_to_upper(x_chr), "K") ~ 1e3,
    str_detect(str_to_upper(x_chr), "M") ~ 1e6,
    str_detect(str_to_upper(x_chr), "B") ~ 1e9,
    str_detect(str_to_upper(x_chr), "T") ~ 1e12,
    TRUE ~ 1
  )

  value * multiplier
}
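parse_damage() looks for the multiplier letter anywhere in the string, which works for values such as 1.00K. As an illustration only, a stricter variant could read the multiplier from the final character. The helper name parse_damage_suffix is hypothetical and is not used elsewhere in this report; the input strings are invented examples in NOAA's style.

```r
library(stringr)

# Hypothetical helper: takes the multiplier from the last character only,
# so a stray letter elsewhere in the value cannot trigger a match.
parse_damage_suffix <- function(x) {
  x_chr <- str_to_upper(str_trim(as.character(x)))
  suffix <- str_sub(x_chr, -1, -1)
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9, T = 1e12)[suffix]
  multiplier[is.na(multiplier)] <- 1  # no recognised suffix: plain dollars
  value <- suppressWarnings(as.numeric(str_remove(x_chr, "[KMBT]$")))
  unname(value * multiplier)
}

example_values <- c("1.00K", "2.5M", "0.00K", "3B", NA, "500")
parsed <- parse_damage_suffix(example_values)
parsed  # 1000, 2500000, 0, 3e+09, NA, 500
```

Missing inputs stay missing here, as they do in parse_damage(); the decision to count them as zero is made later, when total_damage_value is built.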

storm_clean <- storm_event_level %>%
  mutate(
    total_deaths = coalesce(DEATHS_DIRECT, 0) + coalesce(DEATHS_INDIRECT, 0),
    total_injuries = coalesce(INJURIES_DIRECT, 0) + coalesce(INJURIES_INDIRECT, 0),
    total_health_harm = total_deaths + total_injuries,
    property_damage_value = parse_damage(DAMAGE_PROPERTY),
    crop_damage_value = parse_damage(DAMAGE_CROPS),
    total_damage_value = coalesce(property_damage_value, 0) + coalesce(crop_damage_value, 0),
    event_month = factor(MONTH_NAME, levels = month.name),
    EVENT_TYPE = str_to_title(EVENT_TYPE),
    STATE = str_to_title(STATE)
  )

glimpse(storm_clean)
## Rows: 72,241
## Columns: 62
## $ BEGIN_YEARMONTH       <dbl> 202503, 202503, 202501, 202501, 202501, 202501, …
## $ BEGIN_DAY             <dbl> 31, 30, 5, 3, 3, 3, 3, 3, 3, 3, 19, 13, 13, 13, …
## $ BEGIN_TIME            <dbl> 1104, 1552, 1800, 1300, 1300, 1300, 1547, 1527, …
## $ END_YEARMONTH         <dbl> 202503, 202503, 202501, 202501, 202501, 202501, …
## $ END_DAY               <dbl> 31, 30, 6, 3, 3, 3, 3, 3, 3, 3, 19, 13, 13, 13, …
## $ END_TIME              <dbl> 1106, 1555, 2227, 1900, 1900, 1900, 1619, 1619, …
## $ EPISODE_ID            <dbl> 201366, 200337, 197733, 197761, 197761, 197761, …
## $ EVENT_ID              <dbl> 1252415, 1241136, 1222851, 1223112, 1223113, 122…
## $ STATE                 <chr> "Georgia", "Michigan", "Virginia", "Maryland", "…
## $ STATE_FIPS            <dbl> 13, 26, 51, 24, 24, 24, 24, 51, 24, 24, 27, 27, …
## $ YEAR                  <dbl> 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025, …
## $ MONTH_NAME            <chr> "March", "March", "January", "January", "January…
## $ EVENT_TYPE            <chr> "Thunderstorm Wind", "Tornado", "Winter Storm", …
## $ CZ_TYPE               <chr> "C", "C", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z"…
## $ CZ_FIPS               <dbl> 45, 27, 56, 506, 504, 503, 14, 53, 5, 505, 89, 7…
## $ CZ_NAME               <chr> "CARROLL", "CASS", "SPOTSYLVANIA", "CENTRAL AND …
## $ WFO                   <chr> "FFC", "IWX", "LWX", "LWX", "LWX", "LWX", "LWX",…
## $ BEGIN_DATE_TIME       <chr> "31-MAR-25 11:04:00", "30-MAR-25 15:52:00", "05-…
## $ CZ_TIMEZONE           <chr> "EST-5", "EST-5", "EST-5", "EST-5", "EST-5", "ES…
## $ END_DATE_TIME         <chr> "31-MAR-25 11:06:00", "30-MAR-25 15:55:00", "06-…
## $ INJURIES_DIRECT       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ INJURIES_INDIRECT     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DEATHS_DIRECT         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DEATHS_INDIRECT       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DAMAGE_PROPERTY       <chr> "1.00K", "100.00K", NA, NA, NA, NA, "0.00K", NA,…
## $ DAMAGE_CROPS          <chr> NA, "0.00K", NA, NA, NA, NA, "0.00K", NA, NA, NA…
## $ SOURCE                <chr> "Emergency Manager", "NWS Storm Survey", "Traine…
## $ MAGNITUDE             <dbl> 52.0, NA, NA, NA, NA, NA, NA, NA, NA, NA, 38.0, …
## $ MAGNITUDE_TYPE        <chr> "EG", NA, NA, NA, NA, NA, NA, NA, NA, NA, "MS", …
## $ FLOOD_CAUSE           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CATEGORY              <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ TOR_F_SCALE           <chr> NA, "EF1", NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ TOR_LENGTH            <dbl> NA, 2.59, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ TOR_WIDTH             <dbl> NA, 100, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_OTHER_WFO         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ TOR_OTHER_CZ_STATE    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ TOR_OTHER_CZ_FIPS     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ TOR_OTHER_CZ_NAME     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ BEGIN_RANGE           <dbl> 2.22, 1.24, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ BEGIN_AZIMUTH         <chr> "W", "SW", NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ BEGIN_LOCATION        <chr> "TYUS", "EDWARDSBURG", NA, NA, NA, NA, NA, NA, N…
## $ END_RANGE             <dbl> 2.22, 1.47, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ END_AZIMUTH           <chr> "W", "NNE", NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ END_LOCATION          <chr> "TYUS", "EDWARDSBURG", NA, NA, NA, NA, NA, NA, N…
## $ BEGIN_LAT             <dbl> 33.4757, 41.7900, NA, NA, NA, NA, NA, NA, NA, NA…
## $ BEGIN_LON             <dbl> -85.238, -86.100, NA, NA, NA, NA, NA, NA, NA, NA…
## $ END_LAT               <dbl> 33.4757, 41.8200, NA, NA, NA, NA, NA, NA, NA, NA…
## $ END_LON               <dbl> -85.238, -86.070, NA, NA, NA, NA, NA, NA, NA, NA…
## $ EPISODE_NARRATIVE     <chr> "A cold-front initiated a line of thunderstorms …
## $ EVENT_NARRATIVE       <chr> "Tree down at the intersection of highway 5 and …
## $ DATA_SOURCE           <chr> "CSV", "CSV", "CSV", "CSV", "CSV", "CSV", "CSV",…
## $ location_record_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ location_names        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fatality_record_count <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ fatality_types        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ total_deaths          <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ total_injuries        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ total_health_harm     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ property_damage_value <dbl> 1e+03, 1e+05, NA, NA, NA, NA, 0e+00, NA, NA, NA,…
## $ crop_damage_value     <dbl> NA, 0, NA, NA, NA, NA, 0, NA, NA, NA, 0, 0, 0, 0…
## $ total_damage_value    <dbl> 1e+03, 1e+05, 0e+00, 0e+00, 0e+00, 0e+00, 0e+00,…
## $ event_month           <fct> March, March, January, January, January, January…

2.6 Missing value check

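The missing value summary focuses on the variables used in the analysis. Injury and death counts are wrapped in coalesce() so that a missing count would be treated as zero, on the reasoning that an absent count means no injury or death was reported for that event. In the 2025 file these four fields contain no missing values, so that step is purely defensive. Damage is different: property_damage_value is missing for 16,399 events and crop_damage_value for 17,146. These are treated as zero in total_damage_value, so the reported damage totals exclude any damage that was never recorded.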

missing_summary <- storm_clean %>%
  select(
    EVENT_ID, STATE, EVENT_TYPE, event_month,
    DEATHS_DIRECT, DEATHS_INDIRECT,
    INJURIES_DIRECT, INJURIES_INDIRECT,
    total_deaths, total_injuries, total_health_harm,
    property_damage_value, crop_damage_value, total_damage_value
  ) %>%
  summarise(across(everything(), ~sum(is.na(.)))) %>%
  pivot_longer(
    cols = everything(),
    names_to = "variable",
    values_to = "missing_count"
  ) %>%
  arrange(desc(missing_count))

kable(missing_summary, caption = "Missing values in key analysis variables")
Missing values in key analysis variables

variable               missing_count
crop_damage_value      17146
property_damage_value  16399
EVENT_ID               0
STATE                  0
EVENT_TYPE             0
event_month            0
DEATHS_DIRECT          0
DEATHS_INDIRECT        0
INJURIES_DIRECT        0
INJURIES_INDIRECT      0
total_deaths           0
total_injuries         0
total_health_harm      0
total_damage_value     0
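The reason total_damage_value has no missing values while its two components do is the coalesce() call applied before the addition. A short sketch with invented numbers shows what would happen without it.

```r
library(dplyr)

# Invented damage values for three events; NA means nothing was recorded.
property <- c(1000, NA, 0)
crops <- c(NA, NA, 250)

# Plain addition propagates NA: one missing component hides the other.
naive_total <- property + crops

# coalesce() replaces NA with 0 first, as in the storm_clean pipeline.
safe_total <- coalesce(property, 0) + coalesce(crops, 0)

naive_total  # NA NA 250
safe_total   # 1000 0 250
```

The trade-off is that an event with no recorded damage becomes indistinguishable from an event recorded as zero damage, which is worth keeping in mind when reading the damage totals.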

3 Results

3.1 Question 1: Which event types are most harmful with respect to population health?

Population health impact is measured as the sum of direct deaths, indirect deaths, direct injuries, and indirect injuries. This combined measure is more informative than looking at deaths alone because some event types cause large injury burdens even when deaths are relatively limited.

health_by_event <- storm_clean %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    event_count = n_distinct(EVENT_ID),
    deaths = sum(total_deaths, na.rm = TRUE),
    injuries = sum(total_injuries, na.rm = TRUE),
    total_health_harm = sum(total_health_harm, na.rm = TRUE),
    harm_per_event = total_health_harm / event_count,
    .groups = "drop"
  ) %>%
  arrange(desc(total_health_harm))

top_health_events <- health_by_event %>%
  slice_head(n = 10)

kable(
  top_health_events,
  caption = "Top 10 event types by total population health impact in 2025"
)
Top 10 event types by total population health impact in 2025

EVENT_TYPE         event_count  deaths  injuries  total_health_harm  harm_per_event
Excessive Heat     1439         90      326       416                0.2890896
Tornado            1591         64      257       321                0.2017599
Flash Flood        5393         209     20        229                0.0424625
Heat               2864         163     51        214                0.0747207
Thunderstorm Wind  21807        41      141       182                0.0083459
Winter Weather     4436         31      123       154                0.0347160
Lightning          288          21      98        119                0.4131944
Wildfire           350          63      43        106                0.3028571
Dust Storm         320          17      78        95                 0.2968750
Rip Current        72           39      49        88                 1.2222222
ggplot(top_health_events,
       aes(x = fct_reorder(EVENT_TYPE, total_health_harm),
           y = total_health_harm)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Population Health Impact",
    subtitle = "Total health harm = deaths + injuries",
    x = "Event Type",
    y = "Total Health Harm"
  ) +
  scale_y_continuous(labels = comma) +
  theme_minimal()
Figure 1. Top 10 storm event types by total population health impact in 2025. Total health impact equals direct and indirect deaths plus direct and indirect injuries.

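Excessive Heat (416 deaths and injuries combined) and Tornado (321) produced the largest health burden in 2025. Flash Flood ranks third overall but caused the most deaths of any event type (209), and Heat adds another 163. Frequency and harm diverge sharply: Thunderstorm Wind produced 182 casualties across 21,807 events, fewer than 0.01 per event, whereas Rip Current averaged about 1.2 per event across only 72 events. These hazards are especially important for emergency management because they represent the largest observed public health burden in the data.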
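The ordering depends on how deaths and injuries are weighted, since the combined measure counts each equally. As a sketch, the deaths and injuries columns from the table above are re-entered by hand below (this does not rerun the analysis) and re-ranked by deaths alone.

```r
# Values copied from the top-10 health table above.
top_health <- data.frame(
  event_type = c("Excessive Heat", "Tornado", "Flash Flood", "Heat",
                 "Thunderstorm Wind", "Winter Weather", "Lightning",
                 "Wildfire", "Dust Storm", "Rip Current"),
  deaths = c(90, 64, 209, 163, 41, 31, 21, 63, 17, 39),
  injuries = c(326, 257, 20, 51, 141, 123, 98, 43, 78, 49),
  stringsAsFactors = FALSE
)

# Re-rank the same ten event types by deaths only.
by_deaths <- top_health[order(-top_health$deaths), ]
top3_by_deaths <- head(by_deaths$event_type, 3)
top3_by_deaths  # "Flash Flood" "Heat" "Excessive Heat"
```

Under a deaths-only ranking, Flash Flood and Heat move ahead of Excessive Heat and Tornado, so the choice of measure should match the planning question being asked.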

3.2 Question 2: Across the United States, which event types happen most often in which states?

This section counts unique storm events by state and event type. Unique EVENT_ID values are used so that events are not double-counted because of multiple location records.

state_event_counts <- storm_clean %>%
  group_by(STATE, EVENT_TYPE) %>%
  summarise(event_count = n_distinct(EVENT_ID), .groups = "drop") %>%
  arrange(desc(event_count))

top_state_event_combinations <- state_event_counts %>%
  slice_head(n = 15)

kable(
  top_state_event_combinations,
  caption = "Top 15 state-event combinations by number of unique storm events in 2025"
)
Top 15 state-event combinations by number of unique storm events in 2025

STATE           EVENT_TYPE                event_count
Alabama         Thunderstorm Wind         1532
Texas           Hail                      1453
Texas           Thunderstorm Wind         1205
Virginia        Thunderstorm Wind         1096
Georgia         Thunderstorm Wind         1025
Pennsylvania    Thunderstorm Wind         1010
Illinois        Thunderstorm Wind         978
Missouri        Thunderstorm Wind         886
Oklahoma        Hail                      836
Kansas          Thunderstorm Wind         832
South Dakota    Thunderstorm Wind         749
North Carolina  Thunderstorm Wind         734
Atlantic North  Marine Thunderstorm Wind  724
Ohio            Thunderstorm Wind         717
Indiana         Thunderstorm Wind         714
top_state_event_combinations <- top_state_event_combinations %>%
  mutate(state_event = paste(STATE, EVENT_TYPE, sep = " - "))

ggplot(top_state_event_combinations,
       aes(x = fct_reorder(state_event, event_count),
           y = event_count)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Most Frequent State-Event Type Combinations",
    subtitle = "Counts based on unique EVENT_ID values",
    x = "State and Event Type",
    y = "Number of Events"
  ) +
  scale_y_continuous(labels = comma) +
  theme_minimal()
Figure 2. Top state-event type combinations by number of unique storm events in 2025.

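This result is useful because emergency planning is often implemented at the state or municipal level. Thunderstorm Wind accounts for 12 of the 15 most frequent combinations, led by Alabama (1,532 events), while Hail supplies two more, in Texas (1,453) and Oklahoma (836). Atlantic North is a NOAA marine area rather than a state, and its entry is Marine Thunderstorm Wind. The table ranks state-event combinations nationally, so it shows where the largest counts occur rather than the leading event type in every state.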
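To answer the question state by state, one option is to keep only the highest-count event type within each state. The sketch below assumes a table shaped like state_event_counts and uses invented counts (toy_counts is hypothetical) so that it runs on its own.

```r
library(dplyr)

# Invented counts in the same shape as state_event_counts.
toy_counts <- tibble(
  STATE = c("Alabama", "Alabama", "Texas", "Texas", "Texas"),
  EVENT_TYPE = c("Thunderstorm Wind", "Hail", "Hail",
                 "Thunderstorm Wind", "Flash Flood"),
  event_count = c(30, 5, 40, 25, 10)
)

# Keep the single most frequent event type per state.
top_type_per_state <- toy_counts %>%
  group_by(STATE) %>%
  slice_max(event_count, n = 1, with_ties = FALSE) %>%
  ungroup()

top_type_per_state
```

Setting with_ties = FALSE returns exactly one row per state; leaving it at the default would keep every event type tied for first place.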

3.3 Question 3: Which event types are characterized by which months?

This question examines seasonality by counting event types across months. To keep the heatmap readable, the analysis focuses on the 10 most frequent event types overall.

top_event_types_overall <- storm_clean %>%
  count(EVENT_TYPE, sort = TRUE) %>%
  slice_head(n = 10) %>%
  pull(EVENT_TYPE)

monthly_event_counts <- storm_clean %>%
  filter(EVENT_TYPE %in% top_event_types_overall) %>%
  group_by(event_month, EVENT_TYPE) %>%
  summarise(event_count = n_distinct(EVENT_ID), .groups = "drop")

kable(
  monthly_event_counts %>% arrange(EVENT_TYPE, event_month),
  caption = "Monthly counts for the 10 most frequent event types in 2025"
)
Monthly counts for the 10 most frequent event types in 2025
event_month EVENT_TYPE event_count
January Drought 202
February Drought 240
March Drought 241
April Drought 294
May Drought 251
June Drought 186
July Drought 162
August Drought 123
September Drought 402
October Drought 432
November Drought 425
December Drought 325
January Flash Flood 69
February Flash Flood 267
March Flash Flood 124
April Flash Flood 725
May Flash Flood 624
June Flash Flood 919
July Flash Flood 1570
August Flash Flood 550
September Flash Flood 324
October Flash Flood 149
November Flash Flood 58
December Flash Flood 14
January Flood 144
February Flood 592
March Flood 86
April Flood 289
May Flood 247
June Flood 204
July Flood 213
August Flood 144
September Flood 83
October Flood 54
November Flood 73
December Flood 132
January Hail 4
February Hail 42
March Hail 1084
April Hail 1820
May Hail 2835
June Hail 1384
July Hail 768
August Hail 494
September Hail 544
October Hail 100
November Hail 128
December Hail 2
April Heat 2
May Heat 11
June Heat 539
July Heat 1399
August Heat 894
September Heat 10
December Heat 9
January High Wind 282
February High Wind 576
March High Wind 1669
April High Wind 231
May High Wind 230
June High Wind 76
July High Wind 33
August High Wind 19
September High Wind 39
October High Wind 147
November High Wind 240
December High Wind 1061
January Marine Thunderstorm Wind 28
February Marine Thunderstorm Wind 56
March Marine Thunderstorm Wind 232
April Marine Thunderstorm Wind 131
May Marine Thunderstorm Wind 375
June Marine Thunderstorm Wind 392
July Marine Thunderstorm Wind 466
August Marine Thunderstorm Wind 215
September Marine Thunderstorm Wind 64
October Marine Thunderstorm Wind 110
November Marine Thunderstorm Wind 38
December Marine Thunderstorm Wind 19
January Thunderstorm Wind 56
February Thunderstorm Wind 575
March Thunderstorm Wind 2390
April Thunderstorm Wind 2766
May Thunderstorm Wind 3895
June Thunderstorm Wind 5266
July Thunderstorm Wind 3793
August Thunderstorm Wind 1629
September Thunderstorm Wind 783
October Thunderstorm Wind 207
November Thunderstorm Wind 143
December Thunderstorm Wind 304
January Winter Storm 1120
February Winter Storm 693
March Winter Storm 230
April Winter Storm 62
May Winter Storm 5
June Winter Storm 1
October Winter Storm 8
November Winter Storm 357
December Winter Storm 475
January Winter Weather 1006
February Winter Weather 1369
March Winter Weather 294
April Winter Weather 134
May Winter Weather 12
June Winter Weather 2
September Winter Weather 1
October Winter Weather 8
November Winter Weather 336
December Winter Weather 1274
ggplot(monthly_event_counts,
       aes(x = event_month,
           y = fct_rev(EVENT_TYPE),
           fill = event_count)) +
  geom_tile(color = "white") +
  labs(
    title = "Seasonality of Major Storm Event Types",
    subtitle = "Monthly counts for the 10 most frequent event types",
    x = "Month",
    y = "Event Type",
    fill = "Event Count"
  ) +
  scale_fill_gradient(labels = comma) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Figure 3. Monthly distribution of the 10 most frequent storm event types in 2025. With the default scale_fill_gradient() palette, which runs from dark blue (low) to light blue (high), lighter cells indicate higher event counts.

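The monthly counts show clear seasonal concentration. Winter Storm and Winter Weather fall mostly between November and March, peaking in January (1,120) and February (1,369) respectively. High Wind peaks in March (1,669) and again in December (1,061). Convective hazards build through spring: Hail peaks in May (2,835), Thunderstorm Wind in June (5,266), and Flash Flood in July (1,570). Heat is almost entirely a June to August phenomenon. Drought is the most evenly spread, with its highest counts from September through November. This timing matters for public managers because it affects public communication, staffing, and resource readiness.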
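Month and event type pairs with no events (for example, Heat in January) are absent from monthly_event_counts, so geom_tile() draws nothing for them. If explicit zero-count tiles are preferred, tidyr::complete() can fill in the missing pairs. The sketch below uses a hypothetical three-row excerpt (toy_monthly) with counts taken from the monthly table.

```r
library(dplyr)
library(tidyr)

# Three rows copied from the monthly table; all other pairs are absent.
toy_monthly <- tibble(
  event_month = factor(c("June", "July", "January"), levels = month.name),
  EVENT_TYPE = c("Heat", "Heat", "Winter Storm"),
  event_count = c(539, 1399, 1120)
)

# complete() expands to every month level x event type and fills gaps with 0.
filled <- toy_monthly %>%
  complete(event_month, EVENT_TYPE, fill = list(event_count = 0))

nrow(filled)             # 24: 12 months x 2 event types
sum(filled$event_count)  # 3058: the original counts are unchanged
```

Because event_month is a factor, complete() uses all twelve levels, including months that never appear in the data.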

3.4 Question 4: Which event types caused the greatest estimated economic damage?

For the custom research question, this report examines which storm event types produced the highest estimated economic damage. Economic damage is measured as property damage plus crop damage after converting NOAA damage abbreviations into numeric dollar values.

damage_by_event <- storm_clean %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    event_count = n_distinct(EVENT_ID),
    property_damage = sum(property_damage_value, na.rm = TRUE),
    crop_damage = sum(crop_damage_value, na.rm = TRUE),
    total_damage = sum(total_damage_value, na.rm = TRUE),
    avg_damage_per_event = total_damage / event_count,
    .groups = "drop"
  ) %>%
  arrange(desc(total_damage))

top_damage_events <- damage_by_event %>%
  slice_head(n = 10)

kable(
  top_damage_events,
  caption = "Top 10 event types by estimated economic damage in 2025",
  format.args = list(big.mark = ",")
)
Top 10 event types by estimated economic damage in 2025

EVENT_TYPE         event_count  property_damage  crop_damage  total_damage   avg_damage_per_event
Tornado            1,591        1,906,326,500    3,373,000    1,909,699,500  1,200,313.953
Flash Flood        5,393        1,297,150,550    785,000      1,297,935,550  240,670.415
Wildfire           350          788,932,110      193,415,000  982,347,110    2,806,706.029
Thunderstorm Wind  21,807       210,492,330      58,624,250   269,116,580    12,340.835
Flood              2,261        92,357,950       45,000       92,402,950     40,868.178
Hail               9,205        60,072,500       2,300,000    62,372,500     6,775.937
Debris Flow        163          50,600,200       1,000        50,601,200     310,436.810
Drought            3,283        37,133,250       2,370,000    39,503,250     12,032.668
Lightning          288          22,600,150       15,400       22,615,550     78,526.215
High Wind          4,603        12,162,600       49,000       12,211,600     2,652.965
ggplot(top_damage_events,
       aes(x = fct_reorder(EVENT_TYPE, total_damage),
           y = total_damage)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Estimated Economic Damage",
    subtitle = "Total damage = property damage + crop damage",
    x = "Event Type",
    y = "Estimated Damage"
  ) +
  scale_y_continuous(labels = dollar) +
  theme_minimal()
Figure 4. Top 10 storm event types by estimated economic damage in 2025. Total damage combines property and crop damage estimates.

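This economic analysis adds another dimension to the public health analysis. Tornado ($1.91 billion), Flash Flood ($1.30 billion), and Wildfire ($0.98 billion) dominate, together making up roughly 88 percent of the damage across the ten listed types, and Wildfire has the highest average damage per event (about $2.8 million across 350 events). The ranking differs from the health ranking: Excessive Heat and Heat, which rank first and fourth for deaths and injuries, do not appear among the ten costliest event types, while Thunderstorm Wind ranks fourth in damage despite averaging only about $12,000 per event. Considering both dimensions provides a more complete understanding of severe weather risk.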
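Two quick calculations on the table above put the totals in context: damage per event, and the share contributed by the three largest event types. The sketch re-enters the event_count and total_damage columns by hand rather than rerunning the analysis, and the share is relative to the ten listed types only, not to all event types.

```r
# Values copied from the top-10 damage table above.
top_damage <- data.frame(
  event_type = c("Tornado", "Flash Flood", "Wildfire", "Thunderstorm Wind",
                 "Flood", "Hail", "Debris Flow", "Drought", "Lightning",
                 "High Wind"),
  event_count = c(1591, 5393, 350, 21807, 2261, 9205, 163, 3283, 288, 4603),
  total_damage = c(1909699500, 1297935550, 982347110, 269116580, 92402950,
                   62372500, 50601200, 39503250, 22615550, 12211600),
  stringsAsFactors = FALSE
)

# Rank by average damage per event instead of total damage.
top_damage$avg_damage <- top_damage$total_damage / top_damage$event_count
by_avg <- top_damage[order(-top_damage$avg_damage), ]
head(by_avg$event_type, 3)  # "Wildfire" "Tornado" "Debris Flow"

# Share of the ten-type total contributed by the three largest types.
top3_share <- sum(top_damage$total_damage[1:3]) / sum(top_damage$total_damage)
round(top3_share, 2)  # 0.88
```

Ranking by average damage brings Debris Flow into the top three, which suggests that rare but costly event types deserve attention alongside the frequent ones.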

4 Conclusion

This report analyzed the 2025 NOAA Storm Events data by linking the details, locations, and fatalities files using EVENT_ID. The results identify the event types with the greatest public health impact, the state-event combinations with the highest event counts, the seasonal timing of major storm event types, and the event types associated with the greatest estimated economic damage.

The main finding is that severe weather risk is multi-dimensional and cannot be evaluated on frequency alone. Thunderstorm wind events occur most often, but excessive heat and tornadoes caused more deaths and injuries, and economic damage is driven by a different set of event types again, led by tornadoes, flash floods, and wildfires. Emergency managers should therefore weigh frequency, health burden, economic damage, and seasonal timing together rather than relying on any one of them.

The project is reproducible because it begins with the raw CSV files, documents each transformation, and shows the code used to generate each table and figure. By combining public health, geographic, seasonal, and economic perspectives, it can support policymakers and emergency management agencies in making more informed, data-driven decisions.