Synopsis

Severe weather is a persistent hazard in the United States. The 2024 files from the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database analyzed here contain about 70,000 event records and just over 1,000 fatality records. In this data, flash floods, droughts, thunderstorm winds, and floods are the most harmful event types in terms of deaths and injuries. Flash floods are also the leading event type in four of the five states with the highest counts for their leading event type, which shows that they are common as well as dangerous. In the monthly breakdown, drought is the most common event type in five of the twelve months. Surprisingly, these droughts occur in the colder months. One possible explanation is snow drought, which occurs when snowfall is abnormally low for the time of year because of warm temperatures. NOAA may track these conditions for that reason: as climate change has raised temperatures slightly, precipitation that would once have fallen as snow may instead fall as rain or sleet. Finally, some rare event types cause far more damage than more common ones. While flash floods cause the most property damage, tornadoes, thunderstorm winds, and hail each cause millions of dollars in damage. If climate change continues at its current pace, extreme events such as tornadoes and wildfires are likely to become more frequent and to keep causing millions of dollars in damage each year in the United States.


Data Processing

The analysis began with three raw, compressed CSV files for 2024 from the NOAA storm database: event details, fatalities, and locations. After these files were loaded into R, they were merged on the shared EVENT_ID field, and the begin and end date-time values were parsed to support time-based analysis. New variables were created for month and year, and rows whose begin date-time could not be parsed were filtered out. The resulting dataset, storm_data, was used for the rest of the analysis.

library(dplyr)
library(readr)
library(ggplot2)
library(lubridate)
details <- read_csv("StormEvents_details-ftp_v1.0_d2024_c20250401.csv.gz")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 70196 Columns: 51
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): STATE, MONTH_NAME, EVENT_TYPE, CZ_TYPE, CZ_NAME, WFO, BEGIN_DATE_T...
## dbl (24): BEGIN_YEARMONTH, BEGIN_DAY, BEGIN_TIME, END_YEARMONTH, END_DAY, EN...
## lgl  (1): CATEGORY
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
fatalities <- read_csv("StormEvents_fatalities-ftp_v1.0_d2024_c20250401.csv.gz")
## Rows: 1047 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): FATALITY_TYPE, FATALITY_DATE, FATALITY_SEX, FATALITY_LOCATION
## dbl (7): FAT_YEARMONTH, FAT_DAY, FAT_TIME, FATALITY_ID, EVENT_ID, FATALITY_A...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
locations <- read_csv("StormEvents_locations-ftp_v1.0_d2024_c20250401.csv.gz")
## Rows: 48112 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): AZIMUTH, LOCATION
## dbl (9): YEARMONTH, EPISODE_ID, EVENT_ID, LOCATION_INDEX, RANGE, LATITUDE, L...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
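# Attach location and fatality records to each event. An event with several
# locations or fatalities appears once per matching row after the joins.
storm_data <- details %>%
  left_join(locations, by = "EVENT_ID") %>%
  left_join(fatalities, by = "EVENT_ID") %>%
  mutate(
    # Parse the date-time strings using the candidate orders listed
    BEGIN_DATE_TIME = parse_date_time(BEGIN_DATE_TIME, orders = c("mdy HMS", "mdy HM", "mdy")),
    END_DATE_TIME = parse_date_time(END_DATE_TIME, orders = c("mdy HMS", "mdy HM", "mdy")),
    # Derive month (as a labeled factor) and year from the begin date-time
    month = month(BEGIN_DATE_TIME, label = TRUE),
    year = year(BEGIN_DATE_TIME)
  ) %>%
  # Keep only rows whose begin date-time parsed successfully
  filter(!is.na(BEGIN_DATE_TIME))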
## Warning in left_join(., fatalities, by = "EVENT_ID"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 830 of `x` matches multiple rows in `y`.
## ℹ Row 441 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
## Warning: There were 2 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `BEGIN_DATE_TIME = parse_date_time(...)`.
## Caused by warning:
## !  66062 failed to parse.
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
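
The warning above reports that 66062 values failed to parse, which suggests that many date strings may not be in month-day-year order. The following is a minimal sketch of how one might check an alternative order before filtering. It assumes the raw values use a day, abbreviated month, two-digit year layout such as 01-JAN-24 00:00:00; the sample strings are illustrative and were not taken from the files.

```r
library(lubridate)

# Illustrative sample values (assumed layout, not copied from the NOAA files).
raw_dates <- c("01-JAN-24 00:00:00", "27-APR-24 20:28:00", "not a date")

# In parse_date_time(), the "m" order also matches month names,
# so "dmy HMS" covers a day / abbreviated month / year layout.
parsed <- parse_date_time(raw_dates, orders = "dmy HMS", quiet = TRUE)

# Share of values that failed to parse; a large share points to a wrong order.
fail_rate <- mean(is.na(parsed))

print(parsed)
print(fail_rate)
```

Checking the failure rate on the real column before calling filter() would show how many rows a given choice of orders discards.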

Research Question 1: Which event types were most harmful to population health?

# Summarise the health impact of each event type
health_impact <- storm_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    # n() counts the joined rows for each event type
    fatalities = n(),
    injuries_direct = sum(INJURIES_DIRECT, na.rm = TRUE),
    injuries_indirect = sum(INJURIES_INDIRECT, na.rm = TRUE),
    # Combined measure used to rank event types
    total_health = fatalities + injuries_direct + injuries_indirect,
    .groups = "drop"
  ) %>%
  arrange(desc(total_health))

ggplot(head(health_impact, 10), aes(x = reorder(EVENT_TYPE, total_health), y = total_health)) +
  geom_col(fill = "firebrick") +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Event Types (2024)",
    x = "Event Type",
    y = "Fatalities + Injuries"
  )
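
Because the joined table can hold several rows per event, row counts and sums taken on it may overstate casualties. As a hedged alternative, the sketch below totals deaths and injuries from a one-row-per-event table. The toy data is made up, and the sketch assumes the details file carries DEATHS_DIRECT and DEATHS_INDIRECT columns alongside the injury columns used above.

```r
library(dplyr)

# Made-up stand-in for the details table: exactly one row per event.
details_toy <- tibble(
  EVENT_ID = 1:4,
  EVENT_TYPE = c("Flash Flood", "Flash Flood", "Tornado", "Drought"),
  DEATHS_DIRECT = c(2, 0, 1, 0),      # assumed column name
  DEATHS_INDIRECT = c(0, 1, 0, 0),    # assumed column name
  INJURIES_DIRECT = c(3, 0, 5, 0),
  INJURIES_INDIRECT = c(0, 0, 1, 0)
)

health_toy <- details_toy %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    # Sum recorded deaths instead of counting rows
    fatalities = sum(DEATHS_DIRECT, na.rm = TRUE) +
      sum(DEATHS_INDIRECT, na.rm = TRUE),
    injuries = sum(INJURIES_DIRECT, na.rm = TRUE) +
      sum(INJURIES_INDIRECT, na.rm = TRUE),
    total_health = fatalities + injuries,
    .groups = "drop"
  ) %>%
  arrange(desc(total_health))

print(health_toy)
```

Working from the unjoined details table keeps each event's casualty figures from being repeated once per location or fatality record.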


Research Question 2: Which event types were most common by state?

events_by_state <- storm_data %>%
  count(STATE, EVENT_TYPE, sort = TRUE)

top_events_state <- events_by_state %>%
  group_by(STATE) %>%
  slice_max(n, n = 1) %>%
  ungroup()

ggplot(top_events_state, aes(x = reorder(STATE, -n), y = n, fill = EVENT_TYPE)) +
  geom_col() +
  labs(title = "Most Common Severe Weather Events by State (2024)",
       x = "State", y = "Number of Events") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
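
Two details can affect this ranking: the joined table may repeat an event across rows, and slice_max() keeps ties by default, so a state can end up with more than one leading event type. The sketch below, on made-up data, counts each EVENT_ID once and keeps exactly one row per state. It illustrates one possible approach and is not the method used for the figure above.

```r
library(dplyr)

# Made-up joined table: event 1 is repeated three times by the joins.
joined_toy <- tibble(
  EVENT_ID = c(1, 1, 1, 2, 3, 4),
  STATE = c("TEXAS", "TEXAS", "TEXAS", "TEXAS", "TEXAS", "OHIO"),
  EVENT_TYPE = c("Flash Flood", "Flash Flood", "Flash Flood",
                 "Hail", "Hail", "Flood")
)

top_toy <- joined_toy %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%   # one row per event
  count(STATE, EVENT_TYPE) %>%               # events per state and type
  group_by(STATE) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>% # exactly one leader per state
  ungroup()

print(top_toy)
```

Without the distinct() step, the repeated rows for event 1 would make flash floods look like the leading event type in the toy Texas data.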


Research Question 3: Which event types are most common in each month?

events_by_month <- storm_data %>%
  count(MONTH_NAME, EVENT_TYPE) %>%
  group_by(MONTH_NAME) %>%
  slice_max(n, n = 1)

ggplot(events_by_month, aes(x = MONTH_NAME, y = n, fill = EVENT_TYPE)) +
  geom_col() +
  labs(title = "Most Common Event Types by Month (2024)",
       x = "Month", y = "Count")
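
MONTH_NAME is read as a character column, so ggplot2 places the months on the x-axis in alphabetical order. A minimal sketch of putting them in calendar order, assuming the values are full English month names that match R's built-in month.name vector (the toy counts are made up):

```r
library(dplyr)

# Made-up monthly counts in arbitrary order.
month_toy <- tibble(
  MONTH_NAME = c("March", "January", "February"),
  n = c(5, 9, 2)
) %>%
  # Convert to a factor whose levels follow the calendar
  mutate(MONTH_NAME = factor(MONTH_NAME, levels = month.name)) %>%
  arrange(MONTH_NAME)

print(month_toy)
```

Applying the same mutate() step to events_by_month before plotting should make the bars run from January to December.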


Research Question 4: Which event types caused the greatest economic damage?

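# Convert DAMAGE_PROPERTY strings to dollars: strip a trailing "K" and
# multiply by 1,000. Values that are not purely numeric after this step
# become NA and are ignored by sum(..., na.rm = TRUE).
economic_impact <- storm_data %>%
  mutate(DAMAGE_PROPERTY = as.numeric(gsub("K$", "", DAMAGE_PROPERTY)) * 1000) %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    Total_Damage = sum(DAMAGE_PROPERTY, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Total_Damage)) %>%
  # Keep the ten costliest event types
  slice_head(n = 10)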
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `DAMAGE_PROPERTY = as.numeric(gsub("K$", "", DAMAGE_PROPERTY)) *
##   1000`.
## Caused by warning:
## ! NAs introduced by coercion
ggplot(economic_impact, aes(x = reorder(EVENT_TYPE, Total_Damage), y = Total_Damage)) +
  geom_col(fill = "royalblue") +
  coord_flip() +
  labs(
    title = "Top 10 Most Costly Event Types (2024)",
    x = "Event Type",
    y = "Total Property Damage (USD)"
  )
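
The coercion warning above indicates that some damage strings are not plain numbers once the trailing K is removed. One hedged way to handle more than one magnitude suffix is sketched below. The helper parse_damage() is hypothetical, and the sketch assumes the suffixes K, M, and B stand for thousands, millions, and billions of dollars.

```r
# Hypothetical helper: convert strings such as "10.00K" or "1.5M" to dollars.
parse_damage <- function(x) {
  x <- toupper(trimws(x))
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)   # assumed suffix meanings
  suffix <- substr(x, nchar(x), nchar(x))      # last character of each value
  has_suffix <- suffix %in% names(multiplier)
  # Drop the suffix where present, then convert the remainder to numeric
  digits <- ifelse(has_suffix, substr(x, 1, nchar(x) - 1), x)
  number <- suppressWarnings(as.numeric(digits))
  scale <- ifelse(has_suffix, multiplier[suffix], 1)
  unname(number * scale)
}

damage_toy <- parse_damage(c("10.00K", "1.5M", "2B", "250", NA))
print(damage_toy)
```

Values with no recognised suffix are treated as plain dollar amounts, and missing or non-numeric entries remain NA, so they can still be dropped with na.rm = TRUE when summing.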


Conclusion

This analysis highlights the significant impact of severe weather events in 2024. The event types most harmful to population health included flash floods and thunderstorm winds, while flash floods, tornadoes, and hail caused the most property damage. These findings can support emergency preparedness and policy development.