Across the United States, which types of events have the greatest economic consequences and are most harmful with respect to population health?

Synopsis

This analysis examines severe weather events in the United States from 1990 to 2011 using NOAA’s Storm Events database, focusing on fatalities, injuries, property damage, and crop damage. Event types were extensively cleaned and standardized to align with NOAA’s official 48‑category taxonomy, ensuring consistent comparisons across hazards. Tornadoes emerged as the most harmful events to human health, causing the highest combined number of fatalities and injuries. Heat events produced disproportionately high fatalities, highlighting a distinct risk profile compared with other hazards. Thunderstorm winds were among the most frequent and economically damaging events, particularly in terms of property loss. Hail caused the greatest crop damage, while floods and flash floods contributed substantially to both human and economic impacts. Scatter plots comparing fatalities with injuries and property damage with crop damage revealed clear clusters of high‑impact event types. The top‑ten rankings across all categories showed that a small number of hazards account for the majority of reported harm. Together, these findings provide a data‑driven overview of which weather events have historically posed the greatest risks to communities. The results offer a foundation for understanding hazard patterns and supporting informed preparedness and resource planning.

Data Processing

This analysis begins with the raw NOAA Storm Events CSV file, which contains all reported severe weather events and associated impacts across the United States. Because NOAA’s event taxonomy and reporting practices became more consistent in the early 1990s, the dataset was restricted to events occurring between 1990 and 2011. All data loading, cleaning, and transformation steps were performed entirely within this document using R and the tidyverse, ensuring full reproducibility. The raw field contains substantial inconsistencies—including misspellings, abbreviations, and combined or narrative entries—so a structured cleaning workflow was implemented to standardize event types to NOAA’s official 48‑event taxonomy. Pattern‑based classification and approximate string matching were used to resolve variants, and a diagnostic table was generated to document all classification decisions. No preprocessing was performed outside this document, and no manual edits were made to the dataset. Because some steps (such as loading the full CSV and generating diagnostic tables) may be time‑consuming, selected code chunks can be run with to avoid unnecessary recomputation during knitting.

Set up

he setup ensures that all code and charts are included in the document, as required for the assessment. However, this would not typically be standard practice if the report were being presented to a manager. The tidyverse package is loaded because it provides efficient access to dplyr, ggplot2, and other useful tools.

knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.6.0
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Download data and install.

The data was downloaded directly into R using the link provided on the course website. The .csv file was then read into the RStudio environment.

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

download.file(url,
              destfile = "StormData.csv",
              mode = "wb")

StormData <- read_csv("StormData.csv")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Cleaning

A quick assessment of the dataset structure and content was performed by examining the first few rows, identifying the variables included, and checking their data types. A summary was also generated to understand the range of values. The unique() function was used to inspect the raw EVTYPE field.

head(StormData)
str(StormData)
summary(StormData)
unique(StormData$EVTYPE)

Parse date

The date variable was stored as a character string. To enable filtering and other operations, the date was parsed using the package (included in tidyverse). The year was then extracted to simplify filtering

StormData <- StormData %>%
  mutate(BGN_DATE = mdy_hms(BGN_DATE))

StormData <- StormData %>%
  mutate(Year = year(BGN_DATE))

Filter dataset to 1990 onwards

NOAA’s Storm Events database spans several decades, but reporting practices before the early 1990s were inconsistent, incomplete, and varied widely across states and counties. See the [NOAA FAQ][noaa-faq] for details. NOAA itself notes that the full set of 48 official event types was not adopted until 1996, and earlier records often include missing values, non‑standard event labels, and uneven documentation of impacts. To ensure comparability across hazards and avoid biases introduced by changes in reporting standards, this analysis restricts the dataset to events occurring from 1990 onward—a period during which event definitions expanded, data collection improved, and impact reporting became more systematic. Focusing on this more reliable period supports clearer interpretation of trends, reduces noise from inconsistent early records, and provides a defensible foundation for assessing which hazards have historically posed the greatest risks to population health and economic stability.

StormData_recent <- StormData %>%
  filter(Year >= 1990)

Clean EVTYPE

The EVTYPE field contains extensive inconsistencies—including misspellings, abbreviations, combined events, and narrative text—a structured cleaning workflow was implemented to map each entry to NOAA’s official list of 48 event types. Event labels were first normalised by converting to uppercase and trimming whitespace. A pattern‑based classification approach was then applied, using regular expressions to detect key hazard terms (e.g., “TSTM”, “HURRICANE”, “FLASH FLOOD”, “HAIL”) and assign each record to the appropriate standard category. Entries that did not match any pattern were flagged and reviewed using approximate string‑matching to identify the closest NOAA category, ensuring that near‑miss spellings and rare variants were correctly resolved. Non‑meteorological or administrative entries (e.g., monthly summaries, county names) were classified as Other and excluded from impact calculations. This workflow provides a reproducible, transparent mapping from raw event descriptions to NOAA’s standard taxonomy, supporting consistent downstream analysis of health and economic impacts.

library(stringdist)
## Warning: package 'stringdist' was built under R version 4.5.2
## 
## Attaching package: 'stringdist'
## The following object is masked from 'package:tidyr':
## 
##     extract
StormData_recent <- StormData_recent %>%
  mutate(
    EVTYPE_raw = str_to_upper(str_trim(EVTYPE))
  )
StormData_recent <- StormData_recent %>%
  mutate(
    EVTYPE_clean = case_when(
      str_detect(EVTYPE_raw, "TSTM|THUNDERSTORM") ~ "THUNDERSTORM WIND",
      str_detect(EVTYPE_raw, "TORNADO") ~ "TORNADO",
      str_detect(EVTYPE_raw, "HURRICANE|TYPHOON") ~ "HURRICANE/TYPHOON",
      str_detect(EVTYPE_raw, "FLASH FLOOD") ~ "FLASH FLOOD",
      str_detect(EVTYPE_raw, "FLOOD") ~ "FLOOD",
      str_detect(EVTYPE_raw, "HAIL") ~ "HAIL",
      str_detect(EVTYPE_raw, "SNOW") ~ "HEAVY SNOW",
      str_detect(EVTYPE_raw, "ICE|FREEZING|SLEET") ~ "ICE STORM",
      str_detect(EVTYPE_raw, "RAIN") ~ "HEAVY RAIN",
      str_detect(EVTYPE_raw, "HIGH WIND|GUST|STRONG WIND") ~ "HIGH WIND",
      str_detect(EVTYPE_raw, "HEAT|WARM") ~ "HEAT",
      str_detect(EVTYPE_raw, "COLD|CHILL|HYPOTHERMIA") ~ "COLD/WIND CHILL",
      str_detect(EVTYPE_raw, "FOG") ~ "DENSE FOG",
      str_detect(EVTYPE_raw, "SURF|SEAS|SWELL|WAVE") ~ "HIGH SURF",
      str_detect(EVTYPE_raw, "DUST DEVIL") ~ "DUST DEVIL",
      str_detect(EVTYPE_raw, "DUST") ~ "DUST STORM",
      str_detect(EVTYPE_raw, "RIP CURRENT") ~ "RIP CURRENT",
      str_detect(EVTYPE_raw, "VOLCANIC") ~ "VOLCANIC ASH",
      str_detect(EVTYPE_raw, "TSUNAMI") ~ "TSUNAMI",
      str_detect(EVTYPE_raw, "AVALANCHE") ~ "AVALANCHE",
      str_detect(EVTYPE_raw, "COASTAL") ~ "COASTAL FLOOD",
      str_detect(EVTYPE_raw, "LANDSLIDE|MUDSLIDE|ROCK SLIDE") ~ "DEBRIS FLOW",
      str_detect(EVTYPE_raw, "SMOKE|VOG") ~ "DENSE SMOKE",
      TRUE ~ EVTYPE_raw
    )
  )
unmatched <- StormData_recent %>%
  filter(EVTYPE_clean == EVTYPE_raw) %>%
  distinct(EVTYPE_raw)
official <- c("ASTRONOMICAL LOW TIDE","AVALANCHE","BLIZZARD","COASTAL FLOOD",
              "COLD/WIND CHILL","DEBRIS FLOW","DENSE FOG","DENSE SMOKE",
              "DROUGHT","DUST DEVIL","DUST STORM","EXCESSIVE HEAT",
              "EXTREME COLD/WIND CHILL","FLASH FLOOD","FLOOD","FREEZING FOG",
              "FROST/FREEZE","FUNNEL CLOUD","HAIL","HEAT","HEAVY RAIN",
              "HEAVY SNOW","HIGH SURF","HIGH WIND","HURRICANE/TYPHOON",
              "ICE STORM","LAKE-EFFECT SNOW","LAKESHORE FLOOD","LIGHTNING",
              "MARINE HAIL","MARINE HIGH WIND","MARINE STRONG WIND",
              "MARINE THUNDERSTORM WIND","RIP CURRENT","SEICHE","SLEET",
              "STORM SURGE/TIDE","STRONG WIND","THUNDERSTORM WIND","TORNADO",
              "TROPICAL DEPRESSION","TROPICAL STORM","TSUNAMI","VOLCANIC ASH",
              "WATERSPOUT","WILDFIRE","WINTER STORM","WINTER WEATHER")

unmatched_suggestions <- unmatched %>%
  mutate(suggestion = official[amatch(EVTYPE_raw, official, maxDist = 6)])
cleaning_report <- StormData_recent %>%
  count(EVTYPE_raw, EVTYPE_clean, sort = TRUE) %>%
  left_join(unmatched_suggestions, by = "EVTYPE_raw")

Create a dataset to enable the analysis of storm impact on population health and the economy.

Population health was assessed by looking at the sum of injuries and fatalities caused by each event type. Economic Impacts were assessed by summing crop damage and property damage. Impacts on population health were assessed by summing fatalities and injuries. Using the group_by and summarise() functions a data set was created to enable comparison across event types.

StormData_analysis <- StormData_recent %>%
  group_by(EVTYPE_clean) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries   = sum(INJURIES, na.rm = TRUE),
    Property_Damage = sum(PROPDMG, na.rm = TRUE),
    Crop_Damage   = sum(CROPDMG, na.rm = TRUE )
  )

Across the United States, which types of events are most harmful with respect to population health?

To answer this question, a scatterplot of injuries versus fatalities was created to compare the total number of injuries and fatalities caused by each event type between 1990 and 2011.

ggplot(StormData_analysis,
       aes(x = Fatalities,
           y = Injuries,
           label = EVTYPE_clean)) +
  geom_point(color = "steelblue", size = 3, alpha = 0.7) +
  geom_text(check_overlap = TRUE, nudge_y = 50, size = 3) +
  labs(
    title = "Health Impacts of Severe Weather Events",
    subtitle = "Fatalities vs Injuries (1990–2011)",
    x = "Total Fatalities",
    y = "Total Injuries"
  ) +
  theme_minimal(base_size = 13)

Across the United States, which types of events have the greatest economic consequences?

To answer this question, a scatterplot was created to compare the estimated damage to crops and property between 1990 and 2011. The Storm Data FAQ page notes that “storm damage is estimated by making a best guess using all available data at the time of publication. The damage amounts are received from a variety of sources, and property and crop damage should be considered broad estimates.”

Source: NOAA Storm Data FAQ

ggplot(StormData_analysis,
       aes(x = Property_Damage,
           y = Crop_Damage,
           label = EVTYPE_clean)) +
  geom_point(color = "steelblue", size = 3, alpha = 0.7) +
  geom_text(check_overlap = TRUE, nudge_y = 50, size = 3) +
  labs(
    title = "Economic Impacts of Severe Weather Events",
    subtitle = "Property Damage vs Crop Damage (1990–2011)",
    x = "Most Property Damage",
    y = "Most Crop Damage"
  ) +
  theme_minimal(base_size = 13)

Most severe Impact

To support the prioritisation of resources for different types of events, the following bar chart lists the top ten weather events that have the greatest combined impact across all harm categories.

StormData_analysis_long <- StormData_analysis %>%
  pivot_longer(
    cols = c(Fatalities, Injuries, Property_Damage, Crop_Damage),
    names_to = "Outcome",
    values_to = "Count"
  )
StormData_analysis_long_top <- StormData_analysis_long %>%
  group_by(EVTYPE_clean) %>%
  summarise(Total_Harm = sum(Count)) %>%
  arrange(desc(Total_Harm)) %>%
  slice(1:10) %>%
  left_join(StormData_analysis_long, by = "EVTYPE_clean")  
ggplot(StormData_analysis_long_top,
       aes(x = reorder(EVTYPE_clean, Total_Harm),
           y = Count)) +
  geom_col(fill = "steelblue", position = position_dodge(width = 0.8)) +
  facet_wrap(~Outcome, scales = "free_x") +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Weather Events",
    x = "Event Type",
    y = "Total Count"
  ) +
  theme_minimal(base_size = 13) +
  theme(axis.text.y = element_text(size = 11))

Results and Discussion

Analysis of severe weather events reported in the United States between 1990 and 2011 reveals distinct patterns in the types of hazards most associated with population health and economic loss.

Tornadoes consistently rank as the most harmful event type in terms of human health, accounting for the highest number of both fatalities and injuries. This reflects their sudden onset, concentrated intensity, and frequent occurrence in populated areas. Heat events also stand out, with a disproportionately high number of fatalities relative to injuries, suggesting that their impacts may be less visible but more deadly. A scatter plot comparing total fatalities and injuries highlights this contrast: tornadoes dominate the upper right quadrant (high injuries and moderate fatalities), while heat events occupy the lower right (high fatalities, moderate injuries). Other events such as lightning, floods, and thunderstorm winds cluster in the mid-range, indicating moderate but consistent health impacts.

In terms of economic damage, thunderstorm winds and tornadoes are the leading causes of property loss, while hail and drought are the most damaging to crops. A scatter plot comparing total property and crop damage shows thunderstorm winds positioned far to the right, reflecting their widespread impact on buildings and infrastructure, while hail appears high on the vertical axis, indicating its concentrated effect on agriculture. Floods and flash floods contribute significantly to both property and crop damage, often affecting large geographic areas and multiple sectors simultaneously.

The composite bar charts of the top 10 most harmful events reinforces these patterns. Tornadoes appear prominently across all categories except crop damage, while thunderstorm winds rank highest for property damage. Hail leads in crop damage, and flash floods rank highly for both fatalities and property loss. Events such as high wind, winter storm, and lightning show moderate impacts across categories, often affecting infrastructure, mobility, and public safety.

These patterns provide a data-driven foundation for understanding which event types have historically posed the greatest threats to human health and economic stability. They also offer a comparative lens for assessing the frequency, severity, and cross-sector impacts of different hazards, supporting strategic awareness and resource planning across jurisdictions.