Synopsis

Severe weather events can cause loss of life, injuries, and large economic losses. Using the NOAA Storm Database (1950–November 2011), I identify which event types are most harmful to population health and which have the greatest economic consequences. Early years in the database contain fewer recorded events, so results are reported as totals across the full period, with the caveat that data completeness improves in later decades. Event names were standardized toward the National Weather Service’s permitted list (48 event types) to reduce miscoding and duplication. Health impact was measured as fatalities + injuries; economic impact was measured as the sum of property and crop damage after converting magnitude exponents (K/M/B). The analysis shows that tornadoes dominate total injuries and fatalities, while floods and hurricanes/typhoons account for the largest total economic losses. Plots of the top 10 event types for each outcome are provided. All code is shown for full reproducibility.

Data Processing

This analysis starts from the original compressed CSV (repdata_data_StormData.csv.bz2). No preprocessing was done outside this document. The Storm Events database is compiled by NWS and archived by NCDC/NOAA; updates typically lag event months by ~90–120 days, and some inputs come from varied sources with possible uncertainty. Event type standardization follows the NWS Instruction 10-1605 permitted table of event names.

#| echo: true
# Load packages
suppressPackageStartupMessages({
  library(dplyr)
  library(readr)
  library(stringr)
  library(ggplot2)
})

# Download (if needed) and load data ----
data_url <- "https://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
data_file <- "repdata_data_StormData.csv.bz2"

if (!file.exists(data_file)) {
  download.file(data_url, destfile = data_file, mode = "wb", quiet = TRUE)
}

# Read with base read.csv for strict compatibility
# (Using bzfile so we start from the raw compressed CSV, as required)
df <- read.csv(bzfile(data_file), stringsAsFactors = FALSE)

# Keep only columns we need for this assignment
df <- df %>%
  select(BGN_DATE, EVTYPE, FATALITIES, INJURIES,
         PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
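
The synopsis notes that early years contain fewer recorded events. A minimal sketch of how that could be checked is to count records per decade from BGN_DATE. The sample strings below are hypothetical, and the "month/day/year time" layout is an assumption about how the column is stored, so the format string may need adjusting.

```r
# Hypothetical sample of BGN_DATE strings (illustrative, not taken from the database);
# the "m/d/Y H:M:S" layout is an assumption
bgn <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
         "5/3/1999 0:00:00", "4/27/2011 0:00:00", "4/27/2011 0:00:00")

# Parse the date part (the trailing time is ignored) and extract the calendar year
yr <- as.integer(format(as.Date(bgn, format = "%m/%d/%Y"), "%Y"))

# Group years into decades and count records in each
decade <- (yr %/% 10) * 10
events_per_decade <- table(decade)
events_per_decade
```

Applied to the full data, the same steps would use df$BGN_DATE in place of the sample vector.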

Event Type Normalization

NOAA/NWS defines a permitted list of event names; historical data contain many variants (e.g., TSTM WIND, THUNDERSTORM WINDS, etc.). Below, I map common variants to the closest permitted type to reduce fragmentation of counts.
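
Because the mapping checks patterns in order and keeps the first match, specific patterns need to come before general ones. The base R sketch below illustrates the idea on a few labels. The rule table and the helper map_first_match are hypothetical names used only for this illustration; they are not the function used in the analysis.

```r
# Hypothetical mini rule table: regex pattern -> permitted name, checked top to bottom
rules <- c(
  "COASTAL FLOOD"     = "COASTAL FLOOD",
  "FLASH FLOOD"       = "FLASH FLOOD",
  "FLOOD"             = "FLOOD",
  "TSTM|THUNDERSTORM" = "THUNDERSTORM WIND"
)

# Assign each label the name of the first rule whose pattern it matches
map_first_match <- function(labels, rules) {
  labels <- toupper(trimws(labels))
  out <- labels                        # unmatched labels pass through unchanged
  done <- rep(FALSE, length(labels))
  for (i in seq_along(rules)) {
    hit <- !done & grepl(names(rules)[i], labels)
    out[hit] <- rules[[i]]
    done <- done | hit
  }
  out
}

mapped <- map_first_match(
  c("tstm wind", "THUNDERSTORM WINDS", "Coastal Flooding", "FLOOD/FLASH FLOOD"),
  rules
)
mapped
```

If the generic "FLOOD" rule were listed first, "Coastal Flooding" would be absorbed into FLOOD, which is why rule order matters.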

#| echo: true
# Helper: normalize exponents to multipliers
exp_to_mult <- function(x) {
  x <- toupper(trimws(as.character(x)))
  case_when(
    x %in% c("K") ~ 1e3,
    x %in% c("M") ~ 1e6,
    x %in% c("B") ~ 1e9,
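    # Digits are read as powers of ten; case_when evaluates this branch for every
    # value, so the coercion warnings raised by letters are silenced
    x %in% as.character(0:8) ~ suppressWarnings(10^as.numeric(x)),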
    TRUE ~ 1 # treat unknown/blank as 1 (conservative); see notes below
  )
}

# Helper: normalize event types toward NWS 48 categories (not exhaustive, but covers the bulk)
normalize_evtype <- function(x) {
  y <- toupper(x)
  y <- str_replace_all(y, "[^A-Z0-9 /()&-]", " ")
  y <- str_squish(y)

  dplyr::case_when(
    str_detect(y, "HURRICANE|TYPHOON") ~ "HURRICANE (TYPHOON)",
    str_detect(y, "TSTM|THUNDERSTORM") & str_detect(y, "MARINE") ~ "MARINE THUNDERSTORM WIND",
    str_detect(y, "THUNDERSTORM|TSTM") ~ "THUNDERSTORM WIND",
    str_detect(y, "TORNADO|TORNDAO|LANDSPOUT") ~ "TORNADO",
    str_detect(y, "WATERSPOUT") ~ "WATERSPOUT",
    str_detect(y, "FUNNEL") ~ "FUNNEL CLOUD",
    str_detect(y, "FLASH FLOOD") ~ "FLASH FLOOD",
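    # Specific flood types come before the generic FLOOD pattern, which would otherwise capture them
    str_detect(y, "COASTAL FLOOD|BEACH FLOOD|TIDAL FLOOD") ~ "COASTAL FLOOD",
    str_detect(y, "LAKESHORE FLOOD") ~ "LAKESHORE FLOOD",
    str_detect(y, "FLOOD|STREAM") ~ "FLOOD", # also covers RIVER FLOOD and URBAN FLOOD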
    str_detect(y, "STORM SURGE|STORM TIDE|COASTAL SURGE|SURGE/TIDE") ~ "STORM SURGE/TIDE",
    str_detect(y, "TSUNAMI") ~ "TSUNAMI",
    str_detect(y, "TROPICAL STORM") ~ "TROPICAL STORM",
    str_detect(y, "TROPICAL DEPRESSION") ~ "TROPICAL DEPRESSION",
    str_detect(y, "WINTER STORM") ~ "WINTER STORM",
    str_detect(y, "WINTER WEATHER|WINTRY") ~ "WINTER WEATHER",
    str_detect(y, "BLIZZARD") ~ "BLIZZARD",
    str_detect(y, "ICE STORM|GLAZE") ~ "ICE STORM",
    str_detect(y, "HEAVY SNOW") ~ "HEAVY SNOW",
    str_detect(y, "LAKE EFFECT SNOW|LAKE-EFFECT") ~ "LAKE-EFFECT SNOW",
    str_detect(y, "SLEET") ~ "SLEET",
    str_detect(y, "FREEZING FOG") ~ "FREEZING FOG",
    str_detect(y, "DENSE FOG|FOG") & !str_detect(y, "FREEZING") ~ "DENSE FOG",
    # Require EXTREME so that plain WIND CHILL falls through to COLD/WIND CHILL below
    str_detect(y, "EXTREME COLD|EXTREME WIND ?CHILL") ~ "EXTREME COLD/WIND CHILL",
    str_detect(y, "COLD|CHILL") ~ "COLD/WIND CHILL",
    str_detect(y, "HEAT WAVE|EXCESSIVE HEAT") ~ "EXCESSIVE HEAT",
    str_detect(y, "HEAT") ~ "HEAT",
    str_detect(y, "DROUGHT") ~ "DROUGHT",
    str_detect(y, "HAIL") & str_detect(y, "MARINE") ~ "MARINE HAIL",
    str_detect(y, "HAIL") ~ "HAIL",
    str_detect(y, "LIGHTNING") ~ "LIGHTNING",
    str_detect(y, "HIGH WIND") & str_detect(y, "MARINE") ~ "MARINE HIGH WIND",
    str_detect(y, "STRONG WIND") & str_detect(y, "MARINE") ~ "MARINE STRONG WIND",
    str_detect(y, "HIGH WIND") ~ "HIGH WIND",
    str_detect(y, "STRONG WIND|GUSTY WIND") ~ "STRONG WIND",
    str_detect(y, "HEAVY RAIN|TORRENTIAL|RAINSTORM") ~ "HEAVY RAIN",
    str_detect(y, "RIP CURRENT") ~ "RIP CURRENT",
    str_detect(y, "HIGH SURF|HEAVY SURF|ROUGH SURF|HAZARDOUS SURF") ~ "HIGH SURF",
    str_detect(y, "SEICHE") ~ "SEICHE",
    str_detect(y, "DUST DEVIL") ~ "DUST DEVIL",
    str_detect(y, "DUST STORM|BLOWING DUST|SAHARAN DUST") ~ "DUST STORM",
    str_detect(y, "AVALANCHE") ~ "AVALANCHE",
    str_detect(y, "DEBRIS FLOW|MUDSLIDE|MUD SLIDE|LANDSLIDE") ~ "DEBRIS FLOW",
    str_detect(y, "VOLCANIC ASH") ~ "VOLCANIC ASH",
    str_detect(y, "WILDFIRE|WILD FIRE|GRASS FIRE|FOREST FIRE") ~ "WILDFIRE",
    str_detect(y, "ASTRONOMICAL LOW TIDE") ~ "ASTRONOMICAL LOW TIDE",
    TRUE ~ y # fall back to original (will be filtered later if not in the 48 list)
  )
}

df <- df %>%
  mutate(
    EVTYPE_NORM = normalize_evtype(EVTYPE),
    prop_mult = exp_to_mult(PROPDMGEXP),
    crop_mult = exp_to_mult(CROPDMGEXP),
    PROP_DMG_USD = as.numeric(PROPDMG) * prop_mult,
    CROP_DMG_USD = as.numeric(CROPDMG) * crop_mult,
    ECON_DMG_USD = PROP_DMG_USD + CROP_DMG_USD,
    HEALTH_HARM = as.numeric(FATALITIES) + as.numeric(INJURIES)
  )

Note on exponents: NWS guidance uses K/M/B shorthand for thousands/millions/billions; some records contain digits or blanks. Digits 0–8 are read as powers of ten, and unknown or blank codes are treated as a multiplier of 1 (conservative). For flood events specifically, NWS guidance encourages damage estimation where possible.
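
As a worked illustration of the conversion, the base R sketch below applies the K/M/B multipliers to a few made-up records. The data frame toy and the lookup mult_lookup are hypothetical, and the sketch is simplified: it leaves out the digit codes.

```r
# Hypothetical toy records; the values are illustrative, not taken from the database
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.5, 10, 5),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Named lookup for the K/M/B shorthand
mult_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# Upper-case and trim each code, then look it up; codes with no match give NA
code <- toupper(trimws(toy$PROPDMGEXP))
mult <- unname(mult_lookup[code])

# Assumption carried over from the note above: unknown or blank codes count as 1
mult[is.na(mult)] <- 1

toy$PROP_DMG_USD <- toy$PROPDMG * mult
toy
```

Here 25 with code K becomes 25,000 USD, while the blank and "?" codes leave the reported number unchanged.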

To reduce noise, I keep only rows whose normalized type is one of the official 48 event types.

#| echo: true
permitted <- c(
  "ASTRONOMICAL LOW TIDE","AVALANCHE","BLIZZARD","COASTAL FLOOD",
  "COLD/WIND CHILL","DEBRIS FLOW","DENSE FOG","DENSE SMOKE","DROUGHT",
  "DUST DEVIL","DUST STORM","EXCESSIVE HEAT","EXTREME COLD/WIND CHILL",
  "FLASH FLOOD","FLOOD","FREEZING FOG","FROST/FREEZE","FUNNEL CLOUD",
  "HAIL","HEAT","HEAVY RAIN","HEAVY SNOW","HIGH SURF","HIGH WIND",
  "HURRICANE (TYPHOON)","ICE STORM","LAKESHORE FLOOD","LAKE-EFFECT SNOW",
  "LIGHTNING","MARINE HAIL","MARINE HIGH WIND","MARINE STRONG WIND",
  "MARINE THUNDERSTORM WIND","RIP CURRENT","SEICHE","SLEET","STORM SURGE/TIDE",
  "STRONG WIND","THUNDERSTORM WIND","TORNADO","TROPICAL DEPRESSION",
  "TROPICAL STORM","TSUNAMI","VOLCANIC ASH","WATERSPOUT","WILDFIRE",
  "WINTER STORM","WINTER WEATHER"
)

df_clean <- df %>% filter(EVTYPE_NORM %in% permitted)
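
Filtering to the permitted names discards every record the mapping did not cover, so it can help to measure how much is lost. The sketch below does this in base R on a hypothetical miniature data set. The names toy and permitted_toy and all values are made up for illustration.

```r
# Hypothetical miniature of the normalized data (values are illustrative only)
toy <- data.frame(
  EVTYPE_NORM = c("TORNADO", "FLOOD", "SUMMARY OF MAY 10", "OTHER", "HAIL"),
  HEALTH_HARM = c(12, 3, 0, 1, 0),
  stringsAsFactors = FALSE
)
permitted_toy <- c("TORNADO", "FLOOD", "HAIL")  # stand-in for the 48-name vector

kept <- toy$EVTYPE_NORM %in% permitted_toy

# Share of rows and share of recorded harm that the filter would discard
dropped_row_share  <- mean(!kept)
dropped_harm_share <- sum(toy$HEALTH_HARM[!kept]) / sum(toy$HEALTH_HARM)

# Most frequent unmatched labels, useful for deciding which patterns to add
unmatched <- sort(table(toy$EVTYPE_NORM[!kept]), decreasing = TRUE)

dropped_row_share
dropped_harm_share
unmatched
```

On the real data the same checks would use df$EVTYPE_NORM and permitted. A large dropped share of harm or damage would suggest extending the normalization patterns before filtering.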

Results

Which event types are most harmful to population health?

I compute total fatalities + injuries by normalized event type and show the top 10.

#| echo: true
health_top <- df_clean %>%
  group_by(EVTYPE_NORM) %>%
  summarise(total_health_harm = sum(HEALTH_HARM, na.rm = TRUE),
            fatalities = sum(FATALITIES, na.rm = TRUE),
            injuries = sum(INJURIES, na.rm = TRUE)) %>%
  arrange(desc(total_health_harm)) %>%
  slice_head(n = 10)

health_top

#| echo: true
# Plot 1: Top 10 event types by (fatalities + injuries)
# (ggplot2 is already loaded in the setup chunk)
ggplot(health_top, aes(x = reorder(EVTYPE_NORM, total_health_harm), y = total_health_harm)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Event Types by Total Injuries + Fatalities (1950–2011)",
       x = "Event Type", y = "Total Injuries + Fatalities",
       caption = "Source: NOAA Storm Database (downloaded per course site). Early years underreport events.") +
  theme_minimal(base_size = 12)

Finding: Tornadoes overwhelmingly account for the greatest harm to population health over the entire period, followed by excessive heat/heat and thunderstorm winds.

Which event types have the greatest economic consequences?

I compute total property + crop damages (USD) by normalized event type and show the top 10.

#| echo: true
econ_top <- df_clean %>%
  group_by(EVTYPE_NORM) %>%
  summarise(total_econ_usd = sum(ECON_DMG_USD, na.rm = TRUE),
            prop_usd = sum(PROP_DMG_USD, na.rm = TRUE),
            crop_usd = sum(CROP_DMG_USD, na.rm = TRUE)) %>%
  arrange(desc(total_econ_usd)) %>%
  slice_head(n = 10)

# Nicely print billions for readability
econ_top_print <- econ_top %>%
  mutate(across(ends_with("_usd"), ~ .x / 1e9, .names = "{.col}_billions"))

econ_top_print

#| echo: true
# Plot 2: Top 10 event types by total economic damage
ggplot(econ_top, aes(x = reorder(EVTYPE_NORM, total_econ_usd), y = total_econ_usd/1e9)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Event Types by Total Economic Damage (1950–2011)",
       x = "Event Type", y = "Total Damage (USD Billions)",
       caption = "Property + crop damage after K/M/B exponent conversion per NWS guidance.") +
  theme_minimal(base_size = 12)

Finding: Floods and hurricanes/typhoons cause the largest total economic losses, with storm surge/tide also prominent. Tornadoes and hail contribute substantially as well.

Discussion and Limitations

Totals cover the full 1950–2011 period, and early years contain fewer recorded events, so long-run totals reflect reporting coverage as well as actual impact. The event type mapping is not exhaustive, and records whose normalized type falls outside the 48 permitted names are dropped. Damage exponents that are blank or unrecognized are treated as a multiplier of 1, so some losses may be understated.

Reproducibility

All code is included with echo=TRUE. Start from the raw .csv.bz2 file as shown. Session info:

#| echo: true
sessionInfo()