Summary

This analysis uses the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database, covering 1950 to 2011, to identify which types of severe weather events have the greatest impact on population health and the greatest economic consequences across the United States.

The results show that tornadoes dominate population health impacts, while floods and hurricanes account for the largest economic losses.

These findings highlight the concentration of severe impacts among a relatively small number of event types and can support prioritization of preparedness and mitigation efforts.

Data Processing

The analysis begins by loading the raw NOAA Storm Database from the original CSV file provided for this assignment. No preprocessing is performed outside of this document. Only variables required to assess population health and economic impacts are retained, including event type, fatalities, injuries, and property and crop damage estimates.

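To reduce inconsistencies caused by differences in capitalization and whitespace, event type names are converted to uppercase and trimmed. No further consolidation of event labels is performed, so near-synonyms such as TSTM WIND and THUNDERSTORM WIND remain separate categories. Economic damage values are converted to U.S. dollars by multiplying each property and crop damage figure by the multiplier encoded in its exponent variable (PROPDMGEXP and CROPDMGEXP).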

Exponents representing thousands (K), millions (M), and billions (B) are converted to their corresponding numeric multipliers (1e3, 1e6, and 1e9), while single-digit numeric exponents are interpreted as powers of ten. Missing or empty exponent values, and any other unrecognized codes, are treated conservatively as a multiplier of one.
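The conversion rule can be illustrated on a toy vector. The sketch below is a minimal stand-alone illustration under the rules stated above, not the helper used later in this document; the names `code_to_multiplier`, `toy_codes`, and `toy_values` are hypothetical.

```r
# Hypothetical helper for illustration: map exponent codes to multipliers
code_to_multiplier <- function(codes) {
  codes <- toupper(trimws(codes))
  letters_map <- c(K = 1e3, M = 1e6, B = 1e9)

  # Default multiplier of one (covers empty, missing, and unrecognized codes)
  out <- rep(1, length(codes))

  # Letter codes: thousands, millions, billions
  is_letter <- codes %in% names(letters_map)
  out[is_letter] <- unname(letters_map[codes[is_letter]])

  # Single digits are read as powers of ten
  is_digit <- grepl("^[0-9]$", codes)
  out[is_digit] <- 10^as.numeric(codes[is_digit])

  out
}

# Toy inputs: a damage figure paired with each kind of exponent code
toy_codes <- c("K", "m", "B", "5", "", NA, "?")
toy_values <- c(25, 2.5, 1, 3, 10, 10, 10)
toy_values * code_to_multiplier(toy_codes)
```

With this mapping, a recorded value of 25 with exponent "K" becomes 25,000 dollars, while a value with an empty, missing, or unrecognized exponent is left unchanged.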

All impacts are aggregated by event type across the full time span and geographic coverage of the dataset.

Results

Impact on Population Health

Across the United States, tornadoes are by far the most harmful event type with respect to population health, with 5,633 fatalities and 91,346 injuries (96,979 combined). That total is more than eleven times the combined figure for the next event type, excessive heat (8,428).

Excessive heat, thunderstorm-related winds, floods, and lightning also contribute substantially to injuries and fatalities. These results indicate that both sudden high-impact events (such as tornadoes) and prolonged exposure events (such as heat) pose significant risks to public health.

Economic Consequences

Flooding events cause the greatest overall economic damage, about $150.3 billion, of which about $144.7 billion is property loss. Hurricanes and typhoons follow at about $71.9 billion under the HURRICANE/TYPHOON label (a separate HURRICANE label adds about $14.6 billion), reflecting their ability to cause widespread infrastructure damage and agricultural losses.

While tornadoes rank highest in terms of population health impacts, they rank third in total economic cost (about $57.4 billion), behind floods and hurricanes. Different types of severe weather events therefore pose different kinds of risks to communities.

Data Processing

Loading the data

knitr::opts_chunk$set(echo = TRUE)

library(ggplot2)
# The raw CSV file is stored locally in the working directory
file_name <- "repdata_data_StormData.csv"

# Read directly from the raw CSV file
storm <- read.csv(file_name, stringsAsFactors = FALSE)

# Keep only columns required for the analysis
keep_cols <- c(
  "EVTYPE", "FATALITIES", "INJURIES",
  "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"
)
storm <- storm[, keep_cols]

# Basic inspection
str(storm)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
summary(storm[, c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")])
##    FATALITIES           INJURIES            PROPDMG           CROPDMG       
##  Min.   :  0.00000   Min.   :   0.0000   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:  0.00000   1st Qu.:   0.0000   1st Qu.:   0.00   1st Qu.:  0.000  
##  Median :  0.00000   Median :   0.0000   Median :   0.00   Median :  0.000  
##  Mean   :  0.01678   Mean   :   0.1557   Mean   :  12.06   Mean   :  1.527  
##  3rd Qu.:  0.00000   3rd Qu.:   0.0000   3rd Qu.:   0.50   3rd Qu.:  0.000  
##  Max.   :583.00000   Max.   :1700.0000   Max.   :5000.00   Max.   :990.000

Data transformations

# Normalize event type names
storm$EVTYPE <- toupper(trimws(storm$EVTYPE))

# Helper function to convert exponent codes to numeric multipliers
exp_to_multiplier <- function(exp_vec) {
  exp_vec <- toupper(trimws(exp_vec))
  out <- rep(1, length(exp_vec))

  out[exp_vec == "K"] <- 1e3
  out[exp_vec == "M"] <- 1e6
  out[exp_vec == "B"] <- 1e9

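  # Single-digit codes are interpreted as powers of ten
  is_digit <- grepl("^[0-9]$", exp_vec)
  out[is_digit] <- 10^(as.numeric(exp_vec[is_digit]))

  # Missing and empty codes get a multiplier of one; any other
  # unrecognized code also keeps the default of one set above
  out[is.na(exp_vec) | exp_vec == ""] <- 1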
  out
}

# Convert property and crop damage to USD
storm$PROP_DMG_USD <- storm$PROPDMG * exp_to_multiplier(storm$PROPDMGEXP)
storm$CROP_DMG_USD <- storm$CROPDMG * exp_to_multiplier(storm$CROPDMGEXP)

# Combined health impact metric
storm$HEALTH_HARM <- storm$FATALITIES + storm$INJURIES

Results

Impact on population health

health_by_type <- aggregate(
  cbind(FATALITIES, INJURIES, HEALTH_HARM) ~ EVTYPE,
  data = storm,
  sum,
  na.rm = TRUE
)

health_top10 <- health_by_type[
  order(-health_by_type$HEALTH_HARM),
][1:10, ]

health_top10
##                EVTYPE FATALITIES INJURIES HEALTH_HARM
## 750           TORNADO       5633    91346       96979
## 108    EXCESSIVE HEAT       1903     6525        8428
## 771         TSTM WIND        504     6957        7461
## 146             FLOOD        470     6789        7259
## 410         LIGHTNING        816     5230        6046
## 235              HEAT        937     2100        3037
## 130       FLASH FLOOD        978     1777        2755
## 379         ICE STORM         89     1975        2064
## 677 THUNDERSTORM WIND        133     1488        1621
## 880      WINTER STORM        206     1321        1527
ggplot(health_top10, aes(x = reorder(EVTYPE, HEALTH_HARM), y = HEALTH_HARM)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Population Health Harm",
    subtitle = "Health harm = fatalities + injuries (NOAA Storm Data, 1950–2011)",
    x = "Event type",
    y = "Total fatalities and injuries"
  )
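
The table above lists TSTM WIND and THUNDERSTORM WIND as separate event types, so the totals for thunderstorm winds are split across two rows. A minimal sketch of how such labels might be merged before aggregation follows. It is not part of the analysis in this document; the function `merge_labels` and its single substitution rule are hypothetical, and the toy counts are copied from the two rows of the table.

```r
# Toy data: the two thunderstorm wind rows from the top-10 health table
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND"),
  HEALTH_HARM = c(7461, 1621),
  stringsAsFactors = FALSE
)

# Hypothetical rule: expand the "TSTM" abbreviation to "THUNDERSTORM"
merge_labels <- function(evtype) {
  gsub("\\bTSTM\\b", "THUNDERSTORM", evtype, perl = TRUE)
}

toy$EVTYPE <- merge_labels(toy$EVTYPE)

# Re-aggregate after merging the labels
toy_merged <- aggregate(HEALTH_HARM ~ EVTYPE, data = toy, sum)
toy_merged
```

On the toy rows, the two labels collapse into a single THUNDERSTORM WIND row. Applying a rule like this to the full dataset would change the rankings, so any such mapping should be stated explicitly alongside the results.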

Economic consequences

econ_by_type <- aggregate(
  cbind(PROP_DMG_USD, CROP_DMG_USD) ~ EVTYPE,
  data = storm,
  sum,
  na.rm = TRUE
)

econ_by_type$ECON_DMG_USD <- econ_by_type$PROP_DMG_USD + econ_by_type$CROP_DMG_USD

econ_top10 <- econ_by_type[
  order(-econ_by_type$ECON_DMG_USD),
][1:10, ]

econ_top10
##                EVTYPE PROP_DMG_USD CROP_DMG_USD ECON_DMG_USD
## 146             FLOOD 144657709807   5661968450 150319678257
## 364 HURRICANE/TYPHOON  69305840000   2607872800  71913712800
## 750           TORNADO  56947380677    414953270  57362333947
## 591       STORM SURGE  43323536000         5000  43323541000
## 204              HAIL  15735267018   3025954473  18761221491
## 130       FLASH FLOOD  16822723979   1421317100  18244041079
## 76            DROUGHT   1046106000  13972566000  15018672000
## 355         HURRICANE  11868319010   2741910000  14610229010
## 521       RIVER FLOOD   5118945500   5029459000  10148404500
## 379         ICE STORM   3944927860   5022113500   8967041360
ggplot(econ_top10, aes(x = reorder(EVTYPE, ECON_DMG_USD), y = ECON_DMG_USD)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    subtitle = "Property + crop damage (USD, NOAA Storm Data, 1950–2011)",
    x = "Event type",
    y = "Total damage (USD)"
  )
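
Totals in raw dollars are hard to read at this scale. As a hedged sketch, the values could be rescaled to billions of dollars before tabulating or plotting. The data frame `toy_econ` and the column `ECON_DMG_BILLION` are hypothetical names, and the three totals are copied from the table above.

```r
# Toy data: top three rows of the economic damage table
toy_econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  ECON_DMG_USD = c(150319678257, 71913712800, 57362333947),
  stringsAsFactors = FALSE
)

# Rescale to billions of USD and round to one decimal place
toy_econ$ECON_DMG_BILLION <- round(toy_econ$ECON_DMG_USD / 1e9, 1)

toy_econ[, c("EVTYPE", "ECON_DMG_BILLION")]
```

In the plot, the same scaling could be applied by mapping `y = ECON_DMG_USD / 1e9` and relabelling the axis as "Total damage (billion USD)".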