This analysis examines the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database from 1950 to 2011 to identify which types of severe weather events have the greatest impacts on population health and economic outcomes across the United States.
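The results show that tornadoes dominate population health impacts, with 96,979 combined fatalities and injuries, while floods (about $150 billion in combined property and crop damage) and hurricanes account for the largest economic losses.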
These findings highlight the concentration of severe impacts among a relatively small number of event types and can support prioritization of preparedness and mitigation efforts.
The analysis begins by loading the raw NOAA Storm Database from the original CSV file provided for this assignment. No preprocessing is performed outside of this document. Only variables required to assess population health and economic impacts are retained, including event type, fatalities, injuries, and property and crop damage estimates.
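The column-selection step can be tried without the full file. The following is a minimal sketch on made-up CSV text; `csv_text`, `raw`, and `slim` are hypothetical names introduced for illustration, and the early check on column names is an added safeguard rather than part of the original analysis.

```r
# Toy CSV text (made up) standing in for the raw file
csv_text <- "STATE,EVTYPE,FATALITIES,INJURIES,REMARKS
AL,TORNADO,0,15,example
TX,HAIL,0,0,example"

raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep only the columns needed downstream; stop early if one is missing
keep_cols <- c("EVTYPE", "FATALITIES", "INJURIES")
stopifnot(all(keep_cols %in% names(raw)))
slim <- raw[, keep_cols]
str(slim)
```

Checking the column names before subsetting gives a clearer failure than a subscript error if the file layout ever differs.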
To reduce inconsistencies caused by differences in capitalization and whitespace, event type names are converted to uppercase and trimmed. Economic damage values are transformed into U.S. dollars using the property and crop damage exponent variables.
Exponents representing thousands (K), millions (M), and billions (B) are converted to their corresponding numeric multipliers, while single-digit numeric exponents are interpreted as powers of ten. Missing, empty, or otherwise unrecognized exponent values are treated conservatively as a multiplier of one.
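The conversion rule can be illustrated on a few made-up records. This is a minimal sketch using a lookup vector, not the helper defined in the analysis code; the name `multiplier_for` and the toy values are assumptions introduced for illustration.

```r
# Hypothetical helper (illustration only): map an exponent code to a multiplier
multiplier_for <- function(code) {
  code <- toupper(trimws(code))
  # Letter codes: thousands, millions, billions
  letter_map <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- rep(1, length(code))  # default multiplier of one
  is_letter <- code %in% names(letter_map)
  out[is_letter] <- unname(letter_map[code[is_letter]])
  # Single digits are read as powers of ten
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- 10^as.numeric(code[is_digit])
  out
}

# Toy records (made up): damage value and exponent code
dmg  <- c(25, 2.5, 1.2, 3, 7)
code <- c("K", "m", "B", "2", "")
usd  <- dmg * multiplier_for(code)
print(usd)
```

Under these rules the five toy records come to 25,000, 2,500,000, 1,200,000,000, 300, and 7 dollars respectively.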
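All impacts are aggregated by normalized event type across the full time span and geographic coverage of the dataset. Labels that differ in wording, such as TSTM WIND and THUNDERSTORM WIND or HURRICANE and HURRICANE/TYPHOON, are not merged and are counted as separate event types.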
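How normalization and aggregation interact can be seen on a small example. The sketch below uses a made-up data frame (`toy` is a hypothetical name) in which the same event appears with different capitalization and stray whitespace.

```r
# Toy data (made up) with inconsistent capitalization and whitespace
toy <- data.frame(
  EVTYPE     = c("Tornado", " TORNADO", "flood", "Flood ", "HAIL"),
  FATALITIES = c(1, 2, 0, 1, 0),
  INJURIES   = c(10, 5, 3, 0, 1),
  stringsAsFactors = FALSE
)

# Normalize labels so that variants collapse into one group
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Sum each impact column within event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```

Without the normalization step, the five rows would form five separate groups instead of three.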
Across the United States, tornadoes are by far the most harmful event type with respect to population health, accounting for the highest combined number of fatalities and injuries.
Excessive heat, thunderstorm-related winds, floods, and lightning also contribute substantially to injuries and fatalities. These results indicate that both sudden high-impact events (such as tornadoes) and prolonged exposure events (such as heat) pose significant risks to public health.
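To put the tornado figure in proportion, the combined totals reported in this document's top-ten health table can be expressed as shares. This is a rough sketch: the shares are relative to the ten leading event types only, not to the full dataset, and `health_harm` and `share_pct` are names introduced for illustration.

```r
# Combined fatalities and injuries for the ten leading event types,
# copied from the report's health table
health_harm <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

# Share of each event type within the top-ten total
share_pct <- round(100 * health_harm / sum(health_harm), 1)
print(share_pct)
```

By this measure tornadoes make up roughly 71% of the harm recorded among the ten leading event types.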
Flooding events cause the greatest overall economic damage, driven primarily by extensive property losses. Hurricanes and typhoons also account for substantial economic impacts, reflecting their ability to cause widespread infrastructure damage and agricultural losses.
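The split between property and crop losses can be checked directly from the figures in this document's economic table. The sketch below copies three rows from that table; `PROP_SHARE_PCT` is a column name introduced here for illustration.

```r
# Property and crop damage (USD) for selected event types,
# copied from the report's economic table
econ <- data.frame(
  EVTYPE       = c("FLOOD", "HURRICANE/TYPHOON", "DROUGHT"),
  PROP_DMG_USD = c(144657709807, 69305840000, 1046106000),
  CROP_DMG_USD = c(5661968450, 2607872800, 13972566000),
  stringsAsFactors = FALSE
)

econ$ECON_DMG_USD <- econ$PROP_DMG_USD + econ$CROP_DMG_USD

# Percentage of each total that comes from property damage
econ$PROP_SHARE_PCT <- round(100 * econ$PROP_DMG_USD / econ$ECON_DMG_USD, 1)
print(econ[, c("EVTYPE", "PROP_SHARE_PCT")])
```

Property losses account for well over 90% of the flood and hurricane/typhoon totals, whereas drought losses are almost entirely crop damage.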
While tornadoes rank highest in terms of population health impacts, they rank lower than floods and hurricanes in terms of total economic cost, illustrating that different types of severe weather events pose different kinds of risks to communities.
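The difference between the two rankings can be made explicit for the event types that appear in both of this document's top-ten tables. This is an illustrative sketch only: the ranks are computed among these four rows, not across the whole dataset, and `both` is a hypothetical name.

```r
# Totals copied from the report's two top-ten tables for the event types
# that appear in both lists
both <- data.frame(
  EVTYPE       = c("TORNADO", "FLOOD", "FLASH FLOOD", "ICE STORM"),
  HEALTH_HARM  = c(96979, 7259, 2755, 2064),
  ECON_DMG_USD = c(57362333947, 150319678257, 18244041079, 8967041360),
  stringsAsFactors = FALSE
)

# Rank 1 = largest impact; ranks are relative to these four rows only
both$HEALTH_RANK <- rank(-both$HEALTH_HARM)
both$ECON_RANK   <- rank(-both$ECON_DMG_USD)
print(both[order(both$HEALTH_RANK), c("EVTYPE", "HEALTH_RANK", "ECON_RANK")])
```

Within this subset, tornadoes and floods swap places between the two metrics, while flash floods and ice storms keep the same relative order.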
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
# The raw CSV file is stored locally in the working directory
file_name <- "repdata_data_StormData.csv"
# Read directly from the raw CSV file
storm <- read.csv(file_name, stringsAsFactors = FALSE)
# Keep only columns required for the analysis
keep_cols <- c(
  "EVTYPE", "FATALITIES", "INJURIES",
  "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"
)
storm <- storm[, keep_cols]
# Basic inspection
str(storm)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
summary(storm[, c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")])
## FATALITIES INJURIES PROPDMG CROPDMG
## Min. : 0.00000 Min. : 0.0000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.00000 1st Qu.: 0.0000 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 0.00000 Median : 0.0000 Median : 0.00 Median : 0.000
## Mean : 0.01678 Mean : 0.1557 Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :583.00000 Max. :1700.0000 Max. :5000.00 Max. :990.000
# Normalize event type names
storm$EVTYPE <- toupper(trimws(storm$EVTYPE))
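# Helper function to convert exponent codes to numeric multipliers.
# Codes other than K, M, B, or a single digit (including empty or
# missing values) keep the default multiplier of 1.
exp_to_multiplier <- function(exp_vec) {
  exp_vec <- toupper(trimws(exp_vec))
  out <- rep(1, length(exp_vec))
  out[exp_vec == "K"] <- 1e3  # thousands
  out[exp_vec == "M"] <- 1e6  # millions
  out[exp_vec == "B"] <- 1e9  # billions
  # Single digits are interpreted as powers of ten
  is_digit <- grepl("^[0-9]$", exp_vec)
  out[is_digit] <- 10^(as.numeric(exp_vec[is_digit]))
  # Explicitly keep missing or empty codes at 1
  out[is.na(exp_vec) | exp_vec == ""] <- 1
  out
}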
# Convert property and crop damage to USD
storm$PROP_DMG_USD <- storm$PROPDMG * exp_to_multiplier(storm$PROPDMGEXP)
storm$CROP_DMG_USD <- storm$CROPDMG * exp_to_multiplier(storm$CROPDMGEXP)
# Combined health impact metric
storm$HEALTH_HARM <- storm$FATALITIES + storm$INJURIES
health_by_type <- aggregate(
  cbind(FATALITIES, INJURIES, HEALTH_HARM) ~ EVTYPE,
  data = storm,
  sum,
  na.rm = TRUE
)
health_top10 <- health_by_type[
  order(-health_by_type$HEALTH_HARM),
][1:10, ]
health_top10
##                EVTYPE FATALITIES INJURIES HEALTH_HARM
## 750           TORNADO       5633    91346       96979
## 108    EXCESSIVE HEAT       1903     6525        8428
## 771         TSTM WIND        504     6957        7461
## 146             FLOOD        470     6789        7259
## 410         LIGHTNING        816     5230        6046
## 235              HEAT        937     2100        3037
## 130       FLASH FLOOD        978     1777        2755
## 379         ICE STORM         89     1975        2064
## 677 THUNDERSTORM WIND        133     1488        1621
## 880      WINTER STORM        206     1321        1527
ggplot(health_top10, aes(x = reorder(EVTYPE, HEALTH_HARM), y = HEALTH_HARM)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Population Health Harm",
    subtitle = "Health harm = fatalities + injuries (NOAA Storm Data, 1950–2011)",
    x = "Event type",
    y = "Total fatalities and injuries"
  )
econ_by_type <- aggregate(
  cbind(PROP_DMG_USD, CROP_DMG_USD) ~ EVTYPE,
  data = storm,
  sum,
  na.rm = TRUE
)
econ_by_type$ECON_DMG_USD <- econ_by_type$PROP_DMG_USD + econ_by_type$CROP_DMG_USD
econ_top10 <- econ_by_type[
  order(-econ_by_type$ECON_DMG_USD),
][1:10, ]
econ_top10
##                EVTYPE PROP_DMG_USD CROP_DMG_USD ECON_DMG_USD
## 146             FLOOD 144657709807   5661968450 150319678257
## 364 HURRICANE/TYPHOON  69305840000   2607872800  71913712800
## 750           TORNADO  56947380677    414953270  57362333947
## 591       STORM SURGE  43323536000         5000  43323541000
## 204              HAIL  15735267018   3025954473  18761221491
## 130       FLASH FLOOD  16822723979   1421317100  18244041079
## 76            DROUGHT   1046106000  13972566000  15018672000
## 355         HURRICANE  11868319010   2741910000  14610229010
## 521       RIVER FLOOD   5118945500   5029459000  10148404500
## 379         ICE STORM   3944927860   5022113500   8967041360
ggplot(econ_top10, aes(x = reorder(EVTYPE, ECON_DMG_USD), y = ECON_DMG_USD)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    subtitle = "Property + crop damage (USD, NOAA Storm Data, 1950–2011)",
    x = "Event type",
    y = "Total damage (USD)"
  )