This analysis examines U.S. NOAA storm event records to identify
which event types are most harmful to population health and which create
the largest economic losses.
Population health impact is defined as the combined total of fatalities
and injuries.
Economic impact is defined as the combined total of property damage and
crop damage after converting NOAA damage exponents (for example K, M, B)
to numeric multipliers.
The analysis starts from the raw compressed CSV file and performs all
data processing in this document for reproducibility.
Results are summarized with ranked tables and two figures showing the
top 10 event types in each category.
Under these definitions, tornadoes are by far the largest health
burden (96,979 combined fatalities and injuries), while floods account
for the largest total economic losses (about $150.3 billion).
required_packages <- c("dplyr", "ggplot2", "knitr", "scales")
missing_packages <- required_packages[!sapply(required_packages, requireNamespace, quietly = TRUE)]
if (length(missing_packages) > 0) {
  stop(
    paste(
      "Please install missing packages before knitting:",
      paste(missing_packages, collapse = ", ")
    )
  )
}
library(dplyr)
library(ggplot2)
library(knitr)
library(scales)
# Start from the raw NOAA data file in the working directory
data_file <- "repdata_data_StormData.csv.bz2"
storm_raw <- read.csv(data_file, stringsAsFactors = FALSE)
# Convert NOAA exponent codes to numeric multipliers
exp_to_multiplier <- function(exp_code) {
  exp_code <- toupper(trimws(exp_code))
  # Default multiplier of 1 covers blank and unrecognized codes
  mult <- rep(1, length(exp_code))
  mult[exp_code == "H"] <- 1e2
  mult[exp_code == "K"] <- 1e3
  mult[exp_code == "M"] <- 1e6
  mult[exp_code == "B"] <- 1e9
  # A single digit d is read as a power of ten (10^d)
  is_digit <- grepl("^[0-9]$", exp_code)
  mult[is_digit] <- 10^as.numeric(exp_code[is_digit])
  mult
}
# Keep needed variables and create analysis metrics
storm <- storm_raw %>%
  transmute(
    EVTYPE = toupper(trimws(EVTYPE)),
    FATALITIES = as.numeric(FATALITIES),
    INJURIES = as.numeric(INJURIES),
    PROPDMG = as.numeric(PROPDMG),
    PROPDMGEXP = as.character(PROPDMGEXP),
    CROPDMG = as.numeric(CROPDMG),
    CROPDMGEXP = as.character(CROPDMGEXP)
  ) %>%
  mutate(
    health_impact = FATALITIES + INJURIES,
    property_damage = PROPDMG * exp_to_multiplier(PROPDMGEXP),
    crop_damage = CROPDMG * exp_to_multiplier(CROPDMGEXP),
    economic_impact = property_damage + crop_damage
  )
# Summarize impacts by event type
health_by_event <- storm %>%
  group_by(EVTYPE) %>%
  summarise(
    total_health_impact = sum(health_impact, na.rm = TRUE),
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_health_impact > 0) %>%
  arrange(desc(total_health_impact))
econ_by_event <- storm %>%
  group_by(EVTYPE) %>%
  summarise(
    total_economic_impact = sum(economic_impact, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(total_economic_impact > 0) %>%
  arrange(desc(total_economic_impact))
top_health <- head(health_by_event, 10)
top_econ <- head(econ_by_event, 10)
Data transformations were limited to two steps: event names were trimmed and uppercased for consistent grouping, and damage exponents were converted to numeric multipliers so that property and crop losses can be added on a common dollar scale. The codes H, K, M, and B map to 10^2, 10^3, 10^6, and 10^9, and a single digit d maps to 10^d. Blank or unrecognized exponent symbols are treated as a multiplier of 1, a conservative choice that leaves the recorded value unscaled.
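To make the exponent rule concrete, here is a minimal sketch that applies the same mapping to a handful of made-up records. The `example_damage` data frame and the `letter_multipliers` lookup are hypothetical names introduced for illustration, and the values are not taken from the NOAA file. The lookup is a compact variant of the conversion described above, not a replacement for it.

```r
# Illustrative records only; these values are NOT from the NOAA data file.
example_damage <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10, 3),
  PROPDMGEXP = c("K", "m", "B", "", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup for the letter codes handled in this report.
letter_multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Normalize the codes, then look up letters (non-letter codes give NA).
code <- toupper(trimws(example_damage$PROPDMGEXP))
multiplier <- unname(letter_multipliers[code])

# A single digit d is read as 10^d.
is_digit <- grepl("^[0-9]$", code)
multiplier[is_digit] <- 10^as.numeric(code[is_digit])

# Anything still unmatched (blank or unknown) falls back to 1.
multiplier[is.na(multiplier)] <- 1

example_damage$property_damage <- example_damage$PROPDMG * multiplier
print(example_damage)
```

With these inputs, the five rows work out to 25,000; 2,500,000; 1,200,000,000; 10; and 300 dollars.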
Across the United States, the event type with the largest population health impact is TORNADO with 96,979 combined fatalities and injuries.
kable(
  top_health,
  caption = "Top 10 event types by total population health impact (fatalities + injuries)."
)
| EVTYPE | total_health_impact | total_fatalities | total_injuries |
|---|---|---|---|
| TORNADO | 96979 | 5633 | 91346 |
| EXCESSIVE HEAT | 8428 | 1903 | 6525 |
| TSTM WIND | 7461 | 504 | 6957 |
| FLOOD | 7259 | 470 | 6789 |
| LIGHTNING | 6046 | 816 | 5230 |
| HEAT | 3037 | 937 | 2100 |
| FLASH FLOOD | 2755 | 978 | 1777 |
| ICE STORM | 2064 | 89 | 1975 |
| THUNDERSTORM WIND | 1621 | 133 | 1488 |
| WINTER STORM | 1527 | 206 | 1321 |
ggplot(top_health, aes(x = reorder(EVTYPE, total_health_impact), y = total_health_impact)) +
  geom_col(fill = "tomato") +
  coord_flip() +
  labs(
    x = "Event type",
    y = "Total fatalities + injuries",
    title = "Most Harmful Event Types for Population Health"
  ) +
  theme_minimal()
Top 10 event types by total population health impact in the NOAA storm dataset.
Across the United States, the event type with the greatest total economic consequence is FLOOD, with approximately $150.3 billion ($150,319,678,257) in combined property and crop damage.
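A headline figure like this can be derived from the summary table rather than typed by hand. The following is a minimal sketch, assuming a two-row stand-in for `top_econ` built from the first two rows of the table in this section. The names `top_econ_example`, `top_row`, `exact_usd`, and `billions_usd` are hypothetical and not part of the original analysis.

```r
# Stand-in for top_econ, using the first two rows of the reported table.
top_econ_example <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON"),
  total_economic_impact = c(150319678257, 71913712800),
  stringsAsFactors = FALSE
)

# Pick the row with the largest total.
top_row <- top_econ_example[which.max(top_econ_example$total_economic_impact), ]

# Exact dollar amount with thousands separators, and a rounded value in billions.
exact_usd <- paste0(
  "$",
  format(top_row$total_economic_impact, big.mark = ",", scientific = FALSE)
)
billions_usd <- sprintf("$%.1f billion", top_row$total_economic_impact / 1e9)

cat(top_row$EVTYPE, exact_usd, billions_usd, sep = "\n")
```

Reporting the rounded value alongside the exact total keeps the sentence readable while leaving the full figure available for checking against the table.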
kable(
  top_econ,
  caption = "Top 10 event types by total economic impact (property + crop damage)."
)
| EVTYPE | total_economic_impact |
|---|---|
| FLOOD | 150319678257 |
| HURRICANE/TYPHOON | 71913712800 |
| TORNADO | 57362333947 |
| STORM SURGE | 43323541000 |
| HAIL | 18761221986 |
| FLASH FLOOD | 18244041079 |
| DROUGHT | 15018672000 |
| HURRICANE | 14610229010 |
| RIVER FLOOD | 10148404500 |
| ICE STORM | 8967041360 |
ggplot(top_econ, aes(x = reorder(EVTYPE, total_economic_impact), y = total_economic_impact)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = dollar_format(prefix = "$", scale = 1e-9, suffix = "B")) +
  labs(
    x = "Event type",
    y = "Total economic damage (USD, billions)",
    title = "Event Types with the Greatest Economic Consequences"
  ) +
  theme_minimal()
Top 10 event types by total economic losses in the NOAA storm dataset.