This analysis explores the U.S. National Oceanic and Atmospheric
Administration (NOAA) Storm Database to identify which types of severe
weather events have the greatest impact on population health and the
greatest economic consequences in the United States between 1950 and
November 2011. Population health impact is measured by the total number
of fatalities and injuries attributed to each event type, while economic
impact is measured as the combined dollar value of property and crop
damage. Because the raw file stores each damage figure as a number plus
a separate character exponent code (PROPDMGEXP, CROPDMGEXP), these codes
are translated into numeric
multipliers before summation. The analysis shows that tornadoes are by
far the leading cause of weather-related fatalities and injuries in the
United States, while floods are the most economically damaging event
type, followed by hurricanes/typhoons, tornadoes, and storm surges.
These findings suggest that preparedness resources should prioritize
tornado early-warning systems for public health protection and flood
mitigation infrastructure for economic resilience.
The analysis begins directly from the raw
repdata_data_StormData.csv.bz2 file provided with the
assignment. The file is downloaded (if not already present) and read
directly with read.csv, which handles bzip2 decompression
transparently.
data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
data_file <- "StormData.csv.bz2"
if (!file.exists(data_file)) {
  download.file(data_url, destfile = data_file, mode = "wb")
}
storm <- read.csv(data_file, stringsAsFactors = FALSE)
dim(storm)
## [1] 902297 37
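As a quick illustration of the transparent decompression mentioned above, the sketch below writes a tiny bzip2-compressed CSV to a temporary file and reads it back with read.csv. The toy rows and the names tmp and demo are hypothetical and unrelated to the NOAA file.

```r
# Write a tiny bzip2-compressed CSV (toy rows, not NOAA data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() is given only the path; the compression is detected on read
demo <- read.csv(tmp, stringsAsFactors = FALSE)
print(demo)

# Remove the temporary file
unlink(tmp)
```

No explicit bzfile() call is needed on the reading side, which is why the report can pass the downloaded file name straight to read.csv.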
We need only the event type, the fatality and injury counts, and the property and crop damage fields with their exponent columns:
library(dplyr)
storm <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES,
         PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
The PROPDMGEXP and CROPDMGEXP columns
encode the order of magnitude of PROPDMG and
CROPDMG as single characters, matched case-insensitively:
"H" = hundred, "K" = thousand, "M" = million, "B" = billion. Numeric
characters "0"-"8" also appear in the raw
data; following the convention in the NOAA documentation we treat them
as powers of ten (so "3" means a multiplier of 10^3). Any other
character (including "?", "+", "-", or an empty string) is treated as a
multiplier of 1.
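exp_to_mult <- function(e) {
  e <- toupper(as.character(e))
  # case_when() evaluates every right-hand side on the whole vector, so
  # as.numeric() would warn on non-digit codes; those NAs are never selected
  digit_mult <- suppressWarnings(10 ^ as.numeric(e))
  dplyr::case_when(
    e == "B" ~ 1e9,
    e == "M" ~ 1e6,
    e == "K" ~ 1e3,
    e == "H" ~ 1e2,
    e %in% as.character(0:8) ~ digit_mult,
    TRUE ~ 1
  )
}
# Convert each damage figure to dollars and add the two components
storm <- storm %>%
  mutate(
    prop_damage = PROPDMG * exp_to_mult(PROPDMGEXP),
    crop_damage = CROPDMG * exp_to_mult(CROPDMGEXP),
    total_damage = prop_damage + crop_damage
  )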
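To make the exponent rules concrete, here is a small base-R sketch that applies the same mapping to a handful of made-up values. The helper to_mult and the vectors dmg and code are hypothetical names for illustration only; the function is a rewrite of the same rules with a lookup table, not the code used in the report.

```r
# Hypothetical toy values, not taken from the NOAA file
dmg  <- c(25, 2.5, 1, 5, 3, 7)
code <- c("K", "m", "B", "", "2", "?")

# Base-R version of the mapping: letters via a lookup table,
# digits 0-8 as powers of ten, everything else as 1
to_mult <- function(e) {
  e <- toupper(as.character(e))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- rep(1, length(e))
  is_letter <- e %in% names(lookup)
  mult[is_letter] <- lookup[e[is_letter]]
  is_digit <- e %in% as.character(0:8)
  mult[is_digit] <- 10 ^ as.numeric(e[is_digit])
  mult
}

dollars <- dmg * to_mult(code)
print(data.frame(dmg, code, dollars))
```

Under these rules 25 with code "K" becomes 25,000, 2.5 with code "m" becomes 2,500,000, 3 with code "2" becomes 300, and the values with an empty or unrecognised code are left unchanged.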
EVTYPE is free-text and contains many near-duplicates
(leading spaces, case differences, etc.). We apply a light cleanup:
uppercase and trim whitespace. We do not attempt a full canonicalisation
against the 48 official NOAA categories because the leading event type
in each ranking is robust to those differences. Lower ranks can shift,
however: labels such as TSTM WIND and THUNDERSTORM WIND, or HURRICANE
and HURRICANE/TYPHOON, are counted separately.
storm <- storm %>%
  mutate(EVTYPE = trimws(toupper(EVTYPE)))
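The effect and the limits of this light cleanup can be seen on a few made-up labels. The vector raw below is hypothetical; it only mimics the kinds of variants described above.

```r
# Hypothetical labels imitating the variants found in EVTYPE
raw <- c(" TORNADO", "tornado", "Tornado ", "TSTM WIND", "THUNDERSTORM WIND")

# Same cleanup as in the report: uppercase, then trim whitespace
clean <- trimws(toupper(raw))

# Case and whitespace variants collapse; true synonyms do not
print(unique(clean))
```

The three tornado spellings merge into one label, while TSTM WIND and THUNDERSTORM WIND stay distinct because merging them would require mapping labels onto the official categories.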
health <- storm %>%
  group_by(EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES, na.rm = TRUE),
            injuries = sum(INJURIES, na.rm = TRUE),
            total = fatalities + injuries,
            .groups = "drop") %>%
  arrange(desc(total))
economic <- storm %>%
  group_by(EVTYPE) %>%
  summarise(property = sum(prop_damage, na.rm = TRUE),
            crop = sum(crop_damage, na.rm = TRUE),
            total = property + crop,
            .groups = "drop") %>%
  arrange(desc(total))
We rank event types by the total number of people affected (fatalities + injuries). The table below shows the top 10:
top_health <- head(health, 10)
knitr::kable(top_health,
             caption = "Top 10 event types by total casualties (fatalities + injuries), 1950-2011")
| EVTYPE | fatalities | injuries | total |
|---|---|---|---|
| TORNADO | 5633 | 91346 | 96979 |
| EXCESSIVE HEAT | 1903 | 6525 | 8428 |
| TSTM WIND | 504 | 6957 | 7461 |
| FLOOD | 470 | 6789 | 7259 |
| LIGHTNING | 816 | 5230 | 6046 |
| HEAT | 937 | 2100 | 3037 |
| FLASH FLOOD | 978 | 1777 | 2755 |
| ICE STORM | 89 | 1975 | 2064 |
| THUNDERSTORM WIND | 133 | 1488 | 1621 |
| WINTER STORM | 206 | 1321 | 1527 |
library(ggplot2)
library(tidyr)
top_health_long <- top_health %>%
  select(EVTYPE, fatalities, injuries) %>%
  pivot_longer(c(fatalities, injuries),
               names_to = "type", values_to = "count")
ggplot(top_health_long,
       aes(x = reorder(EVTYPE, -count), y = count, fill = type)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::comma) +
  labs(x = "Event type",
       y = "Number of people",
       fill = "",
       title = "Top 10 severe weather event types by health impact") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))
Finding. Tornadoes are by far the most harmful event type with respect to population health: their 96,979 fatalities and injuries exceed the combined total of the other nine event types in the top 10 (40,198). Excessive heat, TSTM wind, floods, and lightning round out the top five.
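The comparison between tornadoes and the rest of the top 10 is simple arithmetic on the table above, and can be re-checked with a few lines of R. The vector totals below just copies the "total" column; its name is chosen for illustration.

```r
# "total" column of the top-10 casualty table, copied by hand
totals <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

tornado <- totals[["TORNADO"]]
others  <- sum(totals[names(totals) != "TORNADO"])

# Tornado casualties versus the other nine event types combined
print(c(tornado = tornado, others = others))
print(tornado > others)
```

The other nine rows sum to 40,198, well under half of the tornado figure.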
We rank event types by the total damage in U.S. dollars (property + crop). The table below shows the top 10:
top_econ <- head(economic, 10)
knitr::kable(
  top_econ %>%
    mutate(across(c(property, crop, total),
                  ~ scales::dollar(., scale = 1e-9, suffix = "B", accuracy = 0.1))),
  caption = "Top 10 event types by total property + crop damage, 1950-2011 (USD billions)")
| EVTYPE | property | crop | total |
|---|---|---|---|
| FLOOD | $144.7B | $5.7B | $150.3B |
| HURRICANE/TYPHOON | $69.3B | $2.6B | $71.9B |
| TORNADO | $56.9B | $0.4B | $57.4B |
| STORM SURGE | $43.3B | $0.0B | $43.3B |
| HAIL | $15.7B | $3.0B | $18.8B |
| FLASH FLOOD | $16.8B | $1.4B | $18.2B |
| DROUGHT | $1.0B | $14.0B | $15.0B |
| HURRICANE | $11.9B | $2.7B | $14.6B |
| RIVER FLOOD | $5.1B | $5.0B | $10.1B |
| ICE STORM | $3.9B | $5.0B | $9.0B |
top_econ_long <- top_econ %>%
  select(EVTYPE, property, crop) %>%
  pivot_longer(c(property, crop),
               names_to = "type", values_to = "damage")
ggplot(top_econ_long,
       aes(x = reorder(EVTYPE, -damage),
           y = damage / 1e9, fill = type)) +
  geom_col() +
  labs(x = "Event type",
       y = "Damage (USD billions)",
       fill = "",
       title = "Top 10 severe weather event types by economic impact") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 35, hjust = 1))
Finding. Floods are the most economically destructive event type, followed by hurricanes/typhoons, tornadoes, and storm surges. Property damage dominates the totals for most top event types; crop damage exceeds property damage only for droughts ($14.0B of $15.0B) and ice storms ($5.0B versus $3.9B).
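The balance between property and crop damage can be made explicit by computing the crop share of each row in the table above. The sketch below re-enters the rounded table values by hand (in USD billions), so the shares are approximate; the data frame econ and the column crop_share are names introduced here for illustration.

```r
# Rounded values in USD billions, copied from the top-10 damage table
econ <- data.frame(
  EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
               "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  property = c(144.7, 69.3, 56.9, 43.3, 15.7, 16.8, 1.0, 11.9, 5.1, 3.9),
  crop     = c(5.7, 2.6, 0.4, 0.0, 3.0, 1.4, 14.0, 2.7, 5.0, 5.0)
)

# Fraction of each event type's damage that is crop damage
econ$crop_share <- econ$crop / (econ$property + econ$crop)

# Event types ordered from most to least crop-dominated
print(econ[order(-econ$crop_share), c("EVTYPE", "crop_share")])
```

With these rounded inputs, drought has the highest crop share, ice storm is the only other row above one half, and river flood sits just below it.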
Across the United States over 1950-2011, tornadoes have the largest impact on population health while floods cause the greatest aggregate economic damage. A government manager responsible for allocating preparedness resources across both dimensions might therefore weight tornado warning/shelter programmes most heavily for life-safety concerns and flood mitigation (levees, drainage, insurance) most heavily for economic resilience.
sessionInfo()
## R version 4.5.2 (2025-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
##
## Matrix products: default
## LAPACK version 3.12.1
##
## locale:
## [1] LC_COLLATE=English_India.utf8 LC_CTYPE=English_India.utf8
## [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C
## [5] LC_TIME=English_India.utf8
##
## time zone: Asia/Calcutta
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidyr_1.3.2 ggplot2_4.0.2 dplyr_1.2.0
##
## loaded via a namespace (and not attached):
## [1] vctrs_0.7.2 cli_3.6.5 knitr_1.51 rlang_1.1.7
## [5] xfun_0.57 purrr_1.2.1 generics_0.1.4 S7_0.2.1
## [9] jsonlite_2.0.0 labeling_0.4.3 glue_1.8.0 htmltools_0.5.9
## [13] sass_0.4.10 scales_1.4.0 rmarkdown_2.31 grid_4.5.2
## [17] evaluate_1.0.5 jquerylib_0.1.4 tibble_3.3.1 fastmap_1.2.0
## [21] yaml_2.3.12 lifecycle_1.0.5 compiler_4.5.2 codetools_0.2-20
## [25] RColorBrewer_1.1-3 pkgconfig_2.0.3 farver_2.1.2 digest_0.6.39
## [29] R6_2.6.1 tidyselect_1.2.1 pillar_1.11.1 magrittr_2.0.4
## [33] bslib_0.10.0 gtable_0.3.6 tools_4.5.2 withr_3.0.2
## [37] cachem_1.1.0