This report analyzes the U.S. National Oceanic and Atmospheric
Administration (NOAA) Storm Events Database to identify which event
types are most harmful to population health and which have the greatest
economic consequences across the United States. We start from
the raw CSV repdata_data_StormData.csv (no
external pre-processing). We compute health impacts (fatalities,
injuries) and standardized economic losses (property + crop). We show
all R code (echo = TRUE) and cache heavy steps for speed.
Our results show that a small number of event types account
for a large share of adverse outcomes: tornadoes
dominate both injuries and fatalities, while floods,
hurricanes/typhoons, and tornadoes account for the largest
economic losses, followed by storm surges. These findings can help public officials prioritize
preparedness and mitigation resources.
# Expect these to be installed already. If not, run the install script first.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(stringr)
library(readr)
library(forcats)
library(tidyr)
library(knitr)
csv_path <- "repdata_data_StormData.csv"
stopifnot(file.exists(csv_path))
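# Read the raw CSV; column types are guessed by readr.
storm_raw <- readr::read_csv(
  csv_path,
  show_col_types = FALSE,
  progress = FALSE
)
# Keep only the columns needed for the health and economic summaries.
storm <- storm_raw %>%
  dplyr::select(
    BGN_DATE, STATE, EVTYPE,
    FATALITIES, INJURIES,
    PROPDMG, PROPDMGEXP,
    CROPDMG, CROPDMGEXP
  )
rm(storm_raw)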
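Type guessing may matter for sparsely populated fields such as the exponent codes. As a minimal sketch (not the approach used above, which relies on readr's defaults), one could declare the needed columns explicitly with `cols_only()`. The toy CSV, its rows, and the name `toy_path` are hypothetical stand-ins for the real file.

```r
library(readr)

# Hypothetical toy file standing in for repdata_data_StormData.csv.
toy_path <- tempfile(fileext = ".csv")
writeLines(c(
  "BGN_DATE,STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,REMARKS",
  "1/1/2000 0:00:00,AA,TORNADO,0,15,25,K,0,,none",
  "1/2/2000 0:00:00,BB,FLOOD,1,0,2.5,M,10,K,none"
), toy_path)

# cols_only() reads just the listed columns and fixes their types,
# so the exponent codes are always parsed as text.
toy <- read_csv(
  toy_path,
  col_types = cols_only(
    BGN_DATE   = col_character(),
    STATE      = col_character(),
    EVTYPE     = col_character(),
    FATALITIES = col_double(),
    INJURIES   = col_double(),
    PROPDMG    = col_double(),
    PROPDMGEXP = col_character(),
    CROPDMG    = col_double(),
    CROPDMGEXP = col_character()
  ),
  progress = FALSE
)

str(toy)
```

Columns not listed (here the hypothetical `REMARKS`) are skipped at read time, which has the same effect as the `select()` step above.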
The following table describes the original variables used in this analysis and the derived fields we create. Descriptions follow NOAA documentation conventions and common usage in the Storm Events dataset.
var_dict <- tibble::tibble(
  Variable = c(
    "BGN_DATE","STATE","EVTYPE",
    "FATALITIES","INJURIES",
    "PROPDMG","PROPDMGEXP",
    "CROPDMG","CROPDMGEXP",
    "prop_mult","crop_mult",
    "prop_loss","crop_loss","total_loss"
  ),
  Type = c(
    "Date/Time","Factor/Character","Factor/Character",
    "Numeric","Numeric",
    "Numeric","Character",
    "Numeric","Character",
    "Numeric","Numeric",
    "Numeric (USD)","Numeric (USD)","Numeric (USD)"
  ),
  Description = c(
    "Date the event began.",
    "U.S. state/territory code.",
    "Event type label as recorded (e.g., TORNADO, FLOOD).",
    "Number of deaths directly/indirectly attributable to the event.",
    "Number of injuries directly/indirectly attributable to the event.",
    "Property damage base amount before exponent.",
    "Exponent for property damage: H=10^2, K=10^3, M=10^6, B=10^9, digits=10^digit, others treated as 1.",
    "Crop damage base amount before exponent.",
    "Exponent for crop damage with same convention as PROPDMGEXP.",
    "Multiplier derived from PROPDMGEXP.",
    "Multiplier derived from CROPDMGEXP.",
    "Standardized property damage in USD (PROPDMG * prop_mult).",
    "Standardized crop damage in USD (CROPDMG * crop_mult).",
    "Total economic loss = prop_loss + crop_loss in USD."
  )
)
kable(var_dict, caption = "Key variables and derived fields used in this analysis.")
| Variable | Type | Description |
|---|---|---|
| BGN_DATE | Date/Time | Date the event began. |
| STATE | Factor/Character | U.S. state/territory code. |
| EVTYPE | Factor/Character | Event type label as recorded (e.g., TORNADO, FLOOD). |
| FATALITIES | Numeric | Number of deaths directly/indirectly attributable to the event. |
| INJURIES | Numeric | Number of injuries directly/indirectly attributable to the event. |
| PROPDMG | Numeric | Property damage base amount before exponent. |
| PROPDMGEXP | Character | Exponent for property damage: H=10^2, K=10^3, M=10^6, B=10^9, digits=10^digit, others treated as 1. |
| CROPDMG | Numeric | Crop damage base amount before exponent. |
| CROPDMGEXP | Character | Exponent for crop damage with same convention as PROPDMGEXP. |
| prop_mult | Numeric | Multiplier derived from PROPDMGEXP. |
| crop_mult | Numeric | Multiplier derived from CROPDMGEXP. |
| prop_loss | Numeric (USD) | Standardized property damage in USD (PROPDMG * prop_mult). |
| crop_loss | Numeric (USD) | Standardized crop damage in USD (CROPDMG * crop_mult). |
| total_loss | Numeric (USD) | Total economic loss = prop_loss + crop_loss in USD. |
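For transparency, we apply only minimal standardization to EVTYPE: convert to upper case, trim leading and trailing whitespace, and collapse repeated internal spaces. Near-duplicate labels (for example, TSTM WIND and THUNDERSTORM WIND) are not merged, so they are tallied separately in the tables below.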
storm <- storm %>%
  mutate(EVTYPE = stringr::str_squish(stringr::str_to_upper(as.character(EVTYPE))))
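To make the effect of this cleaning step concrete, here is a small sketch on hypothetical raw labels (the vector `raw_labels` is invented for illustration and is not taken from the dataset).

```r
library(stringr)

# Hypothetical raw labels with stray spaces and mixed case.
raw_labels <- c("  Tstm  Wind ", "TSTM WIND", "tornado", "Flash   Flood")

# Same transformation as above: upper-case, then trim and collapse whitespace.
clean_labels <- str_squish(str_to_upper(raw_labels))

print(clean_labels)
print(length(unique(raw_labels)))    # distinct labels before cleaning
print(length(unique(clean_labels)))  # distinct labels after cleaning
```

Labels that differ only in case or spacing collapse into one group, while labels that differ in wording remain separate.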
health_by_event <- storm %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(FATALITIES, na.rm = TRUE),
    injuries = sum(INJURIES, na.rm = TRUE),
    health_harm = fatalities + injuries,
    .groups = "drop"
  ) %>%
  filter(health_harm > 0)
top_health <- health_by_event %>%
  arrange(desc(health_harm)) %>%
  slice_head(n = 10)
knitr::kable(top_health, caption = "Top 10 event types by combined health harm (fatalities + injuries).")
| EVTYPE | fatalities | injuries | health_harm |
|---|---|---|---|
| TORNADO | 5633 | 91346 | 96979 |
| EXCESSIVE HEAT | 1903 | 6525 | 8428 |
| TSTM WIND | 504 | 6957 | 7461 |
| FLOOD | 470 | 6789 | 7259 |
| LIGHTNING | 816 | 5230 | 6046 |
| HEAT | 937 | 2100 | 3037 |
| FLASH FLOOD | 978 | 1777 | 2755 |
| ICE STORM | 89 | 1975 | 2064 |
| THUNDERSTORM WIND | 133 | 1488 | 1621 |
| WINTER STORM | 206 | 1321 | 1527 |
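The table lists TSTM WIND and THUNDERSTORM WIND as separate rows, although the labels appear to describe the same hazard. As a rough sketch of how much this could matter, the snippet below merges the two using a hypothetical recode map (`label_map` is an assumption and not part of the analysis above), applied to rows copied from the table.

```r
library(dplyr)

# Rows copied from the top-10 health table above.
top_rows <- tibble::tibble(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "THUNDERSTORM WIND"),
  fatalities = c(5633, 1903, 504, 133),
  injuries   = c(91346, 6525, 6957, 1488)
)

# Hypothetical mapping from an abbreviated label to a canonical one (an assumption).
label_map <- c("TSTM WIND" = "THUNDERSTORM WIND")

merged <- top_rows %>%
  mutate(EVTYPE = ifelse(EVTYPE %in% names(label_map), label_map[EVTYPE], EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(fatalities),
    injuries   = sum(injuries),
    .groups    = "drop"
  ) %>%
  mutate(health_harm = fatalities + injuries) %>%
  arrange(desc(health_harm))

print(merged)
```

With these four rows, the merged wind category totals 9,082 (637 fatalities plus 8,445 injuries), which would place it above EXCESSIVE HEAT at 8,428. A full treatment would need a reviewed mapping for every label, not just this pair.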
NOAA stores damage amounts as a base and an exponent. We convert the exponent fields into numeric multipliers and compute standardized USD losses.
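# Convert NOAA exponent codes (H, K, M, B, or a single digit) into numeric multipliers.
# Any other code (blank, missing, or unrecognized) keeps the default multiplier of 1.
exp_to_multiplier <- function(x) {
  x <- toupper(trimws(as.character(x)))
  m <- rep(1, length(x))
  m[x %in% c("H")] <- 1e2
  m[x %in% c("K")] <- 1e3
  m[x %in% c("M")] <- 1e6
  m[x %in% c("B")] <- 1e9
  # Single digits are read as powers of ten, matching the variable table.
  is_digit <- grepl("^[0-9]$", x)
  m[is_digit] <- 10 ^ as.numeric(x[is_digit])
  m
}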
storm <- storm %>%
  mutate(
    prop_mult = exp_to_multiplier(PROPDMGEXP),
    crop_mult = exp_to_multiplier(CROPDMGEXP),
    prop_loss = as.numeric(PROPDMG) * prop_mult,
    crop_loss = as.numeric(CROPDMG) * crop_mult,
    total_loss = prop_loss + crop_loss
  )
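As a worked illustration of the base-times-multiplier convention, the sketch below uses a lookup-table variant of the conversion on a few toy records. The names `mult_lookup` and `to_multiplier` and the toy values are hypothetical; the convention (H, K, M, B, digits as powers of ten, everything else 1) follows the variable table above.

```r
# Lookup-table variant of the exponent conversion (illustrative only).
mult_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

to_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  mult <- unname(mult_lookup[code])           # NA for codes not in the lookup
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10 ^ as.numeric(code[is_digit])
  mult[is.na(mult)] <- 1                      # blanks, NA, and unrecognized codes fall back to 1
  mult
}

# Toy records: base amount plus exponent code.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 5, 10),
  PROPDMGEXP = c("K", "m", "B", "3", NA)
)
toy$prop_loss <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP)

print(toy)
```

For example, a base amount of 25 with code K becomes 25,000 USD, and 2.5 with code m (case-insensitive) becomes 2,500,000 USD.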
econ_by_event <- storm %>%
  group_by(EVTYPE) %>%
  summarise(
    property_damage = sum(prop_loss, na.rm = TRUE),
    crop_damage = sum(crop_loss, na.rm = TRUE),
    economic_loss = sum(total_loss, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(economic_loss > 0)
top_econ <- econ_by_event %>%
  arrange(desc(economic_loss)) %>%
  slice_head(n = 10)
knitr::kable(top_econ, caption = "Top 10 event types by total economic loss (property + crop).")
| EVTYPE | property_damage | crop_damage | economic_loss |
|---|---|---|---|
| FLOOD | 144657709807 | 5661968450 | 150319678257 |
| HURRICANE/TYPHOON | 69305840000 | 2607872800 | 71913712800 |
| TORNADO | 56947380677 | 414953270 | 57362333947 |
| STORM SURGE | 43323536000 | 5000 | 43323541000 |
| HAIL | 15735267513 | 3025954473 | 18761221986 |
| FLASH FLOOD | 16822723979 | 1421317100 | 18244041079 |
| DROUGHT | 1046106000 | 13972566000 | 15018672000 |
| HURRICANE | 11868319010 | 2741910000 | 14610229010 |
| RIVER FLOOD | 5118945500 | 5029459000 | 10148404500 |
| ICE STORM | 3944927860 | 5022113500 | 8967041360 |
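The raw dollar figures are hard to read at a glance. As a small sketch, the snippet below re-expresses the top three rows in billions of USD and computes the crop share of each total. The values are copied from the table above; the column names `loss_billions` and `crop_pct` are hypothetical additions.

```r
# Top three rows copied from the economic-loss table above (USD).
top3 <- data.frame(
  EVTYPE          = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  property_damage = c(144657709807, 69305840000, 56947380677),
  crop_damage     = c(5661968450, 2607872800, 414953270)
)

# Recompute the total and express it in billions, rounded to two decimals.
top3$economic_loss <- top3$property_damage + top3$crop_damage
top3$loss_billions <- round(top3$economic_loss / 1e9, 2)

# Share of each row's loss that comes from crops, in percent.
top3$crop_pct <- round(100 * top3$crop_damage / top3$economic_loss, 1)

print(top3[, c("EVTYPE", "loss_billions", "crop_pct")])
```

The recomputed totals match the table: roughly 150.32, 71.91, and 57.36 billion USD, with property damage making up the bulk of each.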
health_long <- health_by_event %>%
  semi_join(top_health, by = "EVTYPE") %>%
  select(EVTYPE, fatalities, injuries) %>%
  tidyr::pivot_longer(cols = c(fatalities, injuries),
                      names_to = "metric", values_to = "count") %>%
  mutate(EVTYPE = forcats::fct_reorder(EVTYPE, count, sum))
ggplot(health_long, aes(x = EVTYPE, y = count, fill = metric)) +
  geom_col() +
  coord_flip() +
  labs(
    x = "Event Type (EVTYPE)",
    y = "People Affected",
    fill = "Health Metric",
    title = "Top 10 Event Types by Health Harm (Fatalities + Injuries)"
  ) +
  theme_minimal(base_size = 12)
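The bar order in the plot comes from `fct_reorder(EVTYPE, count, sum)`, which sorts the factor levels by each event type's summed count. A minimal sketch of that ordering, using three rows copied from the health table above (the data frame name `toy_long` is hypothetical):

```r
library(forcats)

# Rows copied from the health table above, in long form.
toy_long <- data.frame(
  EVTYPE = rep(c("EXCESSIVE HEAT", "FLOOD", "TORNADO"), each = 2),
  metric = rep(c("fatalities", "injuries"), times = 3),
  count  = c(1903, 6525, 470, 6789, 5633, 91346)
)

# Order the factor levels by the per-event sum of `count` (ascending).
toy_long$EVTYPE <- fct_reorder(toy_long$EVTYPE, toy_long$count, sum)

print(levels(toy_long$EVTYPE))
```

The levels come out as FLOOD, EXCESSIVE HEAT, TORNADO. Because `coord_flip()` draws the last level at the top, the event type with the largest combined total appears first when reading the chart from top to bottom.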
econ_long <- econ_by_event %>%
  semi_join(top_econ, by = "EVTYPE") %>%
  select(EVTYPE, property_damage, crop_damage) %>%
  tidyr::pivot_longer(cols = c(property_damage, crop_damage),
                      names_to = "type", values_to = "usd") %>%
  mutate(EVTYPE = forcats::fct_reorder(EVTYPE, usd, sum))
ggplot(econ_long, aes(x = EVTYPE, y = usd/1e9, fill = type)) +
  geom_col() +
  coord_flip() +
  labs(
    x = "Event Type (EVTYPE)",
    y = "Economic Loss (Billions of USD)",
    fill = "Damage Type",
    title = "Top 10 Event Types by Economic Loss (Property + Crop)"
  ) +
  theme_minimal(base_size = 12)
Below we give a brief, decision-oriented interpretation of the results. The text is generated from the computed top categories, so it stays consistent with the tables above.
# Extract top leaders for dynamic text
top_health_leader <- top_health$EVTYPE[1]
top_econ_leader <- top_econ$EVTYPE[1]
health_share_top3 <- health_by_event %>%
  arrange(desc(health_harm)) %>%
  slice_head(n = 3) %>%
  summarise(share = sum(health_harm) / sum(health_by_event$health_harm)) %>%
  pull(share)
econ_share_top3 <- econ_by_event %>%
  arrange(desc(economic_loss)) %>%
  slice_head(n = 3) %>%
  summarise(share = sum(economic_loss) / sum(econ_by_event$economic_loss)) %>%
  pull(share)
cat(sprintf("**Health impacts.** `%s` is the single most harmful event type when combining fatalities and injuries. The top three event types account for roughly %.1f%% of total health harm, indicating a highly skewed risk distribution.\n\n",
            top_health_leader, 100*health_share_top3))
Health impacts. TORNADO is the single
most harmful event type when combining fatalities and injuries. The top
three event types account for roughly 72.5% of total health harm,
indicating a highly skewed risk distribution.
cat(sprintf("**Economic impacts.** `%s` yields the largest total economic losses (property + crop). Similarly, the top three event types account for about %.1f%% of total losses, underscoring the value of targeting mitigation resources to a small set of high-impact hazards.\n\n",
            top_econ_leader, 100*econ_share_top3))
Economic impacts. FLOOD yields the
largest total economic losses (property + crop). Similarly, the top
three event types account for about 58.6% of total losses, underscoring
the value of targeting mitigation resources to a small set of
high-impact hazards.
# Additional insights: injury vs fatality composition, property vs crop composition
fatality_leader <- health_by_event %>% arrange(desc(fatalities)) %>% slice(1) %>% pull(EVTYPE)
injury_leader <- health_by_event %>% arrange(desc(injuries)) %>% slice(1) %>% pull(EVTYPE)
prop_leader <- econ_by_event %>% arrange(desc(property_damage)) %>% slice(1) %>% pull(EVTYPE)
crop_leader <- econ_by_event %>% arrange(desc(crop_damage)) %>% slice(1) %>% pull(EVTYPE)
cat(sprintf("**Composition.** Fatalities are led by `%s`, while injuries are led by `%s`. On the economic side, property damages are highest for `%s`, and crop damages are dominated by `%s`.\n\n",
            fatality_leader, injury_leader, prop_leader, crop_leader))
Composition. Fatalities are led by
TORNADO, while injuries are led by TORNADO. On
the economic side, property damages are highest for FLOOD,
and crop damages are dominated by DROUGHT.
cat("**Policy takeaway.** A risk-informed preparedness plan should emphasize the leading health-impact hazards (e.g., warning systems, shelter access) and the leading economic-impact hazards (e.g., flood and wind-resistant infrastructure, surge barriers, crop protection and drought management). Geographic tailoring matters, as leading hazards vary by state and season.\n")
Policy takeaway. A risk-informed preparedness plan should emphasize the leading health-impact hazards (e.g., warning systems, shelter access) and the leading economic-impact hazards (e.g., flood and wind-resistant infrastructure, surge barriers, crop protection and drought management). Geographic tailoring matters, as leading hazards vary by state and season.
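The top-three shares quoted above are the sum of the three largest per-event totals divided by the total across all event types. A minimal sketch of that calculation as a reusable helper follows; `top_k_share` and the toy totals are hypothetical and use made-up numbers, not values from the dataset.

```r
# Hypothetical helper: share of the total held by the k largest values.
top_k_share <- function(values, k = 3) {
  values <- sort(values[!is.na(values)], decreasing = TRUE)
  sum(head(values, k)) / sum(values)
}

# Toy totals per event type (made-up numbers, for illustration only).
toy_harm <- c(A = 600, B = 200, C = 100, D = 60, E = 40)

share <- top_k_share(toy_harm, k = 3)
cat(sprintf("Top 3 share: %.1f%%\n", 100 * share))
```

Here the three largest values sum to 900 out of 1,000, so the helper reports a 90.0% share. A value close to 100% signals that a handful of categories dominates the total.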
The analysis starts from the raw CSV repdata_data_StormData.csv (no external pre-processing). Heavy chunks use cache=TRUE to speed re-runs.
sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=Spanish_Mexico.utf8 LC_CTYPE=Spanish_Mexico.utf8
## [3] LC_MONETARY=Spanish_Mexico.utf8 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Mexico.utf8
##
## time zone: America/Guayaquil
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.48 tidyr_1.3.1 forcats_1.0.0 readr_2.1.5 stringr_1.5.1
## [6] ggplot2_3.5.1 dplyr_1.1.4
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 gtable_0.3.5 jsonlite_1.8.8 highr_0.11
## [5] crayon_1.5.3 compiler_4.4.1 tidyselect_1.2.1 parallel_4.4.1
## [9] jquerylib_0.1.4 scales_1.3.0 yaml_2.3.9 fastmap_1.2.0
## [13] R6_2.5.1 labeling_0.4.3 generics_0.1.3 tibble_3.2.1
## [17] munsell_0.5.1 bslib_0.7.0 pillar_1.9.0 tzdb_0.4.0
## [21] rlang_1.1.4 utf8_1.2.4 cachem_1.1.0 stringi_1.8.4
## [25] xfun_0.45 sass_0.4.9 bit64_4.0.5 cli_3.6.3
## [29] withr_3.0.0 magrittr_2.0.3 digest_0.6.36 grid_4.4.1
## [33] vroom_1.6.5 rstudioapi_0.16.0 hms_1.1.3 lifecycle_1.0.4
## [37] vctrs_0.6.5 evaluate_0.24.0 glue_1.7.0 farver_2.1.2
## [41] codetools_0.2-20 fansi_1.0.6 colorspace_2.1-0 purrr_1.0.2
## [45] rmarkdown_2.27 tools_4.4.1 pkgconfig_2.0.3 htmltools_0.5.8.1