Severe weather events can cause significant loss of life and economic damage. Understanding which types of events have the greatest impact helps authorities and policymakers prioritize resources for prevention and response. This analysis uses the NOAA Storm Database (1950–2011) to identify the event types most harmful to population health (measured as total fatalities plus injuries) and those with the greatest economic consequences (measured as total property plus crop damage in USD) across the United States. Tornadoes were found to be the most harmful to human health, while floods and hurricanes caused the greatest economic losses. All steps, including data loading, cleaning, and analysis, are fully reproducible from the raw data file.
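The raw NOAA Storm Database is distributed as a bzip2-compressed CSV file (.csv.bz2). All preprocessing is performed within this document, starting from the download of the raw file. Damage values are converted to USD using the exponent columns (PROPDMGEXP, CROPDMGEXP): the codes H, K, M, and B (case-insensitive) multiply the recorded value by 100, 1,000, 1,000,000, and 1,000,000,000, respectively, while any other or missing code leaves the value unchanged. Event types (EVTYPE) are trimmed of surrounding whitespace and converted to uppercase for consistency; synonymous labels are not merged further.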
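The effect of this standardization can be illustrated on a few made-up labels (the `raw_evtype` vector below is hypothetical, not taken from the database): case and whitespace variants collapse into one label, while different spellings of the same phenomenon stay distinct.

```r
library(stringr)

# Hypothetical raw EVTYPE values with inconsistent case and stray whitespace
raw_evtype <- c("Tornado", " TORNADO ", "tstm wind", "THUNDERSTORM WIND")

# Same normalization as the preprocessing step: trim, then uppercase
clean_evtype <- str_to_upper(str_trim(raw_evtype))

print(unique(clean_evtype))
```

Four inputs reduce to three labels here; combining synonyms such as TSTM WIND and THUNDERSTORM WIND would need an explicit mapping, which this analysis does not apply.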
library(dplyr)
library(readr)
library(stringr)
library(ggplot2)
library(scales)
library(knitr)
# Download raw data
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2"
if(!file.exists(destfile)) download.file(url, destfile, mode = "wb")
storm <- read_csv(destfile)
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl (1): COUNTYENDN
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
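Only seven of the 37 columns are used below, so one option (an alternative, not the approach taken above) is to declare them up front with readr's `cols_only()`, which skips the remaining columns and avoids type guessing. To run on its own, the sketch writes a tiny hypothetical bzip2-compressed file to `tmp` with made-up rows; with the real data, `destfile` would take its place.

```r
library(readr)

# Hypothetical two-row extract in the StormData layout (plus one unused column)
lines <- c(
  "STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP",
  "AL,TORNADO,0,15,25,K,0,",
  "TX,HAIL,0,0,1.5,M,10,K"
)

# Write it as a .csv.bz2 file; readr detects bzip2 compression from the extension
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(lines, con)
close(con)

# Read only the columns the analysis needs, with explicit types
storm_small <- read_csv(
  tmp,
  col_types = cols_only(
    EVTYPE     = col_character(),
    FATALITIES = col_double(),
    INJURIES   = col_double(),
    PROPDMG    = col_double(),
    PROPDMGEXP = col_character(),
    CROPDMG    = col_double(),
    CROPDMGEXP = col_character()
  )
)

print(storm_small)
```

The empty CROPDMGEXP field in the first row is read as `NA`, which the exponent conversion treats as a multiplier of 1.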
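# Convert a damage exponent code (PROPDMGEXP / CROPDMGEXP) to a numeric multiplier.
# H = hundreds, K = thousands, M = millions, B = billions (case-insensitive);
# NA, an empty string, or any other code yields a multiplier of 1.
exp_to_mult <- function(exp) {
  if (is.na(exp) || exp == "") return(1)
  e <- toupper(str_trim(as.character(exp)))
  if (e == "H") return(100)
  if (e == "K") return(1e3)
  if (e == "M") return(1e6)
  if (e == "B") return(1e9)
  return(1)
}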
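Because `exp_to_mult()` handles one value at a time, it has to be applied element by element across roughly 900,000 rows. A vectorized variant can be sketched with `dplyr::case_when()`; the name `exp_to_mult_vec()` is hypothetical, and the mapping deliberately mirrors the function above (H, K, M, B; everything else, including digits, symbols, empty strings, and NA, maps to 1).

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical vectorized counterpart of exp_to_mult(): same mapping, whole vector at once
exp_to_mult_vec <- function(exp) {
  e <- toupper(trimws(as.character(exp)))
  case_when(
    e == "H" ~ 100,
    e == "K" ~ 1e3,
    e == "M" ~ 1e6,
    e == "B" ~ 1e9,
    .default = 1  # NA, "", digits, "+", "-", "?" and any other code
  )
}

codes <- c("K", "m", " B", "", NA, "5", "h", "?")
print(exp_to_mult_vec(codes))
```

Inside the `mutate()` call, `PROP_MULT = exp_to_mult_vec(PROPDMGEXP)` would then replace the element-wise application of `exp_to_mult()`; `case_when()` treats an `NA` condition as `FALSE`, so missing codes fall through to the default.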
# Keep only the needed columns, then derive damage in USD and combined health totals
df <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(
    EVTYPE = str_to_upper(str_trim(EVTYPE)),  # normalize event labels
    PROP_MULT = vapply(PROPDMGEXP, exp_to_mult, numeric(1), USE.NAMES = FALSE),
    CROP_MULT = vapply(CROPDMGEXP, exp_to_mult, numeric(1), USE.NAMES = FALSE),
    PROP_DMG_USD = coalesce(PROPDMG, 0) * PROP_MULT,  # missing amounts count as 0
    CROP_DMG_USD = coalesce(CROPDMG, 0) * CROP_MULT,
    TOTAL_ECONOMIC = PROP_DMG_USD + CROP_DMG_USD,
    TOTAL_HEALTH = coalesce(FATALITIES, 0) + coalesce(INJURIES, 0)
  )
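The derived columns can be traced on two made-up records (the values in `toy` are hypothetical, with multipliers already resolved): a property loss of 25 with exponent K becomes USD 25,000, and missing amounts or counts are treated as zero by `coalesce()`.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical records; NA stands in for a missing amount or count
toy <- tibble(
  PROPDMG = c(25, 1.5), PROP_MULT = c(1e3, 1e6),
  CROPDMG = c(NA, 10),  CROP_MULT = c(1, 1e3),
  FATALITIES = c(1, NA), INJURIES = c(4, 2)
)

toy <- toy %>%
  mutate(
    PROP_DMG_USD   = coalesce(PROPDMG, 0) * PROP_MULT,  # 25 * 1e3 = 25,000; 1.5 * 1e6 = 1,500,000
    CROP_DMG_USD   = coalesce(CROPDMG, 0) * CROP_MULT,  # NA -> 0; 10 * 1e3 = 10,000
    TOTAL_ECONOMIC = PROP_DMG_USD + CROP_DMG_USD,
    TOTAL_HEALTH   = coalesce(FATALITIES, 0) + coalesce(INJURIES, 0)
  )

print(toy$TOTAL_ECONOMIC)
print(toy$TOTAL_HEALTH)
```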
# Aggregate to one row per event type, summing over all records of that type
by_event <- df %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE),
    HEALTH_IMPACT = sum(TOTAL_HEALTH, na.rm = TRUE),
    PROP_DAMAGE = sum(PROP_DMG_USD, na.rm = TRUE),
    CROP_DAMAGE = sum(CROP_DMG_USD, na.rm = TRUE),
    ECON_IMPACT = sum(TOTAL_ECONOMIC, na.rm = TRUE)
  )
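The aggregation step can be checked on a small hypothetical data frame: each event type collapses to a single row whose columns are per-type sums, and sorting by the total gives the ranking used for the tables.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical per-record data after preprocessing
toy <- tibble(
  EVTYPE         = c("TORNADO", "TORNADO", "HAIL"),
  TOTAL_HEALTH   = c(5, 2, 0),
  TOTAL_ECONOMIC = c(25000, 1510000, 300)
)

toy_by_event <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    HEALTH_IMPACT = sum(TOTAL_HEALTH, na.rm = TRUE),
    ECON_IMPACT   = sum(TOTAL_ECONOMIC, na.rm = TRUE)
  ) %>%
  arrange(desc(ECON_IMPACT))

print(toy_by_event)
```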
top_health <- by_event %>% arrange(desc(HEALTH_IMPACT)) %>% slice(1:10)
kable(top_health[, c("EVTYPE","FATALITIES","INJURIES","HEALTH_IMPACT")], caption = "Top 10 Event Types by Health Impact")
| EVTYPE | FATALITIES | INJURIES | HEALTH_IMPACT |
|---|---|---|---|
| TORNADO | 5633 | 91346 | 96979 |
| EXCESSIVE HEAT | 1903 | 6525 | 8428 |
| TSTM WIND | 504 | 6957 | 7461 |
| FLOOD | 470 | 6789 | 7259 |
| LIGHTNING | 816 | 5230 | 6046 |
| HEAT | 937 | 2100 | 3037 |
| FLASH FLOOD | 978 | 1777 | 2755 |
| ICE STORM | 89 | 1975 | 2064 |
| THUNDERSTORM WIND | 133 | 1488 | 1621 |
| WINTER STORM | 206 | 1321 | 1527 |
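`arrange(desc(...)) %>% slice(1:10)` keeps the first ten rows after sorting. dplyr's `slice_max()` expresses the same top-n selection directly, with one difference worth knowing: by default (`with_ties = TRUE`) it keeps every row tied at the cutoff, so it can return more than n rows. The data below are hypothetical.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical per-event totals with a tie at the cutoff
toy_by_event <- tibble(
  EVTYPE        = c("A", "B", "C", "D"),
  HEALTH_IMPACT = c(50, 30, 30, 10)
)

# Position-based: exactly two rows after sorting
top_slice <- toy_by_event %>% arrange(desc(HEALTH_IMPACT)) %>% slice(1:2)

# Value-based: rows tied at the cutoff are kept by default
top_max <- toy_by_event %>% slice_max(HEALTH_IMPACT, n = 2)

# Value-based without ties: exactly n rows
top_max_strict <- toy_by_event %>% slice_max(HEALTH_IMPACT, n = 2, with_ties = FALSE)

print(nrow(top_slice))
print(nrow(top_max))
print(nrow(top_max_strict))
```

For the tables above, the two approaches agree as long as no tie straddles rank 10.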
ggplot(top_health, aes(x = reorder(EVTYPE, HEALTH_IMPACT), y = HEALTH_IMPACT)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(x = "Event Type", y = "Total Fatalities + Injuries",
       title = "Most Harmful Weather Events to Population Health") +
  theme_minimal()
Figure 1: Top 10 severe weather events causing the highest combined fatalities and injuries across the United States (1950–2011).
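Interpretation: Tornadoes cause by far the most fatalities and injuries combined (96,979, more than ten times the next event type), followed by excessive heat, thunderstorm wind (TSTM WIND), floods, and lightning. Because event labels are only trimmed and uppercased, related labels such as TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT, are counted separately.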
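The dominance of tornadoes can be quantified from the table values alone. The sketch below copies the ten HEALTH_IMPACT totals by hand (a shortcut for illustration; in the analysis they would come from `top_health`) and computes each event's share of the top-ten total.

```r
# HEALTH_IMPACT totals copied from the top-10 health table
health <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

share <- health / sum(health)                              # fraction of the top-10 total
ratio <- health[["TORNADO"]] / health[["EXCESSIVE HEAT"]]  # tornado vs. runner-up

print(round(100 * share, 1))
print(round(ratio, 1))
```

Tornadoes account for roughly 71% of the combined top-ten total of 137,177, about 11.5 times the figure for excessive heat.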
top_econ <- by_event %>% arrange(desc(ECON_IMPACT)) %>% slice(1:10)
kable(top_econ[, c("EVTYPE","PROP_DAMAGE","CROP_DAMAGE","ECON_IMPACT")], caption = "Top 10 Event Types by Economic Damage")
| EVTYPE | PROP_DAMAGE | CROP_DAMAGE | ECON_IMPACT |
|---|---|---|---|
| FLOOD | 144657709807 | 5661968450 | 150319678257 |
| HURRICANE/TYPHOON | 69305840000 | 2607872800 | 71913712800 |
| TORNADO | 56937160779 | 414953270 | 57352114049 |
| STORM SURGE | 43323536000 | 5000 | 43323541000 |
| HAIL | 15732267543 | 3025954473 | 18758222016 |
| FLASH FLOOD | 16140862067 | 1421317100 | 17562179167 |
| DROUGHT | 1046106000 | 13972566000 | 15018672000 |
| HURRICANE | 11868319010 | 2741910000 | 14610229010 |
| RIVER FLOOD | 5118945500 | 5029459000 | 10148404500 |
| ICE STORM | 3944927860 | 5022113500 | 8967041360 |
ggplot(top_econ, aes(x = reorder(EVTYPE, ECON_IMPACT), y = ECON_IMPACT / 1e9)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(x = "Event Type", y = "Total Economic Damage (Billions USD)",
       title = "Weather Events with Greatest Economic Consequences") +
  theme_minimal()
Figure 2: Top 10 severe weather events causing the largest combined property and crop damages in the United States (1950–2011).
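Interpretation: Floods cause the largest total economic losses (about USD 150.3 billion), roughly twice those of hurricanes/typhoons (about USD 71.9 billion), followed by tornadoes (about USD 57.4 billion) and storm surge (about USD 43.3 billion). Property damage dominates for most event types; drought and ice storm are the only top-10 events whose crop damage exceeds their property damage. Because event labels are not merged, HURRICANE and HURRICANE/TYPHOON appear as separate rows.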
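The split between property and crop damage differs sharply by event type. The sketch below copies the PROP_DAMAGE and CROP_DAMAGE values (USD) from the economic table by hand, as a stand-in for `top_econ`, converts totals to billions, and flags the events whose losses are mostly agricultural.

```r
# PROP_DAMAGE and CROP_DAMAGE (USD) copied from the top-10 economic table
econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  PROP_DAMAGE = c(144657709807, 69305840000, 56937160779, 43323536000, 15732267543,
                  16140862067, 1046106000, 11868319010, 5118945500, 3944927860),
  CROP_DAMAGE = c(5661968450, 2607872800, 414953270, 5000, 3025954473,
                  1421317100, 13972566000, 2741910000, 5029459000, 5022113500)
)

econ$TOTAL_BN   <- (econ$PROP_DAMAGE + econ$CROP_DAMAGE) / 1e9  # billions of USD
econ$CROP_SHARE <- econ$CROP_DAMAGE / (econ$PROP_DAMAGE + econ$CROP_DAMAGE)

# Events where crop losses exceed property losses
print(econ$EVTYPE[econ$CROP_SHARE > 0.5])
print(round(econ$TOTAL_BN, 1))
```

Drought stands out: about 93% of its losses are crop damage, whereas river flood sits just below an even split.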
sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=Dutch_Netherlands.utf8 LC_CTYPE=Dutch_Netherlands.utf8
## [3] LC_MONETARY=Dutch_Netherlands.utf8 LC_NUMERIC=C
## [5] LC_TIME=Dutch_Netherlands.utf8
##
## time zone: Europe/Amsterdam
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.49 scales_1.3.0 ggplot2_3.5.1 stringr_1.5.1 readr_2.1.5
## [6] dplyr_1.1.4
##
## loaded via a namespace (and not attached):
## [1] bit_4.5.0.1 gtable_0.3.6 jsonlite_1.8.9 crayon_1.5.3
## [5] compiler_4.4.2 tidyselect_1.2.1 parallel_4.4.2 jquerylib_0.1.4
## [9] yaml_2.3.10 fastmap_1.2.0 R6_2.6.1 labeling_0.4.3
## [13] generics_0.1.3 tibble_3.2.1 munsell_0.5.1 bslib_0.9.0
## [17] pillar_1.10.1 tzdb_0.4.0 rlang_1.1.4 cachem_1.1.0
## [21] stringi_1.8.4 xfun_0.50 sass_0.4.9 bit64_4.5.2
## [25] cli_3.6.3 withr_3.0.2 magrittr_2.0.3 digest_0.6.37
## [29] grid_4.4.2 vroom_1.6.5 rstudioapi_0.17.1 hms_1.1.3
## [33] lifecycle_1.0.4 vctrs_0.6.5 evaluate_1.0.3 glue_1.8.0
## [37] farver_2.1.2 codetools_0.2-20 colorspace_2.1-1 rmarkdown_2.29
## [41] tools_4.4.2 pkgconfig_2.0.3 htmltools_0.5.8.1