Synopsis

Severe weather events can cause significant loss of life and economic damage. Understanding which types of events have the greatest impact helps authorities and policymakers prioritize resources for prevention and response. This analysis uses the NOAA Storm Database (1950–2011) to identify the weather events most harmful to population health and those with the highest economic consequences across the United States. Tornadoes were found to be the most harmful to human health, while floods and hurricanes resulted in the greatest economic losses. All steps, including data loading, cleaning, and analysis, are fully reproducible from the raw data file.

Data Processing

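The raw NOAA Storm Database is provided as a bzip2-compressed CSV file (.csv.bz2). All preprocessing is performed within this document, starting with the download of the raw file. Damage figures (PROPDMG, CROPDMG) are converted to US dollars by multiplying them by the magnitude encoded in PROPDMGEXP and CROPDMGEXP (H = hundred, K = thousand, M = million, B = billion); any other code, including a missing one, is treated as a multiplier of 1. Event types (EVTYPE) are trimmed and converted to uppercase for consistency, but variant spellings of the same hazard are not merged.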

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
library(stringr)
library(ggplot2)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
## 
##     col_factor
library(knitr)

# Download raw data
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "StormData.csv.bz2"
if(!file.exists(destfile)) download.file(url, destfile, mode = "wb")

storm <- read_csv(destfile)
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Function to convert exponent to multiplier
exp_to_mult <- function(exp) {
  if(is.na(exp) || exp == "") return(1)
  e <- toupper(str_trim(as.character(exp)))
  if(e == "H") return(100)
  if(e == "K") return(1e3)
  if(e == "M") return(1e6)
  if(e == "B") return(1e9)
  return(1)
}

df <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(
    EVTYPE = str_to_upper(str_trim(EVTYPE)),
    PROP_MULT = sapply(PROPDMGEXP, exp_to_mult),
    CROP_MULT = sapply(CROPDMGEXP, exp_to_mult),
    PROP_DMG_USD = coalesce(PROPDMG,0) * PROP_MULT,
    CROP_DMG_USD = coalesce(CROPDMG,0) * CROP_MULT,
    TOTAL_ECONOMIC = PROP_DMG_USD + CROP_DMG_USD,
    TOTAL_HEALTH = coalesce(FATALITIES,0) + coalesce(INJURIES,0)
  )
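
The row-by-row `sapply()` calls above give the intended result, but they invoke the function once per record across roughly 900,000 rows. The following is a minimal sketch of a vectorized alternative in base R. The name `exp_to_mult_vec` is hypothetical and not part of the original analysis. It keeps the same assumption as `exp_to_mult`: every code other than H, K, M, or B (digits, symbols, blanks, NA) maps to 1.

```r
# Hypothetical vectorized counterpart of exp_to_mult(); illustrative only.
exp_to_mult_vec <- function(exp) {
  # Named lookup table: exponent code -> multiplier
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Normalize the codes the same way exp_to_mult() does
  e <- toupper(trimws(as.character(exp)))
  # Character indexing yields NA for codes that are not in the table
  mult <- unname(lookup[e])
  # Same fallback as exp_to_mult(): unknown, empty, or missing codes become 1
  mult[is.na(mult)] <- 1
  mult
}

# Small demonstration on a mixed vector of codes
exp_to_mult_vec(c("K", "m", NA, "", "B", "5"))
```

If it were adopted, the function could be called directly on a whole column inside `mutate()`, for example `PROP_MULT = exp_to_mult_vec(PROPDMGEXP)`, without `sapply()`.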

Results

1. Events Most Harmful to Population Health

by_event <- df %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE),
    HEALTH_IMPACT = sum(TOTAL_HEALTH, na.rm = TRUE),
    PROP_DAMAGE = sum(PROP_DMG_USD, na.rm = TRUE),
    CROP_DAMAGE = sum(CROP_DMG_USD, na.rm = TRUE),
    ECON_IMPACT = sum(TOTAL_ECONOMIC, na.rm = TRUE)
  )

top_health <- by_event %>% arrange(desc(HEALTH_IMPACT)) %>% slice(1:10)
kable(top_health[, c("EVTYPE","FATALITIES","INJURIES","HEALTH_IMPACT")], caption = "Top 10 Event Types by Health Impact")
Top 10 Event Types by Health Impact

EVTYPE              FATALITIES  INJURIES  HEALTH_IMPACT
TORNADO                   5633     91346          96979
EXCESSIVE HEAT            1903      6525           8428
TSTM WIND                  504      6957           7461
FLOOD                      470      6789           7259
LIGHTNING                  816      5230           6046
HEAT                       937      2100           3037
FLASH FLOOD                978      1777           2755
ICE STORM                   89      1975           2064
THUNDERSTORM WIND          133      1488           1621
WINTER STORM               206      1321           1527

ggplot(top_health, aes(x=reorder(EVTYPE, HEALTH_IMPACT), y=HEALTH_IMPACT)) +
  geom_col(fill="steelblue") +
  coord_flip() +
  labs(x="Event Type", y="Total Fatalities + Injuries",
       title="Most Harmful Weather Events to Population Health") +
  theme_minimal()
Figure 1: Top 10 severe weather events causing the highest combined fatalities and injuries across the United States (1950–2011).

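Interpretation: Tornadoes cause by far the most combined fatalities and injuries (96,979), more than ten times the total of the next category. Excessive heat, thunderstorm wind (recorded as TSTM WIND), floods, and lightning follow. Among the events listed, excessive heat has the second-highest fatality count (1,903) despite a much lower injury count than tornadoes.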
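
Because EVTYPE is only upper-cased and not consolidated, some hazards appear in the table under more than one label (TSTM WIND and THUNDERSTORM WIND; HEAT and EXCESSIVE HEAT). The sketch below shows how merging such labels could affect the totals. It uses only the ten rows printed in the table above, not the full dataset, and the mapping `synonyms` is an illustrative assumption rather than an official NOAA classification.

```r
# Values copied from the "Top 10 Event Types by Health Impact" table.
top_health_tbl <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  HEALTH_IMPACT = c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527)
)

# Assumed (hypothetical) mapping from variant labels to a single label.
synonyms <- c("TSTM WIND" = "THUNDERSTORM WIND", "HEAT" = "EXCESSIVE HEAT")

# Replace a label only when it appears in the mapping; keep it otherwise.
merged_label <- ifelse(top_health_tbl$EVTYPE %in% names(synonyms),
                       synonyms[top_health_tbl$EVTYPE],
                       top_health_tbl$EVTYPE)

# Re-aggregate the health impact by the merged label and sort descending.
merged <- aggregate(
  HEALTH_IMPACT ~ EVTYPE,
  data = data.frame(EVTYPE = merged_label,
                    HEALTH_IMPACT = top_health_tbl$HEALTH_IMPACT),
  FUN = sum
)
merged <- merged[order(-merged$HEALTH_IMPACT), ]
head(merged, 4)
```

On these ten rows the merged totals are 11,465 for excessive heat and 9,082 for thunderstorm wind, so tornadoes still lead by a wide margin. Totals from the full dataset would differ, since further label variants may exist outside the top ten.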

2. Events with Greatest Economic Consequences

top_econ <- by_event %>% arrange(desc(ECON_IMPACT)) %>% slice(1:10)
kable(top_econ[, c("EVTYPE","PROP_DAMAGE","CROP_DAMAGE","ECON_IMPACT")], caption = "Top 10 Event Types by Economic Damage")
Top 10 Event Types by Economic Damage (USD)

EVTYPE               PROP_DAMAGE  CROP_DAMAGE   ECON_IMPACT
FLOOD               144657709807   5661968450  150319678257
HURRICANE/TYPHOON    69305840000   2607872800   71913712800
TORNADO              56937160779    414953270   57352114049
STORM SURGE          43323536000         5000   43323541000
HAIL                 15732267543   3025954473   18758222016
FLASH FLOOD          16140862067   1421317100   17562179167
DROUGHT               1046106000  13972566000   15018672000
HURRICANE            11868319010   2741910000   14610229010
RIVER FLOOD           5118945500   5029459000   10148404500
ICE STORM             3944927860   5022113500    8967041360

ggplot(top_econ, aes(x=reorder(EVTYPE, ECON_IMPACT), y=ECON_IMPACT/1e9)) +
  geom_col(fill="darkred") +
  coord_flip() +
  labs(x="Event Type", y="Total Economic Damage (Billions USD)",
       title="Weather Events with Greatest Economic Consequences") +
  theme_minimal()
Figure 2: Top 10 severe weather events causing the largest combined property and crop damages in the United States (1950–2011).

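Interpretation: Floods cause the largest total economic losses (about $150 billion), followed by hurricanes/typhoons (about $72 billion), tornadoes (about $57 billion), and storm surge (about $43 billion). Property damage dominates the total for most of these event types; drought is the clearest exception, with its losses coming almost entirely from crops.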
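
The balance between property and crop damage differs considerably between event types. As a rough illustration, the sketch below computes the share of each event's total that comes from crops, using only the values printed in the table above. The names `top_econ_tbl` and `CROP_SHARE_PCT` are introduced here for illustration and do not appear in the original analysis.

```r
# Values copied from the "Top 10 Event Types by Economic Damage" table (USD).
top_econ_tbl <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  PROP_DAMAGE = c(144657709807, 69305840000, 56937160779, 43323536000,
                  15732267543, 16140862067, 1046106000, 11868319010,
                  5118945500, 3944927860),
  CROP_DAMAGE = c(5661968450, 2607872800, 414953270, 5000, 3025954473,
                  1421317100, 13972566000, 2741910000, 5029459000, 5022113500)
)

# Percentage of each event's combined damage that is crop damage.
top_econ_tbl$CROP_SHARE_PCT <- round(
  100 * top_econ_tbl$CROP_DAMAGE /
    (top_econ_tbl$PROP_DAMAGE + top_econ_tbl$CROP_DAMAGE),
  1
)

# Show the events ordered from most to least crop-dominated.
top_econ_tbl[order(-top_econ_tbl$CROP_SHARE_PCT), c("EVTYPE", "CROP_SHARE_PCT")]
```

For drought, roughly 93% of the recorded damage is crop damage, whereas for floods the crop share is below 5%.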

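Conclusion

In the NOAA Storm Database for 1950–2011, tornadoes account for by far the largest combined number of fatalities and injuries (96,979), while floods (about $150 billion) and hurricanes/typhoons (about $72 billion) account for the largest combined property and crop damage. The R session used to produce these results is recorded here for reproducibility.
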
sessionInfo()
## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 22631)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=Dutch_Netherlands.utf8  LC_CTYPE=Dutch_Netherlands.utf8   
## [3] LC_MONETARY=Dutch_Netherlands.utf8 LC_NUMERIC=C                      
## [5] LC_TIME=Dutch_Netherlands.utf8    
## 
## time zone: Europe/Amsterdam
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.49    scales_1.3.0  ggplot2_3.5.1 stringr_1.5.1 readr_2.1.5  
## [6] dplyr_1.1.4  
## 
## loaded via a namespace (and not attached):
##  [1] bit_4.5.0.1       gtable_0.3.6      jsonlite_1.8.9    crayon_1.5.3     
##  [5] compiler_4.4.2    tidyselect_1.2.1  parallel_4.4.2    jquerylib_0.1.4  
##  [9] yaml_2.3.10       fastmap_1.2.0     R6_2.6.1          labeling_0.4.3   
## [13] generics_0.1.3    tibble_3.2.1      munsell_0.5.1     bslib_0.9.0      
## [17] pillar_1.10.1     tzdb_0.4.0        rlang_1.1.4       cachem_1.1.0     
## [21] stringi_1.8.4     xfun_0.50         sass_0.4.9        bit64_4.5.2      
## [25] cli_3.6.3         withr_3.0.2       magrittr_2.0.3    digest_0.6.37    
## [29] grid_4.4.2        vroom_1.6.5       rstudioapi_0.17.1 hms_1.1.3        
## [33] lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.3    glue_1.8.0       
## [37] farver_2.1.2      codetools_0.2-20  colorspace_2.1-1  rmarkdown_2.29   
## [41] tools_4.4.2       pkgconfig_2.0.3   htmltools_0.5.8.1