Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database to identify which types of severe weather events have the greatest impact on population health and the greatest economic consequences in the United States between 1950 and November 2011. Population health impact is measured by the total number of fatalities and injuries attributed to each event type, while economic impact is measured as the combined dollar value of property and crop damage. Because damage magnitudes in the raw file are stored with an alphabetic exponent column (PROPDMGEXP, CROPDMGEXP), these values are translated into numeric multipliers before summation. The analysis shows that tornadoes are by far the leading cause of weather-related fatalities and injuries in the United States, while floods are the most economically damaging event type, followed by hurricanes/typhoons, tornadoes, and storm surges. These findings suggest that preparedness resources should prioritize tornado early-warning systems for public health protection and flood mitigation infrastructure for economic resilience.

Data Processing

Loading the raw data

The analysis starts from the raw repdata_data_StormData.csv.bz2 file provided with the assignment. The file is downloaded if it is not already present, saved locally as StormData.csv.bz2, and read with read.csv, which handles bzip2 decompression transparently.

data_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
data_file <- "StormData.csv.bz2"

if (!file.exists(data_file)) {
    download.file(data_url, destfile = data_file, mode = "wb")
}

storm <- read.csv(data_file, stringsAsFactors = FALSE)
dim(storm)
## [1] 902297     37
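
To illustrate the transparent decompression, the following sketch writes a tiny bzip2-compressed CSV to a temporary file and reads it back with read.csv. The toy rows and the names tmp and toy are made up for illustration and are not part of the Storm Database.

```r
# Write a small CSV through a bzip2 connection (toy data, for illustration only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,3,12",
             "FLOOD,1,0"), con)
close(con)

# read.csv opens the path with file(), which detects the compression itself
toy <- read.csv(tmp, stringsAsFactors = FALSE)
dim(toy)
unlink(tmp)
```

No explicit bzfile() call is needed on the reading side, which is why the chunk above can pass the .bz2 path straight to read.csv.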

Selecting relevant columns

We only need the event type, the fatalities and injuries counts, and the property/crop damage fields with their exponent columns:

library(dplyr)

storm <- storm %>%
    select(EVTYPE, FATALITIES, INJURIES,
           PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Converting damage exponents to numeric multipliers

The PROPDMGEXP and CROPDMGEXP columns encode the order of magnitude of PROPDMG and CROPDMG as single characters: "K" = thousand, "M" = million, "B" = billion. The code also maps "H" to hundred and matches letters case-insensitively. Numeric characters "0"-"8" also appear in the raw data; we treat a digit d as a multiplier of 10^d. The NOAA documentation describes only the letter codes, so the power-of-ten reading of the digits is an assumption of this analysis. Any other value (including "?", "+", "-", and the empty string) is treated as a multiplier of 1.

exp_to_mult <- function(e) {
    e <- toupper(as.character(e))
    dplyr::case_when(
        e == "B" ~ 1e9,
        e == "M" ~ 1e6,
        e == "K" ~ 1e3,
        e == "H" ~ 1e2,
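        # case_when() evaluates every right-hand side on the full vector, so
        # silence the coercion warnings raised for the non-digit codes
        e %in% as.character(0:8) ~ suppressWarnings(10 ^ as.numeric(e)),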
        TRUE     ~ 1
    )
}

storm <- storm %>%
    mutate(
        prop_damage = PROPDMG * exp_to_mult(PROPDMGEXP),
        crop_damage = CROPDMG * exp_to_mult(CROPDMGEXP),
        total_damage = prop_damage + crop_damage
    )
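
As a sanity check on the mapping rule, the sketch below restates it in base R and applies it to a handful of codes. The helper name exp_to_mult_base and the sample codes are hypothetical; this is an illustration of the rule described above, not the function used in the analysis.

```r
# Base-R restatement of the multiplier rule (hypothetical helper, illustration only)
exp_to_mult_base <- function(e) {
    e <- toupper(as.character(e))
    letter_map <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2)
    mult <- rep(1, length(e))                 # default multiplier for unknown codes
    is_letter <- e %in% names(letter_map)
    is_digit  <- e %in% as.character(0:8)
    mult[is_letter] <- unname(letter_map[e[is_letter]])
    mult[is_digit]  <- 10 ^ as.numeric(e[is_digit])
    mult
}

codes <- c("K", "m", "B", "h", "5", "", "?", "+")
data.frame(code = codes, multiplier = exp_to_mult_base(codes))

# Worked example: a recorded damage of 2.5 with exponent "M" is 2.5 million dollars
2.5 * exp_to_mult_base("M")
```

Converting only the digit positions avoids calling as.numeric on letter codes, so this variant raises no coercion warnings.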

Cleaning event types

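EVTYPE is free text and contains many near-duplicates (leading spaces, case differences, etc.). We apply a light cleanup: convert to uppercase and trim whitespace. We do not attempt a full canonicalisation against the 48 official NOAA categories, so some synonyms remain separate in the results (for example TSTM WIND and THUNDERSTORM WIND, or HURRICANE and HURRICANE/TYPHOON). The leading event type in each ranking is robust to those differences, although lower ranks could shift if synonyms were merged.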

storm <- storm %>%
    mutate(EVTYPE = trimws(toupper(EVTYPE)))
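
The effect of this light cleanup can be seen on a small made-up vector of labels. The values in raw_types are hypothetical examples of the kinds of variation described above, not rows taken from the data.

```r
# Hypothetical raw labels showing case and whitespace variation
raw_types <- c("TORNADO", " Tornado", "tornado ",
               "Flash Flood", "FLASH FLOOD", "TSTM WIND")

clean_types <- trimws(toupper(raw_types))

length(unique(raw_types))    # distinct labels before cleanup
length(unique(clean_types))  # distinct labels after cleanup
table(clean_types)
```

Case and whitespace variants collapse into one label each, while genuinely different spellings such as TSTM WIND are left untouched.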

Aggregating by event type

health <- storm %>%
    group_by(EVTYPE) %>%
    summarise(fatalities = sum(FATALITIES, na.rm = TRUE),
              injuries   = sum(INJURIES,   na.rm = TRUE),
              total      = fatalities + injuries,
              .groups    = "drop") %>%
    arrange(desc(total))

economic <- storm %>%
    group_by(EVTYPE) %>%
    summarise(property = sum(prop_damage, na.rm = TRUE),
              crop     = sum(crop_damage, na.rm = TRUE),
              total    = property + crop,
              .groups  = "drop") %>%
    arrange(desc(total))
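
The same group-and-sum logic can be cross-checked in base R. The sketch below uses a made-up toy data frame (not the storm data) and rowsum() in place of the dplyr pipeline, so it is an alternative formulation rather than the code used for the results.

```r
# Toy records, including a missing value to show the effect of na.rm = TRUE
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(2, 0, 1, 3, NA),
    INJURIES   = c(10, 4, 0, 1, 2)
)

# Sum each numeric column within event type; row names hold the event types
sums <- rowsum(toy[, c("FATALITIES", "INJURIES")],
               group = toy$EVTYPE, na.rm = TRUE)
sums$total <- sums$FATALITIES + sums$INJURIES

# Rank by total casualties, largest first
sums[order(-sums$total), ]
```

With na.rm = TRUE a missing count is skipped rather than turning the whole group total into NA, which mirrors the sum(..., na.rm = TRUE) calls above.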

Results

Events most harmful to population health

We rank event types by the total number of people affected (fatalities + injuries). The table below shows the top 10:

top_health <- head(health, 10)
knitr::kable(top_health,
             caption = "Top 10 event types by total casualties (fatalities + injuries), 1950-2011")

Top 10 event types by total casualties (fatalities + injuries), 1950-2011

EVTYPE             fatalities  injuries  total
TORNADO                  5633     91346  96979
EXCESSIVE HEAT           1903      6525   8428
TSTM WIND                 504      6957   7461
FLOOD                     470      6789   7259
LIGHTNING                 816      5230   6046
HEAT                      937      2100   3037
FLASH FLOOD               978      1777   2755
ICE STORM                  89      1975   2064
THUNDERSTORM WIND         133      1488   1621
WINTER STORM              206      1321   1527

library(ggplot2)
library(tidyr)

top_health_long <- top_health %>%
    select(EVTYPE, fatalities, injuries) %>%
    pivot_longer(c(fatalities, injuries),
                 names_to = "type", values_to = "count")

ggplot(top_health_long,
       aes(x = reorder(EVTYPE, -count), y = count, fill = type)) +
    geom_col(position = "dodge") +
    scale_y_continuous(labels = scales::comma) +
    labs(x = "Event type",
         y = "Number of people",
         fill = "",
         title = "Top 10 severe weather event types by health impact") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 35, hjust = 1))
Figure 1. Top 10 weather event types in the United States (1950-2011) by total fatalities and injuries. Tornadoes dominate both categories by a wide margin.

Finding. Tornadoes are by far the most harmful event type with respect to population health: their 96,979 casualties are more than twice the combined total of the other nine event types in the top 10 (40,198). Excessive heat, thunderstorm wind (recorded as TSTM WIND), floods, and lightning round out the top five.
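
The comparison between tornadoes and the rest of the top 10 can be reproduced from the totals in the table. The sketch re-enters those published totals by hand; the data frame name top_health_published is introduced here for illustration only.

```r
# Totals copied from the top-10 casualties table
top_health_published <- data.frame(
    EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
               "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
               "WINTER STORM"),
    total  = c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527)
)

is_tornado    <- top_health_published$EVTYPE == "TORNADO"
tornado_total <- top_health_published$total[is_tornado]
others_total  <- sum(top_health_published$total[!is_tornado])

tornado_total > others_total   # do tornadoes exceed the other nine combined?
tornado_total / others_total   # by what factor
```

The check covers only the ten listed event types, so it says nothing about event types outside the table.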

Events with the greatest economic consequences

We rank event types by the total damage in U.S. dollars (property + crop). The table below shows the top 10:

top_econ <- head(economic, 10)
knitr::kable(
    top_econ %>%
        mutate(across(c(property, crop, total),
                      ~ scales::dollar(., scale = 1e-9, suffix = "B", accuracy = 0.1))),
    caption = "Top 10 event types by total property + crop damage, 1950-2011 (USD billions)")

Top 10 event types by total property + crop damage, 1950-2011 (USD billions)

EVTYPE             property    crop    total
FLOOD               $144.7B   $5.7B  $150.3B
HURRICANE/TYPHOON    $69.3B   $2.6B   $71.9B
TORNADO              $56.9B   $0.4B   $57.4B
STORM SURGE          $43.3B   $0.0B   $43.3B
HAIL                 $15.7B   $3.0B   $18.8B
FLASH FLOOD          $16.8B   $1.4B   $18.2B
DROUGHT               $1.0B  $14.0B   $15.0B
HURRICANE            $11.9B   $2.7B   $14.6B
RIVER FLOOD           $5.1B   $5.0B   $10.1B
ICE STORM             $3.9B   $5.0B    $9.0B

top_econ_long <- top_econ %>%
    select(EVTYPE, property, crop) %>%
    pivot_longer(c(property, crop),
                 names_to = "type", values_to = "damage")

ggplot(top_econ_long,
       aes(x = reorder(EVTYPE, -damage),
           y = damage / 1e9, fill = type)) +
    geom_col() +
    labs(x = "Event type",
         y = "Damage (USD billions)",
         fill = "",
         title = "Top 10 severe weather event types by economic impact") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 35, hjust = 1))
Figure 2. Top 10 weather event types in the United States (1950-2011) by combined property and crop damage, in billions of USD. Floods lead by a wide margin.

Finding. Floods are the most economically destructive event type, followed by hurricanes/typhoons, tornadoes, and storm surges. Property damage dominates the totals for most of the top 10; crop damage exceeds property damage only for droughts ($14.0B of $15.0B) and ice storms ($5.0B of $9.0B).
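
Because the event types were not canonicalised, it is worth checking whether merging obvious synonyms would change the leaders. The sketch below does this on the published top-10 totals only. The mapping in merge_map (HURRICANE into HURRICANE/TYPHOON, RIVER FLOOD into FLOOD) is an assumption made for illustration, not an official NOAA mapping, and a merge over the full dataset could give slightly different figures.

```r
# Totals in USD billions, copied from the top-10 damage table
top_econ_published <- data.frame(
    EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
                 "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
    total_bn = c(150.3, 71.9, 57.4, 43.3, 18.8, 18.2, 15.0, 14.6, 10.1, 9.0)
)

# Hypothetical synonym mapping (an assumption, for illustration only)
merge_map <- c("HURRICANE"   = "HURRICANE/TYPHOON",
               "RIVER FLOOD" = "FLOOD")

ev <- top_econ_published$EVTYPE
merged_type <- ifelse(ev %in% names(merge_map), unname(merge_map[ev]), ev)

# Re-aggregate under the merged labels and rank again
merged <- sort(tapply(top_econ_published$total_bn, merged_type, sum),
               decreasing = TRUE)
merged
```

Under this mapping floods and hurricanes/typhoons keep the first two positions, which is consistent with the ranking reported above.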

Conclusion

Across the United States over 1950-2011, tornadoes have the largest impact on population health while floods cause the greatest aggregate economic damage. A government manager responsible for allocating preparedness resources across both dimensions might therefore weight tornado warning/shelter programmes most heavily for life-safety concerns and flood mitigation (levees, drainage, insurance) most heavily for economic resilience.

Session info

sessionInfo()
## R version 4.5.2 (2025-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8   
## [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=English_India.utf8    
## 
## time zone: Asia/Calcutta
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tidyr_1.3.2   ggplot2_4.0.2 dplyr_1.2.0  
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.7.2        cli_3.6.5          knitr_1.51         rlang_1.1.7       
##  [5] xfun_0.57          purrr_1.2.1        generics_0.1.4     S7_0.2.1          
##  [9] jsonlite_2.0.0     labeling_0.4.3     glue_1.8.0         htmltools_0.5.9   
## [13] sass_0.4.10        scales_1.4.0       rmarkdown_2.31     grid_4.5.2        
## [17] evaluate_1.0.5     jquerylib_0.1.4    tibble_3.3.1       fastmap_1.2.0     
## [21] yaml_2.3.12        lifecycle_1.0.5    compiler_4.5.2     codetools_0.2-20  
## [25] RColorBrewer_1.1-3 pkgconfig_2.0.3    farver_2.1.2       digest_0.6.39     
## [29] R6_2.6.1           tidyselect_1.2.1   pillar_1.11.1      magrittr_2.0.4    
## [33] bslib_0.10.0       gtable_0.3.6       tools_4.5.2        withr_3.0.2       
## [37] cachem_1.1.0