NOAA Storm Data Analysis — Health & Economic Impact (1950–2011)

0.1 Synopsis
0.2 Data Processing
0.3 Results
- 0.3.1 Most harmful event types to population health (fatalities + injuries)
- 0.3.2 Event types with the greatest economic consequences
0.4 Reproducibility Notes
0.5 Session Info

0.1 Synopsis

This report analyzes the U.S. NOAA Storm Database (events from 1950 through November 2011) to identify which event types are most harmful to population health and which have the greatest economic consequences.
We begin from the raw compressed CSV (.csv.bz2) and perform all data processing within this document to ensure full reproducibility.
For health impact, we consider both fatalities and injuries.
For economic impact, we compute property and crop damages after converting NOAA’s damage exponents (e.g., K, M, B) to numeric multipliers.
Because EVTYPE values are messy and inconsistent across decades, we apply a transparent, pattern-based grouping into canonical event categories (e.g., TORNADO, FLOOD, THUNDERSTORM WIND, HURRICANE/TYPHOON).
We present two figures (top-10 health impact; top-10 economic damage) with descriptive captions.
Earlier years in the database contain fewer recorded events; more recent years are considered more complete.
This report is intended to support decision-makers who must prioritize preparedness and mitigation resources.

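TL;DR: After grouping, TORNADO is by far the most harmful event type to population health (5,661 fatalities and 91,407 injuries). FLOOD (about 161 billion USD) and HURRICANE/TYPHOON (about 139 billion USD) cause the greatest economic damage, followed by TORNADO (about 59 billion USD). Full rankings and totals are shown in the figures and tables below.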

0.2 Data Processing

0.2.1 Loading libraries and downloading raw data

We rely on data.table for fast reads, dplyr for transformation, lubridate for dates, stringr for text normalization, ggplot2 for plotting, and knitr for tables.

library(data.table)
library(dplyr)
library(lubridate)
library(stringr)
library(ggplot2)
library(knitr)

We download the raw compressed CSV only if it is not already present locally. The URL below points to the copy of the dataset distributed for the Reproducible Research course.

if (!dir.exists("data")) dir.create("data", showWarnings = FALSE)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "data/StormData.csv.bz2"

if (!file.exists(destfile)) {
  download.file(url, destfile, mode = "wb", quiet = TRUE)
}

file.info(destfile)[, c("size", "mtime")]

0.2.2 Reading the raw file and initial inspection

# fread() decompresses .bz2 input via the R.utils package, which must be installed
storm_raw <- fread(destfile, stringsAsFactors = FALSE, showProgress = FALSE)
dim(storm_raw)

## [1] 902297     37

head(storm_raw[, .(BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)])

Key variables used:
- EVTYPE: event type (free text; very inconsistent historically)
- FATALITIES, INJURIES: population health metrics
- PROPDMG, PROPDMGEXP: property damage amount and its exponent code
- CROPDMG, CROPDMGEXP: crop damage amount and its exponent code
- BGN_DATE: start date of the event
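Only these variables (plus STATE, which is carried along) are used downstream. If memory or fread's R.utils dependency is a concern, one alternative is to read just these columns with base R. Base R file connections decompress .bz2 transparently when reading. The following is a sketch: the helper name `read_columns` is hypothetical, and it has not been benchmarked against fread on the full file.

```r
# Hypothetical helper (not part of the original analysis): read only selected
# columns from a CSV. file() auto-detects gzip/bzip2/xz compression on read.
read_columns <- function(path, keep) {
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  # colClasses "NULL" skips a column; NA lets read.csv guess the type
  classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(path, colClasses = classes, check.names = FALSE,
           stringsAsFactors = FALSE)
}

# Usage with the downloaded file:
# storm_small <- read_columns("data/StormData.csv.bz2",
#   c("BGN_DATE", "STATE", "EVTYPE", "FATALITIES", "INJURIES",
#     "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
```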

0.2.3 Cleaning and transformations

We parse dates, normalize EVTYPE to improve grouping, and convert damage exponents to numeric multipliers.

Rules for damage exponents (codes are upper-cased before matching, so m and M are equivalent):
- K = 1,000; M = 1,000,000; B = 1,000,000,000; H = 100
- Empty, +, -, or 0 → multiplier = 1
- Digits 1–9 are treated as 10^digit (a common interpretation in this dataset)
- Any other unrecognized code (e.g., ?) defaults to multiplier = 1 (conservative)

storm_raw <- storm_raw %>%
  mutate(
    BGN_DATE = mdy_hms(BGN_DATE),
    EVTYPE = toupper(EVTYPE),
    EVTYPE = str_replace_all(EVTYPE, "[[:punct:]]", " "),
    EVTYPE = str_squish(EVTYPE)
  )
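The effect of this normalization is easiest to see on a few raw-style labels. The following is a base-R sketch of the same three steps: upper-casing, replacing punctuation with spaces, and squishing whitespace as str_squish() does. The function name `normalize_evtype` and the sample labels are illustrative only.

```r
# Hypothetical base-R equivalent of the EVTYPE normalization above
normalize_evtype <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x)   # "/", "-", "(" etc. become spaces
  x <- gsub("\\s+", " ", x)          # collapse runs of whitespace
  trimws(x)                          # drop leading/trailing spaces
}

normalize_evtype(c("Tstm Wind/Hail", "  flash   flood ", "HURRICANE OPAL/HIGH WINDS"))
# returns c("TSTM WIND HAIL", "FLASH FLOOD", "HURRICANE OPAL HIGH WINDS")
```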

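# Convert NOAA damage exponent codes to numeric multipliers (rules listed above)
dmg_exp_to_num <- function(exp) {
  exp <- toupper(str_trim(as.character(exp)))
  exp[is.na(exp)] <- ""
  ifelse(exp %in% c("", "0", "+", "-"), 1,
  ifelse(exp == "H", 1e2,
  ifelse(exp == "K", 1e3,
  ifelse(exp == "M", 1e6,
  ifelse(exp == "B", 1e9,
  # Digit codes become 10^digit. ifelse() evaluates as.numeric() on every
  # element, so suppressWarnings() silences the expected NA-coercion warnings
  # for letter codes (those elements are not selected anyway).
  ifelse(grepl("^[0-9]+$", exp), 10^suppressWarnings(as.numeric(exp)), 1))))))
}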
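The same rules can also be written as a named-vector lookup, which may be easier to audit than nested ifelse() calls. The following is a sketch under a hypothetical name, `exp_multiplier`. It is intended to reproduce the rules listed above, including the 10^digit reading of numeric codes, which remains an interpretation rather than a documented NOAA convention. The worked example applies the damage arithmetic used in storm_clean: amount × multiplier, with property and crop added.

```r
# Hypothetical alternative to dmg_exp_to_num(): letter codes via a lookup table
exp_multiplier <- function(exp) {
  exp <- toupper(trimws(as.character(exp)))
  exp[is.na(exp)] <- ""
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- rep(1, length(exp))                 # default for "", "+", "-", "?", ...
  is_letter <- exp %in% names(letter_map)
  mult[is_letter] <- letter_map[exp[is_letter]]
  is_digit <- grepl("^[0-9]+$", exp)          # digit codes -> 10^digit ("0" -> 1)
  mult[is_digit] <- 10^as.numeric(exp[is_digit])
  mult
}

# Worked example: property 25 with "K" plus crop 1.5 with "m"
25 * exp_multiplier("K") + 1.5 * exp_multiplier("m")   # 25,000 + 1,500,000 = 1,525,000
```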

storm_clean <- storm_raw %>%
  mutate(
    PROP_MULT = dmg_exp_to_num(PROPDMGEXP),
    CROP_MULT = dmg_exp_to_num(CROPDMGEXP),
    property_damage = as.numeric(PROPDMG) * PROP_MULT,
    crop_damage = as.numeric(CROPDMG) * CROP_MULT,
    total_damage = property_damage + crop_damage
  ) %>%
  select(BGN_DATE, STATE, EVTYPE, FATALITIES, INJURIES,
         property_damage, crop_damage, total_damage)

0.2.4 Grouping EVTYPE into canonical categories

NOAA’s event type values contain many variants and typographical differences.
We apply a pattern-based mapping to consolidate common classes.
This mapping is transparent and can be adjusted; unmatched events retain their cleaned EVTYPE.

storm_clean <- storm_clean %>%
  mutate(event_group = dplyr::case_when(
    grepl("TORNADO", EVTYPE) ~ "TORNADO",
    grepl("HURRICANE|TYPHOON|STORM SURGE", EVTYPE) ~ "HURRICANE/TYPHOON",
    grepl("THUNDERSTORM WIND|TSTM WIND|\\bTSTM\\b|THUNDERSTORM", EVTYPE) ~ "THUNDERSTORM WIND",
    grepl("\\bHAIL\\b", EVTYPE) ~ "HAIL",
    grepl("FLASH FLOOD", EVTYPE) ~ "FLASH FLOOD",
    grepl("\\bFLOOD\\b|URBAN.*FLOOD", EVTYPE) ~ "FLOOD",
    grepl("\\bHEAT\\b|EXCESSIVE HEAT|RECORD HEAT|WARMTH|HYPERTHERMIA", EVTYPE) ~ "HEAT",
    grepl("DROUGHT|DRY", EVTYPE) ~ "DROUGHT",
    grepl("LIGHTNING", EVTYPE) ~ "LIGHTNING",
    grepl("BLIZZARD|WINTER|SNOW|ICE|FREEZ|SLEET", EVTYPE) ~ "WINTER WEATHER",
    grepl("WIND", EVTYPE) & !grepl("THUNDERSTORM", EVTYPE) ~ "WIND",
    TRUE ~ EVTYPE
  ))
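case_when() tests its conditions in order and uses the first one that matches. Rule order therefore decides ambiguous labels. For example, TSTM WIND HAIL becomes THUNDERSTORM WIND rather than HAIL, and FLASH FLOOD must be tested before FLOOD. The following base-R sketch reimplements the first six rules with the same patterns to make that behavior visible. `map_event` and the sample labels are illustrative. perl = TRUE is used here for `\\b`, whereas the original calls use R's default regex engine. The sketch also shows that `\\bFLOOD\\b` does not match FLOODING, so a label like COASTAL FLOODING falls through unchanged.

```r
# First six case_when() patterns above, in the same order
patterns <- c("TORNADO",
              "HURRICANE|TYPHOON|STORM SURGE",
              "THUNDERSTORM WIND|TSTM WIND|\\bTSTM\\b|THUNDERSTORM",
              "\\bHAIL\\b",
              "FLASH FLOOD",
              "\\bFLOOD\\b|URBAN.*FLOOD")
groups <- c("TORNADO", "HURRICANE/TYPHOON", "THUNDERSTORM WIND",
            "HAIL", "FLASH FLOOD", "FLOOD")

# Hypothetical helper: the first matching pattern wins; unmatched labels are kept
map_event <- function(evtype) {
  out <- evtype
  matched <- rep(FALSE, length(evtype))
  for (i in seq_along(patterns)) {
    hit <- !matched & grepl(patterns[i], evtype, perl = TRUE)
    out[hit] <- groups[i]
    matched <- matched | hit
  }
  out
}

map_event(c("TSTM WIND HAIL", "FLASH FLOODING", "RIVER FLOOD",
            "COASTAL FLOODING", "HURRICANE OPAL"))
# returns c("THUNDERSTORM WIND", "FLASH FLOOD", "FLOOD",
#           "COASTAL FLOODING", "HURRICANE/TYPHOON")
```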

length(unique(storm_clean$EVTYPE))

## [1] 837

length(unique(storm_clean$event_group))

## [1] 332

kable(head(sort(table(storm_clean$event_group), decreasing = TRUE)), col.names = c("event_group", "count"))

event_group	count
THUNDERSTORM WIND	336806
HAIL	289279
TORNADO	60700
FLASH FLOOD	55667
WINTER WEATHER	44029
WIND	28129
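Because unmatched events keep their cleaned EVTYPE, most of the 332 groups are residual labels. One way to decide which patterns are worth adding is to rank those residual labels by damage. The following is a sketch, assuming a data frame with the event_group and total_damage columns built above. `canonical_groups` and `top_unmatched` are hypothetical names.

```r
# Canonical labels produced by the case_when() rules above
canonical_groups <- c("TORNADO", "HURRICANE/TYPHOON", "THUNDERSTORM WIND", "HAIL",
                      "FLASH FLOOD", "FLOOD", "HEAT", "DROUGHT", "LIGHTNING",
                      "WINTER WEATHER", "WIND")

# Hypothetical helper: total damage of residual (unmatched) labels, largest first
top_unmatched <- function(df, n = 10) {
  residual <- df[!df$event_group %in% canonical_groups, ]
  totals <- vapply(split(residual$total_damage, residual$event_group),
                   sum, numeric(1), na.rm = TRUE)
  head(sort(totals, decreasing = TRUE), n)
}

# Usage: top_unmatched(as.data.frame(storm_clean))
```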

0.3 Results

0.3.1 Most harmful event types to population health (fatalities + injuries)

We sum fatalities and injuries per event_group and present the top 10.

health_summary <- storm_clean %>%
  group_by(event_group) %>%
  summarise(
    total_fatalities = sum(as.numeric(FATALITIES), na.rm = TRUE),
    total_injuries = sum(as.numeric(INJURIES), na.rm = TRUE),
    total_health = total_fatalities + total_injuries,
    .groups = "drop"
  ) %>%
  arrange(desc(total_health))

top10_health <- head(health_summary, 10)

ggplot(top10_health, aes(x = reorder(event_group, total_health), y = total_health)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Human Impact",
    x = "Event Type (grouped)",
    y = "Fatalities + Injuries (total)"
  ) +
  theme_minimal()

Top 10 event groups by total population health impact (fatalities + injuries).

kable(top10_health, digits = 0, caption = "Table: Top-10 event groups by total health impact.")

Table: Top-10 event groups by total health impact.
event_group	total_fatalities	total_injuries	total_health
TORNADO	5661	91407	97068
HEAT	3139	9224	12363
THUNDERSTORM WIND	729	9544	10273
FLOOD	478	6793	7271
WINTER WEATHER	658	6052	6710
LIGHTNING	817	5231	6048
FLASH FLOOD	1035	1802	2837
WIND	690	1935	2625
HURRICANE/TYPHOON	159	1376	1535
HAIL	15	1371	1386

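Interpretation:
TORNADO dominates, with 97,068 combined fatalities and injuries, roughly eight times the next group, HEAT (12,363). In every top-10 group injuries outnumber fatalities; for tornadoes the ratio is about 16 to 1. Fatalities make up a larger share for HEAT (3,139 of 12,363, about 25%), WIND (about 26%), and FLASH FLOOD (1,035 of 2,837, about 36%). Ranked by fatalities alone, HEAT is second and FLASH FLOOD third, so the ordering depends on weighting a death and an injury equally.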
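Summing fatalities and injuries treats them as equally harmful. A quick way to check how sensitive the ranking is to that choice is to compute each group's fatality share and its rank by fatalities alone. The following sketch uses the numbers from the top-10 health table above rather than recomputing them from storm_clean.

```r
# Values transcribed from the top-10 health table above
health <- data.frame(
  event_group = c("TORNADO", "HEAT", "THUNDERSTORM WIND", "FLOOD", "WINTER WEATHER",
                  "LIGHTNING", "FLASH FLOOD", "WIND", "HURRICANE/TYPHOON", "HAIL"),
  fatalities  = c(5661, 3139, 729, 478, 658, 817, 1035, 690, 159, 15),
  injuries    = c(91407, 9224, 9544, 6793, 6052, 5231, 1802, 1935, 1376, 1371),
  stringsAsFactors = FALSE
)
# Percentage of each group's combined total that is fatalities
health$fatality_pct <- round(100 * health$fatalities /
                               (health$fatalities + health$injuries), 1)
# Rank by fatalities alone (1 = most deaths)
health$fatality_rank <- rank(-health$fatalities, ties.method = "min")
health[order(health$fatality_rank), ]
```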

0.3.2 Event types with the greatest economic consequences

We sum property and crop damage per event_group (nominal USD as recorded, not adjusted for inflation) and present the top 10 in billions of USD.

econ_summary <- storm_clean %>%
  group_by(event_group) %>%
  summarise(
    total_property = sum(property_damage, na.rm = TRUE),
    total_crop = sum(crop_damage, na.rm = TRUE),
    total_damage = total_property + total_crop,
    .groups = "drop"
  ) %>%
  arrange(desc(total_damage))

top10_econ <- head(econ_summary, 10)

ggplot(top10_econ, aes(x = reorder(event_group, total_damage), y = total_damage / 1e9)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    x = "Event Type (grouped)",
    y = "Total Damage (billion USD)"
  ) +
  theme_minimal()

Top 10 event groups by total economic damage (property + crop). Values shown in billions of USD.

kable(mutate(top10_econ,
             total_property = round(total_property/1e9, 2),
             total_crop = round(total_crop/1e9, 2),
             total_damage = round(total_damage/1e9, 2)),
      col.names = c("event_group","property (B)","crop (B)","total (B)"),
      caption = "Table: Top-10 event groups by economic damage (billions USD).")

Table: Top-10 event groups by economic damage (billions USD).
event_group	property (B)	crop (B)	total (B)
FLOOD	150.21	10.81	161.01
HURRICANE/TYPHOON	133.32	5.52	138.84
TORNADO	58.60	0.42	59.02
WINTER WEATHER	12.47	7.21	19.68
FLASH FLOOD	17.59	1.53	19.12
HAIL	15.74	3.05	18.78
DROUGHT	1.05	13.97	15.03
THUNDERSTORM WIND	11.18	1.27	12.46
TROPICAL STORM	7.70	0.68	8.38
WIND	6.20	0.78	6.98
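Group totals can be dominated by a handful of records. A single mis-coded exponent (B where M was intended, for instance) would shift a record's damage by a factor of 1,000. Listing the largest individual records before drawing conclusions is therefore a cheap sanity check. The following is a sketch, assuming the storm_clean columns selected earlier. `largest_records` is a hypothetical name.

```r
# Hypothetical helper: the n largest individual records by total damage,
# for manual inspection of magnitudes and exponent coding
largest_records <- function(df, n = 5) {
  df <- df[order(df$total_damage, decreasing = TRUE), ]
  head(df[, c("BGN_DATE", "STATE", "EVTYPE", "property_damage",
              "crop_damage", "total_damage")], n)
}

# Usage: largest_records(as.data.frame(storm_clean), 10)
```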

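Interpretation:
FLOOD (161.01 billion USD) and HURRICANE/TYPHOON (138.84 billion USD) together account for roughly 300 billion USD, about five times the third-ranked TORNADO (59.02 billion USD). For most groups losses are overwhelmingly to property. DROUGHT is the clear exception: 13.97 of its 15.03 billion USD (about 93%) is crop damage, the largest crop loss among these ten groups. FLOOD has the second-largest crop loss (10.81 billion USD), but that is small next to its property damage. WINTER WEATHER also has a sizable crop share (about 37%). TROPICAL STORM appears as a separate entry because the grouping patterns do not fold it into HURRICANE/TYPHOON.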
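To compare property and crop exposure across event types, each group's crop share can be computed directly. The following sketch uses the values from the top-10 economic table above. Those values are rounded to 0.01 billion, so the shares are approximate.

```r
# Values (billion USD) transcribed from the top-10 economic damage table above
econ <- data.frame(
  event_group = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "WINTER WEATHER",
                  "FLASH FLOOD", "HAIL", "DROUGHT", "THUNDERSTORM WIND",
                  "TROPICAL STORM", "WIND"),
  property = c(150.21, 133.32, 58.60, 12.47, 17.59, 15.74, 1.05, 11.18, 7.70, 6.20),
  crop     = c(10.81, 5.52, 0.42, 7.21, 1.53, 3.05, 13.97, 1.27, 0.68, 0.78),
  stringsAsFactors = FALSE
)
# Percentage of each group's damage that is crop damage
econ$crop_pct <- round(100 * econ$crop / (econ$property + econ$crop), 1)
econ[order(econ$crop_pct, decreasing = TRUE), c("event_group", "crop_pct")]
```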

0.4 Reproducibility Notes

The analysis starts from the raw compressed CSV (StormData.csv.bz2) and performs all cleaning and transformations within this R Markdown document.
We used explicit, documented rules to convert damage exponents to numeric multipliers. Unrecognized codes default to a multiplier of 1 (conservative).
We applied a transparent, pattern-based grouping for EVTYPE. The mapping is shown and can be refined; unmatched events retain their cleaned label.
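Earlier decades contain far fewer recorded events, most likely because record keeping was less complete; modern years are considered more complete. Totals are summed over all years without adjustment, so they reflect recording coverage as well as actual impact.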
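The difference in record counts across decades can be checked directly from the parsed start dates. The following is a sketch with a hypothetical helper name, `events_per_decade`. It accepts Date or POSIXct input, so it should work on BGN_DATE after mdy_hms() parsing.

```r
# Hypothetical helper: count records per decade from event start dates
events_per_decade <- function(dates) {
  year <- as.integer(format(dates, "%Y"))
  table(decade = 10L * (year %/% 10L))
}

# Usage: events_per_decade(storm_clean$BGN_DATE)
```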

0.5 Session Info

sessionInfo()

## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8   
## [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=English_India.utf8    
## 
## time zone: Asia/Calcutta
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.50        ggplot2_3.5.2     stringr_1.5.1     lubridate_1.9.4  
## [5] dplyr_1.1.4       data.table_1.16.2
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5        cli_3.6.3          rlang_1.1.4        xfun_0.52         
##  [5] stringi_1.8.4      generics_0.1.4     jsonlite_1.8.8     labeling_0.4.3    
##  [9] glue_1.7.0         htmltools_0.5.8.1  sass_0.4.9         scales_1.4.0      
## [13] rmarkdown_2.29     grid_4.4.1         evaluate_0.24.0    jquerylib_0.1.4   
## [17] tibble_3.3.0       fastmap_1.2.0      yaml_2.3.9         lifecycle_1.0.4   
## [21] compiler_4.4.1     RColorBrewer_1.1-3 timechange_0.3.0   pkgconfig_2.0.3   
## [25] farver_2.1.2       digest_0.6.36      R6_2.5.1           tidyselect_1.2.1  
## [29] pillar_1.11.0      magrittr_2.0.3     bslib_0.7.0        withr_3.0.0       
## [33] gtable_0.3.6       tools_4.4.1        cachem_1.1.0

NOAA Storm Data Analysis — Health & Economic Impact (1950–2011)

Rahul V

2025-08-18