0.1 Synopsis

This report analyzes the U.S. NOAA Storm Database (events from 1950 through November 2011) to identify which event types are most harmful to population health and which have the greatest economic consequences.
We begin from the raw compressed CSV (.csv.bz2) and perform all data processing within this document to ensure full reproducibility.
For health impact, we consider both fatalities and injuries.
For economic impact, we compute property and crop damages after converting NOAA’s damage exponents (e.g., K, M, B) to numeric multipliers.
Because EVTYPE values are messy and inconsistent across decades, we apply a transparent, pattern-based grouping into canonical event categories (e.g., TORNADO, FLOOD, THUNDERSTORM WIND, HURRICANE/TYPHOON).
We present two figures (top-10 health impact; top-10 economic damage) with descriptive captions.
Earlier years in the database contain fewer recorded events; more recent years are considered more complete.
This report is intended to support decision-makers who must prioritize preparedness and mitigation resources.

TL;DR: After grouping, TORNADO causes by far the largest health impact (97,068 fatalities and injuries combined), while FLOOD (161.01 billion USD) and HURRICANE/TYPHOON (138.84 billion USD) cause the greatest economic damage. Full ranks and totals are shown in the figures and tables below.

0.2 Data Processing

0.2.1 Loading libraries and downloading raw data

We rely on data.table for fast reads, dplyr for transformation, lubridate for dates, stringr for text normalization, ggplot2 for the figures, and knitr for the tables.

library(data.table)
library(dplyr)
library(lubridate)
library(stringr)
library(ggplot2)
library(knitr)

We download the raw compressed CSV if it is not already present. The URL below is the copy typically used in the Reproducible Research course.

if (!dir.exists("data")) dir.create("data", showWarnings = FALSE)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "data/StormData.csv.bz2"

if (!file.exists(destfile)) {
  download.file(url, destfile, mode = "wb", quiet = TRUE)
}

file.info(destfile)[, c("size", "mtime")]

0.2.2 Reading the raw file and initial inspection

# fread() reads .bz2 files directly only when the R.utils package is installed
storm_raw <- fread("data/StormData.csv.bz2", stringsAsFactors = FALSE, showProgress = FALSE)
dim(storm_raw)
## [1] 902297     37
head(storm_raw[, .(BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)])

Key variables used:
- EVTYPE: event type (text; very messy historically)
- FATALITIES, INJURIES: population health metrics
- PROPDMG, PROPDMGEXP: property damage and exponent code
- CROPDMG, CROPDMGEXP: crop damage and exponent code
- BGN_DATE: start date of the event

0.2.3 Cleaning and transformations

We parse dates, normalize EVTYPE to improve grouping, and convert damage exponents to numeric multipliers.

Rules for damage exponents (codes are matched case-insensitively):
- K = 1,000; M = 1,000,000; B = 1,000,000,000; H = 100
- Empty/unknown/+/-/0 → multiplier = 1
- Digits 1-9 are treated as 10^digit (a common interpretation in this dataset)
- Any other unrecognized code defaults to multiplier = 1 (conservative)

storm_raw <- storm_raw %>%
  mutate(
    BGN_DATE = mdy_hms(BGN_DATE),
    EVTYPE = toupper(EVTYPE),
    EVTYPE = str_replace_all(EVTYPE, "[[:punct:]]", " "),
    EVTYPE = str_squish(EVTYPE)
  )

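dmg_exp_to_num <- function(exp) {
  # Normalize codes: character, trimmed, upper-case; missing values become ""
  exp <- toupper(str_trim(as.character(exp)))
  exp[is.na(exp)] <- ""
  # Default multiplier 1 covers "", "0", "+", "-" and any unrecognized code
  mult <- rep(1, length(exp))
  mult[exp == "H"] <- 1e2
  mult[exp == "K"] <- 1e3
  mult[exp == "M"] <- 1e6
  mult[exp == "B"] <- 1e9
  # Digit codes mean 10^digit; converting only the matching elements avoids
  # the "NAs introduced by coercion" warnings that nested ifelse() triggers
  is_digit <- grepl("^[0-9]+$", exp)
  mult[is_digit] <- 10^as.numeric(exp[is_digit])
  mult
}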

storm_clean <- storm_raw %>%
  mutate(
    PROP_MULT = dmg_exp_to_num(PROPDMGEXP),
    CROP_MULT = dmg_exp_to_num(CROPDMGEXP),
    property_damage = as.numeric(PROPDMG) * PROP_MULT,
    crop_damage = as.numeric(CROPDMG) * CROP_MULT,
    total_damage = property_damage + crop_damage
  ) %>%
  select(BGN_DATE, STATE, EVTYPE, FATALITIES, INJURIES,
         property_damage, crop_damage, total_damage)
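
As a quick sanity check of the exponent rules, the sketch below applies them to a handful of made-up rows using base R only. The helper name `exp_multiplier_sketch` and the sample values are illustrative assumptions, not part of the report's pipeline; the helper is meant to mirror `dmg_exp_to_num`, not replace it.

```r
# Hypothetical helper mirroring the exponent rules above (base R only).
exp_multiplier_sketch <- function(code) {
  code <- as.character(code)
  code[is.na(code)] <- ""                 # missing codes are treated as empty
  code <- toupper(trimws(code))
  mult <- rep(1, length(code))            # default for "", "+", "-", "?" and other codes
  letters_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- code %in% names(letters_map)
  mult[is_letter] <- unname(letters_map[code[is_letter]])
  is_digit <- grepl("^[0-9]$", code)      # single digit means 10^digit
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

# Made-up sample rows: a damage value plus its exponent code.
sample_dmg <- data.frame(
  DMG = c(25, 2.5, 1.2, 3, 10, 7, 4),
  EXP = c("K", "m", "B", "h", "", "?", "5"),
  stringsAsFactors = FALSE
)
sample_dmg$usd <- sample_dmg$DMG * exp_multiplier_sketch(sample_dmg$EXP)
print(sample_dmg)
```

For instance, a row with a damage value of 25 and code K works out to 25 × 1,000 = 25,000 USD, while a row with an unrecognized code such as ? keeps its raw value.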

0.2.4 Grouping EVTYPE into canonical categories

NOAA’s event type values contain many variants and typographical differences.
We apply a pattern-based mapping to consolidate common classes.
This mapping is transparent and can be adjusted; unmatched events retain their cleaned EVTYPE.

storm_clean <- storm_clean %>%
  mutate(event_group = dplyr::case_when(
    grepl("TORNADO", EVTYPE) ~ "TORNADO",
    grepl("HURRICANE|TYPHOON|STORM SURGE", EVTYPE) ~ "HURRICANE/TYPHOON",
    grepl("THUNDERSTORM WIND|TSTM WIND|\\bTSTM\\b|THUNDERSTORM", EVTYPE) ~ "THUNDERSTORM WIND",
    grepl("\\bHAIL\\b", EVTYPE) ~ "HAIL",
    grepl("FLASH FLOOD", EVTYPE) ~ "FLASH FLOOD",
    grepl("\\bFLOOD\\b|URBAN.*FLOOD", EVTYPE) ~ "FLOOD",
    grepl("\\bHEAT\\b|EXCESSIVE HEAT|RECORD HEAT|WARMTH|HYPERTHERMIA", EVTYPE) ~ "HEAT",
    grepl("DROUGHT|DRY", EVTYPE) ~ "DROUGHT",
    grepl("LIGHTNING", EVTYPE) ~ "LIGHTNING",
    grepl("BLIZZARD|WINTER|SNOW|ICE|FREEZ|SLEET", EVTYPE) ~ "WINTER WEATHER",
    grepl("WIND", EVTYPE) & !grepl("THUNDERSTORM", EVTYPE) ~ "WIND",
    TRUE ~ EVTYPE
  ))

length(unique(storm_clean$EVTYPE))
## [1] 837
length(unique(storm_clean$event_group))
## [1] 332
kable(head(sort(table(storm_clean$event_group), decreasing = TRUE)), col.names = c("event_group", "count"))
| event_group       |  count |
|:------------------|-------:|
| THUNDERSTORM WIND | 336806 |
| HAIL              | 289279 |
| TORNADO           |  60700 |
| FLASH FLOOD       |  55667 |
| WINTER WEATHER    |  44029 |
| WIND              |  28129 |
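
Because `case_when` returns the first rule that matches, the order of the patterns matters: `FLASH FLOOD` has to be tested before the more general `FLOOD`. The base-R sketch below illustrates this on a few made-up labels with a reduced rule set. The names `normalize_evtype_sketch`, `group_evtype_sketch`, `rules`, and the sample strings are hypothetical and introduced only for illustration.

```r
# Normalize labels as in the cleaning step: upper-case, punctuation to spaces,
# collapse repeated whitespace (base-R stand-ins for str_replace_all/str_squish).
normalize_evtype_sketch <- function(x) {
  x <- toupper(x)
  x <- gsub("[[:punct:]]", " ", x)
  trimws(gsub("\\s+", " ", x))
}

# Reduced, ordered rule set (pattern = group): the first matching pattern wins.
rules <- c(
  "TORNADO"                 = "TORNADO",
  "THUNDERSTORM|\\bTSTM\\b" = "THUNDERSTORM WIND",
  "FLASH FLOOD"             = "FLASH FLOOD",
  "\\bFLOOD\\b"             = "FLOOD"
)

group_evtype_sketch <- function(evtype, rules) {
  group <- evtype                          # unmatched labels keep their cleaned value
  assigned <- rep(FALSE, length(evtype))
  for (i in seq_along(rules)) {
    hit <- !assigned & grepl(names(rules)[i], evtype)
    group[hit] <- rules[[i]]
    assigned <- assigned | hit
  }
  group
}

raw <- c("Tstm Wind/Hail", " flash flood ", "FLOOD", "TORNADO F2", "Dust Devil")
cleaned <- normalize_evtype_sketch(raw)
grouped <- group_evtype_sketch(cleaned, rules)
print(data.frame(raw, cleaned, grouped))
```

If the `FLOOD` rule were listed ahead of `FLASH FLOOD`, the second sample label would be absorbed into `FLOOD`, which is why the more specific pattern comes first.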

0.3 Results

0.3.1 Most harmful event types to population health (fatalities + injuries)

We sum fatalities and injuries per event_group and present the top 10.

health_summary <- storm_clean %>%
  group_by(event_group) %>%
  summarise(
    total_fatalities = sum(as.numeric(FATALITIES), na.rm = TRUE),
    total_injuries = sum(as.numeric(INJURIES), na.rm = TRUE),
    total_health = total_fatalities + total_injuries,
    .groups = "drop"
  ) %>%
  arrange(desc(total_health))

top10_health <- head(health_summary, 10)

ggplot(top10_health, aes(x = reorder(event_group, total_health), y = total_health)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Human Impact",
    x = "Event Type (grouped)",
    y = "Fatalities + Injuries (total)"
  ) +
  theme_minimal()

Top 10 event groups by total population health impact (fatalities + injuries).

kable(top10_health, digits = 0, caption = "Table: Top-10 event groups by total health impact.")
Table: Top-10 event groups by total health impact.
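| event_group       | total_fatalities | total_injuries | total_health |
|:------------------|-----------------:|---------------:|-------------:|
| TORNADO           |             5661 |          91407 |        97068 |
| HEAT              |             3139 |           9224 |        12363 |
| THUNDERSTORM WIND |              729 |           9544 |        10273 |
| FLOOD             |              478 |           6793 |         7271 |
| WINTER WEATHER    |              658 |           6052 |         6710 |
| LIGHTNING         |              817 |           5231 |         6048 |
| FLASH FLOOD       |             1035 |           1802 |         2837 |
| WIND              |              690 |           1935 |         2625 |
| HURRICANE/TYPHOON |              159 |           1376 |         1535 |
| HAIL              |               15 |           1371 |         1386 |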

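Interpretation:
Tornadoes dominate the health burden with 97,068 combined fatalities and injuries, nearly eight times the total for heat (12,363), the second-ranked group. For most groups the total is driven by injuries; tornadoes alone account for 91,407 injuries against 5,661 fatalities. Heat and flash floods stand out for their fatalities: heat has the second-highest death count (3,139), and flash floods record 1,035 fatalities against only 1,802 injuries.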
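
To make the injuries-versus-fatalities comparison concrete, the sketch below recomputes each group's fatality share from the table values. The data frame is typed in by hand from the table above, and the names `top10_health_tbl` and `fatality_share_pct` are introduced here for illustration only.

```r
# Values copied from the top-10 health table above.
top10_health_tbl <- data.frame(
  event_group = c("TORNADO", "HEAT", "THUNDERSTORM WIND", "FLOOD", "WINTER WEATHER",
                  "LIGHTNING", "FLASH FLOOD", "WIND", "HURRICANE/TYPHOON", "HAIL"),
  total_fatalities = c(5661, 3139, 729, 478, 658, 817, 1035, 690, 159, 15),
  total_injuries = c(91407, 9224, 9544, 6793, 6052, 5231, 1802, 1935, 1376, 1371),
  stringsAsFactors = FALSE
)
top10_health_tbl$total_health <- with(top10_health_tbl, total_fatalities + total_injuries)

# Share of each group's combined total that is fatalities, in percent.
top10_health_tbl$fatality_share_pct <- with(
  top10_health_tbl, round(100 * total_fatalities / total_health, 1)
)

# Rank groups from the most to the least fatality-driven.
share_ranked <- top10_health_tbl[order(-top10_health_tbl$fatality_share_pct),
                                 c("event_group", "total_health", "fatality_share_pct")]
print(share_ranked, row.names = FALSE)
```

Sorting by share rather than by total separates the groups whose burden is mostly injuries from those where deaths make up a large part of the count.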

0.3.2 Event types with the greatest economic consequences

We compute property + crop damages (USD) and present the top 10 (in billions).

econ_summary <- storm_clean %>%
  group_by(event_group) %>%
  summarise(
    total_property = sum(property_damage, na.rm = TRUE),
    total_crop = sum(crop_damage, na.rm = TRUE),
    total_damage = total_property + total_crop,
    .groups = "drop"
  ) %>%
  arrange(desc(total_damage))

top10_econ <- head(econ_summary, 10)

ggplot(top10_econ, aes(x = reorder(event_group, total_damage), y = total_damage / 1e9)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    x = "Event Type (grouped)",
    y = "Total Damage (billion USD)"
  ) +
  theme_minimal()

Top 10 event groups by total economic damage (property + crop). Values shown in billions of USD.

kable(mutate(top10_econ,
             total_property = round(total_property/1e9, 2),
             total_crop = round(total_crop/1e9, 2),
             total_damage = round(total_damage/1e9, 2)),
      col.names = c("event_group","property (B)","crop (B)","total (B)"),
      caption = "Table: Top-10 event groups by economic damage (billions USD).")
Table: Top-10 event groups by economic damage (billions USD).
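| event_group       | property (B) | crop (B) | total (B) |
|:------------------|-------------:|---------:|----------:|
| FLOOD             |       150.21 |    10.81 |    161.01 |
| HURRICANE/TYPHOON |       133.32 |     5.52 |    138.84 |
| TORNADO           |        58.60 |     0.42 |     59.02 |
| WINTER WEATHER    |        12.47 |     7.21 |     19.68 |
| FLASH FLOOD       |        17.59 |     1.53 |     19.12 |
| HAIL              |        15.74 |     3.05 |     18.78 |
| DROUGHT           |         1.05 |    13.97 |     15.03 |
| THUNDERSTORM WIND |        11.18 |     1.27 |     12.46 |
| TROPICAL STORM    |         7.70 |     0.68 |      8.38 |
| WIND              |         6.20 |     0.78 |      6.98 |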

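Interpretation:
Floods (161.01 billion USD) and hurricanes/typhoons (138.84 billion USD) dominate economic losses, followed at a distance by tornadoes (59.02 billion USD). For these three groups the losses are almost entirely property damage. Drought is the clear exception: 13.97 of its 15.03 billion USD is crop damage. Winter weather also carries a sizeable crop component (7.21 of 19.68 billion USD).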
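
The property-versus-crop split can be checked directly from the table values. The sketch below is a minimal illustration with the figures typed in by hand from the table above; `top10_econ_tbl`, `property_b`, `crop_b`, and `crop_share_pct` are names assumed here for the example.

```r
# Values (billions of USD) copied from the top-10 economic table above.
top10_econ_tbl <- data.frame(
  event_group = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "WINTER WEATHER", "FLASH FLOOD",
                  "HAIL", "DROUGHT", "THUNDERSTORM WIND", "TROPICAL STORM", "WIND"),
  property_b = c(150.21, 133.32, 58.60, 12.47, 17.59, 15.74, 1.05, 11.18, 7.70, 6.20),
  crop_b = c(10.81, 5.52, 0.42, 7.21, 1.53, 3.05, 13.97, 1.27, 0.68, 0.78),
  stringsAsFactors = FALSE
)

# Crop share of each group's combined damage, in percent. The denominator is
# recomputed from the rounded columns, so it can differ from the printed
# total by 0.01.
top10_econ_tbl$crop_share_pct <- with(
  top10_econ_tbl, round(100 * crop_b / (property_b + crop_b), 1)
)

# Rank groups from the most to the least crop-driven.
crop_ranked <- top10_econ_tbl[order(-top10_econ_tbl$crop_share_pct), ]
print(crop_ranked, row.names = FALSE)
```

Ranking by crop share puts DROUGHT first and shows that the largest loss categories overall are driven almost entirely by property damage.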

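0.4 Reproducibility Notes

All processing starts from the raw .csv.bz2 file and runs inside this document (Section 0.2); no manual edits are made to the data. The R and package versions used to produce the results above are listed in the session info below.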

0.5 Session Info

sessionInfo()
## R version 4.4.1 (2024-06-14 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26100)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_India.utf8  LC_CTYPE=English_India.utf8   
## [3] LC_MONETARY=English_India.utf8 LC_NUMERIC=C                  
## [5] LC_TIME=English_India.utf8    
## 
## time zone: Asia/Calcutta
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.50        ggplot2_3.5.2     stringr_1.5.1     lubridate_1.9.4  
## [5] dplyr_1.1.4       data.table_1.16.2
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5        cli_3.6.3          rlang_1.1.4        xfun_0.52         
##  [5] stringi_1.8.4      generics_0.1.4     jsonlite_1.8.8     labeling_0.4.3    
##  [9] glue_1.7.0         htmltools_0.5.8.1  sass_0.4.9         scales_1.4.0      
## [13] rmarkdown_2.29     grid_4.4.1         evaluate_0.24.0    jquerylib_0.1.4   
## [17] tibble_3.3.0       fastmap_1.2.0      yaml_2.3.9         lifecycle_1.0.4   
## [21] compiler_4.4.1     RColorBrewer_1.1-3 timechange_0.3.0   pkgconfig_2.0.3   
## [25] farver_2.1.2       digest_0.6.36      R6_2.5.1           tidyselect_1.2.1  
## [29] pillar_1.11.0      magrittr_2.0.3     bslib_0.7.0        withr_3.0.0       
## [33] gtable_0.3.6       tools_4.4.1        cachem_1.1.0