Synopsis

This report analyzes the NOAA severe weather events database to identify which event types cause the greatest harm to population health and the largest economic losses in the United States. The analysis works directly from the original compressed .csv.bz2 file to maintain reproducibility. First, the key variables are cleaned and transformed: event type, fatalities, injuries, property damage, and crop damage. Next, the damage exponent codes are converted to numeric multipliers so that all amounts are expressed in dollars. Two aggregate metrics are then built by event type: health impact (fatalities + injuries) and economic impact (property + crop damage). Finally, event types are ranked to obtain the top ten in each dimension, and the results are presented in tables and plots. Tornadoes cause by far the most combined fatalities and injuries (96,979), while floods cause the largest economic losses (about 150 billion USD), so the events most damaging to health are not the same as those with the highest economic cost. This distinction helps prioritize preparedness actions according to the dominant risk type.

Context

Severe storms and other weather phenomena can cause both public health and economic problems. This analysis uses the NOAA Storm Database to answer two questions: which events are most harmful to population health, and which events have the greatest economic consequences.

Data Processing

The analysis begins from the raw .csv.bz2 file. If the file does not exist locally, it is downloaded first. It is then read directly from the compressed source through a bzfile connection, without a separate decompression step.

knitr::opts_chunk$set(echo = TRUE)

url_data <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
local_bz2 <- file.path("data", "StormData.csv.bz2")

if (!dir.exists("data")) {
  dir.create("data", recursive = TRUE)
}

if (!file.exists(local_bz2)) {
  download.file(url = url_data, destfile = local_bz2, mode = "wb")
}
storm_data <- read.csv(bzfile(local_bz2), stringsAsFactors = FALSE)
dim(storm_data)
## [1] 902297     37
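
To illustrate how read.csv can read through a bzfile connection, the following minimal sketch writes a tiny made-up table to a temporary .csv.bz2 file and reads it back. The objects toy, tmp_bz2, and toy_back are hypothetical and unrelated to the NOAA file.

```r
# Toy data standing in for the real database (values are invented)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  stringsAsFactors = FALSE
)

# Write the table to a temporary bzip2-compressed CSV
tmp_bz2 <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp_bz2, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report reads StormData.csv.bz2
toy_back <- read.csv(bzfile(tmp_bz2), stringsAsFactors = FALSE)
dim(toy_back)

unlink(tmp_bz2)  # remove the temporary file
```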

Data preparation consisted of selecting the required variables and applying the transformations needed to answer the project questions:

1. EVTYPE is converted to upper case and trimmed of surrounding whitespace to reduce casing and spacing variation.
2. Numeric multipliers are derived from the exponent codes in PROPDMGEXP and CROPDMGEXP.
3. Health impact is computed as FATALITIES + INJURIES.
4. Economic impact is computed as property damage plus crop damage in dollars, that is, PROPDMG and CROPDMG each scaled by its multiplier.
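
As a small illustration of the EVTYPE step, the sketch below applies the same toupper and trimws calls to a few made-up labels. The vector raw_labels is hypothetical and not taken from the database. Casing and whitespace variants collapse into one label, while an abbreviation such as TSTM WIND stays distinct from THUNDERSTORM WIND.

```r
# Invented labels that mimic the kind of variation found in EVTYPE
raw_labels <- c(" Tornado", "TORNADO ", "tstm wind", "THUNDERSTORM WIND")

# Same normalization as in the report: trim whitespace, then upper-case
clean_labels <- toupper(trimws(raw_labels))

# Distinct labels remaining after normalization
unique(clean_labels)
```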

cols_needed <- c(
  "EVTYPE", "FATALITIES", "INJURIES",
  "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"
)

storm_subset <- storm_data[, cols_needed]
storm_subset$EVTYPE <- toupper(trimws(storm_subset$EVTYPE))

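# Convert damage exponent codes into numeric multipliers
exp_to_multiplier <- function(x) {
  x <- toupper(trimws(as.character(x)))
  out <- rep(1, length(x))  # default: the amount is used as recorded

  # Letter codes: hundreds, thousands, millions, billions
  out[x == "H"] <- 1e2
  out[x == "K"] <- 1e3
  out[x == "M"] <- 1e6
  out[x == "B"] <- 1e9

  # A single digit is treated as a power of ten
  digit_idx <- grepl("^[0-9]$", x)
  out[digit_idx] <- 10 ^ as.numeric(x[digit_idx])

  # Blank, symbolic, and missing codes keep a multiplier of 1
  unknown_idx <- x %in% c("", "+", "-", "?", "NA") | is.na(x)
  out[unknown_idx] <- 1

  out
}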
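
As a quick check of the arithmetic, the sketch below applies a compact lookup-based variant of the exponent mapping to a few invented amounts and codes. The function exp_lookup and the vectors codes and amount are hypothetical names introduced only for this illustration; the variant is intended to follow the same rules as the function in the report, not to replace it.

```r
# Hypothetical compact variant of the exponent mapping (illustration only)
exp_lookup <- function(x) {
  x <- toupper(trimws(as.character(x)))
  letters_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  out <- rep(1, length(x))  # default for blank, symbolic, or missing codes
  is_letter <- x %in% names(letters_map)
  out[is_letter] <- unname(letters_map[x[is_letter]])

  is_digit <- grepl("^[0-9]$", x)
  out[is_digit] <- 10 ^ as.numeric(x[is_digit])
  out
}

# Invented examples: 25 with code "K" becomes 25,000 dollars, and so on
codes   <- c("K", "m", "B", "5", "", "?", NA)
amount  <- c(25, 2.5, 1, 3, 10, 10, 10)
dollars <- amount * exp_lookup(codes)

data.frame(code = codes, amount = amount, dollars = dollars)
```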

storm_subset$prop_damage <- storm_subset$PROPDMG * exp_to_multiplier(storm_subset$PROPDMGEXP)
storm_subset$crop_damage <- storm_subset$CROPDMG * exp_to_multiplier(storm_subset$CROPDMGEXP)
storm_subset$health_impact <- storm_subset$FATALITIES + storm_subset$INJURIES
storm_subset$economic_impact <- storm_subset$prop_damage + storm_subset$crop_damage

health_by_event <- aggregate(health_impact ~ EVTYPE, data = storm_subset, sum, na.rm = TRUE)
econ_by_event <- aggregate(economic_impact ~ EVTYPE, data = storm_subset, sum, na.rm = TRUE)

health_by_event <- health_by_event[health_by_event$health_impact > 0, ]
econ_by_event <- econ_by_event[econ_by_event$economic_impact > 0, ]

top_health <- head(health_by_event[order(-health_by_event$health_impact), ], 10)
top_econ <- head(econ_by_event[order(-econ_by_event$economic_impact), ], 10)
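
The aggregate, filter, and rank pattern used above can be traced on a toy data frame. The sketch below uses invented values and the hypothetical objects toy, toy_totals, and toy_top; it mirrors the steps applied to storm_subset but keeps only the top two rows.

```r
# Invented per-event records
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  health_impact = c(10, 2, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Sum the metric within each event type
toy_totals <- aggregate(health_impact ~ EVTYPE, data = toy, FUN = sum)

# Drop event types with no recorded impact
toy_totals <- toy_totals[toy_totals$health_impact > 0, ]

# Sort in descending order and keep the top two
toy_top <- head(toy_totals[order(-toy_totals$health_impact), ], 2)
toy_top
```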

Results

Event types most harmful to population health

top_health
##                EVTYPE health_impact
## 750           TORNADO         96979
## 108    EXCESSIVE HEAT          8428
## 771         TSTM WIND          7461
## 146             FLOOD          7259
## 410         LIGHTNING          6046
## 235              HEAT          3037
## 130       FLASH FLOOD          2755
## 379         ICE STORM          2064
## 677 THUNDERSTORM WIND          1621
## 880      WINTER STORM          1527

# Shade bars from light (lowest total) to dark (highest total)
n_health <- nrow(top_health)
health_colors <- character(n_health)
health_colors[order(top_health$health_impact)] <-
  grDevices::colorRampPalette(c("#F2E5FF", "#031b5f"))(n_health)

barplot(
  height = top_health$health_impact,
  names.arg = top_health$EVTYPE,
  las = 2,
  cex.names = 0.8,
  col = health_colors,
  main = "Top 10 event types by health impact",
  ylab = "Fatalities + Injuries"
)
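
To put the first bar in perspective, the following sketch recomputes each event type's share of the top-ten total from the values printed in the table above. The vector top10_health simply retypes those values and is a hypothetical name; the shares are relative to the ten listed types only, not to all events in the database.

```r
# Values copied from the top_health table above
top10_health <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

# Share of the top-ten total, as a proportion and as a rounded percentage
health_share <- top10_health / sum(top10_health)
round(100 * health_share, 1)
```

With these figures, TORNADO accounts for roughly 71% of the combined fatalities and injuries across the ten listed types.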

Event types with the greatest economic consequences

top_econ
##                EVTYPE economic_impact
## 146             FLOOD    150319678257
## 364 HURRICANE/TYPHOON     71913712800
## 750           TORNADO     57362333947
## 591       STORM SURGE     43323541000
## 204              HAIL     18761221986
## 130       FLASH FLOOD     18244041079
## 76            DROUGHT     15018672000
## 355         HURRICANE     14610229010
## 521       RIVER FLOOD     10148404500
## 379         ICE STORM      8967041360

# One distinct qualitative color per event type
econ_colors_distinct <- grDevices::hcl.colors(nrow(top_econ), palette = "Set 2")

barplot(
  height = top_econ$economic_impact / 1e9,
  names.arg = top_econ$EVTYPE,
  las = 2,
  cex.names = 0.8,
  col = econ_colors_distinct,
  main = "Top 10 event types by economic impact",
  ylab = "Total economic damage (billion USD)"
)

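Overall, the two rankings differ. Tornadoes dominate health impact with 96,979 combined fatalities and injuries, more than ten times the total for the second-ranked type, excessive heat (8,428). Floods lead economic impact with about 150 billion USD, roughly twice the total for hurricanes/typhoons (about 72 billion USD). Only four event types (TORNADO, FLOOD, FLASH FLOOD, and ICE STORM) appear in both top-ten lists. This suggests that preparedness and resource allocation should consider at least two risk dimensions: population health and economic losses.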
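
The overlap between the two rankings can be checked directly from the event labels printed in the tables above. In the sketch below, health_top10 and econ_top10 are hypothetical vectors that retype those labels; in the full analysis the same comparison could be made with top_health$EVTYPE and top_econ$EVTYPE.

```r
# Labels copied from the two top-ten tables
health_top10 <- c(
  "TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
  "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND", "WINTER STORM"
)
econ_top10 <- c(
  "FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
  "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"
)

# Event types present in both rankings
shared <- intersect(health_top10, econ_top10)
shared

# Event types that appear in only one ranking
health_only <- setdiff(health_top10, econ_top10)
econ_only <- setdiff(econ_top10, health_top10)
```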