Synopsis

Using the U.S. NOAA Storm Database (1950–Nov 2011), I quantify which event types are most harmful to population health (fatalities and injuries) and which have the greatest economic consequences (property plus crop damage). After converting the damage exponent codes to dollar multipliers and summing by event type (EVTYPE, used as recorded), I find that tornadoes dominate both fatalities (5,633) and injuries (91,346), while floods (about $150 billion) and hurricanes/typhoons (about $72 billion) account for the largest total economic losses. All results are reproduced from the raw compressed file by the code in this document.

Data Processing

The original dataset is a CSV file compressed with bzip2. The code below downloads it if it is missing, reads the full file, and then keeps only the seven variables needed for the analysis.

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
f   <- "StormData.csv.bz2"

if (!file.exists(f)) {
  download.file(url, destfile = f, mode = "wb")
}

# Read directly from the bzip2 file
storm <- read.csv(bzfile(f), stringsAsFactors = FALSE)

# Keep only fields needed
keep <- c("EVTYPE","FATALITIES","INJURIES",
          "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
storm <- storm[keep]
str(storm)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
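
If memory or read time is a concern, the unused columns can be skipped at read time rather than dropped afterwards. The following is a minimal sketch of that idea on a tiny temporary file; the file contents and the names tmp, keep_cols, and small are illustrative assumptions, not part of the analysis above.

```r
# Build a tiny bzip2-compressed CSV to stand in for the real file (toy data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     STATE = c("AL", "CA"),
                     FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

keep_cols <- c("EVTYPE", "FATALITIES")

# Read a single row just to learn the column names
hdr <- names(read.csv(bzfile(tmp), nrows = 1))

# colClasses "NULL" skips a column; NA lets read.csv infer the type
cc <- ifelse(hdr %in% keep_cols, NA, "NULL")
small <- read.csv(bzfile(tmp), colClasses = cc, stringsAsFactors = FALSE)
str(small)
```

With the real file, the same pattern would use f in place of tmp and keep in place of keep_cols.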

Cleaning damage exponents

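Each dollar amount is stored as a number plus an exponent code (PROPDMG/PROPDMGEXP and CROPDMG/CROPDMGEXP). Codes are matched case-insensitively: a blank code means a multiplier of 1, H is 100, K is 1,000, M is 1,000,000, B is 1,000,000,000, and a single digit d is treated as 10^d. Any other code gets a multiplier of zero, so the damage recorded on those rows is excluded from the totals.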

# Helper to convert exponent codes to multipliers
exp_to_mult <- function(x) {
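  x <- toupper(trimws(x))             # normalize case and strip whitespace
  out <- rep(0, length(x))            # unrecognized codes keep multiplier 0
  out[is.na(x) | x == ""] <- 1        # blank or missing code: multiplier 1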
  out[x %in% c("H")] <- 1e2
  out[x %in% c("K")] <- 1e3
  out[x %in% c("M")] <- 1e6
  out[x %in% c("B")] <- 1e9
  # Some rows have digits; treat as 10^digit
  digs <- suppressWarnings(as.numeric(x))
  out[!is.na(digs)] <- 10^digs[!is.na(digs)]
  out
}

prop_mult <- exp_to_mult(storm$PROPDMGEXP)
crop_mult <- exp_to_mult(storm$CROPDMGEXP)

storm$PROP_DOLLARS <- storm$PROPDMG * prop_mult
storm$CROP_DOLLARS <- storm$CROPDMG * crop_mult
storm$ECON_DOLLARS <- storm$PROP_DOLLARS + storm$CROP_DOLLARS
storm$HEALTH_HARM  <- storm$FATALITIES + storm$INJURIES
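
To make the mapping concrete, here is a small worked example on made-up rows. It is a sketch that expresses the same rules with a named lookup vector instead of repeated %in% assignments; the toy values and the names toy, mult_table, code, and mult are assumptions introduced for illustration only.

```r
# Toy rows (not taken from the NOAA file)
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 3, 7, 4),
                  PROPDMGEXP = c("K", "m", "B", "", "?", "2"),
                  stringsAsFactors = FALSE)

# Letter codes and their multipliers
mult_table <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

code <- toupper(trimws(toy$PROPDMGEXP))
mult <- unname(mult_table[code])        # NA for anything not in the table

# Single digits are read as powers of ten
is_digit <- grepl("^[0-9]$", code)
mult[is_digit] <- 10^as.numeric(code[is_digit])

mult[code == ""]  <- 1                  # blank code: multiplier 1
mult[is.na(mult)] <- 0                  # everything else: multiplier 0

toy$PROP_DOLLARS <- toy$PROPDMG * mult
print(toy)
```

The row with code "?" ends up with zero dollars, which shows how unrecognized codes drop out of the totals.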

Aggregate by event type

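# Sum health and economic totals by event type. EVTYPE is used as recorded,
# so variants such as "TSTM WIND" and "THUNDERSTORM WIND" remain separate rows.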
agg_health <- aggregate(cbind(FATALITIES, INJURIES, HEALTH_HARM) ~ EVTYPE,
                        data = storm, sum, na.rm = TRUE)
agg_econ   <- aggregate(cbind(PROP_DOLLARS, CROP_DOLLARS, ECON_DOLLARS) ~ EVTYPE,
                        data = storm, sum, na.rm = TRUE)

# Order by total impact
health_top <- agg_health[order(agg_health$HEALTH_HARM, decreasing = TRUE), ]
econ_top   <- agg_econ[order(agg_econ$ECON_DOLLARS, decreasing = TRUE), ]

head(health_top, 10)
##                EVTYPE FATALITIES INJURIES HEALTH_HARM
## 834           TORNADO       5633    91346       96979
## 130    EXCESSIVE HEAT       1903     6525        8428
## 856         TSTM WIND        504     6957        7461
## 170             FLOOD        470     6789        7259
## 464         LIGHTNING        816     5230        6046
## 275              HEAT        937     2100        3037
## 153       FLASH FLOOD        978     1777        2755
## 427         ICE STORM         89     1975        2064
## 760 THUNDERSTORM WIND        133     1488        1621
## 972      WINTER STORM        206     1321        1527
head(econ_top, 10)
##                EVTYPE PROP_DOLLARS CROP_DOLLARS ECON_DOLLARS
## 170             FLOOD 144657709807   5661968450 150319678257
## 411 HURRICANE/TYPHOON  69305840000   2607872800  71913712800
## 834           TORNADO  56947380616    414953270  57362333886
## 670       STORM SURGE  43323536000         5000  43323541000
## 244              HAIL  15735267513   3025954473  18761221986
## 153       FLASH FLOOD  16822673978   1421317100  18243991078
## 95            DROUGHT   1046106000  13972566000  15018672000
## 402         HURRICANE  11868319010   2741910000  14610229010
## 590       RIVER FLOOD   5118945500   5029459000  10148404500
## 427         ICE STORM   3944927860   5022113500   8967041360
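
Because EVTYPE is aggregated as recorded, the tables list TSTM WIND and THUNDERSTORM WIND as separate rows. One possible refinement would be to normalize labels before summing. The sketch below shows the idea on toy data; the function normalize_evtype and its single rewrite rule are hypothetical and were not applied to the results in this document.

```r
# Toy rows (made-up counts) with label variants of the same event
toy <- data.frame(
  EVTYPE      = c("TSTM WIND", "Thunderstorm  Wind ", "THUNDERSTORM WIND", "TORNADO"),
  HEALTH_HARM = c(3, 2, 5, 12),
  stringsAsFactors = FALSE
)

# Hypothetical helper: upper-case, squeeze whitespace, expand the TSTM prefix
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("[[:space:]]+", " ", x)
  sub("^TSTM", "THUNDERSTORM", x)
}

toy$EVTYPE_CLEAN <- normalize_evtype(toy$EVTYPE)

# Aggregate on the cleaned label instead of the raw one
merged <- aggregate(HEALTH_HARM ~ EVTYPE_CLEAN, data = toy, FUN = sum)
merged <- merged[order(merged$HEALTH_HARM, decreasing = TRUE), ]
print(merged)
```

A fuller mapping would need a rule for each variant of interest, and each rule is a judgment call that can change the rankings.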

Results

1) Events most harmful to population health

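Below we plot the top 10 event types by total injuries plus fatalities. The left panel shows total harm; the middle and right panels show injuries and fatalities separately. Finding: tornadoes account for by far the largest combined burden, with 96,979 injuries and fatalities, more than eleven times the total for the next event type (excessive heat, 8,428).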

topN <- 10
ht <- head(health_top, topN)
op <- par(mfrow = c(1,3), mar = c(10,4,3,1))
barplot(ht$HEALTH_HARM, names.arg = ht$EVTYPE, las = 2,
        main = "Total harm", ylab = "Count")
barplot(ht$INJURIES, names.arg = ht$EVTYPE, las = 2,
        main = "Injuries", ylab = "Count")
barplot(ht$FATALITIES, names.arg = ht$EVTYPE, las = 2,
        main = "Fatalities", ylab = "Count")

par(op)
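
To put the tornado finding in proportion, the following sketch computes each event type's share of the combined top-10 total, using the HEALTH_HARM values printed in the table above. The shares are relative to these ten rows only, not to all event types; the names harm and share_pct are introduced here for illustration.

```r
# HEALTH_HARM values copied from the top-10 table above
harm <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
          "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
          "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
          "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527)

# Percentage of the top-10 total contributed by each event type
share_pct <- round(100 * harm / sum(harm), 1)
print(share_pct)
```

Tornadoes come out at roughly 71% of the top-10 total.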

2) Events with greatest economic consequences

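We again show the top 10 event types, this time by total economic losses (property plus crop). Finding: floods (about $150 billion) and hurricanes/typhoons (about $72 billion) produce the largest total economic losses. Property damage exceeds crop losses for most of the top 10; drought and ice storms are the exceptions, where crop losses are larger.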

et <- head(econ_top, topN)
toBill <- function(x) x / 1e9

op <- par(mfrow = c(1,3), mar = c(10,4,3,1))
barplot(toBill(et$ECON_DOLLARS), names.arg = et$EVTYPE, las = 2,
        main = "Total", ylab = "Billions USD")
barplot(toBill(et$PROP_DOLLARS), names.arg = et$EVTYPE, las = 2,
        main = "Property", ylab = "Billions USD")
barplot(toBill(et$CROP_DOLLARS), names.arg = et$EVTYPE, las = 2,
        main = "Crop", ylab = "Billions USD")

par(op)
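
The balance between property and crop losses can also be checked numerically. The sketch below rebuilds the top-10 table from the values printed earlier and computes the crop share of each total; the names top_econ, CROP_SHARE, and crop_led are introduced here for illustration.

```r
# Values copied from the top-10 economic table above
top_econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  PROP_DOLLARS = c(144657709807, 69305840000, 56947380616, 43323536000,
                   15735267513, 16822673978, 1046106000, 11868319010,
                   5118945500, 3944927860),
  CROP_DOLLARS = c(5661968450, 2607872800, 414953270, 5000, 3025954473,
                   1421317100, 13972566000, 2741910000, 5029459000,
                   5022113500),
  stringsAsFactors = FALSE
)

# Crop losses as a percentage of each event type's total losses
total <- top_econ$PROP_DOLLARS + top_econ$CROP_DOLLARS
top_econ$CROP_SHARE <- round(100 * top_econ$CROP_DOLLARS / total, 1)

# Event types where crop losses exceed property damage
crop_led <- top_econ$EVTYPE[top_econ$CROP_DOLLARS > top_econ$PROP_DOLLARS]
print(top_econ[, c("EVTYPE", "CROP_SHARE")])
print(crop_led)   # DROUGHT and ICE STORM
```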
