Using the U.S. NOAA Storm Database (1950–Nov 2011), I quantify which event types are most harmful to population health (fatalities and injuries) and which have the greatest economic consequences (property + crop damage). After cleaning the damage exponents and aggregating by event type (EVTYPE), I find that tornadoes dominate injuries and fatalities, while floods and hurricanes/typhoons account for the largest total economic losses. Results are reproducible from the raw compressed data within this document.
The original dataset is a CSV file compressed with bzip2. The code below downloads it if it is not already present, reads it directly from the compressed file, and keeps only the variables needed for the analysis.
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
f <- "StormData.csv.bz2"
if (!file.exists(f)) {
  download.file(url, destfile = f, mode = "wb")
}
# Read directly from the bzip2 file
storm <- read.csv(bzfile(f), stringsAsFactors = FALSE)
# Keep only fields needed
keep <- c("EVTYPE","FATALITIES","INJURIES",
          "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
storm <- storm[keep]
str(storm)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
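The chunk above reads every column and then subsets. As a possible alternative, unwanted columns can be skipped at read time with the `colClasses` argument of `read.csv`, where the value `"NULL"` drops a column. The sketch below demonstrates this on a tiny temporary bzip2 file; the `toy` data frame, its `STATE` column, and the `small` result are hypothetical stand-ins, and this approach has not been checked against the real database here.

```r
# Hypothetical three-column file standing in for the real database
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  STATE      = c("AL", "TX"),
  FATALITIES = c(1, 0),
  stringsAsFactors = FALSE
)
con <- bzfile(tmp, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# "NULL" drops a column at read time; the other entries set the column type
small <- read.csv(bzfile(tmp),
                  colClasses = c("character", "NULL", "numeric"),
                  stringsAsFactors = FALSE)
str(small)
unlink(tmp)
```

With an unnamed `colClasses` vector the entries are positional, so the vector has to follow the column order of the file.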
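The dollar amounts are split into a number (PROPDMG, CROPDMG) and an exponent code (PROPDMGEXP, CROPDMGEXP). We map the letter codes H, K, M, and B to 1e2, 1e3, 1e6, and 1e9, treat a numeric code d as 10^d, and leave amounts with a blank code unscaled. Any other code gets a multiplier of zero, so those amounts drop out of the totals.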
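To make the encoding concrete, here is a small worked sketch on made-up rows. The `toy` data frame and the `mult` lookup vector are hypothetical and separate from the helper used in the analysis; numeric exponent codes are left out for brevity.

```r
# Hypothetical toy rows; values are illustrative, not taken from the database
toy <- data.frame(
  DMG    = c(25, 2.5, 1.2, 3, 7),
  DMGEXP = c("K", "M", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Named lookup for the letter codes
mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
code <- toupper(trimws(toy$DMGEXP))

m <- unname(mult[code])   # NA for any code that is not in the lookup
m[code == ""] <- 1        # a blank code leaves the number unscaled
m[is.na(m)] <- 0          # unrecognised codes contribute nothing

toy$DOLLARS <- toy$DMG * m
toy
```

Here a row with `DMG = 25` and `DMGEXP = "K"` becomes 25 × 1e3 = 25,000 dollars, while the row with the unrecognised code `"?"` becomes zero.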
# Helper to convert exponent codes to multipliers
exp_to_mult <- function(x) {
  x <- toupper(trimws(x))
  out <- rep(0, length(x))
  out[x %in% c("", " ", NA)] <- 1
  out[x %in% c("H")] <- 1e2
  out[x %in% c("K")] <- 1e3
  out[x %in% c("M")] <- 1e6
  out[x %in% c("B")] <- 1e9
  # Some rows have digits; treat as 10^digit
  digs <- suppressWarnings(as.numeric(x))
  out[!is.na(digs)] <- 10^digs[!is.na(digs)]
  out
}
prop_mult <- exp_to_mult(storm$PROPDMGEXP)
crop_mult <- exp_to_mult(storm$CROPDMGEXP)
storm$PROP_DOLLARS <- storm$PROPDMG * prop_mult
storm$CROP_DOLLARS <- storm$CROPDMG * crop_mult
storm$ECON_DOLLARS <- storm$PROP_DOLLARS + storm$CROP_DOLLARS
storm$HEALTH_HARM <- storm$FATALITIES + storm$INJURIES
# Aggregate totals by event type
agg_health <- aggregate(cbind(FATALITIES, INJURIES, HEALTH_HARM) ~ EVTYPE,
                        data = storm, sum, na.rm = TRUE)
agg_econ <- aggregate(cbind(PROP_DOLLARS, CROP_DOLLARS, ECON_DOLLARS) ~ EVTYPE,
                      data = storm, sum, na.rm = TRUE)
# Order by total impact
health_top <- agg_health[order(agg_health$HEALTH_HARM, decreasing = TRUE), ]
econ_top <- agg_econ[order(agg_econ$ECON_DOLLARS, decreasing = TRUE), ]
head(health_top, 10)
## EVTYPE FATALITIES INJURIES HEALTH_HARM
## 834 TORNADO 5633 91346 96979
## 130 EXCESSIVE HEAT 1903 6525 8428
## 856 TSTM WIND 504 6957 7461
## 170 FLOOD 470 6789 7259
## 464 LIGHTNING 816 5230 6046
## 275 HEAT 937 2100 3037
## 153 FLASH FLOOD 978 1777 2755
## 427 ICE STORM 89 1975 2064
## 760 THUNDERSTORM WIND 133 1488 1621
## 972 WINTER STORM 206 1321 1527
head(econ_top, 10)
## EVTYPE PROP_DOLLARS CROP_DOLLARS ECON_DOLLARS
## 170 FLOOD 144657709807 5661968450 150319678257
## 411 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 834 TORNADO 56947380616 414953270 57362333886
## 670 STORM SURGE 43323536000 5000 43323541000
## 244 HAIL 15735267513 3025954473 18761221986
## 153 FLASH FLOOD 16822673978 1421317100 18243991078
## 95 DROUGHT 1046106000 13972566000 15018672000
## 402 HURRICANE 11868319010 2741910000 14610229010
## 590 RIVER FLOOD 5118945500 5029459000 10148404500
## 427 ICE STORM 3944927860 5022113500 8967041360
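The figure below shows the top 10 event types ranked by combined fatalities and injuries: the left panel shows total harm, and the middle and right panels show injuries and fatalities separately. Finding: tornadoes account for by far the largest combined burden, with 96,979 fatalities and injuries against 8,428 for excessive heat, the next-highest event type.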
topN <- 10
ht <- head(health_top, topN)
op <- par(mfrow = c(1,3), mar = c(10,4,3,1))
barplot(ht$HEALTH_HARM, names.arg = ht$EVTYPE, las = 2,
        main = "Total harm", ylab = "Count")
barplot(ht$INJURIES, names.arg = ht$EVTYPE, las = 2,
        main = "Injuries", ylab = "Count")
barplot(ht$FATALITIES, names.arg = ht$EVTYPE, las = 2,
        main = "Fatalities", ylab = "Count")
par(op)
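The size of the tornado lead can also be put into numbers. The sketch below copies the HEALTH_HARM totals from the printed top-10 table into a named vector (the names `harm`, `share`, and `ratio` are introduced here for illustration) and computes each event type's share within the top 10, which is not the same as its share of the full database.

```r
# Totals copied from the printed top-10 health table
harm <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
          "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
          "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
          "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527)

# Share of each event type within the top 10 only, in percent
share <- harm / sum(harm)
round(100 * share, 1)

# How many times larger the top total is than the runner-up
ratio <- harm[["TORNADO"]] / harm[["EXCESSIVE HEAT"]]
ratio
```

Within these ten event types, tornadoes make up roughly 70% of combined fatalities and injuries and exceed the runner-up by more than a factor of eleven.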
The second figure shows the top 10 event types by total economic losses (property + crop), using the same three-panel layout. Finding: floods and hurricanes/typhoons produce the largest total economic losses. Property damage exceeds crop damage for eight of the ten event types; drought and ice storms are the exceptions.
et <- head(econ_top, topN)
toBill <- function(x) x / 1e9
op <- par(mfrow = c(1,3), mar = c(10,4,3,1))
barplot(toBill(et$ECON_DOLLARS), names.arg = et$EVTYPE, las = 2,
        main = "Total", ylab = "Billions USD")
barplot(toBill(et$PROP_DOLLARS), names.arg = et$EVTYPE, las = 2,
        main = "Property", ylab = "Billions USD")
barplot(toBill(et$CROP_DOLLARS), names.arg = et$EVTYPE, las = 2,
        main = "Crop", ylab = "Billions USD")
par(op)
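The balance between property and crop losses can be checked directly from the printed top-10 economic table. The sketch below re-enters those figures by hand; the `top_econ` data frame, its `CROP_SHARE` column, and `crop_led` are names introduced here for illustration.

```r
# Figures copied from the printed top-10 economic table (USD)
top_econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  PROP = c(144657709807, 69305840000, 56947380616, 43323536000, 15735267513,
           16822673978, 1046106000, 11868319010, 5118945500, 3944927860),
  CROP = c(5661968450, 2607872800, 414953270, 5000, 3025954473,
           1421317100, 13972566000, 2741910000, 5029459000, 5022113500),
  stringsAsFactors = FALSE
)

# Fraction of each event type's total loss that comes from crops
top_econ$CROP_SHARE <- top_econ$CROP / (top_econ$PROP + top_econ$CROP)

# Event types where crop losses exceed property losses
crop_led <- top_econ$EVTYPE[top_econ$CROP > top_econ$PROP]
crop_led
top_econ[order(top_econ$CROP_SHARE, decreasing = TRUE), c("EVTYPE", "CROP_SHARE")]
```

Sorting by `CROP_SHARE` also shows that river floods are close to an even split, even though property losses remain slightly larger there.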