This analysis explores the NOAA Storm Database to identify which types of severe weather events are most harmful to population health and which have the greatest economic consequences in the United States between 1950 and 2011.

## Data Processing
In this section, the raw NOAA Storm Database file is downloaded (if needed), read into R, and processed for analysis.
# 1) Download the raw data file (only if not already present)
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
file_name <- "StormData.csv.bz2"
if (!file.exists(file_name)) {
  download.file(file_url, destfile = file_name, mode = "wb")
}
# 2) Read the raw CSV (directly from the .bz2, no pre-unzipping needed)
storm <- read.csv(file_name, stringsAsFactors = FALSE)
# Quick check
dim(storm)
## [1] 902297 37
names(storm)[1:10]
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
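Only a handful of the 37 columns are used later in the analysis. The sketch below illustrates, on a small made-up file rather than the real database, how `read.csv()` reads a bzip2-compressed CSV directly and how `colClasses` can drop unneeded columns at read time. The toy data frame, the temporary file, and the `REMARKS` column chosen for dropping are assumptions for illustration only.

```r
# Build a tiny bzip2-compressed CSV to stand in for StormData.csv.bz2
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  INJURIES   = c(3, 2),
  REMARKS    = c("a", "b"),
  stringsAsFactors = FALSE
)
con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses the file on the fly;
# a colClasses entry of "NULL" skips that column entirely
subset_read <- read.csv(
  tmp,
  stringsAsFactors = FALSE,
  colClasses = c(
    EVTYPE     = "character",
    FATALITIES = "numeric",
    INJURIES   = "numeric",
    REMARKS    = "NULL"
  )
)
unlink(tmp)

str(subset_read)
```

Skipping columns this way can reduce memory use on the full file, at the cost of having to list the columns to drop explicitly.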
# Load dplyr for the pipe (%>%) and the data-manipulation verbs used below
library(dplyr)
# Keep only relevant columns
health <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES)
# Aggregate by event type
health_summary <- health %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(FATALITIES, na.rm = TRUE),
    injuries = sum(INJURIES, na.rm = TRUE),
    total_harm = fatalities + injuries
  ) %>%
  arrange(desc(total_harm))
# Top 10 most harmful events
top_health <- head(health_summary, 10)
top_health
## # A tibble: 10 × 4
## EVTYPE fatalities injuries total_harm
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
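The analysis shows that health impacts are concentrated in a small number of event types. Tornadoes dominate, with 96,979 combined fatalities and injuries, more than ten times the total for the next category, excessive heat (8,428). Thunderstorm wind, floods, and lightning follow. Because EVTYPE labels are used as recorded, related categories appear on separate rows (TSTM WIND and THUNDERSTORM WIND, HEAT and EXCESSIVE HEAT), so their combined impact is larger than any single row suggests. These results suggest that emergency preparedness and public health resources should prioritize tornadoes, heat, wind, and flooding.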
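The top-10 table lists TSTM WIND and THUNDERSTORM WIND as separate rows, and likewise HEAT and EXCESSIVE HEAT. One possible way to merge such labels before ranking is sketched below in base R, using four rows copied from the table. The `normalize_evtype()` helper is hypothetical, and treating EXCESSIVE HEAT as HEAT is an analyst's choice rather than something the database prescribes.

```r
# Four rows copied from the top-10 health table above
health_toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WIND", "EXCESSIVE HEAT", "HEAT"),
  fatalities = c(504, 133, 1903, 937),
  injuries   = c(6957, 1488, 6525, 2100),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map variant spellings onto one label (assumed mapping)
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)
  x <- sub("^EXCESSIVE HEAT$", "HEAT", x)
  x
}

health_toy$EVTYPE_CLEAN <- normalize_evtype(health_toy$EVTYPE)

# Re-aggregate on the cleaned label and rank by combined harm
merged <- aggregate(
  cbind(fatalities, injuries) ~ EVTYPE_CLEAN,
  data = health_toy,
  FUN = sum
)
merged$total_harm <- merged$fatalities + merged$injuries
merged <- merged[order(-merged$total_harm), ]
print(merged)
```

With this mapping, heat and thunderstorm wind each become a single row, which changes their totals but not the leading position of tornadoes.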
The NOAA storm database records property damage (PROPDMG) and crop damage (CROPDMG) as base values, with companion exponent fields (PROPDMGEXP, CROPDMGEXP) that give the multiplier (e.g., H = hundreds, K = thousands, M = millions, B = billions). To estimate economic impact, we multiply each base value by its multiplier to obtain a dollar amount and then aggregate total damage by event type.
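To make the arithmetic concrete, here is a small worked example in base R on made-up rows (the values are hypothetical, not taken from the database). It covers only the letter codes; digit codes are not handled in this sketch.

```r
# Hypothetical damage records: base value plus exponent code
dmg_toy <- data.frame(
  value = c(25, 2.5, 1.5, 5),
  exp   = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Lookup table for the letter codes described above
mult_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Match case-insensitively; unmatched codes (such as blank) come back as NA
mult <- unname(mult_lookup[toupper(dmg_toy$exp)])
mult[is.na(mult)] <- 1   # leave the value unscaled when the code is not recognised

dmg_toy$dollars <- dmg_toy$value * mult
print(dmg_toy)
```

Here 25 with code K becomes 25,000 dollars, 2.5 with code m becomes 2.5 million, and the row with a blank code stays at 5.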
# Helper: convert exponent codes to numeric multipliers
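exp_to_mult <- function(exp) {
  exp <- toupper(exp)
  dplyr::case_when(
    exp == "H" ~ 1e2,
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    # Assumption: a digit code n is read as a multiplier of 10^n.
    # case_when() evaluates this for every element, so silence coercion warnings.
    exp %in% as.character(0:9) ~ 10^suppressWarnings(as.numeric(exp)),
    # Any other code (including blank) leaves the value unscaled
    TRUE ~ 1
  )
}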
# Keep only relevant columns
econ <- storm %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Convert exponents -> multipliers, compute dollar damages
econ <- econ %>%
  mutate(
    prop_mult = exp_to_mult(PROPDMGEXP),
    crop_mult = exp_to_mult(CROPDMGEXP),
    prop_dmg = PROPDMG * prop_mult,
    crop_dmg = CROPDMG * crop_mult,
    total_dmg = prop_dmg + crop_dmg
  )
# Aggregate by event type
econ_summary <- econ %>%
  group_by(EVTYPE) %>%
  summarise(
    property_damage = sum(prop_dmg, na.rm = TRUE),
    crop_damage = sum(crop_dmg, na.rm = TRUE),
    total_damage = sum(total_dmg, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_damage))
# Top 10 by total economic damage
top_econ <- head(econ_summary, 10)
top_econ
## # A tibble: 10 × 4
## EVTYPE property_damage crop_damage total_damage
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 144657709807 5661968450 150319678257
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56947380676. 414953270 57362333946.
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15735267513. 3025954473 18761221986.
## 6 FLASH FLOOD 16822673978. 1421317100 18243991078.
## 7 DROUGHT 1046106000 13972566000 15018672000
## 8 HURRICANE 11868319010 2741910000 14610229010
## 9 RIVER FLOOD 5118945500 5029459000 10148404500
## 10 ICE STORM 3944927860 5022113500 8967041360
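The economic-impact results show that losses are concentrated in a few event types. Floods lead with roughly $150 billion in combined property and crop damage, followed by hurricanes/typhoons (about $72 billion), tornadoes (about $57 billion), and storm surge (about $43 billion). Drought stands out as crop-driven: about $14.0 billion of its $15.0 billion total is crop damage. As with the health results, EVTYPE labels are used as recorded, so HURRICANE and HURRICANE/TYPHOON, and FLOOD, FLASH FLOOD, and RIVER FLOOD, are ranked separately. The ranking highlights which hazards create the greatest financial burden nationally and can help prioritize mitigation and preparedness investments.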
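Raw dollar totals with eleven or twelve digits are hard to compare at a glance. The base R sketch below re-expresses four rows copied from the table above in billions of dollars and adds the crop share of each total. The column names `total_billion` and `crop_share_pct` are introduced here for illustration, and the tornado property figure is truncated to whole dollars as printed.

```r
# Four rows copied from the top-10 economic table above
econ_toy <- data.frame(
  EVTYPE          = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "DROUGHT"),
  property_damage = c(144657709807, 69305840000, 56947380676, 1046106000),
  crop_damage     = c(5661968450, 2607872800, 414953270, 13972566000),
  stringsAsFactors = FALSE
)

econ_toy$total_damage <- econ_toy$property_damage + econ_toy$crop_damage

# Express totals in billions of USD, rounded to one decimal place
econ_toy$total_billion <- round(econ_toy$total_damage / 1e9, 1)

# Share of each total that comes from crop damage, in percent
econ_toy$crop_share_pct <- round(
  100 * econ_toy$crop_damage / econ_toy$total_damage, 1
)

print(econ_toy[, c("EVTYPE", "total_billion", "crop_share_pct")])
```

The crop share makes the contrast visible: flood losses are almost entirely property damage, while drought losses are almost entirely crop damage.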
library(ggplot2)
ggplot(top_econ, aes(x = reorder(EVTYPE, total_damage), y = total_damage)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    x = "Event type",
    y = "Total economic damage (USD)"
  )
Figure 2. Top 10 storm event types by total economic damage in the United States (1950–2011).