This analysis examines the NOAA Storm Database to identify which types of severe weather events are most harmful to public health and cause the greatest economic damage across the United States. The raw dataset contains over 900,000 weather event records spanning several decades, with event types coded in a somewhat inconsistent manner that requires normalization into major weather categories. We process the data to convert damage values from the reported magnitude and exponent format into actual dollar amounts, and aggregate health impacts (fatalities and injuries) and economic losses by event type. Our analysis reveals that tornadoes are the leading cause of deaths and injuries from severe weather, while floods cause the most significant economic damage when considering both property and crop losses. These findings should inform resource allocation and preparedness priorities for government and municipal managers.
# Load the raw NOAA Storm Database CSV file
storms <- read.csv("repdata_data_StormData.csv", stringsAsFactors = FALSE)
cat("Dataset dimensions:", nrow(storms), "rows and", ncol(storms), "columns\n")
## Dataset dimensions: 902297 rows and 37 columns
cat("Column names:\n")
## Column names:
print(names(storms))
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
cat("\nFirst few rows:\n")
##
## First few rows:
head(storms)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
# Subset to only the columns we need for the analysis
storms <- storms[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
cat("After subsetting, remaining columns:\n")
## After subsetting, remaining columns:
print(names(storms))
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "PROPDMGEXP"
## [6] "CROPDMG" "CROPDMGEXP"
# The PROPDMGEXP and CROPDMGEXP columns contain multiplier codes
# that indicate the magnitude of the damage values.
# We need to decode these to get actual dollar amounts.
# Create a function to convert exponent codes to actual multipliers
exp_to_multiplier <- function(exp_code) {
exp_code <- toupper(exp_code)
# Handle K (thousands), M (millions), B (billions), H (hundreds)
if (exp_code == "K") return(1e3)
if (exp_code == "M") return(1e6)
if (exp_code == "B") return(1e9)
if (exp_code == "H") return(1e2)
# Handle numeric digits 0-8 (representing 10^digit)
if (exp_code %in% as.character(0:8)) {
return(10^as.numeric(exp_code))
}
# Treat blank, "-", "+", "?" as a zero multiplier, so the record contributes no damage
if (exp_code %in% c("", "-", "+", "?")) return(0)
# For any other codes, treat as zero
return(0)
}
# Apply the multiplier to each record
storms$PROPDMGTOTAL <- storms$PROPDMG * sapply(storms$PROPDMGEXP, exp_to_multiplier)
storms$CROPDMGTOTAL <- storms$CROPDMG * sapply(storms$CROPDMGEXP, exp_to_multiplier)
storms$ECONDMGTOTAL <- storms$PROPDMGTOTAL + storms$CROPDMGTOTAL
cat("Sample of decoded damage values:\n")
## Sample of decoded damage values:
print(head(storms[, c("PROPDMG", "PROPDMGEXP", "PROPDMGTOTAL", "CROPDMG", "CROPDMGEXP", "CROPDMGTOTAL", "ECONDMGTOTAL")]))
## PROPDMG PROPDMGEXP PROPDMGTOTAL CROPDMG CROPDMGEXP CROPDMGTOTAL ECONDMGTOTAL
## 1 25.0 K 25000 0 0 25000
## 2 2.5 K 2500 0 0 2500
## 3 25.0 K 25000 0 0 25000
## 4 2.5 K 2500 0 0 2500
## 5 2.5 K 2500 0 0 2500
## 6 2.5 K 2500 0 0 2500
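Calling `exp_to_multiplier()` once per row through `sapply()` works, but it can be slow on 900,000+ records. The sketch below shows one possible vectorized alternative that uses a named lookup vector. The names `exp_lookup`, `decode_exp`, and `demo` are hypothetical and not part of the original analysis; the mapping is assumed to be the same as in the function above (unknown or blank codes become 0).

```r
# Hypothetical lookup table mirroring exp_to_multiplier():
# H/K/M/B plus the digits 0-8 interpreted as powers of ten.
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                setNames(10^(0:8), as.character(0:8)))

# Vectorized decoder: one lookup for the whole column instead of one call per row.
decode_exp <- function(exp_codes) {
  mult <- exp_lookup[toupper(exp_codes)]
  mult[is.na(mult)] <- 0  # blank, "-", "+", "?" and any unknown code -> 0
  unname(mult)
}

# Small illustrative data frame (made-up values, not taken from the dataset)
demo <- data.frame(PROPDMG = c(25, 2.5, 1, 3),
                   PROPDMGEXP = c("K", "m", "", "2"),
                   stringsAsFactors = FALSE)
demo$PROPDMGTOTAL <- demo$PROPDMG * decode_exp(demo$PROPDMGEXP)
print(demo)
# PROPDMGTOTAL is 25000, 2500000, 0 and 300 for the four rows
```

Applied to the full data, the same idea would read `storms$PROPDMG * decode_exp(storms$PROPDMGEXP)`, assuming the exponent columns were read as character vectors.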
# The EVTYPE column has many inconsistencies and variations
# (e.g., "TSTM WIND", "THUNDERSTORM WINDS", "THUNDERSTORM WIND")
# We normalize these into major weather event categories.
# This chunk is cached because the regex matching is computationally expensive on 900k+ rows.
# Convert all event types to uppercase once
ev_upper <- toupper(trimws(storms$EVTYPE))
# Define patterns in priority order (most-specific first). Each pattern is
# only applied to rows not yet classified by an earlier, higher-priority
# pattern, so later patterns (e.g. "WIND") don't need to exclude categories
# (e.g. Thunderstorm Wind, Tornado) already claimed above them.
patterns <- list(
c("Tornado", "TORNADO|FUNNEL|WATERSPOUT"),
c("Thunderstorm Wind", "THUNDERSTORM|TSTM"),
c("Flash Flood", "FLASH.*FLOOD"),
c("Flood", "^FLOOD"),
c("Hail", "HAIL"),
c("Winter Storm", "BLIZZARD|SNOW|WINTER|ICE|SLEET|FREEZING RAIN"),
c("Hurricane (Typhoon)", "HURRICANE|TYPHOON"),
c("Heat", "HEAT|HOT"),
c("Lightning", "LIGHTNING|LIGNTNING"),
c("Wildfire", "FIRE"), # also covers WILDFIRE and WILD FIRE
c("High Wind", "WIND"), # any remaining WIND type, including HIGH WIND
c("Cold", "COLD|HYPOTHERMIA|FROST|FREEZE"),
c("Rip Current", "RIP CURRENT"),
c("Dense Fog", "^FOG"),
c("Dust Storm", "DUST|DUSTORM"),
c("Drought", "DROUGHT"),
c("Heavy Rain", "RAIN|PRECIPITATION")
)
# Start every row as "Other", then overwrite in priority order
storms$EVTYPE_CLEAN <- "Other"
unclassified <- rep(TRUE, nrow(storms))
for (p in patterns) {
label <- p[1]
pattern <- p[2]
matches <- unclassified & grepl(pattern, ev_upper)
storms$EVTYPE_CLEAN[matches] <- label
unclassified[matches] <- FALSE
}
cat("Unique cleaned event types:\n")
## Unique cleaned event types:
print(sort(unique(storms$EVTYPE_CLEAN)))
## [1] "Cold" "Dense Fog" "Drought"
## [4] "Dust Storm" "Flash Flood" "Flood"
## [7] "Hail" "Heat" "Heavy Rain"
## [10] "High Wind" "Hurricane (Typhoon)" "Lightning"
## [13] "Other" "Rip Current" "Thunderstorm Wind"
## [16] "Tornado" "Wildfire" "Winter Storm"
cat("\nEvent type distribution:\n")
##
## Event type distribution:
print(head(sort(table(storms$EVTYPE_CLEAN), decreasing = TRUE), 20))
##
## Thunderstorm Wind Hail Tornado Flash Flood
## 336804 289282 71537 55668
## Winter Storm High Wind Flood Lightning
## 42523 28137 25469 15765
## Heavy Rain Other Wildfire Heat
## 11943 11173 4239 2666
## Drought Cold Rip Current Dust Storm
## 2488 2403 777 586
## Dense Fog Hurricane (Typhoon)
## 538 299
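To make the first-match-wins behavior of the priority loop concrete, here is a minimal sketch that wraps the same logic in a function and runs it on a handful of event strings. The function name `classify_events`, the shortened pattern list `demo_patterns`, and the sample strings are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical helper: same priority-ordered matching as the loop above.
classify_events <- function(ev, patterns) {
  ev_upper <- toupper(trimws(ev))
  out <- rep("Other", length(ev_upper))
  unclassified <- rep(TRUE, length(ev_upper))
  for (p in patterns) {
    # Only rows not yet claimed by a higher-priority pattern can match
    hit <- unclassified & grepl(p[2], ev_upper)
    out[hit] <- p[1]
    unclassified[hit] <- FALSE
  }
  out
}

# Shortened pattern list for illustration (a subset of the full list)
demo_patterns <- list(
  c("Thunderstorm Wind", "THUNDERSTORM|TSTM"),
  c("Flash Flood", "FLASH.*FLOOD"),
  c("Flood", "^FLOOD"),
  c("High Wind", "WIND")
)

demo_events <- c("tstm wind", " FLASH FLOODING", "FLOOD", "RIVER FLOOD", "HIGH WINDS")
print(data.frame(raw = demo_events,
                 clean = classify_events(demo_events, demo_patterns)))
# "tstm wind" is claimed by Thunderstorm Wind before the WIND pattern sees it;
# "RIVER FLOOD" falls through to "Other" because "^FLOOD" is anchored at the start.
```

The last case is worth noting: with the anchored `^FLOOD` pattern, flood variants that do not begin with the word FLOOD are not assigned to the Flood category, which may account for part of the "Other" totals reported later.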
# Aggregate health impacts by event type
health_impact <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_CLEAN,
data = storms,
FUN = sum)
health_impact$TOTAL_HEALTH_IMPACT <- health_impact$FATALITIES + health_impact$INJURIES
# Sort by total health impact
health_impact <- health_impact[order(health_impact$TOTAL_HEALTH_IMPACT, decreasing = TRUE), ]
cat("Top 10 Event Types by Public Health Impact:\n")
## Top 10 Event Types by Public Health Impact:
print(head(health_impact, 10))
## EVTYPE_CLEAN FATALITIES INJURIES TOTAL_HEALTH_IMPACT
## 16 Tornado 5664 91439 97103
## 8 Heat 3138 9224 12362
## 15 Thunderstorm Wind 729 9544 10273
## 6 Flood 478 6791 7269
## 18 Winter Storm 654 6037 6691
## 12 Lightning 817 5231 6048
## 5 Flash Flood 1035 1802 2837
## 10 High Wind 690 1936 2626
## 13 Other 712 1899 2611
## 17 Wildfire 90 1608 1698
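The combined total weights a fatality and an injury equally, so the ranking can look different when fatalities are considered on their own. As a rough sketch, the ten rows printed above can be re-sorted by `FATALITIES`. The data frame `health_top10_demo` is a hypothetical copy of that printed table; it covers only these ten rows, so event types outside the top 10 are not considered.

```r
# Values copied from the top-10 health table printed above
health_top10_demo <- data.frame(
  EVTYPE_CLEAN = c("Tornado", "Heat", "Thunderstorm Wind", "Flood", "Winter Storm",
                   "Lightning", "Flash Flood", "High Wind", "Other", "Wildfire"),
  FATALITIES = c(5664, 3138, 729, 478, 654, 817, 1035, 690, 712, 90),
  INJURIES   = c(91439, 9224, 9544, 6791, 6037, 5231, 1802, 1936, 1899, 1608),
  stringsAsFactors = FALSE
)

# Re-rank the same ten event types by fatalities alone
by_fatalities <- health_top10_demo[order(health_top10_demo$FATALITIES,
                                         decreasing = TRUE), ]
print(head(by_fatalities[, c("EVTYPE_CLEAN", "FATALITIES")], 3))
# Among these ten rows: Tornado, Heat, Flash Flood
```

Within this subset, flash floods move from seventh place on the combined measure to third place on fatalities, while thunderstorm wind drops from third to fifth.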
# Aggregate economic damage by event type
economic_impact <- aggregate(cbind(PROPDMGTOTAL, CROPDMGTOTAL) ~ EVTYPE_CLEAN,
data = storms,
FUN = sum)
economic_impact$TOTAL_ECONOMIC_DAMAGE <- economic_impact$PROPDMGTOTAL + economic_impact$CROPDMGTOTAL
# Convert to billions for readability in the table
economic_impact$TOTAL_ECONOMIC_DAMAGE_BILLIONS <- economic_impact$TOTAL_ECONOMIC_DAMAGE / 1e9
# Sort by total economic damage
economic_impact <- economic_impact[order(economic_impact$TOTAL_ECONOMIC_DAMAGE, decreasing = TRUE), ]
cat("Top 10 Event Types by Economic Impact (in billions of dollars):\n")
## Top 10 Event Types by Economic Impact (in billions of dollars):
print(head(economic_impact[, c("EVTYPE_CLEAN", "PROPDMGTOTAL", "CROPDMGTOTAL", "TOTAL_ECONOMIC_DAMAGE_BILLIONS")], 10))
## EVTYPE_CLEAN PROPDMGTOTAL CROPDMGTOTAL TOTAL_ECONOMIC_DAMAGE_BILLIONS
## 6 Flood 144784060800 5783673950 150.56773
## 11 Hurricane (Typhoon) 85356410010 5516117800 90.87253
## 13 Other 62168695560 5931172050 68.09987
## 16 Tornado 58613082164 417461520 59.03054
## 5 Flash Flood 17588306879 1532197150 19.12050
## 7 Hail 15977564456 3046887620 19.02445
## 18 Winter Storm 12445347801 5321301400 17.76665
## 3 Drought 1046106000 13972566000 15.01867
## 15 Thunderstorm Wind 11184748473 1271708980 12.45646
## 17 Wildfire 8496628500 403281630 8.89991
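The split between property and crop damage varies widely by event type. The sketch below computes the crop share of total damage for a few rows copied from the table above. The data frame `econ_demo` and the column `CROP_SHARE` are hypothetical names introduced only for this illustration.

```r
# Values copied from the economic impact table printed above (subset of rows)
econ_demo <- data.frame(
  EVTYPE_CLEAN = c("Flood", "Hurricane (Typhoon)", "Tornado",
                   "Hail", "Winter Storm", "Drought"),
  PROPDMGTOTAL = c(144784060800, 85356410010, 58613082164,
                   15977564456, 12445347801, 1046106000),
  CROPDMGTOTAL = c(5783673950, 5516117800, 417461520,
                   3046887620, 5321301400, 13972566000),
  stringsAsFactors = FALSE
)

# Fraction of each event type's total damage that comes from crop losses
econ_demo$CROP_SHARE <- econ_demo$CROPDMGTOTAL /
  (econ_demo$PROPDMGTOTAL + econ_demo$CROPDMGTOTAL)

# Sort so the most crop-driven event types come first
econ_demo <- econ_demo[order(econ_demo$CROP_SHARE, decreasing = TRUE), ]
print(transform(econ_demo[, c("EVTYPE_CLEAN", "CROP_SHARE")],
                CROP_SHARE = round(CROP_SHARE, 3)))
```

In this subset, drought is the only event type whose losses are dominated by crop damage (over 90 percent), whereas crop losses make up less than 5 percent of flood damage.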
# Display top 10 event types by health impact
health_top10 <- head(health_impact, 10)
cat("Table 1: Top 10 Event Types by Total Fatalities and Injuries\n")
## Table 1: Top 10 Event Types by Total Fatalities and Injuries
cat("=========================================================\n\n")
## =========================================================
# Create a nicely formatted table
results_health <- data.frame(
"Event Type" = health_top10$EVTYPE_CLEAN,
"Fatalities" = health_top10$FATALITIES,
"Injuries" = health_top10$INJURIES,
"Total Deaths and Injuries" = health_top10$TOTAL_HEALTH_IMPACT,
check.names = FALSE
)
print(results_health)
## Event Type Fatalities Injuries Total Deaths and Injuries
## 1 Tornado 5664 91439 97103
## 2 Heat 3138 9224 12362
## 3 Thunderstorm Wind 729 9544 10273
## 4 Flood 478 6791 7269
## 5 Winter Storm 654 6037 6691
## 6 Lightning 817 5231 6048
## 7 Flash Flood 1035 1802 2837
## 8 High Wind 690 1936 2626
## 9 Other 712 1899 2611
## 10 Wildfire 90 1608 1698
cat("\n")
cat("Tornadoes are by far the most significant threat to public health, accounting for ",
health_top10$TOTAL_HEALTH_IMPACT[1], " deaths and injuries combined. This is substantially ",
"more than the second-leading cause (", health_top10$EVTYPE_CLEAN[2], ", with ",
health_top10$TOTAL_HEALTH_IMPACT[2], " deaths and injuries). Heat is also the second-leading ",
"cause of fatalities, with ", health_top10$FATALITIES[2], " recorded deaths.\n", sep = "")
## Tornadoes are by far the most significant threat to public health, accounting for 97103 deaths and injuries combined. This is substantially more than the second-leading cause (Heat, with 12362 deaths and injuries). Heat is also the second-leading cause of fatalities, with 3138 recorded deaths.
# Display top 10 event types by economic impact
econ_top10 <- head(economic_impact, 10)
cat("Table 2: Top 10 Event Types by Total Economic Damage\n")
## Table 2: Top 10 Event Types by Total Economic Damage
cat("====================================================\n\n")
## ====================================================
# Create a nicely formatted table (in billions)
results_econ <- data.frame(
"Event Type" = econ_top10$EVTYPE_CLEAN,
"Property Damage ($B)" = round(econ_top10$PROPDMGTOTAL / 1e9, 2),
"Crop Damage ($B)" = round(econ_top10$CROPDMGTOTAL / 1e9, 2),
"Total Damage ($B)" = round(econ_top10$TOTAL_ECONOMIC_DAMAGE / 1e9, 2),
check.names = FALSE
)
print(results_econ)
## Event Type Property Damage ($B) Crop Damage ($B) Total Damage ($B)
## 1 Flood 144.78 5.78 150.57
## 2 Hurricane (Typhoon) 85.36 5.52 90.87
## 3 Other 62.17 5.93 68.10
## 4 Tornado 58.61 0.42 59.03
## 5 Flash Flood 17.59 1.53 19.12
## 6 Hail 15.98 3.05 19.02
## 7 Winter Storm 12.45 5.32 17.77
## 8 Drought 1.05 13.97 15.02
## 9 Thunderstorm Wind 11.18 1.27 12.46
## 10 Wildfire 8.50 0.40 8.90
cat("\n")
cat("Floods represent the costliest weather events, causing approximately",
round(econ_top10$TOTAL_ECONOMIC_DAMAGE[1] / 1e9, 1), "billion dollars in combined",
"property and crop damage. Hurricanes are the second-costliest event type with",
round(econ_top10$TOTAL_ECONOMIC_DAMAGE[2] / 1e9, 1), "billion dollars in damage.",
"Tornadoes, while the deadliest in terms of lives lost, rank fourth in economic impact, behind the catch-all Other category.\n")
## Floods represent the costliest weather events, causing approximately 150.6 billion dollars in combined property and crop damage. Hurricanes are the second-costliest event type with 90.9 billion dollars in damage. Tornadoes, while the deadliest in terms of lives lost, rank fourth in economic impact, behind the catch-all Other category.
# Create a two-panel plot comparing health and economic impacts
par(mfrow = c(1, 2), mar = c(10, 6, 3, 1))
# Panel 1: Top 10 by health impact (re-sort the top-10 ascending so bars run smallest-to-largest)
health_top10_sorted <- health_top10[order(health_top10$TOTAL_HEALTH_IMPACT), ]
barplot(health_top10_sorted$TOTAL_HEALTH_IMPACT,
names.arg = health_top10_sorted$EVTYPE_CLEAN,
horiz = FALSE,
las = 2,
col = "steelblue",
main = "Top 10 Event Types by Health Impact\n(Deaths + Injuries)",
ylab = "Total Deaths and Injuries",
ylim = c(0, max(health_top10_sorted$TOTAL_HEALTH_IMPACT) * 1.1),
cex.names = 0.8)
# Panel 2: Top 10 by economic impact (re-sort the top-10 ascending so bars run smallest-to-largest)
econ_top10_sorted <- econ_top10[order(econ_top10$TOTAL_ECONOMIC_DAMAGE), ]
barplot(econ_top10_sorted$TOTAL_ECONOMIC_DAMAGE / 1e9,
names.arg = econ_top10_sorted$EVTYPE_CLEAN,
horiz = FALSE,
las = 2,
col = "darkgreen",
main = "Top 10 Event Types by Economic Impact\n(Property + Crop Damage)",
ylab = "Total Damage (Billions of Dollars)",
ylim = c(0, max(econ_top10_sorted$TOTAL_ECONOMIC_DAMAGE / 1e9) * 1.1),
cex.names = 0.8)
par(mfrow = c(1, 1))
cat("Figure 1 shows the stark differences in which event types pose the greatest threat",
"to public health versus the economy. While tornadoes dominate the health impact rankings,",
"floods cause substantially greater economic damage. This distinction is important for",
"resource allocation: tornado preparedness and warning systems save lives, while flood",
"mitigation infrastructure and property protection measures address economic impacts.\n")
## Figure 1 shows the stark differences in which event types pose the greatest threat to public health versus the economy. While tornadoes dominate the health impact rankings, floods cause substantially greater economic damage. This distinction is important for resource allocation: tornado preparedness and warning systems save lives, while flood mitigation infrastructure and property protection measures address economic impacts.
This analysis of more than 900,000 severe weather event records from the NOAA Storm Database reveals two distinct risk profiles:
Health Risks: Tornadoes are the overwhelming public health threat, causing about 97,000 recorded deaths and injuries, nearly eight times the total for the second-place event type, heat (about 12,400). Thunderstorm winds rank third, with roughly 10,300 deaths and injuries.
Economic Risks: Floods represent the costliest weather phenomenon, responsible for approximately $150 billion in property and crop damage. Hurricanes are the second-costliest event type at about $91 billion. The catch-all Other category ranks third (about $68 billion), so tornadoes place fourth overall (about $59 billion) and third among the named categories. The economic impacts of flooding are driven primarily by property damage (about $145 billion) rather than crop losses (about $6 billion).
These findings suggest that government and municipal planners should consider both types of impacts when prioritizing resources. Communities at high risk for tornadoes require robust early warning systems and emergency response infrastructure, while those vulnerable to flooding should prioritize structural protections and mitigation strategies. Hurricane-prone areas require comprehensive preparation addressing both mortality risks and catastrophic economic losses.