This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database to quantify the impact of extreme weather events in terms of human health and economic losses. The goal is to identify which event categories are the most harmful to the population and the economy. To ensure data consistency and representativeness across all event types, the study focuses on the period from 1993 to 2011. A unified metric, the Total Economic Burden, was constructed by monetizing health impacts using the Value of Statistical Life (VSL) approach, allowing for a direct comparison between mortality and financial damage. The findings reveal that just five event types—Hurricane/Typhoon, Tornado, Flood, Excessive Heat, and Storm Surge—account for over 60% of the total national burden. While Tornadoes lead in total injuries, Excessive Heat stands as the primary cause of weather-related fatalities. Geographically, the South emerged as the most severely affected region, bearing a disproportionate share of the costs due to hurricane and storm surge activity. Ultimately, the results suggest that disaster mitigation strategies must be regionally specialized to address the specific “hidden” costs of thermal extremes alongside high-visibility kinetic events.
The data processing section is divided into two parts. In the first part, Data Cleaning, we describe the characteristics of the raw data, the justification for the transformations required, and how these were implemented.
In the second part, Data Analysis, we describe the analytic plan and its implementation. We perform the analysis here, rather than in the Results section, so that the latter can focus on the findings and stay free of long code blocks.
During the initial processing steps, it became evident that the dataset was not suitable for analysis in its raw state. Dates were in an irregular format, preventing clear identification of the year—a variable crucial to this analysis. Event names were also problematic; they did not strictly reflect the 48 categories recognized by NOAA, but included hundreds of variations and typos. Furthermore, many columns contained missing or redundant information that had to be removed for computational efficiency.
Additionally, monetary losses were encoded using character suffixes (e.g., “K”, “M”) and needed to be converted into a numeric format. Finally, a data-entry error in the losses recorded for a specific flood in Napa Valley was identified and corrected. The implementation of these steps is detailed below.
We start by loading the required packages and the raw data:
# ---- Get files and libraries ----
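if (!require(pacman)) install.packages("pacman")
pacman::p_load(data.table,
ggplot2,
lubridate,
flextable,
stringr,
janitor, # clean_names() is called below
R.utils) # lets fread() read the .bz2 archive directly
# Download data if not present (the source file is a bzip2 archive)
if (!file.exists("FStormData.csv.bz2")) {
download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = file.path(getwd(), "FStormData.csv.bz2"),
mode = "wb") # binary mode keeps the archive intact on Windows
}
storms <- fread("FStormData.csv.bz2",
na.strings = "") |> janitor::clean_names()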
Now we explore the structure of the data and start with the necessary transformations:
str(storms)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ state : num 1 1 1 1 1 1 1 1 1 1 ...
## $ bgn_date : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ bgn_time : chr "0130" "0145" "1600" "0900" ...
## $ time_zone : chr "CST" "CST" "CST" "CST" ...
## $ county : num 97 3 57 89 43 77 9 123 125 57 ...
## $ countyname : chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ state_2 : chr "AL" "AL" "AL" "AL" ...
## $ evtype : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ bgn_range : num 0 0 0 0 0 0 0 0 0 0 ...
## $ bgn_azi : chr NA NA NA NA ...
## $ bgn_locati : chr NA NA NA NA ...
## $ end_date : chr NA NA NA NA ...
## $ end_time : chr NA NA NA NA ...
## $ county_end : num 0 0 0 0 0 0 0 0 0 0 ...
## $ countyendn : logi NA NA NA NA NA NA ...
## $ end_range : num 0 0 0 0 0 0 0 0 0 0 ...
## $ end_azi : chr NA NA NA NA ...
## $ end_locati : chr NA NA NA NA ...
## $ length : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ width : num 100 150 123 100 150 177 33 33 100 100 ...
## $ f : int 3 2 2 2 2 2 2 1 3 3 ...
## $ mag : num 0 0 0 0 0 0 0 0 0 0 ...
## $ fatalities : num 0 0 0 0 0 0 0 0 1 0 ...
## $ injuries : num 15 0 2 2 2 6 1 0 14 0 ...
## $ propdmg : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ propdmgexp : chr "K" "K" "K" "K" ...
## $ cropdmg : num 0 0 0 0 0 0 0 0 0 0 ...
## $ cropdmgexp : chr NA NA NA NA ...
## $ wfo : chr NA NA NA NA ...
## $ stateoffic : chr NA NA NA NA ...
## $ zonenames : chr NA NA NA NA ...
## $ latitude : num 3040 3042 3340 3458 3412 ...
## $ longitude : num 8812 8755 8742 8626 8642 ...
## $ latitude_e : num 3051 0 0 0 0 ...
## $ longitude_2: num 8806 0 0 0 0 ...
## $ remarks : chr NA NA NA NA ...
## $ refnum : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
# We keep only the columns relevant to our problem (dates, location, type, and damage)
tidy_storms <- storms[, .(bgn_date,
end_date,
state = as.factor(state),
state_2 = as.factor(state_2),
county = as.factor(county),
countyname,
evtype,
fatalities,
injuries,
propdmg,
propdmgexp,
cropdmg,
cropdmgexp,
remarks,
refnum)]
# Date formatting: removing timestamps and converting to Date objects
dates <- names(tidy_storms)[names(tidy_storms) %like% "date"]
tidy_storms[, (dates) := lapply(.SD, sub, pattern = " .*$", replacement = ""), .SDcols = dates]
tidy_storms[, (dates) := lapply(.SD, mdy), .SDcols = dates]
# Creating a "year" column for temporal analysis
tidy_storms[, year := year(bgn_date)]
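To illustrate the date handling, here is a minimal sketch on two sample strings in the same shape as `bgn_date`. The vector name `raw_dates` is hypothetical and used only for illustration.

```r
library(lubridate)

# Sample raw values in the same "m/d/Y H:M:S" shape as bgn_date
raw_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00")

# Drop everything from the first space onward, leaving only the date part
date_only <- sub(" .*$", "", raw_dates)

# Parse the remaining text as month-day-year
parsed <- mdy(date_only)

parsed        # Date vector: 1950-04-18, 1951-02-20
year(parsed)  # 1950 1951
```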
We now assess consistency in data recording across years:
# Check reporting variability across decades
min_year <- tidy_storms[, unique(year)] |> min()
max_year <- tidy_storms[, unique(year)] |> max()
tidy_storms[year%in%c(seq(min_year, max_year, 10)), .(`Reported events` = uniqueN(evtype)), year]
## year Reported events
## <num> <int>
## 1: 1950 1
## 2: 1960 3
## 3: 1970 3
## 4: 1980 3
## 5: 1990 3
## 6: 2000 112
## 7: 2010 46
# Check reported events every 10 years
tidy_storms[year %in% c(seq(1950, 1990, 10)),
.(`Reported events` = unique(evtype)), year]
## year Reported events
## <num> <char>
## 1: 1950 TORNADO
## 2: 1960 TORNADO
## 3: 1960 TSTM WIND
## 4: 1960 HAIL
## 5: 1970 TSTM WIND
## 6: 1970 TORNADO
## 7: 1970 HAIL
## 8: 1980 HAIL
## 9: 1980 TSTM WIND
## 10: 1980 TORNADO
## 11: 1990 HAIL
## 12: 1990 TSTM WIND
## 13: 1990 TORNADO
We can see that the sampled years up to 1990 record only three event types: Tornado, Thunderstorm Wind (Tstm Wind), and Hail. Including these years would bias the analysis toward these event types simply because more data are available for them. Therefore, we exclude data prior to 1993 to ensure a more balanced comparison.
tidy_storms <- tidy_storms[year >= 1993]
Next, we handle missing values (NAs) and encoded exponents:
# Check for NAs
tidy_storms[, colMeans(is.na(.SD)) |> round(2)] |> sort(decreasing = TRUE)
## cropdmgexp propdmgexp remarks end_date bgn_date state state_2
## 0.60 0.44 0.14 0.08 0.00 0.00 0.00
## county countyname evtype fatalities injuries propdmg cropdmg
## 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## refnum year
## 0.00 0.00
Crop and property damage exponents are often missing, or encoded with characters (as shown next). We implement a mapping function to convert these symbols (H, K, M, B) into numeric multipliers.
# Check exponent encoding
tidy_storms[, .(propdmgexp = unique(propdmgexp))] |> c()
## $propdmgexp
## [1] NA "B" "K" "M" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
tidy_storms[, .(cropdmgexp = unique(cropdmgexp))] |> c()
## $cropdmgexp
## [1] NA "M" "K" "m" "B" "?" "0" "k" "2"
# Function to map exponents
map_exp <- function(x) {
x <- toupper(as.character(x))
fcase(
x %like% "H", 1e2,
x %like% "K", 1e3,
x %like% "M", 1e6,
x %like% "B", 1e9,
# Numeric exponents (single digits; 0-8 occur in the data) are treated as powers of 10
x %chin% as.character(0:9), 10^as.numeric(x),
# Default case (symbols like +, -, ?) means no multiplier (1)
default = 1
)
}
# Calculate total monetary loss
tidy_storms[, prop_total := propdmg * map_exp(propdmgexp)]
tidy_storms[, crop_total := cropdmg * map_exp(cropdmgexp)]
tidy_storms[, economic_loss := fcoalesce(prop_total, 0) + fcoalesce(crop_total, 0)]
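As a worked check of the exponent rules, the sketch below applies a simplified version of the mapping to a few hypothetical values. The table `toy` and its columns `dmg` and `dmg_exp` are illustrative assumptions, not rows from the dataset.

```r
library(data.table)

# Hypothetical toy rows covering each kind of exponent code
toy <- data.table(dmg     = c(25, 2.5, 1.5, 3, 10),
                  dmg_exp = c("K", "m", "B", "2", NA))

# Simplified multiplier lookup: letters first, then single digits, else 1
toy[, mult := fcase(
  toupper(dmg_exp) %chin% "H", 1e2,
  toupper(dmg_exp) %chin% "K", 1e3,
  toupper(dmg_exp) %chin% "M", 1e6,
  toupper(dmg_exp) %chin% "B", 1e9,
  dmg_exp %chin% as.character(0:9), 10^suppressWarnings(as.numeric(dmg_exp)),
  default = 1
)]

# Damage in USD is the reported figure times its multiplier
toy[, total := dmg * mult]
toy
```

In this sketch a missing or unrecognised exponent leaves the reported figure unchanged, which mirrors the default case described above.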
Now, we address the inconsistency in Event Types (evtype). The raw data contains hundreds of variations. We use regular expressions to group them into the standard NOAA categories.
# Standardize format
tidy_storms[, evtype := stringr::str_to_title(evtype)]
# Remove summary rows
tidy_storms <- tidy_storms[!evtype %like% "(?i)Summary"]
# Apply cleaning logic
tidy_storms[, clean_event := fcase(
# 1. HEAT
evtype %like% "(?i)Heat|Warm|Hot|Record High", "Excessive Heat",
# 2. TORNADO & TROPICAL
evtype %like% "(?i)Tornado|Torn?da?o|Whirlwind|Gustnado", "Tornado",
evtype %like% "(?i)Hurricane|Typhoon", "Hurricane/Typhoon",
evtype %like% "(?i)Tropical Storm", "Tropical Storm",
# 3. FLOOD
evtype %like% "(?i)Flash Flood|Rapidly Rising", "Flash Flood",
evtype %like% "(?i)Lakeshore Flood", "Lakeshore Flood",
evtype %like% "(?i)Coastal Flood|Cstl Flood|Tidal", "Coastal Flood",
evtype %like% "(?i)Flood|Fld|Stream|Urban", "Flood",
# 4. WIND & THUNDERSTORM
evtype %like% "(?i)Marine (Thunderstorm|Tstm)", "Marine Thunderstorm Wind",
evtype %like% "(?i)Thunderstorm|Tstm|Thund?e?e?r?sto?r?m|Burst|Tunderstorm", "Thunderstorm Wind",
evtype %like% "(?i)Marine High Wind", "Marine High Wind",
evtype %like% "(?i)High Wind|High Wind", "High Wind",
evtype %like% "(?i)Marine Strong Wind", "Marine Strong Wind",
evtype %like% "(?i)Strong Wind|Gusty|^Winds?$", "Strong Wind",
# 5. WINTER & COLD
evtype %like% "(?i)Extreme (Cold|Wind ?Chill|Windchill)", "Extreme Cold/Wind Chill",
evtype %like% "(?i)Cold|Wind Chill|Hypothermia|Low Temp|Exposure", "Cold/Wind Chill",
evtype %like% "(?i)Blizzard", "Blizzard",
evtype %like% "(?i)Winter Storm", "Winter Storm",
evtype %like% "(?i)Winter Weather|Wintry|Glaze|Black Ice|Ice On Road|Freezing", "Winter Weather",
evtype %like% "(?i)Ice Storm", "Ice Storm",
evtype %like% "(?i)Snow", "Heavy Snow",
evtype %like% "(?i)Frost|Freeze", "Frost/Freeze",
# 6. MARINE & COASTAL
evtype %like% "(?i)Rip Current", "Rip Current",
evtype %like% "(?i)Surf|High Tide|High Water|Swells|Waves|Seas|Drowning", "High Surf",
evtype %like% "(?i)Storm Surge|Coastal ?storm", "Storm Surge/Tide",
# 7. OTHERS
evtype %like% "(?i)Ligh?tning|Ligntning", "Lightning",
evtype %like% "(?i)Wildfire|Fire|Forest Fire", "Wildfire",
evtype %like% "(?i)Landslide|Mud ?slide|Debris|Rock Slide", "Debris Flow",
evtype %like% "(?i)Avalanche|Avalance", "Avalanche",
evtype %like% "(?i)Drought", "Drought",
evtype %like% "(?i)Fog", "Dense Fog",
evtype %like% "(?i)Rain|Shower|Precip", "Heavy Rain",
evtype %like% "(?i)Other|Summary|\\?|None", "Other",
default = evtype
)]
# Reduction metric
reduction_pct <- round(tidy_storms[, uniqueN(clean_event)] / tidy_storms[, uniqueN(evtype)] * 100, 2)
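Because each label receives the first rule it matches, the order of the rules matters: specific patterns have to come before the general ones they overlap with. The following sketch shows this on four hypothetical labels, using base `grepl()` for the matching. It is an illustration of the ordering principle only, not a replacement for the full rule set.

```r
library(data.table)

evtype <- c("Marine Tstm Wind", "Tstm Wind", "Flash Flood", "Urban Flood")

# Specific patterns first: each label gets the first rule it matches
specific_first <- fcase(
  grepl("Marine (Thunderstorm|Tstm)", evtype, ignore.case = TRUE), "Marine Thunderstorm Wind",
  grepl("Thunderstorm|Tstm", evtype, ignore.case = TRUE),          "Thunderstorm Wind",
  grepl("Flash Flood", evtype, ignore.case = TRUE),                "Flash Flood",
  grepl("Flood|Urban", evtype, ignore.case = TRUE),                "Flood"
)

# Same rules with the general patterns first: the specific ones never fire
general_first <- fcase(
  grepl("Thunderstorm|Tstm", evtype, ignore.case = TRUE),          "Thunderstorm Wind",
  grepl("Marine (Thunderstorm|Tstm)", evtype, ignore.case = TRUE), "Marine Thunderstorm Wind",
  grepl("Flood|Urban", evtype, ignore.case = TRUE),                "Flood",
  grepl("Flash Flood", evtype, ignore.case = TRUE),                "Flash Flood"
)

data.table(evtype, specific_first, general_first)
```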
We have significantly reduced the noise in the data: the number of distinct event labels is now only 22.5% of the original count. For geographical analysis, we also map states to US Regions:
northeast <- c("CT", "ME", "MA", "NH", "RI", "VT", "NJ", "NY", "PA")
midwest <- c("IL", "IN", "MI", "OH", "WI", "IA", "KS", "MN", "MO", "NE", "ND", "SD")
south <- c("DE", "FL", "GA", "MD", "NC", "SC", "VA", "DC", "WV", "AL", "KY", "MS", "TN", "AR", "LA", "OK", "TX")
west <- c("AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY", "AK", "CA", "HI", "OR", "WA")
tidy_storms[, region := fcase(
state_2 %in% northeast, "Northeast",
state_2 %in% midwest, "Midwest",
state_2 %in% south, "South",
state_2 %in% west, "West",
state_2 %in% c("PR", "GU", "AS", "VI", "MH", "AM"), "Territories",
default = "Marine/Other"
)]
Finally, during data exploration, a significant outlier was detected in Napa Valley (2006): a flood was recorded with a “B” (billions) exponent, although the remarks describe damage in the millions of dollars, so the exponent should have been “M”.
# Display the outlier
tidy_storms[countyname == "NAPA" & year == 2006 & propdmgexp == "B",
.(evtype, propdmg, propdmgexp, remarks)]
## evtype propdmg propdmgexp
## <char> <num> <char>
## 1: Flood 115 B
## remarks
## <char>
## 1: Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.
# Fix the outlier
tidy_storms[countyname == "NAPA" & year == 2006 & propdmgexp == "B",
`:=`(prop_total = prop_total / 1000,
economic_loss = (prop_total / 1000) + crop_total)]
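One detail of the fix is worth spelling out: within a single `:=` call, data.table evaluates every right-hand side against the columns as they were before the update. Both expressions therefore see the original `prop_total`, so `economic_loss` is computed from the corrected value exactly once. A minimal sketch on a hypothetical single-row table (the `crop_total` of 0 is an assumption for illustration):

```r
library(data.table)

# Hypothetical row: 115 recorded with a "B" exponent, i.e. 115e9 USD
toy <- data.table(prop_total = 115e9, crop_total = 0, economic_loss = 115e9)

# Both right-hand sides are evaluated before any column is overwritten
toy[, `:=`(prop_total    = prop_total / 1000,
           economic_loss = (prop_total / 1000) + crop_total)]

toy  # prop_total and economic_loss are both 115e6 (115 million USD)
```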
In this section, we prepare the aggregated tables that will be used in the Results section.
First, we aggregate the health impact data:
# ---- Health impact ----
total_fatalities <- tidy_storms[, sum(fatalities)]
total_injuries <- tidy_storms[, sum(injuries)]
health_table <- tidy_storms[, .(
fatalities = sum(fatalities),
fatalities_pct = round(sum(fatalities) / total_fatalities * 100, 2),
injuries = sum(injuries),
injuries_pct = round(sum(injuries) / total_injuries * 100, 2)
), by = clean_event][order(-fatalities)]
# Add cumulative percentages
health_table[, `:=`(
fatalities_cumpct = cumsum(fatalities_pct),
injuries_cumpct = cumsum(injuries_pct)
)]
# Filter non-zero events and keep top 20
health_table <- health_table[!(injuries == 0 & fatalities == 0)][1:20]
health_table
## clean_event fatalities fatalities_pct injuries injuries_pct
## <char> <num> <num> <num> <num>
## 1: Excessive Heat 3178 29.25 9243 13.44
## 2: Tornado 1650 15.19 23371 33.99
## 3: Flash Flood 1036 9.54 1802 2.62
## 4: Lightning 817 7.52 5231 7.61
## 5: Rip Current 577 5.31 529 0.77
## 6: Flood 512 4.71 6873 9.99
## 7: Thunderstorm Wind 450 4.14 6213 9.04
## 8: Extreme Cold/Wind Chill 304 2.80 260 0.38
## 9: High Wind 296 2.72 1522 2.21
## 10: Avalanche 225 2.07 170 0.25
## 11: Winter Storm 216 1.99 1338 1.95
## 12: High Surf 183 1.68 259 0.38
## 13: Cold/Wind Chill 180 1.66 61 0.09
## 14: Heavy Snow 148 1.36 1122 1.63
## 15: Strong Wind 140 1.29 400 0.58
## 16: Hurricane/Typhoon 135 1.24 1333 1.94
## 17: Heavy Rain 103 0.95 306 0.44
## 18: Blizzard 101 0.93 806 1.17
## 19: Wildfire 90 0.83 1608 2.34
## 20: Ice Storm 89 0.82 1975 2.87
## clean_event fatalities fatalities_pct injuries injuries_pct
## <char> <num> <num> <num> <num>
## fatalities_cumpct injuries_cumpct
## <num> <num>
## 1: 29.25 13.44
## 2: 44.44 47.43
## 3: 53.98 50.05
## 4: 61.50 57.66
## 5: 66.81 58.43
## 6: 71.52 68.42
## 7: 75.66 77.46
## 8: 78.46 77.84
## 9: 81.18 80.05
## 10: 83.25 80.30
## 11: 85.24 82.25
## 12: 86.92 82.63
## 13: 88.58 82.72
## 14: 89.94 84.35
## 15: 91.23 84.93
## 16: 92.47 86.87
## 17: 93.42 87.31
## 18: 94.35 88.48
## 19: 95.18 90.82
## 20: 96.00 93.69
## fatalities_cumpct injuries_cumpct
## <num> <num>
The top 20 events account for 96% of total fatalities and 93.69% of injuries. We will focus our health analysis on these categories.
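The share and cumulative-share columns follow a simple pattern, sketched here on hypothetical toy counts (the event names `A`, `B`, `C` and their values are made up for illustration).

```r
library(data.table)

# Hypothetical fatality counts for three event types
toy <- data.table(clean_event = c("A", "B", "C"),
                  fatalities  = c(20, 50, 30))

total <- toy[, sum(fatalities)]

# Aggregate per event, sort from most to least lethal
shares <- toy[, .(fatalities = sum(fatalities)), by = clean_event][order(-fatalities)]

# Percentage of the overall total, then the running sum of those percentages
shares[, fatalities_pct := round(fatalities / total * 100, 2)]
shares[, fatalities_cumpct := cumsum(fatalities_pct)]

shares
```

Note that summing already-rounded percentages can drift by a few hundredths from the rounded cumulative share. Accumulating the unrounded shares first and rounding afterwards would avoid that.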
Next, we aggregate the economic impact data (Property and Crop damage):
# ---- Economic impact ----
total_loss <- tidy_storms[, sum(economic_loss, na.rm = TRUE)]
prop_loss <- tidy_storms[, sum(prop_total, na.rm = TRUE)]
crop_loss <- tidy_storms[, sum(crop_total, na.rm = TRUE)]
economy_table <- tidy_storms[, .(
prop_total = sum(prop_total) / 10^6, # Convert to Millions
prop_total_pct = round(sum(prop_total) / prop_loss * 100, 2),
crop_total = sum(crop_total)/ 10^6, # Convert to Millions
crop_total_pct = round(sum(crop_total) / crop_loss * 100, 2)
), by = clean_event][order(-prop_total)]
# Add cumulative percentages
economy_table[, `:=`(
prop_total_cumpct = cumsum(prop_total_pct),
crop_total_cumpct = cumsum(crop_total_pct)
)]
# Filter non-zero events and keep top 20
economy_table <- economy_table[!(crop_total == 0 & prop_total == 0)][1:20]
economy_table
## clean_event prop_total prop_total_pct crop_total crop_total_pct
## <char> <num> <num> <num> <num>
## 1: Hurricane/Typhoon 85356.4100 30.19 5516.11780 11.23
## 2: Storm Surge/Tide 47964.7740 16.96 0.85500 0.00
## 3: Flood 35355.3794 12.50 10856.34405 22.11
## 4: Tornado 28005.2339 9.90 417.46307 0.85
## 5: Flash Flood 17588.7921 6.22 1532.19715 3.12
## 6: Hail 15735.2675 5.57 3025.95447 6.16
## 7: Thunderstorm Wind 11185.7519 3.96 1271.64399 2.59
## 8: Wildfire 8496.6285 3.01 403.28163 0.82
## 9: Tropical Storm 7714.3906 2.73 694.89600 1.42
## 10: Winter Storm 6689.4973 2.37 27.44400 0.06
## 11: High Wind 6058.5060 2.14 691.80190 1.41
## 12: Ice Storm 3946.0279 1.40 5022.11350 10.23
## 13: Heavy Rain 3213.1742 1.14 794.65280 1.62
## 14: Drought 1046.1060 0.37 13972.56600 28.45
## 15: Heavy Snow 1009.5897 0.36 134.66310 0.27
## 16: Lightning 940.4474 0.33 12.09209 0.02
## 17: Blizzard 664.8640 0.24 112.06000 0.23
## 18: Coastal Flood 433.8291 0.15 0.05600 0.00
## 19: Debris Flow 326.8261 0.12 20.01700 0.04
## 20: Hailstorm 241.0000 0.09 0.00000 0.00
## clean_event prop_total prop_total_pct crop_total crop_total_pct
## <char> <num> <num> <num> <num>
## prop_total_cumpct crop_total_cumpct
## <num> <num>
## 1: 30.19 11.23
## 2: 47.15 11.23
## 3: 59.65 33.34
## 4: 69.55 34.19
## 5: 75.77 37.31
## 6: 81.34 43.47
## 7: 85.30 46.06
## 8: 88.31 46.88
## 9: 91.04 48.30
## 10: 93.41 48.36
## 11: 95.55 49.77
## 12: 96.95 60.00
## 13: 98.09 61.62
## 14: 98.46 90.07
## 15: 98.82 90.34
## 16: 99.15 90.36
## 17: 99.39 90.59
## 18: 99.54 90.59
## 19: 99.66 90.63
## 20: 99.75 90.63
## prop_total_cumpct crop_total_cumpct
## <num> <num>
The top 20 events account for 99.75% of total property damage and 90.63% of crop damage. Note that values are expressed in millions of USD.
While the individual analyses of health and economic impacts provide valuable insights, they reveal a discrepancy: some events are financially devastating but cause few casualties, while others are lethal but cause minimal property damage.
# ---- Global impact discrepancies ----
health_events <- health_table[, unique(clean_event)]
eco_events <- economy_table[, unique(clean_event)]
# Events present in Top 20 Health but NOT in Top 20 Economy
health_only <- health_events[!health_events %chin% eco_events]
health_only
## [1] "Excessive Heat" "Rip Current"
## [3] "Extreme Cold/Wind Chill" "Avalanche"
## [5] "High Surf" "Cold/Wind Chill"
## [7] "Strong Wind"
# Events present in Top 20 Economy but NOT in Top 20 Health
eco_only <- eco_events[!eco_events %chin% health_events]
eco_only
## [1] "Storm Surge/Tide" "Hail" "Tropical Storm" "Drought"
## [5] "Coastal Flood" "Debris Flow" "Hailstorm"
To address this and determine the “most harmful” events in a holistic manner, we construct a unified metric: Total Economic Burden. We monetize health impacts using the Value of Statistical Life (VSL) approach. Following guidance from the U.S. Department of Transportation, we assign a value of $13.2 million to each fatality and 10% of that value ($1.32 million) to each injury.
While we acknowledge that assigning a monetary value to human life is an imperfect heuristic, it allows for a standardized comparison of magnitude across different types of disasters.
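As a worked example of this monetization, the sketch below recomputes the health loss for Tornado and Excessive Heat from the fatality and injury counts reported in this analysis. The helper `health_loss_billion()` is a hypothetical name introduced only for this illustration.

```r
# Valuation constants (USD): one fatality, and 10% of that per injury
vsl_value    <- 13.2e6
injury_value <- 0.10 * vsl_value

# Hypothetical helper: monetized health loss in billions of USD
health_loss_billion <- function(fatalities, injuries) {
  (fatalities * vsl_value + injuries * injury_value) / 1e9
}

# Counts taken from the health impact table above
round(health_loss_billion(1650, 23371), 2)  # Tornado:        52.63
round(health_loss_billion(3178, 9243), 2)   # Excessive Heat: 54.15
```

In the Tornado case most of the monetized loss comes from injuries (about $30.85B against $21.78B for fatalities), which shows how sensitive the metric is to the 10% injury weight.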
# VSL Constants (in USD)
vsl_value <- 13.2e6
injury_value <- 0.10 * vsl_value # 10% of the VSL per injury
# Estimate Global Burden
tidy_storms[, health_loss := (fatalities * vsl_value) + (injuries * injury_value)]
tidy_storms[, total_burden := economic_loss + health_loss]
sum_burden <- tidy_storms[, sum(total_burden, na.rm = TRUE)]
# Create the Top 20 Global Impact Table
top_events <- tidy_storms[, .(
total_events = .N,
total_burden_B = sum(total_burden) / 1e9, # Billions
total_burden_B_pct = round(sum(total_burden) / sum_burden * 100, 2),
fatalities = sum(fatalities),
injuries = sum(injuries),
health_loss_B = sum(health_loss)/ 1e9,
prop_dmg_B = sum(prop_total) / 1e9,
crop_dmg_B = sum(crop_total) / 1e9,
economic_loss = sum(economic_loss) / 1e9
), by = clean_event][order(-total_burden_B)][, total_burden_B_pct_cum := cumsum(total_burden_B_pct)][1:20]
data.table::setcolorder(top_events,
c("clean_event","total_events", "total_burden_B", "total_burden_B_pct",
"total_burden_B_pct_cum", "fatalities", "injuries",
"health_loss_B", "prop_dmg_B", "crop_dmg_B", "economic_loss"))
# Formatting for display
table_1 <- flextable::flextable(top_events) |>
flextable::set_header_labels(
clean_event = "Event Type",
total_events = "Count",
total_burden_B = "Total Burden ($B)",
total_burden_B_pct = "% Total",
total_burden_B_pct_cum = "Cum %",
fatalities = "Deaths",
injuries = "Injuries",
health_loss_B = "Health Loss ($B)",
prop_dmg_B = "Prop. Loss ($B)",
crop_dmg_B = "Crop Loss ($B)",
economic_loss = "Total Eco. Loss ($B)"
) |>
flextable::colformat_double(digits = 2) |>
flextable::colformat_int(j = c("total_events", "fatalities", "injuries")) |>
flextable::align(j = 1, align = "left", part = "all") |>
flextable::align(j = 2:11, align = "right", part = "all") |>
flextable::width(j = 1, width = 1.8) |>
flextable::width(j = 2:11, width = 0.8) |>
flextable::fontsize(size = 9, part = "all") |>
flextable::bold(part = "header") |>
flextable::set_caption("Table 1: Top 20 Weather Events by Integrated Economic Burden (1993-2011)")
Finally, to visualize how these impacts are distributed across time and space (US Regions), we prepare the dataset for a heatmap visualization. We generate a complete grid of years, regions, and events to ensure that years with zero recorded losses are accurately represented as such, rather than being omitted.
# 1. Identify the Top 20 events by Total Burden for the plot
top_20_names <- tidy_storms[, .(t = sum(total_burden)), by = clean_event][order(-t)][1:20, clean_event]
# 2. Define Regions and Factor Levels for plotting order
regions_vec <- unique(tidy_storms$region)
niveles_region <- c("Northeast", "Midwest", "South", "West", "Territories", "Marine/Other")
# 3. Create a complete grid (Cross Join) to handle missing years/regions
complete_grid <- CJ(year = unique(tidy_storms$year),
clean_event = top_20_names,
region = regions_vec)
# 4. Aggregate data
regions_heatmap <- tidy_storms[clean_event %in% top_20_names,
.(burden = sum(total_burden)),
by = .(year, clean_event, region)]
# 5. Merge grid with data and fill NAs with 0
heatmap_full <- regions_heatmap[complete_grid, on = .(year, clean_event, region)]
heatmap_full[is.na(burden), burden := 0]
# 6. Set Factors for Plot Ordering
# Regions: Geographical order
heatmap_full[, region := factor(region, levels = niveles_region)]
# Events: Most severe (Flood/Hurricane) on top
heatmap_full[, clean_event := factor(clean_event, levels = rev(top_20_names))]
# 7. Create the Plot Object (to be printed in Results)
impact_heatmap <- ggplot(heatmap_full, aes(x = year, y = clean_event, fill = burden)) +
geom_tile(color = "white", linewidth = 0.05) +
scale_fill_viridis_c(
trans = "log10",
labels = scales::label_dollar(scale = 1e-6, suffix = "M"),
na.value = "#440154",
name = "Total Burden\n(Economy + Health)"
) +
facet_wrap(~region, ncol = 2) +
theme_minimal(base_size = 11) +
labs(
title = "Severity Hierarchy by Region and Year",
subtitle = "Events ordered by cumulative global impact (Most severe on top)",
x = "Year", y = ""
) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
axis.text.y = element_text(size = 9),
strip.background = element_rect(fill = "gray95", color = NA),
strip.text = element_text(face = "bold"),
panel.grid = element_blank(),
panel.spacing = unit(1, "lines"),
legend.position = "bottom",
legend.key.width = unit(2, "cm"),
legend.key.height = unit(0.5, "cm")
)
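The grid-completion step can be illustrated on a small scale. In this sketch, with hypothetical toy values, two observed combinations are joined onto a full cross join of years, events, and regions, and the combinations with no record receive a burden of zero.

```r
library(data.table)

# Hypothetical observed burdens: only two of four possible combinations
obs <- data.table(year        = c(2000, 2001),
                  clean_event = c("Flood", "Tornado"),
                  region      = c("South", "South"),
                  burden      = c(5e6, 2e6))

# Every combination of year, event, and region
grid <- CJ(year        = c(2000, 2001),
           clean_event = c("Flood", "Tornado"),
           region      = "South")

# Join observations onto the grid; unmatched grid rows get NA, then 0
full <- obs[grid, on = .(year, clean_event, region)]
full[is.na(burden), burden := 0]

full
```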
Between the years 1993 and 2011, the NOAA database captured a total of 714,662 extreme weather events. These disasters resulted in 10,865 fatalities and 68,765 injuries. Beyond the human toll, the economic impact was staggering, with an estimated $331.85 billion in property and crop damage.
To quantify the global burden of these events, we constructed a unified metric: Total Economic Burden. This was achieved by monetizing health impacts using the Value of Statistical Life (VSL) approach, as detailed in the Data Processing section. This allows for a direct comparison between the immediate financial costs of infrastructure damage and the profound societal cost of human loss.
Table 1 summarizes the health and economic impacts of the 20 most catastrophic event categories. As shown, the distribution of damage is highly concentrated; a small number of event types account for the vast majority of both human and financial losses.
table_1
| Event Type | Count | Total Burden ($B) | % Total | Cum % | Deaths | Injuries | Health Loss ($B) | Prop. Loss ($B) | Crop Loss ($B) | Total Eco. Loss ($B) |
|---|---|---|---|---|---|---|---|---|---|---|
| Hurricane/Typhoon | 299 | 94.41 | 16.68 | 16.68 | 135.00 | 1,333.00 | 3.54 | 85.36 | 5.52 | 90.87 |
| Tornado | 25,947 | 81.05 | 14.32 | 31.00 | 1,650.00 | 23,371.00 | 52.63 | 28.01 | 0.42 | 28.42 |
| Flood | 29,575 | 62.04 | 10.96 | 41.96 | 512.00 | 6,873.00 | 15.83 | 35.36 | 10.86 | 46.21 |
| Excessive Heat | 3,004 | 55.08 | 9.73 | 51.69 | 3,178.00 | 9,243.00 | 54.15 | 0.02 | 0.90 | 0.92 |
| Storm Surge/Tide | 420 | 48.39 | 8.55 | 60.24 | 28.00 | 45.00 | 0.43 | 47.96 | 0.00 | 47.97 |
| Flash Flood | 55,669 | 35.17 | 6.21 | 66.45 | 1,036.00 | 1,802.00 | 16.05 | 17.59 | 1.53 | 19.12 |
| Thunderstorm Wind | 234,080 | 26.60 | 4.70 | 71.15 | 450.00 | 6,213.00 | 14.14 | 11.19 | 1.27 | 12.46 |
| Hail | 226,829 | 20.16 | 3.56 | 74.71 | 10.00 | 960.00 | 1.40 | 15.74 | 3.03 | 18.76 |
| Lightning | 15,765 | 18.64 | 3.29 | 78.00 | 817.00 | 5,231.00 | 17.69 | 0.94 | 0.01 | 0.95 |
| Drought | 2,488 | 15.02 | 2.65 | 80.65 | 0.00 | 4.00 | 0.01 | 1.05 | 13.97 | 15.02 |
| Ice Storm | 2,030 | 12.75 | 2.25 | 82.90 | 89.00 | 1,975.00 | 3.78 | 3.95 | 5.02 | 8.97 |
| High Wind | 21,815 | 12.67 | 2.24 | 85.14 | 296.00 | 1,522.00 | 5.92 | 6.06 | 0.69 | 6.75 |
| Wildfire | 4,239 | 12.21 | 2.16 | 87.30 | 90.00 | 1,608.00 | 3.31 | 8.50 | 0.40 | 8.90 |
| Winter Storm | 11,437 | 11.33 | 2.00 | 89.30 | 216.00 | 1,338.00 | 4.62 | 6.69 | 0.03 | 6.72 |
| Tropical Storm | 697 | 9.79 | 1.73 | 91.03 | 66.00 | 383.00 | 1.38 | 7.71 | 0.69 | 8.41 |
| Rip Current | 777 | 8.31 | 1.47 | 92.50 | 577.00 | 529.00 | 8.31 | 0.00 | 0.00 | 0.00 |
| Heavy Rain | 11,949 | 5.77 | 1.02 | 93.52 | 103.00 | 306.00 | 1.76 | 3.21 | 0.79 | 4.01 |
| Extreme Cold/Wind Chill | 1,890 | 5.76 | 1.02 | 94.54 | 304.00 | 260.00 | 4.36 | 0.08 | 1.33 | 1.41 |
| Heavy Snow | 17,611 | 4.58 | 0.81 | 95.35 | 148.00 | 1,122.00 | 3.43 | 1.01 | 0.13 | 1.14 |
| Avalanche | 387 | 3.20 | 0.57 | 95.92 | 225.00 | 170.00 | 3.19 | 0.00 | 0.00 | 0.00 |
While the aggregate data provides a national overview, extreme weather impact is highly dependent on regional climate and infrastructure. The heatmap below illustrates how the Total Burden is distributed across time and US regions, revealing specific patterns of vulnerability.
impact_heatmap
Heatmap of Weather Impact
A breakdown of the Total Economic Burden by region confirms that the South is the most severely affected area in the United States, accounting for over $225 billion in total damages. This is primarily driven by its unique vulnerability to both high-fatality events (Tornadoes, Heat) and high-property-damage events (Hurricanes).
The regional distribution is as follows:
South: $226.15 B
West: $26.19 B
Midwest: $60.01 B
Northeast: $15.32 B
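The code that produced these regional totals is not shown in this report. A minimal sketch of one way to obtain such figures, assuming they are sums of a per-event cost column grouped by `region`, is given below. The toy values and the choice of `total_burden` as the summed column are assumptions for illustration only.

```r
library(data.table)

# Hypothetical per-event costs in USD, tagged with a region
toy <- data.table(region       = c("South", "South", "West", "Midwest"),
                  total_burden = c(3e9, 1.5e9, 2e9, 0.5e9))

# Sum by region, express in billions, and sort from largest to smallest
by_region <- toy[, .(burden_B = round(sum(total_burden) / 1e9, 2)),
                 by = region][order(-burden_B)]

by_region
```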
The implementation of the Total Economic Burden metric allows for a comprehensive comparison of events with fundamentally different profiles. While traditional metrics might focus solely on casualties or infrastructure repair costs, this integrated approach reveals the true scale of weather-related disasters.
A striking finding is that only five event types (Hurricane/Typhoon, Tornado, Flood, Excessive Heat, and Storm Surge/Tide) account for more than 60% (60.24%) of the total national burden. This suggests that federal and regional mitigation strategies should prioritize these high-impact categories to achieve the greatest reduction in overall risk.
The VSL-based approach proved to be well-proportioned for this dataset. For instance, the Health Loss from Tornadoes ($52.63B) and Excessive Heat ($54.15B) sits on a comparable scale with the Economic Loss from Hurricanes ($90.87B). This balance ensures that neither the human toll nor the financial destruction is overlooked in the final ranking.
The analysis suggests that “one size fits all” disaster policies are unlikely to be effective. The South bears a disproportionate share of the national burden, largely due to the extreme costs associated with Hurricanes and Storm Surges. Meanwhile, the West faces a distinct set of challenges, often tied to wildfire and hydrological volatility.
It is important to acknowledge that this model does not account for inflation (the value of the USD has changed significantly since 1993) or the long-term psychological and sociological impacts of these disasters. Furthermore, the 10% valuation for injuries is a conservative estimate that may vary depending on the severity of the medical cases.
In conclusion, while kinetic events like Tornadoes and Hurricanes dominate the public eye due to their visual destruction, thermal extremes (Excessive Heat) represent an equally costly and lethal threat that requires sustained public health intervention.