Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database to quantify the impact of extreme weather events in terms of human health and economic losses. The goal is to identify which event categories are the most harmful to the population and the economy. To ensure data consistency and representativeness across all event types, the study focuses on the period from 1993 to 2011. A unified metric, the Total Economic Burden, was constructed by monetizing health impacts using the Value of Statistical Life (VSL) approach, allowing for a direct comparison between mortality and financial damage. The findings reveal that just five event types—Hurricane/Typhoon, Tornado, Flood, Excessive Heat, and Storm Surge—account for over 60% of the total national burden. While Tornadoes lead in total injuries, Excessive Heat stands as the primary cause of weather-related fatalities. Geographically, the South emerged as the most severely affected region, bearing a disproportionate share of the costs due to hurricane and storm surge activity. Ultimately, the results suggest that disaster mitigation strategies must be regionally specialized to address the specific “hidden” costs of thermal extremes alongside high-visibility kinetic events.

Data Processing

The data processing section is divided into two parts. In the first part, Data Cleaning, we describe the characteristics of the raw data, the justification for the transformations required, and how these were implemented.

In the second part, Data Analysis, we describe the analytic plan and its implementation. We do this here rather than in the Results section so that the latter can focus on the findings without long code blocks, while the full analysis remains transparent to the reader.

Data Cleaning

During the initial processing steps, it became evident that the dataset was not suitable for analysis in its raw state. Dates were in an irregular format, preventing clear identification of the year—a variable crucial to this analysis. Event names were also problematic; they did not strictly reflect the 48 categories recognized by NOAA, but included hundreds of variations and typos. Furthermore, many columns contained missing or redundant information that had to be removed for computational efficiency.

Additionally, monetary losses were encoded using character suffixes (e.g., “K”, “M”) and needed to be converted into a numeric format. Finally, a data-entry error in the recorded losses of a specific flood in Napa Valley was identified and corrected. The implementation of these steps is detailed below.

We start by loading the required packages and the raw data:

# ---- Get files and libraries ----
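if(!require(pacman)) install.packages("pacman")
# janitor and scales are called with `::` below, so they must be installed;
# fread() relies on R.utils to read bzip2-compressed files
pacman::p_load(data.table, 
               ggplot2, 
               lubridate, 
               flextable, 
               stringr,
               janitor,
               scales,
               R.utils)

# Download data if not present (the file is bzip2-compressed, hence binary mode)
if(!file.exists("FStormData.csv")){
  download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = file.path(getwd(), "FStormData.csv"),
                mode = "wb")
}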

storms <- fread("FStormData.csv",
                na.strings = "") |> janitor::clean_names()

Now we explore the structure of the data and start with the necessary transformations:

str(storms)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ state      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ bgn_date   : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ bgn_time   : chr  "0130" "0145" "1600" "0900" ...
##  $ time_zone  : chr  "CST" "CST" "CST" "CST" ...
##  $ county     : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ countyname : chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ state_2    : chr  "AL" "AL" "AL" "AL" ...
##  $ evtype     : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ bgn_range  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ bgn_azi    : chr  NA NA NA NA ...
##  $ bgn_locati : chr  NA NA NA NA ...
##  $ end_date   : chr  NA NA NA NA ...
##  $ end_time   : chr  NA NA NA NA ...
##  $ county_end : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ countyendn : logi  NA NA NA NA NA NA ...
##  $ end_range  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ end_azi    : chr  NA NA NA NA ...
##  $ end_locati : chr  NA NA NA NA ...
##  $ length     : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ width      : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ f          : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ mag        : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ fatalities : num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries   : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propdmg    : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ propdmgexp : chr  "K" "K" "K" "K" ...
##  $ cropdmg    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cropdmgexp : chr  NA NA NA NA ...
##  $ wfo        : chr  NA NA NA NA ...
##  $ stateoffic : chr  NA NA NA NA ...
##  $ zonenames  : chr  NA NA NA NA ...
##  $ latitude   : num  3040 3042 3340 3458 3412 ...
##  $ longitude  : num  8812 8755 8742 8626 8642 ...
##  $ latitude_e : num  3051 0 0 0 0 ...
##  $ longitude_2: num  8806 0 0 0 0 ...
##  $ remarks    : chr  NA NA NA NA ...
##  $ refnum     : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>
# We keep only the columns relevant to our problem (dates, location, type, and damage)
tidy_storms <- storms[, .(bgn_date, 
                          end_date, 
                          state = as.factor(state),
                          state_2 = as.factor(state_2), 
                          county = as.factor(county),
                          countyname, 
                          evtype, 
                          fatalities,
                          injuries, 
                          propdmg, 
                          propdmgexp, 
                          cropdmg, 
                          cropdmgexp,
                          remarks, 
                          refnum)]

# Date formatting: removing timestamps and converting to Date objects
dates <- names(tidy_storms)[names(tidy_storms) %like% "date"] 
tidy_storms[, (dates) := lapply(.SD, sub, pattern = " .*$", replacement = ""), .SDcols = dates]
tidy_storms[, (dates) := lapply(.SD, mdy), .SDcols = dates]

# Creating a "year" column for temporal analysis
tidy_storms[, year := year(bgn_date)]
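
The date handling above can be illustrated without any packages. The following is a minimal base R sketch, not the code used in the analysis: it uses two raw values copied from the `str()` output plus a missing value, and the names `raw_dates`, `parsed`, and `event_year` are made up for the illustration.

```r
# Two raw values from the str() output above, plus a missing value
raw_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", NA)

# Drop everything after the first space, then parse as month/day/year
date_only <- sub(" .*$", "", raw_dates)
parsed <- as.Date(date_only, format = "%m/%d/%Y")

# Extract the year as an integer; missing dates stay missing
event_year <- as.integer(format(parsed, "%Y"))
print(event_year)
```

Missing dates propagate as `NA` rather than raising an error, which matters here because `end_date` is missing for part of the records.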

We now assess consistency in data recording across years:

# Check reporting variability across decades

min_year <- tidy_storms[, unique(year)] |> min()
max_year <- tidy_storms[, unique(year)] |> max()
tidy_storms[year%in%c(seq(min_year, max_year, 10)), .(`Reported events` = uniqueN(evtype)), year]
##     year Reported events
##    <num>           <int>
## 1:  1950               1
## 2:  1960               3
## 3:  1970               3
## 4:  1980               3
## 5:  1990               3
## 6:  2000             112
## 7:  2010              46
# Check reported events every 10 years
tidy_storms[year %in% c(seq(1950, 1990, 10)), 
            .(`Reported events` = unique(evtype)), year]
##      year Reported events
##     <num>          <char>
##  1:  1950         TORNADO
##  2:  1960         TORNADO
##  3:  1960       TSTM WIND
##  4:  1960            HAIL
##  5:  1970       TSTM WIND
##  6:  1970         TORNADO
##  7:  1970            HAIL
##  8:  1980            HAIL
##  9:  1980       TSTM WIND
## 10:  1980         TORNADO
## 11:  1990            HAIL
## 12:  1990       TSTM WIND
## 13:  1990         TORNADO

We can see that the earlier years in the dataset record only three event types: Tornado, Tstm Wind, and Hail. Including these years would heavily bias the analysis toward these event types simply because more data are available for them. Therefore, we exclude data prior to 1993 to ensure a more balanced comparison.

tidy_storms <- tidy_storms[year >= 1993]

Next, we handle missing values (NAs) and encoded exponents:

# Check for NAs
tidy_storms[, colMeans(is.na(.SD)) |> round(2)] |> sort(decreasing = TRUE)
## cropdmgexp propdmgexp    remarks   end_date   bgn_date      state    state_2 
##       0.60       0.44       0.14       0.08       0.00       0.00       0.00 
##     county countyname     evtype fatalities   injuries    propdmg    cropdmg 
##       0.00       0.00       0.00       0.00       0.00       0.00       0.00 
##     refnum       year 
##       0.00       0.00

Crop and property damage exponents are often missing, or encoded with characters (as shown next). We implement a mapping function to convert these symbols (H, K, M, B) into numeric multipliers.

# Check exponent encoding
tidy_storms[, .(propdmgexp = unique(propdmgexp))] |> c()
## $propdmgexp
##  [1] NA  "B" "K" "M" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
tidy_storms[, .(cropdmgexp = unique(cropdmgexp))] |> c()
## $cropdmgexp
## [1] NA  "M" "K" "m" "B" "?" "0" "k" "2"
# Function to map exponents
map_exp <- function(x) {
  x <- toupper(as.character(x))
  fcase(
    x %like% "H", 1e2,
    x %like% "K", 1e3,
    x %like% "M", 1e6,
    x %like% "B", 1e9,
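    # Digits are treated as powers of 10 (0-8 occur in the data); non-digit
    # codes coerce to NA here, so the resulting coercion warning is suppressed
    x %chin% as.character(0:9), 10^suppressWarnings(as.numeric(x)),  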
    # Default case (symbols like +, -, ?) means no multiplier (1)
    default = 1
  )
}

# Calculate total monetary loss
tidy_storms[, prop_total := propdmg * map_exp(propdmgexp)]
tidy_storms[, crop_total := cropdmg * map_exp(cropdmgexp)]
tidy_storms[, economic_loss := fcoalesce(prop_total, 0) + fcoalesce(crop_total, 0)]
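
To make the mapping rules concrete, here is a small base R sketch that applies the same rules to a toy vector. It is an illustration only: `exp_multiplier()` is a hypothetical helper that matches codes exactly rather than by pattern, and the damage values are made up.

```r
# Hypothetical helper: same rules as map_exp(), written in base R
exp_multiplier <- function(x) {
  x <- toupper(as.character(x))
  out <- rep(1, length(x))                      # default: no multiplier
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- x %in% names(letter_map)
  out[is_letter] <- letter_map[x[is_letter]]
  is_digit <- x %in% as.character(0:9)
  out[is_digit] <- 10^as.numeric(x[is_digit])   # digits act as powers of 10
  out
}

damage <- c(25, 2.5, 115, 3, 10)     # made-up amounts
suffix <- c("K", "m", "B", "2", NA)  # codes of the kinds seen in the data
loss <- damage * exp_multiplier(suffix)
print(loss)
```

Lower-case codes are handled by the `toupper()` call, and missing or symbolic codes fall through to a multiplier of 1, so the recorded amount is kept unchanged.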

Now, we address the inconsistency in Event Types (evtype). The raw data contains hundreds of variations. We use regular expressions to group them into the standard NOAA categories.

# Standardize format
tidy_storms[, evtype := stringr::str_to_title(evtype)]

# Remove summary rows
tidy_storms <- tidy_storms[!evtype %like% "(?i)Summary"]

# Apply cleaning logic
tidy_storms[, clean_event := fcase(
  # 1. HEAT
  evtype %like% "(?i)Heat|Warm|Hot|Record High", "Excessive Heat",
  # 2. TORNADO & TROPICAL
  evtype %like% "(?i)Tornado|Torn?da?o|Whirlwind|Gustnado", "Tornado",
  evtype %like% "(?i)Hurricane|Typhoon", "Hurricane/Typhoon",
  evtype %like% "(?i)Tropical Storm", "Tropical Storm",
  # 3. FLOOD
  evtype %like% "(?i)Flash Flood|Rapidly Rising", "Flash Flood",
  evtype %like% "(?i)Lakeshore Flood", "Lakeshore Flood",
  evtype %like% "(?i)Coastal Flood|Cstl Flood|Tidal", "Coastal Flood",
  evtype %like% "(?i)Flood|Fld|Stream|Urban", "Flood",
  # 4. WIND & THUNDERSTORM
  evtype %like% "(?i)Marine (Thunderstorm|Tstm)", "Marine Thunderstorm Wind",
  evtype %like% "(?i)Thunderstorm|Tstm|Thund?e?e?r?sto?r?m|Burst|Tunderstorm", "Thunderstorm Wind",
  evtype %like% "(?i)Marine High Wind", "Marine High Wind",
  evtype %like% "(?i)High Wind|High  Wind", "High Wind",
  evtype %like% "(?i)Marine Strong Wind", "Marine Strong Wind",
  evtype %like% "(?i)Strong Wind|Gusty|^Winds?$", "Strong Wind",
  # 5. WINTER & COLD
  evtype %like% "(?i)Extreme (Cold|Wind ?Chill|Windchill)", "Extreme Cold/Wind Chill",
  evtype %like% "(?i)Cold|Wind Chill|Hypothermia|Low Temp|Exposure", "Cold/Wind Chill",
  evtype %like% "(?i)Blizzard", "Blizzard",
  evtype %like% "(?i)Winter Storm", "Winter Storm",
  evtype %like% "(?i)Winter Weather|Wintry|Glaze|Black Ice|Ice On Road|Freezing", "Winter Weather",
  evtype %like% "(?i)Ice Storm", "Ice Storm",
  evtype %like% "(?i)Snow", "Heavy Snow",
  evtype %like% "(?i)Frost|Freeze", "Frost/Freeze",
  # 6. MARINE & COASTAL
  evtype %like% "(?i)Rip Current", "Rip Current",
  evtype %like% "(?i)Surf|High Tide|High Water|Swells|Waves|Seas|Drowning", "High Surf",
  evtype %like% "(?i)Storm Surge|Coastal ?storm", "Storm Surge/Tide",
  # 7. OTHERS
  evtype %like% "(?i)Ligh?tning|Ligntning", "Lightning",
  evtype %like% "(?i)Wildfire|Fire|Forest Fire", "Wildfire",
  evtype %like% "(?i)Landslide|Mud ?slide|Debris|Rock Slide", "Debris Flow",
  evtype %like% "(?i)Avalanche|Avalance", "Avalanche",
  evtype %like% "(?i)Drought", "Drought",
  evtype %like% "(?i)Fog", "Dense Fog",
  evtype %like% "(?i)Rain|Shower|Precip", "Heavy Rain",
  evtype %like% "(?i)Other|Summary|\\?|None", "Other",
  default = evtype
)]

# Share of distinct event labels that remain after cleaning (in %)
reduction_pct <- round(tidy_storms[, uniqueN(clean_event)] / tidy_storms[, uniqueN(evtype)] * 100, 2)
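
Because the rules are applied in order and the first match wins, specific patterns have to come before general ones. The base R sketch below illustrates this with a hypothetical `classify()` helper and a handful of toy labels; it is not the cleaning code itself.

```r
# Hypothetical first-match classifier: rules are tried from top to bottom
classify <- function(labels, rules) {
  out <- labels                          # unmatched labels are kept as-is
  done <- rep(FALSE, length(labels))
  for (i in seq_along(rules)) {
    hit <- !done & grepl(names(rules)[i], labels, ignore.case = TRUE)
    out[hit] <- rules[[i]]
    done <- done | hit
  }
  out
}

labels <- c("Marine Tstm Wind", "Tstm Wind", "Flash Flood", "Urban Flood", "Hail")

specific_first <- c("Marine (Thunderstorm|Tstm)" = "Marine Thunderstorm Wind",
                    "Thunderstorm|Tstm"          = "Thunderstorm Wind",
                    "Flash Flood"                = "Flash Flood",
                    "Flood"                      = "Flood")
general_first <- specific_first[c(2, 1, 4, 3)]   # same rules, wrong order

good <- classify(labels, specific_first)
bad  <- classify(labels, general_first)
print(data.frame(labels, good, bad))
```

With the general patterns first, "Marine Tstm Wind" is absorbed into "Thunderstorm Wind" and "Flash Flood" into "Flood", which is why the marine and flash-flood rules precede the broader ones in the cleaning logic above.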

We have significantly reduced the noise in the data: the cleaned categories number only 22.5% of the original distinct event labels. For geographical analysis, we also map states to US Regions:

northeast <- c("CT", "ME", "MA", "NH", "RI", "VT", "NJ", "NY", "PA")
midwest   <- c("IL", "IN", "MI", "OH", "WI", "IA", "KS", "MN", "MO", "NE", "ND", "SD")
south     <- c("DE", "FL", "GA", "MD", "NC", "SC", "VA", "DC", "WV", "AL", "KY", "MS", "TN", "AR", "LA", "OK", "TX")
west      <- c("AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY", "AK", "CA", "HI", "OR", "WA")

tidy_storms[, region := fcase(
  state_2 %in% northeast, "Northeast",
  state_2 %in% midwest,   "Midwest",
  state_2 %in% south,     "South",
  state_2 %in% west,      "West",
  state_2 %in% c("PR", "GU", "AS", "VI", "MH", "AM"), "Territories",
  default = "Marine/Other"
)]

Finally, during data exploration, a significant outlier was detected in Napa Valley (2006): a flood was recorded with a “B” (billions) exponent, whereas the damage described in the remarks is on the order of millions, which corresponds to an “M” exponent.

# Display the outlier
tidy_storms[countyname == "NAPA" & year == 2006 & propdmgexp == "B", 
            .(evtype, propdmg, propdmgexp, remarks)]
##    evtype propdmg propdmgexp
##    <char>   <num>     <char>
## 1:  Flood     115          B
##                                                                                                                                                                                                                                                                                                                                                                                           remarks
##                                                                                                                                                                                                                                                                                                                                                                                            <char>
## 1: Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.
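# Fix the outlier: rescale from billions to millions. All right-hand sides are
# evaluated before assignment, so economic_loss uses the uncorrected prop_total.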
tidy_storms[countyname == "NAPA" & year == 2006 & propdmgexp == "B", 
            `:=`(prop_total = prop_total / 1000,
                 economic_loss = (prop_total / 1000) + crop_total)]

Data Analysis

In this section, we prepare the aggregated tables that will be used in the Results section.

First, we aggregate the health impact data:

# ---- Health impact ----
total_fatalities <- tidy_storms[, sum(fatalities)]
total_injuries <- tidy_storms[, sum(injuries)]

health_table <- tidy_storms[, .(
    fatalities = sum(fatalities),
    fatalities_pct = round(sum(fatalities) / total_fatalities * 100, 2),
    injuries = sum(injuries),
    injuries_pct = round(sum(injuries) / total_injuries * 100, 2)
  ), by = clean_event][order(-fatalities)]

# Add cumulative percentages
health_table[, `:=`(
  fatalities_cumpct = cumsum(fatalities_pct),
  injuries_cumpct = cumsum(injuries_pct)
)]

# Filter non-zero events and keep top 20
health_table <- health_table[!(injuries == 0 & fatalities == 0)][1:20]

health_table
##                 clean_event fatalities fatalities_pct injuries injuries_pct
##                      <char>      <num>          <num>    <num>        <num>
##  1:          Excessive Heat       3178          29.25     9243        13.44
##  2:                 Tornado       1650          15.19    23371        33.99
##  3:             Flash Flood       1036           9.54     1802         2.62
##  4:               Lightning        817           7.52     5231         7.61
##  5:             Rip Current        577           5.31      529         0.77
##  6:                   Flood        512           4.71     6873         9.99
##  7:       Thunderstorm Wind        450           4.14     6213         9.04
##  8: Extreme Cold/Wind Chill        304           2.80      260         0.38
##  9:               High Wind        296           2.72     1522         2.21
## 10:               Avalanche        225           2.07      170         0.25
## 11:            Winter Storm        216           1.99     1338         1.95
## 12:               High Surf        183           1.68      259         0.38
## 13:         Cold/Wind Chill        180           1.66       61         0.09
## 14:              Heavy Snow        148           1.36     1122         1.63
## 15:             Strong Wind        140           1.29      400         0.58
## 16:       Hurricane/Typhoon        135           1.24     1333         1.94
## 17:              Heavy Rain        103           0.95      306         0.44
## 18:                Blizzard        101           0.93      806         1.17
## 19:                Wildfire         90           0.83     1608         2.34
## 20:               Ice Storm         89           0.82     1975         2.87
##                 clean_event fatalities fatalities_pct injuries injuries_pct
##                      <char>      <num>          <num>    <num>        <num>
##     fatalities_cumpct injuries_cumpct
##                 <num>           <num>
##  1:             29.25           13.44
##  2:             44.44           47.43
##  3:             53.98           50.05
##  4:             61.50           57.66
##  5:             66.81           58.43
##  6:             71.52           68.42
##  7:             75.66           77.46
##  8:             78.46           77.84
##  9:             81.18           80.05
## 10:             83.25           80.30
## 11:             85.24           82.25
## 12:             86.92           82.63
## 13:             88.58           82.72
## 14:             89.94           84.35
## 15:             91.23           84.93
## 16:             92.47           86.87
## 17:             93.42           87.31
## 18:             94.35           88.48
## 19:             95.18           90.82
## 20:             96.00           93.69
##     fatalities_cumpct injuries_cumpct
##                 <num>           <num>

The top 20 events explain 96% of total fatalities and 93.69% of injuries. We will focus our health analysis on these categories.

Next, we aggregate the economic impact data (Property and Crop damage):

# ---- Economic impact ----
total_loss <- tidy_storms[, sum(economic_loss, na.rm = TRUE)] 
prop_loss <- tidy_storms[, sum(prop_total, na.rm = TRUE)] 
crop_loss <- tidy_storms[, sum(crop_total, na.rm = TRUE)] 

economy_table <- tidy_storms[, .(
    prop_total = sum(prop_total) / 10^6, # Convert to Millions
    prop_total_pct = round(sum(prop_total) / prop_loss * 100, 2),
    crop_total = sum(crop_total)/ 10^6,  # Convert to Millions
    crop_total_pct = round(sum(crop_total) / crop_loss * 100, 2)
  ), by = clean_event][order(-prop_total)]

# Add cumulative percentages
economy_table[, `:=`(
  prop_total_cumpct = cumsum(prop_total_pct),
  crop_total_cumpct = cumsum(crop_total_pct)
)]

# Filter non-zero events and keep top 20
economy_table <- economy_table[!(crop_total == 0 & prop_total == 0)][1:20]

economy_table
##           clean_event prop_total prop_total_pct  crop_total crop_total_pct
##                <char>      <num>          <num>       <num>          <num>
##  1: Hurricane/Typhoon 85356.4100          30.19  5516.11780          11.23
##  2:  Storm Surge/Tide 47964.7740          16.96     0.85500           0.00
##  3:             Flood 35355.3794          12.50 10856.34405          22.11
##  4:           Tornado 28005.2339           9.90   417.46307           0.85
##  5:       Flash Flood 17588.7921           6.22  1532.19715           3.12
##  6:              Hail 15735.2675           5.57  3025.95447           6.16
##  7: Thunderstorm Wind 11185.7519           3.96  1271.64399           2.59
##  8:          Wildfire  8496.6285           3.01   403.28163           0.82
##  9:    Tropical Storm  7714.3906           2.73   694.89600           1.42
## 10:      Winter Storm  6689.4973           2.37    27.44400           0.06
## 11:         High Wind  6058.5060           2.14   691.80190           1.41
## 12:         Ice Storm  3946.0279           1.40  5022.11350          10.23
## 13:        Heavy Rain  3213.1742           1.14   794.65280           1.62
## 14:           Drought  1046.1060           0.37 13972.56600          28.45
## 15:        Heavy Snow  1009.5897           0.36   134.66310           0.27
## 16:         Lightning   940.4474           0.33    12.09209           0.02
## 17:          Blizzard   664.8640           0.24   112.06000           0.23
## 18:     Coastal Flood   433.8291           0.15     0.05600           0.00
## 19:       Debris Flow   326.8261           0.12    20.01700           0.04
## 20:         Hailstorm   241.0000           0.09     0.00000           0.00
##           clean_event prop_total prop_total_pct  crop_total crop_total_pct
##                <char>      <num>          <num>       <num>          <num>
##     prop_total_cumpct crop_total_cumpct
##                 <num>             <num>
##  1:             30.19             11.23
##  2:             47.15             11.23
##  3:             59.65             33.34
##  4:             69.55             34.19
##  5:             75.77             37.31
##  6:             81.34             43.47
##  7:             85.30             46.06
##  8:             88.31             46.88
##  9:             91.04             48.30
## 10:             93.41             48.36
## 11:             95.55             49.77
## 12:             96.95             60.00
## 13:             98.09             61.62
## 14:             98.46             90.07
## 15:             98.82             90.34
## 16:             99.15             90.36
## 17:             99.39             90.59
## 18:             99.54             90.59
## 19:             99.66             90.63
## 20:             99.75             90.63
##     prop_total_cumpct crop_total_cumpct
##                 <num>             <num>

The top 20 events explain 99.75% of total property damage and 90.63% of crop damage. Note that values are expressed in Millions of USD.

Unifying Metrics: The Global Burden

While the individual analyses of health and economic impacts provide valuable insights, they reveal a discrepancy: some events are financially devastating but cause few casualties, while others are lethal but cause minimal property damage.

# ---- Global impact discrepancies ----
health_events <- health_table[, unique(clean_event)]
eco_events <- economy_table[, unique(clean_event)]

# Events present in Top 20 Health but NOT in Top 20 Economy
health_only <- health_events[!health_events %chin% eco_events]
health_only
## [1] "Excessive Heat"          "Rip Current"            
## [3] "Extreme Cold/Wind Chill" "Avalanche"              
## [5] "High Surf"               "Cold/Wind Chill"        
## [7] "Strong Wind"
# Events present in Top 20 Economy but NOT in Top 20 Health
eco_only <- eco_events[!eco_events %chin% health_events]
eco_only
## [1] "Storm Surge/Tide" "Hail"             "Tropical Storm"   "Drought"         
## [5] "Coastal Flood"    "Debris Flow"      "Hailstorm"

To address this and determine the “most harmful” events in a holistic manner, we construct a unified metric: Total Economic Burden. We monetize health impacts using the Value of Statistical Life (VSL) approach. Following guidance from the U.S. Department of Transportation, we assign a value of $13.2 million to each fatality and 10% of that value ($1.32 million) to each injury.

While we acknowledge that assigning a monetary value to human life is an imperfect heuristic, it allows for a standardized comparison of magnitude across different types of disasters.
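
The valuation can be checked by hand. The following minimal base R sketch applies the two constants to the fatality and injury counts reported for Tornado and Excessive Heat in the health table above; `health_loss_billion()` is a hypothetical helper introduced only for this check.

```r
vsl_value    <- 13.2e6   # value assigned to each fatality (USD)
injury_value <- 13.2e5   # 10% of the VSL for each injury (USD)

# Hypothetical helper: monetized health loss in billions of USD
health_loss_billion <- function(fatalities, injuries) {
  (fatalities * vsl_value + injuries * injury_value) / 1e9
}

tornado <- round(health_loss_billion(fatalities = 1650, injuries = 23371), 2)
heat    <- round(health_loss_billion(fatalities = 3178, injuries = 9243), 2)
print(c(Tornado = tornado, `Excessive Heat` = heat))
```

Under these assumptions the two categories end up close to each other (about $52.63 billion and $54.15 billion), even though tornadoes account for far more injuries and excessive heat for far more deaths.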

# VSL Constants (in USD)
vsl_value <- 13.2e6      
injury_value <- 13.2e5   

# Estimate Global Burden
tidy_storms[, health_loss := (fatalities * vsl_value) + (injuries * injury_value)]
tidy_storms[, total_burden := economic_loss + health_loss]

sum_burden <- tidy_storms[, sum(total_burden, na.rm = TRUE)]

# Create the Top 20 Global Impact Table
top_events <- tidy_storms[, .(
  total_events = .N,
  total_burden_B = sum(total_burden) / 1e9, # Billions
  total_burden_B_pct = round(sum(total_burden) / sum_burden * 100, 2),
  fatalities = sum(fatalities),
  injuries = sum(injuries),
  health_loss_B = sum(health_loss)/ 1e9,
  prop_dmg_B  = sum(prop_total) / 1e9,
  crop_dmg_B = sum(crop_total) / 1e9,
  economic_loss = sum(economic_loss)  / 1e9
), by = clean_event][order(-total_burden_B)][, total_burden_B_pct_cum := cumsum(total_burden_B_pct)][1:20]
data.table::setcolorder(top_events, 
                        c("clean_event","total_events", "total_burden_B", "total_burden_B_pct",
                          "total_burden_B_pct_cum", "fatalities", "injuries",
                          "health_loss_B", "prop_dmg_B", "crop_dmg_B", "economic_loss"))

# Formatting for display
table_1 <- flextable::flextable(top_events) |> 
  flextable::set_header_labels(
    clean_event = "Event Type",
    total_events = "Count",
    total_burden_B = "Total Burden ($B)",
    total_burden_B_pct = "% Total",
    total_burden_B_pct_cum = "Cum %",
    fatalities = "Deaths",
    injuries = "Injuries",
    health_loss_B = "Health Loss ($B)",
    prop_dmg_B = "Prop. Loss ($B)",
    crop_dmg_B = "Crop Loss ($B)",
    economic_loss = "Total Eco. Loss ($B)"
  ) |> 
  flextable::colformat_double(digits = 2) |> 
  flextable::colformat_int(j = c("total_events", "fatalities", "injuries")) |> 
  flextable::align(j = 1, align = "left", part = "all") |> 
  flextable::align(j = 2:11, align = "right", part = "all") |> 
  flextable::width(j = 1, width = 1.8) |>     
  flextable::width(j = 2:11, width = 0.8) |>  
  flextable::fontsize(size = 9, part = "all") |> 
  flextable::bold(part = "header") |> 
  flextable::set_caption("Table 1: Top 20 Weather Events by Integrated Economic Burden (1993-2011)")

Regional & Temporal Analysis Preparation

Finally, to visualize how these impacts are distributed across time and space (US Regions), we prepare the dataset for a heatmap visualization. We generate a complete grid of years, regions, and events to ensure that years with zero recorded losses are accurately represented as such, rather than being omitted.
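
The effect of the complete grid is easiest to see on toy data. The base R sketch below is only an illustration with made-up values; the analysis itself uses `CJ()` and a `data.table` join, shown next.

```r
# Made-up observations: the South has no record for 1994, the West has none at all
observed <- data.frame(
  year   = c(1993L, 1995L),
  region = c("South", "South"),
  burden = c(10, 4)
)

# Every year/region combination, whether or not it was observed
grid <- expand.grid(year = 1993:1995, region = c("South", "West"),
                    stringsAsFactors = FALSE)

# A left join on the grid keeps unobserved combinations as NA, which become 0
full <- merge(grid, observed, by = c("year", "region"), all.x = TRUE)
full$burden[is.na(full$burden)] <- 0
full <- full[order(full$region, full$year), ]
print(full)
```

Without the grid, the four unobserved combinations would simply be absent from the table and would leave gaps in the heatmap instead of tiles.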

# 1. Identify the Top 20 events by Total Burden for the plot
top_20_names <- tidy_storms[, .(t = sum(total_burden)), by = clean_event][order(-t)][1:20, clean_event]

# 2. Define Regions and Factor Levels for plotting order
regions_vec <- unique(tidy_storms$region) 
niveles_region <- c("Northeast", "Midwest", "South", "West", "Territories", "Marine/Other")

# 3. Create a complete grid (Cross Join) to handle missing years/regions
complete_grid <- CJ(year = unique(tidy_storms$year),
                    clean_event = top_20_names,
                    region = regions_vec)

# 4. Aggregate data
regions_heatmap <- tidy_storms[clean_event %in% top_20_names, 
                                     .(burden = sum(total_burden)), 
                                     by = .(year, clean_event, region)]

# 5. Merge grid with data and fill NAs with 0
heatmap_full <- regions_heatmap[complete_grid, on = .(year, clean_event, region)]
heatmap_full[is.na(burden), burden := 0]

# 6. Set Factors for Plot Ordering
# Regions: Geographical order
heatmap_full[, region := factor(region, levels = niveles_region)]
# Events: Most severe (Flood/Hurricane) on top
heatmap_full[, clean_event := factor(clean_event, levels = rev(top_20_names))]

# 7. Create the Plot Object (to be printed in Results)
impact_heatmap <- ggplot(heatmap_full, aes(x = year, y = clean_event, fill = burden)) +
  geom_tile(color = "white", linewidth = 0.05) +
  scale_fill_viridis_c(
    trans = "log10", 
    labels = scales::label_dollar(scale = 1e-6, suffix = "M"),
    na.value = "#440154", 
    name = "Total Burden\n(Economy + Health)"
  ) + 
  facet_wrap(~region, ncol = 2) +
  theme_minimal(base_size = 11) + 
  labs(
    title = "Severity Hierarchy by Region and Year",
    subtitle = "Events ordered by cumulative global impact (Most severe on top)",
    x = "Year", y = ""
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 7),
    axis.text.y = element_text(size = 9),
    strip.background = element_rect(fill = "gray95", color = NA),
    strip.text = element_text(face = "bold"),
    panel.grid = element_blank(),
    panel.spacing = unit(1, "lines"),
    legend.position = "bottom",
    legend.key.width = unit(2, "cm"),
    legend.key.height = unit(0.5, "cm")
  )

Results

Between the years 1993 and 2011, the NOAA database captured a total of 714,662 extreme weather events. These disasters resulted in 10,865 fatalities and 68,765 injuries. Beyond the human toll, the economic impact was staggering, with an estimated $331.85 billion in property and crop damage.

To quantify the global burden of these events, we constructed a unified metric: Total Economic Burden. This was achieved by monetizing health impacts using the Value of Statistical Life (VSL) approach, as detailed in the Data Processing section. This allows for a direct comparison between the immediate financial costs of infrastructure damage and the profound societal cost of human loss.

Total Economic Burden of Severe Weather Events

Table 1 summarizes the health and economic impacts of the 20 most catastrophic event categories. As shown, the distribution of damage is highly concentrated; a small number of event types account for the vast majority of both human and financial losses.

table_1
Table 1: Top 20 Weather Events by Integrated Economic Burden (1993-2011)

Event Type                 Count  Total Burden ($B)  % Total   Cum %    Deaths   Injuries  Health Loss ($B)  Prop. Loss ($B)  Crop Loss ($B)  Total Eco. Loss ($B)
Hurricane/Typhoon            299              94.41    16.68   16.68    135.00   1,333.00              3.54            85.36            5.52                 90.87
Tornado                   25,947              81.05    14.32   31.00  1,650.00  23,371.00             52.63            28.01            0.42                 28.42
Flood                     29,575              62.04    10.96   41.96    512.00   6,873.00             15.83            35.36           10.86                 46.21
Excessive Heat             3,004              55.08     9.73   51.69  3,178.00   9,243.00             54.15             0.02            0.90                  0.92
Storm Surge/Tide             420              48.39     8.55   60.24     28.00      45.00              0.43            47.96            0.00                 47.97
Flash Flood               55,669              35.17     6.21   66.45  1,036.00   1,802.00             16.05            17.59            1.53                 19.12
Thunderstorm Wind        234,080              26.60     4.70   71.15    450.00   6,213.00             14.14            11.19            1.27                 12.46
Hail                     226,829              20.16     3.56   74.71     10.00     960.00              1.40            15.74            3.03                 18.76
Lightning                 15,765              18.64     3.29   78.00    817.00   5,231.00             17.69             0.94            0.01                  0.95
Drought                    2,488              15.02     2.65   80.65      0.00       4.00              0.01             1.05           13.97                 15.02
Ice Storm                  2,030              12.75     2.25   82.90     89.00   1,975.00              3.78             3.95            5.02                  8.97
High Wind                 21,815              12.67     2.24   85.14    296.00   1,522.00              5.92             6.06            0.69                  6.75
Wildfire                   4,239              12.21     2.16   87.30     90.00   1,608.00              3.31             8.50            0.40                  8.90
Winter Storm              11,437              11.33     2.00   89.30    216.00   1,338.00              4.62             6.69            0.03                  6.72
Tropical Storm               697               9.79     1.73   91.03     66.00     383.00              1.38             7.71            0.69                  8.41
Rip Current                  777               8.31     1.47   92.50    577.00     529.00              8.31             0.00            0.00                  0.00
Heavy Rain                11,949               5.77     1.02   93.52    103.00     306.00              1.76             3.21            0.79                  4.01
Extreme Cold/Wind Chill    1,890               5.76     1.02   94.54    304.00     260.00              4.36             0.08            1.33                  1.41
Heavy Snow                17,611               4.58     0.81   95.35    148.00   1,122.00              3.43             1.01            0.13                  1.14
Avalanche                    387               3.20     0.57   95.92    225.00     170.00              3.19             0.00            0.00                  0.00

Geographical and Temporal Distribution

While the aggregate data provides a national overview, extreme weather impact is highly dependent on regional climate and infrastructure. The heatmap below illustrates how the Total Burden is distributed across time and US regions, revealing specific patterns of vulnerability.

impact_heatmap
Heatmap of Weather Impact


A breakdown of the Total Economic Burden by region confirms that the South is the most severely affected area in the United States, accounting for over $225 billion in total damages. This is primarily driven by its unique vulnerability to both high-fatality events (Tornadoes, Heat) and high-property-damage events (Hurricanes).

The regional distribution is as follows:

  • South: $226.15 B

  • West: $26.19 B

  • Midwest: $60.01 B

  • Northeast: $15.32 B
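
The code behind these regional figures is not shown in this report. As a hedged illustration only, a regional breakdown of this kind amounts to a grouped sum, sketched here in base R on made-up values; `toy` and `by_region` are hypothetical names, and in the analysis the same idea would be applied to the `region` column of `tidy_storms`.

```r
# Made-up per-event losses in USD, for illustration only
toy <- data.frame(
  region = c("South", "South", "West", "Midwest"),
  loss   = c(2e9, 5e8, 1e9, 3e8)
)

# Sum by region, convert to billions, and sort from largest to smallest
by_region <- aggregate(loss ~ region, data = toy, FUN = sum)
by_region$loss_B <- by_region$loss / 1e9
by_region <- by_region[order(-by_region$loss_B), ]
print(by_region)
```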

Discussion and Conclusions

The implementation of the Total Economic Burden metric allows for a comprehensive comparison of events with fundamentally different profiles. While traditional metrics might focus solely on casualties or infrastructure repair costs, this integrated approach reveals the true scale of weather-related disasters.

Concentration of Risk

A key finding is that only five event types (Hurricane/Typhoon, Tornado, Flood, Excessive Heat, and Storm Surge/Tide) account for more than 60% of the total national burden (60.24% cumulatively in Table 1). This suggests that federal and regional mitigation strategies should prioritize these high-impact categories to achieve the greatest reduction in overall risk.
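
The concentration figure can be reproduced from the "% Total" column of Table 1. A minimal base R check, with the five shares copied from the table and `pct` as a name chosen for this illustration:

```r
# Shares of the total burden (in %) for the five leading event types in Table 1
pct <- c("Hurricane/Typhoon" = 16.68, "Tornado" = 14.32, "Flood" = 10.96,
         "Excessive Heat" = 9.73, "Storm Surge/Tide" = 8.55)

# The running total matches the "Cum %" column
cum_pct <- cumsum(pct)
print(cum_pct)
```

The running total passes 60% only once Storm Surge/Tide is included, which is why the statement refers to five event types rather than four.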

Metric Sensitivity and Balance

The VSL-based approach proved to be well-proportioned for this dataset. For instance, the Health Loss from Tornadoes ($52.63B) and from Excessive Heat ($54.15B) are each on a scale comparable to the Economic Loss from Hurricanes ($90.87B). This balance ensures that neither the human toll nor the financial destruction is overlooked in the final ranking.

Geographical Specialization

The analysis suggests that “one size fits all” disaster policies are unlikely to be effective. The South bears a disproportionate share of the national burden, largely due to the extreme costs associated with Hurricanes and Storm Surges. Meanwhile, the West faces a distinct set of challenges, often tied to wildfire and hydrological volatility.

Limitations

It is important to acknowledge that this model does not account for inflation (the value of the USD has changed significantly since 1993) or the long-term psychological and sociological impacts of these disasters. Furthermore, the 10% valuation for injuries is a conservative estimate that may vary depending on the severity of the medical cases.

In conclusion, while kinetic events like Tornadoes and Hurricanes dominate the public eye due to their visual destruction, thermal extremes (Excessive Heat) represent an equally costly and lethal threat that requires sustained public health intervention.