Synopsis

This analysis examines the NOAA Storm Database to identify which types of severe weather events are most harmful to public health and which cause the greatest economic damage across the United States. The raw dataset contains over 900,000 weather event records spanning several decades, and its event types are coded inconsistently, so they must be normalized into major weather categories. We convert damage values from the reported magnitude-and-exponent format into dollar amounts, then aggregate health impacts (fatalities and injuries) and economic losses by event type. Tornadoes are the leading cause of deaths and injuries from severe weather, while floods cause the greatest economic damage when property and crop losses are combined. These findings should inform resource allocation and preparedness priorities for government and municipal managers.

Data Processing

# Load the raw NOAA Storm Database CSV file
storms <- read.csv("repdata_data_StormData.csv", stringsAsFactors = FALSE)

cat("Dataset dimensions:", nrow(storms), "rows and", ncol(storms), "columns\n")
## Dataset dimensions: 902297 rows and 37 columns
cat("Column names:\n")
## Column names:
print(names(storms))
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
cat("\nFirst few rows:\n")
## 
## First few rows:
head(storms)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
# Subset to only the columns we need for the analysis
storms <- storms[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
cat("After subsetting, remaining columns:\n")
## After subsetting, remaining columns:
print(names(storms))
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
## [6] "CROPDMG"    "CROPDMGEXP"
# The PROPDMGEXP and CROPDMGEXP columns contain multiplier codes
# that indicate the magnitude of the damage values.
# We need to decode these to get actual dollar amounts.

# Create a function to convert exponent codes to actual multipliers
exp_to_multiplier <- function(exp_code) {
  exp_code <- toupper(exp_code)

  # Handle K (thousands), M (millions), B (billions), H (hundreds)
  if (exp_code == "K") return(1e3)
  if (exp_code == "M") return(1e6)
  if (exp_code == "B") return(1e9)
  if (exp_code == "H") return(1e2)

  # Handle numeric digits 0-8 (representing 10^digit)
  if (exp_code %in% as.character(0:8)) {
    return(10^as.numeric(exp_code))
  }

  # Blank, "-", "+", "?" carry no usable magnitude, so the damage is counted as zero
  if (exp_code %in% c("", "-", "+", "?")) return(0)

  # For any other codes, treat as zero
  return(0)
}

# Apply the multiplier to each record
storms$PROPDMGTOTAL <- storms$PROPDMG * sapply(storms$PROPDMGEXP, exp_to_multiplier)
storms$CROPDMGTOTAL <- storms$CROPDMG * sapply(storms$CROPDMGEXP, exp_to_multiplier)
storms$ECONDMGTOTAL <- storms$PROPDMGTOTAL + storms$CROPDMGTOTAL

cat("Sample of decoded damage values:\n")
## Sample of decoded damage values:
print(head(storms[, c("PROPDMG", "PROPDMGEXP", "PROPDMGTOTAL", "CROPDMG", "CROPDMGEXP", "CROPDMGTOTAL", "ECONDMGTOTAL")]))
##   PROPDMG PROPDMGEXP PROPDMGTOTAL CROPDMG CROPDMGEXP CROPDMGTOTAL ECONDMGTOTAL
## 1    25.0          K        25000       0                       0        25000
## 2     2.5          K         2500       0                       0         2500
## 3    25.0          K        25000       0                       0        25000
## 4     2.5          K         2500       0                       0         2500
## 5     2.5          K         2500       0                       0         2500
## 6     2.5          K         2500       0                       0         2500
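
The decoding step above calls exp_to_multiplier() once per row through sapply(). As a hedged alternative, the same mapping can be sketched as a vectorized lookup. The names exp_lookup and decode_exp are hypothetical and not part of the original analysis; the sketch assumes the same rules as the function above (letters H, K, M, B, digits 0-8, everything else zero).

```r
# Lookup table mirroring exp_to_multiplier(): letter codes, then digits 0-8.
# (exp_lookup and decode_exp are illustrative names, not from the original report.)
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                setNames(10^(0:8), as.character(0:8)))

decode_exp <- function(exp_codes) {
  # match() returns NA for any code that is not in the table
  idx <- match(toupper(exp_codes), names(exp_lookup))
  mult <- unname(exp_lookup[idx])
  # Blank, "-", "+", "?" and any other unknown code count as zero
  mult[is.na(mult)] <- 0
  mult
}

print(decode_exp(c("K", "m", "", "5", "?")))
```

Under these assumptions, storms$PROPDMG * decode_exp(storms$PROPDMGEXP) should give the same totals as the sapply() version while avoiding a per-row function call.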
# The EVTYPE column has many inconsistencies and variations
# (e.g., "TSTM WIND", "THUNDERSTORM WINDS", "THUNDERSTORM WIND")
# We normalize these into major weather event categories.
# This chunk is cached because the regex matching is computationally expensive on 900k+ rows.

# Convert all event types to uppercase once
ev_upper <- toupper(trimws(storms$EVTYPE))

# Define patterns in priority order (most-specific first). Each pattern is
# only applied to rows not yet classified by an earlier, higher-priority
# pattern, so later patterns (e.g. "WIND") don't need to exclude categories
# (e.g. Thunderstorm Wind, Tornado) already claimed above them.
patterns <- list(
  c("Tornado",            "TORNADO|FUNNEL|WATERSPOUT"),
  c("Thunderstorm Wind",   "THUNDERSTORM|TSTM"),
  c("Flash Flood",         "FLASH.*FLOOD"),
  c("Flood",               "^FLOOD"),
  c("Hail",                "HAIL"),
  c("Winter Storm",        "BLIZZARD|SNOW|WINTER|ICE|SLEET|FREEZING RAIN"),
  c("Hurricane (Typhoon)", "HURRICANE|TYPHOON"),
  c("Heat",                "HEAT|HOT"),
  c("Lightning",           "LIGHTNING|LIGNTNING"),
  c("Wildfire",            "WILDFIRE|WILD FIRE|FIRE"),
  c("High Wind",           "HIGH WIND|HIGH.*WIND|WIND"),
  c("Cold",                "COLD|HYPOTHERMIA|FROST|FREEZE"),
  c("Rip Current",         "RIP CURRENT"),
  c("Dense Fog",           "^FOG"),
  c("Dust Storm",          "DUST|DUSTORM"),
  c("Drought",             "DROUGHT"),
  c("Heavy Rain",          "RAIN|PRECIPITATION")
)

# Start every row as "Other", then overwrite in priority order
storms$EVTYPE_CLEAN <- "Other"
unclassified <- rep(TRUE, nrow(storms))

for (p in patterns) {
  label <- p[1]
  pattern <- p[2]
  matches <- unclassified & grepl(pattern, ev_upper)
  storms$EVTYPE_CLEAN[matches] <- label
  unclassified[matches] <- FALSE
}

cat("Unique cleaned event types:\n")
## Unique cleaned event types:
print(sort(unique(storms$EVTYPE_CLEAN)))
##  [1] "Cold"                "Dense Fog"           "Drought"            
##  [4] "Dust Storm"          "Flash Flood"         "Flood"              
##  [7] "Hail"                "Heat"                "Heavy Rain"         
## [10] "High Wind"           "Hurricane (Typhoon)" "Lightning"          
## [13] "Other"               "Rip Current"         "Thunderstorm Wind"  
## [16] "Tornado"             "Wildfire"            "Winter Storm"
cat("\nEvent type distribution:\n")
## 
## Event type distribution:
print(head(sort(table(storms$EVTYPE_CLEAN), decreasing = TRUE), 20))
## 
##   Thunderstorm Wind                Hail             Tornado         Flash Flood 
##              336804              289282               71537               55668 
##        Winter Storm           High Wind               Flood           Lightning 
##               42523               28137               25469               15765 
##          Heavy Rain               Other            Wildfire                Heat 
##               11943               11173                4239                2666 
##             Drought                Cold         Rip Current          Dust Storm 
##                2488                2403                 777                 586 
##           Dense Fog Hurricane (Typhoon) 
##                 538                 299
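
To make the priority-order behavior of the loop above easier to see, here is a minimal sketch on a handful of made-up labels. The function classify_events, the shortened toy_patterns list, and the toy labels are assumptions for illustration only; the patterns are a subset of the ones defined above.

```r
# A shortened, hypothetical pattern list in priority order (most specific first)
toy_patterns <- list(
  c("Thunderstorm Wind", "THUNDERSTORM|TSTM"),
  c("Flash Flood",       "FLASH.*FLOOD"),
  c("Flood",             "^FLOOD"),
  c("High Wind",         "WIND")
)

classify_events <- function(evtype, patterns) {
  ev <- toupper(trimws(evtype))
  out <- rep("Other", length(ev))
  unclassified <- rep(TRUE, length(ev))
  for (p in patterns) {
    # Only rows not claimed by an earlier pattern can be labeled here
    hit <- unclassified & grepl(p[2], ev)
    out[hit] <- p[1]
    unclassified[hit] <- FALSE
  }
  out
}

toy_labels <- c("tstm wind", "THUNDERSTORM WINDS", "FLASH FLOODING",
                "FLOOD", "RIVER FLOOD", "HIGH WIND")
toy_result <- classify_events(toy_labels, toy_patterns)
print(data.frame(raw = toy_labels, clean = toy_result))
```

Two things are worth noticing in this sketch: "THUNDERSTORM WINDS" is claimed by the thunderstorm pattern before the generic "WIND" pattern can see it, and "RIVER FLOOD" falls through to "Other" because the "^FLOOD" pattern is anchored to the start of the label.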
# Aggregate health impacts by event type
health_impact <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_CLEAN,
                           data = storms,
                           FUN = sum)

health_impact$TOTAL_HEALTH_IMPACT <- health_impact$FATALITIES + health_impact$INJURIES

# Sort by total health impact
health_impact <- health_impact[order(health_impact$TOTAL_HEALTH_IMPACT, decreasing = TRUE), ]

cat("Top 10 Event Types by Public Health Impact:\n")
## Top 10 Event Types by Public Health Impact:
print(head(health_impact, 10))
##         EVTYPE_CLEAN FATALITIES INJURIES TOTAL_HEALTH_IMPACT
## 16           Tornado       5664    91439               97103
## 8               Heat       3138     9224               12362
## 15 Thunderstorm Wind        729     9544               10273
## 6              Flood        478     6791                7269
## 18      Winter Storm        654     6037                6691
## 12         Lightning        817     5231                6048
## 5        Flash Flood       1035     1802                2837
## 10         High Wind        690     1936                2626
## 13             Other        712     1899                2611
## 17          Wildfire         90     1608                1698
# Aggregate economic damage by event type
economic_impact <- aggregate(cbind(PROPDMGTOTAL, CROPDMGTOTAL) ~ EVTYPE_CLEAN,
                             data = storms,
                             FUN = sum)

economic_impact$TOTAL_ECONOMIC_DAMAGE <- economic_impact$PROPDMGTOTAL + economic_impact$CROPDMGTOTAL

# Convert to billions for readability in the table
economic_impact$TOTAL_ECONOMIC_DAMAGE_BILLIONS <- economic_impact$TOTAL_ECONOMIC_DAMAGE / 1e9

# Sort by total economic damage
economic_impact <- economic_impact[order(economic_impact$TOTAL_ECONOMIC_DAMAGE, decreasing = TRUE), ]

cat("Top 10 Event Types by Economic Impact (in billions of dollars):\n")
## Top 10 Event Types by Economic Impact (in billions of dollars):
print(head(economic_impact[, c("EVTYPE_CLEAN", "PROPDMGTOTAL", "CROPDMGTOTAL", "TOTAL_ECONOMIC_DAMAGE_BILLIONS")], 10))
##           EVTYPE_CLEAN PROPDMGTOTAL CROPDMGTOTAL TOTAL_ECONOMIC_DAMAGE_BILLIONS
## 6                Flood 144784060800   5783673950                      150.56773
## 11 Hurricane (Typhoon)  85356410010   5516117800                       90.87253
## 13               Other  62168695560   5931172050                       68.09987
## 16             Tornado  58613082164    417461520                       59.03054
## 5          Flash Flood  17588306879   1532197150                       19.12050
## 7                 Hail  15977564456   3046887620                       19.02445
## 18        Winter Storm  12445347801   5321301400                       17.76665
## 3              Drought   1046106000  13972566000                       15.01867
## 15   Thunderstorm Wind  11184748473   1271708980                       12.45646
## 17            Wildfire   8496628500    403281630                        8.89991
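
Because the catch-all Other category appears in both top-10 tables, it can be worth checking which raw EVTYPE labels end up there. The following is a minimal sketch on a toy data frame; the rows and dollar values are made up for illustration, and only the column names match the storms data frame used above.

```r
# Toy stand-in for the storms data frame; values are invented for illustration
toy_storms <- data.frame(
  EVTYPE       = c("STORM SURGE", "RIVER FLOOD", "STORM SURGE", "LANDSLIDE"),
  EVTYPE_CLEAN = "Other",
  ECONDMGTOTAL = c(5e6, 2e6, 1e6, 5e5),
  stringsAsFactors = FALSE
)

# Keep only the catch-all rows and total the damage per raw label
other_rows <- toy_storms[toy_storms$EVTYPE_CLEAN == "Other", ]
other_by_label <- aggregate(ECONDMGTOTAL ~ EVTYPE, data = other_rows, FUN = sum)

# Largest contributors first
other_by_label <- other_by_label[order(other_by_label$ECONDMGTOTAL, decreasing = TRUE), ]
print(other_by_label)
```

Running the same three steps on the real storms data frame would show which raw labels might deserve their own pattern in the classification list.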

Results

Public Health Impact

# Display top 10 event types by health impact
health_top10 <- head(health_impact, 10)

cat("Table 1: Top 10 Event Types by Total Fatalities and Injuries\n")
## Table 1: Top 10 Event Types by Total Fatalities and Injuries
cat("=========================================================\n\n")
## =========================================================
# Create a nicely formatted table
results_health <- data.frame(
  "Event Type" = health_top10$EVTYPE_CLEAN,
  "Fatalities" = health_top10$FATALITIES,
  "Injuries" = health_top10$INJURIES,
  "Total Deaths and Injuries" = health_top10$TOTAL_HEALTH_IMPACT,
  check.names = FALSE
)

print(results_health)
##           Event Type Fatalities Injuries Total Deaths and Injuries
## 1            Tornado       5664    91439                     97103
## 2               Heat       3138     9224                     12362
## 3  Thunderstorm Wind        729     9544                     10273
## 4              Flood        478     6791                      7269
## 5       Winter Storm        654     6037                      6691
## 6          Lightning        817     5231                      6048
## 7        Flash Flood       1035     1802                      2837
## 8          High Wind        690     1936                      2626
## 9              Other        712     1899                      2611
## 10          Wildfire         90     1608                      1698
cat("\n")
cat("Tornadoes are by far the most significant threat to public health, accounting for ",
    health_top10$TOTAL_HEALTH_IMPACT[1], " deaths and injuries combined. This is substantially ",
    "more than the second-leading cause (", health_top10$EVTYPE_CLEAN[2], ", with ",
    health_top10$TOTAL_HEALTH_IMPACT[2], " deaths and injuries). Heat is also the second-leading ",
    "cause of fatalities, with ", health_top10$FATALITIES[2], " recorded deaths.\n", sep = "")
## Tornadoes are by far the most significant threat to public health, accounting for 97103 deaths and injuries combined. This is substantially more than the second-leading cause (Heat, with 12362 deaths and injuries). Heat is also the second-leading cause of fatalities, with 3138 recorded deaths.
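
As a quick arithmetic check on the gap between the top two rows of Table 1, the totals can be compared directly. The numbers below are copied from the table; the variable names are illustrative only.

```r
# Totals (deaths + injuries) copied from Table 1, in table order
top10_totals <- c(Tornado = 97103, Heat = 12362, `Thunderstorm Wind` = 10273,
                  Flood = 7269, `Winter Storm` = 6691, Lightning = 6048,
                  `Flash Flood` = 2837, `High Wind` = 2626, Other = 2611,
                  Wildfire = 1698)

# How many times larger is the tornado total than the heat total?
tornado_vs_heat <- top10_totals[["Tornado"]] / top10_totals[["Heat"]]

# Tornado share of the combined top-10 total
tornado_share <- top10_totals[["Tornado"]] / sum(top10_totals)

cat("Tornado / Heat ratio:", round(tornado_vs_heat, 1), "\n")
cat("Tornado share of top-10 total (%):", round(100 * tornado_share, 1), "\n")
```

The ratio comes out a little under eight, so the tornado total is several times larger than that of any other category in the table.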

Economic Impact

# Display top 10 event types by economic impact
econ_top10 <- head(economic_impact, 10)

cat("Table 2: Top 10 Event Types by Total Economic Damage\n")
## Table 2: Top 10 Event Types by Total Economic Damage
cat("====================================================\n\n")
## ====================================================
# Create a nicely formatted table (in billions)
results_econ <- data.frame(
  "Event Type" = econ_top10$EVTYPE_CLEAN,
  "Property Damage ($B)" = round(econ_top10$PROPDMGTOTAL / 1e9, 2),
  "Crop Damage ($B)" = round(econ_top10$CROPDMGTOTAL / 1e9, 2),
  "Total Damage ($B)" = round(econ_top10$TOTAL_ECONOMIC_DAMAGE / 1e9, 2),
  check.names = FALSE
)

print(results_econ)
##             Event Type Property Damage ($B) Crop Damage ($B) Total Damage ($B)
## 1                Flood               144.78             5.78            150.57
## 2  Hurricane (Typhoon)                85.36             5.52             90.87
## 3                Other                62.17             5.93             68.10
## 4              Tornado                58.61             0.42             59.03
## 5          Flash Flood                17.59             1.53             19.12
## 6                 Hail                15.98             3.05             19.02
## 7         Winter Storm                12.45             5.32             17.77
## 8              Drought                 1.05            13.97             15.02
## 9    Thunderstorm Wind                11.18             1.27             12.46
## 10            Wildfire                 8.50             0.40              8.90
cat("\n")
cat("Floods represent the costliest weather events, causing approximately",
    round(econ_top10$TOTAL_ECONOMIC_DAMAGE[1] / 1e9, 1), "billion dollars in combined",
    "property and crop damage. Hurricanes are the second-costliest event type with",
    round(econ_top10$TOTAL_ECONOMIC_DAMAGE[2] / 1e9, 1), "billion dollars in damage.",
    "Tornadoes, while the deadliest in terms of lives lost, rank fourth in economic impact,",
    "behind the catch-all Other category.\n")
## Floods represent the costliest weather events, causing approximately 150.6 billion dollars in combined property and crop damage. Hurricanes are the second-costliest event type with 90.9 billion dollars in damage. Tornadoes, while the deadliest in terms of lives lost, rank fourth in economic impact, behind the catch-all Other category.
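
The split between property and crop losses differs sharply across event types, which a short calculation on two rows of the economic results makes concrete. The dollar figures below are copied from the aggregated output earlier in the report; the data frame name damage_split is hypothetical.

```r
# Property and crop totals (in dollars) copied from the aggregated output
damage_split <- data.frame(
  event    = c("Flood", "Drought"),
  property = c(144784060800, 1046106000),
  crop     = c(5783673950, 13972566000)
)

# Share of each event type's total damage that comes from property losses
damage_split$property_share <- damage_split$property /
  (damage_split$property + damage_split$crop)

print(data.frame(event = damage_split$event,
                 property_share_pct = round(100 * damage_split$property_share, 1)))
```

In this calculation, property losses make up roughly 96% of flood damage, whereas drought damage is dominated by crop losses.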

Comparative Visualization

# Create a two-panel plot comparing health and economic impacts

par(mfrow = c(1, 2), mar = c(10, 6, 3, 1))

# Panel 1: Top 10 by health impact (re-sort the top-10 ascending so bars run smallest-to-largest)
health_top10_sorted <- health_top10[order(health_top10$TOTAL_HEALTH_IMPACT), ]
barplot(health_top10_sorted$TOTAL_HEALTH_IMPACT,
        names.arg = health_top10_sorted$EVTYPE_CLEAN,
        horiz = FALSE,
        las = 2,
        col = "steelblue",
        main = "Top 10 Event Types by Health Impact\n(Deaths + Injuries)",
        ylab = "Total Deaths and Injuries",
        ylim = c(0, max(health_top10_sorted$TOTAL_HEALTH_IMPACT) * 1.1),
        cex.names = 0.8)

# Panel 2: Top 10 by economic impact (re-sort the top-10 ascending so bars run smallest-to-largest)
econ_top10_sorted <- econ_top10[order(econ_top10$TOTAL_ECONOMIC_DAMAGE), ]
barplot(econ_top10_sorted$TOTAL_ECONOMIC_DAMAGE / 1e9,
        names.arg = econ_top10_sorted$EVTYPE_CLEAN,
        horiz = FALSE,
        las = 2,
        col = "darkgreen",
        main = "Top 10 Event Types by Economic Impact\n(Property + Crop Damage)",
        ylab = "Total Damage (Billions of Dollars)",
        ylim = c(0, max(econ_top10_sorted$TOTAL_ECONOMIC_DAMAGE / 1e9) * 1.1),
        cex.names = 0.8)

par(mfrow = c(1, 1))

cat("Figure 1 shows the stark differences in which event types pose the greatest threat",
    "to public health versus the economy. While tornadoes dominate the health impact rankings,",
    "floods cause substantially greater economic damage. This distinction is important for",
    "resource allocation: tornado preparedness and warning systems save lives, while flood",
    "mitigation infrastructure and property protection measures address economic impacts.\n")
## Figure 1 shows the stark differences in which event types pose the greatest threat to public health versus the economy. While tornadoes dominate the health impact rankings, floods cause substantially greater economic damage. This distinction is important for resource allocation: tornado preparedness and warning systems save lives, while flood mitigation infrastructure and property protection measures address economic impacts.

Summary

This analysis of more than 900,000 severe weather event records from the NOAA Storm Database reveals two distinct risk profiles:

Health Risks: Tornadoes are the overwhelming public health threat, causing 97,103 recorded deaths and injuries, nearly eight times the total for the second-place event type (heat, with 12,362). Thunderstorm winds and floods also cause significant health impacts.

Economic Risks: Floods represent the costliest weather phenomenon, responsible for approximately $150 billion in property and crop damage. Hurricanes are the second-costliest event type, followed by the catch-all "Other" category and then tornadoes. The economic impacts of flooding are driven primarily by property damage rather than crop losses.

These findings suggest that government and municipal planners should consider both types of impacts when prioritizing resources. Communities at high risk for tornadoes require robust early warning systems and emergency response infrastructure, while those vulnerable to flooding should prioritize structural protections and mitigation strategies. Hurricane-prone areas require comprehensive preparation addressing both mortality risks and catastrophic economic losses.