Synopsis

This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify the types of severe weather events most harmful to population health and those with the greatest economic consequences across the United States. The database spans weather events from 1950 to November 2011. After loading and cleaning the raw data, we aggregated fatalities and injuries by event type to assess public health impact, and computed total property and crop damage (adjusting for magnitude exponents) to assess economic impact. Tornadoes were by far the most harmful event type for population health, accounting for the highest combined fatalities and injuries. In terms of economic damage, floods caused the greatest total property and crop losses, followed by hurricanes/typhoons and tornadoes, with storm surges ranking fourth. These findings can help government and municipal managers prioritize resource allocation and emergency preparedness planning for severe weather events.


Data Processing

Load Required Libraries

library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)

Download and Load Data

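The data are distributed on the course website as a bzip2-compressed CSV file. The code below assumes the file has already been downloaded and decompressed to StormData.csv in the working directory, and reads it into R from there.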

storm_data <- read.csv('StormData.csv')
dim(storm_data)
## [1] 902297     37
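
As a side note, read.csv() can read a bzip2-compressed file directly, so decompressing by hand is optional. The sketch below demonstrates this on a throwaway file; the temporary file and the two toy columns are illustrative assumptions, not part of the original analysis.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data only).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
  con,
  row.names = FALSE
)
close(con)

# Read it back with a plain file name; the compression is detected
# automatically when the file is opened for reading.
toy <- read.csv(tmp)
dim(toy)
```

With the real data set, the same call would take the path of the compressed download instead of tmp.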

Initial Exploration

str(storm_data[, c("EVTYPE", "FATALITIES", "INJURIES",
                   "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...

Data Cleaning and Transformation

1. Standardize event type names by converting to uppercase and trimming whitespace to reduce duplicates caused by inconsistent formatting.

storm_data$EVTYPE <- trimws(toupper(storm_data$EVTYPE))
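
To illustrate why this normalization reduces duplicate categories, here is a small sketch on made-up labels. The vector below is hypothetical and only mimics the kind of inconsistency described above.

```r
# Hypothetical labels that differ only in case or surrounding whitespace.
raw_types <- c("Tornado", "TORNADO", " TORNADO ", "tstm wind", "TSTM WIND")

# Same transformation as applied to storm_data$EVTYPE.
clean_types <- trimws(toupper(raw_types))

# Number of distinct labels before and after cleaning.
n_before <- length(unique(raw_types))
n_after  <- length(unique(clean_types))
c(before = n_before, after = n_after)
```

Note that this step only merges labels that differ in case or padding; variants with different spellings remain separate categories.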

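2. Convert damage exponent columns (PROPDMGEXP, CROPDMGEXP) to numeric multipliers. The raw data uses characters such as H (hundreds), K (thousands), M (millions), and B (billions) to represent magnitudes. The function below also treats a single digit 0-9 as a power of ten and assigns a multiplier of 1 to any other value, including blanks.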

exp_to_numeric <- function(exp_code) {
  exp_code <- toupper(trimws(exp_code))
  # Parse digit codes once, outside case_when(). case_when() evaluates every
  # right-hand side for all rows, so calling as.numeric() on letter codes
  # inside it raises "NAs introduced by coercion" warnings.
  digit <- suppressWarnings(as.numeric(exp_code))
  case_when(
    exp_code == "H" ~ 1e2,
    exp_code == "K" ~ 1e3,
    exp_code == "M" ~ 1e6,
    exp_code == "B" ~ 1e9,
    exp_code %in% as.character(0:9) ~ 10^digit,
    TRUE ~ 1  # blank or unrecognized codes leave the value unscaled
  )
}

storm_data <- storm_data %>%
  mutate(
    PROP_MULT = exp_to_numeric(PROPDMGEXP),
    CROP_MULT = exp_to_numeric(CROPDMGEXP),
    PROP_DAMAGE = PROPDMG * PROP_MULT,
    CROP_DAMAGE = CROPDMG * CROP_MULT,
    TOTAL_DAMAGE = PROP_DAMAGE + CROP_DAMAGE
  )
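
The exponent mapping can be checked by hand on a few rows. The sketch below is a base R variant that uses a named lookup vector instead of case_when(); the toy data frame and the names mult_lookup and code are assumptions made for illustration, and the mapping rules are the ones described in step 2.

```r
# Letter codes and their multipliers, as described in step 2.
mult_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Toy rows (made-up values) covering a letter, a lowercase letter,
# a blank, and a digit code.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 5, 10),
  PROPDMGEXP = c("K", "m", "B", "", "3")
)

code <- toupper(trimws(toy$PROPDMGEXP))

# Look up letter codes; anything not in the lookup comes back as NA.
mult <- unname(mult_lookup[code])

# Digit codes are interpreted as powers of ten.
is_digit <- code %in% as.character(0:9)
mult[is_digit] <- 10^as.numeric(code[is_digit])

# Everything else (blank or unrecognized) is left unscaled.
mult[is.na(mult)] <- 1

toy$PROP_DAMAGE <- toy$PROPDMG * mult
toy
```

For example, the first row (25 with code K) becomes 25,000 dollars, while the row with a blank code keeps its raw value of 5.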

3. Subset to relevant columns for efficiency.

storm_clean <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROP_DAMAGE, CROP_DAMAGE, TOTAL_DAMAGE)

Aggregate by Event Type

Health impact: Sum fatalities and injuries per event type, then compute total harm.

health_impact <- storm_clean %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
    Total_Injuries   = sum(INJURIES,   na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(Total_Harm = Total_Fatalities + Total_Injuries) %>%
  arrange(desc(Total_Harm))

top_health <- head(health_impact, 10)
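
As a sanity check on the aggregation logic, the same sum-then-rank steps can be reproduced in base R on a toy data frame. This is a sketch with made-up values that uses aggregate() rather than the dplyr verbs in the report; the event counts are not from the database.

```r
# Toy records (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 0, 2, 3),
  INJURIES   = c(10, 4, 5, 0)
)

# Sum both casualty columns within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined harm, then rank from most to least harmful.
totals$Total_Harm <- totals$FATALITIES + totals$INJURIES
totals <- totals[order(-totals$Total_Harm), ]
totals
```

In this toy case TORNADO ranks first with 3 fatalities and 15 injuries, which is easy to confirm by eye and gives some confidence that the grouped sums behave as intended.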

Economic impact: Sum total damage per event type.

econ_impact <- storm_clean %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Property = sum(PROP_DAMAGE, na.rm = TRUE),
    Total_Crop     = sum(CROP_DAMAGE, na.rm = TRUE),
    Total_Damage   = sum(TOTAL_DAMAGE, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Total_Damage))

top_econ <- head(econ_impact, 10)

Results

Question 1: Which event types are most harmful to population health?

The figure below shows the top 10 weather event types ranked by total health impact (fatalities + injuries combined), with bars stacked to show the breakdown between fatalities and injuries.

top_health_long <- top_health %>%
  pivot_longer(cols = c(Total_Fatalities, Total_Injuries),
               names_to = "Type", values_to = "Count") %>%
  mutate(
    EVTYPE = factor(EVTYPE, levels = rev(top_health$EVTYPE)),
    Type   = recode(Type,
                    Total_Fatalities = "Fatalities",
                    Total_Injuries   = "Injuries")
  )

ggplot(top_health_long, aes(x = EVTYPE, y = Count, fill = Type)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  scale_fill_manual(values = c("Fatalities" = "#c0392b", "Injuries" = "#e67e22")) +
  labs(
    title    = "Top 10 Weather Events by Population Health Impact (1950–2011)",
    subtitle = "Combined fatalities and injuries across the United States",
    x        = "Event Type",
    y        = "Total Casualties",
    fill     = "Casualty Type"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold"),
    legend.position = "bottom"
  )

Figure 1: Top 10 weather event types by total population health impact (fatalities + injuries), 1950–2011. Tornadoes dominate all other event types by a wide margin.

Finding: Tornadoes are overwhelmingly the most harmful event type for population health, causing 96,979 combined fatalities and injuries (5,633 deaths and 91,346 injuries), more than eleven times the total for any other event type. Excessive heat (8,428) and thunderstorm winds (7,461) rank second and third.


Question 2: Which event types have the greatest economic consequences?

The figure below shows the top 10 weather event types ranked by total economic damage (property + crop damage combined), with bars stacked to show the breakdown.

top_econ_long <- top_econ %>%
  pivot_longer(cols = c(Total_Property, Total_Crop),
               names_to = "Type", values_to = "Damage") %>%
  mutate(
    EVTYPE = factor(EVTYPE, levels = rev(top_econ$EVTYPE)),
    Type   = recode(Type,
                    Total_Property = "Property Damage",
                    Total_Crop     = "Crop Damage")
  )

ggplot(top_econ_long, aes(x = EVTYPE, y = Damage / 1e9, fill = Type)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = dollar_format(suffix = "B")) +
  scale_fill_manual(values = c("Property Damage" = "#2980b9", "Crop Damage" = "#27ae60")) +
  labs(
    title    = "Top 10 Weather Events by Economic Damage (1950–2011)",
    subtitle = "Combined property and crop damage across the United States",
    x        = "Event Type",
    y        = "Total Damage (USD Billions)",
    fill     = "Damage Type"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold"),
    legend.position = "bottom"
  )

Figure 2: Top 10 weather event types by total economic damage (property + crop damage in USD), 1950–2011. Floods cause the greatest total economic losses.

Finding: Floods cause the greatest total economic damage, with over $150 billion in combined property and crop losses. Hurricanes/typhoons ($71.9 billion) and tornadoes ($57.4 billion) rank second and third, followed by storm surges ($43.3 billion). Drought is notable for its disproportionately high crop damage relative to property damage.


Summary Table: Top 5 Events by Health and Economic Impact

cat("=== Top 5 Events by Population Health Impact ===\n")
## === Top 5 Events by Population Health Impact ===
top_health %>%
  head(5) %>%
  select(EVTYPE, Total_Fatalities, Total_Injuries, Total_Harm) %>%
  knitr::kable(col.names = c("Event Type", "Fatalities", "Injuries", "Total Harm"),
               format.args = list(big.mark = ","))
|Event Type     | Fatalities| Injuries| Total Harm|
|:--------------|----------:|--------:|----------:|
|TORNADO        |      5,633|   91,346|     96,979|
|EXCESSIVE HEAT |      1,903|    6,525|      8,428|
|TSTM WIND      |        504|    6,957|      7,461|
|FLOOD          |        470|    6,789|      7,259|
|LIGHTNING      |        816|    5,230|      6,046|
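
As a quick arithmetic check on how far tornadoes lead, the Total Harm figures printed in the table above can be compared directly. The values below are copied by hand from that table, and the variable names are illustrative.

```r
# Total Harm values copied from the top-5 health table above.
total_harm <- c(
  "TORNADO"        = 96979,
  "EXCESSIVE HEAT" = 8428,
  "TSTM WIND"      = 7461,
  "FLOOD"          = 7259,
  "LIGHTNING"      = 6046
)

# How many times larger is the tornado total than the runner-up?
ratio_to_runner_up <- total_harm[["TORNADO"]] / total_harm[["EXCESSIVE HEAT"]]

# Tornado share of the combined harm across these five event types.
share_of_top5 <- total_harm[["TORNADO"]] / sum(total_harm)

round(ratio_to_runner_up, 1)
round(share_of_top5, 2)
```

The share is relative to the five listed event types only, not to all events in the database.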
cat("=== Top 5 Events by Economic Damage ===\n")
## === Top 5 Events by Economic Damage ===
top_econ %>%
  head(5) %>%
  mutate(across(c(Total_Property, Total_Crop, Total_Damage), ~ scales::dollar(., scale = 1e-9, suffix = "B"))) %>%
  select(EVTYPE, Total_Property, Total_Crop, Total_Damage) %>%
  knitr::kable(col.names = c("Event Type", "Property Damage", "Crop Damage", "Total Damage"))
|Event Type        |Property Damage |Crop Damage |Total Damage |
|:-----------------|:---------------|:-----------|:------------|
|FLOOD             |$144.66B        |$5.66B      |$150.32B     |
|HURRICANE/TYPHOON |$69.31B         |$2.61B      |$71.91B      |
|TORNADO           |$56.95B         |$0.41B      |$57.36B      |
|STORM SURGE       |$43.32B         |$0.00B      |$43.32B      |
|HAIL              |$15.74B         |$3.03B      |$18.76B      |