Synopsis

This analysis examines the NOAA Storm Database to identify which types of weather events are most harmful to population health and which have the greatest economic consequences in the United States. Population health impact was measured as the total number of fatalities and injuries, aggregated by event type. Economic consequences were measured as the total property and crop damage, expressed in millions of U.S. dollars. Tornadoes are by far the most harmful event type: their combined fatalities and injuries exceed those of the next category by more than a factor of ten. Excessive heat and thunderstorm wind also contribute substantially to health impacts, though at much lower levels than tornadoes. In contrast, floods generate the greatest total economic losses, followed by hurricanes/typhoons and tornadoes, with storm surge and hail also among the top contributors to financial damage. Overall, the events that pose the greatest threat to human life are not always the same as those that cause the largest economic losses.

Data Processing

Load packages

library(ggplot2)
library(dplyr)

Read the raw data

# read.csv() decompresses the .bz2 file transparently; sep = "," is already the default
storm_raw <- read.csv("repdata_data_StormData.csv.bz2")

Variable selection

For population health impacts, the variables FATALITIES and INJURIES were used. For economic consequences, the variables PROPDMG and CROPDMG, along with their corresponding exponent variables PROPDMGEXP and CROPDMGEXP, were selected. The variable EVTYPE was retained to classify and group events by type. All other variables were excluded to focus the analysis strictly on measures relevant to the assignment questions.

storm_selected <- storm_raw %>%
  select(
    EVTYPE,
    FATALITIES,
    INJURIES,
    PROPDMG,
    PROPDMGEXP,
    CROPDMG,
    CROPDMGEXP
  )

# Check structure
str(storm_selected)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...

Set the unit

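Damage amounts are stored as a base value (PROPDMG, CROPDMG) plus a separate magnitude code (PROPDMGEXP, CROPDMGEXP). Each code is converted to a numeric multiplier, ignoring case: H = hundreds (10^2), K = thousands (10^3), M = millions (10^6), B = billions (10^9), and a single digit d = 10^d. Any other code, including a blank, is treated as a multiplier of 1. The resulting dollar amounts are then expressed in millions of U.S. dollars.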

exp_to_multiplier <- function(x) {
  x <- toupper(x)
  # case_when() evaluates every right-hand side on the whole vector, so calling
  # as.numeric() inside it warns on non-digit codes ("K", "", ...). Convert once,
  # silently; the resulting NAs are never used because grepl() guards that branch.
  digits <- suppressWarnings(as.numeric(x))

  case_when(
    x == "H" ~ 1e2,
    x == "K" ~ 1e3,
    x == "M" ~ 1e6,
    x == "B" ~ 1e9,
    grepl("^[0-9]$", x) ~ 10^digits,
    TRUE ~ 1  # blank or unrecognized code: no scaling
  )
}

storm_selected <- storm_selected %>%
  mutate(
    prop_multiplier = exp_to_multiplier(PROPDMGEXP),
    crop_multiplier = exp_to_multiplier(CROPDMGEXP)
  )
storm_selected <- storm_selected %>%
  mutate(
    property_damage = PROPDMG * prop_multiplier,
    crop_damage     = CROPDMG * crop_multiplier,
    total_damage    = property_damage + crop_damage
  )

# Express damage in millions of U.S. dollars
storm_selected <- storm_selected %>%
  mutate(
    property_damage_M = property_damage / 1e6,
    crop_damage_M     = crop_damage / 1e6,
    total_damage_M    = total_damage / 1e6
  )
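As a quick check of the conversion logic, the base-R sketch below applies the same exponent rules as exp_to_multiplier() to five hypothetical toy records rather than to the NOAA data. It swaps dplyr::case_when() for a named lookup vector so it runs without extra packages; to_multiplier(), toy_base and toy_code are illustrative names, not part of the analysis.

```r
# Hypothetical toy records: base damage values and their magnitude codes
toy_base <- c(25, 2.5, 1.5, 3, 10)
toy_code <- c("K", "k", "m", "", "5")

# Base-R variant of exp_to_multiplier(): H/K/M/B via a lookup table,
# a single digit d as 10^d, and every other code (e.g. blank) as 1
to_multiplier <- function(code) {
  code <- toupper(code)
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])            # NA for codes not in the table
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult[is.na(mult)] <- 1                  # unrecognized codes: no scaling
  mult
}

# Dollar amounts, then millions of USD
toy_damage_M <- toy_base * to_multiplier(toy_code) / 1e6
print(toy_damage_M)
```

For instance, a PROPDMG value of 25 with code "K" becomes $25,000, or 0.025 million USD, and a value of 10 with the digit code "5" becomes 10 × 10^5 = $1,000,000, or 1 million USD.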

Results

Population health

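Tornadoes are the most harmful weather event in terms of fatalities and injuries: they account for 96,979 combined (5,633 fatalities and 91,346 injuries), more than ten times the total for the next event type, excessive heat (8,428).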

health_by_event <- storm_selected %>%
  mutate(total_harm = FATALITIES + INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries   = sum(INJURIES, na.rm = TRUE),
    total_harm       = sum(total_harm, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_harm))

head(health_by_event, 10)
## # A tibble: 10 × 4
##    EVTYPE            total_fatalities total_injuries total_harm
##    <chr>                        <dbl>          <dbl>      <dbl>
##  1 TORNADO                       5633          91346      96979
##  2 EXCESSIVE HEAT                1903           6525       8428
##  3 TSTM WIND                      504           6957       7461
##  4 FLOOD                          470           6789       7259
##  5 LIGHTNING                      816           5230       6046
##  6 HEAT                           937           2100       3037
##  7 FLASH FLOOD                    978           1777       2755
##  8 ICE STORM                       89           1975       2064
##  9 THUNDERSTORM WIND              133           1488       1621
## 10 WINTER STORM                   206           1321       1527
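The table groups EVTYPE exactly as recorded, so TSTM WIND (rank 3) and THUNDERSTORM WIND (rank 9) appear as separate rows even though "TSTM" is an abbreviation for thunderstorm; together they account for 9,082 fatalities and injuries (7,461 + 1,621). The sketch below shows one possible label clean-up on hypothetical toy data. normalize_evtype() and its single mapping rule are assumptions for illustration only; they were not applied in this analysis, and a complete clean-up would need a larger mapping table.

```r
# Hypothetical helper: harmonize a few EVTYPE spellings before grouping.
# Illustrative only; NOT applied in the analysis above.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                         # case and outer spaces
  x <- gsub("\\s+", " ", x)                       # collapse repeated spaces
  x <- sub("^TSTM WIND", "THUNDERSTORM WIND", x)  # expand the TSTM abbreviation
  x
}

# Hypothetical toy labels and harm counts
toy_evtype <- c("TSTM WIND", " Thunderstorm Wind", "THUNDERSTORM  WIND", "TORNADO")
toy_harm   <- c(10, 5, 2, 40)

# Sum harm by normalized label (base R, no dplyr needed)
harm_by_label <- tapply(toy_harm, normalize_evtype(toy_evtype), sum)
print(harm_by_label)
```

All three thunderstorm-wind spellings collapse into one group, so their counts are summed rather than split across rows.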
# Plot
top10_health <- health_by_event %>%
  slice_head(n = 10)

ggplot(top10_health, 
       aes(x = reorder(EVTYPE, total_harm), y = total_harm)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Weather Events (Fatalities + Injuries)",
    x = "Event Type",
    y = "Total Harm (Fatalities + Injuries)"
  )
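The claim that tornadoes far exceed all other event types can be quantified directly from the printed table. The sketch below hard-codes the displayed top-10 totals (top10_harm and the ratio names are illustrative) and computes how many times larger the tornado total is than the runner-up and than the other nine event types combined.

```r
# Combined fatalities + injuries, copied from the top-10 table above
top10_harm <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

tornado    <- top10_harm[["TORNADO"]]
runner_up  <- top10_harm[["EXCESSIVE HEAT"]]
other_nine <- sum(top10_harm[names(top10_harm) != "TORNADO"])

# How many times larger the tornado total is than the runner-up
# and than the remaining nine event types combined
ratio_vs_runner_up  <- tornado / runner_up
ratio_vs_other_nine <- tornado / other_nine

print(round(c(runner_up = ratio_vs_runner_up, other_nine = ratio_vs_other_nine), 1))
```

With these figures, the tornado total is roughly 11.5 times that of excessive heat and about 2.4 times the other nine top-ranked event types combined.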

Economic consequences

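Floods generate the largest total economic losses (about $150.3 billion in combined property and crop damage), followed by hurricanes/typhoons (about $71.9 billion) and tornadoes (about $57.4 billion).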

econ_by_event <- storm_selected %>%
  group_by(EVTYPE) %>%
  summarise(
    total_damage_M = sum(total_damage_M, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_damage_M))

head(econ_by_event, 10)
## # A tibble: 10 × 2
##    EVTYPE            total_damage_M
##    <chr>                      <dbl>
##  1 FLOOD                    150320.
##  2 HURRICANE/TYPHOON         71914.
##  3 TORNADO                   57362.
##  4 STORM SURGE               43324.
##  5 HAIL                      18761.
##  6 FLASH FLOOD               18244.
##  7 DROUGHT                   15019.
##  8 HURRICANE                 14610.
##  9 RIVER FLOOD               10148.
## 10 ICE STORM                  8967.
# Plot
top10_econ <- econ_by_event %>%
  slice_head(n = 10)

ggplot(top10_econ, aes(x = reorder(EVTYPE, total_damage_M), 
                       y = total_damage_M)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Weather Events by Total Economic Damage",
    x = "Event Type",
    y = "Total Damage (Millions of USD)"
  )
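The economic ranking can be put in proportion with a few lines of arithmetic on the printed table. The sketch below hard-codes the displayed top-10 totals (approximate, since the table hides decimal places; top10_damage_M and the other object names are illustrative) and compares flood losses with the runner-up and with the top-10 total.

```r
# Total damage in millions of USD, copied from the top-10 table above
# (approximate values as displayed)
top10_damage_M <- c(
  "FLOOD" = 150320, "HURRICANE/TYPHOON" = 71914, "TORNADO" = 57362,
  "STORM SURGE" = 43324, "HAIL" = 18761, "FLASH FLOOD" = 18244,
  "DROUGHT" = 15019, "HURRICANE" = 14610, "RIVER FLOOD" = 10148,
  "ICE STORM" = 8967
)

flood_M <- top10_damage_M[["FLOOD"]]

# Flood losses relative to the second-ranked event type
flood_vs_runner_up <- flood_M / top10_damage_M[["HURRICANE/TYPHOON"]]

# Flood's share of the combined top-10 damage
flood_share <- flood_M / sum(top10_damage_M)

# Millions to billions for a headline figure
flood_B <- flood_M / 1000

print(round(c(ratio = flood_vs_runner_up, share = flood_share, billions = flood_B), 2))
```

On these figures, flood damage (about $150.3 billion) is roughly 2.1 times the hurricane/typhoon total and about 37% of the combined top-10 damage. The FLOOD row alone does not include the separately listed FLASH FLOOD and RIVER FLOOD rows.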