Synopsis

This report analyzes the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database to determine which severe weather events cause the greatest harm to public health and the economy. The database covers events recorded across the United States from 1950 to 2011. Population health impact was measured by summing fatalities and injuries separately for each event type. Economic impact was quantified by converting property and crop damage values to U.S. dollars using the exponent codes provided in the dataset (H = hundreds, K = thousands, M = millions, B = billions). Records with missing or unrecognized exponent codes were excluded from economic totals. Tornadoes are by far the most harmful event type for human health, causing the most injuries and the most fatalities. For economic damage, floods dominate property losses, while droughts are the leading cause of crop damage. Because different event types drive health and economic harm, emergency preparedness and mitigation strategies should be tailored accordingly.

Data Processing

Load libraries

library(tidyverse)
library(cowplot)
library(skimr)
library(baffle)

Read data

The data are loaded directly from the compressed CSV file using read_csv. No preprocessing was performed outside this document.

storm <- read_csv(
  "repdata_data_StormData.csv.bz2",
  locale = locale(encoding = "UTF-8"),
  quote = "\"",
  na = c("", "NA"),
  show_col_types = FALSE
)
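As a small self-contained illustration of reading a bzip2-compressed CSV without unpacking it first, the sketch below writes a two-row toy file and reads it back. The toy columns and values are made up for illustration, and base R's read.csv() with bzfile() stands in here for the read_csv() call used in this report.

```r
# Write a tiny CSV through a bzip2-compressing connection (toy data, not NOAA records)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,0"), con)
close(con)

# Read it back directly from the compressed file; read.csv() opens and closes the connection
toy <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(toy)

unlink(tmp)
```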

Check the data structure

The variables relevant to answering the two research questions are:

  • EVTYPE — type of weather event (e.g., tornado, flood, hurricane)
  • FATALITIES — number of deaths directly attributed to the event
  • INJURIES — number of non-fatal injuries caused by the event
  • PROPDMG / PROPDMGEXP — property damage amount and its scale multiplier
  • CROPDMG / CROPDMGEXP — crop damage amount and its scale multiplier

The skim() function provides a quick structural overview of these columns.

storm %>%
    select(
        EVTYPE,
        FATALITIES,
        INJURIES,
        PROPDMG,
        PROPDMGEXP,
        CROPDMG,
        CROPDMGEXP
    ) %>%
    skim()
Data summary

Name                     Piped data
Number of rows           902297
Number of columns        7
Column type frequency:
  character              3
  numeric                4
Group variables          None

Variable type: character

skim_variable  n_missing  complete_rate  min  max  empty  n_unique  whitespace
EVTYPE                 0           1.00    1   30      0       977           0
PROPDMGEXP        465934           0.48    1    1      0        18           0
CROPDMGEXP        618413           0.31    1    1      0         8           0

Variable type: numeric

skim_variable  n_missing  complete_rate   mean     sd  p0  p25  p50  p75  p100  hist
FATALITIES             0              1   0.02   0.77   0    0    0  0.0   583  ▇▁▁▁▁
INJURIES               0              1   0.16   5.43   0    0    0  0.0  1700  ▇▁▁▁▁
PROPDMG                0              1  12.06  59.48   0    0    0  0.5  5000  ▇▁▁▁▁
CROPDMG                0              1   1.53  22.17   0    0    0  0.0   990  ▇▁▁▁▁

The distribution of exponent codes used for damage scaling:

table(storm$PROPDMGEXP)
## 
##      -      ?      +      0      1      2      3      4      5      6      7 
##      1      8      5    216     25     13      4      4     28      4      5 
##      8      B      h      H      K      m      M 
##      1     40      1      6 424665      7  11330
table(storm$CROPDMGEXP)
## 
##      ?      0      2      B      k      K      m      M 
##      7     19      1      9     21 281832      1   1994
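To gauge how much is lost by keeping only the letter codes, the sketch below re-enters the PROPDMGEXP counts from the table above and computes the share of non-missing entries whose code is H, K, M, or B in either case. The variable names are illustrative, and matching codes case-insensitively is an assumption of this sketch.

```r
# PROPDMGEXP counts copied from the table() output above
prop_exp_counts <- c(
  "-" = 1, "?" = 8, "+" = 5, "0" = 216, "1" = 25, "2" = 13, "3" = 4,
  "4" = 4, "5" = 28, "6" = 4, "7" = 5, "8" = 1,
  B = 40, h = 1, H = 6, K = 424665, m = 7, M = 11330
)

# Flag codes that map to a known multiplier, ignoring case (assumption)
recognised <- toupper(names(prop_exp_counts)) %in% c("H", "K", "M", "B")

# Share of non-missing entries that carry a recognised code
share <- sum(prop_exp_counts[recognised]) / sum(prop_exp_counts)
print(share)
```

The total of these counts matches the number of non-missing PROPDMGEXP values reported by skim() (902297 - 465934 = 436363), and only a small fraction of them carry a symbol or digit code.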

Helper functions

To avoid code repetition, four reusable functions are defined below.

summarise_events() groups the data by event type, computes totals, and returns both a top-10 ranked table and a percentage-share table (with small categories collapsed into “OTHER”).

summarise_events <- function(data, value_var, top_n = 10, perc_threshold = 5) {
  value_var <- rlang::ensym(value_var)

  summary <- data %>%
    group_by(EVTYPE) %>%
    summarise(total = sum(!!value_var, na.rm = TRUE), .groups = "drop") %>%
    filter(total > 0) %>%
    arrange(desc(total))

  top <- summary %>%
    slice_head(n = top_n) %>%
    mutate(highlight = if_else(row_number() == 1, "TOP", "OTHER"))

  perc <- summary %>%
    mutate(perc = total / sum(total) * 100,
           EVTYPE2 = if_else(perc < perc_threshold, "OTHER", as.character(EVTYPE))) %>%
    group_by(EVTYPE2) %>%
    summarise(sum_perc = sum(perc), .groups = "drop") %>%
    arrange(desc(sum_perc))

  list(top = top, perc = perc)
}
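The percentage-share step can be hard to picture from the pipeline alone. The following base-R sketch applies the same idea (convert totals to percentages, relabel anything under the threshold as "OTHER", then re-sum) to made-up totals. It is not the function above, and the numbers are not taken from the Storm Database.

```r
# Made-up totals per event type, for illustration only
totals <- c(TORNADO = 650, FLOOD = 200, HAIL = 100, FOG = 30, DUST = 20)
perc_threshold <- 5

# Percentage share of each event type
perc <- totals / sum(totals) * 100

# Relabel small categories, then add up the shares within each label
label <- ifelse(perc < perc_threshold, "OTHER", names(perc))
share <- vapply(split(unname(perc), label), sum, numeric(1))
share <- sort(share, decreasing = TRUE)
print(share)
```

Here FOG and DUST each fall under 5% and end up merged into a single "OTHER" entry, while the remaining types keep their own rows.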

process_damage() converts raw damage values to their true monetary amounts by applying the appropriate multiplier from the exponent column.

process_damage <- function(data, dmg_col, exp_col, out_col = "damage") {
  dmg_col <- rlang::ensym(dmg_col)
  exp_col <- rlang::ensym(exp_col)

  data %>%
    mutate(
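      # Upper-case the code so that "h", "k", "m" are treated like "H", "K", "M"
      exp_clean = toupper(!!exp_col),
      !!out_col := case_when(
        exp_clean == "H" ~ !!dmg_col * 1e2,
        exp_clean == "K" ~ !!dmg_col * 1e3,
        exp_clean == "M" ~ !!dmg_col * 1e6,
        exp_clean == "B" ~ !!dmg_col * 1e9,
        # Missing or unrecognized codes yield NA and drop out of the totals
        TRUE             ~ NA_real_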
      )
    )
}
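A named lookup vector is another way to express the same code-to-multiplier mapping, and it may be easier to extend than a chain of conditions. The sketch below is a base-R alternative, not the function used in this report. The names exp_multiplier and scale_damage are hypothetical, and upper-casing the code before the lookup is an assumption of the sketch.

```r
# Multipliers for the letter codes described in this report
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale an amount by its exponent code.
# Codes absent from the lookup (symbols, digits, NA) give NA.
scale_damage <- function(amount, exp_code) {
  amount * unname(exp_multiplier[toupper(exp_code)])
}

# Toy inputs: 25 K, 2.5 m, 1 B, and an unrecognized "?" code
scaled <- scale_damage(c(25, 2.5, 1, 10), c("K", "m", "B", "?"))
print(scaled)
```

Indexing a named vector with a name it does not contain returns NA, which is what makes unrecognized codes fall out of later sums computed with na.rm = TRUE.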

plot_top_10() produces a styled bar chart of the top 10 event types for a given outcome variable.

plot_top_10 <- function(data, weather_type_column, value_column,
                        highlight_column, title, xlab, ylab, palette) {
  weather_type_column <- rlang::ensym(weather_type_column)
  value_column        <- rlang::ensym(value_column)
  highlight_column    <- rlang::ensym(highlight_column)

  ggplot(data, aes(x = reorder(!!weather_type_column, -!!value_column),
                   y = !!value_column,
                   fill = !!highlight_column)) +
    geom_col(col = "black") +
    scale_fill_manual(values = palette) +
    labs(x = xlab, y = ylab, title = title) +
    theme_bw() +
    theme(
      axis.text.x        = element_text(angle = 45, hjust = 1),
      plot.title         = element_text(color = "darkslategrey", size = 11, face = "bold.italic"),
      axis.title.x       = element_text(color = "darkslategrey", size = 11),
      axis.title.y       = element_text(color = "darkslategrey", size = 11),
      panel.grid.major.x = element_blank(),
      panel.grid.minor.x = element_blank()
    ) +
    guides(fill = "none")
}

plot_percentages() draws a waffle chart showing the proportional share of each event type, with the dominant category’s percentage annotated.

plot_percentages <- function(data, percent_col, weather_type_column, palette, title) {
  top_share <- max(data[[percent_col]])
  waffle(as.integer(round(data[[percent_col]], 0)), col = palette, from = "bottomleft")
  legend("right", legend = data[[weather_type_column]], fill = palette, cex = 0.8)
  text(x = 11, y = 1.5, labels = paste0(round(top_share), "%"), col = "black", font = 2)
  arrows(x0 = 10.5, y0 = 1.5, x1 = 9.5, y1 = 1.5, length = 0.1, col = "black")
  title(main = title, font.main = 4, col.main = "darkslategrey")
}

Build analysis datasets

pre_property <- storm %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP) %>%
  process_damage(PROPDMG, PROPDMGEXP, out_col = "PROPDMG2")

pre_crop <- storm %>%
  select(EVTYPE, CROPDMG, CROPDMGEXP) %>%
  process_damage(CROPDMG, CROPDMGEXP, out_col = "CROPDMG2")

injuries   <- summarise_events(storm,        INJURIES)
fatalities <- summarise_events(storm,        FATALITIES)
property   <- summarise_events(pre_property, PROPDMG2)
crop       <- summarise_events(pre_crop,     CROPDMG2)

Results

Which event types are most harmful to population health?

Tornadoes cause far more injuries and fatalities than any other weather event type. Tornadoes alone account for more than 60% of all recorded injuries. They also rank first for fatalities, followed by excessive heat and flash floods. Notably, excessive heat ranks higher for fatalities than for injuries, which suggests that heat events are more likely than most to be lethal rather than merely injurious.

Figure 1 shows the top 10 event types by total injuries (panel A) and total fatalities (panel B).

pal_injuries   <- c("TOP" = "palevioletred4", "OTHER" = "pink")
pal_fatalities <- c("TOP" = "#040C25",        "OTHER" = "lemonchiffon")

p_inj <- plot_top_10(
  data                = injuries$top,
  weather_type_column = EVTYPE,
  value_column        = total,
  highlight_column    = highlight,
  title               = "Top 10 Events Causing the Most Injuries",
  xlab                = "Weather event type",
  ylab                = "Total injuries",
  palette             = pal_injuries
)

p_fat <- plot_top_10(
  data                = fatalities$top,
  weather_type_column = EVTYPE,
  value_column        = total,
  highlight_column    = highlight,
  title               = "Top 10 Events Causing the Most Fatalities",
  xlab                = "Weather event type",
  ylab                = "Total fatalities",
  palette             = pal_fatalities
)

plot_grid(p_inj, p_fat, labels = c("A", "B"), ncol = 2)
Figure 1. Top 10 weather event types by total injuries (A) and fatalities (B) recorded in the NOAA Storm Database. The leading event type is highlighted in each panel.


Figure 2 shows the proportional share of each event type across all injuries (left) and all fatalities (right). Events individually contributing less than 5% are grouped as “OTHER”.

pal_waffle <- c("#040C25", "#06729D", "dodgerblue", "#C4DFFD", "#EDF6FD", "lemonchiffon2")

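# Move the collapsed "OTHER" row to the end so it is drawn last in the waffle.
# This indexing assumes "OTHER" sits in row 2 of fatalities$perc.
waffle_fatalities <- fatalities$perc[c(1, 3:nrow(fatalities$perc), 2), ]
# Rounding each share to a whole square can leave the grid short of 100,
# so one extra percentage point is assigned to "OTHER" to compensate.
waffle_fatalities$sum_perc <- ifelse(
  waffle_fatalities$EVTYPE2 == "OTHER",
  waffle_fatalities$sum_perc + 1,
  waffle_fatalities$sum_perc
)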
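The manual + 1 on the "OTHER" share is specific to this dataset. A more general option, not used in this report, is largest-remainder rounding, which always yields whole squares that sum to the grid size. The sketch below is one way to write it. The function name round_to_total is hypothetical, and the input shares are made up for illustration rather than taken from the results.

```r
# Hypothetical helper: round shares to integers that sum exactly to `total`
round_to_total <- function(x, total = 100) {
  scaled <- x / sum(x) * total
  base <- floor(scaled)
  # Number of squares still unassigned after flooring
  shortfall <- round(total - sum(base))
  # Give the leftover squares to the largest fractional parts
  bump <- order(scaled - base, decreasing = TRUE)[seq_len(shortfall)]
  base[bump] <- base[bump] + 1
  as.integer(base)
}

# Made-up percentage shares (not the report's figures)
shares <- c(37.2, 12.6, 6.5, 6.2, 5.4, 32.1)
squares <- round_to_total(shares)
print(squares)
print(sum(squares))
```

With this approach the extra squares go to whichever categories lose the most from flooring, rather than always to "OTHER".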

par(mfrow = c(1, 2), mar = c(3, 3, 3, 1))

plot_percentages(
  data                = injuries$perc,
  percent_col         = "sum_perc",
  weather_type_column = "EVTYPE2",
  palette             = c("palevioletred4", "pink"),
  title               = "Distribution of Injuries by Event Type"
)

plot_percentages(
  data                = waffle_fatalities,
  percent_col         = "sum_perc",
  weather_type_column = "EVTYPE2",
  palette             = pal_waffle,
  title               = "Distribution of Fatalities by Event Type"
)
Figure 2. Proportional share of all recorded injuries (left) and fatalities (right) by weather event type. Each square represents approximately 1% of the total. Event types below the 5% threshold are grouped as OTHER.


par(mfrow = c(1, 1))

Which event types have the greatest economic consequences?

Floods are the dominant driver of property damage, with total losses far exceeding those of any other event type. Hurricanes and storm surges also contribute substantially. For crop damage, drought is the leading cause, reflecting the cumulative agricultural losses that prolonged dry periods inflict. River floods and ice storms are also notable contributors to crop losses. The contrast between the two panels shows that water-excess events (floods) and water-deficit events (drought) are distinct economic threats, each leading a different damage category.

Figure 3 shows the top 10 event types by total property damage (panel A) and total crop damage (panel B).

pal_property <- c("TOP" = "#6C2C6C", "OTHER" = "#D7BDE2")
pal_crops <- c("TOP" = "peachpuff4", "OTHER" = "bisque")

p_prop <- plot_top_10(
  data                = property$top,
  weather_type_column = EVTYPE,
  value_column        = total,
  highlight_column    = highlight,
  title               = "Top 10 Events Causing the Most Property Damage",
  xlab                = "Weather event type",
  ylab                = "Total property damage (USD)",
  palette             = pal_property
)

p_crop <- plot_top_10(
  data                = crop$top,
  weather_type_column = EVTYPE,
  value_column        = total,
  highlight_column    = highlight,
  title               = "Top 10 Events Causing the Most Crop Damage",
  xlab                = "Weather event type",
  ylab                = "Total crop damage (USD)",
  palette             = pal_crops
)

plot_grid(p_prop, p_crop, labels = c("A", "B"), ncol = 2)
Figure 3. Top 10 weather event types by total property damage (A) and crop damage (B) in USD, as recorded in the NOAA Storm Database. The leading event type is highlighted in each panel.
