This report analyzes the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database to determine which severe weather events cause the greatest harm to public health and the economy. The database covers events recorded across the United States from 1950 to 2011. Population health impact was measured by summing fatalities and injuries separately for each event type. Economic impact was quantified by converting property and crop damage values to U.S. dollars using the exponent codes provided in the dataset (H = hundreds, K = thousands, M = millions, B = billions). Records with missing or unrecognized exponent codes were excluded from the economic totals. Results show that tornadoes are by far the most dangerous event type for human health, causing the highest number of both injuries and fatalities. For economic damage, floods dominate property losses, while droughts are the leading cause of crop damage. These findings show that different event types drive health and economic harm, suggesting that emergency preparedness and mitigation strategies should be tailored accordingly.
library(tidyverse)
library(cowplot)
library(skimr)
library(baffle)
The data are loaded directly from the compressed CSV file using
read_csv. No preprocessing was performed outside this
document.
storm <- read_csv(
  "repdata_data_StormData.csv.bz2",
  locale = locale(encoding = "UTF-8"),
  quote = "\"",
  na = c("", "NA"),
  show_col_types = FALSE
)
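The variables relevant to answering the two research questions are EVTYPE (event type), FATALITIES and INJURIES (health outcomes), PROPDMG and CROPDMG (property and crop damage values), and PROPDMGEXP and CROPDMGEXP (the exponent codes that scale those damage values).
The skim() function provides a quick structural overview
of these columns.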
storm %>%
  select(
    EVTYPE,
    FATALITIES,
    INJURIES,
    PROPDMG,
    PROPDMGEXP,
    CROPDMG,
    CROPDMGEXP
  ) %>%
  skim()
| Name | Piped data |
|---|---|
| Number of rows | 902297 |
| Number of columns | 7 |
| _______________________ | |
| Column type frequency: | |
| character | 3 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| EVTYPE | 0 | 1.00 | 1 | 30 | 0 | 977 | 0 |
| PROPDMGEXP | 465934 | 0.48 | 1 | 1 | 0 | 18 | 0 |
| CROPDMGEXP | 618413 | 0.31 | 1 | 1 | 0 | 8 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| FATALITIES | 0 | 1 | 0.02 | 0.77 | 0 | 0 | 0 | 0.0 | 583 | ▇▁▁▁▁ |
| INJURIES | 0 | 1 | 0.16 | 5.43 | 0 | 0 | 0 | 0.0 | 1700 | ▇▁▁▁▁ |
| PROPDMG | 0 | 1 | 12.06 | 59.48 | 0 | 0 | 0 | 0.5 | 5000 | ▇▁▁▁▁ |
| CROPDMG | 0 | 1 | 1.53 | 22.17 | 0 | 0 | 0 | 0.0 | 990 | ▇▁▁▁▁ |
The distribution of exponent codes used for damage scaling:
table(storm$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5 6 7
## 1 8 5 216 25 13 4 4 28 4 5
## 8 B h H K m M
## 1 40 1 6 424665 7 11330
table(storm$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 7 19 1 9 21 281832 1 1994
To avoid code repetition, four reusable functions are defined below.
summarise_events() groups the data by event type,
computes totals, and returns both a top-10 ranked table and a
percentage-share table (with small categories collapsed into
“OTHER”).
summarise_events <- function(data, value_var, top_n = 10, perc_threshold = 5) {
  value_var <- rlang::ensym(value_var)
  # Total of the outcome per event type, keeping only non-zero totals
  totals <- data %>%
    group_by(EVTYPE) %>%
    summarise(total = sum(!!value_var, na.rm = TRUE), .groups = "drop") %>%
    filter(total > 0) %>%
    arrange(desc(total))
  # Top-n event types; the first (largest) one is flagged for highlighting
  top <- totals %>%
    slice_head(n = top_n) %>%
    mutate(highlight = if_else(row_number() == 1, "TOP", "OTHER"))
  # Percentage share per event type; shares below the threshold become "OTHER"
  perc <- totals %>%
    mutate(
      perc = total / sum(total) * 100,
      EVTYPE2 = if_else(perc < perc_threshold, "OTHER", as.character(EVTYPE))
    ) %>%
    group_by(EVTYPE2) %>%
    summarise(sum_perc = sum(perc), .groups = "drop") %>%
    arrange(desc(sum_perc))
  list(top = top, perc = perc)
}
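To make the share calculation concrete, the base-R sketch below reproduces only the percentage and "OTHER" collapsing step of summarise_events() on a handful of invented totals (they are not NOAA figures). collapse_shares() is a hypothetical helper written for illustration, not part of the report's code.

```r
# Base-R sketch of the "OTHER" collapsing step in summarise_events().
# collapse_shares() is a hypothetical helper written for illustration.
collapse_shares <- function(totals, threshold = 5) {
  # Percentage share of each event type
  perc <- totals / sum(totals) * 100
  # Event types below the threshold are relabelled "OTHER"
  label <- ifelse(perc < threshold, "OTHER", names(totals))
  # Sum the shares within each label, then sort from largest to smallest
  shares <- vapply(split(perc, label), sum, numeric(1))
  sort(shares, decreasing = TRUE)
}

# Invented totals, not NOAA figures
toy_totals <- c(TORNADO = 70, FLOOD = 20, HAIL = 4, LIGHTNING = 3, FOG = 3)
collapse_shares(toy_totals)
# TORNADO = 70, FLOOD = 20, OTHER = 10
```

Because HAIL, LIGHTNING and FOG each fall below 5%, their combined 10% is reported as a single OTHER slice, mirroring the structure of the `perc` table returned by summarise_events().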
process_damage() converts raw damage values to U.S. dollar amounts by
applying the multiplier given by the exponent column (H, K, M or B).
Missing or unrecognized codes yield NA and are therefore dropped from
the totals.
process_damage <- function(data, dmg_col, exp_col, out_col = "damage") {
  dmg_col <- rlang::ensym(dmg_col)
  exp_col <- rlang::ensym(exp_col)
  data %>%
    mutate(
      # Upper-case the codes so that e.g. "m" and "M" are treated alike
      exp_clean = toupper(!!exp_col),
      !!out_col := case_when(
        exp_clean == "H" ~ !!dmg_col * 1e2,
        exp_clean == "K" ~ !!dmg_col * 1e3,
        exp_clean == "M" ~ !!dmg_col * 1e6,
        exp_clean == "B" ~ !!dmg_col * 1e9,
        TRUE ~ NA_real_  # missing or unrecognized exponent code
      )
    ) %>%
    select(-exp_clean)
}
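The same code-to-multiplier mapping can also be written as a named lookup vector, which keeps the table of codes in one place. The sketch below is a base-R alternative under the assumption that only H, K, M and B are meaningful; scale_damage() and the example values are hypothetical and for illustration only.

```r
# Base-R sketch of the exponent-code lookup; scale_damage() is a
# hypothetical helper, and the example values below are invented.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

scale_damage <- function(value, code) {
  # Upper-case the codes so "b" and "B" match the same entry; codes that are
  # missing or absent from the lookup (e.g. "?", "+", "5") give NA
  value * unname(multipliers[toupper(code)])
}

scale_damage(c(25, 2.5, 1.5, 10, 3), c("K", "M", "b", "?", NA))
# 25000, 2500000, 1.5e9, NA, NA
```

Indexing a named vector with an unknown or missing name returns NA, so unrecognized codes fall out of the totals once `na.rm = TRUE` is used, just as the `TRUE ~ NA_real_` branch does in case_when().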
plot_top_10() produces a styled bar chart of the top 10
event types for a given outcome variable.
plot_top_10 <- function(data, weather_type_column, value_column,
                        highlight_column, title, xlab, ylab, palette) {
  weather_type_column <- rlang::ensym(weather_type_column)
  value_column <- rlang::ensym(value_column)
  highlight_column <- rlang::ensym(highlight_column)
  ggplot(data, aes(x = reorder(!!weather_type_column, -!!value_column),
                   y = !!value_column,
                   fill = !!highlight_column)) +
    geom_col(col = "black") +
    scale_fill_manual(values = palette) +
    labs(x = xlab, y = ylab, title = title) +
    theme_bw() +
    theme(
      axis.text.x = element_text(angle = 45, hjust = 1),
      plot.title = element_text(color = "darkslategrey", size = 11, face = "bold.italic"),
      axis.title.x = element_text(color = "darkslategrey", size = 11),
      axis.title.y = element_text(color = "darkslategrey", size = 11),
      panel.grid.major.x = element_blank(),
      panel.grid.minor.x = element_blank()
    ) +
    guides(fill = "none")
}
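plot_top_10() relies on reorder() with a negated value to draw the bars from largest to smallest. The tiny base-R illustration below uses invented totals to show why the minus sign is needed.

```r
# reorder() sets factor levels by the mean of the second argument, in
# increasing order; negating the totals therefore puts the largest first.
events <- c("HAIL", "TORNADO", "FLOOD")
totals <- c(5, 90, 20)  # invented values for illustration

ordered_events <- reorder(events, -totals)
levels(ordered_events)
# "TORNADO" "FLOOD" "HAIL"
```

Since each event type appears once in the top-10 table, the mean used by reorder() is simply that event's total.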
plot_percentages() draws a waffle chart showing the
proportional share of each event type, with the dominant category’s
percentage annotated.
plot_percentages <- function(data, percent_col, weather_type_column, palette, title) {
  # Largest share, shown as an annotation beside the chart
  top_share <- max(data[[percent_col]])
  # One square per percentage point, after rounding to whole numbers
  waffle(as.integer(round(data[[percent_col]], 0)), col = palette, from = "bottomleft")
  legend("right", legend = data[[weather_type_column]], fill = palette, cex = 0.8)
  # Label and arrow marking the dominant category's share
  text(x = 11, y = 1.5, labels = paste0(round(top_share), "%"), col = "black", font = 2)
  arrows(x0 = 10.5, y0 = 1.5, x1 = 9.5, y1 = 1.5, length = 0.1, col = "black")
  title(main = title, font.main = 4, col.main = "darkslategrey")
}
pre_property <- storm %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP) %>%
  process_damage(PROPDMG, PROPDMGEXP, out_col = "PROPDMG2")
pre_crop <- storm %>%
  select(EVTYPE, CROPDMG, CROPDMGEXP) %>%
  process_damage(CROPDMG, CROPDMGEXP, out_col = "CROPDMG2")
injuries <- summarise_events(storm, INJURIES)
fatalities <- summarise_events(storm, FATALITIES)
property <- summarise_events(pre_property, PROPDMG2)
crop <- summarise_events(pre_crop, CROPDMG2)
Tornadoes cause vastly more injuries and fatalities than any other weather event type; on their own they account for more than 60% of all recorded injuries. For fatalities, tornadoes again rank first, followed by excessive heat and flash floods. Notably, excessive heat ranks higher for fatalities than for injuries, suggesting that, relative to other event types, it is especially lethal.
Figure 1 shows the top 10 event types by total injuries (panel A) and total fatalities (panel B).
pal_injuries <- c("TOP" = "palevioletred4", "OTHER" = "pink")
pal_fatalities <- c("TOP" = "#040C25", "OTHER" = "lemonchiffon")
p_inj <- plot_top_10(
  data = injuries$top,
  weather_type_column = EVTYPE,
  value_column = total,
  highlight_column = highlight,
  title = "Top 10 Events Causing the Most Injuries",
  xlab = "Weather event type",
  ylab = "Total injuries",
  palette = pal_injuries
)
p_fat <- plot_top_10(
  data = fatalities$top,
  weather_type_column = EVTYPE,
  value_column = total,
  highlight_column = highlight,
  title = "Top 10 Events Causing the Most Fatalities",
  xlab = "Weather event type",
  ylab = "Total fatalities",
  palette = pal_fatalities
)
plot_grid(p_inj, p_fat, labels = c("A", "B"), ncol = 2)
Figure 1. Top 10 weather event types by total injuries (A) and fatalities (B) recorded in the NOAA Storm Database. The leading event type is highlighted in each panel.
Figure 2 shows the proportional share of each event type across all injuries (left) and all fatalities (right). Events individually contributing less than 5% are grouped as “OTHER”.
pal_waffle <- c("#040C25", "#06729D", "dodgerblue", "#C4DFFD", "#EDF6FD", "lemonchiffon2")
# Put the named event types first (largest share first) and "OTHER" last
waffle_fatalities <- fatalities$perc %>%
  arrange(EVTYPE2 == "OTHER", desc(sum_perc))
# Rounding the shares to whole squares leaves the total one short of 100,
# so the extra square is given to "OTHER"
waffle_fatalities$sum_perc <- ifelse(
  waffle_fatalities$EVTYPE2 == "OTHER",
  waffle_fatalities$sum_perc + 1,
  waffle_fatalities$sum_perc
)
par(mfrow = c(1, 2), mar = c(3, 3, 3, 1))
plot_percentages(
  data = injuries$perc,
  percent_col = "sum_perc",
  weather_type_column = "EVTYPE2",
  palette = c("palevioletred4", "pink"),
  title = "Distribution of Injuries by Event Type"
)
plot_percentages(
  data = waffle_fatalities,
  percent_col = "sum_perc",
  weather_type_column = "EVTYPE2",
  palette = pal_waffle,
  title = "Distribution of Fatalities by Event Type"
)
Figure 2. Proportional share of all recorded injuries (left) and fatalities (right) by weather event type. Each square represents approximately 1% of the total. Event types below the 5% threshold are grouped as OTHER.
par(mfrow = c(1, 1))
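The +1 adjustment to OTHER in the fatalities waffle is a manual fix for rounding: independently rounded percentages need not add up to exactly 100 squares. A more general option, sketched below under the assumption that the shares should always fill exactly 100 squares, is largest-remainder (Hamilton) rounding. largest_remainder() is a hypothetical helper and the percentages are invented.

```r
# Largest-remainder rounding: floor every share, then give the leftover
# squares to the categories with the largest fractional parts.
largest_remainder <- function(perc, total = 100) {
  scaled <- perc / sum(perc) * total
  squares <- floor(scaled)
  leftover <- total - sum(squares)
  if (leftover > 0) {
    idx <- order(scaled - squares, decreasing = TRUE)[seq_len(leftover)]
    squares[idx] <- squares[idx] + 1
  }
  as.integer(squares)
}

shares <- c(40.4, 30.3, 20.2, 9.1)  # invented percentages
sum(round(shares))          # 99: plain rounding leaves one square empty
largest_remainder(shares)   # 41 30 20 9, which sums to 100
```

The resulting integer vector could be passed to waffle() in place of `as.integer(round(...))`; the trade-off is that a category's square count may differ by one from its own rounded percentage.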
Floods are the dominant driver of property damage, with total losses far exceeding those of any other event type. Hurricanes and storm surges also contribute substantially. For crop damage, drought is the leading cause, reflecting the cumulative agricultural losses that prolonged dry periods inflict. River floods and ice storms are also notable contributors to crop losses. The contrast between the two panels underscores that water-excess events (floods) and water-deficit events (drought) represent distinct but serious economic threats.
Figure 3 shows the top 10 event types by total property damage (panel A) and total crop damage (panel B).
pal_property <- c("TOP" = "#6C2C6C", "OTHER" = "#D7BDE2")
pal_crops <- c("TOP" = "peachpuff4", "OTHER" = "bisque")
p_prop <- plot_top_10(
  data = property$top,
  weather_type_column = EVTYPE,
  value_column = total,
  highlight_column = highlight,
  title = "Top 10 Events Causing the Most Property Damage",
  xlab = "Weather event type",
  ylab = "Total property damage (USD)",
  palette = pal_property
)
p_crop <- plot_top_10(
  data = crop$top,
  weather_type_column = EVTYPE,
  value_column = total,
  highlight_column = highlight,
  title = "Top 10 Events Causing the Most Crop Damage",
  xlab = "Weather event type",
  ylab = "Total crop damage (USD)",
  palette = pal_crops
)
plot_grid(p_prop, p_crop, labels = c("A", "B"), ncol = 2)
Figure 3. Top 10 weather event types by total property damage (A) and crop damage (B) in USD, as recorded in the NOAA Storm Database. The leading event type is highlighted in each panel.