Severe weather events can cause substantial harm to communities,
including fatalities, injuries, and major economic losses.
This report analyzes the U.S. NOAA Storm Data database (1950–November
2011) to identify the event types associated with the greatest public
health and economic impacts.
Public health impact is measured as the total number of fatalities and
injuries aggregated by event type (EVTYPE).
Economic impact is measured as total property and crop damages,
converted into U.S. dollars using the magnitude exponent fields (e.g.,
K, M, B).
To improve comparability, EVTYPE labels are standardized (trimmed and
converted to uppercase) to reduce duplicates caused by inconsistent
formatting.
Results are presented using summary tables and two bar charts to support
prioritization of preparedness resources.
A decision table combines both dimensions (health and economy) to help
stakeholders identify event types that are critical under different
priorities.
All steps are documented with code to ensure reproducibility starting
from the raw compressed CSV file.
This section describes how the data are loaded from the original compressed file and transformed for analysis. We retain only the variables required for the assignment, standardize event type labels (EVTYPE) for consistent grouping, and convert recorded damages into U.S. dollars using the magnitude exponent fields (e.g., K, M, B).
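As a small illustration of the two transformations described above, the following sketch applies them to three made-up records. The data frame `toy`, its values, and the lookup vector `mult` are assumptions for illustration only; `mult` is a simplified stand-in that covers just the letter codes, not the full mapping used in this report.

```r
# Toy records (illustrative values, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c(" Flood", "FLOOD ", "tornado"),
  PROPDMG    = c(25, 1.5, 2),
  PROPDMGEXP = c("K", "m", "B"),
  stringsAsFactors = FALSE
)

# Standardize labels: trim whitespace, then convert to uppercase
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Simplified multiplier lookup for the letter codes only (assumption)
mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
toy$prop_damage <- toy$PROPDMG * unname(mult[toupper(toy$PROPDMGEXP)])

# Aggregate dollars by the cleaned label
totals <- tapply(toy$prop_damage, toy$EVTYPE, sum)
print(totals)
```

After cleaning, " Flood" and "FLOOD " fall into the same group, so their damages (25 K and 1.5 M) are summed together rather than reported as two separate event types.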
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
# Install/load required packages
pkgs <- c("dplyr", "ggplot2", "scales", "knitr")
for (p in pkgs) {
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p, repos = "https://cloud.r-project.org")
  }
}
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
library(scales)
## Warning: package 'scales' was built under R version 4.4.2
library(knitr)
## Warning: package 'knitr' was built under R version 4.4.3
# Loading the raw data (from the original .csv.bz2 file)
file <- "repdata_data_StormData.csv.bz2"
storm <- read.csv(file, stringsAsFactors = FALSE)
dim(storm)
## [1] 902297 37
names(storm)[1:15]
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
# Keep required variables and standardize EVTYPE
data <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(EVTYPE = toupper(trimws(EVTYPE)))
# Convert damage exponents into dollars
exp_map <- function(x) {
  x <- toupper(trimws(x))
  dplyr::case_when(
    x == "H" ~ 1e2,
    x == "K" ~ 1e3,
    x == "M" ~ 1e6,
    x == "B" ~ 1e9,
    x == ""  ~ 1,
    x == "0" ~ 1,
    x == "1" ~ 10,
    x == "2" ~ 1e2,
    x == "3" ~ 1e3,
    x == "4" ~ 1e4,
    x == "5" ~ 1e5,
    x == "6" ~ 1e6,
    x == "7" ~ 1e7,
    x == "8" ~ 1e8,
    x == "9" ~ 1e9,
    TRUE ~ NA_real_  # any other code is treated as unknown
  )
}
data <- data %>%
  mutate(
    prop_mult = exp_map(PROPDMGEXP),
    crop_mult = exp_map(CROPDMGEXP),
    prop_damage = PROPDMG * prop_mult,
    crop_damage = CROPDMG * crop_mult,
    econ_damage = prop_damage + crop_damage
  )
# Diagnostics for unexpected exponent codes
sum(is.na(data$prop_mult))
## [1] 14
sum(is.na(data$crop_mult))
## [1] 7
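The counts above say how many records carry an exponent code that the mapping does not recognize, but not which codes they are or how much recorded damage they hold. A minimal sketch of that follow-up check is shown here on made-up data; the data frame `toy`, the codes in it, and the reduced lookup vector `known` are assumptions for illustration, not values from the NOAA file.

```r
# Toy records with two unrecognized exponent codes (illustrative only)
toy <- data.frame(
  PROPDMG    = c(10, 5, 3, 7),
  PROPDMGEXP = c("K", "+", "?", "M"),
  stringsAsFactors = FALSE
)

# Reduced lookup covering only the letter codes (assumption)
known <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
toy$prop_mult <- unname(known[toupper(trimws(toy$PROPDMGEXP))])

# Rows whose code could not be mapped
unmapped <- toy[is.na(toy$prop_mult), ]

# Which codes are involved, and how much raw PROPDMG they carry
code_counts <- table(unmapped$PROPDMGEXP)
dropped_raw <- sum(unmapped$PROPDMG)

print(code_counts)
print(dropped_raw)
```

Applied to the real `data` object, the same pattern would show whether the unmapped records are negligible or large enough to justify extending the mapping.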
# Results
# Health impacts
health <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(FATALITIES, na.rm = TRUE),
    injuries = sum(INJURIES, na.rm = TRUE),
    total_health = fatalities + injuries,
    .groups = "drop"
  ) %>%
  arrange(desc(total_health))
top_health <- health %>% slice_head(n = 10)
kable(top_health,
      caption = "Table 1. Top 10 Event Types by Public Health Impact (Fatalities + Injuries)")
| EVTYPE | fatalities | injuries | total_health |
|---|---|---|---|
| TORNADO | 5633 | 91346 | 96979 |
| EXCESSIVE HEAT | 1903 | 6525 | 8428 |
| TSTM WIND | 504 | 6957 | 7461 |
| FLOOD | 470 | 6789 | 7259 |
| LIGHTNING | 816 | 5230 | 6046 |
| HEAT | 937 | 2100 | 3037 |
| FLASH FLOOD | 978 | 1777 | 2755 |
| ICE STORM | 89 | 1975 | 2064 |
| THUNDERSTORM WIND | 133 | 1488 | 1621 |
| WINTER STORM | 206 | 1321 | 1527 |
# Figure 1: Health impacts (top 10)
ggplot(top_health, aes(x = reorder(EVTYPE, total_health), y = total_health)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Figure 1. Event Types Most Harmful to Population Health",
    subtitle = "Top 10 EVTYPE categories by total fatalities + injuries (1950–Nov 2011).",
    x = "Event type (EVTYPE)",
    y = "Total fatalities + injuries"
  )
# Economic impacts
econ <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    property_damage = sum(prop_damage, na.rm = TRUE),
    crop_damage = sum(crop_damage, na.rm = TRUE),
    total_econ = sum(econ_damage, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_econ))
top_econ <- econ %>% slice_head(n = 10)
kable(top_econ,
      caption = "Table 2. Top 10 Event Types by Total Economic Damage (Property + Crops, USD)")
| EVTYPE | property_damage | crop_damage | total_econ |
|---|---|---|---|
| FLOOD | 144657709807 | 5661968450 | 150319678257 |
| HURRICANE/TYPHOON | 69305840000 | 2607872800 | 71913712800 |
| TORNADO | 56947380617 | 414953270 | 57362333887 |
| STORM SURGE | 43323536000 | 5000 | 43323541000 |
| HAIL | 15735267513 | 3025954473 | 18761221986 |
| FLASH FLOOD | 16822723979 | 1421317100 | 18244041079 |
| DROUGHT | 1046106000 | 13972566000 | 15018672000 |
| HURRICANE | 11868319010 | 2741910000 | 14610229010 |
| RIVER FLOOD | 5118945500 | 5029459000 | 10148404500 |
| ICE STORM | 3944927860 | 5022113500 | 8967041360 |
# Figure 2: Economic impacts (top 10)
ggplot(top_econ, aes(x = reorder(EVTYPE, total_econ), y = total_econ)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = label_dollar(scale = 1e-9, suffix = "B")) +
  labs(
    title = "Figure 2. Event Types with the Greatest Economic Consequences",
    subtitle = "Top 10 EVTYPE categories by total property + crop damage (1950–Nov 2011).",
    x = "Event type (EVTYPE)",
    y = "Total economic damage (Billions of USD)"
  )
# Data Analysis
# Helper formatters
fmt_int <- function(x) format(round(x), big.mark = ",")
fmt_usd <- function(x) paste0("$", format(round(x), big.mark = ","))
# Top 3 health and economic
h3 <- health %>% slice_head(n = 3)
e3 <- econ %>% slice_head(n = 3)
# Paragraphs for decision-makers
cat(paste0(
  "Public health impacts are dominated by ", h3$EVTYPE[1],
  ", followed by ", h3$EVTYPE[2], " and ", h3$EVTYPE[3], ". ",
  "Across 1950–Nov 2011, ", h3$EVTYPE[1], " accounts for ",
  fmt_int(h3$fatalities[1]), " fatalities and ", fmt_int(h3$injuries[1]),
  " injuries (total ", fmt_int(h3$total_health[1]), "). ",
  "The next two event types total ", fmt_int(h3$total_health[2]),
  " and ", fmt_int(h3$total_health[3]), " affected people, respectively.\n\n"
))
## Public health impacts are dominated by TORNADO, followed by EXCESSIVE HEAT and TSTM WIND. Across 1950–Nov 2011, TORNADO accounts for 5,633 fatalities and 91,346 injuries (total 96,979). The next two event types total 8,428 and 7,461 affected people, respectively.
cat(paste0(
  "Economic losses are dominated by ", e3$EVTYPE[1],
  ", followed by ", e3$EVTYPE[2], " and ", e3$EVTYPE[3], ". ",
  "Total damage (property + crops) is estimated at ", fmt_usd(e3$total_econ[1]),
  " for ", e3$EVTYPE[1], " (property: ", fmt_usd(e3$property_damage[1]),
  "; crops: ", fmt_usd(e3$crop_damage[1]), "). ",
  "The next two event types are associated with ", fmt_usd(e3$total_econ[2]),
  " and ", fmt_usd(e3$total_econ[3]), " in total losses, respectively.\n\n"
))
## Economic losses are dominated by FLOOD, followed by HURRICANE/TYPHOON and TORNADO. Total damage (property + crops) is estimated at $150,319,678,257 for FLOOD (property: $144,657,709,807; crops: $5,661,968,450). The next two event types are associated with $71,913,712,800 and $57,362,333,886 in total losses, respectively.
cat(
  "Prioritization should recognize that the leading event types for human harm and for financial loss are not always identical; preparedness planning should balance life-safety interventions with mitigation strategies that reduce high-cost damages.\n\n"
)
## Prioritization should recognize that the leading event types for human harm and for financial loss are not always identical; preparedness planning should balance life-safety interventions with mitigation strategies that reduce high-cost damages.
# Decision table (union of top-10 lists)
candidate_evtypes <- union(top_health$EVTYPE, top_econ$EVTYPE)
health_ranked <- health %>% mutate(health_rank = row_number())
econ_ranked <- econ %>% mutate(econ_rank = row_number())
# Min-max rescaling to [0, 1]; returns NA for an all-NA input
rng01 <- function(x) {
  if (all(is.na(x))) return(rep(NA_real_, length(x)))
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
decision <- tibble(EVTYPE = candidate_evtypes) %>%
  left_join(
    health_ranked %>% select(EVTYPE, health_rank, fatalities, injuries, total_health),
    by = "EVTYPE"
  ) %>%
  left_join(
    econ_ranked %>% select(EVTYPE, econ_rank, property_damage, crop_damage, total_econ),
    by = "EVTYPE"
  ) %>%
  mutate(
    health_score_0_100 = 100 * rng01(total_health),
    econ_score_0_100 = 100 * rng01(total_econ),
    health_score_0_100 = ifelse(is.na(health_score_0_100), 0, health_score_0_100),
    econ_score_0_100 = ifelse(is.na(econ_score_0_100), 0, econ_score_0_100)
  )
# Combined score weights (adjust if needed)
w_health <- 0.5
w_econ <- 0.5
decision <- decision %>%
  mutate(
    combined_priority_score = w_health * health_score_0_100 + w_econ * econ_score_0_100,
    total_econ_fmt = fmt_usd(total_econ),
    property_fmt = fmt_usd(property_damage),
    crop_fmt = fmt_usd(crop_damage)
  ) %>%
  arrange(desc(combined_priority_score)) %>%
  select(
    EVTYPE,
    health_rank, total_health, fatalities, injuries, health_score_0_100,
    econ_rank, total_econ_fmt, property_fmt, crop_fmt, econ_score_0_100,
    combined_priority_score
  )
kable(decision, digits = 1,
      caption = "Table 3. Decision Table Comparing Health and Economic Impacts (Union of Top-10 Lists; Scores normalized to 0–100)")
| EVTYPE | health_rank | total_health | fatalities | injuries | health_score_0_100 | econ_rank | total_econ_fmt | property_fmt | crop_fmt | econ_score_0_100 | combined_priority_score |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TORNADO | 1 | 96979 | 5633 | 91346 | 100.0 | 3 | $ 57,362,333,886 | $ 56,947,380,616 | $ 414,953,270 | 38.0 | 69.0 |
| FLOOD | 4 | 7259 | 470 | 6789 | 7.5 | 1 | $150,319,678,257 | $144,657,709,807 | $ 5,661,968,450 | 100.0 | 53.7 |
| HURRICANE/TYPHOON | 13 | 1339 | 64 | 1275 | 1.4 | 2 | $ 71,913,712,800 | $ 69,305,840,000 | $ 2,607,872,800 | 47.7 | 24.5 |
| STORM SURGE | 51 | 51 | 13 | 38 | 0.0 | 4 | $ 43,323,541,000 | $ 43,323,536,000 | $ 5,000 | 28.6 | 14.3 |
| FLASH FLOOD | 7 | 2755 | 978 | 1777 | 2.8 | 6 | $ 18,244,041,078 | $ 16,822,723,978 | $ 1,421,317,100 | 11.9 | 7.4 |
| HAIL | 12 | 1376 | 15 | 1361 | 1.4 | 5 | $ 18,761,221,986 | $ 15,735,267,513 | $ 3,025,954,473 | 12.2 | 6.8 |
| TSTM WIND | 3 | 7461 | 504 | 6957 | 7.7 | 15 | $ 5,047,065,845 | $ 4,493,058,495 | $ 554,007,350 | 3.1 | 5.4 |
| DROUGHT | 115 | 4 | 0 | 4 | 0.0 | 7 | $ 15,018,672,000 | $ 1,046,106,000 | $ 13,972,566,000 | 9.7 | 4.9 |
| HURRICANE | 41 | 107 | 61 | 46 | 0.1 | 8 | $ 14,610,229,010 | $ 11,868,319,010 | $ 2,741,910,000 | 9.5 | 4.8 |
| EXCESSIVE HEAT | 2 | 8428 | 1903 | 6525 | 8.7 | 33 | $ 500,155,700 | $ 7,753,700 | $ 492,402,000 | 0.1 | 4.4 |
| ICE STORM | 8 | 2064 | 89 | 1975 | 2.1 | 10 | $ 8,967,041,360 | $ 3,944,927,860 | $ 5,022,113,500 | 5.7 | 3.9 |
| LIGHTNING | 5 | 6046 | 816 | 5230 | 6.2 | 28 | $ 942,471,520 | $ 930,379,430 | $ 12,092,090 | 0.4 | 3.3 |
| RIVER FLOOD | 121 | 4 | 2 | 2 | 0.0 | 9 | $ 10,148,404,500 | $ 5,118,945,500 | $ 5,029,459,000 | 6.5 | 3.3 |
| WINTER STORM | 10 | 1527 | 206 | 1321 | 1.6 | 12 | $ 6,715,441,251 | $ 6,688,497,251 | $ 26,944,000 | 4.2 | 2.9 |
| THUNDERSTORM WIND | 9 | 1621 | 133 | 1488 | 1.7 | 17 | $ 3,897,965,522 | $ 3,483,122,472 | $ 414,843,050 | 2.3 | 2.0 |
| HEAT | 6 | 3037 | 937 | 2100 | 3.1 | 35 | $ 403,258,500 | $ 1,797,000 | $ 401,461,500 | 0.0 | 1.6 |
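To make the scoring in Table 3 easier to audit, the sketch below recomputes the scores for four of its rows by hand. It assumes the min-max formula and the equal weights (0.5 and 0.5) shown in the code above, and it uses totals copied from Tables 1 to 3. The four rows were chosen because they include the smallest and largest value on each dimension within the candidate set, so the subset reproduces the same scale as the full table.

```r
# Min-max rescaling to [0, 1] (same formula as rng01, without NA handling)
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

# Totals copied from the tables in this report
total_health <- c(FLOOD = 7259, TORNADO = 96979, DROUGHT = 4, HEAT = 3037)
total_econ   <- c(FLOOD = 150319678257, TORNADO = 57362333887,
                  DROUGHT = 15018672000, HEAT = 403258500)

health_score <- 100 * minmax(total_health)
econ_score   <- 100 * minmax(total_econ)

# Equal weights, as in the report
combined <- 0.5 * health_score + 0.5 * econ_score

print(round(cbind(health_score, econ_score, combined), 1))
```

For example, DROUGHT scores 100 × (15,018,672,000 − 403,258,500) / (150,319,678,257 − 403,258,500) ≈ 9.7 on the economic scale and 0 on the health scale, giving a combined score of about 4.9. Because both scales are anchored to the extreme values in the candidate set, a single dominant event type (TORNADO for health, FLOOD for economy) compresses every other score toward zero.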
All figures, tables, and numerical summaries in this report are generated from the raw dataset file (repdata_data_StormData.csv.bz2) within this document. Re-knitting the document should reproduce the same results, provided the raw data file is available in the working directory.
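Since reproducibility depends on the raw file being present, a small guard at the top of the document can fail early with a readable message instead of a lower-level `read.csv` error. The helper name `require_raw_file` is hypothetical and not part of this report's code; this is only one possible sketch.

```r
# Hypothetical helper: stop with a clear message if the raw file is missing,
# otherwise return the path unchanged so it can be used in a pipeline
require_raw_file <- function(path) {
  if (!file.exists(path)) {
    stop("Raw data file not found: ", path, call. = FALSE)
  }
  invisible(path)
}

# Intended use (left as a comment so the sketch runs without the data file):
# file <- require_raw_file("repdata_data_StormData.csv.bz2")
```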