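This analysis uses the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database to answer two questions: which types of events are most harmful to population health, and which have the greatest economic consequences?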
Population health is measured as the combined total of fatalities and injuries. Economic consequences are measured as the combined total of property and crop damage in U.S. dollars after decoding the damage exponent fields.
The analysis begins with the original compressed NOAA storm database
file, StormData.csv.bz2. If the file is not present
locally, it is downloaded directly from the course source and read into
R within this document, so the full workflow remains reproducible from
the raw data.
Only a small set of transformations is applied, and each one is intended to make the raw variables suitable for answering the assignment questions. Event type labels are trimmed to remove leading and trailing whitespace so that equivalent categories are not treated as different groups because of formatting inconsistencies. No broader manual recoding of event names is performed, which helps preserve fidelity to the original NOAA classifications.
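The effect of trimming can be illustrated with a minimal sketch. The labels below are toy values chosen for illustration, not rows from the NOAA file, and the variable names are hypothetical.

```r
# Toy labels (an assumption for illustration): one category written
# with and without stray whitespace, plus a second category.
labels <- c("TORNADO", " TORNADO", "TORNADO ", "FLOOD")

# Without trimming, each spelling counts as its own group.
raw_groups <- length(unique(labels))

# trimws() strips leading and trailing whitespace only. Case and wording
# are untouched, so differently worded labels remain separate groups.
trimmed_groups <- length(unique(trimws(labels)))

print(c(raw = raw_groups, trimmed = trimmed_groups))
```

Because only whitespace is removed, differently worded labels such as TSTM WIND and THUNDERSTORM WIND still appear as separate rows in the summaries.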
To evaluate population health consequences, fatalities and injuries
are combined into a single health_impact measure. This is a
reasonable summary because these two variables are the direct health
outcomes recorded in the dataset, and the assignment asks which event
types are most harmful overall rather than separating deaths from
non-fatal injuries. To evaluate economic consequences, property damage
and crop damage are combined into a single economic_damage
measure so that the total cost of each event reflects both
infrastructure losses and agricultural losses.
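As a small illustration of the combined health measure, the sketch below sums fatalities and injuries per record and then totals them by event type. The data frame is a toy example with invented counts, and it uses base R rather than the dplyr pipeline applied in the analysis itself.

```r
# Toy records (invented counts, not NOAA data).
toy <- data.frame(
  event_type = c("TORNADO", "TORNADO", "FLOOD"),
  fatalities = c(1, 0, 2),
  injuries   = c(10, 3, 0)
)

# Combined measure per record: deaths plus non-fatal injuries.
toy$health_impact <- toy$fatalities + toy$injuries

# Total the combined measure by event type, then rank from highest to lowest.
totals <- aggregate(health_impact ~ event_type, data = toy, FUN = sum)
totals <- totals[order(-totals$health_impact), ]

print(totals)
```

The same pattern applies to economic_damage, with property and crop damage in place of fatalities and injuries.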
An additional transformation is required for the damage variables
because the raw dataset stores the numeric damage amount separately from
its exponent code. The exponent fields are decoded so that the letter
codes H, K, M, and B (matched without regard to case) become multipliers
of 100, 1,000, 1,000,000, and 1,000,000,000, and a single digit from 0
to 8 is read as that power of ten. Blank or unrecognized codes leave the
amount unchanged. This step is essential for expressing all damage
estimates in comparable dollar units before totals are calculated. After
these transformations, the data are aggregated by event type to identify
the categories associated with the greatest overall health burden and
economic loss across the United States.
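The decoding arithmetic can be sketched on a few toy values. This is a simplified illustration that covers only the letter codes, using a hypothetical lookup vector named exponent_lookup. It is not the function used in the analysis, and the amounts are invented.

```r
# Hypothetical lookup from letter code to multiplier (letters only;
# digit codes are left out of this simplified sketch).
exponent_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Toy amounts and exponent codes (invented, not NOAA data).
amount <- c(25, 2.5, 1.2, 10)
code   <- c("K", "m", "B", "")

# Upper-case the code and look it up. Codes that are not in the table,
# including the empty string, come back as NA and fall back to 1.
multiplier <- unname(exponent_lookup[toupper(code)])
multiplier[is.na(multiplier)] <- 1

# Damage in plain dollars: 25 with code K becomes 25000, and so on.
dollars <- amount * multiplier
print(dollars)
```

Falling back to a multiplier of 1 keeps records with blank codes in the totals at their face value instead of dropping them.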
library(dplyr)
library(ggplot2)
library(scales)
data_dir <- "data"
dir.create(data_dir, showWarnings = FALSE)
storm_file <- file.path(data_dir, "StormData.csv.bz2")
storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists(storm_file)) {
  download.file(storm_url, destfile = storm_file, mode = "wb")
}
storm_data <- read.csv(storm_file, stringsAsFactors = FALSE)
decode_exponent <- function(x) {
  values <- toupper(trimws(x))
  multipliers <- rep(1, length(values))
  multipliers[values == "H"] <- 1e2
  multipliers[values == "K"] <- 1e3
  multipliers[values == "M"] <- 1e6
  multipliers[values == "B"] <- 1e9
  digit_idx <- grepl("^[0-8]$", values)
  multipliers[digit_idx] <- 10 ^ as.numeric(values[digit_idx])
  multipliers
}
storm_tidy <- storm_data %>%
  transmute(
    event_type = trimws(EVTYPE),
    fatalities = FATALITIES,
    injuries = INJURIES,
    property_damage = PROPDMG * decode_exponent(PROPDMGEXP),
    crop_damage = CROPDMG * decode_exponent(CROPDMGEXP)
  ) %>%
  mutate(
    health_impact = fatalities + injuries,
    economic_damage = property_damage + crop_damage
  )
health_summary <- storm_tidy %>%
  group_by(event_type) %>%
  summarise(
    fatalities = sum(fatalities, na.rm = TRUE),
    injuries = sum(injuries, na.rm = TRUE),
    health_impact = sum(health_impact, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(health_impact))
economic_summary <- storm_tidy %>%
  group_by(event_type) %>%
  summarise(
    property_damage = sum(property_damage, na.rm = TRUE),
    crop_damage = sum(crop_damage, na.rm = TRUE),
    economic_damage = sum(economic_damage, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(economic_damage))
top_health <- slice_head(health_summary, n = 10)
top_economic <- slice_head(economic_summary, n = 10)
knitr::kable(top_health, digits = 0)
| event_type | fatalities | injuries | health_impact |
|---|---|---|---|
| TORNADO | 5633 | 91346 | 96979 |
| EXCESSIVE HEAT | 1903 | 6525 | 8428 |
| TSTM WIND | 504 | 6957 | 7461 |
| FLOOD | 470 | 6789 | 7259 |
| LIGHTNING | 816 | 5230 | 6046 |
| HEAT | 937 | 2100 | 3037 |
| FLASH FLOOD | 978 | 1777 | 2755 |
| ICE STORM | 89 | 1975 | 2064 |
| THUNDERSTORM WIND | 133 | 1488 | 1621 |
| WINTER STORM | 206 | 1321 | 1527 |
ggplot(top_health, aes(x = reorder(event_type, health_impact), y = health_impact)) +
  geom_col(fill = "#c0392b") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Population Health Impact",
    x = "Event Type",
    y = "Fatalities + Injuries"
  ) +
  theme_minimal(base_size = 12)
Figure 1. Top 10 NOAA event types ranked by total population health impact, measured as the combined number of fatalities and injuries across the United States.
Table 1 and Figure 1 show that tornadoes are the most harmful event type with respect to population health by a wide margin, with 96,979 combined fatalities and injuries. Excessive heat is the second largest contributor (8,428), while thunderstorm wind (recorded as TSTM WIND), flood, and lightning also account for substantial combined totals.
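The size of the tornado lead can be checked with a little arithmetic on the Table 1 totals. The sketch below copies the health_impact column by hand and compares tornadoes with the runner-up and with the top-ten total. The variable names are hypothetical.

```r
# health_impact values copied from Table 1 (top ten event types).
health_impact <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

# How many times larger the tornado total is than the second-ranked event.
ratio_to_second <- health_impact[["TORNADO"]] / health_impact[["EXCESSIVE HEAT"]]

# Tornado share of the combined total across these ten event types only.
share_of_top_ten <- health_impact[["TORNADO"]] / sum(health_impact)

print(round(ratio_to_second, 1))   # 11.5
print(round(share_of_top_ten, 2))  # 0.71
```

The share is relative to the ten listed event types, not to the full dataset, so it overstates the tornado share of all recorded casualties.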
knitr::kable(top_economic, digits = 0)
| event_type | property_damage | crop_damage | economic_damage |
|---|---|---|---|
| FLOOD | 144657709807 | 5661968450 | 150319678257 |
| HURRICANE/TYPHOON | 69305840000 | 2607872800 | 71913712800 |
| TORNADO | 56947380676 | 414953270 | 57362333946 |
| STORM SURGE | 43323536000 | 5000 | 43323541000 |
| HAIL | 15735267513 | 3025954473 | 18761221986 |
| FLASH FLOOD | 16822723978 | 1421317100 | 18244041078 |
| DROUGHT | 1046106000 | 13972566000 | 15018672000 |
| HURRICANE | 11868319010 | 2741910000 | 14610229010 |
| RIVER FLOOD | 5118945500 | 5029459000 | 10148404500 |
| ICE STORM | 3944927860 | 5022113500 | 8967041360 |
ggplot(top_economic, aes(x = reorder(event_type, economic_damage), y = economic_damage)) +
  geom_col(fill = "#1f78b4") +
  coord_flip() +
  scale_y_continuous(labels = label_dollar(scale = 1e-9, suffix = "B")) +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    x = "Event Type",
    y = "Property + Crop Damage (Billions USD)"
  ) +
  theme_minimal(base_size = 12)
Figure 2. Top 10 NOAA event types ranked by total economic damage, measured as the combined value of property and crop losses in U.S. dollars.
Table 2 and Figure 2 show that floods produce the largest overall economic losses in the dataset, about $150 billion after combining property and crop damage. Hurricanes/typhoons (about $72 billion), tornadoes, storm surge, and hail follow as the next most costly event categories.
In the NOAA storm database, tornadoes are the most harmful event type for population health, while floods have the greatest economic consequences. These results support prioritizing preparedness for tornado-related casualties and flood-related property and agricultural losses.