Synopsis

This analysis uses the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database to answer two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Population health is measured as the combined total of fatalities and injuries. Economic consequences are measured as the combined total of property and crop damage in U.S. dollars after decoding the damage exponent fields.

Data Processing

The analysis begins with the original compressed NOAA storm database file, StormData.csv.bz2. If the file is not present locally, it is downloaded directly from the course source and read into R within this document, so the full workflow remains reproducible from the raw data.

Only a small set of transformations is applied, and each one is intended to make the raw variables suitable for answering the assignment questions. Event type labels are trimmed to remove leading and trailing whitespace so that equivalent categories are not treated as different groups because of formatting inconsistencies. No broader manual recoding of event names is performed, which helps preserve fidelity to the original NOAA classifications.
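The effect of trimming can be shown on a few labels. The whitespace variants of TORNADO below are hypothetical, made up for illustration; TSTM WIND and THUNDERSTORM WIND are real labels that appear in the results tables. Trimming collapses the whitespace variants into one group, but it leaves differently spelled categories separate because no manual recoding is done.

```r
# Hypothetical raw labels: the whitespace variants of TORNADO are invented for
# illustration; TSTM WIND and THUNDERSTORM WIND both appear in the NOAA data.
raw_labels <- c("TORNADO", " TORNADO", "TORNADO  ", "TSTM WIND", "THUNDERSTORM WIND")

# Distinct groups before and after removing leading/trailing whitespace.
before <- unique(raw_labels)
after <- unique(trimws(raw_labels))

length(before)  # 5 distinct labels
length(after)   # 3 distinct labels
after           # "TORNADO" "TSTM WIND" "THUNDERSTORM WIND"
```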

To evaluate population health consequences, fatalities and injuries are combined into a single health_impact measure. This is a reasonable summary because these two variables are the direct health outcomes recorded in the dataset, and the assignment asks which event types are most harmful overall rather than separating deaths from non-fatal injuries. To evaluate economic consequences, property damage and crop damage are combined into a single economic_damage measure so that the total cost of each event reflects both infrastructure losses and agricultural losses.
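A minimal illustration of the two combined measures, using two made-up records (the values are invented and are not NOAA data). Each measure is a plain row-wise sum, so a record with 2 fatalities and 15 injuries has a health_impact of 17. The damage columns are assumed to be already decoded into dollars.

```r
# Two made-up event records; the values are illustrative, not NOAA data.
events <- data.frame(
  event_type      = c("TORNADO", "FLOOD"),
  fatalities      = c(2, 0),
  injuries        = c(15, 3),
  property_damage = c(250000, 1200000),  # assumed already in U.S. dollars
  crop_damage     = c(0, 400000)         # assumed already in U.S. dollars
)

# Row-wise sums define the two outcome measures.
events$health_impact   <- events$fatalities + events$injuries
events$economic_damage <- events$property_damage + events$crop_damage

events[, c("event_type", "health_impact", "economic_damage")]
```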

An additional transformation is required for the damage variables because the raw dataset stores the numeric damage amount separately from its exponent code. The exponent fields are decoded before totals are calculated: the letter codes H, K, M, and B (case-insensitive) become multipliers of 10^2, 10^3, 10^6, and 10^9, single-digit codes are read as powers of ten, and blanks or any other codes are given a multiplier of 1. This step is essential for expressing all damage estimates in comparable dollar units. After these transformations, the data are aggregated by event type to identify the categories associated with the greatest overall health burden and economic loss across the United States.
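The decoding rule can be illustrated with a small lookup-table variant. This is a sketch, not the analysis code: the names exp_lookup and decode_with_lookup are hypothetical, and treating digit codes as powers of ten is an assumption about how those codes should be interpreted. It applies the mapping through a named vector, so any code that is not in the table falls back to a multiplier of 1.

```r
# Hypothetical lookup table from exponent code to multiplier. Digit codes are
# ASSUMED to be powers of ten.
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                setNames(10^(0:9), as.character(0:9)))

decode_with_lookup <- function(code) {
  key <- toupper(trimws(code))          # case-insensitive, whitespace-tolerant
  mult <- unname(exp_lookup[key])       # unknown codes (blanks, "+", ...) give NA
  mult[is.na(mult)] <- 1                # ... which are then treated as 1
  mult
}

codes <- c("K", "m", " B", "", "5", "+")
decode_with_lookup(codes)        # multipliers 1e3, 1e6, 1e9, 1, 1e5, 1
25 * decode_with_lookup("M")     # an amount of 25 with code M is 25,000,000 dollars
```

A named-vector lookup keeps the whole mapping in one place, which makes it easy to audit. The successive-assignment approach used in the analysis code produces the same multipliers for the letter codes.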

library(dplyr)
library(ggplot2)
library(scales)
data_dir <- "data"
dir.create(data_dir, showWarnings = FALSE)

storm_file <- file.path(data_dir, "StormData.csv.bz2")
storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

if (!file.exists(storm_file)) {
  download.file(storm_url, destfile = storm_file, mode = "wb")
}
storm_data <- read.csv(storm_file, stringsAsFactors = FALSE)
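decode_exponent <- function(x) {
  # Normalize the exponent codes: uppercase and strip surrounding whitespace.
  values <- toupper(trimws(x))
  # Default multiplier of 1 covers blanks and unrecognized symbols.
  multipliers <- rep(1, length(values))
  multipliers[values == "H"] <- 1e2  # hundreds
  multipliers[values == "K"] <- 1e3  # thousands
  multipliers[values == "M"] <- 1e6  # millions
  multipliers[values == "B"] <- 1e9  # billions

  # Single-digit codes are treated as powers of ten.
  digit_idx <- grepl("^[0-9]$", values)
  multipliers[digit_idx] <- 10 ^ as.numeric(values[digit_idx])

  multipliers
}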

storm_tidy <- storm_data %>%
  transmute(
    event_type = trimws(EVTYPE),
    fatalities = FATALITIES,
    injuries = INJURIES,
    property_damage = PROPDMG * decode_exponent(PROPDMGEXP),
    crop_damage = CROPDMG * decode_exponent(CROPDMGEXP)
  ) %>%
  mutate(
    health_impact = fatalities + injuries,
    economic_damage = property_damage + crop_damage
  )
health_summary <- storm_tidy %>%
  group_by(event_type) %>%
  summarise(
    fatalities = sum(fatalities, na.rm = TRUE),
    injuries = sum(injuries, na.rm = TRUE),
    health_impact = sum(health_impact, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(health_impact))

economic_summary <- storm_tidy %>%
  group_by(event_type) %>%
  summarise(
    property_damage = sum(property_damage, na.rm = TRUE),
    crop_damage = sum(crop_damage, na.rm = TRUE),
    economic_damage = sum(economic_damage, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(economic_damage))

top_health <- slice_head(health_summary, n = 10)
top_economic <- slice_head(economic_summary, n = 10)

Results

Most Harmful Events for Population Health

knitr::kable(top_health, digits = 0)
Table 1. Top 10 event types by population health impact (fatalities + injuries).

event_type          fatalities  injuries  health_impact
TORNADO                   5633     91346          96979
EXCESSIVE HEAT            1903      6525           8428
TSTM WIND                  504      6957           7461
FLOOD                      470      6789           7259
LIGHTNING                  816      5230           6046
HEAT                       937      2100           3037
FLASH FLOOD                978      1777           2755
ICE STORM                   89      1975           2064
THUNDERSTORM WIND          133      1488           1621
WINTER STORM               206      1321           1527
ggplot(top_health, aes(x = reorder(event_type, health_impact), y = health_impact)) +
  geom_col(fill = "#c0392b") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Population Health Impact",
    x = "Event Type",
    y = "Fatalities + Injuries"
  ) +
  theme_minimal(base_size = 12)
Figure 1. Top 10 NOAA event types ranked by total population health impact, measured as the combined number of fatalities and injuries across the United States.

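Table 1 and Figure 1 show that tornadoes are by far the most harmful event type with respect to population health, with 96,979 combined fatalities and injuries compared with 8,428 for excessive heat, the second largest contributor. Thunderstorm wind (recorded as TSTM WIND), flood, and lightning also account for substantial totals. Because event labels are not recoded, closely related categories such as TSTM WIND and THUNDERSTORM WIND, or EXCESSIVE HEAT and HEAT, are counted separately, so the ordering below tornado could change if such labels were merged.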
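The size of the tornado margin can be quantified directly from the Table 1 totals. This sketch simply copies those ten values into a named vector. It then computes how many times larger the tornado total is than the runner-up, and what share of the top-10 combined total tornadoes account for.

```r
# Health-impact totals (fatalities + injuries) copied from Table 1.
health_impact <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037, "FLASH FLOOD" = 2755,
  "ICE STORM" = 2064, "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

# How many times larger the tornado total is than the second-ranked type.
ratio_to_second <- health_impact[["TORNADO"]] / health_impact[["EXCESSIVE HEAT"]]
round(ratio_to_second, 1)        # 11.5

# Tornado share of the combined top-10 total (137,177).
tornado_share <- health_impact[["TORNADO"]] / sum(health_impact)
round(100 * tornado_share, 1)    # 70.7
```

Tornadoes account for roughly seven in ten of the casualties among the ten listed event types. The share is relative to the top 10 only, not to all event types in the database.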

Events With the Greatest Economic Consequences

knitr::kable(top_economic, digits = 0)
Table 2. Top 10 event types by economic damage (property + crop damage, U.S. dollars).

event_type          property_damage  crop_damage  economic_damage
FLOOD                  144657709807   5661968450     150319678257
HURRICANE/TYPHOON       69305840000   2607872800      71913712800
TORNADO                 56947380676    414953270      57362333946
STORM SURGE             43323536000         5000      43323541000
HAIL                    15735267513   3025954473      18761221986
FLASH FLOOD             16822723978   1421317100      18244041078
DROUGHT                  1046106000  13972566000      15018672000
HURRICANE               11868319010   2741910000      14610229010
RIVER FLOOD              5118945500   5029459000      10148404500
ICE STORM                3944927860   5022113500       8967041360
ggplot(top_economic, aes(x = reorder(event_type, economic_damage), y = economic_damage)) +
  geom_col(fill = "#1f78b4") +
  coord_flip() +
  scale_y_continuous(labels = label_dollar(scale = 1e-9, suffix = "B")) +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    x = "Event Type",
    y = "Property + Crop Damage (Billions USD)"
  ) +
  theme_minimal(base_size = 12)
Figure 2. Top 10 NOAA event types ranked by total economic damage, measured as the combined value of property and crop losses in U.S. dollars.

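Table 2 and Figure 2 show that floods produce the largest overall economic losses in the dataset after combining property and crop damage, about $150.3 billion. Hurricanes/typhoons ($71.9 billion, roughly half the flood total), tornadoes, storm surge, and hail follow as the next most costly event categories. Because related labels are not merged, HURRICANE is listed separately from HURRICANE/TYPHOON.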
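Table 2 also allows a direct comparison of the property/crop split. The sketch below copies a handful of rows from Table 2 (the selection of rows is arbitrary). For each one, it computes the percentage of the combined loss that comes from crop damage.

```r
# Property and crop damage (U.S. dollars) copied from selected rows of Table 2.
damage <- data.frame(
  event_type      = c("FLOOD", "HURRICANE/TYPHOON", "DROUGHT", "RIVER FLOOD", "ICE STORM"),
  property_damage = c(144657709807, 69305840000, 1046106000, 5118945500, 3944927860),
  crop_damage     = c(5661968450, 2607872800, 13972566000, 5029459000, 5022113500)
)

# Share of each event type's combined loss that comes from crop damage.
damage$crop_share <- damage$crop_damage / (damage$property_damage + damage$crop_damage)
damage$crop_pct <- round(100 * damage$crop_share, 1)
damage[, c("event_type", "crop_pct")]
```

On these rows, drought losses are about 93% crop damage and ice storm losses are slightly over half crop damage. Flood and hurricane/typhoon losses are each under 4% crop damage. This is consistent with drought ranking in the top 10 almost entirely because of agricultural losses.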

Conclusion

Using the NOAA storm database, tornadoes are the most harmful event type for population health, while floods have the greatest economic consequences. These results support prioritizing preparedness for tornado-related casualties and flood-related property and agricultural losses.