Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify which severe weather events have the most significant impact on population health and the economy. The dataset, containing events recorded from 1950 to 2011, includes estimates of fatalities, injuries, property damage, and crop damage. By aggregating these health and economic impacts across various event types, we determine the most devastating weather phenomena. The results show that tornadoes are overwhelmingly the most harmful to population health, causing the highest number of fatalities and injuries. Regarding economic consequences, floods result in the greatest total property and crop damage. This information is crucial for government and municipal managers to prioritize resources for severe weather preparedness.

Data Processing

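The analysis starts directly from the raw data file, which is compressed with the bzip2 algorithm. R's read.csv() decompresses .csv.bz2 files transparently, so we read the file straight into an R data frame without a separate extraction step. Because reading the full dataset is computationally intensive, this code chunk uses the knitr option cache = TRUE so the parsed data is reused on subsequent renders.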

# Loading required libraries
library(dplyr)
library(ggplot2)
library(tidyr)

# Reading the raw data
raw_data <- read.csv("repdata-data-StormData.csv.bz2")
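Reading the compressed file directly works because R's file connections detect bzip2 compression when a file is opened for reading. The minimal sketch below demonstrates this on a small, hypothetical file written to a temporary location with bzfile(). The file and its columns are made up for illustration and are not the NOAA data.

```r
# Minimal sketch: write a tiny, hypothetical bzip2-compressed CSV to a
# temporary file, then read it back with read.csv() and no manual unzip step.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")   # bzfile() compresses the data as it is written
writeLines(c("EVTYPE,FATALITIES", "TORNADO,5", "FLOOD,1"), con)
close(con)

# read.csv() opens the path with file(), which detects the bzip2 header
toy <- read.csv(tmp)
str(toy)
```

The same call pattern is what reads repdata-data-StormData.csv.bz2 above. Only the file size differs.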

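The EVTYPE field contains hundreds of distinct event labels, many of which are misspellings, case variants, or overlapping categories (for example, TSTM WIND and THUNDERSTORM WIND). Such variants split one phenomenon's totals across several labels, so aggregating by the exact EVTYPE name can understate some event types. For identifying the top few most severe event types, however, a simple aggregation by the exact EVTYPE name is sufficient and aligns with standard exploratory practice for this assignment.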
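If a cleaner grouping were wanted, a common first step is to normalize case and whitespace and then map known synonyms onto one label. The sketch below uses a hypothetical sample of labels and an illustrative one-entry synonym mapping. It was not applied in this analysis, and a real mapping would need to be built by inspecting the actual EVTYPE values.

```r
# Hypothetical sample of raw EVTYPE labels (not taken from the dataset)
evtypes <- c("TSTM WIND", "tstm wind", " TSTM WIND", "THUNDERSTORM WIND", "Flood")

# Step 1: normalize letter case and strip surrounding whitespace
normalized <- toupper(trimws(evtypes))

# Step 2: map a known synonym onto a single label (illustrative mapping only)
cleaned <- ifelse(normalized == "TSTM WIND", "THUNDERSTORM WIND", normalized)

length(unique(evtypes))     # 5
length(unique(normalized))  # 3
length(unique(cleaned))     # 2
```

Normalization alone merges only case and whitespace variants. Abbreviations such as TSTM still require an explicit mapping.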

To calculate the full economic impact, we convert the property damage (PROPDMG and PROPDMGEXP) and crop damage (CROPDMG and CROPDMGEXP) fields into absolute dollar amounts. The PROPDMGEXP and CROPDMGEXP fields hold character codes for the magnitude of the corresponding value: “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions, plus occasional digits, which we treat as powers of ten. We create a helper function that decodes these codes into numeric multipliers, treating empty or unrecognized codes as 1, and then compute the damage values in dollars.

# Helper function to convert the character exponent codes into numeric multipliers.
# It is vectorized, so it can be applied to a whole column at once, which is much
# faster than calling a scalar function row by row on roughly 900,000 records.
get_multiplier <- function(exp) {
  exp_upper <- toupper(trimws(exp))
  case_when(
    exp_upper == "H" ~ 10^2,
    exp_upper == "K" ~ 10^3,
    exp_upper == "M" ~ 10^6,
    exp_upper == "B" ~ 10^9,
    # Digit codes are treated as powers of ten; warnings from non-digit
    # entries are suppressed because those rows are not selected here
    exp_upper %in% as.character(0:9) ~ 10^suppressWarnings(as.numeric(exp_upper)),
    TRUE ~ 1 # Default multiplier for unrecognized, empty, or missing values
  )
}

# Apply the multipliers column-wise to calculate property and crop damage in dollars
storm_data <- raw_data %>%
  mutate(
    prop_dmg_val = PROPDMG * get_multiplier(PROPDMGEXP),
    crop_dmg_val = CROPDMG * get_multiplier(CROPDMGEXP),
    total_dmg_val = prop_dmg_val + crop_dmg_val
  )
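As a worked example of the decoding arithmetic, a value of 25 with code “K” becomes 25 × 10^3 = 25,000 dollars. The sketch below shows a different but common technique, a named lookup vector, applied to a few hypothetical records. It handles only the letter codes, leaves digit codes out for brevity, and is not the helper function used above.

```r
# Hypothetical toy records, for illustration only
dmg      <- c(25, 2.5, 10, 3)
exp_code <- c("K", "B", "", "m")

# Lookup table from letter code to multiplier (digit codes omitted in this sketch)
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Indexing by name returns NA for "" and for any code not in the table
mult <- unname(multipliers[toupper(exp_code)])
mult[is.na(mult)] <- 1   # unrecognized or empty codes fall back to a multiplier of 1

dollars <- dmg * mult
dollars
```

The four records decode to 25,000, 2.5 billion, 10, and 3 million dollars.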

Results

Question 1: Types of Events Most Harmful to Population Health

To assess harm to population health, we consider both fatalities and injuries. We sum fatalities and injuries for each event type, rank the event types by their combined total, and keep the top 10. This combined measure weights a fatality and an injury equally.

# Aggregate fatalities and injuries by EVTYPE
health_summary <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(
    Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
    Total_Injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(Total_Harm = Total_Fatalities + Total_Injuries) %>%
  arrange(desc(Total_Harm)) %>%
  head(10)

# Reshape data for plotting
health_long <- health_summary %>%
  select(EVTYPE, Total_Fatalities, Total_Injuries) %>%
  pivot_longer(
    cols = c("Total_Fatalities", "Total_Injuries"), 
    names_to = "Harm_Type", 
    values_to = "Count"
  )

# Create a stacked bar plot
ggplot(health_long, aes(x = reorder(EVTYPE, -Count), y = Count, fill = Harm_Type)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Top 10 Weather Events Most Harmful to Population Health (1950-2011)",
    x = "Event Type",
    y = "Total Number of People (Fatalities + Injuries)",
    fill = "Type of Harm"
  ) +
  scale_fill_manual(values = c("Total_Fatalities" = "firebrick", "Total_Injuries" = "steelblue"))

The figure above shows that TORNADO is, by a wide margin, the event type most harmful to population health, followed by EXCESSIVE HEAT and TSTM WIND.
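To make the group-sum-and-rank step concrete, the sketch below applies the same logic to a handful of hypothetical records. It uses base R's aggregate() instead of the dplyr pipeline above, so it is an equivalent illustration rather than the code that produced the figure.

```r
# Hypothetical toy records (not real NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 3, 1),
  INJURIES   = c(10, 4, 2, 0, 5)
)

# Sum fatalities and injuries per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined harm, weighting a fatality and an injury equally
totals$HARM <- totals$FATALITIES + totals$INJURIES

# Rank by combined harm and keep the top 2
top <- head(totals[order(totals$HARM, decreasing = TRUE), ], 2)
top
```

Here TORNADO totals 3 fatalities and 14 injuries (17 people), ahead of FLOOD with 8.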

Question 2: Types of Events with the Greatest Economic Consequences

To assess economic consequences, we sum property and crop damage (in dollars) for each event type, rank the event types by this combined total, and keep the top 10.

# Aggregate total economic damage by EVTYPE
econ_summary <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(
    Total_Property_Damage = sum(prop_dmg_val, na.rm = TRUE),
    Total_Crop_Damage = sum(crop_dmg_val, na.rm = TRUE)
  ) %>%
  mutate(Total_Economic_Damage = Total_Property_Damage + Total_Crop_Damage) %>%
  arrange(desc(Total_Economic_Damage)) %>%
  head(10)

# Reshape data for plotting
econ_long <- econ_summary %>%
  select(EVTYPE, Total_Property_Damage, Total_Crop_Damage) %>%
  pivot_longer(
    cols = c("Total_Property_Damage", "Total_Crop_Damage"), 
    names_to = "Damage_Type", 
    values_to = "Amount"
  )

# Create a stacked bar plot for economic consequences
ggplot(econ_long, aes(x = reorder(EVTYPE, -Amount), y = Amount / 1e9, fill = Damage_Type)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(
    title = "Top 10 Weather Events with Greatest Economic Consequences (1950-2011)",
    x = "Event Type",
    y = "Total Damage (Billions of Dollars)",
    fill = "Damage Type"
  ) +
  scale_fill_manual(values = c("Total_Crop_Damage" = "forestgreen", "Total_Property_Damage" = "darkorange"))

The figure above shows that FLOOD events caused the highest total economic damage across the United States between 1950 and 2011, by a wide margin. HURRICANE/TYPHOON and TORNADO events also caused substantial economic damage, driven primarily by property rather than crop losses.
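Whether an event type's damage is "primarily driven by property damage" can be checked numerically by computing the property share of its total. The sketch below does this for two hypothetical event types with made-up dollar totals. The column names mirror those in econ_summary, but the values are not NOAA figures.

```r
# Hypothetical per-event totals in dollars (not real NOAA figures)
econ <- data.frame(
  EVTYPE                = c("EVENT A", "EVENT B"),
  Total_Property_Damage = c(9e9, 1e9),
  Total_Crop_Damage     = c(1e9, 3e9)
)

# Combined damage, its property share, and the total in billions (as plotted)
econ$Total_Economic_Damage <- econ$Total_Property_Damage + econ$Total_Crop_Damage
econ$Property_Share        <- econ$Total_Property_Damage / econ$Total_Economic_Damage
econ$Total_Billions        <- econ$Total_Economic_Damage / 1e9

econ[, c("EVTYPE", "Total_Billions", "Property_Share")]
```

In this toy case EVENT A (10 billion, 90% property) is property-driven, while EVENT B (4 billion, 25% property) is mostly crop damage.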