Impact of Severe Weather Events on Population Health and Economy

Course Project 2

Author

Christoph Elmiger

Published

February 8, 2025

Synopsis

This analysis explores the NOAA Storm Database to determine which severe weather events are most harmful to population health and the economy in the United States. The dataset contains records of storms, floods, tornadoes, and other natural disasters, along with their associated fatalities, injuries, and financial damages.

The study first cleans the data by replacing missing values with 0 and then derives aggregated measures per event type. The impact on population health is assessed through the total number of injuries and fatalities per event type, and the economic impact through property and crop damages. Results indicate that tornadoes cause the highest number of casualties, while floods and hurricanes contribute the most to economic losses.

The findings highlight the importance of preparing for and mitigating the impact of these high-risk events.

Data Processing

The data is obtained from the NOAA Storm Database in CSV format. It contains records of weather events along with their associated casualties and economic damages.

The data is first loaded using read_csv(), and missing values in critical columns (FATALITIES, INJURIES, PROPDMG, CROPDMG) are replaced with 0.

A new CASUALTIES variable is created to represent the total number of fatalities and injuries, while DMG captures the total economic impact (sum of property and crop damages).

The dataset contains 977 unique event types. No attempt was made to consolidate similar event names, so naming variants of the same kind of event are counted as separate types.
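If consolidating event names were desired, one possible first step is to normalize letter case and whitespace before grouping. The sketch below is not part of the original analysis. The toy labels are made up for illustration, and a real consolidation would likely need an additional manual mapping of synonyms.

```r
library(dplyr)
library(stringr)

# Toy event labels, made up for illustration (not taken from the dataset)
toy <- tibble(EVTYPE = c("TORNADO", "Tornado", " TORNADO ",
                         "FLASH  FLOOD", "FLASH FLOOD"))

toy_clean <- toy %>%
  # Unify letter case, trim the ends, and collapse repeated spaces
  mutate(EVTYPE_CLEAN = str_squish(str_to_upper(EVTYPE)))

n_distinct(toy$EVTYPE)              # distinct labels before normalization
n_distinct(toy_clean$EVTYPE_CLEAN)  # distinct labels after normalization
```

Grouping by a column like EVTYPE_CLEAN instead of EVTYPE would then merge labels that differ only in case or spacing.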

# Load required libraries
library(tidyverse) # For data manipulation and visualization (dplyr, ggplot2)
library(scales)    # For custom scaled graph labels (e.g., millions)

# Load dataset from CSV file (suppress column type messages)
df <- read_csv("storm_data.csv", show_col_types = FALSE)

# Data cleaning: Replace missing values in key numerical columns with 0
df <- df %>%
  mutate(across(c(FATALITIES, INJURIES, PROPDMG, CROPDMG), ~ replace_na(.x, 0))) %>%
  
  # Create new variables:
  # - CASUALTIES: Total harm to population (fatalities + injuries)
  # - DMG: Total economic impact (property + crop damage)
  mutate(CASUALTIES = FATALITIES + INJURIES,
         DMG = PROPDMG + CROPDMG)

# Create summary table aggregating event impact
df_full_summary <- df %>%
  group_by(EVTYPE) %>% # Group data by event type
  
  summarise(
    # Count occurrences of each event type
    EVENT_COUNT = n(),

    # Summarize total and average casualties
    CASUALTIES_TOTAL = sum(CASUALTIES),
    CASUALTIES_AVG = round(mean(CASUALTIES),1),
    
    # Summarize total and average fatalities
    FATALITIES_TOTAL = sum(FATALITIES),
    FATALITIES_AVG = round(mean(FATALITIES),1),
    
    # Summarize total and average injuries
    INJURIES_TOTAL = sum(INJURIES),
    INJURIES_AVG = round(mean(INJURIES),1),
    
    # Summarize total and average property damage
    PROPDMG_TOTAL = sum(PROPDMG),
    PROPDMG_AVG = round(mean(PROPDMG),1),
    
    # Summarize total and average crop damage
    CROPDMG_TOTAL = sum(CROPDMG),
    CROPDMG_AVG = round(mean(CROPDMG),1),
    
    # Summarize total and average overall economic damage
    DMG_TOTAL = sum(DMG),
    DMG_AVG = round(mean(DMG),1)
  ) %>%
  ungroup() # Remove grouping after summarization
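The DMG variable above adds PROPDMG and CROPDMG as recorded. In the commonly distributed version of this dataset, these amounts come with magnitude codes in separate PROPDMGEXP and CROPDMGEXP columns (for example K, M, B). The following is a minimal sketch of how such codes could be applied before summing, assuming those two columns are present in storm_data.csv. The helper exp_multiplier(), the toy rows, and the choice to treat unrecognized or missing codes as a multiplier of 1 are assumptions of this sketch, not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Unrecognized or missing codes fall back to 1 (an assumption of this sketch).
exp_multiplier <- function(code) {
  code <- toupper(code)
  case_when(
    code == "H" ~ 1e2,  # hundreds
    code == "K" ~ 1e3,  # thousands
    code == "M" ~ 1e6,  # millions
    code == "B" ~ 1e9,  # billions
    TRUE        ~ 1
  )
}

# Toy rows, made up for illustration
toy <- tibble(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", NA),
  CROPDMG    = c(0, 10, 0),
  CROPDMGEXP = c(NA, "k", NA)
)

toy_usd <- toy %>%
  mutate(
    PROPDMG_USD = PROPDMG * exp_multiplier(PROPDMGEXP),
    CROPDMG_USD = CROPDMG * exp_multiplier(CROPDMGEXP),
    DMG_USD     = PROPDMG_USD + CROPDMG_USD
  )
```

Whether and how to apply these codes changes the ranking of event types by economic damage, so the choice should be stated alongside any damage totals.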

Results

The following analyses highlight the most harmful weather events in terms of casualties and economic damages. The first visualization shows the top event types that cause the most injuries and fatalities. The second focuses solely on fatalities, and the third presents total economic damages.

Casualties as representation of harm to population health

The following visualization presents the top 7 event types that caused the highest number of casualties (fatalities and injuries combined). All other event types are grouped as "Other" for clarity.
The stacked bar chart distinguishes between injuries (orange) and fatalities (red), highlighting the most dangerous weather events for public safety.

# Define the number of top event types to display
top_n <- 7

# Identify the top n event types with the highest total casualties (injuries + fatalities)
top_evtypes_casualties <- df_full_summary %>%
  arrange(desc(CASUALTIES_TOTAL)) %>% # Sort events by highest casualty count
  slice_head(n = top_n) %>% # Select the top n event types
  pull(EVTYPE) # Extract event names into a vector

# Create a filtered dataset containing only the top n most harmful event types
df_top_casualties <- df_full_summary %>%
  filter(EVTYPE %in% top_evtypes_casualties)

# Aggregate all remaining event types into a single "Other" category
df_other_casualties <- df_full_summary %>%
  filter(!EVTYPE %in% top_evtypes_casualties) %>%
  summarise(
    EVTYPE = "Other", # Label the grouped category as "Other"
    
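    # Aggregate counts and totals for all non-top event types.
    # summarise() evaluates these expressions in order, so each *_TOTAL and
    # EVENT_COUNT used in an average below is already the aggregated value.
    # Averages are pooled (total / event count): a plain mean of the per-type
    # averages would weight rare and frequent event types equally.
    EVENT_COUNT = sum(EVENT_COUNT),
    CASUALTIES_TOTAL = sum(CASUALTIES_TOTAL),
    CASUALTIES_AVG = round(CASUALTIES_TOTAL / EVENT_COUNT, 1),

    FATALITIES_TOTAL = sum(FATALITIES_TOTAL),
    FATALITIES_AVG = round(FATALITIES_TOTAL / EVENT_COUNT, 1),

    INJURIES_TOTAL = sum(INJURIES_TOTAL),
    INJURIES_AVG = round(INJURIES_TOTAL / EVENT_COUNT, 1),

    PROPDMG_TOTAL = sum(PROPDMG_TOTAL),
    PROPDMG_AVG = round(PROPDMG_TOTAL / EVENT_COUNT, 1),

    CROPDMG_TOTAL = sum(CROPDMG_TOTAL),
    CROPDMG_AVG = round(CROPDMG_TOTAL / EVENT_COUNT, 1),

    # Total economic damage from the already aggregated components
    DMG_TOTAL = PROPDMG_TOTAL + CROPDMG_TOTAL,
    DMG_AVG = round(DMG_TOTAL / EVENT_COUNT, 1)
  )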

# Combine the top n event types with the aggregated "Other" category
df_summary <- bind_rows(df_top_casualties, df_other_casualties) %>%
  arrange(desc(CASUALTIES_TOTAL))  # Ensure final dataset is sorted in descending order

# Reshape data into long format for easier plotting of stacked bars
df_long <- df_summary %>%
  select(EVTYPE, FATALITIES_TOTAL, INJURIES_TOTAL) %>%
  pivot_longer(cols = c(FATALITIES_TOTAL, INJURIES_TOTAL),
               names_to = "Casualty_Type", values_to = "Count")

# Create stacked bar plot showing fatalities and injuries per event type
ggplot(data = df_long, aes(x = reorder(EVTYPE, Count), y = Count, fill = Casualty_Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +  # Flip axes for better readability
  scale_fill_manual(values = c("FATALITIES_TOTAL" = "red", "INJURIES_TOTAL" = "orange")) +
  labs(title = "Number of Casualties by EVTYPE",
       subtitle = "Other contains all EVTYPES not listed",
       x = "Event Type",
       y = "Casualties",
       fill = "Casualty Type")
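The "top n plus Other" grouping used above could also be expressed more compactly with forcats::fct_lump_n(), which keeps the n levels with the largest total weight and lumps the remaining levels together. This is an alternative sketch on made-up toy data, not the approach used in this report. It works on event-level rows rather than on the summary table.

```r
library(dplyr)
library(forcats)

# Toy event-level records, made up for illustration
toy <- tibble(
  EVTYPE     = c("A", "A", "B", "C", "D", "E"),
  CASUALTIES = c(10, 5, 8, 1, 2, 1)
)

toy_summary <- toy %>%
  # Keep the 2 event types with the largest total casualties; lump the rest
  mutate(EVTYPE_LUMPED = fct_lump_n(EVTYPE, n = 2, w = CASUALTIES,
                                    other_level = "Other")) %>%
  group_by(EVTYPE_LUMPED) %>%
  summarise(EVENT_COUNT = n(),
            CASUALTIES_TOTAL = sum(CASUALTIES),
            .groups = "drop")

print(toy_summary)
```

Because the lumping happens before summarising, totals and per-event averages for "Other" come directly from the underlying rows instead of from previously aggregated values.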

Fatalities as representation of harm to population health

The following visualization presents the top 7 event types that resulted in the highest number of fatalities across the United States. Unlike the previous analysis, which included both injuries and fatalities, this focuses solely on fatalities, which represent the most severe impact on human life. All other event types are grouped as "Other" for clarity.
The horizontal bar chart highlights the deadliest weather events, emphasizing the need for targeted mitigation strategies.

# Define the number of top event types to display
top_n <- 7

# Identify the top n event types with the highest total fatalities
top_evtypes <- df_full_summary %>%
  arrange(desc(FATALITIES_TOTAL)) %>% # Sort event types by highest fatality count
  slice_head(n = top_n) %>% # Select the top n event types
  pull(EVTYPE)  # Extract event names into a vector

# Create a filtered dataset containing only the top n most deadly event types
df_top <- df_full_summary %>%
  filter(EVTYPE %in% top_evtypes) %>% # Keep only top event types
  arrange(desc(FATALITIES_TOTAL)) # Ensure they remain sorted

# Aggregate all remaining event types into a single "Other" category
df_other <- df_full_summary %>%
  filter(!EVTYPE %in% top_evtypes) %>% # Select all non-top event types
  summarise(
    EVTYPE = "Other", # Label the grouped category as "Other"

    # Aggregate counts and totals for all non-top event types.
    # summarise() evaluates these expressions in order, so each *_TOTAL and
    # EVENT_COUNT used in an average below is already the aggregated value.
    # Averages are pooled (total / event count): a plain mean of the per-type
    # averages would weight rare and frequent event types equally.
    EVENT_COUNT = sum(EVENT_COUNT),
    CASUALTIES_TOTAL = sum(CASUALTIES_TOTAL),
    CASUALTIES_AVG = round(CASUALTIES_TOTAL / EVENT_COUNT, 1),

    FATALITIES_TOTAL = sum(FATALITIES_TOTAL),
    FATALITIES_AVG = round(FATALITIES_TOTAL / EVENT_COUNT, 1),

    INJURIES_TOTAL = sum(INJURIES_TOTAL),
    INJURIES_AVG = round(INJURIES_TOTAL / EVENT_COUNT, 1),

    PROPDMG_TOTAL = sum(PROPDMG_TOTAL),
    PROPDMG_AVG = round(PROPDMG_TOTAL / EVENT_COUNT, 1),

    CROPDMG_TOTAL = sum(CROPDMG_TOTAL),
    CROPDMG_AVG = round(CROPDMG_TOTAL / EVENT_COUNT, 1),

    # Keep the damage total so the "Other" row has no missing DMG_TOTAL
    DMG_TOTAL = sum(DMG_TOTAL)
  )

# Combine the top n event types with the aggregated "Other" category
df_summary <- bind_rows(df_top, df_other)

# Create a bar plot to visualize fatalities by event type
ggplot(data = df_summary, aes(x = reorder(EVTYPE, FATALITIES_TOTAL), y = FATALITIES_TOTAL)) +
  # Use actual values (not counts). The fixed colour is set outside aes();
  # inside aes(), "red" would be treated as a data value and mapped to the
  # default palette instead of drawing red bars.
  geom_bar(stat = "identity", fill = "red") +
  coord_flip() +  # Flip axes for better readability
  labs(title = "Fatalities by EVTYPE",
       subtitle = "Other contains all EVTYPES not listed",
       x = "Event Type",
       y = "Total Fatalities")

Damages as representation of economic consequences

The following visualization identifies the top 7 weather event types that caused the most economic damage, measured as the total sum of property and crop damage. The stacked bar chart distinguishes between property damage (navy blue) and crop damage (dark green) to show the relative impact on infrastructure versus agriculture. All other event types are grouped as "Other".
This analysis helps highlight which disasters have the most significant financial consequences, informing resource allocation and disaster preparedness efforts.

# Define the number of top event types to display
top_n <- 7  

# Identify the top n event types with the highest total economic damages
top_evtypes <- df_full_summary %>%
  arrange(desc(DMG_TOTAL)) %>%  # Sort events by highest total damage
  slice_head(n = top_n) %>%  # Select the top n event types
  pull(EVTYPE)  # Extract event names into a vector

# Create a filtered dataset containing only the top n most damaging event types
df_top <- df_full_summary %>%
  filter(EVTYPE %in% top_evtypes)  # Keep only top event types

# Aggregate all remaining event types into a single "Other" category
df_other <- df_full_summary %>%
  filter(!EVTYPE %in% top_evtypes) %>%  # Select all non-top event types
  summarise(
    EVTYPE = "Other",  # Label the grouped category as "Other"

    # Aggregate counts and totals for all non-top event types.
    # summarise() evaluates these expressions in order, so each *_TOTAL and
    # EVENT_COUNT used in an average below is already the aggregated value.
    # Averages are pooled (total / event count): a plain mean of the per-type
    # averages would weight rare and frequent event types equally.
    EVENT_COUNT = sum(EVENT_COUNT),
    CASUALTIES_TOTAL = sum(CASUALTIES_TOTAL),
    CASUALTIES_AVG = round(CASUALTIES_TOTAL / EVENT_COUNT, 1),

    FATALITIES_TOTAL = sum(FATALITIES_TOTAL),
    FATALITIES_AVG = round(FATALITIES_TOTAL / EVENT_COUNT, 1),

    INJURIES_TOTAL = sum(INJURIES_TOTAL),
    INJURIES_AVG = round(INJURIES_TOTAL / EVENT_COUNT, 1),

    PROPDMG_TOTAL = sum(PROPDMG_TOTAL),
    PROPDMG_AVG = round(PROPDMG_TOTAL / EVENT_COUNT, 1),

    CROPDMG_TOTAL = sum(CROPDMG_TOTAL),
    CROPDMG_AVG = round(CROPDMG_TOTAL / EVENT_COUNT, 1),

    # Total economic damage and its pooled per-event average
    DMG_TOTAL = sum(DMG_TOTAL),
    DMG_AVG = round(DMG_TOTAL / EVENT_COUNT, 1)
  )

# Combine the top n event types with the aggregated "Other" category
df_summary <- bind_rows(df_top, df_other) %>%
  arrange(desc(DMG_TOTAL))  # Ensure final dataset is sorted in descending order

# Reshape data into long format for easier plotting of stacked bars
df_long <- df_summary %>%
  select(EVTYPE, PROPDMG_TOTAL, CROPDMG_TOTAL) %>%
  pivot_longer(cols = c(PROPDMG_TOTAL, CROPDMG_TOTAL),
               names_to = "DMG_Type", values_to = "Count")

# Create stacked bar plot showing property and crop damage per event type
ggplot(data = df_long, aes(x = reorder(EVTYPE, Count), 
                           y = Count, fill = DMG_Type)) +
  geom_bar(stat = "identity") +  # Use actual values (not counts)
  coord_flip() +  # Flip axes for better readability
  scale_fill_manual(values = c("PROPDMG_TOTAL" = "navy", "CROPDMG_TOTAL" = "darkgreen")) +  # Assign colors to categories
  scale_y_continuous(labels = label_number(scale = 1e-6, suffix = " Million USD")) +  # Convert large numbers to millions
  labs(title = "Total Damages by EVTYPE",
       subtitle = "Other contains all EVTYPES not listed",
       x = "Event Type",
       y = "Total Damages",
       fill = "Damage Type")
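To make the axis labelling step more concrete, the sketch below applies the same label_number() settings to a few made-up values outside of a plot. label_number() returns a function: scale multiplies each value before formatting, and suffix is appended to the formatted number. The name million_usd and the input values are illustrative only.

```r
library(scales)

# Labeller with the same settings as in the plot above
million_usd <- label_number(scale = 1e-6, suffix = " Million USD")

# Made-up totals for illustration
labels <- million_usd(c(1e6, 25e6, 300e6))
print(labels)
```

The resulting labels read as millions of dollars only if the plotted totals are themselves expressed in dollars.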

Conclusion

This analysis of the NOAA Storm Database reveals that tornadoes pose the greatest threat to human life, while floods and hurricanes lead to the highest economic damages. These findings emphasize the need for targeted disaster preparedness, especially in regions vulnerable to extreme weather.

Future research should focus on long-term mitigation strategies, such as improved infrastructure and early warning systems, to reduce casualties and economic losses. Policymakers and emergency response teams can use these insights to allocate resources effectively and improve resilience against natural disasters.