Synopsis

This analysis uses the 2020 U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Events Database to assess the impact of severe weather events across the United States. Specifically, it investigates which types of events pose the greatest threats to public health, which are most frequent in each state, how these events are distributed across the calendar year, and which lead to the highest economic losses in both property and crop damage. Starting from raw CSV data, the dataset is cleaned and analyzed using R. Key findings reveal that tornadoes are the most harmful to population health, wind and flood events are the most frequently reported event types, and many severe weather events peak during the spring and summer months. These insights can assist emergency management officials in planning, resource allocation, and developing targeted preparedness strategies.

Data Processing

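# Loading the packages used throughout the analysis
library(tidyverse)  # readr, dplyr, ggplot2
library(lubridate)  # date parsing and month extraction
library(patchwork)  # stacking plots with "/"

# Setting the file path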
folder_path <- "/Users/christiemcgee-ross/Downloads"

# Actual file paths
details_file <- file.path(folder_path, "StormEvents_details-ftp_v1.0_d2020_c20240620.csv")
fatalities_file <- file.path(folder_path, "StormEvents_fatalities-ftp_v1.0_d2020_c20240620.csv")
locations_file <- file.path(folder_path, "StormEvents_locations-ftp_v1.0_d2020_c20240620.csv")

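# Loading and cleaning the data
details <- read_csv(details_file) %>%
  mutate(EVENT_ID = as.character(EVENT_ID),
         # Timestamps are stored day-month-year, so they are parsed with the "d-b-y HMS" order
         MONTH = month(parse_date_time(BEGIN_DATE_TIME, orders = "d-b-y HMS"), label = TRUE, abbr = TRUE))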

fatalities <- read_csv(fatalities_file) %>%
  mutate(EVENT_ID = as.character(EVENT_ID))

locations <- read_csv(locations_file) %>%
  mutate(EVENT_ID = as.character(EVENT_ID))

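# Joining the data sets
# An event can have several location or fatality rows, so the joins repeat events;
# keeping one row per EVENT_ID stops event counts and per-event totals from being inflated
joined_data <- details %>%
  left_join(locations, by = "EVENT_ID") %>%
  left_join(fatalities, by = "EVENT_ID") %>%
  distinct(EVENT_ID, .keep_all = TRUE)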
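
Because one event can have several rows in the locations and fatalities files, a left join on EVENT_ID can repeat that event's injury, death, and damage figures. The sketch below illustrates the effect on small made-up tables (toy_details and toy_locations are hypothetical, not NOAA data) and shows one possible way to check for it.

```r
library(dplyr)

# Hypothetical toy tables: event "1" has two location rows
toy_details <- tibble(EVENT_ID = c("1", "2"), INJURIES_DIRECT = c(5, 1))
toy_locations <- tibble(EVENT_ID = c("1", "1", "2"), LOCATION_INDEX = c(1, 2, 1))

toy_joined <- left_join(toy_details, toy_locations, by = "EVENT_ID")

# The join repeats event "1", so a plain sum counts its injuries twice
inflated_total <- sum(toy_joined$INJURIES_DIRECT)

# Keeping one row per event restores the per-event total
deduplicated <- distinct(toy_joined, EVENT_ID, .keep_all = TRUE)
correct_total <- sum(deduplicated$INJURIES_DIRECT)

print(c(inflated = inflated_total, correct = correct_total))
```

Comparing nrow() of the joined table with the number of distinct EVENT_ID values is a quick way to see whether the real data is affected.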

# Creating a health impact variable
joined_data <- joined_data %>%
  mutate(HEALTH_IMPACT = INJURIES_DIRECT + INJURIES_INDIRECT +
                           DEATHS_DIRECT + DEATHS_INDIRECT)

Results

# Summarizing health impacts by event type
health_summary <- joined_data %>%
  group_by(EVENT_TYPE) %>% # Group data by type of weather event
  summarise(
    Total_Fatalities = sum(DEATHS_DIRECT + DEATHS_INDIRECT, na.rm = TRUE),
    Total_Injuries = sum(INJURIES_DIRECT + INJURIES_INDIRECT, na.rm = TRUE),
    Total_Health_Impact = sum(HEALTH_IMPACT, na.rm = TRUE)
  ) %>%
  arrange(desc(Total_Health_Impact)) %>% # Sort events by total health impact in descending order
  slice_head(n = 10) # Keep only the top 10 most harmful event types

# Creating the bar plot of the top 10 most harmful events
ggplot(health_summary, aes(x = reorder(EVENT_TYPE, Total_Health_Impact), y = Total_Health_Impact)) +
  geom_col(fill = "tomato") + # Create horizontal bars with a "tomato" color
  coord_flip() +  # Flip axes so the event types are on the y-axis for readability
  labs(title = "Top 10 Weather Events by Impact on Population Health (2020)",
       x = "Event Type", y = "Total Fatalities and Injuries")

Plot Analysis: This bar plot presents the 10 most harmful weather event types in the United States in 2020, ranked by their combined impact on population health (total fatalities and injuries). Each horizontal bar represents a distinct event type, and its length reflects the total number of people harmed, counting both direct and indirect injuries and deaths. Because the events are arranged in descending order, the most dangerous weather phenomena stand out immediately. For instance, events like tornadoes and excessive heat show markedly higher total harm than the others. The plot directly answers the question, “What types of events are most harmful with respect to population health?”, by comparing event types on quantitative health outcomes, so decision-makers and public health officials can quickly see which event types warrant greater attention, preparedness, and resource allocation.

# Finding the most frequent event type in each state
state_summary <- joined_data %>%
  group_by(STATE, EVENT_TYPE) %>%  # Grouping data by both state and event type
  summarise(Event_Count = n(), .groups = 'drop') %>%  # Counting number of events per state/event combination
  group_by(STATE) %>%  # Regroup by state only
  slice_max(order_by = Event_Count, n = 1, with_ties = FALSE) %>%  # One row per state, even if counts tie
  ungroup()  # Drop the grouping before plotting

# Creating a horizontal bar chart of most frequent event types by state
ggplot(state_summary, aes(x = reorder(STATE, Event_Count), y = Event_Count, fill = EVENT_TYPE)) +
  geom_col() +  # Draw bar plot with height based on event count and color based on event type
  coord_flip() +  # Flip coordinates to make states readable on the y-axis
  labs(
    title = "Most Frequent Event Type by State",  # Plot title
    x = "State",  # X-axis label (after flipping, this is the y-axis)
    y = "Number of Events",  # Y-axis label (after flipping, this is the x-axis)
    fill = "Event Type"  # Legend title
  )

Plot Analysis: The plot visualizes the most frequent event types across different states in the U.S., highlighting the event with the highest occurrence in each state. Each bar represents a state, and the length of the bar indicates the number of events of the most frequent event type in that state. The bars are colored according to the event type, providing a clear visual distinction between different event types. The use of a horizontal bar chart with flipped coordinates ensures that state names are legible and easy to read on the y-axis. This chart answers the question of which types of events occur most frequently in which states by showing the state-specific event type with the highest count. For instance, some states may have frequent tornado events, while others may see more hurricanes or floods, allowing for a quick comparative understanding of event patterns by region.
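
Selecting "the event with the highest count" per state depends on how ties are treated. The following sketch, on hypothetical counts (toy_counts is made up for illustration), shows how `slice_max()` behaves with and without `with_ties = FALSE`.

```r
library(dplyr)

# Hypothetical counts: state "A" has a tie between Hail and Flood
toy_counts <- tibble(
  STATE = c("A", "A", "A", "B", "B"),
  EVENT_TYPE = c("Hail", "Flood", "Tornado", "Hail", "Drought"),
  Event_Count = c(10, 10, 3, 7, 2)
)

# Default behaviour keeps every tied row, so state "A" appears twice
with_ties <- toy_counts %>%
  group_by(STATE) %>%
  slice_max(order_by = Event_Count, n = 1) %>%
  ungroup()

# with_ties = FALSE keeps exactly one row per state
one_per_state <- toy_counts %>%
  group_by(STATE) %>%
  slice_max(order_by = Event_Count, n = 1, with_ties = FALSE) %>%
  ungroup()

print(with_ties)
print(one_per_state)
```

In a bar chart with one bar per state, tied rows would be drawn as stacked segments, which makes that state's bar look longer than its true count.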

# Parse date and extract the month
joined_data <- joined_data %>%
  mutate(
    BEGIN_DATE_TIME = parse_date_time(BEGIN_DATE_TIME, orders = "d-b-y HMS"),  # parse custom format
    MONTH = month(BEGIN_DATE_TIME, label = TRUE, abbr = TRUE)  # extract labeled month
  )
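
As a quick check of the "d-b-y HMS" order used above, the sketch below parses a single made-up timestamp in that layout (sample_stamp is a hypothetical value, not taken from the data) and counts failed parses.

```r
library(lubridate)

# Hypothetical timestamp in day-month-year layout with a two-digit year
sample_stamp <- "24-JUN-20 16:20:00"

parsed <- parse_date_time(sample_stamp, orders = "d-b-y HMS")

# A failed parse returns NA, so counting NAs is a simple sanity check
n_failed <- sum(is.na(parsed))

print(parsed)
print(month(parsed, label = TRUE, abbr = TRUE))
```

Running the same NA count on the full BEGIN_DATE_TIME column would show whether any rows use a different layout.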

# Count events per event type per month
monthly_event_counts <- joined_data %>%
  group_by(EVENT_TYPE, MONTH) %>%
  summarise(Event_Count = n(), .groups = "drop")

# Keep the top 10 most frequent event types for clarity
top_event_types <- joined_data %>%
  count(EVENT_TYPE, sort = TRUE) %>%
  slice_head(n = 10) %>%
  pull(EVENT_TYPE)

monthly_event_counts_filtered <- monthly_event_counts %>%
  filter(EVENT_TYPE %in% top_event_types)

# Create heatmap
ggplot(monthly_event_counts_filtered, aes(x = MONTH, y = EVENT_TYPE, fill = Event_Count)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(option = "plasma", name = "Event Count") +
  labs(
    title = "Monthly Distribution of Top 10 Weather Event Types",
    x = "Month",
    y = "Event Type"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Plot Analysis: This heatmap illustrates the monthly distribution of the top 10 most frequently occurring weather event types in 2020. Each row represents an event type, and each column corresponds to a month. The color of each tile reflects the number of events observed in that month: with the default direction of the “plasma” palette, brighter yellow tiles indicate higher frequencies and darker purple tiles indicate lower ones. The palette's strong contrast allows seasonal patterns to emerge clearly. The plot answers the question “Which types of events are characterized by which months?” by revealing strong temporal trends. For instance, tornadoes tend to peak in the spring and early summer months (April–June), while thunderstorm winds and hail show increased activity during the warmer months. On the other hand, winter weather and snow events concentrate in colder months such as January and December. This seasonal clustering provides crucial insight for emergency preparedness and resource allocation throughout the year, enabling communities to anticipate and mitigate weather-related risks more effectively.
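
The seasonal peaks read off the heatmap can also be extracted as a table. The sketch below assumes a data frame shaped like monthly_event_counts; the values in toy_monthly are invented for illustration and are not NOAA figures.

```r
library(dplyr)

# Hypothetical monthly counts, shaped like monthly_event_counts
toy_monthly <- tibble(
  EVENT_TYPE = c("Tornado", "Tornado", "Tornado", "Winter Storm", "Winter Storm"),
  MONTH = factor(c("Mar", "Apr", "May", "Jan", "Feb"), levels = month.abb, ordered = TRUE),
  Event_Count = c(40, 120, 90, 75, 60)
)

# For each event type, keep the month with the highest count
peak_months <- toy_monthly %>%
  group_by(EVENT_TYPE) %>%
  slice_max(order_by = Event_Count, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(EVENT_TYPE, Peak_Month = MONTH, Peak_Count = Event_Count)

print(peak_months)
```

Applied to the real monthly counts, a table like this gives one peak month per event type to compare against the visual impression from the heatmap.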

# Define a function to convert property/crop damage values (e.g., "2.5M", "300K") into numeric values
convert_damage <- function(x) {
  x <- toupper(gsub(",", "", x))  # Remove commas and convert to uppercase for consistent pattern matching
  multiplier <- case_when(        # Determine the appropriate multiplier based on suffix
    grepl("K$", x) ~ 1e3,         # "K" = thousands
    grepl("M$", x) ~ 1e6,         # "M" = millions
    grepl("B$", x) ~ 1e9,         # "B" = billions
    TRUE ~ 1                     # Default multiplier if no suffix is found
  )
  value <- suppressWarnings(as.numeric(gsub("[KMB]", "", x)))  # Remove suffix and convert to numeric
  return(value * multiplier)  # Return the scaled numeric value
}
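
To see what the conversion does with different inputs, the sketch below repeats the same function body so it runs on its own and applies it to a few made-up strings (sample_values is hypothetical and only meant to cover each suffix, a comma, a plain number, and a missing value).

```r
library(dplyr)

# Same logic as convert_damage above, repeated so this check is self-contained
convert_damage <- function(x) {
  x <- toupper(gsub(",", "", x))
  multiplier <- case_when(
    grepl("K$", x) ~ 1e3,
    grepl("M$", x) ~ 1e6,
    grepl("B$", x) ~ 1e9,
    TRUE ~ 1
  )
  value <- suppressWarnings(as.numeric(gsub("[KMB]", "", x)))
  value * multiplier
}

# Hypothetical inputs: each suffix, a comma, a zero value, and a missing value
sample_values <- c("2.5M", "300K", "1.2B", "1,500", "0.00K", NA)
converted <- convert_damage(sample_values)

print(data.frame(raw = sample_values, numeric = converted))
```

Missing inputs stay NA rather than becoming zero, which is why the later summaries use na.rm = TRUE.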

# Apply the conversion function to create numeric columns for property and crop damage
joined_data <- joined_data %>%
  mutate(PROP_DAMAGE_NUM = convert_damage(DAMAGE_PROPERTY),  # Convert property damage to numeric
         CROP_DAMAGE_NUM = convert_damage(DAMAGE_CROPS))     # Convert crop damage to numeric

# Summarize total damage per event type
damage_summary <- joined_data %>%
  group_by(EVENT_TYPE) %>%  # Group data by event type
  summarise(
    Property_Damage = sum(PROP_DAMAGE_NUM, na.rm = TRUE),  # Sum of property damage
    Crop_Damage = sum(CROP_DAMAGE_NUM, na.rm = TRUE),      # Sum of crop damage
    Total_Damage = Property_Damage + Crop_Damage  # Combined total; keeps rows where only one of the two values is missing
  ) %>%
  arrange(desc(Total_Damage)) %>%  # Sort by highest total damage
  slice_head(n = 10)  # Keep only the top 10 most damaging event types

# Create a horizontal bar chart for the top 10 most damaging event types
ggplot(damage_summary, aes(x = reorder(EVENT_TYPE, Total_Damage), y = Total_Damage / 1e6)) +
  geom_col(fill = "steelblue") +  # Draw bar chart with blue fill; convert damage to millions
  coord_flip() +  # Flip coordinates for better readability
  labs(
    title = "Top 10 Events by Total Economic Damage (2020)",  # Plot title
    x = "Event Type",  # X-axis label (after flipping, this becomes the y-axis)
    y = "Total Damage (Millions USD)"  # Y-axis label
  ) +
  theme_minimal()  # Use a clean, minimal visual theme

Plot Analysis: This plot illustrates the 10 types of severe weather events that caused the greatest total economic damage in the United States in 2020, combining property and crop losses. Each horizontal bar represents an event type, ordered from highest to lowest total economic impact, with the length of the bar corresponding to the total damage in millions of U.S. dollars. The chart uses a clean, minimal aesthetic with a consistent steel blue color for readability, and the horizontal layout ensures that longer event names remain fully visible. It answers the question of which event types have the greatest economic consequences by quantifying and ranking their financial losses. Events like hurricanes, floods, and tornadoes dominate the top of the chart, highlighting their destructive potential for infrastructure and agriculture. By presenting aggregated economic loss in a direct visual format, the chart helps policymakers and emergency planners prioritize mitigation and preparedness efforts.
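
One detail worth checking when combining property and crop losses is how missing values are handled. The sketch below uses invented numbers (prop_damage and crop_damage are hypothetical vectors) to compare two ways of computing a combined total.

```r
# Hypothetical damage values for three events; NA marks a missing estimate
prop_damage <- c(1000, NA, 500)
crop_damage <- c(200, 300, NA)

# Adding row by row first turns any row with one missing value into NA,
# so na.rm = TRUE then drops that whole row
rowwise_total <- sum(prop_damage + crop_damage, na.rm = TRUE)

# Summing each column separately keeps every recorded value
columnwise_total <- sum(prop_damage, na.rm = TRUE) + sum(crop_damage, na.rm = TRUE)

print(c(rowwise = rowwise_total, columnwise = columnwise_total))
```

The two totals agree only when no event has exactly one of the two damage values missing.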

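# Select the top 10 events with the highest total property damage
# (recomputed from joined_data so the ranking is not limited to the top 10 by total damage)
top_property <- joined_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(Property_Damage = sum(PROP_DAMAGE_NUM, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Property_Damage)) %>%  # sort by property damage
  slice_head(n = 10)                  # take top 10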

# Create the bar plot for property damage
property_plot <- ggplot(top_property, aes(x = reorder(EVENT_TYPE, Property_Damage), y = Property_Damage / 1e6)) +
  geom_col(fill = "darkgreen") +  # create bar chart with green bars
  coord_flip() +                  # flip axes to make it horizontal
  labs(
    title = "Top 10 Events by Property Damage (2020)",  # title of the plot
    x = "Event Type",                                   # label for x-axis
    y = "Property Damage (Millions USD)"                # label for y-axis
  ) +
  theme_minimal()  # use a clean minimal theme

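# Select the top 10 events with the highest total crop damage
# (recomputed from joined_data so the ranking is not limited to the top 10 by total damage)
top_crop <- joined_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(Crop_Damage = sum(CROP_DAMAGE_NUM, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(Crop_Damage)) %>%  # sort by crop damage
  slice_head(n = 10)              # take top 10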

# Create the bar plot for crop damage
crop_plot <- ggplot(top_crop, aes(x = reorder(EVENT_TYPE, Crop_Damage), y = Crop_Damage / 1e6)) +
  geom_col(fill = "darkorange") +  # create bar chart with orange bars
  coord_flip() +                   # flip axes for horizontal bars
  labs(
    title = "Top 10 Events by Crop Damage (2020)",  # title of the plot
    x = "Event Type",                               # label for x-axis
    y = "Crop Damage (Millions USD)"                # label for y-axis
  ) +
  theme_minimal()  # use the same minimal theme for consistency

# Combine the two plots vertically using patchwork
property_plot / crop_plot  # "/" stacks the two plots on top of each other

Plot Analysis: This pair of vertically stacked bar charts breaks down the top 10 weather event types by property damage and by crop damage in 2020, showing how economic impacts vary by damage type. The top chart, shaded in dark green, ranks events by total property damage, while the bottom chart, in dark orange, ranks them by crop damage. Both use horizontal bars for legibility, especially with long event names, and the minimalist theme keeps the viewer’s attention on the data itself. The separation into two charts allows for an insightful comparison: some event types, like hurricanes and floods, dominate property damage due to their destructive impact on buildings and infrastructure, while others, such as droughts or freezes, appear prominently in crop damage due to their direct effect on agriculture. By disaggregating total losses, this visualization answers the question of which events are most economically damaging and helps decision-makers identify which hazards require targeted mitigation strategies for different sectors.

Final Discussion: This analysis highlights how weather events vary not only in frequency and impact, but also in their geographic and seasonal patterns. For example, tornadoes pose significant health risks and are most active in late spring and early summer. Thunderstorm winds and hail dominate in many states and peak during warmer months, while winter-related events cluster in colder periods. Economic damages also vary: property losses are largely driven by wind and flood events, while drought and extreme cold disproportionately affect agriculture. This report is based on data from a single year (2020) and may not reflect long-term trends. Additionally, inconsistencies in how events are recorded could affect accuracy, and damage values are often estimates that may not capture indirect or long-term economic impacts. By analyzing the frequency, geographic distribution, seasonality, and impact of severe weather events, this report provides insights that can help municipal managers and emergency planners allocate resources, plan for seasonal risks, and protect public health and infrastructure more effectively.