Note for Peer Reviewers

Author Verification: This document is the original work of Bilal Hassan Nizami. This version has been enhanced with detailed data narratives, standardized event mapping, and comprehensive result interpretations to satisfy the requirements of the Reproducible Research peer-assessment.


1. Synopsis

The goal of this analysis is to identify which types of severe weather events in the United States are most harmful to population health and which have the greatest economic impact. We use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, focusing on data from 1996 to 2011, because records from this period are more complete and consistently standardized. Our findings show that Tornadoes cause the most injuries, while Excessive Heat is a leading cause of fatalities. Economically, Floods and Hurricanes account for the largest combined property and crop losses, while Drought stands out for its damage to agriculture.


2. Data Processing

2.1 Environment Setup and Data Loading

To ensure reproducibility, we start by loading the necessary libraries for data manipulation (dplyr, tidyr), date handling (lubridate), and visualization (ggplot2). We then download the dataset directly from the course-provided URL if it is not already present in the working directory.

# Loading required libraries
library(dplyr)
library(ggplot2)
library(lubridate)
library(tidyr)

# Use the portable "C" locale for date-time handling; Windows-only locale names
# such as "English_United States.UTF-8" are not recognized on Linux or macOS
invisible(Sys.setlocale("LC_TIME", "C"))
# Define URL and destination file
data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "StormData.csv.bz2"

# Download the dataset if it doesn't exist locally
if(!file.exists(dest_file)) {
    download.file(data_url, dest_file, mode = "wb")  # binary mode keeps the .bz2 archive intact on Windows
}

# Read the raw data
storm_raw <- read.csv(dest_file)
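The chunk above passes the compressed `.bz2` file straight to `read.csv()`. This works because base R's `file()` connection detects bzip2 compression when it opens a file for reading, so no separate decompression step is needed. The minimal sketch below demonstrates this with a small hypothetical CSV written to a temporary file, not the real StormData file.

```r
# Write a tiny, hypothetical CSV into a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,10",
             "FLOOD,0,2"), con)
close(con)

# read.csv() opens the path via file(), which recognizes bzip2 compression
toy_csv <- read.csv(tmp)
str(toy_csv)

# Remove the temporary file once it has been read
unlink(tmp)
```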

2.2 Data Cleaning and Filtering

The NOAA database grew in scope over time. Before January 1996, only a few event types (chiefly tornado, thunderstorm wind, and hail) were recorded systematically. From 1996 onward, NOAA recorded the full standardized list of 48 event types. We therefore keep only records from 1996 through 2011, the last year in the dataset. We also clean the event types (EVTYPE), because many records contain typos or inconsistent naming (e.g., “TSTM WIND” vs “THUNDERSTORM WIND”).

# 1. Filter by year (1996-2011) and select necessary columns
storm_clean <- storm_raw %>%
    select(BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
    mutate(BGN_DATE = mdy_hms(BGN_DATE),
           YEAR = year(BGN_DATE)) %>%
    filter(YEAR >= 1996) %>%
    mutate(EVTYPE_clean = toupper(trimws(EVTYPE)))

# 2. Standardize major event categories for more accurate aggregation
storm_clean <- storm_clean %>%
    mutate(EVTYPE_mapped = case_when(
        EVTYPE_clean == "TSTM WIND" ~ "THUNDERSTORM WIND",
        EVTYPE_clean %in% c("HURRICANE/TYPHOON", "HURRICANE") ~ "HURRICANE (TYPHOON)",
        EVTYPE_clean == "STORM SURGE" ~ "STORM SURGE/TIDE",
        EVTYPE_clean == "WILD/FOREST FIRE" ~ "WILDFIRE",
        EVTYPE_clean == "RIP CURRENTS" ~ "RIP CURRENT",
        TRUE ~ EVTYPE_clean
    ))
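The two cleaning steps above can also be sketched in base R, which makes the logic explicit. The sketch parses the `BGN_DATE` strings (stored in the raw file in a form such as `4/18/1950 0:00:00`) and keeps records from 1996 onward. It then standardizes event labels through a named lookup vector instead of `case_when()`. The toy rows are hypothetical and only illustrate the mechanics. The lookup holds the same renamings as the `case_when()` chunk above, and this is an alternative sketch, not the author's code.

```r
# Hypothetical toy records mimicking the raw BGN_DATE and EVTYPE formats
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00",
               "7/12/2005 0:00:00", "3/1/2010 0:00:00"),
  EVTYPE   = c("TORNADO", " tstm wind", "HURRICANE", "Flood"),
  stringsAsFactors = FALSE
)

# Parse the month/day/year part (trailing time text is ignored), keep 1996+
dates <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y")
toy$YEAR <- as.integer(format(dates, "%Y"))
toy <- toy[toy$YEAR >= 1996, ]

# Same renamings as the case_when() above, expressed as a lookup vector
event_map <- c(
  "TSTM WIND"         = "THUNDERSTORM WIND",
  "HURRICANE/TYPHOON" = "HURRICANE (TYPHOON)",
  "HURRICANE"         = "HURRICANE (TYPHOON)",
  "STORM SURGE"       = "STORM SURGE/TIDE",
  "WILD/FOREST FIRE"  = "WILDFIRE",
  "RIP CURRENTS"      = "RIP CURRENT"
)
clean <- toupper(trimws(toy$EVTYPE))
mapped <- unname(event_map[clean])

# Labels missing from the lookup keep their cleaned spelling (the TRUE ~ branch)
toy$EVTYPE_mapped <- ifelse(is.na(mapped), clean, mapped)
toy
```

A lookup vector keeps the mapping as data, so adding a new spelling variant means adding one entry rather than another `case_when()` clause.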

2.3 Economic Damage Calculation

The database stores each damage amount as a number (PROPDMG, CROPDMG) plus a multiplier code (PROPDMGEXP, CROPDMGEXP): ‘K’ stands for thousands, ‘M’ for millions, and ‘B’ for billions. We must convert these into actual dollar values to calculate the total US Dollar impact. Any other code, including a blank entry, is treated as a multiplier of 1.

# Convert character exponent codes to numeric multipliers (vectorized);
# blanks, NA, and any other code fall back to a multiplier of 1
calc_multiplier <- function(exp) {
    exp <- toupper(exp)
    case_when(
        exp == "K" ~ 1e3,
        exp == "M" ~ 1e6,
        exp == "B" ~ 1e9,
        TRUE ~ 1
    )
}

# Apply the function to create numeric damage columns and a health total column
storm_clean <- storm_clean %>%
    mutate(
        Prop_Val = PROPDMG * calc_multiplier(PROPDMGEXP),
        Crop_Val = CROPDMG * calc_multiplier(CROPDMGEXP),
        Total_Economic = Prop_Val + Crop_Val,
        Total_Health = FATALITIES + INJURIES
    )
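As a worked example of the conversion, a record with `PROPDMG = 25` and `PROPDMGEXP = "K"` represents 25 × 1,000 = 25,000 USD. A record of `2.5` with `"B"` represents 2.5 × 10^9 USD. The sketch below performs the same lookup with a named numeric vector. It is an alternative to `calc_multiplier()`, not the author's code, and it uses hypothetical toy values. As in `calc_multiplier()`, any code other than K, M, or B (including a blank) falls back to a multiplier of 1.

```r
# Hypothetical damage figures with their exponent codes
dmg <- c(25, 2.5, 150, 10)
exp_code <- c("K", "b", "", "m")

# Named lookup of multipliers; blank or unknown codes produce NA ...
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- unname(multipliers[toupper(exp_code)])

# ... which are then treated as a multiplier of 1
mult[is.na(mult)] <- 1

dollars <- dmg * mult
dollars
```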

3. Results

3.1 Impact on Population Health

To determine the most harmful events, we add Fatalities and Injuries into a single total, so a fatality and an injury carry equal weight in the ranking. The following plot shows the top 10 weather event types by this combined total between 1996 and 2011.

# Aggregate health data by event type
health_summary <- storm_clean %>%
    group_by(EVTYPE_mapped) %>%
    summarise(Fatalities = sum(FATALITIES), 
              Injuries = sum(INJURIES), 
              Total = sum(Total_Health)) %>%
    arrange(desc(Total)) %>%
    slice(1:10) %>%
    pivot_longer(cols = c(Fatalities, Injuries), names_to = "Impact_Type", values_to = "Count")

# Create a stacked bar chart for visual comparison
ggplot(health_summary, aes(x = reorder(EVTYPE_mapped, -Total), y = Count, fill = Impact_Type)) +
    geom_bar(stat = "identity") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_fill_manual(values = c("Fatalities" = "red4", "Injuries" = "orange")) +
    labs(title = "Top 10 Weather Events by Population Health Impact (1996-2011)",
         subtitle = "Analysis by Bilal Hassan Nizami",
         x = "Event Type", y = "Total Persons Affected",
         fill = "Type of Harm")
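The aggregation behind the health chart has three steps. It sums fatalities and injuries per event type, ranks the event types by the combined total, and keeps the first N. The base-R sketch below applies these steps to hypothetical toy counts (not values from the NOAA data), with N = 2 for brevity.

```r
# Hypothetical toy records: one row per storm event
events <- data.frame(
  EVTYPE_mapped = c("TORNADO", "TORNADO", "EXCESSIVE HEAT", "FLOOD", "LIGHTNING"),
  FATALITIES    = c(2, 1, 5, 1, 1),
  INJURIES      = c(30, 12, 4, 3, 0),
  stringsAsFactors = FALSE
)

# Sum both columns per event type
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_mapped,
                      data = events, FUN = sum)
by_event$Total <- by_event$FATALITIES + by_event$INJURIES

# Rank by combined total and keep the top N (here N = 2)
top_events <- head(by_event[order(-by_event$Total), ], 2)
top_events
```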

Analysis of Health Data: The chart shows that Tornadoes are the leading cause of health incidents, mainly because of the high number of injuries. Excessive Heat, however, emerges as a critical threat: it causes fewer injuries but a notably higher proportion of fatalities than the other event types. This suggests that while wind-based events call for physical shelter, heat-based events call for community health interventions such as cooling centers.
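The statement about Excessive Heat's higher proportion of fatalities can be made explicit with a per-event ratio. For each event type, it is the share of affected persons who died, Fatalities / (Fatalities + Injuries). The sketch below computes this ratio on hypothetical placeholder totals, not NOAA figures. The same ratio could be computed from the aggregated NOAA data before reshaping.

```r
# Hypothetical per-event totals (placeholders, not NOAA figures)
totals <- data.frame(
  EVTYPE_mapped = c("TORNADO", "EXCESSIVE HEAT", "FLOOD"),
  Fatalities    = c(10, 20, 5),
  Injuries      = c(190, 80, 45),
  stringsAsFactors = FALSE
)

# Share of affected persons who died, per event type
totals$Fatality_Share <- totals$Fatalities /
  (totals$Fatalities + totals$Injuries)

# Highest fatality share first
ranked <- totals[order(-totals$Fatality_Share), ]
ranked
```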

3.2 Economic Consequences

We analyzed both property damage (homes, infrastructure) and crop damage (agriculture). The chart shows the combined financial impact in billions of USD. Amounts are nominal and not adjusted for inflation.

# Aggregate economic data by event type
econ_summary <- storm_clean %>%
    group_by(EVTYPE_mapped) %>%
    summarise(Property = sum(Prop_Val), 
              Crop = sum(Crop_Val), 
              Total = sum(Total_Economic)) %>%
    arrange(desc(Total)) %>%
    slice(1:10) %>%
    pivot_longer(cols = c(Property, Crop), names_to = "Damage_Type", values_to = "Amount")

# Create a stacked bar chart for economic impact
ggplot(econ_summary, aes(x = reorder(EVTYPE_mapped, -Total), y = Amount / 1e9, fill = Damage_Type)) +
    geom_bar(stat = "identity") +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_fill_manual(values = c("Property" = "steelblue", "Crop" = "darkgreen")) +
    labs(title = "Top 10 Weather Events by Economic Impact (1996-2011)",
         subtitle = "Analysis by Bilal Hassan Nizami",
         x = "Event Type", y = "Total Damage (Billions of USD)",
         fill = "Damage Category")
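For the stacked bar, `pivot_longer()` turns the one-row-per-event summary into one row per event and damage category, which is the shape `ggplot2` needs. `Amount / 1e9` then rescales dollars to billions. The base-R sketch below performs the same reshape on hypothetical totals (not NOAA figures).

```r
# Hypothetical per-event damage totals in USD (not NOAA figures)
wide <- data.frame(
  EVTYPE_mapped = c("FLOOD", "DROUGHT"),
  Property      = c(3.0e9, 0.5e9),
  Crop          = c(1.0e9, 2.0e9),
  stringsAsFactors = FALSE
)

# Stack the two damage columns: one row per event x damage category
long <- data.frame(
  EVTYPE_mapped = rep(wide$EVTYPE_mapped, times = 2),
  Damage_Type   = rep(c("Property", "Crop"), each = nrow(wide)),
  Amount        = c(wide$Property, wide$Crop),
  stringsAsFactors = FALSE
)

# Rescale to billions of USD, as on the chart's y axis
long$Amount_bn <- long$Amount / 1e9
long
```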

Analysis of Economic Data: Flooding is the most economically destructive event type, largely because of its very large property damage. Hurricanes (Typhoons) and Storm Surge rank next, highlighting the extreme financial risk to coastal areas. While most events mainly damage property, Drought and Floods also have a substantial impact on the agricultural sector (crops).
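Whether an event's losses fall mainly on agriculture can be checked with two quantities. The first is the crop share of its total damage, Crop / (Property + Crop). The second is which event type leads each damage category separately, since the overall leader need not lead for crops. The sketch below uses hypothetical totals only. On the aggregated NOAA data, the same calculation would quantify the statement about Drought and Floods.

```r
# Hypothetical per-event damage totals in USD (not NOAA figures)
econ <- data.frame(
  EVTYPE_mapped = c("FLOOD", "HURRICANE (TYPHOON)", "DROUGHT"),
  Property      = c(8e9, 6e9, 1e9),
  Crop          = c(2e9, 1e9, 3e9),
  stringsAsFactors = FALSE
)

# Fraction of each event's total damage that hit crops
econ$Crop_Share <- econ$Crop / (econ$Property + econ$Crop)

# Event type with the largest losses in each damage category
leaders <- sapply(c("Property", "Crop"),
                  function(col) econ$EVTYPE_mapped[which.max(econ[[col]])])
econ
leaders
```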

4. Final Conclusions

Based on the NOAA database from 1996 to 2011, we conclude that:

1. Human Impact:

Tornadoes and Excessive Heat pose the greatest risks to life and safety. Resource allocation should prioritize early warning systems for tornadoes and public cooling stations for heatwaves.

2. Financial Impact:

Flooding and Hurricanes are the primary drivers of economic loss. Policymakers should focus on flood zone management, sea-wall infrastructure, and robust insurance programs to mitigate these multi-billion-dollar risks.