## Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify the weather event types with the greatest impact on public health and on the economy. The data cover the period from 1950 to November 2011. To assess harm to population health, we summed fatalities and injuries for each event type. To assess economic consequences, we summed property and crop damage. Particular attention was paid to data cleaning, specifically converting the damage exponent codes (H, K, M, B for hundreds, thousands, millions and billions) into numeric multipliers. Our results indicate that tornadoes are the event type most harmful to population health, while floods cause the greatest economic damage.

## Data Processing

### Loading Libraries and Data

First, we load the required libraries and read the raw CSV file. The file is bzip2-compressed, and read.csv() decompresses it transparently, so no separate extraction step is needed.
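The claim that read.csv() reads a bzip2 file directly can be checked with a small round trip. This is a minimal sketch using only base R: it writes a hypothetical two-row data frame to a temporary .csv.bz2 file through a bzfile() connection and reads it back with a plain read.csv() call. The column values are invented for illustration.

```r
# Hypothetical toy data standing in for the storm records
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the data through a bzip2-compressing connection
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression
round_trip <- read.csv(bz_path)
print(round_trip)

# Clean up the temporary file
unlink(bz_path)
```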

``` r
library(ggplot2)
library(dplyr)

# Download the compressed data file if it is not already present (for reproducibility)
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
dataFile <- "repdata_data_StormData.csv.bz2"
if (!file.exists(dataFile)) {
    download.file(fileUrl, destfile = dataFile, mode = "wb")
}

# Read the data; read.csv() decompresses the bzip2 archive on the fly
storm_data <- read.csv(dataFile)
```

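Next, we keep only the columns needed for the analysis: the event type, the health counts, and the damage amounts with their exponent codes.

``` r
# Keep only the columns used in the rest of the analysis
clean_data <- storm_data %>%
    select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
```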
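Damage amounts are stored as a base value (PROPDMG, CROPDMG) plus an exponent code (PROPDMGEXP, CROPDMGEXP). We convert the codes H, K, M and B (case-insensitive) into multipliers of 100, 10^3, 10^6 and 10^9. Any other code, such as an empty string or a digit, is treated as a multiplier of 1, which keeps the recorded base value unscaled.

``` r
# Map an exponent code to its numeric multiplier
get_multiplier <- function(exp) {
    exp <- toupper(exp)
    if (exp == "H") return(100)
    else if (exp == "K") return(1000)
    else if (exp == "M") return(1000000)
    else if (exp == "B") return(1000000000)
    else return(1) # empty, numeric or symbol codes: keep the base value unscaled
}

# Apply the multiplier to every record (USE.NAMES = FALSE avoids naming
# the result vector by the code values)
clean_data$prop_mult <- sapply(clean_data$PROPDMGEXP, get_multiplier, USE.NAMES = FALSE)
clean_data$crop_mult <- sapply(clean_data$CROPDMGEXP, get_multiplier, USE.NAMES = FALSE)
```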
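Because get_multiplier() is called once per record through sapply(), this step is slow on a data set of this size. A vectorized lookup gives the same multipliers in a single pass. This is a sketch assuming the same four codes are the only ones to scale; multiplier_table and get_multiplier_vec() are hypothetical names. Unlike the if/else version, it also maps an NA code to 1 instead of raising an error.

```r
# Hypothetical lookup table: exponent code -> multiplier (same codes as get_multiplier)
multiplier_table <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Index the table by the upper-cased codes; codes missing from the table
# (empty strings, digits, symbols, NA) produce NA and fall back to 1
get_multiplier_vec <- function(exp) {
    mult <- unname(multiplier_table[toupper(exp)])
    mult[is.na(mult)] <- 1
    mult
}

codes <- c("K", "m", "B", "h", "", "5", "+")
print(get_multiplier_vec(codes))
```

On the real data it would replace the sapply() calls, for example clean_data$prop_mult <- get_multiplier_vec(clean_data$PROPDMGEXP).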

``` r
# Per-record totals: people affected (fatalities + injuries) and damage in dollars
clean_data <- clean_data %>%
    mutate(
        Total_Health_Impact = FATALITIES + INJURIES,
        Total_Eco_Impact = (PROPDMG * prop_mult) + (CROPDMG * crop_mult)
    )
```
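The two totals are simple per-record formulas: Total_Health_Impact = FATALITIES + INJURIES and Total_Eco_Impact = PROPDMG * prop_mult + CROPDMG * crop_mult. The worked example below applies them to two hypothetical records; the values are invented for illustration, and mult() is a small stand-in for the multiplier step above.

```r
# Two hypothetical storm records (values invented for illustration)
toy <- data.frame(
    FATALITIES = c(2, 0), INJURIES = c(10, 3),
    PROPDMG = c(25, 1.5), PROPDMGEXP = c("K", "B"),
    CROPDMG = c(0, 200), CROPDMGEXP = c("", "M")
)

# Stand-in multiplier lookup: H/K/M/B scale the value, anything else -> 1
mult <- function(exp) {
    m <- unname(c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[toupper(exp)])
    ifelse(is.na(m), 1, m)
}

toy$Total_Health_Impact <- toy$FATALITIES + toy$INJURIES
toy$Total_Eco_Impact <- toy$PROPDMG * mult(toy$PROPDMGEXP) +
    toy$CROPDMG * mult(toy$CROPDMGEXP)
print(toy[, c("Total_Health_Impact", "Total_Eco_Impact")])
```

For the second record, 1.5 with code B is 1.5 billion dollars of property damage and 200 with code M is 200 million dollars of crop damage, giving 1.7 billion in total. Without the exponent codes the same record would sum to only 201.5.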


## Results

### 1. Across the United States, which types of events are most harmful with respect to population health?

We sum fatalities and injuries by event type and keep the 10 event types with the highest combined total. Fatalities and injuries are weighted equally in this ranking.


``` r
# Aggregate health data
health_summary <- clean_data %>%
    group_by(EVTYPE) %>%
    summarise(
        Fatalities = sum(FATALITIES),
        Injuries = sum(INJURIES),
        Total_Health = sum(Total_Health_Impact)
    ) %>%
    arrange(desc(Total_Health)) %>%
    slice(1:10)

# Plot the combined total per event type; showing fatalities and injuries
# as separate stacked segments would require reshaping to long format first
ggplot(health_summary, aes(x = reorder(EVTYPE, -Total_Health), y = Total_Health)) +
    geom_col(fill = "steelblue") +
    theme_minimal() +
    labs(
        title = "Top 10 Weather Events Most Harmful to Population Health",
        x = "Event Type",
        y = "Total Fatalities and Injuries"
    ) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
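The ranking above collapses fatalities and injuries into a single number. To show both components, the per-event totals can be reshaped to long format and drawn as stacked bars. This is a minimal base R sketch on hypothetical numbers (EVENT A and EVENT B are placeholders, not real results); tidyr's pivot_longer() is a common one-call alternative for the reshape.

```r
# Hypothetical per-event totals in the wide shape produced by health_summary
health_wide <- data.frame(
    EVTYPE = c("EVENT A", "EVENT B"),
    Fatalities = c(50, 10),
    Injuries = c(400, 90)
)

# Reshape to long format: one row per (event type, outcome) pair
health_long <- data.frame(
    EVTYPE = rep(health_wide$EVTYPE, times = 2),
    Outcome = rep(c("Fatalities", "Injuries"), each = nrow(health_wide)),
    Count = c(health_wide$Fatalities, health_wide$Injuries)
)
print(health_long)

# Stacked bars: one bar per event type, split by outcome (drawn only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
    library(ggplot2)
    p <- ggplot(health_long, aes(x = EVTYPE, y = Count, fill = Outcome)) +
        geom_col() +
        labs(x = "Event Type", y = "Count", fill = NULL)
    print(p)
}
```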

### 2. Across the United States, which types of events have the greatest economic consequences?

We sum the combined property and crop damage by event type and keep the 10 event types with the highest total cost. Costs are plotted in billions of U.S. dollars.

``` r
# Aggregate economic data
eco_summary <- clean_data %>%
    group_by(EVTYPE) %>%
    summarise(
        Total_Cost = sum(Total_Eco_Impact)
    ) %>%
    arrange(desc(Total_Cost)) %>%
    slice(1:10)

ggplot(eco_summary, aes(x = reorder(EVTYPE, -Total_Cost), y = Total_Cost / 1e9)) +
    geom_col(fill = "firebrick") +
    theme_minimal() +
    labs(
        title = "Top 10 Weather Events with Greatest Economic Consequences",
        x = "Event Type",
        y = "Total Economic Damage (Billions USD)"
    ) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
```
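Total_Cost merges property and crop damage, which hides how much of each event type's cost falls on agriculture. A minimal sketch, assuming per-record dollar amounts have already been computed (Prop_Cost and Crop_Cost are hypothetical column names, for example PROPDMG * prop_mult), splits the total per event type with base R's aggregate(). The records are invented placeholders.

```r
# Hypothetical records with damage already converted to dollars
damage <- data.frame(
    EVTYPE = c("EVENT A", "EVENT A", "EVENT B", "EVENT B"),
    Prop_Cost = c(4e9, 1e9, 2e8, 1e8),
    Crop_Cost = c(0, 5e8, 3e9, 2e8)
)

# Sum each damage component per event type
by_event <- aggregate(cbind(Prop_Cost, Crop_Cost) ~ EVTYPE, data = damage, FUN = sum)
by_event$Total_Cost <- by_event$Prop_Cost + by_event$Crop_Cost
by_event$Crop_Share <- by_event$Crop_Cost / by_event$Total_Cost

# Rank by combined cost, as in the main analysis
by_event <- by_event[order(-by_event$Total_Cost), ]
print(by_event)
```

In this toy example both event types have similar totals, but almost all of EVENT B's cost is crop damage, a distinction the single Total_Cost bar cannot show.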