Synopsis

This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which types of severe weather events are most harmful to population health and which have the greatest economic consequences. The data cover events from 1950 to 2011, with more complete records in later years. The analysis sums fatalities and injuries by event type and converts the property and crop damage figures, which are stored as a value plus a magnitude code, into U.S. dollars. The results show that tornadoes are, by far, the most harmful event type to population health, causing the most fatalities and injuries. Regarding economic consequences, floods have inflicted the greatest total property damage, while droughts have caused the most crop damage. When property and crop damage are combined, floods are the event type with the greatest overall economic impact. These findings can help government and municipal managers prioritize resource allocation and preparation strategies for different severe weather events.

Data Processing

1. Loading the Raw Data

The analysis starts from the original raw data file. The compressed CSV file is downloaded from the source URL if it is not already present in the working directory.

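# Packages used throughout the analysis
library(dplyr)    # %>%, group_by(), summarise(), mutate(), case_when()
library(tidyr)    # pivot_longer()
library(ggplot2)  # plotting

file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "repdata_data_StormData.csv.bz2"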

# Download the file if it doesn't exist
if (!file.exists(dest_file)) {
        download.file(file_url, destfile = dest_file, method = "curl")
}

# Read the data into R
storm_data <- read.csv(dest_file)
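
The loading step relies on read.csv() decompressing a .bz2 file on its own. A minimal sketch of that behavior follows, using a tiny temporary file as a stand-in for the real data; the toy columns and the choice to skip a free-text column via colClasses are illustrative assumptions, not part of the original analysis.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy stand-in for the real data)
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 0),
                  REMARKS = c("long free text", "more free text"))
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression itself; "NULL" in colClasses skips a column
toy_read <- read.csv(tmp, colClasses = c(EVTYPE = "character",
                                         FATALITIES = "numeric",
                                         REMARKS = "NULL"))
str(toy_read)
```

Skipping columns that the analysis never uses is one way to reduce memory use and load time on a large file; whether it is worthwhile here has not been measured.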

2. Processing Data for Population Health Analysis

To address the first question on population health, the data are grouped by event type (EVTYPE), and fatalities (FATALITIES) and injuries (INJURIES) are summed within each group. The 10 event types with the highest combined total are then selected and reshaped into a long format suitable for plotting.

# Summarize health impact data
health_impact <- storm_data %>%
        group_by(EVTYPE) %>%
        summarise(Fatalities = sum(FATALITIES, na.rm = TRUE),
                  Injuries = sum(INJURIES, na.rm = TRUE),
                  Total_Health = Fatalities + Injuries) %>%
        arrange(desc(Total_Health)) %>% # Sort by total impact
        head(10) # Select top 10 events

# Transform data from wide to long format for ggplot2
health_impact_long <- health_impact %>%
        select(EVTYPE, Fatalities, Injuries) %>%
        pivot_longer(cols = c(Fatalities, Injuries),
                     names_to = "Impact_Type",
                     values_to = "Count")
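
The grouping above uses EVTYPE exactly as recorded. If the raw labels differ only in case or surrounding whitespace (an assumption that is not checked in this report), such variants would be counted as separate event types. A minimal base R sketch of normalizing the labels before summing, on toy data with a hypothetical EVTYPE_CLEAN column:

```r
# Toy data: the same event recorded with inconsistent case and whitespace
toy <- data.frame(
  EVTYPE     = c("TORNADO", "Tornado", " TORNADO ", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4)
)

# Normalize labels: strip surrounding whitespace, then upper-case
toy$EVTYPE_CLEAN <- toupper(trimws(toy$EVTYPE))

# Sum fatalities per cleaned label (base R counterpart of group_by + summarise)
clean_totals <- aggregate(FATALITIES ~ EVTYPE_CLEAN, data = toy, FUN = sum)
print(clean_totals)
```

This sketch does not merge labels that are spelled differently for the same phenomenon; that would need an explicit mapping.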

3. Processing Data for Economic Consequences Analysis

The economic data require more transformation. Damage is stored in two pairs of columns: PROPDMG/PROPDMGEXP and CROPDMG/CROPDMGEXP. Each *DMGEXP column holds a magnitude code, typically a letter such as ‘K’, ‘M’, or ‘B’ (thousands, millions, billions). A function maps these codes to numeric multipliers: blanks and the symbols ‘+’, ‘-’, and ‘?’ are given a multiplier of 1, and any other unrecognized code becomes NA and is excluded from the totals. The damage in U.S. dollars is then calculated for both property and crops.

Justification: This conversion is critical. Without it, PROPDMG = 5 with PROPDMGEXP = 'B' ($5 billion) would be treated the same as PROPDMG = 5 with PROPDMGEXP = 'K' ($5 thousand), and the totals would be badly wrong.
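
To make the point concrete, here is a small worked example on toy values. It uses a named lookup vector (the hypothetical exp_lookup) rather than the report's own function, and covers only the letter codes; it is a sketch of the idea, not the conversion used for the results.

```r
# Hypothetical lookup table: names are magnitude codes, values are multipliers
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

toy <- data.frame(
  PROPDMG    = c(5, 5, 2.5),
  PROPDMGEXP = c("B", "K", "m")
)

# Upper-case the codes, look up each multiplier by name, and scale the value.
# unname() drops the code names that indexing by name would otherwise carry over.
toy$Prop_Damage_Dollars <- toy$PROPDMG * unname(exp_lookup[toupper(toy$PROPDMGEXP)])
print(toy)
```

The first two rows share the value 5 but differ by a factor of one million once the code is applied.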

# Function to convert exponent letters to numeric multipliers
convert_exp <- function(exp) {
        exp <- toupper(exp) # Convert to uppercase for consistency
        multiplier <- case_when(
                exp %in% c("", "+", "-", "?") ~ 1,    # Assume base value
                exp == "H" ~ 100,                     # Hundreds
                exp == "K" ~ 1000,                    # Thousands
                exp == "M" ~ 1e6,                     # Millions
                exp == "B" ~ 1e9,                     # Billions
                TRUE ~ NA_real_                       # Handle any unexpected values as NA
        )
        return(multiplier)
}

# Calculate economic damage in dollars and summarize
econ_impact <- storm_data %>%
        mutate(
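            # convert_exp() is vectorized (case_when), so no sapply() is needed
            Prop_Damage_Dollars = PROPDMG * convert_exp(PROPDMGEXP),
            Crop_Damage_Dollars = CROPDMG * convert_exp(CROPDMGEXP),
            # Treat an NA component as 0 so one unknown exponent does not discard the other value
            Total_Econ_Damage = coalesce(Prop_Damage_Dollars, 0) + coalesce(Crop_Damage_Dollars, 0)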
        ) %>%
        group_by(EVTYPE) %>%
        summarise(
            Property_Damage = sum(Prop_Damage_Dollars, na.rm = TRUE),
            Crop_Damage = sum(Crop_Damage_Dollars, na.rm = TRUE),
            Total_Damage = sum(Total_Econ_Damage, na.rm = TRUE)
        ) %>%
        arrange(desc(Total_Damage)) %>%
        head(10)

# Transform data for plotting, converting dollars to billions for better axis labels
econ_impact_long <- econ_impact %>%
        select(EVTYPE, Property_Damage, Crop_Damage) %>%
        pivot_longer(cols = c(Property_Damage, Crop_Damage),
                     names_to = "Damage_Type",
                     values_to = "Cost_Dollars") %>%
        mutate(Cost_Billions = Cost_Dollars / 1e9)
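
One detail worth checking in any per-row total: in R, adding NA to a number yields NA, so a row with one unrecognized exponent contributes nothing to a sum taken with na.rm = TRUE, even if its other damage value is valid. A minimal base R sketch on toy values, where na_to_zero is a hypothetical helper:

```r
prop <- c(1000, NA, 250)   # toy property damage in dollars
crop <- c(NA, 500, 50)     # toy crop damage in dollars

# Row-wise addition: the result is NA wherever either part is NA
naive_total <- prop + crop

# Replace NA with 0 before adding, so the valid part of each row is kept
na_to_zero <- function(x) ifelse(is.na(x), 0, x)
safe_total <- na_to_zero(prop) + na_to_zero(crop)

sum(naive_total, na.rm = TRUE)
sum(safe_total)
```

In this toy case the naive sum keeps only the third row, while the second approach keeps every valid value.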

Results

1. Events Most Harmful to Population Health

The following plot shows the top 10 most harmful weather event types by their total impact on population health, broken down into fatalities and injuries.

ggplot(health_impact_long, aes(x = reorder(EVTYPE, -Count), y = Count, fill = Impact_Type)) +
        geom_bar(stat = "identity") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) + # Rotate x-axis labels
        labs(title = "Top 10 Most Harmful Weather Events to Population Health",
             subtitle = "U.S., 1950-2011",
             x = "Event Type",
             y = "Total Number of Fatalities & Injuries",
             fill = "Type of Impact") +
        scale_fill_manual(values = c("Fatalities" = "red3", "Injuries" = "orange"))

Figure 1: Impact of Severe Weather Events on Population Health. This bar chart displays the top 10 event types with the highest combined number of fatalities and injuries. Tornadoes are the most devastating event by a significant margin, causing over 90,000 combined casualties. Excessive heat and flash floods are also major contributors to fatalities, while thunderstorm winds cause a substantial number of injuries.
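
The bar order in the plot code comes from reorder(), which by default ranks each event type by the mean of the supplied values. Because every event type has exactly two rows in the long data (fatalities and injuries), the mean and the sum give the same order; passing FUN = sum simply makes the intent explicit. A small sketch on toy counts:

```r
toy <- data.frame(
  EVTYPE = c("A", "A", "B", "B", "C", "C"),
  Count  = c(10, 1, 2, 30, 5, 5)
)

# Order event types by descending total: negate the counts and sum per type
ordered_types <- reorder(toy$EVTYPE, -toy$Count, FUN = sum)
levels(ordered_types)
```

Here the totals are 11, 32, and 10 for A, B, and C, so B comes first, then A, then C.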

2. Events with the Greatest Economic Consequences

The plot below illustrates the top 10 most economically damaging event types, with costs separated into property and crop damage (in billions of U.S. dollars).

ggplot(econ_impact_long, aes(x = reorder(EVTYPE, -Cost_Billions), y = Cost_Billions, fill = Damage_Type)) +
        geom_bar(stat = "identity") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
        labs(title = "Top 10 Weather Events with the Greatest Economic Consequences",
             subtitle = "U.S., 1950-2011",
             x = "Event Type",
             y = "Total Cost (Billions of USD)",
             fill = "Damage Type") +
        scale_fill_manual(values = c("Property_Damage" = "steelblue", "Crop_Damage" = "goldenrod2"),
                          # Named labels are matched to levels by name, not by position
                          labels = c("Crop_Damage" = "Crop Damage", "Property_Damage" = "Property Damage"))

Figure 2: Economic Impact of Severe Weather Events. This bar chart shows the top 10 most costly event types. Floods have caused the greatest total economic damage, predominantly through property destruction. Hurricanes/typhoons and storm surges are also immensely damaging to property. In contrast, drought is the leading cause of crop damage by a wide margin, followed by floods and river flooding. This highlights how different events threaten different sectors of the economy.