This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database to identify which types of severe weather events have the greatest impact on population health and economic stability across the United States. Data spanning 1950 to 2011 were loaded directly from the compressed CSV file. The analysis aggregates total fatalities and injuries to measure harm to health, and total property and crop damage to measure economic consequences. The main findings indicate that tornadoes are by far the most harmful events to public health, while floods and hurricanes cause the greatest economic damage. The results are presented in two figures intended to help municipal managers prioritize resources.
This section describes how the raw data were loaded and transformed
for analysis. The analysis starts directly from the compressed
.csv.bz2 file.
# Load required libraries
library(dplyr)
library(ggplot2)
library(knitr)
# Loading the Data

The Storm Data is loaded directly into R from the bzip2-compressed file.
# Read the data from the compressed CSV file
file_name <- "repdata_data_StormData.csv.bz2"
storm_data <- read.csv(file_name)
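The loading step relies on `read.csv()` reading a bzip2-compressed file without an explicit decompression step. A minimal sketch of that behavior follows, using a small made-up data frame written to a temporary `.csv.bz2` file. The toy values and the temporary path are assumptions for illustration and are not part of the Storm Data.

```r
# Toy data standing in for the real storm file (values are made up)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(3, 1),
  stringsAsFactors = FALSE
)

# Write the toy data to a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)

# Remove the temporary file
unlink(tmp)
```

If the read step fails on the real file, checking `file.exists(file_name)` first is a quick way to rule out a wrong working directory.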
# Data Transformation and Cleaning

We focus on the EVTYPE (event type), FATALITIES, INJURIES, PROPDMG (property damage), PROPDMGEXP, CROPDMG (crop damage), and CROPDMGEXP variables. The damage variables require cleaning to convert the damage exponent codes (H, K, M, B) into numerical multipliers.
# Select relevant columns for analysis
data_subset <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# --- Clean Economic Damage Variables ---
# Function to convert exponent codes (PROPDMGEXP/CROPDMGEXP) to multipliers
get_multiplier <- function(exp_code) {
  # Letter codes: hundreds, thousands, millions, billions
  if (exp_code %in% c("h", "H")) return(100)
  if (exp_code %in% c("k", "K")) return(1000)
  if (exp_code %in% c("m", "M")) return(1000000)
  if (exp_code %in% c("b", "B")) return(1000000000)
  # Blank, symbol, and digit codes are treated as a multiplier of 1
  if (exp_code %in% c("", "+", "-", "?", "0", "1", "2", "3", "4", "5", "6", "7", "8")) return(1)
  return(0) # Default to 0 for unknown/unlisted codes
}
# Apply the multipliers (sapply calls get_multiplier once per row)
data_subset <- data_subset %>%
  mutate(
    # Look up the multiplier for each property and crop damage exponent
    prop_multiplier = sapply(PROPDMGEXP, get_multiplier, USE.NAMES = FALSE),
    crop_multiplier = sapply(CROPDMGEXP, get_multiplier, USE.NAMES = FALSE),
    # Calculate actual damage values (in USD)
    PropertyDamage = PROPDMG * prop_multiplier,
    CropDamage = CROPDMG * crop_multiplier,
    # Calculate total economic damage
    TotalEconomicDamage = PropertyDamage + CropDamage,
    # Calculate total population harm (fatalities + injuries)
    TotalPopulationHarm = FATALITIES + INJURIES
  )
# Clean up event types (EVTYPE) by converting to uppercase for aggregation consistency
data_subset$EVTYPE <- toupper(data_subset$EVTYPE)
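Because `sapply()` calls `get_multiplier()` once per row, the conversion may be slow on a large data set. One possible alternative, sketched here under the assumption that the same code-to-multiplier mapping is wanted, is a named lookup vector applied to a whole column at once. The names `multiplier_lookup`, `unit_codes`, and `lookup_multiplier` are hypothetical and do not appear in the original analysis; the amounts and codes are made up.

```r
# Same letter-code mapping as get_multiplier(), as a named lookup vector
multiplier_lookup <- c(
  h = 1e2, H = 1e2,
  k = 1e3, K = 1e3,
  m = 1e6, M = 1e6,
  b = 1e9, B = 1e9
)

# Codes that the report treats as "no scaling" (multiplier of 1)
unit_codes <- c("", "+", "-", "?", as.character(0:8))

# Hypothetical helper: converts a whole vector of exponent codes at once
lookup_multiplier <- function(exp_codes) {
  exp_codes <- as.character(exp_codes)
  out <- unname(multiplier_lookup[exp_codes])  # NA for codes not in the table
  out[exp_codes %in% unit_codes] <- 1          # blank, symbol, and digit codes
  out[is.na(out)] <- 0                         # unknown codes default to 0
  out
}

# Made-up amounts and codes for illustration
amounts <- c(25, 2.5, 1, 10, 7)
codes <- c("K", "M", "B", "", "x")
damage_usd <- amounts * lookup_multiplier(codes)
print(damage_usd)
```

In this sketch, an amount of 25 with code `K` becomes 25,000 USD, and the unrecognized code `x` zeroes out its amount, mirroring the default branch of `get_multiplier()`.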
## Results

# 1. Most Harmful Events to Population Health

The most harmful event types are determined by aggregating the total number of fatalities and injuries for each event type (EVTYPE). We consider the top 10 most harmful events.
# Aggregate total harm by event type and select the top 10
harm_by_event <- data_subset %>%
  group_by(EVTYPE) %>%
  summarise(
    TotalFatalities = sum(FATALITIES),
    TotalInjuries = sum(INJURIES),
    TotalHarm = sum(TotalPopulationHarm)
  ) %>%
  arrange(desc(TotalHarm)) %>%
  top_n(10, TotalHarm)
# Convert EVTYPE to factor for proper plotting order
harm_by_event$EVTYPE <- factor(harm_by_event$EVTYPE, levels = harm_by_event$EVTYPE)
# Reshape data for plotting (Fatalities and Injuries in separate bars)
harm_long <- harm_by_event %>%
  select(EVTYPE, TotalFatalities, TotalInjuries) %>%
  tidyr::gather(key = "Type", value = "Count", -EVTYPE)
# Create the plot (Figure 1)
ggplot(harm_long, aes(x = EVTYPE, y = Count, fill = Type)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Figure 1: Top 10 Weather Events Most Harmful to Population Health (1950-2011)",
    x = "Event Type",
    y = "Total Number of Fatalities and Injuries",
    fill = "Harm Type",
    caption = "Data is stacked to show total impact. Tornadoes cause vastly more harm than any other event type."
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = c("TotalFatalities" = "darkred", "TotalInjuries" = "salmon"))
Figure 1 Caption: The figure above shows the total combined fatalities
and injuries for the top 10 most harmful severe weather event types.
Tornadoes are clearly the most significant threat to public health,
responsible for a total of 96,979 recorded fatalities and injuries.
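The aggregation and reshaping steps in this section use `top_n()` and `gather()`, which still work but are superseded in current versions of dplyr and tidyr. A minimal sketch of the same steps with `slice_max()` and `pivot_longer()` follows, assuming dplyr 1.0.0 or later and tidyr 1.0.0 or later are installed. The event names and totals are made up and are not values from the Storm Database.

```r
library(dplyr)
library(tidyr)

# Made-up totals for illustration only
toy_harm <- tibble(
  EVTYPE = c("A", "B", "C", "D"),
  TotalFatalities = c(10, 5, 1, 0),
  TotalInjuries = c(100, 20, 30, 2)
) %>%
  mutate(TotalHarm = TotalFatalities + TotalInjuries)

# slice_max() keeps the n largest rows, returned in descending order;
# with_ties = FALSE guarantees exactly n rows even when values tie
top_harm <- toy_harm %>%
  slice_max(TotalHarm, n = 2, with_ties = FALSE)

# pivot_longer() is the current tidyr counterpart of gather()
top_long <- top_harm %>%
  select(EVTYPE, TotalFatalities, TotalInjuries) %>%
  pivot_longer(-EVTYPE, names_to = "Type", values_to = "Count")

print(top_long)
```

One difference worth noting: `top_n()` keeps tied rows, so it can return more than 10 events, whereas `slice_max()` with `with_ties = FALSE` does not.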
# 2. Events with the Greatest Economic Consequences

Economic consequences are calculated by aggregating the total property damage and crop damage (in USD, converted using the exponent codes) for each event type. We examine the top 10 events causing the most damage.
# Aggregate total economic damage by event type and select the top 10
damage_by_event <- data_subset %>%
  group_by(EVTYPE) %>%
  summarise(
    TotalPropertyDamage = sum(PropertyDamage),
    TotalCropDamage = sum(CropDamage),
    TotalEconomicDamage = sum(TotalEconomicDamage)
  ) %>%
  arrange(desc(TotalEconomicDamage)) %>%
  top_n(10, TotalEconomicDamage)
# Convert economic damage to billions of USD for readability
damage_by_event <- damage_by_event %>%
  mutate(
    TotalEconomicDamage_B = TotalEconomicDamage / 1e9,
    TotalPropertyDamage_B = TotalPropertyDamage / 1e9,
    TotalCropDamage_B = TotalCropDamage / 1e9
  )
# Convert EVTYPE to factor for proper plotting order
damage_by_event$EVTYPE <- factor(damage_by_event$EVTYPE, levels = damage_by_event$EVTYPE)
# Reshape data for plotting
damage_long <- damage_by_event %>%
  select(EVTYPE, TotalPropertyDamage_B, TotalCropDamage_B) %>%
  tidyr::gather(key = "Type", value = "Amount", -EVTYPE)
# Create the plot (Figure 2)
ggplot(damage_long, aes(x = EVTYPE, y = Amount, fill = Type)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(
    title = "Figure 2: Top 10 Weather Events with the Greatest Economic Consequences (1950-2011)",
    x = "Event Type",
    y = "Total Damage (Billions USD)",
    fill = "Damage Type",
    caption = "Floods and Hurricane/Typhoons cause the most extensive total property and crop damage."
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_manual(values = c("TotalPropertyDamage_B" = "darkgreen", "TotalCropDamage_B" = "lightgreen"))
Figure 2 Caption: This figure illustrates the top 10 severe weather
event types causing the highest total economic damage, measured in
billions of USD. Floods lead the category, primarily due to property
damage, followed by Hurricane/Typhoon events. The combined economic
impact of these two event types is significantly higher than that of
the other types shown.
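Both figures depend on converting EVTYPE to a factor whose levels follow the sorted order of the data; otherwise ggplot2 would arrange the bars by the factor's default alphabetical levels. A small base R sketch of the difference, using made-up damage totals (the object names `toy_damage`, `alphabetical`, and `by_damage` are hypothetical):

```r
# Made-up damage totals (billions USD), already sorted in descending order
toy_damage <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HAIL"),
  Amount = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Default factor(): levels are sorted alphabetically
alphabetical <- factor(toy_damage$EVTYPE)
print(levels(alphabetical))

# Explicit levels: the factor keeps the row order of the sorted data frame,
# so bars are drawn from largest to smallest
by_damage <- factor(toy_damage$EVTYPE, levels = toy_damage$EVTYPE)
print(levels(by_damage))
```

This only works as intended when each event type appears once in the sorted data frame, which holds here because the data are aggregated by EVTYPE before the conversion.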