This analysis explores the NOAA Storm Database to identify which types of severe weather events are most harmful to population health and have the greatest economic consequences across the United States. The database contains records of major storms and weather events from 1950 to November 2011, including fatalities, injuries, and property/crop damage estimates. To address population health impacts, we analyzed fatalities and injuries by event type, finding that tornadoes cause the most total casualties, followed by excessive heat and flash floods. For economic consequences, we examined property and crop damage, discovering that floods cause the greatest total economic damage, followed by hurricanes/typhoons and tornadoes. The analysis reveals that while tornadoes are the deadliest individual event type, floods represent the costliest natural disaster category. These findings can help government and municipal managers prioritize resources and emergency preparedness efforts. The analysis processes the raw CSV data through data cleaning, aggregation, and visualization to ensure reproducible results. All data transformations are documented and justified within the analysis workflow.
# Load required libraries
library(dplyr)
library(ggplot2)
library(knitr)
library(tidyr) # For pivot_longer function
# Set options for better output formatting
options(scipen = 999) # Avoid scientific notation
# Download and read the storm data
if (!file.exists("StormData.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"StormData.csv.bz2")
}
# Read the compressed CSV file
storm_data <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)
# Display basic information about the dataset
cat("Dataset dimensions:", dim(storm_data)[1], "rows x", dim(storm_data)[2], "columns\n")
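# BGN_DATE is stored as text, so parse it before taking min/max (string comparison would sort lexicographically)
bgn_dates <- as.Date(storm_data$BGN_DATE, format = "%m/%d/%Y")
cat("Date range:", format(min(bgn_dates, na.rm = TRUE)), "to", format(max(bgn_dates, na.rm = TRUE)), "\n")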
# Show structure of key variables
str(storm_data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
"CROPDMG", "CROPDMGEXP")])
# Clean event types - convert to uppercase and trim whitespace
storm_data$EVTYPE <- toupper(trimws(storm_data$EVTYPE))
# Function to convert property/crop damage to actual dollar amounts
convert_damage <- function(damage, exp) {
# Convert exponent codes to multipliers
multiplier <- case_when(
toupper(exp) == "K" ~ 1000,
toupper(exp) == "M" ~ 1000000,
toupper(exp) == "B" ~ 1000000000,
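toupper(exp) == "H" ~ 100,
# Digit codes 1-8 are treated as powers of ten; case_when evaluates every branch on all rows,
# so suppress the coercion warnings raised by non-numeric codes
toupper(exp) %in% as.character(1:8) ~ 10^suppressWarnings(as.numeric(exp)),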
TRUE ~ 1
)
return(damage * multiplier)
}
# Apply damage conversion
storm_data$PROP_DAMAGE <- convert_damage(storm_data$PROPDMG, storm_data$PROPDMGEXP)
storm_data$CROP_DAMAGE <- convert_damage(storm_data$CROPDMG, storm_data$CROPDMGEXP)
storm_data$TOTAL_DAMAGE <- storm_data$PROP_DAMAGE + storm_data$CROP_DAMAGE
# Create total casualties variable
storm_data$TOTAL_CASUALTIES <- storm_data$FATALITIES + storm_data$INJURIES
# Show summary of processed data
summary(storm_data[, c("FATALITIES", "INJURIES", "TOTAL_CASUALTIES",
"PROP_DAMAGE", "CROP_DAMAGE", "TOTAL_DAMAGE")])
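The exponent conversion can be sanity-checked on a handful of values before trusting the aggregated totals. The following is a minimal sketch in base R, not part of the original workflow: `exp_multiplier` is a hypothetical helper that reproduces the same code-to-multiplier mapping with a named-vector lookup, and the `damage` and `codes` vectors are made-up toy inputs.

```r
# Hypothetical helper: same mapping as convert_damage(), written as a lookup
exp_multiplier <- function(exp) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(exp)
  mult <- rep(1, length(code))              # blank or unknown codes count as 1
  is_letter <- code %in% names(lookup)
  mult[is_letter] <- lookup[code[is_letter]]
  is_digit <- code %in% as.character(1:8)   # digit codes act as powers of ten
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  mult
}

# Toy inputs (not taken from the storm data)
damage <- c(25, 2.5, 1, 5, 3)
codes  <- c("K", "m", "B", "", "2")
dollars <- damage * exp_multiplier(codes)
print(dollars)
```

Because the lookup only converts codes it has already matched, it avoids coercing letter codes to numbers. Whether digit codes should really be read as powers of ten is an assumption carried over from the conversion function above.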
# Aggregate health impact by event type
health_impact <- storm_data %>%
group_by(EVTYPE) %>%
summarise(
Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
Total_Injuries = sum(INJURIES, na.rm = TRUE),
Total_Casualties = sum(TOTAL_CASUALTIES, na.rm = TRUE),
.groups = 'drop'
) %>%
filter(Total_Casualties > 0) %>%
arrange(desc(Total_Casualties))
# Display top 15 most harmful events
kable(head(health_impact, 15),
caption = "Top 15 Weather Events Most Harmful to Population Health",
col.names = c("Event Type", "Total Fatalities", "Total Injuries", "Total Casualties"))
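As a cross-check on the grouped sums, the same aggregation can be reproduced in base R on a small data frame. This is only a sketch: the `toy` data below is hypothetical, and `aggregate()` stands in for the `group_by()`/`summarise()` pipeline used in the analysis.

```r
# Hypothetical toy records, not taken from the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)
toy$TOTAL_CASUALTIES <- toy$FATALITIES + toy$INJURIES

# Sum each measure within event type, then rank by total casualties
agg <- aggregate(cbind(FATALITIES, INJURIES, TOTAL_CASUALTIES) ~ EVTYPE,
                 data = toy, FUN = sum)
agg <- agg[order(-agg$TOTAL_CASUALTIES), ]
print(agg)
```

Running both approaches on the same subset of the real data and comparing the totals is a cheap way to confirm that the ranking does not depend on the aggregation tool.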
# Create visualization for health impacts
top_health <- head(health_impact, 10)
# Reshape data for stacked bar chart
health_plot_data <- top_health %>%
select(EVTYPE, Total_Fatalities, Total_Injuries) %>%
pivot_longer(cols = c(Total_Fatalities, Total_Injuries),
names_to = "Type", values_to = "Count") %>%
mutate(Type = gsub("Total_", "", Type))
# Create the plot
p1 <- ggplot(health_plot_data, aes(x = reorder(EVTYPE, Count), y = Count, fill = Type)) +
geom_bar(stat = "identity", position = "stack") +
coord_flip() +
labs(title = "Top 10 Weather Events Most Harmful to Population Health",
x = "Event Type",
y = "Number of People Affected",
fill = "Impact Type") +
scale_fill_manual(values = c("Fatalities" = "#d62728", "Injuries" = "#ff7f0e")) +
theme_minimal() +
theme(axis.text.y = element_text(size = 8))
print(p1)
# Aggregate economic impact by event type
economic_impact <- storm_data %>%
group_by(EVTYPE) %>%
summarise(
Property_Damage = sum(PROP_DAMAGE, na.rm = TRUE),
Crop_Damage = sum(CROP_DAMAGE, na.rm = TRUE),
Total_Economic_Damage = sum(TOTAL_DAMAGE, na.rm = TRUE),
.groups = 'drop'
) %>%
filter(Total_Economic_Damage > 0) %>%
arrange(desc(Total_Economic_Damage))
# Convert to billions for better readability
economic_impact$Property_Damage_Billions <- economic_impact$Property_Damage / 1e9
economic_impact$Crop_Damage_Billions <- economic_impact$Crop_Damage / 1e9
economic_impact$Total_Damage_Billions <- economic_impact$Total_Economic_Damage / 1e9
# Display top 15 most economically damaging events
kable(head(economic_impact[, c("EVTYPE", "Property_Damage_Billions",
"Crop_Damage_Billions", "Total_Damage_Billions")], 15),
caption = "Top 15 Weather Events with Greatest Economic Impact (Billions USD)",
col.names = c("Event Type", "Property Damage", "Crop Damage", "Total Damage"),
digits = 2)
# Create visualization for economic impacts
top_economic <- head(economic_impact, 10)
# Reshape data for stacked bar chart
economic_plot_data <- top_economic %>%
select(EVTYPE, Property_Damage_Billions, Crop_Damage_Billions) %>%
pivot_longer(cols = c(Property_Damage_Billions, Crop_Damage_Billions),
names_to = "Damage_Type", values_to = "Amount") %>%
mutate(Damage_Type = case_when(
Damage_Type == "Property_Damage_Billions" ~ "Property Damage",
Damage_Type == "Crop_Damage_Billions" ~ "Crop Damage"
))
# Create the plot
p2 <- ggplot(economic_plot_data, aes(x = reorder(EVTYPE, Amount), y = Amount, fill = Damage_Type)) +
geom_bar(stat = "identity", position = "stack") +
coord_flip() +
labs(title = "Top 10 Weather Events with Greatest Economic Impact",
x = "Event Type",
y = "Economic Damage (Billions USD)",
fill = "Damage Type") +
scale_fill_manual(values = c("Property Damage" = "#2ca02c", "Crop Damage" = "#ff7f0e")) +
theme_minimal() +
theme(axis.text.y = element_text(size = 8))
print(p2)
# Create summary statistics
health_summary <- storm_data %>%
summarise(
Total_Events = n(),
Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
Total_Injuries = sum(INJURIES, na.rm = TRUE),
Events_with_Casualties = sum(TOTAL_CASUALTIES > 0, na.rm = TRUE)
)
economic_summary <- storm_data %>%
summarise(
Total_Property_Damage = sum(PROP_DAMAGE, na.rm = TRUE) / 1e9,
Total_Crop_Damage = sum(CROP_DAMAGE, na.rm = TRUE) / 1e9,
Total_Economic_Damage = sum(TOTAL_DAMAGE, na.rm = TRUE) / 1e9,
Events_with_Damage = sum(TOTAL_DAMAGE > 0, na.rm = TRUE)
)
cat("=== HEALTH IMPACT SUMMARY ===\n")
cat("Total weather events recorded:", health_summary$Total_Events, "\n")
cat("Total fatalities:", health_summary$Total_Fatalities, "\n")
cat("Total injuries:", health_summary$Total_Injuries, "\n")
cat("Events causing casualties:", health_summary$Events_with_Casualties, "\n\n")
cat("=== ECONOMIC IMPACT SUMMARY ===\n")
# sep = "" keeps the dollar sign adjacent to the amount
cat("Total property damage: $", round(economic_summary$Total_Property_Damage, 2), " billion\n", sep = "")
cat("Total crop damage: $", round(economic_summary$Total_Crop_Damage, 2), " billion\n", sep = "")
cat("Total economic damage: $", round(economic_summary$Total_Economic_Damage, 2), " billion\n", sep = "")
cat("Events causing economic damage:", economic_summary$Events_with_Damage, "\n")
Summary Statistics:
=== HEALTH IMPACT SUMMARY ===
- Total weather events recorded: 902,297
- Total fatalities: 15,145
- Total injuries: 140,528
- Events causing casualties: 21,929
=== ECONOMIC IMPACT SUMMARY ===
- Total property damage: $427.32 billion
- Total crop damage: $49.37 billion
- Total economic damage: $476.69 billion
- Events causing economic damage: 245,031
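The reported totals can be put in proportion with a few lines of arithmetic. The sketch below hard-codes the figures from the summary statistics and derives shares from them. The variable names are illustrative only, and no new data is introduced.

```r
# Figures copied from the summary statistics above
total_events        <- 902297
events_casualties   <- 21929
events_damage       <- 245031
property_billion    <- 427.32
crop_billion        <- 49.37
total_damage_billion <- 476.69

# Consistency check: property + crop should equal the reported total
stopifnot(abs(property_billion + crop_billion - total_damage_billion) < 0.01)

# Shares of recorded events that caused casualties or economic damage
pct_casualty_events <- 100 * events_casualties / total_events
pct_damage_events   <- 100 * events_damage / total_events

# Share of total economic damage attributable to property
pct_property <- 100 * property_billion / total_damage_billion

cat(sprintf("Events with casualties: %.1f%%\n", pct_casualty_events))
cat(sprintf("Events with damage: %.1f%%\n", pct_damage_events))
cat(sprintf("Property share of total damage: %.1f%%\n", pct_property))
```

On these figures, only a small fraction of recorded events involve casualties, roughly a quarter involve economic damage, and property accounts for most of the dollar losses.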
Based on the analysis of the NOAA Storm Database from 1950 to 2011:
Population Health Impact:
- Tornadoes are the most harmful weather events to population health, causing the highest number of total casualties (fatalities + injuries)
- Excessive Heat and Flash Floods also rank high in terms of health impacts
- Tornadoes alone account for a significant portion of weather-related fatalities and injuries in the United States
Economic Impact:
- Floods cause the greatest total economic damage, primarily through property destruction
- Hurricanes/Typhoons and Tornadoes also cause substantial economic losses
- Property damage generally exceeds crop damage for most event types, with floods and hurricanes being particularly destructive to infrastructure
These findings suggest that emergency management resources should prioritize tornado preparedness for protecting public health, while flood mitigation and hurricane preparedness are crucial for minimizing economic losses. The data shows clear patterns that can inform disaster preparedness planning and resource allocation decisions for government and municipal managers.