This analysis explores the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database, covering 1950 to November 2011, to identify the severe weather event types with the greatest population health and economic impact. The raw CSV data file is loaded and processed entirely within this document. Population health harm is measured as the sum of fatalities and injuries. Economic damage is measured as the sum of property damage and crop damage, after converting the exponent codes (e.g., ‘K’, ‘M’, ‘B’) into numerical dollar multipliers. The results show that tornadoes are by far the most harmful event type for population health, with 96,979 combined fatalities and injuries. Floods are the most costly event type, at roughly $150 billion in combined damage, followed by hurricanes/typhoons and tornadoes. These rankings can help municipal managers prioritize resources when preparing for severe weather.
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(knitr)
# The data file is expected in the current working directory
# (no machine-specific setwd() call, so the report can be re-run elsewhere)
storm_data <- read.csv("repdata_data_StormData.csv", header = TRUE)
storm_data_filtered <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(storm_data_filtered)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0
## 2 TORNADO          0        0     2.5          K       0
## 3 TORNADO          0        2    25.0          K       0
## 4 TORNADO          0        2     2.5          K       0
## 5 TORNADO          0        2     2.5          K       0
## 6 TORNADO          0        6     2.5          K       0
exponent_to_multiplier <- function(exp) {
  # Convert to uppercase so that "k" and "K" are treated alike
  exp <- toupper(exp)
  # Map each exponent code to its dollar multiplier
  multiplier <- case_when(
    exp == "H" ~ 100,   # Hundred
    exp == "K" ~ 1000,  # Thousand
    exp == "M" ~ 10^6,  # Million
    exp == "B" ~ 10^9,  # Billion
    exp %in% c("", "0", "+", "-", "?") ~ 1,  # No multiplier: amount is used as-is
    TRUE ~ 0  # Any other code (e.g., a digit other than 0) zeroes the amount
  )
  return(multiplier)
}
storm_data_processed <- storm_data_filtered %>%
  mutate(
    # Calculate property damage cost
    prop_damage_multiplier = exponent_to_multiplier(PROPDMGEXP),
    property_damage = PROPDMG * prop_damage_multiplier,
    # Calculate crop damage cost
    crop_damage_multiplier = exponent_to_multiplier(CROPDMGEXP),
    crop_damage = CROPDMG * crop_damage_multiplier,
    # Calculate total harm to population health
    health_harm = FATALITIES + INJURIES,
    # Calculate total economic damage
    economic_consequence = property_damage + crop_damage
  ) %>%
  # Filter out events with zero impact (for efficiency, though not strictly required)
  filter(health_harm > 0 | economic_consequence > 0) %>%
  # Simplify the EVTYPE variable by converting to uppercase and trimming whitespace
  mutate(EVTYPE = toupper(trimws(EVTYPE)))
head(storm_data_processed %>% select(EVTYPE, property_damage, crop_damage, health_harm))
##    EVTYPE property_damage crop_damage health_harm
## 1 TORNADO           25000           0          15
## 2 TORNADO            2500           0           0
## 3 TORNADO           25000           0           2
## 4 TORNADO            2500           0           2
## 5 TORNADO            2500           0           2
## 6 TORNADO            2500           0           6
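The exponent conversion above can be spot-checked on a handful of made-up records. The sketch below is not the function used in this report: it is a base R variant (a named-vector lookup instead of `case_when()`) that is meant to apply the same rules, and the names `multipliers`, `code_to_multiplier`, and `toy` are hypothetical.

```r
# Hypothetical lookup table mirroring the H/K/M/B rules used in this report
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Base R sketch: named-vector lookup instead of dplyr::case_when()
code_to_multiplier <- function(code) {
  code <- toupper(code)
  out <- unname(multipliers[code])                # NA where the code is not H/K/M/B
  out[code %in% c("", "0", "+", "-", "?")] <- 1   # treated as "no multiplier"
  out[is.na(out)] <- 0                            # any other code zeroes the amount
  out
}

# Toy records (made-up values, not rows from the storm database)
toy <- data.frame(
  dmg  = c(25, 2.5, 1.2, 3, 10, 7),
  code = c("K", "k", "M", "B", "", "5"),
  stringsAsFactors = FALSE
)
toy$dollars <- toy$dmg * code_to_multiplier(toy$code)
print(toy)
```

The last toy row illustrates the catch-all rule: an amount paired with an unrecognized code such as "5" contributes nothing to the totals, which is worth keeping in mind when reading the damage figures.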
# --- 1. Health Harm ---
health_harm_summary <- storm_data_processed %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Fatalities = sum(FATALITIES),
    Total_Injuries = sum(INJURIES),
    Total_Health_Harm = sum(health_harm)
  ) %>%
  arrange(desc(Total_Health_Harm))
top_10_health <- head(health_harm_summary, 10)
# --- 2. Economic Consequences ---
economic_consequence_summary <- storm_data_processed %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Property_Damage = sum(property_damage) / 10^9,           # Convert to billions of USD
    Total_Crop_Damage = sum(crop_damage) / 10^9,                   # Convert to billions of USD
    Total_Economic_Consequence = sum(economic_consequence) / 10^9  # Convert to billions of USD
  ) %>%
  arrange(desc(Total_Economic_Consequence))
top_10_economic <- head(economic_consequence_summary, 10)
# Display the top 10 events most harmful to population health
kable(top_10_health,
      caption = "Top 10 Most Harmful Severe Weather Events (Fatalities + Injuries)",
      format = "markdown")
| EVTYPE | Total_Fatalities | Total_Injuries | Total_Health_Harm |
|---|---|---|---|
| TORNADO | 5633 | 91346 | 96979 |
| EXCESSIVE HEAT | 1903 | 6525 | 8428 |
| TSTM WIND | 504 | 6957 | 7461 |
| FLOOD | 470 | 6789 | 7259 |
| LIGHTNING | 816 | 5230 | 6046 |
| HEAT | 937 | 2100 | 3037 |
| FLASH FLOOD | 978 | 1777 | 2755 |
| ICE STORM | 89 | 1975 | 2064 |
| THUNDERSTORM WIND | 133 | 1488 | 1621 |
| WINTER STORM | 206 | 1321 | 1527 |
# Plotting the results (Figure 1 - Health Harm)
# Reshape to long format so ggplot can stack Fatalities and Injuries in one bar per event type
top_10_health_plot <- top_10_health %>%
  select(EVTYPE, Total_Fatalities, Total_Injuries) %>%
  tidyr::pivot_longer(cols = starts_with("Total"), names_to = "Harm_Type", values_to = "Count")
ggplot(top_10_health_plot, aes(x = reorder(EVTYPE, Count), y = Count, fill = Harm_Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Figure 1: Top 10 Severe Weather Events by Total Population Health Harm",
    subtitle = "Aggregated Fatalities and Injuries, 1950-2011",
    x = "Event Type",
    y = "Total Number of People Affected (Fatalities or Injuries)",
    fill = "Type of Harm"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("Total_Fatalities" = "darkred", "Total_Injuries" = "salmon"))
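The bar order in the plot comes from `reorder(EVTYPE, Count)`, which by default ranks each event type by the mean of its `Count` values in the long-format data. With exactly two rows per event type, that gives the same order as the combined total. As a minimal sketch, using three rows copied from the health table and the hypothetical names `long` and `ordered_types`, the ordering can be made explicit by passing `FUN = sum`:

```r
# Long-format toy data built from three rows of the health table
long <- data.frame(
  EVTYPE    = rep(c("LIGHTNING", "TORNADO", "FLOOD"), each = 2),
  Harm_Type = rep(c("Total_Fatalities", "Total_Injuries"), times = 3),
  Count     = c(816, 5230, 5633, 91346, 470, 6789),
  stringsAsFactors = FALSE
)

# reorder() summarises Count per EVTYPE; FUN = sum ranks the factor levels
# by the full stacked bar length rather than by the default mean
ordered_types <- reorder(long$EVTYPE, long$Count, FUN = sum)

# Levels are returned in ascending order of the summed counts
levels(ordered_types)
```

After `coord_flip()`, the last factor level is drawn at the top of the chart, so the event type with the largest total appears first when reading downward.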
# Display the top 10 events with the greatest economic consequences
kable(top_10_economic,
      caption = "Top 10 Severe Weather Events by Total Economic Consequence (Billions USD)",
      digits = 2,
      format = "markdown")
| EVTYPE | Total_Property_Damage | Total_Crop_Damage | Total_Economic_Consequence |
|---|---|---|---|
| FLOOD | 144.66 | 5.66 | 150.32 |
| HURRICANE/TYPHOON | 69.31 | 2.61 | 71.91 |
| TORNADO | 56.94 | 0.41 | 57.35 |
| STORM SURGE | 43.32 | 0.00 | 43.32 |
| HAIL | 15.73 | 3.03 | 18.76 |
| FLASH FLOOD | 16.14 | 1.42 | 17.56 |
| DROUGHT | 1.05 | 13.97 | 15.02 |
| HURRICANE | 11.87 | 2.74 | 14.61 |
| RIVER FLOOD | 5.12 | 5.03 | 10.15 |
| ICE STORM | 3.94 | 5.02 | 8.97 |
# Plotting the results (Figure 2 - Economic Consequences)
# Reshape to long format so ggplot can stack Property and Crop Damage in one bar per event type
top_10_economic_plot <- top_10_economic %>%
  select(EVTYPE, Total_Property_Damage, Total_Crop_Damage) %>%
  tidyr::pivot_longer(cols = starts_with("Total"), names_to = "Damage_Type", values_to = "Cost_Billion_USD")
ggplot(top_10_economic_plot, aes(x = reorder(EVTYPE, Cost_Billion_USD), y = Cost_Billion_USD, fill = Damage_Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(
    title = "Figure 2: Top 10 Severe Weather Events by Total Economic Damage",
    subtitle = "Property and Crop Damage (in Billions USD), 1950-2011",
    x = "Event Type",
    y = "Total Cost (Billions USD)",
    fill = "Type of Damage"
  ) +
  theme_minimal() +
  scale_fill_manual(values = c("Total_Property_Damage" = "darkblue", "Total_Crop_Damage" = "goldenrod"))
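Tornado events are by far the most damaging to population health. They account for the highest totals of both fatalities (5,633) and injuries (91,346), and their combined harm of 96,979 is more than eleven times that of the next event type, Excessive Heat (8,428). Other significant contributors include TSTM (Thunderstorm) Wind, Flood, and Lightning.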
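Several labels in the health table appear to describe the same phenomenon (for example TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT). The sketch below shows one way to test how sensitive the ranking is to that split. It uses only the ten combined totals reported in the table, and the `merge_map` mapping is an assumption made for illustration, not a rule taken from the NOAA documentation.

```r
# Combined fatalities + injuries, copied from the top-10 health table
health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  harm = c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527),
  stringsAsFactors = FALSE
)

# Hypothetical mapping of label variants onto a single category (assumption)
merge_map <- c("TSTM WIND" = "THUNDERSTORM WIND", "HEAT" = "EXCESSIVE HEAT")

# Replace mapped labels, keep every other label unchanged
health$group <- ifelse(health$EVTYPE %in% names(merge_map),
                       unname(merge_map[health$EVTYPE]),
                       health$EVTYPE)

# Re-aggregate and re-rank by the merged categories
merged <- aggregate(harm ~ group, data = health, FUN = sum)
merged <- merged[order(-merged$harm), ]
print(merged, row.names = FALSE)
```

Under this assumed mapping, tornadoes remain far ahead of every other category, while the order of the heat and thunderstorm wind categories behind them depends on how the labels are merged. A full check would apply the mapping to `storm_data_processed` before summarising, since label variants outside the top ten are not captured here.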
Flood events cause the greatest total economic damage, about $150 billion, almost all of it property damage ($144.66 billion). Hurricane/Typhoon ($71.91 billion) and Tornado ($57.35 billion) events rank second and third, again with damage concentrated in property. Drought does not appear among the top ten events for health harm, but it is a major economic factor: $13.97 billion of its $15.02 billion total is crop damage, the largest crop loss of any event type in the top ten.
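The split between property and crop damage can be expressed as a share of each event type's total. The following is a minimal sketch using four rows copied from the economic table (values in billions of USD); the data frame `econ` and the column `crop_share_pct` are hypothetical names introduced for this example.

```r
# Property and crop damage in billions of USD, copied from the economic table
econ <- data.frame(
  EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "DROUGHT"),
  property = c(144.66, 69.31, 56.94, 1.05),
  crop     = c(5.66, 2.61, 0.41, 13.97),
  stringsAsFactors = FALSE
)

# Crop damage as a percentage of each event type's combined damage
econ$crop_share_pct <- round(100 * econ$crop / (econ$property + econ$crop), 1)

print(econ)
```

For Flood, Hurricane/Typhoon, and Tornado the crop share stays below 5 percent, whereas for Drought it exceeds 90 percent, which is why Drought matters mainly to agricultural planning rather than to property protection.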