This analysis examines the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify which types of severe weather events are most harmful to population health and which have the greatest economic consequences. The dataset contains records from 1950 to 2011, with more complete data in recent years. After cleaning the data and converting the damage exponent codes to dollar amounts, we found that tornadoes are by far the most harmful event type for population health, causing the highest number of both fatalities and injuries. For economic consequences, floods inflict the greatest property damage, while droughts cause the most crop damage. Total economic damage is dominated by floods, followed by hurricanes/typhoons, tornadoes, and storm surges. These findings can help government and municipal managers prioritize resources and preparation strategies for different types of severe weather events according to their specific impacts on public safety and economic stability.
## Data Processing
### Loading Required Packages
library(dplyr)
library(ggplot2)
library(tidyr)
library(stringr)
library(scales)   # comma_format() and dollar_format() for axis labels
### Loading the Data
The data was loaded directly from the compressed CSV file, as required; `read.csv()` decompresses `.bz2` files transparently, so no manual extraction step is needed.
# Download and load the storm data from the provided URL
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "storm_data.csv.bz2"
# Download the file only if it doesn't already exist
# (mode = "wb" keeps the binary archive intact on Windows)
if (!file.exists(dest_file)) {
  download.file(file_url, dest_file, mode = "wb")
}
# Load the data
storm_data <- read.csv(dest_file, stringsAsFactors = FALSE)
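Only seven of the 37 columns are used in the rest of the analysis, so memory use can be reduced by skipping the others at read time. The following is a minimal sketch, assuming the column names that appear in the `select()` call in the next step; `read_storm_columns()` is a hypothetical helper and was not part of the original analysis.

```r
# Hypothetical helper: read only the columns this analysis needs.
read_storm_columns <- function(path,
                               keep = c("EVTYPE", "FATALITIES", "INJURIES",
                                        "PROPDMG", "PROPDMGEXP",
                                        "CROPDMG", "CROPDMGEXP")) {
  # Read just the header row to learn the column order
  header <- names(read.csv(path, nrows = 1))
  # NA lets read.csv() guess the type; "NULL" tells it to skip the column
  classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
}
```

Usage would be `storm_data <- read_storm_columns(dest_file)`; the later `select()` step then works unchanged.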
# Examine the data structure
cat("Dataset dimensions:", dim(storm_data), "\n")
## Dataset dimensions: 902297 37
cat("Number of event types:", length(unique(storm_data$EVTYPE)), "\n")
## Number of event types: 985
### Data Transformation and Cleaning
The data requires significant processing, particularly the economic damage variables, which use exponent codes (for example `K` for thousands and `B` for billions) to represent multipliers.
# Select relevant variables for analysis
clean_storm <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Examine the exponent columns to understand the coding system
cat("Unique PROPDMGEXP values:", unique(clean_storm$PROPDMGEXP), "\n")
## Unique PROPDMGEXP values: K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
cat("Unique CROPDMGEXP values:", unique(clean_storm$CROPDMGEXP), "\n")
## Unique CROPDMGEXP values: M K m B ? 0 k 2
# Function to convert damage exponents to numeric multipliers.
# case_when() evaluates every right-hand side for every element, so
# as.numeric() is also applied to letter codes such as "K"; those NAs are
# never selected, and suppressWarnings() silences the resulting
# "NAs introduced by coercion" warnings.
convert_damage_exponent <- function(exp) {
  exp <- toupper(as.character(exp))
  case_when(
    exp %in% c("", "+", "-", "?") ~ 1,
    exp == "H" ~ 100,           # Hundreds
    exp == "K" ~ 1000,          # Thousands
    exp == "M" ~ 1000000,       # Millions
    exp == "B" ~ 1000000000,    # Billions
    exp %in% as.character(0:9) ~ 10^suppressWarnings(as.numeric(exp)),
    TRUE ~ 1                    # Any other code: treat as no multiplier
  )
}
# Calculate actual damage values in dollars
clean_storm <- clean_storm %>%
  mutate(
    PROP_DAMAGE = PROPDMG * convert_damage_exponent(PROPDMGEXP),
    CROP_DAMAGE = CROPDMG * convert_damage_exponent(CROPDMGEXP),
    TOTAL_DAMAGE = PROP_DAMAGE + CROP_DAMAGE,
    TOTAL_HEALTH = FATALITIES + INJURIES
  )
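An alternative to `case_when()` (not what this analysis used) is a named lookup vector, which maps each code to its multiplier in a single indexing step and never coerces letters to numbers. This is a minimal sketch, assuming the same treatment of codes as `convert_damage_exponent()`; `exponent_multiplier` and `convert_exponent_lookup()` are hypothetical names.

```r
# Hypothetical lookup table: letter codes plus digits 0-9 as powers of ten
exponent_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                         setNames(10^(0:9), as.character(0:9)))

convert_exponent_lookup <- function(exp) {
  # Indexing by name returns NA for "", "+", "-", "?", NA and unknown codes
  mult <- unname(exponent_multiplier[toupper(as.character(exp))])
  # Treat every unmatched code as "no multiplier", as the case_when() version does
  mult[is.na(mult)] <- 1
  mult
}
```

For example, `PROPDMG * convert_exponent_lookup(PROPDMGEXP)` should produce the same dollar amounts as the `case_when()` version without any coercion warnings.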
# Standardize event types by converting to uppercase and trimming whitespace
clean_storm <- clean_storm %>%
  mutate(EVTYPE = str_to_upper(str_trim(EVTYPE)))
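Uppercasing and trimming only merges spelling variants of the same label; synonyms such as TSTM WIND and THUNDERSTORM WIND remain separate event types in the results below. The following is a minimal sketch of one way to consolidate them, assuming a hand-written, hypothetical set of regular-expression rules (`evtype_rules`, `consolidate_evtype()`). It was not applied in this analysis, and applying it could noticeably change the rankings: TSTM WIND and THUNDERSTORM WIND alone together account for 9,082 fatalities and injuries, more than the 8,428 recorded for EXCESSIVE HEAT.

```r
# Hypothetical consolidation rules: regex pattern -> canonical label.
# Rules are checked in order and the first matching pattern wins.
evtype_rules <- c(
  "^(TSTM|THUNDERSTORM) WIND" = "THUNDERSTORM WIND",
  "HURRICANE|TYPHOON"         = "HURRICANE/TYPHOON",
  "HEAT"                      = "HEAT",
  "FLASH FLOOD"               = "FLASH FLOOD",
  "^(RIVER )?FLOOD"           = "FLOOD"
)

consolidate_evtype <- function(evtype, rules = evtype_rules) {
  out <- evtype
  matched <- rep(FALSE, length(evtype))
  for (pattern in names(rules)) {
    # Only relabel values that no earlier rule has already claimed
    hit <- !matched & grepl(pattern, evtype)
    out[hit] <- rules[[pattern]]
    matched <- matched | hit
  }
  out  # unmatched labels (e.g. "TORNADO") are returned unchanged
}
```

If used, it would run just before the aggregation step, e.g. `clean_storm$EVTYPE <- consolidate_evtype(clean_storm$EVTYPE)`.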
# Aggregate data by event type for health impact analysis
health_impact <- clean_storm %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
    Total_Injuries = sum(INJURIES, na.rm = TRUE),
    Total_Health_Impact = sum(TOTAL_HEALTH, na.rm = TRUE),
    Event_Count = n()
  ) %>%
  arrange(desc(Total_Health_Impact))
# Aggregate data by event type for economic impact analysis
economic_impact <- clean_storm %>%
  group_by(EVTYPE) %>%
  summarise(
    Property_Damage = sum(PROP_DAMAGE, na.rm = TRUE),
    Crop_Damage = sum(CROP_DAMAGE, na.rm = TRUE),
    Total_Economic_Impact = sum(TOTAL_DAMAGE, na.rm = TRUE),
    Event_Count = n()
  ) %>%
  arrange(desc(Total_Economic_Impact))
# Get top events for detailed analysis
top_health <- health_impact %>% head(10)
top_economic <- economic_impact %>% head(10)
## Results
### Most Harmful Events for Population Health
# Display top 10 most harmful events for population health
cat("Top 10 Most Harmful Events for Population Health:\n")
## Top 10 Most Harmful Events for Population Health:
print(top_health[, c("EVTYPE", "Total_Fatalities", "Total_Injuries", "Total_Health_Impact")])
## # A tibble: 10 × 4
## EVTYPE Total_Fatalities Total_Injuries Total_Health_Impact
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
# Prepare data for plotting
health_plot_data <- top_health %>%
  select(EVTYPE, Total_Fatalities, Total_Injuries) %>%
  pivot_longer(cols = c(Total_Fatalities, Total_Injuries),
               names_to = "Impact_Type",
               values_to = "Count")
# Create the plot
ggplot(health_plot_data, aes(x = reorder(EVTYPE, -Count), y = Count, fill = Impact_Type)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = c("Total_Fatalities" = "#E74C3C", "Total_Injuries" = "#F39C12"),
                    labels = c("Fatalities", "Injuries")) +
  labs(title = "Top 10 Most Harmful Weather Events for Population Health",
       x = "Event Type",
       y = "Total Count",
       fill = "Impact Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma_format())
* Key Finding 1: Tornadoes are by far the most harmful event type for population health, causing 5,633 fatalities and 91,346 injuries, substantially more than any other weather event.
* Key Finding 2: Excessive heat and flash floods rank second and third in fatalities (1,903 and 978, respectively). By combined fatalities and injuries, however, the events ranked after tornadoes are excessive heat, thunderstorm wind (recorded as TSTM WIND), and floods.
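Which events count as "second" and "third" depends on the measure being ranked. The short sketch below uses only values copied from the top-10 health table above (nothing is recomputed from the raw data) to rank those events by fatalities and to compute tornadoes' share of the top-10 combined health impact.

```r
# Values copied from the top-10 health table above
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND", "WINTER STORM"),
  Fatalities = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  Injuries   = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)
top10$Total <- top10$Fatalities + top10$Injuries

# Ranking by fatalities alone gives a different order than ranking by the total
by_fatalities <- top10$EVTYPE[order(-top10$Fatalities)]
head(by_fatalities, 3)   # → TORNADO, EXCESSIVE HEAT, FLASH FLOOD

# Each event's share of the top-10 combined health impact (137,177 in total)
top10$Share <- top10$Total / sum(top10$Total)
round(100 * top10$Share[top10$EVTYPE == "TORNADO"], 1)   # → 70.7
```

Note that the 70.7% share is relative to the top-10 events only, not to all recorded events.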
### Events with the Greatest Economic Consequences
# Display top 10 events with greatest economic consequences
cat("Top 10 Events with Greatest Economic Consequences (in billions):\n")
## Top 10 Events with Greatest Economic Consequences (in billions):
top_economic_display <- top_economic %>%
  mutate(across(Property_Damage:Total_Economic_Impact, ~ . / 1e9))
print(top_economic_display[, c("EVTYPE", "Property_Damage", "Crop_Damage", "Total_Economic_Impact")])
## # A tibble: 10 × 4
## EVTYPE Property_Damage Crop_Damage Total_Economic_Impact
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 145. 5.66 150.
## 2 HURRICANE/TYPHOON 69.3 2.61 71.9
## 3 TORNADO 56.9 0.415 57.4
## 4 STORM SURGE 43.3 0.000005 43.3
## 5 HAIL 15.7 3.03 18.8
## 6 FLASH FLOOD 16.8 1.42 18.2
## 7 DROUGHT 1.05 14.0 15.0
## 8 HURRICANE 11.9 2.74 14.6
## 9 RIVER FLOOD 5.12 5.03 10.1
## 10 ICE STORM 3.94 5.02 8.97
# Prepare data for plotting
economic_plot_data <- top_economic %>%
  select(EVTYPE, Property_Damage, Crop_Damage) %>%
  mutate(Property_Damage = Property_Damage / 1e9,
         Crop_Damage = Crop_Damage / 1e9) %>%
  pivot_longer(cols = c(Property_Damage, Crop_Damage),
               names_to = "Damage_Type",
               values_to = "Amount")
# Create the plot
ggplot(economic_plot_data, aes(x = reorder(EVTYPE, -Amount), y = Amount, fill = Damage_Type)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = c("Property_Damage" = "#3498DB", "Crop_Damage" = "#27AE60"),
                    labels = c("Crop Damage", "Property Damage")) +
  labs(title = "Top 10 Weather Events with Greatest Economic Consequences",
       x = "Event Type",
       y = "Total Damage (Billions USD)",
       fill = "Damage Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = dollar_format())
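* Key Finding 1: Floods cause the most property damage, at approximately $144.7 billion, while droughts cause the most crop damage, at about $14.0 billion.
* Key Finding 2: Hurricanes/typhoons ($71.9 billion in total) and storm surges ($43.3 billion) also cause massive economic damage, almost entirely to property, highlighting the devastating impact of coastal weather events; a further $14.6 billion is recorded under the separate HURRICANE label. Tornadoes rank third overall, at $57.4 billion.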
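Because these totals are sums over many records, a single mis-coded record (for example, an implausible `B` exponent) could move an event type up or down the ranking. The following is a minimal sanity-check sketch, assuming the `clean_storm` columns created earlier; `largest_records()` and `max_record_share()` are hypothetical helpers, and this check was not part of the original analysis.

```r
# Hypothetical helper: the n records with the largest computed damage,
# with their raw values and exponent codes, for manual review
largest_records <- function(df, n = 5) {
  cols <- c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "TOTAL_DAMAGE")
  head(df[order(-df$TOTAL_DAMAGE), cols], n)
}

# Hypothetical helper: for each event type, the fraction of its total damage
# that comes from its single largest record (values near 1 deserve a closer look)
max_record_share <- function(df) {
  totals <- tapply(df$TOTAL_DAMAGE, df$EVTYPE, sum)
  maxima <- tapply(df$TOTAL_DAMAGE, df$EVTYPE, max)
  share <- setNames(as.numeric(maxima / totals), names(totals))
  # sort() drops NaN values produced by event types with zero total damage
  sort(share, decreasing = TRUE)
}
```

Typical usage would be `largest_records(clean_storm)` and `head(max_record_share(clean_storm), 10)`, followed by reading the remarks of any suspicious record before deciding whether to correct it.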
### Comprehensive Impact Analysis
# Combine health and economic data for comprehensive analysis
combined_impact <- health_impact %>%
  inner_join(economic_impact, by = "EVTYPE", suffix = c("_health", "_economic")) %>%
  filter(Total_Health_Impact > 1000 | Total_Economic_Impact > 1e9) %>%
  arrange(desc(Total_Health_Impact)) %>%
  head(20)
# Create comprehensive scatter plot
ggplot(combined_impact,
       aes(x = Total_Economic_Impact / 1e9,
           y = Total_Health_Impact,
           size = Total_Health_Impact,
           color = Total_Economic_Impact / 1e9)) +
  geom_point(alpha = 0.7) +
  geom_text(aes(label = EVTYPE), size = 3, hjust = 0.5, vjust = -0.5,
            check_overlap = FALSE) +
  scale_color_gradient(low = "blue", high = "red",
                       name = "Economic Impact\n(Billions USD)") +
  scale_size_continuous(name = "Health Impact") +
  labs(title = "Comprehensive Analysis: Health vs Economic Impacts of Weather Events",
       x = "Total Economic Impact (Billions USD)",
       y = "Total Health Impact (Fatalities + Injuries)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_continuous(labels = dollar_format()) +
  scale_y_continuous(labels = comma_format())
### Summary and Conclusions
Based on this analysis of NOAA storm data from 1950 to 2011:
* Population Health Protection Priority: Tornadoes require the highest priority for public safety measures, as they cause far more combined fatalities and injuries than any other event type. Excessive heat and flash floods, the next-largest causes of fatalities, also warrant significant attention for health protection.
* Economic Protection Priority: Flood mitigation should be the primary focus for economic protection, followed by hurricane/typhoon preparedness and storm surge protection systems. Agricultural regions should prioritize drought preparedness.
* Resource Allocation Strategy: Government and municipal managers should adopt differentiated strategies:
    * Tornado-prone areas: Invest in early warning systems, shelters, and public education
    * Flood-prone regions: Focus on infrastructure protection, zoning regulations, and flood control systems
    * Coastal areas: Prioritize hurricane and storm surge protection
    * Agricultural regions: Implement water conservation and drought-resistant farming practices
The analysis demonstrates that different severe weather events pose distinct threats, requiring targeted preparation and resource allocation strategies to maximize public safety and economic protection.