This analysis examines the National Weather Service Storm Data to
specifically answer two key questions:
1. Across the United States, which types of weather events are most
harmful with respect to population health?
2. Across the United States, which types of weather events have the
greatest economic consequences?
Our analysis shows that tornadoes are the most harmful weather events
with respect to population health, causing the most fatalities and the
most injuries, while floods have the greatest economic impact when
property and crop damage are combined.
After loading the raw data, we standardize the weather event types, convert the damage values to US dollars, and remove records with missing values. Each step is designed to keep data loss to a minimum.
library(dplyr)
library(ggplot2)
library(knitr)
# Load the raw data
raw_data <- read.csv("repdata_data_StormData.csv.bz2")
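# Standardize the storm event types: lower-case, replace punctuation with a
# space (so "TSTM WIND/HAIL" becomes "tstm wind hail" rather than
# "tstm windhail"), collapse repeated whitespace and trim the ends
event_types <- tolower(raw_data$EVTYPE)
event_types <- gsub("[[:punct:]]", " ", event_types)
event_types <- gsub("[[:space:]]+", " ", event_types)
raw_data$EVTYPE_STD <- trimws(event_types)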
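# Convert damage values to US dollars using the exponent codes
# (K = thousands, M = millions, B = billions, matched case-insensitively;
# any other code is treated as a multiplier of 1)
convert_damage <- function(value, exp) {
  exp <- toupper(trimws(exp))
  multiplier <- ifelse(exp == "K", 1e3,
                ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9, 1)))
  as.numeric(value) * multiplier
}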
raw_data$PROPDMG_VALUE <- convert_damage(raw_data$PROPDMG, raw_data$PROPDMGEXP)
raw_data$CROPDMG_VALUE <- convert_damage(raw_data$CROPDMG, raw_data$CROPDMGEXP)
# Drop records with a missing casualty count or damage value
storm_data_std <- raw_data %>%
  filter(!is.na(FATALITIES) & !is.na(INJURIES) &
         !is.na(PROPDMG_VALUE) & !is.na(CROPDMG_VALUE))
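A quick way to sanity-check cleaning rules like these is to run them on a few hand-written inputs. The base-R sketch below restates them as two small helpers. The helper names `standardize_evtype` and `damage_multiplier` and the sample values are hypothetical, and the sketch assumes that punctuation should become a space and that exponent codes are matched case-insensitively (so a lowercase "m" means millions).

```r
# Hypothetical helper: lower-case, replace punctuation with a space,
# collapse repeated whitespace and trim the ends.
standardize_evtype <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

# Hypothetical helper: map K/M/B (any case) to a multiplier; other codes
# such as "", "+", "?" or digits are treated as a multiplier of 1.
damage_multiplier <- function(exp) {
  exp <- toupper(trimws(exp))
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9, 1)))
}

# Made-up sample inputs, for illustration only
events <- c("TSTM WIND/HAIL", "  Thunderstorm  Winds ", "FLASH FLOOD")
print(standardize_evtype(events))

values <- c(25, 1.5, 3, 10)
exps <- c("K", "m", "B", "")
print(values * damage_multiplier(exps))
```

The three event strings come out as "tstm wind hail", "thunderstorm winds" and "flash flood", and the damage values become 25 thousand, 1.5 million, 3 billion and 10 dollars.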
First, we identified the weather event types that are most harmful to population health across the United States.
# Sum fatalities and injuries per event type, keep the 10 event types with
# the most fatalities, and reshape to long format for a dodged bar chart
health_impact <- storm_data_std %>%
  group_by(EVTYPE_STD) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  arrange(desc(Fatalities)) %>%
  head(10) %>%
  tidyr::pivot_longer(cols = c(Fatalities, Injuries),
                      names_to = "Type",
                      values_to = "Count")
ggplot(health_impact,
       aes(x = reorder(EVTYPE_STD, -Count), y = Count, fill = Type)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c("Fatalities" = "darkred", "Injuries" = "orange")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "The 10 weather event types most harmful to population health",
       x = "Weather event type",
       y = "Number of victims",
       fill = "Impact type")
This combined chart shows fatalities and injuries for the 10 event types with the most fatalities. Tornadoes stand out sharply from all other event types, causing by a wide margin both the most fatalities and the most injuries.
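The chart ranks event types by fatalities alone before keeping the top 10, so an event type with many injuries but few deaths can fall outside the selection. The base-R sketch below illustrates this with made-up counts for hypothetical event types ("event a" to "event d" are not Storm Data categories): ranking by fatalities and ranking by total casualties pick different top-2 sets.

```r
# Made-up per-event totals, for illustration only (not Storm Data values).
events     <- c("event a", "event b", "event c", "event d")
fatalities <- c(50, 40, 5, 30)
injuries   <- c(100, 10, 400, 20)

# Return the names of the n events with the highest score.
top_n_by <- function(events, score, n) {
  events[order(score, decreasing = TRUE)][seq_len(n)]
}

by_fatalities <- top_n_by(events, fatalities, 2)
by_casualties <- top_n_by(events, fatalities + injuries, 2)
print(by_fatalities)  # [1] "event a" "event b"
print(by_casualties)  # [1] "event c" "event a"
```

Which ranking is appropriate depends on whether deaths and injuries are meant to be weighted equally; the sketch only shows that the choice matters.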
For the economic consequences, we first examined the total economic damage, that is, the combined property and crop damage, per weather event type.
# Rank event types by total economic damage (property + crop)
economic_impact <- storm_data_std %>%
  group_by(EVTYPE_STD) %>%
  summarise(
    Property_Damage = sum(PROPDMG_VALUE, na.rm = TRUE),
    Crop_Damage = sum(CROPDMG_VALUE, na.rm = TRUE),
    Total_Damage = Property_Damage + Crop_Damage
  ) %>%
  arrange(desc(Total_Damage)) %>%
  head(10)
# Plot the 10 most costly event types (damage in billions of USD)
ggplot(economic_impact,
       aes(x = reorder(EVTYPE_STD, Total_Damage), y = Total_Damage / 1e9)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  theme_minimal() +
  labs(title = "The 10 most costly weather event types",
       x = "Weather event type",
       y = "Total damage (billions USD)")
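The summarise step sums property and crop damage per event type and adds the two totals. An equivalent base-R sketch on a few made-up records makes that arithmetic explicit, including the division by 1e9 used for the billions axis. The event names and dollar amounts are hypothetical, and `aggregate()` is a swapped-in base-R alternative to the dplyr pipeline.

```r
# Made-up records, already converted to US dollars (illustration only).
records <- data.frame(
  EVTYPE_STD    = c("flood", "flood", "drought", "hail"),
  PROPDMG_VALUE = c(2.0e9, 0.5e9, 0.1e9, 0.3e9),
  CROPDMG_VALUE = c(0.2e9, 0.0, 1.2e9, 0.1e9),
  stringsAsFactors = FALSE
)

# Sum both damage columns per event type.
totals <- aggregate(cbind(PROPDMG_VALUE, CROPDMG_VALUE) ~ EVTYPE_STD,
                    data = records, FUN = sum)
totals$Total_Damage <- totals$PROPDMG_VALUE + totals$CROPDMG_VALUE

# Rank by total damage (descending) and express it in billions of USD.
totals <- totals[order(totals$Total_Damage, decreasing = TRUE), ]
totals$Total_Billions <- totals$Total_Damage / 1e9
print(totals[, c("EVTYPE_STD", "Total_Billions")])
```

Here the two flood records add up to 2.7 billion USD, ahead of drought (1.3) and hail (0.4).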
Next, we created a scatter plot of property damage against crop damage for the 15 most costly weather event types. The size of each point represents the combined damage.
# Scatter plot of property vs crop damage for the 15 most costly event types
economic_relationship <- storm_data_std %>%
  group_by(EVTYPE_STD) %>%
  summarise(
    Property_Damage = sum(PROPDMG_VALUE, na.rm = TRUE) / 1e9,
    Crop_Damage = sum(CROPDMG_VALUE, na.rm = TRUE) / 1e9,
    Total_Damage = Property_Damage + Crop_Damage
  ) %>%
  arrange(desc(Total_Damage)) %>%
  head(15)
ggplot(economic_relationship,
       aes(x = Property_Damage, y = Crop_Damage, label = EVTYPE_STD)) +
  geom_point(aes(size = Total_Damage), alpha = 0.6, color = "blue") +
  geom_text(hjust = -0.1, vjust = 0.5, size = 3) +
  scale_size_continuous(range = c(3, 15)) +
  theme_minimal() +
  labs(title = "Property damage vs. crop damage by weather event type",
       x = "Property damage (billions USD)",
       y = "Crop damage (billions USD)",
       size = "Total damage\n(billions USD)") +
  theme(legend.position = "right")
Regarding population health, tornadoes are by far the most dangerous weather events in terms of both fatalities and injuries. Furthermore, excessive heat, heat, and flash floods are also particularly lethal.
The economic analysis shows that floods cause the greatest economic damage, with total damages exceeding $150 billion, followed by hurricanes/typhoons. Tornadoes, the most harmful events for population health, rank third in economic impact.
The scatter plot of property damage against crop damage shows that most event types cause far more property damage than crop damage. Floods stand out with the highest combined damage, while drought is an exception whose damage falls disproportionately on crops rather than on property.
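The observation that drought weighs more heavily on crops than on property can be expressed as a crop share: crop damage divided by combined damage. The base-R sketch below computes that share for made-up per-event totals in billions of USD. The values are illustrative only, not Storm Data results, and the 0.5 cut-off (crop damage exceeding property damage) is an assumed threshold.

```r
# Made-up per-event totals in billions of USD (illustration only).
property <- c(flood = 2.5, drought = 0.1, hail = 0.3)
crop     <- c(flood = 0.2, drought = 1.2, hail = 0.1)

# Share of each event type's combined damage that falls on crops.
crop_share <- crop / (property + crop)
print(round(crop_share, 2))

# Event types where crop damage exceeds property damage (share > 0.5).
crop_heavy <- names(crop_share)[crop_share > 0.5]
print(crop_heavy)  # [1] "drought"
```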
These findings can help emergency management agencies prioritize resources and preparedness efforts according to both the population health and the economic impact of different weather events.