This analysis uses the NOAA storm database to assess the impact of severe weather events across the United States. We examine two aspects: harm to public health (fatalities and injuries) and economic losses (property and crop damage). The raw compressed data is loaded, cleaned, and aggregated by event type. The results show that tornadoes cause the most casualties, while floods cause the largest economic damage. Two figures show the top 10 event types on each measure.
We directly load the original compressed file
StormData.csv.bz2 without manual extraction. Key columns
related to event type, fatalities, injuries, property and crop damage
are selected. We convert damage exponent codes to actual dollar values,
calculate total health impact and total economic loss, then aggregate
data by weather event type for comparison. Code caching is enabled to
speed up repeated rendering.
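Reading the file without manual extraction relies on read.csv() opening bzip2-compressed files transparently. A minimal sketch of that behavior follows, using a small made-up table and a temporary file (both hypothetical, not part of the storm data):

```r
# Toy table standing in for the storm data (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1))

# Write it as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
toy_back <- read.csv(tmp)
print(toy_back)

# Clean up the temporary file
unlink(tmp)
```

The same call works on StormData.csv.bz2, although reading the full file takes noticeably longer than this toy case.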
# Load required libraries
library(dplyr)
library(ggplot2)
# Read original bz2 compressed raw data
storm_data <- read.csv("StormData.csv.bz2")
# Extract only useful columns
data_clean <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Calculate total health impact (fatalities + injuries)
data_clean$total_casualty <- data_clean$FATALITIES + data_clean$INJURIES
# Define exponent conversion for damage units (H=100, K=1000, M=1e6, B=1e9)
exp_rule <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
# Convert exponent codes to multipliers. Codes are matched case-insensitively
# ("m" counts as millions); blank or unrecognized codes leave the recorded
# amount unscaled (multiplier 1) rather than assuming thousands.
exp_to_multiplier <- function(code) {
  mult <- unname(exp_rule[toupper(as.character(code))])
  ifelse(is.na(mult), 1, mult)
}
# Process property damage
data_clean$prop_total <- data_clean$PROPDMG * exp_to_multiplier(data_clean$PROPDMGEXP)
# Process crop damage
data_clean$crop_total <- data_clean$CROPDMG * exp_to_multiplier(data_clean$CROPDMGEXP)
# Total economic loss
data_clean$total_economic <- data_clean$prop_total + data_clean$crop_total
# Aggregate data by weather event type
health_summary <- data_clean %>%
  group_by(EVTYPE) %>%
  summarise(total_casualty = sum(total_casualty, na.rm = TRUE)) %>%
  arrange(desc(total_casualty))
economic_summary <- data_clean %>%
  group_by(EVTYPE) %>%
  summarise(total_economic = sum(total_economic, na.rm = TRUE)) %>%
  arrange(desc(total_economic))
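The conversion and aggregation steps can be checked by hand on a few records. The sketch below is illustrative only: the toy data frame and its values are made up, it covers just the K, M and B codes, and it uses base R's tapply() in place of the dplyr pipeline so that it runs without extra packages.

```r
# Multipliers for the three unit codes used in the conversion
exp_rule <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)

# Toy records (made-up values, not taken from the storm data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD"),
  PROPDMG    = c(25, 2.5, 10, 1),
  PROPDMGEXP = c("K", "M", "K", "B"),
  CROPDMG    = c(10, 0, 5, 0.5),
  CROPDMGEXP = c("K", "K", "K", "M"),
  stringsAsFactors = FALSE
)

# Look up each code's multiplier by name and scale the recorded amount
toy$prop_total <- toy$PROPDMG * unname(exp_rule[toy$PROPDMGEXP])
toy$crop_total <- toy$CROPDMG * unname(exp_rule[toy$CROPDMGEXP])
toy$total_economic <- toy$prop_total + toy$crop_total

# Sum the loss within each event type, then sort from largest to smallest
toy_summary <- sort(tapply(toy$total_economic, toy$EVTYPE, sum, na.rm = TRUE),
                    decreasing = TRUE)
print(toy_summary)
```

Here a PROPDMG of 25 with code K becomes 25,000 dollars and a PROPDMG of 1 with code B becomes 1,000,000,000 dollars, so FLOOD ranks first in the toy summary.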
The plot below displays the top 10 weather events by total fatalities and injuries. Tornadoes are the leading cause of harm to public health in the United States.
top10_health <- head(health_summary, 10)
ggplot(top10_health, aes(x = reorder(EVTYPE, total_casualty), y = total_casualty)) +
  geom_col(fill = "#e74c3c") +
  coord_flip() +
  labs(x = "Weather Event Type",
       y = "Total Fatalities & Injuries",
       title = "Top Events Impacting Public Health") +
  theme_minimal()
This plot shows the top 10 events ranked by total property and crop damage. Floods result in the highest economic losses nationwide.
top10_econ <- head(economic_summary, 10)
ggplot(top10_econ, aes(x = reorder(EVTYPE, total_economic), y = total_economic)) +
  geom_col(fill = "#3498db") +
  coord_flip() +
  labs(x = "Weather Event Type",
       y = "Total Economic Loss (USD)",
       title = "Top Events Causing Economic Damage") +
  theme_minimal()