This report analyzes the NOAA Storm Database to identify which types of severe weather events are most harmful to population health and economic stability in the United States. We processed data from 1950 to 2011, focusing on fatalities, injuries, and property/crop damage. Our analysis finds that tornadoes are the most significant threat to human health, while floods cause the greatest economic loss.
We read the raw data directly from the compressed bzip2 file.
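```r
# Set global options to show code
knitr::opts_chunk$set(echo = TRUE)

# Download the compressed file once, then load it
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("stormData.csv.bz2")) {
  # mode = "wb" keeps the binary archive intact on Windows
  download.file(url, "stormData.csv.bz2", mode = "wb")
}

# read.csv() decompresses bzip2 files transparently
stormData <- read.csv("stormData.csv.bz2")
```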
To analyze health impact, we sum the number of fatalities and injuries by event type (EVTYPE).
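As a minimal sketch of this aggregation, the toy example below applies the same sum-by-event-type logic in base R. The data frame `toy` and its counts are invented for illustration; only the column names mirror the storm data.

```r
# Toy data with made-up counts; only the column names match the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 0, 5, 1)
)

# Combined casualties per record, then summed within each event type.
toy$Total_Health <- toy$FATALITIES + toy$INJURIES
toy_impact <- aggregate(Total_Health ~ EVTYPE, data = toy, FUN = sum)

# Sort from most to least harmful.
toy_impact <- toy_impact[order(-toy_impact$Total_Health), ]
print(toy_impact)
```

Here the two tornado records combine to 17 casualties, which puts TORNADO at the top of the sorted table.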
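```r
library(dplyr)

# Total fatalities plus injuries for each event type, largest first
health_impact <- stormData %>%
  group_by(EVTYPE) %>%
  summarise(Total_Health = sum(FATALITIES + INJURIES, na.rm = TRUE)) %>%
  arrange(desc(Total_Health))

top_health <- head(health_impact, 10)
```

### 2.3 Processing Economic Impact

Economic damage is split into property damage (PROPDMG) and crop damage (CROPDMG). Each amount is stored with an exponent code (PROPDMGEXP, CROPDMGEXP) that must be converted to a power of ten before the values can be summed: H = 10^2, K = 10^3, M = 10^6, B = 10^9. A digit is used directly as the exponent, and any other code is treated as 10^0.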
```r
# Map an exponent code to a power of ten
exp_transform <- function(e) {
  e <- as.character(e)  # guard against factor input
  if (e %in% c('h', 'H')) return(2)
  if (e %in% c('k', 'K')) return(3)
  if (e %in% c('m', 'M')) return(6)
  if (e %in% c('b', 'B')) return(9)
  # Digit codes are used as the exponent itself
  num <- suppressWarnings(as.numeric(e))
  if (!is.na(num)) return(num)
  return(0)  # blank or unrecognized code
}

stormData$prop_mult <- sapply(stormData$PROPDMGEXP, exp_transform, USE.NAMES = FALSE)
stormData$crop_mult <- sapply(stormData$CROPDMGEXP, exp_transform, USE.NAMES = FALSE)

# Total damage in dollars for each event record
stormData <- stormData %>%
  mutate(Total_Damage = PROPDMG * (10 ^ prop_mult) + CROPDMG * (10 ^ crop_mult))

economic_impact <- stormData %>%
  group_by(EVTYPE) %>%
  summarise(Total_Economic = sum(Total_Damage, na.rm = TRUE)) %>%
  arrange(desc(Total_Economic))

top_econ <- head(economic_impact, 10)
```
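The exponent conversion can be checked by hand on a few codes. The sketch below is a vectorized variant built on a named lookup vector. The names `exp_lookup` and `exp_to_power` and the sample values are hypothetical and not part of the analysis above. It also assumes numeric codes are single digits.

```r
# Hypothetical helper (not part of the original analysis): map exponent
# codes to powers of ten with a named lookup vector.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

exp_to_power <- function(codes) {
  codes <- toupper(as.character(codes))
  power <- unname(exp_lookup[codes])    # NA for codes not in the table
  digit <- grepl("^[0-9]$", codes)      # assumption: numeric codes are single digits
  power[digit] <- as.numeric(codes[digit])
  power[is.na(power)] <- 0              # blank or unrecognized: multiply by 10^0
  power
}

# Made-up damage values paired with a mix of exponent codes.
codes  <- c("K", "m", "B", "", "5", "?")
values <- c(25, 2.5, 1, 10, 3, 7)
dollars <- values * 10^exp_to_power(codes)
print(dollars)
```

For instance, 25 with code "K" becomes 25 × 10^3 = 25,000 dollars, while 10 with a blank code stays at 10 dollars. Because the lookup handles whole columns at once, it avoids calling a function once per row.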
Tornadoes far exceed other events in terms of combined fatalities and injuries.
```r
library(ggplot2)

ggplot(top_health, aes(x = reorder(EVTYPE, -Total_Health), y = Total_Health)) +
  geom_bar(stat = "identity", fill = "red") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Weather Events Most Harmful to Population Health",
       x = "Event Type",
       y = "Total Fatalities and Injuries")
```
Floods, hurricanes, and storm surges cause the highest financial losses.
```r
ggplot(top_econ, aes(x = reorder(EVTYPE, -Total_Economic), y = Total_Economic / 10^9)) +
  geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Costliest Weather Events",
       x = "Event Type",
       y = "Total Damage (Billions of USD)")
```
Based on the data, tornadoes account for the most combined fatalities and injuries, while floods cause the most combined property and crop damage across the United States.