This report explores the U.S. NOAA storm database from 1950 to November 2011 to identify which types of severe weather events are most harmful to population health and which cause the greatest economic damage. The analysis starts from the raw dataset in CSV format and includes preprocessing, transformation, and visualization of the data. Fatalities, injuries, property damage, and crop damage are used as metrics to quantify impacts. Tornadoes are identified as the most harmful to health, while floods and hurricanes are found to have the highest economic consequences. This analysis is intended to inform emergency planners and municipal decision-makers.
library(dplyr)
library(ggplot2)
library(readr)
storm_data <- read.csv("repdata_data_StormData.csv.bz2")
storm_subset <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
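# Map a damage exponent code to a numeric multiplier:
# H/K/M/B are hundred/thousand/million/billion, a single digit d means 10^d,
# and anything else (blank, "+", "-", "?") is treated as a multiplier of 1.
exp_to_num <- function(exp) {
  exp <- toupper(trimws(as.character(exp)))
  dplyr::case_when(
    exp == "H" ~ 1e2,
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    grepl("^[0-9]$", exp) ~ 10 ^ as.numeric(exp),
    TRUE ~ 1
  )
}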
storm_subset <- storm_subset %>%
  mutate(
    PROPDMGEXP_NUM = exp_to_num(PROPDMGEXP),
    CROPDMGEXP_NUM = exp_to_num(CROPDMGEXP),
    TotalPropDamage = PROPDMG * PROPDMGEXP_NUM,
    TotalCropDamage = CROPDMG * CROPDMGEXP_NUM,
    TotalEconomicDamage = TotalPropDamage + TotalCropDamage
  )
## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion
## Warning in eval_tidy(pair$rhs, env = default_env): NAs introduced by coercion
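The coercion warnings above most likely arise because `case_when()` evaluates every right-hand side on the whole vector, so `as.numeric(exp)` is also applied to letter codes such as "K" even though those results are never used. The sketch below shows one way to get the same multipliers without the warning, using a base R lookup. The name `exp_to_num_lookup` is hypothetical and not part of the original analysis, and the sample codes are illustrative only.

```r
# Hypothetical alternative to exp_to_num(): same mapping, but digits are
# converted only where they actually occur, so nothing is coerced to NA.
exp_to_num_lookup <- function(exp) {
  exp <- toupper(trimws(as.character(exp)))
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  # Default multiplier of 1 for blanks and unrecognised codes
  out <- rep(1, length(exp))

  # Letter codes: look the multiplier up by name
  is_letter <- exp %in% names(multipliers)
  out[is_letter] <- unname(multipliers[exp[is_letter]])

  # Single-digit codes: interpret d as 10^d, as in the original function
  is_digit <- grepl("^[0-9]$", exp)
  out[is_digit] <- 10 ^ as.numeric(exp[is_digit])

  out
}

# Illustrative codes only: thousand, million (lower case), billion, digit, blank, unknown
sample_mult <- exp_to_num_lookup(c("K", "m", "B", "5", "", "?"))

# Worked example: a recorded value of 25 with exponent "K" is 25,000 dollars
sample_damage <- 25 * exp_to_num_lookup("K")
```

Swapping this in for `exp_to_num()` in the `mutate()` call should leave the damage totals unchanged, since the mapping is the same; only the warning goes away.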
health_impact <- storm_subset %>%
  group_by(EVTYPE) %>%
  summarize(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE),
    TotalHealthImpact = Fatalities + Injuries
  ) %>%
  arrange(desc(TotalHealthImpact)) %>%
  slice_head(n = 10)
## `summarise()` ungrouping output (override with `.groups` argument)
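To make the aggregation above concrete, here is a minimal base R sketch of the same steps (sum per event type, add fatalities and injuries, sort descending) on a toy table. The numbers are made up for illustration and are not taken from the NOAA data; the names `toy` and `toy_health` are hypothetical.

```r
# Toy records with made-up numbers, for illustration only (not NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 3, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum fatalities and injuries within each event type
toy_health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined count, weighting one fatality the same as one injury
toy_health$TotalHealthImpact <- toy_health$FATALITIES + toy_health$INJURIES

# Sort from most to least harmful
toy_health <- toy_health[order(-toy_health$TotalHealthImpact), ]
```

Note that the combined metric counts a fatality and an injury equally, so an event type with many injuries can outrank one with more deaths, as HEAT versus TORNADO would if the toy injury counts were swapped.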
ggplot(health_impact, aes(x = reorder(EVTYPE, TotalHealthImpact), y = TotalHealthImpact)) +
  geom_bar(stat = "identity", fill = "firebrick") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types Most Harmful to Population Health",
    x = "Event Type",
    y = "Total Fatalities + Injuries"
  )
economic_impact <- storm_subset %>%
  group_by(EVTYPE) %>%
  summarize(TotalDamage = sum(TotalEconomicDamage, na.rm = TRUE)) %>%
  arrange(desc(TotalDamage)) %>%
  slice_head(n = 10)
## `summarise()` ungrouping output (override with `.groups` argument)
ggplot(economic_impact, aes(x = reorder(EVTYPE, TotalDamage), y = TotalDamage / 1e9)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    x = "Event Type",
    y = "Total Damage (in Billions USD)"
  )
This analysis shows that tornadoes have caused the most human harm, measured as combined fatalities and injuries, while floods and hurricanes account for the largest economic losses, measured as property plus crop damage. These rankings can guide resource allocation for disaster preparedness and response.