This analysis explores the NOAA Storm Database to identify which weather events are most harmful to public health and the economy. We processed data from 1950 to 2011, focusing on fatalities, injuries, and property and crop damage. The results show that tornadoes are the leading cause of fatalities and injuries, while floods and hurricanes cause the most economic damage.
# The code works in here!
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
# 1. Load data
storm_data <- read.csv("repdata_data_StormData.csv.bz2")
# 2. Function to transform exponents
exp_transform <- function(e) {
  e <- toupper(as.character(e))
  if (e == 'H') return(100)
  if (e == 'K') return(1000)
  if (e == 'M') return(1e+06)
  if (e == 'B') return(1e+09)
  return(1)
}
# 3. Clean and Calculate totals
important_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(prop_mult = sapply(PROPDMGEXP, exp_transform),
         crop_mult = sapply(CROPDMGEXP, exp_transform),
         PROPTOTAL = PROPDMG * prop_mult,
         CROPTOTAL = CROPDMG * crop_mult,
         TOTAL_ECON = PROPTOTAL + CROPTOTAL)
# 4. Group by event for Health (Top 10)
health_impact <- important_data %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)) %>%
  arrange(desc(FATALITIES + INJURIES)) %>%
  slice(1:10)
# 5. Group by event for Economy (Top 10)
econ_impact <- important_data %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_ECON = sum(TOTAL_ECON)) %>%
  arrange(desc(TOTAL_ECON)) %>%
  slice(1:10)
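To make the exponent handling concrete, here is a minimal base R sketch of the same H/K/M/B mapping applied to a whole column at once through a named lookup vector. The names `exp_lookup` and `exp_multiplier` and the sample rows are hypothetical and used only for illustration; as in `exp_transform`, any other code (including an empty string) is assumed to mean a multiplier of 1.

```r
# Named lookup table: exponent code -> multiplier
# (same mapping as exp_transform; any other code falls back to 1)
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical vectorized helper: takes a whole column of codes at once
exp_multiplier <- function(codes) {
  mult <- unname(exp_lookup[toupper(as.character(codes))])
  mult[is.na(mult)] <- 1  # unknown or empty codes get a multiplier of 1
  mult
}

# Made-up rows for illustration only (not taken from the storm data)
dmg  <- c(2.5, 1.2, 10, 3)
code <- c("K", "m", "", "B")

# Damage in dollars: 2.5 K -> 2500, 1.2 m -> 1200000, 10 -> 10, 3 B -> 3e9
dmg * exp_multiplier(code)
```

Because the lookup works on an entire vector, a helper like this could stand in for the `sapply()` calls, which invoke the function once per row.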
The following figure illustrates the top 10 weather events that caused the highest number of fatalities and injuries combined across the United States.
# Reshape the data so fatalities and injuries can be plotted together
health_long <- health_impact %>%
  pivot_longer(cols = c(FATALITIES, INJURIES), names_to = "Type", values_to = "Count")
ggplot(health_long, aes(x = reorder(EVTYPE, Count), y = Count, fill = Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(title = "Top 10 Health Impacts by Weather Event",
       x = "Event Type",
       y = "Total Number of People Affected") +
  theme_minimal()
The chart ranks the 10 event types most harmful to the population by combined fatalities and injuries. Tornadoes are the leading cause of weather-related fatalities and injuries in the US.
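The `econ_impact` table is computed earlier but is not charted in this section. The sketch below shows one way such a chart could be drawn in the same style as the health figure. It uses a small stand-in data frame, `econ_demo`, whose event names and values are invented placeholders and not results from the storm data; in the report, `econ_impact` would take its place.

```r
library(ggplot2)

# Hypothetical stand-in for `econ_impact` (same columns, made-up values)
econ_demo <- data.frame(
  EVTYPE = c("EVENT A", "EVENT B", "EVENT C"),
  TOTAL_ECON = c(5e9, 1.2e10, 3e8)
)

# Horizontal bar chart of total damage, expressed in billions of US dollars
econ_plot <- ggplot(econ_demo,
                    aes(x = reorder(EVTYPE, TOTAL_ECON), y = TOTAL_ECON / 1e9)) +
  geom_col() +
  coord_flip() +
  labs(title = "Economic Impact by Weather Event (illustrative data)",
       x = "Event Type",
       y = "Total Damage (billions of USD)") +
  theme_minimal()

print(econ_plot)
```

Dividing by `1e9` keeps the axis labels readable, and `reorder()` sorts the bars so the costliest event type appears at the top after `coord_flip()`.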