This analysis explores the NOAA Storm Database to identify which
weather event types had the greatest impact on public health and the
economy in the United States between 1950 and 2011. Using fatalities,
injuries, property damage, and crop damage, we summarize and visualize
the impact of each event type. Data transformations include converting
the damage values and their magnitude codes to numeric dollar amounts
and aggregating by event type. Tornadoes are found to be the most
harmful to population health, while floods and hurricanes cause the
highest economic damage. All data processing and plotting were
performed in R, using packages such as dplyr, ggplot2, and knitr. The
analysis begins with the raw data file and is fully reproducible.
Figures are limited to three and include appropriate captions; they
illustrate the most harmful and most costly weather events. This report
aims to inform stakeholders involved in disaster preparedness. The code
and results are included in this document.
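### Install Packages ####
# Install only the packages that are missing, so re-running the report
# does not reinstall them every time
required_pkgs <- c("dplyr", "ggplot2")
missing_pkgs <- setdiff(required_pkgs, rownames(installed.packages()))
if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
### Libraries ####
library(dplyr)
library(ggplot2)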
# Read the data from the compressed file (read.csv handles .bz2 directly)
storm_data <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = FALSE)
# Keep only the relevant columns
storm_df <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
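# Function to convert a magnitude code into a numeric multiplier
convert_exp <- function(exp) {
  if (exp %in% c("H", "h")) return(1e2)  # hundreds
  if (exp %in% c("K", "k")) return(1e3)  # thousands
  if (exp %in% c("M", "m")) return(1e6)  # millions
  if (exp %in% c("B", "b")) return(1e9)  # billions
  # Any other code (empty, "+", "-", "?", digits) is treated as a multiplier of 1
  return(1)
}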
# Apply the conversion and compute damage in dollars
storm_df <- storm_df %>%
  mutate(
    PROPDMGEXP = sapply(PROPDMGEXP, convert_exp, USE.NAMES = FALSE),
    CROPDMGEXP = sapply(CROPDMGEXP, convert_exp, USE.NAMES = FALSE),
    PROP_DAMAGE = PROPDMG * as.numeric(PROPDMGEXP),
    CROP_DAMAGE = CROPDMG * as.numeric(CROPDMGEXP)
  )
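The conversion step above can be checked on a small made-up table before it is applied to the full dataset. The sketch below is only an illustration: the lookup table `exp_multiplier`, the helper `exp_to_multiplier`, and the `toy_damage` rows are hypothetical names and values, not part of the original analysis. It uses a vectorized lookup in place of the element-by-element `sapply` call, on the assumption that unknown or empty codes should fall back to a multiplier of 1.

```r
# Hypothetical lookup table: names are magnitude codes, values are multipliers
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Illustrative vectorized helper: unknown or empty codes fall back to 1
exp_to_multiplier <- function(exp) {
  mult <- exp_multiplier[toupper(as.character(exp))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Made-up rows, used only to check the arithmetic
toy_damage <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "b", ""),
  stringsAsFactors = FALSE
)
toy_damage$PROP_DAMAGE <- toy_damage$PROPDMG * exp_to_multiplier(toy_damage$PROPDMGEXP)
print(toy_damage)
```

Each row should come out as the reported value times its multiplier, for example 25 with code "K" becoming 25000, which makes it easy to spot a code that was mapped incorrectly.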
# Total fatalities and injuries per event type; keep the ten most harmful
health_impact <- storm_df %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES),
    Injuries = sum(INJURIES)
  ) %>%
  arrange(desc(Fatalities + Injuries)) %>%
  top_n(10, Fatalities + Injuries)
# Figure 1
ggplot(health_impact, aes(x = reorder(EVTYPE, Fatalities + Injuries), y = Fatalities + Injuries)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(title = "Most Harmful Events for Public Health",
       x = "Event Type", y = "Total Fatalities + Injuries")
# Total property plus crop damage per event type; keep the ten most costly
economic_impact <- storm_df %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Damage = sum(PROP_DAMAGE + CROP_DAMAGE)
  ) %>%
  arrange(desc(Total_Damage)) %>%
  top_n(10, Total_Damage)
# Figure 2
ggplot(economic_impact, aes(x = reorder(EVTYPE, Total_Damage), y = Total_Damage)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Events with the Greatest Economic Impact",
       x = "Event Type", y = "Total Economic Damage (USD)")
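Both summaries group on the raw `EVTYPE` label, so any labels that differ only in case or surrounding whitespace would be counted as separate event types. The sketch below shows one possible way to check for that on made-up data. The `toy_events` rows and the `EVTYPE_CLEAN` column are hypothetical, and the analysis above does not apply this step; whether it changes the top-ten rankings would need to be verified on the real data.

```r
# Made-up rows with label variants, used only for illustration
toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "tornado", " Tornado ", "FLOOD", "Flood"),
  FATALITIES = c(3, 1, 2, 4, 1),
  stringsAsFactors = FALSE
)

# Grouping on the raw labels keeps every variant as its own group
raw_totals <- aggregate(FATALITIES ~ EVTYPE, data = toy_events, FUN = sum)

# Trimming whitespace and upper-casing merges the variants before grouping
toy_events$EVTYPE_CLEAN <- toupper(trimws(toy_events$EVTYPE))
clean_totals <- aggregate(FATALITIES ~ EVTYPE_CLEAN, data = toy_events, FUN = sum)

print(raw_totals)
print(clean_totals)
```

Comparing the number of rows in the two summaries gives a quick indication of how many label variants were merged.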