This report analyzes the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database to identify which types of weather events are most harmful to population health and which have the greatest economic consequences. The data span 1950 to November 2011. The analysis is performed in R and covers four outcomes: fatalities, injuries, property damage, and crop damage. Damage amounts are converted to US dollars using their exponent codes, and each outcome is then summed by event type. The results are shown in two bar plots of the ten highest-ranking event types.
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(ggplot2)
library(readr)
# Load the data (path is relative to the working directory; adjust as needed)
data_path <- "repdata_data_StormData.csv"
storm_data <- read.csv(data_path)
# Select relevant columns
storm <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
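# Convert exponent codes to numeric multipliers (K = thousand, M = million, B = billion);
# any other code, including blanks, is treated as a multiplier of 1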
exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)
storm$PROPDMGEXP <- toupper(as.character(storm$PROPDMGEXP))
storm$CROPDMGEXP <- toupper(as.character(storm$CROPDMGEXP))
storm$PROPDMGEXP <- ifelse(storm$PROPDMGEXP %in% names(exp_map), exp_map[storm$PROPDMGEXP], 1)
storm$CROPDMGEXP <- ifelse(storm$CROPDMGEXP %in% names(exp_map), exp_map[storm$CROPDMGEXP], 1)
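The same mapping can be wrapped in a small function and checked on a handful of codes. This is a minimal sketch: the helper name `exp_to_multiplier` is hypothetical, and, like the code above, it treats every code other than K, M, and B (including blanks, `?`, and missing values) as a multiplier of 1. The `trimws()` call is an added assumption that stray whitespace around a code should be ignored.

```r
# Hypothetical helper: convert NOAA damage exponent codes to numeric multipliers.
# Only K, M and B are recognised; anything else maps to 1, as in the analysis.
exp_to_multiplier <- function(code) {
  exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)
  # Normalise case and (assumption) strip surrounding whitespace
  code <- toupper(trimws(as.character(code)))
  # Named-vector lookup returns NA for unknown, empty or missing codes
  mult <- unname(exp_map[code])
  # Fall back to a multiplier of 1 for anything unrecognised
  mult[is.na(mult)] <- 1
  mult
}

exp_to_multiplier(c("K", "m", "B", "", "?", NA))
```

For the six codes shown, the function returns 1e3, 1e6, 1e9, 1, 1, and 1.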
# Calculate actual damage values
storm$property_damage <- storm$PROPDMG * as.numeric(storm$PROPDMGEXP)
storm$crop_damage <- storm$CROPDMG * as.numeric(storm$CROPDMGEXP)
storm$total_damage <- storm$property_damage + storm$crop_damage
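As a worked example of the damage arithmetic, the toy records below (made-up values, not taken from the NOAA file) apply the same multiplier rule. A property value of 25 with code K becomes 25,000 USD; a property value of 2.5 with code M plus a crop value of 500 with code k gives 2,500,000 + 500,000 = 3,000,000 USD in total damage. The helper `to_mult` is a hypothetical name used only for this illustration.

```r
# Toy records with made-up values (not from the NOAA data set)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 500, 1),
  CROPDMGEXP = c("", "k", "B"),
  stringsAsFactors = FALSE
)

exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)

# Same rule as the analysis: known codes (any case) get their multiplier, others get 1
to_mult <- function(code) {
  code <- toupper(code)
  ifelse(code %in% names(exp_map), exp_map[code], 1)
}

toy$property_damage <- toy$PROPDMG * to_mult(toy$PROPDMGEXP)
toy$crop_damage     <- toy$CROPDMG * to_mult(toy$CROPDMGEXP)
toy$total_damage    <- toy$property_damage + toy$crop_damage
toy
```

The resulting `total_damage` column is 25,000, 3,000,000, and 1,000,000,000 USD.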
# Aggregate health impact
health_impact <- storm %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES, na.rm = TRUE),
            Injuries = sum(INJURIES, na.rm = TRUE),
            TotalHealthImpact = Fatalities + Injuries) %>%
  # Keep the 10 event types with the largest totals, sorted in descending order
  slice_max(TotalHealthImpact, n = 10)
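The grouping and ranking step can be checked on a tiny made-up table. This sketch uses `slice_max()`, which dplyr documents as the replacement for the superseded `top_n()`. With its default `with_ties = TRUE`, tied rows are all kept (as `top_n()` also does), so a "top n" result can contain more than `n` rows. The event counts below are invented for illustration only.

```r
library(dplyr)

# Made-up records for illustration only (not NOAA figures)
toy <- tibble(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(2, 1, 0, 3, 0),
  INJURIES   = c(10, 0, 4, 1, 4)
)

toy_health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES, na.rm = TRUE),
            Injuries   = sum(INJURIES, na.rm = TRUE),
            TotalHealthImpact = Fatalities + Injuries) %>%
  # Ask for the top 2; FLOOD, HAIL and HEAT tie at 4, so all of them are kept
  slice_max(TotalHealthImpact, n = 2)

toy_health
```

Here TORNADO totals 13 and the other three event types each total 4, so the "top 2" result has four rows. A plot of the real data may likewise show more than ten bars if the tenth-largest total is tied.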
# Aggregate economic impact
economic_impact <- storm %>%
  group_by(EVTYPE) %>%
  summarise(EconomicDamage = sum(total_damage, na.rm = TRUE)) %>%
  # Keep the 10 event types with the largest damage, sorted in descending order
  slice_max(EconomicDamage, n = 10)
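Both aggregations group on the raw `EVTYPE` strings. The raw labels are not fully standardised, so labels that differ only in case, spacing, or abbreviation (for example `TSTM WIND` and `THUNDERSTORM WIND`) are counted as separate event types. One possible extra cleaning step, not part of this analysis, is to normalise the labels before grouping. The sketch below trims and upper-cases labels and collapses repeated spaces. The helper name `normalize_evtype` and its one-entry synonym table are illustrative assumptions, not an official NOAA mapping.

```r
# Hypothetical helper: normalise free-text event labels before grouping
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("\\s+", " ", x)  # collapse internal runs of whitespace to one space
  # Illustrative synonym lookup (assumption, not an official NOAA mapping)
  synonyms <- c("TSTM WIND" = "THUNDERSTORM WIND")
  hit <- x %in% names(synonyms)
  x[hit] <- unname(synonyms[x[hit]])
  x
}

raw <- c("Tstm Wind", " TSTM WIND", "THUNDERSTORM  WIND", "flood", "FLOOD ")
normalize_evtype(raw)
```

The five raw labels collapse to two: `THUNDERSTORM WIND` and `FLOOD`. Applying such a step (for example `storm$EVTYPE <- normalize_evtype(storm$EVTYPE)` before the `group_by()` calls) would change the totals, so the rankings in this report reflect the raw labels.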
ggplot(health_impact, aes(x = reorder(EVTYPE, TotalHealthImpact), y = TotalHealthImpact)) +
  geom_col(fill = "tomato") +
  coord_flip() +
  labs(title = "Top 10 Most Harmful Weather Events to Population Health",
       x = "Event Type", y = "Total (Fatalities + Injuries)")
ggplot(economic_impact, aes(x = reorder(EVTYPE, EconomicDamage), y = EconomicDamage / 1e9)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type", y = "Total Damage (Billion USD)")
This analysis shows that tornadoes cause the largest combined number of fatalities and injuries, while floods cause the greatest total economic damage (property plus crop). These rankings can help prioritize public policy and disaster-preparedness efforts.