The goal of this assignment is to explore the U.S. NOAA Storm Database to address two key questions:
1. Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
This analysis uses data from 1950 to 2011 and calculates total injuries, fatalities, and economic damages (property and crop) by event type to inform preparedness strategies.
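The data were downloaded as a bzip2-compressed CSV file and read directly into R; read.csv() decompresses .bz2 files transparently. The dataset records storm events together with their effects on public health and the economic damage they caused. We begin by loading the necessary packages and reading the dataset.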
library(dplyr)
library(ggplot2)
library(reshape2)
storm_data <- read.csv("repdata_data_StormData.csv.bz2")
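We keep only the variables needed for the analysis and convert the damage exponent codes (K, M, B) into numeric multipliers, so that each damage figure is expressed in dollars. Any other exponent code is treated as a multiplier of 1.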
storm_subset <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
         CROPDMG, CROPDMGEXP)
# Map exponent letters to values
exp_map <- c("K"=1e3, "M"=1e6, "B"=1e9)
storm_subset$PROPDMGEXP <- toupper(storm_subset$PROPDMGEXP)
storm_subset$CROPDMGEXP <- toupper(storm_subset$CROPDMGEXP)
# Replace each code with its multiplier (unmapped codes default to 1), then scale the raw amounts
storm_subset <- storm_subset %>%
  mutate(PROPDMGEXP = ifelse(PROPDMGEXP %in% names(exp_map), exp_map[PROPDMGEXP], 1),
         CROPDMGEXP = ifelse(CROPDMGEXP %in% names(exp_map), exp_map[CROPDMGEXP], 1),
         prop_damage = PROPDMG * as.numeric(PROPDMGEXP),
         crop_damage = CROPDMG * as.numeric(CROPDMGEXP))
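To make the exponent handling concrete, here is a minimal base-R sketch of the same lookup applied to a toy data frame. The values in `toy` are hypothetical and are not taken from the NOAA data; the sketch only illustrates how a code such as "K" or a lowercase "m" becomes a multiplier, and how an empty or unrecognised code falls back to 1.

```r
# Toy data: hypothetical values, not taken from the NOAA dataset
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10, 5),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)

# Normalise case, then look up the multiplier; unmapped codes fall back to 1
code <- toupper(toy$PROPDMGEXP)
multiplier <- ifelse(code %in% names(exp_map), exp_map[code], 1)

# Scale the raw amount by its multiplier
toy$prop_damage <- toy$PROPDMG * unname(multiplier)

print(toy)
```

Because unrecognised codes are mapped to 1 rather than dropped, those rows still contribute their raw PROPDMG value to the totals.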
We calculate total fatalities and injuries by event type, keep the 10 event types with the highest combined total, and reshape the result to long format for plotting.
health_data <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm_subset, sum)
health_data_top <- health_data[order(-(health_data$FATALITIES + health_data$INJURIES)), ][1:10, ]
health_data_melt <- melt(health_data_top, id.vars = "EVTYPE")
names(health_data_melt) <- c("EVTYPE", "TYPE", "COUNT")
# Plot
ggplot(health_data_melt, aes(x = reorder(EVTYPE, COUNT), y = COUNT, fill = TYPE)) +
  geom_bar(stat = "identity") + coord_flip() +
  labs(title = "Top 10 Most Harmful Events to Public Health",
       x = "Event Type", y = "Total Count") +
  theme_minimal()
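The aggregate, rank, and reshape steps can be followed on a small example. The sketch below uses a hypothetical `toy` data frame and base R only; the long table is built by hand to mimic the shape that melt() is expected to return (one row per event type and measure), so it is an illustration rather than the code used in the analysis.

```r
# Toy data: hypothetical counts, not taken from the NOAA dataset
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 20, 1, 2)
)

# Sum both measures within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, sum)

# Rank by combined harm, largest first
top <- totals[order(-(totals$FATALITIES + totals$INJURIES)), ]

# Wide to long by hand: one row per (event type, measure) pair
long <- data.frame(
  EVTYPE = rep(as.character(top$EVTYPE), times = 2),
  TYPE   = rep(c("FATALITIES", "INJURIES"), each = nrow(top)),
  COUNT  = c(top$FATALITIES, top$INJURIES)
)

print(top)
print(long)
```

The long layout is what lets ggplot2 map the measure to `fill` and stack fatalities and injuries in a single bar per event type.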
We analyze and plot the top 10 events based on total economic damage (property + crop).
economic_data <- storm_subset %>%
  group_by(EVTYPE) %>%
  summarise(total_damage = sum(prop_damage + crop_damage, na.rm = TRUE)) %>%
  arrange(desc(total_damage)) %>%
  head(10)
# Plot
ggplot(economic_data, aes(x = reorder(EVTYPE, total_damage), y = total_damage / 1e9)) +
  geom_bar(stat = "identity", fill = "darkred") + coord_flip() +
  labs(title = "Top 10 Events with Greatest Economic Damage",
       x = "Event Type", y = "Damage (Billion USD)") +
  theme_minimal()
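One detail of the damage total is worth a quick check. In `sum(prop_damage + crop_damage, na.rm = TRUE)` the two columns are added row by row before missing values are removed, so a row with a missing crop value would also lose its property value. The sketch below shows the difference on hypothetical numbers; whether the storm data actually contain missing damage values is not examined here, so this is a caution rather than a finding.

```r
# Hypothetical per-event damage values for a single event type
prop <- c(100, 200, 50)
crop <- c(10, NA, 5)

# Row-wise sum first: the second row becomes NA and is dropped entirely
rowwise_total <- sum(prop + crop, na.rm = TRUE)

# Column-wise sums: only the missing crop value is dropped
columnwise_total <- sum(prop, na.rm = TRUE) + sum(crop, na.rm = TRUE)

print(c(rowwise = rowwise_total, columnwise = columnwise_total))
```

If the two totals agree on the real data, the row-wise form used in the analysis loses nothing.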