NOAA Storm Data Analysis

Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database to address two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

This analysis uses data from 1950 to 2011 and calculates total injuries, fatalities, and economic damages (property and crop) by event type to inform preparedness strategies.

Data Processing

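The data were downloaded as a bzip2-compressed CSV file and read directly into R; read.csv handles the .bz2 compression without a separate extraction step. The dataset records storm events together with their effects on public health (fatalities and injuries) and their economic damage (property and crop). We begin by loading the necessary packages and reading the dataset.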

library(dplyr)
library(ggplot2)
library(reshape2)
storm_data <- read.csv("repdata_data_StormData.csv.bz2")

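We select only the relevant variables and convert the damage exponent codes (K for thousands, M for millions, B for billions) into numeric multipliers. Any other code, including a blank, is treated as a multiplier of 1, so the reported value is used as-is.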

storm_subset <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
         CROPDMG, CROPDMGEXP)

# Map exponent letters to values
exp_map <- c("K"=1e3, "M"=1e6, "B"=1e9)

storm_subset$PROPDMGEXP <- toupper(storm_subset$PROPDMGEXP)
storm_subset$CROPDMGEXP <- toupper(storm_subset$CROPDMGEXP)

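storm_subset <- storm_subset %>%
  # Replace each exponent code with its numeric multiplier; codes outside
  # exp_map (blanks, digits, symbols) fall back to 1
  mutate(PROPDMGEXP = ifelse(PROPDMGEXP %in% names(exp_map), exp_map[PROPDMGEXP], 1),
         CROPDMGEXP = ifelse(CROPDMGEXP %in% names(exp_map), exp_map[CROPDMGEXP], 1),
         # Damage in US dollars = reported value * multiplier
         prop_damage = PROPDMG * as.numeric(PROPDMGEXP),
         crop_damage = CROPDMG * as.numeric(CROPDMGEXP))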
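
To make the exponent conversion concrete, here is a minimal sketch on a toy data frame. The rows are hypothetical values chosen for illustration, not records from the NOAA file, and the sketch uses base R only so it can be run without dplyr. It applies the same K/M/B mapping as above and shows that a lowercase code is recognised after toupper and that a blank code falls back to a multiplier of 1.

```r
# Toy rows (hypothetical values, not taken from the NOAA file)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Same mapping as in the analysis
exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)

# Look up each code; anything not in exp_map (here, the blank) yields NA
multiplier <- unname(exp_map[toupper(toy$PROPDMGEXP)])

# Unrecognised codes fall back to a multiplier of 1
multiplier[is.na(multiplier)] <- 1

# Damage in US dollars
toy$prop_damage <- toy$PROPDMG * multiplier
print(toy)
```

With this fallback, a row whose code is outside the map keeps its raw PROPDMG value, which is the same choice the processing step above makes.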

Results

Impact on Public Health

We calculate the total fatalities and injuries by event type, then reshape the data for visualization.

health_data <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = storm_subset, sum)
health_data_top <- health_data[order(-(health_data$FATALITIES + health_data$INJURIES)), ][1:10, ]
health_data_melt <- melt(health_data_top, id.vars = "EVTYPE")
names(health_data_melt) <- c("EVTYPE", "TYPE", "COUNT")

# Plot
# Order bars by combined fatalities + injuries per event type
ggplot(health_data_melt, aes(x = reorder(EVTYPE, COUNT, FUN = sum), y = COUNT, fill = TYPE)) +
  geom_bar(stat = "identity") + coord_flip() +
  labs(title = "Top 10 Most Harmful Events to Public Health",
       x = "Event Type", y = "Total Count") +
  theme_minimal()
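
The aggregation and reshape steps above can be illustrated on a small example. This is a sketch under stated assumptions: the counts are hypothetical, and the long-format table is built by hand in base R as a stand-in for reshape2's melt, so the block runs without extra packages. The column names TYPE and COUNT follow the renaming used in the analysis.

```r
# Toy records (hypothetical counts, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(40, 5, 10, 0),
  stringsAsFactors = FALSE
)

# One row per event type, with summed fatalities and injuries
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Long format: one row per (event type, measure) pair, which is the
# shape a stacked bar chart with a fill aesthetic needs
long <- data.frame(
  EVTYPE = rep(totals$EVTYPE, times = 2),
  TYPE   = rep(c("FATALITIES", "INJURIES"), each = nrow(totals)),
  COUNT  = c(totals$FATALITIES, totals$INJURIES),
  stringsAsFactors = FALSE
)
print(long)
```

Each event type appears twice in the long table, once per measure, so a stacked bar for that event shows fatalities and injuries as separate segments.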

Economic Consequences

We sum property and crop damage for each event type and plot the 10 event types with the largest totals.

economic_data <- storm_subset %>%
  group_by(EVTYPE) %>%
  summarise(total_damage = sum(prop_damage + crop_damage, na.rm = TRUE)) %>%
  arrange(desc(total_damage)) %>%
  head(10)

# Plot
ggplot(economic_data, aes(x = reorder(EVTYPE, total_damage), y = total_damage / 1e9)) +
  geom_bar(stat = "identity", fill = "darkred") + coord_flip() +
  labs(title = "Top 10 Events with Greatest Economic Damage",
       x = "Event Type", y = "Damage (Billion USD)") +
  theme_minimal()
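
As a small check on the ranking logic, the sketch below repeats the sum-then-sort step on toy data. The damage figures are hypothetical, and base R's aggregate and order are used here in place of the dplyr pipeline so the block has no package dependencies. It also shows the conversion to billions of dollars used on the plot axis.

```r
# Toy records (hypothetical dollar amounts, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE      = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
  prop_damage = c(2e9, 5e8, 1e9, 0),
  crop_damage = c(1e8, 2e8, 0, 3e9),
  stringsAsFactors = FALSE
)

# Total damage per record, then summed per event type
toy$total <- toy$prop_damage + toy$crop_damage
totals <- aggregate(total ~ EVTYPE, data = toy, FUN = sum)

# Largest totals first, expressed in billions of dollars
totals <- totals[order(-totals$total), ]
totals$total_billion <- totals$total / 1e9
print(head(totals, 2))
```

In this toy case an event type with no property damage still ranks second because of its crop damage, which is why the analysis combines both components before ranking.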