Synopsis

This report explores the U.S. NOAA storm database from 1950 to November 2011 to identify which types of severe weather events are most harmful to population health and which cause the greatest economic damage. The analysis starts from the raw dataset in CSV format and includes preprocessing, transformation, and visualization of the data. Fatalities, injuries, property damage, and crop damage are used as metrics to quantify impacts. Tornadoes are identified as the most harmful to health, while floods and hurricanes are found to have the highest economic consequences. This analysis is intended to inform emergency planners and municipal decision-makers.

Data Processing

Load libraries

library(dplyr)
library(ggplot2)
library(readr)

Load the raw data

storm_data <- read.csv("repdata_data_StormData.csv.bz2")

Select relevant columns

storm_subset <- storm_data %>% 
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Convert exponent fields to numeric multipliers

exp_to_num <- function(exp) {
  exp <- toupper(trimws(as.character(exp)))
  # Convert only single-digit codes, so non-numeric codes never reach as.numeric()
  digit <- as.numeric(ifelse(grepl("^[0-9]$", exp), exp, NA))
  dplyr::case_when(
    exp == "H" ~ 1e2,
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    !is.na(digit) ~ 10 ^ digit,
    TRUE ~ 1
  )
}

storm_subset <- storm_subset %>% 
  mutate(
    PROPDMGEXP_NUM = exp_to_num(PROPDMGEXP),
    CROPDMGEXP_NUM = exp_to_num(CROPDMGEXP),
    TotalPropDamage = PROPDMG * PROPDMGEXP_NUM,
    TotalCropDamage = CROPDMG * CROPDMGEXP_NUM,
    TotalEconomicDamage = TotalPropDamage + TotalCropDamage
  )
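As a quick sanity check on the exponent mapping, the sketch below re-implements the same rules with a base-R lookup table and applies them to a handful of made-up codes. The names `letter_multipliers` and `exp_to_num_lookup` are hypothetical and not part of the analysis above; the rules assumed are the ones stated in this section (H, K, M, B as letter multipliers, a single digit as a power of ten, anything else as 1).

```r
# Illustrative lookup-table variant; these names are hypothetical, not from the report.
letter_multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

exp_to_num_lookup <- function(exp) {
  exp <- toupper(trimws(as.character(exp)))
  out <- rep(1, length(exp))                       # default for unrecognized codes
  is_letter <- exp %in% names(letter_multipliers)
  out[is_letter] <- unname(letter_multipliers[exp[is_letter]])
  is_digit <- grepl("^[0-9]$", exp)
  out[is_digit] <- 10 ^ as.numeric(exp[is_digit])  # only digit strings are converted
  out
}

# Made-up codes covering each branch: letters, lowercase, blank, digit, junk
codes <- c("K", "m", "B", "", "5", "?", "h")
multipliers <- exp_to_num_lookup(codes)
print(data.frame(code = codes, multiplier = multipliers))
```

Because only strings that match a single digit are passed to `as.numeric()`, this variant does not trigger coercion warnings on codes such as "?" or the empty string.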

Results

1. Events Most Harmful to Population Health

health_impact <- storm_subset %>% 
  group_by(EVTYPE) %>% 
  summarize(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE),
    TotalHealthImpact = Fatalities + Injuries,
    .groups = "drop"
  ) %>% 
  arrange(desc(TotalHealthImpact)) %>% 
  slice_head(n = 10)

ggplot(health_impact, aes(x = reorder(EVTYPE, TotalHealthImpact), y = TotalHealthImpact)) +
  geom_col(fill = "firebrick") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types Most Harmful to Population Health",
    x = "Event Type",
    y = "Total Fatalities + Injuries"
  )
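The grouping above treats every distinct `EVTYPE` string as its own category. If the raw labels differ only in letter case or surrounding whitespace (an assumption worth checking against the actual data), the same event would be split across several groups and its total understated. A minimal base-R sketch of the effect, using invented labels and counts rather than values from the storm database:

```r
# Toy labels and counts, invented for illustration; not taken from the storm database.
raw_labels <- c("Tornado", "TORNADO ", " tornado", "Flood", "FLOOD")
fatalities <- c(1, 2, 3, 4, 5)

# Normalize case and surrounding whitespace before grouping.
clean_labels <- toupper(trimws(raw_labels))

# Sum per raw label versus per normalized label.
by_raw   <- tapply(fatalities, raw_labels, sum)
by_clean <- tapply(fatalities, clean_labels, sum)

print(by_raw)    # five separate groups
print(by_clean)  # two groups: FLOOD and TORNADO
```

In the dplyr pipeline, the equivalent step would be a `mutate(EVTYPE = toupper(trimws(EVTYPE)))` placed before `group_by(EVTYPE)`. Whether it changes the ranking depends on how inconsistent the labels actually are.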

2. Events with Greatest Economic Consequences

economic_impact <- storm_subset %>% 
  group_by(EVTYPE) %>% 
  summarize(TotalDamage = sum(TotalEconomicDamage, na.rm = TRUE), .groups = "drop") %>% 
  arrange(desc(TotalDamage)) %>% 
  slice_head(n = 10)

ggplot(economic_impact, aes(x = reorder(EVTYPE, TotalDamage), y = TotalDamage / 1e9)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    x = "Event Type",
    y = "Total Damage (in Billions USD)"
  )
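To make the dollar arithmetic behind the plot concrete, the following sketch works through the same steps on three invented records: multiply each damage figure by its decoded multiplier, add property and crop damage, sum per event type, and divide by 1e9 to express the result in billions. The data frame `toy` and the columns `PROPMUL` and `CROPMUL` are hypothetical stand-ins for the decoded exponent columns.

```r
# Toy records, invented for illustration; values are not from the storm database.
toy <- data.frame(
  EVTYPE  = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG = c(2.5, 500, 10),
  PROPMUL = c(1e9, 1e6, 1e3),  # multipliers already decoded from the exponent codes
  CROPDMG = c(0, 100, 5),
  CROPMUL = c(1, 1e6, 1e3)
)

# Property plus crop damage in dollars for each record
toy$TotalEconomicDamage <- toy$PROPDMG * toy$PROPMUL + toy$CROPDMG * toy$CROPMUL

# Sum per event type, then rescale to billions of dollars
totals <- aggregate(TotalEconomicDamage ~ EVTYPE, data = toy, FUN = sum)
totals$Billions <- totals$TotalEconomicDamage / 1e9
print(totals)
```

Here the two flood records contribute 2.5 billion and 0.6 billion dollars, for 3.1 billion in total, while the hail record contributes only 15,000 dollars. A single record with a "B" exponent can therefore dominate the total for an event type.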

Conclusion

This analysis shows that tornadoes have caused the most human harm, measured as combined fatalities and injuries, while floods and hurricanes account for the largest economic losses, measured as combined property and crop damage. This information can guide resource allocation for disaster preparedness and response.