Synopsis

This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks major weather events across the United States from 1950 to November 2011. The objective is to determine which types of severe weather events cause the greatest harm to population health and which cause the greatest economic damage. Fatalities and injuries measure public health impact; property damage and crop damage estimates measure economic impact. Event type labels are used as recorded in the raw data, so inconsistently named variants of the same event (for example, TSTM WIND and THUNDERSTORM WIND) are counted separately. Tornadoes are the single most dangerous event type for population health, with the highest combined fatalities and injuries. Floods cause the greatest total economic damage when property and crop losses are combined. These findings support prioritizing tornado preparedness and flood mitigation in disaster planning.

Data Processing

# Load required libraries
library(dplyr)
library(ggplot2)
library(tidyr)

The raw data is provided as a comma-separated values file compressed using the bzip2 algorithm. It is loaded directly into R without any external preprocessing. The dataset contains 902,297 observations and 37 variables. We extract only the columns needed for this analysis: event type, fatalities, injuries, property damage, property damage exponent, crop damage, and crop damage exponent.

# Load data directly from the raw compressed CSV file - no external preprocessing
storm <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE)

# Show dimensions and relevant columns
dim(storm)
## [1] 902297     37
storm_sub <- storm[, c("EVTYPE", "FATALITIES", "INJURIES", 
                        "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
head(storm_sub)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0
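
The direct load works because read.csv opens its input through a connection that recognizes bzip2 compression when reading. The following is a minimal sketch of that behavior using a tiny temporary file; the file contents and the names tmp and toy are invented for illustration and are not part of the NOAA data.

```r
# Write a tiny CSV through a bzip2 connection (toy contents, not NOAA data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", "FLOOD,2"), con)
close(con)

# read.csv is given only the path; no manual decompression step is needed
toy <- read.csv(tmp, stringsAsFactors = FALSE)
dim(toy)

# Remove the temporary file
unlink(tmp)
```

The same call pattern applies to the full storm file, where reading takes noticeably longer because of its size.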

Data Transformation: Health Data

We group the data by event type and sum total fatalities and injuries. A combined total column is created to rank events by overall health impact. This allows us to identify which events cause the most harm across both measures simultaneously.

# Aggregate fatalities and injuries by event type
health <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES   = sum(INJURIES,   na.rm = TRUE),
    TOTAL      = FATALITIES + INJURIES
  ) %>%
  arrange(desc(TOTAL))

# Select top 10 most harmful event types
top_health <- head(health, 10)
top_health
## # A tibble: 10 × 4
##    EVTYPE            FATALITIES INJURIES TOTAL
##    <chr>                  <dbl>    <dbl> <dbl>
##  1 TORNADO                 5633    91346 96979
##  2 EXCESSIVE HEAT          1903     6525  8428
##  3 TSTM WIND                504     6957  7461
##  4 FLOOD                    470     6789  7259
##  5 LIGHTNING                816     5230  6046
##  6 HEAT                     937     2100  3037
##  7 FLASH FLOOD              978     1777  2755
##  8 ICE STORM                 89     1975  2064
##  9 THUNDERSTORM WIND        133     1488  1621
## 10 WINTER STORM             206     1321  1527
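
The table lists TSTM WIND and THUNDERSTORM WIND as separate rows, which splits what is arguably one hazard across two labels. One possible way to merge such variants is sketched below. The helper normalize_evtype is hypothetical and is not part of this analysis; it handles only a few spellings, and the three totals are copied from the table above.

```r
# Hypothetical helper (not used in this report): collapse a few
# near-duplicate EVTYPE spellings into one canonical label.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(evtype))                  # unify case, trim outer spaces
  x <- gsub("[[:space:]]+", " ", x)             # collapse repeated inner spaces
  x <- sub("^TSTM ", "THUNDERSTORM ", x)        # expand the TSTM abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # plural to singular
  x
}

# Combined fatalities + injuries for three labels, taken from the table above
totals <- c("TSTM WIND" = 7461, "THUNDERSTORM WIND" = 1621, "EXCESSIVE HEAT" = 8428)

# Sum the totals again after normalizing the labels
merged <- tapply(totals, normalize_evtype(names(totals)), sum)
merged[["THUNDERSTORM WIND"]]   # 9082
```

Under this assumption the merged thunderstorm wind total would be at least 9,082, which would place it above EXCESSIVE HEAT (8,428). Other spellings outside the top 10 are not covered by this sketch.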

Data Transformation: Economic Data

The raw economic damage values are stored in two columns each for property and crop damage: a numeric value (PROPDMG / CROPDMG) and an exponent code (PROPDMGEXP / CROPDMGEXP). The exponent codes must be converted to numeric multipliers before dollar amounts can be compared across records. The letters H, K, M, and B, in either case, represent hundreds, thousands, millions, and billions respectively. All other codes, including blanks, are treated as a multiplier of 1, which is a simplifying assumption.

# Function to convert exponent letters to numeric multipliers
# Justification: raw data stores damage in split format (value + exponent letter)
# K=thousands, M=millions, B=billions, H=hundreds, else=1
exp_convert <- function(exp) {
  exp <- toupper(trimws(exp))
  dplyr::case_when(
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    exp == "H" ~ 1e2,
    TRUE        ~ 1
  )
}

# Calculate actual dollar damage values
storm_sub <- storm_sub %>%
  mutate(
    PROP_DAMAGE  = PROPDMG * exp_convert(PROPDMGEXP),
    CROP_DAMAGE  = CROPDMG * exp_convert(CROPDMGEXP),
    TOTAL_DAMAGE = PROP_DAMAGE + CROP_DAMAGE
  )

# Aggregate total economic damage by event type
economic <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_DAMAGE = sum(TOTAL_DAMAGE, na.rm = TRUE)) %>%
  arrange(desc(TOTAL_DAMAGE))

# Select top 10 events by economic damage
top_economic <- head(economic, 10)
top_economic
## # A tibble: 10 × 2
##    EVTYPE             TOTAL_DAMAGE
##    <chr>                     <dbl>
##  1 FLOOD             150319678257 
##  2 HURRICANE/TYPHOON  71913712800 
##  3 TORNADO            57352114049.
##  4 STORM SURGE        43323541000 
##  5 HAIL               18758222016.
##  6 FLASH FLOOD        17562129167.
##  7 DROUGHT            15018672000 
##  8 HURRICANE          14610229010 
##  9 RIVER FLOOD        10148404500 
## 10 ICE STORM           8967041360
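
The exponent conversion used above can be spot-checked on a few hand-computed values. The sketch below uses a hypothetical base-R equivalent, exp_multiplier, built on a named lookup vector rather than case_when; it is an illustration of the same rule, not the function used in this report. The first three value and code pairs match rows shown in the head() output earlier.

```r
# Hypothetical base-R equivalent of the exponent conversion (illustration only)
exp_multiplier <- function(exp) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  key <- toupper(trimws(exp))
  mult <- unname(lookup[key])   # NA for any code not in the lookup, including ""
  mult[is.na(mult)] <- 1        # every other code falls back to a multiplier of 1
  mult
}

# PROPDMG 25.0 and 2.5 with code "K", and CROPDMG 0 with a blank code
values <- c(25.0, 2.5, 0)
codes  <- c("K", "K", "")
dollars <- values * exp_multiplier(codes)
dollars   # 25000 2500 0

# Lower-case letters are accepted; unrecognized codes map to 1
exp_multiplier(c("m", "?", "5"))   # 1e+06 1 1
```

The fallback to 1 means that a record with an unrecognized code keeps its raw numeric value as a dollar amount.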

Results

Question 1: Which weather events are most harmful to population health?

Conclusion: Tornadoes are by far the most harmful weather event type for population health, with 5,633 fatalities and 91,346 injuries. Their combined total of 96,979 is more than double the combined total of the other nine event types in the top 10 (40,198).

# Reshape data to long format for grouped bar chart
top_health_long <- top_health %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  pivot_longer(cols = c(FATALITIES, INJURIES),
               names_to  = "Harm_Type",
               values_to = "Count")

ggplot(top_health_long, 
       aes(x = reorder(EVTYPE, -Count), y = Count, fill = Harm_Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("FATALITIES" = "#d73027", "INJURIES" = "#4575b4")) +
  labs(
    title   = "Figure 1: Top 10 Weather Event Types Most Harmful to Population Health",
    subtitle = "Based on total fatalities and injuries recorded in NOAA Storm Database (1950-2011)",
    x       = "Weather Event Type",
    y       = "Number of People Affected",
    fill    = "Type of Harm",
    caption = paste(
      "Figure 1: Tornadoes dominate both fatality and injury counts, making them the",
      "most dangerous weather event type for public health in the United States.",
      sep = "\n"
    )
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The figure above shows that tornadoes cause far more harm to population health than any other event type. EXCESSIVE HEAT ranks second in fatalities (1,903), while TSTM WIND ranks second in injuries (6,957). Emergency managers should prioritize tornado warning systems and shelters above all other weather hazards.
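
A note on bar order in Figure 1: the data passed to ggplot has two rows per event type, and reorder() summarizes the ordering variable with the mean by default. Ordering by the mean of the two counts is equivalent to ordering by their sum. The sketch below illustrates this in base R with three event types whose counts are copied from the top 10 table; the names toy and bar_order are assumptions made for the example.

```r
# Long-format toy data: two rows per event type, counts from the top 10 table
toy <- data.frame(
  EVTYPE    = rep(c("EXCESSIVE HEAT", "TSTM WIND", "TORNADO"), each = 2),
  Harm_Type = rep(c("FATALITIES", "INJURIES"), times = 3),
  Count     = c(1903, 6525, 504, 6957, 5633, 91346)
)

# reorder() applies FUN = mean to -Count within each EVTYPE,
# then sorts the factor levels by that score in increasing order
bar_order <- reorder(toy$EVTYPE, -toy$Count)
levels(bar_order)   # "TORNADO" "EXCESSIVE HEAT" "TSTM WIND"
```

TSTM WIND has more injuries than EXCESSIVE HEAT but a lower combined total, so it is placed after EXCESSIVE HEAT on the axis.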


Question 2: Which weather events have the greatest economic consequences?

Conclusion: Floods cause the greatest total economic damage, with over $150 billion in combined property and crop losses, more than double that of the second-ranked event type (Hurricane/Typhoon, at approximately $72 billion).

ggplot(top_economic, 
       aes(x = reorder(EVTYPE, -TOTAL_DAMAGE), y = TOTAL_DAMAGE / 1e9)) +
  geom_bar(stat = "identity", fill = "#2166ac") +
  labs(
    title    = "Figure 2: Top 10 Weather Event Types with Greatest Economic Consequences",
    subtitle = "Combined property and crop damage in billions USD (NOAA Storm Database, 1950-2011)",
    x        = "Weather Event Type",
    y        = "Total Economic Damage (Billions USD)",
    caption  = paste(
      "Figure 2: Floods are the costliest weather event type, causing over $150 billion",
      "in combined property and crop damage. Hurricane/Typhoon and Tornado follow in second and third place.",
      sep = "\n"
    )
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The figure above shows that floods cause the greatest economic damage overall. This is largely driven by property damage from flooding events. Municipal planners should prioritize flood infrastructure investment and insurance programs to reduce economic losses from future floods.
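
The aggregation in this report keeps only the combined total per event type, so the property and crop split is not visible in the output. One way to check what drives a total is to aggregate the two components separately and to look at how much of each total comes from its single largest record. The sketch below assumes columns named as in this report, but the dollar amounts are invented toy values and are NOT taken from the NOAA database.

```r
# Toy records with invented dollar amounts (NOT values from the NOAA database)
toy <- data.frame(
  EVTYPE      = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROP_DAMAGE = c(5e6, 2e9, 1e5, 3e6),
  CROP_DAMAGE = c(1e6, 0,   4e8, 2e6)
)

# Aggregate property and crop damage as separate columns per event type
by_type <- aggregate(cbind(PROP_DAMAGE, CROP_DAMAGE) ~ EVTYPE, data = toy, FUN = sum)
by_type$PROP_SHARE <- by_type$PROP_DAMAGE /
  (by_type$PROP_DAMAGE + by_type$CROP_DAMAGE)

# Fraction of each event type's total that comes from its largest single record
toy$TOTAL <- toy$PROP_DAMAGE + toy$CROP_DAMAGE
largest_share <- tapply(toy$TOTAL, toy$EVTYPE, function(v) max(v) / sum(v))

by_type
largest_share
```

A largest-record share close to 1 would suggest that one entry dominates the total and is worth inspecting individually before drawing conclusions from the ranking.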