Synopsis

This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers major weather events from 1950 to November 2011. The goal is to identify which types of events are most harmful to population health and which have the greatest economic consequences. Health impact was measured by combining fatalities and injuries per event type. Economic impact was estimated by converting property and crop damage values to a common dollar scale using their respective magnitude codes. Results show that tornadoes cause the most harm to human health, while floods generate the greatest overall economic damage across the United States. These findings can help government and municipal managers prioritize resource allocation when preparing for severe weather events.


Data Processing

Loading the data

The download step from the course website is kept below for reference but is commented out. The analysis itself reads a local, already decompressed copy of the data, repdata_data_StormData.csv.

# url <- "https://d396qusza40orc.frontcdn.net/repdata%2Fdata%2FStormData.csv.bz2"
# destfile <- "StormData.csv.bz2"
# 
# if (!file.exists(destfile)) {
#   download.file(url, destfile, method = "curl")
# }

storm <- read.csv("repdata_data_StormData.csv", stringsAsFactors = FALSE)
dim(storm)
## [1] 902297     37
names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
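If only the compressed download is available, decompressing it first should not be necessary, because read.csv() can open bzip2-compressed files directly. The following is a minimal sketch of that behavior, using a tiny made-up file written to a temporary path rather than the real storm data; the file contents and the `demo` name are illustrative assumptions.

```r
# Build a tiny bzip2-compressed CSV in a temporary location (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,5",
             "FLOOD,0,2"), con)
close(con)

# read.csv() detects the compression and reads the file without a separate unzip step
demo <- read.csv(tmp, stringsAsFactors = FALSE)
dim(demo)
```

The same call pattern should apply to the StormData.csv.bz2 file named in the commented-out download block, at the cost of a slower read than from an uncompressed copy.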

Subsetting relevant variables

We keep only the columns needed to answer both questions.

vars <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm <- storm[, vars]

Processing health data

We aggregate total fatalities and injuries by event type.

library(dplyr)

health <- storm %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries   = sum(INJURIES,   na.rm = TRUE),
    Total      = Fatalities + Injuries
  ) %>%
  arrange(desc(Total)) %>%
  head(10)

Processing economic data

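The PROPDMGEXP and CROPDMGEXP columns encode magnitude multipliers for PROPDMG and CROPDMG. We convert them to numeric values, ignoring case: H = 100, K = 1,000, M = 1,000,000, B = 1,000,000,000, and a single digit n = 10^n. Blank or unrecognized codes are treated as a multiplier of 1, so the recorded amount is taken at face value.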

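# Convert damage exponent codes to numeric multipliers.
# H/K/M/B are hundreds, thousands, millions, billions; a digit n means 10^n;
# blank or unrecognized codes fall back to a multiplier of 1.
exp_to_num <- function(exp) {
  exp <- toupper(trimws(exp))
  # case_when() evaluates every right-hand side on the full vector, so
  # as.numeric() would warn on non-digit codes; those warnings are silenced here.
  digit_power <- suppressWarnings(as.numeric(exp))
  dplyr::case_when(
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    exp == "H" ~ 1e2,
    exp %in% as.character(0:9) ~ 10^digit_power,
    TRUE ~ 1
  )
}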

storm$PropDmgValue <- storm$PROPDMG * exp_to_num(storm$PROPDMGEXP)
storm$CropDmgValue <- storm$CROPDMG * exp_to_num(storm$CROPDMGEXP)

economic <- storm %>%
  group_by(EVTYPE) %>%
  summarise(
    PropertyDamage = sum(PropDmgValue, na.rm = TRUE),
    CropDamage     = sum(CropDmgValue, na.rm = TRUE),
    TotalDamage    = PropertyDamage + CropDamage
  ) %>%
  arrange(desc(TotalDamage)) %>%
  head(10)
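As a worked example of the multiplier conversion, the sketch below applies the same rule to five made-up records. It uses a base R lookup table instead of case_when(); the function name `exp_multiplier` and the sample values are hypothetical and are not part of the original analysis.

```r
# Hypothetical lookup-table version of the multiplier rule, for illustration only
exp_multiplier <- function(code) {
  code <- toupper(trimws(code))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- rep(1, length(code))                  # default: blank or unrecognized code
  known <- code %in% names(lookup)
  out[known] <- unname(lookup[code[known]])    # letter codes
  digit <- code %in% as.character(0:9)
  out[digit] <- 10^as.numeric(code[digit])     # digit n means 10^n
  out
}

# Made-up damage amounts and their exponent codes
damage <- c(25, 2.5, 1.2, 3, 10)
code   <- c("K", "m", "B", "2", "")
damage_usd <- damage * exp_multiplier(code)
damage_usd
```

Under these rules the five records come to 25,000; 2,500,000; 1,200,000,000; 300; and 10 dollars. The last one shows that a blank code leaves the recorded amount unchanged.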

Results

Question 1: Which events are most harmful to population health?

library(ggplot2)
library(tidyr)

health_long <- health %>%
  select(EVTYPE, Fatalities, Injuries) %>%
  pivot_longer(cols = c(Fatalities, Injuries), names_to = "Type", values_to = "Count")

# Order bars by combined fatalities + injuries; after coord_flip() the largest total is on top
ggplot(health_long, aes(x = reorder(EVTYPE, Count, FUN = sum), y = Count, fill = Type)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("Fatalities" = "#c0392b", "Injuries" = "#e67e22")) +
  labs(
    title    = "Top 10 Weather Events by Health Impact (1950–2011)",
    subtitle = "Source: NOAA Storm Database",
    x        = "Event Type",
    y        = "Total Cases",
    fill     = ""
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

Figure 1. Top 10 weather event types by total health impact (fatalities + injuries) in the U.S., 1950–2011. Tornadoes dominate both categories by a wide margin.

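Tornadoes are by far the most harmful event type, causing more fatalities and more injuries than any other event. Among the ten event types in Table 1 they account for just under half of the fatalities and nearly three quarters of the injuries. Excessive heat and thunderstorm wind (TSTM WIND) follow distantly in second and third place, with floods close behind in fourth.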
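The tornado share can be checked by hand from the figures in Table 1. The sketch below copies the ten rows of that table into a data frame and computes the share within those ten event types only, so it says nothing about events outside the top 10; the names `top10` and `tornado_share` are illustrative.

```r
# Figures copied from Table 1 (top 10 event types only)
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND", "WINTER STORM"),
  Fatalities = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  Injuries   = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321)
)

# Percentage of each column's top-10 total that is attributable to tornadoes
is_tornado <- top10$EVTYPE == "TORNADO"
tornado_share <- c(
  Fatalities = 100 * top10$Fatalities[is_tornado] / sum(top10$Fatalities),
  Injuries   = 100 * top10$Injuries[is_tornado]   / sum(top10$Injuries)
)
round(tornado_share, 1)
```

Within these ten event types, tornadoes account for roughly 48% of fatalities and roughly 73% of injuries.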


Question 2: Which events have the greatest economic consequences?

economic_long <- economic %>%
  select(EVTYPE, PropertyDamage, CropDamage) %>%
  pivot_longer(cols = c(PropertyDamage, CropDamage), names_to = "Type", values_to = "Damage") %>%
  mutate(Damage = Damage / 1e9)  # convert to billions of USD

# Order bars by combined property + crop damage; after coord_flip() the largest total is on top
ggplot(economic_long, aes(x = reorder(EVTYPE, Damage, FUN = sum), y = Damage, fill = Type)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = c("PropertyDamage" = "#2980b9", "CropDamage" = "#27ae60")) +
  labs(
    title    = "Top 10 Weather Events by Economic Damage (1950–2011)",
    subtitle = "Source: NOAA Storm Database",
    x        = "Event Type",
    y        = "Total Damage (Billion USD)",
    fill     = ""
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

Figure 2. Top 10 weather event types by total economic damage (property + crop damage) in the U.S., 1950–2011. Floods produce the highest combined economic losses.

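Floods cause the greatest total economic damage (about $150 billion), driven primarily by property losses (about $145 billion). Hurricanes/typhoons (about $72 billion) and tornadoes (about $57 billion) also rank among the top economic threats. Droughts stand out because about 93% of their roughly $15 billion in damage is crop damage, the largest crop loss of any event type in Table 2.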


Summary Table

knitr::kable(
  health[, c("EVTYPE", "Fatalities", "Injuries", "Total")],
  caption = "Table 1. Top 10 events by health impact",
  col.names = c("Event Type", "Fatalities", "Injuries", "Total"),
  format.args = list(big.mark = ",")
)

Table 1. Top 10 events by health impact

|Event Type        | Fatalities| Injuries|  Total|
|:-----------------|----------:|--------:|------:|
|TORNADO           |      5,633|   91,346| 96,979|
|EXCESSIVE HEAT    |      1,903|    6,525|  8,428|
|TSTM WIND         |        504|    6,957|  7,461|
|FLOOD             |        470|    6,789|  7,259|
|LIGHTNING         |        816|    5,230|  6,046|
|HEAT              |        937|    2,100|  3,037|
|FLASH FLOOD       |        978|    1,777|  2,755|
|ICE STORM         |         89|    1,975|  2,064|
|THUNDERSTORM WIND |        133|    1,488|  1,621|
|WINTER STORM      |        206|    1,321|  1,527|

knitr::kable(
  economic[, c("EVTYPE", "PropertyDamage", "CropDamage", "TotalDamage")],
  caption = "Table 2. Top 10 events by economic damage (USD)",
  col.names = c("Event Type", "Property Damage", "Crop Damage", "Total Damage"),
  format.args = list(big.mark = ",")
)

Table 2. Top 10 events by economic damage (USD)

|Event Type        | Property Damage|    Crop Damage|    Total Damage|
|:-----------------|---------------:|--------------:|---------------:|
|FLOOD             | 144,657,709,807|  5,661,968,450| 150,319,678,257|
|HURRICANE/TYPHOON |  69,305,840,000|  2,607,872,800|  71,913,712,800|
|TORNADO           |  56,947,380,677|    414,953,270|  57,362,333,947|
|STORM SURGE       |  43,323,536,000|          5,000|  43,323,541,000|
|HAIL              |  15,735,267,513|  3,025,954,473|  18,761,221,986|
|FLASH FLOOD       |  16,822,673,979|  1,421,317,100|  18,243,991,079|
|DROUGHT           |   1,046,106,000| 13,972,566,000|  15,018,672,000|
|HURRICANE         |  11,868,319,010|  2,741,910,000|  14,610,229,010|
|RIVER FLOOD       |   5,118,945,500|  5,029,459,000|  10,148,404,500|
|ICE STORM         |   3,944,927,860|  5,022,113,500|   8,967,041,360|
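
Because events are grouped on the raw EVTYPE label, closely related labels appear as separate rows in the tables above (for example TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT). The sketch below gives a rough sense of how merging such labels could shift the health ranking. It uses only the ten totals from Table 1, and the merge rules in `merge_map` are a hypothetical choice made for illustration, not part of the original analysis.

```r
# Totals (fatalities + injuries) copied from Table 1
totals <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
            "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
            "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
            "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527)

# Hypothetical merge rules (an assumption): old label -> merged label
merge_map <- c("TSTM WIND" = "THUNDERSTORM WIND", "EXCESSIVE HEAT" = "HEAT")

# Relabel the rows covered by the rules, then re-sum by merged label
label <- names(totals)
hit <- label %in% names(merge_map)
label[hit] <- unname(merge_map[label[hit]])
merged <- sort(tapply(totals, label, sum), decreasing = TRUE)
head(merged, 4)
```

With these two rules, tornadoes remain first by a wide margin, followed by HEAT (11,465), THUNDERSTORM WIND (9,082), and FLOOD (7,259). A full treatment would apply the relabeling to the complete dataset before aggregating, since variants outside the top 10 are not captured here.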