Synopsis

This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks major weather events across the United States from 1950 to November 2011. The goal is to identify which types of severe weather events are most harmful to population health and which have the greatest economic consequences. Tornadoes are by far the most harmful event type: they cause the most fatalities (5,633) and the most injuries (91,346). Excessive heat ranks second for fatalities, and thunderstorm winds rank second for injuries. For economic damage, floods cause the greatest property damage (about 145 billion USD), while droughts cause the most crop damage (about 14 billion USD). These findings can help government agencies prioritize resources and preparedness efforts for the event types with the greatest impact.


Data Processing

The data comes as a comma-separated-value file compressed via the bzip2 algorithm. We load it directly using read.csv(), which handles .bz2 files natively in R.

# Load the data directly from bz2 file
storm <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = FALSE)

# Basic exploration
dim(storm)
## [1] 902297     37
names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
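
The claim that read.csv() reads bzip2-compressed files without a separate decompression step can be checked on a small scale. The sketch below is illustrative only: it writes a made-up two-row table (not taken from the storm data) to a temporary .bz2 file and reads it back.

```r
# Toy data frame; values are hypothetical and only serve the demonstration
toy_in <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     FATALITIES = c(1, 0),
                     stringsAsFactors = FALSE)

# Write the table through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy_in, con, row.names = FALSE)
close(con)

# read.csv() is given only the file name; the compression is detected on read
toy <- read.csv(tmp, stringsAsFactors = FALSE)
toy

unlink(tmp)  # remove the temporary file
```

The round trip returns the same two rows, which is why the full storm file can be loaded with a single read.csv() call.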

Subsetting Relevant Columns

For this analysis, we only need the event type, fatalities, injuries, property damage, crop damage, and their exponent multiplier columns.

library(dplyr)

storm_sub <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Converting Damage Exponents

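The PROPDMGEXP and CROPDMGEXP columns contain codes that scale the PROPDMG and CROPDMG values: H = hundreds, K = thousands, M = millions, B = billions. A single digit d is interpreted as a multiplier of 10^d. The codes appear in mixed case, so we convert them to uppercase before matching. Any other value, including a blank, is treated as a multiplier of 1.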

# Function to convert exponent characters to numeric multipliers
convert_exp <- function(exp) {
  exp <- toupper(trimws(exp))
  # case_when() evaluates every right-hand side over the whole vector, so
  # as.numeric() also sees the letter codes. Those NAs are never selected,
  # so the "NAs introduced by coercion" warning is suppressed here.
  digit <- suppressWarnings(as.numeric(exp))
  dplyr::case_when(
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    exp == "H" ~ 1e2,
    exp %in% as.character(0:9) ~ 10 ^ digit,
    TRUE ~ 1
  )
}

storm_sub <- storm_sub %>%
  mutate(
    prop_mult        = convert_exp(PROPDMGEXP),
    crop_mult        = convert_exp(CROPDMGEXP),
    prop_damage      = PROPDMG * prop_mult,
    crop_damage      = CROPDMG * crop_mult,
    total_damage     = prop_damage + crop_damage,
    total_casualties = FATALITIES + INJURIES
  )
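
To make the multiplier rule concrete, here is a small worked example on made-up values. It is a sketch, not the report's own code: it uses a named lookup vector (the name mult_lookup is hypothetical) in place of case_when(), and the damage figures are invented for illustration.

```r
# Hypothetical damage values and exponent codes, not taken from the storm data
dmg  <- c(25, 2.5, 1, 3, 7)
code <- c("K", "m", "B", "2", "?")

# Named lookup: letter codes plus the digit codes "0".."9" mapped to 10^d
mult_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                 setNames(10 ^ (0:9), as.character(0:9)))

# Normalise case and whitespace, then look each code up by name
mult <- unname(mult_lookup[toupper(trimws(code))])
mult[is.na(mult)] <- 1   # unrecognised codes fall back to a multiplier of 1

dmg * mult
## [1] 2.5e+04 2.5e+06 1.0e+09 3.0e+02 7.0e+00
```

So a record with PROPDMG = 25 and PROPDMGEXP = "K" counts as 25,000 USD, and a record with an unrecognised code keeps its raw value.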

Cleaning Event Types

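Event types are labelled inconsistently. We convert them to uppercase and trim surrounding whitespace, which merges labels that differ only in case or spacing. This step does not merge synonymous labels: TSTM WIND and THUNDERSTORM WIND, for example, remain separate categories in the results below.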

storm_sub$EVTYPE <- toupper(trimws(storm_sub$EVTYPE))
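
A short illustration of what this normalisation does and does not do, using a few made-up label variants (hypothetical strings, not copied from the dataset):

```r
# Hypothetical label variants of the kind found in free-text event types
ev <- c(" Tstm Wind", "TSTM WIND", "thunderstorm wind ", "Heat")

# Same normalisation as in the analysis: trim whitespace, then uppercase
clean <- toupper(trimws(ev))

unique(clean)
## [1] "TSTM WIND"         "THUNDERSTORM WIND" "HEAT"
```

The first two variants collapse into one label, but the abbreviated and spelled-out forms still count as different event types. Merging those would need an explicit mapping, which this report does not apply.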

Results

Question 1: Which Event Types Are Most Harmful to Population Health?

We look at both fatalities and injuries separately, taking the top 10 event types for each.

# Top 10 by fatalities
top_fatalities <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
  arrange(desc(Fatalities)) %>%
  slice(1:10)

# Top 10 by injuries
top_injuries <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(Injuries = sum(INJURIES, na.rm = TRUE)) %>%
  arrange(desc(Injuries)) %>%
  slice(1:10)

top_fatalities
## # A tibble: 10 × 2
##    EVTYPE         Fatalities
##    <chr>               <dbl>
##  1 TORNADO              5633
##  2 EXCESSIVE HEAT       1903
##  3 FLASH FLOOD           978
##  4 HEAT                  937
##  5 LIGHTNING             816
##  6 TSTM WIND             504
##  7 FLOOD                 470
##  8 RIP CURRENT           368
##  9 HIGH WIND             248
## 10 AVALANCHE             224
top_injuries
## # A tibble: 10 × 2
##    EVTYPE            Injuries
##    <chr>                <dbl>
##  1 TORNADO              91346
##  2 TSTM WIND             6957
##  3 FLOOD                 6789
##  4 EXCESSIVE HEAT        6525
##  5 LIGHTNING             5230
##  6 HEAT                  2100
##  7 ICE STORM             1975
##  8 FLASH FLOOD           1777
##  9 THUNDERSTORM WIND     1488
## 10 HAIL                  1361
library(ggplot2)
library(gridExtra)
p1 <- ggplot(top_fatalities, aes(x = reorder(EVTYPE, Fatalities), y = Fatalities)) +
  geom_bar(stat = "identity", fill = "firebrick") +
  coord_flip() +
  labs(title = "Top 10 Events by Fatalities",
       x = "Event Type", y = "Total Fatalities") +
  theme_bw(base_size = 9)

p2 <- ggplot(top_injuries, aes(x = reorder(EVTYPE, Injuries), y = Injuries)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Events by Injuries",
       x = "Event Type", y = "Total Injuries") +
  theme_bw(base_size = 9)

grid.arrange(p1, p2, ncol = 2)

Figure 1: Top 10 weather event types by fatalities (left) and injuries (right) across the United States, 1950-2011.

Finding: Tornadoes are the most harmful event type for both fatalities (5,633) and injuries (91,346), far ahead of every other category. Excessive heat ranks second for fatalities (1,903), while thunderstorm winds (TSTM WIND) rank second for injuries (6,957).
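
The data-processing step computes total_casualties, but the tables above rank fatalities and injuries separately. As a rough cross-check, the sketch below adds the two totals for the seven event types that appear in both top-10 lists, using the figures printed above. It is an illustration only: event types missing from either list are left out, so this is not a ranking over the whole dataset.

```r
# Totals copied from the two top-10 tables above (events present in both lists)
casualties <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT",
                 "LIGHTNING", "TSTM WIND", "FLOOD"),
  Fatalities = c(5633, 1903, 978, 937, 816, 504, 470),
  Injuries   = c(91346, 6525, 1777, 2100, 5230, 6957, 6789),
  stringsAsFactors = FALSE
)

# Combined casualties, sorted from highest to lowest
casualties$Total <- casualties$Fatalities + casualties$Injuries
casualties <- casualties[order(-casualties$Total), ]
rownames(casualties) <- NULL

casualties[, c("EVTYPE", "Total")]
##           EVTYPE Total
## 1        TORNADO 96979
## 2 EXCESSIVE HEAT  8428
## 3      TSTM WIND  7461
## 4          FLOOD  7259
## 5      LIGHTNING  6046
## 6           HEAT  3037
## 7    FLASH FLOOD  2755
```

On this subset, tornadoes account for more than ten times the combined casualties of the next event type.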


Question 2: Which Event Types Have the Greatest Economic Consequences?

We examine property damage and crop damage separately, and also look at total combined damage.

# Top 10 by property damage
top_prop <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(PropertyDamage = sum(prop_damage, na.rm = TRUE)) %>%
  arrange(desc(PropertyDamage)) %>%
  slice(1:10)

# Top 10 by crop damage
top_crop <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(CropDamage = sum(crop_damage, na.rm = TRUE)) %>%
  arrange(desc(CropDamage)) %>%
  slice(1:10)

top_prop
## # A tibble: 10 × 2
##    EVTYPE            PropertyDamage
##    <chr>                      <dbl>
##  1 FLOOD              144657709807 
##  2 HURRICANE/TYPHOON   69305840000 
##  3 TORNADO             56947380676.
##  4 STORM SURGE         43323536000 
##  5 FLASH FLOOD         16822723978.
##  6 HAIL                15735267513.
##  7 HURRICANE           11868319010 
##  8 TROPICAL STORM       7703890550 
##  9 WINTER STORM         6688497251 
## 10 HIGH WIND            5270046295
top_crop
## # A tibble: 10 × 2
##    EVTYPE             CropDamage
##    <chr>                   <dbl>
##  1 DROUGHT           13972566000
##  2 FLOOD              5661968450
##  3 RIVER FLOOD        5029459000
##  4 ICE STORM          5022113500
##  5 HAIL               3025954473
##  6 HURRICANE          2741910000
##  7 HURRICANE/TYPHOON  2607872800
##  8 FLASH FLOOD        1421317100
##  9 EXTREME COLD       1312973000
## 10 FROST/FREEZE       1094186000
p3 <- ggplot(top_prop, aes(x = reorder(EVTYPE, PropertyDamage), y = PropertyDamage / 1e9)) +
  geom_bar(stat = "identity", fill = "darkorange") +
  coord_flip() +
  labs(title = "Top 10 Events by Property Damage",
       x = "Event Type", y = "Property Damage (Billions USD)") +
  theme_bw(base_size = 9)

p4 <- ggplot(top_crop, aes(x = reorder(EVTYPE, CropDamage), y = CropDamage / 1e9)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  coord_flip() +
  labs(title = "Top 10 Events by Crop Damage",
       x = "Event Type", y = "Crop Damage (Billions USD)") +
  theme_bw(base_size = 9)

grid.arrange(p3, p4, ncol = 2)

Figure 2: Top 10 weather event types by property damage (left) and crop damage (right) in USD, across the United States, 1950-2011.

Finding: Floods cause the greatest property damage (over $140 billion), followed by hurricanes/typhoons. Droughts are the leading cause of crop damage, followed by floods and river floods.


Summary Table: Top 5 Events Overall

# Combined top 5 by total economic damage
top_total <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities        = sum(FATALITIES,  na.rm = TRUE),
    Injuries          = sum(INJURIES,    na.rm = TRUE),
    Property_Damage_B = round(sum(prop_damage, na.rm = TRUE) / 1e9, 2),
    Crop_Damage_B     = round(sum(crop_damage, na.rm = TRUE) / 1e9, 2)
  ) %>%
  mutate(Total_Econ_B = Property_Damage_B + Crop_Damage_B) %>%
  arrange(desc(Total_Econ_B)) %>%
  slice(1:5)

knitr::kable(top_total,
  caption   = "Top 5 Event Types by Total Economic Damage (Billions USD)",
  col.names = c("Event Type", "Fatalities", "Injuries",
                "Property Dmg (B$)", "Crop Dmg (B$)", "Total Econ (B$)"))
Top 5 Event Types by Total Economic Damage (Billions USD)

Event Type         Fatalities  Injuries  Property Dmg (B$)  Crop Dmg (B$)  Total Econ (B$)
FLOOD                     470      6789             144.66           5.66           150.32
HURRICANE/TYPHOON          64      1275              69.31           2.61            71.92
TORNADO                  5633     91346              56.95           0.41            57.36
STORM SURGE                13        38              43.32           0.00            43.32
HAIL                       15      1361              15.74           3.03            18.77