1 Synopsis

This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States from 1950 to November 2011. The analysis addresses two key questions: (1) which types of weather events are most harmful to population health, and (2) which types of events have the greatest economic consequences.

After loading the raw data from the bzip2-compressed CSV file, we cleaned and processed the event type labels, decoded the property and crop damage exponents, and aggregated totals by event type. Tornadoes were found to be overwhelmingly the most harmful event type with respect to fatalities and injuries. For economic consequences, floods caused the most total economic damage, followed by hurricanes/typhoons and tornadoes, with storm surges ranking fourth. These findings provide actionable insight for government and municipal managers seeking to prioritize emergency preparedness resources.


2 Data Processing

2.1 Loading Required Libraries

library(ggplot2)
library(dplyr)
library(tidyr)
library(scales)

2.2 Loading the Data

The data are loaded directly from the raw compressed .csv.bz2 file. The read.csv() function in R can read bzip2-compressed files natively, either through an explicit bzfile() connection, as done here, or from the file path alone.

# Load data directly from the bz2 file
storm_data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), 
                       header = TRUE, 
                       stringsAsFactors = FALSE)
dim(storm_data)
## [1] 902297     37
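
As a small self-contained check of this behaviour, the sketch below writes a tiny CSV through a bzip2 connection to a temporary file and reads it back both ways. The toy data frame and the temporary file are illustrative assumptions and are not part of the storm dataset.

```r
# Toy data standing in for a few storm records (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 0),
  stringsAsFactors = FALSE
)

# Write the CSV through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through an explicit bzfile() connection, as in the report
via_connection <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)

# Passing the path alone also works: the compression is detected on read
via_path <- read.csv(tmp, stringsAsFactors = FALSE)

# Both routes should yield the same data frame
same_result <- identical(via_connection, via_path)
same_result

unlink(tmp)
```

Either route should be usable for the real file; the explicit bzfile() call mainly documents the intent.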

2.3 Inspecting the Data

# View structure and key columns
str(storm_data[, c("EVTYPE", "FATALITIES", "INJURIES", 
                   "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...

2.4 Cleaning Event Types

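Event type labels in the raw data are inconsistent (mixed case, abbreviations, trailing spaces). We standardize case and whitespace so that labels differing only in those respects are grouped together. Abbreviations and near-synonyms are not merged, so labels such as TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT, remain separate event types in the tables below.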

# Standardize event type: trim whitespace, convert to uppercase
storm_data$EVTYPE_CLEAN <- trimws(toupper(storm_data$EVTYPE))
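
To make the effect concrete, the sketch below applies the same standardization to a few made-up labels. The final step, which folds TSTM WIND into THUNDERSTORM WIND through a hand-written `synonyms` lookup, is a hypothetical extension and is not applied anywhere in this report.

```r
# Made-up raw labels showing the kinds of inconsistency described above
raw_labels <- c("Tornado", " TSTM WIND", "tstm wind ", "THUNDERSTORM WIND", "Flood")

# Same standardization as in the report: uppercase, then trim whitespace
clean_labels <- trimws(toupper(raw_labels))
clean_labels

# Hypothetical extension (not used in this report): merge known synonyms
synonyms <- c("TSTM WIND" = "THUNDERSTORM WIND")
is_synonym <- clean_labels %in% names(synonyms)
merged_labels <- clean_labels
merged_labels[is_synonym] <- synonyms[clean_labels[is_synonym]]

# Count how many records fall under each label after merging
table(merged_labels)
```

Any such lookup would have to be curated by hand, which is why the report stops at case and whitespace.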

2.5 Decoding Damage Exponents

The PROPDMGEXP and CROPDMGEXP columns use codes to represent multipliers (e.g., H = hundreds, K = thousands, M = millions, B = billions). We convert these to numeric multipliers, treating a single digit d as 10^d and any blank or unrecognized code as a multiplier of 1.

# Function to convert exponent codes to numeric multipliers
decode_exp <- function(exp) {
  exp <- toupper(trimws(exp))
  case_when(
    exp == "B" ~ 1e9,
    exp == "M" ~ 1e6,
    exp == "K" ~ 1e3,
    exp == "H" ~ 1e2,
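    # case_when() evaluates every right-hand side for all rows, so silence the
    # coercion warnings that as.numeric() raises for non-digit codes such as "K"
    exp %in% as.character(0:9) ~ 10^suppressWarnings(as.numeric(exp)),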
    TRUE ~ 1
  )
}

# Apply multipliers
storm_data <- storm_data %>%
  mutate(
    PROP_MULT = decode_exp(PROPDMGEXP),
    CROP_MULT = decode_exp(CROPDMGEXP),
    PROP_DMG_USD = PROPDMG * PROP_MULT,
    CROP_DMG_USD = CROPDMG * CROP_MULT,
    TOTAL_DMG_USD = PROP_DMG_USD + CROP_DMG_USD
  )
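
The mapping can also be checked by hand on a few values. The sketch below is a base-R alternative that uses a named lookup vector instead of case_when(); the name `decode_exp_lookup` and the sample codes are illustrative assumptions, and the sketch is meant to mirror the rules above rather than replace the function used in the report.

```r
# Base-R sketch of the same rules: letters, digits 0-9, everything else -> 1
decode_exp_lookup <- function(exp) {
  exp <- toupper(trimws(exp))
  multipliers <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2,
                   setNames(10^(0:9), as.character(0:9)))
  out <- unname(multipliers[exp])
  out[is.na(out)] <- 1  # blank or unrecognized codes fall back to 1
  out
}

# Sample codes, including lowercase, blank, digit, and an unrecognized symbol
codes <- c("K", "m", "B", "", "5", "+", "h")
sample_mult <- decode_exp_lookup(codes)
sample_mult

# The first three PROPDMG values shown in section 2.3, all with exponent "K"
prop_usd <- c(25, 2.5, 25) * decode_exp_lookup(c("K", "K", "K"))
prop_usd
```

So a record with PROPDMG = 2.5 and PROPDMGEXP = "K" is counted as 2,500 USD of property damage.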

2.6 Aggregating Health Impact Data

# Sum fatalities and injuries by event type
health_impact <- storm_data %>%
  group_by(EVTYPE_CLEAN) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries   = sum(INJURIES,   na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(Total_Casualties = Fatalities + Injuries) %>%
  arrange(desc(Total_Casualties))

# Top 10 most harmful event types
top10_health <- head(health_impact, 10)
top10_health
## # A tibble: 10 × 4
##    EVTYPE_CLEAN      Fatalities Injuries Total_Casualties
##    <chr>                  <dbl>    <dbl>            <dbl>
##  1 TORNADO                 5633    91346            96979
##  2 EXCESSIVE HEAT          1903     6525             8428
##  3 TSTM WIND                504     6957             7461
##  4 FLOOD                    470     6789             7259
##  5 LIGHTNING                816     5230             6046
##  6 HEAT                     937     2100             3037
##  7 FLASH FLOOD              978     1777             2755
##  8 ICE STORM                 89     1975             2064
##  9 THUNDERSTORM WIND        133     1488             1621
## 10 WINTER STORM             206     1321             1527

2.7 Aggregating Economic Impact Data

# Sum total economic damage by event type
econ_impact <- storm_data %>%
  group_by(EVTYPE_CLEAN) %>%
  summarise(
    Property_Damage = sum(PROP_DMG_USD, na.rm = TRUE),
    Crop_Damage     = sum(CROP_DMG_USD, na.rm = TRUE),
    Total_Damage    = sum(TOTAL_DMG_USD, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Total_Damage))

# Top 10 most economically damaging event types
top10_econ <- head(econ_impact, 10)
top10_econ
## # A tibble: 10 × 4
##    EVTYPE_CLEAN      Property_Damage Crop_Damage  Total_Damage
##    <chr>                       <dbl>       <dbl>         <dbl>
##  1 FLOOD               144657709807   5661968450 150319678257 
##  2 HURRICANE/TYPHOON    69305840000   2607872800  71913712800 
##  3 TORNADO              56947380676.   414953270  57362333946.
##  4 STORM SURGE          43323536000         5000  43323541000 
##  5 HAIL                 15735267513.  3025954473  18761221986.
##  6 FLASH FLOOD          16822723978.  1421317100  18244041078.
##  7 DROUGHT               1046106000  13972566000  15018672000 
##  8 HURRICANE            11868319010   2741910000  14610229010 
##  9 RIVER FLOOD           5118945500   5029459000  10148404500 
## 10 ICE STORM             3944927860   5022113500   8967041360

3 Results

3.1 Question 1: Events Most Harmful to Population Health

The figure below shows the top 10 weather event types by total casualties (fatalities + injuries). The bars are stacked to show the proportion of fatalities versus injuries within each event type.

# Reshape for plotting
top10_health_long <- top10_health %>%
  select(EVTYPE_CLEAN, Fatalities, Injuries) %>%
  pivot_longer(cols = c(Fatalities, Injuries),
               names_to = "Type",
               values_to = "Count") %>%
  mutate(EVTYPE_CLEAN = factor(EVTYPE_CLEAN, 
                                levels = top10_health$EVTYPE_CLEAN[order(top10_health$Total_Casualties)]))

ggplot(top10_health_long, aes(x = EVTYPE_CLEAN, y = Count, fill = Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("Fatalities" = "#D7191C", "Injuries" = "#FDAE61"),
                    name = "Casualty Type") +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Top 10 Most Harmful Storm Events to Population Health (1950–2011)",
    subtitle = "Stacked total of fatalities and injuries per event type",
    x = "Event Type",
    y = "Number of Casualties"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    axis.text.y = element_text(size = 11)
  )
Figure 1: Top 10 Storm Event Types by Total Casualties (Fatalities + Injuries). Tornadoes dominate both fatalities and injuries by a wide margin, accounting for more casualties than all other event types combined.


Key Finding: Tornadoes are by far the most dangerous weather event for population health, with 5,633 fatalities and 91,346 injuries recorded between 1950 and 2011. By total casualties, excessive heat ranks second (8,428), followed by thunderstorm winds (TSTM WIND, 7,461) and floods (7,259). By fatalities alone, excessive heat is also second (1,903), ahead of flash floods (978) and heat (937).
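
The size of the gap can be worked out directly from the Total_Casualties column of the table in section 2.6. The sketch below re-enters those ten totals by hand, so it covers only the top 10 event types rather than the full dataset; the variable names are illustrative.

```r
# Total casualties for the top 10 event types, copied from section 2.6
casualties <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

# Tornado share of casualties within the top 10, in percent
tornado_share_pct <- 100 * casualties[["TORNADO"]] / sum(casualties)
round(tornado_share_pct, 1)

# Tornado casualties compared with the other nine event types combined
others_total <- sum(casualties[names(casualties) != "TORNADO"])
casualties[["TORNADO"]] / others_total
```

Within the top 10, tornadoes account for roughly 71% of casualties, more than twice the other nine event types combined.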


3.2 Question 2: Events with Greatest Economic Consequences

The figure below shows the top 10 weather event types by total economic damage (property damage + crop damage), with bars stacked to show the split between property and crop loss.

# Reshape for plotting
top10_econ_long <- top10_econ %>%
  select(EVTYPE_CLEAN, Property_Damage, Crop_Damage) %>%
  pivot_longer(cols = c(Property_Damage, Crop_Damage),
               names_to = "Type",
               values_to = "Damage_USD") %>%
  mutate(
    EVTYPE_CLEAN = factor(EVTYPE_CLEAN,
                          levels = top10_econ$EVTYPE_CLEAN[order(top10_econ$Total_Damage)]),
    Type = recode(Type, 
                  "Property_Damage" = "Property Damage", 
                  "Crop_Damage"     = "Crop Damage")
  )

ggplot(top10_econ_long, aes(x = EVTYPE_CLEAN, y = Damage_USD / 1e9, fill = Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("Property Damage" = "#2C7BB6", "Crop Damage" = "#ABD9E9"),
                    name = "Damage Type") +
  scale_y_continuous(labels = dollar_format(suffix = "B", prefix = "$")) +
  labs(
    title = "Top 10 Storm Events by Total Economic Damage (1950–2011)",
    subtitle = "Combined property and crop damage in billions of USD",
    x = "Event Type",
    y = "Total Economic Damage (Billions USD)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    axis.text.y = element_text(size = 11)
  )
Figure 2: Top 10 Storm Event Types by Total Economic Damage (USD). Floods cause the greatest total economic damage, followed by hurricanes/typhoons and tornado events. Drought has a proportionally larger share of crop damage.


Key Finding: Floods cause the greatest total economic damage, with approximately $150.3 billion in combined property and crop damage. Hurricanes/Typhoons and Tornadoes follow as the second and third most costly events, respectively. Notably, Drought stands out for its disproportionately high share of crop damage relative to property damage.
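
Because overlapping labels such as HURRICANE and HURRICANE/TYPHOON are counted separately, it is worth checking whether merging them would change the ranking. The sketch below uses the rounded totals (in billions of USD) from the top 10 only; the `group_map` lookup is a hypothetical grouping chosen for illustration and is not part of the analysis above.

```r
# Total damage in billions of USD for the top 10, rounded to two decimals
total_damage_b <- c(
  "FLOOD" = 150.32, "HURRICANE/TYPHOON" = 71.91, "TORNADO" = 57.36,
  "STORM SURGE" = 43.32, "HAIL" = 18.76, "FLASH FLOOD" = 18.24,
  "DROUGHT" = 15.02, "HURRICANE" = 14.61, "RIVER FLOOD" = 10.15,
  "ICE STORM" = 8.97
)

# Hypothetical grouping (assumption): fold two overlapping labels together
group_map <- c("HURRICANE" = "HURRICANE/TYPHOON", "RIVER FLOOD" = "FLOOD")
grouped_label <- names(total_damage_b)
to_merge <- grouped_label %in% names(group_map)
grouped_label[to_merge] <- group_map[grouped_label[to_merge]]

# Re-aggregate under the merged labels and rank from largest to smallest
merged_totals <- sort(tapply(total_damage_b, grouped_label, sum), decreasing = TRUE)
round(merged_totals, 2)
```

Under this grouping the totals for floods and hurricanes grow, but the order of the top three (floods, hurricanes/typhoons, tornadoes) stays the same.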


3.3 Summary Table

# Combined summary
cat("=== TOP 10 EVENTS: POPULATION HEALTH IMPACT ===\n")
## === TOP 10 EVENTS: POPULATION HEALTH IMPACT ===
print(top10_health[, c("EVTYPE_CLEAN", "Fatalities", "Injuries", "Total_Casualties")],
      row.names = FALSE)
## # A tibble: 10 × 4
##    EVTYPE_CLEAN      Fatalities Injuries Total_Casualties
##    <chr>                  <dbl>    <dbl>            <dbl>
##  1 TORNADO                 5633    91346            96979
##  2 EXCESSIVE HEAT          1903     6525             8428
##  3 TSTM WIND                504     6957             7461
##  4 FLOOD                    470     6789             7259
##  5 LIGHTNING                816     5230             6046
##  6 HEAT                     937     2100             3037
##  7 FLASH FLOOD              978     1777             2755
##  8 ICE STORM                 89     1975             2064
##  9 THUNDERSTORM WIND        133     1488             1621
## 10 WINTER STORM             206     1321             1527
cat("\n=== TOP 10 EVENTS: ECONOMIC IMPACT ===\n")
## 
## === TOP 10 EVENTS: ECONOMIC IMPACT ===
top10_econ_display <- top10_econ %>%
  mutate(
    Property_Damage_B = sprintf("$%.2fB", Property_Damage / 1e9),
    Crop_Damage_B     = sprintf("$%.2fB", Crop_Damage / 1e9),
    Total_Damage_B    = sprintf("$%.2fB", Total_Damage / 1e9)
  ) %>%
  select(EVTYPE_CLEAN, Property_Damage_B, Crop_Damage_B, Total_Damage_B)

print(top10_econ_display, row.names = FALSE)
## # A tibble: 10 × 4
##    EVTYPE_CLEAN      Property_Damage_B Crop_Damage_B Total_Damage_B
##    <chr>             <chr>             <chr>         <chr>         
##  1 FLOOD             $144.66B          $5.66B        $150.32B      
##  2 HURRICANE/TYPHOON $69.31B           $2.61B        $71.91B       
##  3 TORNADO           $56.95B           $0.41B        $57.36B       
##  4 STORM SURGE       $43.32B           $0.00B        $43.32B       
##  5 HAIL              $15.74B           $3.03B        $18.76B       
##  6 FLASH FLOOD       $16.82B           $1.42B        $18.24B       
##  7 DROUGHT           $1.05B            $13.97B       $15.02B       
##  8 HURRICANE         $11.87B           $2.74B        $14.61B       
##  9 RIVER FLOOD       $5.12B            $5.03B        $10.15B       
## 10 ICE STORM         $3.94B            $5.02B        $8.97B

4 Conclusion

This analysis of the NOAA Storm Database (1950–2011) reveals two clear priorities for emergency preparedness:

  1. Tornadoes pose the greatest threat to human life and safety, accounting for the most fatalities and injuries of any event type. Governments should invest in tornado detection, warning systems, and shelter infrastructure.

  2. Floods cause the largest economic losses in terms of combined property and crop damage. Flood management, levees, and crop insurance programs are therefore natural priorities for economic protection, although this analysis measures damage only and does not compare the cost-effectiveness of specific investments.

These findings are based on the full record of storm events from 1950 to November 2011 and provide a basis for resource allocation decisions by emergency management agencies.