Synopsis

This analysis explores the NOAA Storm Database to determine which weather events are most harmful to public health and which have the greatest economic consequences in the United States. The data cover the period from 1950 to November 2011. I found that tornadoes are the most harmful events with respect to population health, in both fatalities and injuries. In terms of economic consequences, floods cause the greatest property damage, while droughts cause the most crop damage. When property and crop damage are combined, floods remain the leading cause of economic loss.

Data Processing

1. Loading the Data

The data come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm. We start by loading the necessary libraries and reading the raw CSV file; read.csv() decompresses the .bz2 file transparently. I use the knitr chunk option cache=TRUE to speed up processing on subsequent runs.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr) # For reshaping data for plotting

# Read the data directly from the bz2 file
zipFile <- "repdata_data_StormData.csv.bz2"
storm_data <- read.csv(zipFile)

# Preview the data structure
dim(storm_data)
## [1] 902297     37
head(storm_data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP")])
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1 TORNADO          0       15    25.0          K
## 2 TORNADO          0        0     2.5          K
## 3 TORNADO          0        2    25.0          K
## 4 TORNADO          0        2     2.5          K
## 5 TORNADO          0        2     2.5          K
## 6 TORNADO          0        6     2.5          K
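
As a minimal sketch of why no explicit decompression step is needed, the round trip below writes a small CSV through a bzip2 connection and reads it back with a plain read.csv() call. The toy data frame and the temporary file are hypothetical stand-ins for the real storm file, not part of the original analysis.

```r
# Hypothetical toy data standing in for the real storm file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(0, 2),
                  PROPDMG = c(25.0, 2.5),
                  PROPDMGEXP = c("K", "M"))

# Write the toy data through a bzip2-compressed text connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() is given only the path; the compression is detected on read
toy_back <- read.csv(tmp)
dim(toy_back)
```

If reading the full file is slow, the colClasses argument of read.csv() can help, though the report does not use it.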

2. Data Subsetting and Cleaning

The dataset contains 37 variables, but we only need those related to health (FATALITIES, INJURIES), economic damage (PROPDMG, CROPDMG, and their magnitude codes PROPDMGEXP and CROPDMGEXP), and the event type (EVTYPE).

# Select only relevant columns
df <- storm_data %>% 
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# Clean EVTYPE: Convert to uppercase to handle inconsistencies (e.g., "Flood" vs "FLOOD")
df$EVTYPE <- toupper(df$EVTYPE)
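
The effect of this normalization can be sketched on a toy vector. The values below are illustrative, and trimws() is an extra step assumed here for stray whitespace; the report itself only applies toupper(). Note that uppercasing merges case variants only: differently worded labels such as TSTM WIND and THUNDERSTORM WIND remain separate categories.

```r
# Illustrative raw labels (hypothetical, not copied from the dataset)
evtype_raw <- c("Flood", "FLOOD", " flood ", "Tstm Wind", "THUNDERSTORM WIND")

# Uppercase, then strip leading/trailing whitespace (trimws is an added assumption)
evtype_clean <- trimws(toupper(evtype_raw))

# Count how many raw labels collapse into each cleaned label
table(evtype_clean)
```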

3. Handling Economic Data

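The variables PROPDMGEXP and CROPDMGEXP represent the magnitude of the damage (e.g., “K” for thousands, “M” for millions, “B” for billions). We need to convert these characters into numeric multipliers. In the mapping below, “H” stands for hundreds, digits are treated as a multiplier of 10, “+” as 1, and empty or unrecognized codes as 0, so those records contribute no damage to the totals.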

# Function to map exponents to numeric values
get_multiplier <- function(exp) {
    exp <- toupper(exp)
    if (exp == "H") return(10^2)
    if (exp == "K") return(10^3)
    if (exp == "M") return(10^6)
    if (exp == "B") return(10^9)
    if (exp %in% c("0", "1", "2", "3", "4", "5", "6", "7", "8")) return(10^1) # Treat numbers as 10
    if (exp == "+") return(1)
    return(0) # Ignore empty or invalid characters
}

# Apply the function to create new numeric columns for actual damage cost
# Note: This step might take a moment due to dataset size
df$PROP_MULTIPLIER <- sapply(df$PROPDMGEXP, get_multiplier)
df$CROP_MULTIPLIER <- sapply(df$CROPDMGEXP, get_multiplier)

df$PROP_COST <- df$PROPDMG * df$PROP_MULTIPLIER
df$CROP_COST <- df$CROPDMG * df$CROP_MULTIPLIER
df$TOTAL_COST <- df$PROP_COST + df$CROP_COST
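
Because sapply() calls the function once per row, the step above can be slow on roughly 900,000 records. A possible alternative, sketched here with the same mapping rules, is a named lookup vector that converts the whole column at once. The names multipliers and get_multiplier_vec are hypothetical and not part of the original code.

```r
# Named lookup vector mirroring the rules in get_multiplier()
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "+" = 1)
multipliers[as.character(0:8)] <- 10  # digits 0-8 are treated as 10

# Vectorized version: look up every code at once, default to 0 when unmatched
get_multiplier_vec <- function(exp) {
  out <- unname(multipliers[toupper(exp)])
  out[is.na(out)] <- 0  # empty or invalid codes contribute no damage
  out
}

# Worked example: the first previewed record has PROPDMG = 25.0 and PROPDMGEXP = "K"
25.0 * get_multiplier_vec("K")  # 25000 USD
```

With this helper, a call such as get_multiplier_vec(df$PROPDMGEXP) would be expected to give the same multipliers as the sapply() version.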

Results

1. Across the United States, which types of events are most harmful with respect to population health?

We aggregate the total number of fatalities and injuries by event type and keep the 10 event types with the highest combined count.

health_impact <- df %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES),
            Injuries = sum(INJURIES)) %>%
  arrange(desc(Fatalities + Injuries)) %>%
  slice(1:10)

# Reshape for plotting (Wide to Long format)
health_long <- health_impact %>%
  pivot_longer(cols = c(Fatalities, Injuries), names_to = "Type", values_to = "Count")

print(health_impact)
## # A tibble: 10 × 3
##    EVTYPE            Fatalities Injuries
##    <chr>                  <dbl>    <dbl>
##  1 TORNADO                 5633    91346
##  2 EXCESSIVE HEAT          1903     6525
##  3 TSTM WIND                504     6957
##  4 FLOOD                    470     6789
##  5 LIGHTNING                816     5230
##  6 HEAT                     937     2100
##  7 FLASH FLOOD              978     1777
##  8 ICE STORM                 89     1975
##  9 THUNDERSTORM WIND        133     1488
## 10 WINTER STORM             206     1321
ggplot(health_long, aes(x = reorder(EVTYPE, Count), y = Count, fill = Type)) +
  geom_bar(stat = "identity", position = "dodge") + # Side by side bars
  coord_flip() + # Horizontal bars for better readability
  labs(title = "Top 10 Weather Events Most Harmful to Population Health",
       x = "Event Type",
       y = "Number of People Affected",
       fill = "Health Impact") +
  theme_minimal()

Figure 1: This chart shows the top 10 weather events causing the most fatalities and injuries. Tornadoes are by far the most dangerous event type to population health.
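
The aggregate-then-reshape pattern behind Figure 1 can be illustrated on a toy table. The data below are made up for the example; only the column names follow the report. The long format puts fatalities and injuries in a single Count column, which is what lets ggplot2 draw them as side-by-side bars keyed by Type.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy records, not taken from the storm database
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(1, 2, 3, 4),
                  INJURIES = c(10, 0, 5, 1))

# Wide format: one row per event type, one column per measure
toy_wide <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES)) %>%
  arrange(desc(Fatalities + Injuries))

# Long format: one row per (event type, measure) pair
toy_long <- toy_wide %>%
  pivot_longer(cols = c(Fatalities, Injuries), names_to = "Type", values_to = "Count")

print(toy_long)
```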

2. Across the United States, which types of events have the greatest economic consequences?

We aggregate the total economic damage (property plus crop) by event type and keep the 10 event types with the highest combined cost.

econ_impact <- df %>%
  group_by(EVTYPE) %>%
  summarise(Property = sum(PROP_COST),
            Crop = sum(CROP_COST),
            Total = sum(TOTAL_COST)) %>%
  arrange(desc(Total)) %>%
  slice(1:10)

# Reshape for plotting
econ_long <- econ_impact %>%
  select(EVTYPE, Property, Crop) %>%
  pivot_longer(cols = c(Property, Crop), names_to = "Damage_Type", values_to = "Cost")

print(econ_impact)
## # A tibble: 10 × 4
##    EVTYPE                Property        Crop        Total
##    <chr>                    <dbl>       <dbl>        <dbl>
##  1 FLOOD             144657709800  5661968450 150319678250
##  2 HURRICANE/TYPHOON  69305840000  2607872800  71913712800
##  3 TORNADO            56937162897   414954710  57352117607
##  4 STORM SURGE        43323536000        5000  43323541000
##  5 HAIL               15732269877  3025954650  18758224527
##  6 FLASH FLOOD        16140815011  1421317100  17562132111
##  7 DROUGHT             1046106000 13972566000  15018672000
##  8 HURRICANE          11868319010  2741910000  14610229010
##  9 RIVER FLOOD         5118945500  5029459000  10148404500
## 10 ICE STORM           3944928310  5022113500   8967041810
ggplot(econ_long, aes(x = reorder(EVTYPE, Cost), y = Cost / 10^9, fill = Damage_Type)) +
  geom_bar(stat = "identity") + # Stacked bars
  coord_flip() +
  scale_fill_manual(values = c("Property" = "steelblue", "Crop" = "darkgreen")) +
  labs(title = "Top 10 Weather Events with Greatest Economic Consequences",
       x = "Event Type",
       y = "Total Economic Damage (Billions of USD)",
       fill = "Damage Type") +
  theme_minimal()

Figure 2: This chart shows the top 10 weather events causing the most economic damage. Floods are the primary cause of economic loss, driven largely by property damage. Droughts are a significant contributor to crop damage.
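
To put rough numbers on the caption, the sketch below recomputes two shares from the totals printed in the table above: the property share of flood damage and the crop share of drought damage. The figures are copied from that table; the variable names are hypothetical.

```r
# Totals in USD, copied from the printed econ_impact table
property <- c(FLOOD = 144657709800, DROUGHT = 1046106000)
crop     <- c(FLOOD = 5661968450,   DROUGHT = 13972566000)
total    <- property + crop

# Share of each event's total damage that falls on property and on crops
property_share <- property / total
crop_share     <- crop / total

round(property_share["FLOOD"], 3)  # roughly 96% of flood damage is property damage
round(crop_share["DROUGHT"], 3)    # roughly 93% of drought damage is crop damage
```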