Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which severe weather events cause the most harm to population health and the economy. The data spans from 1950 to 2011.

Our analysis shows that tornadoes are by far the most harmful event type for population health, causing the most fatalities and injuries combined. For economic impact, floods cause the greatest combined property and crop damage.

Data Processing

First, we load the required libraries and download the dataset directly from the course website.

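# Load the required libraries
library(dplyr)
library(ggplot2)

# Download the data once
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "StormData.csv.bz2"

if (!file.exists(dest_file)) {
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(file_url, dest_file, method = "auto", mode = "wb")
}

# read.csv() reads the bzip2-compressed file directly
storm_data <- read.csv(dest_file)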
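Because read.csv() opens files through R's file() connection, which detects bzip2 compression when reading, no separate decompression step is needed. The minimal sketch below checks this on a tiny made-up file written to a temporary path; the values are toy data, not NOAA records.

```r
# Minimal sketch: read.csv() reads a bzip2-compressed CSV without an
# explicit decompression step. The file contents are toy data.
toy_path <- tempfile(fileext = ".csv.bz2")

# Write a tiny CSV through a bzip2 connection
con <- bzfile(toy_path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# Read it back; file() detects the compression when opening for reading
toy <- read.csv(toy_path)
print(toy)
```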

Cleaning and Transforming the Data

To answer our questions, we only need a subset of the columns: EVTYPE (Event Type), FATALITIES, INJURIES, PROPDMG (Property Damage), PROPDMGEXP (Property Damage Exponent), CROPDMG (Crop Damage), and CROPDMGEXP (Crop Damage Exponent).
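Since the full file is large, an optional alternative is to skip the unneeded columns while reading. This is not the method used in this analysis, which reads everything and then calls select(). The sketch below uses a short inline CSV in place of the real file, with two extra columns (STATE and REMARKS) standing in for the rest: read.csv() drops any column whose colClasses entry is "NULL", and an NA entry lets R infer the type as usual. On the real data, the header would come from read.csv(dest_file, nrows = 1).

```r
# Hypothetical sketch: drop unneeded columns at read time via colClasses.
# csv_text is toy data standing in for StormData.csv.bz2.
csv_text <- "STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,REMARKS
AL,TORNADO,0,15,25,K,0,,none
TX,FLOOD,1,0,2.5,M,10,K,none"

keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Read one row just to learn the column names
all_cols <- names(read.csv(text = csv_text, nrows = 1))

# NA = let R guess the column type; "NULL" = skip the column
col_classes <- ifelse(all_cols %in% keep, NA, "NULL")

subset_data <- read.csv(text = csv_text, colClasses = col_classes)
str(subset_data)
```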

The damage exponent columns (PROPDMGEXP and CROPDMGEXP) use letter codes: K for thousands, M for millions, and B for billions. We convert these codes into numeric multipliers so that damage can be expressed in dollars. Any other code is treated as a multiplier of 1.
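The conversion is easy to check by hand: a PROPDMG value of 25 with exponent K is 25 * 1,000 = 25,000 dollars. The sketch below expresses the same mapping as a named lookup vector in base R. exp_to_mult() is a hypothetical helper, and like the case_when() code it treats every code other than K, M, or B as a multiplier of 1.

```r
# Hypothetical helper: map exponent codes to dollar multipliers.
# K, M and B are matched case-insensitively; anything else becomes 1,
# mirroring the TRUE ~ 1 fallback in the case_when() version.
exp_to_mult <- function(exp_codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(exp_codes)])
  mult[is.na(mult)] <- 1  # unmatched or missing codes fall through to 1
  mult
}

# Worked example on toy amounts and codes
amounts <- c(25, 2.5, 1.2, 50)
codes   <- c("K", "m", "B", "")
dollars <- amounts * exp_to_mult(codes)
print(dollars)
```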

clean_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(
    # Standardize exponents to uppercase
    PROPDMGEXP = toupper(PROPDMGEXP),
    CROPDMGEXP = toupper(CROPDMGEXP),
    
    # Convert characters to numerical multipliers
    prop_mult = case_when(
      PROPDMGEXP == "K" ~ 1000,
      PROPDMGEXP == "M" ~ 1000000,
      PROPDMGEXP == "B" ~ 1000000000,
      TRUE ~ 1
    ),
    crop_mult = case_when(
      CROPDMGEXP == "K" ~ 1000,
      CROPDMGEXP == "M" ~ 1000000,
      CROPDMGEXP == "B" ~ 1000000000,
      TRUE ~ 1
    ),
    
    # Calculate actual dollar amounts
    TotalPropDmg = PROPDMG * prop_mult,
    TotalCropDmg = CROPDMG * crop_mult,
    TotalEconDmg = TotalPropDmg + TotalCropDmg,
    
    # Calculate total health impact
    TotalHealthImpact = FATALITIES + INJURIES
  )
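Because every unrecognized exponent silently receives a multiplier of 1, it is worth listing which codes fell through. The sketch below uses a toy vector in place of clean_data$PROPDMGEXP (an assumption for illustration); on the real data, the same expression would be applied to that column after the toupper() step.

```r
# Hypothetical check: which exponent codes were not K, M or B and therefore
# got a multiplier of 1? prop_exp is toy data, not the NOAA column.
prop_exp <- toupper(c("K", "M", "", "H", "K", "5", "B", ""))

unmatched <- prop_exp[!prop_exp %in% c("K", "M", "B")]
print(table(unmatched))
```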

Results

1. Most Harmful Events to Population Health

We aggregate the total fatalities and injuries by event type and identify the top 10 most harmful events.
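Grouping by the raw EVTYPE string assumes each event type is spelled one way; labels that differ only in capitalization or surrounding whitespace end up in separate groups. The minimal sketch below, which is not part of the analysis above, normalizes labels before summing. It assumes case and whitespace are the only differences; synonyms or abbreviations would need a separate mapping. The labels and counts are toy data.

```r
# Minimal sketch: normalize event labels before aggregating.
# Only case and leading/trailing whitespace are handled here.
evtype  <- c("TORNADO", "Tornado", " TORNADO", "FLOOD", "flood ")
victims <- c(10, 5, 1, 2, 3)

evtype_clean <- toupper(trimws(evtype))

# Sum casualties per cleaned label, largest first
totals <- sort(vapply(split(victims, evtype_clean), sum, numeric(1)),
               decreasing = TRUE)
print(totals)
```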

health_data <- clean_data %>%
  group_by(EVTYPE) %>%
  summarize(Total_Casualties = sum(TotalHealthImpact, na.rm = TRUE)) %>%
  arrange(desc(Total_Casualties)) %>%
  slice(1:10)

# Plot the health data
ggplot(health_data, aes(x = reorder(EVTYPE, Total_Casualties), y = Total_Casualties)) +
  geom_bar(stat = "identity", fill = "darkred") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Population Health Impact",
       x = "Event Type",
       y = "Total Casualties (Fatalities + Injuries)") +
  theme_minimal()

As the chart shows, tornadoes cause by far the most casualties of any event type.
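Summing fatalities and injuries counts one death the same as one injury, so the combined ranking can hide events that are especially deadly. The hedged sketch below ranks event types by each measure separately using base R's rowsum(); the rows are toy data standing in for clean_data, chosen so that the two rankings differ.

```r
# Hypothetical sketch: rank event types by fatalities and by injuries
# separately. toy_health is made-up data, not NOAA records.
toy_health <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HEAT", "LIGHTNING"),
  FATALITIES = c(2, 5, 1, 4, 1),
  INJURIES   = c(40, 3, 20, 2, 6)
)

# rowsum() adds up each column within each EVTYPE group
by_event <- rowsum(toy_health[, c("FATALITIES", "INJURIES")], toy_health$EVTYPE)

top_fatal  <- rownames(by_event)[order(by_event$FATALITIES, decreasing = TRUE)]
top_injury <- rownames(by_event)[order(by_event$INJURIES,   decreasing = TRUE)]
print(top_fatal)
print(top_injury)
```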

2. Events with the Greatest Economic Consequences

We aggregate the combined property and crop damage by event type to find the top 10 most economically damaging events.

econ_data <- clean_data %>%
  group_by(EVTYPE) %>%
  summarize(Total_Damage = sum(TotalEconDmg, na.rm = TRUE)) %>%
  arrange(desc(Total_Damage)) %>%
  slice(1:10)

# Plot the economic data
ggplot(econ_data, aes(x = reorder(EVTYPE, Total_Damage), y = Total_Damage)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Economic Impact",
       x = "Event Type",
       y = "Total Economic Damage (in US Dollars)") +
  theme_minimal()

As the chart shows, floods have caused the greatest economic damage, followed by hurricanes/typhoons and tornadoes.
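The economic ranking combines property and crop damage, so it does not show which component drives each total. The sketch below splits each event type's total into its property and crop parts and computes the crop share. The dollar values are toy numbers, not NOAA results, and the column names simply mirror TotalPropDmg and TotalCropDmg from the cleaning step.

```r
# Hypothetical sketch: break total damage into property and crop parts.
# toy_econ is made-up data chosen so one event is crop-dominated.
toy_econ <- data.frame(
  EVTYPE       = c("FLOOD", "DROUGHT", "FLOOD", "DROUGHT"),
  TotalPropDmg = c(5e9, 1e6, 2e9, 0),
  TotalCropDmg = c(1e8, 3e9, 0, 2e9)
)

# Sum both damage columns within each event type
econ_split <- rowsum(toy_econ[, c("TotalPropDmg", "TotalCropDmg")], toy_econ$EVTYPE)

# Fraction of each event type's total damage that came from crops
econ_split$CropShare <- econ_split$TotalCropDmg /
  (econ_split$TotalPropDmg + econ_split$TotalCropDmg)
print(econ_split)
```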