Synopsis

This report looks at storm data from the National Oceanic and Atmospheric Administration (NOAA). I want to find out which types of severe weather cause the most harm to people (injuries and deaths) and which cause the most financial damage (property and crop losses).

The data includes events from 1950 to 2011. I’ll use R to read the raw data file, clean it up, and then create graphs to help answer these questions.

Data Processing

# These packages help us with data, plotting, and reading files
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(readr)
## Warning: package 'readr' was built under R version 4.3.3
# Load the dataset (make sure it's in the same folder as this file)
storm_data <- read.csv("repdata_data_StormData.csv.bz2")

# Take a quick look at the structure of the data
str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
# Keep only relevant columns
clean_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES,
         PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# Convert exponents for damage columns
exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9,
             "k" = 1e3, "m" = 1e6, "b" = 1e9)

# Look up a numeric multiplier for each exponent code.
# Any code not in exp_map (including blanks) falls back to a multiplier of 1.
clean_data <- clean_data %>%
  mutate(
    prop_mult = exp_map[as.character(PROPDMGEXP)],
    crop_mult = exp_map[as.character(CROPDMGEXP)],
    prop_mult = ifelse(is.na(prop_mult), 1, prop_mult),
    crop_mult = ifelse(is.na(crop_mult), 1, crop_mult),
    property_damage = PROPDMG * prop_mult,
    crop_damage = CROPDMG * crop_mult
  )
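
To make the exponent conversion concrete, here is a minimal sketch of the same lookup on a few toy rows. The values in `toy` are invented for illustration and are not taken from the NOAA file, and the sketch uses base R instead of dplyr so that it runs on its own.

```r
# Toy rows only: values are invented for illustration, not taken from the NOAA file
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10, 5),
  PROPDMGEXP = c("K", "M", "B", "", "k"),
  stringsAsFactors = FALSE
)

# Same mapping as in the report
exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9,
             "k" = 1e3, "m" = 1e6, "b" = 1e9)

# Look up each code by name; unname() drops the names the lookup carries along
mult <- unname(exp_map[toy$PROPDMGEXP])

# Codes missing from the map (here the empty string) come back as NA; use 1 instead
mult[is.na(mult)] <- 1

# Damage in dollars = reported figure times its multiplier
toy$property_damage <- toy$PROPDMG * mult
print(toy)
```

In this sketch a reported 25 with code "K" becomes 25,000 dollars, while a row with a blank code keeps its reported figure unchanged.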

# Summarize total fatalities and injuries by event type
health_impact <- clean_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    total_impact = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_impact)) %>%
  slice_head(n = 10)

# Plot top 10 most harmful events
ggplot(health_impact, aes(x = reorder(EVTYPE, -total_impact), y = total_impact)) +
  geom_col(fill = "tomato") +
  labs(title = "Top 10 Most Harmful Weather Events to Population Health",
       x = "Event Type", y = "Total Fatalities + Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Summarize total property + crop damage by event type
economic_impact <- clean_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_damage = sum(property_damage + crop_damage, na.rm = TRUE)
  ) %>%
  arrange(desc(total_damage)) %>%
  slice_head(n = 10)

# Plot top 10 events by economic damage
ggplot(economic_impact, aes(x = reorder(EVTYPE, -total_damage), y = total_damage / 1e9)) +
  geom_col(fill = "steelblue") +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type", y = "Total Damage (in Billions USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Results

Based on the analysis, tornadoes are by far the most harmful weather event for public health. The first plot shows that tornadoes have caused the highest combined number of injuries and fatalities. Other harmful events include excessive heat, floods, and lightning.

In terms of economic consequences, floods have caused the most total damage, followed by hurricanes/typhoons, tornadoes, and storm surges. The second plot shows the top 10 events by combined property and crop damage. These findings can help emergency planners see which types of events have historically had the biggest impact on both people and infrastructure.
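
Both rankings group on the raw EVTYPE label, so the totals depend on how consistently that label was entered. The sketch below is not part of the analysis above. It assumes, for illustration only, that the same event could appear under labels that differ in case or surrounding whitespace, and shows how trimming and upper-casing before grouping would merge them. The `events` data frame and the `EVTYPE_clean` column are hypothetical.

```r
# Hypothetical labels, invented to show the effect; not a sample of the real file
events <- data.frame(
  EVTYPE     = c("TORNADO", " tornado", "Tornado ", "FLOOD", "flood"),
  FATALITIES = c(3, 1, 2, 4, 1),
  stringsAsFactors = FALSE
)

# Grouping on the raw label treats each spelling as its own event type
raw_totals <- aggregate(FATALITIES ~ EVTYPE, data = events, FUN = sum)

# Trim surrounding whitespace and upper-case the label before grouping
events$EVTYPE_clean <- toupper(trimws(events$EVTYPE))
clean_totals <- aggregate(FATALITIES ~ EVTYPE_clean, data = events, FUN = sum)

print(raw_totals)
print(clean_totals)
```

Whether a step like this would change the top 10 in the real data is something to check, for example by comparing the number of distinct labels before and after cleaning.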