Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Main Questions

The primary questions being asked in this analysis are: What types of storms are the most damaging to human life? What types of storms are the most damaging to property?

Abstract

Tornadoes are perhaps the most dramatic natural disaster in America, so it is no surprise that they are the most damaging to both human life and property. Of the 4 tables and figures below, tornadoes appear prominently in 3. Beyond tornadoes, the event types that cause the most fatalities differ from those that cause the most property damage: heat-related events cause many deaths both in total and on average, while water- and wind-related events cause the most property damage.

Session Information

sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.0
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.50        
##  [5] cachem_1.1.0      knitr_1.49        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.4.2    rstudioapi_0.17.1 tools_4.4.2       evaluate_1.0.1   
## [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9

Libraries Used

knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Data Reading

## Check whether the storm data file exists and download it if it does not.
## The download is a bzip2-compressed CSV; read.csv() decompresses it on read.
stormDataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
stormFileName <- "./myStormData.csv"
if (!file.exists(stormFileName)) {
  download.file(stormDataURL, stormFileName, mode = "wb")  # binary mode keeps the archive intact on Windows
}
## Read the file only if it is not already in memory (saves time on subsequent runs)
if (!exists("myStormData")) {
  myStormData <- read.csv(stormFileName)
}
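
The downloaded file is a bzip2-compressed CSV, and `read.csv()` can read it without a separate decompression step because R's file connections detect the compression when a file is opened for reading. A minimal sketch with a small made-up file (the temporary path and the two data rows are illustrative, not taken from the storm data):

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HEAT,1"), con)
close(con)

# read.csv() opens the path through file(), which detects the compression.
toy <- read.csv(tmp)
print(toy)
```

The same call works whatever the file extension is, since detection is based on the file contents rather than its name.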

Data Processing

I added a column that isolates the year component of the date, which makes plotting and analysis easier.

##adding a year column to make plotting easier
myStormData$YEAR<- format(as.POSIXct(myStormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "" ), "%Y")
myStormData$YEAR <- as.numeric(myStormData$YEAR)
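
The year extraction can be checked on a couple of sample strings before running it on the full column. This sketch assumes dates in the same "%m/%d/%Y %H:%M:%S" layout used above. The two values are illustrative, and the time zone is fixed to UTC only to make the example reproducible:

```r
# Sample date strings in the layout used for BGN_DATE (illustrative values).
dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse the strings, format them as a four-digit year, and convert to numeric.
parsed <- as.POSIXct(dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
years <- as.numeric(format(parsed, "%Y"))
print(years)
```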

Results

What storms are most damaging to human life?

##getting average fatalities for event type
myFatalStorms <- aggregate(x=myStormData$FATALITIES, by = list(myStormData$EVTYPE), FUN=mean)
myFatalStorms <- myFatalStorms[order(myFatalStorms$x, decreasing=TRUE),]
colnames(myFatalStorms) <- c("stormType","AvgFatalities")
print(head(myFatalStorms))
##                      stormType AvgFatalities
## 842 TORNADOES, TSTM WIND, HAIL     25.000000
## 72               COLD AND SNOW     14.000000
## 851      TROPICAL STORM GORDON      8.000000
## 580      RECORD/EXCESSIVE HEAT      5.666667
## 142               EXTREME HEAT      4.363636
## 279          HEAT WAVE DROUGHT      4.000000

Here we can see that, in terms of average fatalities, a combined tornado event (TORNADOES, TSTM WIND, HAIL) tops the list, while heat-related events are unexpectedly well represented, with 3 of the top 6.
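
An average per event type can be driven by labels that occur only once or twice in the data. A small sketch with made-up numbers shows how reporting the record count next to the mean makes this visible (the data frame `toy`, its labels, and its values are hypothetical):

```r
# Hypothetical records: one rare, deadly label and one common label.
toy <- data.frame(
  EVTYPE     = c("RARE COMBINED EVENT", "HEAT", "HEAT", "HEAT", "HEAT"),
  FATALITIES = c(25, 0, 2, 1, 5)
)

# Mean fatalities and number of records per event type, in base R.
avg <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = mean)
n   <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = length)
out <- merge(avg, n, by = "EVTYPE", suffixes = c("_mean", "_n"))

# Sort by the mean, highest first.
out <- out[order(out$FATALITIES_mean, decreasing = TRUE), ]
print(out)
```

A label with a high mean but a count of 1 reflects a single event rather than a typical outcome for that kind of storm.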

Plotting damage to human life

myFatalStorms_summary <- myStormData %>%
  group_by(EVTYPE) %>%
  summarize(Total_Sum = sum(FATALITIES), .groups = "drop") %>%
  arrange(desc(Total_Sum)) %>%  # Sort by total sum in descending order
  slice(1:5) %>% # Keep only the top 5 categories
  left_join(myStormData, by = "EVTYPE") %>% # Join back with original data to get YEAR and FATALITIES
  group_by(YEAR, EVTYPE) %>%
  summarize(Sum_Value = sum(FATALITIES), .groups = "drop") %>% # Sum per-event fatalities, not the repeated overall total
  arrange(YEAR, desc(Sum_Value)) # Sort by year and then within year by sum value

fatalPlot <- ggplot(myFatalStorms_summary, aes(x = YEAR, y = Sum_Value, fill = EVTYPE)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ EVTYPE, ncol = 1, scales = "free_y") +  # trailing + so the layers below are applied
  labs(title = "Sum of Fatalities by Year (Top 5 EVTYPEs)",
       x = "Year",
       y = "Sum of Fatalities",
       fill = "Event Type") +
  theme_bw() +
  ## one x scale, with a tick for each year present in the summary
  scale_x_continuous(breaks = unique(myFatalStorms_summary$YEAR))
print(fatalPlot)

Here we can see the event types most damaging to human life in total. Once again, heat-related events are well represented, taking 2 of the top 5 spots.
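
The plot relies on a two-step summary: rank event types by total fatalities, then sum fatalities per year for the top types only. A minimal sketch on made-up data, using `semi_join()` as one possible way to keep only the top types (the toy values and the choice of a top 2 are hypothetical):

```r
library(dplyr)

# Hypothetical records with the same column names as the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HAIL"),
  YEAR       = c(1995, 1996, 1995, 1995, 1996),
  FATALITIES = c(3, 5, 2, 4, 1)
)

# Step 1: rank event types by total fatalities and keep the top 2.
top_types <- toy %>%
  group_by(EVTYPE) %>%
  summarize(Total_Sum = sum(FATALITIES), .groups = "drop") %>%
  arrange(desc(Total_Sum)) %>%
  slice(1:2)

# Step 2: yearly fatalities for those event types only.
yearly <- toy %>%
  semi_join(top_types, by = "EVTYPE") %>%
  group_by(YEAR, EVTYPE) %>%
  summarize(Sum_Value = sum(FATALITIES), .groups = "drop")
print(yearly)
```

Because `semi_join()` only filters rows and adds no columns, the yearly sum is taken over the per-event `FATALITIES` values rather than over a repeated overall total.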

What storms are most damaging to property on average?

##economic Damage
myExpensiveStorms <- aggregate(x=myStormData$PROPDMG, by = list(myStormData$EVTYPE), FUN=mean)
myExpensiveStorms <- myExpensiveStorms[order(myExpensiveStorms$x, decreasing=TRUE),]
colnames(myExpensiveStorms) <- c("stormType","AvgPropertyDamage")
print(head(myExpensiveStorms))
##                  stormType AvgPropertyDamage
## 52         COASTAL EROSION               766
## 291   HEAVY RAIN AND FLOOD               600
## 589 RIVER AND STREAM FLOOD               600
## 445              Landslump               570
## 38   BLIZZARD/WINTER STORM               500
## 158           FLASH FLOOD/               500

Here we see that, in terms of average property damage, water-related events are king, with floods taking 3 of the top 6 spots and coastal erosion taking another. These averages are computed from the raw PROPDMG column as recorded.
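
In the NOAA storm data, `PROPDMG` is paired with a `PROPDMGEXP` column holding a magnitude code such as K, M, or B, so raw `PROPDMG` values are not directly comparable as dollar amounts. A hedged sketch of one way to convert them, assuming that column is present and treating any other code as a multiplier of 1 (a simplification made for this sketch, not a rule from the data documentation):

```r
# Hypothetical records with a damage value and a magnitude code.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", "")
)

# Lookup table: thousands, millions, billions.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Match codes case-insensitively; unmatched codes fall back to 1 (assumption).
m <- multiplier[toupper(toy$PROPDMGEXP)]
m[is.na(m)] <- 1
toy$PROPDMG_USD <- toy$PROPDMG * unname(m)
print(toy)
```

Averaging a converted column like the hypothetical `PROPDMG_USD` instead of raw `PROPDMG` could change which event types rank highest.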

Plotting storms causing the most damage to property

myExpensiveStorms_summary <- myStormData %>%
  group_by(EVTYPE) %>%
  summarize(Total_Sum = sum(PROPDMG), .groups = "drop") %>%
  arrange(desc(Total_Sum)) %>%  # Sort by total sum in descending order
  slice(1:5) %>% # Keep only the top 5 categories
  left_join(myStormData, by = "EVTYPE") %>% # Join back with original data to get YEAR and PROPDMG
  group_by(YEAR, EVTYPE) %>%
  summarize(Sum_Value = sum(PROPDMG), .groups = "drop") %>% # Sum per-event damage, not the repeated overall total
  arrange(YEAR, desc(Sum_Value)) # Sort by year and then within year by sum value

propertyPlot <- ggplot(myExpensiveStorms_summary, aes(x = YEAR, y = Sum_Value, fill = EVTYPE)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ EVTYPE, ncol = 1, scales = "free_y") +  # trailing + so the layers below are applied
  labs(title = "Sum of Property Damage by Year (Top 5 EVTYPEs)",
       x = "Year",
       y = "Sum of Property Damage",
       fill = "Event Type") +
  theme_bw() +
  scale_x_continuous(breaks = unique(myExpensiveStorms_summary$YEAR))
print(propertyPlot)

Here we see that the top 5 event types most damaging to property in total are slightly different from the top 5 most damaging to human life. While heat occupied 2 of the top 5 spots in the fatalities plot, flooding occupies 2 of the top 5 spots in the property damage plot. Tornadoes remain at the top in both.