Main Questions
The primary questions being asked in this analysis are: What types of storms are the most damaging to human life? What types of storms are the most damaging to property?
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Tornadoes, perhaps the most dramatic natural disaster in America, are unsurprisingly the most damaging to both human life and property: of the 4 figures appearing below (two tables and two plots), tornadoes appear prominently in 3. Beyond tornadoes, the types of events causing fatalities and those causing property damage differ. Heat-related events cause many deaths, both in total and on average, while water- and wind-related events cause much of the property damage.
sessionInfo()
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Ventura 13.0
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.50
## [5] cachem_1.1.0 knitr_1.49 htmltools_0.5.8.1 rmarkdown_2.29
## [9] lifecycle_1.0.4 cli_3.6.3 sass_0.4.9 jquerylib_0.1.4
## [13] compiler_4.4.2 rstudioapi_0.17.1 tools_4.4.2 evaluate_1.0.1
## [17] bslib_0.8.0 yaml_2.3.10 rlang_1.1.4 jsonlite_1.8.9
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Download the compressed storm data only if it is not already present
stormDataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
stormFileName <- "./myStormData.csv.bz2"  # the download is bzip2-compressed
if (!file.exists(stormFileName)) {
  # mode = "wb" keeps the binary file intact on Windows
  download.file(stormDataURL, stormFileName, mode = "wb")
}
# Skip re-reading if the data is already in memory (saves time on subsequent runs);
# read.csv() decompresses .bz2 files transparently
if (!exists("myStormData")) {
  myStormData <- read.csv(stormFileName)
}
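The `exists()` check only helps within a single R session; a new session (RStudio's Knit button, for instance, typically renders in a separate session) still has to parse the full CSV. A minimal sketch of caching the parsed data frame on disk with base R's `saveRDS()`/`readRDS()` follows. The helper name `readStormDataCached`, the `.rds` path convention, and the toy CSV are hypothetical, not part of the original analysis.

```r
# Hypothetical helper: parse the CSV once, then reuse a cached .rds copy
readStormDataCached <- function(csvPath, rdsPath = paste0(csvPath, ".rds")) {
  if (file.exists(rdsPath)) {
    # Fast path: a previously parsed data frame is already on disk
    return(readRDS(rdsPath))
  }
  # Slow path: parse the CSV (read.csv also handles .bz2/.gz files) and cache it
  df <- read.csv(csvPath)
  saveRDS(df, rdsPath)
  df
}

# Usage with a small throwaway CSV (illustrative data only)
demoCsv <- tempfile(fileext = ".csv")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0)),
          demoCsv, row.names = FALSE)
first  <- readStormDataCached(demoCsv)  # parses the CSV and writes the cache
second <- readStormDataCached(demoCsv)  # reads the cached .rds instead
```

On the real data the call would be something like `myStormData <- readStormDataCached(stormFileName)`; the cache must be deleted by hand if the source file changes.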
I added a column to the data that isolates the year component of each event's begin date (BGN_DATE). This makes plotting and analysis by year easier.
# Add a numeric YEAR column (year of each event's begin date) to simplify plotting
myStormData$YEAR <- format(as.POSIXct(myStormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = ""), "%Y")
myStormData$YEAR <- as.numeric(myStormData$YEAR)
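To illustrate the conversion, here is the same `format(as.POSIXct(...), "%Y")` idiom applied to two made-up date strings in the `%m/%d/%Y %H:%M:%S` layout the code expects. The sample values are illustrative, not taken from the dataset. Using `tz = "UTC"` here is a choice made for the sketch: it sidesteps any daylight-saving gap in the local time zone.

```r
# Illustrative date strings in the "%m/%d/%Y %H:%M:%S" layout used above
sampleDates <- c("4/18/1950 0:00:00", "11/15/2011 0:00:00")

# Parse to POSIXct, keep only the four-digit year, then convert to numeric
sampleYears <- as.numeric(format(as.POSIXct(sampleDates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC"), "%Y"))
print(sampleYears)
# [1] 1950 2011
```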
##getting average fatalities for event type
myFatalStorms <- aggregate(x=myStormData$FATALITIES, by = list(myStormData$EVTYPE), FUN=mean)
myFatalStorms <- myFatalStorms[order(myFatalStorms$x, decreasing=TRUE),]
colnames(myFatalStorms) <- c("stormType","AvgFatalities")
print(head(myFatalStorms))
## stormType AvgFatalities
## 842 TORNADOES, TSTM WIND, HAIL 25.000000
## 72 COLD AND SNOW 14.000000
## 851 TROPICAL STORM GORDON 8.000000
## 580 RECORD/EXCESSIVE HEAT 5.666667
## 142 EXTREME HEAT 4.363636
## 279 HEAT WAVE DROUGHT 4.000000
In terms of average fatalities per event, the combined TORNADOES, TSTM WIND, HAIL category tops the list, while heat-related events are unexpectedly well represented, with 3 of the top 6 spots.
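An average computed over only one or two records can look dramatic, so it may help to report how many events each average is based on. The base-R sketch below computes the event count, mean, and total of a column for each event type. The helper name `summarizeByType` and the toy rows are hypothetical; on the real data the call would be `summarizeByType(myStormData, "FATALITIES")`.

```r
# Hypothetical helper: per-EVTYPE event count, mean, and total of one column
summarizeByType <- function(df, valueCol) {
  values <- df[[valueCol]]
  counts <- tapply(values, df$EVTYPE, length)
  means  <- tapply(values, df$EVTYPE, mean)
  totals <- tapply(values, df$EVTYPE, sum)
  out <- data.frame(stormType = names(counts),
                    nEvents   = as.vector(counts),
                    avg       = as.vector(means),
                    total     = as.vector(totals),
                    row.names = NULL)
  # Largest averages first, matching the ordering of the table above
  out[order(out$avg, decreasing = TRUE), ]
}

# Toy data: one rare event with a high count vs. a common type with a higher total
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "TORNADO", "RARE EVENT"),
  FATALITIES = c(4, 0, 3, 5)
)
res <- summarizeByType(toy, "FATALITIES")
print(res)
```

In the toy data, RARE EVENT ranks first by average even though TORNADO has the larger total and three times as many events, which is the pattern the count column makes visible.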
# Top 5 event types by total fatalities, then fatalities per year for each of them
myFatalStorms_summary <- myStormData %>%
  group_by(EVTYPE) %>%
  summarize(Total_Sum = sum(FATALITIES), .groups = "drop") %>%
  arrange(desc(Total_Sum)) %>% # Sort by total sum in descending order
  slice(1:5) %>% # Keep only the top 5 categories
  left_join(myStormData, by = "EVTYPE") %>% # Join back with original data to get YEAR and FATALITIES
  group_by(YEAR, EVTYPE) %>%
  summarize(Sum_Value = sum(FATALITIES), .groups = "drop") %>% # Sum per-event fatalities, not the overall total
  arrange(YEAR, desc(Sum_Value)) # Sort by year and then within year by sum value
fatalPlot <- ggplot(myFatalStorms_summary, aes(x = YEAR, y = Sum_Value, fill = EVTYPE)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ EVTYPE, ncol = 1, scales = "free_y") +
  labs(title = "Sum of Fatalities by Year (Top 5 EVTYPEs)",
       x = "Year",
       y = "Sum of Fatalities",
       fill = "Event Type") +
  theme_bw() +
  scale_x_continuous(breaks = unique(myFatalStorms_summary$YEAR)) +
  # Zoom to 1990 onward without dropping data; a separate xlim() would be
  # replaced by the scale_x_continuous() call above
  coord_cartesian(xlim = c(1990, 2025))
print(fatalPlot)
This plot shows the five event types with the most total fatalities, broken down by year. Once again, heat-related events are well represented, taking 2 of the top 5 spots.
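The yearly summaries rest on two steps: choose the N event types with the largest overall totals, then sum the per-event values within each year for those types. The base-R sketch below shows that logic on a toy data frame. The helper `topNByYear` and the sample rows are hypothetical and only mirror the column names used in this analysis (YEAR, EVTYPE, FATALITIES).

```r
# Hypothetical helper: yearly sums of valueCol for the n event types with the largest totals
topNByYear <- function(df, valueCol, n = 5) {
  # Step 1: overall total per event type, keep the n largest
  totals <- tapply(df[[valueCol]], df$EVTYPE, sum)
  topTypes <- head(names(totals)[order(totals, decreasing = TRUE)], n)
  # Step 2: restrict to those types and sum the individual values per (year, type)
  sub <- df[df$EVTYPE %in% topTypes, ]
  out <- aggregate(sub[[valueCol]],
                   by = list(YEAR = sub$YEAR, EVTYPE = sub$EVTYPE),
                   FUN = sum)
  names(out)[names(out) == "x"] <- "Sum_Value"
  # Sort by year, then by descending sum within each year
  out[order(out$YEAR, -out$Sum_Value), ]
}

toy <- data.frame(
  YEAR       = c(2000, 2000, 2001, 2001, 2001),
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "TORNADO", "HAIL"),
  FATALITIES = c(3, 4, 1, 2, 0)
)
res <- topNByYear(toy, "FATALITIES", n = 2)
print(res)
```

Because step 2 sums the individual event values, each bar is the actual yearly total; summing the step-1 grand totals instead would multiply each type's overall total by its number of events in that year. On the real data the call would be `topNByYear(myStormData, "FATALITIES")` or `topNByYear(myStormData, "PROPDMG")`.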
# Property damage: average PROPDMG per event, by event type
myExpensiveStorms <- aggregate(x=myStormData$PROPDMG, by = list(myStormData$EVTYPE), FUN=mean)
myExpensiveStorms <- myExpensiveStorms[order(myExpensiveStorms$x, decreasing=TRUE),]
colnames(myExpensiveStorms) <- c("stormType","AvgPropertyDamage")
print(head(myExpensiveStorms))
## stormType AvgPropertyDamage
## 52 COASTAL EROSION 766
## 291 HEAVY RAIN AND FLOOD 600
## 589 RIVER AND STREAM FLOOD 600
## 445 Landslump 570
## 38 BLIZZARD/WINTER STORM 500
## 158 FLASH FLOOD/ 500
In terms of average property damage (PROPDMG) per event, water-related events dominate: floods take 3 of the top 6 spots, and coastal erosion takes another.
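Labels in this table such as `Landslump` (mixed case) and `FLASH FLOOD/` (trailing slash) suggest that EVTYPE is entered inconsistently, which can split events that arguably belong together across several categories. A light normalization pass could be applied before aggregating. The sketch below is hypothetical (the helper name `normalizeEvtype` and its rules are illustrative, not a complete mapping to NOAA's official event types): it upper-cases, trims, strips trailing separators, and collapses repeated spaces.

```r
# Hypothetical helper: light clean-up of free-form EVTYPE labels
normalizeEvtype <- function(evtype) {
  x <- toupper(trimws(evtype))        # unify case, drop leading/trailing spaces
  x <- gsub("[/,;.]+$", "", x)        # drop trailing separators such as "/"
  x <- gsub("[[:space:]]+", " ", x)   # collapse runs of internal whitespace
  trimws(x)                           # tidy any space exposed by the steps above
}

cleaned <- normalizeEvtype(c("Landslump", "FLASH FLOOD/", " flash  flood "))
print(cleaned)
```

Applied to the real data this would be `myStormData$EVTYPE <- normalizeEvtype(myStormData$EVTYPE)` before the `aggregate()` calls; the results reported in this document do not use it.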
# Top 5 event types by total property damage, then damage per year for each of them
myExpensiveStorms_summary <- myStormData %>%
  group_by(EVTYPE) %>%
  summarize(Total_Sum = sum(PROPDMG), .groups = "drop") %>%
  arrange(desc(Total_Sum)) %>% # Sort by total sum in descending order
  slice(1:5) %>% # Keep only the top 5 categories
  left_join(myStormData, by = "EVTYPE") %>% # Join back with original data to get YEAR and PROPDMG
  group_by(YEAR, EVTYPE) %>%
  summarize(Sum_Value = sum(PROPDMG), .groups = "drop") %>% # Sum per-event damage, not the overall total
  arrange(YEAR, desc(Sum_Value)) # Sort by year and then within year by sum value
propertyPlot <- ggplot(myExpensiveStorms_summary, aes(x = YEAR, y = Sum_Value, fill = EVTYPE)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~ EVTYPE, ncol = 1, scales = "free_y") +
  labs(title = "Sum of Property Damage by Year (Top 5 EVTYPEs)",
       x = "Year",
       y = "Sum of Property Damage (PROPDMG)",
       fill = "Event Type") +
  theme_bw() +
  scale_x_continuous(breaks = unique(myExpensiveStorms_summary$YEAR))
print(propertyPlot)
The five event types causing the most total property damage differ somewhat from the five causing the most fatalities. While heat-related events took 2 of the 5 spots in the fatality plot, flooding takes 2 of the 5 spots in the property damage plot. Tornadoes, of course, remain king in both regards.
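One caveat applies to the property-damage figures: they use PROPDMG as recorded. In this dataset PROPDMG is accompanied by a PROPDMGEXP column whose most common codes, "K", "M", and "B", indicate thousands, millions, and billions of dollars, so raw PROPDMG values are not on a common dollar scale. A minimal sketch of converting them to dollars follows. The helper name `propDamageDollars` is hypothetical, and treating every code other than K/M/B (blank, digits, "+", "-", "?", and so on) as a multiplier of 1 is an assumption, not an official rule.

```r
# Hypothetical helper: convert PROPDMG + PROPDMGEXP to dollars.
# Assumption: K/M/B (any case) mean thousands/millions/billions; every other
# code is treated as a multiplier of 1 (value left as-is).
propDamageDollars <- function(dmg, expCode) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(trimws(as.character(expCode)))
  m <- unname(multipliers[code])  # NA for any code not in the lookup
  m[is.na(m)] <- 1                # assumption: unknown codes do not rescale
  dmg * m
}

dollars <- propDamageDollars(c(25, 2.5, 1, 5), c("K", "m", "B", ""))
print(dollars)
```

Applied to the full data this would be `myStormData$PROPDMG_USD <- propDamageDollars(myStormData$PROPDMG, myStormData$PROPDMGEXP)` (PROPDMG_USD is a hypothetical column name), used in place of PROPDMG in the aggregations; the figures in this report were not computed that way.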