In this analysis, I use data from the NOAA storm database covering 1950 to 2011 to answer two specific questions about the effect of extreme weather events in the United States. First, over this 61-year period, which type of weather event contributed to the greatest number of fatalities? Second, which type of weather event contributed the most to property and economic damage in the United States? The answers could help federal, state, and local governments and private organizations decide where to direct future resources when dealing with weather-related disasters.
In the data processing step, I import the data into R and do some basic cleaning for further analysis. After importing, I convert BGN_DATE from a character string into a date-time variable and then use the lubridate package from the tidyverse to extract the year. I could have used gsub or another string-matching function to extract the year, but the year function from lubridate is simpler.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.0 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.2 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(gtsummary)
library(gt)
file <- "repdata_data_StormData.csv.bz2"
pathway_to_unzip <- file.path(here::here(),"Data/Reproducible_Research")
final_file <- file.path(pathway_to_unzip, file)
#Unzipping doesn't work because it's not a .zip file, but read.csv can import the .bz2 file directly
#unzip(file.path(pathway_to_unzip, file), files = "Storm.csv")
#Read_Data
Storm_Data_Initial <- read.csv(final_file)
Storm_Data <- Storm_Data_Initial
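#Convert the date from a character string into a date-time that can be used in downstream analysis
#as.POSIXct gives a plain date-time vector; strptime returns a list-based POSIXlt, which is awkward as a data frame column
Storm_Data$BGN_DATE <- as.POSIXct(Storm_Data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")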
Storm_Data <- Storm_Data %>%
mutate(year = year(BGN_DATE))
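As a small illustration of the year extraction described above, the sketch below parses a few date strings in the `%m/%d/%Y %H:%M:%S` layout using base R only and compares the result with a regular-expression approach. The sample strings and the names `sample_dates`, `year_from_date`, and `year_from_regex` are made up for illustration and are not part of the analysis; `lubridate::year()` applied to the parsed dates should give the same years.

```r
# Illustrative date strings in the same layout as BGN_DATE (made-up sample values)
sample_dates <- c("4/18/1950 0:00:00", "11/15/1995 0:00:00", "6/1/2011 0:00:00")

# Approach 1: parse into a date-time, then pull out the year
parsed <- as.POSIXct(sample_dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
year_from_date <- as.integer(format(parsed, "%Y"))

# Approach 2: extract the four-digit year directly with a regular expression
year_from_regex <- as.integer(sub("^[0-9]+/[0-9]+/([0-9]{4}).*$", "\\1", sample_dates))

print(year_from_date)
print(identical(year_from_date, year_from_regex))
```

Parsing to a date-time first has the advantage that malformed strings become NA instead of silently producing a wrong year.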
One issue I identified during exploratory data analysis is that the dataset contains many different weather event types, and many of them are effectively the same or nearly the same. For example, “Flood” and “Flash Flood” are recorded as separate categories. Therefore, I decided to group all 168 weather event types into 5 broader classifications. The 1st classification is “Hot”, which covers excessive heat, fire, unseasonably warm conditions, and related events. The 2nd is “Cold”, which includes blizzards, snow, low temperatures, hail, etc. The 3rd is “Wind” (labeled “Windy” in the code), which encompasses tornadoes, hurricanes, strong winds, etc. The 4th is “Water”, which includes tsunamis, rip tides or rip currents, floods, and other related events. The 5th, “Other”, includes any weather event that cannot be placed in one of the 4 main classifications. The following code defines the search patterns used to assign every EVTYPE, or natural disaster event type, to one of the 5 classifications.
Hot_Weather_Events <- c("Heat","HOT","FIRE","WARM")
Hot_Weather_Events <- paste(Hot_Weather_Events, collapse = "|")
Cold_Weather_Events <- c("COLD","CHILL","LOW TEMPERATURE","WINTER","BLIZZARD", "ICE","SNOW","Hypothermia","FREEZING","FROST","HAIL","ICY","SLEET")
Cold_Weather_Events <- paste(Cold_Weather_Events, collapse = "|")
High_Wind_Events <- c("Tornado","Hurricane","WIND","TYPHOON","DUST DEVIL")
High_Wind_Events <- paste(High_Wind_Events, collapse = "|")
Water_Events <- c("Flood","SURGE","TIDE","SEAS","SURF","MARINE MISHAP","Waterspout","RIP","CURRENT","TSUNAMI","TROPICAL STORM")
Water_Events <- paste(Water_Events, collapse = "|")
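To make the matching behavior concrete, here is a minimal base R sketch of how `|`-collapsed patterns classify event names when the conditions are checked in a fixed order (Cold, Hot, Windy, Water, then Other). The shortened pattern lists, the helper `classify_event`, and the sample event names are hypothetical and chosen only for illustration; the analysis itself uses the full pattern lists defined above.

```r
# Shortened pattern lists for illustration (assumption: not the full lists above)
cold_pattern  <- paste(c("COLD", "SNOW", "ICE", "HAIL", "BLIZZARD"), collapse = "|")
hot_pattern   <- paste(c("HEAT", "HOT", "FIRE", "WARM"), collapse = "|")
wind_pattern  <- paste(c("TORNADO", "HURRICANE", "WIND"), collapse = "|")
water_pattern <- paste(c("FLOOD", "SURGE", "RIP", "TSUNAMI"), collapse = "|")

# Hypothetical helper: the first matching pattern wins, anything else is "Other"
classify_event <- function(evtype) {
  ifelse(grepl(cold_pattern, evtype, ignore.case = TRUE), "Cold",
  ifelse(grepl(hot_pattern, evtype, ignore.case = TRUE), "Hot",
  ifelse(grepl(wind_pattern, evtype, ignore.case = TRUE), "Windy",
  ifelse(grepl(water_pattern, evtype, ignore.case = TRUE), "Water", "Other"))))
}

events <- c("FLASH FLOOD", "Excessive Heat", "TSTM WIND", "TSTM WIND/HAIL", "LIGHTNING")
result <- classify_event(events)
print(data.frame(EVTYPE = events, class = result))
```

Because the first match wins, a combined label such as "TSTM WIND/HAIL" lands in Cold rather than Windy, so the order of the checks affects the totals for each classification.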
The first part of my results is a table showing the top 10 weather events leading to the most fatalities from 1950 to 2011. The second part of answering the first question classifies all the weather events into the 5 classifications described in the previous step. After that, I plot the yearly fatalities from 1950 to 2011 for each of the 5 classifications.
#Calculating the total # of fatalities by natural disaster event type
Fatal_Events <- Storm_Data %>%
group_by(EVTYPE) %>%
summarize(total_fatalities = sum(FATALITIES)) %>%
filter(total_fatalities > 0) %>%
arrange(desc(total_fatalities))
#Make a simple table of the top 10 event types that contributed to the most fatalities from 1950 to 2011
Fatal_Events %>%
head(10) %>%
rename("Total Fatalities from 1950 to 2011" = total_fatalities,
"Top 10 Weather Events" = EVTYPE) %>%
gt()
| Top 10 Weather Events | Total Fatalities from 1950 to 2011 |
|---|---|
| TORNADO | 5633 |
| EXCESSIVE HEAT | 1903 |
| FLASH FLOOD | 978 |
| HEAT | 937 |
| LIGHTNING | 816 |
| TSTM WIND | 504 |
| FLOOD | 470 |
| RIP CURRENT | 368 |
| HIGH WIND | 248 |
| AVALANCHE | 224 |
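The table above also shows why grouping matters: heat and flooding each appear under two separate labels. The short sketch below redoes the arithmetic using only the totals printed in the table; the variable names are hypothetical and the numbers are copied by hand, not recomputed from the dataset.

```r
# Fatality totals copied from the top-10 table above
fatalities <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903, `FLASH FLOOD` = 978,
                HEAT = 937, LIGHTNING = 816, `TSTM WIND` = 504, FLOOD = 470,
                `RIP CURRENT` = 368, `HIGH WIND` = 248, AVALANCHE = 224)

# Combine labels that describe essentially the same hazard
heat_total  <- sum(fatalities[c("EXCESSIVE HEAT", "HEAT")])
flood_total <- sum(fatalities[c("FLASH FLOOD", "FLOOD")])

# Tornado share of all fatalities within the top 10
tornado_share <- fatalities[["TORNADO"]] / sum(fatalities)

print(c(heat = heat_total, flood = flood_total))
print(round(tornado_share, 3))
```

Combined, the two heat labels account for 2840 fatalities and the two flood labels for 1448, which is still well below the tornado total.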
#Plot the fatalities per year from 1950 to 2011 for each of the 5 weather classifications specified earlier
Yearly_Fatal_Events <- Storm_Data %>%
group_by(EVTYPE, year) %>%
summarize(total_fatalities = sum(FATALITIES)) %>%
filter(total_fatalities > 0) %>%
arrange(desc(total_fatalities))
## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by EVTYPE and year.
## ℹ Output is grouped by EVTYPE.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(EVTYPE, year))` for per-operation grouping
## (`?dplyr::dplyr_by`) instead.
Yearly_Fatal_Events %>%
mutate(General_Event_Type = case_when(grepl(Cold_Weather_Events, EVTYPE, ignore.case = TRUE) ~ "Cold",
grepl(Hot_Weather_Events, EVTYPE, ignore.case = TRUE) ~ "Hot",
grepl(High_Wind_Events, EVTYPE, ignore.case = TRUE) ~ "Windy",
grepl(Water_Events, EVTYPE, ignore.case = TRUE) ~ "Water",
TRUE ~ "Other")) %>% #case_when uses the first condition that matches; TRUE ~ "Other" means anything else is categorized as Other
group_by(General_Event_Type, year) %>%
summarize(total_fatalities = sum(total_fatalities)) %>%
ggplot(aes(x=year, y = total_fatalities, color = General_Event_Type))+
geom_line()+
theme_bw()+
labs(title = "Yearly Fatalities from 5 different broad classifications \n of natural disaster event types",
x = "year",
y = "Total Fatalities",
color = "Natural Disaster Classification")+
scale_x_continuous(breaks = seq(from = 1950, to = 2010, by = 10))+ #year is numeric, so a continuous scale is used instead of scale_x_time
theme(title = element_text(face = "bold", hjust = 0.5))
## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by General_Event_Type and year.
## ℹ Output is grouped by General_Event_Type.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(General_Event_Type, year))` for per-operation grouping
## (`?dplyr::dplyr_by`) instead.
After fatalities, the next most important impact of weather events is the economic and property damage they cause. Consequently, I use the same dataset to identify which weather events contribute the most property damage, and then analyze how property damage has changed over time after classifying all weather events into the same 5 broad classifications.
Storm_Data_Financial <- Storm_Data %>%
select(c(1:9), PROPDMG, PROPDMGEXP, year) %>%
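mutate(PROPDMGEXP = case_when(PROPDMGEXP == "M" ~ 1000000,
PROPDMGEXP == "K" ~ 1000, #K denotes thousands; the totals printed below were generated with 100000 here and should be regenerated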
PROPDMGEXP == "B" ~ 1000000000,
TRUE ~ 1)) %>%
mutate(PROPDMG_FULL = PROPDMG * PROPDMGEXP)
#Top 10 individual event types (EVTYPE) by total property damage, before grouping into the 5 broad classifications
Storm_Data_Financial %>%
filter(PROPDMG > 0) %>%
group_by(EVTYPE) %>%
summarize(total_damage = sum(PROPDMG_FULL, na.rm = TRUE)) %>%
arrange(desc(total_damage)) %>%
head(10)
## # A tibble: 10 × 2
## EVTYPE total_damage
## <chr> <dbl>
## 1 TORNADO 370110228310.
## 2 FLOOD 231632160007
## 3 FLASH FLOOD 155318131557.
## 4 TSTM WIND 136428014055
## 5 THUNDERSTORM WIND 90018144144
## 6 HAIL 82562932333.
## 7 HURRICANE/TYPHOON 69500870000
## 8 LIGHTNING 60611728167.
## 9 THUNDERSTORM WINDS 45176921176.
## 10 STORM SURGE 45165530000
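The exponent column can also be decoded with a lookup table instead of `case_when`. The sketch below assumes the usual reading of the letter codes (H for hundred, K for thousand, M for million, B for billion) and treats every other code as a multiplier of 1, mirroring the fallback used above. The names `exp_multiplier` and `decode_exponent` are hypothetical, and the sample rows are made up for illustration.

```r
# Assumed meaning of the exponent letters; any other code gets a multiplier of 1
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert exponent codes to numeric multipliers,
# ignoring case so that "k" and "K" are handled alike
decode_exponent <- function(code) {
  mult <- unname(exp_multiplier[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Made-up sample rows (not taken from the dataset)
damage <- c(25, 2.5, 1.2, 10)
code   <- c("K", "M", "B", "")
full_damage <- damage * decode_exponent(code)
print(full_damage)
```

A lookup like this keeps all multipliers in one place, which makes a mistyped value easier to spot than in a chain of conditions.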
Storm_Data_Financial %>%
filter(PROPDMG > 0) %>%
mutate(General_Event_Type = case_when(grepl(Cold_Weather_Events, EVTYPE, ignore.case = TRUE) ~ "Cold",
grepl(Hot_Weather_Events, EVTYPE, ignore.case = TRUE) ~ "Hot",
grepl(High_Wind_Events, EVTYPE, ignore.case = TRUE) ~ "Windy",
grepl(Water_Events, EVTYPE, ignore.case = TRUE) ~ "Water",
TRUE ~ "Other")) %>% #case_when uses the first condition that matches; anything else is categorized as Other
group_by(General_Event_Type, year) %>%
summarize(total_damage = sum(PROPDMG_FULL, na.rm = TRUE)) %>%
ggplot(aes(x=year, y = total_damage, color = General_Event_Type))+
geom_line()+
theme_bw()+
labs(title = "Yearly Property Damage from 5 different broad classifications \n of natural disaster event types",
x = "year",
y = "Property Damage in USD ($)",
color = "Natural Disaster Classification")+
scale_x_continuous(breaks = seq(from = 1950, to = 2010, by = 10))+ #year is numeric, so a continuous scale is used instead of scale_x_time
theme(title = element_text(face = "bold", hjust = 0.5))+
scale_y_continuous(breaks = seq(from = 0, to = 150000000000, by = 50000000000), limits = c(0,150000000000),
labels = c("0","$50,000,000,000","$100,000,000,000","$150,000,000,000"))
## `summarise()` has regrouped the output.
## ℹ Summaries were computed grouped by General_Event_Type and year.
## ℹ Output is grouped by General_Event_Type.
## ℹ Use `summarise(.groups = "drop_last")` to silence this message.
## ℹ Use `summarise(.by = c(General_Event_Type, year))` for per-operation grouping
## (`?dplyr::dplyr_by`) instead.
This analysis of the NOAA dataset from 1950 to 2011 allowed me to answer both questions. The weather event that contributed to the most fatalities over this period is tornadoes, at 5633 fatalities, followed by excessive heat at 1903 and flash floods at 978. When all weather events are grouped into 5 broad classifications (Cold, Hot, Windy, Water, and Other), wind-related events, which include tornadoes, hurricanes, and similar events, lead to the most fatalities. However, the Windy total is strongly influenced by tornadoes. In fact, hurricanes are not even within the top 10 events by number of fatalities, while tornadoes are by far the largest contributor, at nearly triple the fatalities of the second leading cause. A likely explanation is that hurricanes can be predicted before landfall, so people are warned ahead of time to evacuate, whereas tornadoes can form in a matter of minutes, leaving people less time to prepare.
Similarly, tornadoes are listed as the leading cause of property damage in the table above, at over 370 billion dollars from 1950 to 2011. Other large contributors in that table include flooding, thunderstorm winds, and hurricanes. Based on the fatalities and property damage, it seems that governmental and non-governmental resources should be focused on “Wind” related weather events, specifically tornadoes and tornado damage prevention. This may include resources for after a tornado strikes an area or preventative resources to minimize the damage and fatalities from tornadoes. The next biggest weather event categories would be either excessive heat or flooding. This report does not say what specifically should be done to mitigate the damage from these weather events, but it does indicate where limited resources could be directed to have the most impact with respect to fatalities and property damage.