In this section we explored patterns of recorded hurricane activity from 1900 to present in the Atlantic Ocean, Gulf of Mexico, and Caribbean Sea. Our goal was to address the following questions:
Data used in our analysis are included in NOAA’s “North Atlantic hurricane Database (HURDAT)”. However, we scraped our information from 20 HTML tables listed on the Wikipedia page “Atlantic hurricane season”: https://en.wikipedia.org/wiki/Atlantic_hurricane_season
Steps taken to format the data for analysis in R (i.e., Tidy format) included: 1) importing the data into RStudio as a list of dataframes (each representing a different period of time); 2) iterating over this list to subset, filter, rename, and update column types; 3) addressing inconsistencies in variable entry, time span, and title; and 4) combining these elements (20 total) into a final dataframe.
Additional cleaning was required to address uncertainties inherent to values recorded for select variables. For example, storm-related deaths included values input as “Unknown”, single numbers, approximate values, and/or a range of values. We converted this information to integer type and limited values to the minimum number of known deaths; values included as “Unknown” were treated as missing (NA) data. We applied similar steps to update a variable for storm-related damages.
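The cleaning rules described above can be sketched as two small helper functions. This is a minimal illustration in base R, not the code used to produce our results: the function names `parse_min_deaths` and `parse_damage_usd` are hypothetical, and the example strings are assumed formats rather than values copied from the Wikipedia tables.

```r
# Hypothetical helper: return the smallest number found in each deaths entry.
# "None" becomes 0; entries with no digits (e.g., "Unknown") become NA.
parse_min_deaths <- function(x) {
  x[is.na(x)] <- ""                      # treat missing entries as empty text
  x <- gsub(",", "", x)                  # "12,000" -> "12000"
  x <- sub("^None$", "0", x)             # "None"   -> "0"
  nums <- regmatches(x, gregexpr("[0-9]+", x))  # all digit runs per entry
  vapply(nums, function(n) {
    if (length(n) == 0) NA_integer_ else min(as.integer(n))
  }, integer(1), USE.NAMES = FALSE)
}

# Hypothetical helper: convert a damage entry to US dollars (numeric).
# The decimal point is kept and the unit word sets the multiplier.
parse_damage_usd <- function(x) {
  x <- tolower(x)
  m <- regexpr("[0-9][0-9,]*(\\.[0-9]+)?", x)   # first number in each entry
  hit <- !is.na(m) & m > -1
  value <- rep(NA_real_, length(x))
  value[hit] <- as.numeric(gsub(",", "", regmatches(x, m)))
  multiplier <- ifelse(grepl("billion", x), 1e9,
                       ifelse(grepl("million", x), 1e6, 1))
  value * multiplier
}

# Assumed example entries (illustrative only)
parse_min_deaths(c("None", "Unknown", "12,000+", ">3,000", "~50", "100-200"))
parse_damage_usd(c("$1.5 million", "$2.5 billion", ">$1,000", "Unknown"))
```

Returning damages as numeric rather than integer avoids R's 32-bit integer limit, which matters once annual totals exceed roughly $2.1 billion.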
Once in Tidy form, we exported the dataset to an external .csv file and proceeded with our analyses.
The following code can be used to import, clean, analyze, and export our data as described above. We conclude this section with a summary of our findings.
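# Load the packages used in this section
library(rvest)       # read_html(), html_nodes(), html_table()
library(tidyverse)   # dplyr, tidyr, stringr, ggplot2, purrr
library(kableExtra)  # kbl(), kable_material()
# Import Wikipedia data tables into a list of dataframes
temp <- read_html('https://en.wikipedia.org/wiki/Atlantic_hurricane_season')
storms <- temp %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)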
# Export Wikipedia data (list of dataframes) to create record of raw data.
capture.output(storms, file = "untidy_wikipedia_tables.txt")
# Drop the first seven list elements, remove "Total" rows, and drop unused columns
# (scraped column names are matched exactly as they appear, including missing spaces)
storms <- storms[-(1:7)]
storms <- lapply(storms, function(x) filter(x, Year != "Total"))
drop_cols <- c("Retired names", "Major landfall hurricanes",
               "Number oftropical cyclones", "Notes", "Strongeststorm")
storms <- lapply(storms, function(x) x[!(names(x) %in% drop_cols)])
# Convert Year column to character type and rename columns as appropriate
storms <- lapply(storms, function(x) {
  x$Year <- as.character(x$Year)
  colnames(x)[c(2:5, 7)] <- c('Number_Tropical_Storms', 'Number_Hurricanes',
                              'Number_Major_Hurricanes',
                              'Accumulated_Cyclone_Energy', 'Damage_USD')
  x
})
# Combine list elements to create a single dataframe
storms <- dplyr::bind_rows(storms)
# Clean the Deaths column, rename it, and convert it to integer type
storms <- storms %>%
  mutate(Deaths = sub("None", "0", Deaths),
         Deaths = sub("Unknown", "", Deaths),      # empty string becomes NA below
         Deaths = gsub("[+,>~]", "", Deaths)) %>%  # strip qualifiers and separators
  rename(Min_Known_Deaths = Deaths)
storms$Min_Known_Deaths <- as.integer(storms$Min_Known_Deaths)
# Clean Damage_USD column and convert to type integer
# Caveats: removing the decimal point before appending zeros inflates entries
# such as "1.5 million" tenfold, and as.integer() returns NA (with a warning)
# for values above .Machine$integer.max (2,147,483,647).
storms <- storms %>%
  mutate(Damage_USD = sub("\\$", "", Damage_USD),
         Damage_USD = sub("\\.", "", Damage_USD),
         Damage_USD = sub("\\smillion", "000000", Damage_USD),
         Damage_USD = sub("\\sbillion", "000000000", Damage_USD),
         Damage_USD = sub(">|\\+|,", "", Damage_USD),
         Damage_USD = str_extract(Damage_USD, "\\d+"))  # keep first run of digits
storms$Damage_USD <- as.integer(storms$Damage_USD)
# Review final dataframe: first five rows
head(storms, 5) %>% kbl() %>% kable_material("striped")
Year | Number_Tropical_Storms | Number_Hurricanes | Number_Major_Hurricanes | Accumulated_Cyclone_Energy | Min_Known_Deaths | Damage_USD |
---|---|---|---|---|---|---|
1900 | 7 | 3 | 2 | 83.35 | 8000 | 6.00e+07 |
1901 | 12 | 5 | 1 | 98.98 | 10 | 1.00e+06 |
1902 | 5 | 3 | 0 | 32.65 | 0 | NA |
1903 | 10 | 7 | 1 | 102.07 | 228 | 1.15e+08 |
1904 | 5 | 3 | 0 | 30.35 | 87 | 1.00e+06 |
# Save final dataframe as .csv file
write.csv(storms, "Proj2_Atlantic_Hurricanes.csv")
# Calculate Summary Statistics and obtain average percentage of hurricanes and major hurricanes over the period of record (feature engineering).
summary(storms)
## Year Number_Tropical_Storms Number_Hurricanes
## Length:121 Min. : 1.00 Min. : 0.000
## Class :character 1st Qu.: 7.00 1st Qu.: 4.000
## Mode :character Median :11.00 Median : 5.000
## Mean :10.74 Mean : 5.612
## 3rd Qu.:13.00 3rd Qu.: 7.000
## Max. :30.00 Max. :15.000
##
## Number_Major_Hurricanes Accumulated_Cyclone_Energy Min_Known_Deaths
## Min. :0.000 Min. : 2.53 Min. : 0.00
## 1st Qu.:1.000 1st Qu.: 49.77 1st Qu.: 25.75
## Median :2.000 Median : 83.48 Median : 100.50
## Mean :2.223 Mean : 93.35 Mean : 818.32
## 3rd Qu.:3.000 3rd Qu.:126.30 3rd Qu.: 486.75
## Max. :7.000 Max. :258.57 Max. :12000.00
## NA's :1
## Damage_USD
## Min. :6.700e+04
## 1st Qu.:4.500e+07
## Median :1.000e+08
## Mean :2.496e+08
## 3rd Qu.:3.090e+08
## Max. :1.575e+09
## NA's :52
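# Mean annual percentage of tropical storms that became hurricanes
p_hurricane <- storms %>%
  select(Number_Tropical_Storms, Number_Hurricanes) %>%
  mutate(n_h = (Number_Hurricanes / Number_Tropical_Storms) * 100) %>%
  summarize(mean_h = mean(n_h, na.rm = TRUE))
# Mean annual percentage of hurricanes that became major hurricanes
# (years with zero hurricanes yield NaN and are dropped by na.rm = TRUE)
p_major_hurricane <- storms %>%
  select(Number_Major_Hurricanes, Number_Hurricanes) %>%
  mutate(percent = (Number_Major_Hurricanes / Number_Hurricanes) * 100) %>%
  summarize(mean_major = mean(percent, na.rm = TRUE))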
# Plot storm counts over time
num <- storms %>%
  select(Year, Number_Hurricanes, Number_Major_Hurricanes) %>%
  rename(Hurricanes = Number_Hurricanes,
         Major_Hurricanes = Number_Major_Hurricanes) %>%
  pivot_longer(cols = -c(Year), names_to = "Storm_Type",
               values_to = "Storm_Number")
num %>%
  ggplot(aes(x = Year, y = Storm_Number, group = Storm_Type,
             color = Storm_Type)) +
  geom_line() +
  scale_x_discrete(breaks = c("1900", "1920", "1940", "1960",
                              "1980", "2000", "2020")) +
  theme_bw() +
  theme(axis.title.x = element_text(size = 14),
        axis.title.y = element_text(size = 14),
        plot.title = element_text(hjust = 0.5)) +
  labs(y = "Number of Storms", x = "Year", color = "Storm Type") +
  ggtitle("Figure 1. Annual Count of Atlantic Storms Since 1900")
# Compare accumulated storm energy from 1900-Present
# Dotted green line: "extremely active" threshold (ACE = 152.5); dotted red line: 2005
ace <- storms %>% select(Year, Accumulated_Cyclone_Energy)
ace %>%
  ggplot(aes(x = Year, y = Accumulated_Cyclone_Energy)) +
  geom_point(color = "light blue") +
  scale_x_discrete(breaks = c("1900", "1920", "1940", "1960", "1980",
                              "2000", "2020")) +
  geom_hline(yintercept = 152.5, linetype = "dotted", color = "dark green",
             size = 0.75) +
  geom_vline(xintercept = "2005", linetype = "dotted", color = "red",
             size = 0.75) +
  theme_bw() +
  theme(axis.title.x = element_text(size = 11),
        axis.title.y = element_text(size = 11),
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(size = 11, hjust = 0.5, color = "black")) +
  labs(y = "ACE (10^4 kn^2)", x = "Year",
       title = "Figure 2. Annual Accumulated Energy of\nAtlantic Storms Since 1900",
       subtitle = "Reported in 10^4 Squared Knots (10^4 kn^2)")
# Compare storm-related deaths vs. accumulated storm energy by year
ace_death <- storms
ace_death %>%
  ggplot(aes(Accumulated_Cyclone_Energy, Min_Known_Deaths)) +
  geom_point(color = "light blue") +
  theme_bw() +
  theme(axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10),
        plot.title = element_text(hjust = 0.5)) +
  labs(y = "Documented Deaths (Minimum Count)",
       x = "Accumulated\nCyclone Energy (10^4 kn^2)",
       title = "Figure 3. Storm-Related Deaths vs. Cumulative Storm\nIntensity: 1900 to Present")
# Compare total storm damage vs. accumulated storm energy by year
ace_damage <- storms
ace_damage %>%
  ggplot(aes(Accumulated_Cyclone_Energy, Damage_USD)) +
  geom_point(color = "light blue") +
  theme_bw() +
  theme(axis.title.x = element_text(size = 10),
        axis.title.y = element_text(size = 10),
        plot.title = element_text(hjust = 0.5)) +
  labs(y = "Storm Damage (USD)",
       x = "Accumulated\nCyclone Energy (10^4 kn^2)",
       title = "Figure 4. Total Storm Damage vs. Cumulative Storm\nIntensity: 1900 to Present")
The number of tropical storms recorded in the Atlantic ranged from 1 to 30 per year between 1900 and 2020, and the number of hurricanes ranged from 0 to 15 per year. On average, approximately 52% of each year's storms were documented hurricanes, and approximately 38% of these hurricanes were identified as Category 3 or higher on the Saffir–Simpson hurricane wind scale (i.e., major hurricanes). While inter-annual variation in the number of Atlantic hurricanes was high, there appears to be an upward trend in these numbers since 1900 (Figure 1). In contrast, there were no apparent trends in Accumulated Cyclone Energy (ACE: an aggregate index of storm energy) over the same period (Figure 2).
Similarly, there was no obvious relationship between annual ACE estimates and either storm-related deaths or damage. These results were unexpected and may owe to two factors: 1) hurricanes that made landfall were not distinguished in the data; and 2) inter-site variations in population and infrastructure may mask the impact of hurricanes, particularly when data are aggregated annually and over large spatial scales. Both were the case for this dataset.
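One way to put a number on "no obvious relationship" is a rank correlation, which does not assume a linear relationship and is less sensitive to extreme years. The sketch below uses Spearman's rho via base R's `cor.test()`. We did not run this test in our analysis; to keep the example self-contained it uses only the five rows (1900 to 1904) shown in the table earlier in this section, so its output illustrates the call and is not a finding.

```r
# First five rows of the cleaned dataframe, copied from the table above
first_rows <- data.frame(
  Accumulated_Cyclone_Energy = c(83.35, 98.98, 32.65, 102.07, 30.35),
  Min_Known_Deaths           = c(8000, 10, 0, 228, 87)
)

# Spearman rank correlation between annual ACE and minimum known deaths.
# cor.test() drops incomplete pairs, so NA values do not need special handling.
rank_cor <- cor.test(first_rows$Accumulated_Cyclone_Energy,
                     first_rows$Min_Known_Deaths,
                     method = "spearman")

rank_cor$estimate  # Spearman's rho
rank_cor$p.value   # p-value for the null hypothesis rho = 0
```

Applied to the full dataframe, the same call would take `storms$Accumulated_Cyclone_Energy` and `storms$Min_Known_Deaths` (or `storms$Damage_USD`) as its first two arguments.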
It is interesting to note that there were only 19 “extremely active” hurricane seasons in the past 120 years (i.e., ACE > 152.5; see points above the dotted green line in Figure 2), and that only 1933 exceeded the 2005 hurricane season in terms of total storm activity (ACE 258.6 vs. 250.1, respectively). Figure 2 includes a dotted red line to indicate 2005, the season that included Hurricane Katrina. While 2005 ranked 8th highest in storm-related mortality (3,912 deaths), the 1998 storm season was far deadlier (~12,000 deaths) due to the effects of Hurricane Mitch. Unfortunately, our dataset did not include estimates of storm-related damages for the 2005 hurricane season.
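Counts and rankings like those above can be obtained by filtering on the ACE threshold and sorting. The following is a minimal base R sketch under stated assumptions: the dataframe `ace_rows` is a small stand-in built from values that appear in this section (the 1900 to 1904 table rows and the 2005 ACE of 250.1), not the full dataset, so it returns a single season rather than 19.

```r
# Stand-in data: ACE values quoted elsewhere in this section
ace_rows <- data.frame(
  Year = c("1900", "1901", "1902", "1903", "1904", "2005"),
  Accumulated_Cyclone_Energy = c(83.35, 98.98, 32.65, 102.07, 30.35, 250.1)
)

# Keep seasons above the "extremely active" threshold used in Figure 2.
# which() ignores rows where ACE is missing.
threshold <- 152.5
keep <- which(ace_rows$Accumulated_Cyclone_Energy > threshold)
extremely_active <- ace_rows[keep, ]

# Rank the remaining seasons from most to least active
ranked <- extremely_active[order(-extremely_active$Accumulated_Cyclone_Energy), ]

nrow(ranked)  # number of extremely active seasons in the stand-in data
ranked
```

Replacing `ace_rows` with `storms` would apply the same steps to the full period of record.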
Our analyses can be refined and improved through additional data collection and documentation. We offer the following recommendations for future study: 1) limit the scope of analysis to the Gulf of Mexico and compile data for individual storm events rather than annual totals; 2) distinguish hurricanes that made landfall from annual counts; 3) acquire and verify additional estimates of storm damage; and 4) employ time series models to evaluate any trends in annual hurricane patterns over time.
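As a starting point for recommendation 4, a Poisson regression of annual hurricane counts on year gives a simple first check for a trend. The sketch below is an assumption-laden illustration, not an analysis we performed: the counts are simulated with `rpois()` and have no trend built in, and the model ignores year-to-year autocorrelation, which a proper time series model would need to address.

```r
# Simulated stand-in for annual hurricane counts (NOT the Wikipedia data)
set.seed(1)
sim <- data.frame(Year = 1900:2020)
sim$Number_Hurricanes <- rpois(nrow(sim), lambda = 5)

# Poisson regression with a log link: log(expected count) = a + b * Year
fit <- glm(Number_Hurricanes ~ Year, family = poisson(link = "log"),
           data = sim)

# exp(b) is the multiplicative change in expected count per year
trend <- coef(summary(fit))["Year", ]
trend
exp(coef(fit)["Year"])
```

To use the real data, `Year` would first need to be converted from character to numeric (e.g., `as.integer(storms$Year)`), since it is stored as text in our final dataframe.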
For information regarding ACE measurements and categories, please refer to the following websites:
National Weather Service Climate Prediction Center: https://www.cpc.ncep.noaa.gov/products/outlooks/Background.html
Saffir Simpson Scale: https://bit.ly/3cvRIep