1. Wikipedia: Overview

In this section we explored patterns of recorded hurricane activity from 1900 to the present in the Atlantic Ocean, Gulf of Mexico, and Caribbean Sea. Our goal was to address the following questions:

  1. Has seasonal hurricane activity intensified over the past century across regions represented in this dataset?
  2. Do composite measures of storm intensity (e.g., annual Accumulated Cyclone Energy) correlate well with storm-related deaths and/or property damage?
  3. Was the 2005 hurricane season, which included Hurricane Katrina, unusual in terms of its intensity?

Data used in our analysis are included in NOAA’s “North Atlantic Hurricane Database (HURDAT)”. However, we scraped our information from 20 HTML tables listed on the Wikipedia page “Atlantic hurricane season”: https://en.wikipedia.org/wiki/Atlantic_hurricane_season.

Steps taken to format the data for analysis in R (i.e., Tidy format) included: 1) importing the data into RStudio as a list of dataframes (each representing a different period of time); 2) iterating over this list to subset, filter, rename, and update column types; 3) addressing inconsistencies in variable entry, time span, and title; and 4) combining these elements (20 total) into a final dataframe.

Additional cleaning was required to address uncertainties inherent to values recorded for select variables. For example, storm-related deaths included values input as “Unknown”, single numbers, approximate values, and/or a range of values. We converted this information to integer type and limited values to the minimum number of known deaths; values included as “Unknown” were treated as missing (NA) data. We applied similar steps to update a variable for storm-related damages.
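The rule described above (keep the minimum known count, treat “Unknown” as missing) can be sketched as a small base-R helper. This is an illustration only: the function name `min_known_deaths()` and the sample strings are assumptions, not part of our pipeline, and it takes the first number in an entry to be the lower bound of a range.

```r
# Hypothetical helper: reduce a free-text death count to the minimum known value.
min_known_deaths <- function(x) {
  x <- trimws(x)
  x <- gsub(",", "", x, fixed = TRUE)             # drop thousands separators
  x[!is.na(x) & tolower(x) == "none"] <- "0"      # "None" means zero deaths
  m <- regexpr("[0-9]+", x)                       # first run of digits in each entry
  hit <- !is.na(m) & m > 0
  out <- rep(NA_integer_, length(x))              # "Unknown" and NA stay missing
  out[hit] <- as.integer(regmatches(x, m))
  out
}

# Made-up entries covering the cases named in the text
examples <- c("None", "Unknown", "1,000+", ">228", "~87", "100-200", NA)
min_known_deaths(examples)
```

Because only the first run of digits is kept, a range such as "100-200" is reduced to its lower bound, which matches the "minimum known deaths" interpretation.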

Once in Tidy form, we exported the dataset to an external .csv file and proceeded with our analyses.

The following code can be used to import, clean, analyze, and export our data as described above. We conclude this section with a summary of our findings.

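# Load the packages used below

library(rvest)       # read_html(), html_nodes(), html_table()
library(dplyr)       # %>%, filter(), mutate(), rename(), select(), summarize()
library(tidyr)       # pivot_longer()
library(stringr)     # string extraction helpers
library(ggplot2)     # plotting
library(kableExtra)  # kbl(), kable_material()

# Import Wikipedia data tables into a list of dataframes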

temp <- read_html('https://en.wikipedia.org/wiki/Atlantic_hurricane_season')
storms <- temp %>%
  html_nodes("table") %>%
  html_table(fill = TRUE)

# Export Wikipedia data (list of dataframes) to create record of raw data.

capture.output(storms, file = "untidy_wikipedia_tables.txt")

# Iterate over list and subset dataframe columns

storms <- storms[-(1:7)]  # drop the first seven tables

storms <- lapply(storms, function(x) filter(x, Year != "Total"))
storms <- lapply(storms, function(x) x[!(names(x) %in% c("Retired names", "Major landfall hurricanes", "Number oftropical cyclones", "Notes", "Strongeststorm"))])

# Convert Year column to character type and rename columns as appropriate

storms <- lapply(storms, function(x) {
  x$Year <- as.character(x$Year)
  # Rename columns 2-5 and 7 by position; column 6 (Deaths) is renamed later
  colnames(x)[c(2, 3, 4, 5, 7)] <- c("Number_Tropical_Storms", "Number_Hurricanes",
                                     "Number_Major_Hurricanes", "Accumulated_Cyclone_Energy",
                                     "Damage_USD")
  x
})

# Combine list elements to create a single dataframe 

storms <- dplyr::bind_rows(storms)

# Clean/rename Deaths col and change type to integer

storms <- storms %>%
  mutate(
    Deaths = sub("None", "0", Deaths),
    Deaths = sub("Unknown", "", Deaths),
    Deaths = gsub("[+,>~]", "", Deaths)  # strip qualifiers and all thousands separators
  ) %>%
  rename(Min_Known_Deaths = Deaths)

# Entries that are still non-numeric (e.g., "") become NA
storms$Min_Known_Deaths <- as.integer(storms$Min_Known_Deaths)


# Clean Damage_USD column and convert to type integer.
# Caveats: deleting the decimal point before appending zeros inflates values
# written with decimals (e.g., "1.5 million" becomes 15000000), and totals above
# .Machine$integer.max (about 2.1 billion) become NA in as.integer().

storms <- storms %>%
  mutate(
    Damage_USD = sub("\\$", "", Damage_USD),
    Damage_USD = sub("\\.", "", Damage_USD),
    Damage_USD = sub("\\smillion", "000000", Damage_USD),
    Damage_USD = sub("\\sbillion", "000000000", Damage_USD),
    Damage_USD = gsub("[>+,]", "", Damage_USD),
    Damage_USD = str_extract(Damage_USD, "\\d+")  # first run of digits, as character
  )

storms$Damage_USD <- as.integer(storms$Damage_USD)
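The conversion above deletes the decimal point and stores damages as integers, so decimal amounts are inflated and totals above R's integer limit are lost as NA. A possible alternative, shown here only as a sketch, parses the number and the "million"/"billion" suffix separately and keeps the result as a double. The function name `parse_damage_usd()` and the sample strings are assumptions; adopting it would change the summary statistics reported below.

```r
# Hypothetical helper: convert text such as ">$1.5 million" to numeric US dollars.
parse_damage_usd <- function(x) {
  x <- tolower(x)
  # Scale factor implied by the suffix (1 when no suffix is present)
  multiplier <- ifelse(grepl("billion", x), 1e9,
                ifelse(grepl("million", x), 1e6, 1))
  cleaned <- gsub(",", "", x, fixed = TRUE)        # drop thousands separators
  m <- regexpr("[0-9]+(\\.[0-9]+)?", cleaned)      # first number, decimals kept
  hit <- !is.na(m) & m > 0
  value <- rep(NA_real_, length(x))                # entries without digits stay NA
  value[hit] <- as.numeric(regmatches(cleaned, m))
  value * multiplier
}

# Made-up entries for illustration
parse_damage_usd(c("$60 million", "$1.5 million", ">$172.3 billion", "$67,000", "Unknown", NA))
```

Storing the result as a double rather than an integer avoids the overflow for seasons with damages in the billions.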

# Review final dataframe: first five rows

head(storms, 5)%>%kbl%>%kable_material(c("striped"))
Year  Number_Tropical_Storms  Number_Hurricanes  Number_Major_Hurricanes  Accumulated_Cyclone_Energy  Min_Known_Deaths  Damage_USD
1900  7                       3                  2                        83.35                       8000              6.00e+07
1901  12                      5                  1                        98.98                       10                1.00e+06
1902  5                       3                  0                        32.65                       0                 NA
1903  10                      7                  1                        102.07                      228               1.15e+08
1904  5                       3                  0                        30.35                       87                1.00e+06

# Save final dataframe as .csv file

write.csv(storms, "Proj2_Atlantic_Hurricanes.csv")

# Calculate Summary Statistics and obtain average percentage of hurricanes and major hurricanes over the period of record (feature engineering).

summary(storms)
##      Year           Number_Tropical_Storms Number_Hurricanes
##  Length:121         Min.   : 1.00          Min.   : 0.000   
##  Class :character   1st Qu.: 7.00          1st Qu.: 4.000   
##  Mode  :character   Median :11.00          Median : 5.000   
##                     Mean   :10.74          Mean   : 5.612   
##                     3rd Qu.:13.00          3rd Qu.: 7.000   
##                     Max.   :30.00          Max.   :15.000   
##                                                             
##  Number_Major_Hurricanes Accumulated_Cyclone_Energy Min_Known_Deaths  
##  Min.   :0.000           Min.   :  2.53             Min.   :    0.00  
##  1st Qu.:1.000           1st Qu.: 49.77             1st Qu.:   25.75  
##  Median :2.000           Median : 83.48             Median :  100.50  
##  Mean   :2.223           Mean   : 93.35             Mean   :  818.32  
##  3rd Qu.:3.000           3rd Qu.:126.30             3rd Qu.:  486.75  
##  Max.   :7.000           Max.   :258.57             Max.   :12000.00  
##                                                     NA's   :1         
##    Damage_USD       
##  Min.   :6.700e+04  
##  1st Qu.:4.500e+07  
##  Median :1.000e+08  
##  Mean   :2.496e+08  
##  3rd Qu.:3.090e+08  
##  Max.   :1.575e+09  
##  NA's   :52

# Mean annual percentage of tropical storms that became hurricanes
p_hurricane <- storms %>%
  mutate(n_h = (Number_Hurricanes / Number_Tropical_Storms) * 100) %>%
  summarize(mean_h = mean(n_h, na.rm = TRUE))

# Mean annual percentage of hurricanes that became major hurricanes
p_major_hurricane <- storms %>%
  mutate(percent = (Number_Major_Hurricanes / Number_Hurricanes) * 100) %>%
  summarize(mean_major = mean(percent, na.rm = TRUE))

# Plot storms counts over time 

num <- storms %>%
  select(Year, Hurricanes = Number_Hurricanes, Major_Hurricanes = Number_Major_Hurricanes) %>%
  pivot_longer(cols = -Year, names_to = "Storm_Type", values_to = "Storm_Number")

num %>%
  ggplot(aes(x = Year, y = Storm_Number, group = Storm_Type, color = Storm_Type)) +
  geom_line() +
  scale_x_discrete(breaks = c("1900", "1920", "1940", "1960", "1980", "2000", "2020")) +
  theme_bw() +
  theme(
    axis.title.x = element_text(size = 14),
    axis.title.y = element_text(size = 14),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(y = "Number of Storms", x = "Year", color = "Storm Type") +
  ggtitle("Figure 1. Annual Count of Atlantic Storms Since 1900")

# Compare accumulated storm energy from 1900-Present

ace <- storms %>% select(Year, Accumulated_Cyclone_Energy)

ace %>%
  ggplot(aes(x = Year, y = Accumulated_Cyclone_Energy)) +
  geom_point(color = "light blue") +
  scale_x_discrete(breaks = c("1900", "1920", "1940", "1960", "1980", "2000", "2020")) +
  # Horizontal line: ACE threshold (152.5) for "extremely active" seasons
  geom_hline(yintercept = 152.5, linetype = "dotted", color = "dark green", size = 0.75) +
  # Vertical line: the 2005 season
  geom_vline(xintercept = "2005", linetype = "dotted", color = "red", size = 0.75) +
  theme_bw() +
  theme(
    axis.title.x = element_text(size = 11),
    axis.title.y = element_text(size = 11),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(size = 11, hjust = 0.5, color = "black")
  ) +
  labs(y = "ACE (10^4 kn^2)", x = "Year",
       title = "Figure 2. Annual Accumulated Energy of\nAtlantic Storms Since 1900",
       subtitle = "Reported in Units of 10^4 Squared Knots")

# Compare storm related deaths vs. accumulated storm energy by year

ace_death <- storms

ace_death %>%
  ggplot(aes(Accumulated_Cyclone_Energy, Min_Known_Deaths)) +
  geom_point(color = "light blue") +
  theme_bw() +
  theme(
    axis.title.x = element_text(size = 10),
    axis.title.y = element_text(size = 10),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(y = "Documented Deaths (Minimum Count)",
       x = "Accumulated Cyclone Energy (10^4 kn^2)",
       title = "Figure 3. Storm-Related Deaths vs. Cumulative Storm\nIntensity: 1900 to Present")

# Compare total storm damage vs. accumulated storm energy by year

ace_damage <- storms

ace_damage %>%
  ggplot(aes(Accumulated_Cyclone_Energy, Damage_USD)) +
  geom_point(color = "light blue") +
  theme_bw() +
  theme(
    axis.title.x = element_text(size = 10),
    axis.title.y = element_text(size = 10),
    plot.title = element_text(hjust = 0.5)
  ) +
  labs(y = "Storm Damage (USD)",
       x = "Accumulated Cyclone Energy (10^4 kn^2)",
       title = "Figure 4. Total Storm Damage vs. Cumulative Storm\nIntensity: 1900 to Present")

Wikipedia: Results

The total number of tropical storms recorded in the Atlantic ranged from 1 to 30 per year between 1900 and 2020, and the number of hurricanes ranged from 0 to 15. On average, approximately 52% of each year's storms were documented hurricanes, and approximately 38% of those hurricanes were identified as Category 3 or higher on the Saffir–Simpson hurricane wind scale (i.e., major hurricanes). While inter-annual variation in the number of Atlantic hurricanes was high, there appears to be an upward trend in these numbers since 1900 (Figure 1). In contrast, there was no apparent trend in Accumulated Cyclone Energy (ACE: an aggregate index of storm energy) over the same period (Figure 2).
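For readers unfamiliar with ACE, the index is conventionally computed by summing the squares of a storm's maximum sustained wind speed (in knots) at six-hour intervals while the system is at least tropical-storm strength, then dividing by 10,000. The sketch below illustrates that arithmetic for a single storm. The function name, the 34-knot threshold argument, and the wind values are assumptions for illustration; our analysis used the seasonal ACE totals from the Wikipedia tables and did not recompute them.

```r
# Hypothetical helper: ACE contribution of one storm from six-hourly winds (knots).
ace_from_winds <- function(wind_kn, threshold_kn = 34) {
  # Keep only observations at or above tropical-storm strength
  strong <- wind_kn[!is.na(wind_kn) & wind_kn >= threshold_kn]
  sum(strong^2) / 1e4   # result is in units of 10^4 kn^2
}

# Made-up six-hourly wind readings for a short-lived storm
winds <- c(30, 35, 50, 65, 80, 60, 30)
ace_from_winds(winds)   # 1.795
```

A season's ACE is the sum of these per-storm values, which is why a season with a few long-lived, intense storms can outrank one with many weak storms.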

Similarly, there was no obvious relationship between annual ACE estimates and either storm-related deaths or damage. These results were unexpected and may be due to several factors: 1) hurricanes that made landfall were not distinguished in the data; and 2) inter-site variation in population and infrastructure may mask the impact of hurricanes, particularly when data are aggregated annually and over large spatial scales. Both conditions applied to this dataset.
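Our assessment of these relationships was visual. One way to quantify it, sketched here under stated assumptions, is a Spearman rank correlation, which tolerates the strongly skewed death and damage values better than a Pearson correlation would. The helper name `ace_rank_correlation()` and the toy values are invented for illustration and are not results from our dataset.

```r
# Hypothetical helper: rank correlation between annual ACE and an outcome column.
ace_rank_correlation <- function(df, outcome) {
  # cor.test() drops incomplete pairs; exact = FALSE avoids warnings when ranks tie
  cor.test(df$Accumulated_Cyclone_Energy, df[[outcome]],
           method = "spearman", exact = FALSE)
}

# Toy data with made-up values, using the column names from the cleaned dataframe
toy <- data.frame(
  Accumulated_Cyclone_Energy = c(30, 45, 60, 85, 100, 150, 200, 250),
  Min_Known_Deaths           = c(90, 0, 40, 8000, 10, NA, 300, 3900),
  Damage_USD                 = c(1e6, NA, 5e6, 6e7, 1e6, 3e8, NA, 9e8)
)

res <- ace_rank_correlation(toy, "Min_Known_Deaths")
res$estimate   # Spearman's rho
res$p.value
```

The same call with "Damage_USD" as the outcome would cover the second comparison.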

It is interesting to note that there were only 19 “extremely active” hurricane seasons in the past 120 years (i.e., ACE > 152.5; see points above the dotted green line in Figure 2), and that only 1933 exceeded the 2005 hurricane season in terms of total storm activity (ACE 258.6 vs. 250.1, respectively). Figure 2 includes a dotted red line to indicate 2005, the season that included Hurricane Katrina. While 2005 ranked 8th highest in storm-related mortality (3,912 deaths), the 1998 storm season was far deadlier (~12,000 deaths) due to the effects of Hurricane Mitch. Unfortunately, our cleaned dataset contains no damage estimate for the 2005 hurricane season, most likely because its total exceeds the largest value R can store as an integer (about $2.1 billion), so the conversion returned NA.
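The season counts and rankings quoted above can be obtained with a few lines of base R. The following is a minimal sketch: the helper name `rank_seasons_by_ace()` and the toy ACE values are assumptions for illustration, and only the 152.5 threshold comes from our analysis.

```r
# Hypothetical helper: rank seasons by ACE and flag those above a threshold.
rank_seasons_by_ace <- function(df, threshold = 152.5) {
  df <- df[order(-df$Accumulated_Cyclone_Energy), ]   # highest ACE first
  df$ACE_Rank <- seq_len(nrow(df))
  df$Extremely_Active <- df$Accumulated_Cyclone_Energy > threshold
  df
}

# Toy data with made-up ACE values
toy <- data.frame(
  Year = c("2001", "2002", "2003", "2004", "2005"),
  Accumulated_Cyclone_Energy = c(110, 67.5, 176, 227, 250),
  stringsAsFactors = FALSE
)

ranked <- rank_seasons_by_ace(toy)
sum(ranked$Extremely_Active)   # 3
ranked$Year[1]                 # "2005"
```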

Our analyses can be refined and improved through additional data collection and documentation. We offer the following recommendations for future study: 1) limit the scope of analysis to the Gulf of Mexico and compile data for individual storm events rather than annual totals; 2) distinguish hurricanes that made landfall within the annual counts; 3) acquire and verify additional estimates of storm damage; and 4) employ time series models to evaluate trends in annual hurricane patterns.
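As a possible first step toward the fourth recommendation, a Poisson regression of annual hurricane counts on year gives a simple estimate of the long-term trend. This is a sketch under stated assumptions rather than a full time series model: it ignores autocorrelation between seasons, the helper name `fit_count_trend()` is hypothetical, and the data are simulated so the block runs on its own.

```r
# Hypothetical helper: log-linear trend in annual hurricane counts.
fit_count_trend <- function(df) {
  df$Year_num <- as.integer(df$Year)   # Year is stored as character in our dataframe
  glm(Number_Hurricanes ~ Year_num, family = poisson(link = "log"), data = df)
}

# Simulated counts with a small built-in upward trend (not real data)
set.seed(42)
years <- 1900:2020
toy <- data.frame(
  Year = as.character(years),
  Number_Hurricanes = rpois(length(years), lambda = exp(1.4 + 0.004 * (years - 1900))),
  stringsAsFactors = FALSE
)

fit <- fit_count_trend(toy)
slope <- coef(fit)[["Year_num"]]
exp(10 * slope)                        # multiplicative change in expected count per decade
coef(summary(fit))["Year_num", ]       # estimate, standard error, z value, p-value
```

If residuals from such a model show serial correlation, that would be a reason to move to a model with an explicit time series structure.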

For information regarding ACE measurements and categories, please refer to the following websites:

  1. National Weather Service Climate Prediction Center: https://www.cpc.ncep.noaa.gov/products/outlooks/Background.html

  2. Saffir Simpson Scale: https://bit.ly/3cvRIep