Introduction

With this mini-project, I endeavour to look at the factors that impact Hurricanes in the Atlantic Ocean. The 2020 Atlantic hurricane season is the most active Atlantic hurricane season on record, featuring tropical cyclone formation at a record-breaking rate. There have been 31 tropical or subtropical cyclones, 30 named storms, 13 hurricanes, and six major hurricanes, including one Category 5 hurricane.

Before we jump into the coding aspects, I would like to clarify a few concepts related to the topic to help understand the project.

Compared with El Niño conditions, La Niña conditions are generally more favorable for the formation of Atlantic hurricanes2.

With that in mind, let’s investigate the relationship between ACE and ONI. Is the relationship evident.


Data Gathering

The first step is to gather the required information. For our analysis, we will use the following resources.

  1. ACE Information - Yearly (from Wikipedia - https://en.wikipedia.org/wiki/Accumulated_cyclone_energy)
  2. ACE Information - Monthly (from National Oceanic and Atmospheric Administration - https://www.noaa.gov/)
  3. ONI Information - Monthly (from R Package - RSOI - https://cran.r-project.org/web/packages/rsoi/)

ACE Data - Yearly :: Scrape from Wikipedia

First, let us web-scrape from Wikipedia. The ACE information is written in Table format on the Wikipedia page. Below is a screenshot of the page that we are looking to scrape.

Screenshot from Wikipedia-Atlantic Ocean ACE history

To scrape from Wikipedia, we will require the rvest package.

The first step is to read the page into an object called wbpg using the read_html() function. Then we extract any table information into the tbl object. Once, we have done that, we then parse specifically for the Accumulated Cyclone Energy - Atlantic table using the grep function and read that into a data.frame.

# Load rvest library
library(rvest)
library(dplyr)
library(kableExtra) 
library(tidyverse)
library(rsoi)
library(plotly)
#### NOTE: SOURCE: https://www.r-bloggers.com/2018/09/scraping-tables-from-wikipedia-for-visualizing-climate-data/
url  <- "https://en.wikipedia.org/wiki/Accumulated_cyclone_energy"
# Read HTML page into wbpg object
wbpg <- read_html(url)
# Read table information from wbpg object into tbl object
tbl  <- html_nodes(wbpg, "table")
# Read required table and save to ACE_Yearly_DF data.frame
ACE_Yearly_DF <- html_table(tbl[grep("Accumulated Cyclone Energy - Atlantic\n",tbl,ignore.case = T)],fill = T)[[1]]
colnames(ACE_Yearly_DF)[1]<-"Year"
# Add Minor Tropical Storms column by calculating it from Tropical Storms, Hurricanes & Major Hurricanes
ACE_Yearly_DF$MTS <- ACE_Yearly_DF$TS-ACE_Yearly_DF$HU--ACE_Yearly_DF$MH
# Storing dataframe to csv 
write.csv(ACE_Yearly_DF,'ACE_Yearly_DF.csv')

#ACE_Yearly_DF <- read.csv('ACE_Yearly_DF.csv')

Now that we have downloaded the data into the data.frame, let’s view the data.frame contents. In RMarkdown, to display the table - I am using the kableExtra package.

# kableExtra is a package to use with RMarkdown to display tables. 
# The dataframe is quite large and I onlt want to display the first five rows. These are piped to Kable using the dplyr slice function. 

ACE_Yearly_DF %>% slice(1:5) %>%
  kbl(caption = "ACE DataFrame (Yearly) - showing first 5 rows") %>%
  kable_paper("hover", full_width = F, position = "center") %>% 
  row_spec(0,color="black",background = "grey")
ACE DataFrame (Yearly) - showing first 5 rows
Year ACE TS HU MH Classification MTS
1851 36.24 6 3 1 Below normal 4
1852 73.28 5 5 1 Near normal 1
1853 76.49 8 4 2 Near normal 6
1854 31.00 5 3 1 Below normal 3
1855 18.12 5 4 1 Below normal 2

ACE Data - Monthly :: Scrape from NOAA

It would be beneficial also to download/scrape the monthly ACE data from the NOAA website. The format for this website is not typical and resembles an ASCII or TSV file. Each line highlights the year & the rolling average ACE for each month of the year.

Screenshot NOAA ACE Data

Given the nature of the webpage (looking at source code), we can directly read the lines using the readlines() function. This information is stored in a character vector wbpg2. We do not require the first line and the last five lines. Then we have to do a series of string substitutions to clean up the data.

  1. Since the white-spaces between the values are not uniformly distributed, let’s replace all the single & double spaces with and
  2. Next, let’s get rid of the double ‘andand’ and replace with single ‘and’
  3. There is still a space at the start of each row. To remove that let’s use the str_replace option.

Once, the clean up is done. We can convert it to a data.frame.

# Read & store lines in wbpg2 in character vector
wbpg2   <- readLines("https://www.psl.noaa.gov/gcos_wgsp/Timeseries/Hurricane/hurr.atl.ace.data")
# Eliminate first and last five lines
df.temp <- wbpg2[-c(1,171:178)]
# 
df.temp <- gsub(pattern = ' |  |   ',replacement = 'and',x = df.temp)
df.temp <- gsub(pattern = 'andand',replacement = 'and',x = df.temp)
df.temp <- str_replace(string = df.temp,pattern = 'and',replacement = '')
df.temp <- as.data.frame(df.temp)
clnames <- c("Year","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
ACE_Monthly_DF <- df.temp %>% 
  separate(df.temp, clnames, "and") %>% 
  gather(key = "Month",value = "ACE",-Year) %>% 
  mutate(Year = as.numeric(Year), ACE = as.numeric(ACE)) %>% 
  arrange(Year) %>% 
  select(Year,Month,ACE)

write.csv(ACE_Monthly_DF,'ACE_Monthly_DF.csv')
#ACE_Monthly_DF <- read.csv('ACE_Monthly_DF.csv')

Once again, now lets view the data.frame using Kable.

ACE_Monthly_DF %>% slice(5:9) %>%
  kbl(caption = "ACE DataFrame (Monthly)") %>%
  kable_paper("hover", full_width = F, position = "center") %>% 
  row_spec(0,color="black",background = "grey")
ACE DataFrame (Monthly)
Year Month ACE
1851 May 0.00
1851 Jun 4.91
1851 Jul 0.89
1851 Aug 21.83
1851 Sep 4.00

ONI Data :: Download from RSOI Package

The download_enso() function from the RSOI package directly downloads the data into a dataframe. The data is downloaded in monthly intervals. For comparison with ACE Yearly values, let’s summarize the data into annual data also. We are only interested in ONI data, hence, we are only retaining ONI data. Once the yearly average ONI has been calculated, we can then assign the phase based on the ONI value.

enso_monthly_DF <- download_enso()
enso_yearly_DF <- enso_monthly_DF %>% group_by(Year) %>% summarise(ONI=mean(ONI))
enso_yearly_DF <- enso_yearly_DF %>%  mutate(phase = case_when(ONI > 0.5 ~ "Warm Phase/El Nino", 
                                ONI < -0.5  ~ "Cool Phase/La Nina",
                                ONI < 0.5 & ONI >-0.5  ~ "Neutral Phase"))
write.csv(enso_monthly_DF,'enso_monthly_DF.csv')
write.csv(enso_yearly_DF,'enso_yearly_DF.csv')

#enso_yearly_DF <- read.csv('enso_yearly_DF.csv')
#enso_monthly_DF <- read.csv('enso_monthly_DF.csv')

Visualization

Now that the data is loaded into the appropriate dataframes, let’s visualize the data. For visualization, we shall use the plotly package. For ease of use, it would be useful to store font information as a list. For this purpose, I have created two lists (t1 & t2) to store font information that will be used for axis title and tick fonts respectively.

# Merging ACE_Yearly_DF and enso_yearly_DF for ease. 
DF_Graph_Yearly <- merge(ACE_Yearly_DF,enso_yearly_DF,by="Year")


# Font Descriptions to use for Plot. 
t1 <- list(family = "Arial, sans-serif", size = 13, color = "black")
t2 <- list(family = "Arial, sans-serif", size = 12, color = "black")

#Creating a DF with Mean_ACE & Mean_Tropical Storms for use in plot 
hz_line_mean <- data.frame("Year"= DF_Graph_Yearly$Year,"Mean_TS"=rep(mean(ACE_Yearly_DF$TS),nrow(DF_Graph_Yearly)),"Mean_ACE"=rep(mean(ACE_Yearly_DF$ACE),nrow(DF_Graph_Yearly)))

fig.1 <- plot_ly(ACE_Yearly_DF)
# Adding Tropical Storms as bar chart
fig.1 <- fig.1 %>% add_trace(data=DF_Graph_Yearly,x=~Year,y=~MTS,type="bar",name="Minor Tropical Storms",marker = list(color = 'rgb(158,202,225)'))
# Adding Hurricanes as line chart
fig.1 <- fig.1 %>% add_trace(x=~Year, y=~HU, name = 'Hurricanes',type="scatter",mode="lines",line=list(width=4))
# Adding Major Hurricanes as line chart
fig.1 <- fig.1 %>% add_trace(x=~Year, y=~MH, name = 'Major Hurricanes',type="scatter",mode="lines",line=list(width=4))
#Adding Mean Tropical Storms Line to Plot
fig.1 <- fig.1 %>% add_trace(x=hz_line_mean$Year,y=hz_line_mean$Mean_TS,name="Mean Total Tropical Storms",            
                             type="scatter",mode="lines",line=list(width=3,dash='dash',color='blue'))
fig.1 <- fig.1 %>% layout(title="Tropical Storms - Atlantic Basin",
                          yaxis = list(title = 'Count of Tropical Storms & Hurricanes',titlefont=t1,tickfont=t2,showgrid=F,showline = T), 
                          annotations = list(text = "Increase in ACE since 2005", font=t1, x = 2010, y = 20,showarrow=FALSE ),
                          shapes = list(list(type = "rect", fillcolor = "grey", line = list(color = "grey"), opacity = 0.25,
                                              x0 = 2000, x1 = 2020, xref = "x",
                                              y0 = 0, y1 = 30, yref = "y")),
                          barmode = 'group',
                          legend = list(orientation = 'v',x=0.05,y=0.9),
                          xaxis=list(autotick=FALSE, dtick=5,title="Year",titlefont=t1,tickfont=t2,showgrid=F))
fig.1
## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
##### NOTE: CODE FROM GRAPHS HAVE BEEN SOURCED FROM PLOTLY

We can see that there is definitely an increase in number of tropical storms in the Atlantic Ocean since 2005. Similarly let’s look at the ACE data over the years.

fig.2 <- plot_ly(ACE_Yearly_DF)
# Adding ACE as bar chart
fig.2 <- fig.2 %>% add_trace(data=DF_Graph_Yearly,x=~Year,y=~ACE,type="bar",name="Minor Tropical Storms",marker = list(color = 'rgb(158,202,225)'))
#Adding Mean Tropical Storms Line to Plot
fig.2 <- fig.2 %>% add_trace(x=hz_line_mean$Year,y=hz_line_mean$Mean_ACE,name="Mean ACE",            
                             type="scatter",mode="lines",line=list(width=3,dash='dash',color='blue'))
fig.2 <- fig.2 %>% layout(title="Accumulated Cyclone Energy - Atlantic Basin",
                          yaxis = list(title = 'ACE (10^4 kn^2)',titlefont=t1,tickfont=t2,showgrid=F,showline = T), 
                          annotations = list(text = "Increase in ACE since 2005", font=t1, x = 2010, y = 200,showarrow=FALSE ),
                          shapes = list(list(type = "rect", fillcolor = "grey", line = list(color = "grey"), opacity = 0.25,
                                              x0 = 2000, x1 = 2020, xref = "x",
                                              y0 = 0, y1 = 250, yref = "y")),
                          barmode = 'group',
                          legend = list(orientation = 'v',x=0.05,y=0.9),
                          xaxis=list(autotick=FALSE, dtick=5,title="Year",titlefont=t1,tickfont=t2,showgrid=F))
fig.2

Now, lets look at the ONI over the years. We can see that there is a period cycling on El Nino & La Nina conditions.

#Creating a DF with El Nino & La Nina thresholds
hz_line_ONI <- data.frame("Year"= DF_Graph_Yearly$Year,"LaNina"=rep(-0.5,nrow(DF_Graph_Yearly)),"ElNino"=rep(0.5,nrow(DF_Graph_Yearly)))

# Font Descriptions to use for Plot - Annotations
f1 <- list( family = "Arial, sans-serif", size = 13, color = "red" )
f2 <- list( family = "Arial, sans-serif", size = 13, color = "blue")


fig.3 <- plot_ly(DF_Graph_Yearly)
# Adding ONI as bar chart - red for positive & blue for negative
fig.3 <- fig.3 %>% add_trace(data=DF_Graph_Yearly,x=~Year,y=~ONI,type="bar",name="ONI",color = ~ONI < 0, colors = c("orangered", "royalblue"))
# Adding La Nina Threshold
fig.3 <- fig.3 %>% add_trace(x=hz_line_ONI$Year,y=hz_line_ONI$LaNina,name="La Nina Threshold",            
                             type="scatter",mode="lines",line=list(width=3,dash='dash',color='blue'))
# Adding El Nino Threshold
fig.3 <- fig.3 %>% add_trace(x=hz_line_ONI$Year,y=hz_line_ONI$ElNino,name="El Nino Threshold",            
                             type="scatter",mode="lines",line=list(width=3,dash='dash',color='red'))
fig.3 <- fig.3 %>% layout(title="Oceanic Niño Index",
                          yaxis = list(title = 'ONI (°C)',titlefont=t1,tickfont=t2,showgrid=F,showline = T),
                          annotations = list(
                            list(text = "El Nino Threshold", font=f1, x = 2009, y = 0.60,showarrow=FALSE ),
                            list(text = "La Nina Threshold", font=f2, x = 1963, y = -0.60,showarrow=FALSE )),
                          showlegend = FALSE,
                          xaxis=list(autotick=FALSE, dtick=5,title="Year",titlefont=t1,tickfont=t2,showgrid=F))
fig.3

Data Analysis - Trend Analysis

Now that we have explored the data, we can investigate the relationships between the ACE and ONI. As a first step, let’s plot ACE & ONI on the same plot. Let’s look at the monthly data from 2000 to 2020 as a first step.

# Merging ACE_Yearly_DF and enso_yearly_DF for ease.
DF_Graph_Monthly <- merge(ACE_Monthly_DF,enso_monthly_DF,by=c("Year","Month"))

#margin descriptions
m = list(l = 35,r = 40,b = 20,t = 50,pad = 1)

fig.4 <- plot_ly(DF_Graph_Monthly)
# Adding ACE data to Plot
fig.4 <- fig.4 %>% add_trace(data=DF_Graph_Monthly,x=~Date,y=~ACE,type="bar",name="ACE-Monthly")
#Adding ONI data to Plot
fig.4 <- fig.4 %>% add_trace(data=DF_Graph_Monthly,x=~Date,y=~ONI,type="bar",name="ONI-Monthly", yaxis='y2',marker=list(opacity=0.4))
fig.4 <- fig.4 %>%  layout(title="Accumulated Cyclone Energy - Atlantic Basin (2005-2020)",
                           yaxis = list(title = 'ACE (10^4 kn^2)', titlefont=t1,tickfont=t2,showgrid=F,showline = T,side='left',zeroline=F,range=c(-180,180)), 
                           yaxis2 = list(title = 'ONI °C', titlefont=t1,tickfont=t2,showgrid=F,showline = T,side='right',overlaying = "y",zeroline=F,range=c(-3,3)), 
                          legend = list(orientation = 'v',x=0.05,y=0.9),
                          xaxis=list(autotick=T,title="Year",titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F,
                                     range = c("2005-01-01", "2020-01-01")),
                          margin = m)
fig.4

To view long-term trends, we can use the LOESS (locally weighted smoothing) function to generate a smoothed out line. Loess regression can be applied using the loess() on a numerical vector to smoothen it and to predict the Y locally (i.e, within the trained values of Xs). The size of the neighborhood can be controlled using the span argument, which ranges between 0 to 1. It controls the degree of smoothing. So, the greater the value of span, more smooth is the fitted curve. In this case, let’s try two smoothing values of 0.05 and 0.25. First, we have to build a model and then use the model to predict values of ACE or ONI based on the x-axis i.e. Year in this case.

ACE_y_smooth <- loess(DF_Graph_Yearly$ACE~DF_Graph_Yearly$Year,span=0.05)
ONI_y_smooth <- loess(DF_Graph_Yearly$ONI~DF_Graph_Yearly$Year,span=0.05)
ACE_y_smooth_high <- loess(DF_Graph_Yearly$ACE~DF_Graph_Yearly$Year,span=0.25)
ONI_y_smooth_high <- loess(DF_Graph_Yearly$ONI~DF_Graph_Yearly$Year,span=0.25)
ACE_Pred <-predict(ACE_y_smooth,se=TRUE)
ONI_Pred <-predict(ONI_y_smooth,se=TRUE)
ACE_Pred_high <-predict(ACE_y_smooth_high,se=TRUE)
ONI_Pred_high <-predict(ONI_y_smooth_high,se=TRUE)

fig.5 <- plot_ly(DF_Graph_Yearly)
fig.5 <- fig.5 %>% add_trace(data=DF_Graph_Yearly,x=DF_Graph_Yearly$Year,y=ACE_Pred$fit,type="scatter", mode="line",name="ACE-Smoothed Span=0.05",line=list(color="orangered",width=4))
fig.5 <- fig.5 %>% add_trace(data=DF_Graph_Yearly,x=DF_Graph_Yearly$Year,y=ACE_Pred_high$fit,type="scatter", mode="line",name="ACE-Smoothed Span=0.25",line=list(color="orangered",dash="dash",width=4))
fig.5 <- fig.5 %>% add_trace(data=DF_Graph_Yearly,x=DF_Graph_Yearly$Year,y=ONI_Pred$fit,type="scatter", mode="line",name="ONI-Smoothed Span=0.05", yaxis='y2',line=list(color="navy",width=4))
fig.5 <- fig.5 %>% add_trace(data=DF_Graph_Yearly,x=DF_Graph_Yearly$Year,y=ONI_Pred_high$fit,type="scatter", mode="line",name="ONI-Smoothed Span=0.25", yaxis='y2',line=list(color="navy",dash="dash",width=4))
fig.5 <- fig.5 %>%  layout(title="Accumulated Cyclone Energy - Atlantic Basin",
                            yaxis = list(title = 'ACE (10^4 kn^2)', titlefont=t1,tickfont=t2,showgrid=F,showline = T,side='left',zeroline=F), 
                            yaxis2 = list(title = 'ONI °C', titlefont=t1,tickfont=t2,showgrid=F,showline = T,side='right',overlaying = "y",zeroline=F), 
                           legend = list(orientation = 'v',x=0.45,y=0.9),
                           xaxis=list(autotick=FALSE, dtick=5,title="Year",titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F,range = c(1995, 2020)),
                           margin = m)

fig.5 

Based on the chart, it is hard to determine if there is relationship between the two.

Let us create a cross-plot of ACE with ONI.

fig.6 <- plot_ly(DF_Graph_Yearly)
fig.6 <- fig.6 %>% add_trace(data=DF_Graph_Yearly,x=~ACE,y=~ONI,type="scatter",color=~phase)
fig.6 <- fig.6 %>% layout(title="Crossplot: ACE vs ONI",
                          yaxis = list(title = 'ONI (°C)',titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F),
                          legend = list(orientation = 'v',x=0.7,y=0.9),
                          xaxis=list(autotick=F, dtick=20,title="ACE (10^4 kn^2)",titlefont=t1,tickfont=t2,showline = T,showgrid=F,
                                     type="linear",range=c(0,300)))
fig.6

If we calculate the calculate the correlation between the two, we can see that the correlation is -0.253. This is using the Yearly data. The correlation coefficient calculated using the monthly data is -0.08 It seems the annual data perhaps removes some of the noise.

Conditional Probability

The prevailing school of thought is that El Niño conditions favors stronger hurricane activity in the central and eastern Pacific basins, and suppresses it in the Atlantic basin. Conversely, La Niña suppresses hurricane activity in the central and eastern Pacific basins, and enhances it in the Atlantic basin Let’s look at this by looking at the conditional probabilities.

In order to calculate the conditional probabilities, lets classify the hurricane activity based on the ACE. The average ACE(monthly data) is approximately 10 x 104 kn2. Based on this, we can classify the activity as

  • ACE > 10 = “Active”
  • ACE < 4 = “Not Active”
  • ACE > 4 & <10 = “Moderate”
DF_Graph_Monthly <- DF_Graph_Monthly %>%  mutate(activity = case_when(ACE > 10 ~ "Active", 
                                ACE < 4  ~ "Not Active",
                                ACE > 4 & ACE <10 ~ "Moderate"))
cond_prob_monthly <- prop.table(table(DF_Graph_Monthly$activity, DF_Graph_Monthly$phase), margin = 1)
cond_prob_monthly %>% 
  kbl(caption = "Joint and Conditional Probability Tables - Monthly Data",digits=2) %>%
   kable_paper("hover", full_width = T, position = "center") %>% 
  row_spec(0,color="black",background = "grey")
Joint and Conditional Probability Tables - Monthly Data
Cool Phase/La Nina Neutral Phase Warm Phase/El Nino
Active 0.28 0.50 0.22
Moderate 0.25 0.52 0.23
Not Active 0.25 0.46 0.29

We can check the same with the annual data. The annual data already has been classified by the NOAA. The conditional probabilities are listed below.

DF_Graph_Yearly <- merge(ACE_Yearly_DF,enso_yearly_DF,by=c("Year"))
cond_prob_yearly <- prop.table(table(DF_Graph_Yearly$Classification, DF_Graph_Yearly$phase), margin = 1)

cond_prob_yearly %>% 
  kbl(caption = "Joint and Conditional Probability Tables - Yearly Data",digits=2) %>%
   kable_paper("hover", full_width = T, position = "center") %>% 
  row_spec(0,color="black",background = "grey")
Joint and Conditional Probability Tables - Yearly Data
Cool Phase/La Nina Neutral Phase Warm Phase/El Nino
Above normal 0.33 0.44 0.22
Below normal 0.14 0.48 0.38
Extremely active 0.14 0.79 0.07
Near normal 0.27 0.50 0.23

While is hard to see a high hurricane activity during the La Nina season, we can definitely see a low occurance of cyclone activity during El Nino conditions

As a next step, lets look at building a linear regression model relating ACE & ONI. In the linear model, in addition to ONI, lets also consider the La Nina/El Nino phase as categorical variables.

DF_Graph_Yearly$phase <- as.factor(DF_Graph_Yearly$phase) 
DF_Graph_Yearly$phase <- relevel(DF_Graph_Yearly$phase, ref = "Neutral Phase")
lm_yearly <- lm(DF_Graph_Yearly$ACE~DF_Graph_Yearly$ONI+DF_Graph_Yearly$phase,data=DF_Graph_Yearly)
summary(lm_yearly)
## 
## Call:
## lm(formula = DF_Graph_Yearly$ACE ~ DF_Graph_Yearly$ONI + DF_Graph_Yearly$phase, 
##     data = DF_Graph_Yearly)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -95.563 -37.905  -7.758  34.958 136.446 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              116.030      8.785  13.207   <2e-16
## DF_Graph_Yearly$ONI                      -53.162     24.886  -2.136   0.0364
## DF_Graph_Yearly$phaseCool Phase/La Nina  -54.058     26.073  -2.073   0.0421
## DF_Graph_Yearly$phaseWarm Phase/El Nino    6.051     25.470   0.238   0.8129
##                                            
## (Intercept)                             ***
## DF_Graph_Yearly$ONI                     *  
## DF_Graph_Yearly$phaseCool Phase/La Nina *  
## DF_Graph_Yearly$phaseWarm Phase/El Nino    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 53.99 on 66 degrees of freedom
## Multiple R-squared:  0.1315, Adjusted R-squared:  0.09202 
## F-statistic: 3.331 on 3 and 66 DF,  p-value: 0.02471

Let’s finally look at some diagnostics for the linear regression model and see if the fit is any better. We can see that the while the R2 is not very strong, ONI is indicated as a significant predictor.

Let’s plot the model on the crossplot to see the fit.

prediction_df <- data.frame(DF_Graph_Yearly$ONI,DF_Graph_Yearly$phase)

prediction_answer <- predict(lm_yearly,prediction_df)

fig.7 <- plot_ly(DF_Graph_Yearly)
fig.7 <- fig.7 %>% add_trace(data=DF_Graph_Yearly,x=~ACE,y=~ONI,type="scatter",color=~phase)
fig.7 <- fig.7 %>% add_trace(data=DF_Graph_Yearly,x=~prediction_answer,y=~ONI,type="scatter",mode="line",color=~phase)
fig.7 <- fig.7 %>% layout(title="Crossplot: ACE vs ONI",
                          yaxis = list(title = 'ONI (°C)',titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F),
                          legend = list(orientation = 'v',x=0.7,y=0.9),
                          xaxis=list(autotick=F, dtick=20,title="ACE (10^4 kn^2)",titlefont=t1,tickfont=t2,showline = T,showgrid=F,
                                     type="linear",range=c(0,300)))
fig.7

The fit does not look very good. Finally, for completeness, let’s also look at the residuals - both as a histogram and a cross-plot of residual vs. ONI to check for any heteroskedasticity.

fig.8 <- plot_ly(DF_Graph_Yearly)
fig.8 <- fig.8 %>% add_trace(data=DF_Graph_Yearly,x=~ONI,y=lm_yearly$residuals,type="scatter")
fig.8 <- fig.8 %>% layout(title="Crossplot of ONI vs Residuals",
                          yaxis = list(title = 'Residual',titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F),
                          xaxis=list(autotick=T,title="ONI (°C)",titlefont=t1,tickfont=t2,showline = T,showgrid=F,zeroline=F,
                                     type="linear"))
fig.8
fig.9 <- plot_ly(DF_Graph_Yearly)
fig.9 <- fig.9 %>% add_trace(x=lm_yearly$residuals,type="histogram")
fig.9 <- fig.9 %>% layout(title="Histogram of Residuals",
                          yaxis = list(title = 'Frequency',titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F),
                          xaxis=list(autotick=T,title="Residual Value",titlefont=t1,tickfont=t2,showline = T,showgrid=F,zeroline=F,
                                     type="linear"))

fig.9

The residuals look fairly normally distributed from the above plots.


Conclusions

Based on the analysis that we have done thus far, it is hard to say that ONI is strong predictor of ACE and hence Hurricane Activity. We have definitely see that there is a link between the two, especially when considering the conditional probabilities. We perhaps have to account for other parameters (as is often the case in the complex world of weather forecasting)