Introduction

With this mini-project, I endeavour to look at the factors that impact Hurricanes in the Atlantic Ocean. The 2020 Atlantic hurricane season is the most active Atlantic hurricane season on record, featuring tropical cyclone formation at a record-breaking rate. There have been 31 tropical or subtropical cyclones, 30 named storms, 13 hurricanes, and six major hurricanes, including one Category 5 hurricane.

Before we jump into the coding aspects, I would like to clarify a few concepts related to the topic to help understand the project.

ACE (Accumulated Cyclone Energy) is a unit of measurement of a hurricane season’s activity. It is calculated for a storm by taking the storm’s wind speed in knots, squaring it, dividing it by 10000, then summing these values for each advisory.
El Niño is the abnormal appearance of unusually high ocean temperatures along South America’s tropical west coast. It has been associated with adverse effects on fishing, agriculture and local climate in South America. Additionally, it is said to be related to anomalous weather behaviour in Asia & North America.
La Niña is the counterpart of El Niño - consisting of cooling the Pacific Ocean’s surface waters. Its local effects are generally the opposite of those associated with El Niño. However, its global effects are thought to be more complex.
ONI (Oceanic Niño Index) is a measure of the departure from average sea surface temperature in the east-central Pacific Ocean. This is the standard means by which each El Niño or La Niña episode is determined, gauged, and forecast. Its unit is degree C.
- An ONI Value less than -0.5°C indicates a La Niña episode¹
- An ONI Value greater than 0.5°C indicates an El Niño episode
- An ONI Value between -0.5°C and 0.5°C is indicative of a neutral phase

Compared with El Niño conditions, La Niña conditions are generally more favorable for the formation of Atlantic hurricanes².

With that in mind, let’s investigate the relationship between ACE and ONI. Is the relationship evident.

Data Gathering

The first step is to gather the required information. For our analysis, we will use the following resources.

ACE Information - Yearly (from Wikipedia - https://en.wikipedia.org/wiki/Accumulated_cyclone_energy)
ACE Information - Monthly (from National Oceanic and Atmospheric Administration - https://www.noaa.gov/)
ONI Information - Monthly (from R Package - RSOI - https://cran.r-project.org/web/packages/rsoi/)

ACE Data - Yearly :: Scrape from Wikipedia

First, let us web-scrape from Wikipedia. The ACE information is written in Table format on the Wikipedia page. Below is a screenshot of the page that we are looking to scrape.

Screenshot from Wikipedia-Atlantic Ocean ACE history

To scrape from Wikipedia, we will require the rvest package.

The first step is to read the page into an object called wbpg using the read_html() function. Then we extract any table information into the tbl object. Once, we have done that, we then parse specifically for the Accumulated Cyclone Energy - Atlantic table using the grep function and read that into a data.frame.

# Load rvest library
library(rvest)
library(dplyr)
library(kableExtra) 
library(tidyverse)
library(rsoi)
library(plotly)
#### NOTE: SOURCE: https://www.r-bloggers.com/2018/09/scraping-tables-from-wikipedia-for-visualizing-climate-data/
url  <- "https://en.wikipedia.org/wiki/Accumulated_cyclone_energy"
# Read HTML page into wbpg object
wbpg <- read_html(url)
# Read table information from wbpg object into tbl object
tbl  <- html_nodes(wbpg, "table")
# Read required table and save to ACE_Yearly_DF data.frame
ACE_Yearly_DF <- html_table(tbl[grep("Accumulated Cyclone Energy - Atlantic\n",tbl,ignore.case = T)],fill = T)[[1]]
colnames(ACE_Yearly_DF)[1]<-"Year"
# Add Minor Tropical Storms column by calculating it from Tropical Storms, Hurricanes & Major Hurricanes
ACE_Yearly_DF$MTS <- ACE_Yearly_DF$TS-ACE_Yearly_DF$HU--ACE_Yearly_DF$MH
# Storing dataframe to csv 
write.csv(ACE_Yearly_DF,'ACE_Yearly_DF.csv')

#ACE_Yearly_DF <- read.csv('ACE_Yearly_DF.csv')

Now that we have downloaded the data into the data.frame, let’s view the data.frame contents. In RMarkdown, to display the table - I am using the kableExtra package.

# kableExtra is a package to use with RMarkdown to display tables. 
# The dataframe is quite large and I onlt want to display the first five rows. These are piped to Kable using the dplyr slice function. 

ACE_Yearly_DF %>% slice(1:5) %>%
  kbl(caption = "ACE DataFrame (Yearly) - showing first 5 rows") %>%
  kable_paper("hover", full_width = F, position = "center") %>% 
  row_spec(0,color="black",background = "grey")

ACE DataFrame (Yearly) - showing first 5 rows
Year	ACE	TS	HU	MH	Classification	MTS
1851	36.24	6	3	1	Below normal	4
1852	73.28	5	5	1	Near normal	1
1853	76.49	8	4	2	Near normal	6
1854	31.00	5	3	1	Below normal	3
1855	18.12	5	4	1	Below normal	2

ACE Data - Monthly :: Scrape from NOAA

It would be beneficial also to download/scrape the monthly ACE data from the NOAA website. The format for this website is not typical and resembles an ASCII or TSV file. Each line highlights the year & the rolling average ACE for each month of the year.

Screenshot NOAA ACE Data

Given the nature of the webpage (looking at source code), we can directly read the lines using the readlines() function. This information is stored in a character vector wbpg2. We do not require the first line and the last five lines. Then we have to do a series of string substitutions to clean up the data.

Since the white-spaces between the values are not uniformly distributed, let’s replace all the single & double spaces with and
Next, let’s get rid of the double ‘andand’ and replace with single ‘and’
There is still a space at the start of each row. To remove that let’s use the str_replace option.

Once, the clean up is done. We can convert it to a data.frame.

# Read & store lines in wbpg2 in character vector
wbpg2   <- readLines("https://www.psl.noaa.gov/gcos_wgsp/Timeseries/Hurricane/hurr.atl.ace.data")
# Eliminate first and last five lines
df.temp <- wbpg2[-c(1,171:178)]
# 
df.temp <- gsub(pattern = ' |  |   ',replacement = 'and',x = df.temp)
df.temp <- gsub(pattern = 'andand',replacement = 'and',x = df.temp)
df.temp <- str_replace(string = df.temp,pattern = 'and',replacement = '')
df.temp <- as.data.frame(df.temp)
clnames <- c("Year","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")
ACE_Monthly_DF <- df.temp %>% 
  separate(df.temp, clnames, "and") %>% 
  gather(key = "Month",value = "ACE",-Year) %>% 
  mutate(Year = as.numeric(Year), ACE = as.numeric(ACE)) %>% 
  arrange(Year) %>% 
  select(Year,Month,ACE)

write.csv(ACE_Monthly_DF,'ACE_Monthly_DF.csv')
#ACE_Monthly_DF <- read.csv('ACE_Monthly_DF.csv')

Once again, now lets view the data.frame using Kable.

ACE_Monthly_DF %>% slice(5:9) %>%
  kbl(caption = "ACE DataFrame (Monthly)") %>%
  kable_paper("hover", full_width = F, position = "center") %>% 
  row_spec(0,color="black",background = "grey")

ACE DataFrame (Monthly)
Year	Month	ACE
1851	May	0.00
1851	Jun	4.91
1851	Jul	0.89
1851	Aug	21.83
1851	Sep	4.00

ONI Data :: Download from RSOI Package

The download_enso() function from the RSOI package directly downloads the data into a dataframe. The data is downloaded in monthly intervals. For comparison with ACE Yearly values, let’s summarize the data into annual data also. We are only interested in ONI data, hence, we are only retaining ONI data. Once the yearly average ONI has been calculated, we can then assign the phase based on the ONI value.

enso_monthly_DF <- download_enso()
enso_yearly_DF <- enso_monthly_DF %>% group_by(Year) %>% summarise(ONI=mean(ONI))
enso_yearly_DF <- enso_yearly_DF %>%  mutate(phase = case_when(ONI > 0.5 ~ "Warm Phase/El Nino", 
                                ONI < -0.5  ~ "Cool Phase/La Nina",
                                ONI < 0.5 & ONI >-0.5  ~ "Neutral Phase"))
write.csv(enso_monthly_DF,'enso_monthly_DF.csv')
write.csv(enso_yearly_DF,'enso_yearly_DF.csv')

#enso_yearly_DF <- read.csv('enso_yearly_DF.csv')
#enso_monthly_DF <- read.csv('enso_monthly_DF.csv')

Visualization

Now that the data is loaded into the appropriate dataframes, let’s visualize the data. For visualization, we shall use the plotly package. For ease of use, it would be useful to store font information as a list. For this purpose, I have created two lists (t1 & t2) to store font information that will be used for axis title and tick fonts respectively.

# Merging ACE_Yearly_DF and enso_yearly_DF for ease. 
DF_Graph_Yearly <- merge(ACE_Yearly_DF,enso_yearly_DF,by="Year")


# Font Descriptions to use for Plot. 
t1 <- list(family = "Arial, sans-serif", size = 13, color = "black")
t2 <- list(family = "Arial, sans-serif", size = 12, color = "black")

#Creating a DF with Mean_ACE & Mean_Tropical Storms for use in plot 
hz_line_mean <- data.frame("Year"= DF_Graph_Yearly$Year,"Mean_TS"=rep(mean(ACE_Yearly_DF$TS),nrow(DF_Graph_Yearly)),"Mean_ACE"=rep(mean(ACE_Yearly_DF$ACE),nrow(DF_Graph_Yearly)))

fig.1 <- plot_ly(ACE_Yearly_DF)
# Adding Tropical Storms as bar chart
fig.1 <- fig.1 %>% add_trace(data=DF_Graph_Yearly,x=~Year,y=~MTS,type="bar",name="Minor Tropical Storms",marker = list(color = 'rgb(158,202,225)'))
# Adding Hurricanes as line chart
fig.1 <- fig.1 %>% add_trace(x=~Year, y=~HU, name = 'Hurricanes',type="scatter",mode="lines",line=list(width=4))
# Adding Major Hurricanes as line chart
fig.1 <- fig.1 %>% add_trace(x=~Year, y=~MH, name = 'Major Hurricanes',type="scatter",mode="lines",line=list(width=4))
#Adding Mean Tropical Storms Line to Plot
fig.1 <- fig.1 %>% add_trace(x=hz_line_mean$Year,y=hz_line_mean$Mean_TS,name="Mean Total Tropical Storms",            
                             type="scatter",mode="lines",line=list(width=3,dash='dash',color='blue'))
fig.1 <- fig.1 %>% layout(title="Tropical Storms - Atlantic Basin",
                          yaxis = list(title = 'Count of Tropical Storms & Hurricanes',titlefont=t1,tickfont=t2,showgrid=F,showline = T), 
                          annotations = list(text = "Increase in ACE since 2005", font=t1, x = 2010, y = 20,showarrow=FALSE ),
                          shapes = list(list(type = "rect", fillcolor = "grey", line = list(color = "grey"), opacity = 0.25,
                                              x0 = 2000, x1 = 2020, xref = "x",
                                              y0 = 0, y1 = 30, yref = "y")),
                          barmode = 'group',
                          legend = list(orientation = 'v',x=0.05,y=0.9),
                          xaxis=list(autotick=FALSE, dtick=5,title="Year",titlefont=t1,tickfont=t2,showgrid=F))
fig.1

## Warning: `arrange_()` is deprecated as of dplyr 0.7.0.
## Please use `arrange()` instead.
## See vignette('programming') for more help
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.

##### NOTE: CODE FROM GRAPHS HAVE BEEN SOURCED FROM PLOTLY

We can see that there is definitely an increase in number of tropical storms in the Atlantic Ocean since 2005. Similarly let’s look at the ACE data over the years.

fig.2 <- plot_ly(ACE_Yearly_DF)
# Adding ACE as bar chart
fig.2 <- fig.2 %>% add_trace(data=DF_Graph_Yearly,x=~Year,y=~ACE,type="bar",name="Minor Tropical Storms",marker = list(color = 'rgb(158,202,225)'))
#Adding Mean Tropical Storms Line to Plot
fig.2 <- fig.2 %>% add_trace(x=hz_line_mean$Year,y=hz_line_mean$Mean_ACE,name="Mean ACE",            
                             type="scatter",mode="lines",line=list(width=3,dash='dash',color='blue'))
fig.2 <- fig.2 %>% layout(title="Accumulated Cyclone Energy - Atlantic Basin",
                          yaxis = list(title = 'ACE (10^4 kn^2)',titlefont=t1,tickfont=t2,showgrid=F,showline = T), 
                          annotations = list(text = "Increase in ACE since 2005", font=t1, x = 2010, y = 200,showarrow=FALSE ),
                          shapes = list(list(type = "rect", fillcolor = "grey", line = list(color = "grey"), opacity = 0.25,
                                              x0 = 2000, x1 = 2020, xref = "x",
                                              y0 = 0, y1 = 250, yref = "y")),
                          barmode = 'group',
                          legend = list(orientation = 'v',x=0.05,y=0.9),
                          xaxis=list(autotick=FALSE, dtick=5,title="Year",titlefont=t1,tickfont=t2,showgrid=F))
fig.2

Now, lets look at the ONI over the years. We can see that there is a period cycling on El Nino & La Nina conditions.

#Creating a DF with El Nino & La Nina thresholds
hz_line_ONI <- data.frame("Year"= DF_Graph_Yearly$Year,"LaNina"=rep(-0.5,nrow(DF_Graph_Yearly)),"ElNino"=rep(0.5,nrow(DF_Graph_Yearly)))

# Font Descriptions to use for Plot - Annotations
f1 <- list( family = "Arial, sans-serif", size = 13, color = "red" )
f2 <- list( family = "Arial, sans-serif", size = 13, color = "blue")


fig.3 <- plot_ly(DF_Graph_Yearly)
# Adding ONI as bar chart - red for positive & blue for negative
fig.3 <- fig.3 %>% add_trace(data=DF_Graph_Yearly,x=~Year,y=~ONI,type="bar",name="ONI",color = ~ONI < 0, colors = c("orangered", "royalblue"))
# Adding La Nina Threshold
fig.3 <- fig.3 %>% add_trace(x=hz_line_ONI$Year,y=hz_line_ONI$LaNina,name="La Nina Threshold",            
                             type="scatter",mode="lines",line=list(width=3,dash='dash',color='blue'))
# Adding El Nino Threshold
fig.3 <- fig.3 %>% add_trace(x=hz_line_ONI$Year,y=hz_line_ONI$ElNino,name="El Nino Threshold",            
                             type="scatter",mode="lines",line=list(width=3,dash='dash',color='red'))
fig.3 <- fig.3 %>% layout(title="Oceanic Niño Index",
                          yaxis = list(title = 'ONI (°C)',titlefont=t1,tickfont=t2,showgrid=F,showline = T),
                          annotations = list(
                            list(text = "El Nino Threshold", font=f1, x = 2009, y = 0.60,showarrow=FALSE ),
                            list(text = "La Nina Threshold", font=f2, x = 1963, y = -0.60,showarrow=FALSE )),
                          showlegend = FALSE,
                          xaxis=list(autotick=FALSE, dtick=5,title="Year",titlefont=t1,tickfont=t2,showgrid=F))
fig.3

Data Analysis - Trend Analysis

Now that we have explored the data, we can investigate the relationships between the ACE and ONI. As a first step, let’s plot ACE & ONI on the same plot. Let’s look at the monthly data from 2000 to 2020 as a first step.

# Merging ACE_Yearly_DF and enso_yearly_DF for ease.
DF_Graph_Monthly <- merge(ACE_Monthly_DF,enso_monthly_DF,by=c("Year","Month"))

#margin descriptions
m = list(l = 35,r = 40,b = 20,t = 50,pad = 1)

fig.4 <- plot_ly(DF_Graph_Monthly)
# Adding ACE data to Plot
fig.4 <- fig.4 %>% add_trace(data=DF_Graph_Monthly,x=~Date,y=~ACE,type="bar",name="ACE-Monthly")
#Adding ONI data to Plot
fig.4 <- fig.4 %>% add_trace(data=DF_Graph_Monthly,x=~Date,y=~ONI,type="bar",name="ONI-Monthly", yaxis='y2',marker=list(opacity=0.4))
fig.4 <- fig.4 %>%  layout(title="Accumulated Cyclone Energy - Atlantic Basin (2005-2020)",
                           yaxis = list(title = 'ACE (10^4 kn^2)', titlefont=t1,tickfont=t2,showgrid=F,showline = T,side='left',zeroline=F,range=c(-180,180)), 
                           yaxis2 = list(title = 'ONI °C', titlefont=t1,tickfont=t2,showgrid=F,showline = T,side='right',overlaying = "y",zeroline=F,range=c(-3,3)), 
                          legend = list(orientation = 'v',x=0.05,y=0.9),
                          xaxis=list(autotick=T,title="Year",titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F,
                                     range = c("2005-01-01", "2020-01-01")),
                          margin = m)
fig.4

To view long-term trends, we can use the LOESS (locally weighted smoothing) function to generate a smoothed out line. Loess regression can be applied using the loess() on a numerical vector to smoothen it and to predict the Y locally (i.e, within the trained values of Xs). The size of the neighborhood can be controlled using the span argument, which ranges between 0 to 1. It controls the degree of smoothing. So, the greater the value of span, more smooth is the fitted curve. In this case, let’s try two smoothing values of 0.05 and 0.25. First, we have to build a model and then use the model to predict values of ACE or ONI based on the x-axis i.e. Year in this case.

ACE_y_smooth <- loess(DF_Graph_Yearly$ACE~DF_Graph_Yearly$Year,span=0.05)
ONI_y_smooth <- loess(DF_Graph_Yearly$ONI~DF_Graph_Yearly$Year,span=0.05)
ACE_y_smooth_high <- loess(DF_Graph_Yearly$ACE~DF_Graph_Yearly$Year,span=0.25)
ONI_y_smooth_high <- loess(DF_Graph_Yearly$ONI~DF_Graph_Yearly$Year,span=0.25)
ACE_Pred <-predict(ACE_y_smooth,se=TRUE)
ONI_Pred <-predict(ONI_y_smooth,se=TRUE)
ACE_Pred_high <-predict(ACE_y_smooth_high,se=TRUE)
ONI_Pred_high <-predict(ONI_y_smooth_high,se=TRUE)

fig.5 <- plot_ly(DF_Graph_Yearly)
fig.5 <- fig.5 %>% add_trace(data=DF_Graph_Yearly,x=DF_Graph_Yearly$Year,y=ACE_Pred$fit,type="scatter", mode="line",name="ACE-Smoothed Span=0.05",line=list(color="orangered",width=4))
fig.5 <- fig.5 %>% add_trace(data=DF_Graph_Yearly,x=DF_Graph_Yearly$Year,y=ACE_Pred_high$fit,type="scatter", mode="line",name="ACE-Smoothed Span=0.25",line=list(color="orangered",dash="dash",width=4))
fig.5 <- fig.5 %>% add_trace(data=DF_Graph_Yearly,x=DF_Graph_Yearly$Year,y=ONI_Pred$fit,type="scatter", mode="line",name="ONI-Smoothed Span=0.05", yaxis='y2',line=list(color="navy",width=4))
fig.5 <- fig.5 %>% add_trace(data=DF_Graph_Yearly,x=DF_Graph_Yearly$Year,y=ONI_Pred_high$fit,type="scatter", mode="line",name="ONI-Smoothed Span=0.25", yaxis='y2',line=list(color="navy",dash="dash",width=4))
fig.5 <- fig.5 %>%  layout(title="Accumulated Cyclone Energy - Atlantic Basin",
                            yaxis = list(title = 'ACE (10^4 kn^2)', titlefont=t1,tickfont=t2,showgrid=F,showline = T,side='left',zeroline=F), 
                            yaxis2 = list(title = 'ONI °C', titlefont=t1,tickfont=t2,showgrid=F,showline = T,side='right',overlaying = "y",zeroline=F), 
                           legend = list(orientation = 'v',x=0.45,y=0.9),
                           xaxis=list(autotick=FALSE, dtick=5,title="Year",titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F,range = c(1995, 2020)),
                           margin = m)

fig.5

Based on the chart, it is hard to determine if there is relationship between the two.

Let us create a cross-plot of ACE with ONI.

fig.6 <- plot_ly(DF_Graph_Yearly)
fig.6 <- fig.6 %>% add_trace(data=DF_Graph_Yearly,x=~ACE,y=~ONI,type="scatter",color=~phase)
fig.6 <- fig.6 %>% layout(title="Crossplot: ACE vs ONI",
                          yaxis = list(title = 'ONI (°C)',titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F),
                          legend = list(orientation = 'v',x=0.7,y=0.9),
                          xaxis=list(autotick=F, dtick=20,title="ACE (10^4 kn^2)",titlefont=t1,tickfont=t2,showline = T,showgrid=F,
                                     type="linear",range=c(0,300)))
fig.6

If we calculate the calculate the correlation between the two, we can see that the correlation is -0.253. This is using the Yearly data. The correlation coefficient calculated using the monthly data is -0.08 It seems the annual data perhaps removes some of the noise.

Conditional Probability

The prevailing school of thought is that El Niño conditions favors stronger hurricane activity in the central and eastern Pacific basins, and suppresses it in the Atlantic basin. Conversely, La Niña suppresses hurricane activity in the central and eastern Pacific basins, and enhances it in the Atlantic basin Let’s look at this by looking at the conditional probabilities.

In order to calculate the conditional probabilities, lets classify the hurricane activity based on the ACE. The average ACE(monthly data) is approximately 10 x 10⁴ kn². Based on this, we can classify the activity as

ACE > 10 = “Active”
ACE < 4 = “Not Active”
ACE > 4 & <10 = “Moderate”

DF_Graph_Monthly <- DF_Graph_Monthly %>%  mutate(activity = case_when(ACE > 10 ~ "Active", 
                                ACE < 4  ~ "Not Active",
                                ACE > 4 & ACE <10 ~ "Moderate"))
cond_prob_monthly <- prop.table(table(DF_Graph_Monthly$activity, DF_Graph_Monthly$phase), margin = 1)
cond_prob_monthly %>% 
  kbl(caption = "Joint and Conditional Probability Tables - Monthly Data",digits=2) %>%
   kable_paper("hover", full_width = T, position = "center") %>% 
  row_spec(0,color="black",background = "grey")

Joint and Conditional Probability Tables - Monthly Data
	Cool Phase/La Nina	Neutral Phase	Warm Phase/El Nino
Active	0.28	0.50	0.22
Moderate	0.25	0.52	0.23
Not Active	0.25	0.46	0.29

We can check the same with the annual data. The annual data already has been classified by the NOAA. The conditional probabilities are listed below.

DF_Graph_Yearly <- merge(ACE_Yearly_DF,enso_yearly_DF,by=c("Year"))
cond_prob_yearly <- prop.table(table(DF_Graph_Yearly$Classification, DF_Graph_Yearly$phase), margin = 1)

cond_prob_yearly %>% 
  kbl(caption = "Joint and Conditional Probability Tables - Yearly Data",digits=2) %>%
   kable_paper("hover", full_width = T, position = "center") %>% 
  row_spec(0,color="black",background = "grey")

Joint and Conditional Probability Tables - Yearly Data
	Cool Phase/La Nina	Neutral Phase	Warm Phase/El Nino
Above normal	0.33	0.44	0.22
Below normal	0.14	0.48	0.38
Extremely active	0.14	0.79	0.07
Near normal	0.27	0.50	0.23

While is hard to see a high hurricane activity during the La Nina season, we can definitely see a low occurance of cyclone activity during El Nino conditions

As a next step, lets look at building a linear regression model relating ACE & ONI. In the linear model, in addition to ONI, lets also consider the La Nina/El Nino phase as categorical variables.

DF_Graph_Yearly$phase <- as.factor(DF_Graph_Yearly$phase) 
DF_Graph_Yearly$phase <- relevel(DF_Graph_Yearly$phase, ref = "Neutral Phase")
lm_yearly <- lm(DF_Graph_Yearly$ACE~DF_Graph_Yearly$ONI+DF_Graph_Yearly$phase,data=DF_Graph_Yearly)
summary(lm_yearly)

## 
## Call:
## lm(formula = DF_Graph_Yearly$ACE ~ DF_Graph_Yearly$ONI + DF_Graph_Yearly$phase, 
##     data = DF_Graph_Yearly)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -95.563 -37.905  -7.758  34.958 136.446 
## 
## Coefficients:
##                                         Estimate Std. Error t value Pr(>|t|)
## (Intercept)                              116.030      8.785  13.207   <2e-16
## DF_Graph_Yearly$ONI                      -53.162     24.886  -2.136   0.0364
## DF_Graph_Yearly$phaseCool Phase/La Nina  -54.058     26.073  -2.073   0.0421
## DF_Graph_Yearly$phaseWarm Phase/El Nino    6.051     25.470   0.238   0.8129
##                                            
## (Intercept)                             ***
## DF_Graph_Yearly$ONI                     *  
## DF_Graph_Yearly$phaseCool Phase/La Nina *  
## DF_Graph_Yearly$phaseWarm Phase/El Nino    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 53.99 on 66 degrees of freedom
## Multiple R-squared:  0.1315, Adjusted R-squared:  0.09202 
## F-statistic: 3.331 on 3 and 66 DF,  p-value: 0.02471

Let’s finally look at some diagnostics for the linear regression model and see if the fit is any better. We can see that the while the R² is not very strong, ONI is indicated as a significant predictor.

Let’s plot the model on the crossplot to see the fit.

prediction_df <- data.frame(DF_Graph_Yearly$ONI,DF_Graph_Yearly$phase)

prediction_answer <- predict(lm_yearly,prediction_df)

fig.7 <- plot_ly(DF_Graph_Yearly)
fig.7 <- fig.7 %>% add_trace(data=DF_Graph_Yearly,x=~ACE,y=~ONI,type="scatter",color=~phase)
fig.7 <- fig.7 %>% add_trace(data=DF_Graph_Yearly,x=~prediction_answer,y=~ONI,type="scatter",mode="line",color=~phase)
fig.7 <- fig.7 %>% layout(title="Crossplot: ACE vs ONI",
                          yaxis = list(title = 'ONI (°C)',titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F),
                          legend = list(orientation = 'v',x=0.7,y=0.9),
                          xaxis=list(autotick=F, dtick=20,title="ACE (10^4 kn^2)",titlefont=t1,tickfont=t2,showline = T,showgrid=F,
                                     type="linear",range=c(0,300)))
fig.7

The fit does not look very good. Finally, for completeness, let’s also look at the residuals - both as a histogram and a cross-plot of residual vs. ONI to check for any heteroskedasticity.

fig.8 <- plot_ly(DF_Graph_Yearly)
fig.8 <- fig.8 %>% add_trace(data=DF_Graph_Yearly,x=~ONI,y=lm_yearly$residuals,type="scatter")
fig.8 <- fig.8 %>% layout(title="Crossplot of ONI vs Residuals",
                          yaxis = list(title = 'Residual',titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F),
                          xaxis=list(autotick=T,title="ONI (°C)",titlefont=t1,tickfont=t2,showline = T,showgrid=F,zeroline=F,
                                     type="linear"))
fig.8

fig.9 <- plot_ly(DF_Graph_Yearly)
fig.9 <- fig.9 %>% add_trace(x=lm_yearly$residuals,type="histogram")
fig.9 <- fig.9 %>% layout(title="Histogram of Residuals",
                          yaxis = list(title = 'Frequency',titlefont=t1,tickfont=t2,showgrid=F,showline = T,zeroline=F),
                          xaxis=list(autotick=T,title="Residual Value",titlefont=t1,tickfont=t2,showline = T,showgrid=F,zeroline=F,
                                     type="linear"))

fig.9

The residuals look fairly normally distributed from the above plots.

Conclusions

Based on the analysis that we have done thus far, it is hard to say that ONI is strong predictor of ACE and hence Hurricane Activity. We have definitely see that there is a link between the two, especially when considering the conditional probabilities. We perhaps have to account for other parameters (as is often the case in the complex world of weather forecasting)

Hurricane Analysis using R