April 23, 2023
library(TSA)
library(forecast)
library(fUnitRoots)
library(tseries)
library(lmtest)
Every year, news outlets and social media platforms make us aware of the melting of Arctic ice, and some reports even suggest that this melting is accelerating. These statements suggest that the trend of Arctic sea ice area is a declining one over the years, and that the reported acceleration implies larger variation in recent years than in the past.
Our aim for the collected Arctic sea ice extent data is to specify candidate models and fit the best of them, with reasonable and parsimonious parameter estimates. This will be achieved by:
* performing a descriptive analysis to understand the data and inform further modeling,
* model specification, and
* model fitting based on various model scoring techniques.
The dataset is a NASA climate dataset. It records the area covered by Arctic sea ice in million square kilometers each year from 1979 to 2022. Thus, there are 2 columns and 44 observations.
ArcticSea <- read.csv("C:/Users/admin/Desktop/Time Series/Assignment 2/ArcticSeaIceExtent.csv", header=TRUE)$Arctic.Sea.Ice.Extent..million.square.km. # Read raw data
class(ArcticSea) # Data is not a TS object
## [1] "numeric"
ArcticSea = ts(ArcticSea,start = 1979) # Convert to TS object
class(ArcticSea)
## [1] "ts"
\(ArcticSea\) is our raw data time series.
Let's look at the descriptive statistics of our time series.
summary(ArcticSea)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.390 4.707 5.995 5.800 6.900 7.540
The mean and median of the Arctic ice extent are very close, indicating a roughly symmetrical distribution.
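As an optional numerical check of this symmetry (an illustrative sketch; skewness is not part of the original output), the sample skewness can be computed directly; values near 0 support a roughly symmetric distribution.
# Illustrative check: sample skewness of the raw series
skew <- mean((ArcticSea - mean(ArcticSea))^3) / sd(ArcticSea)^3
skew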
The time series plot for our data is generated using the following code chunk,
# Time series plot using ArcticSea TS object
plot(ArcticSea,type='o',ylab='Ice extent (mil. sq. km)', xlab='Time (Years)', main = " Figure 1: Time series plot of Arctic Sea Ice Extent series.")
Plot Inference
From Figure 1, we can comment on the time series’s,
Trend: Over time, the Arctic sea ice extent (in million sq. km) follows a downward trend. This indicates non-stationarity in the time series.
Seasonality: From the plot, no noticeable seasonal behavior is seen as no clear repeating patterns are visible. At this point, it can be said that there is no seasonality. Thus, our model wouldn’t need a seasonal component.
Change in variance: There is a slight increase in variance over time; for example, the variation around ~1982 is small compared with the variation around ~2010.
Behavior: We notice mixed MA and AR behavior. We can see up-and-down fluctuations in the time series, so the series exhibits Moving Average behavior. We can also see succeeding data points following each other at various time points (~1983, ~1987, ~2009, etc.), so the series exhibits Auto Regressive behavior as well. We expect the MA component to be more dominant than the AR component, as we observe more up-and-down fluctuations than runs of following points.
Intervention/Change points: We do not notice any particular intervention points. Although the data drops sharply around ~2012, we attribute this to the AR (Auto Regressive) nature of the series, as the data points from ~2007 to ~2012 follow each other. The sudden increase in ice area from ~2012 to ~2013 is similar to the rise and drop around ~2006 to ~2009; this is attributed to the MA (Moving Average) nature of the series, but with a higher variance than in earlier years. Thus, no intervention points are observed.
Shape of plot: The time series seems to follow a quadratic pattern; the decline is not linear but slightly curved.
Scatter plot? Since we expect both AR and MA components, a scatter plot of the correlation between consecutive years would not be fruitful, because we do not yet know the orders of the AR and MA components. We do not know whether \(Y_t\) is correlated with \(Y_{t-1}\), \(Y_{t-2}\), and so on, so a single scatter plot of \(Y_t\) against \(Y_{t-1}\) is of limited use (an illustrative way to draw one anyway is sketched below).
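Purely for reference, such a lag-1 scatter plot could be produced with TSA's zlag() helper; this is an illustrative sketch only, and the analysis below relies on the ACF and PACF instead.
# Illustrative only: each year's ice extent against the previous year's value
plot(y = ArcticSea, x = zlag(ArcticSea),
     ylab = expression(Y[t]), xlab = expression(Y[t-1]),
     main = "Ice extent vs. previous year's ice extent")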
par(mfrow=c(2,1))
acf(ArcticSea, main ="ACF plot of Arctic Sea Ice Extent time series")
pacf(ArcticSea, main ="PACF plot of Arctic Sea Ice Extent time series")
par(mfrow=c(1,1))
ACF plot: We notice that multiple autocorrelations are significant. The slowly decaying pattern indicates a non-stationary series. We see a wave-like form, but it is not consistent (not repeating at a particular frequency), so no seasonal behavior is observed; we did not observe seasonality in the time series plot either.
PACF plot: We see one very high vertical spike at the first lag, indicating a non-stationary series, consistent with the time series plot. The second partial autocorrelation is also significant.
Many model estimation procedures (for example, Maximum Likelihood) assume normality of the residuals. If this assumption does not hold, the coefficient estimates are not optimal. Thus, it is ideal to have normally distributed data and, in turn, normally distributed residuals. Let's look at the Quantile-Quantile (QQ) plot to assess normality visually and the Shapiro-Wilk test to confirm the result statistically.
qqnorm(ArcticSea, main = "Normal Q-Q Plot of Raw ArcticSea Time Series")
qqline(ArcticSea, col = 2)
We see deviations from normality: both tails are off the line, and much of the data in the middle is off the line as well. Thus, the \(ArcticSea\) time series does not appear to be normally distributed.
shapiro.test(ArcticSea)
##
## Shapiro-Wilk normality test
##
## data: ArcticSea
## W = 0.94543, p-value = 0.0373
From the Shapiro-Wilk test, since the p-value (0.0373) is below the 0.05 significance level, we reject the null hypothesis that the data are normally distributed.
From the descriptive analysis, we expect an ARIMA model, as we notice both AR and MA behavior. We also notice a downward trend, which can be handled by differencing. Finally, since our data is not normal and shows increasing variance, a log or Box-Cox transformation will be considered.
To improve normality and stabilize the increasing variance in our time series \(ArcticSea\), let's test the log and Box-Cox transformations.
# Get Box-Cox Confidence interval
BC = BoxCox.ar(ArcticSea)
BC$ci
## [1] 0.4 2.0
title(main = "Log-likelihood versus the values of lambda for Arctic Sea Ice extent")
# Get lambda based on log-likelihood having maximum likelihood
lambda <- BC$lambda[which(max(BC$loglike) == BC$loglike)]
lambda
## [1] 1.4
# Apply the Box-Cox transformation with the lambda value 1.4
BC.ArcticSea = (ArcticSea^lambda-1)/lambda
\(BC.ArcticSea\) is our Box-Cox transformed data.
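For reference, the Box-Cox family of power transformations has the standard form
\[Y^{(\lambda)} = \begin{cases} \dfrac{Y^{\lambda}-1}{\lambda}, & \lambda \neq 0,\\ \log Y, & \lambda = 0, \end{cases}\]
so the code above simply applies the \(\lambda = 1.4\) case.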
Is our time series better now?
par(mfrow=c(3,1))
plot(ArcticSea,type='o',ylab=' ', xlab=' ', main = " Time series plot of Arctic Sea Ice extent series without BC transformation")
plot(BC.ArcticSea,type='o',ylab='Ice extent (mil. sq. km)', xlab=' ', main = " Time series plot of Arctic Sea Ice extent series with BC transformation")
plot(log(ArcticSea),type='o',ylab=' ', xlab='Time (Years)', main = " Time series plot of Arctic Sea Ice extent series with log transformation")
par(mfrow=c(1,1))
From the plots, almost no improvement in the time series is visible after the BC transformation. Note that the log transformation increases the variation.
par(mfrow=c(2,1))
qqnorm(ArcticSea, main = "Normal Q-Q Plot of Raw ArcticSea Time Series")
qqline(ArcticSea, col = 2)
qqnorm(BC.ArcticSea, main = "Normal Q-Q Plot of BC Transformed ArcticSea Time Series")
qqline(BC.ArcticSea, col = 2)
par(mfrow=c(1,1))
No observable improvement in normality can be seen after BC transformation.
shapiro.test(ArcticSea)
##
## Shapiro-Wilk normality test
##
## data: ArcticSea
## W = 0.94543, p-value = 0.0373
shapiro.test(BC.ArcticSea)
##
## Shapiro-Wilk normality test
##
## data: BC.ArcticSea
## W = 0.94795, p-value = 0.04619
The data is still not normal after the Box-Cox transformation, but normality has improved slightly (the p-value increased from 0.0373 to 0.04619). Note that a larger p-value indicates weaker evidence against normality.
Since a slight improvement in normality is achieved, even though almost no improvement in the increasing variance is obtained, we will proceed with the BC transformed time series \(BC.ArcticSea\).
The ACF, PACF and time series plots from the descriptive analysis of the \(ArcticSea\) series indicate non-stationarity. This non-stationarity is still present in the BC transformed series \(BC.ArcticSea\), as no action has yet been taken to flatten the trend. Let's confirm the non-stationarity using the (augmented) Dickey-Fuller unit-root test (ADF test).
Dickey-Fuller Unit-Root Test:
The Dickey-Fuller unit-root test is used to test the null hypothesis that the process is difference nonstationary (the process is nonstationary but becomes stationary after first differencing). The alternative hypothesis is that the process is stationary.
\(H_0\): The time series is difference non-stationary.
\(H_a\): The time series is stationary.
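For reference, a standard formulation of the regression behind the ADF test (shown here with both an intercept and a time trend; the 'nc' and 'c' variants used below drop \(\beta t\) and/or \(\alpha\)) is
\[\Delta Y_t = \alpha + \beta t + \gamma Y_{t-1} + \sum_{i=1}^{k} \delta_i \Delta Y_{t-i} + \varepsilon_t,\]
where the unit-root null hypothesis corresponds to \(\gamma = 0\) and \(k\) is the number of lagged difference terms included in the regression.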
To carry out the test on the \(BC.ArcticSea\) data, we must determine \(k\) (the lag order, taken here as the AR order selected by ar()) using the following code:
ar(BC.ArcticSea)
##
## Call:
## ar(x = BC.ArcticSea)
##
## Coefficients:
## 1 2
## 0.5069 0.3531
##
## Order selected 2 sigma^2 estimated as 1.761
\(k = 2\). Now we can perform the ADF test. We will use both fUnitRoots's and tseries's ADF tests for double confirmation.
The ADF test with neither an intercept (constant) nor a time trend is implemented with the following code chunk:
# ADF test using fUnitRoots package
adfTest(BC.ArcticSea, lags = 2, type = "nc", title = NULL,description = NULL)
##
## Title:
## Augmented Dickey-Fuller Test
##
## Test Results:
## PARAMETER:
## Lag Order: 2
## STATISTIC:
## Dickey-Fuller: -1.592
## P VALUE:
## 0.1048
##
## Description:
## Fri Sep 6 16:56:46 2024 by user: admin
The ADF test with an intercept (constant) but no time trend is implemented with the following code chunk:
adfTest(BC.ArcticSea, lags = 2, type = "c", title = NULL,description = NULL)
##
## Title:
## Augmented Dickey-Fuller Test
##
## Test Results:
## PARAMETER:
## Lag Order: 2
## STATISTIC:
## Dickey-Fuller: -0.9547
## P VALUE:
## 0.6968
##
## Description:
## Fri Sep 6 16:56:46 2024 by user: admin
The ADF test with an intercept (constant) and a time trend is implemented with the following code chunk:
adfTest(BC.ArcticSea, lags = 2, type = "ct", title = NULL,description = NULL)
##
## Title:
## Augmented Dickey-Fuller Test
##
## Test Results:
## PARAMETER:
## Lag Order: 2
## STATISTIC:
## Dickey-Fuller: -3.1251
## P VALUE:
## 0.1284
##
## Description:
## Fri Sep 6 16:56:46 2024 by user: admin
For all 3 cases, we observed p-values greater than 0.05. Thus, the null hypothesis that the process is difference non-stationary (i.e., non-stationary) cannot be rejected.
# ADF test using tseries package
adf.test(BC.ArcticSea, alternative = c("stationary")) # Omit k to use the default formula
##
## Augmented Dickey-Fuller Test
##
## data: BC.ArcticSea
## Dickey-Fuller = -2.369, Lag order = 3, p-value = 0.4277
## alternative hypothesis: stationary
The result is consistent. Since the p-value is high, we cannot reject the null hypothesis; thus, \(BC.ArcticSea\) is non-stationary.
PP Test: The Phillips-Perron (PP) test is another stationarity test. Its null and alternative hypotheses are the same as the ADF test's.
# using pp test
pp.test(BC.ArcticSea)
## Warning in pp.test(BC.ArcticSea): p-value smaller than printed p-value
##
## Phillips-Perron Unit Root Test
##
## data: BC.ArcticSea
## Dickey-Fuller Z(alpha) = -44.117, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
The PP test gives a statistically significant result (p ≤ 0.05). This rejects the null hypothesis and suggests that the data is stationary, which is contrary to the ADF test results and the descriptive analysis. Let's perform another stationarity test, the KPSS test.
KPSS Test:
In the KPSS test, the null and alternative hypotheses are the opposite of those of the ADF and PP tests: the null hypothesis is that the series is stationary.
kpss.test(BC.ArcticSea)
## Warning in kpss.test(BC.ArcticSea): p-value smaller than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: BC.ArcticSea
## KPSS Level = 1.1283, Truncation lag parameter = 3, p-value = 0.01
Since p < 0.05, we reject the null hypothesis. The data is non-stationary.
Since both ADF tests and the KPSS test suggest non-stationarity, although the PP test suggests otherwise, we conclude that \(BC.ArcticSea\) is non-stationary.
Finally, since we have a non-stationary series, we need to transform it into a stationary one using differencing (defined below). Let's apply first differencing and check whether stationarity is achieved.
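For reference, the first difference used below is defined as
\[\nabla Y_t = Y_t - Y_{t-1},\]
which removes the linear component of the trend.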
# First difference of BC transformed series
diff.BC.ArcticSea = diff(BC.ArcticSea)
# Time series plot using diff.BC.ArcticSea TS object
plot(diff.BC.ArcticSea,type='o',ylab='Ice extent (mil. sq. km)', xlab='Time (Years)')
title(main = "Time series plot of First Differenced BC
transformed Arctic Sea Ice Extent series.", line = 1)
Comparing with the BC transformed series:
# Time series plot using BC.ArcticSea TS object
plot(BC.ArcticSea,type='o',ylab='Ice extent (mil. sq. km)', xlab='Time (Years)')
title(main = "Time series plot of BC transformed
Arctic Sea Ice Extent series.", line = 1)
shapiro.test(diff.BC.ArcticSea)
##
## Shapiro-Wilk normality test
##
## data: diff.BC.ArcticSea
## W = 0.98383, p-value = 0.796
From the Shapiro-Wilk test, since the p-value is above the 0.05 significance level, we fail to reject the null hypothesis of normality. Thus, the first differenced BC transformed series can be regarded as normal.
Again, we perform the two ADF tests (fUnitRoots and tseries), the PP test and the KPSS test. The results are summarized in the comments next to the code; the fUnitRoots calls are commented out to avoid overcrowding the output.
# ADF test using fUnitRoots package
# ar(diff.BC.ArcticSea) # k=4
# adfTest(diff.BC.ArcticSea, lags = 4, type = "nc", title = NULL,description = NULL) gives p value 0.01
# adfTest(diff.BC.ArcticSea, lags = 4, type = "c", title = NULL,description = NULL) gives p value 0.01
# adfTest(diff.BC.ArcticSea, lags = 4, type = "ct", title = NULL,description = NULL) gives p value 0.02842
Result: the null hypothesis is rejected; the data is stationary.
# ADF test using tseries package
adf.test(diff.BC.ArcticSea, alternative = c("stationary")) # Omit k to use the default formula
## Warning in adf.test(diff.BC.ArcticSea, alternative = c("stationary")): p-value
## smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff.BC.ArcticSea
## Dickey-Fuller = -6.6147, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary
Result: p-value 0.01; the null hypothesis is rejected; the data is stationary.
pp.test(diff.BC.ArcticSea) # gives p value 0.01
## Warning in pp.test(diff.BC.ArcticSea): p-value smaller than printed p-value
##
## Phillips-Perron Unit Root Test
##
## data: diff.BC.ArcticSea
## Dickey-Fuller Z(alpha) = -54.357, Truncation lag parameter = 3, p-value
## = 0.01
## alternative hypothesis: stationary
Result: consistent with the above; the null hypothesis is rejected; the data is stationary.
kpss.test(diff.BC.ArcticSea) # gives p value 0.1
## Warning in kpss.test(diff.BC.ArcticSea): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff.BC.ArcticSea
## KPSS Level = 0.073296, Truncation lag parameter = 3, p-value = 0.1
Result: consistent with the above; the null hypothesis is not rejected; the data is stationary.
Since all four test results suggest stationarity, we conclude that the \(diff.BC.ArcticSea\) time series is stationary. We proceed with the first differenced BC transformed series.
Let's plot the ACF and PACF for our time series \(diff.BC.ArcticSea\).
par(mfrow=c(2,1))
acf(diff.BC.ArcticSea, main ="ACF plot of First Differenced BC transformed series")
# There is no slowly decaying pattern in the ACF, thus the first differenced BC transformed series is stationary
pacf(diff.BC.ArcticSea, main ="PACF plot of First Differenced BC transformed series")
par(mfrow=c(1,1))
ACF: No slowly decaying pattern or wave pattern is visible, indicating that the first differenced series is stationary and has no seasonal component.
PACF: We do not see a very high first vertical bar, indicating that the transformed series is stationary.
Note: From the ACF and PACF plots, our time series suggests an ARIMA model with p = 2, 3, or 4, q = 1 and d = 1 (first differenced). An ARIMA model has the ARIMA(p,d,q) format; hence, our possible set of models is {ARIMA(2,1,1), ARIMA(3,1,1), ARIMA(4,1,1)}.
For a mixed ARMA model, the extended autocorrelation function (EACF) method is better suited than the ACF and PACF for identifying the orders of the AR and MA components. A vertex is chosen at the top-most, left-most \(0\) in the EACF matrix (the \(0\) should not have an \(x\) immediately to its right, as that would make it less reliable). Let's compute the EACF matrix for our time series:
eacf(diff.BC.ArcticSea)
## AR/MA
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o o o o o o o o o o o o o
## 1 x x o o x o o o o o o o o o
## 2 x o o o o o o o o o o o o o
## 3 x o o o o o o o o o o o o o
## 4 o o o o o o o o o o o o o o
## 5 o o o o o o o o o o o o o o
## 6 o x o o o o o o o o o o o o
## 7 o x o o o o o o o o o o o o
The vertical axis gives the AR order p and the horizontal axis gives the MA order q. Our vertex is at p = 0, q = 1, and it has two neighboring models with p = 0, q = 2 and p = 1, q = 2. Thus, from the EACF, our possible set of models is {ARIMA(0,1,1), ARIMA(0,1,2), ARIMA(1,1,2)}: IMA(1,1) = ARIMA(0,1,1) is ranked best, IMA(1,2) = ARIMA(0,1,2) second best, and ARIMA(1,1,2) third best.
Akaike's (1973) Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are useful model specification and selection criteria.
In AIC, the aim is to reduce the AIC score based on the maximum likelihood estimation. The AIC score is calculated as
\[AIC = -2\log(\text{maximum likelihood}) + 2k\]
where \(k = p + q + 1\) if the model contains an intercept or constant term and \(k = p + q\) otherwise. \(2k\) is a penalty term that ensures parsimonious models and avoids too many parameters in the model.
In BIC, the aim again is to reduce the BIC score based on the maximum likelihood estimation, but the penalty function differs from AIC's. The BIC score is calculated as
\[BIC = -2\log(\text{maximum likelihood}) + k\log(n)\]
For a model with finite orders, BIC is consistent.
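As a quick, optional sanity check of these formulas (an illustrative sketch only; it uses model.011, one of the model objects fitted later in this report), the scores returned by R's AIC() and BIC() for a stats::arima fit can be reproduced manually. Note that R counts the innovation variance as an extra parameter, which is why the df column in the tables below equals the number of ARMA coefficients plus one.
# Illustrative check (run after model.011 is fitted below)
ll <- logLik(model.011)                      # log-likelihood with df and nobs attributes
k <- attr(ll, "df")                          # ARMA coefficients + innovation variance
n <- attr(ll, "nobs")                        # effective sample size after differencing
c(manual.AIC = -2*as.numeric(ll) + 2*k,      builtin.AIC = AIC(model.011),
  manual.BIC = -2*as.numeric(ll) + k*log(n), builtin.BIC = BIC(model.011))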
Let's create the BIC table for our time series using the following code chunk. The nar and nma arguments are kept low, as we do not see high-order models in the ACF, PACF or EACF. (Note: BIC tables with nar = nma = 10 produce very high order models; nar = nma = 5 produced the same result as nar = nma = 7.)
# Generate BIC table and plot
res=armasubsets(y=diff.BC.ArcticSea,nar=7,nma=7,y.name='p', ar.method='ols') # fyi, taking nma and nar as 5 gives same models. 7 gives better results
plot(res)
title(main = "BIC table for First Differenced BC transformed
Arctic Sea Ice Extent data", line = 5)
Here, the p-lags give the AR order p and the error-lags give the MA order q. We notice that p = 0, 1, 2 appear consistently across multiple models, and q = 1, 2 in the top model. We also consider q = 4, as it appears in the third-best model. Note that the top row gives the best model and lower rows progressively worse ones.
(Comments are added next to the models.)
ARIMA(0,1,1) # captured by EACF
ARIMA(0,1,2) # captured by EACF
ARIMA(1,1,0)
ARIMA(2,1,0)
ARIMA(1,1,1)
ARIMA(2,1,1)
ARIMA(1,1,2) # captured by EACF
ARIMA(2,1,2)
ARIMA(0,1,4) # large! As expected, the BIC table gives larger models in general
ARIMA(1,1,4) # large!
ARIMA(2,1,5) # large!
Compiling the ARIMA models obtained from the ACF, PACF, EACF and BIC table, the final set of 12 models is:
1. ARIMA(0,1,1)
2. ARIMA(0,1,2)
3. ARIMA(1,1,0)
4. ARIMA(2,1,0)
5. ARIMA(2,1,1)
6. ARIMA(1,1,2)
7. ARIMA(2,1,2)
8. ARIMA(0,1,4)
9. ARIMA(1,1,4)
10. ARIMA(2,1,5)
11. ARIMA(3,1,1)
12. ARIMA(4,1,1)
The parameter estimates of each of these 12 models need to be analysed, and the best fitting model is chosen in light of the estimation results. The two major parameter estimation methods used are Least Squares Estimation and Maximum Likelihood Estimation. It is important to note that Maximum Likelihood Estimation assumes normality of the data. CSS (conditional sum of squares, the least squares approach used here) estimates the coefficients by minimizing the squared distance between the data points and the model's one-step predictions, as sketched below.
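For reference (a standard formulation, not output from this report), CSS chooses the ARMA coefficients by minimizing the conditional sum of squares of the one-step-ahead prediction errors,
\[S_c(\phi, \theta) = \sum_{t} \big(Y_t - \hat{Y}_t(\phi, \theta)\big)^2,\]
where \(\hat{Y}_t(\phi,\theta)\) is the one-step-ahead prediction of \(Y_t\), whereas ML maximizes the Gaussian likelihood of the data.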
Let's perform parameter estimation for each of the 12 models. (Note: in the test outputs, ar1 denotes the first AR coefficient \(\phi_1\) and ma1 the first MA coefficient \(\theta_1\). The significance of each coefficient is stated in the comments next to the R code.)
The following code chunks perform the ML and CSS estimations (CSS stands for conditional sum of squares, which is the least squares estimation method).
# Maximum likelihood method
model.011 = arima(BC.ArcticSea, order=c(0,1,1), method = 'ML')
coeftest(model.011)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.624619 0.089687 -6.9644 3.297e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1 is significant
# Least squares method
model.011CSS = arima(BC.ArcticSea, order=c(0,1,1), method = 'CSS')
coeftest(model.011CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.633040 0.088697 -7.1371 9.529e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1 is significant (same result)
# Maximum likelihood method
model.012 = arima(BC.ArcticSea, order=c(0,1,2), method = 'ML')
coeftest(model.012)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.79782 0.19634 -4.0635 4.835e-05 ***
## ma2 0.18396 0.17213 1.0688 0.2852
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1 is significant, q=2 is insignificant
# Least squares method
model.012CSS = arima(BC.ArcticSea, order=c(0,1,2), method = 'CSS')
coeftest(model.012CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.80431 0.19690 -4.0848 4.412e-05 ***
## ma2 0.18115 0.17296 1.0474 0.2949
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1 is significant, q=2 is insignificant (same result)
# Maximum likelihood method
model.110 = arima(BC.ArcticSea, order=c(1,1,0), method = 'ML')
coeftest(model.110)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.51093 0.12970 -3.9394 8.169e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1 is significant
# Least squares method
model.110CSS = arima(BC.ArcticSea, order=c(1,1,0), method = 'CSS')
coeftest(model.110CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.51037 0.12899 -3.9567 7.6e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1 is significant (same result)
# Maximum likelihood method
model.210 = arima(BC.ArcticSea, order=c(2,1,0), method = 'ML')
coeftest(model.210)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.65664 0.14556 -4.5111 6.451e-06 ***
## ar2 -0.28651 0.14738 -1.9440 0.05189 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1 is significant, p=2 is marginally insignificant
# Least squares method
model.210CSS = arima(BC.ArcticSea, order=c(2,1,0), method = 'CSS')
coeftest(model.210CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.64969 0.14605 -4.4485 8.645e-06 ***
## ar2 -0.29060 0.14659 -1.9824 0.04743 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1,2 are significant
# Maximum likelihood using CSS method
model.210CSSML = arima(BC.ArcticSea, order=c(2,1,0), method = 'CSS-ML')
coeftest(model.210CSSML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.65664 0.14556 -4.5111 6.451e-06 ***
## ar2 -0.28651 0.14738 -1.9440 0.05189 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# same as ML
Since our data is normal, we go with ML results which are backed by CSS-ML results as well. Thus, for ARIMA(2,1,0), we conclude p=1 is significant, p=2 is marginally insignificant.
# Maximum likelihood method
model.211 = arima(BC.ArcticSea, order=c(2,1,1), method = 'ML')
coeftest(model.211)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.26765 0.24156 -1.108 0.26785
## ar2 -0.11904 0.19482 -0.611 0.54119
## ma1 -0.47506 0.19761 -2.404 0.01622 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1 is significant, p=1 & 2 are insignificant
# Least squares method
model.211CSS = arima(BC.ArcticSea, order=c(2,1,1), method = 'CSS')
coeftest(model.211CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.25440 0.23763 -1.0705 0.28437
## ar2 -0.11833 0.19407 -0.6097 0.54203
## ma1 -0.48960 0.19240 -2.5447 0.01094 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1 is significant, p=1 & 2 are insignificant (same result)
# Maximum likelihood method
model.112 = arima(BC.ArcticSea, order=c(1,1,2), method = 'ML')
coeftest(model.112)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.73582 0.11788 6.2419 4.323e-10 ***
## ma1 -1.81253 0.23885 -7.5886 3.233e-14 ***
## ma2 0.99998 0.26092 3.8325 0.0001268 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1 q=1,2 are significant
# Least squares method
model.112CSS = arima(BC.ArcticSea, order=c(1,1,2), method = 'CSS')
coeftest(model.112CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.49214 0.39095 -1.2589 0.2081
## ma1 -0.20062 0.45422 -0.4417 0.6587
## ma2 -0.22168 0.32054 -0.6916 0.4892
# None of the coefficients are significant under CSS; this differs from the ML result
# Maximum likelihood using CSS method
model.112CSSML = arima(BC.ArcticSea, order=c(1,1,2), method = 'CSS-ML')
coeftest(model.112CSSML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.73582 0.11788 6.2419 4.322e-10 ***
## ma1 -1.81254 0.23886 -7.5883 3.241e-14 ***
## ma2 0.99998 0.26093 3.8324 0.0001269 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# same as ML
Since our data is normal, we go with ML results which are backed by CSS-ML results as well. Thus, for ARIMA(1,1,2), we conclude p=1 q=1,2 are significant.
# Maximum likelihood method
model.212 = arima(BC.ArcticSea, order=c(2,1,2), method = 'ML')
coeftest(model.212)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.80111 0.15582 5.1414 2.727e-07 ***
## ar2 -0.10819 0.16268 -0.6651 0.506
## ma1 -1.80370 0.14114 -12.7793 < 2.2e-16 ***
## ma2 1.00000 0.15058 6.6412 3.112e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1 q=1,2 significant. p=2 is insignificant
# Least squares method
model.212CSS = arima(BC.ArcticSea, order=c(2,1,2), method = 'CSS')
coeftest(model.212CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.38248 0.16886 2.2650 0.02351 *
## ar2 -0.10864 0.22575 -0.4812 0.63035
## ma1 -1.29920 0.19812 -6.5578 5.462e-11 ***
## ma2 0.60132 0.14913 4.0322 5.526e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1 q=1,2 significant. p=2 is insignificant (same result)
# Maximum likelihood method
model.014 = arima(BC.ArcticSea, order=c(0,1,4), method = 'ML')
coeftest(model.014)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -1.10826 0.22886 -4.8425 1.282e-06 ***
## ma2 0.34692 0.31566 1.0990 0.27176
## ma3 -0.28642 0.39077 -0.7330 0.46358
## ma4 0.53777 0.23203 2.3177 0.02047 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1,4 are significant. q=2,3 are insignificant
# Least squares method
model.014CSS = arima(BC.ArcticSea, order=c(0,1,4), method = 'CSS')
coeftest(model.014CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ma1 -0.98442 0.15529 -6.3394 2.307e-10 ***
## ma2 0.23908 0.27357 0.8739 0.38216
## ma3 -0.20276 0.31637 -0.6409 0.52159
## ma4 0.41486 0.18925 2.1921 0.02837 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1,4 are significant. q=2,3 are insignificant (same result)
# Maximum likelihood method
model.114 = arima(BC.ArcticSea, order=c(1,1,4), method = 'ML')
coeftest(model.114)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.28937 0.31064 0.9315 0.35158
## ma1 -1.29269 0.28898 -4.4732 7.705e-06 ***
## ma2 0.52742 0.29141 1.8099 0.07031 .
## ma3 -0.24846 0.40710 -0.6103 0.54165
## ma4 0.41647 0.32422 1.2845 0.19896
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1 is significant. p=1 q=2,3,4 are insignificant (q=2 is marginally insignificant)
# Least squares method
model.114CSS = arima(BC.ArcticSea, order=c(1,1,4), method = 'CSS')
coeftest(model.114CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.180249 0.260028 -0.6932 0.48819
## ma1 -0.783891 0.234749 -3.3393 0.00084 ***
## ma2 0.044283 0.340892 0.1299 0.89664
## ma3 -0.127907 0.274208 -0.4665 0.64089
## ma4 0.411130 0.164678 2.4966 0.01254 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1,4 are significant.p=1 q=2,3 are insignificant
# Maximum likelihood using CSS method
model.114CSSML = arima(BC.ArcticSea, order=c(1,1,4), method = 'CSS-ML')
coeftest(model.114CSSML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.28937 0.31064 0.9315 0.35158
## ma1 -1.29269 0.28898 -4.4732 7.705e-06 ***
## ma2 0.52742 0.29141 1.8099 0.07032 .
## ma3 -0.24845 0.40710 -0.6103 0.54166
## ma4 0.41647 0.32422 1.2845 0.19896
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1 is significant. p=1 q=2,3,4 are insignificant (q=2 is marginally insignificant)
Since our data is normal, we go with ML results which are backed by CSS-ML results as well. Thus, for ARIMA(1,1,4), we conclude q=1 is significant. p=1 q=2,3,4 are insignificant.
# Maximum likelihood method
model.215 = arima(BC.ArcticSea, order=c(2,1,5), method = 'ML')
coeftest(model.215)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.474186 0.222338 -2.1327 0.032948 *
## ar2 -0.312789 0.173710 -1.8006 0.071759 .
## ma1 -0.446532 0.300718 -1.4849 0.137574
## ma2 0.085727 0.244297 0.3509 0.725652
## ma3 -0.536850 0.208690 -2.5725 0.010097 *
## ma4 0.216720 0.290285 0.7466 0.455319
## ma5 0.689492 0.261994 2.6317 0.008496 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1 q=3,5 are significant. p=2 q=1,2,4 are insignificant (p=2 is marginally insignificant)
# Least squares method
model.215CSS = arima(BC.ArcticSea, order=c(2,1,5), method = 'CSS')
coeftest(model.215CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.40054 0.21241 -1.8857 0.059333 .
## ar2 -0.10792 0.25549 -0.4224 0.672716
## ma1 -0.52725 0.18951 -2.7822 0.005400 **
## ma2 -0.15832 0.24584 -0.6440 0.519565
## ma3 -0.26272 0.25815 -1.0177 0.308816
## ma4 0.18789 0.16095 1.1674 0.243055
## ma5 0.59860 0.20400 2.9343 0.003343 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# q=1,5 are significant. p=1,2 q=2,3,4 are insignificant (p=1 is marginally insignificant)
# Maximum likelihood using CSS method
model.215CSSML = arima(BC.ArcticSea, order=c(2,1,5), method = 'CSS-ML')
coeftest(model.215CSSML)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.474148 0.222361 -2.1323 0.032980 *
## ar2 -0.312808 0.173708 -1.8008 0.071738 .
## ma1 -0.446568 0.300791 -1.4846 0.137638
## ma2 0.085788 0.244345 0.3511 0.725518
## ma3 -0.536883 0.208715 -2.5723 0.010102 *
## ma4 0.216759 0.290342 0.7466 0.455327
## ma5 0.689430 0.262035 2.6311 0.008512 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1 q=3,5 are significant. p=2 q=1,2,4 are insignificant (p=2 is marginally insignificant)
Since our data is normal, we go with ML results which are backed by CSS-ML results as well. Thus, for ARIMA(2,1,5), we conclude p=1 q=3,5 are significant. p=2 q=1,2,4 are insignificant (p=2 is marginally insignificant)
# Maximum likelihood method
model.311 = arima(BC.ArcticSea, order=c(3,1,1), method = 'ML')
coeftest(model.311)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.45129 0.29735 -1.5177 0.1291
## ar2 -0.27370 0.24055 -1.1378 0.2552
## ar3 -0.17910 0.17667 -1.0138 0.3107
## ma1 -0.30602 0.27014 -1.1328 0.2573
#p=1,2,3 q=1 all are insignificant
# Least squares method
model.311CSS = arima(BC.ArcticSea, order=c(3,1,1), method = 'CSS')
coeftest(model.311CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.44972 0.28509 -1.5775 0.1147
## ar2 -0.29013 0.23739 -1.2222 0.2216
## ar3 -0.19020 0.17446 -1.0903 0.2756
## ma1 -0.30295 0.25925 -1.1686 0.2426
#p=1,2,3 q=1 all are insignificant (same result)
# Maximum likelihood method
model.411 = arima(BC.ArcticSea, order=c(4,1,1), method = 'ML')
coeftest(model.411)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -1.11974 0.28757 -3.8938 9.87e-05 ***
## ar2 -0.80377 0.26848 -2.9938 0.002756 **
## ar3 -0.58980 0.21973 -2.6842 0.007271 **
## ar4 -0.37172 0.14072 -2.6415 0.008254 **
## ma1 0.36189 0.28986 1.2485 0.211841
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#p=1,2,3,4 are significant. q=1 is insignificant
# Least squares method
model.411CSS = arima(BC.ArcticSea, order=c(4,1,1), method = 'CSS')
coeftest(model.411CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -1.11548 0.27000 -4.1314 3.606e-05 ***
## ar2 -0.81589 0.25857 -3.1553 0.001603 **
## ar3 -0.62407 0.22034 -2.8323 0.004622 **
## ar4 -0.39401 0.14334 -2.7488 0.005981 **
## ma1 0.36544 0.26705 1.3684 0.171173
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#p=1,2,3,4 are significant. q=1 is insignificant (same result)
Since ARIMA(4,1,1) has an insignificant ma1 coefficient, let's see whether ARIMA(4,1,0) fits:
# Maximum likelihood method
model.410 = arima(BC.ArcticSea, order=c(4,1,0), method = 'ML')
coeftest(model.410)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.79023 0.14605 -5.4107 6.279e-08 ***
## ar2 -0.55914 0.17937 -3.1172 0.001826 **
## ar3 -0.43034 0.17667 -2.4358 0.014859 *
## ar4 -0.27411 0.14458 -1.8959 0.057978 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#p=1,2,3 are significant. p=4 is marginally insignificant
# Least squares method
model.410CSS = arima(BC.ArcticSea, order=c(4,1,0), method = 'CSS')
coeftest(model.410CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.78480 0.14642 -5.3598 8.33e-08 ***
## ar2 -0.57383 0.17907 -3.2044 0.001353 **
## ar3 -0.45667 0.17991 -2.5384 0.011137 *
## ar4 -0.29129 0.14739 -1.9763 0.048117 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# p=1,2,3 are significant; p=4 is just significant under CSS (p = 0.048)
Since our data is normal, we go with the ML results. Thus, for ARIMA(4,1,0), the ar4 coefficient is marginally insignificant.
Parameter estimates for 13 models were analysed. Now, let's rank these models by their AIC and BIC scores and pick the model with the best AIC score and the model with the best BIC score. The following chunk of code does this:
# Creating a custom function, sort.score, to arrange AIC and BIC scores for our 13 specified models
sort.score <- function(x, score = c("bic", "aic")){
if (score == "aic"){
x[with(x, order(AIC)),]
} else if (score == "bic") {
x[with(x, order(BIC)),]
} else {
warning('score = "x" only accepts valid arguments ("aic","bic")')
}
}
# Get AIC scores for 13 models and arrange in ascending order based on AIC scores
sort.score(AIC(model.011 ,model.012 ,model.110 ,model.210 ,model.211 ,model.112 ,model.212 ,model.014 ,model.114 ,model.215 ,model.311 ,model.411 ,model.410), score = 'aic')
## df AIC
## model.112 4 131.4230
## model.212 5 132.9743
## model.014 5 133.3630
## model.215 8 133.5519
## model.114 6 134.1905
## model.011 2 136.0854
## model.012 3 136.8267
## model.410 5 137.6498
## model.411 6 138.4298
## model.211 4 138.9046
## model.210 3 139.3581
## model.311 5 140.0541
## model.110 2 140.9475
Result: model.112 has the lowest AIC score (131.42); that is, ARIMA(1,1,2) is the best fitting model as per the AIC scores.
# Get BIC scores for 13 models and arrange in ascending order based on BIC scores
sort.score(BIC(model.011 ,model.012 ,model.110 ,model.210 ,model.211 ,model.112 ,model.212 ,model.014 ,model.114 ,model.215 ,model.311 ,model.411 ,model.410), score = 'bic')
## df BIC
## model.112 4 138.4678
## model.011 2 139.6078
## model.212 5 141.7803
## model.012 3 142.1103
## model.014 5 142.1690
## model.110 2 144.4699
## model.210 3 144.6417
## model.114 6 144.7577
## model.211 4 145.9494
## model.410 5 146.4558
## model.215 8 147.6415
## model.311 5 148.8601
## model.411 6 148.9970
Result: model.112 has the lowest BIC score (138.47); that is, ARIMA(1,1,2) is the best fitting model as per the BIC scores.
ARIMA(1,1,2) is the best fitting model overall, as it has the best AIC and BIC scores.
Another way to assess the fit of a model is to check the error estimates: the lower the error estimates, the better the model fits (the two main measures are defined below). Let's compute the error estimates for each of the 13 models and compare the results. This is done using the following chunk of code:
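For reference (standard definitions, with \(e_t\) the in-sample one-step residuals and \(n\) the number of residuals), the two most commonly quoted measures in the table below are
\[\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n} e_t^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \lvert e_t \rvert.\]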
#Lets check the error estimates of the 13 models
library('forecast')
model.011A = Arima(BC.ArcticSea, order=c(0,1,1), method='ML')
model.012A = Arima(BC.ArcticSea, order=c(0,1,2), method='ML')
model.110A = Arima(BC.ArcticSea, order=c(1,1,0), method='ML')
model.210A = Arima(BC.ArcticSea, order=c(2,1,0), method='ML')
model.211A = Arima(BC.ArcticSea, order=c(2,1,1), method='ML')
model.112A = Arima(BC.ArcticSea, order=c(1,1,2), method='ML')
model.212A = Arima(BC.ArcticSea, order=c(2,1,2), method='ML')
model.014A = Arima(BC.ArcticSea, order=c(0,1,4), method='ML')
model.114A = Arima(BC.ArcticSea, order=c(1,1,4), method='ML')
model.215A = Arima(BC.ArcticSea, order=c(2,1,5), method='ML')
model.311A = Arima(BC.ArcticSea, order=c(3,1,1), method='ML')
model.411A = Arima(BC.ArcticSea, order=c(4,1,1), method='ML')
model.410A = Arima(BC.ArcticSea, order=c(4,1,0), method='ML')
Smodel.011A <- accuracy(model.011A)[1:7]
Smodel.012A <- accuracy(model.012A)[1:7]
Smodel.110A <- accuracy(model.110A)[1:7]
Smodel.210A <- accuracy(model.210A)[1:7]
Smodel.211A <- accuracy(model.211A)[1:7]
Smodel.112A <- accuracy(model.112A)[1:7]
Smodel.212A <- accuracy(model.212A)[1:7]
Smodel.014A <- accuracy(model.014A)[1:7]
Smodel.114A <- accuracy(model.114A)[1:7]
Smodel.215A <- accuracy(model.215A)[1:7]
Smodel.311A <- accuracy(model.311A)[1:7]
Smodel.411A <- accuracy(model.411A)[1:7]
Smodel.410A <- accuracy(model.410A)[1:7]
df.Smodels <- data.frame(
rbind(Smodel.011A, Smodel.012A, Smodel.110A
, Smodel.210A, Smodel.211A, Smodel.112A
, Smodel.212A, Smodel.014A, Smodel.114A
, Smodel.215A, Smodel.311A, Smodel.411A, Smodel.410A)
)
colnames(df.Smodels) <- c("ME", "RMSE", "MAE", "MPE", "MAPE",
"MASE", "ACF1")
rownames(df.Smodels) <- c("ARIMA(0,1,1)", "ARIMA(0,1,2)", "ARIMA(1,1,0)"
,"ARIMA(2,1,0)", "ARIMA(2,1,1)", "ARIMA(1,1,2)"
,"ARIMA(2,1,2)", "ARIMA(0,1,4)", "ARIMA(1,1,4)"
,"ARIMA(2,1,5)", "ARIMA(3,1,1)", "ARIMA(4,1,1)", "ARIMA(4,1,0)")
df.Smodels
## ME RMSE MAE MPE MAPE MASE
## ARIMA(0,1,1) -0.3062147 1.1048327 0.9068893 -6.637663 14.04689 0.7974244
## ARIMA(0,1,2) -0.3008403 1.0865081 0.8789888 -6.493769 13.70052 0.7728916
## ARIMA(1,1,0) -0.1574984 1.1717076 0.9798613 -4.355535 14.96707 0.8615885
## ARIMA(2,1,0) -0.2163730 1.1215838 0.9235469 -5.269109 14.17559 0.8120714
## ARIMA(2,1,1) -0.3068757 1.0880547 0.8814228 -6.608009 13.70375 0.7750319
## ARIMA(1,1,2) -0.1587656 0.9409922 0.7742585 -3.832943 11.85723 0.6808026
## ARIMA(2,1,2) -0.1782252 0.9387239 0.7636391 -4.150710 11.68170 0.6714651
## ARIMA(0,1,4) -0.2386894 0.9376547 0.7564040 -4.836657 11.59038 0.6651033
## ARIMA(1,1,4) -0.2007748 0.9301629 0.7640564 -4.394948 11.62921 0.6718320
## ARIMA(2,1,5) -0.2081998 0.8380855 0.6899258 -4.162401 10.45515 0.6066493
## ARIMA(3,1,1) -0.3215838 1.0762527 0.8557404 -6.790272 13.30575 0.7524494
## ARIMA(4,1,1) -0.3368329 1.0275668 0.8268813 -6.726672 12.70590 0.7270737
## ARIMA(4,1,0) -0.3607930 1.0436981 0.8420779 -7.193197 13.10259 0.7404360
## ACF1
## ARIMA(0,1,1) -0.20874144
## ARIMA(0,1,2) -0.05823374
## ARIMA(1,1,0) -0.16715046
## ARIMA(2,1,0) -0.11766083
## ARIMA(2,1,1) -0.11553479
## ARIMA(1,1,2) 0.02964037
## ARIMA(2,1,2) -0.05667410
## ARIMA(0,1,4) 0.02146668
## ARIMA(1,1,4) -0.05384995
## ARIMA(2,1,5) -0.05556044
## ARIMA(3,1,1) -0.13557784
## ARIMA(4,1,1) -0.10370153
## ARIMA(4,1,0) -0.07720725
From the error estimates table, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) of the ARIMA(2,1,5) model are the lowest. However, its AIC and BIC scores are not good and, most importantly, the ML and CSS parameter estimation for ARIMA(2,1,5) (the 10th model) gives insignificant parameters ar2, ma1, ma2 and ma4; only ar1, ma3 and ma5 are significant. ARIMA(2,1,5) also disregards parsimony, as it is a large model. Thus, we can reject ARIMA(2,1,5) even though it has the lowest error estimates.
From the error estimates table, ARIMA(1,1,2) has the second best Mean Error (ME) score. The best ME score belongs to ARIMA(1,1,0), but its AIC score is the worst (13th) and its BIC score is not good either (6th best). Although the Mean Error is not a reliable measure, as positive and negative errors cancel each other out, ARIMA(1,1,2) scores very well on it.
ARIMA(1,1,2) therefore remains the best fitting model even though its error estimates are only average.
ARIMA(1,1,2) has all of its parameters significant under Maximum Likelihood (ML) estimation, and it has the lowest AIC and BIC scores, indicating the best fit among the 13 specified models. Although it performs only averagely in error estimation, ARIMA(1,1,2) is the best fitting model for our Box-Cox transformed Arctic Sea Ice Extent time series.
Adjacent overfitted models of ARIMA(1,1,2) considered here are ARIMA(3,1,2) and ARIMA(1,1,4). We have already seen during parameter estimation that ARIMA(1,1,4) (the 9th model) has insignificant parameters. As for ARIMA(3,1,2), let's perform parameter estimation using ML and CSS to assess its fit:
# Maximum likelihood method
model.312 = arima(BC.ArcticSea, order=c(3,1,2), method = 'ML')
coeftest(model.312)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.796483 0.157318 5.0629 4.130e-07 ***
## ar2 -0.074527 0.196206 -0.3798 0.7041
## ar3 -0.049002 0.156986 -0.3121 0.7549
## ma1 -1.801055 0.126791 -14.2049 < 2.2e-16 ***
## ma2 0.999999 0.132851 7.5272 5.182e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ar1, ma1, ma2 are significant; ar2, ar3 are insignificant (these results point us back to the ARIMA(1,1,2) model)
# Least squares method
model.312CSS = arima(BC.ArcticSea, order=c(3,1,2), method = 'CSS')
coeftest(model.312CSS)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 -0.409206 0.099848 -4.0983 4.162e-05 ***
## ar2 -0.920922 0.061145 -15.0613 < 2.2e-16 ***
## ar3 -0.505729 0.109665 -4.6116 3.996e-06 ***
## ma1 -0.192102 0.058683 -3.2736 0.001062 **
## ma2 1.218802 0.058466 20.8464 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# all coefficients are significant.
Since our data is normal, we go with the ML results, which give ar2 and ar3 as insignificant. Thus, we reject the ARIMA(3,1,2) model.
The best fitting model we could achieve for the Arctic Sea Ice Extent time series is the ARIMA(1,1,2) model. It has both AR and MA components, with the MA component more dominant than the AR component, as anticipated at the descriptive analysis stage. A minor power transformation and first-order differencing helped the raw series achieve normality and stationarity. The best fitting model as per the AIC and BIC scores was ARIMA(1,1,2), with all parameter estimates significant. Although it performs only averagely in error estimation, ARIMA(1,1,2) is the best fitting model for our Box-Cox transformed Arctic Sea Ice Extent time series; the fitted equation is written out below.
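For reference, substituting the ML estimates (ar1 = 0.736, ma1 = -1.813, ma2 = 1.000) into R's ARIMA parameterisation (in which MA coefficients enter with a plus sign), the fitted model on the Box-Cox scale \(Z_t\) (the \(BC.ArcticSea\) series) can be written as
\[\nabla Z_t = 0.736\,\nabla Z_{t-1} + e_t - 1.813\,e_{t-1} + 1.000\,e_{t-2},\]
where \(\nabla Z_t = Z_t - Z_{t-1}\) and \(e_t\) is the white-noise error term.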
With the best fitting model in hand, residual analysis can be performed next to check the assumptions behind the fitted model, and the defined model can then be used to forecast the Arctic Sea Ice extent for the required years; a brief sketch of how that forecasting step could look is given after this paragraph.
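As an illustration only (not part of the analysis above), the forecasting step could proceed along the following lines, assuming model.112A is the Arima() fit from the error-estimation section and using \(\lambda = 1.4\) in the inverse Box-Cox back-transformation \(y = (\lambda z + 1)^{1/\lambda}\):
# Sketch only: 10-year-ahead forecasts on the Box-Cox scale, then back-transformed
fc <- forecast(model.112A, h = 10)                                # forecasts of the BC-transformed series
invBC <- function(z, lambda = 1.4) (lambda * z + 1)^(1 / lambda)  # inverse Box-Cox transformation
data.frame(year = 2023:2032, ice.extent = invBC(as.numeric(fc$mean)))  # point forecasts in mil. sq. km
plot(fc, main = "Forecasts of the BC-transformed Arctic Sea Ice Extent series")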