Data pre-processing

Load libraries

Load data set

sea_ice_orig <-  read.csv("sea_ice_arctic.csv")

colnames(sea_ice_orig) <- c('Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

sea_ice_orig <- sea_ice_orig %>% gather('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', key = 'Month', value = 'volume') %>% arrange(Year)

Convert to time series

sea_ice_orig_ts <- ts(sea_ice_orig$volume, start = 1980, frequency = 12)

Plot data

plot(sea_ice_orig_ts, type = "o", pch = 19, cex = 0.5, xlab = "Year", ylab = expression(paste("Sea Ice Volume (1000 km"^"3", ")")), main = "Arctic Sea Ice")

Load truncated data set

sea_ice_long <-  read.csv("sea_ice_arctic2.csv")

colnames(sea_ice_long) <- c('Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

sea_ice_long <- sea_ice_long %>% gather('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', key = 'Month', value = 'volume') %>% arrange(Year)

Convert truncated data set to time series

sea_ice_long_ts <- ts(sea_ice_long$volume, start = 1986, frequency = 12)
lambda = 0.85
sea_ice_long_BC <- (sea_ice_long_ts^lambda-1)/lambda
sea_ice <-  read.csv("sea_ice_arctic3.csv")

colnames(sea_ice) <- c('Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec')

sea_ice <- sea_ice %>% gather('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec', key = 'Month', value = 'volume') %>% arrange(Year)
sea_ice_ts <- ts(sea_ice$volume, start = 1986, frequency = 12)

Plot truncated time series data

plot(sea_ice_ts, type = "o", xlab = "Year", ylab = expression(paste("Sea Ice Volume (1000 km"^"3", ")")), main = "Arctic Sea Ice")

Annotated plot time series of truncated data to examine seasonality

plot(sea_ice_ts, ylab=expression(paste("Sea Ice Volume (1000 km"^"3", ")")), xlab = 'Year', type = "l", main = "Arctic Sea Ice")
points(y=sea_ice_ts,x=time(sea_ice_ts),pch=as.vector(season(sea_ice_ts)))

  • From the plot of the output we know:
  1. Seasonality is present.
  2. There is a downward trend (non-stationary).
  3. There is no obvious change in variance.
  4. It is not possible to determine behaviour due to the presence of seasonality.

Harmonic-Trend model

month.=season(sea_ice_ts) # period added to improve table display and this line sets up indicators
model2=lm(sea_ice_ts~month.-1) # -1 removes the intercept term
summary(model2)
## 
## Call:
## lm(formula = sea_ice_ts ~ month. - 1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.840 -2.387  0.302  2.705  5.265 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## month.January    21.9192     0.6247   35.09   <2e-16 ***
## month.February   24.6609     0.6247   39.48   <2e-16 ***
## month.March      26.6870     0.6247   42.72   <2e-16 ***
## month.April      27.6801     0.6247   44.31   <2e-16 ***
## month.May        26.9158     0.6247   43.09   <2e-16 ***
## month.June       23.5491     0.6247   37.70   <2e-16 ***
## month.July       17.1039     0.6247   27.38   <2e-16 ***
## month.August     12.3081     0.6247   19.70   <2e-16 ***
## month.September  11.0253     0.6247   17.65   <2e-16 ***
## month.October    12.1887     0.6247   19.51   <2e-16 ***
## month.November   14.9781     0.6247   23.98   <2e-16 ***
## month.December   18.2201     0.6247   29.17   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.246 on 312 degrees of freedom
## Multiple R-squared:  0.9768, Adjusted R-squared:  0.9759 
## F-statistic:  1093 on 12 and 312 DF,  p-value: < 2.2e-16

All of the parameters corresponding to months are statistically significant at 5% level.

month.=season(sea_ice_ts) # period added to improve table display and this line sets up indicators
model3=lm(sea_ice_ts~month.) # -1 removes the intercept term
summary(model3)
## 
## Call:
## lm(formula = sea_ice_ts ~ month.)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.840 -2.387  0.302  2.705  5.265 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      21.9192     0.6247  35.088  < 2e-16 ***
## month.February    2.7417     0.8834   3.103  0.00209 ** 
## month.March       4.7678     0.8834   5.397 1.34e-07 ***
## month.April       5.7609     0.8834   6.521 2.81e-10 ***
## month.May         4.9966     0.8834   5.656 3.51e-08 ***
## month.June        1.6300     0.8834   1.845  0.06598 .  
## month.July       -4.8153     0.8834  -5.451 1.02e-07 ***
## month.August     -9.6110     0.8834 -10.879  < 2e-16 ***
## month.September -10.8939     0.8834 -12.331  < 2e-16 ***
## month.October    -9.7304     0.8834 -11.014  < 2e-16 ***
## month.November   -6.9411     0.8834  -7.857 6.42e-14 ***
## month.December   -3.6990     0.8834  -4.187 3.68e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.246 on 312 degrees of freedom
## Multiple R-squared:  0.7779, Adjusted R-squared:  0.7701 
## F-statistic: 99.33 on 11 and 312 DF,  p-value: < 2.2e-16

Fit of cosine curve to average monthly Arctic sea ice series.

har.=harmonic(sea_ice_ts,1) # calculate cos(2*pi*t) and sin(2*pi*t)
t1 <- time(sea_ice_ts)
model4=lm(sea_ice_ts~har. + t1)
summary(model4)
## 
## Call:
## lm(formula = sea_ice_ts ~ har. + t1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9501 -0.9066  0.0295  0.9480  3.0424 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     794.03512   20.30584   39.10   <2e-16 ***
## har.cos(2*pi*t)   2.09440    0.11190   18.72   <2e-16 ***
## har.sin(2*pi*t)   7.91857    0.11194   70.74   <2e-16 ***
## t1               -0.38724    0.01016  -38.13   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.424 on 320 degrees of freedom
## Multiple R-squared:  0.9561, Adjusted R-squared:  0.9557 
## F-statistic:  2326 on 3 and 320 DF,  p-value: < 2.2e-16

Low p-values indicate a high degree of significance for the determined parameters. The R-squared value is high (0.96) which is very good but care should be taken to ensure this is not due to overfitting.

Plot of fitted Harmonic-Trend curve along with observed average monthly Arctic sea ice series.

plot(ts(fitted(model4),freq=12,start=c(1986,1)),xlab = 'Year', ylab=expression(paste("Sea Ice Volume (1000 km"^"3", ")")),type='l',
ylim=range(c(fitted(model4),sea_ice_ts)),main="Fitted model to Sea Ice Volume") # ylim ensures that the y axis range fits the raw data and the fitted values
points(sea_ice_ts) 

* The Harmonic-Trend model appears to fit reasonably well to the average monthly Arctic sea ice data up to 2007 but there is some deviation beyond this point.

Time series plot of standardized residuals.

plot(y=rstudent(model4),x=as.vector(time(sea_ice_ts)), xlab='Year',ylab='Standardized Residuals',type='o', main = "Time series plot of residuals")

* There was a departure from randomness in the plot of the standardized residuals. Therefore, labeled months to determine if there is a trend present.

Labeled months in plot of standardized residuals.

plot(y=rstudent(model4),x=as.vector(time(sea_ice_ts)),xlab='Year', ylab='Standardized Residuals',type='l', main = "Time series plot of residuals")
points(y=rstudent(model4),x=as.vector(time(sea_ice_ts)), pch=as.vector(season(sea_ice_ts)))

* Some high and low points but appears more random than the original series.

Plot of standardized residuals with labels.

plot(y=rstudent(model3),x=as.vector(fitted(model3)), xlab='Fitted Trend Values', ylab='Standardized Residuals',type='n', main = "Time series plot of standardised residuals")
points(y=rstudent(model3),x=as.vector(fitted(model3)),pch=as.vector(season(sea_ice_ts)))

* There was a similar spread of labels across the plot. The plot does not indicate any dramatic patterns that would cause us to doubt the seasonal means model.

Normality of standardized residuals

Histogram of standardized residuals for Harmoic-Trend model.

hist(rstudent(model4),xlab='Standardized Residuals', main = 'Histogram of Standardized Residuals')

Q-Q plot of standardized residuals for Harmoic-Trend model.

y = rstudent(model4)
qqnorm(y)
qqline(y, col = 2, lwd = 1, lty = 2)

shapiro.test(model4$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model4$residuals
## W = 0.99224, p-value = 0.08909
  • The histogram, Q-Q plot of standardized residuals for the Harmoic-Trend model shows a normal distribution. The Shpiro-Wilk test gave a p-value of 0.089 indicating that we can not reject the null hypothesis that the stocastic component of this model is normally distributed.
sea_ice2 = read.csv("sea_ice_arctic2.csv")
sea_ice2
colnames(sea_ice2) <- c('Year','Jan', 'Feb', 'Mar', 'Apr', 'May', 'June','July','Aug', 'Sep', 'Oct' , 'Nov', 'Dec')
sea_ice2 <- sea_ice2 %>% gather('Jan', 'Feb', 'Mar', 'Apr', 'May', 'June','July','Aug', 'Sep', 'Oct' , 'Nov', 'Dec', key = "Month", value = "volume") %>% arrange(Year)
sea_ice_ts2 <- ts(sea_ice2$volume, start = 1, frequency = 12)
plot(sea_ice_ts2, type = "o")

Three-year forecast of Arctic sea ice volumes from 2013 to 2015 using the Harmonic-Trend model.

har.=harmonic(sea_ice_ts2,1) # calculate cos(2*pi*t) and sin(2*pi*t)
t3 <- time(sea_ice_ts2)
t1 = har.[,1] # To make it easier assign harmonic variables to separate variables
t2 = har.[,2]
model4=lm(sea_ice_ts2~t3+t1+t2) # Fit the model with separate variables
# We need to create continuous time for 12 months starting from the first month of 2013
t = c(34.000, 34.083, 34.167 ,34.250, 34.333, 34.417 ,34.500, 34.583, 34.667, 34.750, 34.833, 34.917, 35.000, 35.083, 35.167 ,35.250, 35.333, 35.417 ,35.500, 35.583, 35.667, 35.750, 35.833, 35.917, 36.000, 36.083, 36.167 ,36.250, 36.333, 36.417 ,36.500, 36.583, 36.667, 36.750, 36.833, 36.917, 37.000)
t1 = cos(2*pi*t)
t2 = sin(2*pi*t)
t3 <- t
t3
##  [1] 34.000 34.083 34.167 34.250 34.333 34.417 34.500 34.583 34.667 34.750
## [11] 34.833 34.917 35.000 35.083 35.167 35.250 35.333 35.417 35.500 35.583
## [21] 35.667 35.750 35.833 35.917 36.000 36.083 36.167 36.250 36.333 36.417
## [31] 36.500 36.583 36.667 36.750 36.833 36.917 37.000
new = data.frame(t3, t1 , t2) # Step 1
# Notice here that I'm using the same variable names "t1" and "t2" as in the 
# fitted model above, where the name of the variables showing sine and cosine
# components are also "t1" and "t2". To run the predict() function properly,
# the names of variables in fitted model and "new" data frame
# must be the same!!!
forecasts = predict(model4, new, interval = "prediction")
print(forecasts)
##          fit        lwr       upr
## 1  14.946060 12.0910334 17.801086
## 2  18.577035 15.7217116 21.432358
## 3  20.664983 17.8094369 23.520529
## 4  20.598113 17.7424564 23.453770
## 5  18.422072 15.5664148 21.277729
## 6  14.656961 11.8013863 17.512536
## 7  10.383300  7.5278354 13.238764
## 8   6.692016  3.8366324  9.547400
## 9   4.543033  1.6876495  7.398417
## 10  4.549595  1.6941007  7.405089
## 11  6.665328  3.8096123  9.521043
## 12 10.369404  7.5133780 13.225429
## 13 14.582757 11.7263900 17.439123
## 14 18.213732 15.3570505 21.070414
## 15 20.301680 17.4447628 23.158597
## 16 20.234810 17.3777759 23.091845
## 17 18.058769 15.2017343 20.915803
## 18 14.293658 11.4367106 17.150606
## 19 10.019997  7.1631661 12.876828
## 20  6.328713  3.4719678  9.185458
## 21  4.179730  1.3229849  7.036476
## 22  4.186292  1.3294297  7.043154
## 23  6.302025  3.4449284  9.159121
## 24 10.006101  7.1486763 12.863525
## 25 14.219454 11.3616690 17.077238
## 26 17.850429 14.9923119 20.708546
## 27 19.938377 17.0800112 22.796743
## 28 19.871507 17.0130178 22.729997
## 29 17.695466 14.8369763 20.553956
## 30 13.930355 11.0719573 16.788753
## 31  9.656694  6.7984193 12.514969
## 32  5.965410  3.1072257  8.823595
## 33  3.816427  0.9582427  6.674612
## 34  3.822989  0.9646811  6.681297
## 35  5.938722  3.0801669  8.797277
## 36  9.642798  6.7838971 12.501698
## 37 13.856151 10.9968705 16.715431

Plot of 3-year forecast of Arctic sea ice.

plot(sea_ice_ts2, xlim = c(0,36), ylim = c(0, 42), ylab = expression(paste("Sea Ice Volume (1000 km"^"3", ")")))
# Here we convert the forecasts and prediction limits to monthly time series!
lines(ts(as.vector(forecasts[,1]), start = c(34,1), frequency = 12), col="red", type="l")
lines(ts(as.vector(forecasts[,2]), start = c(34,1), frequency = 12), col="blue", type="l")
lines(ts(as.vector(forecasts[,3]), start = c(34,1), frequency = 12), col="blue", type="l")
legend("topleft", lty=1, pch=1, col=c("black","blue","red"), text.width = 15,
       c("Data","5% forecast limits", "Forecasts"))

plot(sea_ice_ts2, xlim = c(0,36), ylim = c(0, 42), ylab = expression(paste("Sea Ice Volume (1000 km"^"3", ")")))
# Here we convert the forecasts and prediction limits to monthly time series!
lines(ts(as.vector(forecasts[,1]), start = c(34,1), frequency = 12), col="red", type="l")
legend("topleft", lty=1, pch=1, col=c("black","red"), text.width = 15,
       c("Data","Forecasts"))

Nonstationary seasonal ARIMA (SARIMA) model (Residuals Approach).

Box-Cox transformation of data

sea_ice_transform <- BoxCox.ar(sea_ice_ts, method = "yule-walker")

sea_ice_transform$ci
## [1] 0.7 1.0
lambda = 0.85
sea_ice_BC <- (sea_ice_ts^lambda-1)/lambda
plot(sea_ice_BC, type = "o", xlab = "Year", ylab = "Sea Ice Volume (1000 km3)", main = "Box Cox transform")

Fit of SARIMA models.

Initial fit of SARIMA(0,0,0)x(0,1,0)12 model

m1_sea_ice = arima(sea_ice_BC,order=c(0,0,0),seasonal=list(order=c(0,1,0), period=12))
res_m1 = residuals(m1_sea_ice);
plot(res_m1,xlab='Time', ylab='Residuals',main="Time series plot of the residuals")

ACF and PACF plots of residuals for SARIMA(0,0,0)x(0,1,0)12 model

par(mfrow=c(1,2))
acf(res_m1, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m1, lag.max = 36, main = "The sample PACF of the residuals")

  • There is now no pattern implying the existence of a seasonal trend.

  • However, the slowly decaying pattern prior to the first period at 1s implies the existence of an ordinary trend in the ACF plot of the residuals.

  • We will attempt to remove this ordinary trend by fitting a SARIMA(0,1,0)x(0,1,0)12 model.

Fit of SARIMA(0,1,0)x(0,1,0)12 model

m2_sea_ice = arima(sea_ice_BC,order=c(0,1,0),seasonal=list(order=c(0,1,0), period=12))
res_m2 = residuals(m2_sea_ice);
plot(res_m2,xlab='Time',ylab='Residuals',main="Time series plot of the residuals")

ACF and PACF plots for fit of SARIMA(0,1,0)x(0,1,0)12 model

par(mfrow=c(1,2))
acf(res_m2, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m2, lag.max = 36, main = "The sample PACF of the residuals")

  • No evidence of an ordinary trend remaining in the residuals.

  • There is a decreasing pattern in lags 1, 2, 3,…. in the SARMA conponent of the PACF plot. The correation at lag 1 in the ACF plot is significant. This implies the existence of an SMA(1) component.

  • ACF - 1 significant lag at 1s and 1 in first part prior to 1s (q=1, Q=1)

  • PACF - 1 significant lag and 1 not so significant lag in first part and 3 significant after 1s (p=1,2, P=1,2,3)

  • Now tried to fit a SARIMA(0,1,0)x(0,1,1)12 model to try to remove the remaining seasonal component in the residuals.

Fit of SARIMA(0,1,0)x(0,1,1)12 model.

m3_sea_ice = arima(sea_ice_BC,order=c(0,1,0),seasonal=list(order=c(0,1,1), period=12))
res_m3 = residuals(m3_sea_ice);
plot(res_m3,xlab='Time',ylab='Residuals',main="Time series plot of the residuals")

ACF AND PACF plots for SARIMA(0,1,0)x(0,1,1)12 model.

par(mfrow=c(1,2))
acf(res_m3, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m3, lag.max = 36, main = "The sample PACF of the residuals")

  • The autocorrelations, especially the first seasonal lag (lag 1) in the ACF plot, become insignificant or slightly significant after adding the seasonal component.

  • The ACF and PACF plots can be used to determine the orders of the ARMA component since there are no highly significant correlations at lags s, 2s, 3s, ….

  • The ACF plot displays one significant and one less significant autocorrelation (q=1,2) while the PACF plot has two significant and one less significant autocorrelation (p=1,2). This suggests a ARMA(1,1), ARMA(1,2) and ARMA(2,1) models.

  • We will now fit SARIMA(1,1,1)x(0,1,1)12, SARIMA(1,1,2)x(0,1,1)12 models and SARIMA(2,1,1)x(0,1,1)12 models.

Fit of SARIMA(1,1,1)x(0,1,1)12 model

m4_sea_ice = arima(sea_ice_BC,order=c(1,1,1),seasonal=list(order=c(0,1,1), period=12))
res_m4 = residuals(m4_sea_ice);
plot(res_m4,xlab='Time',ylab='Residuals',main="Time series plot of the residuals")

ACF and PACF plots for SARIMA(1,1,1)x(0,1,1)12 model

par(mfrow=c(1,2))
acf(res_m4, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m4, lag.max = 36, main = "The sample PACF of the residuals")

Fit of SARIMA(1,1,2)x(0,1,1)12 model

m5_sea_ice = arima(sea_ice_BC,order=c(1,1,2),seasonal=list(order=c(0,1,1), period=12))
res_m5 = residuals(m5_sea_ice);
plot(res_m5,xlab='Time',ylab='Residuals',main="Time series plot of the residuals")

ACF and PACF plots for SARIMA(1,1,2)x(0,1,1)12 model

par(mfrow=c(1,2))
acf(res_m5, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m5, lag.max = 36, main = "The sample PACF of the residuals")

Fit of SARIMA(1,1,3)x(0,1,1)12 model

m55_sea_ice = arima(sea_ice_BC,order=c(1,1,3),seasonal=list(order=c(0,1,1), period=12))
res_m55 = residuals(m55_sea_ice);
plot(res_m55,xlab='Time',ylab='Residuals',main="Time series plot of the residuals")

ACF and PACF plots for SARIMA(1,1,3)x(0,1,1)12 model

par(mfrow=c(1,2))
acf(res_m55, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m55, lag.max = 36, main = "The sample PACF of the residuals")

Fit of SARIMA(2,1,3)x(0,1,1)12 model

m56_sea_ice = arima(sea_ice_BC,order=c(2,1,3),seasonal=list(order=c(0,1,1), period=12))
res_m56 = residuals(m56_sea_ice);
plot(res_m56,xlab='Time',ylab='Residuals',main="Time series plot of the residuals")

ACF and PACF plots for SARIMA(2,1,3)x(0,1,1)12 model

par(mfrow=c(1,2))
acf(res_m56, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m56, lag.max = 36, main = "The sample PACF of the residuals")

Fit of SARIMA(2,1,1)x(0,1,1)12 model

m6_sea_ice = arima(sea_ice_BC,order=c(2,1,1),seasonal=list(order=c(0,1,1), period=12))
res_m6 = residuals(m6_sea_ice);
plot(res_m6,xlab='Time',ylab='Residuals',main="Time series plot of the residuals")

ACF and PACF plots for SARIMA(2,1,1)x(0,1,1)12 model

par(mfrow=c(1,2))
acf(res_m6, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m6, lag.max = 36, main = "The sample PACF of the residuals")

Fit of SARIMA(2,1,2)x(0,1,1)12 model

m6_sea_ice = arima(sea_ice_BC,order=c(2,1,2),seasonal=list(order=c(0,1,1), period=12))
res_m6 = residuals(m6_sea_ice);
plot(res_m6,xlab='Time',ylab='Residuals',main="Time series plot of the residuals")

ACF and PACF plots for SARIMA(2,1,2)x(0,1,1)12 model

par(mfrow=c(1,2))
acf(res_m6, lag.max = 36, main = "The sample ACF of the residuals")
pacf(res_m6, lag.max = 36, main = "The sample PACF of the residuals")

  • The residuals for the SARIMA(1,1,3)x(0,1,1)12 model are closer to white noise.

  • Therefore we can conclude that the orders are: p=1, d=1, q=3, P=0, D=1, Q=1 and s=12 for the SARIMA(p,d,q)x(P,D,Q)s model.

Model Fitting - ML estimates and Conditional Least Squares for SARIMA models.

ML estimates for SARIMA(0,1,1)x(0,1,1)12 model

m1_sea_ice_ts = arima(sea_ice_BC,order=c(0,1,1), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m1_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ma1   0.528877   0.041731  12.673 < 2.2e-16 ***
## sma1 -0.612651   0.045046 -13.601 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(0,1,1)x(0,1,1)12 model

m1_sea_ice_CSS = arima(sea_ice_BC,order=c(0,1,1), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m1_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ma1   0.533031   0.041657  12.796 < 2.2e-16 ***
## sma1 -0.620469   0.044145 -14.055 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(0,1,2)x(0,1,1)12 model

m2_sea_ice_ts = arima(sea_ice_BC,order=c(0,1,2), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m2_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ma1   0.619615   0.058099  10.6649 < 2.2e-16 ***
## ma2   0.148423   0.054706   2.7131  0.006666 ** 
## sma1 -0.616319   0.045470 -13.5543 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(0,1,2)x(0,1,1)12 model

m2_sea_ice_CSS = arima(sea_ice_BC,order=c(0,1,2), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m2_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ma1   0.627115   0.058190  10.7769 < 2.2e-16 ***
## ma2   0.153729   0.054856   2.8024  0.005072 ** 
## sma1 -0.626383   0.044601 -14.0440 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(0,1,3)x(0,1,1)12 model

m3_sea_ice_ts = arima(sea_ice_BC,order=c(0,1,3), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m3_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value Pr(>|z|)    
## ma1   0.617162   0.057424  10.7475  < 2e-16 ***
## ma2   0.127340   0.066599   1.9120  0.05587 .  
## ma3  -0.032059   0.058845  -0.5448  0.58589    
## sma1 -0.615565   0.045614 -13.4950  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(0,1,3)x(0,1,1)12 model

m3_sea_ice_CSS = arima(sea_ice_BC,order=c(0,1,3), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m3_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value Pr(>|z|)    
## ma1   0.624482   0.057664  10.8296  < 2e-16 ***
## ma2   0.134359   0.067151   2.0008  0.04541 *  
## ma3  -0.029056   0.059128  -0.4914  0.62314    
## sma1 -0.625413   0.044756 -13.9739  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(0,1,4)x(0,1,1)12 model

m4_sea_ice_ts = arima(sea_ice_BC,order=c(0,1,4), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m4_sea_ice_ts)
## 
## z test of coefficients:
## 
##        Estimate Std. Error  z value Pr(>|z|)    
## ma1   0.6220928  0.0582100  10.6871   <2e-16 ***
## ma2   0.1373741  0.0702727   1.9549   0.0506 .  
## ma3  -0.0032871  0.0746616  -0.0440   0.9649    
## ma4   0.0392999  0.0643236   0.6110   0.5412    
## sma1 -0.6153579  0.0460568 -13.3608   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(0,1,4)x(0,1,1)12 model

m4_sea_ice_CSS = arima(sea_ice_BC,order=c(0,1,4), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m4_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value Pr(>|z|)    
## ma1   0.629026   0.058469  10.7583  < 2e-16 ***
## ma2   0.143989   0.070915   2.0304  0.04231 *  
## ma3  -0.002412   0.075436  -0.0320  0.97449    
## ma4   0.036002   0.064600   0.5573  0.57731    
## sma1 -0.625303   0.045165 -13.8448  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(1,1,0)x(0,1,1)12 model

m5_sea_ice_ts = arima(sea_ice_BC,order=c(1,1,0), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m5_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.496346   0.049392  10.049 < 2.2e-16 ***
## sma1 -0.601932   0.045173 -13.325 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(1,1,0)x(0,1,1)12 model

m5_sea_ice_CSS = arima(sea_ice_BC,order=c(1,1,0), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m5_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.501415   0.049472  10.135 < 2.2e-16 ***
## sma1 -0.611851   0.044375 -13.788 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(2,1,0)x(0,1,1)12 model

m6_sea_ice_ts = arima(sea_ice_BC,order=c(2,1,0), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m6_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.604111   0.055947  10.7979 < 2.2e-16 ***
## ar2  -0.213680   0.055763  -3.8319 0.0001271 ***
## sma1 -0.616286   0.044968 -13.7051 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(2,1,0)x(0,1,1)12 model

m6_sea_ice_CSS = arima(sea_ice_BC,order=c(2,1,0), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m6_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.609424   0.056056  10.8718 < 2.2e-16 ***
## ar2  -0.216147   0.055703  -3.8803 0.0001043 ***
## sma1 -0.627819   0.044033 -14.2579 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(3,1,0)x(0,1,1)12 model

m7_sea_ice_ts = arima(sea_ice_BC,order=c(3,1,0), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m7_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.611605   0.057279  10.6777 < 2.2e-16 ***
## ar2  -0.234647   0.065514  -3.5816 0.0003415 ***
## ar3   0.034612   0.056825   0.6091 0.5424531    
## sma1 -0.615806   0.045225 -13.6166 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(3,1,0)x(0,1,1)12 model

m7_sea_ice_CSS = arima(sea_ice_BC,order=c(3,1,0), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m7_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.614598   0.057541  10.6811 < 2.2e-16 ***
## ar2  -0.235922   0.065715  -3.5901 0.0003306 ***
## ar3   0.036156   0.056968   0.6347 0.5256421    
## sma1 -0.623436   0.044454 -14.0243 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(4,1,0)x(0,1,1)12 model

m8_sea_ice_ts = arima(sea_ice_BC,order=c(4,1,0), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m8_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.615433   0.057335  10.7340 < 2.2e-16 ***
## ar2  -0.250318   0.067215  -3.7241  0.000196 ***
## ar3   0.071776   0.067039   1.0706  0.284328    
## ar4  -0.060419   0.058035  -1.0411  0.297833    
## sma1 -0.625842   0.045952 -13.6195 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(4,1,0)x(0,1,1)12 model

m8_sea_ice_CSS = arima(sea_ice_BC,order=c(4,1,0), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m8_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.621123   0.057618  10.7800 < 2.2e-16 ***
## ar2  -0.256354   0.067577  -3.7935 0.0001485 ***
## ar3   0.078219   0.067560   1.1578 0.2469603    
## ar4  -0.064425   0.058417  -1.1028 0.2700961    
## sma1 -0.633048   0.045281 -13.9804 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(2,1,2)x(0,1,1)12 model

m9_sea_ice_ts = arima(sea_ice_BC,order=c(2,1,2), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m9_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1  -0.580844   0.204539  -2.8398  0.004515 ** 
## ar2   0.064961   0.106110   0.6122  0.540404    
## ma1   1.202765   0.200433   6.0008 1.963e-09 ***
## ma2   0.421000   0.128387   3.2792  0.001041 ** 
## sma1 -0.615401   0.046191 -13.3229 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(2,1,2)x(0,1,1)12 model

m9_sea_ice_CSS = arima(sea_ice_BC,order=c(2,1,2), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m9_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   1.187971   0.088643  13.4017 < 2.2e-16 ***
## ar2  -0.367223   0.085463  -4.2969 1.732e-05 ***
## ma1  -0.629360   0.086346  -7.2888 3.127e-13 ***
## ma2  -0.301433   0.086996  -3.4649 0.0005304 ***
## sma1 -0.654313   0.045405 -14.4106 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(1,1,1)x(0,1,1)12 model

m10_sea_ice_ts = arima(sea_ice_BC,order=c(1,1,1), seasonal=list(order=c(0,1,1), period=12),method = "ML")
coeftest(m10_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.235818   0.091246   2.5844  0.009754 ** 
## ma1   0.372973   0.083984   4.4410 8.955e-06 ***
## sma1 -0.616586   0.045391 -13.5839 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(1,1,1)x(0,1,1)12 model

m10_sea_ice_CSS = arima(sea_ice_BC,order=c(1,1,1), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m10_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.244460   0.090732   2.6943  0.007054 ** 
## ma1   0.371198   0.083745   4.4325 9.315e-06 ***
## sma1 -0.626547   0.044505 -14.0781 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(1,1,2)x(0,1,1)12 model

m11_sea_ice_ts = arima(sea_ice_BC,order=c(1,1,2), seasonal=list(order=c(0,1,1), period=12),method ="ML")
coeftest(m11_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1  -0.603916   0.273424  -2.2087  0.027194 *  
## ma1   1.214468   0.268193   4.5283 5.945e-06 ***
## ma2   0.457942   0.142336   3.2173  0.001294 ** 
## sma1 -0.613789   0.046308 -13.2544 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(1,1,2)x(0,1,1)12 model

m11_sea_ice_CSS = arima(sea_ice_BC,order=c(1,1,2), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m11_sea_ice_CSS)
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value Pr(>|z|)    
## ar1  -0.40408    0.96542  -0.4186   0.6755    
## ma1   1.02279    0.93827   1.0901   0.2757    
## ma2   0.36175    0.47705   0.7583   0.4483    
## sma1 -0.62331    0.04560 -13.6691   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ML estimates for SARIMA(1,1,3)x(0,1,1)12 model

m12_sea_ice_ts = arima(sea_ice_BC,order=c(1,1,3), seasonal=list(order=c(0,1,1), period=12),method ="ML")
## Warning in log(s2): NaNs produced
coeftest(m12_sea_ice_ts)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.843175   0.039409  21.3954 < 2.2e-16 ***
## ma1  -0.297855   0.065530  -4.5453 5.485e-06 ***
## ma2  -0.470732   0.050607  -9.3018 < 2.2e-16 ***
## ma3  -0.206533   0.056359  -3.6646 0.0002478 ***
## sma1 -0.617171   0.046128 -13.3797 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conditional Least Squares for SARIMA(1,1,3)x(0,1,1)12 model

m12_sea_ice_CSS = arima(sea_ice_BC,order=c(1,1,3), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m12_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.880661   0.047849  18.4052 < 2.2e-16 ***
## ma1  -0.299610   0.071574  -4.1860 2.839e-05 ***
## ma2  -0.455134   0.051237  -8.8829 < 2.2e-16 ***
## ma3  -0.186179   0.055515  -3.3537 0.0007975 ***
## sma1 -0.618007   0.045718 -13.5178 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • The Maximum likelihood (ML) estimates and Conditional Least Squares (CSS) were determined to be highly significant for the SARIMA(1,1,1)x(0,1,1)12 model.

Diagnostic Checking

Time series plot for standardized residuals for SARIMA(1,1,3)x(0,1,1)12 model

plot(window(rstandard(m55_sea_ice),start=c(1986, 1)),
ylab='Standardized Residuals',type='o',
main="Residuals from the SARIMA(1,1,3)x(0,1,1)_12 Model")
abline(h=0, col = 'red')

  • Plot suggests no major abnormalities with this model(except around 1994) although there are several outliers around 2010 that may need to be investigated in more detail.

ACF plot of standardized residuals for SARIMA(1,1,3)x(0,1,1)12 model

acf(as.vector(window(rstandard(m55_sea_ice),start=c(1986,1))),
lag.max=36,
main="ACF of Residuals from the SARIMA(1,1,3)x(0,1,1)_12 Model")

  • Besides the slightly significant autocorrelations at lag 16 there is no sign of violation of the independence of residuals.
Box.test(window(rstandard(m55_sea_ice), start = c(1986,1)), lag = 16, type = "Ljung-Box", fitdf = 0)
## 
##  Box-Ljung test
## 
## data:  window(rstandard(m55_sea_ice), start = c(1986, 1))
## X-squared = 18.208, df = 16, p-value = 0.3119
  • Overall, the Ljung-Box test indicates that there is no problem in terms of independence of errors.

Normality

Histogram

hist(window(rstandard(m55_sea_ice),start=c(1986,1)), xlab = 'Standardized Residuals', ylab = 'Frequency', main = 'Residuals of SARIMA(1,1,3)x(0,1,1)_12 Model')

Q-Q plot

qqnorm(window(rstandard(m55_sea_ice),start=c(1986,1)),main="Q-Q plot for Residuals: SARIMA(1,1,3)x(0,1,1)_12 Model")
qqline(window(rstandard(m55_sea_ice),start=c(1986,1)))

Shapiro-Wilk test for normality

shapiro.test(window(rstandard(m55_sea_ice), start=c(1986,1), end=c(2012,12)))
## 
##  Shapiro-Wilk normality test
## 
## data:  window(rstandard(m55_sea_ice), start = c(1986, 1), end = c(2012,     12))
## W = 0.97325, p-value = 1.017e-05
  • The histogram displaying the distribution of the standardized residuals, as well as the Q-Q plot indicate that the residuals are reasonably normally distributed. The data at the tails of the Q-Q plot deviate to some extent from linearlity but overall it is probably acceptable.

  • The Shapiro-Wilk test gave a p-value < 0.05 sugesting that the data is not normally distributed. However, this test is sensitive to sample size and with large samples (as in this case) even small deviations from normality will be detected by this test.

Consider overfitting

Conditional Least Squares for SARIMA(2,1,3)x(0,1,1)12 model

m13_sea_ice_CSS = arima(sea_ice_BC,order=c(2,1,3), seasonal=list(order=c(0,1,1), period=12),method ="CSS")
coeftest(m13_sea_ice_CSS)
## 
## z test of coefficients:
## 
##       Estimate Std. Error  z value  Pr(>|z|)    
## ar1   0.451699   0.056038   8.0606 7.591e-16 ***
## ar2   0.289928   0.055508   5.2231 1.759e-07 ***
## ma1   0.049083   0.082804   0.5928    0.5533    
## ma2  -0.670525   0.046281 -14.4880 < 2.2e-16 ***
## ma3  -0.424236   0.065659  -6.4612 1.039e-10 ***
## sma1 -0.641936   0.046264 -13.8755 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(m55_sea_ice, m56_sea_ice)
  • The Estimate and log-likelihood have not changed much while the AIC value is worse (increased).

Prediction of Seasonal Arctic Sea-Ice

Five year forecast

sea_ice_for = Arima(sea_ice_BC,order=c(1,1,3),seasonal=list(order=c(0,1,1), period=12))

future = forecast(sea_ice_for, h = 60)
plot(future)

* The 5-year forecast is shown as a blue line and the forecast limits are grey.

Long-term (10 year) forecast

sea_ice_for_long = Arima(sea_ice_BC,order=c(1,1,3),seasonal=list(order=c(0,1,1), period=12))

future = forecast(sea_ice_for_long, h = 120)
par(mfrow=c(1,1))
plot(future)
lines(window(sea_ice_long_BC,start=c(2013, 1)), col = 2, lty = 5, lwd = 2, type = 'l')

  • The truncated data (2013-2018) has been overlayed (dashed red line) onto the 10-year forecast for Arctic sea-ice volumes. The data lie within the forecast limits and show the potential for forecasts to gain a idea of future Arctic sea-ice volumes.