Introduction

Autoregressive integrated moving average (ARIMA) model is one of the important statistic analysis model for modelling time series data for better understanding of the data set and to predict the future trends. For this assignment we are given a data set which shows the Antarctica land ice mass in billion metric tons relative to the ice mass in 2001. We have the following objectives for the assignment.

Descriptive Analysis

library(TSA)
## 
## Attaching package: 'TSA'
## The following objects are masked from 'package:stats':
## 
##     acf, arima
## The following object is masked from 'package:utils':
## 
##     tar
library(tseries)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
# Loading the required data
raw_ice_mass <- read.csv("assignment2Data2022.csv", col.names = c("Year", "Ice_Mass"))
# Checking the summary statistics of the data
summary(raw_ice_mass)
##       Year         Ice_Mass        
##  Min.   :2002   Min.   :-2539.055  
##  1st Qu.:2006   1st Qu.:-1785.137  
##  Median :2011   Median : -938.991  
##  Mean   :2011   Mean   :-1063.774  
##  3rd Qu.:2016   3rd Qu.: -289.390  
##  Max.   :2020   Max.   :   -7.547
ice_mass_ts <- ts(raw_ice_mass$Ice_Mass, start = 2002, end = 2020)
# Plotting the time series data
plot(ice_mass_ts, xlab = "Year", ylab = "Annual Antarctica land ice mass", 
     main = "Time series plot for yearly change in Antarctica Land Ice Mass", type = "o")
Time series plot for yearly change in Antarctica Land Ice Mass

Time series plot for yearly change in Antarctica Land Ice Mass

We will analyse the time series plot based on the following statistics:

Stationarity Validation

As there is a clear downward trend and based on the above statistics we can clearly say that the series is non-stationary. We have to deal with this first and we need to ensure the series is stationary before proceeding forward. We will confirm the series is non-stationary using Auto correlation function (ACF) and Partial auto correlation function (PACF) plots.

# Function for ACF and PACF plots
plot_acf_pacf <- function(acf,pacf) {
  par(mfrow=c(1,2))
  acf(ice_mass_ts, main = "ACF plot of 
    Antarctica Land Ice Mass")
  pacf(ice_mass_ts, main = "PACF plot of 
     Antarctica Land Ice Mass") 
}

plot_acf_pacf("ACF plot of 
    Antarctica Land Ice Mass", "PACF plot of 
     Antarctica Land Ice Mass")
ACF and PACF plots

ACF and PACF plots

adf.test(ice_mass_ts, alternative=c("stationary"))
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ice_mass_ts
## Dickey-Fuller = -2.7141, Lag order = 2, p-value = 0.3004
## alternative hypothesis: stationary
qqnorm(ice_mass_ts)
qqline(ice_mass_ts, col = 2)
Q-Q Plot

Q-Q Plot

shapiro.test(ice_mass_ts)
## 
##  Shapiro-Wilk normality test
## 
## data:  ice_mass_ts
## W = 0.92616, p-value = 0.1471

Stationary Conversion

# Applying differencing to the data set
ice_mass_ts_diff <- diff(ice_mass_ts, differences = 1)
# Plotting the differenced series
plot(ice_mass_ts_diff, ylab = "First difference of Antarctic land ice mass", xlab = "Year",
     main = "First difference of Antarctic land ice mass", type="o")
First difference of Antarctic land ice mass

First difference of Antarctic land ice mass

We can still see that there is a small downward trend for the first differenced series. We can confirm whether the series is stationary using the ACF-PACF and the unit root tests.

plot_acf_pacf("ACF plot of 
    Antarctica Land Ice Mass", "PACF plot of 
     Antarctica Land Ice Mass")
ACF and PACF plots

ACF and PACF plots

There is no slowly decaying pattern in ACF plot (Figure 5) and no very large first lag in the PACF plot (Figure 5). This indicates that we are in the right path to stationarity. We have to confirm it with unit root tests. We will perform the Dickey-Fuller test now.

adf.test(ice_mass_ts_diff, alternative=c("stationary"))
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ice_mass_ts_diff
## Dickey-Fuller = -2.5664, Lag order = 2, p-value = 0.3566
## alternative hypothesis: stationary

The unit test is suggesting that the series is still non-stationary. We will do the other unit test to confirm this.

pp.test(ice_mass_ts_diff)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  ice_mass_ts_diff
## Dickey-Fuller Z(alpha) = -17.99, Truncation lag parameter = 2, p-value
## = 0.04887
## alternative hypothesis: stationary

Here we can see that the p-value is very close to 0.05 and coming in the range of 0.03-0.1 and hence the test cannot be considered that reliable. We will also perform a KPSS test for testing.

kpss.test(ice_mass_ts_diff)
## 
##  KPSS Test for Level Stationarity
## 
## data:  ice_mass_ts_diff
## KPSS Level = 0.30558, Truncation lag parameter = 2, p-value = 0.1

For the KPSS test as the hypothesis is inverse, it is suggesting that the difference series is stationary. But, the p-value 0.1 is at the borderline. The value shown by ADF test is higher 0.3566. So, we cannot make a conclusion here. For the ADF test we can try changing the lag order and see the p-values for the neighboring lag orders.

adf.test(ice_mass_ts_diff, alternative=c("stationary"),k = 1)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ice_mass_ts_diff
## Dickey-Fuller = -4.2788, Lag order = 1, p-value = 0.01353
## alternative hypothesis: stationary
adf.test(ice_mass_ts_diff, alternative=c("stationary"),k = 3)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ice_mass_ts_diff
## Dickey-Fuller = -2.9425, Lag order = 3, p-value = 0.2133
## alternative hypothesis: stationary

Here, we can see that for the lag order 3 p-value suggests that series is non-stationary and lag order 1 suggests that series is stationary. So, again we have a problem. We can put our findings as below.

Overall, we can see that this is pointing that the series is non-stationary and we have to consider the second differencing. Before we proceed, we can do one more thing. We can apply the transformation and see if any new information can be obtained.

We will apply the BoxCox transformation for the series. But, as we mentioned before we have to convert the values to positive else the transformation will throw us an error.

As the minimum value is -2539.055, we will add the scalar value 3000 so that all the values are positive.

ice_mass_with_scalar <- ice_mass_ts + 3000
summary(ice_mass_with_scalar)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   460.9  1214.9  2061.0  1936.2  2710.6  2992.5

All the values are positive now. We can go ahead with the transformation.

BC = BoxCox.ar(ice_mass_with_scalar, main)
BoxCox Transformation Fitting

BoxCox Transformation Fitting

# Confidence interval limits
BC$ci
## [1] 0.5 1.8
# Calculating the lambda
lambda <- BC$lambda[which(max(BC$loglike) == BC$loglike)]
lambda
## [1] 1.1
ice_mass_transformed = (ice_mass_with_scalar^lambda-1)/lambda
# Plotting the transformed series
plot(ice_mass_transformed, ylab = "Transformed Antarctic land ice mass", xlab = "Year",
     main = "Transformed Antarctic land ice mass", type="o")
BoxCox Transformation Fitting

BoxCox Transformation Fitting

Series is remaining almost with the same variations (Figure 7). We will see once again if the transformation has done anything to the normality of the data.

qqnorm(ice_mass_transformed)
qqline(ice_mass_transformed, col = 2)
Q-Q Plot

Q-Q Plot

shapiro.test(ice_mass_transformed)
## 
##  Shapiro-Wilk normality test
## 
## data:  ice_mass_transformed
## W = 0.92658, p-value = 0.1498

We can see there is no change in the normality of the data. Now, we will directly go with the first differencing and see the plot if the log transformed first difference plot is stationary

ice_mass_transformed_diff <- diff(ice_mass_transformed)
plot(ice_mass_transformed_diff, ylab = "First difference of 
     Transformed Antarctic land ice mass", xlab = "Year", 
     main = "First difference of Transformed Antarctic land ice mass", type="o")
First difference of Transformed Antarctic land ice mass

First difference of Transformed Antarctic land ice mass

Now, we can see that there is slight downward trend and then picking up from 2015. We will direcly conduct the ADF, pp and kpss test now.

adf.test(ice_mass_transformed_diff, alternative=c("stationary"))
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ice_mass_transformed_diff
## Dickey-Fuller = -2.5047, Lag order = 2, p-value = 0.3801
## alternative hypothesis: stationary
pp.test(ice_mass_transformed_diff)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  ice_mass_transformed_diff
## Dickey-Fuller Z(alpha) = -17.88, Truncation lag parameter = 2, p-value
## = 0.05044
## alternative hypothesis: stationary
kpss.test(ice_mass_transformed_diff)
## 
##  KPSS Test for Level Stationarity
## 
## data:  ice_mass_transformed_diff
## KPSS Level = 0.26015, Truncation lag parameter = 2, p-value = 0.1

Again, if we weigh both stationary and non-stationary we can say that overall suggestion is the series is non-stationary. Hence, we will go with the second differencing for the time series. As we have done the transformation we can go with the transformed series itself further since it doesn’t matter whether we are going with the raw series or the transformed series.

ice_mass_transformed_diff2 <- diff(ice_mass_transformed, differences = 2)
plot(ice_mass_transformed_diff2,
     ylab = "Second difference of Transformed Antarctic land ice mass",
    xlab = "Year", main = "Second difference of Transformed Antarctic land ice mass",
    type="o")
Second difference of Transformed Antarctic land ice mass

Second difference of Transformed Antarctic land ice mass

Now we could see that the series appears to be stationary (Figure 10). We will plot the ACF and PACF plots and do the ADF test to confirm the stationarity.

plot_acf_pacf("ACF plot of Antarctica Land Ice
    Mass (Second Difference)", "PACF plot of Antarctica Land Ice
     Mass (Second Difference)")
ACF and PACF plots

ACF and PACF plots

adf.test(ice_mass_transformed_diff2, alternative=c("stationary"))
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ice_mass_transformed_diff2
## Dickey-Fuller = -3.5722, Lag order = 2, p-value = 0.05386
## alternative hypothesis: stationary

The ACF plot (Figure 11) is not having decaying pattern and PACF plot (Figure 11) doesn’t have very high first lag. So, the series may be stationary. But the result of ADF is concerning for us since the value is above 0.05. We have to try the pp test now.

pp.test(ice_mass_transformed_diff2)
## 
##  Phillips-Perron Unit Root Test
## 
## data:  ice_mass_transformed_diff2
## Dickey-Fuller Z(alpha) = -19.719, Truncation lag parameter = 2, p-value
## = 0.02727
## alternative hypothesis: stationary

The pp test confirms that we are having a stationary series. We will perform the KPSS test also to confirm.

kpss.test(ice_mass_transformed_diff2)
## 
##  KPSS Test for Level Stationarity
## 
## data:  ice_mass_transformed_diff2
## KPSS Level = 0.11081, Truncation lag parameter = 2, p-value = 0.1

KPSS suggests the series is stationary. But, we are having the borderline value. Overall we have the following.

Hence, now we can proceed considering the series as stationary.

Model Specification

As we have the stationary series available now, we can start with the model specification now.

  1. Models based on ACF-PACF plots
plot_acf_pacf("ACF plot of Antarctica
    Land Ice Mass", "PACF plot of Antarctica
     Land Ice Mass")
ACF and PACF plots

ACF and PACF plots

From the ACF plot (Figure 12), q = 0 or 1 (Since the significance is less) From the PACF plot (Figure 12), p = 1 or 2 (Since one lag is significant and the other lag is little significant) So, the models suggested by ACF-PACF plots are:

  1. Models based one Extended auto correlation function (EACF)
eacf(ice_mass_transformed_diff2, ar.max = 3, ma.max = 3)
## AR/MA
##   0 1 2 3
## 0 o o o o
## 1 x x o o
## 2 o o o o
## 3 o o o o

As there are fewer data points, we can have a maximum level of AR and MA set as 3. The top left most 0 is at location (0,0). The neighbour model is (0,1). As we are concentrating on ARIMA models we are not considering (0,0) as the AR and MA degree is 0 there. Hence, from the EACF we can find that the suggested ARIMA model is:

  1. Models based on Bayesian Information Criterion (BIC) table
bic_table = armasubsets(y=ice_mass_transformed_diff2 , nar=5 , nma=5, y.name='p',
                        ar.method='ols')
plot(bic_table)
BIC Table

BIC Table

Because of the fewer data points for the BIC table also we won’t be getting any large models. From the above BIC table, the models suggested are:

We have a second best model with MA order of 4 and AR order 0. But we are not going to consider that since the shade is too low and from the y axis we can see that the value is about 1/3rd of the best model.

Results

We have conducted a thorough analysis of the time series given and found out the suggested models from the different model specification tools as below:

Conclusion

References