We begin by plotting the series and looking at the ACF and PACF plots. The ACF seems to decay very slowly, a sign of non-stationarity:
load("/Users/arminda/Desktop/Arminda/TS/TS1-Session/TS-New/milk.RData")
library("astsa")
plot(globtemp.ts)
acf2(globtemp.ts)
We take first-order differences and confirm, by a series of tests, that one first-order difference is enough to make the series stationary:
library("tseries")
adf.test(diff(globtemp.ts))
##
## Augmented Dickey-Fuller Test
##
## data: diff(globtemp.ts)
## Dickey-Fuller = -8.3444, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
kpss.test(diff(globtemp.ts))
##
## KPSS Test for Level Stationarity
##
## data: diff(globtemp.ts)
## KPSS Level = 0.10493, Truncation lag parameter = 4, p-value = 0.1
sd(globtemp.ts)
## [1] 0.2566048
sd(diff(globtemp.ts))
## [1] 0.1499603
sd(diff(diff(globtemp.ts)))
## [1] 0.2381071
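Note that the standard deviation is smallest for the first differences and increases again when we difference twice, a classic symptom of over-differencing. As an additional cross-check (a sketch, not part of the original output), the forecast package provides ndiffs(), which estimates the required number of differences from the same kind of unit-root tests:
library("forecast")
# Automated estimate of the differencing order; both calls should point to d = 1
ndiffs(globtemp.ts, test = "adf")
ndiffs(globtemp.ts, test = "kpss")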
The ACF and PACF of the first differences:
library("forecast")
acf2(diff(globtemp.ts))
The ACF and PACF of the first differences look much more plausible for an ARMA model. If the first differences were an AR or an MA process, we would expect either the ACF or the PACF to cut off after a finite lag. Since this doesn’t seem to happen, we will propose an ARMA model; the simplest candidate is ARMA(1,1), so let’s start with that. As we have taken first-order differences on the original series to make it stationary, we have to fit an ARIMA(1,1,1) model to the original series:
model.1=Arima(globtemp.ts,order=c(1,1,1),include.constant=TRUE)
summary(model.1)
## Series: globtemp.ts
## ARIMA(1,1,1) with drift
##
## Coefficients:
## ar1 ma1 drift
## 0.2695 -0.8180 0.0061
## s.e. 0.1122 0.0624 0.0030
##
## sigma^2 = 0.01722: log likelihood = 77.05
## AIC=-146.11 AICc=-145.77 BIC=-134.83
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.002162733 0.1291073 0.09925874 Inf Inf 0.8277124 0.04097333
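As a rough significance check on these estimates (a sketch relying on the asymptotic normal approximation and the variance matrix stored in the fitted object), each coefficient can be divided by its standard error:
# Approximate z-statistics: estimate / standard error
coef(model.1) / sqrt(diag(model.1$var.coef))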
The AR coefficient is estimated to be \(0.27\) (with a standard error of \(0.11\)) and the MA coefficient to be \(-0.818\) (with a standard error of \(0.06\)). The mean of the differenced sequence, which corresponds to the drift of the original sequence, is estimated at \(0.006\), with a standard error of \(0.003\). Some diagnostics and diagnostic plots follow:
jarque.bera.test(model.1$residuals)
##
## Jarque Bera Test
##
## data: model.1$residuals
## X-squared = 1.496, df = 2, p-value = 0.4733
tsdiag(model.1)
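To complement the tsdiag plots with an explicit number (a sketch; the choice of lag is illustrative), a Ljung-Box test can be run on the residuals, with fitdf accounting for the two estimated ARMA coefficients:
# Ljung-Box test of residual autocorrelation up to lag 10
Box.test(model.1$residuals, lag = 10, type = "Ljung-Box", fitdf = 2)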
The Jarque-Bera test returns a fairly reasonable p-value of \(0.4733\), but we could attempt to reduce the almost-significant correlations of the residuals at lags \(2\) and \(4\) by introducing some more MA terms. In fact, doing so reduces the AIC of the fitted model. After trying MA degrees of 1 through 5, we find that an ARIMA(1,1,4) model has the lowest AIC; its AIC is now lower than that of model.1, and the model also passes the residual independence test.
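A small search loop of the kind used for that comparison might look like this (a sketch; it simply refits ARIMA(1,1,q) for q = 1,…,5 and prints the AICc of each candidate):
# Refit ARIMA(1,1,q) for q = 1..5 and print the AICc of each fit
for (q in 1:5) {
  fit <- Arima(globtemp.ts, order = c(1, 1, q), include.constant = TRUE)
  cat("q =", q, " AICc =", round(fit$aicc, 2), "\n")
}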
model.2=Arima(globtemp.ts,order=c(1,1,4),include.constant=TRUE)
summary(model.2)
## Series: globtemp.ts
## ARIMA(1,1,4) with drift
##
## Coefficients:
## ar1 ma1 ma2 ma3 ma4 drift
## -0.7875 0.3034 -0.7214 -0.2547 0.1837 0.0060
## s.e. 0.1879 0.2056 0.1119 0.0985 0.1081 0.0033
##
## sigma^2 = 0.01633: log likelihood = 81.76
## AIC=-149.52 AICc=-148.56 BIC=-129.78
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set -0.001313962 0.1241457 0.09612207 Inf Inf 0.801556 0.008604604
tsdiag(model.2)
cov2cor(model.2$var.coef)
## ar1 ma1 ma2 ma3 ma4 drift
## ar1 1.00000000 -0.895080684 0.53796027 0.3983651 0.47167214 -0.013879452
## ma1 -0.89508068 1.000000000 -0.32403569 -0.6151942 -0.57346262 0.009982013
## ma2 0.53796027 -0.324035694 1.00000000 0.1080222 -0.23603126 -0.016105031
## ma3 0.39836506 -0.615194225 0.10802216 1.0000000 0.49905622 -0.012889498
## ma4 0.47167214 -0.573462623 -0.23603126 0.4990562 1.00000000 -0.019272251
## drift -0.01387945 0.009982013 -0.01610503 -0.0128895 -0.01927225 1.000000000
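A small helper can flag the problematic entries automatically (a sketch; the \(0.8\) threshold is the rule of thumb used here):
# Largest absolute off-diagonal correlation between coefficient estimates,
# and the positions of any entries exceeding 0.8 in absolute value
cc <- cov2cor(model.2$var.coef)
max(abs(cc[upper.tri(cc)]))
which(abs(cc) > 0.8 & upper.tri(cc), arr.ind = TRUE)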
But when we check the correlations between the estimated coefficients, we find something alarming: the correlation between the ar1 and ma1 estimates is high (greater than \(0.8\) in absolute value). So we discard this model and try increasing the order of the autoregressive part instead.
model.3=Arima(globtemp.ts,order=c(2,1,1),include.constant=TRUE)
model.3$aicc
## [1] -148.1998
tsdiag(model.3)
jarque.bera.test(model.3$residuals)
##
## Jarque Bera Test
##
## data: model.3$residuals
## X-squared = 0.54661, df = 2, p-value = 0.7609
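It is also worth repeating the coefficient-correlation check on this model (a quick sketch; the output is not shown in the original analysis):
# Coefficient correlations for model.3, analogous to the check made on model.2
cov2cor(model.3$var.coef)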
Predictions with model.3 follow. An upward drift is evident in the predictions, corresponding to the mean of \(0.006\) that we noticed in the differenced data, but the forecast uncertainty is quite large (you can see this in the width of the prediction intervals).
plot(forecast(model.3, h=6))
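To see those prediction intervals numerically rather than only in the plot, the forecast object can also be printed directly (point forecasts together with 80% and 95% intervals by default):
# Numerical point forecasts with 80% and 95% prediction intervals
forecast(model.3, h=6)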
Observation: In all of the ARIMA models fitted above, a constant term is included via include.constant=TRUE. When the total order of differencing in a model is equal to \(1\), a constant term should be included if the series has a non-zero average trend, and in the plot of the original series we do observe an upward trend (drift). If the fitted model has two or more orders of differencing, it normally does not include a constant term. A model with no differencing normally includes a constant term (which allows for a non-zero mean).
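As an illustration of this point (a sketch, not part of the original analysis), the same ARIMA(2,1,1) model can be refit without the constant and compared:
# Refit model.3 without the drift term; without the constant the long-run
# forecasts no longer trend upwards
model.3.nodrift <- Arima(globtemp.ts, order = c(2, 1, 1), include.constant = FALSE)
c(with.drift = model.3$aicc, no.drift = model.3.nodrift$aicc)
plot(forecast(model.3.nodrift, h = 6))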