ARIMA model hypothesis testing and diagnostics

Hypothesis testing for the coefficients uses the null hypothesis \(\phi_i = 0\) or \(\theta_i = 0\), tested against the alternative \(\phi_i \neq 0\) or \(\theta_i \neq 0\).
We also use a diagnostic test to analyze the residuals of the fitted model. The Box-Pierce statistic has since been improved, and the Ljung-Box statistic is now the standard choice for residual analysis. The idea is to test whether the ARIMA model accounts for the dependencies among the \(Y_t\)'s; if it does, the residual autocorrelations, and therefore the test statistic, will be small. The null hypothesis here is that the model is adequate and accounts for the dependencies. We test against the alternative hypothesis that the model is inadequate.
Here we will fit a model using the auto.arima command and apply the Ljung-Box test to the residuals of that model. Make sure to capitalize the B in Box.test, because R will give you a very unhappy answer otherwise.

library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
mod <- auto.arima(WWWusage)
mod
## Series: WWWusage 
## ARIMA(1,1,1) 
## 
## Coefficients:
##          ar1     ma1
##       0.6504  0.5256
## s.e.  0.0842  0.0896
## 
## sigma^2 estimated as 9.995:  log likelihood=-254.15
## AIC=514.3   AICc=514.55   BIC=522.08
Box.test(mod$residuals, lag=1, type = "Ljung")
## 
##  Box-Ljung test
## 
## data:  mod$residuals
## X-squared = 0.030322, df = 1, p-value = 0.8618

Our p-value is .8618, which means we do not reject the null. In other words, we don't have evidence that this model is inadequate, so the residuals are small enough that the model is accounting for the dependencies of the \(Y_t\)'s. This is good news!
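The coefficient tests described at the top of this section can also be run by hand from the fitted object; here is a minimal sketch using the ar1 estimate and standard error stored in mod (we use a normal approximation for the reference distribution, which is our own choice):

# Wald-type test of H0: phi_1 = 0, using the ar1 estimate and s.e. from mod
z <- mod$coef["ar1"] / sqrt(mod$var.coef["ar1", "ar1"])
z                                      # roughly 0.6504/0.0842, i.e. about 7.7
2 * pnorm(abs(z), lower.tail = FALSE)  # two-sided p-value, essentially 0, so keep ar1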

Seasonal ARIMA

We need to define both the seasonal and non-seasonal pieces of the model. Usually step 1 of ARIMA is to find the number of differences \(d\); we then choose \(p\) and \(q\) afterward.
We read the data in and then break the time series into its observed, trend, seasonal, and random components.

births <- scan("http://robjhyndman.com/tsdldata/data/nybirths.dat")
birthstimeseries <- ts(births, frequency=12, start=c(1946,1))

TScomp <- decompose(birthstimeseries)
plot(TScomp)

This separates out the seasonal and random components so we can visualize the long-term trend more clearly.

If we just want the trend/irregular components, we subtract the seasonal component so that only the trend and irregular components remain. We can also display what we removed in the second plot.

noseason <- birthstimeseries - TScomp$seasonal
plot(noseason)

plot(TScomp$seasonal)

Seasonal ARIMA process by hand

There are multiple steps, outlined below, if you don't want to use auto.arima.

plots

First plot the data. We can plot it without a lot of the noise by filtering it.

births <- births[-c(1:24)] # drop the first 24 observations (1946-1947)
births <- ts(births, frequency=12, start=c(1948,1)) # series now starts in January 1948
par(mfrow=c(2,1))
plot.ts(births) # hard to see seasonality due to noise
#this filter helps smooth the noise, seasonality sticks out
plot.ts(filter(births, sides=2, filter=rep(1/3,3)))

#this filter mostly smooths out the seasonality
par(mfrow=c(1,1))
(wgts <- c(.5, rep(1,11), .5)/12)
##  [1] 0.04166667 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
##  [7] 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333 0.08333333
## [13] 0.04166667
plot.ts(filter(births, sides=2, filter = wgts))

We're pretty sure there's seasonality here, so we need to take differences at lag \(S\) to remove it.

differences

We believe the seasonality is monthly, so we will set S = 12. After removing the seasonality we may still need ordinary differences to remove any remaining trend; that comes in Part B. The seasonal differencing is done by setting lag = 12 and storing the result as diff1. Part A:

diff1 <- diff(births, lag = 12)
plot.ts(diff1) 

adf.test(diff1)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff1
## Dickey-Fuller = -3.4166, Lag order = 5, p-value = 0.05488
## alternative hypothesis: stationary

We're wondering if this looks stationary (constant mean and variance over time). It looks fine, but we can run the ADF test to be more certain. The ADF null hypothesis is non-stationarity, and here the p-value is about .055. When we do tests like this we can use a looser significance level like .1, since stationarity testing is not our main priority and we don't want to overcomplicate the model. At that level we reject the null and treat the seasonally differenced series as stationary.
Therefore, we can guess that our uppercase D = 1.

Part B: further non-seasonal differencing doesn't look necessary (there is no trend remaining in diff1), therefore our lowercase d will be 0. If there were still a trend, we would try taking differences at lag 1 next, and our lowercase d would be 1.
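As a rough cross-check on the difference orders chosen above, the forecast package also provides nsdiffs() and ndiffs(); their default unit-root tests differ from the ADF test we ran, so treat this only as a sanity check:

# suggested number of seasonal differences for births (we chose D = 1)
nsdiffs(births)
# suggested number of further non-seasonal differences for the seasonally differenced series (we chose d = 0)
ndiffs(diff1)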

ACF and PACF

By looking at the autocorrelation function (ACF) and partial autocorrelation function (PACF), we can look for spikes to decide what the p's and q's should be.
Let’s look at the nonseasonal p and q first:

Acf(diff1) # spikes at low lags ==> nonseasonal MA (q >= 1)

Pacf(diff1) # spike at low lag ==> nonseasonal AR (p >= 1)

Anything within the dotted blue lines is not significant. Spikes at low lags of the ACF imply we need a moving average part for the non-seasonal component (lowercase q). Spikes at low lags of the PACF imply we need an autoregressive part for the non-seasonal component (lowercase p).
In this data set, lowercase q = 1 or 2, and lowercase p = 1.

For the seasonal component, we look at the lags that are multiples of S. For our data the spike at lag 12 in the ACF is significant, so we should use uppercase Q = 1 or 2. Looking at the PACF, the second seasonal lag (lag 24) is not significant, so uppercase P should be 1 or 2.
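If the default plots don't show enough lags, it can help to widen the lag window so that several multiples of S (12, 24, 36) are visible; a small sketch, where lag.max = 48 is our own choice:

# show enough lags to see the seasonal spikes at 12, 24, 36, ...
Acf(diff1, lag.max = 48)
Pacf(diff1, lag.max = 48)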
In summary, our candidate orders so far are p = 1 or 2, d = 0, q = 1 or 2, and P = 1 or 2, D = 1, Q = 1 or 2.

estimating the model

It's time to fill in our ARIMA (p, d, q) x (P, D, Q) S.

We start with the nonseasonal (lowercase p, d, q) part, then list the seasonal components along with the period (monthly) in our model. This first model will be ARIMA (1, 0, 1) x (1, 1, 1) 12.

mod1 <- arima(births, order = c(1,0,1), seasonal = list(order = c(1,1,1), period = 12))

bettering the models

It's important to build the best model we can. We test our first model with the Ljung-Box test to see if it is adequate.

summary(mod1) 
## 
## Call:
## arima(x = births, order = c(1, 0, 1), seasonal = list(order = c(1, 1, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1     sar1     sma1
##       0.9884  -0.1564  -0.2199  -0.8820
## s.e.  0.0287   0.1747   0.1004   0.1498
## 
## sigma^2 estimated as 0.3924:  log likelihood = -136.81,  aic = 283.61
## 
## Training set error measures:
##                      ME      RMSE       MAE       MPE     MAPE      MASE
## Training set 0.06656964 0.5998188 0.4452406 0.2299257 1.727419 0.3675936
##                   ACF1
## Training set 0.0410978
Box.test(mod1$residuals, type = "Ljung")
## 
##  Box-Ljung test
## 
## data:  mod1$residuals
## X-squared = 0.24832, df = 1, p-value = 0.6183

Our p-value is .6183, so we do not reject the null that the model is adequate, and we can assume our residuals are small enough.
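As an aside, newer versions of the forecast package wrap these residual diagnostics into a single checkresiduals() call (a residual plot, the residual ACF, and a Ljung-Box test); a minimal sketch, assuming your installed forecast version includes it:

# residual plot, residual ACF, and Ljung-Box test in one step
checkresiduals(mod1)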

We can try to alter our first model to improve upon it by changing the ps and qs.

From our first model we can see that the Wald test statistic for ma1 (estimate divided by standard error, about -0.90) is pretty small, so we may want to eliminate that term. Testing the null that the coefficient is 0 against the alternative that it is \(\neq 0\), we fail to reject the null, so the moving average term is not significant and we can remove it.
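As a quick by-hand check (the same Wald calculation we use again below; the estimate and standard error are the ma1 values from the summary above, and the df follows the same observations-minus-parameters convention):

# Wald test for H0: theta_1 = 0 in mod1 (ma1 estimate -0.1564, s.e. 0.1747)
(teststat_ma1 <- -0.1564/0.1747)
2*pt(abs(teststat_ma1), df = 144-4, lower.tail = FALSE) # around .37, so do not reject; ma1 can go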
Removing ma1, we will make our model again:

# ARIMA (1, 0, 0) x (1, 1, 1) 12
mod2 <- arima(births, order = c(1,0,0), seasonal = list(order = c(1,1,1), period = 12))
Box.test(mod2$residuals, type = "Ljung") # yay
## 
##  Box-Ljung test
## 
## data:  mod2$residuals
## X-squared = 0.62922, df = 1, p-value = 0.4276
summary(mod2) # most Wald test stats look decently far from 0
## 
## Call:
## arima(x = births, order = c(1, 0, 0), seasonal = list(order = c(1, 1, 1), period = 12))
## 
## Coefficients:
##          ar1     sar1     sma1
##       0.9750  -0.2263  -0.8595
## s.e.  0.0326   0.0998   0.1280
## 
## sigma^2 estimated as 0.3986:  log likelihood = -137.27,  aic = 282.55
## 
## Training set error measures:
##                      ME      RMSE      MAE     MPE     MAPE      MASE
## Training set 0.07991759 0.6045416 0.450681 0.28466 1.748639 0.3720852
##                     ACF1
## Training set -0.06542041

Let's test the parameter with the Wald test statistic closest to 0 (sar1):

(teststat <- -0.2263/0.0998)
## [1] -2.267535
2*pt(abs(teststat), df = 144-3, lower.tail = FALSE) #p value
## [1] 0.02487795
# therefore, reject null that this parameter = 0
# we should keep this parameter estimate in the model
# and not remove the seasonal autoregressive parameter

Let’s see if we can improve our model by further alterations of ps and qs.

## we said p >=1, P >=1, Q >=1 and we tried all EQUAL to 1
# ARIMA (2, 0, 0) x (1, 1, 1) 12
mod3 <- arima(births, order = c(2,0,0), seasonal = list(order = c(1,1,1), period = 12))
mod3 # ar2 not significantly diff from 0
## 
## Call:
## arima(x = births, order = c(2, 0, 0), seasonal = list(order = c(1, 1, 1), period = 12))
## 
## Coefficients:
##          ar1     ar2     sar1     sma1
##       0.9275  0.0513  -0.2237  -0.8671
## s.e.  0.0876  0.0881   0.0999   0.1333
## 
## sigma^2 estimated as 0.3965:  log likelihood = -137.1,  aic = 284.21

Question: should we increase the order of the nonseasonal autoregressive piece? No; ar2 is not significantly different from 0, so let's forget about mod3 and go back to mod2. Should we increase the order of any of the seasonal components? Let's try increasing the seasonal AR piece.

# ARIMA (1, 0, 0) x (2, 1, 1) 12
mod4 <- arima(births, order = c(1,0,0), seasonal = list(order = c(2,1,1), period = 12))
mod4 # sar2 IS significantly diff from 0
## 
## Call:
## arima(x = births, order = c(1, 0, 0), seasonal = list(order = c(2, 1, 1), period = 12))
## 
## Coefficients:
##          ar1     sar1     sar2     sma1
##       0.9677  -0.4796  -0.3846  -0.6243
## s.e.  0.0268   0.1138   0.1022   0.1227
## 
## sigma^2 estimated as 0.3673:  log likelihood = -131.72,  aic = 273.44
AIC(mod4)
## [1] 273.4391
AIC(mod2) # we improved! mod4 better than mod2
## [1] 282.5484
# ARIMA (1, 0, 0) x (2, 1, 2) 12
mod5 <- arima(births, order = c(1,0,0), seasonal = list(order = c(2,1,2), period = 12))
mod5 # sma2 NOT significantly diff from 0
## 
## Call:
## arima(x = births, order = c(1, 0, 0), seasonal = list(order = c(2, 1, 2), period = 12))
## 
## Coefficients:
##          ar1     sar1     sar2     sma1    sma2
##       0.9612  -0.2486  -0.3563  -0.8751  0.2615
## s.e.  0.0290   0.2787   0.1310   0.3022  0.2498
## 
## sigma^2 estimated as 0.3631:  log likelihood = -131.09,  aic = 274.19
# so forget mod5. Simpler (mod4) is better 
AIC(mod5)
## [1] 274.1868
AIC(mod4) #plus they have the same AIC (roughly anyway)
## [1] 273.4391

Nope, there's not a significant difference between the more complicated mod5 and mod4, so we go back to mod4.

We can examine the diagnostics for ARIMA (1, 0, 0) x (2, 1, 1) 12

mod4
## 
## Call:
## arima(x = births, order = c(1, 0, 0), seasonal = list(order = c(2, 1, 1), period = 12))
## 
## Coefficients:
##          ar1     sar1     sar2     sma1
##       0.9677  -0.4796  -0.3846  -0.6243
## s.e.  0.0268   0.1138   0.1022   0.1227
## 
## sigma^2 estimated as 0.3673:  log likelihood = -131.72,  aic = 273.44
Box.test(mod4$residuals) # model not inadequate :)
## 
##  Box-Pierce test
## 
## data:  mod4$residuals
## X-squared = 1.3432, df = 1, p-value = 0.2465

Here our p-value is .2465, which means we do not reject the null hypothesis that the model is adequate. Yay.

Let’s compare what we got to auto.arima:

automod <- auto.arima(births)
automod # pretty similar
## Series: births 
## ARIMA(1,0,0)(2,1,1)[12] with drift 
## 
## Coefficients:
##          ar1     sar1     sar2     sma1   drift
##       0.6988  -0.4160  -0.3503  -0.6846  0.0447
## s.e.  0.0631   0.1222   0.1093   0.1281  0.0033
## 
## sigma^2 estimated as 0.3287:  log likelihood=-122.05
## AIC=256.09   AICc=256.77   BIC=273.39
AIC(automod)
## [1] 256.0934
AIC(mod4)
## [1] 273.4391

automod is a substantial improvement over mod4, since its AIC is lower by more than 10.
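The gap is easy to compute directly:

AIC(mod4) - AIC(automod) # about 17.3, comfortably past the rule-of-thumb cutoff of 10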

We can also perform a Box test:

Box.test(automod$residuals)
## 
##  Box-Pierce test
## 
## data:  automod$residuals
## X-squared = 0.23843, df = 1, p-value = 0.6253

The p value is high so we again do not reject the null that the model is adequate.

Forecasting

Additionally, we can look at the difference between the model we came up with and the model provided by R's auto.arima by comparing their forecasts.

par(mfrow=c(2,1))
plot(forecast(automod, h = 12), main = "auto.arima")
plot(forecast(mod4, h = 12), main = "home-cooked")

There is less variation in the auto.arima forecast (narrower prediction intervals), which suggests it has better predictive power. Basically, we're not as good as the computer, which is only to be expected.