Introduction

Likelihood Ratio Test or Wilks Test is used to compare two nested models. Nested models means that one is a special case of the other. The test compares the models in terms of model fit.

Test statistic = -2*[loglikelihood of the complicated model - loglikelihood of the simple model]

The test statistic follows a chi-square distribution with degrees of freedom being the difference in the number of parameters for the two models.

To do:

  1. fit two models

  2. find the logLik of each model and the teststat

  3. pchisq(teststat)

Example

library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Warning: package 'xts' was built under R version 3.4.4
## Warning: package 'zoo' was built under R version 3.4.4
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Warning: package 'timeDate' was built under R version 3.4.4
library(forecast)
library(xts)

First, we need to call the data. I have worked with this dataset in the past so I know that we need to take a log of the data to get rid of increasing seasonal varaince.

data = log(AirPassengers)
plot.ts(data)

Now, we have constant seasonal variance so we can start on our models.

We can see there is a seasonal piece and an upward trend. We’ll want to take a difference to find D first.

diff1 = diff(data, lag = 4)
plot.ts(diff1)

adf.test(diff1)
## Warning in adf.test(diff1): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff1
## Dickey-Fuller = -8.4949, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

Since we the upward trend disappeared, we don’t need to take any more differences.

D = 1 and d = 0

Acf(diff1)

Pacf(diff1)

From this, we can see that q >= 1, Q >= 1, p >= 1 and P >= 1. We talked about how to get these numbers in the last learning log.

Now, we can make our two arima models. The first one will be a simple model and the second one will be more complicated.

mod1 = arima(data, order = c(1,0,1), season = list( order = c(1,1,1), period=4))
mod2 = arima(data, order = c(2,0,2), season = list( order = c(1,1,1), period=4))

Our hypotheses are below:

\[{H_o} : {\phi} = 0, {\theta} = 0\] \[{H_a} : {\phi} > 0 - or- {\phi} < 0 -and/or- {\theta} > 0 -or- {\theta} < 0 \]

In other words, our null is that we don’t need Q = 2 and P = 2 and our alternative is that we do need Q = 2 and P = 2 in our model.

We can find the loglikelihood of the models by using the logLik command. Once we have those, we can find our test statistic. Because the distribution is chi-squared, our test statistic needs to be positive. If our formula above doesn’t give a positive number, multiply by -2 instead of just 2.

logLik(mod1)
## 'log Lik.' 129.7264 (df=5)
logLik(mod2)
## 'log Lik.' 138.7879 (df=7)
teststat = 2*(as.numeric(logLik(mod2)) - as.numeric(logLik(mod1)))
teststat
## [1] 18.12298

Now, we can find our p-value by using the pchisq command.

pchisq(teststat, df=2, lower.tail=FALSE)
## [1] 0.0001160501

Our p-value is small so we will keep the more complicated model. That means that the model we should use is mod2 with Q = 2 and P = 2.

Let’s try again using an auto.arima model.

automod <- auto.arima(data)
logLik(automod)
## 'log Lik.' 244.6995 (df=3)
teststat = 2*(as.numeric(logLik(automod)) - as.numeric(logLik(mod2)))
teststat
## [1] 211.8233
pchisq(teststat, df=4, lower.tail=FALSE)
## [1] 1.076915e-44

Since df = 3 for our automod, that means automod is actually the simpler model. Our p-value is telling us to keep the more complicated model.

Overview

When deciding if we need to keep a categorical varaible in our model, we need to use anova or this test. This also helps compare nested models like arima models.