Introduction

Likelihood Ratio Test or Wilks Test is used to compare two nested models. Nested models means that one is a special case of the other. The test compares the models in terms of model fit.

Test statistic = -2*[loglikelihood of the complicated model - loglikelihood of the simple model]

The test statistic follows a chi-square distribution with degrees of freedom being the difference in the number of parameters for the two models.

To do:

fit two models
find the logLik of each model and the teststat
pchisq(teststat)

Example

library(forecast)

## Warning: package 'forecast' was built under R version 3.4.4

library(quantmod)

## Warning: package 'quantmod' was built under R version 3.4.4

## Warning: package 'xts' was built under R version 3.4.4

## Warning: package 'zoo' was built under R version 3.4.4

library(tseries)

## Warning: package 'tseries' was built under R version 3.4.4

library(timeSeries)

## Warning: package 'timeSeries' was built under R version 3.4.4

## Warning: package 'timeDate' was built under R version 3.4.4

library(forecast)
library(xts)

First, we need to call the data. I have worked with this dataset in the past so I know that we need to take a log of the data to get rid of increasing seasonal varaince.

data = log(AirPassengers)
plot.ts(data)

Now, we have constant seasonal variance so we can start on our models.

We can see there is a seasonal piece and an upward trend. We’ll want to take a difference to find D first.

diff1 = diff(data, lag = 4)
plot.ts(diff1)

adf.test(diff1)

## Warning in adf.test(diff1): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff1
## Dickey-Fuller = -8.4949, Lag order = 5, p-value = 0.01
## alternative hypothesis: stationary

Since we the upward trend disappeared, we don’t need to take any more differences.

D = 1 and d = 0

Acf(diff1)

Pacf(diff1)

From this, we can see that q >= 1, Q >= 1, p >= 1 and P >= 1. We talked about how to get these numbers in the last learning log.

Now, we can make our two arima models. The first one will be a simple model and the second one will be more complicated.

mod1 = arima(data, order = c(1,0,1), season = list( order = c(1,1,1), period=4))
mod2 = arima(data, order = c(2,0,2), season = list( order = c(1,1,1), period=4))

Our hypotheses are below:

\[{H_o} : {\phi} = 0, {\theta} = 0\] \[{H_a} : {\phi} > 0 - or- {\phi} < 0 -and/or- {\theta} > 0 -or- {\theta} < 0 \]

In other words, our null is that we don’t need Q = 2 and P = 2 and our alternative is that we do need Q = 2 and P = 2 in our model.

We can find the loglikelihood of the models by using the logLik command. Once we have those, we can find our test statistic. Because the distribution is chi-squared, our test statistic needs to be positive. If our formula above doesn’t give a positive number, multiply by -2 instead of just 2.

logLik(mod1)

## 'log Lik.' 129.7264 (df=5)

logLik(mod2)

## 'log Lik.' 138.7879 (df=7)

teststat = 2*(as.numeric(logLik(mod2)) - as.numeric(logLik(mod1)))
teststat

## [1] 18.12298

Now, we can find our p-value by using the pchisq command.

pchisq(teststat, df=2, lower.tail=FALSE)

## [1] 0.0001160501

Our p-value is small so we will keep the more complicated model. That means that the model we should use is mod2 with Q = 2 and P = 2.

Let’s try again using an auto.arima model.

automod <- auto.arima(data)
logLik(automod)

## 'log Lik.' 244.6995 (df=3)

teststat = 2*(as.numeric(logLik(automod)) - as.numeric(logLik(mod2)))
teststat

## [1] 211.8233

pchisq(teststat, df=4, lower.tail=FALSE)

## [1] 1.076915e-44

Since df = 3 for our automod, that means automod is actually the simpler model. Our p-value is telling us to keep the more complicated model.

Overview

When deciding if we need to keep a categorical varaible in our model, we need to use anova or this test. This also helps compare nested models like arima models.

Learning Log Day 24

Megan Blaschko

May 1, 2018

Introduction

Example

Overview