Introduction

Today in class we covered a new form of model comparison: the likelihood ration test, otherwise known as Wilk’s Test. For LRT, we need the models to be nested, where one model is a special case of the other (one model is more complicated with more terms than the other). We need evidence to justify using a more complicated model, and LRT gives us a way to test whether or not we have evidence.

The likelihood ratio test compares the fit of two models using likelihood. The test stat in this case is -2[loglikelihood of simpler - loglikelihood or complicated]. The test statistic follows the chi-squared distribution, where the degrees of freedom is the difference in the number of parameters. The more extreme our test statistic is, the smaller the p-value is, which means we want to reject the null hypothesis in favor of the alternative. In this case, the complicated model better and we have enough evidence to use it.

Example

We will use the souvenir data set from the previous learning log to demonstrate this concept.

souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenir <- ts(souvenir, frequency=12, start=c(1987,1))
logsouvenir <- log(souvenir)
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.3
## 
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
## 
##     time<-
library(forecast)
library(xts)

We will first fit our two models.

mod1 <- arima(logsouvenir, order = c(1,1,1), season = list(order = c(0,1,0), period=12))
mod2 <- arima(logsouvenir, order = c(0,1,1), season = list(order = c(0,1,0), period=12))

mod2 is nested in the mod1, so we can use LRT to compare these two models.

Our degrees of freedom is equal to 1 because the first model has one more term than the second.

mydf<-1
(teststat<- -2*as.numeric(logLik(mod2) - logLik(mod1)))
## [1] 0.3855086
pchisq(teststat, df = mydf, lower.tail = FALSE)
## [1] 0.5346698

Since we have a large p-value, we don’t want to reject the null hypothesis. In this case, we want to go with the simpler model, which is also what we found in the learning log yesterday.

We don’t know that mod2 is the best, but we do know that it’s better than mod1!