Today we learned about using a Likelihood Ratio Test for model comparison. Like its name says, it compares the fit of 2 models using likelihood. I will use to to compare some models that I made in my last learning log (http://rpubs.com/burr0038/383890).

library(forecast)

## Warning: package 'forecast' was built under R version 3.4.4

library(quantmod)

## Warning: package 'quantmod' was built under R version 3.4.4

## Loading required package: xts

## Warning: package 'xts' was built under R version 3.4.4

## Loading required package: zoo

## Warning: package 'zoo' was built under R version 3.4.3

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## Loading required package: TTR

## Warning: package 'TTR' was built under R version 3.4.3

## Version 0.4-0 included new data defaults. See ?getSymbols.

library(tseries)

## Warning: package 'tseries' was built under R version 3.4.4

library(timeSeries)

## Warning: package 'timeSeries' was built under R version 3.4.4

## Loading required package: timeDate

## Warning: package 'timeDate' was built under R version 3.4.3

## 
## Attaching package: 'timeSeries'

## The following object is masked from 'package:zoo':
## 
##     time<-

library(forecast)
library(xts)

Models

I was working with the Air Passengers data and applied a log transformation to the data.

data("AirPassengers")
Air<-log(AirPassengers)
plot(Air)

Here are two different ARIMA models we could try for the AirPassenger data. modair2 is the more complicated model, adding an extra lag term and an extra error term to the model.

modair <- arima(Air, order = c(1,0,1), season = list( order = c( 1,2,1), period=12))
modair2 <- arima(Air, order = c(2,0,2), season = list( order = c( 1,2,1), period=12))
summary(modair)

## 
## Call:
## arima(x = Air, order = c(1, 0, 1), seasonal = list(order = c(1, 2, 1), period = 12))
## 
## Coefficients:
##          ar1      ma1     sar1     sma1
##       0.9303  -0.4175  -0.4505  -1.0000
## s.e.  0.0448   0.0991   0.0861   0.1038
## 
## sigma^2 estimated as 0.001415:  log likelihood = 203.55,  aic = -397.1
## 
## Training set error measures:
##                        ME       RMSE        MAE         MPE      MAPE
## Training set -0.001134586 0.03439361 0.02451314 -0.01830164 0.4413303
##                   MASE        ACF1
## Training set 0.2706075 -0.02443055

summary(modair2)

## 
## Call:
## arima(x = Air, order = c(2, 0, 2), seasonal = list(order = c(1, 2, 1), period = 12))
## 
## Coefficients:
##           ar1     ar2    ma1      ma2     sar1     sma1
##       -0.0132  0.8802  0.552  -0.4233  -0.4476  -1.0000
## s.e.   0.0878  0.0786  0.141   0.1054   0.0864   0.1059
## 
## sigma^2 estimated as 0.001391:  log likelihood = 203.78,  aic = -393.56
## 
## Training set error measures:
##                        ME       RMSE        MAE         MPE      MAPE
## Training set -0.001105129 0.03410132 0.02432502 -0.01781168 0.4378125
##                   MASE        ACF1
## Training set 0.2685308 -0.03115159

By looking at the model summaries, we can confirm that they are nested models. This is something we must check because the Likelihood Ratio Test can only compare nested models.

Likelihood Ratio Test

Now that we have our models and are sure they are nested. Let’s compare them. First we will find our degrees of freedom. This is the difference in the number of parameters between the 2 models. Another way we can think about it is how many variables would we be dropping if we choose the simpler model? This is easy to do in our head in this case but we can also make R check for us.

mydf <- length(coef(modair2)) - length(coef(modair)) 
mydf

## [1] 2

Now we will find our test stat.

(teststat<- -2*as.numeric(logLik(modair) - logLik(modair2)))

## [1] 0.4553452

This is a pretty small test-stat, so it is likely that our complicated model is not necessary.

Let’s check out our p-value.

pchisq(teststat, df = mydf, lower.tail = FALSE)

## [1] 0.796385

As expected, our p-value is quite large. This means that our complicated model is not significantly better. We will stick with the simpler model!

It is important to keep in mind that the Likelihood Ratio test does not give us the best model, but just tells us which is better out of the ones we are comparing.

LRT is another tool you can use when doing model selection stuff. It will be interesting to try LRT with various model buidling techniques.

Forecasting

Let’s try some forecasting using our best model (the simple one).

plot(forecast(modair, h = 12))

predictions<-forecast(modair, h = 12)
predictions

##          Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Jan 1961       6.121999 6.071443 6.172555 6.044681 6.199317
## Feb 1961       6.047318 5.990571 6.104064 5.960531 6.134104
## Mar 1961       6.162154 6.100556 6.223751 6.067949 6.256359
## Apr 1961       6.216652 6.151155 6.282150 6.116482 6.316822
## May 1961       6.271385 6.202699 6.340071 6.166339 6.376431
## Jun 1961       6.392622 6.321302 6.463942 6.283548 6.501697
## Jul 1961       6.549850 6.476338 6.623363 6.437422 6.662278
## Aug 1961       6.543516 6.468170 6.618862 6.428285 6.658748
## Sep 1961       6.348452 6.271570 6.425335 6.230870 6.466035
## Oct 1961       6.242951 6.164778 6.321123 6.123396 6.362505
## Nov 1961       6.097196 6.017943 6.176449 5.975988 6.218403
## Dec 1961       6.197996 6.117841 6.278151 6.075409 6.320583

This data is from 1949-1960. Using these predictions, I would expect that the number of international airline passengers in the summer months 1961 would be about 6.5 thousand a month.

Let’s see if we predict all the way ahead to today. It probably won’t be very accurate since it is so far out.

plot(forecast(modair, h = 696))

predictions2<-forecast(modair, h = 696)
length(predictions2)

## [1] 10

pred<-data.frame(predictions2)
pred[c(685:696),]

##          Point.Forecast    Lo.80    Hi.80     Lo.95    Hi.95
## Jan 2018       12.87249 11.34766 14.39732 10.540460 15.20451
## Feb 2018       12.11580 10.58272 13.64889  9.771156 14.46045
## Mar 2018       12.15646 10.61676 13.69617  9.801694 14.51123
## Apr 2018       12.70796 11.16311 14.25281 10.345318 15.07060
## May 2018       13.31975 11.77111 14.86840 10.951309 15.68820
## Jun 2018       13.40077 11.84960 14.95195 11.028458 15.77309
## Jul 2018       13.80794 12.25544 15.36044 11.433590 16.18229
## Aug 2018       13.72787 12.17521 15.28053 11.353282 16.10245
## Sep 2018       12.98792 11.43627 14.53956 10.614878 15.36095
## Oct 2018       13.09366 11.54422 14.64310 10.723998 15.46333
## Nov 2018       12.87543 11.32944 14.42142 10.511041 15.23982
## Dec 2018       12.71895 11.17774 14.26017 10.361864 15.07604

Here we can say it predicts the number of international airline passengers during the summer months this year will be about 13.8 thousand per month. We can’t trust this since the data is so old. BUT it is still kinda fun!

Learning Log 5/1

Regina Burroughs

May 1, 2018

Models

Likelihood Ratio Test

Forecasting