Today we learned about using a Likelihood Ratio Test for model comparison. Like its name says, it compares the fit of 2 models using likelihood. I will use to to compare some models that I made in my last learning log (http://rpubs.com/burr0038/383890).
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.3
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(forecast)
library(xts)
I was working with the Air Passengers data and applied a log transformation to the data.
data("AirPassengers")
Air<-log(AirPassengers)
plot(Air)
Here are two different ARIMA models we could try for the AirPassenger data. modair2 is the more complicated model, adding an extra lag term and an extra error term to the model.
modair <- arima(Air, order = c(1,0,1), season = list( order = c( 1,2,1), period=12))
modair2 <- arima(Air, order = c(2,0,2), season = list( order = c( 1,2,1), period=12))
summary(modair)
##
## Call:
## arima(x = Air, order = c(1, 0, 1), seasonal = list(order = c(1, 2, 1), period = 12))
##
## Coefficients:
## ar1 ma1 sar1 sma1
## 0.9303 -0.4175 -0.4505 -1.0000
## s.e. 0.0448 0.0991 0.0861 0.1038
##
## sigma^2 estimated as 0.001415: log likelihood = 203.55, aic = -397.1
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.001134586 0.03439361 0.02451314 -0.01830164 0.4413303
## MASE ACF1
## Training set 0.2706075 -0.02443055
summary(modair2)
##
## Call:
## arima(x = Air, order = c(2, 0, 2), seasonal = list(order = c(1, 2, 1), period = 12))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sma1
## -0.0132 0.8802 0.552 -0.4233 -0.4476 -1.0000
## s.e. 0.0878 0.0786 0.141 0.1054 0.0864 0.1059
##
## sigma^2 estimated as 0.001391: log likelihood = 203.78, aic = -393.56
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.001105129 0.03410132 0.02432502 -0.01781168 0.4378125
## MASE ACF1
## Training set 0.2685308 -0.03115159
By looking at the model summaries, we can confirm that they are nested models. This is something we must check because the Likelihood Ratio Test can only compare nested models.
Now that we have our models and are sure they are nested. Let’s compare them. First we will find our degrees of freedom. This is the difference in the number of parameters between the 2 models. Another way we can think about it is how many variables would we be dropping if we choose the simpler model? This is easy to do in our head in this case but we can also make R check for us.
mydf <- length(coef(modair2)) - length(coef(modair))
mydf
## [1] 2
Now we will find our test stat.
(teststat<- -2*as.numeric(logLik(modair) - logLik(modair2)))
## [1] 0.4553452
This is a pretty small test-stat, so it is likely that our complicated model is not necessary.
Let’s check out our p-value.
pchisq(teststat, df = mydf, lower.tail = FALSE)
## [1] 0.796385
As expected, our p-value is quite large. This means that our complicated model is not significantly better. We will stick with the simpler model!
It is important to keep in mind that the Likelihood Ratio test does not give us the best model, but just tells us which is better out of the ones we are comparing.
LRT is another tool you can use when doing model selection stuff. It will be interesting to try LRT with various model buidling techniques.
Let’s try some forecasting using our best model (the simple one).
plot(forecast(modair, h = 12))
predictions<-forecast(modair, h = 12)
predictions
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1961 6.121999 6.071443 6.172555 6.044681 6.199317
## Feb 1961 6.047318 5.990571 6.104064 5.960531 6.134104
## Mar 1961 6.162154 6.100556 6.223751 6.067949 6.256359
## Apr 1961 6.216652 6.151155 6.282150 6.116482 6.316822
## May 1961 6.271385 6.202699 6.340071 6.166339 6.376431
## Jun 1961 6.392622 6.321302 6.463942 6.283548 6.501697
## Jul 1961 6.549850 6.476338 6.623363 6.437422 6.662278
## Aug 1961 6.543516 6.468170 6.618862 6.428285 6.658748
## Sep 1961 6.348452 6.271570 6.425335 6.230870 6.466035
## Oct 1961 6.242951 6.164778 6.321123 6.123396 6.362505
## Nov 1961 6.097196 6.017943 6.176449 5.975988 6.218403
## Dec 1961 6.197996 6.117841 6.278151 6.075409 6.320583
This data is from 1949-1960. Using these predictions, I would expect that the number of international airline passengers in the summer months 1961 would be about 6.5 thousand a month.
Let’s see if we predict all the way ahead to today. It probably won’t be very accurate since it is so far out.
plot(forecast(modair, h = 696))
predictions2<-forecast(modair, h = 696)
length(predictions2)
## [1] 10
pred<-data.frame(predictions2)
pred[c(685:696),]
## Point.Forecast Lo.80 Hi.80 Lo.95 Hi.95
## Jan 2018 12.87249 11.34766 14.39732 10.540460 15.20451
## Feb 2018 12.11580 10.58272 13.64889 9.771156 14.46045
## Mar 2018 12.15646 10.61676 13.69617 9.801694 14.51123
## Apr 2018 12.70796 11.16311 14.25281 10.345318 15.07060
## May 2018 13.31975 11.77111 14.86840 10.951309 15.68820
## Jun 2018 13.40077 11.84960 14.95195 11.028458 15.77309
## Jul 2018 13.80794 12.25544 15.36044 11.433590 16.18229
## Aug 2018 13.72787 12.17521 15.28053 11.353282 16.10245
## Sep 2018 12.98792 11.43627 14.53956 10.614878 15.36095
## Oct 2018 13.09366 11.54422 14.64310 10.723998 15.46333
## Nov 2018 12.87543 11.32944 14.42142 10.511041 15.23982
## Dec 2018 12.71895 11.17774 14.26017 10.361864 15.07604
Here we can say it predicts the number of international airline passengers during the summer months this year will be about 13.8 thousand per month. We can’t trust this since the data is so old. BUT it is still kinda fun!