In this learning log we will work out which of our two models from the last learning log is better. To refresh, last time we built our own ARIMA model by hand, based on charts. We also used auto.arima, which gave us a slightly different model. We can now compare these two models with a formal test to see which one we should keep.
We will reuse our code from last time to fit both models.
Our hand-built ARIMA model is (2, 1, 1) x (1, 1, 1) [12]. Our auto.arima model is (0, 1, 1) x (0, 1, 1) [12].
We can fit these to the logged AirPassengers data and store them as mod1 and mod2.
data("AirPassengers")
library(forecast)
mod1 <- arima(log(AirPassengers), order = c(2, 1, 1), seasonal = list(order = c(1, 1, 1), period = 12))
mod2 <- arima(log(AirPassengers), order = c(0, 1, 1), seasonal = list(order = c(0, 1, 1), period = 12))
Now that we have set up our models, we can use logLik to get the log-likelihood of each one, which is what we need to build our test statistic.
(testStat1 <- logLik(mod1))
## 'log Lik.' 246.2106 (df=6)
(testStat2 <- logLik(mod2))
## 'log Lik.' 244.6995 (df=3)
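Here we can see the log-likelihood and the degrees of freedom (the number of estimated parameters) for each model. The smaller model is nested inside the bigger one, so we can compare them with a likelihood ratio test. The test statistic is twice the difference between the two log-likelihoods, and its degrees of freedom are the difference between the two parameter counts, 6 - 3 = 3.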
(testStatFinal <- 2 *(as.numeric(logLik(mod1)) - as.numeric(logLik(mod2))))
## [1] 3.022101
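As a quick check on the degrees of freedom reported by logLik, here is a small sketch that counts the estimated coefficients in each model. It assumes that the df stored on the logLik object is the number of coefficients plus one for the error variance, which matches the 6 and 3 shown in the output above. The variable names (nCoef1, dfDiff, and so on) are just ours for illustration.

```r
data("AirPassengers")

# Refit the two models from above so this chunk runs on its own
mod1 <- arima(log(AirPassengers), order = c(2, 1, 1),
              seasonal = list(order = c(1, 1, 1), period = 12))
mod2 <- arima(log(AirPassengers), order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 12))

# Count the estimated coefficients (AR, MA, seasonal AR, seasonal MA)
nCoef1 <- length(coef(mod1))  # ar1, ar2, ma1, sar1, sma1
nCoef2 <- length(coef(mod2))  # ma1, sma1

# The df stored on the logLik object: coefficients plus the error variance
df1 <- attr(logLik(mod1), "df")
df2 <- attr(logLik(mod2), "df")

# Degrees of freedom for the likelihood ratio test
dfDiff <- df1 - df2
dfDiff
```

The difference counts the extra parameters in the hand-built model (two AR terms and one seasonal AR term), and it is the df we pass to pchisq.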
We can now use pchisq to get our p-value, which is the upper tail of a chi-squared distribution with 3 degrees of freedom.
pchisq(testStatFinal, df = 3, lower.tail = FALSE)
## [1] 0.3882302
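The steps above can be wrapped into one small helper so the test is easy to repeat for other pairs of nested models. This is only a sketch: lrTest is a name we made up (it is not a function from the forecast package), and it assumes the second model is nested inside the first and that both were fit to the same data.

```r
data("AirPassengers")

# Refit the two models from above so this chunk runs on its own
mod1 <- arima(log(AirPassengers), order = c(2, 1, 1),
              seasonal = list(order = c(1, 1, 1), period = 12))
mod2 <- arima(log(AirPassengers), order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 12))

# Hypothetical helper: likelihood ratio test of a bigger model against
# a smaller model nested inside it
lrTest <- function(full, reduced) {
  llFull <- logLik(full)
  llReduced <- logLik(reduced)
  # Twice the difference in log-likelihoods
  statistic <- 2 * (as.numeric(llFull) - as.numeric(llReduced))
  # Difference in the number of estimated parameters
  df <- attr(llFull, "df") - attr(llReduced, "df")
  # Upper tail of the chi-squared distribution
  pValue <- pchisq(statistic, df = df, lower.tail = FALSE)
  list(statistic = statistic, df = df, p.value = pValue)
}

result <- lrTest(mod1, mod2)
result
```

The null hypothesis here is that the extra parameters in the bigger model are all zero, so a small p-value favors the bigger model and a large one favors the smaller model.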
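This gives us a big p-value (about 0.39), so we fail to reject the null hypothesis that the three extra parameters in our hand-built model are zero. The bigger model does not fit significantly better than the smaller one, so we want to go with our second model. This means that the simpler auto.arima model, (0, 1, 1) x (0, 1, 1) [12], is the better choice, because it fits about as well with fewer parameters.
In terms of applications, the two models differ in the autoregressive order (2 instead of 0) and the seasonal autoregressive order (1 instead of 0). An autoregressive order of 2 means that the model uses the two previous values of the differenced series to predict yt, and a seasonal autoregressive order of 1 adds the value from 12 months earlier. The test tells us that these extra terms do not add much once the differencing and moving average terms are already in the model.
This means that if airlines are trying to predict the busy times for people to use planes, the simpler model is enough: differencing against the previous month and the same month in the previous year, plus the two moving average terms, already captures the pattern. They do not need the extra autoregressive terms on top of that.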
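To see what the difference between the models looks like in practice, here is a sketch that forecasts the next 12 months from both of them and puts the results side by side. It uses predict from base R. Because the models were fit to log(AirPassengers), we use exp to get back to the passenger scale, which is an approximate back-transformation. The variable names are ours for illustration.

```r
data("AirPassengers")

# Refit the two models from above so this chunk runs on its own
mod1 <- arima(log(AirPassengers), order = c(2, 1, 1),
              seasonal = list(order = c(1, 1, 1), period = 12))
mod2 <- arima(log(AirPassengers), order = c(0, 1, 1),
              seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast 12 months ahead on the log scale
fc1 <- predict(mod1, n.ahead = 12)
fc2 <- predict(mod2, n.ahead = 12)

# Back-transform to thousands of passengers (approximate)
pass1 <- exp(fc1$pred)
pass2 <- exp(fc2$pred)

# Side-by-side comparison of the two sets of forecasts
comparison <- cbind(handBuilt = pass1, autoArima = pass2)
round(comparison)

# Month with the highest forecast under the simpler model
busiestMonth <- month.abb[cycle(pass2)[which.max(pass2)]]
busiestMonth
```

If the two columns are close to each other, that agrees with what the likelihood ratio test told us about the extra terms.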