In class on Tuesday we learned how to compare nested ARIMA models using the likelihood ratio test. The goal is to find the best model fit. For this learning log, I will build off of my finding in learning log 23 in which I used the souvenir data. I will compare a model, ARIMA (2,0,0) x (2, 1, 2) 12, to the auto ARIMA model from R, (2,0,0) x (1,1,0) 12, so that we can clearly see why R’s model is the best model.
A likelihood ratio test requires nested models. The test statistic is \(2[loglikelihood(model1)-loglikelihood(model2)]\). We compare this to a Chi-squared distribution with the degrees of freedom equal to the difference between model parameters. The informal null hypothesis is \(H_0: \theta = 0 \ \ and \ \ \phi=0\). The alternative hypothesis is: \(H_A: \theta \ \ and / or \ \ \phi \neq 0\)
Before building the models and checking if they are nested, we call the data and get a refresher on what the souvenir time series looks like.
library(forecast)
library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(xts)
souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenir <- ts(souvenir, frequency=12, start=c(1987,1))
plot.ts(souvenir)
The souvenir data definitely has some seasonality, and we could decompose the time series to confirm. The size of the peaks in the data is not constant, so we will try a log tranform to remedy the problem.
logsouvenir<-log(souvenir)
plot.ts(logsouvenir)
Now that we are familiar with the souvenir time series data and its log tranform, we will check if our two models are nested models.
mod<- arima(logsouvenir, order = c(2,0,0), season = list( order = c( 2,1,2), period=12))
## Warning in arima(logsouvenir, order = c(2, 0, 0), season = list(order =
## c(2, : possible convergence problem: optim gave code = 1
automod<-auto.arima(logsouvenir)
mod
##
## Call:
## arima(x = logsouvenir, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 2),
## period = 12))
##
## Coefficients:
## ar1 ar2 sar1 sar2 sma1 sma2
## 0.5095 0.4752 -0.6747 0.2386 0.1808 -0.6582
## s.e. 0.1101 0.1055 0.7646 0.3429 0.7984 0.5998
##
## sigma^2 estimated as 0.03006: log likelihood = 20.54, aic = -27.08
automod
## Series: logsouvenir
## ARIMA(2,0,0)(1,1,0)[12] with drift
##
## Coefficients:
## ar1 ar2 sar1 drift
## 0.3493 0.3602 -0.3278 0.0247
## s.e. 0.1086 0.1159 0.1334 0.0044
##
## sigma^2 estimated as 0.03182: log likelihood=23.04
## AIC=-36.09 AICc=-35.18 BIC=-24.71
The mod model has coefficients ar1, ar2, sar1, sar2, sma1, and sma2. The automod model has coefficients ar1, ar2, and sar1. Thus, the automod model is a nested model of the mod model. We will ignore drift in this discussion.
Now that we have our models, we can test the fits using the equation for the test statistic from above. The difference in parameters between models is 3, so this is our dfs.
teststat<- 2*(as.numeric(logLik(automod))-as.numeric(logLik(mod)))
teststat
## [1] 5.009028
pchisq(teststat, df=3, lower.tail = FALSE )
## [1] 0.1711373
The p-value is large, so the larger autoregressive and moving average seasonal component in mod is not needed. Thus, the simpler model automod is better. This is expected because we used the auto.arima command in R to build the more simple model.
We can check the AIC values to also determine the best model fit.
AIC(mod); AIC(automod)
## [1] -27.08062
## [1] -36.08965
The automod model has the better AIC value.
We can use the model we have determined to be the better model to predict souvenir sales for the next year for this shop.
forecast(automod, h=12)
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 1994 9.561175 9.332560 9.789790 9.211539 9.910811
## Feb 1994 9.710395 9.468237 9.952553 9.340046 10.080744
## Mar 1994 10.273626 10.007560 10.539692 9.866714 10.680539
## Apr 1994 10.041821 9.767385 10.316257 9.622108 10.461534
## May 1994 9.911652 9.630035 10.193269 9.480956 10.342348
## Jun 1994 10.121112 9.835715 10.406510 9.684635 10.557590
## Jul 1994 10.428384 10.140343 10.716424 9.987864 10.868903
## Aug 1994 10.542958 10.253331 10.832584 10.100012 10.985903
## Sep 1994 10.645778 10.355110 10.936447 10.201239 11.090318
## Oct 1994 10.670899 10.379577 10.962221 10.225361 11.116437
## Nov 1994 11.062977 10.771234 11.354720 10.616795 11.509159
## Dec 1994 11.870077 11.578067 12.162087 11.423486 12.316668