Overview

In class on Tuesday we learned how to compare nested ARIMA models using the likelihood ratio test. The goal is to find the best model fit. For this learning log, I will build off of my finding in learning log 23 in which I used the souvenir data. I will compare a model, ARIMA (2,0,0) x (2, 1, 2) 12, to the auto ARIMA model from R, (2,0,0) x (1,1,0) 12, so that we can clearly see why R’s model is the best model.

A likelihood ratio test requires nested models. The test statistic is \(2[loglikelihood(model1)-loglikelihood(model2)]\). We compare this to a Chi-squared distribution with the degrees of freedom equal to the difference between model parameters. The informal null hypothesis is \(H_0: \theta = 0 \ \ and \ \ \phi=0\). The alternative hypothesis is: \(H_A: \theta \ \ and / or \ \ \phi \neq 0\)

Data Exploration

Before building the models and checking if they are nested, we call the data and get a refresher on what the souvenir time series looks like.

library(forecast)
library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(xts)

souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenir <- ts(souvenir, frequency=12, start=c(1987,1))
plot.ts(souvenir)

The souvenir data definitely has some seasonality, and we could decompose the time series to confirm. The size of the peaks in the data is not constant, so we will try a log tranform to remedy the problem.

logsouvenir<-log(souvenir)
plot.ts(logsouvenir)

Checking if Nested

Now that we are familiar with the souvenir time series data and its log tranform, we will check if our two models are nested models.

mod<- arima(logsouvenir, order = c(2,0,0), season = list( order = c( 2,1,2), period=12))

## Warning in arima(logsouvenir, order = c(2, 0, 0), season = list(order =
## c(2, : possible convergence problem: optim gave code = 1

automod<-auto.arima(logsouvenir)
mod

## 
## Call:
## arima(x = logsouvenir, order = c(2, 0, 0), seasonal = list(order = c(2, 1, 2), 
##     period = 12))
## 
## Coefficients:
##          ar1     ar2     sar1    sar2    sma1     sma2
##       0.5095  0.4752  -0.6747  0.2386  0.1808  -0.6582
## s.e.  0.1101  0.1055   0.7646  0.3429  0.7984   0.5998
## 
## sigma^2 estimated as 0.03006:  log likelihood = 20.54,  aic = -27.08

automod

## Series: logsouvenir 
## ARIMA(2,0,0)(1,1,0)[12] with drift 
## 
## Coefficients:
##          ar1     ar2     sar1   drift
##       0.3493  0.3602  -0.3278  0.0247
## s.e.  0.1086  0.1159   0.1334  0.0044
## 
## sigma^2 estimated as 0.03182:  log likelihood=23.04
## AIC=-36.09   AICc=-35.18   BIC=-24.71

The mod model has coefficients ar1, ar2, sar1, sar2, sma1, and sma2. The automod model has coefficients ar1, ar2, and sar1. Thus, the automod model is a nested model of the mod model. We will ignore drift in this discussion.

Test Statistic and P-value

Now that we have our models, we can test the fits using the equation for the test statistic from above. The difference in parameters between models is 3, so this is our dfs.

teststat<- 2*(as.numeric(logLik(automod))-as.numeric(logLik(mod)))
teststat

## [1] 5.009028

pchisq(teststat, df=3, lower.tail = FALSE )

## [1] 0.1711373

The p-value is large, so the larger autoregressive and moving average seasonal component in mod is not needed. Thus, the simpler model automod is better. This is expected because we used the auto.arima command in R to build the more simple model.

Results

We can check the AIC values to also determine the best model fit.

AIC(mod); AIC(automod)

## [1] -27.08062

## [1] -36.08965

The automod model has the better AIC value.

We can use the model we have determined to be the better model to predict souvenir sales for the next year for this shop.

forecast(automod, h=12)

##          Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## Jan 1994       9.561175  9.332560  9.789790  9.211539  9.910811
## Feb 1994       9.710395  9.468237  9.952553  9.340046 10.080744
## Mar 1994      10.273626 10.007560 10.539692  9.866714 10.680539
## Apr 1994      10.041821  9.767385 10.316257  9.622108 10.461534
## May 1994       9.911652  9.630035 10.193269  9.480956 10.342348
## Jun 1994      10.121112  9.835715 10.406510  9.684635 10.557590
## Jul 1994      10.428384 10.140343 10.716424  9.987864 10.868903
## Aug 1994      10.542958 10.253331 10.832584 10.100012 10.985903
## Sep 1994      10.645778 10.355110 10.936447 10.201239 11.090318
## Oct 1994      10.670899 10.379577 10.962221 10.225361 11.116437
## Nov 1994      11.062977 10.771234 11.354720 10.616795 11.509159
## Dec 1994      11.870077 11.578067 12.162087 11.423486 12.316668

Learning Log 24 ARIMA Model Comparison

Jill Wanner

May 2, 2018

Overview

Data Exploration

Checking if Nested

Test Statistic and P-value

Results