Class

In class, we discussed the seasonal piece of ARIMA models. If we find a seasonal trend, we can employ the following techniques that are used in this example.

library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.4
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.4
## 
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
## 
##     time<-
library(forecast)
library(xts)

Example

For this example I will be using the souvenir data set.

souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenir <- ts(souvenir, frequency=12, start=c(1987,1))

First, we should plot the data to determine if we have seasonality.

plot.ts(souvenir) 

Now, we can apply the filter function so that we can see the seasonality.

plot.ts(filter(souvenir, sides=2, filter=rep(1/3,3)))

Differencing

This trend looks like it reoccures every six months so we can try a lag 6.

diff1<-diff(log(souvenir),lag=6)
plot.ts(diff1)

adf.test(diff1)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff1
## Dickey-Fuller = -3.8741, Lag order = 4, p-value = 0.01989
## alternative hypothesis: stationary

The small pvalue from our test tells us we want D=1 and because there appears to still be a trend we want d=1.

ACF

Now, we can use the ACF plot to look at the differences we just created to determine q and Q.

Acf(diff1)

Because there is not a lot of spikes at the beginning we want q to equal 0. There are however spikes every 6 so we want Q to equal 1.

PACF

Next, we can use PACF function to determine p and P.

Pacf(diff1)

Based on the graph, there are a couple of spikes at the beginning so p should be 1 and not a lot of spike later on so P should be 0.

Home cooked ARIMA

From our earlier tests, we can run a ARIMA (1,1,0)x(0,1,1)[12].

mod1 <- arima(log(souvenir), order = c(1,1,0), season = list( order = c( 0,1,1), period=12))
summary(mod1) 
## 
## Call:
## arima(x = log(souvenir), order = c(1, 1, 0), seasonal = list(order = c(0, 1, 
##     1), period = 12))
## 
## Coefficients:
##           ar1     sma1
##       -0.5017  -0.5107
## s.e.   0.1013   0.1543
## 
## sigma^2 estimated as 0.03111:  log likelihood = 20.49,  aic = -34.99
## 
## Training set error measures:
##                        ME      RMSE       MAE         MPE     MAPE
## Training set -0.005242458 0.1621816 0.1189982 -0.07803559 1.293914
##                   MASE        ACF1
## Training set 0.2951603 -0.01734621
Box.test(mod1$residuals, type = "Ljung")
## 
##  Box-Ljung test
## 
## data:  mod1$residuals
## X-squared = 0.026188, df = 1, p-value = 0.8714

Our pval for our model is high which means our model is acceptable.

Auto vs Home cooked

To test our home cooked model we can compare it to the auto one using AIC values.

automod<-auto.arima(log(souvenir))
AIC(mod1)
## [1] -34.98918
AIC(automod)
## [1] -36.08965

The AIC values are fairly close so now we can plot the models to see which one focasts better.

plot(forecast(automod, h = 12), main = "auto arima")

plot(forecast(mod1, h = 12), main = "home-cooked")

Our homecooked model looks pretty good the only real difference is that the prediction interval is wider.