In class, we discussed the seasonal piece of ARIMA models. If we find a seasonal trend, we can employ the following techniques that are used in this example.
library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.4
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.4
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(forecast)
library(xts)
For this example I will be using the souvenir data set.
souvenir <- scan("http://robjhyndman.com/tsdldata/data/fancy.dat")
souvenir <- ts(souvenir, frequency=12, start=c(1987,1))
First, we should plot the data to determine if we have seasonality.
plot.ts(souvenir)
Now, we can apply the filter function so that we can see the seasonality.
plot.ts(filter(souvenir, sides=2, filter=rep(1/3,3)))
This trend looks like it reoccures every six months so we can try a lag 6.
diff1<-diff(log(souvenir),lag=6)
plot.ts(diff1)
adf.test(diff1)
##
## Augmented Dickey-Fuller Test
##
## data: diff1
## Dickey-Fuller = -3.8741, Lag order = 4, p-value = 0.01989
## alternative hypothesis: stationary
The small pvalue from our test tells us we want D=1 and because there appears to still be a trend we want d=1.
Now, we can use the ACF plot to look at the differences we just created to determine q and Q.
Acf(diff1)
Because there is not a lot of spikes at the beginning we want q to equal 0. There are however spikes every 6 so we want Q to equal 1.
Next, we can use PACF function to determine p and P.
Pacf(diff1)
Based on the graph, there are a couple of spikes at the beginning so p should be 1 and not a lot of spike later on so P should be 0.
From our earlier tests, we can run a ARIMA (1,1,0)x(0,1,1)[12].
mod1 <- arima(log(souvenir), order = c(1,1,0), season = list( order = c( 0,1,1), period=12))
summary(mod1)
##
## Call:
## arima(x = log(souvenir), order = c(1, 1, 0), seasonal = list(order = c(0, 1,
## 1), period = 12))
##
## Coefficients:
## ar1 sma1
## -0.5017 -0.5107
## s.e. 0.1013 0.1543
##
## sigma^2 estimated as 0.03111: log likelihood = 20.49, aic = -34.99
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.005242458 0.1621816 0.1189982 -0.07803559 1.293914
## MASE ACF1
## Training set 0.2951603 -0.01734621
Box.test(mod1$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: mod1$residuals
## X-squared = 0.026188, df = 1, p-value = 0.8714
Our pval for our model is high which means our model is acceptable.
To test our home cooked model we can compare it to the auto one using AIC values.
automod<-auto.arima(log(souvenir))
AIC(mod1)
## [1] -34.98918
AIC(automod)
## [1] -36.08965
The AIC values are fairly close so now we can plot the models to see which one focasts better.
plot(forecast(automod, h = 12), main = "auto arima")
plot(forecast(mod1, h = 12), main = "home-cooked")
Our homecooked model looks pretty good the only real difference is that the prediction interval is wider.