Although I was absent from class, I’ve been going through the material and think I have enough to construct a learning log. The topic covered was seasonal ARIMA models. This is similar to the original ARIMA but with a seasonal component (WOW!!). The format is as follows: ARIMA(p,d,q)(P,D,Q)[S]. S is the length of the season (the number of observations in one seasonal cycle), the uppercase P, D, Q are the seasonal orders, and the lowercase p, d, q are the same nonseasonal orders we used before. The first step is to plot the data.
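For reference before the code, the notation above can be written out as one equation. This is a sketch in the sign convention that R's arima function documents (AR terms subtracted, MA terms added); the backshift operator B, where B y_t = y_{t-1}, and the symbol names are my own notation, not something from the class material.

$$
\phi(B)\,\Phi(B^{S})\,(1-B)^{d}\,(1-B^{S})^{D}\,y_t \;=\; \theta(B)\,\Theta(B^{S})\,\varepsilon_t
$$

$$
\begin{aligned}
\phi(B) &= 1 - \phi_1 B - \dots - \phi_p B^{p}, &
\Phi(B^{S}) &= 1 - \Phi_1 B^{S} - \dots - \Phi_P B^{PS},\\
\theta(B) &= 1 + \theta_1 B + \dots + \theta_q B^{q}, &
\Theta(B^{S}) &= 1 + \Theta_1 B^{S} + \dots + \Theta_Q B^{QS}.
\end{aligned}
$$

Here (1-B)^d is the ordinary differencing, (1-B^S)^D is the seasonal differencing, and ε_t is white noise. With that written down, on to loading the packages and plotting.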
library(forecast)
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
library(timeSeries)
## Loading required package: timeDate
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(xts)
data(AirPassengers)
plot(log(AirPassengers))
We can clearly see a seasonal component in the data set. The series is logged so that its variance stays relatively constant over time. Now we need to determine the order of differencing, for which we use the diff function. It takes a lag argument, which we set to the seasonal period. Because we have monthly data with a yearly pattern, we use a lag of 12.
diffseas <- diff(log(AirPassengers), lag=12)
plot(diffseas)
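To make the lag argument concrete, here is a minimal sketch showing that diff with lag = 12 just subtracts the value from the same month one year earlier. The names y and manual are mine, introduced only for this illustration.

```r
data(AirPassengers)

y <- log(AirPassengers)            # 144 monthly observations, 1949 to 1960
diffseas <- diff(y, lag = 12)      # seasonal difference: y[t] - y[t - 12]

# The same thing by hand: months 13..144 minus months 1..132
manual <- y[13:144] - y[1:132]
print(all.equal(as.numeric(diffseas), manual))

# Differencing at lag 12 costs us the first 12 observations
print(length(diffseas))            # 132
print(start(diffseas))             # 1950 1
```

So the first full year is used up as the baseline, and the differenced series starts in January 1950.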
The differenced data doesn’t show an obvious upward or downward trend, but let’s do an ADF test to be sure.
adf.test(diffseas)
##
## Augmented Dickey-Fuller Test
##
## data: diffseas
## Dickey-Fuller = -3.1899, Lag order = 5, p-value = 0.09265
## alternative hypothesis: stationary
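Our p-value of 0.09265 is less than .1, so at the 10% significance level we reject the null hypothesis of a unit root and conclude that the data is stationary (at the stricter 5% level we would not reject, so this is a borderline call). That means we can use a small d of 0, while the seasonal difference we already took gives us D = 1. Now we can figure out what we’ll use for the autoregressive and moving average parts of the model. We start by using the Acf function.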
Acf(diffseas)
By looking at this graph, we can determine our q and Q. At the low lags, the first 5 autocorrelations are very high, and they stay above the blue line until about lag 8, so we’ll try a q of 5. To find Q we look at the lags around multiples of 12, because that’s the lag we used in our diff function. We see a large spike at lag 12, so we’ll use 1 for Q. Now we can find our P and p by using the Pacf function.
Pacf(diffseas)
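The blue lines in these plots are the approximate 95% bounds at ±1.96/sqrt(n). As a rough sketch, the lags that cross them can also be listed numerically instead of read off by eye. I use the base acf and pacf functions on the plain numeric values here, and the 24-lag window and the variable names are my own choices.

```r
data(AirPassengers)
diffseas <- diff(log(AirPassengers), lag = 12)

x <- as.numeric(diffseas)          # drop the ts attributes so lags count in months
n <- length(x)
bound <- qnorm(0.975) / sqrt(n)    # height of the blue line

# acf() includes lag 0 (always 1), so drop it; pacf() starts at lag 1
acf_vals  <- acf(x, lag.max = 24, plot = FALSE)$acf[-1]
pacf_vals <- as.numeric(pacf(x, lag.max = 24, plot = FALSE)$acf)

sig_acf  <- which(abs(acf_vals)  > bound)   # lags with a significant ACF spike
sig_pacf <- which(abs(pacf_vals) > bound)   # lags with a significant PACF spike

print(round(bound, 3))
print(sig_acf)
print(sig_pacf)
```

sig_pacf is the list to hold up against the Pacf plot when picking p and P.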
The partial autocorrelation at the first lag is very large, and the second lag is still above the line, so we need at least 1 for p. Lags 12 and 24 don’t show huge spikes, so we’ll use 0 for P. Now we can create the arima model by supplying a seasonal component.
passmod <- arima(log(AirPassengers), order = c(1, 0, 5), seasonal = list(order = c(0, 1, 1), period = 12))
summary(passmod)
##
## Call:
## arima(x = log(AirPassengers), order = c(1, 0, 5), seasonal = list(order = c(0,
## 1, 1), period = 12))
##
## Coefficients:
##          ar1      ma1     ma2      ma3      ma4     ma5     sma1
##       0.9970  -0.3838  0.0691  -0.1690  -0.1222  0.1606  -0.5741
## s.e.  0.0042   0.0863  0.0922   0.0895   0.0989  0.0895   0.0795
##
## sigma^2 estimated as 0.001274: log likelihood = 248.85, aic = -481.7
##
## Training set error measures:
##                       ME       RMSE        MAE        MPE      MAPE
## Training set 0.002793002 0.03420653 0.02545209 0.05301649 0.4612939
##                   MASE        ACF1
## Training set 0.2809729 0.004827989
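One thing that stands out in the coefficients is ar1 = 0.9970, which is very close to 1. An AR coefficient that close to 1 behaves a lot like an ordinary difference, so it may be worth double-checking the choice of d = 0. The sketch below is my own cross-check, not part of the class material: it asks ndiffs from the forecast package how many differences it would suggest, and fits ARIMA(0,1,1)(0,1,1)[12], the classic "airline model", as an alternative with d = 1. The names d_adf, d_kpss and airline are made up for this example.

```r
library(forecast)
data(AirPassengers)

diffseas <- diff(log(AirPassengers), lag = 12)

# Suggested number of ordinary differences after the seasonal difference,
# according to two different unit root tests (5% level by default)
d_adf  <- ndiffs(diffseas, test = "adf")
d_kpss <- ndiffs(diffseas, test = "kpss")
print(c(adf = d_adf, kpss = d_kpss))

# Alternative model with d = 1 instead of an AR term near 1
airline <- arima(log(AirPassengers), order = c(0, 1, 1),
                 seasonal = list(order = c(0, 1, 1), period = 12))
print(airline)
```

Note that the AIC of this model should not be compared directly with the one above, because the two models use different amounts of differencing. Residual checks are the fairer comparison.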
Now we want to check whether the model’s residuals still contain autocorrelation, which tells us whether the fit is adequate.
Box.test(passmod$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: passmod$residuals
## X-squared = 0.003427, df = 1, p-value = 0.9533
Our Box test gives a p-value of .9533. The null hypothesis of the Ljung-Box test is that the residuals are uncorrelated (here only at lag 1, the default, which is why df = 1). So in this case we fail to reject the null, which is consistent with our model being adequate.
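Since the default only looks at lag 1, a broader version of the same check might be useful. This is a minimal sketch under my own assumptions: 24 lags (two seasonal cycles) is an arbitrary choice, and fitdf is set to the number of estimated coefficients so the degrees of freedom account for the fitted model.

```r
data(AirPassengers)

# Refit the model from above so this block runs on its own
passmod <- arima(log(AirPassengers), order = c(1, 0, 5),
                 seasonal = list(order = c(0, 1, 1), period = 12))

n_coef <- length(coef(passmod))    # 7 coefficients: ar1, ma1..ma5, sma1

# Ljung-Box test over 24 lags, with df reduced by the fitted coefficients
lb <- Box.test(residuals(passmod), lag = 24,
               type = "Ljung-Box", fitdf = n_coef)
print(lb)
```

As before, a large p-value means we fail to reject the null of uncorrelated residuals, and a small one would suggest the model is missing some structure.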