Our focus for today was learning about ARIMA model that had seasonality. We also focused on using the diagnostic plots in our analysis. Our ARIMA models will have the form (p,d,q)x(P,D,Q)S where:
(p, d, q) are the nonseasonal components
(P, D, Q) are the seasonal componants
S is the frequency of the seasonal trend (12=monthly for example)
Knowing these parts of the model allow us to create our ARIMA model by hand, rather than using the auto.arima command from Tuesday. To do this, we need to carry out the following 5 steps:
Plot to ID seasonality
Do diffs to identify d and D
Preliminarily choose p, q, P, Q by looking at ACF and PACF
Estimate model using ARIMA
Look at diagnostics and compare models with AIC
We will use an example using the AirPassangers data to demonstrate these.
First we will download the data and plot the time series. We also take the log of the data because if displays an increasing variance.
library(quantmod)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
library(timeSeries)
## Loading required package: timeDate
## Warning in as.POSIXlt.POSIXct(Sys.time()): unknown timezone 'zone/tz/2018c.
## 1.0/zoneinfo/America/Chicago'
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(forecast)
library(xts)
data(AirPassengers)
plot.ts(log(AirPassengers))
AirPass <- log(AirPassengers)
From this plot, we can see that there is an increasing trend over time, but also that there is a seasonal trend that seems to be repeating. We know from our data that it is in monthly intervals, so we will use S = 12.
Next we have to find d and D for our model using the differences. This can be done by first using the diff command. After that we will plot the difference.
diff1 <- diff(AirPass, lag = 12)
plot.ts(diff1)
We can now use the adf.test command to create a hypothesis test to determine if the difference is stationary.
adf.test(diff1)
##
## Augmented Dickey-Fuller Test
##
## data: diff1
## Dickey-Fuller = -3.1899, Lag order = 5, p-value = 0.09265
## alternative hypothesis: stationary
Based on the plot and the adf.test, we can conclude that the data is stationary and we don’t have to worry about non-seasonal differences.
For this step we will use the ACF and PACF plots to determine our p, q, P, and Q values. We will use ACF for q and Q, and we will use PACF for p and P.
Acf(diff1)
We see that the spikes for the first few lag values are above the significance level until we reach lag 8. This meand that q is going to be greater than or equal to 1. When determining Q, we need to look at lags 12 and 24. We see that the lag for 12 is significant but for 24 it isn’t, so we will use Q=1. Next we use the PACF plot.
Pacf(diff1)
Our spikes for both the first and second lag are significant, we we will need to use a p that is greater than or equal to 1. For P, we look at lags 12 and 24, and we see that the spike at lag 12 is not significant, so P will equal 0.
After using the Acf and Pacf plots to estimate our p’s and q’s, we can estimate the ARIMA model. Based on the plots, we will use p=1, P=0, q=1, and Q=1. We will also use D=1 and S=12.
mod.1 <- arima(AirPass, order = c(1,0,1), season = list(order = c(0,1,1), period = 12))
summary(mod.1)
##
## Call:
## arima(x = AirPass, order = c(1, 0, 1), seasonal = list(order = c(0, 1, 1), period = 12))
##
## Coefficients:
## ar1 ma1 sma1
## 0.9959 -0.3994 -0.5526
## s.e. 0.0050 0.0898 0.0741
##
## sigma^2 estimated as 0.001345: log likelihood = 245.61, aic = -483.22
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.003012636 0.03514401 0.02638549 0.05667535 0.4783111
## MASE ACF1
## Training set 0.2912769 0.01751614
We will use the box.test command to check our model. We will use the Ljung box test, as that is the improved version.
Box.test(mod.1$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: mod.1$residuals
## X-squared = 0.045108, df = 1, p-value = 0.8318
The large p-value of .83 indicates that the coefficients for our model are adequete and we have chosen the right p, q, and d values. We can check each one of these coefficients to make sure they are significant by calculating the test stat, which is the coefficient divided by the squared error of the coefficient with 140 degrees of freedom.
ar1.stat <- 0.9959/0.005
2*pt(abs(ar1.stat), df = 144-4, lower.tail = FALSE)
## [1] 1.138476e-173
ma1.stat <- -.3994/.0898
2*pt(abs(ma1.stat), df = 144-4, lower.tail = FALSE)
## [1] 1.755218e-05
sma1.stat <- -.5526/.0741
2*pt(abs(sma1.stat), df = 144-4, lower.tail = FALSE)
## [1] 8.409341e-12
As we can see from each of these p-values, we can reject our null hypothesis and determine that each of our coefficients is significant.
Finally, we can create a model using auto ARIMA to compare to the model constructed by hand.
auto.mod <- auto.arima(log(AirPassengers))
auto.mod
## Series: log(AirPassengers)
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 estimated as 0.001371: log likelihood=244.7
## AIC=-483.4 AICc=-483.21 BIC=-474.77
The auto.arima command produces a slightly different variation of the model, as it has the form (0,1,1)(0,1,1)12. This could be for a variety of reasons, but the msot likely reason is simply that this is our first time learning about this so we still have a lot of practicing to do. We can use the AIC values to compare the two models.
AIC(mod.1)
## [1] -483.2233
AIC(auto.mod)
## [1] -483.3991
The AIC values for this model are actually almost identical, which honestly has me really confused. I expected the first AIC value to be significantly larger than the AIC value for the auto arima model. To be honest, I do not know what happened here, but I do not think this is right. Either way, I tried my best and feel like I still learned a lot about how to build an arima model by hand. With practice, this process should become easier and make even more sense.