library(forecast)
## Warning: package 'forecast' was built under R version 3.4.4
library(quantmod)
## Warning: package 'quantmod' was built under R version 3.4.4
## Loading required package: xts
## Warning: package 'xts' was built under R version 3.4.4
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.4.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: TTR
## Warning: package 'TTR' was built under R version 3.4.3
## Version 0.4-0 included new data defaults. See ?getSymbols.
library(tseries)
## Warning: package 'tseries' was built under R version 3.4.4
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.4.4
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.4.3
##
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
##
## time<-
library(forecast)
library(xts)
We are using the AirPassenger data to find our own ARIMA model. First, we have to look to make sure there is no trend in our data–we don’t want the distance between peaks and pits growing over time.
plot.ts(AirPassengers)
This looks bad, so we can try a log transformation.
plot.ts(log(AirPassengers))
Now, we can filter our data so that we can see the seasonality.
plot.ts(filter(log(AirPassengers), sides=2, filter=rep(1/3,3)))
It looks like there is a yearly trend, so we will try differencing with 12.
diff1 <- diff(log(AirPassengers), lag = 12)
plot.ts(diff1)
adf.test(diff1)
##
## Augmented Dickey-Fuller Test
##
## data: diff1
## Dickey-Fuller = -3.1899, Lag order = 5, p-value = 0.09265
## alternative hypothesis: stationary
Because our adf test has a small p-value, we want to have a D of 1. Looking at the plot, there is not really a trend any more, so d will be 0.
Conclusions on differencing: D=1, d=0.
If there are spikes at the beginning, then we want to have q equal a number. If there are spikes every multiple, then we want Q to equal a number.
Acf(diff1)
Based on this information, we want a q of 1 or 2. We want a Q of 1.
Conclusion: q=1 or 2, Q=1
We can use PACF for the autoregressive component. If there are spikes at the beginning, then p will not be 0. If there are spikes at multiples, then P will not be zero.
Pacf(diff1)
Based on this graph, I would say p=1 and P=0.
Conclusion: p=1, P=0.
Based on our testing earlier, we will run an ARIMA of (1, 0, 1) x (0, 1, 1) [12]
mod1 <- arima(log(AirPassengers), order = c(1,0,1), season = list( order = c( 0,1,1), period=12))
summary(mod1)
##
## Call:
## arima(x = log(AirPassengers), order = c(1, 0, 1), seasonal = list(order = c(0,
## 1, 1), period = 12))
##
## Coefficients:
## ar1 ma1 sma1
## 0.9959 -0.3994 -0.5526
## s.e. 0.0050 0.0898 0.0741
##
## sigma^2 estimated as 0.001345: log likelihood = 245.61, aic = -483.22
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.003012636 0.03514401 0.02638549 0.05667535 0.4783111
## MASE ACF1
## Training set 0.2912769 0.01751614
Box.test(mod1$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: mod1$residuals
## X-squared = 0.045108, df = 1, p-value = 0.8318
We want to check that none of our test stats are over 2; if they are over 2 then that will make our p-values too big. We can do this manually; by taking \(\theta\) / SE(\(\theta\)) or \(\phi\) / SE(\(\phi\)). However, we can use the Ljung-Box funtion instead. Which we see at the bottom of this output. It tells us that our p-value is 0.8381. This is good, that means that we have no evidence that our model is inadequate.
If we wanted to do it by hand, we can. We will show an example with ar1.
(teststat <- 0.9959/0.0050)
## [1] 199.18
2*pt(abs(teststat), df = 144-3, lower.tail = FALSE)
## [1] 1.109161e-174
This p-value is small, so we know it should stay in the model.
Even though this model is good, we might as well try adding more components to see what we get.
mod2 <-arima(log(AirPassengers), order = c(2,1,1), season = list( order = c(1, 1, 2), period = 12))
summary(mod2)
##
## Call:
## arima(x = log(AirPassengers), order = c(2, 1, 1), seasonal = list(order = c(1,
## 1, 2), period = 12))
##
## Coefficients:
## ar1 ar2 ma1 sar1 sma1 sma2
## 0.5467 0.2597 -0.9654 -0.9515 0.3821 -0.4916
## s.e. 0.0965 0.0954 0.0472 0.2510 0.2992 0.1984
##
## sigma^2 estimated as 0.001293: log likelihood = 246.52, aic = -479.03
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.0007550448 0.03432059 0.02519184 -0.01036797 0.4566572
## MASE ACF1
## Training set 0.2780999 -0.005528303
Box.test(mod2$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: mod2$residuals
## X-squared = 0.0044933, df = 1, p-value = 0.9466
(teststat <- .3821/.2992)
## [1] 1.277072
2*pt(abs(teststat), df = 144-3, lower.tail = FALSE)
## [1] 0.2036758
This p-value is too big, so we will take this out of our model. After we fit it, we can compare it to our original one.
mod3 <-arima(log(AirPassengers), order = c(2,1,1), season = list( order = c(1, 1, 1), period = 12))
summary(mod3)
##
## Call:
## arima(x = log(AirPassengers), order = c(2, 1, 1), seasonal = list(order = c(1,
## 1, 1), period = 12))
##
## Coefficients:
## ar1 ar2 ma1 sar1 sma1
## 0.5552 0.2530 -0.9653 -0.0598 -0.5168
## s.e. 0.0956 0.0949 0.0466 0.1551 0.1367
##
## sigma^2 estimated as 0.001305: log likelihood = 246.21, aic = -480.42
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set -0.0007815228 0.0344809 0.02534988 -0.01083164 0.4593521
## MASE ACF1
## Training set 0.2798445 -0.002774659
Box.test(mod3$residuals, type = "Ljung")
##
## Box-Ljung test
##
## data: mod3$residuals
## X-squared = 0.0011319, df = 1, p-value = 0.9732
Now that the new model is set up, we can use AIC to compare them.
AIC(mod1)
## [1] -483.2233
AIC(mod3)
## [1] -480.4212
Obviously, mod3 is much smaller.
In conclusion, our hand-built ARIMA model is (2, 1, 1) x (1, 1, 1,) [12].
There is a function in R that can auto fit an ARIMA model. We will compare the one we built to the one that R gives us.
(autoMod <- auto.arima(log(AirPassengers)))
## Series: log(AirPassengers)
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.4018 -0.5569
## s.e. 0.0896 0.0731
##
## sigma^2 estimated as 0.001371: log likelihood=244.7
## AIC=-483.4 AICc=-483.21 BIC=-474.77
R’s is a little difference in the autoregressive section. We can use AIC to compare which model is better.
AIC(mod3)
## [1] -480.4212
AIC(autoMod)
## [1] -483.3991
They are quite close, so we want to go with the smaller model. Our final decision should be to use the model that R gave us.
The last subject we will use with our ARIMA models is to forecast the data into the future.
plot(forecast(autoMod, h = 12), main = "auto.arima")
plot(forecast(mod3, h = 12), main = "home-cooked")
When clicking between the two, we can’t find too much of a difference.