Today we covered ARIMA models. ARIMA stands for AutoRegressive Integrated Moving Average. The autoregressive component models the current data point using previous data points as predictors. The moving average component uses the residuals (forecast errors) of previous data points to predict the current data point. The integrated component differences the series, replacing each value with its change from the previous value, to remove trend and make the series stationary. The packages we need are loaded below.
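As a small illustration of the integrated component, the sketch below differences a simulated random walk. It uses only base R, so it does not depend on any of the packages in this post. The simulated series, the seed, and the variable names are made up for this example and are not part of the class material.

```r
# Simulate a random walk: each value is the previous value plus a random shock.
# (Illustrative data only; the seed and length are arbitrary choices.)
set.seed(42)
shocks <- rnorm(200)
walk <- cumsum(shocks)

# First differencing (the "I" in ARIMA with d = 1) recovers the shocks,
# which fluctuate around a constant mean instead of wandering.
d_walk <- diff(walk)

# The differenced series is one observation shorter than the original.
length(walk)
length(d_walk)

# Compare the two series visually.
par(mfrow = c(2, 1))
plot(walk, type = "l", main = "Random walk (non-stationary)")
plot(d_walk, type = "l", main = "First difference (stationary)")
par(mfrow = c(1, 1))
```

An ARIMA model with d = 1 applies this kind of differencing internally before fitting the autoregressive and moving average terms.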
library(quantmod)
library(tseries)
library(timeSeries)
library(forecast)
library(xts)
data(gas)
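We can work with the gas dataset from the forecast library. This is a time series measuring the monthly gas production in Australia between 1956 and 1995. The auto.arima function searches over candidate ARIMA orders and keeps the model with the best information criterion (AICc by default). Printing the fitted object displays the selected orders and the estimated coefficients.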
plot(gas)
MyMod <- auto.arima(gas)
MyMod
## Series: gas
## ARIMA(2,1,1)(1,0,0)[12]
##
## Coefficients:
## ar1 ar2 ma1 sar1
## 0.5117 0.1824 -0.9638 0.8478
## s.e. 0.0502 0.0498 0.0134 0.0277
##
## sigma^2 estimated as 3201509: log likelihood=-4236.9
## AIC=8483.81 AICc=8483.94 BIC=8504.63
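To connect this output to the three components, here is a minimal sketch that reads the selected orders back from the model and then refits the same specification by hand. It assumes the forecast package is installed. The orders passed to Arima are copied from the output above, and auto.arima may select a different model under other package versions. ManualMod is a name made up for this example, and the lag of 24 in the residual test is an arbitrary choice.

```r
library(forecast)

# Fit as in the text (the gas series ships with the forecast package).
MyMod <- auto.arima(gas)

# Named vector of orders: p, d, q, then (for seasonal models) P, D, Q and the period.
arimaorder(MyMod)

# Number of first differences suggested by a unit-root test.
ndiffs(gas)

# Refit the specification printed above explicitly instead of searching for it.
# method = "ML" requests full maximum likelihood estimation.
ManualMod <- Arima(gas, order = c(2, 1, 1), seasonal = c(1, 0, 0), method = "ML")
coef(ManualMod)

# Ljung-Box test on the residuals: a small p-value suggests leftover autocorrelation.
Box.test(residuals(ManualMod), lag = 24, type = "Ljung-Box",
         fitdf = length(coef(ManualMod)))
```

In the notation ARIMA(2,1,1)(1,0,0)[12], the first triple gives the non-seasonal orders (two autoregressive terms, one difference, one moving average term) and the second triple gives the seasonal orders at a period of 12 months.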
We can see this time series has a lot of seasonal variation. The selected model already reflects some of it: the (1,0,0)[12] part of the output is a seasonal autoregressive term (sar1) at a lag of 12 months. We will cover how to account for seasonality in more detail later on. Now that we have our model, we can use it to forecast future Australian gas production. We do this using the forecast function, whose arguments are the model we created and the number of periods to forecast, “h” (this is how far ahead we are forecasting). Let’s try predicting 12 months into the future.
forecastGas <- forecast(MyMod,h=12)
plot(forecastGas)
The dark blue line is our forecast of monthly gas production for the next year. This line is surrounded by two shaded regions, which correspond to prediction intervals. The lighter region is for a 95% interval while the slightly darker region is for an 80% interval. As we have seen before, a higher confidence level results in a wider interval. The farther into the future we forecast, the wider our intervals will be.
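The numbers behind the plot can be read directly from the forecast object. The sketch below assumes the forecast package is installed. The level argument is spelled out for clarity (80 and 95 are also the defaults), and width95 is a name made up for this example.

```r
library(forecast)

MyMod <- auto.arima(gas)
forecastGas <- forecast(MyMod, h = 12, level = c(80, 95))

# Point forecasts (the dark blue line): a monthly series of length 12.
forecastGas$mean

# Interval bounds: one row per month, one column per level (80% first, then 95%).
forecastGas$lower
forecastGas$upper

# Width of the 95% interval at each horizon; it grows as we forecast further ahead.
width95 <- forecastGas$upper[, 2] - forecastGas$lower[, 2]
width95
```

Printing forecastGas itself shows the same information as a single table, with the point forecast and both sets of bounds side by side.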
We can change the value of h to predict one month into the future.
plot(forecast(MyMod, h = 1))
In this case, the blue dot is our point prediction. There are two shaded regions that correspond to prediction intervals. The light blue region is for a 95% interval while the slightly darker region is for an 80% interval.
We have already covered the concept of a moving average in previous classes. ARIMA combines a moving average term, built from past forecast errors rather than past observations, with autoregression and differencing to create a more accurate time series model. While the concepts are not entirely intuitive, the code is fairly straightforward.
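The term "moving average" is used in two different senses, so a short sketch may help keep them apart. If the moving average from earlier classes was the rolling-mean smoother, note that it is not the same thing as the MA term inside an ARIMA model. The sketch assumes the forecast package is installed. The window length of 12, the order c(0, 1, 1), and the names smoothed and MaMod are arbitrary choices for illustration, not recommendations.

```r
library(forecast)

# Sense 1: a moving-average smoother, i.e. a rolling mean of the observations.
# ma() comes from the forecast package; order = 12 averages over a year of months.
smoothed <- ma(gas, order = 12)
plot(gas, col = "grey")
lines(smoothed, col = "red")

# Sense 2: the MA term of an ARIMA model, a regression on past forecast errors.
# Here q = 1 after one difference, so the model has a single "ma1" coefficient.
MaMod <- Arima(gas, order = c(0, 1, 1))
coef(MaMod)
```

The smoother describes the data we already have, while the fitted ARIMA model can be passed to forecast in the same way as MyMod above.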