In class on Tuesday we discussed ARIMA models. ARIMA stands for autoregressive integrated moving average: a model that combines an autoregressive model, a moving average model, and a “difference” factor. The difference factor is used to make the series stationary, so that the assumptions of error terms with a mean of zero and a constant variance hold. The ARIMA model is well suited to time series datasets because each value is modeled in terms of earlier values in time, which lets the model capture long-term trends.

To illustrate this point, I worked on the AirPassengers data set to model how the number of passengers changed over time.

library(forecast)  # provides auto.arima() and forecast()
data(AirPassengers)
plot(AirPassengers)

Plotting the series shows that the use of planes has increased over time, and that there is a clear seasonal pattern. Our goal, then, is to build a model that accounts for the autocorrelation in the series (the autoregressive part), the dependence on past error terms (the moving average part), and any differencing needed to uphold our assumptions.
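As a rough sketch of what differencing does to this series, we can difference it by hand and plot the result. The lag of 12 assumes a yearly season in monthly data, and the names seasonal_diff and both_diff are just illustrative; this is a way to look at the idea, not necessarily what the fitting function does internally.

```r
# Differencing AirPassengers by hand (base R only).
data(AirPassengers)

# Seasonal difference: each month minus the same month one year earlier.
seasonal_diff <- diff(AirPassengers, lag = 12)

# First difference of the result: removes the remaining trend.
both_diff <- diff(seasonal_diff, lag = 1)

plot(both_diff, main = "AirPassengers after seasonal and first differencing")

# 144 monthly observations, minus 12 + 1 = 13 lost to differencing
length(both_diff)
```

If the differenced series fluctuates around zero with roughly constant spread, the assumptions above look more plausible than they do for the raw series.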

We use the auto.arima function from the forecast package on the dataset:

mod55<-auto.arima(AirPassengers)
mod55
## Series: AirPassengers 
## ARIMA(2,1,1)(0,1,0)[12] 
## 
## Coefficients:
##          ar1     ar2      ma1
##       0.5960  0.2143  -0.9819
## s.e.  0.0888  0.0880   0.0292
## 
## sigma^2 estimated as 132.3:  log likelihood=-504.92
## AIC=1017.85   AICc=1018.17   BIC=1029.35

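Calling the function prints the details of the fitted ARIMA model. Note the first set of orders, (2,1,1), which gives the order of each model component as (p,d,q). The 2 is the order of the autoregressive part, p. An autoregressive model of order p writes each value as a linear combination of the previous p values plus an error term, \[y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \varepsilon_t\] with p=2. The following 1 is the order of our difference factor, d. Here this means we take the difference between consecutive observed values, so the model is fit to \(y_t - y_{t-1}\) rather than to \(y_t\) itself. Lastly, q=1 is the order of the moving average part, meaning the model also includes the previous error term. The second set of orders, (0,1,0)[12], describes the seasonal part: one seasonal difference at a lag of 12 months, with no seasonal autoregressive or moving average terms.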
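To make the output concrete, here is the fitted model written out with the printed coefficients. This is a sketch that assumes R's sign convention for the moving average term (a plus sign in front of the coefficient, so the negative ma1 estimate shows up as a subtraction). The symbol \(w_t\) is introduced here only for illustration and stands for the series after both the first difference and the seasonal difference at lag 12:

\[w_t = (y_t - y_{t-1}) - (y_{t-12} - y_{t-13})\]

\[w_t = 0.5960\,w_{t-1} + 0.2143\,w_{t-2} + \varepsilon_t - 0.9819\,\varepsilon_{t-1}\]

There is no constant term in this equation because the output lists only the ar1, ar2, and ma1 coefficients.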

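The auto.arima function searches over candidate orders and returns the model with the lowest information criterion (AICc by default, a small-sample correction to AIC). So the function in R does a lot of behind-the-scenes work. Next, we can use our fitted model to forecast future values, here 20 months ahead: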
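To get a feel for the kind of comparison involved, here is a minimal sketch that fits a few models by hand with base R's arima() and compares their AIC values. The candidate list is hypothetical and much smaller than what auto.arima actually searches, the seasonal part is fixed at (0,1,0) with period 12 to match the output above, and method = "ML" is an assumption chosen for this sketch.

```r
# Compare a few hand-picked (p, d, q) orders by AIC (base R only).
data(AirPassengers)

# Hypothetical candidate set, for illustration only.
candidates <- list(c(0, 1, 1), c(1, 1, 0), c(2, 1, 1))

aics <- sapply(candidates, function(ord) {
  fit <- arima(AirPassengers,
               order = ord,
               seasonal = list(order = c(0, 1, 0), period = 12),
               method = "ML")
  AIC(fit)  # lower is better
})

# Label each AIC with its (p, d, q) order.
names(aics) <- sapply(candidates, paste, collapse = ",")
aics
```

The order with the smallest value in aics is the one this small comparison would prefer.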

plot(forecast(mod55, h=20))

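The plot shows the forecasted portion in blue, with shaded bands for the prediction intervals. Note that the bands widen as we forecast further ahead, reflecting increasing uncertainty, while the forecast itself carries the seasonal pattern and the upward trend forward.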