The main concept we discussed today was using ARIMA models to make predictions for time series. ARIMA models combine three components of time series modeling. First, they use an autoregressive (AR) component: previous data points are used to predict the current data point, much like a linear regression model whose predictor variables are lagged values of the series. Second, they incorporate differencing (the integrated, or I, component), which means the model works with the differences between consecutive measurements instead of the measurements themselves; this removes trend so that the series has a roughly constant mean, i.e. it is stationary. Third, ARIMA incorporates a moving average (MA) component. Similar to the autoregressive component, the moving average component is like a linear regression model that uses the error terms from previous data points as predictors.
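In equation form, an ARIMA\((p, d, q)\) model combines these three pieces. Writing \(\nabla y_t = y_t - y_{t-1}\) for differencing and \(\nabla^d y_t\) for the series after \(d\) differences, the model (in the parameterization R uses) is
\[
\nabla^d y_t - \mu = \phi_1 (\nabla^d y_{t-1} - \mu) + \cdots + \phi_p (\nabla^d y_{t-p} - \mu) + \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q},
\]
where the \(\phi_i\) are the autoregressive coefficients, the \(\theta_j\) are the moving average coefficients, \(\epsilon_t\) is white noise, and \(\mu\) is the mean of the differenced series, which R labels the drift when \(d = 1\).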
ARIMA models are important to use when making predictions on a time series data set because they are among the most widely used and consistently accurate forecasting techniques available.
There are functions in R that can help us fit these models. In the following section, I will illustrate these functions using an example.
The data set I will be using for this example is the data set \(\texttt{globtemp}\) from the R package astsa. This data set gives the global mean land-ocean temperature deviations, measured in degrees centigrade, for the years 1880-2015. In addition to loading this package, we will need to load the forecast package, which provides the ARIMA commands used below, along with quantmod, tseries, timeSeries and xts. The first command we'll use is \(\texttt{auto.arima(dataset)}\). This will automatically select the orders of the autoregressive, differencing and moving average portions of our model and estimate the coefficients we can use to build our equation.
library(astsa)
library(quantmod)
library(timeSeries)
library(tseries)
library(forecast)
library(xts)
data(globtemp)
mod <- auto.arima(globtemp)
mod
## Series: globtemp
## ARIMA(1,1,1) with drift
##
## Coefficients:
##          ar1      ma1   drift
##       0.3549  -0.7663  0.0072
## s.e.  0.1314   0.0874  0.0032
##
## sigma^2 estimated as 0.01011: log likelihood=119.88
## AIC=-231.76 AICc=-231.46 BIC=-220.14
Our model output tells us that we should use a model with one order of autoregression, one order of differencing and one order of moving average, i.e. an ARIMA(1,1,1) with drift. Because we differenced once, the model is written in terms of the differenced series \(\nabla y_t = y_t - y_{t-1}\), and the drift of 0.0072 is the mean of that differenced series, i.e. the average year-to-year change. Thus, the fitted model looks like this: \(\nabla y_t - 0.0072 = 0.3549\,(\nabla y_{t-1} - 0.0072) + \epsilon_t - 0.7663\,\epsilon_{t-1}\). Since the drift is positive, the series has an overall upward trend.
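If we want to fit this specification ourselves rather than relying on the automatic search, the forecast package also provides the \(\texttt{Arima()}\) function. The short sketch below (assuming the packages loaded above) refits the same ARIMA(1,1,1)-with-drift model and then runs \(\texttt{checkresiduals()}\) to verify that the residuals look like white noise.
# Refit the model chosen by auto.arima() with the orders specified explicitly
mod2 <- Arima(globtemp, order = c(1, 1, 1), include.drift = TRUE)
coef(mod2)  # should match the ar1, ma1 and drift estimates shown above

# Residual diagnostics: time plot, ACF and a Ljung-Box test of the residuals
checkresiduals(mod2)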
Now that we have our model, we can start forecasting, or making predictions for our time series. We can use the command \(\texttt{forecast(model, h)}\), where h is the number of time units into the future we would like to forecast. Let's forecast 5 years into the future, so we would be predicting the mean land-ocean temperature deviations for 2016-2020.
fore <- forecast(mod, h=5)
fore
##      Point Forecast     Lo 80     Hi 80     Lo 95    Hi 95
## 2016      0.8031838 0.6743294 0.9320383 0.6061180 1.000250
## 2017      0.7841177 0.6346035 0.9336320 0.5554555 1.012780
## 2018      0.7819961 0.6219774 0.9420147 0.5372687 1.026723
## 2019      0.7858872 0.6181356 0.9536388 0.5293333 1.042441
## 2020      0.7919120 0.6174348 0.9663893 0.5250721 1.058752
Our forecast output gives us the point forecast for each year along with the corresponding 80% and 95% prediction intervals. We can see these point forecasts and their prediction intervals on a plot of our time series using the command \(\texttt{plot(forecast.output)}\).
plot(fore)
On our plot, the black line represents the data points we have already observed, the blue line represents our point predictions, the dark blue shaded area represents the 80% prediction interval, and the light blue shaded area represents the 95% prediction interval for our point predictions.
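If we want to work with these numbers directly, for example to report them in a table, the point forecasts and interval limits are stored inside the forecast object. The sketch below pulls them out and, as an alternative to base \(\texttt{plot()}\), draws the same picture with \(\texttt{autoplot()}\) (which requires the ggplot2 package to be installed).
# The forecast object stores the point forecasts and interval limits
fore$mean            # point forecasts for 2016-2020
fore$lower           # lower limits of the 80% and 95% intervals
fore$upper           # upper limits of the 80% and 95% intervals
as.data.frame(fore)  # the same table shown above, as a data frame

# ggplot2-style version of the forecast plot
autoplot(fore)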
Using ARIMA models is very relevant for our class because they allow us to make predictions for time series, which are a large part of our curriculum. They are integral to any time series analysis and are an important tool to have. Additionally, they expand upon the moving average techniques that we discussed previously.