In class on Tuesday we introduced ARIMA models. ARIMA stands for autoregressive integrated moving average. We learned how to make ARIMA models in R and forecast time series data. The autoregressive part of the ARIMA model, in part, considers past data in order to make predictions. The model also incorporates differencing, the integrated part, to obtain constant variance and zero mean for the residuals. The moving average takes the residuals from the previous data points.
I will be using the globtemp data in the astsa R package. The globtemp data is a time series of global mean land-ocean temperature deviations (from 1951-1980 average), measured in degrees Celsius, for the years 1880-2015. In order to use the ARIMA model in R, we first need to load some R packages.
library(astsa)
library(timeSeries)
library(tseries)
library(forecast)
library(xts)
data("globtemp")
The auto.arima() command with the data set as the argument creates the ARIMA model. The output also lists the orders of the autoregressive, differencing, and moving average parts of the model. It also lists the coefficients for each part of the model as well as the drift. The drift indicates the trend, but we will talk about this in the next class.
mod1 <- auto.arima(globtemp)
mod1
## Series: globtemp
## ARIMA(1,1,1) with drift
##
## Coefficients:
## ar1 ma1 drift
## 0.3549 -0.7663 0.0072
## s.e. 0.1314 0.0874 0.0032
##
## sigma^2 estimated as 0.01011: log likelihood=119.88
## AIC=-231.76 AICc=-231.46 BIC=-220.14
The output ARIMA (1,1,1) indicates a model with first order autoregression, first order differencing, and first order moving average. With the given coefficients, the model is: \[y_t=.355y_{t-1}-.766\epsilon_{t-1}+\epsilon_t\] The drift of .0072 indicates we have an overall upward trend.
We also learned how to forecast for our time series using the ARIMA model. Use the command forecast() with the model and the units of time to forecast out. Setting h=5 means that we will be forecasting five years into the future, from 2016 to 2020. It’s important to be familiar with the data in order to input the correct amount of time. The output also lists the forecasted values with prediction intervals for each time period.
fcast<-forecast(mod1, h=5)
fcast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2016 0.8031838 0.6743294 0.9320383 0.6061180 1.000250
## 2017 0.7841177 0.6346035 0.9336320 0.5554555 1.012780
## 2018 0.7819961 0.6219774 0.9420147 0.5372687 1.026723
## 2019 0.7858872 0.6181356 0.9536388 0.5293333 1.042441
## 2020 0.7919120 0.6174348 0.9663893 0.5250721 1.058752
We can visualize how the forecasted data compares to the rest of the time series by plotting it with our data.
plot(fcast)
The dark blue shaded area shows the 80% prediction intervals, which we expect to be larger. The light blue shaded area is the 95% prediction interval.