In the previous lecture we learned how a regression model can capture trend and seasonality
In this lecture we are going to learn how a regression model can be used to quantify the correlation between neighboring values in a time series (called autocorrelation)
This type of model, called an autoregressive (AR) model, is useful for improving forecast accuracy by making use of the information contained in the autocorrelation (beyond trend and seasonality)
If we know that values tend to be followed by similar values (positive autocorrelation) or by very different values (negative autocorrelation), then we can use that information to adjust forecasts
A stationary time series is one whose properties do not depend on the time at which the series is observed
Thus, time series with trends, or with seasonality, are not stationary — the trend and seasonality will affect the value of the time series at different times.
On the other hand, a white noise series is stationary — it does not matter when you observe it, it should look much the same at any point in time
In general, a stationary time series will have no predictable patterns in the long-term
Time plots will show the series to be roughly horizontal (although some cyclic behavior is possible), with constant variance
More formally, we say that a time series is stationary if the joint (multivariate) cumulative distribution function of the series does not change over time
This is what we call strong stationarity.
A simpler and more practical definition is what we call weak stationarity.
A time series is said to be weakly stationary if its mean is constant over time, its variance is finite and constant over time, and the covariance between \(y_t\) and \(y_{t+k}\) depends only on the lag \(k\) and not on \(t\)
#Simulated white noise: a stationary series
y <- rnorm(1000, 0, 1)
plot.ts(y, main="A stationary time series")

#Simulated series with a linear trend and increasing variance
x <- 1:1000
y <- rnorm(1000, 0, 1*x)
z <- 2*x + y
plot.ts(z, main="A non-stationary series")
Which of these series is stationary?
One simple way to make a non-stationary time series stationary is to compute the differences between consecutive observations (known as differencing)
Differencing can help stabilise the mean of a time series by removing changes in its level, thereby eliminating (or reducing) trend and seasonality: \(y_t^{'}=y_t - y_{t-1}\)
Occasionally the differenced data will not appear to be stationary and it may be necessary to difference the data a second time to obtain a stationary series \[\begin{align} y_t^{''} &= y_t^{'} - y_{t-1}^{'} \notag \\ &= (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) \notag \\ &= y_t -2y_{t-1} + y_{t-2} \notag \end{align}\]
Transformations such as logarithms can also help to stabilize the variance of a time series
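As a small illustration (a sketch assuming the goog daily closing-price series from the fpp2 package; any trending series works), first and second differences can be computed with diff(), and a log transformation can be applied before differencing:

library(fpp2)

#First differences: y'_t = y_t - y_{t-1}
goog_d1 <- diff(goog)

#Second-order differences: y''_t = y_t - 2*y_{t-1} + y_{t-2}
goog_d2 <- diff(goog, differences = 2)

#Differencing the logged series also helps stabilise the variance
goog_dlog <- diff(log(goog))

autoplot(goog_d1) + ggtitle("First differences of goog")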
A seasonal difference is the difference between an observation and the previous observation from the same season, \(y_t^{'} = y_t - y_{t-m}\), where \(m\) is the number of seasons in a year (the seasonal period, e.g. \(m = 12\) for monthly data)
These are also called lag-m differences, as we subtract the observation after a lag of \(m\) periods
Sometimes it is necessary to take both a seasonal difference and a first difference to obtain stationary data
cbind("Billion kWh" = usmelec,
"Logs" = log(usmelec),
"Seasonally\n differenced logs" =
diff(log(usmelec),12),
"Doubly\n differenced logs" =
diff(diff(log(usmelec),12),1)) %>%
autoplot(facets=TRUE) +
xlab("Year") + ylab("") +
ggtitle("Monthly US net electricity generation")
If a time series can be made stationary by differencing, it is said to contain a unit root
One way to determine more objectively whether differencing is required is to use a unit root test
These are statistical hypothesis tests of stationarity that are designed for determining whether differencing is required
A number of unit root tests are available, which are based on different assumptions and may lead to conflicting answers
Augmented Dickey-Fuller (ADF) test (Said & Dickey, 1984)
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test (Kwiatkowski, Phillips, Schmidt, & Shin, 1992)
Elliott-Rothenberg-Stock (ERS) test (Elliott, Rothenberg, & Stock, 1996)
library(urca)
library(fpp2)
goog %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 7 lags.
##
## Value of test-statistic is: 10.7223
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
The test statistic is greater than the 1% critical value, so the null hypothesis of stationarity is rejected; that is, the data are not stationary and differencing is required
We apply differencing and re-run the KPSS test. As shown below, the test statistic is now small and well within the range expected for stationary data, so the differenced series appears stationary.
#Performs the KPSS test on the differenced series
goog %>% diff() %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 7 lags.
##
## Value of test-statistic is: 0.0324
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
#The ADF test has the opposite null hypothesis (the series is non-stationary);
#a large p-value means we cannot reject non-stationarity
library(tseries)
goog %>% adf.test()
##
## Augmented Dickey-Fuller Test
##
## data: .
## Dickey-Fuller = -2.5417, Lag order = 9, p-value = 0.349
## alternative hypothesis: stationary
#Suggests the number of first differences required to make the series stationary
ndiffs(goog)
## [1] 1
A related function, nsdiffs(), suggests the number of seasonal differences required using a measure of seasonal strength \(F_S\): no seasonal difference is suggested if \(F_S < 0.64\), otherwise one seasonal difference is suggested
We can apply nsdiffs() to the logged US monthly electricity data
#Determines the number of seasonal differencing
usmelec %>% log() %>% nsdiffs()
## [1] 1
#Determines the number of differencing
#on the seasonally differenced series
usmelec %>% log() %>% diff(lag=12) %>% ndiffs()
## [1] 1
In time series data, values in neighboring periods tend to be correlated
Such correlation, called autocorrelation, is informative and can help in improving forecasts
If values tend to be followed by similar values, then we have positive autocorrelation
If values tend to be followed by very different values, then we have negative autocorrelation
We use that information to adjust forecasts
We will now discuss how to compute the autocorrelation of a series and how best to use that information to improve forecasts
To compute autocorrelation, we compute the correlation between the series and a lagged version of the series
This is illustrated in the following code chunk.
amtrak <- read.csv("Amtrak.csv")
#Time series containing the first 24 months
amtrak.ts <- ts(amtrak$Ridership,
                start=c(1991,1),
                end=c(1992,12),
                freq=12)
#Generates the lag-1 series using the dplyr package
lag1 <- dplyr::lag(as.data.frame(amtrak.ts), 1)
#Creates a data frame containing the series and its lag-1 version
cordat <- cbind(as.data.frame(amtrak.ts), lag1)
#Computes the Pearson correlation (the lag-1 autocorrelation)
cor(cordat, use="complete.obs")
## x x
## x 1.00000000 0.08088924
## x 0.08088924 1.00000000
#Plots the autocorrelation function (ACF) up to lag 12
Acf(amtrak.ts, lag.max=12, main="")
A few typical autocorrelation behaviors that are useful to explore are:
Strong autocorrelation (positive or negative) at multiples of a lag larger than 1 typically reflects a cyclical pattern. For example, strong positive autocorrelation at lags 12, 24, 36, ... in monthly data will reflect an annual seasonality (where values during a given month each year are positively correlated).
Positive lag-1 autocorrelation (called “stickiness”) describes a series where consecutive values move generally in the same direction. In the presence of a strong linear trend, we would expect to see a strong positive lag-1 autocorrelation.
Negative lag-1 autocorrelation reflects swings in the series, where high values are immediately followed by low values and vice versa.
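As a quick illustration of these patterns, the following sketch (using the built-in AirPassengers monthly series, chosen here only as an example and not part of the Amtrak data) shows an ACF with strong positive lag-1 autocorrelation (trend/stickiness) together with peaks at lags 12, 24, and 36 (annual seasonality):

#Sketch: ACF of a trending monthly series with annual seasonality
#(AirPassengers is a built-in R data set, used purely for illustration)
library(forecast)
Acf(AirPassengers, lag.max = 36, main = "ACF of monthly airline passengers")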
AR models are similar to linear regression models, except that the predictors are the past values of the series
An autoregressive model of order \(p\) can be written as
\[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t \]
where \(\epsilon_t\) is white noise.
This is like a multiple regression but with lagged values of \(y_t\) as predictors
We refer to this as an \(AR(p)\) model, an autoregressive model of order \(p\)
Autoregressive models are remarkably flexible at handling a wide range of different time series patterns
Below are the equations of two example AR models (a short simulation sketch follows):
AR(1): \(y_t =18 - 0.8 y_{t-1} + \epsilon_t\)
AR(2): \(y_t = 8 + 1.3 y_{t-1} - 0.7y_{t-2} + \epsilon_t\)
In both equations \(\epsilon_t \sim N(0, 1)\)
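A minimal simulation sketch of these two models using arima.sim() (which generates zero-mean series, so the means implied by the constants are added back manually):

library(fpp2)
set.seed(1)

#AR(1): y_t = 18 - 0.8*y_{t-1} + e_t, which has mean 18/(1 + 0.8) = 10
ar1_sim <- 10 + arima.sim(model = list(ar = -0.8), n = 100)

#AR(2): y_t = 8 + 1.3*y_{t-1} - 0.7*y_{t-2} + e_t, which has mean 8/(1 - 1.3 + 0.7) = 20
ar2_sim <- 20 + arima.sim(model = list(ar = c(1.3, -0.7)), n = 100)

autoplot(cbind("AR(1)" = ar1_sim, "AR(2)" = ar2_sim), facets = TRUE)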
Changing the parameters \(\phi_1, \phi_2, \cdots, \phi_p\) results in different time series patterns
The variance of the error term \(\epsilon_t\) will only change the scale of the series, not the patterns
For an AR(1) model: when \(\phi_1 = 0\), \(y_t\) is equivalent to white noise; when \(\phi_1 = 1\) and \(c = 0\), \(y_t\) is equivalent to a random walk; when \(\phi_1 = 1\) and \(c \neq 0\), \(y_t\) is equivalent to a random walk with drift; and when \(\phi_1 < 0\), \(y_t\) tends to oscillate around the mean
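A quick sketch simulating the first three of these cases directly (the drift value 0.2 is an arbitrary choice for illustration):

library(fpp2)
set.seed(123)
e <- rnorm(200)

wn  <- ts(e)                  #phi1 = 0: white noise
rw  <- ts(cumsum(e))          #phi1 = 1, c = 0: random walk
rwd <- ts(cumsum(0.2 + e))    #phi1 = 1, c = 0.2: random walk with drift

autoplot(cbind("White noise" = wn,
               "Random walk" = rw,
               "Random walk with drift" = rwd),
         facets = TRUE)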
We normally restrict autoregressive models to stationary data, in which case some constraints on the values of the parameters are required
For an AR(1) model: \(-1 < \phi_1 < 1\)
For an AR(2) model: \(-1 < \phi_2 < 1\), \(\phi_1 + \phi_2 < 1\), \(\phi_2 - \phi_1 < 1\)
When \(p \geq 3\), the restrictions are much more complicated. R takes care of these restrictions when estimating a model.
Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors (\(\epsilon_t\)) in a regression-like model. \[ y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} \]
where \(\epsilon_t\) is white noise.
We refer to this as an \(MA(q)\) model, a moving average model of order \(q\).
Note that we do not actually observe the values of \(\epsilon_t\), so it is not really a regression in the usual sense.
Notice that each value of \(y_t\) can be thought of as a weighted moving average of the past few forecast errors
Below are the equations of two example MA models (a short simulation sketch follows):
MA(1): \(y_t = 20 + \epsilon_t + 0.8 \epsilon_{t-1}\)
MA(2): \(y_t = \epsilon_t - \epsilon_{t-1} + 0.8 \epsilon_{t-2}\)
In both cases \(\epsilon_t \sim N(0, 1)\)
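A matching simulation sketch for these two MA models with arima.sim() (again adding the MA(1) mean of 20 manually, since arima.sim() produces zero-mean series):

library(fpp2)
set.seed(2)

#MA(1): y_t = 20 + e_t + 0.8*e_{t-1}, which has mean 20
ma1_sim <- 20 + arima.sim(model = list(ma = 0.8), n = 100)

#MA(2): y_t = e_t - e_{t-1} + 0.8*e_{t-2}, which has mean 0
ma2_sim <- arima.sim(model = list(ma = c(-1, 0.8)), n = 100)

autoplot(cbind("MA(1)" = ma1_sim, "MA(2)" = ma2_sim), facets = TRUE)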
Invertibility constraints on the MA parameters (an MA model with finite \(q\) is always stationary, but we require invertibility so that the model can be written as a convergent autoregressive representation): for an MA(1) model, \(-1 < \theta_1 < 1\); for an MA(2) model, \(-1 < \theta_2 < 1\), \(\theta_1 + \theta_2 > -1\), \(\theta_1 - \theta_2 < 1\)
More complicated conditions hold for \(q \geq 3\). Again, R will take care of these constraints when estimating the models
If we combine differencing with autoregression and a moving average model, we obtain a (non-seasonal) ARIMA model
ARIMA is an acronym for AutoRegressive Integrated Moving Average (in this context, “integration” is the reverse of differencing)
The full model can be written as \[ y_t^{'} = c + \phi_1 y_{t-1}^{'} + \cdots + \phi_p y_{t-p}^{'} + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t \] where \(y_t^{'}\) is the differenced series (it may have been differenced more than once), and the predictors on the right-hand side include both lagged values of \(y_t^{'}\) and lagged errors
We call this an \(ARIMA(p,d,q)\) model, where: \(p\) = order of the autoregressive part; \(d\) = degree of first differencing involved; and \(q\) = order of the moving average part
Many of the models we have already discussed are special cases of the ARIMA model: white noise is ARIMA(0,0,0); a random walk is ARIMA(0,1,0) with no constant; a random walk with drift is ARIMA(0,1,0) with a constant; an AR(\(p\)) model is ARIMA(\(p\),0,0); and an MA(\(q\)) model is ARIMA(0,0,\(q\))
Selecting appropriate values for \(p, d, \text{and}\; q\) can be difficult
The auto.arima() function in R will do it automatically for us
The use of the auto.arima() function is illustrated in the following code chunks
autoplot(uschange[,"Consumption"]) +
xlab("Year") + ylab("Quarterly percentage change")
fit <- auto.arima(uschange[,"Consumption"], seasonal=FALSE)
fit
## Series: uschange[, "Consumption"]
## ARIMA(1,0,3) with non-zero mean
##
## Coefficients:
## ar1 ma1 ma2 ma3 mean
## 0.5885 -0.3528 0.0846 0.1739 0.7454
## s.e. 0.1541 0.1658 0.0818 0.0843 0.0930
##
## sigma^2 = 0.3499: log likelihood = -164.81
## AIC=341.61 AICc=342.08 BIC=361
\[ y_t = c + 0.589 y_{t-1} - 0.353 \epsilon_{t-1} + 0.085 \epsilon_{t-2} + 0.174 \epsilon_{t-3} + \epsilon_t \]
Here \(c = 0.745 \times (1 − 0.589) = 0.307\) and \(\epsilon_t\) is white noise with standard deviation of \(\sqrt{0.3499} \approx 0.592\)
Forecasts from the model are obtained and plotted using the following code chunk:
fit %>%
  forecast(h=10) %>%
  autoplot(include=80)
The auto.arima() function is useful, but anything automated can be a little dangerous, and it is worth understanding something of the behavior of the models even when you rely on an automatic procedure to choose the model for you.
The constant \(c\) has an important effect on the long-term forecasts obtained from these models: for example, if \(c = 0\) and \(d = 0\) the long-term forecasts go to zero, whereas if \(c \neq 0\) and \(d = 0\) they go to the mean of the data; larger values of \(d\) with \(c \neq 0\) produce long-term forecasts that follow a linear or quadratic trend
The value of \(d\) also has an effect on the prediction intervals — the higher the value of \(d\), the more rapidly the prediction intervals increase in size
For \(d=0\), the long-term forecast standard deviation will go to the standard deviation of the historical data, so the prediction intervals will all be essentially the same
The value of \(p\) is important if the data show cycles
To obtain cyclic forecasts, it is necessary to have \(p \geq 2\), along with some additional conditions on the parameters.
For an AR(2) model, cyclic behavior occurs if \(\phi_1^2 + 4 \phi_2 < 0\). In that case, the average period of the cycles is \[ \frac{2 \pi}{\arccos\left(-\phi_1(1-\phi_2)/(4\phi_2)\right)} \]
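As a worked example (using the AR(2) coefficients from the earlier example, \(\phi_1 = 1.3\) and \(\phi_2 = -0.7\)), the cyclicity condition and the average cycle length can be computed directly:

phi1 <- 1.3
phi2 <- -0.7

phi1^2 + 4 * phi2                              #-1.11 < 0, so the series is cyclic

2 * pi / acos(-phi1 * (1 - phi2) / (4 * phi2)) #average cycle length of about 9.5 periods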
Selecting appropriate values for \(p\), \(d\), and \(q\) can be difficult
The autocorrelation function (ACF) plot is a bar chart of coefficients of correlation between a time series and its lagged values
ACF plot shows the autocorrelations between \(y_t\) and \(y_{t-k}\) for different values of \(k\)
The partial autocorrelation function (PACF) plot shows the relationship between \(y_t\) and \(y_{t-k}\) after removing the effects of lags \(1,2,\cdots,k-1\)
If the time series is from an \(ARIMA(p,d,0)\) or \(ARIMA(0,d,q)\), then the ACF and PACF plots can be helpful in determining the values of \(p\) and \(q\)
The data may follow an \(ARIMA(p,d,0)\) model if the ACF and PACF plots of the differenced data show the following patterns: the ACF is exponentially decaying or sinusoidal, and there is a significant spike at lag \(p\) in the PACF but none beyond lag \(p\)
The data may follow an \(ARIMA(0,d,q)\) model if the ACF and PACF plots of the differenced data show the following patterns: the PACF is exponentially decaying or sinusoidal, and there is a significant spike at lag \(q\) in the ACF but none beyond lag \(q\)
[Figures: example AR(1), AR(2), MA(1), and MA(2) processes]
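As a sketch of what these patterns look like in practice, we can simulate an AR(2) process (reusing the earlier example coefficients) and inspect its ACF and PACF with ggtsdisplay(): the ACF decays in a sinusoidal pattern, while the PACF cuts off after lag 2.

library(fpp2)
set.seed(42)

#Simulates an AR(2) process with phi1 = 1.3, phi2 = -0.7
ar2_sim <- arima.sim(model = list(ar = c(1.3, -0.7)), n = 300)

#Time plot, ACF and PACF in one display
ggtsdisplay(ar2_sim, main = "Simulated AR(2) process")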
Once the model order has been identified, that is, the values of \(p\), \(d\), and \(q\) have been identified, we need to estimate the parameters \(c, \phi_1, \cdots, \phi_p, \theta_1, \cdots, \theta_q\)
Maximum likelihood estimation (MLE) finds the values of the parameters which maximize the probability of obtaining the data
The values of \(p\), \(d\), and \(q\) obtained from the ACF and PACF serve as the initial order of the ARIMA model
We can experiment with different models in the neighborhood of the initial model
Several statistical measures can be used to select the “best” fitting ARIMA model: AIC, BIC, etc
Akaike's Information Criterion (AIC) can be used to compare competing ARIMA models \[ AIC = -2\log(L) + 2(p + q + k + 1) \] where: \(L\) is the likelihood; \(k=1\) if \(c \neq 0\) and \(k=0\) if \(c=0\)
To adjust for overfitting (too many parameters) when the sample size is small, we use the corrected AIC (AICc) \[ AICc = AIC + \frac{2(p+q+k+1)(p+q+k+2)}{T-p-q-k-2} \]
The Bayesian Information Criterion (BIC) can be written as \[ BIC = AIC + [\log(T)-2](p+q+k+1) \]
Good models have minimum AIC, AICc or BIC
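A minimal sketch of this model-comparison step (assuming the uschange consumption series from fpp2 and a few arbitrary candidate orders): fit several candidate ARIMA models and keep the one with the smallest AICc.

library(fpp2)

#Fits a few candidate (arbitrarily chosen) models to the quarterly consumption series
fit_a <- Arima(uschange[, "Consumption"], order = c(1, 0, 0))
fit_b <- Arima(uschange[, "Consumption"], order = c(1, 0, 3))
fit_c <- Arima(uschange[, "Consumption"], order = c(2, 0, 2))

#Smaller AICc is better
c(ARIMA_100 = fit_a$aicc, ARIMA_103 = fit_b$aicc, ARIMA_202 = fit_c$aicc)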
General process for forecasting using an ARIMA model: plot the data (and transform if needed to stabilise the variance); difference until the data are stationary; examine the ACF/PACF of the differenced data to choose candidate orders; fit the candidate models and compare them using the AICc; check that the residuals of the chosen model look like white noise; and then compute forecasts. The steps are illustrated below on the seasonally adjusted electrical equipment orders data (elecequip).
library(fpp2)
library(ggplot2)
library(dplyr)
#Removes the seasonal component using an STL decomposition
elecequip %>% stl(s.window='periodic') %>% seasadj() -> eeadj
autoplot(eeadj)
#Plots the differenced series together with its ACF and PACF
eeadj %>% diff() %>% ggtsdisplay(main="")
#Fits an ARIMA(3,1,1) model to the seasonally adjusted series
fit1 <- Arima(eeadj, order=c(3,1,1))
summary(fit1)
## Series: eeadj
## ARIMA(3,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ma1
## 0.0044 0.0916 0.3698 -0.3921
## s.e. 0.2201 0.0984 0.0669 0.2426
##
## sigma^2 = 9.577: log likelihood = -492.69
## AIC=995.38 AICc=995.7 BIC=1011.72
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.03288179 3.054718 2.357169 -0.00647009 2.481603 0.28844
## ACF1
## Training set 0.008980578
#Checks whether the residuals resemble white noise (residual ACF and Ljung-Box test)
checkresiduals(fit1)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(3,1,1)
## Q* = 24.034, df = 20, p-value = 0.2409
##
## Model df: 4. Total lags used: 24
#The Ljung-Box p-value (0.24) is large, so the residuals are consistent
#with white noise and the model can be used for forecasting
autoplot(forecast(fit1))