Visit my website for more like this! I would love to hear your feedback (seriously).
library(astsa, quietly=TRUE, warn.conflicts=FALSE)
require(knitr)
library(ggplot2)
Heavily borrowed from:
Online course at Penn State
So far we have introduced autocorrelation and cross-correlation (ACF and CCF) as tools for describing the relations that occur within and between time series at various lags. We also know how to build linear models with classical regression, using the large lagged correlations revealed by the ACF or CCF to choose useful predictor variables. This section deals with data that may be nonstationary, with an emphasis on forecasting future values. Now that the basics are covered, we can turn to regression techniques specific to time series, since classical regression is often not sufficient to explain all the dynamics of a series.
Using correlation we can identify lagged linear relationships, which give rise to autoregressive (AR) and autoregressive moving average (ARMA) models. We then extend these to nonstationary data with the autoregressive integrated moving average (ARIMA) model. Finally, we demonstrate the Box-Jenkins method for identifying an ARIMA model, along with techniques for parameter estimation and forecasting. An ARIMA model may include AR terms, MA terms, and differencing operations.
Most software specifies these three attributes in the order (AR order, differencing, MA order), often written (p, d, q).
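For example, in R the triple is the order argument of arima(), while sarima() from the astsa package (used later in this post) takes the three values as separate arguments. A minimal sketch with a simulated series (the series and coefficient are only illustrative):

set.seed(1)
x <- arima.sim(list(order = c(1, 0, 0), ar = 0.7), n = 120)  # a toy series
arima(x, order = c(1, 0, 0))   # ARIMA(1, 0, 0) specified via order = c(p, d, q)
# sarima(x, 1, 0, 0)           # the equivalent specification with astsa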
One of the biggest challenges in time series analysis is choosing the best model for the data at hand. Here we also introduce the partial autocorrelation function (PACF), the final diagnostic tool (alongside the ACF, CCF, and lagged scatterplots) that helps us choose the most suitable model.
Classical regression is a static, global model. With time series we can allow the dependent variable to be influenced not only by the current values of the independent variables but by past values as well, and we can even include past values of the dependent variable itself. If the present can be modeled from past values alone, we have a natural basis for forecasting.
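As a toy illustration (with a simulated series; the variable names are just for the example), regressing a series on its own first lag is nothing more than an ordinary linear regression on a lagged copy of the data:

set.seed(1)
x <- arima.sim(list(order = c(1, 0, 0), ar = 0.6), n = 100)  # a simulated series
lagged <- data.frame(xt = x[-1], xt1 = x[-length(x)])        # pair each value with its previous value
summary(lm(xt ~ xt1, data = lagged))                         # regress x_t on x_{t-1}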
Autoregressive models are based on the idea that the current value of the series, \(x_t\), can be explained as a function of the p past values, where p is the number of steps into the past needed to forecast the current value. How well a real data series can be forecast from its own past values can be investigated by interpreting the autocorrelation function and/or a lagged scatterplot matrix. For example, the scatterplots might show that lags 4 and 5 are linearly associated with the current value, while the ACF might show large positive autocorrelations at lags 1, 2, 12, and 36. The AR(1) model takes the form
\[x_t = \delta + \phi_1 x_{t-1} + w_t\]
where the correlation between observations h time periods apart is:
\[\rho_h = \phi_1^h\]
In an AR(1) model, the value of \(x\) at time \(t\) is a linear function of the value of \(x\) at time \(t-1\). The model assumes that the errors \(w_t\) are independently and identically normally distributed with mean 0 and constant variance, that the errors are independent of \(x\), and that the series is weakly stationary, which requires \(|\phi_1| < 1\).
Examples of simulated AR(1) processes with \(\phi_1 = 0.9\) and \(\phi_1 = -0.9\) are plotted below. This parameter determines how strongly values across time are correlated with each other. \(\phi_1 = 0.9\) means that observations adjacent in time tend to be close in value. The second example implies the opposite: values adjacent in time are negatively correlated, while observations two time points apart are positively correlated.
In general, if \(\phi_1\) is positive the ACF decays exponentially to 0 as the lag h increases; if \(\phi_1\) is negative the ACF also decays exponentially to 0, but the signs of the autocorrelations alternate between positive and negative.
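We can check the relationship \(\rho_h = \phi_1^h\) directly: the ARMAacf() function in base R returns the theoretical ACF of an ARMA model. A quick sketch (the value 0.9 is arbitrary):

round(ARMAacf(ar = 0.9, lag.max = 5), 3)   # theoretical ACF at lags 0 through 5
round(0.9^(0:5), 3)                        # the same values from the formula phi^h
round(ARMAacf(ar = -0.9, lag.max = 5), 3)  # alternating signs when phi is negative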
set.seed(1234)
ar1.pos <- arima.sim(list(order=c(1,0,0), ar=0.9), n=100)   # simulate AR(1) with phi = +0.9
ar1.neg <- arima.sim(list(order=c(1,0,0), ar=-0.9), n=100)  # simulate AR(1) with phi = -0.9
par(mfrow=c(4,1))
plot(ar1.pos, ylab='x', main=(expression(AR(1)~~~phi==+0.9)))
acf(ar1.pos, lag.max=11, main='ACF')
plot(ar1.neg, ylab='x', main=(expression(AR(1)~~~phi==-0.9)))
acf(ar1.neg, lag.max=11, main='ACF')
AR(1) models with \(|\phi_1| > 1\) are explosive because the time series can quickly become large in magnitude.
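To see what explosive means, here is a rough sketch. arima.sim() refuses non-stationary AR coefficients, so we iterate the recursion \(x_t = 1.1 x_{t-1} + w_t\) directly (the coefficient 1.1 and length 50 are arbitrary):

set.seed(1)
n <- 50
x <- numeric(n)
for (t in 2:n) x[t] <- 1.1 * x[t - 1] + rnorm(1)   # AR(1) recursion with phi = 1.1
plot(x, type='l', ylab='x', main=expression(AR(1)~~~phi==1.1))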
Moving average models of order q, MA(q), are similar in spirit to autoregressive models, but instead of using lagged values of the series multiplied by coefficients to predict \(x_t\), an MA(q) model uses past error terms multiplied by coefficients.
\[MA(1):\; x_t = \mu + w_t + \theta_1 w_{t-1}\] \[MA(2):\; x_t = \mu + w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2}\] \[MA(q):\; x_t = \mu + w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \cdots + \theta_q w_{t-q}\]
where \(\mu\) is the mean and \(w_t\) is a normally distributed (Gaussian) error term with mean 0.
In general, MA(q) models have non-zero autocorrelations for the first q lags and autocorrelations = 0 for all lags > q.
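That cutoff is easy to see in the theoretical ACF. A quick check with ARMAacf() for an MA(2) (the coefficients are arbitrary):

round(ARMAacf(ma = c(0.5, 0.5), lag.max = 5), 3)  # non-zero at lags 1-2, exactly zero beyond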
A sample ACF with significant autocorrelation at only lag 1 is an indicator of a possible MA(1) model.
set.seed(1234)
ma1 <- arima.sim(list(order=c(0,0,1), ma=c(-0.5)), n=100) # simulate 100 values from an MA(1) with theta = -0.5
par(mfrow=c(2,1))
plot(ma1, type='b', main='Simulated MA(1)', ylab='y')
acf(ma1, main='ACF for simulated sample data')
A sample ACF with significant autocorrelations at lags 1 and 2, but non-significant autocorrelations at higher lags, indicates a possible MA(2) model.
set.seed(1234)
ma2 <- arima.sim(list(order=c(0,0,2), ma=c(0.5, 0.5)), n=100) # simulate 100 values from an MA(2)
par(mfrow=c(2,1))
plot(ma2, type='b', main='Simulated MA(2)', ylab='y')
acf(ma2, main='ACF for simulated sample data')
We note that the lag-1 autocorrelation of an MA(1) is the same for \(\theta\) and \(1/\theta\). To resolve this ambiguity, and to satisfy the theoretical restriction called invertibility, we restrict MA(1) models to coefficients with absolute value less than 1.
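The ambiguity is easy to demonstrate with the theoretical ACF, since the lag-1 autocorrelation of an MA(1) is \(\theta/(1+\theta^2)\). A quick sketch with \(\theta = 0.5\) and \(\theta = 2\):

ARMAacf(ma = 0.5, lag.max = 1)  # lag-1 autocorrelation = 0.4
ARMAacf(ma = 2.0, lag.max = 1)  # also 0.4: theta and 1/theta are indistinguishable from the ACF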
An MA model is said to be invertible if it is algebraically equivalent to a convergent infinite-order AR model. This means that the AR coefficients decrease to 0 as we move back in time. Invertibility is handled by our software and is not something we usually worry about during data analysis.
We have seen that for MA(q) models, the ACF will be insignificant for lags > q. The ACF therefore provides considerable information for specifying an MA(q) process. Unfortunately, if the process is ARMA or AR, the ACF alone yields little information about the orders of dependence (p, q). There is, however, another function that acts like the ACF of an MA process, but for AR models: the partial autocorrelation function (PACF).
An easy way to understand the PACF is through a linear regression in which we predict y from x1, x2, and x3. With a partial correlation we correlate the "parts" of y and x3 that are not predicted by x1 and x2. An ordinary (non-partial) correlation between y and x3, by contrast, also reflects whatever dependence y and x3 share with x1 and x2.
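To make this concrete, here is a small sketch with made-up regression data; the partial correlation of y and x3 is just the correlation of the residuals from regressing each of them on x1 and x2:

set.seed(1)
x1 <- rnorm(200); x2 <- rnorm(200)
x3 <- 0.6 * x1 + rnorm(200)          # x3 is partly driven by x1
y  <- x1 + x2 + 0.3 * x3 + rnorm(200)
cor(y, x3)                                                    # ordinary correlation, inflated by x1
cor(residuals(lm(y ~ x1 + x2)), residuals(lm(x3 ~ x1 + x2)))  # partial correlation of y and x3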
Similarly, for a time series the PACF between \(x_s\) and \(x_t\) is the autocorrelation between them with the linear effect of everything in between removed. Consider an AR(1) model: the correlation between \(x_t\) and \(x_{t-2}\) is not zero, as it would be for an MA(1), because \(x_t\) depends on \(x_{t-2}\) through \(x_{t-1}\). The PACF breaks this chain of dependence by partialling out the effect of \(x_{t-1}\).
For an AR model, the theoretical PACF "shuts off" past the order of the model. Thus, exactly analogous to choosing the order of an MA(q) model from the ACF, we identify the order of an AR model by the number of non-zero partial autocorrelations. An AR(1) model will have a single significant (non-zero) partial autocorrelation at lag 1.
Note: the PACF of an MA(1) model will have a pattern that gradually tapers to zero.
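A quick simulated check of both facts (the series length and coefficients below are arbitrary):

set.seed(42)
ar1 <- arima.sim(list(order = c(1, 0, 0), ar = 0.7), n = 200)
ma1 <- arima.sim(list(order = c(0, 0, 1), ma = 0.7), n = 200)
par(mfrow = c(2, 1))
pacf(ar1, main = 'PACF of simulated AR(1): single spike at lag 1')
pacf(ma1, main = 'PACF of simulated MA(1): tapers toward zero')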
Recall that we specify a model using the order triple (AR order, differencing, MA order), and that specifying an ARIMA model is all about identifying the right combination of these terms for a particular time series.
Three items should be considered to determine a first guess at the form of an ARIMA model: a time series plot of the data, the ACF, and the PACF.
The time series plot is the most obvious first step toward understanding the data. What to look for: possible trends, seasonality, outliers, and constant/non-constant variance. You will not be able to identify a model from the plot alone, but it points to possible next steps.
If there is an obvious upward or downward linear trend, a first difference may be needed; a quadratic trend may call for second differences. Over-differencing, however, can introduce unnecessary levels of dependency.
For data with a curved upward trend and increasing variance, consider transforming the series with a logarithm or square root, as sketched below.
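A rough sketch of what differencing does; the simulated trend series below is only illustrative:

set.seed(1)
trend.series <- ts(50 + 0.5 * (1:120) + arima.sim(list(ar = 0.5), n = 120))  # AR(1) noise around a linear trend
par(mfrow = c(2, 1))
plot(trend.series, main = 'Series with a linear trend')
plot(diff(trend.series), main = 'First difference removes the trend')
# For a curved trend with growing variance, try log() or sqrt() before differencing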
Note: non-constant variance in a series with no trend may call for something like an ARCH model, which is designed to model variance that changes over time; we will cover this later.
The ACF and PACF should be considered together, as a few characteristic combinations stand out (especially with experience): an AR(p) process has an ACF that tapers off and a PACF that cuts off after lag p; an MA(q) process has an ACF that cuts off after lag q and a PACF that tapers off; an ARMA process has both tapering off.
After you have made a few guesses at a possible model, we use software to estimate the coefficients, usually by maximum likelihood. Once the model is estimated, we examine the residual diagnostics (residual plots, the residual ACF, and Ljung-Box statistics) to judge whether the model fits.
Sometimes, more than one model can appear to work for the same time series.
Lake Erie water level in October. There is possibly some overall trend, but we can't be sure yet; we will not worry about a trend for now. Let's plot the data along with the ACF and PACF.
data <- c(14.3, 14.6, 13.5, 14.2, 12.1, 14.19, 14.55, 13.6, 14.59, 16.6, 15.4,
12.89, 12.2, 12.15, 10.89, 11.11, 11.98, 13.44, 14.05, 13.91, 14.1,
12.66, 14.6, 16.7, 15.4, 17.1, 15.45, 16.76, 15.7, 14.1, 15.3, 16.4,
17.09, 16.8, 16.98, 16.39, 15.8, 15.1, 13.7, 13.79, 15.7)
# load built in commands for TSA diagnostics
source(url("http://lib.stat.cmu.edu/general/tsa2/Rcode/itall.R"))
## itall has been installed
plot(data, type='b', ylab='water level')
acf2(data)
## ACF PACF
## [1,] 0.72 0.72
## [2,] 0.52 0.02
## [3,] 0.39 0.01
## [4,] 0.24 -0.09
## [5,] 0.14 -0.02
## [6,] 0.08 0.00
## [7,] 0.15 0.23
## [8,] 0.17 0.02
## [9,] 0.13 -0.11
## [10,] 0.03 -0.18
## [11,] -0.12 -0.24
## [12,] -0.16 0.11
## [13,] -0.15 0.16
## [14,] -0.12 0.06
## [15,] -0.14 -0.22
## [16,] -0.19 -0.32
## [17,] -0.14 0.05
The PACF shows a single spike at the first lag and the ACF shows a tapering pattern, so right away we can see that an AR(1) model might be appropriate.
sarima(data, 1, 0, 0) # the AR(1) model estimated with the tools loaded above
## initial value 0.519565
## iter 2 value 0.153108
## iter 3 value 0.153003
## iter 4 value 0.152866
## iter 5 value 0.152842
## iter 6 value 0.152841
## iter 6 value 0.152841
## final value 0.152841
## converged
## initial value 0.150083
## iter 2 value 0.149811
## iter 3 value 0.149705
## iter 4 value 0.149704
## iter 5 value 0.149704
## iter 6 value 0.149704
## iter 6 value 0.149704
## final value 0.149704
## converged
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## xreg = xmean, include.mean = FALSE, optim.control = list(trace = trc, REPORT = 1,
## reltol = tol))
##
## Coefficients:
## ar1 xmean
## 0.709 14.583
## s.e. 0.105 0.584
##
## sigma^2 estimated as 1.33: log likelihood = -64.31, aic = 134.6
##
## $AIC
## [1] 1.38
##
## $AICc
## [1] 1.445
##
## $BIC
## [1] 0.4636
The AR coefficient is statistically significant (z = 0.709/0.105 ≈ 6.75). It's not necessary to test the mean coefficient; we know that it's not 0.
The time series plot of the standardized residuals seems to indicate there is no trend in the residuals, no outliers, and no obvious change in variance over time.
The ACF of the residuals shows no significant autocorrelations, great!
The Q-Q plot is a normal probability plot; small deviations at the tails are not abnormal, and usually nothing to worry about.
The last plot on the bottom gives p-values for the Ljung-Box statistic at each lag up to 20. These tests consider the accumulated residual autocorrelation from lag 1 onward. The dashed blue line is at 0.05, and all p-values are above it, which is a good result for this test.
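For reference, the same kind of test can be run by hand with Box.test(); here we refit the AR(1) with base arima() just to grab the residuals (a quick sketch; sarima() already reports these p-values in its plot):

fit.ar1 <- arima(data, order = c(1, 0, 0))                            # refit the AR(1)
Box.test(residuals(fit.ar1), lag = 10, type = "Ljung-Box", fitdf = 1) # fitdf = 1 for the one AR parameter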
What happens if we misinterpreted the ACF and PACF and used an MA(1) model instead of an AR(1) model?
sarima(data, 0, 0, 1)
## initial value 0.507459
## iter 2 value 0.271938
## iter 3 value 0.270110
## iter 4 value 0.259999
## iter 5 value 0.259329
## iter 6 value 0.259211
## iter 7 value 0.259210
## iter 8 value 0.259209
## iter 9 value 0.259209
## iter 10 value 0.259209
## iter 10 value 0.259209
## iter 10 value 0.259209
## final value 0.259209
## converged
## initial value 0.261526
## iter 2 value 0.261466
## iter 3 value 0.261392
## iter 4 value 0.261391
## iter 5 value 0.261391
## iter 5 value 0.261391
## iter 5 value 0.261391
## final value 0.261391
## converged
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## xreg = xmean, include.mean = FALSE, optim.control = list(trace = trc, REPORT = 1,
## reltol = tol))
##
## Coefficients:
## ma1 xmean
## 0.628 14.539
## s.e. 0.116 0.325
##
## sigma^2 estimated as 1.67: log likelihood = -68.89, aic = 143.8
##
## $AIC
## [1] 1.608
##
## $AICc
## [1] 1.673
##
## $BIC
## [1] 0.6917
The MA(1) coefficient is still significant, but overall the fit statistics are worse than for our AR(1) model.
The diagnostic plots are quite informative as well. The residual ACF has a significant spike at lag 2, and several Ljung-Box p-values are below 0.05.
Next we try fitting an ARMA(1,1) model to the same data.
sarima(data, 1, 0, 1) # over-parameterized model
## initial value 0.519565
## iter 2 value 0.445004
## iter 3 value 0.166926
## iter 4 value 0.159131
## iter 5 value 0.155006
## iter 6 value 0.152952
## iter 7 value 0.152844
## iter 8 value 0.152840
## iter 9 value 0.152839
## iter 10 value 0.152838
## iter 11 value 0.152838
## iter 12 value 0.152838
## iter 12 value 0.152838
## iter 12 value 0.152838
## final value 0.152838
## converged
## initial value 0.150077
## iter 2 value 0.149790
## iter 3 value 0.149741
## iter 4 value 0.149713
## iter 5 value 0.149708
## iter 6 value 0.149704
## iter 7 value 0.149704
## iter 8 value 0.149704
## iter 8 value 0.149704
## final value 0.149704
## converged
## $fit
##
## Call:
## arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, Q), period = S),
## xreg = xmean, include.mean = FALSE, optim.control = list(trace = trc, REPORT = 1,
## reltol = tol))
##
## Coefficients:
## ar1 ma1 xmean
## 0.708 0.001 14.583
## s.e. 0.149 0.224 0.584
##
## sigma^2 estimated as 1.33: log likelihood = -64.31, aic = 136.6
##
## $AIC
## [1] 1.429
##
## $AICc
## [1] 1.505
##
## $BIC
## [1] 0.5541
Right away we see that the MA(1) coefficient is not significant, so we could drop that term and be left with the correct AR(1) model. The AIC, BIC, and standard errors are also higher than for the AR(1) model.
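If you prefer a single side-by-side comparison, the AICs of the three candidates can also be pulled from base arima() fits (a quick sketch; the values should line up closely with the aic values in the sarima output above):

AIC(arima(data, order = c(1, 0, 0)))  # AR(1)
AIC(arima(data, order = c(0, 0, 1)))  # MA(1)
AIC(arima(data, order = c(1, 0, 1)))  # ARMA(1,1), over-parameterized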