Modeling stationary time series

Universidad Privada Boliviana

Author

Prof. J. Dávalos (Ph.D.)

Introduction to the Box-Jenkins specification


  • Stationary time series may be generated by a variety of stationary models: AR(p), MA(q), or ARMA(p,q)

  • The natural question is which of these is the data-generating process (DGP) of our time series (TS) of interest \(Y_t\)

  • The Box-Jenkins approach seeks to choose the “best” ARMA model.

  • It builds on the autocorrelation function (ACF) and the partial autocorrelation function (PACF)


  • Large-scale structural multi-equation models performed well in in-sample macro forecasts, but poorly out-of-sample.

  • Simpler (single-equation) ARMA models performed better out-of-sample

  • Parsimony is key. Occam’s razor principle.

Autocorrelation (ACF) and partial autocorrelation (PACF)

  • The autocorrelation function is defined from the autocovariance \(\gamma(h)\) as \(\rho(h) = \gamma(h)/\gamma(0)\)

  • It measures the time dependence of a TS \(X_t\) on its lag \(X_{t-h}\).

  • The partial autocorrelation (PACF) controls for the influence of the intermediate lags, thus isolating the share of the relationship attributable to a specific lag.


MA autocorrelation function (ACF)

  • Consider an MA(q) process whose autocovariance \(\gamma(h)\) was defined earlier.

  • Its ACF is \(\rho(h) = \gamma(h)/\gamma(0)\):

\(\rho(h) = \begin{cases} 1 & \text{if $h=0$}\\ \dfrac{\theta_h + \theta_{h+1}\theta_1 + \theta_{h+2}\theta_2 + \cdots + \theta_q\theta_{q-h}}{1+\sum_{j=1}^q\theta_j^2} & \text{if $h=1,2,\ldots,q$}\\ 0 & \text{if $h>q$} \end{cases}\)

  • Example. For an MA(3) process: \(Y_t = 2 + \varepsilon_t + 1.5\varepsilon_{t-1} + 1.0\varepsilon_{t-2} + 0.5\varepsilon_{t-3}\), where \(\sigma^2 = 2\)

Theoretical ACF (lags 0 through 5):
1.0000000 0.7777778 0.3888889 0.1111111 0.0000000 0.0000000
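  • These theoretical values can be reproduced with a short Mata sketch (an illustrative aside, not part of the original slides), coding the case formula above directly:

* Aside: theoretical MA(3) ACF from the case formula above (Mata sketch)
mata:
    theta = (1.5, 1, 0.5)              // theta_1 .. theta_3
    q = cols(theta)
    th = (1, theta)                    // th[1+h] holds theta_h, with theta_0 = 1
    denom = 1 + sum(theta:^2)          // 1 + sum_j theta_j^2
    rho = J(1, 6, 0)
    rho[1] = 1                         // rho(0) = 1
    for (h = 1; h <= q; h++) {
        num = 0
        for (j = 0; j <= q - h; j++) num = num + th[1+j]*th[1+j+h]
        rho[1+h] = num/denom           // rho(h) for h = 1..q; zero beyond q
    }
    rho                                // 1, 0.7778, 0.3889, 0.1111, 0, 0
end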
  • If a time series exhibits such behavior in its ACF (nonzero through lag 3, zero afterward), it might be an MA(3)

  • Let’s simulate this MA(3) process


Simulation MA(3)

clear all
set obs 1000         // Long time series
scalar sigma2  = 2   // Error variance
scalar a       = 2   // Constant
scalar theta1  = 1.5 // MA coefficients (theta, matching the notation above)
scalar theta2  = 1
scalar theta3  = 0.5

gen E          = rnormal(0,sqrt(sigma2)) // Gaussian WN
gen time       = _n
gen y          = .   // Initializing y

tsset time

* Simulation (the first 3 periods stay missing, lacking lagged errors)
dyngen{
update y = a + E + theta1*l.E + theta2*l2.E + theta3*l3.E ///
          if time > 3
}
Number of observations (_N) was 0, now 1,000.
(1,000 missing values generated)

Time variable: time, 1 to 1000
        Delta: 1 unit

  • Estimated (auto)correlations using a standard Stata command (cor):
cor y l.y l2.y l3.y l4.y l5.y
 
(obs=992)

             |                 L.      L2.      L3.      L4.      L5.
             |        y        y        y        y        y        y
-------------+------------------------------------------------------
           y |
         --. |   1.0000
         L1. |   0.7760   1.0000
         L2. |   0.3699   0.7758   1.0000
         L3. |   0.0466   0.3704   0.7763   1.0000
         L4. |  -0.1184   0.0472   0.3711   0.7763   1.0000
         L5. |  -0.1499  -0.1170   0.0479   0.3701   0.7751   1.0000
  • An MA(q) process exhibits a nonzero ACF only up to lag \(q\); here the estimates beyond lag 3 are small, as expected for an MA(3)
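  • As an aside (not on the original slide), Stata’s corrgram command reports the estimated ACF and PACF together, which is handy for the identification step used below:

corrgram y, lags(5)   // estimated ACF and PACF of the simulated MA(3)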

AR(p) partial autocorrelation (PACF)

  • The order of an AR(p) model is best identified not by the ACF but by the PACF

  • Recall the stationary AR(p) process, with \(\sum_j |\delta_j| < 1\) (a sufficient condition for stationarity):

    • \(Y_t = a + \sum_{j=1}^p \delta_j Y_{t-j} + \varepsilon_t\)
  • If \(p=1\), we know this can be rewritten as an \(MA(\infty)\):

    • \(Y_t = a(1/(1-\delta)) + \varepsilon_t + \delta \varepsilon_{t-1} + \delta^2 \varepsilon_{t-2} +...+ \delta^{t-1} \varepsilon_1\)
    • \(\gamma(h)\) and \(\rho(h)\) exist for every \(h\), but they decay geometrically toward zero via the \(\delta^j\) factor: for the AR(1), \(\rho(h) = \delta^h\). The ACF is therefore useless for identifying \(p\) in an AR process.

PACF

In an AR(1), \(Y_t\) and \(Y_{t-2}\) are correlated by transitivity. Instead, we calculate the specific (partial) correlation between \(Y_t\) and each of its lags in order to identify the order \(p\). A nice feature of the PACF is that it can be obtained from the ACF \(\rho(h)\). For any AR(p) process, the theoretical PACF, denoted \(\phi_{ss}\), is:

  • \(\phi_{11} = \rho(1)\)

  • \(\phi_{22} = (\rho(2) - \rho(1)^2)/(1-\rho(1)^2)\)

  • \(\phi_{ss} = \dfrac{\rho(s) - \sum_{j=1}^{s-1}\phi_{s-1,j}\,\rho(s-j)}{1-\sum_{j=1}^{s-1}\phi_{s-1,j}\,\rho(j)}\) for \(s>2\), where \(\phi_{sj} = \phi_{s-1,j} - \phi_{ss}\phi_{s-1,s-j}\) for \(j = 1,\ldots,s-1\)

  • In an AR(p) process, \(\phi_{ss} = 0\) for \(s>p\), while the ACF only decays gradually; the recursion is implemented in the Mata sketch below
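  • A minimal Mata sketch of this recursion (an illustration, not from the original slides), fed with the theoretical AR(1) ACF \(\rho(h) = 0.8^h\), so the PACF should be 0.8 at lag 1 and zero afterward:

* Aside: PACF from a given ACF via the recursion above (Mata sketch)
mata:
    rho = 0.8 :^ (1..5)                // illustrative ACF: rho(1)..rho(5)
    s_max = cols(rho)
    phi = J(s_max, s_max, 0)
    phi[1,1] = rho[1]                  // phi_11 = rho(1)
    for (s = 2; s <= s_max; s++) {
        num = rho[s]
        den = 1
        for (j = 1; j <= s - 1; j++) {
            num = num - phi[s-1,j]*rho[s-j]
            den = den - phi[s-1,j]*rho[j]
        }
        phi[s,s] = num/den             // phi_ss
        for (j = 1; j <= s - 1; j++) {
            phi[s,j] = phi[s-1,j] - phi[s,s]*phi[s-1,s-j]
        }
    }
    diagonal(phi)'                     // PACF: 0.8, 0, 0, 0, 0
end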


The PACF and ACF of MA(1) and AR(1)


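The code that generated yar1 and yma1 does not appear on this slide. The following is a minimal sketch in the style of the MA(3) simulation above, assuming \(\delta_1 = 0.8\) and \(\theta_1 = 0.5\) (hypothetical values, roughly consistent with the estimated correlograms below):

* Hypothetical simulation of the AR(1) and MA(1) series
clear all
set obs 1000
gen E    = rnormal(0, sqrt(2))   // Gaussian WN
gen time = _n
tsset time
gen yar1 = 0 in 1                // initial conditions
gen yma1 = 0 in 1
dyngen{
update yar1 = E + 0.8*l.yar1 if time > 1
update yma1 = E + 0.5*l.E    if time > 1
}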
* ESTIMATED (not theoretical) ACF and PACF 
* just to avoid the cumbersome formulas
corrgram yar1, lags(5)
corrgram yma1, lags(5)
                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1        0.8114   0.8130   660.41  0.0000          |------            |------  
2        0.6566   0.0009   1093.3  0.0000          |-----             |        
3        0.5443   0.0482     1391  0.0000          |----              |        
4        0.4363  -0.0407   1582.5  0.0000          |---               |        
5        0.3571   0.0285   1710.9  0.0000          |--                |        

                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1        0.3761   0.3779   141.86  0.0000          |---               |---     
2       -0.0017  -0.1765   141.86  0.0000          |                 -|        
3        0.0348   0.1341   143.07  0.0000          |                  |-       
4        0.0073  -0.0760   143.13  0.0000          |                  |        
5        0.0097   0.0597   143.22  0.0000          |                  |        
  • In an AR(1):
    • \(\phi_{11} = \rho(1)\), as stated above.
    • The PACF drops immediately after \(p=1\), as expected.
    • The ACF decays slowly, at the rate \(\delta^h\).

  • In an MA(1):
    • \(\phi_{11} = \rho(1)\), as stated above.
    • The ACF drops immediately after \(q=1\), as expected.
    • The PACF only signals the presence of \(\rho(1)\) and decays slowly after \(q=1\).
  • Summary
    • AR(1): the gradual decay of the AC and the sharp drop of the PAC after h=1 suggest an AR(1) process.
    • MA(1): the sharp drop of the AC after h=1 and the slow decay of the PAC suggest an MA(1).

The PACF and ACF of MA(2) and AR(2)

clear all
set seed 2022        // Setting the simulation seed  
set obs 1000         // Long Time Series
scalar sigma2  = 30  // Sigma2
scalar a       = 5   // a
scalar y0      = 100 // initial condition in Y_t
scalar theta1  = 0.5  // MA coefficients
scalar theta2  = 1.5
scalar delta1  = 0.5  // AR coefficients;
scalar delta2  = 0.3  // delta1 + delta2 < 1 => stationary

scalar p       = 2  // AR(p)
scalar q       = 2  // MA(q)

gen E          = rnormal(0,sqrt(sigma2)) // Gaussian WN
gen time       = _n

tsset time
gen yar2  = y0 in 1/2   // Just to initialize our AR process
gen yma2  = y0 in 1/2   // Just to initialize our MA process

  
* Simulation
dyngen{
update yar2 = a + E + delta1*l.yar2 + delta2*l2.yar2 if time > p
update yma2 = a + E + theta1*l.E + theta2*l2.E       if time > q
}
Number of observations (_N) was 0, now 1,000.

Time variable: time, 1 to 1000
        Delta: 1 unit

(998 missing values generated)
(998 missing values generated)

  • ESTIMATED (not theoretical) ACF and PACF, just to avoid the cumbersome formulas
corrgram yar2, lags(5)
corrgram yma2, lags(5)
                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1        0.7655   0.7659   587.73  0.0000          |------            |------  
2        0.6891   0.3036   1064.5  0.0000          |-----             |--      
3        0.5583  -0.0580   1377.7  0.0000          |----              |        
4        0.4680  -0.0142   1598.1  0.0000          |---               |        
5        0.3947   0.0087     1755  0.0000          |---               |        

                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1        0.4004   0.4009   160.84  0.0000          |---               |---     
2        0.4036   0.3161   324.41  0.0000          |---               |--      
3        0.0071  -0.3108   324.46  0.0000          |                --|        
4       -0.0128  -0.1005   324.63  0.0000          |                  |        
5       -0.0346   0.1972   325.83  0.0000          |                  |-       

  • AR(2):
    • The sharp drop in the PACF after lag \(p=2\) is indicative of an AR(2).
    • The ACF confirms this: it does not drop sharply, it merely reflects \(Y_t\)’s transitive autocorrelation from the AR process.
  • MA(2):
    • The sharp drop in the ACF after lag \(q=2\) is indicative of an MA(2). The PACF decays slowly.

Box-Jenkins approach

Under the assumption of Normally distributed WN:

    1. Inspect the TS (plot) and the correlogram (ACF, PACF), and select candidate orders \(p\) and \(q\).
    2. Estimate the candidate models. Each one may be assessed by statistical information criteria: AIC, BIC (the lower, the better).
    3. Check whether the best model’s residuals are WN:
    • Visually inspect the residuals; they need to be WN (check the correlogram, perform WN tests)
    • Check for breaks or evidence of non-normality; the latter can be tested (Jarque-Bera test)

Estimation and error checking

  • Let’s run stages 2 and 3 using our simulated AR(2) series, against AR(3) and MA(2) alternative models.

  • These will be our hypothetical candidates after looking at their correlograms (stage 1)


Stage 2. Estimation and model selection via information criteria

  • The models are estimated by maximum likelihood, which here assumes a normally distributed WN.
  • Let’s get their AIC and BIC
    • First estimate the model (arima command)
    • Second, ask for the information criteria (estat ic)
qui: arima yar2 , ar(1 2)    //AR(2)
estat ic
Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |      1,000          .  -3159.775       4   6327.549    6347.18
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
qui: arima yar2 , ar(1 2 3)  //AR(3)
estat ic
Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |      1,000          .  -3157.968       5   6325.935   6350.474
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
qui: arima yar2 , ma(1 2)  //MA(2)
estat ic
Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |      1,000          .  -3315.414       4   6638.828   6658.459
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
  • The best model is the AR(2) or the AR(3): AIC slightly favors the AR(3), BIC favors the AR(2), and both dominate the MA(2). Let’s take both to stage 3.
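  • As a quick sanity check (an aside, not on the original slides), the reported figures follow from \(AIC = -2\,ll + 2\,df\) and \(BIC = -2\,ll + df\,\ln(N)\):

* Aside: reproducing estat ic's AR(2) figures from the reported log likelihood
display -2*(-3159.775) + 2*4           // AIC = 6327.55
display -2*(-3159.775) + 4*ln(1000)    // BIC = 6347.18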

Stage 3. Is the error term a normally distributed WN?

  • \(\varepsilon_t \sim N(0,\sigma^2)\) and iid, i.e., time-independent.
    • Visually inspect residuals, they need to be WN
    • check the correlogram and perform WN tests
    • It is a good practice to evaluate asymptotic normality i.e. symmetry and kurtosis of the estimated white noise.
      • Q-Q plot
      • Shapiro-Francia test

Time independence

  • Residuals have zero mean by construction; just for now, assume that \(\gamma(0)\) is constant and equal to \(\sigma^2\)
  • So let’s check the iid assumption via the ACF \(\rho(h) = \gamma(h)/\gamma(0)\): it must be statistically 0 for all \(h \geq 1\). This can be tested lag by lag, or jointly across the first \(k\) lags.
  • \(H_0: \rho(1) = \rho(2) = \cdots = \rho(k) = 0\)
  • This is the ‘Portmanteau’ test, aka Q-test. It follows a \(\chi^2_k\) asymptotic distribution (for \(T\) sufficiently large); its form is shown below.
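  • For reference (the formula is not spelled out on the original slide), the Ljung-Box form of the Q statistic computed by Stata’s wntestq is \(Q = T(T+2)\sum_{h=1}^{k}\dfrac{\hat{\rho}(h)^2}{T-h}\), asymptotically \(\chi^2_k\) under \(H_0\).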

  • Let’s generate the estimated residuals \(\hat{\varepsilon}_t\)
* This is a simulated series, so we drop the first 50 (pre-stationary) periods

qui: {
  arima yar2 , ar(1 2)
  predict residar2 if time > 50, r

  arima yar2 , ar(1 2 3)
  predict residar3 if time > 50, r
}
summarize residar2 residar3
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
    residar2 |        950   -.1642168    5.577541   -19.3184    19.1575
    residar3 |        950   -.1650889    5.569911  -19.61905   18.76305

tsline residar2 residar3
  • The series look random, with constant variance, except at the very beginning (we already discussed why the initial \(t\)’s should be dropped in a series observed since its origin)… this is good news

Correlogram (ACF)

corrgram residar2, lags(5)  // standard representation
corrgram residar3, lags(5)  // standard representation
                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1       -0.0307  -0.0307   .89921  0.3430          |                  |        
2       -0.0001  -0.0011   .89922  0.6379          |                  |        
3       -0.0622  -0.0624   4.5915  0.2043          |                  |        
4       -0.0444  -0.0487   6.4777  0.1662          |                  |        
5       -0.0275  -0.0310   7.2025  0.2060          |                  |        

                                          -1       0       1 -1       0       1
 LAG       AC       PAC      Q     Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1       -0.0510  -0.0511   2.4827  0.1151          |                  |        
2       -0.0442  -0.0471   4.3499  0.1136          |                  |        
3       -0.0323  -0.0374   5.3448  0.1482          |                  |        
4       -0.0410  -0.0473   6.9518  0.1385          |                  |        
5       -0.0134  -0.0218   7.1243  0.2116          |                  |        
  • The ACF values look very low; this is good news.
  • The Q statistics show non-significant autocorrelations (at the 10% level)… good news again, for both models.
ac residar2 , level(90)
ac residar3 , level(90)

  • Among many lags, some \(\rho(h)\) could appear significant even under the null, so it is better to rely on a joint Q-test.

Portmanteau Q-stat

  • A WN test, i.e. a Q-test up to the k-th lag. I chose \(k=5\) arbitrarily, given the short memory of the DGP (AR(2) or AR(3)). Think of a quarterly series: we would not expect long memory from it (no more than a year).

  • As a rule of thumb, with no a priori information, you can use T/4 lags. In econometric applications, you are expected to have a priori information.

wntestq  residar2, lags(5)
wntestq  residar3, lags(5)
Portmanteau test for white noise
---------------------------------------
 Portmanteau (Q) statistic =     7.2025
 Prob > chi2(5)            =     0.2060


Portmanteau test for white noise
---------------------------------------
 Portmanteau (Q) statistic =     7.1243
 Prob > chi2(5)            =     0.2116

If the residuals are not normal, consider alternative transformations of \(Y_t\) (log, Box-Cox, etc.)
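A hypothetical illustration (not on the original slides) of the log option, assuming a strictly positive series:

* Hypothetical: re-estimate the model on a log scale (series must be > 0)
gen ln_y = ln(yar2) if yar2 > 0
arima ln_y , ar(1 2)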

  • Quantile-Quantile plot, empirical vs. normal distribution:
qnorm residar2 
qnorm residar3 

\(H_0: \varepsilon_t \sim N(0,\sigma^2)\)

sfrancia residar2 
                  Shapiro–Francia W' test for normal data

    Variable |       Obs       W'          V'        z       Prob>z
-------------+-----------------------------------------------------
    residar2 |       950    0.99873      0.813    -0.474    0.68210
sfrancia residar3 
                  Shapiro–Francia W' test for normal data

    Variable |       Obs       W'          V'        z       Prob>z
-------------+-----------------------------------------------------
    residar3 |       950    0.99864      0.872    -0.313    0.62289
  • Normality is not rejected for either model

Conclusion

  • Inspection of the TS and ACF suggests stationary residuals for both processes.

  • We do not reject covariance stationarity (Q-stat); thus we do not reject that the residuals are WN, given a constant mean and variance, for both the AR(2) and the AR(3).

  • The Q-Q plots of the residuals and the normality test do not reject the normality assumption for either process.

    • By Occam’s razor, we should choose the simpler one: the AR(2)

Extension to an ARMA(p,q)

  • Selection of candidate models (summarized in the table below)
    • The AR(p) is identified from the PACF and the MA(q) from the ACF
    • The AR(p) causes the ACF to decay slowly, and the MA(q) causes the PACF to decay as well
    • Having both components in an ARMA(p,q) makes this analysis harder, since both the ACF and the PACF will decay. The main identifier is the practical and statistical significance of the ACF and PACF
    • Practically and/or statistically significant ACF and PACF spikes signal candidate \(q\) and \(p\), respectively
    • The decay can be oscillatory or not; this is just an artifact of the parameters’ signs
  • Check slide 40
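  • As a compact reference (standard textbook material, added here for convenience):

 Process    |  ACF                           |  PACF
------------+--------------------------------+--------------------------------
 AR(p)      |  decays (possibly oscillating) |  cuts off after lag p
 MA(q)      |  cuts off after lag q          |  decays (possibly oscillating)
 ARMA(p,q)  |  decays                        |  decays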

Exercises

    1. Follow the Box-Jenkins approach and identify the ARMA(p,q) process for Toyota’s stock returns (check the first session). Provide clear evidence on the 3 steps.
    2. Pick a time series of your choice (different from the ones presented in class, and not a simulated one) that seems to be stationary:
    • Analyse its correlogram and explain which features suggest it is a stationary (not WN) time series \(Y_t\). (Hint: to make your point, also analyse the correlogram of a time series that is not stationary, e.g. a random walk that you can simulate: \(Y_t = Y_{t-1} + \varepsilon_t\); see the sketch below.)
    • Implement the Box-Jenkins approach to identify its ARMA(p,q) process.
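For the hint in exercise 2, a minimal sketch (assumed code, in the style of the simulations above) that generates a random walk and inspects its correlogram; its ACF decays very slowly, staying near 1 over the first lags:

* Hypothetical: simulate a random walk Y_t = Y_{t-1} + e_t
clear all
set obs 500
gen E    = rnormal()
gen time = _n
tsset time
gen yrw  = 0 in 1      // initial condition
dyngen{
update yrw = l.yrw + E if time > 1
}
corrgram yrw, lags(10)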