[1] 1.0000000 0.7777778 0.3888889 0.1111111 0.0000000 0.0000000
Universidad Privada Boliviana
Prof. J. Dávalos (Ph.D.)
Stationary series may be generated by a variety of stationary models: AR(p), MA(q) or ARMA(p,q).
The natural question is which of them is the data-generating process (DGP) of our time series (TS) of interest \(Y_t\).
The Box-Jenkins approach seeks to choose the “best” ARMA model.
It builds on the autocorrelation function (ACF) and the partial autocorrelation function (PACF)
Large-scale structural multi-equation models performed well in in-sample macro forecasts but poorly out-of-sample.
Simpler (single-equation) ARMA models performed better out-of-sample.
Parsimony is key. Occam’s razor principle.
The autocorrelation function is defined from the autocovariance \(\gamma(h)\) as \(\rho(h) = \gamma(h)/\gamma(0)\)
It measures the linear dependence between a TS \(X_t\) and its \(h\)-th lag \(X_{t-h}\).
The partial autocorrelation function (PACF) controls for the influence of other factors, i.e. the other (intermediate) lags, thus isolating the share of the relationship that can be attributed to a specific lag.
Consider an MA(q) process, whose \(\gamma(h)\) was defined earlier.
Its ACF is \(\rho(h) = \gamma(h)/\gamma(0)\):
\(\rho(h) = \begin{cases} 1 & \text{if } h=0\\ \frac{\theta_h + \theta_{h+1}\theta_1+ \theta_{h+2}\theta_2 + \dots+\theta_q\theta_{q-h}}{1+\sum_{j=1}^q\theta_j^2} & \text{if } h=1,2,\dots,q\\ 0 & \text{if } h>q \end{cases}\)
If the ACF of a time series behaves this way, dropping to zero after lag 3, it might be an MA(3).
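As a quick numerical check of the ACF formula above, the following Python sketch evaluates the theoretical ACF of an MA(q) process. The function name `ma_acf` is hypothetical; the coefficients 1.5, 1 and 0.5 are the ones used in the MA(3) simulation in these notes.

```python
def ma_acf(thetas, max_lag):
    """Theoretical ACF rho(0..max_lag) of an MA(q) with coefficients theta_1..theta_q."""
    coeffs = [1.0] + list(thetas)           # theta_0 = 1 multiplies the current shock
    q = len(thetas)
    gamma0 = sum(c * c for c in coeffs)     # gamma(0) / sigma^2 = 1 + sum of theta_j^2
    acf = []
    for h in range(max_lag + 1):
        if h > q:
            acf.append(0.0)                 # the ACF cuts off after lag q
        else:
            num = sum(coeffs[j] * coeffs[j + h] for j in range(q - h + 1))
            acf.append(num / gamma0)
    return acf


rho = ma_acf([1.5, 1.0, 0.5], 5)
print([round(r, 7) for r in rho])
# prints [1.0, 0.7777778, 0.3888889, 0.1111111, 0.0, 0.0]
```

The ACF is positive up to lag 3 and exactly zero afterwards, which is the signature one looks for when identifying an MA(3).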
Let’s simulate this MA(3) process
clear all
set obs 1000                        // long time series (T = 1000)
scalar sigma2 = 2                   // variance of the white noise
scalar a = 2                        // constant (mean of the MA process)
scalar delta1 = 1.5                 // MA coefficient on lag 1
scalar delta2 = 1                   // MA coefficient on lag 2
scalar delta3 = 0.5                 // MA coefficient on lag 3
gen E = rnormal(0,sqrt(sigma2))     // Gaussian WN
gen time = _n
gen y = .                           // initializing y
tsset time
* Simulation
dyngen {
    update y = a + E + delta1*l.E + delta2*l2.E + delta3*l3.E ///
        if time > 3
}

Number of observations (_N) was 0, now 1,000.
(1,000 missing values generated)
Time variable: time, 1 to 1000
        Delta: 1 unit
(obs=992)

             |                L.       L2.      L3.      L4.      L5.
             |        y        y        y        y        y        y
-------------+------------------------------------------------------
           y |
         --. |   1.0000
         L1. |   0.7760   1.0000
         L2. |   0.3699   0.7758   1.0000
         L3. |   0.0466   0.3704   0.7763   1.0000
         L4. |  -0.1184   0.0472   0.3711   0.7763   1.0000
         L5. |  -0.1499  -0.1170   0.0479   0.3701   0.7751   1.0000
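The same experiment can be sketched in Python: simulate an MA(3) with the coefficients used above and compute its sample ACF. The helper names `simulate_ma` and `sample_acf`, the seed, and the longer sample (20,000 observations instead of 1,000, chosen to reduce sampling noise) are assumptions of this sketch. Python's random number generator differs from Stata's, so the figures will not match the table above exactly.

```python
import random


def simulate_ma(thetas, n, a=0.0, sigma=1.0, seed=0):
    """Simulate n observations of y_t = a + e_t + sum_j theta_j * e_{t-j}."""
    rng = random.Random(seed)
    q = len(thetas)
    eps = [rng.gauss(0.0, sigma) for _ in range(n + q)]   # q extra shocks for the lags
    return [
        a + eps[t] + sum(th * eps[t - j] for j, th in enumerate(thetas, start=1))
        for t in range(q, n + q)
    ]


def sample_acf(x, max_lag):
    """Sample autocorrelations r(0..max_lag), using the usual 1/n normalisation."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t - h] for t in range(h, n)) / n / c0
        for h in range(max_lag + 1)
    ]


# Same parameters as the Stata simulation: a = 2, sigma2 = 2, coefficients 1.5, 1, 0.5
y = simulate_ma([1.5, 1.0, 0.5], n=20000, a=2.0, sigma=2 ** 0.5, seed=2022)
r = sample_acf(y, 5)
print([round(v, 3) for v in r])
```

With a long sample, the first three autocorrelations should sit close to their theoretical values and the higher lags close to zero.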
The order of an AR(p) model is better identified by the PACF than by the ACF.
Remember the stationary AR(p) process \(Y_t = a + \sum_{j=1}^{p}\delta_j Y_{t-j} + \varepsilon_t\), where \(\sum_j |\delta_j| < 1\) is a sufficient condition for stationarity.
If \(p=1\), we know that this is an \(MA(\infty)\), so its ACF decays gradually instead of cutting off at a given lag.
In an AR(1), \(Y_t\) and \(Y_{t-2}\) are correlated by transitivity (through \(Y_{t-1}\)). Instead, we calculate the specific (PARTIAL) correlation between \(Y_t\) and each of its lags in order to identify the order \(p\).
A nice feature of the PACF is that it can be obtained from the ACF (\(\rho(h)\)). For any AR(p) process, the theoretical PACF, noted \(\phi_{ss}\), is:
\(\phi_{11} = \rho(1)\)
\(\phi_{22} = (\rho(2) - \rho(1)^2)/(1-\rho(1)^2)\)
\(\phi_{ss} = \frac{\rho(s) - \sum_{j=1}^{s-1}\phi_{s-1,j}\rho(s-j)}{1-\sum_{j=1}^{s-1}\phi_{s-1,j}\rho(j)}\) for \(s>2\), where \(\phi_{sj} = \phi_{s-1,j} - \phi_{ss}\phi_{s-1,s-j}\) for \(j=1,\dots,s-1\)
In an AR(p) process, \(\phi_{ss} = 0\) for all \(s>p\), so the PACF cuts off after lag \(p\), whereas the ACF decays gradually.
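The recursion can be sketched in a few lines of Python. This is an illustrative implementation of the standard Durbin-Levinson recursion (denominator \(1-\sum_j \phi_{s-1,j}\rho(j)\)); the function name `pacf_from_acf` is hypothetical. The example feeds it the theoretical ACF of an AR(2) with \(\delta_1 = 0.5\) and \(\delta_2 = 0.3\), built from the Yule-Walker relations \(\rho(1) = \delta_1/(1-\delta_2)\) and \(\rho(h) = \delta_1\rho(h-1) + \delta_2\rho(h-2)\), which are assumed here rather than derived in these notes.

```python
def pacf_from_acf(rho):
    """Map rho = [1, rho(1), ..., rho(m)] to [phi_11, ..., phi_mm]."""
    m = len(rho) - 1
    pacf = []
    prev = []                                  # prev[j-1] holds phi_{s-1,j}
    for s in range(1, m + 1):
        if s == 1:
            phi_ss = rho[1]
        else:
            num = rho[s] - sum(prev[j - 1] * rho[s - j] for j in range(1, s))
            den = 1.0 - sum(prev[j - 1] * rho[j] for j in range(1, s))
            phi_ss = num / den
        # phi_{s,j} = phi_{s-1,j} - phi_ss * phi_{s-1,s-j} for j = 1..s-1
        cur = [prev[j - 1] - phi_ss * prev[s - j - 1] for j in range(1, s)]
        cur.append(phi_ss)
        pacf.append(phi_ss)
        prev = cur
    return pacf


# Theoretical ACF of a stationary AR(2) via the Yule-Walker relations
d1, d2 = 0.5, 0.3
rho = [1.0, d1 / (1.0 - d2)]
for h in range(2, 6):
    rho.append(d1 * rho[h - 1] + d2 * rho[h - 2])

phi = pacf_from_acf(rho)
print([round(p, 4) for p in phi])
```

For this AR(2), \(\phi_{22}\) comes out equal to \(\delta_2\) and the partial autocorrelations beyond lag 2 are numerically zero, which is the cut-off used to identify \(p\).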
* ESTIMATED (not theoretical) ACF and PACF
* just to avoid the cumbersome formulas
corrgram yar1, lags(5)
corrgram yma1, lags(5)

                                          -1       0       1 -1       0       1
 LAG         AC      PAC      Q    Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1        0.8114   0.8130   660.41  0.0000          |------            |------
2        0.6566   0.0009   1093.3  0.0000          |-----             |
3        0.5443   0.0482     1391  0.0000          |----              |
4        0.4363  -0.0407   1582.5  0.0000          |---               |
5        0.3571   0.0285   1710.9  0.0000          |--                |

                                          -1       0       1 -1       0       1
 LAG         AC      PAC      Q    Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1        0.3761   0.3779   141.86  0.0000          |---               |---
2       -0.0017  -0.1765   141.86  0.0000          |                 -|
3        0.0348   0.1341   143.07  0.0000          |                  |-
4        0.0073  -0.0760   143.13  0.0000          |                  |
5        0.0097   0.0597   143.22  0.0000          |                  |
clear all
set seed 2022                       // setting the simulation seed
set obs 1000                        // long time series (T = 1000)
scalar sigma2 = 30                  // variance of the white noise
scalar a = 5                        // constant
scalar y0 = 100                     // initial condition in Y_t
scalar theta1 = 0.5                 // MA coefficient on lag 1
scalar theta2 = 1.5                 // MA coefficient on lag 2
scalar delta1 = 0.5                 // AR coefficients:
scalar delta2 = 0.3                 // delta1 + delta2 < 1, stationary
scalar p = 2                        // AR(p)
scalar q = 2                        // MA(q)
gen E = rnormal(0,sqrt(sigma2))     // Gaussian WN
gen time = _n
tsset time
gen yar2 = y0 in 1/2                // just to initialize our AR process
gen yma2 = y0 in 1/2                // just to initialize our MA process
* Simulation
dyngen {
    update yar2 = a + E + delta1*l.yar2 + delta2*l2.yar2 if time > p
    update yma2 = a + E + theta1*l.E + theta2*l2.E if time > q
}

Number of observations (_N) was 0, now 1,000.
Time variable: time, 1 to 1000
        Delta: 1 unit
(998 missing values generated)
(998 missing values generated)
                                          -1       0       1 -1       0       1
 LAG         AC      PAC      Q    Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1        0.7655   0.7659   587.73  0.0000          |------            |------
2        0.6891   0.3036   1064.5  0.0000          |-----             |--
3        0.5583  -0.0580   1377.7  0.0000          |----              |
4        0.4680  -0.0142   1598.1  0.0000          |---               |
5        0.3947   0.0087     1755  0.0000          |---               |

                                          -1       0       1 -1       0       1
 LAG         AC      PAC      Q    Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1        0.4004   0.4009   160.84  0.0000          |---               |---
2        0.4036   0.3161   324.41  0.0000          |---               |--
3        0.0071  -0.3108   324.46  0.0000          |                --|
4       -0.0128  -0.1005   324.63  0.0000          |                  |
5       -0.0346   0.1972   325.83  0.0000          |                  |-
Under the assumption of Normally distributed WN:
Let’s run stages 2 and 3 on our simulated AR(2) series, comparing the AR(2) model against two alternatives: AR(3) and MA(2).
These will be our hypothetical candidates after looking at the correlograms (stage 1).
Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |      1,000          .  -3159.775       4   6327.549    6347.18
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |      1,000          .  -3157.968       5   6325.935   6350.474
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.

Akaike's information criterion and Bayesian information criterion

-----------------------------------------------------------------------------
       Model |          N   ll(null)  ll(model)      df        AIC        BIC
-------------+---------------------------------------------------------------
           . |      1,000          .  -3315.414       4   6638.828   6658.459
-----------------------------------------------------------------------------
Note: BIC uses N = number of observations. See [R] BIC note.
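The criteria in these tables can be recomputed by hand from the log-likelihood and the number of parameters, using \(AIC = -2\,ll + 2k\) and \(BIC = -2\,ll + k\ln N\). The Python sketch below does this. It assumes the three tables appear in the order AR(2), AR(3), MA(2), which the df column (4, 5, 4) suggests but the output itself does not state; the function and dictionary names are hypothetical.

```python
import math


def aic(ll, k):
    """Akaike information criterion from log-likelihood ll and k parameters."""
    return -2.0 * ll + 2.0 * k


def bic(ll, k, n):
    """Bayesian information criterion, with n the number of observations."""
    return -2.0 * ll + k * math.log(n)


# (log-likelihood, df) copied from the three tables; the model labels are assumed
fits = {
    "AR(2)": (-3159.775, 4),
    "AR(3)": (-3157.968, 5),
    "MA(2)": (-3315.414, 4),
}
n_obs = 1000

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in fits.items()}
for name, (a_val, b_val) in scores.items():
    print(f"{name}: AIC = {a_val:.3f}, BIC = {b_val:.3f}")

best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])
print("lowest AIC:", best_aic, "| lowest BIC:", best_bic)
```

Under that labelling, the two criteria disagree: AIC marginally prefers the AR(3), while BIC, which penalises the extra parameter more heavily, prefers the AR(2). The MA(2) is clearly dominated on both.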
* This is a simulated series, so we drop the first 50 (non-stationary) periods
qui: {
    arima yar2 , ar(1 2)
    predict residar2 if time > 50, r
    arima yar2 , ar(1 2 3)
    predict residar3 if time > 50, r
}
summarize residar2 residar3

    Variable |        Obs        Mean   Std. dev.        Min        Max
-------------+---------------------------------------------------------
    residar2 |        950   -.1642168    5.577541   -19.3184    19.1575
    residar3 |        950   -.1650889    5.569911  -19.61905   18.76305
The residual series look random, with constant variance, except at the beginning of the series (we already discussed why the initial \(t\)’s should be dropped in a series observed since its origins). This is good news.
Correlogram (ACF)
corrgram residar2, lags(5) // standard representation
corrgram residar3, lags(5) // standard representation

                                          -1       0       1 -1       0       1
 LAG         AC      PAC      Q    Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1       -0.0307  -0.0307   .89921  0.3430          |                  |
2       -0.0001  -0.0011   .89922  0.6379          |                  |
3       -0.0622  -0.0624   4.5915  0.2043          |                  |
4       -0.0444  -0.0487   6.4777  0.1662          |                  |
5       -0.0275  -0.0310   7.2025  0.2060          |                  |

                                          -1       0       1 -1       0       1
 LAG         AC      PAC      Q    Prob>Q  [Autocorrelation]  [Partial autocor]
-------------------------------------------------------------------------------
1       -0.0510  -0.0511   2.4827  0.1151          |                  |
2       -0.0442  -0.0471   4.3499  0.1136          |                  |
3       -0.0323  -0.0374   5.3448  0.1482          |                  |
4       -0.0410  -0.0473   6.9518  0.1385          |                  |
5       -0.0134  -0.0218   7.1243  0.2116          |                  |
Portmanteau Q-stat
A WN test, or Q-test, up to the k-th lag. I chose 5 lags, somewhat arbitrarily, given the short memory of the DGP (AR(2) or AR(3)). Think of a quarterly series: we would not expect its memory to extend beyond a year.
As a rule of thumb, with no a priori information, you can use T/4 lags. In econometric applications, you are expected to have a priori information.
Portmanteau test for white noise
---------------------------------------
Portmanteau (Q) statistic = 7.2025
Prob > chi2(5) = 0.2060
Portmanteau test for white noise
---------------------------------------
Portmanteau (Q) statistic = 7.1243
Prob > chi2(5) = 0.2116
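To see where the Q statistic comes from, the Python sketch below recomputes it from the residual autocorrelations reported in the correlograms. It assumes the reported statistic is the Ljung-Box form \(Q = n(n+2)\sum_{j=1}^{k}\hat\rho(j)^2/(n-j)\); the function name `ljung_box_q` is hypothetical. Because the autocorrelations are rounded to four decimals, the result only approximates the reported values.

```python
def ljung_box_q(acf, n):
    """Ljung-Box Q from sample autocorrelations acf = [r(1), ..., r(k)] and n observations."""
    return n * (n + 2) * sum(r * r / (n - j) for j, r in enumerate(acf, start=1))


CHI2_5_CRIT_95 = 11.0705   # 5% critical value of a chi-squared with 5 degrees of freedom

# Autocorrelations (lags 1 to 5) copied from the two residual correlograms, n = 950
q_ar2 = ljung_box_q([-0.0307, -0.0001, -0.0622, -0.0444, -0.0275], 950)
q_ar3 = ljung_box_q([-0.0510, -0.0442, -0.0323, -0.0410, -0.0134], 950)

for label, q_stat in (("AR(2) residuals", q_ar2), ("AR(3) residuals", q_ar3)):
    verdict = "reject WN" if q_stat > CHI2_5_CRIT_95 else "do not reject WN"
    print(f"{label}: Q = {q_stat:.3f} -> {verdict}")
```

Both statistics fall well below the 5% critical value, in line with the p-values of 0.2060 and 0.2116 reported by the test.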
If not normal, consider alternative transformations for \(Y_t\), (log, Box-Cox, etc.)
\(H_0: \varepsilon_t \sim N(0,\sigma^2)\)
Inspection of the TS and ACF suggests stationary residuals for both processes.
The Q-statistics do not reject the null hypothesis of no autocorrelation up to lag 5 (p-values 0.2060 and 0.2116). Given a constant mean and variance, we therefore do not reject that the residuals are WN for both the AR(2) and AR(3) processes.
Q-Q plots of both residual series and the normality test do not reject the normality assumption for either process.
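These notes do not name the normality test that was used. One common choice, shown here only as an illustrative assumption, is the Jarque-Bera statistic \(JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\), where \(S\) is the sample skewness and \(K\) the sample kurtosis; under \(H_0\) it is asymptotically \(\chi^2(2)\). The function name `jarque_bera` and the simulated sample are hypothetical.

```python
import random


def jarque_bera(x):
    """Jarque-Bera statistic built from the sample skewness and kurtosis of x."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n    # central moments (1/n normalisation)
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


CHI2_2_CRIT_95 = 5.991   # 5% critical value of a chi-squared with 2 degrees of freedom

# Hypothetical stand-in for a residual series: 950 Gaussian draws
rng = random.Random(2022)
sample = [rng.gauss(0.0, 5.5) for _ in range(950)]
jb = jarque_bera(sample)
verdict = "reject normality" if jb > CHI2_2_CRIT_95 else "do not reject normality"
print(round(jb, 3), verdict)

# A clearly non-normal sample: two-point distribution (skewness 0, kurtosis 1)
jb_two_point = jarque_bera([-1.0, 1.0] * 50)
print(round(jb_two_point, 3))
```

In practice the statistic would be computed on the model residuals (here `residar2` and `residar3`) rather than on a simulated stand-in.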