Last lecture:
Wold’s decomposition, Stationary AR(p) and MA(q),
This lecture:
General modeling, ARMA modeling: principles and procedures
10/29/2018
The objective: obtain adequate but parsimonious models.
Parsimony: the smallest possible number of parameters for adequate representation.
For example: consider an MA(q) model \[Y_{t} = \varepsilon_t +\theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}.\] For sufficiently large \(q\), such a process can often be adequately represented by an AR(1) model (recall Wold's decomposition: the AR(1) model itself has an MA(\(\infty\)) representation) \[(1- \phi\mathbb{L}) Y_{t}= \varepsilon_{t}.\]
The latter model contains only one parameter \(\phi\). Fitting an MA(\(q\)) with very large \(q\) requires estimating many parameters and leads to unnecessarily imprecise estimation.
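As a quick numerical illustration, the MA(\(\infty\)) weights implied by an AR(1) model decay geometrically, so only a few of them matter in practice. The sketch below uses base R's ARMAtoMA() with an assumed coefficient \(\phi = 0.8\); the value is purely illustrative.

# psi-weights of the MA(infinity) representation of an AR(1) with phi = 0.8:
# Y_t = sum_j psi_j * eps_{t-j}, with psi_j = phi^j decaying geometrically
ARMAtoMA(ar = 0.8, lag.max = 10)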
Assume that the data have a weakly stationary structure (the class of models considered in this course).
Pick a reasonable pair of orders (p, q).
Estimate the parameters of the ARMA(p, q) model.
Test the estimated model.
Forecast (and control).
library(forecast); library(Quandl); library(xts)
oil = Quandl("FRED/DCOILWTICO", type = "xts", collapse = "monthly",
start_date="2006-01-01", end_date="2018-10-19")
head(oil, 10)
##            [,1]
## Jan 2006  67.86
## Feb 2006  61.37
## Mar 2006  66.25
## Apr 2006  71.80
## May 2006  71.42
## Jun 2006  73.94
## Jul 2006  74.56
## Aug 2006  70.38
## Sep 2006  62.90
## Oct 2006  58.72
plot(oil)
ar.1 = lm(oil ~ lag(oil))                    # AR(1) by least squares, with intercept
ar.2 = lm(oil ~ 0 + lag(oil) + lag(oil, 2))  # AR(2) by least squares, no intercept
AIC(ar.1)
## [1] 1024.31
AIC(ar.2)
## [1] 1010.092
BIC(ar.1)
## [1] 1033.401
BIC(ar.2)
## [1] 1019.164
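A complementary way to pick \((p,q)\) is to compare information criteria across a small grid of candidate orders. The sketch below loops over ARMA(p, q) orders with arima(); the grid and the tryCatch guard are illustrative choices, and auto.arima() in the forecast package automates this kind of search.

# Compare AIC across a small grid of ARMA(p, q) orders (no differencing, no mean)
orders <- expand.grid(p = 0:2, q = 0:2)
orders$AIC <- apply(orders, 1, function(o)
  tryCatch(AIC(arima(oil, order = c(o["p"], 0, o["q"]),
                     method = "ML", include.mean = FALSE)),
           error = function(e) NA))       # some orders may fail to converge
orders[order(orders$AIC), ]               # smallest AIC first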
ar.2.ML = arima(oil, order=c(2,0,0), method="ML", include.mean=FALSE)
coef(ar.2)
##    lag(oil) lag(oil, 2) 
##   1.2614023  -0.2655493
coef(ar.2.ML)
##        ar1        ar2 
##  1.2596994 -0.2649732
library(lmtest)
coeftest(ar.2.ML)
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1  1.259699   0.077582 16.2370 < 2.2e-16 ***
## ar2 -0.264973   0.077785 -3.4065 0.0006581 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
arma11 = arima(oil, order=c(1,0,1), method="ML", include.mean=FALSE)
coef(arma11)
##       ar1       ma1 
## 0.9941501 0.2099341
coeftest(arma11)
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1 0.9941501  0.0058697  169.37 < 2.2e-16 ***
## ma1 0.2099341  0.0686059    3.06  0.002213 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For i.i.d. data, each observation \(y_t\) has density function \(p(y_{t};\beta)\), where \(\beta\) collects the model parameters.
The joint density is \[p(\mathbf{y};\beta)=p(y_{1},\dots,y_{T};\beta)=\prod_{t=1}^{T}p(y_{t};\beta)\]
The likelihood function is the same product viewed as a function of \(\beta\), conditioning on the observed data \(\mathbf{y}\): \[L(\beta|\mathbf{y})=L(\beta|y_{1},\dots,y_{T})=\prod_{t=1}^{T}p(y_{t};\beta)\]
AR(1):\[Y_{t}=c+\phi Y_{t-1}+\varepsilon_{t}, \varepsilon_{t}\sim i.i.d.\:\mathcal{N}(0,\sigma^{2})\] \[\beta=(c,\phi,\sigma^{2}) \quad|\phi|<1\]
Conditional on past information \(\mathcal{I}_{t-1} = y_{t-1}\) \[y_{t}|\mathcal{I}_{t-1}\sim\mathcal{N}(c+\phi y_{t-1},\sigma^{2})\] \(t=2,\dots,T\).
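Written out explicitly, conditioning on the first observation \(y_1\), the conditional log-likelihood to be maximized over \(\beta=(c,\phi,\sigma^{2})\) is \[\ln L(\beta|\mathbf{y}) = \sum_{t=2}^{T}\ln p(y_{t}|y_{t-1};\beta) = -\frac{T-1}{2}\ln(2\pi\sigma^{2}) - \sum_{t=2}^{T}\frac{(y_{t}-c-\phi y_{t-1})^{2}}{2\sigma^{2}}.\]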
For an ARMA model (take ARMA(1,1) for concreteness), the MLE is “easy” to compute if we condition on prior observations \(Y_1, \dots, Y_{t-1}\) and prior errors \(\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_{t-1}\), computed recursively as \(\hat{\varepsilon}_1 = Y_1\), \(\hat{\varepsilon}_2 = Y_2 - \phi Y_1 - \theta \hat{\varepsilon}_1\), …, and in general \[\hat{\varepsilon}_t = Y_t - \phi Y_{t-1} - \theta \hat{\varepsilon}_{t-1}.\]
Then the likelihood can be written in terms of \(\hat{\varepsilon}_1, \dots, \hat{\varepsilon}_{T}\), which are treated as i.i.d. Gaussian white noise. Estimation of \(\theta\) and \(\phi\) then reduces to maximizing the conditional log-likelihood \[\max_{\theta, \phi} \sum_{t=2}^{T}\ln p(\hat{\varepsilon}_t|y_1,\dots, y_{t-1}, \hat{\varepsilon}_1, \dots, \hat{\varepsilon}_{t-1} ;\theta, \phi).\]
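A minimal sketch of this conditional approach for the ARMA(1,1) model without a mean, written directly from the recursion above; the function name cond.loglik, the starting values, and the use of optim() are illustrative choices rather than the lecture code (arima() with method="CSS" implements essentially the same idea, so the results should be roughly close to the ML estimates reported above).

# Conditional log-likelihood of an ARMA(1,1) without mean, built from the
# recursion e_t = y_t - phi*y_{t-1} - theta*e_{t-1} with e_1 = y_1
y <- as.numeric(oil)
cond.loglik <- function(par) {
  phi <- par[1]; theta <- par[2]; sigma2 <- exp(par[3])   # ensures sigma^2 > 0
  e <- numeric(length(y)); e[1] <- y[1]
  for (t in 2:length(y)) e[t] <- y[t] - phi * y[t-1] - theta * e[t-1]
  sum(dnorm(e[-1], mean = 0, sd = sqrt(sigma2), log = TRUE))
}
opt <- optim(c(0.9, 0, log(var(y))), cond.loglik,
             control = list(fnscale = -1))                # fnscale = -1: maximize
opt$par[1:2]    # roughly close to the arima() estimates of (phi, theta) above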
Test for autocorrelation in the residuals of a fitted ARMA model.
Test statistic (Ljung-Box): \[Q = T(T+2)\sum_{j=1}^{k}\frac{\hat{\rho}_{j}^{2}}{T-j}\rightarrow\chi^{2}(k),\] where \(k\) is the number of autocorrelation lags of the residuals included in the statistic (the simpler Box-Pierce form is \(T\sum_{j=1}^{k}\hat{\rho}_{j}^{2}\)).
Test hypothesis \[H_{0}:\rho_{1}=\cdots=\rho_{k}=0.\]
The hypothesis: the residuals from the ARMA model have no autocorrelation.
Box.test(arma11$resid, lag=3, type="Ljung-Box", fitdf=2)
## 
##  Box-Ljung test
## 
## data:  arma11$resid
## X-squared = 4.5907, df = 1, p-value = 0.03215
Box.test(ar.2.ML$resid, lag=3, type="Ljung-Box", fitdf=2)
## 
##  Box-Ljung test
## 
## data:  ar.2.ML$resid
## X-squared = 2.2142, df = 1, p-value = 0.1367
lag: the number of autocorrelation lags of the residuals included in the statistic.
fitdf: the number of fitted parameters, fitdf \(= p+q\); we need lag \(>\) fitdf.
The degrees of freedom of the \(\chi^{2}\) distribution: lag \(-\,p-q\).
For the AR(2) model we cannot reject the hypothesis that the residuals are uncorrelated (p = 0.137), so it fits the data well; for the ARMA(1,1) model the hypothesis is rejected at the 5% level (p = 0.032), though not at the 1% level.
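As a check on the mechanics, the statistic and p-value reported above can be reproduced by hand from the residual autocorrelations; the variable names below (res, rho, Q) are illustrative.

# Manual Ljung-Box statistic for the ARMA(1,1) residuals with lag = 3, fitdf = 2
res <- as.numeric(na.omit(arma11$resid))
T   <- length(res); k <- 3
rho <- acf(res, lag.max = k, plot = FALSE)$acf[-1]   # sample autocorrelations, lags 1..k
Q   <- T * (T + 2) * sum(rho^2 / (T - 1:k))          # Ljung-Box statistic
pchisq(Q, df = k - 2, lower.tail = FALSE)            # p-value with df = lag - fitdf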
forecast(ar.2.ML,h=3)
##        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 13       71.11137 62.68790 79.53483 58.22879 83.99394
## Dec 13       70.54327 56.99526 84.09128 49.82337 91.26317
## Jan 14       70.02071 52.48416 87.55726 43.20087 96.84055
forecast(arma11,h=3)
##        Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## Nov 13       71.08221 62.58951 79.57491 58.09374 84.07068
## Dec 13       70.66639 57.37370 83.95908 50.33697 90.99580
## Jan 14       70.25300 53.51844 86.98755 44.65970 95.84629
plot(forecast(arma11,h=10))
If \(\{Y_{t}\}\) is a weakly stationary process with Wold representation:\[Y_{t}=\mu+\sum_{j=0}^{\infty}\psi_{j}\varepsilon_{t-j},\;\varepsilon_{t}\sim WN(0,\sigma^{2})\]
\(\mathcal{I}_{t}=\{y_{t},y_{t-1},\dots, y_{0} \}\): the information set up to time \(t\).
By Wold representation, we know \[\mathbb{E}[Y_{t}]= \mu, \,\, \mbox{Var}(Y_{t})= \sigma^{2}\sum_{j=0}^{\infty}\psi_{j}^{2}\]
Goal: predict the future value \(Y_{t+h}\) based on the available information \(\mathcal{I}_{t}\), equivalently the past noises \(\varepsilon_{t},\varepsilon_{t-1},\dots\).
The expression of \(Y_{t+h}\) is \[Y_{t+h}=\mu+ \varepsilon_{t+h}+\psi_{1}\varepsilon_{t+h-1}+\cdots+\psi_{h-1}\varepsilon_{t+1} +\psi_{h}\varepsilon_{t}+\psi_{h+1}\varepsilon_{t-1}+\cdots\]
\(Y_{t+h|t}\) is the forecast of \(Y_{t+h}\) based on information \(\mathcal{I}_{t}\): \[Y_{t+h|t} =\mathbb{E} \left. \left[ Y_{t+h} \right| \mathcal{I}_{t} \right] = \mu+ \mathbb{E} \left. \left[ \varepsilon_{t+h}+\psi_{1}\varepsilon_{t+h-1}+\cdots +\psi_{h-1}\varepsilon_{t+1} \right| \mathcal{I}_{t} \right] +\psi_{h}\varepsilon_{t}+\psi_{h+1}\varepsilon_{t-1}+\cdots\]
The future noises have zero conditional mean, \[\mathbb{E} \left. \left[ \varepsilon_{t+h}+\psi_{1}\varepsilon_{t+h-1}+\cdots +\psi_{h-1}\varepsilon_{t+1} \right| \mathcal{I}_{t} \right] = 0,\] so these terms drop out of the forecast.
The forecast error is \[\varepsilon_{t+h|t}=Y_{t+h}-Y_{t+h|t}\] and the prediction Mean Square Error (MSE) is \[MSE(\varepsilon_{t+h|t})=\mathbb{E}[\varepsilon_{t+h|t}^{2}]=\mathbb{E}\left[(Y_{t+h}-Y_{t+h|t})^{2}\right]\]
Two candidate predictors: the conditional expectation \(Y_{t+h|t}=\mathbb{E}\left[Y_{t+h}|\mathcal{I}_{t}\right]\), or a linear function of the observations \[\hat{Y}_{t+h|t}= k_{1}Y_{t}+\ldots+k_{t}Y_{1}.\]
Coefficients \({\bf k}=(k_{1},\ldots,k_{t})^{\prime}\) are selected to minimize the prediction MSE.
As \(\{\varepsilon_{t}\}\) are independent white noises, \(\mathbb{E}[\varepsilon_{t+j}|\mathcal{I}_{t}]=0\) for \(j\geq 1\), and \(\mathbb{E}[Y_{t+h}|\mathcal{I}_{t}]\) is a linear function of the past noises \(\varepsilon_{t},\varepsilon_{t-1},\dots\): \[ Y_{t+h|t}=\mu+\psi_{h}\varepsilon_{t}+\psi_{h+1}\varepsilon_{t-1}+\dots\]
The forecast error is \[\varepsilon_{t+h|t}= Y_{t+h}-Y_{t+h|t} = \varepsilon_{t+h}+\psi_{1}\varepsilon_{t+h-1}+\cdots+\psi_{h-1}\varepsilon_{t+1}\]
Variance of \(\varepsilon_{t+h|t}\) is \[\sigma^{2}(1+\psi_{1}^{2}+\cdots+\psi_{h-1}^{2})\]
If \(\{\varepsilon_{t}\}\) are Gaussian white noises then \[Y_{t+h}|\mathcal{I}_{t}\sim\mathcal{N}(Y_{t+h|t},\sigma^{2}(1+\psi_{1}^{2}+\cdots+\psi_{h-1}^{2}))\] A \(95\%\) interval for the \(h\)-step prediction has the form \[Y_{t+h|t} \pm 1.96\sqrt{\sigma^{2}(1+\psi_{1}^{2}+\cdots+\psi_{h-1}^{2})}\]
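This is exactly the interval that forecast() computed earlier. A small sketch of the same calculation by hand for the fitted ARMA(1,1); the variable names (h, psi, se_h, fc) are illustrative, while ARMAtoMA() and predict() are base R functions.

# ~95% interval for the h-step forecast built from the psi-weights of arma11
h    <- 3
psi  <- ARMAtoMA(ar = coef(arma11)["ar1"], ma = coef(arma11)["ma1"],
                 lag.max = h - 1)                      # psi_1, ..., psi_{h-1}
se_h <- sqrt(arma11$sigma2 * (1 + sum(psi^2)))         # sd of the h-step forecast error
fc   <- predict(arma11, n.ahead = h)
fc$pred[h] + c(-1.96, 1.96) * se_h                     # approximately the Lo 95 / Hi 95 above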
Discussion of a general ARMA modeling procedure
Implementation of model selection, estimation, testing, and forecasting under the stationarity assumption.