6.1 Estimation of Moving Average Models

Le Nguyen Dang Khoa

Thu 17 February 2022

Introduction to Chapter 6

The introduction of moving average (MA) components to the model complicates the estimation problem (because the least squares criterion is no longer linear in the parameters).

Both least squares and maximum likelihood estimation for models involving MA terms require numerical optimization and are relatively computationally demanding.

As a result, a variety of techniques for the estimation of models with MA terms have been suggested that do not involve numerical optimization. These techniques have generally made use (implicitly or explicitly) of moment conditions implied by the ARMA model, and therefore fall within the class of GMM estimators.

Outline:

The estimation of pure MA models.

How extensions to higher order models follow.

Content

  • The estimation of pure moving average models of the form

\[y_t=\epsilon_t+\theta_1\epsilon_{t-1}+\dots+\theta_q\epsilon_{t-q},\] where \(\epsilon_t\) is an i.i.d. zero-mean error process with variance \(\sigma^2_0\) and fourth-order cumulant \(\mathcal K_4\). Under the additional assumption that \(\epsilon_t\) is normally distributed, maximum likelihood estimation of \(\theta_1, \dots, \theta_q\) is possible, but it requires numerical maximization of the likelihood. \(\Rightarrow\) There has therefore been considerable interest in finding simpler estimators whose properties approach those of maximum likelihood, and some of these simpler estimators can be put in the GMM framework. (A simulation sketch of this data-generating process is given after this list.)

  • We first present a simple estimator derived directly from the moments implied by an MA(1) model and show that it generally has poor properties. We then describe a popular approach to the estimation of moving average models that makes use of approximate autoregressive models, which can be estimated by OLS regression; the properties of these estimators can be very good. Finally, we indicate how the methods can be extended to moving average models of general order.
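
The following is a minimal simulation sketch of the MA(q) process defined in the first bullet, assuming NumPy; the function name `simulate_ma` and the illustrative coefficients are my own and are not from the text.

```python
import numpy as np

def simulate_ma(theta, sigma, T, seed=0):
    """Draw T observations from y_t = eps_t + theta_1*eps_{t-1} + ... + theta_q*eps_{t-q}
    with i.i.d. N(0, sigma^2) errors (normality is used here only for convenience)."""
    rng = np.random.default_rng(seed)
    q = len(theta)
    eps = rng.normal(0.0, sigma, size=T + q)              # includes q pre-sample errors
    coeffs = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    # 'valid'-mode convolution applies the MA(q) recursion to produce exactly T values
    return np.convolve(eps, coeffs, mode="valid")

# Example: an MA(1) series with theta_0 = 0.5 and sigma_0 = 1
y = simulate_ma(theta=[0.5], sigma=1.0, T=500)
```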

A Simple Estimator of an MA(1) Model

We consider the estimation of \(\theta_0\) in the model \[y_t=\epsilon_t+\theta_0\epsilon_{t-1},\quad (*)\] where we assume that \(\epsilon_t\sim \mathrm{i.i.d.}(0,\sigma^2_0)\) and \(|\theta_0|<1\). It is straightforward to show that the first-order autocorrelation implied by this model is \[\rho_0=\frac{\mathbb E(y_ty_{t-1})}{\mathbb E(y^2_t)}=\frac{\theta_0}{1+\theta_0^2}. \quad (**)\] It is also the case that all higher-order autocorrelations are zero, but these additional moment conditions are not used here. If we define the sample first-order autocorrelation \[\hat{\rho}_T=\frac{\sum_{t=2}^{T}y_ty_{t-1}}{\sum^{T}_{t=2}y^2_t},\] then replacing the unknown parameters in \((**)\) by sample estimators suggests solving the quadratic \[\hat{\theta}_T^2-\hat{\rho}^{-1}_T \hat{\theta}_T+1=0\] to obtain the estimator \(\hat{\theta}_T\).

For the solution of this quadratic to be real we require \(|\hat{\rho}_T|\leq 0.5\), and the true first-order autocorrelation satisfies \(|\rho_0|<0.5\). Thus, given the consistency of \(\hat{\rho}_T\), it follows that \(\mathbb P(|\hat{\rho}_T|\leq 0.5)\rightarrow 1\). However, in a finite sample it is possible that \(|\hat{\rho}_T|>0.5\), particularly if \(|\theta_0|\) is near one. To provide an estimator of \(\theta_0\) that is always real valued, we could define \[\tilde{\rho}_T=\left\{\begin{matrix} -0.5, & \mathrm{if} \;\hat{\rho}_T<-0.5,\\ \hat{\rho}_T, & \mathrm{if}\; |\hat{\rho}_T| \leq 0.5,\\ 0.5, & \mathrm{if} \;\hat{\rho}_T >0.5, \end{matrix}\right.\] and take \(\tilde{\theta}_T\) to be the quadratic solution \[\tilde{\theta}_T= \frac{1-\sqrt{1-4\tilde{\rho}^2_T}}{2\tilde{\rho}_T}.\] It can be seen that the finite sample distribution of \(\tilde \theta_T\) consists of a mixture of two discrete probability masses at \(\pm 1\) (which disappear as \(T\rightarrow \infty\)) and a continuous distribution for \(-1<\tilde \theta_T<1\). An estimator of \(\sigma^2_0\) can then be found from the relationship \[\mathbb E(y^2_t)=\sigma^2_0(1+\theta^2_0). \quad (***)\] Using the analogous sample quantities, we have \[\tilde \sigma^2_T=\frac{T^{-1}\sum^{T}_{t=1}y^2_t}{1+\tilde \theta^2_T}.\]

We can put this estimator into the GMM framework defined in Chapter 1. If we let the parameter vector be \(\eta=(\theta,\sigma^2)'\) and define \[f(y_t,\eta)=\begin{pmatrix} y_ty_{t-1}-\sigma^2\theta\\ y_t^2-\sigma^2(1+\theta^2) \end{pmatrix},\] then it follows from \((**)\) and \((***)\) that \(\mathbb E f(y_t,\eta_0)=0\). The sample moments are \[f_T(\eta)=T^{-1}\sum^{T}_{t=1}f(y_t,\eta)=\begin{pmatrix} T^{-1}\sum^{T}_{t=1}y_ty_{t-1}-\sigma^2 \theta\\ T^{-1}\sum^{T}_{t=1}y^2_t-\sigma^2(1+\theta^2) \end{pmatrix},\] and solving the exactly identified equations \(f_T(\hat{\eta}_T)=0\) for \(\hat \eta_T\) gives the estimator \(\hat \theta_T\) defined above and \(\hat \sigma^2_T\) defined analogously to \(\tilde \sigma^2_T\). We can also define \(\tilde \eta_T=(\tilde \theta_T,\tilde \sigma^2_T)'\).
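
As a concrete illustration, here is a minimal sketch (assuming NumPy; the function name is hypothetical) of the simple MA(1) estimator just described: compute \(\hat\rho_T\), truncate it to \([-0.5, 0.5]\), solve the quadratic for \(\tilde\theta_T\), and back out \(\tilde\sigma^2_T\).

```python
import numpy as np

def simple_ma1_estimator(y):
    """Moment estimator (theta_tilde, sigma2_tilde) for y_t = eps_t + theta_0*eps_{t-1},
    a sketch of the quadratic-solution estimator described above."""
    y = np.asarray(y, dtype=float)
    # Sample first-order autocorrelation rho_hat_T, with sums running from t = 2 to T
    rho_hat = np.sum(y[1:] * y[:-1]) / np.sum(y[1:] ** 2)
    # Truncate to [-0.5, 0.5] so that the quadratic root below is always real
    rho_tilde = np.clip(rho_hat, -0.5, 0.5)
    if rho_tilde == 0.0:
        theta_tilde = 0.0                                  # rho = 0 corresponds to theta = 0
    else:
        theta_tilde = (1.0 - np.sqrt(1.0 - 4.0 * rho_tilde**2)) / (2.0 * rho_tilde)
    # sigma^2 from E(y_t^2) = sigma_0^2 * (1 + theta_0^2), as in (***)
    sigma2_tilde = np.mean(y**2) / (1.0 + theta_tilde**2)
    return theta_tilde, sigma2_tilde
```

When \(\tilde\rho_T\) hits the boundary \(\pm 0.5\), the formula returns \(\tilde\theta_T=\pm 1\), which is the source of the discrete probability masses mentioned above.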

The asymptotic properties of these estimators are summarized in Theorem 6.1.

Theorem 6.1. It is of interest to consider the asymptotic efficiency of \(\hat \theta_T\) in particular. Its asymptotic variance is \[\frac{1+\theta^2_0+4\theta_0^4+\theta^6_0+\theta_0^8}{(1-\theta^2_0)^2},\] compared with the asymptotic variance of the maximum likelihood estimator under the normality assumption, which is \(1-\theta^2_0\). If \(\theta_0=0\), then \(\hat \theta_T\) is as asymptotically efficient as the maximum likelihood estimator, but as \(\theta_0\) departs from zero, \(\hat \theta_T\) rapidly becomes less efficient than maximum likelihood. This estimator and its generally poor properties appear in Hannan [1960, pp. 47-48]; see also Hannan [1970, pp. 373-374]. This provides a simple example of the fact that (G)MM can produce quite inefficient estimators, and it is important to examine the properties of any newly derived estimator.
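
To see how quickly the efficiency deteriorates, the short sketch below (assuming NumPy; the function and variable names are illustrative) evaluates the ratio of the two asymptotic variances over a grid of \(\theta_0\) values.

```python
import numpy as np

def avar_moment(theta):
    """Asymptotic variance of the simple moment estimator, from the formula above."""
    return (1 + theta**2 + 4*theta**4 + theta**6 + theta**8) / (1 - theta**2)**2

def avar_mle(theta):
    """Asymptotic variance of the Gaussian maximum likelihood estimator of theta_0."""
    return 1 - theta**2

for theta0 in np.array([0.0, 0.25, 0.5, 0.75, 0.9]):
    ratio = avar_moment(theta0) / avar_mle(theta0)   # relative inefficiency of the moment estimator
    print(f"theta_0 = {theta0:4.2f}: avar(moment) / avar(MLE) = {ratio:10.2f}")
```

At \(\theta_0=0\) the ratio equals one, consistent with the discussion above, and it grows rapidly as \(|\theta_0|\) approaches one.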