October 2, 2017

Concepts, Models, and Definitions

Defining Nonlinearity

  • As a first attempt at a definition of linearity, consider

\[ \mathsf{E}\{y_t | \boldsymbol{z}_t\} = \alpha' \boldsymbol{z}_t + g(\boldsymbol{z}_t) \label{eqn:first} \] where \(y_t\) is the dependent variable and \(\boldsymbol{z}_t\) the vector of explanatory variables, which may include lagged values of \(y_t\).

  • The model for the conditional mean is said to be linear if \(g(\boldsymbol{z}_t) \equiv 0\).

  • This definition is used in Lee, White and Granger (1993).

Where does nonlinearity come from?

  • Nonlinearity arises from the structure and institutions of the economy.
    • Some variables have ceilings and floors, and one would expect a variable to behave differently near them.
    • Union requirements may make it easier to increase wages than to decrease them.
    • Public reactions may make price increases more difficult than price decreases.
    • And so on.
  • A linear model cannot capture this lack of symmetry; asymmetry is a form of nonlinearity.

  • The theory is not very specific about how nonlinear the data will turn out to be.

  • The amount of nonlinearity may be reduced in moving from high-frequency to low-frequency data, for example through temporal aggregation.

Stationarity and Nonstationarity

  • For nonlinear models, conditional quantities such as the conditional distribution and the conditional mean and variance are much more important than the unconditional ones.

  • In most models in the book, strict stationarity and existence of at least second‐order moments are assumed.

  • Stationarity conditions are typically derived for a recursively defined Markov model, say \(y_t = g(y_{t-1}; \theta) + \varepsilon_t\).
    • Foster–Lyapunov drift criterion; a simple sufficient condition is \(|g(y, \theta)| < |y|\) for \(|y|\) large, which yields geometric ergodicity (illustrated after this list).
  • Possible transformation from non-stationary to stationary.
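
As an illustration of the drift condition (our example, not necessarily the book's), consider the first-order nonlinear autoregression \(y_t = g(y_{t-1}; \theta) + \varepsilon_t\) with a two-regime threshold mean:

\[ g(y; \theta) = \phi_1 y\, I(y \leq c) + \phi_2 y\, I(y > c). \]

Since \(|g(y; \theta)| \leq \max(|\phi_1|, |\phi_2|)\,|y|\), the sufficient condition \(|g(y, \theta)| < |y|\) for \(|y|\) large holds whenever \(\max(|\phi_1|, |\phi_2|) < 1\), and the process is geometrically ergodic; the exact stationarity region of this model is in fact larger.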

Wold's Representation and Volterra Expansion

  • Wold (1938) showed that if \(y_t\) is (covariance) stationary and purely nondeterministic, then there always exists a representation

\[ y_t = \varepsilon_t + \sum_{j=1}^q \theta_j \varepsilon_{t-j} \]

Note that \(q\) is not necessarily finite.

  • The representation is, however, only an identity in the mean-square sense.

  • A formal nonlinear generalization is the Volterra expansion

\[ y_t = \sum_{i=0}^q \theta_i \varepsilon_{t-i} + \sum_{i=0}^q \sum_{j=i}^q \theta_{ij} \varepsilon_{t-i} \varepsilon_{t-j} + \sum_{i=0}^q \sum_{j=i}^q \sum_{k=j}^q \theta_{ijk} \varepsilon_{t-i} \varepsilon_{t-j} \varepsilon_{t-k} + \cdots \]
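
As a minimal simulation sketch (our own; all kernel values are made up for illustration), a Volterra expansion truncated at \(q = 2\) and at second order can be generated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
T, q = 500, 2
eps = rng.standard_normal(T + q)      # eps[t + q] plays the role of eps_t

theta = np.array([1.0, 0.5, 0.25])    # linear kernel theta_0..theta_q (illustrative)
theta2 = {(0, 1): 0.3, (1, 2): -0.2}  # a few quadratic terms theta_{ij} (illustrative)

y = np.empty(T)
for t in range(T):
    e = eps[t + q - np.arange(q + 1)]  # [eps_t, eps_{t-1}, eps_{t-2}]
    y[t] = theta @ e + sum(c * e[i] * e[j] for (i, j), c in theta2.items())
```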

Nonlinear Models in Economic Theory

Disequilibrium Models

  • Fair and Jaffee (1972) considered the general disequilibrium model

\[ D_t = \alpha_0' x_t^D + \alpha_1 p_t + \varepsilon_t^D \] \[ S_t = \beta_0' x_t^S + \beta_1 p_t + \varepsilon_t^S, \] with the "min-condition" \[ D_t^{obs} = \min(D_t, S_t). \]

  • Another possibility that Fair and Jaffee (1972) considered is the price-adjustment equation

\[ p_t - p_{t-1} = \gamma (D_t - S_t). \]
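
A minimal simulation sketch combining the min-condition with the price-adjustment equation (all parameter values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
p = np.zeros(T + 1)
D, S, Q = np.zeros(T), np.zeros(T), np.zeros(T)

a0, a1 = 2.0, -0.8  # demand falls in price (illustrative)
b0, b1 = 0.5, 0.6   # supply rises in price (illustrative)
gamma = 0.2         # speed of price adjustment

for t in range(T):
    D[t] = a0 + a1 * p[t] + 0.3 * rng.standard_normal()
    S[t] = b0 + b1 * p[t] + 0.3 * rng.standard_normal()
    Q[t] = min(D[t], S[t])                   # the "min-condition": only the short side is observed
    p[t + 1] = p[t] + gamma * (D[t] - S[t])  # excess demand pushes the price up
```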

Exchange Rates in a Target Zone

  • The seminal contribution is Krugman (1991): inside a credible target zone, the exchange rate is a nonlinear function of its fundamentals, with the nonlinearity strongest near the edges of the band.

Production Theory

  • A two‐input version of the translog production function of Christensen, Jorgenson, and Lau (1973)

\[ \ln y = \ln \gamma + \alpha_1 \ln x_1 + \alpha_2 \ln x_2 + \alpha_{11}(\ln x_1)^2 + \alpha_{22} (\ln x_2)^2 + \alpha_{12} (\ln x_1 \ln x_2). \]

  • The RHS is a second‐order Kolmogorov–Gabor polynomial to be discussed in Section 3.
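
A direct transcription into code (a sketch; the function and argument names are ours):

```python
import numpy as np

def translog(x1, x2, gamma, a1, a2, a11, a22, a12):
    """Two-input translog production function; parameters as in the text."""
    l1, l2 = np.log(x1), np.log(x2)
    return np.exp(np.log(gamma) + a1 * l1 + a2 * l2
                  + a11 * l1 ** 2 + a22 * l2 ** 2 + a12 * l1 * l2)
```

Setting \(\alpha_{11} = \alpha_{22} = \alpha_{12} = 0\) recovers the Cobb–Douglas case.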

Parametric Nonlinear Models

Switching Regression Models

  • The standard switching regression (SR) model is piecewise linear,

\[ y_t = \sum_{j=1}^r (\phi_j' \boldsymbol{z}_t + \varepsilon_{jt}) I(c_{j-1} < s_t \leq c_j) \] where \(c_0 = -\infty\) and \(c_r = \infty\).

  • A special case is the two-regime SR model \[ y_t = (\phi_1' \boldsymbol{z}_t + \varepsilon_{1t}) I(s_t \leq c_1) + (\phi_2' \boldsymbol{z}_t + \varepsilon_{2t}) I(s_t > c_1) \]

  • When \(\boldsymbol{z}_t\) only contains the intercept and the lagged \(y_t\), and \(s_t = y_{t-d}\), the model becomes the self‐exciting threshold autoregressive (SETAR, or TAR for short) model.
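
A minimal sketch of simulating a two-regime SETAR(1) with \(s_t = y_{t-1}\) (our own illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)
T, c = 300, 0.0
phi1 = (0.5, 0.7)    # (intercept, AR coefficient) in regime 1 (illustrative)
phi2 = (-0.5, -0.3)  # regime 2 (illustrative)

y = np.zeros(T)
for t in range(1, T):
    b0, b1 = phi1 if y[t - 1] <= c else phi2  # regime chosen by s_t = y_{t-d}, d = 1
    y[t] = b0 + b1 * y[t - 1] + 0.5 * rng.standard_normal()
```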

  • Another special case of the univariate TAR model is the one in which only the intercept is switching

\[ y_t = \sum_{j=1}^r \phi_{0j} I(c_{j-1} < s_t \leq c_j) + \phi' \tilde{\boldsymbol{w}}_t + \varepsilon_t \]

  • Estimation of SR models can be carried out by conditional least squares: for fixed threshold values the model is linear, so the thresholds can be found by a grid search (sketched below).

  • The asymptotic distribution of the threshold estimator \(\widehat{c}\) is derived in Chan (1993) and Hansen (2000).
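
A minimal sketch of the grid search for a two-regime SETAR(1) (the function name and trimming fraction are our choices):

```python
import numpy as np

def setar_threshold(y, trim=0.15):
    """Conditional least squares for a two-regime SETAR(1) with s_t = y_{t-1}:
    for a fixed threshold c the model is linear, so OLS is run regime by
    regime and c is chosen by grid search over the middle of the sample."""
    yt, s = y[1:], y[:-1]                # y_t and transition variable y_{t-1}
    n = len(s)
    grid = np.sort(s)[int(trim * n):int((1 - trim) * n)]
    best_ssr, best_c = np.inf, None
    for c in grid:
        ssr = 0.0
        for mask in (s <= c, s > c):
            X = np.column_stack([np.ones(mask.sum()), s[mask]])
            b, *_ = np.linalg.lstsq(X, yt[mask], rcond=None)
            ssr += np.sum((yt[mask] - X @ b) ** 2)
        if ssr < best_ssr:
            best_ssr, best_c = ssr, c
    return best_c
```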

Markov-Switching Regression Models

  • The observable regime indicator \(s_t\) in SR model is replaced by an unobservable discrete stochastic variable \(\theta_t\).

  • The sequence \(\{\theta_t\}\) is assumed to be a sequence of iid variables or to follow a Markov chain, typically of order one, with transition probabilities

\[ p_{ij} = \mathsf{Pr} \{ \theta_t = \nu_j | \theta_{t-1} = \nu_i \}, \quad i,j = 1, ..., r. \]

  • The Markov‐switching (MS) or hidden Markov regression model

\[ y_t = \sum_{j=1}^r (\phi_j' \boldsymbol{z}_t + \varepsilon_{jt}) I(\theta_t = \nu_j) \]
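
A minimal sketch of simulating a two-regime MS autoregression (transition probabilities and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 300
P = np.array([[0.95, 0.05],      # transition probabilities p_ij (illustrative)
              [0.10, 0.90]])
phi = [(1.0, 0.5), (-1.0, 0.3)]  # (intercept, AR coefficient) per regime

y, state = np.zeros(T), 0
for t in range(1, T):
    state = rng.choice(2, p=P[state])  # unobservable regime indicator theta_t
    b0, b1 = phi[state]
    y[t] = b0 + b1 * y[t - 1] + rng.standard_normal()
```

Since the regime is unobserved, estimation cannot condition on it; in practice it is carried out by maximum likelihood, filtering the regime probabilities from the data.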

Smooth Transition Regression Models

  • The SR model has been criticized for the lack of smoothness of its transition mechanism.

  • Bacon and Watts (1971) considered two regression lines and devised a model in which the transition from one line to the other is smooth.

  • Goldfeld and Quandt (1972) independently presented an STR model, suggesting that the step function \(I\) be replaced by a normal cdf.

  • Maddala (1977) recommended the logistic function instead of the normal cdf, and this has become the prevailing standard.

  • The logistic STR (LSTR) model

\[ y_t = \{ \phi + \psi G(\gamma, c, s_t) \}' \boldsymbol{z}_t + \varepsilon_t \] with \[ G(\gamma, c, s_t) = \left( 1 + \exp \left\{ - \gamma \prod_{k=1}^K (s_t - c_k) \right\} \right)^{-1} \] where \(\gamma > 0\).
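
A minimal sketch of the transition function (vectorized over observations; the function name is ours):

```python
import numpy as np

def lstr_G(s, gamma, c):
    """Logistic transition function G(gamma, c, s); c holds the K location
    parameters c_1..c_K (K = 1: LSTR1, K = 2: LSTR2)."""
    s = np.asarray(s, dtype=float)
    c = np.atleast_1d(c)
    return 1.0 / (1.0 + np.exp(-gamma * np.prod(s[..., None] - c, axis=-1)))
```

As \(\gamma \rightarrow \infty\), \(G\) approaches a step function in \(s_t\), so the LSTR model nests the switching regression model as a limiting case.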

The ESTR Model

  • It should be mentioned that there exists an alternative to the LSTR2 model, the so–called exponential STR (ESTR) model with \[ G(\gamma, c, s_t) = 1 - \exp \left\{ - \gamma (s_t - c)^2 \right\} \] where \(\gamma > 0\).

The Additive and Multiple STR Models

  • Van Dijk and Franses (1999) introduced the additive STR model \[ y_t = \phi_1' \boldsymbol{z}_t + \sum_{j=2}^n \phi_j' \boldsymbol{z}_t G(\gamma_j, c_j, s_{jt}) + \varepsilon_t \]

  • They also considered the multiple regime STAR model

\[ y_t = \phi_0' \boldsymbol{w}_t + \phi_1' \boldsymbol{w}_t G(\gamma_1, c_1, s_{1t}) + \phi_2' \boldsymbol{w}_t G(\gamma_2, c_2, s_{2t}) \] \[ + \phi_{12}' \boldsymbol{w}_t G(\gamma_1, c_1, s_{1t}) G(\gamma_2, c_2, s_{2t}) + \varepsilon_t \]

Polynomial Models

  • Wiener (1958) considered a nonlinear causal relationship between two processes \(x_t\) and \(y_t\)

\[ y_t = \sum_{i=0}^\infty \theta_i x_{t-i} + \sum_{i=0}^\infty \sum_{j=i}^\infty \theta_{ij} x_{t-i} x_{t-j} \] \[ + \sum_{i=0}^\infty \sum_{j=i}^\infty \sum_{k=j}^\infty \theta_{ijk} x_{t-i} x_{t-j} x_{t-k} + \cdots \]

  • The RHS is called the Volterra series expansion.

  • If the lag length and the polynomial order (the number of sums) are finite, the result is called the Kolmogorov–Gabor polynomial.

  • The Kolmogorov–Gabor polynomial is a universal approximator.
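
A minimal sketch of constructing Kolmogorov–Gabor regressors from a matrix of lags (a helper of ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def kg_design(X, degree=2):
    """Kolmogorov-Gabor polynomial regressors: all products of the columns
    of X (a T x (q+1) matrix of lags) up to the given degree."""
    cols = [np.ones(X.shape[0])]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)
```

The number of terms grows rapidly with the number of lags and the degree, which is one reason more parsimonious approximators are of interest.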

Artificial Neural Network Models

  • The so–called ‘single hidden–layer’ model \[ y_t = \beta_0' \boldsymbol{z}_t + \sum_{j=1}^q \beta_j G(\gamma_j' \boldsymbol{z}_t) + \varepsilon_t \]
    • \(\beta_j\) are called ‘connection strengths’.
    • \(G\) is called the ‘squashing function’; it is typically a bounded function such as the logistic.
  • A theoretical argument used to motivate the use of ANN models is that they are universal approximators.
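
A minimal sketch of evaluating the single hidden-layer model for one observation (names are ours; the logistic function serves as the squashing function):

```python
import numpy as np

def ann(z, beta0, beta, gammas):
    """Single hidden-layer ANN: linear part beta0'z plus q hidden units
    G(gamma_j'z) weighted by the connection strengths beta_j."""
    G = lambda u: 1.0 / (1.0 + np.exp(-u))  # logistic squashing function
    return z @ beta0 + sum(b * G(z @ g) for b, g in zip(beta, gammas))
```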

Min-Max Models

  • Granger and Hyung (2006) introduced the min-max model \[ y_{1t} = \max ( \alpha y_{1,t-1} + a, \quad \beta y_{2,t-1} + b ) + \varepsilon_{1t} \] \[ y_{2t} = \min ( \gamma y_{1,t-1} + c, \quad \delta y_{2,t-1} + d ) + \varepsilon_{2t} \]

  • The authors are particularly interested in the special case \(\alpha=\beta=\gamma=\delta=1\).

  • They show that when \(a-d<0\), the process \[ u_t = y_{1t} - y_{2t} \] is geometrically ergodic, so that \((1,-1)\) may be viewed as a cointegrating vector (a simulation sketch follows below).

  • The authors apply the model to US interest rates of different frequencies.
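
A minimal simulation sketch of the unit-coefficient special case with \(a - d < 0\) (intercept values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
a, b, c, d = 0.0, 0.1, 0.1, 0.5  # illustrative intercepts; note a - d < 0
y1, y2 = np.zeros(T), np.zeros(T)
for t in range(1, T):
    y1[t] = max(y1[t - 1] + a, y2[t - 1] + b) + rng.standard_normal()
    y2[t] = min(y1[t - 1] + c, y2[t - 1] + d) + rng.standard_normal()
u = y1 - y2  # should look stationary even though y1 and y2 wander
```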

Some Other Nonlinear Models

  • Nonlinear moving average models: threshold effects in the parameters of MA models.

  • Bilinear models: autoregressive and moving average terms are combined in such a way that the models are nonlinear in variables but linear in parameters.

  • Time-Varying Parameters and State Space Models

  • Random Coefficient Models

  • Volatility Models

Testing Linearity against Parametric Alternatives

Lagrange Multiplier or Score Test

  • Consider the following additive nonlinear model \[ y_{t}=\mathbf{\beta }^{\prime }\mathbf{z}_{t}+G(\mathbf{z}_{t};\mathbf{\gamma })+\varepsilon _{t} \]

  • Assume that \(G(\mathbf{z}_{t};\mathbf{0})=0\) and \(G(\mathbf{z}_{t};\mathbf{\gamma })\neq 0\) for \(\mathbf{\gamma }\neq \mathbf{0}\).

  • The best way of testing the hypothesis appears to be the Lagrange multiplier (LM) or score principle, because it only requires estimating the model under the null, i.e. the linear model.

  • The log-likelihood function (assuming normal errors) \[ L_{T}(\mathbf{\theta })=c-(T/2)\ln \sigma ^{2}-(1/2\sigma ^{2})\sum_{t=1}^{T}(y_{t}-\mathbf{\beta }^{\prime }\mathbf{z}_{t}-G(\mathbf{z}_{t};\mathbf{\gamma }))^{2}. \]

The Average Score

  • The average score evaluated at \(\mathbf{\gamma }=\mathbf{0}\) equals \[ \mathbf{s}_{T}(\widetilde{\mathbf{\theta }})=T^{-1}\left[ \begin{array}{cc} \partial L_{T}/\partial \mathbf{\beta }^{\prime } & \partial L_{T}/\partial \mathbf{\gamma }^{\prime } \end{array} \right]^{\prime }|_{\text{H}_{0}} =(\widetilde{\sigma }^{2}T)^{-1}\sum_{t=1}^{T}\widetilde{\varepsilon }_{t}\,(\mathbf{0}_{k+p+1}^{\prime },(\mathbf{h}_{t}^{0})^{\prime })^{\prime } \] where \(\mathbf{h}_{t}^{0}=\partial G(\mathbf{z}_{t};\mathbf{\gamma })/\partial \mathbf{\gamma }|_{\mathbf{\gamma }=\mathbf{0}}\).

The Second Partial Derivatives

  • The second partial derivatives of the likelihood function are \[ \frac{\partial ^{2}L_{T}(\mathbf{\theta })}{\partial \mathbf{\beta }\partial \mathbf{\beta }^{\prime }} =-(1/\sigma ^{2})\sum_{t=1}^{T}\mathbf{z}_{t} \mathbf{z}_{t}^{\prime } \] \[ \frac{\partial ^{2}L_{T}(\mathbf{\theta })}{\partial \mathbf{\gamma } \partial \mathbf{\gamma }^{\prime }} =-(1/\sigma ^{2})\sum_{t=1}^{T}( \mathbf{h}_{t}\mathbf{h}_{t}^{\prime }+\varepsilon _{t}\frac{\partial ^{2}G( \mathbf{z}_{t};\mathbf{\gamma })}{\partial \mathbf{\gamma }\partial \mathbf{ \gamma }^{\prime }}) \] \[ \frac{\partial ^{2}L_{T}(\mathbf{\theta })}{\partial \mathbf{\beta }\partial \mathbf{\gamma }^{\prime }} =-(1/\sigma ^{2})\sum_{t=1}^{T}\mathbf{z}_{t} \mathbf{h}_{t}^{\prime } \]

The Information Matrix

  • Since plim\(_{T\rightarrow \infty }T^{-1}\sum_{t=1}^{T}\varepsilon _{t}\frac{\partial ^{2}G(\mathbf{z}_{t};\mathbf{\gamma })}{\partial \mathbf{\gamma }\partial \mathbf{\gamma }^{\prime }}=0,\) this suggests the following consistent estimator of the population information matrix \(\mathbf{I}(\mathbf{\theta })\): \[ \widetilde{\mathbf{I}}_{T}(\widetilde{\mathbf{\theta }})=(1/\widetilde{\sigma }^{2})\left[ \begin{array}{cc} T^{-1}\sum_{t=1}^{T}\mathbf{z}_{t}\mathbf{z}_{t}^{\prime } & T^{-1}\sum_{t=1}^{T}\mathbf{z}_{t}(\mathbf{h}_{t}^{0})^{\prime } \\ T^{-1}\sum_{t=1}^{T}\mathbf{h}_{t}^{0}\mathbf{z}_{t}^{\prime } & T^{-1}\sum_{t=1}^{T}\mathbf{h}_{t}^{0}(\mathbf{h}_{t}^{0})^{\prime } \end{array} \right]. \]

The LM Statistic

  • In matrix form, the LM statistic \[ S_{T}^{\text{LM}}=T\,\mathbf{s}_{T}(\widetilde{\mathbf{\theta }})^{\prime }\widetilde{\mathbf{I}}_{T}(\widetilde{\mathbf{\theta }})^{-1}\mathbf{s}_{T}(\widetilde{\mathbf{\theta }}) \] can thus be written as \[ S_{T}^{\text{LM}}=(1/\widetilde{\sigma }^{2})\widetilde{\mathbf{\varepsilon }}^{\prime }\mathbf{H}(\mathbf{H}^{\prime }\mathbf{H}-\mathbf{H}^{\prime }\mathbf{Z}(\mathbf{Z}^{\prime }\mathbf{Z})^{-1}\mathbf{Z}^{\prime }\mathbf{H})^{-1}\mathbf{H}^{\prime }\widetilde{\mathbf{\varepsilon }} \label{LM-stat} \]

where \(\mathbf{Z}=(\mathbf{z}_{1},...,\mathbf{z}_{T})^{\prime }\), \(\mathbf{H}=(\mathbf{h}_{1}^{0},...,\mathbf{h}_{T}^{0})^{\prime }\) and \(\widetilde{\mathbf{\varepsilon }}=(\widetilde{\varepsilon }_{1},...,\widetilde{\varepsilon }_{T})^{\prime }\).

  • Under H\(_{0}\), the statistic has an asymptotic \(\chi ^{2}\) distribution with \(n\) degrees of freedom, where \(n=\dim (\mathbf{\gamma })\).

  • It is exactly the same statistic as the one obtained for testing the null hypothesis \(\mathbf{\delta =0}\) in the linear model \[ \mathbf{y}=\mathbf{Z\beta }+\mathbf{H\delta }+\mathbf{\varepsilon } \]

  • Another way of viewing the test is that it has been obtained after a linearization by a Taylor expansion around the null hypothesis. This suggests that there may be several nonlinear models with the same LM test of linearity.

The Auxiliary Regression

  • The Lagrange multiplier test can also be carried out by two regressions; this form of the test is often called the \(TR^{2}\) form (a code sketch follows this list).
    • Estimate model under H\(_{0}\) (estimate a linear model), compute the residuals \(\widetilde{\varepsilon }_{t},\) and the residual sum of squares \(SSR_{0}.\)
    • Regress \(\widetilde{\varepsilon }_{t}\) (or \(y_{t})\) on \(\mathbf{z}_{t}\) and \(\mathbf{h}_{t}^{0},\) compute the residuals and the residual sum of squares \(SSR_{1}.\)
    • Compute the asymptotic test statistic \[ LM_{\chi ^{2}}=T\frac{SSR_{0}-SSR_{1}}{SSR_{0}} \]

    • or the F-version \[ LM_{F}=\frac{(SSR_{0}-SSR_{1})/n}{SSR_{1}/\{T-(k+p+1)-n\}}. \label{lmf} \]

    • Under the null hypothesis the latter statistic has an approximate F-distribution with \(n\) and \(T-(k+p+1)-n\) degrees of freedom.

    • The test can be robustified against conditional heteroskedasticity.
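
A minimal sketch of the \(TR^{2}\) form (the function name is ours; \(\mathbf{Z}\) and \(\mathbf{H}\) are as defined above):

```python
import numpy as np

def lm_linearity_test(y, Z, H):
    """TR^2 form of the LM linearity test.
    Z: regressors of the linear model (T x (k+p+1));
    H: auxiliary regressors h_t^0 (T x n).
    Returns the chi-square version T * (SSR0 - SSR1) / SSR0."""
    T = len(y)
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)  # step 1: model under H0
    e = y - Z @ b
    ssr0 = e @ e
    ZH = np.column_stack([Z, H])               # step 2: residuals on (Z, H)
    g, *_ = np.linalg.lstsq(ZH, e, rcond=None)
    u = e - ZH @ g
    ssr1 = u @ u
    return T * (ssr0 - ssr1) / ssr0            # compare with chi^2(n)
```

The F-version is obtained from the same \(SSR_{0}\) and \(SSR_{1}\) via the formula above.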

Locally Equivalent Alternatives

  • For the concept of a locally equivalent alternative, see Godfrey (1988) or Gouriéroux and Monfort (1990).

  • Let \[ y_{t}=\mathbf{\beta }^{\prime }\mathbf{z}_{t}+G_{1}(\mathbf{z}_{t};\mathbf{\alpha })+\varepsilon _{1t},\;\{\varepsilon _{1t}\}\sim \text{iid}(0,\sigma ^{2}) \label{nl-1} \] and \[ y_{t}=\mathbf{\beta }^{\prime }\mathbf{z}_{t}+G_{2}(\mathbf{z}_{t};\mathbf{\gamma })+\varepsilon _{2t},\;\{\varepsilon _{2t}\}\sim \text{iid}(0,\sigma ^{2}) \label{nl-2} \] be two additive nonlinear models.

  • Assume that the two equations are linear for \(\mathbf{\alpha }=\mathbf{0}\) and \(\mathbf{\gamma }=\mathbf{0}\).

  • The two models are locally equivalent in a neighbourhood of H\(_{01}:\mathbf{\alpha =0}\) and H\(_{02}:\) \(\mathbf{\gamma =0}\) if the following two conditions are satisfied:
    • \(G_{1}(\mathbf{z}_{t};\mathbf{0})=G_{2}(\mathbf{z}_{t};\mathbf{0})\).
    • \(\partial G_{1}(\mathbf{z}_{t};\mathbf{\alpha })/\partial \mathbf{\alpha }|_{\mathbf{\alpha }=\mathbf{0}}=\mathbf{A}\,\partial G_{2}(\mathbf{z}_{t};\mathbf{\gamma })/\partial \mathbf{\gamma }|_{\mathbf{\gamma }=\mathbf{0}}\) where \(\mathbf{A}\) is a nonsingular matrix.
  • It follows from the second condition that the LM tests derived for testing H\(_{01}\) and H\(_{02}\) are identical.

  • In practice, the LM test is often based on a Taylor expansion of \(G\) around the null hypothesis.

Identification Problem

  • Consider again the following additive nonlinear model \[ y_{t}=\mathbf{\beta }_{0}^{\prime }\mathbf{z}_{t}+\mathbf{\beta }_{1}^{\prime }\mathbf{z}_{t}G(\mathbf{\gamma };\mathbf{s}_{t})+\varepsilon _{t}=(\mathbf{\beta }_{0}+\mathbf{\beta }_{1}G(\mathbf{\gamma };\mathbf{s}_{t}))^{\prime }\mathbf{z}_{t}+\varepsilon _{t} \label{add-nlmodel} \]

  • \(\beta_1\) is not identified when \(\gamma=0\).

  • \(\gamma\) is not identified when \(\beta_1=0\).

  • The model is only identified under the alternative.

The Solution

  • The sup test.

  • The average test.

  • The exponential test.

  • Taylor expansion: approximate \(G\) locally around the null hypothesis.
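
To make the first three bullets concrete, a standard formulation (in the spirit of Davies (1977) and Andrews and Ploberger (1994); not necessarily the exact one used here) computes the pointwise statistic \(LM_{T}(\mathbf{\gamma })\) over a grid \(\Gamma \) of values of the unidentified parameters and aggregates:

\[ \sup LM=\sup_{\mathbf{\gamma }\in \Gamma }LM_{T}(\mathbf{\gamma }),\quad aveLM=\int_{\Gamma }LM_{T}(\mathbf{\gamma })dW(\mathbf{\gamma }),\quad \exp LM=\ln \int_{\Gamma }\exp \{LM_{T}(\mathbf{\gamma })/2\}dW(\mathbf{\gamma }), \]

where \(W\) is a weight function over \(\Gamma \). The resulting null distributions are nonstandard and are typically obtained by simulation or by bootstrap.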

Testing Parameter Constancy

  • Parameter constancy (stability) is a crucial assumption, which should be tested after a model has been estimated.

  • The Chow (1960) test: a single break at a known point.

  • The Bai (1999) test: multiple breaks.

LM Type Tests

  • Testing against smoothly changing parameters

  • The idea is to modify the smooth transition regression model to fit this situation.

  • There is no need to know the location of the break point in advance.