October 2, 2017

Concepts, Models, and Definitions

Defining Nonlinearity

  • As a first attempt at a definition of linearity, consider

\[ \mathsf{E}\{y_t | \boldsymbol{z}_t\} = \alpha' \boldsymbol{z}_t + g(\boldsymbol{z}_t) \label{eqn:first} \] where \(y_t\) is the dependent variable and \(\boldsymbol{z}_t\) the vector of explanatory variables, which may include lagged values of \(y_t\).

  • The model for the conditional mean is said to be linear if \(g(\boldsymbol{z}_t) \equiv 0\).

  • This definition is used in Lee, White and Granger (1993).

Where does nonlinearity come from?

  • Nonlinearity arises from the structure and institutions of the economy.
    • Some variables have ceilings and floors, and one would expect a variable to behave differently near them.
    • Union requirements may make it easier to increase wages than to decrease them.
    • Public reactions may make price increases more difficult than price decreases.
    • And so on.
  • A linear model cannot capture this lack of symmetry; asymmetry is a form of nonlinearity.

  • The theory is not very specific about how nonlinear the data will turn out to be.

  • The amount of nonlinearity may be reduced in moving from high-frequency to low-frequency data, for example through temporal aggregation.

Stationarity and Nonstationarity

  • For nonlinear models, conditional quantities such as the conditional distribution and the conditional mean and variance are much more important than the unconditional ones.

  • In most models in the book, strict stationarity and existence of at least second‐order moments are assumed.

  • Stationarity conditions are typically derived for a recursively defined Markov model, say \(y_t = g(y_{t-1}; \theta) + \varepsilon_t\).
    • Foster–Lyapunov drift criterion; a simple sufficient condition is \(|g(y, \theta)| < |y|\) for \(|y|\) large, which yields geometric ergodicity (illustrated after this list).
  • Possible transformation from non-stationary to stationary.
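
As an illustration of the drift condition (our example, not necessarily the book's), consider the first-order nonlinear autoregression \(y_t = g(y_{t-1}; \theta) + \varepsilon_t\) with a two-regime threshold mean:

\[ g(y; \theta) = \phi_1 y\, I(y \leq c) + \phi_2 y\, I(y > c). \]

Since \(|g(y; \theta)| \leq \max(|\phi_1|, |\phi_2|)\,|y|\), the sufficient condition \(|g(y, \theta)| < |y|\) for \(|y|\) large holds whenever \(\max(|\phi_1|, |\phi_2|) < 1\), and the process is geometrically ergodic; the exact stationarity region of this model is in fact larger.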

Wold's Representation and Volterra Expansion

  • Wold (1938) showed that if \(y_t\) is (covariance) stationary and purely nondeterministic, then there always exists a representation

\[ y_t = \varepsilon_t + \sum_{j=1}^q \theta_j \varepsilon_{t-j} \]

Note that \(q\) is not necessarily finite.

  • The representation is, however, only an identity in the mean-square sense.

  • A formal nonlinear generalization is the Volterra expansion

\[ y_t = \sum_{i=0}^q \theta_i \varepsilon_{t-i} + \sum_{i=0}^q \sum_{j=i}^q \theta_{ij} \varepsilon_{t-i} \varepsilon_{t-j} + \sum_{i=0}^q \sum_{j=i}^q \sum_{k=j}^q \theta_{ijk} \varepsilon_{t-i} \varepsilon_{t-j} \varepsilon_{t-k} + \cdots \]
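
As a minimal simulation sketch (our own; all kernel values are made up for illustration), a Volterra expansion truncated at \(q = 2\) and at second order can be generated as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
T, q = 500, 2
eps = rng.standard_normal(T + q)      # eps[t + q] plays the role of eps_t

theta = np.array([1.0, 0.5, 0.25])    # linear kernel theta_0..theta_q (illustrative)
theta2 = {(0, 1): 0.3, (1, 2): -0.2}  # a few quadratic terms theta_{ij} (illustrative)

y = np.empty(T)
for t in range(T):
    e = eps[t + q - np.arange(q + 1)]  # [eps_t, eps_{t-1}, eps_{t-2}]
    y[t] = theta @ e + sum(c * e[i] * e[j] for (i, j), c in theta2.items())
```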

Nonlinear Models in Economic Theory

Disequilibrium Models

  • Fair and Jaffee (1972) considered the general disequilibrium model

\[ D_t = \alpha_0' x_t^D + \alpha_1 p_t + \varepsilon_t^D \] \[ S_t = \beta_0' x_t^S + \beta_1 p_t + \varepsilon_t^S, \] with the "min-condition" \[ D_t^{obs} = \min(D_t, S_t). \]

  • Another possibility that Fair and Jaffee (1972) considered is the price-adjustment equation

\[ p_t - p_{t-1} = \gamma (D_t - S_t). \]
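
A minimal simulation sketch combining the min-condition with the price-adjustment equation (all parameter values are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200
p = np.zeros(T + 1)
D, S, Q = np.zeros(T), np.zeros(T), np.zeros(T)

a0, a1 = 2.0, -0.8  # demand falls in price (illustrative)
b0, b1 = 0.5, 0.6   # supply rises in price (illustrative)
gamma = 0.2         # speed of price adjustment

for t in range(T):
    D[t] = a0 + a1 * p[t] + 0.3 * rng.standard_normal()
    S[t] = b0 + b1 * p[t] + 0.3 * rng.standard_normal()
    Q[t] = min(D[t], S[t])                   # the "min-condition": only the short side is observed
    p[t + 1] = p[t] + gamma * (D[t] - S[t])  # excess demand pushes the price up
```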

Exchange Rates in a Target Zone

  • The seminal contribution is Krugman (1991): inside a credible target zone, the exchange rate is a nonlinear function of its fundamentals, with the nonlinearity strongest near the edges of the band.

Production Theory

  • A two‐input version of the translog production function of Christensen, Jorgenson, and Lau (1973)

\[ \ln y = \ln \gamma + \alpha_1 \ln x_1 + \alpha_2 \ln x_2 + \alpha_{11}(\ln x_1)^2 + \alpha_{22} (\ln x_2)^2 + \alpha_{12} (\ln x_1 \ln x_2). \]

  • The RHS is a second‐order Kolmogorov–Gabor polynomial to be discussed in Section 3.
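
A direct transcription into code (a sketch; the function and argument names are ours):

```python
import numpy as np

def translog(x1, x2, gamma, a1, a2, a11, a22, a12):
    """Two-input translog production function; parameters as in the text."""
    l1, l2 = np.log(x1), np.log(x2)
    return np.exp(np.log(gamma) + a1 * l1 + a2 * l2
                  + a11 * l1 ** 2 + a22 * l2 ** 2 + a12 * l1 * l2)
```

Setting \(\alpha_{11} = \alpha_{22} = \alpha_{12} = 0\) recovers the Cobb–Douglas case.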

Parametric Nonlinear Models

Switching Regression Models

  • The standard switching regression (SR) model is piecewise linear,

\[ y_t = \sum_{j=1}^r (\phi_j' \boldsymbol{z}_t + \varepsilon_{jt}) I(c_{j-1} < s_t \leq c_j) \] where \(c_0 = -\infty\) and \(c_r = \infty\).

  • A special case is the two-regime SR model \[ y_t = (\phi_1' \boldsymbol{z}_t + \varepsilon_{1t}) I(s_t \leq c_1) + (\phi_2' \boldsymbol{z}_t + \varepsilon_{2t}) I(s_t > c_1) \]

  • When \(\boldsymbol{z}_t\) only contains the intercept and the lagged \(y_t\), and \(s_t = y_{t-d}\), the model becomes the self‐exciting threshold autoregressive (SETAR, or TAR for short) model.
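
A minimal sketch of simulating a two-regime SETAR(1) with \(s_t = y_{t-1}\) (our own illustrative parameter values):

```python
import numpy as np

rng = np.random.default_rng(2)
T, c = 300, 0.0
phi1 = (0.5, 0.7)    # (intercept, AR coefficient) in regime 1 (illustrative)
phi2 = (-0.5, -0.3)  # regime 2 (illustrative)

y = np.zeros(T)
for t in range(1, T):
    b0, b1 = phi1 if y[t - 1] <= c else phi2  # regime chosen by s_t = y_{t-d}, d = 1
    y[t] = b0 + b1 * y[t - 1] + 0.5 * rng.standard_normal()
```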

  • Another special case of the univariate TAR model is the one in which only the intercept is switching

\[ y_t = \sum_{j=1}^r \phi_{0j} I(c_{j-1} < s_t \leq c_j) + \phi' \tilde{\boldsymbol{w}}_t + \varepsilon_t \]

  • Estimation of SR models can be carried out by conditional least squares: for fixed threshold values the model is linear, so the thresholds can be found by a grid search (sketched below).

  • The asymptotic distribution of the threshold estimator \(\widehat{c}\) is derived in Chan (1993) and Hansen (2000).
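
A minimal sketch of the grid search for a two-regime SETAR(1) (the function name and trimming fraction are our choices):

```python
import numpy as np

def setar_threshold(y, trim=0.15):
    """Conditional least squares for a two-regime SETAR(1) with s_t = y_{t-1}:
    for a fixed threshold c the model is linear, so OLS is run regime by
    regime and c is chosen by grid search over the middle of the sample."""
    yt, s = y[1:], y[:-1]                # y_t and transition variable y_{t-1}
    n = len(s)
    grid = np.sort(s)[int(trim * n):int((1 - trim) * n)]
    best_ssr, best_c = np.inf, None
    for c in grid:
        ssr = 0.0
        for mask in (s <= c, s > c):
            X = np.column_stack([np.ones(mask.sum()), s[mask]])
            b, *_ = np.linalg.lstsq(X, yt[mask], rcond=None)
            ssr += np.sum((yt[mask] - X @ b) ** 2)
        if ssr < best_ssr:
            best_ssr, best_c = ssr, c
    return best_c
```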

Markov-Switching Regression Models

  • The observable regime indicator \(s_t\) in SR model is replaced by an unobservable discrete stochastic variable \(\theta_t\).

  • The sequence \(\{\theta_t\}\) is assumed to be a sequence of iid variables or to follow a Markov chain, typically of order one, with transition probabilities

\[ p_{ij} = \mathsf{Pr} \{ \theta_t = \nu_j | \theta_{t-1} = \nu_i \}, \quad i,j = 1, ..., r. \]

  • The Markov‐switching (MS) or hidden Markov regression model

\[ y_t = \sum_{j=1}^r (\phi_j' \boldsymbol{z}_t + \varepsilon_{jt}) I(\theta_t = \nu_j) \]
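
A minimal sketch of simulating a two-regime MS autoregression (transition probabilities and coefficients are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 300
P = np.array([[0.95, 0.05],      # transition probabilities p_ij (illustrative)
              [0.10, 0.90]])
phi = [(1.0, 0.5), (-1.0, 0.3)]  # (intercept, AR coefficient) per regime

y, state = np.zeros(T), 0
for t in range(1, T):
    state = rng.choice(2, p=P[state])  # unobservable regime indicator theta_t
    b0, b1 = phi[state]
    y[t] = b0 + b1 * y[t - 1] + rng.standard_normal()
```

Since the regime is unobserved, estimation cannot condition on it; in practice it is carried out by maximum likelihood, filtering the regime probabilities from the data.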

Smooth Transition Regression Models

  • The SR model has been criticized for the lack of smoothness of its transition mechanism.

  • Bacon and Watts (1971) considered two regression lines and devised a model in which the transition from one line to the other is smooth.

  • Goldfeld and Quandt (1972) independently presented an STR model, suggesting that the step function \(I\) be replaced by a normal cdf.

  • Maddala (1977) recommended the logistic function instead of the normal cdf, and this has become the prevailing standard.

  • The logistic STR (LSTR) model

\[ y_t = \{ \phi + \psi G(\gamma, c, s_t) \}' \boldsymbol{z}_t + \varepsilon_t \] with \[ G(\gamma, c, s_t) = \left( 1 + \exp \left\{ - \gamma \prod_{k=1}^K (s_t - c_k) \right\} \right)^{-1} \] where \(\gamma > 0\).
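
A minimal sketch of the transition function (vectorized over observations; the function name is ours):

```python
import numpy as np

def lstr_G(s, gamma, c):
    """Logistic transition function G(gamma, c, s); c holds the K location
    parameters c_1..c_K (K = 1: LSTR1, K = 2: LSTR2)."""
    s = np.asarray(s, dtype=float)
    c = np.atleast_1d(c)
    return 1.0 / (1.0 + np.exp(-gamma * np.prod(s[..., None] - c, axis=-1)))
```

As \(\gamma \rightarrow \infty\), \(G\) approaches a step function in \(s_t\), so the LSTR model nests the switching regression model as a limiting case.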

The ESTR Model

  • It should be mentioned that there exists an alternative to the LSTR2 model, the so–called exponential STR (ESTR) model with \[ G(\gamma, c, s_t) = 1 - \exp \left\{ - \gamma (s_t - c)^2 \right\} \] where \(\gamma > 0\).

The Additive and Multiple STR Models

  • Van Dijk and Franses (1999) introduced the additive STR model \[ y_t = \phi_1' \boldsymbol{z}_t + \sum_{j=2}^n \phi_j' \boldsymbol{z}_t G(\gamma_j, c_j, s_{jt}) + \varepsilon_t \]

  • They also considered the multiple regime STAR model

\[ y_t = \phi_0' \boldsymbol{w}_t + \phi_1' \boldsymbol{w}_t G(\gamma_1, c_1, s_{1t}) + \phi_2' \boldsymbol{w}_t G(\gamma_2, c_2, s_{2t}) \] \[ + \phi_{12}' \boldsymbol{w}_t G(\gamma_1, c_1, s_{1t}) G(\gamma_2, c_2, s_{2t}) + \varepsilon_t \]

Polynomial Models

  • Wiener (1958) considered a nonlinear causal relationship between two processes \(x_t\) and \(y_t\)

\[ y_t = \sum_{i=0}^\infty \theta_i x_{t-i} + \sum_{i=0}^\infty \sum_{j=i}^\infty \theta_{ij} x_{t-i} x_{t-j} \] \[ + \sum_{i=0}^\infty \sum_{j=i}^\infty \sum_{k=j}^\infty \theta_{ijk} x_{t-i} x_{t-j} x_{t-k} + \cdots \]

  • The RHS is called the Volterra series expansion.

  • If the lag length and the polynomial order (the number of sums) are finite, the result is called the Kolmogorov–Gabor polynomial.

  • The Kolmogorov–Gabor polynomial is a universal approximator.
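
A minimal sketch of constructing Kolmogorov–Gabor regressors from a matrix of lags (a helper of ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def kg_design(X, degree=2):
    """Kolmogorov-Gabor polynomial regressors: all products of the columns
    of X (a T x (q+1) matrix of lags) up to the given degree."""
    cols = [np.ones(X.shape[0])]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), d):
            cols.append(np.prod(X[:, idx], axis=1))
    return np.column_stack(cols)
```

The number of terms grows rapidly with the number of lags and the degree, which is one reason more parsimonious approximators are of interest.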

Artificial Neural Network Models

  • The so–called ‘single hidden–layer’ model \[ y_t = \beta_0' \boldsymbol{z}_t + \sum_{j=1}^q \beta_j G(\gamma_j' \boldsymbol{z}_t) + \varepsilon_t \]
    • \(\beta_j\) are called ‘connection strengths’.
    • \(G\) is called the ‘squashing function’; it is typically a bounded function such as the logistic.
  • A theoretical argument used to motivate the use of ANN models is that they are universal approximators.
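
A minimal sketch of evaluating the single hidden-layer model for one observation (names are ours; the logistic function serves as the squashing function):

```python
import numpy as np

def ann(z, beta0, beta, gammas):
    """Single hidden-layer ANN: linear part beta0'z plus q hidden units
    G(gamma_j'z) weighted by the connection strengths beta_j."""
    G = lambda u: 1.0 / (1.0 + np.exp(-u))  # logistic squashing function
    return z @ beta0 + sum(b * G(z @ g) for b, g in zip(beta, gammas))
```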

Min-Max Models

  • Granger and Hyung (2006) introduced the min-max model \[ y_{1t} = \max ( \alpha y_{1,t-1} + a, \quad \beta y_{2,t-1} + b ) + \varepsilon_{1t} \] \[ y_{2t} = \min ( \gamma y_{1,t-1} + c, \quad \delta y_{2,t-1} + d ) + \varepsilon_{2t} \]

  • The authors are particularly interested in the special case \(\alpha=\beta=\gamma=\delta=1\).

  • They show that when \(a-d<0\), the process \[ u_t = y_{1t} - y_{2t} \] is geometrically ergodic, so that \((1,-1)\) may be viewed as a cointegrating vector (a simulation sketch follows below).

  • The authors apply the model to US interest rates of different frequencies.
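
A minimal simulation sketch of the unit-coefficient special case with \(a - d < 0\) (intercept values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1000
a, b, c, d = 0.0, 0.1, 0.1, 0.5  # illustrative intercepts; note a - d < 0
y1, y2 = np.zeros(T), np.zeros(T)
for t in range(1, T):
    y1[t] = max(y1[t - 1] + a, y2[t - 1] + b) + rng.standard_normal()
    y2[t] = min(y1[t - 1] + c, y2[t - 1] + d) + rng.standard_normal()
u = y1 - y2  # should look stationary even though y1 and y2 wander
```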

Some Other Nonlinear Models

  • Nonlinear moving average models: threshold effects in the parameters of MA models.

  • Bilinear models: autoregressive and moving average terms are combined in such a way that the models are nonlinear in variables but linear in parameters.

  • Time-Varying Parameters and State Space Models

  • Random Coefficient Models

  • Volatility Models

Testing Linearity against Parametric Alternatives

Lagrange Multiplier or Score Test

  • Consider the following additive nonlinear model \[ y_{t}=\mathbf{\beta }^{\prime }\mathbf{z}_{t}+G(\mathbf{z}_{t};\mathbf{\gamma })+\varepsilon _{t} \]

  • Assume that \(G(\mathbf{z}_{t};\mathbf{0})=0\) and \(G(\mathbf{z}_{t};\mathbf{\gamma })\neq 0\) for \(\mathbf{\gamma }\neq \mathbf{0}\).

  • The best way of testing the hypothesis appears to be the Lagrange multiplier (LM) or score principle, because it only requires estimating the model under the null, i.e. the linear model.

  • The log-likelihood function (assuming normal errors) \[ L_{T}(\mathbf{\theta })=c-(T/2)\ln \sigma ^{2}-(1/2\sigma ^{2})\sum_{t=1}^{T}(y_{t}-\mathbf{\beta }^{\prime }\mathbf{z}_{t}-G(\mathbf{z}_{t};\mathbf{\gamma }))^{2}. \]

The Average Score

  • The average score evaluated at \(\mathbf{\gamma }=\mathbf{0}\) equals \[ \mathbf{s}_{T}(\widetilde{\mathbf{\theta }})=T^{-1}\left[ \begin{array}{cc} \partial L_{T}/\partial \mathbf{\beta }^{\prime } & \partial L_{T}/\partial \mathbf{\gamma }^{\prime } \end{array} \right]^{\prime }|_{\text{H}_{0}} =(\widetilde{\sigma }^{2}T)^{-1}\sum_{t=1}^{T}\widetilde{\varepsilon }_{t}\,(\mathbf{0}_{k+p+1}^{\prime },(\mathbf{h}_{t}^{0})^{\prime })^{\prime } \] where \(\mathbf{h}_{t}^{0}=\partial G(\mathbf{z}_{t};\mathbf{\gamma })/\partial \mathbf{\gamma }|_{\mathbf{\gamma }=\mathbf{0}}\).

The Second Partial Derivatives

  • The second partial derivatives of the likelihood function are \[ \frac{\partial ^{2}L_{T}(\mathbf{\theta })}{\partial \mathbf{\beta }\partial \mathbf{\beta }^{\prime }} =-(1/\sigma ^{2})\sum_{t=1}^{T}\mathbf{z}_{t} \mathbf{z}_{t}^{\prime } \] \[ \frac{\partial ^{2}L_{T}(\mathbf{\theta })}{\partial \mathbf{\gamma } \partial \mathbf{\gamma }^{\prime }} =-(1/\sigma ^{2})\sum_{t=1}^{T}( \mathbf{h}_{t}\mathbf{h}_{t}^{\prime }+\varepsilon _{t}\frac{\partial ^{2}G( \mathbf{z}_{t};\mathbf{\gamma })}{\partial \mathbf{\gamma }\partial \mathbf{ \gamma }^{\prime }}) \] \[ \frac{\partial ^{2}L_{T}(\mathbf{\theta })}{\partial \mathbf{\beta }\partial \mathbf{\gamma }^{\prime }} =-(1/\sigma ^{2})\sum_{t=1}^{T}\mathbf{z}_{t} \mathbf{h}_{t}^{\prime } \]

The Information Matrix

  • Since plim\(_{T\rightarrow \infty }T^{-1}\sum_{t=1}^{T}\varepsilon _{t}\frac{\partial ^{2}G(\mathbf{z}_{t};\mathbf{\gamma })}{\partial \mathbf{\gamma }\partial \mathbf{\gamma }^{\prime }}=0,\) this suggests the following consistent estimator of the population information matrix \(\mathbf{I}(\mathbf{\theta })\): \[ \widetilde{\mathbf{I}}_{T}(\widetilde{\mathbf{\theta }})=(1/\widetilde{\sigma }^{2})\left[ \begin{array}{cc} T^{-1}\sum_{t=1}^{T}\mathbf{z}_{t}\mathbf{z}_{t}^{\prime } & T^{-1}\sum_{t=1}^{T}\mathbf{z}_{t}(\mathbf{h}_{t}^{0})^{\prime } \\ T^{-1}\sum_{t=1}^{T}\mathbf{h}_{t}^{0}\mathbf{z}_{t}^{\prime } & T^{-1}\sum_{t=1}^{T}\mathbf{h}_{t}^{0}(\mathbf{h}_{t}^{0})^{\prime } \end{array} \right]. \]

The LM Statistic

  • In matrix form, the LM statistic \[ S_{T}^{\text{LM}}=T\,\mathbf{s}_{T}(\widetilde{\mathbf{\theta }})^{\prime }\widetilde{\mathbf{I}}_{T}(\widetilde{\mathbf{\theta }})^{-1}\mathbf{s}_{T}(\widetilde{\mathbf{\theta }}) \] can thus be written as \[ S_{T}^{\text{LM}}=(1/\widetilde{\sigma }^{2})\widetilde{\mathbf{\varepsilon }}^{\prime }\mathbf{H}(\mathbf{H}^{\prime }\mathbf{H}-\mathbf{H}^{\prime }\mathbf{Z}(\mathbf{Z}^{\prime }\mathbf{Z})^{-1}\mathbf{Z}^{\prime }\mathbf{H})^{-1}\mathbf{H}^{\prime }\widetilde{\mathbf{\varepsilon }} \label{LM-stat} \]

where \(\mathbf{Z}=(\mathbf{z}_{1},...,\mathbf{z}_{T})^{\prime }\), \(\mathbf{H}=(\mathbf{h}_{1}^{0},...,\mathbf{h}_{T}^{0})^{\prime }\) and \(\widetilde{\mathbf{\varepsilon }}=(\widetilde{\varepsilon }_{1},...,\widetilde{\varepsilon }_{T})^{\prime }\).

  • Under H\(_{0}\), the statistic has an asymptotic \(\chi ^{2}\) distribution with \(n\) degrees of freedom, where \(n=\dim (\mathbf{\gamma })\).

  • It is exactly the same statistic as the one obtained for testing the null hypothesis \(\mathbf{\delta =0}\) in the linear model \[ \mathbf{y}=\mathbf{Z\beta }+\mathbf{H\delta }+\mathbf{\varepsilon } \]

  • Another way of viewing the test is that it has been obtained after a linearization by a Taylor expansion around the null hypothesis. This suggests that there may be several nonlinear models with the same LM test of linearity.

The Auxiliary Regression

  • The Lagrange multiplier test can also be carried out by two regressions; this form of the test is often called the \(TR^{2}\) form (a code sketch follows this list).
    • Estimate model under H\(_{0}\) (estimate a linear model), compute the residuals \(\widetilde{\varepsilon }_{t},\) and the residual sum of squares \(SSR_{0}.\)
    • Regress \(\widetilde{\varepsilon }_{t}\) (or \(y_{t})\) on \(\mathbf{z}_{t}\) and \(\mathbf{h}_{t}^{0},\) compute the residuals and the residual sum of squares \(SSR_{1}.\)
    • Compute the asymptotic test statistic \[ LM_{\chi ^{2}}=T\frac{SSR_{0}-SSR_{1}}{SSR_{0}} \]

    • or the F-version \[ LM_{F}=\frac{(SSR_{0}-SSR_{1})/n}{SSR_{1}/\{T-(k+p+1)-n\}}. \label{lmf} \]

    • Under the null hypothesis the latter statistic has an approximate F-distribution with \(n\) and \(T-(k+p+1)-n\) degrees of freedom.

    • The test can be robustified against conditional heteroskedasticity.
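
A minimal sketch of the \(TR^{2}\) form (the function name is ours; \(\mathbf{Z}\) and \(\mathbf{H}\) are as defined above):

```python
import numpy as np

def lm_linearity_test(y, Z, H):
    """TR^2 form of the LM linearity test.
    Z: regressors of the linear model (T x (k+p+1));
    H: auxiliary regressors h_t^0 (T x n).
    Returns the chi-square version T * (SSR0 - SSR1) / SSR0."""
    T = len(y)
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)  # step 1: model under H0
    e = y - Z @ b
    ssr0 = e @ e
    ZH = np.column_stack([Z, H])               # step 2: residuals on (Z, H)
    g, *_ = np.linalg.lstsq(ZH, e, rcond=None)
    u = e - ZH @ g
    ssr1 = u @ u
    return T * (ssr0 - ssr1) / ssr0            # compare with chi^2(n)
```

The F-version is obtained from the same \(SSR_{0}\) and \(SSR_{1}\) via the formula above.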

Locally Equivalent Alternatives

  • For the concept of a locally equivalent alternative, see Godfrey (1988) or Gouriéroux and Monfort (1990).

  • Let \[ y_{t}=\mathbf{\beta }^{\prime }\mathbf{z}_{t}+G_{1}(\mathbf{z}_{t};\mathbf{\alpha })+\varepsilon _{1t},\;\{\varepsilon _{1t}\}\sim \text{iid}(0,\sigma ^{2}) \label{nl-1} \] and \[ y_{t}=\mathbf{\beta }^{\prime }\mathbf{z}_{t}+G_{2}(\mathbf{z}_{t};\mathbf{\gamma })+\varepsilon _{2t},\;\{\varepsilon _{2t}\}\sim \text{iid}(0,\sigma ^{2}) \label{nl-2} \] be two additive nonlinear models.

  • Assume that the two equations are linear for \(\mathbf{\alpha }=\mathbf{0}\) and \(\mathbf{\gamma }=\mathbf{0}\).

  • The two models are locally equivalent in a neighbourhood of H\(_{01}:\mathbf{\alpha =0}\) and H\(_{02}:\) \(\mathbf{\gamma =0}\) if the following two conditions are satisfied:
    • \(G_{1}(\mathbf{z}_{t};\mathbf{0})=G_{2}(\mathbf{z}_{t};\mathbf{0})\).
    • \(\partial G_{1}(\mathbf{z}_{t};\mathbf{\alpha })/\partial \mathbf{\alpha }|_{\mathbf{\alpha }=\mathbf{0}}=\mathbf{A}\,\partial G_{2}(\mathbf{z}_{t};\mathbf{\gamma })/\partial \mathbf{\gamma }|_{\mathbf{\gamma }=\mathbf{0}}\) where \(\mathbf{A}\) is a nonsingular matrix.
  • It follows from the second condition that the LM tests derived for testing H\(_{01}\) and H\(_{02}\) are identical.

  • In practice, the LM test is often based on a Taylor expansion of \(G\) around the null hypothesis.

Identification Problem

  • Consider again the following additive nonlinear model \[ y_{t}=\mathbf{\beta }_{0}^{\prime }\mathbf{z}_{t}+\mathbf{\beta }_{1}^{\prime }\mathbf{z}_{t}G(\mathbf{\gamma };\mathbf{s}_{t})+\varepsilon _{t}=(\mathbf{\beta }_{0}+\mathbf{\beta }_{1}G(\mathbf{\gamma };\mathbf{s}_{t}))^{\prime }\mathbf{z}_{t}+\varepsilon _{t} \label{add-nlmodel} \]

  • \(\beta_1\) is not identified when \(\gamma=0\).

  • \(\gamma\) is not identified when \(\beta_1=0\).

  • The model is only identified under the alternative.

The Solution

  • The sup test.

  • The average test.

  • The exponential test.

  • Taylor expansion: approximate \(G\) locally around the null hypothesis.
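
To make the first three bullets concrete, a standard formulation (in the spirit of Davies (1977) and Andrews and Ploberger (1994); not necessarily the exact one used here) computes the pointwise statistic \(LM_{T}(\mathbf{\gamma })\) over a grid \(\Gamma \) of values of the unidentified parameters and aggregates:

\[ \sup LM=\sup_{\mathbf{\gamma }\in \Gamma }LM_{T}(\mathbf{\gamma }),\quad aveLM=\int_{\Gamma }LM_{T}(\mathbf{\gamma })dW(\mathbf{\gamma }),\quad \exp LM=\ln \int_{\Gamma }\exp \{LM_{T}(\mathbf{\gamma })/2\}dW(\mathbf{\gamma }), \]

where \(W\) is a weight function over \(\Gamma \). The resulting null distributions are nonstandard and are typically obtained by simulation or by bootstrap.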

Testing Parameter Constancy

  • Parameter constancy (stability) is a crucial assumption, which should be tested after a model has been estimated.

  • The Chow (1960) test: a single break at a known point.

  • The Bai (1999) test: multiple breaks.

LM Type Tests

  • Testing against smoothly changing parameters

  • The idea is to modify the smooth transition regression model to fit this situation.

  • There is no need to know the location of the break point in advance.