QUESTIONS

  1. Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Following the MATLAB code seen in class, the matrix filter represents:

    a. The estimate of the unobserved components in period t given the information up to period t TRUE

    b. The true value of the unobserved components

    c. The Kalman gain

    d. The transition Matrix

Solution

In reduced form we can write the Kalman Filter as:

\[ \begin{align*} y_t &= A'x_t + H'h_t +w_t && \text{Observation Eq.} \\ h_{t+1} &= Fh_{t} + v_t && \text{State Eq.} \end{align*} \]

Given some initial values for the filter and following the course’s notation, in each iteration the prediction follows:

\[ \begin{align*} \hat{h}_{t|t-1}&= F\hat{h}_{t-1|t-1}\\ P_{t|t-1}&= FP_{t-1|t-1}F' + Q \end{align*} \]

The prediction of the observed variables is (assuming no exogenous variables):

\[ \hat{y}_{t|t-1}= H'\hat{h}_{t|t-1} \]

With forecast error:

\[ \begin{align*} \eta_{t|t-1}&= y_t - \hat{y}_{t|t-1}= y_t- H'\hat{h}_{t|t-1} \\ E[(y_t - \hat{y}_{t|t-1})(y_t - \hat{y}_{t|t-1})']&= H'P_{t|t-1}H + R \end{align*} \]

To make the notation more compact we can define the Kalman Gain as:

\[ K_t = P_{t|t-1}H(H'P_{t|t-1}H + R)^{-1} \]

Therefore the update of the unobserved component is just:

\[ \hat{h}_{t|t}= \hat{h}_{t|t-1} + K_t \eta_{t|t-1} \]

In the code, the matrix filter is filled row by row:

filter(it,:) = beta11', where beta11 is equal to \(\hat{h}_{t|t}\).

Therefore, the matrix filter is the estimate of the unobserved components in period t given the information up to period t.
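The recursion above can be sketched in a few lines. This is a minimal scalar illustration in Python, not the MATLAB code from class: the function name `kalman_filter_scalar`, the list `filter_rows` (which plays the role of the matrix filter) and all parameter values are assumptions made for the example. With a scalar state and a scalar observation the transposes drop out.

```python
def kalman_filter_scalar(y, F, H, Q, R, h0=0.0, P0=1.0):
    """Scalar Kalman filter; returns the filtered states h_{t|t}, one per period."""
    h, P = h0, P0
    filter_rows = []
    for y_t in y:
        # Prediction step: h_{t|t-1} and P_{t|t-1}
        h_pred = F * h
        P_pred = F * P * F + Q
        # Forecast error and its variance
        eta = y_t - H * h_pred
        S = H * P_pred * H + R
        # Kalman gain and update: h_{t|t} and P_{t|t}
        K = P_pred * H / S
        h = h_pred + K * eta
        P = (1.0 - K * H) * P_pred
        # Analogue of filter(it,:) = beta11'
        filter_rows.append(h)
    return filter_rows


# Hypothetical data and parameters, for illustration only
filter_rows = kalman_filter_scalar([1.0, 0.4, -0.2], F=0.5, H=1.0, Q=1.0, R=1.0)
print(filter_rows)
```

Each entry of `filter_rows` uses only the observations up to that period, which is why it corresponds to \(\hat{h}_{t|t}\) and not to a smoothed estimate.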

  1. Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Following the MATLAB code seen in class, is filter(t,2) = filter(t-1,1)? (t refers to the time period)
  1. Yes, always because the information set is different
  2. No, never because the information set is different.
  3. Sometimes, depending on the information set. TRUE
  4. They are completely independent. They might coincide, but only by luck.

Solution

As we have seen in the solution above, the matrix filter is defined by filter(it,:) = beta11', where beta11 is equal to \(\hat{h}_{t|t}\). Therefore, each row of the matrix filter is just the transpose of the estimated state vector \(\hat{h}_{t|t}\), the filtered estimate of the \(h_t\) that appears in the state-space system:

\[ \begin{align*} y_t &= A'x_t + H'h_t +w_t && \text{Observation Eq.} \\ h_{t+1} &= Fh_{t} + v_t && \text{State Eq.} \end{align*} \]

For the example of Homework 2, each row of filter looks something like the transpose of the following vector:

\[ h_t=\begin{bmatrix} f_{t}\\ f_{t-1}\\ \vdots \\ f_{t-11}\\ e_{1t} \\ e_{1t-1} \\ e_{2t} \\ e_{2t-1} \\ e_{3t} \\ e_{3t-1} \\ e_{4t} \\ e_{4t-1} \\ \end{bmatrix} \]

The first column of the matrix filter, starting from the left, holds the estimated factor, the second column its first lag, the third column its second lag, and so on.

In the raw data matrix, the last 3 rows (832 to 834) add no new information to the model. The conditioning set remains the same, so our estimate of the factor remains the same: it cannot improve, worsen or change. No information, no update. In that case \(\hat{h}_{t|t} = \hat{h}_{t|t-1} = F\hat{h}_{t-1|t-1}\), and the second element of this vector (the estimate of \(f_{t-1}\)) is just the first element of \(\hat{h}_{t-1|t-1}\). Hence filter(t,2)=filter(t-1,1) for row t=833. When new data do arrive in period t, the update also revises the estimate of \(f_{t-1}\), and the equality no longer holds.
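The "no information, no update" argument can be checked numerically. The sketch below is a deliberately small stand-in for the homework model, assuming a single observed series, an AR(1) factor and a state vector \((f_t, f_{t-1})'\); the function name `filter_with_lag`, the parameter values and the data are all hypothetical, and missing observations are coded as `None`.

```python
def filter_with_lag(y, phi=0.8, gamma=1.0, q=1.0, r=0.5):
    """Kalman filter with state (f_t, f_{t-1}); returns one row [f_t|t, f_{t-1}|t] per period."""
    h = [0.0, 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    rows = []
    for obs in y:
        # Prediction with F = [[phi, 0], [1, 0]] and Q = [[q, 0], [0, 0]]
        h_pred = [phi * h[0], h[0]]
        p11 = P[0][0]
        P_pred = [[phi * phi * p11 + q, phi * p11],
                  [phi * p11, p11]]
        if obs is None:
            # No new information: the update step is skipped
            h, P = h_pred, P_pred
        else:
            # Observation y_t = gamma * f_t + w_t, i.e. H' = [gamma, 0]
            eta = obs - gamma * h_pred[0]
            S = gamma * gamma * P_pred[0][0] + r
            K = [P_pred[0][0] * gamma / S, P_pred[1][0] * gamma / S]
            h = [h_pred[0] + K[0] * eta, h_pred[1] + K[1] * eta]
            P = [[P_pred[i][j] - K[i] * gamma * P_pred[0][j] for j in range(2)]
                 for i in range(2)]
        rows.append(h)
    return rows


# Three observed periods followed by two periods without data (hypothetical values)
rows = filter_with_lag([1.0, -0.5, 0.7, None, None])
print(rows[3][1] == rows[2][0])  # period without data: second column equals lagged first column
print(rows[1][1] == rows[0][0])  # period with data: the estimate of f_{t-1} has been revised
```

In the periods without data the second column simply copies the previous first column, while in periods with data the two generally differ, which is the "sometimes, depending on the information set" answer.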

  1. Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Following the MATLAB code seen in class, if T refers to the last observation used in the estimation (suppose it is June 2020), and if filter(T,1)=-0.3:
  1. The economy is growing below the average growth TRUE
  2. The quarterly growth rate of the economy is negative
  3. The economy is in recession because the factor is negative.
  4. None of the above is correct

Solution

Intuition: we have normalized all the variables to have mean zero. If everything is at its average, \(f_t =0\). If the factor is below 0, all we can say is that the economy is growing below average.

The model yields a monthly factor, while economic growth is usually evaluated on a quarterly basis, following the National Accounts System. To relate a monthly factor to a quarterly series we can follow a procedure like this.

For a given quarter, with t the last month of the quarter, quarterly growth is a weighted sum of monthly growth rates:

\[ Y^*_q = \frac{1}{3}y^*_t +\frac{2}{3}y^*_{t-1}+y^*_{t-2}+\frac{2}{3}y^*_{t-3}+\frac{1}{3}y^*_{t-4} \]

Replacing the monthly growth rates with the monthly factor, the common part of quarterly growth is:

\[ Y^*_q = \frac{1}{3}f_t +\frac{2}{3}f_{t-1}+f_{t-2}+\frac{2}{3}f_{t-3}+\frac{1}{3}f_{t-4} \]

And the estimate just needs to add the error terms:

\[ \hat{Y}^*_q = \frac{1}{3}f_t +\frac{2}{3}f_{t-1}+f_{t-2}+\frac{2}{3}f_{t-3}+\frac{1}{3}f_{t-4} + \frac{1}{3}e_t +\frac{2}{3}e_{t-1}+e_{t-2}+\frac{2}{3}e_{t-3}+\frac{1}{3}e_{t-4} \]

Then filter(T,1)=-0.3 just means that \(f_t = -0.3\) in June. Quarterly growth can still be positive if the factor estimates for the earlier months, April and May in particular, are high enough, that is, if \(\frac{2}{3}f_{t-1}+f_{t-2}+\frac{2}{3}f_{t-3}+\frac{1}{3}f_{t-4} > -\frac{1}{3}f_t\), so that there is a positive carry-over effect.
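A small numerical illustration of the carry-over effect, using the weights \(\frac{1}{3}, \frac{2}{3}, 1, \frac{2}{3}, \frac{1}{3}\) from the aggregation formula. The helper name `quarterly_from_monthly` and the factor values for the months before June are hypothetical; only \(f_t = -0.3\) comes from the question.

```python
# Weights applied to f_t, f_{t-1}, ..., f_{t-4}
WEIGHTS = [1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3]


def quarterly_from_monthly(f):
    """Weighted sum of the current and four lagged monthly factors, ordered [f_t, ..., f_{t-4}]."""
    if len(f) != len(WEIGHTS):
        raise ValueError("expected exactly five monthly values")
    return sum(w * x for w, x in zip(WEIGHTS, f))


# June factor from the question, followed by made-up values for May, April, March, February
monthly_factor = [-0.3, 0.2, 0.4, 0.1, 0.0]
quarterly = quarterly_from_monthly(monthly_factor)
print(quarterly > 0)  # the quarterly aggregate is positive despite the negative June factor
```

Since the factor is measured in deviations from the mean, even a negative aggregate would only indicate below-average growth, not necessarily negative growth.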

  1. Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Following the MATLAB code seen in class:

Suppose that, as of today, I want to forecast all the variables in the system up to December 2020. In order to do that, I fill the dataset with missing data up to the last observation that I want to forecast (December 2020). Let T refer to the last observation used in the estimation (in this exercise, December 2020). What would be the values of the coefficients of the Kalman gain matrix in the last observation (December 2020)?

  1. We do not have enough information to say anything about those values
  2. All the elements of the Kalman gain will be positive
  3. All the elements of the Kalman gain will be negative.
  4. All the elements of the Kalman gain will be 0 TRUE

Solution

We can define the Kalman gain as:

\[ K_t = P_{t|t-1}H(H'P_{t|t-1}H + R)^{-1} \]

The term in parentheses comes from the expression:

\[ \sigma_{\varepsilon_t}=E[(y_t - \hat{y}_{t|t-1})(y_t - \hat{y}_{t|t-1})']= H'P_{t|t-1}H + R \]

So we can call \(\sigma_{\varepsilon_t}\) the variance of the one-step-ahead innovation errors. The Kalman gain determines how much the innovation errors at t influence the estimate of the state at time t. In the forecasting periods all the observations are missing, so there are no innovation errors to weight: no information arrives, the state estimate is not updated (\(\hat{h}_{t|t}=\hat{h}_{t|t-1}\)), and every element of the Kalman gain is zero.
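To see mechanically why the gain vanishes, consider one common way of handling missing observations. This is an assumption, since the treatment used in the class code is not reproduced here: the columns of \(H\) associated with missing series are replaced by zeros. Writing the modified matrix as \(H^*\) and the corresponding measurement variance as \(R^*\) (both symbols are introduced only for this sketch), a period in which every series is missing has \(H^* = 0\), so:

\[ \begin{align*} K_t &= P_{t|t-1}H^*(H^{*\prime}P_{t|t-1}H^* + R^*)^{-1} = 0 \\ \hat{h}_{t|t} &= \hat{h}_{t|t-1} + K_t\eta_{t|t-1} = \hat{h}_{t|t-1} \end{align*} \]

Here \(R^*\) only needs to be positive definite so that the inverse exists. The forecast for the missing periods is then obtained by iterating the prediction step alone.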

  1. Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Remember that your model is designed to forecast GDP growth.

    Suppose that car registration is not one of your four monthly variables and somebody suggests that you should include it. Which statistical criterion would you apply?

  1. I would include car registration in the model if the factor loading ( \(\gamma_5\), according to the notation seen in class) of car registrations is positive and significant, because this positive association always implies a better fit for GDP.
  2. I would include car registration in the model if the correlation of the common factor and GDP increases after I include the variable in the model TRUE
  3. Including more variables is always worse for forecasting. We should keep it parsimonious.
  4. None of the above is correct.

Solution

The idea is the following: the more variables you include in a forecasting model, the more capacity you have to explain the variance of the target variable. However, this comes at a cost: more variables in a model also mean more noise. When the dataset is too noisy, even if it explains close to 100% of the variance of the target variable, it becomes more difficult to extract the actual underlying signal. This is the so-called bias-variance trade-off.

Answer (a) does not make sense: a positive association does not imply a better fit for GDP. For instance, the unemployment rate can help to forecast GDP even though its association is negative. Answer (b) does not make sense: it would likely increase the correlation of the factor and GDP, but there is no free lunch, and you would have a noisier model. Answer (c) does not make sense either: because of the trade-off mentioned above, there is no rule of thumb.

Therefore, more data is not always better and can increase forecast errors even when using dimensionality reduction techniques (Boivin & Ng, 2006). To check whether more data helps, there are two popular options. One is to compute the RMSE of the model with and without the extra variables and see which one performs better. However, this is an out-of-sample exercise and it can lead the model to overfit the out-of-sample data, which is not always a good idea.

A more refined solution is hard thresholding (Bai & Ng,2007), which consists of regressing the forecast variable on its lags and on each individual indicator, and selecting all indicators with an absolute t-statistic above a certain threshold. In this case, the threshold is obtained by comparing the out-of-sample performance of forecasts across a range of thresholds and choosing the threshold that delivers the lowest forecast errors.
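The RMSE comparison mentioned above can be sketched as follows. The function name `rmse` and all numbers are hypothetical and only illustrate the mechanics: in practice the two forecast series would come from re-estimating the factor model with and without car registrations and forecasting GDP growth out of sample.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between realized values and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Made-up GDP growth outcomes and two sets of out-of-sample forecasts
gdp_growth = [0.5, 0.3, -0.2, 0.4]
forecast_without_cars = [0.4, 0.5, 0.0, 0.3]
forecast_with_cars = [0.5, 0.4, -0.1, 0.3]

rmse_without = rmse(gdp_growth, forecast_without_cars)
rmse_with = rmse(gdp_growth, forecast_with_cars)
# Under this criterion the extra variable is kept only if it lowers the forecast error
print("keep car registrations:", rmse_with < rmse_without)
```

The same function could be reused inside a loop over candidate t-statistic thresholds when applying hard thresholding.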

  1. Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Remember that your model is designed to forecast GDP growth. Suppose that you want to include in the model a soft indicator that you know is related to the annual growth rate of the economy.
  1. It cannot be included because the model is written in monthly frequency and annual variables cannot be introduced in a monthly model
  2. The dimension of the state variable \(h_t\) has to include at least 12 lags of the common factor. TRUE
  3. The dimension of the state variable \(h_t\) does not change.
  4. The system is not identified.

Solution

The soft indicator is often released in annual growth rates. This implies that it is related to the level of activity in period t minus the level of activity in period t-12. Writing \(F_t\) for the level of activity and \(f_t = (1-L)F_t\) for its monthly change, we can express annual growth using the lag operator as:

\[\text{Annual growth} = (1-L^{12})F_t =(1-L) (1 +L + L^2 + \dots+ L^{11})F_t= f_t + f_{t-1}+ \dots+ f_{t-11}\]
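The identity behind this expression, namely that the twelve-month difference of a level series equals the sum of its last twelve monthly differences, can be checked numerically. The function names and the level series below are hypothetical and serve only as a sanity check of the algebra.

```python
def annual_change(levels, t):
    """Twelve-month difference F_t - F_{t-12}."""
    return levels[t] - levels[t - 12]


def sum_monthly_changes(levels, t):
    """Sum of the twelve monthly differences f_t + f_{t-1} + ... + f_{t-11}."""
    return sum(levels[t - k] - levels[t - k - 1] for k in range(12))


# Made-up level series with a trend and a small periodic component
levels = [100.0 + 0.3 * s + (s % 4) for s in range(30)]
t = 25
print(abs(annual_change(levels, t) - sum_monthly_changes(levels, t)) < 1e-9)
```

This is why a variable tied to annual growth loads on the current factor and its eleven lags, which is what determines how many lags of the factor the state vector has to carry.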

Therefore, in terms of the monthly factor, the indicator can be expressed as:

\[y_{1t}= \gamma_1(f_t + f_{t-1}+ \dots+ f_{t-11})+\epsilon_{1t}\]

For the second variable, which is expressed in quarterly g

  1. Suppose that you have eight monthly variables and you want to extract the common component of all of them.
  1. Kalman filter and principal components will always give you the same estimate of the common factor.
  2. Kalman filter and principal components differ because Principal components take into account the dynamics of the factor and Kalman filter is static.
  3. Both are static filters that capture the non-observed components of the variables.
  4. None of the above is correct. TRUE

Solution

In the KF you impose that the dynamics of the data follow a certain process, while in PCA you do not have this capacity. In a sense, both techniques extract a non-observed component and this component is dynamic, but the dynamics are “set” in the KF while in PCA there is no control over them.

  1. Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Remember that your model is designed to forecast GDP growth. Suppose that you want to include an additional variable, “cement consumption”, that you know is a leading variable (3 months lead with respect to GDP).
  1. You cannot do that. Lagging or contemporaneous variables can be included. Leading variables cannot.
  2. You should enlarge your state model to include at least 8 leads of the factor.
  3. You should enlarge your state model to include at least 3 leads of the factor. TRUE
  4. You should enlarge your state model to include at least 12 leads of the factor.

Solution

If our monthly factor is related to GDP, cement consumption will also be related to the monthly factor, with a 3-month lead. The state equation would look like:

\[ \begin{bmatrix} f_{t+3}\\ f_{t+2}\\ \vdots\\ f_{t-4}\\ \vdots \end{bmatrix} = F \begin{bmatrix} f_{t+2}\\ f_{t+1}\\ \vdots\\ f_{t-5}\\ \vdots \end{bmatrix} + \text{error} \]

  1. Suppose that the relation between two variables, \(y_t\) and \(x_t\), instead of being driven by the standard linear model \(y_t = Bx_t +\epsilon_t\), is driven by \(y_t = B_tx_t +\epsilon_t\), where \(B_t\) is equal to \(B_1\) if an unobservable variable \(S_t=1\) and to \(B_0\) if \(S_t=0\), with \(S_t\) following a Markov chain, and \(\epsilon_t\) is distributed as \(N(0, \sigma)\). Tell me which of the following is NOT CORRECT:
  1. The estimated density for \(y_t\) is a mixture of normal distributions.
  2. A Markov chain implies that \(Pr(S_t = i |\Omega_t)\), \(i =1,0\), is a sufficient statistic to calculate \(Pr(S_{t+1} = i |\Omega_t)\), where \(\Omega_t\) is the information set in period t.
  3. The model can be estimated with a Kalman filter with time-varying coefficients FALSE
  4. The parameters that measure the probability of staying in the same state of the economy are nuisance parameters under the null of the linear model being the true model.

Solution

The estimated density need not be a mixture of normals. This is a possible situation, but it will not always be true, because the Markov chain does not imply normality.

  1. Suppose that the relation between two variables, \(y_t\) and \(x_t\), instead of being driven by the standard linear model \(y_t = Bx_t +\epsilon_t\), is driven by \(y_t = B_tx_t +\epsilon_t\), where \(B_t = FB_{t-1} + u_t\), with \(\epsilon_t\) distributed as \(N(0, \sigma)\) and \(u_t\) distributed as \(N(0, \zeta)\). Tell me which of the following is CORRECT:
  1. The estimated density for \(y_t\) is a mixture of normal distributions.
  2. The model can be estimated using Hamilton Filter implying restrictions in the transition matrix.
  3. The model can be estimated with a Kalman filter with time-varying coefficients TRUE
  4. The model can be estimated by principal components analysis

Solution

We can translate the model from this form:

\[ \begin{align*} y_t &= B_tx_t +\epsilon_t && \text{Observation Eq.} \\ B_t &= FB_{t-1} + u_t && \text{State Eq.} \end{align*} \]

to the usual reduced form of the KF by substituting \(B_t\) with the unobserved state \(h_t\), setting \(A = 0\), and letting the regressor play the role of a time-varying \(H' = x_t\), with \(w_t = \epsilon_t\) and \(v_t = u_t\):

\[ \begin{align*} y_t &= A'x_t + H'h_t +w_t && \text{Observation Eq.} \\ h_t &= Fh_{t-1} + v_t && \text{State Eq.} \end{align*} \]
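As an illustration of this mapping, the sketch below runs a scalar Kalman filter in which the regressor \(x_t\) takes the place of \(H'\) in every period. The function name `tvp_filter`, the default parameter values and the data are assumptions for the example; in particular \(\sigma\) and \(\zeta\) are treated here as variances, and the parameters are taken as known rather than estimated.

```python
def tvp_filter(y, x, F=1.0, zeta=0.1, sigma=1.0, b0=0.0, p0=10.0):
    """Filtered estimates of B_t in y_t = B_t x_t + eps_t, B_t = F B_{t-1} + u_t."""
    b, p = b0, p0
    filtered = []
    for y_t, x_t in zip(y, x):
        # Prediction of the coefficient and its variance
        b_pred = F * b
        p_pred = F * p * F + zeta
        # Forecast error; the observation "matrix" is the regressor x_t
        eta = y_t - x_t * b_pred
        s = x_t * p_pred * x_t + sigma
        # Gain and update
        k = p_pred * x_t / s
        b = b_pred + k * eta
        p = (1.0 - k * x_t) * p_pred
        filtered.append(b)
    return filtered


# Hypothetical noiseless data generated with a constant coefficient of 2
x = [1.0] * 50
y = [2.0 * x_t for x_t in x]
estimates = tvp_filter(y, x)
print(estimates[-1])  # the filtered coefficient moves toward 2 as data accumulate
```

In a full application the unknown parameters \(F\), \(\sigma\) and \(\zeta\) would be estimated, typically by maximizing the likelihood built from the forecast errors `eta` and their variances `s`.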