Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Following the MATLAB codes seen in class, the matrix filter represents:
a. The estimate of the unobserved components in period t with the information up to period t TRUE
b. The true value of the unobserved components
c. The Kalman gain
d. The transition Matrix
Solution
In reduced form we can write the Kalman Filter as:
\[ \begin{align*} y_t &= A'x_t + H'h_t +w_t \quad Observation \quad Eq. \\ h_{t+1} &= Fh_{t} + v_t \quad \quad \quad \quad \quad \quad \quad State\quad Eq. \end{align*} \]
Given some initial values for the filter and following the course’s notation, in each iteration the prediction of the state follows:
\[ \begin{align*} \hat{h}_{t|t-1}&= F\hat{h}_{t-1|t-1}\\ P_{t|t-1}&= FP_{t-1|t-1}F' + Q \end{align*} \]
The prediction of the observation is (assuming no exogenous variables):
\[ \hat{y}_{t|t-1}= H'\hat{h}_{t|t-1} \]
With forecast error:
\[ \begin{align*} \eta_{t|t-1}&= y_t - \hat{y}_{t|t-1}= y_t- H'\hat{h}_{t|t-1} \\ E[(y_t - \hat{y}_{t|t-1})(y_t - \hat{y}_{t|t-1})']&= H'P_{t|t-1}H + R \end{align*} \]
To make the notation more compact we can define the Kalman gain as:
\[ K_t = P_{t|t-1}H(H'P_{t|t-1}H + R)^{-1} \]
Therefore the update of the unobserved component is just:
\[ \hat{h}_{t|t}= \hat{h}_{t|t-1} + K_t \eta_{t|t-1} \]
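The recursion above can be sketched in a few lines. This is a minimal Python illustration, not the MATLAB code seen in class: it assumes a scalar state and a scalar observation (so that \(H' = H\)), and the function name `kalman_step` and all numerical values are hypothetical. The update of the state variance, \(P_{t|t} = P_{t|t-1} - K_tH'P_{t|t-1}\), is the standard one and is not written out in the text above.

```python
def kalman_step(h_prev, p_prev, y, F, H, Q, R):
    """One iteration of a scalar Kalman filter (illustrative sketch).

    h_prev, p_prev: filtered state and its variance from t-1.
    Returns the filtered state, its variance, and the Kalman gain at t.
    """
    # Prediction of the state and of its variance
    h_pred = F * h_prev
    p_pred = F * p_prev * F + Q
    # Prediction of the observation and forecast error
    y_pred = H * h_pred
    eta = y - y_pred
    # Variance of the forecast error and Kalman gain
    s = H * p_pred * H + R
    k = p_pred * H / s
    # Update of the unobserved component and of its variance
    h_filt = h_pred + k * eta
    p_filt = p_pred - k * H * p_pred
    return h_filt, p_filt, k


# Hypothetical values, chosen only to make the arithmetic easy to follow
h, p, k = kalman_step(h_prev=0.0, p_prev=1.0, y=1.0, F=0.5, H=1.0, Q=0.75, R=1.0)
print(h, p, k)
```

With these numbers the predicted variance is 1, the forecast error variance is 2, and the gain is one half, so the filter moves the state halfway towards the observation.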
In the code we define the matrix filter as: filter(it,:) = beta11', where beta11 is equal to \(\hat{h}_{t|t}\).
Therefore, the matrix filter is the estimate of the unobserved components in period t with the information up to period t.
Solution
As we have seen in the solution above, we can define the matrix filter as: filter(it,:) = beta11', where beta11 is equal to \(\hat{h}_{t|t}\). Therefore, each row of the matrix filter is just the transpose of the vector \(\hat{h}_{t|t}\), the filtered estimate of the state \(h_t\) that appears in the observation and state equations:
\[ \begin{align*} y_t &= A'x_t + H'h_t +w_t \quad Observation \quad Eq. \\ h_{t+1} &= Fh_{t} + v_t \quad \quad \quad \quad \quad \quad \quad State\quad Eq. \end{align*} \]
For the Homework 2 example, each row looks like the transpose of the following vector:
\[ h_t=\begin{bmatrix} f_{t}\\ f_{t-1}\\ \vdots \\ f_{t-11}\\ e_{1t} \\ e_{1t-1} \\ e_{2t} \\ e_{2t-1} \\ e_{3t} \\ e_{3t-1} \\ e_{4t} \\ e_{4t-1} \\ \end{bmatrix} \]
The first column of the matrix filter, starting from the left, represents the estimated factor, the second one the first lag of the estimated factor, the third one the second lag, and so on. If our matrix of raw data looks like this:
We can see that in the last 3 rows (832 to 834) no new information is added to our model. The conditioning set remains the same, therefore our estimate of the factor for any given month remains the same: it cannot improve, worsen, or change. No information, no update. As we can see below, filter(t,2)=filter(t-1,1) for the row t=833.
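The "no information, no update" logic can be checked with a small Python sketch. It is not the course's MATLAB code: it assumes a reduced state \(h_t = (f_t, f_{t-1})'\), a single observed series \(y_t = f_t + w_t\), and that a missing observation is handled by skipping the update. The function name `run_filter` and the values of `phi`, `q` and `r` are hypothetical.

```python
def run_filter(ys, phi=0.8, q=1.0, r=0.5):
    """Filter a factor with state h_t = [f_t, f_{t-1}]; None marks a missing y_t.

    Returns (rows, gains): rows[t] plays the role of filter(t,:) and
    gains[t] is the Kalman gain used at t.
    """
    h = [0.0, 0.0]                    # initial filtered state
    P = [[1.0, 0.0], [0.0, 1.0]]      # initial state variance
    rows, gains = [], []
    for y in ys:
        # Prediction with F = [[phi, 0], [1, 0]] and Q = [[q, 0], [0, 0]]
        h_pred = [phi * h[0], h[0]]
        P_pred = [[phi * phi * P[0][0] + q, phi * P[0][0]],
                  [phi * P[0][0], P[0][0]]]
        if y is None:
            # No observation: no innovation, so the update is skipped
            K = [0.0, 0.0]
            h, P = h_pred, P_pred
        else:
            # Observation y_t = f_t + w_t, that is, H = [1, 0]'
            s = P_pred[0][0] + r
            K = [P_pred[0][0] / s, P_pred[1][0] / s]
            eta = y - h_pred[0]
            h = [h_pred[i] + K[i] * eta for i in range(2)]
            P = [[P_pred[i][j] - K[i] * P_pred[0][j] for j in range(2)]
                 for i in range(2)]
        rows.append(list(h))
        gains.append(K)
    return rows, gains


# Hypothetical data: three observed months followed by three missing ones
ys = [0.5, -0.2, 0.1, None, None, None]
rows, gains = run_filter(ys)
for t in (3, 4, 5):
    print(rows[t][1] == rows[t - 1][0], gains[t])
```

In the rows with missing data the second column of the current row reproduces the first column of the previous row, which is the pattern described above.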
Solution
Intuition: we have normalized everything with mean zero. If everything is average, \(f_t =0\). If the factor is below 0, all we can say is that we are growing below average.
The model yields a monthly factor, while we usually evaluate economic growth on a quarterly basis according to the National Accounts System. To relate a monthly factor to a quarterly series we can follow a procedure like this:
For a given quarter and month:
\[ Y^*_q = \frac{1}{3}y^*_t +\frac{2}{3}y^*_{t-1}+y^*_{t-2}+\frac{2}{3}y^*_{t-3}+\frac{1}{3}y^*_{t-4} \]
From this point we can express quarterly growth in terms of the monthly factor as:
\[ Y^*_q = \frac{1}{3}f_t +\frac{2}{3}f_{t-1}+f_{t-2}+\frac{2}{3}f_{t-3}+\frac{1}{3}f_{t-4} \]
And the estimate just needs to add the error terms:
\[ \hat{Y}^*_q = \frac{1}{3}f_t +\frac{2}{3}f_{t-1}+f_{t-2}+\frac{2}{3}f_{t-3}+\frac{1}{3}f_{t-4} + \frac{1}{3}e_t +\frac{2}{3}e_{t-1}+e_{t-2}+\frac{2}{3}e_{t-3}+\frac{1}{3}e_{t-4} \]
Then, filter(T,1)=-0.3 just means that \(f_t = -0.3\). Quarterly growth can still be positive given that the factor estimates for April and May satisfy \(\frac{1}{3}f_t \leq\frac{2}{3}f_{t-1}+f_{t-2}\) and we have a positive carry-over effect.
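A short numerical illustration of the carry-over effect, in Python. The weights are the ones in the aggregation formula above; the monthly factor values other than \(f_t = -0.3\) are hypothetical and chosen only for the example, as is the function name `quarterly_from_monthly`.

```python
# Weights on f_t, f_{t-1}, f_{t-2}, f_{t-3}, f_{t-4}
WEIGHTS = [1 / 3, 2 / 3, 1.0, 2 / 3, 1 / 3]


def quarterly_from_monthly(f_recent):
    """Aggregate the five most recent monthly factors into quarterly growth.

    f_recent is ordered as [f_t, f_{t-1}, f_{t-2}, f_{t-3}, f_{t-4}].
    """
    return sum(w * f for w, f in zip(WEIGHTS, f_recent))


# Hypothetical path: the latest month is below average, the earlier ones above
f_recent = [-0.3, 0.2, 0.4, 0.1, 0.0]
quarterly = quarterly_from_monthly(f_recent)
print(quarterly)
```

Here the latest month contributes \(-0.1\), but the two previous months contribute about \(0.53\), so the quarterly figure is positive even though the last monthly factor is negative.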
Suppose that, as of today, I want to forecast all the variables in the system up to December 2020. In order to do that, I fill the dataset with missing data until the last observation that I want to forecast (December 2020). If T refers to the last observation used in the estimation (in this exercise, December 2020), what would be the values of the coefficients of the Kalman gain matrix in the last observation (December 2020)?
Solution
We can define the Kalman Gain as:
\[ K_t = P_{t|t-1}H(H'P_{t|t-1}H + R)^{-1} \]
Which comes from the expression:
\[ \sigma_{\varepsilon_t}=E[(y_t - \hat{y}_{t|t-1})(y_t - \hat{y}_{t|t-1})']= H'P_{t|t-1}H + R \]
So we could call \(\sigma_{\varepsilon_t}\) the variance of the one-step-ahead innovation errors. Therefore, the Kalman gain determines how much the innovation errors at t influence the estimate of the state at time t. In the forecasting periods filled with missing data there are no observations, hence no innovation errors to weight, so the coefficients of the Kalman gain in the last observation are all zero.
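One way to see this is the following sketch. It assumes that missing observations are handled by setting the corresponding loadings to zero in that period, written here as \(H_T^{*}=0\); this is an assumption about the implementation and the symbol \(H_T^{*}\) is introduced only for this illustration.

\[ \begin{align*} K_T &= P_{T|T-1}H_T^{*}(H_T^{*\prime}P_{T|T-1}H_T^{*} + R)^{-1} = 0 \\ \hat{h}_{T|T} &= \hat{h}_{T|T-1} + K_T \eta_{T|T-1} = \hat{h}_{T|T-1} \end{align*} \]

The filtered state in December 2020 is then just the prediction from the state equation.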
Suppose that you work in a Central Bank and you are asked to estimate a dynamic factor model (like in Homework 2) with a quarterly variable and four monthly variables (all contemporaneous, with no leading or lagging behavior, exactly like in Homework 2). Remember that your model is designed to forecast GDP growth.
Suppose that car registration is not among your four monthly variables and somebody suggests that you should include it. Which statistical criteria would you apply?
Solution
The idea is the following: the more variables you include in a forecasting model, the more capacity you have to explain the variance of the dependent variable. However, this comes at a cost: more variables in a model also mean more noise. When the dataset is too noisy, even if it explains close to 100% of the variance of the dependent variable, it becomes more difficult to extract the actual underlying signal. This is the so-called bias-variance tradeoff.
Answer (a) does not make sense: a positive association does not imply a better fit for GDP. For instance, the unemployment rate can help to forecast GDP even though it has a negative association with it. Answer (b) does not make sense either: the new variable would likely increase the correlation between the factor and GDP, but there is no free lunch, and you would have a noisier model. Answer (c) does not make sense either: because of the tradeoff mentioned before, there is no rule of thumb.
Therefore, more data is not always better and can increase forecast errors even when using dimensionality reduction techniques (Boivin & Ng, 2006). To check whether more data helps there are two popular options. One is to compute the RMSE of the model with and without the extra variables and see which one performs better. However, this is an out-of-sample exercise and tends to make the model overfit the out-of-sample data, which is not always a good idea.
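The RMSE comparison can be sketched as follows. This is a Python illustration with made-up numbers: the series `actual`, `without_cars` and `with_cars` are hypothetical out-of-sample values, not results from any estimated model.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equally long sequences."""
    errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(errors) / len(errors))


# Hypothetical GDP growth and forecasts from the two competing models
actual = [0.5, 0.3, -0.1, 0.4]
without_cars = [0.4, 0.1, 0.1, 0.5]   # model with the four original variables
with_cars = [0.5, 0.2, 0.0, 0.3]      # model that adds car registrations

rmse_without = rmse(actual, without_cars)
rmse_with = rmse(actual, with_cars)
print(rmse_without, rmse_with)
```

In this invented example the extended model has the lower RMSE, so the extra variable would be kept; with real data the comparison can just as well go the other way.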
A more refined solution is hard thresholding (Bai & Ng, 2007), which consists of regressing the forecast variable on its lags and each individual indicator and selecting all indicators with an absolute t-statistic above a certain threshold. In this case, the threshold is obtained by comparing the out-of-sample performance of forecasts across a range of thresholds and choosing the threshold that delivers the lowest forecast errors.
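The selection step can be sketched in Python as follows. This is only an illustration under simplifying assumptions: one lag of the target, a fixed threshold of 1.65 instead of one chosen by out-of-sample comparison, and simulated data. The helper names (`invert`, `t_stat_last`, `hard_threshold`) and the series `cars` and `noise` are hypothetical.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_stat_last(y, X):
    """OLS of y on the columns of X; t-statistic of the last coefficient."""
    n, k = len(y), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (n - k)
    return beta[-1] / math.sqrt(s2 * inv[-1][-1])


def hard_threshold(target, indicators, threshold=1.65):
    """Keep the indicators whose |t| exceeds the threshold in a regression of
    the target on a constant, its own first lag and the indicator."""
    y = target[1:]
    selected = []
    for name, series in indicators.items():
        X = [[1.0, target[t - 1], series[t]] for t in range(1, len(target))]
        if abs(t_stat_last(y, X)) > threshold:
            selected.append(name)
    return selected


# Simulated data: "cars" drives the target, "noise" is unrelated to it
random.seed(0)
n = 200
cars = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]
gdp = [0.0]
for t in range(1, n):
    gdp.append(0.3 * gdp[t - 1] + 1.0 * cars[t] + 0.1 * random.gauss(0, 1))

selected = hard_threshold(gdp, {"cars": cars, "noise": noise})
print(selected)
```

The informative indicator passes the threshold by a wide margin. An unrelated series can still pass by chance, which is one reason the threshold itself is tuned on out-of-sample performance.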
Solution
The soft indicator is often released in annual growth rates. This implies that it is related to the level of activity in period t minus the level of activity in period t-12. Writing \(F_t\) for the level of activity and \(f_t = (1-L)F_t\) for its monthly growth, we can define the annual activity using the lag operator as:
\[Annual \quad Activity = (1-L^{12})F_t =(1-L) (1 +L + L^2 + ...+ L^{11})F_t= f_t + f_{t-1}+ ...+ f_{t-11}\]
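The identity behind this expression, that a twelve-month change is the sum of the last twelve monthly changes, can be verified numerically. The Python sketch below uses an arbitrary integer series as the level of activity; the series and the function names are hypothetical.

```python
# Hypothetical level of activity, F_t, for t = 0, ..., 24
levels = [100 + t * t for t in range(25)]


def annual_change(t):
    """Level at t minus level twelve months earlier."""
    return levels[t] - levels[t - 12]


def sum_of_monthly_changes(t):
    """f_t + f_{t-1} + ... + f_{t-11}, with f_s = F_s - F_{s-1}."""
    return sum(levels[s] - levels[s - 1] for s in range(t - 11, t + 1))


for t in range(12, 25):
    assert annual_change(t) == sum_of_monthly_changes(t)
print(annual_change(24), sum_of_monthly_changes(24))
```

The intermediate levels cancel out in the sum, so both sides coincide for every month with at least twelve months of history.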
Therefore the first variable, a monthly series released in annual growth rates, can be expressed as:
\[y_{1t}= \gamma_1(f_t + f_{t-1}+ ...+ f_{t-11})+\epsilon_{1t}\]
For the second variable, which is expressed in quarterly g
Solution
In the KF you impose that the dynamics of the data follow a certain process, while in PCA you do not have this capacity. In a sense, both techniques extract an unobserved component, and this component is dynamic, but its dynamics are “set” by the model in the KF, while in PCA there is no control over them.
Solution
If our monthly factor is related to GDP, cement consumption will also be related to the monthly factor, with a three-month lead. The state equation would look like:
\[ \begin{bmatrix} f_{t+3}\\ f_{t+2}\\ \vdots\\ f_{t-4}\\ \\ \\ \\ \\ \\ \end{bmatrix} = \begin{bmatrix} \end{bmatrix} \begin{bmatrix} f_{t+2}\\ f_{t+1}\\ \vdots\\ f_{t-5}\\ \\ \\ \\ \\ \\ \end{bmatrix} + error\quad matrix \]
Solution
The estimated density does not necessarily need to be a mixture of normals. This is a possible situation, but it will not always be true because the MC does not imply normality.
We can translate the model from this form:
\[ \begin{align*} y_t &= B_tx_t +w_t \quad Observation \quad Eq. \\ B_t &= FB_{t-1} + v_t \quad \quad \quad \quad State\quad Eq. \end{align*} \]
To the usual reduced form of the KF by treating the time-varying coefficient \(B_t\) as the unobserved state \(h_t\):
\[ \begin{align*} y_t &= A'x_t + H'h_t +w_t \quad Observation \quad Eq. \\ h_t &= Fh_{t-1} + v_t \quad \quad \quad \quad State\quad Eq. \end{align*} \]
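To illustrate the translation, here is a minimal Python sketch for the scalar case. It assumes a single regressor, so the state is \(h_t = B_t\), the loading changes every period (\(H_t = x_t\)) and there is no separate exogenous term (\(A = 0\)). The function name `track_coefficient` and all numerical values are hypothetical.

```python
def track_coefficient(ys, xs, F=1.0, Q=0.01, R=0.1, b0=0.0, p0=10.0):
    """Filter a time-varying coefficient B_t in y_t = B_t * x_t + w_t.

    The state is h_t = B_t and the loading at t is the regressor x_t.
    Returns the path of filtered estimates of B_t.
    """
    b, p = b0, p0
    path = []
    for y, x in zip(ys, xs):
        # Prediction of the state and of its variance
        b_pred = F * b
        p_pred = F * p * F + Q
        # Forecast error, its variance, and the Kalman gain with H_t = x_t
        eta = y - x * b_pred
        s = x * p_pred * x + R
        k = p_pred * x / s
        # Update
        b = b_pred + k * eta
        p = p_pred - k * x * p_pred
        path.append(b)
    return path


# Hypothetical data generated with a constant coefficient equal to 2
xs = [1.0, 2.0] * 10
ys = [2.0 * x for x in xs]
path = track_coefficient(ys, xs)
print(path[0], path[-1])
```

Starting from a diffuse guess of zero, the filtered coefficient jumps close to 2 after the first observation and keeps approaching it afterwards.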