Econometrics Final Review
For a multiple variable regression model:
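- Given \(n\), SSE, SSR, SST and \((X'X)^{-1}\), calculate the confidence interval and t-statistic for a particular beta. Recall that \(s^2\) here is \(SSE/(n-k)\), where \(k\) is the number of betas including the intercept. The estimated covariance matrix of the betas is \(s^2(X'X)^{-1}\), so \(Var(\beta_j)\) is \(s^2\) times the \(j\)-th diagonal element of \((X'X)^{-1}\).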
Example: \(n = 48\), \(SSE = 189050\), \(SSR = 399316\), \(SST = 588366\), and \[(X'X)^{-1} = \begin{bmatrix} 7.8301941 & 0.4265133 & 0.0001495 & -0.0000753 & 6.1107646 \\ 0.4265133 & 0.0382636 & 0.0000059 & 0.0000057 & 0.2215811 \\ 0.0001495 & 0.0000059 & 0.0000001 & 0 & 0.0001450 \\ 0.0000753 & 0.0000057 & 0 & 0 & 0.0000413 \\ 6.1107646 & 0.2215811 & 0.0001450 & 0.0000413 & 8.4108914 \end{bmatrix}\]
The fitted model is \(\hat y = 377.29 - 34.79x_2-.07x_3-.0024x_4+1336.45x_5\)
Calculate the confidence interval and t-statistic for \(\beta_2\) and \(\beta_5\):
\(\beta_2 = -34.79\). First we need \(s^2 = {SSE\over{n-k}} = {189050\over{48 - 5}} = 4396.51\). Next, to get \(s_{\beta_2}\), we multiply \(s^2\) by the diagonal element of the inverse matrix for that beta (here the second one, 0.0382636) and take the square root: \(s_{\beta_2} = \sqrt{s^2\times\textrm{diagonal}} = \sqrt{4396.51 \times 0.0382636} = 12.97\). Now we find the critical t value for \(n-k=48-5=43\) degrees of freedom (area in two tails 0.05 for 95% confidence): \(t = 2.017\).
The 95% confidence interval for \(\beta_2\) is \(\beta_2\pm t\times s_{\beta_2} = -34.79 \pm 2.017(12.97)\), i.e. approximately \((-60.95, -8.63)\).
\(\beta_5 = 1336.45\) and \(s_{\beta_5} = \sqrt{s^2\times\textrm{diagonal}} = \sqrt{4396.51 \times 8.4108914} = 192.298\). The 95% confidence interval for \(\beta_5\) is \(\beta_5\pm t\times s_{\beta_5} = 1336.45 \pm 2.017(192.298)\), i.e. approximately \((948.58, 1724.32)\).
The t-statistic for each is \({\beta_j\over{s_{\beta_j}}}\), so for \(\beta_2\) it is \({-34.79\over12.97} = -2.68\) and for \(\beta_5\) it is \({1336.45\over{192.298}} = 6.95\). Both exceed the critical value in absolute value (\(-2.68 < -2.017\) and \(6.95 > 2.017\)), so both coefficients are statistically significant at the 5% level.
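The arithmetic in this example can be reproduced with a short Python sketch. It hard-codes the example's numbers, takes the critical value 2.017 from the t table instead of computing it, and the helper name `coef_inference` is hypothetical; it is meant as a calculator check, not a general regression routine.

```python
import math

def coef_inference(beta, c_jj, s2, t_crit):
    """Standard error, confidence interval and t-statistic for one coefficient.

    beta:   estimated coefficient
    c_jj:   matching diagonal element of (X'X)^-1
    s2:     SSE / (n - k)
    t_crit: critical t value with n - k degrees of freedom
    """
    se = math.sqrt(s2 * c_jj)
    return se, (beta - t_crit * se, beta + t_crit * se), beta / se

# Numbers from the worked example (n = 48, k = 5).
n, k, sse = 48, 5, 189050.0
s2 = sse / (n - k)
t_crit = 2.017  # t table: 43 d.f., 0.05 in two tails

for name, beta, c_jj in [("beta_2", -34.79, 0.0382636),
                         ("beta_5", 1336.45, 8.4108914)]:
    se, (lo, hi), t_stat = coef_inference(beta, c_jj, s2, t_crit)
    print(f"{name}: se = {se:.3f}, CI = ({lo:.2f}, {hi:.2f}), t = {t_stat:.2f}")
```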
- Calculate the adjusted \(R^2\).
\(R^2 = {SSR\over SST} = {399316\over588366} = .6787\)
Adjusted \(R^2 = 1 - {{SSE/(n-k)}\over{SST/(n-1)}} = 1 - {{189050/43}\over{588366/47}} = .6488\)
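A minimal Python sketch of both formulas, using the example's SSE, SSR, SST, \(n\) and \(k\) (the function names are illustrative):

```python
def r_squared(ssr, sst):
    """R^2 = SSR / SST."""
    return ssr / sst

def adjusted_r_squared(sse, sst, n, k):
    """1 - [SSE/(n-k)] / [SST/(n-1)], with k counting the intercept."""
    return 1 - (sse / (n - k)) / (sst / (n - 1))

print(round(r_squared(399316, 588366), 4))                  # R^2
print(round(adjusted_r_squared(189050, 588366, 48, 5), 4))  # adjusted R^2
```

Note that adjusted \(R^2\) divides each sum of squares by its degrees of freedom, which is why it comes out below the plain \(R^2\).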
- Calculate the F-test that all slope betas (every beta except the intercept) = 0.
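\(F = {{SSR/(k-1)}\over{SSE/(n-k)}}={{399316/4}\over{189050/43}} = 22.706\). Compare with the 5% critical value from the F table, column \(k-1 = 4\), row \(n-k = 43\): 2.606 (the value for row 40, the nearest tabulated row). Since 22.706 > 2.606, we reject the hypothesis that all slope betas are zero.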
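The overall F-test reduces to one line of arithmetic. This sketch assumes the critical value 2.606 has already been read from the F table, as in the example; `overall_f` is an illustrative name.

```python
def overall_f(ssr, sse, n, k):
    """F = [SSR/(k-1)] / [SSE/(n-k)] for H0: all slope betas are zero."""
    return (ssr / (k - 1)) / (sse / (n - k))

f_stat = overall_f(399316, 189050, n=48, k=5)
f_crit = 2.606  # value read from the F table in the example
print(f"F = {f_stat:.3f}, reject H0: {f_stat > f_crit}")
```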
- Calculate a partial F-test to test whether additional betas add significance to the model. If the F-statistic exceeds the critical value from the F table, the additional betas are jointly significant.
Example: the reduced model has 3 betas while the full model has 5; \(SSE_\textrm{full} = 3798\), \(SSE_\textrm{reduced} = 7505\), \(n = 45\).
In the following equation, \(R = \textrm{reduced model}\), \(UR = \textrm{full model}\), \(k = \textrm{number of betas in the full model}\), and \(q = \Delta \textrm{number of betas} = k_{UR} - k_R\): \[F_{q,n-k}={{(SSE_R-SSE_{UR})/q}\over{SSE_{UR}/(n-k)}} = {{(7505-3798)/2}\over{3798/40}} = 19.52\] Compare with \(F_{2,40}\) (F table, row 40, column 2) = 3.2317. Since the F-statistic is greater than the table value, the result is statistically significant: the two additional betas jointly add significance to the model.
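A sketch of the partial F computation in Python, using the figures listed in the example (\(SSE_\textrm{reduced} = 7505\), \(SSE_\textrm{full} = 3798\), \(n = 45\)) and the tabulated critical value 3.2317; `partial_f` is a hypothetical helper name.

```python
def partial_f(sse_reduced, sse_full, n, k_full, k_reduced):
    """[(SSE_R - SSE_UR)/q] / [SSE_UR/(n - k_full)] with q = k_full - k_reduced."""
    q = k_full - k_reduced
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - k_full))

f_stat = partial_f(sse_reduced=7505, sse_full=3798, n=45, k_full=5, k_reduced=3)
f_crit = 3.2317  # F table, 2 and 40 d.f.
print(f"partial F = {f_stat:.2f}, significant: {f_stat > f_crit}")
```

The denominator always uses the full model's SSE and degrees of freedom; only the numerator compares the two models.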
- Know how to use a dummy variable in the model for non-numeric data. Know how to implement the model when the dummy variable adds to the intercept and when it adds to the slope parameter.
To implement, we use 1 to mean yes and 0 to mean no. We can add the dummy variable so that it changes only the y-intercept, in which case the model looks like this:
\[y = \beta_0+\beta_1x_1+\beta_2D_1\] In this case, if \(D_1 = 1\), the only thing that changes is the y-intercept, because \(\beta_2\) is added to \(\beta_0\). If we want to change the slope parameter instead, the model looks as follows:
\[y=\beta_0+\beta_1x_1+\beta_2x_1D_1\] where the third term changes the slope of \(x_1\) from \(\beta_1\) to \(\beta_1+\beta_2\) when \(D_1 = 1\).
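The difference between the two specifications shows up clearly in a quick numeric comparison. This Python sketch uses made-up coefficients (\(\beta_0 = 10\), \(\beta_1 = 2\), \(\beta_2 = 5\)), so the numbers are illustrative only: with the intercept dummy, the gap between \(D_1 = 1\) and \(D_1 = 0\) is the same at every \(x_1\); with the slope dummy, the gap grows with \(x_1\).

```python
def intercept_shift(x1, d1, b0, b1, b2):
    # y = b0 + b1*x1 + b2*D1: D1 = 1 moves the intercept to b0 + b2
    return b0 + b1 * x1 + b2 * d1

def slope_shift(x1, d1, b0, b1, b2):
    # y = b0 + b1*x1 + b2*x1*D1: D1 = 1 changes the slope to b1 + b2
    return b0 + b1 * x1 + b2 * x1 * d1

b = (10.0, 2.0, 5.0)  # hypothetical coefficients (b0, b1, b2)
for x1 in (0.0, 1.0, 2.0):
    gap_i = intercept_shift(x1, 1, *b) - intercept_shift(x1, 0, *b)
    gap_s = slope_shift(x1, 1, *b) - slope_shift(x1, 0, *b)
    print(f"x1={x1}: intercept-shift gap={gap_i}, slope-shift gap={gap_s}")
```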
- Be familiar with the various diagnostic issues in the regression model. We discussed multicollinearity, heteroscedasticity, and autocorrelation. Know what each means, what indicates each, and how the residual plots indicate them.
multicollinearity: the independent variables (e.g. \(x_1\) and \(x_2\)) are highly correlated. Indicator: the individual t-statistics for the betas are low even though the overall \(R^2\) is high.
heteroscedasticity: the variance of the error terms is not constant. Indicator: in the residual plot, the residuals fan out or funnel in instead of forming a horizontal band of constant width.
autocorrelation: the error terms are not independent. In a plot of the residuals over time, positive autocorrelation produces a cyclical pattern (runs of residuals with the same sign), while negative autocorrelation produces an alternating pattern.
- Know how to calculate an n-period moving average for the one-step, two-step, etc. forecast.
For a 3-period moving average, the one-step-ahead forecast is simply the average of the last three observations: \[\hat y_{t+1} = {{y_t+y_{t-1}+y_{t-2}}\over{3}}\] The two-step-ahead forecast uses the forecasted value \(\hat y_{t+1}\) in the average: \[\hat y_{t+2} = {{\hat y_{t+1}+y_{t}+y_{t-1}}\over{3}}\]
- Given alpha, the current smoothed value and the latest observation, use simple exponential smoothing to calculate the one-step, two-step, etc. forecast. (They are the same!) Know the concept of how alpha is selected: it is the value that minimizes the historical sum of squared forecast errors (SSE).
So the current smoothed value is \(S(t) = \alpha y_{t} + (1 - \alpha)S(t-1)\). To calculate the one-step forecast, shift the formula forward one period, \(S(t+1) = \alpha y_{t+1} + (1-\alpha)S(t)\); since \(y_{t+1}\) has not been observed yet, we replace it with its forecast \(S(t)\), so it is:
\(S(t+1) = \alpha S(t)+(1-\alpha)S(t) = S(t)\), which means the forecast stays the same. The same reasoning applies to every later step: \(S(t+2) = \alpha S(t+1) +(1 - \alpha) S(t+1) = S(t+1)\), and so forth.
So, for example, if the previous smoothed value is \(S(t-1) = 5\), the latest observation is \(y_t = 10\), and \(\alpha = .4\), the current smoothed value is \(S(t) = \alpha y_{t} + (1 - \alpha)S(t-1) = .4\times 10 + .6 \times 5 = 7\). In that case, \(S(t+1) = \alpha S(t)+(1-\alpha)S(t) = .4 \times 7 + .6 \times 7 = 7\) and \(S(t+2) = \alpha S(t+1) +(1 - \alpha) S(t+1) = 7\).
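Both forecasting rules fit in a few lines of Python. The smoothing call reproduces the example (\(\alpha = .4\), \(S(t-1) = 5\), \(y_t = 10\)); the moving-average call uses a hypothetical series, and the function names are illustrative.

```python
def moving_average_forecasts(history, n_periods=3, steps=2):
    """One-step, two-step, ... forecasts from an n-period moving average.

    Each forecast is appended to the series and used in the next average,
    as in the two-step moving-average formula.
    """
    values = list(history)
    forecasts = []
    for _ in range(steps):
        f = sum(values[-n_periods:]) / n_periods
        forecasts.append(f)
        values.append(f)  # later steps treat the forecast as if it were data
    return forecasts

def ses_forecasts(alpha, prev_smoothed, latest_obs, steps=2):
    """Simple exponential smoothing: every horizon's forecast equals S(t)."""
    s_t = alpha * latest_obs + (1 - alpha) * prev_smoothed
    return [s_t] * steps

print(ses_forecasts(0.4, 5, 10))               # smoothing example from the notes
print(moving_average_forecasts([4, 6, 8, 10]))  # hypothetical data
```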
The higher \(\alpha\) is, the faster older observations are dampened out and the more weight is given to recent observations. The best \(\alpha\) is the one that minimizes the sum of squared one-step-ahead forecast errors over the past observations.
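One way to pick \(\alpha\) is a simple grid search over the sum of squared one-step-ahead errors. This is a sketch under two assumptions not in the notes: the smoothed value starts at the first observation (textbooks differ on initialization; some use an average of early observations), and the series is made up.

```python
def ses_sse(series, alpha):
    """Sum of squared one-step-ahead errors for simple exponential smoothing.

    Assumption: the smoothed value is initialised at the first observation.
    """
    s = series[0]
    sse = 0.0
    for y in series[1:]:
        sse += (y - s) ** 2                # forecast of y was the previous S
        s = alpha * y + (1 - alpha) * s    # update S with the new observation
    return sse

def best_alpha(series, grid=None):
    """Grid search for the alpha with the smallest historical SSE."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
    return min(grid, key=lambda a: ses_sse(series, a))

data = [12, 15, 14, 16, 19, 18, 21, 20]  # hypothetical series
alpha = best_alpha(data)
print(alpha, ses_sse(data, alpha))
```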
- Know how a seasonal linear trend model is implemented: multiply the trend forecast \(\hat y\) by the seasonal factor for that period.
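A minimal sketch of a multiplicative seasonal linear trend forecast, assuming a trend line \(b_0 + b_1 t\), seasonal factors listed in season order, and \(t = 1\) falling in the first season. All numbers are hypothetical.

```python
def seasonal_trend_forecast(b0, b1, t, seasonal_factors):
    """Multiplicative seasonal linear trend: (b0 + b1*t) * factor for t's season.

    Assumption: t = 1, 2, 3, ... and t = 1 falls in the first season.
    """
    period = len(seasonal_factors)
    trend = b0 + b1 * t
    return trend * seasonal_factors[(t - 1) % period]

quarterly = [0.9, 1.1, 1.2, 0.8]  # hypothetical seasonal factors
for t in range(5, 9):
    print(t, seasonal_trend_forecast(100.0, 2.0, t, quarterly))
```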
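- Know how to recognize an autocorrelation of zero using Bartlett's test, i.e. the sample autocorrelation is treated as zero if its absolute value is less than \(2/\sqrt{T}\), where \(T\) is the number of observations.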
- Know how to recognize a white noise process using the Box-Pierce test: \(Q = T\sum_{j=1}^{k} r_j^2\) is compared with a chi-square distribution with \(k\) d.f.
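The two checks above can be sketched in Python: `sample_acf` computes \(r_1, \dots, r_k\), `bartlett_is_zero` applies the \(2/\sqrt{T}\) rule (with `n_obs` standing for \(T\)), and `box_pierce_q` forms \(Q = T\sum r_j^2\). The function names and the alternating series are hypothetical, and the chi-square critical value (12.592 for 6 d.f. at the 5% level) is taken from a table rather than computed.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def bartlett_is_zero(r_k, n_obs):
    """Treat r_k as zero when |r_k| < 2 / sqrt(T)."""
    return abs(r_k) < 2 / math.sqrt(n_obs)

def box_pierce_q(series, max_lag):
    """Q = T * sum of squared sample autocorrelations up to max_lag."""
    return len(series) * sum(r * r for r in sample_acf(series, max_lag))

series = [1.0, -1.0] * 10  # hypothetical, strongly alternating data
r = sample_acf(series, 6)
print([bartlett_is_zero(rk, len(series)) for rk in r])
q = box_pierce_q(series, 6)
chi2_crit = 12.592  # chi-square table, 6 d.f., 5% level
print(f"Q = {q:.2f}, white noise rejected: {q > chi2_crit}")
```

For checking a fitted ARIMA model, the same \(Q\) can be computed from the model's residuals and compared with a chi-square value with \(k-p-q\) d.f.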
- Know how to use ARIMA models to compute one-step, two-step, etc. ahead forecasts given the parameters.
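As an illustration of forecasting with given parameters, here is a sketch for two simple cases: an AR(1) with a constant, \(y_t = \delta + \phi y_{t-1} + \varepsilon_t\), and an MA(1) written with the Box-Jenkins sign convention, \(y_t = \mu + \varepsilon_t - \theta\varepsilon_{t-1}\) (some textbooks use a plus sign). The parameter values and function names are hypothetical. Future errors are replaced by their expected value, zero.

```python
def ar1_forecasts(delta, phi, y_last, steps):
    """AR(1): y_t = delta + phi * y_{t-1} + e_t.

    Each forecast feeds the next: y_hat(t+h) = delta + phi * y_hat(t+h-1).
    """
    forecasts, prev = [], y_last
    for _ in range(steps):
        prev = delta + phi * prev
        forecasts.append(prev)
    return forecasts

def ma1_forecasts(mu, theta, last_residual, steps):
    """MA(1): y_t = mu + e_t - theta * e_{t-1} (Box-Jenkins sign convention).

    Future errors are set to 0, so only the one-step forecast uses the
    last residual; forecasts further ahead equal mu.
    """
    return [mu - theta * last_residual] + [mu] * (steps - 1)

print(ar1_forecasts(delta=2.0, phi=0.6, y_last=10.0, steps=3))
print(ma1_forecasts(mu=50.0, theta=0.4, last_residual=5.0, steps=3))
```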
- Know how to analyze the adequacy of the model using the Box-Pierce test on the residuals: chi-square with \(k-p-q\) d.f. The residuals should resemble a white noise process.
- For an AR(1) and any MA(q) model, know how to calculate the prediction interval for one-step, two-step, etc. ahead forecasts.
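For the prediction intervals, a sketch using the standard forecast-error variances: \(\sigma^2(1 + \phi^2 + \dots + \phi^{2(h-1)})\) for an AR(1) and \(\sigma^2(1 + \theta_1^2 + \dots + \theta_{h-1}^2)\) for an MA(q), where the sum stops at \(\theta_q\). It assumes \(\sigma\) is the (estimated) standard deviation of the white-noise errors and uses the normal value 1.96 for 95% coverage; a course that uses t values would swap that constant. The inputs and function names are hypothetical.

```python
import math

def ar1_interval(point, sigma, phi, h, z=1.96):
    """Prediction interval for an h-step AR(1) forecast."""
    var_factor = sum(phi ** (2 * j) for j in range(h))   # 1 + phi^2 + ... + phi^(2(h-1))
    half_width = z * sigma * math.sqrt(var_factor)
    return point - half_width, point + half_width

def ma_interval(point, sigma, thetas, h, z=1.96):
    """Prediction interval for an h-step MA(q) forecast; thetas = [theta_1, ..., theta_q]."""
    var_factor = 1 + sum(t * t for t in thetas[:h - 1])  # stops growing once h - 1 >= q
    half_width = z * sigma * math.sqrt(var_factor)
    return point - half_width, point + half_width

# Hypothetical values: sigma = 1.5, phi = 0.6, theta_1 = 0.4
print(ar1_interval(8.0, 1.5, 0.6, h=1), ar1_interval(6.8, 1.5, 0.6, h=2))
print(ma_interval(48.0, 1.5, [0.4], h=1), ma_interval(50.0, 1.5, [0.4], h=2))
```

The AR(1) interval keeps widening with \(h\) (toward a limit when \(|\phi| < 1\)), while the MA(q) interval stops widening after \(q\) steps.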
- Know the concept of non-stationarity and how to fix it (differencing).
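Differencing is easy to sketch; this Python helper (a hypothetical name) applies \(d\) rounds of \(w_t = y_t - y_{t-1}\). On a made-up linear trend, one difference leaves a constant series.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing: w_t = y_t - y_{t-1}."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series

trend = [3 + 2 * t for t in range(6)]  # hypothetical linear trend
print(difference(trend))                # constant: the trend is removed
print(difference([1, 3, 6, 10, 15], d=2))
```

Each round of differencing shortens the series by one observation.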
- Know the concept of a white noise process versus a random walk model: white noise has no memory, while a random walk has infinite memory.
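The memory difference can be seen by building a random walk from white-noise shocks. This sketch draws hypothetical Gaussian shocks with Python's `random` module (fixed seed); each random-walk value is the running sum of every shock so far, and differencing the walk gives the shocks back. The function names are illustrative.

```python
import random

def white_noise(n, sigma=1.0, seed=0):
    """Independent N(0, sigma^2) shocks: nothing carries over between periods."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(n)]

def random_walk(shocks, start=0.0):
    """y_t = y_{t-1} + e_t: every past shock stays in the level forever."""
    path, level = [], start
    for e in shocks:
        level += e
        path.append(level)
    return path

e = white_noise(100)
y = random_walk(e)
# Differencing the random walk recovers the white-noise shocks.
recovered = [b - a for a, b in zip(y, y[1:])]
print(max(abs(r - s) for r, s in zip(recovered, e[1:])))
```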
- Know conceptually the difference between the ACF and PACF.