Econometrics Final Review
For a multiple variable regression model:
- Given n, SSE, SSR, SST and \((X'X)^{-1}\), calculate the confidence interval and t-statistic for a particular beta. Recall that \(s^2 = SSE/(n-k)\), where \(k\) is the number of betas including the intercept, and \(\textrm{Var}(\hat\beta) = s^2 (X'X)^{-1}\).
Example: \(n = 48\), \(SSE = 189050\), \(SSR = 399316\), \(SST = 588366\), and \[(X'X)^{-1} = \begin{bmatrix} 7.8301941 & 0.4265133 & 0.0001495 & -0.0000753 & 6.1107646 \\ 0.4265133 & 0.0382636 & 0.0000059 & 0.0000057 & 0.2215811 \\ 0.0001495 & 0.0000059 & 0.0000001 & 0 & 0.0001450 \\ 0.0000753 & 0.0000057 & 0 & 0 & 0.0000413 \\ 6.1107646 & 0.2215811 & 0.0001450 & 0.0000413 & 8.4108914 \end{bmatrix}\]
Model is \(y = 377.29 - 34.79x_2-.07x_3-.0024x_4+1336.45x_5\)
Calculate the confidence interval and t-statistic for \(\beta_2\) and \(\beta_5\):
\(\beta_2 = -34.79\). First we need \(s^2 = {SSE\over{n-k}} = {189050\over{48 - 5}} = 4396.51\). Then, to get \(s_{\beta_2}\), we multiply \(s^2\) by the diagonal entry of the inverse matrix for that beta and take the square root. In this case \(s_{\beta_2} = \sqrt{s^2\times(X'X)^{-1}_{2,2}} = \sqrt{ 4396.51 \times 0.0382636} = 12.97\). Now we find the critical t value for \(n-k=48-5=43\) degrees of freedom (area in 2 tails: 0.05 for 95% confidence): \(t = 2.017\)
Confidence interval for \(\beta_2\): \(\beta_2\pm t\times s_{\beta_2} = -34.79 \pm 2.017(12.97)\)
\(\beta_5 = 1336.45\), \(s_{\beta_5} = \sqrt{s^2\times(X'X)_{5,5}^{-1}} = \sqrt{ 4396.51 \times 8.4108914} = 192.298\). Confidence interval for \(\beta_5\): \(\beta_5\pm t\times s_{\beta_5} = 1336.45 \pm 2.017(192.298)\)
The t-statistic for each is \({\beta\over{s_{\beta}}}\), so for \(\beta_2\) the t-statistic is \({-34.79\over12.97} = -2.68\) and for \(\beta_5\) it is \({1336.45\over{192.298}} = 6.95\). Both exceed the critical value of 2.017 in absolute value, so both coefficients are statistically significant.
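As a sanity check, the same calculation can be scripted (a minimal sketch assuming numpy and scipy are available; the numbers are the ones from the example above):

```python
import numpy as np
from scipy import stats

n, k = 48, 5                     # observations and number of betas (incl. intercept)
SSE = 189050
s2 = SSE / (n - k)               # s^2 = SSE/(n-k)

# diagonal entries of (X'X)^{-1} and coefficients for beta_2 and beta_5
diag = {"beta2": 0.0382636, "beta5": 8.4108914}
coef = {"beta2": -34.79, "beta5": 1336.45}

t_crit = stats.t.ppf(0.975, df=n - k)   # two-tailed 95% critical value (~2.017)

for name in ("beta2", "beta5"):
    se = np.sqrt(s2 * diag[name])       # standard error of the coefficient
    t_stat = coef[name] / se            # t-statistic for H0: beta = 0
    lo, hi = coef[name] - t_crit * se, coef[name] + t_crit * se
    print(f"{name}: se={se:.2f}, t={t_stat:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```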
- Calculate the adjusted R2.
\(R^2 = {SSR\over SST} = {399316\over588366} = .6787\)
Adjusted \(R^2 = 1 - {{SSE/(n-k)}\over{SST/(n-1)}} = 1 - {{189050/43}\over{588366/47}} = .6488\)
- Calculate the F-test that all betas = 0.
\(F = {{SSR/(k-1)}\over{SSE/(n-k)}}={{399316/4}\over{189050/43}} = 22.706\). Compare with the F-table value for row \(n-k\) (43) and column \(k-1\) (4): 2.606. Since 22.706 > 2.606, we reject the hypothesis that all slope betas are zero.
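A sketch of the \(R^2\), adjusted \(R^2\), and overall F-test computations, reusing the example's sums of squares (scipy is only used to look up the critical value):

```python
from scipy import stats

n, k = 48, 5
SSE, SSR, SST = 189050, 399316, 588366

r2 = SSR / SST                                    # ~0.6787
adj_r2 = 1 - (SSE / (n - k)) / (SST / (n - 1))    # ~0.6488
F = (SSR / (k - 1)) / (SSE / (n - k))             # ~22.706
F_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)  # critical value for (k-1, n-k) d.f.

print(r2, adj_r2, F, F > F_crit)
```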
- Calculate a partial F-test to test whether additional betas add significance to the model. If the value exceeds the number on the table then this indicates significance.
Example: the reduced model has 3 betas while the full model has 5. \(SSE_\textrm{full} = 3798\), \(SSE_\textrm{reduced} = 7505\), \(n = 45\).
In the following equation \(R = \textrm{reduced model}\), \(UR = \textrm{full model}\), \(k = \textrm{number of betas in the full model}\), \(q = \Delta \textrm{number of betas} = k_{UR} - k_R\): \[F_{q,n-k}={{(SSE_R-SSE_{UR})/q}\over{SSE_{UR}/(n-k)}} = {{(7505-3798)/2}\over{3798/40}} = 19.52\] Compare with \(F_{2,40}\) (F table, row 40, column 2) = 3.2317. Since the F-statistic is greater than the table value, the result is statistically significant.
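The partial F-test as a few lines of code (a sketch with the example's numbers; scipy supplies the \(F_{2,40}\) critical value):

```python
from scipy import stats

n = 45
k_full, k_reduced = 5, 3
q = k_full - k_reduced
SSE_full, SSE_reduced = 3798, 7505

F = ((SSE_reduced - SSE_full) / q) / (SSE_full / (n - k_full))   # ~19.5
F_crit = stats.f.ppf(0.95, dfn=q, dfd=n - k_full)                # F(2, 40) ~ 3.23
print(F, F > F_crit)   # True -> the extra betas add explanatory power
```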
- Know how to use a dummy variable in the model for non-numeric data. Know how to implement the model when the dummy variable adds to the intercept and when it adds to the slope parameter.
To implement, we use 1 to imply yes and 0 to imply no. We can either add the dummy variable so that it only changes the y-intercept, in which case our model looks like this:
\[y = \beta_0+\beta_1x_1+\beta_2D_1\] In this case if \(D_1 = 1\) the only thing that changes is the y-intercept, because the value of \(\beta_2\) is added to \(\beta_0\). If we want to change the slope parameter, the model will look as follows:
\[y=\beta_0+\beta_1x_1+\beta_2x_1D_1\] where the third term changes the slope of \(x_1\) by \(\beta_2\) when \(D_1 = 1\).
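A sketch of how the two designs look as regressor matrices (the data values here are made up purely for illustration):

```python
import numpy as np

x1 = np.array([10., 12., 15., 20.])
D1 = np.array([0, 1, 0, 1])          # 1 = "yes", 0 = "no"

# intercept shift: columns are [1, x1, D1]
X_intercept = np.column_stack([np.ones_like(x1), x1, D1])

# slope shift: columns are [1, x1, x1*D1]
X_slope = np.column_stack([np.ones_like(x1), x1, x1 * D1])

# either design is fit by ordinary least squares, e.g.
y = np.array([50., 80., 70., 120.])
betas, *_ = np.linalg.lstsq(X_intercept, y, rcond=None)
```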
- Be familiar with the various diagnostic issues in the regression model. We discussed multicollinearity, heteroscedasticity, and autocorrelation. Know what each means, what indicates each, and how the residual plots indicate them.
multicollinearity: when the independent variables (e.g. \(x_1\) and \(x_2\)) are highly correlated - individual t-statistics for the betas will be low even though the overall \(R^2\) is high
heteroscedasticity: the variance of the error terms is not constant - in the residual plot, the residuals fan out or funnel in (the spread should be roughly constant)
autocorrelation: the error terms are not independent - positive autocorrelation produces a cyclical pattern in the residuals, negative autocorrelation an alternating pattern
- Know how to calculate an n-period moving average for the one-step and two-step etc forecast.
one step ahead is simply an average: \[\hat y_{t+1} = {{y_t+y_{t-1}+y_{t-2}}\over{3}}\] two steps ahead uses the forecasted value \(\hat y_{t+1}\) in the average: \[\hat y_{t+2} = {{\hat y_{t+1}+y_{t}+y_{t-1}}\over{3}}\]
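A small sketch of the 3-period moving-average forecast; the series values are illustrative, and the two-step-ahead forecast reuses the one-step-ahead forecast as described above:

```python
import numpy as np

y = [102., 99., 103.]          # ..., y_{t-2}, y_{t-1}, y_t (illustrative values)
n = 3                          # window length

series = list(y)
for step in range(2):                      # one-step and two-step ahead
    forecast = np.mean(series[-n:])        # average of the last n values
    series.append(forecast)                # two-step ahead reuses the forecast

print(series[-2], series[-1])
```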
- Given alpha, the current smoothed value and the latest observation, use simple exponential smoothing to calculate the one-step and two-step etc forecast. (They are the same!) Know the concept of how alpha is selected: it's the value that minimizes the historical SSE (the sum of squared one-step forecast errors).
Current smoothed value \(S(t)\) is basically \(\alpha y_{t} + (1 - \alpha)S(t-1)\).
For example, if \(\alpha = .7\), latest observation \(y_t = 100\) and previous smoothed value \(S(t-1) = 95\), then \(S(t)\) will be:
\[S(t) = .7\times 100 + .3 \times 95 = 98.5\] then \(S(t+1) = \alpha S(t)+(1-\alpha)S(t) = .7\times 98.5 + .3\times 98.5 = 98.5\), which basically means it stays the same. The same goes for \(S(t+2)\) and so forth: \(S(t+2) = \alpha\times S(t+1) +(1 - \alpha)\times S(t+1) = S(t+1)\).
The higher \(\alpha\) is, the faster previous observations are damped out and the greater the credence given to more recent observations. The best \(\alpha\) is the one that minimizes the squared error over past observations.
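The worked smoothing example in code (a minimal sketch; the point is that all future forecasts equal the current smoothed value):

```python
alpha = 0.7
y_t, S_prev = 100.0, 95.0

S_t = alpha * y_t + (1 - alpha) * S_prev    # 98.5
# with no new observations, every future forecast equals S_t
forecast_1, forecast_2 = S_t, S_t
print(S_t, forecast_1, forecast_2)
```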
- Know how a seasonal linear trend model is implemented. Multiply the y-hat by the seasonal factor.
\(\hat{y}_t = TR_t \times SN_t \times CL_t \times IR_t\) where \(TR\) is the trend component, \(SN\) is the seasonal component, \(CL\) is the cyclical component, and \(IR\) is the irregular component. To find \(SN\), we first take a 12-month moving average of the monthly data; this gives us \(tr \times cl\). Then, dividing, \({y_t \over {\textrm{averaged data}}} = sn \times ir\). Next, we average \(sn \times ir\) for each calendar month over all years; this averages out the irregular component and we're left with \(sn\) for each month. The sum of the 12 \(sn\) values should be 12; if it isn't, we normalize by multiplying each by \(12\over \sum_{t=1}^{12}sn\).
To “deseasonalize” \(y_t\) we divide by \(sn\).
So the seasonal linear trend forecast is \(\hat y_t = (\alpha + \beta t) \times sn_{\textrm{month}}\)
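A rough sketch of the ratio-to-moving-average procedure on made-up monthly data (a textbook implementation would use a centered 2×12 moving average and handle the series ends more carefully):

```python
import numpy as np

# illustrative: 3 years of monthly data, a linear trend times a seasonal factor
months = np.arange(36)
true_sn = 1 + 0.2 * np.sin(2 * np.pi * months / 12)
y = (100 + 2 * months) * true_sn

# 12-month moving average ~ trend x cycle
tr_cl = np.convolve(y, np.ones(12) / 12, mode="valid")   # length 36 - 11 = 25
ratios = y[6:6 + len(tr_cl)] / tr_cl                     # sn x ir for interior months

# average sn x ir by calendar month to remove the irregular component
sn_ir_by_month = {m: [] for m in range(12)}
for i, r in enumerate(ratios):
    sn_ir_by_month[(i + 6) % 12].append(r)
sn = np.array([np.mean(sn_ir_by_month[m]) for m in range(12)])
sn *= 12 / sn.sum()                                      # normalize so the factors sum to 12

deseasonalized = y / sn[months % 12]                     # divide each y_t by its month's factor

# fit a linear trend to the deseasonalized data, then multiply back by sn
a, b = np.polyfit(months, deseasonalized, 1)[::-1]       # intercept a, slope b
y_hat_next = (a + b * 36) * sn[36 % 12]
```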
- Know how to recognize a correlation of zero using Bartlett test. i.e. if value less than 2/sqrt(t).
if \(|\hat\rho_k| < {2 \over \sqrt {T}}\) then we assume it is zero. \(\hat\rho_k\) is the estimated autocorrelation coefficient at lag \(k\): \[\hat\rho_k = {\sum_{t=1}^{T-k}{(y_t-\bar y)(y_{t+k}-\bar{y})}\over{\sum_{t=1}^T(y_t-\bar y)^2}}\]
- Know how to recognize a white noise process using Box-Pierce test. Chi-square with k d.f.
Bartlett only tests a single correlation. To test whether all correlations are zero:
\[Q = T\sum_{k=1}^K \hat \rho_k^2\] To test \(Q\), look on the chi-square table for \(K\) d.f. (where \(K\) is the number of lags tested). If \(Q > \textrm{critical value}\), we conclude that not all correlations are zero (i.e. the series is not white noise).
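A sketch of both checks on an illustrative series: Bartlett's \(2/\sqrt{T}\) rule for a single lag and the Box-Pierce \(Q\) for the first \(K\) lags (scipy supplies the chi-square critical value):

```python
import numpy as np
from scipy import stats

def autocorr(y, k):
    """Estimated autocorrelation rho_k of the series y at lag k."""
    d = np.asarray(y, dtype=float) - np.mean(y)
    return np.sum(d[:-k] * d[k:]) / np.sum(d ** 2)

y = np.random.default_rng(0).normal(size=200)   # illustrative white-noise series
T, K = len(y), 10

# Bartlett: treat an individual rho_k as zero if it is within 2/sqrt(T) of zero
print(abs(autocorr(y, 1)) < 2 / np.sqrt(T))

# Box-Pierce: test all K lags jointly against a chi-square with K d.f.
Q = T * sum(autocorr(y, k) ** 2 for k in range(1, K + 1))
print(Q > stats.chi2.ppf(0.95, df=K))   # True would mean "not white noise"
```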
- Know how to use ARIMA models to forecast one-step, two-step etc. ahead forecasts given the parameters.
The \(AR(p)\) model is:
\[y_t = \phi_1 y_{t-1}+\phi_2 y_{t-2}+\dots+\phi_p y_{t-p}+\delta+\epsilon_t\]
example:
Time series with \(n = 100\). \(y_{100} = 103\), \(y_{99} = 99\), and \(y_{98} = 102\). You model it with AR(1) and \(\epsilon_{100} = 6\), \(\epsilon_{99} = -7\), \(\epsilon_{98} = 5\). The parameters you use are \(\phi = .3\), \(\delta = 70\), and \(\sum\epsilon^2=9900\).
The AR(1) formula is: \(y_t = \phi y_{t-1} + \delta + \epsilon_t\)
so (with future error terms forecast as their expected value, zero):
\(y_{101} = .3\times103+70 = 100.9\)
\(y_{102} = .3 \times 100.9 + 70 = 100.27\)
\(y_{103} = .3 \times 100.27 + 70 = 100.081\)
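The same AR(1) forecasts as a loop (each step feeds the previous forecast back in, with future errors set to zero):

```python
phi, delta = 0.3, 70.0
y_hat = 103.0                       # y_100 from the example

forecasts = []
for step in range(3):               # 1-, 2-, and 3-step-ahead forecasts
    y_hat = phi * y_hat + delta     # future errors are forecast as zero
    forecasts.append(y_hat)

print(forecasts)   # [100.9, 100.27, 100.081]
```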
The \(MA(q)\) model is:
\[y_t = \mu+\epsilon_t - \theta_1 \epsilon_{t-1}-\theta_2\epsilon_{t-2}-\dots-\theta_q\epsilon_{t-q}\] Example: same series as above. The MA(2) parameters are, for example: \(\mu = 100\), \(\theta_1 = .3\), \(\theta_2 = -.2\), and \(\sum\epsilon^2=9800\).
The MA(2) model formula is: \(y_t = \mu + \epsilon_t - \theta_1\epsilon_{t-1} - \theta_2\epsilon_{t-2}\)
so:
\(y_{101} = 100 - .3 \times 6 - (-.2) \times (-7) = 96.8\)
\(y_{102} = 100 - .3 \times 0 - (-.2) \times 6 = 101.2\)
\(y_{103} = 100 - .3 \times 0 -(-.2) \times 0 = 100\)
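The MA(2) forecasts in code (a small sketch; the known residuals are used for the first two steps and future errors are set to zero):

```python
mu, theta1, theta2 = 100.0, 0.3, -0.2
eps = {100: 6.0, 99: -7.0}          # last known residuals; future ones are zero

y_101 = mu - theta1 * eps[100] - theta2 * eps[99]   # 96.8
y_102 = mu - theta1 * 0.0 - theta2 * eps[100]       # 101.2
y_103 = mu                                          # all lagged errors are zero: 100.0
print(y_101, y_102, y_103)
```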
The mixed model formula \(ARMA(p, q)\) basically combines the two above models:
\[y_t = \phi_1 y_{t-1}+\phi_2 y_{t-2}+\dots+\phi_p y_{t-p}+\delta+\epsilon_t - \theta_1 \epsilon_{t-1}-\theta_2\epsilon_{t-2}-\dots-\theta_q\epsilon_{t-q}\]
- Know how to analyze adequacy of the model using Box-Pierce test. Chi-square with (k-p-q) d.f. Residuals should resemble a white noise process.
This is the same statistic as in the Box-Pierce white-noise test above, but now the \(\rho\)s are computed from the residuals, not from the series itself. The degrees of freedom become \(k-p-q\), where \(k\) is the number of lags, \(p\) the number of AR parameters, and \(q\) the number of MA parameters.
- For an AR(1) and any MA(q), know how to calculate the prediction interval for the one-step, two-step etc. ahead forecast.
The formula for this is
\[\hat y\pm 2\sqrt{1 + \sum_{j=1}^{\ell-1}\psi^2_j}\;\sigma_\epsilon\] First we need to know the error variance \(\sigma^2_\epsilon\): \[\sigma^2_\epsilon = {\sum_{t=1}^T\epsilon^2_t\over{T-p-q}}\] where \(T\) is the number of observations, \(p\) is the number of AR parameters, and \(q\) is the number of MA parameters.
The variance of the 1-step ahead forecast is just \(\sigma^2_\epsilon\). The variance of the \(\ell\)-step ahead forecast is \[\sigma^2_\epsilon\left(1 + \sum_{j=1}^{\ell-1}\psi^2_j\right)\] where \(\psi_j\) is the weight on the corresponding error term and \(\ell\) is the number of steps ahead.
AR example: using the AR(1) example above, the variance of the error terms is \(9900/(100-1) = 100\).
The variance for time 101 is just 100. The standard deviation is the square root of that, so it's 10. Margin of error = 2×10 = 20.
For time 102 it's \((1 + .3^2)100 = 109\). SD is \(\sqrt{109} = 10.44\), margin of error is 20.88.
Variance for time 103 = \((1 + .3^2 + .3^4)100 =109.81\) (notice the power of 4 the second time, since for an AR(1) \(\psi_2 = \phi^2\)).
MA example:
if \(\theta_1 = .3\) and \(\theta_2 = -.2\), then the variance of the 3-step ahead forecast is \((1 + (.3)^2 + (-.2)^2)\sigma_\epsilon^2\). This will be the variance for all forecasts more than 2 steps ahead (for an MA(2), the \(\psi\) weights are zero beyond lag 2).
Using the MA(2) example above:
variance of time 101 \(\sigma^2_{101} = {9800\over(100-2)} = 100\)
variance of time 102 \(\sigma^2_{102} = (1+.3^2)100 = 109\)
variance of time 103 \(\sigma^2_{103} = (1+.3^2 + (-.2)^2)100 = 113\)
sd of time 101 \(\sigma_{101} = \sqrt{100} = 10\)
sd of time 102 \(\sigma_{102} = \sqrt{109} = 10.44\)
sd of time 103 \(\sigma_{103} = \sqrt{113} = 10.63\)
Since the margin of error is twice the sd, the margin of error for time 101 is 20, for time 102 it's 20.88, and for time 103 it's 21.26.
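Both prediction-interval examples as a sketch, assuming \(\psi_j = \phi^j\) for the AR(1) and \(\psi_j = -\theta_j\) for the MA(2), as used in the calculations above:

```python
import numpy as np

# AR(1) example above: psi_j = phi**j, sigma^2_eps = 9900/(100-1) = 100
phi = 0.3
sigma2 = 9900 / (100 - 1)
for ell in range(1, 4):                             # 1-, 2-, 3-step ahead
    psi = np.array([phi ** j for j in range(1, ell)])
    var = (1 + np.sum(psi ** 2)) * sigma2
    print("AR", ell, round(var, 2), round(2 * np.sqrt(var), 2))   # variance, margin of error

# MA(2) example above: psi_1 = -theta_1, psi_2 = -theta_2, sigma^2_eps = 9800/(100-2)
theta = np.array([0.3, -0.2])
sigma2 = 9800 / (100 - 2)
for ell in range(1, 4):
    psi = -theta[: ell - 1]                         # only the first ell-1 weights enter
    var = (1 + np.sum(psi ** 2)) * sigma2
    print("MA", ell, round(var, 2), round(2 * np.sqrt(var), 2))   # 100/109/113 -> 20/20.88/21.26
```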
- Know the concept of non-stationarity and how to fix it (differencing).
In order to be able to make forecasts, the data needs to be stationary. This means that the data fluctuates around a constant mean independent of time and the variance of the fluctuations remains constant over time. If the variance grows over time, it becomes impossible to forecast into the future. The variances and covariances of the observations are independent of time; the covariance depends only on the lag between the observations.
For \(AR(p)\), stationarity requirements are:
if \(p = 1\), then \(-1 < \phi_1 < 1\)
for \(p = 2\) \(-1 < \phi_2 < 1\), \(\phi_1 + \phi_2 < 1\), and \(\phi_2 - \phi_1 < 1\)
For \(MA(q)\), the analogous conditions (technically invertibility conditions; an MA process is always stationary) are:
if \(q = 1\), then \(-1 < \theta_1 < 1\)
for \(q = 2\) \(-1 < \theta_2 < 1\), \(\theta_1 + \theta_2 < 1\), and \(\theta_2 - \theta_1 < 1\)
If we see a graph where the time series is trending upwards or downwards, it's probably non-stationary.
Non-stationarity can often be fixed by taking and modeling the differences between data points \(\Delta y_t = y_t - y_{t-1}\). If first differences don’t make the model stationary we try taking second differences (the difference of the differences), etc.
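Differencing in code (numpy's diff handles first and higher-order differences):

```python
import numpy as np

y = np.array([100., 104., 109., 115., 122.])   # illustrative upward-trending series
dy = np.diff(y)          # first differences: y_t - y_{t-1}
d2y = np.diff(y, n=2)    # second differences, if first differencing isn't enough
```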
- Know the concept of a white noise process versus a random walk model. White noise has no memory, random walk has infinite memory.
White noise describes the assumption that each element in a series is a random draw from a population with zero mean and constant variance. Basically, every term has zero correlation with any previous term.
A random walk is just the opposite: every term is correlated with the previous term, which in turn is correlated with the term before it.
- Know conceptually the difference between the ACF and PACF.
ACF stands for Autocorrelation Function and PACF means partial autocorrelation function.
The ACF measures the autocorrelation at each lag. In AR(1) and AR(p) we expect a high autocorrelation at lag 1 that then dies down quickly. It never cuts off to exactly zero, though, because of the residual effects of the error terms on each subsequent observation. For MA(1) it cuts off to 0 after lag 1. For MA(q) it cuts off after lag q. For ARMA it dies down.
The PACF measures the direct correlation at a particular lag, with the effect of the intervening lags removed. At lag 1 it is the same as the autocorrelation. For AR(1) it will be zero for any lag > 1. For AR(p) it cuts off after lag p. For MA(1) and MA(q) it dies down quickly. For ARMA it dies down.
| Model | \(ACF\) | \(PACF\) |
|---|---|---|
| white noise | 0 at all lags including lag 1 | 0 at all lags including lag 1 |
| random walk (try differencing) | dies down very slowly | very close to 1 at lag 1, may cut off afterward |
| \(AR(p)\) | dies down quickly | cuts off after lag p |
| \(MA(q)\) | cuts off after lag q | dies down quickly |
| \(ARMA\) | dies down quickly | dies down quickly |
| try both AR and MA | non-zero and then cuts off | non-zero and then cuts off |
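A sketch of how these patterns show up for a simulated AR(1) series (assuming the statsmodels package is available for its acf and pacf functions):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf   # assumed available

rng = np.random.default_rng(0)
y = np.zeros(500)
for t in range(1, 500):                  # simulate an AR(1) with phi = 0.7
    y[t] = 0.7 * y[t - 1] + rng.normal()

print(acf(y, nlags=5))    # dies down roughly geometrically (~0.7, 0.49, ...)
print(pacf(y, nlags=5))   # large at lag 1, then close to zero (cuts off after lag 1)
```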