Jose M. Fernandez
Introduction to Econometrics
Multiple Regression allows us to answer these questions.
Example: Wages
\[wages_{i}=\beta_0+\beta_1*Education_{i}+\beta_2*Experience_{i}+e_{i}\]
Multiple regression also helps us generalize the functional relationship between variables. Although wage increases with experience, it may do so at a decreasing rate: each additional year of experience increases your wage by less than the previous year.
Example: Wages
\[wages_{i}=\beta_0+\beta_1*Education_{i}+\beta_2*Experience_{i}+\beta_3*Experience^2_{i}+e_{i}\]
The marginal increase in wages from an additional year of experience is given by the first partial derivative
\[\frac{\partial wages_i}{\partial Experience}=\beta_2+2\beta_3 *Experience\]
If \(\beta_3 <0\), then wages are increasing at a decreasing rate (and may eventually decrease altogether)
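As a quick illustration (a minimal sketch using simulated data, since no wage data set is loaded in these notes), we can fit the quadratic specification with lm() and evaluate the estimated marginal effect \(\hat{\beta}_2 + 2\hat{\beta}_3 \times Experience\) at a chosen experience level:

```r
# Minimal sketch with simulated data (not a real wage data set)
set.seed(123)
n <- 1000
education  <- sample(8:20, n, replace = TRUE)
experience <- runif(n, 0, 40)
wages <- 5 + 1.5 * education + 0.8 * experience - 0.01 * experience^2 + rnorm(n, sd = 4)

quad.fit <- lm(wages ~ education + experience + I(experience^2))
b <- coef(quad.fit)

# Estimated marginal effect of one more year of experience, evaluated at 10 years
b["experience"] + 2 * b["I(experience^2)"] * 10
```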
Recall that our regression model was
\[ TestScore_i = \beta_0 + \beta_1 STR_i + u_i,~i=1,\dots,n \]
Other determinants of test scores are not included in this regression; by leaving them out, they are implicitly included in the error term \(u_i\)
This is typically not a problem, unless these omitted factors are correlated with an included regressor
library("AER", quietly=TRUE, warn.conflicts = FALSE, verbose = FALSE)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Loading required package: splines
data("CASchools")
CASchools$stratio <- with(CASchools, students/teachers)
CASchools$score <- with(CASchools, (math + read)/2)
cor(CASchools$stratio, CASchools$score)
## [1] -0.2264
cor(CASchools$stratio, CASchools$english)
## [1] 0.1876
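For reference, the single-regressor estimates quoted later in these notes (\(\widehat{TestScore} = 698.9 - 2.28 \times STR\)) come from regressing the score on the student-teacher ratio alone, which can be reproduced with the data loaded above:

```r
# Single-regressor benchmark: test scores on the student-teacher ratio
single.fit <- lm(score ~ stratio, data = CASchools)
coef(single.fit)
```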
An OLS estimator will have omitted variable bias (OVB) if:
1. The omitted variable is correlated with an included regressor
2. The omitted variable is a determinant of the dependent variable
What OLS assumption does OVB violate?
It violates the OLS assumption A.1: \(E[u_i|X_i] = 0\).
Because the omitted factors captured in \(u_i\) are correlated with \(X_i\), \(E[u_i|X_i] \neq 0\). This biases the estimator, and the bias does not vanish even in very large samples.
Omitted Variable Bias: The Math
Suppose the true model is given as \[y=\beta_0+\beta_1 x_1+\beta_2 x_2+u\] but we omit \(x_2\) and estimate \(y=\widetilde{\beta}_0 + \widetilde{\beta}_1 x_1 + \widetilde{u}\). The OLS slope from this short regression is \[ \widetilde{\beta}_1=\frac{\sum{(x_{i1}-\overline{x})y_i}}{\sum{(x_{i1}-\overline{x})^2}}\]
Recall the true model, \[y=\beta_0+\beta_1 x_1+\beta_2 x_2+u\]
Substituting the true model for \(y_i\) into our estimate of \(\beta_1\) gives \[\widetilde{\beta}_1=\frac{\sum{(x_{i1}-\overline{x})(\beta_0+\beta_1 x_{i1}+\beta_2 x_{i2}+u_i)}}{\sum{(x_{i1}-\overline{x})^2}} \\ =\frac{\beta_1\sum{(x_{i1}-\overline{x})^2}+\beta_2\sum{(x_{i1}-\overline{x})x_{i2}}+\sum{(x_{i1}-\overline{x})u_i}}{\sum{(x_{i1}-\overline{x})^2}} \\ = \beta_1+\beta_2 \frac{\sum{(x_{i1}-\overline{x})x_{i2}}}{\sum{(x_{i1}-\overline{x})^2}}+\frac{\sum{(x_{i1}-\overline{x})u_i}}{\sum{(x_{i1}-\overline{x})^2}}\]
If \(E(u_i)=0\), then taking expectations (conditional on the regressors) we find \[E(\widetilde{\beta}_1)=\beta_1+\beta_2 \frac{\sum{(x_{i1}-\overline{x})x_{i2}}}{\sum{(x_{i1}-\overline{x})^2}}\]
From here we clearly see the two conditions needed for omitted variable bias:
1. The omitted variable is correlated with an included regressor (i.e. \(\sum{(x_{i1}-\overline{x})x_{i2}}\neq 0\))
2. The omitted variable is a determinant of the dependent variable (i.e. \(\beta_2\neq 0\))
| Signs | Corr(\(x_1\),\(x_2\))>0 | Corr(\(x_1\),\(x_2\))<0 |
|---|---|---|
| \(\beta_2>0\) | Positive Bias | Negative Bias |
| \(\beta_2<0\) | Negative Bias | Positive Bias |
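The bias formula and the sign table can be checked with a short simulation. The sketch below uses made-up data with \(Corr(x_1,x_2)>0\) and \(\beta_2>0\), so the table predicts a positive bias; the short-regression slope is reproduced exactly by \(\widehat{\beta}_1+\widehat{\beta}_2\,\widetilde{\delta}_1\), where \(\widetilde{\delta}_1\) is the slope from regressing \(x_2\) on \(x_1\):

```r
# Simulated check of the omitted variable bias formula
set.seed(42)
n  <- 10000
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)              # x2 positively correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)   # true model: beta1 = 2, beta2 = 3

short <- lm(y ~ x1)        # omits x2: slope is biased upward (about 2 + 3 * 0.6)
long  <- lm(y ~ x1 + x2)   # includes x2: slope is close to the true beta1 = 2
aux   <- lm(x2 ~ x1)       # auxiliary regression of the omitted on the included

coef(short)["x1"]
coef(long)["x1"] + coef(long)["x2"] * coef(aux)["x1"]  # matches the short-regression slope
```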
Since there is a correlation between \(u_i\) and \(X_i\), \(Corr(X_i, u_i) = \rho_{Xu} \ne 0\)
The OLS estimator has the probability limit \(\hat{\beta}_1 \overset{p}{\longrightarrow} \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}\), which means that \(\hat{\beta}_1\) approaches the right-hand value, not \(\beta_1\), with increasing probability as the sample size grows.
The Multiple Regressor Model: Regressing On More Than One Variable
In a multiple regression model, we allow for more than one regressor. This allows us to isolate the effect of a particular variable holding all others constant.
In other words, we can minimize OVB by including variables in the regression equation that are important and potentially correlated with other regressors.
Look at what happens when we break down scores by the percentage of English learners and the student-teacher ratio
Regressing On More Than One Variable
The population regression line (function) with two regressors would be
\[ E[Y_i|X_{1i} = x_1, X_{2i} = x_2] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]
We interpret \(\beta_1\) (also referred to as the coefficient on \(X_{1i}\)) as the effect on \(Y\) of a unit change in \(X_1\), holding \(X_2\) constant or controlling for \(X_2\). For simplicity let us write the population regression line as
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]
Suppose we change \(X_1\) by an amount \(\Delta X_1\), which would cause \(Y\) to change to \(Y + \Delta Y\).
\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \\ Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2\]
Subtract the first equation from the second equation, yielding \[\begin{align*} \Delta Y &= \beta_1 \Delta X_1 \\ \beta_1 &= \frac{\Delta Y}{\Delta X_1} \end{align*}\]
\(\beta_1\) is also referred to as the partial effect on \(Y\) of \(X_1\), holding \(X_2\) fixed.
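A minimal sketch (simulated data) of the "holding \(X_2\) fixed" interpretation: in a fitted two-regressor model, the change in the predicted \(Y\) from a one-unit increase in \(X_1\), with \(X_2\) unchanged, equals the estimated coefficient on \(X_1\):

```r
# Partial effect interpretation: change x1 by one unit, hold x2 fixed
set.seed(1)
sim <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
sim$y <- 2 + 1.5 * sim$x1 - 0.7 * sim$x2 + rnorm(100)
fit <- lm(y ~ x1 + x2, data = sim)

base    <- data.frame(x1 = 1, x2 = 0.5)   # arbitrary starting point
shifted <- data.frame(x1 = 2, x2 = 0.5)   # x1 increased by one, x2 held fixed
predict(fit, shifted) - predict(fit, base)  # equals coef(fit)["x1"]
```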
As in the single regressor case, the regression line describes the average value of the dependent variable and its relationship with the model regressors. In reality, the actual population values of \(Y\) will not be exactly on the regression line since there are many other factors that are not accounted for in the model.
These other unobserved factors are captured by the error term \(u_i\) in the population multiple regression \[Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i,~i=1,\dots,n \]
(In general, we can have any number \(k\) of regressors, as shown above)
As with the case of a single regressor model, the population multiple regression model can be either homoskedastic or heteroskedastic. It is homoskedastic if
\[Var(u_i|X_{1i},\dots,X_{ki})\]
is constant for all \(i=1,\dots,n\). Otherwise, it is heteroskedastic.
Similar to the single regressor case, we do not observe the true population parameters \(\beta_0,\dots,\beta_k\). From an observed sample \(\{(Y_i, X_{1i},\dots,X_{ki})\}_{i=1}^n\) we want to calculate an estimator of the population parameters
We do this by minimizing the sum of squared differences between the observed dependent variable and its predicted value
\[\min_{b_0,\dots,b_k} \sum_i (Y_i - b_0 - b_1 X_{1i} - \cdots - b_k X_{ki})^2\]
Similar to the simple linear regression case, we would have \(k+1\) equations and \(k+1\) unknowns.
The resulting estimators are called the ordinary least squares (OLS) estimators: \(\hat{\beta}_0,\hat{\beta}_1,\dots,\hat{\beta}_k\)
The predicted values would be
\[\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_k X_{ki},~i=1,\dots,n\]
The OLS residuals would be
\[\hat{u}_i = Y_i - \hat{Y}_i,~i=1,\dots, n\]
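A minimal sketch of this minimization (with simulated data): the first-order conditions are the \(k+1\) normal equations \(X'Xb = X'Y\), and solving them by hand reproduces the coefficients, predicted values, and residuals that lm() gives:

```r
# Solving the k+1 normal equations by hand and comparing to lm()
set.seed(7)
n  <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
Y  <- 1 + 2 * X1 - 3 * X2 + rnorm(n)

X <- cbind(1, X1, X2)                             # regressor matrix with a constant
beta.hat <- solve(crossprod(X), crossprod(X, Y))  # solves (X'X) b = X'Y
beta.hat
coef(lm(Y ~ X1 + X2))                             # identical estimates

Y.hat <- X %*% beta.hat                           # predicted values
u.hat <- Y - Y.hat                                # OLS residuals
```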
Recall that, using observations from 420 school districts, we regressed student test scores on STR and obtained \(\widehat{TestScore} = 698.9 - 2.28 \times STR\)
However, there was concern about possible OVB due to the exclusion of the percentage of English learners in a district, which may influence both test scores and STR.
We can now address this concern by including the percentage of English learners in our model
\[ TestScore_i = \beta_0 + \beta_1 \times STR_i + \beta_2 \times PctEL_i + u_i \]
where \(PctEL_i\) is the percentage of English learners in school district \(i\).
(Notice that we are using heteroskedasticity-robust standard errors)
regress.results <- lm(score ~ stratio + english, data = CASchools)
het.se <- vcovHC(regress.results)
coeftest(regress.results, vcov.=het.se)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 686.0322 8.8122 77.85 <2e-16 ***
## stratio -1.1013 0.4371 -2.52 0.012 *
## english -0.6498 0.0313 -20.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the single regressor case our estimates were \[\widehat{TestScore} = 698.9 - 2.28 \times STR\] and with the added regressor we have \[\widehat{TestScore} = 686.0 - 1.10 \times STR - 0.65 \times PctEL\]
Notice the smaller effect of STR in the second regression. The second regression captures the effect of STR holding the percentage of English learners constant, so we can conclude that the first regression does suffer from OVB. This multiple regression approach is superior to the tabular approach shown before: we can give a clear estimate of the effect of STR, and it is easy to add more regressors if the need arises.
Similar to the single regressor case, except for the modified adjustment for the degrees of freedom, the \(SER\) is
\[SER = s_{\hat{u}}\text{ where }s_{\hat{u}}^2 = \frac{\sum_i \hat{u}^2_i}{n - k - 1} = \frac{SSR}{n - k - 1}\]
Instead of adjusting for the two degrees of freedom used to estimate two coefficients, we now need to adjust for the \(k+1\) estimated coefficients.
\[ R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS} \]
The \(R^2\) never decreases when a regressor is added, even if that regressor has little real explanatory power. To address this inflation problem we can calculate an “adjusted” version that corrects for it \[\bar{R}^2 = 1 - \frac{n-1}{n-k-1}\frac{SSR}{TSS} = 1 - \frac{s_{\hat{u}}^2}{s_Y^2}\]
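These formulas can be verified by hand from the fitted model regress.results above; the values should match the summary() output shown next:

```r
# SER, R^2, and adjusted R^2 computed from the residuals of regress.results
u.hat <- residuals(regress.results)
y     <- CASchools$score
n     <- length(y)
k     <- 2                                # two regressors: stratio and english

SSR <- sum(u.hat^2)
TSS <- sum((y - mean(y))^2)

c(SER    = sqrt(SSR / (n - k - 1)),
  R2     = 1 - SSR / TSS,
  adj.R2 = 1 - (n - 1) / (n - k - 1) * SSR / TSS)
```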
From our multiple regression of the test scores on STR and the percentage of English learners we have the \(R^2\), \(\bar{R}^2\), and \(SER\)
regress.summary <- summary(regress.results)
regress.summary$r.squared
## [1] 0.4264
regress.summary$adj.r.squared
## [1] 0.4237
regress.summary$sigma
## [1] 14.46
We notice a large increase in the \(R^2 = 0.426\) compared to the 0.051 of the single-regressor estimation. Adding the percentage of English learners has significantly increased the explanatory power of the regression.
Because \(n\) is large compared to the two regressors used, \(\bar{R}^2\) is not very different from \(R^2\).
We must be careful not to let the increase in \(R^2\) (or \(\bar{R}^2\)) drive our choice of regressors. Later, in chapter 7, we will cover how to decide which variables to include.
For multiple regressions we have four assumptions: three of them are updated versions of the single regressor assumptions, and one is new.
The regressors exhibit perfect multicollinearity if one of the regressors is an exact linear function of the other regressors.
Assumption A.4 requires that there be no perfect multicollinearity
Perfect multicollinearity can occur if a regressor is accidentally repeated, for example, if we regress \(TestScore\) on \(STR\) and \(STR\) again (R simply ignores the repeated regressor). This could also occur if a regressor is a multiple of another.
Mathematically, this is not allowed because it leads to division by zero. Intuitively, we cannot logically measure the effect of \(STR\) while holding the other regressors constant when one of those regressors is \(STR\) itself (or a multiple of it).
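A sketch of how R handles this in practice: if we add a regressor that is an exact multiple of stratio, lm() drops it and reports an NA coefficient:

```r
# Perfect multicollinearity: a regressor that is an exact multiple of another
CASchools$double.stratio <- 2 * CASchools$stratio
coef(lm(score ~ stratio + double.stratio, data = CASchools))
# the coefficient on double.stratio is NA (the regressor is dropped)
```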
Fraction of English learners: for example, including both the percentage of English learners and the fraction of English learners (the percentage divided by 100) as regressors
“Not very small” classes:
Let \(NVS_i = 1 \) if \( STR_i \geq 12\)
None of the districts in the data has \(STR_i < 12\), therefore \(NVS_i\) is always equal to \(1\) and is perfectly collinear with the intercept (the constant regressor).
Suppose we want to categorize school districts as rural, suburban, and urban. If we include an intercept and a dummy variable for each of the three categories, the dummies always sum to one and we have perfect multicollinearity (the dummy variable trap); the fix is to omit one of the categories (or the intercept).
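As a sketch, we can construct a hypothetical rural/suburban/urban classification (this variable is not part of CASchools) and see the dummy variable trap in action:

```r
# Dummy variable trap with a hypothetical district-type variable (not in CASchools)
set.seed(99)
CASchools$type     <- sample(c("rural", "suburban", "urban"), nrow(CASchools), replace = TRUE)
CASchools$rural    <- as.numeric(CASchools$type == "rural")
CASchools$suburban <- as.numeric(CASchools$type == "suburban")
CASchools$urban    <- as.numeric(CASchools$type == "urban")

# With an intercept, the three dummies sum to one: one coefficient comes back NA
coef(lm(score ~ rural + suburban + urban, data = CASchools))

# The usual fix: omit one category (rural becomes the baseline)
coef(lm(score ~ suburban + urban, data = CASchools))
```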
Imperfect multicollinearity means that two or more regressors are highly correlated. It differs from perfect multicollinearity in that the regressors do not have to be exact linear functions of each other.