Stat 115s (Introduction to Econometrics)

Lesson 2.1- Multiple Linear Regression Model (OLS Estimation)

Author

Norberto E. Milla, Jr.

Published

March 16, 2023

1 Regression model with two explanatory variables

  • In simple regression analysis with only one explanatory variable, the assumption SLR.4 (all other factors affecting y are uncorrelated with x, ceteris paribus) is generally very restrictive and unrealistic.

  • Multiple regression analysis allows us to explicitly control for many other factors that simultaneously affect y.

  • By adding new explanatory variables we can explain more of the variation in y. In other words, we can develop more successful models.

  • Additional advantage: multiple regression can incorporate general functional form relationships.

  • Example 1: Wage equation

wage = \beta_0 + \beta_1educ + \beta_2exper + u

  • wage: hourly wages (in US dollars); educ: level of education (in years); exper: level of experience (in years)

  • This wage equation allows us to measure the impact of education on wage holding experience fixed, and vice versa

    • \beta_1: measures the impact of education on wage, holding all other factors fixed.
    • \beta_2: measures the ceteris paribus effect of experience on wage.


  • Example 2: Student success and family income equation

avgscore = \beta_0 + \beta_1expend + \beta_2avginc + u

  • avgscore: average standardized test score; expend: education spending per student; avginc: average family income

  • If avginc is not included in the model directly, its effect will be included in the error term, u.

  • Because average family income is generally correlated with education expenditures, the key assumption SLR.4 (zero conditional mean) will be invalid: x (expend) will be correlated with u, leading to biased OLS estimators.

  • Multiple regression analysis also allows us to use more general functional forms.

  • Consider the following quadratic model of consumption

cons = \beta_0 + \beta_1inc + \beta_2inc^2 + u

  • Here x_1 = inc and x_2 = inc^2

  • \beta_1 no longer has a ceteris paribus interpretation on its own: we cannot change inc while holding inc^2 fixed

  • Marginal propensity to consume (MPC) is approximated by:

\frac{\Delta{cons}}{\Delta{inc}} \approx \beta_1 + 2\beta_2inc

  • Hence, the MPC depends on inc (see the R sketch at the end of this section)

  • The regression model with 2 explanatory variables is formally given by:

y = \beta_0 + \beta_1x_1 + \beta_2x_2 + u

  • “Zero conditional mean” assumption: E(u|x_1,x_2) = 0

  • In other words, for all combinations of x_1 and x_2, the expected value of unobservables, u, is zero.

  • In the wage equation: E(u|educ,exper) = 0

which means that unobservable factors affecting wage are, on average, unrelated to educ and exper

  • If ability is a part of u, then average ability levels must be the same across all combinations of education and experience in the population.

  • In the test scores and average family income equation, the zero conditional mean assumption implies that

E(u|expend,avginc) = 0

  • All other factors affecting the average test scores (such as quality of schools, student characteristics, etc.) are, on average, unrelated to education expenditures and family income

  • For the quadratic consumption model: E(u|inc, inc^2) = E(u|inc) = 0

  • Since inc^2 is automatically known when inc is known, we do not need to list it separately in the conditional expectation.
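
  • As a small illustration of the quadratic consumption model, the R sketch below fits cons on inc and inc^2 using simulated data (the sample size, coefficient values, and income range are purely hypothetical) and evaluates the approximate MPC, \hat{\beta}_1 + 2\hat{\beta}_2 inc, at a few income levels.
Code
set.seed(123)   # hypothetical simulated data for the quadratic consumption model
n    <- 200
inc  <- runif(n, 10, 100)                                  # income
cons <- 5 + 0.8 * inc - 0.002 * inc^2 + rnorm(n, sd = 3)   # assumed true relationship

quad <- lm(cons ~ inc + I(inc^2))    # fit cons = b0 + b1*inc + b2*inc^2 + u
b    <- coef(quad)

# approximate MPC = b1 + 2*b2*inc, evaluated at selected income levels
inc_levels <- c(20, 50, 80)
data.frame(inc = inc_levels,
           MPC = unname(b["inc"] + 2 * b["I(inc^2)"] * inc_levels))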

2 Regression model with k explanatory variables

y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k +u

  • We have k explanatory variables and k + 1 unknown \beta parameters

  • The definition of the error term u is the same: it represents all other factors affecting y that are not included in the model.

  • \beta_j measures the change in y in response to a unit change in x_j, holding all the other explanatory variables and the unobservable factors in u fixed.

  • The zero conditional mean assumption is reformulated as

E(u|x_1, x_2, \cdots, x_k) = 0

which means that u is unrelated to the explanatory variables.

  • If u is correlated with any of the explanatory variables, then the OLS estimators will in general be biased and the estimation results will be unreliable.

  • If important variables affecting y are omitted from the model, this assumption may not hold, leading to omitted variable bias.

3 OLS Estimation

  • Sample regression function: \hat{y} = \hat{\beta}_0 + \hat{\beta}_1x_1 + \hat{\beta}_2x_2 + \cdots + \hat{\beta}_kx_k
  • OLS estimators minimize the sum of squared residuals

\sum_{i=1}^n \hat{u}_i^2 = \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1x_{i1} - \hat{\beta}_2x_{i2} - \cdots - \hat{\beta}_kx_{ik})^2

  • The OLS estimators \hat{\beta}_j, \; j = 0, 1, 2, \cdots, k,\; are found by solving the following system of k + 1 equations (a matrix-form check appears at the end of this section):

\begin{align} \sum_{i=1}^n (y_i - \hat{\beta}_0 - \hat{\beta}_1x_{i1} - \hat{\beta}_2x_{i2} - \cdots - \hat{\beta}_kx_{ik}) &= 0 \notag \\ \sum_{i=1}^n x_{i1}(y_i - \hat{\beta}_0 - \hat{\beta}_1x_{i1} - \hat{\beta}_2x_{i2} - \cdots - \hat{\beta}_kx_{ik}) &= 0 \notag \\ \sum_{i=1}^n x_{i2}(y_i - \hat{\beta}_0 - \hat{\beta}_1x_{i1} - \hat{\beta}_2x_{i2} - \cdots - \hat{\beta}_kx_{ik}) &= 0 \notag \\ &\vdots \notag \\ \sum_{i=1}^n x_{ik}(y_i - \hat{\beta}_0 - \hat{\beta}_1x_{i1} - \hat{\beta}_2x_{i2} - \cdots - \hat{\beta}_kx_{ik}) &= 0 \notag \end{align}

  • The estimated or sample regression equation is:

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1x_1 + \hat{\beta}_2x_2 + \cdots + \hat{\beta}_kx_k

  • In terms of changes we have:

\Delta{\hat{y}} = \hat{\beta}_1 \Delta{x_1} + \hat{\beta}_2 \Delta{x_2} + \cdots + \hat{\beta}_k \Delta{x_k}

  • Thus, if \Delta{x_2} = \Delta{x_3} = \cdots = \Delta{x_k} = 0, then

\begin{align} \Delta{\hat{y}} &= \hat{\beta}_1 \Delta{x_1} \notag \\ \implies \hat{\beta}_1 &= \frac{\Delta{\hat{y}}}{\Delta{x_1}} \notag \end{align}

  • Holding all other independent variables fixed (i.e. controlling for all other x variables), the change in \hat{y} given a one-unit change in x_1 is \hat{\beta}_1
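
  • As a check on these first-order conditions, the sketch below (using simulated data with hypothetical variable names) solves the equivalent normal equations X'X\hat{\beta} = X'y directly and compares the result with lm().
Code
set.seed(1)   # simulated data; variable names and true coefficients are assumptions
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)              # two hypothetical regressors
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)      # assumed data-generating process

X <- cbind(1, x1, x2)                       # design matrix (intercept column first)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # solves the system X'X b = X'y

cbind(normal_equations = drop(beta_hat),
      lm = coef(lm(y ~ x1 + x2)))           # the two columns should coincide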

4 Examples

  1. Determinants of college success: colGPA = college GPA; hsGPA = high school GPA; ACT = achievement test score
Code
library(wooldridge)   # the gpa1 and wage1 datasets come from this package
data(gpa1)
mlr1 <- lm(colGPA ~ hsGPA + ACT, data = gpa1)
summary(mlr1)

Call:
lm(formula = colGPA ~ hsGPA + ACT, data = gpa1)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.85442 -0.24666 -0.02614  0.28127  0.85357 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1.286328   0.340822   3.774 0.000238 ***
hsGPA       0.453456   0.095813   4.733 5.42e-06 ***
ACT         0.009426   0.010777   0.875 0.383297    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3403 on 138 degrees of freedom
Multiple R-squared:  0.1764,    Adjusted R-squared:  0.1645 
F-statistic: 14.78 on 2 and 138 DF,  p-value: 1.526e-06


  • Estimated regression equation: \widehat{colGPA} = 1.286 + 0.453hsGPA + 0.009ACT

  • \hat{\beta}_1 = 0.453: Holding ACT fixed, given a one-point change in high school GPA, college GPA is predicted to increase by 0.453 points (almost half a point)

  • If we choose two students, A and B, with the same ACT score but hsGPA of student A is one point higher than the hsGPA of student B, then we predict that student A’s college GPA is 0.453 points higher than that of student B’s.

  • \hat{\beta}_2 = 0.009: A one-point change in ACT is predicted to increase college GPA by only 0.009 points (a very small effect), holding high school GPA constant
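
  • To make the comparison of students A and B concrete, the sketch below applies predict() to the mlr1 fit for two hypothetical students who share an (arbitrarily chosen) ACT score of 25 but differ by one point in hsGPA.
Code
# two hypothetical students: same ACT, hsGPA differing by one point
newdat <- data.frame(hsGPA = c(3.0, 4.0), ACT = c(25, 25))
pred   <- predict(mlr1, newdata = newdat)
pred
diff(pred)   # equals the hsGPA coefficient, about 0.453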


  2. Logarithmic wage equation: wage = average hourly earnings; educ = years of education; exper = years of potential experience; tenure = years with current employer
Code
data(wage1)
mlr2 <- lm(log(wage) ~ educ + exper + tenure, data = wage1)
summary(mlr2)

Call:
lm(formula = log(wage) ~ educ + exper + tenure, data = wage1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.05802 -0.29645 -0.03265  0.28788  1.42809 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.284360   0.104190   2.729  0.00656 ** 
educ        0.092029   0.007330  12.555  < 2e-16 ***
exper       0.004121   0.001723   2.391  0.01714 *  
tenure      0.022067   0.003094   7.133 3.29e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4409 on 522 degrees of freedom
Multiple R-squared:  0.316, Adjusted R-squared:  0.3121 
F-statistic: 80.39 on 3 and 522 DF,  p-value: < 2.2e-16


  • Estimated regression function: \widehat{\log(wage)} = 0.284 + 0.092educ + 0.004exper + 0.022tenure

  • Holding experience and tenure fixed, another year of education is predicted to increase wages by about 9.2%

  • An additional year of experience increases wages by 0.4%, ceteris paribus
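
  • Because the dependent variable is in logs, multiplying each slope by 100 gives the approximate percentage effects quoted above; the short sketch below extracts them from the mlr2 fit.
Code
# approximate % effect on wage of one more year of educ, exper, and tenure
round(100 * coef(mlr2)[c("educ", "exper", "tenure")], 1)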

5 Fitted values and residuals

  • Fitted (predicted) value for observation i: \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1x_{i1} + \hat{\beta}_2x_{i2} + \cdots + \hat{\beta}_kx_{ik}

  • Residual for observation i: \hat{u}_i = y_i - \hat{y}_i

    • \hat{u}_i > 0 implies y_i > \hat{y}_i, underprediction

    • \hat{u}_i < 0 implies y_i < \hat{y}_i, overprediction

  • The sum (also the average) of residuals is zero: \sum_{i=1}^n \hat{u}_i = 0

  • Sample covariance between residuals and each x_j is zero: \sum_{i=1}^n x_{ij}\hat{u}_i = 0
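
  • These algebraic properties can be verified numerically for the mlr1 fit from the college GPA example; in the sketch below each sum is zero up to floating-point rounding.
Code
# numerical check of the OLS residual properties for the mlr1 fit
uhat <- resid(mlr1)

sum(uhat)                 # sum (and hence average) of residuals: essentially zero
sum(gpa1$hsGPA * uhat)    # residuals uncorrelated with hsGPA: essentially zero
sum(gpa1$ACT * uhat)      # residuals uncorrelated with ACT: essentially zero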

6 Sums of squares and goodness-of-fit

  • Total variation in y: SST = \sum_{i=1}^n (y_i - \overline{y})^2

  • Explained variation:

SSR = \sum_{i=1}^n (\hat{y}_i - \overline{y})^2

  • Unexplained variation:

SSE = \sum_{i=1}^n \hat{u}_i^2 = \sum_{i=1}^n (y_i - \hat{y}_i)^2

  • Coefficient of determination (R^2): the ratio of the explained variation to the total variation

R^2 = \frac{SSR}{SST} = 1- \frac{SSE}{SST}

  • When a new x variable is added to the regression R^2 always increases (or stays the same). It never decreases.

  • The reason is that, when a new variable is added, SSE never increases (it typically decreases).

  • For this reason R^2 may not be a good indicator for model selection.

  • Instead, we will use adjusted R^2 for model selection

\begin{align} \text{Adjusted}\; R^2 &= 1- \frac{\frac{SSE}{n-k-1}}{\frac{SST}{n-1}} \notag \\ &= 1- \frac{(n-1)SSE}{(n-k-1)SST} \notag \\ &= 1- \frac{(n-1) (1-R^2)}{n-k-1} \notag \end{align}

  • Adjusted R^2 is interpreted the same way as R^2
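
  • The sums of squares, R^2, and adjusted R^2 can be reproduced by hand for the mlr1 fit; the sketch below uses this lesson's labels (SSR = explained, SSE = unexplained) and should match the values reported by summary(mlr1).
Code
# reproduce SST, SSR, SSE, R^2 and adjusted R^2 for the mlr1 fit
y    <- gpa1$colGPA
yhat <- fitted(mlr1)
n    <- length(y)
k    <- 2                                    # number of explanatory variables

SST <- sum((y - mean(y))^2)                  # total variation
SSR <- sum((yhat - mean(y))^2)               # explained variation
SSE <- sum(resid(mlr1)^2)                    # unexplained variation

c(R2     = SSR / SST,
  adj_R2 = 1 - (SSE / (n - k - 1)) / (SST / (n - 1)))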

7 Assumptions for unbiasedness of OLS estimators

MLR.1: Linear in Parameters

y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_kx_k +u

  • Population model is linear in parameters

MLR.2: Random sampling

  • We have a random sample of n observations drawn from the population model defined in MLR.1.

MLR.3: No Perfect Collinearity

  • There is no perfect linear relationship among the independent variables.

  • None of the x variables can be written as an exact linear combination of the other independent variables. If this assumption fails, we have perfect collinearity (perfect multicollinearity)

    • If x variables are perfectly collinear then it is not possible to mathematically determine OLS estimators.
    • This assumption allows the independent variables to be correlated: but they cannot be perfectly correlated.
    • If we did not allow for any correlation among the independent variables, then multiple regression would be of very limited use for econometric analysis
    • For example, in the regression of average test scores on education expenditures and family income, we suspect that expenditures and income are correlated, so we would like to hold income fixed to find the impact of expenditures on test scores.

MLR.4: Zero conditional mean

E(u|x_1, x_2, \cdots, x_k) = 0

  • This assumption states that the explanatory variables are exogenous: the error term is unrelated, on average, to the explanatory variables.

  • There are several ways this assumption can fail:

    • Functional form misspecification: if the functional relationship between the explained and explanatory variables is misspecified, this assumption can fail.

    • Omitting an important variable that is correlated with any of the explanatory variables

    • Measurement error in the explanatory variables

Theorem: Under assumptions MLR.1 through MLR.4 OLS estimators are unbiased:

E(\hat{\beta}_j) = \beta_j, \; j = 0, 1, 2, \cdots, k

The centers of the sampling distributions (i.e. expectations) of OLS estimators are equal to the unknown population parameters.

Including irrelevant explanatory variables in a regression model

  • What happens if we add an irrelevant variable in the model? (overspecification of the model)

  • Irrelevance of the variable means that its coefficient in the population model is zero

  • E.g., suppose that in the regression below the partial effect of x_3 is zero, that is \beta_3 = 0

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u

  • Taking the conditional expectation we have:

E(y|x_1, x_2, x_3) = E(y|x_1, x_2) = \beta_0 + \beta_1 x_1 + \beta_2 x_2

  • Even though the true model does not include x_3, it is added to the estimated model by mistake.

  • In this case SRF is given by

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \hat{\beta}_3 x_3

  • OLS estimators are still unbiased:

\begin{align} E(\hat{\beta}_0) &= \beta_0 \notag \\ E(\hat{\beta}_1) &= \beta_1 \notag \\ E(\hat{\beta}_2) &= \beta_2 \notag \\ E(\hat{\beta}_3) &= 0 \notag \end{align}

  • However, even though they remain unbiased, the variances of the OLS estimators are generally larger when the model is overspecified (the estimates are less precise), as the sketch below illustrates
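
  • The sketch below is a small Monte Carlo illustration under an assumed data-generating process: x_3 is irrelevant (\beta_3 = 0) but correlated with x_1. The estimates of \beta_1 stay centred on the true value in both models, while the estimate from the overspecified model is more variable.
Code
# Monte Carlo: effect of adding an irrelevant regressor (true beta3 = 0)
set.seed(42)
R <- 1000; n <- 200
est <- replicate(R, {
  x1 <- rnorm(n)
  x2 <- rnorm(n)
  x3 <- 0.7 * x1 + rnorm(n)                  # irrelevant, but correlated with x1
  y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)   # assumed true model: beta3 = 0
  c(correct  = coef(lm(y ~ x1 + x2))["x1"],
    overspec = coef(lm(y ~ x1 + x2 + x3))[c("x1", "x3")])
})

rowMeans(est)       # both x1 estimates centre on 0.5; the x3 estimate centres on 0
apply(est, 1, sd)   # the x1 estimate is noisier in the overspecified model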

Omitting a relevant explanatory variable

  • What happens if we exclude an important variable?

  • A relevant variable is one whose parameter is not 0 in the PRF; omitting it means the model is underspecified.

  • In this case OLS estimators will be biased.

  • For example, suppose that the PRF includes 2 independent variables:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

  • Suppose that we omitted x_2 because, say, it is unobservable. Now the SRF is

\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1

  • The impact of the omitted variable will be included in the error term: y = \beta_0 + \beta_1 x_1 + v, \; \text{where}\;\; v = \beta_2 x_2 + u

  • The OLS estimator of \beta_1 in this underspecified model is:

\tilde{\beta}_1 = \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)y_i}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2}

  • To determine the magnitude and sign of the bias we will substitute y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i in the formula for \tilde{\beta}_1, re-arrange and take expectation

\begin{align} \tilde{\beta}_1 &= \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)y_i}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2} \notag \\ &= \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i)}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2} \notag \\ &= \beta_1 + \beta_2 \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)x_{i2}}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2} + \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)u_i}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2} \notag \end{align}

Now, \begin{align} E(\tilde{\beta}_1) &= \beta_1 + \beta_2 \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)x_{i2}}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2} + \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)\bf{E}(u_i)}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2} \notag \\ &= \beta_1 + \beta_2 \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)x_{i2}}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2} \notag \end{align}

Therefore, the bias of \tilde{\beta}_1 is given by \beta_2 \frac{\sum_{i=1}^n (x_{i1} - \overline{x}_1)x_{i2}}{\sum_{i=1}^n (x_{i1} - \overline{x}_1)^2}

  • The sign of the bias depends on both \beta_2 and the correlation between the omitted variable (x_2) and the included variable (x_1)

  • This correlation cannot be calculated if the omitted variable is not observed

  • The following table summarizes possible cases:

Code
knitr::include_graphics("pic12.png")

  • A small bias relative to \beta_1 may not be a problem in practice

  • In most cases we are not able to calculate the size of the bias. But in some cases we may have an idea about the direction of bias.

  • For example, suppose that in the wage equation true PRF contains both education and ability.

  • Suppose also that ability is omitted because it cannot be observed, leading to omitted variable bias.

  • In this case we expect the sign of the bias to be positive (+), because it is reasonable to think that people with more ability tend to have higher levels of education, and ability is positively related to wage
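
  • The derivation above can be illustrated with a short simulation under an assumed data-generating process in which \beta_2 > 0 and x_1, x_2 are positively correlated, so the short regression's slope is biased upward. The sketch also checks the sample decomposition \tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2\hat{\delta}_1, where \hat{\delta}_1 is the slope from regressing x_2 on x_1 (the sample counterpart of the ratio in the bias formula).
Code
# omitted variable bias: the true model contains x1 and x2, but x2 is omitted
set.seed(7)
n  <- 5000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                  # x2 positively correlated with x1
y  <- 1 + 1.0 * x1 + 2.0 * x2 + rnorm(n)   # assumed: beta1 = 1, beta2 = 2 > 0

b_short <- coef(lm(y ~ x1))["x1"]          # short regression, x2 omitted
b_long  <- coef(lm(y ~ x1 + x2))           # long (correct) regression
delta1  <- coef(lm(x2 ~ x1))["x1"]         # slope from regressing x2 on x1

c(true_beta1 = 1,
  short      = unname(b_short),                               # biased upward, near 2
  decomposed = unname(b_long["x1"] + b_long["x2"] * delta1))  # equals the short slope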

8 Variances of OLS estimators

MLR.5: Homoscedasticity

This assumption states that, conditional on the x variables, the error term has constant variance: var(u|x_1, x_2, \cdots, x_k) = \sigma^2

  • If this assumption fails the model exhibits heteroscedasticity.

  • This assumption is essential in deriving variances and standard errors of OLS estimators and in showing whether OLS estimators are efficient

  • We do not need this assumption for unbiasedness

  • In the wage equation this assumption implies that the variance of the unobserved factors does not change with the variables included in the model (education, experience, tenure, etc.)

  • Assumptions MLR.1 to MLR.5 are stated for cross-sectional data; they need to be modified for time series data.

  • These assumptions are referred to as the Gauss-Markov assumptions

  • Assumptions MLR.4 and MLR.5 can be restated in terms of the dependent variable: \begin{align} E(y|x_1, x_2, \cdots, x_k) &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \notag \\ var(y|x_1, x_2, \cdots, x_k) &= var(u|x_1, x_2, \cdots, x_k) = \sigma^2 \notag \end{align}

Theorem:

Under the Gauss-Markov assumptions, Var(\hat{\beta}_j) = \frac{\sigma^2}{SST_j(1-R_j^2)}, \; j = 1, 2, \cdots, k where SST_j = \sum_{i=1}^n (x_{ij} - \overline{x}_j)^2

is the total sample variation in x_j and R_j^2 is the R-squared from regressing x_j on all other independent variables (including an intercept term)

  • Var(\hat{\beta}_j) increases with \sigma^2 and decreases as SST_j increases

  • To increase SST_j\; we need to collect more data (increase n).

  • To reduce \sigma^2\; we need to find good explanatory variables

  • Var(\hat{\beta}_j) also depends on R_j^2, which measures how strongly x_j is linearly related to the other x variables

  • We did not have this term in the simple regression analysis because there was only one explanatory variable.

  • As the degree of correlation among the x variables increases, the variances of the OLS estimators get larger and larger.

  • When there is a high (but not perfect) level of collinearity among the x variables, the variances of the OLS estimators will be large. This is called the multicollinearity problem.

Estimating Variances

An unbiased estimator of the error variance is \hat{\sigma}^2 = \frac{1}{n-k-1}\sum_{i=1}^n \hat{u}_i^2 = \frac{SSE}{n-k-1}

Consequently, the standard error of \hat{\beta}_j is given by se(\hat{\beta}_j) = \frac{\hat{\sigma}}{\sqrt{SST_j(1-R_j^2)}}, \; j = 1, 2, \cdots, k
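
This formula can be checked against the regression output from the college GPA example: the sketch below rebuilds se(\hat{\beta}_{hsGPA}) for the mlr1 fit from \hat{\sigma}^2, the total variation in hsGPA, and the R-squared from regressing hsGPA on the other regressor.
Code
# rebuild the standard error of the hsGPA coefficient in the mlr1 fit
n <- nrow(gpa1); k <- 2

sigma2_hat <- sum(resid(mlr1)^2) / (n - k - 1)                 # error variance estimate
SST_hs     <- sum((gpa1$hsGPA - mean(gpa1$hsGPA))^2)           # total variation in hsGPA
R2_hs      <- summary(lm(hsGPA ~ ACT, data = gpa1))$r.squared  # hsGPA regressed on the other x

sqrt(sigma2_hat / (SST_hs * (1 - R2_hs)))   # should match the reported 0.095813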

  • \hat{\sigma} = \sqrt{\hat{\sigma}^2} is called the root mean square error (RMSE); also referred to as the standard error of the regression (SER)

  • It is the estimator of the standard deviation of the error term

  • It may increase or decrease when a new variable is added to the model

  • It is used in constructing confidence intervals and testing hypotheses

  • Under the assumptions MLR.1 through MLR.5, OLS estimators are the best linear unbiased estimators (BLUE) for unknown population parameters

    • best: minimum variance (efficient)

    • linear: the beta coefficients are linear functions of the y_i's, i.e., \hat{\beta}_j = \sum_{i=1}^n w_{ij}y_i

    • unbiased: E(\hat{\beta}_j) = \beta_j

  • If any of the five assumptions fails, the Gauss-Markov theorem does not hold

    • When MLR.4 fails, the unbiasedness property does not hold
    • When MLR.5 fails, efficiency does not hold