Chapter 7: Hypothesis Tests and Confidence Intervals in Multiple Regression

Karim Naguib (Boston University)
10/3/2013

Hypothesis Tests and Confidence Intervals for a Single Coefficient

Standard Errors for the OLS Estimators

  • Recall for single regressor estimation we were able to estimate the variance of \( \hat{\beta}_1 \) (\( \sigma_{\hat{\beta}_1}^2 \)) by relying on the law of large numbers
  • We were able to substitute sample averages for expectations
  • We called this estimate \( \hat{\sigma}_{\hat{\beta}_1}^2 \)
  • From it we can get the standard error \( SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}_{\hat{\beta}_1}^2} \)
  • With multiple regressors we are able to rely on the same concepts (large-sample normality of the estimators) to calculate \( SE(\hat{\beta}_j) \) for any \( j \)
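
For example, here is a minimal sketch of how such robust standard errors could be obtained in R. It assumes the sandwich package is installed and uses the test.score.data data frame and variable names that appear later in these slides; it is an illustration, not part of the original slides.

library(sandwich)                                    # for vcovHC()
regress.results <- lm(testscr ~ str + el.pct, data = test.score.data)
robust.vcov <- vcovHC(regress.results)               # heteroskedasticity-robust covariance matrix estimate
sqrt(diag(robust.vcov))                              # SE(beta_j-hat) for each coefficient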

Hypothesis Tests for a Single Coefficient (1)

A Two-sided test for any parameter \( \beta_j \) would be \[ \begin{align*} H_0&: \beta_j = \beta_{j,0} \\ H_1&: \beta_j \ne \beta_{j,0} \end{align*} \]

For example, suppose the coefficient for \( STR \) is \( \beta_j \) and we want to test the hypothesis that it is equal to zero. In that case, we would have \( \beta_{j,0} = 0 \)

Hypothesis Tests for a Single Coefficient (2)

  • The \( t \)-statistic would be \[ t = \frac{\hat{\beta}_j - \beta_{j,0}}{SE(\hat{\beta}_j)} \]

  • The \( p \)-value would be \[ p\text{-value} = 2\Phi(-|t^{act}|) \] where \( t^{act} \) is the value of \( t \) for the observed sample.
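
As a quick illustration (a sketch, not part of the original slides), the \( t \)-statistic and \( p \)-value can be computed directly in R from an estimate and its standard error, here using the coefficient on \( STR \) reported in the application below:

beta.hat <- -1.10                      # estimated coefficient on STR (from the application below)
se.beta  <- 0.43                       # its robust standard error
t.act    <- (beta.hat - 0) / se.beta   # t-statistic for H0: beta_j = 0
p.value  <- 2 * pnorm(-abs(t.act))     # two-sided p-value, 2*Phi(-|t|)
c(t = t.act, p = p.value)              # small differences from the values below are due to rounding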

Confidence Intervals for a Single Coefficient

The method for constructing a confidence interval is the same as in the single-regressor case. The \( (1-\alpha)\times 100\% \) confidence interval for coefficient \( \beta_j \) is

\[ [\hat{\beta}_j - z_{\alpha/2}SE(\hat{\beta}_j), \hat{\beta}_j + z_{\alpha/2}SE(\hat{\beta}_j)] \]
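
A sketch of the same calculation in R for a 95% interval (so \( z_{\alpha/2} = 1.96 \)), again using the estimate and standard error for the coefficient on \( STR \) from the application below:

beta.hat <- -1.10
se.beta  <- 0.43
z <- qnorm(1 - 0.05/2)                    # about 1.96 for a 95% confidence level
beta.hat + c(-1, 1) * z * se.beta         # lower and upper bounds (differences from below are rounding)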

Application to Test Scores and the Student-Teacher Ratio (1)

To restate our results from regressing \( TestScore \) on \( STR \) and \( PctEL \)

\[ \begin{array}{lclclcl} TestScore & = & 686.0 & - & 1.10 \times STR & - & 0.650 \times PctEL \\ & & (8.7) & & (0.43) & & (0.031) \end{array} \]

To test the hypothesis that the coefficient on \( STR \) is 0 we compute the \( t \)-statistic \[ t = \frac{-1.10 - 0}{0.43} = -2.54 \] and the associated \( p \)-value is \[ p\text{-value} = 2\Phi(-2.54) = 0.011 \] We can reject the hypothesis at a significance level of 5% but not 1%.

Application to Test Scores and the Student-Teacher Ratio (2)

To calculate the 95% confidence interval for the coefficient on \( STR \)

\[ -1.10 \pm 1.96 \times 0.43 = (-1.95, -0.26) \]

And for an increase of 2 in the STR, the 95% confidence interval for the effect on test scores is

\[ (-1.95\times 2, -0.26\times 2) = (-3.90, -0.52) \]

Adding Another Regressor (1)

Suppose we now also want to estimate the effect of expenditure per student. We want to know whether budget cuts would be a good idea. We add a new regressor to the two we already have

\[ TestScore_i = \beta_0 + \beta_1 STR + \beta_2 Expn + \beta_3 PctEL + u_i \]

library(lmtest)     # for coeftest()
library(sandwich)   # for vcovHC()
test.score.data$expn.per.1k <- test.score.data$expn.stu/1000   # expenditure per student, in $1000s
regress.results <- lm(testscr ~ str + expn.per.1k + el.pct, data = test.score.data)
coeftest(regress.results, vcov.=vcovHC(regress.results))       # heteroskedasticity-robust t-tests

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 649.5779    15.6686   41.46   <2e-16 ***
str          -0.2864     0.4875   -0.59    0.557    
expn.per.1k   3.8679     1.6074    2.41    0.017 *  
el.pct       -0.6560     0.0321  -20.43   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Adding Another Regressor (2)

The regression results can be restated as

\[ \begin{array}{lclclclcl} TestScore & = & 649.58 &-& 0.29 \times STR &+& 3.87 \times Expn &-& 0.66 \times PctEL \\ & & (15.38) & & (0.48) & & (1.57) & & (0.03) \end{array} \]
  • Notice that the coefficient on \( STR \) has dropped from \( -1.10 \) in the original regression to \( -0.29 \)
  • The \( t \)-statistic for the hypothesis that the coefficient on \( STR \) is zero is \( (-0.29-0)/0.48 = -0.60 \) and the \( p \)-value is \( 0.551 \)
  • We cannot reject the hypothesis that it is zero even at the 10% level
  • This leads us to conclude that there is no evidence that hiring more teachers (reducing \( STR \)) improves test scores when expenditure per student is held constant.

Adding Another Regressor (3)

\[ \begin{array}{lclclclcl} TestScore & = & 649.58 & - & 0.29 \times STR & + & 3.87 \times Expn & - & 0.66 \times PctEL \\ & & (15.38) & & (0.48) & & (1.57) & & (0.03) \end{array} \]
  • Note that the standard error of the coefficient on \( STR \) increased when we added \( Expn \)
  • Note that there is a strong (negative) correlation between \( STR \) and \( Expn \)
# sample correlation between the student-teacher ratio and expenditure per student
cor(test.score.data$str, test.score.data$expn.stu)
[1] -0.62

and hence we're seeing the effect of imperfect multicollinearity: because \( STR \) and \( Expn \) are highly correlated, it is hard to estimate their individual effects precisely, which inflates the standard errors

Tests of Joint Hypothesis

Testing Hypotheses on Two or More Coefficients

\[ TestScore_i = \beta_0 + \beta_1 STR + \beta_2 Expn + u_i \]

Suppose that in the test score/STR analysis, an angry taxpayer hypothesizes that neither the STR nor expenditure per student has an effect on test scores

\[ \begin{align*} H_0&: \beta_1 = 0 \text{ and } \beta_2 = 0 \\ H_1&: \beta_1 \ne 0 \text{ and/or } \beta_2 \ne 0 \end{align*} \]

(As a matter of terminology, here we see the null hypothesis imposing two restrictions)

Joint Hypothesis

A joint hypothesis is a hypothesis that imposes \( q \geq 2 \) restrictions on the regression coefficients.

\( H_0: \beta_j = \beta_{j,0}, \beta_m = \beta_{m,0},\dots \) for \( q \) restrictions

\( H_1: \) one or more of the \( q \) restrictions under \( H_0 \) does not hold

For example, suppose we wanted to test the null hypothesis that the \( 2^{nd}, 4^{th}, \text{and }5^{th} \) coefficients are zero, we would have the \( q=3 \) restrictions \( \beta_2 = 0, \beta_4 = 0, \text{and }\beta_5 = 0 \)

Why Not Each Coefficient One At a Time? (1)

  • If we want to test the joint null hypothesis \( \beta_1 = 0 \) and \( \beta_2 = 0 \), why can't we simply compute the \( t \)-statistic for each restriction and reject the null if either exceeds the critical value in absolute value?
  • Because \( t_1 \) and \( t_2 \), the \( t \)-statistics for the tests on \( \beta_1 \) and \( \beta_2 \) respectively, are random variables with (in large samples, under the null) a bivariate normal distribution, where each has a marginal \( N(0,1) \) distribution.

Why Not Each Coefficient One At a Time? (2)

Consider the special case where \( t_1 \) and \( t_2 \) are uncorrelated

  • What is the probability that this one-at-a-time procedure rejects the joint null hypothesis when it is in fact true? It turns out to be greater than the desired significance level \( \alpha \)!
  • Take \( \alpha = 0.05 \): the joint null hypothesis is not rejected if both \( |t_1| \leq 1.96 \) and \( |t_2| \leq 1.96 \). This has the probability \[ Pr(|t_1| \leq 1.96) \times Pr(|t_2| \leq 1.96) = 0.95^2 = 0.9025 \] And hence the actual \( \alpha \) is \( 1-0.9025 = 0.0975 \)
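
A small simulation sketch (assuming, as above, that \( t_1 \) and \( t_2 \) are independent standard normals under the joint null; not part of the original slides) illustrates this size distortion:

set.seed(1)
n.sim <- 1e5
t1 <- rnorm(n.sim)                            # t-statistics under the joint null, assumed independent
t2 <- rnorm(n.sim)
reject <- abs(t1) > 1.96 | abs(t2) > 1.96     # one-at-a-time rule: reject if either test rejects
mean(reject)                                  # approximately 0.0975, not 0.05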

The F-Statistic

  • In order to test a joint hypothesis with \( q=2 \) restrictions we rely on the \( F \)-statistic \[ F = \frac{1}{2}\left(\frac{t_1^2 + t_2^2 - 2\hat{\rho}_{t_1,t_2}t_1t_2}{1 - \hat{\rho}_{t_1,t_2}^2}\right) \sim F_{2,\infty} \] where \( \hat{\rho}_{t_1,t_2} \) is an estimator of the correlation between the two \( t \)-statistics.
  • In general, for a hypothesis with \( q \) restrictions we would have \[ F \sim F_{q,\infty} \]
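
As an illustration only (a sketch of the \( q=2 \) formula above, not the routine used by lht() in the application below), the statistic could be computed directly from the two \( t \)-statistics and their estimated correlation:

# F-statistic for q = 2 restrictions from the two t-statistics and their estimated correlation rho
f.stat.q2 <- function(t1, t2, rho) {
  0.5 * (t1^2 + t2^2 - 2 * rho * t1 * t2) / (1 - rho^2)
}
f.stat.q2(t1 = 1.5, t2 = 2.0, rho = 0.3)   # hypothetical values, for illustration only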

Computing the p-Value Using The F-Statistic

We can use the large-sample \( F_{q,\infty} \) approximation to calculate the \( p \)-value for an observed \( F^{act} \) \[ p\text{-value} = Pr[F_{q,\infty} > F^{act}] \]
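
Because \( F_{q,\infty} \) is the distribution of \( \chi^2_q/q \), this \( p \)-value is easy to compute in R. As a sketch, using the statistic reported in the application below (\( q = 2 \), \( F^{act} = 5.26 \)):

q <- 2
F.act <- 5.26                                   # value reported by lht() in the application below
pchisq(q * F.act, df = q, lower.tail = FALSE)   # about 0.005; lht() reports 0.0055 using F(2, 416)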

Application to Test Scores Regression

We can now test the null hypothesis that the coefficients on \( STR \) and \( Expn \) are zero, against the alternative that at least one of them is nonzero, holding \( PctEL \) fixed. In R we can calculate the heteroskedasticity-robust \( F \)-statistic and its \( p \)-value, for the restrictions \( \beta_1 = 0 \) and \( \beta_2 = 0 \)

library(car)   # for lht() (linearHypothesis)
regress.results <- lm(testscr ~ str + expn.per.1k + el.pct, data = test.score.data)
# heteroskedasticity-robust F-test of the joint restrictions str = 0 and expn.per.1k = 0
lht(regress.results, c('str = 0', 'expn.per.1k = 0'), test='F', vcov.=vcovHC(regress.results))
Linear hypothesis test

Hypothesis:
str = 0
expn.per.1k = 0

Model 1: restricted model
Model 2: testscr ~ str + expn.per.1k + el.pct

Note: Coefficient covariance matrix supplied.

  Res.Df Df    F Pr(>F)   
1    418                  
2    416  2 5.26 0.0055 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Testing Single Restrictions Involving Multiple Coefficients

One Restriction with Two Coefficients (1)

Suppose we now want to test the null hypothesis (\( q = 1 \)) that two of the parameters are equal

\[ \begin{align*} H_0&: \beta_1 = \beta_2 \\ H_1&: \beta_1 \ne \beta_2 \end{align*} \]

One Restriction with Two Coefficients (2)

There are two approaches to doing this:

  • Test Directly: some software packages allow a \( q=1 \) null hypothesis involving multiple coefficients to be tested directly

One Restriction with Two Coefficients (3)

  • Transform the Regression: Consider the model \[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \] We can transform it by adding and subtracting \( \beta_2 X_{1i} \) \[ \begin{align*} Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i \\ &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_2 X_{1i} - \beta_2 X_{1i} + u_i \\ &= \beta_0 + (\underbrace{\beta_1 - \beta_2}_{\gamma_1}) X_{1i} + \beta_2 (\underbrace{X_{1i} + X_{2i}}_{W_i}) + u_i \\ &= \beta_0 + \gamma_1 X_{1i} + \beta_2 W_i + u_i \end{align*} \]

    The test simplifies to testing \( H_0: \gamma_1 = 0 \) vs \( H_1: \gamma_1 \ne 0 \).
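
As an illustration (a sketch, not from the original slides), the transformed-regression approach for the hypothesis \( \beta_1 = \beta_3 \) tested directly on the next slide could be run in R as follows; the column w is introduced here purely for illustration:

# W_i = X_{1i} + X_{2i}: here the two coefficients being compared are those on str and el.pct
test.score.data$w <- test.score.data$str + test.score.data$el.pct
transformed.results <- lm(testscr ~ str + expn.per.1k + w, data = test.score.data)
# the coefficient on str is now gamma_1 = beta_1 - beta_3; its t-test is the test of H0: beta_1 = beta_3
coeftest(transformed.results, vcov. = vcovHC(transformed.results))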

Testing One Restriction with Two Coefficients with R

Consider testing \( \beta_1 = \beta_3 \) (the coefficients on \( STR \) and \( PctEL \)) in the test scores model

regress.results <- lm(testscr ~ str + expn.per.1k + el.pct, data = test.score.data)
lht(regress.results, 'str = el.pct', test='F', vcov.=vcovHC(regress.results))
Linear hypothesis test

Hypothesis:
str - el.pct = 0

Model 1: restricted model
Model 2: testscr ~ str + expn.per.1k + el.pct

Note: Coefficient covariance matrix supplied.

  Res.Df Df    F Pr(>F)
1    417               
2    416  1 0.56   0.45

Confidence Sets of Multiple Coefficients

Confidence Sets for Multiple Coefficients

  • In the single regressor case, we described an interval over which we are \( (1-\alpha)\times 100\% \) confident that the true parameter value lies
  • We now want to be able to describe a \( k \)-dimensional set over which we are \( (1-\alpha)\times 100\% \) confident that the true values of \( k \) parameters lie
  • Consider the case of two coefficients: \( \beta_1 \) and \( \beta_2 \).
  • A 95% confidence set would be the two-dimensional region that contains the true values of \( \beta_1 \) and \( \beta_2 \) with 95% confidence

Graphical Representation of a Confidence Set
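
One way such a confidence ellipse could be drawn in R is sketched below, using the confidenceEllipse() function from the car package (loaded above) and the regression with \( STR \), \( Expn \), and \( PctEL \) estimated earlier; this is an illustration, not part of the original slides.

regress.results <- lm(testscr ~ str + expn.per.1k + el.pct, data = test.score.data)
# 95% confidence ellipse for coefficients 2 and 3 (str and expn.per.1k);
# note: this uses the model's default (non-robust) covariance matrix
confidenceEllipse(regress.results, which.coef = c(2, 3), levels = 0.95)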

Model Specification for Multiple Regression

How To Choose Regressors

  • The question now becomes: which model specification do we use, i.e., which variables should our dependent variable be regressed on?
  • We should not rely solely on purely statistical measures such as the \( R^2 \) or \( \bar{R}^2 \)
  • Instead, we should rely on our expert knowledge of the problem being analyzed.

Omitted Variable Bias in Multiple Regression (1)

  • In a multiple regression we will have OVB if both of the following hold:
    • there is an omitted variable that is correlated with at least one of the included regressors
    • this omitted variable is a determinant of the dependent variable \( Y \)
  • Mathematically, OVB means that \( E[u_i|X_{1i},\dots,X_{ki}] \) is nonzero, which violates A.1

Omitted Variable Bias in Multiple Regression (2)

Consider the test scores example

  • We are regressing \( TestScore \) on \( STR \), \( Expn \) and \( PctEL \)
  • But consider that wealthier households would provide greater opportunities for learning outside school for their children
  • Additionally, districts with wealthier residents would tend to have smaller classes and greater expenditure per student
  • Hence, the level of affluence of households in a district, which is omitted, would be correlated with \( STR \) and \( Expn \) and have a direct effect on test scores

The Role of Control Variables in Multiple Regression

A control variable is a variable added to a regression to remove OVB from the coefficient on the variable of interest. The coefficient on the control variable itself is not of interest, and it cannot be assumed to have a causal interpretation (an influence on the dependent variable).

Controlling For Students' Economic Background (1)

Now, in order to control for the possible OVB in the test scores example due to the “outside learning opportunities” available to richer households, we add the regressor \( LchPct \): the percentage of students receiving free or subsidized school lunch.

# meal.pct corresponds to LchPct: the percentage of students receiving free or subsidized lunch
regress.results <- lm(testscr ~ str + el.pct + meal.pct, 
              data = test.score.data)
coeftest(regress.results, vcov.=vcovHC(regress.results))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 700.1500     5.6410  124.12  < 2e-16 ***
str          -0.9983     0.2738   -3.65  0.00030 ***
el.pct       -0.1216     0.0332   -3.66  0.00029 ***
meal.pct     -0.5473     0.0243  -22.51  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Controlling For Students' Economic Background (2)

  • No major change is observed for the coefficient on \( STR \), and it remains significant at the 1% level
  • The coefficient on \( LchPct \) is significant and very high. What can we make of this?
    • The size of the coefficient suggests that dropping the percentage of students receiving free or subsidized meals from 50% to 0% would be associated with an increase of about 27 points (\( 0.547\times (50 - 0) \approx 27.4 \))
    • Can we claim the same causality for \( LchPct \) that we do for \( STR \)?

Assumption A.1': Conditional Mean Independence (1)

  • We replace assumption A.1 with assumption A.1' that requires conditional mean independence instead of conditional mean zero (\( E[u_i|X_{1i},\dots,X_{ki}] = 0 \)).
  • To explain conditional mean independence, consider a regression with two regressors: \( X_{1i} \), the variable of interest, and \( X_{2i} \), the control variable. Conditional mean independence would be

    \[ E[u_i|X_{1i}, X_{2i}] = E[u_i|X_{2i}] \]

Assumption A.1': Conditional Mean Independence (2)

\[ E[u_i|X_{1i}, X_{2i}] = E[u_i|X_{2i}] \]

  • Therefore, in order to interpret the coefficient on \( X_{1i} \) causally, the conditional mean of \( u_i \) given \( X_{1i} \) and \( X_{2i} \) must not depend on \( X_{1i} \) (although it may depend on \( X_{2i} \))
  • In other words, once we control for \( X_{2i} \) we can assume that \( X_{1i} \) is randomly assigned and not correlated with any unobserved characteristics included in \( u_i \)
  • The control variable could still be correlated with \( u_i \), and hence its own coefficient may suffer from OVB

Model Specification in Theory and in Practice (1)

  • Given a set of variables, the question is which should be included in our regression?
  • If there is an OVB problem we clearly need to add the variables to control for it, but in practice how do we decide which should be controlled for?
  • Using our expert judgment or economic theory, we first decide on a core set of variables to regress on. This is called the base specification

Model Specification in Theory and in Practice (2)

  • Next we develop a list of candidate alternative specifications
  • If we see no meaningful difference in the estimates of interest between the base and an alternative specification, we conclude that we do not need to add the extra regressors from the alternative specification
  • If there are differences (suggesting OVB) we need to control for the extra omitted variables

Interpreting the R Squared and the Adjusted R Squared in Practice

As mentioned before, we must be careful not to rely too heavily on these measures when choosing a specification. Some problems with the \( R^2 \) and \( \bar{R}^2 \):

  • An increase in the \( R^2 \) or \( \bar{R}^2 \) does not indicate that an added variable is statistically significant
  • The \( R^2 \) and \( \bar{R}^2 \) do not indicate causality
  • The \( R^2 \) and \( \bar{R}^2 \) do not indicate the absence of OVB
  • The \( R^2 \) and \( \bar{R}^2 \) do not indicate whether the choice of regressors is appropriate

Analysis of the Test Score Data Set

Discussion of the Base and Alternative Specifications (1)

As we've done before, we want to run a multiple regression to determine the effect of the student-teacher ratio on average district test scores. We explained why we are concerned about OVB and need to control for students' background characteristics. Some controls we will consider:

  • The percentage of students learning English
  • The percentage of students who are eligible for free or subsidized lunch
  • The percentage of students whose families qualify for income assistance

Discussion of Empirical Results

  1. Controlling for student background characteristics cuts the estimated effect of \( STR \) almost in half, but it remains significant. The differences across the alternative specifications are not large.
  2. Student characteristics are a strong predictor of test scores as witnessed by the big increase in \( \bar{R}^2 \).
  3. The control variable for the percentage of families receiving income assistance, when added to regression (3) as in specification (5), is not significant; it is therefore redundant once the percentage of students receiving free or subsidized meals is included.