Chapter 5: Regression with a Single Regressor (Hypothesis Tests and Confidence Intervals)

Karim Naguib (Boston University)
9/14/2013

Inference on the Slope Parameter

  • Recall that our model is

    \[ Y_i = \beta_0 + \beta_1 X_i + u_i \]

  • We know the sampling distribution of the estimator \( \hat{\beta}_1 \)

    \[ \hat{\beta}_1 \sim N\left(\beta_1, \frac{Var[(X_i - \mu_X)u_i]}{n[Var(X_i)]^2}\right) \]

  • We would like to carry out some inferences on the population parameter \( \beta_1 \)

Testing Hypotheses About One of the Regression Coefficients

Hypothesis Testing for the Slope Parameter

  • Suppose that, in the California test score example relating test scores to the student-teacher ratio \( STR \), a parent claims that class size has no effect on test scores
  • We want to represent this claim in a testable form: the claim is that \( \beta_{ClassSize} = 0 \), which means that the regression line is flat
  • Given the data from the 420 school districts can we reject this hypothesis?

Two-Sided Hypothesis Testing

  • Recall that to test any hypothesis we rely on the \( t \)-statistic

    \[ t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}} \]

  • From that we calculate the \( p \)-value which tells us the smallest significance level at which \( H_0 \) can be rejected

    \[ p\text{-value} = 2 \Phi(-|t|) \]

Two-Sided Hypotheses Concerning the Slope Parameter (1)

  • To test the hypothesis that a coefficient such as \( \beta_{ClassSize} \) is equal to a particular value, we generally state the test as

\[ \begin{align*} H_0&: \beta_1 = \beta_{1,0} \\ H_1&: \beta_1 \ne \beta_{1,0} \end{align*} \] and the \( t \)-statistic is

\[ t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)} \]

Two-Sided Hypotheses Concerning the Slope Parameter (2)

To calculate the \( t \)-statistic, the first thing we need is the standard error of \( \hat{\beta}_1 \). Recall that the standard error is an estimate of \( \sigma_{\hat{\beta}_1} \) (the standard deviation of \( \hat{\beta}_1 \)) \[ SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}_{\hat{\beta}_1}^2} \] where \[ \hat{\sigma}_{\hat{\beta}_1}^2 = \frac{1}{n}\times \frac{\overbrace{\frac{1}{n-2}\sum_i(X_i - \bar{X})^2\hat{u}_i^2}^{\text{sample variance of }(X_i - \mu_X)u_i}}{\big[\underbrace{\frac{1}{n - 1}\sum_i(X_i - \bar{X})^2}_{\text{sample variance of }X_i}\big]^2} \]
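This standard error can be computed by hand from the regression residuals. Below is a minimal R sketch, assuming the test.score.data data frame (with columns testscr and str) used throughout these slides:

regress.results <- lm(testscr ~ str, data = test.score.data)
x <- test.score.data$str
u.hat <- residuals(regress.results)
n <- length(x)

# numerator: sample variance of (X_i - mu_X)u_i
num <- sum((x - mean(x))^2 * u.hat^2) / (n - 2)
# denominator: squared sample variance of X_i
den <- (sum((x - mean(x))^2) / (n - 1))^2
se.beta1.robust <- sqrt(num / den / n)   # heteroskedasticity-robust SE(beta_1-hat)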

Two-Sided Hypotheses Concerning the Slope Parameter (3)

To calculate the \( p \)-value \[ \begin{align*} p\text{-value} &= Pr\left[|\hat{\beta}_1 - \beta_{1,0}| > |\hat{\beta}_1^{act} - \beta_{1,0}|\big|H_0\right] \\ &= Pr\left[\left|\frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}\right| > \left|\frac{\hat{\beta}_1^{act} - \beta_{1,0}}{SE(\hat{\beta}_1)}\right|\big|H_0\right] \\ &= Pr[|t| > |t^{act}|\big|H_0] \end{align*} \] And using a large sample, \( t \) can be approximated by a standard normal distribution, and therefore \[ p\text{-value} = Pr(|Z| > |t^{act}|) = 2\Phi(-|t^{act}|) \] where \( Z \sim N(0,1) \)

R Regression Output for Test Scores/STR Model

regress.results <- lm(testscr ~ str, data = test.score.data)
summary(regress.results)

Call:
lm(formula = testscr ~ str, data = test.score.data)

Residuals:
   Min     1Q Median     3Q    Max 
-47.73 -14.25   0.48  12.82  48.54 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   698.93       9.47   73.82  < 2e-16 ***
str            -2.28       0.48   -4.75  2.8e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.6 on 418 degrees of freedom
Multiple R-squared:  0.0512,    Adjusted R-squared:  0.049 
F-statistic: 22.6 on 1 and 418 DF,  p-value: 2.78e-06

Reporting Regression Results

For the test scores/STR model we saw that \( SE(\hat{\beta}_0) = 9.4675 \) and \( SE(\hat{\beta}_1) = 0.4798 \). A compact way to report regression results is \[ \begin{align*} \widehat{TestScore} = &698.9330 - 2.2798 \times STR\\ (&9.4675)~ ~ ~ ~ ~(0.4798) \\ \\ R^2 = &0.05124,~SER = 18.58 \end{align*} \] And the reported \( t \)-statistic is calculated as \[ t^{act} = \frac{-2.2798 - 0}{0.4798} = -4.751 \] Since \( |t^{act}| > z_{0.025} = 1.96 \) we can reject \( H_0 \) at the 5% significance level.
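These numbers can also be recovered from the fitted model object. A minimal sketch (beta1.hat, se.beta1, and t.act are illustrative names; the standard errors here are the homoskedasticity-only ones that summary() reports by default, discussed at the end of this chapter):

coefs <- summary(regress.results)$coefficients
beta1.hat <- coefs["str", "Estimate"]
se.beta1 <- coefs["str", "Std. Error"]

t.act <- (beta1.hat - 0) / se.beta1   # t-statistic for H0: beta_1 = 0
p.value <- 2 * pnorm(-abs(t.act))     # large-sample two-sided p-value
abs(t.act) > qnorm(0.975)             # TRUE: reject H0 at the 5% level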

One-Sided Hypotheses Concerning the Slope Parameter

Alternatively, we can have a one-sided test for \( \beta_1 \) such as \[ \begin{align*} H_0&: \beta_1 = \beta_{1,0} \\ H_1&: \beta_1 < \beta_{1,0} \end{align*} \] In this case, our decision rule would be to reject \( H_0 \) if \[ t^{act} < -z_{\alpha} \] and \[ p\text{-value} = Pr(Z < t^{act}) = \Phi(t^{act}) \] (If the direction of the inequality in \( H_1 \) were reversed, the decision rule would be to reject if \( t^{act} > z_{\alpha} \), and \( p\text{-value} = 1 - \Phi(t^{act}) \))
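In R, only the tail of the normal distribution changes. A minimal sketch, reusing the illustrative t.act computed in the sketch above:

pnorm(t.act)          # p-value for H1: beta_1 < beta_1,0
1 - pnorm(t.act)      # p-value for H1: beta_1 > beta_1,0
t.act < -qnorm(0.95)  # TRUE: reject H0 at the 5% level against H1: beta_1 < 0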

One-Sided Test for Test Scores

Consider the hypotheses

\[ \begin{align*} H_0&: \beta_{ClassSize} = 0 \\ H_1&: \beta_{ClassSize} < 0 \end{align*} \]

Since we know from the regression output that

\[ t^{act} = -4.751 < -z_{0.05} = -1.645 \]

we can reject \( H_0 \) at the 5% significance level.

Confidence Intervals For a Regression Coefficient

Confidence Interval for the Slope Parameter

The \( (1-\alpha)\times 100 \% \) confidence interval for the parameter \( \beta_1 \) is \[ [\hat{\beta}_1 - z_{\alpha/2}SE(\hat{\beta}_1), \hat{\beta}_1 + z_{\alpha/2}SE(\hat{\beta}_1)] \]
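Constructing this interval by hand is straightforward. A minimal sketch, reusing the illustrative beta1.hat and se.beta1 from the earlier sketch (note that confint(), used on the next slide, is based on Student-t rather than normal quantiles, so its limits can differ slightly):

alpha <- 0.05
z <- qnorm(1 - alpha / 2)   # 1.96 for a 95% interval
c(beta1.hat - z * se.beta1, beta1.hat + z * se.beta1)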

Application to Test Scores

The 95% confidence interval for \( \beta_{ClassSize} \) is \[ \begin{align*} &[\hat{\beta}_{ClassSize} - z_{0.025}SE(\hat{\beta}_{ClassSize}),\hat{\beta}_{ClassSize} + z_{0.025}SE(\hat{\beta}_{ClassSize})] = \\ &[-2.28 - 1.96\times 0.48,-2.28 + 1.96\times 0.48] = \\ &[-3.22, -1.34] \end{align*} \]

regress.results <- lm(testscr ~ str, data = test.score.data)
confint(regress.results, "str", level = 0.95) 
     2.5 % 97.5 %
str -3.223 -1.337

Confidence Interval for Predicted Effects of Changing X

The confidence interval for the predicted change in the dependent variable \( Y \) in response to a change \( \Delta x \) in \( X \) is \[ \left[ [\hat{\beta}_1 - z_{\alpha/2}SE(\hat{\beta}_1)]\times \Delta x, [\hat{\beta}_1 + z_{\alpha/2}SE(\hat{\beta}_1)]\times \Delta x \right] \]

Confidence Interval for Predicted Effects of Changing *STR*

Suppose we want to calculate the confidence interval for the predicted change in test scores in response to a change of 2 in the \( STR \) \[ \begin{align*} &\left[ (-2.28 - 1.96\times 0.48)\times 2,(-2.28 + 1.96\times 0.48)\times 2\right] = \\ &[-6.45, -2.67] \end{align*} \] For a reduction of 2 in the \( STR \) (\( \Delta x = -2 \)) the signs flip, so test scores are predicted to rise by between 2.67 and 6.45 points.

confint(regress.results, 'str', level = 0.95) * 2
     2.5 % 97.5 %
str -6.446 -2.673

Regression When X Is a Binary Variable

Dummy Variables

  • It is possible for our regressor \( X \) to be a discrete categorical variable, as opposed to a continuous one. For example, it can indicate a worker's gender (= 1 if female, = 0 if male), or whether a school district is in an urban or rural area (= 1 if urban, = 0 if rural).
  • Such a regressor is called an indicator variable or a dummy variable

Interpretation of the Regression Coefficient (1)

  • The regression method used for a dummy variable regression is the same as was done with a continuous regressor
  • However, the interpretation is not the same
  • In the test score model, consider the dummy variable \( D_i \) where it is \[ \begin{equation*} D_i = \left\{ \begin{array}{ll} 1 & \text{if the STR in district }i < 20 \\ 0 & \text{if the STR in district }i \geq 20 \end{array} \right. \end{equation*} \]
  • The population regression model is \[ Y_i = \beta_0 + \beta_1 D_i + u_i,~i=1,\dots,n \]
  • We can no longer interpret \( \beta_1 \) as the slope. We usually refer to \( \beta_1 \) as the coefficient on \( D_i \)

Interpretation of the Regression Coefficient (2)

In order to interpret the meaning of the coefficient \( \beta_1 \), we need to consider the different cases of \( D_i \)

  • [\( D_i = 0 \)] The regression equation becomes \( Y_i = \beta_0 + u_i \) and since \( E[u_i|D_i] = 0 \) (A.1), \( E[Y_i|D_i = 0] = \beta_0 \) (the population mean of test scores when the \( STR \) in district \( i \) is high)
  • [\( D_i = 1 \)] The regression equation becomes \( Y_i = \beta_0 + \beta_1 + u_i \) and since \( E[u_i|D_i] = 0 \) (A.1), \( E[Y_i|D_i = 1] = \beta_0 + \beta_1 \) (the population mean of test scores when the STR in district \( i \) is low)

Therefore, \( \beta_1 \) can be viewed as the difference between the two means \[ E[Y_i|D_i = 1] - E[Y_i|D_i = 0] = (\beta_0 + \beta_1) - \beta_0 = \beta_1 \]
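This difference-in-means interpretation can be checked directly in R. A minimal sketch, using the same dummy variable d that is constructed on the R slide below:

test.score.data$d <- ifelse(test.score.data$str < 20, 1, 0)

# difference in sample means between low-STR and high-STR districts ...
with(test.score.data, mean(testscr[d == 1]) - mean(testscr[d == 0]))
# ... equals the OLS coefficient on d
coef(lm(testscr ~ d, data = test.score.data))["d"]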

Hypothesis Testing and Confidence Intervals for the Dummy Coefficient

  • What is of interest is to test the hypothesis that there is no difference between the population means of the two groups \( (D_i = 0) \) and \( (D_i = 1) \), which is the same two-sided test as before

\[ \begin{align*} H_0&: \beta_1 = 0 \\ H_1&: \beta_1 \ne 0 \end{align*} \]

  • Confidence intervals are calculated the same way as with continuous regressor variables

Dummy Variable Regression Using R

test.score.data$d <- ifelse(test.score.data$str < 20, 1, 0)
str(test.score.data$d)
 num [1:420] 1 0 1 1 1 0 1 0 1 0 ...
regression.results <- lm(testscr ~ d, data = test.score.data)
summary(regression.results)

Call:
lm(formula = testscr ~ d, data = test.score.data)

Residuals:
   Min     1Q Median     3Q    Max 
-50.60 -14.05  -0.45  12.84  49.40 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   649.98       1.39     468  < 2e-16 ***
d               7.37       1.84       4  7.5e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 18.7 on 418 degrees of freedom
Multiple R-squared:  0.0369,    Adjusted R-squared:  0.0345 
F-statistic:   16 on 1 and 418 DF,  p-value: 7.52e-05

Heteroskedasticity and Homoskedasticity

Heteroskedasticity and Homoskedasticity

The error term \( u_i \) is said to be homoskedastic if the variance of the conditional distribution of \( u_i \) given \( X_i \) \[ Var(u_i|X_i = x), \] is constant for all \( i \) and does not depend on \( x \). Otherwise, the error term is said to be heteroskedastic.
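The difference is easy to see with simulated data. A minimal sketch (all numbers below are made up purely for illustration):

set.seed(1)
n <- 500
x <- runif(n, 10, 30)

u.homo <- rnorm(n, sd = 5)           # Var(u_i|X_i = x) constant in x
u.hetero <- rnorm(n, sd = 0.5 * x)   # Var(u_i|X_i = x) increasing in x

par(mfrow = c(1, 2))
plot(x, 700 - 2 * x + u.homo, ylab = "y", main = "Homoskedastic errors")
plot(x, 700 - 2 * x + u.hetero, ylab = "y", main = "Heteroskedastic errors")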

Homoskedasticity Example

Heteroskedasticity Example

Numerical Example (1)

Suppose we are interested in the gender gap in income of college graduates in the US. Our model is

\[ Earnings_i = \beta_0 + \beta_1 MALE_i + u_i,~i=1,\dots,n \]

where \( MALE_i \) is a dummy variable that is equal to 1 for males and 0 for females. \( \beta_1 \) would be the difference in mean earnings between men and women.

Numerical Example (2)

It is useful to write this model as two equations

  • Women (\( MALE_i = 0 \)): \( Earnings_i = \beta_0 + u_i \)
  • Men (\( MALE_i = 1 \)): \( Earnings_i = \beta_0 + \beta_1 + u_i \)

  • \( u_i \) are the deviations around mean earnings. Any statement about the variance of \( u_i \) is therefore a statement about the variance of earnings
  • If \( u_i \) is homoskedastic, the variance (spread) of earnings is the same in the male and female subpopulations
  • Conversely, if \( u_i \) is heteroskedastic, the variance (spread) of earnings differs between the male and female subpopulations

Mathematical Implications of Homoskedasticity (1)

  • Because none of the OLS assumptions (A.1-A.3) covered in the previous chapter place any restrictions on the conditional variance of the error term \( u_i \), the OLS estimators remain unbiased and consistent whether the errors are homoskedastic or heteroskedastic
  • If assumptions A.1-A.3 hold and the error terms are homoskedastic, then the OLS estimators \( \hat{\beta}_0 \) and \( \hat{\beta}_1 \) are also efficient among all linear unbiased estimators (the Gauss-Markov theorem)

Mathematical Implications of Homoskedasticity (2)

If the error terms \( u_i \) are homoskedastic, we can simplify the formula for the estimate of the variance of \( \hat{\beta}_1 \) to be \[ \tilde{\sigma}_{\hat{\beta}_1}^2 = \frac{s_{\hat{u}}^2}{\sum_i (X_i - \bar{X})^2} \] and the homoskedasticity-only standard error of \( \hat{\beta}_1 \) would be \[ SE(\hat{\beta}_1) = \sqrt{\tilde{\sigma}_{\hat{\beta}_1}^2} \]
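A minimal sketch computing this homoskedasticity-only standard error by hand; it matches the standard error reported by summary(), which uses the same formula:

x <- test.score.data$str
u.hat <- residuals(regress.results)
n <- length(x)

s2.u <- sum(u.hat^2) / (n - 2)      # estimated variance of the residuals
sqrt(s2.u / sum((x - mean(x))^2))   # homoskedasticity-only SE(beta_1-hat)
summary(regress.results)$coefficients["str", "Std. Error"]   # same value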

When To Assume Homoskedasticity?

  • It is typically safer to assume that the error terms are heteroskedastic
  • We should normally use the heteroskedasticity-robust standard errors that were presented in the previous chapter, unless we have very strong reasons to assume homoskedasticity

Homoskedasticity and Heteroskedasticity in R (1)

The reason some of the standard errors we obtained from R differ from those in the textbook is that R's summary() reports homoskedasticity-only standard errors by default, while the textbook does all its calculations using heteroskedasticity-robust standard errors.

Homoskedasticity and Heteroskedasticity in R (2)

To use heteroskedasticity-robust standard errors in R

library(sandwich)
library(lmtest)
regress.results <- lm(testscr ~ str, data = test.score.data)
heterosk.robust.se <- vcovHC(regress.results)
coeftest(regress.results, vcov.=heterosk.robust.se)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  698.933     10.461   66.82  < 2e-16 ***
str           -2.280      0.524   -4.35  1.7e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
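Note that vcovHC() defaults to the HC3 variant of the robust variance estimator. If the goal is to reproduce the textbook's robust standard errors, which typically correspond to the HC1 correction, the variant can be requested explicitly:

heterosk.robust.se <- vcovHC(regress.results, type = "HC1")
coeftest(regress.results, vcov. = heterosk.robust.se)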