Discussion Heteroskedasticity

Author

Bryan Calderon

Part 1

1) What is “heteroskedasticity”, and what econometric issue does it cause (does it affect point estimates or standard errors)? (2-3 sentences in your own words.)

  • Heteroskedasticity occurs when the variability of the errors in a regression model changes across observations. While the point estimates of the coefficients remain unbiased, it leads to incorrect standard errors, making hypothesis tests and confidence intervals unreliable; a small simulated sketch follows.
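To make the consequence concrete, here is a minimal simulated sketch (variable names are illustrative, not from the assignment): the error variance grows with x, the slope estimate stays close to the truth, but the classical and robust standard errors diverge. It uses vcovHC() from the sandwich package and coeftest() from lmtest.

library(sandwich)  # heteroskedasticity-robust covariance estimators
library(lmtest)    # coeftest() re-tests coefficients with a chosen covariance

set.seed(42)
n <- 500
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)  # error variance grows with x

sim_model <- lm(y ~ x)

# Classical standard errors (assume homoskedasticity)
coeftest(sim_model)

# Heteroskedasticity-robust (HC1) standard errors
coeftest(sim_model, vcov = vcovHC(sim_model, type = "HC1"))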

2) What are the null and alternative hypotheses in the Breusch–Pagan test or the White test? The hypotheses are the same, but the auxiliary regression specification differs slightly. Do you agree with the test logic? (2-3 sentences)

  • In the Breusch–Pagan or White test, the null hypothesis assumes constant error variance (homoskedasticity) while the alternative hypothesis suggests the error variance depends on one or more predictors (heteroskedasticity). Overall, both tests seem logical and reasonable: they analyze whether the residuals show patterns related to the predictors, which would signal heteroskedasticity. The two auxiliary specifications are sketched below.
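To sketch the difference (notation mine): for a model with two regressors, the Breusch–Pagan auxiliary regression uses only the levels of the predictors, while the White version adds squares and interactions.

\(\hat{u}_i^2 = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + v_i\) (Breusch–Pagan)

\(\hat{u}_i^2 = \alpha_0 + \alpha_1 x_{1i} + \alpha_2 x_{2i} + \alpha_3 x_{1i}^2 + \alpha_4 x_{2i}^2 + \alpha_5 x_{1i} x_{2i} + v_i\) (White)

In both cases \(H_0: \alpha_1 = \alpha_2 = \dots = 0\) (homoskedasticity), and the test statistic is \(n R^2\) from the auxiliary regression.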

Part 2

The analysis uses the CASchools dataset, which ships with the AER package:

library(AER)  # provides the CASchools dataset

data("CASchools")

Running Model - Math scores

\(\text{math}_i = \beta_0 + \beta_1 \, \text{income}_i + \beta_2 \, \text{computer}_i + \beta_3 \, \text{teachers}_i + \epsilon_i\)

library(stargazer)

# Fit the baseline OLS model
main_model <- lm(math ~ income + computer + teachers, 
                 data = CASchools)

# Print a text regression table
stargazer(main_model, 
          type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                               math            
-----------------------------------------------
income                       1.801***          
                              (0.090)          
                                               
computer                      0.009**          
                              (0.004)          
                                               
teachers                     -0.033***         
                              (0.010)          
                                               
Constant                    627.294***         
                              (1.559)          
                                               
-----------------------------------------------
Observations                    420            
R2                             0.512           
Adjusted R2                    0.508           
Residual Std. Error      13.150 (df = 416)     
F Statistic          145.433*** (df = 3; 416)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

From the diagnostic plots below, there is initial evidence of heteroskedasticity: the spread of the residuals is not constant across the fitted values.

# Residuals vs. fitted values
plot(main_model, which = 1)

# Full 2x2 panel of diagnostic plots
par(mfrow = c(2, 2))
plot(main_model)

Applying the White Test

Null Hypothesis: the variance of the residuals is constant (homoskedasticity)

Alternative Hypothesis: the variance of the residuals is not constant (heteroskedasticity)

library(skedastic)  # provides white() for the White test

# Conduct White test for heteroskedasticity
skedastic_package_white <- white(mainlm = main_model, 
                                 interactions = TRUE)

# View the test results
skedastic_package_white
# A tibble: 1 × 5
  statistic    p.value parameter method       alternative
      <dbl>      <dbl>     <dbl> <chr>        <chr>      
1      44.8 0.00000102         9 White's Test greater    

Interpreting the output

  • Test statistic: 44.8

  • p-value: 0.00000102

    • Since the p-value is below 0.05, we reject the null hypothesis of homoskedasticity, concluding that heteroskedasticity is present.
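As a cross-check (not part of the original output), the same hypothesis could be tested with bptest() from the lmtest package; its varformula argument lets us reproduce the White specification, and studentize = FALSE gives the classic \(n R^2\) form used above.

library(lmtest)  # bptest() for the Breusch–Pagan family of tests

# Breusch–Pagan: auxiliary regression on the levels of the regressors only
bptest(main_model)

# White-style specification: add squares and pairwise interactions
bptest(main_model,
       varformula = ~ income + computer + teachers +
         I(income^2) + I(computer^2) + I(teachers^2) +
         income:computer + computer:teachers + income:teachers,
       data = CASchools,
       studentize = FALSE)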

Applying the Auxiliary Regression

The objective is to look for a relationship between the squared residuals and the original independent variables, their squares, and their pairwise interactions.

Original independent variables

CASchools$residuals <- resid(object = main_model)

summary(CASchools$residuals) # mean should be zero
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-40.65993  -8.79418  -0.02287   0.00000   8.28064  32.51686 

Squared residuals

CASchools$squared_residuals <- (CASchools$residuals)^2 

summary(CASchools$squared_residuals) # should not have any negative values
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00   14.37   71.69  171.26  232.64 1653.23 

Regression

white_auxillary_reg <- lm(formula = squared_residuals ~ income + computer + teachers + 
                            I(income^2)  + I(computer^2) + I(teachers^2) +
                            income:computer + computer:teachers + teachers:income,
                          data = CASchools) 

white_auxillary_reg_summary <- summary(white_auxillary_reg)

white_auxillary_reg_summary

Call:
lm(formula = squared_residuals ~ income + computer + teachers + 
    I(income^2) + I(computer^2) + I(teachers^2) + income:computer + 
    computer:teachers + teachers:income, data = CASchools)

Residuals:
    Min      1Q  Median      3Q     Max 
-346.54 -145.22  -57.87   62.57 1422.18 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)        4.267e+02  6.270e+01   6.806 3.57e-11 ***
income            -2.300e+01  6.119e+00  -3.759 0.000195 ***
computer          -4.630e-01  2.785e-01  -1.662 0.097186 .  
teachers           1.451e-01  7.626e-01   0.190 0.849234    
I(income^2)        4.868e-01  1.201e-01   4.053 6.04e-05 ***
I(computer^2)      4.677e-04  1.803e-04   2.594 0.009823 ** 
I(teachers^2)      8.523e-04  7.436e-04   1.146 0.252407    
income:computer   -1.597e-03  1.568e-02  -0.102 0.918916    
computer:teachers -1.311e-03  6.896e-04  -1.901 0.058017 .  
income:teachers    3.033e-02  4.428e-02   0.685 0.493802    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 227.3 on 410 degrees of freedom
Multiple R-squared:  0.1066,    Adjusted R-squared:  0.08696 
F-statistic: 5.434 on 9 and 410 DF,  p-value: 4.677e-07

Results

The auxiliary regression’s R squared of 0.107 is much smaller than the R squared of 0.512 in our original regression, which suggests that the predictors explain only a modest share of the variance in the squared residuals. Even a modest share can be statistically meaningful, however, which is what the chi-squared test below formally checks.

Applying the Chi-Squared Test

Chi-Squared test statistic

The test statistic is \(n \times R^2\) from the auxiliary regression; under the null it follows a chi-squared distribution with degrees of freedom equal to the number of auxiliary regressors (9 here). Taking the values from the auxiliary regression:

  • R squared of the auxiliary regression: 0.1066

  • Observations: 420

0.1066*420
[1] 44.772

Chi-Squared p-value

chisq_p_value <- pchisq(q = white_auxillary_reg_summary$r.squared * nobs(white_auxillary_reg), 
                        df = 9, 
                        lower.tail = FALSE)
chisq_p_value
[1] 1.022575e-06

Replicating the test statistic by hand: it matches the skedastic package output above.

white_auxillary_reg_summary$r.squared * nobs(white_auxillary_reg)
[1] 44.75848
skedastic_package_white
# A tibble: 1 × 5
  statistic    p.value parameter method       alternative
      <dbl>      <dbl>     <dbl> <chr>        <chr>      
1      44.8 0.00000102         9 White's Test greater    

Critical Value

# critical value - Upper tail analysis
qchisq(p = .95, 
       df = 9)
[1] 16.91898

Test statistic of 44.8 > critical value of 16.9

This means we reject the null hypothesis of homoskedasticity, indicating heteroskedasticity is present. Similarly, we arrive at the same conclusion by referencing the p-value of 0.00000102 < 0.05.
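Given this conclusion, a common remedy (not part of the original analysis) is to keep the OLS point estimates but report heteroskedasticity-robust standard errors; a minimal sketch using the sandwich and lmtest packages:

library(sandwich)
library(lmtest)

# OLS coefficients are unchanged; only the standard errors are corrected
coeftest(main_model, vcov = vcovHC(main_model, type = "HC1"))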