data("CASchools")
Discussion Heteroskedasticity
Part 1
1) What is "heteroskedasticity", and what econometric issue does it cause: does it affect point estimates or standard errors? (2-3 sentences in your own words.)
- Heteroskedasticity occurs when the variability of the errors in a regression model changes across observations. The OLS point estimates remain unbiased, but the usual standard errors are no longer valid, making hypothesis tests and confidence intervals unreliable (see the sketch below).
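To make the pattern concrete, here is a minimal simulation sketch (hypothetical data, not from the assignment) in which the error spread grows with the regressor, producing the classic fan shape in a scatter plot:
set.seed(42)
x <- runif(200, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(200, sd = x) # error sd scales with x: heteroskedastic by construction
plot(x, y) # spread of y around the line widens as x increases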
2) What are the null and alternative hypotheses in the Breusch–Pagan test or White test? The hypotheses are the same, but the auxiliary regression specification is slightly different. Do you agree with the test logic? (2-3 sentences)
- In the Breusch–Pagan or White test, the null hypothesis assumes constant error variance (homoskedasticity), while the alternative hypothesis is that the error variance depends on one or more predictors (heteroskedasticity). Overall, both tests seem logical and reasonable: they check whether the squared residuals show patterns related to the predictors, which would signal heteroskedasticity.
Part 2
Running Model - Math scores
\(math_i = \beta_0 + \beta_1 \cdot income_i + \beta_2 \cdot computer_i + \beta_3 \cdot teachers_i + \epsilon_i\)
library(stargazer)

# Fit the model
main_model <- lm(math ~ income + computer + teachers,
                 data = CASchools)

# Print the regression table
stargazer(main_model, type = "text")
===============================================
                        Dependent variable:
                    ---------------------------
                               math
-----------------------------------------------
income                        1.801***
                              (0.090)
computer                      0.009**
                              (0.004)
teachers                     -0.033***
                              (0.010)
Constant                     627.294***
                              (1.559)
-----------------------------------------------
Observations                    420
R2                             0.512
Adjusted R2                    0.508
Residual Std. Error      13.150 (df = 416)
F Statistic           145.433*** (df = 3; 416)
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01
From the residual plots below, there is initial evidence of heteroskedasticity: the spread of the residuals is not constant across the fitted values.
# Residuals vs. fitted values: a widening spread suggests non-constant variance
plot(main_model, which = 1)

# Full 2x2 diagnostic panel (residuals vs. fitted, Q-Q, scale-location, leverage)
par(mfrow = c(2, 2))
plot(main_model)
Applying the White Test
Null Hypothesis: There is a constant variance in the residuals (homoskedasticity)
Alternative Hypothesis: There is not a constant variance in the residuals (heteroskedasticity)
library(skedastic)

# Conduct White test for heteroskedasticity
skedastic_package_white <- white(mainlm = main_model,
                                 interactions = TRUE)
# View the test results
skedastic_package_white
# A tibble: 1 × 5
statistic p.value parameter method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 44.8 0.00000102 9 White's Test greater
Interpreting the output
Test Statistic: 44.8
P Value: 0.00000102
- Since the p-value (0.00000102) is below 0.05, we reject the null hypothesis of homoskedasticity, indicating heteroskedasticity is present.
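As an optional cross-check (a sketch assuming the lmtest package is installed; not part of the original output), the Breusch–Pagan test can be run on the same fitted model. Its auxiliary regression uses only the original regressors, without the squares and interactions that White adds:
library(lmtest)

# Breusch–Pagan test: regresses the squared residuals on
# income, computer, and teachers only
bptest(main_model)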
Applying the Auxiliary Regression
The objective is to look for a relationship between the squared residuals and the original independent variables, plus their squares and pairwise interactions (the White specification), as written out below.
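In the notation of the model above, the auxiliary regression is:
\(\hat{\epsilon}_i^2 = \alpha_0 + \alpha_1 \cdot income_i + \alpha_2 \cdot computer_i + \alpha_3 \cdot teachers_i + \alpha_4 \cdot income_i^2 + \alpha_5 \cdot computer_i^2 + \alpha_6 \cdot teachers_i^2 + \alpha_7 \cdot income_i \cdot computer_i + \alpha_8 \cdot computer_i \cdot teachers_i + \alpha_9 \cdot income_i \cdot teachers_i + v_i\)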
Residuals
CASchools$residuals <- resid(object = main_model)

summary(CASchools$residuals) # mean should be zero
Min. 1st Qu. Median Mean 3rd Qu. Max.
-40.65993 -8.79418 -0.02287 0.00000 8.28064 32.51686
Squared residuals
CASchools$squared_residuals <- (CASchools$residuals)^2

summary(CASchools$squared_residuals) # should not have any negative values
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 14.37 71.69 171.26 232.64 1653.23
Regression
white_auxillary_reg <- lm(formula = squared_residuals ~ income + computer + teachers +
                            I(income^2) + I(computer^2) + I(teachers^2) +
                            income:computer + computer:teachers + teachers:income,
                          data = CASchools)

white_auxillary_reg_summary <- summary(white_auxillary_reg)
white_auxillary_reg_summary
Call:
lm(formula = squared_residuals ~ income + computer + teachers +
I(income^2) + I(computer^2) + I(teachers^2) + income:computer +
computer:teachers + teachers:income, data = CASchools)
Residuals:
Min 1Q Median 3Q Max
-346.54 -145.22 -57.87 62.57 1422.18
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.267e+02 6.270e+01 6.806 3.57e-11 ***
income -2.300e+01 6.119e+00 -3.759 0.000195 ***
computer -4.630e-01 2.785e-01 -1.662 0.097186 .
teachers 1.451e-01 7.626e-01 0.190 0.849234
I(income^2) 4.868e-01 1.201e-01 4.053 6.04e-05 ***
I(computer^2) 4.677e-04 1.803e-04 2.594 0.009823 **
I(teachers^2) 8.523e-04 7.436e-04 1.146 0.252407
income:computer -1.597e-03 1.568e-02 -0.102 0.918916
computer:teachers -1.311e-03 6.896e-04 -1.901 0.058017 .
income:teachers 3.033e-02 4.428e-02 0.685 0.493802
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 227.3 on 410 degrees of freedom
Multiple R-squared: 0.1066, Adjusted R-squared: 0.08696
F-statistic: 5.434 on 9 and 410 DF, p-value: 4.677e-07
Results
The R-squared of the auxiliary regression (0.107) is much smaller than the R-squared of the original model (0.512), so the regressors explain only a modest share of the variation in the squared residuals. A modest auxiliary R-squared can still be statistically significant, however: the White test statistic is the sample size times the auxiliary R-squared, and with 420 observations even 0.107 yields a large test statistic, as the chi-squared test below shows.
Applying the Chi Squared Test
Chi-Square test statistic
The test statistic is the R-squared of the auxiliary regression multiplied by the number of observations:
R squared of the auxiliary regression: 0.1066
Observations: 420
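In symbols, the White test statistic follows a chi-squared distribution under the null, with degrees of freedom equal to the number of auxiliary regressors (here 9):
\(LM = n \cdot R^2_{aux} \sim \chi^2_9\)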
0.1066*420
[1] 44.772
Chi-Square P value
chisq_p_value <- pchisq(q = white_auxillary_reg_summary$r.squared * nobs(white_auxillary_reg),
                        df = 9,
                        lower.tail = FALSE)
chisq_p_value
[1] 1.022575e-06
Replicating the test statistic at full precision (the small difference from 44.772 above is due to rounding the R-squared to 0.1066):
white_auxillary_reg_summary$r.squared * nobs(white_auxillary_reg)
[1] 44.75848
skedastic_package_white
# A tibble: 1 × 5
statistic p.value parameter method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 44.8 0.00000102 9 White's Test greater
Critical Value
# critical value - Upper tail analysis
qchisq(p = .95,
df = 9)
[1] 16.91898
Test statistic of 44.8 > critical value of 16.9
This means we reject the null hypothesis of homoskedasticity, indicating heteroskedasticity is present. Similarly, we arrive at the same conclusion by referencing the p-value of 0.00000102 < 0.05.
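As a closing note, since heteroskedasticity invalidates the usual standard errors (Part 1), a common remedy is to re-report the coefficient table with heteroskedasticity-robust standard errors. This is a sketch assuming the lmtest and sandwich packages are installed; the point estimates are unchanged, only the standard errors (and hence the t-statistics and p-values) are adjusted:
library(lmtest)
library(sandwich)

# Same OLS point estimates, heteroskedasticity-robust (HC1) standard errors
coeftest(main_model, vcov = vcovHC(main_model, type = "HC1"))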