1.1 What is heteroskedasticity, and what econometric issue does it cause?
Heteroskedasticity refers to the situation where the variance of the error term is not constant across observations. Although the error terms still have mean zero, their standard deviation can differ at different values of the independent variables.
Heteroskedasticity affects the precision of predictions: where the standard deviation of the errors is large, predictions are noisier and more likely to be far off. More importantly for inference, the usual OLS standard errors are no longer valid, which distorts the results of statistical tests, particularly the t-test.
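To illustrate the definition above, here is a minimal sketch using hypothetical simulated data (not the iris data analyzed below): the error standard deviation grows with the regressor, so the residual-versus-fitted plot shows the characteristic fan shape.

# A sketch with simulated data: error variance increases with x
set.seed(123)
n_sim <- 200
x_sim <- runif(n_sim, 0, 10)
e_sim <- rnorm(n_sim, mean = 0, sd = 0.5 * x_sim) # heteroskedastic errors
y_sim <- 2 + 1.5 * x_sim + e_sim
sim_mod <- lm(y_sim ~ x_sim)
# Residuals fan out as fitted values grow
plot(fitted(sim_mod), resid(sim_mod),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Simulated heteroskedastic errors (fan shape)")
abline(h = 0, lty = 2)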
1.2 What are the null and alternative hypotheses in the BP and White tests?
In both tests the null hypothesis H0 is homoskedasticity: the error variance is constant and does not depend on the regressors. The alternative hypothesis H1 is heteroskedasticity: the error variance depends on the regressors (in the White test, through their levels, squares, and interactions).
# Main regression model specification (Sepal Length as dependent variable)
# Sepal.Length = β0 + β1 Sepal.Width + β2 Petal.Length + β3 Petal.Width + ε
subset_iris <- iris[, c(1, 2, 3, 4)] # Select relevant columns

# Linear regression model
lm_mod <- lm(formula = Sepal.Length ~ ., data = subset_iris)

# Summary of the regression model
summary(lm_mod) # confirm the independent variables you want
Call:
lm(formula = Sepal.Length ~ ., data = subset_iris)
Residuals:
Min 1Q Median 3Q Max
-0.82816 -0.21989 0.01875 0.19709 0.84570
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.85600 0.25078 7.401 9.85e-12 ***
Sepal.Width 0.65084 0.06665 9.765 < 2e-16 ***
Petal.Length 0.70913 0.05672 12.502 < 2e-16 ***
Petal.Width -0.55648 0.12755 -4.363 2.41e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3145 on 146 degrees of freedom
Multiple R-squared: 0.8586, Adjusted R-squared: 0.8557
F-statistic: 295.5 on 3 and 146 DF, p-value: < 2.2e-16
# Heteroskedasticity tests

## Residual Analysis
# Plot "residuals versus fitted values" chart
plot(lm_mod, which = 1)
# Set up the plotting region to have four subplots
par(mfrow = c(2, 2))
plot(lm_mod)
par(mfrow = c(1, 1))

# Formal tests with packages for heteroskedasticity

# White test for heteroskedasticity
# Conduct White test for heteroskedasticity using the skedastic package
library(skedastic)
white_test_result <- white(mainlm = lm_mod, interactions = TRUE)

# View the test results
white_test_result
# A tibble: 1 × 5
statistic p.value parameter method alternative
<dbl> <dbl> <dbl> <chr> <chr>
1 10.4 0.323 9 White's Test greater
# Auxiliary regression for heteroskedasticity (White Test)
# Create the residuals squared variable
subset_iris$residuals <- resid(lm_mod)
summary(subset_iris$residuals) # mean should be zero
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.82816 -0.21989 0.01875 0.00000 0.19709 0.84570
subset_iris$squared_residuals <- (subset_iris$residuals)^2
summary(subset_iris$squared_residuals) # should not have any negative values
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000003 0.0105026 0.0461078 0.0963027 0.1386987 0.7152021
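Before running the auxiliary regression, a quick optional visual check (a sketch, not part of the formal tests below): plot the squared residuals against the fitted values; a systematic pattern would suggest the error variance changes with the predicted Sepal.Length.

# Optional visual check: squared residuals against fitted values
plot(fitted(lm_mod), subset_iris$squared_residuals,
     xlab = "Fitted values", ylab = "Squared residuals",
     main = "Squared residuals vs fitted values")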
# Create X variables: original predictors, squared terms, and interaction terms
white_auxiliary_reg <- lm(
  formula = squared_residuals ~ Sepal.Width + Petal.Length + Petal.Width +
    I(Sepal.Width^2) + I(Petal.Length^2) + I(Petal.Width^2) +
    Sepal.Width:Petal.Length + Petal.Length:Petal.Width + Petal.Width:Sepal.Width,
  data = subset_iris
)

# Summary of the auxiliary regression
white_auxiliary_reg_summary <- summary(white_auxiliary_reg)
white_auxiliary_reg_summary
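As a cross-check, a minimal sketch (using the auxiliary regression objects just created) that computes the White test statistic n·R² and its chi-square p-value programmatically rather than typing the R-squared in by hand as done further below:

# White statistic computed directly from the auxiliary regression
aux_r_squared <- white_auxiliary_reg_summary$r.squared
n_obs <- nrow(subset_iris)
white_stat_manual <- n_obs * aux_r_squared         # n * R^2
white_df <- length(coef(white_auxiliary_reg)) - 1  # 9 auxiliary regressors
white_p_manual <- 1 - pchisq(white_stat_manual, df = white_df)
white_stat_manual
white_p_manual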
# BP test (Breusch-Pagan test) for heteroskedasticity
# Perform Breusch-Pagan test for heteroskedasticity
library(lmtest)
bp_test_result <- bptest(formula = lm_mod, studentize = TRUE)

# View the test results
bp_test_result
studentized Breusch-Pagan test
data: lm_mod
BP = 6.9605, df = 3, p-value = 0.07317
# Breusch-Pagan test using skedastic package
bp_skedastic_result <- skedastic::breusch_pagan(lm_mod)

# View both results (they should match)
bp_test_result
studentized Breusch-Pagan test
data: lm_mod
BP = 6.9605, df = 3, p-value = 0.07317
# Conclusion of Breusch-Pagan Test
bp_test_statistic <- bp_test_result$statistic
bp_p_value <- bp_test_result$p.value
bp_test_statistic
BP
6.960452
bp_p_value
BP
0.07316904
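For completeness, a minimal sketch of turning the extracted p-value into a decision at alpha = 5%, using the bp_p_value object defined above:

# Decision rule for the Breusch-Pagan test at alpha = 5%
alpha <- 0.05
if (bp_p_value < alpha) {
  cat("Reject H0: evidence of heteroskedasticity\n")
} else {
  cat("Fail to reject H0: no evidence of heteroskedasticity\n")
}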
The R-squared of the auxiliary regression is 0.06901, so the predictors, their squares, and their interactions explain only about 6.9% of the variation in the squared residuals. Such a small value suggests there is little evidence that the error variance depends on the regressors, i.e., little evidence of heteroskedasticity.
Next, I conduct the chi-square test at alpha = 5%. Under the null hypothesis, the White test statistic n·R² follows a chi-square distribution with degrees of freedom equal to the number of auxiliary regressors, which is 9 here.
# Given R^2 and sample size n
R_squared <- 0.06901
n <- 150

# Calculate the chi-square test statistic (χ²)
test_statistic <- R_squared * n

# Print the test statistic
cat("Test Statistic (χ²): ", test_statistic, "\n")
Test Statistic (χ²): 10.3515
# Critical value of the chi-square distribution at alpha = 5% with 9 degrees of freedom
qchisq(1 - 0.05, df = 9)
[1] 16.91898
Since the test statistic (10.35) is less than the critical value (16.92), we fail to reject the null hypothesis of homoskedasticity. This agrees with the p-value of 0.323 reported by skedastic::white() above, so at the 5% level there is no evidence of heteroskedasticity in this model.
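The same conclusion can be reached with a p-value instead of a critical value; a minimal sketch using the test statistic computed above (the result should be close to the 0.323 reported by skedastic::white()):

# Upper-tail p-value of the manual White statistic with 9 degrees of freedom
1 - pchisq(test_statistic, df = 9)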