Heteroskedasticity

Author

Song Lu

1.

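Heteroskedasticity occurs when the variance of the error term is not constant across observations. Under heteroskedasticity the OLS coefficient estimates remain unbiased but are no longer efficient, and the conventional standard errors are biased, so the t-tests, F-tests, and confidence intervals built on them are unreliable.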

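Null Hypothesis: the errors have constant variance (homoskedasticity).

Alternative Hypothesis: the errors do not have constant variance (heteroskedasticity).

The tests below share one logic: if the errors are homoskedastic, the squared residuals should not be systematically related to the regressors. An auxiliary regression of the squared residuals on the regressors (and their squares and cross-products) should then have little explanatory power, and a large test statistic signals a violation of the homoskedasticity assumption.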
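To make the definition concrete, here is a minimal sketch on simulated data. It is not part of the assignment data: the sample size, coefficients, and the choice of an error standard deviation proportional to `x` are all assumptions made for illustration.

```r
# Simulated example (hypothetical data, not the stackloss data set)
set.seed(1)
n <- 200
x <- runif(n, 1, 10)

# Error standard deviation grows with x, so the errors are
# heteroskedastic by construction
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)

fit <- lm(y ~ x)
res <- residuals(fit)

# Compare the residual spread for small and large values of x
sd_low  <- sd(res[x <= median(x)])
sd_high <- sd(res[x >  median(x)])
print(c(sd_low = sd_low, sd_high = sd_high))
```

Under homoskedasticity the two standard deviations would be roughly equal; here the spread for large `x` should be visibly larger, which is the pattern the formal tests try to detect.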

2.

Data Selection

library(skedastic)
df <- stackloss
model <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = df)

White Test

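The test statistic is 9.93 on 6 degrees of freedom, with a p-value of 0.1275405. Since the p-value is greater than the conventional significance level of 0.05, we fail to reject the null hypothesis of homoskedasticity. White's test therefore finds no statistically significant evidence of heteroskedasticity in the model at the 5% level.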

white_test <- white(model)
print(white_test)
# A tibble: 1 × 5
  statistic p.value parameter method       alternative
      <dbl>   <dbl>     <dbl> <chr>        <chr>      
1      9.93   0.128         6 White's Test greater    
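The 6 degrees of freedom in the output suggest that the auxiliary regression used here contains the three regressors and their squares, without cross-products. The following base-R sketch computes a test of that form by hand. It assumes this squares-only specification is what `white()` uses by default; the names `aux_df`, `aux`, `white_stat`, `white_df`, and `white_p` are introduced for illustration only.

```r
# Rebuild the main model on the built-in stackloss data
df <- stackloss
model <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = df)

# Attach the squared residuals to a copy of the data
aux_df <- cbind(df, e2 = residuals(model)^2)

# Auxiliary regression: squared residuals on levels and squares only
# (no cross-products; assumed to mirror the default of white())
aux <- lm(e2 ~ Air.Flow + Water.Temp + Acid.Conc. +
            I(Air.Flow^2) + I(Water.Temp^2) + I(Acid.Conc.^2),
          data = aux_df)

white_stat <- nrow(aux_df) * summary(aux)$r.squared  # LM statistic: n * R-squared
white_df   <- length(coef(aux)) - 1                  # regressors, excluding the intercept
white_p    <- pchisq(white_stat, df = white_df, lower.tail = FALSE)

print(c(statistic = white_stat, df = white_df, p.value = white_p))
```

If the assumption about the default specification holds, the statistic and p-value should be close to those reported by `white(model)`.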

3.

Auxiliary Regression

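The R-squared of 0.7156 indicates that about 71.56% of the variation in the squared residuals is explained by the regressors, their squares, and their cross-products. The overall F-test of the auxiliary regression (F = 3.076 on 9 and 11 degrees of freedom) has a p-value of 0.04145, which is less than 0.05, so we reject the null hypothesis of homoskedasticity at the 5% level. By this F-test there is evidence of heteroskedasticity in the model, although with only 21 observations and 10 estimated coefficients the result should be read with caution.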

residuals_squared <- residuals(model)^2

auxiliary_model <- lm(residuals_squared ~ Air.Flow + Water.Temp + Acid.Conc. + I(Air.Flow^2) + I(Water.Temp^2) + I(Acid.Conc.^2) + Air.Flow:Water.Temp + Air.Flow:Acid.Conc. + Water.Temp:Acid.Conc., data = df)

summary(auxiliary_model)

Call:
lm(formula = residuals_squared ~ Air.Flow + Water.Temp + Acid.Conc. + 
    I(Air.Flow^2) + I(Water.Temp^2) + I(Acid.Conc.^2) + Air.Flow:Water.Temp + 
    Air.Flow:Acid.Conc. + Water.Temp:Acid.Conc., data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-11.4127  -4.7385  -0.9596   4.0676  19.4514 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)  
(Intercept)           -6.773e+02  5.464e+02  -1.240   0.2409  
Air.Flow              -4.540e+00  1.351e+01  -0.336   0.7431  
Water.Temp             5.174e+01  2.846e+01   1.818   0.0963 .
Acid.Conc.             5.933e+00  1.110e+01   0.535   0.6036  
I(Air.Flow^2)          2.155e-01  1.188e-01   1.814   0.0970 .
I(Water.Temp^2)        9.782e-01  7.969e-01   1.228   0.2452  
I(Acid.Conc.^2)       -9.361e-03  1.114e-01  -0.084   0.9345  
Air.Flow:Water.Temp   -1.106e+00  4.601e-01  -2.404   0.0350 *
Air.Flow:Acid.Conc.    2.940e-02  2.098e-01   0.140   0.8911  
Water.Temp:Acid.Conc. -2.970e-01  3.383e-01  -0.878   0.3988  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.108 on 11 degrees of freedom
Multiple R-squared:  0.7156,    Adjusted R-squared:  0.483 
F-statistic: 3.076 on 9 and 11 DF,  p-value: 0.04145
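The quantities quoted from this summary can also be pulled out programmatically instead of being read off the printout. The sketch below refits the same auxiliary regression and extracts the R-squared and the overall F-test; the names `aux_df`, `s`, `f`, and `f_p_value` are assumptions introduced for illustration.

```r
# Rebuild the main model and the auxiliary regression on stackloss
df <- stackloss
model <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = df)
aux_df <- cbind(df, residuals_squared = residuals(model)^2)

auxiliary_model <- lm(residuals_squared ~ Air.Flow + Water.Temp + Acid.Conc. +
                        I(Air.Flow^2) + I(Water.Temp^2) + I(Acid.Conc.^2) +
                        Air.Flow:Water.Temp + Air.Flow:Acid.Conc. +
                        Water.Temp:Acid.Conc.,
                      data = aux_df)

s <- summary(auxiliary_model)
r_squared <- s$r.squared   # unrounded Multiple R-squared
f <- s$fstatistic          # named vector: value, numdf, dendf

# p-value of the overall F-test (all slope coefficients equal to zero)
f_p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

print(r_squared)
print(f)
print(f_p_value)
```

Working from the stored values avoids the small rounding error that comes from retyping figures such as 0.7156 by hand.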

4.

Chi-Squared Test

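The test statistic is n times the R-squared of the auxiliary regression (21 × 0.7156 = 15.0276), which under the null hypothesis is approximately chi-squared distributed with 9 degrees of freedom, one for each regressor in the auxiliary regression. Since 15.0276 is less than the 5% critical value of 16.91898, and the p-value of 0.09018064 is greater than 0.05, we fail to reject the null hypothesis. By this version of the test there is no statistically significant evidence of heteroskedasticity at the 5% level, in contrast to the F-test of the auxiliary regression in part 3.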

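r_squared <- 0.7156  # Multiple R-squared of the auxiliary regression (rounded, from the summary output)
n <- nrow(df)  # number of observations
test_statistic <- r_squared * n  # LM statistic: n * R-squared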
print("Test statistic: ")
[1] "Test statistic: "
print(test_statistic)
[1] 15.0276
print("")
[1] ""
alpha <- 0.05  # significance level
k <- 9  # degrees of freedom: number of regressors in the auxiliary regression
critical_value <- qchisq(1 - alpha, df = k)
print("Critical value: ")
[1] "Critical value: "
print(critical_value)
[1] 16.91898
print("")
[1] ""
p_value <- 1 - pchisq(test_statistic, df = k)
print("p-value: ")
[1] "p-value: "
print(p_value)
[1] 0.09018064
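The calculation above can be wrapped in a small helper so that the R-squared, sample size, and degrees of freedom are taken from the fitted auxiliary model rather than typed in. This is a sketch only: `lm_het_test` is a hypothetical function name, not part of any package, and because it uses the unrounded R-squared its statistic may differ from 15.0276 in the later decimals.

```r
# Hypothetical helper: chi-squared (n * R-squared) test from an auxiliary regression
lm_het_test <- function(aux_model, alpha = 0.05) {
  n <- nobs(aux_model)               # number of observations
  k <- length(coef(aux_model)) - 1   # regressors, excluding the intercept
  stat <- n * summary(aux_model)$r.squared
  list(
    statistic      = stat,
    df             = k,
    critical_value = qchisq(1 - alpha, df = k),
    p_value        = pchisq(stat, df = k, lower.tail = FALSE)
  )
}

# Usage on the stackloss auxiliary regression with squares and cross-products
df <- stackloss
model <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = df)
aux_df <- cbind(df, residuals_squared = residuals(model)^2)
auxiliary_model <- lm(residuals_squared ~ Air.Flow + Water.Temp + Acid.Conc. +
                        I(Air.Flow^2) + I(Water.Temp^2) + I(Acid.Conc.^2) +
                        Air.Flow:Water.Temp + Air.Flow:Acid.Conc. +
                        Water.Temp:Acid.Conc.,
                      data = aux_df)

result <- lm_het_test(auxiliary_model)
print(result)
```

Using `lower.tail = FALSE` in `pchisq()` is equivalent to `1 - pchisq(...)` but is slightly more accurate for very small p-values.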