1 Issue Summary

1.1 What is Heteroskedasticity?

Heteroskedasticity means the error variance isn’t constant across observations. For example, when predicting income from education, workers with little education might cluster around similar incomes (small errors), while highly educated workers show much more variation in their salaries (large errors).

1.2 Why Does It Matter?

The key point is that heteroskedasticity does not bias our coefficient estimates: the \(\beta\) estimates remain unbiased (though no longer efficient). The problem is that it makes the usual OLS standard errors unreliable, which invalidates t-tests, p-values, and confidence intervals. As a result, we might conclude a variable is significant when it isn’t, or vice versa.

1.3 Hypotheses

For White’s test:

  • Null hypothesis: Homoskedasticity (constant error variance)
  • Alternative hypothesis: Heteroskedasticity (non-constant error variance)

White’s test works by running an auxiliary regression where we regress the squared residuals on the original predictors, their squares, and their interactions. If the squared residuals are related to these terms, that suggests heteroskedasticity.
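
Concretely, if \(\hat{u}_i\) are the OLS residuals, the auxiliary regression takes the form

\[\hat{u}_i^2 = \alpha_0 + \sum_{j=1}^{k} \alpha_j z_{ji} + v_i\]

where the \(z_j\) collect the original predictors, their squares, and their pairwise interactions. Under the null, the test statistic

\[LM = n \cdot R^2_{aux} \sim \chi^2_k\]

follows a chi-squared distribution, where \(n\) is the sample size, \(R^2_{aux}\) is the R² of the auxiliary regression, and \(k\) is the number of auxiliary regressors (excluding the intercept). This is exactly the calculation replicated manually in Section 5.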

2 Data and Model

For my analysis, I’m using the mtcars dataset, which contains data on 32 car models from the 1974 Motor Trend US magazine.

# Load packages and data
rm(list = ls())
library(tidyverse)
library(skedastic)
library(lmtest)

data("mtcars")
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

2.1 Main Regression

I’m estimating a model to predict fuel efficiency (mpg) using weight, horsepower, and displacement:

\[mpg_i = \beta_0 + \beta_1 \cdot wt_i + \beta_2 \cdot hp_i + \beta_3 \cdot disp_i + \epsilon_i\]

# Estimate the model
main_model <- lm(mpg ~ wt + hp + disp, data = mtcars)
summary(main_model)
## 
## Call:
## lm(formula = mpg ~ wt + hp + disp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.891 -1.640 -0.172  1.061  5.861 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.105505   2.110815  17.579  < 2e-16 ***
## wt          -3.800891   1.066191  -3.565  0.00133 ** 
## hp          -0.031157   0.011436  -2.724  0.01097 *  
## disp        -0.000937   0.010350  -0.091  0.92851    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8083 
## F-statistic: 44.57 on 3 and 28 DF,  p-value: 8.65e-11

Weight has the largest effect: each extra 1,000 lbs reduces fuel efficiency by about 3.8 mpg. Horsepower is also significant (p = 0.011), but displacement isn’t (p = 0.929), likely because it’s highly correlated with the other two predictors, as the check below shows. The adjusted R² is 0.81, so the model fits well.
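
To put a number on that multicollinearity claim, the pairwise correlations among the three predictors can be checked in one line of base R (in mtcars, disp correlates with wt at roughly 0.89 and with hp at roughly 0.79):

# Pairwise correlations among the predictors; values near 1 signal multicollinearity
cor(mtcars[, c("wt", "hp", "disp")])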

3 Residual Plots

# Check residual plots
par(mfrow = c(2, 2))
plot(main_model)

par(mfrow = c(1, 1))

The residuals vs. fitted plot shows a funnel shape: residuals are tighter for high-mpg cars but spread out more for low-mpg cars. This could suggest heteroskedasticity, but visual inspection isn’t always reliable, so I’ll back it up with formal tests.
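
As a quick first check (and since lmtest is already loaded), a Breusch-Pagan test can complement the plots. Note this is a related but simpler test: by default bptest() uses only the original regressors, without the squares and interactions that White’s test adds.

# Quick cross-check: Breusch-Pagan test using the original regressors
bptest(main_model)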

4 White’s Test Using skedastic Package

# Run White's test
white_test_result <- white(mainlm = main_model, interactions = TRUE)
white_test_result
## # A tibble: 1 × 5
##   statistic p.value parameter method       alternative
##       <dbl>   <dbl>     <dbl> <chr>        <chr>      
## 1      6.74   0.664         9 White's Test greater

4.1 Results

  • Test statistic: 6.74
  • Degrees of freedom: 9
  • p-value: 0.664

Since the p-value (0.664) is greater than 0.05, I fail to reject the null hypothesis. There isn’t enough evidence to say heteroskedasticity is present in this model. This means the standard OLS standard errors should be fine.

5 Manual White’s Test Calculation

Now I’ll replicate the test manually to make sure I understand how it works.

5.1 Extract and Square Residuals

# Get residuals and square them
residuals_main <- resid(main_model)
squared_residuals <- residuals_main^2

5.2 Auxiliary Regression

For the auxiliary regression, I need to include the original variables (wt, hp, disp), their squares, and their interactions.

# Run auxiliary regression
aux_regression <- lm(squared_residuals ~ wt + hp + disp +
                     I(wt^2) + I(hp^2) + I(disp^2) +
                     wt:hp + wt:disp + hp:disp,
                     data = mtcars)
summary(aux_regression)
## 
## Call:
## lm(formula = squared_residuals ~ wt + hp + disp + I(wt^2) + I(hp^2) + 
##     I(disp^2) + wt:hp + wt:disp + hp:disp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.454  -3.826  -2.096   2.042  20.085 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  3.212e+01  3.952e+01   0.813    0.425
## wt           5.759e+00  4.060e+01   0.142    0.888
## hp          -3.883e-01  3.573e-01  -1.087    0.289
## disp        -7.838e-02  3.643e-01  -0.215    0.832
## I(wt^2)     -4.277e+00  1.079e+01  -0.396    0.696
## I(hp^2)     -1.688e-04  6.340e-04  -0.266    0.793
## I(disp^2)    6.324e-05  9.061e-04   0.070    0.945
## wt:hp        1.334e-01  1.572e-01   0.849    0.405
## wt:disp      1.199e-02  1.862e-01   0.064    0.949
## hp:disp     -4.766e-07  1.198e-03   0.000    1.000
## 
## Residual standard error: 9.925 on 22 degrees of freedom
## Multiple R-squared:  0.2106, Adjusted R-squared:  -0.1123 
## F-statistic: 0.6523 on 9 and 22 DF,  p-value: 0.7414

The auxiliary regression has an R² of 0.211, which means the nine auxiliary terms explain only about 21% of the variation in the squared residuals. A high R² here would indicate that the error variance depends on the predictors (heteroskedasticity); this value is low, consistent with the insignificant F-statistic (p = 0.741).

5.3 Calculate Test Statistic

# Sample size
n <- nobs(aux_regression)

# Get R-squared
r_squared_aux <- summary(aux_regression)$r.squared

# White test statistic: LM = n * R²
white_statistic_manual <- n * r_squared_aux
white_statistic_manual
## [1] 6.740797

The manual LM statistic is 6.74.

5.4 Calculate p-value

# Degrees of freedom = number of slope terms in the auxiliary regression
# (3 original terms + 3 squares + 3 interactions = 9)
df_white <- length(coef(aux_regression)) - 1  # exclude the intercept

# Calculate p-value
p_value_manual <- pchisq(white_statistic_manual, df = df_white, lower.tail = FALSE)
p_value_manual
## [1] 0.6640856

My manual p-value is 0.664.
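
Equivalently, the test statistic can be compared against the 5% critical value of a chi-squared distribution with 9 degrees of freedom:

# 5% critical value for chi-squared with 9 degrees of freedom
qchisq(0.95, df = 9)
## [1] 16.91898

Since 6.74 is well below 16.92, the conclusion is the same: fail to reject the null.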

5.5 Compare to Package Results

# Comparison table
comparison <- data.frame(
  Method = c("skedastic Package", "Manual Calculation"),
  Test_Statistic = c(white_test_result$statistic, white_statistic_manual),
  DF = c(white_test_result$parameter, df_white),
  P_Value = c(white_test_result$p.value, p_value_manual)
)
print(comparison)
##               Method Test_Statistic DF   P_Value
## 1  skedastic Package       6.740797  9 0.6640856
## 2 Manual Calculation       6.740797  9 0.6640856

The manual calculations match the package results exactly, so my implementation is correct.
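
As one more optional cross-check (a sketch relying on bptest() from lmtest, whose default studentized statistic is also \(n \cdot R^2\)), White’s test should be reproducible by passing the same nine auxiliary terms explicitly:

# White's test via bptest(): wt * hp expands to wt + hp + wt:hp, so this
# varformula contains the same nine auxiliary regressors as above
bptest(main_model,
       ~ wt * hp + wt * disp + hp * disp + I(wt^2) + I(hp^2) + I(disp^2),
       data = mtcars)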

6 Conclusion

Based on White’s test, I fail to reject the null hypothesis of homoskedasticity (p = 0.664). The auxiliary regression only explained 21.1% of the variation in squared residuals, which doesn’t suggest a strong relationship between error variance and the predictors.

This suggests the usual OLS standard errors from my original regression are reliable, and I can trust the t-tests and confidence intervals without switching to heteroskedasticity-robust standard errors.
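
For completeness: had White’s test rejected the null, the standard remedy would be heteroskedasticity-robust standard errors. A minimal sketch, assuming the sandwich package is installed (coeftest() comes from the already-loaded lmtest):

# Not needed here, but this is how robust (HC1) standard errors would be computed
library(sandwich)
coeftest(main_model, vcov = vcovHC(main_model, type = "HC1"))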