In this section, I will talk at length about GLS, with a focus on the heteroskedasticity issue. The take-home messages are: 1) when you use the "weights" argument of the "lm" function to run WLS, do not forget to supply the inverse of the variances; 2) the White and Breusch-Pagan tests can be unreliable, so my personal suggestion is not to lean on them too heavily unless you are prepared to be disappointed.

1. Model Set-up

To illustrate, I design a heteroskedasticity problem as follows. The variance of the error term follows a Poisson distribution with mean 99, unconditionally, i.e. it does not depend on the regressors. The true regression equation is \(y = 2x_1 + 3x_2 + \epsilon\).

num <- 1000
set.seed(109)
x1 <- rpois(num, 12)
x2 <- rgamma(num, 2)
cond_var <- rpois(num, 99) # conditional variance of each error term, drawn independently of x1 and x2
error_term <- sapply(cond_var, function(i) rnorm(1, 0, sqrt(i))) # heteroskedastic errors

y <- 2*x1 + 3*x2 + error_term # equation

dat <- cbind(y, x1, x2, error_term, cond_var)
dat <- as.data.frame(dat[sample(1:num,500,replace = TRUE),]) # mimic of sampling from population

Without any treatment of heteroskedasticity, we can simply run OLS to get our estimates.

reg <- lm(y ~ x1 + x2, data  = dat)
summary(reg)
## 
## Call:
## lm(formula = y ~ x1 + x2, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.8910  -6.0058  -0.0059   6.2481  25.3100 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.8512     1.6670  -0.511     0.61    
## x1            1.9186     0.1252  15.327   <2e-16 ***
## x2            3.4128     0.2884  11.833   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.439 on 497 degrees of freedom
## Multiple R-squared:  0.4205, Adjusted R-squared:  0.4182 
## F-statistic: 180.3 on 2 and 497 DF,  p-value: < 2.2e-16
vcov(reg)
##             (Intercept)           x1           x2
## (Intercept)   2.7790229 -0.187765599 -0.188472030
## x1           -0.1877656  0.015669910  0.001483032
## x2           -0.1884720  0.001483032  0.083177548

It is clear from the above regression that the presence of heteroskedasticity does not affect consistency: the estimates are close to the true values of 2 and 3. What it does affect is efficiency, which is what the treatments below try to recover.

2. Treatments of Heteroskedasticity

Treatment 1 - Weighted Least Squares (WLS)

We keep the variance of the original error term (cond_var) in the manually created dataset to facilitate the WLS regression. Rather than using the "lm" command in R, I first use home-made code to run the WLS, as follows. Our focus is on efficiency, i.e. the covariance matrix of the parameter estimates. Note that this treatment is basically infeasible in practice, because you will never know the variance structure the way I do in this artificial example.
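For reference, the home-made code below implements the textbook GLS/WLS estimator \(\hat{\beta}_{WLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y\), where \(\Omega\) is the diagonal matrix holding the conditional error variances (here, cond_var).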

design_m <- model.matrix(reg)   # the X matrix
y_m <- as.matrix(dat[,'y'])

weight_m <- matrix(0, 500, 500) # Omega: diagonal matrix of conditional variances
diag(weight_m) <- dat$cond_var
beta_wls <- solve(t(design_m) %*% solve(weight_m) %*% design_m) %*% (t(design_m) %*% solve(weight_m) %*% y_m)
beta_wls
##                  [,1]
## (Intercept) -1.363470
## x1           1.966025
## x2           3.377820

The WLS point estimates are close to the OLS ones, but the gain is in efficiency: the variances of the coefficient estimates are smaller than under OLS (compare the covariance matrices). Since we know the true parameters are non-zero, smaller variances mean we reject the false null with higher probability, which is what we want. Besides, I want to remind "lm" users that you have to pass the inverse of the variances as the weights, not the variances themselves.

reg_wls <- lm(y ~ x1 + x2, data  = dat, weights = 1/cond_var) 
summary(reg_wls)
## 
## Call:
## lm(formula = y ~ x1 + x2, data = dat, weights = 1/cond_var)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.59131 -0.61612 -0.01263  0.62371  2.51999 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.3635     1.6524  -0.825     0.41    
## x1            1.9660     0.1247  15.769   <2e-16 ***
## x2            3.3778     0.2856  11.828   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9492 on 497 degrees of freedom
## Multiple R-squared:  0.4311, Adjusted R-squared:  0.4288 
## F-statistic: 188.3 on 2 and 497 DF,  p-value: < 2.2e-16
vcov(reg_wls) 
##             (Intercept)           x1           x2
## (Intercept)   2.7305163 -0.185263467 -0.182159583
## x1           -0.1852635  0.015544702  0.001178971
## x2           -0.1821596  0.001178971  0.081557382
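To see the efficiency gain side by side, a quick comparison of the estimated coefficient variances (a small convenience snippet that only reuses the reg and reg_wls objects from above):

# diagonal of each covariance matrix = estimated variances of the coefficients
rbind(OLS = diag(vcov(reg)), WLS = diag(vcov(reg_wls)))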

Treatment 2 - Robust Standard Errors (Heteroskedasticity-Consistent Standard Errors)

The mechanics of the HC estimators are hard to summarize briefly. Stata offers a handful of HC options, while R's sandwich package offers even more (HC0 through HC5 and variants). But all of these estimators are ways to find an asymptotically consistent analogue of the "meat" matrix \(X'\Omega X\) in the sandwich formula \((X'X)^{-1}X'\Omega X(X'X)^{-1}\). Note that this is a repair of the OLS standard errors rather than a GLS re-weighting: the point estimates stay the same.

library(sandwich) # vcovHC() lives here
vcovHC(reg, type = 'HC')
##             (Intercept)           x1           x2
## (Intercept)   2.7392965 -0.184578677 -0.184614469
## x1           -0.1845787  0.014905980  0.003212793
## x2           -0.1846145  0.003212793  0.076074223
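If you want a full coefficient table with the robust standard errors plugged in, one convenient route (a sketch; it assumes the lmtest package is installed, and type = 'HC1' roughly matches Stata's default robust option) is:

library(lmtest) # provides coeftest()
# OLS coefficients reported with HC1-robust standard errors
coeftest(reg, vcov = vcovHC(reg, type = 'HC1'))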

Treatment 3 - Feasible GLS (FGLS)

This treatment is the feasible version of the first one. Imagine we do not know the variance of the error terms, so we resort to some observable proxy and treat it as the weight. In this case, I simply use the squared fitted values from the OLS regression as that proxy.

fgls <- lm(y ~ x1 + x2, data  = dat, weights = 1/reg$fitted.values^2)
summary(fgls)
## 
## Call:
## lm(formula = y ~ x1 + x2, data = dat, weights = 1/reg$fitted.values^2)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.59433 -0.22344 -0.00492  0.21818  1.22034 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -3.8720     1.3077  -2.961  0.00321 ** 
## x1            2.0992     0.1193  17.592  < 2e-16 ***
## x2            3.9448     0.3560  11.082  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3793 on 497 degrees of freedom
## Multiple R-squared:  0.4932, Adjusted R-squared:  0.4912 
## F-statistic: 241.9 on 2 and 497 DF,  p-value: < 2.2e-16
vcov(fgls)
##             (Intercept)          x1         x2
## (Intercept)   1.7101384 -0.13205447 -0.1510193
## x1           -0.1320545  0.01423961 -0.0049040
## x2           -0.1510193 -0.00490400  0.1267105
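A more conventional FGLS recipe (for instance, the one in Wooldridge's textbook) estimates the skedastic function by regressing the log of the squared OLS residuals on the regressors and then uses the exponentiated fitted values as the variance proxy. A sketch of that variant (the names aux, h_hat and fgls2 are mine, not from the code above):

aux <- lm(log(reg$residuals^2) ~ x1 + x2, data = dat) # model the log of the squared residuals
h_hat <- exp(fitted(aux))                             # implied conditional variances, guaranteed positive
fgls2 <- lm(y ~ x1 + x2, data = dat, weights = 1/h_hat)
summary(fgls2)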

Treatment 4 - Iterative Feasible GLS (IFGLS)

This type of treatment is, from my standpoint, fancy and empirically reasonable. The idea is to produce a stable estimate that converges over iterated regressions: fit WLS with the current weights, regress the squared residuals on the regressors to update the variance estimates, and repeat. Below is the code which produces this "fancy" estimation.

test <- vector('list', 50)
test[[1]] <- list(coef0 = coef(reg), res0 = reg$residuals) # start from the OLS fit

wls_func_lm <- function(obj){
  weight <- obj[['res0']]^2                                   # current variance proxy
  lm_reg <- lm(y ~ x1 + x2, data = dat, weights = 1/weight)   # WLS with the current weights
  lm_reg_res <- lm(lm_reg$residuals^2 ~ x1 + x2, data = dat)  # update the variance model
  return(list(coef0 = coef(lm_reg), res0 = lm_reg_res$fitted.values)) 
}

for (i in 1:10){
  test[[i+1]] <- wls_func_lm(test[[i]])
  dif <- sum(test[[i+1]][['coef0']] - test[[i]][['coef0']])^2 # squared sum of coefficient changes
  cat(i, test[[i]][['coef0']], dif , '\n')
}
## 1 -0.8512072 1.918575 3.412768 0.0005372193 
## 2 -0.9073932 1.918907 3.445444 0.02072117 
## 3 -1.070646 1.93026 3.453395 1.279228e-07 
## 4 -1.071218 1.930276 3.453594 1.166371e-11 
## 5 -1.071223 1.930276 3.453596 1.512537e-15 
## 6 -1.071224 1.930276 3.453596 2.002253e-19 
## 7 -1.071224 1.930276 3.453596 2.69207e-23 
## 8 -1.071224 1.930276 3.453596 4.061105e-27 
## 9 -1.071224 1.930276 3.453596 1.043269e-28 
## 10 -1.071224 1.930276 3.453596 1.065406e-27

You can clearly see the trace of convergence here: the coefficients settle down after three or four iterations.
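If you prefer not to hard-code the number of iterations, you can also loop until the change in the coefficients falls below a tolerance (a sketch that reuses wls_func_lm from above; the tolerance 1e-10 is an arbitrary choice of mine):

cur <- list(coef0 = coef(reg), res0 = reg$residuals)
repeat {
  nxt <- wls_func_lm(cur)
  if (sum((nxt$coef0 - cur$coef0)^2) < 1e-10) break # stop once the estimates stabilise
  cur <- nxt
}
cur$coef0 # converged IFGLS estimates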

3. Detection of Heteroskedasticity

There are two dominant ways to detect heteroskedasticity. One is to plot the residuals; the other is to use the popular White test. How do these two approaches perform in our case?

(1) Visualization

plot(reg$residuals^2, cex = 0.7)

There is no visible structural heteroskedasticity here, but we do spot some "outliers" in the plot.
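Another common visual check, sketched below, is to plot the residuals against the fitted values; a funnel or fan shape would indicate a variance that changes with the predicted level (given the design, you should not expect to see one here).

# residuals against fitted values: the classic funnel-shape check
plot(reg$fitted.values, reg$residuals, cex = 0.7,
     xlab = 'Fitted values', ylab = 'Residuals')
abline(h = 0, lty = 2)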

(2) White Test

# auxiliary regression of the squared residuals on the regressors and their cross terms
res_reg <- lm(reg$residuals^2 ~ dat$x1 + dat$x2 + dat$x1^2 + dat$x2^2 + dat$x1*dat$x2)
# White LM statistic: n times the R-squared of the auxiliary regression (hard-coded here as 500*0.02318)
pchisq(q = 500*0.02318, df = 5, lower.tail = T)
## [1] 0.9591413

The White test fails to reject the null hypothesis! This is actually not surprising: both the White and the Breusch-Pagan tests assume that the variance of the error terms is conditional on some exogenous variables. If that is not true, as in our design, both tests will "unconditionally" fail.
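As a cross-check, you can also run the canned tests from the lmtest package (a sketch; note that in an R formula a squared term has to be wrapped in I() to actually enter as a square). Whether they detect anything in this particular design is left for you to verify.

library(lmtest) # provides bptest()
# studentized Breusch-Pagan test, variance modelled on the regressors themselves
bptest(reg)
# White-type version: add the squares and the interaction explicitly
bptest(reg, varformula = ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat)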