In this section, I will discuss GLS at length, with a focus on heteroskedasticity. The take-home messages are: 1) when you use the "weights" argument of the "lm" function to run WLS, do not forget to take the inverse of the error variances; 2) the White and Breusch-Pagan tests can be unreliable, and my personal suggestion is not to put too much faith in them.
To illustrate, I design a heteroskedasticity problem as follows. The variance of the error term follows a Poisson distribution with a mean of 99, unconditionally (it does not depend on the regressors). The true regression equation is \(y = 2x_1 + 3x_2 + \epsilon\).
num <- 1000
set.seed(109)
x1 <- rpois(num, 12)
x2 <- rgamma(num, 2)
cond_var <- rpois(num, 99) # observation-specific error variance, drawn independently of x1 and x2
error_term <- sapply(cond_var, function(i) rnorm(1, 0, sqrt(i))) # heteroskedastic errors
y <- 2*x1 + 3*x2 + error_term # true equation
dat <- cbind(y, x1, x2, error_term, cond_var)
dat <- as.data.frame(dat[sample(1:num, 500, replace = TRUE),]) # mimic sampling from the population
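As a quick sanity check on the design (a minimal sketch using the simulated columns above), the error variance is essentially unrelated to the regressors, i.e., the heteroskedasticity is unconditional by construction:
cor(dat$cond_var, dat$x1) # should be close to zero by construction
cor(dat$cond_var, dat$x2) # should be close to zero by construction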
Without any treatment of the heteroskedasticity, we can run OLS to get our estimates.
reg <- lm(y ~ x1 + x2, data = dat)
summary(reg)
##
## Call:
## lm(formula = y ~ x1 + x2, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.8910 -6.0058 -0.0059 6.2481 25.3100
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.8512 1.6670 -0.511 0.61
## x1 1.9186 0.1252 15.327 <2e-16 ***
## x2 3.4128 0.2884 11.833 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.439 on 497 degrees of freedom
## Multiple R-squared: 0.4205, Adjusted R-squared: 0.4182
## F-statistic: 180.3 on 2 and 497 DF, p-value: < 2.2e-16
vcov(reg)
## (Intercept) x1 x2
## (Intercept) 2.7790229 -0.187765599 -0.188472030
## x1 -0.1877656 0.015669910 0.001483032
## x2 -0.1884720 0.001483032 0.083177548
It is clear from the above regression that the presence of heteroskedasticity does not affect consistency: the coefficient estimates are close to the true values of 2 and 3.
We kept the variance of the original error term (cond_var) in the manually created dataset to facilitate the WLS regression. Rather than using the "lm" command in R, I first use home-made code to run the WLS, since our focus is on efficiency, i.e., the covariance matrix of the parameter estimates. Note that this treatment is basically infeasible in practice, because you will never know the true variance structure the way I do in this artificial example.
design_m <- model.matrix(reg)
y_m <- as.matrix(dat[,'y'])
weight_m <- matrix(0, 500, 500)
diag(weight_m) <- dat$cond_var # the known error variances go on the diagonal of W
beta_wls <- solve(t(design_m) %*% solve(weight_m) %*% design_m) %*% (t(design_m) %*% solve(weight_m) %*% y_m) # (X' W^-1 X)^-1 X' W^-1 y
beta_wls
## [,1]
## (Intercept) -1.363470
## x1 1.966025
## x2 3.377820
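Since the focus is on efficiency, the covariance matrix of this home-made WLS estimator can be obtained from the same building blocks. Here is a minimal sketch, reusing design_m and weight_m from above and assuming weight_m holds the true error variances (the classical GLS formula \((X'W^{-1}X)^{-1}\)); the object name is just for illustration:
vcov_wls_manual <- solve(t(design_m) %*% solve(weight_m) %*% design_m) # (X' W^-1 X)^-1
vcov_wls_manual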
It is clear that we have improved on OLS, because WLS achieves a lower variance for the estimates; given that the true parameters are non-zero, we will reject the null with higher probability, as we wish (a quick comparison of the two covariance matrices follows the WLS output below). Besides, a reminder for "lm" users: you have to pass the inverse of the error variances to the weights argument.
reg_wls <- lm(y ~ x1 + x2, data = dat, weights = 1/cond_var)
summary(reg_wls)
##
## Call:
## lm(formula = y ~ x1 + x2, data = dat, weights = 1/cond_var)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -2.59131 -0.61612 -0.01263 0.62371 2.51999
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.3635 1.6524 -0.825 0.41
## x1 1.9660 0.1247 15.769 <2e-16 ***
## x2 3.3778 0.2856 11.828 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9492 on 497 degrees of freedom
## Multiple R-squared: 0.4311, Adjusted R-squared: 0.4288
## F-statistic: 188.3 on 2 and 497 DF, p-value: < 2.2e-16
vcov(reg_wls)
## (Intercept) x1 x2
## (Intercept) 2.7305163 -0.185263467 -0.182159583
## x1 -0.1852635 0.015544702 0.001178971
## x2 -0.1821596 0.001178971 0.081557382
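As a quick side-by-side check (nothing beyond the two fitted objects above is used), the ratio of the estimated coefficient variances shows the efficiency gain of WLS over OLS in this example, which turns out to be modest here:
diag(vcov(reg_wls)) / diag(vcov(reg)) # values below 1 mean WLS has the smaller variance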
The mechanism behind the HC (heteroskedasticity-consistent) estimators is hard to summarize in a few lines. In Stata there are four different HC estimators, while in R (the sandwich package) there are six, HC0 through HC5. But all of these estimators are ways to find an asymptotically consistent analogue of the "meat" matrix \(X'\Sigma X\), where \(\Sigma\) is the diagonal matrix of error variances, which is then wrapped in the usual \((X'X)^{-1}\) "bread". This is an improvement on OLS inference, rather than a GLS-type re-estimation.
library(sandwich) # vcovHC() is provided by the sandwich package
vcovHC(reg, type = 'HC')
## (Intercept) x1 x2
## (Intercept) 2.7392965 -0.184578677 -0.184614469
## x1 -0.1845787 0.014905980 0.003212793
## x2 -0.1846145 0.003212793 0.076074223
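To make the "meat" idea concrete, here is a minimal sketch of the HC0 sandwich computed by hand, reusing design_m and reg from above (the object names are just for illustration). It plugs the squared OLS residuals into the middle of the sandwich and should be close to the vcovHC output above, since type = 'HC' corresponds to the original White (HC0) estimator:
bread_m <- solve(t(design_m) %*% design_m) # (X'X)^-1
meat_m <- t(design_m) %*% diag(reg$residuals^2) %*% design_m # X' diag(e^2) X
sandwich_hc0 <- bread_m %*% meat_m %*% bread_m
sandwich_hc0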
This treatment is the feasible version of the first one. Imagine we do not know the variance of the error terms, so we resort to some observable proxy and treat it as the weight. In this case, I simply use the squared fitted values from the OLS regression, which amounts to assuming that the error variance is proportional to the square of the conditional mean.
fgls <- lm(y ~ x1 + x2, data = dat, weights = 1/reg$fitted.values^2)
summary(fgls)
##
## Call:
## lm(formula = y ~ x1 + x2, data = dat, weights = 1/reg$fitted.values^2)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1.59433 -0.22344 -0.00492 0.21818 1.22034
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.8720 1.3077 -2.961 0.00321 **
## x1 2.0992 0.1193 17.592 < 2e-16 ***
## x2 3.9448 0.3560 11.082 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3793 on 497 degrees of freedom
## Multiple R-squared: 0.4932, Adjusted R-squared: 0.4912
## F-statistic: 241.9 on 2 and 497 DF, p-value: < 2.2e-16
vcov(fgls)
## (Intercept) x1 x2
## (Intercept) 1.7101384 -0.13205447 -0.1510193
## x1 -0.1320545 0.01423961 -0.0049040
## x2 -0.1510193 -0.00490400 0.1267105
This type of treatment is, from my standpoint, fancy and empirically reasonable. The idea is to produce a stable output that converges over iterated regressions: each round re-estimates the relationship between the squared residuals and the regressors, and feeds the result back in as weights. Below is the code which produces the "fancy" estimation.
test <- vector('list', 50)
test[[1]] <- list(coef0 = coef(reg), res0 = reg$residuals)
wls_func_lm <- function(obj){
  weight <- obj[['res0']]^2
  lm_reg <- lm(y ~ x1 + x2, data = dat, weights = 1/weight) # WLS with the current variance proxy
  lm_reg_res <- lm(lm_reg$residuals^2 ~ x1 + x2, data = dat) # re-estimate the variance function from the new squared residuals
  return(list(coef0 = coef(lm_reg), res0 = lm_reg_res$fitted.values))
}
for (i in 1:10){
  test[[i+1]] <- wls_func_lm(test[[i]])
  dif <- sum(test[[i+1]][['coef0']] - test[[i]][['coef0']])^2 # squared sum of coefficient changes, to monitor convergence
  cat(i, test[[i]][['coef0']], dif , '\n')
}
## 1 -0.8512072 1.918575 3.412768 0.0005372193
## 2 -0.9073932 1.918907 3.445444 0.02072117
## 3 -1.070646 1.93026 3.453395 1.279228e-07
## 4 -1.071218 1.930276 3.453594 1.166371e-11
## 5 -1.071223 1.930276 3.453596 1.512537e-15
## 6 -1.071224 1.930276 3.453596 2.002253e-19
## 7 -1.071224 1.930276 3.453596 2.69207e-23
## 8 -1.071224 1.930276 3.453596 4.061105e-27
## 9 -1.071224 1.930276 3.453596 1.043269e-28
## 10 -1.071224 1.930276 3.453596 1.065406e-27
The trace shows that the estimates converge after only a few iterations.
There are two dominant ways to detect heteroskedasticity. One is to plot the residuals, and the other is to use the popular White test. How do these two approaches perform in our case?
plot(reg$residuals^2, cex = 0.7)
There is no visible structural heteroskedasticity in the plot of squared residuals, though we do find some "outliers".
res_reg <- lm(reg$residuals^2 ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat) # White auxiliary regression: squared residuals on the regressors, their squares, and their interaction
pchisq(q = 500*0.02318, df = 5, lower.tail = T) # White statistic n*R^2 = 500 * 0.02318, evaluated against a chi-squared with 5 df
## [1] 0.9591413
The White test fails to reject the null hypothesis! This is actually not surprising: both the White and Breusch-Pagan tests assume that the variance of the error terms is conditional on some exogenous variables. If that is not true, as in our design, both tests will "unconditionally" fail.
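For completeness, since the Breusch-Pagan test is mentioned above but never run, here is a minimal sketch using the bptest function from the lmtest package (assuming the package is installed). By the same logic, it should also struggle to detect heteroskedasticity that is unrelated to the regressors:
library(lmtest)
bptest(reg) # Breusch-Pagan: auxiliary regression of squared residuals on the model's regressors
bptest(reg, ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat) # White-type variant with squares and the interaction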