head(myData)
##   group score
## 1     a    86
## 2     a    98
## 3     a    87
## 4     a    97
## 5     a    95
## 6     a    92

Exploratory data analysis

  • Table 1 shows the summary statistics of score stratified by group. The standard deviation and the variance are noticeably different among groups. The unequal variance among groups is also clear in the boxplot depicted in Figure 1.
Table 1 Summary Statistics

Characteristic    a, N = 20    b, N = 20    c, N = 20    d, N = 20
score
    Mean              90.75       106.70        78.05       119.55
    Median            90.00       104.50        78.00       121.50
    SD                 4.38        10.90        34.32        53.17
    SE                 0.98         2.44         7.67        11.89
    Variance          19.14       118.75     1,177.63     2,827.21
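  • The numbers in Table 1 can be reproduced in base R, for instance:

    aggregate(score ~ group, 
              data = myData, 
              FUN = function(x) c(Mean = mean(x), 
                                  Median = median(x), 
                                  SD = sd(x), 
                                  SE = sd(x)/sqrt(length(x)), 
                                  Variance = var(x))) # per-group summary statistics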

  • Let’s now fit a linear model of score against group, then evaluate the results and the residual plots.
m.lm <- lm(score~group, 
           data = myData)
summary(m.lm)
## 
## Call:
## lm(formula = score ~ group, data = myData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -86.050  -7.712  -1.625  10.700 104.450 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   90.750      7.196  12.611  < 2e-16 ***
## groupb        15.950     10.177   1.567  0.12120    
## groupc       -12.700     10.177  -1.248  0.21589    
## groupd        28.800     10.177   2.830  0.00595 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.18 on 76 degrees of freedom
## Multiple R-squared:  0.2007, Adjusted R-squared:  0.1692 
## F-statistic: 6.362 on 3 and 76 DF,  p-value: 0.0006624
  • The summary shows that group is a significant predictor of score (F = 6.362, p < 0.001). The model assumes equal variances, so the standard errors of the group coefficients are all equal, estimated at 10.177. Compared with the reference level, group a, only group d has a significantly larger mean; the means of groups b and c are not significantly different from it.
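  • For reference, under equal variances each group-vs-reference contrast has standard error \(\hat{\sigma}\sqrt{1/n_a + 1/n_g} = 32.18\sqrt{2/20} \approx 10.18\), which reproduces the common value of 10.177 shown above (up to rounding).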

  • Figure 2 depicts the residual plots, which clearly show unequal variance (heteroscedasticity, panel A) and deviation from normality (panel B) of the residuals.

  • Formal tests for heteroscedasticity (Breusch-Pagan test) and normality (Shapiro-Wilk test) are significant and support the conclusions drawn from the visual inspection of the residuals.
performance::check_heteroscedasticity(m.lm) # Breusch-Pagan test
## Warning: Heteroscedasticity (non-constant error variance) detected (p < .001).
performance::check_normality(m.lm) # Shapiro-Wilk test
## Warning: Non-normality of residuals detected (p < .001).

Methods to deal with heteroscedasticity

  • Although transforming the dependent variable is an option, it changes the original scale and makes interpretation difficult. In addition, the back-transformation can be complex. Therefore, I introduce other viable options here.
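  • For completeness, a minimal sketch of the transformation approach (assuming score is strictly positive so the log is defined):

    m.log <- lm(log(score) ~ group, 
                data = myData) # model on the log scale; requires score > 0
    exp(coef(m.log)) # back-transformed coefficients are multiplicative, not additive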


A. Weighted Least Squares (WLS) regression

  • This method assigns a different weight to each observation such that observations with small residual variance receive large weights, while observations with large residual variance receive small weights.

  • The following are some common forms of the weights used in WLS:

    1. Inverse variance or standard deviation weights:

      • Each observation is weighted by the inverse of its variance or of its standard deviation. These variances or standard deviations must be known in advance.


    2. Fitted values weights:

      • Each observation is weighted by the inverse of its squared fitted value. The fitted values are extracted from the initial OLS regression (with non-constant variance).


    3. Residual-based weights:

      • Each observation is weighted by the inverse of its absolute or squared residual. The residuals are extracted from the initial Ordinary Least Squares (OLS) regression (with non-constant variance). Forms 2 and 3 are sketched in code after this list.


    4. Group-based weights:

      • If the model has a grouping variable, each observation is weighted by the inverse of its group-specific variance.
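  • As a minimal sketch, forms 2 and 3 could be implemented with the OLS model m.lm fitted above (form 4 is demonstrated in full below):

    w_fit <- 1/fitted(m.lm)^2 # form 2: inverse squared fitted values
    w_res <- 1/abs(resid(m.lm)) # form 3: inverse absolute residuals (assumes no residual is exactly zero)
    m.wls2 <- lm(score~group, data = myData, weights = w_fit)
    m.wls3 <- lm(score~group, data = myData, weights = w_res)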


  • In our example, it is prudent to use the inverse of the group-specific variance as weights.

    gp_variance <- aggregate(score ~ group, 
                             data = myData, 
                             var) # calculate the variance of each group
    
    names(gp_variance) <- c("group", "variance")
    mydata_wls <- merge(myData, 
                        gp_variance, 
                        by = "group") # merge varinces with the original data
    
    head(mydata_wls) # each observation is assigned the variance of its group
    ##   group score variance
    ## 1     a    86 19.14474
    ## 2     a    98 19.14474
    ## 3     a    87 19.14474
    ## 4     a    97 19.14474
    ## 5     a    95 19.14474
    ## 6     a    92 19.14474
    m.wls <- lm(score~group, 
                data = mydata_wls, 
                weights = 1/variance) # fit the WLS model and specify weights as the inverse of variance
  • Now, let’s have a look at the WLS model summary:

    summary(m.wls)
    ## 
    ## Call:
    ## lm(formula = score ~ group, data = mydata_wls, weights = 1/variance)
    ## 
    ## Weighted Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -2.5075 -0.6218 -0.1560  0.6699  2.1549 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)  90.7500     0.9784  92.755  < 2e-16 ***
    ## groupb       15.9500     2.6258   6.074 4.58e-08 ***
    ## groupc      -12.7000     7.7355  -1.642   0.1048    
    ## groupd       28.8000    11.9297   2.414   0.0182 *  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 1 on 76 degrees of freedom
    ## Multiple R-squared:  0.3754, Adjusted R-squared:  0.3507 
    ## F-statistic: 15.22 on 3 and 76 DF,  p-value: 7.522e-08
  • The coefficients estimated by the WLS model are exactly the same as those estimated by the OLS model. However, the standard errors now differ and vary by group.

  • The WLS model has a larger F-value \((15.22\ \text{vs.}\ 6.36)\) and improved \(R^2\ (0.35\ \text{vs.}\ 0.17)\) compared to the OLS model.

  • The mean scores of groups b and d are now significantly different from that of group a.

  • What about the residual plots:

    • As shown in Figure 3, there is something wrong with the residual plots. They have not changed and still reveal heteroscedasticity and deviation from normality. What is going on?


  • The problem is that we used the raw residuals of the WLS model, which are unweighted and identical to those of the OLS model. We can see this by comparing the first few residuals of both models.

    head(resid(m.lm)) # OLS residuals
    ##     1     2     3     4     5     6 
    ## -4.75  7.25 -3.75  6.25  4.25  1.25
    head(resid(m.wls)) # Unweighted WLS residuals
    ##     1     2     3     4     5     6 
    ## -4.75  7.25 -3.75  6.25  4.25  1.25


  • We have to use the standardized residuals, which are weighted.

head(rstandard(m.wls)) # Standardized WLS residuals
##          1          2          3          4          5          6 
## -1.1137997  1.7000101 -0.8793156  1.4655260  0.9965576  0.2931052
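  • For intuition, for a weighted lm, rstandard() divides each weighted residual \(\sqrt{w_i}\,e_i\) by its estimated standard deviation \(\hat{\sigma}\sqrt{1 - h_{ii}}\). A hand-computed sketch that reproduces the values above:

w <- 1/mydata_wls$variance # the WLS weights
head(sqrt(w)*resid(m.wls) / (sigma(m.wls)*sqrt(1 - hatvalues(m.wls)))) # matches rstandard(m.wls)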


  • As shown in Figure 4, the variance of the standardized residuals is now roughly constant, and normality has greatly improved. This shows that stabilizing the variance can indirectly correct the normality violation.


  • The heteroscedasticity (Breusch-Pagan) and normality (Shapiro-Wilk) tests are now non-significant, providing evidence that both violations observed with the OLS model have been corrected.
performance::check_heteroscedasticity(m.wls) # Breusch-Pagan test
## OK: Error variance appears to be homoscedastic (p > .999).
performance::check_normality(m.wls) # Shapiro-Wilk test
## OK: residuals appear as normally distributed (p = 0.641).


B. Fitting a linear model using generalized least squares (GLS)

  • The function nlme::gls() fits linear models using generalized least squares, allowing the errors to be correlated and/or to have unequal variances.

    library(nlme) # provides gls() and the variance functions
    
    m.gls <- gls(score~group, 
                 data = myData, 
                 weights = varIdent(form = ~1|group))
    
    summary(m.gls)
    ## Generalized least squares fit by REML
    ##   Model: score ~ group 
    ##   Data: myData 
    ##        AIC      BIC    logLik
    ##   675.8608 694.5067 -329.9304
    ## 
    ## Variance function:
    ##  Structure: Different standard deviations per stratum
    ##  Formula: ~1 | group 
    ##  Parameter estimates:
    ##         a         b         c         d 
    ##  1.000000  2.490504  7.842952 12.152180 
    ## 
    ## Coefficients:
    ##              Value Std.Error  t-value p-value
    ## (Intercept)  90.75  0.978385 92.75492  0.0000
    ## groupb       15.95  2.625758  6.07444  0.0000
    ## groupc      -12.70  7.735547 -1.64177  0.1048
    ## groupd       28.80 11.929695  2.41414  0.0182
    ## 
    ##  Correlation: 
    ##        (Intr) groupb groupc
    ## groupb -0.373              
    ## groupc -0.126  0.047       
    ## groupd -0.082  0.031  0.010
    ## 
    ## Standardized residuals:
    ##        Min         Q1        Med         Q3        Max 
    ## -2.5075330 -0.6217673 -0.1560044  0.6699013  2.1549339 
    ## 
    ## Residual standard error: 4.37547 
    ## Degrees of freedom: 80 total; 76 residual
  • Using the varIdent(form = ~1|group) argument allows different variances according to the levels of a classification factor (in our case, group).
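  • The variance-function parameter estimates above are the ratios of each group’s residual standard deviation to that of the reference group a. A quick check against the group variances computed earlier:

    with(gp_variance, 
         sqrt(variance)/sqrt(variance[group == "a"])) # approx. 1.00, 2.49, 7.84, 12.15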

  • The summary shows that the standard errors vary by group and are the same as those of the WLS model.

  • Let’s check the residuals, keeping in mind that we cannot use the raw residuals. In addition, we cannot simply use the rstandard() function, as this throws an error.

    rstandard(m.gls)
    ## Error in UseMethod("rstandard"): no applicable method for 'rstandard' applied to an object of class "gls"


  • The standardized residuals from the GLS model can be extracted using the resid() function with the argument type = 'pearson', subsetting from 1 to the length of the dependent variable to keep only the residuals and leave out the attached weights.
m.gls.res <- resid(m.gls, 
                   type = 'pearson')[1:length(myData$score)]
head(m.gls.res)
##          1          2          3          4          5          6 
## -1.0855977  1.6569649 -0.8570508  1.4284180  0.9713243  0.2856836


  • The residual plots (Figure 5) look fine.


  • In addition, the formal tests for heteroscedasticity and normality are also fine.
library(lmtest) # provides bptest()
design.matrix <- model.matrix(score~group, 
                              data = myData) # Extract the design matrix
bptest(m.gls.res~design.matrix) # Perform Breusch-Pagan test
## 
##  studentized Breusch-Pagan test
## 
## data:  m.gls.res ~ design.matrix
## BP = 1.9803e-13, df = 3, p-value = 1
shapiro.test(m.gls.res)
## 
##  Shapiro-Wilk normality test
## 
## data:  m.gls.res
## W = 0.98765, p-value = 0.6413


  • There are other forms of the variance function that can be used to define weights in the GLS model, such as varExp(), varPower(), etc.
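  • For instance, a variance that increases as a power of the fitted values could be specified like this (a sketch, not evaluated here):

    m.gls.pow <- gls(score~group, 
                     data = myData, 
                     weights = varPower(form = ~fitted(.))) # variance proportional to |fitted|^(2*delta)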


C. Adjusting for heteroscedasticity using robust standard errors

  • The functions lmtest::coeftest() and sandwich::vcovHC() are used to perform hypothesis tests on the coefficients of a linear model, while adjusting the variance-covariance matrix of the coefficient estimates for heteroscedasticity using robust standard errors.

    library(lmtest)   # provides coeftest()
    library(sandwich) # provides vcovHC()
    
    coeftest(x = m.lm, 
             vcov. = vcovHC(m.lm)) # Use the original OLS model with non-constant variance
    ## 
    ## t test of coefficients:
    ## 
    ##             Estimate Std. Error t value  Pr(>|t|)    
    ## (Intercept)  90.7500     1.0038 90.4063 < 2.2e-16 ***
    ## groupb       15.9500     2.6940  5.9206 8.706e-08 ***
    ## groupc      -12.7000     7.9365 -1.6002   0.11370    
    ## groupd       28.8000    12.2396  2.3530   0.02121 *  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


  • The output shows the coefficient estimates along with the robust standard errors adjusted for heteroscedasticity. The results are very close to those of the WLS and GLS models.
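  • Note that vcovHC() defaults to the HC3 estimator; other variants can be requested via the type argument, for example:

    coeftest(x = m.lm, 
             vcov. = vcovHC(m.lm, type = "HC1")) # HC1 applies a simple degrees-of-freedom correction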