
Weighting Methods for Robust Regression


Using Other Psi Functions

Fitting is done by iterated re-weighted least squares (IWLS).

Psi functions are supplied for the Huber, Hampel, and Tukey bisquare proposals as psi.huber, psi.hampel, and psi.bisquare. Huber's proposal corresponds to a convex optimization problem and gives a unique solution (up to collinearity). The other two have multiple local minima, so a good starting point is desirable.
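To make the IWLS idea concrete, here is a rough sketch of the iteration using psi.huber and the stackloss data that ships with R. This is a sketch only: rlm itself handles the scale estimate, starting values, and convergence checks more carefully.

library(MASS)
X <- model.matrix(stack.loss ~ ., data = stackloss)  # design matrix
y <- stackloss$stack.loss
w <- rep(1, length(y))                       # start from ordinary least squares
for (i in 1:20) {
  fit <- lm.wfit(X, y, w)                    # weighted least-squares step
  s <- median(abs(fit$residuals)) / 0.6745   # MAD-type robust scale estimate
  w <- psi.huber(fit$residuals / s)          # psi functions return weights here
}
fit$coefficients   # close to coef(rlm(stack.loss ~ ., stackloss))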

Huber Weighting

In Huber weighting, observations with small residuals get a weight of 1; the larger the residual, the smaller the weight. This is defined by the weight function

\[w(e) = \begin{cases} 1 & \mbox{for } |e| \leq k \\ \dfrac{k}{|e|} & \mbox{for } |e| > k \end{cases}\]

The value \(k\) for the Huber method is called a tuning constant; smaller values of \(k\) produce more resistance to outliers, but at the expense of lower efficiency when the errors are normally distributed.

The tuning constant is generally picked to give reasonably high efficiency in the normal case; in particular, \(k = 1.345\sigma\) for Huber's method (where \(\sigma\) is the standard deviation of the errors) produces 95-percent efficiency when the errors are normal, while still offering protection against outliers.
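The weight function is simple enough to compute by hand. The sketch below applies it to scaled residuals with k = 1.345, which is also the default tuning constant of psi.huber in MASS.

# Minimal sketch of the Huber weight function w(e) on scaled residuals
huber_w <- function(e, k = 1.345) ifelse(abs(e) <= k, 1, k / abs(e))
huber_w(c(0.5, 1.345, 3))   # 1.000 1.000 0.448 -- weights shrink beyond k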

library(MASS)     # provides rlm() for robust regression
library(faraway)  # provides the cheddar dataset
# Fit by IWLS with the default Huber psi function
FitAll.rr <- rlm(taste ~ Acetic + H2S + Lactic, data = cheddar)


summary(FitAll.rr)
## 
## Call: rlm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar)
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.163  -5.612  -1.153   5.487  27.106 
## 
## Coefficients:
##             Value    Std. Error t value 
## (Intercept) -20.7529  20.1001    -1.0325
## Acetic       -1.5331   4.5422    -0.3375
## H2S           4.0515   1.2715     3.1864
## Lactic       20.1459   8.7885     2.2923
## 
## Residual standard error: 8.471 on 26 degrees of freedom

Regression Equation:

\[\widehat{\text{Taste}} = -20.75 - 1.53\,\text{Acetic} + 4.05\,\text{H2S} + 20.15\,\text{Lactic}\]
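To see the fitted equation in use, we can predict the response for hypothetical predictor values (the values below are made up, chosen to lie within the range of the cheddar data):

# Hypothetical cheese, used only to illustrate the fitted equation
newcheese <- data.frame(Acetic = 5.5, H2S = 6.0, Lactic = 1.5)
predict(FitAll.rr, newdata = newcheese)   # plugs into the equation above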

From before, we noticed that observations 15, 12, and 8 were influential in determining the coefficients. The following table shows the weight given to each observation by the robust regression.

We can see that, roughly, as the absolute residual goes down, the weight goes up. In other words, cases with large residuals tend to be down-weighted (visualized in the sketch after the table).

library(knitr)     # for kable()
library(magrittr)  # for the %>% pipe
hweights <- data.frame(taste = cheddar$taste,   # observed response
                       resid = FitAll.rr$resid, # residual
                       weight = FitAll.rr$w)    # Huber weight
hweights2 <- hweights[order(FitAll.rr$w), ]     # sort by weight, smallest first
hweights2 %>% kable(digits = 2)
obs taste resid weight
15 54.9 27.11 0.42
12 57.2 17.52 0.65
8 21.9 -16.16 0.70
3 39.0 14.32 0.80
18 6.4 -13.61 0.84
28 0.7 -11.41 1.00
1 12.3 9.99 1.00
2 20.9 -1.69 1.00
4 47.9 10.65 1.00
5 5.6 -1.87 1.00
6 25.9 2.63 1.00
7 37.3 5.74 1.00
9 18.1 4.78 1.00
10 21.0 1.05 1.00
11 34.9 5.72 1.00
13 0.7 -5.18 1.00
14 25.9 8.47 1.00
16 40.9 -2.49 1.00
17 15.9 4.77 1.00
19 18.0 -11.05 1.00
20 38.9 -8.82 1.00
21 14.0 -1.47 1.00
22 15.2 -3.87 1.00
23 32.0 -5.34 1.00
24 56.7 4.61 1.00
25 16.8 4.54 1.00
26 11.6 -0.84 1.00
27 26.5 -5.70 1.00
29 13.4 -5.79 1.00
30 5.5 -8.86 1.00
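The down-weighting pattern is easy to visualize with a quick base-graphics sketch:

# Weights against absolute residuals -- large residuals are down-weighted
plot(abs(FitAll.rr$resid), FitAll.rr$w,
     xlab = "|residual|", ylab = "Huber weight")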

Implementation with Bisquare Weighting

Fitting with bisquare weighting simply requires specifying one additional argument, psi, as in the code below (see the last argument).

# Refit using Tukey's bisquare psi function
FitAll.rr.2 <- rlm(taste ~ Acetic + H2S + Lactic,
                   data = cheddar,
                   psi = psi.bisquare)
summary(FitAll.rr.2)
## 
## Call: rlm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar, 
##     psi = psi.bisquare)
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.7034  -5.1552  -0.9793   5.6933  27.7661 
## 
## Coefficients:
##             Value    Std. Error t value 
## (Intercept) -17.7730  20.7031    -0.8585
## Acetic       -2.2650   4.6784    -0.4841
## H2S           4.0569   1.3096     3.0977
## Lactic       20.6885   9.0522     2.2855
## 
## Residual standard error: 7.878 on 26 degrees of freedom

Regression Equation:

\[\widehat{\text{Taste}} = -17.77 - 2.26\,\text{Acetic} + 4.06\,\text{H2S} + 20.69\,\text{Lactic}\]
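For reference, the bisquare weight function behind the table below can be sketched as follows; MASS's psi.bisquare applies it to the scaled residuals with a default tuning constant of c = 4.685.

# Minimal sketch of the Tukey bisquare weight function on scaled residuals
bisquare_w <- function(e, c = 4.685) ifelse(abs(e) <= c, (1 - (e / c)^2)^2, 0)
bisquare_w(c(1, 3, 5))   # ~0.91 ~0.35 0.00 -- weights vanish beyond c

Unlike Huber weights, bisquare weights descend smoothly to exactly zero, so sufficiently extreme observations are discarded entirely.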

The following table shows the weights given by the bisquare estimator.

hweights <- data.frame(taste = cheddar$taste,     # observed response
                       resid = FitAll.rr.2$resid, # residual
                       weight = FitAll.rr.2$w)    # bisquare weight
hweights2 <- hweights[order(FitAll.rr.2$w), ]     # sort by weight, smallest first
hweights2 %>% kable(digits = 2)
obs taste resid weight
15 54.9 27.77 0.19
12 57.2 18.18 0.57
8 21.9 -15.70 0.67
3 39.0 14.38 0.72
18 6.4 -13.46 0.75
28 0.7 -11.19 0.82
19 18.0 -11.11 0.83
4 47.9 10.86 0.83
1 12.3 9.85 0.86
20 38.9 -8.95 0.89
14 25.9 8.59 0.89
30 5.5 -8.02 0.91
7 37.3 6.33 0.94
11 34.9 6.00 0.95
13 0.7 -5.47 0.96
23 32.0 -5.16 0.96
29 13.4 -5.15 0.96
27 26.5 -4.93 0.96
24 56.7 4.77 0.97
25 16.8 4.76 0.97
9 18.1 4.66 0.97
17 15.9 4.65 0.97
22 15.2 -3.72 0.98
6 25.9 3.19 0.99
5 5.6 -1.99 0.99
16 40.9 -1.81 1.00
21 14.0 -1.76 1.00
2 20.9 -1.75 1.00
10 21.0 1.02 1.00
26 11.6 -0.20 1.00

Conclusion

We can see that the weights given to some observations are dramatically lower under the bisquare weighting function than under the Huber weighting function, and the coefficient estimates from the two weighting methods differ.
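This can be checked directly by putting the two weight vectors side by side (a sketch; the row names are the observation numbers):

# Compare Huber and bisquare weights for the most down-weighted cases
wcmp <- data.frame(huber = FitAll.rr$w, bisquare = FitAll.rr.2$w)
head(wcmp[order(wcmp$bisquare), ], 5)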

When comparing the results of an ordinary least squares (OLS) regression and a robust regression, if the results differ substantially, you will most likely want to use the results from the robust regression.
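One way to make that comparison is to put the coefficient estimates side by side (a sketch; the OLS model is fit here for reference):

FitAll.ols <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar)  # ordinary least squares
round(cbind(OLS      = coef(FitAll.ols),
            Huber    = coef(FitAll.rr),
            Bisquare = coef(FitAll.rr.2)), 2)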

Large differences suggest that the model parameters are being strongly influenced by outliers. Each weighting function has its own advantages and drawbacks.

Huber weights can have difficulties with severe outliers, and bisquare weights can have difficulties converging or may yield multiple solutions.