Robust regression is fitted with rlm() from the MASS package; fitting is done by iterated re-weighted least squares (IWLS).
Psi functions are supplied for the Huber, Hampel and Tukey bisquare proposals as psi.huber, psi.hampel and psi.bisquare. Huber’s corresponds to a convex optimization problem and gives a unique solution (up to collinearity). The other two will have multiple local minima, and a good starting point is desirable.
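To make the IWLS idea concrete for the Huber case, here is a minimal hand-rolled sketch (illustrative only, and not how the analysis below is run: rlm() updates its robust scale estimate and checks convergence more carefully, whereas this sketch simply recomputes a MAD scale on each pass):

library(MASS)      # psi.huber()
library(faraway)   # cheddar data

fit <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar)   # start from the OLS fit
for (i in 1:20) {
  s <- mad(resid(fit))                     # robust scale estimate of the residuals
  w <- psi.huber(resid(fit) / s)           # Huber weights for the scaled residuals
  fit <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar, weights = w)
}
coef(fit)   # close to, though not identical to, the rlm() coefficients below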
In Huber weighting, observations with small residuals get a weight of 1 and the larger the residual, the smaller the weight. This is defined by the weight function
\[ w(e) = \begin{cases} 1 & \mbox{for } |e| \leq k \\ \dfrac{k}{|e|} & \mbox{for } |e| > k \end{cases} \]
The value \(k\) for the Huber method is called a tuning constant; smaller values of \(k\) produce more resistance to outliers, but at the expense of lower efficiency when the errors are normally distributed.
The tuning constant is generally picked to give reasonably high efficiency in the normal case; in particular, \(k = 1.345\sigma\) for Huber's method (where \(\sigma\) is the standard deviation of the errors) produces 95-percent efficiency when the errors are normal, while still offering protection against outliers.
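As a small illustrative check (not part of the original analysis), the weight function above can be computed by hand for a few hypothetical scaled residuals and compared with psi.huber from MASS, whose default tuning constant is k = 1.345:

library(MASS)

k <- 1.345
e <- c(-5, -2, -1, 0, 0.5, 1.5, 4)                # hypothetical scaled residuals
w_manual <- ifelse(abs(e) <= k, 1, k / abs(e))    # weight function defined above
w_psi    <- psi.huber(e, k = k)                   # the same weights via MASS
cbind(e, w_manual, w_psi)                         # the two weight columns should agree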
library(MASS)      # provides rlm()
library(faraway)   # provides the cheddar data set
FitAll.rr <- rlm(taste ~ Acetic + H2S + Lactic, data = cheddar)
summary(FitAll.rr)
##
## Call: rlm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar)
## Residuals:
## Min 1Q Median 3Q Max
## -16.163 -5.612 -1.153 5.487 27.106
##
## Coefficients:
## Value Std. Error t value
## (Intercept) -20.7529 20.1001 -1.0325
## Acetic -1.5331 4.5422 -0.3375
## H2S 4.0515 1.2715 3.1864
## Lactic 20.1459 8.7885 2.2923
##
## Residual standard error: 8.471 on 26 degrees of freedom
Regression Equation:
\[ \widehat{\mbox{Taste}} = -20.75 - 1.53\,\mbox{Acetic} + 4.05\,\mbox{H2S} + 20.15\,\mbox{Lactic} \]
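For context (an illustrative comparison that is not part of the output above; the FitAll.ols object is introduced only here), the robust coefficients can be placed next to those of an ordinary least squares fit of the same model:

FitAll.ols <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar)   # OLS fit of the same model
round(cbind(OLS = coef(FitAll.ols), Huber = coef(FitAll.rr)), 3)  # side-by-side coefficients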
From before, we noticed that observations 15, 12 and 8 were influential in the determination of the coefficients. The following table shows the weight given to each observation by the robust regression. We can see that, roughly, as the absolute residual goes down, the weight goes up; in other words, cases with large residuals tend to be down-weighted.
library(dplyr)   # provides %>%
library(knitr)   # provides kable()

hweights <- data.frame(taste  = cheddar$taste,     # observed response
                       resid  = FitAll.rr$resid,   # residual
                       weight = FitAll.rr$w)       # Huber weight
hweights2 <- hweights[order(FitAll.rr$w), ]        # sort by weight, smallest first
hweights2 %>% kable(digits = 2)
obs | taste | resid | weight |
---|---|---|---|
15 | 54.9 | 27.11 | 0.42 |
12 | 57.2 | 17.52 | 0.65 |
8 | 21.9 | -16.16 | 0.70 |
3 | 39.0 | 14.32 | 0.80 |
18 | 6.4 | -13.61 | 0.84 |
28 | 0.7 | -11.41 | 1.00 |
1 | 12.3 | 9.99 | 1.00 |
2 | 20.9 | -1.69 | 1.00 |
4 | 47.9 | 10.65 | 1.00 |
5 | 5.6 | -1.87 | 1.00 |
6 | 25.9 | 2.63 | 1.00 |
7 | 37.3 | 5.74 | 1.00 |
9 | 18.1 | 4.78 | 1.00 |
10 | 21.0 | 1.05 | 1.00 |
11 | 34.9 | 5.72 | 1.00 |
13 | 0.7 | -5.18 | 1.00 |
14 | 25.9 | 8.47 | 1.00 |
16 | 40.9 | -2.49 | 1.00 |
17 | 15.9 | 4.77 | 1.00 |
19 | 18.0 | -11.05 | 1.00 |
20 | 38.9 | -8.82 | 1.00 |
21 | 14.0 | -1.47 | 1.00 |
22 | 15.2 | -3.87 | 1.00 |
23 | 32.0 | -5.34 | 1.00 |
24 | 56.7 | 4.61 | 1.00 |
25 | 16.8 | 4.54 | 1.00 |
26 | 11.6 | -0.84 | 1.00 |
27 | 26.5 | -5.70 | 1.00 |
29 | 13.4 | -5.79 | 1.00 |
30 | 5.5 | -8.86 | 1.00 |
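As a small follow-up sketch (using the hweights2 data frame built above), the down-weighted cases can also be pulled out directly rather than read off the full table:

subset(hweights2, weight < 1)   # only the observations the Huber fit down-weights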
Fitting with bisquare weighting simply requires specifying one additional argument, psi, as in the code below (see the last argument).
FitAll.rr.2 <- rlm(taste ~ Acetic + H2S + Lactic,
                   data = cheddar,
                   psi = psi.bisquare)   # Tukey bisquare psi instead of the default Huber
summary(FitAll.rr.2)
##
## Call: rlm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar,
## psi = psi.bisquare)
## Residuals:
## Min 1Q Median 3Q Max
## -15.7034 -5.1552 -0.9793 5.6933 27.7661
##
## Coefficients:
## Value Std. Error t value
## (Intercept) -17.7730 20.7031 -0.8585
## Acetic -2.2650 4.6784 -0.4841
## H2S 4.0569 1.3096 3.0977
## Lactic 20.6885 9.0522 2.2855
##
## Residual standard error: 7.878 on 26 degrees of freedom
Regression Equation:
\[ \widehat{\mbox{Taste}} = -17.77 - 2.26\,\mbox{Acetic} + 4.06\,\mbox{H2S} + 20.69\,\mbox{Lactic} \]
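For reference (an illustrative sketch, not output from the analysis above), the bisquare weight function can be written out explicitly and checked against psi.bisquare from MASS, whose default tuning constant is c = 4.685:

library(MASS)

cc <- 4.685
e  <- c(-6, -3, -1, 0, 1, 2.5, 5)                        # hypothetical scaled residuals
w_manual <- ifelse(abs(e) <= cc, (1 - (e / cc)^2)^2, 0)  # bisquare weight function
w_psi    <- psi.bisquare(e, c = cc)                      # the same weights via MASS
cbind(e, w_manual, w_psi)                                # the two weight columns should agree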
Weights using the bisquare estimator:
hweights <- data.frame(taste  = cheddar$taste,       # observed response
                       resid  = FitAll.rr.2$resid,   # residual
                       weight = FitAll.rr.2$w)       # bisquare weight
hweights2 <- hweights[order(FitAll.rr.2$w), ]        # sort by weight, smallest first
hweights2 %>% kable(digits = 2)
obs | taste | resid | weight |
---|---|---|---|
15 | 54.9 | 27.77 | 0.19 |
12 | 57.2 | 18.18 | 0.57 |
8 | 21.9 | -15.70 | 0.67 |
3 | 39.0 | 14.38 | 0.72 |
18 | 6.4 | -13.46 | 0.75 |
28 | 0.7 | -11.19 | 0.82 |
19 | 18.0 | -11.11 | 0.83 |
4 | 47.9 | 10.86 | 0.83 |
1 | 12.3 | 9.85 | 0.86 |
20 | 38.9 | -8.95 | 0.89 |
14 | 25.9 | 8.59 | 0.89 |
30 | 5.5 | -8.02 | 0.91 |
7 | 37.3 | 6.33 | 0.94 |
11 | 34.9 | 6.00 | 0.95 |
13 | 0.7 | -5.47 | 0.96 |
23 | 32.0 | -5.16 | 0.96 |
29 | 13.4 | -5.15 | 0.96 |
27 | 26.5 | -4.93 | 0.96 |
24 | 56.7 | 4.77 | 0.97 |
25 | 16.8 | 4.76 | 0.97 |
9 | 18.1 | 4.66 | 0.97 |
17 | 15.9 | 4.65 | 0.97 |
22 | 15.2 | -3.72 | 0.98 |
6 | 25.9 | 3.19 | 0.99 |
5 | 5.6 | -1.99 | 0.99 |
16 | 40.9 | -1.81 | 1.00 |
21 | 14.0 | -1.76 | 1.00 |
2 | 20.9 | -1.75 | 1.00 |
10 | 21.0 | 1.02 | 1.00 |
26 | 11.6 | -0.20 | 1.00 |
We can see that the weights given to some observations are dramatically lower with the bisquare weighting function than with the Huber weighting function, and the coefficient estimates from the two weighting methods differ.
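One convenient way to see both effects (an illustrative sketch using the two fits built above) is to tabulate the coefficients and the weights of the influential cases side by side:

# Coefficient estimates from the Huber and bisquare fits
round(cbind(Huber = coef(FitAll.rr), Bisquare = coef(FitAll.rr.2)), 3)

# Weights the two methods give to the influential cases 15, 12 and 8
round(cbind(Huber    = FitAll.rr$w[c(15, 12, 8)],
            Bisquare = FitAll.rr.2$w[c(15, 12, 8)]), 2)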
When the results of a regular OLS regression and a robust regression differ substantially, you will most likely want to use the results from the robust regression.
Large differences suggest that the model parameters are being highly influenced by outliers. Different functions have advantages and drawbacks.
Huber weights can have difficulties with severe outliers, and bisquare weights can have difficulties converging or may yield multiple solutions.