Robust regression is fitted with rlm() from the MASS package; fitting is done by iterated re-weighted least squares (IWLS).
Psi functions are supplied for the Huber, Hampel and Tukey bisquare proposals as psi.huber, psi.hampel and psi.bisquare. Huber’s corresponds to a convex optimization problem and gives a unique solution (up to collinearity). The other two will have multiple local minima, and a good starting point is desirable.
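To make the IWLS idea concrete for the Huber case, here is a minimal hand-rolled sketch (illustrative only, and not how the analysis below is run: rlm() updates its robust scale estimate and checks convergence more carefully, whereas this sketch simply recomputes a MAD scale on each pass):

library(MASS)      # psi.huber()
library(faraway)   # cheddar data

fit <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar)   # start from the OLS fit
for (i in 1:20) {
  s <- mad(resid(fit))                     # robust scale estimate of the residuals
  w <- psi.huber(resid(fit) / s)           # Huber weights for the scaled residuals
  fit <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar, weights = w)
}
coef(fit)   # close to, though not identical to, the rlm() coefficients below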
In Huber weighting, observations with small residuals get a weight of 1 and the larger the residual, the smaller the weight. This is defined by the weight function
\[ w(e) = \begin{cases} 1 & \mbox{for } |e| \leq k \\ \dfrac{k}{|e|} & \mbox{for } |e| > k \end{cases} \]
The value \(k\) for the Huber method is called a tuning constant; smaller values of \(k\) produce more resistance to outliers, but at the expense of lower efficiency when the errors are normally distributed.
The tuning constant is generally picked to give reasonably high efficiency in the normal case; in particular, \(k = 1.345\sigma\) for Huber's method (where \(\sigma\) is the standard deviation of the errors) produces 95-percent efficiency when the errors are normal, while still offering protection against outliers.
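As a small illustrative check (not part of the original analysis), the weight function above can be computed by hand for a few hypothetical scaled residuals and compared with psi.huber from MASS, whose default tuning constant is k = 1.345:

library(MASS)

k <- 1.345
e <- c(-5, -2, -1, 0, 0.5, 1.5, 4)                # hypothetical scaled residuals
w_manual <- ifelse(abs(e) <= k, 1, k / abs(e))    # weight function defined above
w_psi    <- psi.huber(e, k = k)                   # the same weights via MASS
cbind(e, w_manual, w_psi)                         # the two weight columns should agree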
library(MASS)      # provides rlm()
library(faraway)   # provides the cheddar data set
FitAll.rr <- rlm(taste ~ Acetic + H2S + Lactic, data = cheddar)
summary(FitAll.rr)
##
## Call: rlm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar)
## Residuals:
## Min 1Q Median 3Q Max
## -16.163 -5.612 -1.153 5.487 27.106
##
## Coefficients:
## Value Std. Error t value
## (Intercept) -20.7529 20.1001 -1.0325
## Acetic -1.5331 4.5422 -0.3375
## H2S 4.0515 1.2715 3.1864
## Lactic 20.1459 8.7885 2.2923
##
## Residual standard error: 8.471 on 26 degrees of freedom
Regression Equation:
\[ \widehat{\mbox{Taste}} = -20.75 - 1.53\,\mbox{Acetic} + 4.05\,\mbox{H2S} + 20.15\,\mbox{Lactic} \]
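For context (an illustrative comparison that is not part of the output above; the FitAll.ols object is introduced only here), the robust coefficients can be placed next to those of an ordinary least squares fit of the same model:

FitAll.ols <- lm(taste ~ Acetic + H2S + Lactic, data = cheddar)   # OLS fit of the same model
round(cbind(OLS = coef(FitAll.ols), Huber = coef(FitAll.rr)), 3)  # side-by-side coefficients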
From before, we noticed that observations 15, 12 and 8 were influential in the determination of the coefficients. The following table shows the weight given to each observation by the robust regression. We can see that, roughly, as the absolute residual goes down, the weight goes up; in other words, cases with large residuals tend to be down-weighted.
library(dplyr)   # provides %>%
library(knitr)   # provides kable()

hweights <- data.frame(taste  = cheddar$taste,     # observed response
                       resid  = FitAll.rr$resid,   # residual
                       weight = FitAll.rr$w)       # Huber weight
hweights2 <- hweights[order(FitAll.rr$w), ]        # sort by weight, smallest first
hweights2 %>% kable(digits = 2)
obs | taste | resid | weight |
---|---|---|---|
15 | 54.9 | 27.11 | 0.42 |
12 | 57.2 | 17.52 | 0.65 |
8 | 21.9 | -16.16 | 0.70 |
3 | 39.0 | 14.32 | 0.80 |
18 | 6.4 | -13.61 | 0.84 |
28 | 0.7 | -11.41 | 1.00 |
1 | 12.3 | 9.99 | 1.00 |
2 | 20.9 | -1.69 | 1.00 |
4 | 47.9 | 10.65 | 1.00 |
5 | 5.6 | -1.87 | 1.00 |
6 | 25.9 | 2.63 | 1.00 |
7 | 37.3 | 5.74 | 1.00 |
9 | 18.1 | 4.78 | 1.00 |
10 | 21.0 | 1.05 | 1.00 |
11 | 34.9 | 5.72 | 1.00 |
13 | 0.7 | -5.18 | 1.00 |
14 | 25.9 | 8.47 | 1.00 |
16 | 40.9 | -2.49 | 1.00 |
17 | 15.9 | 4.77 | 1.00 |
19 | 18.0 | -11.05 | 1.00 |
20 | 38.9 | -8.82 | 1.00 |
21 | 14.0 | -1.47 | 1.00 |
22 | 15.2 | -3.87 | 1.00 |
23 | 32.0 | -5.34 | 1.00 |
24 | 56.7 | 4.61 | 1.00 |
25 | 16.8 | 4.54 | 1.00 |
26 | 11.6 | -0.84 | 1.00 |
27 | 26.5 | -5.70 | 1.00 |
29 | 13.4 | -5.79 | 1.00 |
30 | 5.5 | -8.86 | 1.00 |
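As a small follow-up sketch (using the hweights2 data frame built above), the down-weighted cases can also be pulled out directly rather than read off the full table:

subset(hweights2, weight < 1)   # only the observations the Huber fit down-weights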
Fitting with bisquare weighting simply requires specifying one additional argument, psi, as in the code below (see the last argument).
FitAll.rr.2 <- rlm(taste ~ Acetic + H2S + Lactic,
                   data = cheddar,
                   psi = psi.bisquare)   # Tukey bisquare psi instead of the default Huber
summary(FitAll.rr.2)
##
## Call: rlm(formula = taste ~ Acetic + H2S + Lactic, data = cheddar,
## psi = psi.bisquare)
## Residuals:
## Min 1Q Median 3Q Max
## -15.7034 -5.1552 -0.9793 5.6933 27.7661
##
## Coefficients:
## Value Std. Error t value
## (Intercept) -17.7730 20.7031 -0.8585
## Acetic -2.2650 4.6784 -0.4841
## H2S 4.0569 1.3096 3.0977
## Lactic 20.6885 9.0522 2.2855
##
## Residual standard error: 7.878 on 26 degrees of freedom
Regression Equation:
\[ \widehat{\mbox{Taste}} = -17.77 - 2.26\,\mbox{Acetic} + 4.06\,\mbox{H2S} + 20.69\,\mbox{Lactic} \]
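For reference (an illustrative sketch, not output from the analysis above), the bisquare weight function can be written out explicitly and checked against psi.bisquare from MASS, whose default tuning constant is c = 4.685:

library(MASS)

cc <- 4.685
e  <- c(-6, -3, -1, 0, 1, 2.5, 5)                        # hypothetical scaled residuals
w_manual <- ifelse(abs(e) <= cc, (1 - (e / cc)^2)^2, 0)  # bisquare weight function
w_psi    <- psi.bisquare(e, c = cc)                      # the same weights via MASS
cbind(e, w_manual, w_psi)                                # the two weight columns should agree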
Weights using the bisquare estimator:
hweights <- data.frame(taste  = cheddar$taste,       # observed response
                       resid  = FitAll.rr.2$resid,   # residual
                       weight = FitAll.rr.2$w)       # bisquare weight
hweights2 <- hweights[order(FitAll.rr.2$w), ]        # sort by weight, smallest first
hweights2 %>% kable(digits = 2)
obs | taste | resid | weight |
---|---|---|---|
15 | 54.9 | 27.77 | 0.19 |
12 | 57.2 | 18.18 | 0.57 |
8 | 21.9 | -15.70 | 0.67 |
3 | 39.0 | 14.38 | 0.72 |
18 | 6.4 | -13.46 | 0.75 |
28 | 0.7 | -11.19 | 0.82 |
19 | 18.0 | -11.11 | 0.83 |
4 | 47.9 | 10.86 | 0.83 |
1 | 12.3 | 9.85 | 0.86 |
20 | 38.9 | -8.95 | 0.89 |
14 | 25.9 | 8.59 | 0.89 |
30 | 5.5 | -8.02 | 0.91 |
7 | 37.3 | 6.33 | 0.94 |
11 | 34.9 | 6.00 | 0.95 |
13 | 0.7 | -5.47 | 0.96 |
23 | 32.0 | -5.16 | 0.96 |
29 | 13.4 | -5.15 | 0.96 |
27 | 26.5 | -4.93 | 0.96 |
24 | 56.7 | 4.77 | 0.97 |
25 | 16.8 | 4.76 | 0.97 |
9 | 18.1 | 4.66 | 0.97 |
17 | 15.9 | 4.65 | 0.97 |
22 | 15.2 | -3.72 | 0.98 |
6 | 25.9 | 3.19 | 0.99 |
5 | 5.6 | -1.99 | 0.99 |
16 | 40.9 | -1.81 | 1.00 |
21 | 14.0 | -1.76 | 1.00 |
2 | 20.9 | -1.75 | 1.00 |
10 | 21.0 | 1.02 | 1.00 |
26 | 11.6 | -0.20 | 1.00 |
We can see that the weights given to some observations are dramatically lower with the bisquare weighting function than with the Huber weighting function, and the coefficient estimates from the two weighting methods differ.
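One convenient way to see both effects (an illustrative sketch using the two fits built above) is to tabulate the coefficients and the weights of the influential cases side by side:

# Coefficient estimates from the Huber and bisquare fits
round(cbind(Huber = coef(FitAll.rr), Bisquare = coef(FitAll.rr.2)), 3)

# Weights the two methods give to the influential cases 15, 12 and 8
round(cbind(Huber    = FitAll.rr$w[c(15, 12, 8)],
            Bisquare = FitAll.rr.2$w[c(15, 12, 8)]), 2)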
When the results of a regular OLS regression and a robust regression differ substantially, you will most likely want to use the results from the robust regression.
Large differences suggest that the model parameters are being highly influenced by outliers. Different functions have advantages and drawbacks.
Huber weights can have difficulties with severe outliers, and bisquare weights can have difficulties converging or may yield multiple solutions.