We will use the following example model for demonstration purposes. The data come from the built-in mtcars data set.
library(car)
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)
In a linear regression model, the leverage score for the \(i\)-th data unit is defined as
\[ h_{ii} = (H)_{ii}, \]
the \(i\)-th diagonal element of the hat matrix
\[ H = X(X^{T} X)^{-1}X^{T}. \]
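As a sanity check, the hat matrix can be built directly from the model matrix of fit, and its diagonal matches the leverages returned by hatvalues() (a minimal sketch using base R):
X <- model.matrix(fit)                  # design matrix, including the intercept column
H <- X %*% solve(t(X) %*% X) %*% t(X)   # H = X (X'X)^{-1} X'
all.equal(unname(diag(H)), unname(hatvalues(fit)))  # the diagonal gives the leverages h_ii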
The leverage score is also known as the observation self-sensitivity or self-influence.
The leverage of an observation is based on how much the observation’s value on the predictor variable differs from the mean of the predictor variable. The greater an observation’s leverage, the more potential it has to be an influential observation.
For example, an observation with a value equal to the mean on the predictor variable has no influence on the slope of the regression line regardless of its value on the criterion variable. On the other hand, an observation that is extreme on the predictor variable has the potential to affect the slope greatly.
Generally, a point with leverage greater than \((2k+2)/n\) should be carefully examined, where k is the number of predictor variables and n is the number of observations.
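For the model above this cutoff is easy to compute from the fitted object (a sketch; here k = 4 predictors and n = 32 observations):
k <- length(coef(fit)) - 1                # number of predictor variables (excluding the intercept)
n <- nrow(mtcars)                         # number of observations
cutoff <- (2 * k + 2) / n                 # rule-of-thumb threshold
hatvalues(fit)[hatvalues(fit) > cutoff]   # observations worth a closer look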
Leverage points do not necessarily have a large effect on the outcome of fitting regression models.
Leverage points are observations made at extreme or outlying values of the independent variables, where the lack of neighboring observations means that the fitted regression model will pass close to that particular observation.
Because points farther out on X have more leverage, they also tend to lie closer to the regression line (or, more accurately, the regression line is fit so as to pass closer to them) than points near \(\bar{x}\). In other words, the residual standard deviation can differ at different points on X even if the error standard deviation is constant. To correct for this, residuals are often standardized so that they have constant variance (assuming, of course, that the underlying data-generating process is homoscedastic).
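In R these standardized (internally studentized) residuals are returned by rstandard(), which divides each raw residual by \(\hat{\sigma}\sqrt{1 - h_{ii}}\); a short sketch:
head(rstandard(fit))                     # standardized residuals: e_i / (sigma_hat * sqrt(1 - h_ii))
sigma_hat <- summary(fit)$sigma          # residual standard error
head(residuals(fit) / (sigma_hat * sqrt(1 - hatvalues(fit))))   # same quantity computed by hand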
For a simple regression with a single predictor, the leverage can even be computed by hand. The first step is to standardize the predictor variable so that it has a mean of 0 and a standard deviation of 1.
Then the leverage (h) is obtained by squaring the observation's value on the standardized predictor, adding 1, and dividing by the number of observations.
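A minimal sketch of that recipe, using a hypothetical one-predictor model (mpg ~ wt) introduced just for this illustration; the standard deviation is computed with an n divisor so the identity \(h = (1 + z^2)/n\) holds exactly:
fit1 <- lm(mpg ~ wt, data = mtcars)               # single-predictor model for illustration
x <- mtcars$wt
z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))  # standardize with an n divisor
h_manual <- (1 + z^2) / length(x)                 # the recipe described above
all.equal(unname(h_manual), unname(hatvalues(fit1)))  # matches the leverages exactly
For the multi-predictor model fit, the leverages are reported in the .hat column of broom::augment():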
broom::augment(fit)
## # A tibble: 32 x 12
##    .rownames    mpg  disp    hp    wt  drat .fitted .resid   .hat .sigma .cooksd
##    <chr>      <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
##  1 Mazda RX4   21    160    110  2.62  3.9     23.7 -2.71  0.0460   2.60 1.10e-2
##  2 Mazda RX4~  21    160    110  2.88  3.9     22.8 -1.82  0.0499   2.63 5.43e-3
##  3 Datsun 710  22.8  108     93  2.32  3.85    25.1 -2.26  0.0674   2.61 1.17e-2
##  4 Hornet 4 ~  21.4  258    110  3.22  3.08    20.6  0.835 0.123    2.65 3.30e-3
##  5 Hornet Sp~  18.7  360    175  3.44  3.15    18.0  0.666 0.172    2.65 3.27e-3
##  6 Valiant     18.1  225    105  3.46  2.76    19.2 -1.10  0.201    2.64 1.12e-2
##  7 Duster 360  14.3  360    245  3.57  3.21    15.3 -0.953 0.145    2.64 5.30e-3
##  8 Merc 240D   24.4  147.    62  3.19  3.69    23.0  1.42  0.126    2.63 9.93e-3
##  9 Merc 230    22.8  141.    95  3.15  3.92    22.4  0.449 0.107    2.65 7.98e-4
## 10 Merc 280    19.2  168.   123  3.44  3.92    20.5 -1.27  0.129    2.64 8.08e-3
## # ... with 22 more rows, and 1 more variable: .std.resid <dbl>
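Because .hat holds the leverages, the (2k+2)/n rule of thumb from above can be applied directly to the augmented data (a sketch using dplyr, which is assumed to be installed):
library(dplyr)
broom::augment(fit) %>%
  filter(.hat > (2 * 4 + 2) / nrow(mtcars)) %>%   # k = 4 predictors, n = 32
  select(.rownames, .hat, .std.resid, .cooksd)    # keep the diagnostic columns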
The leveragePlots() function in the {car} R package displays a generalization of added-variable plots to multiple-df terms in a linear model, one plot per term.
# leverage plots
leveragePlots(fit, layout = c(2, 2))
The hat matrix, H, sometimes also called the influence matrix or projection matrix, maps the vector of observed values to the vector of fitted (predicted) values. It describes the influence each observed value has on each fitted value.
The diagonal elements of the hat matrix are the leverages, which describe the influence each observed value has on the fitted value for that same observation.
If the vector of observed values is denoted by \(y\) and the vector of fitted values by \(\hat{y}\), then
\[ \hat{y} = H y. \]
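This identity is easy to verify numerically for the model fitted above (a minimal sketch; H is rebuilt from the model matrix as before):
X <- model.matrix(fit)
H <- X %*% solve(t(X) %*% X) %*% t(X)
all.equal(as.vector(H %*% mtcars$mpg), unname(fitted(fit)))  # H maps y onto the fitted values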