Cook’s Distance

Using Cook’s Distance

Cook’s Distance Formula

Cook’s D measures the influence of a single observation on a fitted regression. It is proportional to the sum of the squared differences between the predictions made with all observations in the analysis and the predictions made with the observation in question left out.

It is calculated as: \[D_i = \frac{ \sum_{j=1}^n (\hat Y_j\ - \hat Y_{j(i)})^2 }{p \ \mathrm{MSE}}, \]

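where \(\hat Y_j\) is the prediction for observation \(j\) from the model fitted to all \(n\) observations, \(\hat Y_{j(i)}\) is the prediction for observation \(j\) from the model refitted with observation \(i\) omitted, \(p\) is the number of fitted parameters (including the intercept), and \(\mathrm{MSE}\) is the mean squared error of the full model.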
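The definition can be checked by brute force. The following is a minimal sketch, assuming the same `mtcars` model that appears in the R Code section; the names `d_manual`, `fit_i`, and `yhat_i` are illustrative only. It refits the model once per omitted observation, so it is meant as an aid to understanding rather than as a practical way to compute the statistic.

```r
# Full model and the quantities that appear in the denominator
fit  <- lm(mpg ~ cyl + wt, data = mtcars)
p    <- length(coef(fit))                       # fitted parameters, intercept included
mse  <- sum(residuals(fit)^2) / df.residual(fit)
yhat <- fitted(fit)

# Leave each observation out in turn and compare predictions for all rows
d_manual <- sapply(seq_len(nrow(mtcars)), function(i) {
  fit_i  <- lm(mpg ~ cyl + wt, data = mtcars[-i, ])  # refit without row i
  yhat_i <- predict(fit_i, newdata = mtcars)         # predict every row, including i
  sum((yhat - yhat_i)^2) / (p * mse)
})
names(d_manual) <- rownames(mtcars)

# Should agree with the built-in function up to numerical tolerance
all.equal(unname(d_manual), unname(cooks.distance(fit)))
```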

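For ordinary least squares regression, the following expressions are algebraically equivalent: \[D_i = \frac{e_i^2}{p \, \mathrm{MSE}}\left[\frac{h_{ii}}{(1-h_{ii})^2}\right], \]

\[ D_i = \frac{ (\hat \beta - \hat {\beta}^{(-i)})^T(X^TX)(\hat \beta - \hat {\beta}^{(-i)}) } {p \, \mathrm{MSE}}, \]

where \(e_i\) is the residual of observation \(i\), \(h_{ii}\) is its leverage (the \(i\)-th diagonal element of the hat matrix \(H = X(X^TX)^{-1}X^T\)), \(X\) is the design matrix, \(\hat \beta\) is the coefficient vector estimated from all observations, and \(\hat {\beta}^{(-i)}\) is the coefficient vector estimated with observation \(i\) omitted, with \(p\) and \(\mathrm{MSE}\) as in the first formula.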
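Both closed forms can be evaluated without refitting the model. The sketch below assumes the `mtcars` model from the R Code section and uses the base R helpers `hatvalues()`, `dfbeta()`, and `model.matrix()`; the variable names `d_leverage` and `d_coef` are illustrative only.

```r
fit <- lm(mpg ~ cyl + wt, data = mtcars)
e   <- residuals(fit)                 # e_i
h   <- hatvalues(fit)                 # h_ii, diagonal of the hat matrix
p   <- length(coef(fit))              # fitted parameters, intercept included
mse <- sum(e^2) / df.residual(fit)

# Residual-and-leverage form
d_leverage <- e^2 / (p * mse) * h / (1 - h)^2

# Coefficient-change form: each row of dfbeta() is the change in the
# coefficient vector caused by deleting that observation
X      <- model.matrix(fit)
db     <- dfbeta(fit)
d_coef <- rowSums((db %*% crossprod(X)) * db) / (p * mse)

# Both should agree with cooks.distance() up to numerical tolerance
all.equal(unname(d_leverage), unname(cooks.distance(fit)))
all.equal(unname(d_coef),     unname(cooks.distance(fit)))
```

The leverage form makes the two ingredients of influence visible: an observation needs both a sizeable residual and a sizeable leverage to obtain a large Cook’s D.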

R Code

R code for computing Cook’s Distance

fit <- lm(mpg ~ cyl + wt, data = mtcars)
cooks.distance(fit)
##           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
##        0.0050772590        0.0004442585        0.0567764620        0.0018029260 
##   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
##        0.0235271472        0.0050205614        0.0178733213        0.0091033181 
##            Merc 230            Merc 280           Merc 280C          Merc 450SE 
##        0.0065061176        0.0004643600        0.0075293380        0.0116847953 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
##        0.0102875723        0.0005228914        0.0035498738        0.0001501537 
##   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
##        0.3189363624        0.1592990291        0.0276449872        0.2233281268 
##       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
##        0.0913548207        0.0040263378        0.0120218543        0.0165559199 
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
##        0.0569730451        0.0001790454        0.0033281614        0.0216355209 
##      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
##        0.0237336584        0.0105550987        0.0072685192        0.0727399065
cooks.distance(fit)[which.max(cooks.distance(fit))]
## Chrysler Imperial 
##         0.3189364

Plots

plot(fit,which=4)

plot(cooks.distance(fit),type="b",pch=18,col="red")

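N <- nrow(mtcars)          # number of observations (32)
k <- 2                     # number of predictors (cyl and wt)
cutoff <- 4 / (N - k - 1)  # rule-of-thumb threshold
abline(h = cutoff, lty = 2)  # dashed horizontal line at the cutoff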

The broom::augment() function

library(broom)
augment(fit) 
## # A tibble: 32 x 10
##    .rownames     mpg   cyl    wt .fitted .resid   .hat .sigma .cooksd .std.resid
##    <chr>       <dbl> <dbl> <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>      <dbl>
##  1 Mazda RX4    21       6  2.62    22.3 -1.28  0.0548   2.60 5.08e-3     -0.512
##  2 Mazda RX4 ~  21       6  2.88    21.5 -0.465 0.0376   2.61 4.44e-4     -0.185
##  3 Datsun 710   22.8     4  2.32    26.3 -3.45  0.0798   2.52 5.68e-2     -1.40 
##  4 Hornet 4 D~  21.4     6  3.22    20.4  1.02  0.0321   2.61 1.80e-3      0.404
##  5 Hornet Spo~  18.7     8  3.44    16.6  2.05  0.0912   2.58 2.35e-2      0.839
##  6 Valiant      18.1     6  3.46    19.6 -1.50  0.0407   2.60 5.02e-3     -0.596
##  7 Duster 360   14.3     8  3.57    16.2 -1.93  0.0801   2.59 1.79e-2     -0.785
##  8 Merc 240D    24.4     4  3.19    23.5  0.924 0.152    2.61 9.10e-3      0.391
##  9 Merc 230     22.8     4  3.15    23.6 -0.804 0.146    2.61 6.51e-3     -0.339
## 10 Merc 280     19.2     6  3.44    19.7 -0.463 0.0396   2.61 4.64e-4     -0.184
## # ... with 22 more rows

Interpretation

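A common rule of thumb is that an observation with a Cook’s D above 1.0 has too much influence. A stricter cutoff, \(4/(N-k-1)\) with \(N\) observations and \(k\) predictors, is the one drawn as the dashed line in the plot above. As with all rules of thumb, either cutoff should be applied judiciously rather than mechanically.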
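As an illustration, the rule of thumb can be applied to the `mtcars` fit used earlier. This is a small sketch, not a recommended procedure; the names `d`, `over_one`, and `over_cutoff` are illustrative, and the second threshold is the \(4/(N-k-1)\) cutoff from the Plots section.

```r
fit <- lm(mpg ~ cyl + wt, data = mtcars)
d   <- cooks.distance(fit)

n <- nobs(fit)                  # number of observations
k <- length(coef(fit)) - 1      # number of predictors, intercept excluded
cutoff <- 4 / (n - k - 1)

over_one    <- d[d > 1]                               # rule of thumb: D > 1
over_cutoff <- sort(d[d > cutoff], decreasing = TRUE) # stricter 4/(n-k-1) rule

over_one
over_cutoff
```

With the values printed in the R Code section, no observation exceeds 1.0, while Chrysler Imperial, Toyota Corolla, and Fiat 128 exceed the stricter cutoff of 4/29 (about 0.138). Flagged observations are candidates for closer inspection, not automatic deletion.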

Cook’s Distance in relation to other measures

References