Measures of Influence

The influence of an observation on a regression fit can be measured by the difference between a coefficient estimated from all observations and the same coefficient estimated with that observation deleted.

Linear Model using “mtcars”

# Assume that we are fitting a multiple linear regression
# on the MTCARS data
library(car)
fit <- lm(mpg~disp+hp+wt+drat, data=mtcars)

DFBETA

The DFBETA statistic for measuring the influence of the \(i\)-th observation is defined as the one-step approximation to the difference between the MLE of the regression parameter vector computed from all observations and the MLE computed without the \(i\)-th observation. The one-step approximation assumes a Fisher scoring step; for a linear model fitted by least squares, as here, the difference is exact. It is given by

\[ \begin{aligned} DFBETA_{i} &= \hat{\beta} - \hat{\beta}_{(i)} \\ &= B(Y-Y_{\bar{i}}) \end{aligned} \]

Computing DFBETAs

### First Five Cases
head(dfbeta(fit), 5)

##                   (Intercept)          disp            hp          wt
## Mazda RX4         -0.02267771 -0.0002647051  0.0005049280  0.04086788
## Mazda RX4 Wag      0.22158639  0.0003780904  0.0002551938 -0.05647338
## Datsun 710        -0.60700110  0.0007284006 -0.0001415078  0.02924580
## Hornet 4 Drive     0.53341019  0.0005131331 -0.0007342495 -0.05901785
## Hornet Sportabout  0.26289141  0.0010766448 -0.0005258034 -0.10018641
##                          drat
## Mazda RX4         -0.05856277
## Mazda RX4 Wag     -0.06243923
## Datsun 710         0.08059859
## Hornet 4 Drive    -0.09021621
## Hornet Sportabout -0.02411619

## using {car}
dfbetaPlots(fit, pch=18,col = "red")
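The definition of DFBETA can be checked directly by refitting the model without one observation. The sketch below is illustrative only: the row index `i` and the helper names `fit_minus_i`, `manual` and `closed` are not part of the original analysis. It also compares the result with the standard closed form for least squares, \((X^\top X)^{-1} x_i e_i / (1 - h_{ii})\), which is assumed here rather than taken from the text above.

```r
# Same model as in the text, refitted so the block is self-contained
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

i <- 1  # hypothetical choice: first row (Mazda RX4); any row index works
fit_minus_i <- lm(mpg ~ disp + hp + wt + drat, data = mtcars[-i, ])

# DFBETA by definition: full-data estimate minus leave-one-out estimate
manual <- coef(fit) - coef(fit_minus_i)

# Closed form for least squares, using leverage h_ii and residual e_i
X <- model.matrix(fit)
h <- hatvalues(fit)
e <- residuals(fit)
closed <- drop(solve(crossprod(X)) %*% X[i, ]) * e[i] / (1 - h[i])

# Both should agree with row i of dfbeta(fit), up to numerical tolerance
all.equal(manual, dfbeta(fit)[i, ])
all.equal(closed, dfbeta(fit)[i, ])
```

Refitting once per observation is only practical for small data sets; `dfbeta()` obtains every row from a single fit.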

DFFITS

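DFFITS is a statistical measure designed to show how influential an observation is in a statistical model. It is the change in the fitted value for observation \(i\) when that observation is deleted, scaled by an estimate of the standard error of the fit, and it is closely related to the studentized residual.

\[ DFFITS_i = \frac{\widehat{y}_i - \widehat{y}_{i(i)}}{s_{(i)} \sqrt{h_{ii}}} \]

Here \(\widehat{y}_{i(i)}\) is the prediction for observation \(i\) from the model fitted without it, \(s_{(i)}\) is the residual standard error of that fit, and \(h_{ii}\) is the leverage of observation \(i\).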

Computing DFFITS

dffits(fit)

##           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
##         -0.23494875         -0.16323630         -0.24088916          0.12636107 
##   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
##          0.12568741         -0.23301630         -0.16025758          0.22002691 
##            Merc 230            Merc 280           Merc 280C          Merc 450SE 
##          0.06202448         -0.19830768         -0.42431940          0.17440746 
##          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
##          0.10901061         -0.12335605         -0.07496510          0.12768870 
##   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
##          1.24884874          0.75755502         -0.10190373          0.85512599 
##       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
##         -0.29698092         -0.43453466         -0.45028635         -0.31083304 
##    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
##          0.53995072         -0.04246382         -0.12803854          0.78669131 
##      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
##         -0.70284716         -0.12598452          1.42878611         -0.25396193
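The link between DFFITS and the studentized residual can be made concrete. The sketch below assumes the usual identity \(DFFITS_i = t_i \sqrt{h_{ii} / (1 - h_{ii})}\), where \(t_i\) is the externally studentized residual, and then flags cases with the common rule-of-thumb cutoff \(2\sqrt{p/n}\). The cutoff is a convention brought in for illustration, not something stated above, and the names `t_ext`, `manual`, `cutoff` and `flagged` are hypothetical.

```r
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

h     <- hatvalues(fit)  # leverages h_ii
t_ext <- rstudent(fit)   # externally studentized residuals

# DFFITS rebuilt from the studentized residual and the leverage
manual <- t_ext * sqrt(h / (1 - h))
all.equal(manual, dffits(fit))

# Rule-of-thumb cutoff (an assumption, not from the text): 2 * sqrt(p / n)
p <- length(coef(fit))  # number of coefficients, including the intercept
n <- nobs(fit)          # number of observations
cutoff <- 2 * sqrt(p / n)

# Names of the cases whose |DFFITS| exceeds the cutoff
flagged <- names(which(abs(dffits(fit)) > cutoff))
flagged
```

With 5 coefficients and 32 cars the cutoff is about 0.79, so in the output above Chrysler Imperial, Toyota Corolla and Maserati Bora exceed it, while Lotus Europa falls just below.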

PRESS

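The prediction residual sum of squares (PRESS) is the sum of squared leave-one-out prediction errors: each observation is predicted from a model fitted without it, and the squared errors are added up. When fitting linear models, PRESS can be used as a criterion for model selection, with smaller values indicating better predictive fit.

\[ PRESS = \sum_{i=1}^{n} \left(y_i - \widehat{y}_{i(i)}\right)^2 \]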

Computing PRESS

For linear models, rstandard(fit, type = "predictive") provides leave-one-out cross-validation residuals, and the PRESS statistic (PREdictive Sum of Squares, the same as the CV score) of the model fit is

PRESS <- sum(rstandard(fit, type="pred")^2)
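To see what the one-liner computes, it may help to compare it with an explicit leave-one-out loop. The sketch below is illustrative: it refits the model once per row and also uses the least-squares shortcut \(e_i / (1 - h_{ii})\) for the leave-one-out residual. The names `press_loop`, `press_shortcut` and `press_builtin` are hypothetical.

```r
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

# Brute force: drop row i, refit, and predict the held-out observation
loo_error <- sapply(seq_len(nrow(mtcars)), function(i) {
  fit_i <- lm(mpg ~ disp + hp + wt + drat, data = mtcars[-i, ])
  mtcars$mpg[i] - predict(fit_i, newdata = mtcars[i, ])
})
press_loop <- sum(loo_error^2)

# Shortcut: ordinary residual divided by (1 - leverage), no refitting needed
press_shortcut <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Built-in route used in the text
press_builtin <- sum(rstandard(fit, type = "predictive")^2)

# All three should agree up to numerical tolerance
all.equal(press_loop, press_shortcut)
all.equal(press_loop, press_builtin)
```

The loop makes the cross-validation interpretation explicit, while the shortcut and `rstandard()` avoid the n refits.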