The influence of an observation on a regression fit can be measured by the difference between the regression coefficients estimated from all observations and the coefficients estimated when that observation is deleted.
# Assume that we are fitting a multiple linear regression
# on the MTCARS data
library(car)
fit <- lm(mpg~disp+hp+wt+drat, data=mtcars)
The DFBETA statistic for measuring the influence of the \(i\)-th observation is defined as the one-step approximation to the difference between the MLE of the regression parameter vector and the MLE of the regression parameter vector computed without the \(i\)-th observation. This one-step approximation assumes a Fisher scoring step (for an ordinary least-squares fit such as the one above, the step is exact), and is given by
\[ \begin{aligned} DFBETA_{i} &= \hat{\beta} - \hat{\beta}_{(i)} \\ &= B(Y-Y_{\bar{i}}) \end{aligned} \]
### First Five Cases
head(dfbeta(fit), 5)
## (Intercept) disp hp wt
## Mazda RX4 -0.02267771 -0.0002647051 0.0005049280 0.04086788
## Mazda RX4 Wag 0.22158639 0.0003780904 0.0002551938 -0.05647338
## Datsun 710 -0.60700110 0.0007284006 -0.0001415078 0.02924580
## Hornet 4 Drive 0.53341019 0.0005131331 -0.0007342495 -0.05901785
## Hornet Sportabout 0.26289141 0.0010766448 -0.0005258034 -0.10018641
## drat
## Mazda RX4 -0.05856277
## Mazda RX4 Wag -0.06243923
## Datsun 710 0.08059859
## Hornet 4 Drive -0.09021621
## Hornet Sportabout -0.02411619
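As a sanity check on the definition, the first row of the output above can be reproduced by refitting the model without that observation and subtracting the coefficients. This is a minimal sketch, assuming the same `mtcars` model as above; the names `i`, `fit_minus_i` and `manual` are introduced here for illustration only.

```r
# Refit so the block is self-contained (same model as above).
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

i <- 1  # first row of mtcars, "Mazda RX4"

# Fit the same formula with observation i deleted.
fit_minus_i <- lm(formula(fit), data = mtcars[-i, ])

# Difference between full-data and leave-one-out coefficients.
manual <- coef(fit) - coef(fit_minus_i)

# Put the manual difference next to the corresponding dfbeta() row.
rbind(manual = manual, dfbeta = dfbeta(fit)[i, ])

# The two rows should agree up to floating-point error.
all.equal(manual, dfbeta(fit)[i, ])
```

For a linear model the two rows should coincide, so `dfbeta()` saves one refit per observation rather than approximating it.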
## using {car}
dfbetaPlots(fit, pch=18,col = "red")
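DFFITS is a statistical measure designed to show how influential an observation is in a statistical model. It is closely related to the studentized residual.
\[ DFFITS_i = \frac{\widehat{y}_i - \widehat{y}_{i(i)}}{s_{(i)} \sqrt{h_{ii}}} \]
Here \(\widehat{y}_{i(i)}\) is the prediction for observation \(i\) from the model fitted without it, \(s_{(i)}\) is the residual standard error of that fit, and \(h_{ii}\) is the leverage of observation \(i\).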
dffits(fit)
## Mazda RX4 Mazda RX4 Wag Datsun 710 Hornet 4 Drive
## -0.23494875 -0.16323630 -0.24088916 0.12636107
## Hornet Sportabout Valiant Duster 360 Merc 240D
## 0.12568741 -0.23301630 -0.16025758 0.22002691
## Merc 230 Merc 280 Merc 280C Merc 450SE
## 0.06202448 -0.19830768 -0.42431940 0.17440746
## Merc 450SL Merc 450SLC Cadillac Fleetwood Lincoln Continental
## 0.10901061 -0.12335605 -0.07496510 0.12768870
## Chrysler Imperial Fiat 128 Honda Civic Toyota Corolla
## 1.24884874 0.75755502 -0.10190373 0.85512599
## Toyota Corona Dodge Challenger AMC Javelin Camaro Z28
## -0.29698092 -0.43453466 -0.45028635 -0.31083304
## Pontiac Firebird Fiat X1-9 Porsche 914-2 Lotus Europa
## 0.53995072 -0.04246382 -0.12803854 0.78669131
## Ford Pantera L Ferrari Dino Maserati Bora Volvo 142E
## -0.70284716 -0.12598452 1.42878611 -0.25396193
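The link to the studentized residual can be made explicit: DFFITS is the externally studentized residual multiplied by \(\sqrt{h_{ii}/(1-h_{ii})}\). The sketch below checks this against `dffits()` for the same `mtcars` model. The cutoff \(2\sqrt{p/n}\) is a commonly quoted rule of thumb rather than something taken from the text above, and the names `h`, `t_ext`, `dffits_manual` and `cutoff` are illustrative.

```r
# Refit so the block is self-contained (same model as above).
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

h     <- hatvalues(fit)  # leverages h_ii
t_ext <- rstudent(fit)   # externally studentized residuals

# DFFITS rebuilt from the studentized residual and the leverage.
dffits_manual <- t_ext * sqrt(h / (1 - h))
all.equal(dffits_manual, dffits(fit))

# Assumed rule-of-thumb cutoff: 2 * sqrt(p / n),
# with p coefficients (including the intercept) and n observations.
p <- length(coef(fit))
n <- nobs(fit)
cutoff <- 2 * sqrt(p / n)

# Names of observations whose |DFFITS| exceeds the cutoff.
names(which(abs(dffits(fit)) > cutoff))
```

Any such cutoff is only a screening device; flagged cases still need to be inspected individually.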
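The prediction residual sum of squares (PRESS) is built from the same leave-one-out calculation: it is the sum of squared differences between each observed response and its prediction from the model fitted without that observation. When fitting linear models, PRESS can be used as a criterion for model selection, with smaller values indicating better model fits.
\[ PRESS = \sum_{i=1}^{n} \left(y_i - \widehat{y}_{i(i)}\right)^2 \]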
For linear models, rstandard(fit, type = "predictive") provides leave-one-out cross-validation residuals, and the PRESS statistic (PREdictive Sum of Squares, the same as the CV score) of the model fit is
PRESS <- sum(rstandard(fit, type="pred")^2)
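To show what the one-liner above is doing, the sketch below computes PRESS in two further ways for the same `mtcars` model: with the shortcut \(e_i / (1 - h_{ii})\) for the leave-one-out residual, and by brute force with one refit per observation. The names `press_shortcut`, `loo_resid` and `press_loop` are illustrative.

```r
# Refit so the block is self-contained (same model as above).
fit <- lm(mpg ~ disp + hp + wt + drat, data = mtcars)

# Shortcut: leave-one-out residual = ordinary residual / (1 - leverage).
press_shortcut <- sum((residuals(fit) / (1 - hatvalues(fit)))^2)

# Brute force: refit without observation i, then predict observation i.
loo_resid <- sapply(seq_len(nrow(mtcars)), function(i) {
  fit_i <- lm(formula(fit), data = mtcars[-i, ])
  mtcars$mpg[i] - predict(fit_i, newdata = mtcars[i, ])
})
press_loop <- sum(loo_resid^2)

# All three routes should agree up to floating-point error.
all.equal(press_shortcut, press_loop)
all.equal(press_shortcut, sum(rstandard(fit, type = "predictive")^2))
```

The shortcut needs only the single full-data fit, which is why PRESS is cheap to compute for linear models.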