Regression Deletion Diagnostics
This suite of functions can be used to compute some of the regression (leave-one-out deletion) diagnostics for linear and generalized linear models discussed in Belsley, Kuh and Welsch (1980), Cook and Weisberg (1982), etc.
Details
The primary high-level function is influence.measures(), which produces a class "infl" object whose tabular display shows the DFBETAS for each model variable, DFFITS, covariance ratios, Cook's distances and the diagonal elements of the hat matrix. Cases which are influential with respect to any of these measures are marked with an asterisk.
The functions dfbetas(), dffits(), covratio() and cooks.distance() provide direct access to the corresponding diagnostic quantities. The functions rstandard() and rstudent() give the standardized and Studentized residuals respectively. (These functions re-normalize the residuals to have unit variance, using an overall and leave-one-out measure of the error variance respectively.)
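The two residual scalings described above can be checked by hand. The following is a minimal sketch, assuming a hypothetical model `fit` on the built-in `mtcars` data (the data behind Fit_4 are not shown in this section); it only uses functions from the stats package.

```r
# Hypothetical model on a built-in data set (not the document's Fit_4)
fit <- lm(mpg ~ wt + hp, data = mtcars)

e   <- residuals(fit)                               # raw residuals
h   <- hatvalues(fit)                               # leverages h_ii
s   <- sqrt(deviance(fit) / df.residual(fit))       # overall error SD
s_i <- lm.influence(fit, do.coef = FALSE)$sigma     # leave-one-out error SDs

r_std <- e / (s   * sqrt(1 - h))   # standardized residuals (overall SD)
r_stu <- e / (s_i * sqrt(1 - h))   # Studentized residuals (leave-one-out SD)

# Compare the hand-computed values with the built-in functions
all.equal(r_std, rstandard(fit))
all.equal(r_stu, rstudent(fit))
```

Both comparisons should agree, which makes the only difference between the two residual types explicit: the error standard deviation used in the denominator.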
R commands
The optional infl, res and sd arguments are there to encourage the use of these direct access functions in situations where, e.g., the underlying basic influence measures (from lm.influence() or the generic influence()) are already available.
Note that cases with weights == 0 are dropped from all these functions, but that if a linear model has been fitted with na.action = na.exclude, suitable values are filled in for the cases excluded during fitting.
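As an illustration of both points, the sketch below computes the basic influence quantities once and reuses them, then refits with na.action = na.exclude. The model `fit`, the data frame `d` and the choice of `mtcars` are assumptions made for the example, not part of the original analysis.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # hypothetical model, not Fit_4

# Compute the basic influence quantities once ...
infl <- lm.influence(fit, do.coef = FALSE)

# ... and pass them to several diagnostics instead of recomputing each time
r_std <- rstandard(fit, infl = infl)
r_stu <- rstudent(fit, infl = infl)
cook  <- cooks.distance(fit, infl = infl)

# na.exclude: a case dropped during fitting still gets a slot in the result
d <- mtcars
d$mpg[3] <- NA                            # make one response value missing
fit_na <- lm(mpg ~ wt + hp, data = d, na.action = na.exclude)
length(rstandard(fit_na)) == nrow(d)      # result has one entry per row of d
is.na(rstandard(fit_na)[3])               # the excluded case is filled with NA
```

Reusing `infl` matters mostly for large models or when many diagnostics are requested, since each direct access function would otherwise call lm.influence() again.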
Implementation
## S3 method for class 'lm'
# rstandard(Fit_4, infl = lm.influence(Fit_4, do.coef = FALSE),
#           sd = sqrt(deviance(Fit_4)/df.residual(Fit_4)))
# rstandard(Fit_4)
## S3 method for class 'lm'
# rstudent(Fit_4, infl = lm.influence(Fit_4, do.coef = FALSE),
#          res = infl$wt.res)
# rstudent(Fit_4)
dffits(Fit_4)
##           1           2           3           4           5           6
##  0.52971523 -0.05017807  0.36652235  0.32089734 -0.05655190  0.10479628
##           7           8           9          10          11          12
##  0.26450803 -0.56627494  0.15075718  0.02864957  0.12823528  0.72871457
##          13          14          15          16          17          18
## -0.19602704  0.17406599  0.94834090 -0.18018808  0.16248110 -0.38022994
##          19          20          21          22          23          24
## -0.34375517 -0.58567945 -0.02588694 -0.08480492 -0.29968252  0.25877488
##          25          26          27          28          29          30
##  0.11729707 -0.19188145 -0.34863248 -0.34332200 -0.38166182 -0.64420967
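DFFITS measures how far the fitted value for case i moves when that case is deleted, scaled by a leave-one-out standard error. A minimal sketch of that relationship follows, again assuming a hypothetical `fit` on `mtcars` rather than Fit_4.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # hypothetical model, not Fit_4

h   <- hatvalues(fit)     # leverages
t_i <- rstudent(fit)      # Studentized residuals

# Closed form: Studentized residual times sqrt(h / (1 - h))
dffits_manual <- t_i * sqrt(h / (1 - h))
all.equal(dffits_manual, dffits(fit))

# Same quantity for one case, obtained by actually refitting without it
i     <- 1
fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
shift <- fitted(fit)[i] - predict(fit_i, newdata = mtcars[i, ])  # change in fitted value
s_i   <- summary(fit_i)$sigma                                    # leave-one-out error SD
dffits_refit <- shift / (s_i * sqrt(h[i]))
all.equal(unname(dffits_refit), unname(dffits(fit)[i]))
```

The refit version is only for intuition; the closed form avoids fitting n separate models.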
Influential Points in Regression
In regression analysis, a few data points sometimes have a disproportionate effect on the fitted coefficients (for example, the slope of the regression line). The diagnostics below show how to identify such influential points.
DFBETAS
inflm.fit <- influence.measures(Fit_4)
which(apply(inflm.fit$is.inf, 1, any))
## 6 15 24 26
## 6 15 24 26
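The `is.inf` matrix used above is a logical flag per case and per measure. The sketch below rebuilds those flags from `infmat` using the cutoffs as I understand them from the stats package source; treat the exact thresholds as an assumption to verify against your R version. The model `fit` on `mtcars` is hypothetical.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # hypothetical model, not Fit_4
im  <- influence.measures(fit)

m <- im$infmat             # columns: dfb.*, dffit, cov.r, cook.d, hat
n <- sum(m[, "hat"] > 0)   # number of cases with positive leverage
k <- ncol(m) - 4           # number of coefficients (DFBETAS columns)

# Assumed cutoffs, one per group of columns
flags <- cbind(
  abs(m[, 1:k]) > 1,                          # |DFBETAS| > 1
  abs(m[, k + 1]) > 3 * sqrt(k / (n - k)),    # |DFFITS|
  abs(1 - m[, k + 2]) > 3 * k / (n - k),      # |1 - COVRATIO|
  pf(m[, k + 3], k, n - k) > 0.5,             # Cook's distance vs F median
  m[, k + 4] > 3 * k / n                      # hat value
)

all(flags == im$is.inf)          # do the hand-built flags match?
which(apply(im$is.inf, 1, any))  # cases flagged on at least one measure
```

These cutoffs are rules of thumb; a flagged case deserves a closer look, not automatic removal.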
dfbeta(Fit_4)
## (Intercept) Acetic H2S Lactic
## 1 7.07731820 -0.68012625 0.0420550124 -2.19359377
## 2 -0.38764360 0.11032996 0.0202150540 -0.27859860
## 3 1.19605590 -0.47130371 -0.1602239216 1.97108492
## 4 -0.60698006 -0.30825961 0.0168790642 1.77930153
## 5 -0.83595216 0.09838964 -0.0083766644 0.20147832
## 6 0.11759125 0.11243109 0.0837737105 -0.80330668
## 7 -0.07205384 0.23381240 0.2459615860 -1.71692172
## 8 4.83294413 -0.52904953 -0.0674181836 -1.49515621
## 9 1.70869606 -0.29187091 -0.0680016586 0.33250675
## 10 0.05436601 -0.02673581 -0.0238338121 0.18218091
## 11 -0.77410871 0.06359132 -0.0656032228 0.68395577
## 12 -10.08450212 1.56441733 -0.1891603413 2.23346307
## 13 -3.03365018 0.46595867 0.0343966900 0.06437970
## 14 1.32398602 -0.10433473 -0.0288768367 -0.20427672
## 15 -11.11994787 2.79814047 -0.1268840267 -1.78794068
## 16 1.56449261 -0.26498213 -0.1151354613 0.28964504
## 17 2.32664393 -0.33710478 -0.0196510188 -0.11807832
## 18 0.19882307 -0.03715341 0.2918596950 -1.54856780
## 19 -2.78292195 0.93355752 -0.0009873396 -1.90407108
## 20 -3.63470859 1.69175579 -0.2757600708 -3.06238914
## 21 -0.47524232 0.08659421 -0.0120878916 0.03299263
## 22 -0.57516179 0.04292218 0.0079113411 0.10057971
## 23 -2.40345340 0.51056494 -0.3465413439 0.99119107
## 24 0.51352039 -0.49721920 0.1737491648 0.95109008
## 25 -0.23243413 0.15705824 -0.1079103901 0.10333016
## 26 2.37695270 -0.60880791 0.2042806670 -0.25751941
## 27 5.75579661 -1.19050910 0.1533294223 -0.29643604
## 28 0.01256323 -0.42767305 0.2416490573 0.32629777
## 29 0.93560400 -0.78927388 -0.1888564588 2.93631661
## 30 7.93824546 -2.50462227 0.3153678402 2.42252121
## S3 method for class 'lm'
dfbeta(Fit_4,
infl = lm.influence(Fit_4, do.coef = TRUE))
## (Intercept) Acetic H2S Lactic
## 1 7.07731820 -0.68012625 0.0420550124 -2.19359377
## 2 -0.38764360 0.11032996 0.0202150540 -0.27859860
## 3 1.19605590 -0.47130371 -0.1602239216 1.97108492
## 4 -0.60698006 -0.30825961 0.0168790642 1.77930153
## 5 -0.83595216 0.09838964 -0.0083766644 0.20147832
## 6 0.11759125 0.11243109 0.0837737105 -0.80330668
## 7 -0.07205384 0.23381240 0.2459615860 -1.71692172
## 8 4.83294413 -0.52904953 -0.0674181836 -1.49515621
## 9 1.70869606 -0.29187091 -0.0680016586 0.33250675
## 10 0.05436601 -0.02673581 -0.0238338121 0.18218091
## 11 -0.77410871 0.06359132 -0.0656032228 0.68395577
## 12 -10.08450212 1.56441733 -0.1891603413 2.23346307
## 13 -3.03365018 0.46595867 0.0343966900 0.06437970
## 14 1.32398602 -0.10433473 -0.0288768367 -0.20427672
## 15 -11.11994787 2.79814047 -0.1268840267 -1.78794068
## 16 1.56449261 -0.26498213 -0.1151354613 0.28964504
## 17 2.32664393 -0.33710478 -0.0196510188 -0.11807832
## 18 0.19882307 -0.03715341 0.2918596950 -1.54856780
## 19 -2.78292195 0.93355752 -0.0009873396 -1.90407108
## 20 -3.63470859 1.69175579 -0.2757600708 -3.06238914
## 21 -0.47524232 0.08659421 -0.0120878916 0.03299263
## 22 -0.57516179 0.04292218 0.0079113411 0.10057971
## 23 -2.40345340 0.51056494 -0.3465413439 0.99119107
## 24 0.51352039 -0.49721920 0.1737491648 0.95109008
## 25 -0.23243413 0.15705824 -0.1079103901 0.10333016
## 26 2.37695270 -0.60880791 0.2042806670 -0.25751941
## 27 5.75579661 -1.19050910 0.1533294223 -0.29643604
## 28 0.01256323 -0.42767305 0.2416490573 0.32629777
## 29 0.93560400 -0.78927388 -0.1888564588 2.93631661
## 30 7.93824546 -2.50462227 0.3153678402 2.42252121
library(magrittr)  # provides the %>% pipe used below
dfbetas(Fit_4) %>%
  head(4) %>%
  round(2)
## (Intercept) Acetic H2S Lactic
## 1 0.36 -0.15 0.03 -0.26
## 2 -0.02 0.02 0.02 -0.03
## 3 0.06 -0.11 -0.13 0.23
## 4 -0.03 -0.07 0.01 0.21
DFBETAS
## S3 method for class 'lm'
dfbetas(Fit_4,
infl = lm.influence(Fit_4, do.coef = TRUE)) %>%
head(4) %>%
round(2)
## (Intercept) Acetic H2S Lactic
## 1 0.36 -0.15 0.03 -0.26
## 2 -0.02 0.02 0.02 -0.03
## 3 0.06 -0.11 -0.13 0.23
## 4 -0.03 -0.07 0.01 0.21
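The difference between dfbeta() and dfbetas() is only the scaling. The sketch below, for a single case of a hypothetical `fit` on `mtcars`, computes the raw coefficient change by refitting and then standardizes it with the leave-one-out error SD.

```r
fit   <- lm(mpg ~ wt + hp, data = mtcars)        # hypothetical model, not Fit_4
i     <- 1
fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])  # same model without case i

# DFBETA: full-data coefficients minus the coefficients without case i
dfbeta_manual <- coef(fit) - coef(fit_i)
all.equal(dfbeta_manual, dfbeta(fit)[i, ])

# DFBETAS: the same change divided by a leave-one-out standard error
X            <- model.matrix(fit)
xtx_inv_diag <- diag(solve(crossprod(X)))   # diagonal of (X'X)^{-1}
s_i          <- summary(fit_i)$sigma        # error SD without case i
dfbetas_manual <- dfbeta_manual / (s_i * sqrt(xtx_inv_diag))
all.equal(dfbetas_manual, dfbetas(fit)[i, ])
```

Because DFBETAS is unit-free, it can be compared across coefficients, whereas DFBETA is in the units of each coefficient.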
COVRATIOS
covratio(Fit_4,
infl = lm.influence(Fit_4, do.coef = FALSE),
res = weighted.residuals(Fit_4)) %>%
head(4) %>%
round(2)
## 1 2 3 4
## 1.15 1.26 0.90 1.09
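The covariance ratio compares the determinant of the coefficient covariance matrix without case i to the determinant with all cases. A minimal sketch, assuming a hypothetical `fit` on `mtcars`:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # hypothetical model, not Fit_4

p   <- fit$rank                                    # number of coefficients
h   <- hatvalues(fit)                              # leverages
s   <- summary(fit)$sigma                          # overall error SD
s_i <- lm.influence(fit, do.coef = FALSE)$sigma    # leave-one-out error SDs

# Closed form: (s_i / s)^(2p) / (1 - h)
covratio_manual <- (s_i / s)^(2 * p) / (1 - h)
all.equal(covratio_manual, covratio(fit))

# Same quantity for one case as an explicit ratio of determinants
i     <- 1
fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])
det_ratio <- det(vcov(fit_i)) / det(vcov(fit))
all.equal(det_ratio, unname(covratio(fit)[i]))
```

Values well above 1 suggest the case improves the precision of the estimates; values well below 1 suggest it degrades precision.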
Cook's Distance and Hat Values
cooks.distance(Fit_4)
## S3 method for class 'lm'
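cooks.distance(Fit_4,
infl = lm.influence(Fit_4, do.coef = FALSE),
res = weighted.residuals(Fit_4),
sd = sqrt(deviance(Fit_4)/df.residual(Fit_4)),
# infl exists only inside the function, so the default is spelled out here
hat = lm.influence(Fit_4, do.coef = FALSE)$hat)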
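The arguments in the explicit call map directly onto the formula for Cook's distance. The following sketch reproduces it by hand for a hypothetical `fit` on `mtcars`.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # hypothetical model, not Fit_4

p <- fit$rank              # number of coefficients
e <- residuals(fit)        # raw residuals (the res argument)
h <- hatvalues(fit)        # leverages (the hat argument)
s <- summary(fit)$sigma    # overall error SD (the sd argument)

# Cook's distance from residual, leverage and error SD
cook_manual <- (e / (s * (1 - h)))^2 * h / p
all.equal(cook_manual, cooks.distance(fit))

# Equivalent form: squared standardized residual times a leverage factor
cook_alt <- rstandard(fit)^2 * h / ((1 - h) * p)
all.equal(cook_alt, cooks.distance(fit))
```

The second form shows that a case needs both a large standardized residual and a high leverage to obtain a large Cook's distance.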
hatvalues(Fit_4) %>% head(6) %>% round(2)
## 1 2 3 4 5 6
## 0.18 0.08 0.06 0.09 0.13 0.23
## S3 method for class 'lm'
## hatvalues(Fit_4,
## infl = lm.influence(Fit_4, do.coef = FALSE))
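Hat values are the diagonal of the projection matrix that maps observed responses to fitted values. A short sketch, assuming a hypothetical `fit` on `mtcars`, builds that matrix explicitly; this is fine for small data but wasteful for large n, where hatvalues() should be preferred.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # hypothetical model, not Fit_4

X <- model.matrix(fit)                    # design matrix, including intercept
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix X (X'X)^{-1} X'

# The diagonal of H gives the leverages
all.equal(unname(diag(H)), unname(hatvalues(fit)))

# Leverages sum to the number of estimated coefficients
sum(hatvalues(fit))
ncol(X)
```

Since the leverages sum to the number of coefficients, their average is p/n, which is why cutoffs for high leverage are usually stated as multiples of p/n.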