env <- new.env()
load("../data.rda", envir = env)
df <- get("CH10TA01", envir = env)
names(df) <- c("X1", "X2", "Y")
fit <- lm(Y ~ ., df)
cbind(
"Avg Income" = df$X1,
"Risk Aversion" = df$X2,
"Life Insurance" = df$Y)
## Avg Income Risk Aversion Life Insurance
## [1,] 45.01 6 91
## [2,] 57.20 4 162
## [3,] 26.85 5 11
## [4,] 66.29 7 240
## [5,] 40.96 5 73
## [6,] 73.00 10 311
## [7,] 79.38 1 316
## [8,] 52.77 8 154
## [9,] 55.92 6 164
## [10,] 38.12 4 54
## [11,] 35.84 6 53
## [12,] 75.80 9 326
## [13,] 37.41 5 55
## [14,] 54.38 2 130
## [15,] 46.19 7 112
## [16,] 46.13 4 91
## [17,] 30.37 3 14
## [18,] 39.06 5 63
The car package contains avPlots
for this, among other useful diagnostic functions. See below. At this time, Quick-R contains some examples at
A <- resid(lm(Y ~ X2, df))
B <- resid(lm(X1 ~ X2, df))
par(mfrow = c(1, 2), pch = 19)
plot(resid(fit) ~ X1, df, xlab = "X1", ylab = "Residual")
title("(a) Residual Plot against X1")
abline(0, 0, lty = 2)
plot(A ~ B, xlab = "e(X1|X2)", ylab = "e(Y|X2)")
title("(b) Added-Variable Plot for X1")
abline(lm(A ~ B))
abline(0, 0, lty = 2)
df <- get("CH07TA01", envir = env)
names(df) <- c("X1", "X2", "X3", "Y")
fit <- lm(Y ~ X1 + X2, df)
The above manual approach can be extended to this example, but no more pedagogical benefit derives. The reader should be able to match the output to that in the book.
library(car)
avPlots(fit)
residualPlots(fit, fitted = FALSE, quadratic = FALSE, test = FALSE)
See Chapter 5 tutorial on how to do the matrix algebra. This will all be done using the R utilities.
Note, you can use model.matrix(fit)
to obtain the X matrix for calculating the hat matrix and error variance accordingly. For example
X %*% solve(t(X) %*% X) %*% t(X)
fit <- lm(Y ~ X1 + X2, data =
data.frame(
'Y' = c(301, 327, 246, 187),
'X1' = c(14, 19, 12, 11),
'X2' = c(25, 32, 22, 15)))
cbind(
'X1' = model.frame(fit)$X1,
'X2' = model.frame(fit)$X2,
'Y' = model.frame(fit)$Y,
'Yhat' = fitted(fit),
'e' = resid(fit),
'h' = hatvalues(fit),
's' = deviance(fit) * (1 - hatvalues(fit)))
## X1 X2 Y Yhat e h s
## 1 14 25 301 282.2 18.7621 0.3877 352.0155
## 2 19 32 327 332.3 -5.2919 0.9513 28.0039
## 3 12 22 246 260.0 -13.9513 0.6614 194.6384
## 4 11 15 187 186.5 0.4811 0.9996 0.2314
plot(X2 ~ X1, model.frame(fit), pch = 19, xlim = c(10, 20), ylim = c(10, 37),
xlab = "X1", ylab = "X2", main = "FIGURE 10.6")
text(X2 ~ X1, model.frame(fit), labels = round(hatvalues(fit), 4), pos = 3)
points(mean(X2) ~ mean(X1), model.frame(fit), pch = 22)
text(mean(X2) ~ mean(X1), model.frame(fit),
cex = 0.75, pos = 4, labels = "(X1bar, X2bar)")
fit <- lm(Y ~ X1 + X2, df)
cbind(
'e' = round(resid(fit), 3),
'h' = round(hatvalues(fit), 3),
't' = round(rstudent(fit), 3))
## e h t
## 1 -1.683 0.201 -0.730
## 2 3.643 0.059 1.534
## 3 -3.176 0.372 -1.654
## 4 -3.158 0.111 -1.348
## 5 0.000 0.248 0.000
## 6 -0.361 0.129 -0.148
## 7 0.716 0.156 0.298
## 8 4.015 0.096 1.760
## 9 2.655 0.115 1.118
## 10 -2.475 0.110 -1.034
## 11 0.336 0.120 0.137
## 12 2.226 0.109 0.923
## 13 -3.947 0.178 -1.826
## 14 3.447 0.148 1.525
## 15 0.571 0.333 0.267
## 16 0.642 0.095 0.258
## 17 -0.851 0.106 -0.345
## 18 -0.783 0.197 -0.334
## 19 -2.857 0.067 -1.176
## 20 1.040 0.050 0.409
plot(X2 ~ X1, df, type = 'n', xlim = c(13, 33), ylim = c(40, 60),
xlab = "Triceps Skinfold Thickness", ylab = "Thigh Circumference")
text(X2 ~ X1, df, labels = seq(nrow(df)), cex=.75)
Compare with the R function influence.measures(fit)
cbind(
"DFFITS" = round(dffits(fit), 4),
"D" = round(cooks.distance(fit), 4),
"DFBETA0" = round(dfbetas(fit)[,1], 4),
"DFBETA1" = round(dfbetas(fit)[,2], 4),
"DFBETA2" = round(dfbetas(fit)[,3], 4))
## DFFITS D DFBETA0 DFBETA1 DFBETA2
## 1 -0.3661 0.0460 -0.3052 -0.1315 0.2320
## 2 0.3838 0.0455 0.1726 0.1150 -0.1426
## 3 -1.2731 0.4902 -0.8471 -1.1825 1.0669
## 4 -0.4763 0.0722 -0.1016 -0.2935 0.1961
## 5 -0.0001 0.0000 -0.0001 0.0000 0.0001
## 6 -0.0567 0.0011 0.0397 0.0401 -0.0443
## 7 0.1279 0.0058 -0.0775 -0.0156 0.0543
## 8 0.5745 0.0979 0.2614 0.3911 -0.3325
## 9 0.4022 0.0531 -0.1514 -0.2947 0.2469
## 10 -0.3639 0.0440 0.2377 0.2446 -0.2688
## 11 0.0505 0.0009 -0.0090 0.0171 -0.0025
## 12 0.3233 0.0352 -0.1305 0.0225 0.0700
## 13 -0.8508 0.2122 0.1194 0.5924 -0.3895
## 14 0.6355 0.1249 0.4517 0.1132 -0.2977
## 15 0.1889 0.0126 -0.0030 -0.1248 0.0688
## 16 0.0838 0.0025 0.0093 0.0431 -0.0251
## 17 -0.1184 0.0049 0.0795 0.0550 -0.0761
## 18 -0.1655 0.0096 0.1321 0.0753 -0.1161
## 19 -0.3151 0.0324 -0.1296 -0.0041 0.0644
## 20 0.0940 0.0031 0.0102 0.0023 -0.0033
par(mfrow = c(1, 2), pch = 19)
plot(resid(fit) ~ fitted(fit), cex = cooks.distance(fit)*10,
xlim = c(10, 30), ylim = c(-4.5, 4.5),
xlab = "YHAT", ylab = "Residual")
title("(a) Proportional Influence Plot")
plot(seq(nrow(df)), cooks.distance(fit), type = "o", lwd = 2,
xlab = "Case Index Number", ylab = "Cook's Distance D")
title("(b) Index Influence Plot")
R has vif
(car) for this. It takes in an lm object. To get the standardized coefficients we update the model by generically scaling the values. The maximum VIF is clear, and an average can easily be computed. I leave it to the interested student to perform that simple task.
fit <- lm(scale(Y) ~ scale(X1) + scale(X2) + scale(X3), df)
cbind(
"Beta*" = round(as.vector(coef(fit)[-1]), 4),
"VIF" = round(vif(lm(Y ~ ., df)), 2))
## Beta* VIF
## X1 4.264 708.8
## X2 -2.929 564.3
## X3 -1.561 104.6
Note: R > 2.15 has paste0
. No need for sep = ""
anymore.
df <- get("CH09TA01", envir = env)
names(df) <- c(paste("X", seq(8), sep = ""), "Y" , "lnY")
fit <- lm(lnY ~ X1 + X2 + X3 + X8, df)
vif(fit) # For page 412
## X1 X2 X3 X8
## 1.103 1.020 1.049 1.092
The car functions used earlier can be used to explore the unshown information discussed on page 410. This example, however, requires relating added-values from variables not within the model. Thus, it cannot be used with residualPlots
or avPlots
. It may be possible to fit a full model and use these functions with appropriately selected 'terms' parameters, though.
par(mfrow = c(2, 2), pch = 19)
plot(resid(fit) ~ fitted(fit), ylim = c(-0.7, 0.7),
xlab = "Predicted Value", ylab = "Residual")
title("(a) Residual Plot against Predicted")
plot(resid(fit) ~ df$X5, ylim = c(-0.7, 0.7),
xlab = "X5", ylab = "Residual")
title("(b) Residual Plot against X5")
plot(resid(fit) ~ resid(update(fit, X5 ~ .)), ylim = c(-0.7, 0.7),
xlab = "e(X5|X1238)", ylab = "e(Y'|X1238)")
title("(c) Added-Variable Plot for X5")
abline(lm(resid(fit) ~ resid(update(fit, X5 ~ .))) )
qqnorm(resid(fit), xlab = "Expected Value", ylab = "Residual", main = "")
qqline(resid(fit))
title("(d) Normal Probability Plot")
par(mfrow = c(2, 2), pch = 19)
plot(seq(nrow(df)), rstudent(fit), type = "o",
main = "(a) Studentized Deleted Residuals",
xlab = "Case Index", ylab = "t")
plot(seq(nrow(df)), hatvalues(fit), type = "o",
main = "(b) Leverage Values",
xlab = "Case Index", ylab = "h")
plot(seq(nrow(df)), cooks.distance(fit), type = "o",
main = "(c) Cook's Distance",
xlab = "Case Index", ylab = "D")
plot(seq(nrow(df)), dffits(fit), type = "o",
main = "(d) DFFITS values",
xlab = "Case Index", ylab = "DFFITS")
cases <- c(17, 23, 28, 32, 38, 42, 52)
cbind(
'e' = round(resid(fit), 4),
't' = round(rstudent(fit), 4),
'h' = round(hatvalues(fit), 4),
'D' = round(cooks.distance(fit), 4),
'DFFITS' = round(dffits(fit), 4))[cases, ]
## e t h D DFFITS
## 17 0.5952 3.3696 0.1499 0.3306 1.4151
## 23 0.2788 1.4854 0.1885 0.1001 0.7160
## 28 0.0876 0.4896 0.2914 0.0200 0.3140
## 32 -0.2861 -1.5585 0.2202 0.1333 -0.8283
## 38 -0.2271 -1.3016 0.3059 0.1472 -0.8641
## 42 -0.0303 -0.1620 0.2262 0.0016 -0.0876
## 52 -0.1375 -0.7358 0.2221 0.0312 -0.3931