env <- new.env()
load("../data.rda", envir = env)
df <- get("CH07TA01", envir = env)
names(df) <- c("x1", "x2", "x3", "y")
fit <- lm(y ~ x1 + x2 + x3, df)
with(df, cbind("Triceps" = x1,
"Thigh" = x2,
"Midarm" = x3,
"Body Fat" = y))
## Triceps Thigh Midarm Body Fat
## [1,] 19.5 43.1 29.1 11.9
## [2,] 24.7 49.8 28.2 22.8
## [3,] 30.7 51.9 37.0 18.7
## [4,] 29.8 54.3 31.1 20.1
## [5,] 19.1 42.2 30.9 12.9
## [6,] 25.6 53.9 23.7 21.7
## [7,] 31.4 58.5 27.6 27.1
## [8,] 27.9 52.1 30.6 25.4
## [9,] 22.1 49.9 23.2 21.3
## [10,] 25.5 53.5 24.8 19.3
## [11,] 31.1 56.6 30.0 25.4
## [12,] 30.4 56.7 28.3 27.2
## [13,] 18.7 46.5 23.0 11.7
## [14,] 19.7 44.2 28.6 17.8
## [15,] 14.6 42.7 21.3 12.8
## [16,] 29.5 54.4 30.1 23.9
## [17,] 27.7 55.3 25.7 22.6
## [18,] 30.2 58.6 24.6 25.4
## [19,] 22.7 48.2 27.1 14.8
## [20,] 25.2 51.0 27.5 21.1
One thing to realize with R ANOVA results is that it doesn't give you an SSR for the entire regression. Instead, it gives you the sum of squares for the first term in the model, then the extra from the addition of the second term, and so on. With their example of y ~ x1 + x2 we have SSR = 385.44; however, the ANOVA results provided would need to be aggregated to obtain that answer. Therefore, what we require is:
SSR(x1) + SSR(x2|x1) = 352.27 + 33.17 = 385.44 = SSR(x1, x2)
So be aware that the R ANOVA results provide the extra SS directly, depending on how the terms were entered into the model.
summary(lm(y ~ x1, df))
##
## Call:
## lm(formula = y ~ x1, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.120 -2.190 0.674 1.938 3.852
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.496 3.319 -0.45 0.66
## x1 0.857 0.129 6.66 3e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.82 on 18 degrees of freedom
## Multiple R-squared: 0.711, Adjusted R-squared: 0.695
## F-statistic: 44.3 on 1 and 18 DF, p-value: 3.02e-06
anova(lm(y ~ x1, df))
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 352 352 44.3 3e-06 ***
## Residuals 18 143 8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lm(y ~ x2, df))
##
## Call:
## lm(formula = y ~ x2, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.495 -1.567 0.124 1.336 4.408
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -23.634 5.657 -4.18 0.00057 ***
## x2 0.857 0.110 7.79 3.6e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.51 on 18 degrees of freedom
## Multiple R-squared: 0.771, Adjusted R-squared: 0.758
## F-statistic: 60.6 on 1 and 18 DF, p-value: 3.6e-07
anova(lm(y ~ x2, df))
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x2 1 382 382 60.6 3.6e-07 ***
## Residuals 18 113 6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(lm(y ~ x1 + x2, df))
##
## Call:
## lm(formula = y ~ x1 + x2, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.947 -1.881 0.168 1.337 4.015
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -19.174 8.361 -2.29 0.035 *
## x1 0.222 0.303 0.73 0.474
## x2 0.659 0.291 2.26 0.037 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.54 on 17 degrees of freedom
## Multiple R-squared: 0.778, Adjusted R-squared: 0.752
## F-statistic: 29.8 on 2 and 17 DF, p-value: 2.77e-06
anova(lm(y ~ x1 + x2, df))
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 352 352 54.47 1.1e-06 ***
## x2 1 33 33 5.13 0.037 *
## Residuals 17 110 6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fit)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.726 -1.611 0.392 1.466 4.128
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 117.08 99.78 1.17 0.26
## x1 4.33 3.02 1.44 0.17
## x2 -2.86 2.58 -1.11 0.28
## x3 -2.19 1.60 -1.37 0.19
##
## Residual standard error: 2.48 on 16 degrees of freedom
## Multiple R-squared: 0.801, Adjusted R-squared: 0.764
## F-statistic: 21.5 on 3 and 16 DF, p-value: 7.34e-06
anova(fit)
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 352 352 57.28 1.1e-06 ***
## x2 1 33 33 5.39 0.034 *
## x3 1 12 12 1.88 0.190
## Residuals 16 98 6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For numeric subscripting, the following will be utilized
Number Rows Columns
1 x1 (x1) Df
2 x2 (x2 | x1) Sum Sq
3 x3 (x3 | x1, x2) Mean Sq
4 Residuals (pure error)
The code below compiles each column row-wise. Naming is only required for each record once, but indentation is provided for clarity.
fit.aov <- anova(fit)
tab <- as.table(cbind(
'SS' = c("SSR(x1, x2, x3)" = sum(fit.aov[1:3, 2]),
"SSR(x1)" = fit.aov[1, 2],
"SSR(x2|x1)" = fit.aov[2, 2],
"SSR(x3|x1, x2)" = fit.aov[3, 2],
"SSE" = fit.aov[4, 2],
"Total" = sum(fit.aov[, 2])),
'Df' = c( sum(fit.aov[1:3, 1]),
fit.aov[1, 1],
fit.aov[2, 1],
fit.aov[3, 1],
fit.aov[4, 1],
sum(fit.aov$Df)),
'MS' = c( sum(fit.aov[1:3, 2]) / sum(fit.aov[1:3, 1]),
fit.aov[1, 3],
fit.aov[2, 3],
fit.aov[3, 3],
fit.aov[4, 3],
NA)
))
round(tab, 2)
## SS Df MS
## SSR(x1, x2, x3) 396.98 3.00 132.33
## SSR(x1) 352.27 1.00 352.27
## SSR(x2|x1) 33.17 1.00 33.17
## SSR(x3|x1, x2) 11.55 1.00 11.55
## SSE 98.40 16.00 6.15
## Total 495.39 19.00
Since the above example demonstrates how to extract each of the components in an extra SS analysis, I will assume the reader can figure out how to compute the arithmetic with those components. Instead, here will be two ways to perform the same analysis. The anova function can compare models in precisely this way. R also has the drop1 that does the same test for each of possible “drop 1 term” models.
anova(update(fit, . ~ . - x3), fit) # the "." indicates "old model terms"
## Analysis of Variance Table
##
## Model 1: y ~ x1 + x2
## Model 2: y ~ x1 + x2 + x3
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 17 110.0
## 2 16 98.4 1 11.6 1.88 0.19
drop1(fit, test = "F") # Alternative for each term
## Single term deletions
##
## Model:
## y ~ x1 + x2 + x3
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 98.4 39.9
## x1 1 12.70 111.1 40.3 2.07 0.17
## x2 1 7.53 105.9 39.3 1.22 0.28
## x3 1 11.55 110.0 40.1 1.88 0.19
anova(lm(y ~ x1, df), fit)
## Analysis of Variance Table
##
## Model 1: y ~ x1
## Model 2: y ~ x1 + x2 + x3
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 18 143.1
## 2 16 98.4 2 44.7 3.64 0.05 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
sign.1 <- sign(coef(lm(y ~ x1 + x2, df)))[["x1"]]
sign.2 <- sign(coef(lm(y ~ x1 + x2, df)))[["x2"]]
sign.3 <- sign(coef(fit))[["x3"]]
rbind(
"2|1" = c("Rsq" = anova(fit)["x2", 2] / anova(lm(y ~ x1, df))["Residuals", 2],
"r" = sign.2 * sqrt(anova(fit)["x2", 2] / anova(lm(y ~ x1, df))["Residuals", 2])),
"3|12" = c( anova(fit)["x3", 2] / anova(lm(y ~ x1 + x2, df))["Residuals", 2],
sign.3 * sqrt(anova(fit)["x3", 2] / anova(lm(y ~ x1 + x2, df))["Residuals", 2])),
"1|2" = c( anova(lm(y ~ x2 + x1, df))["x1", 2] / anova(lm(y ~ x2, df))["Residuals", 2],
sign.1 * sqrt(anova(lm(y ~ x2 + x1, df))["x1", 2] / anova(lm(y ~ x2, df))["Residuals", 2]))
)
## Rsq r
## 2|1 0.23176 0.4814
## 3|12 0.10501 -0.3241
## 1|2 0.03062 0.1750
df <- get("CH06FI05", envir = env)
names(df) <- c("x1", "x2", "y")
with(df, cbind(Sales = y, Population = x1, Income = x2)) # Table 7.5(a)
## Sales Population Income
## [1,] 174.4 68.5 16.7
## [2,] 164.4 45.2 16.8
## [3,] 244.2 91.3 18.2
## [4,] 154.6 47.8 16.3
## [5,] 181.6 46.9 17.3
## [6,] 207.5 66.1 18.2
## [7,] 152.8 49.5 15.9
## [8,] 163.2 52.0 17.2
## [9,] 145.4 48.9 16.6
## [10,] 137.2 38.4 16.0
## [11,] 241.9 87.9 18.3
## [12,] 191.1 72.8 17.1
## [13,] 232.0 88.4 17.4
## [14,] 145.3 42.9 15.8
## [15,] 161.1 52.5 17.8
## [16,] 209.7 85.7 18.4
## [17,] 146.4 41.3 16.5
## [18,] 144.0 51.7 16.3
## [19,] 232.6 89.6 18.1
## [20,] 224.1 82.7 19.1
## [21,] 166.5 52.3 16.0
# Data used in this model is a standardized transform, performed inline
fit <- lm(y ~ 0 + x1 + x2, data =
transform(df,
y = c(scale(y )) / sqrt(nrow(df) - 1),
x1 = c(scale(x1)) / sqrt(nrow(df) - 1),
x2 = c(scale(x2)) / sqrt(nrow(df) - 1)))
model.frame(fit) # Table 7.5(b) Transformed Data
## y x1 x2
## 1 -0.046368 0.07783 -0.102052
## 2 -0.108153 -0.20198 -0.079008
## 3 0.384889 0.35163 0.243608
## 4 -0.168702 -0.17075 -0.194228
## 5 -0.001883 -0.18156 0.036212
## 6 0.158139 0.04901 0.243608
## 7 -0.179823 -0.15034 -0.286404
## 8 -0.115567 -0.12032 0.013168
## 9 -0.225543 -0.15754 -0.125096
## 10 -0.276207 -0.28364 -0.263360
## 11 0.370679 0.31080 0.266652
## 12 0.056812 0.12947 -0.009876
## 13 0.309512 0.31680 0.059256
## 14 -0.226161 -0.22960 -0.309448
## 15 -0.128542 -0.11431 0.151432
## 16 0.171732 0.28438 0.289696
## 17 -0.219365 -0.24881 -0.148140
## 18 -0.234193 -0.12392 -0.194228
## 19 0.313219 0.33121 0.220564
## 20 0.260702 0.24835 0.451005
## 21 -0.095178 -0.11671 -0.263360
coef(fit)
## x1 x2
## 0.7484 0.2511
# Return to regular coefficients
with(df, coef(fit) * c(sd(y) / sd(x1), sd(y) / sd(x2))) # (7.53)
## x1 x2
## 1.455 9.366
df <- get("CH07TA06", envir = env)
names(df) <- c("x1", "x2", "y")
cbind("Crew Size" = df$x1,
"Bonus Pay" = df$x2,
"Crew Productivity" = df$y)
## Crew Size Bonus Pay Crew Productivity
## [1,] 4 2 42
## [2,] 4 2 39
## [3,] 4 3 48
## [4,] 4 3 51
## [5,] 6 2 49
## [6,] 6 2 53
## [7,] 6 3 61
## [8,] 6 3 60
anova(lm(y ~ ., df))
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 231.1 231.1 65.6 0.00047 ***
## x2 1 171.1 171.1 48.5 0.00094 ***
## Residuals 5 17.6 3.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(y ~ x1, df))
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x1 1 231 231.1 7.35 0.035 *
## Residuals 6 189 31.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm(y ~ x2, df))
## Analysis of Variance Table
##
## Response: y
## Df Sum Sq Mean Sq F value Pr(>F)
## x2 1 171 171.1 4.13 0.088 .
## Residuals 6 249 41.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
df <- get("CH07TA01", envir = env)
names(df) <- c("x1", "x2", "x3", "y")
with(df, pairs(cbind(x1, x2, x3), pch=19))
with(df, cor(cbind(x1, x2, x3)))
## x1 x2 x3
## x1 1.0000 0.92384 0.45778
## x2 0.9238 1.00000 0.08467
## x3 0.4578 0.08467 1.00000