df <- data.frame(x1 = c(0.4, 0.4, 3.1, 0.6, 4.7, 1.7, 9.4, 10.1, 11.6, 12.6,
10.9, 23.1, 23.1, 21.6, 23.1, 1.9, 26.8, 29.9), x2 = c(52, 23, 19, 34, 24,
65, 44, 31, 29, 58, 37, 46, 50, 44, 56, 36, 58, 51), x3 = c(158, 163, 37,
157, 59, 123, 46, 117, 173, 112, 111, 114, 134, 73, 168, 143, 202, 124),
y = c(64, 60, 71, 61, 54, 77, 81, 93, 93, 51, 76, 96, 77, 93, 95, 54, 168,
99))
library(xtable)
xdf <- xtable(df)
digits(xdf) <- c(0, 1, 0, 0, 0)
print(xdf, type = "html", html.table.attributes = "border=\"1\"; style=\"border-style: solid;border-width: 1px;\"")
| x1 | x2 | x3 | y | |
|---|---|---|---|---|
| 1 | 0.4 | 52 | 158 | 64 |
| 2 | 0.4 | 23 | 163 | 60 |
| 3 | 3.1 | 19 | 37 | 71 |
| 4 | 0.6 | 34 | 157 | 61 |
| 5 | 4.7 | 24 | 59 | 54 |
| 6 | 1.7 | 65 | 123 | 77 |
| 7 | 9.4 | 44 | 46 | 81 |
| 8 | 10.1 | 31 | 117 | 93 |
| 9 | 11.6 | 29 | 173 | 93 |
| 10 | 12.6 | 58 | 112 | 51 |
| 11 | 10.9 | 37 | 111 | 76 |
| 12 | 23.1 | 46 | 114 | 96 |
| 13 | 23.1 | 50 | 134 | 77 |
| 14 | 21.6 | 44 | 73 | 93 |
| 15 | 23.1 | 56 | 168 | 95 |
| 16 | 1.9 | 36 | 143 | 54 |
| 17 | 26.8 | 58 | 202 | 168 |
| 18 | 29.9 | 51 | 124 | 99 |
rm(xdf)
detach("package:xtable", unload = TRUE)
lm.sol <- lm(y ~ x1 + x2 + x3, data = df)
summary(lm.sol)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.35 -11.38 -2.66 12.10 48.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 43.6501 18.0544 2.42 0.0298 *
## x1 1.7853 0.5398 3.31 0.0052 **
## x2 -0.0833 0.4204 -0.20 0.8458
## x3 0.1610 0.1116 1.44 0.1710
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 20 on 14 degrees of freedom
## Multiple R-squared: 0.549, Adjusted R-squared: 0.453
## F-statistic: 5.69 on 3 and 14 DF, p-value: 0.00923
故y=43.65+1.785*x1-0.083*x2+0.161*x3为所求多元线性回归方程。
虽然R-squared值0.549不高,但由于p-value=0.00923<0.05,仍然可认为显著。
lm.step <- step(lm.sol)
## Start: AIC=111.3
## y ~ x1 + x2 + x3
##
## Df Sum of Sq RSS AIC
## - x2 1 16 5599 109
## <none> 5584 111
## - x3 1 831 6414 112
## - x1 1 4363 9947 120
##
## Step: AIC=109.3
## y ~ x1 + x3
##
## Df Sum of Sq RSS AIC
## <none> 5599 109
## - x3 1 833 6433 110
## - x1 1 5169 10769 119
运用step函数后发现程序自动去掉x2变量后,AIC值111.27-109.32=1.95,减小。
summary(lm.step)
##
## Call:
## lm(formula = y ~ x1 + x3, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.71 -11.32 -2.95 11.29 48.68
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.479 13.883 2.99 0.0092 **
## x1 1.737 0.467 3.72 0.0020 **
## x3 0.155 0.104 1.49 0.1559
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.3 on 15 degrees of freedom
## Multiple R-squared: 0.548, Adjusted R-squared: 0.488
## F-statistic: 9.1 on 2 and 15 DF, p-value: 0.00259
此时R-squared值0.548微降,但p-value=0.002589<0.05,显著性得到改善。
lm.step2 <- step(lm.step)
## Start: AIC=109.3
## y ~ x1 + x3
##
## Df Sum of Sq RSS AIC
## <none> 5599 109
## - x3 1 833 6433 110
## - x1 1 5169 10769 119
再次运行step无法进一步自动去掉x3,故可以将y=41.48+1.7374*x1+0.159*x3作为更好的多元线性方程模型。