第四周作业

6.2 某地区土壤所含可给态磷分析

输入数据:

df <- data.frame(x1 = c(0.4, 0.4, 3.1, 0.6, 4.7, 1.7, 9.4, 10.1, 11.6, 12.6, 
    10.9, 23.1, 23.1, 21.6, 23.1, 1.9, 26.8, 29.9), x2 = c(52, 23, 19, 34, 24, 
    65, 44, 31, 29, 58, 37, 46, 50, 44, 56, 36, 58, 51), x3 = c(158, 163, 37, 
    157, 59, 123, 46, 117, 173, 112, 111, 114, 134, 73, 168, 143, 202, 124), 
    y = c(64, 60, 71, 61, 54, 77, 81, 93, 93, 51, 76, 96, 77, 93, 95, 54, 168, 
        99))

显示数据:

library(xtable)
xdf <- xtable(df)
digits(xdf) <- c(0, 1, 0, 0, 0)
print(xdf, type = "html", html.table.attributes = "border=\"1\"; style=\"border-style: solid;border-width: 1px;\"")
x1 x2 x3 y
1 0.4 52 158 64
2 0.4 23 163 60
3 3.1 19 37 71
4 0.6 34 157 61
5 4.7 24 59 54
6 1.7 65 123 77
7 9.4 44 46 81
8 10.1 31 117 93
9 11.6 29 173 93
10 12.6 58 112 51
11 10.9 37 111 76
12 23.1 46 114 96
13 23.1 50 134 77
14 21.6 44 73 93
15 23.1 56 168 95
16 1.9 36 143 54
17 26.8 58 202 168
18 29.9 51 124 99
rm(xdf)
detach("package:xtable", unload = TRUE)

求Y关于X的多元线性回归方程

lm.sol <- lm(y ~ x1 + x2 + x3, data = df)
summary(lm.sol)
## 
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -28.35 -11.38  -2.66  12.10  48.81 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  43.6501    18.0544    2.42   0.0298 * 
## x1            1.7853     0.5398    3.31   0.0052 **
## x2           -0.0833     0.4204   -0.20   0.8458   
## x3            0.1610     0.1116    1.44   0.1710   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 20 on 14 degrees of freedom
## Multiple R-squared: 0.549,   Adjusted R-squared: 0.453 
## F-statistic: 5.69 on 3 and 14 DF,  p-value: 0.00923

故y=43.65+1.785*x1-0.083*x2+0.161*x3为所求多元线性回归方程。

方程显著性分析

虽然R-squared值0.549不高,但由于p-value=0.00923<0.05,仍然可认为显著。

变量的逐步回归分析

lm.step <- step(lm.sol)
## Start:  AIC=111.3
## y ~ x1 + x2 + x3
## 
##        Df Sum of Sq  RSS AIC
## - x2    1        16 5599 109
## <none>              5584 111
## - x3    1       831 6414 112
## - x1    1      4363 9947 120
## 
## Step:  AIC=109.3
## y ~ x1 + x3
## 
##        Df Sum of Sq   RSS AIC
## <none>               5599 109
## - x3    1       833  6433 110
## - x1    1      5169 10769 119

运用step函数后发现程序自动去掉x2变量后,AIC值111.27-109.32=1.95,减小。

summary(lm.step)
## 
## Call:
## lm(formula = y ~ x1 + x3, data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -29.71 -11.32  -2.95  11.29  48.68 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   41.479     13.883    2.99   0.0092 **
## x1             1.737      0.467    3.72   0.0020 **
## x3             0.155      0.104    1.49   0.1559   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 19.3 on 15 degrees of freedom
## Multiple R-squared: 0.548,   Adjusted R-squared: 0.488 
## F-statistic:  9.1 on 2 and 15 DF,  p-value: 0.00259

此时R-squared值0.548微降,但p-value=0.002589<0.05,显著性得到改善。

lm.step2 <- step(lm.step)
## Start:  AIC=109.3
## y ~ x1 + x3
## 
##        Df Sum of Sq   RSS AIC
## <none>               5599 109
## - x3    1       833  6433 110
## - x1    1      5169 10769 119

再次运行step无法进一步自动去掉x3,故可以将y=41.48+1.7374*x1+0.159*x3作为更好的多元线性方程模型。