回归是统计学的核心。
回归是一个广义的概念,是指用一个或多个自变量(解释变量)来预测因变量(被解释变量)的方法。
回归分析可以用来挑选与被解释变量相关的解释变量,分析两者的关系,也可以通过解释变量来预测被解释变量。
重点介绍普通最小二乘(OLS)回归法,包括简单线性回归、多项式回归、多元线性回归、逻辑回归。
一元线性回归方程:\[{\hat{y}=\beta_0+\beta_1x}\]
其中: \(\hat{y}\)为解释变量的估计值。\(\beta_0\)为回归线的截距。\(\beta_1\)为回归线的斜率。\(\beta_0\),\(\beta_1\)被称为回归系数。
\((y-\hat{y})\)被称为预测误差或残差。最小二乘回归用于获得使所有数据点残差平方和最小的回归线。
假定上述方程中有n个观察值,则:\[{{y_i}=\beta_0+\beta_1x_i+\varepsilon_i, i=1,2,\cdots,n}\]
最小二乘法是拟合出残差平方和最小的直线,残差平方和SSE为:
\[SSE=\sum_{i=1}^{n}{\varepsilon_i}^2=\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2\]
为了能够解释OLS模型的系数,数据必须满足以下统计假设:
正态性 对于固定的自变量值,因变量值成正态分布。
独立性 \(y_i\)值之间相互独立。
线性 因变量与自变量之间为线性相关。
同方差性 因变量的方差不随自变量的水平不同而变化。
如果违背了以上假设,得到的统计显著性检验结果和所得的置信区间很可能就不精确。注意,OLS回归还假定自变量是固定的且测量无误差,但在实践中通常都放松了这个假设。
通过一个简单线性回归示例来熟悉R操作。使用基础安装中的数据集women进行回归分析。
该数据集提供了15位女性的身高和体重信息,我们想获得一个通过身高来预测体重的模型。
women
## height weight
## 1 58 115
## 2 59 117
## 3 60 120
## 4 61 123
## 5 62 126
## 6 63 129
## 7 64 132
## 8 65 135
## 9 66 139
## 10 67 142
## 11 68 146
## 12 69 150
## 13 70 154
## 14 71 159
## 15 72 164
fit <- lm(weight ~ height, data = women)
summary(fit)
##
## Call:
## lm(formula = weight ~ height, data = women)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7333 -1.1333 -0.3833 0.7417 3.1167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -87.51667 5.93694 -14.74 1.71e-09 ***
## height 3.45000 0.09114 37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
通过输出结果,可以得到预测等式: \[ weight=-87.51667+3.45000*height \]
因为身高不可能为0,所以没必要给截距项(-87.51667)一个物理解释,它仅仅是一个常量调整项。
在\(Pr(>|t|)\)一栏,可以看到回归系数(3.45)显著不为0(p<0.001)。表明身高每增高1英寸,体重将预期增加3.45磅。
\(R^2\)拟合优度为0.991。R²的值越接近1,说明回归直线对观测值的拟合程度越好。
women$weight
## [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
fitted(fit)
## 1 2 3 4 5 6 7 8
## 112.5833 116.0333 119.4833 122.9333 126.3833 129.8333 133.2833 136.7333
## 9 10 11 12 13 14 15
## 140.1833 143.6333 147.0833 150.5333 153.9833 157.4333 160.8833
residuals(fit)
## 1 2 3 4 5 6
## 2.41666667 0.96666667 0.51666667 0.06666667 -0.38333333 -0.83333333
## 7 8 9 10 11 12
## -1.28333333 -1.73333333 -1.18333333 -1.63333333 -1.08333333 -0.53333333
## 13 14 15
## 0.01666667 1.56666667 3.11666667
plot(women$height, women$weight, main = "Women",
xlab = "Height (in inches)", ylab = "Weight (in pounds)")
# add the line of best fit
abline(fit)
通过观察上图,发现用曲线来拟合应该会提高回归的预测精度。因此,我们添加一个hight的二次项。
fit2 <- lm(weight ~ height + I(height^2), data = women)
注意:I(height^2)表示height的平方项。因为^符号在回归表达式中有特殊的含义,所以此处必须要用I()函数。
summary(fit2)
##
## Call:
## lm(formula = weight ~ height + I(height^2), data = women)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50941 -0.29611 -0.00941 0.28615 0.59706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 261.87818 25.19677 10.393 2.36e-07 ***
## height -7.34832 0.77769 -9.449 6.58e-07 ***
## I(height^2) 0.08306 0.00598 13.891 9.32e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3841 on 12 degrees of freedom
## Multiple R-squared: 0.9995, Adjusted R-squared: 0.9994
## F-statistic: 1.139e+04 on 2 and 12 DF, p-value: < 2.2e-16
通过输出结果,可以得到预测等式: \[ weight=261.87818-7.34832*height+0.08306*height^2 \]
在\(Pr(>|t|)\)一栏,可以看到回归系数都显著不为0(p<0.001)。
\(R^2\)拟合优度为0.9995。比之前的高,说明加入二次项的回归模型拟合的效果更好。
plot(women$height, women$weight, main = "Women",
xlab = "Height (in inches)", ylab = "Weight (in lbs)")
lines(women$height, fitted(fit2))
当预测变量不止一个时,简单线性回归就变成了多元线性回归,分析也更加复杂。
以基础包中的state.x77数据集为例,来探究一个州的犯罪率和其他因素的关系,其他因素包括 人口、文盲率、平均收入和结霜天数(温度在冰点以下的天数)。
state.x77
## Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama 3615 3624 2.1 69.05 15.1 41.3 20
## Alaska 365 6315 1.5 69.31 11.3 66.7 152
## Arizona 2212 4530 1.8 70.55 7.8 58.1 15
## Arkansas 2110 3378 1.9 70.66 10.1 39.9 65
## California 21198 5114 1.1 71.71 10.3 62.6 20
## Colorado 2541 4884 0.7 72.06 6.8 63.9 166
## Connecticut 3100 5348 1.1 72.48 3.1 56.0 139
## Delaware 579 4809 0.9 70.06 6.2 54.6 103
## Florida 8277 4815 1.3 70.66 10.7 52.6 11
## Georgia 4931 4091 2.0 68.54 13.9 40.6 60
## Hawaii 868 4963 1.9 73.60 6.2 61.9 0
## Idaho 813 4119 0.6 71.87 5.3 59.5 126
## Illinois 11197 5107 0.9 70.14 10.3 52.6 127
## Indiana 5313 4458 0.7 70.88 7.1 52.9 122
## Iowa 2861 4628 0.5 72.56 2.3 59.0 140
## Kansas 2280 4669 0.6 72.58 4.5 59.9 114
## Kentucky 3387 3712 1.6 70.10 10.6 38.5 95
## Louisiana 3806 3545 2.8 68.76 13.2 42.2 12
## Maine 1058 3694 0.7 70.39 2.7 54.7 161
## Maryland 4122 5299 0.9 70.22 8.5 52.3 101
## Massachusetts 5814 4755 1.1 71.83 3.3 58.5 103
## Michigan 9111 4751 0.9 70.63 11.1 52.8 125
## Minnesota 3921 4675 0.6 72.96 2.3 57.6 160
## Mississippi 2341 3098 2.4 68.09 12.5 41.0 50
## Missouri 4767 4254 0.8 70.69 9.3 48.8 108
## Montana 746 4347 0.6 70.56 5.0 59.2 155
## Nebraska 1544 4508 0.6 72.60 2.9 59.3 139
## Nevada 590 5149 0.5 69.03 11.5 65.2 188
## New Hampshire 812 4281 0.7 71.23 3.3 57.6 174
## New Jersey 7333 5237 1.1 70.93 5.2 52.5 115
## New Mexico 1144 3601 2.2 70.32 9.7 55.2 120
## New York 18076 4903 1.4 70.55 10.9 52.7 82
## North Carolina 5441 3875 1.8 69.21 11.1 38.5 80
## North Dakota 637 5087 0.8 72.78 1.4 50.3 186
## Ohio 10735 4561 0.8 70.82 7.4 53.2 124
## Oklahoma 2715 3983 1.1 71.42 6.4 51.6 82
## Oregon 2284 4660 0.6 72.13 4.2 60.0 44
## Pennsylvania 11860 4449 1.0 70.43 6.1 50.2 126
## Rhode Island 931 4558 1.3 71.90 2.4 46.4 127
## South Carolina 2816 3635 2.3 67.96 11.6 37.8 65
## South Dakota 681 4167 0.5 72.08 1.7 53.3 172
## Tennessee 4173 3821 1.7 70.11 11.0 41.8 70
## Texas 12237 4188 2.2 70.90 12.2 47.4 35
## Utah 1203 4022 0.6 72.90 4.5 67.3 137
## Vermont 472 3907 0.6 71.64 5.5 57.1 168
## Virginia 4981 4701 1.4 70.08 9.5 47.8 85
## Washington 3559 4864 0.6 71.72 4.3 63.5 32
## West Virginia 1799 3617 1.4 69.48 6.7 41.6 100
## Wisconsin 4589 4468 0.7 72.48 3.0 54.5 149
## Wyoming 376 4566 0.6 70.29 6.9 62.9 173
## Area
## Alabama 50708
## Alaska 566432
## Arizona 113417
## Arkansas 51945
## California 156361
## Colorado 103766
## Connecticut 4862
## Delaware 1982
## Florida 54090
## Georgia 58073
## Hawaii 6425
## Idaho 82677
## Illinois 55748
## Indiana 36097
## Iowa 55941
## Kansas 81787
## Kentucky 39650
## Louisiana 44930
## Maine 30920
## Maryland 9891
## Massachusetts 7826
## Michigan 56817
## Minnesota 79289
## Mississippi 47296
## Missouri 68995
## Montana 145587
## Nebraska 76483
## Nevada 109889
## New Hampshire 9027
## New Jersey 7521
## New Mexico 121412
## New York 47831
## North Carolina 48798
## North Dakota 69273
## Ohio 40975
## Oklahoma 68782
## Oregon 96184
## Pennsylvania 44966
## Rhode Island 1049
## South Carolina 30225
## South Dakota 75955
## Tennessee 41328
## Texas 262134
## Utah 82096
## Vermont 9267
## Virginia 39780
## Washington 66570
## West Virginia 24070
## Wisconsin 54464
## Wyoming 97203
因为lm()函数需数据框格式(state.x77数据集是矩阵),为处理方便,需要将其转化为数据框格式。
class(state.x77)
## [1] "matrix"
states <- as.data.frame(state.x77[, c("Murder", "Population",
"Illiteracy", "Income", "Frost")])
states
## Murder Population Illiteracy Income Frost
## Alabama 15.1 3615 2.1 3624 20
## Alaska 11.3 365 1.5 6315 152
## Arizona 7.8 2212 1.8 4530 15
## Arkansas 10.1 2110 1.9 3378 65
## California 10.3 21198 1.1 5114 20
## Colorado 6.8 2541 0.7 4884 166
## Connecticut 3.1 3100 1.1 5348 139
## Delaware 6.2 579 0.9 4809 103
## Florida 10.7 8277 1.3 4815 11
## Georgia 13.9 4931 2.0 4091 60
## Hawaii 6.2 868 1.9 4963 0
## Idaho 5.3 813 0.6 4119 126
## Illinois 10.3 11197 0.9 5107 127
## Indiana 7.1 5313 0.7 4458 122
## Iowa 2.3 2861 0.5 4628 140
## Kansas 4.5 2280 0.6 4669 114
## Kentucky 10.6 3387 1.6 3712 95
## Louisiana 13.2 3806 2.8 3545 12
## Maine 2.7 1058 0.7 3694 161
## Maryland 8.5 4122 0.9 5299 101
## Massachusetts 3.3 5814 1.1 4755 103
## Michigan 11.1 9111 0.9 4751 125
## Minnesota 2.3 3921 0.6 4675 160
## Mississippi 12.5 2341 2.4 3098 50
## Missouri 9.3 4767 0.8 4254 108
## Montana 5.0 746 0.6 4347 155
## Nebraska 2.9 1544 0.6 4508 139
## Nevada 11.5 590 0.5 5149 188
## New Hampshire 3.3 812 0.7 4281 174
## New Jersey 5.2 7333 1.1 5237 115
## New Mexico 9.7 1144 2.2 3601 120
## New York 10.9 18076 1.4 4903 82
## North Carolina 11.1 5441 1.8 3875 80
## North Dakota 1.4 637 0.8 5087 186
## Ohio 7.4 10735 0.8 4561 124
## Oklahoma 6.4 2715 1.1 3983 82
## Oregon 4.2 2284 0.6 4660 44
## Pennsylvania 6.1 11860 1.0 4449 126
## Rhode Island 2.4 931 1.3 4558 127
## South Carolina 11.6 2816 2.3 3635 65
## South Dakota 1.7 681 0.5 4167 172
## Tennessee 11.0 4173 1.7 3821 70
## Texas 12.2 12237 2.2 4188 35
## Utah 4.5 1203 0.6 4022 137
## Vermont 5.5 472 0.6 3907 168
## Virginia 9.5 4981 1.4 4701 85
## Washington 4.3 3559 0.6 4864 32
## West Virginia 6.7 1799 1.4 3617 100
## Wisconsin 3.0 4589 0.7 4468 149
## Wyoming 6.9 376 0.6 4566 173
Murder为谋杀率,Population为人口,Illiteracy为文盲率,Income为收入,Frost为结霜天气。
cor(states)
## Murder Population Illiteracy Income Frost
## Murder 1.0000000 0.3436428 0.7029752 -0.2300776 -0.5388834
## Population 0.3436428 1.0000000 0.1076224 0.2082276 -0.3321525
## Illiteracy 0.7029752 0.1076224 1.0000000 -0.4370752 -0.6719470
## Income -0.2300776 0.2082276 -0.4370752 1.0000000 0.2262822
## Frost -0.5388834 -0.3321525 -0.6719470 0.2262822 1.0000000
library(car)
## Loading required package: carData
scatterplotMatrix(states, spread = FALSE, lty.smooth = 1,
main = "Scatterplot Matrix")
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
fit3 <- lm(Murder ~ Population + Illiteracy + Income +
Frost, data = states)
summary(fit3)
##
## Call:
## lm(formula = Murder ~ Population + Illiteracy + Income + Frost,
## data = states)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.7960 -1.6495 -0.0811 1.4815 7.6210
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.235e+00 3.866e+00 0.319 0.7510
## Population 2.237e-04 9.052e-05 2.471 0.0173 *
## Illiteracy 4.143e+00 8.744e-01 4.738 2.19e-05 ***
## Income 6.442e-05 6.837e-04 0.094 0.9253
## Frost 5.813e-04 1.005e-02 0.058 0.9541
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.535 on 45 degrees of freedom
## Multiple R-squared: 0.567, Adjusted R-squared: 0.5285
## F-statistic: 14.73 on 4 and 45 DF, p-value: 9.133e-08
Illiteracy在p<0.001的水平下显著不为0。 Population在p<0.05的水平下显著不为0。
在尝试获取一个回归方程时,实际上面临对着从众多可能的模型中做选择的问题。是不是所有的变量都要包括?抑或去掉那个对预测贡献不显著的变量?本节讨论的问题,就是如何在候选模型中进行筛选。
AIC(Akaike Information Criterion,赤池信息准则)可以用来比较模型,它考虑了模型的 统计拟合度以及用来拟合的参数数目。 AIC值越小的模型要优先选择,它说明模型用较少的参数 获得了足够的拟合度。
\[AIC=2K-2ln(L)\] 其中,k是变量的数量,L是似然函数。
在上节states的多元回归模型中,我们发现Income和Frost的回归系 数不显著,因此,我们可以检验不含这两个变量的模型与包含这两项的模型预测效果是否一样好。
fit4 <- lm(Murder ~ Population + Illiteracy + Income +
Frost, data = states)
fit5 <- lm(Murder ~ Population + Illiteracy, data = states)
AIC(fit4, fit5)
## df AIC
## fit4 6 241.6429
## fit5 4 237.6565
此处AIC值表明没有Income和Frost的模型更佳。
当通过一系列连续型或类别型预测变量来预测二值型(0,1)结果变量时, Logistic回归是一个非 常有用的工具。Logistic回归模型假设Y服从二项分布,线性模型的拟合形式为:
\[ln(\frac{\pi }{1-\pi })=\beta _0+\sum_{j=1}^{n}\beta _jx_j\] 其中\(\pi\)是y的条件均值(即给定一系列x的值时y= 1的概率)。
以AER包中的数据框Affairs为例,我们将通过探究婚外情的数据来阐述Logistic回归的过程。
1、展现Affairs数据集
data(Affairs, package = "AER")
Affairs
## affairs gender age yearsmarried children religiousness education
## 4 0 male 37.0 10.000 no 3 18
## 5 0 female 27.0 4.000 no 4 14
## 11 0 female 32.0 15.000 yes 1 12
## 16 0 male 57.0 15.000 yes 5 18
## 23 0 male 22.0 0.750 no 2 17
## 29 0 female 32.0 1.500 no 2 17
## 44 0 female 22.0 0.750 no 2 12
## 45 0 male 57.0 15.000 yes 2 14
## 47 0 female 32.0 15.000 yes 4 16
## 49 0 male 22.0 1.500 no 4 14
## 50 0 male 37.0 15.000 yes 2 20
## 55 0 male 27.0 4.000 yes 4 18
## 64 0 male 47.0 15.000 yes 5 17
## 80 0 female 22.0 1.500 no 2 17
## 86 0 female 27.0 4.000 no 4 14
## 93 0 female 37.0 15.000 yes 1 17
## 108 0 female 37.0 15.000 yes 2 18
## 114 0 female 22.0 0.750 no 3 16
## 115 0 female 22.0 1.500 no 2 16
## 116 0 female 27.0 10.000 yes 2 14
## 123 0 female 22.0 1.500 no 2 16
## 127 0 female 22.0 1.500 no 2 16
## 129 0 female 27.0 10.000 yes 4 16
## 134 0 female 32.0 10.000 yes 3 14
## 137 0 male 37.0 4.000 yes 2 20
## 139 0 female 22.0 1.500 no 2 18
## 147 0 female 27.0 7.000 no 4 16
## 151 0 male 42.0 15.000 yes 5 20
## 153 0 male 27.0 4.000 yes 3 16
## 155 0 female 27.0 4.000 yes 3 17
## 162 0 male 42.0 15.000 yes 4 20
## 163 0 female 22.0 1.500 no 3 16
## 165 0 male 27.0 0.417 no 4 17
## 168 0 female 42.0 15.000 yes 5 14
## 170 0 male 32.0 4.000 yes 1 18
## 172 0 female 22.0 1.500 no 4 16
## 184 0 female 42.0 15.000 yes 3 12
## 187 0 female 22.0 4.000 no 4 17
## 192 0 male 22.0 1.500 yes 1 14
## 194 0 female 22.0 0.750 no 3 16
## 210 0 male 32.0 10.000 yes 5 20
## 217 0 male 52.0 15.000 yes 5 18
## 220 0 female 22.0 0.417 no 5 14
## 224 0 female 27.0 4.000 yes 2 18
## 227 0 female 32.0 7.000 yes 5 17
## 228 0 male 22.0 4.000 no 3 16
## 239 0 female 27.0 7.000 yes 4 18
## 241 0 female 42.0 15.000 yes 2 18
## 245 0 male 27.0 1.500 yes 4 16
## 249 0 male 42.0 15.000 yes 2 20
## 262 0 female 22.0 0.750 no 5 14
## 265 0 male 32.0 7.000 yes 2 20
## 267 0 male 27.0 4.000 yes 5 20
## 269 0 male 27.0 10.000 yes 4 20
## 271 0 male 22.0 4.000 no 1 18
## 277 0 female 37.0 15.000 yes 4 14
## 290 0 male 22.0 1.500 yes 5 16
## 292 0 female 37.0 15.000 yes 4 17
## 293 0 female 27.0 0.750 no 4 17
## 295 0 male 32.0 10.000 yes 4 20
## 299 0 female 47.0 15.000 yes 5 14
## 320 0 male 37.0 10.000 yes 3 20
## 321 0 female 22.0 0.750 no 2 16
## 324 0 male 27.0 4.000 no 2 18
## 334 0 male 32.0 7.000 no 4 20
## 351 0 male 42.0 15.000 yes 2 17
## 355 0 male 37.0 10.000 yes 4 20
## 361 0 female 47.0 15.000 yes 3 17
## 362 0 female 22.0 1.500 no 5 16
## 366 0 female 27.0 1.500 no 2 16
## 370 0 female 27.0 4.000 no 3 17
## 374 0 female 32.0 10.000 yes 5 14
## 378 0 female 22.0 0.125 no 2 12
## 381 0 male 47.0 15.000 yes 4 14
## 382 0 male 32.0 15.000 yes 1 14
## 383 0 male 27.0 7.000 yes 4 16
## 384 0 female 22.0 1.500 yes 3 16
## 400 0 male 27.0 4.000 yes 3 17
## 403 0 female 22.0 1.500 no 3 16
## 409 0 male 57.0 15.000 yes 2 14
## 412 0 male 17.5 1.500 yes 3 18
## 413 0 male 57.0 15.000 yes 4 20
## 416 0 female 22.0 0.750 no 2 16
## 418 0 male 42.0 4.000 no 4 17
## 422 0 female 22.0 1.500 yes 4 12
## 435 0 female 22.0 0.417 no 1 17
## 439 0 female 32.0 15.000 yes 4 17
## 445 0 female 27.0 1.500 no 3 18
## 447 0 female 22.0 1.500 yes 3 14
## 448 0 female 37.0 15.000 yes 3 14
## 449 0 female 32.0 15.000 yes 4 14
## 478 0 male 37.0 10.000 yes 2 14
## 482 0 male 37.0 10.000 yes 4 16
## 486 0 male 57.0 15.000 yes 5 20
## 489 0 male 27.0 0.417 no 1 16
## 490 0 female 42.0 15.000 yes 5 14
## 491 0 male 57.0 15.000 yes 3 16
## 492 0 male 37.0 10.000 yes 1 16
## 503 0 male 37.0 15.000 yes 3 17
## 508 0 male 37.0 15.000 yes 4 20
## 509 0 female 27.0 10.000 yes 5 14
## 512 0 male 37.0 10.000 yes 2 18
## 515 0 female 22.0 0.125 no 4 12
## 517 0 male 57.0 15.000 yes 5 20
## 532 0 female 37.0 15.000 yes 4 18
## 533 0 male 22.0 4.000 yes 4 14
## 535 0 male 27.0 7.000 yes 4 18
## 537 0 male 57.0 15.000 yes 4 20
## 538 0 male 32.0 15.000 yes 3 14
## 543 0 female 22.0 1.500 no 2 14
## 547 0 female 32.0 7.000 yes 4 17
## 550 0 female 37.0 15.000 yes 4 17
## 558 0 female 32.0 1.500 no 5 18
## 571 0 male 42.0 10.000 yes 5 20
## 578 0 female 27.0 7.000 no 3 16
## 583 0 male 37.0 15.000 no 4 20
## 586 0 male 37.0 15.000 yes 4 14
## 594 0 male 32.0 10.000 no 5 18
## 597 0 female 22.0 0.750 no 4 16
## 602 0 female 27.0 7.000 yes 4 12
## 603 0 female 27.0 7.000 yes 2 16
## 604 0 female 42.0 15.000 yes 5 18
## 612 0 male 42.0 15.000 yes 4 17
## 613 0 female 27.0 7.000 yes 2 16
## 621 0 female 22.0 1.500 no 3 16
## 627 0 male 37.0 15.000 yes 5 20
## 630 0 female 22.0 0.125 no 2 14
## 631 0 male 27.0 1.500 no 4 16
## 632 0 male 32.0 1.500 no 2 18
## 639 0 male 27.0 1.500 no 2 17
## 645 0 female 27.0 10.000 yes 4 16
## 647 0 male 42.0 15.000 yes 4 18
## 648 0 female 27.0 1.500 no 2 16
## 651 0 male 27.0 4.000 no 2 18
## 655 0 female 32.0 10.000 yes 3 14
## 667 0 female 32.0 15.000 yes 3 18
## 670 0 female 22.0 0.750 no 2 18
## 671 0 female 37.0 15.000 yes 2 16
## 673 0 male 27.0 4.000 yes 4 20
## 701 0 male 27.0 4.000 no 1 20
## 705 0 female 27.0 10.000 yes 2 12
## 706 0 female 32.0 15.000 yes 5 18
## 709 0 male 27.0 7.000 yes 5 12
## 717 0 male 52.0 15.000 yes 2 18
## 719 0 male 27.0 4.000 no 3 20
## 723 0 male 37.0 4.000 yes 1 18
## 724 0 male 27.0 4.000 yes 4 14
## 726 0 female 52.0 15.000 yes 5 12
## 734 0 female 57.0 15.000 yes 4 16
## 735 0 male 27.0 7.000 yes 1 16
## 736 0 male 37.0 7.000 yes 4 20
## 737 0 male 22.0 0.750 no 2 14
## 739 0 male 32.0 4.000 yes 2 18
## 743 0 male 37.0 15.000 yes 4 20
## 745 0 male 22.0 0.750 yes 2 14
## 747 0 male 42.0 15.000 yes 4 20
## 751 0 female 52.0 15.000 yes 5 17
## 752 0 female 37.0 15.000 yes 4 14
## 754 0 male 27.0 7.000 yes 4 14
## 760 0 male 32.0 4.000 yes 2 16
## 763 0 female 27.0 4.000 yes 2 18
## 774 0 female 27.0 4.000 yes 2 18
## 776 0 male 37.0 15.000 yes 5 18
## 779 0 female 47.0 15.000 yes 5 12
## 784 0 female 32.0 10.000 yes 3 17
## 788 0 female 27.0 1.500 yes 4 17
## 794 0 female 57.0 15.000 yes 2 18
## 795 0 female 22.0 1.500 no 4 14
## 798 0 male 42.0 15.000 yes 3 14
## 800 0 male 57.0 15.000 yes 4 9
## 803 0 male 57.0 15.000 yes 4 20
## 807 0 female 22.0 0.125 no 4 14
## 812 0 female 32.0 10.000 yes 4 14
## 820 0 female 42.0 15.000 yes 3 18
## 823 0 female 27.0 1.500 no 2 18
## 830 0 male 32.0 0.125 yes 2 18
## 843 0 female 27.0 4.000 no 3 16
## 848 0 female 27.0 10.000 yes 2 16
## 851 0 female 32.0 7.000 yes 4 16
## 854 0 female 37.0 15.000 yes 4 14
## 856 0 female 42.0 15.000 yes 5 17
## 857 0 male 32.0 1.500 yes 4 14
## 859 0 female 32.0 4.000 yes 3 17
## 863 0 female 37.0 7.000 no 4 18
## 865 0 female 22.0 0.417 yes 3 14
## 867 0 female 27.0 7.000 yes 4 14
## 870 0 male 27.0 0.750 no 3 16
## 873 0 male 27.0 4.000 yes 2 20
## 875 0 male 32.0 10.000 yes 4 16
## 876 0 male 32.0 15.000 yes 1 14
## 877 0 male 22.0 0.750 no 3 17
## 880 0 female 27.0 7.000 yes 4 17
## 903 0 male 27.0 0.417 yes 4 20
## 904 0 male 37.0 15.000 yes 4 20
## 905 0 female 37.0 15.000 yes 2 14
## 908 0 male 22.0 4.000 yes 1 18
## 909 0 male 37.0 15.000 yes 4 17
## 910 0 female 22.0 1.500 no 2 14
## 912 0 male 52.0 15.000 yes 4 14
## 914 0 female 22.0 1.500 no 4 17
## 915 0 male 32.0 4.000 yes 5 14
## 916 0 male 32.0 4.000 yes 2 14
## 920 0 female 22.0 1.500 no 3 16
## 921 0 male 27.0 0.750 no 2 18
## 925 0 female 22.0 7.000 yes 2 14
## 926 0 female 27.0 0.750 no 2 17
## 929 0 female 37.0 15.000 yes 4 12
## 931 0 female 22.0 1.500 no 1 14
## 945 0 female 37.0 10.000 no 2 12
## 947 0 female 37.0 15.000 yes 4 18
## 949 0 female 42.0 15.000 yes 3 12
## 950 0 male 22.0 4.000 no 2 18
## 961 0 male 52.0 7.000 yes 2 20
## 965 0 male 27.0 0.750 no 2 17
## 966 0 female 27.0 4.000 no 2 17
## 967 0 male 42.0 1.500 no 5 20
## 987 0 male 22.0 1.500 no 4 17
## 990 0 male 22.0 4.000 no 4 17
## 992 0 female 22.0 4.000 yes 1 14
## 995 0 male 37.0 15.000 yes 5 20
## 1009 0 female 37.0 10.000 yes 3 16
## 1021 0 male 42.0 15.000 yes 4 17
## 1026 0 female 47.0 15.000 yes 4 17
## 1027 0 male 22.0 1.500 no 4 16
## 1030 0 female 32.0 10.000 yes 3 12
## 1031 0 female 22.0 7.000 yes 1 14
## 1034 0 female 32.0 10.000 yes 4 17
## 1037 0 male 27.0 1.500 yes 2 16
## 1038 0 male 37.0 15.000 yes 4 14
## 1039 0 male 42.0 4.000 yes 3 14
## 1045 0 female 37.0 15.000 yes 5 14
## 1046 0 female 32.0 7.000 yes 4 17
## 1054 0 female 42.0 15.000 yes 4 18
## 1059 0 male 27.0 4.000 no 4 18
## 1063 0 male 22.0 0.750 no 4 18
## 1068 0 male 27.0 4.000 yes 4 14
## 1070 0 female 22.0 0.750 no 5 18
## 1072 0 female 52.0 15.000 yes 5 9
## 1073 0 male 32.0 10.000 yes 3 14
## 1077 0 female 37.0 15.000 yes 4 16
## 1081 0 male 32.0 7.000 yes 2 20
## 1083 0 female 42.0 15.000 yes 3 18
## 1084 0 male 32.0 15.000 yes 1 16
## 1086 0 male 27.0 4.000 yes 3 18
## 1087 0 female 32.0 15.000 yes 4 12
## 1089 0 male 22.0 0.750 yes 3 14
## 1096 0 female 22.0 1.500 no 3 16
## 1102 0 female 42.0 15.000 yes 4 14
## 1103 0 female 52.0 15.000 yes 3 16
## 1107 0 male 37.0 15.000 yes 5 20
## 1109 0 female 47.0 15.000 yes 4 12
## 1115 0 male 57.0 15.000 yes 2 20
## 1119 0 male 32.0 7.000 yes 4 17
## 1124 0 female 27.0 7.000 yes 4 17
## 1126 0 male 22.0 1.500 no 1 18
## 1128 0 female 22.0 4.000 yes 3 9
## 1129 0 female 22.0 1.500 no 2 14
## 1130 0 male 42.0 15.000 yes 2 20
## 1133 0 male 57.0 15.000 yes 4 9
## 1140 0 female 27.0 7.000 yes 2 18
## 1143 0 female 22.0 4.000 yes 3 14
## 1146 0 male 37.0 15.000 yes 4 14
## 1153 0 male 32.0 7.000 yes 1 18
## 1156 0 female 22.0 1.500 no 2 14
## 1157 0 female 22.0 1.500 yes 3 12
## 1158 0 male 52.0 15.000 yes 2 14
## 1160 0 female 37.0 15.000 yes 2 14
## 1161 0 female 32.0 10.000 yes 2 14
## 1166 0 male 42.0 15.000 yes 4 20
## 1177 0 female 27.0 4.000 yes 3 18
## 1178 0 male 37.0 15.000 yes 4 20
## 1180 0 male 27.0 1.500 no 3 18
## 1187 0 female 22.0 0.125 no 2 16
## 1191 0 male 32.0 10.000 yes 2 20
## 1195 0 female 27.0 4.000 no 4 18
## 1207 0 female 27.0 7.000 yes 2 12
## 1208 0 male 32.0 4.000 yes 5 18
## 1209 0 female 37.0 15.000 yes 2 17
## 1211 0 male 47.0 15.000 no 4 20
## 1215 0 male 27.0 1.500 no 1 18
## 1221 0 male 37.0 15.000 yes 4 20
## 1226 0 female 32.0 15.000 yes 4 18
## 1229 0 female 32.0 7.000 yes 4 17
## 1231 0 female 42.0 15.000 yes 3 14
## 1234 0 female 27.0 7.000 yes 3 16
## 1235 0 male 27.0 1.500 no 3 16
## 1242 0 male 22.0 1.500 no 3 16
## 1245 0 male 27.0 4.000 yes 3 16
## 1260 0 female 27.0 7.000 yes 3 12
## 1266 0 female 37.0 15.000 yes 2 18
## 1271 0 female 37.0 7.000 yes 3 14
## 1273 0 male 22.0 1.500 no 2 16
## 1276 0 male 37.0 15.000 yes 5 20
## 1280 0 female 22.0 1.500 no 4 16
## 1282 0 female 32.0 10.000 yes 4 16
## 1285 0 male 27.0 4.000 no 2 17
## 1295 0 female 22.0 0.417 no 4 14
## 1298 0 female 27.0 4.000 no 2 18
## 1299 0 male 37.0 15.000 yes 4 18
## 1304 0 male 37.0 10.000 yes 5 20
## 1305 0 female 27.0 7.000 yes 2 14
## 1311 0 male 32.0 4.000 yes 2 16
## 1314 0 male 32.0 4.000 yes 2 16
## 1319 0 male 22.0 1.500 no 3 18
## 1322 0 female 22.0 4.000 yes 4 14
## 1324 0 female 17.5 0.750 no 2 18
## 1327 0 male 32.0 10.000 yes 4 20
## 1328 0 female 32.0 0.750 no 5 14
## 1330 0 male 37.0 15.000 yes 4 17
## 1332 0 male 32.0 4.000 no 3 14
## 1333 0 female 27.0 1.500 no 2 17
## 1336 0 female 22.0 7.000 yes 4 14
## 1341 0 male 47.0 15.000 yes 5 14
## 1344 0 male 27.0 4.000 yes 1 16
## 1352 0 female 37.0 15.000 yes 5 14
## 1358 0 male 42.0 4.000 yes 4 18
## 1359 0 female 32.0 4.000 yes 2 14
## 1361 0 male 52.0 15.000 yes 2 14
## 1364 0 female 22.0 1.500 no 2 16
## 1368 0 male 52.0 15.000 yes 4 12
## 1384 0 female 22.0 0.417 no 3 17
## 1390 0 female 22.0 1.500 no 2 16
## 1393 0 male 27.0 4.000 yes 4 20
## 1394 0 female 32.0 15.000 yes 4 14
## 1402 0 female 27.0 1.500 no 2 16
## 1407 0 male 32.0 4.000 no 1 20
## 1408 0 male 37.0 15.000 yes 3 20
## 1412 0 female 32.0 10.000 no 2 16
## 1413 0 female 32.0 10.000 yes 5 14
## 1416 0 male 37.0 1.500 yes 4 18
## 1417 0 male 32.0 1.500 no 2 18
## 1418 0 female 32.0 10.000 yes 4 14
## 1419 0 female 47.0 15.000 yes 4 18
## 1420 0 female 27.0 10.000 yes 5 12
## 1423 0 male 27.0 4.000 yes 3 16
## 1424 0 female 37.0 15.000 yes 4 12
## 1432 0 female 27.0 0.750 no 4 16
## 1433 0 female 37.0 15.000 yes 4 16
## 1437 0 female 32.0 15.000 yes 3 16
## 1438 0 female 27.0 10.000 yes 2 16
## 1439 0 male 27.0 7.000 no 2 20
## 1446 0 female 37.0 15.000 yes 2 14
## 1450 0 male 27.0 1.500 yes 2 17
## 1451 0 female 22.0 0.750 yes 2 14
## 1452 0 male 22.0 4.000 yes 4 14
## 1453 0 male 42.0 0.125 no 4 17
## 1456 0 male 27.0 1.500 yes 4 18
## 1464 0 male 27.0 7.000 yes 3 16
## 1469 0 female 52.0 15.000 yes 4 14
## 1473 0 male 27.0 1.500 no 5 20
## 1481 0 female 27.0 1.500 no 2 16
## 1482 0 female 27.0 1.500 no 3 17
## 1496 0 male 22.0 0.125 no 5 16
## 1497 0 female 27.0 4.000 yes 4 16
## 1504 0 female 27.0 4.000 yes 4 12
## 1513 0 female 47.0 15.000 yes 2 14
## 1515 0 female 32.0 15.000 yes 3 14
## 1534 0 male 42.0 7.000 yes 2 16
## 1535 0 male 22.0 0.750 no 4 16
## 1536 0 male 27.0 0.125 no 3 20
## 1540 0 male 32.0 10.000 yes 3 20
## 1551 0 female 22.0 0.417 no 5 14
## 1555 0 female 47.0 15.000 yes 5 14
## 1557 0 female 32.0 10.000 yes 3 14
## 1566 0 male 57.0 15.000 yes 4 17
## 1567 0 male 27.0 4.000 yes 3 20
## 1576 0 female 32.0 7.000 yes 4 17
## 1584 0 female 37.0 10.000 yes 4 16
## 1585 0 female 32.0 10.000 yes 1 18
## 1590 0 female 22.0 4.000 no 3 14
## 1594 0 female 27.0 7.000 yes 4 14
## 1595 0 male 57.0 15.000 yes 5 18
## 1603 0 male 32.0 7.000 yes 2 18
## 1608 0 female 27.0 1.500 no 4 17
## 1609 0 male 22.0 1.500 no 4 14
## 1615 0 female 22.0 1.500 yes 4 14
## 1616 0 female 32.0 7.000 yes 3 16
## 1617 0 female 47.0 15.000 yes 3 16
## 1620 0 female 22.0 0.750 no 3 16
## 1621 0 female 22.0 1.500 yes 2 14
## 1637 0 female 27.0 4.000 yes 1 16
## 1638 0 male 52.0 15.000 yes 4 16
## 1650 0 male 32.0 10.000 yes 4 20
## 1654 0 male 47.0 15.000 yes 4 16
## 1665 0 female 27.0 7.000 yes 2 14
## 1670 0 female 22.0 1.500 no 4 14
## 1671 0 female 32.0 10.000 yes 2 16
## 1675 0 female 22.0 0.750 no 2 16
## 1688 0 female 22.0 1.500 no 2 16
## 1691 0 female 42.0 15.000 yes 3 18
## 1695 0 female 27.0 7.000 yes 5 14
## 1698 0 male 42.0 15.000 yes 4 16
## 1704 0 female 57.0 15.000 yes 3 18
## 1705 0 male 42.0 15.000 yes 3 18
## 1711 0 female 32.0 7.000 yes 2 14
## 1719 0 male 22.0 4.000 no 5 12
## 1723 0 female 22.0 1.500 no 1 16
## 1726 0 female 22.0 0.750 no 1 14
## 1749 0 female 32.0 15.000 yes 4 12
## 1752 0 male 22.0 1.500 no 2 18
## 1754 0 male 27.0 4.000 yes 5 17
## 1758 0 female 27.0 4.000 yes 4 12
## 1761 0 male 42.0 15.000 yes 5 18
## 1773 0 male 32.0 1.500 no 2 20
## 1775 0 male 57.0 15.000 no 4 9
## 1786 0 male 37.0 7.000 no 4 18
## 1793 0 male 52.0 15.000 yes 2 17
## 1799 0 male 47.0 15.000 yes 4 17
## 1803 0 female 27.0 7.000 no 2 17
## 1806 0 female 27.0 7.000 yes 4 14
## 1807 0 female 22.0 4.000 no 2 14
## 1808 0 male 37.0 7.000 yes 2 20
## 1814 0 male 27.0 7.000 no 4 12
## 1815 0 male 42.0 10.000 yes 4 18
## 1818 0 female 22.0 1.500 no 3 14
## 1827 0 female 22.0 4.000 yes 2 14
## 1834 0 female 57.0 15.000 no 4 20
## 1835 0 male 37.0 15.000 yes 4 14
## 1843 0 female 27.0 7.000 yes 3 18
## 1846 0 female 17.5 10.000 no 4 14
## 1850 0 male 22.0 4.000 yes 4 16
## 1851 0 female 27.0 4.000 yes 2 16
## 1854 0 female 37.0 15.000 yes 2 14
## 1859 0 female 22.0 1.500 no 5 14
## 1861 0 male 27.0 7.000 yes 2 20
## 1866 0 male 27.0 4.000 yes 4 14
## 1873 0 male 22.0 0.125 no 1 16
## 1875 0 female 27.0 7.000 yes 4 14
## 1885 0 female 32.0 15.000 yes 5 16
## 1892 0 male 32.0 10.000 yes 4 18
## 1895 0 female 32.0 15.000 yes 2 14
## 1896 0 female 22.0 1.500 no 3 17
## 1897 0 male 27.0 4.000 yes 4 17
## 1899 0 female 52.0 15.000 yes 5 14
## 1904 0 female 27.0 7.000 yes 2 12
## 1905 0 female 27.0 7.000 yes 3 12
## 1908 0 female 42.0 15.000 yes 2 14
## 1916 0 female 42.0 15.000 yes 4 14
## 1918 0 male 27.0 7.000 yes 4 14
## 1920 0 male 27.0 7.000 yes 2 20
## 1930 0 female 42.0 15.000 yes 3 12
## 1940 0 male 27.0 4.000 yes 3 16
## 1947 0 female 27.0 7.000 yes 3 14
## 1949 0 female 22.0 1.500 no 2 14
## 1951 0 female 27.0 4.000 yes 4 14
## 1952 0 female 22.0 4.000 no 4 14
## 1960 0 female 22.0 1.500 no 2 16
## 9001 0 male 47.0 15.000 no 4 14
## 9012 0 male 37.0 10.000 yes 2 18
## 9023 0 male 37.0 15.000 yes 3 17
## 9029 0 female 27.0 4.000 yes 2 16
## 6 3 male 27.0 1.500 no 3 18
## 12 3 female 27.0 4.000 yes 3 17
## 43 7 male 37.0 15.000 yes 5 18
## 53 12 female 32.0 10.000 yes 3 17
## 67 1 male 22.0 0.125 no 4 16
## 79 1 female 22.0 1.500 yes 2 14
## 122 12 male 37.0 15.000 yes 4 14
## 126 7 female 22.0 1.500 no 2 14
## 133 2 male 37.0 15.000 yes 2 18
## 138 3 female 32.0 15.000 yes 4 12
## 154 1 female 37.0 15.000 yes 4 14
## 159 7 female 42.0 15.000 yes 3 17
## 174 12 female 42.0 15.000 yes 5 9
## 176 12 male 37.0 10.000 yes 2 20
## 181 12 female 32.0 15.000 yes 3 14
## 182 3 male 27.0 4.000 no 1 18
## 186 7 male 37.0 10.000 yes 2 18
## 189 7 female 27.0 4.000 no 3 17
## 204 1 male 42.0 15.000 yes 4 16
## 215 1 female 47.0 15.000 yes 5 14
## 232 7 female 27.0 4.000 yes 3 18
## 233 1 female 27.0 7.000 yes 5 14
## 252 12 male 27.0 1.500 yes 3 17
## 253 12 female 27.0 7.000 yes 4 14
## 274 3 female 42.0 15.000 yes 4 16
## 275 7 female 27.0 10.000 yes 4 12
## 287 1 male 27.0 1.500 no 2 18
## 288 1 male 32.0 4.000 no 4 20
## 325 1 female 27.0 7.000 yes 3 14
## 328 3 female 32.0 10.000 yes 4 14
## 344 3 male 27.0 4.000 yes 2 18
## 353 1 female 17.5 0.750 no 5 14
## 354 1 female 32.0 10.000 yes 4 18
## 367 7 female 32.0 7.000 yes 2 17
## 369 7 male 37.0 15.000 yes 2 20
## 390 7 female 37.0 10.000 no 1 20
## 392 12 female 32.0 10.000 yes 2 16
## 423 7 male 52.0 15.000 yes 2 20
## 432 7 female 42.0 15.000 yes 1 12
## 436 1 male 52.0 15.000 yes 2 20
## 483 2 male 37.0 15.000 yes 3 18
## 513 12 female 22.0 4.000 no 3 12
## 516 12 male 27.0 7.000 yes 1 18
## 518 1 male 27.0 4.000 yes 3 18
## 520 12 male 47.0 15.000 yes 4 17
## 526 12 female 42.0 15.000 yes 4 12
## 528 7 male 27.0 4.000 no 3 14
## 553 7 female 32.0 7.000 yes 4 18
## 576 1 male 32.0 0.417 yes 3 12
## 611 3 male 47.0 15.000 yes 5 16
## 625 12 male 37.0 15.000 yes 2 20
## 635 7 male 22.0 4.000 yes 2 17
## 646 1 male 27.0 4.000 no 2 14
## 657 7 female 52.0 15.000 yes 5 16
## 659 1 male 27.0 4.000 no 3 14
## 666 1 female 27.0 10.000 yes 4 16
## 679 1 male 32.0 7.000 yes 3 14
## 729 7 male 32.0 7.000 yes 2 18
## 755 3 male 22.0 1.500 no 1 14
## 758 7 male 22.0 4.000 yes 3 18
## 770 7 male 42.0 15.000 yes 4 20
## 786 2 female 57.0 15.000 yes 1 18
## 797 7 female 32.0 4.000 yes 3 18
## 811 1 male 27.0 4.000 yes 1 16
## 834 7 male 32.0 7.000 yes 4 16
## 858 2 male 57.0 15.000 yes 1 17
## 885 7 female 42.0 15.000 yes 4 14
## 893 7 male 37.0 10.000 yes 1 18
## 927 3 male 42.0 15.000 yes 3 17
## 928 1 female 52.0 15.000 yes 3 14
## 933 2 female 27.0 7.000 yes 3 17
## 951 12 male 32.0 7.000 yes 2 12
## 968 1 male 22.0 4.000 no 4 14
## 972 3 male 27.0 7.000 yes 3 18
## 975 12 female 37.0 15.000 yes 1 18
## 977 7 female 32.0 15.000 yes 3 17
## 981 7 female 27.0 7.000 no 2 17
## 986 1 female 32.0 7.000 yes 3 17
## 1002 1 male 32.0 1.500 yes 2 14
## 1007 12 female 42.0 15.000 yes 4 14
## 1011 7 male 32.0 10.000 yes 3 14
## 1035 7 male 37.0 4.000 yes 1 20
## 1050 1 female 27.0 4.000 yes 2 16
## 1056 12 female 42.0 15.000 yes 3 14
## 1057 1 male 27.0 10.000 yes 5 20
## 1075 12 male 37.0 10.000 yes 2 20
## 1080 12 female 27.0 7.000 yes 1 14
## 1125 3 female 27.0 7.000 yes 4 12
## 1131 3 male 32.0 10.000 yes 2 14
## 1138 12 female 17.5 0.750 yes 2 12
## 1150 12 female 32.0 15.000 yes 3 18
## 1163 2 female 22.0 7.000 no 4 14
## 1169 1 male 32.0 7.000 yes 4 20
## 1198 7 male 27.0 4.000 yes 2 18
## 1204 1 female 22.0 1.500 yes 5 14
## 1218 12 female 32.0 15.000 no 3 17
## 1230 12 female 42.0 15.000 yes 2 12
## 1236 7 male 42.0 15.000 yes 3 20
## 1247 12 male 32.0 10.000 no 2 18
## 1259 12 female 32.0 15.000 yes 3 9
## 1294 7 male 57.0 15.000 yes 5 20
## 1353 12 male 47.0 15.000 yes 4 20
## 1370 2 female 42.0 15.000 yes 2 17
## 1427 12 male 37.0 15.000 yes 3 17
## 1445 12 male 37.0 15.000 yes 5 17
## 1460 7 male 27.0 10.000 yes 2 20
## 1480 2 male 37.0 15.000 yes 2 16
## 1505 12 female 32.0 15.000 yes 1 14
## 1543 7 male 32.0 10.000 yes 3 17
## 1548 2 male 37.0 15.000 yes 4 18
## 1550 7 female 27.0 1.500 no 2 17
## 1561 3 female 47.0 15.000 yes 2 17
## 1564 12 male 37.0 15.000 yes 2 17
## 1573 12 female 27.0 4.000 no 2 14
## 1575 2 female 27.0 10.000 yes 4 14
## 1599 1 female 22.0 4.000 yes 3 16
## 1622 12 male 52.0 7.000 no 4 16
## 1629 2 female 27.0 4.000 yes 1 16
## 1664 7 female 37.0 15.000 yes 2 17
## 1669 2 female 27.0 4.000 no 1 17
## 1674 12 female 17.5 0.750 yes 2 12
## 1682 7 female 32.0 15.000 yes 5 18
## 1685 7 female 22.0 4.000 no 1 16
## 1697 2 male 32.0 4.000 yes 4 18
## 1716 1 female 22.0 1.500 yes 3 18
## 1730 3 female 42.0 15.000 yes 2 17
## 1731 1 male 32.0 7.000 yes 4 16
## 1732 12 male 37.0 15.000 no 3 14
## 1743 1 male 42.0 15.000 yes 3 16
## 1751 1 male 27.0 4.000 yes 1 18
## 1757 2 male 37.0 15.000 yes 4 20
## 1763 7 male 37.0 15.000 yes 3 20
## 1766 3 male 22.0 1.500 no 2 12
## 1772 3 male 32.0 4.000 yes 3 20
## 1776 2 male 32.0 15.000 yes 5 20
## 1782 12 female 52.0 15.000 yes 1 18
## 1784 12 male 47.0 15.000 no 1 18
## 1791 3 female 32.0 15.000 yes 4 16
## 1831 7 female 32.0 15.000 yes 3 14
## 1840 7 female 27.0 7.000 yes 4 16
## 1844 12 male 42.0 15.000 yes 3 18
## 1856 7 female 42.0 15.000 yes 2 14
## 1876 12 male 27.0 7.000 yes 2 17
## 1929 3 male 32.0 10.000 yes 4 14
## 1935 7 male 47.0 15.000 yes 3 16
## 1938 1 male 22.0 1.500 yes 1 12
## 1941 7 female 32.0 10.000 yes 2 18
## 1954 2 male 32.0 10.000 yes 2 17
## 1959 2 male 22.0 7.000 yes 3 18
## 9010 1 female 32.0 15.000 yes 3 14
## occupation rating
## 4 7 4
## 5 6 4
## 11 1 4
## 16 6 5
## 23 6 3
## 29 5 5
## 44 1 3
## 45 4 4
## 47 1 2
## 49 4 5
## 50 7 2
## 55 6 4
## 64 6 4
## 80 5 4
## 86 5 4
## 93 5 5
## 108 4 3
## 114 5 4
## 115 5 5
## 116 1 5
## 123 5 5
## 127 5 5
## 129 5 4
## 134 1 5
## 137 6 4
## 139 5 5
## 147 1 5
## 151 6 4
## 153 5 5
## 155 5 4
## 162 6 3
## 163 5 5
## 165 6 4
## 168 5 4
## 170 6 4
## 172 5 3
## 184 1 4
## 187 5 5
## 192 3 5
## 194 1 5
## 210 6 5
## 217 6 3
## 220 1 4
## 224 6 1
## 227 5 3
## 228 5 5
## 239 6 5
## 241 5 4
## 245 3 5
## 249 6 4
## 262 3 5
## 265 6 4
## 267 6 5
## 269 6 4
## 271 5 5
## 277 3 1
## 290 4 4
## 292 1 5
## 293 5 4
## 295 6 4
## 299 7 2
## 320 6 4
## 321 5 5
## 324 4 5
## 334 6 4
## 351 3 5
## 355 6 4
## 361 6 5
## 362 5 5
## 366 6 4
## 370 5 5
## 374 4 5
## 378 5 5
## 381 4 3
## 382 5 5
## 383 5 5
## 384 5 5
## 400 6 5
## 403 5 5
## 409 7 2
## 412 6 5
## 413 6 5
## 416 3 4
## 418 3 3
## 422 1 5
## 435 6 4
## 439 5 5
## 445 5 2
## 447 1 5
## 448 1 4
## 449 3 4
## 478 5 3
## 482 5 4
## 486 5 3
## 489 3 4
## 490 1 5
## 491 6 1
## 492 6 4
## 503 5 5
## 508 6 5
## 509 1 5
## 512 6 4
## 515 4 5
## 517 6 5
## 532 6 4
## 533 6 4
## 535 5 4
## 537 5 4
## 538 6 3
## 543 5 4
## 547 1 5
## 550 6 5
## 558 5 5
## 571 7 4
## 578 5 4
## 583 6 5
## 586 3 2
## 594 6 4
## 597 1 5
## 602 2 4
## 603 2 5
## 604 5 4
## 612 5 3
## 613 1 2
## 621 5 5
## 627 6 5
## 630 4 5
## 631 5 5
## 632 6 5
## 639 6 5
## 645 1 3
## 647 6 5
## 648 6 5
## 651 6 3
## 655 5 3
## 667 5 4
## 670 6 5
## 671 1 4
## 673 5 5
## 701 5 4
## 705 1 4
## 706 6 4
## 709 5 3
## 717 5 4
## 719 6 3
## 723 5 4
## 724 5 4
## 726 1 3
## 734 6 4
## 735 5 4
## 736 6 3
## 737 4 3
## 739 5 3
## 743 6 3
## 745 4 3
## 747 6 3
## 751 1 1
## 752 1 2
## 754 5 3
## 760 5 5
## 763 6 5
## 774 5 5
## 776 6 5
## 779 5 4
## 784 1 4
## 788 1 2
## 794 5 2
## 795 5 4
## 798 3 4
## 800 2 2
## 803 6 5
## 807 4 5
## 812 1 5
## 820 5 4
## 823 6 5
## 830 5 2
## 843 5 4
## 848 1 4
## 851 1 3
## 854 5 4
## 856 6 2
## 857 6 5
## 859 5 3
## 863 5 5
## 865 3 5
## 867 1 5
## 870 5 5
## 873 5 5
## 875 4 5
## 876 5 5
## 877 4 5
## 880 1 4
## 903 5 4
## 904 5 4
## 905 1 3
## 908 5 4
## 909 5 3
## 910 4 5
## 912 6 2
## 914 5 5
## 915 3 5
## 916 3 5
## 920 6 5
## 921 3 3
## 925 5 2
## 926 5 3
## 929 1 2
## 931 1 5
## 945 4 4
## 947 5 3
## 949 3 3
## 950 5 5
## 961 6 2
## 965 5 5
## 966 4 5
## 967 6 5
## 987 6 5
## 990 5 3
## 992 5 4
## 995 4 5
## 1009 6 3
## 1021 6 5
## 1026 5 5
## 1027 5 4
## 1030 1 4
## 1031 3 5
## 1034 5 4
## 1037 2 4
## 1038 5 5
## 1039 4 5
## 1045 5 4
## 1046 5 5
## 1054 6 5
## 1059 6 4
## 1063 6 5
## 1068 5 3
## 1070 1 5
## 1072 5 5
## 1073 5 5
## 1077 4 4
## 1081 5 4
## 1083 1 4
## 1084 5 5
## 1086 5 5
## 1087 3 4
## 1089 2 4
## 1096 5 3
## 1102 3 5
## 1103 5 4
## 1107 6 4
## 1109 2 3
## 1115 6 4
## 1119 5 5
## 1124 1 4
## 1126 6 5
## 1128 1 4
## 1129 1 5
## 1130 6 4
## 1133 2 4
## 1140 1 5
## 1143 1 5
## 1146 5 3
## 1153 6 4
## 1156 5 5
## 1157 1 3
## 1158 5 5
## 1160 1 1
## 1161 5 5
## 1166 4 5
## 1177 4 5
## 1178 6 5
## 1180 5 5
## 1187 6 3
## 1191 6 3
## 1195 5 4
## 1207 5 1
## 1208 6 3
## 1209 5 5
## 1211 6 4
## 1215 5 5
## 1221 6 4
## 1226 1 4
## 1229 5 4
## 1231 1 3
## 1234 1 4
## 1235 4 2
## 1242 3 5
## 1245 4 2
## 1260 1 2
## 1266 5 4
## 1271 4 4
## 1273 5 5
## 1276 5 4
## 1280 5 3
## 1282 1 5
## 1285 5 3
## 1295 5 5
## 1298 5 5
## 1299 5 3
## 1304 7 4
## 1305 4 2
## 1311 5 5
## 1314 6 4
## 1319 4 5
## 1322 3 4
## 1324 5 4
## 1327 4 5
## 1328 3 3
## 1330 5 3
## 1332 4 5
## 1333 3 2
## 1336 1 5
## 1341 6 5
## 1344 4 4
## 1352 1 3
## 1358 5 5
## 1359 1 5
## 1361 7 4
## 1364 1 4
## 1368 2 4
## 1384 1 5
## 1390 5 5
## 1393 6 4
## 1394 1 5
## 1402 3 5
## 1407 6 5
## 1408 6 4
## 1412 6 5
## 1413 5 5
## 1416 5 3
## 1417 4 4
## 1418 1 4
## 1419 5 4
## 1420 1 5
## 1423 4 5
## 1424 4 2
## 1432 5 5
## 1433 1 5
## 1437 1 5
## 1438 1 5
## 1439 6 5
## 1446 1 3
## 1450 4 4
## 1451 1 5
## 1452 2 4
## 1453 6 4
## 1456 6 5
## 1464 6 3
## 1469 1 3
## 1473 5 2
## 1481 5 5
## 1482 5 5
## 1496 4 4
## 1497 1 5
## 1504 1 5
## 1513 5 5
## 1515 5 3
## 1534 5 5
## 1535 6 4
## 1536 6 5
## 1540 6 5
## 1551 4 5
## 1555 1 4
## 1557 1 5
## 1566 5 5
## 1567 6 5
## 1576 1 5
## 1584 1 5
## 1585 1 4
## 1590 1 4
## 1594 3 2
## 1595 5 2
## 1603 5 5
## 1608 1 3
## 1609 5 5
## 1615 5 4
## 1616 1 5
## 1617 5 4
## 1620 1 5
## 1621 5 5
## 1637 5 5
## 1638 5 5
## 1650 6 5
## 1654 6 4
## 1665 1 2
## 1670 4 5
## 1671 5 4
## 1675 5 4
## 1688 5 5
## 1691 6 4
## 1695 4 5
## 1698 4 4
## 1704 5 2
## 1705 6 2
## 1711 1 2
## 1719 4 5
## 1723 6 5
## 1726 4 5
## 1749 1 5
## 1752 5 3
## 1754 2 5
## 1758 1 5
## 1761 5 4
## 1773 7 3
## 1775 3 1
## 1786 5 5
## 1793 5 4
## 1799 6 5
## 1803 5 4
## 1806 5 5
## 1807 3 3
## 1808 6 5
## 1814 4 3
## 1815 6 4
## 1818 1 5
## 1827 1 3
## 1834 6 5
## 1835 4 3
## 1843 5 5
## 1846 4 5
## 1850 5 5
## 1851 1 4
## 1854 5 1
## 1859 1 4
## 1861 5 4
## 1866 5 5
## 1873 3 5
## 1875 1 4
## 1885 5 3
## 1892 5 4
## 1895 3 4
## 1896 5 5
## 1897 4 4
## 1899 1 5
## 1904 1 2
## 1905 1 4
## 1908 1 4
## 1916 5 4
## 1918 3 3
## 1920 6 2
## 1930 3 3
## 1940 3 5
## 1947 1 4
## 1949 4 5
## 1951 1 4
## 1952 5 5
## 1960 4 5
## 9001 5 4
## 9012 6 2
## 9023 5 4
## 9029 1 4
## 6 4 4
## 12 1 5
## 43 6 2
## 53 5 2
## 67 5 5
## 79 1 5
## 122 5 2
## 126 3 4
## 133 6 4
## 138 3 2
## 154 4 2
## 159 1 4
## 174 4 1
## 176 6 2
## 181 1 2
## 182 6 5
## 186 7 3
## 189 5 5
## 204 5 5
## 215 4 5
## 232 5 4
## 233 1 4
## 252 5 4
## 253 6 2
## 274 5 4
## 275 7 3
## 287 5 2
## 288 6 4
## 325 1 3
## 328 1 4
## 344 7 2
## 353 4 5
## 354 1 5
## 367 6 4
## 369 6 4
## 390 5 3
## 392 5 5
## 423 6 4
## 432 1 3
## 436 6 3
## 483 6 5
## 513 3 4
## 516 6 2
## 518 5 5
## 520 6 5
## 526 1 1
## 528 3 4
## 553 4 5
## 576 3 4
## 611 5 4
## 625 5 4
## 635 6 4
## 646 4 5
## 657 1 3
## 659 3 3
## 666 1 4
## 679 7 4
## 729 4 1
## 755 3 2
## 758 6 4
## 770 6 4
## 786 5 4
## 797 5 2
## 811 4 4
## 834 1 4
## 858 4 4
## 885 5 2
## 893 5 3
## 927 6 1
## 928 4 4
## 933 5 3
## 951 4 2
## 968 2 5
## 972 6 4
## 975 5 5
## 977 1 3
## 981 5 5
## 986 5 3
## 1002 2 4
## 1007 1 2
## 1011 5 4
## 1035 6 3
## 1050 5 3
## 1056 4 3
## 1057 6 5
## 1075 6 2
## 1080 3 3
## 1125 1 2
## 1131 4 4
## 1138 1 3
## 1150 5 4
## 1163 4 3
## 1169 6 5
## 1198 6 2
## 1204 5 3
## 1218 5 1
## 1230 1 2
## 1236 5 4
## 1247 4 2
## 1259 1 1
## 1294 4 5
## 1353 6 4
## 1370 6 3
## 1427 6 3
## 1445 5 2
## 1460 6 4
## 1480 5 4
## 1505 5 2
## 1543 6 3
## 1548 5 1
## 1550 5 5
## 1561 5 2
## 1564 5 4
## 1573 5 5
## 1575 1 5
## 1599 1 3
## 1622 5 5
## 1629 3 5
## 1664 6 4
## 1669 3 1
## 1674 3 5
## 1682 5 4
## 1685 3 5
## 1697 6 4
## 1716 5 2
## 1730 5 4
## 1731 4 4
## 1732 6 2
## 1743 6 3
## 1751 5 4
## 1757 7 3
## 1763 6 4
## 1766 3 3
## 1772 6 2
## 1776 6 5
## 1782 5 5
## 1784 6 5
## 1791 4 4
## 1831 3 2
## 1840 1 2
## 1844 6 2
## 1856 3 2
## 1876 5 4
## 1929 4 3
## 1935 4 2
## 1938 2 5
## 1941 5 4
## 1954 6 5
## 1959 6 2
## 9010 1 5
该数据从601个参与者身上收集了9个变量,包括一年来婚外私通的频率以及参与者性别、年龄、婚龄、是否 有小孩、宗教信仰程度(5分制, 1分表示反对, 5分表示非常信仰)、学历、职业(逆向编号的戈 登7种分类),还有对婚姻的自我评分(5分制, 1表示非常不幸福, 5表示非常幸福)。
2、展现各变量的情况
summary(Affairs)
## affairs gender age yearsmarried children
## Min. : 0.000 female:315 Min. :17.50 Min. : 0.125 no :171
## 1st Qu.: 0.000 male :286 1st Qu.:27.00 1st Qu.: 4.000 yes:430
## Median : 0.000 Median :32.00 Median : 7.000
## Mean : 1.456 Mean :32.49 Mean : 8.178
## 3rd Qu.: 0.000 3rd Qu.:37.00 3rd Qu.:15.000
## Max. :12.000 Max. :57.00 Max. :15.000
## religiousness education occupation rating
## Min. :1.000 Min. : 9.00 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:14.00 1st Qu.:3.000 1st Qu.:3.000
## Median :3.000 Median :16.00 Median :5.000 Median :4.000
## Mean :3.116 Mean :16.17 Mean :4.195 Mean :3.932
## 3rd Qu.:4.000 3rd Qu.:18.00 3rd Qu.:6.000 3rd Qu.:5.000
## Max. :5.000 Max. :20.00 Max. :7.000 Max. :5.000
假设我们感兴趣的是二值型结果(有过一次以上婚外情/没有过婚外情)。 我们需要将affairs转化为二值型因子ynaffair
3、将affairs转化为二值型因子ynaffair
Affairs$ynaffair[Affairs$affairs > 0] <- 1
Affairs$ynaffair[Affairs$affairs == 0] <- 0
Affairs$ynaffair <- factor(Affairs$ynaffair, levels = c(0,
1), labels = c("No", "Yes"))
table(Affairs$ynaffair)
##
## No Yes
## 451 150
5、构建逻辑回归模型
fit.full <- glm(ynaffair ~ gender + age + yearsmarried +
children + religiousness + education + occupation + rating,
data = Affairs, family = binomial())
summary(fit.full)
##
## Call:
## glm(formula = ynaffair ~ gender + age + yearsmarried + children +
## religiousness + education + occupation + rating, family = binomial(),
## data = Affairs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5713 -0.7499 -0.5690 -0.2539 2.5191
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.37726 0.88776 1.551 0.120807
## gendermale 0.28029 0.23909 1.172 0.241083
## age -0.04426 0.01825 -2.425 0.015301 *
## yearsmarried 0.09477 0.03221 2.942 0.003262 **
## childrenyes 0.39767 0.29151 1.364 0.172508
## religiousness -0.32472 0.08975 -3.618 0.000297 ***
## education 0.02105 0.05051 0.417 0.676851
## occupation 0.03092 0.07178 0.431 0.666630
## rating -0.46845 0.09091 -5.153 2.56e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 675.38 on 600 degrees of freedom
## Residual deviance: 609.51 on 592 degrees of freedom
## AIC: 627.51
##
## Number of Fisher Scoring iterations: 4
从回归系数的p值(最后一栏)可以看到,性别、是否有孩子、学历和职业对方程的贡献都 不显著(你无法拒绝参数为0的假设)。
6、去除掉不显著的变量,构建回归方程
fit.reduced <- glm(ynaffair ~ age + yearsmarried +
religiousness + rating, data = Affairs, family = binomial())
summary(fit.reduced)
##
## Call:
## glm(formula = ynaffair ~ age + yearsmarried + religiousness +
## rating, family = binomial(), data = Affairs)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6278 -0.7550 -0.5701 -0.2624 2.3998
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.93083 0.61032 3.164 0.001558 **
## age -0.03527 0.01736 -2.032 0.042127 *
## yearsmarried 0.10062 0.02921 3.445 0.000571 ***
## religiousness -0.32902 0.08945 -3.678 0.000235 ***
## rating -0.46136 0.08884 -5.193 2.06e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 675.38 on 600 degrees of freedom
## Residual deviance: 615.36 on 596 degrees of freedom
## AIC: 625.36
##
## Number of Fisher Scoring iterations: 4
比之前的结果更加显著