7.1 OLS 回归



7.1.1 回归方程



一元线性回归方程:\[{\hat{y}=\beta_0+\beta_1x}\]

其中: \(\hat{y}\)为解释变量的估计值。\(\beta_0\)为回归线的截距。\(\beta_1\)为回归线的斜率。\(\beta_0\)\(\beta_1\)被称为回归系数。

\((y-\hat{y})\)被称为预测误差或残差。最小二乘回归用于获得使所有数据点残差平方和最小的回归线。



假定上述方程中有n个观察值,则:\[{{y_i}=\beta_0+\beta_1x_i+\varepsilon_i, i=1,2,\cdots,n}\]

最小二乘法是拟合出残差平方和最小的直线,残差平方和SSE为:

\[SSE=\sum_{i=1}^{n}{\varepsilon_i}^2=\sum_{i=1}^{n}(y_i-\beta_0-\beta_1x_i)^2\]



为了能够解释OLS模型的系数,数据必须满足以下统计假设:

  1. 正态性 对于固定的自变量值,因变量值成正态分布。

  2. 独立性 \(y_i\)值之间相互独立。

  3. 线性 因变量与自变量之间为线性相关。

  4. 同方差性 因变量的方差不随自变量的水平不同而变化。

如果违背了以上假设,得到的统计显著性检验结果和所得的置信区间很可能就不精确。注意,OLS回归还假定自变量是固定的且测量无误差,但在实践中通常都放松了这个假设。



7.1.2 简单线性回归



通过一个简单线性回归示例来熟悉R操作。使用基础安装中的数据集women进行回归分析。

该数据集提供了15位女性的身高和体重信息,我们想获得一个通过身高来预测体重的模型。



  1. 展示women数据集
women
##    height weight
## 1      58    115
## 2      59    117
## 3      60    120
## 4      61    123
## 5      62    126
## 6      63    129
## 7      64    132
## 8      65    135
## 9      66    139
## 10     67    142
## 11     68    146
## 12     69    150
## 13     70    154
## 14     71    159
## 15     72    164


  1. 利用lm()函数,建立回归模型
fit <- lm(weight ~ height, data = women)


  1. 利用summary()函数,展示拟合模型的详细结果
summary(fit)
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

通过输出结果,可以得到预测等式: \[ weight=-87.51667+3.45000*height \]

因为身高不可能为0,所以没必要给截距项(-87.51667)一个物理解释,它仅仅是一个常量调整项。

\(Pr(>|t|)\)一栏,可以看到回归系数(3.45)显著不为0(p<0.001)。表明身高每增高1英寸,体重将预期增加3.45磅。

\(R^2\)拟合优度为0.991。R²的值越接近1,说明回归直线对观测值的拟合程度越好。



  1. 展示women数据集中weight数据
women$weight
##  [1] 115 117 120 123 126 129 132 135 139 142 146 150 154 159 164
  1. 利用fitted()函数,得到拟合模型预测的weight数据
fitted(fit)
##        1        2        3        4        5        6        7        8 
## 112.5833 116.0333 119.4833 122.9333 126.3833 129.8333 133.2833 136.7333 
##        9       10       11       12       13       14       15 
## 140.1833 143.6333 147.0833 150.5333 153.9833 157.4333 160.8833


  1. 利用residuals()函数,得到拟合模型的残差值
residuals(fit)
##           1           2           3           4           5           6 
##  2.41666667  0.96666667  0.51666667  0.06666667 -0.38333333 -0.83333333 
##           7           8           9          10          11          12 
## -1.28333333 -1.73333333 -1.18333333 -1.63333333 -1.08333333 -0.53333333 
##          13          14          15 
##  0.01666667  1.56666667  3.11666667


  1. 画出身高体重的散点图,以及拟合的直线
plot(women$height, women$weight, main = "Women", 
    xlab = "Height (in inches)", ylab = "Weight (in pounds)")
# add the line of best fit
abline(fit)



7.1.3 多项式回归



通过观察上图,发现用曲线来拟合应该会提高回归的预测精度。因此,我们添加一个hight的二次项。



  1. 建立增加二次项的回归模型
fit2 <- lm(weight ~ height + I(height^2), data = women)

注意:I(height^2)表示height的平方项。因为^符号在回归表达式中有特殊的含义,所以此处必须要用I()函数。



  1. 展示回归结果
summary(fit2)
## 
## Call:
## lm(formula = weight ~ height + I(height^2), data = women)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50941 -0.29611 -0.00941  0.28615  0.59706 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 261.87818   25.19677  10.393 2.36e-07 ***
## height       -7.34832    0.77769  -9.449 6.58e-07 ***
## I(height^2)   0.08306    0.00598  13.891 9.32e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3841 on 12 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9994 
## F-statistic: 1.139e+04 on 2 and 12 DF,  p-value: < 2.2e-16

通过输出结果,可以得到预测等式: \[ weight=261.87818-7.34832*height+0.08306*height^2 \]

\(Pr(>|t|)\)一栏,可以看到回归系数都显著不为0(p<0.001)。

\(R^2\)拟合优度为0.9995。比之前的高,说明加入二次项的回归模型拟合的效果更好。



  1. 画出身高体重的散点图,以及拟合的曲线
plot(women$height, women$weight, main = "Women", 
    xlab = "Height (in inches)", ylab = "Weight (in lbs)")
lines(women$height, fitted(fit2))



7.1.4 多元线性回归



当预测变量不止一个时,简单线性回归就变成了多元线性回归,分析也更加复杂。

以基础包中的state.x77数据集为例,来探究一个州的犯罪率和其他因素的关系,其他因素包括 人口、文盲率、平均收入和结霜天数(温度在冰点以下的天数)。



  1. 展示数据集state.x77
state.x77
##                Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama              3615   3624        2.1    69.05   15.1    41.3    20
## Alaska                365   6315        1.5    69.31   11.3    66.7   152
## Arizona              2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas             2110   3378        1.9    70.66   10.1    39.9    65
## California          21198   5114        1.1    71.71   10.3    62.6    20
## Colorado             2541   4884        0.7    72.06    6.8    63.9   166
## Connecticut          3100   5348        1.1    72.48    3.1    56.0   139
## Delaware              579   4809        0.9    70.06    6.2    54.6   103
## Florida              8277   4815        1.3    70.66   10.7    52.6    11
## Georgia              4931   4091        2.0    68.54   13.9    40.6    60
## Hawaii                868   4963        1.9    73.60    6.2    61.9     0
## Idaho                 813   4119        0.6    71.87    5.3    59.5   126
## Illinois            11197   5107        0.9    70.14   10.3    52.6   127
## Indiana              5313   4458        0.7    70.88    7.1    52.9   122
## Iowa                 2861   4628        0.5    72.56    2.3    59.0   140
## Kansas               2280   4669        0.6    72.58    4.5    59.9   114
## Kentucky             3387   3712        1.6    70.10   10.6    38.5    95
## Louisiana            3806   3545        2.8    68.76   13.2    42.2    12
## Maine                1058   3694        0.7    70.39    2.7    54.7   161
## Maryland             4122   5299        0.9    70.22    8.5    52.3   101
## Massachusetts        5814   4755        1.1    71.83    3.3    58.5   103
## Michigan             9111   4751        0.9    70.63   11.1    52.8   125
## Minnesota            3921   4675        0.6    72.96    2.3    57.6   160
## Mississippi          2341   3098        2.4    68.09   12.5    41.0    50
## Missouri             4767   4254        0.8    70.69    9.3    48.8   108
## Montana               746   4347        0.6    70.56    5.0    59.2   155
## Nebraska             1544   4508        0.6    72.60    2.9    59.3   139
## Nevada                590   5149        0.5    69.03   11.5    65.2   188
## New Hampshire         812   4281        0.7    71.23    3.3    57.6   174
## New Jersey           7333   5237        1.1    70.93    5.2    52.5   115
## New Mexico           1144   3601        2.2    70.32    9.7    55.2   120
## New York            18076   4903        1.4    70.55   10.9    52.7    82
## North Carolina       5441   3875        1.8    69.21   11.1    38.5    80
## North Dakota          637   5087        0.8    72.78    1.4    50.3   186
## Ohio                10735   4561        0.8    70.82    7.4    53.2   124
## Oklahoma             2715   3983        1.1    71.42    6.4    51.6    82
## Oregon               2284   4660        0.6    72.13    4.2    60.0    44
## Pennsylvania        11860   4449        1.0    70.43    6.1    50.2   126
## Rhode Island          931   4558        1.3    71.90    2.4    46.4   127
## South Carolina       2816   3635        2.3    67.96   11.6    37.8    65
## South Dakota          681   4167        0.5    72.08    1.7    53.3   172
## Tennessee            4173   3821        1.7    70.11   11.0    41.8    70
## Texas               12237   4188        2.2    70.90   12.2    47.4    35
## Utah                 1203   4022        0.6    72.90    4.5    67.3   137
## Vermont               472   3907        0.6    71.64    5.5    57.1   168
## Virginia             4981   4701        1.4    70.08    9.5    47.8    85
## Washington           3559   4864        0.6    71.72    4.3    63.5    32
## West Virginia        1799   3617        1.4    69.48    6.7    41.6   100
## Wisconsin            4589   4468        0.7    72.48    3.0    54.5   149
## Wyoming               376   4566        0.6    70.29    6.9    62.9   173
##                  Area
## Alabama         50708
## Alaska         566432
## Arizona        113417
## Arkansas        51945
## California     156361
## Colorado       103766
## Connecticut      4862
## Delaware         1982
## Florida         54090
## Georgia         58073
## Hawaii           6425
## Idaho           82677
## Illinois        55748
## Indiana         36097
## Iowa            55941
## Kansas          81787
## Kentucky        39650
## Louisiana       44930
## Maine           30920
## Maryland         9891
## Massachusetts    7826
## Michigan        56817
## Minnesota       79289
## Mississippi     47296
## Missouri        68995
## Montana        145587
## Nebraska        76483
## Nevada         109889
## New Hampshire    9027
## New Jersey       7521
## New Mexico     121412
## New York        47831
## North Carolina  48798
## North Dakota    69273
## Ohio            40975
## Oklahoma        68782
## Oregon          96184
## Pennsylvania    44966
## Rhode Island     1049
## South Carolina  30225
## South Dakota    75955
## Tennessee       41328
## Texas          262134
## Utah            82096
## Vermont          9267
## Virginia        39780
## Washington      66570
## West Virginia   24070
## Wisconsin       54464
## Wyoming         97203


因为lm()函数需数据框格式(state.x77数据集是矩阵),为处理方便,需要将其转化为数据框格式。

class(state.x77)
## [1] "matrix"


  1. 采用as.data.frame将其转化
states <- as.data.frame(state.x77[, c("Murder", "Population", 
    "Illiteracy", "Income", "Frost")])
states
##                Murder Population Illiteracy Income Frost
## Alabama          15.1       3615        2.1   3624    20
## Alaska           11.3        365        1.5   6315   152
## Arizona           7.8       2212        1.8   4530    15
## Arkansas         10.1       2110        1.9   3378    65
## California       10.3      21198        1.1   5114    20
## Colorado          6.8       2541        0.7   4884   166
## Connecticut       3.1       3100        1.1   5348   139
## Delaware          6.2        579        0.9   4809   103
## Florida          10.7       8277        1.3   4815    11
## Georgia          13.9       4931        2.0   4091    60
## Hawaii            6.2        868        1.9   4963     0
## Idaho             5.3        813        0.6   4119   126
## Illinois         10.3      11197        0.9   5107   127
## Indiana           7.1       5313        0.7   4458   122
## Iowa              2.3       2861        0.5   4628   140
## Kansas            4.5       2280        0.6   4669   114
## Kentucky         10.6       3387        1.6   3712    95
## Louisiana        13.2       3806        2.8   3545    12
## Maine             2.7       1058        0.7   3694   161
## Maryland          8.5       4122        0.9   5299   101
## Massachusetts     3.3       5814        1.1   4755   103
## Michigan         11.1       9111        0.9   4751   125
## Minnesota         2.3       3921        0.6   4675   160
## Mississippi      12.5       2341        2.4   3098    50
## Missouri          9.3       4767        0.8   4254   108
## Montana           5.0        746        0.6   4347   155
## Nebraska          2.9       1544        0.6   4508   139
## Nevada           11.5        590        0.5   5149   188
## New Hampshire     3.3        812        0.7   4281   174
## New Jersey        5.2       7333        1.1   5237   115
## New Mexico        9.7       1144        2.2   3601   120
## New York         10.9      18076        1.4   4903    82
## North Carolina   11.1       5441        1.8   3875    80
## North Dakota      1.4        637        0.8   5087   186
## Ohio              7.4      10735        0.8   4561   124
## Oklahoma          6.4       2715        1.1   3983    82
## Oregon            4.2       2284        0.6   4660    44
## Pennsylvania      6.1      11860        1.0   4449   126
## Rhode Island      2.4        931        1.3   4558   127
## South Carolina   11.6       2816        2.3   3635    65
## South Dakota      1.7        681        0.5   4167   172
## Tennessee        11.0       4173        1.7   3821    70
## Texas            12.2      12237        2.2   4188    35
## Utah              4.5       1203        0.6   4022   137
## Vermont           5.5        472        0.6   3907   168
## Virginia          9.5       4981        1.4   4701    85
## Washington        4.3       3559        0.6   4864    32
## West Virginia     6.7       1799        1.4   3617   100
## Wisconsin         3.0       4589        0.7   4468   149
## Wyoming           6.9        376        0.6   4566   173

Murder为谋杀率,Population为人口,Illiteracy为文盲率,Income为收入,Frost为结霜天气。



  1. 利用cor()函数,观察变量之间的相关系数
cor(states)
##                Murder Population Illiteracy     Income      Frost
## Murder      1.0000000  0.3436428  0.7029752 -0.2300776 -0.5388834
## Population  0.3436428  1.0000000  0.1076224  0.2082276 -0.3321525
## Illiteracy  0.7029752  0.1076224  1.0000000 -0.4370752 -0.6719470
## Income     -0.2300776  0.2082276 -0.4370752  1.0000000  0.2262822
## Frost      -0.5388834 -0.3321525 -0.6719470  0.2262822  1.0000000
  1. 利用图形观测变量之间关系
library(car)
## Loading required package: carData
scatterplotMatrix(states, spread = FALSE, lty.smooth = 1, 
    main = "Scatterplot Matrix")
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in plot.window(...): "spread"不是图形参数
## Warning in plot.window(...): "lty.smooth"不是图形参数
## Warning in plot.xy(xy, type, ...): "spread"不是图形参数
## Warning in plot.xy(xy, type, ...): "lty.smooth"不是图形参数
## Warning in title(...): "spread"不是图形参数
## Warning in title(...): "lty.smooth"不是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "spread"不是图
## 形参数
## Warning in axis(side = side, at = at, labels = labels, ...): "lty.smooth"不
## 是图形参数



  1. 使用lm()函数拟合多元线性回归模型
fit3 <- lm(Murder ~ Population + Illiteracy + Income + 
    Frost, data = states)
summary(fit3)
## 
## Call:
## lm(formula = Murder ~ Population + Illiteracy + Income + Frost, 
##     data = states)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.7960 -1.6495 -0.0811  1.4815  7.6210 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.235e+00  3.866e+00   0.319   0.7510    
## Population  2.237e-04  9.052e-05   2.471   0.0173 *  
## Illiteracy  4.143e+00  8.744e-01   4.738 2.19e-05 ***
## Income      6.442e-05  6.837e-04   0.094   0.9253    
## Frost       5.813e-04  1.005e-02   0.058   0.9541    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.535 on 45 degrees of freedom
## Multiple R-squared:  0.567,  Adjusted R-squared:  0.5285 
## F-statistic: 14.73 on 4 and 45 DF,  p-value: 9.133e-08

Illiteracy在p<0.001的水平下显著不为0。 Population在p<0.05的水平下显著不为0。



7.2 选择“最佳”的回归模型



在尝试获取一个回归方程时,实际上面临对着从众多可能的模型中做选择的问题。是不是所有的变量都要包括?抑或去掉那个对预测贡献不显著的变量?本节讨论的问题,就是如何在候选模型中进行筛选。



AIC(Akaike Information Criterion,赤池信息准则)可以用来比较模型,它考虑了模型的 统计拟合度以及用来拟合的参数数目。 AIC值越小的模型要优先选择,它说明模型用较少的参数 获得了足够的拟合度。

\[AIC=2K-2ln(L)\] 其中,k是变量的数量,L是似然函数。

在上节states的多元回归模型中,我们发现Income和Frost的回归系 数不显著,因此,我们可以检验不含这两个变量的模型与包含这两项的模型预测效果是否一样好。



fit4 <- lm(Murder ~ Population + Illiteracy + Income + 
    Frost, data = states)
fit5 <- lm(Murder ~ Population + Illiteracy, data = states)
AIC(fit4, fit5)
##      df      AIC
## fit4  6 241.6429
## fit5  4 237.6565

此处AIC值表明没有Income和Frost的模型更佳。



7.3 逻辑回归



当通过一系列连续型或类别型预测变量来预测二值型(0,1)结果变量时, Logistic回归是一个非 常有用的工具。Logistic回归模型假设Y服从二项分布,线性模型的拟合形式为:

\[ln(\frac{\pi }{1-\pi })=\beta _0+\sum_{j=1}^{n}\beta _jx_j\] 其中\(\pi\)是y的条件均值(即给定一系列x的值时y= 1的概率)。

以AER包中的数据框Affairs为例,我们将通过探究婚外情的数据来阐述Logistic回归的过程。



1、展现Affairs数据集

data(Affairs, package = "AER")
Affairs
##      affairs gender  age yearsmarried children religiousness education
## 4          0   male 37.0       10.000       no             3        18
## 5          0 female 27.0        4.000       no             4        14
## 11         0 female 32.0       15.000      yes             1        12
## 16         0   male 57.0       15.000      yes             5        18
## 23         0   male 22.0        0.750       no             2        17
## 29         0 female 32.0        1.500       no             2        17
## 44         0 female 22.0        0.750       no             2        12
## 45         0   male 57.0       15.000      yes             2        14
## 47         0 female 32.0       15.000      yes             4        16
## 49         0   male 22.0        1.500       no             4        14
## 50         0   male 37.0       15.000      yes             2        20
## 55         0   male 27.0        4.000      yes             4        18
## 64         0   male 47.0       15.000      yes             5        17
## 80         0 female 22.0        1.500       no             2        17
## 86         0 female 27.0        4.000       no             4        14
## 93         0 female 37.0       15.000      yes             1        17
## 108        0 female 37.0       15.000      yes             2        18
## 114        0 female 22.0        0.750       no             3        16
## 115        0 female 22.0        1.500       no             2        16
## 116        0 female 27.0       10.000      yes             2        14
## 123        0 female 22.0        1.500       no             2        16
## 127        0 female 22.0        1.500       no             2        16
## 129        0 female 27.0       10.000      yes             4        16
## 134        0 female 32.0       10.000      yes             3        14
## 137        0   male 37.0        4.000      yes             2        20
## 139        0 female 22.0        1.500       no             2        18
## 147        0 female 27.0        7.000       no             4        16
## 151        0   male 42.0       15.000      yes             5        20
## 153        0   male 27.0        4.000      yes             3        16
## 155        0 female 27.0        4.000      yes             3        17
## 162        0   male 42.0       15.000      yes             4        20
## 163        0 female 22.0        1.500       no             3        16
## 165        0   male 27.0        0.417       no             4        17
## 168        0 female 42.0       15.000      yes             5        14
## 170        0   male 32.0        4.000      yes             1        18
## 172        0 female 22.0        1.500       no             4        16
## 184        0 female 42.0       15.000      yes             3        12
## 187        0 female 22.0        4.000       no             4        17
## 192        0   male 22.0        1.500      yes             1        14
## 194        0 female 22.0        0.750       no             3        16
## 210        0   male 32.0       10.000      yes             5        20
## 217        0   male 52.0       15.000      yes             5        18
## 220        0 female 22.0        0.417       no             5        14
## 224        0 female 27.0        4.000      yes             2        18
## 227        0 female 32.0        7.000      yes             5        17
## 228        0   male 22.0        4.000       no             3        16
## 239        0 female 27.0        7.000      yes             4        18
## 241        0 female 42.0       15.000      yes             2        18
## 245        0   male 27.0        1.500      yes             4        16
## 249        0   male 42.0       15.000      yes             2        20
## 262        0 female 22.0        0.750       no             5        14
## 265        0   male 32.0        7.000      yes             2        20
## 267        0   male 27.0        4.000      yes             5        20
## 269        0   male 27.0       10.000      yes             4        20
## 271        0   male 22.0        4.000       no             1        18
## 277        0 female 37.0       15.000      yes             4        14
## 290        0   male 22.0        1.500      yes             5        16
## 292        0 female 37.0       15.000      yes             4        17
## 293        0 female 27.0        0.750       no             4        17
## 295        0   male 32.0       10.000      yes             4        20
## 299        0 female 47.0       15.000      yes             5        14
## 320        0   male 37.0       10.000      yes             3        20
## 321        0 female 22.0        0.750       no             2        16
## 324        0   male 27.0        4.000       no             2        18
## 334        0   male 32.0        7.000       no             4        20
## 351        0   male 42.0       15.000      yes             2        17
## 355        0   male 37.0       10.000      yes             4        20
## 361        0 female 47.0       15.000      yes             3        17
## 362        0 female 22.0        1.500       no             5        16
## 366        0 female 27.0        1.500       no             2        16
## 370        0 female 27.0        4.000       no             3        17
## 374        0 female 32.0       10.000      yes             5        14
## 378        0 female 22.0        0.125       no             2        12
## 381        0   male 47.0       15.000      yes             4        14
## 382        0   male 32.0       15.000      yes             1        14
## 383        0   male 27.0        7.000      yes             4        16
## 384        0 female 22.0        1.500      yes             3        16
## 400        0   male 27.0        4.000      yes             3        17
## 403        0 female 22.0        1.500       no             3        16
## 409        0   male 57.0       15.000      yes             2        14
## 412        0   male 17.5        1.500      yes             3        18
## 413        0   male 57.0       15.000      yes             4        20
## 416        0 female 22.0        0.750       no             2        16
## 418        0   male 42.0        4.000       no             4        17
## 422        0 female 22.0        1.500      yes             4        12
## 435        0 female 22.0        0.417       no             1        17
## 439        0 female 32.0       15.000      yes             4        17
## 445        0 female 27.0        1.500       no             3        18
## 447        0 female 22.0        1.500      yes             3        14
## 448        0 female 37.0       15.000      yes             3        14
## 449        0 female 32.0       15.000      yes             4        14
## 478        0   male 37.0       10.000      yes             2        14
## 482        0   male 37.0       10.000      yes             4        16
## 486        0   male 57.0       15.000      yes             5        20
## 489        0   male 27.0        0.417       no             1        16
## 490        0 female 42.0       15.000      yes             5        14
## 491        0   male 57.0       15.000      yes             3        16
## 492        0   male 37.0       10.000      yes             1        16
## 503        0   male 37.0       15.000      yes             3        17
## 508        0   male 37.0       15.000      yes             4        20
## 509        0 female 27.0       10.000      yes             5        14
## 512        0   male 37.0       10.000      yes             2        18
## 515        0 female 22.0        0.125       no             4        12
## 517        0   male 57.0       15.000      yes             5        20
## 532        0 female 37.0       15.000      yes             4        18
## 533        0   male 22.0        4.000      yes             4        14
## 535        0   male 27.0        7.000      yes             4        18
## 537        0   male 57.0       15.000      yes             4        20
## 538        0   male 32.0       15.000      yes             3        14
## 543        0 female 22.0        1.500       no             2        14
## 547        0 female 32.0        7.000      yes             4        17
## 550        0 female 37.0       15.000      yes             4        17
## 558        0 female 32.0        1.500       no             5        18
## 571        0   male 42.0       10.000      yes             5        20
## 578        0 female 27.0        7.000       no             3        16
## 583        0   male 37.0       15.000       no             4        20
## 586        0   male 37.0       15.000      yes             4        14
## 594        0   male 32.0       10.000       no             5        18
## 597        0 female 22.0        0.750       no             4        16
## 602        0 female 27.0        7.000      yes             4        12
## 603        0 female 27.0        7.000      yes             2        16
## 604        0 female 42.0       15.000      yes             5        18
## 612        0   male 42.0       15.000      yes             4        17
## 613        0 female 27.0        7.000      yes             2        16
## 621        0 female 22.0        1.500       no             3        16
## 627        0   male 37.0       15.000      yes             5        20
## 630        0 female 22.0        0.125       no             2        14
## 631        0   male 27.0        1.500       no             4        16
## 632        0   male 32.0        1.500       no             2        18
## 639        0   male 27.0        1.500       no             2        17
## 645        0 female 27.0       10.000      yes             4        16
## 647        0   male 42.0       15.000      yes             4        18
## 648        0 female 27.0        1.500       no             2        16
## 651        0   male 27.0        4.000       no             2        18
## 655        0 female 32.0       10.000      yes             3        14
## 667        0 female 32.0       15.000      yes             3        18
## 670        0 female 22.0        0.750       no             2        18
## 671        0 female 37.0       15.000      yes             2        16
## 673        0   male 27.0        4.000      yes             4        20
## 701        0   male 27.0        4.000       no             1        20
## 705        0 female 27.0       10.000      yes             2        12
## 706        0 female 32.0       15.000      yes             5        18
## 709        0   male 27.0        7.000      yes             5        12
## 717        0   male 52.0       15.000      yes             2        18
## 719        0   male 27.0        4.000       no             3        20
## 723        0   male 37.0        4.000      yes             1        18
## 724        0   male 27.0        4.000      yes             4        14
## 726        0 female 52.0       15.000      yes             5        12
## 734        0 female 57.0       15.000      yes             4        16
## 735        0   male 27.0        7.000      yes             1        16
## 736        0   male 37.0        7.000      yes             4        20
## 737        0   male 22.0        0.750       no             2        14
## 739        0   male 32.0        4.000      yes             2        18
## 743        0   male 37.0       15.000      yes             4        20
## 745        0   male 22.0        0.750      yes             2        14
## 747        0   male 42.0       15.000      yes             4        20
## 751        0 female 52.0       15.000      yes             5        17
## 752        0 female 37.0       15.000      yes             4        14
## 754        0   male 27.0        7.000      yes             4        14
## 760        0   male 32.0        4.000      yes             2        16
## 763        0 female 27.0        4.000      yes             2        18
## 774        0 female 27.0        4.000      yes             2        18
## 776        0   male 37.0       15.000      yes             5        18
## 779        0 female 47.0       15.000      yes             5        12
## 784        0 female 32.0       10.000      yes             3        17
## 788        0 female 27.0        1.500      yes             4        17
## 794        0 female 57.0       15.000      yes             2        18
## 795        0 female 22.0        1.500       no             4        14
## 798        0   male 42.0       15.000      yes             3        14
## 800        0   male 57.0       15.000      yes             4         9
## 803        0   male 57.0       15.000      yes             4        20
## 807        0 female 22.0        0.125       no             4        14
## 812        0 female 32.0       10.000      yes             4        14
## 820        0 female 42.0       15.000      yes             3        18
## 823        0 female 27.0        1.500       no             2        18
## 830        0   male 32.0        0.125      yes             2        18
## 843        0 female 27.0        4.000       no             3        16
## 848        0 female 27.0       10.000      yes             2        16
## 851        0 female 32.0        7.000      yes             4        16
## 854        0 female 37.0       15.000      yes             4        14
## 856        0 female 42.0       15.000      yes             5        17
## 857        0   male 32.0        1.500      yes             4        14
## 859        0 female 32.0        4.000      yes             3        17
## 863        0 female 37.0        7.000       no             4        18
## 865        0 female 22.0        0.417      yes             3        14
## 867        0 female 27.0        7.000      yes             4        14
## 870        0   male 27.0        0.750       no             3        16
## 873        0   male 27.0        4.000      yes             2        20
## 875        0   male 32.0       10.000      yes             4        16
## 876        0   male 32.0       15.000      yes             1        14
## 877        0   male 22.0        0.750       no             3        17
## 880        0 female 27.0        7.000      yes             4        17
## 903        0   male 27.0        0.417      yes             4        20
## 904        0   male 37.0       15.000      yes             4        20
## 905        0 female 37.0       15.000      yes             2        14
## 908        0   male 22.0        4.000      yes             1        18
## 909        0   male 37.0       15.000      yes             4        17
## 910        0 female 22.0        1.500       no             2        14
## 912        0   male 52.0       15.000      yes             4        14
## 914        0 female 22.0        1.500       no             4        17
## 915        0   male 32.0        4.000      yes             5        14
## 916        0   male 32.0        4.000      yes             2        14
## 920        0 female 22.0        1.500       no             3        16
## 921        0   male 27.0        0.750       no             2        18
## 925        0 female 22.0        7.000      yes             2        14
## 926        0 female 27.0        0.750       no             2        17
## 929        0 female 37.0       15.000      yes             4        12
## 931        0 female 22.0        1.500       no             1        14
## 945        0 female 37.0       10.000       no             2        12
## 947        0 female 37.0       15.000      yes             4        18
## 949        0 female 42.0       15.000      yes             3        12
## 950        0   male 22.0        4.000       no             2        18
## 961        0   male 52.0        7.000      yes             2        20
## 965        0   male 27.0        0.750       no             2        17
## 966        0 female 27.0        4.000       no             2        17
## 967        0   male 42.0        1.500       no             5        20
## 987        0   male 22.0        1.500       no             4        17
## 990        0   male 22.0        4.000       no             4        17
## 992        0 female 22.0        4.000      yes             1        14
## 995        0   male 37.0       15.000      yes             5        20
## 1009       0 female 37.0       10.000      yes             3        16
## 1021       0   male 42.0       15.000      yes             4        17
## 1026       0 female 47.0       15.000      yes             4        17
## 1027       0   male 22.0        1.500       no             4        16
## 1030       0 female 32.0       10.000      yes             3        12
## 1031       0 female 22.0        7.000      yes             1        14
## 1034       0 female 32.0       10.000      yes             4        17
## 1037       0   male 27.0        1.500      yes             2        16
## 1038       0   male 37.0       15.000      yes             4        14
## 1039       0   male 42.0        4.000      yes             3        14
## 1045       0 female 37.0       15.000      yes             5        14
## 1046       0 female 32.0        7.000      yes             4        17
## 1054       0 female 42.0       15.000      yes             4        18
## 1059       0   male 27.0        4.000       no             4        18
## 1063       0   male 22.0        0.750       no             4        18
## 1068       0   male 27.0        4.000      yes             4        14
## 1070       0 female 22.0        0.750       no             5        18
## 1072       0 female 52.0       15.000      yes             5         9
## 1073       0   male 32.0       10.000      yes             3        14
## 1077       0 female 37.0       15.000      yes             4        16
## 1081       0   male 32.0        7.000      yes             2        20
## 1083       0 female 42.0       15.000      yes             3        18
## 1084       0   male 32.0       15.000      yes             1        16
## 1086       0   male 27.0        4.000      yes             3        18
## 1087       0 female 32.0       15.000      yes             4        12
## 1089       0   male 22.0        0.750      yes             3        14
## 1096       0 female 22.0        1.500       no             3        16
## 1102       0 female 42.0       15.000      yes             4        14
## 1103       0 female 52.0       15.000      yes             3        16
## 1107       0   male 37.0       15.000      yes             5        20
## 1109       0 female 47.0       15.000      yes             4        12
## 1115       0   male 57.0       15.000      yes             2        20
## 1119       0   male 32.0        7.000      yes             4        17
## 1124       0 female 27.0        7.000      yes             4        17
## 1126       0   male 22.0        1.500       no             1        18
## 1128       0 female 22.0        4.000      yes             3         9
## 1129       0 female 22.0        1.500       no             2        14
## 1130       0   male 42.0       15.000      yes             2        20
## 1133       0   male 57.0       15.000      yes             4         9
## 1140       0 female 27.0        7.000      yes             2        18
## 1143       0 female 22.0        4.000      yes             3        14
## 1146       0   male 37.0       15.000      yes             4        14
## 1153       0   male 32.0        7.000      yes             1        18
## 1156       0 female 22.0        1.500       no             2        14
## 1157       0 female 22.0        1.500      yes             3        12
## 1158       0   male 52.0       15.000      yes             2        14
## 1160       0 female 37.0       15.000      yes             2        14
## 1161       0 female 32.0       10.000      yes             2        14
## 1166       0   male 42.0       15.000      yes             4        20
## 1177       0 female 27.0        4.000      yes             3        18
## 1178       0   male 37.0       15.000      yes             4        20
## 1180       0   male 27.0        1.500       no             3        18
## 1187       0 female 22.0        0.125       no             2        16
## 1191       0   male 32.0       10.000      yes             2        20
## 1195       0 female 27.0        4.000       no             4        18
## 1207       0 female 27.0        7.000      yes             2        12
## 1208       0   male 32.0        4.000      yes             5        18
## 1209       0 female 37.0       15.000      yes             2        17
## 1211       0   male 47.0       15.000       no             4        20
## 1215       0   male 27.0        1.500       no             1        18
## 1221       0   male 37.0       15.000      yes             4        20
## 1226       0 female 32.0       15.000      yes             4        18
## 1229       0 female 32.0        7.000      yes             4        17
## 1231       0 female 42.0       15.000      yes             3        14
## 1234       0 female 27.0        7.000      yes             3        16
## 1235       0   male 27.0        1.500       no             3        16
## 1242       0   male 22.0        1.500       no             3        16
## 1245       0   male 27.0        4.000      yes             3        16
## 1260       0 female 27.0        7.000      yes             3        12
## 1266       0 female 37.0       15.000      yes             2        18
## 1271       0 female 37.0        7.000      yes             3        14
## 1273       0   male 22.0        1.500       no             2        16
## 1276       0   male 37.0       15.000      yes             5        20
## 1280       0 female 22.0        1.500       no             4        16
## 1282       0 female 32.0       10.000      yes             4        16
## 1285       0   male 27.0        4.000       no             2        17
## 1295       0 female 22.0        0.417       no             4        14
## 1298       0 female 27.0        4.000       no             2        18
## 1299       0   male 37.0       15.000      yes             4        18
## 1304       0   male 37.0       10.000      yes             5        20
## 1305       0 female 27.0        7.000      yes             2        14
## 1311       0   male 32.0        4.000      yes             2        16
## 1314       0   male 32.0        4.000      yes             2        16
## 1319       0   male 22.0        1.500       no             3        18
## 1322       0 female 22.0        4.000      yes             4        14
## 1324       0 female 17.5        0.750       no             2        18
## 1327       0   male 32.0       10.000      yes             4        20
## 1328       0 female 32.0        0.750       no             5        14
## 1330       0   male 37.0       15.000      yes             4        17
## 1332       0   male 32.0        4.000       no             3        14
## 1333       0 female 27.0        1.500       no             2        17
## 1336       0 female 22.0        7.000      yes             4        14
## 1341       0   male 47.0       15.000      yes             5        14
## 1344       0   male 27.0        4.000      yes             1        16
## 1352       0 female 37.0       15.000      yes             5        14
## 1358       0   male 42.0        4.000      yes             4        18
## 1359       0 female 32.0        4.000      yes             2        14
## 1361       0   male 52.0       15.000      yes             2        14
## 1364       0 female 22.0        1.500       no             2        16
## 1368       0   male 52.0       15.000      yes             4        12
## 1384       0 female 22.0        0.417       no             3        17
## 1390       0 female 22.0        1.500       no             2        16
## 1393       0   male 27.0        4.000      yes             4        20
## 1394       0 female 32.0       15.000      yes             4        14
## 1402       0 female 27.0        1.500       no             2        16
## 1407       0   male 32.0        4.000       no             1        20
## 1408       0   male 37.0       15.000      yes             3        20
## 1412       0 female 32.0       10.000       no             2        16
## 1413       0 female 32.0       10.000      yes             5        14
## 1416       0   male 37.0        1.500      yes             4        18
## 1417       0   male 32.0        1.500       no             2        18
## 1418       0 female 32.0       10.000      yes             4        14
## 1419       0 female 47.0       15.000      yes             4        18
## 1420       0 female 27.0       10.000      yes             5        12
## 1423       0   male 27.0        4.000      yes             3        16
## 1424       0 female 37.0       15.000      yes             4        12
## 1432       0 female 27.0        0.750       no             4        16
## 1433       0 female 37.0       15.000      yes             4        16
## 1437       0 female 32.0       15.000      yes             3        16
## 1438       0 female 27.0       10.000      yes             2        16
## 1439       0   male 27.0        7.000       no             2        20
## 1446       0 female 37.0       15.000      yes             2        14
## 1450       0   male 27.0        1.500      yes             2        17
## 1451       0 female 22.0        0.750      yes             2        14
## 1452       0   male 22.0        4.000      yes             4        14
## 1453       0   male 42.0        0.125       no             4        17
## 1456       0   male 27.0        1.500      yes             4        18
## 1464       0   male 27.0        7.000      yes             3        16
## 1469       0 female 52.0       15.000      yes             4        14
## 1473       0   male 27.0        1.500       no             5        20
## 1481       0 female 27.0        1.500       no             2        16
## 1482       0 female 27.0        1.500       no             3        17
## 1496       0   male 22.0        0.125       no             5        16
## 1497       0 female 27.0        4.000      yes             4        16
## 1504       0 female 27.0        4.000      yes             4        12
## 1513       0 female 47.0       15.000      yes             2        14
## 1515       0 female 32.0       15.000      yes             3        14
## 1534       0   male 42.0        7.000      yes             2        16
## 1535       0   male 22.0        0.750       no             4        16
## 1536       0   male 27.0        0.125       no             3        20
## 1540       0   male 32.0       10.000      yes             3        20
## 1551       0 female 22.0        0.417       no             5        14
## 1555       0 female 47.0       15.000      yes             5        14
## 1557       0 female 32.0       10.000      yes             3        14
## 1566       0   male 57.0       15.000      yes             4        17
## 1567       0   male 27.0        4.000      yes             3        20
## 1576       0 female 32.0        7.000      yes             4        17
## 1584       0 female 37.0       10.000      yes             4        16
## 1585       0 female 32.0       10.000      yes             1        18
## 1590       0 female 22.0        4.000       no             3        14
## 1594       0 female 27.0        7.000      yes             4        14
## 1595       0   male 57.0       15.000      yes             5        18
## 1603       0   male 32.0        7.000      yes             2        18
## 1608       0 female 27.0        1.500       no             4        17
## 1609       0   male 22.0        1.500       no             4        14
## 1615       0 female 22.0        1.500      yes             4        14
## 1616       0 female 32.0        7.000      yes             3        16
## 1617       0 female 47.0       15.000      yes             3        16
## 1620       0 female 22.0        0.750       no             3        16
## 1621       0 female 22.0        1.500      yes             2        14
## 1637       0 female 27.0        4.000      yes             1        16
## 1638       0   male 52.0       15.000      yes             4        16
## 1650       0   male 32.0       10.000      yes             4        20
## 1654       0   male 47.0       15.000      yes             4        16
## 1665       0 female 27.0        7.000      yes             2        14
## 1670       0 female 22.0        1.500       no             4        14
## 1671       0 female 32.0       10.000      yes             2        16
## 1675       0 female 22.0        0.750       no             2        16
## 1688       0 female 22.0        1.500       no             2        16
## 1691       0 female 42.0       15.000      yes             3        18
## 1695       0 female 27.0        7.000      yes             5        14
## 1698       0   male 42.0       15.000      yes             4        16
## 1704       0 female 57.0       15.000      yes             3        18
## 1705       0   male 42.0       15.000      yes             3        18
## 1711       0 female 32.0        7.000      yes             2        14
## 1719       0   male 22.0        4.000       no             5        12
## 1723       0 female 22.0        1.500       no             1        16
## 1726       0 female 22.0        0.750       no             1        14
## 1749       0 female 32.0       15.000      yes             4        12
## 1752       0   male 22.0        1.500       no             2        18
## 1754       0   male 27.0        4.000      yes             5        17
## 1758       0 female 27.0        4.000      yes             4        12
## 1761       0   male 42.0       15.000      yes             5        18
## 1773       0   male 32.0        1.500       no             2        20
## 1775       0   male 57.0       15.000       no             4         9
## 1786       0   male 37.0        7.000       no             4        18
## 1793       0   male 52.0       15.000      yes             2        17
## 1799       0   male 47.0       15.000      yes             4        17
## 1803       0 female 27.0        7.000       no             2        17
## 1806       0 female 27.0        7.000      yes             4        14
## 1807       0 female 22.0        4.000       no             2        14
## 1808       0   male 37.0        7.000      yes             2        20
## 1814       0   male 27.0        7.000       no             4        12
## 1815       0   male 42.0       10.000      yes             4        18
## 1818       0 female 22.0        1.500       no             3        14
## 1827       0 female 22.0        4.000      yes             2        14
## 1834       0 female 57.0       15.000       no             4        20
## 1835       0   male 37.0       15.000      yes             4        14
## 1843       0 female 27.0        7.000      yes             3        18
## 1846       0 female 17.5       10.000       no             4        14
## 1850       0   male 22.0        4.000      yes             4        16
## 1851       0 female 27.0        4.000      yes             2        16
## 1854       0 female 37.0       15.000      yes             2        14
## 1859       0 female 22.0        1.500       no             5        14
## 1861       0   male 27.0        7.000      yes             2        20
## 1866       0   male 27.0        4.000      yes             4        14
## 1873       0   male 22.0        0.125       no             1        16
## 1875       0 female 27.0        7.000      yes             4        14
## 1885       0 female 32.0       15.000      yes             5        16
## 1892       0   male 32.0       10.000      yes             4        18
## 1895       0 female 32.0       15.000      yes             2        14
## 1896       0 female 22.0        1.500       no             3        17
## 1897       0   male 27.0        4.000      yes             4        17
## 1899       0 female 52.0       15.000      yes             5        14
## 1904       0 female 27.0        7.000      yes             2        12
## 1905       0 female 27.0        7.000      yes             3        12
## 1908       0 female 42.0       15.000      yes             2        14
## 1916       0 female 42.0       15.000      yes             4        14
## 1918       0   male 27.0        7.000      yes             4        14
## 1920       0   male 27.0        7.000      yes             2        20
## 1930       0 female 42.0       15.000      yes             3        12
## 1940       0   male 27.0        4.000      yes             3        16
## 1947       0 female 27.0        7.000      yes             3        14
## 1949       0 female 22.0        1.500       no             2        14
## 1951       0 female 27.0        4.000      yes             4        14
## 1952       0 female 22.0        4.000       no             4        14
## 1960       0 female 22.0        1.500       no             2        16
## 9001       0   male 47.0       15.000       no             4        14
## 9012       0   male 37.0       10.000      yes             2        18
## 9023       0   male 37.0       15.000      yes             3        17
## 9029       0 female 27.0        4.000      yes             2        16
## 6          3   male 27.0        1.500       no             3        18
## 12         3 female 27.0        4.000      yes             3        17
## 43         7   male 37.0       15.000      yes             5        18
## 53        12 female 32.0       10.000      yes             3        17
## 67         1   male 22.0        0.125       no             4        16
## 79         1 female 22.0        1.500      yes             2        14
## 122       12   male 37.0       15.000      yes             4        14
## 126        7 female 22.0        1.500       no             2        14
## 133        2   male 37.0       15.000      yes             2        18
## 138        3 female 32.0       15.000      yes             4        12
## 154        1 female 37.0       15.000      yes             4        14
## 159        7 female 42.0       15.000      yes             3        17
## 174       12 female 42.0       15.000      yes             5         9
## 176       12   male 37.0       10.000      yes             2        20
## 181       12 female 32.0       15.000      yes             3        14
## 182        3   male 27.0        4.000       no             1        18
## 186        7   male 37.0       10.000      yes             2        18
## 189        7 female 27.0        4.000       no             3        17
## 204        1   male 42.0       15.000      yes             4        16
## 215        1 female 47.0       15.000      yes             5        14
## 232        7 female 27.0        4.000      yes             3        18
## 233        1 female 27.0        7.000      yes             5        14
## 252       12   male 27.0        1.500      yes             3        17
## 253       12 female 27.0        7.000      yes             4        14
## 274        3 female 42.0       15.000      yes             4        16
## 275        7 female 27.0       10.000      yes             4        12
## 287        1   male 27.0        1.500       no             2        18
## 288        1   male 32.0        4.000       no             4        20
## 325        1 female 27.0        7.000      yes             3        14
## 328        3 female 32.0       10.000      yes             4        14
## 344        3   male 27.0        4.000      yes             2        18
## 353        1 female 17.5        0.750       no             5        14
## 354        1 female 32.0       10.000      yes             4        18
## 367        7 female 32.0        7.000      yes             2        17
## 369        7   male 37.0       15.000      yes             2        20
## 390        7 female 37.0       10.000       no             1        20
## 392       12 female 32.0       10.000      yes             2        16
## 423        7   male 52.0       15.000      yes             2        20
## 432        7 female 42.0       15.000      yes             1        12
## 436        1   male 52.0       15.000      yes             2        20
## 483        2   male 37.0       15.000      yes             3        18
## 513       12 female 22.0        4.000       no             3        12
## 516       12   male 27.0        7.000      yes             1        18
## 518        1   male 27.0        4.000      yes             3        18
## 520       12   male 47.0       15.000      yes             4        17
## 526       12 female 42.0       15.000      yes             4        12
## 528        7   male 27.0        4.000       no             3        14
## 553        7 female 32.0        7.000      yes             4        18
## 576        1   male 32.0        0.417      yes             3        12
## 611        3   male 47.0       15.000      yes             5        16
## 625       12   male 37.0       15.000      yes             2        20
## 635        7   male 22.0        4.000      yes             2        17
## 646        1   male 27.0        4.000       no             2        14
## 657        7 female 52.0       15.000      yes             5        16
## 659        1   male 27.0        4.000       no             3        14
## 666        1 female 27.0       10.000      yes             4        16
## 679        1   male 32.0        7.000      yes             3        14
## 729        7   male 32.0        7.000      yes             2        18
## 755        3   male 22.0        1.500       no             1        14
## 758        7   male 22.0        4.000      yes             3        18
## 770        7   male 42.0       15.000      yes             4        20
## 786        2 female 57.0       15.000      yes             1        18
## 797        7 female 32.0        4.000      yes             3        18
## 811        1   male 27.0        4.000      yes             1        16
## 834        7   male 32.0        7.000      yes             4        16
## 858        2   male 57.0       15.000      yes             1        17
## 885        7 female 42.0       15.000      yes             4        14
## 893        7   male 37.0       10.000      yes             1        18
## 927        3   male 42.0       15.000      yes             3        17
## 928        1 female 52.0       15.000      yes             3        14
## 933        2 female 27.0        7.000      yes             3        17
## 951       12   male 32.0        7.000      yes             2        12
## 968        1   male 22.0        4.000       no             4        14
## 972        3   male 27.0        7.000      yes             3        18
## 975       12 female 37.0       15.000      yes             1        18
## 977        7 female 32.0       15.000      yes             3        17
## 981        7 female 27.0        7.000       no             2        17
## 986        1 female 32.0        7.000      yes             3        17
## 1002       1   male 32.0        1.500      yes             2        14
## 1007      12 female 42.0       15.000      yes             4        14
## 1011       7   male 32.0       10.000      yes             3        14
## 1035       7   male 37.0        4.000      yes             1        20
## 1050       1 female 27.0        4.000      yes             2        16
## 1056      12 female 42.0       15.000      yes             3        14
## 1057       1   male 27.0       10.000      yes             5        20
## 1075      12   male 37.0       10.000      yes             2        20
## 1080      12 female 27.0        7.000      yes             1        14
## 1125       3 female 27.0        7.000      yes             4        12
## 1131       3   male 32.0       10.000      yes             2        14
## 1138      12 female 17.5        0.750      yes             2        12
## 1150      12 female 32.0       15.000      yes             3        18
## 1163       2 female 22.0        7.000       no             4        14
## 1169       1   male 32.0        7.000      yes             4        20
## 1198       7   male 27.0        4.000      yes             2        18
## 1204       1 female 22.0        1.500      yes             5        14
## 1218      12 female 32.0       15.000       no             3        17
## 1230      12 female 42.0       15.000      yes             2        12
## 1236       7   male 42.0       15.000      yes             3        20
## 1247      12   male 32.0       10.000       no             2        18
## 1259      12 female 32.0       15.000      yes             3         9
## 1294       7   male 57.0       15.000      yes             5        20
## 1353      12   male 47.0       15.000      yes             4        20
## 1370       2 female 42.0       15.000      yes             2        17
## 1427      12   male 37.0       15.000      yes             3        17
## 1445      12   male 37.0       15.000      yes             5        17
## 1460       7   male 27.0       10.000      yes             2        20
## 1480       2   male 37.0       15.000      yes             2        16
## 1505      12 female 32.0       15.000      yes             1        14
## 1543       7   male 32.0       10.000      yes             3        17
## 1548       2   male 37.0       15.000      yes             4        18
## 1550       7 female 27.0        1.500       no             2        17
## 1561       3 female 47.0       15.000      yes             2        17
## 1564      12   male 37.0       15.000      yes             2        17
## 1573      12 female 27.0        4.000       no             2        14
## 1575       2 female 27.0       10.000      yes             4        14
## 1599       1 female 22.0        4.000      yes             3        16
## 1622      12   male 52.0        7.000       no             4        16
## 1629       2 female 27.0        4.000      yes             1        16
## 1664       7 female 37.0       15.000      yes             2        17
## 1669       2 female 27.0        4.000       no             1        17
## 1674      12 female 17.5        0.750      yes             2        12
## 1682       7 female 32.0       15.000      yes             5        18
## 1685       7 female 22.0        4.000       no             1        16
## 1697       2   male 32.0        4.000      yes             4        18
## 1716       1 female 22.0        1.500      yes             3        18
## 1730       3 female 42.0       15.000      yes             2        17
## 1731       1   male 32.0        7.000      yes             4        16
## 1732      12   male 37.0       15.000       no             3        14
## 1743       1   male 42.0       15.000      yes             3        16
## 1751       1   male 27.0        4.000      yes             1        18
## 1757       2   male 37.0       15.000      yes             4        20
## 1763       7   male 37.0       15.000      yes             3        20
## 1766       3   male 22.0        1.500       no             2        12
## 1772       3   male 32.0        4.000      yes             3        20
## 1776       2   male 32.0       15.000      yes             5        20
## 1782      12 female 52.0       15.000      yes             1        18
## 1784      12   male 47.0       15.000       no             1        18
## 1791       3 female 32.0       15.000      yes             4        16
## 1831       7 female 32.0       15.000      yes             3        14
## 1840       7 female 27.0        7.000      yes             4        16
## 1844      12   male 42.0       15.000      yes             3        18
## 1856       7 female 42.0       15.000      yes             2        14
## 1876      12   male 27.0        7.000      yes             2        17
## 1929       3   male 32.0       10.000      yes             4        14
## 1935       7   male 47.0       15.000      yes             3        16
## 1938       1   male 22.0        1.500      yes             1        12
## 1941       7 female 32.0       10.000      yes             2        18
## 1954       2   male 32.0       10.000      yes             2        17
## 1959       2   male 22.0        7.000      yes             3        18
## 9010       1 female 32.0       15.000      yes             3        14
##      occupation rating
## 4             7      4
## 5             6      4
## 11            1      4
## 16            6      5
## 23            6      3
## 29            5      5
## 44            1      3
## 45            4      4
## 47            1      2
## 49            4      5
## 50            7      2
## 55            6      4
## 64            6      4
## 80            5      4
## 86            5      4
## 93            5      5
## 108           4      3
## 114           5      4
## 115           5      5
## 116           1      5
## 123           5      5
## 127           5      5
## 129           5      4
## 134           1      5
## 137           6      4
## 139           5      5
## 147           1      5
## 151           6      4
## 153           5      5
## 155           5      4
## 162           6      3
## 163           5      5
## 165           6      4
## 168           5      4
## 170           6      4
## 172           5      3
## 184           1      4
## 187           5      5
## 192           3      5
## 194           1      5
## 210           6      5
## 217           6      3
## 220           1      4
## 224           6      1
## 227           5      3
## 228           5      5
## 239           6      5
## 241           5      4
## 245           3      5
## 249           6      4
## 262           3      5
## 265           6      4
## 267           6      5
## 269           6      4
## 271           5      5
## 277           3      1
## 290           4      4
## 292           1      5
## 293           5      4
## 295           6      4
## 299           7      2
## 320           6      4
## 321           5      5
## 324           4      5
## 334           6      4
## 351           3      5
## 355           6      4
## 361           6      5
## 362           5      5
## 366           6      4
## 370           5      5
## 374           4      5
## 378           5      5
## 381           4      3
## 382           5      5
## 383           5      5
## 384           5      5
## 400           6      5
## 403           5      5
## 409           7      2
## 412           6      5
## 413           6      5
## 416           3      4
## 418           3      3
## 422           1      5
## 435           6      4
## 439           5      5
## 445           5      2
## 447           1      5
## 448           1      4
## 449           3      4
## 478           5      3
## 482           5      4
## 486           5      3
## 489           3      4
## 490           1      5
## 491           6      1
## 492           6      4
## 503           5      5
## 508           6      5
## 509           1      5
## 512           6      4
## 515           4      5
## 517           6      5
## 532           6      4
## 533           6      4
## 535           5      4
## 537           5      4
## 538           6      3
## 543           5      4
## 547           1      5
## 550           6      5
## 558           5      5
## 571           7      4
## 578           5      4
## 583           6      5
## 586           3      2
## 594           6      4
## 597           1      5
## 602           2      4
## 603           2      5
## 604           5      4
## 612           5      3
## 613           1      2
## 621           5      5
## 627           6      5
## 630           4      5
## 631           5      5
## 632           6      5
## 639           6      5
## 645           1      3
## 647           6      5
## 648           6      5
## 651           6      3
## 655           5      3
## 667           5      4
## 670           6      5
## 671           1      4
## 673           5      5
## 701           5      4
## 705           1      4
## 706           6      4
## 709           5      3
## 717           5      4
## 719           6      3
## 723           5      4
## 724           5      4
## 726           1      3
## 734           6      4
## 735           5      4
## 736           6      3
## 737           4      3
## 739           5      3
## 743           6      3
## 745           4      3
## 747           6      3
## 751           1      1
## 752           1      2
## 754           5      3
## 760           5      5
## 763           6      5
## 774           5      5
## 776           6      5
## 779           5      4
## 784           1      4
## 788           1      2
## 794           5      2
## 795           5      4
## 798           3      4
## 800           2      2
## 803           6      5
## 807           4      5
## 812           1      5
## 820           5      4
## 823           6      5
## 830           5      2
## 843           5      4
## 848           1      4
## 851           1      3
## 854           5      4
## 856           6      2
## 857           6      5
## 859           5      3
## 863           5      5
## 865           3      5
## 867           1      5
## 870           5      5
## 873           5      5
## 875           4      5
## 876           5      5
## 877           4      5
## 880           1      4
## 903           5      4
## 904           5      4
## 905           1      3
## 908           5      4
## 909           5      3
## 910           4      5
## 912           6      2
## 914           5      5
## 915           3      5
## 916           3      5
## 920           6      5
## 921           3      3
## 925           5      2
## 926           5      3
## 929           1      2
## 931           1      5
## 945           4      4
## 947           5      3
## 949           3      3
## 950           5      5
## 961           6      2
## 965           5      5
## 966           4      5
## 967           6      5
## 987           6      5
## 990           5      3
## 992           5      4
## 995           4      5
## 1009          6      3
## 1021          6      5
## 1026          5      5
## 1027          5      4
## 1030          1      4
## 1031          3      5
## 1034          5      4
## 1037          2      4
## 1038          5      5
## 1039          4      5
## 1045          5      4
## 1046          5      5
## 1054          6      5
## 1059          6      4
## 1063          6      5
## 1068          5      3
## 1070          1      5
## 1072          5      5
## 1073          5      5
## 1077          4      4
## 1081          5      4
## 1083          1      4
## 1084          5      5
## 1086          5      5
## 1087          3      4
## 1089          2      4
## 1096          5      3
## 1102          3      5
## 1103          5      4
## 1107          6      4
## 1109          2      3
## 1115          6      4
## 1119          5      5
## 1124          1      4
## 1126          6      5
## 1128          1      4
## 1129          1      5
## 1130          6      4
## 1133          2      4
## 1140          1      5
## 1143          1      5
## 1146          5      3
## 1153          6      4
## 1156          5      5
## 1157          1      3
## 1158          5      5
## 1160          1      1
## 1161          5      5
## 1166          4      5
## 1177          4      5
## 1178          6      5
## 1180          5      5
## 1187          6      3
## 1191          6      3
## 1195          5      4
## 1207          5      1
## 1208          6      3
## 1209          5      5
## 1211          6      4
## 1215          5      5
## 1221          6      4
## 1226          1      4
## 1229          5      4
## 1231          1      3
## 1234          1      4
## 1235          4      2
## 1242          3      5
## 1245          4      2
## 1260          1      2
## 1266          5      4
## 1271          4      4
## 1273          5      5
## 1276          5      4
## 1280          5      3
## 1282          1      5
## 1285          5      3
## 1295          5      5
## 1298          5      5
## 1299          5      3
## 1304          7      4
## 1305          4      2
## 1311          5      5
## 1314          6      4
## 1319          4      5
## 1322          3      4
## 1324          5      4
## 1327          4      5
## 1328          3      3
## 1330          5      3
## 1332          4      5
## 1333          3      2
## 1336          1      5
## 1341          6      5
## 1344          4      4
## 1352          1      3
## 1358          5      5
## 1359          1      5
## 1361          7      4
## 1364          1      4
## 1368          2      4
## 1384          1      5
## 1390          5      5
## 1393          6      4
## 1394          1      5
## 1402          3      5
## 1407          6      5
## 1408          6      4
## 1412          6      5
## 1413          5      5
## 1416          5      3
## 1417          4      4
## 1418          1      4
## 1419          5      4
## 1420          1      5
## 1423          4      5
## 1424          4      2
## 1432          5      5
## 1433          1      5
## 1437          1      5
## 1438          1      5
## 1439          6      5
## 1446          1      3
## 1450          4      4
## 1451          1      5
## 1452          2      4
## 1453          6      4
## 1456          6      5
## 1464          6      3
## 1469          1      3
## 1473          5      2
## 1481          5      5
## 1482          5      5
## 1496          4      4
## 1497          1      5
## 1504          1      5
## 1513          5      5
## 1515          5      3
## 1534          5      5
## 1535          6      4
## 1536          6      5
## 1540          6      5
## 1551          4      5
## 1555          1      4
## 1557          1      5
## 1566          5      5
## 1567          6      5
## 1576          1      5
## 1584          1      5
## 1585          1      4
## 1590          1      4
## 1594          3      2
## 1595          5      2
## 1603          5      5
## 1608          1      3
## 1609          5      5
## 1615          5      4
## 1616          1      5
## 1617          5      4
## 1620          1      5
## 1621          5      5
## 1637          5      5
## 1638          5      5
## 1650          6      5
## 1654          6      4
## 1665          1      2
## 1670          4      5
## 1671          5      4
## 1675          5      4
## 1688          5      5
## 1691          6      4
## 1695          4      5
## 1698          4      4
## 1704          5      2
## 1705          6      2
## 1711          1      2
## 1719          4      5
## 1723          6      5
## 1726          4      5
## 1749          1      5
## 1752          5      3
## 1754          2      5
## 1758          1      5
## 1761          5      4
## 1773          7      3
## 1775          3      1
## 1786          5      5
## 1793          5      4
## 1799          6      5
## 1803          5      4
## 1806          5      5
## 1807          3      3
## 1808          6      5
## 1814          4      3
## 1815          6      4
## 1818          1      5
## 1827          1      3
## 1834          6      5
## 1835          4      3
## 1843          5      5
## 1846          4      5
## 1850          5      5
## 1851          1      4
## 1854          5      1
## 1859          1      4
## 1861          5      4
## 1866          5      5
## 1873          3      5
## 1875          1      4
## 1885          5      3
## 1892          5      4
## 1895          3      4
## 1896          5      5
## 1897          4      4
## 1899          1      5
## 1904          1      2
## 1905          1      4
## 1908          1      4
## 1916          5      4
## 1918          3      3
## 1920          6      2
## 1930          3      3
## 1940          3      5
## 1947          1      4
## 1949          4      5
## 1951          1      4
## 1952          5      5
## 1960          4      5
## 9001          5      4
## 9012          6      2
## 9023          5      4
## 9029          1      4
## 6             4      4
## 12            1      5
## 43            6      2
## 53            5      2
## 67            5      5
## 79            1      5
## 122           5      2
## 126           3      4
## 133           6      4
## 138           3      2
## 154           4      2
## 159           1      4
## 174           4      1
## 176           6      2
## 181           1      2
## 182           6      5
## 186           7      3
## 189           5      5
## 204           5      5
## 215           4      5
## 232           5      4
## 233           1      4
## 252           5      4
## 253           6      2
## 274           5      4
## 275           7      3
## 287           5      2
## 288           6      4
## 325           1      3
## 328           1      4
## 344           7      2
## 353           4      5
## 354           1      5
## 367           6      4
## 369           6      4
## 390           5      3
## 392           5      5
## 423           6      4
## 432           1      3
## 436           6      3
## 483           6      5
## 513           3      4
## 516           6      2
## 518           5      5
## 520           6      5
## 526           1      1
## 528           3      4
## 553           4      5
## 576           3      4
## 611           5      4
## 625           5      4
## 635           6      4
## 646           4      5
## 657           1      3
## 659           3      3
## 666           1      4
## 679           7      4
## 729           4      1
## 755           3      2
## 758           6      4
## 770           6      4
## 786           5      4
## 797           5      2
## 811           4      4
## 834           1      4
## 858           4      4
## 885           5      2
## 893           5      3
## 927           6      1
## 928           4      4
## 933           5      3
## 951           4      2
## 968           2      5
## 972           6      4
## 975           5      5
## 977           1      3
## 981           5      5
## 986           5      3
## 1002          2      4
## 1007          1      2
## 1011          5      4
## 1035          6      3
## 1050          5      3
## 1056          4      3
## 1057          6      5
## 1075          6      2
## 1080          3      3
## 1125          1      2
## 1131          4      4
## 1138          1      3
## 1150          5      4
## 1163          4      3
## 1169          6      5
## 1198          6      2
## 1204          5      3
## 1218          5      1
## 1230          1      2
## 1236          5      4
## 1247          4      2
## 1259          1      1
## 1294          4      5
## 1353          6      4
## 1370          6      3
## 1427          6      3
## 1445          5      2
## 1460          6      4
## 1480          5      4
## 1505          5      2
## 1543          6      3
## 1548          5      1
## 1550          5      5
## 1561          5      2
## 1564          5      4
## 1573          5      5
## 1575          1      5
## 1599          1      3
## 1622          5      5
## 1629          3      5
## 1664          6      4
## 1669          3      1
## 1674          3      5
## 1682          5      4
## 1685          3      5
## 1697          6      4
## 1716          5      2
## 1730          5      4
## 1731          4      4
## 1732          6      2
## 1743          6      3
## 1751          5      4
## 1757          7      3
## 1763          6      4
## 1766          3      3
## 1772          6      2
## 1776          6      5
## 1782          5      5
## 1784          6      5
## 1791          4      4
## 1831          3      2
## 1840          1      2
## 1844          6      2
## 1856          3      2
## 1876          5      4
## 1929          4      3
## 1935          4      2
## 1938          2      5
## 1941          5      4
## 1954          6      5
## 1959          6      2
## 9010          1      5

该数据从601个参与者身上收集了9个变量,包括一年来婚外私通的频率以及参与者性别、年龄、婚龄、是否 有小孩、宗教信仰程度(5分制, 1分表示反对, 5分表示非常信仰)、学历、职业(逆向编号的戈 登7种分类),还有对婚姻的自我评分(5分制, 1表示非常不幸福, 5表示非常幸福)。



2、展现各变量的情况

summary(Affairs)
##     affairs          gender         age         yearsmarried    children 
##  Min.   : 0.000   female:315   Min.   :17.50   Min.   : 0.125   no :171  
##  1st Qu.: 0.000   male  :286   1st Qu.:27.00   1st Qu.: 4.000   yes:430  
##  Median : 0.000                Median :32.00   Median : 7.000            
##  Mean   : 1.456                Mean   :32.49   Mean   : 8.178            
##  3rd Qu.: 0.000                3rd Qu.:37.00   3rd Qu.:15.000            
##  Max.   :12.000                Max.   :57.00   Max.   :15.000            
##  religiousness     education       occupation        rating     
##  Min.   :1.000   Min.   : 9.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.:14.00   1st Qu.:3.000   1st Qu.:3.000  
##  Median :3.000   Median :16.00   Median :5.000   Median :4.000  
##  Mean   :3.116   Mean   :16.17   Mean   :4.195   Mean   :3.932  
##  3rd Qu.:4.000   3rd Qu.:18.00   3rd Qu.:6.000   3rd Qu.:5.000  
##  Max.   :5.000   Max.   :20.00   Max.   :7.000   Max.   :5.000


假设我们感兴趣的是二值型结果(有过一次以上婚外情/没有过婚外情)。 我们需要将affairs转化为二值型因子ynaffair

3、将affairs转化为二值型因子ynaffair

Affairs$ynaffair[Affairs$affairs > 0] <- 1
Affairs$ynaffair[Affairs$affairs == 0] <- 0
Affairs$ynaffair <- factor(Affairs$ynaffair, levels = c(0, 
    1), labels = c("No", "Yes"))
table(Affairs$ynaffair)
## 
##  No Yes 
## 451 150


5、构建逻辑回归模型

fit.full <- glm(ynaffair ~ gender + age + yearsmarried + 
    children + religiousness + education + occupation + rating, 
    data = Affairs, family = binomial())
summary(fit.full)
## 
## Call:
## glm(formula = ynaffair ~ gender + age + yearsmarried + children + 
##     religiousness + education + occupation + rating, family = binomial(), 
##     data = Affairs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5713  -0.7499  -0.5690  -0.2539   2.5191  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    1.37726    0.88776   1.551 0.120807    
## gendermale     0.28029    0.23909   1.172 0.241083    
## age           -0.04426    0.01825  -2.425 0.015301 *  
## yearsmarried   0.09477    0.03221   2.942 0.003262 ** 
## childrenyes    0.39767    0.29151   1.364 0.172508    
## religiousness -0.32472    0.08975  -3.618 0.000297 ***
## education      0.02105    0.05051   0.417 0.676851    
## occupation     0.03092    0.07178   0.431 0.666630    
## rating        -0.46845    0.09091  -5.153 2.56e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 675.38  on 600  degrees of freedom
## Residual deviance: 609.51  on 592  degrees of freedom
## AIC: 627.51
## 
## Number of Fisher Scoring iterations: 4

从回归系数的p值(最后一栏)可以看到,性别、是否有孩子、学历和职业对方程的贡献都 不显著(你无法拒绝参数为0的假设)。



6、去除掉不显著的变量,构建回归方程

fit.reduced <- glm(ynaffair ~ age + yearsmarried + 
    religiousness + rating, data = Affairs, family = binomial())
summary(fit.reduced)
## 
## Call:
## glm(formula = ynaffair ~ age + yearsmarried + religiousness + 
##     rating, family = binomial(), data = Affairs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6278  -0.7550  -0.5701  -0.2624   2.3998  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    1.93083    0.61032   3.164 0.001558 ** 
## age           -0.03527    0.01736  -2.032 0.042127 *  
## yearsmarried   0.10062    0.02921   3.445 0.000571 ***
## religiousness -0.32902    0.08945  -3.678 0.000235 ***
## rating        -0.46136    0.08884  -5.193 2.06e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 675.38  on 600  degrees of freedom
## Residual deviance: 615.36  on 596  degrees of freedom
## AIC: 625.36
## 
## Number of Fisher Scoring iterations: 4

比之前的结果更加显著