Problem: Reproduce the results of analysis reported in Chapter 11 of Behavioral data analysis with R (Cheng & Sheu, 2015) with this R script and the married dataset.
Data:
## 教育 性別 婚姻 年齡
## 1 國小以下 女 未婚 32.5
## 2 專科 男 有偶 57.5
## 3 國中 男 有偶 42.5
## 4 國小以下 男 有偶 42.5
## 5 高中 女 喪偶 62.5
## 6 高中 女 有偶 52.5
## 'data.frame': 10000 obs. of 4 variables:
## $ 教育: chr "國小以下" "專科" "國中" "國小以下" ...
## $ 性別: chr "女" "男" "男" "男" ...
## $ 婚姻: chr "未婚" "有偶" "有偶" "有偶" ...
## $ 年齡: num 32.5 57.5 42.5 42.5 62.5 52.5 32.5 37.5 42.5 62.5 ...
## 教育 性別 婚姻 年齡
## Length:10000 Length:10000 Length:10000 Min. :32.50
## Class :character Class :character Class :character 1st Qu.:37.50
## Mode :character Mode :character Mode :character Median :47.50
## Mean :46.62
## 3rd Qu.:52.50
## Max. :62.50
The mean age is 46.62 (range from 32.50 to 62.50).
## $教育
## x
## 大學以上 高中 國小以下 國中 專科
## 0.1070 0.3297 0.2406 0.1360 0.1867
##
## $性別
## x
## 女 男
## 0.5073 0.4927
##
## $婚姻
## x
## 未婚 有偶 喪偶 離婚
## 0.1966 0.6550 0.0341 0.1143
##
## $年齡
## x
## 32.5 37.5 42.5 47.5 52.5 57.5 62.5
## 0.1619 0.1516 0.1435 0.1434 0.1558 0.1306 0.1132
##
## $年次
## x
## 39.5 44.5 49.5 54.5 59.5 64.5 69.5
## 0.1132 0.1306 0.1558 0.1434 0.1435 0.1516 0.1619
##
## $結過婚
## x
## 0 1
## 0.1966 0.8034
The highest proportion of educational level is high school level (32.97%).
Gender ratio is close to 1:1.
80% of them are married.
## 結過婚 0 1
## 教育
## 國小以下 726 1680
## 國中 273 1087
## 高中 584 2713
## 專科 311 1556
## 大學以上 72 998
The proportion of each educational level is similar in married group. However, the single group tend to be lower educational level.
## 結過婚 0 1
## 性別
## 女 894 4179
## 男 1072 3855
The ratio of single to married in male (0.28) is slightly higher than female (0.21).
The proportion of female in married group is higher than male.
Male is tend to be single?
## 結過婚 0 1
## 年次
## 39.5 64 1068
## 44.5 105 1201
## 49.5 136 1422
## 54.5 201 1233
## 59.5 263 1172
## 64.5 422 1094
## 69.5 775 844
The more younger generation, the more tend to be single? (cohort effect)
## 結過婚 0 1
## 教育 性別
## 國小以下 女 399 774
## 男 327 906
## 國中 女 155 524
## 男 118 563
## 高中 女 235 1472
## 男 349 1241
## 專科 女 85 746
## 男 226 810
## 大學以上 女 20 663
## 男 52 335
## 結過婚 0 1
## 教育 年次
## 國小以下 39.5 7 123
## 44.5 16 150
## 49.5 32 200
## 54.5 51 250
## 59.5 76 286
## 64.5 154 350
## 69.5 390 321
## 國中 39.5 5 85
## 44.5 10 119
## 49.5 17 166
## 54.5 34 172
## 59.5 50 204
## 64.5 70 223
## 69.5 87 118
## 高中 39.5 23 236
## 44.5 37 368
## 49.5 42 500
## 54.5 65 497
## 59.5 90 458
## 64.5 132 366
## 69.5 195 288
## 專科 39.5 14 146
## 44.5 21 288
## 49.5 37 425
## 54.5 41 263
## 59.5 43 204
## 64.5 57 134
## 69.5 98 96
## 大學以上 39.5 15 478
## 44.5 21 276
## 49.5 8 131
## 54.5 10 51
## 59.5 4 20
## 64.5 9 21
## 69.5 5 21
## 結過婚 0 1
## 性別 年次
## 女 39.5 43 576
## 44.5 49 597
## 49.5 62 708
## 54.5 93 603
## 59.5 123 605
## 64.5 187 573
## 69.5 337 517
## 男 39.5 21 492
## 44.5 56 604
## 49.5 74 714
## 54.5 108 630
## 59.5 140 567
## 64.5 235 521
## 69.5 438 327
Only compute the count not the ratio
## 結過婚 0 1
## 教育 性別 年次
## 國小以下 女 39.5 6 47
## 44.5 12 58
## 49.5 23 86
## 54.5 28 104
## 59.5 53 124
## 64.5 87 155
## 69.5 190 200
## 男 39.5 1 76
## 44.5 4 92
## 49.5 9 114
## 54.5 23 146
## 59.5 23 162
## 64.5 67 195
## 69.5 200 121
## 國中 女 39.5 5 32
## 44.5 5 38
## 49.5 10 68
## 54.5 21 71
## 59.5 29 107
## 64.5 38 135
## 69.5 47 73
## 男 39.5 0 53
## 44.5 5 81
## 49.5 7 98
## 54.5 13 101
## 59.5 21 97
## 64.5 32 88
## 69.5 40 45
## 高中 女 39.5 15 117
## 44.5 19 186
## 49.5 17 257
## 54.5 29 272
## 59.5 30 255
## 64.5 49 217
## 69.5 76 168
## 男 39.5 8 119
## 44.5 18 182
## 49.5 25 243
## 54.5 36 225
## 59.5 60 203
## 64.5 83 149
## 69.5 119 120
## 專科 女 39.5 11 78
## 44.5 7 124
## 49.5 10 200
## 54.5 13 128
## 59.5 9 105
## 64.5 12 54
## 69.5 23 57
## 男 39.5 3 68
## 44.5 14 164
## 49.5 27 225
## 54.5 28 135
## 59.5 34 99
## 64.5 45 80
## 69.5 75 39
## 大學以上 女 39.5 6 302
## 44.5 6 191
## 49.5 2 97
## 54.5 2 28
## 59.5 2 14
## 64.5 1 12
## 69.5 1 19
## 男 39.5 9 176
## 44.5 15 85
## 49.5 6 34
## 54.5 8 23
## 59.5 2 6
## 64.5 8 9
## 69.5 4 2
## 結過婚 0 1
## 教育 性別 年次
## 國小以下 女 39.5 0.11320755 0.88679245
## 44.5 0.17142857 0.82857143
## 49.5 0.21100917 0.78899083
## 54.5 0.21212121 0.78787879
## 59.5 0.29943503 0.70056497
## 64.5 0.35950413 0.64049587
## 69.5 0.48717949 0.51282051
## 男 39.5 0.01298701 0.98701299
## 44.5 0.04166667 0.95833333
## 49.5 0.07317073 0.92682927
## 54.5 0.13609467 0.86390533
## 59.5 0.12432432 0.87567568
## 64.5 0.25572519 0.74427481
## 69.5 0.62305296 0.37694704
## 國中 女 39.5 0.13513514 0.86486486
## 44.5 0.11627907 0.88372093
## 49.5 0.12820513 0.87179487
## 54.5 0.22826087 0.77173913
## 59.5 0.21323529 0.78676471
## 64.5 0.21965318 0.78034682
## 69.5 0.39166667 0.60833333
## 男 39.5 0.00000000 1.00000000
## 44.5 0.05813953 0.94186047
## 49.5 0.06666667 0.93333333
## 54.5 0.11403509 0.88596491
## 59.5 0.17796610 0.82203390
## 64.5 0.26666667 0.73333333
## 69.5 0.47058824 0.52941176
## 高中 女 39.5 0.11363636 0.88636364
## 44.5 0.09268293 0.90731707
## 49.5 0.06204380 0.93795620
## 54.5 0.09634551 0.90365449
## 59.5 0.10526316 0.89473684
## 64.5 0.18421053 0.81578947
## 69.5 0.31147541 0.68852459
## 男 39.5 0.06299213 0.93700787
## 44.5 0.09000000 0.91000000
## 49.5 0.09328358 0.90671642
## 54.5 0.13793103 0.86206897
## 59.5 0.22813688 0.77186312
## 64.5 0.35775862 0.64224138
## 69.5 0.49790795 0.50209205
## 專科 女 39.5 0.12359551 0.87640449
## 44.5 0.05343511 0.94656489
## 49.5 0.04761905 0.95238095
## 54.5 0.09219858 0.90780142
## 59.5 0.07894737 0.92105263
## 64.5 0.18181818 0.81818182
## 69.5 0.28750000 0.71250000
## 男 39.5 0.04225352 0.95774648
## 44.5 0.07865169 0.92134831
## 49.5 0.10714286 0.89285714
## 54.5 0.17177914 0.82822086
## 59.5 0.25563910 0.74436090
## 64.5 0.36000000 0.64000000
## 69.5 0.65789474 0.34210526
## 大學以上 女 39.5 0.01948052 0.98051948
## 44.5 0.03045685 0.96954315
## 49.5 0.02020202 0.97979798
## 54.5 0.06666667 0.93333333
## 59.5 0.12500000 0.87500000
## 64.5 0.07692308 0.92307692
## 69.5 0.05000000 0.95000000
## 男 39.5 0.04864865 0.95135135
## 44.5 0.15000000 0.85000000
## 49.5 0.15000000 0.85000000
## 54.5 0.25806452 0.74193548
## 59.5 0.25000000 0.75000000
## 64.5 0.47058824 0.52941176
## 69.5 0.66666667 0.33333333
The younger male cohort more likely to be single in each of educational level.
## 年次 39.5 44.5 49.5 54.5 59.5 64.5 69.5
## 結過婚 0 1 0 1 0 1 0 1 0 1 0 1 0 1
## 教育 性別
## 國小以下 女 6 47 12 58 23 86 28 104 53 124 87 155 190 200
## 男 1 76 4 92 9 114 23 146 23 162 67 195 200 121
## 國中 女 5 32 5 38 10 68 21 71 29 107 38 135 47 73
## 男 0 53 5 81 7 98 13 101 21 97 32 88 40 45
## 高中 女 15 117 19 186 17 257 29 272 30 255 49 217 76 168
## 男 8 119 18 182 25 243 36 225 60 203 83 149 119 120
## 專科 女 11 78 7 124 10 200 13 128 9 105 12 54 23 57
## 男 3 68 14 164 27 225 28 135 34 99 45 80 75 39
## 大學以上 女 6 302 6 191 2 97 2 28 2 14 1 12 1 19
## 男 9 176 15 85 6 34 8 23 2 6 8 9 4 2
## as.factor(年次) 39.5 44.5 49.5 54.5 59.5 64.5 69.5
## 性別 教育
## 女 國小以下 47 58 86 104 124 155 200
## 國中 32 38 68 71 107 135 73
## 高中 117 186 257 272 255 217 168
## 專科 78 124 200 128 105 54 57
## 大學以上 302 191 97 28 14 12 19
## 男 國小以下 76 92 114 146 162 195 121
## 國中 53 81 98 101 97 88 45
## 高中 119 182 243 225 203 149 120
## 專科 68 164 225 135 99 80 39
## 大學以上 176 85 34 23 6 9 2
## Loading required package: ggplot2
# 圖示不同年次、性別與教育程度的結過婚比率,加上信賴區間
# 圖11.1
ggplot(data = dta1, aes(x = 年次, y = 結過婚, shape = 性別, color = 性別)) +
stat_summary(fun.data = mean_se, size = .3) +
facet_grid(. ~ 教育 ) +
labs(x = '年次', y = '結過婚百分比') +
theme(legend.position = c (.95, .1))
The younger cohort more likely to be single in each of educational level.
The married ratio of male is higher than female among the educational level under junior high school. However, the married ratio of male is lower than female among the high school level and above.
May consider the interaction effect of education with gender and education with cohort on the ratio of married.
The cohort effect on the y
##
## Call:
## glm(formula = 結過婚 ~ 年次, family = binomial, data = dta1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.5499 0.2811 0.4498 0.7040 1.0540
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.04825 0.19472 36.20 <2e-16 ***
## 年次 -0.09713 0.00319 -30.45 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9913.1 on 9999 degrees of freedom
## Residual deviance: 8766.2 on 9998 degrees of freedom
## AIC: 8770.2
##
## Number of Fisher Scoring iterations: 5
## Waiting for profiling to be done...
## coef 2.5 % 97.5 %
## (Intercept) 1150.8459885 788.6829642 1692.1285115
## 年次 0.9074337 0.9017331 0.9130795
The odds of married ratio among the younger cohort is lower than the older.
When the cohort age add one year each, the odds of married will be increase 0.907 times (0.902-0.913).
##
## Call:
## glm(formula = 結過婚 ~ 年次 * 性別 * 教育, family = binomial,
## data = dta1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.0831 0.2328 0.4607 0.6584 1.3921
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.774415 0.512535 9.315 < 2e-16 ***
## 年次 -0.066837 0.008136 -8.215 < 2e-16 ***
## 性別男 6.481320 0.956018 6.779 1.21e-11 ***
## 教育國中 -0.519822 0.893145 -0.582 0.56056
## 教育高中 0.564630 0.721014 0.783 0.43356
## 教育專科 0.629152 0.905912 0.694 0.48737
## 教育大學以上 1.102391 1.299680 0.848 0.39633
## 年次:性別男 -0.098016 0.014898 -6.579 4.73e-11 ***
## 年次:教育國中 0.015964 0.014465 1.104 0.26976
## 年次:教育高中 0.006055 0.011738 0.516 0.60595
## 年次:教育專科 0.007865 0.015370 0.512 0.60882
## 年次:教育大學以上 0.015564 0.025722 0.605 0.54513
## 性別男:教育國中 -2.045260 1.518446 -1.347 0.17800
## 性別男:教育高中 -4.635321 1.187586 -3.903 9.50e-05 ***
## 性別男:教育專科 -3.743424 1.354443 -2.764 0.00571 **
## 性別男:教育大學以上 -5.534359 1.765063 -3.136 0.00172 **
## 年次:性別男:教育國中 0.027590 0.024293 1.136 0.25609
## 年次:性別男:教育高中 0.056880 0.018937 3.004 0.00267 **
## 年次:性別男:教育專科 0.035498 0.022259 1.595 0.11076
## 年次:性別男:教育大學以上 0.044030 0.033610 1.310 0.19018
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9913.1 on 9999 degrees of freedom
## Residual deviance: 8489.7 on 9980 degrees of freedom
## AIC: 8529.7
##
## Number of Fisher Scoring iterations: 6
顯著
單變項:年次, 性別
二階交互作用項:年次:性別, 性別:教育
## Single term deletions
##
## Model:
## 結過婚 ~ 年次 * 性別 * 教育
## Df Deviance AIC LRT Pr(>Chi)
## <none> 8489.7 8529.7
## 年次:性別:教育 4 8499.1 8531.1 9.4573 0.05063 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
可去除三階交互作用? nearly
##
## Call:
## glm(formula = 結過婚 ~ 年次 + 性別 + 教育 + 年次:性別 + 年次:教育 +
## 性別:教育, family = binomial, data = dta1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.8957 0.2352 0.4540 0.6443 1.4225
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.478361 0.460628 11.893 < 2e-16 ***
## 年次 -0.078070 0.007260 -10.753 < 2e-16 ***
## 性別男 4.303726 0.462789 9.300 < 2e-16 ***
## 教育國中 -1.065792 0.719274 -1.482 0.13840
## 教育高中 -0.818436 0.558725 -1.465 0.14297
## 教育專科 -0.120577 0.643046 -0.188 0.85126
## 教育大學以上 0.087694 0.847448 0.103 0.91758
## 年次:性別男 -0.063821 0.007125 -8.958 < 2e-16 ***
## 年次:教育國中 0.024595 0.011556 2.128 0.03331 *
## 年次:教育高中 0.028792 0.009027 3.190 0.00143 **
## 年次:教育專科 0.019908 0.010677 1.865 0.06223 .
## 年次:教育大學以上 0.033298 0.016165 2.060 0.03941 *
## 性別男:教育國中 -0.270610 0.172757 -1.566 0.11725
## 性別男:教育高中 -1.104423 0.139475 -7.918 2.41e-15 ***
## 性別男:教育專科 -1.491088 0.178288 -8.363 < 2e-16 ***
## 性別男:教育大學以上 -2.882467 0.314888 -9.154 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9913.1 on 9999 degrees of freedom
## Residual deviance: 8499.1 on 9984 degrees of freedom
## AIC: 8531.1
##
## Number of Fisher Scoring iterations: 6
# 邏輯迴歸模型診斷,將預測值與實際觀測值放在一起
# 圖11.2
ggplot(data = dta1, aes(x = 年次, y = 結過婚, shape = 性別, color = 性別)) +
stat_summary(fun.data = mean_se, size = .3) +
stat_summary(aes(y = fitted(m_drop1)), fun = mean, geom = 'line', size = .3) +
facet_grid(. ~ 教育) +
scale_x_continuous(breaks = seq(30, 70, by = 5)) +
labs(x= '年次', y = '結過婚百分比') +
theme(legend.position= c (.9, .15))
Fitted line of the model (mdrop1) quite fit with the oberved data.
## Loading required package: dispmod
# 將資料整理成 dispmod 要的形式
dta1_f <- data.frame(ftable(dta1, row.vars = c(1, 2, 5), col.vars = '結過婚'))
dta1_f1 <- subset(dta1_f , 結過婚 == 1)
dta1_f1$Tot <- dta1_f[1:70, 5] + dta1_f[71:140, 5]
dta1_f1 <- dta1_f1[,-4]
## 教育 性別 年次 Freq Tot
## 71 國小以下 女 39.5 47 53
## 72 國中 女 39.5 32 37
## 73 高中 女 39.5 117 132
## 74 專科 女 39.5 78 89
## 75 大學以上 女 39.5 302 308
## 76 國小以下 男 39.5 76 77
# 以整理過資料重新分析,確認與前面相同
summary(m_last <- glm(cbind(Freq, Tot-Freq) ~ 年次 * 性別 * 教育 - 年次:性別:教育,
data = dta1_f1, family = binomial))
##
## Call:
## glm(formula = cbind(Freq, Tot - Freq) ~ 年次 * 性別 * 教育 -
## 年次:性別:教育, family = binomial, data = dta1_f1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.7255 -0.8572 0.0203 0.5868 3.1982
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.478361 0.460628 11.893 < 2e-16 ***
## 年次 -0.078070 0.007260 -10.753 < 2e-16 ***
## 性別男 4.303726 0.462789 9.300 < 2e-16 ***
## 教育國中 -1.065792 0.719274 -1.482 0.13840
## 教育高中 -0.818436 0.558725 -1.465 0.14297
## 教育專科 -0.120577 0.643045 -0.188 0.85126
## 教育大學以上 0.087694 0.847449 0.103 0.91758
## 年次:性別男 -0.063821 0.007125 -8.958 < 2e-16 ***
## 年次:教育國中 0.024595 0.011556 2.128 0.03331 *
## 年次:教育高中 0.028792 0.009027 3.190 0.00143 **
## 年次:教育專科 0.019908 0.010677 1.865 0.06223 .
## 年次:教育大學以上 0.033298 0.016165 2.060 0.03941 *
## 性別男:教育國中 -0.270610 0.172757 -1.566 0.11725
## 性別男:教育高中 -1.104423 0.139475 -7.918 2.41e-15 ***
## 性別男:教育專科 -1.491088 0.178288 -8.363 < 2e-16 ***
## 性別男:教育大學以上 -2.882467 0.314888 -9.154 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1530.13 on 69 degrees of freedom
## Residual deviance: 116.19 on 54 degrees of freedom
## AIC: 446.31
##
## Number of Fisher Scoring iterations: 4
##
## Binomial overdispersed logit model fitting...
## Iter. 1 phi: 0.009144343
## Iter. 2 phi: 0.0068854
## Iter. 3 phi: 0.007239847
## Iter. 4 phi: 0.007179674
## Iter. 5 phi: 0.007189756
## Iter. 6 phi: 0.007188063
## Iter. 7 phi: 0.007188347
## Iter. 8 phi: 0.0071883
## Iter. 9 phi: 0.007188308
## Iter. 10 phi: 0.007188306
## Iter. 11 phi: 0.007188306
## Converged after 11 iterations.
## Estimated dispersion parameter: 0.007188306
##
## Call:
## glm(formula = cbind(Freq, Tot - Freq) ~ 年次 * 性別 * 教育 -
## 年次:性別:教育, family = binomial, data = dta1_f1, weights = disp.weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.34206 -0.58236 0.01665 0.49197 1.91197
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.08101 0.63580 7.992 1.33e-15 ***
## 年次 -0.07189 0.01043 -6.895 5.38e-12 ***
## 性別男 4.41527 0.64693 6.825 8.80e-12 ***
## 教育國中 -0.71418 0.93255 -0.766 0.4438
## 教育高中 -0.70145 0.81672 -0.859 0.3904
## 教育專科 0.15978 0.89475 0.179 0.8583
## 教育大學以上 0.24864 1.12922 0.220 0.8257
## 年次:性別男 -0.06503 0.01015 -6.407 1.49e-10 ***
## 年次:教育國中 0.01878 0.01533 1.225 0.2204
## 年次:教育高中 0.02688 0.01355 1.984 0.0473 *
## 年次:教育專科 0.01487 0.01495 0.995 0.3200
## 年次:教育大學以上 0.03072 0.02041 1.505 0.1322
## 性別男:教育國中 -0.28931 0.24870 -1.163 0.2447
## 性別男:教育高中 -1.13265 0.22766 -4.975 6.52e-07 ***
## 性別男:教育專科 -1.48291 0.25459 -5.825 5.72e-09 ***
## 性別男:教育大學以上 -2.87409 0.41432 -6.937 4.01e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 633.454 on 69 degrees of freedom
## Residual deviance: 52.675 on 54 degrees of freedom
## AIC: 236.78
##
## Number of Fisher Scoring iterations: 4
##
## Call:
## glm(formula = cbind(Freq, Tot - Freq) ~ 年次 * 性別 * 教育 -
## 年次:性別:教育, family = binomial, data = dta1_f1, weights = disp.weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.34206 -0.58236 0.01665 0.49197 1.91197
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.08101 0.63580 7.992 1.33e-15 ***
## 年次 -0.07189 0.01043 -6.895 5.38e-12 ***
## 性別男 4.41527 0.64693 6.825 8.80e-12 ***
## 教育國中 -0.71418 0.93255 -0.766 0.4438
## 教育高中 -0.70145 0.81672 -0.859 0.3904
## 教育專科 0.15978 0.89475 0.179 0.8583
## 教育大學以上 0.24864 1.12922 0.220 0.8257
## 年次:性別男 -0.06503 0.01015 -6.407 1.49e-10 ***
## 年次:教育國中 0.01878 0.01533 1.225 0.2204
## 年次:教育高中 0.02688 0.01355 1.984 0.0473 *
## 年次:教育專科 0.01487 0.01495 0.995 0.3200
## 年次:教育大學以上 0.03072 0.02041 1.505 0.1322
## 性別男:教育國中 -0.28931 0.24870 -1.163 0.2447
## 性別男:教育高中 -1.13265 0.22766 -4.975 6.52e-07 ***
## 性別男:教育專科 -1.48291 0.25459 -5.825 5.72e-09 ***
## 性別男:教育大學以上 -2.87409 0.41432 -6.937 4.01e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 633.454 on 69 degrees of freedom
## Residual deviance: 52.675 on 54 degrees of freedom
## AIC: 236.78
##
## Number of Fisher Scoring iterations: 4
phi = 0.0072
## [1] 0.007188306
## Analysis of Deviance Table
##
## Model 1: cbind(Freq, Tot - Freq) ~ 年次 * 性別 * 教育 - 年次:性別:教育
## Model 2: cbind(Freq, Tot - Freq) ~ 年次 * 性別 * 教育 - 年次:性別:教育
## Resid. Df Resid. Dev Df Deviance
## 1 54 116.189
## 2 54 52.675 0 63.515
116.189 -> 52.675
# 準備再一次模型診斷,收集資料,畫圖
# 圖11.3
dta1_mlastd <- data.frame(dta1_f1, phat = fitted(m_lastd))
ggplot(data = dta1_mlastd, aes(x = 年次, y = Freq/Tot, color = 性別, shape = 性別)) +
geom_point(size = 1) +
stat_smooth(aes(x = 年次, y = phat), method = 'loess', se = F, size = 1) +
facet_grid(. ~ 教育) +
scale_x_continuous(breaks = seq(30, 70, by = 5)) +
labs(x= '年次', y = '結過婚百分比') +
theme(legend.position = 'NONE' )
## `geom_smooth()` using formula 'y ~ x'
In male, the higher educational level tend to be single and the younger generation more likely to be single. However, in female, the higher educational level tend to be married and the younger generation more likely to be single.
##
## Call:
## glm(formula = cbind(Freq, Tot - Freq) ~ 年次 + 性別 + 教育 +
## 年次:性別 + 年次:教育 + 性別:教育 - 1, family = binomial,
## data = dta1_f1, weights = disp.weights)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.34206 -0.58236 0.01665 0.49197 1.91197
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## 年次 -0.07189 0.01043 -6.895 5.38e-12 ***
## 性別女 5.08101 0.63580 7.992 1.33e-15 ***
## 性別男 9.49628 0.75372 12.599 < 2e-16 ***
## 教育國中 -0.71418 0.93255 -0.766 0.4438
## 教育高中 -0.70145 0.81672 -0.859 0.3904
## 教育專科 0.15978 0.89475 0.179 0.8583
## 教育大學以上 0.24864 1.12922 0.220 0.8257
## 年次:性別男 -0.06503 0.01015 -6.407 1.49e-10 ***
## 年次:教育國中 0.01878 0.01533 1.225 0.2204
## 年次:教育高中 0.02688 0.01355 1.984 0.0473 *
## 年次:教育專科 0.01487 0.01495 0.995 0.3200
## 年次:教育大學以上 0.03072 0.02041 1.505 0.1322
## 性別男:教育國中 -0.28931 0.24870 -1.163 0.2447
## 性別男:教育高中 -1.13265 0.22766 -4.975 6.52e-07 ***
## 性別男:教育專科 -1.48291 0.25459 -5.825 5.72e-09 ***
## 性別男:教育大學以上 -2.87409 0.41432 -6.937 4.01e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2504.909 on 70 degrees of freedom
## Residual deviance: 52.675 on 54 degrees of freedom
## AIC: 236.78
##
## Number of Fisher Scoring iterations: 4
## Loading required package: coefplot
## 婚姻
## 年次 有偶 未婚 離婚 喪偶
## 39.5 0.719964664 0.056537102 0.113957597 0.109540636
## 44.5 0.728177642 0.080398162 0.129402757 0.062021440
## 49.5 0.713735558 0.087291399 0.155327343 0.043645700
## 54.5 0.716875872 0.140167364 0.120641562 0.022315202
## 59.5 0.671080139 0.183275261 0.135888502 0.009756098
## 64.5 0.622031662 0.278364116 0.093007916 0.006596306
## 69.5 0.455836936 0.478690550 0.058060531 0.007411983
## Loading required package: mlogit
## Loading required package: dfidx
##
## Attaching package: 'dfidx'
## The following object is masked from 'package:stats':
##
## filter
# 將資料轉成 mlogit 需要格式
dta1_w <- mlogit.data(dta1, shape = 'wide', choice = '婚姻')
#看看資料
#程式報表11.13
head(dta1_w)
## ~~~~~~~
## first 10 observations out of 40000
## ~~~~~~~
## 教育 性別 婚姻 年齡 年次 結過婚 chid alt idx
## 1 國小以下 女 TRUE 32.5 69.5 0 1 未婚 1:未婚
## 2 國小以下 女 FALSE 32.5 69.5 0 1 有偶 1:有偶
## 3 國小以下 女 FALSE 32.5 69.5 0 1 喪偶 1:喪偶
## 4 國小以下 女 FALSE 32.5 69.5 0 1 離婚 1:離婚
## 5 專科 男 FALSE 57.5 44.5 1 2 未婚 2:未婚
## 6 專科 男 TRUE 57.5 44.5 1 2 有偶 2:有偶
## 7 專科 男 FALSE 57.5 44.5 1 2 喪偶 2:喪偶
## 8 專科 男 FALSE 57.5 44.5 1 2 離婚 2:離婚
## 9 國中 男 FALSE 42.5 59.5 1 3 未婚 3:未婚
## 10 國中 男 TRUE 42.5 59.5 1 3 有偶 3:有偶
##
## ~~~ indexes ~~~~
## chid alt
## 1 1 未婚
## 2 1 有偶
## 3 1 喪偶
## 4 1 離婚
## 5 2 未婚
## 6 2 有偶
## 7 2 喪偶
## 8 2 離婚
## 9 3 未婚
## 10 3 有偶
## indexes: 1, 2
##
## Call:
## mlogit(formula = 婚姻 ~ 0 | 年次, data = dta1_w, method = "nr")
##
## Frequencies of alternatives:choice
## 未婚 有偶 喪偶 離婚
## 0.1966 0.6550 0.0341 0.1143
##
## nr method
## 7 iterations, 0h:0m:1s
## g'(-H)^-1g = 0.000103
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept):有偶 6.6554207 0.1975766 33.685 < 2.2e-16 ***
## (Intercept):喪偶 8.5654937 0.3956179 21.651 < 2.2e-16 ***
## (Intercept):離婚 5.2686719 0.2507264 21.014 < 2.2e-16 ***
## 年次:有偶 -0.0935790 0.0032415 -28.869 < 2.2e-16 ***
## 年次:喪偶 -0.1905932 0.0079392 -24.007 < 2.2e-16 ***
## 年次:離婚 -0.1002414 0.0043105 -23.255 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -8918.3
## McFadden R^2: 0.07105
## Likelihood ratio test : chisq = 1364.2 (p.value = < 2.22e-16)
## b_hat 2.5 % 97.5 %
## (Intercept):有偶 776.9847186 527.5152157 1.144432e+03
## (Intercept):喪偶 5247.4301390 2416.5519282 1.139455e+04
## (Intercept):離婚 194.1579345 118.7781080 3.173759e+02
## 年次:有偶 0.9106660 0.9048986 9.164702e-01
## 年次:喪偶 0.8264687 0.8137080 8.394295e-01
## 年次:離婚 0.9046190 0.8970085 9.122941e-01
# 準備畫圖,先得到預測值
newdta1 <- with(dta1, data.frame(
年次 = rep(seq(from = 39.5, to = 69.5, length = 7), 4),
婚姻 = factor(rep(c('有偶', '未婚', '離婚', '喪偶'), each = 7) ) ) )
newdta1_w <- mlogit.data(newdta1, shape = 'wide', choice = '婚姻')
ml0_phat <- as.data.frame(predict(ml0, newdta1_w)[1:7, ])
## Loading required package: reshape2
# 製造年次婚姻列聯表、將預測值轉成需要格式、彙整在一起
obs_p <- data.frame(with(dta1, prop.table(table(年次, 婚姻), 1)))
dta1_op <- data.frame(obs_p, melt(ml0_phat))
## No id variables; using all as measure variables
# 繪圖
# 圖11.5
ggplot(data = dta1_op, aes(x = 年次, y = Freq, shape = 婚姻)) +
geom_point(size = 2) +
stat_smooth(aes(x = 年次, y = value, group = 婚姻, color = 婚姻),
method = 'loess', se = F, size = 1) +
labs(x = '年次', y = '分類百分比') +
theme(legend.position = c(.1, .5))
## `geom_smooth()` using formula 'y ~ x'
Chapter 11 of Behavioral data analysis with R (Cheng & Sheu, 2015)
Problem: The interest is to model the number of days of absence from school using student’s gender, the standardized test in math, and the type of program in which the student is enrolled.
Data: The data set days absent contains attendance data on 314 high school juniors from high schools. The response variable of interest is days absent from school. The standardized math score for each student was given as well as the program variable indicating the type of instructional program in which the student was enrolled.
Column 1: Student ID
Column 2: Gender ID
Column 3: Program ID
Column 4: Standardized mathematics score
Column 5: Days absent from school
pacman::p_load(tidyr, tidyverse, nlme, car, afex, emmeans, lattice, faraway, boot, ggplot2, kableExtra, MASS)
dta2 <- read.csv("days_absent.csv")
str(dta2)
## 'data.frame': 314 obs. of 5 variables:
## $ ID : chr "S1001" "S1002" "S1003" "S1004" ...
## $ Gender : chr "male" "male" "female" "female" ...
## $ Program: chr "Academic" "Academic" "Academic" "Academic" ...
## $ Math : int 63 27 20 16 2 71 63 3 51 49 ...
## $ Days : int 4 4 2 3 3 13 11 7 10 9 ...
## ID Gender Program Math Days
## 1 S1001 male Academic 63 4
## 2 S1002 male Academic 27 4
## 3 S1003 female Academic 20 2
## 4 S1004 female Academic 16 3
## 5 S1005 female Academic 2 3
## 6 S1006 female Academic 71 13
ggplot(dta2, aes(Days)) +
geom_histogram(binwidth = 1.5, fill=8, col = 1)+
facet_grid(Gender ~ Program) +
labs(x="Days absent",
y="Frequency") +
theme_minimal()+
theme(legend.position = c(.95, .9))
The pattern (absent days) of differ gender in differ program is similar.
The “General” and “Academic” program may more likely to have higher frequency of absent days.
kbl(aggregate(cbind(Days, Math)~ Gender+Program, data=dta2, FUN=mean))%>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
Gender | Program | Days | Math |
---|---|---|---|
female | Academic | 7.759036 | 43.69880 |
male | Academic | 6.119048 | 41.50000 |
female | General | 12.363636 | 44.40909 |
male | General | 8.555556 | 45.33333 |
female | Vocational | 2.709091 | 58.00000 |
male | Vocational | 2.634615 | 58.84615 |
The female students have higher absent days than male in every program. (Gender effect) However, only the math score of female students in “Academic” program is higher than male student(43.6 > 41.5). (May consider the program effect)
kbl(aggregate(cbind(Days, Math)~ Gender+Program, data=dta2, FUN=var))%>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
Gender | Program | Days | Math |
---|---|---|---|
female | Academic | 68.55098 | 586.7252 |
male | Academic | 41.81698 | 689.8193 |
female | General | 84.33766 | 804.5390 |
male | General | 41.67320 | 923.5294 |
female | Vocational | 10.28418 | 512.4815 |
male | Vocational | 18.07956 | 364.9170 |
ggplot(dta2, aes(Math, Days, color=Gender)) +
stat_smooth(method="glm",
formula=y ~ x,
method.args=list(family=poisson),
size=rel(.5)) +
geom_point(pch=20, alpha=.5) +
facet_grid(. ~ Program) +
labs(y="Days absent",
x="Math score") +
theme_minimal()+
theme(legend.position = c(.95, .9))
sjPlot::tab_model(m0 <-glm(Days ~ Program*Gender*Math, data=dta2,
family=poisson(link=log)),
show.se=T, show.r2=F, show.obs=F)
Days | ||||
---|---|---|---|---|
Predictors | Incidence Rate Ratios | std. Error | CI | p |
(Intercept) | 10.43 | 0.08 | 8.94 – 12.14 | <0.001 |
Program [General] | 1.47 | 0.13 | 1.13 – 1.91 | 0.004 |
Program [Vocational] | 0.23 | 0.25 | 0.14 – 0.36 | <0.001 |
Gender [male] | 0.85 | 0.11 | 0.69 – 1.06 | 0.146 |
Math | 0.99 | 0.00 | 0.99 – 1.00 | <0.001 |
Program [General] * Gender [male] |
0.97 | 0.20 | 0.65 – 1.43 | 0.863 |
Program [Vocational] * Gender [male] |
1.59 | 0.37 | 0.76 – 3.27 | 0.214 |
Program [General] * Math | 1.00 | 0.00 | 1.00 – 1.01 | 0.479 |
Program [Vocational] * Math |
1.01 | 0.00 | 1.00 – 1.02 | 0.019 |
Gender [male] * Math | 1.00 | 0.00 | 0.99 – 1.00 | 0.274 |
(Program [General] Gender [male]) Math |
1.00 | 0.00 | 0.99 – 1.01 | 0.692 |
(Program [Vocational] Gender [male]) Math |
1.00 | 0.01 | 0.98 – 1.01 | 0.634 |
## # Overdispersion test
##
## dispersion ratio = 6.492
## Pearson's Chi-Squared = 1960.446
## p-value = < 0.001
## Overdispersion detected.
plot(log(fitted(m0)), log((dta2$Days-fitted(m0))^2),
xlab=expression(hat(mu)),
ylab=expression((y-hat(mu))^2))
grid()
abline(0, 1, col=2)
kbl(as.data.frame(summary(m0, dispersion=6.492)$coef)) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = F)
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | 2.3450136 | 0.1989867 | 11.7847779 | 0.0000000 |
ProgramGeneral | 0.3873043 | 0.3411743 | 1.1352096 | 0.2562875 |
ProgramVocational | -1.4878389 | 0.6257345 | -2.3777477 | 0.0174187 |
Gendermale | -0.1588403 | 0.2785265 | -0.5702878 | 0.5684825 |
Math | -0.0071045 | 0.0043252 | -1.6425893 | 0.1004679 |
ProgramGeneral:Gendermale | -0.0347250 | 0.5144980 | -0.0674929 | 0.9461893 |
ProgramVocational:Gendermale | 0.4617659 | 0.9469746 | 0.4876222 | 0.6258175 |
ProgramGeneral:Math | 0.0019809 | 0.0071360 | 0.2775861 | 0.7813301 |
ProgramVocational:Math | 0.0094842 | 0.0103312 | 0.9180121 | 0.3586125 |
Gendermale:Math | -0.0026914 | 0.0062654 | -0.4295662 | 0.6675112 |
ProgramGeneral:Gendermale:Math | -0.0017088 | 0.0110050 | -0.1552787 | 0.8766016 |
ProgramVocational:Gendermale:Math | -0.0029735 | 0.0159262 | -0.1867067 | 0.8518906 |
## Warning in drop1.glm(m0, test = "F"): F test assumes 'quasipoisson' family
## Single term deletions
##
## Model:
## Days ~ Program * Gender * Math
## Df Deviance AIC F value Pr(>F)
## <none> 1730.4 2637.7
## Program:Gender:Math 2 1730.7 2634.0 0.0274 0.973
quasipoisson
sjPlot::tab_model(m1 <- update(m0, . ~ . - Gender:Math:School, family=quasipoisson),
show.se=T, show.r2=F, show.obs=F)
Days | ||||
---|---|---|---|---|
Predictors | Incidence Rate Ratios | std. Error | CI | p |
(Intercept) | 10.43 | 0.20 | 6.98 – 15.23 | <0.001 |
Program [General] | 1.47 | 0.34 | 0.74 – 2.83 | 0.257 |
Program [Vocational] | 0.23 | 0.63 | 0.06 – 0.71 | 0.018 |
Gender [male] | 0.85 | 0.28 | 0.49 – 1.47 | 0.569 |
Math | 0.99 | 0.00 | 0.98 – 1.00 | 0.101 |
Program [General] * Gender [male] |
0.97 | 0.51 | 0.35 – 2.62 | 0.946 |
Program [Vocational] * Gender [male] |
1.59 | 0.95 | 0.23 – 9.99 | 0.626 |
Program [General] * Math | 1.00 | 0.01 | 0.99 – 1.02 | 0.782 |
Program [Vocational] * Math |
1.01 | 0.01 | 0.99 – 1.03 | 0.359 |
Gender [male] * Math | 1.00 | 0.01 | 0.99 – 1.01 | 0.668 |
(Program [General] Gender [male]) Math |
1.00 | 0.01 | 0.98 – 1.02 | 0.877 |
(Program [Vocational] Gender [male]) Math |
1.00 | 0.02 | 0.97 – 1.03 | 0.852 |
## Single term deletions
##
## Model:
## Days ~ Program + Gender + Math + Program:Gender + Program:Math +
## Gender:Math + Program:Gender:Math
## Df Deviance F value Pr(>F)
## <none> 1730.4
## Program:Gender:Math 2 1730.7 0.0274 0.973
sjPlot::tab_model(m2 <- update(m1, . ~ . - Program:Gender:Math, family=quasipoisson), show.se=T, show.r2=F, show.obs=F)
Days | ||||
---|---|---|---|---|
Predictors | Incidence Rate Ratios | std. Error | CI | p |
(Intercept) | 10.27 | 0.18 | 7.08 – 14.59 | <0.001 |
Program [General] | 1.52 | 0.28 | 0.86 – 2.63 | 0.144 |
Program [Vocational] | 0.24 | 0.50 | 0.09 – 0.61 | 0.005 |
Gender [male] | 0.88 | 0.24 | 0.55 – 1.40 | 0.586 |
Math | 0.99 | 0.00 | 0.99 – 1.00 | 0.084 |
Program [General] * Gender [male] |
0.91 | 0.30 | 0.50 – 1.61 | 0.738 |
Program [Vocational] * Gender [male] |
1.36 | 0.35 | 0.68 – 2.71 | 0.386 |
Program [General] * Math | 1.00 | 0.01 | 0.99 – 1.01 | 0.819 |
Program [Vocational] * Math |
1.01 | 0.01 | 0.99 – 1.02 | 0.296 |
Gender [male] * Math | 1.00 | 0.00 | 0.99 – 1.01 | 0.469 |
dta_m2 <- broom::augment(m2)
dta_m2 <- dta_m2 %>% mutate(yhat=predict(m2, type="response"),
residSqr=residuals(m2, type="response")^2)
ggplot(data=dta_m2,
aes(x=yhat,
y=residSqr)) +
geom_point() +
geom_abline(intercept=0, slope=1) +
geom_abline(intercept=0,
slope=summary(m2)$dispersion,
color="coral") +
stat_smooth(method="loess",
formula=y ~ x,
se=FALSE) +
labs(x="Fitted values",
y="Squared residuals") +
theme_minimal()
Problem:
Data: These data arise from a study of a psychiatric screening questionnaire called the General Health Questionnaire (GHQ). The research interest focuses on how diagnosis of psychiatric illness is related to gender and GHQ score.
Column 1: GHQ score
Column 2: Gender ID, F = Female = F, M = Male
Column 3: Number of psychiatric cases
Column 4: Number of non-psychiatric cases
#
pacman::p_load(HSAUR3, tidyverse, binomTools)
pacman::p_load(tidyr, tidyverse, nlme, car, afex, emmeans, lattice, faraway, boot, ggplot2, kableExtra, MASS)
dta3 <- read.csv("GHQ.csv")
str(dta3)
## 'data.frame': 22 obs. of 4 variables:
## $ Score : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Gender : chr "F" "F" "F" "F" ...
## $ Case : int 4 4 8 6 4 6 3 2 3 2 ...
## $ Noncase: int 80 29 15 3 2 1 1 0 0 0 ...
## Score Gender Case Noncase
## 1 0 F 4 80
## 2 1 F 4 29
## 3 2 F 8 15
## 4 3 F 6 3
## 5 4 F 4 2
## 6 5 F 6 1
#
data(GHQ, package="HSAUR3")
#
dta <- GHQ %>%
mutate(Total = cases + non.cases) %>%
dplyr::rename(Gender=gender, Score=GHQ, Case=cases, Noncase=non.cases)
#
str(dta)
## 'data.frame': 22 obs. of 5 variables:
## $ Score : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Gender : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
## $ Case : int 4 4 8 6 4 6 3 2 3 2 ...
## $ Noncase: int 80 29 15 3 2 1 1 0 0 0 ...
## $ Total : int 84 33 23 9 6 7 4 2 3 2 ...
## Gender Case Noncase
## 1 female 43 131
## 2 male 25 79
## Score Case Noncase
## 1 0 5 116
## 2 1 6 54
## 3 2 10 23
## 4 3 7 7
## 5 4 7 3
## 6 5 9 2
## 7 6 5 2
## 8 7 6 2
## 9 8 6 1
## 10 9 4 0
## 11 10 3 0
# set black and white theme
ot <- theme_set(theme_bw())
#
ggplot(dta, aes(Score, Case/Total, color = Gender)) +
geom_jitter() +
stat_smooth(method = "glm", formula = y ~ x,
method.args = list(family = binomial),
aes(weight = Total), se = FALSE) +
labs(x = "GHQ score", y = "Proportion of cases") +
theme(legend.position = c(.1, .9))
#logistic regression models
# slopes and intercepts dependent on gender
summary(m0 <- glm(cbind(Case, Noncase) ~ Score*Gender, data = dta,
family = "binomial"))
##
## Call:
## glm(formula = cbind(Case, Noncase) ~ Score * Gender, family = "binomial",
## data = dta)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.29971 -0.32521 -0.03273 0.39672 1.45689
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.7732 0.3586 -7.732 1.06e-14 ***
## Score 0.9412 0.1569 6.000 1.97e-09 ***
## Gendermale -0.2253 0.6093 -0.370 0.712
## Score:Gendermale -0.3020 0.1990 -1.517 0.129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.3059 on 21 degrees of freedom
## Residual deviance: 8.7669 on 18 degrees of freedom
## AIC: 52.741
##
## Number of Fisher Scoring iterations: 5
# different intercepts for gender
summary(m1 <- glm(cbind(Case, Noncase) ~ Score+Gender, data = dta,
family = "binomial"))
##
## Call:
## glm(formula = cbind(Case, Noncase) ~ Score + Gender, family = "binomial",
## data = dta)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3955 -0.3939 0.1876 0.4315 1.3306
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.49351 0.28164 -8.854 < 2e-16 ***
## Score 0.77910 0.09903 7.867 3.63e-15 ***
## Gendermale -0.93609 0.43435 -2.155 0.0311 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.306 on 21 degrees of freedom
## Residual deviance: 11.113 on 19 degrees of freedom
## AIC: 53.087
##
## Number of Fisher Scoring iterations: 5
# no gender effect
summary(m2 <- glm(cbind(Case, Noncase) ~ Score, data = dta,
family = "binomial"))
##
## Call:
## glm(formula = cbind(Case, Noncase) ~ Score, family = "binomial",
## data = dta)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.7690 -0.7231 0.1255 0.5306 1.7578
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.71073 0.27245 -9.950 < 2e-16 ***
## Score 0.73604 0.09457 7.783 7.11e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 130.306 on 21 degrees of freedom
## Residual deviance: 16.237 on 20 degrees of freedom
## AIC: 56.211
##
## Number of Fisher Scoring iterations: 5
## Analysis of Deviance Table
##
## Model 1: cbind(Case, Noncase) ~ Score * Gender
## Model 2: cbind(Case, Noncase) ~ Score + Gender
## Model 3: cbind(Case, Noncase) ~ Score
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 18 8.7669
## 2 19 11.1130 -1 -2.3461 0.1256
## 3 20 16.2368 -1 -5.1238 0.0236 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# fortify data with proportions and predicted probabilities
dta_m1 <- dta %>%
mutate(Prop = Case/(Case+Noncase),
phat = predict(m1, newdata = dta[, c(1,2)],
type = "response"))
# plot
ggplot(dta_m1, aes(Score, Prop, color = Gender)) +
geom_jitter() +
geom_line(aes(Score, phat, color = Gender)) +
geom_hline(yintercept = .5, col = "gray") +
labs(x = "GHQ score", y = "Proportion of cases") +
theme(legend.position = c(.9, .1))
# model diagnostics
# residual plot
plot(Residuals(m1, type = c('standard.pearson')) ~ fitted(m1),
ylim=c(-3.3, 3.3),
pch = 20,
xlab = "Fitted values",
ylab = "Pearson residuals")
abline(h=0)
grid()
Goldberg, B.P. (1972). The detection of psychiatric illness by questionnaire. Reported in Everitt, B.S. (2001). Statistics for Psychologists: An Intermediate Course, p. 309.