pacman::p_load(magrittr, dplyr, knitr, rmdformats, geepack, repolr, geesmv)
## Global options
options(max.print="75")
opts_chunk$set(echo=FALSE,
cache=TRUE,
prompt=FALSE,
tidy=TRUE,
comment=NA,
message=FALSE,
warning=FALSE)
opts_knit$set(width=75)
fL <- "https://content.sph.harvard.edu/fitzmaur/ala2e/muscatine-data.txt"
dta <- read.table(fL, h=F, na.strings = '.')
str(dta)
## 'data.frame': 14568 obs. of 6 variables:
## $ V1: int 1 1 1 2 2 2 3 3 3 4 ...
## $ V2: int 0 0 0 0 0 0 0 0 0 0 ...
## $ V3: int 6 6 6 6 6 6 6 6 6 6 ...
## $ V4: int 6 8 10 6 8 10 6 8 10 6 ...
## $ V5: int 1 2 3 1 2 3 1 2 3 1 ...
## $ V6: int 1 1 1 1 1 1 1 1 1 1 ...
ID Gender Age0 Age Occasion Obese
1 1 Boys 6 6 1977 1
2 1 Boys 6 8 1979 1
3 1 Boys 6 10 1981 1
4 2 Boys 6 6 1977 1
5 2 Boys 6 8 1979 1
6 2 Boys 6 10 1981 1
3.以列表檢視變更資料
Occasion 1 2 3
Obese 0 1 0 1 0 1
Gender Age
F 6 141 23 0 0 0 0
8 294 58 265 55 0 0
10 270 92 276 87 268 90
12 291 91 279 99 278 92
14 300 89 226 64 256 73
16 0 0 250 87 226 56
18 0 0 0 0 140 37
M 6 174 15 0 0 0 0
8 289 67 296 54 0 0
10 312 84 298 77 308 83
12 281 90 299 88 290 90
14 307 73 233 65 269 78
16 0 0 251 67 224 54
18 0 0 0 0 153 34
m0 : Obese ~ Gender + Age + Age2 + Gender:Age + Gender:Age2 (correlation matrix for change over time: AR1) 結果顯示年齡與年齡^2為判定為obesity的重要變項
ID Gender Age0 Age Occasion Obese Age2
1 1 M 6 -6 1 1 36
2 1 M 6 -4 2 1 16
3 1 M 6 -2 3 1 4
4 2 M 6 -6 1 1 36
5 2 M 6 -4 2 1 16
6 2 M 6 -2 3 1 4
Call:
geeglm(formula = Obese ~ Gender + Age + Age2 + Gender:Age + Gender:Age2,
family = binomial, data = dta1, id = ID, corstr = "ar1")
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) -1.100392 0.050179 480.889 < 2e-16 ***
GenderM -0.105306 0.071357 2.178 0.140012
Age 0.045164 0.012811 12.428 0.000423 ***
Age2 -0.014597 0.003249 20.184 7.03e-06 ***
GenderM:Age -0.002900 0.018563 0.024 0.875848
GenderM:Age2 -0.003495 0.004715 0.549 0.458569
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation structure = ar1
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 0.9943 0.02807
Link = identity
Estimated Correlation Parameters:
Estimate Std.err
alpha 0.6106 0.02036
Number of clusters: 4856 Maximum cluster size: 3
交互作用移除後,性別與年齡及年齡^2皆為判定為obesity的重要變項,資料顯示應該是女生(=1)較會被判定為obese
Call:
geeglm(formula = Obese ~ Gender + Age + Age2, family = binomial,
data = dta1, id = ID, corstr = "ar1")
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) -1.08751 0.04724 530.08 < 2e-16 ***
GenderM -0.13145 0.06288 4.37 0.037 *
Age 0.04387 0.00926 22.43 2.2e-06 ***
Age2 -0.01630 0.00235 48.05 4.2e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation structure = ar1
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 0.994 0.028
Link = identity
Estimated Correlation Parameters:
Estimate Std.err
alpha 0.61 0.0203
Number of clusters: 4856 Maximum cluster size: 3
| Obese | Obese | |||||
|---|---|---|---|---|---|---|
| Predictors | Odds Ratios | CI | p | Odds Ratios | CI | p |
| (Intercept) | 0.33 | 0.30 – 0.37 | <0.001 | 0.34 | 0.31 – 0.37 | <0.001 |
| Gender [M] | 0.90 | 0.78 – 1.03 | 0.136 | 0.88 | 0.78 – 0.99 | 0.036 |
| Age | 1.04 | 1.02 – 1.07 | 0.001 | 1.04 | 1.02 – 1.06 | <0.001 |
| Age2 | 0.99 | 0.98 – 0.99 | <0.001 | 0.98 | 0.98 – 0.99 | <0.001 |
| Gender [M] * Age | 0.99 | 0.96 – 1.03 | 0.765 | |||
| Gender [M] * Age2 | 1.00 | 0.99 – 1.01 | 0.483 | |||
在1977, 1979, 1981;
三次施測的到的結果類似,女生較肥,12~14歲左右為肥胖高峰期
The end