For the claudication example, first reproduce the results of the analysis then analyze it with a model constraining baseline effect to be the same for both treatment groups while ignoring gender.
The new drug Novafylline is developed to reduce the symptoms of intermittent claudication, which is a medical term for pain caused by too little blood flow, usually during exercise. Patients were randomly assigned to receive either Novafylline (active treatment) or a placebo in a double-blind study. A total of 38 patients (20 for treatment, 18 for placebo) underwent treadmill testing at baseline and at each of 4 monthly, follow-up visits. Patients were stratified by gender (17 females, 21 males). The primary measurement of efficacy is the walking distance on a treadmill until discontinuation due to claudication pain. Is there any distinction in exercise tolerance profiles between patients who receive Novafylline and those on placebo?
Source: Glenn Walker, G., & Shostak, J.(2010). Common Statistical Methods for Clinical Research with SAS Examples.
Column 1: Treatment ID Column 2: Patient ID Column 3: Gender ID Column 4: Walking distance at baseline (month 0) Column 5: Walking distance at month 1 Column 6: Walking distance at month 2 Column 7: Walking distance at month 3 Column 8: Walking distance at month 4
pacman::p_load(knitr, simstudy, tidyverse, nlme, rms, ggplot2, lme4)
dta <- read.csv('data/claudication.csv')
head(dta)
Treatment PID Gender D0 D1 D2 D3 D4
1 ACT P101 M 190 212 213 195 248
2 ACT P105 M 98 137 185 215 225
3 ACT P109 M 155 145 196 189 176
4 ACT P112 M 245 228 280 274 260
5 ACT P117 M 182 205 218 194 193
6 ACT P122 M 140 138 187 195 205
dtaL <- gather(data=dta, key=Time, value=Score, D0:D4, factor_key=TRUE)
head(dtaL)
Treatment PID Gender Time Score
1 ACT P101 M D0 190
2 ACT P105 M D0 98
3 ACT P109 M D0 155
4 ACT P112 M D0 245
5 ACT P117 M D0 182
6 ACT P122 M D0 140
theme_set(theme_bw())
# means with error bars
ggplot(dtaL, aes(Time, Score, group=Treatment, color=Treatment)) +
geom_point(alpha = .3) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
facet_grid( ~ Gender) +
labs(x = "Time (in months)", y = "Walking Distance (in meters)")+
theme(legend.position=c(0.08,0.88))
dta %>%
group_by(Gender, Treatment) %>%
summarise_all('mean') %>%
select(-PID)
# A tibble: 4 x 7
# Groups: Gender [2]
Gender Treatment D0 D1 D2 D3 D4
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 F ACT 180. 193. 208. 197. 208.
2 F PBO 166. 170. 176. 168. 171.
3 M ACT 163. 180. 196. 204. 208.
4 M PBO 178. 166. 173. 194. 194.
dta %>%
group_by(Gender, Treatment) %>%
summarise_all('var') %>%
select(-PID)
# A tibble: 4 x 7
# Groups: Gender [2]
Gender Treatment D0 D1 D2 D3 D4
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 F ACT 1779. 1096. 3388. 1735. 3261.
2 F PBO 1725. 1531. 2233. 1118. 2056.
3 M ACT 1793. 1191. 1612. 807. 786.
4 M PBO 2026. 2073. 1908. 3067. 2242.
cor(subset(dta, Treatment == "ACT" & Gender == "F")[,-(1:3)])
D0 D1 D2 D3 D4
D0 1.00000 0.80220 0.87247 0.84839 0.87572
D1 0.80220 1.00000 0.86451 0.70738 0.90544
D2 0.87247 0.86451 1.00000 0.89375 0.91596
D3 0.84839 0.70738 0.89375 1.00000 0.92511
D4 0.87572 0.90544 0.91596 0.92511 1.00000
cor(subset(dta, Treatment == "ACT" & Gender == "M")[,-(1:3)])
D0 D1 D2 D3 D4
D0 1.00000 0.86248 0.80526 0.59283 0.40690
D1 0.86248 1.00000 0.64643 0.34983 0.44968
D2 0.80526 0.64643 1.00000 0.67049 0.63236
D3 0.59283 0.34983 0.67049 1.00000 0.59414
D4 0.40690 0.44968 0.63236 0.59414 1.00000
cor(subset(dta, Treatment == "PBO" & Gender == "F")[,-(1:3)])
D0 D1 D2 D3 D4
D0 1.00000 0.92691 0.60579 0.86576 0.77769
D1 0.92691 1.00000 0.68262 0.77079 0.75800
D2 0.60579 0.68262 1.00000 0.37132 0.48466
D3 0.86576 0.77079 0.37132 1.00000 0.79310
D4 0.77769 0.75800 0.48466 0.79310 1.00000
cor(subset(dta, Treatment == "PBO" & Gender == "M")[,-(1:3)])
D0 D1 D2 D3 D4
D0 1.00000 0.92712 0.85306 0.88336 0.92020
D1 0.92712 1.00000 0.70360 0.78467 0.90313
D2 0.85306 0.70360 1.00000 0.82928 0.80731
D3 0.88336 0.78467 0.82928 1.00000 0.91975
D4 0.92020 0.90313 0.80731 0.91975 1.00000
dtaL <- mutate(dtaL, Time_f = factor(Time))
dtaL$Time <- as.numeric(dtaL$Time)
m0_gls <- gls(Score ~ Time + Gender * Treatment,
weights = varIdent(form = ~ 1 | Time_f),
correlation=corSymm (form = ~ 1 | PID),
data = dtaL)
summary(m0_gls)
Generalized least squares fit by REML
Model: Score ~ Time + Gender * Treatment
Data: dtaL
AIC BIC logLik
1790.6 1855 -875.31
Correlation Structure: General
Formula: ~1 | PID
Parameter estimate(s):
Correlation:
1 2 3 4
2 0.872
3 0.747 0.717
4 0.760 0.652 0.681
5 0.718 0.739 0.717 0.840
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time_f
Parameter estimates:
D0 D1 D2 D3 D4
1.00000 0.85134 1.01293 0.89669 0.96687
Coefficients:
Value Std.Error t-value p-value
(Intercept) 179.391 12.8983 13.9081 0.0000
Time 6.451 1.3386 4.8192 0.0000
GenderM 0.020 15.6405 0.0013 0.9990
TreatmentPBO -29.113 16.6506 -1.7485 0.0820
GenderM:TreatmentPBO 0.204 22.4846 0.0091 0.9928
Correlation:
(Intr) Time GendrM TrtPBO
Time -0.343
GenderM -0.728 0.000
TreatmentPBO -0.683 0.000 0.564
GenderM:TreatmentPBO 0.506 0.000 -0.696 -0.741
Standardized residuals:
Min Q1 Med Q3 Max
-1.93342 -0.56241 -0.13294 0.53448 3.67355
Residual standard error: 45.444
Degrees of freedom: 190 total; 185 residual
m0_Gls <- Gls(Score ~ Time + Gender * Treatment,
weights = varIdent(form = ~ 1 | Time_f),
correlation=corSymm (form = ~ 1 | PID),
data = dtaL)
predictions <- cbind(dtaL, predict(m0_Gls, dtaL, conf.int=0.95))
predictions$Group2 <- factor(predictions$Treatment, levels=c("All", "ACT", "PBO"))
predictions$Group2[predictions$Time=="D0"] <- "All"
pd <- position_dodge(.08)
limits <- aes(ymax=upper , ymin=lower, shape=Group2)
pCI <- ggplot(predictions,
aes(y=linear.predictors,
x=Time)) +
geom_errorbar(limits,
width=0.09,
position=pd) +
geom_line(aes(group=Treatment,
y=linear.predictors),
position=pd)+
geom_point(aes(fill=Group2, shape=Group2),
position=pd,
size=rel(4)) +
scale_fill_manual(values=c("white", "gray80", "gray20", "black")) +
theme_minimal() +
theme(panel.grid.major=element_blank(),
legend.title=element_blank(),
text=element_text(size=14),
legend.position="bottom") +
labs(x="Time",
y="Estimated mean with corresponding 95% CI")
pCI
Homework 2: Analyze the Alzheimer progression data set following the disussions in the lecture note.
Patients were randomized to receive one of two daily doses (low or high) of a new treatment for Alzheimer・s disease or a placebo in a study. Each patient returned to the clinic every 2 months for 1 year for assessment of disease progression based on cognitive measurements on the Alzheimer・s Disease Assessment Scale (ADAS). This test evaluates memory, language, and praxis function, and is based on the sum of scores from an 11-item scale, with a potential range of 0 to 70, higher scores indicative of greater disease severity. The primary goal is to determine if the rate of disease progression is slowed with active treatment compared with a placebo. Is there a difference in response profiles over time among the three groups?
Source: Glenn Walker, G., & Shostak, J.(2010). Common Statistical Methods for Clinical Research with SAS Examples.
Column 1: Treatment ID (L = Low dose, H = High dose, P = Placebo) Column 2: Patient ID Column 3: ADAS score at month 2 Column 4: ADAS score at month 4 Column 5: ADAS score at month 6 Column 6: ADAS score at month 8 Column 7: ADAS score at month 10 Column 8: ADAS score at month 12
dta <- read.table('data/adas.txt', header = T, na.strings = ".") %>%
na.omit(.)
head(dta)
Treatment PID adas02 adas04 adas06 adas08 adas10 adas12
2 L 5 34 35 46 37 31 35
3 L 8 40 41 41 46 52 48
6 L 15 31 36 41 46 52 57
7 L 19 22 27 28 24 27 28
8 L 21 43 49 42 48 48 46
10 L 28 25 24 27 18 21 22
dtaL <- gather(data=dta, key=Time, value=Adas, adas02:adas12, factor_key=TRUE)
dtaL$Time <- factor(dtaL$Time, levels = c("adas02", "adas04", "adas06", "adas08",
"adas10", "adas12"),
labels = c("2", "4", "6", "8", "10", "12"))
head(dtaL)
Treatment PID Time Adas
1 L 5 2 34
2 L 8 2 40
3 L 15 2 31
4 L 19 2 22
5 L 21 2 43
6 L 28 2 25
theme_set(theme_bw())
# means with error bars
ggplot(dtaL, aes(Time, Adas, group=Treatment, color=Treatment)) +
geom_point(alpha = .3) +
stat_summary(fun = mean, geom = "line") +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
facet_grid( ~ Treatment)
labs(x = "Time (in months)", y = "Adas Scores")
$x
[1] "Time (in months)"
$y
[1] "Adas Scores"
attr(,"class")
[1] "labels"
dta %>%
group_by(Treatment) %>%
summarise_all('mean') %>%
select(-PID)
# A tibble: 3 x 7
Treatment adas02 adas04 adas06 adas08 adas10 adas12
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 H 30.2 32.4 33.6 33.3 33.7 34.3
2 L 32.8 35.1 37.6 36.4 38.4 39.2
3 P 29.6 33.4 34.9 36.4 37.6 39.0
dta %>%
group_by(Treatment) %>%
summarise_all('var') %>%
select(-PID)
# A tibble: 3 x 7
Treatment adas02 adas04 adas06 adas08 adas10 adas12
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 H 73.2 53.7 73.2 82 55.4 50.4
2 L 56.4 74.6 82.6 123. 125. 116.
3 P 45.9 41.2 41.3 40.2 49.5 77.4
cor(subset(dta, Treatment == "H")[,-(1:2)])
adas02 adas04 adas06 adas08 adas10 adas12
adas02 1.00000 0.88586 0.73007 0.78384 0.86916 0.80794
adas04 0.88586 1.00000 0.82823 0.90027 0.92266 0.75602
adas06 0.73007 0.82823 1.00000 0.84630 0.87371 0.68458
adas08 0.78384 0.90027 0.84630 1.00000 0.95110 0.78685
adas10 0.86916 0.92266 0.87371 0.95110 1.00000 0.83392
adas12 0.80794 0.75602 0.68458 0.78685 0.83392 1.00000
cor(subset(dta, Treatment == "L")[,-(1:2)])
adas02 adas04 adas06 adas08 adas10 adas12
adas02 1.00000 0.96956 0.89628 0.89383 0.85371 0.82287
adas04 0.96956 1.00000 0.90886 0.92848 0.89502 0.85668
adas06 0.89628 0.90886 1.00000 0.92592 0.83630 0.83417
adas08 0.89383 0.92848 0.92592 1.00000 0.95700 0.93159
adas10 0.85371 0.89502 0.83630 0.95700 1.00000 0.97054
adas12 0.82287 0.85668 0.83417 0.93159 0.97054 1.00000
cor(subset(dta, Treatment == "P")[,-(1:2)])
adas02 adas04 adas06 adas08 adas10 adas12
adas02 1.00000 0.83409 0.48174 0.50351 0.48623 0.54094
adas04 0.83409 1.00000 0.41375 0.20586 0.26914 0.32287
adas06 0.48174 0.41375 1.00000 0.73207 0.23857 0.54100
adas08 0.50351 0.20586 0.73207 1.00000 0.49470 0.59789
adas10 0.48623 0.26914 0.23857 0.49470 1.00000 0.86075
adas12 0.54094 0.32287 0.54100 0.59789 0.86075 1.00000
car::scatterplotMatrix(~ adas02 + adas04 + adas06 + adas08 + adas10 + adas12 | Treatment,
data=dta,
col=8,
smooth=FALSE,
regLine=FALSE,
ellipse=list(levels=c(0.68, 0.975),
fill=TRUE,
fill.alpha=0.1))
High dose group
dta %>%
filter( Treatment == "H") %>%
select( -(1:2)) %>%
var(na.rm = T)
adas02 adas04 adas06 adas08 adas10 adas12
adas02 73.242 55.542 53.458 60.745 55.359 49.111
adas04 55.542 53.673 51.915 59.725 50.307 39.340
adas06 53.458 51.915 73.203 65.569 55.634 41.601
adas08 60.745 59.725 65.569 82.000 64.098 50.608
adas10 55.359 50.307 55.634 64.098 55.389 44.082
adas12 49.111 39.340 41.601 50.608 44.082 50.448
Low dose group
dta %>%
filter( Treatment == "L") %>%
select( -(1:2)) %>%
var(na.rm = T)
adas02 adas04 adas06 adas08 adas10 adas12
adas02 56.404 62.897 61.184 74.452 71.754 66.533
adas04 62.897 74.610 71.357 88.949 86.518 79.665
adas06 61.184 71.357 82.618 93.342 85.070 81.629
adas08 74.452 88.949 93.342 123.007 118.783 111.235
adas10 71.754 86.518 85.070 118.783 125.243 116.934
adas12 66.533 79.665 81.629 111.235 116.934 115.904
Placebo group
dta %>%
filter( Treatment == "P") %>%
select( -(1:2)) %>%
var(na.rm = T)
adas02 adas04 adas06 adas08 adas10 adas12
adas02 45.948 36.2714 20.993 21.6524 23.179 32.269
adas04 36.271 41.1571 17.064 8.3786 12.143 18.229
adas06 20.993 17.0643 41.329 29.8571 10.786 30.607
adas08 21.652 8.3786 29.857 40.2476 22.071 33.381
adas10 23.179 12.1429 10.786 22.0714 49.457 53.271
adas12 32.269 18.2286 30.607 33.3810 53.271 77.448
m0 <- lm(cbind(adas02 , adas04 , adas06 , adas08 , adas10 , adas12) ~ Treatment, data=dta)
anova(m0)
Analysis of Variance Table
Df Pillai approx F num Df den Df Pr(>F)
(Intercept) 1 0.964 212.4 6 48 <2e-16
Treatment 2 0.254 1.2 12 98 0.3
Residuals 53
summary(m0a <- aov(Adas ~ Time*Treatment + Error(PID/Time), data=dtaL))
Error: PID
Df Sum Sq Mean Sq
Treatment 1 3.58 3.58
Error: PID:Time
Df Sum Sq Mean Sq
Time 5 1224 245
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Time 5 450 90 1.29 0.2662
Treatment 2 717 359 5.16 0.0062
Time:Treatment 10 231 23 0.33 0.9718
Residuals 312 21679 69
sjPlot::tab_model(m0_er <- lmer(Adas ~ Time*Treatment + (PID|Time), data=dtaL), show.ci=F, show.r2=F, show.ngroups=F, show.obs=F)
| Adas | ||
|---|---|---|
| Predictors | Estimates | p |
| (Intercept) | 30.22 | <0.001 |
| Time [4] | 2.22 | 0.853 |
| Time [6] | 3.33 | 0.781 |
| Time [8] | 3.11 | 0.796 |
| Time [10] | 3.50 | 0.771 |
| Time [12] | 4.06 | 0.736 |
| Treatment [L] | 2.60 | 0.352 |
| Treatment [P] | -0.60 | 0.820 |
| Time [4] * Treatment [L] | 0.07 | 0.985 |
| Time [6] * Treatment [L] | 1.49 | 0.706 |
| Time [8] * Treatment [L] | 0.48 | 0.904 |
| Time [10] * Treatment [L] | 2.03 | 0.608 |
| Time [12] * Treatment [L] | 2.30 | 0.561 |
| Time [4] * Treatment [P] | 1.59 | 0.673 |
| Time [6] * Treatment [P] | 1.90 | 0.612 |
| Time [8] * Treatment [P] | 3.65 | 0.331 |
| Time [10] * Treatment [P] | 4.45 | 0.236 |
| Time [12] * Treatment [P] | 5.37 | 0.153 |
| Random Effects | ||
| σ2 | 68.40 | |
| τ00 Time | 68.40 | |
| τ11 Time.PID | 0.00 | |
| ρ01 Time | 1.00 | |
dtaL <- mutate(dtaL, Time_f = factor(Time))
dtaL$Time <- as.numeric(dtaL$Time)
summary(m0_csh <- gls(Adas ~ Time*Treatment, data=dtaL,
weights=varIdent(form= ~ 1 | Time_f),
correlation=corCompSymm(form= ~ 1 | PID)))
Generalized least squares fit by REML
Model: Adas ~ Time * Treatment
Data: dtaL
AIC BIC logLik
2069.4 2118.7 -1021.7
Correlation Structure: Compound symmetry
Formula: ~1 | PID
Parameter estimate(s):
Rho
0.76308
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time_f
Parameter estimates:
2 4 6 8 10 12
1.00000 0.96819 1.05763 1.11731 1.09674 1.17009
Coefficients:
Value Std.Error t-value p-value
(Intercept) 30.5984 1.70067 17.9919 0.0000
Time 0.7066 0.23384 3.0217 0.0027
TreatmentL 2.0359 2.44023 0.8343 0.4047
TreatmentP -1.4339 2.31763 -0.6187 0.5365
Time:TreatmentL 0.4697 0.33552 1.4000 0.1624
Time:TreatmentP 1.0645 0.31866 3.3405 0.0009
Correlation:
(Intr) Time TrtmnL TrtmnP Tm:TrL
Time -0.207
TreatmentL -0.697 0.145
TreatmentP -0.734 0.152 0.511
Time:TreatmentL 0.145 -0.697 -0.207 -0.106
Time:TreatmentP 0.152 -0.734 -0.106 -0.207 0.511
Standardized residuals:
Min Q1 Med Q3 Max
-2.243020 -0.753525 -0.071033 0.682438 2.675540
Residual standard error: 7.7169
Degrees of freedom: 336 total; 330 residual
m0_unh <- gls(Adas ~ Time*Treatment, data=dtaL,
weights=varIdent(form= ~ 1 | Time_f),
correlation=corSymm(form= ~ 1 | PID))
summary(m0_unh)
Generalized least squares fit by REML
Model: Adas ~ Time * Treatment
Data: dtaL
AIC BIC logLik
2017.4 2120 -981.72
Correlation Structure: General
Formula: ~1 | PID
Parameter estimate(s):
Correlation:
1 2 3 4 5
2 0.881
3 0.706 0.757
4 0.751 0.748 0.842
5 0.741 0.724 0.686 0.852
6 0.709 0.640 0.676 0.790 0.905
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | Time_f
Parameter estimates:
2 4 6 8 10 12
1.0000 0.9705 1.0392 1.1543 1.1268 1.1828
Coefficients:
Value Std.Error t-value p-value
(Intercept) 30.2745 1.79017 16.9115 0.0000
Time 0.9080 0.29553 3.0726 0.0023
TreatmentL 2.8237 2.56865 1.0993 0.2724
TreatmentP -1.8918 2.43960 -0.7755 0.4386
Time:TreatmentL 0.3743 0.42405 0.8826 0.3781
Time:TreatmentP 1.1448 0.40274 2.8424 0.0048
Correlation:
(Intr) Time TrtmnL TrtmnP Tm:TrL
Time -0.394
TreatmentL -0.697 0.275
TreatmentP -0.734 0.289 0.511
Time:TreatmentL 0.275 -0.697 -0.394 -0.201
Time:TreatmentP 0.289 -0.734 -0.201 -0.394 0.511
Standardized residuals:
Min Q1 Med Q3 Max
-2.28500 -0.83379 -0.12055 0.61321 2.64189
Residual standard error: 7.6689
Degrees of freedom: 336 total; 330 residual
anova(m0_csh, m0_unh)
Model df AIC BIC logLik Test L.Ratio p-value
m0_csh 1 13 2069.4 2118.8 -1021.68
m0_unh 2 27 2017.4 2120.0 -981.72 1 vs 2 79.926 <.0001
Unstructured may be the better fit model than Compound Symmetry model.
In this case, although it was a randomized exp design, it did not record their adas scores before intervention as baseline. I don’t know whether using their adas scores at 2 months is appropriate for constrained model, so I don’t do it.
Use appropriate methods to answer questions posed in the body fat example.
The body fat was measured, using dual-energy x-ray absorptiometry, in several hundred children at age 11, 13, and 15 years.
Use the data set (in Stata dta format) to answer the following questions: (a) Does body fat increase with age for both girls and boys? (b) Is there a difference between girls and boys with respect to the increase of the body fat with age? (c) Do girls tend to have more body fat than boys?
Source: Vach, W. (2013). Regression Models as a Tool in Medical Research. p. 330. Chapman/Hall & CRC Press.
Column 1: Subject ID Column 2: Gender ID Column 3: Age in years Column 4: Body fat in kilograms
dta <- foreign::read.dta('data/bodyfat.dta')
head(dta)
id sex age bodyfat
1 1 male 11 4.0
2 1 male 13 6.2
3 1 male 15 10.5
4 2 female 11 8.1
5 2 female 13 10.4
6 2 female 15 15.2
lattice::xyplot(bodyfat ~ age|sex,
groups=id,
type=c('l','g', 'p'),
xlab="Time (year)",
ylab="Body Fat (kg)",
data=dta)
dta %>%
group_by(sex, age) %>%
summarise_all('mean') %>%
select(-id)
# A tibble: 6 x 3
# Groups: sex [2]
sex age bodyfat
<fct> <dbl> <dbl>
1 female 11 8.77
2 female 13 10.2
3 female 15 11.3
4 male 11 7.94
5 male 13 8.84
6 male 15 9.82
dta %>%
group_by(sex, age) %>%
summarise_all('var') %>%
select(-id)
# A tibble: 6 x 3
# Groups: sex [2]
sex age bodyfat
<fct> <dbl> <dbl>
1 female 11 3.71
2 female 13 5.13
3 female 15 7.82
4 male 11 3.93
5 male 13 4.92
6 male 15 8.14
dtaW <- dta %>%
gather(key, value, -sex, -age, -id) %>%
unite(col, key, age) %>%
spread(col, value)
head(dtaW)
id sex bodyfat_11 bodyfat_13 bodyfat_15
1 1 male 4.0 6.2 10.5
2 2 female 8.1 10.4 15.2
3 3 female 9.0 9.2 11.4
4 4 female 7.3 9.2 12.3
5 5 female 11.6 10.9 10.5
6 6 male 9.9 9.6 11.9
Female
cor(subset(dtaW, sex == "female")[,-(1:2)])
bodyfat_11 bodyfat_13 bodyfat_15
bodyfat_11 1 NA NA
bodyfat_13 NA 1 NA
bodyfat_15 NA NA 1
Male
cor(subset(dtaW, sex == "male")[,-(1:2)])
bodyfat_11 bodyfat_13 bodyfat_15
bodyfat_11 1 NA NA
bodyfat_13 NA 1 NA
bodyfat_15 NA NA 1
car::scatterplotMatrix(~ bodyfat_11 + bodyfat_13 + bodyfat_15 | sex,
data=dtaW,
col=8,
smooth=FALSE,
regLine=FALSE,
ellipse=list(levels=c(0.68, 0.975),
fill=TRUE,
fill.alpha=0.1))
ggplot(dta,
aes(age, bodyfat, color=sex)) +
stat_summary(fun.data="mean_cl_boot",
geom="pointrange",
col=2) +
geom_hline(yintercept=mean(dta$bodyfat),
linetype="dashed") +
geom_point(alpha=.3, size=rel(.8)) +
facet_grid(~ sex)
labs(x="Age (year)",
y="Bodyfat (kg)") +
theme_minimal() +
theme(legend.position = "none")
NULL
summary(m0 <- aov(bodyfat ~ age*sex + Error(id/age), data=dta))
Error: id
Df Sum Sq Mean Sq
age 1 8.63 8.63
Error: id:age
Df Sum Sq Mean Sq
age 1 1468 1468
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
age 1 353 353 63.6 2.4e-15
sex 1 821 821 147.8 < 2e-16
age:sex 1 43 43 7.8 0.0053
Residuals 2267 12590 6
Univariate approach revealed that (a) Does body fat increase with age for both girls and boys? A: Yes
Is there a difference between girls and boys with respect to the increase of the body fat with age? A: Yes
Do girls tend to have more body fat than boys? A: Yes
summary(m1 <- gls(bodyfat ~ sex*age,
weights = varIdent(form = ~ 1 | age*sex),
data = dta))
Generalized least squares fit by REML
Model: bodyfat ~ sex * age
Data: dta
AIC BIC logLik
10273 10330 -5126.3
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | age * sex
Parameter estimates:
11*male 13*male 15*male 11*female 13*female 15*female
1.00000 1.11720 1.43844 0.97085 1.14155 1.41011
Coefficients:
Value Std.Error t-value p-value
(Intercept) 1.69527 0.55897 3.0328 0.0025
sexmale 1.09634 0.76891 1.4258 0.1541
age 0.64532 0.04450 14.5030 0.0000
sexmale:age -0.17819 0.06113 -2.9148 0.0036
Correlation:
(Intr) sexmal age
sexmale -0.727
age -0.992 0.721
sexmale:age 0.722 -0.992 -0.728
Standardized residuals:
Min Q1 Med Q3 Max
-2.85177 -0.70626 -0.10066 0.59643 5.17847
Residual standard error: 1.9832
Degrees of freedom: 2273 total; 2269 residual
Model 2: Variance Component across time.
summary(m2 <- gls(bodyfat ~ sex*age,
weights = varExp(form = ~ age),
data = dta))
Generalized least squares fit by REML
Model: bodyfat ~ sex * age
Data: dta
AIC BIC logLik
10268 10302 -5127.8
Variance function:
Structure: Exponential of variance covariate
Formula: ~age
Parameter estimates:
expon
0.091678
Coefficients:
Value Std.Error t-value p-value
(Intercept) 1.69849 0.55798 3.0440 0.0024
sexmale 1.09179 0.75884 1.4388 0.1504
age 0.64498 0.04441 14.5226 0.0000
sexmale:age -0.17767 0.06038 -2.9424 0.0033
Correlation:
(Intr) sexmal age
sexmale -0.735
age -0.992 0.730
sexmale:age 0.730 -0.992 -0.736
Standardized residuals:
Min Q1 Med Q3 Max
-2.86724 -0.71920 -0.10029 0.60677 5.32886
Residual standard error: 0.70298
Degrees of freedom: 2273 total; 2269 residual
summary(m3 <- gls(bodyfat ~ sex*age,
weights = varExp(form = ~ age | sex),
data = dta))
Generalized least squares fit by REML
Model: bodyfat ~ sex * age
Data: dta
AIC BIC logLik
10270 10310 -5127.8
Variance function:
Structure: Exponential of variance covariate, different strata
Formula: ~age | sex
Parameter estimates:
male female
0.092026 0.091266
Coefficients:
Value Std.Error t-value p-value
(Intercept) 1.69884 0.55502 3.0609 0.0022
sexmale 1.09153 0.75824 1.4396 0.1501
age 0.64495 0.04417 14.6017 0.0000
sexmale:age -0.17765 0.06033 -2.9444 0.0033
Correlation:
(Intr) sexmal age
sexmale -0.732
age -0.992 0.726
sexmale:age 0.726 -0.992 -0.732
Standardized residuals:
Min Q1 Med Q3 Max
-2.88503 -0.71545 -0.10076 0.60445 5.30852
Residual standard error: 0.70297
Degrees of freedom: 2273 total; 2269 residual
summary(m4 <- gls(bodyfat ~ sex*age,
correlation = corCompSymm(form = ~ 1 | id),
weights = varIdent(form = ~ 1 | age*sex),
data = dta))
Generalized least squares fit by REML
Model: bodyfat ~ sex * age
Data: dta
AIC BIC logLik
10169 10232 -5073.6
Correlation Structure: Compound symmetry
Formula: ~1 | id
Parameter estimate(s):
Rho
0.23257
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | age * sex
Parameter estimates:
11*male 13*male 15*male 11*female 13*female 15*female
1.00000 1.07940 1.40790 0.97918 1.10819 1.37546
Coefficients:
Value Std.Error t-value p-value
(Intercept) 1.73423 0.49715 3.4883 0.0005
sexmale 1.06276 0.68275 1.5566 0.1197
age 0.64205 0.03977 16.1422 0.0000
sexmale:age -0.17534 0.05458 -3.2128 0.0013
Correlation:
(Intr) sexmal age
sexmale -0.728
age -0.986 0.718
sexmale:age 0.719 -0.986 -0.729
Standardized residuals:
Min Q1 Med Q3 Max
-2.869437 -0.706899 -0.099607 0.599113 5.088486
Residual standard error: 2.0181
Degrees of freedom: 2273 total; 2269 residual
summary(m5 <- gls(bodyfat ~ sex*age,
correlation = corSymm(form = ~ 1 | id),
weights = varIdent(form = ~ 1 | age*sex),
data = dta))
Generalized least squares fit by REML
Model: bodyfat ~ sex * age
Data: dta
AIC BIC logLik
10083 10157 -5028.4
Correlation Structure: General
Formula: ~1 | id
Parameter estimate(s):
Correlation:
1 2
2 0.166
3 0.054 0.461
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | age * sex
Parameter estimates:
11*male 13*male 15*male 11*female 13*female 15*female
1.00000 1.12179 1.44611 0.97823 1.14538 1.40922
Coefficients:
Value Std.Error t-value p-value
(Intercept) 1.80450 0.54805 3.2926 0.0010
sexmale 0.97602 0.75390 1.2946 0.1956
age 0.63588 0.04446 14.3036 0.0000
sexmale:age -0.16775 0.06114 -2.7438 0.0061
Correlation:
(Intr) sexmal age
sexmale -0.727
age -0.989 0.719
sexmale:age 0.719 -0.989 -0.727
Standardized residuals:
Min Q1 Med Q3 Max
-2.85169 -0.70638 -0.10301 0.59504 5.19623
Residual standard error: 1.9765
Degrees of freedom: 2273 total; 2269 residual
anova(m1, m2, m3, m4, m5)
Model df AIC BIC logLik Test L.Ratio p-value
m1 1 10 10272 10330 -5126.3
m2 2 6 10268 10302 -5127.8 1 vs 2 3.111 0.5394
m3 3 7 10270 10310 -5127.8 2 vs 3 0.111 0.7389
m4 4 11 10169 10232 -5073.6 3 vs 4 108.338 <.0001
m5 5 13 10083 10157 -5028.4 4 vs 5 90.379 <.0001
Linear mix model approach revealed that (a) Does body fat increase with age for both girls and boys? A: Yes
Is there a difference between girls and boys with respect to the increase of the body fat with age? A: Yes
Do girls tend to have more body fat than boys? A: Yes