1 Homework 1:

For the claudication example, first reproduce the results of the analysis then analyze it with a model constraining baseline effect to be the same for both treatment groups while ignoring gender.

The new drug Novafylline is developed to reduce the symptoms of intermittent claudication, which is a medical term for pain caused by too little blood flow, usually during exercise. Patients were randomly assigned to receive either Novafylline (active treatment) or a placebo in a double-blind study. A total of 38 patients (20 for treatment, 18 for placebo) underwent treadmill testing at baseline and at each of 4 monthly, follow-up visits. Patients were stratified by gender (17 females, 21 males). The primary measurement of efficacy is the walking distance on a treadmill until discontinuation due to claudication pain. Is there any distinction in exercise tolerance profiles between patients who receive Novafylline and those on placebo?

Source: Glenn Walker, G., & Shostak, J.(2010). Common Statistical Methods for Clinical Research with SAS Examples.

Column 1: Treatment ID Column 2: Patient ID Column 3: Gender ID Column 4: Walking distance at baseline (month 0) Column 5: Walking distance at month 1 Column 6: Walking distance at month 2 Column 7: Walking distance at month 3 Column 8: Walking distance at month 4

1.1 Load data file

pacman::p_load(knitr, simstudy, tidyverse, nlme, rms, ggplot2, lme4)

dta <- read.csv('data/claudication.csv')

head(dta)

  Treatment  PID Gender  D0  D1  D2  D3  D4
1       ACT P101      M 190 212 213 195 248
2       ACT P105      M  98 137 185 215 225
3       ACT P109      M 155 145 196 189 176
4       ACT P112      M 245 228 280 274 260
5       ACT P117      M 182 205 218 194 193
6       ACT P122      M 140 138 187 195 205

1.2 Wide to long format

dtaL <- gather(data=dta, key=Time, value=Score, D0:D4, factor_key=TRUE)  

head(dtaL)

  Treatment  PID Gender Time Score
1       ACT P101      M   D0   190
2       ACT P105      M   D0    98
3       ACT P109      M   D0   155
4       ACT P112      M   D0   245
5       ACT P117      M   D0   182
6       ACT P122      M   D0   140

1.3 Data Visualization

theme_set(theme_bw())

# means with error bars
ggplot(dtaL, aes(Time, Score, group=Treatment, color=Treatment)) +
  geom_point(alpha = .3) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  facet_grid( ~ Gender) +
  labs(x = "Time (in months)", y = "Walking Distance (in meters)")+
  theme(legend.position=c(0.08,0.88))

1.4 Means & Variances

dta %>%
  group_by(Gender, Treatment) %>%
  summarise_all('mean') %>%
  select(-PID)

# A tibble: 4 x 7
# Groups:   Gender [2]
  Gender Treatment    D0    D1    D2    D3    D4
  <chr>  <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>
1 F      ACT        180.  193.  208.  197.  208.
2 F      PBO        166.  170.  176.  168.  171.
3 M      ACT        163.  180.  196.  204.  208.
4 M      PBO        178.  166.  173.  194.  194.

dta %>%
  group_by(Gender, Treatment) %>%
  summarise_all('var') %>%
  select(-PID)

# A tibble: 4 x 7
# Groups:   Gender [2]
  Gender Treatment    D0    D1    D2    D3    D4
  <chr>  <chr>     <dbl> <dbl> <dbl> <dbl> <dbl>
1 F      ACT       1779. 1096. 3388. 1735. 3261.
2 F      PBO       1725. 1531. 2233. 1118. 2056.
3 M      ACT       1793. 1191. 1612.  807.  786.
4 M      PBO       2026. 2073. 1908. 3067. 2242.

1.5 Correlation matrix for each group across gender

cor(subset(dta, Treatment == "ACT" & Gender == "F")[,-(1:3)])

        D0      D1      D2      D3      D4
D0 1.00000 0.80220 0.87247 0.84839 0.87572
D1 0.80220 1.00000 0.86451 0.70738 0.90544
D2 0.87247 0.86451 1.00000 0.89375 0.91596
D3 0.84839 0.70738 0.89375 1.00000 0.92511
D4 0.87572 0.90544 0.91596 0.92511 1.00000

cor(subset(dta, Treatment == "ACT" & Gender == "M")[,-(1:3)])

        D0      D1      D2      D3      D4
D0 1.00000 0.86248 0.80526 0.59283 0.40690
D1 0.86248 1.00000 0.64643 0.34983 0.44968
D2 0.80526 0.64643 1.00000 0.67049 0.63236
D3 0.59283 0.34983 0.67049 1.00000 0.59414
D4 0.40690 0.44968 0.63236 0.59414 1.00000

cor(subset(dta, Treatment == "PBO" & Gender == "F")[,-(1:3)])

        D0      D1      D2      D3      D4
D0 1.00000 0.92691 0.60579 0.86576 0.77769
D1 0.92691 1.00000 0.68262 0.77079 0.75800
D2 0.60579 0.68262 1.00000 0.37132 0.48466
D3 0.86576 0.77079 0.37132 1.00000 0.79310
D4 0.77769 0.75800 0.48466 0.79310 1.00000

cor(subset(dta, Treatment == "PBO" & Gender == "M")[,-(1:3)])

        D0      D1      D2      D3      D4
D0 1.00000 0.92712 0.85306 0.88336 0.92020
D1 0.92712 1.00000 0.70360 0.78467 0.90313
D2 0.85306 0.70360 1.00000 0.82928 0.80731
D3 0.88336 0.78467 0.82928 1.00000 0.91975
D4 0.92020 0.90313 0.80731 0.91975 1.00000

1.6 Constrained Model

dtaL <- mutate(dtaL, Time_f = factor(Time))
dtaL$Time <- as.numeric(dtaL$Time)
m0_gls <- gls(Score ~ Time + Gender * Treatment, 
                weights = varIdent(form = ~ 1 | Time_f), 
                correlation=corSymm (form = ~ 1 | PID), 
                data = dtaL)
summary(m0_gls)

Generalized least squares fit by REML
  Model: Score ~ Time + Gender * Treatment 
  Data: dtaL 
     AIC  BIC  logLik
  1790.6 1855 -875.31

Correlation Structure: General
 Formula: ~1 | PID 
 Parameter estimate(s):
 Correlation: 
  1     2     3     4    
2 0.872                  
3 0.747 0.717            
4 0.760 0.652 0.681      
5 0.718 0.739 0.717 0.840
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Time_f 
 Parameter estimates:
     D0      D1      D2      D3      D4 
1.00000 0.85134 1.01293 0.89669 0.96687 

Coefficients:
                       Value Std.Error t-value p-value
(Intercept)          179.391   12.8983 13.9081  0.0000
Time                   6.451    1.3386  4.8192  0.0000
GenderM                0.020   15.6405  0.0013  0.9990
TreatmentPBO         -29.113   16.6506 -1.7485  0.0820
GenderM:TreatmentPBO   0.204   22.4846  0.0091  0.9928

 Correlation: 
                     (Intr) Time   GendrM TrtPBO
Time                 -0.343                     
GenderM              -0.728  0.000              
TreatmentPBO         -0.683  0.000  0.564       
GenderM:TreatmentPBO  0.506  0.000 -0.696 -0.741

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-1.93342 -0.56241 -0.13294  0.53448  3.67355 

Residual standard error: 45.444 
Degrees of freedom: 190 total; 185 residual

m0_Gls <- Gls(Score ~ Time + Gender * Treatment, 
                weights = varIdent(form = ~ 1 | Time_f), 
                correlation=corSymm (form = ~ 1 | PID), 
                data = dtaL)

predictions <- cbind(dtaL, predict(m0_Gls, dtaL, conf.int=0.95))

predictions$Group2 <- factor(predictions$Treatment, levels=c("All", "ACT", "PBO"))
predictions$Group2[predictions$Time=="D0"] <- "All"

pd <- position_dodge(.08)
limits <- aes(ymax=upper , ymin=lower, shape=Group2)

pCI <- ggplot(predictions, 
              aes(y=linear.predictors, 
                  x=Time)) + 
  geom_errorbar(limits, 
                width=0.09, 
                position=pd) +
  geom_line(aes(group=Treatment, 
                y=linear.predictors), 
            position=pd)+
  geom_point(aes(fill=Group2, shape=Group2),
             position=pd, 
             size=rel(4)) + 
  scale_fill_manual(values=c("white", "gray80", "gray20", "black")) + 
  theme_minimal() + 
  theme(panel.grid.major=element_blank(), 
        legend.title=element_blank(), 
        text=element_text(size=14), 
        legend.position="bottom") + 
  labs(x="Time", 
       y="Estimated mean with corresponding 95% CI")
pCI

Homework 2: Analyze the Alzheimer progression data set following the disussions in the lecture note.

Patients were randomized to receive one of two daily doses (low or high) of a new treatment for Alzheimer・s disease or a placebo in a study. Each patient returned to the clinic every 2 months for 1 year for assessment of disease progression based on cognitive measurements on the Alzheimer・s Disease Assessment Scale (ADAS). This test evaluates memory, language, and praxis function, and is based on the sum of scores from an 11-item scale, with a potential range of 0 to 70, higher scores indicative of greater disease severity. The primary goal is to determine if the rate of disease progression is slowed with active treatment compared with a placebo. Is there a difference in response profiles over time among the three groups?

Source: Glenn Walker, G., & Shostak, J.(2010). Common Statistical Methods for Clinical Research with SAS Examples.

Column 1: Treatment ID (L = Low dose, H = High dose, P = Placebo) Column 2: Patient ID Column 3: ADAS score at month 2 Column 4: ADAS score at month 4 Column 5: ADAS score at month 6 Column 6: ADAS score at month 8 Column 7: ADAS score at month 10 Column 8: ADAS score at month 12

1.7 Load data file

dta <- read.table('data/adas.txt', header = T, na.strings = ".") %>%
  na.omit(.)

head(dta)

   Treatment PID adas02 adas04 adas06 adas08 adas10 adas12
2          L   5     34     35     46     37     31     35
3          L   8     40     41     41     46     52     48
6          L  15     31     36     41     46     52     57
7          L  19     22     27     28     24     27     28
8          L  21     43     49     42     48     48     46
10         L  28     25     24     27     18     21     22

1.8 Wide to long format

dtaL <- gather(data=dta, key=Time, value=Adas, adas02:adas12, factor_key=TRUE)  
dtaL$Time <- factor(dtaL$Time, levels = c("adas02", "adas04", "adas06", "adas08",
                                          "adas10", "adas12"),
                    labels = c("2", "4", "6", "8", "10", "12"))

head(dtaL)

  Treatment PID Time Adas
1         L   5    2   34
2         L   8    2   40
3         L  15    2   31
4         L  19    2   22
5         L  21    2   43
6         L  28    2   25

1.9 Data Visualization

theme_set(theme_bw())

# means with error bars
ggplot(dtaL, aes(Time, Adas, group=Treatment, color=Treatment)) +
  geom_point(alpha = .3) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  facet_grid( ~ Treatment)

  labs(x = "Time (in months)", y = "Adas Scores")

$x
[1] "Time (in months)"

$y
[1] "Adas Scores"

attr(,"class")
[1] "labels"

1.10 Means & Variances

dta %>%
  group_by(Treatment) %>%
  summarise_all('mean') %>%
  select(-PID)

# A tibble: 3 x 7
  Treatment adas02 adas04 adas06 adas08 adas10 adas12
  <chr>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 H           30.2   32.4   33.6   33.3   33.7   34.3
2 L           32.8   35.1   37.6   36.4   38.4   39.2
3 P           29.6   33.4   34.9   36.4   37.6   39.0

dta %>%
  group_by(Treatment) %>%
  summarise_all('var') %>%
  select(-PID)

# A tibble: 3 x 7
  Treatment adas02 adas04 adas06 adas08 adas10 adas12
  <chr>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 H           73.2   53.7   73.2   82     55.4   50.4
2 L           56.4   74.6   82.6  123.   125.   116. 
3 P           45.9   41.2   41.3   40.2   49.5   77.4

1.11 Correlation matrix for each group

cor(subset(dta, Treatment == "H")[,-(1:2)])

        adas02  adas04  adas06  adas08  adas10  adas12
adas02 1.00000 0.88586 0.73007 0.78384 0.86916 0.80794
adas04 0.88586 1.00000 0.82823 0.90027 0.92266 0.75602
adas06 0.73007 0.82823 1.00000 0.84630 0.87371 0.68458
adas08 0.78384 0.90027 0.84630 1.00000 0.95110 0.78685
adas10 0.86916 0.92266 0.87371 0.95110 1.00000 0.83392
adas12 0.80794 0.75602 0.68458 0.78685 0.83392 1.00000

cor(subset(dta, Treatment == "L")[,-(1:2)])

        adas02  adas04  adas06  adas08  adas10  adas12
adas02 1.00000 0.96956 0.89628 0.89383 0.85371 0.82287
adas04 0.96956 1.00000 0.90886 0.92848 0.89502 0.85668
adas06 0.89628 0.90886 1.00000 0.92592 0.83630 0.83417
adas08 0.89383 0.92848 0.92592 1.00000 0.95700 0.93159
adas10 0.85371 0.89502 0.83630 0.95700 1.00000 0.97054
adas12 0.82287 0.85668 0.83417 0.93159 0.97054 1.00000

cor(subset(dta, Treatment == "P")[,-(1:2)])

        adas02  adas04  adas06  adas08  adas10  adas12
adas02 1.00000 0.83409 0.48174 0.50351 0.48623 0.54094
adas04 0.83409 1.00000 0.41375 0.20586 0.26914 0.32287
adas06 0.48174 0.41375 1.00000 0.73207 0.23857 0.54100
adas08 0.50351 0.20586 0.73207 1.00000 0.49470 0.59789
adas10 0.48623 0.26914 0.23857 0.49470 1.00000 0.86075
adas12 0.54094 0.32287 0.54100 0.59789 0.86075 1.00000

car::scatterplotMatrix(~ adas02 + adas04 + adas06 + adas08 + adas10 + adas12 | Treatment,
                  data=dta,
                  col=8,
                  smooth=FALSE, 
                  regLine=FALSE,
                  ellipse=list(levels=c(0.68, 0.975), 
                               fill=TRUE, 
                               fill.alpha=0.1))

1.12 Covariance matrices:

High dose group

dta %>%
 filter( Treatment == "H") %>%
 select( -(1:2)) %>%
 var(na.rm = T)

       adas02 adas04 adas06 adas08 adas10 adas12
adas02 73.242 55.542 53.458 60.745 55.359 49.111
adas04 55.542 53.673 51.915 59.725 50.307 39.340
adas06 53.458 51.915 73.203 65.569 55.634 41.601
adas08 60.745 59.725 65.569 82.000 64.098 50.608
adas10 55.359 50.307 55.634 64.098 55.389 44.082
adas12 49.111 39.340 41.601 50.608 44.082 50.448

Low dose group

dta %>%
 filter( Treatment == "L") %>%
 select( -(1:2)) %>%
 var(na.rm = T)

       adas02 adas04 adas06  adas08  adas10  adas12
adas02 56.404 62.897 61.184  74.452  71.754  66.533
adas04 62.897 74.610 71.357  88.949  86.518  79.665
adas06 61.184 71.357 82.618  93.342  85.070  81.629
adas08 74.452 88.949 93.342 123.007 118.783 111.235
adas10 71.754 86.518 85.070 118.783 125.243 116.934
adas12 66.533 79.665 81.629 111.235 116.934 115.904

Placebo group

dta %>%
 filter( Treatment == "P") %>%
 select( -(1:2)) %>%
 var(na.rm = T)

       adas02  adas04 adas06  adas08 adas10 adas12
adas02 45.948 36.2714 20.993 21.6524 23.179 32.269
adas04 36.271 41.1571 17.064  8.3786 12.143 18.229
adas06 20.993 17.0643 41.329 29.8571 10.786 30.607
adas08 21.652  8.3786 29.857 40.2476 22.071 33.381
adas10 23.179 12.1429 10.786 22.0714 49.457 53.271
adas12 32.269 18.2286 30.607 33.3810 53.271 77.448

1.13 Multivariate analysis of variance

m0 <- lm(cbind(adas02 , adas04 , adas06 , adas08 , adas10 , adas12) ~ Treatment, data=dta)
anova(m0)

Analysis of Variance Table

            Df Pillai approx F num Df den Df Pr(>F)
(Intercept)  1  0.964    212.4      6     48 <2e-16
Treatment    2  0.254      1.2     12     98    0.3
Residuals   53

1.14 Univariate ANOVA for repeated measures

summary(m0a <- aov(Adas ~ Time*Treatment + Error(PID/Time), data=dtaL))


Error: PID
          Df Sum Sq Mean Sq
Treatment  1   3.58    3.58

Error: PID:Time
     Df Sum Sq Mean Sq
Time  5   1224     245

Error: Within
                Df Sum Sq Mean Sq F value Pr(>F)
Time             5    450      90    1.29 0.2662
Treatment        2    717     359    5.16 0.0062
Time:Treatment  10    231      23    0.33 0.9718
Residuals      312  21679      69

1.15 Mixed-effects approach - lme4

sjPlot::tab_model(m0_er <- lmer(Adas ~ Time*Treatment + (PID|Time), data=dtaL), show.ci=F, show.r2=F, show.ngroups=F, show.obs=F)

	Adas
Predictors	Estimates	p
(Intercept)	30.22	<0.001
Time [4]	2.22	0.853
Time [6]	3.33	0.781
Time [8]	3.11	0.796
Time [10]	3.50	0.771
Time [12]	4.06	0.736
Treatment [L]	2.60	0.352
Treatment [P]	-0.60	0.820
Time [4] * Treatment [L]	0.07	0.985
Time [6] * Treatment [L]	1.49	0.706
Time [8] * Treatment [L]	0.48	0.904
Time [10] * Treatment [L]	2.03	0.608
Time [12] * Treatment [L]	2.30	0.561
Time [4] * Treatment [P]	1.59	0.673
Time [6] * Treatment [P]	1.90	0.612
Time [8] * Treatment [P]	3.65	0.331
Time [10] * Treatment [P]	4.45	0.236
Time [12] * Treatment [P]	5.37	0.153
Random Effects
σ²	68.40
τ₀₀ _Time	68.40
τ₁₁ _Time.PID	0.00
ρ₀₁ _Time	1.00

1.16 Compound Symmetry

dtaL <- mutate(dtaL, Time_f = factor(Time))
dtaL$Time <- as.numeric(dtaL$Time)
summary(m0_csh <- gls(Adas ~ Time*Treatment, data=dtaL, 
              weights=varIdent(form= ~ 1 | Time_f),
              correlation=corCompSymm(form= ~ 1 | PID)))

Generalized least squares fit by REML
  Model: Adas ~ Time * Treatment 
  Data: dtaL 
     AIC    BIC  logLik
  2069.4 2118.7 -1021.7

Correlation Structure: Compound symmetry
 Formula: ~1 | PID 
 Parameter estimate(s):
    Rho 
0.76308 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Time_f 
 Parameter estimates:
      2       4       6       8      10      12 
1.00000 0.96819 1.05763 1.11731 1.09674 1.17009 

Coefficients:
                  Value Std.Error t-value p-value
(Intercept)     30.5984   1.70067 17.9919  0.0000
Time             0.7066   0.23384  3.0217  0.0027
TreatmentL       2.0359   2.44023  0.8343  0.4047
TreatmentP      -1.4339   2.31763 -0.6187  0.5365
Time:TreatmentL  0.4697   0.33552  1.4000  0.1624
Time:TreatmentP  1.0645   0.31866  3.3405  0.0009

 Correlation: 
                (Intr) Time   TrtmnL TrtmnP Tm:TrL
Time            -0.207                            
TreatmentL      -0.697  0.145                     
TreatmentP      -0.734  0.152  0.511              
Time:TreatmentL  0.145 -0.697 -0.207 -0.106       
Time:TreatmentP  0.152 -0.734 -0.106 -0.207  0.511

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-2.243020 -0.753525 -0.071033  0.682438  2.675540 

Residual standard error: 7.7169 
Degrees of freedom: 336 total; 330 residual

1.17 Unstructured

m0_unh <- gls(Adas ~ Time*Treatment, data=dtaL, 
              weights=varIdent(form= ~ 1 | Time_f),
              correlation=corSymm(form= ~ 1 | PID))
summary(m0_unh)

Generalized least squares fit by REML
  Model: Adas ~ Time * Treatment 
  Data: dtaL 
     AIC  BIC  logLik
  2017.4 2120 -981.72

Correlation Structure: General
 Formula: ~1 | PID 
 Parameter estimate(s):
 Correlation: 
  1     2     3     4     5    
2 0.881                        
3 0.706 0.757                  
4 0.751 0.748 0.842            
5 0.741 0.724 0.686 0.852      
6 0.709 0.640 0.676 0.790 0.905
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | Time_f 
 Parameter estimates:
     2      4      6      8     10     12 
1.0000 0.9705 1.0392 1.1543 1.1268 1.1828 

Coefficients:
                  Value Std.Error t-value p-value
(Intercept)     30.2745   1.79017 16.9115  0.0000
Time             0.9080   0.29553  3.0726  0.0023
TreatmentL       2.8237   2.56865  1.0993  0.2724
TreatmentP      -1.8918   2.43960 -0.7755  0.4386
Time:TreatmentL  0.3743   0.42405  0.8826  0.3781
Time:TreatmentP  1.1448   0.40274  2.8424  0.0048

 Correlation: 
                (Intr) Time   TrtmnL TrtmnP Tm:TrL
Time            -0.394                            
TreatmentL      -0.697  0.275                     
TreatmentP      -0.734  0.289  0.511              
Time:TreatmentL  0.275 -0.697 -0.394 -0.201       
Time:TreatmentP  0.289 -0.734 -0.201 -0.394  0.511

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-2.28500 -0.83379 -0.12055  0.61321  2.64189 

Residual standard error: 7.6689 
Degrees of freedom: 336 total; 330 residual

1.18 Model Comparison

anova(m0_csh, m0_unh)

       Model df    AIC    BIC   logLik   Test L.Ratio p-value
m0_csh     1 13 2069.4 2118.8 -1021.68                       
m0_unh     2 27 2017.4 2120.0  -981.72 1 vs 2  79.926  <.0001

Unstructured may be the better fit model than Compound Symmetry model.

In this case, although it was a randomized exp design, it did not record their adas scores before intervention as baseline. I don’t know whether using their adas scores at 2 months is appropriate for constrained model, so I don’t do it.

2 Homework 3:

Use appropriate methods to answer questions posed in the body fat example.

The body fat was measured, using dual-energy x-ray absorptiometry, in several hundred children at age 11, 13, and 15 years.

Use the data set (in Stata dta format) to answer the following questions: (a) Does body fat increase with age for both girls and boys? (b) Is there a difference between girls and boys with respect to the increase of the body fat with age? (c) Do girls tend to have more body fat than boys?

Source: Vach, W. (2013). Regression Models as a Tool in Medical Research. p. 330. Chapman/Hall & CRC Press.

Column 1: Subject ID Column 2: Gender ID Column 3: Age in years Column 4: Body fat in kilograms

2.1 Load data file

dta <- foreign::read.dta('data/bodyfat.dta')

head(dta)

  id    sex age bodyfat
1  1   male  11     4.0
2  1   male  13     6.2
3  1   male  15    10.5
4  2 female  11     8.1
5  2 female  13    10.4
6  2 female  15    15.2

2.2 Data Visualization

lattice::xyplot(bodyfat ~ age|sex, 
                groups=id, 
                type=c('l','g', 'p'), 
                xlab="Time (year)",
                ylab="Body Fat (kg)",
                data=dta)

2.3 Means & Variances

dta %>%
  group_by(sex, age) %>%
  summarise_all('mean') %>%
  select(-id)

# A tibble: 6 x 3
# Groups:   sex [2]
  sex      age bodyfat
  <fct>  <dbl>   <dbl>
1 female    11    8.77
2 female    13   10.2 
3 female    15   11.3 
4 male      11    7.94
5 male      13    8.84
6 male      15    9.82

dta %>%
  group_by(sex, age) %>%
  summarise_all('var') %>%
  select(-id)

# A tibble: 6 x 3
# Groups:   sex [2]
  sex      age bodyfat
  <fct>  <dbl>   <dbl>
1 female    11    3.71
2 female    13    5.13
3 female    15    7.82
4 male      11    3.93
5 male      13    4.92
6 male      15    8.14

2.4 From long to wide format

dtaW <- dta %>%
  gather(key, value, -sex, -age, -id) %>%
  unite(col, key, age) %>%
  spread(col, value)

head(dtaW)

  id    sex bodyfat_11 bodyfat_13 bodyfat_15
1  1   male        4.0        6.2       10.5
2  2 female        8.1       10.4       15.2
3  3 female        9.0        9.2       11.4
4  4 female        7.3        9.2       12.3
5  5 female       11.6       10.9       10.5
6  6   male        9.9        9.6       11.9

2.5 Correlation matrix for gender across age

Female

cor(subset(dtaW, sex == "female")[,-(1:2)])

           bodyfat_11 bodyfat_13 bodyfat_15
bodyfat_11          1         NA         NA
bodyfat_13         NA          1         NA
bodyfat_15         NA         NA          1

Male

cor(subset(dtaW, sex == "male")[,-(1:2)])

           bodyfat_11 bodyfat_13 bodyfat_15
bodyfat_11          1         NA         NA
bodyfat_13         NA          1         NA
bodyfat_15         NA         NA          1

car::scatterplotMatrix(~ bodyfat_11 + bodyfat_13 + bodyfat_15 | sex,
                  data=dtaW,
                  col=8,
                  smooth=FALSE, 
                  regLine=FALSE,
                  ellipse=list(levels=c(0.68, 0.975), 
                               fill=TRUE, 
                               fill.alpha=0.1))

ggplot(dta, 
       aes(age, bodyfat, color=sex)) +
 stat_summary(fun.data="mean_cl_boot", 
              geom="pointrange", 
              col=2) + 
 geom_hline(yintercept=mean(dta$bodyfat), 
            linetype="dashed") +
 geom_point(alpha=.3, size=rel(.8)) +
 facet_grid(~ sex)

 labs(x="Age (year)", 
      y="Bodyfat (kg)") +
 theme_minimal() +
 theme(legend.position = "none")

NULL

2.6 Univariate ANOVA for repeated measures

summary(m0 <- aov(bodyfat ~ age*sex + Error(id/age), data=dta))


Error: id
    Df Sum Sq Mean Sq
age  1   8.63    8.63

Error: id:age
    Df Sum Sq Mean Sq
age  1   1468    1468

Error: Within
            Df Sum Sq Mean Sq F value  Pr(>F)
age          1    353     353    63.6 2.4e-15
sex          1    821     821   147.8 < 2e-16
age:sex      1     43      43     7.8  0.0053
Residuals 2267  12590       6

Univariate approach revealed that (a) Does body fat increase with age for both girls and boys? A: Yes

Is there a difference between girls and boys with respect to the increase of the body fat with age? A: Yes
Do girls tend to have more body fat than boys? A: Yes

2.7 Model 1: Variance Component for different sex across time

summary(m1 <- gls(bodyfat ~ sex*age,
                  weights = varIdent(form = ~ 1 | age*sex),
                  data = dta))

Generalized least squares fit by REML
  Model: bodyfat ~ sex * age 
  Data: dta 
    AIC   BIC  logLik
  10273 10330 -5126.3

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | age * sex 
 Parameter estimates:
  11*male   13*male   15*male 11*female 13*female 15*female 
  1.00000   1.11720   1.43844   0.97085   1.14155   1.41011 

Coefficients:
               Value Std.Error t-value p-value
(Intercept)  1.69527   0.55897  3.0328  0.0025
sexmale      1.09634   0.76891  1.4258  0.1541
age          0.64532   0.04450 14.5030  0.0000
sexmale:age -0.17819   0.06113 -2.9148  0.0036

 Correlation: 
            (Intr) sexmal age   
sexmale     -0.727              
age         -0.992  0.721       
sexmale:age  0.722 -0.992 -0.728

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-2.85177 -0.70626 -0.10066  0.59643  5.17847 

Residual standard error: 1.9832 
Degrees of freedom: 2273 total; 2269 residual

Model 2: Variance Component across time.

summary(m2 <- gls(bodyfat ~ sex*age,
                  weights = varExp(form = ~ age),
                  data = dta))

Generalized least squares fit by REML
  Model: bodyfat ~ sex * age 
  Data: dta 
    AIC   BIC  logLik
  10268 10302 -5127.8

Variance function:
 Structure: Exponential of variance covariate
 Formula: ~age 
 Parameter estimates:
   expon 
0.091678 

Coefficients:
               Value Std.Error t-value p-value
(Intercept)  1.69849   0.55798  3.0440  0.0024
sexmale      1.09179   0.75884  1.4388  0.1504
age          0.64498   0.04441 14.5226  0.0000
sexmale:age -0.17767   0.06038 -2.9424  0.0033

 Correlation: 
            (Intr) sexmal age   
sexmale     -0.735              
age         -0.992  0.730       
sexmale:age  0.730 -0.992 -0.736

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-2.86724 -0.71920 -0.10029  0.60677  5.32886 

Residual standard error: 0.70298 
Degrees of freedom: 2273 total; 2269 residual

2.8 Model 3: Exponential of variance covariate

summary(m3 <- gls(bodyfat ~ sex*age,
                  weights = varExp(form = ~ age | sex),
                  data = dta))

Generalized least squares fit by REML
  Model: bodyfat ~ sex * age 
  Data: dta 
    AIC   BIC  logLik
  10270 10310 -5127.8

Variance function:
 Structure: Exponential of variance covariate, different strata
 Formula: ~age | sex 
 Parameter estimates:
    male   female 
0.092026 0.091266 

Coefficients:
               Value Std.Error t-value p-value
(Intercept)  1.69884   0.55502  3.0609  0.0022
sexmale      1.09153   0.75824  1.4396  0.1501
age          0.64495   0.04417 14.6017  0.0000
sexmale:age -0.17765   0.06033 -2.9444  0.0033

 Correlation: 
            (Intr) sexmal age   
sexmale     -0.732              
age         -0.992  0.726       
sexmale:age  0.726 -0.992 -0.732

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-2.88503 -0.71545 -0.10076  0.60445  5.30852 

Residual standard error: 0.70297 
Degrees of freedom: 2273 total; 2269 residual

2.9 Model 4: Compound symmetry

summary(m4 <- gls(bodyfat ~ sex*age,
                  correlation = corCompSymm(form = ~ 1 | id),
                  weights = varIdent(form = ~ 1 | age*sex),
                  data = dta))

Generalized least squares fit by REML
  Model: bodyfat ~ sex * age 
  Data: dta 
    AIC   BIC  logLik
  10169 10232 -5073.6

Correlation Structure: Compound symmetry
 Formula: ~1 | id 
 Parameter estimate(s):
    Rho 
0.23257 
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | age * sex 
 Parameter estimates:
  11*male   13*male   15*male 11*female 13*female 15*female 
  1.00000   1.07940   1.40790   0.97918   1.10819   1.37546 

Coefficients:
               Value Std.Error t-value p-value
(Intercept)  1.73423   0.49715  3.4883  0.0005
sexmale      1.06276   0.68275  1.5566  0.1197
age          0.64205   0.03977 16.1422  0.0000
sexmale:age -0.17534   0.05458 -3.2128  0.0013

 Correlation: 
            (Intr) sexmal age   
sexmale     -0.728              
age         -0.986  0.718       
sexmale:age  0.719 -0.986 -0.729

Standardized residuals:
      Min        Q1       Med        Q3       Max 
-2.869437 -0.706899 -0.099607  0.599113  5.088486 

Residual standard error: 2.0181 
Degrees of freedom: 2273 total; 2269 residual

2.10 Model 5: Unstructured

summary(m5 <- gls(bodyfat ~ sex*age,
                  correlation = corSymm(form = ~ 1 | id),
                  weights = varIdent(form = ~ 1 | age*sex),
                  data = dta))

Generalized least squares fit by REML
  Model: bodyfat ~ sex * age 
  Data: dta 
    AIC   BIC  logLik
  10083 10157 -5028.4

Correlation Structure: General
 Formula: ~1 | id 
 Parameter estimate(s):
 Correlation: 
  1     2    
2 0.166      
3 0.054 0.461
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | age * sex 
 Parameter estimates:
  11*male   13*male   15*male 11*female 13*female 15*female 
  1.00000   1.12179   1.44611   0.97823   1.14538   1.40922 

Coefficients:
               Value Std.Error t-value p-value
(Intercept)  1.80450   0.54805  3.2926  0.0010
sexmale      0.97602   0.75390  1.2946  0.1956
age          0.63588   0.04446 14.3036  0.0000
sexmale:age -0.16775   0.06114 -2.7438  0.0061

 Correlation: 
            (Intr) sexmal age   
sexmale     -0.727              
age         -0.989  0.719       
sexmale:age  0.719 -0.989 -0.727

Standardized residuals:
     Min       Q1      Med       Q3      Max 
-2.85169 -0.70638 -0.10301  0.59504  5.19623 

Residual standard error: 1.9765 
Degrees of freedom: 2273 total; 2269 residual

2.11 Model comparison

anova(m1, m2, m3, m4, m5)

   Model df   AIC   BIC  logLik   Test L.Ratio p-value
m1     1 10 10272 10330 -5126.3                       
m2     2  6 10268 10302 -5127.8 1 vs 2   3.111  0.5394
m3     3  7 10270 10310 -5127.8 2 vs 3   0.111  0.7389
m4     4 11 10169 10232 -5073.6 3 vs 4 108.338  <.0001
m5     5 13 10083 10157 -5028.4 4 vs 5  90.379  <.0001

Linear mix model approach revealed that (a) Does body fat increase with age for both girls and boys? A: Yes

Is there a difference between girls and boys with respect to the increase of the body fat with age? A: Yes
Do girls tend to have more body fat than boys? A: Yes

Covariance Patterns Models: Homework exercises

Hao-Lun Fu

2020-10-26