1 Inclass 1

1.1 Info

Problem: Edit and comment on the analysis in the weights of Asian children in U.K. markdown file.
Data: The data consist of weight gains of 68 Asian children in a British community. Measurements of weight were recorded for children on up to five occasions visiting a clinic. The ages at which the measurements were taken are roughly to target examination dates of 6 weeks, then 8, 12, 27 months.

1.2 Data input

# Loading package
pacman::p_load(tidyverse, afex, segmented, foreign, haven, lme4, lmerTest, ggplot2)

# data file link
fL <- "asian.dta"
dta1 <- read.dta(fL)

dta1$id <- factor(dta1$id)
dta1$gender <- factor(dta1$gender, 1:2, labels = c("M", "F")) # 把原本的1,2 改成"M", "F"
dta1$brthwt <- dta1$brthwt/1000 # 出生體重原本單位是g，改成kg方便跟weight gains單位統一
dta1$occ <- factor(dta1$occ) # 場域編號改成factor變項

glimpse(dta1) # It's a little like "str()" applied to a data frame

## Rows: 198
## Columns: 6
## $ id     <fct> 45, 45, 45, 45, 45, 258, 258, 258, 258, 287, 287, 287, 287, ...
## $ occ    <fct> 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 2, ...
## $ age    <dbl> 0.1368925, 0.6570842, 1.2183436, 1.4291581, 2.2724161, 0.191...
## $ weight <dbl> 5.171, 10.860, 13.150, 13.200, 15.880, 5.300, 9.740, 9.980, ...
## $ brthwt <dbl> 4.140, 4.140, 4.140, 4.140, 4.140, 3.155, 3.155, 3.155, 3.15...
## $ gender <fct> M, M, M, M, M, F, F, F, F, M, M, M, M, F, F, F, F, F, M, M, ...

head(dta1)

##    id occ       age weight brthwt gender
## 1  45   1 0.1368925  5.171  4.140      M
## 2  45   2 0.6570842 10.860  4.140      M
## 3  45   3 1.2183436 13.150  4.140      M
## 4  45   4 1.4291581 13.200  4.140      M
## 5  45   5 2.2724161 15.880  4.140      M
## 6 258   1 0.1916496  5.300  3.155      F

1.3 Line chart

1.3.1 Age to Weight

ggplot(dta1, aes(age, weight, group = id, color = gender))+
 geom_point()+ 
 geom_line() +
 labs(x = "Age (year)", y = "Weight (kg)") +
 theme(legend.position = c(.9, .2))

1.3.2 By gender

The body weight change by age between male and female.
Male > Female?

ggplot(dta1, aes(age, weight, group = id, color = gender))+
 geom_point()+ 
 geom_line() +
 facet_wrap( ~ gender)+
 labs(x = "Age (year)", y = "Weight (kg)") +
 theme(legend.position = c(.9, .2))

對照生長曲線過去提及3、4個月大時是出生體重的大約2倍；1歲時是出生體重的大約3倍；2歲時是出生體重的大約4倍，此資料好像也可以印證類似的說法。

1.3.3 Regression line

Polynomial regression line 因為原本資料呈現的狀況不是線性關係，試著以Polynomial regression 來找合適的模型，但polynominal也可能估不準？（正弦曲線sine wave 到極值就會轉向），所以後來有logistic regression?

ggplot(dta1, aes(age, weight, color = gender))+
 geom_point()+
 stat_smooth(method = "lm", formula = y ~ poly(x, 2)) +
 labs(x = "Age (year)", y = "Weight (kg)")+
 theme(legend.position = c(.9, .2))

1.4 LMER

dta1$age2 <- dta1$age^2  #age squre

1.4.1 m0

Welch–Satterthwaite?
Fixed effect: birthwt, gender, age, age²
Random effect (random intercept and slope): age

# t-tests use Satterthwaite's method
summary(m0 <- lmer(weight ~ brthwt + gender + age + age2 + (age | id), data = dta1))

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: weight ~ brthwt + gender + age + age2 + (age | id)
##    Data: dta1
## 
## REML criterion at convergence: 496.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.81353 -0.43471 -0.03652  0.45573  2.79813 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  id       (Intercept) 0.1920   0.4381       
##           age         0.2702   0.5198   0.07
##  Residual             0.3346   0.5784       
## Number of obs: 198, groups:  id, 68
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   1.02225    0.54924  57.65949   1.861  0.06782 .  
## brthwt        0.87755    0.16616  53.16001   5.281 2.43e-06 ***
## genderF      -0.51922    0.16698  52.23221  -3.109  0.00303 ** 
## age           7.73933    0.23674 135.03625  32.691  < 2e-16 ***
## age2         -1.67125    0.08762 114.78740 -19.074  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr) brthwt gendrF age   
## brthwt  -0.963                     
## genderF -0.240  0.095              
## age     -0.221  0.058 -0.003       
## age2     0.199 -0.051  0.002 -0.927

The estimated weight is (1.02 ± 0.44) + 0.88(brthwt) - 0.52(female) + (7.74 ± 0.52)(age) - 1.67(age²)

1.4.2 m0k (KR)

Kenward-Roger

# Type 3 tests, KR-method - This is better
summary(m0k <- afex::mixed(weight ~ brthwt + gender + age + age2 + (age | id), data = dta1))

## Contrasts set to contr.sum for the following variables: gender, id

## Numerical variables NOT centered on 0: brthwt, age, age2
## If in interactions, interpretation of lower order (e.g., main) effects difficult.

## Fitting one lmer() model. [DONE]
## Calculating p-values. [DONE]

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: weight ~ brthwt + gender + age + age2 + (age | id)
##    Data: data
## 
## REML criterion at convergence: 498
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.81353 -0.43471 -0.03652  0.45573  2.79813 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  id       (Intercept) 0.1920   0.4381       
##           age         0.2702   0.5198   0.07
##  Residual             0.3346   0.5784       
## Number of obs: 198, groups:  id, 68
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   0.76264    0.53537  58.73938   1.425  0.15959    
## brthwt        0.87755    0.16616  53.16001   5.281 2.43e-06 ***
## gender1       0.25961    0.08349  52.23221   3.109  0.00303 ** 
## age           7.73933    0.23674 135.03625  32.691  < 2e-16 ***
## age2         -1.67125    0.08762 114.78741 -19.074  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr) brthwt gendr1 age   
## brthwt  -0.973                     
## gender1  0.090 -0.095              
## age     -0.227  0.058  0.003       
## age2     0.204 -0.051 -0.002 -0.927

sjPlot::tab_model(m0, m0k, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)

	weight		weight
Predictors	Estimates	std. Error	Estimates	std. Error
(Intercept)	1.02	0.55	0.76	0.54
brthwt	0.88	0.17	0.88	0.17
gender [F]	-0.52	0.17
age	7.74	0.24	7.74	0.24
age2	-1.67	0.09	-1.67	0.09
gender1			0.26	0.08
Random Effects
σ²	0.33		0.33
τ₀₀	0.19 _id		0.19 _id
τ₁₁	0.27 _id.age		0.27 _id.age
ρ₀₁	0.07 _id		0.07 _id
ICC	0.68		0.68

1.4.3 m1

Four data collection time-point -> 3 knots (切在每次間格的中間?)

# construct piecewise lines defined by knots
dta1$d1 <- dta1$age - 0.4; dta1$d1[dta1$d1 < 0] <- 0
dta1$d2 <- dta1$age - 0.9; dta1$d2[dta1$d2 < 0] <- 0
dta1$d3 <- dta1$age - 1.5; dta1$d3[dta1$d3 < 0] <- 0

Fixed effect: birthwt, gender, age, d1, d2, d3
Random effect (random intercept and slope): age

summary(m1 <- lmer(weight ~ brthwt + gender + age + d1 + d2 + d3 +
(age | id), data = dta1))

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: weight ~ brthwt + gender + age + d1 + d2 + d3 + (age | id)
##    Data: dta1
## 
## REML criterion at convergence: 465
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.81475 -0.41746 -0.01623  0.42323  2.51409 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  id       (Intercept) 0.2476   0.4976        
##           age         0.3273   0.5721   -0.12
##  Residual             0.2523   0.5023        
## Number of obs: 198, groups:  id, 68
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)   0.4941     0.5600 65.7487   0.882 0.380826    
## brthwt        0.8949     0.1617 51.4586   5.533 1.07e-06 ***
## genderF      -0.5459     0.1627 50.8735  -3.356 0.001504 ** 
## age          10.3595     1.0792 83.2345   9.599 4.03e-15 ***
## d1           -6.1891     1.8004 80.6638  -3.438 0.000931 ***
## d2           -1.6641     1.3071 82.8518  -1.273 0.206547    
## d3           -0.3684     0.9920 80.3757  -0.371 0.711367    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##         (Intr) brthwt gendrF age    d1     d2    
## brthwt  -0.927                                   
## genderF -0.231  0.099                            
## age     -0.339  0.045 -0.002                     
## d1       0.313 -0.039 -0.001 -0.985              
## d2      -0.239  0.032  0.012  0.816 -0.884       
## d3       0.167 -0.031 -0.019 -0.533  0.606 -0.886

The estimated weight is (0.49 ± 0.5) + 0.9(brthwt) - 0.55(female) + (10.36 ± 0.57)(age) - 6.12(d1) -1.66(d2) -0.37(d3)

1.4.4 m1 fit

Y hat (written ŷ ) is the predicted value of y (the dependent variable) in m1.

dta1fm1 <- data.frame(dta1, yhat=fitted(m1))

1.4.5 Line chart (fitted)

點是實際的體重值
線是預測的體重值

ggplot(dta1fm1, aes(age, weight, group=id, color = gender))+
 geom_point()+
 geom_line(aes(age, yhat))+
 facet_wrap(~ gender)+
 labs(x = "Age (year)", y = "Weight (kg)")

quantile(dta1$age)

##        0%       25%       50%       75%      100% 
## 0.1149897 0.6461328 0.9965777 1.4709103 2.5462012

1.5 LM

1.5.1 m0(lm)

看年齡對體重的影響(Simple linear regression)

summary(m0 <- lm(weight ~ age, data=dta1))

## 
## Call:
## lm(formula = weight ~ age, data = dta1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2994 -1.0341 -0.2524  1.0947  4.2614 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.2021     0.1779   29.24   <2e-16 ***
## age           3.3640     0.1332   25.26   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.471 on 196 degrees of freedom
## Multiple R-squared:  0.765,  Adjusted R-squared:  0.7638 
## F-statistic: 637.9 on 1 and 196 DF,  p-value: < 2.2e-16

Estimate weight = 5.20 + 3.36(age)

1.5.2 m0z (PR)

Piecewise regression based on quantile

summary(m0z <- segmented::segmented(m0, seg.Z= ~ age, 
                            psi=list(age=c(0.6, 1.4)),
                            control=seg.control(display=FALSE)))

## 
##  ***Regression Model with Segmented Relationship(s)***
## 
## Call: 
## segmented.lm(obj = m0, seg.Z = ~age, psi = list(age = c(0.6, 
##     1.4)), control = seg.control(display = FALSE))
## 
## Estimated Break-Point(s):
##            Est. St.Err
## psi1.age 0.318  0.139
## psi2.age 2.300  0.047
## 
## Meaningful coefficients of the linear terms:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    2.269      1.391   1.631    0.105
## age           16.373     10.311   1.588    0.114
## U1.age       -13.734     10.312  -1.332       NA
## U2.age        -8.539      3.457  -2.470       NA
## 
## Residual standard error: 1.176 on 192 degrees of freedom
## Multiple R-Squared: 0.8528,  Adjusted R-squared: 0.849 
## 
## Convergence attained in 1 iter. (rel. change 1.6843e-06)

psi: estimated break-points and relevant (approximate) standard errors
The estimated weight is 2.26 -13.79(U1.age) -8.54(U1.age) + 16.43(age)

plot(m0z)
with(dta1, points(age, weight))
abline(v = m0z$psi[,2], lty = 3)
grid()

從出生至四個月大(0.317)間年齡對體重的影響大於“四個月(0.317)至2歲間(2.3)” ，2歲後年齡對體重的影響逐漸降低。

2 Inclass 2

2.1 Info

Problem: Edit the R script and comment on the analysis in the sleep deprivation example. Provide additional code chunks to complete the graphical analysis.
Data (sleepstudy{lme4}): To study the impact of sleep deprivation on cognitive performance, a sample of 18 truck drivers were allowed only 3 hours of sleep during 10 consecutive days and their reaction times were measured several times each day.

Column 1: Mean reaction time in milliseconds
Column 2: Days of sleep deprivation
Column 3: Truck driver ID

2.2 Data input

# load package
library(lme4)

# 18 subjects (3hrs sleep) 10 observations on each, p64 of bates
# the "new" random effects (vs nlme) see Bates book 2010
# sleep deprivation 3hrs/night truckers

#input 
data(sleepstudy, package="lme4")  

# nuber of row and column
dim(sleepstudy)

## [1] 180   3

# top of six value
head(sleepstudy)

##   Reaction Days Subject
## 1 249.5600    0     308
## 2 258.7047    1     308
## 3 250.8006    2     308
## 4 321.4398    3     308
## 5 356.8519    4     308
## 6 414.6901    5     308

# with function (8 subjects, 10 observations on each)
with(sleepstudy, table(Subject))

## Subject
## 308 309 310 330 331 332 333 334 335 337 349 350 351 352 369 370 371 372 
##  10  10  10  10  10  10  10  10  10  10  10  10  10  10  10  10  10  10

2.3 Scatterplot

bivariate scatterplots or time-series plots

# 
lattice::xyplot(Reaction ~ Days | Subject, data=sleepstudy, 
       xlab = "Days of sleep deprivation",
       ylab = "Average reaction time (ms)",
       type = c("g","p","r"),
       as.table = TRUE)

The average reaction time of the most of the subject is increase by the days of sleep deprivation. Only No.335 in the opposite.

2.3.1 Scatterplot 2

facet by subject

# unneeded
m0 <- lmer(Reaction ~ Days | Subject, data = sleepstudy)
# unneeded
sleepstudy1 <- sleepstudy %>% 
       mutate(coef1 = fitted(m0))

sleepstudy1$Subject <- factor(sleepstudy1$Subject, levels = c("309", "310", "335", "349", "351", "370", "371", "334", "330", "369", "332", "331", "350", "333", "372", "352", "308", "337"))

ggplot(sleepstudy1 , aes(Days, Reaction)) +
  stat_smooth(method = "lm", 
              formula = y ~ x, 
              se = FALSE, 
              lwd = rel(.5)) +
  geom_point (size = rel(.5)) +
  facet_wrap ( ~ Subject, nrow = 1) +
  labs(x = "Deprivation (in Days)", 
       y = "Reaction time (in ms)")

2.3.2 Graphs of averages

ggplot2::ggplot(sleepstudy1, aes(Days, Reaction)) +
 geom_point() +
 stat_smooth(method = "lm", aes(Days, Reaction), se = TRUE) +
 labs(x = "Deprivation (in Days)", 
      y = "Reaction time (in ms)") +
 theme_bw()

## `geom_smooth()` using formula 'y ~ x'

2.4 Lm

The impact of Days of sleep deprivation on the reaction time by each of subjects.

# Fit a list of lm or glm objects with a common model for different subgroups of the data.
sleeplmList <- lmList(Reaction ~ Days | Subject, data = sleepstudy)

# summary by subjects
sleeplmList

## Call: lmList(formula = Reaction ~ Days | Subject, data = sleepstudy) 
## Coefficients:
##     (Intercept)      Days
## 308    244.1927 21.764702
## 309    205.0549  2.261785
## 310    203.4842  6.114899
## 330    289.6851  3.008073
## 331    285.7390  5.266019
## 332    264.2516  9.566768
## 333    275.0191  9.142045
## 334    240.1629 12.253141
## 335    263.0347 -2.881034
## 337    290.1041 19.025974
## 349    215.1118 13.493933
## 350    225.8346 19.504017
## 351    261.1470  6.433498
## 352    276.3721 13.566549
## 369    254.9681 11.348109
## 370    210.4491 18.056151
## 371    253.6360  9.188445
## 372    267.0448 11.298073
## 
## Degrees of freedom: 180 total; 144 residual
## Residual standard error: 25.59182

#mean int and slope match lmer Fixed effects results p.67
mean(coef(sleeplmList)[,1])

## [1] 251.4051

mean(coef(sleeplmList)[,2])

## [1] 10.46729

Mean intercept = 251.4051
Mean slope (Days) = 10.46729

# these are too big as they should be, 
# compare with variance random effects p.67
# mle subtracts off wobble in estimated indiv regressions Y on t

# - SSR/SST
var(coef(sleeplmList)[,1])

## [1] 838.3423

var(coef(sleeplmList)[,2])

## [1] 43.01034

# quantile of intercept and slope
quantile(coef(sleeplmList)[,1])

##       0%      25%      50%      75%     100% 
## 203.4842 229.4167 258.0576 273.0255 290.1041

quantile(coef(sleeplmList)[,2])

##        0%       25%       50%       75%      100% 
## -2.881034  6.194548 10.432421 13.548395 21.764702

Should consider the random effect of slope?

2.5 Stem-and-leaf plot

# stem-and-leaf plot     
stem(coef(sleeplmList)[,2])

## 
##   The decimal point is 1 digit(s) to the right of the |
## 
##   -0 | 3
##   0 | 23
##   0 | 56699
##   1 | 011234
##   1 | 89
##   2 | 02

2.6 Correlation

correlation between intercept and slope is -0.14

cor(coef(sleeplmList)[,1], coef(sleeplmList)[,2])

## [1] -0.1375534

2.7 LMER

2.7.1 REML

# first 2 do REML, second pair match 
# Bates p.67 in doing MLE (REML FALSE)

sleeplmer <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
summary(sleeplmer)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
## REML criterion at convergence: 1743.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9536 -0.4634  0.0231  0.4634  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 612.10   24.741       
##           Days         35.07    5.922   0.07
##  Residual             654.94   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  251.405      6.825  17.000  36.838  < 2e-16 ***
## Days          10.467      1.546  17.000   6.771 3.26e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

2.7.2 Pair match

sleeplmer2 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy, REML=FALSE)
summary(sleeplmer2)

## Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
##   method [lmerModLmerTest]
## Formula: Reaction ~ Days + (Days | Subject)
##    Data: sleepstudy
## 
##      AIC      BIC   logLik deviance df.resid 
##   1763.9   1783.1   -876.0   1751.9      174 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.9416 -0.4656  0.0289  0.4636  5.1793 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr
##  Subject  (Intercept) 565.48   23.780       
##           Days         32.68    5.717   0.08
##  Residual             654.95   25.592       
## Number of obs: 180, groups:  Subject, 18
## 
## Fixed effects:
##             Estimate Std. Error      df t value Pr(>|t|)    
## (Intercept)  251.405      6.632  18.001  37.907  < 2e-16 ***
## Days          10.467      1.502  18.000   6.968 1.65e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Days -0.138

3 Inclass 3

3.1 Info

Problem: Edit and comment on the analysis in the fat and menarche markdown file.
Data: Study of influence of menarche on changes in body fat accretion. The data are from a prospective study on body fat accretion in a cohort of 162 girls from the MIT Growth and Development Study. The study examined changes in percent body fat before and after menarche. The data represent a subset of the study materials and should not be used to draw substantive conclusions.

Column 1: Subject ID
Column 2: Current Age (years)
Column 3: Age at Menarche (years)
Column 4: Time relative to Menarche (years)
Column 5: Percent Body Fat

3.2 Data input

# load them
pacman::p_load(mlmRev, tidyverse, lme4, foreign, merTools, sjPlot)

# download a copy of data from the following link and save it to the
# data folder
# https://content.sph.harvard.edu/fitzmaur/ala2e/fat.dta

# input data
dta3 <- read.dta("fat.dta")

# rename
names(dta3) <-c("ID", "Age", "Age_m", "Time", "PBF")

# show first 6 obs.
head(dta3)

##   ID   Age Age_m  Time   PBF
## 1  1  9.32 13.19 -3.87  7.94
## 2  1 10.33 13.19 -2.86 15.65
## 3  1 11.24 13.19 -1.95 13.51
## 4  1 12.19 13.19 -1.00 23.23
## 5  1 13.24 13.19  0.05 10.52
## 6  1 14.24 13.19  1.05 20.45

# construct indicator variable for after mc
dta3 <- dta3 %>% 
       mutate(ID = factor(ID),                
              Time_a = ifelse(Time > 0, Time, 0),
              Menarche = as.factor(ifelse(Time > 0, "T", "F")))

# show data structure
str(dta3)

## 'data.frame':    1049 obs. of  7 variables:
##  $ ID      : Factor w/ 162 levels "1","2","3","4",..: 1 1 1 1 1 1 2 2 2 2 ...
##  $ Age     : num  9.32 10.33 11.24 12.19 13.24 ...
##  $ Age_m   : num  13.2 13.2 13.2 13.2 13.2 ...
##  $ Time    : num  -3.87 -2.86 -1.95 -1 0.05 ...
##  $ PBF     : num  7.94 15.65 13.51 23.23 10.52 ...
##  $ Time_a  : num  0 0 0 0 0.05 ...
##  $ Menarche: Factor w/ 2 levels "F","T": 1 1 1 1 2 2 1 1 1 1 ...
##  - attr(*, "datalabel")= chr ""
##  - attr(*, "time.stamp")= chr "23 Mar 2011 15:46"
##  - attr(*, "formats")= chr [1:5] "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
##  - attr(*, "types")= int [1:5] 254 254 254 254 254
##  - attr(*, "val.labels")= chr [1:5] "" "" "" "" ...
##  - attr(*, "var.labels")= chr [1:5] "" "" "" "" ...
##  - attr(*, "version")= int 12

# black and white theme                     
theme_set(theme_minimal())

3.3 Plot

3.3.1 Before MC

ggplot(subset(dta3, Menarche=="F"),
       aes(Time, PBF, group=ID)) +
  #geom_point()+
  geom_line(col="Rosy Brown 1")+
  stat_smooth(aes(group=1),method='lm',formula=y~x, col="Indian Red 1")+
  labs(x="Age (in year)",
       y="% Body fat")

The percent body fat is not increased by age before the menarche among 162 girls.

3.3.2 After MC

ggplot(subset(dta3, Menarche=="T"),
       aes(Time, PBF, group=ID)) +
# geom_point()+
 geom_line(col="cyan3")+
 stat_smooth(aes(group=1),method='lm',formula=y~x, col="cyan4")+
 labs(x="Age (in year)",
      y="% Body fat")

3.4 LMER

3.4.1 Before MC

# baseline model if you want one
mb_0 <- lmer(PBF ~ Time + (1 | ID), data = subset(dta3, Menarche=='F'))
tab_model(mb_0)

	PBF
Predictors	Estimates	CI	p
(Intercept)	20.20	19.06 – 21.34	<0.001
Time	-0.09	-0.33 – 0.16	0.494
Random Effects
σ²	10.02
τ₀₀ _ID	43.81
ICC	0.81
N _ID	162
Observations	497
Marginal R² / Conditional R²	0.000 / 0.814

3.4.2 After MC

ma_0 <- lmer(PBF ~ Time + (Time | ID), data=subset(dta3, Menarche=='T'))
tab_model(ma_0)

	PBF
Predictors	Estimates	CI	p
(Intercept)	22.59	21.42 – 23.75	<0.001
Time	1.97	1.68 – 2.25	<0.001
Random Effects
σ²	7.82
τ₀₀ _ID	46.21
τ₁₁ _ID.Time	1.42
ρ₀₁ _ID	-0.55
ICC	0.82
N _ID	160
Observations	552
Marginal R² / Conditional R²	0.123 / 0.844

3.4.3 Confidence Ellipses

betas_a <- na.omit(coef(lmList(PBF ~ Time | ID, data=subset(dta3, Menarche=='T'))))
car::dataEllipse(betas_a[,1],betas_a[,2])

3.4.4 All

m4 <- lmer(PBF ~ Time + Time_a + (Time + Time_a | ID), data  =dta3)
dta_full <- dta3 %>% 
       mutate(yhat = fitted(m4))

# individual connected lines different color for before and after
# add group regression lines
# dta_full
ggplot(dta_full, aes(Time, yhat, color = Menarche)) +
 geom_line(aes(group = ID)) +
 geom_point(alpha = 0.5) +
 stat_smooth(method="lm", aes(group = Menarche)) +
 guides(color = F) +
 labs(x = "Age relative to menarche (in years)", 
      y = "Percent body fat")

## `geom_smooth()` using formula 'y ~ x'

3.4.5 m0

# baseline model if you want one
summary(m0 <- lmer(PBF ~ Time + (Time | ID), data=dta3))

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00354498 (tol = 0.002, component 1)

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: PBF ~ Time + (Time | ID)
##    Data: dta3
## 
## REML criterion at convergence: 6196.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7577 -0.5961  0.0221  0.6169  3.0981 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  ID       (Intercept) 38.2178  6.1821        
##           Time         0.6938  0.8329   -0.24
##  Residual             11.6215  3.4090        
## Number of obs: 1049, groups:  ID, 162
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)  23.04978    0.49985 159.73871   46.11   <2e-16 ***
## Time          1.55480    0.08484 145.73759   18.33   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##      (Intr)
## Time -0.199
## convergence code: 0
## Model failed to converge with max|grad| = 0.00354498 (tol = 0.002, component 1)

3.4.6 m1

# model with 3 random effects associated with an individual
summary(m1 <- lmer(PBF ~ Time + Time_a + (Time + Time_a | ID), data = dta3))

## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: PBF ~ Time + Time_a + (Time + Time_a | ID)
##    Data: dta3
## 
## REML criterion at convergence: 6062.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.7742 -0.5900 -0.0359  0.5946  3.3798 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr       
##  ID       (Intercept) 45.942   6.778               
##           Time         1.631   1.277     0.29      
##           Time_a       2.750   1.658    -0.54 -0.83
##  Residual              9.473   3.078               
## Number of obs: 1049, groups:  ID, 162
## 
## Fixed effects:
##             Estimate Std. Error       df t value Pr(>|t|)    
## (Intercept)  21.3614     0.5646 161.5610  37.837  < 2e-16 ***
## Time          0.4171     0.1572 108.4614   2.654  0.00915 ** 
## Time_a        2.0471     0.2280 132.6773   8.980 2.32e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##        (Intr) Time  
## Time    0.351       
## Time_a -0.515 -0.872

# estimate random effects by simulation
m1_re <- REsim(m1)

# plot for random effects
plotREsim(m1_re)

# residual plot
plot(m1, resid(., type = "pearson") ~ fitted(.),
     abline = 0, pch = 20, cex = .8, id = 0.05,
     xlab = "Ftted values", ylab = "Pearson Residuals")

In-class Exercises: Random coefficients models

Chang-Chun Chen

2020-10-19