1 Homework 1:

Use the models specified in the IQ and language example to replicate the results of analysis.

1.1 Load data file

pacman::p_load(mlmRev, tidyverse, lme4)

dta <- read.table('data/Homework.txt', header = T)
head(dta)

  School Pupil   IQ Language Group_size     IQ_c School_mean Group_mean   SES_c
1      1 17001 15.0       46         29  3.16594     -1.5141        5.9  -4.812
2      1 17002 14.5       45         29  2.66594     -1.5141        5.9 -17.812
3      1 17003  9.5       33         29 -2.33406     -1.5141        5.9 -12.812
4      1 17004 11.0       46         29 -0.83406     -1.5141        5.9  -4.812
5      1 17005  8.0       20         29 -3.83406     -1.5141        5.9 -17.812
6      1 17006  9.5       30         29 -2.33406     -1.5141        5.9 -17.812

1.2 Model 1

summary(m1 <- lmer(Language ~ IQ_c + (1 | School), data = dta))

Linear mixed model fit by REML ['lmerMod']
Formula: Language ~ IQ_c + (1 | School)
   Data: dta

REML criterion at convergence: 15256

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-4.094 -0.637  0.058  0.706  3.145 

Random effects:
 Groups   Name        Variance Std.Dev.
 School   (Intercept)  9.6     3.1     
 Residual             42.2     6.5     
Number of obs: 2287, groups:  School, 131

Fixed effects:
            Estimate Std. Error t value
(Intercept)  40.6082     0.3082   131.8
IQ_c          2.4876     0.0701    35.5

Correlation of Fixed Effects:
     (Intr)
IQ_c 0.018

The variance of school is 9.6 with 3.1 SD and the estimate of centered IQ is 2.49 and SE is 0.07.

2 Homework 2:

Replicate the results of analysis reported the math scores over years data example.

2.1 Load data file

# load them
pacman::p_load(foreign, tidyverse, lme4, merTools)

# file location
fLoc <- "data/eg_hlm.sav"

# read spss format and covert it to data frame 
dta <- read.spss(fLoc) %>% as.data.frame

# check data structure
str(dta)

'data.frame':   7230 obs. of  12 variables:
 $ size    : num  380 380 380 380 380 380 380 380 380 380 ...
 $ lowinc  : num  40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 ...
 $ mobility: num  12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 ...
 $ female  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ black   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ hispanic: num  1 1 1 0 0 0 0 0 1 1 ...
 $ year    : num  0.5 1.5 2.5 -1.5 -0.5 0.5 1.5 2.5 -1.5 -0.5 ...
 $ grade   : num  2 3 4 0 1 2 3 4 0 1 ...
 $ math    : num  1.146 1.134 2.3 -1.303 0.439 ...
 $ retained: num  0 0 0 0 0 0 0 0 0 0 ...
 $ school  : Factor w/ 60 levels "        2020",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ cid     : Factor w/ 1721 levels "   101480302",..: 244 244 244 248 248 248 248 248 253 253 ...

2.2 Visualization

# set black-and-white theme
theme_set(theme_bw())

#
ggplot(dta, aes(year, math, group = cid)) +
 geom_point(alpha = .1) +
 geom_line(alpha = .1) +
 stat_smooth(aes(group = school), method = "lm", se = F, lwd = .6) +
 stat_smooth(aes(group = 1), method = "lm", se = F, col = "magenta") +
 labs(x = "Year of age (centered)", y = "Math score")

2.3 Model 1

# nested specification with random intercepts only
# fixed-effect for slope
summary(m1 <- lmer(math ~ year + (1 | school/cid), data = dta))

Linear mixed model fit by REML ['lmerMod']
Formula: math ~ year + (1 | school/cid)
   Data: dta

REML criterion at convergence: 16759

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.809 -0.583 -0.027  0.557  6.087 

Random effects:
 Groups     Name        Variance Std.Dev.
 cid:school (Intercept) 0.670    0.818   
 school     (Intercept) 0.187    0.432   
 Residual               0.347    0.589   
Number of obs: 7230, groups:  cid:school, 1721; school, 60

Fixed effects:
            Estimate Std. Error t value
(Intercept)  -0.7805     0.0611   -12.8
year          0.7461     0.0054   138.3

Correlation of Fixed Effects:
     (Intr)
year -0.031

# same as
#lmer(math ~ year + (1 | school) +( 1 | school:cid) , data = dta))
#

The variance of school is 0.19 with 0.43 SD, the variance of child nested by school is 0,67 and SD is 0.82, and the effect of year is significant on math with 0.75 beta and 0.005 SE.

2.4 Model 2

# random slopes
summary(m2 <- lmer(math ~ ( year | school/cid), data = dta))

Linear mixed model fit by REML ['lmerMod']
Formula: math ~ (year | school/cid)
   Data: dta

REML criterion at convergence: 16554

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-3.223 -0.561 -0.031  0.531  5.158 

Random effects:
 Groups     Name        Variance Std.Dev. Corr
 cid:school (Intercept) 0.6413   0.801        
            year        0.0113   0.106    0.55
 school     (Intercept) 0.9244   0.961        
            year        0.6079   0.780    0.92
 Residual               0.3011   0.549        
Number of obs: 7230, groups:  cid:school, 1721; school, 60

Fixed effects:
            Estimate Std. Error t value
(Intercept)  -1.6442     0.0546   -30.1
convergence code: 0
Model failed to converge with max|grad| = 0.00321233 (tol = 0.002, component 1)

When we considered the effect of year, the variance of school is 0.92 with 0.96 SD, the variance of child nested by school is 0.64 and SD is 0.80.

2.5 Model Comparison

anova(m1, m2)

Data: dta
Models:
m1: math ~ year + (1 | school/cid)
m2: math ~ (year | school/cid)
   npar   AIC   BIC logLik deviance Chisq Df Pr(>Chisq)
m1    5 16757 16791  -8374    16747                    
m2    8 16566 16621  -8275    16550   197  3     <2e-16

These two models are significant different

2.6 Random effects

# merTools
# estimate random-effects parameters by simulation
m2_re <- REsim(m2)


# plot for random effects
plotREsim(m2_re)

2.7 Diagnosis

# residual plots
plot(m1, ylim = c(-4, 4), 
     xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)

#
plot(m2, ylim = c(-4, 4),
     xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)

3 Homework 3:

How does the estimate of grand mean depend on the intra-class correlation in a simple random-effects model?

3.1 Load data file

dta1 <- read.table('data/demo1.txt', header = T)

head(dta1)

  Score School Pupil
1    11     S1    P1
2    13     S1    P2
3    51     S2    P3
4    53     S2    P4
5    55     S2    P5
6    91     S3    P6

dta2 <- read.table('data/demo2.txt', header = T)
head(dta2)

  Score School Pupil
1    11     S1    P1
2    51     S1    P2
3    13     S2    P3
4    55     S2    P4
5    91     S2    P5
6    51     S3    P6

3.2 Model 1

summary(m1 <- lmer(Score ~  1 + (1 | School), data=dta1))

Linear mixed model fit by REML ['lmerMod']
Formula: Score ~ 1 + (1 | School)
   Data: dta1

REML criterion at convergence: 51.5

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-1.328 -0.474  0.000  0.461  1.355 

Random effects:
 Groups   Name        Variance Std.Dev.
 School   (Intercept) 1679     40.98   
 Residual                5      2.24   
Number of obs: 9, groups:  School, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)     53.0       23.7    2.24

For the first data set, the results showed the variance of school is 1679 with 40.98 SD.

3.3 Model 2

summary(m2 <- lmer(Score ~  1 + (1 | School), data=dta2))

Linear mixed model fit by REML ['lmerMod']
Formula: Score ~ 1 + (1 | School)
   Data: dta2

REML criterion at convergence: 80.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.3673 -0.3429 -0.0278  1.0715  1.1400 

Random effects:
 Groups   Name        Variance Std.Dev.
 School   (Intercept) 104      10.2    
 Residual             968      31.1    
Number of obs: 9, groups:  School, 3

Fixed effects:
            Estimate Std. Error t value
(Intercept)     56.4       12.0    4.69

For the second data set, the results showed the variance of school is 104 with 10.2 SD.

Multilevel Models: Homework exercises

Hao-Lun Fu

2020-10-12