1 Homework 1

1.1 Info

Problem: Use the models specified in the “IQ and language” example to replicate the results of analysis.
Data: The data set is concerned with grade 8 pupils (age about 11 years) in elementary schools in the Netherlands. The number of pupils is 2287, and the number of schools is 131. Class sizes are from 4 to 35. The question of interest is how the language test score depends on the pupil’s intelligence and his family’s socio-economic status, and on some school variables.

Column 1: School ID
Column 2: Pupil ID
Column 3: Verbal IQ
Column 4: Language achievement score
Column 5: A family’s Socio-Economic Status
Column 6: The number of 8th graders in the classroom
Column 7: Verbal IQ centered
Column 8: School mean centered IQ
Column 9: Group size centered
Column 10: SES centered

1.2 Data input

# load package
pacman::p_load(mlmRev, tidyverse, lme4, nlme)
# input data
dta1 <- read.table("iq_language.txt", header = T)
#
head(dta1)

##   School Pupil   IQ Language Group_size       IQ_c School_mean Group_mean
## 1      1 17001 15.0       46         29  3.1659379    -1.51406        5.9
## 2      1 17002 14.5       45         29  2.6659379    -1.51406        5.9
## 3      1 17003  9.5       33         29 -2.3340621    -1.51406        5.9
## 4      1 17004 11.0       46         29 -0.8340621    -1.51406        5.9
## 5      1 17005  8.0       20         29 -3.8340621    -1.51406        5.9
## 6      1 17006  9.5       30         29 -2.3340621    -1.51406        5.9
##        SES_c
## 1  -4.811981
## 2 -17.811981
## 3 -12.811981
## 4  -4.811981
## 5 -17.811981
## 6 -17.811981

1.3 LMER

1.3.1 m0

One level: School(randoom effect)

summary(m0 <- lmer(Language ~ (1|School), data = dta1))

## Linear mixed model fit by REML ['lmerMod']
## Formula: Language ~ (1 | School)
##    Data: dta1
## 
## REML criterion at convergence: 16253.1
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -3.11618 -0.65703  0.07597  0.74128  2.50639 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 19.63    4.431   
##  Residual             64.56    8.035   
## Number of obs: 2287, groups:  School, 131
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)  40.3624     0.4282   94.26

The estimated overall language score is 40.36.
The estimated between school variance in language score is 19.63.
The variation between student within the same school is estimated to be 64.56.
The between school variance is about 19.63 / (19.63 + 64.56) = 0.2331631 (23%)of the total variance. This is the intraclass correlation.

1.3.2 m1

One level with one predictor

summary(m1 <- lmer(Language ~ IQ_c + (1|School), data = dta1))

## Linear mixed model fit by REML ['lmerMod']
## Formula: Language ~ IQ_c + (1 | School)
##    Data: dta1
## 
## REML criterion at convergence: 15255.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.0939 -0.6375  0.0579  0.7061  3.1448 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept)  9.602   3.099   
##  Residual             42.245   6.500   
## Number of obs: 2287, groups:  School, 131
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 40.60823    0.30819   131.8
## IQ_c         2.48759    0.07008    35.5
## 
## Correlation of Fixed Effects:
##      (Intr)
## IQ_c 0.018

9.4968/(9.4968 + 42.2272) = .18
9.602/(9.602 + 42.245) = 0.1851988
The between-pupil differences in language scores are partially explained by IQ score.

1.3.3 Comparison

sjPlot::tab_model(m0, m1, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)

	Language		Language
Predictors	Estimates	std. Error	Estimates	std. Error
(Intercept)	40.36	0.43	40.61	0.31
IQ_c			2.49	0.07
Random Effects
σ²	64.56		42.24
τ₀₀	19.63 _School		9.60 _School
ICC	0.23		0.19

2 Homework 2

2.1 Info

Problem: Replicate the results of analysis reported the math scores over years data example.
Data: In this data set, the variable year is nested within students (the variable _cid), and students are nested within schools (the variable _school). There are 60 schools. The focus here is to examine the annual changes in math scores (_math) of students within schools.

2.2 Data input

# load them
pacman::p_load(foreign, tidyverse, lme4, merTools)
# file location
fLoc <- "eg_hlm.sav"
# read spss format and covert it to data frame 
dta2 <- read.spss(fLoc) %>% as.data.frame
# check data structure
str(dta2)

## 'data.frame':    7230 obs. of  12 variables:
##  $ size    : num  380 380 380 380 380 380 380 380 380 380 ...
##  $ lowinc  : num  40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 40.3 ...
##  $ mobility: num  12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 12.5 ...
##  $ female  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ black   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ hispanic: num  1 1 1 0 0 0 0 0 1 1 ...
##  $ year    : num  0.5 1.5 2.5 -1.5 -0.5 0.5 1.5 2.5 -1.5 -0.5 ...
##  $ grade   : num  2 3 4 0 1 2 3 4 0 1 ...
##  $ math    : num  1.146 1.134 2.3 -1.303 0.439 ...
##  $ retained: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ school  : Factor w/ 60 levels "        2020",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ cid     : Factor w/ 1721 levels "   101480302",..: 244 244 244 248 248 248 248 248 253 253 ...

# set black-and-white theme
theme_set(theme_bw())

2.3 Plot

ggplot(dta2, aes(year, math, group = cid)) +
 geom_point(alpha = .1) +
 geom_line(alpha = .1) +
 stat_smooth(aes(group = school), method = "lm", se = F, lwd = .6) +
 stat_smooth(aes(group = 1), method = "lm", se = F, col = "magenta") +
 labs(x = "Year of age (centered)", y = "Math score")

## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'

2.4 LMER

2.4.1 m1

# nested specification with random intercepts only
# fixed-effect for slope (year) 
# same as
# lmer(math ~ year + (1 | school) +( 1 | school:cid) , data = dta))
summary(m1 <- lmer(math ~ year + (1 | school/cid), data = dta2))

## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ year + (1 | school/cid)
##    Data: dta2
## 
## REML criterion at convergence: 16759.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8093 -0.5831 -0.0270  0.5566  6.0867 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  cid:school (Intercept) 0.6699   0.8185  
##  school     (Intercept) 0.1869   0.4324  
##  Residual               0.3470   0.5891  
## Number of obs: 7230, groups:  cid:school, 1721; school, 60
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept) -0.780482   0.061088  -12.78
## year         0.746123   0.005396  138.26
## 
## Correlation of Fixed Effects:
##      (Intr)
## year -0.031

2.4.2 m2

# random slopes (year) 
summary(m2 <- lmer(math ~ ( year | school/cid), data = dta2))

## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00321233 (tol = 0.002, component 1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: math ~ (year | school/cid)
##    Data: dta2
## 
## REML criterion at convergence: 16553.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2231 -0.5611 -0.0308  0.5312  5.1579 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev. Corr
##  cid:school (Intercept) 0.64127  0.8008       
##             year        0.01132  0.1064   0.55
##  school     (Intercept) 0.92436  0.9614       
##             year        0.60793  0.7797   0.92
##  Residual               0.30115  0.5488       
## Number of obs: 7230, groups:  cid:school, 1721; school, 60
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) -1.64424    0.05462  -30.11
## convergence code: 0
## Model failed to converge with max|grad| = 0.00321233 (tol = 0.002, component 1)

2.4.3 comparison

# compare models
anova(m1, m2)

## refitting model(s) with ML (instead of REML)

## Data: dta2
## Models:
## m1: math ~ year + (1 | school/cid)
## m2: math ~ (year | school/cid)
##    npar   AIC   BIC  logLik deviance Chisq Df Pr(>Chisq)    
## m1    5 16757 16792 -8373.5    16747                        
## m2    8 16566 16621 -8275.0    16550 197.1  3  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

sjPlot::tab_model(m1, m2, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)

	math		math
Predictors	Estimates	std. Error	Estimates	std. Error
(Intercept)	-0.78	0.06	-1.64	0.05
year	0.75	0.01
Random Effects
σ²	0.35		0.30
τ₀₀	0.67 _cid:school		0.64 _cid:school
	0.19 _school		0.92 _school
τ₁₁			0.01 _{cid:school.year}
			0.61 _school.year
ρ₀₁			0.55 _cid:school
			0.92 _school
ICC	0.71		0.84

2.5 Plot for random effects

# merTools
# estimate random-effects parameters by simulation
m2_re <- REsim(m2)
# plot for random effects
plotREsim(m2_re)

2.6 Residual plots

2.6.1 m1

# residual plots of m1
plot(m1, ylim = c(-4, 4), 
     xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)

2.6.2 m2

# residual plots of m2
plot(m2, ylim = c(-4, 4),
     xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)

3 Homework 3

3.1 Info

Problem: How does the estimate of grand mean depend on the intra-class correlation in a simple random-effects model?
Data: Two data sets are generated for you to investigate how the estimate of grand mean depend on the intra-class correlation in a simple random-effects model.

Column 1: Score
Column 2: School ID
Column 3: Student ID

3.2 Data input

# load package
pacman::p_load(mlmRev, tidyverse, lme4, nlme)
# input data
dta3_1 <- read.table("demo1.txt", header = T)
dta3_2 <- read.table("demo2.txt", header = T)

# grand mean 
summary(dta3_1$Score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   51.00   55.00   62.11   93.00   97.00

summary(dta3_2$Score)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   11.00   51.00   53.00   57.44   91.00   97.00

dta3_1 <- dta3_1 %>% 
  group_by(School) %>% 
  mutate( ave_Score = mean(Score),
          c_Score = ave_Score - Score)
dta3_2 <- dta3_2 %>% 
  group_by(School) %>% 
  mutate( ave_Score = mean(Score),
          c_Score = ave_Score - Score)

3.3 Scatter Plot

ggplot(data = dta3_1, aes(School, Score)) +
  geom_point(data = dta3_1, aes(School, Score), colour = 'skyblue', size = 1) +
  geom_point(data = dta3_2, aes(School, Score), colour = 'coral', size = 1) +
  geom_hline(yintercept = mean(dta3_1$Score), colour = 'skyblue') +
  geom_hline(yintercept = mean(dta3_2$Score), colour = 'coral')

## LMER

summary(m0_1 <- lmer(Score ~ (1|School), data = dta3_1))

## Linear mixed model fit by REML ['lmerMod']
## Formula: Score ~ (1 | School)
##    Data: dta3_1
## 
## REML criterion at convergence: 51.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -1.3280 -0.4745  0.0000  0.4608  1.3553 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 1679     40.976  
##  Residual                5      2.236  
## Number of obs: 9, groups:  School, 3
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    53.01      23.67    2.24

summary(m0_2 <- lmer(Score ~ (1|School), data = dta3_2))

## Linear mixed model fit by REML ['lmerMod']
## Formula: Score ~ (1 | School)
##    Data: dta3_2
## 
## REML criterion at convergence: 80.4
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -1.36734 -0.34286 -0.02776  1.07153  1.13999 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  School   (Intercept) 104.2    10.21   
##  Residual             967.8    31.11   
## Number of obs: 9, groups:  School, 3
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)    56.36      12.01   4.692

sjPlot::tab_model(m0_1, m0_2, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)

	Score		Score
Predictors	Estimates	std. Error	Estimates	std. Error
(Intercept)	53.01	23.67	56.36	12.01
Random Effects
σ²	5.00		967.76
τ₀₀	1679.06 _School		104.17 _School
ICC	1.00		0.10

3.4 Residual plot

plot(m0_1, ylim = c(-20, 20), 
     xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)

plot(m0_2, ylim = c(-20, 20), 
     xlab = "Fitted values", ylab = "Pearson residuals", cex =.5)

3.5 ICC

ICC(outcome = "Score", group ="School", data=dta3_1)

## [1] 0.9970309

ICC(outcome = "Score", group ="School", data=dta3_2)

## [1] 0.0971812

3.6 lm?

summary(m0_1c <- lm(Score ~ School:c_Score, data = dta3_1))

## 
## Call:
## lm(formula = Score ~ School:c_Score, data = dta3_1)
## 
## Residuals:
##       1       2       3       4       5       6       7       8       9 
## -50.111 -50.111  -9.111  -9.111  -9.111  31.889  31.889  31.889  31.889 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)   
## (Intercept)        62.111     14.406   4.311  0.00763 **
## SchoolS1:c_Score   -1.000     30.560  -0.033  0.97516   
## SchoolS2:c_Score   -1.000     15.280  -0.065  0.95036   
## SchoolS3:c_Score   -1.000      9.664  -0.103  0.92161   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 43.22 on 5 degrees of freedom
## Multiple R-squared:  0.003202,   Adjusted R-squared:  -0.5949 
## F-statistic: 0.005354 on 3 and 5 DF,  p-value: 0.9994

summary(m0_2c <- lm(Score ~ School:c_Score, data = dta3_2))

## 
## Call:
## lm(formula = Score ~ School:c_Score, data = dta3_2)
## 
## Residuals:
##       1       2       3       4       5       6       7       8       9 
## -26.444 -26.444  -4.444  -4.444  -4.444  16.556  16.556  16.556  16.556 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       57.4444     7.5340   7.625 0.000617 ***
## SchoolS1:c_Score  -1.0000     0.7991  -1.251 0.266142    
## SchoolS2:c_Score  -1.0000     0.4094  -2.443 0.058462 .  
## SchoolS3:c_Score  -1.0000     0.5131  -1.949 0.108844    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.6 on 5 degrees of freedom
## Multiple R-squared:  0.6938, Adjusted R-squared:  0.5101 
## F-statistic: 3.777 on 3 and 5 DF,  p-value: 0.09333

sjPlot::tab_model(m0_1c, m0_2c, show.p=FALSE, show.r2=FALSE, show.obs=FALSE, show.ngroups=FALSE, show.se=TRUE, show.ci=FALSE)

	Score		Score
Predictors	Estimates	std. Error	Estimates	std. Error
(Intercept)	62.11	14.41	57.44	7.53
School [S1] : c_Score	-1.00	30.56	-1.00	0.80
School [S2] : c_Score	-1.00	15.28	-1.00	0.41
School [S3] : c_Score	-1.00	9.66	-1.00	0.51

Homework: Multilevel models

Chang-Chun Chen

2020-10-11