To decide this, you need to look at two things: the sampling structure and the intraclass correlation coefficient (ICC).
Data Structure
Remember from the first lecture that MLM data structures will include variables that have different sample sizes because they represent 2 or more structural sampling units.
Let’s say we have a study examining students’ sense of community and school connectedness, and our study includes students from 50 different high schools across the state. From each school we were able to survey a random sample of 100 students.
Our data file includes variables on adolescent attributes such as gender, age, race/ethnicity, perceived sense of connectedness to their community and to their school (e.g., “I feel like I am a valued member of my school”) but also has a number of variables that are attributes of the school: % of students receiving free and reduced lunch, student-to-teacher ratio, level of parent engagement, etc. With this data structure we will have different sample sizes for each set of variables:
Student Variables: 100 students from each of 50 high schools, n = 5,000
School Variables: n = 50
Thus, it looks like we will need an MLM analysis. But this raises a practical issue: how do we set the data up in the data file?
Traditionally we analyze data in a spreadsheet format like so:
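Wide format data frame

| Student ID | Age | Gender | Connectedness | Free/Reduced |
|------------|-----|--------|---------------|--------------|
| 001        | x   | x      | x             | x            |
| 002        | x   | x      | x             | x            |
| 003        | x   | x      | x             | x            |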
But this doesn’t work: it does not account for school-level effects, and it treats the percentage of students receiving free and reduced lunch as if it were a student-level variable. So the first thing we need to do is restructure the data from wide to long format, adding a school identifier so that each student row is tied to its school:
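Long format data frame

| School ID | Student ID | Age | Gender | Feeling Connected to School | Free Reduced Lunch |
|-----------|------------|-----|--------|-----------------------------|--------------------|
| 01        | 001        | x   | x      | x                           | x                  |
| 01        | 002        | x   | x      | x                           | x                  |
| 02        | 001        | x   | x      | x                           | x                  |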
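One way to arrive at this layout is to keep student-level and school-level variables in separate tables and merge them on the school identifier. The following is a minimal sketch in base R. The data frames, column names, and values are all hypothetical and are not taken from the study described above.

```r
# Hypothetical student-level data: one row per student (level 1)
students <- data.frame(
  school_id     = c("01", "01", "02"),
  student_id    = c("001", "002", "001"),
  age           = c(15, 16, 15),
  connectedness = c(3.2, 4.1, 2.8)
)

# Hypothetical school-level data: one row per school (level 2)
schools <- data.frame(
  school_id        = c("01", "02"),
  pct_free_reduced = c(42, 67)
)

# Attach each school's attributes to every student in that school
nested <- merge(students, schools, by = "school_id")

nrow(nested)                      # level 1 sample size (students)
length(unique(nested$school_id))  # level 2 sample size (schools)
```

The school-level value repeats on every student row from the same school, which is why the two sets of variables have different effective sample sizes even though they sit in one data frame.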
Let’s take a real example.
Start by loading the packages we are going to need.
install.packages("lme4")
install.packages("lmerTest")
install.packages("dataMaid")
library(dataMaid)
library(lme4)
Loading required package: Matrix
Attaching package: 'lme4'
The following object is masked from 'package:dataMaid':
isSingular
library(lmerTest)
Attaching package: 'lmerTest'
The following object is masked from 'package:lme4':
lmer
The following object is masked from 'package:stats':
step
In this data set we can see that there are 100 organizations with 10 respondents per organization. Thus,
Organization Variables: n=100
Employee Variables: n = 1,000
Clustering v. Nesting
This is a bit pedantic, but one can think of nesting as the data structure and clustering as the statistical violation of independence. Looking at our data we have established that it is nested, but is it clustered? To figure this out we need to take a look at the ICC.
The first step in calculating the ICC is to run an unconditional random intercept model:
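The ICC is the share of the total outcome variance that lies between groups: the random intercept variance divided by the sum of the random intercept variance and the residual variance. The sketch below shows that calculation with `lme4`. It uses the `sleepstudy` data set that ships with `lme4` purely as a stand-in, because the lecture data are not reproduced here. With the lecture data, the model formula would presumably be `Distrust ~ 1 + (1 | Organization)`, but that is an assumption.

```r
library(lme4)

# Unconditional (intercept-only) random intercept model.
# sleepstudy ships with lme4 and is used here only as a stand-in data set.
null_model <- lmer(Reaction ~ 1 + (1 | Subject), data = sleepstudy)

# Pull the variance components out of the fitted model
vc     <- as.data.frame(VarCorr(null_model))
tau00  <- vc$vcov[vc$grp == "Subject"]   # between-group (intercept) variance
sigma2 <- vc$vcov[vc$grp == "Residual"]  # within-group (residual) variance

# ICC: proportion of total variance that lies between groups
icc <- tau00 / (tau00 + sigma2)
icc
```

An ICC near zero suggests that observations within a group are no more alike than observations from different groups. Larger values indicate clustering that a single-level model would ignore.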
# Random intercept and random slope for age, varying across organizations
Model_2 <- lmer(Distrust ~ age + (1 + age | Organization), data = MLM_2LV_Data)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.0601635 (tol = 0.002, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
summary(Model_2)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Distrust ~ age + (1 + age | Organization)
Data: MLM_2LV_Data
REML criterion at convergence: 2159.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.1766 -0.7007 -0.1910 0.6793 2.5072
Random effects:
Groups Name Variance Std.Dev. Corr
Organization (Intercept) 0.3614853 0.60124
age 0.0001546 0.01243 -0.85
Residual 0.4179503 0.64649
Number of obs: 1000, groups: Organization, 100
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1.083290 0.093047 94.467521 11.642 < 2e-16 ***
age 0.012798 0.001903 98.079178 6.727 1.16e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
age -0.911
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.0601635 (tol = 0.002, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Now we can plot this model to examine the organization-level influences. First, we need to extract the fitted values and the random effects from the model.
Then we can plot them, with each organization having its own panel.
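The extraction code for this step is not shown here, so the following is a sketch of one common way to do it with `lme4`, again using the bundled `sleepstudy` data as a stand-in. By analogy with the later Model_3 step, the corresponding line for the lecture data would presumably be `MLM_2LV_Data$fitted <- fitted(Model_2)`. The object names `m`, `plot_data`, and `re` are illustrative only.

```r
library(lme4)

# Stand-in model with a random intercept and a random slope
m <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Fitted values: one prediction per observation, including group-specific effects
plot_data <- sleepstudy
plot_data$fitted <- fitted(m)

# Random effects: each group's deviation from the fixed intercept and slope
re <- ranef(m)$Subject
head(re)

# Group-specific coefficients (fixed effect plus random deviation)
head(coef(m)$Subject)
```

Storing the fitted values as a column of the data frame is what lets the plotting code refer to them by name.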
library(ggplot2)
ggplot(MLM_2LV_Data, aes(x = age, y = Distrust)) +
  geom_point(color = "blue", alpha = 0.5) +    # Observed data
  geom_line(aes(y = fitted), color = "red") +  # Fitted values
  facet_wrap(~ Organization) +                 # Separate panel for each organization
  labs(title = "Random Intercept and Slope Model",
       x = "Employee Age",
       y = "Distrust") +
  theme_minimal()
We can also plot all organizations on one graph.
ggplot(MLM_2LV_Data, aes(x = age, y = Distrust, group = Organization, color = Organization)) +
  geom_point(alpha = 0.4) +                       # Observed data
  geom_line(aes(y = fitted), linewidth = 0.5) +   # Fitted values
  labs(title = "Random Intercept and Slope Model",
       x = "Employee Age",
       y = "Distrust",
       color = "Organization") +
  theme_minimal()
This raises another question: are there characteristics of the organization that can account for the variability in either the overall level of distrust in an organization or the relationship of age to distrust within an organization?
To answer this question, we can include a level 2 predictor (here, RelSalary) and its cross-level interaction with age.
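In equation form, a model of this kind can be written as below. The notation is the conventional two-level notation and is an assumption on my part: $i$ indexes employees, $j$ indexes organizations, the $\gamma$ terms are fixed effects, and $u_{0j}$, $u_{1j}$ are the organization-level random effects.

```latex
\begin{aligned}
\text{Level 1:}\quad \text{Distrust}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{age}_{ij} + e_{ij} \\
\text{Level 2:}\quad \beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{RelSalary}_{j} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{RelSalary}_{j} + u_{1j} \\
\text{Combined:}\quad \text{Distrust}_{ij} &= \gamma_{00} + \gamma_{10}\,\text{age}_{ij} + \gamma_{01}\,\text{RelSalary}_{j}
  + \gamma_{11}\,\text{age}_{ij}\,\text{RelSalary}_{j} + u_{0j} + u_{1j}\,\text{age}_{ij} + e_{ij}
\end{aligned}
```

Here $\gamma_{01}$ addresses the first part of the question (does the level 2 predictor shift the overall level of distrust?), and $\gamma_{11}$, the cross-level interaction, addresses the second (does it change the age slope?).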
Model_3 <- lmer(Distrust ~ age + RelSalary + age:RelSalary + (1 + age | Organization),
                data = MLM_2LV_Data)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.579786 (tol = 0.002, component 1)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
summary(Model_3)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: Distrust ~ age + RelSalary + age:RelSalary + (1 + age | Organization)
Data: MLM_2LV_Data
REML criterion at convergence: 2149.5
Scaled residuals:
Min 1Q Median 3Q Max
-2.1466 -0.6965 -0.1570 0.6727 2.5922
Random effects:
Groups Name Variance Std.Dev. Corr
Organization (Intercept) 0.4056374 0.63690
age 0.0001979 0.01407 -0.91
Residual 0.4119980 0.64187
Number of obs: 1000, groups: Organization, 100
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 1.080976 0.094934 80.467811 11.387 < 2e-16 ***
age 0.012854 0.002004 84.400915 6.413 7.85e-09 ***
RelSalary -0.268217 0.094417 79.899713 -2.841 0.00571 **
age:RelSalary 0.001979 0.002005 84.724569 0.987 0.32647
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) age RlSlry
age -0.933
RelSalary 0.010 -0.012
age:RelSlry -0.012 0.013 -0.933
optimizer (nloptwrap) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.579786 (tol = 0.002, component 1)
Model is nearly unidentifiable: very large eigenvalue
- Rescale variables?
Now we can graph this model to aid in interpretation. Again, we first need to extract the fitted values from the model.
MLM_2LV_Data$fitted <- fitted(Model_3)
Then we can generate a plot in ggplot.
ggplot(MLM_2LV_Data, aes(x = age, y = fitted, group = RelSalary, color = RelSalary)) +
  geom_point(alpha = 0.5) +                       # Fitted values as points (y is mapped to fitted)
  geom_line(aes(y = fitted), linewidth = 0.5) +   # Lines through the fitted values
  labs(title = "Random Intercept and Slope Model with Level 2 Predictor",
       x = "Age",
       y = "Distrust",
       color = "RelSalary") +
  theme_minimal()
3-level models
To specify a 3-level model we just expand on the nested structure. In equation form it looks like the following: