In this post, we discuss about how power analysis is done with simr package for multilevel data.

Example study: - Pupils from different classes will be recruited.
- Pupil level predictor: extraversion
- Class level predictor: teacher experience
- Outcome variable: pupil popularity

library(lme4)
library(MonteCarlo)
library(dplyr)
library(ggplot2)
library(afex)
library(simr)
library(pwr)
library(broom) #extract random effects
library(knitr)
library(truncnorm)

Data preprocessing

data <- read.delim('popular2.dat', header = TRUE, sep="", skip=2, as.is=TRUE)
data<-data[,c(1,2,4,5,6,7)]
names(data) <- c("class","pupil","extrav","sex","texp","popular")

df <- data %>% 
  # Grand mean centering
  mutate(extrav.gmc = extrav-mean(extrav)) %>%
  mutate(texp.gmc = texp-mean(texp)) %>%
  mutate(popular.gmc = popular-mean(popular))

#imagine we only have pilot data from 10 classes, we will create simulated model based on parameters from these pilot data.
#logic to select 10 classes in random

# set.seed(12345)
# class_unique<-data.frame(unique(df$class))
# class_sample<-sample_n(class_unique,20,replace=FALSE)
# 
# #logic to select all subjects in these classes
# df<-df%>%
#   filter(class %in% class_sample$unique.df.class.)

cols<-c('class','pupil',"sex")
data[cols]<-lapply(data[cols],factor)

kable(head(df))
class pupil extrav sex texp popular extrav.gmc texp.gmc popular.gmc
1 4 3 1 24 4.7 -2.2148222 9.751627 -0.3758137
1 5 5 1 24 6.0 -0.2148222 9.751627 0.9241863
1 6 4 0 24 4.7 -1.2148222 9.751627 -0.3758137
1 7 5 0 24 5.9 -0.2148222 9.751627 0.8241863
1 8 4 0 24 4.2 -1.2148222 9.751627 -0.8758137
1 9 5 0 24 5.2 -0.2148222 9.751627 0.1241863

Changing number of clusters while holding Ss per cluster = 10

#logic to select 10 subjects from each of these randomly selected classes
set.seed(0)

df_10<-df%>%
  group_by(class)%>%
  sample_n(size=10)

Fit lmer model

m<-lmer(popular.gmc~extrav.gmc*texp.gmc+
                (1|class),df_10,REML=FALSE)

summary(m)
## Linear mixed model fit by maximum likelihood . t-tests use
##   Satterthwaite's method [lmerModLmerTest]
## Formula: popular.gmc ~ extrav.gmc * texp.gmc + (1 | class)
##    Data: df_10
## 
##      AIC      BIC   logLik deviance df.resid 
##   2896.1   2925.5  -1442.0   2884.1      994 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.2958 -0.6777  0.0206  0.6763  2.5537 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  class    (Intercept) 0.4013   0.6335  
##  Residual             0.8824   0.9394  
## Number of obs: 1000, groups:  class, 100
## 
## Fixed effects:
##                       Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)          -0.080519   0.071156 105.784977  -1.132     0.26    
## extrav.gmc            0.487537   0.028548 963.511858  17.078  < 2e-16 ***
## texp.gmc              0.107757   0.010899 108.529925   9.886  < 2e-16 ***
## extrav.gmc:texp.gmc  -0.022240   0.004021 994.768035  -5.531 4.07e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) extrv. txp.gm
## extrav.gmc   0.024              
## texp.gmc    -0.022  0.188       
## extrv.gmc:.  0.181  0.046 -0.099

Extract parameter estimates from lmer model

# fixed intercept and slope
fixed <- c(-.0805,.4875,.1078,-.02224) 
# random intercept and slope variance-covariance matrix
# For class
rand <-.4013

# Exrtact the residual sd
s <- .8824^.5 

Create Simulated model using extracted parameters

model <- makeLmer(y ~ extrav*texp + (1|class), fixef=fixed, VarCorr=rand, sigma=s, data=df_10)
model
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ extrav * texp + (1 | class)
##    Data: df_10
## REML criterion at convergence: 2911.247
## Random effects:
##  Groups   Name        Std.Dev.
##  class    (Intercept) 0.6335  
##  Residual             0.9394  
## Number of obs: 1000, groups:  class, 100
## Fixed Effects:
## (Intercept)       extrav         texp  extrav:texp  
##    -0.08050      0.48750      0.10780     -0.02224

Changing the number of classes

To study the effect an increase in sample size has on our ability to detect the effect of interest we can increase the number of levels for one of the factors in our model. This can be achieved with the extend function. For example, we could increase the number of classes from 10 to 50, this way we can do power analysis for any number of clusters that is less than 50.

model_ext_class <- extend(model, along="class", n=50)
model_ext_class
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ extrav * texp + (1 | class)
##    Data: df_10
## REML criterion at convergence: 2911.247
## Random effects:
##  Groups   Name        Std.Dev.
##  class    (Intercept) 0.6335  
##  Residual             0.9394  
## Number of obs: 1000, groups:  class, 100
## Fixed Effects:
## (Intercept)       extrav         texp  extrav:texp  
##    -0.08050      0.48750      0.10780     -0.02224

Power curve of pupil extraversion

Here we test how number of clusters effect the power of extraversion.

set.seed(0)
p_curve_extrav <- powerCurve(model_ext_class, 
                            test=fcompare(y~texp), 
                            along="class",
                            breaks=c(10,15,20,30,50),
                            nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_extrav)

This curve represents the power estimations of pupil extraversion when nsize = 10 and cluster size varies. We need at least 20 clusters of 10 to have a power of at least 80% given the effect is present.

Power curve of teacher experience

Here we test how number of clusters affect the power of teacher experience.

set.seed(0)
p_curve_texp <- powerCurve(model_ext_class, 
                            test=fcompare(y~extrav), 
                            along="class",
                            breaks=c(10,15,20,30,50),
                            nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_texp)

This curve represents the power estimations of teacher experience when nsize = 10 and cluster size varies. We need at least 45 clusters of 10 to have a power of at least 80% given the effect is present.

Power curve of interaction between extraversion and teacher experience

Here we test how number of clusters affect the power of inetaction between extravtion and teacher experience.

set.seed(0)
p_curve_interaction <- powerCurve(model_ext_class, 
                            test=fcompare(y~texp+extrav), 
                            along="class",
                            breaks=c(10,15,20,30,50),
                            nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_interaction)

This curve represents the power estimations of the cross-level interaction between extraversion and teacher experience when nsize = 10 and cluster size varies. We need at least 42 clusters of 10 to have a power of at least 80% given the effect is present.

Changing number of clusters while holding Ss per cluster = 15

#logic to select 10 subjects from each of these randomly selected classes
set.seed(0)

df_15<-df%>%
  group_by(class)%>%
  sample_n(size=15)

Fit lme model

m2<-lmer(popular.gmc~extrav.gmc*texp.gmc+
                (1|class),df_15,REML=FALSE)

summary(m2)
## Linear mixed model fit by maximum likelihood . t-tests use
##   Satterthwaite's method [lmerModLmerTest]
## Formula: popular.gmc ~ extrav.gmc * texp.gmc + (1 | class)
##    Data: df_15
## 
##      AIC      BIC   logLik deviance df.resid 
##   4292.8   4324.7  -2140.4   4280.8     1494 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.1896 -0.7144  0.0131  0.6716  3.1418 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  class    (Intercept) 0.3682   0.6068  
##  Residual             0.8908   0.9438  
## Number of obs: 1500, groups:  class, 100
## 
## Fixed effects:
##                       Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)         -6.548e-02  6.632e-02  1.051e+02  -0.987    0.326    
## extrav.gmc           4.794e-01  2.310e-02  1.474e+03  20.757  < 2e-16 ***
## texp.gmc             1.071e-01  1.018e-02  1.087e+02  10.522  < 2e-16 ***
## extrav.gmc:texp.gmc -2.539e-02  3.348e-03  1.464e+03  -7.585 5.88e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) extrv. txp.gm
## extrav.gmc  -0.002              
## texp.gmc    -0.030  0.169       
## extrv.gmc:.  0.167  0.027 -0.127

Extract parameter estimates from lmer model

# fixed intercept and slope
fixed2 <- c(-.0655,.4794,.1071,-.02539) 
# random intercept and slope variance-covariance matrix
# For class
rand2 <-.3682

# Exrtact the residual sd
s2 <- .8908^.5 

Create Simulated model using extracted parameters

model2 <- makeLmer(y ~ extrav*texp + (1|class), fixef=fixed2, VarCorr=rand2, sigma=s2, data=df_15)
model2
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ extrav * texp + (1 | class)
##    Data: df_15
## REML criterion at convergence: 4403.964
## Random effects:
##  Groups   Name        Std.Dev.
##  class    (Intercept) 0.6068  
##  Residual             0.9438  
## Number of obs: 1500, groups:  class, 100
## Fixed Effects:
## (Intercept)       extrav         texp  extrav:texp  
##    -0.06550      0.47940      0.10710     -0.02539

Changing the number of classes

model_ext_class2 <- extend(model2, along="class", n=50)
model_ext_class2
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ extrav * texp + (1 | class)
##    Data: df_15
## REML criterion at convergence: 4403.964
## Random effects:
##  Groups   Name        Std.Dev.
##  class    (Intercept) 0.6068  
##  Residual             0.9438  
## Number of obs: 1500, groups:  class, 100
## Fixed Effects:
## (Intercept)       extrav         texp  extrav:texp  
##    -0.06550      0.47940      0.10710     -0.02539

Power curve of pupil extraversion

Here we test how number of clusters effect the power of extraversion

set.seed(0)
p_curve_extrav2 <- powerCurve(model_ext_class2, 
                            test=fcompare(y~texp), 
                            along="class",
                            breaks=c(10,15,20,30,50),
                            nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_extrav2)

This curve represents the power estimations of pupil extraversion when nsize = 15 and cluster size varies. We need at least 15 clusters of 15 to have a power of at least 80% given the effect is present.

Power curve of teacher experience

Here we test how number of clusters effect the power of teacher experience

set.seed(0)
p_curve_texp2 <- powerCurve(model_ext_class2, 
                            test=fcompare(y~extrav), 
                            along="class",
                            breaks=c(10,15,20,30,50),
                            nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_texp2)

This curve represents the power estimations of teacher experience when nsize = 15 and cluster size varies. We need at least 18 clusters of 15 to have a power of at least 80% given the effect is present.

Power curve of interaction between extraversion and teacher experience

Here we test how number of clusters affect the power of interaction between extraversion and teacher experience

set.seed(0)
p_curve_interaction2 <- powerCurve(model_ext_class2, 
                            test=fcompare(y~texp+extrav), 
                            along="class",
                            breaks=c(10,15,20,30,50),
                            nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_interaction2)

This curve represents the power estimations of the cross-level interaction between extraversion and teacher experience when nsize = 15 and cluster size varies. We need at least 20 clusters of 15 to have a power of at least 80% given the effect is present.


Conlusion:
We need at least 20 (clusters) x 15 (Ss/cluster) = 300 observations to detect all 3 effects (extrav, texp, extrav x texp) given that all 3 effects exist in the population.

references: https://humburg.github.io/Power-Analysis/simr_power_analysis.html
http://www.alexanderdemos.org/Mixed9.html#