In this post, we discuss about how power analysis is done with simr package for multilevel data.
Example study: - Pupils from different classes will be recruited. - Pupil level predictor: extraversion - Class level predictor: teacher experience - Outcome variable: pupil popularity
library(lme4)
library(MonteCarlo)
library(dplyr)
library(ggplot2)
library(afex)
library(simr)
library(pwr)
library(broom) #extract random effects
library(knitr)
library(truncnorm)
data <- read.delim('popular2.dat', header = TRUE, sep="", skip=2, as.is=TRUE)
data<-data[,c(1,2,4,5,6,7)]
names(data) <- c("class","pupil","extrav","sex","texp","popular")
df <- data %>%
# Grand mean centering
mutate(extrav.gmc = extrav-mean(extrav)) %>%
mutate(texp.gmc = texp-mean(texp)) %>%
mutate(popular.gmc = popular-mean(popular))
#imagine we only have pilot data from 10 classes, we will create simulated model based on parameters from these pilot data.
#logic to select 10 classes in random
# set.seed(12345)
# class_unique<-data.frame(unique(df$class))
# class_sample<-sample_n(class_unique,20,replace=FALSE)
#
# #logic to select all subjects in these classes
# df<-df%>%
# filter(class %in% class_sample$unique.df.class.)
cols<-c('class','pupil',"sex")
data[cols]<-lapply(data[cols],factor)
kable(head(df))
class | pupil | extrav | sex | texp | popular | extrav.gmc | texp.gmc | popular.gmc |
---|---|---|---|---|---|---|---|---|
1 | 4 | 3 | 1 | 24 | 4.7 | -2.2148222 | 9.751627 | -0.3758137 |
1 | 5 | 5 | 1 | 24 | 6.0 | -0.2148222 | 9.751627 | 0.9241863 |
1 | 6 | 4 | 0 | 24 | 4.7 | -1.2148222 | 9.751627 | -0.3758137 |
1 | 7 | 5 | 0 | 24 | 5.9 | -0.2148222 | 9.751627 | 0.8241863 |
1 | 8 | 4 | 0 | 24 | 4.2 | -1.2148222 | 9.751627 | -0.8758137 |
1 | 9 | 5 | 0 | 24 | 5.2 | -0.2148222 | 9.751627 | 0.1241863 |
#logic to select 10 subjects from each of these randomly selected classes
set.seed(0)
df_10<-df%>%
group_by(class)%>%
sample_n(size=10)
m<-lmer(popular.gmc~extrav.gmc*texp.gmc+
(1|class),df_10,REML=FALSE)
summary(m)
## Linear mixed model fit by maximum likelihood . t-tests use
## Satterthwaite's method [lmerModLmerTest]
## Formula: popular.gmc ~ extrav.gmc * texp.gmc + (1 | class)
## Data: df_10
##
## AIC BIC logLik deviance df.resid
## 2896.1 2925.5 -1442.0 2884.1 994
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.2958 -0.6777 0.0206 0.6763 2.5537
##
## Random effects:
## Groups Name Variance Std.Dev.
## class (Intercept) 0.4013 0.6335
## Residual 0.8824 0.9394
## Number of obs: 1000, groups: class, 100
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -0.080519 0.071156 105.784977 -1.132 0.26
## extrav.gmc 0.487537 0.028548 963.511858 17.078 < 2e-16 ***
## texp.gmc 0.107757 0.010899 108.529925 9.886 < 2e-16 ***
## extrav.gmc:texp.gmc -0.022240 0.004021 994.768035 -5.531 4.07e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) extrv. txp.gm
## extrav.gmc 0.024
## texp.gmc -0.022 0.188
## extrv.gmc:. 0.181 0.046 -0.099
# fixed intercept and slope
fixed <- c(-.0805,.4875,.1078,-.02224)
# random intercept and slope variance-covariance matrix
# For class
rand <-.4013
# Exrtact the residual sd
s <- .8824^.5
model <- makeLmer(y ~ extrav*texp + (1|class), fixef=fixed, VarCorr=rand, sigma=s, data=df_10)
model
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ extrav * texp + (1 | class)
## Data: df_10
## REML criterion at convergence: 2911.247
## Random effects:
## Groups Name Std.Dev.
## class (Intercept) 0.6335
## Residual 0.9394
## Number of obs: 1000, groups: class, 100
## Fixed Effects:
## (Intercept) extrav texp extrav:texp
## -0.08050 0.48750 0.10780 -0.02224
To study the effect an increase in sample size has on our ability to detect the effect of interest we can increase the number of levels for one of the factors in our model. This can be achieved with the extend function. For example, we could increase the number of classes from 10 to 50, this way we can do power analysis for any number of clusters that is less than 50.
model_ext_class <- extend(model, along="class", n=50)
model_ext_class
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ extrav * texp + (1 | class)
## Data: df_10
## REML criterion at convergence: 2911.247
## Random effects:
## Groups Name Std.Dev.
## class (Intercept) 0.6335
## Residual 0.9394
## Number of obs: 1000, groups: class, 100
## Fixed Effects:
## (Intercept) extrav texp extrav:texp
## -0.08050 0.48750 0.10780 -0.02224
Here we test how number of clusters effect the power of extraversion.
set.seed(0)
p_curve_extrav <- powerCurve(model_ext_class,
test=fcompare(y~texp),
along="class",
breaks=c(10,15,20,30,50),
nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_extrav)
This curve represents the power estimations of pupil extraversion when nsize = 10 and cluster size varies. We need at least 20 clusters of 10 to have a power of at least 80% given the effect is present.
Here we test how number of clusters affect the power of teacher experience.
set.seed(0)
p_curve_texp <- powerCurve(model_ext_class,
test=fcompare(y~extrav),
along="class",
breaks=c(10,15,20,30,50),
nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_texp)
This curve represents the power estimations of teacher experience when nsize = 10 and cluster size varies. We need at least 45 clusters of 10 to have a power of at least 80% given the effect is present.
Here we test how number of clusters affect the power of inetaction between extravtion and teacher experience.
set.seed(0)
p_curve_interaction <- powerCurve(model_ext_class,
test=fcompare(y~texp+extrav),
along="class",
breaks=c(10,15,20,30,50),
nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_interaction)
This curve represents the power estimations of the cross-level interaction between extraversion and teacher experience when nsize = 10 and cluster size varies. We need at least 42 clusters of 10 to have a power of at least 80% given the effect is present.
#logic to select 10 subjects from each of these randomly selected classes
set.seed(0)
df_15<-df%>%
group_by(class)%>%
sample_n(size=15)
m2<-lmer(popular.gmc~extrav.gmc*texp.gmc+
(1|class),df_15,REML=FALSE)
summary(m2)
## Linear mixed model fit by maximum likelihood . t-tests use
## Satterthwaite's method [lmerModLmerTest]
## Formula: popular.gmc ~ extrav.gmc * texp.gmc + (1 | class)
## Data: df_15
##
## AIC BIC logLik deviance df.resid
## 4292.8 4324.7 -2140.4 4280.8 1494
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.1896 -0.7144 0.0131 0.6716 3.1418
##
## Random effects:
## Groups Name Variance Std.Dev.
## class (Intercept) 0.3682 0.6068
## Residual 0.8908 0.9438
## Number of obs: 1500, groups: class, 100
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) -6.548e-02 6.632e-02 1.051e+02 -0.987 0.326
## extrav.gmc 4.794e-01 2.310e-02 1.474e+03 20.757 < 2e-16 ***
## texp.gmc 1.071e-01 1.018e-02 1.087e+02 10.522 < 2e-16 ***
## extrav.gmc:texp.gmc -2.539e-02 3.348e-03 1.464e+03 -7.585 5.88e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) extrv. txp.gm
## extrav.gmc -0.002
## texp.gmc -0.030 0.169
## extrv.gmc:. 0.167 0.027 -0.127
# fixed intercept and slope
fixed2 <- c(-.0655,.4794,.1071,-.02539)
# random intercept and slope variance-covariance matrix
# For class
rand2 <-.3682
# Exrtact the residual sd
s2 <- .8908^.5
model2 <- makeLmer(y ~ extrav*texp + (1|class), fixef=fixed2, VarCorr=rand2, sigma=s2, data=df_15)
model2
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ extrav * texp + (1 | class)
## Data: df_15
## REML criterion at convergence: 4403.964
## Random effects:
## Groups Name Std.Dev.
## class (Intercept) 0.6068
## Residual 0.9438
## Number of obs: 1500, groups: class, 100
## Fixed Effects:
## (Intercept) extrav texp extrav:texp
## -0.06550 0.47940 0.10710 -0.02539
model_ext_class2 <- extend(model2, along="class", n=50)
model_ext_class2
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ extrav * texp + (1 | class)
## Data: df_15
## REML criterion at convergence: 4403.964
## Random effects:
## Groups Name Std.Dev.
## class (Intercept) 0.6068
## Residual 0.9438
## Number of obs: 1500, groups: class, 100
## Fixed Effects:
## (Intercept) extrav texp extrav:texp
## -0.06550 0.47940 0.10710 -0.02539
Here we test how number of clusters effect the power of extraversion
set.seed(0)
p_curve_extrav2 <- powerCurve(model_ext_class2,
test=fcompare(y~texp),
along="class",
breaks=c(10,15,20,30,50),
nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_extrav2)
This curve represents the power estimations of pupil extraversion when nsize = 15 and cluster size varies. We need at least 15 clusters of 15 to have a power of at least 80% given the effect is present.
Here we test how number of clusters effect the power of teacher experience
set.seed(0)
p_curve_texp2 <- powerCurve(model_ext_class2,
test=fcompare(y~extrav),
along="class",
breaks=c(10,15,20,30,50),
nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_texp2)
This curve represents the power estimations of teacher experience when nsize = 15 and cluster size varies. We need at least 18 clusters of 15 to have a power of at least 80% given the effect is present.
Here we test how number of clusters affect the power of interaction between extraversion and teacher experience
set.seed(0)
p_curve_interaction2 <- powerCurve(model_ext_class2,
test=fcompare(y~texp+extrav),
along="class",
breaks=c(10,15,20,30,50),
nsim=500,alpha=.05, progress=FALSE)
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
## boundary (singular) fit: see ?isSingular
plot(p_curve_interaction2)
This curve represents the power estimations of the cross-level interaction between extraversion and teacher experience when nsize = 15 and cluster size varies. We need at least 20 clusters of 15 to have a power of at least 80% given the effect is present.
Conlusion: We need at least 20 (clusters) x 15 (Ss/cluster) = 300 observations to detect all 3 effects (extrav, texp, extrav x texp) given that all 3 effects exist in the population.
references: https://humburg.github.io/Power-Analysis/simr_power_analysis.html http://www.alexanderdemos.org/Mixed9.html#