In order to determine the sample size for a bifactorial factor analysis, giving a power of .8 I did a simulation study. First a correlation matrix was defined with containing the correlations between different parts of the subtests. Subtest recognition has three parts, counting one part, discrimination one part, number and place value four parts, substraction and addition 9 parts, multiplication and division five parts and shape four parts. Note that time turned out to be too difficult and is therefore not taken into account. So, in total there were 27 parts (k), forming a correlation matrix of 27 by 27. It was assumed that the correlation between parts belonging to the same subtest (rw) was between .5 and .7. This was based on the average inter-item correlation within parts. The correlation between parts belonging to different subtest (rb) was set between .3 and .45. This was based on the average inter item correlation between items of different subtests. We took the most conservative situation for calculating the required sample size, i.e. rw=.5 and rb=.3.
sigma.fun<-function(k,rb,rw){
m<-matrix(NA,sum(k),sum(k))
s=1
m[c(1:k[s]),]<-t(matrix(
rep(
c(rep(rw,k[s]),rep(rb,sum(k[(s+1):length(k)]))),
k[s]
),
sum(k),k[s])
)
for(s in c(2:(length(k)-1))){
m[(sum(k[1:(s-1)])+1):(sum(k[1:(s-1)])+k[s]),]<-t(matrix(
rep(
c(rep(rb,sum(k[1:(s-1)])),rep(rw,k[s]),rep(rb,sum(k[(s+1):length(k)]))),
k[s]
),
sum(k),k[s])
)
}
m[(sum(k)-k[length(k)]+1):sum(k),]<-t(matrix(
rep(
c(rep(rb,sum(k[-length(k)])),rep(rw,k[length(k)])),
k[length(k)]
),
sum(k),k[length(k)])
)
for(i in seq.int(nrow(m))){
m[i,i]<-1
}
return(m)
}
Next data was simulated with sample size (J) based on the correlation matrix.
data.CFA.sim<-function(J,k,rb,rw){
sigma<-sigma.fun(k,rb,rw)
df<-rmvnorm(n=J,mean=rep(0,nrow(sigma)),sigma=sigma)
colnames(df)<-c(paste0("x",c(1:sum(k))))
return(df)
}
In the next step the bifactorial factor model was defined. The seven subtest were seven latent variables, or factors. The parts form the indicators of the subtest (x1 through x27).
model <- 'recognition =~ x1 + x2 + x3
counting =~ x4
discrimination =~ x5
numberplacevalue =~ x6+x7+x8+x9
adding_substract =~ x10+x11+x12+x13+x14+x15+x16+x17+x18
multipication =~ x19+x20+x21+x22+x23
shape =~ x24+x25+x26+x27
#higher order
F1= ~ recognition
F1= ~ counting
F1= ~ discrimination
F1= ~ numberplacevalue
F1= ~ adding_substract
F1= ~ multipication
F1= ~ shape
'
Finally the power function for the bifactorial factor model was defined. The rationale is as follows. For 1000 replications we simulated data and fit the model and calculate the fit measures. A simulated dataset was assumed to fit when the cfi>=.95, the rmsea<=.06, the srmr<=.08 and the gfi >=.9. Note that Chi-square statistics and accompanying p-values are not informative for the model fit when sample size is too large or too small. The power is now defined as the number of samples that fit, giving the above criteria.
CFA.power<-function(J,k,rb,rw,model,n.sims){
signif<-rep(NA,n.sims)
err<-rep(NA,n.sims)
for(s in 1:n.sims){
sim<-data.CFA.sim(J,k,rb,rw)
fit<-tryCatch(sem(model, data=sim), error=function(e) NA)
cfi<- tryCatch(fitMeasures(fit,"cfi"), error=function(e) NA)
rmsea<- tryCatch(fitMeasures(fit,"rmsea"), error=function(e) NA)
srmr<- tryCatch(fitMeasures(fit,"srmr"), error=function(e) NA)
gfi<- tryCatch(fitMeasures(fit,"gfi"), error=function(e) NA)
if(!is.na(cfi) & !is.na(rmsea) & !is.na(srmr) & !is.na(gfi)){
signif[s]<- cfi>=.95 & rmsea<=.06 & srmr<=.08 & gfi >=.9
err[s]<-0
}else{
signif[s]<-FALSE
err[s]<-1
}
}
power<-mean(signif,na.rm=TRUE)
error<-paste("proportion error is",mean(err),sep=" ")
res<-list(power,error)
return(res)
}
The sample size J was ranged from 200 to 270 in steps of 5).
J=seq(200,270,5)
rb.3rw.5<-read.csv("/Users/samanthabouwmeester/rb.3rw.5.csv")[,2]
df<-data.frame(J,rb.3rw.5)
plot_ly(df,x = df[,1], y =df[,2],
type = 'scatter',
mode="markers+lines",
line=list(width =1),
marker = list(size=2),
showlegend=FALSE,
height=250,width=500)%>%
layout(title="power = .8",xaxis=list(
title ="Sample Size"),yaxis=list(
title ="Power"))%>%
add_segments(x = 200, xend = 270, y = 0.8, yend = .8)
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...
The results show that a sample size of 230 will be sufficient for having a power of .8 to find a good model fit. Note that these results are based on the Math test only and does not guarantee a good fit of the bifactorial model on the literacy test.