Power Bifactorial Factor analysis

In order to determine the sample size for a bifactorial factor analysis, giving a power of .8 I did a simulation study. First a correlation matrix was defined with containing the correlations between different parts of the subtests. Subtest recognition has three parts, counting one part, discrimination one part, number and place value four parts, substraction and addition 9 parts, multiplication and division five parts and shape four parts. Note that time turned out to be too difficult and is therefore not taken into account. So, in total there were 27 parts (k), forming a correlation matrix of 27 by 27. It was assumed that the correlation between parts belonging to the same subtest (rw) was between .5 and .7. This was based on the average inter-item correlation within parts. The correlation between parts belonging to different subtest (rb) was set between .3 and .45. This was based on the average inter item correlation between items of different subtests. We took the most conservative situation for calculating the required sample size, i.e. rw=.5 and rb=.3.

sigma.fun<-function(k,rb,rw){
  m<-matrix(NA,sum(k),sum(k))
  s=1
  m[c(1:k[s]),]<-t(matrix(
    rep(
      c(rep(rw,k[s]),rep(rb,sum(k[(s+1):length(k)]))),
      k[s]
    ),
    sum(k),k[s])
  )
  
  for(s in c(2:(length(k)-1))){
    m[(sum(k[1:(s-1)])+1):(sum(k[1:(s-1)])+k[s]),]<-t(matrix(
      rep(
        c(rep(rb,sum(k[1:(s-1)])),rep(rw,k[s]),rep(rb,sum(k[(s+1):length(k)]))),
        k[s]
      ),
      sum(k),k[s])
    )
  }
  
  m[(sum(k)-k[length(k)]+1):sum(k),]<-t(matrix(
    rep(
      c(rep(rb,sum(k[-length(k)])),rep(rw,k[length(k)])),
      k[length(k)]
    ),
    sum(k),k[length(k)])
  )
  
  for(i in seq.int(nrow(m))){
    m[i,i]<-1
  }
  return(m)
}

Next data was simulated with sample size (J) based on the correlation matrix.

data.CFA.sim<-function(J,k,rb,rw){
 
  sigma<-sigma.fun(k,rb,rw)
    df<-rmvnorm(n=J,mean=rep(0,nrow(sigma)),sigma=sigma)
 colnames(df)<-c(paste0("x",c(1:sum(k))))
  return(df)
}

In the next step the bifactorial factor model was defined. The seven subtest were seven latent variables, or factors. The parts form the indicators of the subtest (x1 through x27).

  model <- 'recognition  =~ x1 + x2 + x3 
            counting  =~ x4 
            discrimination  =~ x5
            numberplacevalue  =~ x6+x7+x8+x9
            adding_substract =~ x10+x11+x12+x13+x14+x15+x16+x17+x18
            multipication =~ x19+x20+x21+x22+x23
            shape =~ x24+x25+x26+x27 
  #higher order
            F1= ~ recognition
            F1= ~ counting
           F1= ~  discrimination 
            F1= ~ numberplacevalue  
            F1= ~ adding_substract 
            F1= ~ multipication
            F1= ~ shape 
            '

Finally the power function for the bifactorial factor model was defined. The rationale is as follows. For 1000 replications we simulated data and fit the model and calculate the fit measures. A simulated dataset was assumed to fit when the cfi>=.95, the rmsea<=.06, the srmr<=.08 and the gfi >=.9. Note that Chi-square statistics and accompanying p-values are not informative for the model fit when sample size is too large or too small. The power is now defined as the number of samples that fit, giving the above criteria.

CFA.power<-function(J,k,rb,rw,model,n.sims){
  signif<-rep(NA,n.sims)
  err<-rep(NA,n.sims)
  
  for(s in 1:n.sims){
    sim<-data.CFA.sim(J,k,rb,rw)
   
    fit<-tryCatch(sem(model, data=sim), error=function(e) NA)
    cfi<- tryCatch(fitMeasures(fit,"cfi"), error=function(e) NA)
    rmsea<- tryCatch(fitMeasures(fit,"rmsea"), error=function(e) NA)
    srmr<- tryCatch(fitMeasures(fit,"srmr"), error=function(e) NA)
    gfi<- tryCatch(fitMeasures(fit,"gfi"), error=function(e) NA)
  
    if(!is.na(cfi) & !is.na(rmsea) & !is.na(srmr) & !is.na(gfi)){
    signif[s]<- cfi>=.95 & rmsea<=.06 & srmr<=.08 & gfi >=.9
    err[s]<-0
    }else{
     signif[s]<-FALSE
     err[s]<-1
    }
  }
  power<-mean(signif,na.rm=TRUE)
  error<-paste("proportion error is",mean(err),sep=" ")
  res<-list(power,error)
  return(res)
}

Results

The sample size J was ranged from 200 to 270 in steps of 5).

J=seq(200,270,5)
rb.3rw.5<-read.csv("/Users/samanthabouwmeester/rb.3rw.5.csv")[,2]

df<-data.frame(J,rb.3rw.5)
plot_ly(df,x = df[,1], y =df[,2], 
        type = 'scatter',
        mode="markers+lines",
        line=list(width =1),
        marker = list(size=2),
        showlegend=FALSE,
        height=250,width=500)%>%
  layout(title="power = .8",xaxis=list(
    title ="Sample Size"),yaxis=list(
      title ="Power"))%>%
  add_segments(x = 200, xend = 270, y = 0.8, yend = .8) 
## A marker object has been specified, but markers is not in the mode
## Adding markers to the mode...

Conclusion

The results show that a sample size of 230 will be sufficient for having a power of .8 to find a good model fit. Note that these results are based on the Math test only and does not guarantee a good fit of the bifactorial model on the literacy test.