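# Z-test for the difference between two coefficients (B1, B2),
# given their standard errors (SEB1, SEB2)
ZREG <- function(B1, B2, SEB1, SEB2) {
  Z <- (B1 - B2) / sqrt(SEB1^2 + SEB2^2)
  return(Z)
}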
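Written out, the statistic that ZREG returns is the following. Note that the denominator has no covariance term, so the formula as coded treats the two estimates as uncorrelated; that is a property of this form of the test, not something established by the data here.

\[
Z = \frac{B_1 - B_2}{\sqrt{SE_{B_1}^2 + SE_{B_2}^2}}
\]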
Kovacs, Molenaar & Conway (2019) claimed to find theoretical support for Process Overlap Theory in the form of differential differentiation of working memory capacity (WMC) and short-term memory (STM) capacity by Gf, Gc, and Gv, which they argued the theory predicts. Specifically, they predicted that working memory should be more differentiated with respect to fluid intelligence than short-term memory is, and that fluid intelligence should be a stronger moderator than crystallized or visuospatial intelligence. They made this claim despite negative evidence in their own study, because they failed to consider that “the difference between ‘significant’ and ‘not significant’ [was] not itself statistically significant” (Gelman & Stern, 2006).
Their preceding analysis supports no inference, because the variables needed to be residualized for other ability variance before testing, as their other hypotheses make abundantly clear through their discussion of construct correlations. The issue of heteroskedastic residual variances was left untouched and could not have been addressed with their data given the limited number of indicators, regardless of the contamination of WMC by general (a touchy subject) or other specific variance. Unless it is run on a battery with many tests and a sample large enough to obviate finite-sample concerns, this analysis should be regarded as impossible as a test of theory, because a large part of the variance in working memory is causal variance shared with other abilities (see Panizzon et al., 2014), and the residual is of unknown validity. In bifactor modeling, I typically see the working memory factor collapse, as there is too little residual non-g variance and too little violation of local independence to support a net-of-g working memory factor. As their full set of hypotheses clearly shows, their analysis involved too many constructs that mirror one another too closely to tell them apart and work out what was actually tested. I thus see no reason to address the first study in Kovacs, Molenaar & Conway’s paper: what it tested is unknown, and it is as if they tested an explanation for the positive manifold while denying its existence. I merely wish to showcase the extremely obvious error in their subsequent analyses.
A Z-score of 1.96 represents a two-sided p-value of .05. If one believes the theory is strong, they could argue for a one-sided p-value of .05, which corresponds to a Z-score of 1.65.
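As a quick check on those thresholds, the critical values can be recovered from the standard normal quantile function. The helper `z_to_p` below is a hypothetical convenience function, not part of the original analysis, and the one-sided options assume the direction was specified in advance.

```r
# Critical values for alpha = .05 under a standard normal reference distribution
z_two_sided <- qnorm(1 - 0.05 / 2)  # roughly 1.96
z_one_sided <- qnorm(1 - 0.05)      # roughly 1.645

# Hypothetical helper: convert a Z statistic into a p-value
z_to_p <- function(z, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z)),               # both tails
         less      = pnorm(z),                         # predicted negative difference
         greater   = pnorm(z, lower.tail = FALSE))     # predicted positive difference
}
```

For example, `z_to_p(-1.88512)` gives the two-sided p-value for the first comparison, and `z_to_p(-1.88512, "less")` the one-sided p-value for a predicted negative difference.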
#Intra-WMC
ZREG(-.198, .034, .089, .085) #Gf vs Gc Moderation of WMC
## [1] -1.88512
ZREG(-.198, .102, .089, .088) #Gf vs Gv Moderation of WMC
## [1] -2.396934
ZREG(.034, .102, .085, .088) #Gc vs Gv Moderation of WMC
## [1] -0.5557923
#Intra-STM
ZREG(.065, -.133, .135, .120) #Gf vs Gc Moderation of STM
## [1] 1.0962
ZREG(.065, -.033, .135, .120) #Gf vs Gv Moderation of STM
## [1] 0.5425638
ZREG(-.133, -.033, .120, .136) #Gc vs Gv Moderation of STM
## [1] -0.5513514
#WMC vs STM
ZREG(-.198, .065, .089, .135) #Gf for WMC vs Gf for STM
## [1] -1.626496
ZREG(-.198, -.133, .089, .120) #Gf for WMC vs Gc for STM
## [1] -0.4350674
ZREG(-.198, -.033, .089, .136) #Gf for WMC vs Gv for STM
## [1] -1.015178
ZREG(.034, .065, .085, .135) #Gc for WMC vs Gf for STM
## [1] -0.1943201
ZREG(.034, -.133, .085, .120) #Gc for WMC vs Gc for STM
## [1] 1.135634
ZREG(.034, -.033, .085, .136) #Gc for WMC vs Gv for STM
## [1] 0.4177639
ZREG(.102, .065, .088, .135) #Gv for WMC vs Gf for STM
## [1] 0.2296012
ZREG(.102, -.133, .088, .120) #Gv for WMC vs Gc for STM
## [1] 1.57921
ZREG(.102, -.033, .088, .136) #Gv for WMC vs Gv for STM
## [1] 0.8333968
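To read the fifteen statistics side by side, one option is to collect the printed Z values into a named vector and convert them to two-sided p-values. This is only a sketch: the vector names are labels introduced here for illustration, and the values are copied from the output above rather than recomputed.

```r
# Z values copied from the ZREG() output above; names are illustrative labels
z <- c(
  WMC_Gf_vs_Gc   = -1.88512,   WMC_Gf_vs_Gv   = -2.396934,  WMC_Gc_vs_Gv   = -0.5557923,
  STM_Gf_vs_Gc   =  1.0962,    STM_Gf_vs_Gv   =  0.5425638, STM_Gc_vs_Gv   = -0.5513514,
  WMCGf_vs_STMGf = -1.626496,  WMCGf_vs_STMGc = -0.4350674, WMCGf_vs_STMGv = -1.015178,
  WMCGc_vs_STMGf = -0.1943201, WMCGc_vs_STMGc =  1.135634,  WMCGc_vs_STMGv =  0.4177639,
  WMCGv_vs_STMGf =  0.2296012, WMCGv_vs_STMGc =  1.57921,   WMCGv_vs_STMGv =  0.8333968
)

# Two-sided p-values from the standard normal distribution
p_two_sided <- 2 * pnorm(-abs(z))

# Comparisons that clear the conventional two-sided .05 threshold
sig_two_sided <- names(z)[p_two_sided < 0.05]

# Comparisons whose absolute Z exceeds the one-sided .05 critical value
sig_one_sided <- names(z)[abs(z) > qnorm(0.95)]
```

Printing `round(sort(p_two_sided), 3)` lists the comparisons from strongest to weakest.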
What differed theoretically? Gf was supposed to be a stronger moderator, and WMC was supposed to be more strongly, and negatively, moderated than STM. What happened in practice? On a two-sided test, Gf differed from the rest of the pack only once, and only from the moderation of WMC by Gv (Z = -2.40); its difference from the moderation of WMC by Gc (Z = -1.89) clears only the one-sided threshold. WMC and STM did not differ significantly at all, even using a one-sided test. There were a total of 15 comparisons, and one was significant at the two-sided level. Given that the likelihood of at least one false positive among 15 independent tests is \(1 - .95^{15} \approx 54\%\) under the null, this amounts to basically nothing with respect to process overlap theory.
The quality of the evidence in favor of process overlap theory provided by Kovacs, Molenaar & Conway (2019) was extremely poor, and that assumes it is not worse than shown here due to the correlatedness of the comparisons or low measurement quality. Since Gf, Gc, and Gv were not residualized for one another, nor WMC for its STM variance, we can assume all outcomes are tautologically correlated given the stylized fact of the positive manifold. Therefore, the evidence must be weaker than I have shown here. This holds even if one adds the qualification that the theoretical predictions concern the constructs' degrees of correlatedness, as that just means theory-testing issues will be exacerbated by finite-sample problems. See Stevens, Masud & Suyundikov (2017), Vickerstaff, Omar & Ambler (2019), and Vul et al. (2009) for more details on correlated comparison issues.
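The familywise arithmetic can be sketched as follows. The calculation treats the 15 tests as independent, which is an assumption made for illustration, and the Bonferroni threshold is a technique brought in here, not one used in the original analysis.

```r
alpha <- 0.05
m     <- 15  # number of pairwise comparisons

# Chance of at least one false positive across m independent tests under the null
fwer_independent <- 1 - (1 - alpha)^m  # roughly 0.54

# Bonferroni per-test threshold (Holm's first step uses the same cutoff)
alpha_bonferroni <- alpha / m

# Two-sided p-value for the largest absolute Z reported above (Gf vs Gv for WMC)
p_smallest <- 2 * pnorm(-abs(-2.396934))

# Does the strongest comparison survive the correction?
survives_correction <- p_smallest < alpha_bonferroni
```

Under these assumptions `survives_correction` is `FALSE`: the smallest p-value is well above `alpha / 15`, so no comparison would remain significant after a Bonferroni or Holm adjustment.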
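One source of correlation among the comparisons can be illustrated with a small simulation: contrasts that share a coefficient are correlated even when the coefficient estimates themselves are independent. The sketch below uses the WMC standard errors from the ZREG() calls above. The null scenario (all true coefficients equal) and the independence of the three estimates are assumptions made purely for illustration, and given the positive manifold the real estimates are unlikely to be independent.

```r
set.seed(1)
n_sim <- 100000

# Standard errors of the Gf, Gc, and Gv moderation coefficients for WMC
se_gf <- 0.089
se_gc <- 0.085
se_gv <- 0.088

# Assumed null scenario: equal true coefficients, independently drawn estimates
b_gf <- rnorm(n_sim, mean = 0, sd = se_gf)
b_gc <- rnorm(n_sim, mean = 0, sd = se_gc)
b_gv <- rnorm(n_sim, mean = 0, sd = se_gv)

# Two contrasts that both involve the Gf estimate
z_gf_gc <- (b_gf - b_gc) / sqrt(se_gf^2 + se_gc^2)
z_gf_gv <- (b_gf - b_gv) / sqrt(se_gf^2 + se_gv^2)

# Simulated correlation between the two Z statistics
cor_sim <- cor(z_gf_gc, z_gf_gv)

# Analytic value: shared variance over the product of the contrast standard deviations
cor_theory <- se_gf^2 / sqrt((se_gf^2 + se_gc^2) * (se_gf^2 + se_gv^2))
```

Both quantities come out near 0.5, so even in this idealized setting the 15 comparisons do not amount to 15 independent pieces of evidence.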
Kovacs, K., Molenaar, D., & Conway, A. R. A. (2019). The domain specificity of working memory is a matter of ability. Journal of Memory and Language, 109. https://doi.org/10.1016/j.jml.2019.104048
Gelman, A., & Stern, H. (2006). The difference between “significant” and “not significant” is not itself statistically significant. The American Statistician, 60(4), 328–331. https://doi.org/10.1198/000313006X152649
Panizzon, M. S., Vuoksimaa, E., Spoon, K. M., Jacobson, K. C., Lyons, M. J., Franz, C. E., Xian, H., Vasilopoulos, T., & Kremen, W. S. (2014). Genetic and environmental influences of general cognitive ability: Is g a valid latent construct? Intelligence, 43, 65–76. https://doi.org/10.1016/j.intell.2014.01.008
Stevens, J. R., Masud, A. A., & Suyundikov, A. (2017). A comparison of multiple testing adjustment methods with block-correlation positively-dependent tests. PLOS ONE, 12(4), e0176124. https://doi.org/10.1371/journal.pone.0176124
Vickerstaff, V., Omar, R. Z., & Ambler, G. (2019). Methods to adjust for multiple comparisons in the analysis and sample size calculation of randomised controlled trials with multiple primary outcomes. BMC Medical Research Methodology, 19(1), 129. https://doi.org/10.1186/s12874-019-0754-4
Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4(3), 274–290. https://doi.org/10.1111/j.1745-6924.2009.01125.x
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.28 R6_2.5.1 jsonlite_1.7.2 magrittr_2.0.1
## [5] evaluate_0.14 rlang_0.4.12 stringi_1.7.5 jquerylib_0.1.4
## [9] bslib_0.3.1 rmarkdown_2.11 tools_4.1.2 stringr_1.4.0
## [13] xfun_0.27 yaml_2.2.1 fastmap_1.1.0 compiler_4.1.2
## [17] htmltools_0.5.2 knitr_1.36 sass_0.4.0