Setup

Packages

library(pacman)
p_load(lavaan, psych, semPlot, tidySEM)

Data

#Correlation matrices
#SST has been reflected to achieve all-positive loadings

lowerFRE = '
1                                           
0.3 1                                       
0.44    0.33    1                                   
0.21    0.35    0.27    1                               
0.18    0.23    0.19    0.18    1                           
-0.06   -0.06   -0.05   0   -0.06   1                       
0.05    0.08    0.11    0.09    0.05    -0.13   1                   
0.09    0.09    0.15    0.1 0.12    0.01    0.07    1               
-0.01   -0.03   -0.02   0   0   0   0   0.02    1           
0.05    0.09    0.06    0.1 0.06    -0.01   0.03    0.03    -0.05   1       
0.28    0.4 0.31    0.3 0.23    -0.04   0.1 0.08    -0.02   0.06    1   
0.26    0.42    0.31    0.26    0.24    -0.04   0.06    0.05    -0.07   0.07    0.41    1'

lowerFRECOG = '
1                       
0.3 1                   
0.44    0.33    1               
0.18    0.23    0.19    1           
0.09    0.09    0.15    0.12    1       
0.28    0.4 0.31    0.23    0.08    1   
0.26    0.42    0.31    0.24    0.05    0.41    1'

#Variable names

Names = list("Flanker", "List", "Card", "Picture", "Nback", "StroopRT", "StroopACC", "SST", "PI ", "RI", "MatrixR", "PicVocab")
NamesCog = list("Flanker", "List", "Card", "Nback", "SST", "MatrixR", "PicVocab")

#Convert to variance-covariance matrices

FRE.cor = getCov(lowerFRE, names = Names)
FRECOG.cor = getCov(lowerFRECOG, names = NamesCog)
FRESDs <- c(9.14, 12.09, 9.51, 12.07, 0.12, 63.42, 0.05, 65.99, 1.99, 1.99, 3.84, 8.12)
FRE.cov = lavaan::cor2cov(R = FRE.cor, sds = FRESDs)

#Sample Size - Truncated Average Used for Expedience, PI/RI/StroopRT/StroopACC Excluded

AN <- 10662

#Fit measures

FITM <- c("chisq", "df", "nPar", "cfi", "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "aic", "bic")

LATS <- list(
  CEF = c("Card", "Flanker", "Nback", "List", "SST"),
  US = c("Nback", "List"),
  INT = c("MatrixR", "PicVocab"))

Rationale

Freis et al. (2020) appear to have gotten caught in a loosely-theoretical vein of intelligence research. The research area I’m referring to frequently provides atheoretical justifications for factor models and construct operationalizations and takes an inconsistently realist stance towards factor models. Their justifications for their factor models do not have a solid theoretical basis since they are ad hoc and inexplicable in the light of even the evidence they often cite. For this reason, I believe they are researchers who may not be researching for the purposes of programmatic advancement but instead to avoid not publishing and thus perishing; the mediator in this explanation is that by having these motives, they remain methodologically, theoretically, philosophically, and empirically ignorant thanks to their regrettable orientation towards research. The low psychometric quality of their work is strong evidence that this - or something else which explains these empirical facts - operates to produce downright incoherent research. This paragraph may sound crude and insulting so it deserves elaboration to avoid giving that impression.

This vein of research is intimately tied to an operationalist stance that comes with the the tacit acknowledgement of well-defined constructs and the concept of construct validation but which still identifies constructs by their indicators in a haphazard style. For example, researchers for whom this description applies pay lip-service to the concept of general intelligence and then proceed to define it as a narrow group factor based on identification of g with particular tests, failing to acknowledge that if what they were looking at was general it would load on all tests, and the real construct they have foregone modeling is involved in all of their factors and may explain practically all of their common variance and their covariances (this is stunningly common). The implicit misdefinition leads to an inability to clearly interpret the results of the model. A typical finding may be worded “working memory mediated the relationship between g and reading ability” (this is a hypothetical) but despite the model apparently showing this result, the relationship could be entirely attributed to g, properly defined. Therefore, the result says nothing about mediation of the conceptually identified factors. This is of course a wider problem where methods do not correspond to results but it is notable here because it is a major barrier to research, credibility, and it threatens the enterprise of construct validation that has been so vital to the superior replicability, validity, and explanatory power of intelligence compared to other constructs under psychological study.

As such, one major problem of the paper is the definition of intelligence: it is unintelligible; it is neither consistent with how intelligence (i.e., general intelligence or g) is usually defined nor with any major theory of intelligence of which I am aware including Cattell-Horn-Carroll, g-VPR, or mutualism. The reason for this is that they have implicitly defined intelligence as the shared variance of two visuospatial tasks. The implied prediction here is that a visuospatial group factor - Gv - will showcase an extreme (perhaps approaching unity) relationship with a g factor defined with a greater diversity of tasks. Astute researchers in this field probably already know this is unlikely to happen. In a similar instance, Johnson et al. (2008) found that the Cattell Culture Fair Test (matrix reasoning portion) “g factor”, while highly correlated with the g factor from larger batteries, did not produce exactly the same g, as is typical of comparisons between large, diverse batteries, regardless of construction. One could argue that they are properly defining intelligence as far as they can with their battery if two conditions are met: First, that their two tests are, in fact, highly g loaded normally, and second, that the other tasks are not. Since we see a correlation of r = 0.64 between their “Intelligence” measures and “Common EF”, and since, as one can use the data here to find, the relationship between these factors in a model without the “Updating-Specific” factor is r \(\approx\) 0.89 (this would presumably be greater with more diverse content; the reason for the decrease in relationship between CEF and intelligence with US is a tautology from modeling a factor covariance for an orthogonal factor defined by cross-loadings), this is virtually guaranteed to be false. Why it should be otherwise is beyond me. A more theoretically consistent definition of intelligence can be adapted from Gratch’s (1973, cited in and slightly modified by Guttman & Levy, 1991) definition of the domain of intelligence tests:

An item belongs to the universe of intelligence test items if and only if its domain asks about an objective rule, and its range is ordered from \(\begin{Bmatrix} \text{very right}\\ \text{to}\\ \text{very wrong} \end{Bmatrix}\) with respect to that rule.

Another way of stating this is the equivalent statement that

[General] Intelligence is the common cause of performance among all cognitive performance indicators.

or to take an anti-realist stance

[General] Intelligence is the variance shared by all cognitive performance indicators.

In the first definition, “cognitive performance” is regarded as a real thing whereas in the second, “cognitive” and “performance” are separate such that something is “cognitive” to the extent that it is psychological and “performance” is function with respect to obtaining the correct answers on an assessment. These all disagree with the implicit definition taken by Freis et al. (2020) because their definition is parochial and therefore cannot correspond to (general) intelligence or a definition of intelligence that is anything but domain-specific. I can conceptually evaluate whether the g factor should be expanded based on their descriptions of the subtests involved in the study. The five subtests not included under the aegis of intelligence were card and list sorting, flanker, n-back, and stop signal reaction time. They call these “cognitive tasks” (which signals confusions about their definitions) and each of them contains a performance dimension and must, thus, be included under intelligence lest they risk a phenotypic factor model that does not produce inferences corresponding to what they want them to (i.e., conclusions about the relationship between executive function and intelligence, as opposed to a parochial factor sharing the name “intelligence” for no good reason).

To my knowledge, when properly modeled, the g factor from psychometric tests and the g factor from executive function tests are identical. This has been tested one time of which I am aware. Floyd et al. (2013) tested this with the Delis-Kaplan Executive Function System and the Woodcock-Johnson III, finding that the g factors from these tests correlated at 0.99 to 1.00. The eleven subtests of the DKEFS share a good deal of content - inhibition, fluency, switching, and sorting - with the ABCD’s battery and the higher-order g factor in the DKEFS explained more of the indicator variance than did the group factors so there is a real cause to do some proper psychometric modeling here. I can strengthen this point independently. To do this, I will use the data provided in Engelhardt’s (2018) excellent dissertation. I show below that this data replicates the result from Floyd et al. (2013), suggesting again that tests intended to assess executive function really tap into the same g as standard psychometric instruments, and that these tests are principally unidimensional representations of g with minimally interpretable group factors when the variance from g is removed. The data are provided in full below so anyone may interrogate the result as they wish. Based on the result, it is difficult to argue for how Freis et al. modeled “intelligence”. At the very least, they should change this factor’s name to “visuospatial ability and g” since it currently (presumably) models major contributions from both.

It may appear that Freis et al. (2020) have preempted this concern, writing

EFs and IQ are sometimes conflated as tapping the same general cognitive function…. However, our findings demonstrate that these constructs are phenotypically separate in middle childhood.

This is strange because their citations do not properly investigate possible factorial identity and they do show evidence for g loading to EF measures (a correlation between factors >0 but <1 means they share variance, not that they are tapping wholly different constructs that would be separate with a better model/data!). Moreover, because they are not actually modeling g and g-EF, they can’t really say there’s a phenotypic separateness to these constructs, only to their flawed operationalizations. In point of fact, there is likely a g factor involved in all of the subtests as the unidimensionality of the battery indicates. A simple parallel analysis would have provided some evidence of this. Sometimes people confuse indicators and constructs in the sorts of analyses they cite. This error was pointed out in Spearman (1904), oddly enough. The error amounts to a confusion regarding that for any given indicator, the relationship with g may be imperfect, but for a set of indicators, their g factor will likely produce a g factor identical to that from another battery; that is, g is the common variance, not a specific test result (which contains error, specificity, and other sources of systematic variance). For a statement to the effect of what I’ve said here, take the one made by Major, Johnson & Bouchard (2011; see also Frazier & Youngstrom, 2007 on overfactoring; Reeve & Blacksmith, 2009 on badly identifying g):

[T]he general factors from small non-hierarchical models were found to not be reliable enough for the purposes of theoretical research.

The interpretation of differences in variance components across age is also confused. The fact that EF tests are more heritable in childhood than are typical psychometric or achievement tests (and, perhaps, some of their resulting factors, definition in mind) can be attributed to the same explanation for the differences in the heritability of achievement measures across countries, i.e., access to requisite information, hence the inflation of C and its decline with age. EF tests require relatively less learned content (just review the tasks) and the content administered is unlikely to be content which some people have exposure to and others do not in early childhood, and therefore would probably yield less bias (because a lack of requisite information on the part of one or another group will yield bias) than an achievement test administered at the same age.

Regarding the genetic and environmental models, a full common pathway model needs to be assessed, both for the bifactor and the higher-order configuration, and these need compared to an independent pathway model (and a unified factor model). The reason for the shared variance among the modeled factors could be that they are not modeled properly. As has been found in many different studies, the entirety of the covariance between group factors could be mediated by g; this model needs tested, but to do that, g needs to be properly identified. It may be because there are so few indicators, this isn’t really possible. If this is the case, then the conclusions need to be moderated in line with the weakness of the results, as is proper. Properly conducting these models may not be feasible with how few indicators there are unfortunately, so conclusions should be moderated if that is the case. The matrices necessary for replicating those models were not provided but should be.

All updates to this page as a result of correspondence with authors or peers or from amendments resulting from finding additional datasets will be added to the bottom of the page and will not be reflected in this section which will be preserved as it is for posterity. I thank Freis et al. for providing enough of their data to allow checking and extending some of their results. Data from this study will be used in my Jensen effect meta-analysis. This reanalysis is of the preprint as it was uploaded on June 30, 2020.

Analysis

Engelhardt (2018) - “Just One g”

Just one g results tend to use higher-order factor models because psychometric sampling variance tends to have a greater effect at lower levels. This is not always the case, like in the GATB, but it usually is. Moreover, higher-order models make sense theoretically where bifactor ones do not (Hood, 2010). If things like common pathway modeling consistently support models like the bifactor rather than the higher-order model researchers should move towards them though. Panizzon found support for a higher-order model and Wainwright found support for the bifactor configuration. Many researchers who have tested common pathway models have not also tested the alternative (independent or bifactor/orthogonal), so, sadly, some data that may be hard to access have had potentially improper results recorded for them. Regardless, there is still concern and argument over which models more accurately reflect the structure of abilities.

This model included executive function measures, one of which was used by Freis et al. - stop signal RT - and another which was conceptually quite similar - 2-back - among others which Engelhardt placed in group factors named “inhibition”, “switching”, “working memory”, and “updating”. These are conceptually similar in some cases to the content covered by Freis et al.’s sample, so this demonstration should shift their prior towards modeling a common g factor (or better for fit/theory reasons, a higher-order or bifactor model) instead of whatever conceptually indeterminate “intelligence” factor they chose to model. The intelligence tests used here are just called that because they’re commonly found on intelligence tests. Unlike Engelhardt, I included speed measures under this label for a total of seven g indicators: pattern and letter comparison, symbol search, block design, matrix reasoning, vocabulary, and similarities.

lowerENG <- '
1                                                                       
0.09    1                                                                   
0.04    0.13    1                                                               
0.23    0.14    0.09    1                                                           
0.27    0.06    0.03    0.21    1                                                       
0.27    0.14    0.06    0.42    0.24    1                                                   
0.18    0.12    0.11    0.38    0.16    0.27    1                                               
0.26    0.16    0.06    0.4 0.22    0.37    0.32    1                                           
0.29    0.2 0.05    0.45    0.27    0.41    0.42    0.53    1                                       
0.25    0.15    0.09    0.45    0.23    0.39    0.44    0.46    0.6 1                                   
0.22    0.12    0.09    0.41    0.17    0.29    0.34    0.38    0.47    0.48    1                               
0.23    0.15    0.15    0.38    0.16    0.35    0.33    0.39    0.47    0.55    0.42    1                           
0.23    0.18    0.12    0.46    0.16    0.36    0.35    0.51    0.47    0.47    0.42    0.36    1                       
0.2 0.14    0.11    0.43    0.14    0.33    0.36    0.38    0.44    0.49    0.45    0.42    0.57    1                   
0.27    0.19    0.1 0.44    0.22    0.37    0.34    0.45    0.56    0.56    0.45    0.5 0.5 0.53    1               
0.25    0.17    0.1 0.45    0.21    0.37    0.32    0.43    0.51    0.5 0.41    0.43    0.5 0.45    0.74    1           
0.27    0.19    0.08    0.41    0.18    0.38    0.27    0.42    0.44    0.4 0.32    0.35    0.41    0.37    0.48    0.44    1       
0.28    0.18    0.03    0.42    0.23    0.41    0.33    0.46    0.44    0.38    0.28    0.32    0.4 0.3 0.45    0.44    0.69    1   
0.33    0.21    0.15    0.45    0.25    0.45    0.33    0.5 0.5 0.47    0.41    0.37    0.52    0.39    0.48    0.48    0.6 0.58    1'

lowerENGEF <- '
1                                           
0.09    1                                       
0.04    0.13    1                                   
0.23    0.14    0.09    1                               
0.27    0.06    0.03    0.21    1                           
0.27    0.14    0.06    0.42    0.24    1                       
0.18    0.12    0.11    0.38    0.16    0.27    1                   
0.26    0.16    0.06    0.4 0.22    0.37    0.32    1               
0.29    0.2 0.05    0.45    0.27    0.41    0.42    0.53    1           
0.25    0.15    0.09    0.45    0.23    0.39    0.44    0.46    0.6 1       
0.22    0.12    0.09    0.41    0.17    0.29    0.34    0.38    0.47    0.48    1   
0.23    0.15    0.15    0.38    0.16    0.35    0.33    0.39    0.47    0.55    0.42    1'

lowerENGWI <- '
1                       
0.57    1                   
0.5 0.53    1               
0.5 0.45    0.74    1           
0.41    0.37    0.48    0.44    1       
0.4 0.3 0.45    0.44    0.69    1   
0.52    0.39    0.48    0.48    0.6 0.58    1'

nENG <- 800

ENG.cor = getCov(lowerENG, names = c("STR", "MIC", "STO", "TRM", "PMI", "LOGL", "DIGS", "SYMM", "LISR", "RUNM", "KETR", "TBAC", "BD", "MATR", "VOCA", "SIMI", "PATC", "LETC", "SYMS"))

ENGEF.cor = getCov(lowerENGEF, names = c("STR", "MIC", "STO", "TRM", "PMI", "LOGL", "DIGS", "SYMM", "LISR", "RUNM", "KETR", "TBAC"))

ENGWI.cor = getCov(lowerENGWI, names = c("BD", "MATR", "VOCA", "SIMI", "PATC", "LETC", "SYMS"))

fa.parallel(ENG.cor, n.obs = nENG)

## Parallel analysis suggests that the number of factors =  4  and the number of components =  1

fa.parallel(ENGEF.cor, n.obs = nENG)

## Parallel analysis suggests that the number of factors =  2  and the number of components =  1

fa.parallel(ENGWI.cor, n.obs = nENG)

## Parallel analysis suggests that the number of factors =  3  and the number of components =  1

ENGEFA <- fa(ENGEF.cor, n.obs = nENG, nfactors = 2)

## Loading required namespace: GPArotation

print(ENGEFA$loadings, cutoff = 0.3)

## 
## Loadings:
##      MR1    MR2   
## STR          0.450
## MIC               
## STO               
## TRM   0.553       
## PMI          0.486
## LOGL  0.361       
## DIGS  0.566       
## SYMM  0.523       
## LISR  0.679       
## RUNM  0.811       
## KETR  0.645       
## TBAC  0.706       
## 
##                  MR1   MR2
## SS loadings    3.160 0.607
## Proportion Var 0.263 0.051
## Cumulative Var 0.263 0.314

Two factors for executive function seems to be the best two-stratum higher-order model. Going higher, the correlation between g factors is likely (but not guaranteed) to increase. To identify this two-factor model, we could constrain both factor group factor loadings on g and release the constraint on the latent variance or we could just constrain one loading and the latent variance; either way, we must make two constraints. Because we are not worried about the values of the g loadings, it might be expedient to just constrain one arbitrarily, but another option is to remove the latent variance constraint and set them both to 1 in order to get a more accurate estimate of the g loadings. I tried all of the permutations of these options and they didn’t affect initial fit so I arbitrarily identified the model by constraining the first group factor’s higher-order loading because it was the first. Fit was negligibly improved over the less parsimonious unidentified model. This does not affect the cross-battery model (except the one in which the g factors are unrelated) where this is unnecessary for identification.

ENGEF.model <- '
F1 =~ TRM + LOGL + DIGS + SYMM + LISR + RUNM + KETR + TBAC
F2 =~ STR + PMI

gEF =~ 1*F1 + F2'

ENGEF.fit <- cfa(ENGEF.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T)
fitMeasures(ENGEF.fit, FITM)

##          chisq             df           npar            cfi          rmsea 
##         77.322         34.000         21.000          0.981          0.040 
## rmsea.ci.lower rmsea.ci.upper            aic            bic 
##          0.028          0.052      20514.996      20613.373

Three group factors below general intelligence seem optimal here.

ENGEWI <- fa(ENGWI.cor, n.obs = nENG, nfactors = 3)
print(ENGEWI$loadings, cutoff = 0.2)

## 
## Loadings:
##      MR2    MR1    MR3   
## BD                  0.940
## MATR         0.284  0.443
## VOCA         1.019       
## SIMI         0.597       
## PATC  0.836              
## LETC  0.853              
## SYMS  0.574         0.215
## 
##                  MR2   MR1   MR3
## SS loadings    1.773 1.478 1.146
## Proportion Var 0.253 0.211 0.164
## Cumulative Var 0.253 0.464 0.628

Borderline cross-loadings probably signal that the observed ultra-Heywood case will translate to a negative variance in a confirmatory model so we’ll adjust the convergence criteria slightly. This will, at most, negligibly affect model fit and do so by worsening it. This is true for all models tested. For model comparisons, if one in a set of models had to have its criteria changed, all of them were changed to avoid any pretense that the comparison was unfair to one or another model.

ENGWI.model <- '
F1 =~ MATR + VOCA + SIMI
F2 =~ PATC + LETC + SYMS
F3 =~ BD + MATR

gNO =~ F1 + F2 + F3'

ENGWI.fit <- cfa(ENGWI.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T,  control=list(rel.tol=1e-4))
fitMeasures(ENGWI.fit, FITM)

##          chisq             df           npar            cfi          rmsea 
##         73.637         10.000         18.000          0.976          0.089 
## rmsea.ci.lower rmsea.ci.upper            aic            bic 
##          0.071          0.109      13297.081      13381.404

The first step in fitting the combined model for this sort of modeling is to fit a model in which the g factors are constrained to no relationship. The second model allows them to be related. This is followed by a model in which group factors are allowed to covary and then by a model in which specific indicators are allowed to covary. Finally, a model in which the g factors are constrained to identity is fitted. These cross-battery covariances are assessed by way of the modification indices in order to allow only strongly covarying group factors and indicators to be connected; this is done in order to avoid inflating the g factor relationship and because it does make sense for indicators with similar content to be related specifically. The indicator residual covariances follow the group factor residual covariances to avoid obscuring them and overfitting when their relationship may be mediated by their group factor. The problem here is clearly overfitting, so being judicious is advisable. Perhaps making the threshold modification index quite high is a good idea, but ultimately, there are only arbitrary ways to do this. Often enough, failing to do this leads to g factor correlations greater than one, clearly justifying it in such cases.

ENGFI.model <- '
F1 =~ TRM + LOGL + DIGS + SYMM + LISR + RUNM + KETR + TBAC
F2 =~ STR + PMI

gEF =~ F1 + F2

F3 =~ MATR + VOCA + SIMI
F4 =~ PATC + LETC + SYMS
F5 =~ BD + MATR

gNO =~ F3 + F4 + F5

gEF ~~ 1*gNO

SYMM ~~ BD
KETR ~~ MATR + LETC
TBAC ~ BD

F2 ~~ F4'

ENGNO.fit <- cfa(ENGNO.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T, orthogonal = T, check.gradient = F, control=list(rel.tol=1e-4))
ENGFR.fit <- cfa(ENGFR.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T, orthogonal = T, check.gradient = F, control=list(rel.tol=1e-4))
ENGGC.fit <- cfa(ENGGC.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T, orthogonal = T, check.gradient = F, control=list(rel.tol=1e-4))
ENGRC.fit <- cfa(ENGRC.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T, orthogonal = T, check.gradient = F, control=list(rel.tol=1e-4))
ENGFI.fit <- cfa(ENGFI.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T, orthogonal = T, check.gradient = F, control=list(rel.tol=1e-4))

round(cbind(NONE = fitMeasures(ENGNO.fit, FITM),
            FREE = fitMeasures(ENGFR.fit, FITM),
            GROUPS = fitMeasures(ENGGC.fit, FITM),
            INDICATORS = fitMeasures(ENGRC.fit, FITM),
            IDENTICAL = fitMeasures(ENGFI.fit, FITM)),3)

##                     NONE      FREE    GROUPS INDICATORS IDENTICAL
## chisq           1162.744   359.362   350.096    300.109   303.317
## df               114.000   112.000   111.000    107.000   108.000
## npar              39.000    41.000    42.000     46.000    45.000
## cfi                0.821     0.958     0.959      0.967     0.967
## rmsea              0.107     0.053     0.052      0.047     0.048
## rmsea.ci.lower     0.102     0.047     0.046      0.041     0.041
## rmsea.ci.upper     0.113     0.059     0.058      0.054     0.054
## aic            33812.894 33013.512 33006.246  32964.258 32965.467
## bic            33995.594 33205.581 33203.000  33179.751 33176.274

It seems that the model with identical g factors is acceptable. I will plot the prior model to see what the g factor correlation is without the unity constraint. In the model where the g factors are freed and no other modifications are made, the r is 0.987 so my modifications improved fit a bit and negligibly reduced the relationship, perhaps spuriously because it was such a small effect. Either value can be validly interpreted with minor qualification. On the other hand, the idea that the g factors are not related is not a viable suggestion.

ENGLAT <- list(
  F1 = c("TRM", "LOGL", "DIGS", "SYMM", "LISR", "RUNM", "KETR", "TBAC"),
  F2 = c("STR", "PMI"),
  F3 = c("MATR", "VOCA", "SIMI"),
  F4 = c("PATC", "LETC", "SYMS"),
  F5 = c("BD", "MATR"))

semPaths(ENGRC.fit, "model", "std", title = F, residuals = F, groups = "ENGLAT", pastel = T, mar = c(2, 1, 3, 1), bifactor = c("gEF", "gNO"), layout = "tree2", exoCov = T)

## Warning in if (w <= 0) w <- 1e-07: the condition has length > 1 and only the
## first element will be used

And in order to get acquainted with the new tidySEM package

ENGLAY <- get_layout("TRM", "LOGL", "DIGS", "SYMM", "LISR", "RUNM", "KETR", "TBAC", "STR", "PMI",
                     "", "", "", "F1", "", "", "F2", "", "", "",
                     "", "", "", "", "", "gEF", "", "", "", "",
                     "", "", "", "", "gNO", "", "", "", "", "",
                     "", "", "F3", "", "F4", "", "F5", "", "", "",
                     "", "MATR", "VOCA", "SIMI", "PATC", "LETC", "SYMS", "BD", "", "", rows = 6)

graph_sem(model = ENGRC.fit, layout = ENGLAY)

I assume this package will be updated to have some standard features that make its plots look better in the future, like standardized coefficients, removing factor and indicator covariances in toto, and centering factors. I could work more on making this one pretty but I don’t care enough to do that and I make my plots for publication in graphviz currently.

The model from Engelhardt et al. (2016) can also be used. This has the advantage that it’s theoretically motivated. This can be done in three ways:

Like they did it, in which g is a narrow first-order factor.
With a g factor over group factors for the block design, matrix reasoning, vocabulary, and similarities subtests, one for each pair.
With a broader higher-order g.

The last model is the most theoretically preferable as g is a higher-order concept and we do not want to run into psychometric sampling error, but we can also check the result for the other options, the first two, both with and without speed. I do not model age like the original researchers did and of course I am using a matrix input so my results will differ slightly from the raw input.

fa.parallel(ENGWI.cor, n.obs = nENG)

## Parallel analysis suggests that the number of factors =  3  and the number of components =  1

fa(ENGWI.cor, n.obs = nENG, nfactors = 2)

## Factor Analysis using method =  minres
## Call: fa(r = ENGWI.cor, nfactors = 2, n.obs = nENG)
## Standardized loadings (pattern matrix) based upon correlation matrix
##        MR1   MR2   h2   u2 com
## BD    0.63  0.09 0.49 0.51 1.0
## MATR  0.70 -0.06 0.44 0.56 1.0
## VOCA  0.83  0.00 0.69 0.31 1.0
## SIMI  0.77  0.02 0.61 0.39 1.0
## PATC  0.01  0.82 0.69 0.31 1.0
## LETC -0.05  0.86 0.68 0.32 1.0
## SYMS  0.25  0.55 0.55 0.45 1.4
## 
##                        MR1  MR2
## SS loadings           2.33 1.82
## Proportion Var        0.33 0.26
## Cumulative Var        0.33 0.59
## Proportion Explained  0.56 0.44
## Cumulative Proportion 0.56 1.00
## 
##  With factor correlations of 
##      MR1  MR2
## MR1 1.00 0.67
## MR2 0.67 1.00
## 
## Mean item complexity =  1.1
## Test of the hypothesis that 2 factors are sufficient.
## 
## The degrees of freedom for the null model are  21  and the objective function was  3.37 with Chi Square of  2683.61
## The degrees of freedom for the model are 8  and the objective function was  0.21 
## 
## The root mean square of the residuals (RMSR) is  0.04 
## The df corrected root mean square of the residuals is  0.07 
## 
## The harmonic number of observations is  800 with the empirical chi square  65.94  with prob <  3.1e-11 
## The total number of observations was  800  with Likelihood Chi Square =  163.65  with prob <  2.8e-31 
## 
## Tucker Lewis Index of factoring reliability =  0.846
## RMSEA index =  0.156  and the 90 % confidence intervals are  0.136 0.177
## BIC =  110.18
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy             
##                                                    MR1  MR2
## Correlation of (regression) scores with factors   0.93 0.92
## Multiple R square of scores with factors          0.86 0.86
## Minimum correlation of possible factor scores     0.72 0.71

HARMOD1.model <- '
F1 =~ STR + MIC + STO
F2 =~ TRM + PMI + LOGL
F3 =~ DIGS + SYMM + LISR
F4 =~ RUNM + KETR + TBAC
EF =~ F1 + F2 + F3 + F4

F5 =~ BD + MATR 
F6 =~ VOCA + SIMI
SP =~ PATC + LETC + SYMS
g =~ F5 + F6

EF ~~ g + SP
g ~~ SP'

HARMOD2.model <- '
F1 =~ STR + MIC + STO
F2 =~ TRM + PMI + LOGL
F3 =~ DIGS + SYMM + LISR
F4 =~ RUNM + KETR + TBAC
EF =~ F1 + F2 + F3 + F4

F5 =~ BD + MATR 
F6 =~ VOCA + SIMI
g =~ F5 + F6

EF ~~ g'

HARMOD3.model <- '
F1 =~ STR + MIC + STO
F2 =~ TRM + PMI + LOGL
F3 =~ DIGS + SYMM + LISR
F4 =~ RUNM + KETR + TBAC
EF =~ F1 + F2 + F3 + F4

F5 =~ BD + MATR 
F6 =~ VOCA + SIMI
F7 =~ PATC + LETC + SYMS
g =~ F5 + F6 + F7

EF ~~ g'

HARMOD4.model <- '
F1 =~ STR + MIC + STO
F2 =~ TRM + PMI + LOGL
F3 =~ DIGS + SYMM + LISR
F4 =~ RUNM + KETR + TBAC
EF =~ F1 + F2 + F3 + F4

F5 =~ BD + MATR + VOCA + SIMI
F6 =~ PATC + LETC + SYMS
g =~ F5 + F6

EF ~~ g'

HARMOD5.model <- '
F1 =~ STR + MIC + STO
F2 =~ TRM + PMI + LOGL
F3 =~ DIGS + SYMM + LISR
F4 =~ RUNM + KETR + TBAC
EF =~ F1 + F2 + F3 + F4

SP =~ PATC + LETC + SYMS
g =~ BD + MATR + VOCA + SIMI

EF ~~ g + SP
g ~~ SP'

HARMOD6.model <- '
F1 =~ STR + MIC + STO
F2 =~ TRM + PMI + LOGL
F3 =~ DIGS + SYMM + LISR
F4 =~ RUNM + KETR + TBAC
EF =~ F1 + F2 + F3 + F4

g =~ BD + MATR + VOCA + SIMI

EF ~~ g'


HARMOD1.fit <- cfa(HARMOD1.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T)
HARMOD2.fit <- cfa(HARMOD2.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T)
HARMOD3.fit <- cfa(HARMOD3.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T)
HARMOD4.fit <- cfa(HARMOD4.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T)
HARMOD5.fit <- cfa(HARMOD5.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T)
HARMOD6.fit <- cfa(HARMOD6.model, sample.cov = ENG.cor, sample.nobs = nENG, std.lv = T)

round(cbind(MODEL1 = fitMeasures(HARMOD1.fit, FITM),
            MODEL2 = fitMeasures(HARMOD2.fit, FITM),
            MODEL3 = fitMeasures(HARMOD3.fit, FITM),
            MODEL4 = fitMeasures(HARMOD4.fit, FITM),
            MODEL5 = fitMeasures(HARMOD5.fit, FITM),
            MODEL6 = fitMeasures(HARMOD6.fit, FITM)),3)

##                   MODEL1    MODEL2    MODEL3    MODEL4    MODEL5    MODEL6
## chisq            398.925   202.629   402.092   530.061   530.061   335.270
## df               143.000    97.000   144.000   145.000   145.000    99.000
## npar              47.000    39.000    46.000    45.000    45.000    37.000
## cfi                0.957     0.976     0.957     0.935     0.935     0.946
## rmsea              0.047     0.037     0.047     0.058     0.058     0.055
## rmsea.ci.lower     0.042     0.030     0.042     0.052     0.052     0.048
## rmsea.ci.upper     0.053     0.044     0.053     0.063     0.063     0.061
## aic            37498.009 32098.879 37499.176 37625.145 37625.145 32227.520
## bic            37718.185 32281.579 37714.668 37835.952 37835.952 32400.850

g factor correlations were r = 0.957, 0.956, 0.976, 0.986, 0.890, and 0.889. Because it would lead to NPD, models 1 and 5 could not have their g-EF relationship constrained to identity, but models 2, 3, and 4 reasonably could, but model 6 could not. Models 1, 5, and 6 were not theoretically acceptable due to the separate speed factor or the narrow, unified concept of g. This leaves 2, 3, and 4 and 2 should be dropped in favor of 3 and 4 because those have broader content to their g factors with the addition of the speed subtests. Model 3 fits much better and is more parsimonious by one df. Because EFA could indicate model 4, I plotted it too even though it made practically no difference.

The correlations between the g factors in models 3 and 4 increase if F3 is removed, are unaffected by removing F1, and are increased by removing both. Improperly defining g as a first-order factor and removing the speed subtests and removing both F1 and F3, the correlation drops to 0.91 and if g remains higher-order with the same indicators (i.e., keeps F5 and F6), then the correlation changes to 0.98. Try it out. The idea that this is confounded by speed or working memory is really strange and goes against the logic of factor analysis both conceptually and mathematically. If you set both g and EF to be first-order factors and remove speed, the correlation drops to 0.88 and if you add speed, it goes up to 0.92, but if you remove BD and VOCA (i.e., one-half of the same content subtests; do the same with MATR and SIMI, MATR and VOCA, whatever), it also drops to 0.88 (0.89, 0.88, all similar) from the model with MATR, SIMI, PATC, LETC, and SYMS, indicating it is, again, not a content confounding issue. That people can think this is so strange when they can very easily just test their suppositions. In whatever case, these observations all contradict the expectations from advocates of the specificity doctrine.

HARLAT <- list(
  F1 = c("STR", "MIC", "STO"),
  F2 = c("TRM", "PMI", "LOGL"),
  F3 = c("DIGS", "SYMM", "LISR"),
  F4 = c("RUNM", "KETR", "TBAC"),
  F5 = c("BD", "MATR"),
  F6 = c("VOCA", "SIMI"),
  F7 = c("PATC", "LETC", "SYMS"))

semPaths(HARMOD3.fit, "model", "std", title = F, residuals = F, groups = "HARLAT", pastel = T, mar = c(2, 1, 3, 1), bifactor = c("EF", "g"), layout = "tree2", exoCov = T)

HARLAT2 <- list(
  F1 = c("STR", "MIC", "STO"),
  F2 = c("TRM", "PMI", "LOGL"),
  F3 = c("DIGS", "SYMM", "LISR"),
  F4 = c("RUNM", "KETR", "TBAC"),
  F5 = c("BD", "MATR", "VOCA", "SIMI"),
  F6 = c("PATC", "LETC", "SYMS"))

semPaths(HARMOD4.fit, "model", "std", title = F, residuals = F, groups = "HARLAT2", pastel = T, mar = c(2, 1, 3, 1), bifactor = c("EF", "g"), layout = "tree2", exoCov = T)

Freis et al.

First, their model fits and parameters need to be replicated. These will be right around but slightly different from their parameters since I am using a matrix input and the average n from the used tests. There are two “off” things about their models I should mention before fitting them though. The first is that it is mathematically impossible to get positive loadings from the SST subtest with a unified (i.e., single-factor) model because all of its coefficients are reported as negative when, to get all-positive loadings in such a model, they must be positive. The second is that the number of degrees of freedom they reported for their model with two orthogonal factors is impossible because there are five loadings on the common executive function factor (5 df) and two on their updating-specific factor that is loaded on two of the common EF indicator (-2 df as a result) for a total of 3 df. They report 4. Both of these anomalies suggest undisclosed analytic flexibility, the former of which is innocuous but for integrity must be reported and the second of which is extremely hard to interpret. Difficulty in interpretation of the second anomaly comes from the fact that the models they provided do not show which parameter was constrained. The constraint was not placed on a loading, nor on a residual, and it could not have been placed on a single factor variance or they would not have been able to statistically identify the model. So where’s the beef? We cannot know except by iteratively constraining parameters until we arrive at approximately their result; ‘approximately’ because the raw data are not provided, so the approximate nature of the summary data really obscures our ability to determine with any exactness where their analytic flexibility was utilized (as do differences in the program used, but even using “mimic = Mplus” would not solve the problem; maybe their analysis options - not emulatable in lavaan yet - affected this, but I doubt it since subsequent models had correct df). This is an unfortunate reality. Another unfortunate reality is that with a large n and too few reported significant digits, we will very likely be unable to assess their undisclosed constraint, and moreover, because they provided too few digits in their p value, we can’t try to garner information about the discrepancy from that either. This whole thing is bungled so I’m going to just refit the model and assess if the fit measures are practically what they arrived at. I did reflect the SST correlations like they must have (unless they misreported computed coefficients or mirrored them in the diagram; either way, they did not say they did this).

#Unified Model

UNI <- '
CEF =~ Card + Flanker + Nback + List + SST'

#EF Measurement Model

EFMM <- '
CEF =~ Card + Flanker + Nback + List + SST
US =~ Nback + List'

FREUNI.fit <- cfa(UNI, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T)
FREEFMM.fit <- cfa(EFMM, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T, orthogonal = T)

## Warning in lav_model_vcov(lavmodel = lavmodel2, lavsamplestats = lavsamplestats, : lavaan WARNING:
##     Could not compute standard errors! The information matrix could
##     not be inverted. This may be a symptom that the model is not
##     identified.

## Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
## variances are negative

round(cbind(UNIFIED = fitMeasures(FREUNI.fit, FITM),
            MEASUREMENT = fitMeasures(FREEFMM.fit, FITM)),3)

##                   UNIFIED MEASUREMENT
## chisq             181.536      77.023
## df                  5.000       3.000
## npar               10.000      12.000
## cfi                 0.965       0.985
## rmsea               0.058       0.048
## rmsea.ci.lower      0.051       0.039
## rmsea.ci.upper      0.065       0.058
## aic            338923.148  338822.634
## bic            338995.893  338909.928

The N-back variance is an ultra-Heywood in the measurement model (df = 3). Constraining it does not fix the problem but it brings the df in line with what was reported. As mentioned above, mimicking Mplus does not fix this issue as it just makes the Updating-Specific loadings a mix of extremely negative and null. This may have resulted from the nature of the reported summary statistics including by way of possible innacuracies. The fits are pretty much as expected though. Though the bifactor and exact higher-order model are likely to run into convergence problems, if I change measurement model convergence criteria, it can produce a quite large par change and is inadvisable (try it out and see that the model becomes uninterpretable). Given that this is models with different numbers of factors rather than just nested changes, this is acceptable. Note that if the group factors are not interpretable on their own, as is often the case, the bifactor models will have to be refitted with pruning or outright removal of group factor loadings.

#Measurement Model

MM <- '
CEF =~ Card + Flanker + Nback + List + SST
US =~ Nback + List
INT =~ MatrixR + PicVocab

CEF ~~ INT
US ~~ INT'

#Higher-Order Model - Copied Group Factors

HOFI <- '
CEF =~ Card + Flanker + Nback + List + SST
US =~ Nback + List
INT =~ MatrixR + PicVocab

g =~ CEF + US + INT'

#Higher-Order Model - Mixed

HOFM <- '
CEF =~ Card + Flanker + Nback + List + SST
US =~ Nback + List

g =~ CEF + US + MatrixR + PicVocab'

#Bifactor Model - Improper

BFI <- '
CEF =~ Card + Flanker + Nback + List + SST
US =~ Nback + List
INT =~ MatrixR + PicVocab 

g =~ Card + Flanker + Nback + List + SST + MatrixR + PicVocab'

#Bifactor Model - Proper
#Identification constraints on US probably required but undermined by cross-loading-induced errors.
BFP <- '
CEF =~ Card + Flanker + Nback + List + SST
US =~ Nback + List 
g =~ Card + Flanker + Nback + List + SST + MatrixR + PicVocab'

FREMM.fit <- cfa(MM, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T, orthogonal = T)

## Warning in lav_object_post_check(object): lavaan WARNING: covariance matrix of latent variables
##                 is not positive definite;
##                 use lavInspect(fit, "cov.lv") to investigate.

FREHOFI.fit <- cfa(HOFI, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T, check.gradient = F, control=list(rel.tol=1e-4))

## Warning in lav_model_vcov(lavmodel = lavmodel2, lavsamplestats = lavsamplestats, : lavaan WARNING:
##     Could not compute standard errors! The information matrix could
##     not be inverted. This may be a symptom that the model is not
##     identified.

FREHOFM.fit <- cfa(HOFM, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T, check.gradient = F, control=list(rel.tol=1e-4))
FREBFI.fit <- cfa(BFI, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T, orthogonal = T, check.gradient = F, control=list(rel.tol=1e-4))

## Warning in lav_model_vcov(lavmodel = lavmodel2, lavsamplestats = lavsamplestats, : lavaan WARNING:
##     Could not compute standard errors! The information matrix could
##     not be inverted. This may be a symptom that the model is not
##     identified.

## Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
## variances are negative

FREBFP.fit <- cfa(BFP, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T, orthogonal = T, check.gradient = F, control=list(rel.tol=1e-4))

## Warning in lav_model_vcov(lavmodel = lavmodel2, lavsamplestats = lavsamplestats, : lavaan WARNING:
##     Could not compute standard errors! The information matrix could
##     not be inverted. This may be a symptom that the model is not
##     identified.

## Warning in lav_model_vcov(lavmodel = lavmodel2, lavsamplestats = lavsamplestats, : lavaan WARNING: some estimated ov variances are negative

round(cbind(MM = fitMeasures(FREMM.fit, FITM),
            HOFCopied = fitMeasures(FREHOFI.fit, FITM),
            HOFMixed = fitMeasures(FREHOFM.fit, FITM),
            BFCopied = fitMeasures(FREBFI.fit, FITM),
            BFProper = fitMeasures(FREBFP.fit, FITM)),3)

##                        MM  HOFCopied   HOFMixed   BFCopied   BFProper
## chisq             112.410    114.531    113.587     86.858    103.280
## df                 10.000      9.000     10.000      5.000      7.000
## npar               18.000     19.000     18.000     23.000     21.000
## cfi                 0.991      0.990      0.991      0.993      0.991
## rmsea               0.031      0.033      0.031      0.039      0.036
## rmsea.ci.lower      0.026      0.028      0.026      0.032      0.030
## rmsea.ci.upper      0.036      0.039      0.036      0.047      0.042
## aic            466672.978 466677.099 466674.155 466657.426 466669.848
## bic            466803.918 466815.314 466805.095 466824.739 466822.611

Their preferred phenotypic factor model had ten df as expected (14 - 4 = 10). This means that they must have removed their constraint that got them to 4 df on the EF-only model when they added “intelligence”. That is, unless this was an error. Perhaps MPlus automatically constrained something and they were not aware of it or they doubly identified their model.

The mixed higher-order factor model shows practically no residual variance for the Updating-Specific factor (\(\approx\) 1%) when g is modeled; moreover, the CEF loading on N-back and List Sort goes to a minuscule r \(\approx\) 0.07-0.08. In both of the bifactor models, indicators must be dropped from both executive function factors. In the improper bifactor model, the N-back indicator turns into an ultra-Heywood case and becomes a problem again. Constraining it just removes its loadings on everything and destroys model fit. Trying to identify the model by constraining the two-indicator group factor loadings just leads to atrocious fit and exacerbates the N-back problem.

#Refit Higher-Order Model - Mixed

HOFMII <- '
CEF =~ Card + Flanker + SST

g =~ CEF + Nback + List + MatrixR + PicVocab'


FREHOFMII.fit <- cfa(HOFMII, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T, check.gradient = F, control=list(rel.tol=1e-4))

#Refit Bifactor Model 
#Identification via constraining the INT loadings is required and is reasonable enough given their similarity in this model. 

BFIII <- '
CEF =~ Card + Flanker + SST
INT =~ 1*MatrixR + 1*PicVocab 

g =~ Card + Flanker + Nback + List + SST + MatrixR + PicVocab'

FREBFIII.fit <- cfa(BFIII, sample.cov = FRE.cov, sample.nobs = AN, std.lv = T, orthogonal = T, check.gradient = F, control=list(rel.tol=1e-4))

round(cbind(MM = fitMeasures(FREMM.fit, FITM),
            HOF = fitMeasures(FREHOFMII.fit, FITM),
            BF = fitMeasures(FREBFIII.fit, FITM)), 3)

##                        MM        HOF         BF
## chisq             112.410    131.643    105.241
## df                 10.000     13.000     11.000
## npar               18.000     15.000     17.000
## cfi                 0.991      0.989      0.991
## rmsea               0.031      0.029      0.028
## rmsea.ci.lower      0.026      0.025      0.024
## rmsea.ci.upper      0.036      0.034      0.033
## aic            466672.978 466686.211 466663.809
## bic            466803.918 466795.327 466787.474

In the more theoretically acceptable mixed higher-order model, the g factor still explained approximately 50% of the variance in the CEF indicators. For the only slightly-loading SST, there doesn’t seem to be much of a proportionality constraint (higher-order g loading = 0.131, independent = 0.153). In the bifactor model based on the initial improper model with pruning, the resulting values of PUC, H, \(\omega_h\) and \(\omega_t\), ECV, and total variance for g are \(\approx\) 0.76 (indicators are half-weighted when there isn’t an orthogonal group factor), 0.74, 0.66, 0.74, 0.71, and 0.26. This battery is largely unidimensional, as it was for the refit mixed higher-order model whose values - obtained using Schmid-Leiman transformed loadings - were 0.57, 0.74, 0.67, 0.71, 0.89, and 0.26. The g loadings were basically the same and correlated at r = 0.998. In order to make the result inescapable, Waller the measurement model and the results are similar. Obviously g is likely and was excluded from the model through conceptual misidentification.

Jensen Effect

Because of Gorsuch’s interpretation of PCA and its rough correspondence to the higher-order interpretation of g implicit in Jensen’s work and explicit in Kan’s, I will be using the PCA loadings for the Jensen effect meta since the higher-order loadings cannot actually be reliably assessed due to the lack of indicators. Just as well, my A, C, and E estimates will be computed based on the twin correlations since ACE model estimates often lead to strange results due to low power and switching between different numbers of factors. Negative values were constrained to zero with E adjusted accordingly. The formulas to derive these quantities from twin correlations are

\[A = 2(r_{MZ} - r_{DZ})\]

\[C = (r_{MZ} - 2r_{DZ})\]

\[E = 1 - A + C\]

pca(FRECOG.cor)

## Principal Components Analysis
## Call: principal(r = r, nfactors = nfactors, residuals = residuals, 
##     rotate = rotate, n.obs = n.obs, covar = covar, scores = scores, 
##     missing = missing, impute = impute, oblique.scores = oblique.scores, 
##     method = method, use = use, cor = cor, correct = 0.5, weight = NULL)
## Standardized loadings (pattern matrix) based upon correlation matrix
##           PC1    h2   u2 com
## Flanker  0.63 0.394 0.61   1
## List     0.70 0.495 0.50   1
## Card     0.68 0.456 0.54   1
## Nback    0.48 0.230 0.77   1
## SST      0.23 0.055 0.95   1
## MatrixR  0.69 0.470 0.53   1
## PicVocab 0.69 0.470 0.53   1
## 
##                 PC1
## SS loadings    2.57
## Proportion Var 0.37
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 component is sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.11 
## 
## Fit based upon off diagonal values = 0.84

df <- data.frame("gPCA" = c(0.63, 0.70, 0.68, 0.48, 0.23, 0.69, 0.69), "A" = c(0.34, 0.36, 0.50, 0.36, 0.30, 0.28, 0.34), "C" = c(0.08, 0, 0.12, 0.05, 0, 0, 0), "E" = c(0.58, 0.64, 0.38, 0.59, 0.70, 0.72, 0.66))

CONGO <- function(F1, F2) {
  PHI = sum(F1*F2) / sqrt(sum(F1^2)*sum(F2^2))
  return(PHI)}

cor(df$gPCA, df$A, method = "pearson"); cor(df$gPCA, df$A, method = "spearman"); CONGO(df$gPCA, df$A)

## [1] 0.3122905

## [1] 0.1009217

## [1] 0.9628971

cor(df$gPCA, df$C, method = "pearson"); cor(df$gPCA, df$C, method = "spearman"); CONGO(df$gPCA, df$C)

## [1] 0.1864494

## [1] -0.3379782

## [1] 0.6356721

cor(df$gPCA, df$E, method = "pearson"); cor(df$gPCA, df$E, method = "spearman"); CONGO(df$gPCA, df$E)

## [1] -0.2746112

## [1] 0.14415

## [1] 0.9373397

In terms of r, there is a Jensen effect for A, a smaller one or a non-Jensen effect for C, and an anti-Jensen effect for E; for the preferred \(\rho\), there is a small or non-Jensen effect for A, an anti-Jensen effect for C, and a small or non-Jensen effect for E; by my preferred \(\phi\), there is a Jensen effect for A, not a Jensen effect for C, and a tautological Jensen effect (due to mirroring) for E, so probably really no Jensen effect. This would benefit from a common versus independent pathway analysis.

Conclusion

Freis et al. (2020) assessed the phenotypic and etiological appearance of a battery of five executive function tasks and two visuospatial tasks. They made a number of conceptual and real analytic errors in their study. To summarize what I noticed:

They improperly defined factors.
They neglected the dimensionality of their entire battery.
They misreported the df of at least one model.
They reported impossible coefficients barring some sort of reverse-scoring they did not (at least clearly) report and which was not presented in their data.
They seem to have theorized beyond the limits of their data and in some parts, in an incoherent way.

I hope this helps them to improve their analysis and inspires more theoretical thinking and literature reviewing in the future. I only wish enough data had been provided to assess if more theoretically interesting and coherent etiological models fit.

In addition to this critique, I have shown support for executive function and psychometric tests tapping an identical g factor in the data from Engelhardt’s (2018) thesis and a Jensen effect in the data from Freis et al. (2020). An interesting thing to note is that if the ACE model values (which can be confounded by power, non-nesting (i.e., AE or AC, CE, instead of ACE), etc.) were used, the Jensen effect results would be altered (reversed, even); similarly, using the higher-order loadings (this model is badly substantively identified), the results would be slightly altered.

References

Freis, S. M., Morrison, C., Lessem, J. M., Hewitt, J. K., & Friedman, N. P. (2020). Genetic and Environmental Influences on Executive Functions and Intelligence in the ABCD Study [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/xpba4

Johnson, W., Nijenhuis, J. te, & Bouchard, T. J. (2008). Still just 1 g: Consistent results from five test batteries. Intelligence, 36(1), 81-95. https://doi.org/10.1016/j.intell.2007.06.001

Guttman, L., & Levy, S. (1991). Two structural laws for intelligence tests. Intelligence, 15(1), 79-103. https://doi.org/10.1016/0160-2896(91)90023-7

Floyd, R. G., Bergeron, R., Hamilton, G., & Parra, G. R. (2010). How do executive functions fit with the Cattell-Horn-Carroll model? Some evidence from a joint factor analysis of the Delis-Kaplan Executive Function System and the Woodcock-Johnson III tests of cognitive abilities. Psychology in the Schools, 721-738. https://doi.org/10.1002/pits.20500

Engelhardt, L. E. (2018). Biological mechanisms underlying the unity and diversity of executive functions in childhood [Thesis]. https://doi.org/10.15781/T2RB6WM7T

Spearman, C. (1904). “General Intelligence,” Objectively Determined and Measured. The American Journal of Psychology, 15(2), 201-292. https://doi.org/10.2307/1412107

Major, J. T., Johnson, W., & Bouchard, T. J. (2011). The dependability of the general factor of intelligence: Why small, single-factor models do not adequately represent g. Intelligence, 39(5), 418-433. https://doi.org/10.1016/j.intell.2011.07.002

Frazier, T. W., & Youngstrom, E. A. (2007). Historical increase in the number of factors measured by commercial tests of cognitive ability: Are we overfactoring? Intelligence, 35(2), 169-182. https://doi.org/10.1016/j.intell.2006.07.002

Reeve, C. L., & Blacksmith, N. (2009). Identifying g: A review of current factor analytic practices in the science of mental abilities. Intelligence, 37(5), 487-494. https://doi.org/10.1016/j.intell.2009.06.002

Hood, S. B. (2010). Latent Variable Realism in Psychometrics [Thesis]. https://scholarworks.iu.edu/dspace/handle/2022/8120

Engelhardt, L. E., Mann, F. D., Briley, D. A., Church, J. A., Harden, K. P., & Tucker-Drob, E. M. (2016). Strong genetic overlap between executive functions and intelligence. Journal of Experimental Psychology: General, 145(9), 1141-1159. https://doi.org/10.1037/xge0000195

Reanalysis of Executive Function and Intelligence in the ABCD Study

Jay Lasker

July 2, 2020, 3:35 PM