Setup

Packages and Functions

library(pacman)
p_load(lavaan, psych, semPlot, VIM, egg, dplyr, knitr, ggplot2, reshape2, latex2exp)

CONGO <- function(F1, F2) {
  PHI = sum(F1*F2) / sqrt(sum(F1^2)*sum(F2^2))
  return(PHI)}

Data

BAF <- read.csv("BAFA.csv")

Imputation

ach_vars <- c("Portuguese", "English", "Geography", "History", "Biology", "Physics", "Chemistry", "Math")

missing_plot <- aggr(BAF[, ach_vars], col=c('gold', 'darkred'), numbers = T, sortVars = T, labels = names(BAF[, ach_vars]), cex.axis = .7, gap = 3, ylab = c("Proportion Missing", "Pattern"))

## 
##  Variables sorted by number of missings: 
##    Variable     Count
##     History 0.6267123
##   Geography 0.5958904
##     Physics 0.2260274
##     English 0.2226027
##  Portuguese 0.2191781
##     Biology 0.2191781
##   Chemistry 0.2191781
##        Math 0.2191781

There's too much missing for history and geogaphy and practically the same numbers of people are missing the others. The randomness is dubious, so I elected not to impute. 21.23% of N, P1, P2, P3, FF, FI1, and FI2 were missing but otherwise no cognitive data was missing; because of the apparent lack of randomness, I elected not to impute those either.

Rationale

As described by Golino & Gomes (2014) the Battery of Higher-Order Cognitive Factors (BAFACALO) is a cognitive battery developed for use in Brazil. The battery was based on the Kit of Factor Referenced Tests (see https://rpubs.com/JLLJ/JOG for an analysis showing that the Kit produces the same g as the ASVAB) and was built to cover six broad domains of the Cattell-Horn-Carroll model:

Fluid intelligence
Crystallized intelligence
Short-term memory
Visual perception
Fluency
Speed

The data adduced for Golino & Gomes' test also includes grade point measures for Portuguese, English, mathematics, biology, physics, chemistry, geography, and history. Their data can be used to help provide evidence for several stylized facts from the field of intelligence research which, though well-established in the west, have not received equal attention in non-western samples. With these data, I aim to test

If there is CFA and dimensionality evidence for a g
If the group factors have interpretive relevance beyond g
The correspondence between an "achievement g" and the g factor produced by their intelligence test
Whether and which specific abilities can predict school performance above and beyond g

In western countries, CFAs usually prefer higher-order to correlated group factor models when there are more than three factors in the model because a model with g is more parsimonious and the factor covariances usually have a unidimensional structure such that a single factor is found among them and it explains the vast majority of the relationships among them. With three or fewer factors, there will be too few for a comparison of models with different degrees of freedom. That this data has a theoretical factor configuration based on the CHC framework is helpful because that framework includes many factors, reliable or unreliable as they may be (Benson et al., 2018). I will be using the theoretically supplied factor structure for my own CFAs. I will model the achievement measures with one factor because, as the EFA below shows, the battery is basically described by one factor; moreover, a two-factor model was unworkable and led to errors. The second point is germane to the first; in general, g explains >70% of the common variance in a battery, regardless of its size. Beyond a certain point, this is more true the broader the battery. As such, the group factors are generally left with limited reliable variance of their own, complicating their interpretation for clinicians and other users of cognitive tests. There is a considerable literature on how attempts to predict clinical outcomes and intellectual performance from individuals' cognitive profiles afford limited validity at best; this may be a result of the limited incremental variance found in those factors. This generally results in the recommendation that a battery not be interpreted beyond its FSIQ, since the other scores amount to proxies for it with a greater degree of measurement error.

The third point is important to know because, in the west, the relationship between achievement and intelligence test measures is, despite a century of work, somehow still a matter of contention. All analyses of population-representative samples with good instruments return that achievement and intelligence measures are strongly related and that most of the variance in achievement is attributable to intelligence (Kaufman & O'Neil, 1988; Deary et al., 2007; Kaufman et al., 2012; Zaboski, Kranzler & Gage, 2018). Most analyses do not get to the root of the topic, obliquely addressing it by finding that achievement and congitive measurements have similar g loadings, that lower-order g factors are highly similar, that g predicts a large part of the variance in a few measures, or that full-scale scores are strongly related. However, in order to account for psychometric sampling error and assess if what they test in general is the same - and if they have the same recommended level of interpretation -, fitting higher-order (or, perhaps, bifactor) CFAs for their comparison is required. The only study which had sufficient data to test this of which I am aware was conducted by Kaufman et al. (2012); they found that achievement-g and intelligence-g were correlated at r = 0.86 for the KABC-II and KTEA-II and 0.80 for the WJ-III Cog and WJ-III Ach. The breadth of these g factors was contestable, but the test was, nonetheless, the best yet-conducted of which I am aware. The relationship will be assessed in this battery within the factor model; unfortunately, the breadth is even worse here, but not terribly so. I will also assess the g and s loading stability of these batteries (Schmid-Leiman) with separate g factors, one g between them, and based on a CFA where the indicators from both sets are allowed to sit under the same group factors (i.e., a true joint confirmatory factor analysis).

The fourth point is about the generalizability of the more than fifty years of findings that, when predicting with IQ tests, they are strong predictors, but the only factor that consistently predicts is g. Specific ability estimates tend to provide much less validity. One reason for this may be that, in the higher-order model, or residualized of g based on a Schmid-Leiman transformation of EFA results, the group factors are relatively weak compared to those extracted directly in bifactor models or those containing g variance in correlated group factor models. This may reflect bifactor group factors pulling variance that resides in the unique factor variances (and these could become group factors with more indicators in the model) for a subset of the tests in a battery, but it may reflect genuinely better measurement. To this end, I will test higher-order factor predictive validity but not bifactor predictive validity, because of identification issues, for the various grades measurements, as well as a unit-weighted GPA-like grade sumscore.

VZ, I, RL, RG, V1, V2, V3, MV, MA1, MA2, VZ, and CF sumscores were used. The II10, 12, and 13 items were excluded for the reason provided by Golino & Gomes (2014).

Analysis

Distributions of Eigenvalues

# Intelligence

BAFACog <- c("N", "I", "RL", "RG", "VZ", "CF", "P1", "P2", "P3", "FF", "FI1", "FI2", "MV", "MA1", "MA2", "V1", "V2", "V3")
CogData <- BAF[BAFACog]

BAFEFA <- fa(CogData, nfactors = 1); print(BAFEFA$loadings, cutoff = 0.2)

## 
## Loadings:
##     MR1  
## N   0.601
## I   0.608
## RL  0.653
## RG  0.727
## VZ  0.704
## CF  0.639
## P1  0.308
## P2  0.428
## P3  0.460
## FF       
## FI1 0.361
## FI2 0.370
## MV  0.554
## MA1 0.547
## MA2 0.560
## V1  0.477
## V2  0.336
## V3  0.511
## 
##                  MR1
## SS loadings    4.867
## Proportion Var 0.270

NUM <- seq(1, 18, 1); EVS <- BAFEFA$e.values
bpdf <- data.frame("NUM" = NUM, "EVS" = EVS)

INT <- ggplot(bpdf, aes(x = NUM, y = EVS, fill = as.factor(NUM))) + geom_bar(stat = "identity", show.legend = F) + xlab("Number") + ylab("Eigenvalue") + theme_bw() + scale_fill_hue(c = 30) + theme(text = element_text(size = 12, family = "serif"),  plot.title = element_text(hjust = 0.5)) + geom_hline(yintercept = 1, color = "darkgray", linetype = "dashed", size = 1)

# Achievement

BAFAAch <- c("Portuguese", "English", "Biology", "Physics", "Chemistry", "Math")
AchData <- BAF[BAFAAch]

BAFEFA <- fa(AchData, nfactors = 1); print(BAFEFA$loadings, cutoff = 0.2)

## 
## Loadings:
##            MR1  
## Portuguese 0.691
## English    0.816
## Biology    0.898
## Physics    0.927
## Chemistry  0.952
## Math       0.905
## 
##                  MR1
## SS loadings    4.535
## Proportion Var 0.756

ANUM <- seq(1, 6, 1); AVS <- BAFEFA$e.values
bpdf <- data.frame("ANUM" = ANUM, "AVS" = AVS)

ACH <- ggplot(bpdf, aes(x = ANUM, y = AVS, fill = as.factor(ANUM))) + geom_bar(stat = "identity", show.legend = F) + xlab("Number") + ylab("") + theme_bw() + scale_fill_hue(c = 30) + theme(text = element_text(size = 12, family = "serif"),  plot.title = element_text(hjust = 0.5)) + geom_hline(yintercept = 1, color = "darkgray", linetype = "dashed", size = 1)

ggarrange(INT, ACH, ncol = 2)

The first factor in the intelligence battery explained 30.70% of the total variance; in the achievement battery, it explained 79.33%.

g in the Psychometric Battery

GroupFactors <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2 
Gc =~ V1 + V2 + V3'

HigherOrder <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2 
Gc =~ V1 + V2 + V3

gINT =~ Gf + Gv + Gs + Gfl + Gsm + Gc'

Groups <- cfa(GroupFactors, data = BAF, std.lv = T)
HoG <- cfa(HigherOrder, data = BAF, std.lv = T)

round(cbind("Group Factors" = fitMeasures(Groups, FITM),
            "Higher-order" = fitMeasures(HoG, FITM)), 3)

##                Group Factors Higher-order
## chisq                200.585      207.141
## df                   120.000      129.000
## npar                  51.000       42.000
## cfi                    0.923        0.926
## rmsea                  0.054        0.051
## rmsea.ci.lower         0.041        0.038
## rmsea.ci.upper         0.067        0.064
## aic                22884.084    22872.640
## bic                23059.426    23017.039
## tli                    0.902        0.912

parameterEstimates(HoG, stand = T) %>% 
  filter(op == "=~") %>% 
  select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>% 
  kable(digits = 3, format = "pandoc", caption = "Factor Loadings")

Factor Loadings
Latent Factor	Indicator	B	SE	Z	Beta
Gf	N	3.309	0.459	7.205	0.657
Gf	I	0.719	0.117	6.126	0.517
Gf	RL	1.698	0.235	7.238	0.662
Gf	RG	1.889	0.241	7.827	0.816
Gv	VZ	3.162	0.409	7.740	0.756
Gv	CF	3.871	0.490	7.898	0.711
Gs	P1	2.107	0.491	4.292	0.441
Gs	P2	1.647	0.336	4.902	0.631
Gs	P3	2.716	0.554	4.900	0.623
Gfl	FF	1.260	0.280	4.505	0.334
Gfl	FI1	3.703	0.411	9.002	0.710
Gfl	FI2	2.863	0.313	9.149	0.815
Gsm	MV	0.724	0.127	5.709	0.435
Gsm	MA1	2.524	0.288	8.775	0.783
Gsm	MA2	1.395	0.162	8.592	0.693
Gc	V1	2.055	0.209	9.842	0.744
Gc	V2	1.379	0.174	7.909	0.582
Gc	V3	2.065	0.202	10.215	0.814
gINT	Gf	1.256	0.225	5.594	0.782
gINT	Gv	0.871	0.159	5.482	0.657
gINT	Gs	1.359	0.327	4.152	0.805
gINT	Gfl	0.571	0.113	5.059	0.496
gINT	Gsm	0.780	0.139	5.592	0.615
gINT	Gc	0.796	0.133	5.964	0.623

The higher-order model fit crushingly better than the correlated group factor model for the psychometric data.

resid(HoG, "cor")

## $type
## [1] "cor.bollen"
## 
## $cov
##     N      I      RL     RG     VZ     CF     P1     P2     P3     FF    
## N    0.000                                                               
## I   -0.079  0.000                                                        
## RL  -0.075 -0.018  0.000                                                 
## RG  -0.016  0.068  0.038  0.000                                          
## VZ   0.059  0.069  0.018 -0.033  0.000                                   
## CF   0.005 -0.002  0.017 -0.030  0.000  0.000                            
## P1   0.176  0.041 -0.070 -0.058 -0.069 -0.069  0.000                     
## P2   0.176 -0.155 -0.007 -0.059  0.042  0.020  0.060  0.000              
## P3   0.121 -0.025 -0.064 -0.045  0.039  0.079 -0.036 -0.014  0.000       
## FF  -0.076 -0.146 -0.053 -0.115 -0.182 -0.099 -0.151 -0.099 -0.039  0.000
## FI1  0.042 -0.018  0.032  0.070  0.000 -0.001 -0.100 -0.012  0.090  0.005
## FI2  0.021  0.058  0.006 -0.046 -0.061  0.021 -0.056  0.027  0.049  0.033
## MV   0.125  0.068  0.132  0.075  0.155  0.108  0.014  0.004  0.050 -0.102
## MA1  0.127 -0.035 -0.022  0.008 -0.035 -0.081  0.101 -0.024  0.020 -0.094
## MA2  0.033 -0.116  0.069 -0.024  0.024 -0.098  0.014 -0.065 -0.022 -0.039
## V1   0.128 -0.103 -0.021 -0.109 -0.065 -0.022 -0.029 -0.015 -0.025 -0.056
## V2   0.065 -0.023 -0.018 -0.063  0.012  0.110  0.009 -0.024 -0.033 -0.039
## V3   0.078 -0.004  0.076 -0.053 -0.004  0.049  0.009 -0.036  0.034 -0.111
##     FI1    FI2    MV     MA1    MA2    V1     V2     V3    
## N                                                          
## I                                                          
## RL                                                         
## RG                                                         
## VZ                                                         
## CF                                                         
## P1                                                         
## P2                                                         
## P3                                                         
## FF                                                         
## FI1  0.000                                                 
## FI2 -0.009  0.000                                          
## MV  -0.030 -0.034  0.000                                   
## MA1  0.015 -0.044 -0.040  0.000                            
## MA2 -0.017 -0.085  0.006  0.015  0.000                     
## V1   0.066  0.064  0.054  0.025  0.009  0.000              
## V2   0.106  0.031  0.035  0.050 -0.057  0.019  0.000       
## V3   0.023  0.088  0.007 -0.005 -0.050  0.001 -0.016  0.000

HR <- (resid(HoG, "cor"))
mean(abs(HR$cov))

## [1] 0.04919215

There were not really any large local dependence issues. P1:P3 and FF showed a bit, but for the most part, the rest did not. Since this was not fixed in the bifactor model and based on how it appeared, it looks like there might be some surface LD. Some of these could be modestly important were this personality data where correlations are typically lower. It may not be a coincidence that P1:P3 and FF were part of the least-reliable of the scales. The small magnitude of local dependence and its apparent specificity may explain why the bifactor model doesn't offer that much more predictive validity, especially for the group factors; that observation also supports SLD over underlying LD.

PsyLats <- list(
  Gf = c("N", "I", "RL", "RG"),
  Gv = c("VZ", "CF"),
  Gs = c("P1", "P2", "P3"),
  Gfl = c("FF", "FI1", "FI2"),
  Gsm = c("MV", "MA1", "MA2"),
  Gc = c("V1", "V2", "V3"))

semPaths(HoG, "model", "std", title = F, residuals = F, groups = "PsyLats", bifactor = "gINT", pastel = T, mar = c(2, 1, 3, 1), layout = "tree2", exoCov = T)

g in the Achievement Battery

AchG <- '
gACH =~ Portuguese + English + Biology + Physics + Chemistry + Math'

AchG <- cfa(AchG, data = BAF, std.lv = T)

round(cbind("Achievement g" = fitMeasures(AchG, FITM)), 3)

##                Achievement g
## chisq                 48.084
## df                     9.000
## npar                  12.000
## cfi                    0.972
## rmsea                  0.139
## rmsea.ci.lower         0.102
## rmsea.ci.upper         0.178
## aic                 9659.550
## bic                 9700.596
## tli                    0.953

parameterEstimates(AchG, stand = T) %>% 
  filter(op == "=~") %>% 
  select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>% 
  kable(digits = 3, format = "pandoc", caption = "Factor Loadings")

Factor Loadings
Latent Factor	Indicator	B	SE	Z	Beta
gACH	Portuguese	8.919	0.811	10.999	0.657
gACH	English	11.850	0.783	15.128	0.826
gACH	Biology	10.092	0.617	16.355	0.867
gACH	Physics	12.270	0.658	18.647	0.936
gACH	Chemistry	13.055	0.687	18.996	0.946
gACH	Math	15.967	0.889	17.963	0.917

Portuguese was the least g loaded, which may make sense if it is the easiest for a sample of Portuguese speakers.

resid(AchG, "cor")

## $type
## [1] "cor.bollen"
## 
## $cov
##            Portgs Englsh Biolgy Physcs Chmstr Math  
## Portuguese  0.000                                   
## English    -0.021  0.000                            
## Biology     0.104  0.029  0.000                     
## Physics     0.006 -0.012  0.003  0.000              
## Chemistry  -0.022 -0.004  0.008 -0.007  0.000       
## Math       -0.027  0.009 -0.051  0.012  0.010  0.000

AR <- (resid(AchG, "cor"))
mean(abs(AR$cov))

## [1] 0.01804877

semPaths(AchG, "std", title = F, residuals = F, mar = c(2, 1, 3, 1), layout = "circle", posCol = c("skyblue4", "red"), sizeMan = 7, edge.label.cex = 1.2)

How Many g Factors?

TwoFactorsNO <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2 
Gc =~ V1 + V2 + V3

gINT =~ Gf + Gv + Gs + Gfl + Gsm + Gc

gACH =~ Portuguese + English + Biology + Physics + Chemistry + Math

gINT ~~ 0*gACH'

TwoFactorsFR <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2 
Gc =~ V1 + V2 + V3

gINT =~ Gf + Gv + Gs + Gfl + Gsm + Gc

gACH =~ Portuguese + English + Biology + Physics + Chemistry + Math

gINT ~~ gACH'

TwoFactorsID <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2 
Gc =~ V1 + V2 + V3

gINT =~ Gf + Gv + Gs + Gfl + Gsm + Gc

gACH =~ Portuguese + English + Biology + Physics + Chemistry + Math

gINT ~~ 1*gACH'

GNO <- cfa(TwoFactorsNO, data = BAF, std.lv = T)
GFR <- cfa(TwoFactorsFR, data = BAF, std.lv = T)
GID <- cfa(TwoFactorsID, data = BAF, std.lv = T)

round(cbind("Unrelated?" = fitMeasures(GNO, FITM),
            "Related?" = fitMeasures(GFR, FITM),
            "Identical?" = fitMeasures(GID, FITM)), 3)

##                Unrelated?  Related? Identical?
## chisq             543.947   447.898    535.177
## df                246.000   245.000    246.000
## npar               54.000    55.000     54.000
## cfi                 0.884     0.921      0.887
## rmsea               0.073     0.061      0.072
## rmsea.ci.lower      0.065     0.052      0.064
## rmsea.ci.upper      0.082     0.069      0.081
## aic             31975.861 31881.812  31967.092
## bic             32160.331 32069.698  32151.561
## tli                 0.869     0.911      0.873

parameterEstimates(GFR, stand = T) %>% 
  filter(op == "=~") %>% 
  select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>% 
  kable(digits = 3, format = "pandoc", caption = "Factor Loadings")

Factor Loadings
Latent Factor	Indicator	B	SE	Z	Beta
Gf	N	2.972	0.534	5.562	0.759
Gf	I	0.482	0.105	4.575	0.454
Gf	RL	1.185	0.226	5.238	0.608
Gf	RG	1.331	0.240	5.555	0.748
Gv	VZ	3.273	0.449	7.297	0.773
Gv	CF	3.533	0.466	7.585	0.637
Gs	P1	2.319	0.491	4.724	0.476
Gs	P2	1.719	0.325	5.287	0.644
Gs	P3	2.633	0.504	5.229	0.597
Gfl	FF	1.301	0.293	4.433	0.332
Gfl	FI1	3.788	0.435	8.702	0.697
Gfl	FI2	3.007	0.334	9.012	0.823
Gsm	MV	0.685	0.122	5.591	0.430
Gsm	MA1	2.450	0.281	8.705	0.779
Gsm	MA2	1.336	0.157	8.485	0.683
Gc	V1	2.120	0.208	10.178	0.757
Gc	V2	1.385	0.176	7.865	0.573
Gc	V3	2.069	0.198	10.439	0.797
gINT	Gf	1.831	0.389	4.703	0.878
gINT	Gv	0.761	0.141	5.387	0.606
gINT	Gs	1.328	0.289	4.602	0.799
gINT	Gfl	0.476	0.101	4.702	0.430
gINT	Gsm	0.827	0.139	5.933	0.637
gINT	Gc	0.753	0.122	6.199	0.602
gACH	Portuguese	9.018	0.806	11.192	0.667
gACH	English	11.899	0.785	15.156	0.828
gACH	Biology	10.115	0.618	16.356	0.868
gACH	Physics	12.276	0.661	18.560	0.935
gACH	Chemistry	13.064	0.691	18.916	0.945
gACH	Math	16.035	0.891	17.990	0.919

parameterEstimates(GID, stand = T) %>% 
  filter(op == "=~") %>% 
  select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>% 
  kable(digits = 3, format = "pandoc", caption = "Factor Loadings")

Factor Loadings
Latent Factor	Indicator	B	SE	Z	p-value	Beta
Gf	N	4.416	0.452	9.777	0.000	0.660
Gf	I	0.918	0.126	7.262	0.000	0.506
Gf	RL	2.109	0.226	9.333	0.000	0.633
Gf	RG	2.576	0.209	12.309	0.000	0.847
Gv	VZ	4.548	0.765	5.948	0.000	0.924
Gv	CF	3.440	0.574	5.996	0.000	0.533
Gs	P1	3.411	0.559	6.107	0.000	0.505
Gs	P2	2.453	0.335	7.322	0.000	0.664
Gs	P3	3.389	0.513	6.612	0.000	0.554
Gfl	FF	1.447	0.317	4.562	0.000	0.342
Gfl	FI1	3.842	0.514	7.472	0.000	0.656
Gfl	FI2	3.416	0.413	8.272	0.000	0.867
Gsm	MV	0.754	0.134	5.635	0.000	0.423
Gsm	MA1	2.701	0.285	9.460	0.000	0.767
Gsm	MA2	1.535	0.168	9.122	0.000	0.700
Gc	V1	2.496	0.221	11.286	0.000	0.800
Gc	V2	1.535	0.188	8.152	0.000	0.570
Gc	V3	2.182	0.202	10.827	0.000	0.755
gINT	Gf	0.701	0.099	7.098	0.000	0.574
gINT	Gv	0.409	0.099	4.139	0.000	0.379
gINT	Gs	0.663	0.120	5.531	0.000	0.552
gINT	Gfl	0.234	0.080	2.924	0.003	0.228
gINT	Gsm	0.585	0.100	5.873	0.000	0.505
gINT	Gc	0.512	0.090	5.698	0.000	0.456
gACH	Portuguese	9.096	0.804	11.317	0.000	0.673
gACH	English	11.918	0.784	15.193	0.000	0.829
gACH	Biology	10.117	0.618	16.362	0.000	0.868
gACH	Physics	12.239	0.663	18.463	0.000	0.932
gACH	Chemistry	13.029	0.692	18.824	0.000	0.942
gACH	Math	16.032	0.891	17.986	0.000	0.918

The relationship between the psychometric and achievement g is r = 0.695.

semPaths(GFR, "model", "std", title = F, residuals = F, groups = "AllLats", pastel = T, mar = c(2, 1, 3, 1), layout = "tree", exoCov = T)

Joint Confirmatory Factor Analysis

AchCog <- c("N", "I", "RL", "RG", "VZ", "CF", "P1", "P2", "P3", "FF", "FI1", "FI2", "MV", "MA1", "MA2", "V1", "V2", "V3", "Portuguese", "English", "Biology", "Physics", "Chemistry", "Math")
AchCog <- BAF[AchCog]

fa.parallel(AchCog)

## Parallel analysis suggests that the number of factors =  6  and the number of components =  3

BAFEFA <- fa(AchCog, nfactors = 1); print(BAFEFA$loadings, cutoff = 0.2)

## 
## Loadings:
##            MR1   
## N           0.683
## I           0.437
## RL          0.546
## RG          0.639
## VZ          0.568
## CF          0.458
## P1          0.353
## P2          0.450
## P3          0.431
## FF               
## FI1         0.285
## FI2         0.310
## MV          0.455
## MA1         0.543
## MA2         0.505
## V1          0.497
## V2          0.345
## V3          0.472
## Portuguese  0.684
## English     0.731
## Biology     0.749
## Physics     0.779
## Chemistry   0.777
## Math        0.792
## 
##                  MR1
## SS loadings    7.337
## Proportion Var 0.306

BAFEFA <- fa(AchCog, nfactors = 6); print(BAFEFA$loadings, cutoff = 0.3)

## Loading required namespace: GPArotation

## 
## Loadings:
##            MR1    MR2    MR4    MR3    MR5    MR6   
## N                                              0.422
## I                  0.803                            
## RL                 0.550                            
## RG                 0.648                            
## VZ                 0.796                            
## CF                 0.661                            
## P1                                             0.481
## P2                                             0.665
## P3                                             0.439
## FF                               0.473              
## FI1                              0.708              
## FI2                              0.729              
## MV                 0.404                0.372       
## MA1                                     0.699       
## MA2                                     0.818       
## V1                        0.745                     
## V2                        0.466                     
## V3                        0.796                     
## Portuguese  0.573                                   
## English     0.774                                   
## Biology     0.880                                   
## Physics     0.914                                   
## Chemistry   0.977                                   
## Math        0.892                                   
## 
##                  MR1   MR2   MR4   MR3   MR5   MR6
## SS loadings    4.450 2.686 1.515 1.362 1.343 1.133
## Proportion Var 0.185 0.112 0.063 0.057 0.056 0.047
## Cumulative Var 0.185 0.297 0.360 0.417 0.473 0.520

GJCFAF <- '
Ach =~ Portuguese + English + Biology + Physics + Chemistry + Math
Gf =~ I + RL + RG + VZ + CF
Gc =~ V1 + V2 + V3
Gfl =~ FF + FI1 + FI2
Gs =~ N + P1 + P2 + P3 
Gsm =~ MV + MA1 + MA2'

GJCFA <- '
Ach =~ Portuguese + English + Biology + Physics + Chemistry + Math
Gf =~ I + RL + RG + VZ + CF
Gc =~ V1 + V2 + V3
Gfl =~ FF + FI1 + FI2
Gs =~ N + P1 + P2 + P3 
Gsm =~ MV + MA1 + MA2

g =~ Ach + Gf + Gc + Gfl + Gs + Gsm'

GF <- cfa(GJCFAF, data = BAF, std.lv = T)
G <- cfa(GJCFA, data = BAF, std.lv = T)

round(cbind("Group Factors" = fitMeasures(GF, FITM),
            "Higher-order" = fitMeasures(G, FITM)), 3)

##                Group Factors Higher-order
## chisq                449.059      459.970
## df                   237.000      246.000
## npar                  63.000       54.000
## cfi                    0.917        0.916
## rmsea                  0.063        0.062
## rmsea.ci.lower         0.054        0.053
## rmsea.ci.upper         0.072        0.071
## aic                31898.973    31891.884
## bic                32114.187    32076.354
## tli                    0.903        0.906

p = 0.282. The g model fits better. The higher-order model fits better almost across the board. The theory-based combined model still fits better even though it appears that the achievement g should really just be a group factor for g.

semPaths(G, "model", "std", title = F, residuals = F, groups = "JCFALats", bifactor = "g", pastel = T, mar = c(2, 1, 3, 1), layout = "tree2", exoCov = T)

parameterEstimates(G, stand = T) %>% 
  filter(op == "=~") %>% 
  select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>% 
  kable(digits = 3, format = "pandoc", caption = "Factor Loadings")

Factor Loadings
Latent Factor	Indicator	B	SE	Z	p-value	Beta
Ach	Portuguese	6.326	0.625	10.117	0.000	0.667
Ach	English	8.344	0.654	12.754	0.000	0.828
Ach	Biology	7.093	0.528	13.443	0.000	0.868
Ach	Physics	8.608	0.590	14.586	0.000	0.935
Ach	Chemistry	9.161	0.621	14.752	0.000	0.945
Ach	Math	11.247	0.786	14.311	0.000	0.919
Gf	I	0.751	0.115	6.541	0.000	0.514
Gf	RL	1.784	0.220	8.094	0.000	0.665
Gf	RG	1.975	0.224	8.825	0.000	0.806
Gf	VZ	1.549	0.272	5.703	0.000	0.442
Gf	CF	1.812	0.353	5.136	0.000	0.394
Gc	V1	2.161	0.209	10.347	0.000	0.768
Gc	V2	1.389	0.176	7.888	0.000	0.572
Gc	V3	2.050	0.196	10.472	0.000	0.787
Gfl	FF	1.329	0.296	4.496	0.000	0.336
Gfl	FI1	3.799	0.440	8.633	0.000	0.693
Gfl	FI2	3.048	0.338	9.006	0.000	0.826
Gs	N	2.266	0.863	2.625	0.009	0.832
Gs	P1	1.190	0.465	2.558	0.011	0.441
Gs	P2	0.808	0.306	2.640	0.008	0.546
Gs	P3	1.249	0.477	2.617	0.009	0.511
Gsm	MV	0.682	0.122	5.567	0.000	0.425
Gsm	MA1	2.507	0.283	8.853	0.000	0.790
Gsm	MA2	1.332	0.156	8.566	0.000	0.675
g	Ach	1.017	0.127	7.989	0.000	0.713
g	Gf	1.140	0.177	6.458	0.000	0.752
g	Gc	0.744	0.119	6.268	0.000	0.597
g	Gfl	0.452	0.098	4.592	0.000	0.412
g	Gs	2.828	1.149	2.461	0.014	0.943
g	Gsm	0.809	0.135	6.013	0.000	0.629

How Consistent is g?

PSYTOG <- c(0.379, 0.290, 0.363, 0.486, 0.350, 0.202, 0.279, 0.367, 0.306, 0.078, 0.150, 0.198, 0.214, 0.387, 0.354, 0.365, 0.260, 0.344)
ACHTOG <- c(0.673, 0.829, 0.868, 0.932, 0.942, 0.918)
PSYSEPG <- c(0.514, 0.404, 0.518, 0.638, 0.497, 0.467, 0.355, 0.508, 0.502, 0.166, 0.352, 0.404, 0.268, 0.482, 0.426, 0.464, 0.363, 0.507)
ACHSEPG <- c(0.657, 0.826, 0.867, 0.936, 0.946, 0.917)
PSYTOS <- c(0.254, 0.195, 0.244, 0.326, 0.300, 0.173, 0.194, 0.255, 0.213, 0.074, 0.142, 0.187, 0.159, 0.289, 0.263, 0.289, 0.206, 0.273)
PSYSEPS <- c(0.200, 0.157, 0.201, 0.248, 0.282, 0.265, 0.125, 0.179, 0.177, 0.125, 0.266, 0.305, 0.166, 0.299, 0.265, 0.284, 0.222, 0.310)

TOTTOG <- c(0.379, 0.290, 0.363, 0.486, 0.350, 0.202, 0.279, 0.367, 0.306, 0.078, 0.150, 0.198, 0.214, 0.387, 0.354, 0.365, 0.260, 0.344, 0.673, 0.829, 0.868, 0.932, 0.942, 0.918)
TOTSEPG <- c(0.514, 0.404, 0.518, 0.638, 0.497, 0.467, 0.355, 0.508, 0.502, 0.166, 0.352, 0.404, 0.268, 0.482, 0.426, 0.464, 0.363, 0.507, 0.657, 0.826, 0.867, 0.936, 0.946, 0.917) 
GJCFA <- c(0.785, 0.387, 0.500, 0.606, 0.332, 0.296, 0.416, 0.515, 0.482, 0.138, 0.286, 0.340, 0.267, 0.497, 0.425, 0.458, 0.341, 0.470, 0.476, 0.590, 0.619, 0.667, 0.674, 0.655)
ACHJCFA <- c(0.476, 0.590, 0.619, 0.667, 0.674, 0.655)

message(paste("PSY g Loadings: Together and Apart? Pearson = ", cor(PSYTOG, PSYSEPG), "Spearman = ", cor(PSYTOG, PSYSEPG, method = "spearman"), "Tucker = ", CONGO(PSYTOG, PSYSEPG)))

## PSY g Loadings: Together and Apart? Pearson =  0.860252498899787 Spearman =  0.797109037562556 Tucker =  0.98681796192903

message(paste("PSY s Loadings: Together and Apart? Pearson = ", cor(PSYTOS, PSYSEPS), "Spearman = ", cor(PSYTOS, PSYSEPS, method = "spearman"), "Tucker = ", CONGO(PSYTOS, PSYSEPS)))

## PSY s Loadings: Together and Apart? Pearson =  0.508903334284125 Spearman =  0.458914789965189 Tucker =  0.966088155160878

message(paste("ACH g Loadings: Together and Apart? Pearson = ", cor(ACHTOG, ACHSEPG), "Spearman = ", cor(ACHTOG, ACHSEPG, method = "spearman"), "Tucker = ", CONGO(ACHTOG, ACHSEPG)))

## ACH g Loadings: Together and Apart? Pearson =  0.999893928101252 Spearman =  1 Tucker =  0.999968025516256

message(paste("ACH g Loadings: Combined versus JCFA? Pearson = ", cor(ACHTOG, ACHJCFA), "Spearman = ", cor(ACHTOG, ACHJCFA, method = "spearman"), "Tucker = ", CONGO(ACHTOG, ACHJCFA)))

## ACH g Loadings: Combined versus JCFA? Pearson =  0.999964797764591 Spearman =  1 Tucker =  0.999994015201659

message(paste("ACH g Loadings: Apart versus JCFA? Pearson = ", cor(ACHSEPG, ACHJCFA), "Spearman = ", cor(ACHSEPG, ACHJCFA, method = "spearman"), "Tucker = ", CONGO(ACHSEPG, ACHJCFA)))

## ACH g Loadings: Apart versus JCFA? Pearson =  0.99992847083121 Spearman =  1 Tucker =  0.999988994984006

message(paste("Total g Loadings: Together and Apart? Pearson = ", cor(TOTTOG, TOTSEPG), "Spearman = ", cor(TOTTOG, TOTSEPG, method = "spearman"), "Tucker = ", CONGO(TOTTOG, TOTSEPG)))

## Total g Loadings: Together and Apart? Pearson =  0.971010581931045 Spearman =  0.914546662193708 Tucker =  0.980773371000493

message(paste("Combined versus JCFA g Loadings? Pearson = ", cor(TOTTOG, GJCFA), "Spearman = ", cor(TOTTOG, GJCFA, method = "spearman"), "Tucker = ", CONGO(TOTTOG, GJCFA)))

## Combined versus JCFA g Loadings? Pearson =  0.768173838110543 Spearman =  0.900869565217391 Tucker =  0.938481162146067

The \(\omega_h\) of the solo psychometric battery was 0.791, \(\omega_t\) was 0.827, ECV was 0.785, H was 0.827, PUC was 0.876, and the total variance explained by g was 20%. The group factor H values ranged from 0.075 to 0.197, \(\omega_{hs}\) ranged from 0.053 to 0.144, \(\omega_t\) from 0.347 to 0.645, ECV from 0.017 to 0.049, and total variance from 0.8 to 1.3%. In the corresponding bifactor model - where Gv's loadings were set to \(\sqrt{cov(VZ, CF)}\) - the \(\omega_h\) was 0.730, \(\omega_t\) was 0.891, ECV was 0.423, H was 0.837, PUC was 0.876 (thanks to setting Gv), and the total variance explained by g was 21%. The group factor H values ranged from 0.404 to 0.695, \(\omega_{hs}\) ranged from 0.201 to 0.573, \(\omega_t\) from 0.626 to 0.794, ECV from 0.051 to 0.120, and total variance from 2.5 to 5.9%. As is typical, the bifactor model increased the reliability of group factors.

For the combined battery, for g, the values were 0.849, 0.872, 0.865, 0.965, 0.877, and 26.1%. The group factor ranges were 0.059 to 0.225, 0.100 to 0.160, 0.107 to 0.516, 0.008 to 0.037, and 0.3 to 1.1%. For the JCFA, for g, the values were 0.848, 0.886, 0.816, 0.902, 0.855, and 24.2%. The group factor ranges were 0.016 to 0.379, 0.008 to 0.164, 0.273 to 0.842, 0.002 to 0.078, and 0.1 to 2.3%.

Prediction: More than g?

In the SEM, aggregate grades were predicted at r = 0.620 with g alone; 0.628 with Gf, 0.624 with Gv, 0.612 with Gs, 0.544 with Gfl, 0.639 with Gsm, and 0.611 with Gc. With all group factors in the model, the multiple-R jumped to a mere 0.682. Portuguese was predicted at 0.615 by g alone; 0.618 with Gf, 0.600 with Gv or Gs, 0.496 with Gfl, 0.652 with Gsm, and 0.641 with Gc, but just 0.652 altogether. English was predicated at 0.602 by g alone; 0.602 with Gf, 0.565 with Gv, 0.584 with Gs, 0.536 with Gfl, 0.643 with Gsm, and 0.640 with Gc, but just 0.683 altogether. Since the results were looking dismal at this point, I decided to just provide the biology, physics, chemistry, and math g-only and altogether results; these were 0.574 versus 0.618, 0.607 versus 0.683, 0.587 versus 0.652, 0.663 versus 0.686. There was very little more than g.

These results may be different in a bifactor model, and clearly some degree of confounding between levels may explain the results here, but this is not a hopeful result for finding more than g. It may have been predicted by the Schmid-Leiman derived dimensionalities. If that is the case, then the consistently replicated finding that the bifactor generates absolutely larger group factors that seem to have more predictive validity independent of g may also be true here. Due to identification issues this can be hard to legitimately assess. Because the point was to see if common western findings replicated in Brazil, I pursued this possibility. Below I have given the \(R^2\) figures from regressions of the regression-based (the Bartlett method is not admissible for assessing the predictive validity of higher-order factors for obvious reasons - try it out and see what happens) factor scores in a table and a plot for models with g alone and with group factors added. The incremental \(R^2\) is clearly modest.

BAF <- read.csv("BAFANA.csv") #No cognitive NAs
BAF$GPA <- BAF$Portuguese + BAF$English + BAF$Math + BAF$Biology + BAF$Physics + BAF$Chemistry
BAF$GFT = BAF$N + BAF$I + BAF$RL + BAF$RG
BAF$GVT = BAF$VZ + BAF$CF
BAF$GST = BAF$P1 + BAF$P2 + BAF$P3
BAF$GFLT = BAF$FF + BAF$FI1 + BAF$FI2
BAF$GSMT = BAF$MV + BAF$MA1 + BAF$MA2
BAF$GCT = BAF$V1 + BAF$V2 + BAF$V3
HoG <- cfa(HigherOrder, data = BAF, std.lv = T)
FS <- cbind(BAF, lavPredict(HoG))

PLMG <- lm(Portuguese ~ gINT, FS); PLMM <- lm(Portuguese ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
ELMG <- lm(English ~ gINT, FS); ELMM <- lm(English ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
MLMG <- lm(Math ~ gINT, FS); MLMM <- lm(Math ~ gINT + Gc + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
BLMG <- lm(Biology ~ gINT, FS); BLMM <- lm(Biology ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
HLMG <- lm(Physics ~ gINT, FS); HLMM <- lm(Physics ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
CLMG <- lm(Chemistry ~ gINT, FS); CLMM <- lm(Chemistry ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
GLMG <- lm(GPA ~ gINT, FS); GLMM <- lm(GPA ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)

rdf <- data.frame("Subject" = c("Portuguese", "English", "Mathematics", "Biology", "Physics", "Chemistry", "Sum Grade"), "g" = c(summary(PLMG)$r.squared, summary(ELMG)$r.squared, summary(MLMG)$r.squared, summary(BLMG)$r.squared, summary(HLMG)$r.squared, summary(CLMG)$r.squared, summary(GLMG)$r.squared), "gPlus" = c(summary(PLMM)$r.squared, summary(ELMM)$r.squared, summary(MLMM)$r.squared, summary(BLMM)$r.squared, summary(HLMM)$r.squared, summary(CLMM)$r.squared, summary(GLMM)$r.squared), "gAdj" = c(summary(PLMG)$adj.r.squared, summary(ELMG)$adj.r.squared, summary(MLMG)$adj.r.squared, summary(BLMG)$adj.r.squared, summary(HLMG)$adj.r.squared, summary(CLMG)$adj.r.squared, summary(GLMG)$adj.r.squared), "gPlusAdj" = c(summary(PLMM)$adj.r.squared, summary(ELMM)$adj.r.squared, summary(MLMM)$adj.r.squared, summary(BLMM)$adj.r.squared, summary(HLMM)$adj.r.squared, summary(CLMM)$adj.r.squared, summary(GLMM)$adj.r.squared))
rdf$gProp <- rdf$g/rdf$gPlus
rdf$gPropAdj <- rdf$gAdj/rdf$gPlusAdj
kable(rdf)

Subject	g	gPlus	gAdj	gPlusAdj	gProp	gPropAdj
Portuguese	0.2695278	0.3198890	0.2662813	0.3013406	0.8425667	0.8836557
English	0.2609840	0.2926898	0.2576848	0.2733114	0.8916746	0.9428251
Mathematics	0.3068056	0.3293479	0.3037248	0.3110574	0.9315548	0.9764268
Biology	0.2358436	0.2681422	0.2324474	0.2481824	0.8795469	0.9365989
Physics	0.2646097	0.2853870	0.2613120	0.2657188	0.9271960	0.9834156
Chemistry	0.2508421	0.2726900	0.2475125	0.2528543	0.9198802	0.9788743
Sum Grade	0.3604381	0.3879309	0.3575701	0.3710850	0.9291295	0.9635801

The summary grade measure is predicted at r = 0.600 with g alone versus 0.623 with group factors added (incremental \(R^2\) = 0.0275). The average g proportion of the (non-adjusted) \(R^2\) is 90.3% (95.2% for adjusted). If a correlated group factors model is used, the \(R^2\) values using all group factors are, in order, 0.319, 0.292, 0.328, 0.268, 0.285, 0.272, and 0.387: a restatement of the findings with g, as should occur. Using only the Gf (which produces similar results to the other factors) from this model, we can see that the group factors are just reflecting the g variance, as its \(R^2\) values, in order, are 0.239, 0.216, 0.291, 0.203, 0.219, 0.220, and 0.315. This is the major problem with correlated group factors models, that they just reflect g so other group factors have limited incremental validity over them. In fact, they seem to have validity to the extent they're related to g. Recall from the plot above that g loaded on the group factors at 0.78 (Gf), 0.66 (Gv), 0.81 (Gs), 0.50 (Gfl), 0.61 (Gsm), and 0.62 (Gc). The \(R^2\) values for Gv are 0.146, 0.119, 0.164, 0.107, 0.138, 0.130, and 0.182; for Gs, these are 0.213, 0.203, 0.254, 0.197, 0.224, 0.203, and 0.292; for Gfl, 0.022, 0.052, 0.055, 0.030, 0.050, 0.042, and 0.063; for Gsm, these are 0.205, 0.201, 0.199, 0.181, 0.211, 0.191, and 0.252; and for Gc, these are 0.169, 0.184, 0.136, 0.140, 0.129, 0.131, and 0.207. The correlation between the higher-order g loadings and the validity for the summary grade measure (the most reliable since it is a composite score) is r = 0.881 (shown below). With results like this, just what are practitioners opposed to the notion of g supposed to do? Ignore g and end up predicting with g anyway? Clearly nested models do not produce different latent variances that somehow thus alter the predictive validity through model choice: when you have a positive manifold, a well-fitting model will reiterate that the positive manifold exists in accord with how strong it is and prediction will not be altered by model choice within the realm of models that produce adequate fit.

corrdf <- data.frame("GVal" = c(0.315, 0.182, 0.292, 0.063, 0.252, 0.207), "Gl" = c(0.78, 0.66, 0.81, 0.50, 0.61, 0.62))
cor(corrdf$Gl, corrdf$GVal); cor(corrdf$Gl, corrdf$GVal, method = "spearman"); CONGO(corrdf$Gl, corrdf$GVal)

## [1] 0.88075

## [1] 0.7142857

## [1] 0.9721698

ggplot(corrdf, aes(x = Gl, y = GVal)) + geom_point() + geom_smooth(method = lm, color = "steelblue4", formula = 'y ~ x')+ labs(title = "Relationship between Group Factor g Loading and Validity", x = "g Loading", y = "Validity") + theme_minimal() + theme(legend.position = "none", text = element_text(size = 12, family = "serif"), plot.title = element_text(hjust = 0.5))

rdfp <- rdf[, -c(4:7)]
rdf2 <- melt(rdfp, id.vars = "Subject")
names(rdf2)[names(rdf2) == "variable"] <- "Model"
HOP <- ggplot(rdf2, aes(x = Subject, y = value, fill = Model)) + geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7, show.legend = T) + coord_cartesian(ylim = c(0, 0.4)) + ylab(TeX("$R^2$")) + scale_fill_manual(values = c("#2D4059", "#DECDC3"), breaks = c("g", "gPlus"), labels = c("Higher-order g", "Higher-order g and Group Factors")) + theme_bw() + theme(legend.position = c(0.16, 0.85), legend.background = element_blank(), text = element_text(size = 12, family = "serif"), axis.title.x = element_blank())

What about with a bifactor model?

Bifactor <- '
Gf =~ N + I + RL + RG
VZ ~~ CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2 
Gc =~ V1 + V2 + V3

gINT =~ N + I + RL + RG + VZ + CF + P1 + P2 + P3 + FF + FI1 + FI2 + MV + MA1 + MA2 + V1 + V2 + V3
RG ~~ 0.5*RG'

BF <- cfa(Bifactor, data = BAF, std.lv = T, orthogonal = T)
fitMeasures(BF, FITM)

##          chisq             df           npar            cfi          rmsea 
##        149.661        119.000         52.000          0.971          0.033 
## rmsea.ci.lower rmsea.ci.upper            aic            bic            tli 
##          0.011          0.049      22835.159      23013.939          0.963

BS <- cbind(BAF, lavPredict(BF))
resid(BF, "cor")

## $type
## [1] "cor.bollen"
## 
## $cov
##     N      I      RL     RG     P1     P2     P3     FF     FI1    FI2   
## N    0.000                                                               
## I   -0.039  0.000                                                        
## RL  -0.041  0.016  0.000                                                 
## RG   0.002  0.000  0.000  0.000                                          
## P1   0.097  0.055 -0.070 -0.040  0.000                                   
## P2   0.071 -0.130 -0.001 -0.027  0.000  0.000                            
## P3   0.003 -0.009 -0.069 -0.024  0.000  0.000  0.000                     
## FF   0.027 -0.071  0.045  0.005 -0.082 -0.002  0.058  0.000              
## FI1 -0.034 -0.003  0.033  0.089 -0.115 -0.029  0.066  0.000  0.000       
## FI2 -0.041  0.087  0.025 -0.004 -0.059  0.027  0.041  0.000  0.000  0.000
## MV   0.001  0.046  0.086  0.037 -0.033 -0.059 -0.020 -0.048 -0.076 -0.072
## MA1  0.035 -0.009 -0.012  0.044  0.088 -0.037 -0.003 -0.002  0.003 -0.040
## MA2  0.000 -0.068  0.113  0.048  0.029 -0.039 -0.005  0.041 -0.002 -0.053
## V1   0.050 -0.072 -0.004 -0.066 -0.036 -0.020 -0.039  0.033  0.059  0.073
## V2  -0.010 -0.006 -0.014 -0.040 -0.004 -0.038 -0.054  0.030  0.094  0.031
## V3  -0.022  0.021  0.084 -0.018 -0.007 -0.053  0.007 -0.014  0.009  0.090
## VZ  -0.029  0.099  0.034  0.009 -0.079  0.034  0.021 -0.088 -0.009 -0.053
## CF  -0.070  0.031  0.037  0.016 -0.075  0.018  0.068 -0.010 -0.006  0.033
##     MV     MA1    MA2    V1     V2     V3     VZ     CF    
## N                                                          
## I                                                          
## RL                                                         
## RG                                                         
## P1                                                         
## P2                                                         
## P3                                                         
## FF                                                         
## FI1                                                        
## FI2                                                        
## MV   0.000                                                 
## MA1  0.000  0.000                                          
## MA2  0.000  0.000  0.000                                   
## V1   0.006  0.025  0.041  0.000                            
## V2  -0.010  0.040 -0.039  0.000  0.000                     
## V3  -0.054 -0.015 -0.022  0.000  0.000  0.000              
## VZ   0.101 -0.038  0.056 -0.061  0.006 -0.009  0.000       
## CF   0.061 -0.078 -0.064 -0.013  0.109  0.050  0.000  0.000

BR <- (resid(BF, "cor"))
mean(abs(BR$cov))

## [1] 0.03460028

PLMG <- lm(Portuguese ~ gINT, BS); PLMM <- lm(Portuguese ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); PLMMS <- lm(Portuguese ~ Gf + Gs + Gfl + Gsm + Gc, BS)
ELMG <- lm(English ~ gINT, BS); ELMM <- lm(English ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); ELMMS <- lm(English ~ Gf + Gs + Gfl + Gsm + Gc, BS)
MLMG <- lm(Math ~ gINT, BS); MLMM <- lm(Math ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); MLMMS <- lm(Math ~ Gf + Gs + Gfl + Gsm + Gc, BS)
BLMG <- lm(Biology ~ gINT, BS); BLMM <- lm(Biology ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); BLMMS <- lm(Biology ~ Gf + Gs + Gfl + Gsm + Gc, BS)
HLMG <- lm(Physics ~ gINT, BS); HLMM <- lm(Physics ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); HLMMS <- lm(Physics ~ Gf + Gs + Gfl + Gsm + Gc, BS)
CLMG <- lm(Chemistry ~ gINT, BS); CLMM <- lm(Chemistry ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); CLMMS <- lm(Chemistry ~ Gf + Gs + Gfl + Gsm + Gc, BS)
GLMG <- lm(GPA ~ gINT, BS); GLMM <- lm(GPA ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); GLMMS <- lm(GPA ~ Gf + Gs + Gfl + Gsm + Gc, BS)

rdf <- data.frame("Subject" = c("Portuguese", "English", "Mathematics", "Biology", "Physics", "Chemistry", "Sum Grade"), "g" = c(summary(PLMG)$r.squared, summary(ELMG)$r.squared, summary(MLMG)$r.squared, summary(BLMG)$r.squared, summary(HLMG)$r.squared, summary(CLMG)$r.squared, summary(GLMG)$r.squared), "gPlus" = c(summary(PLMM)$r.squared, summary(ELMM)$r.squared, summary(MLMM)$r.squared, summary(BLMM)$r.squared, summary(HLMM)$r.squared, summary(CLMM)$r.squared, summary(GLMM)$r.squared), "gAdj" = c(summary(PLMG)$adj.r.squared, summary(ELMG)$adj.r.squared, summary(MLMG)$adj.r.squared, summary(BLMG)$adj.r.squared, summary(HLMG)$adj.r.squared, summary(CLMG)$adj.r.squared, summary(GLMG)$adj.r.squared), "gPlusAdj" = c(summary(PLMM)$adj.r.squared, summary(ELMM)$adj.r.squared, summary(MLMM)$adj.r.squared, summary(BLMM)$adj.r.squared, summary(HLMM)$adj.r.squared, summary(CLMM)$adj.r.squared, summary(GLMM)$adj.r.squared), "Sansg" = c(summary(PLMMS)$r.squared, summary(ELMMS)$r.squared, summary(MLMMS)$r.squared, summary(BLMMS)$r.squared, summary(HLMMS)$r.squared, summary(CLMMS)$r.squared, summary(GLMMS)$r.squared), "SansgAdj" = c(summary(PLMMS)$adj.r.squared, summary(ELMMS)$adj.r.squared, summary(MLMMS)$adj.r.squared, summary(BLMMS)$adj.r.squared, summary(HLMMS)$adj.r.squared, summary(CLMMS)$adj.r.squared, summary(GLMMS)$adj.r.squared))
rdf$gProp <- rdf$g/rdf$gPlus
rdf$gPropAdj <- rdf$gAdj/rdf$gPlusAdj
kable(rdf)

Subject	g	gPlus	gAdj	gPlusAdj	Sansg	SansgAdj	gProp	gPropAdj
Portuguese	0.2830000	0.3234593	0.2798133	0.3050082	0.1098939	0.0897557	0.8749170	0.9173962
English	0.2786460	0.3081881	0.2754256	0.2892343	0.1146588	0.0945374	0.9041426	0.9522578
Mathematics	0.3301322	0.3486620	0.3271550	0.3308982	0.0951585	0.0746869	0.9468546	0.9886877
Biology	0.2510367	0.2808534	0.2477080	0.2612403	0.1102747	0.0901451	0.8938355	0.9481998
Physics	0.2969123	0.3139872	0.2937595	0.2951061	0.0749460	0.0538260	0.9456191	0.9954366
Chemistry	0.2621533	0.2833122	0.2588740	0.2637662	0.0885356	0.0679142	0.9253160	0.9814526
Sum Grade	0.3918664	0.4140936	0.3891393	0.3979678	0.1138525	0.0936208	0.9463232	0.9778162

The summary grade measure is predicted at r = 0.626 with g alone versus 0.644 with group factors added (incremental \(R^2\) = 0.0222). The average g proportion is 92% (96.6% for adjusted). In case it is not clear, the "Sans g" columns are not present in the higher-order model because they produce the same result as a model with g plus group factors due to how scoring is done.

rdfp <- rdf[, -c(4:9)]
rdf2 <- melt(rdfp, id.vars = "Subject")
names(rdf2)[names(rdf2) == "variable"] <- "Model"
BFP <- ggplot(rdf2, aes(x = Subject, y = value, fill = Model)) + geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7, show.legend = T) + coord_cartesian(ylim = c(0, 0.4)) + ylab(TeX("$R^2$")) + scale_fill_manual(values = c("#900D0D", "#D9C6A5"), breaks = c("g", "gPlus"), labels = c("Bifactor g", "Bifactor g and Group Factors")) + theme_bw() + theme(legend.position = c(0.14, 0.85), legend.background = element_blank(), text = element_text(size = 12, family = "serif"), axis.title.x = element_blank())

ggarrange(HOP, BFP, ncol = 2)

Discussion

I aimed to assess if four commonly replicated findings from western countries replicated in a Brazilian dataset. The first of these was confirmed: there is CFA and dimensional evidence for a g factor. The second was as well, at least for a higher-order model, which, it should be said, reduces the importance of group factors compared to the bifactor model, which I tested (not shown) and found to have identification problems. It should also be said that bifactor models generally show limited interpretive relevance for group factors, but they still show greater predictive validity and absolute importance, though they also do not absolutely reduce g, they simply reduce the uniqueness. Identifying why that is can be complicated; it may be that they model out what should really be the minor residual covariances found in every battery (Gignac's remarks on \(\omega_h\) in the HOF and BF are redolent, as are his remarks on the extremely low group factor reliabilities in higher-order models; see Gignac, 2008, his simulation study for \(\omega_h\), and the possibility of proportionality/tetrad constraints violations). The third finding was found to be confirmed to a similar degree to Zaboski, Kranzler & Gage's (2018) result at r = 0.695 compared to their 0.735 assessed via different methods with various indicators. This result may be weaker than the result from Kaufman et al. (2012) because of modeling constraints: there were too few indicators and there was too much missingness for some of the achievement measures - which may have been less reliable and cognitive than achievement tests since they were just grades. The fourth finding, like the second and for some of the same reasons, is prospectively confirmed in a similar manner to McGilll & Busse (2015), allowing similar conclusions to Jensen (1984) or, from wage prediction (also similar to occupational performance), Ganzach & Patel (2018). The common finding that the bifactor model leads to greater incremental validity for group factors was not supported (this finding only applies in cases like this one where group factors are well-defined and not simply reflections of g).

References

Golino, H., & Gomes, C. (2014). Psychology data from the "BAFACALO project: The Brazilian Intelligence Battery based on two state-of-the-art models - Carroll's Model and the CHC model." Journal of Open Psychology Data, 2(1), e6. https://doi.org/10.5334/jopd.af

Benson, N. F., Beaujean, A. A., McGill, R. J., & Dombrowski, S. C. (2018). Revisiting Carroll's survey of factor-analytic studies: Implications for the clinical assessment of intelligence. Psychological Assessment, 30(8), 1028-1038. https://doi.org/10.1037/pas0000556

Kaufman, A. S., & O'Neal, M. R. (1988). Analysis of the Cognitive, Achievement, and General Factors Underlying the Woodcock-Johnson Psycho-Educational Battery. Journal of Clinical Child Psychology, 17(2), 143-151. https://doi.org/10.1207/s15374424jccp1702_6

Deary, I. J., Strand, S., Smith, P., & Fernandes, C. (2007). Intelligence and educational achievement. Intelligence, 35(1), 13-21. https://doi.org/10.1016/j.intell.2006.02.001

Kaufman, S. B., Reynolds, M. R., Liu, X., Kaufman, A. S., & McGrew, K. S. (2012). Are cognitive g and academic achievement g one and the same g? An exploration on the Woodcock-Johnson and Kaufman tests. Intelligence, 40(2), 123-138. https://doi.org/10.1016/j.intell.2012.01.009

Zaboski, B. A., Kranzler, J. H., & Gage, N. A. (2018). Meta-analysis of the relationship between academic achievement and broad abilities of the Cattell-horn-Carroll theory. Journal of School Psychology, 71, 42-56. https://doi.org/10.1016/j.jsp.2018.10.001

Gignac, G. E. (2008). Higher-order models versus direct hierarchical models: G as superordinate or breadth factor? Psychology Science, 50(1), 21-43.

McGill, R. J., & Busse, R. T. (2015). Incremental validity of the WJ III COG: Limited predictive effects beyond the GIA-E. School Psychology Quarterly: The Official Journal of the Division of School Psychology, American Psychological Association, 30(3), 353-365. https://doi.org/10.1037/spq0000094

Jensen, A. R. (1984). Test validity: G versus the specificity doctrine. Journal of Social and Biological Structures, 7(2), 93-118. https://doi.org/10.1016/S0140-1750(84)80001-9

Ganzach, Y., & Patel, P. C. (2018). Wages, mental abilities and assessments in large scale international surveys: Still not much more than g. Intelligence, 69, 1-7. https://doi.org/10.1016/j.intell.2018.03.014

Postscript: Network Models?

I'll do more with this later. That means doing it properly.

p_load(qgraph, qpcR, igraph, EGAnet)
EGA(CogData)

## Network estimated with:
##  • gamma = 0.5
##  • lambda.min.ratio = 0.1

## EGA Results:
## 
## Number of Dimensions:
## [1] 5
## 
## Items per Dimension:
##     items dimension
## FF     FF         1
## FI1   FI1         1
## FI2   FI2         1
## MV     MV         2
## MA1   MA1         2
## MA2   MA2         2
## I       I         3
## RL     RL         3
## RG     RG         3
## VZ     VZ         3
## CF     CF         3
## N       N         4
## P1     P1         4
## P2     P2         4
## P3     P3         4
## V1     V1         5
## V2     V2         5
## V3     V3         5

congraph <- EBICglasso(cor(CogData, use = "pairwise.complete.obs"), n = 230, lambda.min.ratio = 0.01, threshold = T) #threshold = T for specificity
congraphlay <- qgraph(congraph, layout = "spring", legend = F, groups = PsyLats, palette = "colorblind", theme = "Reddit", fade = F)

SNetwork <- ggmFit(congraph, as.matrix(cor(CogData, use = "pairwise.complete.obs")), sampleSize = 230, nPar = fitMeasures(HoG, "npar") - 1, ebicTuning = 0.5)

## Refitting network

SNetwork

## 
## ggmFit object:
##  Use plot(object) to plot the network structure 
##  Fit measures stored under object$fitMeasures 
## 
##              Measure    Value
## 1               nvar       18
## 2               nobs      171
## 3               npar       41
## 4                 df      130
## 5               fmin     1.01
## 6              chisq   464.80
## 7             pvalue < 0.0001
## 8     baseline.chisq  1598.54
## 9        baseline.df      153
## 10   baseline.pvalue < 0.0001
## 11               nfi     0.71
## 12               tli     0.73
## 13               rfi     0.66
## 14               ifi     0.77
## 15               rni     0.77
## 16               cfi     0.77
## 17             rmsea     0.11
## 18    rmsea.ci.lower    0.096
## 19    rmsea.ci.upper     0.12
## 20      rmsea.pvalue        0
## 21               rmr     0.16
## 22              srmr     0.16
## 23              logl -5298.51
## 24 unrestricted.logl -5066.11
## 25            aic.ll 10679.03
## 26           aic.ll2 10697.35
## 27             aic.x   204.80
## 28            aic.x2   546.80
## 29               bic 10819.99
## 30              bic2 10690.04
## 31              ebic 11057.00
## 32        ebicTuning     0.50

BassAckwards?

p_load(GPArotation)
bassAckward(CogData, nfactors = 2, organize = T, cut = 0.3, fm = "pca", main = "BAFACALO, PCA, 2"); bassAckward(CogData, nfactors = 3, organize = T, cut = 0.3, fm = "pca", main = "BAFACALO, PCA, 3"); bassAckward(CogData, nfactors = 6, organize = T, cut = 0.3, fm = "pca", main = "BAFACALO, PCA, 6"); bassAckward(CogData, nfactors = 6, organize = T, cut = 0.3, main = "BAFACALO, EFA, 6")

## 
## Call: bassAckward(r = CogData, nfactors = 2, fm = "pca", cut = 0.3, 
##     organize = T, main = "BAFACALO, PCA, 2")
## 
##  1 C1
##  2 C1
## Use print with the short = FALSE option to see the correlations, or use the summary command.

## 
## Call: bassAckward(r = CogData, nfactors = 3, fm = "pca", cut = 0.3, 
##     organize = T, main = "BAFACALO, PCA, 3")
## 
##  1 C1
##  2 C1
##  3 C1 C2
## Use print with the short = FALSE option to see the correlations, or use the summary command.

## 
## Call: bassAckward(r = CogData, nfactors = 6, fm = "pca", cut = 0.3, 
##     organize = T, main = "BAFACALO, PCA, 6")
## 
##  1 C1
##  2 C1
##  3 C1 C2
##  4 C1 C2 C4
##  5 C1 C2 C3 C4
##  6 C1 C2 C4 C5 C3
## Use print with the short = FALSE option to see the correlations, or use the summary command.

## 
## Call: bassAckward(r = CogData, nfactors = 6, cut = 0.3, organize = T, 
##     main = "BAFACALO, EFA, 6")
## 
##  1 F1
##  2 F1
##  3 F1 F2
##  4 F1 F2 F4
##  5 F1 F2 F5 F3
##  6 F1 F2 F5 F4 F6
## Use print with the short = FALSE option to see the correlations, or use the summary command.

Posterior Check for Regression

With n = 6 this will not be pretty (or necessary).

set.seed(42069)
PPC <- lm(Gl ~ GVal, corrdf)
pp_check(PPC)

Relating Factor Scores to Scale Scores

If a factor model summarizes all or nearly-all of the common, systematic variance in a set of variables, then factor scores and scale scores should correlate near-identically with a single correction for the reliability of the scale. If, on the other hand, the uniqueness of the variables is not random error, the correlation between factor scores and scale scores should be reduced considerably. This is not a very sensitive test, so it'll just be for fun and can really only be used like an intraocular trauma test. The difference from unity without systematic bias should just be due to (1) sampling error which will decrease with n and (2) random measurement error, indexed reasonably well by the composite reliability of the scale. This obviously only applies for unidimensional models like the correlated group-factor or higher-order models (without cross-loadings).

BAFFac <- c("Gf", "Gv", "Gs", "Gfl", "Gsm", "Gc", "GFT", "GVT", "GST", "GFLT", "GSMT", "GCT")
BafCorD <- FS[BAFFac]
cor(BafCorD)

##             Gf        Gv        Gs       Gfl       Gsm        Gc       GFT
## Gf   1.0000000 0.6339483 0.7700378 0.4757595 0.6102915 0.5691873 0.9593579
## Gv   0.6339483 1.0000000 0.6970264 0.3920400 0.5000345 0.5180147 0.5535624
## Gs   0.7700378 0.6970264 1.0000000 0.5207271 0.6398778 0.6260182 0.7143155
## Gfl  0.4757595 0.3920400 0.5207271 1.0000000 0.3439020 0.4401977 0.4050554
## Gsm  0.6102915 0.5000345 0.6398778 0.3439020 1.0000000 0.4780607 0.5501461
## Gc   0.5691873 0.5180147 0.6260182 0.4401977 0.4780607 1.0000000 0.5170801
## GFT  0.9593579 0.5535624 0.7143155 0.4050554 0.5501461 0.5170801 1.0000000
## GVT  0.4706547 0.9676231 0.5410603 0.2830057 0.3515902 0.3920426 0.3916701
## GST  0.5091375 0.4580357 0.8860434 0.3216697 0.4508433 0.4066359 0.4825259
## GFLT 0.3214442 0.2379008 0.3504058 0.9289825 0.2231831 0.2965826 0.2614736
## GSMT 0.4940748 0.4011040 0.5145364 0.2435686 0.9785253 0.3719393 0.4421852
## GCT  0.4488499 0.4238066 0.5089101 0.3744053 0.3913289 0.9743498 0.4056595
##            GVT       GST      GFLT      GSMT       GCT
## Gf   0.4706547 0.5091375 0.3214442 0.4940748 0.4488499
## Gv   0.9676231 0.4580357 0.2379008 0.4011040 0.4238066
## Gs   0.5410603 0.8860434 0.3504058 0.5145364 0.5089101
## Gfl  0.2830057 0.3216697 0.9289825 0.2435686 0.3744053
## Gsm  0.3515902 0.4508433 0.2231831 0.9785253 0.3913289
## Gc   0.3920426 0.4066359 0.2965826 0.3719393 0.9743498
## GFT  0.3916701 0.4825259 0.2614736 0.4421852 0.4056595
## GVT  1.0000000 0.3292943 0.1537225 0.2705081 0.3177133
## GST  0.3292943 1.0000000 0.1836168 0.3499651 0.3091759
## GFLT 0.1537225 0.1836168 1.0000000 0.1434847 0.2567441
## GSMT 0.2705081 0.3499651 0.1434847 1.0000000 0.2994833
## GCT  0.3177133 0.3091759 0.2567441 0.2994833 1.0000000

So latent Gf and its corresponding scale correlate at 0.959, Gv gets 0.968, Gs gets 0.886, Gfl gets 0.929, Gsm gets 0.979, and Gc gets 0.974. It's probably the case that there's very little systematic residual variance in these measures, and certainly too little to do interpret. By the common rule-of-thumb that measures don't have enough discriminant validity to be used independently when they correlate at or greater than r = 0.85, differences between total and factor scores would be rejected for this data. Worth noting is that if a bifactor model is used, the total score and group factor scores are not as related. That is because g is often far more prominent than group factors. There will still be inflation due to the factor score estimation method however. Since this battery did not suffer from factor collapse, it should yield quite large relationships between the group factor measures and their total scores even if those explain little of the variance compared to g in the proper medium to investigate that, A SEM. This is clear just by looking at the sum of squared correlations for g and group factors versus the communalities in this case (the former is higher than the latter and in the case of Gc, >1, an obvious impossibility and not something that happened in the model).

BAFFacBF <- c("gINT", "Gf", "Gs", "Gfl", "Gsm", "Gc", "GFT", "GST", "GFLT", "GSMT", "GCT")
BafCorDBF <- BS[BAFFacBF]
cor(BafCorDBF)

##           gINT           Gf          Gs         Gfl           Gsm            Gc
## gINT 1.0000000  0.177002146  0.18748141  0.10729449  1.309218e-01  0.1722406035
## Gf   0.1770021  1.000000000 -0.17553448 -0.02820754 -5.186266e-03 -0.1871133059
## Gs   0.1874814 -0.175534480  1.00000000 -0.07058037 -1.430262e-01 -0.1912705398
## Gfl  0.1072945 -0.028207538 -0.07058037  1.00000000 -1.023564e-01  0.0475077302
## Gsm  0.1309218 -0.005186266 -0.14302625 -0.10235637  1.000000e+00 -0.0770821352
## Gc   0.1722406 -0.187113306 -0.19127054  0.04750773 -7.708214e-02  1.0000000000
## GFT  0.8401092  0.518208022  0.01774068  0.00537240  4.511176e-02 -0.0001352586
## GST  0.7189841 -0.053850149  0.55764413 -0.02057356  9.512753e-03 -0.0491761077
## GFLT 0.3774979  0.061370040 -0.01847591  0.90561060 -1.793272e-02  0.0846003494
## GSMT 0.6079595  0.070187869 -0.05531146 -0.05954138  7.931666e-01  0.0010948420
## GCT  0.6229409 -0.062785647 -0.06067124  0.09064416 -1.928414e-05  0.8641181444
##                GFT          GST        GFLT         GSMT           GCT
## gINT  0.8401091582  0.718984065  0.37749793  0.607959475  6.229409e-01
## Gf    0.5182080216 -0.053850149  0.06137004  0.070187869 -6.278565e-02
## Gs    0.0177406840  0.557644129 -0.01847591 -0.055311459 -6.067124e-02
## Gfl   0.0053724000 -0.020573562  0.90561060 -0.059541382  9.064416e-02
## Gsm   0.0451117645  0.009512753 -0.01793272  0.793166563 -1.928414e-05
## Gc   -0.0001352586 -0.049176108  0.08460035  0.001094842  8.641181e-01
## GFT   1.0000000000  0.482525917  0.26147357  0.442185213  4.056595e-01
## GST   0.4825259166  1.000000000  0.18361684  0.349965125  3.091759e-01
## GFLT  0.2614735716  0.183616839  1.00000000  0.143484652  2.567441e-01
## GSMT  0.4421852126  0.349965125  0.14348465  1.000000000  2.994833e-01
## GCT   0.4056594783  0.309175920  0.25674408  0.299483350  1.000000e+00

GFTM <- c("N", "I", "RL", "RG"); GFTMD <- FS[GFTM]
GVTM <- c("VZ", "CF"); GVTMD <- FS[GVTM]
GSTM <- c("P1", "P2", "P3"); GSTMD <- FS[GSTM]
GFLTM <- c("FF", "FI1", "FI2"); GFLTMD <- FS[GFLTM]
GSMTM <- c("MV", "MA1", "MA2"); GSMTMD <- FS[GSMTM]
GCTM <- c("V1", "V2", "V3"); GCTMD <- FS[GCTM]

splitHalf(GFTMD, raw = T, covar = F, ci = .05)

## Split half reliabilities  
## Call: splitHalf(r = GFTMD, raw = T, covar = F, ci = 0.05)
## 
## Maximum split half reliability (lambda 4) =  0.75
## Guttman lambda 6                          =  0.71
## Average split half reliability            =  0.74
## Guttman lambda 3 (alpha)                  =  0.74
## Guttman lambda 2                          =  0.75
## Minimum split half reliability  (beta)    =  0.74
## Average interitem r =  0.42  with median =  0.42
##                                              2.5% 50% 97.5%
##  Quantiles of split half reliability      =  0.74 0.75 0.75

splitHalf(GVTMD, raw = T, covar = F, ci = .05)

## Split half reliabilities  
## Call: splitHalf(r = GVTMD, raw = T, covar = F, ci = 0.05)
## 
## Maximum split half reliability (lambda 4) =  0.7
## Guttman lambda 6                          =  0.54
## Average split half reliability            =  0.7
## Guttman lambda 3 (alpha)                  =  0.7
## Guttman lambda 2                          =  0.7
## Minimum split half reliability  (beta)    =  0.7
## Average interitem r =  0.54  with median =  0.54
##                                              2.5% 50% 97.5%
##  Quantiles of split half reliability      =  0.7 0.7 0.7

splitHalf(GSTMD, raw = T, covar = F, ci = .05)

## Split half reliabilities  
## Call: splitHalf(r = GSTMD, raw = T, covar = F, ci = 0.05)
## 
## Maximum split half reliability (lambda 4) =  0.58
## Guttman lambda 6                          =  0.49
## Average split half reliability            =  0.52
## Guttman lambda 3 (alpha)                  =  0.58
## Guttman lambda 2                          =  0.59
## Minimum split half reliability  (beta)    =  0.47
## Average interitem r =  0.32  with median =  0.34
##                                              2.5% 50% 97.5%
##  Quantiles of split half reliability      =  0.47 0.5 0.58

splitHalf(GFLTMD, raw = T, covar = F, ci = .05)

## Split half reliabilities  
## Call: splitHalf(r = GFLTMD, raw = T, covar = F, ci = 0.05)
## 
## Maximum split half reliability (lambda 4) =  0.67
## Guttman lambda 6                          =  0.58
## Average split half reliability            =  0.57
## Guttman lambda 3 (alpha)                  =  0.64
## Guttman lambda 2                          =  0.66
## Minimum split half reliability  (beta)    =  0.42
## Average interitem r =  0.37  with median =  0.31
##                                              2.5% 50% 97.5%
##  Quantiles of split half reliability      =  0.42 0.62 0.67

splitHalf(GSMTMD, raw = T, covar = F, ci = .05)

## Split half reliabilities  
## Call: splitHalf(r = GSMTMD, raw = T, covar = F, ci = 0.05)
## 
## Maximum split half reliability (lambda 4) =  0.65
## Guttman lambda 6                          =  0.58
## Average split half reliability            =  0.58
## Guttman lambda 3 (alpha)                  =  0.66
## Guttman lambda 2                          =  0.67
## Minimum split half reliability  (beta)    =  0.46
## Average interitem r =  0.39  with median =  0.31
##                                              2.5% 50% 97.5%
##  Quantiles of split half reliability      =  0.46 0.64 0.65

splitHalf(GCTMD, raw = T, covar = F, ci = .05)

## Split half reliabilities  
## Call: splitHalf(r = GCTMD, raw = T, covar = F, ci = 0.05)
## 
## Maximum split half reliability (lambda 4) =  0.71
## Guttman lambda 6                          =  0.68
## Average split half reliability            =  0.67
## Guttman lambda 3 (alpha)                  =  0.75
## Guttman lambda 2                          =  0.76
## Minimum split half reliability  (beta)    =  0.6
## Average interitem r =  0.51  with median =  0.46
##                                              2.5% 50% 97.5%
##  Quantiles of split half reliability      =  0.6 0.7 0.71

These are all rather low, but they should serve reasonably well for estimating the relationship between reliability and deviation from a perfect correlation. It is interesting to note that the bifactor group factors produce a relationship with relability of r = 0.058, \(\rho\) = -0.100, and a congruence of 0.974. On the other hand, for g these are 0.203, 0.300, and 0.973: neither is terribly affected by unreliability, but the latter may be to a slight degree. The finding that

dfc <- data.frame("Correlation" = c(0.959, 0.968, 0.886, 0.929, 0.979, 0.974), "Reliability" = c(0.75, 0.70, 0.58, 0.67, 0.65, 0.71))
cor(dfc$Correlation, dfc$Reliability); cor(dfc$Correlation, dfc$Reliability, method = "spearman"); CONGO(dfc$Correlation, dfc$Reliability)

## [1] 0.7123782

## [1] 0.2571429

## [1] 0.9982326

ggplot(dfc, aes(x = Correlation, y = Reliability)) + geom_point() + geom_smooth(method = lm, color = "orangered2", formula = 'y ~ x') + labs(x = "Total Score Correlation", y = "Total Score Reliability") + theme_minimal() + theme(legend.position = "none", text = element_text(size = 12, family = "serif"), plot.title = element_text(hjust = 0.5))

The BAFACALO Battery

Jay Lasker

August 1, 2020