library(pacman)
p_load(lavaan, psych, semPlot, VIM, egg, dplyr, knitr, ggplot2, reshape2, latex2exp)
CONGO <- function(F1, F2) {
PHI = sum(F1*F2) / sqrt(sum(F1^2)*sum(F2^2))
return(PHI)}
BAF <- read.csv("BAFA.csv")
ach_vars <- c("Portuguese", "English", "Geography", "History", "Biology", "Physics", "Chemistry", "Math")
missing_plot <- aggr(BAF[, ach_vars], col=c('gold', 'darkred'), numbers = T, sortVars = T, labels = names(BAF[, ach_vars]), cex.axis = .7, gap = 3, ylab = c("Proportion Missing", "Pattern"))
##
## Variables sorted by number of missings:
## Variable Count
## History 0.6267123
## Geography 0.5958904
## Physics 0.2260274
## English 0.2226027
## Portuguese 0.2191781
## Biology 0.2191781
## Chemistry 0.2191781
## Math 0.2191781
There's too much missing for history and geogaphy and practically the same numbers of people are missing the others. The randomness is dubious, so I elected not to impute. 21.23% of N, P1, P2, P3, FF, FI1, and FI2 were missing but otherwise no cognitive data was missing; because of the apparent lack of randomness, I elected not to impute those either.
As described by Golino & Gomes (2014) the Battery of Higher-Order Cognitive Factors (BAFACALO) is a cognitive battery developed for use in Brazil. The battery was based on the Kit of Factor Referenced Tests (see https://rpubs.com/JLLJ/JOG for an analysis showing that the Kit produces the same g as the ASVAB) and was built to cover six broad domains of the Cattell-Horn-Carroll model:
The data adduced for Golino & Gomes' test also includes grade point measures for Portuguese, English, mathematics, biology, physics, chemistry, geography, and history. Their data can be used to help provide evidence for several stylized facts from the field of intelligence research which, though well-established in the west, have not received equal attention in non-western samples. With these data, I aim to test
In western countries, CFAs usually prefer higher-order to correlated group factor models when there are more than three factors in the model because a model with g is more parsimonious and the factor covariances usually have a unidimensional structure such that a single factor is found among them and it explains the vast majority of the relationships among them. With three or fewer factors, there will be too few for a comparison of models with different degrees of freedom. That this data has a theoretical factor configuration based on the CHC framework is helpful because that framework includes many factors, reliable or unreliable as they may be (Benson et al., 2018). I will be using the theoretically supplied factor structure for my own CFAs. I will model the achievement measures with one factor because, as the EFA below shows, the battery is basically described by one factor; moreover, a two-factor model was unworkable and led to errors. The second point is germane to the first; in general, g explains >70% of the common variance in a battery, regardless of its size. Beyond a certain point, this is more true the broader the battery. As such, the group factors are generally left with limited reliable variance of their own, complicating their interpretation for clinicians and other users of cognitive tests. There is a considerable literature on how attempts to predict clinical outcomes and intellectual performance from individuals' cognitive profiles afford limited validity at best; this may be a result of the limited incremental variance found in those factors. This generally results in the recommendation that a battery not be interpreted beyond its FSIQ, since the other scores amount to proxies for it with a greater degree of measurement error.
The third point is important to know because, in the west, the relationship between achievement and intelligence test measures is, despite a century of work, somehow still a matter of contention. All analyses of population-representative samples with good instruments return that achievement and intelligence measures are strongly related and that most of the variance in achievement is attributable to intelligence (Kaufman & O'Neil, 1988; Deary et al., 2007; Kaufman et al., 2012; Zaboski, Kranzler & Gage, 2018). Most analyses do not get to the root of the topic, obliquely addressing it by finding that achievement and congitive measurements have similar g loadings, that lower-order g factors are highly similar, that g predicts a large part of the variance in a few measures, or that full-scale scores are strongly related. However, in order to account for psychometric sampling error and assess if what they test in general is the same - and if they have the same recommended level of interpretation -, fitting higher-order (or, perhaps, bifactor) CFAs for their comparison is required. The only study which had sufficient data to test this of which I am aware was conducted by Kaufman et al. (2012); they found that achievement-g and intelligence-g were correlated at r = 0.86 for the KABC-II and KTEA-II and 0.80 for the WJ-III Cog and WJ-III Ach. The breadth of these g factors was contestable, but the test was, nonetheless, the best yet-conducted of which I am aware. The relationship will be assessed in this battery within the factor model; unfortunately, the breadth is even worse here, but not terribly so. I will also assess the g and s loading stability of these batteries (Schmid-Leiman) with separate g factors, one g between them, and based on a CFA where the indicators from both sets are allowed to sit under the same group factors (i.e., a true joint confirmatory factor analysis).
The fourth point is about the generalizability of the more than fifty years of findings that, when predicting with IQ tests, they are strong predictors, but the only factor that consistently predicts is g. Specific ability estimates tend to provide much less validity. One reason for this may be that, in the higher-order model, or residualized of g based on a Schmid-Leiman transformation of EFA results, the group factors are relatively weak compared to those extracted directly in bifactor models or those containing g variance in correlated group factor models. This may reflect bifactor group factors pulling variance that resides in the unique factor variances (and these could become group factors with more indicators in the model) for a subset of the tests in a battery, but it may reflect genuinely better measurement. To this end, I will test higher-order factor predictive validity but not bifactor predictive validity, because of identification issues, for the various grades measurements, as well as a unit-weighted GPA-like grade sumscore.
VZ, I, RL, RG, V1, V2, V3, MV, MA1, MA2, VZ, and CF sumscores were used. The II10, 12, and 13 items were excluded for the reason provided by Golino & Gomes (2014).
# Intelligence
BAFACog <- c("N", "I", "RL", "RG", "VZ", "CF", "P1", "P2", "P3", "FF", "FI1", "FI2", "MV", "MA1", "MA2", "V1", "V2", "V3")
CogData <- BAF[BAFACog]
BAFEFA <- fa(CogData, nfactors = 1); print(BAFEFA$loadings, cutoff = 0.2)
##
## Loadings:
## MR1
## N 0.601
## I 0.608
## RL 0.653
## RG 0.727
## VZ 0.704
## CF 0.639
## P1 0.308
## P2 0.428
## P3 0.460
## FF
## FI1 0.361
## FI2 0.370
## MV 0.554
## MA1 0.547
## MA2 0.560
## V1 0.477
## V2 0.336
## V3 0.511
##
## MR1
## SS loadings 4.867
## Proportion Var 0.270
NUM <- seq(1, 18, 1); EVS <- BAFEFA$e.values
bpdf <- data.frame("NUM" = NUM, "EVS" = EVS)
INT <- ggplot(bpdf, aes(x = NUM, y = EVS, fill = as.factor(NUM))) + geom_bar(stat = "identity", show.legend = F) + xlab("Number") + ylab("Eigenvalue") + theme_bw() + scale_fill_hue(c = 30) + theme(text = element_text(size = 12, family = "serif"), plot.title = element_text(hjust = 0.5)) + geom_hline(yintercept = 1, color = "darkgray", linetype = "dashed", size = 1)
# Achievement
BAFAAch <- c("Portuguese", "English", "Biology", "Physics", "Chemistry", "Math")
AchData <- BAF[BAFAAch]
BAFEFA <- fa(AchData, nfactors = 1); print(BAFEFA$loadings, cutoff = 0.2)
##
## Loadings:
## MR1
## Portuguese 0.691
## English 0.816
## Biology 0.898
## Physics 0.927
## Chemistry 0.952
## Math 0.905
##
## MR1
## SS loadings 4.535
## Proportion Var 0.756
ANUM <- seq(1, 6, 1); AVS <- BAFEFA$e.values
bpdf <- data.frame("ANUM" = ANUM, "AVS" = AVS)
ACH <- ggplot(bpdf, aes(x = ANUM, y = AVS, fill = as.factor(ANUM))) + geom_bar(stat = "identity", show.legend = F) + xlab("Number") + ylab("") + theme_bw() + scale_fill_hue(c = 30) + theme(text = element_text(size = 12, family = "serif"), plot.title = element_text(hjust = 0.5)) + geom_hline(yintercept = 1, color = "darkgray", linetype = "dashed", size = 1)
ggarrange(INT, ACH, ncol = 2)
The first factor in the intelligence battery explained 30.70% of the total variance; in the achievement battery, it explained 79.33%.
GroupFactors <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2
Gc =~ V1 + V2 + V3'
HigherOrder <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2
Gc =~ V1 + V2 + V3
gINT =~ Gf + Gv + Gs + Gfl + Gsm + Gc'
Groups <- cfa(GroupFactors, data = BAF, std.lv = T)
HoG <- cfa(HigherOrder, data = BAF, std.lv = T)
round(cbind("Group Factors" = fitMeasures(Groups, FITM),
"Higher-order" = fitMeasures(HoG, FITM)), 3)
## Group Factors Higher-order
## chisq 200.585 207.141
## df 120.000 129.000
## npar 51.000 42.000
## cfi 0.923 0.926
## rmsea 0.054 0.051
## rmsea.ci.lower 0.041 0.038
## rmsea.ci.upper 0.067 0.064
## aic 22884.084 22872.640
## bic 23059.426 23017.039
## tli 0.902 0.912
parameterEstimates(HoG, stand = T) %>%
filter(op == "=~") %>%
select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>%
kable(digits = 3, format = "pandoc", caption = "Factor Loadings")
| Latent Factor | Indicator | B | SE | Z | p-value | Beta |
|---|---|---|---|---|---|---|
| Gf | N | 3.309 | 0.459 | 7.205 | 0 | 0.657 |
| Gf | I | 0.719 | 0.117 | 6.126 | 0 | 0.517 |
| Gf | RL | 1.698 | 0.235 | 7.238 | 0 | 0.662 |
| Gf | RG | 1.889 | 0.241 | 7.827 | 0 | 0.816 |
| Gv | VZ | 3.162 | 0.409 | 7.740 | 0 | 0.756 |
| Gv | CF | 3.871 | 0.490 | 7.898 | 0 | 0.711 |
| Gs | P1 | 2.107 | 0.491 | 4.292 | 0 | 0.441 |
| Gs | P2 | 1.647 | 0.336 | 4.902 | 0 | 0.631 |
| Gs | P3 | 2.716 | 0.554 | 4.900 | 0 | 0.623 |
| Gfl | FF | 1.260 | 0.280 | 4.505 | 0 | 0.334 |
| Gfl | FI1 | 3.703 | 0.411 | 9.002 | 0 | 0.710 |
| Gfl | FI2 | 2.863 | 0.313 | 9.149 | 0 | 0.815 |
| Gsm | MV | 0.724 | 0.127 | 5.709 | 0 | 0.435 |
| Gsm | MA1 | 2.524 | 0.288 | 8.775 | 0 | 0.783 |
| Gsm | MA2 | 1.395 | 0.162 | 8.592 | 0 | 0.693 |
| Gc | V1 | 2.055 | 0.209 | 9.842 | 0 | 0.744 |
| Gc | V2 | 1.379 | 0.174 | 7.909 | 0 | 0.582 |
| Gc | V3 | 2.065 | 0.202 | 10.215 | 0 | 0.814 |
| gINT | Gf | 1.256 | 0.225 | 5.594 | 0 | 0.782 |
| gINT | Gv | 0.871 | 0.159 | 5.482 | 0 | 0.657 |
| gINT | Gs | 1.359 | 0.327 | 4.152 | 0 | 0.805 |
| gINT | Gfl | 0.571 | 0.113 | 5.059 | 0 | 0.496 |
| gINT | Gsm | 0.780 | 0.139 | 5.592 | 0 | 0.615 |
| gINT | Gc | 0.796 | 0.133 | 5.964 | 0 | 0.623 |
The higher-order model fit crushingly better than the correlated group factor model for the psychometric data.
resid(HoG, "cor")
## $type
## [1] "cor.bollen"
##
## $cov
## N I RL RG VZ CF P1 P2 P3 FF
## N 0.000
## I -0.079 0.000
## RL -0.075 -0.018 0.000
## RG -0.016 0.068 0.038 0.000
## VZ 0.059 0.069 0.018 -0.033 0.000
## CF 0.005 -0.002 0.017 -0.030 0.000 0.000
## P1 0.176 0.041 -0.070 -0.058 -0.069 -0.069 0.000
## P2 0.176 -0.155 -0.007 -0.059 0.042 0.020 0.060 0.000
## P3 0.121 -0.025 -0.064 -0.045 0.039 0.079 -0.036 -0.014 0.000
## FF -0.076 -0.146 -0.053 -0.115 -0.182 -0.099 -0.151 -0.099 -0.039 0.000
## FI1 0.042 -0.018 0.032 0.070 0.000 -0.001 -0.100 -0.012 0.090 0.005
## FI2 0.021 0.058 0.006 -0.046 -0.061 0.021 -0.056 0.027 0.049 0.033
## MV 0.125 0.068 0.132 0.075 0.155 0.108 0.014 0.004 0.050 -0.102
## MA1 0.127 -0.035 -0.022 0.008 -0.035 -0.081 0.101 -0.024 0.020 -0.094
## MA2 0.033 -0.116 0.069 -0.024 0.024 -0.098 0.014 -0.065 -0.022 -0.039
## V1 0.128 -0.103 -0.021 -0.109 -0.065 -0.022 -0.029 -0.015 -0.025 -0.056
## V2 0.065 -0.023 -0.018 -0.063 0.012 0.110 0.009 -0.024 -0.033 -0.039
## V3 0.078 -0.004 0.076 -0.053 -0.004 0.049 0.009 -0.036 0.034 -0.111
## FI1 FI2 MV MA1 MA2 V1 V2 V3
## N
## I
## RL
## RG
## VZ
## CF
## P1
## P2
## P3
## FF
## FI1 0.000
## FI2 -0.009 0.000
## MV -0.030 -0.034 0.000
## MA1 0.015 -0.044 -0.040 0.000
## MA2 -0.017 -0.085 0.006 0.015 0.000
## V1 0.066 0.064 0.054 0.025 0.009 0.000
## V2 0.106 0.031 0.035 0.050 -0.057 0.019 0.000
## V3 0.023 0.088 0.007 -0.005 -0.050 0.001 -0.016 0.000
HR <- (resid(HoG, "cor"))
mean(abs(HR$cov))
## [1] 0.04919215
There were not really any large local dependence issues. P1:P3 and FF showed a bit, but for the most part, the rest did not. Since this was not fixed in the bifactor model and based on how it appeared, it looks like there might be some surface LD. Some of these could be modestly important were this personality data where correlations are typically lower. It may not be a coincidence that P1:P3 and FF were part of the least-reliable of the scales. The small magnitude of local dependence and its apparent specificity may explain why the bifactor model doesn't offer that much more predictive validity, especially for the group factors; that observation also supports SLD over underlying LD.
PsyLats <- list(
Gf = c("N", "I", "RL", "RG"),
Gv = c("VZ", "CF"),
Gs = c("P1", "P2", "P3"),
Gfl = c("FF", "FI1", "FI2"),
Gsm = c("MV", "MA1", "MA2"),
Gc = c("V1", "V2", "V3"))
semPaths(HoG, "model", "std", title = F, residuals = F, groups = "PsyLats", bifactor = "gINT", pastel = T, mar = c(2, 1, 3, 1), layout = "tree2", exoCov = T)
AchG <- '
gACH =~ Portuguese + English + Biology + Physics + Chemistry + Math'
AchG <- cfa(AchG, data = BAF, std.lv = T)
round(cbind("Achievement g" = fitMeasures(AchG, FITM)), 3)
## Achievement g
## chisq 48.084
## df 9.000
## npar 12.000
## cfi 0.972
## rmsea 0.139
## rmsea.ci.lower 0.102
## rmsea.ci.upper 0.178
## aic 9659.550
## bic 9700.596
## tli 0.953
parameterEstimates(AchG, stand = T) %>%
filter(op == "=~") %>%
select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>%
kable(digits = 3, format = "pandoc", caption = "Factor Loadings")
| Latent Factor | Indicator | B | SE | Z | p-value | Beta |
|---|---|---|---|---|---|---|
| gACH | Portuguese | 8.919 | 0.811 | 10.999 | 0 | 0.657 |
| gACH | English | 11.850 | 0.783 | 15.128 | 0 | 0.826 |
| gACH | Biology | 10.092 | 0.617 | 16.355 | 0 | 0.867 |
| gACH | Physics | 12.270 | 0.658 | 18.647 | 0 | 0.936 |
| gACH | Chemistry | 13.055 | 0.687 | 18.996 | 0 | 0.946 |
| gACH | Math | 15.967 | 0.889 | 17.963 | 0 | 0.917 |
Portuguese was the least g loaded, which may make sense if it is the easiest for a sample of Portuguese speakers.
resid(AchG, "cor")
## $type
## [1] "cor.bollen"
##
## $cov
## Portgs Englsh Biolgy Physcs Chmstr Math
## Portuguese 0.000
## English -0.021 0.000
## Biology 0.104 0.029 0.000
## Physics 0.006 -0.012 0.003 0.000
## Chemistry -0.022 -0.004 0.008 -0.007 0.000
## Math -0.027 0.009 -0.051 0.012 0.010 0.000
AR <- (resid(AchG, "cor"))
mean(abs(AR$cov))
## [1] 0.01804877
semPaths(AchG, "std", title = F, residuals = F, mar = c(2, 1, 3, 1), layout = "circle", posCol = c("skyblue4", "red"), sizeMan = 7, edge.label.cex = 1.2)
TwoFactorsNO <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2
Gc =~ V1 + V2 + V3
gINT =~ Gf + Gv + Gs + Gfl + Gsm + Gc
gACH =~ Portuguese + English + Biology + Physics + Chemistry + Math
gINT ~~ 0*gACH'
TwoFactorsFR <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2
Gc =~ V1 + V2 + V3
gINT =~ Gf + Gv + Gs + Gfl + Gsm + Gc
gACH =~ Portuguese + English + Biology + Physics + Chemistry + Math
gINT ~~ gACH'
TwoFactorsID <- '
Gf =~ N + I + RL + RG
Gv =~ VZ + CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2
Gc =~ V1 + V2 + V3
gINT =~ Gf + Gv + Gs + Gfl + Gsm + Gc
gACH =~ Portuguese + English + Biology + Physics + Chemistry + Math
gINT ~~ 1*gACH'
GNO <- cfa(TwoFactorsNO, data = BAF, std.lv = T)
GFR <- cfa(TwoFactorsFR, data = BAF, std.lv = T)
GID <- cfa(TwoFactorsID, data = BAF, std.lv = T)
round(cbind("Unrelated?" = fitMeasures(GNO, FITM),
"Related?" = fitMeasures(GFR, FITM),
"Identical?" = fitMeasures(GID, FITM)), 3)
## Unrelated? Related? Identical?
## chisq 543.947 447.898 535.177
## df 246.000 245.000 246.000
## npar 54.000 55.000 54.000
## cfi 0.884 0.921 0.887
## rmsea 0.073 0.061 0.072
## rmsea.ci.lower 0.065 0.052 0.064
## rmsea.ci.upper 0.082 0.069 0.081
## aic 31975.861 31881.812 31967.092
## bic 32160.331 32069.698 32151.561
## tli 0.869 0.911 0.873
parameterEstimates(GFR, stand = T) %>%
filter(op == "=~") %>%
select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>%
kable(digits = 3, format = "pandoc", caption = "Factor Loadings")
| Latent Factor | Indicator | B | SE | Z | p-value | Beta |
|---|---|---|---|---|---|---|
| Gf | N | 2.972 | 0.534 | 5.562 | 0 | 0.759 |
| Gf | I | 0.482 | 0.105 | 4.575 | 0 | 0.454 |
| Gf | RL | 1.185 | 0.226 | 5.238 | 0 | 0.608 |
| Gf | RG | 1.331 | 0.240 | 5.555 | 0 | 0.748 |
| Gv | VZ | 3.273 | 0.449 | 7.297 | 0 | 0.773 |
| Gv | CF | 3.533 | 0.466 | 7.585 | 0 | 0.637 |
| Gs | P1 | 2.319 | 0.491 | 4.724 | 0 | 0.476 |
| Gs | P2 | 1.719 | 0.325 | 5.287 | 0 | 0.644 |
| Gs | P3 | 2.633 | 0.504 | 5.229 | 0 | 0.597 |
| Gfl | FF | 1.301 | 0.293 | 4.433 | 0 | 0.332 |
| Gfl | FI1 | 3.788 | 0.435 | 8.702 | 0 | 0.697 |
| Gfl | FI2 | 3.007 | 0.334 | 9.012 | 0 | 0.823 |
| Gsm | MV | 0.685 | 0.122 | 5.591 | 0 | 0.430 |
| Gsm | MA1 | 2.450 | 0.281 | 8.705 | 0 | 0.779 |
| Gsm | MA2 | 1.336 | 0.157 | 8.485 | 0 | 0.683 |
| Gc | V1 | 2.120 | 0.208 | 10.178 | 0 | 0.757 |
| Gc | V2 | 1.385 | 0.176 | 7.865 | 0 | 0.573 |
| Gc | V3 | 2.069 | 0.198 | 10.439 | 0 | 0.797 |
| gINT | Gf | 1.831 | 0.389 | 4.703 | 0 | 0.878 |
| gINT | Gv | 0.761 | 0.141 | 5.387 | 0 | 0.606 |
| gINT | Gs | 1.328 | 0.289 | 4.602 | 0 | 0.799 |
| gINT | Gfl | 0.476 | 0.101 | 4.702 | 0 | 0.430 |
| gINT | Gsm | 0.827 | 0.139 | 5.933 | 0 | 0.637 |
| gINT | Gc | 0.753 | 0.122 | 6.199 | 0 | 0.602 |
| gACH | Portuguese | 9.018 | 0.806 | 11.192 | 0 | 0.667 |
| gACH | English | 11.899 | 0.785 | 15.156 | 0 | 0.828 |
| gACH | Biology | 10.115 | 0.618 | 16.356 | 0 | 0.868 |
| gACH | Physics | 12.276 | 0.661 | 18.560 | 0 | 0.935 |
| gACH | Chemistry | 13.064 | 0.691 | 18.916 | 0 | 0.945 |
| gACH | Math | 16.035 | 0.891 | 17.990 | 0 | 0.919 |
parameterEstimates(GID, stand = T) %>%
filter(op == "=~") %>%
select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>%
kable(digits = 3, format = "pandoc", caption = "Factor Loadings")
| Latent Factor | Indicator | B | SE | Z | p-value | Beta |
|---|---|---|---|---|---|---|
| Gf | N | 4.416 | 0.452 | 9.777 | 0.000 | 0.660 |
| Gf | I | 0.918 | 0.126 | 7.262 | 0.000 | 0.506 |
| Gf | RL | 2.109 | 0.226 | 9.333 | 0.000 | 0.633 |
| Gf | RG | 2.576 | 0.209 | 12.309 | 0.000 | 0.847 |
| Gv | VZ | 4.548 | 0.765 | 5.948 | 0.000 | 0.924 |
| Gv | CF | 3.440 | 0.574 | 5.996 | 0.000 | 0.533 |
| Gs | P1 | 3.411 | 0.559 | 6.107 | 0.000 | 0.505 |
| Gs | P2 | 2.453 | 0.335 | 7.322 | 0.000 | 0.664 |
| Gs | P3 | 3.389 | 0.513 | 6.612 | 0.000 | 0.554 |
| Gfl | FF | 1.447 | 0.317 | 4.562 | 0.000 | 0.342 |
| Gfl | FI1 | 3.842 | 0.514 | 7.472 | 0.000 | 0.656 |
| Gfl | FI2 | 3.416 | 0.413 | 8.272 | 0.000 | 0.867 |
| Gsm | MV | 0.754 | 0.134 | 5.635 | 0.000 | 0.423 |
| Gsm | MA1 | 2.701 | 0.285 | 9.460 | 0.000 | 0.767 |
| Gsm | MA2 | 1.535 | 0.168 | 9.122 | 0.000 | 0.700 |
| Gc | V1 | 2.496 | 0.221 | 11.286 | 0.000 | 0.800 |
| Gc | V2 | 1.535 | 0.188 | 8.152 | 0.000 | 0.570 |
| Gc | V3 | 2.182 | 0.202 | 10.827 | 0.000 | 0.755 |
| gINT | Gf | 0.701 | 0.099 | 7.098 | 0.000 | 0.574 |
| gINT | Gv | 0.409 | 0.099 | 4.139 | 0.000 | 0.379 |
| gINT | Gs | 0.663 | 0.120 | 5.531 | 0.000 | 0.552 |
| gINT | Gfl | 0.234 | 0.080 | 2.924 | 0.003 | 0.228 |
| gINT | Gsm | 0.585 | 0.100 | 5.873 | 0.000 | 0.505 |
| gINT | Gc | 0.512 | 0.090 | 5.698 | 0.000 | 0.456 |
| gACH | Portuguese | 9.096 | 0.804 | 11.317 | 0.000 | 0.673 |
| gACH | English | 11.918 | 0.784 | 15.193 | 0.000 | 0.829 |
| gACH | Biology | 10.117 | 0.618 | 16.362 | 0.000 | 0.868 |
| gACH | Physics | 12.239 | 0.663 | 18.463 | 0.000 | 0.932 |
| gACH | Chemistry | 13.029 | 0.692 | 18.824 | 0.000 | 0.942 |
| gACH | Math | 16.032 | 0.891 | 17.986 | 0.000 | 0.918 |
The relationship between the psychometric and achievement g is r = 0.695.
semPaths(GFR, "model", "std", title = F, residuals = F, groups = "AllLats", pastel = T, mar = c(2, 1, 3, 1), layout = "tree", exoCov = T)
AchCog <- c("N", "I", "RL", "RG", "VZ", "CF", "P1", "P2", "P3", "FF", "FI1", "FI2", "MV", "MA1", "MA2", "V1", "V2", "V3", "Portuguese", "English", "Biology", "Physics", "Chemistry", "Math")
AchCog <- BAF[AchCog]
fa.parallel(AchCog)
## Parallel analysis suggests that the number of factors = 6 and the number of components = 3
BAFEFA <- fa(AchCog, nfactors = 1); print(BAFEFA$loadings, cutoff = 0.2)
##
## Loadings:
## MR1
## N 0.683
## I 0.437
## RL 0.546
## RG 0.639
## VZ 0.568
## CF 0.458
## P1 0.353
## P2 0.450
## P3 0.431
## FF
## FI1 0.285
## FI2 0.310
## MV 0.455
## MA1 0.543
## MA2 0.505
## V1 0.497
## V2 0.345
## V3 0.472
## Portuguese 0.684
## English 0.731
## Biology 0.749
## Physics 0.779
## Chemistry 0.777
## Math 0.792
##
## MR1
## SS loadings 7.337
## Proportion Var 0.306
BAFEFA <- fa(AchCog, nfactors = 6); print(BAFEFA$loadings, cutoff = 0.3)
## Loading required namespace: GPArotation
##
## Loadings:
## MR1 MR2 MR4 MR3 MR5 MR6
## N 0.422
## I 0.803
## RL 0.550
## RG 0.648
## VZ 0.796
## CF 0.661
## P1 0.481
## P2 0.665
## P3 0.439
## FF 0.473
## FI1 0.708
## FI2 0.729
## MV 0.404 0.372
## MA1 0.699
## MA2 0.818
## V1 0.745
## V2 0.466
## V3 0.796
## Portuguese 0.573
## English 0.774
## Biology 0.880
## Physics 0.914
## Chemistry 0.977
## Math 0.892
##
## MR1 MR2 MR4 MR3 MR5 MR6
## SS loadings 4.450 2.686 1.515 1.362 1.343 1.133
## Proportion Var 0.185 0.112 0.063 0.057 0.056 0.047
## Cumulative Var 0.185 0.297 0.360 0.417 0.473 0.520
GJCFAF <- '
Ach =~ Portuguese + English + Biology + Physics + Chemistry + Math
Gf =~ I + RL + RG + VZ + CF
Gc =~ V1 + V2 + V3
Gfl =~ FF + FI1 + FI2
Gs =~ N + P1 + P2 + P3
Gsm =~ MV + MA1 + MA2'
GJCFA <- '
Ach =~ Portuguese + English + Biology + Physics + Chemistry + Math
Gf =~ I + RL + RG + VZ + CF
Gc =~ V1 + V2 + V3
Gfl =~ FF + FI1 + FI2
Gs =~ N + P1 + P2 + P3
Gsm =~ MV + MA1 + MA2
g =~ Ach + Gf + Gc + Gfl + Gs + Gsm'
GF <- cfa(GJCFAF, data = BAF, std.lv = T)
G <- cfa(GJCFA, data = BAF, std.lv = T)
round(cbind("Group Factors" = fitMeasures(GF, FITM),
"Higher-order" = fitMeasures(G, FITM)), 3)
## Group Factors Higher-order
## chisq 449.059 459.970
## df 237.000 246.000
## npar 63.000 54.000
## cfi 0.917 0.916
## rmsea 0.063 0.062
## rmsea.ci.lower 0.054 0.053
## rmsea.ci.upper 0.072 0.071
## aic 31898.973 31891.884
## bic 32114.187 32076.354
## tli 0.903 0.906
p = 0.282. The g model fits better. The higher-order model fits better almost across the board. The theory-based combined model still fits better even though it appears that the achievement g should really just be a group factor for g.
semPaths(G, "model", "std", title = F, residuals = F, groups = "JCFALats", bifactor = "g", pastel = T, mar = c(2, 1, 3, 1), layout = "tree2", exoCov = T)
parameterEstimates(G, stand = T) %>%
filter(op == "=~") %>%
select('Latent Factor' = lhs, Indicator = rhs, B = est, SE = se, Z = z, 'p-value' = pvalue, Beta = std.all) %>%
kable(digits = 3, format = "pandoc", caption = "Factor Loadings")
| Latent Factor | Indicator | B | SE | Z | p-value | Beta |
|---|---|---|---|---|---|---|
| Ach | Portuguese | 6.326 | 0.625 | 10.117 | 0.000 | 0.667 |
| Ach | English | 8.344 | 0.654 | 12.754 | 0.000 | 0.828 |
| Ach | Biology | 7.093 | 0.528 | 13.443 | 0.000 | 0.868 |
| Ach | Physics | 8.608 | 0.590 | 14.586 | 0.000 | 0.935 |
| Ach | Chemistry | 9.161 | 0.621 | 14.752 | 0.000 | 0.945 |
| Ach | Math | 11.247 | 0.786 | 14.311 | 0.000 | 0.919 |
| Gf | I | 0.751 | 0.115 | 6.541 | 0.000 | 0.514 |
| Gf | RL | 1.784 | 0.220 | 8.094 | 0.000 | 0.665 |
| Gf | RG | 1.975 | 0.224 | 8.825 | 0.000 | 0.806 |
| Gf | VZ | 1.549 | 0.272 | 5.703 | 0.000 | 0.442 |
| Gf | CF | 1.812 | 0.353 | 5.136 | 0.000 | 0.394 |
| Gc | V1 | 2.161 | 0.209 | 10.347 | 0.000 | 0.768 |
| Gc | V2 | 1.389 | 0.176 | 7.888 | 0.000 | 0.572 |
| Gc | V3 | 2.050 | 0.196 | 10.472 | 0.000 | 0.787 |
| Gfl | FF | 1.329 | 0.296 | 4.496 | 0.000 | 0.336 |
| Gfl | FI1 | 3.799 | 0.440 | 8.633 | 0.000 | 0.693 |
| Gfl | FI2 | 3.048 | 0.338 | 9.006 | 0.000 | 0.826 |
| Gs | N | 2.266 | 0.863 | 2.625 | 0.009 | 0.832 |
| Gs | P1 | 1.190 | 0.465 | 2.558 | 0.011 | 0.441 |
| Gs | P2 | 0.808 | 0.306 | 2.640 | 0.008 | 0.546 |
| Gs | P3 | 1.249 | 0.477 | 2.617 | 0.009 | 0.511 |
| Gsm | MV | 0.682 | 0.122 | 5.567 | 0.000 | 0.425 |
| Gsm | MA1 | 2.507 | 0.283 | 8.853 | 0.000 | 0.790 |
| Gsm | MA2 | 1.332 | 0.156 | 8.566 | 0.000 | 0.675 |
| g | Ach | 1.017 | 0.127 | 7.989 | 0.000 | 0.713 |
| g | Gf | 1.140 | 0.177 | 6.458 | 0.000 | 0.752 |
| g | Gc | 0.744 | 0.119 | 6.268 | 0.000 | 0.597 |
| g | Gfl | 0.452 | 0.098 | 4.592 | 0.000 | 0.412 |
| g | Gs | 2.828 | 1.149 | 2.461 | 0.014 | 0.943 |
| g | Gsm | 0.809 | 0.135 | 6.013 | 0.000 | 0.629 |
PSYTOG <- c(0.379, 0.290, 0.363, 0.486, 0.350, 0.202, 0.279, 0.367, 0.306, 0.078, 0.150, 0.198, 0.214, 0.387, 0.354, 0.365, 0.260, 0.344)
ACHTOG <- c(0.673, 0.829, 0.868, 0.932, 0.942, 0.918)
PSYSEPG <- c(0.514, 0.404, 0.518, 0.638, 0.497, 0.467, 0.355, 0.508, 0.502, 0.166, 0.352, 0.404, 0.268, 0.482, 0.426, 0.464, 0.363, 0.507)
ACHSEPG <- c(0.657, 0.826, 0.867, 0.936, 0.946, 0.917)
PSYTOS <- c(0.254, 0.195, 0.244, 0.326, 0.300, 0.173, 0.194, 0.255, 0.213, 0.074, 0.142, 0.187, 0.159, 0.289, 0.263, 0.289, 0.206, 0.273)
PSYSEPS <- c(0.200, 0.157, 0.201, 0.248, 0.282, 0.265, 0.125, 0.179, 0.177, 0.125, 0.266, 0.305, 0.166, 0.299, 0.265, 0.284, 0.222, 0.310)
TOTTOG <- c(0.379, 0.290, 0.363, 0.486, 0.350, 0.202, 0.279, 0.367, 0.306, 0.078, 0.150, 0.198, 0.214, 0.387, 0.354, 0.365, 0.260, 0.344, 0.673, 0.829, 0.868, 0.932, 0.942, 0.918)
TOTSEPG <- c(0.514, 0.404, 0.518, 0.638, 0.497, 0.467, 0.355, 0.508, 0.502, 0.166, 0.352, 0.404, 0.268, 0.482, 0.426, 0.464, 0.363, 0.507, 0.657, 0.826, 0.867, 0.936, 0.946, 0.917)
GJCFA <- c(0.785, 0.387, 0.500, 0.606, 0.332, 0.296, 0.416, 0.515, 0.482, 0.138, 0.286, 0.340, 0.267, 0.497, 0.425, 0.458, 0.341, 0.470, 0.476, 0.590, 0.619, 0.667, 0.674, 0.655)
ACHJCFA <- c(0.476, 0.590, 0.619, 0.667, 0.674, 0.655)
message(paste("PSY g Loadings: Together and Apart? Pearson = ", cor(PSYTOG, PSYSEPG), "Spearman = ", cor(PSYTOG, PSYSEPG, method = "spearman"), "Tucker = ", CONGO(PSYTOG, PSYSEPG)))
## PSY g Loadings: Together and Apart? Pearson = 0.860252498899787 Spearman = 0.797109037562556 Tucker = 0.98681796192903
message(paste("PSY s Loadings: Together and Apart? Pearson = ", cor(PSYTOS, PSYSEPS), "Spearman = ", cor(PSYTOS, PSYSEPS, method = "spearman"), "Tucker = ", CONGO(PSYTOS, PSYSEPS)))
## PSY s Loadings: Together and Apart? Pearson = 0.508903334284125 Spearman = 0.458914789965189 Tucker = 0.966088155160878
message(paste("ACH g Loadings: Together and Apart? Pearson = ", cor(ACHTOG, ACHSEPG), "Spearman = ", cor(ACHTOG, ACHSEPG, method = "spearman"), "Tucker = ", CONGO(ACHTOG, ACHSEPG)))
## ACH g Loadings: Together and Apart? Pearson = 0.999893928101252 Spearman = 1 Tucker = 0.999968025516256
message(paste("ACH g Loadings: Combined versus JCFA? Pearson = ", cor(ACHTOG, ACHJCFA), "Spearman = ", cor(ACHTOG, ACHJCFA, method = "spearman"), "Tucker = ", CONGO(ACHTOG, ACHJCFA)))
## ACH g Loadings: Combined versus JCFA? Pearson = 0.999964797764591 Spearman = 1 Tucker = 0.999994015201659
message(paste("ACH g Loadings: Apart versus JCFA? Pearson = ", cor(ACHSEPG, ACHJCFA), "Spearman = ", cor(ACHSEPG, ACHJCFA, method = "spearman"), "Tucker = ", CONGO(ACHSEPG, ACHJCFA)))
## ACH g Loadings: Apart versus JCFA? Pearson = 0.99992847083121 Spearman = 1 Tucker = 0.999988994984006
message(paste("Total g Loadings: Together and Apart? Pearson = ", cor(TOTTOG, TOTSEPG), "Spearman = ", cor(TOTTOG, TOTSEPG, method = "spearman"), "Tucker = ", CONGO(TOTTOG, TOTSEPG)))
## Total g Loadings: Together and Apart? Pearson = 0.971010581931045 Spearman = 0.914546662193708 Tucker = 0.980773371000493
message(paste("Combined versus JCFA g Loadings? Pearson = ", cor(TOTTOG, GJCFA), "Spearman = ", cor(TOTTOG, GJCFA, method = "spearman"), "Tucker = ", CONGO(TOTTOG, GJCFA)))
## Combined versus JCFA g Loadings? Pearson = 0.768173838110543 Spearman = 0.900869565217391 Tucker = 0.938481162146067
The \(\omega_h\) of the solo psychometric battery was 0.791, \(\omega_t\) was 0.827, ECV was 0.785, H was 0.827, PUC was 0.876, and the total variance explained by g was 20%. The group factor H values ranged from 0.075 to 0.197, \(\omega_{hs}\) ranged from 0.053 to 0.144, \(\omega_t\) from 0.347 to 0.645, ECV from 0.017 to 0.049, and total variance from 0.8 to 1.3%. In the corresponding bifactor model - where Gv's loadings were set to \(\sqrt{cov(VZ, CF)}\) - the \(\omega_h\) was 0.730, \(\omega_t\) was 0.891, ECV was 0.423, H was 0.837, PUC was 0.876 (thanks to setting Gv), and the total variance explained by g was 21%. The group factor H values ranged from 0.404 to 0.695, \(\omega_{hs}\) ranged from 0.201 to 0.573, \(\omega_t\) from 0.626 to 0.794, ECV from 0.051 to 0.120, and total variance from 2.5 to 5.9%. As is typical, the bifactor model increased the reliability of group factors.
For the combined battery, for g, the values were 0.849, 0.872, 0.865, 0.965, 0.877, and 26.1%. The group factor ranges were 0.059 to 0.225, 0.100 to 0.160, 0.107 to 0.516, 0.008 to 0.037, and 0.3 to 1.1%. For the JCFA, for g, the values were 0.848, 0.886, 0.816, 0.902, 0.855, and 24.2%. The group factor ranges were 0.016 to 0.379, 0.008 to 0.164, 0.273 to 0.842, 0.002 to 0.078, and 0.1 to 2.3%.
In the SEM, aggregate grades were predicted at r = 0.620 with g alone; 0.628 with Gf, 0.624 with Gv, 0.612 with Gs, 0.544 with Gfl, 0.639 with Gsm, and 0.611 with Gc. With all group factors in the model, the multiple-R jumped to a mere 0.682. Portuguese was predicted at 0.615 by g alone; 0.618 with Gf, 0.600 with Gv or Gs, 0.496 with Gfl, 0.652 with Gsm, and 0.641 with Gc, but just 0.652 altogether. English was predicated at 0.602 by g alone; 0.602 with Gf, 0.565 with Gv, 0.584 with Gs, 0.536 with Gfl, 0.643 with Gsm, and 0.640 with Gc, but just 0.683 altogether. Since the results were looking dismal at this point, I decided to just provide the biology, physics, chemistry, and math g-only and altogether results; these were 0.574 versus 0.618, 0.607 versus 0.683, 0.587 versus 0.652, 0.663 versus 0.686. There was very little more than g.
These results may be different in a bifactor model, and clearly some degree of confounding between levels may explain the results here, but this is not a hopeful result for finding more than g. It may have been predicted by the Schmid-Leiman derived dimensionalities. If that is the case, then the consistently replicated finding that the bifactor generates absolutely larger group factors that seem to have more predictive validity independent of g may also be true here. Due to identification issues this can be hard to legitimately assess. Because the point was to see if common western findings replicated in Brazil, I pursued this possibility. Below I have given the \(R^2\) figures from regressions of the regression-based (the Bartlett method is not admissible for assessing the predictive validity of higher-order factors for obvious reasons - try it out and see what happens) factor scores in a table and a plot for models with g alone and with group factors added. The incremental \(R^2\) is clearly modest.
BAF <- read.csv("BAFANA.csv") #No cognitive NAs
BAF$GPA <- BAF$Portuguese + BAF$English + BAF$Math + BAF$Biology + BAF$Physics + BAF$Chemistry
BAF$GFT = BAF$N + BAF$I + BAF$RL + BAF$RG
BAF$GVT = BAF$VZ + BAF$CF
BAF$GST = BAF$P1 + BAF$P2 + BAF$P3
BAF$GFLT = BAF$FF + BAF$FI1 + BAF$FI2
BAF$GSMT = BAF$MV + BAF$MA1 + BAF$MA2
BAF$GCT = BAF$V1 + BAF$V2 + BAF$V3
HoG <- cfa(HigherOrder, data = BAF, std.lv = T)
FS <- cbind(BAF, lavPredict(HoG))
PLMG <- lm(Portuguese ~ gINT, FS); PLMM <- lm(Portuguese ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
ELMG <- lm(English ~ gINT, FS); ELMM <- lm(English ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
MLMG <- lm(Math ~ gINT, FS); MLMM <- lm(Math ~ gINT + Gc + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
BLMG <- lm(Biology ~ gINT, FS); BLMM <- lm(Biology ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
HLMG <- lm(Physics ~ gINT, FS); HLMM <- lm(Physics ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
CLMG <- lm(Chemistry ~ gINT, FS); CLMM <- lm(Chemistry ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
GLMG <- lm(GPA ~ gINT, FS); GLMM <- lm(GPA ~ gINT + Gf + Gv + Gs + Gfl + Gsm + Gc, FS)
rdf <- data.frame("Subject" = c("Portuguese", "English", "Mathematics", "Biology", "Physics", "Chemistry", "Sum Grade"), "g" = c(summary(PLMG)$r.squared, summary(ELMG)$r.squared, summary(MLMG)$r.squared, summary(BLMG)$r.squared, summary(HLMG)$r.squared, summary(CLMG)$r.squared, summary(GLMG)$r.squared), "gPlus" = c(summary(PLMM)$r.squared, summary(ELMM)$r.squared, summary(MLMM)$r.squared, summary(BLMM)$r.squared, summary(HLMM)$r.squared, summary(CLMM)$r.squared, summary(GLMM)$r.squared), "gAdj" = c(summary(PLMG)$adj.r.squared, summary(ELMG)$adj.r.squared, summary(MLMG)$adj.r.squared, summary(BLMG)$adj.r.squared, summary(HLMG)$adj.r.squared, summary(CLMG)$adj.r.squared, summary(GLMG)$adj.r.squared), "gPlusAdj" = c(summary(PLMM)$adj.r.squared, summary(ELMM)$adj.r.squared, summary(MLMM)$adj.r.squared, summary(BLMM)$adj.r.squared, summary(HLMM)$adj.r.squared, summary(CLMM)$adj.r.squared, summary(GLMM)$adj.r.squared))
rdf$gProp <- rdf$g/rdf$gPlus
rdf$gPropAdj <- rdf$gAdj/rdf$gPlusAdj
kable(rdf)
| Subject | g | gPlus | gAdj | gPlusAdj | gProp | gPropAdj |
|---|---|---|---|---|---|---|
| Portuguese | 0.2695278 | 0.3198890 | 0.2662813 | 0.3013406 | 0.8425667 | 0.8836557 |
| English | 0.2609840 | 0.2926898 | 0.2576848 | 0.2733114 | 0.8916746 | 0.9428251 |
| Mathematics | 0.3068056 | 0.3293479 | 0.3037248 | 0.3110574 | 0.9315548 | 0.9764268 |
| Biology | 0.2358436 | 0.2681422 | 0.2324474 | 0.2481824 | 0.8795469 | 0.9365989 |
| Physics | 0.2646097 | 0.2853870 | 0.2613120 | 0.2657188 | 0.9271960 | 0.9834156 |
| Chemistry | 0.2508421 | 0.2726900 | 0.2475125 | 0.2528543 | 0.9198802 | 0.9788743 |
| Sum Grade | 0.3604381 | 0.3879309 | 0.3575701 | 0.3710850 | 0.9291295 | 0.9635801 |
The summary grade measure is predicted at r = 0.600 with g alone versus 0.623 with group factors added (incremental \(R^2\) = 0.0275). The average g proportion of the (non-adjusted) \(R^2\) is 90.3% (95.2% for adjusted). If a correlated group factors model is used, the \(R^2\) values using all group factors are, in order, 0.319, 0.292, 0.328, 0.268, 0.285, 0.272, and 0.387: a restatement of the findings with g, as should occur. Using only the Gf (which produces similar results to the other factors) from this model, we can see that the group factors are just reflecting the g variance, as its \(R^2\) values, in order, are 0.239, 0.216, 0.291, 0.203, 0.219, 0.220, and 0.315. This is the major problem with correlated group factors models, that they just reflect g so other group factors have limited incremental validity over them. In fact, they seem to have validity to the extent they're related to g. Recall from the plot above that g loaded on the group factors at 0.78 (Gf), 0.66 (Gv), 0.81 (Gs), 0.50 (Gfl), 0.61 (Gsm), and 0.62 (Gc). The \(R^2\) values for Gv are 0.146, 0.119, 0.164, 0.107, 0.138, 0.130, and 0.182; for Gs, these are 0.213, 0.203, 0.254, 0.197, 0.224, 0.203, and 0.292; for Gfl, 0.022, 0.052, 0.055, 0.030, 0.050, 0.042, and 0.063; for Gsm, these are 0.205, 0.201, 0.199, 0.181, 0.211, 0.191, and 0.252; and for Gc, these are 0.169, 0.184, 0.136, 0.140, 0.129, 0.131, and 0.207. The correlation between the higher-order g loadings and the validity for the summary grade measure (the most reliable since it is a composite score) is r = 0.881 (shown below). With results like this, just what are practitioners opposed to the notion of g supposed to do? Ignore g and end up predicting with g anyway? Clearly nested models do not produce different latent variances that somehow thus alter the predictive validity through model choice: when you have a positive manifold, a well-fitting model will reiterate that the positive manifold exists in accord with how strong it is and prediction will not be altered by model choice within the realm of models that produce adequate fit.
corrdf <- data.frame("GVal" = c(0.315, 0.182, 0.292, 0.063, 0.252, 0.207), "Gl" = c(0.78, 0.66, 0.81, 0.50, 0.61, 0.62))
cor(corrdf$Gl, corrdf$GVal); cor(corrdf$Gl, corrdf$GVal, method = "spearman"); CONGO(corrdf$Gl, corrdf$GVal)
## [1] 0.88075
## [1] 0.7142857
## [1] 0.9721698
ggplot(corrdf, aes(x = Gl, y = GVal)) + geom_point() + geom_smooth(method = lm, color = "steelblue4", formula = 'y ~ x')+ labs(title = "Relationship between Group Factor g Loading and Validity", x = "g Loading", y = "Validity") + theme_minimal() + theme(legend.position = "none", text = element_text(size = 12, family = "serif"), plot.title = element_text(hjust = 0.5))
rdfp <- rdf[, -c(4:7)]
rdf2 <- melt(rdfp, id.vars = "Subject")
names(rdf2)[names(rdf2) == "variable"] <- "Model"
HOP <- ggplot(rdf2, aes(x = Subject, y = value, fill = Model)) + geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7, show.legend = T) + coord_cartesian(ylim = c(0, 0.4)) + ylab(TeX("$R^2$")) + scale_fill_manual(values = c("#2D4059", "#DECDC3"), breaks = c("g", "gPlus"), labels = c("Higher-order g", "Higher-order g and Group Factors")) + theme_bw() + theme(legend.position = c(0.16, 0.85), legend.background = element_blank(), text = element_text(size = 12, family = "serif"), axis.title.x = element_blank())
What about with a bifactor model?
Bifactor <- '
Gf =~ N + I + RL + RG
VZ ~~ CF
Gs =~ P1 + P2 + P3
Gfl =~ FF + FI1 + FI2
Gsm =~ MV + MA1 + MA2
Gc =~ V1 + V2 + V3
gINT =~ N + I + RL + RG + VZ + CF + P1 + P2 + P3 + FF + FI1 + FI2 + MV + MA1 + MA2 + V1 + V2 + V3
RG ~~ 0.5*RG'
BF <- cfa(Bifactor, data = BAF, std.lv = T, orthogonal = T)
fitMeasures(BF, FITM)
## chisq df npar cfi rmsea
## 149.661 119.000 52.000 0.971 0.033
## rmsea.ci.lower rmsea.ci.upper aic bic tli
## 0.011 0.049 22835.159 23013.939 0.963
BS <- cbind(BAF, lavPredict(BF))
resid(BF, "cor")
## $type
## [1] "cor.bollen"
##
## $cov
## N I RL RG P1 P2 P3 FF FI1 FI2
## N 0.000
## I -0.039 0.000
## RL -0.041 0.016 0.000
## RG 0.002 0.000 0.000 0.000
## P1 0.097 0.055 -0.070 -0.040 0.000
## P2 0.071 -0.130 -0.001 -0.027 0.000 0.000
## P3 0.003 -0.009 -0.069 -0.024 0.000 0.000 0.000
## FF 0.027 -0.071 0.045 0.005 -0.082 -0.002 0.058 0.000
## FI1 -0.034 -0.003 0.033 0.089 -0.115 -0.029 0.066 0.000 0.000
## FI2 -0.041 0.087 0.025 -0.004 -0.059 0.027 0.041 0.000 0.000 0.000
## MV 0.001 0.046 0.086 0.037 -0.033 -0.059 -0.020 -0.048 -0.076 -0.072
## MA1 0.035 -0.009 -0.012 0.044 0.088 -0.037 -0.003 -0.002 0.003 -0.040
## MA2 0.000 -0.068 0.113 0.048 0.029 -0.039 -0.005 0.041 -0.002 -0.053
## V1 0.050 -0.072 -0.004 -0.066 -0.036 -0.020 -0.039 0.033 0.059 0.073
## V2 -0.010 -0.006 -0.014 -0.040 -0.004 -0.038 -0.054 0.030 0.094 0.031
## V3 -0.022 0.021 0.084 -0.018 -0.007 -0.053 0.007 -0.014 0.009 0.090
## VZ -0.029 0.099 0.034 0.009 -0.079 0.034 0.021 -0.088 -0.009 -0.053
## CF -0.070 0.031 0.037 0.016 -0.075 0.018 0.068 -0.010 -0.006 0.033
## MV MA1 MA2 V1 V2 V3 VZ CF
## N
## I
## RL
## RG
## P1
## P2
## P3
## FF
## FI1
## FI2
## MV 0.000
## MA1 0.000 0.000
## MA2 0.000 0.000 0.000
## V1 0.006 0.025 0.041 0.000
## V2 -0.010 0.040 -0.039 0.000 0.000
## V3 -0.054 -0.015 -0.022 0.000 0.000 0.000
## VZ 0.101 -0.038 0.056 -0.061 0.006 -0.009 0.000
## CF 0.061 -0.078 -0.064 -0.013 0.109 0.050 0.000 0.000
BR <- (resid(BF, "cor"))
mean(abs(BR$cov))
## [1] 0.03460028
PLMG <- lm(Portuguese ~ gINT, BS); PLMM <- lm(Portuguese ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); PLMMS <- lm(Portuguese ~ Gf + Gs + Gfl + Gsm + Gc, BS)
ELMG <- lm(English ~ gINT, BS); ELMM <- lm(English ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); ELMMS <- lm(English ~ Gf + Gs + Gfl + Gsm + Gc, BS)
MLMG <- lm(Math ~ gINT, BS); MLMM <- lm(Math ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); MLMMS <- lm(Math ~ Gf + Gs + Gfl + Gsm + Gc, BS)
BLMG <- lm(Biology ~ gINT, BS); BLMM <- lm(Biology ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); BLMMS <- lm(Biology ~ Gf + Gs + Gfl + Gsm + Gc, BS)
HLMG <- lm(Physics ~ gINT, BS); HLMM <- lm(Physics ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); HLMMS <- lm(Physics ~ Gf + Gs + Gfl + Gsm + Gc, BS)
CLMG <- lm(Chemistry ~ gINT, BS); CLMM <- lm(Chemistry ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); CLMMS <- lm(Chemistry ~ Gf + Gs + Gfl + Gsm + Gc, BS)
GLMG <- lm(GPA ~ gINT, BS); GLMM <- lm(GPA ~ gINT + Gf + Gs + Gfl + Gsm + Gc, BS); GLMMS <- lm(GPA ~ Gf + Gs + Gfl + Gsm + Gc, BS)
rdf <- data.frame("Subject" = c("Portuguese", "English", "Mathematics", "Biology", "Physics", "Chemistry", "Sum Grade"), "g" = c(summary(PLMG)$r.squared, summary(ELMG)$r.squared, summary(MLMG)$r.squared, summary(BLMG)$r.squared, summary(HLMG)$r.squared, summary(CLMG)$r.squared, summary(GLMG)$r.squared), "gPlus" = c(summary(PLMM)$r.squared, summary(ELMM)$r.squared, summary(MLMM)$r.squared, summary(BLMM)$r.squared, summary(HLMM)$r.squared, summary(CLMM)$r.squared, summary(GLMM)$r.squared), "gAdj" = c(summary(PLMG)$adj.r.squared, summary(ELMG)$adj.r.squared, summary(MLMG)$adj.r.squared, summary(BLMG)$adj.r.squared, summary(HLMG)$adj.r.squared, summary(CLMG)$adj.r.squared, summary(GLMG)$adj.r.squared), "gPlusAdj" = c(summary(PLMM)$adj.r.squared, summary(ELMM)$adj.r.squared, summary(MLMM)$adj.r.squared, summary(BLMM)$adj.r.squared, summary(HLMM)$adj.r.squared, summary(CLMM)$adj.r.squared, summary(GLMM)$adj.r.squared), "Sansg" = c(summary(PLMMS)$r.squared, summary(ELMMS)$r.squared, summary(MLMMS)$r.squared, summary(BLMMS)$r.squared, summary(HLMMS)$r.squared, summary(CLMMS)$r.squared, summary(GLMMS)$r.squared), "SansgAdj" = c(summary(PLMMS)$adj.r.squared, summary(ELMMS)$adj.r.squared, summary(MLMMS)$adj.r.squared, summary(BLMMS)$adj.r.squared, summary(HLMMS)$adj.r.squared, summary(CLMMS)$adj.r.squared, summary(GLMMS)$adj.r.squared))
rdf$gProp <- rdf$g/rdf$gPlus
rdf$gPropAdj <- rdf$gAdj/rdf$gPlusAdj
kable(rdf)
| Subject | g | gPlus | gAdj | gPlusAdj | Sansg | SansgAdj | gProp | gPropAdj |
|---|---|---|---|---|---|---|---|---|
| Portuguese | 0.2830000 | 0.3234593 | 0.2798133 | 0.3050082 | 0.1098939 | 0.0897557 | 0.8749170 | 0.9173962 |
| English | 0.2786460 | 0.3081881 | 0.2754256 | 0.2892343 | 0.1146588 | 0.0945374 | 0.9041426 | 0.9522578 |
| Mathematics | 0.3301322 | 0.3486620 | 0.3271550 | 0.3308982 | 0.0951585 | 0.0746869 | 0.9468546 | 0.9886877 |
| Biology | 0.2510367 | 0.2808534 | 0.2477080 | 0.2612403 | 0.1102747 | 0.0901451 | 0.8938355 | 0.9481998 |
| Physics | 0.2969123 | 0.3139872 | 0.2937595 | 0.2951061 | 0.0749460 | 0.0538260 | 0.9456191 | 0.9954366 |
| Chemistry | 0.2621533 | 0.2833122 | 0.2588740 | 0.2637662 | 0.0885356 | 0.0679142 | 0.9253160 | 0.9814526 |
| Sum Grade | 0.3918664 | 0.4140936 | 0.3891393 | 0.3979678 | 0.1138525 | 0.0936208 | 0.9463232 | 0.9778162 |
The summary grade measure is predicted at r = 0.626 with g alone versus 0.644 with group factors added (incremental \(R^2\) = 0.0222). The average g proportion is 92% (96.6% for adjusted). In case it is not clear, the "Sans g" columns are not present in the higher-order model because they produce the same result as a model with g plus group factors due to how scoring is done.
rdfp <- rdf[, -c(4:9)]
rdf2 <- melt(rdfp, id.vars = "Subject")
names(rdf2)[names(rdf2) == "variable"] <- "Model"
BFP <- ggplot(rdf2, aes(x = Subject, y = value, fill = Model)) + geom_bar(stat = "identity", position = position_dodge(width = 0.8), width = 0.7, show.legend = T) + coord_cartesian(ylim = c(0, 0.4)) + ylab(TeX("$R^2$")) + scale_fill_manual(values = c("#900D0D", "#D9C6A5"), breaks = c("g", "gPlus"), labels = c("Bifactor g", "Bifactor g and Group Factors")) + theme_bw() + theme(legend.position = c(0.14, 0.85), legend.background = element_blank(), text = element_text(size = 12, family = "serif"), axis.title.x = element_blank())
ggarrange(HOP, BFP, ncol = 2)
I aimed to assess if four commonly replicated findings from western countries replicated in a Brazilian dataset. The first of these was confirmed: there is CFA and dimensional evidence for a g factor. The second was as well, at least for a higher-order model, which, it should be said, reduces the importance of group factors compared to the bifactor model, which I tested (not shown) and found to have identification problems. It should also be said that bifactor models generally show limited interpretive relevance for group factors, but they still show greater predictive validity and absolute importance, though they also do not absolutely reduce g, they simply reduce the uniqueness. Identifying why that is can be complicated; it may be that they model out what should really be the minor residual covariances found in every battery (Gignac's remarks on \(\omega_h\) in the HOF and BF are redolent, as are his remarks on the extremely low group factor reliabilities in higher-order models; see Gignac, 2008, his simulation study for \(\omega_h\), and the possibility of proportionality/tetrad constraints violations). The third finding was found to be confirmed to a similar degree to Zaboski, Kranzler & Gage's (2018) result at r = 0.695 compared to their 0.735 assessed via different methods with various indicators. This result may be weaker than the result from Kaufman et al. (2012) because of modeling constraints: there were too few indicators and there was too much missingness for some of the achievement measures - which may have been less reliable and cognitive than achievement tests since they were just grades. The fourth finding, like the second and for some of the same reasons, is prospectively confirmed in a similar manner to McGilll & Busse (2015), allowing similar conclusions to Jensen (1984) or, from wage prediction (also similar to occupational performance), Ganzach & Patel (2018). The common finding that the bifactor model leads to greater incremental validity for group factors was not supported (this finding only applies in cases like this one where group factors are well-defined and not simply reflections of g).
Golino, H., & Gomes, C. (2014). Psychology data from the "BAFACALO project: The Brazilian Intelligence Battery based on two state-of-the-art models - Carroll's Model and the CHC model." Journal of Open Psychology Data, 2(1), e6. https://doi.org/10.5334/jopd.af
Benson, N. F., Beaujean, A. A., McGill, R. J., & Dombrowski, S. C. (2018). Revisiting Carroll's survey of factor-analytic studies: Implications for the clinical assessment of intelligence. Psychological Assessment, 30(8), 1028-1038. https://doi.org/10.1037/pas0000556
Kaufman, A. S., & O'Neal, M. R. (1988). Analysis of the Cognitive, Achievement, and General Factors Underlying the Woodcock-Johnson Psycho-Educational Battery. Journal of Clinical Child Psychology, 17(2), 143-151. https://doi.org/10.1207/s15374424jccp1702_6
Deary, I. J., Strand, S., Smith, P., & Fernandes, C. (2007). Intelligence and educational achievement. Intelligence, 35(1), 13-21. https://doi.org/10.1016/j.intell.2006.02.001
Kaufman, S. B., Reynolds, M. R., Liu, X., Kaufman, A. S., & McGrew, K. S. (2012). Are cognitive g and academic achievement g one and the same g? An exploration on the Woodcock-Johnson and Kaufman tests. Intelligence, 40(2), 123-138. https://doi.org/10.1016/j.intell.2012.01.009
Zaboski, B. A., Kranzler, J. H., & Gage, N. A. (2018). Meta-analysis of the relationship between academic achievement and broad abilities of the Cattell-horn-Carroll theory. Journal of School Psychology, 71, 42-56. https://doi.org/10.1016/j.jsp.2018.10.001
Gignac, G. E. (2008). Higher-order models versus direct hierarchical models: G as superordinate or breadth factor? Psychology Science, 50(1), 21-43.
McGill, R. J., & Busse, R. T. (2015). Incremental validity of the WJ III COG: Limited predictive effects beyond the GIA-E. School Psychology Quarterly: The Official Journal of the Division of School Psychology, American Psychological Association, 30(3), 353-365. https://doi.org/10.1037/spq0000094
Jensen, A. R. (1984). Test validity: G versus the specificity doctrine. Journal of Social and Biological Structures, 7(2), 93-118. https://doi.org/10.1016/S0140-1750(84)80001-9
Ganzach, Y., & Patel, P. C. (2018). Wages, mental abilities and assessments in large scale international surveys: Still not much more than g. Intelligence, 69, 1-7. https://doi.org/10.1016/j.intell.2018.03.014
I'll do more with this later. That means doing it properly.
p_load(qgraph, qpcR, igraph, EGAnet)
EGA(CogData)
## Network estimated with:
## • gamma = 0.5
## • lambda.min.ratio = 0.1
## EGA Results:
##
## Number of Dimensions:
## [1] 5
##
## Items per Dimension:
## items dimension
## FF FF 1
## FI1 FI1 1
## FI2 FI2 1
## MV MV 2
## MA1 MA1 2
## MA2 MA2 2
## I I 3
## RL RL 3
## RG RG 3
## VZ VZ 3
## CF CF 3
## N N 4
## P1 P1 4
## P2 P2 4
## P3 P3 4
## V1 V1 5
## V2 V2 5
## V3 V3 5
congraph <- EBICglasso(cor(CogData, use = "pairwise.complete.obs"), n = 230, lambda.min.ratio = 0.01, threshold = T) #threshold = T for specificity
congraphlay <- qgraph(congraph, layout = "spring", legend = F, groups = PsyLats, palette = "colorblind", theme = "Reddit", fade = F)
SNetwork <- ggmFit(congraph, as.matrix(cor(CogData, use = "pairwise.complete.obs")), sampleSize = 230, nPar = fitMeasures(HoG, "npar") - 1, ebicTuning = 0.5)
## Refitting network
SNetwork
##
## ggmFit object:
## Use plot(object) to plot the network structure
## Fit measures stored under object$fitMeasures
##
## Measure Value
## 1 nvar 18
## 2 nobs 171
## 3 npar 41
## 4 df 130
## 5 fmin 1.01
## 6 chisq 464.80
## 7 pvalue < 0.0001
## 8 baseline.chisq 1598.54
## 9 baseline.df 153
## 10 baseline.pvalue < 0.0001
## 11 nfi 0.71
## 12 tli 0.73
## 13 rfi 0.66
## 14 ifi 0.77
## 15 rni 0.77
## 16 cfi 0.77
## 17 rmsea 0.11
## 18 rmsea.ci.lower 0.096
## 19 rmsea.ci.upper 0.12
## 20 rmsea.pvalue 0
## 21 rmr 0.16
## 22 srmr 0.16
## 23 logl -5298.51
## 24 unrestricted.logl -5066.11
## 25 aic.ll 10679.03
## 26 aic.ll2 10697.35
## 27 aic.x 204.80
## 28 aic.x2 546.80
## 29 bic 10819.99
## 30 bic2 10690.04
## 31 ebic 11057.00
## 32 ebicTuning 0.50
p_load(GPArotation)
bassAckward(CogData, nfactors = 2, organize = T, cut = 0.3, fm = "pca", main = "BAFACALO, PCA, 2"); bassAckward(CogData, nfactors = 3, organize = T, cut = 0.3, fm = "pca", main = "BAFACALO, PCA, 3"); bassAckward(CogData, nfactors = 6, organize = T, cut = 0.3, fm = "pca", main = "BAFACALO, PCA, 6"); bassAckward(CogData, nfactors = 6, organize = T, cut = 0.3, main = "BAFACALO, EFA, 6")
##
## Call: bassAckward(r = CogData, nfactors = 2, fm = "pca", cut = 0.3,
## organize = T, main = "BAFACALO, PCA, 2")
##
## 1 C1
## 2 C1
## Use print with the short = FALSE option to see the correlations, or use the summary command.
##
## Call: bassAckward(r = CogData, nfactors = 3, fm = "pca", cut = 0.3,
## organize = T, main = "BAFACALO, PCA, 3")
##
## 1 C1
## 2 C1
## 3 C1 C2
## Use print with the short = FALSE option to see the correlations, or use the summary command.
##
## Call: bassAckward(r = CogData, nfactors = 6, fm = "pca", cut = 0.3,
## organize = T, main = "BAFACALO, PCA, 6")
##
## 1 C1
## 2 C1
## 3 C1 C2
## 4 C1 C2 C4
## 5 C1 C2 C3 C4
## 6 C1 C2 C4 C5 C3
## Use print with the short = FALSE option to see the correlations, or use the summary command.
##
## Call: bassAckward(r = CogData, nfactors = 6, cut = 0.3, organize = T,
## main = "BAFACALO, EFA, 6")
##
## 1 F1
## 2 F1
## 3 F1 F2
## 4 F1 F2 F4
## 5 F1 F2 F5 F3
## 6 F1 F2 F5 F4 F6
## Use print with the short = FALSE option to see the correlations, or use the summary command.
With n = 6 this will not be pretty (or necessary).
set.seed(42069)
PPC <- lm(Gl ~ GVal, corrdf)
pp_check(PPC)
If a factor model summarizes all or nearly-all of the common, systematic variance in a set of variables, then factor scores and scale scores should correlate near-identically with a single correction for the reliability of the scale. If, on the other hand, the uniqueness of the variables is not random error, the correlation between factor scores and scale scores should be reduced considerably. This is not a very sensitive test, so it'll just be for fun and can really only be used like an intraocular trauma test. The difference from unity without systematic bias should just be due to (1) sampling error which will decrease with n and (2) random measurement error, indexed reasonably well by the composite reliability of the scale. This obviously only applies for unidimensional models like the correlated group-factor or higher-order models (without cross-loadings).
BAFFac <- c("Gf", "Gv", "Gs", "Gfl", "Gsm", "Gc", "GFT", "GVT", "GST", "GFLT", "GSMT", "GCT")
BafCorD <- FS[BAFFac]
cor(BafCorD)
## Gf Gv Gs Gfl Gsm Gc GFT
## Gf 1.0000000 0.6339483 0.7700378 0.4757595 0.6102915 0.5691873 0.9593579
## Gv 0.6339483 1.0000000 0.6970264 0.3920400 0.5000345 0.5180147 0.5535624
## Gs 0.7700378 0.6970264 1.0000000 0.5207271 0.6398778 0.6260182 0.7143155
## Gfl 0.4757595 0.3920400 0.5207271 1.0000000 0.3439020 0.4401977 0.4050554
## Gsm 0.6102915 0.5000345 0.6398778 0.3439020 1.0000000 0.4780607 0.5501461
## Gc 0.5691873 0.5180147 0.6260182 0.4401977 0.4780607 1.0000000 0.5170801
## GFT 0.9593579 0.5535624 0.7143155 0.4050554 0.5501461 0.5170801 1.0000000
## GVT 0.4706547 0.9676231 0.5410603 0.2830057 0.3515902 0.3920426 0.3916701
## GST 0.5091375 0.4580357 0.8860434 0.3216697 0.4508433 0.4066359 0.4825259
## GFLT 0.3214442 0.2379008 0.3504058 0.9289825 0.2231831 0.2965826 0.2614736
## GSMT 0.4940748 0.4011040 0.5145364 0.2435686 0.9785253 0.3719393 0.4421852
## GCT 0.4488499 0.4238066 0.5089101 0.3744053 0.3913289 0.9743498 0.4056595
## GVT GST GFLT GSMT GCT
## Gf 0.4706547 0.5091375 0.3214442 0.4940748 0.4488499
## Gv 0.9676231 0.4580357 0.2379008 0.4011040 0.4238066
## Gs 0.5410603 0.8860434 0.3504058 0.5145364 0.5089101
## Gfl 0.2830057 0.3216697 0.9289825 0.2435686 0.3744053
## Gsm 0.3515902 0.4508433 0.2231831 0.9785253 0.3913289
## Gc 0.3920426 0.4066359 0.2965826 0.3719393 0.9743498
## GFT 0.3916701 0.4825259 0.2614736 0.4421852 0.4056595
## GVT 1.0000000 0.3292943 0.1537225 0.2705081 0.3177133
## GST 0.3292943 1.0000000 0.1836168 0.3499651 0.3091759
## GFLT 0.1537225 0.1836168 1.0000000 0.1434847 0.2567441
## GSMT 0.2705081 0.3499651 0.1434847 1.0000000 0.2994833
## GCT 0.3177133 0.3091759 0.2567441 0.2994833 1.0000000
So latent Gf and its corresponding scale correlate at 0.959, Gv gets 0.968, Gs gets 0.886, Gfl gets 0.929, Gsm gets 0.979, and Gc gets 0.974. It's probably the case that there's very little systematic residual variance in these measures, and certainly too little to do interpret. By the common rule-of-thumb that measures don't have enough discriminant validity to be used independently when they correlate at or greater than r = 0.85, differences between total and factor scores would be rejected for this data. Worth noting is that if a bifactor model is used, the total score and group factor scores are not as related. That is because g is often far more prominent than group factors. There will still be inflation due to the factor score estimation method however. Since this battery did not suffer from factor collapse, it should yield quite large relationships between the group factor measures and their total scores even if those explain little of the variance compared to g in the proper medium to investigate that, A SEM. This is clear just by looking at the sum of squared correlations for g and group factors versus the communalities in this case (the former is higher than the latter and in the case of Gc, >1, an obvious impossibility and not something that happened in the model).
BAFFacBF <- c("gINT", "Gf", "Gs", "Gfl", "Gsm", "Gc", "GFT", "GST", "GFLT", "GSMT", "GCT")
BafCorDBF <- BS[BAFFacBF]
cor(BafCorDBF)
## gINT Gf Gs Gfl Gsm Gc
## gINT 1.0000000 0.177002146 0.18748141 0.10729449 1.309218e-01 0.1722406035
## Gf 0.1770021 1.000000000 -0.17553448 -0.02820754 -5.186266e-03 -0.1871133059
## Gs 0.1874814 -0.175534480 1.00000000 -0.07058037 -1.430262e-01 -0.1912705398
## Gfl 0.1072945 -0.028207538 -0.07058037 1.00000000 -1.023564e-01 0.0475077302
## Gsm 0.1309218 -0.005186266 -0.14302625 -0.10235637 1.000000e+00 -0.0770821352
## Gc 0.1722406 -0.187113306 -0.19127054 0.04750773 -7.708214e-02 1.0000000000
## GFT 0.8401092 0.518208022 0.01774068 0.00537240 4.511176e-02 -0.0001352586
## GST 0.7189841 -0.053850149 0.55764413 -0.02057356 9.512753e-03 -0.0491761077
## GFLT 0.3774979 0.061370040 -0.01847591 0.90561060 -1.793272e-02 0.0846003494
## GSMT 0.6079595 0.070187869 -0.05531146 -0.05954138 7.931666e-01 0.0010948420
## GCT 0.6229409 -0.062785647 -0.06067124 0.09064416 -1.928414e-05 0.8641181444
## GFT GST GFLT GSMT GCT
## gINT 0.8401091582 0.718984065 0.37749793 0.607959475 6.229409e-01
## Gf 0.5182080216 -0.053850149 0.06137004 0.070187869 -6.278565e-02
## Gs 0.0177406840 0.557644129 -0.01847591 -0.055311459 -6.067124e-02
## Gfl 0.0053724000 -0.020573562 0.90561060 -0.059541382 9.064416e-02
## Gsm 0.0451117645 0.009512753 -0.01793272 0.793166563 -1.928414e-05
## Gc -0.0001352586 -0.049176108 0.08460035 0.001094842 8.641181e-01
## GFT 1.0000000000 0.482525917 0.26147357 0.442185213 4.056595e-01
## GST 0.4825259166 1.000000000 0.18361684 0.349965125 3.091759e-01
## GFLT 0.2614735716 0.183616839 1.00000000 0.143484652 2.567441e-01
## GSMT 0.4421852126 0.349965125 0.14348465 1.000000000 2.994833e-01
## GCT 0.4056594783 0.309175920 0.25674408 0.299483350 1.000000e+00
GFTM <- c("N", "I", "RL", "RG"); GFTMD <- FS[GFTM]
GVTM <- c("VZ", "CF"); GVTMD <- FS[GVTM]
GSTM <- c("P1", "P2", "P3"); GSTMD <- FS[GSTM]
GFLTM <- c("FF", "FI1", "FI2"); GFLTMD <- FS[GFLTM]
GSMTM <- c("MV", "MA1", "MA2"); GSMTMD <- FS[GSMTM]
GCTM <- c("V1", "V2", "V3"); GCTMD <- FS[GCTM]
splitHalf(GFTMD, raw = T, covar = F, ci = .05)
## Split half reliabilities
## Call: splitHalf(r = GFTMD, raw = T, covar = F, ci = 0.05)
##
## Maximum split half reliability (lambda 4) = 0.75
## Guttman lambda 6 = 0.71
## Average split half reliability = 0.74
## Guttman lambda 3 (alpha) = 0.74
## Guttman lambda 2 = 0.75
## Minimum split half reliability (beta) = 0.74
## Average interitem r = 0.42 with median = 0.42
## 2.5% 50% 97.5%
## Quantiles of split half reliability = 0.74 0.75 0.75
splitHalf(GVTMD, raw = T, covar = F, ci = .05)
## Split half reliabilities
## Call: splitHalf(r = GVTMD, raw = T, covar = F, ci = 0.05)
##
## Maximum split half reliability (lambda 4) = 0.7
## Guttman lambda 6 = 0.54
## Average split half reliability = 0.7
## Guttman lambda 3 (alpha) = 0.7
## Guttman lambda 2 = 0.7
## Minimum split half reliability (beta) = 0.7
## Average interitem r = 0.54 with median = 0.54
## 2.5% 50% 97.5%
## Quantiles of split half reliability = 0.7 0.7 0.7
splitHalf(GSTMD, raw = T, covar = F, ci = .05)
## Split half reliabilities
## Call: splitHalf(r = GSTMD, raw = T, covar = F, ci = 0.05)
##
## Maximum split half reliability (lambda 4) = 0.58
## Guttman lambda 6 = 0.49
## Average split half reliability = 0.52
## Guttman lambda 3 (alpha) = 0.58
## Guttman lambda 2 = 0.59
## Minimum split half reliability (beta) = 0.47
## Average interitem r = 0.32 with median = 0.34
## 2.5% 50% 97.5%
## Quantiles of split half reliability = 0.47 0.5 0.58
splitHalf(GFLTMD, raw = T, covar = F, ci = .05)
## Split half reliabilities
## Call: splitHalf(r = GFLTMD, raw = T, covar = F, ci = 0.05)
##
## Maximum split half reliability (lambda 4) = 0.67
## Guttman lambda 6 = 0.58
## Average split half reliability = 0.57
## Guttman lambda 3 (alpha) = 0.64
## Guttman lambda 2 = 0.66
## Minimum split half reliability (beta) = 0.42
## Average interitem r = 0.37 with median = 0.31
## 2.5% 50% 97.5%
## Quantiles of split half reliability = 0.42 0.62 0.67
splitHalf(GSMTMD, raw = T, covar = F, ci = .05)
## Split half reliabilities
## Call: splitHalf(r = GSMTMD, raw = T, covar = F, ci = 0.05)
##
## Maximum split half reliability (lambda 4) = 0.65
## Guttman lambda 6 = 0.58
## Average split half reliability = 0.58
## Guttman lambda 3 (alpha) = 0.66
## Guttman lambda 2 = 0.67
## Minimum split half reliability (beta) = 0.46
## Average interitem r = 0.39 with median = 0.31
## 2.5% 50% 97.5%
## Quantiles of split half reliability = 0.46 0.64 0.65
splitHalf(GCTMD, raw = T, covar = F, ci = .05)
## Split half reliabilities
## Call: splitHalf(r = GCTMD, raw = T, covar = F, ci = 0.05)
##
## Maximum split half reliability (lambda 4) = 0.71
## Guttman lambda 6 = 0.68
## Average split half reliability = 0.67
## Guttman lambda 3 (alpha) = 0.75
## Guttman lambda 2 = 0.76
## Minimum split half reliability (beta) = 0.6
## Average interitem r = 0.51 with median = 0.46
## 2.5% 50% 97.5%
## Quantiles of split half reliability = 0.6 0.7 0.71
These are all rather low, but they should serve reasonably well for estimating the relationship between reliability and deviation from a perfect correlation. It is interesting to note that the bifactor group factors produce a relationship with relability of r = 0.058, \(\rho\) = -0.100, and a congruence of 0.974. On the other hand, for g these are 0.203, 0.300, and 0.973: neither is terribly affected by unreliability, but the latter may be to a slight degree. The finding that
dfc <- data.frame("Correlation" = c(0.959, 0.968, 0.886, 0.929, 0.979, 0.974), "Reliability" = c(0.75, 0.70, 0.58, 0.67, 0.65, 0.71))
cor(dfc$Correlation, dfc$Reliability); cor(dfc$Correlation, dfc$Reliability, method = "spearman"); CONGO(dfc$Correlation, dfc$Reliability)
## [1] 0.7123782
## [1] 0.2571429
## [1] 0.9982326
ggplot(dfc, aes(x = Correlation, y = Reliability)) + geom_point() + geom_smooth(method = lm, color = "orangered2", formula = 'y ~ x') + labs(x = "Total Score Correlation", y = "Total Score Reliability") + theme_minimal() + theme(legend.position = "none", text = element_text(size = 12, family = "serif"), plot.title = element_text(hjust = 0.5))