Setup

library(pacman); p_load(lavaan, dplyr)

FITM <- c("chisq", "df", "nPar", "cfi", "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "aic", "bic")

Rationale

Humphreys et al. (2022) finally made the Bucharest Early Intervention Project IQ data available to analyze (https://osf.io/yd9se/?view_only=eca2485027ff49d487da424769400ba6). They concluded that movement from care-as-usual to foster-care improved fullscale-IQs by 9 points. For some reason, they reported this as “9.00 points”, despite it being 8.38 points (p = .02). They made gigantic conclusions, like that their work provided “compelling evidence that caregiving relationships influence IQ” and “long-lasting family placements [sic] is the most advantageous strategy to enhance cognitive development among children requiring institutional care”. The statistically low-quality nature of their results makes this conclusion entirely unwarranted, as it is nearly consistent with chance at conventional p-value thresholds, and with the number of additional analyses they ran, any multiple comparisons correction would render their result nonsignificant. Moreover, reanalysis reveals that the difference could be attributed to ethnic differences in the sample just as easily as it could to the intervention, as both were as statistically strongly supported. Randomization may be a way around this concern in theory, but with samples as small as the one they used, randomization failures can easily occur, and the evidence for such a failure is at least as strong as the evidence being moved to foster-care had any effect on IQ or intelligence.

Analysis

Summary Statistics

group_by(data, group) %>%
  summarise(
    count = n(),
    mean = mean(IQ_fullscale_age18),
    sd = sd(IQ_fullscale_age18))
group_by(data, ethnic) %>%
  summarise(
    count = n(),
    mean = mean(IQ_fullscale_age18),
    sd = sd(IQ_fullscale_age18))

Mean Differences

t.test(IQ_fullscale_age18 ~ ethnic, dataEthnic) #Only Romanians and Roma
## 
##  Welch Two Sample t-test
## 
## data:  IQ_fullscale_age18 by ethnic
## t = 5.0937, df = 60.821, p-value = 3.655e-06
## alternative hypothesis: true difference in means between group Romanian and group Rroma (Gypsy) is not equal to 0
## 95 percent confidence interval:
##  11.35186 26.02592
## sample estimates:
##      mean in group Romanian mean in group Rroma (Gypsy) 
##                    83.28889                    64.60000
t.test(IQ_fullscale_age18 ~ group, dataGroup)   #Only CAUG and FCG
## 
##  Welch Two Sample t-test
## 
## data:  IQ_fullscale_age18 by group
## t = -2.3448, df = 91.062, p-value = 0.02121
## alternative hypothesis: true difference in means between group CAUG and group FCG is not equal to 0
## 95 percent confidence interval:
##  -16.32420  -1.35104
## sample estimates:
## mean in group CAUG  mean in group FCG 
##           64.65217           73.48980

Baseline Model

RomGModel <- '
g =~ VCI_est3_age18 + PRI_est3_age18 + WMI_est2_age18 + PSI_est2_age18'

RomGFit <- cfa(RomGModel, data, std.lv = T)
summary(RomGFit, stand = T, fit = T)
## lavaan 0.6-12 ended normally after 17 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         8
## 
##   Number of observations                           135
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 4.294
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.117
## 
## Model Test Baseline Model:
## 
##   Test statistic                               477.186
##   Degrees of freedom                                 6
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.995
##   Tucker-Lewis Index (TLI)                       0.985
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -2131.914
##   Loglikelihood unrestricted model (H1)      -2129.768
##                                                       
##   Akaike (AIC)                                4279.829
##   Bayesian (BIC)                              4303.071
##   Sample-size adjusted Bayesian (BIC)         4277.764
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.092
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.215
##   P-value RMSEA <= 0.05                          0.202
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.012
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_est3_age18   17.453    1.347   12.960    0.000   17.453    0.885
##     PRI_est3_age18   19.113    1.357   14.083    0.000   19.113    0.930
##     WMI_est2_age18   17.316    1.366   12.679    0.000   17.316    0.873
##     PSI_est2_age18   15.347    1.230   12.476    0.000   15.347    0.865
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_est3_age18   84.091   13.446    6.254    0.000   84.091    0.216
##    .PRI_est3_age18   57.302   11.953    4.794    0.000   57.302    0.136
##    .WMI_est2_age18   93.153   14.361    6.486    0.000   93.153    0.237
##    .PSI_est2_age18   79.353   11.971    6.629    0.000   79.353    0.252
##     g                 1.000                               1.000    1.000

Between Ethnic Groups

MI rules: check if p < .05, \(\Delta_{\text{AIC}} > 3\) or \(\Delta_{\text{BIC}} > 3\), and large (i.e., >.01 or >.015) changes in CFI or RMSEA, respectively, but these get less concern if \(\chi^2\), AIC, and BIC are fine. First comparison: Romanians and Roma.

RomConfigural <- cfa(RomGModel, dataEthnic, std.lv = T, group = "ethnic")
RomMetric <- cfa(RomGModel, dataEthnic, std.lv = F, group = "ethnic", group.equal = "loadings")

#RomPartialMetric <- cfa(RomGModel, dataEthnic, std.lv = F, group = "ethnic", group.equal = "loadings", group.partial = "g=~PRI_est3_age18") #Optional

RomScalar <- cfa(RomGModel, dataEthnic, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts"))

#RomPartialScalar <- cfa(RomGModel, dataEthnic, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts"), group.partial = c("g=~PRI_est3_age18", "PRI_est3_age18~1"))

RomStrict <- cfa(RomGModel, dataEthnic, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts", "residuals"))

#RomPartialStrict <- cfa(RomGModel, dataEthnic, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts", "residuals"), group.partial = c("g=~PRI_est3_age18", "PRI_est3_age18~1", "VCI_est3_age18~~VCI_est3_age18"))

RomLV <- cfa(RomGModel, dataEthnic, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"))
RomMean <- cfa(RomGModel, dataEthnic, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "means"))

round(cbind(RC = fitMeasures(RomConfigural, FITM),
            RM = fitMeasures(RomMetric, FITM), #pchisq(11.721-4.648, 3, lower.tail = F) = .07
#           RMP = fitMeasures(RomPartialMetric, FITM),
            RS = fitMeasures(RomScalar, FITM), #pchisq(15.935-11.721, 3, lower.tail = F) = .24
#           RSP = fitMeasures(RomPartialScalar, FITM),
            RR = fitMeasures(RomStrict, FITM), #pchisq(23.715-15.935, 4, lower.tail = F) = .10
#           RRS = fitMeasures(RomPartialStrict, FITM),
            RL = fitMeasures(RomLV, FITM),
            RE = fitMeasures(RomMean, FITM)), 3)
##                      RC       RM       RS       RR       RL       RE
## chisq             4.648   11.721   15.935   23.715   25.773   42.852
## df                4.000    7.000   10.000   14.000   15.000   16.000
## npar             24.000   21.000   18.000   14.000   13.000   12.000
## cfi               0.998    0.987    0.984    0.973    0.970    0.926
## rmsea             0.052    0.106    0.099    0.108    0.109    0.167
## rmsea.ci.lower    0.000    0.000    0.000    0.005    0.023    0.108
## rmsea.ci.upper    0.208    0.209    0.187    0.180    0.179    0.229
## aic            3798.535 3799.609 3797.822 3797.603 3797.660 3812.739
## bic            3865.435 3858.146 3847.997 3836.628 3833.897 3846.189
summary(RomLV, stand = T, fit = T)
## lavaan 0.6-12 ended normally after 89 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        25
##   Number of equality constraints                    12
## 
##   Number of observations per group:                   
##     Romanian                                        90
##     Rroma (Gypsy)                                   30
## 
## Model Test User Model:
##                                                       
##   Test statistic                                25.773
##   Degrees of freedom                                15
##   P-value (Chi-square)                           0.040
##   Test statistic for each group:
##     Romanian                                     7.963
##     Rroma (Gypsy)                               17.810
## 
## Model Test Baseline Model:
## 
##   Test statistic                               373.962
##   Degrees of freedom                                12
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.970
##   Tucker-Lewis Index (TLI)                       0.976
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1885.830
##   Loglikelihood unrestricted model (H1)      -1872.944
##                                                       
##   Akaike (AIC)                                3797.660
##   Bayesian (BIC)                              3833.897
##   Sample-size adjusted Bayesian (BIC)         3792.798
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.109
##   90 Percent confidence interval - lower         0.023
##   90 Percent confidence interval - upper         0.179
##   P-value RMSEA <= 0.05                          0.096
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.135
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [Romanian]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_3_1           1.000                              15.675    0.874
##     PRI_3_1 (.p2.)    1.111    0.073   15.191    0.000   17.418    0.911
##     WMI_2_1 (.p3.)    0.977    0.074   13.158    0.000   15.322    0.841
##     PSI_2_1 (.p4.)    0.857    0.067   12.755    0.000   13.426    0.827
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.10.)   83.396    1.855   44.959    0.000   83.396    4.651
##    .PRI_3_1 (.11.)   86.523    1.994   43.394    0.000   86.523    4.524
##    .WMI_2_1 (.12.)   89.455    1.872   47.785    0.000   89.455    4.910
##    .PSI_2_1 (.13.)   86.417    1.664   51.931    0.000   86.417    5.320
##     g                 0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.p5.)   75.763   13.412    5.649    0.000   75.763    0.236
##    .PRI_3_1 (.p6.)   62.398   13.454    4.638    0.000   62.398    0.171
##    .WMI_2_1 (.p7.)   97.165   15.641    6.212    0.000   97.165    0.293
##    .PSI_2_1 (.p8.)   83.564   13.088    6.385    0.000   83.564    0.317
##     g       (.p9.)  245.700   40.308    6.096    0.000    1.000    1.000
## 
## 
## Group 2 [Rroma (Gypsy)]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_3_1           1.000                              15.675    0.874
##     PRI_3_1 (.p2.)    1.111    0.073   15.191    0.000   17.418    0.911
##     WMI_2_1 (.p3.)    0.977    0.074   13.158    0.000   15.322    0.841
##     PSI_2_1 (.p4.)    0.857    0.067   12.755    0.000   13.426    0.827
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.10.)   83.396    1.855   44.959    0.000   83.396    4.651
##    .PRI_3_1 (.11.)   86.523    1.994   43.394    0.000   86.523    4.524
##    .WMI_2_1 (.12.)   89.455    1.872   47.785    0.000   89.455    4.910
##    .PSI_2_1 (.13.)   86.417    1.664   51.931    0.000   86.417    5.320
##     g               -14.750    3.492   -4.225    0.000   -0.941   -0.941
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.p5.)   75.763   13.412    5.649    0.000   75.763    0.236
##    .PRI_3_1 (.p6.)   62.398   13.454    4.638    0.000   62.398    0.171
##    .WMI_2_1 (.p7.)   97.165   15.641    6.212    0.000   97.165    0.293
##    .PSI_2_1 (.p8.)   83.564   13.088    6.385    0.000   83.564    0.317
##     g       (.p9.)  245.700   40.308    6.096    0.000    1.000    1.000

There were no measurement issues comparing Romanians to Roma. Their mean intelligence levels differed significantly, but not due to bias. Roma average -.941 g lower than Romanians,

Between Intervention Groups

Second comparison: care-as-usual-group (CAUG) and foster-care-group (FCG). The non-institutionalized group is of children who were not abandoned and is therefore incomparable, as it is not randomized, and is very obviously selected with respect to the conditions that bring about child abandonment. An interesting example of this playing out can be observed in the UKBB, where individuals put up for adoption had greater genetic burdens for psychiatric conditions (Lehto et al., 2020). This was obviously not caused by being put up for adoption. The correct conclusion is that people with a higher risk of psychiatric conditions for genetic reasons are more likely to put their kids - who inherit their genes - up for adoption.

RomConfigural <- cfa(RomGModel, dataGroup, std.lv = T, group = "group")
RomMetric <- cfa(RomGModel, dataGroup, std.lv = F, group = "group", group.equal = "loadings")
RomScalar <- cfa(RomGModel, dataGroup, std.lv = F, group = "group", group.equal = c("loadings", "intercepts"))
RomStrict <- cfa(RomGModel, dataGroup, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals"))
RomLV <- cfa(RomGModel, dataGroup, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"))
RomMean <- cfa(RomGModel, dataGroup, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "means"))

round(cbind(RC = fitMeasures(RomConfigural, FITM),
            RM = fitMeasures(RomMetric, FITM), #pchisq(9.627 - 5.813, 3, lower.tail = F) = .28
            RS = fitMeasures(RomScalar, FITM), #pchisq(16.253 - 9.627, 3, lower.tail = F) = .08
            RR = fitMeasures(RomStrict, FITM),
            RL = fitMeasures(RomLV, FITM),
            RE = fitMeasures(RomMean, FITM)), 3) #pchisq(21.534 - 17.272, 1, lower.tail = F) = .04
##                      RC       RM       RS       RR       RL       RE
## chisq             5.813    9.627   16.253   16.588   17.272   21.534
## df                4.000    7.000   10.000   14.000   15.000   16.000
## npar             24.000   21.000   18.000   14.000   13.000   12.000
## cfi               0.994    0.992    0.980    0.992    0.993    0.983
## rmsea             0.098    0.089    0.115    0.062    0.056    0.085
## rmsea.ci.lower    0.000    0.000    0.000    0.000    0.000    0.000
## rmsea.ci.upper    0.256    0.212    0.212    0.160    0.154    0.169
## aic            2983.587 2981.401 2982.028 2974.363 2973.046 2975.308
## bic            3044.880 3035.033 3027.998 3010.117 3006.247 3005.955
summary(RomLV, stand = T, fit = T)
## lavaan 0.6-12 ended normally after 95 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        25
##   Number of equality constraints                    12
## 
##   Number of observations per group:                   
##     FCG                                             49
##     CAUG                                            46
## 
## Model Test User Model:
##                                                       
##   Test statistic                                17.272
##   Degrees of freedom                                15
##   P-value (Chi-square)                           0.303
##   Test statistic for each group:
##     FCG                                         10.910
##     CAUG                                         6.361
## 
## Model Test Baseline Model:
## 
##   Test statistic                               330.261
##   Degrees of freedom                                12
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.993
##   Tucker-Lewis Index (TLI)                       0.994
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1473.523
##   Loglikelihood unrestricted model (H1)      -1464.887
##                                                       
##   Akaike (AIC)                                2973.046
##   Bayesian (BIC)                              3006.247
##   Sample-size adjusted Bayesian (BIC)         2965.203
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.056
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.154
##   P-value RMSEA <= 0.05                          0.421
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.106
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [FCG]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_3_1           1.000                              15.166    0.835
##     PRI_3_1 (.p2.)    1.124    0.092   12.244    0.000   17.042    0.930
##     WMI_2_1 (.p3.)    1.119    0.096   11.682    0.000   16.977    0.902
##     PSI_2_1 (.p4.)    0.906    0.085   10.687    0.000   13.737    0.855
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.10.)   74.540    2.436   30.599    0.000   74.540    4.103
##    .PRI_3_1 (.11.)   77.806    2.570   30.276    0.000   77.806    4.244
##    .WMI_2_1 (.12.)   81.844    2.604   31.431    0.000   81.844    4.350
##    .PSI_2_1 (.13.)   78.581    2.174   36.139    0.000   78.581    4.892
##     g                 0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.p5.)  100.019   17.109    5.846    0.000  100.019    0.303
##    .PRI_3_1 (.p6.)   45.627   11.613    3.929    0.000   45.627    0.136
##    .WMI_2_1 (.p7.)   65.798   13.689    4.806    0.000   65.798    0.186
##    .PSI_2_1 (.p8.)   69.271   12.278    5.642    0.000   69.271    0.269
##     g       (.p9.)  230.009   46.028    4.997    0.000    1.000    1.000
## 
## 
## Group 2 [CAUG]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_3_1           1.000                              15.166    0.835
##     PRI_3_1 (.p2.)    1.124    0.092   12.244    0.000   17.042    0.930
##     WMI_2_1 (.p3.)    1.119    0.096   11.682    0.000   16.977    0.902
##     PSI_2_1 (.p4.)    0.906    0.085   10.687    0.000   13.737    0.855
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.10.)   74.540    2.436   30.599    0.000   74.540    4.103
##    .PRI_3_1 (.11.)   77.806    2.570   30.276    0.000   77.806    4.244
##    .WMI_2_1 (.12.)   81.844    2.604   31.431    0.000   81.844    4.350
##    .PSI_2_1 (.13.)   78.581    2.174   36.139    0.000   78.581    4.892
##     g                -6.724    3.239   -2.076    0.038   -0.443   -0.443
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.p5.)  100.019   17.109    5.846    0.000  100.019    0.303
##    .PRI_3_1 (.p6.)   45.627   11.613    3.929    0.000   45.627    0.136
##    .WMI_2_1 (.p7.)   65.798   13.689    4.806    0.000   65.798    0.186
##    .PSI_2_1 (.p8.)   69.271   12.278    5.642    0.000   69.271    0.269
##     g       (.p9.)  230.009   46.028    4.997    0.000    1.000    1.000

Between Ethnic Groups, Intervention Groups Only

RomConfigural <- cfa(RomGModel, dataEthnicGroup, std.lv = T, group = "ethnic")
RomMetric <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "ethnic", group.equal = "loadings")
RomPartialMetric <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "ethnic", group.equal = "loadings", group.partial = "g=~PRI_est3_age18")
RomScalar <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts"), group.partial = "g=~PRI_est3_age18")
RomStrict <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts", "residuals"), group.partial = "g=~PRI_est3_age18")
RomLV <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"), group.partial = "g=~PRI_est3_age18")
RomMean <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "ethnic", group.equal = c("loadings", "intercepts", "residuals", "means"), group.partial = "g=~PRI_est3_age18")

round(cbind(RC = fitMeasures(RomConfigural, FITM),
            RM = fitMeasures(RomMetric, FITM), #pchisq(14.243 - 3.864, 3, lower.tail = F) = .02
            RMP = fitMeasures(RomPartialMetric, FITM),
            RS = fitMeasures(RomScalar, FITM), 
            RR = fitMeasures(RomStrict, FITM), #pchisq(15.054 - 9.978, 4, lower.tail = F) = .28
            RL = fitMeasures(RomLV, FITM), #pchisq(20.170 - 15.045, 1, lower.tail = F) = .02
            RE = fitMeasures(RomMean, FITM)), 3) 
##                      RC       RM      RMP       RS       RR       RL       RE
## chisq             3.864   14.243    7.175    9.978   15.045   20.170   22.594
## df                4.000    7.000    6.000    9.000   13.000   14.000   14.000
## npar             24.000   21.000   22.000   19.000   15.000   14.000   14.000
## cfi               1.000    0.972    0.995    0.996    0.992    0.976    0.967
## rmsea             0.000    0.161    0.070    0.052    0.063    0.105    0.124
## rmsea.ci.lower    0.000    0.018    0.000    0.000    0.000    0.000    0.000
## rmsea.ci.upper    0.235    0.281    0.226    0.190    0.175    0.199    0.214
## aic            2506.307 2510.686 2505.618 2502.421 2499.488 2502.613 2505.036
## bic            2563.476 2560.708 2558.023 2547.679 2535.218 2535.961 2538.385

Among the institutionalized-only groups, the ethnic differences in means and variances between Romanians and Roma are present. There appears to be bias in terms of the PRI subscale, which is Perceptual Reasoning, a subscale considered to be “culture-free”, because it does not test things like vocabulary or mathematical skills, but instead assesses spatial processing, visuomotor integration, and perceptual and fluid reasoning. Romanians appear to have a broader range of cognitive ability and to be 0.554 g more intelligent in the institutionalized group.

summary(RomStrict, stand = T, fit = T)
## lavaan 0.6-12 ended normally after 107 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        25
##   Number of equality constraints                    10
## 
##   Number of observations per group:                   
##     Rroma (Gypsy)                                   28
##     Romanian                                        52
## 
## Model Test User Model:
##                                                       
##   Test statistic                                15.045
##   Degrees of freedom                                13
##   P-value (Chi-square)                           0.305
##   Test statistic for each group:
##     Rroma (Gypsy)                                9.544
##     Romanian                                     5.501
## 
## Model Test Baseline Model:
## 
##   Test statistic                               269.557
##   Degrees of freedom                                12
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.992
##   Tucker-Lewis Index (TLI)                       0.993
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1234.744
##   Loglikelihood unrestricted model (H1)      -1227.221
##                                                       
##   Akaike (AIC)                                2499.488
##   Bayesian (BIC)                              2535.218
##   Sample-size adjusted Bayesian (BIC)         2487.917
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.063
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.175
##   P-value RMSEA <= 0.05                          0.397
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.074
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [Rroma (Gypsy)]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_3_1           1.000                              10.421    0.746
##     PRI_3_1           1.576    0.219    7.198    0.000   16.422    0.933
##     WMI_2_1 (.p3.)    1.134    0.102   11.092    0.000   11.819    0.823
##     PSI_2_1 (.p4.)    0.874    0.091    9.572    0.000    9.105    0.719
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.10.)   65.997    2.340   28.209    0.000   65.997    4.727
##    .PRI_3_1 (.11.)   68.990    2.693   25.623    0.000   68.990    3.918
##    .WMI_2_1 (.12.)   73.369    2.529   29.014    0.000   73.369    5.107
##    .PSI_2_1 (.13.)   71.752    2.085   34.415    0.000   71.752    5.665
##     g                 0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.p5.)   86.335   16.179    5.336    0.000   86.335    0.443
##    .PRI_3_1 (.p6.)   40.380   12.827    3.148    0.002   40.380    0.130
##    .WMI_2_1 (.p7.)   66.684   14.539    4.587    0.000   66.684    0.323
##    .PSI_2_1 (.p8.)   77.502   14.109    5.493    0.000   77.502    0.483
##     g               108.593   37.791    2.874    0.004    1.000    1.000
## 
## 
## Group 2 [Romanian]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_3_1           1.000                              16.385    0.870
##     PRI_3_1           1.081    0.097   11.187    0.000   17.717    0.941
##     WMI_2_1 (.p3.)    1.134    0.102   11.092    0.000   18.583    0.916
##     PSI_2_1 (.p4.)    0.874    0.091    9.572    0.000   14.316    0.852
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.10.)   65.997    2.340   28.209    0.000   65.997    3.504
##    .PRI_3_1 (.11.)   68.990    2.693   25.623    0.000   68.990    3.665
##    .WMI_2_1 (.12.)   73.369    2.529   29.014    0.000   73.369    3.615
##    .PSI_2_1 (.13.)   71.752    2.085   34.415    0.000   71.752    4.269
##     g                 9.075    3.263    2.781    0.005    0.554    0.554
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.p5.)   86.335   16.179    5.336    0.000   86.335    0.243
##    .PRI_3_1 (.p6.)   40.380   12.827    3.148    0.002   40.380    0.114
##    .WMI_2_1 (.p7.)   66.684   14.539    4.587    0.000   66.684    0.162
##    .PSI_2_1 (.p8.)   77.502   14.109    5.493    0.000   77.502    0.274
##     g               268.469   65.766    4.082    0.000    1.000    1.000

Between Intervention Groups, Romanians and Roma Only

RomConfigural <- cfa(RomGModel, dataEthnicGroup, std.lv = T, group = "group")
RomMetric <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "group", group.equal = "loadings")
RomScalar <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "group", group.equal = c("loadings", "intercepts"))
RomStrict <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals"))
RomLV <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"))
RomMean <- cfa(RomGModel, dataEthnicGroup, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "means"))

round(cbind(RC = fitMeasures(RomConfigural, FITM),
            RM = fitMeasures(RomMetric, FITM), #pchisq(10.355 - 7.954, 3, lower.tail = F) = .49
            RS = fitMeasures(RomScalar, FITM), #pchisq(14.971 - 10.355, 3, lower.tail = F) = .20
            RR = fitMeasures(RomStrict, FITM), #pchisq(16.093 - 14.971, 4, lower.tail = F) = .89
            RL = fitMeasures(RomLV, FITM), #pchisq(18.823 - 16.093, 1, lower.tail = F) = .10
            RE = fitMeasures(RomMean, FITM)), 3) #pchisq(20.033 - 18.823, 1, lower.tail = F) = .27
##                      RC       RM       RS       RR       RL       RE
## chisq             7.954   10.355   14.971   16.093   18.823   20.033
## df                4.000    7.000   10.000   14.000   15.000   16.000
## npar             24.000   21.000   18.000   14.000   13.000   12.000
## cfi               0.985    0.987    0.981    0.992    0.985    0.985
## rmsea             0.157    0.109    0.111    0.061    0.080    0.079
## rmsea.ci.lower    0.000    0.000    0.000    0.000    0.000    0.000
## rmsea.ci.upper    0.317    0.240    0.221    0.171    0.178    0.175
## aic            2520.074 2516.474 2515.091 2508.212 2508.943 2508.153
## bic            2577.243 2566.497 2557.967 2541.561 2539.909 2536.737

Subsetting to Romanians and Roma only, there was no difference in latent intelligence between the intervention and nonintervention groups. This presumably follows from the observation that Roma in the FCG did worse than Roma in the CAUG group (ns, 65.27 CAUG vs 60.08 FCG). with that said, what if we subset only to Romanians?

summary(RomLV, stand = T, fit = T)
## lavaan 0.6-12 ended normally after 88 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        25
##   Number of equality constraints                    12
## 
##   Number of observations per group:                   
##     FCG                                             43
##     CAUG                                            37
## 
## Model Test User Model:
##                                                       
##   Test statistic                                18.823
##   Degrees of freedom                                15
##   P-value (Chi-square)                           0.222
##   Test statistic for each group:
##     FCG                                         11.805
##     CAUG                                         7.018
## 
## Model Test Baseline Model:
## 
##   Test statistic                               273.878
##   Degrees of freedom                                12
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.985
##   Tucker-Lewis Index (TLI)                       0.988
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1241.471
##   Loglikelihood unrestricted model (H1)      -1232.060
##                                                       
##   Akaike (AIC)                                2508.943
##   Bayesian (BIC)                              2539.909
##   Sample-size adjusted Bayesian (BIC)         2498.915
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.080
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.178
##   P-value RMSEA <= 0.05                          0.312
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.207
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [FCG]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_3_1           1.000                              14.805    0.838
##     PRI_3_1 (.p2.)    1.179    0.106   11.138    0.000   17.459    0.930
##     WMI_2_1 (.p3.)    1.151    0.108   10.624    0.000   17.048    0.903
##     PSI_2_1 (.p4.)    0.895    0.095    9.406    0.000   13.257    0.839
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.10.)   73.829    2.534   29.133    0.000   73.829    4.181
##    .PRI_3_1 (.11.)   77.718    2.810   27.659    0.000   77.718    4.142
##    .WMI_2_1 (.12.)   82.282    2.792   29.473    0.000   82.282    4.357
##    .PSI_2_1 (.13.)   78.633    2.268   34.674    0.000   78.633    4.978
##     g                 0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.p5.)   92.675   17.421    5.320    0.000   92.675    0.297
##    .PRI_3_1 (.p6.)   47.253   13.408    3.524    0.000   47.253    0.134
##    .WMI_2_1 (.p7.)   66.033   15.148    4.359    0.000   66.033    0.185
##    .PSI_2_1 (.p8.)   73.787   13.890    5.312    0.000   73.787    0.296
##     g       (.p9.)  219.195   47.844    4.581    0.000    1.000    1.000
## 
## 
## Group 2 [CAUG]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   g =~                                                                  
##     VCI_3_1           1.000                              14.805    0.838
##     PRI_3_1 (.p2.)    1.179    0.106   11.138    0.000   17.459    0.930
##     WMI_2_1 (.p3.)    1.151    0.108   10.624    0.000   17.048    0.903
##     PSI_2_1 (.p4.)    0.895    0.095    9.406    0.000   13.257    0.839
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.10.)   73.829    2.534   29.133    0.000   73.829    4.181
##    .PRI_3_1 (.11.)   77.718    2.810   27.659    0.000   77.718    4.142
##    .WMI_2_1 (.12.)   82.282    2.792   29.473    0.000   82.282    4.357
##    .PSI_2_1 (.13.)   78.633    2.268   34.674    0.000   78.633    4.978
##     g                -3.792    3.435   -1.104    0.270   -0.256   -0.256
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .VCI_3_1 (.p5.)   92.675   17.421    5.320    0.000   92.675    0.297
##    .PRI_3_1 (.p6.)   47.253   13.408    3.524    0.000   47.253    0.134
##    .WMI_2_1 (.p7.)   66.033   15.148    4.359    0.000   66.033    0.185
##    .PSI_2_1 (.p8.)   73.787   13.890    5.312    0.000   73.787    0.296
##     g       (.p9.)  219.195   47.844    4.581    0.000    1.000    1.000
RomConfigural <- cfa(RomGModel, dataRomanianInt, std.lv = T, group = "group")
RomMetric <- cfa(RomGModel, dataRomanianInt, std.lv = F, group = "group", group.equal = "loadings")
RomScalar <- cfa(RomGModel, dataRomanianInt, std.lv = F, group = "group", group.equal = c("loadings", "intercepts"))
RomStrict <- cfa(RomGModel, dataRomanianInt, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals"))
RomLV <- cfa(RomGModel, dataRomanianInt, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"))
RomMean <- cfa(RomGModel, dataRomanianInt, std.lv = F, group = "group", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "means"))

round(cbind(RC = fitMeasures(RomConfigural, FITM),
            RM = fitMeasures(RomMetric, FITM), 
            RS = fitMeasures(RomScalar, FITM),
            RR = fitMeasures(RomStrict, FITM), #pchisq(10.306 - 5.509, 4, lower.tail = F) = .31
            RL = fitMeasures(RomLV, FITM),
            RE = fitMeasures(RomMean, FITM)), 3) #pchisq(13.696 - 10.407, 1, lower.tail = F) = .07
##                      RC       RM       RS       RR       RL       RE
## chisq             4.514    4.987    5.509   10.306   10.407   13.696
## df                4.000    7.000   10.000   14.000   15.000   16.000
## npar             24.000   21.000   18.000   14.000   13.000   12.000
## cfi               0.997    1.000    1.000    1.000    1.000    1.000
## rmsea             0.070    0.000    0.000    0.000    0.000    0.000
## rmsea.ci.lower    0.000    0.000    0.000    0.000    0.000    0.000
## rmsea.ci.upper    0.312    0.194    0.117    0.139    0.123    0.156
## aic            1632.507 1626.980 1621.502 1618.300 1616.401 1617.690
## bic            1679.337 1667.956 1656.625 1645.617 1641.767 1641.105

There was no evidence of an effect when subset to Romanians only. This interaction (Romanian vs all non-Romanian) only had a p-value of .03. I say “only”, but this is between the observed effect on fullscale-IQ (p = .02) and the effect on latent g (p = .04), so if we take the authors’ conclusions to heart, this one should be accepted, too. This may even signal randomization failure. Importantly, this interaction did not occur appear to occur at the level of the fullscale-IQ, only g. Since IQ scores do not matter on their own, but only in as much as they represent g (and other latent abilities, as opposed to mere test taking skills), it is obvious that the interaction with g should more strongly motivate subsequent inferences.

Discussion and Conclusion

One may argue that this was a salami-slicing expedition. That’s fair, but the strength of the evidence provided by Humphreys et al. (2022) for their preferred interpretation was abysmal in the first place. One would have thought the world would move beyond small fostering studies long ago, but it appears the world has not.

Because of the marginality of their headline effects, the conclusion by Humphreys et al. (2022) that “early investment in family care as an alternative to institutional care leads to sustained gains in cognitive ability” is only barely supported, and the strength of the evidence should not be enough to change anyone’s beliefs on the matter. There was no reason to think that “Fostering caregiving relationships is a likely mechanism of the intervention.” Thanks to the authors’ exploratory over-analysis, they conducted altogether too many analyses and any adjustment for multiple comparisons renders their headline conclusions nonsignificant. A more stringent p-value threshold should be used to reduce the chance that low-quality data such as this is allowed to proliferate alongside its authors’ preferred, dangerously-overstated conclusions. This analysis could be improved if all subtest scores were made available.

A lone conclusion which may remain robust to correction for almost any number of comparisons in their data is Romanian vs Roma ethnic differences in intelligence. These are some of the lowest average IQs I’ve seen in ostensibly real data and, interestingly, they seemed to recapitulate differences between Romanians and Roma seen elsewhere, despite the Romanian IQ being lower than is typically estimated. It was interesting that, despite this additional selection in the Romanian group and non-difference between the Roma IQs reported here and typical ones, there was no bias in their comparison. However, this needs to be caveated like everything that could be reported with this study, because it was extremely statistically weak. Humphreys et al.’s (2022) paper comes down to p = .02 at the observed level, p = .04 at the latent level, and a lot of trust that there weren’t any other issues because those margins are already slim and they’re even debatable. The ethnic differences in their data were vastly more significant than their intervention’s effects.

Their curve-fitting to clouds was a laughable part of their publication and a glance at it (see, e.g., Fig. S8) should reveal why it doesn’t need to be addressed. This is even more sure when it is considered alongside the weakness of the rest of the evidence and non-evidence for effects of things like age of placement, quality of caregiving, and foster care stability.

Because of the weakness of their evidence coupled with the strength of their conclusions and the susceptibility of policymakers to the errant conclusions of behavioral scientists, Humphreys et al. (2022) stands as an example of an unfortunate disservice to the scientific community. The only redeeming part of it was its publication of some much-desired data. The data was much-desired in the first place because of the many other studies that used it, in the presented and earlier iterations, to make similarly strong, but equally unsupported conclusions.

References

Humphreys, K. L., King, L. S., Guyon-Harris, K. L., Sheridan, M. A., McLaughlin, K. A., Radulescu, A., Nelson, C. A., Fox, N. A., & Zeanah, C. H. (2022). Foster care leads to sustained cognitive gains following severe early deprivation. Proceedings of the National Academy of Sciences, 119(38), e2119318119. https://doi.org/10.1073/pnas.2119318119

Lehto, K., Haegg, S., Lu, D., Karlsson, R., Pedersen, N. L., & Mosing, M. A. (2020). Childhood Adoption and Mental Health in Adulthood: The Role of Gene-Environment Correlations and Interactions in the UK Biobank. Biological Psychiatry, 87(8), 708–716. https://doi.org/10.1016/j.biopsych.2019.10.016