Setup

library(pacman); p_load(psych, lavaan, tidyverse)

FITM <- c("chisq", "df", "nPar", "cfi", "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "aic", "bic")

Rationale

John Protzko has produced a new paper that is interesting, to say the least (Protzko, 2022). The gist is that the interpretation of measurement invariance that goes

\[f(X,\eta) = f(X, \eta, v)\]

is false. He showed this by taking a sample of 1500 and randomly selecting participants to take a Meaning of Life questionnaire, while the others took a Meaning of Gavagai (a nonsense word) questionnaire, and then assessing measurement invariance when these questionnaires’ items were treated as if they came from one scale. By finding that measurement invariance held, we could dismiss the notion that measurement invariance guarantees you are measuring the same things, and thus \(f(X,\eta) \neq f(X, \eta, v)\), because X is not influenced by a common \(\eta\).

That manuscript suffered from a number of incredibly large defects and showcased basic misunderstandings of measurement in general and measurement invariance in particular. All of its conclusions are incorrect, and, actually, it serves as a proof that measurement invariance testing works. In no particular order, here are the issues that stood out most prominently.

Firstly, we do not get to pick what our scales measure. If we devise a happiness scale and discover that it is perfectly correlated with a survey about enjoying processed foods, we have a construct validity issue for one or both scales and this should cause introspection about what is being measured rather than dismissal of either instrument because the questions do not appear to plausibly entail identical or even just highly-related measurements. The “Gavagai” scale does not measure “nothing” (pp.’s 2, 17) just because it is nonsensical on its face. People who are given the Gavagai questionnaire impute their own meanings and they can be meanings derived in the moment or according to some broad rule if they so wish. To the degree people in general do this, we should actually expect some degree of validity. To the extent that people hold a questionnaire’s questions in mind together, we should expect some covariance if they do not have clearly different content, simply because people should infer relationships among questions and try to answer accordingly. A questionnaire for nothing is - a nonsense scale - must be truly nonsensical to measure nothing. An example question set for a questionnaire which would truly measure nothing would be something like

  1. Bananas: Yes or no?

  2. How old are you: Finland or Silverback Kittens?

  3. Uh oh: How does that make you feel (fill in the blank and then eat the paper and yell out your answer)?

The use of “Gavagai”, which is a word whose meaning can be imputed for whatever, with otherwise sensible questions does not generate a nothing scale, and psychometricians should stop acting like it does. Additionally, unless an interview gets people to firmly state how they interpreted a thing, it cannot be used to verify or dismiss interpretation in common in a sample. When 65% of people say they don’t know what Gavagai is and 35% guess Meaning/purpose, happiness, money, or sexual pleasure, it may be that the residual 65% had an idea in mind when answering and it may have been consistent between questions, but they did not answer because they felt uncertain and we observed a response bias.

Second, \(\omega\) and high loadings are not separate findings. \(\omega\) is a measure of factor saturation derived from the loadings, and can be found despite low test-retest reliability if the common variance of a set of items is considerable enough. Simulate it - seriously! Though unnecessary, it is trivial to achieve this condition if you violate local independence to sustain it for a single-factor model.

Third, measurement invariance is not just testing metric and scalar invariance. When metric invariance holds, indicator variables are interpreted the same way for different groups, and when scale invariance holds, the means of those indicators are interpretable the same for different groups. But, more importantly, to get to the interpretation of measurement invariance as meaning that a scale is “measuring the same thing” for different groups, you must also have equal residual variances. Variance on an item is caused by something, whether it be a real systematic influence, like schooling on a vocabulary test, or error. To show that the same thing is measured, one must make all influences on tests equal, so the variances cannot differ net of differences in the latent variances. Moreover, to ensure equal reliability, you must also control the residual variances. This is well-known and well-understood, but some people are adamant that they can interpret the variances without assessing their equality, which is obviously wrong. These people call going to scalar invariance measurement invariance, but they are in the wrong: you must have error variance homogeneity to have measurement invariance (see DeShon, 2004). As a final note, to be very explicit, “Strong” or “Scalar” invariance does not mean a scale measures the same thing in different groups, it means the interpretations of items and their levels is common between groups. Strict invariance means scales measure the same thing.

Four, insensitive model fit indices are not useful for assessing measurement invariance, and the conclusions they yield are often wrong. In Ropovik’s (2015) cautionary note on testing between model fit indices, he reviewed published SEMs where model fits were compared. The most common fit indices reported were RMSEA and CFI, followed by SRMR, and, finally, \(\chi^2\) exact fit tests. This is critical because approximate fit indices are much weaker tests of fit, and they can easily be gamed, by picking and choosing those which agree or do not. On the other hand, the \(chi^2\) test is able to strongly detect violations of measurement invariance, and both AIC and BIC (in that order) are superior to RMSEA, CFI, and SRMR. Using insensitive fit indices is a surefire recipe for picking the wrong model, so it should be considered alarming that so many do it. If a model is found to be noninvariant, the extent of measurement bias to the means can be estimated with existing methods. Other variant parameters are just informative for theoretical reasons. To be serious about measurement invariance requires seriously testing the hypotheses we are interested in, but Protzko (2022) did not feature serious measurement invariance testing. Instead, it involved the use of weak indices. In the future, hopefully dynamically computed fit indices can be generated so we can stop relying on cutoffs that may not apply to our data (see McNeish & Wolf, 2021).

Fifth, the estimand for Protzko (2022) was wrong. What was tested in that publication was, at best, whether there was discriminant validity for the Meaning of Life and Meaning of Gavagai questionnaires, not whether measurement invariance means tests measure the same thing. Surely, were we to administer questions from different questionnaires for extraversion and to see if they were invariant, we might find something like this - and especially so if we use weak cutoffs -, but that does not mean that measurement invariance is flawed; it means the analysis was. We have a lopsided estimand: on the one hand, showing that Gavagai is not the same measurement as Meaning of Life shows that, yes, different things generate different results, but showing that they generate the same is agnostic with respect to the conclusion. To get to the desired non-lopsided test, the desired method is to administer two samples the same questionnaire and assess if doing something like providing answers, practice, or coaching for an objective assessment leads to noninvariance. This gives us a confirmed different source of score variation, rather than an open question about the construct validity of different questionnaires. In this sense, Protzko (2022) could only truly verify that measurement invariance worked, not that it did not.

I show below that, when analyzed properly, with sensitive fit indices and full measurement invariance, the data from Protzko (2022) strongly support the interpretation of measurement invariance as showing that the same things are measured in different groups.

As a side note, the estimator is formally wrong, but it should not make much difference theoretically (Robitzsch, 2021), although the proper estimator provides more liberal CFI and RMSEA estimates, which, if focused on instead of more sensitive fit indices, can increase the odds of spuriously accepting measurement invariance (there is a considerable literature on this, but it is also obvious because treating ordinals as ordinals ought to generate better fit with ordinals than treating them as if they’re quantitative).

Analysis

First, the aggregate model, purely for interest’s sake. If we run this model in both groups, we can see that the loadings do significantly differ, so omnibus metric invariance must be rejected. This leaves room for partial metric invariance, and the location of that partial invariance can be informative about the sources of interpretive differences for our groups. Because that is one of our questions, lets investigate where that difference in loadings appears. As it turns out,

GavaData <- subset(data, gavagai == 1)
LifeData <- subset(data, gavagai == 0)

GavaLife.model <- '
FWB =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5'

GavaLife.fit <- sem(GavaLife.model, data, std.lv = T)

GavaGava.fit <- sem(GavaLife.model, GavaData, std.lv = T)
LifeLife.fit <- sem(GavaLife.model, LifeData, std.lv = T)

summary(GavaLife.fit, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 25 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        21
##                                                       
##   Number of observations                          1500
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               216.262
##   Degrees of freedom                                34
##   P-value (Chi-square)                           0.000
## 
## Model Test Baseline Model:
## 
##   Test statistic                             11890.034
##   Degrees of freedom                                45
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.985
##   Tucker-Lewis Index (TLI)                       0.980
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -21832.993
##   Loglikelihood unrestricted model (H1)     -21724.862
##                                                       
##   Akaike (AIC)                               43707.986
##   Bayesian (BIC)                             43819.564
##   Sample-size adjusted Bayesian (BIC)        43752.852
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.060
##   90 Percent confidence interval - lower         0.052
##   90 Percent confidence interval - upper         0.068
##   P-value RMSEA <= 0.05                          0.016
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.022
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              0.931    0.032   28.738    0.000    0.931    0.689
##     fwb2              1.086    0.034   31.918    0.000    1.086    0.745
##     fwb3              1.218    0.036   34.294    0.000    1.218    0.784
##     fwb4              1.158    0.032   35.704    0.000    1.158    0.806
##     fwb5              1.062    0.035   30.618    0.000    1.062    0.722
##   MLQS =~                                                               
##     mlqs1             1.386    0.033   41.644    0.000    1.386    0.861
##     mlqs2             1.468    0.032   46.319    0.000    1.468    0.918
##     mlqs3             1.435    0.032   44.831    0.000    1.435    0.900
##     mlqs4             1.506    0.032   47.745    0.000    1.506    0.933
##     mlqs5             1.523    0.033   46.848    0.000    1.523    0.923
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS              0.269    0.026   10.199    0.000    0.269    0.269
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              0.960    0.041   23.653    0.000    0.960    0.525
##    .fwb2              0.949    0.043   22.205    0.000    0.949    0.446
##    .fwb3              0.931    0.045   20.715    0.000    0.931    0.386
##    .fwb4              0.723    0.037   19.607    0.000    0.723    0.350
##    .fwb5              1.034    0.045   22.859    0.000    1.034    0.478
##    .mlqs1             0.668    0.028   24.194    0.000    0.668    0.258
##    .mlqs2             0.405    0.019   21.389    0.000    0.405    0.158
##    .mlqs3             0.481    0.021   22.589    0.000    0.481    0.189
##    .mlqs4             0.336    0.017   19.747    0.000    0.336    0.129
##    .mlqs5             0.400    0.019   20.848    0.000    0.400    0.147
##     FWB               1.000                               1.000    1.000
##     MLQS              1.000                               1.000    1.000
standardizedSolution(GavaGava.fit); standardizedSolution(LifeLife.fit) #Same output as the configural model
ZREG <- function(B1, B2, SEB1, SEB2) {
  Z = (B1-B2)/sqrt((SEB1)^2 + (SEB2)^2)
  return(Z)}

#Standardized

"Free will Beliefs - Standardized"
## [1] "Free will Beliefs - Standardized"
(1 - pnorm(ZREG(0.665, 0.711, 0.024, 0.021))) * 2 #fwb1, *2 to be two-tailed, though one-tailed is appropriate because we know to expect bias as a result of the use of different forms, and not to expect bias for the free will beliefs questionnaire due to randomization
## [1] 1.850821
(1 - pnorm(ZREG(0.774, 0.746, 0.020, 0.019))) * 2 #fwb2
## [1] 0.3101062
(1 - pnorm(ZREG(0.789, 0.779, 0.018, 0.018))) * 2 #fwb3
## [1] 0.6944398
(1 - pnorm(ZREG(0.803, 0.808, 0.017, 0.016))) * 2 #fwb4
## [1] 1.169591
(1 - pnorm(ZREG(0.726, 0.719, 0.021, 0.020))) * 2 #fwb5
## [1] 0.8092611
"Meaning of Life/Gavagai - Standardized" 
## [1] "Meaning of Life/Gavagai - Standardized"
(1 - pnorm(ZREG(0.909, 0.811, 0.007, 0.013))) * 2 #mlsq1
## [1] 3.192691e-11
(1 - pnorm(ZREG(0.937, 0.896, 0.005, 0.008))) * 2 #mlsq2
## [1] 1.386481e-05
(1 - pnorm(ZREG(0.944, 0.860, 0.005, 0.010))) * 2 #mlsq3
## [1] 5.77316e-14
(1 - pnorm(ZREG(0.941, 0.925, 0.005, 0.007))) * 2 #mlsq4
## [1] 0.06289087
(1 - pnorm(ZREG(0.941, 0.912, 0.005, 0.007))) * 2 #mlsq5
## [1] 0.0007484652
#Unstandardized - from the configural model, below, which had to be identified with latent variance standardization to avoid hiding bias with the constrained marker, and could be unstandardized in the subsequent, overidentified models to allow formally testing its equality between groups

"Free will Beliefs - Unstandardized"
## [1] "Free will Beliefs - Unstandardized"
(1 - pnorm(ZREG(.981, .878, .045, .046))) * 2 
## [1] 0.1094641
(1 - pnorm(ZREG(1.087, 1.086, .047, .049))) * 2
## [1] 0.988249
(1 - pnorm(ZREG(1.211, 1.224, .049, .051))) * 2
## [1] 1.145838
(1 - pnorm(ZREG(1.144, 1.172, .044, .048))) * 2
## [1] 1.332809
(1 - pnorm(ZREG(1.053, 1.071, .048, .050))) * 2
## [1] 1.204904
"Meaning of Life/Gavagai - Unstandardized" 
## [1] "Meaning of Life/Gavagai - Unstandardized"
(1 - pnorm(ZREG(1.344, 1.273, .049, .040))) * 2
## [1] 0.2616634
(1 - pnorm(ZREG(1.536, 1.268, .048, .038))) * 2
## [1] 1.199993e-05
(1 - pnorm(ZREG(1.443, 1.289, .048, .038))) * 2
## [1] 0.01188703
(1 - pnorm(ZREG(1.605, 1.289, .047, .038))) * 2
## [1] 1.710563e-07
(1 - pnorm(ZREG(1.621, 1.339, .049, .040))) * 2
## [1] 8.262986e-06

Consistent with the randomization procedure and the use of different tests, none of the loadings for the Free Will Belief questions significantly differed, but four of the loadings for the Meaning of Life/Gavagai questions, did (and three probably meaningfully). The simple conclusion is that the omnibus rejection of metric invariance must concern that questionnaire, and not the common one. Now, before moving on to actual testing of measurement invariance, it should be noted, that when testing partial models, at each step, the newly chosen model is chosen based on the most noninvariant parameter as indicated by \(\chi^2\), and the search for partial models stops when the difference with the prior stage of measurement invariance is nonsignificant. Importantly, model selection biases subsequent p values - we can achieve a model in which, say, some loadings known to be noninvariant appear to be acceptably invariant, but we can know based on inspection of initial model parameters, that this cannot be the case. With that said, we move on to fitting all of the non-partial measurement invariance models, in order of increasing restrictiveness.

GavaLifeC <- sem(GavaLife.model, data, std.lv = T, group = "gavagai")
GavaLifeM <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = "loadings")
GavaLifeS <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts"))
GavaLifeR <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals"))
GavaLifeLV <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"))
GavaLifeLC <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"))
GavaLifeMN <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances", "means"))

round(cbind(Configural = fitMeasures(GavaLifeC, FITM),
            Metric = fitMeasures(GavaLifeM, FITM),
            Scalar = fitMeasures(GavaLifeS, FITM),
            Strict = fitMeasures(GavaLifeR, FITM),
            LV.Vars = fitMeasures(GavaLifeLV, FITM),
            LV.Covars = fitMeasures(GavaLifeLC, FITM),
            Means = fitMeasures(GavaLifeMN, FITM)), 3)
##                Configural    Metric    Scalar    Strict   LV.Vars LV.Covars
## chisq             272.483   302.135   359.628  1039.201  1063.359  1063.360
## df                 68.000    76.000    84.000    94.000    96.000    97.000
## npar               62.000    54.000    46.000    36.000    34.000    33.000
## cfi                 0.983     0.981     0.977     0.922     0.920     0.920
## rmsea               0.063     0.063     0.066     0.116     0.116     0.115
## rmsea.ci.lower      0.056     0.056     0.059     0.109     0.110     0.109
## rmsea.ci.upper      0.071     0.071     0.073     0.122     0.122     0.122
## aic             42887.227 42900.879 42942.372 43601.945 43622.103 43620.104
## bic             43216.647 43187.793 43186.780 43793.221 43802.753 43795.441
##                    Means
## chisq           1175.242
## df                99.000
## npar              31.000
## cfi                0.911
## rmsea              0.120
## rmsea.ci.lower     0.114
## rmsea.ci.upper     0.127
## aic            43727.986
## bic            43892.696
summary(GavaLifeC, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 35 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        62
##                                                       
##   Number of observations per group:                   
##     0                                              782
##     1                                              718
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               272.483
##   Degrees of freedom                                68
##   P-value (Chi-square)                           0.000
##   Test statistic for each group:
##     0                                          143.593
##     1                                          128.891
## 
## Model Test Baseline Model:
## 
##   Test statistic                             12200.118
##   Degrees of freedom                                90
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.983
##   Tucker-Lewis Index (TLI)                       0.978
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -21381.614
##   Loglikelihood unrestricted model (H1)     -21245.372
##                                                       
##   Akaike (AIC)                               42887.227
##   Bayesian (BIC)                             43216.647
##   Sample-size adjusted Bayesian (BIC)        43019.690
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.063
##   90 Percent confidence interval - lower         0.056
##   90 Percent confidence interval - upper         0.071
##   P-value RMSEA <= 0.05                          0.003
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.026
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [0]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              0.981    0.045   21.658    0.000    0.981    0.711
##     fwb2              1.087    0.047   23.124    0.000    1.087    0.746
##     fwb3              1.211    0.049   24.566    0.000    1.211    0.779
##     fwb4              1.144    0.044   25.908    0.000    1.144    0.808
##     fwb5              1.053    0.048   22.007    0.000    1.053    0.719
##   MLQS =~                                                               
##     mlqs1             1.344    0.049   27.279    0.000    1.344    0.811
##     mlqs2             1.536    0.048   32.005    0.000    1.536    0.896
##     mlqs3             1.443    0.048   29.901    0.000    1.443    0.860
##     mlqs4             1.605    0.047   33.798    0.000    1.605    0.925
##     mlqs5             1.621    0.049   32.974    0.000    1.621    0.912
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS              0.302    0.036    8.385    0.000    0.302    0.302
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              5.393    0.049  109.303    0.000    5.393    3.909
##    .fwb2              5.321    0.052  102.114    0.000    5.321    3.652
##    .fwb3              4.934    0.056   88.683    0.000    4.934    3.171
##    .fwb4              5.304    0.051  104.767    0.000    5.304    3.746
##    .fwb5              5.042    0.052   96.325    0.000    5.042    3.445
##    .mlqs1             5.019    0.059   84.714    0.000    5.019    3.029
##    .mlqs2             4.825    0.061   78.724    0.000    4.825    2.815
##    .mlqs3             4.908    0.060   81.838    0.000    4.908    2.927
##    .mlqs4             4.763    0.062   76.755    0.000    4.763    2.745
##    .mlqs5             4.717    0.064   74.219    0.000    4.717    2.654
##     FWB               0.000                               0.000    0.000
##     MLQS              0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              0.942    0.056   16.766    0.000    0.942    0.495
##    .fwb2              0.943    0.059   16.065    0.000    0.943    0.444
##    .fwb3              0.953    0.063   15.193    0.000    0.953    0.394
##    .fwb4              0.696    0.049   14.171    0.000    0.696    0.347
##    .fwb5              1.034    0.062   16.613    0.000    1.034    0.483
##    .mlqs1             0.939    0.053   17.820    0.000    0.939    0.342
##    .mlqs2             0.578    0.037   15.549    0.000    0.578    0.197
##    .mlqs3             0.731    0.043   16.862    0.000    0.731    0.260
##    .mlqs4             0.435    0.032   13.614    0.000    0.435    0.144
##    .mlqs5             0.532    0.036   14.632    0.000    0.532    0.168
##     FWB               1.000                               1.000    1.000
##     MLQS              1.000                               1.000    1.000
## 
## 
## Group 2 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              0.878    0.046   18.987    0.000    0.878    0.665
##     fwb2              1.086    0.049   22.031    0.000    1.086    0.744
##     fwb3              1.224    0.051   23.916    0.000    1.224    0.789
##     fwb4              1.172    0.048   24.554    0.000    1.172    0.803
##     fwb5              1.071    0.050   21.310    0.000    1.071    0.726
##   MLQS =~                                                               
##     mlqs1             1.273    0.040   31.652    0.000    1.273    0.909
##     mlqs2             1.268    0.038   33.401    0.000    1.268    0.937
##     mlqs3             1.289    0.038   33.832    0.000    1.289    0.944
##     mlqs4             1.289    0.038   33.611    0.000    1.289    0.941
##     mlqs5             1.339    0.040   33.658    0.000    1.339    0.941
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS              0.257    0.038    6.743    0.000    0.257    0.257
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              5.467    0.049  111.074    0.000    5.467    4.145
##    .fwb2              5.344    0.055   98.030    0.000    5.344    3.658
##    .fwb3              4.953    0.058   85.534    0.000    4.953    3.192
##    .fwb4              5.312    0.054   97.567    0.000    5.312    3.641
##    .fwb5              5.106    0.055   92.700    0.000    5.106    3.460
##    .mlqs1             4.084    0.052   78.166    0.000    4.084    2.917
##    .mlqs2             4.045    0.050   80.095    0.000    4.045    2.989
##    .mlqs3             4.057    0.051   79.617    0.000    4.057    2.971
##    .mlqs4             4.026    0.051   78.727    0.000    4.026    2.938
##    .mlqs5             4.071    0.053   76.673    0.000    4.071    2.861
##     FWB               0.000                               0.000    0.000
##     MLQS              0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              0.969    0.058   16.653    0.000    0.969    0.557
##    .fwb2              0.953    0.062   15.335    0.000    0.953    0.447
##    .fwb3              0.910    0.064   14.112    0.000    0.910    0.378
##    .fwb4              0.754    0.055   13.599    0.000    0.754    0.354
##    .fwb5              1.030    0.066   15.706    0.000    1.030    0.473
##    .mlqs1             0.339    0.021   16.367    0.000    0.339    0.173
##    .mlqs2             0.222    0.015   14.997    0.000    0.222    0.121
##    .mlqs3             0.203    0.014   14.471    0.000    0.203    0.109
##    .mlqs4             0.216    0.015   14.755    0.000    0.216    0.115
##    .mlqs5             0.230    0.016   14.697    0.000    0.230    0.114
##     FWB               1.000                               1.000    1.000
##     MLQS              1.000                               1.000    1.000
1 - pchisq(302.135 - 272.483, 8, lower.tail = T) #Metric vs Configural
## [1] 0.0002435004
1 - pchisq(359.628 - 302.135, 8, lower.tail = T) #Scalar vs Metric
## [1] 1.442822e-09
1 - pchisq(1039.201 - 359.628, 10, lower.tail = T) #Strict vs Scalar
## [1] 0
1 - pchisq(1063.359 - 1039.201, 2, lower.tail = T) #Homogeneous Latent Variances vs Strict
## [1] 5.677498e-06
1 - pchisq(1063.360 - 1063.359, 1, lower.tail = T) #Equal Latent Covariances vs Homogeneous Latent Variances
## [1] 0.9747729
1 - pchisq(1175.242 - 1063.360, 2, lower.tail = T) #Homogeneous Means vs Equal Latent Covariances
## [1] 0

Every stage of measurement invariance testing yielded noninvariance, except for the stage in which the equality of the latent variances was tested. This makes sense, because in the Protzko (2022) paper, the Life group and the Gavagai group had latent covariances of 0.31 (0.22 - 0.42) and 0.22 (0.13 - 0.32), respectively. The difference between those correlations is clearly not significant (p = 0.10), so this stage of testing should not have yielded any evidence of noninvariance. Everything thus far is consistent. Note the most major source of misfit: the residual variances. This proves, definitively, that at least some of the causes of scoring are not shared between groups. The neglect of this stage - which is absolutely crucial to the conclusion that something measures the same thing in different groups - is what explains the result.

The partial models follow.

GavaLifeC <- sem(GavaLife.model, data, std.lv = T, group = "gavagai")
GavaLifeM <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = "loadings")
GavaLifeMP <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = "loadings", group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2"))
GavaLifeS <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2"))
GavaLifeSP <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1"))
GavaLifeR <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1"))
GavaLifeRP <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeLV <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeLC <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeMN <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances", "means"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))

GavaLifeMN1.model <- '
FWB =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5

FWB ~ c(0,0)*1'

GavaLifeMN2.model <- '
FWB =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5

MLQS ~ c(0,0)*1'

GavaLifeMNP1 <- sem(GavaLifeMN1.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeMNP2 <- sem(GavaLifeMN2.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))

round(cbind(Configural = fitMeasures(GavaLifeC, FITM),
            Metric = fitMeasures(GavaLifeM, FITM),
            P_Metric = fitMeasures(GavaLifeMP, FITM),
            Scalar = fitMeasures(GavaLifeS, FITM),
            P_Scalar = fitMeasures(GavaLifeSP, FITM),
            Strict = fitMeasures(GavaLifeR, FITM),
            P_Strict = fitMeasures(GavaLifeRP, FITM),
            LV.Vars = fitMeasures(GavaLifeLV, FITM),
            LV.Covars = fitMeasures(GavaLifeLC, FITM),
            Means = fitMeasures(GavaLifeMN, FITM),
            P_Means1 = fitMeasures(GavaLifeMNP1, FITM),
            P_Means2 = fitMeasures(GavaLifeMNP2, FITM)), 3)
##                Configural    Metric  P_Metric    Scalar  P_Scalar    Strict
## chisq             272.483   302.135   278.377   323.664   285.213   965.745
## df                 68.000    76.000    73.000    81.000    79.000    89.000
## npar               62.000    54.000    57.000    49.000    51.000    41.000
## cfi                 0.983     0.981     0.983     0.980     0.983     0.928
## rmsea               0.063     0.063     0.061     0.063     0.059     0.115
## rmsea.ci.lower      0.056     0.056     0.054     0.056     0.052     0.108
## rmsea.ci.upper      0.071     0.071     0.069     0.070     0.066     0.121
## aic             42887.227 42900.879 42883.121 42912.408 42877.957 43538.489
## bic             43216.647 43187.793 43185.975 43172.756 43148.931 43756.331
##                 P_Strict   LV.Vars LV.Covars     Means  P_Means1  P_Means2
## chisq            286.555   290.720   291.036   413.526   291.264   403.427
## df                84.000    86.000    87.000    89.000    88.000    88.000
## npar              46.000    44.000    43.000    41.000    42.000    42.000
## cfi                0.983     0.983     0.983     0.973     0.983     0.974
## rmsea              0.057     0.056     0.056     0.070     0.055     0.069
## rmsea.ci.lower     0.050     0.049     0.049     0.063     0.049     0.062
## rmsea.ci.upper     0.064     0.064     0.063     0.077     0.063     0.076
## aic            42869.299 42869.464 42867.780 42986.270 42866.008 42978.171
## bic            43113.707 43103.245 43096.249 43204.112 43089.163 43201.326
"Metric"
## [1] "Metric"
1 - pchisq(fitMeasures(GavaLifeM, "chisq") - fitMeasures(GavaLifeC, "chisq"), fitMeasures(GavaLifeM, "df") - fitMeasures(GavaLifeC, "df"), lower.tail = T)
## chisq 
##     0
1 - pchisq(fitMeasures(GavaLifeMP, "chisq") - fitMeasures(GavaLifeC, "chisq"), fitMeasures(GavaLifeMP, "df") - fitMeasures(GavaLifeC, "df"), lower.tail = T)
## chisq 
## 0.317
"Scalar"
## [1] "Scalar"
1 - pchisq(fitMeasures(GavaLifeS, "chisq") - fitMeasures(GavaLifeMP, "chisq"), fitMeasures(GavaLifeS, "df") - fitMeasures(GavaLifeMP, "df"), lower.tail = T)
## chisq 
##     0
1 - pchisq(fitMeasures(GavaLifeSP, "chisq") - fitMeasures(GavaLifeMP, "chisq"), fitMeasures(GavaLifeSP, "df") - fitMeasures(GavaLifeMP, "df"), lower.tail = T)
## chisq 
## 0.336
"Strict"
## [1] "Strict"
1 - pchisq(fitMeasures(GavaLifeR, "chisq") - fitMeasures(GavaLifeSP, "chisq"), fitMeasures(GavaLifeR, "df") - fitMeasures(GavaLifeSP, "df"), lower.tail = T)
## chisq 
##     0
1 - pchisq(fitMeasures(GavaLifeRP, "chisq") - fitMeasures(GavaLifeSP, "chisq"), fitMeasures(GavaLifeRP, "df") - fitMeasures(GavaLifeSP, "df"), lower.tail = T)
## chisq 
## 0.931
"Latent Variance Homogeneity"
## [1] "Latent Variance Homogeneity"
1 - pchisq(fitMeasures(GavaLifeLV, "chisq") - fitMeasures(GavaLifeRP, "chisq"), fitMeasures(GavaLifeLV, "df") - fitMeasures(GavaLifeRP, "df"), lower.tail = T)
## chisq 
## 0.125
"Latent Covariance Homogeneity"
## [1] "Latent Covariance Homogeneity"
1 - pchisq(fitMeasures(GavaLifeLC, "chisq") - fitMeasures(GavaLifeLV, "chisq"), fitMeasures(GavaLifeLC, "df") - fitMeasures(GavaLifeLV, "df"), lower.tail = T)
## chisq 
## 0.574
"Means"
## [1] "Means"
1 - pchisq(fitMeasures(GavaLifeMN, "chisq") - fitMeasures(GavaLifeLC, "chisq"), fitMeasures(GavaLifeMN, "df") - fitMeasures(GavaLifeLC, "df"), lower.tail = T)
## chisq 
##     0
1 - pchisq(fitMeasures(GavaLifeMNP1, "chisq") - fitMeasures(GavaLifeLC, "chisq"), fitMeasures(GavaLifeMNP1, "df") - fitMeasures(GavaLifeLC, "df"), lower.tail = T)
## chisq 
## 0.633
1 - pchisq(fitMeasures(GavaLifeMNP2, "chisq") - fitMeasures(GavaLifeLC, "chisq"), fitMeasures(GavaLifeMNP2, "df") - fitMeasures(GavaLifeLC, "df"), lower.tail = T)
## chisq 
##     0

Just as before, the largest misstep is clearly the omission of the test of whether the influences on the indicator variables were identical (i.e., strict invariance, error variance homogeneity, etc.). Because that is the critically omitted part of the modeling in Protzko (2022) and because it provides the clearest evidence that the Gavagai/Life groups did not measure the same things as expected, it needs to be reiterated. Below, I have created a table that displays the steps of partial strict model fitting. The restricted parameter in each stage is bolded. This table shows that, at each stage, the most noninvariant parameters were Life/Gavagai parameters. In no case did the Belief in Free Will items even yield significant differences. The influences on the Belief in Free Will items were common to both groups, but the influences on Life/Gavagai were clearly not. Even CFI and RMSEA showed this by the lax standards used in Protzko (2022), but since this model step - which is crucial to the estimand of that paper - was skipped, it could not have been even noticed.

Variable \(\chi^2 1\) \(\chi^2 2\) \(\chi^2 3\) \(\chi^2 4\) \(\chi^2 5\)
fwb1 965.743 710.619 552.225 409.111 332.379
fwb2 965.725 710.603 552.209 409.092 332.362
fwb3 965.626 710.506 552.116 409.000 332.266
fwb4 964.533 709.402 551.014 407.895 331.161
fwb5 965.731 710.608 552.215 409.101 332.369
mlsq1 793.560 552.227 NA NA NA
mlsq2 807.884 583.288 409.113 NA NA
mlsq3 710.620 NA NA NA NA
mlsq4 857.923 634.909 488.060 358.758 286.555
mlsq5 829.249 605.429 455.911 332.381 NA

In the metric stage, three loadings had to be freed for the model to fit again. These were each Gavagai/Life questions, and the order they had to be freed was consistent with the p-values of their between-group differences, and only the most easily confused of the two questions could remain constrained with a fitting model (“I am looking for something that makes my life feel meaningful/gavagai” and “I am always searching for something that makes my life feel meaningful/gavagai”). In the scalar stage, two intercepts had to be freed for the model to fit again. These were Gavagai/life questions. Almost all of the meanings were found to significantly differ, but some level interpretations were also found to differ independently. Level interpretations alone differing is similar to linguistic bias, which tends to fall on the intercepts for people with limited but present foreign language experience (e.g., Wicherts & Dolan, 2010). The part of the question that was not biased in interpretation or level of interpretation that may have driven the lack of noninvariance so far may have been the use of the word “always”, because few people are “always” searching for anything if they’re honest with themselves. In the strict phase, as mentioned and shown above, the Free Will Belief residual variances were indistinguishable between groups, but the same was not true for Gavagai/Life, all of which significantly varied, indicating different sets of influences on scores in the Gavagai and Life groups. The latent variances were equal, indicating that the constructs for Gavagai and Life were equally broad when the residual variances and differences in meaning in previous stages were allowed to exist in the model. This is sensible given that the noninvariant loadings were freed and the other two questions did not differ interpretively, so the variance should have been equal at this point, although too many freed parameters used to estimate it also would have meant that it could not be properly estimated (and this would be biased towards equality between groups). Luckily, there were two untouched loadings, which is sufficient to estimate the latent variance. The latent covariances again did not differ which, to note, suggests that Gavagai is consistently interpreted in the same way Life is consistently interpreted in this data - it has reliable variance and cannot by any means be called a “nothing” construct, it is just highly, consistently personal, perhaps like the other measure. That may have been driven by response bias leading to overly homogeneous responding as well. That could happen with odd questionnaires, since people may not know how to answer, so they just tick similar boxes. The mean constrained model differed considerably, and when testing between different models it became clear that the only mean that differed significantly was the mean for Gavagai/Life, rather than Belief in Free Will.

The parameters for the most-constrained model are below.

summary(GavaLifeMNP1, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 73 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        63
##   Number of equality constraints                    21
##                                                       
##   Number of observations per group:                   
##     0                                              782
##     1                                              718
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               291.264
##   Degrees of freedom                                88
##   P-value (Chi-square)                           0.000
##   Test statistic for each group:
##     0                                          153.764
##     1                                          137.500
## 
## Model Test Baseline Model:
## 
##   Test statistic                             12200.118
##   Degrees of freedom                                90
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.983
##   Tucker-Lewis Index (TLI)                       0.983
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -21391.004
##   Loglikelihood unrestricted model (H1)     -21245.372
##                                                       
##   Akaike (AIC)                               42866.008
##   Bayesian (BIC)                             43089.163
##   Sample-size adjusted Bayesian (BIC)        42955.741
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.055
##   90 Percent confidence interval - lower         0.049
##   90 Percent confidence interval - upper         0.063
##   P-value RMSEA <= 0.05                          0.096
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.044
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [0]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              1.000                               0.931    0.689
##     fwb2    (.p2.)    1.166    0.046   25.419    0.000    1.086    0.744
##     fwb3    (.p3.)    1.308    0.049   26.536    0.000    1.218    0.784
##     fwb4    (.p4.)    1.243    0.046   27.118    0.000    1.158    0.806
##     fwb5    (.p5.)    1.141    0.046   24.773    0.000    1.062    0.723
##   MLQS =~                                                               
##     mlqs1             1.000                               1.314    0.805
##     mlqs2             1.134    0.031   36.930    0.000    1.490    0.891
##     mlqs3   (.p8.)    1.033    0.019   55.491    0.000    1.357    0.844
##     mlqs4             1.186    0.030   39.923    0.000    1.558    0.921
##     mlqs5             1.197    0.031   38.561    0.000    1.573    0.908
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS    (.24.)    0.343    0.037    9.214    0.000    0.280    0.280
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     FWB               0.000                               0.000    0.000
##    .fwb1    (.25.)    5.430    0.035  155.664    0.000    5.430    4.018
##    .fwb2    (.26.)    5.334    0.038  141.653    0.000    5.334    3.656
##    .fwb3    (.27.)    4.945    0.040  123.312    0.000    4.945    3.183
##    .fwb4    (.28.)    5.310    0.037  143.219    0.000    5.310    3.696
##    .fwb5    (.29.)    5.075    0.038  133.752    0.000    5.075    3.452
##    .mlqs1             5.040    0.057   88.114    0.000    5.040    3.087
##    .mlqs2   (.31.)    4.844    0.055   88.214    0.000    4.844    2.896
##    .mlqs3   (.32.)    4.891    0.055   88.683    0.000    4.891    3.042
##    .mlqs4   (.33.)    4.815    0.056   86.608    0.000    4.815    2.846
##    .mlqs5             4.743    0.060   78.549    0.000    4.743    2.736
##     MLQS              0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1    (.12.)    0.959    0.041   23.652    0.000    0.959    0.525
##    .fwb2    (.13.)    0.949    0.043   22.214    0.000    0.949    0.446
##    .fwb3    (.14.)    0.931    0.045   20.720    0.000    0.931    0.386
##    .fwb4    (.15.)    0.724    0.037   19.631    0.000    0.724    0.351
##    .fwb5    (.16.)    1.033    0.045   22.854    0.000    1.033    0.478
##    .mlqs1             0.938    0.053   17.828    0.000    0.938    0.352
##    .mlqs2             0.578    0.037   15.513    0.000    0.578    0.206
##    .mlqs3             0.744    0.043   17.130    0.000    0.744    0.288
##    .mlqs4             0.434    0.032   13.545    0.000    0.434    0.152
##    .mlqs5             0.529    0.036   14.561    0.000    0.529    0.176
##     FWB     (.22.)    0.867    0.060   14.376    0.000    1.000    1.000
##     MLQS    (.23.)    1.727    0.082   21.192    0.000    1.000    1.000
## 
## 
## Group 2 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              1.000                               0.931    0.689
##     fwb2    (.p2.)    1.166    0.046   25.419    0.000    1.086    0.744
##     fwb3    (.p3.)    1.308    0.049   26.536    0.000    1.218    0.784
##     fwb4    (.p4.)    1.243    0.046   27.118    0.000    1.158    0.806
##     fwb5    (.p5.)    1.141    0.046   24.773    0.000    1.062    0.723
##   MLQS =~                                                               
##     mlqs1             1.000                               1.314    0.914
##     mlqs2             1.001    0.020   49.922    0.000    1.316    0.941
##     mlqs3   (.p8.)    1.033    0.019   55.491    0.000    1.357    0.949
##     mlqs4             1.009    0.020   50.321    0.000    1.326    0.943
##     mlqs5             1.056    0.021   49.790    0.000    1.388    0.945
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS    (.24.)    0.343    0.037    9.214    0.000    0.280    0.280
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     FWB               0.000                               0.000    0.000
##    .fwb1    (.25.)    5.430    0.035  155.664    0.000    5.430    4.018
##    .fwb2    (.26.)    5.334    0.038  141.653    0.000    5.334    3.656
##    .fwb3    (.27.)    4.945    0.040  123.312    0.000    4.945    3.183
##    .fwb4    (.28.)    5.310    0.037  143.219    0.000    5.310    3.696
##    .fwb5    (.29.)    5.075    0.038  133.752    0.000    5.075    3.452
##    .mlqs1             4.880    0.059   82.158    0.000    4.880    3.394
##    .mlqs2   (.31.)    4.844    0.055   88.214    0.000    4.844    3.466
##    .mlqs3   (.32.)    4.891    0.055   88.683    0.000    4.891    3.422
##    .mlqs4   (.33.)    4.815    0.056   86.608    0.000    4.815    3.426
##    .mlqs5             4.912    0.061   80.726    0.000    4.912    3.345
##     MLQS             -0.801    0.072  -11.099    0.000   -0.609   -0.609
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1    (.12.)    0.959    0.041   23.652    0.000    0.959    0.525
##    .fwb2    (.13.)    0.949    0.043   22.214    0.000    0.949    0.446
##    .fwb3    (.14.)    0.931    0.045   20.720    0.000    0.931    0.386
##    .fwb4    (.15.)    0.724    0.037   19.631    0.000    0.724    0.351
##    .fwb5    (.16.)    1.033    0.045   22.854    0.000    1.033    0.478
##    .mlqs1             0.339    0.021   16.412    0.000    0.339    0.164
##    .mlqs2             0.222    0.015   15.013    0.000    0.222    0.114
##    .mlqs3             0.201    0.014   14.332    0.000    0.201    0.099
##    .mlqs4             0.218    0.015   14.875    0.000    0.218    0.110
##    .mlqs5             0.230    0.016   14.716    0.000    0.230    0.107
##     FWB     (.22.)    0.867    0.060   14.376    0.000    1.000    1.000
##     MLQS    (.23.)    1.727    0.082   21.192    0.000    1.000    1.000

Because randomization was presumably done properly and power was quite high due to the large samples and their near-balance, the difference between groups in model parameters can only be due differences in test function. As such, it makes no sense to try to estimate the effect of the degree of measurement bias, as whatever residual we find between those estimates and the observed differences has to be due to error variance, and specifically, differences in reliability. Because the residual variances were not equal between groups, we know that the amount of measurement error may differ, and other causes are excluded thanks to randomization. Notably, reliability can differ due to differences in range restriction and these can also affect loading differences. Despite the nonsensical nature of this, I decided to calculate the effect of bias on the means of indicator items and the proportion of the residual variances which could be attributed to systematic influences of the Gavagai/Life questionnaire differences. The first method is as follows:

#From Gunn, Grimm & Edwards (2019)

SDI2.UDI2 = function(p, loading1, loading2, intercept1, intercept2, stdev2, 
                     fmean2, fsd2, nodewidth = 0.01, lowerLV = -5, upperLV = 5) {
   LV = seq(lowerLV,upperLV,nodewidth)
   DiffExpItem12 = matrix(NA,length(LV),p)
   pdfLV2 = matrix(NA,length(LV),1)
   SDI2numerator = matrix(NA,length(LV),p)
   UDI2numerator = matrix(NA,length(LV),p)
   SDI2 = matrix(NA,p,1)
   UDI2 = matrix(NA,p,1)
   for(j in 1:p){
    for(k in 1:length(LV)){
      DiffExpItem12[k,j] <- (intercept1[j]-intercept2[j]) + (loading1[j]-loading2[j])*LV[k]
      pdfLV2[k] = dnorm(LV[k], mean=fmean2, sd=fsd2)
      SDI2numerator[k,j] = DiffExpItem12[k,j]*pdfLV2[k]*nodewidth
      UDI2numerator[k,j] = abs(SDI2numerator[k,j])
    }
    SDI2[j] <- sum(SDI2numerator[,j])/stdev2[j]
    UDI2[j] <- sum(UDI2numerator[,j])/stdev2[j]
   }
  colnames(SDI2) = "SDI2"
  colnames(UDI2) = "UDI2"
  return(list(SDI2=round(SDI2,4), UDI2=round(UDI2,4)))}

data %>% group_by(gavagai) %>% summarise_at(vars(mlqs1, mlqs2, mlqs3, mlqs4, mlqs5), list(mean = mean, sd = sd), na.rm = T)
#Parameters for Meaning of Life/Gavagai

MGp = 5; MGl1 = c(0.805, 0.891, 0, 0.921, 0.908); MGl2 = c(0.914, 0.941, 0, 0.943, 0.945); MGi1 = c(5.040, 0, 0, 0, 4.743); MGi2 = c(4.880, 0, 0, 0, 4.912); MGsd = c(1.657912, 1.714963, 1.678123, 1.736567, 1.778569); MGfm = 0.609; MGfsd = 1

MGS <- SDI2.UDI2(MGp, MGl1, MGl2, MGi1, MGi2, MGsd, MGfm, MGfsd)

SUDI <- function(SDI, UDI){
  D <- ifelse(SDI < 0, -1, 1)
  H = D * UDI
  return(H)}

SUDI(MGS$SDI2, MGS$UDI2)
##         SDI2
## [1,]  0.0707
## [2,] -0.0274
## [3,]  0.0000
## [4,] -0.0119
## [5,] -0.1077
qsd1 <- cohen.d(data$mlqs1, data$gavagai); qsd1$hedges.g
##       data 
## -0.6074539
qsd2 <- cohen.d(data$mlqs2, data$gavagai); qsd2$hedges.g
##       data 
## -0.5024998
qsd3 <- cohen.d(data$mlqs3, data$gavagai); qsd3$hedges.g
##       data 
## -0.5536249
qsd4 <- cohen.d(data$mlqs4, data$gavagai); qsd4$hedges.g
##      data 
## -0.468687
qsd5 <- cohen.d(data$mlqs5, data$gavagai); qsd5$hedges.g
##       data 
## -0.3993656

The effect of bias on the means is modest, but present. The second method, to quantify the proportion of systematic bias in the residual variances is as follows:

MOES <- function(T1, T2, S1, S2) {
  ES = (T1 - T2) / (S1^2 - S2^2)
  return(ES)}

qsv1 <- MOES(.938, .339, 1.657912, 1.400830)
qsv2 <- MOES(.578, .222, 1.714963, 1.354044)
qsv3 <- MOES(.744, .201, 1.678123, 1.366392)
qsv4 <- MOES(.434, .218, 1.736567, 1.371404)
qsv5 <- MOES(.529, .230, 1.778569, 1.423737)

print(paste("mlqs1 proportion =", qsv1, "mlqs2 = ", qsv2, "mlqs3 = ", qsv3, "mlqs4 = ", qsv4, "mlqs5 = ", qsv5))
## [1] "mlqs1 proportion = 0.761749724822151 mlqs2 =  0.321397410682556 mlqs3 =  0.572139218935405 mlqs4 =  0.190322453189071 mlqs5 =  0.263139218246245"

So, the residual 24%, 68%, 43%, 81%, and 74% of residual variance differences would be attributable to differences in reliability.

Discussion

Protzko (2022) was an interesting paper whose data delivered a resoundingly clear conclusion: measurement invariance, done properly, shows us that a scale measures the same thing for different groups. The experimental nature of the Protzko (2022) data brilliantly confirms that. However, it also provides considerable room for caution, because it is apparent that certain fit indices are insensitive at some steps of measurement invariance testing, and also, there are still people who test measurement invariance but do not understand the meaning of the various stages, or that error variance homogeneity is a necessity for achieving measurement invariance and the conclusion that measurement pertains only to constructs whose influences are shared between groups. It is easy to forget that the error terms are themselves latent variables; testing for strict invariance is how we deal with them.

Edit: Sacha Epskamp has now partially confirmed these results (https://gist.github.com/SachaEpskamp/1d876d52186c6c0677f8c38871896182) and provided simulations of fit indices that show that CFI and RMSEA are not good model fit indices for comparing nested models (https://gist.github.com/SachaEpskamp/f8310d15e78c60e14e5dff347acdcee1). Their purpose should be for initial model fitting. His twitter post on this: https://twitter.com/SachaEpskamp/status/1518518277147873283.

References

Protzko, J. (2022). Invariance: What Does Measurement Invariance Allow us to Claim? PsyArXiv. https://doi.org/10.31234/osf.io/r8yka

DeShon, R. P. (2004). Measures are not invariant across groups without error variance homogeneity. Psychology Science, 46(1), 137–149.

Ropovik, I. (2015). A cautionary note on testing latent variable models. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01715

McNeish, D., & Wolf, M. G. (2021). Dynamic fit index cutoffs for confirmatory factor analysis models. Psychological Methods. https://doi.org/10.1037/met0000425

Robitzsch, A. (2020). Why Ordinal Variables Can (Almost) Always Be Treated as Continuous Variables: Clarifying Assumptions of Robust Continuous and Ordinal Factor Analysis Estimation Methods. Frontiers in Education, 5. https://doi.org/10.3389/feduc.2020.589965

Wicherts, J. M., & Dolan, C. V. (2010). Measurement Invariance in Confirmatory Factor Analysis: An Illustration Using IQ Test Performance of Minorities. Educational Measurement: Issues and Practice, 29(3), 39–47. https://doi.org/10.1111/j.1745-3992.2010.00182.x

Gunn, H. J., Grimm, K. J., & Edwards, M. C. (2019). Evaluation of Six Effect Size Measures of Measurement Non-Invariance for Continuous Outcomes. Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 503–514. https://doi.org/10.1080/10705511.2019.1689507

Postscript

A popular snowclone goes “One man’s modus ponens is another man’s modus tollens.” The idea that the meaning of gavagai scale is, at the end of the day, “nonsense” is not tenable in light of this and the fact that the meaning of gavagai questionnaire is not totally nonsensical. It asks real questions with a word substituted for what would normally be there. If we want to know how much a nonsense factor correlates with a real one, or if it can be invariant with a real one, then we need to either

  1. Use a nonsense measurement, like the ridiculous questions posited at the beginning of this, or

  2. Randomly generate responses and see if they are correlated/invariant with real factors.

Because the first option is not available with this data, we will have to settle for the second.

p_load(wakefield)
set.seed(1)

N = 718

Q1 <- likert_7(
  N,
  #x = c("Absolutely Untrue", "Mostly Untrue", "Somewhat Untrue", "Can't Say", "Somewhat True", "Mostly True", "Absolutely True"),
  x = c("1", "2", "3", "4", "5", "6", "7"),
  name = "Q1")

Q2 <- likert_7(
  N,
  x = c("1", "2", "3", "4", "5", "6", "7"),
  name = "Q2")

Q3 <- likert_7(
  N,
  x = c("1", "2", "3", "4", "5", "6", "7"),
  name = "Q3")

Q4 <- likert_7(
  N,
  x = c("1", "2", "3", "4", "5", "6", "7"),
  name = "Q5")

Q5 <- likert_7(
  N,
  x = c("1", "2", "3", "4", "5", "6", "7"),
  name = "Q5")

dataLife <- subset(data, gavagai == 0)
dataGav <- subset(data, gavagai == 1)

dataGav$mlqs1 <- Q1
dataGav$mlqs2 <- Q2
dataGav$mlqs3 <- Q3
dataGav$mlqs4 <- Q4
dataGav$mlqs5 <- Q5

data <- rbind(dataLife, dataGav)
GavaLifeC <- sem(GavaLife.model, data, std.lv = T, group = "gavagai")
GavaLifeM <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = "loadings")
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated lv
## variances are negative
GavaLifeS <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts"))
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated lv
## variances are negative
GavaLifeR <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals"))
GavaLifeLV <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"))
GavaLifeLC <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"))
GavaLifeMN <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances", "means"))

round(cbind(Configural = fitMeasures(GavaLifeC, FITM),
            Metric = fitMeasures(GavaLifeM, FITM),
            Scalar = fitMeasures(GavaLifeS, FITM),
            Strict = fitMeasures(GavaLifeR, FITM),
            LV.Vars = fitMeasures(GavaLifeLV, FITM),
            LV.Covars = fitMeasures(GavaLifeLC, FITM),
            Means = fitMeasures(GavaLifeMN, FITM)), 3)
##                Configural    Metric    Scalar    Strict   LV.Vars LV.Covars
## chisq             202.205   213.169   236.951  2593.629  2844.297  2856.082
## df                 68.000    76.000    84.000    94.000    96.000    97.000
## npar               62.000    54.000    46.000    36.000    34.000    33.000
## cfi                 0.981     0.981     0.979     0.649     0.614     0.612
## rmsea               0.051     0.049     0.049     0.188     0.195     0.195
## rmsea.ci.lower      0.043     0.041     0.042     0.182     0.189     0.189
## rmsea.ci.upper      0.060     0.057     0.057     0.195     0.202     0.201
## aic             50436.493 50431.457 50439.239 52775.917 53022.585 53032.370
## bic             50765.912 50718.371 50683.647 52967.193 53203.234 53207.706
##                    Means
## chisq           3019.382
## df                99.000
## npar              31.000
## cfi                0.590
## rmsea              0.198
## rmsea.ci.lower     0.192
## rmsea.ci.upper     0.204
## aic            53191.670
## bic            53356.380
summary(GavaLifeM, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 47 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        62
##   Number of equality constraints                     8
##                                                       
##   Number of observations per group:                   
##     0                                              782
##     1                                              718
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               213.169
##   Degrees of freedom                                76
##   P-value (Chi-square)                           0.000
##   Test statistic for each group:
##     0                                          145.432
##     1                                           67.737
## 
## Model Test Baseline Model:
## 
##   Test statistic                              7205.173
##   Degrees of freedom                                90
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.981
##   Tucker-Lewis Index (TLI)                       0.977
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -25161.728
##   Loglikelihood unrestricted model (H1)             NA
##                                                       
##   Akaike (AIC)                               50431.457
##   Bayesian (BIC)                             50718.371
##   Sample-size adjusted Bayesian (BIC)        50546.828
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.049
##   90 Percent confidence interval - lower         0.041
##   90 Percent confidence interval - upper         0.057
##   P-value RMSEA <= 0.05                          0.566
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.029
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [0]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              1.000                               0.933    0.689
##     fwb2    (.p2.)    1.169    0.046   25.453    0.000    1.090    0.746
##     fwb3    (.p3.)    1.306    0.049   26.487    0.000    1.218    0.781
##     fwb4    (.p4.)    1.243    0.046   27.118    0.000    1.160    0.814
##     fwb5    (.p5.)    1.139    0.046   24.742    0.000    1.063    0.723
##   MLQS =~                                                               
##     mlqs1             1.000                               1.344    0.811
##     mlqs2   (.p7.)    1.143    0.037   30.923    0.000    1.536    0.896
##     mlqs3   (.p8.)    1.073    0.037   29.020    0.000    1.442    0.860
##     mlqs4   (.p9.)    1.195    0.037   32.497    0.000    1.606    0.925
##     mlqs5   (.10.)    1.206    0.038   31.781    0.000    1.621    0.912
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS              0.378    0.053    7.131    0.000    0.301    0.301
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              5.393    0.048  111.467    0.000    5.393    3.986
##    .fwb2              5.321    0.052  101.863    0.000    5.321    3.643
##    .fwb3              4.934    0.056   88.515    0.000    4.934    3.165
##    .fwb4              5.304    0.051  104.069    0.000    5.304    3.722
##    .fwb5              5.042    0.053   95.954    0.000    5.042    3.431
##    .mlqs1             5.019    0.059   84.719    0.000    5.019    3.030
##    .mlqs2             4.825    0.061   78.723    0.000    4.825    2.815
##    .mlqs3             4.908    0.060   81.857    0.000    4.908    2.927
##    .mlqs4             4.763    0.062   76.739    0.000    4.763    2.744
##    .mlqs5             4.717    0.064   74.224    0.000    4.717    2.654
##     FWB               0.000                               0.000    0.000
##     MLQS              0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              0.961    0.056   17.281    0.000    0.961    0.525
##    .fwb2              0.946    0.058   16.289    0.000    0.946    0.443
##    .fwb3              0.946    0.061   15.405    0.000    0.946    0.390
##    .fwb4              0.687    0.048   14.296    0.000    0.687    0.338
##    .fwb5              1.030    0.062   16.743    0.000    1.030    0.477
##    .mlqs1             0.939    0.053   17.821    0.000    0.939    0.342
##    .mlqs2             0.578    0.037   15.550    0.000    0.578    0.197
##    .mlqs3             0.732    0.043   16.864    0.000    0.732    0.260
##    .mlqs4             0.435    0.032   13.608    0.000    0.435    0.144
##    .mlqs5             0.532    0.036   14.632    0.000    0.532    0.168
##     FWB               0.870    0.070   12.437    0.000    1.000    1.000
##     MLQS              1.806    0.132   13.640    0.000    1.000    1.000
## 
## 
## Group 2 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              1.000                               0.929    0.689
##     fwb2    (.p2.)    1.169    0.046   25.453    0.000    1.086    0.746
##     fwb3    (.p3.)    1.306    0.049   26.487    0.000    1.213    0.784
##     fwb4    (.p4.)    1.243    0.046   27.118    0.000    1.156    0.798
##     fwb5    (.p5.)    1.139    0.046   24.742    0.000    1.059    0.720
##   MLQS =~                                                               
##     mlqs1             1.000                                  NA       NA
##     mlqs2   (.p7.)    1.143    0.037   30.923    0.000       NA       NA
##     mlqs3   (.p8.)    1.073    0.037   29.020    0.000       NA       NA
##     mlqs4   (.p9.)    1.195    0.037   32.497    0.000       NA       NA
##     mlqs5   (.10.)    1.206    0.038   31.781    0.000       NA       NA
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS              0.011    0.028    0.391    0.695    0.046    0.046
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              5.467    0.050  108.628    0.000    5.467    4.054
##    .fwb2              5.344    0.054   98.293    0.000    5.344    3.668
##    .fwb3              4.953    0.058   85.708    0.000    4.953    3.199
##    .fwb4              5.312    0.054   98.321    0.000    5.312    3.669
##    .fwb5              5.106    0.055   93.091    0.000    5.106    3.474
##    .mlqs1             3.979    0.075   53.145    0.000    3.979    1.983
##    .mlqs2             4.008    0.075   53.613    0.000    4.008    2.001
##    .mlqs3             4.031    0.075   54.010    0.000    4.031    2.016
##    .mlqs4             3.933    0.074   52.868    0.000    3.933    1.973
##    .mlqs5             4.054    0.073   55.495    0.000    4.054    2.071
##     FWB               0.000                               0.000    0.000
##     MLQS              0.000                                  NA       NA
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              0.955    0.058   16.478    0.000    0.955    0.525
##    .fwb2              0.943    0.061   15.507    0.000    0.943    0.444
##    .fwb3              0.925    0.064   14.547    0.000    0.925    0.386
##    .fwb4              0.761    0.054   14.102    0.000    0.761    0.363
##    .fwb5              1.039    0.065   15.992    0.000    1.039    0.481
##    .mlqs1             4.092    0.216   18.932    0.000    4.092    1.017
##    .mlqs2             4.101    0.218   18.800    0.000    4.101    1.022
##    .mlqs3             4.076    0.216   18.871    0.000    4.076    1.019
##    .mlqs4             4.070    0.217   18.727    0.000    4.070    1.024
##    .mlqs5             3.930    0.210   18.679    0.000    3.930    1.026
##     FWB               0.864    0.071   12.155    0.000    1.000    1.000
##     MLQS             -0.067    0.035   -1.933    0.053       NA       NA
summary(GavaLifeS, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 73 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        64
##   Number of equality constraints                    18
##                                                       
##   Number of observations per group:                   
##     0                                              782
##     1                                              718
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                               236.951
##   Degrees of freedom                                84
##   P-value (Chi-square)                           0.000
##   Test statistic for each group:
##     0                                          150.279
##     1                                           86.672
## 
## Model Test Baseline Model:
## 
##   Test statistic                              7205.173
##   Degrees of freedom                                90
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.979
##   Tucker-Lewis Index (TLI)                       0.977
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -25173.619
##   Loglikelihood unrestricted model (H1)             NA
##                                                       
##   Akaike (AIC)                               50439.239
##   Bayesian (BIC)                             50683.647
##   Sample-size adjusted Bayesian (BIC)        50537.518
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.049
##   90 Percent confidence interval - lower         0.042
##   90 Percent confidence interval - upper         0.057
##   P-value RMSEA <= 0.05                          0.551
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.032
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## 
## Group 1 [0]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              1.000                               0.933    0.689
##     fwb2    (.p2.)    1.168    0.046   25.457    0.000    1.090    0.746
##     fwb3    (.p3.)    1.305    0.049   26.489    0.000    1.218    0.781
##     fwb4    (.p4.)    1.243    0.046   27.118    0.000    1.159    0.813
##     fwb5    (.p5.)    1.139    0.046   24.748    0.000    1.063    0.723
##   MLQS =~                                                               
##     mlqs1             1.000                               1.373    0.817
##     mlqs2   (.p7.)    1.118    0.035   32.138    0.000    1.535    0.896
##     mlqs3   (.p8.)    1.056    0.035   30.221    0.000    1.449    0.862
##     mlqs4   (.p9.)    1.167    0.035   33.756    0.000    1.603    0.924
##     mlqs5   (.10.)    1.170    0.036   32.918    0.000    1.606    0.910
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS              0.387    0.054    7.150    0.000    0.302    0.302
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1    (.24.)    5.415    0.043  126.504    0.000    5.415    4.001
##    .fwb2    (.25.)    5.316    0.048  111.886    0.000    5.316    3.640
##    .fwb3    (.26.)    4.925    0.052   95.515    0.000    4.925    3.160
##    .fwb4    (.27.)    5.292    0.048  110.125    0.000    5.292    3.713
##    .fwb5    (.28.)    5.057    0.047  106.934    0.000    5.057    3.441
##    .mlqs1   (.29.)    4.970    0.059   84.704    0.000    4.970    2.959
##    .mlqs2   (.30.)    4.828    0.061   79.522    0.000    4.828    2.819
##    .mlqs3   (.31.)    4.896    0.059   82.583    0.000    4.896    2.911
##    .mlqs4   (.32.)    4.768    0.062   77.282    0.000    4.768    2.750
##    .mlqs5   (.33.)    4.741    0.063   75.635    0.000    4.741    2.686
##     FWB               0.000                               0.000    0.000
##     MLQS              0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              0.961    0.056   17.279    0.000    0.961    0.525
##    .fwb2              0.946    0.058   16.288    0.000    0.946    0.443
##    .fwb3              0.947    0.061   15.406    0.000    0.947    0.390
##    .fwb4              0.687    0.048   14.301    0.000    0.687    0.338
##    .fwb5              1.031    0.062   16.741    0.000    1.031    0.477
##    .mlqs1             0.937    0.053   17.728    0.000    0.937    0.332
##    .mlqs2             0.576    0.037   15.539    0.000    0.576    0.197
##    .mlqs3             0.729    0.043   16.819    0.000    0.729    0.258
##    .mlqs4             0.437    0.032   13.666    0.000    0.437    0.146
##    .mlqs5             0.538    0.036   14.782    0.000    0.538    0.173
##     FWB               0.870    0.070   12.438    0.000    1.000    1.000
##     MLQS              1.885    0.135   13.958    0.000    1.000    1.000
## 
## 
## Group 2 [1]:
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB =~                                                                
##     fwb1              1.000                               0.930    0.689
##     fwb2    (.p2.)    1.168    0.046   25.457    0.000    1.086    0.746
##     fwb3    (.p3.)    1.305    0.049   26.489    0.000    1.213    0.784
##     fwb4    (.p4.)    1.243    0.046   27.118    0.000    1.155    0.798
##     fwb5    (.p5.)    1.139    0.046   24.748    0.000    1.059    0.721
##   MLQS =~                                                               
##     mlqs1             1.000                                  NA       NA
##     mlqs2   (.p7.)    1.118    0.035   32.138    0.000       NA       NA
##     mlqs3   (.p8.)    1.056    0.035   30.221    0.000       NA       NA
##     mlqs4   (.p9.)    1.167    0.035   33.756    0.000       NA       NA
##     mlqs5   (.10.)    1.170    0.036   32.918    0.000       NA       NA
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   FWB ~~                                                                
##     MLQS              0.011    0.029    0.398    0.691    0.045    0.045
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1    (.24.)    5.415    0.043  126.504    0.000    5.415    4.014
##    .fwb2    (.25.)    5.316    0.048  111.886    0.000    5.316    3.649
##    .fwb3    (.26.)    4.925    0.052   95.515    0.000    4.925    3.181
##    .fwb4    (.27.)    5.292    0.048  110.125    0.000    5.292    3.656
##    .fwb5    (.28.)    5.057    0.047  106.934    0.000    5.057    3.440
##    .mlqs1   (.29.)    4.970    0.059   84.704    0.000    4.970    2.460
##    .mlqs2   (.30.)    4.828    0.061   79.522    0.000    4.828    2.409
##    .mlqs3   (.31.)    4.896    0.059   82.583    0.000    4.896    2.447
##    .mlqs4   (.32.)    4.768    0.062   77.282    0.000    4.768    2.391
##    .mlqs5   (.33.)    4.741    0.063   75.635    0.000    4.741    2.411
##     FWB               0.028    0.052    0.543    0.587    0.030    0.030
##     MLQS             -0.753    0.061  -12.441    0.000       NA       NA
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .fwb1              0.955    0.058   16.477    0.000    0.955    0.525
##    .fwb2              0.943    0.061   15.505    0.000    0.943    0.444
##    .fwb3              0.925    0.064   14.548    0.000    0.925    0.386
##    .fwb4              0.761    0.054   14.107    0.000    0.761    0.363
##    .fwb5              1.039    0.065   15.989    0.000    1.039    0.481
##    .mlqs1             4.154    0.219   18.936    0.000    4.154    1.018
##    .mlqs2             4.107    0.218   18.819    0.000    4.107    1.023
##    .mlqs3             4.087    0.216   18.882    0.000    4.087    1.021
##    .mlqs4             4.077    0.217   18.749    0.000    4.077    1.025
##    .mlqs5             3.970    0.212   18.722    0.000    3.970    1.026
##     FWB               0.864    0.071   12.156    0.000    1.000    1.000
##     MLQS             -0.074    0.036   -2.038    0.042       NA       NA

If you generated invalid, randomly generated data, you could potentially fool yourself into thinking measurement invariance held, you’d just have to ignore problems like the negative latent variance that would invalidate the model. Importantly, this actual nonsense factor composed of randomly generated data does not correlate with the free will beliefs questionnaire. That illustrates quite clearly that the meaning of gavagai is not nonsense to the people answering it. It uses a nonsense word, but it produces a valid factor, and it at least has comparable predictive validity in this one scenario with a more cognitively meaningful questionnaire about the meaning of life.

The position that a different factor is measured by different questionnaires is empirical, not about questionnaire construction. We could not a priori ensure the meaning of life and gavagai questionnaires measured wholly different things, so we could have attained measurement invariance and still have had the tests not assess the theoretical utility of measurement invariance testing since the question of “different” factors was still on the table. On the other hand, we could and did find that they measured different things, we merely had to do the right tests. This asymmetry is at the core of why this experiment, as conducted, could only show that measurement invariance worked, not that it did not.

sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] wakefield_0.3.6       lavaan_0.6-9          pacman_0.5.1         
##  [4] kirkegaard_2022-04-12 rlang_0.4.12          metafor_3.0-2        
##  [7] Matrix_1.3-4          psych_2.1.9           magrittr_2.0.1       
## [10] assertthat_0.2.1      weights_1.0.4         Hmisc_4.7-0          
## [13] Formula_1.2-4         survival_3.2-13       lattice_0.20-45      
## [16] forcats_0.5.1         stringr_1.4.0         dplyr_1.0.7          
## [19] purrr_0.3.4           readr_2.0.2           tidyr_1.1.4          
## [22] tibble_3.1.5          ggplot2_3.3.5         tidyverse_1.3.1      
## 
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-153        fs_1.5.0            lubridate_1.8.0    
##  [4] RColorBrewer_1.1-2  httr_1.4.2          tools_4.1.2        
##  [7] backports_1.3.0     bslib_0.3.1         utf8_1.2.2         
## [10] R6_2.5.1            rpart_4.1-15        DBI_1.1.2          
## [13] colorspace_2.0-2    nnet_7.3-16         withr_2.4.2        
## [16] mnormt_2.0.2        tidyselect_1.1.1    gridExtra_2.3      
## [19] compiler_4.1.2      cli_3.1.0           rvest_1.0.2        
## [22] htmlTable_2.4.0     mice_3.14.0         xml2_1.3.2         
## [25] sass_0.4.0          scales_1.1.1        checkmate_2.0.0    
## [28] pbivnorm_0.6.0      digest_0.6.28       foreign_0.8-81     
## [31] minqa_1.2.4         rmarkdown_2.11      base64enc_0.1-3    
## [34] jpeg_0.1-9          pkgconfig_2.0.3     htmltools_0.5.2    
## [37] lme4_1.1-27.1       dbplyr_2.1.1        fastmap_1.1.0      
## [40] htmlwidgets_1.5.4   readxl_1.4.0        rstudioapi_0.13    
## [43] jquerylib_0.1.4     generics_0.1.1      jsonlite_1.7.2     
## [46] gtools_3.9.2        Rcpp_1.0.7          munsell_0.5.0      
## [49] fansi_0.5.0         lifecycle_1.0.1     stringi_1.7.5      
## [52] yaml_2.2.1          mathjaxr_1.4-0      MASS_7.3-54        
## [55] grid_4.1.2          parallel_4.1.2      gdata_2.18.0       
## [58] crayon_1.4.2        haven_2.4.3         splines_4.1.2      
## [61] hms_1.1.1           tmvnsim_1.0-2       knitr_1.36         
## [64] pillar_1.6.4        boot_1.3-28         stats4_4.1.2       
## [67] reprex_2.0.1        glue_1.4.2          evaluate_0.14      
## [70] latticeExtra_0.6-29 data.table_1.14.2   modelr_0.1.8       
## [73] nloptr_1.2.2.3      png_0.1-7           vctrs_0.3.8        
## [76] tzdb_0.2.0          cellranger_1.1.0    gtable_0.3.0       
## [79] xfun_0.27           broom_0.7.10        cluster_2.1.2      
## [82] ellipsis_0.3.2