library(pacman); p_load(psych, lavaan, tidyverse) #Load packages
FITM <- c("chisq", "df", "npar", "cfi", "rmsea", "rmsea.ci.lower", "rmsea.ci.upper", "aic", "bic") #Fit measures reported throughout
John Protzko has produced a new paper that is interesting, to say the least (Protzko, 2022). The gist is that the interpretation of measurement invariance that goes
\[f(X \mid \eta) = f(X \mid \eta, v)\]
is false. He showed this by taking a sample of 1,500 and randomly assigning participants either a Meaning of Life questionnaire or a Meaning of Gavagai (a nonsense word) questionnaire, and then assessing measurement invariance with the two questionnaires’ items treated as if they came from one scale. By finding that measurement invariance held, we could then dismiss the notion that measurement invariance guarantees you are measuring the same thing, and thus \(f(X \mid \eta) \neq f(X \mid \eta, v)\), because X is not influenced by a common \(\eta\).
That manuscript suffered from a number of incredibly large defects and showcased basic misunderstandings of measurement in general and measurement invariance in particular. All of its conclusions are incorrect; in fact, the paper serves as a proof that measurement invariance testing works. In no particular order, here are the issues that stood out most prominently.
Firstly, we do not get to pick what our scales measure. If we devise a happiness scale and discover that it is perfectly correlated with a survey about enjoying processed foods, we have a construct validity issue for one or both scales, and this should prompt introspection about what is being measured rather than dismissal of either instrument just because the questions do not appear to plausibly entail identical or even highly related measurements. The “Gavagai” scale does not measure “nothing” (pp. 2, 17) just because it is nonsensical on its face. People who are given the Gavagai questionnaire impute their own meanings, whether derived in the moment or according to some broad rule. To the degree people in general do this, we should actually expect some degree of validity. To the extent that people hold a questionnaire’s questions in mind together, we should expect some covariance among questions that do not have clearly different content, simply because people should infer relationships among questions and try to answer accordingly. A questionnaire for nothing - a nonsense scale - must be truly nonsensical to measure nothing. An example question set for a questionnaire which would truly measure nothing would be something like
Bananas: Yes or no?
How old are you: Finland or Silverback Kittens?
Uh oh: How does that make you feel (fill in the blank and then eat the paper and yell out your answer)?
The use of “Gavagai”, a word whose meaning can be imputed at will, alongside otherwise sensible questions does not generate a nothing scale, and psychometricians should stop acting like it does. Additionally, unless an interview gets people to firmly state how they interpreted a thing, it cannot be used to verify or dismiss a common interpretation in a sample. When 65% of people say they don’t know what Gavagai is and 35% guess meaning/purpose, happiness, money, or sexual pleasure, the residual 65% may still have had an idea in mind when answering - one perhaps even consistent across questions - but declined to report it because they felt uncertain, in which case what we observed was a response bias.
Second, \(\omega\) and high loadings are not separate findings. \(\omega\) is a measure of factor saturation derived from the loadings, and it can be high despite low test-retest reliability if the common variance of a set of items is considerable enough. Simulate it - seriously! Though unnecessary, it is trivial to achieve this condition by violating local independence: shared variance that owes nothing to a stable trait will happily sustain a single-factor model.
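To make this concrete, here is a minimal sketch (mine, not part of the original analysis): five items that share only a momentary common influence yield a high \(\omega\) under a single-factor model even though nothing stable is being measured and test-retest reliability is near zero.
set.seed(1)
n <- 1000
make_items <- function() { #Five items driven only by a momentary common influence
state <- rnorm(n) #Redrawn each "occasion", so nothing stable is measured
items <- as.data.frame(sapply(1:5, function(i) 0.8 * state + rnorm(n, sd = 0.6)))
colnames(items) <- paste0("x", 1:5); items}
t1 <- make_items(); t2 <- make_items()
cor(rowSums(t1), rowSums(t2)) #Test-retest reliability: near zero
nothing.fit <- cfa('F =~ x1 + x2 + x3 + x4 + x5', t1, std.lv = T)
l <- subset(standardizedSolution(nothing.fit), op == "=~")$est.std
sum(l)^2 / (sum(l)^2 + sum(1 - l^2)) #Omega from the loadings: around 0.9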
Third, measurement invariance is not just testing metric and scalar invariance. When metric invariance holds, indicator variables are interpreted the same way by different groups, and when scalar invariance holds, the levels of those indicators are interpretable in the same way for different groups. But, more importantly, to get to the interpretation of measurement invariance as meaning that a scale is “measuring the same thing” for different groups, you must also have equal residual variances. Variance on an item is caused by something, whether a real systematic influence, like schooling on a vocabulary test, or error. To show that the same thing is measured, all influences on the tests must be equal, so the item variances cannot differ net of differences in the latent variances. Moreover, to ensure equal reliability, you must also constrain the residual variances. This is well-known and well-understood, but some people are adamant that they can interpret the variances without assessing their equality, which is obviously wrong. These people call stopping at scalar invariance “measurement invariance,” but they are in the wrong: you must have error variance homogeneity to have measurement invariance (see DeShon, 2004). As a final note, to be very explicit, “strong” or “scalar” invariance does not mean a scale measures the same thing in different groups; it means the interpretations of items and their levels are common between groups. Strict invariance means scales measure the same thing.
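To see the problem in path terms, write the model-implied variance and reliability of an indicator \(x_j\) with loading \(\lambda_j\), latent variance \(\psi\), and residual variance \(\theta_j\):
\[\operatorname{Var}(x_j) = \lambda_j^2\psi + \theta_j \qquad \operatorname{rel}(x_j) = \frac{\lambda_j^2\psi}{\lambda_j^2\psi + \theta_j}\]
Equating the loadings and intercepts leaves \(\theta_j\) free, so the composition of an item’s variance - and with it, its reliability - can still differ between groups; strict invariance is what closes that gap.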
Fourth, insensitive model fit indices are not useful for assessing measurement invariance, and the conclusions they yield are often wrong. In Ropovik’s (2015) cautionary note on testing latent variable models, he reviewed published SEMs in which model fits were compared. The most common fit indices reported were RMSEA and CFI, followed by SRMR, and, finally, \(\chi^2\) exact fit tests. This is critical because approximate fit indices are much weaker tests of fit, and they can easily be gamed by picking and choosing whichever ones agree with a preferred conclusion. The \(\chi^2\) test, on the other hand, can strongly detect violations of measurement invariance, and both AIC and BIC (in that order) are superior to RMSEA, CFI, and SRMR. Using insensitive fit indices is a surefire recipe for picking the wrong model, so it should be considered alarming that so many do it. If a model is found to be noninvariant, the extent of measurement bias in the means can be estimated with existing methods; other variant parameters are informative for theoretical reasons. Being serious about measurement invariance requires seriously testing the hypotheses we are interested in, but Protzko (2022) did not feature serious measurement invariance testing; it relied on weak indices. In the future, hopefully dynamically computed fit indices can be generated so we can stop relying on cutoffs that may not apply to our data (see McNeish & Wolf, 2021).
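As an aside on tooling: the exact nested-model test does not have to be assembled by hand the way it is later in this post; lavaan provides it directly (a one-line sketch, using the invariance fits defined below).
lavTestLRT(GavaLifeC, GavaLifeM) #Chi-square difference test of metric against configural; anova() is equivalent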
Fifth, the estimand for Protzko (2022) was wrong. What was tested in that publication was, at best, whether there was discriminant validity for the Meaning of Life and Meaning of Gavagai questionnaires, not whether measurement invariance means tests measure the same thing. Surely, if we administered questions from two different extraversion questionnaires and tested whether they were invariant, we might find something like this - especially with weak cutoffs - but that would not mean that measurement invariance is flawed; it would mean the analysis was. We have a lopsided estimand: showing that Gavagai is not the same measurement as Meaning of Life shows that, yes, different things generate different results, but showing that they generate the same results is agnostic with respect to the conclusion. To get a non-lopsided test, administer the same questionnaire to two samples and assess whether something like providing answers, practice, or coaching for an objective assessment leads to noninvariance. This gives us a confirmed different source of score variation, rather than an open question about the construct validity of different questionnaires. In this sense, Protzko (2022) could only truly verify that measurement invariance worked, not that it did not.
I show below that, when analyzed properly, with sensitive fit indices and full measurement invariance, the data from Protzko (2022) strongly support the interpretation of measurement invariance as showing that the same things are measured in different groups.
As a side note, the estimator is formally wrong, but it should not make much difference theoretically (Robitzsch, 2020), although the proper estimator provides more liberal CFI and RMSEA estimates, which, if focused on instead of more sensitive fit indices, can increase the odds of spuriously accepting measurement invariance (there is a considerable literature on this, but it is also obvious: treating ordinal items as ordinal ought to fit ordinal data better than treating them as if they were quantitative).
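For reference, a sketch of the formally correct treatment (my addition; it is not used anywhere below): declaring the items ordered makes lavaan switch to a polychoric-based DWLS estimator with robust corrections.
cat.model <- '
FWB =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5'
cat.fit <- sem(cat.model, data, std.lv = T, ordered = c(paste0("fwb", 1:5), paste0("mlqs", 1:5)))
fitMeasures(cat.fit, c("chisq.scaled", "df.scaled", "cfi.scaled", "rmsea.scaled")) #Expect more liberal CFI/RMSEA (Robitzsch, 2020)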
First, the aggregate model, purely for interest’s sake. If we run this model in both groups, we can see that the loadings do significantly differ, so omnibus metric invariance must be rejected. This leaves room for partial metric invariance, and the location of that partial invariance can be informative about the sources of interpretive differences between our groups. Because that is one of our questions, let’s investigate where the difference in loadings appears.
GavaData <- subset(data, gavagai == 1)
LifeData <- subset(data, gavagai == 0)
GavaLife.model <- '
FWB =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5'
GavaLife.fit <- sem(GavaLife.model, data, std.lv = T)
GavaGava.fit <- sem(GavaLife.model, GavaData, std.lv = T)
LifeLife.fit <- sem(GavaLife.model, LifeData, std.lv = T)
summary(GavaLife.fit, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 25 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 21
##
## Number of observations 1500
##
## Model Test User Model:
##
## Test statistic 216.262
## Degrees of freedom 34
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 11890.034
## Degrees of freedom 45
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.985
## Tucker-Lewis Index (TLI) 0.980
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -21832.993
## Loglikelihood unrestricted model (H1) -21724.862
##
## Akaike (AIC) 43707.986
## Bayesian (BIC) 43819.564
## Sample-size adjusted Bayesian (BIC) 43752.852
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.060
## 90 Percent confidence interval - lower 0.052
## 90 Percent confidence interval - upper 0.068
## P-value RMSEA <= 0.05 0.016
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.022
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 0.931 0.032 28.738 0.000 0.931 0.689
## fwb2 1.086 0.034 31.918 0.000 1.086 0.745
## fwb3 1.218 0.036 34.294 0.000 1.218 0.784
## fwb4 1.158 0.032 35.704 0.000 1.158 0.806
## fwb5 1.062 0.035 30.618 0.000 1.062 0.722
## MLQS =~
## mlqs1 1.386 0.033 41.644 0.000 1.386 0.861
## mlqs2 1.468 0.032 46.319 0.000 1.468 0.918
## mlqs3 1.435 0.032 44.831 0.000 1.435 0.900
## mlqs4 1.506 0.032 47.745 0.000 1.506 0.933
## mlqs5 1.523 0.033 46.848 0.000 1.523 0.923
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS 0.269 0.026 10.199 0.000 0.269 0.269
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 0.960 0.041 23.653 0.000 0.960 0.525
## .fwb2 0.949 0.043 22.205 0.000 0.949 0.446
## .fwb3 0.931 0.045 20.715 0.000 0.931 0.386
## .fwb4 0.723 0.037 19.607 0.000 0.723 0.350
## .fwb5 1.034 0.045 22.859 0.000 1.034 0.478
## .mlqs1 0.668 0.028 24.194 0.000 0.668 0.258
## .mlqs2 0.405 0.019 21.389 0.000 0.405 0.158
## .mlqs3 0.481 0.021 22.589 0.000 0.481 0.189
## .mlqs4 0.336 0.017 19.747 0.000 0.336 0.129
## .mlqs5 0.400 0.019 20.848 0.000 0.400 0.147
## FWB 1.000 1.000 1.000
## MLQS 1.000 1.000 1.000
standardizedSolution(GavaGava.fit); standardizedSolution(LifeLife.fit) #Same output as the configural model
#Two-sample z test for the difference between two parameter estimates
ZREG <- function(B1, B2, SEB1, SEB2) {
Z = (B1-B2)/sqrt((SEB1)^2 + (SEB2)^2)
return(Z)}
#Standardized
"Free will Beliefs - Standardized"
## [1] "Free will Beliefs - Standardized"
(1 - pnorm(ZREG(0.665, 0.711, 0.024, 0.021))) * 2 #fwb1, *2 to be two-tailed, though one-tailed is appropriate because we know to expect bias as a result of the use of different forms, and not to expect bias for the free will beliefs questionnaire due to randomization. Values above 1 correspond to negative Z (a difference in the unexpected direction); a proper two-tailed p there is 2 * pnorm(-abs(Z))
## [1] 1.850821
(1 - pnorm(ZREG(0.774, 0.746, 0.020, 0.019))) * 2 #fwb2
## [1] 0.3101062
(1 - pnorm(ZREG(0.789, 0.779, 0.018, 0.018))) * 2 #fwb3
## [1] 0.6944398
(1 - pnorm(ZREG(0.803, 0.808, 0.017, 0.016))) * 2 #fwb4
## [1] 1.169591
(1 - pnorm(ZREG(0.726, 0.719, 0.021, 0.020))) * 2 #fwb5
## [1] 0.8092611
"Meaning of Life/Gavagai - Standardized"
## [1] "Meaning of Life/Gavagai - Standardized"
(1 - pnorm(ZREG(0.909, 0.811, 0.007, 0.013))) * 2 #mlqs1
## [1] 3.192691e-11
(1 - pnorm(ZREG(0.937, 0.896, 0.005, 0.008))) * 2 #mlqs2
## [1] 1.386481e-05
(1 - pnorm(ZREG(0.944, 0.860, 0.005, 0.010))) * 2 #mlqs3
## [1] 5.77316e-14
(1 - pnorm(ZREG(0.941, 0.925, 0.005, 0.007))) * 2 #mlqs4
## [1] 0.06289087
(1 - pnorm(ZREG(0.941, 0.912, 0.005, 0.007))) * 2 #mlqs5
## [1] 0.0007484652
#Unstandardized - from the configural model below, which had to be identified by standardizing the latent variances to avoid hiding bias in a constrained marker indicator; the subsequent, overidentified models could then be unstandardized, allowing formal tests of loading equality between groups
"Free will Beliefs - Unstandardized"
## [1] "Free will Beliefs - Unstandardized"
(1 - pnorm(ZREG(.981, .878, .045, .046))) * 2
## [1] 0.1094641
(1 - pnorm(ZREG(1.087, 1.086, .047, .049))) * 2
## [1] 0.988249
(1 - pnorm(ZREG(1.211, 1.224, .049, .051))) * 2
## [1] 1.145838
(1 - pnorm(ZREG(1.144, 1.172, .044, .048))) * 2
## [1] 1.332809
(1 - pnorm(ZREG(1.053, 1.071, .048, .050))) * 2
## [1] 1.204904
"Meaning of Life/Gavagai - Unstandardized"
## [1] "Meaning of Life/Gavagai - Unstandardized"
(1 - pnorm(ZREG(1.344, 1.273, .049, .040))) * 2
## [1] 0.2616634
(1 - pnorm(ZREG(1.536, 1.268, .048, .038))) * 2
## [1] 1.199993e-05
(1 - pnorm(ZREG(1.443, 1.289, .048, .038))) * 2
## [1] 0.01188703
(1 - pnorm(ZREG(1.605, 1.289, .047, .038))) * 2
## [1] 1.710563e-07
(1 - pnorm(ZREG(1.621, 1.339, .049, .040))) * 2
## [1] 8.262986e-06
Consistent with the randomization procedure and the use of different tests, none of the loadings for the Free Will Belief questions significantly differed, but four of the loadings for the Meaning of Life/Gavagai questions did (three of them probably meaningfully). The simple conclusion is that the omnibus rejection of metric invariance must concern that questionnaire, and not the common one. Now, before moving on to actual testing of measurement invariance, it should be noted that, when testing partial models, the parameter freed at each step is the most noninvariant parameter as indicated by \(\chi^2\), and the search for partial models stops when the difference with the prior stage of measurement invariance is nonsignificant. Importantly, model selection biases subsequent p values: we can arrive at a model in which, say, some loadings known to be noninvariant appear acceptably invariant, even though inspection of the initial model parameters tells us this cannot be the case. With that said, we move on to fitting all of the non-partial measurement invariance models, in order of increasing restrictiveness.
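As an aside, the hunt for the most noninvariant parameter can be automated: lavaan’s score test ranks every equality constraint by the expected \(\chi^2\) improvement from releasing it (a sketch, using the metric model GavaLifeM fitted just below).
lavTestScore(GavaLifeM)$uni #Univariate score tests; the largest statistic marks the most noninvariant constrained parameter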
GavaLifeC <- sem(GavaLife.model, data, std.lv = T, group = "gavagai")
GavaLifeM <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = "loadings")
GavaLifeS <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts"))
GavaLifeR <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals"))
GavaLifeLV <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"))
GavaLifeLC <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"))
GavaLifeMN <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances", "means"))
round(cbind(Configural = fitMeasures(GavaLifeC, FITM),
Metric = fitMeasures(GavaLifeM, FITM),
Scalar = fitMeasures(GavaLifeS, FITM),
Strict = fitMeasures(GavaLifeR, FITM),
LV.Vars = fitMeasures(GavaLifeLV, FITM),
LV.Covars = fitMeasures(GavaLifeLC, FITM),
Means = fitMeasures(GavaLifeMN, FITM)), 3)
## Configural Metric Scalar Strict LV.Vars LV.Covars
## chisq 272.483 302.135 359.628 1039.201 1063.359 1063.360
## df 68.000 76.000 84.000 94.000 96.000 97.000
## npar 62.000 54.000 46.000 36.000 34.000 33.000
## cfi 0.983 0.981 0.977 0.922 0.920 0.920
## rmsea 0.063 0.063 0.066 0.116 0.116 0.115
## rmsea.ci.lower 0.056 0.056 0.059 0.109 0.110 0.109
## rmsea.ci.upper 0.071 0.071 0.073 0.122 0.122 0.122
## aic 42887.227 42900.879 42942.372 43601.945 43622.103 43620.104
## bic 43216.647 43187.793 43186.780 43793.221 43802.753 43795.441
## Means
## chisq 1175.242
## df 99.000
## npar 31.000
## cfi 0.911
## rmsea 0.120
## rmsea.ci.lower 0.114
## rmsea.ci.upper 0.127
## aic 43727.986
## bic 43892.696
summary(GavaLifeC, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 35 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 62
##
## Number of observations per group:
## 0 782
## 1 718
##
## Model Test User Model:
##
## Test statistic 272.483
## Degrees of freedom 68
## P-value (Chi-square) 0.000
## Test statistic for each group:
## 0 143.593
## 1 128.891
##
## Model Test Baseline Model:
##
## Test statistic 12200.118
## Degrees of freedom 90
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.983
## Tucker-Lewis Index (TLI) 0.978
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -21381.614
## Loglikelihood unrestricted model (H1) -21245.372
##
## Akaike (AIC) 42887.227
## Bayesian (BIC) 43216.647
## Sample-size adjusted Bayesian (BIC) 43019.690
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.063
## 90 Percent confidence interval - lower 0.056
## 90 Percent confidence interval - upper 0.071
## P-value RMSEA <= 0.05 0.003
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.026
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
##
## Group 1 [0]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 0.981 0.045 21.658 0.000 0.981 0.711
## fwb2 1.087 0.047 23.124 0.000 1.087 0.746
## fwb3 1.211 0.049 24.566 0.000 1.211 0.779
## fwb4 1.144 0.044 25.908 0.000 1.144 0.808
## fwb5 1.053 0.048 22.007 0.000 1.053 0.719
## MLQS =~
## mlqs1 1.344 0.049 27.279 0.000 1.344 0.811
## mlqs2 1.536 0.048 32.005 0.000 1.536 0.896
## mlqs3 1.443 0.048 29.901 0.000 1.443 0.860
## mlqs4 1.605 0.047 33.798 0.000 1.605 0.925
## mlqs5 1.621 0.049 32.974 0.000 1.621 0.912
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS 0.302 0.036 8.385 0.000 0.302 0.302
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 5.393 0.049 109.303 0.000 5.393 3.909
## .fwb2 5.321 0.052 102.114 0.000 5.321 3.652
## .fwb3 4.934 0.056 88.683 0.000 4.934 3.171
## .fwb4 5.304 0.051 104.767 0.000 5.304 3.746
## .fwb5 5.042 0.052 96.325 0.000 5.042 3.445
## .mlqs1 5.019 0.059 84.714 0.000 5.019 3.029
## .mlqs2 4.825 0.061 78.724 0.000 4.825 2.815
## .mlqs3 4.908 0.060 81.838 0.000 4.908 2.927
## .mlqs4 4.763 0.062 76.755 0.000 4.763 2.745
## .mlqs5 4.717 0.064 74.219 0.000 4.717 2.654
## FWB 0.000 0.000 0.000
## MLQS 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 0.942 0.056 16.766 0.000 0.942 0.495
## .fwb2 0.943 0.059 16.065 0.000 0.943 0.444
## .fwb3 0.953 0.063 15.193 0.000 0.953 0.394
## .fwb4 0.696 0.049 14.171 0.000 0.696 0.347
## .fwb5 1.034 0.062 16.613 0.000 1.034 0.483
## .mlqs1 0.939 0.053 17.820 0.000 0.939 0.342
## .mlqs2 0.578 0.037 15.549 0.000 0.578 0.197
## .mlqs3 0.731 0.043 16.862 0.000 0.731 0.260
## .mlqs4 0.435 0.032 13.614 0.000 0.435 0.144
## .mlqs5 0.532 0.036 14.632 0.000 0.532 0.168
## FWB 1.000 1.000 1.000
## MLQS 1.000 1.000 1.000
##
##
## Group 2 [1]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 0.878 0.046 18.987 0.000 0.878 0.665
## fwb2 1.086 0.049 22.031 0.000 1.086 0.744
## fwb3 1.224 0.051 23.916 0.000 1.224 0.789
## fwb4 1.172 0.048 24.554 0.000 1.172 0.803
## fwb5 1.071 0.050 21.310 0.000 1.071 0.726
## MLQS =~
## mlqs1 1.273 0.040 31.652 0.000 1.273 0.909
## mlqs2 1.268 0.038 33.401 0.000 1.268 0.937
## mlqs3 1.289 0.038 33.832 0.000 1.289 0.944
## mlqs4 1.289 0.038 33.611 0.000 1.289 0.941
## mlqs5 1.339 0.040 33.658 0.000 1.339 0.941
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS 0.257 0.038 6.743 0.000 0.257 0.257
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 5.467 0.049 111.074 0.000 5.467 4.145
## .fwb2 5.344 0.055 98.030 0.000 5.344 3.658
## .fwb3 4.953 0.058 85.534 0.000 4.953 3.192
## .fwb4 5.312 0.054 97.567 0.000 5.312 3.641
## .fwb5 5.106 0.055 92.700 0.000 5.106 3.460
## .mlqs1 4.084 0.052 78.166 0.000 4.084 2.917
## .mlqs2 4.045 0.050 80.095 0.000 4.045 2.989
## .mlqs3 4.057 0.051 79.617 0.000 4.057 2.971
## .mlqs4 4.026 0.051 78.727 0.000 4.026 2.938
## .mlqs5 4.071 0.053 76.673 0.000 4.071 2.861
## FWB 0.000 0.000 0.000
## MLQS 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 0.969 0.058 16.653 0.000 0.969 0.557
## .fwb2 0.953 0.062 15.335 0.000 0.953 0.447
## .fwb3 0.910 0.064 14.112 0.000 0.910 0.378
## .fwb4 0.754 0.055 13.599 0.000 0.754 0.354
## .fwb5 1.030 0.066 15.706 0.000 1.030 0.473
## .mlqs1 0.339 0.021 16.367 0.000 0.339 0.173
## .mlqs2 0.222 0.015 14.997 0.000 0.222 0.121
## .mlqs3 0.203 0.014 14.471 0.000 0.203 0.109
## .mlqs4 0.216 0.015 14.755 0.000 0.216 0.115
## .mlqs5 0.230 0.016 14.697 0.000 0.230 0.114
## FWB 1.000 1.000 1.000
## MLQS 1.000 1.000 1.000
1 - pchisq(302.135 - 272.483, 8, lower.tail = T) #Metric vs Configural
## [1] 0.0002435004
1 - pchisq(359.628 - 302.135, 8, lower.tail = T) #Scalar vs Metric
## [1] 1.442822e-09
1 - pchisq(1039.201 - 359.628, 10, lower.tail = T) #Strict vs Scalar
## [1] 0
1 - pchisq(1063.359 - 1039.201, 2, lower.tail = T) #Homogeneous Latent Variances vs Strict
## [1] 5.677498e-06
1 - pchisq(1063.360 - 1063.359, 1, lower.tail = T) #Equal Latent Covariances vs Homogeneous Latent Variances
## [1] 0.9747729
1 - pchisq(1175.242 - 1063.360, 2, lower.tail = T) #Homogeneous Means vs Equal Latent Covariances
## [1] 0
Every stage of measurement invariance testing yielded noninvariance, except the stage in which the equality of the latent covariances was tested. This makes sense, because in the Protzko (2022) paper, the Life group and the Gavagai group had latent covariances of 0.31 (0.22 - 0.42) and 0.22 (0.13 - 0.32), respectively. The difference between those values is clearly not significant (p = 0.10), so this stage of testing should not have yielded any evidence of noninvariance. Everything thus far is consistent. Note the largest source of misfit: the residual variances. This proves, definitively, that at least some of the causes of scores are not shared between the groups. The neglect of this stage - which is absolutely crucial to the conclusion that something measures the same thing in different groups - is what explains Protzko’s result.
The partial models follow.
GavaLifeC <- sem(GavaLife.model, data, std.lv = T, group = "gavagai")
GavaLifeM <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = "loadings")
GavaLifeMP <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = "loadings", group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2"))
GavaLifeS <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2"))
GavaLifeSP <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1"))
GavaLifeR <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1"))
GavaLifeRP <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeLV <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeLC <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeMN <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances", "means"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeMN1.model <- '
FWB =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5
FWB ~ c(0,0)*1'
GavaLifeMN2.model <- '
FWB =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5
MLQS ~ c(0,0)*1'
GavaLifeMNP1 <- sem(GavaLifeMN1.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
GavaLifeMNP2 <- sem(GavaLifeMN2.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"), group.partial = c("MLQS =~ mlqs4", "MLQS =~ mlqs5", "MLQS =~ mlqs2", "mlqs5 ~ 1", "mlqs1 ~ 1", "mlqs3 ~~ mlqs3", "mlqs1 ~~ mlqs1", "mlqs2 ~~ mlqs2", "mlqs5 ~~ mlqs5", "mlqs4 ~~ mlqs4"))
round(cbind(Configural = fitMeasures(GavaLifeC, FITM),
Metric = fitMeasures(GavaLifeM, FITM),
P_Metric = fitMeasures(GavaLifeMP, FITM),
Scalar = fitMeasures(GavaLifeS, FITM),
P_Scalar = fitMeasures(GavaLifeSP, FITM),
Strict = fitMeasures(GavaLifeR, FITM),
P_Strict = fitMeasures(GavaLifeRP, FITM),
LV.Vars = fitMeasures(GavaLifeLV, FITM),
LV.Covars = fitMeasures(GavaLifeLC, FITM),
Means = fitMeasures(GavaLifeMN, FITM),
P_Means1 = fitMeasures(GavaLifeMNP1, FITM),
P_Means2 = fitMeasures(GavaLifeMNP2, FITM)), 3)
## Configural Metric P_Metric Scalar P_Scalar Strict
## chisq 272.483 302.135 278.377 323.664 285.213 965.745
## df 68.000 76.000 73.000 81.000 79.000 89.000
## npar 62.000 54.000 57.000 49.000 51.000 41.000
## cfi 0.983 0.981 0.983 0.980 0.983 0.928
## rmsea 0.063 0.063 0.061 0.063 0.059 0.115
## rmsea.ci.lower 0.056 0.056 0.054 0.056 0.052 0.108
## rmsea.ci.upper 0.071 0.071 0.069 0.070 0.066 0.121
## aic 42887.227 42900.879 42883.121 42912.408 42877.957 43538.489
## bic 43216.647 43187.793 43185.975 43172.756 43148.931 43756.331
## P_Strict LV.Vars LV.Covars Means P_Means1 P_Means2
## chisq 286.555 290.720 291.036 413.526 291.264 403.427
## df 84.000 86.000 87.000 89.000 88.000 88.000
## npar 46.000 44.000 43.000 41.000 42.000 42.000
## cfi 0.983 0.983 0.983 0.973 0.983 0.974
## rmsea 0.057 0.056 0.056 0.070 0.055 0.069
## rmsea.ci.lower 0.050 0.049 0.049 0.063 0.049 0.062
## rmsea.ci.upper 0.064 0.064 0.063 0.077 0.063 0.076
## aic 42869.299 42869.464 42867.780 42986.270 42866.008 42978.171
## bic 43113.707 43103.245 43096.249 43204.112 43089.163 43201.326
"Metric"
## [1] "Metric"
1 - pchisq(fitMeasures(GavaLifeM, "chisq") - fitMeasures(GavaLifeC, "chisq"), fitMeasures(GavaLifeM, "df") - fitMeasures(GavaLifeC, "df"), lower.tail = T)
## chisq
## 0
1 - pchisq(fitMeasures(GavaLifeMP, "chisq") - fitMeasures(GavaLifeC, "chisq"), fitMeasures(GavaLifeMP, "df") - fitMeasures(GavaLifeC, "df"), lower.tail = T)
## chisq
## 0.317
"Scalar"
## [1] "Scalar"
1 - pchisq(fitMeasures(GavaLifeS, "chisq") - fitMeasures(GavaLifeMP, "chisq"), fitMeasures(GavaLifeS, "df") - fitMeasures(GavaLifeMP, "df"), lower.tail = T)
## chisq
## 0
1 - pchisq(fitMeasures(GavaLifeSP, "chisq") - fitMeasures(GavaLifeMP, "chisq"), fitMeasures(GavaLifeSP, "df") - fitMeasures(GavaLifeMP, "df"), lower.tail = T)
## chisq
## 0.336
"Strict"
## [1] "Strict"
1 - pchisq(fitMeasures(GavaLifeR, "chisq") - fitMeasures(GavaLifeSP, "chisq"), fitMeasures(GavaLifeR, "df") - fitMeasures(GavaLifeSP, "df"), lower.tail = T)
## chisq
## 0
1 - pchisq(fitMeasures(GavaLifeRP, "chisq") - fitMeasures(GavaLifeSP, "chisq"), fitMeasures(GavaLifeRP, "df") - fitMeasures(GavaLifeSP, "df"), lower.tail = T)
## chisq
## 0.931
"Latent Variance Homogeneity"
## [1] "Latent Variance Homogeneity"
1 - pchisq(fitMeasures(GavaLifeLV, "chisq") - fitMeasures(GavaLifeRP, "chisq"), fitMeasures(GavaLifeLV, "df") - fitMeasures(GavaLifeRP, "df"), lower.tail = T)
## chisq
## 0.125
"Latent Covariance Homogeneity"
## [1] "Latent Covariance Homogeneity"
1 - pchisq(fitMeasures(GavaLifeLC, "chisq") - fitMeasures(GavaLifeLV, "chisq"), fitMeasures(GavaLifeLC, "df") - fitMeasures(GavaLifeLV, "df"), lower.tail = T)
## chisq
## 0.574
"Means"
## [1] "Means"
1 - pchisq(fitMeasures(GavaLifeMN, "chisq") - fitMeasures(GavaLifeLC, "chisq"), fitMeasures(GavaLifeMN, "df") - fitMeasures(GavaLifeLC, "df"), lower.tail = T)
## chisq
## 0
1 - pchisq(fitMeasures(GavaLifeMNP1, "chisq") - fitMeasures(GavaLifeLC, "chisq"), fitMeasures(GavaLifeMNP1, "df") - fitMeasures(GavaLifeLC, "df"), lower.tail = T)
## chisq
## 0.633
1 - pchisq(fitMeasures(GavaLifeMNP2, "chisq") - fitMeasures(GavaLifeLC, "chisq"), fitMeasures(GavaLifeMNP2, "df") - fitMeasures(GavaLifeLC, "df"), lower.tail = T)
## chisq
## 0
Just as before, the largest misstep is clearly the omission of the test of whether the influences on the indicator variables were identical (i.e., strict invariance, error variance homogeneity, etc.). Because that is the critically omitted part of the modeling in Protzko (2022), and because it provides the clearest evidence that the Gavagai/Life questionnaires did not measure the same things, it needs to be reiterated. Below, I have created a table that displays the steps of partial strict model fitting; the residual variance freed at each stage (i.e., the largest \(\chi^2\) improvement) is bolded. This table shows that, at each stage, the most noninvariant parameters were Life/Gavagai parameters. In no case did the Belief in Free Will items yield significant differences. The influences on the Belief in Free Will items were common to both groups, but the influences on Life/Gavagai were clearly not. Even CFI and RMSEA showed this by the lax standards used in Protzko (2022), but since this model step - which is crucial to the estimand of that paper - was skipped, it could not even have been noticed.
Variable | \(\chi^2_1\) | \(\chi^2_2\) | \(\chi^2_3\) | \(\chi^2_4\) | \(\chi^2_5\) |
---|---|---|---|---|---|
fwb1 | 965.743 | 710.619 | 552.225 | 409.111 | 332.379 |
fwb2 | 965.725 | 710.603 | 552.209 | 409.092 | 332.362 |
fwb3 | 965.626 | 710.506 | 552.116 | 409.000 | 332.266 |
fwb4 | 964.533 | 709.402 | 551.014 | 407.895 | 331.161 |
fwb5 | 965.731 | 710.608 | 552.215 | 409.101 | 332.369 |
mlqs1 | 793.560 | **552.227** | NA | NA | NA |
mlqs2 | 807.884 | 583.288 | **409.113** | NA | NA |
mlqs3 | **710.620** | NA | NA | NA | NA |
mlqs4 | 857.923 | 634.909 | 488.060 | 358.758 | **286.555** |
mlqs5 | 829.249 | 605.429 | 455.911 | **332.381** | NA |
In the metric stage, three loadings had to be freed for the model to fit again. These were all Gavagai/Life questions, the order in which they had to be freed was consistent with the p-values of their between-group differences, and only the two most easily confused questions could remain constrained in a fitting model (“I am looking for something that makes my life feel meaningful/gavagai” and “I am always searching for something that makes my life feel meaningful/gavagai”). In the scalar stage, two intercepts had to be freed for the model to fit again; these, too, were Gavagai/Life questions. Almost all of the meanings were found to significantly differ, but some level interpretations were also found to differ independently. Level interpretations alone differing resembles linguistic bias, which tends to fall on the intercepts for people with limited but present foreign language experience (e.g., Wicherts & Dolan, 2010). The part of these questions that stayed unbiased in interpretation and level may owe its invariance to the word “always”, because few people are “always” searching for anything if they’re honest with themselves.

In the strict phase, as mentioned and shown above, the Free Will Belief residual variances were indistinguishable between groups, but the same was not true for Gavagai/Life, all of which differed significantly, indicating different sets of influences on scores in the Gavagai and Life groups. The latent variances were equal, indicating that the constructs for Gavagai and Life were equally broad once the residual variances and the differences in meaning from previous stages were allowed into the model. This is sensible given that the noninvariant loadings were freed and the other two questions did not differ interpretively, so the variance should have been equal at this point, although freeing too many of the parameters used to estimate it would have meant that it could not be properly estimated (and this would be biased towards equality between groups). Luckily, there were two untouched loadings, which is sufficient to estimate the latent variance. The latent covariances again did not differ, which suggests that Gavagai is interpreted as consistently as Life is in these data - it has reliable variance and cannot by any means be called a “nothing” construct; it is just highly, consistently personal, perhaps like the other measure. That may also have been driven by response bias leading to overly homogeneous responding, which could happen with odd questionnaires: people may not know how to answer, so they just tick similar boxes. The mean-constrained model fit considerably worse, and testing between different models made clear that the only mean that differed significantly was the mean for Gavagai/Life, not Belief in Free Will.
The parameters for the most-constrained model are below.
summary(GavaLifeMNP1, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 73 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 63
## Number of equality constraints 21
##
## Number of observations per group:
## 0 782
## 1 718
##
## Model Test User Model:
##
## Test statistic 291.264
## Degrees of freedom 88
## P-value (Chi-square) 0.000
## Test statistic for each group:
## 0 153.764
## 1 137.500
##
## Model Test Baseline Model:
##
## Test statistic 12200.118
## Degrees of freedom 90
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.983
## Tucker-Lewis Index (TLI) 0.983
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -21391.004
## Loglikelihood unrestricted model (H1) -21245.372
##
## Akaike (AIC) 42866.008
## Bayesian (BIC) 43089.163
## Sample-size adjusted Bayesian (BIC) 42955.741
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.055
## 90 Percent confidence interval - lower 0.049
## 90 Percent confidence interval - upper 0.063
## P-value RMSEA <= 0.05 0.096
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.044
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
##
## Group 1 [0]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 1.000 0.931 0.689
## fwb2 (.p2.) 1.166 0.046 25.419 0.000 1.086 0.744
## fwb3 (.p3.) 1.308 0.049 26.536 0.000 1.218 0.784
## fwb4 (.p4.) 1.243 0.046 27.118 0.000 1.158 0.806
## fwb5 (.p5.) 1.141 0.046 24.773 0.000 1.062 0.723
## MLQS =~
## mlqs1 1.000 1.314 0.805
## mlqs2 1.134 0.031 36.930 0.000 1.490 0.891
## mlqs3 (.p8.) 1.033 0.019 55.491 0.000 1.357 0.844
## mlqs4 1.186 0.030 39.923 0.000 1.558 0.921
## mlqs5 1.197 0.031 38.561 0.000 1.573 0.908
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS (.24.) 0.343 0.037 9.214 0.000 0.280 0.280
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB 0.000 0.000 0.000
## .fwb1 (.25.) 5.430 0.035 155.664 0.000 5.430 4.018
## .fwb2 (.26.) 5.334 0.038 141.653 0.000 5.334 3.656
## .fwb3 (.27.) 4.945 0.040 123.312 0.000 4.945 3.183
## .fwb4 (.28.) 5.310 0.037 143.219 0.000 5.310 3.696
## .fwb5 (.29.) 5.075 0.038 133.752 0.000 5.075 3.452
## .mlqs1 5.040 0.057 88.114 0.000 5.040 3.087
## .mlqs2 (.31.) 4.844 0.055 88.214 0.000 4.844 2.896
## .mlqs3 (.32.) 4.891 0.055 88.683 0.000 4.891 3.042
## .mlqs4 (.33.) 4.815 0.056 86.608 0.000 4.815 2.846
## .mlqs5 4.743 0.060 78.549 0.000 4.743 2.736
## MLQS 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 (.12.) 0.959 0.041 23.652 0.000 0.959 0.525
## .fwb2 (.13.) 0.949 0.043 22.214 0.000 0.949 0.446
## .fwb3 (.14.) 0.931 0.045 20.720 0.000 0.931 0.386
## .fwb4 (.15.) 0.724 0.037 19.631 0.000 0.724 0.351
## .fwb5 (.16.) 1.033 0.045 22.854 0.000 1.033 0.478
## .mlqs1 0.938 0.053 17.828 0.000 0.938 0.352
## .mlqs2 0.578 0.037 15.513 0.000 0.578 0.206
## .mlqs3 0.744 0.043 17.130 0.000 0.744 0.288
## .mlqs4 0.434 0.032 13.545 0.000 0.434 0.152
## .mlqs5 0.529 0.036 14.561 0.000 0.529 0.176
## FWB (.22.) 0.867 0.060 14.376 0.000 1.000 1.000
## MLQS (.23.) 1.727 0.082 21.192 0.000 1.000 1.000
##
##
## Group 2 [1]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 1.000 0.931 0.689
## fwb2 (.p2.) 1.166 0.046 25.419 0.000 1.086 0.744
## fwb3 (.p3.) 1.308 0.049 26.536 0.000 1.218 0.784
## fwb4 (.p4.) 1.243 0.046 27.118 0.000 1.158 0.806
## fwb5 (.p5.) 1.141 0.046 24.773 0.000 1.062 0.723
## MLQS =~
## mlqs1 1.000 1.314 0.914
## mlqs2 1.001 0.020 49.922 0.000 1.316 0.941
## mlqs3 (.p8.) 1.033 0.019 55.491 0.000 1.357 0.949
## mlqs4 1.009 0.020 50.321 0.000 1.326 0.943
## mlqs5 1.056 0.021 49.790 0.000 1.388 0.945
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS (.24.) 0.343 0.037 9.214 0.000 0.280 0.280
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB 0.000 0.000 0.000
## .fwb1 (.25.) 5.430 0.035 155.664 0.000 5.430 4.018
## .fwb2 (.26.) 5.334 0.038 141.653 0.000 5.334 3.656
## .fwb3 (.27.) 4.945 0.040 123.312 0.000 4.945 3.183
## .fwb4 (.28.) 5.310 0.037 143.219 0.000 5.310 3.696
## .fwb5 (.29.) 5.075 0.038 133.752 0.000 5.075 3.452
## .mlqs1 4.880 0.059 82.158 0.000 4.880 3.394
## .mlqs2 (.31.) 4.844 0.055 88.214 0.000 4.844 3.466
## .mlqs3 (.32.) 4.891 0.055 88.683 0.000 4.891 3.422
## .mlqs4 (.33.) 4.815 0.056 86.608 0.000 4.815 3.426
## .mlqs5 4.912 0.061 80.726 0.000 4.912 3.345
## MLQS -0.801 0.072 -11.099 0.000 -0.609 -0.609
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 (.12.) 0.959 0.041 23.652 0.000 0.959 0.525
## .fwb2 (.13.) 0.949 0.043 22.214 0.000 0.949 0.446
## .fwb3 (.14.) 0.931 0.045 20.720 0.000 0.931 0.386
## .fwb4 (.15.) 0.724 0.037 19.631 0.000 0.724 0.351
## .fwb5 (.16.) 1.033 0.045 22.854 0.000 1.033 0.478
## .mlqs1 0.339 0.021 16.412 0.000 0.339 0.164
## .mlqs2 0.222 0.015 15.013 0.000 0.222 0.114
## .mlqs3 0.201 0.014 14.332 0.000 0.201 0.099
## .mlqs4 0.218 0.015 14.875 0.000 0.218 0.110
## .mlqs5 0.230 0.016 14.716 0.000 0.230 0.107
## FWB (.22.) 0.867 0.060 14.376 0.000 1.000 1.000
## MLQS (.23.) 1.727 0.082 21.192 0.000 1.000 1.000
Because randomization was presumably done properly and power was quite high due to the large, nearly balanced samples, the difference between groups in model parameters can only be due to differences in test function. As such, it makes little sense to try to estimate the effect of measurement bias, as whatever residual we find between those estimates and the observed differences has to be due to error variance and, specifically, to differences in reliability. Because the residual variances were not equal between groups, we know that the amount of measurement error may differ, and other causes are excluded thanks to randomization. Notably, reliability can differ due to differences in range restriction, and these can also affect loading differences. Despite the somewhat nonsensical nature of the exercise, I decided to calculate the effect of bias on the means of indicator items and the proportion of the residual variances which could be attributed to systematic influences of the Gavagai/Life questionnaire differences. The first method is as follows:
#From Gunn, Grimm & Edwards (2019): SDI2/UDI2, the signed and unsigned expected item-score differences (in group-2 SD units) implied by noninvariant loadings and intercepts, integrated over the group-2 latent distribution
SDI2.UDI2 = function(p, loading1, loading2, intercept1, intercept2, stdev2,
fmean2, fsd2, nodewidth = 0.01, lowerLV = -5, upperLV = 5) {
LV = seq(lowerLV,upperLV,nodewidth)
DiffExpItem12 = matrix(NA,length(LV),p)
pdfLV2 = matrix(NA,length(LV),1)
SDI2numerator = matrix(NA,length(LV),p)
UDI2numerator = matrix(NA,length(LV),p)
SDI2 = matrix(NA,p,1)
UDI2 = matrix(NA,p,1)
for(j in 1:p){
for(k in 1:length(LV)){
DiffExpItem12[k,j] <- (intercept1[j]-intercept2[j]) + (loading1[j]-loading2[j])*LV[k]
pdfLV2[k] = dnorm(LV[k], mean=fmean2, sd=fsd2)
SDI2numerator[k,j] = DiffExpItem12[k,j]*pdfLV2[k]*nodewidth
UDI2numerator[k,j] = abs(SDI2numerator[k,j])
}
SDI2[j] <- sum(SDI2numerator[,j])/stdev2[j]
UDI2[j] <- sum(UDI2numerator[,j])/stdev2[j]
}
colnames(SDI2) = "SDI2"
colnames(UDI2) = "UDI2"
return(list(SDI2=round(SDI2,4), UDI2=round(UDI2,4)))}
data %>% group_by(gavagai) %>% summarise_at(vars(mlqs1, mlqs2, mlqs3, mlqs4, mlqs5), list(mean = mean, sd = sd), na.rm = T)
#Parameters for Meaning of Life/Gavagai
MGp = 5; MGl1 = c(0.805, 0.891, 0, 0.921, 0.908); MGl2 = c(0.914, 0.941, 0, 0.943, 0.945); MGi1 = c(5.040, 0, 0, 0, 4.743); MGi2 = c(4.880, 0, 0, 0, 4.912); MGsd = c(1.657912, 1.714963, 1.678123, 1.736567, 1.778569); MGfm = 0.609; MGfsd = 1
MGS <- SDI2.UDI2(MGp, MGl1, MGl2, MGi1, MGi2, MGsd, MGfm, MGfsd)
SUDI <- function(SDI, UDI){
D <- ifelse(SDI < 0, -1, 1)
H = D * UDI
return(H)}
SUDI(MGS$SDI2, MGS$UDI2)
## SDI2
## [1,] 0.0707
## [2,] -0.0274
## [3,] 0.0000
## [4,] -0.0119
## [5,] -0.1077
qsd1 <- cohen.d(data$mlqs1, data$gavagai); qsd1$hedges.g
## data
## -0.6074539
qsd2 <- cohen.d(data$mlqs2, data$gavagai); qsd2$hedges.g
## data
## -0.5024998
qsd3 <- cohen.d(data$mlqs3, data$gavagai); qsd3$hedges.g
## data
## -0.5536249
qsd4 <- cohen.d(data$mlqs4, data$gavagai); qsd4$hedges.g
## data
## -0.468687
qsd5 <- cohen.d(data$mlqs5, data$gavagai); qsd5$hedges.g
## data
## -0.3993656
The effect of bias on the means is modest, but present. The second method, which quantifies the proportion of the residual variance differences attributable to systematic bias, is as follows:
#Proportion of the difference in observed variances attributable to the difference in residual (unique) variances
MOES <- function(T1, T2, S1, S2) {
ES = (T1 - T2) / (S1^2 - S2^2)
return(ES)}
qsv1 <- MOES(.938, .339, 1.657912, 1.400830)
qsv2 <- MOES(.578, .222, 1.714963, 1.354044)
qsv3 <- MOES(.744, .201, 1.678123, 1.366392)
qsv4 <- MOES(.434, .218, 1.736567, 1.371404)
qsv5 <- MOES(.529, .230, 1.778569, 1.423737)
print(paste("mlqs1 proportion =", qsv1, "mlqs2 = ", qsv2, "mlqs3 = ", qsv3, "mlqs4 = ", qsv4, "mlqs5 = ", qsv5))
## [1] "mlqs1 proportion = 0.761749724822151 mlqs2 = 0.321397410682556 mlqs3 = 0.572139218935405 mlqs4 = 0.190322453189071 mlqs5 = 0.263139218246245"
So the remaining 24%, 68%, 43%, 81%, and 74% of the residual variance differences would be attributable to differences in reliability.
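As a cross-check on that reliability interpretation, per-group \(\omega\) for the Life/Gavagai items can be computed straight from the configural model’s standardized loadings (my sketch; these values appear nowhere in Protzko, 2022).
STDC <- standardizedSolution(GavaLifeC)
for (g in 1:2) {
l <- subset(STDC, op == "=~" & lhs == "MLQS" & group == g)$est.std
print(sum(l)^2 / (sum(l)^2 + sum(1 - l^2)))} #Omega per group; a gap corroborates unequal reliability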
Protzko (2022) was an interesting paper whose data delivered a resoundingly clear conclusion: measurement invariance, done properly, shows us that a scale measures the same thing for different groups. The experimental nature of the Protzko (2022) data brilliantly confirms that. However, it also provides considerable room for caution, because it is apparent that certain fit indices are insensitive at some steps of measurement invariance testing, and because there are still people who test measurement invariance without understanding the meaning of its various stages, or that error variance homogeneity is necessary both for achieving measurement invariance and for concluding that measurement pertains only to constructs whose influences are shared between groups. It is easy to forget that the error terms are themselves latent variables; testing for strict invariance is how we deal with them.
Edit: Sacha Epskamp has now partially confirmed these results (https://gist.github.com/SachaEpskamp/1d876d52186c6c0677f8c38871896182) and provided simulations of fit indices that show that CFI and RMSEA are not good model fit indices for comparing nested models (https://gist.github.com/SachaEpskamp/f8310d15e78c60e14e5dff347acdcee1). Their purpose should be for initial model fitting. His twitter post on this: https://twitter.com/SachaEpskamp/status/1518518277147873283.
Protzko, J. (2022). Invariance: What Does Measurement Invariance Allow us to Claim? PsyArXiv. https://doi.org/10.31234/osf.io/r8yka
DeShon, R. P. (2004). Measures are not invariant across groups without error variance homogeneity. Psychology Science, 46(1), 137–149.
Ropovik, I. (2015). A cautionary note on testing latent variable models. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01715
McNeish, D., & Wolf, M. G. (2021). Dynamic fit index cutoffs for confirmatory factor analysis models. Psychological Methods. https://doi.org/10.1037/met0000425
Robitzsch, A. (2020). Why Ordinal Variables Can (Almost) Always Be Treated as Continuous Variables: Clarifying Assumptions of Robust Continuous and Ordinal Factor Analysis Estimation Methods. Frontiers in Education, 5. https://doi.org/10.3389/feduc.2020.589965
Wicherts, J. M., & Dolan, C. V. (2010). Measurement Invariance in Confirmatory Factor Analysis: An Illustration Using IQ Test Performance of Minorities. Educational Measurement: Issues and Practice, 29(3), 39–47. https://doi.org/10.1111/j.1745-3992.2010.00182.x
Gunn, H. J., Grimm, K. J., & Edwards, M. C. (2019). Evaluation of Six Effect Size Measures of Measurement Non-Invariance for Continuous Outcomes. Structural Equation Modeling: A Multidisciplinary Journal, 27(4), 503–514. https://doi.org/10.1080/10705511.2019.1689507
A popular snowclone goes, “One man’s modus ponens is another man’s modus tollens.” The idea that the Meaning of Gavagai scale is, at the end of the day, “nonsense” is not tenable in light of the results above and the fact that the Meaning of Gavagai questionnaire is not totally nonsensical: it asks real questions with a nonsense word substituted for what would normally be there. If we want to know how much a nonsense factor correlates with a real one, or whether it can be invariant with a real one, then we need to either
Use a nonsense measurement, like the ridiculous questions posited at the beginning of this, or
Randomly generate responses and see if they are correlated/invariant with real factors.
Because the first option is not available with this data, we will have to settle for the second.
p_load(wakefield)
set.seed(1)
N = 718
Q1 <- likert_7(
N,
#x = c("Absolutely Untrue", "Mostly Untrue", "Somewhat Untrue", "Can't Say", "Somewhat True", "Mostly True", "Absolutely True"),
x = c("1", "2", "3", "4", "5", "6", "7"),
name = "Q1")
Q2 <- likert_7(
N,
x = c("1", "2", "3", "4", "5", "6", "7"),
name = "Q2")
Q3 <- likert_7(
N,
x = c("1", "2", "3", "4", "5", "6", "7"),
name = "Q3")
Q4 <- likert_7(
N,
x = c("1", "2", "3", "4", "5", "6", "7"),
name = "Q4")
Q5 <- likert_7(
N,
x = c("1", "2", "3", "4", "5", "6", "7"),
name = "Q5")
dataLife <- subset(data, gavagai == 0)
dataGav <- subset(data, gavagai == 1)
dataGav$mlqs1 <- Q1
dataGav$mlqs2 <- Q2
dataGav$mlqs3 <- Q3
dataGav$mlqs4 <- Q4
dataGav$mlqs5 <- Q5
data <- rbind(dataLife, dataGav)
GavaLifeC <- sem(GavaLife.model, data, std.lv = T, group = "gavagai")
GavaLifeM <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = "loadings")
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated lv
## variances are negative
GavaLifeS <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts"))
## Warning in lav_object_post_check(object): lavaan WARNING: some estimated lv
## variances are negative
GavaLifeR <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals"))
GavaLifeLV <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances"))
GavaLifeLC <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances"))
GavaLifeMN <- sem(GavaLife.model, data, std.lv = F, group = "gavagai", group.equal = c("loadings", "intercepts", "residuals", "lv.variances", "lv.covariances", "means"))
round(cbind(Configural = fitMeasures(GavaLifeC, FITM),
Metric = fitMeasures(GavaLifeM, FITM),
Scalar = fitMeasures(GavaLifeS, FITM),
Strict = fitMeasures(GavaLifeR, FITM),
LV.Vars = fitMeasures(GavaLifeLV, FITM),
LV.Covars = fitMeasures(GavaLifeLC, FITM),
Means = fitMeasures(GavaLifeMN, FITM)), 3)
## Configural Metric Scalar Strict LV.Vars LV.Covars
## chisq 202.205 213.169 236.951 2593.629 2844.297 2856.082
## df 68.000 76.000 84.000 94.000 96.000 97.000
## npar 62.000 54.000 46.000 36.000 34.000 33.000
## cfi 0.981 0.981 0.979 0.649 0.614 0.612
## rmsea 0.051 0.049 0.049 0.188 0.195 0.195
## rmsea.ci.lower 0.043 0.041 0.042 0.182 0.189 0.189
## rmsea.ci.upper 0.060 0.057 0.057 0.195 0.202 0.201
## aic 50436.493 50431.457 50439.239 52775.917 53022.585 53032.370
## bic 50765.912 50718.371 50683.647 52967.193 53203.234 53207.706
## Means
## chisq 3019.382
## df 99.000
## npar 31.000
## cfi 0.590
## rmsea 0.198
## rmsea.ci.lower 0.192
## rmsea.ci.upper 0.204
## aic 53191.670
## bic 53356.380
summary(GavaLifeM, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 47 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 62
## Number of equality constraints 8
##
## Number of observations per group:
## 0 782
## 1 718
##
## Model Test User Model:
##
## Test statistic 213.169
## Degrees of freedom 76
## P-value (Chi-square) 0.000
## Test statistic for each group:
## 0 145.432
## 1 67.737
##
## Model Test Baseline Model:
##
## Test statistic 7205.173
## Degrees of freedom 90
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.981
## Tucker-Lewis Index (TLI) 0.977
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -25161.728
## Loglikelihood unrestricted model (H1) NA
##
## Akaike (AIC) 50431.457
## Bayesian (BIC) 50718.371
## Sample-size adjusted Bayesian (BIC) 50546.828
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.049
## 90 Percent confidence interval - lower 0.041
## 90 Percent confidence interval - upper 0.057
## P-value RMSEA <= 0.05 0.566
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.029
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
##
## Group 1 [0]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 1.000 0.933 0.689
## fwb2 (.p2.) 1.169 0.046 25.453 0.000 1.090 0.746
## fwb3 (.p3.) 1.306 0.049 26.487 0.000 1.218 0.781
## fwb4 (.p4.) 1.243 0.046 27.118 0.000 1.160 0.814
## fwb5 (.p5.) 1.139 0.046 24.742 0.000 1.063 0.723
## MLQS =~
## mlqs1 1.000 1.344 0.811
## mlqs2 (.p7.) 1.143 0.037 30.923 0.000 1.536 0.896
## mlqs3 (.p8.) 1.073 0.037 29.020 0.000 1.442 0.860
## mlqs4 (.p9.) 1.195 0.037 32.497 0.000 1.606 0.925
## mlqs5 (.10.) 1.206 0.038 31.781 0.000 1.621 0.912
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS 0.378 0.053 7.131 0.000 0.301 0.301
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 5.393 0.048 111.467 0.000 5.393 3.986
## .fwb2 5.321 0.052 101.863 0.000 5.321 3.643
## .fwb3 4.934 0.056 88.515 0.000 4.934 3.165
## .fwb4 5.304 0.051 104.069 0.000 5.304 3.722
## .fwb5 5.042 0.053 95.954 0.000 5.042 3.431
## .mlqs1 5.019 0.059 84.719 0.000 5.019 3.030
## .mlqs2 4.825 0.061 78.723 0.000 4.825 2.815
## .mlqs3 4.908 0.060 81.857 0.000 4.908 2.927
## .mlqs4 4.763 0.062 76.739 0.000 4.763 2.744
## .mlqs5 4.717 0.064 74.224 0.000 4.717 2.654
## FWB 0.000 0.000 0.000
## MLQS 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 0.961 0.056 17.281 0.000 0.961 0.525
## .fwb2 0.946 0.058 16.289 0.000 0.946 0.443
## .fwb3 0.946 0.061 15.405 0.000 0.946 0.390
## .fwb4 0.687 0.048 14.296 0.000 0.687 0.338
## .fwb5 1.030 0.062 16.743 0.000 1.030 0.477
## .mlqs1 0.939 0.053 17.821 0.000 0.939 0.342
## .mlqs2 0.578 0.037 15.550 0.000 0.578 0.197
## .mlqs3 0.732 0.043 16.864 0.000 0.732 0.260
## .mlqs4 0.435 0.032 13.608 0.000 0.435 0.144
## .mlqs5 0.532 0.036 14.632 0.000 0.532 0.168
## FWB 0.870 0.070 12.437 0.000 1.000 1.000
## MLQS 1.806 0.132 13.640 0.000 1.000 1.000
##
##
## Group 2 [1]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 1.000 0.929 0.689
## fwb2 (.p2.) 1.169 0.046 25.453 0.000 1.086 0.746
## fwb3 (.p3.) 1.306 0.049 26.487 0.000 1.213 0.784
## fwb4 (.p4.) 1.243 0.046 27.118 0.000 1.156 0.798
## fwb5 (.p5.) 1.139 0.046 24.742 0.000 1.059 0.720
## MLQS =~
## mlqs1 1.000 NA NA
## mlqs2 (.p7.) 1.143 0.037 30.923 0.000 NA NA
## mlqs3 (.p8.) 1.073 0.037 29.020 0.000 NA NA
## mlqs4 (.p9.) 1.195 0.037 32.497 0.000 NA NA
## mlqs5 (.10.) 1.206 0.038 31.781 0.000 NA NA
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS 0.011 0.028 0.391 0.695 0.046 0.046
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 5.467 0.050 108.628 0.000 5.467 4.054
## .fwb2 5.344 0.054 98.293 0.000 5.344 3.668
## .fwb3 4.953 0.058 85.708 0.000 4.953 3.199
## .fwb4 5.312 0.054 98.321 0.000 5.312 3.669
## .fwb5 5.106 0.055 93.091 0.000 5.106 3.474
## .mlqs1 3.979 0.075 53.145 0.000 3.979 1.983
## .mlqs2 4.008 0.075 53.613 0.000 4.008 2.001
## .mlqs3 4.031 0.075 54.010 0.000 4.031 2.016
## .mlqs4 3.933 0.074 52.868 0.000 3.933 1.973
## .mlqs5 4.054 0.073 55.495 0.000 4.054 2.071
## FWB 0.000 0.000 0.000
## MLQS 0.000 NA NA
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 0.955 0.058 16.478 0.000 0.955 0.525
## .fwb2 0.943 0.061 15.507 0.000 0.943 0.444
## .fwb3 0.925 0.064 14.547 0.000 0.925 0.386
## .fwb4 0.761 0.054 14.102 0.000 0.761 0.363
## .fwb5 1.039 0.065 15.992 0.000 1.039 0.481
## .mlqs1 4.092 0.216 18.932 0.000 4.092 1.017
## .mlqs2 4.101 0.218 18.800 0.000 4.101 1.022
## .mlqs3 4.076 0.216 18.871 0.000 4.076 1.019
## .mlqs4 4.070 0.217 18.727 0.000 4.070 1.024
## .mlqs5 3.930 0.210 18.679 0.000 3.930 1.026
## FWB 0.864 0.071 12.155 0.000 1.000 1.000
## MLQS -0.067 0.035 -1.933 0.053 NA NA
summary(GavaLifeS, stand = T, fit = T)
## lavaan 0.6-9 ended normally after 73 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 64
## Number of equality constraints 18
##
## Number of observations per group:
## 0 782
## 1 718
##
## Model Test User Model:
##
## Test statistic 236.951
## Degrees of freedom 84
## P-value (Chi-square) 0.000
## Test statistic for each group:
## 0 150.279
## 1 86.672
##
## Model Test Baseline Model:
##
## Test statistic 7205.173
## Degrees of freedom 90
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.979
## Tucker-Lewis Index (TLI) 0.977
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -25173.619
## Loglikelihood unrestricted model (H1) NA
##
## Akaike (AIC) 50439.239
## Bayesian (BIC) 50683.647
## Sample-size adjusted Bayesian (BIC) 50537.518
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.049
## 90 Percent confidence interval - lower 0.042
## 90 Percent confidence interval - upper 0.057
## P-value RMSEA <= 0.05 0.551
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.032
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
##
## Group 1 [0]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 1.000 0.933 0.689
## fwb2 (.p2.) 1.168 0.046 25.457 0.000 1.090 0.746
## fwb3 (.p3.) 1.305 0.049 26.489 0.000 1.218 0.781
## fwb4 (.p4.) 1.243 0.046 27.118 0.000 1.159 0.813
## fwb5 (.p5.) 1.139 0.046 24.748 0.000 1.063 0.723
## MLQS =~
## mlqs1 1.000 1.373 0.817
## mlqs2 (.p7.) 1.118 0.035 32.138 0.000 1.535 0.896
## mlqs3 (.p8.) 1.056 0.035 30.221 0.000 1.449 0.862
## mlqs4 (.p9.) 1.167 0.035 33.756 0.000 1.603 0.924
## mlqs5 (.10.) 1.170 0.036 32.918 0.000 1.606 0.910
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS 0.387 0.054 7.150 0.000 0.302 0.302
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 (.24.) 5.415 0.043 126.504 0.000 5.415 4.001
## .fwb2 (.25.) 5.316 0.048 111.886 0.000 5.316 3.640
## .fwb3 (.26.) 4.925 0.052 95.515 0.000 4.925 3.160
## .fwb4 (.27.) 5.292 0.048 110.125 0.000 5.292 3.713
## .fwb5 (.28.) 5.057 0.047 106.934 0.000 5.057 3.441
## .mlqs1 (.29.) 4.970 0.059 84.704 0.000 4.970 2.959
## .mlqs2 (.30.) 4.828 0.061 79.522 0.000 4.828 2.819
## .mlqs3 (.31.) 4.896 0.059 82.583 0.000 4.896 2.911
## .mlqs4 (.32.) 4.768 0.062 77.282 0.000 4.768 2.750
## .mlqs5 (.33.) 4.741 0.063 75.635 0.000 4.741 2.686
## FWB 0.000 0.000 0.000
## MLQS 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 0.961 0.056 17.279 0.000 0.961 0.525
## .fwb2 0.946 0.058 16.288 0.000 0.946 0.443
## .fwb3 0.947 0.061 15.406 0.000 0.947 0.390
## .fwb4 0.687 0.048 14.301 0.000 0.687 0.338
## .fwb5 1.031 0.062 16.741 0.000 1.031 0.477
## .mlqs1 0.937 0.053 17.728 0.000 0.937 0.332
## .mlqs2 0.576 0.037 15.539 0.000 0.576 0.197
## .mlqs3 0.729 0.043 16.819 0.000 0.729 0.258
## .mlqs4 0.437 0.032 13.666 0.000 0.437 0.146
## .mlqs5 0.538 0.036 14.782 0.000 0.538 0.173
## FWB 0.870 0.070 12.438 0.000 1.000 1.000
## MLQS 1.885 0.135 13.958 0.000 1.000 1.000
##
##
## Group 2 [1]:
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB =~
## fwb1 1.000 0.930 0.689
## fwb2 (.p2.) 1.168 0.046 25.457 0.000 1.086 0.746
## fwb3 (.p3.) 1.305 0.049 26.489 0.000 1.213 0.784
## fwb4 (.p4.) 1.243 0.046 27.118 0.000 1.155 0.798
## fwb5 (.p5.) 1.139 0.046 24.748 0.000 1.059 0.721
## MLQS =~
## mlqs1 1.000 NA NA
## mlqs2 (.p7.) 1.118 0.035 32.138 0.000 NA NA
## mlqs3 (.p8.) 1.056 0.035 30.221 0.000 NA NA
## mlqs4 (.p9.) 1.167 0.035 33.756 0.000 NA NA
## mlqs5 (.10.) 1.170 0.036 32.918 0.000 NA NA
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## FWB ~~
## MLQS 0.011 0.029 0.398 0.691 0.045 0.045
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 (.24.) 5.415 0.043 126.504 0.000 5.415 4.014
## .fwb2 (.25.) 5.316 0.048 111.886 0.000 5.316 3.649
## .fwb3 (.26.) 4.925 0.052 95.515 0.000 4.925 3.181
## .fwb4 (.27.) 5.292 0.048 110.125 0.000 5.292 3.656
## .fwb5 (.28.) 5.057 0.047 106.934 0.000 5.057 3.440
## .mlqs1 (.29.) 4.970 0.059 84.704 0.000 4.970 2.460
## .mlqs2 (.30.) 4.828 0.061 79.522 0.000 4.828 2.409
## .mlqs3 (.31.) 4.896 0.059 82.583 0.000 4.896 2.447
## .mlqs4 (.32.) 4.768 0.062 77.282 0.000 4.768 2.391
## .mlqs5 (.33.) 4.741 0.063 75.635 0.000 4.741 2.411
## FWB 0.028 0.052 0.543 0.587 0.030 0.030
## MLQS -0.753 0.061 -12.441 0.000 NA NA
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .fwb1 0.955 0.058 16.477 0.000 0.955 0.525
## .fwb2 0.943 0.061 15.505 0.000 0.943 0.444
## .fwb3 0.925 0.064 14.548 0.000 0.925 0.386
## .fwb4 0.761 0.054 14.107 0.000 0.761 0.363
## .fwb5 1.039 0.065 15.989 0.000 1.039 0.481
## .mlqs1 4.154 0.219 18.936 0.000 4.154 1.018
## .mlqs2 4.107 0.218 18.819 0.000 4.107 1.023
## .mlqs3 4.087 0.216 18.882 0.000 4.087 1.021
## .mlqs4 4.077 0.217 18.749 0.000 4.077 1.025
## .mlqs5 3.970 0.212 18.722 0.000 3.970 1.026
## FWB 0.864 0.071 12.156 0.000 1.000 1.000
## MLQS -0.074 0.036 -2.038 0.042 NA NA
If you fit these models to randomly generated data, you could potentially fool yourself into thinking measurement invariance held, but only by ignoring problems like the negative latent variance (a Heywood case) that invalidates the model outright. Importantly, this genuinely nonsensical factor, composed of randomly generated data, does not correlate with the free will beliefs questionnaire. That contrast illustrates quite clearly that the Gavagai questionnaire is not nonsense to the people answering it. It uses a nonsense word, but it produces a valid factor, and, in this scenario at least, its predictive validity is comparable to that of a more cognitively meaningful questionnaire about the meaning of life.
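The same pathology is easy to reproduce. Here is a minimal sketch, not the simulation used above: it fits a one-factor model to five items of pure noise (the names ns1 through ns5 and NoiseFit are hypothetical) and inspects the latent variance, which should hover around zero and can go negative, just as it did for the random-data group.

library(lavaan)

set.seed(1)
n <- 750
# Five "items" of pure noise on a 1-7 response scale; there is no common cause.
noise <- as.data.frame(matrix(sample(1:7, n * 5, replace = TRUE), ncol = 5))
names(noise) <- paste0("ns", 1:5)

# One-factor CFA; lavaan may warn about improper (negative) variance estimates,
# which is exactly the tell that the "factor" is vacuous.
NoiseFit <- cfa("NS =~ ns1 + ns2 + ns3 + ns4 + ns5", data = noise)

inspect(NoiseFit, "est")$psi                      # latent variance near or below zero
fitMeasures(NoiseFit, c("chisq", "df", "rmsea"))  # "good" fit despite measuring nothing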
Whether different questionnaires measure different factors is an empirical question, not one settled by how the questionnaires were constructed. We could not ensure a priori that the Meaning of Life and Gavagai questionnaires measured wholly different things, so attaining measurement invariance between them would not, by itself, have assessed the theoretical utility of invariance testing: the question of whether the factors really differed would still have been on the table. On the other hand, we could, and did, find that they measured different things; we merely had to run the right tests. This asymmetry is at the core of why this experiment, as conducted, could only show that measurement invariance testing works, not that it does not.
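For concreteness, here is a minimal sketch of one such test, assuming a data frame d holding the fwb and mlqs items plus a group indicator grp (all of these names are hypothetical, not the code used above): after imposing invariance constraints, equate the two groups' factor covariance with FWB and ask whether that costs fit. If the two questionnaires tapped the same factor, equating its external relation should be free; a significant chi-square difference says the factors behave differently.

library(lavaan)

# Free model: the FWB-MLQS covariance gets its own label in each group.
FreeModel <- "
  FWB  =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
  MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5
  FWB ~~ c(r1, r2) * MLQS
"
# Constrained model: one shared label equates the covariance across groups.
EqualModel <- "
  FWB  =~ fwb1 + fwb2 + fwb3 + fwb4 + fwb5
  MLQS =~ mlqs1 + mlqs2 + mlqs3 + mlqs4 + mlqs5
  FWB ~~ c(r, r) * MLQS
"

FreeFit  <- cfa(FreeModel,  data = d, group = "grp",
                group.equal = c("loadings", "intercepts"))
EqualFit <- cfa(EqualModel, data = d, group = "grp",
                group.equal = c("loadings", "intercepts"))

# Likelihood ratio test of the equality constraint.
lavTestLRT(FreeFit, EqualFit)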
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] wakefield_0.3.6 lavaan_0.6-9 pacman_0.5.1
## [4] kirkegaard_2022-04-12 rlang_0.4.12 metafor_3.0-2
## [7] Matrix_1.3-4 psych_2.1.9 magrittr_2.0.1
## [10] assertthat_0.2.1 weights_1.0.4 Hmisc_4.7-0
## [13] Formula_1.2-4 survival_3.2-13 lattice_0.20-45
## [16] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7
## [19] purrr_0.3.4 readr_2.0.2 tidyr_1.1.4
## [22] tibble_3.1.5 ggplot2_3.3.5 tidyverse_1.3.1
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-153 fs_1.5.0 lubridate_1.8.0
## [4] RColorBrewer_1.1-2 httr_1.4.2 tools_4.1.2
## [7] backports_1.3.0 bslib_0.3.1 utf8_1.2.2
## [10] R6_2.5.1 rpart_4.1-15 DBI_1.1.2
## [13] colorspace_2.0-2 nnet_7.3-16 withr_2.4.2
## [16] mnormt_2.0.2 tidyselect_1.1.1 gridExtra_2.3
## [19] compiler_4.1.2 cli_3.1.0 rvest_1.0.2
## [22] htmlTable_2.4.0 mice_3.14.0 xml2_1.3.2
## [25] sass_0.4.0 scales_1.1.1 checkmate_2.0.0
## [28] pbivnorm_0.6.0 digest_0.6.28 foreign_0.8-81
## [31] minqa_1.2.4 rmarkdown_2.11 base64enc_0.1-3
## [34] jpeg_0.1-9 pkgconfig_2.0.3 htmltools_0.5.2
## [37] lme4_1.1-27.1 dbplyr_2.1.1 fastmap_1.1.0
## [40] htmlwidgets_1.5.4 readxl_1.4.0 rstudioapi_0.13
## [43] jquerylib_0.1.4 generics_0.1.1 jsonlite_1.7.2
## [46] gtools_3.9.2 Rcpp_1.0.7 munsell_0.5.0
## [49] fansi_0.5.0 lifecycle_1.0.1 stringi_1.7.5
## [52] yaml_2.2.1 mathjaxr_1.4-0 MASS_7.3-54
## [55] grid_4.1.2 parallel_4.1.2 gdata_2.18.0
## [58] crayon_1.4.2 haven_2.4.3 splines_4.1.2
## [61] hms_1.1.1 tmvnsim_1.0-2 knitr_1.36
## [64] pillar_1.6.4 boot_1.3-28 stats4_4.1.2
## [67] reprex_2.0.1 glue_1.4.2 evaluate_0.14
## [70] latticeExtra_0.6-29 data.table_1.14.2 modelr_0.1.8
## [73] nloptr_1.2.2.3 png_0.1-7 vctrs_0.3.8
## [76] tzdb_0.2.0 cellranger_1.1.0 gtable_0.3.0
## [79] xfun_0.27 broom_0.7.10 cluster_2.1.2
## [82] ellipsis_0.3.2