If we want to be able to make comparisons across groups, we must show our latent variable(s) is functioning the same way across groups. This is referred to as measurement invariance. If the model is not functioning the same way across the groups, then it makes no sense to say, for example, that one group has a higher mean with respect to the measured construct than another group because the constructs are functioning differently. Similarly, if we want to show that it’s fair to compare scores on a construct at time 1 with scores on that construct at time 2, we need demonstrate measurement invariance.
We implicitly assume measurement invariance, when we make comparsions on some construct across groups. We are assuming that the construct functions the same way when we decide to do regressions, t-test, mixed effects model, etc.
We’ll use the Holzinger data set in lavaan
, to show how to assess measurement invariance. Here’s the description of the data set:
The classic Holzinger and Swineford (1939) dataset consists of mental ability test scores of seventh- and eighth-grade children from two different schools (Pasteur and Grant-White). In the original dataset (available in the MBESS package), there are scores for 26 tests. However, a smaller subset with 9 variables is more widely used in the literature (for example in Joreskog’s 1969 paper, which also uses the 145 subjects from the Grant-White school only).
library("lavaan")
This is lavaan 0.5-20
lavaan is BETA software! Please report any bugs.
data("HolzingerSwineford1939")
head(HolzingerSwineford1939)
id sex ageyr agemo school grade x1 x2 x3 x4 x5 x6 x7 x8 x9
1 1 1 13 1 Pasteur 7 3.333333 7.75 0.375 2.333333 5.75 1.2857143 3.391304 5.75 6.361111
2 2 2 13 7 Pasteur 7 5.333333 5.25 2.125 1.666667 3.00 1.2857143 3.782609 6.25 7.916667
3 3 2 13 1 Pasteur 7 4.500000 5.25 1.875 1.000000 1.75 0.4285714 3.260870 3.90 4.416667
4 4 1 13 2 Pasteur 7 5.333333 7.75 3.000 2.666667 4.50 2.4285714 3.000000 5.30 4.861111
5 5 2 12 2 Pasteur 7 4.833333 4.75 0.875 2.666667 4.00 2.5714286 3.695652 6.30 5.916667
6 6 2 14 1 Pasteur 7 5.333333 5.00 2.250 1.000000 3.00 0.8571429 4.347826 6.65 7.500000
What we are testing is whether our model is invariant across these two schools.
Subtests x1, x2, and x3 are manifestations of the visual factor; x4, x5, and x6 are manifestations of the textual factor; and x7, x8, and x9 are manifestations of the speed factor. This is what our model looks like overall (ignoring the schools).
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
fit <- cfa(HS.model, data=HolzingerSwineford1939)
summary(fit, fit.measures=TRUE)
lavaan (0.5-20) converged normally after 35 iterations
Number of observations 301
Estimator ML
Minimum Function Test Statistic 85.306
Degrees of freedom 24
P-value (Chi-square) 0.000
Model test baseline model:
Minimum Function Test Statistic 918.852
Degrees of freedom 36
P-value 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.931
Tucker-Lewis Index (TLI) 0.896
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3737.745
Loglikelihood unrestricted model (H1) -3695.092
Number of free parameters 21
Akaike (AIC) 7517.490
Bayesian (BIC) 7595.339
Sample-size adjusted Bayesian (BIC) 7528.739
Root Mean Square Error of Approximation:
RMSEA 0.092
90 Percent Confidence Interval 0.071 0.114
P-value RMSEA <= 0.05 0.001
Standardized Root Mean Square Residual:
SRMR 0.065
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 0.554 0.100 5.554 0.000
x3 0.729 0.109 6.685 0.000
textual =~
x4 1.000
x5 1.113 0.065 17.014 0.000
x6 0.926 0.055 16.703 0.000
speed =~
x7 1.000
x8 1.180 0.165 7.152 0.000
x9 1.082 0.151 7.155 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.408 0.074 5.552 0.000
speed 0.262 0.056 4.660 0.000
textual ~~
speed 0.173 0.049 3.518 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.549 0.114 4.833 0.000
x2 1.134 0.102 11.146 0.000
x3 0.844 0.091 9.317 0.000
x4 0.371 0.048 7.779 0.000
x5 0.446 0.058 7.642 0.000
x6 0.356 0.043 8.277 0.000
x7 0.799 0.081 9.823 0.000
x8 0.488 0.074 6.573 0.000
x9 0.566 0.071 8.003 0.000
visual 0.809 0.145 5.564 0.000
textual 0.979 0.112 8.737 0.000
speed 0.384 0.086 4.451 0.000
Our model has adequate fit and the fit statistics are just ok. They are not awful but if this was your data set, you would hope for a better model fit or you might consider modifications (e.g. removing/swapping in indicators, allowing residuals to correlate, etc).
The first step in demonstrating measurement invariance is to show that the model fits for each of our groups. This can be done two ways. 1) You could fit the data to each subset of the data or 2) you can use a multiple group model. We’ll use this second apprach and to do this in lavaan, we simply add the group = "school"
argument and refit our model. This argument will allow all of our parameters to differ across the groups.
Note, it is inappropriate to use standardize coefficients here. The reason is that we have no reason to expect the variances are the same across the groups and
configural <- cfa(HS.model, data=HolzingerSwineford1939, group = "school")
summary(configural, fit.measures=TRUE)
lavaan (0.5-20) converged normally after 57 iterations
Number of observations per group
Pasteur 156
Grant-White 145
Estimator ML
Minimum Function Test Statistic 115.851
Degrees of freedom 48
P-value (Chi-square) 0.000
Chi-square for each group:
Pasteur 64.309
Grant-White 51.542
Model test baseline model:
Minimum Function Test Statistic 957.769
Degrees of freedom 72
P-value 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.923
Tucker-Lewis Index (TLI) 0.885
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3682.198
Loglikelihood unrestricted model (H1) -3624.272
Number of free parameters 60
Akaike (AIC) 7484.395
Bayesian (BIC) 7706.822
Sample-size adjusted Bayesian (BIC) 7516.536
Root Mean Square Error of Approximation:
RMSEA 0.097
90 Percent Confidence Interval 0.075 0.120
P-value RMSEA <= 0.05 0.001
Standardized Root Mean Square Residual:
SRMR 0.068
Parameter Estimates:
Information Expected
Standard Errors Standard
Group 1 [Pasteur]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 0.394 0.122 3.220 0.001
x3 0.570 0.140 4.076 0.000
textual =~
x4 1.000
x5 1.183 0.102 11.613 0.000
x6 0.875 0.077 11.421 0.000
speed =~
x7 1.000
x8 1.125 0.277 4.057 0.000
x9 0.922 0.225 4.104 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.479 0.106 4.531 0.000
speed 0.185 0.077 2.397 0.017
textual ~~
speed 0.182 0.069 2.628 0.009
Intercepts:
Estimate Std.Err Z-value P(>|z|)
x1 4.941 0.095 52.249 0.000
x2 5.984 0.098 60.949 0.000
x3 2.487 0.093 26.778 0.000
x4 2.823 0.092 30.689 0.000
x5 3.995 0.105 38.183 0.000
x6 1.922 0.079 24.321 0.000
x7 4.432 0.087 51.181 0.000
x8 5.563 0.078 71.214 0.000
x9 5.418 0.079 68.440 0.000
visual 0.000
textual 0.000
speed 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.298 0.232 1.286 0.198
x2 1.334 0.158 8.464 0.000
x3 0.989 0.136 7.271 0.000
x4 0.425 0.069 6.138 0.000
x5 0.456 0.086 5.292 0.000
x6 0.290 0.050 5.780 0.000
x7 0.820 0.125 6.580 0.000
x8 0.510 0.116 4.406 0.000
x9 0.680 0.104 6.516 0.000
visual 1.097 0.276 3.967 0.000
textual 0.894 0.150 5.963 0.000
speed 0.350 0.126 2.778 0.005
Group 2 [Grant-White]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 0.736 0.155 4.760 0.000
x3 0.925 0.166 5.583 0.000
textual =~
x4 1.000
x5 0.990 0.087 11.418 0.000
x6 0.963 0.085 11.377 0.000
speed =~
x7 1.000
x8 1.226 0.187 6.569 0.000
x9 1.058 0.165 6.429 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.408 0.098 4.153 0.000
speed 0.276 0.076 3.639 0.000
textual ~~
speed 0.222 0.073 3.022 0.003
Intercepts:
Estimate Std.Err Z-value P(>|z|)
x1 4.930 0.095 51.696 0.000
x2 6.200 0.092 67.416 0.000
x3 1.996 0.086 23.195 0.000
x4 3.317 0.093 35.625 0.000
x5 4.712 0.096 48.986 0.000
x6 2.469 0.094 26.277 0.000
x7 3.921 0.086 45.819 0.000
x8 5.488 0.087 63.174 0.000
x9 5.327 0.085 62.571 0.000
visual 0.000
textual 0.000
speed 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.715 0.126 5.676 0.000
x2 0.899 0.123 7.339 0.000
x3 0.557 0.103 5.409 0.000
x4 0.315 0.065 4.870 0.000
x5 0.419 0.072 5.812 0.000
x6 0.406 0.069 5.880 0.000
x7 0.600 0.091 6.584 0.000
x8 0.401 0.094 4.249 0.000
x9 0.535 0.089 6.010 0.000
visual 0.604 0.160 3.762 0.000
textual 0.942 0.152 6.177 0.000
speed 0.461 0.118 3.910 0.000
The model again fits just ok. At least it doesn’t fit worse when we fit a multiple group model. The configural model is a prerequisite to investigating measurement invariance.
To assess for weak (metric) invariance, we need to constrain the factor loadings to be equal across the groups. This shows that the factor(s) has the same meaning across the groups. Are they attributing the same meaning to the latent constructs? If we don’t have weak invariance, they this implies the the meaning of the manifest variables are different across the groups. For example, on the Minnesota Student Survey, there are many items that ask about bullying and a specific item that asks about frequency of being bullied because of your race/ethnic group. Racism functions differently for white students than non-white students and because of this, this item ends up not being invariant because white students are much more likely to say never whereas non-white students would be more likely to say it occurs more frequently.
At this stage, and in future steps, if you have only partial invariance of the factor loadings, constrain these invariant loadings to be equal and allow the other, ones to be freed. At this point, we are yet not estimating the factor means. This happens in the next step.
You fit this model in lavaan
using the group.equal
argument.
weak.invariance <- cfa(HS.model, data=HolzingerSwineford1939, group = "school", group.equal = "loadings")
summary(weak.invariance, fit.measures = TRUE)
lavaan (0.5-20) converged normally after 42 iterations
Number of observations per group
Pasteur 156
Grant-White 145
Estimator ML
Minimum Function Test Statistic 124.044
Degrees of freedom 54
P-value (Chi-square) 0.000
Chi-square for each group:
Pasteur 68.825
Grant-White 55.219
Model test baseline model:
Minimum Function Test Statistic 957.769
Degrees of freedom 72
P-value 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.921
Tucker-Lewis Index (TLI) 0.895
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3686.294
Loglikelihood unrestricted model (H1) -3624.272
Number of free parameters 54
Akaike (AIC) 7480.587
Bayesian (BIC) 7680.771
Sample-size adjusted Bayesian (BIC) 7509.514
Root Mean Square Error of Approximation:
RMSEA 0.093
90 Percent Confidence Interval 0.071 0.114
P-value RMSEA <= 0.05 0.001
Standardized Root Mean Square Residual:
SRMR 0.072
Parameter Estimates:
Information Expected
Standard Errors Standard
Group 1 [Pasteur]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 (.p2.) 0.599 0.100 5.979 0.000
x3 (.p3.) 0.784 0.108 7.267 0.000
textual =~
x4 1.000
x5 (.p5.) 1.083 0.067 16.049 0.000
x6 (.p6.) 0.912 0.058 15.785 0.000
speed =~
x7 1.000
x8 (.p8.) 1.201 0.155 7.738 0.000
x9 (.p9.) 1.038 0.136 7.629 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.416 0.097 4.271 0.000
speed 0.169 0.064 2.643 0.008
textual ~~
speed 0.176 0.061 2.882 0.004
Intercepts:
Estimate Std.Err Z-value P(>|z|)
x1 4.941 0.093 52.991 0.000
x2 5.984 0.100 60.096 0.000
x3 2.487 0.094 26.465 0.000
x4 2.823 0.093 30.371 0.000
x5 3.995 0.101 39.714 0.000
x6 1.922 0.081 23.711 0.000
x7 4.432 0.086 51.540 0.000
x8 5.563 0.078 71.087 0.000
x9 5.418 0.079 68.153 0.000
visual 0.000
textual 0.000
speed 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.551 0.137 4.010 0.000
x2 1.258 0.155 8.117 0.000
x3 0.882 0.128 6.884 0.000
x4 0.434 0.070 6.238 0.000
x5 0.508 0.082 6.229 0.000
x6 0.266 0.050 5.294 0.000
x7 0.849 0.114 7.468 0.000
x8 0.515 0.095 5.409 0.000
x9 0.658 0.096 6.865 0.000
visual 0.805 0.171 4.714 0.000
textual 0.913 0.137 6.651 0.000
speed 0.305 0.078 3.920 0.000
Group 2 [Grant-White]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 (.p2.) 0.599 0.100 5.979 0.000
x3 (.p3.) 0.784 0.108 7.267 0.000
textual =~
x4 1.000
x5 (.p5.) 1.083 0.067 16.049 0.000
x6 (.p6.) 0.912 0.058 15.785 0.000
speed =~
x7 1.000
x8 (.p8.) 1.201 0.155 7.738 0.000
x9 (.p9.) 1.038 0.136 7.629 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.437 0.099 4.423 0.000
speed 0.314 0.079 3.958 0.000
textual ~~
speed 0.226 0.072 3.144 0.002
Intercepts:
Estimate Std.Err Z-value P(>|z|)
x1 4.930 0.097 50.763 0.000
x2 6.200 0.091 68.379 0.000
x3 1.996 0.085 23.455 0.000
x4 3.317 0.092 35.950 0.000
x5 4.712 0.100 47.173 0.000
x6 2.469 0.091 27.248 0.000
x7 3.921 0.086 45.555 0.000
x8 5.488 0.087 63.257 0.000
x9 5.327 0.085 62.786 0.000
visual 0.000
textual 0.000
speed 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.645 0.127 5.084 0.000
x2 0.933 0.121 7.732 0.000
x3 0.605 0.096 6.282 0.000
x4 0.329 0.062 5.279 0.000
x5 0.384 0.073 5.270 0.000
x6 0.437 0.067 6.576 0.000
x7 0.599 0.090 6.651 0.000
x8 0.406 0.089 4.541 0.000
x9 0.532 0.086 6.202 0.000
visual 0.722 0.161 4.490 0.000
textual 0.906 0.136 6.646 0.000
speed 0.475 0.109 4.347 0.000
All of the measurement invariance models are nested within one another. The weak invariance model is nested within the configural model. Therefore, we can do a chi-square difference test.
anova(weak.invariance, configural)
Chi Square Difference Test
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
configural 48 7484.4 7706.8 115.85
weak.invariance 54 7480.6 7680.8 124.04 8.1922 6 0.2244
The null hypothesis is that the configural model is not an improvement in fit over the weak invariance model. We fail to reject the null hypothesis. This is evidence that we have weak invariance. We can also investigate other fit statistics.
fit.stats <- rbind(fitmeasures(configural, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic")),
fitmeasures(weak.invariance, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic")))
rownames(fit.stats) <- c("configural", "weak invariance")
fit.stats
chisq df rmsea tli cfi aic
configural 115.8513 48 0.09691486 0.8850976 0.9233984 7484.395
weak invariance 124.0435 54 0.09283654 0.8945646 0.9209235 7480.587
The fit statistics haven’t really changed, but recall that the configural model is the model were all the parameters are all freely estimated across the groups, so the weak invariance model is the more parsimonious model estimating 6 fewer parameters. This is evidence that we have weak invariance. Based on this, we can safely conclude that we have weak invariance.
Strong (scalar) invariance adds the additional constraint that intercepts are equal across groups. Now we have both loadings and intercepts held constrant across groups.This implies that the meaning of the construct (the factor loadings), and the levels of the underlying manifest variables (intercepts) are equal in both groups. If we have strong invariance, we are able to compare differences in the mean across the latent construct(s). In this stage we estimate the factor means.
strong.invariance <- cfa(HS.model, data=HolzingerSwineford1939, group = "school", group.equal = c( "loadings", "intercepts"))
summary(strong.invariance, fit.measures = TRUE)
lavaan (0.5-20) converged normally after 60 iterations
Number of observations per group
Pasteur 156
Grant-White 145
Estimator ML
Minimum Function Test Statistic 164.103
Degrees of freedom 60
P-value (Chi-square) 0.000
Chi-square for each group:
Pasteur 90.210
Grant-White 73.892
Model test baseline model:
Minimum Function Test Statistic 957.769
Degrees of freedom 72
P-value 0.000
User model versus baseline model:
Comparative Fit Index (CFI) 0.882
Tucker-Lewis Index (TLI) 0.859
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -3706.323
Loglikelihood unrestricted model (H1) -3624.272
Number of free parameters 48
Akaike (AIC) 7508.647
Bayesian (BIC) 7686.588
Sample-size adjusted Bayesian (BIC) 7534.359
Root Mean Square Error of Approximation:
RMSEA 0.107
90 Percent Confidence Interval 0.088 0.127
P-value RMSEA <= 0.05 0.000
Standardized Root Mean Square Residual:
SRMR 0.082
Parameter Estimates:
Information Expected
Standard Errors Standard
Group 1 [Pasteur]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 (.p2.) 0.576 0.101 5.713 0.000
x3 (.p3.) 0.798 0.112 7.146 0.000
textual =~
x4 1.000
x5 (.p5.) 1.120 0.066 16.965 0.000
x6 (.p6.) 0.932 0.056 16.608 0.000
speed =~
x7 1.000
x8 (.p8.) 1.130 0.145 7.786 0.000
x9 (.p9.) 1.009 0.132 7.667 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.410 0.095 4.293 0.000
speed 0.178 0.066 2.687 0.007
textual ~~
speed 0.180 0.062 2.900 0.004
Intercepts:
Estimate Std.Err Z-value P(>|z|)
x1 (.25.) 5.001 0.090 55.760 0.000
x2 (.26.) 6.151 0.077 79.905 0.000
x3 (.27.) 2.271 0.083 27.387 0.000
x4 (.28.) 2.778 0.087 31.953 0.000
x5 (.29.) 4.035 0.096 41.858 0.000
x6 (.30.) 1.926 0.079 24.426 0.000
x7 (.31.) 4.242 0.073 57.975 0.000
x8 (.32.) 5.630 0.072 78.531 0.000
x9 (.33.) 5.465 0.069 79.016 0.000
visual 0.000
textual 0.000
speed 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.555 0.139 3.983 0.000
x2 1.296 0.158 8.186 0.000
x3 0.944 0.136 6.929 0.000
x4 0.445 0.069 6.430 0.000
x5 0.502 0.082 6.136 0.000
x6 0.263 0.050 5.264 0.000
x7 0.888 0.120 7.416 0.000
x8 0.541 0.095 5.706 0.000
x9 0.654 0.096 6.805 0.000
visual 0.796 0.172 4.641 0.000
textual 0.879 0.131 6.694 0.000
speed 0.322 0.082 3.914 0.000
Group 2 [Grant-White]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 (.p2.) 0.576 0.101 5.713 0.000
x3 (.p3.) 0.798 0.112 7.146 0.000
textual =~
x4 1.000
x5 (.p5.) 1.120 0.066 16.965 0.000
x6 (.p6.) 0.932 0.056 16.608 0.000
speed =~
x7 1.000
x8 (.p8.) 1.130 0.145 7.786 0.000
x9 (.p9.) 1.009 0.132 7.667 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.427 0.097 4.417 0.000
speed 0.329 0.082 4.006 0.000
textual ~~
speed 0.236 0.073 3.224 0.001
Intercepts:
Estimate Std.Err Z-value P(>|z|)
x1 (.25.) 5.001 0.090 55.760 0.000
x2 (.26.) 6.151 0.077 79.905 0.000
x3 (.27.) 2.271 0.083 27.387 0.000
x4 (.28.) 2.778 0.087 31.953 0.000
x5 (.29.) 4.035 0.096 41.858 0.000
x6 (.30.) 1.926 0.079 24.426 0.000
x7 (.31.) 4.242 0.073 57.975 0.000
x8 (.32.) 5.630 0.072 78.531 0.000
x9 (.33.) 5.465 0.069 79.016 0.000
visual -0.148 0.122 -1.211 0.226
textual 0.576 0.117 4.918 0.000
speed -0.177 0.090 -1.968 0.049
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.654 0.128 5.094 0.000
x2 0.964 0.123 7.812 0.000
x3 0.641 0.101 6.316 0.000
x4 0.343 0.062 5.534 0.000
x5 0.376 0.073 5.133 0.000
x6 0.437 0.067 6.559 0.000
x7 0.625 0.095 6.574 0.000
x8 0.434 0.088 4.914 0.000
x9 0.522 0.086 6.102 0.000
visual 0.708 0.160 4.417 0.000
textual 0.870 0.131 6.659 0.000
speed 0.505 0.115 4.379 0.000
You’ll note that the latent means for visual, textual, and speed are now estimated. However, they shouldn’t be compared across group until we test if we have strong invariance.
The strong invariance model is nested within the weak invariance model. Therefore, we can again perform a chi-square difference test.
anova(strong.invariance, weak.invariance)
Chi Square Difference Test
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
weak.invariance 54 7480.6 7680.8 124.04
strong.invariance 60 7508.6 7686.6 164.10 40.059 6 4.435e-07 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We see that we reject the null hypothesis and that the weak invariance model fits better than the strong invariance model. This is also evident by the lower AIC and BIC. What we should now know is examine whether there is partial invariance. We can do this by using the lavTestScore()
function in the lavaan function. This function allows us to see the effect of releasing equality constraints across the groups. The modindices()
will only show modification indices for newly added parameters associated with new paths. These parameters aren’t newly estimated in the model, we’re just freeing them across groups.
lavTestScore(strong.invariance)
$test
total score test:
test X2 df p.value
1 score 46.956 15 0
$uni
univariate score tests:
lhs op rhs X2 df p.value
1 .p2. == .p38. 0.306 1 0.580
2 .p3. == .p39. 1.636 1 0.201
3 .p5. == .p41. 2.744 1 0.098
4 .p6. == .p42. 2.627 1 0.105
5 .p8. == .p44. 0.027 1 0.871
6 .p9. == .p45. 0.004 1 0.952
7 .p25. == .p61. 5.847 1 0.016
8 .p26. == .p62. 6.863 1 0.009
9 .p27. == .p63. 19.193 1 0.000
10 .p28. == .p64. 2.139 1 0.144
11 .p29. == .p65. 1.563 1 0.211
12 .p30. == .p66. 0.032 1 0.857
13 .p31. == .p67. 15.021 1 0.000
14 .p32. == .p68. 4.710 1 0.030
15 .p33. == .p69. 1.498 1 0.221
What are all these .p
s?
parTable(strong.invariance)
id lhs op rhs user group free ustart exo label plabel start est se
1 1 visual =~ x1 1 1 0 1 0 .p1. 1.000 1.000 0.000
2 2 visual =~ x2 1 1 1 NA 0 .p2. .p2. 0.769 0.576 0.101
3 3 visual =~ x3 1 1 2 NA 0 .p3. .p3. 1.186 0.798 0.112
4 4 textual =~ x4 1 1 0 1 0 .p4. 1.000 1.000 0.000
5 5 textual =~ x5 1 1 3 NA 0 .p5. .p5. 1.237 1.120 0.066
6 6 textual =~ x6 1 1 4 NA 0 .p6. .p6. 0.865 0.932 0.056
7 7 speed =~ x7 1 1 0 1 0 .p7. 1.000 1.000 0.000
8 8 speed =~ x8 1 1 5 NA 0 .p8. .p8. 1.227 1.130 0.145
9 9 speed =~ x9 1 1 6 NA 0 .p9. .p9. 0.827 1.009 0.132
10 10 x1 ~~ x1 0 1 7 NA 0 .p10. 0.698 0.555 0.139
11 11 x2 ~~ x2 0 1 8 NA 0 .p11. 0.752 1.296 0.158
12 12 x3 ~~ x3 0 1 9 NA 0 .p12. 0.673 0.944 0.136
13 13 x4 ~~ x4 0 1 10 NA 0 .p13. 0.660 0.445 0.069
14 14 x5 ~~ x5 0 1 11 NA 0 .p14. 0.854 0.502 0.082
15 15 x6 ~~ x6 0 1 12 NA 0 .p15. 0.487 0.263 0.050
16 16 x7 ~~ x7 0 1 13 NA 0 .p16. 0.585 0.888 0.120
17 17 x8 ~~ x8 0 1 14 NA 0 .p17. 0.476 0.541 0.095
18 18 x9 ~~ x9 0 1 15 NA 0 .p18. 0.489 0.654 0.096
19 19 visual ~~ visual 0 1 16 NA 0 .p19. 0.050 0.796 0.172
20 20 textual ~~ textual 0 1 17 NA 0 .p20. 0.050 0.879 0.131
21 21 speed ~~ speed 0 1 18 NA 0 .p21. 0.050 0.322 0.082
22 22 visual ~~ textual 0 1 19 NA 0 .p22. 0.000 0.410 0.095
23 23 visual ~~ speed 0 1 20 NA 0 .p23. 0.000 0.178 0.066
24 24 textual ~~ speed 0 1 21 NA 0 .p24. 0.000 0.180 0.062
25 25 x1 ~1 0 1 22 NA 0 .p25. .p25. 4.941 5.001 0.090
26 26 x2 ~1 0 1 23 NA 0 .p26. .p26. 5.984 6.151 0.077
27 27 x3 ~1 0 1 24 NA 0 .p27. .p27. 2.487 2.271 0.083
28 28 x4 ~1 0 1 25 NA 0 .p28. .p28. 2.823 2.778 0.087
29 29 x5 ~1 0 1 26 NA 0 .p29. .p29. 3.995 4.035 0.096
30 30 x6 ~1 0 1 27 NA 0 .p30. .p30. 1.922 1.926 0.079
31 31 x7 ~1 0 1 28 NA 0 .p31. .p31. 4.432 4.242 0.073
32 32 x8 ~1 0 1 29 NA 0 .p32. .p32. 5.563 5.630 0.072
33 33 x9 ~1 0 1 30 NA 0 .p33. .p33. 5.418 5.465 0.069
34 34 visual ~1 0 1 0 0 0 .p34. 0.000 0.000 0.000
35 35 textual ~1 0 1 0 0 0 .p35. 0.000 0.000 0.000
36 36 speed ~1 0 1 0 0 0 .p36. 0.000 0.000 0.000
37 37 visual =~ x1 1 2 0 1 0 .p37. 1.000 1.000 0.000
38 38 visual =~ x2 1 2 31 NA 0 .p2. .p38. 0.896 0.576 0.101
39 39 visual =~ x3 1 2 32 NA 0 .p3. .p39. 1.155 0.798 0.112
40 40 textual =~ x4 1 2 0 1 0 .p40. 1.000 1.000 0.000
41 41 textual =~ x5 1 2 33 NA 0 .p5. .p41. 0.991 1.120 0.066
42 42 textual =~ x6 1 2 34 NA 0 .p6. .p42. 0.962 0.932 0.056
43 43 speed =~ x7 1 2 0 1 0 .p43. 1.000 1.000 0.000
44 44 speed =~ x8 1 2 35 NA 0 .p8. .p44. 1.282 1.130 0.145
45 45 speed =~ x9 1 2 36 NA 0 .p9. .p45. 0.895 1.009 0.132
46 46 x1 ~~ x1 0 2 37 NA 0 .p46. 0.659 0.654 0.128
47 47 x2 ~~ x2 0 2 38 NA 0 .p47. 0.613 0.964 0.123
48 48 x3 ~~ x3 0 2 39 NA 0 .p48. 0.537 0.641 0.101
49 49 x4 ~~ x4 0 2 40 NA 0 .p49. 0.629 0.343 0.062
50 50 x5 ~~ x5 0 2 41 NA 0 .p50. 0.671 0.376 0.073
51 51 x6 ~~ x6 0 2 42 NA 0 .p51. 0.640 0.437 0.067
52 52 x7 ~~ x7 0 2 43 NA 0 .p52. 0.531 0.625 0.095
53 53 x8 ~~ x8 0 2 44 NA 0 .p53. 0.547 0.434 0.088
54 54 x9 ~~ x9 0 2 45 NA 0 .p54. 0.526 0.522 0.086
55 55 visual ~~ visual 0 2 46 NA 0 .p55. 0.050 0.708 0.160
56 56 textual ~~ textual 0 2 47 NA 0 .p56. 0.050 0.870 0.131
57 57 speed ~~ speed 0 2 48 NA 0 .p57. 0.050 0.505 0.115
58 58 visual ~~ textual 0 2 49 NA 0 .p58. 0.000 0.427 0.097
59 59 visual ~~ speed 0 2 50 NA 0 .p59. 0.000 0.329 0.082
60 60 textual ~~ speed 0 2 51 NA 0 .p60. 0.000 0.236 0.073
61 61 x1 ~1 0 2 52 NA 0 .p25. .p61. 4.930 5.001 0.090
62 62 x2 ~1 0 2 53 NA 0 .p26. .p62. 6.200 6.151 0.077
63 63 x3 ~1 0 2 54 NA 0 .p27. .p63. 1.996 2.271 0.083
64 64 x4 ~1 0 2 55 NA 0 .p28. .p64. 3.317 2.778 0.087
65 65 x5 ~1 0 2 56 NA 0 .p29. .p65. 4.712 4.035 0.096
66 66 x6 ~1 0 2 57 NA 0 .p30. .p66. 2.469 1.926 0.079
67 67 x7 ~1 0 2 58 NA 0 .p31. .p67. 3.921 4.242 0.073
68 68 x8 ~1 0 2 59 NA 0 .p32. .p68. 5.488 5.630 0.072
69 69 x9 ~1 0 2 60 NA 0 .p33. .p69. 5.327 5.465 0.069
70 70 visual ~1 0 2 61 NA 0 .p70. 0.000 -0.148 0.122
71 71 textual ~1 0 2 62 NA 0 .p71. 0.000 0.576 0.117
72 72 speed ~1 0 2 63 NA 0 .p72. 0.000 -0.177 0.090
73 73 .p2. == .p38. 2 0 0 NA 0 0.000 0.000 0.000
74 74 .p3. == .p39. 2 0 0 NA 0 0.000 0.000 0.000
75 75 .p5. == .p41. 2 0 0 NA 0 0.000 0.000 0.000
76 76 .p6. == .p42. 2 0 0 NA 0 0.000 0.000 0.000
77 77 .p8. == .p44. 2 0 0 NA 0 0.000 0.000 0.000
78 78 .p9. == .p45. 2 0 0 NA 0 0.000 0.000 0.000
79 79 .p25. == .p61. 2 0 0 NA 0 0.000 0.000 0.000
80 80 .p26. == .p62. 2 0 0 NA 0 0.000 0.000 0.000
81 81 .p27. == .p63. 2 0 0 NA 0 0.000 0.000 0.000
82 82 .p28. == .p64. 2 0 0 NA 0 0.000 0.000 0.000
83 83 .p29. == .p65. 2 0 0 NA 0 0.000 0.000 0.000
84 84 .p30. == .p66. 2 0 0 NA 0 0.000 0.000 0.000
85 85 .p31. == .p67. 2 0 0 NA 0 0.000 0.000 0.000
86 86 .p32. == .p68. 2 0 0 NA 0 0.000 0.000 0.000
87 87 .p33. == .p69. 2 0 0 NA 0 0.000 0.000 0.000
The first output is a multivariate score test (i.e. Lagrange multipler test) and this is a test of whether freeing all equality constraints represents an improvement in fit over the base model. We reject the null hypothesis. Therefore, we should look at the univariate score tests (i.e., the chi-square difference tests) to see which equality constraints should be relaxed and these parameters should be freed. We see that we should free the x3 (this is the .p27 == .p63
) intercept has the largest change in chi-square difference (X2
). Let’s fit this model first.
strong.invariance.x3 <- cfa(HS.model, data=HolzingerSwineford1939, group = "school", group.equal = c( "loadings", "intercepts"), group.partial = c("x3 ~ 1"))
lavTestScore(strong.invariance.x3)
$test
total score test:
test X2 df p.value
1 score 27.528 14 0.016
$uni
univariate score tests:
lhs op rhs X2 df p.value
1 .p2. == .p38. 0.734 1 0.392
2 .p3. == .p39. 0.485 1 0.486
3 .p5. == .p41. 2.760 1 0.097
4 .p6. == .p42. 2.630 1 0.105
5 .p8. == .p44. 0.026 1 0.872
6 .p9. == .p45. 0.002 1 0.960
7 .p25. == .p61. 2.833 1 0.092
8 .p26. == .p62. 2.833 1 0.092
9 .p28. == .p64. 2.136 1 0.144
10 .p29. == .p65. 1.560 1 0.212
11 .p30. == .p66. 0.032 1 0.857
12 .p31. == .p67. 15.023 1 0.000
13 .p32. == .p68. 4.727 1 0.030
14 .p33. == .p69. 1.492 1 0.222
Again, we reject the multivariate score test and see that we can free x7 (.p31. == .p67.
). We could potentially free x8 (.p32. == .p68.
) but as before, let’s free x7 first.
strong.invariance.x3x7 <- cfa(HS.model, data=HolzingerSwineford1939, group = "school", group.equal = c( "loadings", "intercepts"), group.partial = c("x3 ~ 1", "x7 ~ 1"))
lavTestScore(strong.invariance.x3x7)
$test
total score test:
test X2 df p.value
1 score 12.583 13 0.481
$uni
univariate score tests:
lhs op rhs X2 df p.value
1 .p2. == .p38. 0.734 1 0.391
2 .p3. == .p39. 0.492 1 0.483
3 .p5. == .p41. 2.769 1 0.096
4 .p6. == .p42. 2.631 1 0.105
5 .p8. == .p44. 0.013 1 0.910
6 .p9. == .p45. 0.062 1 0.803
7 .p25. == .p61. 2.832 1 0.092
8 .p26. == .p62. 2.832 1 0.092
9 .p28. == .p64. 2.135 1 0.144
10 .p29. == .p65. 1.563 1 0.211
11 .p30. == .p66. 0.032 1 0.858
12 .p32. == .p68. 0.053 1 0.818
13 .p33. == .p69. 0.053 1 0.818
Nothing else should be freed (i.e., we fail to reject multivariate score test). Let’s return back to using lavaan
and add these additional constriants.
strong.invariance.x3x7 <- cfa(HS.model, data=HolzingerSwineford1939, group = "school", group.equal = c( "loadings", "intercepts"), group.partial = c("x3 ~ 1", "x7 ~ 1"))
anova(strong.invariance.x3x7, weak.invariance)
Chi Square Difference Test
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
weak.invariance 54 7480.6 7680.8 124.04
strong.invariance.x3x7 58 7478.0 7663.3 129.42 5.3789 4 0.2506
Now, we see that have partial strong (scalar) invariance.
fit.stats2 <- rbind(fitmeasures(strong.invariance, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic")),
fitmeasures(strong.invariance.x3x7, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic")))
rownames(fit.stats2) <- c("strong", "strong with x3 x7")
fit.stats <- rbind(fit.stats, fit.stats2)
round(fit.stats, 4)
chisq df rmsea tli cfi aic
configural 115.8513 48 0.0969 0.8851 0.9234 7484.395
weak invariance 124.0435 54 0.0928 0.8946 0.9209 7480.587
strong 164.1028 60 0.1074 0.8590 0.8825 7508.646
strong with x3 x7 129.4225 58 0.0905 0.8999 0.9194 7477.966
Because we have strong invariance, we can now make mean comparisons for the latent variables
summary(strong.invariance.x3x7)
lavaan (0.5-20) converged normally after 61 iterations
Number of observations per group
Pasteur 156
Grant-White 145
Estimator ML
Minimum Function Test Statistic 129.422
Degrees of freedom 58
P-value (Chi-square) 0.000
Chi-square for each group:
Pasteur 71.170
Grant-White 58.253
Parameter Estimates:
Information Expected
Standard Errors Standard
Group 1 [Pasteur]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 (.p2.) 0.606 0.101 5.988 0.000
x3 (.p3.) 0.791 0.109 7.259 0.000
textual =~
x4 1.000
x5 (.p5.) 1.120 0.066 16.960 0.000
x6 (.p6.) 0.932 0.056 16.606 0.000
speed =~
x7 1.000
x8 (.p8.) 1.200 0.155 7.741 0.000
x9 (.p9.) 1.041 0.136 7.635 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.404 0.095 4.247 0.000
speed 0.168 0.064 2.647 0.008
textual ~~
speed 0.172 0.060 2.882 0.004
Intercepts:
Estimate Std.Err Z-value P(>|z|)
x1 (.25.) 4.914 0.092 53.538 0.000
x2 (.26.) 6.087 0.079 76.999 0.000
x3 2.487 0.094 26.474 0.000
x4 (.28.) 2.778 0.087 31.953 0.000
x5 (.29.) 4.035 0.096 41.861 0.000
x6 (.30.) 1.926 0.079 24.425 0.000
x7 4.432 0.086 51.533 0.000
x8 (.32.) 5.569 0.074 75.328 0.000
x9 (.33.) 5.409 0.070 77.182 0.000
visual 0.000
textual 0.000
speed 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.560 0.137 4.086 0.000
x2 1.267 0.156 8.105 0.000
x3 0.879 0.128 6.850 0.000
x4 0.446 0.069 6.432 0.000
x5 0.502 0.082 6.132 0.000
x6 0.263 0.050 5.258 0.000
x7 0.850 0.114 7.471 0.000
x8 0.516 0.095 5.429 0.000
x9 0.656 0.096 6.852 0.000
visual 0.796 0.170 4.691 0.000
textual 0.879 0.131 6.693 0.000
speed 0.304 0.078 3.918 0.000
Group 2 [Grant-White]:
Latent Variables:
Estimate Std.Err Z-value P(>|z|)
visual =~
x1 1.000
x2 (.p2.) 0.606 0.101 5.988 0.000
x3 (.p3.) 0.791 0.109 7.259 0.000
textual =~
x4 1.000
x5 (.p5.) 1.120 0.066 16.960 0.000
x6 (.p6.) 0.932 0.056 16.606 0.000
speed =~
x7 1.000
x8 (.p8.) 1.200 0.155 7.741 0.000
x9 (.p9.) 1.041 0.136 7.635 0.000
Covariances:
Estimate Std.Err Z-value P(>|z|)
visual ~~
textual 0.426 0.097 4.412 0.000
speed 0.312 0.079 3.955 0.000
textual ~~
speed 0.223 0.071 3.163 0.002
Intercepts:
Estimate Std.Err Z-value P(>|z|)
x1 (.25.) 4.914 0.092 53.538 0.000
x2 (.26.) 6.087 0.079 76.999 0.000
x3 1.955 0.108 18.170 0.000
x4 (.28.) 2.778 0.087 31.953 0.000
x5 (.29.) 4.035 0.096 41.861 0.000
x6 (.30.) 1.926 0.079 24.425 0.000
x7 3.992 0.094 42.478 0.000
x8 (.32.) 5.569 0.074 75.328 0.000
x9 (.33.) 5.409 0.070 77.182 0.000
visual 0.051 0.129 0.393 0.695
textual 0.576 0.117 4.918 0.000
speed -0.071 0.089 -0.800 0.424
Variances:
Estimate Std.Err Z-value P(>|z|)
x1 0.651 0.127 5.138 0.000
x2 0.939 0.122 7.721 0.000
x3 0.603 0.096 6.248 0.000
x4 0.343 0.062 5.532 0.000
x5 0.377 0.073 5.136 0.000
x6 0.437 0.067 6.556 0.000
x7 0.599 0.090 6.655 0.000
x8 0.407 0.089 4.570 0.000
x9 0.531 0.086 6.186 0.000
visual 0.715 0.160 4.473 0.000
textual 0.870 0.131 6.659 0.000
speed 0.475 0.109 4.344 0.000
We have strong invariance for textual and partial strong invariance for both visual and speed factors. We see that Grant-White, school 2, students do better on the textual latent construct than Pasteur. They are expected to be 0.576 higher than students in Pasteur on textual. If we allow for partial strong invariance, we see that there is no difference between Grant-White and Pasteur students on visual or speed.
For strict invariance, we add the additional constraint of equal residual variances for the manifest variables across the groups. Because we have only partial invariance, we will allow those parameters to be free that should be (e.g., the intercepts for x3 and x7) and restrict all the other parameters (i.e., factor loadings and intercepts) as well as the residual variances. If we want to make comparsions based on the raw scores (e.g. based on the sum of the scores on the variables), we require strict invariance. This is because the observed variance is the sum of true score variance and residual/error variance. If the residual variances are the same, they have the same amount of true scure variance. As with before, we estimate the latent means.
strict.invariance.x3x7 <- cfa(HS.model, data=HolzingerSwineford1939, group = "school", group.equal = c( "loadings", "intercepts", "residuals"), group.partial = c("x3 ~ 1", "x7 ~ 1"))
anova(strong.invariance.x3x7, strict.invariance.x3x7)
Chi Square Difference Test
Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq)
strong.invariance.x3x7 58 7478.0 7663.3 129.42
strict.invariance.x3x7 67 7477.8 7629.8 147.26 17.838 9 0.0371 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The chi-square difference test is significant. BUT, we have performed a lot of statistical tests. Therefore, a p-value of .037, isn’t that unlike. Let’s look at the other fit measures.
fit.stats <- rbind(fit.stats, fitmeasures(strict.invariance.x3x7, fit.measures = c("chisq", "df", "rmsea", "tli", "cfi", "aic")))
rownames(fit.stats)[5] <- "strict invariance"
round(fit.stats, 4)
chisq df rmsea tli cfi aic
configural 115.8513 48 0.0969 0.8851 0.9234 7484.395
weak invariance 124.0435 54 0.0928 0.8946 0.9209 7480.587
strong 164.1028 60 0.1074 0.8590 0.8825 7508.646
strong with x3 x7 129.4225 58 0.0905 0.8999 0.9194 7477.966
strict invariance 147.2605 67 0.0892 0.9026 0.9094 7477.804
Strict invariance is a marginal improvement in fit over strong invariance where the intercepts for x3 and x7 are free. If we had complete strict invariance, we can say manifest variables are equal reliable across the groups. This means we could just use sum scores for mean comparsions or to perform regressions across groups. We dont’ have full strict invariance, therefore the only scale we could compare in this nature is the textual factor.
When we talk about structural models, we’ll return to this issue and discuss structural invariance, specifically factor variances, factor covariances, and regression coefficents in SEM.