SEM WORKSHOP May 6th, 2014. Njål Foldnes, BI Norwegian Business School.
This report contains R code for estimating latent variable interaction with the product indicator approach, using the R package lavaan.
Let us first generate a dataset with interaction. We fix the parameters to values found from a real-world dataset
reg_coeff = c(0.4, 0.4, 0.2)
ploadings = c(1, 0.96, 0.78)
aloadings = c(1, 0.57, 0.71, 1.01)
gloadings = c(1, 1.1, 0.77)
covpbcatt = 0.31 #between first-order factors
# residual variances:
varp1 = 0.37
varp2 = 0.67
varp3 = 1.39
vara1 = 0.38
vara2 = 0.72
vara3 = 3.01
vara4 = 2.35
varg1 = 1.29
varg2 = 0.54
varg3 = 1.24
varpbc = 2.71
varatt = 1.79
vargc = 2.85
Next we create multivariate normal data from these fixed parameters.
set.seed(1234) # so that we can replicate
samplesize = 500
PBC <- rnorm(samplesize, 0, sqrt(varpbc))
ATT <- covpbcatt/varpbc * PBC + sqrt(varatt - covpbcatt/varpbc^2) * rnorm(samplesize)
INT = PBC * ATT
GC <- reg_coeff[1] * PBC + reg_coeff[2] * ATT + reg_coeff[3] * INT + rnorm(samplesize,
sd = sqrt(vargc))
p1 <- ploadings[1] * PBC + rnorm(samplesize, sd = sqrt(varp1))
p2 <- ploadings[2] * PBC + rnorm(samplesize, sd = sqrt(varp2))
p3 <- ploadings[3] * PBC + rnorm(samplesize, sd = sqrt(varp3))
a1 <- aloadings[1] * ATT + rnorm(samplesize, sd = sqrt(vara1))
a2 <- aloadings[2] * ATT + rnorm(samplesize, sd = sqrt(vara2))
a3 <- aloadings[3] * ATT + rnorm(samplesize, sd = sqrt(vara3))
a4 <- aloadings[4] * ATT + rnorm(samplesize, sd = sqrt(vara4))
g1 <- gloadings[1] * GC + rnorm(samplesize, sd = sqrt(varg1))
g2 <- gloadings[2] * GC + rnorm(samplesize, sd = sqrt(varg2))
g3 <- gloadings[3] * GC + rnorm(samplesize, sd = sqrt(varg3))
obs = cbind(p1, p2, p3, a1, a2, a3, a4, g1, g2, g3)
round(colMeans(obs), 3)
## p1 p2 p3 a1 a2 a3 a4 g1 g2 g3
## 0.001 0.045 0.011 -0.044 -0.097 -0.013 -0.226 0.186 0.169 0.057
It is recommended to mean-center all indicators, create product indicators, and again to mean-center these. (Lin, Guan-Chyun, et al. “Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies.” Structural Equation Modeling (2010): 374-391.)
cn = colnames(obs)
obs = obs - data.frame(matrix(rep(colMeans(obs), nrow(obs)), ncol = ncol(obs),
byrow = T)) #mean centering
colnames(obs) = cn # awkward way to retain variable names
round(colMeans(obs), 3)
## p1 p2 p3 a1 a2 a3 a4 g1 g2 g3
## 0 0 0 0 0 0 0 0 0 0
First, let us test the measurement model for the first-order latent factors. If the measurement model does not fit, it does not make sense to proceed with the interaction model. We use the lavaan package for estimating the models (Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/.)
library(lavaan) # first install with install.packages('lavaan')
## This is lavaan 0.5-16
## lavaan is BETA software! Please report any bugs.
meas.model = "GC =~ g1+g2+g3; PBC =~ p1+p2+p3; ATT =~ a1+a2+a3+a4"
meas.f = cfa(meas.model, data = obs)
summary(meas.f)
## lavaan (0.5-16) converged normally after 49 iterations
##
## Number of observations 500
##
## Estimator ML
## Minimum Function Test Statistic 20.691
## Degrees of freedom 32
## P-value (Chi-square) 0.938
##
## Parameter estimates:
##
## Information Expected
## Standard Errors Standard
##
## Estimate Std.err Z-value P(>|z|)
## Latent variables:
## GC =~
## g1 1.000
## g2 1.073 0.038 28.056 0.000
## g3 0.768 0.033 22.947 0.000
## PBC =~
## p1 1.000
## p2 0.975 0.036 27.054 0.000
## p3 0.726 0.037 19.620 0.000
## ATT =~
## a1 1.000
## a2 0.590 0.041 14.555 0.000
## a3 0.637 0.068 9.398 0.000
## a4 0.995 0.070 14.300 0.000
##
## Covariances:
## GC ~~
## PBC 1.632 0.181 9.001 0.000
## ATT 0.785 0.135 5.807 0.000
## PBC ~~
## ATT 0.518 0.114 4.533 0.000
##
## Variances:
## g1 1.195 0.112
## g2 0.555 0.100
## g3 1.195 0.091
## p1 0.434 0.077
## p2 0.664 0.080
## p3 1.378 0.097
## a1 0.415 0.088
## a2 0.665 0.052
## a3 2.856 0.189
## a4 2.041 0.157
## GC 3.800 0.318
## PBC 2.839 0.217
## ATT 1.728 0.157
Compare this with the parameter values we fixed in the beginning. Fit statistics indicate reasonably good fit (unexpectedly, since the model is correctly specified for the population from which our sample is randomly drawn)
fitmeasures(meas.f)
## fmin chisq df
## 0.021 20.691 32.000
## pvalue baseline.chisq baseline.df
## 0.938 2693.170 45.000
## baseline.pvalue cfi tli
## 0.000 1.000 1.006
## nnfi rfi nfi
## 1.006 0.989 0.992
## pnfi ifi rni
## 0.706 1.004 1.000
## logl unrestricted.logl npar
## -8628.896 -8618.550 23.000
## aic bic ntotal
## 17303.792 17400.728 500.000
## bic2 rmsea rmsea.ci.lower
## 17327.724 0.000 0.000
## rmsea.ci.upper rmsea.pvalue rmr
## 0.008 1.000 0.072
## rmr_nomean srmr srmr_bentler
## 0.072 0.021 0.021
## srmr_bentler_nomean srmr_bollen srmr_bollen_nomean
## 0.021 0.021 0.021
## srmr_mplus srmr_mplus_nomean cn_05
## 0.021 0.021 1117.288
## cn_01 gfi agfi
## 1293.488 0.992 0.986
## pgfi mfi ecvi
## 0.577 1.011 0.133
The effect of PBC and ATT on GC, assuming no interaction, is estimated by the model
sem.model = "GC =~ g1+g2+g3;PBC =~ p1+p2+p3; ATT =~ a1+a2+a3+a4; GC ~ PBC + ATT"
sem.f = sem(sem.model, data = obs)
summary(sem.f)
## lavaan (0.5-16) converged normally after 42 iterations
##
## Number of observations 500
##
## Estimator ML
## Minimum Function Test Statistic 20.691
## Degrees of freedom 32
## P-value (Chi-square) 0.938
##
## Parameter estimates:
##
## Information Expected
## Standard Errors Standard
##
## Estimate Std.err Z-value P(>|z|)
## Latent variables:
## GC =~
## g1 1.000
## g2 1.073 0.038 28.056 0.000
## g3 0.768 0.033 22.947 0.000
## PBC =~
## p1 1.000
## p2 0.975 0.036 27.054 0.000
## p3 0.726 0.037 19.620 0.000
## ATT =~
## a1 1.000
## a2 0.590 0.041 14.555 0.000
## a3 0.637 0.068 9.398 0.000
## a4 0.995 0.070 14.300 0.000
##
## Regressions:
## GC ~
## PBC 0.521 0.053 9.796 0.000
## ATT 0.298 0.068 4.371 0.000
##
## Covariances:
## PBC ~~
## ATT 0.518 0.114 4.533 0.000
##
## Variances:
## g1 1.195 0.112
## g2 0.555 0.100
## g3 1.195 0.091
## p1 0.434 0.077
## p2 0.664 0.080
## p3 1.378 0.097
## a1 0.415 0.088
## a2 0.665 0.052
## a3 2.856 0.189
## a4 2.041 0.157
## GC 2.716 0.235
## PBC 2.839 0.217
## ATT 1.728 0.157
Unfortunately, the chi-square goodness-of-fit test is not capable of indicating the misspecification of excluding the interaction (Mooijaart, A., & Satorra, A. (2009). On Insensitivity of the Chi-Square Model Test to Nonlinear Misspecification in Structural Equation Models. Psychometrika, 74(3), 443–455.) Therefore, if a proposed linear model fits the data well according to the chi-square test, it does not necessarily follow that the underlying model is linear.
Our main concern is to test whether the interaction parameter \( \gamma_3 \) in \[ GC = \gamma_1 PBC + \gamma_2 ATT + \gamma_3 PBC \cdot ATT + \zeta \] significantly differs from zero. We now employ the product indicator approach to estimate \( \gamma_3 \).
Let us create and mean-center all twelve product indicators.
# create product indicator P11 from p1 and a1, and so on
prod = obs$p1 * obs$a1
obs$prod11 = prod - mean(prod)
prod = obs$p1 * obs$a2
obs$prod12 = prod - mean(prod)
prod = obs$p1 * obs$a3
obs$prod13 = prod - mean(prod)
prod = obs$p1 * obs$a4
obs$prod14 = prod - mean(prod)
prod = obs$p2 * obs$a1
obs$prod21 = prod - mean(prod)
prod = obs$p2 * obs$a2
obs$prod22 = prod - mean(prod)
prod = obs$p2 * obs$a3
obs$prod23 = prod - mean(prod)
prod = obs$p2 * obs$a4
obs$prod24 = prod - mean(prod)
prod = obs$p3 * obs$a1
obs$prod31 = prod - mean(prod)
prod = obs$p3 * obs$a2
obs$prod32 = prod - mean(prod)
prod = obs$p3 * obs$a3
obs$prod33 = prod - mean(prod)
prod = obs$p3 * obs$a4
obs$prod34 = prod - mean(prod)
head(obs, 2)
## p1 p2 p3 a1 a2 a3 a4 g1 g2
## 1 -0.8972 -2.7492 -3.1713 0.8809 -0.9212 -0.8941 -0.7989 -3.9499 -3.6146
## 2 -0.3747 0.3123 0.9813 -1.2064 -1.5410 -2.5108 -2.7350 0.1302 -0.3057
## g3 prod11 prod12 prod13 prod14 prod21 prod22 prod23 prod24
## 1 -2.207 -1.28853 0.5480 0.5931 0.06248 -2.9145 2.2534 2.202 1.557
## 2 -0.345 -0.04614 0.2989 0.7317 0.37059 -0.8695 -0.7607 -1.041 -1.494
## prod31 prod32 prod33 prod34
## 1 -3.194 2.700 2.705 1.971
## 2 -1.584 -1.734 -2.594 -3.247
Sometimes the total number of product indicators is too large, and the researcher must choose a subset of product indicators. Let us try to estimate the interaction model by choosing the following four product indicators \( p_1a_1, p_1a_2, p_2a_3, p_3a_4 \). Product indicators that share a constituent first-order indicator (i.e., \( p_1a_1 \) and \( p_1a_2 \) share \( p_1 \)) have residual covariance.
interaction.model.1 = "GC =~ g1+g2+g3; PBC =~ p1+p2+p3; ATT =~ a1+a2+a3+a4; INT =~ prod11+prod12+prod23+prod34; GC ~ PBC + ATT+INT; prod11 ~~prod12"
int1.f = sem(interaction.model.1, data = obs)
summary(int1.f)
## lavaan (0.5-16) converged normally after 71 iterations
##
## Number of observations 500
##
## Estimator ML
## Minimum Function Test Statistic 47.446
## Degrees of freedom 70
## P-value (Chi-square) 0.982
##
## Parameter estimates:
##
## Information Expected
## Standard Errors Standard
##
## Estimate Std.err Z-value P(>|z|)
## Latent variables:
## GC =~
## g1 1.000
## g2 1.073 0.038 28.207 0.000
## g3 0.767 0.033 22.908 0.000
## PBC =~
## p1 1.000
## p2 0.975 0.036 27.071 0.000
## p3 0.726 0.037 19.617 0.000
## ATT =~
## a1 1.000
## a2 0.587 0.040 14.555 0.000
## a3 0.633 0.067 9.384 0.000
## a4 0.988 0.069 14.289 0.000
## INT =~
## prod11 1.000
## prod12 0.557 0.064 8.660 0.000
## prod23 0.769 0.146 5.270 0.000
## prod34 0.868 0.159 5.474 0.000
##
## Regressions:
## GC ~
## PBC 0.473 0.054 8.845 0.000
## ATT 0.315 0.067 4.677 0.000
## INT 0.224 0.062 3.636 0.000
##
## Covariances:
## prod11 ~~
## prod12 0.430 0.345 1.246 0.213
## PBC ~~
## ATT 0.519 0.115 4.531 0.000
## INT 0.557 0.182 3.055 0.002
## ATT ~~
## INT -0.026 0.144 -0.181 0.857
##
## Variances:
## g1 1.194 0.111
## g2 0.548 0.099
## g3 1.202 0.091
## p1 0.433 0.077
## p2 0.664 0.080
## p3 1.379 0.097
## a1 0.402 0.088
## a2 0.668 0.052
## a3 2.860 0.189
## a4 2.052 0.157
## prod11 2.939 0.615
## prod12 2.200 0.265
## prod23 9.724 0.715
## prod34 8.395 0.691
## GC 2.552 0.227
## PBC 2.839 0.217
## ATT 1.741 0.158
## INT 3.383 0.685
With this choice of product indicator configuration \( \hat{\gamma}_3= \) 0.2244. Note that the true underlying population parameter is \( \gamma_3= \) 0.2
Let us choose some other reasonable set of product indicators: \( p_1a_4, p_2a_1, p_3a_2 \).
interaction.model.2 = "GC =~ g1+g2+g3; PBC =~ p1+p2+p3; ATT =~ a1+a2+a3+a4; INT =~ prod14+prod21+prod32; GC ~ PBC + ATT+INT;"
int2.f = sem(interaction.model.2, data = obs)
subset(parameterestimates(int2.f), lhs == "GC" & op == "~")
## lhs op rhs est se z pvalue ci.lower ci.upper
## 1 GC ~ PBC 0.484 0.054 9.035 0.000 0.379 0.589
## 2 GC ~ ATT 0.325 0.068 4.760 0.000 0.191 0.458
## 3 GC ~ INT 0.159 0.048 3.317 0.001 0.065 0.254
Now the estimated effect, although still highly significant, is different: \( \hat{\gamma}_3= \) 0.1593. So the product indicator approach inherently contains ambiguity (Foldnes, N., & Hagtvet, K. A. (2014). The choice of product indicators in latent variable interaction models: Post hoc analyses. Psychological Methods.)