SEM WORKSHOP May 6th, 2014. Njål Foldnes, BI Norwegian Business School.

Latent Variable Interaction Modeling with R

This report contains R code for estimating latent variable interaction with the product indicator approach, using the R package lavaan.

1. Data preparation

Let us first generate a dataset with interaction. We fix the parameters to values found from a real-world dataset

reg_coeff = c(0.4, 0.4, 0.2)
ploadings = c(1, 0.96, 0.78)
aloadings = c(1, 0.57, 0.71, 1.01)
gloadings = c(1, 1.1, 0.77)
covpbcatt = 0.31  #between first-order factors

# residual variances:
varp1 = 0.37
varp2 = 0.67
varp3 = 1.39

vara1 = 0.38
vara2 = 0.72
vara3 = 3.01
vara4 = 2.35

varg1 = 1.29
varg2 = 0.54
varg3 = 1.24

varpbc = 2.71
varatt = 1.79
vargc = 2.85

Next we create multivariate normal data from these fixed parameters.

set.seed(1234)  # so that we can replicate
samplesize = 500
PBC <- rnorm(samplesize, 0, sqrt(varpbc))
ATT <- covpbcatt/varpbc * PBC + sqrt(varatt - covpbcatt/varpbc^2) * rnorm(samplesize)
INT = PBC * ATT
GC <- reg_coeff[1] * PBC + reg_coeff[2] * ATT + reg_coeff[3] * INT + rnorm(samplesize, 
    sd = sqrt(vargc))

p1 <- ploadings[1] * PBC + rnorm(samplesize, sd = sqrt(varp1))
p2 <- ploadings[2] * PBC + rnorm(samplesize, sd = sqrt(varp2))
p3 <- ploadings[3] * PBC + rnorm(samplesize, sd = sqrt(varp3))

a1 <- aloadings[1] * ATT + rnorm(samplesize, sd = sqrt(vara1))
a2 <- aloadings[2] * ATT + rnorm(samplesize, sd = sqrt(vara2))
a3 <- aloadings[3] * ATT + rnorm(samplesize, sd = sqrt(vara3))
a4 <- aloadings[4] * ATT + rnorm(samplesize, sd = sqrt(vara4))

g1 <- gloadings[1] * GC + rnorm(samplesize, sd = sqrt(varg1))
g2 <- gloadings[2] * GC + rnorm(samplesize, sd = sqrt(varg2))
g3 <- gloadings[3] * GC + rnorm(samplesize, sd = sqrt(varg3))

obs = cbind(p1, p2, p3, a1, a2, a3, a4, g1, g2, g3)
round(colMeans(obs), 3)

##     p1     p2     p3     a1     a2     a3     a4     g1     g2     g3 
##  0.001  0.045  0.011 -0.044 -0.097 -0.013 -0.226  0.186  0.169  0.057

It is recommended to mean-center all indicators, create product indicators, and again to mean-center these. (Lin, Guan-Chyun, et al. “Structural equation models of latent interactions: Clarification of orthogonalizing and double-mean-centering strategies.” Structural Equation Modeling (2010): 374-391.)

cn = colnames(obs)
obs = obs - data.frame(matrix(rep(colMeans(obs), nrow(obs)), ncol = ncol(obs), 
    byrow = T))  #mean centering
colnames(obs) = cn  # awkward way to retain variable names
round(colMeans(obs), 3)

## p1 p2 p3 a1 a2 a3 a4 g1 g2 g3 
##  0  0  0  0  0  0  0  0  0  0

2. The measurement model

First, let us test the measurement model for the first-order latent factors. If the measurement model does not fit, it does not make sense to proceed with the interaction model. We use the lavaan package for estimating the models (Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36. URL http://www.jstatsoft.org/v48/i02/.)

library(lavaan)  # first install with install.packages('lavaan')

## This is lavaan 0.5-16
## lavaan is BETA software! Please report any bugs.

meas.model = "GC =~ g1+g2+g3; PBC =~ p1+p2+p3; ATT  =~ a1+a2+a3+a4"
meas.f = cfa(meas.model, data = obs)
summary(meas.f)

## lavaan (0.5-16) converged normally after  49 iterations
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               20.691
##   Degrees of freedom                                32
##   P-value (Chi-square)                           0.938
## 
## Parameter estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
##                    Estimate  Std.err  Z-value  P(>|z|)
## Latent variables:
##   GC =~
##     g1                1.000
##     g2                1.073    0.038   28.056    0.000
##     g3                0.768    0.033   22.947    0.000
##   PBC =~
##     p1                1.000
##     p2                0.975    0.036   27.054    0.000
##     p3                0.726    0.037   19.620    0.000
##   ATT =~
##     a1                1.000
##     a2                0.590    0.041   14.555    0.000
##     a3                0.637    0.068    9.398    0.000
##     a4                0.995    0.070   14.300    0.000
## 
## Covariances:
##   GC ~~
##     PBC               1.632    0.181    9.001    0.000
##     ATT               0.785    0.135    5.807    0.000
##   PBC ~~
##     ATT               0.518    0.114    4.533    0.000
## 
## Variances:
##     g1                1.195    0.112
##     g2                0.555    0.100
##     g3                1.195    0.091
##     p1                0.434    0.077
##     p2                0.664    0.080
##     p3                1.378    0.097
##     a1                0.415    0.088
##     a2                0.665    0.052
##     a3                2.856    0.189
##     a4                2.041    0.157
##     GC                3.800    0.318
##     PBC               2.839    0.217
##     ATT               1.728    0.157

Compare this with the parameter values we fixed in the beginning. Fit statistics indicate reasonably good fit (unexpectedly, since the model is correctly specified for the population from which our sample is randomly drawn)

fitmeasures(meas.f)

##                fmin               chisq                  df 
##               0.021              20.691              32.000 
##              pvalue      baseline.chisq         baseline.df 
##               0.938            2693.170              45.000 
##     baseline.pvalue                 cfi                 tli 
##               0.000               1.000               1.006 
##                nnfi                 rfi                 nfi 
##               1.006               0.989               0.992 
##                pnfi                 ifi                 rni 
##               0.706               1.004               1.000 
##                logl   unrestricted.logl                npar 
##           -8628.896           -8618.550              23.000 
##                 aic                 bic              ntotal 
##           17303.792           17400.728             500.000 
##                bic2               rmsea      rmsea.ci.lower 
##           17327.724               0.000               0.000 
##      rmsea.ci.upper        rmsea.pvalue                 rmr 
##               0.008               1.000               0.072 
##          rmr_nomean                srmr        srmr_bentler 
##               0.072               0.021               0.021 
## srmr_bentler_nomean         srmr_bollen  srmr_bollen_nomean 
##               0.021               0.021               0.021 
##          srmr_mplus   srmr_mplus_nomean               cn_05 
##               0.021               0.021            1117.288 
##               cn_01                 gfi                agfi 
##            1293.488               0.992               0.986 
##                pgfi                 mfi                ecvi 
##               0.577               1.011               0.133

The regression model without interaction: The insensitivity of the chi-square test

The effect of PBC and ATT on GC, assuming no interaction, is estimated by the model

sem.model = "GC =~ g1+g2+g3;PBC =~ p1+p2+p3; ATT  =~ a1+a2+a3+a4; GC ~ PBC + ATT"
sem.f = sem(sem.model, data = obs)
summary(sem.f)

## lavaan (0.5-16) converged normally after  42 iterations
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               20.691
##   Degrees of freedom                                32
##   P-value (Chi-square)                           0.938
## 
## Parameter estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
##                    Estimate  Std.err  Z-value  P(>|z|)
## Latent variables:
##   GC =~
##     g1                1.000
##     g2                1.073    0.038   28.056    0.000
##     g3                0.768    0.033   22.947    0.000
##   PBC =~
##     p1                1.000
##     p2                0.975    0.036   27.054    0.000
##     p3                0.726    0.037   19.620    0.000
##   ATT =~
##     a1                1.000
##     a2                0.590    0.041   14.555    0.000
##     a3                0.637    0.068    9.398    0.000
##     a4                0.995    0.070   14.300    0.000
## 
## Regressions:
##   GC ~
##     PBC               0.521    0.053    9.796    0.000
##     ATT               0.298    0.068    4.371    0.000
## 
## Covariances:
##   PBC ~~
##     ATT               0.518    0.114    4.533    0.000
## 
## Variances:
##     g1                1.195    0.112
##     g2                0.555    0.100
##     g3                1.195    0.091
##     p1                0.434    0.077
##     p2                0.664    0.080
##     p3                1.378    0.097
##     a1                0.415    0.088
##     a2                0.665    0.052
##     a3                2.856    0.189
##     a4                2.041    0.157
##     GC                2.716    0.235
##     PBC               2.839    0.217
##     ATT               1.728    0.157

Unfortunately, the chi-square goodness-of-fit test is not capable of indicating the misspecification of excluding the interaction (Mooijaart, A., & Satorra, A. (2009). On Insensitivity of the Chi-Square Model Test to Nonlinear Misspecification in Structural Equation Models. Psychometrika, 74(3), 443–455.) Therefore, if a proposed linear model fits the data well according to the chi-square test, it does not necessarily follow that the underlying model is linear.

3. Estimating the interaction model with product indicators

Our main concern is to test whether the interaction parameter \( \gamma_3 \) in \[ GC = \gamma_1 PBC + \gamma_2 ATT + \gamma_3 PBC \cdot ATT + \zeta \] significantly differs from zero. We now employ the product indicator approach to estimate \( \gamma_3 \).

Create product indicators

Let us create and mean-center all twelve product indicators.

# create product indicator P11 from p1 and a1, and so on
prod = obs$p1 * obs$a1
obs$prod11 = prod - mean(prod)
prod = obs$p1 * obs$a2
obs$prod12 = prod - mean(prod)
prod = obs$p1 * obs$a3
obs$prod13 = prod - mean(prod)
prod = obs$p1 * obs$a4
obs$prod14 = prod - mean(prod)
prod = obs$p2 * obs$a1
obs$prod21 = prod - mean(prod)
prod = obs$p2 * obs$a2
obs$prod22 = prod - mean(prod)
prod = obs$p2 * obs$a3
obs$prod23 = prod - mean(prod)
prod = obs$p2 * obs$a4
obs$prod24 = prod - mean(prod)
prod = obs$p3 * obs$a1
obs$prod31 = prod - mean(prod)
prod = obs$p3 * obs$a2
obs$prod32 = prod - mean(prod)
prod = obs$p3 * obs$a3
obs$prod33 = prod - mean(prod)
prod = obs$p3 * obs$a4
obs$prod34 = prod - mean(prod)
head(obs, 2)

##        p1      p2      p3      a1      a2      a3      a4      g1      g2
## 1 -0.8972 -2.7492 -3.1713  0.8809 -0.9212 -0.8941 -0.7989 -3.9499 -3.6146
## 2 -0.3747  0.3123  0.9813 -1.2064 -1.5410 -2.5108 -2.7350  0.1302 -0.3057
##       g3   prod11 prod12 prod13  prod14  prod21  prod22 prod23 prod24
## 1 -2.207 -1.28853 0.5480 0.5931 0.06248 -2.9145  2.2534  2.202  1.557
## 2 -0.345 -0.04614 0.2989 0.7317 0.37059 -0.8695 -0.7607 -1.041 -1.494
##   prod31 prod32 prod33 prod34
## 1 -3.194  2.700  2.705  1.971
## 2 -1.584 -1.734 -2.594 -3.247

Choose a product indicator set

Sometimes the total number of product indicators is too large, and the researcher must choose a subset of product indicators. Let us try to estimate the interaction model by choosing the following four product indicators \( p_1a_1, p_1a_2, p_2a_3, p_3a_4 \). Product indicators that share a constituent first-order indicator (i.e., \( p_1a_1 \) and \( p_1a_2 \) share \( p_1 \)) have residual covariance.
alt text

Estimate the interaction model

interaction.model.1 = "GC =~ g1+g2+g3; PBC =~ p1+p2+p3; ATT  =~ a1+a2+a3+a4; INT =~ prod11+prod12+prod23+prod34; GC ~ PBC + ATT+INT; prod11 ~~prod12"
int1.f = sem(interaction.model.1, data = obs)
summary(int1.f)

## lavaan (0.5-16) converged normally after  71 iterations
## 
##   Number of observations                           500
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic               47.446
##   Degrees of freedom                                70
##   P-value (Chi-square)                           0.982
## 
## Parameter estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
##                    Estimate  Std.err  Z-value  P(>|z|)
## Latent variables:
##   GC =~
##     g1                1.000
##     g2                1.073    0.038   28.207    0.000
##     g3                0.767    0.033   22.908    0.000
##   PBC =~
##     p1                1.000
##     p2                0.975    0.036   27.071    0.000
##     p3                0.726    0.037   19.617    0.000
##   ATT =~
##     a1                1.000
##     a2                0.587    0.040   14.555    0.000
##     a3                0.633    0.067    9.384    0.000
##     a4                0.988    0.069   14.289    0.000
##   INT =~
##     prod11            1.000
##     prod12            0.557    0.064    8.660    0.000
##     prod23            0.769    0.146    5.270    0.000
##     prod34            0.868    0.159    5.474    0.000
## 
## Regressions:
##   GC ~
##     PBC               0.473    0.054    8.845    0.000
##     ATT               0.315    0.067    4.677    0.000
##     INT               0.224    0.062    3.636    0.000
## 
## Covariances:
##   prod11 ~~
##     prod12            0.430    0.345    1.246    0.213
##   PBC ~~
##     ATT               0.519    0.115    4.531    0.000
##     INT               0.557    0.182    3.055    0.002
##   ATT ~~
##     INT              -0.026    0.144   -0.181    0.857
## 
## Variances:
##     g1                1.194    0.111
##     g2                0.548    0.099
##     g3                1.202    0.091
##     p1                0.433    0.077
##     p2                0.664    0.080
##     p3                1.379    0.097
##     a1                0.402    0.088
##     a2                0.668    0.052
##     a3                2.860    0.189
##     a4                2.052    0.157
##     prod11            2.939    0.615
##     prod12            2.200    0.265
##     prod23            9.724    0.715
##     prod34            8.395    0.691
##     GC                2.552    0.227
##     PBC               2.839    0.217
##     ATT               1.741    0.158
##     INT               3.383    0.685

With this choice of product indicator configuration \( \hat{\gamma}_3= \) 0.2244. Note that the true underlying population parameter is \( \gamma_3= \) 0.2

The estimated interaction effect is contingent upon the choice of product indicators

Let us choose some other reasonable set of product indicators: \( p_1a_4, p_2a_1, p_3a_2 \).

interaction.model.2 = "GC =~ g1+g2+g3; PBC =~ p1+p2+p3; ATT  =~ a1+a2+a3+a4; INT =~ prod14+prod21+prod32; GC ~ PBC + ATT+INT;"
int2.f = sem(interaction.model.2, data = obs)
subset(parameterestimates(int2.f), lhs == "GC" & op == "~")

##   lhs op rhs   est    se     z pvalue ci.lower ci.upper
## 1  GC  ~ PBC 0.484 0.054 9.035  0.000    0.379    0.589
## 2  GC  ~ ATT 0.325 0.068 4.760  0.000    0.191    0.458
## 3  GC  ~ INT 0.159 0.048 3.317  0.001    0.065    0.254

Now the estimated effect, although still highly significant, is different: \( \hat{\gamma}_3= \) 0.1593. So the product indicator approach inherently contains ambiguity (Foldnes, N., & Hagtvet, K. A. (2014). The choice of product indicators in latent variable interaction models: Post hoc analyses. Psychological Methods.)