Modeling in lavaan

Installing and librarying lavaan

Lavaan is the most widely used program to conduct SEM modeling in R. Since this is not a default package, we need to install it.

install.packages("lavaan")

You only need to install a program once, but you will need to library the package each session you wish to use it in. We library lavaan using the following function:

library(lavaan)
## This is lavaan 0.6-9
## lavaan is FREE software! Please report any bugs.

Notice that the install command needs the package in quotes, but the library command does not.

Importing data

Once your programs are libraried, you need to read in your dataset. Using the file.choose() function, we can browes our computer for the .csv file we need.

dat <- read.csv(file.choose())

Once our data is read in, we can use head to take a look at it:

head(dat)
##     ID gender age class agen_1 agen_2 agen_3 achiev_1 achiev_2 achiev_3 motiv_1
## 1 1028 female  14     A      2      2      3        3        4        3       3
## 2 1049   male  13     A      3      3      2        2        3        3       3
## 3 1052 female  14     B      3      3      3        2        2        3       2
## 4 1063   male  14     A      3      3      3        2        2        3       3
## 5 1119   male  13     A      3      3      3        3        4        4       2
## 6 1142 female  13     A      3      3      3        3        4        3       2
##   motiv_2 motiv_3
## 1       2       3
## 2       4       4
## 3       3       3
## 4       3       2
## 5       3       2
## 6       4       4

This isn’t necessairy, but it’s always good to check that your data looks like what you expect it to.

Modeling with lavaan

In order to model with lavaan, we will need to use lavaan syntax to define a model. Our model will look like this:

SEM Model Diagram

Item Loadings

In order to designate item loadings, we use the symbols =~ . In the following example, we can see that our latent constructs of Agency, Achievement, and Motivation are being measured by three indicators each.

mod1 <- '
Agency =~ agen_1 +
          agen_2 +
          agen_3

Achiev =~ achiev_1 +
          achiev_2 +
          achiev_3

Motiv  =~ motiv_1 +
          motiv_2 +
          motiv_3
'

For Item loadings in Mplus, we use BY statements instead of =~. Mplus also does not want you to use the plus ( + ) symbol between indicators, but you will need to end every list of construct indicators with a semicolon( ; ). Here is the above model in Mplus:

MODEL:
agency BY agen_1 
          agen_2 
          agen_3;

achiev BY achiev_1 
          achiev_2 
          achiev_3;

motiv  BY motiv_1 
          motiv_2 
          motiv_3;

IMPORTANT: R and lavaan is case sensitive, so if your variables are named “Agen_1” in your dataset, but you use “agen_1” in your model, lavaan will say that no such variable exists! Mplus, on the other hand, is not case sensitive.

Modeling Covariance

Now, lavaan (and Mplus) will automatically estimate covariances among all latent constructs, but if we can also explicitly state those in our model. Lavaan uses the symbols ~~ to specify a covariance, whether it’s between latent variables or observed variables.

mod1 <- '
# Loadings
Agency =~ agen_1 +
          agen_2 +
          agen_3

Achiev =~ achiev_1 +
          achiev_2 +
          achiev_3

Motiv  =~ motiv_1 +
          motiv_2 +
          motiv_3
          
# Latent Covariances
Agency ~~ Achiev
Agency ~~ Motiv
Achiev ~~ Motiv
'

For Mplus, you will need to use the word WITH to explicitly state covariacnes in your model. Below is the same model as above in Mplus syntax (and don’t forget the semicolons!):

MODEL:
! Loadings
agency BY agen_1 
          agen_2 
          agen_3;

achiev BY achiev_1 
          achiev_2 
          achiev_3;

motiv  BY motiv_1 
          motiv_2 
          motiv_3;
          
! Latent Covariances
agency WITH achiev;
agency WITH motiv;
achiev WITH motiv;

Modeling Variance

Like covariances, lavaan and Mplus will automatically estimate latent variances (or fix them if using fixed factor). Lavaan will use the same ~~ syntax for covariance as it will for variance. For example, we’ll explicitly model our latent variance.

mod1 <- '
# Loadings
Agency =~ agen_1 +
          agen_2 +
          agen_3

Achiev =~ achiev_1 +
          achiev_2 +
          achiev_3

Motiv  =~ motiv_1 +
          motiv_2 +
          motiv_3
          
# Latent Covariances
Agency ~~ Achiev
Agency ~~ Motiv
Achiev ~~ Motiv

# Latent Variance
Agency ~~ Agency
Achiev ~~ Achiev
Motiv  ~~ Motiv
'

In Mplus, on the other hand, variances are explicitly estimated by simply listing the name of the variable (followed by our semicolon!).

MODEL:
! Loadings
agency BY agen_1 
          agen_2 
          agen_3;

achiev BY achiev_1 
          achiev_2 
          achiev_3;

motiv  BY motiv_1 
          motiv_2 
          motiv_3;
          
! Latent Covariances
agency WITH achiev;
agency WITH motiv;
achiev WITH motiv;

! Latent Variances
agency;
achiev;
motiv;

Modeling Means

Now we will add the mean information to our model. As we discussed before, the mean can be estimated in our model when we regress it on to a constant without any predictors, and the easiest constant to use is 1. The syntax for doing this in lavaan uses the regression language to model this as Variable ~ 1. Below you’ll see we added latent and manifest means to our model. One feature of this method of identification is that our latent means need to be fixed to 0. In lavaan, we can fix a parameter by multiplying the parameter by that fixed parameter. For example, in our case, Agency ~ 0*1.

mod1 <- '
# Loadings
Agency =~ agen_1 +
          agen_2 +
          agen_3

Achiev =~ achiev_1 +
          achiev_2 +
          achiev_3

Motiv  =~ motiv_1 +
          motiv_2 +
          motiv_3
          
# Latent Covariances
Agency ~~ Achiev
Agency ~~ Motiv
Achiev ~~ Motiv

# Latent Variance
Agency ~~ Agency
Achiev ~~ Achiev
Motiv  ~~ Motiv

# Latent Means
Agency ~ 0*1
Achiev ~ 0*1
Motiv  ~ 0*1

# Indicator Means
agen_1 ~ 1
agen_2 ~ 1
agen_3 ~ 1
achiev_1 ~ 1
achiev_2 ~ 1
achiev_3 ~ 1
motiv_1 ~ 1
motiv_2 ~ 1
motiv_3 ~ 1
'

With Mplus, we can explicitly estimate means by simply placing the names of the variables we desire mean information for in square brackets, like [this];. In order to fix a parameter to 0 like we had to for our latent mean, we use the “at” symbol (@) inside the brackets like []; to say that the parameter is fixed “at” zero. Here is the above model in Mplus syntax.

MODEL:
! Loadings
agency BY agen_1 
          agen_2 
          agen_3;

achiev BY achiev_1 
          achiev_2 
          achiev_3;

motiv  BY motiv_1 
          motiv_2 
          motiv_3;
          
! Latent Covariances
agency WITH achiev;
agency WITH motiv;
achiev WITH motiv;

! Latent Variances
agency;
achiev;
motiv;

! Latent Means
[agency@0];
[achiev@0];
[motiv@0];

! Indicator Means
[agen_1];
[agen_2];
[agen_3];
[achiev_1];
[achiev_2];
[achiev_3];
[motiv_1];
[motiv_2];
[motiv_3];

Evaluating Model

In order to evaluate our model, we will use one of a few lavaan functions. For a basic CFA, we will used the cfa() function. We

fit1<- cfa(model=mod1,         # <- Our model defined previously 
           data=dat,           # <- Our data set
           estimator="ML",     # <- ML is the default estimator
           meanstructure=TRUE) # <- We tell lavaan to evaluate the mean structure too

Looking at model results

Once we have evaluated our model, we can use the summary() function to look at the output of our model analysis.

summary(fit1,              # <- our fit object from cfa()
        fit.measures=TRUE, # <- get our model fit information
        standardized=T)    # <- include standardized results
## lavaan 0.6-9 ended normally after 42 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        30
##                                                       
##   Number of observations                           500
##                                                       
## Model Test User Model:
##                                                       
##   Test statistic                                30.876
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.157
## 
## Model Test Baseline Model:
## 
##   Test statistic                               692.980
##   Degrees of freedom                                36
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.990
##   Tucker-Lewis Index (TLI)                       0.984
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -4597.202
##   Loglikelihood unrestricted model (H1)      -4581.764
##                                                       
##   Akaike (AIC)                                9254.404
##   Bayesian (BIC)                              9380.842
##   Sample-size adjusted Bayesian (BIC)         9285.620
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.024
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.046
##   P-value RMSEA <= 0.05                          0.977
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.026
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Agency =~                                                             
##     agen_1            1.000                               0.415    0.672
##     agen_2            0.891    0.105    8.469    0.000    0.369    0.551
##     agen_3            0.866    0.104    8.297    0.000    0.359    0.532
##   Achiev =~                                                             
##     achiev_1          1.000                               0.420    0.568
##     achiev_2          1.136    0.156    7.273    0.000    0.477    0.576
##     achiev_3          1.041    0.147    7.091    0.000    0.437    0.531
##   Motiv =~                                                              
##     motiv_1           1.000                               0.475    0.651
##     motiv_2           0.961    0.103    9.340    0.000    0.456    0.659
##     motiv_3           0.791    0.098    8.061    0.000    0.376    0.493
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Agency ~~                                                             
##     Achiev            0.099    0.017    5.972    0.000    0.566    0.566
##     Motiv             0.135    0.018    7.345    0.000    0.684    0.684
##   Achiev ~~                                                             
##     Motiv             0.110    0.019    5.846    0.000    0.550    0.550
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     Agency            0.000                               0.000    0.000
##     Achiev            0.000                               0.000    0.000
##     Motiv             0.000                               0.000    0.000
##    .agen_1            3.118    0.028  113.090    0.000    3.118    5.058
##    .agen_2            3.110    0.030  103.678    0.000    3.110    4.637
##    .agen_3            3.078    0.030  101.932    0.000    3.078    4.559
##    .achiev_1          3.068    0.033   92.725    0.000    3.068    4.147
##    .achiev_2          2.988    0.037   80.560    0.000    2.988    3.603
##    .achiev_3          3.210    0.037   87.178    0.000    3.210    3.899
##    .motiv_1           2.854    0.033   87.439    0.000    2.854    3.910
##    .motiv_2           3.192    0.031  103.114    0.000    3.192    4.611
##    .motiv_3           3.016    0.034   88.572    0.000    3.016    3.961
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     Agency            0.172    0.026    6.507    0.000    1.000    1.000
##     Achiev            0.176    0.034    5.139    0.000    1.000    1.000
##     Motiv             0.226    0.035    6.450    0.000    1.000    1.000
##    .agen_1            0.208    0.022    9.642    0.000    0.208    0.548
##    .agen_2            0.313    0.025   12.584    0.000    0.313    0.697
##    .agen_3            0.327    0.025   12.909    0.000    0.327    0.717
##    .achiev_1          0.371    0.033   11.280    0.000    0.371    0.678
##    .achiev_2          0.460    0.041   11.091    0.000    0.460    0.669
##    .achiev_3          0.487    0.040   12.094    0.000    0.487    0.718
##    .motiv_1           0.307    0.029   10.583    0.000    0.307    0.577
##    .motiv_2           0.271    0.026   10.349    0.000    0.271    0.566
##    .motiv_3           0.439    0.032   13.598    0.000    0.439    0.757

Now we can move onto Mplus!