Lavaan is the most widely used program to conduct SEM modeling in R. Since this is not a default package, we need to install it.
install.packages("lavaan")
You only need to install a program once, but you will need to library the package each session you wish to use it in. We library lavaan using the following function:
library(lavaan)
## This is lavaan 0.6-9
## lavaan is FREE software! Please report any bugs.
Notice that the install command needs the package in quotes, but the library command does not.
Once your programs are libraried, you need to read in your dataset. Using the file.choose() function, we can browes our computer for the .csv file we need.
dat <- read.csv(file.choose())
Once our data is read in, we can use head to take a look at it:
head(dat)
## ID gender age class agen_1 agen_2 agen_3 achiev_1 achiev_2 achiev_3 motiv_1
## 1 1028 female 14 A 2 2 3 3 4 3 3
## 2 1049 male 13 A 3 3 2 2 3 3 3
## 3 1052 female 14 B 3 3 3 2 2 3 2
## 4 1063 male 14 A 3 3 3 2 2 3 3
## 5 1119 male 13 A 3 3 3 3 4 4 2
## 6 1142 female 13 A 3 3 3 3 4 3 2
## motiv_2 motiv_3
## 1 2 3
## 2 4 4
## 3 3 3
## 4 3 2
## 5 3 2
## 6 4 4
This isn’t necessairy, but it’s always good to check that your data looks like what you expect it to.
In order to model with lavaan, we will need to use lavaan syntax to define a model. Our model will look like this:
SEM Model Diagram
In order to designate item loadings, we use the symbols =~ . In the following example, we can see that our latent constructs of Agency, Achievement, and Motivation are being measured by three indicators each.
mod1 <- '
Agency =~ agen_1 +
agen_2 +
agen_3
Achiev =~ achiev_1 +
achiev_2 +
achiev_3
Motiv =~ motiv_1 +
motiv_2 +
motiv_3
'
For Item loadings in Mplus, we use BY statements instead of =~. Mplus also does not want you to use the plus ( + ) symbol between indicators, but you will need to end every list of construct indicators with a semicolon( ; ). Here is the above model in Mplus:
MODEL:
agency BY agen_1
agen_2
agen_3;
achiev BY achiev_1
achiev_2
achiev_3;
motiv BY motiv_1
motiv_2
motiv_3;
IMPORTANT: R and lavaan is case sensitive, so if your variables are named “Agen_1” in your dataset, but you use “agen_1” in your model, lavaan will say that no such variable exists! Mplus, on the other hand, is not case sensitive.
Now, lavaan (and Mplus) will automatically estimate covariances among all latent constructs, but if we can also explicitly state those in our model. Lavaan uses the symbols ~~ to specify a covariance, whether it’s between latent variables or observed variables.
mod1 <- '
# Loadings
Agency =~ agen_1 +
agen_2 +
agen_3
Achiev =~ achiev_1 +
achiev_2 +
achiev_3
Motiv =~ motiv_1 +
motiv_2 +
motiv_3
# Latent Covariances
Agency ~~ Achiev
Agency ~~ Motiv
Achiev ~~ Motiv
'
For Mplus, you will need to use the word WITH to explicitly state covariacnes in your model. Below is the same model as above in Mplus syntax (and don’t forget the semicolons!):
MODEL:
! Loadings
agency BY agen_1
agen_2
agen_3;
achiev BY achiev_1
achiev_2
achiev_3;
motiv BY motiv_1
motiv_2
motiv_3;
! Latent Covariances
agency WITH achiev;
agency WITH motiv;
achiev WITH motiv;
Like covariances, lavaan and Mplus will automatically estimate latent variances (or fix them if using fixed factor). Lavaan will use the same ~~ syntax for covariance as it will for variance. For example, we’ll explicitly model our latent variance.
mod1 <- '
# Loadings
Agency =~ agen_1 +
agen_2 +
agen_3
Achiev =~ achiev_1 +
achiev_2 +
achiev_3
Motiv =~ motiv_1 +
motiv_2 +
motiv_3
# Latent Covariances
Agency ~~ Achiev
Agency ~~ Motiv
Achiev ~~ Motiv
# Latent Variance
Agency ~~ Agency
Achiev ~~ Achiev
Motiv ~~ Motiv
'
In Mplus, on the other hand, variances are explicitly estimated by simply listing the name of the variable (followed by our semicolon!).
MODEL:
! Loadings
agency BY agen_1
agen_2
agen_3;
achiev BY achiev_1
achiev_2
achiev_3;
motiv BY motiv_1
motiv_2
motiv_3;
! Latent Covariances
agency WITH achiev;
agency WITH motiv;
achiev WITH motiv;
! Latent Variances
agency;
achiev;
motiv;
Now we will add the mean information to our model. As we discussed before, the mean can be estimated in our model when we regress it on to a constant without any predictors, and the easiest constant to use is 1. The syntax for doing this in lavaan uses the regression language to model this as Variable ~ 1. Below you’ll see we added latent and manifest means to our model. One feature of this method of identification is that our latent means need to be fixed to 0. In lavaan, we can fix a parameter by multiplying the parameter by that fixed parameter. For example, in our case, Agency ~ 0*1.
mod1 <- '
# Loadings
Agency =~ agen_1 +
agen_2 +
agen_3
Achiev =~ achiev_1 +
achiev_2 +
achiev_3
Motiv =~ motiv_1 +
motiv_2 +
motiv_3
# Latent Covariances
Agency ~~ Achiev
Agency ~~ Motiv
Achiev ~~ Motiv
# Latent Variance
Agency ~~ Agency
Achiev ~~ Achiev
Motiv ~~ Motiv
# Latent Means
Agency ~ 0*1
Achiev ~ 0*1
Motiv ~ 0*1
# Indicator Means
agen_1 ~ 1
agen_2 ~ 1
agen_3 ~ 1
achiev_1 ~ 1
achiev_2 ~ 1
achiev_3 ~ 1
motiv_1 ~ 1
motiv_2 ~ 1
motiv_3 ~ 1
'
With Mplus, we can explicitly estimate means by simply placing the names of the variables we desire mean information for in square brackets, like [this];. In order to fix a parameter to 0 like we had to for our latent mean, we use the “at” symbol (@) inside the brackets like [this@0]; to say that the parameter is fixed “at” zero. Here is the above model in Mplus syntax.
MODEL:
! Loadings
agency BY agen_1
agen_2
agen_3;
achiev BY achiev_1
achiev_2
achiev_3;
motiv BY motiv_1
motiv_2
motiv_3;
! Latent Covariances
agency WITH achiev;
agency WITH motiv;
achiev WITH motiv;
! Latent Variances
agency;
achiev;
motiv;
! Latent Means
[agency@0];
[achiev@0];
[motiv@0];
! Indicator Means
[agen_1];
[agen_2];
[agen_3];
[achiev_1];
[achiev_2];
[achiev_3];
[motiv_1];
[motiv_2];
[motiv_3];
In order to evaluate our model, we will use one of a few lavaan functions. For a basic CFA, we will used the cfa() function. We
fit1<- cfa(model=mod1, # <- Our model defined previously
data=dat, # <- Our data set
estimator="ML", # <- ML is the default estimator
meanstructure=TRUE) # <- We tell lavaan to evaluate the mean structure too
Once we have evaluated our model, we can use the summary() function to look at the output of our model analysis.
summary(fit1, # <- our fit object from cfa()
fit.measures=TRUE, # <- get our model fit information
standardized=T) # <- include standardized results
## lavaan 0.6-9 ended normally after 42 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 30
##
## Number of observations 500
##
## Model Test User Model:
##
## Test statistic 30.876
## Degrees of freedom 24
## P-value (Chi-square) 0.157
##
## Model Test Baseline Model:
##
## Test statistic 692.980
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.990
## Tucker-Lewis Index (TLI) 0.984
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -4597.202
## Loglikelihood unrestricted model (H1) -4581.764
##
## Akaike (AIC) 9254.404
## Bayesian (BIC) 9380.842
## Sample-size adjusted Bayesian (BIC) 9285.620
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.024
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.046
## P-value RMSEA <= 0.05 0.977
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.026
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Agency =~
## agen_1 1.000 0.415 0.672
## agen_2 0.891 0.105 8.469 0.000 0.369 0.551
## agen_3 0.866 0.104 8.297 0.000 0.359 0.532
## Achiev =~
## achiev_1 1.000 0.420 0.568
## achiev_2 1.136 0.156 7.273 0.000 0.477 0.576
## achiev_3 1.041 0.147 7.091 0.000 0.437 0.531
## Motiv =~
## motiv_1 1.000 0.475 0.651
## motiv_2 0.961 0.103 9.340 0.000 0.456 0.659
## motiv_3 0.791 0.098 8.061 0.000 0.376 0.493
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Agency ~~
## Achiev 0.099 0.017 5.972 0.000 0.566 0.566
## Motiv 0.135 0.018 7.345 0.000 0.684 0.684
## Achiev ~~
## Motiv 0.110 0.019 5.846 0.000 0.550 0.550
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Agency 0.000 0.000 0.000
## Achiev 0.000 0.000 0.000
## Motiv 0.000 0.000 0.000
## .agen_1 3.118 0.028 113.090 0.000 3.118 5.058
## .agen_2 3.110 0.030 103.678 0.000 3.110 4.637
## .agen_3 3.078 0.030 101.932 0.000 3.078 4.559
## .achiev_1 3.068 0.033 92.725 0.000 3.068 4.147
## .achiev_2 2.988 0.037 80.560 0.000 2.988 3.603
## .achiev_3 3.210 0.037 87.178 0.000 3.210 3.899
## .motiv_1 2.854 0.033 87.439 0.000 2.854 3.910
## .motiv_2 3.192 0.031 103.114 0.000 3.192 4.611
## .motiv_3 3.016 0.034 88.572 0.000 3.016 3.961
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Agency 0.172 0.026 6.507 0.000 1.000 1.000
## Achiev 0.176 0.034 5.139 0.000 1.000 1.000
## Motiv 0.226 0.035 6.450 0.000 1.000 1.000
## .agen_1 0.208 0.022 9.642 0.000 0.208 0.548
## .agen_2 0.313 0.025 12.584 0.000 0.313 0.697
## .agen_3 0.327 0.025 12.909 0.000 0.327 0.717
## .achiev_1 0.371 0.033 11.280 0.000 0.371 0.678
## .achiev_2 0.460 0.041 11.091 0.000 0.460 0.669
## .achiev_3 0.487 0.040 12.094 0.000 0.487 0.718
## .motiv_1 0.307 0.029 10.583 0.000 0.307 0.577
## .motiv_2 0.271 0.026 10.349 0.000 0.271 0.566
## .motiv_3 0.439 0.032 13.598 0.000 0.439 0.757