CFA & Hierarchical Latent Variable Models
Learning Objectives
- Understand the process of running a one-factor, four-factor, higher-order, and bifactor CFA in Lavaan.
- Understand the process of fitting a CFA model
  i. Model Specification
  ii. Model Explanation
  iii. Model Identification
- Understand the meaning of identification and model degrees of freedom
  i. What is a just-identified model?
  ii. What is an under-identified model?
  iii. What is an over-identified model?
- Know how variance can be decomposed
- Understand how to compare and select the best possible model given your data
  i. Know what fit indices to look at and what is considered ‘good fit’
  ii. Run and interpret a chi-square LR test & know the drawbacks of this GOF measure
  iii. Evaluate and interpret the lambda and theta matrices from your model(s)
Homework Questions
This text file includes the covariance matrix for 20 items (N = 500). Using the lavaan package, fit four models with CFA (one single factor, correlated 4-factor model, 4-factor model with higher order general factor, and a bifactor model with a general factor and four specific factors).
- Evaluate each model (e.g., model fit, factor loadings, residual matrix) and state which model you would use. Justify your answer (2pts).
- Look at the factor correlation matrix of the correlated factors model. Do the numbers suggest a higher order general factor? Explain (1pt).
- In the higher order general factor model, is the structural model over-identified? Explain and justify your answer (1pt).
- For the bifactor model, pick an item and show how the item variance can be decomposed using the loadings, residual variances, and R-square values (2pts).
Read in Data and Load and/or Install Packages
Packages needed for this assignment include:
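The package list itself is not shown here. The only package the rest of this section relies on is lavaan, so a minimal sketch, assuming lavaan is all that is required, could look like this (the CRAN mirror URL is just one common choice):

```r
# Assumption: lavaan is the only package needed for this assignment.
# Install it only if it is not already available, then load it.
if (!requireNamespace("lavaan", quietly = TRUE)) {
  install.packages("lavaan", repos = "https://cloud.r-project.org")
}
library(lavaan)
```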
Data for this Assignment: ‘HW-4.txt’
Since this file is already set up as a covariance matrix, we will also tell R to read it as a matrix instead of a data frame.
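The read-in step could be sketched as follows. This assumes ‘HW-4.txt’ is a whitespace-delimited file with no header row, in which case `read.table()` names the columns V1, V2, ... automatically (matching the variable names used later). The helper `read_cov_matrix` is a hypothetical name, and the demo uses a small stand-in matrix because the assignment file is not included here.

```r
# Hypothetical helper: read a square covariance matrix from a text file.
# Assumption: no header row, so read.table() assigns the names V1, V2, ...
read_cov_matrix <- function(path) {
  m <- as.matrix(read.table(path, header = FALSE))
  rownames(m) <- colnames(m)  # lavaan matches variables by these names
  stopifnot(nrow(m) == ncol(m), isSymmetric(unname(m)))
  m
}

# Demo with a stand-in 3 x 3 covariance matrix written to a temporary file
demo.cov <- cov(mtcars[, c("mpg", "hp", "wt")])
tmp <- tempfile(fileext = ".txt")
write.table(demo.cov, tmp, row.names = FALSE, col.names = FALSE)
demo.mat <- read_cov_matrix(tmp)
demo.mat

# For the assignment data (not run here):
# mat <- read_cov_matrix("HW-4.txt")
```

The symmetry check is a cheap guard against reading a file whose layout differs from what is assumed.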
Overview of Using Lavaan
Lavaan is an easy-to-use R package for CFA, SEM, and growth models. You first specify your model as a syntax string and then pass that model to a fitting command (such as ‘cfa’), where you can change certain defaults of interest (e.g., factor identification).
- The official reference to the lavaan package is the following paper: Yves Rosseel (2012). lavaan: An R Package for Structural Equation Modeling. Journal of Statistical Software, 48(2), 1-36.
Basic terminology specific to the lavaan package
- ‘=~’ means we are defining a latent variable (LV) by whatever comes to the right of the operator.
- For example, if we had a factor called ‘IQ’ and had four variables to define ‘IQ,’ we would specify this exactly as ‘IQ =~ quant + verbal + spatial + analytic’
- The ‘+’ sign is used to separate the indicators that correspond to a specific factor of interest.
- To extract information from these models, we simply use the ‘summary’ function. Lavaan defaults to unstandardized estimates for factor loadings, which appear in the first column on the left of the output, labeled ‘Estimate.’
- For standardized loadings, we can add ‘standardized = TRUE’ when we ask for the output of our model.
- This will give you both partially standardized and fully standardized estimates in the last two columns on the right-hand side of the output (labeled ‘Std.lv’ and ‘Std.all’).
- The ‘cfa’ command in lavaan has some defaults embedded in the function. For example, this will default to identifying the factor model by setting the first factor loading to one. It also assumes the latent variables are allowed to correlate.
- If you would prefer the factor to be identified by fixing the factor variance to one (with the factor mean at 0), then specify ‘std.lv = TRUE’ in your command.
- If you would like factors to be orthogonal to one another (which we will use later for the bifactor model), simply add ‘orthogonal = TRUE’ in your command.
- ML estimation is the default. However, you can specify ‘estimator =’ to change this to estimator options like MLR, MLM, MLMVS, MLF, WLSMV, WLS, ULS, DWLS, etc.
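The options above can be seen together in a small sketch. It uses the HolzingerSwineford1939 dataset that ships with lavaan as stand-in data (not the assignment data), and the factor names ‘visual’ and ‘textual’ and the object names are illustrative choices.

```r
library(lavaan)

# Two-factor measurement model: '=~' defines each factor, '+' separates indicators
demo.model <- "
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
"

# Defaults: first loading of each factor fixed to 1, factors allowed to correlate, ML
fit.default <- cfa(demo.model, data = HolzingerSwineford1939)

# Changed defaults: factor variances fixed to 1, uncorrelated factors, MLR estimator
fit.alt <- cfa(demo.model, data = HolzingerSwineford1939,
               std.lv = TRUE, orthogonal = TRUE, estimator = "MLR")

# Unstandardized estimates plus the Std.lv and Std.all columns
summary(fit.alt, standardized = TRUE)
```

Note that estimators such as MLR need raw data, which is why this sketch passes ‘data =’ rather than a covariance matrix.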
One-Factor CFA Model
Now that we have gone over some basics, let’s see what it actually looks like to specify a one factor model.
- Lavaan requires us to specify the measurement model first and save this as some sort of value we will later feed into the ‘cfa’ command.
- Since it is tedious to type in all the variable names manually, a handy command to use is below. This will call and print all variable names in quotations in the console.
- This allows us to copy those over much easier than individually typing out each variable name.
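The command itself is not shown here; one call that prints names in this quoted `c(...)` form is `dput()` applied to the column names. A minimal sketch, using a stand-in matrix `demo.mat` with the same 20 variable names in place of the real covariance matrix:

```r
# Stand-in for the covariance matrix (assumption: columns are named V1 ... V20)
demo.mat <- diag(20)
colnames(demo.mat) <- rownames(demo.mat) <- paste0("V", 1:20)

# dput() prints an object as R code, so the names appear quoted inside c(...)
dput(colnames(demo.mat))
```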
## c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10",
## "V11", "V12", "V13", "V14", "V15", "V16", "V17", "V18", "V19",
## "V20")
- We can now copy these over to define our factor. Lavaan does not need the quotation marks around each variable, however, so another easy way to save time is to copy these into your script, highlight all the variables, and press ‘ctrl+f’ (or ‘command+f’ for Mac users). You can then quickly erase all quotation marks and change the commas to ‘+’ signs by using the find and replace function in the RStudio editor.
- Make sure the boxes ‘in selection’ and ‘match case’ are selected.
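As an alternative to find and replace, the model string can be assembled in code. This is a sketch rather than the approach used in this walkthrough, and the object names `vars` and `one.factor.alt` are illustrative:

```r
# Variable names as they appear in the covariance matrix (assumed V1 ... V20)
vars <- paste0("V", 1:20)

# Join the names with ' + ' and prepend the factor definition
one.factor.alt <- paste("LV1 =~", paste(vars, collapse = " + "))
cat(one.factor.alt, "\n")
```

The resulting string can be passed to ‘cfa’ in the same way as a hand-typed model.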
Specifying the Measurement Model
You can name your measurement model anything you like. The latent variable you’re creating (in this case, ‘LV1’), can also be named whatever your heart desires. The variables used to define the factor, however, must match the variable names in the dataset/matrix being used.
one.factor <- "LV1 =~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9 + V10 +
V11 + V12 + V13 + V14 + V15 + V16 + V17 + V18 + V19 + V20"
Now that we have our model defined, we can actually run the one factor CFA fairly easily with the ‘cfa’ command.
- You will also want to name your cfa analysis by saving it as an object (below I named it ‘onefact.cfa’). This is typically referred to as the ‘fitted object,’ and will be used later to extract info about our model.
- Since we will be coming back to our CFA fitted objects quite a bit, I recommend having a consistent naming scheme for all the CFAs you run so you can quickly extract information & know which model you are working with.
Running the One-Factor Model
- In lavaan, using raw data versus a covariance matrix requires slightly different specification.
- For raw data, we can simply specify ‘data = dataset.name’
- For a covariance matrix, we specify ‘sample.cov = cov.name’ AND ‘sample.nobs = #’, where ‘sample.nobs’ is the actual number of observations we have. An example is below.
# raw data
model.rawdata <- cfa(model.name, std.lv = TRUE, data = df.name)
# covariance matrix
model.cov <- cfa(model.name, std.lv = TRUE,
                 sample.cov = matrix.name, sample.nobs = actual.num.of.obs)
Code for Our Actual Model
- I decided to identify the model by setting the variance of the factor to 1 so I can freely estimate all factor loadings. As a reminder, to do so we simply add ‘std.lv = TRUE’ to our line of code.
onefact.cfa <- cfa(one.factor,        # 'one.factor' comes from the previously defined measurement model
                   std.lv = TRUE,     # tells lavaan we want the factor identified with a variance of 1
                   sample.cov = mat,  # covariance matrix name is 'mat'
                   sample.nobs = 500) # number of observations
- If you wanted to identify the factor using the marker variable method, you can simply delete ‘std.lv = TRUE’ (the marker variable method is the default in lavaan).
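The choice between the two identification methods changes how the parameters are scaled, not how well the model fits. The sketch below illustrates this with a stand-in covariance matrix computed from lavaan's built-in HolzingerSwineford1939 data (an assumption made only because the assignment matrix is not included here); the object names are illustrative.

```r
library(lavaan)

# Stand-in covariance matrix and sample size (not the assignment data)
items <- paste0("x", 1:6)
demo.cov <- cov(HolzingerSwineford1939[, items])
demo.n <- nrow(HolzingerSwineford1939)

demo.model <- paste("LV1 =~", paste(items, collapse = " + "))

# Factor variance fixed to 1, all loadings freely estimated
fit.stdlv <- cfa(demo.model, std.lv = TRUE,
                 sample.cov = demo.cov, sample.nobs = demo.n)

# Marker variable method (lavaan default): first loading fixed to 1
fit.marker <- cfa(demo.model,
                  sample.cov = demo.cov, sample.nobs = demo.n)

# Chi-square and degrees of freedom under each identification choice
fitMeasures(fit.stdlv, c("chisq", "df"))
fitMeasures(fit.marker, c("chisq", "df"))
```

Both fits estimate the same number of free parameters (six loadings or five loadings plus a factor variance, along with six residual variances), so the degrees of freedom match.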
Extracting Information From the Output
To extract other pieces of information, we can use the ‘inspect’ command in lavaan.
- For instance, if we want just factor loadings, we can use the command as follows:
- Importantly, if we want standardized loadings, we need to specify what= “std” in the line of code.
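The two ‘inspect’ calls for loadings might look like the following sketch. It fits a stand-in one-factor model to a covariance matrix computed from lavaan's built-in HolzingerSwineford1939 data (an assumption, since the assignment matrix is not included here); with the model from this walkthrough, `demo.fit` would be replaced by ‘onefact.cfa’.

```r
library(lavaan)

# Stand-in one-factor model fitted from a covariance matrix
items <- paste0("x", 1:6)
demo.fit <- cfa(paste("LV1 =~", paste(items, collapse = " + ")),
                std.lv = TRUE,
                sample.cov = cov(HolzingerSwineford1939[, items]),
                sample.nobs = nrow(HolzingerSwineford1939))

# Unstandardized factor loadings (the lambda matrix)
lambda.est <- inspect(demo.fit, what = "est")$lambda
lambda.est

# Standardized factor loadings
lambda.std <- inspect(demo.fit, what = "std")$lambda
lambda.std
```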
If we want the values of the residuals for each indicator, we also use the ‘inspect’ command.
- Lavaan uses LISREL notation, so ‘theta’ is used to call the residuals for all the indicators.
- Below, I am extracting the standardized theta (residual) values.
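A sketch of the call that returns a theta matrix in this layout, again using a stand-in one-factor model fitted to lavaan's built-in HolzingerSwineford1939 data (an assumption; with the model from this walkthrough, `demo.fit` would be ‘onefact.cfa’):

```r
library(lavaan)

# Stand-in one-factor model fitted from a covariance matrix
items <- paste0("x", 1:6)
demo.fit <- cfa(paste("LV1 =~", paste(items, collapse = " + ")),
                std.lv = TRUE,
                sample.cov = cov(HolzingerSwineford1939[, items]),
                sample.nobs = nrow(HolzingerSwineford1939))

# Standardized residual (co)variances: the theta matrix
theta.std <- inspect(demo.fit, what = "std")$theta
theta.std

# In a one-factor model, each standardized residual variance equals
# 1 minus the squared standardized loading of that indicator
lambda.std <- inspect(demo.fit, what = "std")$lambda
cbind(residual = diag(unclass(theta.std)),
      one.minus.loading.sq = 1 - as.numeric(lambda.std)^2)
```

The side-by-side columns show how each item's standardized variance splits into a part explained by the factor and a residual part.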
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11
## V1 0.315
## V2 0.000 0.399
## V3 0.000 0.000 0.505
## V4 0.000 0.000 0.000 0.504
## V5 0.000 0.000 0.000 0.000 0.461
## V6 0.000 0.000 0.000 0.000 0.000 0.250
## V7 0.000 0.000 0.000 0.000 0.000 0.000 0.245
## V8 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.462
## V9 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.329
## V10 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.305
## V11 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.318
## V12 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V13 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V14 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V15 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V16 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V17 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V18 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V19 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V20 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
## V12 V13 V14 V15 V16 V17 V18 V19 V20
## V1
## V2
## V3
## V4
## V5
## V6
## V7
## V8
## V9
## V10
## V11
## V12 0.413
## V13 0.000 0.288
## V14 0.000 0.000 0.345
## V15 0.000 0.000 0.000 0.335
## V16 0.000 0.000 0.000 0.000 0.624
## V17 0.000 0.000 0.000 0.000 0.000 0.693
## V18 0.000 0.000 0.000 0.000 0.000 0.000 0.564
## V19 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.566
## V20 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.574
We can also extract the residual covariance/correlation matrix from our model using the command ‘lavResiduals.’
- If type = “raw”, this function returns the raw (= unscaled) difference between the observed and the expected (model-implied) summary statistics, as well as the standardized version of these residuals.
- If type = “cor” or type = “cor.bollen”, the observed and model-implied covariance matrices are first transformed to correlation matrices before the residuals are computed.
- If type = “cor.bentler”, both the observed and model-implied covariance matrices are rescaled by dividing the elements by the square roots of the corresponding variances of the observed covariance matrix.
- If zstat = TRUE (which is the default), the output also includes standardized residuals, which are the raw residuals divided by the corresponding (estimated) standard errors.
- If se = TRUE, the output also includes the standard error estimates for the residual matrix.
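A sketch of a ‘lavResiduals’ call requesting Bollen-type residual correlations follows. The exact arguments used for the output in this section are not shown, so the call below is an assumption; it also uses a stand-in one-factor model fitted to lavaan's built-in HolzingerSwineford1939 data rather than ‘onefact.cfa’.

```r
library(lavaan)

# Stand-in one-factor model fitted from a covariance matrix
items <- paste0("x", 1:6)
demo.fit <- cfa(paste("LV1 =~", paste(items, collapse = " + ")),
                std.lv = TRUE,
                sample.cov = cov(HolzingerSwineford1939[, items]),
                sample.nobs = nrow(HolzingerSwineford1939))

# Residual correlations: observed minus model-implied correlation matrix
demo.res <- lavResiduals(demo.fit, type = "cor.bollen")
demo.res$type  # which residual type was computed
demo.res$cov   # residual correlation matrix (diagonal is 0 for this type)
```

Large positive residuals clustered among a subset of items are a sign that those items share variance the single factor does not capture.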
## $type
## [1] "cor.bollen"
##
## $cov
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## V1 0.000
## V2 0.138 0.000
## V3 0.127 0.090 0.000
## V4 0.142 0.121 0.124 0.000
## V5 0.148 0.119 0.103 0.146 0.000
## V6 -0.049 -0.009 -0.059 -0.084 -0.039 0.000
## V7 -0.043 -0.028 -0.027 -0.064 -0.033 0.085 0.000
## V8 -0.044 -0.064 -0.070 -0.067 -0.046 0.072 0.083 0.000
## V9 -0.037 -0.058 -0.033 -0.046 -0.054 0.071 0.055 0.058 0.000
## V10 -0.060 -0.049 -0.065 -0.059 -0.054 0.077 0.087 0.088 0.065 0.000
## V11 -0.034 -0.061 -0.048 -0.040 -0.052 -0.004 -0.007 0.013 -0.010 0.016
## V12 -0.046 -0.049 -0.072 -0.039 -0.048 -0.020 -0.012 0.006 -0.003 0.015
## V13 -0.048 -0.029 -0.028 -0.024 -0.041 -0.011 -0.012 -0.020 -0.009 -0.023
## V14 -0.029 -0.036 -0.059 -0.037 -0.035 -0.022 -0.009 -0.007 -0.001 -0.013
## V15 -0.068 -0.051 -0.064 -0.058 -0.051 0.000 -0.008 -0.001 0.018 0.010
## V16 0.067 0.049 0.090 0.051 0.028 -0.047 -0.068 -0.050 -0.054 -0.067
## V17 0.053 0.049 0.053 0.061 0.051 -0.072 -0.067 -0.085 -0.044 -0.066
## V18 0.065 0.027 0.117 0.084 0.047 -0.050 -0.063 -0.086 -0.072 -0.076
## V19 0.053 0.045 0.087 0.072 0.037 -0.054 -0.075 -0.085 -0.048 -0.082
## V20 0.060 0.029 0.082 0.073 0.018 -0.033 -0.052 -0.067 -0.056 -0.052
## V11 V12 V13 V14 V15 V16 V17 V18 V19 V20
## V1
## V2
## V3
## V4
## V5
## V6
## V7
## V8
## V9
## V10
## V11 0.000
## V12 0.098 0.000
## V13 0.093 0.079 0.000
## V14 0.112 0.104 0.113 0.000
## V15 0.082 0.099 0.087 0.113 0.000
## V16 -0.117 -0.090 -0.067 -0.133 -0.063 0.000
## V17 -0.071 -0.052 -0.056 -0.087 -0.079 0.260 0.000
## V18 -0.095 -0.091 -0.074 -0.117 -0.086 0.356 0.305 0.000
## V19 -0.107 -0.093 -0.045 -0.115 -0.086 0.373 0.310 0.383 0.000
## V20 -0.088 -0.100 -0.089 -0.109 -0.094 0.316 0.267 0.365 0.372 0.000
##
## $summary
## crmr crmr.se crmr.z crmr.pvalue ucrmr ucrmr.se
## cov 0.102 0.002 35.13 0 0.1 0.006