Spring 2020

Intro to CFA

  • Researcher uses model to confirm an existing theory/hypothesis
  • Must specify a particular model for testing. In the modeling world, to 'specify' a model means to define its structure: the number of factors, which items load on which factors, and whether the factors covary
  • Requires that the researcher have more a priori ideas about the item covariances than in the EFA approach

Model Constraints

  • May test specific hypotheses by using particular patterns of constraints in the model
  • For example, to specify which items load onto which factors, the researcher constrains the loadings of those items on all other factors in the model to 0
  • Once those constraints are in place, the remaining parameters are free to be estimated using maximum likelihood (a sketch follows below)
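To make this concrete, here is a minimal sketch of a model specification, assuming the lavaan package in R and its built-in HolzingerSwineford1939 dataset (my example choices; the slides don't prescribe a package):

```r
library(lavaan)

# Each '=~' line declares which items load on which factor; any
# loading not listed (e.g., x1 on textual) is implicitly fixed to 0.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# cfa() leaves the remaining parameters (listed loadings, factor
# covariances, residual variances) free and estimates them by maximum
# likelihood. (By default it fixes each factor's first loading to 1
# for identification.)
fit <- cfa(model, data = HolzingerSwineford1939)
summary(fit)
```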

Model Constraints

  • The use of model constraints is an important component of CFA and one of the things that distinguishes CFA from EFA
  • Most basic hypothesis test in CFA: does the hypothesized model fit significantly better than the baseline model (the worst possible model for the data)?

Model Constraints

  • More sophisticated and/or specific hypotheses can also be tested in the CFA framework using model constraints
  • Example: Do two groups of individuals (e.g. representing two different cultures) have the same factor structure? This could be tested by constraining all factor loadings to be the same across the two groups and comparing the fit to that of the unconstrained model
  • If the unconstrained model fits significantly better than the constrained one, conclude that the two groups have different factor structures
  • This is a simplified example of a method called Measurement Invariance (or Factor Invariance) Testing (sketched below)
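A hedged sketch of the two-group comparison just described, again assuming lavaan and its HolzingerSwineford1939 data, where 'school' is the grouping variable in that dataset (standing in for, e.g., culture):

```r
library(lavaan)
model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6
           speed   =~ x7 + x8 + x9 '

# Unconstrained (configural) model: same structure in both groups,
# loadings free to differ.
fit_free  <- cfa(model, data = HolzingerSwineford1939, group = "school")

# Constrained model: loadings forced to be equal across groups.
fit_equal <- cfa(model, data = HolzingerSwineford1939, group = "school",
                 group.equal = "loadings")

# Chi-square difference (likelihood-ratio) test of the nested pair;
# a significant result favors the unconstrained model.
lavTestLRT(fit_free, fit_equal)
```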

CFA Caution

  • It is important to remind you that a model can be specified in a number of ways, and just because a model has good fit doesn’t mean that it is a ‘good’ model theoretically.
  • Remember that a model is not useful if it doesn’t make sense theoretically, and the researcher is responsible for making sure this is the case.

Model Evaluation

  • How can you determine whether a hypothesis has been supported?
  • Use various fit statistics and fit indices to evaluate the model and draw conclusions about your hypotheses

Chi-Square

  • As long as your model has degrees of freedom remaining (an overidentified model; a model with 0 degrees of freedom is called a just-identified model), you can get a \(\chi^2\) value for your model
  • Represents the difference between the actual covariance structure and the covariance structure implied by the model
  • Larger values represent worse fit
  • Hope to achieve non-significance and fail to reject the null in this test of overall model fit (see the sketch below)
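A minimal sketch of pulling the \(\chi^2\) test out of a fitted model, assuming lavaan as before:

```r
library(lavaan)
model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6
           speed   =~ x7 + x8 + x9 '
fit <- cfa(model, data = HolzingerSwineford1939)

# Chi-square test of exact fit: a non-significant p-value means the
# model-implied covariance matrix is not significantly different
# from the observed one.
fitMeasures(fit, c("chisq", "df", "pvalue"))
```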

Incremental Fit Indices

  • Allow you to compare the fit of your hypothesized model to the baseline model (also called the null model).
  • Tucker-Lewis Index (TLI) and the Comparative Fit Index (CFI)
  • Two slightly different formulas for model fit that generally range between 0 and 1 (definitions below). For both indices, values closer to 1 are desirable; a value of exactly 1 would suggest that the model misfit = 0
  • Many researchers agree that values should be greater than .90 for acceptable fit and greater than .95 for good model fit
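For reference, the standard definitions (not on the original slide), where \(M\) denotes the hypothesized model and \(B\) the baseline model:

\[
\text{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\ 0)}{\max(\chi^2_M - df_M,\ \chi^2_B - df_B,\ 0)}
\qquad
\text{TLI} = \frac{\chi^2_B/df_B - \chi^2_M/df_M}{\chi^2_B/df_B - 1}
\]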

Incremental Fit Indices

Note: Some statisticians have pointed out that the CFI and TLI are not informative or reliable indicators of model fit unless the baseline model has an RMSEA > .158 (don't worry about the math on this one for now; a sketch of how to calculate the RMSEA for the baseline model in R follows below).
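A hedged sketch of that calculation, assuming lavaan, which stores the baseline model's \(\chi^2\) and degrees of freedom with every fit. Sources differ on using \(N\) vs. \(N-1\) in the denominator, so treat the exact value as approximate:

```r
library(lavaan)
model <- ' visual  =~ x1 + x2 + x3
           textual =~ x4 + x5 + x6
           speed   =~ x7 + x8 + x9 '
fit <- cfa(model, data = HolzingerSwineford1939)

# Baseline-model chi-square and df, stored with the fitted model.
b_chisq <- fitMeasures(fit, "baseline.chisq")
b_df    <- fitMeasures(fit, "baseline.df")
n       <- lavInspect(fit, "nobs")

# One common RMSEA formula (N - 1 convention). CFI/TLI are considered
# informative only if this value exceeds ~.158.
sqrt(max((b_chisq - b_df) / (b_df * (n - 1)), 0))
```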

Absolute Fit Indices

  • Compares model fit to the best possible model, or a model that perfectly describes the data
  • Root mean square error of approximation (RMSEA): a measure of misfit, so lower values are desirable; the best possible model would have an RMSEA of 0, meaning perfect fit (formula below)
  • A value < .05 is an indicator of good fit, and values < .08 or even .10 represent acceptable fit. Values greater than .10 should certainly give you pause.
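For reference, one standard formula (using the \(N-1\) convention; some software uses \(N\) instead):

\[
\text{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\ 0)}{df_M\,(N-1)}}
\]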

Comparative Fit Indices

  • Used to compare two competing models fit to the same data to determine which fits better (unlike the \(\chi^2\) difference test, the models need not be nested)
  • Useful when you want to test competing hypotheses
  • Most popular: Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC)
  • The particular values are essentially meaningless when it comes to hypothesis testing. However, when two models are fit to the same data they can be used to compare the models: the one with the smaller AIC or BIC value is preferred (see the sketch below)
  • Both the AIC and BIC penalize overly complex models.
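A sketch of such a comparison, assuming lavaan; the one-factor alternative here is purely illustrative:

```r
library(lavaan)
model_3f <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
model_1f <- ' g =~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 '

fit_3f <- cfa(model_3f, data = HolzingerSwineford1939)
fit_1f <- cfa(model_1f, data = HolzingerSwineford1939)

# The standard AIC()/BIC() generics work on lavaan fits; both add a
# penalty for extra free parameters. Prefer the model with the
# smaller value.
AIC(fit_1f); AIC(fit_3f)
BIC(fit_1f); BIC(fit_3f)
```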

Conclusions and Recommendations around Model Evaluation

  • There are many other fit statistics and fit indices in the literature; the ones I have described here are the most frequently reported
  • Increasing concern in the scientific community about which fit indices researchers choose to report, and whether some researchers pick only the ones that confirm their hypotheses
  • I recommend that you take a fully transparent approach and report all of the measures I have mentioned here, or all of those you have access to