Multidimensional EFA is similar to its unidimensional variance, but with two or more latent variable (factor) in the model.
For this chapter, you’ll be using the bfi dataset, which consists of responses to 25 items measuring the Big Five personality traits.
Since you’ll be doing both exploratory and confirmatory factor analyses on this data, you’ll want to start out by splitting the dataset.
setwd("D:/Class Materials & Work/Summer 2020 practice/SEM/EFA_Multidimensional")
library(psych)
library(tidyverse)
library(semPlot)
library(ggcorrplot)
data("bfi")
# Establish two sets of indices to split the dataset
N <- nrow(bfi)
indices <- seq(1, N)
indices_EFA <- sample(indices, floor((.5*N)))
indices_CFA <- indices[!(indices %in% indices_EFA)]
# Use those indices to split the dataset into halves for your EFA and CFA
bfi_EFA <- bfi[indices_EFA, ]
bfi_CFA <- bfi[indices_CFA, ]
To empirically determine the dimensionality of your data, a common strategy is to examine the eigenvalues. Eigenvalues are calculated from a correlation matrix, so you’ll need to use cor() to calculate and store the dataset’s correlation matrix before calculating eigenvalues.
Note that we do not use lowercor() because the function will force the code to display the matrix, which is not yet necessary.
In computing the correlation, use pairwise complete observations to prevent NA in our matrix from missing data.
We will first calculate the correlation on bfi_EFA, and retain the bfi_CFA for our confirmatory analysis.
# Calculate the correlation matrix first
bfi_EFA_cor <- cor(bfi_EFA, use = "pairwise.complete.obs")
# Then use that correlation matrix to calculate eigenvalues
eigenvals <- eigen(bfi_EFA_cor)
# Look at the eigenvalues returned
eigenvals$values
## [1] 5.2710748 2.8431513 2.0960942 1.8751451 1.6514697 1.2529541 1.0709875
## [8] 0.9249280 0.8093461 0.8014285 0.7318150 0.6989594 0.6874003 0.6686124
## [15] 0.6473311 0.6053515 0.5737552 0.5647180 0.5236901 0.5074725 0.4873091
## [22] 0.4484087 0.4327757 0.4280691 0.3854361 0.3775701 0.3646952 0.2700513
The scree plot is a visual representation of eigenvalues. Visual inspection of the scree plot is a quick and easy way to get a feel for the dimensionality of your dataset. Like the eigen() function in the previous exercise, the scree() function takes a correlation matrix as its argument.
Eigenvalues can be generated from a principal component analysis or a factor analysis, and the scree() function calculates and plots both by default. Since eigen() finds eigenvalues via principal components analysis, we will use factors = FALSE so our scree plot will only display the values corresponding to those results.
# Then use the correlation matrix to create the scree plot
scree(bfi_EFA_cor, factors = FALSE)
From the above plot, The point where the slope of the curve is clearly leveling off (the "elbow) indicates that the number of factors to the left of this point should be retained as significant. For our case, six factors are recommended.
In our unidimensional EFA practice, we ran a unidimensional EFA by using the fa() function. To run a multidimensional EFA, you’ll want to use the nfactors argument to specify the number of factors desired.
# Run the EFA with six factors (as indicated by your scree plot)
EFA_model <- fa(bfi_EFA, nfactors = 6)
# View results from the model object
print(EFA_model)
## Factor Analysis using method = minres
## Call: fa(r = bfi_EFA, nfactors = 6)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR2 MR3 MR5 MR1 MR4 MR6 h2 u2 com
## A1 0.11 0.05 -0.56 -0.05 0.10 0.31 0.363 0.64 1.8
## A2 0.04 0.06 0.67 -0.07 0.00 -0.05 0.496 0.50 1.1
## A3 -0.02 0.03 0.61 -0.10 0.09 0.13 0.518 0.48 1.2
## A4 -0.10 0.18 0.42 0.00 -0.06 0.16 0.286 0.71 1.9
## A5 -0.14 0.00 0.50 -0.16 0.18 0.21 0.523 0.48 2.1
## C1 0.02 0.52 -0.03 0.08 0.20 0.03 0.325 0.67 1.4
## C2 0.07 0.67 0.04 0.19 0.12 0.19 0.509 0.49 1.4
## C3 0.02 0.53 0.08 0.05 0.00 0.05 0.291 0.71 1.1
## C4 0.06 -0.69 -0.03 0.09 0.09 0.23 0.575 0.42 1.3
## C5 0.15 -0.56 0.03 0.20 0.08 0.00 0.450 0.55 1.5
## E1 -0.12 0.07 -0.12 0.64 -0.07 0.09 0.452 0.55 1.2
## E2 0.08 -0.04 -0.09 0.66 -0.08 0.00 0.557 0.44 1.1
## E3 0.01 0.01 0.16 -0.31 0.42 0.21 0.488 0.51 2.8
## E4 -0.04 0.04 0.23 -0.49 0.04 0.32 0.548 0.45 2.2
## E5 0.14 0.27 0.08 -0.37 0.28 0.01 0.428 0.57 3.2
## N1 0.81 -0.01 -0.10 -0.10 0.00 0.01 0.641 0.36 1.1
## N2 0.83 0.02 -0.04 -0.06 -0.02 -0.05 0.664 0.34 1.0
## N3 0.69 -0.04 0.06 0.10 0.03 0.05 0.530 0.47 1.1
## N4 0.45 -0.16 0.08 0.39 0.07 0.01 0.486 0.51 2.4
## N5 0.48 0.03 0.18 0.20 -0.13 0.11 0.354 0.65 2.0
## O1 -0.03 0.09 -0.04 -0.04 0.59 0.05 0.387 0.61 1.1
## O2 0.14 -0.09 0.08 0.01 -0.32 0.39 0.300 0.70 2.4
## O3 -0.02 0.00 0.04 -0.08 0.62 -0.07 0.423 0.58 1.1
## O4 0.13 -0.02 0.16 0.34 0.39 -0.10 0.269 0.73 2.7
## O5 0.04 -0.04 0.01 0.04 -0.42 0.38 0.318 0.68 2.0
## gender 0.22 0.14 0.31 -0.08 -0.22 -0.13 0.180 0.82 3.8
## education -0.05 -0.01 0.08 0.12 0.16 -0.21 0.079 0.92 3.1
## age -0.09 0.09 0.24 0.07 0.04 -0.26 0.143 0.86 2.7
##
## MR2 MR3 MR5 MR1 MR4 MR6
## SS loadings 2.55 2.14 2.13 2.08 1.76 0.92
## Proportion Var 0.09 0.08 0.08 0.07 0.06 0.03
## Cumulative Var 0.09 0.17 0.24 0.32 0.38 0.41
## Proportion Explained 0.22 0.18 0.18 0.18 0.15 0.08
## Cumulative Proportion 0.22 0.40 0.59 0.77 0.92 1.00
##
## With factor correlations of
## MR2 MR3 MR5 MR1 MR4 MR6
## MR2 1.00 -0.18 -0.07 0.24 -0.01 0.12
## MR3 -0.18 1.00 0.22 -0.22 0.21 0.01
## MR5 -0.07 0.22 1.00 -0.30 0.22 0.13
## MR1 0.24 -0.22 -0.30 1.00 -0.22 -0.09
## MR4 -0.01 0.21 0.22 -0.22 1.00 0.07
## MR6 0.12 0.01 0.13 -0.09 0.07 1.00
##
## Mean item complexity = 1.8
## Test of the hypothesis that 6 factors are sufficient.
##
## The degrees of freedom for the null model are 378 and the objective function was 7.93 with Chi Square of 11010.9
## The degrees of freedom for the model are 225 and the objective function was 0.61
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 1374 with the empirical chi square 602.53 with prob < 3.1e-36
## The total number of observations was 1400 with Likelihood Chi Square = 844.29 with prob < 1.8e-72
##
## Tucker Lewis Index of factoring reliability = 0.902
## RMSEA index = 0.044 and the 90 % confidence intervals are 0.041 0.048
## BIC = -785.66
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy
## MR2 MR3 MR5 MR1 MR4 MR6
## Correlation of (regression) scores with factors 0.93 0.89 0.88 0.89 0.86 0.78
## Multiple R square of scores with factors 0.86 0.79 0.78 0.79 0.73 0.60
## Minimum correlation of possible factor scores 0.71 0.59 0.57 0.57 0.47 0.20
# Create a path diagram of the items' factor loadings
fa.diagram(EFA_model)
We will examine items’ factor loadings and individuals’ factor scores.
An item’s loadings represent the amount of information it provides for each factor. Only items’ meaningful loadings will be displayed in the output.
You’ll notice that many items load onto more than one factor, which means they provide information about multiple factors. This may not be desirable for measure development, so some researchers consider only the strongest loading for each item.
Each examinee will have a factor score for each factor, so that the matrix won’t include blanks. However, examinees with missing data will receive NA scores on all factors.
# View items' factor loadings
EFA_model$loadings
##
## Loadings:
## MR2 MR3 MR5 MR1 MR4 MR6
## A1 0.111 -0.561 0.100 0.312
## A2 0.672
## A3 0.612 -0.104 0.128
## A4 0.183 0.417 0.157
## A5 -0.138 0.502 -0.159 0.179 0.211
## C1 0.519 0.197
## C2 0.667 0.189 0.124 0.186
## C3 0.526
## C4 -0.694 0.227
## C5 0.151 -0.565 0.204
## E1 -0.117 -0.120 0.642
## E2 0.659
## E3 0.161 -0.305 0.416 0.214
## E4 0.226 -0.490 0.317
## E5 0.136 0.269 -0.369 0.277
## N1 0.805 -0.103 -0.100
## N2 0.832
## N3 0.687
## N4 0.450 -0.157 0.392
## N5 0.482 0.181 0.198 -0.135 0.113
## O1 0.588
## O2 0.140 -0.321 0.395
## O3 0.617
## O4 0.132 0.156 0.339 0.393 -0.102
## O5 -0.416 0.380
## gender 0.218 0.141 0.309 -0.224 -0.131
## education 0.118 0.158 -0.209
## age 0.244 -0.257
##
## MR2 MR3 MR5 MR1 MR4 MR6
## SS loadings 2.459 1.987 1.935 1.821 1.644 0.893
## Proportion Var 0.088 0.071 0.069 0.065 0.059 0.032
## Cumulative Var 0.088 0.159 0.228 0.293 0.352 0.384
# View the first few lines of examinees' factor scores
head(EFA_model$scores)
## MR2 MR3 MR5 MR1 MR4 MR6
## 65709 2.0021187 -0.24275728 0.46012234 1.5313921 0.60206450 -0.3383439
## 62161 -0.1528222 -0.86285915 -0.17615070 -0.8113223 0.05121022 0.5098835
## 66929 -0.8763783 0.04723032 -0.05058311 1.5832182 0.17495513 -0.1624284
## 65841 NA NA NA NA NA NA
## 64673 0.1832247 -0.10032073 -1.78674553 2.1871294 -0.99658360 -0.5838158
## 61890 0.2378832 -0.38046507 0.05639170 -0.4019052 1.26262190 0.4306842
Now that you know the basics of model fit, let’s take a look at the absolute fit statistics for your six-factor EFA on the bfi_EFA dataset.
#Show the reference value for model good fit at P = 0.01
data.frame(
`Chi-square test` = "p-value < 0.01",
TLI = 0.916,
RMSEA = 0.045)
## Chi.square.test TLI RMSEA
## 1 p-value < 0.01 0.916 0.045
Usually for CFA, The TLI is above the 0.90 cutoff and the RMSEA is below the 0.05 cutoff indicate a good fit. The EFA_model shows the Tucker Lewis Index of factoring reliability = 0.885 and RMSEA index = 0.047. Therefore, the model has a good fit.
The items are theorized to load onto five factors, but the scree plot indicates the optimal number of six factors. We can use fit statistics to make am empirical decision about how many factors to use.
We can use the bfi_EFA dataset to run both hypothesized number of factors, five and six. Then, we compare the Bayesian Information Criterion (BIC) value to determine the preferred model, with its lower variation as a priority.
# Run each theorized EFA on your dataset
bfi_theory <- fa(bfi_EFA, nfactors = 5)
bfi_eigen <- fa(bfi_EFA, nfactors = 6)
# Compare the BIC values
bfi_theory$BIC
## [1] -431.2817
bfi_eigen$BIC
## [1] -785.6592
The eigenvalue-driven model has a much lower BIC, so it is empirically preferred.