Multidimensional EFA

Multidimensional EFA is similar to its unidimensional variance, but with two or more latent variable (factor) in the model.

Splitting the BFI dataset

For this chapter, you’ll be using the bfi dataset, which consists of responses to 25 items measuring the Big Five personality traits.

Since you’ll be doing both exploratory and confirmatory factor analyses on this data, you’ll want to start out by splitting the dataset.

setwd("D:/Class Materials & Work/Summer 2020 practice/SEM/EFA_Multidimensional")

library(psych)
library(tidyverse)
library(semPlot)
library(ggcorrplot)

data("bfi")

# Establish two sets of indices to split the dataset
N <- nrow(bfi)
indices <- seq(1, N)
indices_EFA <- sample(indices, floor((.5*N)))
indices_CFA <- indices[!(indices %in% indices_EFA)]

# Use those indices to split the dataset into halves for your EFA and CFA
bfi_EFA <- bfi[indices_EFA, ]
bfi_CFA <- bfi[indices_CFA, ]

Calculate Eigenvalue

To empirically determine the dimensionality of your data, a common strategy is to examine the eigenvalues. Eigenvalues are calculated from a correlation matrix, so you’ll need to use cor() to calculate and store the dataset’s correlation matrix before calculating eigenvalues.

Note that we do not use lowercor() because the function will force the code to display the matrix, which is not yet necessary.

In computing the correlation, use pairwise complete observations to prevent NA in our matrix from missing data.

We will first calculate the correlation on bfi_EFA, and retain the bfi_CFA for our confirmatory analysis.

# Calculate the correlation matrix first
bfi_EFA_cor <- cor(bfi_EFA, use = "pairwise.complete.obs") 

# Then use that correlation matrix to calculate eigenvalues
eigenvals <- eigen(bfi_EFA_cor)

# Look at the eigenvalues returned
eigenvals$values

##  [1] 5.2710748 2.8431513 2.0960942 1.8751451 1.6514697 1.2529541 1.0709875
##  [8] 0.9249280 0.8093461 0.8014285 0.7318150 0.6989594 0.6874003 0.6686124
## [15] 0.6473311 0.6053515 0.5737552 0.5647180 0.5236901 0.5074725 0.4873091
## [22] 0.4484087 0.4327757 0.4280691 0.3854361 0.3775701 0.3646952 0.2700513

Creating a scree plot

The scree plot is a visual representation of eigenvalues. Visual inspection of the scree plot is a quick and easy way to get a feel for the dimensionality of your dataset. Like the eigen() function in the previous exercise, the scree() function takes a correlation matrix as its argument.

Eigenvalues can be generated from a principal component analysis or a factor analysis, and the scree() function calculates and plots both by default. Since eigen() finds eigenvalues via principal components analysis, we will use factors = FALSE so our scree plot will only display the values corresponding to those results.

# Then use the correlation matrix to create the scree plot

scree(bfi_EFA_cor, factors = FALSE)

From the above plot, The point where the slope of the curve is clearly leveling off (the "elbow) indicates that the number of factors to the left of this point should be retained as significant. For our case, six factors are recommended.

Conducting a multidimensional EFA

In our unidimensional EFA practice, we ran a unidimensional EFA by using the fa() function. To run a multidimensional EFA, you’ll want to use the nfactors argument to specify the number of factors desired.

# Run the EFA with six factors (as indicated by your scree plot)
EFA_model <- fa(bfi_EFA, nfactors = 6)

# View results from the model object
print(EFA_model)

## Factor Analysis using method =  minres
## Call: fa(r = bfi_EFA, nfactors = 6)
## Standardized loadings (pattern matrix) based upon correlation matrix
##             MR2   MR3   MR5   MR1   MR4   MR6    h2   u2 com
## A1         0.11  0.05 -0.56 -0.05  0.10  0.31 0.363 0.64 1.8
## A2         0.04  0.06  0.67 -0.07  0.00 -0.05 0.496 0.50 1.1
## A3        -0.02  0.03  0.61 -0.10  0.09  0.13 0.518 0.48 1.2
## A4        -0.10  0.18  0.42  0.00 -0.06  0.16 0.286 0.71 1.9
## A5        -0.14  0.00  0.50 -0.16  0.18  0.21 0.523 0.48 2.1
## C1         0.02  0.52 -0.03  0.08  0.20  0.03 0.325 0.67 1.4
## C2         0.07  0.67  0.04  0.19  0.12  0.19 0.509 0.49 1.4
## C3         0.02  0.53  0.08  0.05  0.00  0.05 0.291 0.71 1.1
## C4         0.06 -0.69 -0.03  0.09  0.09  0.23 0.575 0.42 1.3
## C5         0.15 -0.56  0.03  0.20  0.08  0.00 0.450 0.55 1.5
## E1        -0.12  0.07 -0.12  0.64 -0.07  0.09 0.452 0.55 1.2
## E2         0.08 -0.04 -0.09  0.66 -0.08  0.00 0.557 0.44 1.1
## E3         0.01  0.01  0.16 -0.31  0.42  0.21 0.488 0.51 2.8
## E4        -0.04  0.04  0.23 -0.49  0.04  0.32 0.548 0.45 2.2
## E5         0.14  0.27  0.08 -0.37  0.28  0.01 0.428 0.57 3.2
## N1         0.81 -0.01 -0.10 -0.10  0.00  0.01 0.641 0.36 1.1
## N2         0.83  0.02 -0.04 -0.06 -0.02 -0.05 0.664 0.34 1.0
## N3         0.69 -0.04  0.06  0.10  0.03  0.05 0.530 0.47 1.1
## N4         0.45 -0.16  0.08  0.39  0.07  0.01 0.486 0.51 2.4
## N5         0.48  0.03  0.18  0.20 -0.13  0.11 0.354 0.65 2.0
## O1        -0.03  0.09 -0.04 -0.04  0.59  0.05 0.387 0.61 1.1
## O2         0.14 -0.09  0.08  0.01 -0.32  0.39 0.300 0.70 2.4
## O3        -0.02  0.00  0.04 -0.08  0.62 -0.07 0.423 0.58 1.1
## O4         0.13 -0.02  0.16  0.34  0.39 -0.10 0.269 0.73 2.7
## O5         0.04 -0.04  0.01  0.04 -0.42  0.38 0.318 0.68 2.0
## gender     0.22  0.14  0.31 -0.08 -0.22 -0.13 0.180 0.82 3.8
## education -0.05 -0.01  0.08  0.12  0.16 -0.21 0.079 0.92 3.1
## age       -0.09  0.09  0.24  0.07  0.04 -0.26 0.143 0.86 2.7
## 
##                        MR2  MR3  MR5  MR1  MR4  MR6
## SS loadings           2.55 2.14 2.13 2.08 1.76 0.92
## Proportion Var        0.09 0.08 0.08 0.07 0.06 0.03
## Cumulative Var        0.09 0.17 0.24 0.32 0.38 0.41
## Proportion Explained  0.22 0.18 0.18 0.18 0.15 0.08
## Cumulative Proportion 0.22 0.40 0.59 0.77 0.92 1.00
## 
##  With factor correlations of 
##       MR2   MR3   MR5   MR1   MR4   MR6
## MR2  1.00 -0.18 -0.07  0.24 -0.01  0.12
## MR3 -0.18  1.00  0.22 -0.22  0.21  0.01
## MR5 -0.07  0.22  1.00 -0.30  0.22  0.13
## MR1  0.24 -0.22 -0.30  1.00 -0.22 -0.09
## MR4 -0.01  0.21  0.22 -0.22  1.00  0.07
## MR6  0.12  0.01  0.13 -0.09  0.07  1.00
## 
## Mean item complexity =  1.8
## Test of the hypothesis that 6 factors are sufficient.
## 
## The degrees of freedom for the null model are  378  and the objective function was  7.93 with Chi Square of  11010.9
## The degrees of freedom for the model are 225  and the objective function was  0.61 
## 
## The root mean square of the residuals (RMSR) is  0.02 
## The df corrected root mean square of the residuals is  0.03 
## 
## The harmonic number of observations is  1374 with the empirical chi square  602.53  with prob <  3.1e-36 
## The total number of observations was  1400  with Likelihood Chi Square =  844.29  with prob <  1.8e-72 
## 
## Tucker Lewis Index of factoring reliability =  0.902
## RMSEA index =  0.044  and the 90 % confidence intervals are  0.041 0.048
## BIC =  -785.66
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy             
##                                                    MR2  MR3  MR5  MR1  MR4  MR6
## Correlation of (regression) scores with factors   0.93 0.89 0.88 0.89 0.86 0.78
## Multiple R square of scores with factors          0.86 0.79 0.78 0.79 0.73 0.60
## Minimum correlation of possible factor scores     0.71 0.59 0.57 0.57 0.47 0.20

# Create a path diagram of the items' factor loadings
fa.diagram(EFA_model)

Interpreting the results

We will examine items’ factor loadings and individuals’ factor scores.
An item’s loadings represent the amount of information it provides for each factor. Only items’ meaningful loadings will be displayed in the output.

You’ll notice that many items load onto more than one factor, which means they provide information about multiple factors. This may not be desirable for measure development, so some researchers consider only the strongest loading for each item.

Each examinee will have a factor score for each factor, so that the matrix won’t include blanks. However, examinees with missing data will receive NA scores on all factors.

# View items' factor loadings
EFA_model$loadings

## 
## Loadings:
##           MR2    MR3    MR5    MR1    MR4    MR6   
## A1         0.111        -0.561         0.100  0.312
## A2                       0.672                     
## A3                       0.612 -0.104         0.128
## A4                0.183  0.417                0.157
## A5        -0.138         0.502 -0.159  0.179  0.211
## C1                0.519                0.197       
## C2                0.667         0.189  0.124  0.186
## C3                0.526                            
## C4               -0.694                       0.227
## C5         0.151 -0.565         0.204              
## E1        -0.117        -0.120  0.642              
## E2                              0.659              
## E3                       0.161 -0.305  0.416  0.214
## E4                       0.226 -0.490         0.317
## E5         0.136  0.269        -0.369  0.277       
## N1         0.805        -0.103 -0.100              
## N2         0.832                                   
## N3         0.687                                   
## N4         0.450 -0.157         0.392              
## N5         0.482         0.181  0.198 -0.135  0.113
## O1                                     0.588       
## O2         0.140                      -0.321  0.395
## O3                                     0.617       
## O4         0.132         0.156  0.339  0.393 -0.102
## O5                                    -0.416  0.380
## gender     0.218  0.141  0.309        -0.224 -0.131
## education                       0.118  0.158 -0.209
## age                      0.244               -0.257
## 
##                  MR2   MR3   MR5   MR1   MR4   MR6
## SS loadings    2.459 1.987 1.935 1.821 1.644 0.893
## Proportion Var 0.088 0.071 0.069 0.065 0.059 0.032
## Cumulative Var 0.088 0.159 0.228 0.293 0.352 0.384

# View the first few lines of examinees' factor scores
head(EFA_model$scores)

##              MR2         MR3         MR5        MR1         MR4        MR6
## 65709  2.0021187 -0.24275728  0.46012234  1.5313921  0.60206450 -0.3383439
## 62161 -0.1528222 -0.86285915 -0.17615070 -0.8113223  0.05121022  0.5098835
## 66929 -0.8763783  0.04723032 -0.05058311  1.5832182  0.17495513 -0.1624284
## 65841         NA          NA          NA         NA          NA         NA
## 64673  0.1832247 -0.10032073 -1.78674553  2.1871294 -0.99658360 -0.5838158
## 61890  0.2378832 -0.38046507  0.05639170 -0.4019052  1.26262190  0.4306842

Interpret absolute model fit statistics

Now that you know the basics of model fit, let’s take a look at the absolute fit statistics for your six-factor EFA on the bfi_EFA dataset.

#Show the reference value for model good fit at P = 0.01
data.frame(
  `Chi-square test` = "p-value < 0.01", 
  TLI = 0.916, 
  RMSEA = 0.045)

##   Chi.square.test   TLI RMSEA
## 1  p-value < 0.01 0.916 0.045

Usually for CFA, The TLI is above the 0.90 cutoff and the RMSEA is below the 0.05 cutoff indicate a good fit. The EFA_model shows the Tucker Lewis Index of factoring reliability = 0.885 and RMSEA index = 0.047. Therefore, the model has a good fit.

Selecting the best model

The items are theorized to load onto five factors, but the scree plot indicates the optimal number of six factors. We can use fit statistics to make am empirical decision about how many factors to use.

We can use the bfi_EFA dataset to run both hypothesized number of factors, five and six. Then, we compare the Bayesian Information Criterion (BIC) value to determine the preferred model, with its lower variation as a priority.

# Run each theorized EFA on your dataset
bfi_theory <- fa(bfi_EFA, nfactors = 5)
bfi_eigen <- fa(bfi_EFA, nfactors = 6)

# Compare the BIC values
bfi_theory$BIC

## [1] -431.2817

bfi_eigen$BIC

## [1] -785.6592

The eigenvalue-driven model has a much lower BIC, so it is empirically preferred.