Single-factor EFA

Use Exploratory Factor Analysis (EFA) when you are piloting a measure and are not yet sure which factor(s) each item measures.

The gcbs dataset used for this example contains 2,495 responses to 15 multiple-choice questions, or items, from the Generic Conspiracist Beliefs Scale, which is designed to measure respondents’ level of belief in conspiracies. Let’s start by running a single-factor EFA and plotting the loadings, mean scores, and confidence intervals for each item in the dataset. Factor loadings represent the strength and direction of the relationship between each item and the underlying factor.

Internal Consistency

If you want to both develop and confirm a theory of how items are related to underlying factors, you’ll want to use both EFA and CFA on your dataset. As a first step, you’ll need to split it to create a training and a testing set and then test for correlations and reliability.
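The split itself can be sketched in a few lines. This is a minimal Python illustration of a random half split by row index, not the code used for the analysis; the function name `split_half` and the seed are assumptions made for the example.

```python
import random

def split_half(n_rows, seed=42):
    """Return two disjoint lists of row indices that together cover 0..n_rows-1."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)
    half = n_rows // 2
    return sorted(indices[:half]), sorted(indices[half:])

# 2,495 is the number of gcbs responses; one half is used for EFA, the other for CFA.
efa_rows, cfa_rows = split_half(2495)
print(len(efa_rows), len(cfa_rows))  # 1247 1248
```

Shuffling before splitting avoids any ordering in the file (for example, by collection date) leaking into one half.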

Coefficient alpha, also called Cronbach’s alpha, measures the internal consistency of your measure, which is one form of reliability. It describes how closely the items on the measure relate to one another. The gcbs items have a coefficient alpha of 0.93, which suggests excellent reliability; in general, values greater than 0.8 are desired.


Reliability analysis   
Call: alpha(x = GCBS)

  raw_alpha std.alpha G6(smc) average_r S/N   ase mean sd median_r
      0.93      0.93    0.94      0.48  14 0.002  2.9  1     0.47

    95% confidence boundaries 
         lower alpha upper
Feldt     0.93  0.93  0.94
Duhachek  0.93  0.93  0.94

 Reliability if an item is dropped:
    raw_alpha std.alpha G6(smc) average_r S/N alpha se  var.r med.r
Q1       0.93      0.93    0.94      0.48  13   0.0021 0.0105  0.46
Q2       0.93      0.93    0.94      0.48  13   0.0021 0.0099  0.47
Q3       0.93      0.93    0.94      0.49  13   0.0020 0.0084  0.48
Q4       0.93      0.93    0.94      0.47  13   0.0022 0.0105  0.46
Q5       0.93      0.93    0.94      0.48  13   0.0021 0.0112  0.48
Q6       0.93      0.93    0.94      0.48  13   0.0021 0.0104  0.46
Q7       0.93      0.93    0.94      0.48  13   0.0021 0.0098  0.47
Q8       0.93      0.93    0.94      0.48  13   0.0020 0.0086  0.49
Q9       0.93      0.93    0.94      0.48  13   0.0021 0.0108  0.46
Q10      0.93      0.93    0.94      0.49  14   0.0020 0.0102  0.49
Q11      0.93      0.93    0.94      0.48  13   0.0021 0.0104  0.46
Q12      0.93      0.93    0.94      0.47  13   0.0022 0.0093  0.46
Q13      0.93      0.93    0.94      0.48  13   0.0021 0.0092  0.48
Q14      0.93      0.93    0.94      0.48  13   0.0021 0.0109  0.46
Q15      0.93      0.93    0.94      0.49  14   0.0020 0.0095  0.49

 Item statistics 
       n raw.r std.r r.cor r.drop mean  sd
Q1  2495  0.73  0.73  0.70   0.68  3.5 1.5
Q2  2495  0.74  0.74  0.72   0.69  3.0 1.5
Q3  2495  0.68  0.67  0.66   0.62  2.0 1.4
Q4  2495  0.78  0.78  0.76   0.74  2.6 1.5
Q5  2495  0.70  0.70  0.67   0.65  3.3 1.5
Q6  2495  0.76  0.76  0.74   0.72  3.1 1.5
Q7  2495  0.75  0.75  0.73   0.70  2.7 1.5
Q8  2495  0.69  0.69  0.68   0.63  2.5 1.6
Q9  2495  0.72  0.72  0.69   0.67  2.2 1.4
Q10 2495  0.61  0.61  0.57   0.55  3.5 1.4
Q11 2495  0.74  0.74  0.72   0.69  3.3 1.4
Q12 2495  0.79  0.79  0.79   0.75  2.6 1.5
Q13 2495  0.71  0.71  0.70   0.66  2.1 1.4
Q14 2495  0.76  0.76  0.74   0.71  3.0 1.5
Q15 2495  0.60  0.62  0.58   0.56  4.2 1.1

Non missing response frequency for each item
       0    1    2    3    4    5 miss
Q1  0.00 0.16 0.12 0.12 0.27 0.32    0
Q2  0.01 0.23 0.19 0.16 0.20 0.22    0
Q3  0.00 0.55 0.13 0.12 0.10 0.10    0
Q4  0.00 0.32 0.18 0.15 0.20 0.14    0
Q5  0.00 0.19 0.14 0.13 0.28 0.26    0
Q6  0.00 0.23 0.15 0.15 0.23 0.24    0
Q7  0.00 0.33 0.19 0.13 0.18 0.17    0
Q8  0.00 0.44 0.12 0.14 0.12 0.18    0
Q9  0.00 0.45 0.19 0.12 0.12 0.11    0
Q10 0.00 0.14 0.12 0.14 0.30 0.30    0
Q11 0.00 0.16 0.14 0.19 0.27 0.24    0
Q12 0.00 0.34 0.18 0.15 0.17 0.17    0
Q13 0.01 0.51 0.15 0.15 0.10 0.09    0
Q14 0.00 0.25 0.17 0.15 0.22 0.20    0
Q15 0.00 0.05 0.05 0.08 0.27 0.55    0
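
The headline numbers in this output can be checked by hand. The sketch below, in Python, shows the usual variance-based formula for raw alpha and the formula for standardized alpha from the mean inter-item correlation. It assumes the rounded values printed above (15 items, average_r = 0.48), so it can only reproduce the output to two decimals, and the tiny `toy` dataset is made up purely for illustration.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Raw coefficient alpha from a list of respondents' item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])  # variance of the sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def standardized_alpha(k, average_r):
    """Standardized alpha from the item count and the mean inter-item correlation."""
    return k * average_r / (1 + (k - 1) * average_r)

# Values read from the output above: 15 items, average_r = 0.48.
std_alpha = standardized_alpha(15, 0.48)
print(round(std_alpha, 2))                 # 0.93, matching std.alpha
print(round(std_alpha / (1 - std_alpha)))  # 14, matching S/N

# Made-up data: 4 respondents answering 3 items.
toy = [[1, 2, 1], [3, 3, 4], [4, 5, 4], [2, 2, 3]]
print(round(cronbach_alpha(toy), 2))       # 0.93
```

Because alpha grows with both the number of items and their average correlation, a 15-item scale with an average correlation near 0.5 will almost always show a high alpha.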

Multi-dimensional EFA

When you are trying to figure out how many unobservable (latent) factors are represented by the items on a measure, you can think of EFA as a dimensionality reduction technique. In that case, the eigenvalues of the item correlation matrix are a useful place to start.

The Big Five Inventory dataset contains 2,800 subjects’ responses to 25 questions that measure the big five personality traits: extraversion, agreeableness, openness, conscientiousness, and neuroticism. Item responses range from 1 to 6, representing respondents’ ratings on a six-point scale from Very Inaccurate to Very Accurate. Let’s pretend we don’t know about the five personality traits, or factors. To figure out how many factors the items represent, you can look at the eigenvalues of the item correlation matrix. Each eigenvalue quantifies the amount of variance explained by one factor or component. Because each standardized item contributes one unit of variance, a general rule is that eigenvalues greater than 1 represent meaningful factors. You can visualize the eigenvalues with a scree plot; here the plot recommends six factors.
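
To make the eigenvalue rule concrete, here is a small Python sketch that computes the eigenvalues of a correlation matrix and counts how many exceed 1. The 4 x 4 matrix is hypothetical (two pairs of items that correlate 0.6 within a pair and 0.1 across pairs), not the BFI data. The eigenvalue routine is a plain Jacobi iteration written out so the example has no dependencies; in practice you would use a library routine.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < 1e-20:
            break  # matrix is numerically diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Hypothetical correlation matrix: items 1-2 and items 3-4 form two clusters.
corr = [
    [1.0, 0.6, 0.1, 0.1],
    [0.6, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
]
eigenvalues = symmetric_eigenvalues(corr)
n_factors = sum(1 for value in eigenvalues if value > 1)

print([round(value, 2) for value in eigenvalues])  # [1.8, 1.4, 0.4, 0.4]
print(n_factors)                                   # 2
```

The eigenvalues of a correlation matrix always sum to the number of items (4 here, 25 for the BFI), which is why an eigenvalue above 1 means a factor accounts for more variance than a single item does.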

Multi-factor EFA and Model Fit Statistics

Following the scree plot, we first fit a factor analysis model with six factors, as the eigenvalues suggest. We then fit one with five factors, as the theory suggests, and choose between the two using model fit statistics.

Factor Analysis using method =  minres
Call: fa(r = bfi_EFA, nfactors = 6)
Standardized loadings (pattern matrix) based upon correlation matrix
     MR2   MR1   MR3   MR5   MR4   MR6   h2   u2 com
A1  0.07 -0.10  0.10 -0.61  0.05  0.26 0.37 0.63 1.5
A2  0.06 -0.10  0.07  0.68  0.01 -0.03 0.53 0.47 1.1
A3 -0.05 -0.10  0.06  0.56  0.10  0.21 0.53 0.47 1.5
A4 -0.07 -0.01  0.23  0.37 -0.10  0.24 0.30 0.70 2.7
A5 -0.17 -0.13  0.04  0.40  0.12  0.28 0.45 0.55 2.7
C1  0.03  0.06  0.52 -0.08  0.22  0.08 0.33 0.67 1.5
C2  0.10  0.13  0.64 -0.02  0.13  0.17 0.46 0.54 1.4
C3  0.03  0.06  0.57  0.08 -0.02  0.02 0.33 0.67 1.1
C4  0.07  0.09 -0.62 -0.08  0.06  0.30 0.55 0.45 1.6
C5  0.15  0.12 -0.58 -0.04  0.13  0.02 0.45 0.55 1.3
E1 -0.13  0.62  0.11 -0.10 -0.06  0.05 0.40 0.60 1.2
E2  0.07  0.68 -0.01 -0.08 -0.06  0.01 0.56 0.44 1.1
E3  0.00 -0.38 -0.01  0.10  0.37  0.22 0.48 0.52 2.8
E4 -0.08 -0.48  0.06  0.14  0.07  0.37 0.55 0.45 2.2
E5  0.15 -0.42  0.25  0.09  0.20 -0.03 0.40 0.60 2.6
N1  0.84 -0.10  0.01 -0.06 -0.07 -0.03 0.68 0.32 1.1
N2  0.84 -0.04  0.03 -0.02  0.00 -0.07 0.68 0.32 1.0
N3  0.64  0.12 -0.06  0.05  0.07  0.13 0.52 0.48 1.2
N4  0.44  0.40 -0.17  0.06  0.12  0.04 0.52 0.48 2.5
N5  0.45  0.23 -0.01  0.12 -0.09  0.20 0.36 0.64 2.2
O1 -0.06 -0.08  0.07 -0.06  0.58  0.02 0.37 0.63 1.1
O2  0.12  0.01 -0.09  0.04 -0.36  0.39 0.30 0.70 2.4
O3  0.00 -0.12  0.00  0.07  0.61 -0.04 0.45 0.55 1.1
O4  0.09  0.34 -0.04  0.13  0.38  0.01 0.25 0.75 2.4
O5  0.03 -0.06 -0.04 -0.07 -0.42  0.39 0.32 0.68 2.1

                       MR2  MR1  MR3  MR5  MR4  MR6
SS loadings           2.50 2.15 2.09 1.78 1.61 0.98
Proportion Var        0.10 0.09 0.08 0.07 0.06 0.04
Cumulative Var        0.10 0.19 0.27 0.34 0.41 0.44
Proportion Explained  0.23 0.19 0.19 0.16 0.14 0.09
Cumulative Proportion 0.23 0.42 0.61 0.77 0.91 1.00

 With factor correlations of 
      MR2   MR1   MR3   MR5   MR4   MR6
MR2  1.00  0.26 -0.20 -0.14  0.04  0.11
MR1  0.26  1.00 -0.26 -0.31 -0.19 -0.10
MR3 -0.20 -0.26  1.00  0.21  0.19  0.02
MR5 -0.14 -0.31  0.21  1.00  0.26  0.15
MR4  0.04 -0.19  0.19  0.26  1.00  0.08
MR6  0.11 -0.10  0.02  0.15  0.08  1.00

Mean item complexity =  1.7
Test of the hypothesis that 6 factors are sufficient.

The degrees of freedom for the null model are  300  and the objective function was  7.4 with Chi Square of  10278.59
The degrees of freedom for the model are 165  and the objective function was  0.45 

The root mean square of the residuals (RMSR) is  0.02 
The df corrected root mean square of the residuals is  0.03 

The harmonic number of observations is  1382 with the empirical chi square  383.81  with prob <  1.8e-19 
The total number of observations was  1400  with Likelihood Chi Square =  625.59  with prob <  8.5e-55 

Tucker Lewis Index of factoring reliability =  0.916
RMSEA index =  0.045  and the 90 % confidence intervals are  0.041 0.048
BIC =  -569.71
Fit based upon off diagonal values = 0.99
Measures of factor score adequacy             
                                                   MR2  MR1  MR3  MR5  MR4  MR6
Correlation of (regression) scores with factors   0.93 0.89 0.88 0.87 0.85 0.79
Multiple R square of scores with factors          0.86 0.79 0.78 0.76 0.72 0.63
Minimum correlation of possible factor scores     0.73 0.58 0.56 0.51 0.44 0.25

TLI for this model is 0.916 and RMSEA is 0.045, both of which indicate a good fit.

RMSEA, or the Root Mean Square Error of Approximation, quantifies the difference between the observed and expected data. Smaller values are better, and values below 0.05 are commonly taken to indicate good fit.
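
As a rough check, the RMSEA can be recomputed from the likelihood chi-square, the model degrees of freedom, and the number of observations printed in the six-factor output. The Python sketch below uses a common textbook form of the formula; the exact expression used by the fitting software may differ slightly, so treat the agreement as approximate.

```python
import math

def rmsea(chi_square, df, n_obs):
    """Root Mean Square Error of Approximation (textbook form)."""
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n_obs - 1)))

# Chi-square, model df, and N as printed in the six-factor output.
print(round(rmsea(625.59, 165, 1400), 3))  # 0.045
```

The `max(..., 0.0)` guard sets the index to zero when the chi-square is smaller than its degrees of freedom, that is, when the model fits at least as well as expected by chance.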

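The TLI, or Tucker-Lewis Index, compares the fit of the model with that of a null model in which the items are uncorrelated, so it can be roughly understood as how much better the model matches the observed data than no model at all. A model can be considered to have good fit if the value is greater than 0.90.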
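
The TLI can likewise be recomputed from the printed output, using the chi-square and degrees of freedom of both the null model and the six-factor model. This Python sketch uses the standard ratio form of the index and the values as they appear in the output.

```python
def tli(chi_null, df_null, chi_model, df_model):
    """Tucker-Lewis Index from null-model and fitted-model chi-squares."""
    null_ratio = chi_null / df_null     # chi-square per df with no factors
    model_ratio = chi_model / df_model  # chi-square per df for the fitted model
    return (null_ratio - model_ratio) / (null_ratio - 1)

# Null model: chi-square 10278.59 on 300 df; six-factor model: 625.59 on 165 df.
print(round(tli(10278.59, 300, 625.59, 165), 3))  # 0.916
```

A value near 1 means the fitted model removes almost all of the misfit per degree of freedom that the null model leaves behind.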

To use relative fit statistics, we compare two different models: the five-factor model recommended by the theory behind the BFI dataset and the six-factor model recommended by the eigenvalues. When comparing BICs, the model with the lower BIC is preferred. Here the BIC is lower for the six-factor eigenvalue model (BIC = -569.71 in the output above).
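
The BIC printed in the six-factor output is consistent with the chi-square minus the degrees of freedom times the natural log of the sample size. The Python sketch below assumes that form; the function name `fa_bic` is made up for the example.

```python
import math

def fa_bic(chi_square, df, n_obs):
    """BIC in the form chi-square minus df * ln(N); lower is better."""
    return chi_square - df * math.log(n_obs)

# Chi-square, model df, and N as printed in the six-factor output.
print(round(fa_bic(625.59, 165, 1400), 2))  # -569.71
```

Since this form subtracts a term that grows with the degrees of freedom, BIC values are only meaningful relative to another model fitted to the same data.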