Introduction to Exploratory Factor Analysis

Factor analysis includes both confirmatory factor analysis (CFA) and exploratory factor analysis (EFA), which serve different research purposes. CFA is mainly used to test, in a confirmatory way, a hypothesized model based on substantive theory and/or prior empirical studies. The major purposes of EFA are to identify and interpret the underlying construct(s), to develop operational/observed indicators of the underlying construct(s), and to validate measures using different samples.

EFA is frequently treated as equivalent to Principal Component Analysis (PCA), although the two are fundamentally different. Factor analysis is a statistical method used to describe the associations among observed variables in terms of one or more latent variables, whereas PCA decomposes a covariance or correlation matrix into components (Helwig, 2017).
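The contrast can be written in commonly used notation. The symbols here are introduced for illustration and are not taken from Helwig (2017). The equations assume standardized observed variables x with correlation matrix R, uncorrelated unit-variance factors f, and unique errors ε that are uncorrelated with the factors.

The factor model explains the correlations through a small number of latent factors (loadings Λ) plus variable-specific unique variances (diagonal Ψ). PCA, by contrast, re-expresses R exactly through its eigenvectors V and eigenvalues D, with no error term.

```latex
\begin{aligned}
\text{Common factor model:}\quad & \mathbf{x} = \boldsymbol{\Lambda}\mathbf{f} + \boldsymbol{\varepsilon},
  \qquad \mathbf{R} = \boldsymbol{\Lambda}\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi},\\
\text{PCA:}\quad & \mathbf{R} = \mathbf{V}\mathbf{D}\mathbf{V}^{\top},
  \qquad \mathbf{c} = \mathbf{V}^{\top}\mathbf{x},
  \qquad \operatorname{Cov}(\mathbf{c}) = \mathbf{D}.
\end{aligned}
```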

The general procedure for an EFA study includes three steps:

(1) identifying the number of factors,
(2) running an EFA to obtain factor loading matrices, and
(3) interpreting the meaning of the factors.

Depending on their research needs, researchers may modify their models by removing poor-quality indicators (e.g., those with low factor loadings) and then repeating the procedure. This tutorial focuses on the first two steps and shows how to use our EFA Shiny App to conduct an EFA.


About the EFA Shiny App

This Shiny App provides an interactive online tool for EFA and PCA with both continuous and ordinal categorical variables (e.g., binary or Likert-type items).


Step-by-step tutorial

Pre-analysis: 1. Open the link in a web browser

The URL of the EFA Shiny App is https://lokhc.shinyapps.io/EFA2/. Opening it in a web browser displays a simple web page.


Pre-analysis: 2. Import a CSV or SAV data file

To run an EFA, you first need to upload a CSV or SPSS (.sav) data file. In this tutorial, we upload a sample dataset named WISC_data.sav, which comes from a publicly available source. Users can obtain this sample dataset from the following link: http://psych.colorado.edu/~carey/Courses/PSYC7291/ClassDataSets.htm.

This dataset comes from Using Multivariate Statistics (Tabachnick & Fidell, 1996). It contains subscale scores for the Wechsler Intelligence Scale for Children (WISC-R).


On the right side, the uploaded data file is displayed under the “Data Table” tab. By default, the variable names and the first 10 rows are shown. Users can change this by clicking “Show 10 entries” and selecting a different number of rows.


Pre-analysis: 3. Select variables for analysis

In this example, the first two variables, “client” and “agemate”, are excluded because they are not intelligence items in the test. The remaining 11 variables are included: Information, Comprehension, Arithmetic, Similarities, Vocabulary, Digit Span, Picture Completion, Picture Arrangement, Block Design, Object Assembly, and Coding.
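Outside the app, the same import-and-select step can be sketched in Python with pandas. This is a minimal sketch under these assumptions:

- The function name load_items and the EXCLUDED list are hypothetical.
- The column names "client" and "agemate" are taken from the text above.
- Reading .sav files through pandas.read_spss requires the optional pyreadstat package.

```python
from pathlib import Path

import pandas as pd

# Hypothetical list of identifier columns that are not intelligence items.
EXCLUDED = ["client", "agemate"]


def load_items(path):
    """Read a CSV or SPSS (.sav) file and keep only the item columns."""
    path = Path(path)
    if path.suffix.lower() == ".sav":
        # pandas.read_spss needs the optional pyreadstat package installed.
        df = pd.read_spss(path)
    else:
        df = pd.read_csv(path)
    # Drop the identifier columns that are present; keep every other variable.
    return df.drop(columns=[c for c in EXCLUDED if c in df.columns])
```

For example, if WISC_data.sav has been downloaded locally, load_items("WISC_data.sav").corr() would return the correlation matrix of the remaining item columns.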


EFA Procedure

Step-1: Identifying the number of factors using parallel analysis

You can run Horn’s parallel analysis to help determine the number of factors to extract from the data. For more details about parallel analysis, please see Horn (1965), O’Connor (2000) and Garrido et al. (2013).


This demonstration uses the psych R package, which provides parallel analysis results for both EFA and PCA. The main difference between the two methods is the matrix they eigen-decompose:

- PCA eigen-decomposes the correlation matrix R itself.
- EFA eigen-decomposes R with its diagonal elements replaced by the communalities (the reduced correlation matrix).
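The difference between the two decompositions can be illustrated with a short NumPy sketch. The function name is hypothetical. Using squared multiple correlations (SMCs), h_j^2 = 1 - 1/(R^-1)_jj, as the communality estimates on the diagonal is one common choice and is assumed here. It is not necessarily what psych uses internally.

```python
import numpy as np


def full_and_reduced_eigenvalues(R):
    """Return descending eigenvalues of R (PCA) and of the reduced matrix (EFA).

    The reduced matrix is R with its unit diagonal replaced by squared
    multiple correlations, used here as communality estimates (an assumption).
    """
    R = np.asarray(R, dtype=float)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    pca_vals = np.linalg.eigvalsh(R)[::-1]
    fa_vals = np.linalg.eigvalsh(R_reduced)[::-1]
    return pca_vals, fa_vals
```

The PCA eigenvalues sum to the number of variables (the trace of R). The reduced-matrix eigenvalues sum only to the total communality and can be negative. This is why the FA eigenvalues in a parallel analysis are much smaller than the PCA ones.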

In this example, the FA method suggests two factors and the PCA method suggests two components. Under “Eigenvalues”, the first two columns present the EFA results and the third and fourth columns present the PCA results. For each method, eigenvalues calculated from the original data are compared with those obtained from simulated data.

- EFA: for the third factor, the eigenvalues from the original and simulated data are almost equal (0.25 vs. 0.24), so parallel analysis recommends two factors.
- PCA: for the third component, the eigenvalue from the original data (1.12) is smaller than that from the simulated data (1.21), so two components are recommended.
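The comparison described above can be sketched in Python with NumPy. Observed eigenvalues are compared with the mean eigenvalues of random standard normal data of the same size. The following are all illustrative assumptions:

- the function names,
- the default of 100 simulations,
- the use of mean (rather than percentile) simulated eigenvalues,
- the SMC-reduced matrix for the FA variant.

The psych function fa.parallel has its own defaults, which this sketch does not try to reproduce. The retention rule here is a strict comparison, so it does not capture a judgment call on near-equal values such as 0.25 vs. 0.24.

```python
import numpy as np


def _eigs(X, reduced):
    """Descending eigenvalues of corr(X), optionally with SMCs on the diagonal."""
    R = np.corrcoef(X, rowvar=False)
    if reduced:
        np.fill_diagonal(R, 1.0 - 1.0 / np.diag(np.linalg.inv(R)))
    return np.linalg.eigvalsh(R)[::-1]


def parallel_analysis(X, n_iter=100, reduced=False, seed=0):
    """Sketch of Horn's parallel analysis.

    Compares observed eigenvalues with the mean eigenvalues of n_iter random
    normal datasets of the same shape. Set reduced=True for the FA variant.
    Returns (observed, simulated_mean, n_retained).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    rng = np.random.default_rng(seed)
    observed = _eigs(X, reduced)
    simulated = np.mean(
        [_eigs(rng.standard_normal((n, p)), reduced) for _ in range(n_iter)], axis=0
    )
    exceeds = observed > simulated
    # Retain leading factors up to the first one that does not beat chance.
    n_retained = p if exceeds.all() else int(np.argmin(exceeds))
    return observed, simulated, n_retained
```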

Note. Detailed descriptions of the differences between FA and PCA in parallel analysis can be found at https://stats.stackexchange.com/questions/52224/whats-the-difference-between-a-component-and-a-factor-in-parallel-analysis.


The “Scree Plots” section visualizes the results of the parallel analysis. The plot on the left shows the PCA results and the plot on the right shows the EFA results. Both plots suggest a two-factor solution.


Step-2: Running an EFA to obtain factor loading matrices

Click the “Extraction” tab to run either an EFA or a PCA.
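To show what an extraction step computes, here is a NumPy sketch of iterated principal axis factoring, which produces an unrotated loading matrix. Principal axis factoring is only one common extraction method. The source does not say which method the app uses; psych’s fa function, for example, offers several, with minimum residual as its default. The function name, the SMC starting values, and the convergence settings are assumptions.

```python
import numpy as np


def principal_axis(R, n_factors, max_iter=500, tol=1e-6):
    """Iterated principal axis factoring of a correlation matrix R.

    Each iteration:
    (1) puts the current communalities on the diagonal of R,
    (2) keeps the top n_factors eigenpairs,
    (3) updates communalities as row sums of squared loadings.

    Returns the unrotated p x n_factors loadings and the communalities.
    """
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # SMC starting values
    for _ in range(max_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        vals, vecs = np.linalg.eigh(R_reduced)
        idx = np.argsort(vals)[::-1][:n_factors]
        # Clip small negative eigenvalues so the square root stays real.
        loadings = vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2
```

The signs of each loading column are arbitrary, because eigenvectors are defined only up to sign. In practice the loadings would usually be rotated before interpretation.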


A brief description of the output from this EFA Shiny App is provided as follows.





The PCA procedure is similar to the EFA procedure. For more information about PCA, please refer to the literature listed in the References, e.g., Field (2009) and Helwig (2017).
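For comparison with the factor approach, unrotated PCA loadings can be sketched in NumPy as the leading eigenvectors of R scaled by the square roots of their eigenvalues. The function name is hypothetical.

```python
import numpy as np


def pca_loadings(R, n_components):
    """Unrotated PCA loadings: top eigenvectors of R times sqrt(eigenvalues)."""
    R = np.asarray(R, dtype=float)
    vals, vecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(vals[idx])
```

Unlike the factor model, PCA uses the full diagonal of R. Keeping all components therefore reproduces R exactly: A @ A.T equals R, where A is the full loading matrix.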


References

Abdi, H. (2003). Factor rotations in factor analyses. Encyclopedia for Research Methods for the Social Sciences (pp. 792-795). Thousand Oaks, CA: Sage.

Field, A. (2009). Exploratory factor analysis. In Discovering statistics using SPSS. Sage Publications.

Garrett-Mayer, E. (2006). Factor Analysis I. Retrieved from: ocw.jhsph.edu/courses/StatisticsPsychosocialResearch/PDFs/Lecture8.pdf

Garrett-Mayer, E. (2006). Factor Analysis II. Retrieved from: ocw.jhsph.edu/courses/StatisticsPsychosocialResearch/PDFs/Lecture9.pdf

Garrido, L. E., Abad, F. J., & Ponsoda, V. (2013). A new look at Horn’s parallel analysis with ordinal variables. Psychological Methods, 18(4), 454.

Helwig, N. E. (2017). Factor Analysis. Retrieved from: users.stat.umn.edu/~helwig/notes/factanal-Notes.pdf

Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179-185.

O’Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396-402.

Revelle, W. (2017). How To: Use the psych package for Factor Analysis and data Reduction. Retrieved from: http://personality-project.org/r/psych/HowTo/factor.pdf

Tryfos, P. (1997). Chapter 14: Factor analysis. In Methods for Business Analysis and Forecasting: Text and Case.

Zhang, G., & Preacher, K. J. (2015). Factor rotation and standard errors in exploratory factor analysis. Journal of Educational and Behavioral Statistics, 40(6), 579-603.