Factor analysis includes both confirmatory factor analysis (CFA) and exploratory factor analysis (EFA). However, CFA and EFA serve different research purposes. CFA is mainly used to test a hypothesized model that is based on substantive theory and/or prior empirical studies in a confirmatory way. The major purposes of EFA are to identify and interpret the underlying construct(s), to develop operational/observed indicators of the underlying construct(s), and to validate measures using different samples.
EFA is frequently mistaken for Principal Component Analysis (PCA), though the two are fundamentally different. Factor analysis is a statistical method used to describe associations among observed variables in terms of one or more latent variables, whereas PCA decomposes a covariance or correlation matrix into components (Helwig, 2017).
The general procedure for conducting an EFA study includes three steps: (1) identifying the number of factors, (2) running an EFA to obtain factor loading matrices, and (3) interpreting the meaning of the factors. Depending on their specific research needs, researchers can modify their models by removing poor-quality indicators (e.g., those with low factor loadings) and go through the procedure again. This tutorial focuses on the first two steps and introduces how to use our EFA Shiny App to conduct an EFA.
This Shiny App provides an interactive online tool for EFA and PCA with both continuous and ordinal categorical variables (e.g., binary or Likert-type items).
Pre-analysis: 1. Open the link in a web browser
The URL of the EFA Shiny App is https://lokhc.shinyapps.io/EFA2/. Open it in a web browser and a simple web page will appear.
Pre-analysis: 2. Import a CSV or SAV data file
To run an EFA, you first need to upload a CSV or SPSS (.sav) data file. In this guide, a sample data set named WISC_data.sav, which comes from a publicly available data source, is uploaded. Users can obtain this sample data set at the following link: http://psych.colorado.edu/~carey/Courses/PSYC7291/ClassDataSets.htm.
This data set comes from Using Multivariate Statistics (Tabachnick & Fidell, 1996) and contains subscale scores for the Wechsler Intelligence Scale for Children (WISC-R).
Click the “Browse…” button, find where the data file is stored, and double-click the data file.
When the data set has been uploaded successfully, the progress bar shows “Upload complete”, a sample of the data (10 rows) appears on the right side, and three tabs are shown under the progress bar (i.e., “Variables”, “Parallel Analysis”, and “Extraction”).
On the right side, the data file you just uploaded is displayed under the “Data Table” tab. The variable names and the first 10 rows are shown by default. Users can change this default by clicking “Show 10 entries” and selecting a different number of rows.
Pre-analysis: 3. Select variables for analysis
Select a subset of variables: Click the box under “Select Variables” to choose variables one by one.
Select all of the variables: Click the “Select All” button first; to remove a variable, click that variable name and press the “Delete” key on your keyboard.
Remove all variables: Click “Select None”.
For example, the first two variables, “client” and “agemate”, are excluded because they are not intelligence items in the test. The remaining variables are included: Information, Comprehension, Arithmetic, Similarities, Vocabulary, Digit Span, Picture Completion, Picture Arrangement, Block Design, Object Assembly, and Coding.
EFA Procedure
Step-1: Identifying the number of factors using parallel analysis
You can run Horn’s parallel analysis to help determine the number of factors to extract from the data. For more details about parallel analysis, please see Horn (1965), O’Connor (2000), and Garrido et al. (2013).
Click the “Parallel Analysis” tab.
Set the number of simulated analyses to perform (the default number is 100).
Set the quantile of simulated eigenvalues (the default is 0.95).
Select the type of variables: continuous or categorical.
Click the “Run Parallel Analysis” button.
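The idea behind these settings can be sketched in a few lines of Python with NumPy. This is only an illustration of the component (PCA) variant of parallel analysis on synthetic data, not the code the Shiny App runs; the function name `parallel_analysis`, its arguments, and the simulated two-factor data set are assumptions made for this example.

```python
import numpy as np

def parallel_analysis(data, n_sims=100, quantile=0.95, seed=0):
    """Compare observed correlation-matrix eigenvalues with simulated ones."""
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    observed = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    # Eigenvalues from uncorrelated normal data of the same size.
    simulated = np.empty((n_sims, n_vars))
    for i in range(n_sims):
        noise = rng.standard_normal((n_obs, n_vars))
        simulated[i] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    # Chosen quantile of the simulated eigenvalues at each position.
    threshold = np.quantile(simulated, quantile, axis=0)
    # Retain leading components until one fails to exceed its threshold.
    exceeds = observed > threshold
    n_retain = n_vars if exceeds.all() else int(np.argmin(exceeds))
    return observed, threshold, n_retain

# Synthetic example (hypothetical): six variables driven by two factors.
rng = np.random.default_rng(42)
factors = rng.standard_normal((500, 2))
loadings = np.array([[0.8, 0.0]] * 3 + [[0.0, 0.8]] * 3)
data = factors @ loadings.T + 0.6 * rng.standard_normal((500, 6))

observed, threshold, n_retain = parallel_analysis(data)
print("observed eigenvalues: ", np.round(observed, 2))
print("simulated thresholds: ", np.round(threshold, 2))
print("components to retain: ", n_retain)
```

The number of simulated analyses corresponds to `n_sims` and the quantile of simulated eigenvalues to `quantile`; raising either makes the thresholds more stable or more conservative, respectively.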
The psych R package is used for this demonstration; it provides solutions using both EFA and PCA. The main difference between the two methods is that PCA involves the eigen-decomposition of the correlation matrix R, while EFA involves the eigen-decomposition of R with its diagonal elements replaced by the communalities.
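The distinction between the two eigen-decompositions can be illustrated with a small NumPy sketch. It assumes squared multiple correlations (SMCs) as the communality estimates, which is one common choice and not necessarily what the app uses; the 3 x 3 correlation matrix and the function name are made up for this example.

```python
import numpy as np

def pca_and_fa_eigenvalues(corr):
    """Eigenvalues of R (PCA) and of R with communality estimates on the diagonal (FA)."""
    pca_eigs = np.linalg.eigvalsh(corr)[::-1]
    # Assumption: squared multiple correlations serve as communality estimates.
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(corr))
    reduced = corr.copy()
    np.fill_diagonal(reduced, smc)
    fa_eigs = np.linalg.eigvalsh(reduced)[::-1]
    return pca_eigs, fa_eigs

# Hypothetical correlation matrix for three variables.
corr = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
pca_eigs, fa_eigs = pca_and_fa_eigenvalues(corr)
print("PCA eigenvalues:", np.round(pca_eigs, 3))  # sum equals the number of variables
print("FA eigenvalues: ", np.round(fa_eigs, 3))   # sum equals the total communality estimate
```

Because the diagonal of the reduced matrix is smaller than 1, the FA eigenvalues are smaller than the PCA eigenvalues, which is why the two methods are compared against different simulated values in parallel analysis.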
In the following example, the FA method suggests two factors and PCA suggests two components. Under “Eigenvalues”, the first two columns present the results using the EFA method and the third and fourth columns present the results using PCA. For each method, eigenvalues calculated from the original data are compared with those obtained from simulated data. With the EFA method, the eigenvalues of the third factor obtained from the original and simulated data are almost equal (0.25 vs. 0.24), so parallel analysis recommends two factors. With the PCA method, the eigenvalue of the third component obtained from the original data (1.12) is smaller than that obtained from simulated data (1.21), so two components are recommended.
Note. A detailed description of the differences between FA and PCA in parallel analysis can be found at https://stats.stackexchange.com/questions/52224/whats-the-difference-between-a-component-and-a-factor-in-parallel-analysis.
In the “Scree Plots” section, the results of the parallel analysis are visualized. The plot on the left shows the result from the PCA method, and the plot on the right shows the result from the EFA method. Both plots suggest a two-factor solution.
Step-2: Running an EFA to obtain factor loading matrices
Click the “Extraction” tab to start an EFA or a PCA.
“Method”
You can choose either “Principal Component Analysis” or “Exploratory Factor Analysis”. This tutorial focuses on EFA.
“Number of factors to extract”
Enter the number that you obtained from parallel analysis or another number that you want to explore. In this tutorial, three factors are extracted for illustration, although parallel analysis suggested two.
“Analyze”
Choose the appropriate correlation matrix for your analysis: “Pearson correlation matrix” for continuous variables; “Tetrachoric correlation matrix” for binary variables; and “Polychoric correlation matrix” for polytomous/Likert-type scale variables.
“Rotation method”
The default rotation method is “none”. You can choose a rotation method from the “Rotation method” pull-down menu to make your factor model easier to interpret. Note that rotation does not increase the proportion of variance explained by the model. Three common rotation methods are offered in this Shiny App: “Varimax” (an orthogonal rotation method), and “Promax” and “Oblimin” (oblique rotation methods). Readers can refer to Abdi (2003) and Garrett-Mayer (2006a) for more information on factor rotation methods.
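To see why rotation leaves the explained variance unchanged, the following NumPy sketch applies a textbook SVD-based varimax iteration to a made-up loading matrix. It is an illustration under stated assumptions, not the routine used by the app: the `varimax` function, its stopping rule, and the loadings are hypothetical, and oblique rotations such as Promax and Oblimin are not covered.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a loading matrix toward the varimax criterion."""
    n_vars, n_factors = loadings.shape
    rotation = np.eye(n_factors)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        column_means = (rotated ** 2).sum(axis=0) / n_vars
        gradient = loadings.T @ (rotated ** 3 - rotated * column_means)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        current = s.sum()
        if current < previous * (1.0 + tol):
            break
        previous = current
    return loadings @ rotation, rotation

# Hypothetical unrotated loadings: four variables, two factors.
unrotated = np.array([
    [0.8, 0.3],
    [0.7, 0.4],
    [0.5, -0.6],
    [0.4, -0.5],
])
rotated, rotation = varimax(unrotated)

print("rotated loadings:\n", np.round(rotated, 3))
print("communalities before:", np.round((unrotated ** 2).sum(axis=1), 3))
print("communalities after: ", np.round((rotated ** 2).sum(axis=1), 3))
```

The variance attributed to each individual factor is redistributed by the rotation, but each variable's communality, and therefore the total variance explained, stays the same because the rotation matrix is orthogonal.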
“Factoring method”
This Shiny App provides four commonly used estimation methods for factor analysis: “Maximum likelihood factor analysis”, “Weighted least squares solution”, “Generalized weighted least squares”, and “Principal factor solution”. For more detailed descriptions and discussion, please refer to the references, e.g., Garrett-Mayer (2006b) and Helwig (2017).
“Run EFA”
A brief description of the output from this EFA Shiny App is provided as follows.
“Total Variance Explained”
The eigenvalues, the proportion of variance explained, and the cumulative variance explained are provided for the three factors specified in Step-2.
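As a rough illustration of how such a table relates to the loadings, the sketch below computes the sum of squared loadings per factor, its share of the total variance, and the running total. It assumes standardized variables (a correlation matrix) and an unrotated or orthogonally rotated solution; the loading values and the function name are hypothetical and are not output from the app.

```python
from itertools import accumulate

def variance_explained(loadings):
    """Return per-factor sums of squared loadings, proportions, and cumulative proportions."""
    n_vars = len(loadings)
    n_factors = len(loadings[0])
    # Sum of squared loadings down each column (one column per factor).
    ss_loadings = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]
    # With standardized variables, total variance equals the number of variables.
    proportion = [ss / n_vars for ss in ss_loadings]
    cumulative = list(accumulate(proportion))
    return ss_loadings, proportion, cumulative

# Hypothetical loadings: rows are variables, columns are factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.3],
    [0.2, 0.7],
]
ss_loadings, proportion, cumulative = variance_explained(loadings)
for j, (ss, prop, cum) in enumerate(zip(ss_loadings, proportion, cumulative), start=1):
    print(f"Factor {j}: SS loadings={ss:.3f}, proportion={prop:.4f}, cumulative={cum:.4f}")
```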
“Factor Loadings”
The table shows the factor loadings obtained with the maximum likelihood (ML) method. In the current example, three factors are shown. The majority of variables load more strongly on the first factor than on the others, which may indicate a unidimensional factor structure.
“Communality/Uniqueness”
In this section, the table displays both the explained variance (Communality) and unexplained variance (Uniqueness) for each variable.
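The relationship between the two quantities can be sketched as follows, assuming standardized variables and orthogonal factors, so that each variable's communality is the sum of its squared loadings and its uniqueness is the remainder. The loadings and the function name are made up for illustration.

```python
def communality_and_uniqueness(loadings):
    """Return (communality, uniqueness) for each variable (row) of a loading matrix."""
    result = []
    for row in loadings:
        # Variance of the variable accounted for by all factors together.
        communality = sum(value ** 2 for value in row)
        # Remaining variance, specific to the variable or due to error.
        result.append((communality, 1.0 - communality))
    return result

# Hypothetical loadings: rows are variables, columns are factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.3],
    [0.2, 0.7],
]
table = communality_and_uniqueness(loadings)
for i, (communality, uniqueness) in enumerate(table, start=1):
    print(f"Variable {i}: communality={communality:.2f}, uniqueness={uniqueness:.2f}")
```

A variable with a low communality is poorly represented by the extracted factors, which is one reason an indicator might be reconsidered when refining the model.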
“Scree Plot”
The plot shows eigenvalues of each factor/component arranged in decreasing order.
The procedure of PCA is similar to EFA. For more information about PCA, please refer to the literature listed in the References, e.g., Field (2009) and Helwig (2017).
References
Abdi, H. (2003). Factor rotations in factor analyses. Encyclopedia for Research Methods for the Social Sciences. Sage: Thousand Oaks, CA, 792-795.
Field, A. (2009). Exploratory factor analysis. In Discovering statistics using SPSS. Sage Publications.
Garrett-Mayer, E. (2006a). Factor Analysis I. Retrieved from: ocw.jhsph.edu/courses/StatisticsPsychosocialResearch/PDFs/Lecture8.pdf
Garrett-Mayer, E. (2006b). Factor Analysis II. Retrieved from: ocw.jhsph.edu/courses/StatisticsPsychosocialResearch/PDFs/Lecture9.pdf
Garrido, L. E., Abad, F. J., & Ponsoda, V. (2013). A new look at Horn’s parallel analysis with ordinal variables. Psychological Methods, 18(4), 454.
Helwig, N. E. (2017). Factor Analysis. Retrieved from: users.stat.umn.edu/~helwig/notes/factanal-Notes.pdf
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179-185.
O’Connor, B. P. (2000). SPSS and SAS programs for determining the number of components using parallel analysis and Velicer’s MAP test. Behavior Research Methods, Instruments, & Computers, 32(3), 396-402.
Revelle, W. (2017). How To: Use the psych package for Factor Analysis and data Reduction. Retrieved from: http://personality-project.org/r/psych/HowTo/factor.pdf
Tryfos, P. (1997). Chapter 14: Factor Analysis. Methods for Business Analysis and Forecasting: Text and Case.
Zhang, G., & Preacher, K. J. (2015). Factor rotation and standard errors in exploratory factor analysis. Journal of Educational and Behavioral Statistics, 40(6), 579-603.