Exploratory Factor Analysis (EFA) is a multivariate technique used to identify underlying latent variables, or factors, that explain patterns of correlations among observed variables. Unlike PCA, which focuses on summarising total variance, EFA assumes that the observed variables are influenced by a smaller number of unobserved constructs, along with some measurement error.

The main goal of EFA is to uncover the underlying structure of a dataset by grouping variables that are highly correlated with each other into common factors. This can help reduce data complexity and provide a more interpretable representation of the relationships between variables.
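
To make the idea of latent factors concrete, the sketch below simulates six standardized variables driven by two uncorrelated factors. It then compares the correlation matrix implied by the common factor model (loadings times their transpose, plus unique variances) with the sample correlations. The loadings, sample size, and seed are invented for illustration and have nothing to do with the wine data.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
n <- 5000                                     # illustrative sample size

# Hypothetical loadings: variables 1-3 load on factor 1, variables 4-6 on factor 2
Lambda <- matrix(c(0.8, 0.7, 0.6, 0,   0,   0,
                   0,   0,   0,   0.8, 0.7, 0.6), ncol = 2)
Psi <- diag(1 - rowSums(Lambda^2))            # unique variances (measurement error)

# Generate data from the common factor model: X = F Lambda' + E
scores <- matrix(rnorm(n * 2), ncol = 2)      # latent factor scores
errors <- matrix(rnorm(n * 6), ncol = 6) %*% sqrt(Psi)
X <- scores %*% t(Lambda) + errors

R_implied <- Lambda %*% t(Lambda) + Psi       # correlations implied by the model
R_sample  <- cor(X)                           # correlations observed in the sample

round(R_implied, 2)
round(R_sample, 2)
max(abs(R_sample - R_implied))                # small when n is large
```

Variables that share a factor are correlated and variables on different factors are not. This is the pattern EFA tries to recover when only the observed correlations are available.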

In the following section, we will explore how to conduct EFA in R and how to interpret the resulting factor structure.

EFA Analysis in R

We will start by loading the most fundamental package: psych. If you have never used the psych package, you first need to install it:

install.packages('psych')

Then you can load the library:

library('psych')
#> Warning: package 'psych' was built under R version 4.5.3

EFA is commonly performed using the fa() function from the psych package. This function allows us to extract a specified number of factors and choose different extraction and rotation methods, making it very flexible for applied work.

The basic syntax of the function is:

Syntax:
fa(r, nfactors = 1, rotate = "oblimin", fm = "minres")

Where:

r: The raw data (a data frame or matrix) or a correlation matrix.

nfactors: The number of factors to extract. Default is 1.

rotate: The rotation method used to improve the interpretability of the factor solution. Default is "oblimin" (an oblique rotation); use "none" for an unrotated solution.

fm: The factoring method. Default is "minres" (minimum residual).

Read data

First, we need to read our data into R. Throughout this example we will use the wine data. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wine.

The attributes are:

Alcohol
Malic acid
Ash
Alcalinity of ash
Magnesium
Total phenols
Flavanoids
Nonflavanoid phenols
Proanthocyanins
Color intensity
Hue
OD280/OD315 of diluted wines
Proline

The wine data are stored as a comma-separated text file, so we can read them with the read.table() function in R, setting sep = ",".

wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")

colnames(wine) <- c("Cultivar","Alcohol","Malic acid","Ash","Alcalinity of ash","Magnesium","Total phenols","Flavanoids","Nonflavanoid phenols","Proanthocyanins","Color intensity","Hue","OD280/OD315 of diluted wines","Proline")

dim(wine)
#> [1] 178  14

head(wine, 5)
#>   Cultivar Alcohol Malic acid  Ash Alcalinity of ash Magnesium Total phenols
#> 1        1   14.23       1.71 2.43              15.6       127          2.80
#> 2        1   13.20       1.78 2.14              11.2       100          2.65
#> 3        1   13.16       2.36 2.67              18.6       101          2.80
#> 4        1   14.37       1.95 2.50              16.8       113          3.85
#> 5        1   13.24       2.59 2.87              21.0       118          2.80
#>   Flavanoids Nonflavanoid phenols Proanthocyanins Color intensity  Hue
#> 1       3.06                 0.28            2.29            5.64 1.04
#> 2       2.76                 0.26            1.28            4.38 1.05
#> 3       3.24                 0.30            2.81            5.68 1.03
#> 4       3.49                 0.24            2.18            7.80 0.86
#> 5       2.69                 0.39            1.82            4.32 1.04
#>   OD280/OD315 of diluted wines Proline
#> 1                         3.92    1065
#> 2                         3.40    1050
#> 3                         3.17    1185
#> 4                         3.45    1480
#> 5                         2.93     735

The wine dataset contains 178 observations of 14 variables, including the 13 measured quantities of chemicals and the variable Cultivar, which indicates the type of grape from which the wine was produced. Since this is a categorical variable describing group membership rather than a measured chemical property, it is not included in the EFA.

Before performing EFA, it is important to check whether all variables are measured on the same scale. If not, the data should be standardized so that each variable contributes equally to the analysis.

wine_range <- data.frame(cbind(sapply(wine[,2:14],min),
                 sapply(wine[,2:14],max),
                 round(sapply(wine[,2:14],var),2))) 
names(wine_range) <- c("min value","max value","variance")

wine_range
#>                              min value max value variance
#> Alcohol                          11.03     14.83     0.66
#> Malic acid                        0.74      5.80     1.25
#> Ash                               1.36      3.23     0.08
#> Alcalinity of ash                10.60     30.00    11.15
#> Magnesium                        70.00    162.00   203.99
#> Total phenols                     0.98      3.88     0.39
#> Flavanoids                        0.34      5.08     1.00
#> Nonflavanoid phenols              0.13      0.66     0.02
#> Proanthocyanins                   0.41      3.58     0.33
#> Color intensity                   1.28     13.00     5.37
#> Hue                               0.48      1.71     0.05
#> OD280/OD315 of diluted wines      1.27      4.00     0.50
#> Proline                         278.00   1680.00 99166.72

The measured attributes have very different ranges, so the data should be standardized before performing EFA to ensure that all variables contribute equally to the analysis.

wine_stand <- as.data.frame(scale(wine)) # standardize data by subtracting the mean and dividing by the sd

wine_stand_range <- data.frame(cbind(sapply(wine_stand[,2:14],min),
                             sapply(wine_stand[,2:14],max),
                             round(sapply(wine_stand[,2:14],var),2)))
names(wine_stand_range) <- c("min value","max value","variance")

wine_stand_range
#>                              min value max value variance
#> Alcohol                      -2.427388  2.253415        1
#> Malic acid                   -1.428952  3.100446        1
#> Ash                          -3.668813  3.147447        1
#> Alcalinity of ash            -2.663505  3.145637        1
#> Magnesium                    -2.082381  4.359076        1
#> Total phenols                -2.101318  2.532372        1
#> Flavanoids                   -1.691200  3.054216        1
#> Nonflavanoid phenols         -1.862979  2.395645        1
#> Proanthocyanins              -2.063214  3.475269        1
#> Color intensity              -1.629691  3.425768        1
#> Hue                          -2.088840  3.292407        1
#> OD280/OD315 of diluted wines -1.889723  1.955399        1
#> Proline                      -1.488987  2.963114        1
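
Correlations do not depend on the measurement scale, so standardizing changes the variances but leaves the correlation matrix untouched. The sketch below checks this on R's built-in mtcars data, used here only as a stand-in for the wine data; the selection of columns is arbitrary.

```r
X <- mtcars[, c("mpg", "disp", "hp", "wt")]   # built-in data, stand-in only

# Standardize by hand: subtract the column mean, then divide by the column sd
Z_manual <- sweep(sweep(as.matrix(X), 2, colMeans(X)), 2, sapply(X, sd), "/")
Z_scale  <- scale(X)                          # same operation via scale()

all.equal(Z_manual, Z_scale, check.attributes = FALSE)  # TRUE
all.equal(cor(X), cor(Z_scale))                         # TRUE: correlations unchanged
round(apply(Z_scale, 2, var), 2)                        # every variance is now 1
```

Standardizing therefore matters mostly for analyses based on covariances; an analysis based on the correlation matrix gives the same result either way.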

Is EFA appropriate?

Before performing Exploratory Factor Analysis (EFA), it is important to assess whether the data are suitable for this technique. EFA relies on the presence of meaningful correlations between variables, so there should be sufficient correlation structure within the dataset.

This can first be explored visually using plots of the data or a correlation matrix (see the section on graphical displays of data). In addition, several more formal measures can be used to evaluate factorability.

These include the determinant of the correlation matrix (to check for excessive multicollinearity), Bartlett’s test of sphericity (to test whether the correlation matrix significantly differs from an identity matrix), and the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy (to assess whether partial correlations are sufficiently small).

To check whether the correlations among the measured variables are too high, we can look at the determinant of the correlation matrix. Values larger than 0.00001 are okay; smaller values point to excessive multicollinearity.

det(cor(wine[,2:14]))
#> [1] 0.0004687431
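
As a rough illustration of what the determinant measures: it equals 1 when the variables are completely uncorrelated and approaches 0 as they become linearly dependent. The three small correlation matrices below are invented for this purpose.

```r
R_none <- diag(3)                               # no correlation at all
R_mod  <- matrix(c(1.0, 0.5, 0.4,
                   0.5, 1.0, 0.3,
                   0.4, 0.3, 1.0), nrow = 3)    # moderate correlations
R_high <- matrix(c(1.00, 0.99, 0.99,
                   0.99, 1.00, 0.99,
                   0.99, 0.99, 1.00), nrow = 3) # near-duplicate variables

det(R_none)   # 1
det(R_mod)    # 0.62
det(R_high)   # 0.000298

# The determinant is the product of the eigenvalues of the correlation matrix
all.equal(det(R_mod), prod(eigen(R_mod)$values))
```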

Bartlett’s test of sphericity can be performed using the cortest.bartlett() function from the psych package.

cortest.bartlett(wine_stand[,2:14])$p.value
#> R was not square, finding R from data
#> [1] 2.468617e-224

In this case, Bartlett’s test rejects the null hypothesis that the correlation matrix is an identity matrix (p-value below 0.05), so the variables are sufficiently correlated to proceed.
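
The test statistic can also be reproduced by hand from the determinant reported earlier. The sketch below uses the textbook form of Bartlett's statistic; whether cortest.bartlett() uses exactly this expression internally is an assumption that is not checked here. The inputs are the sample size, the number of variables, and the determinant printed above.

```r
n     <- 178            # number of wines, from dim(wine)
p     <- 13             # number of variables analysed
det_R <- 0.0004687431   # determinant printed by det(cor(wine[, 2:14]))

chi_sq <- -((n - 1) - (2 * p + 5) / 6) * log(det_R)   # Bartlett's chi-square
df     <- p * (p - 1) / 2                             # degrees of freedom
p_val  <- pchisq(chi_sq, df, lower.tail = FALSE)

round(chi_sq, 2)   # about 1317.18
df                 # 78
p_val              # far below 0.05
```

The same chi-square (1317.18) and degrees of freedom (78) appear as the null model in the fa() output further on.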

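The KMO index can be computed using the KMO() function of the psych package. Values closer to 1 indicate better sampling adequacy, and values below 0.5 are usually considered unacceptable.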

KMO(wine_stand[,2:14])
#> Kaiser-Meyer-Olkin factor adequacy
#> Call: KMO(r = wine_stand[, 2:14])
#> Overall MSA =  0.78
#> MSA for each item = 
#>                      Alcohol                   Malic acid 
#>                         0.73                         0.80 
#>                          Ash            Alcalinity of ash 
#>                         0.44                         0.68 
#>                    Magnesium                Total phenols 
#>                         0.68                         0.87 
#>                   Flavanoids         Nonflavanoid phenols 
#>                         0.81                         0.83 
#>              Proanthocyanins              Color intensity 
#>                         0.85                         0.62 
#>                          Hue OD280/OD315 of diluted wines 
#>                         0.79                         0.87 
#>                      Proline 
#>                         0.82
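
In the output above, the overall MSA is 0.78, and Ash (0.44) is the only variable below 0.5. To show what the index measures, the sketch below computes KMO by hand as the ratio of squared correlations to squared correlations plus squared partial correlations. The function name kmo_by_hand is hypothetical, and the built-in mtcars data serve only as a stand-in.

```r
kmo_by_hand <- function(R) {
  R_inv <- solve(R)
  # Partial correlation of each pair, controlling for all other variables
  Q <- -R_inv / sqrt(outer(diag(R_inv), diag(R_inv)))
  diag(Q) <- 0
  diag(R) <- 0                                   # keep off-diagonal elements only
  list(overall  = sum(R^2) / (sum(R^2) + sum(Q^2)),
       per_item = colSums(R^2) / (colSums(R^2) + colSums(Q^2)))
}

R_demo <- cor(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])  # stand-in data
kmo_by_hand(R_demo)
```

When the partial correlations are small relative to the ordinary correlations, the index is close to 1, which is the situation in which a factor model is plausible.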

Perform EFA analysis

Before choosing the extraction method, we can check whether our data meet the assumption of multivariate normality, which is required for the Maximum Likelihood (ML) extraction method. The MVN package provides the Henze–Zirkler test by default, which is a reliable test for this purpose.

install.packages('MVN')

library('MVN')
#> Warning: package 'MVN' was built under R version 4.5.3
#> 
#> Attaching package: 'MVN'
#> The following object is masked from 'package:psych':
#> 
#>     mardia
mvn(wine_stand[,2:14])$multivariate_normality
#>            Test Statistic p.value     Method          MVN
#> 1 Henze-Zirkler     1.074  <0.001 asymptotic ✗ Not normal

# Shapiro tests (not perfect but indicative)
apply(wine_stand[,2:14], 2, shapiro.test)
#> $Alcohol
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.9818, p-value = 0.02005
#> 
#> 
#> $`Malic acid`
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.88878, p-value = 2.946e-10
#> 
#> 
#> $Ash
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.98395, p-value = 0.03868
#> 
#> 
#> $`Alcalinity of ash`
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.99023, p-value = 0.2639
#> 
#> 
#> $Magnesium
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.93833, p-value = 6.346e-07
#> 
#> 
#> $`Total phenols`
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.97668, p-value = 0.004395
#> 
#> 
#> $Flavanoids
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.95453, p-value = 1.679e-05
#> 
#> 
#> $`Nonflavanoid phenols`
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.96252, p-value = 0.0001055
#> 
#> 
#> $Proanthocyanins
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.98072, p-value = 0.01445
#> 
#> 
#> $`Color intensity`
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.94032, p-value = 9.229e-07
#> 
#> 
#> $Hue
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.98134, p-value = 0.01743
#> 
#> 
#> $`OD280/OD315 of diluted wines`
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.94505, p-value = 2.316e-06
#> 
#> 
#> $Proline
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  newX[, i]
#> W = 0.93119, p-value = 1.741e-07

The p-value (< 0.001) of the Henze–Zirkler test leads us to reject the null hypothesis of multivariate normality (p < 0.05). The data are therefore not consistent with a multivariate normal distribution, and ML extraction is not appropriate. Because the multivariate normality test is very sensitive to sample size, it is also useful to look at the univariate normality of each variable. The Shapiro-Wilk tests show that all variables except Alcalinity of ash deviate from normality at the 0.05 level. We will therefore opt for the ordinary least squares (OLS) extraction method (fm = "ols").
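
The long Shapiro-Wilk listing above can be condensed by keeping only the p-values. The sketch below shows one way to do this with sapply(); it uses the built-in mtcars data as a stand-in, and the column selection is arbitrary.

```r
X <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]   # built-in stand-in data

# One Shapiro-Wilk p-value per variable, returned as a named vector
p_values <- sapply(X, function(x) shapiro.test(x)$p.value)

round(p_values, 4)
names(p_values)[p_values < 0.05]    # variables that deviate from normality
```

Applied to wine_stand[, 2:14], the same call would return the thirteen p-values shown in the listing as a single vector.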

Using the fa() function from the psych package, we fit a factor analysis model to the standardized wine data, extracting 8 factors without rotation.

wine.fa <- fa(wine_stand[,2:14],nfactors=8, fm = "ols", rotate = "none")
wine.fa
#> Factor Analysis using method =  ols
#> Call: fa(r = wine_stand[, 2:14], nfactors = 8, rotate = "none", fm = "ols")
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>                                  1     2     3     4     5     6     7     8
#> Alcohol                       0.31  0.73 -0.22  0.12  0.05 -0.20 -0.16  0.22
#> Malic acid                   -0.50  0.30  0.05 -0.23  0.25 -0.37  0.18  0.04
#> Ash                          -0.01  0.53  0.76  0.13 -0.18 -0.15 -0.07 -0.24
#> Alcalinity of ash            -0.54 -0.01  0.73 -0.24  0.04  0.16 -0.14  0.26
#> Magnesium                     0.28  0.40  0.10 -0.13 -0.41  0.07  0.34  0.09
#> Total phenols                 0.84  0.11  0.17  0.00  0.24  0.05  0.03 -0.03
#> Flavanoids                    0.92  0.01  0.20 -0.01  0.20  0.03 -0.01 -0.06
#> Nonflavanoid phenols         -0.67  0.05  0.20  0.63  0.25  0.07  0.20  0.06
#> Proanthocyanins               0.63  0.06  0.12 -0.14  0.22  0.17  0.22  0.05
#> Color intensity              -0.20  0.87 -0.20 -0.05  0.14  0.35 -0.11 -0.09
#> Hue                           0.62 -0.39  0.12  0.31 -0.19  0.04 -0.06  0.11
#> OD280/OD315 of diluted wines  0.79 -0.24  0.20 -0.03  0.15 -0.14 -0.04  0.03
#> Proline                       0.59  0.53 -0.12  0.18 -0.13 -0.06 -0.02  0.09
#>                                h2     u2 com
#> Alcohol                      0.80 0.1955 2.2
#> Malic acid                   0.63 0.3705 4.1
#> Ash                          1.00 0.0047 2.4
#> Alcalinity of ash            1.00 0.0050 2.6
#> Magnesium                    0.57 0.4350 4.3
#> Total phenols                0.80 0.1986 1.3
#> Flavanoids                   0.94 0.0609 1.2
#> Nonflavanoid phenols         1.00 0.0049 2.7
#> Proanthocyanins              0.56 0.4390 1.9
#> Color intensity              1.00 0.0035 1.7
#> Hue                          0.70 0.3001 2.7
#> OD280/OD315 of diluted wines 0.77 0.2286 1.5
#> Proline                      0.71 0.2895 2.4
#> 
#>                       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> SS loadings           4.51 2.33 1.40 0.71 0.57 0.41 0.31 0.22
#> Proportion Var        0.35 0.18 0.11 0.05 0.04 0.03 0.02 0.02
#> Cumulative Var        0.35 0.53 0.63 0.69 0.73 0.76 0.79 0.80
#> Proportion Explained  0.43 0.22 0.13 0.07 0.05 0.04 0.03 0.02
#> Cumulative Proportion 0.43 0.65 0.79 0.86 0.91 0.95 0.98 1.00
#> 
#> Mean item complexity =  2.4
#> Test of the hypothesis that 8 factors are sufficient.
#> 
#> df null model =  78  with the objective function =  7.67 with Chi Square =  1317.18
#> df of  the model are 2  and the objective function was  0.01 
#> 
#> The root mean square of the residuals (RMSR) is  0 
#> The df corrected root mean square of the residuals is  0.02 
#> 
#> The harmonic n.obs is  178 with the empirical chi square  0.08  with prob <  0.96 
#> The total n.obs was  178  with Likelihood Chi Square =  2.06  with prob <  0.36 
#> 
#> Tucker Lewis Index of factoring reliability =  0.998
#> RMSEA index =  0.012  and the 90 % confidence intervals are  0 0.15
#> BIC =  -8.3
#> Fit based upon off diagonal values = 1
#> Measures of factor score adequacy             
#>                                                   [,1] [,2] [,3] [,4] [,5] [,6]
#> Correlation of (regression) scores with factors   0.99 0.98 0.99 0.95 0.88 0.88
#> Multiple R square of scores with factors          0.97 0.96 0.99 0.91 0.77 0.78
#> Minimum correlation of possible factor scores     0.94 0.93 0.97 0.82 0.55 0.56
#>                                                   [,7] [,8]
#> Correlation of (regression) scores with factors   0.74 0.86
#> Multiple R square of scores with factors          0.55 0.75
#> Minimum correlation of possible factor scores     0.09 0.50

The output of fa() includes factor loadings, uniquenesses, and several statistics that help evaluate model fit and interpret the underlying factor structure.

The most important component is the factor loadings, which indicate the strength and direction of the relationship between each observed variable and the underlying factors. Higher absolute loadings suggest that a variable is strongly associated with a particular factor. h2 is the communality: the proportion of variance in the variable that is explained by the retained factors. For an unrotated or orthogonally rotated solution, it equals the sum of the variable's squared loadings.

The output also includes the uniquenesses (u2), which represent the proportion of variance in each variable that is not explained by the common factors (u2 = 1 - h2). Variables with high uniqueness are less well explained by the factor model. com is “Hoffman’s index of complexity”. It equals 1 if an item loads on only one factor, 2 if it loads evenly on two factors, and so on, so it tells you how much an item reflects a single construct. It is lower when an item's secondary loadings are small relative to its main loading.
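
These three quantities can be recomputed from the loadings. The sketch below copies the loadings of two variables from the 8-factor output above and computes h2, u2, and com, using (sum of squared loadings)^2 / (sum of fourth powers of the loadings) for the complexity index. Because the printed loadings are rounded to two decimals, the results agree with the output only approximately.

```r
# Loadings copied from the 8-factor output above (rounded to two decimals)
loadings_8 <- rbind(
  Alcohol    = c(0.31, 0.73, -0.22,  0.12, 0.05, -0.20, -0.16,  0.22),
  Flavanoids = c(0.92, 0.01,  0.20, -0.01, 0.20,  0.03, -0.01, -0.06)
)

h2  <- rowSums(loadings_8^2)                            # communality
u2  <- 1 - h2                                           # uniqueness
com <- rowSums(loadings_8^2)^2 / rowSums(loadings_8^4)  # complexity index

round(cbind(h2, u2, com), 2)
```

The values come out close to those printed by fa() (h2 = 0.80 and com = 2.2 for Alcohol, h2 = 0.94 and com = 1.2 for Flavanoids).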

In addition, the output provides SS loadings (sum of squared loadings) for each factor, which give an indication of how much variance each factor explains. From these, the proportion of variance explained and the cumulative variance explained can be derived, helping to assess the overall adequacy of the factor solution.

SS loadings: These are the eigenvalues, the sum of the squared loadings for each factor. In this case, where we are using a correlation matrix, summing across all possible factors would equal the number of variables used in the analysis.

Proportion Var: tells us how much of the overall variance of all the variables the factor accounts for.

Cumulative Var: the cumulative sum of Proportion Var.

Proportion Explained: the share of the variance explained by the retained factors that is attributable to each factor (SS loadings divided by their sum).

Cumulative Proportion: the cumulative sum of Proportion Explained.
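
The four derived rows follow directly from the SS loadings. As a check, the sketch below recomputes them from the SS loadings printed in the 8-factor output above, dividing by the 13 variables in the analysis.

```r
ss_loadings <- c(4.51, 2.33, 1.40, 0.71, 0.57, 0.41, 0.31, 0.22)  # from the output above
n_vars <- 13                                                     # variables in the analysis

prop_var  <- ss_loadings / n_vars            # Proportion Var
cum_var   <- cumsum(prop_var)                # Cumulative Var
prop_expl <- ss_loadings / sum(ss_loadings)  # Proportion Explained
cum_expl  <- cumsum(prop_expl)               # Cumulative Proportion

round(rbind(prop_var, cum_var, prop_expl, cum_expl), 2)
```

Rounded to two decimals, the four rows match the table printed by fa(): the eight factors together explain about 80% of the total variance, and the first factor accounts for about 43% of that explained variance.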

Finally, various fit statistics are included to evaluate how well the factor model represents the observed correlation structure, along with factor score information if scores are requested. This includes the Tucker Lewis fit index, typically reported in SEM. Generally you want values larger than 0.9. The BIC is useful for model comparisons.

How many factors to retain?

At this point, we face the question of how many factors to retain. It is generally recommended to use a combination of parallel analysis (PA) and the minimum average partial (MAP) test, as these methods provide more reliable guidance than relying on a single approach. The scree plot can also be used as a supplementary visual tool to help support the decision.

In the following section, we will demonstrate how to implement these three methods in R.

Parallel Analysis (PA)

pa_results <- fa.parallel(wine_stand[,2:14], fa="pc", ylab="Eigenvalues", n.iter=500)

#> Parallel analysis suggests that the number of factors =  NA  and the number of components =  3

pa_results
#> Call: fa.parallel(x = wine_stand[, 2:14], fa = "pc", n.iter = 500, 
#>     ylabel = "Eigenvalues")
#> Parallel analysis suggests that the number of factors =  NA  and the number of components =  3 
#> 
#>  Eigen Values of 
#> 
#>  eigen values of factors
#>  [1]  4.24  1.64  0.64  0.22  0.09  0.00 -0.05 -0.11 -0.26 -0.36 -0.50 -0.63
#> [13] -0.68
#> 
#>  eigen values of simulated factors
#> [1] NA
#> 
#>  eigen values of components 
#>  [1] 4.71 2.50 1.45 0.92 0.85 0.64 0.55 0.35 0.29 0.25 0.23 0.17 0.10
#> 
#>  eigen values of simulated components
#>  [1] 1.47 1.35 1.26 1.18 1.11 1.04 0.98 0.92 0.86 0.80 0.74 0.68 0.60

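Here, the PCA extraction method is used (fa = "pc") because PA was originally developed for the unreduced correlation matrix, and this variant was also found to be more accurate in several simulation studies. The number of simulated data sets is controlled by the n.iter argument. In the plot and in the printed eigenvalues, only the first three observed eigenvalues (4.71, 2.50, 1.45) exceed the corresponding simulated eigenvalues (1.47, 1.35, 1.26), so three components appear to be sufficient.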
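
The decision rule can be written out explicitly. The first part of the sketch below applies it to the eigenvalues printed above. The second part is a simplified, hypothetical helper (simulate_eigen) that generates reference eigenvalues from uncorrelated normal data; it uses the 95th percentile across simulations as an assumption and is not a reimplementation of fa.parallel().

```r
# Eigenvalues printed by fa.parallel() above
observed  <- c(4.71, 2.50, 1.45, 0.92, 0.85, 0.64, 0.55,
               0.35, 0.29, 0.25, 0.23, 0.17, 0.10)
simulated <- c(1.47, 1.35, 1.26, 1.18, 1.11, 1.04, 0.98,
               0.92, 0.86, 0.80, 0.74, 0.68, 0.60)

# Retain components until an observed eigenvalue first falls below its reference
n_retain <- sum(cumprod(observed > simulated))
n_retain   # 3

# Hypothetical helper: reference eigenvalues from uncorrelated normal data
simulate_eigen <- function(n, p, n_iter = 100, prob = 0.95) {
  ev <- replicate(n_iter, eigen(cor(matrix(rnorm(n * p), n, p)),
                                only.values = TRUE)$values)
  apply(ev, 1, quantile, probs = prob)   # one reference value per component
}

set.seed(123)                            # arbitrary seed
round(simulate_eigen(n = 178, p = 13), 2)
```

The reference values produced by the helper will not match the printed ones exactly, since they depend on the random draws and on the summary used.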

Minimum average partial method (MAP)

VSS(wine_stand[,2:14], rotate = "promax", fm="pc", plot=FALSE)
#> 
#> Very Simple Structure
#> Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm, 
#>     n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
#> VSS complexity 1 achieves a maximimum of 0.77  with  5  factors
#> VSS complexity 2 achieves a maximimum of 0.92  with  3  factors
#> 
#> The Velicer MAP achieves a minimum of 0.05  with  3  factors
#> Warning in min(x$vss.stats[, "BIC"], na.rm = TRUE): no non-missing arguments to
#> min; returning Inf
#> 
#> BIC achieves a minimum of  Inf  with    factors
#> Warning in min(x$vss.stats[, "SABIC"], na.rm = TRUE): no non-missing arguments
#> to min; returning Inf
#> Sample Size adjusted BIC achieves a minimum of  Inf  with    factors
#> 
#> Statistics by number of factors 
#>   vss1 vss2   map dof chisq prob sqresid  fit RMSEA BIC SABIC complex eChisq
#> 1 0.67 0.00 0.066   0    NA   NA   10.97 0.67    NA  NA    NA      NA     NA
#> 2 0.71 0.86 0.053   0    NA   NA    4.74 0.86    NA  NA    NA      NA     NA
#> 3 0.71 0.89 0.051   0    NA   NA    2.65 0.92    NA  NA    NA      NA     NA
#> 4 0.76 0.89 0.056   0    NA   NA    1.80 0.95    NA  NA    NA      NA     NA
#> 5 0.76 0.90 0.074   0    NA   NA    1.07 0.97    NA  NA    NA      NA     NA
#> 6 0.66 0.91 0.100   0    NA   NA    0.66 0.98    NA  NA    NA      NA     NA
#> 7 0.60 0.90 0.135   0    NA   NA    0.36 0.99    NA  NA    NA      NA     NA
#> 8 0.77 0.92 0.185   0    NA   NA    0.24 0.99    NA  NA    NA      NA     NA
#>   SRMR eCRMS eBIC
#> 1   NA    NA   NA
#> 2   NA    NA   NA
#> 3   NA    NA   NA
#> 4   NA    NA   NA
#> 5   NA    NA   NA
#> 6   NA    NA   NA
#> 7   NA    NA   NA
#> 8   NA    NA   NA

The lowest MAP value identifies the number of factors to retain. In our example, MAP identifies three factors to retain.
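
To show what the MAP criterion computes, the sketch below implements Velicer's procedure in base R: for each number of components m, the first m principal components are partialled out of the correlation matrix and the average squared partial correlation is recorded. The function name velicer_map is hypothetical, and the data are simulated with two underlying factors purely for illustration.

```r
velicer_map <- function(R) {
  p <- ncol(R)
  e <- eigen(R)
  A <- e$vectors %*% diag(sqrt(pmax(e$values, 0)))   # principal component loadings
  sapply(seq_len(p - 1), function(m) {
    A_m <- A[, seq_len(m), drop = FALSE]
    C   <- R - A_m %*% t(A_m)          # covariance left after removing m components
    P   <- cov2cor(C)                  # partial correlations
    (sum(P^2) - p) / (p * (p - 1))     # average squared off-diagonal partial correlation
  })
}

set.seed(42)                           # simulated stand-in data with two factors
f <- matrix(rnorm(300 * 2), ncol = 2)
X <- cbind(f[, 1] + matrix(rnorm(300 * 3), ncol = 3),
           f[, 2] + matrix(rnorm(300 * 3), ncol = 3))

map_values <- velicer_map(cor(X))
round(map_values, 3)
which.min(map_values)                  # suggested number of components
```

The position of the smallest value plays the same role as the map column in the VSS() output above.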

Scree plot

scree(wine_stand[,2:14], pc=TRUE, factors=TRUE)

Augmented scree plot

library(factoextra)
#> Warning: package 'factoextra' was built under R version 4.5.3
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.5.3
#> 
#> Attaching package: 'ggplot2'
#> The following objects are masked from 'package:psych':
#> 
#>     %+%, alpha
#> Welcome to factoextra!
#> Want to learn more? See two factoextra-related books at https://www.datanovia.com/en/product/practical-guide-to-principal-component-methods-in-r/
library(FactoMineR)
#> Warning: package 'FactoMineR' was built under R version 4.5.3
pca_wine <- PCA(wine_stand[,2:14], graph = FALSE)
fviz_eig(pca_wine, addlabels = TRUE)

Model evaluation

wine.fa1 <- fa(wine_stand[,2:14],nfactors=3, fm = "ols", rotate = "none")
wine.fa1
#> Factor Analysis using method =  ols
#> Call: fa(r = wine_stand[, 2:14], nfactors = 3, rotate = "none", fm = "ols")
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>                                  1     2     3   h2    u2 com
#> Alcohol                       0.30  0.72 -0.18 0.64 0.358 1.5
#> Malic acid                   -0.47  0.28  0.09 0.31 0.695 1.7
#> Ash                           0.00  0.45  0.65 0.63 0.373 1.8
#> Alcalinity of ash            -0.51 -0.03  0.70 0.75 0.250 1.8
#> Magnesium                     0.26  0.35  0.09 0.20 0.800 2.0
#> Total phenols                 0.85  0.09  0.17 0.75 0.249 1.1
#> Flavanoids                    0.94 -0.01  0.19 0.92 0.083 1.1
#> Nonflavanoid phenols         -0.58  0.04  0.14 0.36 0.641 1.1
#> Proanthocyanins               0.62  0.04  0.12 0.40 0.601 1.1
#> Color intensity              -0.19  0.81 -0.11 0.70 0.298 1.1
#> Hue                           0.60 -0.39  0.05 0.51 0.489 1.7
#> OD280/OD315 of diluted wines  0.80 -0.26  0.17 0.74 0.255 1.3
#> Proline                       0.60  0.55 -0.12 0.68 0.320 2.1
#> 
#>                       [,1] [,2] [,3]
#> SS loadings           4.36 2.11 1.12
#> Proportion Var        0.34 0.16 0.09
#> Cumulative Var        0.34 0.50 0.58
#> Proportion Explained  0.57 0.28 0.15
#> Cumulative Proportion 0.57 0.85 1.00
#> 
#> Mean item complexity =  1.5
#> Test of the hypothesis that 3 factors are sufficient.
#> 
#> df null model =  78  with the objective function =  7.67 with Chi Square =  1317.18
#> df of  the model are 42  and the objective function was  0.99 
#> 
#> The root mean square of the residuals (RMSR) is  0.05 
#> The df corrected root mean square of the residuals is  0.07 
#> 
#> The harmonic n.obs is  178 with the empirical chi square  34.25  with prob <  0.8 
#> The total n.obs was  178  with Likelihood Chi Square =  167.98  with prob <  5.5e-17 
#> 
#> Tucker Lewis Index of factoring reliability =  0.809
#> RMSEA index =  0.13  and the 90 % confidence intervals are  0.11 0.151
#> BIC =  -49.65
#> Fit based upon off diagonal values = 0.98
#> Measures of factor score adequacy             
#>                                                   [,1] [,2] [,3]
#> Correlation of (regression) scores with factors   0.98 0.93 0.89
#> Multiple R square of scores with factors          0.95 0.86 0.80
#> Minimum correlation of possible factor scores     0.91 0.71 0.59

wine.fa2 <- fa(wine_stand[,2:14],nfactors=3, fm = "ols", rotate = "varimax")
wine.fa2
#> Factor Analysis using method =  ols
#> Call: fa(r = wine_stand[, 2:14], nfactors = 3, rotate = "varimax", 
#>     fm = "ols")
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>                                  1     2     3   h2    u2 com
#> Alcohol                       0.04  0.80 -0.07 0.64 0.358 1.0
#> Malic acid                   -0.49  0.09  0.23 0.31 0.695 1.5
#> Ash                           0.03  0.31  0.73 0.63 0.373 1.4
#> Alcalinity of ash            -0.30 -0.31  0.75 0.75 0.250 1.7
#> Magnesium                     0.17  0.40  0.12 0.20 0.800 1.6
#> Total phenols                 0.80  0.34  0.03 0.75 0.249 1.3
#> Flavanoids                    0.92  0.26  0.02 0.92 0.083 1.2
#> Nonflavanoid phenols         -0.52 -0.17  0.24 0.36 0.641 1.7
#> Proanthocyanins               0.59  0.22  0.02 0.40 0.601 1.3
#> Color intensity              -0.43  0.71  0.11 0.70 0.298 1.7
#> Hue                           0.68 -0.18 -0.14 0.51 0.489 1.2
#> OD280/OD315 of diluted wines  0.86 -0.01 -0.03 0.74 0.255 1.0
#> Proline                       0.38  0.73 -0.10 0.68 0.320 1.5
#> 
#>                       [,1] [,2] [,3]
#> SS loadings           4.00 2.32 1.27
#> Proportion Var        0.31 0.18 0.10
#> Cumulative Var        0.31 0.49 0.58
#> Proportion Explained  0.53 0.31 0.17
#> Cumulative Proportion 0.53 0.83 1.00
#> 
#> Mean item complexity =  1.4
#> Test of the hypothesis that 3 factors are sufficient.
#> 
#> df null model =  78  with the objective function =  7.67 with Chi Square =  1317.18
#> df of  the model are 42  and the objective function was  0.99 
#> 
#> The root mean square of the residuals (RMSR) is  0.05 
#> The df corrected root mean square of the residuals is  0.07 
#> 
#> The harmonic n.obs is  178 with the empirical chi square  34.25  with prob <  0.8 
#> The total n.obs was  178  with Likelihood Chi Square =  167.98  with prob <  5.5e-17 
#> 
#> Tucker Lewis Index of factoring reliability =  0.809
#> RMSEA index =  0.13  and the 90 % confidence intervals are  0.11 0.151
#> BIC =  -49.65
#> Fit based upon off diagonal values = 0.98
#> Measures of factor score adequacy             
#>                                                   [,1] [,2] [,3]
#> Correlation of (regression) scores with factors   0.97 0.93 0.89
#> Multiple R square of scores with factors          0.94 0.87 0.80
#> Minimum correlation of possible factor scores     0.89 0.73 0.59

wine.fa3 <- fa(wine_stand[,2:14],nfactors=3, fm = "ols", rotate = "oblimin")
#> Loading required namespace: GPArotation
wine.fa3
#> Factor Analysis using method =  ols
#> Call: fa(r = wine_stand[, 2:14], nfactors = 3, rotate = "oblimin", 
#>     fm = "ols")
#> Standardized loadings (pattern matrix) based upon correlation matrix
#>                                  1     2     3   h2    u2 com
#> Alcohol                       0.10  0.78 -0.08 0.64 0.358 1.1
#> Malic acid                   -0.44  0.18  0.22 0.31 0.695 1.9
#> Ash                           0.22  0.26  0.75 0.63 0.373 1.4
#> Alcalinity of ash            -0.16 -0.28  0.77 0.75 0.250 1.4
#> Magnesium                     0.23  0.35  0.12 0.20 0.800 2.0
#> Total phenols                 0.84  0.16  0.05 0.75 0.249 1.1
#> Flavanoids                    0.96  0.06  0.03 0.92 0.083 1.0
#> Nonflavanoid phenols         -0.48 -0.07  0.24 0.36 0.641 1.5
#> Proanthocyanins               0.62  0.09  0.03 0.40 0.601 1.0
#> Color intensity              -0.34  0.79  0.10 0.70 0.298 1.4
#> Hue                           0.63 -0.31 -0.13 0.51 0.489 1.6
#> OD280/OD315 of diluted wines  0.86 -0.20 -0.01 0.74 0.255 1.1
#> Proline                       0.42  0.64 -0.11 0.68 0.320 1.8
#> 
#>                       [,1] [,2] [,3]
#> SS loadings           4.07 2.13 1.38
#> Proportion Var        0.31 0.16 0.11
#> Cumulative Var        0.31 0.48 0.58
#> Proportion Explained  0.54 0.28 0.18
#> Cumulative Proportion 0.54 0.82 1.00
#> 
#>  With factor correlations of 
#>       [,1] [,2]  [,3]
#> [1,]  1.00 0.12 -0.23
#> [2,]  0.12 1.00  0.05
#> [3,] -0.23 0.05  1.00
#> 
#> Mean item complexity =  1.4
#> Test of the hypothesis that 3 factors are sufficient.
#> 
#> df null model =  78  with the objective function =  7.67 with Chi Square =  1317.18
#> df of  the model are 42  and the objective function was  0.99 
#> 
#> The root mean square of the residuals (RMSR) is  0.05 
#> The df corrected root mean square of the residuals is  0.07 
#> 
#> The harmonic n.obs is  178 with the empirical chi square  34.25  with prob <  0.8 
#> The total n.obs was  178  with Likelihood Chi Square =  167.98  with prob <  5.5e-17 
#> 
#> Tucker Lewis Index of factoring reliability =  0.809
#> RMSEA index =  0.13  and the 90 % confidence intervals are  0.11 0.151
#> BIC =  -49.65
#> Fit based upon off diagonal values = 0.98
#> Measures of factor score adequacy             
#>                                                   [,1] [,2] [,3]
#> Correlation of (regression) scores with factors   0.97 0.93 0.88
#> Multiple R square of scores with factors          0.93 0.86 0.78
#> Minimum correlation of possible factor scores     0.87 0.71 0.56

When using an oblique rotation (like “oblimin”), the factors are allowed to correlate. This leads to two types of loading matrices, and in the output above, you can see the Pattern matrix, which shows the unique contribution (regression coefficients) of each factor to each variable.

Below, you can get the structure matrix, which shows the correlations between observed variables and the factors:

wine.fa3$Structure
#> 
#> Loadings:
#>                              [,1]   [,2]   [,3]  
#> Alcohol                       0.206  0.789       
#> Malic acid                   -0.468  0.145  0.335
#> Ash                                  0.321  0.706
#> Alcalinity of ash            -0.372 -0.262  0.797
#> Magnesium                     0.246  0.380       
#> Total phenols                 0.851  0.257 -0.143
#> Flavanoids                    0.955  0.172 -0.189
#> Nonflavanoid phenols         -0.548 -0.114  0.353
#> Proanthocyanins               0.625  0.163 -0.112
#> Color intensity              -0.268  0.752  0.215
#> Hue                           0.627 -0.243 -0.296
#> OD280/OD315 of diluted wines  0.840        -0.226
#> Proline                       0.524  0.685 -0.173
#> 
#>                 [,1]  [,2]  [,3]
#> SS loadings    4.239 2.199 1.665
#> Proportion Var 0.326 0.169 0.128
#> Cumulative Var 0.326 0.495 0.623

We can also obtain the off-diagonal residuals:

resid3 <- residuals(wine.fa3, diag = FALSE, na.rm = TRUE)
resid3
#>                              Alchl Mlcac Ash   Alcoa Mgnsm Ttlph Flvnd Nnflp
#> Alcohol                         NA                                          
#> Malic acid                    0.05    NA                                    
#> Ash                           0.01 -0.02    NA                              
#> Alcalinity of ash            -0.01  0.00  0.00    NA                        
#> Magnesium                    -0.04 -0.04  0.07  0.00    NA                  
#> Total phenols                 0.00  0.02 -0.02  0.00 -0.06    NA            
#> Flavanoids                    0.00  0.02  0.00 -0.01 -0.06  0.04    NA      
#> Nonflavanoid phenols          0.01  0.00  0.08 -0.03 -0.13  0.01 -0.02    NA
#> Proanthocyanins              -0.06  0.05 -0.09  0.03  0.05  0.07  0.05 -0.03
#> Color intensity               0.00 -0.05 -0.03  0.02 -0.02  0.04  0.03  0.01
#> Hue                           0.04 -0.18  0.07 -0.01  0.03 -0.04 -0.03  0.10
#> OD280/OD315 of diluted wines  0.05  0.07  0.01  0.00 -0.07  0.02  0.00 -0.05
#> Proline                       0.04 -0.05  0.06 -0.03  0.05 -0.04 -0.04  0.03
#>                              Prnth Clrin Hue  
#> Alcohol                                       
#> Malic acid                                    
#> Ash                                           
#> Alcalinity of ash                             
#> Magnesium                                     
#> Total phenols                                 
#> Flavanoids                                    
#> Nonflavanoid phenols                          
#> Proanthocyanins                 NA            
#> Color intensity               0.07    NA      
#> Hue                          -0.06 -0.09    NA
#> OD280/OD315 of diluted wines  0.01 -0.05 -0.03
#> Proline                      -0.05 -0.03  0.09
#>                              ODodw Proln
#> OD280/OD315 of diluted wines    NA      
#> Proline                      -0.01    NA

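The residual correlations indicate the extent to which the model fails to reproduce the observed correlation matrix. Ideally, these residuals should be small in absolute value (below 0.05), suggesting that the factor model captures most of the underlying structure. Here most of the residuals meet this guideline, but a few do not, for example Hue with Malic acid (-0.18) and Nonflavanoid phenols with Magnesium (-0.13).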
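
The residual matrix is the observed correlation matrix minus the correlations reproduced by the model. The sketch below shows the computation on simulated data. It uses base R's maximum-likelihood factanal() as a stand-in for fa() so that it runs without extra packages; the data, seed, and variable names are invented.

```r
set.seed(7)                                   # simulated stand-in data, two factors
f <- matrix(rnorm(500 * 2), ncol = 2)
X <- cbind(f[, 1] + matrix(rnorm(500 * 3), ncol = 3),
           f[, 2] + matrix(rnorm(500 * 3), ncol = 3))
colnames(X) <- paste0("x", 1:6)

fit <- factanal(X, factors = 2, rotation = "none")   # base R ML factor analysis
L   <- unclass(loadings(fit))                        # plain loading matrix

R_observed   <- cor(X)
R_reproduced <- L %*% t(L) + diag(fit$uniquenesses)  # model-implied correlations
resid_mat    <- R_observed - R_reproduced

off_diag <- resid_mat[lower.tri(resid_mat)]          # off-diagonal residuals only
sqrt(mean(off_diag^2))                               # root mean square residual
mean(abs(off_diag) > 0.05)                           # share of residuals above 0.05
```

The last two lines give the kind of summary that is useful alongside the full residual table: an overall residual size and the proportion of residuals above the 0.05 guideline.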

In addition, fit statistics such as the Tucker–Lewis Index (TLI) and the root mean square of residuals (RMSR) provide summary measures of model fit, with better-fitting models showing higher TLI values and lower RMSR values. For the three-factor solution, the TLI of 0.809 is below the 0.9 guideline, while the RMSR is 0.05. Together, these diagnostics help assess whether the latent factors provide a meaningful and adequate representation of the data.

Exploratory Factor Analysis (EFA)

dr. Annelies Agten

2026-04-28