Exploratory Factor Analysis (EFA) is a multivariate technique used to identify underlying latent variables, or factors, that explain patterns of correlations among observed variables. Unlike PCA, which focuses on summarising total variance, EFA assumes that the observed variables are influenced by a smaller number of unobserved constructs, along with some measurement error.
The main goal of EFA is to uncover the underlying structure of a dataset by grouping variables that are highly correlated with each other into common factors. This can help reduce data complexity and provide a more interpretable representation of the relationships between variables.
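In symbols, the common factor model behind this description is often written as below. The notation is introduced here purely for illustration and is an assumption, not taken from the text: x holds the p observed variables, f the m common factors, Λ the loadings, ε the unique parts (including measurement error), Φ the factor correlations (the identity matrix for uncorrelated factors) and Ψ a diagonal matrix of unique variances.

```latex
\begin{aligned}
\mathbf{x} &= \boldsymbol{\Lambda}\,\mathbf{f} + \boldsymbol{\varepsilon}, \qquad m < p,\\
\operatorname{Cov}(\mathbf{x}) &= \boldsymbol{\Lambda}\,\boldsymbol{\Phi}\,\boldsymbol{\Lambda}^{\top} + \boldsymbol{\Psi}.
\end{aligned}
```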
In the following section, we will explore how to conduct EFA in R and how to interpret the resulting factor structure.
We will start by loading the most fundamental package, psych. If you have never used psych, you first need to install it, for example with install.packages("psych"). You can then load it with library(psych).
EFA is commonly performed using the fa() function from
the psych package. This function allows us to extract a
specified number of factors and choose different extraction and rotation
methods, making it very flexible for applied work.
The basic syntax of the function is:
Syntax:
fa(r, nfactors, rotate = "none", fm = "minres")
Where:
- r: The dataset or correlation matrix.
- nfactors: Specifies the number of factors to extract. Default is 1.
- rotate: Defines the rotation method used to improve interpretability of the factor solution. The call above requests no rotation ("none"); if the argument is omitted, fa() uses "oblimin".
- fm: Specifies the factoring method. Default is "minres".
First we need to read our data into R. Throughout this example we
will use the wine data. These data are the results of a
chemical analysis of wines grown in the same region in Italy but derived
from three different cultivars. The analysis determined the quantities
of 13 constituents found in each of the three types of wines.
The attributes are: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline.
The wine data are stored as comma-separated plain text, so to read in the
data we can use the read.table() function in R with sep = ",".
wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", sep=",")
colnames(wine) <- c("Cultivar","Alcohol","Malic acid","Ash","Alcalinity of ash","Magnesium","Total phenols","Flavanoids","Nonflavanoid phenols","Proanthocyanins","Color intensity","Hue","OD280/OD315 of diluted wines","Proline")
dim(wine)
#> [1] 178 14
head(wine, 5)
#> Cultivar Alcohol Malic acid Ash Alcalinity of ash Magnesium Total phenols
#> 1 1 14.23 1.71 2.43 15.6 127 2.80
#> 2 1 13.20 1.78 2.14 11.2 100 2.65
#> 3 1 13.16 2.36 2.67 18.6 101 2.80
#> 4 1 14.37 1.95 2.50 16.8 113 3.85
#> 5 1 13.24 2.59 2.87 21.0 118 2.80
#> Flavanoids Nonflavanoid phenols Proanthocyanins Color intensity Hue
#> 1 3.06 0.28 2.29 5.64 1.04
#> 2 2.76 0.26 1.28 4.38 1.05
#> 3 3.24 0.30 2.81 5.68 1.03
#> 4 3.49 0.24 2.18 7.80 0.86
#> 5 2.69 0.39 1.82 4.32 1.04
#> OD280/OD315 of diluted wines Proline
#> 1 3.92 1065
#> 2 3.40 1050
#> 3 3.17 1185
#> 4 3.45 1480
#> 5 2.93 735
The wine dataset contains 178 observations of 14
variables, including the 13 measured quantities of chemicals and the
variable Cultivar, which indicates the type of grape from which the wine
was produced. Since this is a categorical variable describing group
membership rather than a measured chemical property, it is not included
in the EFA.
Before performing EFA, it is important to check whether all variables are measured on the same scale. If not, the data should be standardized so that each variable contributes equally to the analysis.
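As a small illustration of what standardization does, the sketch below uses made-up toy data (the data frame `toy` and its columns are hypothetical, not part of the wine example). It shows that scale() yields columns with mean 0 and standard deviation 1, and that the correlations between variables are left unchanged.

```r
# Toy data (hypothetical): two variables on very different scales
set.seed(1)
toy <- data.frame(small = rnorm(50, mean = 2,   sd = 0.3),
                  large = rnorm(50, mean = 800, sd = 300))

# scale() subtracts each column mean and divides by the column sd
toy_stand <- as.data.frame(scale(toy))

round(colMeans(toy_stand), 10)  # column means are 0 (up to rounding error)
sapply(toy_stand, sd)           # column standard deviations are 1

# Standardization does not change the correlations between variables
all.equal(cor(toy), cor(toy_stand))
```

Because the correlation matrix is unchanged, any step that works from correlations sees the same information before and after standardization.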
wine_range <- data.frame(cbind(sapply(wine[,2:14],min),
sapply(wine[,2:14],max),
round(sapply(wine[,2:14],var),2)))
names(wine_range) <- c("min value","max value","variance")
wine_range
#> min value max value variance
#> Alcohol 11.03 14.83 0.66
#> Malic acid 0.74 5.80 1.25
#> Ash 1.36 3.23 0.08
#> Alcalinity of ash 10.60 30.00 11.15
#> Magnesium 70.00 162.00 203.99
#> Total phenols 0.98 3.88 0.39
#> Flavanoids 0.34 5.08 1.00
#> Nonflavanoid phenols 0.13 0.66 0.02
#> Proanthocyanins 0.41 3.58 0.33
#> Color intensity 1.28 13.00 5.37
#> Hue 0.48 1.71 0.05
#> OD280/OD315 of diluted wines 1.27 4.00 0.50
#> Proline 278.00 1680.00 99166.72
The measured attributes have very different ranges, so the data should be standardized before performing EFA to ensure that all variables contribute equally to the analysis.
wine_stand <- as.data.frame(scale(wine)) # standardize data by subtracting the mean and dividing by the sd
wine_stand_range <- data.frame(cbind(sapply(wine_stand[,2:14],min),
sapply(wine_stand[,2:14],max),
round(sapply(wine_stand[,2:14],var),2)))
names(wine_stand_range) <- c("min value","max value","variance")
wine_stand_range
#> min value max value variance
#> Alcohol -2.427388 2.253415 1
#> Malic acid -1.428952 3.100446 1
#> Ash -3.668813 3.147447 1
#> Alcalinity of ash -2.663505 3.145637 1
#> Magnesium -2.082381 4.359076 1
#> Total phenols -2.101318 2.532372 1
#> Flavanoids -1.691200 3.054216 1
#> Nonflavanoid phenols -1.862979 2.395645 1
#> Proanthocyanins -2.063214 3.475269 1
#> Color intensity -1.629691 3.425768 1
#> Hue -2.088840 3.292407 1
#> OD280/OD315 of diluted wines -1.889723 1.955399 1
#> Proline -1.488987 2.963114 1
Before performing Exploratory Factor Analysis (EFA), it is important to assess whether the data are suitable for this technique. EFA relies on the presence of meaningful correlations between variables, so there should be sufficient correlation structure within the dataset.
This can first be explored visually using plots of the data or a correlation matrix (see the section on graphical displays of data). In addition, several more formal measures can be used to evaluate factorability.
These include the determinant of the correlation matrix (to check for excessive multicollinearity), Bartlett’s test of sphericity (to test whether the correlation matrix significantly differs from an identity matrix), and the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy (to assess whether partial correlations are sufficiently small).
To check whether the correlations among the measured variables are too high, we can look at the determinant of the correlation matrix (values larger than 0.00001 are okay).
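A minimal sketch of this check in base R is shown below. The helper name `cor_determinant` and the simulated data are assumptions made for illustration; only det() and cor() are used.

```r
# Hypothetical helper: determinant of the correlation matrix of a data set
cor_determinant <- function(data) {
  det(cor(data, use = "pairwise.complete.obs"))
}

# Demonstration on simulated data (not the wine data)
set.seed(42)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.5)  # strongly correlated with x1
x3 <- rnorm(200)                 # unrelated to x1 and x2
demo <- data.frame(x1, x2, x3)

d <- cor_determinant(demo)
d
d > 0.00001  # TRUE means no sign of extreme multicollinearity

# For the wine example the call would be: cor_determinant(wine_stand[, 2:14])
```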
Bartlett’s test of sphericity can be performed using the
cortest.bartlett() function from the psych
package.
cortest.bartlett(wine_stand[,2:14])$p.value
#> R was not square, finding R from data
#> [1] 2.468617e-224
In this case Bartlett’s test rejects the null hypothesis that the correlation matrix is an identity matrix (p-value below 0.05).
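For intuition, the test statistic can be sketched by hand from the determinant of the correlation matrix, using the usual chi-square approximation with p(p - 1)/2 degrees of freedom. The helper `bartlett_sphericity` and the simulated data are hypothetical; this is a sketch under those assumptions, not the psych implementation.

```r
# Hypothetical helper: Bartlett's test of sphericity computed from raw data
bartlett_sphericity <- function(data) {
  R <- cor(data)
  n <- nrow(data)
  p <- ncol(data)
  # Chi-square approximation based on the log determinant of R
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
  list(chisq = chisq, df = df, p.value = p_value)
}

# Simulated data: three variables share a common source, one is pure noise
set.seed(123)
common <- rnorm(150)
demo <- data.frame(a = common + rnorm(150),
                   b = common + rnorm(150),
                   c = common + rnorm(150),
                   d = rnorm(150))

res <- bartlett_sphericity(demo)
res
```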
The KMO index can be computed using the KMO() function
of the psych package.
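The idea behind the index can be sketched in base R: squared correlations are compared with squared partial correlations, which are obtained from the inverse of the correlation matrix. The helper `kmo_sketch` and the simulated data are assumptions for illustration only.

```r
# Hypothetical helper: overall and per-item sampling adequacy
kmo_sketch <- function(data) {
  R <- cor(data)
  # Rescaled inverse: off-diagonal entries are (minus) the partial correlations
  Q <- cov2cor(solve(R))
  diag(R) <- 0
  diag(Q) <- 0
  overall  <- sum(R^2) / (sum(R^2) + sum(Q^2))
  per_item <- colSums(R^2) / (colSums(R^2) + colSums(Q^2))
  list(overall = overall, per_item = per_item)
}

# Simulated data: four indicators of one shared source
set.seed(99)
g <- rnorm(200)
demo <- data.frame(v1 = g + rnorm(200), v2 = g + rnorm(200),
                   v3 = g + rnorm(200), v4 = g + rnorm(200))

kmo_demo <- kmo_sketch(demo)
kmo_demo
```

Values near 1 arise when partial correlations are small relative to the ordinary correlations.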
KMO(wine_stand[,2:14])
#> Kaiser-Meyer-Olkin factor adequacy
#> Call: KMO(r = wine_stand[, 2:14])
#> Overall MSA = 0.78
#> MSA for each item =
#> Alcohol Malic acid
#> 0.73 0.80
#> Ash Alcalinity of ash
#> 0.44 0.68
#> Magnesium Total phenols
#> 0.68 0.87
#> Flavanoids Nonflavanoid phenols
#> 0.81 0.83
#> Proanthocyanins Color intensity
#> 0.85 0.62
#> Hue OD280/OD315 of diluted wines
#> 0.79 0.87
#> Proline
#> 0.82
Before choosing the extraction method, we can check whether our data
meet the assumption of multivariate normality, which is required for the Maximum
Likelihood extraction method. The MVN package provides the
Henze–Zirkler test by default, which is a reliable test for this
purpose.
library('MVN')
#> Warning: package 'MVN' was built under R version 4.5.3
#>
#> Attaching package: 'MVN'
#> The following object is masked from 'package:psych':
#>
#> mardia
mvn(wine_stand[,2:14])$multivariate_normality
#> Test Statistic p.value Method MVN
#> 1 Henze-Zirkler 1.074 <0.001 asymptotic ✗ Not normal
# Shapiro tests (not perfect but indicative)
apply(wine_stand[,2:14], 2, shapiro.test)
#> $Alcohol
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.9818, p-value = 0.02005
#>
#>
#> $`Malic acid`
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.88878, p-value = 2.946e-10
#>
#>
#> $Ash
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.98395, p-value = 0.03868
#>
#>
#> $`Alcalinity of ash`
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.99023, p-value = 0.2639
#>
#>
#> $Magnesium
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.93833, p-value = 6.346e-07
#>
#>
#> $`Total phenols`
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.97668, p-value = 0.004395
#>
#>
#> $Flavanoids
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.95453, p-value = 1.679e-05
#>
#>
#> $`Nonflavanoid phenols`
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.96252, p-value = 0.0001055
#>
#>
#> $Proanthocyanins
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.98072, p-value = 0.01445
#>
#>
#> $`Color intensity`
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.94032, p-value = 9.229e-07
#>
#>
#> $Hue
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.98134, p-value = 0.01743
#>
#>
#> $`OD280/OD315 of diluted wines`
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.94505, p-value = 2.316e-06
#>
#>
#> $Proline
#>
#> Shapiro-Wilk normality test
#>
#> data: newX[, i]
#> W = 0.93119, p-value = 1.741e-07
The p-value < 0.001 for the multivariate normality test means that we reject the null hypothesis of multivariate normality (since p < 0.05). This suggests that the data are not consistent with a multivariate normal distribution, so it is not appropriate to proceed with ML extraction. Since the multivariate normality test is very sensitive to sample size, another proxy is to look at the univariate normality of each variable. However, the Shapiro-Wilk tests also show that most variables deviate from normality. Therefore we will opt for ordinary least squares (OLS) extraction, fm = "ols".
Using the fa() function from the psych package, we
run a factor analysis model, extracting 8 factors from the
wine data.
wine.fa <- fa(wine_stand[,2:14],nfactors=8, fm = "ols", rotate = "none")
wine.fa
#> Factor Analysis using method = ols
#> Call: fa(r = wine_stand[, 2:14], nfactors = 8, rotate = "none", fm = "ols")
#> Standardized loadings (pattern matrix) based upon correlation matrix
#> 1 2 3 4 5 6 7 8
#> Alcohol 0.31 0.73 -0.22 0.12 0.05 -0.20 -0.16 0.22
#> Malic acid -0.50 0.30 0.05 -0.23 0.25 -0.37 0.18 0.04
#> Ash -0.01 0.53 0.76 0.13 -0.18 -0.15 -0.07 -0.24
#> Alcalinity of ash -0.54 -0.01 0.73 -0.24 0.04 0.16 -0.14 0.26
#> Magnesium 0.28 0.40 0.10 -0.13 -0.41 0.07 0.34 0.09
#> Total phenols 0.84 0.11 0.17 0.00 0.24 0.05 0.03 -0.03
#> Flavanoids 0.92 0.01 0.20 -0.01 0.20 0.03 -0.01 -0.06
#> Nonflavanoid phenols -0.67 0.05 0.20 0.63 0.25 0.07 0.20 0.06
#> Proanthocyanins 0.63 0.06 0.12 -0.14 0.22 0.17 0.22 0.05
#> Color intensity -0.20 0.87 -0.20 -0.05 0.14 0.35 -0.11 -0.09
#> Hue 0.62 -0.39 0.12 0.31 -0.19 0.04 -0.06 0.11
#> OD280/OD315 of diluted wines 0.79 -0.24 0.20 -0.03 0.15 -0.14 -0.04 0.03
#> Proline 0.59 0.53 -0.12 0.18 -0.13 -0.06 -0.02 0.09
#> h2 u2 com
#> Alcohol 0.80 0.1955 2.2
#> Malic acid 0.63 0.3705 4.1
#> Ash 1.00 0.0047 2.4
#> Alcalinity of ash 1.00 0.0050 2.6
#> Magnesium 0.57 0.4350 4.3
#> Total phenols 0.80 0.1986 1.3
#> Flavanoids 0.94 0.0609 1.2
#> Nonflavanoid phenols 1.00 0.0049 2.7
#> Proanthocyanins 0.56 0.4390 1.9
#> Color intensity 1.00 0.0035 1.7
#> Hue 0.70 0.3001 2.7
#> OD280/OD315 of diluted wines 0.77 0.2286 1.5
#> Proline 0.71 0.2895 2.4
#>
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> SS loadings 4.51 2.33 1.40 0.71 0.57 0.41 0.31 0.22
#> Proportion Var 0.35 0.18 0.11 0.05 0.04 0.03 0.02 0.02
#> Cumulative Var 0.35 0.53 0.63 0.69 0.73 0.76 0.79 0.80
#> Proportion Explained 0.43 0.22 0.13 0.07 0.05 0.04 0.03 0.02
#> Cumulative Proportion 0.43 0.65 0.79 0.86 0.91 0.95 0.98 1.00
#>
#> Mean item complexity = 2.4
#> Test of the hypothesis that 8 factors are sufficient.
#>
#> df null model = 78 with the objective function = 7.67 with Chi Square = 1317.18
#> df of the model are 2 and the objective function was 0.01
#>
#> The root mean square of the residuals (RMSR) is 0
#> The df corrected root mean square of the residuals is 0.02
#>
#> The harmonic n.obs is 178 with the empirical chi square 0.08 with prob < 0.96
#> The total n.obs was 178 with Likelihood Chi Square = 2.06 with prob < 0.36
#>
#> Tucker Lewis Index of factoring reliability = 0.998
#> RMSEA index = 0.012 and the 90 % confidence intervals are 0 0.15
#> BIC = -8.3
#> Fit based upon off diagonal values = 1
#> Measures of factor score adequacy
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> Correlation of (regression) scores with factors 0.99 0.98 0.99 0.95 0.88 0.88
#> Multiple R square of scores with factors 0.97 0.96 0.99 0.91 0.77 0.78
#> Minimum correlation of possible factor scores 0.94 0.93 0.97 0.82 0.55 0.56
#> [,7] [,8]
#> Correlation of (regression) scores with factors 0.74 0.86
#> Multiple R square of scores with factors 0.55 0.75
#> Minimum correlation of possible factor scores 0.09 0.50
The output of fa() includes factor loadings,
uniquenesses, and several statistics that help evaluate model fit and
interpret the underlying factor structure.
The most important component is the factor loadings,
which indicate the strength and direction of the relationship between
each observed variable and the underlying factors. Higher absolute
loadings suggest that a variable is strongly associated with a
particular factor. h2 represents the amount of variance in
the item/variable explained by the (retained) factors. It is the sum of
the squared loadings, a.k.a. communality.
The output also includes uniquenesses
(u2), which represent the proportion of variance in each
variable that is not explained by the common factors. Variables with
high uniqueness are less well explained by the factor model.
com is “Hoffman’s index of complexity”. It equals 1 if an
item loads on only one factor, 2 if it loads evenly on two factors, and so on. It
tells you how much an item reflects a single construct. The index is lower
when an item's secondary loadings are small relative to its main loading.
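These three quantities can be recomputed from a row of loadings. The sketch below copies the rounded loadings of two variables from the 8-factor output above, so the results only match the printed h2, u2 and com values up to rounding; the object names are chosen here for illustration.

```r
# Rounded loadings copied from the 8-factor output (two variables only)
loadings <- rbind(
  "Alcohol"       = c(0.31, 0.73, -0.22, 0.12, 0.05, -0.20, -0.16,  0.22),
  "Total phenols" = c(0.84, 0.11,  0.17, 0.00, 0.24,  0.05,  0.03, -0.03)
)

h2  <- rowSums(loadings^2)                          # communality
u2  <- 1 - h2                                       # uniqueness
com <- rowSums(loadings^2)^2 / rowSums(loadings^4)  # complexity index

round(cbind(h2, u2, com), 2)
```

Alcohol spreads its variance over several factors and gets a complexity above 2, whereas Total phenols is dominated by the first factor and stays close to 1.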
In addition, the output provides SS loadings (sum of squared loadings) for each factor, which give an indication of how much variance each factor explains. From these, the proportion of variance explained and the cumulative variance explained can be derived, helping to assess the overall adequacy of the factor solution.
- SS loadings: These are the eigenvalues, the sum of the squared loadings. In this case where we are using a correlation matrix, summing across all factors would equal the number of variables used in the analysis.
- Proportion Var: tells us how much of the overall variance the factor accounts for out of all the variables.
- Cumulative Var: the cumulative sum of Proportion Var.
- Proportion Explained: The relative amount of variance explained
- Cumulative Proportion: the cumulative sum of Proportion Explained.
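The rows of this table follow from the SS loadings by simple arithmetic. The sketch below starts from the rounded SS loadings printed in the 8-factor output and the 13 variables in the analysis; the variable names are chosen here for illustration.

```r
# Rounded SS loadings copied from the 8-factor output
ss_loadings <- c(4.51, 2.33, 1.40, 0.71, 0.57, 0.41, 0.31, 0.22)
p <- 13  # number of variables in the analysis

prop_var       <- ss_loadings / p                 # share of total variance
cum_var        <- cumsum(prop_var)
prop_explained <- ss_loadings / sum(ss_loadings)  # share of the explained variance
cum_prop       <- cumsum(prop_explained)

round(rbind("Proportion Var"        = prop_var,
            "Cumulative Var"        = cum_var,
            "Proportion Explained"  = prop_explained,
            "Cumulative Proportion" = cum_prop), 2)
```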
Finally, various fit statistics are included to evaluate how well the factor model represents the observed correlation structure, along with factor score information if scores are requested. This includes the Tucker Lewis fit index, typically reported in SEM. Generally you want values larger than 0.9. The BIC is useful for model comparisons.
At this point, we face the question of how many factors to retain. It is generally recommended to use a combination of parallel analysis (PA) and the minimum average partial (MAP) test, as these methods provide more reliable guidance than relying on a single approach. The scree plot can also be used as a supplementary visual tool to help support the decision.
In the following section, we will demonstrate how to implement these three methods in R.
Parallel Analysis (PA)
pa_results <- fa.parallel(wine_stand[,2:14], fa = "pc", n.iter = 500, ylabel = "Eigenvalues")
#> Parallel analysis suggests that the number of factors = NA and the number of components = 3
pa_results
#> Call: fa.parallel(x = wine_stand[, 2:14], fa = "pc", n.iter = 500,
#> ylabel = "Eigenvalues")
#> Parallel analysis suggests that the number of factors = NA and the number of components = 3
#>
#> Eigen Values of
#>
#> eigen values of factors
#> [1] 4.24 1.64 0.64 0.22 0.09 0.00 -0.05 -0.11 -0.26 -0.36 -0.50 -0.63
#> [13] -0.68
#>
#> eigen values of simulated factors
#> [1] NA
#>
#> eigen values of components
#> [1] 4.71 2.50 1.45 0.92 0.85 0.64 0.55 0.35 0.29 0.25 0.23 0.17 0.10
#>
#> eigen values of simulated components
#> [1] 1.47 1.35 1.26 1.18 1.11 1.04 0.98 0.92 0.86 0.80 0.74 0.68 0.60
Here, the PCA extraction method is used (fa="pc") because
PA was originally developed for the unreduced correlation matrix, and
several simulation studies have found this variant to be more accurate. The number
of simulated data sets is controlled by the n.iter argument.
Both the plot and the printed output suggest that three components are
sufficient.
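The logic of parallel analysis can be sketched in base R: eigenvalues of the observed correlation matrix are compared with eigenvalues obtained from random data of the same size. The sketch below is not the fa.parallel() implementation. It uses simulated data with two built-in factors, random normal data only, and the 95th percentile as threshold, all of which are assumptions made for illustration.

```r
# Simulated data with two underlying factors, three items each (hypothetical)
set.seed(2024)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
make_item <- function(f) 0.8 * f + 0.6 * rnorm(n)
demo <- data.frame(a1 = make_item(f1), a2 = make_item(f1), a3 = make_item(f1),
                   b1 = make_item(f2), b2 = make_item(f2), b3 = make_item(f2))
p <- ncol(demo)

# Eigenvalues of the observed (unreduced) correlation matrix
observed <- eigen(cor(demo), only.values = TRUE)$values

# Eigenvalues from random normal data with the same n and p
n_iter <- 200
simulated <- replicate(n_iter, {
  random_data <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(random_data), only.values = TRUE)$values
})
threshold <- apply(simulated, 1, quantile, probs = 0.95)

# Retain the leading components whose eigenvalue exceeds the random threshold
exceeds  <- observed > threshold
n_retain <- if (all(exceeds)) p else which(!exceeds)[1] - 1
n_retain
```

For data simulated this way, the procedure is expected to retain the two components that correspond to the two simulated factors.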
Minimum average partial method (MAP)
VSS(wine_stand[,2:14], rotate = "promax", fm="pc", plot=FALSE)
#>
#> Very Simple Structure
#> Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm,
#> n.obs = n.obs, plot = plot, title = title, use = use, cor = cor)
#> VSS complexity 1 achieves a maximimum of 0.77 with 5 factors
#> VSS complexity 2 achieves a maximimum of 0.92 with 3 factors
#>
#> The Velicer MAP achieves a minimum of 0.05 with 3 factors
#> Warning in min(x$vss.stats[, "BIC"], na.rm = TRUE): no non-missing arguments to
#> min; returning Inf
#>
#> BIC achieves a minimum of Inf with factors
#> Warning in min(x$vss.stats[, "SABIC"], na.rm = TRUE): no non-missing arguments
#> to min; returning Inf
#> Sample Size adjusted BIC achieves a minimum of Inf with factors
#>
#> Statistics by number of factors
#> vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex eChisq
#> 1 0.67 0.00 0.066 0 NA NA 10.97 0.67 NA NA NA NA NA
#> 2 0.71 0.86 0.053 0 NA NA 4.74 0.86 NA NA NA NA NA
#> 3 0.71 0.89 0.051 0 NA NA 2.65 0.92 NA NA NA NA NA
#> 4 0.76 0.89 0.056 0 NA NA 1.80 0.95 NA NA NA NA NA
#> 5 0.76 0.90 0.074 0 NA NA 1.07 0.97 NA NA NA NA NA
#> 6 0.66 0.91 0.100 0 NA NA 0.66 0.98 NA NA NA NA NA
#> 7 0.60 0.90 0.135 0 NA NA 0.36 0.99 NA NA NA NA NA
#> 8 0.77 0.92 0.185 0 NA NA 0.24 0.99 NA NA NA NA NA
#> SRMR eCRMS eBIC
#> 1 NA NA NA
#> 2 NA NA NA
#> 3 NA NA NA
#> 4 NA NA NA
#> 5 NA NA NA
#> 6 NA NA NA
#> 7 NA NA NA
#> 8 NA NA NA
The lowest MAP value identifies the number of factors to retain. In our example, MAP identifies three factors to retain.
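What the MAP criterion measures can be sketched in base R: after partialling the first m principal components out of the correlation matrix, the average squared partial correlation among the variables is computed, and the m with the smallest average is chosen. The sketch below uses simulated two-factor data and starts at one component, mirroring the rows of the table above; it is an illustration under these assumptions, not the VSS() code.

```r
# Simulated data with two underlying factors, four items each (hypothetical)
set.seed(7)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
make_item <- function(f) 0.8 * f + 0.6 * rnorm(n)
demo <- data.frame(a1 = make_item(f1), a2 = make_item(f1),
                   a3 = make_item(f1), a4 = make_item(f1),
                   b1 = make_item(f2), b2 = make_item(f2),
                   b3 = make_item(f2), b4 = make_item(f2))

R <- cor(demo)
p <- ncol(R)
e <- eigen(R)

avg_sq_partial <- sapply(1:(p - 1), function(m) {
  # Component loadings of the first m principal components
  A <- e$vectors[, 1:m, drop = FALSE] %*% diag(sqrt(e$values[1:m]), nrow = m)
  # Partial covariances after removing those components, rescaled to correlations
  partial <- cov2cor(R - A %*% t(A))
  # Average squared off-diagonal partial correlation
  (sum(partial^2) - p) / (p * (p - 1))
})

round(avg_sq_partial, 3)
which.min(avg_sq_partial)  # number of components suggested by this sketch
```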
Scree plot
Augmented scree plot
library(factoextra)
#> Warning: package 'factoextra' was built under R version 4.5.3
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.5.3
#>
#> Attaching package: 'ggplot2'
#> The following objects are masked from 'package:psych':
#>
#> %+%, alpha
#> Welcome to factoextra!
#> Want to learn more? See two factoextra-related books at https://www.datanovia.com/en/product/practical-guide-to-principal-component-methods-in-r/
library(FactoMineR)
#> Warning: package 'FactoMineR' was built under R version 4.5.3
pca_wine <- PCA(wine_stand[,2:14], graph = FALSE)
fviz_eig(pca_wine, addlabels = TRUE)
wine.fa1 <- fa(wine_stand[,2:14],nfactors=3, fm = "ols", rotate = "none")
wine.fa1
#> Factor Analysis using method = ols
#> Call: fa(r = wine_stand[, 2:14], nfactors = 3, rotate = "none", fm = "ols")
#> Standardized loadings (pattern matrix) based upon correlation matrix
#> 1 2 3 h2 u2 com
#> Alcohol 0.30 0.72 -0.18 0.64 0.358 1.5
#> Malic acid -0.47 0.28 0.09 0.31 0.695 1.7
#> Ash 0.00 0.45 0.65 0.63 0.373 1.8
#> Alcalinity of ash -0.51 -0.03 0.70 0.75 0.250 1.8
#> Magnesium 0.26 0.35 0.09 0.20 0.800 2.0
#> Total phenols 0.85 0.09 0.17 0.75 0.249 1.1
#> Flavanoids 0.94 -0.01 0.19 0.92 0.083 1.1
#> Nonflavanoid phenols -0.58 0.04 0.14 0.36 0.641 1.1
#> Proanthocyanins 0.62 0.04 0.12 0.40 0.601 1.1
#> Color intensity -0.19 0.81 -0.11 0.70 0.298 1.1
#> Hue 0.60 -0.39 0.05 0.51 0.489 1.7
#> OD280/OD315 of diluted wines 0.80 -0.26 0.17 0.74 0.255 1.3
#> Proline 0.60 0.55 -0.12 0.68 0.320 2.1
#>
#> [,1] [,2] [,3]
#> SS loadings 4.36 2.11 1.12
#> Proportion Var 0.34 0.16 0.09
#> Cumulative Var 0.34 0.50 0.58
#> Proportion Explained 0.57 0.28 0.15
#> Cumulative Proportion 0.57 0.85 1.00
#>
#> Mean item complexity = 1.5
#> Test of the hypothesis that 3 factors are sufficient.
#>
#> df null model = 78 with the objective function = 7.67 with Chi Square = 1317.18
#> df of the model are 42 and the objective function was 0.99
#>
#> The root mean square of the residuals (RMSR) is 0.05
#> The df corrected root mean square of the residuals is 0.07
#>
#> The harmonic n.obs is 178 with the empirical chi square 34.25 with prob < 0.8
#> The total n.obs was 178 with Likelihood Chi Square = 167.98 with prob < 5.5e-17
#>
#> Tucker Lewis Index of factoring reliability = 0.809
#> RMSEA index = 0.13 and the 90 % confidence intervals are 0.11 0.151
#> BIC = -49.65
#> Fit based upon off diagonal values = 0.98
#> Measures of factor score adequacy
#> [,1] [,2] [,3]
#> Correlation of (regression) scores with factors 0.98 0.93 0.89
#> Multiple R square of scores with factors 0.95 0.86 0.80
#> Minimum correlation of possible factor scores 0.91 0.71 0.59
wine.fa2 <- fa(wine_stand[,2:14],nfactors=3, fm = "ols", rotate = "varimax")
wine.fa2
#> Factor Analysis using method = ols
#> Call: fa(r = wine_stand[, 2:14], nfactors = 3, rotate = "varimax",
#> fm = "ols")
#> Standardized loadings (pattern matrix) based upon correlation matrix
#> 1 2 3 h2 u2 com
#> Alcohol 0.04 0.80 -0.07 0.64 0.358 1.0
#> Malic acid -0.49 0.09 0.23 0.31 0.695 1.5
#> Ash 0.03 0.31 0.73 0.63 0.373 1.4
#> Alcalinity of ash -0.30 -0.31 0.75 0.75 0.250 1.7
#> Magnesium 0.17 0.40 0.12 0.20 0.800 1.6
#> Total phenols 0.80 0.34 0.03 0.75 0.249 1.3
#> Flavanoids 0.92 0.26 0.02 0.92 0.083 1.2
#> Nonflavanoid phenols -0.52 -0.17 0.24 0.36 0.641 1.7
#> Proanthocyanins 0.59 0.22 0.02 0.40 0.601 1.3
#> Color intensity -0.43 0.71 0.11 0.70 0.298 1.7
#> Hue 0.68 -0.18 -0.14 0.51 0.489 1.2
#> OD280/OD315 of diluted wines 0.86 -0.01 -0.03 0.74 0.255 1.0
#> Proline 0.38 0.73 -0.10 0.68 0.320 1.5
#>
#> [,1] [,2] [,3]
#> SS loadings 4.00 2.32 1.27
#> Proportion Var 0.31 0.18 0.10
#> Cumulative Var 0.31 0.49 0.58
#> Proportion Explained 0.53 0.31 0.17
#> Cumulative Proportion 0.53 0.83 1.00
#>
#> Mean item complexity = 1.4
#> Test of the hypothesis that 3 factors are sufficient.
#>
#> df null model = 78 with the objective function = 7.67 with Chi Square = 1317.18
#> df of the model are 42 and the objective function was 0.99
#>
#> The root mean square of the residuals (RMSR) is 0.05
#> The df corrected root mean square of the residuals is 0.07
#>
#> The harmonic n.obs is 178 with the empirical chi square 34.25 with prob < 0.8
#> The total n.obs was 178 with Likelihood Chi Square = 167.98 with prob < 5.5e-17
#>
#> Tucker Lewis Index of factoring reliability = 0.809
#> RMSEA index = 0.13 and the 90 % confidence intervals are 0.11 0.151
#> BIC = -49.65
#> Fit based upon off diagonal values = 0.98
#> Measures of factor score adequacy
#> [,1] [,2] [,3]
#> Correlation of (regression) scores with factors 0.97 0.93 0.89
#> Multiple R square of scores with factors 0.94 0.87 0.80
#> Minimum correlation of possible factor scores 0.89 0.73 0.59
wine.fa3 <- fa(wine_stand[,2:14],nfactors=3, fm = "ols", rotate = "oblimin")
#> Loading required namespace: GPArotation
wine.fa3
#> Factor Analysis using method = ols
#> Call: fa(r = wine_stand[, 2:14], nfactors = 3, rotate = "oblimin",
#> fm = "ols")
#> Standardized loadings (pattern matrix) based upon correlation matrix
#> 1 2 3 h2 u2 com
#> Alcohol 0.10 0.78 -0.08 0.64 0.358 1.1
#> Malic acid -0.44 0.18 0.22 0.31 0.695 1.9
#> Ash 0.22 0.26 0.75 0.63 0.373 1.4
#> Alcalinity of ash -0.16 -0.28 0.77 0.75 0.250 1.4
#> Magnesium 0.23 0.35 0.12 0.20 0.800 2.0
#> Total phenols 0.84 0.16 0.05 0.75 0.249 1.1
#> Flavanoids 0.96 0.06 0.03 0.92 0.083 1.0
#> Nonflavanoid phenols -0.48 -0.07 0.24 0.36 0.641 1.5
#> Proanthocyanins 0.62 0.09 0.03 0.40 0.601 1.0
#> Color intensity -0.34 0.79 0.10 0.70 0.298 1.4
#> Hue 0.63 -0.31 -0.13 0.51 0.489 1.6
#> OD280/OD315 of diluted wines 0.86 -0.20 -0.01 0.74 0.255 1.1
#> Proline 0.42 0.64 -0.11 0.68 0.320 1.8
#>
#> [,1] [,2] [,3]
#> SS loadings 4.07 2.13 1.38
#> Proportion Var 0.31 0.16 0.11
#> Cumulative Var 0.31 0.48 0.58
#> Proportion Explained 0.54 0.28 0.18
#> Cumulative Proportion 0.54 0.82 1.00
#>
#> With factor correlations of
#> [,1] [,2] [,3]
#> [1,] 1.00 0.12 -0.23
#> [2,] 0.12 1.00 0.05
#> [3,] -0.23 0.05 1.00
#>
#> Mean item complexity = 1.4
#> Test of the hypothesis that 3 factors are sufficient.
#>
#> df null model = 78 with the objective function = 7.67 with Chi Square = 1317.18
#> df of the model are 42 and the objective function was 0.99
#>
#> The root mean square of the residuals (RMSR) is 0.05
#> The df corrected root mean square of the residuals is 0.07
#>
#> The harmonic n.obs is 178 with the empirical chi square 34.25 with prob < 0.8
#> The total n.obs was 178 with Likelihood Chi Square = 167.98 with prob < 5.5e-17
#>
#> Tucker Lewis Index of factoring reliability = 0.809
#> RMSEA index = 0.13 and the 90 % confidence intervals are 0.11 0.151
#> BIC = -49.65
#> Fit based upon off diagonal values = 0.98
#> Measures of factor score adequacy
#> [,1] [,2] [,3]
#> Correlation of (regression) scores with factors 0.97 0.93 0.88
#> Multiple R square of scores with factors 0.93 0.86 0.78
#> Minimum correlation of possible factor scores 0.87 0.71 0.56
When using an oblique rotation (like "oblimin"), the factors are allowed to correlate. This leads to two types of loading matrices, and in the output above, you can see the Pattern matrix, which shows the unique contribution (regression coefficients) of each factor to each variable.
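The two matrices are linked through the factor correlations: multiplying the pattern matrix by the factor correlation matrix gives the correlations between variables and factors. The sketch below uses rounded values copied from the oblimin output for two variables, so results agree with what fa() reports only up to rounding; the object names are chosen here for illustration.

```r
# Rounded pattern loadings copied from the oblimin output (two variables only)
pattern <- rbind(
  Alcohol    = c(0.10, 0.78, -0.08),
  Flavanoids = c(0.96, 0.06,  0.03)
)

# Rounded factor correlations copied from the same output
phi <- matrix(c( 1.00, 0.12, -0.23,
                 0.12, 1.00,  0.05,
                -0.23, 0.05,  1.00), nrow = 3, byrow = TRUE)

# Correlations between variables and factors
structure_mat <- pattern %*% phi
round(structure_mat, 2)
```

With uncorrelated factors, phi is the identity matrix and the two matrices coincide.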
Below, you can get the structure matrix, which shows the correlations between observed variables and the factors:
wine.fa3$Structure
#>
#> Loadings:
#> [,1] [,2] [,3]
#> Alcohol 0.206 0.789
#> Malic acid -0.468 0.145 0.335
#> Ash 0.321 0.706
#> Alcalinity of ash -0.372 -0.262 0.797
#> Magnesium 0.246 0.380
#> Total phenols 0.851 0.257 -0.143
#> Flavanoids 0.955 0.172 -0.189
#> Nonflavanoid phenols -0.548 -0.114 0.353
#> Proanthocyanins 0.625 0.163 -0.112
#> Color intensity -0.268 0.752 0.215
#> Hue 0.627 -0.243 -0.296
#> OD280/OD315 of diluted wines 0.840 -0.226
#> Proline 0.524 0.685 -0.173
#>
#> [,1] [,2] [,3]
#> SS loadings 4.239 2.199 1.665
#> Proportion Var 0.326 0.169 0.128
#> Cumulative Var 0.326 0.495 0.623
We can also obtain the off-diagonal residuals:
resid3 <- residuals(wine.fa3, diag = FALSE, na.rm = TRUE)
resid3
#> Alchl Mlcac Ash Alcoa Mgnsm Ttlph Flvnd Nnflp
#> Alcohol NA
#> Malic acid 0.05 NA
#> Ash 0.01 -0.02 NA
#> Alcalinity of ash -0.01 0.00 0.00 NA
#> Magnesium -0.04 -0.04 0.07 0.00 NA
#> Total phenols 0.00 0.02 -0.02 0.00 -0.06 NA
#> Flavanoids 0.00 0.02 0.00 -0.01 -0.06 0.04 NA
#> Nonflavanoid phenols 0.01 0.00 0.08 -0.03 -0.13 0.01 -0.02 NA
#> Proanthocyanins -0.06 0.05 -0.09 0.03 0.05 0.07 0.05 -0.03
#> Color intensity 0.00 -0.05 -0.03 0.02 -0.02 0.04 0.03 0.01
#> Hue 0.04 -0.18 0.07 -0.01 0.03 -0.04 -0.03 0.10
#> OD280/OD315 of diluted wines 0.05 0.07 0.01 0.00 -0.07 0.02 0.00 -0.05
#> Proline 0.04 -0.05 0.06 -0.03 0.05 -0.04 -0.04 0.03
#> Prnth Clrin Hue
#> Alcohol
#> Malic acid
#> Ash
#> Alcalinity of ash
#> Magnesium
#> Total phenols
#> Flavanoids
#> Nonflavanoid phenols
#> Proanthocyanins NA
#> Color intensity 0.07 NA
#> Hue -0.06 -0.09 NA
#> OD280/OD315 of diluted wines 0.01 -0.05 -0.03
#> Proline -0.05 -0.03 0.09
#> ODodw Proln
#> OD280/OD315 of diluted wines NA
#> Proline -0.01 NA
The residual correlations indicate the extent to which the model fails to reproduce the observed correlation matrix. Ideally, these residuals should be small (below 0.05), suggesting that the factor model captures most of the underlying structure.
In addition, fit statistics such as the Tucker–Lewis Index (TLI) and the root mean square of residuals (RMSR) provide summary measures of model fit, with better-fitting models showing higher TLI values and lower RMSR values. Together, these diagnostics help assess whether the latent factors provide a meaningful and adequate representation of the data.
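To make these residual diagnostics concrete, the sketch below reproduces a correlation matrix from a loading matrix, subtracts it from the observed one, and summarises the off-diagonal residuals. The helper `residual_summary`, the 0.05 cutoff argument and the toy one-factor example are assumptions for illustration, not output from the wine analysis.

```r
# Hypothetical helper: summarise off-diagonal residual correlations
residual_summary <- function(R, loadings, phi = diag(ncol(loadings)),
                             cutoff = 0.05) {
  reproduced <- loadings %*% phi %*% t(loadings)  # model-implied correlations
  resid <- R - reproduced
  off <- resid[lower.tri(resid)]                  # each pair counted once
  list(rmsr        = sqrt(mean(off^2)),           # root mean square residual
       n_large     = sum(abs(off) > cutoff),      # residuals above the cutoff
       share_large = mean(abs(off) > cutoff))
}

# Toy one-factor example: the model fits exactly except for one pair
loadings <- matrix(c(0.8, 0.7, 0.6), ncol = 1)
R <- loadings %*% t(loadings)
diag(R) <- 1
R[1, 2] <- R[2, 1] <- R[1, 2] + 0.06  # introduce one misfitting correlation

out <- residual_summary(R, loadings)
out
```

A large share of residuals above the cutoff would suggest that additional factors, or a different model, may be needed.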