The first step in running an EFA is to go through all of the assumption checks you would normally assess in any multivariate analysis. After that, there are only two additional checks. The first is the Kaiser-Meyer-Olkin (KMO) measure, which tests whether the inter-item correlations are adequate to support an EFA. The second is Bartlett’s test of sphericity, which tests the null hypothesis that the items are independent of each other, in which case the data would not be appropriate for EFA.
I have removed columns 1 and 42 from the data because they are not continuous variables and are not part of the EFA.
The code below loads my data from a local file, so to adapt it you would first need to load the data you intend to analyse. You may also need to install the packages loaded for this example.
library(haven)
UWin_Analysis_Data_2<-read_sav("C:/Work Files/Kendall/COVID/UWin_Analysis_Data_2.sav")
library(psych)
library(GPArotation)
##
## Attaching package: 'GPArotation'
## The following objects are masked from 'package:psych':
##
## equamax, varimin
KMO(UWin_Analysis_Data_2 [c(-1,-42)])
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = UWin_Analysis_Data_2[c(-1, -42)])
## Overall MSA = 0.91
## MSA for each item =
## V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 B1 B2 B3 B4
## 0.94 0.94 0.91 0.89 0.91 0.92 0.92 0.90 0.88 0.89 0.92 0.91 0.94 0.90 0.93 0.92
## B5 B6 B7 B8 G1 G2 G3 G4 G5 G6 G7 G8 G9 G10 G11 C1
## 0.95 0.95 0.95 0.95 0.85 0.84 0.86 0.80 0.77 0.80 0.94 0.91 0.89 0.91 0.90 0.89
## C2 C3 C4 C5 C6 C7 C8 C9
## 0.92 0.91 0.92 0.95 0.93 0.90 0.91 0.92
cortest.bartlett(UWin_Analysis_Data_2 [c(-1,-42)])
## R was not square, finding R from data
## $chisq
## [1] 24077.68
##
## $p.value
## [1] 0
##
## $df
## [1] 780
The KMO ranges from 0 to 1, with values closer to 1 indicating greater shared variance among items. If the KMO is less than .5, conducting an EFA is not advisable. In this case the overall KMO is .91 and no item has a KMO lower than .77, so all the items have enough shared variance to justify an EFA.
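To make the KMO less of a black box, here is a minimal base-R sketch of how the measure can be computed by hand: it compares the squared correlations with the squared partial correlations. The data are simulated (six hypothetical items driven by one common factor), so the names and numbers are illustrative assumptions, not the tutorial's data.

```r
# Hypothetical simulated data: 6 items that all share one common factor
set.seed(1)
n <- 500
common <- rnorm(n)
items <- sapply(1:6, function(i) 0.7 * common + rnorm(n, sd = 0.7))
colnames(items) <- paste0("item", 1:6)

R <- cor(items)
# Partial correlations come from the rescaled inverse of R
# (the sign is irrelevant because the values are squared below)
P <- cov2cor(solve(R))
diag(R) <- 0
diag(P) <- 0

# Overall and per-item measure of sampling adequacy (MSA)
msa_overall <- sum(R^2) / (sum(R^2) + sum(P^2))
msa_item <- colSums(R^2) / (colSums(R^2) + colSums(P^2))

# Flag any item below the .5 rule of thumb
low_items <- names(msa_item)[msa_item < 0.5]
print(round(msa_overall, 2))
print(round(msa_item, 2))
```

When the partial correlations are small relative to the raw correlations, most of the covariation is shared across items and the MSA approaches 1.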
Similarly, Bartlett’s test evaluates the null hypothesis that the items are orthogonal, or independent of each other. If that were true, there could be no underlying latent variable accounting for shared variance among items, because there would be no shared variance. Therefore, we want the chi-square comparing the observed correlation matrix to an identity matrix (completely independent items) to be significant, rejecting the null hypothesis of orthogonal items. In this example, chi-square(780) = 24077.68, p < .001 (the p-value prints as 0 because it is too small to display), which confirms that EFA is appropriate.
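As a rough illustration of where the chi-square and its degrees of freedom come from, the sketch below computes Bartlett's statistic by hand in its usual form, -(n - 1 - (2p + 5)/6) * log(det(R)) with p(p - 1)/2 degrees of freedom. The data are simulated and hypothetical; only the degrees-of-freedom check at the end uses a number from this tutorial (40 items).

```r
# Hypothetical simulated data: 5 correlated items, 300 respondents
set.seed(2)
n <- 300
p <- 5
common <- rnorm(n)
X <- sapply(1:p, function(i) 0.6 * common + rnorm(n, sd = 0.8))

R <- cor(X)
# Bartlett's statistic: large when det(R) is far below 1 (its value for an identity matrix)
chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
df <- p * (p - 1) / 2
p_value <- pchisq(chisq, df, lower.tail = FALSE)
print(c(chisq = chisq, df = df, p_value = p_value))

# With the 40 items analysed above, the same formula gives the df in the output
df_tutorial <- 40 * 39 / 2
print(df_tutorial)
```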
Next we need to estimate the eigenvalues to determine how many factors might underlie the correlation matrix. Since this requires complete data, we first remove any rows with missing values from the analysis data set. Note that the code below also drops columns 29 to 32 (items G8 to G11) in addition to columns 1 and 42, leaving 36 items. There will be as many eigenvalues as there are variables, ordered from the largest (the first factor) to the kth (k = number of items). The larger the eigenvalue, the greater the proportion of variance accounted for by that factor. The eigenvalues can then be plotted in a scree plot, with the magnitude of the eigenvalue on the y-axis and the factor number on the x-axis.
This is where it gets a little loose. The number of factors is generally taken to be the number of eigenvalues larger than 1, but one can also examine the slope of the scree plot and visually assess the point at which it “breaks” or starts to flatten. Both approaches are also evaluated against the conceptual coherence of a particular number of factors. There is no hard and fast, mathematically determined cutoff.
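For readers unfamiliar with listwise deletion, here is a small sketch of what na.omit() does before the eigenvalues are computed. The toy data frame and its column names are hypothetical.

```r
# Hypothetical toy data frame with a few missing responses
toy <- data.frame(
  q1 = c(1, 2, NA, 4, 5, 3),
  q2 = c(2, 3, 4, NA, 5, 1),
  q3 = c(1, 1, 2, 3, 5, 4)
)

complete <- na.omit(toy)                 # listwise deletion: drops rows 3 and 4
n_dropped <- nrow(toy) - nrow(complete)

R <- cor(complete)                       # correlation matrix on complete cases only
ev <- eigen(R)$values                    # one eigenvalue per variable
print(n_dropped)
print(ev)
```

Because the eigenvalues come from a correlation matrix, they sum to the number of variables, which is why dividing an eigenvalue by the number of items gives a proportion of variance.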
UWin_Analysis_Data_No_NA <-na.omit(UWin_Analysis_Data_2)
ev <- eigen(cor(UWin_Analysis_Data_No_NA[c(-1,-29:-32,-42)]))
ev$values
## [1] 8.5491029 4.9741335 3.0208263 2.2873128 1.7463312 1.3633053 1.0825749
## [8] 0.9063233 0.8531707 0.8244124 0.7643965 0.7111200 0.6815327 0.6439731
## [15] 0.5769964 0.5167527 0.5101867 0.5032805 0.4617738 0.4401325 0.3857815
## [22] 0.3774928 0.3577919 0.3484256 0.3400673 0.3206362 0.3165991 0.2944672
## [29] 0.2903730 0.2765311 0.2550214 0.2361756 0.2209242 0.2026628 0.1985882
## [36] 0.1608239
##Scree Plot##
scree(UWin_Analysis_Data_No_NA[c(-1,-29:-32,-42)], pc=FALSE)
fa.parallel(UWin_Analysis_Data_No_NA[c(-1,-29:-32,-42)], fa="fa")
## Parallel analysis suggests that the number of factors = 6 and the number of components = NA
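The eigenvalue-greater-than-1 rule described earlier can be checked directly against the eigenvalues printed above. This sketch simply retypes those 36 values (the vector name is my own) and counts how many exceed 1, along with the proportion of variance each accounts for.

```r
# Eigenvalues copied from the ev$values output above
ev_values <- c(
  8.5491029, 4.9741335, 3.0208263, 2.2873128, 1.7463312, 1.3633053, 1.0825749,
  0.9063233, 0.8531707, 0.8244124, 0.7643965, 0.7111200, 0.6815327, 0.6439731,
  0.5769964, 0.5167527, 0.5101867, 0.5032805, 0.4617738, 0.4401325, 0.3857815,
  0.3774928, 0.3577919, 0.3484256, 0.3400673, 0.3206362, 0.3165991, 0.2944672,
  0.2903730, 0.2765311, 0.2550214, 0.2361756, 0.2209242, 0.2026628, 0.1985882,
  0.1608239
)

n_above_one <- sum(ev_values > 1)           # eigenvalue-greater-than-1 count
prop_var <- ev_values / length(ev_values)   # eigenvalues of a correlation matrix sum to k
cum_var <- cumsum(prop_var)

print(n_above_one)   # 7
print(round(cum_var[1:7], 2))
```

So the eigenvalue rule points to seven factors while the parallel analysis points to six, which is why more than one solution is worth examining.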
Based on this profile (seven eigenvalues exceed 1, and the parallel analysis suggests six factors), it appears that there are somewhere between 5 and 7 factors. You can run an EFA extracting 5, 6, and 7 factors in turn and examine the results to find the solution that is most statistically, conceptually, and theoretically coherent. An important step in determining the final number of factors and the optimal factor solution is factor rotation, which mathematically reorients the factors in a hyperplane space (we won’t get into that here) in order to maximize what Thurstone called simple structure. That is a fancy way of saying that each item should load strongly on one and only one factor, with loadings as high as possible (here, loadings below .40 are not considered).
There are two kinds of rotation: orthogonal and oblique. An orthogonal rotation requires the factors to remain perpendicular to each other in the hyperspace plane (that is just fun to say); functionally, this means the factors are forced to be uncorrelated with each other. This is often an unrealistic assumption with psychological data, despite orthogonal rotations being the most common type performed. Oblique rotations, on the other hand, allow the factors to become correlated to the extent that such correlations increase the simple structure of the solution. This example uses equamax (also spelled equimax) for the orthogonal rotation and promax for the oblique rotation, both of which are widely applicable.
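One way to organise the "try several factor counts" step is to loop over the candidates and collect the fit statistics side by side. The sketch below does this on simulated data with a known three-factor structure, so the candidates 1 to 3 stand in for the 5 to 7 considered here; the data, object names, and loading values are all hypothetical.

```r
set.seed(42)
n <- 500
# Hypothetical data: 3 uncorrelated factors, 6 items each, with different loading strengths
scores <- matrix(rnorm(n * 3), n, 3)
strength <- c(0.8, 0.7, 0.6)
X <- do.call(cbind, lapply(1:3, function(j) {
  sapply(1:6, function(i) {
    strength[j] * scores[, j] + rnorm(n, sd = sqrt(1 - strength[j]^2))
  })
}))
colnames(X) <- paste0("x", 1:18)

# Fit one maximum-likelihood EFA per candidate number of factors
candidates <- 1:3
fits <- lapply(candidates, function(k) factanal(X, factors = k, rotation = "promax"))

# Collect the chi-square test of "k factors are sufficient" for each fit
results <- data.frame(
  factors = candidates,
  chisq   = sapply(fits, function(f) unname(f$STATISTIC)),
  df      = sapply(fits, function(f) unname(f$dof)),
  p_value = sapply(fits, function(f) unname(f$PVAL))
)
print(results)
```

The table is only a starting point: with large samples the chi-square test tends to reject even reasonable solutions, so the loadings of each fit still need to be read for conceptual coherence.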
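To see the practical difference between the two families, the sketch below rotates the same hypothetical loading matrix both ways using base R. Note that it uses varimax() as a stand-in orthogonal rotation (the tutorial itself uses equamax from GPArotation), and that the loading matrix is constructed from two factors assumed to correlate at .40.

```r
# Hypothetical pattern: 3 items per factor, factors correlated at .40
pattern <- rbind(
  cbind(rep(0.7, 3), 0),
  cbind(0, rep(0.7, 3))
)
phi_true <- matrix(c(1, 0.4, 0.4, 1), 2, 2)

# Unrotated orthogonal loadings that reproduce the same common variance
L <- pattern %*% t(chol(phi_true))

vm <- varimax(L)   # orthogonal rotation (stand-in for equamax)
pm <- promax(L)    # oblique rotation

# Factor correlations implied by each rotation matrix
phi_v <- solve(t(vm$rotmat) %*% vm$rotmat)
phi_p <- solve(t(pm$rotmat) %*% pm$rotmat)

# Orthogonal rotation leaves each item's communality unchanged
comm_before <- rowSums(L^2)
comm_after <- rowSums(unclass(vm$loadings)^2)

print(round(phi_v, 2))   # identity: factors forced to be uncorrelated
print(round(phi_p, 2))   # off-diagonal is non-zero: factors allowed to correlate
```

The orthogonal solution has to absorb the factor correlation as cross-loadings, whereas the oblique solution can express it in the factor correlation matrix and keep the loadings cleaner.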
Nfacs <- 5 # Number of factors
##Orthogonal rotation##
fit2 <- factanal(UWin_Analysis_Data_No_NA[,c(-1, -29:-32,-42)], Nfacs, rotation="equamax")
print(fit2, digits=2, cutoff=0.4, sort=TRUE)
##
## Call:
## factanal(x = UWin_Analysis_Data_No_NA[, c(-1, -29:-32, -42)], factors = Nfacs, rotation = "equamax")
##
## Uniquenesses:
## V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 B1 B2 B3 B4
## 0.80 0.78 0.54 0.51 0.55 0.61 0.71 0.41 0.28 0.24 0.60 0.25 0.60 0.43 0.49 0.43
## B5 B6 B7 B8 G1 G2 G3 G4 G5 G6 G7 C1 C2 C3 C4 C5
## 0.40 0.38 0.57 0.44 0.57 0.22 0.22 0.66 0.65 0.58 0.51 0.55 0.52 0.39 0.60 0.49
## C6 C7 C8 C9
## 0.50 0.34 0.38 0.63
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5
## C1 0.63
## C2 0.65
## C3 0.73
## C4 0.54
## C5 0.67
## C6 0.68
## C7 0.80
## C8 0.75
## C9 0.55
## B2 0.65
## B3 0.62
## B4 0.65
## B5 0.71
## B6 0.71
## B7 0.56
## B8 0.69
## V9 0.76
## V10 0.84
## V11 0.85
## V12 0.60
## V13 0.84
## G2 0.85
## G3 0.83
## V4 0.65
## V5 0.67
## V2
## V3
## V6 0.48
## V7 0.47
## V8 0.50
## B1 0.49
## G1 0.48
## G4
## G5
## G6 -0.45
## G7 0.45 0.46
##
## Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings 4.57 4.42 3.86 2.85 2.49
## Proportion Var 0.13 0.12 0.11 0.08 0.07
## Cumulative Var 0.13 0.25 0.36 0.44 0.51
##
## Test of the hypothesis that 5 factors are sufficient.
## The chi square statistic is 1803.46 on 460 degrees of freedom.
## The p-value is 4.95e-158
##Oblique rotation##
fit1 <- factanal(UWin_Analysis_Data_No_NA[c(-1, -29:-32,-42)], Nfacs, rotation="promax")
print(fit1, digits=2, cutoff=0.4, sort=TRUE)
##
## Call:
## factanal(x = UWin_Analysis_Data_No_NA[c(-1, -29:-32, -42)], factors = Nfacs, rotation = "promax")
##
## Uniquenesses:
## V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 B1 B2 B3 B4
## 0.80 0.78 0.54 0.51 0.55 0.61 0.71 0.41 0.28 0.24 0.60 0.25 0.60 0.43 0.49 0.43
## B5 B6 B7 B8 G1 G2 G3 G4 G5 G6 G7 C1 C2 C3 C4 C5
## 0.40 0.38 0.57 0.44 0.57 0.22 0.22 0.66 0.65 0.58 0.51 0.55 0.52 0.39 0.60 0.49
## C6 C7 C8 C9
## 0.50 0.34 0.38 0.63
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5
## B1 0.55
## B2 0.70
## B3 0.70
## B4 0.70
## B5 0.82
## B6 0.81
## B7 0.67
## B8 0.76
## C1 0.67
## C2 0.72
## C3 0.74
## C4 0.55
## C5 0.71
## C6 0.69
## C7 0.83
## C8 0.79
## C9 0.62
## V9 0.79
## V10 0.89
## V11 0.90
## V12 0.62
## V13 0.87
## V4 0.70
## V5 0.75
## V8 0.51
## G2 0.86
## G3 0.81
## V2
## V3
## V6 0.41
## V7 0.42
## G1 0.48
## G4
## G5
## G6 -0.41
## G7 0.49
##
## Factor1 Factor2 Factor3 Factor4 Factor5
## SS loadings 5.12 4.72 3.89 2.40 2.29
## Proportion Var 0.14 0.13 0.11 0.07 0.06
## Cumulative Var 0.14 0.27 0.38 0.45 0.51
##
## Factor Correlations:
## Factor1 Factor2 Factor3 Factor4 Factor5
## Factor1 1.00 0.4142 -0.1843 -0.4166 0.268
## Factor2 0.41 1.0000 -0.0034 -0.0098 0.234
## Factor3 -0.18 -0.0034 1.0000 0.1236 -0.436
## Factor4 -0.42 -0.0098 0.1236 1.0000 0.087
## Factor5 0.27 0.2338 -0.4361 0.0868 1.000
##
## Test of the hypothesis that 5 factors are sufficient.
## The chi square statistic is 1803.46 on 460 degrees of freedom.
## The p-value is 4.95e-158
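In both outputs above, some items (V2, V3, G4, G5) show no loading at the .40 cutoff, and in the equamax solution G7 shows two. A small helper like the following can flag such items automatically. The loading matrix here is hypothetical and is not taken from the results above; with a real fit you would pass unclass(fit$loadings) instead.

```r
# Hypothetical rotated loadings for five items on two factors
L <- matrix(c(0.72, 0.10,
              0.65, 0.05,
              0.45, 0.46,
              0.12, 0.30,
              0.08, 0.81),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("item", 1:5), c("Factor1", "Factor2")))

cutoff <- 0.40
salient <- abs(L) >= cutoff          # TRUE where a loading would be printed
n_salient <- rowSums(salient)

no_loading <- names(n_salient)[n_salient == 0]      # load on no factor
cross_loading <- names(n_salient)[n_salient > 1]    # load on more than one factor
print(no_loading)      # "item4"
print(cross_loading)   # "item3"
```

Items in either list work against simple structure and are the usual candidates for closer inspection when comparing the 5, 6, and 7 factor solutions.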
And there you have it. EFA in a nutshell.