Let's begin by using the psych package to conduct a single-factor exploratory factor analysis (EFA). The fa() function conducts an EFA on your data. When you're using this in the real world, be sure to use a dataset that only contains item responses - other types of data will cause errors and/or incorrect results. The gcbs dataset contains examinees' responses to 15 items from the Generic Conspiracist Beliefs Scale, which is designed to measure conspiracist beliefs.
An EFA estimates each item's relationship to a single factor that is hypothesized to underlie all of the items, giving you basic information about how well each item relates to that hypothesized construct.
Be sure to save the analysis result object so you can return to it later.
# Load the psych package
library(psych)
# Download the GCBS item responses
# (renaming the variable avoids shadowing base R's url() function)
gcbs_url <- "https://assets.datacamp.com/production/repositories/2136/datasets/869615371e66021e97829feb7e19e38037ed0c14/GCBS_data.rds"
gcbs <- readRDS(gzcon(url(gcbs_url)))
# Conduct a single-factor EFA
EFA_model <- fa(gcbs)
# View the results
print(EFA_model)
## Factor Analysis using method = minres
## Call: fa(r = gcbs)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## Q1 0.70 0.49 0.51 1
## Q2 0.72 0.52 0.48 1
## Q3 0.64 0.41 0.59 1
## Q4 0.77 0.59 0.41 1
## Q5 0.67 0.45 0.55 1
## Q6 0.75 0.56 0.44 1
## Q7 0.73 0.54 0.46 1
## Q8 0.65 0.43 0.57 1
## Q9 0.70 0.48 0.52 1
## Q10 0.56 0.32 0.68 1
## Q11 0.72 0.52 0.48 1
## Q12 0.79 0.62 0.38 1
## Q13 0.68 0.46 0.54 1
## Q14 0.74 0.55 0.45 1
## Q15 0.57 0.33 0.67 1
##
## MR1
## SS loadings 7.27
## Proportion Var 0.48
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 105 and the objective function was 9.31 with Chi Square of 23173.8
## The degrees of freedom for the model are 90 and the objective function was 1.93
##
## The root mean square of the residuals (RMSR) is 0.08
## The df corrected root mean square of the residuals is 0.09
##
## The harmonic number of observations is 2495 with the empirical chi square 3398.99 with prob < 0
## The total number of observations was 2495 with Likelihood Chi Square = 4809.34 with prob < 0
##
## Tucker Lewis Index of factoring reliability = 0.761
## RMSEA index = 0.145 and the 90 % confidence intervals are 0.142 0.149
## BIC = 4105.36
## Fit based upon off diagonal values = 0.97
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.97
## Multiple R square of scores with factors 0.94
## Minimum correlation of possible factor scores 0.87
Each fa() results object is actually a list, and each element of the list contains specific information about the analysis, including factor loadings. Factor loadings represent the strength and directionality of the relationship between each item and the underlying factor, and they can range from -1 to 1.
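To see everything the result object stores, you can list its element names - a quick sketch (you'll find entries such as loadings and scores among them):
# List the elements available in the fa() results object
names(EFA_model)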
You can also create a diagram of loadings. The fa.diagram() function takes a result object from fa() and creates a path diagram showing the items’ loadings ordered from strongest to weakest. Path diagrams are more common for structural equation modeling than for factor analysis, but this type of visualization can be a helpful way to represent your results.
# Set up the single-factor EFA
EFA_model <- fa(gcbs)
# View the factor loadings
EFA_model$loadings
##
## Loadings:
## MR1
## Q1 0.703
## Q2 0.719
## Q3 0.638
## Q4 0.770
## Q5 0.672
## Q6 0.746
## Q7 0.734
## Q8 0.654
## Q9 0.695
## Q10 0.565
## Q11 0.719
## Q12 0.786
## Q13 0.679
## Q14 0.743
## Q15 0.574
##
## MR1
## SS loadings 7.267
## Proportion Var 0.484
# Create a path diagram of the items' factor loadings
fa.diagram(EFA_model)
The EFA_model object also contains a named list element, scores, which contains factor scores for each person. These factor scores are an indication of how much or how little of the factor each person is thought to possess. Factor scores are not computed for examinees with missing data.
head(gcbs)
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15
## 1 5 5 3 5 5 5 5 3 4 5 5 5 3 5 5
## 2 5 5 5 5 5 3 5 5 1 4 4 5 4 4 5
## 3 2 4 1 2 2 2 4 2 2 4 2 4 0 2 4
## 4 5 4 1 2 4 5 4 1 4 5 5 5 1 4 5
## 5 5 4 1 4 4 5 4 3 1 5 5 5 3 5 5
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
rowSums(gcbs[1:6, ])
## 1 2 3 4 5 6
## 68 65 37 55 59 15
head(EFA_model$scores)
## MR1
## [1,] 1.5614675
## [2,] 1.3432026
## [3,] -0.3960355
## [4,] 0.7478868
## [5,] 1.0435203
## [6,] -1.7290812
summary(EFA_model$scores)
## MR1
## Min. :-1.854703
## 1st Qu.:-0.783260
## Median :-0.001971
## Mean : 0.000000
## 3rd Qu.: 0.728568
## Max. : 1.949580
plot(density(EFA_model$scores, na.rm = TRUE),
main = "Factor Scores")
The psych package provides several functions to help you visualize your data. The describe() function tells you basic statistics about each variable in your dataset, including the number of complete cases, the mean, standard deviation, minimum, maximum, range, skew, and kurtosis.
You can also view graphical representations of the error bars for different variables using error.dots() and error.bars(). Both of these graphical representations are created from the summary statistics available from calling describe().
describe(gcbs)
## vars n mean sd median trimmed mad min max range skew kurtosis
## Q1 1 2495 3.47 1.46 4 3.59 1.48 0 5 5 -0.55 -1.10
## Q2 2 2495 2.96 1.49 3 2.96 1.48 0 5 5 -0.01 -1.40
## Q3 3 2495 2.05 1.39 1 1.82 0.00 0 5 5 0.98 -0.44
## Q4 4 2495 2.64 1.45 2 2.55 1.48 0 5 5 0.26 -1.34
## Q5 5 2495 3.25 1.47 4 3.32 1.48 0 5 5 -0.35 -1.27
## Q6 6 2495 3.11 1.51 3 3.14 1.48 0 5 5 -0.17 -1.42
## Q7 7 2495 2.67 1.51 2 2.59 1.48 0 5 5 0.28 -1.39
## Q8 8 2495 2.45 1.57 2 2.32 1.48 0 5 5 0.51 -1.30
## Q9 9 2495 2.23 1.42 2 2.05 1.48 0 5 5 0.76 -0.82
## Q10 10 2495 3.50 1.39 4 3.63 1.48 1 5 4 -0.59 -0.94
## Q11 11 2495 3.27 1.40 4 3.34 1.48 0 5 5 -0.35 -1.11
## Q12 12 2495 2.64 1.50 2 2.56 1.48 0 5 5 0.29 -1.37
## Q13 13 2495 2.10 1.38 1 1.89 0.00 0 5 5 0.89 -0.56
## Q14 14 2495 2.96 1.49 3 2.95 1.48 0 5 5 -0.02 -1.43
## Q15 15 2495 4.23 1.10 5 4.47 0.00 0 5 5 -1.56 1.71
## se
## Q1 0.03
## Q2 0.03
## Q3 0.03
## Q4 0.03
## Q5 0.03
## Q6 0.03
## Q7 0.03
## Q8 0.03
## Q9 0.03
## Q10 0.03
## Q11 0.03
## Q12 0.03
## Q13 0.03
## Q14 0.03
## Q15 0.02
# Plot each item's mean with its confidence interval
error.dots(gcbs)
# Or show the same means as conventional error bars
error.bars(gcbs)
During the measure development process, it’s important to conduct EFA and CFA on separate datasets because using the same dataset can lead to inflated model fit statistics. Instead, you can split your dataset in half, then use one half for the EFA and the other half for the CFA.
# Establish the full set of row indices
N <- nrow(gcbs)
indices <- seq(1, N)
# Randomly sample half of the indices for the EFA; the rest go to the CFA
# (call set.seed() first if you need a reproducible split)
indices_EFA <- sample(indices, floor((.5*N)))
indices_CFA <- indices[!(indices %in% indices_EFA)]
# Use those indices to split the dataset into halves
gcbs_EFA <- gcbs[indices_EFA, ]
gcbs_CFA <- gcbs[indices_CFA, ]
Just as you inspected the features of your full dataset, it’s important to examine the halves after you’ve split the data. You can always use describe() on each dataset, but the psych package also provides some functions to help compare a dataset according to a grouping variable.
In this exercise, you’ll use the indices created when splitting the dataset to create a grouping variable and attach it to the gcbs dataset. Once that grouping variable is set up, you can use describeBy() and statsBy() to view basic descriptive statistics as well as between-group statistics.
A word of warning: while the group argument of describeBy() has to be a vector, the group argument of statsBy() has to be the name of a column in your dataframe. Plan accordingly!
# Use the indices from the previous exercise to create a grouping variable
group_var <- vector("numeric", nrow(gcbs))
group_var[indices_EFA] <- 1
group_var[indices_CFA] <- 2
# Bind that grouping variable onto the gcbs dataset
gcbs_grouped <- cbind(gcbs, group_var)
# Compare stats across groups
describeBy(gcbs_grouped, group = group_var)
##
## Descriptive statistics by group
## group: 1
## vars n mean sd median trimmed mad min max range skew
## Q1 1 1247 3.48 1.45 4 3.60 1.48 0 5 5 -0.55
## Q2 2 1247 2.98 1.50 3 2.98 1.48 0 5 5 0.00
## Q3 3 1247 2.03 1.40 1 1.80 0.00 0 5 5 1.00
## Q4 4 1247 2.68 1.47 2 2.61 1.48 0 5 5 0.24
## Q5 5 1247 3.32 1.47 4 3.40 1.48 0 5 5 -0.41
## Q6 6 1247 3.13 1.50 3 3.16 1.48 0 5 5 -0.18
## Q7 7 1247 2.68 1.51 2 2.60 1.48 0 5 5 0.28
## Q8 8 1247 2.43 1.55 2 2.29 1.48 0 5 5 0.55
## Q9 9 1247 2.30 1.45 2 2.13 1.48 0 5 5 0.69
## Q10 10 1247 3.50 1.37 4 3.62 1.48 1 5 4 -0.58
## Q11 11 1247 3.28 1.40 4 3.36 1.48 0 5 5 -0.38
## Q12 12 1247 2.66 1.52 2 2.58 1.48 0 5 5 0.29
## Q13 13 1247 2.11 1.39 1 1.90 0.00 0 5 5 0.88
## Q14 14 1247 3.01 1.49 3 3.02 1.48 0 5 5 -0.06
## Q15 15 1247 4.25 1.08 5 4.47 0.00 1 5 4 -1.56
## group_var 16 1247 1.00 0.00 1 1.00 0.00 1 1 0 NaN
## kurtosis se
## Q1 -1.10 0.04
## Q2 -1.41 0.04
## Q3 -0.41 0.04
## Q4 -1.36 0.04
## Q5 -1.22 0.04
## Q6 -1.40 0.04
## Q7 -1.38 0.04
## Q8 -1.23 0.04
## Q9 -0.95 0.04
## Q10 -0.91 0.04
## Q11 -1.10 0.04
## Q12 -1.38 0.04
## Q13 -0.57 0.04
## Q14 -1.41 0.04
## Q15 1.78 0.03
## group_var NaN 0.00
## --------------------------------------------------------
## group: 2
## vars n mean sd median trimmed mad min max range skew
## Q1 1 1248 3.47 1.46 4 3.58 1.48 0 5 5 -0.55
## Q2 2 1248 2.95 1.49 3 2.94 1.48 0 5 5 -0.02
## Q3 3 1248 2.06 1.38 1 1.84 0.00 0 5 5 0.96
## Q4 4 1248 2.59 1.43 2 2.49 1.48 0 5 5 0.28
## Q5 5 1248 3.19 1.47 4 3.24 1.48 0 5 5 -0.28
## Q6 6 1248 3.09 1.51 3 3.11 1.48 0 5 5 -0.15
## Q7 7 1248 2.65 1.51 2 2.57 1.48 0 5 5 0.29
## Q8 8 1248 2.48 1.59 2 2.35 1.48 0 5 5 0.48
## Q9 9 1248 2.17 1.39 2 1.97 1.48 0 5 5 0.83
## Q10 10 1248 3.51 1.41 4 3.63 1.48 1 5 4 -0.59
## Q11 11 1248 3.25 1.40 3 3.31 1.48 0 5 5 -0.32
## Q12 12 1248 2.63 1.49 2 2.55 1.48 0 5 5 0.28
## Q13 13 1248 2.09 1.38 1 1.88 0.00 0 5 5 0.89
## Q14 14 1248 2.90 1.49 3 2.87 1.48 0 5 5 0.03
## Q15 15 1248 4.21 1.13 5 4.46 0.00 0 5 5 -1.55
## group_var 16 1248 2.00 0.00 2 2.00 0.00 2 2 0 NaN
## kurtosis se
## Q1 -1.11 0.04
## Q2 -1.39 0.04
## Q3 -0.48 0.04
## Q4 -1.32 0.04
## Q5 -1.31 0.04
## Q6 -1.43 0.04
## Q7 -1.40 0.04
## Q8 -1.36 0.05
## Q9 -0.67 0.04
## Q10 -0.98 0.04
## Q11 -1.13 0.04
## Q12 -1.36 0.04
## Q13 -0.56 0.04
## Q14 -1.45 0.04
## Q15 1.61 0.03
## group_var NaN 0.00
statsBy(gcbs_grouped, group = "group_var")
## Statistics within and between groups
## Call: statsBy(data = gcbs_grouped, group = "group_var")
## Intraclass Correlation 1 (Percentage of variance due to groups)
## Q1 Q2 Q3 Q4 Q5 Q6 Q7
## 0 0 0 0 0 0 0
## Q8 Q9 Q10 Q11 Q12 Q13 Q14
## 0 0 0 0 0 0 0
## Q15 group_var
## 0 1
## Intraclass Correlation 2 (Reliability of group differences)
## Q1 Q2 Q3 Q4 Q5 Q6 Q7
## -16.32 -2.67 -4.01 0.65 0.80 -1.17 -5.05
## Q8 Q9 Q10 Q11 Q12 Q13 Q14
## -0.52 0.82 -21.90 -1.50 -6.40 -7.94 0.74
## Q15 group_var
## -0.56 1.00
## eta^2 between groups
## Q1.bg Q2.bg Q3.bg Q4.bg Q5.bg Q6.bg Q7.bg Q8.bg Q9.bg Q10.bg
## 0 0 0 0 0 0 0 0 0 0
## Q11.bg Q12.bg Q13.bg Q14.bg Q15.bg
## 0 0 0 0 0
##
## To see the correlations between and within groups, use the short=FALSE option in your print statement.
## Many results are not shown directly. To see specific objects select from the following list:
## mean sd n F ICC1 ICC2 ci1 ci2 raw rbg pbg rwg nw pwg etabg etawg nwg nG Call
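As the printout suggests, the statsBy() result is itself a list, so you can pull out a single element directly - for example, the ICC1 values shown above:
# Store the statsBy() result, then extract one element from it
gcbs_stats <- statsBy(gcbs_grouped, group = "group_var")
gcbs_stats$ICC1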
One of the easiest ways to get a feel for your dataset is to examine the relationships of your variables with each other. While base R has the cor() function, the psych package has the lowerCor() function, which only displays the lower triangle of the correlation matrix for easier viewing and interpretation.
You can also use the corr.test() function to view several probability values, including p-values from t-tests of each correlation and confidence intervals for the correlations. P-values are corrected using the Holm correction by default.
lowerCor(gcbs, use = "pairwise.complete.obs")
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
## Q1 1.00
## Q2 0.53 1.00
## Q3 0.36 0.40 1.00
## Q4 0.52 0.53 0.50 1.00
## Q5 0.48 0.46 0.40 0.57 1.00
## Q6 0.63 0.55 0.40 0.61 0.50 1.00
## Q7 0.47 0.67 0.42 0.57 0.45 0.54 1.00
## Q8 0.39 0.38 0.78 0.49 0.41 0.41 0.41 1.00
## Q9 0.42 0.49 0.49 0.56 0.46 0.48 0.53 0.48 1.00
## Q10 0.44 0.38 0.32 0.40 0.43 0.41 0.39 0.36 0.37 1.00
## Q11 0.64 0.52 0.34 0.52 0.49 0.62 0.49 0.37 0.46 0.45
## Q12 0.52 0.72 0.44 0.60 0.49 0.59 0.75 0.42 0.57 0.40
## Q13 0.38 0.40 0.71 0.51 0.43 0.42 0.45 0.76 0.54 0.37
## Q14 0.53 0.50 0.43 0.60 0.54 0.55 0.52 0.45 0.55 0.41
## Q15 0.51 0.40 0.27 0.39 0.45 0.47 0.39 0.31 0.32 0.45
## Q11 Q12 Q13 Q14 Q15
## Q11 1.00
## Q12 0.55 1.00
## Q13 0.40 0.49 1.00
## Q14 0.56 0.56 0.50 1.00
## Q15 0.54 0.41 0.30 0.46 1.00
corr.test(gcbs, use = "pairwise.complete.obs")$ci
## lower r upper p
## Q1-Q2 0.4970162 0.5259992 0.5538098 1.384140e-177
## Q1-Q3 0.3206223 0.3553928 0.3892067 3.608276e-75
## Q1-Q4 0.4953852 0.5244323 0.5523079 2.359896e-176
## Q1-Q5 0.4503342 0.4810747 0.5106759 1.054746e-144
## Q1-Q6 0.6071117 0.6313131 0.6543444 1.477903e-277
## Q1-Q7 0.4412058 0.4722710 0.5022057 8.142449e-139
## Q1-Q8 0.3564216 0.3902059 0.4229712 1.549786e-91
## Q1-Q9 0.3850453 0.4179718 0.4498355 4.344797e-106
## Q1-Q10 0.4034438 0.4357865 0.4670415 3.550942e-116
## Q1-Q11 0.6199265 0.6435136 0.6659388 9.499179e-292
## Q1-Q12 0.4932727 0.5224025 0.5503620 9.097129e-175
## Q1-Q13 0.3464313 0.3805006 0.4135673 9.052542e-87
## Q1-Q14 0.5059498 0.5345780 0.5620298 1.912163e-184
## Q1-Q15 0.4753633 0.5051815 0.5338405 9.793389e-162
## Q2-Q3 0.3618855 0.3955108 0.4281083 3.282447e-94
## Q2-Q4 0.5062706 0.5348860 0.5623248 1.075388e-184
## Q2-Q5 0.4259018 0.4574975 0.4879788 2.578663e-129
## Q2-Q6 0.5234810 0.5513960 0.5781285 1.757495e-198
## Q2-Q7 0.6501266 0.6722188 0.6931753 0.000000e+00
## Q2-Q8 0.3425926 0.3767693 0.4099501 5.579984e-85
## Q2-Q9 0.4556319 0.4861810 0.5155863 3.376084e-148
## Q2-Q10 0.3464233 0.3804928 0.4135598 9.131376e-87
## Q2-Q11 0.4915283 0.5207263 0.5487548 1.822105e-173
## Q2-Q12 0.6962013 0.7158851 0.7344931 0.000000e+00
## Q2-Q13 0.3667134 0.4001964 0.4326439 1.297778e-96
## Q2-Q14 0.4702225 0.5002339 0.5290898 3.969981e-158
## Q2-Q15 0.3659505 0.3994560 0.4319274 3.129498e-96
## Q3-Q4 0.4693335 0.4993781 0.5282679 1.647531e-157
## Q3-Q5 0.3653695 0.3988923 0.4313817 6.108182e-96
## Q3-Q6 0.3667467 0.4002287 0.4326752 1.248831e-96
## Q3-Q7 0.3904688 0.4232258 0.4549125 5.318107e-109
## Q3-Q8 0.7683496 0.7839542 0.7986273 0.000000e+00
## Q3-Q9 0.4601647 0.4905484 0.5197845 3.104324e-151
## Q3-Q10 0.2853436 0.3209913 0.3557521 6.760094e-61
## Q3-Q11 0.3066780 0.3418064 0.3760050 2.580298e-69
## Q3-Q12 0.4061739 0.4384278 0.4695906 1.005698e-117
## Q3-Q13 0.6919756 0.7118867 0.7307155 0.000000e+00
## Q3-Q14 0.3989973 0.4314834 0.4628876 1.103607e-113
## Q3-Q15 0.2285790 0.2654400 0.3015410 1.676489e-41
## Q4-Q5 0.5438704 0.5709273 0.5967985 8.032837e-216
## Q4-Q6 0.5837641 0.6090539 0.6331630 3.037754e-253
## Q4-Q7 0.5394959 0.5667395 0.5927977 5.260693e-212
## Q4-Q8 0.4589969 0.4894234 0.5187032 1.898345e-150
## Q4-Q9 0.5374441 0.5647747 0.5909202 3.108940e-210
## Q4-Q10 0.3706739 0.4040387 0.4363621 1.298818e-98
## Q4-Q11 0.4932765 0.5224062 0.5503655 9.037626e-175
## Q4-Q12 0.5749365 0.6006273 0.6251350 1.550592e-244
## Q4-Q13 0.4833700 0.5128834 0.5412322 1.798859e-167
## Q4-Q14 0.5782876 0.6038268 0.6281838 8.219433e-248
## Q4-Q15 0.3607035 0.3943633 0.4269973 1.255385e-93
## Q5-Q6 0.4650551 0.4952588 0.5243108 1.470071e-154
## Q5-Q7 0.4142088 0.4461981 0.4770864 2.336455e-122
## Q5-Q8 0.3765708 0.4097576 0.4418941 1.224898e-101
## Q5-Q9 0.4328328 0.4641904 0.4944261 1.462606e-133
## Q5-Q10 0.3951535 0.4277623 0.4592945 1.476449e-111
## Q5-Q11 0.4632435 0.4935141 0.5226345 2.540009e-153
## Q5-Q12 0.4617541 0.4920795 0.5212560 2.612164e-152
## Q5-Q13 0.3953385 0.4279414 0.4594675 1.168044e-111
## Q5-Q14 0.5106389 0.5390785 0.5663399 4.008639e-188
## Q5-Q15 0.4189383 0.4507697 0.4814945 3.849304e-125
## Q6-Q7 0.5120337 0.5404170 0.5676214 3.149502e-189
## Q6-Q8 0.3794902 0.4125879 0.4446310 3.704932e-103
## Q6-Q9 0.4534992 0.4841255 0.5136099 8.755609e-147
## Q6-Q10 0.3747259 0.4079687 0.4401639 1.098605e-100
## Q6-Q11 0.5981808 0.6228032 0.6462508 4.981130e-268
## Q6-Q12 0.5610551 0.5873651 0.6124896 2.403960e-231
## Q6-Q13 0.3915639 0.4242864 0.4559371 1.353745e-109
## Q6-Q14 0.5190730 0.5471694 0.5740847 7.044729e-195
## Q6-Q15 0.4345044 0.4658040 0.4959800 1.340087e-134
## Q7-Q8 0.3795181 0.4126150 0.4446571 3.582463e-103
## Q7-Q9 0.5001718 0.5290301 0.5567146 5.497267e-180
## Q7-Q10 0.3547857 0.3886172 0.4214323 9.585238e-91
## Q7-Q11 0.4624648 0.4927641 0.5219138 8.601767e-153
## Q7-Q12 0.7365820 0.7540288 0.7704729 0.000000e+00
## Q7-Q13 0.4167211 0.4486267 0.4794284 7.858024e-124
## Q7-Q14 0.4867147 0.5160994 0.5443174 6.545724e-170
## Q7-Q15 0.3549662 0.3887925 0.4216021 7.843219e-91
## Q8-Q9 0.4490408 0.4798277 0.5094765 7.372509e-144
## Q8-Q10 0.3302521 0.3647668 0.3983073 2.223163e-79
## Q8-Q11 0.3367608 0.3710987 0.4044508 2.643708e-82
## Q8-Q12 0.3904041 0.4231631 0.4548520 5.764588e-109
## Q8-Q13 0.7398147 0.7570774 0.7733440 0.000000e+00
## Q8-Q14 0.4175690 0.4494462 0.4802185 2.485870e-124
## Q8-Q15 0.2696028 0.3056115 0.3407668 4.328489e-55
## Q9-Q10 0.3321729 0.3666358 0.4001210 3.093874e-80
## Q9-Q11 0.4294991 0.4609717 0.4913259 1.655959e-131
## Q9-Q12 0.5439334 0.5709876 0.5968561 7.071640e-216
## Q9-Q13 0.5083417 0.5368740 0.5642288 2.581509e-186
## Q9-Q14 0.5268510 0.5546263 0.5812182 2.858794e-201
## Q9-Q15 0.2858045 0.3214413 0.3561903 4.516727e-61
## Q10-Q11 0.4188025 0.4506384 0.4813679 4.633595e-125
## Q10-Q12 0.3705162 0.4038857 0.4362140 1.561964e-98
## Q10-Q13 0.3339871 0.3684007 0.4018335 4.747674e-81
## Q10-Q14 0.3815258 0.4145611 0.4465387 3.168647e-104
## Q10-Q15 0.4176732 0.4495470 0.4803157 2.157366e-124
## Q11-Q12 0.5197817 0.5478491 0.5747350 1.870488e-195
## Q11-Q13 0.3651436 0.3986730 0.4311694 7.920246e-96
## Q11-Q14 0.5342028 0.5616704 0.5879532 1.851647e-207
## Q11-Q15 0.5111332 0.5395529 0.5667941 1.629274e-188
## Q12-Q13 0.4545949 0.4851817 0.5146255 1.648258e-147
## Q12-Q14 0.5354702 0.5628844 0.5891136 1.534183e-208
## Q12-Q15 0.3766079 0.4097936 0.4419289 1.171847e-101
## Q13-Q14 0.4667464 0.4968874 0.5258754 1.013162e-155
## Q13-Q15 0.2625236 0.2986885 0.3340155 1.375737e-52
## Q14-Q15 0.4302457 0.4616926 0.4920203 5.766839e-132
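As noted above, corr.test() applies the Holm correction to its p-values by default. If you want raw, uncorrected p-values, you can set the adjust argument to "none" - a minimal sketch:
# Request unadjusted p-values (the default is adjust = "holm")
raw_p <- corr.test(gcbs, use = "pairwise.complete.obs", adjust = "none")$p
raw_p[1:3, 1:3]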
You know how to examine how individual items perform in your measure, but what about how well those items relate to each other - the overall internal reliability of a measure? Coefficient alpha (also called Cronbach’s alpha) and split-half reliability are two common ways of assessing reliability. These statistics are a function of the measure length and items’ interrelatedness, which you just investigated by looking at the correlation matrix.
In general, reliability values greater than 0.8 are desired, though some fields of study have higher or lower guidelines.
Do the results of alpha() and splitHalf() indicate the gcbs dataset has acceptable reliability?
# Estimate coefficient alpha
alpha(gcbs)
##
## Reliability analysis
## Call: alpha(x = gcbs)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.93 0.93 0.94 0.48 14 0.002 2.9 1 0.47
##
## lower alpha upper 95% confidence boundaries
## 0.93 0.93 0.94
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Q1 0.93 0.93 0.94 0.48 13 0.0021 0.0105 0.46
## Q2 0.93 0.93 0.94 0.48 13 0.0021 0.0099 0.47
## Q3 0.93 0.93 0.94 0.49 13 0.0020 0.0084 0.48
## Q4 0.93 0.93 0.94 0.47 13 0.0022 0.0105 0.46
## Q5 0.93 0.93 0.94 0.48 13 0.0021 0.0112 0.48
## Q6 0.93 0.93 0.94 0.48 13 0.0021 0.0104 0.46
## Q7 0.93 0.93 0.94 0.48 13 0.0021 0.0098 0.47
## Q8 0.93 0.93 0.94 0.48 13 0.0020 0.0086 0.49
## Q9 0.93 0.93 0.94 0.48 13 0.0021 0.0108 0.46
## Q10 0.93 0.93 0.94 0.49 14 0.0020 0.0102 0.49
## Q11 0.93 0.93 0.94 0.48 13 0.0021 0.0104 0.46
## Q12 0.93 0.93 0.94 0.47 13 0.0022 0.0093 0.46
## Q13 0.93 0.93 0.94 0.48 13 0.0021 0.0092 0.48
## Q14 0.93 0.93 0.94 0.48 13 0.0021 0.0109 0.46
## Q15 0.93 0.93 0.94 0.49 14 0.0020 0.0095 0.49
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## Q1 2495 0.73 0.73 0.70 0.68 3.5 1.5
## Q2 2495 0.74 0.74 0.72 0.69 3.0 1.5
## Q3 2495 0.68 0.67 0.66 0.62 2.0 1.4
## Q4 2495 0.78 0.78 0.76 0.74 2.6 1.5
## Q5 2495 0.70 0.70 0.67 0.65 3.3 1.5
## Q6 2495 0.76 0.76 0.74 0.72 3.1 1.5
## Q7 2495 0.75 0.75 0.73 0.70 2.7 1.5
## Q8 2495 0.69 0.69 0.68 0.63 2.5 1.6
## Q9 2495 0.72 0.72 0.69 0.67 2.2 1.4
## Q10 2495 0.61 0.61 0.57 0.55 3.5 1.4
## Q11 2495 0.74 0.74 0.72 0.69 3.3 1.4
## Q12 2495 0.79 0.79 0.79 0.75 2.6 1.5
## Q13 2495 0.71 0.71 0.70 0.66 2.1 1.4
## Q14 2495 0.76 0.76 0.74 0.71 3.0 1.5
## Q15 2495 0.60 0.62 0.58 0.56 4.2 1.1
##
## Non missing response frequency for each item
## 0 1 2 3 4 5 miss
## Q1 0.00 0.16 0.12 0.12 0.27 0.32 0
## Q2 0.01 0.23 0.19 0.16 0.20 0.22 0
## Q3 0.00 0.55 0.13 0.12 0.10 0.10 0
## Q4 0.00 0.32 0.18 0.15 0.20 0.14 0
## Q5 0.00 0.19 0.14 0.13 0.28 0.26 0
## Q6 0.00 0.23 0.15 0.15 0.23 0.24 0
## Q7 0.00 0.33 0.19 0.13 0.18 0.17 0
## Q8 0.00 0.44 0.12 0.14 0.12 0.18 0
## Q9 0.00 0.45 0.19 0.12 0.12 0.11 0
## Q10 0.00 0.14 0.12 0.14 0.30 0.30 0
## Q11 0.00 0.16 0.14 0.19 0.27 0.24 0
## Q12 0.00 0.34 0.18 0.15 0.17 0.17 0
## Q13 0.01 0.51 0.15 0.15 0.10 0.09 0
## Q14 0.00 0.25 0.17 0.15 0.22 0.20 0
## Q15 0.00 0.05 0.05 0.08 0.27 0.55 0
splitHalf(gcbs)
## Split half reliabilities
## Call: splitHalf(r = gcbs)
##
## Maximum split half reliability (lambda 4) = 0.95
## Guttman lambda 6 = 0.94
## Average split half reliability = 0.93
## Guttman lambda 3 (alpha) = 0.93
## Minimum split half reliability (beta) = 0.86
## Average interitem r = 0.48 with median = 0.47
Yes - with a raw alpha of 0.93 and split-half estimates ranging from 0.86 to 0.95, the GCBS comfortably exceeds the 0.80 guideline.
For which of these measure development situations would an exploratory factor analysis be appropriate? The answer: you are piloting a measure, but you're not sure which factor(s) each item measures.
This chapter will show you how to extend the single-factor EFA you learned in Chapter 1 to multidimensional data.
For this chapter, you'll be using the bfi dataset, which consists of responses to 25 items measuring the Big Five personality traits. Note that the copy loaded from the psych package below also carries gender, education, and age columns, which you'll see turn up in the analyses that follow. Since you'll be doing both exploratory and confirmatory factor analyses on this data, you'll want to start out by splitting the dataset, using the same process that you learned in Chapter 1 when you split the gcbs dataset.
data("bfi")
# Establish two sets of indices to split the dataset
N <- nrow(bfi)
indices <- seq(1, N)
indices_EFA <- sample(indices, floor((.5*N)))
indices_CFA <- indices[!(indices %in% indices_EFA)]
# Use those indices to split the dataset into halves for your EFA and CFA
bfi_EFA <- bfi[indices_EFA, ]
bfi_CFA <- bfi[indices_CFA, ]
To empirically determine the dimensionality of your data, a common strategy is to examine the eigenvalues. Eigenvalues are numeric representations of the amount of variance explained by each factor or component. Eigenvalues are calculated from a correlation matrix, so you’ll need to use cor() to calculate and store the dataset’s correlation matrix before calculating eigenvalues. You’ll need to specify that you want to use pairwise complete observations. The default is to use everything, but if your dataset has any missing values, this will leave you with a matrix full of NAs.
You’ll do these calculations on the bfi_EFA dataset you just created - remember, you’re saving the data in bfi_CFA for your confirmatory analysis!
# Calculate the correlation matrix first
bfi_EFA_cor <- cor(bfi_EFA, use = "pairwise.complete.obs")
# Then use that correlation matrix to calculate eigenvalues
eigenvals <- eigen(bfi_EFA_cor)
# Look at the eigenvalues returned
eigenvals$values
## [1] 4.9862831 2.8109476 2.1136723 1.8714942 1.6649654 1.2674485 1.1443434
## [8] 0.9312817 0.8503565 0.8184843 0.7558814 0.7135064 0.6853813 0.6575841
## [15] 0.6546158 0.6149472 0.5779194 0.5593646 0.5392932 0.5267551 0.4964413
## [22] 0.4873962 0.4336545 0.4221291 0.3932725 0.3864433 0.3715366 0.2646011
Now you know how to use the correlation matrix to calculate eigenvalues from a dataset.
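Counting how many of these eigenvalues exceed 1 is a quick arithmetic check on the Kaiser criterion discussed below (from the values printed above, the count is seven):
# Count eigenvalues above the Kaiser criterion of 1
sum(eigenvals$values > 1)
## [1] 7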
The scree plot is a visual representation of eigenvalues. Visual inspection of the scree plot is a quick and easy way to get a feel for the dimensionality of your dataset. Like the eigen() function in the previous exercise, the scree() function takes a correlation matrix as its argument. Eigenvalues can be generated from a principal component analysis or a factor analysis, and scree() calculates and plots both by default. Because eigen() returns the eigenvalues of the correlation matrix itself - the principal-components eigenvalues - we set factors = FALSE so the scree plot displays only the component values that match those results.
# Then use that correlation matrix to create the scree plot
scree(bfi_EFA_cor, factors = FALSE)
A commonly used criterion for selecting the optimal number of factors, the Kaiser criterion, is to consider only factors with eigenvalues greater than 1. scree() includes a solid horizontal line at 1 on the y-axis to help you quickly interpret your results. Run the scree() call above to recreate the plot from the bfi_EFA correlation matrix. Based on the results, how many factors are recommended?
The answer is six factors. (Note that seven of the component eigenvalues printed above are technically greater than 1; the recommendation of six comes from visual inspection of the scree plot, and data-driven rules and visual inspection won't always agree exactly.)
Now that you’ve examined the eigenvalues and scree plot to find the data-driven recommended number of factors, you can get down to actually running the multidimensional EFA. In Chapter 1, you ran a unidimensional EFA by using the fa() function. To run a multidimensional EFA, you’ll want to use the nfactors argument to specify the number of factors desired.
As in the previous exercises, you’ll want to be sure to run this analysis on the bfi_EFA dataset you created earlier.
# Run the EFA with six factors (as indicated by your scree plot)
EFA_model <- fa(bfi_EFA, nfactors = 6)
## Loading required namespace: GPArotation
# View results from the model object
print(EFA_model)
## Factor Analysis using method = minres
## Call: fa(r = bfi_EFA, nfactors = 6)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR2 MR1 MR3 MR5 MR4 MR6 h2 u2 com
## A1 0.11 -0.08 0.10 -0.58 0.05 0.29 0.388 0.61 1.7
## A2 0.02 -0.06 0.06 0.65 0.05 0.01 0.479 0.52 1.0
## A3 -0.03 -0.15 0.07 0.57 0.08 0.13 0.492 0.51 1.3
## A4 -0.09 -0.07 0.25 0.36 -0.11 0.15 0.283 0.72 2.7
## A5 -0.14 -0.20 0.05 0.42 0.13 0.25 0.457 0.54 2.8
## C1 0.02 0.06 0.55 -0.02 0.17 0.01 0.339 0.66 1.2
## C2 0.06 0.13 0.70 0.01 0.11 0.15 0.514 0.49 1.2
## C3 -0.01 0.05 0.55 0.05 -0.06 0.01 0.296 0.70 1.1
## C4 0.05 0.11 -0.58 -0.05 0.04 0.34 0.515 0.49 1.7
## C5 0.17 0.18 -0.53 -0.01 0.14 0.07 0.428 0.57 1.7
## E1 -0.13 0.59 0.13 -0.13 -0.05 0.10 0.382 0.62 1.4
## E2 0.05 0.69 -0.02 -0.03 -0.04 0.06 0.534 0.47 1.0
## E3 0.01 -0.39 0.00 0.13 0.38 0.21 0.486 0.51 2.8
## E4 -0.06 -0.55 0.01 0.17 0.04 0.28 0.532 0.47 1.8
## E5 0.16 -0.43 0.27 0.05 0.18 -0.03 0.387 0.61 2.5
## N1 0.83 -0.09 0.03 -0.09 -0.06 0.00 0.668 0.33 1.1
## N2 0.82 -0.07 0.00 -0.04 0.01 -0.06 0.656 0.34 1.0
## N3 0.69 0.11 -0.05 0.05 0.08 0.05 0.550 0.45 1.1
## N4 0.45 0.39 -0.14 0.06 0.11 0.01 0.483 0.52 2.4
## N5 0.46 0.23 0.01 0.20 -0.14 0.20 0.383 0.62 2.6
## O1 -0.05 -0.04 0.09 -0.05 0.58 0.04 0.361 0.64 1.1
## O2 0.07 0.00 -0.05 0.07 -0.34 0.38 0.272 0.73 2.2
## O3 0.01 -0.08 0.04 0.05 0.64 -0.01 0.470 0.53 1.1
## O4 0.13 0.33 0.01 0.16 0.37 -0.06 0.263 0.74 2.7
## O5 0.04 -0.02 0.00 -0.07 -0.44 0.37 0.343 0.66 2.0
## gender 0.20 -0.08 0.10 0.29 -0.22 -0.07 0.138 0.86 3.4
## education -0.02 0.09 0.02 0.10 0.02 -0.18 0.045 0.96 2.2
## age -0.08 0.08 0.11 0.23 -0.04 -0.20 0.115 0.89 3.0
##
## MR2 MR1 MR3 MR5 MR4 MR6
## SS loadings 2.55 2.23 2.06 1.87 1.65 0.91
## Proportion Var 0.09 0.08 0.07 0.07 0.06 0.03
## Cumulative Var 0.09 0.17 0.24 0.31 0.37 0.40
## Proportion Explained 0.23 0.20 0.18 0.17 0.15 0.08
## Cumulative Proportion 0.23 0.42 0.61 0.77 0.92 1.00
##
## With factor correlations of
## MR2 MR1 MR3 MR5 MR4 MR6
## MR2 1.00 0.23 -0.18 -0.11 0.05 0.14
## MR1 0.23 1.00 -0.22 -0.28 -0.19 -0.08
## MR3 -0.18 -0.22 1.00 0.17 0.17 -0.01
## MR5 -0.11 -0.28 0.17 1.00 0.23 0.10
## MR4 0.05 -0.19 0.17 0.23 1.00 0.01
## MR6 0.14 -0.08 -0.01 0.10 0.01 1.00
##
## Mean item complexity = 1.8
## Test of the hypothesis that 6 factors are sufficient.
##
## The degrees of freedom for the null model are 378 and the objective function was 7.57 with Chi Square of 10512.46
## The degrees of freedom for the model are 225 and the objective function was 0.66
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 1373 with the empirical chi square 702.03 with prob < 1.8e-50
## The total number of observations was 1400 with Likelihood Chi Square = 917.72 with prob < 2.2e-84
##
## Tucker Lewis Index of factoring reliability = 0.885
## RMSEA index = 0.047 and the 90 % confidence intervals are 0.044 0.05
## BIC = -712.23
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy
## MR2 MR1 MR3 MR5 MR4
## Correlation of (regression) scores with factors 0.93 0.89 0.88 0.87 0.85
## Multiple R square of scores with factors 0.86 0.80 0.78 0.75 0.73
## Minimum correlation of possible factor scores 0.72 0.59 0.56 0.51 0.46
## MR6
## Correlation of (regression) scores with factors 0.77
## Multiple R square of scores with factors 0.60
## Minimum correlation of possible factor scores 0.20
As before, you'll be interested in items' factor loadings and individuals' factor scores. These are interpreted the same way as in the unidimensional case, but since your EFA is multidimensional, you'll get results for each factor.
Remember, an item's loadings represent the strength and direction of its relationship to each factor. Only meaningful loadings are displayed in the output - the print method suppresses loadings below a small cutoff (0.1 by default). You'll notice that many items load onto more than one factor, which means they provide information about multiple factors. This may not be desirable for measure development, so some researchers consider only each item's strongest loading; see the sketch after the loadings output below.
Each examinee receives a factor score on every factor, so the scores matrix won't include blanks; however, examinees with any missing item responses receive NA on all factors.
# Run the EFA with six factors (as indicated by your scree plot)
EFA_model <- fa(bfi_EFA, nfactors = 6)
# View items' factor loadings
EFA_model$loadings
##
## Loadings:
## MR2 MR1 MR3 MR5 MR4 MR6
## A1 0.114 0.100 -0.583 0.291
## A2 0.647
## A3 -0.151 0.571 0.127
## A4 0.251 0.356 -0.106 0.146
## A5 -0.143 -0.204 0.417 0.130 0.248
## C1 0.547 0.173
## C2 0.131 0.699 0.109 0.154
## C3 0.549
## C4 0.106 -0.580 0.338
## C5 0.174 0.184 -0.526 0.137
## E1 -0.131 0.594 0.130 -0.131 0.101
## E2 0.694
## E3 -0.395 0.131 0.376 0.213
## E4 -0.552 0.173 0.283
## E5 0.163 -0.427 0.265 0.184
## N1 0.827
## N2 0.825
## N3 0.689 0.110
## N4 0.448 0.392 -0.144 0.107
## N5 0.458 0.234 0.205 -0.144 0.198
## O1 0.576
## O2 -0.338 0.376
## O3 0.640
## O4 0.134 0.330 0.155 0.370
## O5 -0.438 0.374
## gender 0.203 0.291 -0.218
## education -0.178
## age 0.108 0.233 -0.204
##
## MR2 MR1 MR3 MR5 MR4 MR6
## SS loadings 2.453 1.996 1.931 1.705 1.564 0.876
## Proportion Var 0.088 0.071 0.069 0.061 0.056 0.031
## Cumulative Var 0.088 0.159 0.228 0.289 0.345 0.376
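If you follow the strongest-loading convention mentioned above, one way to tabulate each item's single strongest loading is a base-R sketch like the following (the variable names here are illustrative, not part of the psych package):
# Convert the loadings object to a plain matrix
L <- unclass(EFA_model$loadings)
# Find the column (factor) with the largest absolute loading for each item
strongest <- apply(abs(L), 1, which.max)
# Tabulate item, factor, and the signed loading itself
data.frame(item = rownames(L),
           factor = colnames(L)[strongest],
           loading = L[cbind(seq_len(nrow(L)), strongest)])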
# View the first few lines of examinees' factor scores
head(EFA_model$scores)
## MR2 MR1 MR3 MR5 MR4
## 63516 -0.2182622 -0.1410742 0.94632097 1.10717650 1.12911753
## 63887 0.4414892 0.5037780 -0.09207285 -0.66602530 0.05929365
## 63638 NA NA NA NA NA
## 61778 NA NA NA NA NA
## 64017 -1.7107179 -1.1593705 1.22761716 0.05292053 0.98558274
## 61818 0.1738390 0.3555376 -1.85974873 0.30129994 -2.16053222
## MR6
## 63516 0.40755424
## 63887 -0.08653287
## 63638 NA
## 61778 NA
## 64017 -0.21128665
## 61818 0.48428593
Now that you know the basics of model fit, let's take a look at the absolute fit statistics for your six-factor EFA on the bfi_EFA dataset.
The course exercise reports the following absolute fit statistics (because the EFA/CFA split is random, the values from your own run will differ slightly - compare the TLI of 0.885 and RMSEA of 0.047 in the model output above). Which of these indices meet the criteria for good fit?
data.frame(
`Chi-square test` = "p-value < 0.01",
TLI = 0.916,
RMSEA = 0.045
)
## Chi.square.test TLI RMSEA
## 1 p-value < 0.01 0.916 0.045
Usually a non-significant chi-square test indicates good fit, so the significant result here fails that criterion (though the chi-square test is highly sensitive to sample size). The TLI of 0.916 is above the 0.90 cutoff for good fit, and the RMSEA of 0.045 is below the 0.05 cutoff for good fit.
## 7. Selecting the best model
Now use your knowledge of finding and interpreting absolute and relative model fit statistics to select the best model for your data. When I introduced this dataset, I said that the items were theorized to load onto five factors, but you may have noticed that your scree plot indicated six. You might be wondering which you should trust. Not to worry - you can use fit statistics to make an empirical decision about how many factors to use.
First, you'll use the bfi_EFA dataset to run an EFA for each candidate number of factors. Then you can look at the BIC, a relative fit statistic, to compare the models. Remember, the lowest BIC is preferred!
# Run an EFA for each candidate model: theory says five factors,
# the eigenvalues suggested six
bfi_theory <- fa(bfi_EFA, nfactors = 5)
bfi_eigen <- fa(bfi_EFA, nfactors = 6)
# Compare the BIC values
bfi_theory$BIC
## [1] -403.0135
bfi_eigen$BIC
## [1] -712.2325
You can see that the eigenvalue-driven model has a much lower BIC, so it is empirically preferred.
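If you wanted to scan a wider range of candidate factor counts rather than just these two, a minimal sketch (this refits the model repeatedly, so it can take a moment to run):
# Compare BIC across a range of candidate factor counts
sapply(2:7, function(k) fa(bfi_EFA, nfactors = k)$BIC)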