Let's begin by using the psych package to conduct a single-factor exploratory factor analysis (EFA). The fa() function conducts an EFA on your data. When you're using this in the real world, be sure to use a dataset that only contains item responses - other types of data will cause errors and/or incorrect results. The gcbs dataset contains examinees' responses to 15 items from the Generic Conspiracist Beliefs Scale, which is designed to measure conspiracist beliefs.
An EFA estimates each item's relationship to a single factor that is hypothesized to underlie all of the items, giving you basic information about how well each item relates to that hypothesized construct.
Be sure to save the analysis result object so you can return to it later.
# Load the psych package
library(psych)
# Download the GCBS item responses
# (renaming the variable avoids shadowing base R's url() function)
gcbs_url <- "https://assets.datacamp.com/production/repositories/2136/datasets/869615371e66021e97829feb7e19e38037ed0c14/GCBS_data.rds"
gcbs <- readRDS(gzcon(url(gcbs_url)))
# Conduct a single-factor EFA
EFA_model <- fa(gcbs)
# View the results
print(EFA_model)
## Factor Analysis using method = minres
## Call: fa(r = gcbs)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 h2 u2 com
## Q1 0.70 0.49 0.51 1
## Q2 0.72 0.52 0.48 1
## Q3 0.64 0.41 0.59 1
## Q4 0.77 0.59 0.41 1
## Q5 0.67 0.45 0.55 1
## Q6 0.75 0.56 0.44 1
## Q7 0.73 0.54 0.46 1
## Q8 0.65 0.43 0.57 1
## Q9 0.70 0.48 0.52 1
## Q10 0.56 0.32 0.68 1
## Q11 0.72 0.52 0.48 1
## Q12 0.79 0.62 0.38 1
## Q13 0.68 0.46 0.54 1
## Q14 0.74 0.55 0.45 1
## Q15 0.57 0.33 0.67 1
##
## MR1
## SS loadings 7.27
## Proportion Var 0.48
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 105 and the objective function was 9.31 with Chi Square of 23173.8
## The degrees of freedom for the model are 90 and the objective function was 1.93
##
## The root mean square of the residuals (RMSR) is 0.08
## The df corrected root mean square of the residuals is 0.09
##
## The harmonic number of observations is 2495 with the empirical chi square 3398.99 with prob < 0
## The total number of observations was 2495 with Likelihood Chi Square = 4809.34 with prob < 0
##
## Tucker Lewis Index of factoring reliability = 0.761
## RMSEA index = 0.145 and the 90 % confidence intervals are 0.142 0.149
## BIC = 4105.36
## Fit based upon off diagonal values = 0.97
## Measures of factor score adequacy
## MR1
## Correlation of (regression) scores with factors 0.97
## Multiple R square of scores with factors 0.94
## Minimum correlation of possible factor scores 0.87
Each fa() results object is actually a list, and each element of the list contains specific information about the analysis, including factor loadings. Factor loadings represent the strength and directionality of the relationship between each item and the underlying factor, and they can range from -1 to 1.
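To see everything the result object stores, you can list its element names - a quick sketch (you'll find entries such as loadings and scores among them):
# List the elements available in the fa() results object
names(EFA_model)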
You can also create a diagram of loadings. The fa.diagram() function takes a result object from fa() and creates a path diagram showing the items’ loadings ordered from strongest to weakest. Path diagrams are more common for structural equation modeling than for factor analysis, but this type of visualization can be a helpful way to represent your results.
# Set up the single-factor EFA
EFA_model <- fa(gcbs)
# View the factor loadings
EFA_model$loadings
##
## Loadings:
## MR1
## Q1 0.703
## Q2 0.719
## Q3 0.638
## Q4 0.770
## Q5 0.672
## Q6 0.746
## Q7 0.734
## Q8 0.654
## Q9 0.695
## Q10 0.565
## Q11 0.719
## Q12 0.786
## Q13 0.679
## Q14 0.743
## Q15 0.574
##
## MR1
## SS loadings 7.267
## Proportion Var 0.484
# Create a path diagram of the items' factor loadings
fa.diagram(EFA_model)
The EFA_model object also contains a named list element, scores, which contains factor scores for each person. These factor scores are an indication of how much or how little of the factor each person is thought to possess. Factor scores are not computed for examinees with missing data.
head(gcbs)
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15
## 1 5 5 3 5 5 5 5 3 4 5 5 5 3 5 5
## 2 5 5 5 5 5 3 5 5 1 4 4 5 4 4 5
## 3 2 4 1 2 2 2 4 2 2 4 2 4 0 2 4
## 4 5 4 1 2 4 5 4 1 4 5 5 5 1 4 5
## 5 5 4 1 4 4 5 4 3 1 5 5 5 3 5 5
## 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
rowSums(gcbs[1:6, ])
## 1 2 3 4 5 6
## 68 65 37 55 59 15
head(EFA_model$scores)
## MR1
## [1,] 1.5614675
## [2,] 1.3432026
## [3,] -0.3960355
## [4,] 0.7478868
## [5,] 1.0435203
## [6,] -1.7290812
summary(EFA_model$scores)
## MR1
## Min. :-1.854703
## 1st Qu.:-0.783260
## Median :-0.001971
## Mean : 0.000000
## 3rd Qu.: 0.728568
## Max. : 1.949580
plot(density(EFA_model$scores, na.rm = TRUE),
main = "Factor Scores")
The psych package provides several functions to help you visualize your data. The describe() function tells you basic statistics about each variable in your dataset, including the number of complete cases, the mean, standard deviation, minimum, maximum, range, skew, and kurtosis.
You can also view graphical representations of the error bars for different variables using error.dots() and error.bars(). Both of these graphical representations are created from the summary statistics available from calling describe().
describe(gcbs)
## vars n mean sd median trimmed mad min max range skew kurtosis
## Q1 1 2495 3.47 1.46 4 3.59 1.48 0 5 5 -0.55 -1.10
## Q2 2 2495 2.96 1.49 3 2.96 1.48 0 5 5 -0.01 -1.40
## Q3 3 2495 2.05 1.39 1 1.82 0.00 0 5 5 0.98 -0.44
## Q4 4 2495 2.64 1.45 2 2.55 1.48 0 5 5 0.26 -1.34
## Q5 5 2495 3.25 1.47 4 3.32 1.48 0 5 5 -0.35 -1.27
## Q6 6 2495 3.11 1.51 3 3.14 1.48 0 5 5 -0.17 -1.42
## Q7 7 2495 2.67 1.51 2 2.59 1.48 0 5 5 0.28 -1.39
## Q8 8 2495 2.45 1.57 2 2.32 1.48 0 5 5 0.51 -1.30
## Q9 9 2495 2.23 1.42 2 2.05 1.48 0 5 5 0.76 -0.82
## Q10 10 2495 3.50 1.39 4 3.63 1.48 1 5 4 -0.59 -0.94
## Q11 11 2495 3.27 1.40 4 3.34 1.48 0 5 5 -0.35 -1.11
## Q12 12 2495 2.64 1.50 2 2.56 1.48 0 5 5 0.29 -1.37
## Q13 13 2495 2.10 1.38 1 1.89 0.00 0 5 5 0.89 -0.56
## Q14 14 2495 2.96 1.49 3 2.95 1.48 0 5 5 -0.02 -1.43
## Q15 15 2495 4.23 1.10 5 4.47 0.00 0 5 5 -1.56 1.71
## se
## Q1 0.03
## Q2 0.03
## Q3 0.03
## Q4 0.03
## Q5 0.03
## Q6 0.03
## Q7 0.03
## Q8 0.03
## Q9 0.03
## Q10 0.03
## Q11 0.03
## Q12 0.03
## Q13 0.03
## Q14 0.03
## Q15 0.02
# Plot each item's mean with its confidence interval
error.dots(gcbs)
# Or show the same means as conventional error bars
error.bars(gcbs)
During the measure development process, it’s important to conduct EFA and CFA on separate datasets because using the same dataset can lead to inflated model fit statistics. Instead, you can split your dataset in half, then use one half for the EFA and the other half for the CFA.
# Establish the full set of row indices
N <- nrow(gcbs)
indices <- seq(1, N)
# Randomly sample half of the indices for the EFA; the rest go to the CFA
# (call set.seed() first if you need a reproducible split)
indices_EFA <- sample(indices, floor((.5*N)))
indices_CFA <- indices[!(indices %in% indices_EFA)]
# Use those indices to split the dataset into halves
gcbs_EFA <- gcbs[indices_EFA, ]
gcbs_CFA <- gcbs[indices_CFA, ]
Just as you inspected the features of your full dataset, it’s important to examine the halves after you’ve split the data. You can always use describe() on each dataset, but the psych package also provides some functions to help compare a dataset according to a grouping variable.
In this exercise, you’ll use the indices created when splitting the dataset to create a grouping variable and attach it to the gcbs dataset. Once that grouping variable is set up, you can use describeBy() and statsBy() to view basic descriptive statistics as well as between-group statistics.
A word of warning: while the group argument of describeBy() has to be a vector, the group argument of statsBy() has to be the name of a column in your dataframe. Plan accordingly!
# Use the indices from the previous exercise to create a grouping variable
group_var <- vector("numeric", nrow(gcbs))
group_var[indices_EFA] <- 1
group_var[indices_CFA] <- 2
# Bind that grouping variable onto the gcbs dataset
gcbs_grouped <- cbind(gcbs, group_var)
# Compare stats across groups
describeBy(gcbs_grouped, group = group_var)
##
## Descriptive statistics by group
## group: 1
## vars n mean sd median trimmed mad min max range skew
## Q1 1 1247 3.48 1.45 4 3.60 1.48 0 5 5 -0.55
## Q2 2 1247 2.98 1.50 3 2.98 1.48 0 5 5 0.00
## Q3 3 1247 2.03 1.40 1 1.80 0.00 0 5 5 1.00
## Q4 4 1247 2.68 1.47 2 2.61 1.48 0 5 5 0.24
## Q5 5 1247 3.32 1.47 4 3.40 1.48 0 5 5 -0.41
## Q6 6 1247 3.13 1.50 3 3.16 1.48 0 5 5 -0.18
## Q7 7 1247 2.68 1.51 2 2.60 1.48 0 5 5 0.28
## Q8 8 1247 2.43 1.55 2 2.29 1.48 0 5 5 0.55
## Q9 9 1247 2.30 1.45 2 2.13 1.48 0 5 5 0.69
## Q10 10 1247 3.50 1.37 4 3.62 1.48 1 5 4 -0.58
## Q11 11 1247 3.28 1.40 4 3.36 1.48 0 5 5 -0.38
## Q12 12 1247 2.66 1.52 2 2.58 1.48 0 5 5 0.29
## Q13 13 1247 2.11 1.39 1 1.90 0.00 0 5 5 0.88
## Q14 14 1247 3.01 1.49 3 3.02 1.48 0 5 5 -0.06
## Q15 15 1247 4.25 1.08 5 4.47 0.00 1 5 4 -1.56
## group_var 16 1247 1.00 0.00 1 1.00 0.00 1 1 0 NaN
## kurtosis se
## Q1 -1.10 0.04
## Q2 -1.41 0.04
## Q3 -0.41 0.04
## Q4 -1.36 0.04
## Q5 -1.22 0.04
## Q6 -1.40 0.04
## Q7 -1.38 0.04
## Q8 -1.23 0.04
## Q9 -0.95 0.04
## Q10 -0.91 0.04
## Q11 -1.10 0.04
## Q12 -1.38 0.04
## Q13 -0.57 0.04
## Q14 -1.41 0.04
## Q15 1.78 0.03
## group_var NaN 0.00
## --------------------------------------------------------
## group: 2
## vars n mean sd median trimmed mad min max range skew
## Q1 1 1248 3.47 1.46 4 3.58 1.48 0 5 5 -0.55
## Q2 2 1248 2.95 1.49 3 2.94 1.48 0 5 5 -0.02
## Q3 3 1248 2.06 1.38 1 1.84 0.00 0 5 5 0.96
## Q4 4 1248 2.59 1.43 2 2.49 1.48 0 5 5 0.28
## Q5 5 1248 3.19 1.47 4 3.24 1.48 0 5 5 -0.28
## Q6 6 1248 3.09 1.51 3 3.11 1.48 0 5 5 -0.15
## Q7 7 1248 2.65 1.51 2 2.57 1.48 0 5 5 0.29
## Q8 8 1248 2.48 1.59 2 2.35 1.48 0 5 5 0.48
## Q9 9 1248 2.17 1.39 2 1.97 1.48 0 5 5 0.83
## Q10 10 1248 3.51 1.41 4 3.63 1.48 1 5 4 -0.59
## Q11 11 1248 3.25 1.40 3 3.31 1.48 0 5 5 -0.32
## Q12 12 1248 2.63 1.49 2 2.55 1.48 0 5 5 0.28
## Q13 13 1248 2.09 1.38 1 1.88 0.00 0 5 5 0.89
## Q14 14 1248 2.90 1.49 3 2.87 1.48 0 5 5 0.03
## Q15 15 1248 4.21 1.13 5 4.46 0.00 0 5 5 -1.55
## group_var 16 1248 2.00 0.00 2 2.00 0.00 2 2 0 NaN
## kurtosis se
## Q1 -1.11 0.04
## Q2 -1.39 0.04
## Q3 -0.48 0.04
## Q4 -1.32 0.04
## Q5 -1.31 0.04
## Q6 -1.43 0.04
## Q7 -1.40 0.04
## Q8 -1.36 0.05
## Q9 -0.67 0.04
## Q10 -0.98 0.04
## Q11 -1.13 0.04
## Q12 -1.36 0.04
## Q13 -0.56 0.04
## Q14 -1.45 0.04
## Q15 1.61 0.03
## group_var NaN 0.00
statsBy(gcbs_grouped, group = "group_var")
## Statistics within and between groups
## Call: statsBy(data = gcbs_grouped, group = "group_var")
## Intraclass Correlation 1 (Percentage of variance due to groups)
## Q1 Q2 Q3 Q4 Q5 Q6 Q7
## 0 0 0 0 0 0 0
## Q8 Q9 Q10 Q11 Q12 Q13 Q14
## 0 0 0 0 0 0 0
## Q15 group_var
## 0 1
## Intraclass Correlation 2 (Reliability of group differences)
## Q1 Q2 Q3 Q4 Q5 Q6 Q7
## -16.32 -2.67 -4.01 0.65 0.80 -1.17 -5.05
## Q8 Q9 Q10 Q11 Q12 Q13 Q14
## -0.52 0.82 -21.90 -1.50 -6.40 -7.94 0.74
## Q15 group_var
## -0.56 1.00
## eta^2 between groups
## Q1.bg Q2.bg Q3.bg Q4.bg Q5.bg Q6.bg Q7.bg Q8.bg Q9.bg Q10.bg
## 0 0 0 0 0 0 0 0 0 0
## Q11.bg Q12.bg Q13.bg Q14.bg Q15.bg
## 0 0 0 0 0
##
## To see the correlations between and within groups, use the short=FALSE option in your print statement.
## Many results are not shown directly. To see specific objects select from the following list:
## mean sd n F ICC1 ICC2 ci1 ci2 raw rbg pbg rwg nw pwg etabg etawg nwg nG Call
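As the printout suggests, the statsBy() result is itself a list, so you can pull out a single element directly - for example, the ICC1 values shown above:
# Store the statsBy() result, then extract one element from it
gcbs_stats <- statsBy(gcbs_grouped, group = "group_var")
gcbs_stats$ICC1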
One of the easiest ways to get a feel for your dataset is to examine the relationships of your variables with each other. While base R has the cor() function, the psych package has the lowerCor() function, which only displays the lower triangle of the correlation matrix for easier viewing and interpretation.
You can also use the corr.test() function to view several probability values, including p-values from t-tests of each correlation and confidence intervals for the correlations. P-values are corrected using the Holm correction by default.
lowerCor(gcbs, use = "pairwise.complete.obs")
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10
## Q1 1.00
## Q2 0.53 1.00
## Q3 0.36 0.40 1.00
## Q4 0.52 0.53 0.50 1.00
## Q5 0.48 0.46 0.40 0.57 1.00
## Q6 0.63 0.55 0.40 0.61 0.50 1.00
## Q7 0.47 0.67 0.42 0.57 0.45 0.54 1.00
## Q8 0.39 0.38 0.78 0.49 0.41 0.41 0.41 1.00
## Q9 0.42 0.49 0.49 0.56 0.46 0.48 0.53 0.48 1.00
## Q10 0.44 0.38 0.32 0.40 0.43 0.41 0.39 0.36 0.37 1.00
## Q11 0.64 0.52 0.34 0.52 0.49 0.62 0.49 0.37 0.46 0.45
## Q12 0.52 0.72 0.44 0.60 0.49 0.59 0.75 0.42 0.57 0.40
## Q13 0.38 0.40 0.71 0.51 0.43 0.42 0.45 0.76 0.54 0.37
## Q14 0.53 0.50 0.43 0.60 0.54 0.55 0.52 0.45 0.55 0.41
## Q15 0.51 0.40 0.27 0.39 0.45 0.47 0.39 0.31 0.32 0.45
## Q11 Q12 Q13 Q14 Q15
## Q11 1.00
## Q12 0.55 1.00
## Q13 0.40 0.49 1.00
## Q14 0.56 0.56 0.50 1.00
## Q15 0.54 0.41 0.30 0.46 1.00
corr.test(gcbs, use = "pairwise.complete.obs")$ci
## lower r upper p
## Q1-Q2 0.4970162 0.5259992 0.5538098 1.384140e-177
## Q1-Q3 0.3206223 0.3553928 0.3892067 3.608276e-75
## Q1-Q4 0.4953852 0.5244323 0.5523079 2.359896e-176
## Q1-Q5 0.4503342 0.4810747 0.5106759 1.054746e-144
## Q1-Q6 0.6071117 0.6313131 0.6543444 1.477903e-277
## Q1-Q7 0.4412058 0.4722710 0.5022057 8.142449e-139
## Q1-Q8 0.3564216 0.3902059 0.4229712 1.549786e-91
## Q1-Q9 0.3850453 0.4179718 0.4498355 4.344797e-106
## Q1-Q10 0.4034438 0.4357865 0.4670415 3.550942e-116
## Q1-Q11 0.6199265 0.6435136 0.6659388 9.499179e-292
## Q1-Q12 0.4932727 0.5224025 0.5503620 9.097129e-175
## Q1-Q13 0.3464313 0.3805006 0.4135673 9.052542e-87
## Q1-Q14 0.5059498 0.5345780 0.5620298 1.912163e-184
## Q1-Q15 0.4753633 0.5051815 0.5338405 9.793389e-162
## Q2-Q3 0.3618855 0.3955108 0.4281083 3.282447e-94
## Q2-Q4 0.5062706 0.5348860 0.5623248 1.075388e-184
## Q2-Q5 0.4259018 0.4574975 0.4879788 2.578663e-129
## Q2-Q6 0.5234810 0.5513960 0.5781285 1.757495e-198
## Q2-Q7 0.6501266 0.6722188 0.6931753 0.000000e+00
## Q2-Q8 0.3425926 0.3767693 0.4099501 5.579984e-85
## Q2-Q9 0.4556319 0.4861810 0.5155863 3.376084e-148
## Q2-Q10 0.3464233 0.3804928 0.4135598 9.131376e-87
## Q2-Q11 0.4915283 0.5207263 0.5487548 1.822105e-173
## Q2-Q12 0.6962013 0.7158851 0.7344931 0.000000e+00
## Q2-Q13 0.3667134 0.4001964 0.4326439 1.297778e-96
## Q2-Q14 0.4702225 0.5002339 0.5290898 3.969981e-158
## Q2-Q15 0.3659505 0.3994560 0.4319274 3.129498e-96
## Q3-Q4 0.4693335 0.4993781 0.5282679 1.647531e-157
## Q3-Q5 0.3653695 0.3988923 0.4313817 6.108182e-96
## Q3-Q6 0.3667467 0.4002287 0.4326752 1.248831e-96
## Q3-Q7 0.3904688 0.4232258 0.4549125 5.318107e-109
## Q3-Q8 0.7683496 0.7839542 0.7986273 0.000000e+00
## Q3-Q9 0.4601647 0.4905484 0.5197845 3.104324e-151
## Q3-Q10 0.2853436 0.3209913 0.3557521 6.760094e-61
## Q3-Q11 0.3066780 0.3418064 0.3760050 2.580298e-69
## Q3-Q12 0.4061739 0.4384278 0.4695906 1.005698e-117
## Q3-Q13 0.6919756 0.7118867 0.7307155 0.000000e+00
## Q3-Q14 0.3989973 0.4314834 0.4628876 1.103607e-113
## Q3-Q15 0.2285790 0.2654400 0.3015410 1.676489e-41
## Q4-Q5 0.5438704 0.5709273 0.5967985 8.032837e-216
## Q4-Q6 0.5837641 0.6090539 0.6331630 3.037754e-253
## Q4-Q7 0.5394959 0.5667395 0.5927977 5.260693e-212
## Q4-Q8 0.4589969 0.4894234 0.5187032 1.898345e-150
## Q4-Q9 0.5374441 0.5647747 0.5909202 3.108940e-210
## Q4-Q10 0.3706739 0.4040387 0.4363621 1.298818e-98
## Q4-Q11 0.4932765 0.5224062 0.5503655 9.037626e-175
## Q4-Q12 0.5749365 0.6006273 0.6251350 1.550592e-244
## Q4-Q13 0.4833700 0.5128834 0.5412322 1.798859e-167
## Q4-Q14 0.5782876 0.6038268 0.6281838 8.219433e-248
## Q4-Q15 0.3607035 0.3943633 0.4269973 1.255385e-93
## Q5-Q6 0.4650551 0.4952588 0.5243108 1.470071e-154
## Q5-Q7 0.4142088 0.4461981 0.4770864 2.336455e-122
## Q5-Q8 0.3765708 0.4097576 0.4418941 1.224898e-101
## Q5-Q9 0.4328328 0.4641904 0.4944261 1.462606e-133
## Q5-Q10 0.3951535 0.4277623 0.4592945 1.476449e-111
## Q5-Q11 0.4632435 0.4935141 0.5226345 2.540009e-153
## Q5-Q12 0.4617541 0.4920795 0.5212560 2.612164e-152
## Q5-Q13 0.3953385 0.4279414 0.4594675 1.168044e-111
## Q5-Q14 0.5106389 0.5390785 0.5663399 4.008639e-188
## Q5-Q15 0.4189383 0.4507697 0.4814945 3.849304e-125
## Q6-Q7 0.5120337 0.5404170 0.5676214 3.149502e-189
## Q6-Q8 0.3794902 0.4125879 0.4446310 3.704932e-103
## Q6-Q9 0.4534992 0.4841255 0.5136099 8.755609e-147
## Q6-Q10 0.3747259 0.4079687 0.4401639 1.098605e-100
## Q6-Q11 0.5981808 0.6228032 0.6462508 4.981130e-268
## Q6-Q12 0.5610551 0.5873651 0.6124896 2.403960e-231
## Q6-Q13 0.3915639 0.4242864 0.4559371 1.353745e-109
## Q6-Q14 0.5190730 0.5471694 0.5740847 7.044729e-195
## Q6-Q15 0.4345044 0.4658040 0.4959800 1.340087e-134
## Q7-Q8 0.3795181 0.4126150 0.4446571 3.582463e-103
## Q7-Q9 0.5001718 0.5290301 0.5567146 5.497267e-180
## Q7-Q10 0.3547857 0.3886172 0.4214323 9.585238e-91
## Q7-Q11 0.4624648 0.4927641 0.5219138 8.601767e-153
## Q7-Q12 0.7365820 0.7540288 0.7704729 0.000000e+00
## Q7-Q13 0.4167211 0.4486267 0.4794284 7.858024e-124
## Q7-Q14 0.4867147 0.5160994 0.5443174 6.545724e-170
## Q7-Q15 0.3549662 0.3887925 0.4216021 7.843219e-91
## Q8-Q9 0.4490408 0.4798277 0.5094765 7.372509e-144
## Q8-Q10 0.3302521 0.3647668 0.3983073 2.223163e-79
## Q8-Q11 0.3367608 0.3710987 0.4044508 2.643708e-82
## Q8-Q12 0.3904041 0.4231631 0.4548520 5.764588e-109
## Q8-Q13 0.7398147 0.7570774 0.7733440 0.000000e+00
## Q8-Q14 0.4175690 0.4494462 0.4802185 2.485870e-124
## Q8-Q15 0.2696028 0.3056115 0.3407668 4.328489e-55
## Q9-Q10 0.3321729 0.3666358 0.4001210 3.093874e-80
## Q9-Q11 0.4294991 0.4609717 0.4913259 1.655959e-131
## Q9-Q12 0.5439334 0.5709876 0.5968561 7.071640e-216
## Q9-Q13 0.5083417 0.5368740 0.5642288 2.581509e-186
## Q9-Q14 0.5268510 0.5546263 0.5812182 2.858794e-201
## Q9-Q15 0.2858045 0.3214413 0.3561903 4.516727e-61
## Q10-Q11 0.4188025 0.4506384 0.4813679 4.633595e-125
## Q10-Q12 0.3705162 0.4038857 0.4362140 1.561964e-98
## Q10-Q13 0.3339871 0.3684007 0.4018335 4.747674e-81
## Q10-Q14 0.3815258 0.4145611 0.4465387 3.168647e-104
## Q10-Q15 0.4176732 0.4495470 0.4803157 2.157366e-124
## Q11-Q12 0.5197817 0.5478491 0.5747350 1.870488e-195
## Q11-Q13 0.3651436 0.3986730 0.4311694 7.920246e-96
## Q11-Q14 0.5342028 0.5616704 0.5879532 1.851647e-207
## Q11-Q15 0.5111332 0.5395529 0.5667941 1.629274e-188
## Q12-Q13 0.4545949 0.4851817 0.5146255 1.648258e-147
## Q12-Q14 0.5354702 0.5628844 0.5891136 1.534183e-208
## Q12-Q15 0.3766079 0.4097936 0.4419289 1.171847e-101
## Q13-Q14 0.4667464 0.4968874 0.5258754 1.013162e-155
## Q13-Q15 0.2625236 0.2986885 0.3340155 1.375737e-52
## Q14-Q15 0.4302457 0.4616926 0.4920203 5.766839e-132
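As noted above, corr.test() applies the Holm correction to its p-values by default. If you want raw, uncorrected p-values, you can set the adjust argument to "none" - a minimal sketch:
# Request unadjusted p-values (the default is adjust = "holm")
raw_p <- corr.test(gcbs, use = "pairwise.complete.obs", adjust = "none")$p
raw_p[1:3, 1:3]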
You know how to examine how individual items perform in your measure, but what about how well those items relate to each other - the overall internal reliability of a measure? Coefficient alpha (also called Cronbach’s alpha) and split-half reliability are two common ways of assessing reliability. These statistics are a function of the measure length and items’ interrelatedness, which you just investigated by looking at the correlation matrix.
In general, reliability values greater than 0.8 are desired, though some fields of study have higher or lower guidelines.
Do the results of alpha() and splitHalf() indicate the gcbs dataset has acceptable reliability?
# Estimate coefficient alpha
alpha(gcbs)
##
## Reliability analysis
## Call: alpha(x = gcbs)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.93 0.93 0.94 0.48 14 0.002 2.9 1 0.47
##
## lower alpha upper 95% confidence boundaries
## 0.93 0.93 0.94
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## Q1 0.93 0.93 0.94 0.48 13 0.0021 0.0105 0.46
## Q2 0.93 0.93 0.94 0.48 13 0.0021 0.0099 0.47
## Q3 0.93 0.93 0.94 0.49 13 0.0020 0.0084 0.48
## Q4 0.93 0.93 0.94 0.47 13 0.0022 0.0105 0.46
## Q5 0.93 0.93 0.94 0.48 13 0.0021 0.0112 0.48
## Q6 0.93 0.93 0.94 0.48 13 0.0021 0.0104 0.46
## Q7 0.93 0.93 0.94 0.48 13 0.0021 0.0098 0.47
## Q8 0.93 0.93 0.94 0.48 13 0.0020 0.0086 0.49
## Q9 0.93 0.93 0.94 0.48 13 0.0021 0.0108 0.46
## Q10 0.93 0.93 0.94 0.49 14 0.0020 0.0102 0.49
## Q11 0.93 0.93 0.94 0.48 13 0.0021 0.0104 0.46
## Q12 0.93 0.93 0.94 0.47 13 0.0022 0.0093 0.46
## Q13 0.93 0.93 0.94 0.48 13 0.0021 0.0092 0.48
## Q14 0.93 0.93 0.94 0.48 13 0.0021 0.0109 0.46
## Q15 0.93 0.93 0.94 0.49 14 0.0020 0.0095 0.49
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## Q1 2495 0.73 0.73 0.70 0.68 3.5 1.5
## Q2 2495 0.74 0.74 0.72 0.69 3.0 1.5
## Q3 2495 0.68 0.67 0.66 0.62 2.0 1.4
## Q4 2495 0.78 0.78 0.76 0.74 2.6 1.5
## Q5 2495 0.70 0.70 0.67 0.65 3.3 1.5
## Q6 2495 0.76 0.76 0.74 0.72 3.1 1.5
## Q7 2495 0.75 0.75 0.73 0.70 2.7 1.5
## Q8 2495 0.69 0.69 0.68 0.63 2.5 1.6
## Q9 2495 0.72 0.72 0.69 0.67 2.2 1.4
## Q10 2495 0.61 0.61 0.57 0.55 3.5 1.4
## Q11 2495 0.74 0.74 0.72 0.69 3.3 1.4
## Q12 2495 0.79 0.79 0.79 0.75 2.6 1.5
## Q13 2495 0.71 0.71 0.70 0.66 2.1 1.4
## Q14 2495 0.76 0.76 0.74 0.71 3.0 1.5
## Q15 2495 0.60 0.62 0.58 0.56 4.2 1.1
##
## Non missing response frequency for each item
## 0 1 2 3 4 5 miss
## Q1 0.00 0.16 0.12 0.12 0.27 0.32 0
## Q2 0.01 0.23 0.19 0.16 0.20 0.22 0
## Q3 0.00 0.55 0.13 0.12 0.10 0.10 0
## Q4 0.00 0.32 0.18 0.15 0.20 0.14 0
## Q5 0.00 0.19 0.14 0.13 0.28 0.26 0
## Q6 0.00 0.23 0.15 0.15 0.23 0.24 0
## Q7 0.00 0.33 0.19 0.13 0.18 0.17 0
## Q8 0.00 0.44 0.12 0.14 0.12 0.18 0
## Q9 0.00 0.45 0.19 0.12 0.12 0.11 0
## Q10 0.00 0.14 0.12 0.14 0.30 0.30 0
## Q11 0.00 0.16 0.14 0.19 0.27 0.24 0
## Q12 0.00 0.34 0.18 0.15 0.17 0.17 0
## Q13 0.01 0.51 0.15 0.15 0.10 0.09 0
## Q14 0.00 0.25 0.17 0.15 0.22 0.20 0
## Q15 0.00 0.05 0.05 0.08 0.27 0.55 0
splitHalf(gcbs)
## Split half reliabilities
## Call: splitHalf(r = gcbs)
##
## Maximum split half reliability (lambda 4) = 0.95
## Guttman lambda 6 = 0.94
## Average split half reliability = 0.93
## Guttman lambda 3 (alpha) = 0.93
## Minimum split half reliability (beta) = 0.86
## Average interitem r = 0.48 with median = 0.47
Yes - with a raw alpha of 0.93 and split-half estimates ranging from 0.86 to 0.95, the GCBS comfortably exceeds the 0.80 guideline.
For which of these measure development situations would an exploratory factor analysis be appropriate? The answer: you are piloting a measure, but you're not sure which factor(s) each item measures.
This chapter will show you how to extend the single-factor EFA you learned in Chapter 1 to multidimensional data.
For this chapter, you'll be using the bfi dataset, which consists of responses to 25 items measuring the Big Five personality traits. Note that the copy loaded from the psych package below also carries gender, education, and age columns, which you'll see turn up in the analyses that follow. Since you'll be doing both exploratory and confirmatory factor analyses on this data, you'll want to start out by splitting the dataset, using the same process that you learned in Chapter 1 when you split the gcbs dataset.
data("bfi")
# Establish two sets of indices to split the dataset
N <- nrow(bfi)
indices <- seq(1, N)
indices_EFA <- sample(indices, floor((.5*N)))
indices_CFA <- indices[!(indices %in% indices_EFA)]
# Use those indices to split the dataset into halves for your EFA and CFA
bfi_EFA <- bfi[indices_EFA, ]
bfi_CFA <- bfi[indices_CFA, ]
To empirically determine the dimensionality of your data, a common strategy is to examine the eigenvalues. Eigenvalues are numeric representations of the amount of variance explained by each factor or component. Eigenvalues are calculated from a correlation matrix, so you’ll need to use cor() to calculate and store the dataset’s correlation matrix before calculating eigenvalues. You’ll need to specify that you want to use pairwise complete observations. The default is to use everything, but if your dataset has any missing values, this will leave you with a matrix full of NAs.
You’ll do these calculations on the bfi_EFA dataset you just created - remember, you’re saving the data in bfi_CFA for your confirmatory analysis!
# Calculate the correlation matrix first
bfi_EFA_cor <- cor(bfi_EFA, use = "pairwise.complete.obs")
# Then use that correlation matrix to calculate eigenvalues
eigenvals <- eigen(bfi_EFA_cor)
# Look at the eigenvalues returned
eigenvals$values
## [1] 4.9862831 2.8109476 2.1136723 1.8714942 1.6649654 1.2674485 1.1443434
## [8] 0.9312817 0.8503565 0.8184843 0.7558814 0.7135064 0.6853813 0.6575841
## [15] 0.6546158 0.6149472 0.5779194 0.5593646 0.5392932 0.5267551 0.4964413
## [22] 0.4873962 0.4336545 0.4221291 0.3932725 0.3864433 0.3715366 0.2646011
Now you know how to use the correlation matrix to calculate eigenvalues from a dataset.
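Counting how many of these eigenvalues exceed 1 is a quick arithmetic check on the Kaiser criterion discussed below (from the values printed above, the count is seven):
# Count eigenvalues above the Kaiser criterion of 1
sum(eigenvals$values > 1)
## [1] 7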
The scree plot is a visual representation of eigenvalues. Visual inspection of the scree plot is a quick and easy way to get a feel for the dimensionality of your dataset. Like the eigen() function in the previous exercise, the scree() function takes a correlation matrix as its argument. Eigenvalues can be generated from a principal component analysis or a factor analysis, and scree() calculates and plots both by default. Because eigen() returns the eigenvalues of the correlation matrix itself - the principal-components eigenvalues - we set factors = FALSE so the scree plot displays only the component values that match those results.
# Then use that correlation matrix to create the scree plot
scree(bfi_EFA_cor, factors = FALSE)
A commonly used criterion for selecting the optimal number of factors, the Kaiser criterion, is to consider only factors with eigenvalues greater than 1. scree() includes a solid horizontal line at 1 on the y-axis to help you quickly interpret your results. Run the scree() call above to recreate the plot from the bfi_EFA correlation matrix. Based on the results, how many factors are recommended?
The answer is six factors. (Note that seven of the component eigenvalues printed above are technically greater than 1; the recommendation of six comes from visual inspection of the scree plot, and data-driven rules and visual inspection won't always agree exactly.)
Now that you’ve examined the eigenvalues and scree plot to find the data-driven recommended number of factors, you can get down to actually running the multidimensional EFA. In Chapter 1, you ran a unidimensional EFA by using the fa() function. To run a multidimensional EFA, you’ll want to use the nfactors argument to specify the number of factors desired.
As in the previous exercises, you’ll want to be sure to run this analysis on the bfi_EFA dataset you created earlier.
# Run the EFA with six factors (as indicated by your scree plot)
EFA_model <- fa(bfi_EFA, nfactors = 6)
## Loading required namespace: GPArotation
# View results from the model object
print(EFA_model)
## Factor Analysis using method = minres
## Call: fa(r = bfi_EFA, nfactors = 6)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR2 MR1 MR3 MR5 MR4 MR6 h2 u2 com
## A1 0.11 -0.08 0.10 -0.58 0.05 0.29 0.388 0.61 1.7
## A2 0.02 -0.06 0.06 0.65 0.05 0.01 0.479 0.52 1.0
## A3 -0.03 -0.15 0.07 0.57 0.08 0.13 0.492 0.51 1.3
## A4 -0.09 -0.07 0.25 0.36 -0.11 0.15 0.283 0.72 2.7
## A5 -0.14 -0.20 0.05 0.42 0.13 0.25 0.457 0.54 2.8
## C1 0.02 0.06 0.55 -0.02 0.17 0.01 0.339 0.66 1.2
## C2 0.06 0.13 0.70 0.01 0.11 0.15 0.514 0.49 1.2
## C3 -0.01 0.05 0.55 0.05 -0.06 0.01 0.296 0.70 1.1
## C4 0.05 0.11 -0.58 -0.05 0.04 0.34 0.515 0.49 1.7
## C5 0.17 0.18 -0.53 -0.01 0.14 0.07 0.428 0.57 1.7
## E1 -0.13 0.59 0.13 -0.13 -0.05 0.10 0.382 0.62 1.4
## E2 0.05 0.69 -0.02 -0.03 -0.04 0.06 0.534 0.47 1.0
## E3 0.01 -0.39 0.00 0.13 0.38 0.21 0.486 0.51 2.8
## E4 -0.06 -0.55 0.01 0.17 0.04 0.28 0.532 0.47 1.8
## E5 0.16 -0.43 0.27 0.05 0.18 -0.03 0.387 0.61 2.5
## N1 0.83 -0.09 0.03 -0.09 -0.06 0.00 0.668 0.33 1.1
## N2 0.82 -0.07 0.00 -0.04 0.01 -0.06 0.656 0.34 1.0
## N3 0.69 0.11 -0.05 0.05 0.08 0.05 0.550 0.45 1.1
## N4 0.45 0.39 -0.14 0.06 0.11 0.01 0.483 0.52 2.4
## N5 0.46 0.23 0.01 0.20 -0.14 0.20 0.383 0.62 2.6
## O1 -0.05 -0.04 0.09 -0.05 0.58 0.04 0.361 0.64 1.1
## O2 0.07 0.00 -0.05 0.07 -0.34 0.38 0.272 0.73 2.2
## O3 0.01 -0.08 0.04 0.05 0.64 -0.01 0.470 0.53 1.1
## O4 0.13 0.33 0.01 0.16 0.37 -0.06 0.263 0.74 2.7
## O5 0.04 -0.02 0.00 -0.07 -0.44 0.37 0.343 0.66 2.0
## gender 0.20 -0.08 0.10 0.29 -0.22 -0.07 0.138 0.86 3.4
## education -0.02 0.09 0.02 0.10 0.02 -0.18 0.045 0.96 2.2
## age -0.08 0.08 0.11 0.23 -0.04 -0.20 0.115 0.89 3.0
##
## MR2 MR1 MR3 MR5 MR4 MR6
## SS loadings 2.55 2.23 2.06 1.87 1.65 0.91
## Proportion Var 0.09 0.08 0.07 0.07 0.06 0.03
## Cumulative Var 0.09 0.17 0.24 0.31 0.37 0.40
## Proportion Explained 0.23 0.20 0.18 0.17 0.15 0.08
## Cumulative Proportion 0.23 0.42 0.61 0.77 0.92 1.00
##
## With factor correlations of
## MR2 MR1 MR3 MR5 MR4 MR6
## MR2 1.00 0.23 -0.18 -0.11 0.05 0.14
## MR1 0.23 1.00 -0.22 -0.28 -0.19 -0.08
## MR3 -0.18 -0.22 1.00 0.17 0.17 -0.01
## MR5 -0.11 -0.28 0.17 1.00 0.23 0.10
## MR4 0.05 -0.19 0.17 0.23 1.00 0.01
## MR6 0.14 -0.08 -0.01 0.10 0.01 1.00
##
## Mean item complexity = 1.8
## Test of the hypothesis that 6 factors are sufficient.
##
## The degrees of freedom for the null model are 378 and the objective function was 7.57 with Chi Square of 10512.46
## The degrees of freedom for the model are 225 and the objective function was 0.66
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 1373 with the empirical chi square 702.03 with prob < 1.8e-50
## The total number of observations was 1400 with Likelihood Chi Square = 917.72 with prob < 2.2e-84
##
## Tucker Lewis Index of factoring reliability = 0.885
## RMSEA index = 0.047 and the 90 % confidence intervals are 0.044 0.05
## BIC = -712.23
## Fit based upon off diagonal values = 0.98
## Measures of factor score adequacy
## MR2 MR1 MR3 MR5 MR4
## Correlation of (regression) scores with factors 0.93 0.89 0.88 0.87 0.85
## Multiple R square of scores with factors 0.86 0.80 0.78 0.75 0.73
## Minimum correlation of possible factor scores 0.72 0.59 0.56 0.51 0.46
## MR6
## Correlation of (regression) scores with factors 0.77
## Multiple R square of scores with factors 0.60
## Minimum correlation of possible factor scores 0.20
As before, you'll be interested in items' factor loadings and individuals' factor scores. These are interpreted the same way as in the unidimensional case, but since your EFA is multidimensional, you'll get results for each factor.
Remember, an item's loadings represent the strength and direction of its relationship to each factor. Only meaningful loadings are displayed in the output - the print method suppresses loadings below a small cutoff (0.1 by default). You'll notice that many items load onto more than one factor, which means they provide information about multiple factors. This may not be desirable for measure development, so some researchers consider only each item's strongest loading; see the sketch after the loadings output below.
Each examinee receives a factor score on every factor, so the scores matrix won't include blanks; however, examinees with any missing item responses receive NA on all factors.
# Run the EFA with six factors (as indicated by your scree plot)
EFA_model <- fa(bfi_EFA, nfactors = 6)
# View items' factor loadings
EFA_model$loadings
##
## Loadings:
## MR2 MR1 MR3 MR5 MR4 MR6
## A1 0.114 0.100 -0.583 0.291
## A2 0.647
## A3 -0.151 0.571 0.127
## A4 0.251 0.356 -0.106 0.146
## A5 -0.143 -0.204 0.417 0.130 0.248
## C1 0.547 0.173
## C2 0.131 0.699 0.109 0.154
## C3 0.549
## C4 0.106 -0.580 0.338
## C5 0.174 0.184 -0.526 0.137
## E1 -0.131 0.594 0.130 -0.131 0.101
## E2 0.694
## E3 -0.395 0.131 0.376 0.213
## E4 -0.552 0.173 0.283
## E5 0.163 -0.427 0.265 0.184
## N1 0.827
## N2 0.825
## N3 0.689 0.110
## N4 0.448 0.392 -0.144 0.107
## N5 0.458 0.234 0.205 -0.144 0.198
## O1 0.576
## O2 -0.338 0.376
## O3 0.640
## O4 0.134 0.330 0.155 0.370
## O5 -0.438 0.374
## gender 0.203 0.291 -0.218
## education -0.178
## age 0.108 0.233 -0.204
##
## MR2 MR1 MR3 MR5 MR4 MR6
## SS loadings 2.453 1.996 1.931 1.705 1.564 0.876
## Proportion Var 0.088 0.071 0.069 0.061 0.056 0.031
## Cumulative Var 0.088 0.159 0.228 0.289 0.345 0.376
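If you follow the strongest-loading convention mentioned above, one way to tabulate each item's single strongest loading is a base-R sketch like the following (the variable names here are illustrative, not part of the psych package):
# Convert the loadings object to a plain matrix
L <- unclass(EFA_model$loadings)
# Find the column (factor) with the largest absolute loading for each item
strongest <- apply(abs(L), 1, which.max)
# Tabulate item, factor, and the signed loading itself
data.frame(item = rownames(L),
           factor = colnames(L)[strongest],
           loading = L[cbind(seq_len(nrow(L)), strongest)])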
# View the first few lines of examinees' factor scores
head(EFA_model$scores)
## MR2 MR1 MR3 MR5 MR4
## 63516 -0.2182622 -0.1410742 0.94632097 1.10717650 1.12911753
## 63887 0.4414892 0.5037780 -0.09207285 -0.66602530 0.05929365
## 63638 NA NA NA NA NA
## 61778 NA NA NA NA NA
## 64017 -1.7107179 -1.1593705 1.22761716 0.05292053 0.98558274
## 61818 0.1738390 0.3555376 -1.85974873 0.30129994 -2.16053222
## MR6
## 63516 0.40755424
## 63887 -0.08653287
## 63638 NA
## 61778 NA
## 64017 -0.21128665
## 61818 0.48428593
Now that you know the basics of model fit, let's take a look at the absolute fit statistics for your six-factor EFA on the bfi_EFA dataset.
The course exercise reports the following absolute fit statistics (because the EFA/CFA split is random, the values from your own run will differ slightly - compare the TLI of 0.885 and RMSEA of 0.047 in the model output above). Which of these indices meet the criteria for good fit?
data.frame(
`Chi-square test` = "p-value < 0.01",
TLI = 0.916,
RMSEA = 0.045
)
## Chi.square.test TLI RMSEA
## 1 p-value < 0.01 0.916 0.045
Usually a non-significant chi-square test indicates good fit, so the significant result here fails that criterion (though the chi-square test is highly sensitive to sample size). The TLI of 0.916 is above the 0.90 cutoff for good fit, and the RMSEA of 0.045 is below the 0.05 cutoff for good fit.
## 7. Selecting the best model
Now use your knowledge of finding and interpreting absolute and relative model fit statistics to select the best model for your data. When I introduced this dataset, I said that the items were theorized to load onto five factors, but you may have noticed that your scree plot indicated six. You might be wondering which you should trust. Not to worry - you can use fit statistics to make an empirical decision about how many factors to use.
First, you'll use the bfi_EFA dataset to run an EFA for each candidate number of factors. Then you can look at the BIC, a relative fit statistic, to compare the models. Remember, the lowest BIC is preferred!
# Run an EFA for each candidate model: theory says five factors,
# the eigenvalues suggested six
bfi_theory <- fa(bfi_EFA, nfactors = 5)
bfi_eigen <- fa(bfi_EFA, nfactors = 6)
# Compare the BIC values
bfi_theory$BIC
## [1] -403.0135
bfi_eigen$BIC
## [1] -712.2325
You can see that the eigenvalue-driven model has a much lower BIC, so it is empirically preferred.
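If you wanted to scan a wider range of candidate factor counts rather than just these two, a minimal sketch (this refits the model repeatedly, so it can take a moment to run):
# Compare BIC across a range of candidate factor counts
sapply(2:7, function(k) fa(bfi_EFA, nfactors = k)$BIC)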