Speed Dating dataset (Kaggle)

Investigating the constructs that could answer the question “What influences love at first sight?”. Read about the experiment: https://www.kaggle.com/annavictoria/speed-dating-experiment

dating <- read.csv("Speed Dating Data.csv")
# Choosing the variables we think belong to latent factors
dating1<- dating[c("imprace","imprelig", "date", "go_out", "sports", 
                   "tvsports", "exercise",  "dining" , "museums",  "art",  
                   "hiking", "gaming",  "clubbing",  
                   "reading", "tv",  "theater", "movies",  "concerts",   
                   "music",   "shopping",   "yoga", "exphappy" , "attr1_1",
                   "sinc1_1",   "intel1_1", "fun1_1",   "amb1_1",   
                   "shar1_1", "attr2_1", "sinc2_1",   "intel2_1",   
                   "fun2_1",   "amb2_1",   "shar2_1",   "attr3_1",   "sinc3_1",
                   "intel3_1",   "fun3_1",   "amb3_1")]
dating1 <- as.data.frame(dating1)
dating12 <- na.omit(dating1) # complete-case copy; not used below, where fa() is run on dating1 (psych::fa uses pairwise correlations by default)
dim(dating1)
## [1] 8378   39
#summary(dating1)

So, we have more than 8,000 cases and 39 variables in which to look for latent factors (constructs) that can help explain the correlations between those variables. First of all, let’s check the variables’ types.

sapply(dating1, typeof)
##   imprace  imprelig      date    go_out    sports  tvsports  exercise 
## "integer" "integer" "integer" "integer" "integer" "integer" "integer" 
##    dining   museums       art    hiking    gaming  clubbing   reading 
## "integer" "integer" "integer" "integer" "integer" "integer" "integer" 
##        tv   theater    movies  concerts     music  shopping      yoga 
## "integer" "integer" "integer" "integer" "integer" "integer" "integer" 
##  exphappy   attr1_1   sinc1_1  intel1_1    fun1_1    amb1_1   shar1_1 
## "integer"  "double"  "double"  "double"  "double"  "double"  "double" 
##   attr2_1   sinc2_1  intel2_1    fun2_1    amb2_1   shar2_1   attr3_1 
##  "double"  "double"  "double"  "double"  "double"  "double" "integer" 
##   sinc3_1  intel3_1    fun3_1    amb3_1 
## "integer" "integer" "integer" "integer"

It looks like all of the variables contain numeric values (integer or double), so I can proceed to EFA.
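The same check can be collapsed into a single TRUE/FALSE instead of reading the `typeof()` table by eye. A minimal sketch on a toy data frame; `toy` and its values are made up for illustration and stand in for `dating1`:

```r
# Toy data frame standing in for dating1 (hypothetical values)
toy <- data.frame(imprace = c(1L, 5L, 3L),
                  attr1_1 = c(15, 20.5, 35))

# TRUE only if every column is integer or double
all_numeric <- all(sapply(toy, is.numeric))
print(all_numeric)

# Names of any non-numeric columns (none in this toy example)
non_numeric <- names(toy)[!sapply(toy, is.numeric)]
print(non_numeric)
```

If `non_numeric` were not empty, those columns would need recoding or dropping before factor analysis.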

Scree plots

First of all, let’s use a scree plot to see how many factors the eigenvalues suggest extracting:

library(polycor)
library(psych)
dat.cor <- hetcor(dating1)        # heterogeneous correlation matrix
dat.cor <- dat.cor$correlations   # kept for reference; the calls below use the raw data

fa.parallel(dating1, 
            fa="both", 
            n.iter=100)

## Parallel analysis suggests that the number of factors =  14  and the number of components =  13

The two blue lines on the plot show the observed eigenvalues (one for principal components, one for factors), while the red dotted lines show the eigenvalues obtained from simulated or resampled random data. The logic is that the number of points on a blue line lying above the corresponding red line is the number of factors or components suggested for extraction. In this case it is not exactly clear which red line should be considered for factors. However, the software reports 14 factors, so the line of interest is probably the lowest one.
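The comparison behind parallel analysis can be sketched in base R. This is a simplified, component-style version on simulated data, not the exact procedure used by `fa.parallel()`: the data, the two-factor structure, and the use of the mean random eigenvalue as the reference line are all assumptions made for illustration.

```r
set.seed(1)
n <- 500   # cases
p <- 8     # variables

# Simulated data with two underlying factors (hypothetical)
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(sapply(1:4, function(i) 0.8 * f1 + rnorm(n, sd = 0.6)),
           sapply(1:4, function(i) 0.8 * f2 + rnorm(n, sd = 0.6)))

# Observed eigenvalues of the correlation matrix
obs_eig <- eigen(cor(x), only.values = TRUE)$values

# Eigenvalues from random normal data of the same shape
n_iter <- 100
rand_eig <- replicate(n_iter, {
  r <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(r), only.values = TRUE)$values
})
rand_mean <- rowMeans(rand_eig)  # reference line (mean over simulations)

# Count leading observed eigenvalues that exceed the random reference
n_keep <- sum(cumprod(obs_eig > rand_mean))
print(n_keep)
```

The `cumprod()` trick stops counting at the first eigenvalue that falls below the reference line, which mirrors reading the plot from left to right.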

Just to be sure, I will run another type of scree plot:

scree(dating1)

This scree plot suggests that 5-6 factors and 13 principal components should be considered. While the latter matches what the software told us for the previous plot, the number of factors to extract is different. Let’s try both options and see which number of factors explains more variance.

14 factors

First, with the maximum number of factors, 14, and no rotation:

fa_1 <- fa(dating1, nfactors=14, rotate="none", fm="ml") 
print(fa_1$loadings, cutoff = 0, digits = 2)
## 
## Loadings:
##          ML11  ML1   ML3   ML2   ML14  ML12  ML7   ML4   ML10  ML6   ML13 
## imprace   0.03 -0.03  0.05  0.01 -0.03  0.22  0.10  0.08 -0.01  0.02  0.27
## imprelig -0.10 -0.02  0.17  0.12  0.08  0.13  0.02  0.14 -0.03  0.02  0.19
## date     -0.09  0.05  0.20  0.06 -0.24 -0.04 -0.07 -0.05  0.06  0.10 -0.04
## go_out    0.03  0.03  0.03  0.11 -0.26 -0.07 -0.13 -0.06  0.07  0.09  0.03
## sports   -0.04  0.01 -0.25 -0.05  0.32  0.05  0.06 -0.08  0.12 -0.09 -0.07
## tvsports  0.01  0.04 -0.21  0.04  0.12  0.29  0.03 -0.08  0.12  0.03  0.10
## exercise  0.01 -0.10 -0.02 -0.04  0.26  0.07  0.13  0.01  0.06 -0.01  0.08
## dining    0.40 -0.02  0.11  0.01  0.19  0.15  0.06  0.10 -0.05  0.04  0.10
## museums   0.86  0.02  0.20  0.06 -0.01 -0.19 -0.04  0.09 -0.20  0.09  0.13
## art       0.84  0.00  0.16  0.05 -0.02 -0.17 -0.01  0.07 -0.15  0.08  0.06
## hiking    0.21  0.07  0.04 -0.03  0.10 -0.13 -0.02 -0.05  0.08  0.02 -0.17
## gaming    0.03  0.12 -0.18  0.00  0.00  0.27  0.06 -0.07  0.06 -0.04  0.02
## clubbing  0.18 -0.01 -0.01 -0.08  0.09  0.11  0.09 -0.02  0.04  0.08 -0.05
## reading   0.24  0.02  0.14  0.04  0.06 -0.10 -0.07  0.10 -0.14  0.03  0.06
## tv        0.08  0.05  0.09  0.06 -0.23  0.64  0.00 -0.01  0.09  0.08  0.49
## theater   0.54  0.03  0.27  0.07 -0.14  0.17 -0.05  0.11 -0.07  0.14  0.05
## movies    0.38  0.08  0.15  0.03 -0.18  0.37 -0.08  0.07 -0.08  0.12  0.00
## concerts  0.56  0.03  0.09  0.09 -0.09  0.39 -0.08  0.07  0.02  0.00 -0.53
## music     0.44  0.03  0.08  0.01  0.04  0.39 -0.02  0.06  0.06 -0.02 -0.43
## shopping  0.28 -0.09  0.14 -0.03 -0.03  0.42  0.13  0.07  0.06  0.15  0.24
## yoga      0.30  0.02  0.12  0.04  0.11  0.05  0.02  0.07  0.02  0.04 -0.09
## exphappy  0.13  0.13 -0.18 -0.10  0.14  0.11  0.00 -0.07  0.00 -0.05 -0.07
## attr1_1   0.02 -0.67 -0.58 -0.11 -0.01 -0.01  0.19 -0.22  0.11 -0.28  0.00
## sinc1_1   0.02  0.29  0.15  0.15  0.00 -0.01 -0.56 -0.14  0.39  0.46  0.00
## intel1_1 -0.09  0.03  0.13  0.02  0.00  0.00 -0.29  0.36 -0.81  0.15 -0.01
## fun1_1    0.00  0.21  0.14 -0.75  0.00  0.00  0.02 -0.01  0.06 -0.07  0.00
## amb1_1   -0.01  0.31  0.38  0.14 -0.01 -0.01  0.62  0.27  0.14  0.45  0.00
## shar1_1   0.01  0.46  0.36  0.62  0.00  0.00 -0.01 -0.04  0.08 -0.48  0.00
## attr2_1   0.00 -0.91  0.23  0.15  0.00  0.00 -0.13  0.07  0.04  0.10  0.00
## sinc2_1   0.00  0.66 -0.37  0.09  0.00  0.00 -0.02 -0.57 -0.03  0.21  0.00
## intel2_1  0.00  0.56 -0.52 -0.16  0.00  0.00 -0.14  0.56  0.06 -0.10  0.00
## fun2_1    0.00  0.18  0.50 -0.63  0.00  0.00 -0.02 -0.18  0.01 -0.19  0.00
## amb2_1    0.04  0.37 -0.32  0.03 -0.01  0.01  0.38  0.11 -0.04 -0.05  0.00
## shar2_1  -0.03  0.47  0.16  0.29 -0.01  0.00  0.11 -0.07 -0.06 -0.17  0.00
## attr3_1   0.13 -0.06 -0.13 -0.06  0.54  0.12  0.25 -0.01 -0.05  0.01  0.09
## sinc3_1   0.14  0.08  0.10  0.10  0.32  0.15 -0.18  0.01  0.15  0.15  0.06
## intel3_1  0.05  0.06 -0.09  0.06  0.51  0.09  0.08  0.08 -0.14 -0.08  0.12
## fun3_1    0.22 -0.04 -0.09 -0.12  0.51  0.24  0.19  0.06  0.15  0.06  0.05
## amb3_1    0.11 -0.01 -0.07  0.04  0.38  0.21  0.39  0.12  0.03  0.12  0.03
##          ML9   ML5   ML8  
## imprace   0.03  0.02  0.06
## imprelig  0.03 -0.01  0.02
## date      0.01 -0.04 -0.07
## go_out   -0.06  0.06 -0.06
## sports    0.02  0.00 -0.03
## tvsports -0.03 -0.12  0.02
## exercise  0.05  0.04  0.04
## dining    0.11  0.07  0.11
## museums   0.09  0.02  0.01
## art       0.12 -0.01  0.07
## hiking    0.05 -0.01 -0.01
## gaming   -0.03  0.04  0.07
## clubbing  0.00  0.00  0.10
## reading   0.05  0.06 -0.05
## tv        0.03 -0.04  0.04
## theater   0.17  0.06  0.07
## movies    0.07  0.01  0.00
## concerts  0.08 -0.01  0.03
## music     0.08 -0.02 -0.03
## shopping  0.09  0.00  0.13
## yoga      0.11 -0.05  0.04
## exphappy  0.02 -0.03  0.04
## attr1_1   0.05  0.14  0.02
## sinc1_1  -0.02  0.17 -0.36
## intel1_1 -0.06  0.13  0.08
## fun1_1    0.02 -0.60  0.02
## amb1_1    0.05  0.12  0.19
## shar1_1  -0.08 -0.14  0.11
## attr2_1  -0.03 -0.15  0.20
## sinc2_1  -0.02 -0.06  0.22
## intel2_1  0.02  0.13  0.16
## fun2_1   -0.04  0.50  0.06
## amb2_1   -0.49 -0.12 -0.57
## shar2_1   0.69 -0.07 -0.35
## attr3_1   0.17  0.06  0.03
## sinc3_1   0.06  0.06 -0.13
## intel3_1 -0.02  0.04 -0.05
## fun3_1    0.06 -0.23  0.07
## amb3_1    0.04  0.04  0.10
## 
##                ML11  ML1  ML3  ML2 ML14 ML12  ML7  ML4 ML10  ML6 ML13  ML9
## SS loadings    3.03 2.92 2.05 1.65 1.59 1.57 1.41 1.10 1.10 1.05 1.00 0.90
## Proportion Var 0.08 0.07 0.05 0.04 0.04 0.04 0.04 0.03 0.03 0.03 0.03 0.02
## Cumulative Var 0.08 0.15 0.21 0.25 0.29 0.33 0.36 0.39 0.42 0.45 0.47 0.50
##                 ML5  ML8
## SS loadings    0.87 0.85
## Proportion Var 0.02 0.02
## Cumulative Var 0.52 0.54
fa.diagram(fa_1)

The resulting diagram suggests that 14 factors is probably too many, as the results become conceptually uninterpretable. Additionally, relatively few variables have an absolute loading above 0.4 (considered the minimum desired value for attributing a variable to one of the discovered factors) on any of the factors.

Let’s briefly take a look at the goodness-of-fit measures of this factor model:

fa_1$TLI
## [1] 0.7182358
fa_1$RMSEA[1]
##     RMSEA 
## 0.0821538

The first one is the Tucker-Lewis Index (TLI), which is not acceptable in this case, as it falls below 0.9. The second one is the Root Mean Square Error of Approximation (RMSEA), which is considered adequate up to 0.08; our value of about 0.082 is pretty close, but not quite there yet. Thus, this factor model is considered a bad fit.
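Since the same two cut-offs are applied to every model below, they can be wrapped in a small helper. This is only a sketch: `check_fit` is a hypothetical function, and the thresholds are simply the ones quoted above (TLI of at least 0.9, RMSEA of at most 0.08).

```r
# Hypothetical helper applying the cut-offs quoted in the text
check_fit <- function(tli, rmsea, tli_min = 0.9, rmsea_max = 0.08) {
  c(tli_ok = tli >= tli_min, rmsea_ok = rmsea <= rmsea_max)
}

# Values reported above for the 14-factor model
fit_14 <- check_fit(tli = 0.7182358, rmsea = 0.0821538)
print(fit_14)
```

With a fitted model the call would look like `check_fit(fa_1$TLI, fa_1$RMSEA[1])`.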

5 factors

Then, let’s take 5 factors, as it was suggested by the second scree plot, and redo factor analysis. Again, with no rotation.

fa_2 <- fa(dating1, nfactors=5, rotate="none", fm="ml") 
print(fa_2$loadings, cutoff = 0, digits = 2)
## 
## Loadings:
##          ML3   ML4   ML2   ML1   ML5  
## imprace   0.05  0.08 -0.04 -0.03  0.24
## imprelig -0.01 -0.05 -0.04 -0.07  0.28
## date     -0.04 -0.36 -0.05  0.01  0.08
## go_out    0.04 -0.31  0.02 -0.01 -0.03
## sports   -0.15  0.41  0.14  0.03 -0.11
## tvsports -0.08  0.29  0.19 -0.07  0.13
## exercise -0.02  0.32 -0.07 -0.04  0.06
## dining    0.43  0.25 -0.10  0.01  0.21
## museums   0.91 -0.02 -0.08 -0.01 -0.13
## art       0.91  0.02 -0.07 -0.03 -0.16
## hiking    0.19  0.04  0.00  0.07 -0.11
## gaming   -0.04  0.21  0.14  0.09  0.11
## clubbing  0.15  0.24 -0.04  0.01  0.09
## reading   0.30 -0.12 -0.05  0.02 -0.02
## tv        0.09  0.04  0.00 -0.01  0.50
## theater   0.63 -0.10 -0.13  0.00  0.28
## movies    0.42 -0.07 -0.03  0.04  0.33
## concerts  0.48  0.04 -0.01 -0.03  0.17
## music     0.36  0.15 -0.01  0.01  0.20
## shopping  0.30  0.24 -0.15 -0.07  0.43
## yoga      0.34  0.08 -0.03 -0.03  0.12
## exphappy  0.06  0.26  0.13  0.12 -0.04
## attr1_1  -0.25  0.47 -0.14 -0.36 -0.39
## sinc1_1   0.06 -0.37  0.10  0.15  0.18
## intel1_1  0.13 -0.26 -0.07 -0.01 -0.01
## fun1_1   -0.04  0.13 -0.09  0.30 -0.02
## amb1_1    0.18 -0.05  0.06  0.16  0.41
## shar1_1   0.11 -0.32  0.26  0.10  0.22
## attr2_1   0.00  0.00 -0.60 -0.79  0.00
## sinc2_1  -0.03 -0.01  0.57  0.36 -0.01
## intel2_1 -0.01  0.10  0.52  0.33 -0.06
## fun2_1    0.00  0.00 -0.62  0.78  0.00
## amb2_1   -0.07  0.10  0.59  0.22 -0.11
## shar2_1   0.15 -0.19  0.41  0.27  0.19
## attr3_1   0.11  0.53  0.04  0.01  0.07
## sinc3_1   0.16  0.03  0.02  0.01  0.26
## intel3_1  0.06  0.31  0.13  0.03  0.08
## fun3_1    0.16  0.57  0.06 -0.09  0.22
## amb3_1    0.11  0.46  0.08 -0.05  0.26
## 
##                 ML3  ML4  ML2  ML1  ML5
## SS loadings    3.41 2.45 2.18 1.94 1.64
## Proportion Var 0.09 0.06 0.06 0.05 0.04
## Cumulative Var 0.09 0.15 0.21 0.26 0.30
fa.diagram(fa_2)

The diagram looks less messy now, yet the factor loadings are still insufficient for most of the variables.

fa_2$TLI
## [1] 0.2667007
fa_2$RMSEA[1]
##     RMSEA 
## 0.1325313

However, the fit measures got worse: the TLI fell even further below the minimum acceptable value, while the RMSEA increased.

“Varimax” rotation

Let’s try 5 factors with rotation:

fa_3 <- fa(dating1, nfactors=5, rotate="varimax", fm="ml") 
print(fa_3$loadings, cutoff = 0, digits = 2)
## 
## Loadings:
##          ML3   ML2   ML4   ML5   ML1  
## imprace   0.03 -0.04  0.09  0.24  0.00
## imprelig -0.04 -0.06 -0.03  0.28 -0.05
## date     -0.06  0.00 -0.36  0.09  0.00
## go_out    0.03  0.03 -0.30 -0.02 -0.06
## sports   -0.13  0.09  0.42 -0.15  0.00
## tvsports -0.10  0.10  0.32  0.10 -0.12
## exercise  0.00 -0.10  0.31  0.05  0.04
## dining    0.42 -0.07  0.23  0.25  0.05
## museums   0.92 -0.02 -0.07 -0.03 -0.02
## art       0.92 -0.03 -0.02 -0.06 -0.04
## hiking    0.20  0.04  0.03 -0.09  0.05
## gaming   -0.05  0.15  0.23  0.09  0.02
## clubbing  0.16 -0.03  0.23  0.10  0.05
## reading   0.30 -0.01 -0.13  0.02  0.02
## tv        0.04  0.01  0.06  0.50 -0.02
## theater   0.60 -0.05 -0.12  0.35  0.02
## movies    0.38  0.04 -0.07  0.38  0.01
## concerts  0.46  0.00  0.04  0.21 -0.05
## music     0.35  0.01  0.15  0.23  0.01
## shopping  0.27 -0.15  0.24  0.45  0.02
## yoga      0.33 -0.02  0.07  0.15 -0.02
## exphappy  0.07  0.16  0.27 -0.05  0.06
## attr1_1  -0.19 -0.38  0.45 -0.43 -0.15
## sinc1_1   0.02  0.20 -0.36  0.19  0.02
## intel1_1  0.13 -0.04 -0.27  0.02 -0.01
## fun1_1   -0.02  0.07  0.11 -0.03  0.32
## amb1_1    0.13  0.17 -0.03  0.42  0.08
## shar1_1   0.05  0.31 -0.28  0.23 -0.10
## attr2_1   0.02 -0.94 -0.04  0.03 -0.34
## sinc2_1  -0.06  0.67  0.04 -0.04  0.00
## intel2_1 -0.03  0.61  0.15 -0.09  0.01
## fun2_1    0.06 -0.09 -0.12  0.03  0.98
## amb2_1   -0.09  0.59  0.16 -0.14 -0.11
## shar2_1   0.09  0.52 -0.14  0.19 -0.03
## attr3_1   0.12  0.01  0.53  0.06  0.04
## sinc3_1   0.13  0.04  0.04  0.27 -0.01
## intel3_1  0.05  0.11  0.32  0.06 -0.01
## fun3_1    0.15 -0.03  0.58  0.21 -0.06
## amb3_1    0.10  0.02  0.47  0.25 -0.05
## 
##                 ML3  ML2  ML4  ML5  ML1
## SS loadings    3.26 2.77 2.49 1.82 1.29
## Proportion Var 0.08 0.07 0.06 0.05 0.03
## Cumulative Var 0.08 0.15 0.22 0.26 0.30
fa.diagram(fa_3)

This model uses “Varimax” rotation, which assumes that the factors are orthogonal, i.e. uncorrelated with each other. It produces a slightly different diagram, while the conclusion about factor loadings still applies.

fa_3$TLI
## [1] 0.2667007
fa_3$RMSEA[1]
##     RMSEA 
## 0.1325313

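The TLI and RMSEA values have not changed at all, so rotation did not help with fit. This is expected: rotation only re-expresses the same solution in different axes and leaves the reproduced correlation matrix untouched, so it can improve interpretability but not the fit indices.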
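Why an orthogonal rotation cannot change the fit can be checked with base R alone. In this sketch the loading matrix `L` is made up for illustration; `varimax()` from the `stats` package rotates it, and the implied common-variance matrix (loadings times their transpose) is compared before and after.

```r
# Made-up unrotated loadings: 6 variables, 2 factors (hypothetical values)
L <- matrix(c(0.7, 0.6, 0.8,  0.5,  0.6,  0.7,
              0.4, 0.5, 0.3, -0.5, -0.4, -0.6), ncol = 2)

# Orthogonal rotation with base R's varimax()
vm <- varimax(L)
L_rot <- unclass(vm$loadings)

# Part of the correlation matrix reproduced by the factors
implied_before <- tcrossprod(L)      # L %*% t(L)
implied_after  <- tcrossprod(L_rot)  # rotated loadings, same product

# Individual loadings differ, the reproduced matrix does not
same_fit <- isTRUE(all.equal(implied_before, implied_after,
                             check.attributes = FALSE))
print(same_fit)
```

Because fit indices are computed from the gap between the observed and the reproduced correlations, identical reproduced matrices imply identical TLI and RMSEA values.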

“Oblimin” rotation

Let’s do it with “Oblimin” rotation assuming that factors are correlated with each other.

fa_4 <- fa(dating1, nfactors=5, rotate="oblimin", fm="ml") 
print(fa_4$loadings, cutoff = 0, digits = 2)
## 
## Loadings:
##          ML2   ML3   ML4   ML5   ML1  
## imprace  -0.07 -0.07  0.20  0.18  0.00
## imprelig -0.13 -0.14  0.10  0.27 -0.04
## date     -0.09 -0.06 -0.27  0.25  0.00
## go_out   -0.02  0.06 -0.28  0.13 -0.06
## sports    0.18 -0.10  0.29 -0.31 -0.02
## tvsports  0.11 -0.14  0.31 -0.03 -0.15
## exercise -0.04 -0.05  0.30 -0.11  0.06
## dining   -0.05  0.31  0.36  0.11  0.07
## museums   0.01  0.93 -0.03 -0.01  0.00
## art       0.01  0.94  0.00 -0.06 -0.02
## hiking    0.08  0.23  0.00 -0.09  0.04
## gaming    0.17 -0.10  0.24  0.00 -0.01
## clubbing  0.00  0.10  0.27 -0.02  0.05
## reading  -0.02  0.30 -0.09  0.07  0.02
## tv       -0.08 -0.15  0.31  0.44 -0.03
## theater  -0.12  0.47  0.11  0.37  0.04
## movies   -0.04  0.24  0.15  0.39  0.00
## concerts -0.03  0.38  0.16  0.19 -0.05
## music     0.01  0.25  0.26  0.14  0.01
## shopping -0.18  0.08  0.46  0.29  0.05
## yoga     -0.03  0.27  0.16  0.10 -0.01
## exphappy  0.22  0.07  0.21 -0.15  0.03
## attr1_1  -0.22 -0.06  0.18 -0.65 -0.08
## sinc1_1   0.09 -0.02 -0.23  0.37 -0.01
## intel1_1 -0.09  0.14 -0.22  0.13  0.00
## fun1_1    0.14 -0.03  0.10 -0.09  0.30
## amb1_1    0.09 -0.02  0.19  0.43  0.05
## shar1_1   0.19  0.00 -0.15  0.40 -0.16
## attr2_1  -0.96  0.00 -0.01 -0.06 -0.16
## sinc2_1   0.66 -0.02 -0.01  0.04 -0.13
## intel2_1  0.63  0.01  0.06 -0.06 -0.10
## fun2_1    0.02 -0.01 -0.01 -0.01  1.00
## amb2_1    0.61 -0.02  0.03 -0.11 -0.23
## shar2_1   0.44  0.05 -0.05  0.32 -0.13
## attr3_1   0.10  0.06  0.51 -0.19  0.03
## sinc3_1   0.00  0.03  0.17  0.24 -0.02
## intel3_1  0.15  0.01  0.31 -0.07 -0.04
## fun3_1    0.03  0.04  0.63 -0.06 -0.06
## amb3_1    0.05 -0.02  0.54  0.03 -0.05
## 
##                 ML2  ML3  ML4  ML5  ML1
## SS loadings    2.72 2.70 2.52 2.13 1.30
## Proportion Var 0.07 0.07 0.06 0.05 0.03
## Cumulative Var 0.07 0.14 0.20 0.26 0.29
fa.diagram(fa_4)

fa_4$TLI
## [1] 0.2667007
fa_4$RMSEA[1]
##     RMSEA 
## 0.1325313

The factor loadings have changed, now reaching 0.96 in absolute value (attr2_1) and even 1.00 (fun2_1). Nevertheless, the fit measures are still the same, indicating a bad model fit.

6 factors with rotation

Finally, I will fit the model one last time with 6 factors, as suggested by the second scree plot, and “Oblimin” rotation, as I believe there should be some correlation between the factors describing what attracts people to a potential partner on the first date.

fa6 <- fa(dating1, nfactors=6, rotate="oblimin", fm="ml") 
print(fa6$loadings, cutoff = 0, digits = 2)
## 
## Loadings:
##          ML5   ML4   ML6   ML1   ML2   ML3  
## imprace  -0.01 -0.06  0.19  0.04 -0.03  0.01
## imprelig -0.12 -0.15  0.17  0.22 -0.08 -0.04
## date     -0.03 -0.10 -0.28  0.21  0.08  0.00
## go_out    0.11  0.00 -0.33  0.04  0.07 -0.05
## sports   -0.14  0.16  0.24 -0.17  0.10 -0.03
## tvsports -0.11  0.09  0.23 -0.01  0.12 -0.15
## exercise -0.06 -0.07  0.29 -0.05  0.05  0.05
## dining    0.34 -0.05  0.35  0.03 -0.03  0.07
## museums   0.93  0.02 -0.03 -0.02 -0.04  0.00
## art       0.93  0.02 -0.01 -0.05  0.02 -0.02
## hiking    0.21  0.05 -0.02  0.01  0.11  0.04
## gaming   -0.05  0.19  0.16 -0.09  0.04 -0.01
## clubbing  0.11 -0.01  0.23 -0.01  0.05  0.05
## reading   0.29 -0.01 -0.03  0.05 -0.12  0.03
## tv       -0.01 -0.07  0.22  0.18  0.08 -0.02
## theater   0.56 -0.10  0.08  0.16 -0.01  0.05
## movies    0.34 -0.01  0.10  0.15 -0.04  0.01
## concerts  0.45 -0.01  0.09  0.05  0.06 -0.04
## music     0.32  0.00  0.20  0.06  0.09  0.01
## shopping  0.18 -0.18  0.39  0.10  0.05  0.05
## yoga      0.28 -0.05  0.16  0.11  0.06 -0.02
## exphappy  0.06  0.22  0.17 -0.12  0.01  0.02
## attr1_1  -0.01 -0.07  0.02 -0.93  0.17 -0.04
## sinc1_1   0.01  0.01 -0.25  0.49  0.24 -0.03
## intel1_1  0.02 -0.02 -0.03  0.07 -0.98 -0.01
## fun1_1   -0.08  0.05  0.11  0.20  0.17  0.28
## amb1_1   -0.01 -0.03  0.28  0.56  0.12  0.03
## shar1_1   0.03  0.11 -0.11  0.49  0.18 -0.17
## attr2_1  -0.01 -0.95  0.00 -0.07 -0.03 -0.17
## sinc2_1   0.00  0.65 -0.08  0.05  0.13 -0.13
## intel2_1 -0.02  0.65  0.10 -0.07 -0.22 -0.10
## fun2_1    0.00  0.03 -0.02  0.00  0.01  1.00
## amb2_1   -0.05  0.60  0.03 -0.05  0.04 -0.23
## shar2_1   0.11  0.41 -0.04  0.25  0.11 -0.12
## attr3_1   0.04  0.09  0.53 -0.17 -0.03  0.03
## sinc3_1   0.07 -0.05  0.18  0.23  0.09 -0.02
## intel3_1 -0.03  0.14  0.40 -0.04 -0.16 -0.04
## fun3_1    0.03 -0.03  0.64  0.05  0.15 -0.07
## amb3_1   -0.02  0.03  0.58  0.01  0.01 -0.06
## 
##                 ML5  ML4  ML6  ML1  ML2  ML3
## SS loadings    2.93 2.55 2.39 2.13 1.36 1.28
## Proportion Var 0.08 0.07 0.06 0.05 0.03 0.03
## Cumulative Var 0.08 0.14 0.20 0.26 0.29 0.32
fa.diagram(fa6)

fa6$TLI
## [1] 0.2733481
fa6$RMSEA[1]
##     RMSEA 
## 0.1319293

Okaaay, the situation did not change much, although the TLI value increased a little. But this is still a really bad FA model. Perhaps the model with 14 factors, though uninterpretable, was the closest to ideal judging by the fit indicators (TLI, RMSEA).

Labeling factors

So, let’s give it a shot and try to label the factors from the last model.

  • The ML5 is mostly represented by variables such as museums, art, theater, and concerts. It seems appropriate to label this factor “Cultural interests”.

  • The ML4 is represented by attr2_1, sinc2_1, intel2_1, shar2_1, and amb2_1 and can be labeled “Personal characteristics_2”.

  • The ML6 is mostly covered by amb3_1, fun3_1, intel3_1, and attr3_1, which calls for “Personal characteristics_3”.

  • The ML1, mostly covered by attr1_1, sinc1_1, amb1_1, and shar1_1, and the ML2, represented almost entirely by intel1_1, could both be labeled “Personal characteristics_1”. Yet since they somehow ended up in separate factors, this becomes quite confusing. So ML2 can be called “Intelligence_1”, I guess.

  • Lastly, the ML3 is represented by fun2_1 and so, by analogy with the previous one, can be labeled “Sense of humor_2”. The number in each label follows the numbering in the variable names.
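The assignment of variables to factors used for the labels above can be sketched as a small rule: take the factor with the largest absolute loading and keep it only if it reaches the 0.4 cut-off. The loadings below are a subset of rows copied from the 6-factor output; the names `load6`, `cutoff`, and `assigned` are introduced here for illustration.

```r
# Subset of the 6-factor oblimin loadings printed above
load6 <- matrix(c(
   0.93,  0.02, -0.03, -0.02, -0.04,  0.00,   # museums
   0.56, -0.10,  0.08,  0.16, -0.01,  0.05,   # theater
  -0.01, -0.95,  0.00, -0.07, -0.03, -0.17,   # attr2_1
   0.03, -0.03,  0.64,  0.05,  0.15, -0.07,   # fun3_1
  -0.01, -0.07,  0.02, -0.93,  0.17, -0.04,   # attr1_1
   0.02, -0.02, -0.03,  0.07, -0.98, -0.01,   # intel1_1
   0.00,  0.03, -0.02,  0.00,  0.01,  1.00,   # fun2_1
   0.21,  0.05, -0.02,  0.01,  0.11,  0.04),  # hiking
  ncol = 6, byrow = TRUE,
  dimnames = list(c("museums", "theater", "attr2_1", "fun3_1",
                    "attr1_1", "intel1_1", "fun2_1", "hiking"),
                  c("ML5", "ML4", "ML6", "ML1", "ML2", "ML3")))

cutoff <- 0.4

# Column index and size of the largest absolute loading per variable
best     <- apply(abs(load6), 1, which.max)
best_abs <- apply(abs(load6), 1, max)

# Factor name if the loading is salient, NA otherwise
assigned <- ifelse(best_abs >= cutoff, colnames(load6)[best], NA)
names(assigned) <- rownames(load6)
print(assigned)
```

Variables that come back as `NA`, such as hiking here, do not load clearly on any factor, which is part of why the solution is hard to label.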

Still, it’s a mess :)

Anyways, that’s all for EFA Part 1. Thanks for your attention!