Here I’ll quickly review some important points about running an EFA and CFA.
There are 4 steps to this process, which will require 2 datasets: Dataset #1: * Construct EFA, and make decisions about factors. * Remove items and finalize model Dataset #2: * Construct CFA based on model uncovered in EFA * Evaluate model fit
I’ll also say a few things about: * The decision to conduct a CFA with both datasets. * Using factor scores for prediction (and, what some alternatives would be) * Validation
The data I’ll use for this will rely on recent data collected here at UvA. I’m going to keep it simple; I will conduct an EFA using items from existing measures (Dominance & Prestige). This is part of a separate study, but, in brief, we are going to test what personality traits lead to people emerging as a leader in a group.
For later validation purposes, and prediction purposes, I’ll also include measures of agency and communion, and a future DV: Leadership conferral. So, together we will ensure the 2-factor structure of dominance and prestige (i.e., make sure they are distinct), show their convergent and divergent validity with agency and communion (in theory, dominance is high agency but not associated with communion; prestige is high agency high communion), and test if they predict our key outcome: whether a person emerged as a leader in the group. This should be pretty similar to what you’re trying to do, with validating a multi-factor scale, validate the factors, and use them to predict something.
First, we load our relevant libraries and data.
# Data cleaning, wrangling, and visualization
library(tidyr)
library(dplyr)
library(ggplot2)
#read and edit data:
library(readr)
library(stringr)
#read-in data:
data <- read_csv("~/Dropbox/Research/Status and attraction/Explicit Perceptions 2/Data/Edited data.csv")
data<-data[51:68] #these are the relevant columns of ALL data
colnames(data) <- str_replace(colnames(data), "^SR\\.", "") #ignore this.
head(data)
## # A tibble: 6 Ă— 18
## dom1 dom2 dom3 dom4 pres1 pres2 pres3 pres4 agency1 agency2 agency3 comm1
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 2 3 2 5 2 2 2 2 2 2 2
## 2 5 5 6 5 5 6 6 6 6 6 6 5
## 3 6 7 7 5 7 7 7 6 7 7 7 7
## 4 7 7 7 7 7 6 6 7 7 7 6 7
## 5 6 7 6 6 6 4 6 5 7 7 6 7
## 6 7 7 7 7 7 7 7 7 7 7 7 7
## # ℹ 6 more variables: comm2 <dbl>, comm3 <dbl>, kind <dbl>, high.status <dbl>,
## # leader <dbl>, paid.attention <dbl>
First step, lets conduct our EFA. We’re going to conduct an EFA on a shortened version of the dominance and prestige scale. This consists of 8 items – 4 for prestige, and 4 for dominance. Here are the items we expect to load on each factor:
Nonetheless, we want to make sure that these distinct clusters naturally emerge, to make sure that the items to tap into 2 distinctive things. So, lets take a look.
To test this, we can construct a scree plot, or run a parallel analysis, to see how many factors naturally emerge. For simplicity, I’ll construct a parallel analysis (which also produces a scree plot). In brief, the parallel analysis is used to determine the number of factors with observed eigenvalues greater than those from simulated random data. Or, simply put, for how many factors is the blue line (observed eigenvalues) higher than the corresponding red dotted line (eigenvalues from 1-factor or 0-factor data).
library(psych)
DomPres<-tibble(data[1:8])
#fa.parallel from psych package
psych::fa.parallel(DomPres)
## Parallel analysis suggests that the number of factors = 2 and the number of components = 2
The fa.parallel directly states that the analysis suggests 2 factors (or 2 principle components).
This is consistent with what we expected. Next, lets go ahead and look at the factor loadings from an EFA, with 2 factors. We want to find a simple structure with the loadings. Simple structure is defined as the indicators cleanly loading on only one factor (e.g., loading > .6 on target factor), but without cross-loading on other factors (e.g., <.4 on non-target factors).
Note that I’ll constrain the factors to be uncorrelated. This is a theoretical decision – do you think the factors are correlated, or uncorrelated? If correlated, use an “oblimin” rotation. If uncorrelated, use a “varimax” rotation. Although this decision is (partly) based on theory, it also has empirical consequences; if you allow the factors to correlate, you’re less likely to find a simple structure (in my experience).
#run the factor analysis
fa_result <- psych::fa(DomPres,
nfactors = 2, #number of factors corresponding to fa.parallel
rotate = "varimax") #for *uncorrelated* factors.
# Extract the factor loadings
loadings <- fa_result$loadings
print(loadings, cutoff = .4)
##
## Loadings:
## MR1 MR2
## dom1 0.783
## dom2 0.820
## dom3 0.768
## dom4 0.788
## pres1 0.677
## pres2 0.693
## pres3 0.704
## pres4 0.685
##
## MR1 MR2
## SS loadings 2.796 2.347
## Proportion Var 0.349 0.293
## Cumulative Var 0.349 0.643
Here we see a very clean simple structure. The 4 dominance items load nicely onto the first factor and NOT the second factor, whereas the 3 prestige items load nicely onto the second factor and not the first factor.
Of course, these scales are pretty good, so there’s no other action that really needs to be taken. But, when you create your own scales or scale items based on your intuition (even if based on qualitative input), no matter how good you think they are it’s pretty likely you will run into a few items that just don’t really fit nicely onto a single factor. If that’s the case you should just remove it.
Now that we’ve confirmed the factor structure, we can create composite scores from the scales. There are two ways to do this. The simple way is to calculate means based on the items from each factor. This is an option, but it’s a bit problematic, because it assigns equal value to all items in the factor. (Yet, based on the factor loadings, we know that all items are not equally relevant for the factor). I only mention this method in case it will be more digestible to stakeholders.
data<-data %>%
mutate(DominanceTOT = dom1+dom2+dom3+dom4, #these loaded onto the 1st factor.
PrestigeTOT = pres1+pres2+pres3+pres4, #these loaded onto the 2nd factor.
Agency = agency1+agency2+agency3, #agency
Communion = comm3+kind, #communion
leader = leader)
The other – better – way of doing this is to use factor scores. By calculating factor scores, you’re weighing each item in proportion to it’s loading on the factor, with items loading highest on the factor being given the most weight.
As I briefly mentioned, dominance and prestige should BOTH correlate with agency. Dominance should NOT be associated with communion, whereas prestige should be positively associated with communion.
Lets run a quick correlation, to test whether these scales are behaving in expected ways. The top matrix is the correlation matrix, the bottom one is the P values:
library(Hmisc)
rcorr(as.matrix(tibble(
dom.factor= data$dom.factor,
pres.factor= data$pres.factor,
Agency= data$Agency,
Communion= data$Communion)))
## dom.factor pres.factor Agency Communion
## dom.factor 1.00 0.16 0.46 0.05
## pres.factor 0.16 1.00 0.69 0.57
## Agency 0.46 0.69 1.00 0.45
## Communion 0.05 0.57 0.45 1.00
##
## n= 1503
##
##
## P
## dom.factor pres.factor Agency Communion
## dom.factor 0.0000 0.0000 0.0757
## pres.factor 0.0000 0.0000 0.0000
## Agency 0.0000 0.0000 0.0000
## Communion 0.0757 0.0000 0.0000
As you can see, dominance and prestige both correlate with agency (woohoo!). Furthermore, prestige correlates with communion, but dominance does not (again, woohoo!).
Both factors are distinct, and their content represents what we expect (i.e., agency and communion). Last, lets go ahead and see if these distinct factors predict leadership emergence. To do this, include all of the as simultaneous predictors in a linear model:
##
## Call:
## lm(formula = leader ~ dom.factor + pres.factor, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.1324 -0.5495 0.0677 0.6238 3.4024
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.32335 0.02559 208.05 <2e-16 ***
## dom.factor 0.59193 0.02814 21.03 <2e-16 ***
## pres.factor 0.88058 0.02985 29.50 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.992 on 1500 degrees of freedom
## (168 observations deleted due to missingness)
## Multiple R-squared: 0.5072, Adjusted R-squared: 0.5065
## F-statistic: 771.8 on 2 and 1500 DF, p-value: < 2.2e-16
As desired, factor scores from both dominance and prestige predict leadership emergence in groups. These two factors account for 67% of the variance in final leadership scores.
In sum, dominance and prestige are two distinct, valid, and reliable, and important factors for predicting leadership emergence.
(*note: You can run this again, instead including agency and communion as predictors, and you will see that those only account for a total of 49% of the variance. Therefore, not only are dominance and prestige important, valid, and distinct factors for predicting who emerges as a leader, but using this framework also allows us to better understand leadership conferrals overall, when compared to the more traditional agency/communion framework).
Now we’ve determined our factor structure based on our EFA. Specifically, we have a 2-factor structure, with uncorrelated factors, each with 4 loadings (and no cross loadings). Lets test whether, in a new dataset, responses to the 8 items are consistent with this structure, and if they similarly predict leadership.
I’ll load in a new dataset that was nearly a direct replication of this study. (see my comment in the general discussion at the end)
Now we need to run our CFA model. First, I’ll specify the model:
library(lavaan)
CFA.dom.pres<-'
dom.factor=~NA*dom1+dom2+dom3+dom4
pres.factor=~NA*pres1+pres2+pres3+pres4
dom.factor~~1*dom.factor
pres.factor~~1*pres.factor
dom.factor~~0*pres.factor #note, specifying no correlation between factors.
'
There are a few important things to note about this. First, lavaan
will automatically assign the first factor loading to 1, and let the
factor variance freely emerge. I think that’s stupid – and so do a lot
of other people. So, instead, set the first factor loading to be free,
and constrain the factor variance to 1 (you have to constrain ONE of
these – either the first factor loading or the factor variance). Do this
by saying NA*dom1, or NA*pres1, and then constrain the factor variance
to 1 (e.g., “pres.factor1*pres.factor”;
dom.factor1*dom.factor)
Next, you will notice that there is NO correlation between the two factors. This is based on what I specified (and found) in my EFA model. If you DO want to allow the factors to correlate, remove the 0 specifying no correlation between the factors:
correlated.factors<-'
dom.factor=~NA*dom1+dom2+dom3+dom4
pres.factor=~NA*pres1+pres2+pres3+pres4
dom.factor~~1*dom.factor
pres.factor~~1*pres.factor
dom.factor~~pres.factor #Factors now correlate
'
Ok, now that we’ve specified the model, lets see how the model fits the data. Here are the approximate benchmarks we want to hit:
fit1<-sem(CFA.dom.pres, data=data, missing = "FIML")
summary(fit1, standardized = T, rsquare = T, fit.measures = T)
## lavaan 0.6.16 ended normally after 16 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 24
##
## Used Total
## Number of observations 252 266
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 227.100
## Degrees of freedom 20
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 1047.743
## Degrees of freedom 28
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.797
## Tucker-Lewis Index (TLI) 0.716
##
## Robust Comparative Fit Index (CFI) 0.797
## Robust Tucker-Lewis Index (TLI) 0.716
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3066.357
## Loglikelihood unrestricted model (H1) -2952.807
##
## Akaike (AIC) 6180.714
## Bayesian (BIC) 6265.420
## Sample-size adjusted Bayesian (SABIC) 6189.337
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.203
## 90 Percent confidence interval - lower 0.179
## 90 Percent confidence interval - upper 0.227
## P-value H_0: RMSEA <= 0.050 0.000
## P-value H_0: RMSEA >= 0.080 1.000
##
## Robust RMSEA 0.203
## 90 Percent confidence interval - lower 0.179
## 90 Percent confidence interval - upper 0.227
## P-value H_0: Robust RMSEA <= 0.050 0.000
## P-value H_0: Robust RMSEA >= 0.080 1.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.283
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor =~
## dom1 1.021 0.091 11.234 0.000 1.021 0.666
## dom2 1.232 0.081 15.221 0.000 1.232 0.832
## dom3 1.158 0.076 15.187 0.000 1.158 0.828
## dom4 1.189 0.079 15.020 0.000 1.189 0.820
## pres.factor =~
## pres1 0.981 0.079 12.383 0.000 0.981 0.745
## pres2 1.005 0.076 13.229 0.000 1.005 0.787
## pres3 0.937 0.078 12.017 0.000 0.937 0.738
## pres4 0.747 0.075 9.961 0.000 0.747 0.636
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor ~~
## pres.factor 0.000 0.000 0.000
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .dom1 4.913 0.097 50.878 0.000 4.913 3.205
## .dom2 5.067 0.093 54.360 0.000 5.067 3.424
## .dom3 5.206 0.088 59.073 0.000 5.206 3.721
## .dom4 5.270 0.091 57.704 0.000 5.270 3.635
## .pres1 5.238 0.083 63.093 0.000 5.238 3.974
## .pres2 5.365 0.080 66.737 0.000 5.365 4.204
## .pres3 5.381 0.080 67.345 0.000 5.381 4.242
## .pres4 5.385 0.074 72.778 0.000 5.385 4.585
## dom.factor 0.000 0.000 0.000
## pres.factor 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor 1.000 1.000 1.000
## pres.factor 1.000 1.000 1.000
## .dom1 1.307 0.133 9.804 0.000 1.307 0.556
## .dom2 0.673 0.094 7.151 0.000 0.673 0.307
## .dom3 0.617 0.082 7.478 0.000 0.617 0.315
## .dom4 0.688 0.089 7.731 0.000 0.688 0.327
## .pres1 0.774 0.099 7.835 0.000 0.774 0.446
## .pres2 0.619 0.091 6.787 0.000 0.619 0.380
## .pres3 0.732 0.096 7.599 0.000 0.732 0.455
## .pres4 0.822 0.090 9.105 0.000 0.822 0.596
##
## R-Square:
## Estimate
## dom1 0.444
## dom2 0.693
## dom3 0.685
## dom4 0.673
## pres1 0.554
## pres2 0.620
## pres3 0.545
## pres4 0.404
Is my model good? No! Absolutely not! It’s absolutely terrible.
Lets explore why the model fit was so bad, using the following code (which essentially sorts the chances I could make to improve the model fit):
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 11 dom.factor ~~ pres.factor 120.014 0.810 0.810 0.810 0.810
## 60 pres1 ~~ pres2 23.819 0.483 0.483 0.698 0.698
## 65 pres3 ~~ pres4 23.819 0.343 0.343 0.442 0.442
## 30 dom.factor =~ pres1 18.297 0.286 0.286 0.217 0.217
## 51 dom3 ~~ dom4 14.232 0.356 0.356 0.546 0.546
## 38 dom1 ~~ dom2 14.231 0.325 0.325 0.346 0.346
At the very top of the list, you can see the problem. It’s that the factors are correlated.
It’s important not to just plug anything into your model at this point – only make additions that would make sense. But, nonetheless, I suspect the correlation between the factors is the problem (a recent meta showed that, in fact, dominance and prestige tend to be related constructs). Lets look at model fit when we allow the factors to correlate:
fit2<-sem(correlated.factors, data=data, missing = "FIML")
summary(fit2, standardized = T, rsquare = T, fit.measures = T)
## lavaan 0.6.16 ended normally after 19 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 25
##
## Used Total
## Number of observations 252 266
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 62.603
## Degrees of freedom 19
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 1047.743
## Degrees of freedom 28
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.957
## Tucker-Lewis Index (TLI) 0.937
##
## Robust Comparative Fit Index (CFI) 0.957
## Robust Tucker-Lewis Index (TLI) 0.937
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -2984.108
## Loglikelihood unrestricted model (H1) -2952.807
##
## Akaike (AIC) 6018.217
## Bayesian (BIC) 6106.453
## Sample-size adjusted Bayesian (SABIC) 6027.199
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.095
## 90 Percent confidence interval - lower 0.070
## 90 Percent confidence interval - upper 0.122
## P-value H_0: RMSEA <= 0.050 0.003
## P-value H_0: RMSEA >= 0.080 0.847
##
## Robust RMSEA 0.095
## 90 Percent confidence interval - lower 0.070
## 90 Percent confidence interval - upper 0.122
## P-value H_0: Robust RMSEA <= 0.050 0.003
## P-value H_0: Robust RMSEA >= 0.080 0.847
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.039
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor =~
## dom1 1.020 0.090 11.353 0.000 1.020 0.665
## dom2 1.223 0.079 15.386 0.000 1.223 0.826
## dom3 1.160 0.075 15.516 0.000 1.160 0.829
## dom4 1.196 0.078 15.401 0.000 1.196 0.825
## pres.factor =~
## pres1 1.024 0.075 13.711 0.000 1.024 0.777
## pres2 0.996 0.072 13.833 0.000 0.996 0.780
## pres3 0.919 0.074 12.376 0.000 0.919 0.724
## pres4 0.722 0.072 10.017 0.000 0.722 0.615
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor ~~
## pres.factor 0.814 0.034 23.977 0.000 0.814 0.814
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .dom1 4.913 0.097 50.878 0.000 4.913 3.205
## .dom2 5.067 0.093 54.360 0.000 5.067 3.424
## .dom3 5.206 0.088 59.073 0.000 5.206 3.721
## .dom4 5.270 0.091 57.704 0.000 5.270 3.635
## .pres1 5.238 0.083 63.093 0.000 5.238 3.974
## .pres2 5.365 0.080 66.737 0.000 5.365 4.204
## .pres3 5.381 0.080 67.345 0.000 5.381 4.242
## .pres4 5.385 0.074 72.778 0.000 5.385 4.585
## dom.factor 0.000 0.000 0.000
## pres.factor 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor 1.000 1.000 1.000
## pres.factor 1.000 1.000 1.000
## .dom1 1.310 0.130 10.060 0.000 1.310 0.558
## .dom2 0.695 0.086 8.096 0.000 0.695 0.317
## .dom3 0.611 0.075 8.132 0.000 0.611 0.312
## .dom4 0.671 0.081 8.243 0.000 0.671 0.319
## .pres1 0.688 0.084 8.235 0.000 0.688 0.396
## .pres2 0.637 0.077 8.255 0.000 0.637 0.391
## .pres3 0.765 0.086 8.917 0.000 0.765 0.475
## .pres4 0.858 0.086 9.960 0.000 0.858 0.622
##
## R-Square:
## Estimate
## dom1 0.442
## dom2 0.683
## dom3 0.688
## dom4 0.681
## pres1 0.604
## pres2 0.609
## pres3 0.525
## pres4 0.378
Model fit is much better. It’s not perfect, but I’m satisfied. We see that all indicators loaded onto their respective factors, and, that the factors were (actually quite highly) correlated. Note that the std.all is the standardized beta for the respective effects.
Now that we have our measurement model, lets save factor scores:
And, now that we have our factor scores saves, lets see if they again predict leadership:
##
## Call:
## lm(formula = leader ~ dom.factor.scores + pres.factor.scores,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1216 -0.5816 0.0177 0.6123 3.9882
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.46825 0.05577 98.051 < 2e-16 ***
## dom.factor.scores 0.36741 0.12937 2.840 0.00488 **
## pres.factor.scores 0.60555 0.13188 4.592 6.98e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8853 on 249 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.5031, Adjusted R-squared: 0.4991
## F-statistic: 126 on 2 and 249 DF, p-value: < 2.2e-16
Again, we see that dominance and prestige are two distinct (yet, correlated) personality traits that lead to the emergence of leadership.
Wait! There’s more! (don’t skip this.)
In fact, this isn’t the best way of doing this. Because, why not just combine your CFA, and your prediction model, into the same comprehensive and parsimonious model? On the one hand, model fit may become a problem – individual items from either factor may uniquely correlate with your outcome – independent of their expected relationship via the factor – thus messing with your model fit. On the other hand, parsimony, and comprehensiveness. Here’s what the comprehensive model would look like – notice it all happens at once:
Comprehensive.SEM.CFA.model<-'
dom.factor=~NA*dom1+dom2+dom3+dom4
pres.factor=~NA*pres1+pres2+pres3+pres4
dom.factor~~1*dom.factor
pres.factor~~1*pres.factor
dom.factor~~pres.factor
leader~dom.factor+pres.factor # ADD THE PREDICTION ELEMENT TO THE MODEL
'
fit3<-sem(Comprehensive.SEM.CFA.model, data=data, missing = "FIML")
summary(fit3, standardized = T, rsquare = T, fit.measures = T)
## lavaan 0.6.16 ended normally after 22 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 29
##
## Used Total
## Number of observations 252 266
## Number of missing patterns 1
##
## Model Test User Model:
##
## Test statistic 76.263
## Degrees of freedom 25
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 1238.810
## Degrees of freedom 36
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.957
## Tucker-Lewis Index (TLI) 0.939
##
## Robust Comparative Fit Index (CFI) 0.957
## Robust Tucker-Lewis Index (TLI) 0.939
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3308.888
## Loglikelihood unrestricted model (H1) -3270.757
##
## Akaike (AIC) 6675.776
## Bayesian (BIC) 6778.130
## Sample-size adjusted Bayesian (SABIC) 6686.195
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.090
## 90 Percent confidence interval - lower 0.068
## 90 Percent confidence interval - upper 0.114
## P-value H_0: RMSEA <= 0.050 0.003
## P-value H_0: RMSEA >= 0.080 0.783
##
## Robust RMSEA 0.090
## 90 Percent confidence interval - lower 0.068
## 90 Percent confidence interval - upper 0.114
## P-value H_0: Robust RMSEA <= 0.050 0.003
## P-value H_0: Robust RMSEA >= 0.080 0.783
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.038
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor =~
## dom1 1.030 0.089 11.546 0.000 1.030 0.672
## dom2 1.235 0.079 15.677 0.000 1.235 0.835
## dom3 1.154 0.075 15.420 0.000 1.154 0.825
## dom4 1.184 0.078 15.204 0.000 1.184 0.817
## pres.factor =~
## pres1 1.002 0.075 13.381 0.000 1.002 0.760
## pres2 0.980 0.072 13.600 0.000 0.980 0.768
## pres3 0.945 0.073 12.986 0.000 0.945 0.745
## pres4 0.743 0.071 10.472 0.000 0.743 0.632
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## leader ~
## dom.factor 0.361 0.132 2.727 0.006 0.361 0.289
## pres.factor 0.617 0.135 4.558 0.000 0.617 0.495
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor ~~
## pres.factor 0.812 0.034 23.696 0.000 0.812 0.812
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .dom1 4.913 0.097 50.878 0.000 4.913 3.205
## .dom2 5.067 0.093 54.360 0.000 5.067 3.424
## .dom3 5.206 0.088 59.073 0.000 5.206 3.721
## .dom4 5.270 0.091 57.704 0.000 5.270 3.635
## .pres1 5.238 0.083 63.093 0.000 5.238 3.974
## .pres2 5.365 0.080 66.737 0.000 5.365 4.204
## .pres3 5.381 0.080 67.345 0.000 5.381 4.242
## .pres4 5.385 0.074 72.778 0.000 5.385 4.585
## .leader 5.468 0.079 69.533 0.000 5.468 4.380
## dom.factor 0.000 0.000 0.000
## pres.factor 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## dom.factor 1.000 1.000 1.000
## pres.factor 1.000 1.000 1.000
## .dom1 1.288 0.128 10.085 0.000 1.288 0.548
## .dom2 0.664 0.083 8.040 0.000 0.664 0.303
## .dom3 0.625 0.075 8.326 0.000 0.625 0.320
## .dom4 0.699 0.082 8.506 0.000 0.699 0.332
## .pres1 0.733 0.084 8.710 0.000 0.733 0.422
## .pres2 0.668 0.077 8.633 0.000 0.668 0.410
## .pres3 0.717 0.081 8.893 0.000 0.717 0.445
## .pres4 0.828 0.083 9.984 0.000 0.828 0.600
## .leader 0.685 0.070 9.723 0.000 0.685 0.440
##
## R-Square:
## Estimate
## dom1 0.452
## dom2 0.697
## dom3 0.680
## dom4 0.668
## pres1 0.578
## pres2 0.590
## pres3 0.555
## pres4 0.400
## leader 0.560
Results look great. Scores on the dominance and prestige factor both predict leadership emergence.
## # A tibble: 266 Ă— 19
## dom1 dom2 dom3 dom4 pres1 pres2 pres3 pres4 agency1 agency2 agency3 comm1
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 6 6 7 7 6 7 7 6 7 7 6 7
## 2 1 3 4 4 4 4 5 5 6 5 5 5
## 3 1 2 3 4 5 6 7 7 5 4 3 2
## 4 4 5 4 5 6 4 5 5 6 2 3 5
## 5 7 7 7 6 7 7 7 6 6 7 7 7
## 6 6 6 7 7 7 7 7 6 7 7 7 6
## 7 6 6 6 6 6 6 7 7 7 7 7 7
## 8 4 4 5 5 4 5 4 5 4 5 4 5
## 9 6 5 5 7 5 6 4 3 2 4 6 6
## 10 5 6 5 6 5 6 5 6 5 6 5 6
## # ℹ 256 more rows
## # ℹ 7 more variables: comm2 <dbl>, comm3 <dbl>, kind <dbl>, high.status <dbl>,
## # leader <dbl>, dom.factor.scores <dbl>, pres.factor.scores <dbl>
Can you do a CFA with both datasets? Sure. But, as a general rule, you’re going to want to use a more exploratory set to figure out which items to include versus not, and whether you want your factors correlated, etc. If I was the person doing it, I would recommend doing an EFA, and then a CFA on the same dataset, but then getting a second confirmatory follow-up dataset and doing another CFA on that. The confirmatory set is important, to make sure your model and findings replicates, as messing around with the data will lead to overfitting and may not be a genuine or robust/replicable finding.