We are going to use the SAQ dataset from Andy Field. For more details and to see the SPSS usage click this
library(foreign)
library(psych)
library(lavaan)
library(semPlot)saq <- foreign::read.spss("https://stats.idre.ucla.edu/wp-content/uploads/2018/05/SAQ.sav",
to.data.frame=TRUE, use.value.labels = FALSE)
dim(saq)## [1] 2571 23
head(saq)## q01 q02 q03 q04 q05 q06 q07 q08 q09 q10 q11 q12 q13 q14 q15 q16 q17 q18 q19
## 1 2 1 4 2 2 2 3 1 1 2 1 2 2 2 2 3 1 2 3
## 2 1 1 4 3 2 2 2 2 5 2 2 3 1 3 4 3 2 2 3
## 3 2 3 2 2 4 1 2 2 2 2 3 3 2 4 2 3 2 3 1
## 4 3 1 1 4 3 3 4 2 2 4 2 2 2 3 3 3 2 4 2
## 5 2 1 3 2 2 3 3 2 4 2 2 3 3 2 2 2 2 3 3
## 6 2 1 3 2 4 4 4 2 4 3 2 4 3 3 5 2 3 5 1
## q20 q21 q22 q23
## 1 2 2 2 5
## 2 4 4 4 2
## 3 4 3 2 2
## 4 4 4 4 3
## 5 4 2 4 4
## 6 5 3 1 4
As you can see, this dataset includes 23 variables with 4 Likert type scoring and 2571 participants.
It can be checked by using Kaiser-Meyer-Olkin test.
psych::KMO(saq)## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = saq)
## Overall MSA = 0.93
## MSA for each item =
## q01 q02 q03 q04 q05 q06 q07 q08 q09 q10 q11 q12 q13 q14 q15 q16
## 0.93 0.87 0.95 0.96 0.96 0.89 0.94 0.87 0.83 0.95 0.91 0.95 0.95 0.97 0.94 0.93
## q17 q18 q19 q20 q21 q22 q23
## 0.93 0.95 0.94 0.89 0.93 0.88 0.77
According to the results, sample size is enough for factor analytic solutions.
It can be checked by using Bartlett’s test of sphericity
psych::cortest.bartlett(saq)## R was not square, finding R from data
## $chisq
## [1] 19334.49
##
## $p.value
## [1] 0
##
## $df
## [1] 253
According to the results, multiple correlations among the variables is significant.
After checking basic assumptions, we need to clarify that how many components or factors that we are going to extract. For this purposes, we can use some methods or criterion:
fa.parallel() function in ‘psych’ package can be used for all these together.
psych::fa.parallel(saq)## Parallel analysis suggests that the number of factors = 7 and the number of components = 4
We can try the 1, 2 and 4 components/factors solutions. Let’s have a 4 components/factors.
As a basic difference, PCA ignores the error variances in initial solutions. So, calculations are easier the the other EFA techniques.
We can use the ‘principal’ function in ‘psych’ package.
model_pca <- psych::principal(saq, nfactors = 4)Communalities indicate the variance explained by each variable seperately. It’s recommended that these values are bigger than .30 or .32.
coms <- model_pca$communality
round(coms, 2)## q01 q02 q03 q04 q05 q06 q07 q08 q09 q10 q11 q12 q13 q14 q15 q16
## 0.43 0.41 0.53 0.47 0.34 0.65 0.55 0.74 0.48 0.33 0.69 0.51 0.54 0.49 0.38 0.49
## q17 q18 q19 q20 q21 q22 q23
## 0.68 0.60 0.34 0.48 0.55 0.46 0.41
It’s seen that all communalities are bigger than .32.
model_pca## Principal Components Analysis
## Call: psych::principal(r = saq, nfactors = 4)
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC3 RC1 RC4 RC2 h2 u2 com
## q01 0.24 0.50 0.36 0.06 0.43 0.57 2.4
## q02 -0.01 -0.34 0.07 0.54 0.41 0.59 1.7
## q03 -0.20 -0.57 -0.18 0.37 0.53 0.47 2.3
## q04 0.32 0.52 0.31 0.04 0.47 0.53 2.4
## q05 0.32 0.43 0.24 0.01 0.34 0.66 2.5
## q06 0.80 -0.01 0.10 -0.07 0.65 0.35 1.0
## q07 0.64 0.33 0.16 -0.08 0.55 0.45 1.7
## q08 0.13 0.17 0.83 0.01 0.74 0.26 1.1
## q09 -0.09 -0.20 0.12 0.65 0.48 0.52 1.3
## q10 0.55 0.00 0.13 -0.12 0.33 0.67 1.2
## q11 0.26 0.21 0.75 -0.14 0.69 0.31 1.5
## q12 0.47 0.52 0.09 -0.08 0.51 0.49 2.1
## q13 0.65 0.23 0.23 -0.10 0.54 0.46 1.6
## q14 0.58 0.36 0.14 -0.07 0.49 0.51 1.8
## q15 0.46 0.22 0.29 -0.19 0.38 0.62 2.6
## q16 0.33 0.51 0.31 -0.12 0.49 0.51 2.6
## q17 0.27 0.22 0.75 -0.04 0.68 0.32 1.5
## q18 0.68 0.33 0.13 -0.08 0.60 0.40 1.5
## q19 -0.15 -0.37 -0.03 0.43 0.34 0.66 2.2
## q20 -0.04 0.68 0.07 -0.14 0.48 0.52 1.1
## q21 0.29 0.66 0.16 -0.07 0.55 0.45 1.5
## q22 -0.19 0.03 -0.10 0.65 0.46 0.54 1.2
## q23 -0.02 0.17 -0.20 0.59 0.41 0.59 1.4
##
## RC3 RC1 RC4 RC2
## SS loadings 3.73 3.34 2.55 1.95
## Proportion Var 0.16 0.15 0.11 0.08
## Cumulative Var 0.16 0.31 0.42 0.50
## Proportion Explained 0.32 0.29 0.22 0.17
## Cumulative Proportion 0.32 0.61 0.83 1.00
##
## Mean item complexity = 1.8
## Test of the hypothesis that 4 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.06
## with the empirical chi square 4006.15 with prob < 0
##
## Fit based upon off diagonal values = 0.96
According to the results:
These results indicate the well-defined structure. By the way, relatively some poor items, like 2, 15, 16 could be ignored and re-analysed.
EFA considers error variences for each variable in initial solution. And it is an iterative process. For this reasons, EFA calculations are more complex than PCA and parameters migth be estimated lower, especially small sample or fewer items.
For EFA, we can use ‘fa()’ function in ‘psych’ package. This function includes many arguments. Results can be assessed similar to the PCA results.
library(GPArotation)
model_fa <- fa(saq, nfactors = 4)
model_fa## Factor Analysis using method = minres
## Call: fa(r = saq, nfactors = 4)
## Standardized loadings (pattern matrix) based upon correlation matrix
## MR1 MR4 MR3 MR2 h2 u2 com
## q01 0.51 0.18 0.01 0.08 0.37 0.63 1.3
## q02 -0.11 0.03 0.06 0.48 0.26 0.74 1.2
## q03 -0.43 -0.09 0.03 0.36 0.47 0.53 2.1
## q04 0.53 0.15 0.08 0.06 0.42 0.58 1.2
## q05 0.43 0.10 0.11 0.03 0.30 0.70 1.2
## q06 -0.11 0.02 0.81 0.00 0.60 0.40 1.0
## q07 0.29 0.04 0.47 -0.04 0.49 0.51 1.7
## q08 -0.01 0.85 -0.07 0.06 0.65 0.35 1.0
## q09 0.00 0.07 -0.01 0.59 0.34 0.66 1.0
## q10 0.05 0.08 0.35 -0.07 0.20 0.80 1.2
## q11 -0.03 0.75 0.07 -0.11 0.63 0.37 1.1
## q12 0.50 -0.03 0.24 -0.07 0.45 0.55 1.5
## q13 0.17 0.13 0.48 -0.05 0.47 0.53 1.4
## q14 0.33 0.03 0.38 -0.04 0.42 0.58 2.0
## q15 0.15 0.20 0.27 -0.14 0.32 0.68 3.1
## q16 0.51 0.15 0.07 -0.08 0.46 0.54 1.3
## q17 0.10 0.67 0.07 0.03 0.58 0.42 1.1
## q18 0.29 0.00 0.53 -0.03 0.54 0.46 1.6
## q19 -0.19 -0.02 -0.03 0.36 0.24 0.76 1.6
## q20 0.48 0.02 -0.15 -0.17 0.27 0.73 1.5
## q21 0.61 0.03 0.05 -0.07 0.47 0.53 1.0
## q22 0.15 -0.08 -0.12 0.48 0.25 0.75 1.4
## q23 0.18 -0.11 -0.02 0.35 0.12 0.88 1.7
##
## MR1 MR4 MR3 MR2
## SS loadings 3.16 2.29 2.37 1.49
## Proportion Var 0.14 0.10 0.10 0.06
## Cumulative Var 0.14 0.24 0.34 0.40
## Proportion Explained 0.34 0.25 0.25 0.16
## Cumulative Proportion 0.34 0.59 0.84 1.00
##
## With factor correlations of
## MR1 MR4 MR3 MR2
## MR1 1.00 0.54 0.51 -0.39
## MR4 0.54 1.00 0.46 -0.19
## MR3 0.51 0.46 1.00 -0.29
## MR2 -0.39 -0.19 -0.29 1.00
##
## Mean item complexity = 1.4
## Test of the hypothesis that 4 factors are sufficient.
##
## The degrees of freedom for the null model are 253 and the objective function was 7.55 with Chi Square of 19334.49
## The degrees of freedom for the model are 167 and the objective function was 0.46
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.03
##
## The harmonic number of observations is 2571 with the empirical chi square 880.47 with prob < 2.3e-97
## The total number of observations was 2571 with Likelihood Chi Square = 1166.19 with prob < 2.3e-149
##
## Tucker Lewis Index of factoring reliability = 0.921
## RMSEA index = 0.048 and the 90 % confidence intervals are 0.046 0.051
## BIC = -145.11
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## MR1 MR4 MR3 MR2
## Correlation of (regression) scores with factors 0.91 0.92 0.90 0.82
## Multiple R square of scores with factors 0.83 0.84 0.81 0.67
## Minimum correlation of possible factor scores 0.66 0.69 0.62 0.35
According to the results;
CFA is a model-data fit test based on multivariate regression. Outputs are coefficients of paths and fit indices. If the paths are significant and indices indicate acceptable or high degree of fit, that means the structural model is confirmed by data.
CFA’ assumptions are the same as PCA and EFA. Multivariate normality, adequacy of the sample size, multivariate linearity, etc. Also multicollinearity and autocorrelation risks should be checked before further analysis.
For now, we assume that the assumptions are met by the SAQ data and going on building model and estimating parameters. We are going to use ‘cfa()’ function in ‘lavaan’ package.
syntx <- '
RC3 =~ q06 + q07 + q10 + q13 + q14 + q15 + q18
RC1 =~ q01 + q03 + q04 + q05 + q12 + q16 + q20 + q21
RC4 =~ q08 + q11 + q17
RC2 =~ q02 + q09 + q19 + q22 + q23
SAQ =~ RC3
SAQ =~ RC1
SAQ =~ RC4
SAQ =~ RC2
'
model_cfa <- lavaan::cfa(syntx, data = saq)We built the model and estimated the model parameters. Let’s see the results.
summary(model_cfa, fit.measures = TRUE, standardized = TRUE)## lavaan 0.6-9 ended normally after 48 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 50
##
## Number of observations 2571
##
## Model Test User Model:
##
## Test statistic 2316.247
## Degrees of freedom 226
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 19406.199
## Degrees of freedom 253
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.891
## Tucker-Lewis Index (TLI) 0.878
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -74302.494
## Loglikelihood unrestricted model (H1) -73144.371
##
## Akaike (AIC) 148704.988
## Bayesian (BIC) 148997.591
## Sample-size adjusted Bayesian (BIC) 148838.727
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.060
## 90 Percent confidence interval - lower 0.058
## 90 Percent confidence interval - upper 0.062
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.049
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## RC3 =~
## q06 1.000 0.708 0.631
## q07 1.090 0.037 29.136 0.000 0.772 0.701
## q10 0.533 0.028 19.365 0.000 0.378 0.431
## q13 0.922 0.032 28.745 0.000 0.653 0.688
## q14 0.935 0.033 27.936 0.000 0.662 0.663
## q15 0.797 0.033 24.320 0.000 0.564 0.559
## q18 1.097 0.036 30.277 0.000 0.777 0.738
## RC1 =~
## q01 1.000 0.491 0.594
## q03 -1.351 0.054 -25.188 0.000 -0.664 -0.618
## q04 1.240 0.048 25.913 0.000 0.609 0.642
## q05 1.086 0.047 23.178 0.000 0.534 0.553
## q12 1.235 0.047 26.487 0.000 0.607 0.662
## q16 1.264 0.047 26.922 0.000 0.621 0.678
## q20 0.927 0.048 19.220 0.000 0.455 0.440
## q21 1.320 0.050 26.388 0.000 0.649 0.659
## RC4 =~
## q08 1.000 0.660 0.756
## q11 1.074 0.029 36.874 0.000 0.708 0.805
## q17 1.025 0.029 35.655 0.000 0.676 0.765
## RC2 =~
## q02 1.000 0.406 0.477
## q09 1.614 0.108 15.012 0.000 0.655 0.519
## q19 1.419 0.094 15.064 0.000 0.576 0.523
## q22 1.249 0.086 14.586 0.000 0.507 0.487
## q23 0.731 0.071 10.251 0.000 0.297 0.284
## SAQ =~
## RC3 1.000 0.886 0.886
## RC1 0.746 0.034 22.220 0.000 0.952 0.952
## RC4 0.748 0.032 23.142 0.000 0.711 0.711
## RC2 -0.366 0.025 -14.425 0.000 -0.566 -0.566
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .q06 0.757 0.023 32.366 0.000 0.757 0.602
## .q07 0.619 0.020 30.758 0.000 0.619 0.509
## .q10 0.626 0.018 34.663 0.000 0.626 0.815
## .q13 0.473 0.015 31.100 0.000 0.473 0.526
## .q14 0.558 0.018 31.711 0.000 0.558 0.560
## .q15 0.700 0.021 33.466 0.000 0.700 0.687
## .q18 0.505 0.017 29.520 0.000 0.505 0.456
## .q01 0.444 0.013 33.041 0.000 0.444 0.648
## .q03 0.715 0.022 32.660 0.000 0.715 0.619
## .q04 0.528 0.016 32.213 0.000 0.528 0.588
## .q05 0.645 0.019 33.575 0.000 0.645 0.694
## .q12 0.471 0.015 31.794 0.000 0.471 0.561
## .q16 0.453 0.014 31.431 0.000 0.453 0.540
## .q20 0.865 0.025 34.620 0.000 0.865 0.807
## .q21 0.549 0.017 31.871 0.000 0.549 0.566
## .q08 0.326 0.012 26.312 0.000 0.326 0.428
## .q11 0.273 0.012 22.717 0.000 0.273 0.353
## .q17 0.324 0.013 25.722 0.000 0.324 0.414
## .q02 0.559 0.019 29.915 0.000 0.559 0.773
## .q09 1.166 0.041 28.398 0.000 1.166 0.731
## .q19 0.881 0.031 28.226 0.000 0.881 0.726
## .q22 0.826 0.028 29.566 0.000 0.826 0.763
## .q23 1.001 0.029 34.103 0.000 1.001 0.919
## .RC3 0.108 0.011 9.603 0.000 0.215 0.215
## .RC1 0.022 0.005 4.460 0.000 0.093 0.093
## .RC4 0.215 0.012 17.330 0.000 0.494 0.494
## .RC2 0.112 0.012 9.405 0.000 0.680 0.680
## SAQ 0.394 0.026 15.037 0.000 1.000 1.000
fits <- fitMeasures(model_cfa)
data.frame(fits = round(fits, 2))## fits
## npar 50.00
## fmin 0.45
## chisq 2316.25
## df 226.00
## pvalue 0.00
## baseline.chisq 19406.20
## baseline.df 253.00
## baseline.pvalue 0.00
## cfi 0.89
## tli 0.88
## nnfi 0.88
## rfi 0.87
## nfi 0.88
## pnfi 0.79
## ifi 0.89
## rni 0.89
## logl -74302.49
## unrestricted.logl -73144.37
## aic 148704.99
## bic 148997.59
## ntotal 2571.00
## bic2 148838.73
## rmsea 0.06
## rmsea.ci.lower 0.06
## rmsea.ci.upper 0.06
## rmsea.pvalue 0.00
## rmr 0.05
## rmr_nomean 0.05
## srmr 0.05
## srmr_bentler 0.05
## srmr_bentler_nomean 0.05
## crmr 0.05
## crmr_nomean 0.05
## srmr_mplus 0.05
## srmr_mplus_nomean 0.05
## cn_05 291.89
## cn_01 310.00
## gfi 0.93
## agfi 0.91
## pgfi 0.76
## mfi 0.67
## ecvi 0.94
According to the results:
As a results, we can try to improve the model with some statistical modifications.
modifs <- modindices(model_cfa, sort = TRUE)
head(modifs, 15)## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 72 RC1 =~ q06 236.858 -1.639 -0.805 -0.718 -0.718
## 125 SAQ =~ q06 221.300 -1.659 -1.041 -0.928 -0.928
## 356 q20 ~~ q21 210.736 0.214 0.214 0.311 0.311
## 115 RC2 =~ q03 191.390 1.023 0.415 0.386 0.386
## 254 q15 ~~ q16 104.394 0.124 0.124 0.221 0.221
## 84 RC1 =~ q19 87.796 -0.657 -0.323 -0.293 -0.293
## 285 q01 ~~ q16 87.623 0.094 0.094 0.210 0.210
## 145 SAQ =~ q19 83.634 -0.527 -0.331 -0.300 -0.300
## 122 RC2 =~ q08 81.099 0.434 0.176 0.202 0.202
## 140 SAQ =~ q08 78.450 -0.379 -0.238 -0.273 -0.273
## 380 q11 ~~ q17 78.447 -0.120 -0.120 -0.404 -0.404
## 64 RC3 =~ q08 74.664 -0.273 -0.194 -0.222 -0.222
## 60 RC3 =~ q12 73.293 0.509 0.361 0.394 0.394
## 182 q07 ~~ q21 69.544 0.108 0.108 0.186 0.186
## 305 q03 ~~ q02 63.237 0.109 0.109 0.173 0.173
As you see, there are some suggestions for modifications. We can use some of them to improve our model. We prefer just items relted with same latent variable. It’s safer approach.
syntx2 <- '
RC3 =~ q06 + q07 + q10 + q13 + q14 + q15 + q18
RC1 =~ q01 + q03 + q04 + q05 + q12 + q16 + q20 + q21
RC4 =~ q08 + q11 + q17
RC2 =~ q02 + q09 + q19 + q22 + q23
SAQ =~ RC3
SAQ =~ RC1
SAQ =~ RC4
SAQ =~ RC2
q20 ~~ q21
q01 ~~ q16
q11 ~~ q17
q06 ~~ q07
q15 ~~ q18
'
modified_model_cfa <- lavaan::cfa(syntx2, data = saq)
fits2 <- round(fitMeasures(modified_model_cfa), 2)
data.frame(fits2)## fits2
## npar 55.00
## fmin 0.35
## chisq 1822.11
## df 221.00
## pvalue 0.00
## baseline.chisq 19406.20
## baseline.df 253.00
## baseline.pvalue 0.00
## cfi 0.92
## tli 0.90
## nnfi 0.90
## rfi 0.89
## nfi 0.91
## pnfi 0.79
## ifi 0.92
## rni 0.92
## logl -74055.43
## unrestricted.logl -73144.37
## aic 148220.85
## bic 148542.72
## ntotal 2571.00
## bic2 148367.96
## rmsea 0.05
## rmsea.ci.lower 0.05
## rmsea.ci.upper 0.06
## rmsea.pvalue 0.01
## rmr 0.05
## rmr_nomean 0.05
## srmr 0.04
## srmr_bentler 0.04
## srmr_bentler_nomean 0.04
## crmr 0.05
## crmr_nomean 0.05
## srmr_mplus 0.04
## srmr_mplus_nomean 0.04
## cn_05 363.18
## cn_01 385.96
## gfi 0.94
## agfi 0.93
## pgfi 0.75
## mfi 0.73
## ecvi 0.75
As you can see, we get better model. Now, diagrams wtih graphical tool. We use ‘semPaths()’ function in ‘semPlot’ package.
semPlot::semPaths(modified_model_cfa, what = "est", rotation = 2,
style = "lisrel", font = 2)
title("Row Estimations")semPlot::semPaths(modified_model_cfa, what = "std", rotation = 2,
style = "lisrel", font = 2, title = TRUE)
title("Standardized Estimations")library(rmarkdown) render(“C:\Users\acer\Desktop\EFA_CFA_example.Rmd”)