Structural equation modeling (SEM) is an advanced statistical method that seeks to explain the relationships among multiple variables. It does so by examining the structure of interrelationships expressed in a series of multiple regression equations. The variables in these equations can be latent factors, each represented by several measured variables. SEM can therefore be thought of as a combination of two multivariate techniques: factor analysis and multiple regression. This tutorial explores confirmatory factor analysis (CFA) within the SEM framework.

R offers several packages for SEM, such as sem, lavaan, and OpenMx. In this tutorial we use two of them: lavaan to fit the SEM model, and semPlot for graphical analysis of the final model. The foreign package is loaded as well, to import the SPSS data file.

library(lavaan)
library(foreign)

# import the data from an SPSS .sav file
newdata <- read.spss("C:/Users/asus/Google Drive/Ilham Fadhil/Tutor/Advanced Statistics/Archive/Materi/Week 10/LISREL - DEPRESS/DEPRESS.sav", use.value.labels = TRUE, to.data.frame = TRUE)
head(newdata)
##   SELF1 SELF2 SELF3 SELF4 SELF5 DEPRES1 DEPRES2 DEPRES3 DEPRES4 IMPULS1
## 1     3     2     3     4     4       4       2       0       4       0
## 2     2     1     2     3     2       3       0       0       1       2
## 3     2     1     4     2     2       2       0       4       4       0
## 4     1     1     2     2     4       4       3       4       4       0
## 5     2     0     1     2     3       2       1       4       4       0
## 6     4     3     3     2     4       2       1       3       4       0
##   IMPULS2 IMPULS3
## 1       0       0
## 2       0       3
## 3       0       0
## 4       0       0
## 5       1       2
## 6       0       3

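The data contain 204 observations on 12 indicators of 3 latent variables, named here as in the model that follows: selfest (SELF1 to SELF5), depres (DEPRES1 to DEPRES4), and impuls (IMPULS1 to IMPULS3).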

CFA is a way of testing how well measured variables represent a smaller number of constructs. Confirmatory factor analysis (CFA) and exploratory factor analysis (EFA) differ in how the factors are derived. In EFA, the factors are derived from the statistical results, based on loadings and eigenvalues; in CFA, the factors are derived from theory. In CFA the researcher must therefore specify, before any results can be computed, both the number of factors that exist within the set of variables and which factor each variable will load highly on.
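
As a minimal sketch of what "specifying in advance" means, the assignment of indicators to factors can be written down as a plain R list before any estimation. The object names (`factors`, `pattern`, `cfa_syntax`) are illustrative only and are not part of lavaan; the code uses base R and simply builds the hypothesized loading pattern and the matching model syntax string.

```r
# Hypothesized measurement theory: every indicator is assigned to exactly one
# factor before estimation (variable names as in the data set shown earlier).
factors <- list(
  selfest = paste0("SELF", 1:5),
  depres  = paste0("DEPRES", 1:4),
  impuls  = paste0("IMPULS", 1:3)
)

indicators <- unlist(factors, use.names = FALSE)

# Pattern matrix: 1 = indicator loads on the factor, 0 = loading fixed to zero.
pattern <- sapply(factors, function(items) as.integer(indicators %in% items))
rownames(pattern) <- indicators
print(pattern)

# The same mapping written as lavaan-style model syntax, one line per factor.
model_lines <- vapply(
  names(factors),
  function(f) paste(f, "=~", paste(factors[[f]], collapse = " + ")),
  character(1)
)
cfa_syntax <- paste(model_lines, collapse = "\n")
cat(cfa_syntax, "\n")
```

In an EFA every cell of such a pattern matrix would be estimated; in a CFA the zeros are fixed by the researcher, which is what makes the model testable against the data.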

# develop CFA model
cfa.model <- 'selfest =~ SELF1 + SELF2 + SELF3 + SELF4 + SELF5
              depres =~ DEPRES1 + DEPRES2 + DEPRES3 + DEPRES4
              impuls =~ IMPULS1 + IMPULS2 + IMPULS3'
cfa.sem <- cfa(cfa.model, data = newdata)
summary(cfa.sem)
## lavaan (0.5-23.1097) converged normally after  39 iterations
## 
##   Number of observations                           204
## 
##   Estimator                                         ML
##   Minimum Function Test Statistic              123.453
##   Degrees of freedom                                51
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Information                                 Expected
##   Standard Errors                             Standard
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   selfest =~                                          
##     SELF1             1.000                           
##     SELF2             1.048    0.098   10.715    0.000
##     SELF3             1.000    0.095   10.501    0.000
##     SELF4             1.127    0.103   10.953    0.000
##     SELF5             1.269    0.109   11.672    0.000
##   depres =~                                           
##     DEPRES1           1.000                           
##     DEPRES2           0.710    0.065   10.995    0.000
##     DEPRES3           0.863    0.071   12.160    0.000
##     DEPRES4           0.892    0.069   12.978    0.000
##   impuls =~                                           
##     IMPULS1           1.000                           
##     IMPULS2           0.650    0.128    5.079    0.000
##     IMPULS3           0.958    0.197    4.872    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   selfest ~~                                          
##     depres            1.028    0.142    7.227    0.000
##     impuls            0.125    0.055    2.291    0.022
##   depres ~~                                           
##     impuls            0.129    0.068    1.892    0.058
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .SELF1             0.923    0.102    9.083    0.000
##    .SELF2             0.656    0.077    8.518    0.000
##    .SELF3             0.662    0.076    8.676    0.000
##    .SELF4             0.673    0.081    8.310    0.000
##    .SELF5             0.552    0.075    7.329    0.000
##    .DEPRES1           0.454    0.075    6.097    0.000
##    .DEPRES2           0.850    0.094    9.041    0.000
##    .DEPRES3           0.940    0.108    8.672    0.000
##    .DEPRES4           0.815    0.098    8.326    0.000
##    .IMPULS1           0.296    0.072    4.081    0.000
##    .IMPULS2           0.200    0.034    5.820    0.000
##    .IMPULS3           1.030    0.120    8.584    0.000
##     selfest           0.964    0.169    5.706    0.000
##     depres            1.513    0.199    7.618    0.000
##     impuls            0.363    0.088    4.112    0.000
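
Two numbers in this summary can be checked by hand, which may help in reading the output. The sketch below, in base R, counts the degrees of freedom of the three-factor model and recomputes one Wald z-test from the printed (rounded) estimate and standard error. The variable names are illustrative, and small differences from the printed z-value are expected because the inputs are rounded to three decimals.

```r
# Degrees of freedom = unique sample moments - free parameters
p <- 12                                  # observed indicators
n_moments <- p * (p + 1) / 2             # unique variances and covariances

n_loadings   <- 12 - 3                   # first loading of each factor is fixed to 1
n_resid_var  <- 12                       # one residual variance per indicator
n_factor_var <- 3                        # variances of selfest, depres, impuls
n_factor_cov <- 3 * (3 - 1) / 2          # covariances among the three factors
n_free <- n_loadings + n_resid_var + n_factor_var + n_factor_cov

df_model <- n_moments - n_free
c(moments = n_moments, free = n_free, df = df_model)

# Wald z-test for the depres ~~ impuls covariance, from the printed values
est <- 0.129
se  <- 0.068
z <- est / se
p_value <- 2 * pnorm(-abs(z))            # two-sided p-value
round(c(z = z, p = p_value), 3)
```

The count of free parameters agrees with the 51 degrees of freedom reported in the summary, and the recomputed p-value stays just above 0.05, in line with the printed 0.058.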

One advantage of SEM/CFA is its ability to assess the construct validity of a proposed measurement theory. Construct validity is the extent to which a set of measured items reflects the theoretical latent construct it is designed to measure.

Standardized factor loadings are used to compute the variance extracted and the construct reliability in order to assess convergent validity.
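
For orientation, a standardized loading is the unstandardized loading rescaled by the standard deviations of the factor and of the indicator. The base R sketch below applies this to the selfest estimates copied from the summary output, assuming the model-implied indicator variance is the squared loading times the factor variance plus the residual variance. The variable names are illustrative, and because the inputs are rounded the results should agree with the standardized solution only to about three decimals.

```r
# Unstandardized estimates copied from the summary output (selfest factor)
loading    <- c(SELF1 = 1.000, SELF2 = 1.048, SELF3 = 1.000, SELF4 = 1.127, SELF5 = 1.269)
resid_var  <- c(SELF1 = 0.923, SELF2 = 0.656, SELF3 = 0.662, SELF4 = 0.673, SELF5 = 0.552)
factor_var <- 0.964

# Model-implied variance of each indicator
implied_var <- loading^2 * factor_var + resid_var

# Standardized loading and standardized residual variance
std_loading <- loading * sqrt(factor_var) / sqrt(implied_var)
std_resid   <- resid_var / implied_var
round(cbind(std_loading, std_resid), 3)

# Factor correlation from the selfest ~~ depres covariance and the two factor variances
cor_selfest_depres <- 1.028 / sqrt(0.964 * 1.513)
round(cor_selfest_depres, 3)
```

Note that each standardized loading squared plus the standardized residual variance sums to 1, which is why the squared loading can be read as the share of indicator variance explained by the factor.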

inspect(cfa.sem, what = "std")
## $lambda
##         selfst depres impuls
## SELF1    0.715  0.000  0.000
## SELF2    0.786  0.000  0.000
## SELF3    0.770  0.000  0.000
## SELF4    0.803  0.000  0.000
## SELF5    0.859  0.000  0.000
## DEPRES1  0.000  0.877  0.000
## DEPRES2  0.000  0.688  0.000
## DEPRES3  0.000  0.738  0.000
## DEPRES4  0.000  0.772  0.000
## IMPULS1  0.000  0.000  0.742
## IMPULS2  0.000  0.000  0.658
## IMPULS3  0.000  0.000  0.494
## 
## $theta
##         SELF1 SELF2 SELF3 SELF4 SELF5 DEPRES1 DEPRES2 DEPRES3 DEPRES4
## SELF1   0.489                                                        
## SELF2   0.000 0.383                                                  
## SELF3   0.000 0.000 0.407                                            
## SELF4   0.000 0.000 0.000 0.354                                      
## SELF5   0.000 0.000 0.000 0.000 0.262                                
## DEPRES1 0.000 0.000 0.000 0.000 0.000 0.231                          
## DEPRES2 0.000 0.000 0.000 0.000 0.000 0.000   0.527                  
## DEPRES3 0.000 0.000 0.000 0.000 0.000 0.000   0.000   0.455          
## DEPRES4 0.000 0.000 0.000 0.000 0.000 0.000   0.000   0.000   0.404  
## IMPULS1 0.000 0.000 0.000 0.000 0.000 0.000   0.000   0.000   0.000  
## IMPULS2 0.000 0.000 0.000 0.000 0.000 0.000   0.000   0.000   0.000  
## IMPULS3 0.000 0.000 0.000 0.000 0.000 0.000   0.000   0.000   0.000  
##         IMPULS1 IMPULS2 IMPULS3
## SELF1                          
## SELF2                          
## SELF3                          
## SELF4                          
## SELF5                          
## DEPRES1                        
## DEPRES2                        
## DEPRES3                        
## DEPRES4                        
## IMPULS1 0.449                  
## IMPULS2 0.000   0.566          
## IMPULS3 0.000   0.000   0.756  
## 
## $psi
##         selfst depres impuls
## selfest 1.000               
## depres  0.851  1.000        
## impuls  0.212  0.174  1.000
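
The variance extracted and construct reliability mentioned above can be computed directly from the `$lambda` loadings. The sketch below, in base R, uses the usual formulas: average variance extracted (AVE) as the mean squared standardized loading, and construct reliability (CR) as the squared sum of loadings divided by that quantity plus the summed error variances (1 minus each squared loading). The object names are illustrative, and the loadings are typed in from the output rather than extracted from the fitted object.

```r
# Standardized loadings copied from the $lambda matrix
std_loadings <- list(
  selfest = c(0.715, 0.786, 0.770, 0.803, 0.859),
  depres  = c(0.877, 0.688, 0.738, 0.772),
  impuls  = c(0.742, 0.658, 0.494)
)

# Average variance extracted: mean of the squared standardized loadings
ave <- sapply(std_loadings, function(l) mean(l^2))

# Construct reliability: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
cr <- sapply(std_loadings, function(l) {
  sum(l)^2 / (sum(l)^2 + sum(1 - l^2))
})

round(rbind(AVE = ave, CR = cr), 3)
```

Commonly cited rules of thumb treat an AVE of at least 0.5 and a CR of at least 0.7 as evidence of adequate convergent validity. These cutoffs are conventions rather than strict tests; by them, selfest and depres look adequate while impuls falls short on both measures.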

In assessing the CFA model, we use the goodness-of-fit measures below.

fitMeasures(cfa.sem)
##                npar                fmin               chisq 
##              27.000               0.303             123.453 
##                  df              pvalue      baseline.chisq 
##              51.000               0.000            1291.011 
##         baseline.df     baseline.pvalue                 cfi 
##              66.000               0.000               0.941 
##                 tli                nnfi                 rfi 
##               0.923               0.923               0.876 
##                 nfi                pnfi                 ifi 
##               0.904               0.699               0.942 
##                 rni                logl   unrestricted.logl 
##               0.941           -3346.414           -3284.687 
##                 aic                 bic              ntotal 
##            6746.827            6836.417             204.000 
##                bic2               rmsea      rmsea.ci.lower 
##            6750.873               0.083               0.065 
##      rmsea.ci.upper        rmsea.pvalue                 rmr 
##               0.102               0.002               0.145 
##          rmr_nomean                srmr        srmr_bentler 
##               0.145               0.094               0.094 
## srmr_bentler_nomean         srmr_bollen  srmr_bollen_nomean 
##               0.094               0.094               0.094 
##          srmr_mplus   srmr_mplus_nomean               cn_05 
##               0.094               0.094             114.473 
##               cn_01                 gfi                agfi 
##             128.877               0.909               0.861 
##                pgfi                 mfi                ecvi 
##               0.594               0.837               0.870
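
Several of these indices are simple functions of the model and baseline chi-square statistics, so they can be reproduced by hand as a check on what they measure. The sketch below uses base R and the standard textbook formulas with the values printed above. The variable names are illustrative, and the RMSEA is computed with N rather than N - 1 in the denominator, which is the variant that reproduces the printed 0.083 here (some texts use N - 1).

```r
# Values copied from the fitMeasures() output
chisq      <- 123.453    # model chi-square
df_model   <- 51
base_chisq <- 1291.011   # baseline (independence) model chi-square
base_df    <- 66
n          <- 204        # sample size

# RMSEA: misfit per degree of freedom, adjusted for sample size
rmsea <- sqrt(max(chisq - df_model, 0) / (df_model * n))

# CFI: improvement in noncentrality relative to the baseline model
cfi <- 1 - max(chisq - df_model, 0) /
  max(chisq - df_model, base_chisq - base_df, 0)

# TLI (NNFI): compares chi-square/df ratios of baseline and model
tli <- (base_chisq / base_df - chisq / df_model) / (base_chisq / base_df - 1)

# NFI: proportional reduction in chi-square from the baseline model
nfi <- 1 - chisq / base_chisq

round(c(rmsea = rmsea, cfi = cfi, tli = tli, nfi = nfi), 3)
```

As a rough guide only, CFI and TLI values above about 0.90 and RMSEA values around 0.08 or lower are often described as acceptable; cutoffs differ between sources and should be read together with the chi-square test and the SRMR rather than in isolation.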