Project 3 - Structural Equation Modeling (SEM)

Author

Chi-Ning Chang, modified by Margaret Gatongi

SEM Assignment: Task 2

  1. The data set (project3_data.csv) contains the following variables:
  • id: student ID
  • female: 1 = female, 0 = male
  • ses: socioeconomic status
  • math_score: math test score
  • mse1: do an excellent job on math tests
  • mse2: understand math textbook
  • mse3: master skills taught in math course
  • mse4: do an excellent job on math assignments

Math self-efficacy can be measured using four continuous items: mse1, mse2, mse3, and mse4. Each item is scored on a scale ranging from 1 to 4, where a higher score indicates a higher level of math self-efficacy.

Theoretically, students’ math motivational factors, including math self-efficacy, are expected to positively predict their math achievement at a later time (Saw & Chang, 2018). Please test the predictive validity of math self-efficacy by using Structural Equation Modeling (SEM) to examine whether, and to what extent, math self-efficacy positively predicts math test scores, as illustrated in Figure 1. Report your findings and include your R code. (5 points)
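The model to be tested can be written as two sets of equations. This is only a sketch in conventional SEM notation: the symbols $\eta$, $\lambda_j$, $\varepsilon_j$, $\beta$, and $\zeta$ are introduced here for illustration and do not appear in the assignment. Variables are treated as deviations from their means, because the model fitted later estimates variances and covariances only.

```latex
\begin{aligned}
\text{Measurement model:}\quad & \mathrm{mse}_j = \lambda_j\,\eta + \varepsilon_j,
  \qquad j = 1,\dots,4, \qquad \lambda_1 = 1 \\
\text{Structural model:}\quad & \mathrm{math\_score} = \beta\,\eta + \zeta
\end{aligned}
```

Here $\eta$ is latent math self-efficacy, $\lambda_j$ are the factor loadings (the first is fixed to 1 to set the scale of $\eta$), $\varepsilon_j$ are item residuals, and $\zeta$ is the part of the math score not explained by $\eta$. The predictive validity hypothesis corresponds to $\beta > 0$.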


Load Data

# Read the project data

data <- read.csv("project3_data.csv")

head(data) # Display the first six rows
  id female     ses math_score mse1 mse2 mse3 mse4
1  1      1  1.2715    46.7904    2    2    2    2
2  2      1 -0.1480    56.4332    2    2    2    2
3  3      1  0.0502    54.7217    3    2    3    3
4  4      1  0.0826    53.9767    3    3    3    3
5  5      0 -1.0577    38.9407    2    3    3    3
6  6      0  0.6963    53.0355    3    3    3    3

Testing for Normality of Data

# Install and load the MVN package

#install.packages("MVN")

library(MVN)

# Test multivariate normality for all observed continuous variables

hz_test <- mvn(data = data[, c("mse1", "mse2", "mse3", "mse4", "ses", "math_score")], mvnTest = "hz")

# Display the results

hz_test
$multivariateNormality
           Test       HZ p value MVN
1 Henze-Zirkler 13.56932       0  NO

$univariateNormality
              Test   Variable Statistic   p value Normality
1 Anderson-Darling    mse1      78.9824  <0.001      NO    
2 Anderson-Darling    mse2      60.2503  <0.001      NO    
3 Anderson-Darling    mse3      88.1124  <0.001      NO    
4 Anderson-Darling    mse4      91.8098  <0.001      NO    
5 Anderson-Darling    ses        2.3492  <0.001      NO    
6 Anderson-Darling math_score    1.2641  0.0027      NO    

$Descriptives
              n       Mean   Std.Dev   Median     Min     Max     25th    75th
mse1       1000  3.0370000 0.7564411  3.00000  1.0000  4.0000  3.00000  4.0000
mse2       1000  2.7570000 0.8310485  3.00000  1.0000  4.0000  2.00000  3.0000
mse3       1000  3.0320000 0.7038973  3.00000  1.0000  4.0000  3.00000  3.0000
mse4       1000  3.1140000 0.7010674  3.00000  1.0000  4.0000  3.00000  4.0000
ses        1000  0.1602087 0.7918316  0.09930 -1.6616  2.5668 -0.40240  0.7069
math_score 1000 52.2871577 9.9354092 51.95995 24.6926 79.5300 46.49687 59.1433
                 Skew    Kurtosis
mse1       -0.5602597  0.16475493
mse2       -0.2329040 -0.51117610
mse3       -0.4230704  0.16036508
mse4       -0.5787041  0.49580175
ses         0.3382118 -0.07924449
math_score -0.1237308  0.02366755
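The Skew and Kurtosis columns summarize the shape of each distribution. As a rough illustration of what they measure, the sketch below computes moment-based skewness and excess kurtosis in base R. The helper functions `skewness()` and `excess_kurtosis()` and the toy vector `x` are hypothetical, and MVN's exact estimator may differ slightly (for example, in small-sample corrections). Values near 0 on both measures are what a normal distribution would produce.

```r
# Moment-based skewness and excess kurtosis (hypothetical helpers, not MVN functions)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Toy vector standing in for a column such as data$mse1
x <- c(1, 2, 3, 4, 5)

skewness(x)         # 0 for a symmetric vector
excess_kurtosis(x)  # -1.3: flatter than a normal distribution
```

In the output above, every variable has skewness and kurtosis within about ±0.6 of zero, even though all the formal tests reject normality at n = 1000.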

STEP 1: Model Specification

# Install and load the lavaan package

#install.packages("lavaan")
library(lavaan)
This is lavaan 0.6-18
lavaan is FREE software! Please report any bugs.
# Specify the model

model <- '
          # measurement model          
          mse =~ mse1 + mse2 + mse3 + mse4
          
          # structural model
          math_score ~ mse
          '
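Before estimating, it helps to check that the model is over-identified by counting the available information against the free parameters. The sketch below does this by hand; the variable names are illustrative, and the parameter count assumes lavaan's defaults (first loading fixed to 1, no mean structure).

```r
# Count the information available and the parameters to be estimated
p <- 5                          # observed variables: mse1-mse4 and math_score
n_moments <- p * (p + 1) / 2    # unique variances and covariances

n_free <- 3 +   # loadings for mse2, mse3, mse4 (mse1 is fixed to 1)
          1 +   # regression of math_score on mse
          5 +   # residual variances of the five observed variables
          1     # variance of the latent variable mse

df <- n_moments - n_free
c(moments = n_moments, free = n_free, df = df)  # 15, 10, 5
```

These counts agree with the 10 model parameters and 5 degrees of freedom that lavaan reports in the summary output.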

STEP 2: Model Estimation

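# MLR: maximum likelihood with robust (Huber-White) standard errors and a scaled test statistic,
# a reasonable choice given that the tests above rejected normality

model_fit <- sem(model, data = data, estimator = "MLR")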

STEP 3: Model Evaluation

# Summarize the model, including fit indices and standardized estimates

summary(model_fit, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6-18 ended normally after 33 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        10

  Number of observations                          1000

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                34.138      26.982
  Degrees of freedom                                 5           5
  P-value (Chi-square)                           0.000       0.000
  Scaling correction factor                                  1.265
    Yuan-Bentler correction (Mplus variant)                       

Model Test Baseline Model:

  Test statistic                              2736.521    1851.720
  Degrees of freedom                                10          10
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.478

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    0.989       0.988
  Tucker-Lewis Index (TLI)                       0.979       0.976
                                                                  
  Robust Comparative Fit Index (CFI)                         0.990
  Robust Tucker-Lewis Index (TLI)                            0.980

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -6866.634   -6866.634
  Scaling correction factor                                  1.363
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -6849.565   -6849.565
  Scaling correction factor                                  1.331
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                               13753.268   13753.268
  Bayesian (BIC)                             13802.345   13802.345
  Sample-size adjusted Bayesian (SABIC)      13770.585   13770.585

Root Mean Square Error of Approximation:

  RMSEA                                          0.076       0.066
  90 Percent confidence interval - lower         0.053       0.046
  90 Percent confidence interval - upper         0.102       0.089
  P-value H_0: RMSEA <= 0.050                    0.031       0.094
  P-value H_0: RMSEA >= 0.080                    0.435       0.169
                                                                  
  Robust RMSEA                                               0.075
  90 Percent confidence interval - lower                     0.049
  90 Percent confidence interval - upper                     0.103
  P-value H_0: Robust RMSEA <= 0.050                         0.060
  P-value H_0: Robust RMSEA >= 0.080                         0.410

Standardized Root Mean Square Residual:

  SRMR                                           0.015       0.015

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  mse =~                                                                
    mse1              1.000                               0.653    0.863
    mse2              1.025    0.031   32.637    0.000    0.669    0.806
    mse3              0.900    0.034   26.629    0.000    0.588    0.835
    mse4              0.920    0.028   32.443    0.000    0.600    0.857

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  math_score ~                                                          
    mse               5.951    0.498   11.953    0.000    3.885    0.391

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .mse1              0.146    0.012   11.754    0.000    0.146    0.255
   .mse2              0.242    0.016   15.337    0.000    0.242    0.351
   .mse3              0.150    0.013   11.276    0.000    0.150    0.302
   .mse4              0.131    0.011   11.683    0.000    0.131    0.266
   .math_score       83.524    3.922   21.296    0.000   83.524    0.847
    mse               0.426    0.026   16.095    0.000    1.000    1.000
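The fit indices in the "Standard" column can be reproduced from the chi-square statistics alone, which makes it easier to see what each index measures. The sketch below uses the usual textbook formulas and assumes N (rather than N − 1) in the RMSEA denominator; the variable names are illustrative, and the robust and scaled versions reported by lavaan involve further corrections not shown here.

```r
# Standard (unscaled) statistics copied from the summary output
chisq_m <- 34.138;   df_m <- 5     # user model
chisq_b <- 2736.521; df_b <- 10    # baseline model
n <- 1000

# Misfit per degree of freedom and per observation
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * n))

# Improvement of the user model over the baseline model
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

round(c(RMSEA = rmsea, CFI = cfi, TLI = tli), 3)  # 0.076, 0.989, 0.979
```

By commonly cited rules of thumb (which vary across sources), the CFI, TLI, and SRMR point to good fit, whereas the significant chi-square and an RMSEA of 0.076 suggest there is still some misfit worth examining.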

Create a visual diagram of the SEM model

# Install and load the semPlot package

#install.packages("semPlot")
library("semPlot")

# Draw a path diagram of the fitted SEM.
# whatLabels = "std" labels each path with its standardized estimate;
# sizeMan sets the size of the observed-variable boxes and edge.label.cex the size of the path labels.

semPaths(model_fit, whatLabels = "std", sizeMan = 10, edge.label.cex = 1)

STEP 4: Model Modification

# sort. = TRUE lists the largest modification indices first

modificationindices(model_fit, sort. = TRUE)
    lhs op        rhs     mi    epc sepc.lv sepc.all sepc.nox
13 mse1 ~~       mse3 28.573 -0.045  -0.045   -0.306   -0.306
17 mse2 ~~       mse4 23.596 -0.043  -0.043   -0.245   -0.245
14 mse1 ~~       mse4 13.000  0.031   0.031    0.227    0.227
16 mse2 ~~       mse3 10.877  0.029   0.029    0.154    0.154
12 mse1 ~~       mse2  3.797  0.019   0.019    0.101    0.101
20 mse3 ~~ math_score  3.463  0.242   0.242    0.068    0.068
19 mse3 ~~       mse4  1.993  0.011   0.011    0.079    0.079
18 mse2 ~~ math_score  0.911 -0.153  -0.153   -0.034   -0.034
21 mse4 ~~ math_score  0.231 -0.060  -0.060   -0.018   -0.018
15 mse1 ~~ math_score  0.191 -0.059  -0.059   -0.017   -0.017
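Each modification index (mi) approximates how much the model chi-square would drop if that single parameter were freed, so it can be compared with the critical value of a chi-square distribution with 1 degree of freedom. The sketch below makes that comparison for the five largest indices; the vector `mi` is simply copied from the table above, and the 0.05 level is an assumed convention.

```r
# Critical value of a chi-square test with 1 df at alpha = 0.05
crit <- qchisq(0.95, df = 1)
crit  # about 3.84

# Five largest modification indices, copied from the output
mi <- c("mse1~~mse3" = 28.573, "mse2~~mse4" = 23.596,
        "mse1~~mse4" = 13.000, "mse2~~mse3" = 10.877,
        "mse1~~mse2" = 3.797)

mi > crit  # TRUE for the first four, FALSE for mse1~~mse2
```

A large index only flags a candidate parameter. Whether to free it should also rest on substantive grounds, and freeing one parameter changes the indices of the others.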

Model modification based on the modification indices (MI)

# Modify the model

model_2 <- '
          # measurement model          
          mse =~ mse1 + mse2 + mse3 + mse4
          # residual covariance suggested by the largest MI (28.573)
          mse1 ~~ mse3
          
          # structural model
          math_score ~ mse
          '

# Fit the model again

model_fit_2 <- sem(model_2, data = data, estimator = "MLR")

# Summarize the modified model

summary(model_fit_2, fit.measures=TRUE, standardized=TRUE)
lavaan 0.6-18 ended normally after 34 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        11

  Number of observations                          1000

Model Test User Model:
                                              Standard      Scaled
  Test Statistic                                 2.983       2.434
  Degrees of freedom                                 4           4
  P-value (Chi-square)                           0.561       0.656
  Scaling correction factor                                  1.226
    Yuan-Bentler correction (Mplus variant)                       

Model Test Baseline Model:

  Test statistic                              2736.521    1851.720
  Degrees of freedom                                10          10
  P-value                                        0.000       0.000
  Scaling correction factor                                  1.478

User Model versus Baseline Model:

  Comparative Fit Index (CFI)                    1.000       1.000
  Tucker-Lewis Index (TLI)                       1.001       1.002
                                                                  
  Robust Comparative Fit Index (CFI)                         1.000
  Robust Tucker-Lewis Index (TLI)                            1.002

Loglikelihood and Information Criteria:

  Loglikelihood user model (H0)              -6851.056   -6851.056
  Scaling correction factor                                  1.369
      for the MLR correction                                      
  Loglikelihood unrestricted model (H1)      -6849.565   -6849.565
  Scaling correction factor                                  1.331
      for the MLR correction                                      
                                                                  
  Akaike (AIC)                               13724.112   13724.112
  Bayesian (BIC)                             13778.097   13778.097
  Sample-size adjusted Bayesian (SABIC)      13743.161   13743.161

Root Mean Square Error of Approximation:

  RMSEA                                          0.000       0.000
  90 Percent confidence interval - lower         0.000       0.000
  90 Percent confidence interval - upper         0.042       0.034
  P-value H_0: RMSEA <= 0.050                    0.982       0.995
  P-value H_0: RMSEA >= 0.080                    0.000       0.000
                                                                  
  Robust RMSEA                                               0.000
  90 Percent confidence interval - lower                     0.000
  90 Percent confidence interval - upper                     0.042
  P-value H_0: Robust RMSEA <= 0.050                         0.979
  P-value H_0: Robust RMSEA >= 0.080                         0.000

Standardized Root Mean Square Residual:

  SRMR                                           0.006       0.006

Parameter Estimates:

  Standard errors                             Sandwich
  Information bread                           Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  mse =~                                                                
    mse1              1.000                               0.678    0.897
    mse2              0.970    0.032   29.838    0.000    0.658    0.792
    mse3              0.905    0.031   28.962    0.000    0.614    0.873
    mse4              0.863    0.029   30.102    0.000    0.585    0.835

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  math_score ~                                                          
    mse               5.653    0.476   11.885    0.000    3.835    0.386

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .mse1 ~~                                                               
   .mse3             -0.047    0.010   -4.783    0.000   -0.047   -0.408

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .mse1              0.111    0.014    7.770    0.000    0.111    0.195
   .mse2              0.257    0.016   15.760    0.000    0.257    0.373
   .mse3              0.118    0.014    8.211    0.000    0.118    0.238
   .mse4              0.148    0.012   12.411    0.000    0.148    0.302
   .math_score       83.908    3.919   21.413    0.000   83.908    0.851
    mse               0.460    0.028   16.492    0.000    1.000    1.000
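To answer the "to what extent" part of the task, the standardized regression coefficient and the explained variance can be recovered from the unstandardized estimates. The sketch below redoes that arithmetic with values copied from the summary; the variable names are illustrative, and small discrepancies from lavaan's output come from rounding the inputs to three decimals.

```r
# Unstandardized estimates copied from the modified-model summary
b_raw <- 5.653    # slope of math_score on mse
psi   <- 0.460    # variance of the latent variable mse
theta <- 83.908   # residual variance of math_score

var_math <- b_raw^2 * psi + theta      # model-implied variance of math_score
std_lv   <- b_raw * sqrt(psi)          # slope per 1 SD of mse (Std.lv)
std_all  <- std_lv / sqrt(var_math)    # fully standardized slope (Std.all)
r2       <- b_raw^2 * psi / var_math   # share of variance explained by mse

round(c(std_lv = std_lv, std_all = std_all, r2 = r2), 3)  # 3.834, 0.386, 0.149
```

In words: a one standard deviation increase in math self-efficacy goes with a math score about 3.8 points (0.386 SD) higher, and math self-efficacy accounts for roughly 15% of the variance in math scores, which matches 1 − 0.851 from the standardized residual variance. With the fitted object available, `lavInspect(model_fit_2, "rsquare")` reports the same quantity directly.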

Create a visual diagram of the modified SEM model

# Draw a path diagram of the modified SEM.
# whatLabels = "std" shows standardized estimates; sizeMan and edge.label.cex set the box and label sizes.

semPaths(model_fit_2, whatLabels = "std", sizeMan = 10, edge.label.cex = 1)
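To check whether the modified model fits significantly better than the original one, the two nested models can be compared with a chi-square difference test. Because MLR produces scaled statistics, the scaled values cannot simply be subtracted. The sketch below applies the Satorra-Bentler (2001) scaled difference formula by hand, using the standard chi-squares and scaling correction factors from the two summaries. The variable names are illustrative and the result is approximate because the inputs are rounded. In practice, `lavTestLRT(model_fit, model_fit_2)` should give a comparable test directly from the fitted objects.

```r
# Standard chi-square, df, and scaling correction factor from each summary
t0 <- 34.138; d0 <- 5; c0 <- 1.265   # original model (more restricted)
t1 <-  2.983; d1 <- 4; c1 <- 1.226   # modified model (mse1 ~~ mse3 freed)

cd    <- (d0 * c0 - d1 * c1) / (d0 - d1)   # scaling factor for the difference
t_dif <- (t0 - t1) / cd                    # scaled chi-square difference
p_val <- pchisq(t_dif, df = d0 - d1, lower.tail = FALSE)

round(c(cd = cd, scaled_diff = t_dif), 3)  # 1.421, 21.925
p_val < 0.001                              # TRUE
```

Under these assumptions the scaled difference is about 21.9 on 1 degree of freedom, so freeing the residual covariance between mse1 and mse3 improves fit significantly, in line with the lower AIC and BIC of the modified model.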

Acknowledgement

This document was created by Dr. Chi-Ning Chang and modified by Margaret Gatongi for the purposes of completing class project #3. The original version was created with assistance from ChatGPT in clarifying aspects of the R code.