The data set (project3_data.csv) contains the following variables:
id: student ID
female: gender (1 = female; 0 = male)
ses: socioeconomic status
math_score: math test score
mse1: do an excellent job on math tests
mse2: understand math textbook
mse3: master skills taught in math course
mse4: do an excellent job on math assignments
Math self-efficacy can be measured using four continuous items: mse1, mse2, mse3, and mse4. Each item is scored on a scale ranging from 1 to 4, where a higher score indicates a higher level of math self-efficacy.
Theoretically, students’ math motivational factors, including math self-efficacy, are expected to positively predict their math achievement at a later time (Saw & Chang, 2018). Please test the predictive validity of math self-efficacy by using Structural Equation Modeling (SEM) to examine whether, and to what extent, math self-efficacy positively predicts math test scores, as illustrated in Figure 1. Report your findings and include your R code. (5 points)
Project 3: Structural Equation Modeling (SEM)
Load Data
# Example data
data <- read.csv("project3_data.csv")
head(data) # Display the first six rows
# Install and load the MVN package
#install.packages("MVN")
library(MVN)
# Test multivariate normality for all observed continuous variables
hz_test <- mvn(data = data[, c("mse1", "mse2", "mse3", "mse4", "ses", "math_score")], mvnTest = "hz")
# Display the results
hz_test
$multivariateNormality
Test HZ p value MVN
1 Henze-Zirkler 13.56932 0 NO
$univariateNormality
Test Variable Statistic p value Normality
1 Anderson-Darling mse1 78.9824 <0.001 NO
2 Anderson-Darling mse2 60.2503 <0.001 NO
3 Anderson-Darling mse3 88.1124 <0.001 NO
4 Anderson-Darling mse4 91.8098 <0.001 NO
5 Anderson-Darling ses 2.3492 <0.001 NO
6 Anderson-Darling math_score 1.2641 0.0027 NO
$Descriptives
n Mean Std.Dev Median Min Max 25th 75th
mse1 1000 3.0370000 0.7564411 3.00000 1.0000 4.0000 3.00000 4.0000
mse2 1000 2.7570000 0.8310485 3.00000 1.0000 4.0000 2.00000 3.0000
mse3 1000 3.0320000 0.7038973 3.00000 1.0000 4.0000 3.00000 3.0000
mse4 1000 3.1140000 0.7010674 3.00000 1.0000 4.0000 3.00000 4.0000
ses 1000 0.1602087 0.7918316 0.09930 -1.6616 2.5668 -0.40240 0.7069
math_score 1000 52.2871577 9.9354092 51.95995 24.6926 79.5300 46.49687 59.1433
Skew Kurtosis
mse1 -0.5602597 0.16475493
mse2 -0.2329040 -0.51117610
mse3 -0.4230704 0.16036508
mse4 -0.5787041 0.49580175
ses 0.3382118 -0.07924449
math_score -0.1237308 0.02366755
STEP 1: Model Specification
# Install and load the lavaan package
#install.packages("lavaan")
library(lavaan)
This is lavaan 0.6-18
lavaan is FREE software! Please report any bugs.
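# Specify the model
model <- '
  # measurement model
  mse =~ mse1 + mse2 + mse3 + mse4
  # structural model
  math_score ~ mse
'
# Fit the model with robust maximum likelihood (MLR)
model_fit <- sem(model, data = data, estimator = "MLR")
# Summarize the model
summary(model_fit, fit.measures = TRUE, standardized = TRUE)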
lavaan 0.6-18 ended normally after 33 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 10
Number of observations 1000
Model Test User Model:
Standard Scaled
Test Statistic 34.138 26.982
Degrees of freedom 5 5
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.265
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 2736.521 1851.720
Degrees of freedom 10 10
P-value 0.000 0.000
Scaling correction factor 1.478
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.989 0.988
Tucker-Lewis Index (TLI) 0.979 0.976
Robust Comparative Fit Index (CFI) 0.990
Robust Tucker-Lewis Index (TLI) 0.980
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -6866.634 -6866.634
Scaling correction factor 1.363
for the MLR correction
Loglikelihood unrestricted model (H1) -6849.565 -6849.565
Scaling correction factor 1.331
for the MLR correction
Akaike (AIC) 13753.268 13753.268
Bayesian (BIC) 13802.345 13802.345
Sample-size adjusted Bayesian (SABIC) 13770.585 13770.585
Root Mean Square Error of Approximation:
RMSEA 0.076 0.066
90 Percent confidence interval - lower 0.053 0.046
90 Percent confidence interval - upper 0.102 0.089
P-value H_0: RMSEA <= 0.050 0.031 0.094
P-value H_0: RMSEA >= 0.080 0.435 0.169
Robust RMSEA 0.075
90 Percent confidence interval - lower 0.049
90 Percent confidence interval - upper 0.103
P-value H_0: Robust RMSEA <= 0.050 0.060
P-value H_0: Robust RMSEA >= 0.080 0.410
Standardized Root Mean Square Residual:
SRMR 0.015 0.015
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
mse =~
mse1 1.000 0.653 0.863
mse2 1.025 0.031 32.637 0.000 0.669 0.806
mse3 0.900 0.034 26.629 0.000 0.588 0.835
mse4 0.920 0.028 32.443 0.000 0.600 0.857
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
math_score ~
mse 5.951 0.498 11.953 0.000 3.885 0.391
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mse1 0.146 0.012 11.754 0.000 0.146 0.255
.mse2 0.242 0.016 15.337 0.000 0.242 0.351
.mse3 0.150 0.013 11.276 0.000 0.150 0.302
.mse4 0.131 0.011 11.683 0.000 0.131 0.266
.math_score 83.524 3.922 21.296 0.000 83.524 0.847
mse 0.426 0.026 16.095 0.000 1.000 1.000
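The fit indices in the summary above can be checked by hand from the reported chi-square values. The following is a minimal sketch in base R, assuming the usual textbook formulas for CFI, TLI, and RMSEA (with N rather than N - 1 in the RMSEA denominator); all input numbers are copied from the output above, and the variable names are chosen here for illustration only.

```r
# Values copied from the lavaan summary above (standard and scaled columns)
n       <- 1000
chisq   <- c(standard = 34.138,   scaled = 26.982)    # user model
df      <- 5
chisq_b <- c(standard = 2736.521, scaled = 1851.720)  # baseline model
df_b    <- 10

# Degrees of freedom: unique variances and covariances minus free parameters
p         <- 5                  # mse1 to mse4 plus math_score
n_moments <- p * (p + 1) / 2    # 15 unique variances and covariances
n_params  <- 10                 # "Number of model parameters" in the output
df_check  <- n_moments - n_params

# The scaled statistic is the standard statistic divided by the scaling
# correction factor (approximate here, because 1.265 is a rounded value)
scaled_check <- chisq[["standard"]] / 1.265

# Fit indices computed from the chi-square values
cfi   <- 1 - pmax(chisq - df, 0) / pmax(chisq_b - df_b, chisq - df, 0)
tli   <- (chisq_b / df_b - chisq / df) / (chisq_b / df_b - 1)
rmsea <- sqrt(pmax(chisq - df, 0) / (df * n))

round(rbind(cfi, tli, rmsea), 3)
```

The rounded results should agree with the CFI, TLI, and RMSEA rows reported above. The RMSEA of 0.076 (standard) and the significant chi-square test are what motivate inspecting the modification indices in the next step.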
Create a visual diagram of the SEM model:
# Install and load the semPlot package
#install.packages("semPlot")
library("semPlot")
# Create a visual diagram of the CFA or SEM model.
# "std" => displaying standardized values; change sizeMan and edge.label.cex to adjust the font size
semPaths(model_fit, whatLabels = "std", sizeMan = 10, edge.label.cex = 1)
STEP 4: Model Modification
# sort. = TRUE: sort the output by modification index value. Higher values appear first.
modificationindices(model_fit, sort. = TRUE)
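A modification index approximates the expected drop in the model chi-square if one fixed parameter were freed, so it is commonly compared against a chi-square distribution with 1 degree of freedom. The sketch below computes the usual cutoffs in base R; the helper `mi_p_value` is a hypothetical name introduced here for illustration and is not part of lavaan.

```r
# Critical chi-square values with 1 degree of freedom
crit_05 <- qchisq(0.95, df = 1)   # about 3.84
crit_01 <- qchisq(0.99, df = 1)   # about 6.63

# Hypothetical helper: approximate p-value for a single modification index
mi_p_value <- function(mi) {
  pchisq(mi, df = 1, lower.tail = FALSE)
}

round(c(crit_05 = crit_05, crit_01 = crit_01), 2)

# With a fitted lavaan object, small indices can be hidden, for example:
# modificationindices(model_fit, sort. = TRUE, minimum.value = crit_05)
```

A large index alone is not sufficient reason to free a parameter; the added path should also be defensible on substantive grounds.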
# Modify the model
model_2 <- '
  # measurement model
  mse =~ mse1 + mse2 + mse3 + mse4
  mse1 ~~ mse3
  # structural model
  math_score ~ mse
'
# Fit the model again
model_fit_2 <- sem(model_2, data = data, estimator = "MLR")
# Summarize the modified model
summary(model_fit_2, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6-18 ended normally after 34 iterations
Estimator ML
Optimization method NLMINB
Number of model parameters 11
Number of observations 1000
Model Test User Model:
Standard Scaled
Test Statistic 2.983 2.434
Degrees of freedom 4 4
P-value (Chi-square) 0.561 0.656
Scaling correction factor 1.226
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 2736.521 1851.720
Degrees of freedom 10 10
P-value 0.000 0.000
Scaling correction factor 1.478
User Model versus Baseline Model:
Comparative Fit Index (CFI) 1.000 1.000
Tucker-Lewis Index (TLI) 1.001 1.002
Robust Comparative Fit Index (CFI) 1.000
Robust Tucker-Lewis Index (TLI) 1.002
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -6851.056 -6851.056
Scaling correction factor 1.369
for the MLR correction
Loglikelihood unrestricted model (H1) -6849.565 -6849.565
Scaling correction factor 1.331
for the MLR correction
Akaike (AIC) 13724.112 13724.112
Bayesian (BIC) 13778.097 13778.097
Sample-size adjusted Bayesian (SABIC) 13743.161 13743.161
Root Mean Square Error of Approximation:
RMSEA 0.000 0.000
90 Percent confidence interval - lower 0.000 0.000
90 Percent confidence interval - upper 0.042 0.034
P-value H_0: RMSEA <= 0.050 0.982 0.995
P-value H_0: RMSEA >= 0.080 0.000 0.000
Robust RMSEA 0.000
90 Percent confidence interval - lower 0.000
90 Percent confidence interval - upper 0.042
P-value H_0: Robust RMSEA <= 0.050 0.979
P-value H_0: Robust RMSEA >= 0.080 0.000
Standardized Root Mean Square Residual:
SRMR 0.006 0.006
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
mse =~
mse1 1.000 0.678 0.897
mse2 0.970 0.032 29.838 0.000 0.658 0.792
mse3 0.905 0.031 28.962 0.000 0.614 0.873
mse4 0.863 0.029 30.102 0.000 0.585 0.835
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
math_score ~
mse 5.653 0.476 11.885 0.000 3.835 0.386
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mse1 ~~
.mse3 -0.047 0.010 -4.783 0.000 -0.047 -0.408
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.mse1 0.111 0.014 7.770 0.000 0.111 0.195
.mse2 0.257 0.016 15.760 0.000 0.257 0.373
.mse3 0.118 0.014 8.211 0.000 0.118 0.238
.mse4 0.148 0.012 12.411 0.000 0.148 0.302
.math_score 83.908 3.919 21.413 0.000 83.908 0.851
mse 0.460 0.028 16.492 0.000 1.000 1.000
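Because the two models are nested and were estimated with MLR, the improvement in fit can be checked with a scaled chi-square difference test. The sketch below assumes the Satorra-Bentler (2001) scaled difference formula and uses only numbers copied from the two summaries in this document; the variable names and the `ic` helper are illustrative. It also recomputes AIC and BIC from the reported loglikelihoods.

```r
# Initial model (0) and modified model (1): standard chi-square,
# degrees of freedom, and scaling correction factor
t0 <- 34.138; d0 <- 5; c0 <- 1.265
t1 <- 2.983;  d1 <- 4; c1 <- 1.226

# Scaling factor for the difference test, then the scaled difference
cd      <- (d0 * c0 - d1 * c1) / (d0 - d1)
trd     <- (t0 - t1) / cd
p_value <- pchisq(trd, df = d0 - d1, lower.tail = FALSE)

# Information criteria from the loglikelihood, parameter count, and sample size
ic <- function(loglik, k, n) {
  c(AIC = -2 * loglik + 2 * k, BIC = -2 * loglik + k * log(n))
}
ic0 <- ic(-6866.634, k = 10, n = 1000)   # initial model
ic1 <- ic(-6851.056, k = 11, n = 1000)   # modified model

round(c(scaled_diff = trd, df = d0 - d1), 3)
round(rbind(initial = ic0, modified = ic1), 3)
```

With the fitted objects available, `lavTestLRT(model_fit, model_fit_2)` should give a comparable test directly; the hand calculation can differ slightly because the scaling factors above are rounded.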
Create a visual diagram of the modified SEM model:
# Create a visual diagram of the modified SEM model.
# "std" => displaying standardized values; change sizeMan and edge.label.cex to adjust the font size
semPaths(model_fit_2, whatLabels = "std", sizeMan = 10, edge.label.cex = 1)
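The standardized estimates from the modified model also indicate how much of the variance in math_score is accounted for by math self-efficacy. The sketch below is a rough check in base R using values copied from the modified model summary; with a single predictor, the squared standardized coefficient and one minus the standardized residual variance should agree up to rounding.

```r
# Values copied from the modified model summary (Std.all column)
beta_std  <- 0.386   # math_score ~ mse
resid_std <- 0.851   # standardized residual variance of math_score

r2_from_beta  <- beta_std^2       # squared standardized coefficient
r2_from_resid <- 1 - resid_std    # one minus standardized residual variance

# Std.lv for the regression equals the raw slope times the latent SD
slope_raw <- 5.653
var_mse   <- 0.460
std_lv    <- slope_raw * sqrt(var_mse)

round(c(r2_from_beta = r2_from_beta, r2_from_resid = r2_from_resid,
        std_lv = std_lv), 3)

# With the fitted object, lavInspect(model_fit_2, "rsquare") reports R-squared
```

Both routes give an R-squared of about 0.15, which suggests that math self-efficacy accounts for roughly 15% of the variance in math test scores in this sample.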
Acknowledgement
This document was created by Dr. Chi-Ning Chang and modified by Margaret Gatongi for the purposes of completing class project #3. The original version was created with assistance from ChatGPT in clarifying aspects of the R code.