EDPSY 558 Assignment Three

Setting up the Environment

rm(list=ls()) #Remove all existing objects
setwd("D:/OneDrive/class/EDPSY558/hw3") #Set the working directory
library(tidyverse) #Data Pre-processing

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(lavaan) #SEM

## This is lavaan 0.6-16
## lavaan is FREE software! Please report any bugs.

Reading in the Correlation Matrix

rothlower.cor <- '
1
.292 1
.282 .184 1
.166 .383 .386 1
.231 .277 .431 .537 1
'

# Name the Variables and convert to full matrix
rothfull.cor <- getCov(rothlower.cor, names = c("occ", "inc", "subocc", "subinc", "subses"))

# Display
print(rothfull.cor)

##          occ   inc subocc subinc subses
## occ    1.000 0.292  0.282  0.166  0.231
## inc    0.292 1.000  0.184  0.383  0.277
## subocc 0.282 0.184  1.000  0.386  0.431
## subinc 0.166 0.383  0.386  1.000  0.537
## subses 0.231 0.277  0.431  0.537  1.000

# Add the SD and convert to covariances
cov <- cor2cov(rothfull.cor, sds= c(21.277, 2.198, .640, .670, .627))

# Display
print(cov)

##               occ        inc    subocc    subinc    subses
## occ    452.710729 13.6559190 3.8400730 2.3664279 3.0816968
## inc     13.655919  4.8312040 0.2588365 0.5640288 0.3817464
## subocc   3.840073  0.2588365 0.4096000 0.1655168 0.1729517
## subinc   2.366428  0.5640288 0.1655168 0.4489000 0.2255883
## subses   3.081697  0.3817464 0.1729517 0.2255883 0.3931290
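
The conversion from correlations to covariances can be checked by hand: each covariance is the correlation multiplied by the two standard deviations, which in matrix form is D R D with D the diagonal matrix of SDs. The following is a minimal base-R sketch of that check; the names R, D, and cov_manual are introduced here for illustration only and are not part of the original script.

```r
# Full correlation matrix, typed out from the printed output above
vars <- c("occ", "inc", "subocc", "subinc", "subses")
R <- matrix(c(
  1.000, 0.292, 0.282, 0.166, 0.231,
  0.292, 1.000, 0.184, 0.383, 0.277,
  0.282, 0.184, 1.000, 0.386, 0.431,
  0.166, 0.383, 0.386, 1.000, 0.537,
  0.231, 0.277, 0.431, 0.537, 1.000
), nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Standard deviations, in the same variable order
sds <- c(21.277, 2.198, 0.640, 0.670, 0.627)

# Covariance matrix = D %*% R %*% D, where D holds the SDs on its diagonal
D <- diag(sds)
cov_manual <- D %*% R %*% D
dimnames(cov_manual) <- list(vars, vars)

round(cov_manual, 4)
```

The diagonal holds the variances (for example, 21.277^2 for occ), and the off-diagonal entries should agree with the cor2cov() output printed above.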

Specify the Path Model

model <- '
# Regression
subses ~ subinc + subocc
subinc ~ inc + subocc
subocc ~ occ + subinc

# Covariance
inc ~~ occ
'
fit <- sem(model, sample.cov = cov, sample.nobs = 432, 
           fixed.x = FALSE, sample.cov.rescale = FALSE)

summary(fit, fit.measures = TRUE, standardized = TRUE,
        rsquare = TRUE, modindices = TRUE)

## lavaan 0.6.16 ended normally after 32 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
## 
##   Number of observations                           432
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 7.327
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.062
## 
## Model Test Baseline Model:
## 
##   Test statistic                               394.912
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.989
##   Tucker-Lewis Index (TLI)                       0.963
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3964.766
##   Loglikelihood unrestricted model (H1)      -3961.103
##                                                       
##   Akaike (AIC)                                7953.533
##   Bayesian (BIC)                              8002.354
##   Sample-size adjusted Bayesian (SABIC)       7964.272
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.058
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.112
##   P-value H_0: RMSEA <= 0.050                    0.329
##   P-value H_0: RMSEA >= 0.080                    0.295
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.032
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   subses ~                                                              
##     subinc            0.408    0.039   10.356    0.000    0.408    0.436
##     subocc            0.258    0.041    6.245    0.000    0.258    0.263
##   subinc ~                                                              
##     inc               0.110    0.014    7.832    0.000    0.110    0.362
##     subocc            0.122    0.109    1.115    0.265    0.122    0.116
##   subocc ~                                                              
##     occ               0.007    0.001    5.279    0.000    0.007    0.241
##     subinc            0.238    0.098    2.425    0.015    0.238    0.249
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   inc ~~                                                                
##     occ              13.656    2.344    5.826    0.000   13.656    0.292
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .subses            0.257    0.017   14.697    0.000    0.257    0.654
##    .subinc            0.356    0.031   11.560    0.000    0.356    0.794
##    .subocc            0.333    0.024   13.824    0.000    0.333    0.815
##     inc               4.831    0.329   14.697    0.000    4.831    1.000
##     occ             452.711   30.803   14.697    0.000  452.711    1.000
## 
## R-Square:
##                    Estimate
##     subses            0.346
##     subinc            0.206
##     subocc            0.185
## 
## Modification Indices:
## 
##       lhs op    rhs    mi    epc sepc.lv sepc.all sepc.nox
## 1  subses ~~ subinc 2.961 -0.067  -0.067   -0.221   -0.221
## 2  subses ~~ subocc 5.128 -0.124  -0.124   -0.425   -0.425
## 3  subses ~~    inc 1.146  0.059   0.059    0.053    0.053
## 4  subses ~~    occ 3.409  0.945   0.945    0.088    0.088
## 5  subinc ~~ subocc 0.681 -0.065  -0.065   -0.190   -0.190
## 6  subinc ~~    inc 0.681 -0.209  -0.209   -0.159   -0.159
## 7  subinc ~~    occ 0.681  0.590   0.590    0.046    0.046
## 8  subocc ~~    inc 0.681  0.090   0.090    0.071    0.071
## 9  subocc ~~    occ 0.681 -2.970  -2.970   -0.242   -0.242
## 10 subses  ~    inc 2.961  0.021   0.021    0.072    0.072
## 11 subses  ~    occ 5.128  0.003   0.003    0.092    0.092
## 12 subinc  ~ subses 2.961 -0.260  -0.260   -0.243   -0.243
## 13 subinc  ~    occ 0.681  0.001   0.001    0.045    0.045
## 14 subocc  ~ subses 5.128 -0.483  -0.483   -0.474   -0.474
## 15 subocc  ~    inc 0.681  0.020   0.020    0.070    0.070
## 16    inc  ~ subses 1.253  0.239   0.239    0.068    0.068
## 17    inc  ~ subinc 0.681 -0.775  -0.775   -0.236   -0.236
## 18    inc  ~ subocc 0.681  0.294   0.294    0.085    0.085
## 19    occ  ~ subses 4.088  3.716   3.716    0.109    0.109
## 20    occ  ~ subinc 0.681  1.647   1.647    0.052    0.052

Question 1

Overall model fit

Chi-square(df): 7.327(3), p = .062. The nonsignificant result (p > .05) means the hypothesis of exact fit is not rejected.
CFI: 0.989. CFI > .95 indicates good fit.
RMSEA: 0.058, 90% CI [0.000, 0.112]. RMSEA < .08 indicates acceptable fit, although the upper bound of the interval exceeds .08.
SRMR: 0.032. SRMR < .08 indicates good fit.
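
The incremental and approximate fit indices can be recovered from the two chi-square statistics in the output. The sketch below uses the standard textbook formulas for RMSEA, CFI, and TLI, and assumes that N (rather than N - 1) appears in the RMSEA denominator; the variable names are illustrative and the input values are copied from the summary above.

```r
# Values copied from the lavaan summary of the original model
chisq_m <- 7.327;   df_m <- 3    # user model
chisq_b <- 394.912; df_b <- 10   # baseline (independence) model
n <- 432                         # sample size

# RMSEA: misfit per degree of freedom, scaled by sample size
# (assumption: N, not N - 1, in the denominator)
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * n))

# CFI: 1 minus the ratio of user-model to baseline-model noncentrality
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)

# TLI: compares chi-square/df ratios of the baseline and user models
tli <- ((chisq_b / df_b) - (chisq_m / df_m)) / ((chisq_b / df_b) - 1)

round(c(rmsea = rmsea, cfi = cfi, tli = tli), 3)
```

Rounded to three decimals, these should match the RMSEA, CFI, and TLI values reported in the summary.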

Local fit

R-square: subses has a moderate R-square estimate (0.346), subinc has a medium-to-low estimate (0.206), and subocc has a relatively low estimate (0.185).
Parameter estimates:

subses ~ subinc: The estimate is 0.408 with a standard error of 0.039, a z-value of 10.356, and p < 0.001. Statistically significant with a moderate standardized effect (Std.all = 0.436).
subses ~ subocc: The estimate is 0.258, statistically significant (p < 0.001) with a small-to-moderate standardized effect (Std.all = 0.263).
subinc ~ inc: Significant (p < 0.001) with a moderate standardized effect (Std.all = 0.362).
subinc ~ subocc: Not statistically significant (p = 0.265), so there is no evidence that subocc predicts subinc once inc is in the model.
subocc ~ occ: Significant (p < 0.001) with a small-to-moderate standardized effect (Std.all = 0.241).
subocc ~ subinc: Significant (p = 0.015) with a small-to-moderate standardized effect (Std.all = 0.249).

Covariances:

inc ~~ occ: Significant (p < 0.001) with a standardized estimate (correlation) of 0.292, indicating a moderate relationship.

Variances:

The residual variances of subses, subinc, and subocc are all significant, with standardized values of 0.654, 0.794, and 0.815. This means that a substantial share of the variance in each endogenous variable (roughly 65% to 82%) remains unexplained by the model.
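
The standardized residual variances (the Std.all column of the Variances block) are the complement of the R-square values, which can be verified directly. A minimal sketch, with values copied from the summary output and illustrative variable names:

```r
# Standardized residual variances (Std.all) for the endogenous variables
std_resid_var <- c(subses = 0.654, subinc = 0.794, subocc = 0.815)

# R-square is the proportion of variance that is NOT residual
r_square <- 1 - std_resid_var

r_square
```

The results should agree with the R-Square block of the output (0.346, 0.206, and 0.185).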

Overall, the local fit of the model seems reasonable. The significant regression weights for most paths show that the predictors have meaningful relationships with the endogenous variables. However, the relatively low R-square values for subinc and subocc suggest that important variables may be missing from the model. The nonsignificant path from subocc to subinc also suggests that this path should be reevaluated.

Question 2

There are two potential modifications we can make based purely on statistical indicators. First, as mentioned previously, the path from subocc to subinc is not statistically significant. In the modified model, we can fix this path to 0.

Second, two unspecified regression paths share the highest MI, namely subses ~ occ (MI = 5.128) and subocc ~ subses (MI = 5.128). However, subocc ~ subses has a SEPC of -0.474, indicating that if we freely estimated this path, subses would have a negative effect on subocc. This does not sound theoretically plausible. On the other hand, subses ~ occ has a SEPC of 0.092; the magnitude is small, and it suggests that if we freely estimated this path, occ would have a positive effect on subses, which sounds theoretically reasonable.
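
A modification index approximates the drop in the model chi-square if a single parameter were freed, so it is commonly compared against the 1-df chi-square critical value. The sketch below applies that rule of thumb (an assumption added here, not a criterion stated in the assignment) to the omitted regression paths into the three endogenous variables, with MI and sepc.all values copied from the output of the original model.

```r
# Omitted regression paths into the endogenous variables (rows 10 to 15
# of the modification index table for the original model)
mi <- data.frame(
  path = c("subses ~ inc", "subses ~ occ", "subinc ~ subses",
           "subinc ~ occ", "subocc ~ subses", "subocc ~ inc"),
  mi = c(2.961, 5.128, 2.961, 0.681, 5.128, 0.681),
  sepc_all = c(0.072, 0.092, -0.243, 0.045, -0.474, 0.070)
)

# Critical value of chi-square with 1 df at alpha = .05 (about 3.84)
crit <- qchisq(0.95, df = 1)

# Approximate p-value for each expected chi-square drop
mi$p_value <- pchisq(mi$mi, df = 1, lower.tail = FALSE)

# Keep only the paths whose MI exceeds the critical value
flagged <- mi[mi$mi > crit, ]
flagged
```

Only subses ~ occ and subocc ~ subses exceed the threshold, so the choice between them rests on the sign and plausibility of the expected parameter change rather than on the MI itself.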

Based on the above discussion, we specify a new model:

model2 <- 
  '
# Regression
subses ~ subinc + subocc + occ
subinc ~ inc + 0*subocc
subocc ~ occ + subinc

# Covariance
inc ~~ occ
  '

Now let’s look at the model fit

fit2 <- sem(model2, sample.cov = cov, sample.nobs = 432, 
           fixed.x = FALSE, sample.cov.rescale = FALSE)

summary(fit2, fit.measures = TRUE, standardized = TRUE,
        rsquare = TRUE, modindices = TRUE)

## lavaan 0.6.16 ended normally after 37 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        12
## 
##   Number of observations                           432
## 
## Model Test User Model:
##                                                       
##   Test statistic                                 3.231
##   Degrees of freedom                                 3
##   P-value (Chi-square)                           0.357
## 
## Model Test Baseline Model:
## 
##   Test statistic                               394.912
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    0.999
##   Tucker-Lewis Index (TLI)                       0.998
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -3962.718
##   Loglikelihood unrestricted model (H1)      -3961.103
##                                                       
##   Akaike (AIC)                                7949.437
##   Bayesian (BIC)                              7998.258
##   Sample-size adjusted Bayesian (SABIC)       7960.177
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.013
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.083
##   P-value H_0: RMSEA <= 0.050                    0.722
##   P-value H_0: RMSEA >= 0.080                    0.062
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.020
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   subses ~                                                              
##     subinc            0.402    0.039   10.297    0.000    0.402    0.431
##     subocc            0.234    0.042    5.546    0.000    0.234    0.239
##     occ               0.003    0.001    2.303    0.021    0.003    0.093
##   subinc ~                                                              
##     inc               0.117    0.014    8.618    0.000    0.117    0.383
##     subocc            0.000                               0.000    0.000
##   subocc ~                                                              
##     occ               0.007    0.001    5.168    0.000    0.007    0.225
##     subinc            0.333    0.041    8.044    0.000    0.333    0.350
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   occ ~~                                                                
##     inc              13.656    2.344    5.826    0.000   13.656    0.292
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .subses            0.254    0.017   14.697    0.000    0.254    0.650
##    .subinc            0.383    0.026   14.697    0.000    0.383    0.853
##    .subocc            0.329    0.022   14.697    0.000    0.329    0.809
##     occ             452.711   30.803   14.697    0.000  452.711    1.000
##     inc               4.831    0.329   14.697    0.000    4.831    1.000
## 
## R-Square:
##                    Estimate
##     subses            0.350
##     subinc            0.147
##     subocc            0.191
## 
## Modification Indices:
## 
##       lhs op    rhs    mi    epc sepc.lv sepc.all sepc.nox
## 1  subinc  ~ subocc 1.046  0.121   0.121    0.115    0.115
## 2  subses ~~ subinc 1.467 -0.049  -0.049   -0.158   -0.158
## 3  subses ~~    occ 1.467 -2.200  -2.200   -0.205   -0.205
## 4  subses ~~    inc 1.467  0.066   0.066    0.060    0.060
## 5  subinc ~~ subocc 0.154  0.018   0.018    0.051    0.051
## 6  subinc ~~    occ 1.624  0.772   0.772    0.059    0.059
## 7  subinc ~~    inc 1.624 -0.273  -0.273   -0.201   -0.201
## 8  subocc ~~    occ 0.154  0.811   0.811    0.066    0.066
## 9  subocc ~~    inc 0.154 -0.024  -0.024   -0.019   -0.019
## 10 subses  ~    inc 1.467  0.015   0.015    0.053    0.053
## 11 subinc  ~ subses 0.198 -0.063  -0.063   -0.058   -0.058
## 12 subinc  ~    occ 1.624  0.002   0.002    0.059    0.059
## 13 subocc  ~    inc 0.154 -0.006  -0.006   -0.019   -0.019
## 14    occ  ~ subses 0.477  2.056   2.056    0.060    0.060
## 15    occ  ~ subinc 1.624  2.016   2.016    0.063    0.063
## 16    occ  ~ subocc 1.571  4.751   4.751    0.142    0.142
## 17    inc  ~ subses 0.685  0.170   0.170    0.048    0.048
## 18    inc  ~ subinc 1.624 -0.713  -0.713   -0.217   -0.217
## 19    inc  ~ subocc 0.284 -0.101  -0.101   -0.029   -0.029

Global Fit:

Chi-Square: The chi-square test statistic is 3.231 with a p-value of 0.357 and 3 degrees of freedom. It drops from the previous model’s 7.327 and is non-significant, suggesting improved model fit.
CFI: The CFI value is 0.999, well above 0.95, indicating an excellent fit.
TLI: The TLI value is 0.998, also well above the threshold of 0.95, further supporting an excellent model fit.
RMSEA: The RMSEA value is 0.013, well below the 0.05 threshold, indicating a close fit.
SRMR: The SRMR value is 0.020, below the threshold of 0.08, indicating a good fit.

Local Fit:

R-Square:
- subses: R-square is 0.350, indicating a moderate amount of variance explained.
- subinc: R-square is 0.147, which is relatively low, suggesting limited explanatory power.
- subocc: R-square is 0.191, also indicating a relatively low explanatory power.
Regression Weights:
- subses ~ subinc: Significant (p < .001) with a moderate standardized effect (Std.all = 0.431).
- subses ~ subocc: Significant (p < .001) with a small-to-moderate standardized effect (Std.all = 0.239).
- subses ~ occ: Significant (p = .021), albeit with a small standardized effect (Std.all = 0.093).
- subinc ~ inc: Significant (p < .001) with a moderate standardized effect (Std.all = 0.383).
- subinc ~ subocc: The path is fixed to 0.
- subocc ~ occ: Significant (p < .001) with a small-to-moderate standardized effect (Std.all = 0.225).
- subocc ~ subinc: Significant (p < .001) with a moderate standardized effect (Std.all = 0.350).
Covariances:
- occ ~~ inc: Significant (p < .001), with a correlation of 0.292 indicating a moderate relationship.
Variances:
- The residual variances of subses, subinc, and subocc are all significant, with standardized values of 0.650, 0.853, and 0.809, so roughly 65% to 85% of the variance in the endogenous variables remains unexplained. The variances of the exogenous variables occ and inc simply reproduce their sample variances.

Overall Interpretation:

The modified model shows an excellent overall fit, with high CFI and TLI, low RMSEA and SRMR, and a nonsignificant chi-square. Because the two models are not nested (the modified model adds subses ~ occ and drops subinc ~ subocc), a chi-square difference test is not appropriate, so we compare information criteria instead.

The modified model has: AIC = 7949.437, BIC = 7998.258, SABIC = 7960.177.

The original model has: AIC = 7953.533, BIC = 8002.354, SABIC = 7964.272.

All three criteria are smaller for the modified model (by about 4.1 each, since both models estimate 12 parameters), indicating better fit than the original model.
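
The information criteria can be rebuilt from the log-likelihoods in the two summaries, which also shows why the AIC and BIC differences are identical here. A minimal sketch using the usual definitions AIC = -2LL + 2k and BIC = -2LL + k log(N); the variable names are illustrative, and the inputs are the rounded values printed above.

```r
n <- 432   # number of observations
k <- 12    # free parameters (the same in both models)

# Log-likelihoods of the user models (H0), copied from the two summaries
loglik <- c(original = -3964.766, modified = -3962.718)

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(n)

round(rbind(aic, bic), 3)

# With equal k the penalty terms cancel, so the AIC and BIC differences
# both equal the difference in -2 * log-likelihood
aic_diff <- unname(aic["original"] - aic["modified"])
bic_diff <- unname(bic["original"] - bic["modified"])
c(aic_diff = aic_diff, bic_diff = bic_diff)
```

Because the inputs are rounded to three decimals, the recomputed criteria may differ from the lavaan output in the last digit.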

Question 3

No. Comparing the original model with the modified model, it is more likely that subinc predicts subocc only than that the two have a reciprocal relationship.

In the original model, we estimated the reciprocal relationship, but subinc ~ subocc was not statistically significant (p = .265). In the modified model, we fixed subinc ~ subocc to 0, and as shown in Question 2, the modified model fits the data better overall.

This conclusion, however, is limited to the present model and data. In actual research, the situation can be much more complicated; for example, adding other variables to the model might make the reciprocal path significant.
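
The significance test behind the subinc ~ subocc result is a Wald z-test: the estimate divided by its standard error, referred to a standard normal distribution. A minimal sketch using the rounded estimate and standard error from the original model's output (the variable names are illustrative):

```r
# Estimate and standard error for subinc ~ subocc in the original model
est <- 0.122
se  <- 0.109

# Wald z statistic and two-sided p-value
z <- est / se
p <- 2 * pnorm(-abs(z))

round(c(z = z, p = p), 3)
```

Because the inputs are rounded, the result differs slightly from the z-value (1.115) and p-value (0.265) that lavaan reports from unrounded estimates, but the conclusion is the same: the path is not significant at the .05 level.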