rm(list=ls()) #Remove all existing objects
setwd("D:/OneDrive/class/EDPSY558/hw3") #Set the working directory
library(tidyverse) #Data Pre-processing
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lavaan) #SEM
## This is lavaan 0.6-16
## lavaan is FREE software! Please report any bugs.
rothlower.cor <- '
1
.292 1
.282 .184 1
.166 .383 .386 1
.231 .277 .431 .537 1
'
# Name the Variables and convert to full matrix
rothfull.cor <- getCov(rothlower.cor, names = c("occ", "inc", "subocc", "subinc", "subses"))
# Display
print(rothfull.cor)
## occ inc subocc subinc subses
## occ 1.000 0.292 0.282 0.166 0.231
## inc 0.292 1.000 0.184 0.383 0.277
## subocc 0.282 0.184 1.000 0.386 0.431
## subinc 0.166 0.383 0.386 1.000 0.537
## subses 0.231 0.277 0.431 0.537 1.000
# Add the SD and convert to covariances
cov <- cor2cov(rothfull.cor, sds= c(21.277, 2.198, .640, .670, .627))
# Display
print(cov)
## occ inc subocc subinc subses
## occ 452.710729 13.6559190 3.8400730 2.3664279 3.0816968
## inc 13.655919 4.8312040 0.2588365 0.5640288 0.3817464
## subocc 3.840073 0.2588365 0.4096000 0.1655168 0.1729517
## subinc 2.366428 0.5640288 0.1655168 0.4489000 0.2255883
## subses 3.081697 0.3817464 0.1729517 0.2255883 0.3931290
model <- '
# Regression
subses ~ subinc + subocc
subinc ~ inc + subocc
subocc ~ occ + subinc
# Covariance
inc ~~ occ
'
fit <- sem(model, sample.cov = cov, sample.nobs = 432,
fixed.x = FALSE, sample.cov.rescale = FALSE)
summary(fit, fit.measures = TRUE, standardized = TRUE,
rsquare = TRUE, modindices = TRUE)
## lavaan 0.6.16 ended normally after 32 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
##
## Number of observations 432
##
## Model Test User Model:
##
## Test statistic 7.327
## Degrees of freedom 3
## P-value (Chi-square) 0.062
##
## Model Test Baseline Model:
##
## Test statistic 394.912
## Degrees of freedom 10
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.989
## Tucker-Lewis Index (TLI) 0.963
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3964.766
## Loglikelihood unrestricted model (H1) -3961.103
##
## Akaike (AIC) 7953.533
## Bayesian (BIC) 8002.354
## Sample-size adjusted Bayesian (SABIC) 7964.272
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.058
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.112
## P-value H_0: RMSEA <= 0.050 0.329
## P-value H_0: RMSEA >= 0.080 0.295
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.032
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## subses ~
## subinc 0.408 0.039 10.356 0.000 0.408 0.436
## subocc 0.258 0.041 6.245 0.000 0.258 0.263
## subinc ~
## inc 0.110 0.014 7.832 0.000 0.110 0.362
## subocc 0.122 0.109 1.115 0.265 0.122 0.116
## subocc ~
## occ 0.007 0.001 5.279 0.000 0.007 0.241
## subinc 0.238 0.098 2.425 0.015 0.238 0.249
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## inc ~~
## occ 13.656 2.344 5.826 0.000 13.656 0.292
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .subses 0.257 0.017 14.697 0.000 0.257 0.654
## .subinc 0.356 0.031 11.560 0.000 0.356 0.794
## .subocc 0.333 0.024 13.824 0.000 0.333 0.815
## inc 4.831 0.329 14.697 0.000 4.831 1.000
## occ 452.711 30.803 14.697 0.000 452.711 1.000
##
## R-Square:
## Estimate
## subses 0.346
## subinc 0.206
## subocc 0.185
##
## Modification Indices:
##
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 1 subses ~~ subinc 2.961 -0.067 -0.067 -0.221 -0.221
## 2 subses ~~ subocc 5.128 -0.124 -0.124 -0.425 -0.425
## 3 subses ~~ inc 1.146 0.059 0.059 0.053 0.053
## 4 subses ~~ occ 3.409 0.945 0.945 0.088 0.088
## 5 subinc ~~ subocc 0.681 -0.065 -0.065 -0.190 -0.190
## 6 subinc ~~ inc 0.681 -0.209 -0.209 -0.159 -0.159
## 7 subinc ~~ occ 0.681 0.590 0.590 0.046 0.046
## 8 subocc ~~ inc 0.681 0.090 0.090 0.071 0.071
## 9 subocc ~~ occ 0.681 -2.970 -2.970 -0.242 -0.242
## 10 subses ~ inc 2.961 0.021 0.021 0.072 0.072
## 11 subses ~ occ 5.128 0.003 0.003 0.092 0.092
## 12 subinc ~ subses 2.961 -0.260 -0.260 -0.243 -0.243
## 13 subinc ~ occ 0.681 0.001 0.001 0.045 0.045
## 14 subocc ~ subses 5.128 -0.483 -0.483 -0.474 -0.474
## 15 subocc ~ inc 0.681 0.020 0.020 0.070 0.070
## 16 inc ~ subses 1.253 0.239 0.239 0.068 0.068
## 17 inc ~ subinc 0.681 -0.775 -0.775 -0.236 -0.236
## 18 inc ~ subocc 0.681 0.294 0.294 0.085 0.085
## 19 occ ~ subses 4.088 3.716 3.716 0.109 0.109
## 20 occ ~ subinc 0.681 1.647 1.647 0.052 0.052
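The overall model fit
Chi-Square(df): 7.327(3), p = .062 > .05. The nonsignificant result means we cannot reject the hypothesis of exact fit, which suggests the model fits well.
CFI: 0.989. CFI > .95 indicates good fit (TLI = 0.963 is also above .95).
RMSEA: 0.058. RMSEA < .08 indicates acceptable fit, although the 90% confidence interval is wide (0.000 to 0.112).
SRMR: 0.032. SRMR < .08 indicates good fit.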
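To see where these indices come from, the sketch below recomputes the chi-square p-value, RMSEA, CFI, and TLI from the chi-square statistics printed for the first model, using the standard textbook formulas. It is a check on the printed (rounded) values, not lavaan's own code; lavaan's exact RMSEA convention (N or N - 1 in the denominator) is an assumption here, but with N = 432 both choices give 0.058.

```r
# Printed values from summary(fit) for the first model
chisq_m <- 7.327;   df_m <- 3    # user model
chisq_b <- 394.912; df_b <- 10   # baseline (independence) model
N <- 432

# p-value of the chi-square test of exact fit
p_value <- pchisq(chisq_m, df_m, lower.tail = FALSE)

# RMSEA, using N in the denominator (assumed convention)
rmsea <- sqrt(max(chisq_m - df_m, 0) / (df_m * N))

# CFI: 1 - (user-model misfit) / (baseline misfit), misfit = chi-square minus df
cfi <- 1 - max(chisq_m - df_m, 0) / max(chisq_b - df_b, chisq_m - df_m, 0)

# TLI: compares the chi-square/df ratios of the user and baseline models
tli <- (chisq_b / df_b - chisq_m / df_m) / (chisq_b / df_b - 1)

round(c(p = p_value, RMSEA = rmsea, CFI = cfi, TLI = tli), 3)
```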
The local fit
R-square: subses has a moderate R-square estimate (0.346), subinc has a medium-to-low R-square estimate (0.206), and subocc has a relatively low R-square estimate (0.185).
Parameter estimates:
subses ~ subinc: The estimate is 0.408 with a standard error of 0.039, z-value of 10.356, and p-value < 0.001. Statistically significant with a moderate standardized effect (Std.all = 0.436).
subses ~ subocc: The estimate is 0.258, statistically significant (p < 0.001) with a relatively small standardized effect (Std.all = 0.263).
subinc ~ inc: Significant (p < 0.001) with a moderate standardized effect of 0.362.
subinc ~ subocc: Not statistically significant (p = 0.265), suggesting that subocc is not a strong predictor of subinc given our model.
subocc ~ occ: Significant (p < 0.001) with a moderate effect size (Std.all = 0.241).
subocc ~ subinc: Significant (p = 0.015) with a moderate effect size (Std.all = 0.249).
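The z-values and p-values in the table follow the usual Wald test: z = estimate / SE, with a two-sided p-value from the standard normal distribution. A minimal sketch for the two paths of the reciprocal relationship, using the rounded printed values (so estimate / SE only approximately reproduces the printed z):

```r
# Estimates, standard errors, and z-values as printed for the first model
est <- c(subinc_on_subocc = 0.122, subocc_on_subinc = 0.238)
se  <- c(subinc_on_subocc = 0.109, subocc_on_subinc = 0.098)
z_printed <- c(subinc_on_subocc = 1.115, subocc_on_subinc = 2.425)

# Wald z = estimate / SE (slightly off from the printed z because inputs are rounded)
z_approx <- est / se

# Two-sided p-value from the standard normal, using the printed z
p_two_sided <- 2 * pnorm(-abs(z_printed))
round(p_two_sided, 3)
```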
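The residual (error) variances of subses, subinc, and subocc are all significant. This indicates that a nonzero portion of the variance in each endogenous variable remains unexplained by the model, not that the model explains a significant portion of it. The proportion explained is given by the R-square values above, which equal 1 minus the standardized residual variances (e.g., 1 - 0.654 = 0.346 for subses).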
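A quick numeric check of the relationship between R-square and residual variance, using the Std.all column of the first model (a sketch on the printed, rounded values):

```r
# Standardized (Std.all) residual variances of the endogenous variables, first model
resid_std <- c(subses = 0.654, subinc = 0.794, subocc = 0.815)

# Proportion of variance explained = 1 - proportion left unexplained
r_square <- 1 - resid_std
r_square

# Same idea on the unstandardized scale for subses:
# residual variance (0.257) divided by the observed variance of subses (0.627^2 = 0.393129)
r_square_subses <- 1 - 0.257 / 0.393129
round(r_square_subses, 3)
```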
Overall, the local fit of the model seems reasonable. Most regression weights are significant, showing that the predictors have meaningful relationships with the endogenous variables. However, the relatively low R-square values for subinc and subocc suggest that other important variables may be missing from the model. The nonsignificant path from subocc to subinc also suggests that this path should be reevaluated.
There are two potential modifications we can make based purely on statistical indicators. First, as mentioned previously, the path from subocc to subinc is not statistically significant, so in the modified model we can fix this path to 0. Second, two unspecified paths have high MIs, namely subses ~ occ (MI = 5.128) and subocc ~ subses (MI = 5.128). However, subocc ~ subses has an SEPC (sepc.all) of -0.474, indicating that if we freely estimated this path, subses would have a negative effect on subocc. This is not theoretically sound. On the other hand, subses ~ occ has an SEPC of 0.092; the magnitude is small, but it suggests that if we freely estimated this path, occ would have a positive effect on subses, which is theoretically reasonable.
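One common way to decide which MIs are "high" is to compare them with the .05 critical value of a chi-square distribution with 1 df (about 3.84), since each MI approximates the drop in chi-square from freeing one parameter. The sketch below applies that cutoff to a subset of the MIs printed for the first model; the data frame `mi_tab` is typed in by hand from the output and is illustrative only. In lavaan itself, `modindices()` accepts `sort.` and `minimum.value` arguments that can do the same filtering on a fitted model.

```r
# A subset of the modification indices printed for the first model
mi_tab <- data.frame(
  lhs = c("subses", "subses", "subses", "subses", "subses", "subinc", "subocc", "occ"),
  op  = c("~~",     "~~",     "~~",     "~",      "~",      "~",      "~",      "~"),
  rhs = c("subinc", "subocc", "occ",    "inc",    "occ",    "subses", "subses", "subses"),
  mi  = c(2.961,    5.128,    3.409,    2.961,    5.128,    2.961,    5.128,    4.088),
  sepc.all = c(-0.221, -0.425, 0.088,  0.072,    0.092,    -0.243,   -0.474,   0.109)
)

# Critical value of a chi-square with 1 df at alpha = .05 (about 3.84)
cutoff <- qchisq(0.95, df = 1)

# Keep MIs above the cutoff, largest first
flagged <- mi_tab[mi_tab$mi > cutoff, ]
flagged[order(-flagged$mi), ]
```

Four modifications pass the cutoff. Three of them (subses ~~ subocc, subses ~ occ, subocc ~ subses) share the same MI of 5.128, which suggests they are alternative ways of relaxing the same restriction; the fourth, occ ~ subses, would turn the exogenous occ into an outcome.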
Based on the above discussion, we specify a new model:
model2 <-
'
# Regression
subses ~ subinc + subocc + occ
subinc ~ inc + 0*subocc
subocc ~ occ + subinc
# Covariance
inc ~~ occ
'
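Because model2 adds one path (subses ~ occ) and fixes another (subinc ~ subocc) to 0, its number of free parameters and degrees of freedom should match the first model. A small bookkeeping sketch, assuming the parameter counts implied by `fixed.x = FALSE` (the variances and covariance of the exogenous occ and inc are estimated):

```r
p <- 5                          # observed variables
moments <- p * (p + 1) / 2      # unique variances and covariances = 15

# Free parameters in model2
n_regressions <- 3 + 1 + 2      # subses ~ 3 predictors; subinc ~ inc (subocc fixed to 0); subocc ~ 2
n_covariances <- 1              # inc ~~ occ
n_variances   <- 3 + 2          # 3 residual variances + 2 exogenous variances
k <- n_regressions + n_covariances + n_variances

c(free_parameters = k, df = moments - k)
```

This agrees with the 12 parameters and 3 degrees of freedom reported for both models, and it also shows why the two models are not nested: each one frees a path that the other fixes.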
Now let’s look at the model fit
fit2 <- sem(model2, sample.cov = cov, sample.nobs = 432,
fixed.x = FALSE, sample.cov.rescale = FALSE)
summary(fit2, fit.measures = TRUE, standardized = TRUE,
rsquare = TRUE, modindices = TRUE)
## lavaan 0.6.16 ended normally after 37 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 12
##
## Number of observations 432
##
## Model Test User Model:
##
## Test statistic 3.231
## Degrees of freedom 3
## P-value (Chi-square) 0.357
##
## Model Test Baseline Model:
##
## Test statistic 394.912
## Degrees of freedom 10
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.999
## Tucker-Lewis Index (TLI) 0.998
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -3962.718
## Loglikelihood unrestricted model (H1) -3961.103
##
## Akaike (AIC) 7949.437
## Bayesian (BIC) 7998.258
## Sample-size adjusted Bayesian (SABIC) 7960.177
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.013
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.083
## P-value H_0: RMSEA <= 0.050 0.722
## P-value H_0: RMSEA >= 0.080 0.062
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.020
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## subses ~
## subinc 0.402 0.039 10.297 0.000 0.402 0.431
## subocc 0.234 0.042 5.546 0.000 0.234 0.239
## occ 0.003 0.001 2.303 0.021 0.003 0.093
## subinc ~
## inc 0.117 0.014 8.618 0.000 0.117 0.383
## subocc 0.000 0.000 0.000
## subocc ~
## occ 0.007 0.001 5.168 0.000 0.007 0.225
## subinc 0.333 0.041 8.044 0.000 0.333 0.350
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## occ ~~
## inc 13.656 2.344 5.826 0.000 13.656 0.292
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .subses 0.254 0.017 14.697 0.000 0.254 0.650
## .subinc 0.383 0.026 14.697 0.000 0.383 0.853
## .subocc 0.329 0.022 14.697 0.000 0.329 0.809
## occ 452.711 30.803 14.697 0.000 452.711 1.000
## inc 4.831 0.329 14.697 0.000 4.831 1.000
##
## R-Square:
## Estimate
## subses 0.350
## subinc 0.147
## subocc 0.191
##
## Modification Indices:
##
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 1 subinc ~ subocc 1.046 0.121 0.121 0.115 0.115
## 2 subses ~~ subinc 1.467 -0.049 -0.049 -0.158 -0.158
## 3 subses ~~ occ 1.467 -2.200 -2.200 -0.205 -0.205
## 4 subses ~~ inc 1.467 0.066 0.066 0.060 0.060
## 5 subinc ~~ subocc 0.154 0.018 0.018 0.051 0.051
## 6 subinc ~~ occ 1.624 0.772 0.772 0.059 0.059
## 7 subinc ~~ inc 1.624 -0.273 -0.273 -0.201 -0.201
## 8 subocc ~~ occ 0.154 0.811 0.811 0.066 0.066
## 9 subocc ~~ inc 0.154 -0.024 -0.024 -0.019 -0.019
## 10 subses ~ inc 1.467 0.015 0.015 0.053 0.053
## 11 subinc ~ subses 0.198 -0.063 -0.063 -0.058 -0.058
## 12 subinc ~ occ 1.624 0.002 0.002 0.059 0.059
## 13 subocc ~ inc 0.154 -0.006 -0.006 -0.019 -0.019
## 14 occ ~ subses 0.477 2.056 2.056 0.060 0.060
## 15 occ ~ subinc 1.624 2.016 2.016 0.063 0.063
## 16 occ ~ subocc 1.571 4.751 4.751 0.142 0.142
## 17 inc ~ subses 0.685 0.170 0.170 0.048 0.048
## 18 inc ~ subinc 1.624 -0.713 -0.713 -0.217 -0.217
## 19 inc ~ subocc 0.284 -0.101 -0.101 -0.029 -0.029
Global Fit:
Chi-Square: The chi-square test statistic is 3.231 with a p-value of 0.357 and 3 degrees of freedom. It drops from the previous model’s 7.327 and is non-significant, suggesting improved model fit.
CFI: The CFI value is 0.999, well above 0.95, indicating an excellent fit.
TLI: The TLI value is 0.998, also well above the threshold of 0.95, further supporting an excellent model fit.
RMSEA: The RMSEA value is 0.013, well below the 0.05 threshold, indicating a close fit.
SRMR: The SRMR value is 0.020, below the threshold of 0.08, indicating a good fit.
Local Fit:
R-Square:
subses: R-square is 0.350, indicating a moderate amount of variance explained.
subinc: R-square is 0.147, which is relatively low, suggesting limited explanatory power.
subocc: R-square is 0.191, also indicating relatively low explanatory power.
Regression Weights (standardized estimates, Std.all):
subses ~ subinc: Significant with a relatively large standardized effect (.431).
subses ~ subocc: Significant with a moderate standardized effect (.239).
subses ~ occ: Significant (p = .021), albeit with a small standardized effect (.093).
subinc ~ inc: Significant with a moderate standardized effect (.383).
subinc ~ subocc: The path is fixed to 0.
subocc ~ occ: Significant with a small-to-moderate standardized effect (.225).
subocc ~ subinc: Significant with a moderate standardized effect (.350).
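The Std.all column is, roughly, the unstandardized estimate rescaled by the ratio of standard deviations: Std.all = b × SD(predictor) / SD(outcome). The sketch below uses the sample SDs; lavaan uses model-implied SDs, which are close to these here because the model fits well, so agreement is only approximate. The occ paths are left out because rounding 0.003 and 0.007 to three decimals loses too much precision. The same formula explains why the unstandardized occ coefficients look tiny: occ has an SD of 21.277, while the subjective measures have SDs around 0.63 to 0.67.

```r
# Sample SDs used to build the covariance matrix
sds <- c(occ = 21.277, inc = 2.198, subocc = .640, subinc = .670, subses = .627)

# Unstandardized estimates from model2 (as printed, 3 decimals)
b    <- c(subses_on_subinc = 0.402, subinc_on_inc = 0.117, subocc_on_subinc = 0.333)
pred <- c("subinc", "inc", "subinc")    # predictor of each path
outc <- c("subses", "subinc", "subocc") # outcome of each path

# Std.all is approximately b * SD(predictor) / SD(outcome)
std_all <- b * sds[pred] / sds[outc]
round(std_all, 3)
```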
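Covariances:
occ ~~ inc: 13.656 (SE = 2.344, p < .001), corresponding to a correlation of .292.
Variances:
The residual variances of subses, subinc, and subocc are all significant, meaning that a nonzero portion of the variance in each endogenous variable remains unexplained. The variances of occ and inc are simply the variances of the exogenous variables (Std.all = 1.000); the model does not explain them.
Overall Interpretation: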
The modified model shows an excellent overall fit, with high CFI and TLI, low RMSEA and SRMR, and a non-significant chi-square. Because our two models are not nested (each frees a path the other fixes), a chi-square difference test is not appropriate, so we compare them with information criteria. The new model has: Akaike (AIC) 7949.437, Bayesian (BIC) 7998.258, Sample-size adjusted Bayesian (SABIC) 7960.177.
The previous model has: Akaike (AIC) 7953.533, Bayesian (BIC) 8002.354, Sample-size adjusted Bayesian (SABIC) 7964.272.
AIC, BIC, and SABIC are all smaller for the modified model, indicating a better fit than the previous model.
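Since both models estimate k = 12 parameters, their AIC and BIC penalty terms are identical, and the difference in either criterion reduces to twice the difference in log-likelihoods, which here equals the difference in chi-square statistics (7.327 - 3.231 = 4.096). A sketch reproducing the printed criteria from the log-likelihoods, using AIC = -2LL + 2k and BIC = -2LL + k ln(N):

```r
N <- 432
k <- 12                                     # free parameters in both models
loglik <- c(original = -3964.766, modified = -3962.718)

aic <- -2 * loglik + 2 * k
bic <- -2 * loglik + k * log(N)
round(rbind(AIC = aic, BIC = bic), 3)

# With equal k the penalties cancel, so delta AIC = delta BIC = 2 * delta LL
diff_aic   <- aic[["original"]] - aic[["modified"]]
diff_chisq <- 7.327 - 3.231
c(delta_AIC = diff_aic, delta_chisq = diff_chisq)
```

Small discrepancies in the third decimal come from the log-likelihoods being printed to three decimals.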
No. Comparing our original model with the modified model, it is more likely that only subinc predicts subocc, rather than the two having a reciprocal relationship.
In the original model, we estimated the reciprocal relationship. However, subinc ~ subocc is not statistically significant at the p < .05 level. In the modified model, we fixed subinc ~ subocc to 0, and we have shown in question 2 that the modified model fits the data better overall. Consistent with this, in the modified model the MI for freeing subinc ~ subocc is only 1.046.
That said, this conclusion is limited to this assignment. In actual research, the situation can be much more complicated. For example, adding other variables to the model might make the reciprocal relationship significant.