We will use the same ELS dataset that we used in logistic regression week. For packages, we will use tidyverse and lavaan, plus haven to read in the Stata data file.
library(tidyverse)
library(lavaan)
library(haven)
els <- read_dta("ELS_Logistic.dta")
glimpse(els)
Rows: 16,197
Columns: 55
$ stu_id <dbl> 101101, 101102, 101104, 101105, 101106, 101107, 101108…
$ sch_id <dbl> 1011, 1011, 1011, 1011, 1011, 1011, 1011, 1011, 1011, …
$ strat_id <dbl> 101, 101, 101, 101, 101, 101, 101, 101, 101, 101, 101,…
$ psu <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ f1sch_id <dbl+lbl> 1011, 1011, 1011, 1011, 1011, 1011, 1011, -8, …
$ f1univ1 <dbl+lbl> 101, 101, 101, 101, 101, 101, 101, 107, 107, 101, …
$ f1univ2a <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ f1univ2b <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 7, 7, 1, 1, 1, 1, 1, 1, 1, 1,…
$ f2univ_p <dbl+lbl> 103, 101, 101, 101, 101, 101, 101, 104, 105, 101, …
$ bystuwt <dbl+lbl> 178.9513, 28.2951, 589.7248, 235.7822, 178.9513, …
$ bysex <dbl+lbl> 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2,…
$ byrace <dbl+lbl> 5, 2, 7, 3, 4, 4, 4, 7, 4, 3, 3, 4, 3, 2, 2, 3, 3,…
$ bystlang <dbl+lbl> 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,…
$ byses1 <dbl+lbl> -0.25, 0.58, -0.85, -0.80, -1.41, -1.07, 0.27, -…
$ byses1qu <dbl+lbl> 2, 4, 1, 1, 1, 1, 3, 2, 1, 1, 2, 2, 1, 2, 4, 3, 3,…
$ bygrdrpt <dbl+lbl> 0, 0, 0, 98, 0, 0, 98, 0, 0, 98, 0, 0, 1…
$ byhomlit <dbl+lbl> 0, 3, 2, 1, 1, 2, 1, 3, 0, 2, 0, 3, 2, 1, 2, 0, 1,…
$ byparasp <dbl+lbl> 5, 7, 7, 6, 2, 3, 5, 5, 5, 2, 5, 3, 6, 5, 6, 3, 5,…
$ byiepflg <dbl+lbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,…
$ bytxcstd <dbl+lbl> 56.21, 57.66, 66.50, 46.46, 36.17, 30.72, 45.46, 6…
$ bytxcqu <dbl+lbl> 3, 4, 4, 2, 1, 1, 2, 4, 1, 2, 4, 2, 1, 3, 4, 1, 1,…
$ bywrtnga <dbl+lbl> 1.191, 1.191, 0.996, -0.137, -0.435, -0.630, -0…
$ byxtracu <dbl+lbl> 1, 3, 2, 0, 0, 0, 1, 2, 1, 2, 0, 0, 2, 1, 2, 0, 0,…
$ byhmwrk <dbl+lbl> 7, 5, -9, 11, 10, 13, 12, 12, -9, 23, 9, 4, 7…
$ bytvvigm <dbl+lbl> 99, 4, 1, 99, 4, 3, 2, 3, -9, 3, 8, 1, 4…
$ f1qwt <dbl+lbl> 152.9769, 25.3577, 709.4246, 199.7193, 152.9…
$ f1pnlwt <dbl+lbl> 155.6312, 25.4906, 725.6926, 205.1919, 155.6…
$ f1trscwt <dbl+lbl> 284.6529, 42.0937, 730.2420, 280.0837, 287.2398, …
$ f2qtscwt <dbl+lbl> 0.0000, 50.3749, 755.4681, 245.5287, 265.5541, …
$ f2qwt <dbl+lbl> 0.0000, 28.2772, 660.0555, 198.5202, 153.7132, …
$ f2f1wt <dbl+lbl> 0.0000, 29.8462, 713.1807, 227.0875, 163.9503, …
$ f2bywt <dbl+lbl> 0.0000, 27.8257, 662.1222, 212.3764, 158.2162, …
$ bys14 <dbl+lbl> 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 2,…
$ bys15 <dbl+lbl> 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,…
$ bys24d <dbl+lbl> 1, 1, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1, 2…
$ bys24e <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
$ bys24f <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
$ bys33a <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ bys33b <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
$ bys43 <dbl+lbl> 2, 2, 3, 5, 2, 12, -9, 0, -9, 2, 0, 2, 0…
$ bys62a <dbl+lbl> -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3…
$ bys62g <dbl+lbl> -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3…
$ bys66a <dbl+lbl> 4, 1, 1, 1, 1, 6, 1, 1, 1, 4, -9, 1, 1…
$ bys66b <dbl+lbl> 6, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ bys66c <dbl+lbl> 3, -1, -1, 6, 1, -3, -1, 6, 1, 6, 1, 1, 1…
$ bys66f <dbl+lbl> 1, 1, -1, 1, 1, -3, 1, 1, 1, 1, 1, 1, -3…
$ bys66g <dbl+lbl> -3, -3, -1, 1, -3, -9, 1, 1, -3, -1, 1, 1, -3…
$ bys67 <dbl+lbl> 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,…
$ bys71d <dbl+lbl> 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
$ bys71e <dbl+lbl> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0…
$ bys83a <dbl+lbl> 2, 5, 2, 2, 1, -9, -1, -1, -1, 2, -1, 6, 2…
$ bys83b <dbl+lbl> 4, 7, 4, 2, 1, -9, -1, -1, -1, 2, -1, 2, 5…
$ bys87a <dbl+lbl> 3, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
$ bys87e <dbl+lbl> 2, 2, 1, 2, 2, -9, 2, 2, 2, 2, 1, 2, 2…
$ bys87f <dbl+lbl> 3, 2, 3, 1, 2, 1, 3, 1, 2, 1, 2, 2, 2…
transmute() is your friend!
We will use the transmute() function from the tidyverse. It works like mutate() and select() combined: only variables that are named inside transmute() are kept. So we can make copies of the variables we need under more readable names and, in doing so, create a smaller dataset containing just those variables. Pretty cool!
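As a minimal sketch of the idea (a made-up toy data frame, not the ELS data), these two pipelines produce the same one-column result:
toy <- tibble(x = 1:3, y = 4:6)
toy %>% transmute(x2 = x * 2)              # create x2, drop everything else
toy %>% mutate(x2 = x * 2) %>% select(x2)  # the same thing in two steps
Now for the real cleaning code: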
els.clean <- els %>%
  na_if(., -9) %>%   # convert ELS missing-data codes to NA
  na_if(., -8) %>%
  na_if(., -7) %>%
  na_if(., -4) %>%
  na_if(., -1) %>%
  transmute(.,        # keep (and rename) only the variables we need
            hrsread = bys43,
            absorbmath = bys87a,
            absorbread = bys87e,
            importmath = bys87f,
            comptest = bytxcstd,
            sex = as_factor(bysex),
            sesq = as_factor(byses1qu))
Before selecting variables, we first used the na_if() function from the tidyverse to convert ELS's missing-value codes (-9, -8, -7, -4, and -1) into NAs, just as we did last week.
Next, we selected variables related to math and reading engagement (absorbmath and absorbread) and math importance (importmath). We also selected hours read at home per week (hrsread), a standardized achievement test score (comptest), a measure of sex (sex), and a quartile measure of SES (sesq). We converted the last two categorical variables into factors using the as_factor() function.
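One caution: in newer versions of dplyr, na_if() is meant for vectors, so the data-frame-wide chain above may error for you. If it does, an equivalent formulation (a sketch; els.clean2 is just an illustrative name) applies the recode column by column with across():
els.clean2 <- els %>%
  mutate(across(everything(),
                ~ replace(.x, .x %in% c(-9, -8, -7, -4, -1), NA)))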
Our first lavaan model
For missing data, I recommend full-information maximum likelihood (FIML), which uses all available information instead of dropping incomplete cases. It is often preferred to multiple imputation! In lavaan, FIML is requested by adding missing = "ml" to the sem() command; the estimator = "MLR" option used below additionally gives robust (sandwich) standard errors and a scaled test statistic.
model1 <- 'comptest ~ absorbread'
fit1 <- sem(model1, data = els.clean, estimator = "MLR")
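Heads up: as the output below shows, lavaan still dropped incomplete cases here (11,603 of 16,197 used), because FIML has to be requested explicitly. A sketch of the FIML version of the call (fit1.fiml is just an illustrative name):
fit1.fiml <- sem(model1, data = els.clean,
                 estimator = "MLR", missing = "ml")  # FIML instead of listwise deletion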
The summary() function provides a nice summary of the fitted model:
summary(fit1, standardized = TRUE)
lavaan 0.6-7 ended normally after 16 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 2
Used Total
Number of observations 11603 16197
Model Test User Model:
Standard Robust
Test Statistic 0.000 0.000
Degrees of freedom 0 0
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
comptest ~
absorbread -2.374 0.114 -20.878 0.000 -2.374 -0.229
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.comptest 90.527 1.131 80.075 0.000 90.527 0.947
semPlot::semPaths(fit1, whatLabels = "standardized")
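If you would rather have the estimates as a data frame (handy for building your own tables), lavaan's parameterEstimates() function returns one; this is optional:
parameterEstimates(fit1, standardized = TRUE)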
Compare with the lm() function
As a check, we can fit the same regression with ordinary least squares. The slope estimate should match lavaan's exactly; the standard errors differ a bit because MLR uses robust (sandwich) standard errors.
model2 <- lm(comptest ~ absorbread, data = els.clean)
summary(model2)
Call:
lm(formula = comptest ~ absorbread, data = els.clean)
Residuals:
Min 1Q Median 3Q Max
-43.952 -6.393 0.419 6.868 29.052
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 57.07586 0.22911 249.12 <2e-16 ***
absorbread -2.37439 0.09354 -25.38 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 9.515 on 11601 degrees of freedom
(4594 observations deleted due to missingness)
Multiple R-squared: 0.05262, Adjusted R-squared: 0.05253
F-statistic: 644.3 on 1 and 11601 DF, p-value: < 2.2e-16
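A quick side-by-side look at the two fits (a minimal sketch):
coef(fit1)    # lavaan's free parameters: the slope and residual variance
coef(model2)  # lm's intercept and slope; the slopes should match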
table(els.clean$sex)
{Survey component legitimate skip/NA}      0
{Nonrespondent}                            0
Male                                    7653
Female                                  7717
Using case_when() to create a dummy variable
els.clean <- els.clean %>%
mutate(.,
male = case_when(
sex == "Male" ~ 1,
sex == "Female" ~ 0))
model3 <- 'absorbmath ~ male'
fit3 <- sem(model3, data = els.clean, estimator = "MLR")
summary(fit3, standardized = TRUE)
lavaan 0.6-7 ended normally after 11 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 2
Used Total
Number of observations 11755 16197
Model Test User Model:
Standard Robust
Test Statistic 0.000 0.000
Degrees of freedom 0 0
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
absorbmath ~
male -0.081 0.016 -5.055 0.000 -0.081 -0.047
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.absorbmath 0.743 0.025 30.138 0.000 0.743 0.998
semPlot::semPaths(fit3, whatLabels = "standardized")
table(els.clean$sesq)
{Survey component legitimate skip/NA}      0
{Nonrespondent}                            0
Lowest quartile                         3608
Second quartile                         3604
Third quartile                          3731
Highest quartile                        4301
We can use the same approach to build a full set of SES quartile dummies. One caution: the TRUE ~ 0 catch-all also codes missing values of sesq as 0, so cases with missing SES end up in the reference group.
els.clean <- els.clean %>%
mutate(.,
ses1 = case_when(
sesq== "Lowest quartile" ~ 1,
TRUE ~ 0),
ses2 = case_when(
sesq== "Second quartile" ~ 1,
TRUE ~ 0),
ses3 = case_when(
sesq== "Third quartile" ~ 1,
TRUE ~ 0),
ses4 = case_when(
sesq== "Highest quartile" ~ 1,
TRUE ~ 0))
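Again, a quick optional check that each dummy flags the right quartile:
els.clean %>% count(sesq, ses1, ses2, ses3, ses4)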
We leave ses4, the highest quartile, out of the model as the reference category:
model4 <- 'importmath ~ ses1 + ses2 + ses3'
fit4 <- sem(model4, data = els.clean, estimator = "MLR")
summary(fit4, standardized = TRUE)
lavaan 0.6-7 ended normally after 15 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 4
Used Total
Number of observations 11763 16197
Model Test User Model:
Standard Robust
Test Statistic 0.000 0.000
Degrees of freedom 0 0
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
importmath ~
ses1 -0.039 0.023 -1.662 0.097 -0.039 -0.017
ses2 -0.011 0.024 -0.481 0.630 -0.011 -0.005
ses3 0.033 0.022 1.485 0.138 0.033 0.016
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.importmath 0.822 0.016 51.731 0.000 0.822 0.999
semPlot::semPaths(fit4, whatLabels = "standardized")
Now let's put all of the predictors together in one model:
model5 <- 'comptest ~ hrsread + absorbmath + absorbread + importmath + male + ses1 + ses2 + ses3'
fit5 <- sem(model5, data = els.clean, estimator = "MLR")
summary(fit5, standardized = TRUE)
lavaan 0.6-7 ended normally after 69 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 9
Used Total
Number of observations 11235 16197
Model Test User Model:
Standard Robust
Test Statistic 0.000 0.000
Degrees of freedom 0 0
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
comptest ~
hrsread 0.053 0.025 2.108 0.035 0.053 0.021
absorbmath 0.255 0.104 2.439 0.015 0.255 0.023
absorbread -1.907 0.109 -17.449 0.000 -1.907 -0.184
importmath -0.589 0.100 -5.885 0.000 -0.589 -0.055
male 0.495 0.166 2.972 0.003 0.495 0.026
ses1 -10.302 0.231 -44.560 0.000 -10.302 -0.435
ses2 -6.981 0.224 -31.165 0.000 -6.981 -0.306
ses3 -4.221 0.217 -19.469 0.000 -4.221 -0.190
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.comptest 72.814 0.985 73.925 0.000 72.814 0.785
semPlot::semPaths(fit5, whatLabels = "standardized")
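Since the standardized residual variance of comptest is .785, the predictors explain about 21.5% of its variance. You can pull R-squared directly with lavInspect() (optional):
lavInspect(fit5, "rsquare")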
Next, let's try a measurement model: a latent reading-engagement variable with two indicators.
model6 <- 'Read =~ hrsread + absorbread'
fit6 <- sem(model6, data=els.clean, estimator = "MLR")
lavaan WARNING:
    The variance-covariance matrix of the estimated parameters (vcov)
    does not appear to be positive definite! The smallest eigenvalue
    (= -3.463451e+02) is smaller than zero. This may be a symptom that
    the model is not identified.
lavaan WARNING: some estimated ov variances are negative
summary(fit6, standardized = TRUE)
lavaan 0.6-7 ended normally after 21 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 4
Used Total
Number of observations 11319 16197
Model Test User Model:
Test statistic NA
Degrees of freedom -1
P-value (Unknown) NA
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Read =~
hrsread 1.000 1.140 0.295
absorbread -1.010 NA -1.151 -1.229
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.hrsread 13.607 NA 13.607 0.913
.absorbread -0.447 NA -0.447 -0.509
Read 1.300 NA 1.000 1.000
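What went wrong? With two indicators we have only three unique variances/covariances to work with, but the model estimates four free parameters (one loading, two residual variances, and the latent variance), hence the degrees of freedom of -1 and the impossible negative variance for absorbread above. One standard identification fix (a sketch; model6b is an illustrative name, and equal loadings are only substantively sensible when the indicators are on similar scales, which these are not) is to constrain the two loadings to be equal via a shared label:
model6b <- 'Read =~ a*hrsread + a*absorbread'  # shared label "a" forces equal loadings
fit6b <- sem(model6b, data = els.clean, estimator = "MLR")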
With four indicators, the latent variable is identified, so let's broaden the construct to general learning engagement:
model7 <- 'Learn =~ hrsread + absorbread + absorbmath + importmath'
fit7 <- sem(model7, data=els.clean, estimator = "MLR")
summary(fit7, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6-7 ended normally after 50 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 8
Used Total
Number of observations 11235 16197
Model Test User Model:
Standard Robust
Test Statistic 1531.674 1195.624
Degrees of freedom 2 2
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.281
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 3954.430 1838.852
Degrees of freedom 6 6
P-value 0.000 0.000
Scaling correction factor 2.150
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.613 0.349
Tucker-Lewis Index (TLI) -0.162 -0.954
Robust Comparative Fit Index (CFI) 0.612
Robust Tucker-Lewis Index (TLI) -0.164
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -74091.256 -74091.256
Scaling correction factor 3.248
for the MLR correction
Loglikelihood unrestricted model (H1) NA NA
Scaling correction factor 2.855
for the MLR correction
Akaike (AIC) 148198.513 148198.513
Bayesian (BIC) 148257.127 148257.127
Sample-size adjusted Bayesian (BIC) 148231.704 148231.704
Root Mean Square Error of Approximation:
RMSEA 0.261 0.230
90 Percent confidence interval - lower 0.250 0.221
90 Percent confidence interval - upper 0.272 0.240
P-value RMSEA <= 0.05 0.000 0.000
Robust RMSEA 0.261
90 Percent confidence interval - lower 0.249
90 Percent confidence interval - upper 0.273
Standardized Root Mean Square Residual:
SRMR 0.109 0.109
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Learn =~
hrsread 1.000 0.445 0.115
absorbread -0.476 0.049 -9.739 0.000 -0.212 -0.228
absorbmath -1.163 0.178 -6.534 0.000 -0.517 -0.600
importmath -1.362 0.215 -6.339 0.000 -0.605 -0.672
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.hrsread 14.734 0.433 33.993 0.000 14.734 0.987
.absorbread 0.821 0.018 45.443 0.000 0.821 0.948
.absorbmath 0.475 0.030 15.689 0.000 0.475 0.640
.importmath 0.446 0.025 17.779 0.000 0.446 0.549
Learn 0.198 0.058 3.392 0.001 1.000 1.000
A structural model predicting comptest
The fit of the measurement model above is poor (CFI = .613, RMSEA = .261), but to illustrate a full structural equation model, let's regress comptest on the latent variable:
model8 <- 'Learn =~ hrsread + absorbread + absorbmath + importmath
comptest ~ Learn'
fit8 <- sem(model8, data=els.clean, estimator = "MLR")
summary(fit8, fit.measures = TRUE, standardized = TRUE)
lavaan 0.6-7 ended normally after 59 iterations
Estimator ML
Optimization method NLMINB
Number of free parameters 10
Used Total
Number of observations 11235 16197
Model Test User Model:
Standard Robust
Test Statistic 2118.391 1979.069
Degrees of freedom 5 5
P-value (Chi-square) 0.000 0.000
Scaling correction factor 1.070
Yuan-Bentler correction (Mplus variant)
Model Test Baseline Model:
Test statistic 4638.662 2646.109
Degrees of freedom 10 10
P-value 0.000 0.000
Scaling correction factor 1.753
User Model versus Baseline Model:
Comparative Fit Index (CFI) 0.543 0.251
Tucker-Lewis Index (TLI) 0.087 -0.498
Robust Comparative Fit Index (CFI) 0.543
Robust Tucker-Lewis Index (TLI) 0.085
Loglikelihood and Information Criteria:
Loglikelihood user model (H0) -115434.688 -115434.688
Scaling correction factor 2.865
for the MLR correction
Loglikelihood unrestricted model (H1) NA NA
Scaling correction factor 2.267
for the MLR correction
Akaike (AIC) 230889.376 230889.376
Bayesian (BIC) 230962.644 230962.644
Sample-size adjusted Bayesian (BIC) 230930.865 230930.865
Root Mean Square Error of Approximation:
RMSEA 0.194 0.187
90 Percent confidence interval - lower 0.187 0.181
90 Percent confidence interval - upper 0.201 0.194
P-value RMSEA <= 0.05 0.000 0.000
Robust RMSEA 0.194
90 Percent confidence interval - lower 0.187
90 Percent confidence interval - upper 0.201
Standardized Root Mean Square Residual:
SRMR 0.106 0.106
Parameter Estimates:
Standard errors Sandwich
Information bread Observed
Observed information based on Hessian
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
Learn =~
hrsread 1.000 0.512 0.133
absorbread -0.462 0.043 -10.816 0.000 -0.237 -0.254
absorbmath -0.985 0.155 -6.368 0.000 -0.505 -0.586
importmath -1.178 0.198 -5.953 0.000 -0.603 -0.669
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
comptest ~
Learn 2.355 0.302 7.809 0.000 1.207 0.125
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.hrsread 14.669 0.436 33.651 0.000 14.669 0.982
.absorbread 0.810 0.019 42.099 0.000 0.810 0.935
.absorbmath 0.487 0.028 17.349 0.000 0.487 0.657
.importmath 0.448 0.025 17.934 0.000 0.448 0.552
.comptest 91.355 1.153 79.258 0.000 91.355 0.984
Learn 0.262 0.079 3.324 0.001 1.000 1.000
semPlot::semPaths(fit8, whatLabels = "standardized")
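Finally, if you just want the headline fit indices without the full summary() printout, lavaan's fitMeasures() function can extract them:
fitMeasures(fit8, c("cfi", "tli", "rmsea", "srmr"))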