So far we’ve just been using complete cases - next we’ll implement the model using multiple imputation. Again, you can find a walkthrough on MI and bootstrapping theory I did here. Here we’re imputing 10 datasets, using a range of models suitable for the data type (e.g. proportional odds model’s for ordinal data, logistic regression for binary variables, and predictive mean matching for continous variables, although they are actually amounts not continous variables).
The following code shows the actual MI process. This code is not actually evaluated, because R Markdown is slow. Instead, it’s run in a seperate bit of code and loaded here.
# Number of imputed df
N.Imp = 10
# Conduct multiple imputation using the mice package
# Here we have to specify the imputation method for each variable:
str_out <- data.frame(capture.output(str(DF3)))
# Delete the first and last row
str_out <- data.frame(str_output = matrix(str_out[2:(nrow(str_out)-1),]))
# Create a column that contain the appropreate model for each variable - this only works if the variable is of the correct data type in the first place
str_out$type <- ifelse(grepl("Ord.factor", str_out$str_output, fixed = TRUE)==T, "polr",
ifelse(grepl("num", str_out$str_output, fixed = TRUE)==T, "pmm",
ifelse(grepl("Factor", str_out$str_output, fixed = TRUE)==T, "polyreg",
ifelse(grepl("int", str_out$str_output, fixed = TRUE)==T, "logreg", "ERROR"))))
# Conduct the MI - with the number of datasets specified by N.Imp, and the estimation type specified by str_out$type (derived from the above)
DF3_imp <- mice(DF3, m = N.Imp, method = str_out$type )
# create a df in a list for each imputed df
# and create dummy variable from factors
# Exract each DF
mice.imp <- list()
for(i in 1:N.Imp) {
mice.imp[[i]] <- mice::complete(DF3_imp, action= i, inc=FALSE)
# Turn factor into series of binary variables
mice.imp[[i]] <- dummy_cols(mice.imp[[i]], select_columns = c("gender", "position", "education"))
# remove spaces and other charecters on colnames
colnames( mice.imp[[i]]) <- gsub(" ", "", colnames( mice.imp[[i]]), fixed = TRUE)
colnames( mice.imp[[i]]) <- gsub("/", "", colnames( mice.imp[[i]]), fixed = TRUE)
colnames( mice.imp[[i]]) <- gsub("-", "_", colnames( mice.imp[[i]]), fixed = TRUE)
}
# Now we have to convert the ordinal exogenous variables to numeric. If we want to include them, we're somewhat forced to assume they are numeric.
exogenous <- c("SS1" , "SS2" , "SS3", "PS_1" , "PS_2", "PS_3", "health") # the exogeous ordinal variables
for (i in 1:N.Imp) {
mice.imp[[i]][,exogenous] <- apply(mice.imp[[i]][,exogenous], 2, as.numeric)
}
Now let’s run a simple SEM on the imputed datasets.
# Load the preciously created imputed dataset
load("analy_data/mice.imp.Rdata")
# A simple K10 ~ age SEM
model_MI_test_1 <- '
# PD
PD =~ K10_1 + K10_2 + K10_3 + K10_4 + K10_5 + K10_6 + K10_7 + K10_8 + K10_9 + K10_10
# Correlated error terms
K10_2 ~~ K10_3
K10_5 ~~ K10_6
K10_7 ~~ K10_8
# Predicting PD
PD ~ age_year
'
# Running the test model on each DF - this does the pooling automatically
fit_MI_test_1 <- runMI(model_MI_test_1, data = mice.imp, fun="sem", estimator = "WLSMVS", ordered = c("K10_1" , "K10_2" , "K10_3" , "K10_4", "K10_5", "K10_6", "K10_7", "K10_8", "K10_9", "K10_10"))
Here lavaan.mi and semPlot are not integrated. As a result, we have to run a SEM with the same parameters and model structure. Then we subsitute the plotted valued with the pooled estimates and p-values from the lavaan.mi object. This is a point about the visualisation, rather than the statistics.
## lavaan.mi object based on 10 imputed data sets.
## See class?lavaan.mi help page for available methods.
##
## Convergence information:
## The model converged on 10 imputed data sets
##
## Rubin's (1987) rules were used to pool point and SE estimates across 10 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
The RMSEA is 0.049 (95% CI 0.044 - 0.055).
Here we have a slightly more complex model. Additionally, lavaan has a neat feature that allows you to name coefficients in the mode, and calculate joint probabilities. You could do this by hand, but this also calculates p-values etc. for the joint probabilities. However, you still have to correctly describe how the joint probabilities are calculated, but this follows the same logic as when using probability trees.
# There are some warnings, but it does not appear to have affected the results...
# A More complex SEM
model_MI_test_2 <- '
### Conservation optimism
SO =~ SO_1 + SO_2 + SO_3 +SO_4 + SO_5+ SO_6 + SO_7 + SO_9
# Correlated error terms
SO_1 ~~ SO_2
SO_3 ~~ SO_4
SO_5 ~~ SO_6
SO_7 ~~ SO_9
### Dispositional optimism
DO =~ LOTR_1 + LOTR_2 + LOTR_3 + LOTR_4 + LOTR_5 + LOTR_6
# The method effect
method =~ LOTR_1 + LOTR_3 + LOTR_6
### regressing SO and DO
SO ~ SO.DO*DO
### regressing SO and DO on goal progress
GP_total ~ GP.DO*DO
### PD
PD =~ K10_1 + K10_2 + K10_3 + K10_4 + K10_5 + K10_6 + K10_7 + K10_8 + K10_9 + K10_10
# Correlated error terms
K10_2 ~~ K10_3
K10_5 ~~ K10_6
K10_7 ~~ K10_8
### regressing GP and DO on PD
PD ~ PD.GP*GP_total + PD.DO*DO + PD.SO*SO
# Setting the correlation between the method effect and DO to 0
DO ~~ 0*method
SO ~~ 0*method
# Joint probabilities
PD.GP.DO := GP.DO*PD.GP # DO on GP on PD
PD.SO.DO := SO.DO*PD.SO # DO on SO on PD
DO_total := PD.DO + PD.GP.DO + PD.SO.DO
'
# Running the test model on each DF - this does the pooling automatically
fit_MI_test_2 <- runMI(model_MI_test_2, data = mice.imp, fun="sem", estimator = "WLSMVS", ordered = c("SO_1" , "SO_2" , "SO_3" , "SO_4", "SO_5", "SO_6", "SO_7", "SO_9", "LOTR_1" , "LOTR_2" , "LOTR_3" , "LOTR_4", "LOTR_5", "LOTR_6", "K10_1" , "K10_2" , "K10_3" , "K10_4", "K10_5", "K10_6", "K10_7", "K10_8", "K10_9", "K10_10"))
Here lavaan.mi and semPlot are not integrated. As a result, we have to run a SEM with the same parameters and model structure. Then we subsitute the plotted valued with the pooled estimates and p-values from the lavaan.mi object. This is a point about the visualisation, rather than the statistics.
## lavaan.mi object based on 10 imputed data sets.
## See class?lavaan.mi help page for available methods.
##
## Convergence information:
## The model converged on 10 imputed data sets
##
## Rubin's (1987) rules were used to pool point and SE estimates across 10 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
## lavaan.mi object based on 10 imputed data sets.
## See class?lavaan.mi help page for available methods.
##
## Convergence information:
## The model converged on 10 imputed data sets
##
## Rubin's (1987) rules were used to pool point and SE estimates across 10 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
## lavaan.mi object based on 10 imputed data sets.
## See class?lavaan.mi help page for available methods.
##
## Convergence information:
## The model converged on 10 imputed data sets
##
## Rubin's (1987) rules were used to pool point and SE estimates across 10 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
Response variable | Relationship | Explanatory variable | Estimate | p-value |
---|---|---|---|---|
SO | ~ | DO | 0.278 |
***
|
GP_total | ~ | DO | 0.337 |
***
|
PD | ~ | GP_total | -0.100 |
***
|
PD | ~ | DO | -0.480 |
***
|
PD | ~ | SO | -0.059 |
*
|
The RMSEA is 0.049 (95% CI 0.044 - 0.055).
Now we can combine this analysis with the other predictor variables into our main model. We have several exogenous ordinal variables (“SS1” , “SS2” , “SS3”, “PS_1” , “PS_2”, “PS_3”, “health”). I am unaware of how to include these as ordinal variables within lavaan. Typically, people assume (in many types of statistical model) these are numeric and include them as such. However, this makes the assumption that they are equal interval, and thus have a linear relationship with the response variable. Alternatively, it’s also common to include polynomial terms to account for potential non-linearity. I’m against this, since we don’t really have an a priori justification for the choice in polynomial, so I think if we’re making assumptions it’s better to keep them simple (and linear).
## lavaan.mi object based on 10 imputed data sets.
## See class?lavaan.mi help page for available methods.
##
## Convergence information:
## The model converged on 10 imputed data sets
##
## Rubin's (1987) rules were used to pool point and SE estimates across 10 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
The model is actually too big to show as a single figure. Instead, we’ll seperate it into the ‘goal/optimism part’, and the ‘other predictors’ part.
## lavaan.mi object based on 10 imputed data sets.
## See class?lavaan.mi help page for available methods.
##
## Convergence information:
## The model converged on 10 imputed data sets
##
## Rubin's (1987) rules were used to pool point and SE estimates across 10 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
## lavaan.mi object based on 10 imputed data sets.
## See class?lavaan.mi help page for available methods.
##
## Convergence information:
## The model converged on 10 imputed data sets
##
## Rubin's (1987) rules were used to pool point and SE estimates across 10 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.
Response variable | Relationship | Explanatory variable | Estimate | p-value |
---|---|---|---|---|
Optimism and goal progress | ||||
SO | ~ | DO | 0.175 |
***
|
GP | ~ | DO | 0.190 |
***
|
PD | ~ | GP | -0.053 |
**
|
PD | ~ | DO | -0.303 |
***
|
PD | ~ | SO | -0.039 |
.
|
Assorted charecteristics | ||||
PD | ~ | ERI (new) | 0.302 |
***
|
PD | ~ | Years in conservation | -0.115 |
**
|
PD | ~ | Age | -0.131 |
**
|
PD | ~ | Health | -0.215 |
***
|
PD | ~ | National/non-national | -0.021 | |
PD | ~ | Work hours | 0.019 | |
PD | ~ | CV | -0.007 | |
Gender (reference = female) | ||||
PD | ~ | Male | -0.177 |
***
|
PD | ~ | Other than male or female | 0.261 | |
PD | ~ | Prefer not to say | 0.179 | |
Position (reference = desk) | ||||
PD | ~ | Non-desk | -0.066 | |
PD | ~ | Unknown | -0.073 | |
Education (reference = college) | ||||
PD | ~ | University | -0.041 | |
PD | ~ | Primary | 0.496 | |
PD | ~ | Secoundary | 0.116 | |
PD | ~ | Unknown | -0.120 | |
Social support | ||||
PD | ~ | Social support: personal relationships | -0.114 |
***
|
PD | ~ | Social support: support from F&F | -0.081 |
***
|
PD | ~ | Social support: time with F&F | -0.062 |
***
|
Personal security | ||||
PD | ~ | Personal security: dangerous outside | -0.039 |
*
|
PD | ~ | Personal security: dangerous work | -0.006 | |
PD | ~ | Personal security: not feel safe | 0.140 |
***
|