Set Working Directory and Read in Data
setwd("H:/USC Guest Lecture/Multilevel MI")
dat <- read.csv("stress_data.csv")
Impose Missingness into Dataframe
stress_data <- simsem::imposeMissing(dat,
cov = "baseage",
pmMAR = .2,
ignoreCols = 1:7)
Describe Data
psych::describe(stress_data)
## vars n mean sd median trimmed mad min max range skew
## PersonID 1 525 363.85 201.89 511.00 371.22 69.68 102.0 566.00 464.0 -0.32
## women 2 525 0.73 0.44 1.00 0.79 0.00 0.0 1.00 1.0 -1.05
## baseage 3 525 80.13 6.08 80.02 79.83 6.14 69.7 95.31 25.6 0.36
## session 4 525 4.00 1.42 4.00 4.00 1.48 2.0 6.00 4.0 0.00
## studyday 5 525 5.70 3.84 6.00 5.37 4.45 1.0 17.00 16.0 0.42
## dayofweek 6 525 4.82 1.94 6.00 5.02 1.48 1.0 7.00 6.0 -0.76
## weekend 7 525 0.51 0.50 1.00 0.52 0.00 0.0 1.00 1.0 -0.06
## symptoms 8 419 1.29 1.33 1.00 1.09 1.48 0.0 5.00 5.0 1.06
## mood 9 412 1.19 0.37 1.00 1.10 0.00 1.0 3.00 2.0 2.70
## stressor 10 410 0.44 0.50 0.00 0.43 0.00 0.0 1.00 1.0 0.22
## kurtosis se
## PersonID -1.87 8.81
## women -0.89 0.02
## baseage -0.47 0.27
## session -1.31 0.06
## studyday -0.71 0.17
## dayofweek -0.66 0.08
## weekend -2.00 0.02
## symptoms 0.52 0.06
## mood 7.70 0.02
## stressor -1.95 0.02
Find Percent Missing in Each Column
pMiss <- function(x){sum(is.na(x))/length(x)*100} #function to find percentage
apply(stress_data,2,pMiss)
## PersonID women baseage session studyday dayofweek weekend symptoms
## 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 20.19048
## mood stressor
## 21.52381 21.90476
Visualize Missing Data Pattern
## Compute the unique response patterns:
pats <- mice::md.pattern(stress_data)
pats
## PersonID women baseage session studyday dayofweek weekend symptoms mood
## 256 1 1 1 1 1 1 1 1 1
## 72 1 1 1 1 1 1 1 1 1
## 78 1 1 1 1 1 1 1 1 0
## 13 1 1 1 1 1 1 1 1 0
## 60 1 1 1 1 1 1 1 0 1
## 24 1 1 1 1 1 1 1 0 1
## 16 1 1 1 1 1 1 1 0 0
## 6 1 1 1 1 1 1 1 0 0
## 0 0 0 0 0 0 0 106 113
## stressor
## 256 1 0
## 72 0 1
## 78 1 1
## 13 0 2
## 60 1 1
## 24 0 2
## 16 1 2
## 6 0 3
## 115 334
Correlations between variables
round(cor(stress_data, use = "pairwise"),3)
## PersonID women baseage session studyday dayofweek weekend symptoms
## PersonID 1.000 -0.087 -0.753 0.000 -0.023 -0.053 -0.071 -0.066
## women -0.087 1.000 0.102 0.000 0.013 0.031 0.026 -0.127
## baseage -0.753 0.102 1.000 0.000 0.021 0.038 0.056 0.045
## session 0.000 0.000 0.000 1.000 0.862 -0.256 -0.377 -0.035
## studyday -0.023 0.013 0.021 0.862 1.000 -0.233 -0.407 -0.002
## dayofweek -0.053 0.031 0.038 -0.256 -0.233 1.000 0.817 -0.010
## weekend -0.071 0.026 0.056 -0.377 -0.407 0.817 1.000 0.023
## symptoms -0.066 -0.127 0.045 -0.035 -0.002 -0.010 0.023 1.000
## mood -0.104 -0.001 0.131 0.033 0.045 -0.022 -0.024 0.342
## stressor 0.027 0.024 0.002 0.059 0.117 -0.084 -0.080 0.296
## mood stressor
## PersonID -0.104 0.027
## women -0.001 0.024
## baseage 0.131 0.002
## session 0.033 0.059
## studyday 0.045 0.117
## dayofweek -0.022 -0.084
## weekend -0.024 -0.080
## symptoms 0.342 0.296
## mood 1.000 0.303
## stressor 0.303 1.000
Model of Interest
Generate Imputations
The easiest way of specifying the imputation model is to use the formula argument of panImpute. Generally speaking, the imputation model should include all variables that are either (a) part of the model of interest, (b) related to the variables in the model, or (c) related to whether the variables in the model are missing.
In this example, we include stressor and mood as the target variables and baseage, women (gender), and symptoms as auxillary (or related) variables
library(mitml)
## Warning: package 'mitml' was built under R version 4.1.2
## *** This is beta software. Please report any bugs!
## *** See the NEWS file for recent changes.
fml <- mood + stressor + baseage + women + symptoms ~ 1 + (1|PersonID)
Notice how the model is “empty” on the right hand side of the ~ This allows for all relations between variables at level 1 and 2 and is suitable for most applications of multilevel rando mintercept model.
The imputation procedure is then run for 5,000 iterations (burn-in), after which 100 imputations are drawn every 100 iterations.
imp <- panImpute(stress_data, formula = fml, n.burn = 5000, n.iter = 100, m = 100)
## Running burn-in phase ...
## Creating imputed data set ( 1 / 100 ) ...
## Creating imputed data set ( 2 / 100 ) ...
## Creating imputed data set ( 3 / 100 ) ...
## Creating imputed data set ( 4 / 100 ) ...
## Creating imputed data set ( 5 / 100 ) ...
## Creating imputed data set ( 6 / 100 ) ...
## Creating imputed data set ( 7 / 100 ) ...
## Creating imputed data set ( 8 / 100 ) ...
## Creating imputed data set ( 9 / 100 ) ...
## Creating imputed data set ( 10 / 100 ) ...
## Creating imputed data set ( 11 / 100 ) ...
## Creating imputed data set ( 12 / 100 ) ...
## Creating imputed data set ( 13 / 100 ) ...
## Creating imputed data set ( 14 / 100 ) ...
## Creating imputed data set ( 15 / 100 ) ...
## Creating imputed data set ( 16 / 100 ) ...
## Creating imputed data set ( 17 / 100 ) ...
## Creating imputed data set ( 18 / 100 ) ...
## Creating imputed data set ( 19 / 100 ) ...
## Creating imputed data set ( 20 / 100 ) ...
## Creating imputed data set ( 21 / 100 ) ...
## Creating imputed data set ( 22 / 100 ) ...
## Creating imputed data set ( 23 / 100 ) ...
## Creating imputed data set ( 24 / 100 ) ...
## Creating imputed data set ( 25 / 100 ) ...
## Creating imputed data set ( 26 / 100 ) ...
## Creating imputed data set ( 27 / 100 ) ...
## Creating imputed data set ( 28 / 100 ) ...
## Creating imputed data set ( 29 / 100 ) ...
## Creating imputed data set ( 30 / 100 ) ...
## Creating imputed data set ( 31 / 100 ) ...
## Creating imputed data set ( 32 / 100 ) ...
## Creating imputed data set ( 33 / 100 ) ...
## Creating imputed data set ( 34 / 100 ) ...
## Creating imputed data set ( 35 / 100 ) ...
## Creating imputed data set ( 36 / 100 ) ...
## Creating imputed data set ( 37 / 100 ) ...
## Creating imputed data set ( 38 / 100 ) ...
## Creating imputed data set ( 39 / 100 ) ...
## Creating imputed data set ( 40 / 100 ) ...
## Creating imputed data set ( 41 / 100 ) ...
## Creating imputed data set ( 42 / 100 ) ...
## Creating imputed data set ( 43 / 100 ) ...
## Creating imputed data set ( 44 / 100 ) ...
## Creating imputed data set ( 45 / 100 ) ...
## Creating imputed data set ( 46 / 100 ) ...
## Creating imputed data set ( 47 / 100 ) ...
## Creating imputed data set ( 48 / 100 ) ...
## Creating imputed data set ( 49 / 100 ) ...
## Creating imputed data set ( 50 / 100 ) ...
## Creating imputed data set ( 51 / 100 ) ...
## Creating imputed data set ( 52 / 100 ) ...
## Creating imputed data set ( 53 / 100 ) ...
## Creating imputed data set ( 54 / 100 ) ...
## Creating imputed data set ( 55 / 100 ) ...
## Creating imputed data set ( 56 / 100 ) ...
## Creating imputed data set ( 57 / 100 ) ...
## Creating imputed data set ( 58 / 100 ) ...
## Creating imputed data set ( 59 / 100 ) ...
## Creating imputed data set ( 60 / 100 ) ...
## Creating imputed data set ( 61 / 100 ) ...
## Creating imputed data set ( 62 / 100 ) ...
## Creating imputed data set ( 63 / 100 ) ...
## Creating imputed data set ( 64 / 100 ) ...
## Creating imputed data set ( 65 / 100 ) ...
## Creating imputed data set ( 66 / 100 ) ...
## Creating imputed data set ( 67 / 100 ) ...
## Creating imputed data set ( 68 / 100 ) ...
## Creating imputed data set ( 69 / 100 ) ...
## Creating imputed data set ( 70 / 100 ) ...
## Creating imputed data set ( 71 / 100 ) ...
## Creating imputed data set ( 72 / 100 ) ...
## Creating imputed data set ( 73 / 100 ) ...
## Creating imputed data set ( 74 / 100 ) ...
## Creating imputed data set ( 75 / 100 ) ...
## Creating imputed data set ( 76 / 100 ) ...
## Creating imputed data set ( 77 / 100 ) ...
## Creating imputed data set ( 78 / 100 ) ...
## Creating imputed data set ( 79 / 100 ) ...
## Creating imputed data set ( 80 / 100 ) ...
## Creating imputed data set ( 81 / 100 ) ...
## Creating imputed data set ( 82 / 100 ) ...
## Creating imputed data set ( 83 / 100 ) ...
## Creating imputed data set ( 84 / 100 ) ...
## Creating imputed data set ( 85 / 100 ) ...
## Creating imputed data set ( 86 / 100 ) ...
## Creating imputed data set ( 87 / 100 ) ...
## Creating imputed data set ( 88 / 100 ) ...
## Creating imputed data set ( 89 / 100 ) ...
## Creating imputed data set ( 90 / 100 ) ...
## Creating imputed data set ( 91 / 100 ) ...
## Creating imputed data set ( 92 / 100 ) ...
## Creating imputed data set ( 93 / 100 ) ...
## Creating imputed data set ( 94 / 100 ) ...
## Creating imputed data set ( 95 / 100 ) ...
## Creating imputed data set ( 96 / 100 ) ...
## Creating imputed data set ( 97 / 100 ) ...
## Creating imputed data set ( 98 / 100 ) ...
## Creating imputed data set ( 99 / 100 ) ...
## Creating imputed data set ( 100 / 100 ) ...
## Done!
Assessing Convergence
Convergence Plot
Completing the Data
In order to work with and analyze the imputed data sets, the data sets must be completed with the imputations generated in the previous steps. To do so, mitml provides the function mitmlComplete.
implist <- mitmlComplete(imp, "all")
Analysis and Pooling
In order to obtain estimates for the model of interest, the model must be fit separately to each of the completed data sets, and the results must be pooled into a final set of estimates and inferences. The mitml package offers the with function to fit various statistical models to a list of completed data sets.
In this example, we use the lmer function from the R package lme4 to fit the model of interest.
We fit a model with the imputed data, a model with the missing data, and a model with an almost complete dataset.
library(lme4)
## Loading required package: Matrix
fit.imp <- with(implist, lmer(mood ~ 1 + stressor + symptoms + baseage + women + (1|PersonID)))
fit1 <- with(stress_data, lmer(mood ~ 1 + stressor + symptoms + baseage + women + (1|PersonID)))
fit2 <- with(dat, lmer(mood ~ 1 + stressor + symptoms + baseage + women + (1|PersonID)))
Model with Imputed Values
options(scipen = 999)
testEstimates(fit.imp, extra.pars = TRUE)
##
## Call:
##
## testEstimates(model = fit.imp, extra.pars = TRUE)
##
## Final parameter estimates and inferences obtained from 100 imputed data sets.
##
## Estimate Std.Error t.value df P(>|t|) RIV FMI
## (Intercept) 0.425 0.333 1.279 23698.763 0.201 0.069 0.065
## stressor 0.157 0.040 3.955 605.909 0.000 0.678 0.406
## symptoms 0.042 0.018 2.367 540.386 0.018 0.748 0.430
## baseage 0.008 0.004 1.929 30055.772 0.054 0.061 0.057
## women 0.013 0.060 0.217 5055.827 0.828 0.163 0.140
##
## Estimate
## Intercept~~Intercept|PersonID 0.046
## Residual~~Residual 0.078
## ICC|PersonID 0.371
##
## Unadjusted hypothesis test as appropriate in larger samples.
Model with Listwise Deletion
summary(fit1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: mood ~ 1 + stressor + symptoms + baseage + women + (1 | PersonID)
##
## REML criterion at convergence: 179.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.3892 -0.4425 -0.1095 0.1261 3.8644
##
## Random effects:
## Groups Name Variance Std.Dev.
## PersonID (Intercept) 0.05051 0.2247
## Residual 0.07354 0.2712
## Number of obs: 256, groups: PersonID, 98
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.555048 0.379298 1.463
## stressor 0.190412 0.044143 4.314
## symptoms 0.059472 0.018288 3.252
## baseage 0.005551 0.004749 1.169
## women 0.055240 0.065666 0.841
##
## Correlation of Fixed Effects:
## (Intr) strssr symptm baseag
## stressor -0.045
## symptoms -0.041 -0.171
## baseage -0.986 0.003 -0.028
## women -0.009 0.019 0.116 -0.124
out1 <- summary(fit1)
coef1 <- out1[["coefficients"]]
p1 <- 2 * pnorm(abs(coef1[, "t value"]), lower.tail=FALSE)
formatC(p1, format="e", digits=2)
## (Intercept) stressor symptoms baseage women
## "1.43e-01" "1.61e-05" "1.15e-03" "2.42e-01" "4.00e-01"
Model with < 1% missing in any column
summary(fit2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: mood ~ 1 + stressor + symptoms + baseage + women + (1 | PersonID)
##
## REML criterion at convergence: 356.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.0981 -0.5176 -0.1373 0.1355 4.8647
##
## Random effects:
## Groups Name Variance Std.Dev.
## PersonID (Intercept) 0.03477 0.1865
## Residual 0.08969 0.2995
## Number of obs: 513, groups: PersonID, 105
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 0.406581 0.299261 1.359
## stressor 0.171954 0.032133 5.351
## symptoms 0.047339 0.013894 3.407
## baseage 0.007901 0.003748 2.108
## women 0.033953 0.051815 0.655
##
## Correlation of Fixed Effects:
## (Intr) strssr symptm baseag
## stressor -0.046
## symptoms -0.014 -0.169
## baseage -0.986 0.010 -0.053
## women -0.019 -0.015 0.115 -0.114
out2 <- summary(fit2)
coef2 <- out2[["coefficients"]]
p2 <- 2 * pnorm(abs(coef2[, "t value"]), lower.tail=FALSE)
formatC(p2, format="e", digits=2)
## (Intercept) stressor symptoms baseage women
## "1.74e-01" "8.73e-08" "6.57e-04" "3.50e-02" "5.12e-01"