Multilevel Multiple Imputation

Set Working Directory and Read in Data

setwd("H:/USC Guest Lecture/Multilevel MI")
dat  <- read.csv("stress_data.csv")

Impose Missingness into Dataframe

stress_data <- simsem::imposeMissing(dat, 
                                     cov = "baseage", 
                                     pmMAR = .2,
                                     ignoreCols = 1:7)

Describe Data

psych::describe(stress_data)

##           vars   n   mean     sd median trimmed   mad   min    max range  skew
## PersonID     1 525 363.85 201.89 511.00  371.22 69.68 102.0 566.00 464.0 -0.32
## women        2 525   0.73   0.44   1.00    0.79  0.00   0.0   1.00   1.0 -1.05
## baseage      3 525  80.13   6.08  80.02   79.83  6.14  69.7  95.31  25.6  0.36
## session      4 525   4.00   1.42   4.00    4.00  1.48   2.0   6.00   4.0  0.00
## studyday     5 525   5.70   3.84   6.00    5.37  4.45   1.0  17.00  16.0  0.42
## dayofweek    6 525   4.82   1.94   6.00    5.02  1.48   1.0   7.00   6.0 -0.76
## weekend      7 525   0.51   0.50   1.00    0.52  0.00   0.0   1.00   1.0 -0.06
## symptoms     8 419   1.29   1.33   1.00    1.09  1.48   0.0   5.00   5.0  1.06
## mood         9 412   1.19   0.37   1.00    1.10  0.00   1.0   3.00   2.0  2.70
## stressor    10 410   0.44   0.50   0.00    0.43  0.00   0.0   1.00   1.0  0.22
##           kurtosis   se
## PersonID     -1.87 8.81
## women        -0.89 0.02
## baseage      -0.47 0.27
## session      -1.31 0.06
## studyday     -0.71 0.17
## dayofweek    -0.66 0.08
## weekend      -2.00 0.02
## symptoms      0.52 0.06
## mood          7.70 0.02
## stressor     -1.95 0.02

Find Percent Missing in Each Column

pMiss <- function(x){sum(is.na(x))/length(x)*100} #function to find percentage
apply(stress_data,2,pMiss)

##  PersonID     women   baseage   session  studyday dayofweek   weekend  symptoms 
##   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000   0.00000  20.19048 
##      mood  stressor 
##  21.52381  21.90476

Visualize Missing Data Pattern

## Compute the unique response patterns:
pats <- mice::md.pattern(stress_data)

pats

##     PersonID women baseage session studyday dayofweek weekend symptoms mood
## 256        1     1       1       1        1         1       1        1    1
## 72         1     1       1       1        1         1       1        1    1
## 78         1     1       1       1        1         1       1        1    0
## 13         1     1       1       1        1         1       1        1    0
## 60         1     1       1       1        1         1       1        0    1
## 24         1     1       1       1        1         1       1        0    1
## 16         1     1       1       1        1         1       1        0    0
## 6          1     1       1       1        1         1       1        0    0
##            0     0       0       0        0         0       0      106  113
##     stressor    
## 256        1   0
## 72         0   1
## 78         1   1
## 13         0   2
## 60         1   1
## 24         0   2
## 16         1   2
## 6          0   3
##          115 334

Correlations between variables

round(cor(stress_data, use = "pairwise"),3)

##           PersonID  women baseage session studyday dayofweek weekend symptoms
## PersonID     1.000 -0.087  -0.753   0.000   -0.023    -0.053  -0.071   -0.066
## women       -0.087  1.000   0.102   0.000    0.013     0.031   0.026   -0.127
## baseage     -0.753  0.102   1.000   0.000    0.021     0.038   0.056    0.045
## session      0.000  0.000   0.000   1.000    0.862    -0.256  -0.377   -0.035
## studyday    -0.023  0.013   0.021   0.862    1.000    -0.233  -0.407   -0.002
## dayofweek   -0.053  0.031   0.038  -0.256   -0.233     1.000   0.817   -0.010
## weekend     -0.071  0.026   0.056  -0.377   -0.407     0.817   1.000    0.023
## symptoms    -0.066 -0.127   0.045  -0.035   -0.002    -0.010   0.023    1.000
## mood        -0.104 -0.001   0.131   0.033    0.045    -0.022  -0.024    0.342
## stressor     0.027  0.024   0.002   0.059    0.117    -0.084  -0.080    0.296
##             mood stressor
## PersonID  -0.104    0.027
## women     -0.001    0.024
## baseage    0.131    0.002
## session    0.033    0.059
## studyday   0.045    0.117
## dayofweek -0.022   -0.084
## weekend   -0.024   -0.080
## symptoms   0.342    0.296
## mood       1.000    0.303
## stressor   0.303    1.000

Model of Interest

Generate Imputations

The easiest way of specifying the imputation model is to use the formula argument of panImpute. Generally speaking, the imputation model should include all variables that are either (a) part of the model of interest, (b) related to the variables in the model, or (c) related to whether the variables in the model are missing.

In this example, we include stressor and mood as the target variables and baseage, women (gender), and symptoms as auxillary (or related) variables

library(mitml)

## Warning: package 'mitml' was built under R version 4.1.2

## *** This is beta software. Please report any bugs!
## *** See the NEWS file for recent changes.

fml <- mood + stressor + baseage + women + symptoms ~ 1 + (1|PersonID)

Notice how the model is “empty” on the right hand side of the ~ This allows for all relations between variables at level 1 and 2 and is suitable for most applications of multilevel rando mintercept model.

The imputation procedure is then run for 5,000 iterations (burn-in), after which 100 imputations are drawn every 100 iterations.

imp <- panImpute(stress_data, formula = fml, n.burn = 5000, n.iter = 100, m = 100)

## Running burn-in phase ...
## Creating imputed data set ( 1 / 100 ) ...
## Creating imputed data set ( 2 / 100 ) ...
## Creating imputed data set ( 3 / 100 ) ...
## Creating imputed data set ( 4 / 100 ) ...
## Creating imputed data set ( 5 / 100 ) ...
## Creating imputed data set ( 6 / 100 ) ...
## Creating imputed data set ( 7 / 100 ) ...
## Creating imputed data set ( 8 / 100 ) ...
## Creating imputed data set ( 9 / 100 ) ...
## Creating imputed data set ( 10 / 100 ) ...
## Creating imputed data set ( 11 / 100 ) ...
## Creating imputed data set ( 12 / 100 ) ...
## Creating imputed data set ( 13 / 100 ) ...
## Creating imputed data set ( 14 / 100 ) ...
## Creating imputed data set ( 15 / 100 ) ...
## Creating imputed data set ( 16 / 100 ) ...
## Creating imputed data set ( 17 / 100 ) ...
## Creating imputed data set ( 18 / 100 ) ...
## Creating imputed data set ( 19 / 100 ) ...
## Creating imputed data set ( 20 / 100 ) ...
## Creating imputed data set ( 21 / 100 ) ...
## Creating imputed data set ( 22 / 100 ) ...
## Creating imputed data set ( 23 / 100 ) ...
## Creating imputed data set ( 24 / 100 ) ...
## Creating imputed data set ( 25 / 100 ) ...
## Creating imputed data set ( 26 / 100 ) ...
## Creating imputed data set ( 27 / 100 ) ...
## Creating imputed data set ( 28 / 100 ) ...
## Creating imputed data set ( 29 / 100 ) ...
## Creating imputed data set ( 30 / 100 ) ...
## Creating imputed data set ( 31 / 100 ) ...
## Creating imputed data set ( 32 / 100 ) ...
## Creating imputed data set ( 33 / 100 ) ...
## Creating imputed data set ( 34 / 100 ) ...
## Creating imputed data set ( 35 / 100 ) ...
## Creating imputed data set ( 36 / 100 ) ...
## Creating imputed data set ( 37 / 100 ) ...
## Creating imputed data set ( 38 / 100 ) ...
## Creating imputed data set ( 39 / 100 ) ...
## Creating imputed data set ( 40 / 100 ) ...
## Creating imputed data set ( 41 / 100 ) ...
## Creating imputed data set ( 42 / 100 ) ...
## Creating imputed data set ( 43 / 100 ) ...
## Creating imputed data set ( 44 / 100 ) ...
## Creating imputed data set ( 45 / 100 ) ...
## Creating imputed data set ( 46 / 100 ) ...
## Creating imputed data set ( 47 / 100 ) ...
## Creating imputed data set ( 48 / 100 ) ...
## Creating imputed data set ( 49 / 100 ) ...
## Creating imputed data set ( 50 / 100 ) ...
## Creating imputed data set ( 51 / 100 ) ...
## Creating imputed data set ( 52 / 100 ) ...
## Creating imputed data set ( 53 / 100 ) ...
## Creating imputed data set ( 54 / 100 ) ...
## Creating imputed data set ( 55 / 100 ) ...
## Creating imputed data set ( 56 / 100 ) ...
## Creating imputed data set ( 57 / 100 ) ...
## Creating imputed data set ( 58 / 100 ) ...
## Creating imputed data set ( 59 / 100 ) ...
## Creating imputed data set ( 60 / 100 ) ...
## Creating imputed data set ( 61 / 100 ) ...
## Creating imputed data set ( 62 / 100 ) ...
## Creating imputed data set ( 63 / 100 ) ...
## Creating imputed data set ( 64 / 100 ) ...
## Creating imputed data set ( 65 / 100 ) ...
## Creating imputed data set ( 66 / 100 ) ...
## Creating imputed data set ( 67 / 100 ) ...
## Creating imputed data set ( 68 / 100 ) ...
## Creating imputed data set ( 69 / 100 ) ...
## Creating imputed data set ( 70 / 100 ) ...
## Creating imputed data set ( 71 / 100 ) ...
## Creating imputed data set ( 72 / 100 ) ...
## Creating imputed data set ( 73 / 100 ) ...
## Creating imputed data set ( 74 / 100 ) ...
## Creating imputed data set ( 75 / 100 ) ...
## Creating imputed data set ( 76 / 100 ) ...
## Creating imputed data set ( 77 / 100 ) ...
## Creating imputed data set ( 78 / 100 ) ...
## Creating imputed data set ( 79 / 100 ) ...
## Creating imputed data set ( 80 / 100 ) ...
## Creating imputed data set ( 81 / 100 ) ...
## Creating imputed data set ( 82 / 100 ) ...
## Creating imputed data set ( 83 / 100 ) ...
## Creating imputed data set ( 84 / 100 ) ...
## Creating imputed data set ( 85 / 100 ) ...
## Creating imputed data set ( 86 / 100 ) ...
## Creating imputed data set ( 87 / 100 ) ...
## Creating imputed data set ( 88 / 100 ) ...
## Creating imputed data set ( 89 / 100 ) ...
## Creating imputed data set ( 90 / 100 ) ...
## Creating imputed data set ( 91 / 100 ) ...
## Creating imputed data set ( 92 / 100 ) ...
## Creating imputed data set ( 93 / 100 ) ...
## Creating imputed data set ( 94 / 100 ) ...
## Creating imputed data set ( 95 / 100 ) ...
## Creating imputed data set ( 96 / 100 ) ...
## Creating imputed data set ( 97 / 100 ) ...
## Creating imputed data set ( 98 / 100 ) ...
## Creating imputed data set ( 99 / 100 ) ...
## Creating imputed data set ( 100 / 100 ) ...
## Done!

Assessing Convergence

Convergence Plot

Completing the Data

In order to work with and analyze the imputed data sets, the data sets must be completed with the imputations generated in the previous steps. To do so, mitml provides the function mitmlComplete.

implist <- mitmlComplete(imp, "all")

Analysis and Pooling

In order to obtain estimates for the model of interest, the model must be fit separately to each of the completed data sets, and the results must be pooled into a final set of estimates and inferences. The mitml package offers the with function to fit various statistical models to a list of completed data sets.

In this example, we use the lmer function from the R package lme4 to fit the model of interest.

We fit a model with the imputed data, a model with the missing data, and a model with an almost complete dataset.

library(lme4)

## Loading required package: Matrix

fit.imp <- with(implist, lmer(mood ~ 1 + stressor + symptoms + baseage + women + (1|PersonID)))
fit1 <- with(stress_data, lmer(mood ~ 1 + stressor + symptoms + baseage + women + (1|PersonID)))
fit2 <- with(dat, lmer(mood ~ 1 + stressor + symptoms + baseage + women + (1|PersonID)))

Model with Imputed Values

options(scipen = 999)

testEstimates(fit.imp, extra.pars = TRUE)

## 
## Call:
## 
## testEstimates(model = fit.imp, extra.pars = TRUE)
## 
## Final parameter estimates and inferences obtained from 100 imputed data sets.
## 
##              Estimate Std.Error   t.value        df   P(>|t|)       RIV       FMI 
## (Intercept)     0.425     0.333     1.279 23698.763     0.201     0.069     0.065 
## stressor        0.157     0.040     3.955   605.909     0.000     0.678     0.406 
## symptoms        0.042     0.018     2.367   540.386     0.018     0.748     0.430 
## baseage         0.008     0.004     1.929 30055.772     0.054     0.061     0.057 
## women           0.013     0.060     0.217  5055.827     0.828     0.163     0.140 
## 
##                               Estimate 
## Intercept~~Intercept|PersonID    0.046 
## Residual~~Residual               0.078 
## ICC|PersonID                     0.371 
## 
## Unadjusted hypothesis test as appropriate in larger samples.

Model with Listwise Deletion

summary(fit1)

## Linear mixed model fit by REML ['lmerMod']
## Formula: mood ~ 1 + stressor + symptoms + baseage + women + (1 | PersonID)
## 
## REML criterion at convergence: 179.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.3892 -0.4425 -0.1095  0.1261  3.8644 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  PersonID (Intercept) 0.05051  0.2247  
##  Residual             0.07354  0.2712  
## Number of obs: 256, groups:  PersonID, 98
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 0.555048   0.379298   1.463
## stressor    0.190412   0.044143   4.314
## symptoms    0.059472   0.018288   3.252
## baseage     0.005551   0.004749   1.169
## women       0.055240   0.065666   0.841
## 
## Correlation of Fixed Effects:
##          (Intr) strssr symptm baseag
## stressor -0.045                     
## symptoms -0.041 -0.171              
## baseage  -0.986  0.003 -0.028       
## women    -0.009  0.019  0.116 -0.124

out1 <- summary(fit1)
coef1 <- out1[["coefficients"]]
p1 <- 2 * pnorm(abs(coef1[, "t value"]), lower.tail=FALSE)
formatC(p1, format="e", digits=2)

## (Intercept)    stressor    symptoms     baseage       women 
##  "1.43e-01"  "1.61e-05"  "1.15e-03"  "2.42e-01"  "4.00e-01"

Model with < 1% missing in any column

summary(fit2)

## Linear mixed model fit by REML ['lmerMod']
## Formula: mood ~ 1 + stressor + symptoms + baseage + women + (1 | PersonID)
## 
## REML criterion at convergence: 356.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -2.0981 -0.5176 -0.1373  0.1355  4.8647 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  PersonID (Intercept) 0.03477  0.1865  
##  Residual             0.08969  0.2995  
## Number of obs: 513, groups:  PersonID, 105
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 0.406581   0.299261   1.359
## stressor    0.171954   0.032133   5.351
## symptoms    0.047339   0.013894   3.407
## baseage     0.007901   0.003748   2.108
## women       0.033953   0.051815   0.655
## 
## Correlation of Fixed Effects:
##          (Intr) strssr symptm baseag
## stressor -0.046                     
## symptoms -0.014 -0.169              
## baseage  -0.986  0.010 -0.053       
## women    -0.019 -0.015  0.115 -0.114

out2 <- summary(fit2)
coef2 <- out2[["coefficients"]]
p2 <- 2 * pnorm(abs(coef2[, "t value"]), lower.tail=FALSE)
formatC(p2, format="e", digits=2)

## (Intercept)    stressor    symptoms     baseage       women 
##  "1.74e-01"  "8.73e-08"  "6.57e-04"  "3.50e-02"  "5.12e-01"

Multilevel Multiple Imputation - USC Guest Lecture

Omolola A. Odejimi

11/30/2021