\[ \renewcommand\vec{\boldsymbol} \def\bigO#1{\mathcal{O}(#1)} \def\Cond#1#2{\left(#1\,\middle|\, #2\right)} \def\mat#1{\boldsymbol{#1}} \def\der{{\mathop{}\!\mathrm{d}}} \def\argmax{\text{arg}\,\text{max}} \def\Prob{\text{P}} \def\Expec{\text{E}} \def\logit{\text{logit}} \def\diag{\text{diag}} \]
Variable types: multinomial, ordinal, binary, and continuous. Examples: surveys and medical records.
Mostly assumptions about the dependence between variables.
with an approximate expectation maximization (AEM) algorithm (Zhao and Udell 2020b, 2020a).
Fast but may be biased and inefficient.
Latent \(Z_1,Z_2\) are jointly normally distributed. We observe \(X_2 = f(Z_2)\) and \(X_1 \in\{A,B,C\}\). The illustration shows contours of possible conditional densities of the continuous variable on the latent and observed scales.
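A minimal simulation sketch of the latent setup above, not the mdgc implementation. The slide does not specify \(f\) or how \(X_1\) arises from \(Z_1\), so several pieces are assumptions made only for illustration: \(X_1\) is obtained by cutting \(Z_1\) at two hypothetical cutpoints, \(f\) maps \(Z_2\) to a standard exponential scale, and the function name `simulate_copula` and its arguments are invented here.

```python
import math
import numpy as np


def simulate_copula(n, rho, cutpoints=(-0.5, 0.5), seed=0):
    """Draw latent (Z_1, Z_2) and the observed (X_1, X_2).

    Assumptions (not from the slides): X_1 is found by thresholding Z_1 at
    `cutpoints`, and f is the standard exponential quantile function applied
    to Phi(Z_2).
    """
    rng = np.random.default_rng(seed)

    # Latent variables: bivariate normal with unit variances and correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.zeros(2), cov, size=n)

    # X_1 in {A, B, C}: the interval between cutpoints that Z_1 falls into.
    labels = np.array(["A", "B", "C"])
    x1 = labels[np.searchsorted(cutpoints, z[:, 0])]

    # X_2 = f(Z_2) with f(z) = -log(1 - Phi(z)), a monotone increasing map.
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z[:, 1] / math.sqrt(2.0)))
    u = np.clip(u, 1e-12, 1.0 - 1e-12)  # keep the logarithm finite
    x2 = -np.log1p(-u)

    return z, x1, x2


if __name__ == "__main__":
    z, x1, x2 = simulate_copula(n=2000, rho=0.7)
    # With positive rho, X_2 tends to be larger in the higher categories of X_1.
    for level in ("A", "B", "C"):
        print(level, round(float(x2[x1 == level].mean()), 3))
```

Because \(f\) is monotone, the dependence between \(X_1\) and \(X_2\) is governed entirely by the latent correlation, which is the quantity the model estimates.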
An extension of Genz and Bretz (2002).
in addition to the existing binary, ordinal, and continuous variables.
Simulation study comparing our method with the AEM algorithm. Left: relative error of the estimated correlation matrix versus the sample size (gray: our method; white: the AEM algorithm). Right: differences in imputation error between our method and the AEM algorithm for each data type versus the sample size.
Average imputation error and computation time with simulated and observational data sets. We compare with the random forest based missForest (Stekhoven and Buehlmann 2012) and the PCA-like imputeFAMD (Audigier, Husson, and Josse 2014; Josse and Husson 2016).
The mdgc package is on CRAN and at github.com/boennecd/mdgc.
The presentation is at rpubs.com/boennecd/mdgc-ACML.
The markdown is at github.com/boennecd/Talks.
References are on the next slide.