This document compares mclust and Mplus output, for the purpose of benchmarking the use of mclust, an open-source tool, to Mplus.estimate_profiles()
uses the mclust R package, while estimate_profiles_mplus()
generates Mplus model syntax which is then run with Mplus. Note that for the estimate_profiles_mplus()
function to work, you must have Mplus installed (and purchased).
library(tidyLPA)
#> tidyLPA provides the functionality to carry out Latent Profile Analysis. Note that tidyLPA is still at the beta stage!
#> Please report any bugs at https://github.com/jrosen48/tidyLPA or send an email to jrosen@msu.edu.
The functions that use MPlus rely on the MplusAutomation R package (and MPlus) to run models from R. For example, let’s run the following function:
m1_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 1)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 488.915
#> BIC is 1042.968
#> Entropy is 0.991
Running this generates four files necessary for running an LPA model through MPlus: 1. a data file (.dat
) 1. an input / syntax file (.inp
) 1. (from running the model through MPlus) an output file (.out
) 1. (also from running the model through MPlus) a modified data file, with the class probabilities and class (profile) assignments (also .dat
, but with a different file name than the data file used in the input)
Here, for instance, is the input file dynamically generated by tidyLPA by the function above:
TITLE: test
DATA: File is d.dat;
VARIABLE:
Names are Sepal_Length Sepal_Width Petal_Length Petal_Width;
Classes = c(2);
MODEL:
%overall%
[Sepal_Length Sepal_Width Petal_Length Petal_Width];
Sepal_Length Sepal_Width Petal_Length Petal_Width;
Sepal_Length WITH Sepal_Width@0;
Sepal_Length WITH Petal_Length@0;
Sepal_Length WITH Petal_Width@0;
Sepal_Width WITH Petal_Length@0;
Sepal_Width WITH Petal_Width@0;
Petal_Length WITH Petal_Width@0;
%c#1%
[Sepal_Length Sepal_Width Petal_Length Petal_Width];
Sepal_Length Sepal_Width Petal_Length Petal_Width(1-4);
Sepal_Length WITH Sepal_Width@0;
Sepal_Length WITH Petal_Length@0;
Sepal_Length WITH Petal_Width@0;
Sepal_Width WITH Petal_Length@0;
Sepal_Width WITH Petal_Width@0;
Petal_Length WITH Petal_Width@0;
%c#2%
[Sepal_Length Sepal_Width Petal_Length Petal_Width];
Sepal_Length Sepal_Width Petal_Length Petal_Width(1-4);
Sepal_Length WITH Sepal_Width@0;
Sepal_Length WITH Petal_Length@0;
Sepal_Length WITH Petal_Width@0;
Sepal_Width WITH Petal_Length@0;
Sepal_Width WITH Petal_Width@0;
Petal_Length WITH Petal_Width@0;
ANALYSIS:
Type is mixture;
start = 20 4;
miterations = 500;
stiterations = 10;
convergence = 1e-06;
processors = 1;
OUTPUT: TECH1 TECH11;
SAVEDATA: File is d-mod.dat;
SAVE = CPROBABILITIES;
Along with the data file that is also automatically generated, this input file is the sent to MPlus (you don’t have to open MPlus or do anything - the MplusAutomation package makes this both possible and easy), and then the output file is returned.
We can see how this input file dynamically changes if we change the function’s arguments by removing a variable, increasing the number of profiles, and changing the model that is specified:
m1_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length,
n_profiles = 3,
model = 2, remove_tmp_files=F)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 308.07
#> BIC is 701.32
#> Entropy is 0.917
TITLE: test
DATA: File is d.dat;
VARIABLE:
Names are Sepal_Length Sepal_Width Petal_Length;
Classes = c(3);
MODEL:
%overall%
[Sepal_Length Sepal_Width Petal_Length];
Sepal_Length Sepal_Width Petal_Length;
Sepal_Length WITH Sepal_Width;
Sepal_Length WITH Petal_Length;
Sepal_Width WITH Petal_Length;
%c#1%
[Sepal_Length Sepal_Width Petal_Length];
Sepal_Length Sepal_Width Petal_Length(1-3);
Sepal_Length WITH Sepal_Width(4);
Sepal_Length WITH Petal_Length(5);
Sepal_Width WITH Petal_Length(6);
%c#2%
[Sepal_Length Sepal_Width Petal_Length];
Sepal_Length Sepal_Width Petal_Length(1-3);
Sepal_Length WITH Sepal_Width(4);
Sepal_Length WITH Petal_Length(5);
Sepal_Width WITH Petal_Length(6);
%c#3%
[Sepal_Length Sepal_Width Petal_Length];
Sepal_Length Sepal_Width Petal_Length(1-3);
Sepal_Length WITH Sepal_Width(4);
Sepal_Length WITH Petal_Length(5);
Sepal_Width WITH Petal_Length(6);
ANALYSIS:
Type is mixture;
start = 20 4;
miterations = 500;
stiterations = 10;
convergence = 1e-06;
processors = 1;
OUTPUT: TECH1 TECH11;
SAVEDATA: File is d-mod.dat;
SAVE = CPROBABILITIES;
Before proceeding, I will note that the functions that use MPlus are still under-development; dynamically generating the input files is technically demanding, and it is possible that these functions will change significantly in later versions. Nevertheless, they work well with testing using both built-in data (like the iris and geyser datasets used in this document) as well as a range of other data.
In the rest of this document, Mplus and mclust are benchmarked to one another. For ease of comparison, the estimated log-likelihood; if these are identical it probably means all of the estimated paramteters are identical (other benchmarks we have done with Mplus and mclust have demonstrated this).
Mplus
m1_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 1)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 488.915
#> BIC is 1042.968
#> Entropy is 0.991
m2_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 2)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 296.448
#> BIC is 688.097
#> Entropy is 1
m3_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 3)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 386.185
#> BIC is 857.551
#> Entropy is 1
m5_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 6)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 214.355
#> BIC is 574.018
#> Entropy is 1
mclust
Note that model 4 can only be specified in Mplus, so is not included here.
m1_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 1)
#> Fit varying means, equal variances, covariances fixed to 0 (Model 1) model with 2 profiles.
#> LogLik is 488.915
#> BIC is 1042.968
#> Entropy is 0.998
m2_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 2)
#> Fit varying means, equal variances and covariances (Model 2) model with 2 profiles.
#> LogLik is 296.448
#> BIC is 688.097
#> Entropy is 1
m3_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 3)
#> Fit varying means and variances, covariances fixed to 0 (Model 3) model with 2 profiles.
#> LogLik is 386.185
#> BIC is 857.551
#> Entropy is 1
m5_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 2,
model = 6)
#> Fit varying means, variances, and covariances (Model 4) model with 2 profiles.
#> LogLik is 214.355
#> BIC is 574.018
#> Entropy is 1
Here is a summary of the log-likelihood statistics:
library(tibble)
iris_log_lik <- tribble(
~model, ~mplus, ~mclust,
1, 488.915, 488.915,
2, 296.448, 296.448,
3, 386.185, 386.185,
5, 214.355, 214.355
)
# iris_log_lik$model <- as.integer(iris_log_lik$model)
What if we specify 3 profiles?
Mplus
m1_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 1)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 361.426
#> BIC is 813.042
#> Entropy is 0.957
m2_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 2)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 256.354
#> BIC is 632.963
#> Entropy is 0.962
m3_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 3)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 307.178
#> BIC is 744.632
#> Entropy is 0.948
m5_mplus <- estimate_profiles_mplus(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 6)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 180.185
#> BIC is 580.839
#> Entropy is 0.97
mclust
Note that model 4 can only be specified in Mplus, so is not included here.
m1_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 1)
#> Fit varying means, equal variances, covariances fixed to 0 (Model 1) model with 3 profiles.
#> LogLik is 361.429
#> BIC is 813.05
#> Entropy is 0.979
m2_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 2)
#> Fit varying means, equal variances and covariances (Model 2) model with 3 profiles.
#> LogLik is 256.355
#> BIC is 632.965
#> Entropy is 0.985
m3_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 3)
#> Fit varying means and variances, covariances fixed to 0 (Model 3) model with 3 profiles.
#> LogLik is 307.181
#> BIC is 744.638
#> Entropy is 0.979
m5_mclust <- estimate_profiles(iris,
Sepal.Length, Sepal.Width, Petal.Length, Petal.Width,
n_profiles = 3,
model = 6)
#> Fit varying means, variances, and covariances (Model 4) model with 3 profiles.
#> LogLik is 180.186
#> BIC is 580.84
#> Entropy is 0.99
iris_log_lik_3
#> # A tibble: 4 x 3
#> model mplus mclust
#> <dbl> <dbl> <dbl>
#> 1 1.00 361 361
#> 2 2.00 256 256
#> 3 3.00 307 307
#> 4 5.00 180 180
Mplus
m1_mplus <- estimate_profiles_mplus(faithful,
eruptions, waiting,
n_profiles = 2,
model = 1)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 1157.68
#> BIC is 2354.601
#> Entropy is 0.993
m2_mplus <- estimate_profiles_mplus(faithful,
eruptions, waiting,
n_profiles = 2,
model = 2)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 1140.187
#> BIC is 2325.22
#> Entropy is 0.993
m3_mplus <- estimate_profiles_mplus(faithful,
eruptions, waiting,
n_profiles = 2,
model = 3)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 1147.806
#> BIC is 2346.065
#> Entropy is 0.999
m5_mplus <- estimate_profiles_mplus(faithful,
eruptions, waiting,
n_profiles = 2,
model = 6)
#> Note that this (and other functions that use MPlus) is at the experimental stage! Please provide feedback at https://github.com/jrosen48/tidyLPA
#> LogLik is 1130.264
#> BIC is 2322.192
#> Entropy is 0.996
mclust
Note that model 4 can only be specified in Mplus, so is not included here.
m1_mclust <- estimate_profiles(faithful,
eruptions, waiting,
n_profiles = 2,
model = 1)
#> Fit varying means, equal variances, covariances fixed to 0 (Model 1) model with 2 profiles.
#> LogLik is 1157.68
#> BIC is 2354.601
#> Entropy is 0.998
m2_mclust <- estimate_profiles(faithful,
eruptions, waiting,
n_profiles = 2,
model = 2)
#> Fit varying means, equal variances and covariances (Model 2) model with 2 profiles.
#> LogLik is 1140.187
#> BIC is 2325.22
#> Entropy is 0.998
m3_mclust <- estimate_profiles(faithful,
eruptions, waiting,
n_profiles = 2,
model = 3)
#> Fit varying means and variances, covariances fixed to 0 (Model 3) model with 2 profiles.
#> LogLik is 1147.806
#> BIC is 2346.065
#> Entropy is 1
m5_mclust <- estimate_profiles(faithful,
eruptions, waiting,
n_profiles = 2,
model = 6)
#> Fit varying means, variances, and covariances (Model 4) model with 2 profiles.
#> LogLik is 1130.264
#> BIC is 2322.192
#> Entropy is 0.999
Here is a summary of the log-likelihood statistics:
geyser_log_lik
#> # A tibble: 4 x 3
#> model mplus mclust
#> <dbl> <dbl> <dbl>
#> 1 1.00 1158 1158
#> 2 2.00 1140 1140
#> 3 3.00 1148 1148
#> 4 5.00 1130 1130
Finding conditions when the models do differ, especially for large or complex datasets, is an important consideration. These differences, when encountered, are possibly due to two reasons; investigating these will be an important next step: