Run Mplus from within Rmarkdown

Need to set up mplus as an engine for knitr. I am hacking my way through this:

Code chunks in my Rmd

You might not want the Stata line

   knitr::opts_chunk$set(engine.path = list(
      stata = "/Applications/Stata/StataMP.app/Contents/MacOS/stata-mp" ,
      mplus = "/Applications/Mplus/mplus"
   ))
  knitr::knit_engines$set(mplus = function(options) {
    code <- paste(options$code, collapse = "\n")
    fileConn<-file("formplus.inp")
    writeLines(code, fileConn)
    close(fileConn)
    out  <- system2("/Applications/Mplus/mplus", "formplus.inp")
    fileConnOutput <- file("formplus.out")
    mplusOutput <- readLines(fileConnOutput)
    knitr::engine_output(options, code, mplusOutput)
})

Use MplusAutomation to prepare data

First get some data

#install.packages("palmerpenguins")
library(palmerpenguins)
df <- penguins
str(df)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
df <- df[,c("body_mass_g","flipper_length_mm")]
str(df)
## tibble [344 × 2] (S3: tbl_df/tbl/data.frame)
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...

Now load up MplusAutomation package

#install.packages("MplusAutomation")
library(MplusAutomation)
## Version:  0.8
## We work hard to write this free software. Please help us get credit by citing: 
## 
## Hallquist, M. N. & Wiley, J. F. (2018). MplusAutomation: An R Package for Facilitating Large-Scale Latent Variable Analyses in Mplus. Structural Equation Modeling, 25, 621-638. doi: 10.1080/10705511.2017.1402334.
## 
## -- see citation("MplusAutomation").
MplusAutomation::prepareMplusData(df,"pp.dat",interactive="FALSE")
## The file(s)
##  'pp.dat' 
## currently exist(s) and will be overwritten
## TITLE: Your title goes here
## DATA: FILE = "pp.dat";
## VARIABLE: 
## NAMES = body_mass_g flipper_length_mm; 
## MISSING=.;

Here’s a picture of what the next chunk of my rmd file looks like

Figure 1. Isn’t it amazing

Now specify and run a simple Mplus model in mplus/Rmarkdown

About variable names. The prepared pp.dat file is just data, no variable names just like Mplus wants it. Programmer has to keep track of the order, and that is why it is so nice that MplusAutomation::prepareMplusData produces that shell of an Mplus input file. But importantly, Mplus will accept data with >8 characters but will only report the first 8 in output. It is therefore better to use short variable names when working with Mplus. I am just going to use different names in the code below that are short. Mplus will also make these all uppercase in output.

TITLE:     Palmer penguins body mass and flipper length
DATA:      FILE = pp.dat; 
VARIABLE:  NAMES = weight flipper ; 
           MISSING = . ; 
MODEL:     weight on flipper ;
## Mplus VERSION 8.5 (Mac)
## MUTHEN & MUTHEN
## 02/26/2021  11:15 AM
## 
## INPUT INSTRUCTIONS
## 
##   TITLE:     Palmer penguins body mass and flipper length
##   DATA:      FILE = pp.dat;
##   VARIABLE:  NAMES = weight flipper ;
##              MISSING = . ;
##   MODEL:     weight on flipper ;
## 
## 
## 
## *** WARNING
##   Data set contains cases with missing on all variables.
##   These cases were not included in the analysis.
##   Number of cases with missing on all variables:  2
##    1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
## 
## 
## 
## Palmer penguins body mass and flipper length
## 
## SUMMARY OF ANALYSIS
## 
## Number of groups                                                 1
## Number of observations                                         342
## 
## Number of dependent variables                                    1
## Number of independent variables                                  1
## Number of continuous latent variables                            0
## 
## Observed dependent variables
## 
##   Continuous
##    WEIGHT
## 
## Observed independent variables
##    FLIPPER
## 
## 
## Estimator                                                       ML
## Information matrix                                        OBSERVED
## Maximum number of iterations                                  1000
## Convergence criterion                                    0.500D-04
## Maximum number of steepest descent iterations                   20
## Maximum number of iterations for H1                           2000
## Convergence criterion for H1                             0.100D-03
## 
## Input data file(s)
##   pp.dat
## 
## Input data format  FREE
## 
## 
## SUMMARY OF DATA
## 
##      Number of missing data patterns             1
## 
## 
## COVARIANCE COVERAGE OF DATA
## 
## Minimum covariance coverage value   0.100
## 
## 
##      PROPORTION OF DATA PRESENT
## 
## 
##            Covariance Coverage
##               WEIGHT        FLIPPER
##               ________      ________
##  WEIGHT         1.000
##  FLIPPER        1.000         1.000
## 
## 
## 
## UNIVARIATE SAMPLE STATISTICS
## 
## 
##      UNIVARIATE HIGHER-ORDER MOMENT DESCRIPTIVE STATISTICS
## 
##          Variable/         Mean/     Skewness/   Minimum/ % with                Percentiles
##         Sample Size      Variance    Kurtosis    Maximum  Min/Max      20%/60%    40%/80%    Median
## 
##      WEIGHT             4201.754       0.468    2700.000    0.29%    3450.000   3800.000   4050.000
##              342.000  641250.577      -0.726    6300.000    0.29%    4300.000   4950.000
##      FLIPPER             200.915       0.344     172.000    0.29%     188.000    194.000    197.000
##              342.000     197.154      -0.987     231.000    0.29%     203.000    215.000
## 
## 
## THE MODEL ESTIMATION TERMINATED NORMALLY
## 
## 
## 
## MODEL FIT INFORMATION
## 
## Number of Free Parameters                        3
## 
## Loglikelihood
## 
##           H0 Value                       -2528.427
##           H1 Value                       -2528.427
## 
## Information Criteria
## 
##           Akaike (AIC)                    5062.855
##           Bayesian (BIC)                  5074.359
##           Sample-Size Adjusted BIC        5064.843
##             (n* = (n + 2) / 24)
## 
## Chi-Square Test of Model Fit
## 
##           Value                              0.000
##           Degrees of Freedom                     0
##           P-Value                           0.0000
## 
## RMSEA (Root Mean Square Error Of Approximation)
## 
##           Estimate                           0.000
##           90 Percent C.I.                    0.000  0.000
##           Probability RMSEA <= .05           0.000
## 
## CFI/TLI
## 
##           CFI                                1.000
##           TLI                                1.000
## 
## Chi-Square Test of Model Fit for the Baseline Model
## 
##           Value                            486.641
##           Degrees of Freedom                     1
##           P-Value                           0.0000
## 
## SRMR (Standardized Root Mean Square Residual)
## 
##           Value                              0.000
## 
## 
## 
## MODEL RESULTS
## 
##                                                     Two-Tailed
##                     Estimate       S.E.  Est./S.E.    P-Value
## 
##  WEIGHT   ON
##     FLIPPER           49.685      1.514     32.818      0.000
## 
##  Intercepts
##     WEIGHT         -5780.780    304.917    -18.959      0.000
## 
##  Residual Variances
##     WEIGHT         *********  11818.094     13.077      0.000
## 
## 
## QUALITY OF NUMERICAL RESULTS
## 
##      Condition Number for the Information Matrix              0.579E-03
##        (ratio of smallest to largest eigenvalue)
## 
## 
##      Beginning Time:  11:15:48
##         Ending Time:  11:15:48
##        Elapsed Time:  00:00:00
## 
## 
## 
## MUTHEN & MUTHEN
## 3463 Stoner Ave.
## Los Angeles, CA  90066
## 
## Tel: (310) 391-9971
## Fax: (310) 391-8971
## Web: www.StatModel.com
## Support: Support@StatModel.com
## 
## Copyright (c) 1998-2020 Muthen & Muthen

Check model in R

model <- glm(body_mass_g ~ flipper_length_mm , data=df)
summary(model)
## 
## Call:
## glm(formula = body_mass_g ~ flipper_length_mm, data = df)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1058.80   -259.27    -26.88    247.33   1288.69  
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -5780.831    305.815  -18.90   <2e-16 ***
## flipper_length_mm    49.686      1.518   32.72   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 155455.3)
## 
##     Null deviance: 219307697  on 341  degrees of freedom
## Residual deviance:  52854796  on 340  degrees of freedom
##   (2 observations deleted due to missingness)
## AIC: 5062.9
## 
## Number of Fisher Scoring iterations: 2

Kind of interesting they are not exactly the same.

Now I use the MplusAutomation::readModels to get useful information from the output back to R

penguinsResults <- MplusAutomation::readModels(target="formplus.out")
## Reading model:  formplus.out
summary(penguinsResults)
## Palmer penguins body mass and flipper lengthEstimated using ML 
## Number of obs: 342, number of (free) parameters: 3 
## 
## Model: Chi2(df = 0) = 0, p = 0 
## Baseline model: Chi2(df = 1) = 486.641, p = 0 
## 
## Fit Indices: 
## 
## CFI = 1, TLI = 1, SRMR = 0 
## RMSEA = 0, 90% CI [0, 0], p < .05 = 0 
## AIC = 5062.855, BIC = 5074.359
print(penguinsResults$parameters)
## $unstandardized
##          paramHeader   param       est        se  est_se pval
## 1          WEIGHT.ON FLIPPER    49.685     1.514  32.818    0
## 2         Intercepts  WEIGHT  -5780.78   304.917 -18.959    0
## 3 Residual.Variances  WEIGHT ********* 11818.094  13.077    0