Introduction to Mplus

About Mplus

Mplus is a user-friendly software package for analyzing a wide variety of multivariate statistical models, with or without latent variables.

Advantages include:

Multiple estimators for handling difficult data (e.g. non-normal data)
Models for continuous and categorical data
Missing-data handling with full information maximum likelihood (FIML)
Many good learning resources
Examples of models you can fit in Mplus: CFA, EFA, latent growth curve models, IRT, survival analysis, latent class analysis, and many more

Data Prep

Clean and prep data before importing into Mplus

Mplus reads plain-text data files only: tab-delimited or comma-separated free-format files (e.g. .dat or .csv) or fixed-format text files

Make sure all missing values use the same code (e.g. . or -999)

Don’t include a header row of variable names in the data file; the names are supplied with the NAMES option of the VARIABLE command

Only include numeric variables (no character strings)
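The preparation steps above can be done in base R before handing the file to Mplus. A minimal sketch, assuming -999 as the single missing-value code; `write_for_mplus()` is a hypothetical helper, and the toy data frame is made up for illustration. It drops non-numeric columns, recodes NA, and writes a comma-delimited file with no header row or row names.

```r
# Write a data frame to a plain-text file that Mplus can read.
# Assumptions: -999 is the single missing-value code, and non-numeric
# columns are dropped because Mplus cannot read character strings.
write_for_mplus <- function(df, path, missing_code = -999) {
  is_num <- vapply(df, is.numeric, logical(1))
  if (any(!is_num)) {
    message("Dropping non-numeric columns: ",
            paste(names(df)[!is_num], collapse = ", "))
  }
  df <- df[, is_num, drop = FALSE]
  df[is.na(df)] <- missing_code          # one code for every missing value
  write.table(df, path, sep = ",", quote = FALSE,
              row.names = FALSE, col.names = FALSE)  # no header row
  invisible(names(df))                   # names to paste into NAMES =
}

# Usage with a small made-up data frame
toy <- data.frame(id = 1:3, score = c(2.5, NA, 4), group = c("a", "b", "a"))
out_file <- tempfile(fileext = ".dat")
var_names <- write_for_mplus(toy, out_file)
readLines(out_file)   # "1,2.5" "2,-999" "3,4"
```

The matching VARIABLE options would then be names = id score; and missing = all(-999);.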

Mplus Syntax Basics

There are two kinds of files, both plain text: .inp files contain the input syntax, and .out files contain the output (with the input syntax repeated at the top)

Input lines can be at most 90 characters long (the line and column position are shown at the bottom of the Mplus editor window)

Mplus commands (e.g. DATA:, MODEL:) appear in blue in the editor and end with a colon (:)

Subcommands (options) appear in black and end with a semicolon (;)

Comments start with ! and will turn green

The whole input file runs at once; you cannot run selected lines on their own

The terms IS, ARE, and = can be used interchangeably

Lists of consecutively ordered variables can be abbreviated with a hyphen (e.g. var1-var10)

Only the first 8 characters of variable names are used in the output, so keep names to 8 characters or fewer

Not case sensitive

Order of commands is arbitrary

Items in a list are typically separated by spaces rather than commas

Check that the data were read in correctly by running a basic analysis and comparing the values against the original data set
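Two of the rules above (90-character lines and 8-character variable names) are easy to check from R before running Mplus. The function below is a hypothetical helper, not part of Mplus or MplusAutomation; it only measures lines and names and does not parse Mplus syntax. It also flags names that become identical once cut to 8 characters, since those would be indistinguishable in the output.

```r
# Hypothetical helper: flag lines over 90 characters, variable names over
# 8 characters, and names that become identical once cut to 8 characters.
check_mplus_input <- function(inp_lines, var_names,
                              max_line = 90, max_name = 8) {
  long_lines <- which(nchar(inp_lines) > max_line)
  long_names <- var_names[nchar(var_names) > max_name]
  # Mplus is not case sensitive, so compare truncated names in upper case
  short <- toupper(substr(var_names, 1, max_name))
  clashes <- unique(short[duplicated(short)])
  if (length(long_lines) > 0)
    message("Lines over ", max_line, " characters: ",
            paste(long_lines, collapse = ", "))
  if (length(clashes) > 0)
    message("Names that clash after truncation: ",
            paste(clashes, collapse = ", "))
  list(long_lines = long_lines, long_names = long_names, clashes = clashes)
}

inp <- c("TITLE: Regression;", "MODEL: Timing1_code ON Impact1_code;")
check_mplus_input(inp, c("Impact1_code", "Timing1_code"))$long_names
check_mplus_input(inp, c("Leadership1_code", "Leadership2_code"))$clashes
```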

Mplus Commands

TITLE: optional

DATA:

file = the name of your data file. Give the full path if the data file is not in the same directory as the input file

VARIABLE:

names = the names of all variables in the data file, listed in the same order as the columns

usevariables = list out only the variables you will be using in your analysis

useobservations = selects a subset of cases for the analysis (e.g. useobservations = gender EQ 1;)

missing = specify the code for your missing values (e.g. missing = .;)

idvariable = a case identifier that will be included in any saved dataset (e.g. idvariable = UserID;)

ANALYSIS: under this command, you specify the type of analysis

type = six main types: GENERAL (the default), MIXTURE, TWOLEVEL, THREELEVEL, CROSSCLASSIFIED, and EFA

estimator = this is where you can choose your estimator (e.g. ML, WLS, BAYES)

MODEL: used to specify the desired model to be fit

BY is used to define latent variables (e.g. factor1 BY var1 var2 var3;)

ON is used to specify regression slopes (e.g. outcomevar ON pred1 pred2 pred3;)

WITH is used to define covariances (e.g. var1 WITH var2;)

Notes about the MODEL command:

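To freely estimate a parameter that is fixed by default, follow it with an asterisk (e.g. F1 BY var1* var2 var3;). Mplus fixes the first loading of each factor at 1 by default, so var1* frees it; the factor variance must then be fixed instead (e.g. F1@1;) to identify the model

To fix parameter values, use @ (e.g. F1 BY var1@1 var2@2 var3@3;)

To refer to means, place variables inside [ ] (e.g. [var1];)

To refer to variances (residual variances for dependent variables), just list the variables (e.g. var1 var2 var3;)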

OUTPUT: this is where you request specific pieces of output not provided by default.

SAMPSTAT this option provides sample statistics (e.g. means, covariances, and correlations)

STANDARDIZED this option standardizes parameter estimates in 3 different ways:

StdYX uses the variances of the continuous latent variables as well as the variances of the background and outcome variables for standardization.

StdY uses the variances of the continuous latent variables as well as the variances of the outcome variables for standardization.

Std uses the variances of the continuous latent variables for standardization.
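For a regression of a continuous y on a continuous x, the StdYX slope is the unstandardized slope multiplied by SD(x)/SD(y), and StdY divides by SD(y) only. With a single predictor the StdYX slope equals the correlation, which is why the workshop regression later reports a StdYX slope (0.548) equal to the square root of its R-square (0.301). The sketch below checks this on simulated data, not the workshop data; Mplus uses model-estimated variances, but the N versus N - 1 divisor cancels in the ratio.

```r
# Simulated data (not the workshop data) for a one-predictor regression
set.seed(1)
n <- 200
x <- rnorm(n, mean = 3.4, sd = 0.9)
y <- 2.6 + 0.44 * x + rnorm(n, sd = 0.65)

b <- unname(coef(lm(y ~ x))["x"])  # unstandardized slope
b_stdyx <- b * sd(x) / sd(y)       # StdYX: standardize by SD(x) and SD(y)
b_stdy  <- b / sd(y)               # StdY: standardize by SD(y) only

# With a single predictor, the StdYX slope equals the correlation
all.equal(b_stdyx, cor(x, y))      # TRUE
```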

SAVEDATA: this is how you save datasets from the analysis, such as factor scores or model parameters.

file = specify a file where you want the saved data to go

save = specify what output you want to save (e.g. save = fscores;)
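To see how the commands fit together, the sketch below writes a complete input file for a two-factor CFA as plain text from base R. Everything specific here is hypothetical: the short variable names (chosen to stay within 8 characters), the data file vp.dat, the missing code -999, and the factor-score file cfa_fscores.dat. The resulting .inp file could then be opened and run in Mplus.

```r
# Assemble an Mplus input file as plain text (hypothetical names throughout)
inp <- c(
  "TITLE:    Two-factor CFA;",
  "DATA:     FILE = vp.dat;",
  "VARIABLE: NAMES = vis1 vis2 lead1 lead2 lead3 impact timing;",
  "          USEVARIABLES = vis1-lead3;",
  "          MISSING = ALL (-999);",
  "ANALYSIS: TYPE = GENERAL;",
  "          ESTIMATOR = ML;",
  "MODEL:    vis BY vis1 vis2;      ! first loading fixed at 1 by default",
  "          lead BY lead1-lead3;",
  "          lead WITH vis;         ! factor covariance",
  "OUTPUT:   SAMPSTAT STANDARDIZED;",
  "SAVEDATA: FILE = cfa_fscores.dat;",
  "          SAVE = FSCORES;"
)
inp_file <- file.path(tempdir(), "cfa.inp")
writeLines(inp, inp_file)
max(nchar(inp))   # every line stays within the 90-character limit
```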

Linear Structural Equation Models

SEM can analyze relations among multiple dependent and independent variables simultaneously

SEM can also model relations at the latent level, among latent variables measured by manifest (observed) variables, so those relations are free of measurement error

Latent variables are variables that are not directly observed but are assumed to account for covariances among variables measuring the same construct

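SEM with latent variables consists of two parts: a measurement model, which links each latent variable to its observed indicators, and a structural model, which specifies the relations among the latent variables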
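In the measurement model, each indicator is a loading times its factor plus a residual, so the model implies the covariance matrix Sigma = Lambda Phi Lambda' + Theta (loadings, factor covariances, residual variances). The sketch below computes Sigma in base R for a hypothetical two-factor model with made-up parameter values; it shows how the factors "account for" the covariances among their indicators, and how indicators of different factors covary only through the factor covariance.

```r
# Hypothetical loadings: 2 indicators on factor 1, 3 indicators on factor 2
Lambda <- matrix(c(1.0, 0.8, 0,   0,   0,
                   0,   0,   1.0, 0.9, 0.7), ncol = 2)
Phi   <- matrix(c(0.50, 0.20,
                  0.20, 0.40), nrow = 2)           # factor (co)variances
Theta <- diag(c(0.30, 0.35, 0.25, 0.30, 0.40))     # residual variances

# Model-implied covariance matrix of the five indicators
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta
round(Sigma, 3)
# Same factor:      Sigma[1, 2] = 1.0 * 0.8 * 0.50 = 0.40
# Different factor: Sigma[1, 3] = 1.0 * 1.0 * 0.20 = 0.20
```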

Simple Linear Regression with Manifest Variables

Let’s start by checking to make sure the data were imported properly

vp <- read.csv("/Users/pegad/Desktop/Mplus_workshop/VP_Promo.csv", na.strings = ".")
names(vp)

## [1] "Visibility1_code" "Visibility2_code" "Leadership1_code"
## [4] "Leadership2_code" "Leadership3_code" "Impact1_code"    
## [7] "Timing1_code"

dim(vp)

## [1] 188   7

summary(vp)

##  Visibility1_code Visibility2_code Leadership1_code Leadership2_code
##  Min.   :0.000    Min.   :0.000    Min.   :1.000    Min.   :1.00    
##  1st Qu.:3.000    1st Qu.:3.000    1st Qu.:4.000    1st Qu.:4.00    
##  Median :4.000    Median :4.000    Median :4.000    Median :4.00    
##  Mean   :3.642    Mean   :3.814    Mean   :4.051    Mean   :4.07    
##  3rd Qu.:5.000    3rd Qu.:5.000    3rd Qu.:5.000    3rd Qu.:5.00    
##  Max.   :5.000    Max.   :5.000    Max.   :5.000    Max.   :5.00    
##  NA's   :1                         NA's   :10       NA's   :17      
##  Leadership3_code  Impact1_code    Timing1_code  
##  Min.   :2.000    Min.   :1.000   Min.   :1.000  
##  1st Qu.:4.000    1st Qu.:3.000   1st Qu.:4.000  
##  Median :4.000    Median :3.000   Median :4.000  
##  Mean   :4.239    Mean   :3.408   Mean   :4.027  
##  3rd Qu.:5.000    3rd Qu.:4.000   3rd Qu.:5.000  
##  Max.   :5.000    Max.   :5.000   Max.   :5.000  
##  NA's   :12       NA's   :4

Now we can run a simple linear regression to see if impact scores predict timing scores

summary(lm(Timing1_code ~ Impact1_code, data = vp))

## 
## Call:
## lm(formula = Timing1_code ~ Impact1_code, data = vp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.90419 -0.33924  0.09581  0.53086  1.09581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.59903    0.17427  14.914  < 2e-16 ***
## Impact1_code  0.43505    0.04917   8.847 7.75e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6494 on 182 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.3007, Adjusted R-squared:  0.2969 
## F-statistic: 78.27 on 1 and 182 DF,  p-value: 7.745e-16

Factor Analysis

Moving on to models with latent variables

Factor models are data reduction models: they reduce many observed variables to a smaller number of factors

Then, you can take these factors and use them in subsequent analyses

There are two kinds of factor models - Exploratory Factor Analysis and Confirmatory Factor Analysis

EFA assumes no structure and explores the relations among variables to find factors
CFA assumes a structure among the variables based on previous research or theory

Exploratory Factor Analysis

EFA is an exploratory approach: we make no assumptions about which variables load on which factors and instead see what structure emerges from the data
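Mplus runs an EFA through TYPE = EFA in the ANALYSIS command. To illustrate the idea in R, the sketch below swaps in base R's `factanal()` (maximum likelihood extraction, varimax rotation by default) on simulated data with a known two-factor structure; it is not the workshop's EFA and the six items are hypothetical. The EFA is not told which items belong together, yet the a-items and b-items should separate onto different factors.

```r
set.seed(42)
n <- 500
f1 <- rnorm(n); f2 <- rnorm(n)   # two uncorrelated latent factors
# Three indicators per factor plus noise (hypothetical structure)
X <- data.frame(
  a1 = 0.8 * f1 + rnorm(n, sd = 0.5),
  a2 = 0.7 * f1 + rnorm(n, sd = 0.5),
  a3 = 0.6 * f1 + rnorm(n, sd = 0.5),
  b1 = 0.8 * f2 + rnorm(n, sd = 0.5),
  b2 = 0.7 * f2 + rnorm(n, sd = 0.5),
  b3 = 0.6 * f2 + rnorm(n, sd = 0.5)
)
efa <- factanal(X, factors = 2)      # ML extraction, varimax rotation
print(efa$loadings, cutoff = 0.3)    # hide small loadings
```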

Confirmatory Factor Analysis

With CFA, we can test a model that assumes certain variables load onto certain factors

CFA is used to test a model based on theory or previous research

If we have a large enough data set, we can split the data set into two samples, then run an EFA on one half, and a CFA on the other half
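The random split can be done in R before exporting each half to Mplus. A minimal sketch, assuming a 50/50 split; `dat` is a simulated stand-in with the same number of rows as the workshop data (188), and `set.seed()` makes the split reproducible.

```r
set.seed(2024)
# Stand-in data frame with the same number of rows as the workshop data
dat <- data.frame(id = 1:188, score = rnorm(188))

# Randomly pick half of the rows for the EFA; the rest go to the CFA
efa_rows <- sample(nrow(dat), size = floor(nrow(dat) / 2))
efa_half <- dat[efa_rows, ]
cfa_half <- dat[-efa_rows, ]
c(nrow(efa_half), nrow(cfa_half))   # 94 94
```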

We can test whether vis1-vis2 load onto a broader Visibility factor, whether lead1-lead3 load onto a broader Leadership factor, and how the two factors are related
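Before fitting, it helps to count degrees of freedom to see whether this model is identified. A sketch of the count, assuming the Mplus default of fixing the first loading of each factor at 1 and counting the covariance structure only (adding the mean structure adds 5 observed means and 5 intercepts, leaving df unchanged):

```r
# Degrees of freedom for the two-factor CFA (covariance structure only)
p <- 5                                # indicators: vis1-vis2, lead1-lead3
n_moments <- p * (p + 1) / 2          # unique variances and covariances
free_loadings <- 5 - 2                # first loading per factor fixed at 1
resid_vars    <- 5                    # one residual variance per indicator
factor_vars   <- 2                    # Visibility and Leadership
factor_cov    <- 1                    # Leadership WITH Visibility
n_params <- free_loadings + resid_vars + factor_vars + factor_cov
model_df <- n_moments - n_params
c(moments = n_moments, params = n_params, df = model_df)   # 15 11 4
```

A two-indicator factor on its own is not identified (3 moments, 4 parameters); here the Visibility factor borrows information from its covariance with Leadership, so if that covariance is close to zero the model can become empirically underidentified.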

The first thing to check when testing a model is model fit

There are a number of different fit indices to consult:

Chi-square Test - Tests the null hypothesis that the population covariance matrix and mean vector equal the model-implied covariance matrix and mean vector (a test of exact fit). A significant value suggests the model does not fit exactly in the population.
Chi-square Difference Test - Used to statistically compare two nested models. Nested means one model is a special case of another, more general model.
Comparative Fit Index (CFI) - Compares the target model to a baseline model that assumes no relations among the variables. The closer the CFI is to 1, the better the model fits; a CFI of .95 or greater is conventionally taken to indicate good fit.
Tucker-Lewis Index (TLI) - Interpreted like the CFI, with the same cutoff.
Root Mean Square Error of Approximation (RMSEA) - A measure of approximate fit. The closer to 0, the better the fit; values below .05 are conventionally taken to indicate good fit.
Information Criteria (AIC, BIC) - Can be used to compare non-nested models. The model with the smaller AIC or BIC is preferred.
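The formulas behind these indices are short enough to compute by hand. The sketch below uses common textbook forms; the RMSEA is written with N - 1, and programs differ on whether they use N or N - 1, so small discrepancies from Mplus output are possible. The chi-square values in the first usage block are hypothetical; the information criteria are checked against the workshop regression's Mplus summary (LL = -180.657, 3 parameters, N = 184).

```r
# Approximate-fit indices from model (chisq, df) and baseline (chisq_b, df_b)
rmsea <- function(chisq, df, n) sqrt(max(chisq - df, 0) / (df * (n - 1)))
cfi <- function(chisq, df, chisq_b, df_b) {
  1 - max(chisq - df, 0) / max(chisq - df, chisq_b - df_b, 0)
}
tli <- function(chisq, df, chisq_b, df_b) {
  ((chisq_b / df_b) - (chisq / df)) / ((chisq_b / df_b) - 1)
}
# Information criteria from the log-likelihood (ll) and parameter count (k)
aic  <- function(ll, k)    -2 * ll + 2 * k
bic  <- function(ll, k, n) -2 * ll + k * log(n)
abic <- function(ll, k, n) -2 * ll + k * log((n + 2) / 24)  # sample-size adjusted

# Hypothetical CFA: chi-square 10 on 5 df, baseline 200 on 10 df, N = 101
rmsea(10, 5, 101)          # 0.1
cfi(10, 5, 200, 10)        # about 0.974
tli(10, 5, 200, 10)        # about 0.947

# Workshop regression: matches AIC, BIC, and aBIC in the Mplus summary
aic(-180.657, 3)           # 367.314
bic(-180.657, 3, 184)      # about 376.959
abic(-180.657, 3, 184)     # about 367.457
```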

Running Mplus through R with MplusAutomation

We can use the R package MplusAutomation to automate the Mplus workflow from R

Use the prepareMplusData function to write an Mplus-ready data file from an R data frame (with inpfile = TRUE it also writes a starter input file)

library(MplusAutomation)
prepareMplusData(vp, "/Users/pegad/Desktop/Mplus_workshop/R_to_Mplus.dat", inpfile = TRUE)

We can also create and run Mplus scripts within R

regression_mod <- mplusObject(TITLE = "Regression;", MODEL = "Timing1_code ON Impact1_code;",
   usevariables = c("Impact1_code", "Timing1_code"), OUTPUT = "SAMPSTAT STDYX;", rdata = vp)

cat(createSyntax(regression_mod, filename = "R_to_Mplus.dat"))

## All ok

## TITLE:
## Regression;
## DATA:
## FILE = "R_to_Mplus.dat";
##  
## VARIABLE:
## NAMES = Impact1_code Timing1_code; 
##  MISSING=.;
##  
## MODEL:
## Timing1_code ON Impact1_code;
## OUTPUT:
## SAMPSTAT STDYX;

results <- mplusModeler(regression_mod, dataout = "test.dat", run = 1)

## Wrote model to: test.inp

## Wrote data to: test.dat

## Warning in prepareMplusData(df = data[i, object$usevariables], filename =
## dataout, : The file 'test.dat' currently exists and will be overwritten

## 
## Running model: test.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "test.inp" 
## Reading model:  test.out

Viewing the model results is a little complicated, because mplusModeler() returns a nested list

class(results)

## [1] "mplusObject" "list"

names(results)

##  [1] "TITLE"        "DATA"         "VARIABLE"     "DEFINE"      
##  [5] "ANALYSIS"     "MODEL"        "OUTPUT"       "SAVEDATA"    
##  [9] "PLOT"         "results"      "usevariables" "rdata"       
## [13] "imputed"

results$results

## $input
## $title
## [1] " Regression;"
## 
## $data
## $data$file
## [1] "\"test.dat\""
## 
## 
## $variable
## $variable$names
## [1] "Impact1_code Timing1_code"
## 
## $variable$missing
## [1] "."
## 
## 
## $model
## [1] ""                                "  Timing1_code ON Impact1_code;"
## 
## $output
## [1] ""                  "  SAMPSTAT STDYX;" ""                 
## [4] ""                  ""                 
## 
## attr(,"class")
## [1] "list"      "mplus.inp"
## attr(,"start.line")
## [1] 6
## attr(,"end.line")
## [1] 22
## 
## $warnings
## [[1]]
## [1] "Note that only the first 8 characters of variable names are used in the output."
## [2] "Shorten variable names to avoid any confusion."                                 
## 
## [[2]]
## [1] "Data set contains cases with missing on x-variables."
## [2] "These cases were not included in the analysis."      
## [3] "Number of cases with missing on x-variables:  4"     
## [4] "2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS"        
## 
## attr(,"class")
## [1] "list"           "mplus.warnings"
## 
## $errors
## list()
## attr(,"class")
## [1] "list"         "mplus.errors"
## 
## $summaries
##   Mplus.version        Title AnalysisType   DataType Estimator
## 1           7.4  Regression;      GENERAL INDIVIDUAL        ML
##   Observations Parameters ChiSqM_Value ChiSqM_DF ChiSqM_PValue
## 1          184          3            0         0             0
##   ChiSqBaseline_Value ChiSqBaseline_DF ChiSqBaseline_PValue       LL
## 1              65.823                1                    0 -180.657
##   UnrestrictedLL CFI TLI     AIC     BIC    aBIC RMSEA_Estimate
## 1       -180.657   1   1 367.314 376.959 367.457              0
##   RMSEA_90CI_LB RMSEA_90CI_UB RMSEA_pLT05 SRMR     AICC Filename
## 1             0             0           0    0 367.4473 test.out
## 
## $parameters
## $parameters$unstandardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.435 0.049  8.896    0
## 2         Intercepts TIMING1_CO 2.599 0.173 14.996    0
## 3 Residual.Variances TIMING1_CO 0.417 0.043  9.592    0
## 
## $parameters$r2
##      param   est    se est_se pval
## 1 TIMING1_ 0.301 0.057  5.319    0
## 
## $parameters$stdyx.standardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.548 0.052 10.638    0
## 2         Intercepts TIMING1_CO 3.365 0.361  9.325    0
## 3 Residual.Variances TIMING1_CO 0.699 0.057 12.368    0
## 
## 
## $class_counts
## list()
## 
## $residuals
## list()
## 
## $tech1
## list()
## 
## $tech3
## list()
## 
## $tech4
## list()
## 
## $tech7
## list()
## 
## $tech9
## list()
## attr(,"class")
## [1] "list"        "mplus.tech9"
## 
## $tech12
## list()
## 
## $fac_score_stats
## list()
## attr(,"class")
## [1] "list"                "mplus.facscorestats"
## 
## $gh5
## list()
## 
## attr(,"class")
## [1] "mplus.model" "list"       
## attr(,"filename")
## [1] "test.out"

results$results$parameters$stdyx.standardized

##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.548 0.052 10.638    0
## 2         Intercepts TIMING1_CO 3.365 0.361  9.325    0
## 3 Residual.Variances TIMING1_CO 0.699 0.057 12.368    0

Other functions in the package make the results easier to view; for example, extractModelParameters() reads the parameter estimates directly from an .out file

extractModelParameters("/Users/pegad/Desktop/Mplus_workshop/test.out")

## $unstandardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.435 0.049  8.896    0
## 2         Intercepts TIMING1_CO 2.599 0.173 14.996    0
## 3 Residual.Variances TIMING1_CO 0.417 0.043  9.592    0
## 
## $r2
##      param   est    se est_se pval
## 1 TIMING1_ 0.301 0.057  5.319    0
## 
## $stdyx.standardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.548 0.052 10.638    0
## 2         Intercepts TIMING1_CO 3.365 0.361  9.325    0
## 3 Residual.Variances TIMING1_CO 0.699 0.057 12.368    0
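The Mplus estimates agree with the earlier lm() fit (slope 0.435, intercept 2.599). The residual variance is slightly smaller (0.417 versus 0.6494^2, about 0.422) because ML divides the residual sum of squares by N, whereas lm() divides by N - 2. The z-values Mplus reports are likewise consistent with the lm() t-values rescaled by sqrt(184/182). A quick check using the numbers printed above, plus the same relationship on simulated data:

```r
# Numbers from the lm() and Mplus output above (N = 184 complete cases)
n <- 184
sigma2_ols <- 0.6494^2                  # lm(): RSS / (N - 2)
sigma2_ml  <- sigma2_ols * (n - 2) / n  # ML: RSS / N
round(sigma2_ml, 3)                     # 0.417, the Mplus residual variance

# Standard errors shrink by sqrt((N - 2) / N), so test statistics grow
z_slope <- 8.847 * sqrt(n / (n - 2))    # about 8.90 (Mplus reports 8.896)

# The same relationship on simulated data
set.seed(3)
x <- rnorm(50); y <- 1 + 0.5 * x + rnorm(50)
fit <- lm(y ~ x)
ml_check <- all.equal(mean(resid(fit)^2), sigma(fit)^2 * (50 - 2) / 50)
ml_check                                # TRUE
```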

We can also use runModels() to run all of the Mplus input files in a directory as a batch

runModels("/Users/pegad/Desktop/Mplus_workshop", replaceOutfile = FALSE)

## 
## Running model: CFA_break.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "CFA_break.inp" 
## 
## Running model: CFA.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "CFA.inp" 
## 
## Running model: Data_check.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "Data_check.inp" 
## 
## Running model: EFA.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "EFA.inp" 
## 
## Running model: R_to_Mplus.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "R_to_Mplus.inp" 
## 
## Running model: Simple_regression.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "Simple_regression.inp" 
## 
## Running model: test.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "test.inp"

Common Errors or Issues

*** WARNING in VARIABLE command Note that only the first 8 characters of variable names are used in the output. Shorten variable names to avoid any confusion.
*** WARNING Input line exceeded 90 characters. Some input may be truncated.
*** WARNING in MODEL command All variables are uncorrelated with all other variables in the model.
THE MODEL ESTIMATION TERMINATED NORMALLY

THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 15, LEAD WITH VIS

If your model doesn’t run (spits out an error message or stops mid-run), check the following:

1. Semicolons at the end of each option statement
2. Correct variables specified in the usevariables option
3. Data file name and path specified correctly
4. Everything spelled correctly
5. Variables named in the same order as the columns of the data file
6. Missing data coded correctly

Mplus Resources

Mplus User’s Guide v.7
Mplus Syntax Cheatsheet
Data Analysis with Mplus by Christian Geiser
UCLA IDRE
MplusAutomation