About Mplus

Mplus is a user-friendly statistical software package for analyzing a wide variety of multivariate statistical models, with or without latent variables.
Advantages include:
  • Different estimation algorithms to handle tricky data (e.g. non-normal data)
  • Modeling continuous and categorical data
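  • Handling missing data (full information maximum likelihood, FIML)
  • Many good resources (e.g. the Mplus User's Guide, which includes many worked examples)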
  • Examples of models you can fit in Mplus: CFA, EFA, Latent Growth Curve, IRT, Survival, Latent Class Analysis, and many more


Data Prep

Clean and prep data before importing into Mplus
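Mplus reads only plain-text data files: free format (tab-, space-, or comma-delimited, e.g. .dat or .csv) or fixed format
Make sure all missing values use the same code (e.g. . or -999) across variables
Don't include variable names (a header row) in the data file; the names are given in the VARIABLE command instead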
Only include numeric variables (no character strings)


Mplus Syntax Basics

Two kinds of files, both plain text: .inp = input syntax, .out = output (the input syntax reappears at the top of the output)
90-character maximum per line (the editor shows your position at the bottom of the window)
Mplus commands are blue and end in colons (:)
Subcommands (options) are black and end in semicolons (;)
Comments start with ! and will turn green
Code runs all at once (the whole input file is run)
The terms IS, ARE, and = can be used interchangeably
Lists of variables can be abbreviated with a dash for contiguous names (e.g. var1-var10)
Only the first 8 characters of variable names are used in the output, so keep names to 8 characters or fewer
Not case sensitive
Order of commands is arbitrary
Items in a list are typically separated by spaces, not commas
Check that the data are read in correctly by running a basic analysis (e.g. TYPE = BASIC;) and comparing the values against the original data set
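As a minimal sketch, the input file below applies these rules to a data check. It assumes a headerless, free-format copy of the VP promotion data saved as VP_Promo.dat. It also assumes the seven columns are renamed to the hypothetical short names vis1 to timing1, in the same order as the columns of VP_Promo.csv, so that every name fits in 8 characters.

```
TITLE:    Data check for the VP promotion data
DATA:     FILE = VP_Promo.dat;    ! hypothetical file: no header row, numeric only
VARIABLE: NAMES = vis1 vis2 lead1-lead3 impact1 timing1;  ! hypothetical short names
          MISSING = .;            ! the same missing-value code for every variable
ANALYSIS: TYPE = BASIC;           ! descriptive statistics only; no model is fit
```

Comparing the sample means in this output with the means from the original data (for example, the R summary() output shown later) is a quick check that the columns were read in the right order.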


Mplus Commands

TITLE: optional

DATA:

  file = the name of your data file. Give the whole path if the data file is not in the same directory as the input file

VARIABLE:

  names = the names of all variables in the data file, in the same order as the columns
  usevariables = list out only the variables you will be using in your analysis
  useobservations = selects only certain cases (e.g. useobservations = gender EQ 1;)
  missing = specify the code for your missing values (e.g. missing = .;)
  idvariable = an ID variable that will be included in any saved dataset (e.g. idvariable = UserID;)

ANALYSIS: under this command, you specify the type of analysis

  type = six main types: GENERAL (the default), MIXTURE, TWOLEVEL, THREELEVEL, CROSSCLASSIFIED, and EFA
  estimator = this is where you can choose your estimator (e.g. ML, WLS, BAYES)

MODEL: used to specify the desired model to be fit

  BY is used to define latent variables (e.g. factor1 BY var1 var2 var3;)
  ON is used to specify regression slopes (e.g. outcomevar ON pred1 pred2 pred3;)
  WITH is used to define covariances (e.g. var1 WITH var2;)
Notes about the MODEL command:
  To freely estimate a parameter that is fixed by default, use * (e.g. F1 BY var1* var2* var3*;)
  To fix a parameter at a value, use @ (e.g. F1 BY var1@1 var2@2 var3@3;)
  To refer to means or intercepts, place variables inside [ ] (e.g. [var1];)
  To refer to variances or residual variances, just list the variables (e.g. var1 var2 var3;)
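The hypothetical MODEL command below uses each piece of syntax above. The variable names var1 to var3, x1, y1, and y2 are placeholders, not variables from the workshop data.

```
MODEL:
  F1 BY var1* var2 var3;   ! * frees the first loading (fixed at 1 by default)
  F1@1;                    ! @ fixes the factor variance at 1 to set the scale instead
  y1 ON F1 x1;             ! regression of y1 on the factor and the covariate x1
  y2 ON F1;                ! regression of y2 on the factor
  y1 WITH y2;              ! residual covariance between the two outcomes
  [y1];                    ! intercept of y1 (free by default; listed to show syntax)
  var2;                    ! residual variance of var2 (also free by default)
```

This freed-loading, fixed-variance pair is an alternative to the default way of setting the factor's scale. If the first loading is freed without fixing the variance, the factor's scale is left undefined.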

OUTPUT: this is where you request specific pieces of output not provided by default.

  SAMPSTAT this option provides descriptives of your sample
  STANDARDIZED this option standardizes parameter estimates in 3 different ways:
    StdYX uses the variances of the continuous latent variables as well as the variances of the background and outcome variables for standardization.
    StdY uses the variances of the continuous latent variables as well as the variances of the outcome variables for standardization.
    Std uses the variances of the continuous latent variables for standardization.

SAVEDATA: this is how you ask for outputted datasets, such as factor scores or model parameters.

  file = specify a file where you want the saved data to go
  save = specify what output you want to save (e.g. save = fscores;)
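For example, the hypothetical fragment below requests sample statistics and all three standardizations, and saves factor scores. The file name fscores.dat is an assumption. The fragment belongs in an input file whose MODEL command defines at least one factor.

```
OUTPUT:   SAMPSTAT STANDARDIZED;   ! descriptives plus StdYX, StdY, and Std estimates
SAVEDATA: FILE = fscores.dat;      ! hypothetical name for the saved data set
          SAVE = FSCORES;          ! factor scores, saved with the analysis variables
```

Adding idvariable to the VARIABLE command puts the ID in this file too, which makes it easier to merge the scores back into the original data.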


Linear Structural Equation Models

SEM can analyze relations among multiple dependent and independent variables simultaneously
SEM can also model relations at the latent level, free of the measurement error in the manifest variables
Latent variables are variables that are not directly observed but are assumed to account for covariances among variables measuring the same construct
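SEM with latent variables consists of two parts: a measurement model (how each latent variable is measured by its observed indicators) and a structural model (the relations among the latent variables)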
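Here is a minimal sketch of how the two parts look in a MODEL command. It assumes the hypothetical short names vis1, vis2, and lead1 to lead3 for the visibility and leadership items used later in this workshop. Regressing the leadership factor on the visibility factor is an illustrative choice, not a model from the workshop.

```
MODEL:
  ! measurement model: each factor is measured by its observed indicators
  vis  BY vis1 vis2;
  lead BY lead1-lead3;
  ! structural model: a regression among the latent variables
  lead ON vis;
```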


Simple Linear Regression with Manifest Variables

Let’s start by checking to make sure the data were imported properly
vp <- read.csv("/Users/pegad/Desktop/Mplus_workshop/VP_Promo.csv", na.strings = ".")
names(vp)
## [1] "Visibility1_code" "Visibility2_code" "Leadership1_code"
## [4] "Leadership2_code" "Leadership3_code" "Impact1_code"    
## [7] "Timing1_code"
dim(vp)
## [1] 188   7
summary(vp)
##  Visibility1_code Visibility2_code Leadership1_code Leadership2_code
##  Min.   :0.000    Min.   :0.000    Min.   :1.000    Min.   :1.00    
##  1st Qu.:3.000    1st Qu.:3.000    1st Qu.:4.000    1st Qu.:4.00    
##  Median :4.000    Median :4.000    Median :4.000    Median :4.00    
##  Mean   :3.642    Mean   :3.814    Mean   :4.051    Mean   :4.07    
##  3rd Qu.:5.000    3rd Qu.:5.000    3rd Qu.:5.000    3rd Qu.:5.00    
##  Max.   :5.000    Max.   :5.000    Max.   :5.000    Max.   :5.00    
##  NA's   :1                         NA's   :10       NA's   :17      
##  Leadership3_code  Impact1_code    Timing1_code  
##  Min.   :2.000    Min.   :1.000   Min.   :1.000  
##  1st Qu.:4.000    1st Qu.:3.000   1st Qu.:4.000  
##  Median :4.000    Median :3.000   Median :4.000  
##  Mean   :4.239    Mean   :3.408   Mean   :4.027  
##  3rd Qu.:5.000    3rd Qu.:4.000   3rd Qu.:5.000  
##  Max.   :5.000    Max.   :5.000   Max.   :5.000  
##  NA's   :12       NA's   :4
Now we can run a simple linear regression to see if impact scores predict timing scores
summary(lm(Timing1_code ~ Impact1_code, data = vp))
## 
## Call:
## lm(formula = Timing1_code ~ Impact1_code, data = vp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.90419 -0.33924  0.09581  0.53086  1.09581 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.59903    0.17427  14.914  < 2e-16 ***
## Impact1_code  0.43505    0.04917   8.847 7.75e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6494 on 182 degrees of freedom
##   (4 observations deleted due to missingness)
## Multiple R-squared:  0.3007, Adjusted R-squared:  0.2969 
## F-statistic: 78.27 on 1 and 182 DF,  p-value: 7.745e-16
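The same regression can be written as an Mplus input file. This sketch assumes a headerless copy of the data saved as VP_Promo.dat and hypothetical short names (impact1 and timing1 for Impact1_code and Timing1_code). Mplus excludes cases with missing values on x-variables, so this model uses the same 184 cases as lm(). The MplusAutomation run later in this document reports the same slope (0.435) and intercept (2.599).

```
TITLE:    Simple regression of timing on impact
DATA:     FILE = VP_Promo.dat;                 ! hypothetical headerless data file
VARIABLE: NAMES = vis1 vis2 lead1-lead3 impact1 timing1;   ! hypothetical short names
          USEVARIABLES = impact1 timing1;      ! only the variables in this model
          MISSING = .;
MODEL:    timing1 ON impact1;                  ! regress timing on impact
OUTPUT:   SAMPSTAT STDYX;
```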


Factor Analysis

Moving on to models with latent variables
Factor models are data reduction models: they take many variables and reduce them to fewer factors
Then, you can take these factors and use them in subsequent analyses
There are two kinds of factor models - Exploratory Factor Analysis and Confirmatory Factor Analysis
  • EFA assumes no structure and explores the relations among variables to find factors
  • CFA assumes a structure among the variables based on previous research or theory

Exploratory Factor Analysis

With EFA, we can see whether any of our VP promo variables load onto broader constructs or factors
EFA is an exploratory approach, so we make no assumptions about the relations among variables; we just see what turns up
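A minimal EFA input file might look like this. It is a sketch that assumes a hypothetical headerless data file (VP_Promo.dat) and hypothetical short variable names. It asks for one- to three-factor solutions so they can be compared.

```
TITLE:    EFA of the VP promotion items
DATA:     FILE = VP_Promo.dat;        ! hypothetical headerless data file
VARIABLE: NAMES = vis1 vis2 lead1-lead3 impact1 timing1;   ! hypothetical short names
          MISSING = .;
ANALYSIS: TYPE = EFA 1 3;             ! extract 1-, 2-, and 3-factor solutions
          ROTATION = GEOMIN;          ! oblique rotation (the Mplus default)
```

The output includes eigenvalues, fit statistics, and rotated loadings for each solution, so the number of factors can be chosen by comparing them.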


Confirmatory Factor Analysis

With CFA, we can test a model that assumes certain variables load onto certain factors
CFA is used to test a model based on theory or previous research
If we have a large enough data set, we can split the data set into two samples, then run an EFA on one half, and a CFA on the other half
We can test whether vis1 - vis2 load onto a broader Visibility factor, whether lead1 - lead3 load onto a broader Leadership factor, and what the relation between the two factors is
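Here is a minimal sketch of this CFA as an Mplus input file. It assumes a hypothetical headerless data file (VP_Promo.dat) and hypothetical short variable names. The factor names vis and lead are likewise assumptions.

```
TITLE:    CFA with visibility and leadership factors
DATA:     FILE = VP_Promo.dat;              ! hypothetical headerless data file
VARIABLE: NAMES = vis1 vis2 lead1-lead3 impact1 timing1;   ! hypothetical short names
          USEVARIABLES = vis1 vis2 lead1-lead3;   ! only the factor indicators
          MISSING = .;
MODEL:    vis  BY vis1 vis2;                ! first loading fixed at 1 by default
          lead BY lead1-lead3;
          lead WITH vis;                    ! factor covariance (free by default)
OUTPUT:   STDYX;                            ! standardized loadings and correlation
```

In the STDYX output, the standardized lead WITH vis estimate is the correlation between the two factors. The visibility factor has only two indicators, so it is identified here only through its covariance with the leadership factor.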


The first thing to check when testing models is model fit
There are a number of different fit indices to consult:
  • Chi-square Test - Tests the null hypothesis that the population covariance matrix and mean vector equal the model-implied covariance matrix and mean vector (a test of exact model fit). A significant value suggests the model does not fit exactly in the population.
  • Chi-square Difference Test - Used to statistically compare two nested models. Nested means one model is a special case of another, more general model (e.g. obtained by fixing some of its parameters).
  • Comparative Fit Index (CFI) - Compares the target model to a baseline model that assumes no relations among variables. The closer the CFI is to 1, the better the model fits. A CFI of .95 or greater is commonly taken to indicate good fit.
  • Tucker-Lewis Index (TLI) - Comparable to CFI; the same rules apply.
  • Root Mean Square Error of Approximation (RMSEA) - A measure of approximate fit. The closer to 0, the better the model fits; a common guideline is that a good model has an RMSEA smaller than .05.
  • Information Criteria (AIC, BIC) - Can be used to compare non-nested models. The model with the smaller AIC or BIC is said to fit better.
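For reference, common textbook forms of several of these indices are below. Here χ²_M and df_M belong to the target model, χ²_B and df_B to the baseline model, and N is the sample size (some programs use N instead of N - 1 in the RMSEA). A just-identified model has df_M = 0. For the simple regression later in this document, Mplus reports a chi-square of 0 on 0 degrees of freedom, with trivially perfect CFI, TLI, and RMSEA values.

```latex
\mathrm{RMSEA} = \sqrt{\frac{\max(\chi^2_M - df_M,\ 0)}{df_M\,(N - 1)}}

\mathrm{CFI} = 1 - \frac{\max(\chi^2_M - df_M,\ 0)}{\max(\chi^2_B - df_B,\ \chi^2_M - df_M,\ 0)}

\mathrm{TLI} = \frac{\chi^2_B / df_B - \chi^2_M / df_M}{\chi^2_B / df_B - 1}

\Delta\chi^2 = \chi^2_{\text{restricted}} - \chi^2_{\text{general}}, \qquad
\Delta df = df_{\text{restricted}} - df_{\text{general}}
```

Δχ² is compared with a chi-square distribution with Δdf degrees of freedom. With robust estimators such as MLR, a scaled difference test is needed instead of this simple subtraction.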

Running Mplus through R with MplusAutomation

We can use the power of R to automate the Mplus process with the package MplusAutomation


Use the prepareMplusData function to create an Mplus data file (and, with inpfile = TRUE, a matching input file) from R
library(MplusAutomation)
prepareMplusData(vp, "/Users/pegad/Desktop/Mplus_workshop/R_to_Mplus.dat", inpfile = TRUE)


We can also create and run Mplus scripts within R
regression_mod <- mplusObject(TITLE = "Regression;", MODEL = "Timing1_code ON Impact1_code;",
   usevariables = c("Impact1_code", "Timing1_code"), OUTPUT = "SAMPSTAT STDYX;", rdata = vp)

cat(createSyntax(regression_mod, filename = "R_to_Mplus.dat"))
## All ok
## TITLE:
## Regression;
## DATA:
## FILE = "R_to_Mplus.dat";
##  
## VARIABLE:
## NAMES = Impact1_code Timing1_code; 
##  MISSING=.;
##  
## MODEL:
## Timing1_code ON Impact1_code;
## OUTPUT:
## SAMPSTAT STDYX;
results <- mplusModeler(regression_mod, dataout = "test.dat", run = 1)
## Wrote model to: test.inp
## Wrote data to: test.dat
## Warning in prepareMplusData(df = data[i, object$usevariables], filename =
## dataout, : The file 'test.dat' currently exists and will be overwritten
## 
## Running model: test.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "test.inp" 
## Reading model:  test.out


The results object is a nested list, so viewing the results of the model is a little complicated
class(results)
## [1] "mplusObject" "list"
names(results)
##  [1] "TITLE"        "DATA"         "VARIABLE"     "DEFINE"      
##  [5] "ANALYSIS"     "MODEL"        "OUTPUT"       "SAVEDATA"    
##  [9] "PLOT"         "results"      "usevariables" "rdata"       
## [13] "imputed"
results$results
## $input
## $title
## [1] " Regression;"
## 
## $data
## $data$file
## [1] "\"test.dat\""
## 
## 
## $variable
## $variable$names
## [1] "Impact1_code Timing1_code"
## 
## $variable$missing
## [1] "."
## 
## 
## $model
## [1] ""                                "  Timing1_code ON Impact1_code;"
## 
## $output
## [1] ""                  "  SAMPSTAT STDYX;" ""                 
## [4] ""                  ""                 
## 
## attr(,"class")
## [1] "list"      "mplus.inp"
## attr(,"start.line")
## [1] 6
## attr(,"end.line")
## [1] 22
## 
## $warnings
## [[1]]
## [1] "Note that only the first 8 characters of variable names are used in the output."
## [2] "Shorten variable names to avoid any confusion."                                 
## 
## [[2]]
## [1] "Data set contains cases with missing on x-variables."
## [2] "These cases were not included in the analysis."      
## [3] "Number of cases with missing on x-variables:  4"     
## [4] "2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS"        
## 
## attr(,"class")
## [1] "list"           "mplus.warnings"
## 
## $errors
## list()
## attr(,"class")
## [1] "list"         "mplus.errors"
## 
## $summaries
##   Mplus.version        Title AnalysisType   DataType Estimator
## 1           7.4  Regression;      GENERAL INDIVIDUAL        ML
##   Observations Parameters ChiSqM_Value ChiSqM_DF ChiSqM_PValue
## 1          184          3            0         0             0
##   ChiSqBaseline_Value ChiSqBaseline_DF ChiSqBaseline_PValue       LL
## 1              65.823                1                    0 -180.657
##   UnrestrictedLL CFI TLI     AIC     BIC    aBIC RMSEA_Estimate
## 1       -180.657   1   1 367.314 376.959 367.457              0
##   RMSEA_90CI_LB RMSEA_90CI_UB RMSEA_pLT05 SRMR     AICC Filename
## 1             0             0           0    0 367.4473 test.out
## 
## $parameters
## $parameters$unstandardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.435 0.049  8.896    0
## 2         Intercepts TIMING1_CO 2.599 0.173 14.996    0
## 3 Residual.Variances TIMING1_CO 0.417 0.043  9.592    0
## 
## $parameters$r2
##      param   est    se est_se pval
## 1 TIMING1_ 0.301 0.057  5.319    0
## 
## $parameters$stdyx.standardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.548 0.052 10.638    0
## 2         Intercepts TIMING1_CO 3.365 0.361  9.325    0
## 3 Residual.Variances TIMING1_CO 0.699 0.057 12.368    0
## 
## 
## $class_counts
## list()
## 
## $residuals
## list()
## 
## $tech1
## list()
## 
## $tech3
## list()
## 
## $tech4
## list()
## 
## $tech7
## list()
## 
## $tech9
## list()
## attr(,"class")
## [1] "list"        "mplus.tech9"
## 
## $tech12
## list()
## 
## $fac_score_stats
## list()
## attr(,"class")
## [1] "list"                "mplus.facscorestats"
## 
## $gh5
## list()
## 
## attr(,"class")
## [1] "mplus.model" "list"       
## attr(,"filename")
## [1] "test.out"
results$results$parameters$stdyx.standardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.548 0.052 10.638    0
## 2         Intercepts TIMING1_CO 3.365 0.361  9.325    0
## 3 Residual.Variances TIMING1_CO 0.699 0.057 12.368    0


Other functions in the package, such as extractModelParameters, make viewing the results a little easier
extractModelParameters("/Users/pegad/Desktop/Mplus_workshop/test.out")
## $unstandardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.435 0.049  8.896    0
## 2         Intercepts TIMING1_CO 2.599 0.173 14.996    0
## 3 Residual.Variances TIMING1_CO 0.417 0.043  9.592    0
## 
## $r2
##      param   est    se est_se pval
## 1 TIMING1_ 0.301 0.057  5.319    0
## 
## $stdyx.standardized
##          paramHeader      param   est    se est_se pval
## 1        TIMING1_.ON IMPACT1_CO 0.548 0.052 10.638    0
## 2         Intercepts TIMING1_CO 3.365 0.361  9.325    0
## 3 Residual.Variances TIMING1_CO 0.699 0.057 12.368    0


We can also use R to run our Mplus scripts in batches: runModels runs every .inp file in a folder
runModels("/Users/pegad/Desktop/Mplus_workshop", replaceOutfile = FALSE)
## 
## Running model: CFA_break.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "CFA_break.inp" 
## 
## Running model: CFA.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "CFA.inp" 
## 
## Running model: Data_check.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "Data_check.inp" 
## 
## Running model: EFA.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "EFA.inp" 
## 
## Running model: R_to_Mplus.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "R_to_Mplus.inp" 
## 
## Running model: Simple_regression.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "Simple_regression.inp" 
## 
## Running model: test.inp 
## System command: cd "/Users/pegad/Desktop/Mplus_workshop" && "/Applications/Mplus/mplus" "test.inp"


Common Errors or Issues

  • *** WARNING in VARIABLE command Note that only the first 8 characters of variable names are used in the output. Shorten variable names to avoid any confusion.

  • *** WARNING Input line exceeded 90 characters. Some input may be truncated.

  • *** WARNING in MODEL command All variables are uncorrelated with all other variables in the model.

  • THE MODEL ESTIMATION TERMINATED NORMALLY

    THE STANDARD ERRORS OF THE MODEL PARAMETER ESTIMATES COULD NOT BE COMPUTED. THE MODEL MAY NOT BE IDENTIFIED. CHECK YOUR MODEL. PROBLEM INVOLVING THE FOLLOWING PARAMETER: Parameter 15, LEAD WITH VIS

If your model doesn’t run (spits out an error message or stops mid-run), check the following:

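1. Semicolons at the end of each option (and colons after each command)
2. Correct variables specified in the usevariables option
3. Data file name and path are correct
4. Everything is spelled correctly
5. Variables in names are listed in the same order as the columns in the data file
6. Missing data are coded correctly (the missing option matches the code used in the data file)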