Notes on model selection

Joshua M. Rosenberg

2018-09-19

First, I examined a wide range of models and solutions.
- I did this to select particular candidate models to scrutinize in greater detail.
- In order to carry out this analysis, I followed guidelines recommended by the developers of MPlus (Asparouhov & Muthen, 2012; Muthen & Muthen, 2017) as well as those making recommendations about its use (Geiser, 2012):
  - In particular, I set the number of initial stage random starts to 600 and the number of starts to be optimized in the final stage to 120.
  - This means that for each model estimated, 600 random sets of starting values for the parameters were used to initialize the EM algorithm.
  - Of these 600, the 120 that demonstrated the best (highest) log-likelihood were allowed to continue until they reached convergence or the limit for the number of iterations.
  - In order for a model to be considered trustworthy, of these 120 runs, the best log-likelihood must be replicated at least one time.
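
The replication check described above can be sketched as follows. This is a minimal illustration, not part of the MPlus output or workflow: the function name, the tolerance, and the example values are hypothetical, and it assumes that the final-stage log-likelihoods have been read into a Python list and that larger (less negative) values indicate better fit.

```python
def best_loglik_replicated(final_logliks, tol=1e-3, min_replications=1):
    """Check whether the best final-stage log-likelihood was replicated.

    final_logliks: log-likelihoods of the optimized (final-stage) starts.
    tol: hypothetical tolerance for treating two values as the same solution.
    Returns (best value, number of replications, whether the criterion is met).
    """
    if not final_logliks:
        raise ValueError("no final-stage log-likelihoods supplied")
    # Assumption: the best solution is the one with the highest log-likelihood.
    best = max(final_logliks)
    matches = sum(1 for ll in final_logliks if abs(ll - best) <= tol)
    # The best run matches itself, so replications exclude that one run.
    replications = matches - 1
    return best, replications, replications >= min_replications


# Hypothetical values for illustration only.
example = [-1050.2, -1050.2, -1051.7, -1060.0]
best, replications, trustworthy = best_loglik_replicated(example)
print(best, replications, trustworthy)
```

A solution whose best log-likelihood appears only once among the optimized starts would be flagged as not replicated and set aside, as in the criterion above.
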
Second, for the models with replicated log-likelihoods, I chose which models to focus upon in greater detail by following recommendations associated with mixture modeling (Collins & Lanza, 2009; Geiser, 2012), those of the authors of the MPlus software (Muthen & Muthen, 2017), and those in recent peer-reviewed articles (Pastor et al., 2007).
- Simulation studies have suggested that BIC, CAIC, SABIC, and BLRT are most helpful for selecting the correct number of profiles (Nylund, Asparouhov, & Muthen, 2007).
- Because the test statistic associated with the unmodified LRT does not follow the distribution that the test is based on, modified versions of the LRT are used (Muthen & Muthen, 2017). These are the Vuong-Lo-Mendell-Rubin (VLMR) LRT, the Lo-Mendell-Rubin (LMR) LRT, and the bootstrapped LRT (BLRT). Of the three, the bootstrapped LRT is considered to be the best indicator of which of two models, one nested (with certain parameters fixed to 0) within the other, fits better, but it is also the most computationally intensive to carry out (Asparouhov & Muthen, 2012). A p-value greater than .05 suggests that the model with fewer profiles should be preferred.
- For the entropy statistic, higher values are considered better, though scholars have suggested that the entropy statistic not be used for model selection (Lubke & Muthen, 2007).
- Here’s an example for one model type:
  - For solutions associated with model 1, the decrease in the information criteria (a decrease indicating a preferred model) becomes smaller as the number of profiles increases from 5 to 6 and from 6 to 7. The solution associated with 8 profiles did not replicate the log-likelihood, and the VLMR and LMR suggest that the solution associated with 9 profiles did not fit better than that with 8 profiles, suggesting that solutions with 7 or fewer profiles be preferred. Considering these solutions, the entropy statistic increases by a large amount between the 4 and 5 profile solutions (and then decreases slightly between the 5 and 6 and the 6 and 7 profile solutions), suggesting (but not providing conclusive evidence) that solutions with 5, 6, or 7 profiles may be preferred. The bootstrapped LRT suggests that, up to the point at which the log-likelihood is not replicated, every more complex model be selected. Taking these pieces of evidence into consideration, for model 1, solutions with 4 through 7 profiles may be considered in more depth, with an emphasis on the 5 and 6 profile solutions, on the basis of the slowing of the decrease in the information criteria for solutions with more profiles than these and the increase in the entropy from the 4 to the 5 (and 6) profile solutions.
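
To make the fit statistics above concrete, here is a minimal sketch of how the information criteria and the entropy statistic are commonly computed from a model's log-likelihood, number of free parameters, sample size, and posterior profile probabilities. The function names and all example values are hypothetical (they are not results from these analyses), and the formulas are the standard textbook definitions rather than anything taken from the MPlus output itself.

```python
import math


def information_criteria(loglik, n_params, n_obs):
    """Return BIC, CAIC, and SABIC; lower values indicate a preferred model."""
    deviance = -2.0 * loglik
    return {
        "BIC": deviance + n_params * math.log(n_obs),
        "CAIC": deviance + n_params * (math.log(n_obs) + 1.0),
        # Sample-size-adjusted BIC replaces n with (n + 2) / 24.
        "SABIC": deviance + n_params * math.log((n_obs + 2.0) / 24.0),
    }


def successive_decreases(values):
    """Decrease in a criterion between consecutive solutions (positive = improved)."""
    return [earlier - later for earlier, later in zip(values, values[1:])]


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; higher values mean cleaner classification.

    posteriors: one row per observation, one posterior probability per profile
    (at least two profiles).
    """
    n_obs = len(posteriors)
    n_profiles = len(posteriors[0])
    # Terms with probability 0 contribute nothing (limit of p * log(p) is 0).
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0.0)
    return 1.0 - uncertainty / (n_obs * math.log(n_profiles))


# Hypothetical (log-likelihood, free parameters) pairs keyed by number of profiles.
hypothetical_fits = {4: (-5200.0, 28), 5: (-5100.0, 34), 6: (-5060.0, 40)}
bics = [information_criteria(ll, k, n_obs=1000)["BIC"]
        for ll, k in hypothetical_fits.values()]
print(successive_decreases(bics))
```

Shrinking values from `successive_decreases` correspond to the slowing decrease in the information criteria that the example above uses as one piece of evidence.
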
Third, having chosen a subset of the models, I focused upon interpreting them. Here are examples for one solution:
- Model 1 type, 6 profile solution
  - Profiles:
    - a full profile, profile 6
    - a universally low profile, profile 2
    - an all moderate profile, profile 5 (and, like the model 1, six profile solution, with moderate levels of affective engagement)
    - an only behaviorally engaged profile, profile 1, with moderate levels of behavioral engagement, very low affective engagement, and moderately low levels of cognitive engagement, challenge, and competence
    - an only affectively engaged profile, profile 4, with moderate levels of affective engagement, low levels of behavioral engagement, and moderately low levels of cognitive engagement, challenge, and competence
    - an engaged and competent but not challenged profile, profile 3, characterized by high levels of each of the three dimensions of engagement and of competence, but with low levels of challenge
  - Interpretation
    - The number of observations associated with each of the profiles is somewhat balanced: the universally low profile has the largest number of observations (n = 667; the same number for this profile as in the model 1, five profile solution), followed by the all moderate profile (n = 638). Each of the other four profiles was associated with 300 to 400 observations. Unlike the model 1, four and five profile solutions, which distinguished observations on either a condition of engagement (i.e., competence) or one of its dimensions (i.e., cognitive, behavioral, and affective), this solution was associated with profiles that distinguished observations on the basis of both: there were only behaviorally engaged and only affectively engaged profiles as well as an engaged and competent but not challenged profile. While the engaged and competent but not challenged profile was distinguished by low levels of challenge (different from the profile associated with the model 1, four profile solution, which was characterized by high levels of competence), this solution is compelling because it appears to group students on the basis of several of the indicators and demonstrates viability on the basis of the fit statistics (i.e., Tables 5.1 and 5.2 and Figure 5.1). The log-likelihood was replicated two times, with the next-best log-likelihood not being replicated, followed by a log-likelihood that was replicated (at least) seven times. The solution associated with the log-likelihood that was replicated (at least) seven times could be investigated in further detail to see whether, and if so how, it differs from the solution interpreted here. Pending further exploration, this six profile solution seems like a potential candidate for use in subsequent analyses.