This document is for discussion and planning of standard versus mixed effects modeling (alt. multilevel modeling) approaches for the analysis of data from the Task Switching - Task Maintaining (TSTM) project (Traut et al., in prep).
In its most basic terms, the TSTM paradigm presented participants with a series of trials differing in Strategy condition (Switching vs. Maintaining) and Cue condition (Cued vs. Uncued), comprising a 2x2 within-subjects design. On each trial, participants either successfully adhered to the correct strategy or they did not. Participants were from one of three age groups (7yo, 10yo, Adults).
The purpose of this document is to determine which modeling approaches are most appropriate given the principal questions of the study:
Additional questions for this study include:
The models presented in these notes address the simplest of these questions first.
Our outcome measure of interest, adaptive coordination of switching and maintaining strategies, will be codified as the likelihood of strategy adherence on a given trial. This is derived from binary coding of strategy adherence on a given trial (1/0) and will be calculated from the Log Odds outcome of the base logistic regressions proposed below.
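As a small illustration of the conversion between log odds and likelihood (probability) described above, the sketch below uses base R's `plogis()` and `qlogis()`. The log-odds values are made up for illustration and are not estimates from the TSTM data.

```r
# Hypothetical log-odds values (not estimates from the TSTM data)
log_odds <- c(-2, 0, 1.5)

# Inverse logit: probability = 1 / (1 + exp(-log_odds))
prob <- plogis(log_odds)
print(round(prob, 3))

# The logit maps probabilities back onto the log-odds scale
back_to_log_odds <- qlogis(prob)
print(back_to_log_odds)
```

A log odds of 0 corresponds to a 50% likelihood of adherence; positive values are above 50% and negative values below.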
A standard logistic regression analyzes the outcome of each individual trial, pooling trials across participants.
The base structure of the standard regression model is as follows:
\[LogOddsStrategyAdherence = \beta_{0}+\beta_{1}(Trial) +\beta_{2}(Strategy)+\beta_{3}(Cue)+\beta_{4}(Age)+ \\ \beta_{5}(Trial*Strategy)+\beta_{6}(Trial*Cue)+\beta_{7}(Trial*Age)+ \\ \beta_{8}(Strategy*Cue) + \beta_{9}(Strategy*Age) + \beta_{10}(Cue*Age) + \\ \beta_{11}(Trial*Strategy*Cue) + \beta_{12}(Trial*Strategy*Age) + \\ \beta_{13}(Trial*Cue*Age) + \beta_{14}(Strategy*Cue*Age) + \\ \beta_{15}(Trial*Strategy*Cue*Age)\]
Each term within the model assesses the following. Bolded terms are critical to questions of interest.
In terms of implementation using the glm function from base R, the formula would be:
glm(log ~ trial*strategy*cue*age_grp, data = tstm_measures, family = binomial(link = "logit"))
The advantage of the standard logistic regression approach is that it is the simplest sufficient method of assessing our pertinent questions as well as the most readily interpreted approach.
The disadvantage is that it ignores the dependence among trials within a participant, as well as the fact that Age Group varies at the participant level rather than the trial level.
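To make the standard approach concrete, here is a minimal sketch that fits the full factorial logistic regression to simulated data using only base R. Everything about the data is an assumption for illustration: the data frame `sim`, the outcome name `adherence`, the number of participants and trials, and the generating coefficients are all hypothetical and do not reflect the actual `tstm_measures` data.

```r
# Simulated stand-in for the trial-level data; all names and values are hypothetical
set.seed(1)
n_part  <- 30   # participants (10 per age group)
n_trial <- 40   # trials per participant

sim <- expand.grid(trial = seq_len(n_trial), partID = seq_len(n_part))
age_levels   <- c("7yo", "10yo", "Adults")
sim$age_grp  <- factor(age_levels[(sim$partID - 1) %% 3 + 1], levels = age_levels)
sim$strategy <- factor(ifelse(sim$trial %% 2 == 0, "Switching", "Maintaining"))
sim$cue      <- factor(ifelse((sim$trial %/% 2) %% 2 == 0, "Cued", "Uncued"))

# Generate a binary (1/0) adherence outcome from made-up log odds
eta <- -0.5 + 0.02 * sim$trial + 0.6 * (sim$cue == "Cued") +
  0.5 * (as.integer(sim$age_grp) - 1)
sim$adherence <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

# Full factorial logistic regression, pooling trials across participants
fit_glm <- glm(adherence ~ trial * strategy * cue * age_grp,
               data = sim, family = binomial(link = "logit"))

# Coefficients are on the log-odds scale
n_coef <- length(coef(fit_glm))
print(n_coef)

# Fitted likelihood of adherence for each trial
pred_prob <- predict(fit_glm, type = "response")
print(summary(pred_prob))
```

Note that when Age Group is entered as a three-level factor, R estimates two coefficients for every term involving age, so the fitted model has more coefficients than the 16 written out in the equation, which treats Age as a single term.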
The use of a mixed effects modeling approach assumes two levels of analysis for the data:
As with the standard approach, the ultimate outcome at Level 1 is the likelihood of strategy adherence on a given trial as gauged by a 1/0 binary from the recorded data.
The model will include a random intercept, allowing estimation of average performance to differ between individual participants. The model does not currently include random slopes.
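The following sketch illustrates what a random intercept implies: each participant's baseline log odds of adherence is shifted by a participant-specific deviation drawn from a normal distribution. The average log odds (`gamma_00`) and the intercept variance (`tau_00`) are hypothetical values chosen for illustration only.

```r
# Hypothetical values, chosen only for illustration
set.seed(42)
gamma_00 <- 0.8   # average log odds of adherence across participants
tau_00   <- 0.5   # variance of the participant-level random intercepts

# Draw a random intercept deviation for each of 8 imaginary participants
u_0j <- rnorm(8, mean = 0, sd = sqrt(tau_00))

# Each participant's baseline likelihood of adherence
baseline_prob <- plogis(gamma_00 + u_0j)
print(round(baseline_prob, 2))
```

Without random slopes, the effects of Trial, Strategy, and Cue are assumed to be the same for every participant within an age group; only the baseline differs.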
The level one model (trial level) for this approach is:
\[LogOddsStrategyAdherence_{ij} = \beta_{0j} + \beta_{1j}(Trial)_{ij} + \beta_{2j}(Strategy)_{ij} + \beta_{3j}(Cue)_{ij}+ \\ \beta_{4j}(Trial*Strategy)_{ij}+\beta_{5j}(Trial*Cue)_{ij}+\beta_{6j}(Strategy*Cue)_{ij} +\\ \beta_{7j}(Trial*Strategy*Cue)_{ij}\]
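Where the sampling function is \(Y_{ij}|\varphi_{ij} \sim Bernoulli(\varphi_{ij})\), with \(\varphi_{ij}\) the probability of strategy adherence on trial \(i\) for participant \(j\). The probability mass function for the Bernoulli distribution is: \[P(y;\mu) = \mu^{y}(1-\mu)^{1-y}, \quad y \in \{0, 1\}\] which is the \(n = 1\) case of the binomial form \(\binom{n}{yn}\mu^{yn}(1-\mu)^{(1-y)n}\).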
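Level two models (participant level) for this approach are:
\[\beta_{0j} = \gamma_{00} + \gamma_{01}(Age_j) + u_{0j} \] \[\beta_{1j} = \gamma_{10} + \gamma_{11}(Age_j)\] \[\beta_{2j} = \gamma_{20} + \gamma_{21}(Age_j)\] \[\beta_{3j} = \gamma_{30} + \gamma_{31}(Age_j)\] \[\beta_{4j} = \gamma_{40} + \gamma_{41}(Age_j)\] \[\beta_{5j} = \gamma_{50} + \gamma_{51}(Age_j)\] \[\beta_{6j} = \gamma_{60} + \gamma_{61}(Age_j)\] \[\beta_{7j} = \gamma_{70} + \gamma_{71}(Age_j)\]
Where \(u_{0j} \sim N(0,\tau_{00})\), independent across participants.
Substituting the level two models into the level one model gives the complete mixed model:
\[LogOddsStrategyAdherence_{ij} = \\ \gamma_{00} + \gamma_{01}(Age_j) + \\ \gamma_{10}(Trial)_{ij} + \gamma_{11}(Age_j)(Trial)_{ij} + \\ \gamma_{20}(Strategy)_{ij} + \gamma_{21}(Age_j)(Strategy)_{ij} + \\ \gamma_{30}(Cue)_{ij} + \gamma_{31}(Age_j)(Cue)_{ij} + \\ \gamma_{40}(Trial*Strategy)_{ij} + \gamma_{41}(Age_j)(Trial*Strategy)_{ij} + \\ \gamma_{50}(Trial*Cue)_{ij} + \gamma_{51}(Age_j)(Trial*Cue)_{ij} + \\ \gamma_{60}(Strategy*Cue)_{ij} + \gamma_{61}(Age_j)(Strategy*Cue)_{ij} + \\ \gamma_{70}(Trial*Strategy*Cue)_{ij} + \gamma_{71}(Age_j)(Trial*Strategy*Cue)_{ij} + u_{0j}\]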
Level 1 terms:
Level 2 terms:
Mixed effects logistic regression can be implemented using the glmer function from the lme4 package.
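# Age group enters as a fixed effect (including its cross-level interactions); participants get a random intercept
lme4::glmer(log ~ trial*strategy*cue*age_grp + (1 | partID), data = tstm_measures,
family = binomial(link = "logit"))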
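As a rough sketch of how a random-intercept model of this kind might be fit, the example below simulates clustered data and calls `lme4::glmer` if the package is installed. All of it is illustrative: the data frame `sim`, the outcome `adherence`, the scaled trial variable `trial_c`, and the generating values are hypothetical. Age group is entered as a fixed effect crossed with the trial-level predictors, and `(1 | partID)` requests a random intercept per participant.

```r
# Simulated clustered data; all names and values are hypothetical
set.seed(2)
n_part  <- 30
n_trial <- 40

sim <- expand.grid(trial = seq_len(n_trial), partID = seq_len(n_part))
age_levels   <- c("7yo", "10yo", "Adults")
sim$age_grp  <- factor(age_levels[(sim$partID - 1) %% 3 + 1], levels = age_levels)
sim$strategy <- factor(ifelse(sim$trial %% 2 == 0, "Switching", "Maintaining"))
sim$cue      <- factor(ifelse((sim$trial %/% 2) %% 2 == 0, "Cued", "Uncued"))

# Participant-specific random intercepts induce dependence among trials
u_0j <- rnorm(n_part, mean = 0, sd = 0.7)
eta  <- -0.5 + 0.6 * (sim$cue == "Cued") +
  0.5 * (as.integer(sim$age_grp) - 1) + u_0j[sim$partID]
sim$adherence <- rbinom(nrow(sim), size = 1, prob = plogis(eta))

sim$partID  <- factor(sim$partID)
sim$trial_c <- as.numeric(scale(sim$trial))  # centered and scaled trial number

# Fit only if lme4 is available
if (requireNamespace("lme4", quietly = TRUE)) {
  fit_mem <- lme4::glmer(
    adherence ~ trial_c * strategy * cue * age_grp + (1 | partID),
    data = sim, family = binomial(link = "logit")
  )
  print(lme4::VarCorr(fit_mem))  # estimated variance of the random intercept
  print(lme4::fixef(fit_mem))    # fixed effects on the log-odds scale
}
```

Scaling the trial number is a convenience that often helps the optimizer; a model this large may still produce convergence warnings on small or simulated data sets.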
The advantage of the MEM approach is that it (a) allows us to account for the dependence of trials within participant, (b) allows us to parcel out variance for item-level characteristics (e.g. trial condition) versus participant-level characteristics, and (c) provides a potentially more nuanced evaluation of the interplay of predictors in our dataset.
The disadvantage of the MEM approach is its complexity in terms of implementation - and the resulting complexity in terms of interpreting results, particularly within a logistic framework (which is already rather difficult to translate into real world terms).
Is the model syntax for including the level 2 slope for Age group correct? Haven’t found a specific sample addressing this type of model structure.
What exactly is \(\varphi\) in the sampling function?
What is the distribution of error parameters for both the standard and mixed effects logistic regressions?
What value might including random slopes in the mixed effects approach provide for answering our questions? Do we lose anything with regard to interpreting the Age parameters at Level 2 by not including random slopes?
How should ease of comprehension between these two models inform our decision about which approach to use?
What contrast code is most appropriate for the Age Group predictor given the hypothesized linear progression in the development of adaptive coordination for strategy?
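One candidate for discussion is orthogonal polynomial coding, sketched below with base R's `contr.poly()`. This is offered as one option, not a recommendation: it assumes the three age groups are ordered and equally spaced, which may not hold for 7yo, 10yo, and Adults. The factor `age_grp` here is a toy three-element vector, not the study data.

```r
# Toy ordered age-group factor (not the study data)
age_levels <- c("7yo", "10yo", "Adults")
age_grp <- factor(age_levels, levels = age_levels)

# Orthogonal polynomial contrasts for three levels:
# column .L is the linear trend, column .Q the quadratic trend
poly_codes <- contr.poly(3)
print(poly_codes)

# Attach the contrasts to the factor so model-fitting functions use them
contrasts(age_grp) <- poly_codes
print(contrasts(age_grp))
```

Under this coding, the linear column tests a steady increase (or decrease) across the three groups, and the quadratic column tests departure from that linear progression.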