Introduction

The topic of reliability (or lack thereof) has recently captivated cognitive psychology. The growing consensus is that interference tasks are not reliable enough to study individual differences. I (1) think this view is misguided and perhaps attributable to a limited perspective on studying variability; and (2) worry the residue of this sometimes sensational debate could quell the curiosity needed to explore as yet uncharted aspects of these tasks and the underlying cognitive process.

In this blog, I discuss an intriguing line of research that recently unearthed substantial heterogeneity in the within-person variance structure, including large individual differences. Note that within-person variance is commonly considered “noise” and indicative of measurement “error.” An alternative perspective considers the “noise” an essential aspect of the cognitive process: not mere “error” but a marker of response time consistency.

Regardless of the perspective, these recent findings have serious ramifications for how we think about cognitive interference tasks and individual differences: if there are individual differences in measurement “error,” this then implies that there are individual differences in reliability! And, in fact, classical test theory (CTT) cannot account for this because a critical assumption is that the trial-to-trial “noise” is the same for each person:

“Individual differences in intra-individual variability inherently violate core assumptions of CTT” (X)

The assumption of a common (or constant) within-person variance (i.e., \(\sigma^2\) in a mixed-effects model) is demonstrably incorrect in interference tasks. This finding easily passes the inter-ocular trauma test:

…sometimes the result is so obvious that it hits you between the eyes (link).

This replicates in EVERY cognitive task that I have examined (perhaps 15 datasets in total).

Importantly, heterogeneous within-person variance is not a thorn in our side. It opens the door for (1) exciting new ideas that can be woven into the fabric of our theories and the possibility of explaining the “noise”, and (2) extending commonly used quantitative measures to the individual level (e.g., effect sizes).

This post is organized as follows:

  1. I first put the “individual into reliability” (X, X).

  2. Next, I investigate the experimental effect (i.e., incongruent priming) on within-person variance or response time consistency (including individual differences therein).

  3. I then demonstrate that we might be masking individual differences by assuming a constant within-person variance.

  4. Related to 3, I discuss posterior predictive checking specifically for the within-person variance structure.

  5. Finally, I show how to compute person-specific effect sizes in cognitive interference tasks: Bayesian \(r\).

Varying Reliability

Measurement reliability is a fundamental concept in psychology. It is traditionally considered a stable property of an experimental task. Although intraclass correlation coefficients (ICC) are often used to assess reliability in repeated-measures designs, their descriptive nature depends upon the assumption of a common within-person variance. Modeling this variance immediately leads to the possibility of varying reliability.

To see this, consider a relatively simple random-intercepts-only model for reaction times from a Stroop task (X):

\[ \begin{equation} \label{eq:1} y_{ij} = \beta_0 + u_{0j} + \epsilon_{ij} \end{equation} \] Here \(y_{ij}\) denotes the \(i\)th response time for subject \(j\), \(\beta_0\) is the population-averaged mean reaction time (i.e., the fixed effect), and \(u_{0j}\) denotes the person-specific deviation (or “random” effect) from that average for subject \(j = 1, \ldots, n\). For example, \(\beta_0 + u_{0j}\) is the mean of the \(j\)th subject's response time distribution. It is then customary to assume the random effects are drawn from a common distribution

\[ u_{0j} \sim \mathcal{N}(0, \sigma^2_{u0}), \] where \(\sigma^2_{u0}\) is the between-subject variance that captures the dispersion of the random intercepts. Note that it is possible to explain \(\sigma^2_{u0}\) with, say, a “level two” predictor (e.g., gender), but in this example it is constant.

I have intentionally left the residuals, \(\epsilon_{ij}\), to last. To allow for the possibility of varying reliability, the residual variance, \(\sigma^2\), is explicitly modeled with its own hierarchical model:

\[ \begin{align} \epsilon_{ij} &\sim \mathcal{N}(0, \sigma_{\epsilon_{ij}}^2) \; \text{with} \\ \sigma_{\epsilon_{ij}}^2 &= \text{exp}(\eta_0 + u_{1j})^2 . \end{align} \]

The subscripts denote the residual for the \(j\)th person and \(i\)th trial. It is readily apparent that the “error” variance is now allowed to vary across the \(j\) subjects given a log-linear model. This effectively allows for estimating the population-average within-person standard deviation, \(\eta_0\) (on the log scale), as well as the individual deviations (or “random” effects). For example, \(\text{exp}(\eta_0 + u_{1j})\) is the standard deviation of the \(j\)th subject's response time distribution. The deviations are also assumed to be normally distributed:

\[ u_{1j} \sim \mathcal{N}(0, \sigma^2_{u1}). \]

In this formulation, the random effects are independent. This simplification is for demonstrative purposes, and, in practice, the random effects should share a common multivariate normal distribution.
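To make the two-level structure concrete, here is a minimal simulation sketch in base R. Every numeric value (number of subjects, trials, and all parameter values) is a hypothetical choice for illustration, not an estimate from any dataset discussed in this post. The column names `rt` and `ID` are chosen to match the model formula used later.

```r
# Simulate from the random-intercepts location-scale model described above.
# All values are hypothetical and chosen only for illustration.
set.seed(1)

n_subj   <- 50          # number of subjects (j)
n_trials <- 100         # trials per subject (i)

beta_0   <- 0.70        # population-average mean RT (seconds)
sigma_u0 <- 0.10        # SD of the random intercepts for the mean
eta_0    <- log(0.20)   # log of the average within-person SD
sigma_u1 <- 0.30        # SD of the random intercepts for log(sigma)

u0 <- rnorm(n_subj, 0, sigma_u0)   # person-specific deviations in the mean
u1 <- rnorm(n_subj, 0, sigma_u1)   # person-specific deviations in log(sigma)

mu_j    <- beta_0 + u0             # mean of subject j's RT distribution
sigma_j <- exp(eta_0 + u1)         # SD of subject j's RT distribution

dat <- data.frame(ID = rep(seq_len(n_subj), each = n_trials))
dat$rt <- rnorm(nrow(dat), mean = mu_j[dat$ID], sd = sigma_j[dat$ID])

# The empirical within-person SDs should track the generating values
emp_sd <- tapply(dat$rt, dat$ID, sd)
cor(emp_sd, sigma_j)
```

Because the residuals are normal, a few simulated response times can be implausibly small; that follows from the simplified likelihood, and the sketch keeps it only to mirror the equations.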

brms

This model is readily fitted in the R package brms:

library(brms)
# random intercepts for the mean (location) and for sigma (scale, log link)
fit <- brm(brmsformula(rt ~ 1 + (1 | ID),
                       sigma ~ 1 + (1 | ID)),
           data = dat)

To estimate the correlation between random effects, (1 | ID) is replaced with (1 | c | ID) in both formulas. The label between the bars (here c) is arbitrary; terms that share it are modeled as correlated.

Person-Specific ICCs

A random intercept model is commonly used to compute an ICC. This is often the first step when fitting a mixed-effects model. The ICC is defined as

\[ \text{ICC} = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \sigma^2}. \] With an ICC of, say, 0.20, this indicates that 20% of the observed variance in subjects’ reaction times is due to systematic between-subject differences compared to the total variance in reaction times (paraphrasing, pX. X). Alternatively, the ICC indicates “the correlation for any two observations nested within the same individual” (X).

In this case, however, I allowed for the possibility of individual differences in within-person variance (\(\sigma^2\)). Hence, following X, X and X, this allows for computing an ICC for each of the \(j\) subjects. This can be written as

\[ \text{ICC}_j = \frac{\sigma_{u0}^2}{\sigma_{u0}^2 + \text{exp}(\eta_0 + u_{1j})^2 }, \; j = 1,\ldots, n. \] The modification to the customary ICC formulation is slight, but the practical implications are huge: for the first time, we can investigate the reliability of interference tasks at the individual level.
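As a small worked sketch of this formula in base R, the values below are hypothetical: a between-subject standard deviation of 0.10, an average within-person standard deviation of 0.20, and five made-up deviations \(u_{1j}\). They are chosen so that the customary ICC equals the 0.20 used in the example above.

```r
# Person-specific ICCs under hypothetical parameter values.
sigma_u0 <- 0.10                        # between-subject SD of the means
eta_0    <- log(0.20)                   # log of the average within-person SD
u1       <- c(-0.5, -0.2, 0, 0.2, 0.5)  # hypothetical log-SD deviations (5 subjects)

sigma_j <- exp(eta_0 + u1)              # person-specific within-person SDs
icc_j   <- sigma_u0^2 / (sigma_u0^2 + sigma_j^2)

# Customary ICC: a single within-person SD is plugged in for everyone
icc_common <- sigma_u0^2 / (sigma_u0^2 + exp(eta_0)^2)

round(data.frame(sigma_j, icc_j), 3)
icc_common
```

Only the subject with \(u_{1j} = 0\) has an ICC equal to the customary value; subjects with less trial-to-trial variability have higher ICCs and those with more have lower ones. With a fitted model, the same arithmetic would presumably be applied to each posterior draw so that every \(\text{ICC}_j\) comes with its own uncertainty.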

Figure 1


Figure 1 includes the estimates for each person. The reaction time means for each subject are in panel A. These are also provided by a customary random intercepts model that treats \(\sigma^2\) as a constant.

The key innovation is panel B, that is, the reaction time standard deviations for each person. The dotted line corresponds to \(\eta_0\), which is the average within-person variance. However, it is clear that the vast majority of people differ from \(\eta_0\). This indicates that there are individual differences in within-person variance.

The striking variation in Panel B is quite important for thinking about the reliability of interference tasks. For example, when computing a constant ICC, the dotted line in panel B is thought to capture the residual variance in the data. This is not so in these tasks. This heterogeneity is readily seen in panel C, where the dotted line is merely the average ICC or reliability:

In interference tasks, there are individual differences in reliability.

For some individuals, their reaction times were nearly independent, whereas, for others, their reaction times show strong dependency.1 This suggests the task is more or less reliable for certain individuals.

Exciting Idea for New Research

An exciting future direction is to then explain this by, say, predicting both \(\sigma^2_{u0}\) and \(\sigma^2_{u1}\) with sub-models. This would provide insights into the distribution of random effects, and, hence, reliability. For example, if a certain personality trait (e.g., conscientiousness) was positively related to \(\sigma^2_{u0}\), this would imply that the task is more reliable for people that have a “desire to do a task well” (link).

Priming Effect on Within-Person Variance

In these tasks, the question of interest is the experimental effect, say, the effect of incongruent primes on inhibiting irrelevant information. This is commonly examined with mixed-effects models or so-called individual differences models.2 The model discussed in this blog is termed the mixed-effects location-scale model (MELSM, pronounced mel\(-\)zem, X, X, X).

Location Sub-Model

The outcome is reaction time for correct responses on the seconds scale. The location sub-model for the \(j\)th subject and \(i\)th trial can be written as \[ \begin{align} \label{eq:stroop_mean} y_{ij} = \beta_0 &+ \beta_1(\text{Incongruent}_{ij}) \\ \nonumber &+ u_{0j} + u_{1j}(\text{Incongruent}_{ij}) + \epsilon_{ij}, \end{align} \] where \(\beta_0\) is the fixed effect intercept and \(\beta_1\) is the fixed effect slope (i.e., the experimental effect). In this formulation, the reference category is the congruent condition. There are random intercepts, \(u_{0j}\), that capture the individual deviations from \(\beta_0\). For subject \(j\), their mean reaction time for the congruent condition is thus \(\beta_0 + u_{0j}\). Additionally, there are random slopes, \(u_{1j}\), that capture the individual deviations from \(\beta_1\). Hence, each person is permitted to have their own experimental effect, \(\beta_1 + u_{1j}\).

Scale (Residual Variance) Sub-Model

The above is a traditional mixed-effects model, in that the distribution of “errors” is not modeled. This is not the case for the mel\(-\)zem:

\[ \begin{align} \label{eq:stroop_var} \epsilon_{ij} &\sim \mathcal{N}(0, \sigma^2_{\epsilon_{ij}}) \text{ with} \\ \nonumber \sigma_{\epsilon_{ij}}^2 &= \text{exp}\big(\eta_0 + \eta_1(\text{Incongruent}_{ij}) \\ \nonumber & \hspace{2 cm} + u_{2j} + u_{3j}(\text{Incongruent}_{ij})\big)^2. \end{align} \]

The subscripts \(i\) and \(j\) denote residuals for the \(j\)th person and \(i\)th trial. Furthermore, the residual variance, \(\sigma^2_{\epsilon_{ij}}\), is a function of a mixed-effects model that includes both fixed and random effects. The interpretation differs from the random intercepts only model, because there is a predictor in the location sub-model. Hence, this model captures systematic patterns in the residual variance that were not explained by the experimental manipulation. The parameters are also on the logarithmic scale.

  • \(\eta_0\) is the fixed effect intercept and corresponds to the within-person variance for the congruent condition.

  • \(\eta_1\) is the fixed effect slope (i.e., the experimental effect). But, in this case, the effect is on trial-to-trial “noise” or reaction time consistency.

There are also random intercepts, \(u_{2j}\), and slopes, \(u_{3j}\), that capture the individual deviations from the fixed effects. For person \(j\), the within-person variance for their congruent responses is \(\eta_0 +u_{2j}\) and \(\eta_1+ u_{3j}\) is their experimental effect on within-person variance.3

Note that this formulation essentially generalizes the possibility of unequal variances in the congruent and incongruent conditions to the individual level. This is especially important because unequal variance is quite common, but typical models essentially consider only \(\eta_0\) and \(\eta_1\), which assumes that each is constant (or non-varying).
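The scale sub-model can be read off with a few lines of base R. This is only a sketch: the fixed effects and the three subjects' random effects below are hypothetical values, not estimates, and the code simply applies the equation above, in which the exponentiated linear predictor is a standard deviation.

```r
# Person-specific within-person SDs by condition (hypothetical values).
eta_0 <- log(0.15)             # log within-person SD, congruent condition
eta_1 <- 0.10                  # fixed effect of incongruent trials on log SD
u2    <- c(-0.40, 0.00, 0.40)  # random intercepts (scale), three subjects
u3    <- c(-0.15, 0.00, 0.15)  # random slopes (scale), three subjects

sd_congruent   <- exp(eta_0 + u2)
sd_incongruent <- exp(eta_0 + eta_1 + u2 + u3)

# Percent change in trial-to-trial "noise" from congruent to incongruent
pct_change <- 100 * (exp(eta_1 + u3) - 1)

round(data.frame(sd_congruent, sd_incongruent, pct_change), 3)
```

In this made-up example the first subject becomes slightly more consistent on incongruent trials (a negative percent change) even though the fixed effect \(\eta_1\) is positive, which is exactly the kind of heterogeneity a model with only \(\eta_0\) and \(\eta_1\) cannot express.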

brms

This model can also be fit rather simply with brms:

# formula
form <- brmsformula(rt ~ congruency + (congruency |c| ID),
                    sigma  ~ congruency + (congruency |c| ID))
# fit model
fit_stroop <- brm(form, data = stroop,
                  inits = 0, cores = 4,
                  chains = 4, iter = 3500,
                  warmup = 1000)

The c in (congruency |c| ID) allows for correlation between the sub-models. This is especially important because it is known that the mean and variance are correlated with one another (see Figure X, X).

Figure 2. IIV stands for intraindividual variability (I encourage readers to Google search this topic in their respective fields)


Figure 2 includes the scale intercepts for a Stroop and Flanker task (X, X). This corresponds to the person-specific estimates for the within-person variance in the congruent condition. The dotted line denotes the fixed effect \(\eta_0\). Note that, in models that permit unequal variances for the conditions, this is assumed constant for each subject. This is clearly not the case:

There are HUGE4 individual differences in trial-to-trial “noise”. Alternatively, this can be interpreted as response time consistency. Perhaps there is some underlying and yet unthought of process in the “control” condition.

Figure 3. The intercepts might not be that interesting to some. However, at minimum, this indicates that clearly a location only model (a customary mixed-effects model) is not reasonable.

Figure 3 includes the priming effect on within-person variance. That is, the difference in trial-to-trial “noise” or response time consistency from the congruent condition. The line at 0 % indicates there was no difference and the dotted line is \(\eta_1\) (i.e., the fixed effect slope):

There are HUGE individual differences in the incongruent priming effect on within-person variance, that is, the ability to consistently inhibit irrelevant information is apparently not the same for each person.

Why not Just Ignore the Within-Person Variance Structure?

Because the within-person variance is typically considered noise and a nuisance, one perspective might be to just ignore it and use a standard mixed-effects model. However, at the conceptual level it is important to consider the assumption of a location-only mixed-effects model (MELM):

A mixed model is analogous to fitting a separate linear model for each subject, but then, for some reason, fixing the residual variance to be the same for each subject.5

This translates into assuming that each subject has the same uncertainty in their slope or experimental effect.6 Furthermore, in hierarchical models the estimates are shrunken (or partially pooled, X) and the degree of shrinkage (or regularization to the fixed effect) assumes a constant variance (e.g., see Google for the shrinkage factor). Hence, this suggests that even when the mean effect is of interest, heterogeneous within-person variance should not be ignored.
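To illustrate the shrinkage point, here is a minimal base R sketch using the textbook shrinkage factor for a random-intercepts model with known variances, \(\lambda_j = \sigma^2_{u0} / (\sigma^2_{u0} + \sigma^2_j / n)\). This is a simplification I am assuming for illustration (the argument above concerns slopes, and a fitted model estimates all of these quantities), and the numbers are hypothetical. Each subject has the same number of trials, in line with footnote 6.

```r
# Shrinkage factor under a common versus person-specific within-person SD.
# Simplified random-intercepts case with known, hypothetical variances.
shrinkage <- function(sigma_within, sigma_between, n) {
  sigma_between^2 / (sigma_between^2 + sigma_within^2 / n)
}

sigma_u0     <- 0.10                # between-subject SD
n_trials     <- 50                  # same number of trials for everyone
sigma_common <- 0.20                # the single SD assumed by a standard model
sigma_j      <- c(0.10, 0.20, 0.40) # person-specific SDs, three subjects

lambda_common <- shrinkage(sigma_common, sigma_u0, n_trials)
lambda_j      <- shrinkage(sigma_j, sigma_u0, n_trials)

round(c(common = lambda_common), 3)
round(lambda_j, 3)
```

A factor near 1 means the person's own mean is largely kept, whereas smaller values pull the estimate toward the fixed effect. Under a common variance every subject receives the same factor; once the variance is allowed to differ, the noisiest subject is shrunken noticeably more and the most consistent subject less.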


  1. There are recommendations for interpreting ICCs (X). These typically do not apply to single observations as opposed to average score reliability. I recommend X for a full discussion of this topic.

  2. I find this paradoxical because everything but the means are assumed to be identical for each person!

  3. To keep this post concise, I do not write out the full covariance matrix. I refer interested readers to our paper (equation X).

  4. The caps captures my overall frustration with academic twitter suggesting there are no individual differences in these tasks :-)

  5. This definition is not precise, as mixed models provide shrinkage for the estimates. However, it still communicates the implicit assumption of a mixed model for these tasks.

  6. Note that this statement assumes each subject has the same number of trials.