Introduction

Frequentist statistics have been the staple paradigm for inferential statistics since the early 20th century, when they were introduced by Fisher, Neyman, and Pearson (Biau et al., 2010). Fisher developed the theory behind p-values, while Neyman and Pearson developed the theory behind null hypothesis significance testing (NHST). These theories have been of great importance in scientific research ever since. Recently, however, there have been growing concerns about the misuse of these tools (Goodman, 2008; Hubbard & Bayarri, 2009; Sterne & Smith, 2001). Some researchers have even gone as far as calling the NHST procedure a “ritualized devil’s advocacy” (Abelson, 1995). One fundamental problem with this ritual is the inevitability of false positives and the inflexibility of the approach overall. Moreover, NHST assigns no probability to the hypothesis itself, only to the data under the hypothesis, and forces a dichotomous outcome, which can itself be regarded as problematic: science is not black and white. With Bayes’ rule, the probability of a hypothesis can be estimated directly, and the accumulation of evidence for one hypothesis over another can be monitored over time (Nair, 2022).

Bayesian hierarchical models offer further advantages. Because the data are not aggregated, as they would be in a typical frequentist analysis, interindividual differences can be examined in depth. Another key advantage, the ability to handle nonlinear models, makes Bayesian hierarchical modeling a flexible and invaluable tool in psychology (Rouder, Morey, & Pratte, 2013). Lastly, Bayesian estimation tends to lead to fewer convergence problems than frequentist alternatives, although this does not mean the approach is free of technical issues. Despite the rising criticism, the vast majority of researchers still prefer frequentist methods, such as ANOVA, over Bayesian statistics. In our case, testing the Stroop effect with a repeated measures ANOVA would assume independent observations, whereas the trial data over time may be highly correlated (e.g., through learning). Aggregating the data is therefore problematic; Bayesian hierarchical modeling instead takes individual differences into account explicitly (Rouder & Lu, 2005, as cited in Veenman, Stefan, & Haaf, 2022). Still, the use of Bayesian statistics is rising, partly due to advances in computational resources and methods such as Markov chain Monte Carlo (MCMC). In the current study, Bayesian hierarchical modeling was used to study inter- and intraindividual variability in the Stroop task. The sample used in this study was derived from the Stroop task data originally collected by Pratte et al. (2010).

Data

The current study contained data from 38 participants. Each participant completed 7 blocks of 66 to 67 trials of the Stroop task. The primary outcome variable was the reaction time on individual trials. Reaction times in seconds averaged M = 7.02 (SD = 2.81) on congruent trials and M = 7.89 (SD = 3.28) on incongruent trials. These aggregate means, however, give little information about the individual differences of interest. Furthermore, the standard deviations are large relative to the average reaction times, which indicates substantial within-person variation in the Stroop effect. Plot 1 shows the individual Stroop effects, taking the mean of each subject over all trials; individuals differ in overall reaction time as well as in the size of the effect. These differences will be investigated further in our analyses.
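For reference, these summaries could be computed with a short sketch in R, assuming the trial-level data frame dat.stroop.p1 (used in the code below) also contains a condition factor cond; the cond column name is an assumption, not part of the original data description.

# Mean and SD of reaction time (in seconds) per condition
# ('cond' is an assumed column name)
aggregate(rt ~ cond, data = dat.stroop.p1,
          FUN = function(x) c(M = mean(x), SD = sd(x)))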

Plots 1 and 2

Finally, a normal Q-Q plot (Plot 3) was made to check for deviations from normality. The plot indicates that the reaction times are not normally distributed.

# Plot 3: Normal Q-Q plot to check normality of the reaction times
qqnorm(dat.stroop.p1$rt) # points deviate from the line: normality violated
qqline(dat.stroop.p1$rt) # reference line for a normal distribution

Model

To model individual variability in a repeated measures design, the following notation was adopted: let \(Y_{ijk}\) be the response on the kth trial of the ith participant in the jth condition.

\(Y_{ijk}\) ~ \(N(α_i + θ_i x_j, σ)\)

Here, \(α_i\) is the average reaction time of subject i, \(θ_i\) is the reaction-time difference (the Stroop effect) of subject i, \(x_j\) codes the condition with \(x_j\) = -1 for the congruent condition and \(x_j\) = 1 for the incongruent condition, and σ is the overall error (residual standard deviation). The main variable of interest is \(θ_i\), the Stroop effect: the slope for each subject depending on the condition (congruent vs. incongruent). Using these parameters, an unstructured model was developed applying a normal likelihood, where population-level means and standard deviations describe the distribution of the subject-level effects across the population (a sketch of the corresponding model syntax is given below). Although our analysis revealed that the reaction times violated the assumption of normality, we opted to use a normal likelihood for several reasons. As newcomers to this type of analysis, we sought to maintain simplicity and avoid overcomplicating the modeling process. While a lognormal distribution would have been a more appropriate likelihood, it would involve interpreting parameters on the logarithmic scale, which introduces additional complexity. A normal likelihood still provided a reasonable approximation to the data.
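As an illustration only, the following R sketch computes the log-likelihood of this model for one set of parameter values, assuming trial-level vectors rt (reaction times), sub (subject indices), and x (condition codes); these object names are assumptions made for the example.

# Log-likelihood of the unstructured model: each trial is a normal draw
# centred on the subject's intercept plus the subject's Stroop effect.
# alpha and theta are vectors with one entry per subject.
loglik <- function(alpha, theta, sigma, rt, sub, x) {
  mu <- alpha[sub] + theta[sub] * x  # per-trial mean
  sum(dnorm(rt, mean = mu, sd = sigma, log = TRUE))
}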

Priors

In line with the Bayesian framework, priors were selected for the aforementioned parameters: the subject-level reaction times \(α_i\), the subject-level reaction-time differences \(θ_i\), and the error σ. These priors were weakly informative, reflecting a wide range of beliefs. Priors were chosen using a sampling-based approach and were constructed for both the group-level parameters and the individual effects. Normally distributed priors were used, which are conjugate to the normal likelihood, resulting in the following model specification:

\(Y_{ijk}\) ~ \(N(α_i + θ_i x_j, σ)\)

\(α_i\) ~ \(N(μ_α, σ_α)\)

\(θ_i\) ~ \(N(μ_θ, σ_θ)\)

\(μ_α\) ~ \(N(0.7, 1)\)

\(μ_θ\) ~ \(N(0, 1)\)

\(σ, σ_α, σ_θ\) ~ \(N^+(0, 1)\)
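Taken together, the likelihood and these priors can be written out as model syntax. The sketch below uses JAGS (run from R via, for instance, the rjags package) purely as an illustration, reusing the illustrative rt, sub, and x objects introduced earlier; N denotes the number of trials and I the number of participants.

# Sketch of the full hierarchical model in JAGS syntax. JAGS
# parameterizes dnorm by mean and precision (1/variance), and
# T(0, ) truncates a distribution to positive values.
stroop.model <- "
model {
  for (n in 1:N) {                  # likelihood over all trials
    rt[n] ~ dnorm(alpha[sub[n]] + theta[sub[n]] * x[n], 1 / sigma^2)
  }
  for (i in 1:I) {                  # subject-level parameters
    alpha[i] ~ dnorm(mu.alpha, 1 / sigma.alpha^2)
    theta[i] ~ dnorm(mu.theta, 1 / sigma.theta^2)
  }
  mu.alpha    ~ dnorm(0.7, 1)       # group-level mean reaction time
  mu.theta    ~ dnorm(0, 1)        # group-level mean Stroop effect
  sigma       ~ dnorm(0, 1) T(0, ) # truncated normal N+(0, 1)
  sigma.alpha ~ dnorm(0, 1) T(0, )
  sigma.theta ~ dnorm(0, 1) T(0, )
}"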

For our priors on the standard deviation parameters, a truncated normal distribution was used, as these parameters cannot be less than zero. Although an inverse-gamma distribution is likewise restricted to values greater than zero, it is fairly concentrated around its central tendency compared to the truncated normal distribution. Given the limited prior information available, normally distributed priors reflected a wide range of beliefs, thereby enabling the data to exert a stronger influence on the posterior distribution. With our priors defined, we simulated our prior predictions to compare later against the fitted posterior model:
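A minimal sketch of this prior predictive simulation in base R follows; the seed and the number of draws are arbitrary choices made for illustration.

set.seed(123)                           # arbitrary seed for reproducibility
n.sims <- 1000                          # number of prior draws

# Group-level parameters drawn from the priors
mu.alpha    <- rnorm(n.sims, 0.7, 1)
mu.theta    <- rnorm(n.sims, 0, 1)
sigma       <- abs(rnorm(n.sims, 0, 1)) # |N(0,1)| equals the truncated N+(0,1)
sigma.alpha <- abs(rnorm(n.sims, 0, 1))
sigma.theta <- abs(rnorm(n.sims, 0, 1))

# Subject-level parameters and one simulated trial per condition
alpha  <- rnorm(n.sims, mu.alpha, sigma.alpha)
theta  <- rnorm(n.sims, mu.theta, sigma.theta)
rt.con <- rnorm(n.sims, alpha - theta, sigma)  # x = -1 (congruent)
rt.inc <- rnorm(n.sims, alpha + theta, sigma)  # x = +1 (incongruent)

# Prior predictive distribution of the Stroop effect
hist(rt.inc - rt.con, main = "Prior predictive Stroop effect", xlab = "seconds")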