1 Methods

1.1 Design

All participants were exposed to adjacent and nonadjacent dependencies. In a sequence A-B-C, an adjacent dependency held when the location of dot C always followed the same location of dot B, while the location of dot A was random. In a nonadjacent dependency, the location of dot C always followed the same location of dot A, while the location of dot B was random.

Each participant was presented with two blocks with different location sets (order of location sets was counterbalanced across participants). Each block contained 4 sequences of three elements. Each of the four sequences was repeated 40 times per block.
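To make the two dependency types concrete, the following Python sketch generates the trials of one block. It is a hypothetical illustration, not the experiment code: the function name `make_block`, the location labels, and the rule of drawing the random element from locations that belong to no dependency are assumptions. Only the counts (4 sequences, 40 repetitions per block) come from the design described above.

```python
import random


def make_block(dependency, locations, n_sequences=4, n_reps=40, seed=1):
    """Return a shuffled list of (A, B, C) location triples for one block.

    dependency: "adjacent" (B predicts C) or "nonadjacent" (A predicts C).
    locations:  hypothetical labels of the dot locations in one location set;
                must contain more than 2 * n_sequences locations.
    """
    if dependency not in ("adjacent", "nonadjacent"):
        raise ValueError("dependency must be 'adjacent' or 'nonadjacent'")
    rng = random.Random(seed)
    # Two fixed locations per sequence: the predictive dot and the target C.
    fixed = rng.sample(locations, 2 * n_sequences)
    pairs = [(fixed[2 * i], fixed[2 * i + 1]) for i in range(n_sequences)]
    # Locations outside any dependency supply the random element.
    fillers = [loc for loc in locations if loc not in fixed]
    trials = []
    for predictor, target in pairs:
        for _ in range(n_reps):
            random_loc = rng.choice(fillers)
            if dependency == "adjacent":
                trials.append((random_loc, predictor, target))  # A random, B -> C
            else:
                trials.append((predictor, random_loc, target))  # A -> C, B random
    rng.shuffle(trials)
    return trials
```

For example, `make_block("adjacent", list(range(12)))` returns 160 triples in which each of the four B locations is always followed by the same C location. A mixed block, as in Experiment 3, would combine two sequences of each type; the sketch does not cover that case.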

The three experiments differed in two respects. First, participants were either encouraged to predict the next target (directive instruction) or not (non-directive instruction). Second, adjacent and nonadjacent dependencies were either presented concurrently within the same block (mixed) or in separate blocks (not mixed).

Experiment 1: non-directive instruction; dependencies not mixed (4 sequences of one dependency type per block / location set)
Experiment 2: directive instruction; dependencies not mixed (4 sequences of one dependency type per block / location set)
Experiment 3: directive instruction; dependencies mixed (2 sequences per dependency type in each block / location set)

1.2 Participants

Experiment 1: We tested 20 participants (15 females, 5 males). The median age of the sample was 19.5 years (SD = 1.75) with an age range from 18 to 24 years.

Experiment 2: We tested 26 participants (19 females, 7 males). The median age of the sample was 27.5 years (SD = 15.21) with an age range from 18 to 62 years.

Experiment 3: We tested 27 participants (21 females, 6 males). The median age of the sample was 31 years (SD = 12.05) with an age range from 20 to 57 years.

2 Results

2.1 Model fit

The dependent variable was the number of eye samples on the target dot (C in the sequence A-B-C) in the window from 250 ms after the previous target illuminated until the target itself illuminated, accumulated across occurrences.
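The following Python sketch shows one reading of how such a cumulative count could be computed. The data layout (gaze-sample timestamps on the target plus the onset times of the previous and the current target, all in ms), the function names, and the interpretation of "accumulated" as a running sum over occurrences are assumptions.

```python
from itertools import accumulate


def samples_in_window(sample_times, prev_onset, target_onset, delay=250):
    """Count gaze samples on the target recorded from `delay` ms after the
    previous target lit up until (but excluding) the target's own onset."""
    start = prev_onset + delay
    return sum(1 for t in sample_times if start <= t < target_onset)


def cumulative_counts(occurrences, delay=250):
    """occurrences: (sample_times_on_target, prev_onset, target_onset) tuples,
    one per occurrence of a target, in presentation order (times in ms).
    Returns the running total of in-window samples after each occurrence."""
    per_occurrence = [samples_in_window(s, p, o, delay) for s, p, o in occurrences]
    return list(accumulate(per_occurrence))


occ = [([100, 300, 400, 900], 0, 800), ([1250, 1300], 1000, 1600), ([], 2000, 2600)]
print(cumulative_counts(occ))  # [2, 4, 4]
```

In the first occurrence only the samples at 300 and 400 ms fall inside the window (250 to 800 ms); the sample at 100 ms is too early and the one at 900 ms comes after the target lit up.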

Data were analysed with Bayesian mixed-effects models assuming a zero-inflated negative binomial distribution (Gelman et al. 2014; McElreath 2016). The R package brms (Bürkner 2017, 2018) was used to fit the models in the probabilistic programming language Stan (Carpenter et al. 2016; Hoffman and Gelman 2014). Fixed effects were the main effects and interactions of occurrence id (1 to 40) and dependency type (levels: adjacent, nonadjacent, baseline). Occurrence id was treatment coded and dependency type was sum coded (for the advantages of prespecifying model contrasts, see Schad et al. 2020), comparing each dependency type to baseline. The baseline comprised all transitions to dots whose location was random, i.e. dots that were part of neither dependency (neither dependee nor dependent). Further, to model the learning curves, occurrence id was entered as a quadratic function (a second-order orthogonal polynomial).
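A second-order orthogonal polynomial replaces occurrence id and occurrence id squared by two uncorrelated columns. The Python sketch below builds such a basis by Gram-Schmidt orthogonalisation of the centred powers of x; it is intended to mirror what R's `poly()` returns (columns orthogonal to the constant and to each other, each scaled to unit length), but the function name `orthogonal_poly` is hypothetical and this is not the code used in the analysis.

```python
import math


def orthogonal_poly(x, degree=2):
    """Orthonormal polynomial basis of `x` for degrees 1..degree,
    built by Gram-Schmidt on the centred powers of x."""
    n = len(x)
    mean = sum(x) / n
    basis = [[1.0 / math.sqrt(n)] * n]  # normalised constant column, dropped below
    for d in range(1, degree + 1):
        v = [(xi - mean) ** d for xi in x]
        # Remove the projection onto every lower-order column.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # one list per polynomial term


linear, quadratic = orthogonal_poly(list(range(1, 41)))  # occurrence id 1..40
```

The linear column increases monotonically from negative to positive values, and the quadratic column is U-shaped and symmetric around the middle occurrence. Because the columns are uncorrelated and on a common scale, the linear and quadratic terms can be estimated without the collinearity between raw occurrence and occurrence squared.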

Models were fitted with a maximal random-effects structure (Barr et al. 2013; Bates et al. 2015), including by-participant random intercepts and by-participant random slopes for the second-order polynomial of occurrence id¹, for a combined factor of location set (2 sets), sequence (4 sequences per dependency type and location set), dependency type, and transition (levels: to A for adjacent dependencies, to B for nonadjacent dependencies, and to C), and for their interaction. The combined factor had 16 levels.²
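The count of 16 levels follows from 2 location sets × 4 sequences × 2 transitions per sequence. The Python sketch below enumerates the levels under two assumed block layouts: one dependency type per location set (Experiments 1 and 2, assuming one block of each type per participant) and two sequences of each type per location set (Experiment 3). The function name and level labels are hypothetical.

```python
def random_effect_levels(block_design):
    """Enumerate levels of the combined by-participant factor.

    block_design: maps a location-set label to a list of
    (dependency, sequence_number) pairs presented with that location set.
    Each sequence contributes two transitions: to its random dot
    ("to A" for adjacent, "to B" for nonadjacent) and to its target ("to C").
    """
    random_transition = {"adjacent": "to A", "nonadjacent": "to B"}
    levels = []
    for location_set, sequences in block_design.items():
        for dependency, seq in sequences:
            for transition in (random_transition[dependency], "to C"):
                levels.append(f"{location_set}:{dependency}:seq{seq}:{transition}")
    return levels


# Experiments 1 and 2 (assumed: one dependency type per location set).
not_mixed = {
    "set1": [("adjacent", s) for s in range(1, 5)],
    "set2": [("nonadjacent", s) for s in range(1, 5)],
}
# Experiment 3: two sequences per dependency type in each location set.
mixed = {
    loc: [("adjacent", 1), ("adjacent", 2), ("nonadjacent", 3), ("nonadjacent", 4)]
    for loc in ("set1", "set2")
}
print(len(random_effect_levels(not_mixed)), len(random_effect_levels(mixed)))  # 16 16
```

Both layouts yield the same number of levels, so the same random-effects structure can be used across the three experiments.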

We quantified the statistical support for the alternative hypothesis over the null hypothesis using Bayes Factors (henceforth, BF), calculated with the Savage-Dickey method (see, e.g., Dickey, Lientz, et al. 1970; Wagenmakers et al. 2010). The BF expresses the evidence for the alternative hypothesis H\(_1\) over the null hypothesis H\(_0\) given the data. A BF larger than 5 indicates moderate and a BF larger than 10 strong evidence for a statistically meaningful effect compared to the null hypothesis (see, e.g., Baguley 2012; Jeffreys 1961; Lee and Wagenmakers 2014). For example, a BF of 2 indicates that the data are twice as likely under the alternative hypothesis as under the null hypothesis. In contrast to traditional statistical methods (null-hypothesis significance testing), the Bayesian framework also allows us to quantify evidence against the alternative hypothesis (i.e. in favour of the null hypothesis), typically corresponding to BFs smaller than 0.33 (for discussion see Dienes 2014, 2016; Dienes and Mclatchie 2018; Schönbrodt et al. 2017; Wagenmakers et al. 2018).
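The Savage-Dickey method computes the BF for a point null hypothesis (coefficient = 0) as the ratio of the prior density to the posterior density at zero. The Python sketch below illustrates this with a Normal(0, `prior_sd`) prior and a Gaussian kernel density estimate of the posterior at zero. The prior width, the kernel estimator, and the simulated draws are assumptions for illustration; the priors and density estimation used in the actual analysis may differ.

```python
import math
import random
import statistics


def normal_pdf(x, mu=0.0, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def kde_density_at(samples, x):
    """Gaussian kernel density estimate at x (Silverman's rule-of-thumb bandwidth)."""
    n = len(samples)
    h = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    return sum(normal_pdf(x, s, h) for s in samples) / n


def savage_dickey_bf10(posterior_samples, prior_sd, null_value=0.0):
    """BF10 = prior density / posterior density at the null value,
    assuming a Normal(0, prior_sd) prior on the coefficient."""
    prior_density = normal_pdf(null_value, 0.0, prior_sd)
    posterior_density = kde_density_at(posterior_samples, null_value)
    if posterior_density == 0.0:
        return math.inf  # posterior mass at the null is numerically zero
    return prior_density / posterior_density


# Hypothetical posterior draws, not the paper's samples.
rng = random.Random(2020)
away_from_zero = [rng.gauss(0.16, 0.02) for _ in range(4000)]
around_zero = [rng.gauss(0.0, 0.02) for _ in range(4000)]
print(savage_dickey_bf10(away_from_zero, prior_sd=1.0))  # very large: evidence for H1
print(savage_dickey_bf10(around_zero, prior_sd=1.0))     # well below 1/3: evidence for H0
```

Because the prior density enters the numerator, the BF depends on the prior width: a wider prior lowers the prior density at zero and therefore lowers the BF for the same posterior.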

Models were fitted with weakly informative priors (see McElreath 2016) and run with 20,000 iterations on 3 chains, with a warm-up of 10,000 iterations and no thinning. Model convergence was confirmed by the Gelman-Rubin statistic \(\hat{R}\) (Gelman and Rubin 1992) and by visual inspection of the Markov chain Monte Carlo chains.
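The Gelman-Rubin statistic compares the variance between chains with the variance within chains; values close to 1 indicate that the chains have converged to the same distribution. The Python sketch below implements a basic, non-split version without the degrees-of-freedom correction, using simulated draws. More recent Stan tooling reports refined (split, rank-normalised) variants, so this is an illustration of the idea rather than the exact diagnostic used.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for one parameter.

    chains: list of m >= 2 equally long lists of post-warm-up draws.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance (B) and average within-chain variance (W).
    b = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


rng = random.Random(1)
agreeing = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(3)]
one_off = agreeing[:2] + [[rng.gauss(5, 1) for _ in range(2000)]]
print(round(gelman_rubin(agreeing), 3))  # close to 1: chains agree
print(round(gelman_rubin(one_off), 3))   # well above 1.1: one chain sits elsewhere
```

A common rule of thumb treats \(\hat{R}\) values above about 1.1 (or, more strictly, 1.01) as a sign that the chains have not mixed.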

2.2 Modelling results

Fixed effects for all predictors are summarized in Table 2.1 as posterior estimates with 95% probability intervals (PIs), together with the evidence in support of the alternative hypothesis, indicated as \(H_1\). We found strong evidence for learning of adjacent dependencies in Experiments 1 and 2, as indicated by the advantage for adjacent dependencies over baseline. For nonadjacent dependencies, we found strong evidence for a learning disadvantage compared to baseline in Experiments 1 and 2. These main effects of adjacent and nonadjacent dependencies disappeared in Experiment 3, when adjacent and nonadjacent dependencies were learnt in the same block. This was supported by evidence in favour of the null hypothesis for both the main effect of adjacent dependencies (BF = 0.11) and the main effect of nonadjacent dependencies (BF = 0.33). There was moderate evidence for an interaction of occurrence id (linear) and nonadjacent dependencies in Experiment 1 (BF = 6.91) but not in Experiment 2. Evidence for the linear and quadratic growth terms was strong in all three experiments. Evidence for all other interactions was negligible.

Table 2.1: Fixed effects summaries for main effects and interactions of adjacent and nonadjacent dependency (compared to baseline) and occurrence id with linear and quadratic term.
	Experiment 1		Experiment 2		Experiment 3
Predictor	Est. with 95% PI	H₁	Est. with 95% PI	H₁	Est. with 95% PI	H₁
Occurrence id (linear)	79.78 [77.2 – 82.27]	>100	92.4 [90.05 – 94.88]	>100	92.71 [90.27 – 95.17]	>100
Occurrence id (quadratic)	-24.68 [-26.92 – -22.41]	>100	-28.18 [-30.23 – -26.19]	>100	-27.4 [-29.44 – -25.36]	>100
Adjacent	0.16 [0.12 – 0.2]	>100	0.12 [0.09 – 0.16]	>100	-0.04 [-0.08 – 0]	0.11
Nonadjacent	-0.09 [-0.13 – -0.05]	>100	-0.08 [-0.11 – -0.04]	>100	0.05 [0.01 – 0.1]	0.33
Occurrence id (linear) : Adjacent	1.69 [-0.37 – 3.89]	1.17	1.69 [-0.44 – 3.83]	1.25	2.02 [-0.45 – 4.55]	1.49
Occurrence id (quadratic) : Adjacent	-1.2 [-3.18 – 0.74]	0.65	-0.13 [-2.1 – 1.78]	0.31	0.02 [-2.32 – 2.35]	0.39
Occurrence id (linear) : Nonadjacent	-2.66 [-4.94 – -0.56]	6.91	-1.71 [-3.86 – 0.46]	1.2	-1.72 [-4.27 – 0.85]	1.05
Occurrence id (quadratic) : Nonadjacent	0.66 [-1.32 – 2.68]	0.4	-0.09 [-2.05 – 1.86]	0.31	0.14 [-2.25 – 2.54]	0.39
Note:
H₁ = evidence in favour of the alternative hypothesis over the null hypothesis (Bayes Factor); PI = probability interval; ‘:’ = interaction

Figure 2.1 shows the modelled learning curves for adjacent and nonadjacent dependencies across experiments compared to baseline. The posterior learning curves show that, in Experiments 1 and 2, the learning advantage for adjacent dependencies was larger in magnitude than the disadvantage for nonadjacent dependencies. Both dependency effects disappeared for the mixed presentation in Experiment 3.

Figure 2.1: Modelled learning curves with posterior mean and 95% PIs indicated as ribbons.

2.3 Pooled analysis

In a pooled analysis of all three experiments, we aimed to find statistical evidence that learning of adjacent dependencies fails when people are exposed to adjacent and nonadjacent dependencies at the same time. To answer this question, the pooled analysis added experiment as a predictor interacting with all other fixed effects. The factor experiment was coded with Helmert contrasts, comparing Experiment 1 to Experiment 2, and Experiment 3 to Experiments 1 and 2 combined (see Schad et al. 2020).
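The Python sketch below illustrates one common scaling of Helmert contrasts for a three-level factor (in the spirit of Schad et al. 2020), chosen so that each coefficient equals the corresponding difference between cell means. The exact scaling and signs used in the analysis are not reported, so the codes below are an assumption; here Experiment 3 is coded positively in the second contrast, so a negative coefficient means lower values in Experiment 3 than in Experiments 1 and 2. The cell means are made-up numbers.

```python
from fractions import Fraction as F

# Scaled Helmert codes for the factor experiment (hypothetical scaling).
# Column 1: Exp 1 vs 2; column 2: Exp 1 & 2 vs 3.
HELMERT = {
    "Exp1": (F(-1, 2), F(-1, 3)),
    "Exp2": (F(1, 2), F(-1, 3)),
    "Exp3": (F(0), F(2, 3)),
}


def coefficients_from_means(mu):
    """Intercept and slopes implied by these codes for the cell means `mu`."""
    intercept = (mu["Exp1"] + mu["Exp2"] + mu["Exp3"]) / 3  # grand mean
    b1 = mu["Exp2"] - mu["Exp1"]                             # Exp 1 vs 2
    b2 = mu["Exp3"] - (mu["Exp1"] + mu["Exp2"]) / 2          # Exp 1 & 2 vs 3
    return intercept, b1, b2


def predicted_means(intercept, b1, b2):
    """Cell means reconstructed from the regression equation."""
    return {lvl: intercept + b1 * c1 + b2 * c2 for lvl, (c1, c2) in HELMERT.items()}


mu = {"Exp1": F(3), "Exp2": F(5), "Exp3": F(1)}
coefs = coefficients_from_means(mu)
print(coefs)                          # (Fraction(3, 1), Fraction(2, 1), Fraction(-3, 1))
print(predicted_means(*coefs) == mu)  # True
```

Because both contrast columns sum to zero, the intercept is the grand mean of the three experiments, and the two slopes are directly interpretable as the comparisons reported in Table 2.2.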

Table 2.2 summarizes the results of the cross-experiment analysis. We found strong evidence for an overall larger number of looks to the target in Experiments 1 and 2 compared to Experiment 3 (see main effect Exp 1 & 2 vs 3) and evidence against a difference between Experiments 1 and 2 (BF = 0.02). As in the separate analyses of Experiments 1 and 2, we found main effects of adjacent and nonadjacent dependencies. Both effects interacted with the comparison of Experiments 1 and 2 vs Experiment 3, indicating that the dependency-related effects disappeared when sequences were mixed. This finding supports the hypothesis that learning of adjacent dependencies fails when they are presented concurrently with nonadjacent dependencies. Finally, there was weak evidence for an interaction of occurrence id (linear) with adjacent dependencies (BF = 4.14) and moderate evidence for an interaction of occurrence id (linear) with nonadjacent dependencies (BF = 5.56), suggesting faster learning for adjacent dependencies and slower learning for nonadjacent dependencies compared to baseline. Evidence for all other effects was negligible.

Table 2.2: Fixed effects summaries for main effects and interactions of experiment, adjacent and nonadjacent dependencies and occurrence id with linear and quadratic term.
Predictor	Est. with 95% PI	H₁
Main effects
Occurrence id (linear)	153.66 [150.72 – 156.69]	>100
Occurrence id (quadratic)	-46.45 [-48.66 – -44.26]	>100
Adjacent	0.1 [0.07 – 0.12]	>100
Nonadjacent	-0.05 [-0.08 – -0.03]	>100
Exp 1 vs 2	-0.01 [-0.02 – 0]	0.02
Exp 1 & 2 vs 3	-0.05 [-0.06 – -0.05]	>100
Two-way interactions
Occurrence id (linear) : Adjacent	2.8 [0.24 – 5.42]	4.14
Occurrence id (quadratic) : Adjacent	-0.61 [-2.85 – 1.62]	0.42
Occurrence id (linear) : Nonadjacent	-3.01 [-5.6 – -0.41]	5.56
Occurrence id (quadratic) : Nonadjacent	0.08 [-2.16 – 2.32]	0.38
Occurrence id (linear) : Exp 1 vs 2	0.23 [-1.55 – 2.05]	0.31
Occurrence id (quadratic) : Exp 1 vs 2	-0.05 [-1.86 – 1.74]	0.3
Occurrence id (linear) : Exp 1 & 2 vs 3	-0.85 [-1.89 – 0.18]	0.59
Occurrence id (quadratic) : Exp 1 & 2 vs 3	0.81 [-0.23 – 1.88]	0.51
Adjacent : Exp 1 vs 2	-0.01 [-0.02 – 0]	0.01
Nonadjacent : Exp 1 vs 2	0 [-0.01 – 0.02]	0.01
Adjacent : Exp 1 & 2 vs 3	0.02 [0.01 – 0.03]	>100
Nonadjacent : Exp 1 & 2 vs 3	-0.03 [-0.03 – -0.02]	>100
Three-way interactions
Occurrence id (linear) : Adjacent : Exp 1 vs 2	0.16 [-1.4 – 1.76]	0.74
Occurrence id (quadratic) : Adjacent : Exp 1 vs 2	0.39 [-1.16 – 2.04]	0.83
Occurrence id (linear) : Nonadjacent : Exp 1 vs 2	0.19 [-1.38 – 1.8]	0.76
Occurrence id (quadratic) : Nonadjacent : Exp 1 vs 2	-0.09 [-1.7 – 1.48]	0.76
Occurrence id (linear) : Adjacent : Exp 1 & 2 vs 3	0.49 [-0.67 – 1.68]	0.8
Occurrence id (quadratic) : Adjacent : Exp 1 & 2 vs 3	0.24 [-0.93 – 1.41]	0.6
Occurrence id (linear) : Nonadjacent : Exp 1 & 2 vs 3	-0.01 [-1.2 – 1.16]	0.58
Occurrence id (quadratic) : Nonadjacent : Exp 1 & 2 vs 3	-0.09 [-1.28 – 1.09]	0.57
Note:
H₁ = evidence in favour of the alternative hypothesis over the null hypothesis (Bayes Factor); PI = probability interval

References

Baguley, Thomas. 2012. Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences. Basingstoke: Palgrave Macmillan.

Barr, Dale J., Roger Levy, Christoph Scheepers, and Harry J. Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78.

Bates, Douglas M., Reinhold Kliegl, Shravan Vasishth, and R. Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv Preprint arXiv:1506.04967.

Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.

Carpenter, Bob, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Michael A. Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2016. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 20.

Dickey, James M., B. P. Lientz, et al. 1970. “The Weighted Likelihood Ratio, Sharp Hypotheses about Chances, the Order of a Markov Chain.” The Annals of Mathematical Statistics 41 (1): 214–26.

Dienes, Zoltan. 2014. “Using Bayes to Get the Most Out of Non-Significant Results.” Frontiers in Psychology 5 (781): 1–17.

———. 2016. “How Bayes Factors Change Scientific Practice.” Journal of Mathematical Psychology 72: 78–89.

Dienes, Zoltan, and Neil Mclatchie. 2018. “Four Reasons to Prefer Bayesian Analyses over Significance Testing.” Psychonomic Bulletin & Review 25 (1): 207–18.

Gelman, Andrew, J. B. Carlin, H. S. Stern, D. B. Dunson, Aki Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC.

Gelman, Andrew, and Donald B. Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7 (4): 457–72.

Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15 (1): 1593–623.

Jeffreys, Harold. 1961. The Theory of Probability. 3rd ed. Oxford: Oxford University Press, Clarendon Press.

Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.

McElreath, Richard. 2016. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.

Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110: 104038.

Schönbrodt, Felix D., Eric-Jan Wagenmakers, Michael Zehetleitner, and Marco Perugini. 2017. “Sequential Hypothesis Testing with Bayes Factors: Efficiently Testing Mean Differences.” Psychological Methods 22 (2): 322–39.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2015. “Pareto Smoothed Importance Sampling.” arXiv Preprint arXiv:1507.02646.

———. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.

Wagenmakers, Eric-Jan, Tom Lodewyckx, Himanshu Kuriyal, and Raoul Grasman. 2010. “Bayesian Hypothesis Testing for Psychologists: A Tutorial on the Savage–Dickey Method.” Cognitive Psychology 60 (3): 158–89.

Wagenmakers, Eric-Jan, Maarten Marsman, Tahira Jamil, Alexander Ly, Josine Verhagen, Jonathon Love, Ravi Selker, et al. 2018. “Bayesian Inference for Psychology. Part I: Theoretical Advantages and Practical Ramifications.” Psychonomic Bulletin & Review 25 (1): 35–57.

¹ Model comparisons using leave-one-out cross-validation, which estimates out-of-sample predictive performance and thus guards against overfitting (Vehtari, Gelman, and Gabry 2015, 2017), revealed higher predictive performance for occurrence id modelled as a quadratic rather than a linear predictor (in a negative binomial model with a log link, a linear predictor corresponds to an exponential function). Experiment 1: \(\Delta\widehat{elpd}\) = -659, SE = 31 (linear: \(\widehat{elpd}\) = -112,142, SE = 152; quadratic: \(\widehat{elpd}\) = -111,483, SE = 158). Experiment 2: \(\Delta\widehat{elpd}\) = -964, SE = 38 (linear: \(\widehat{elpd}\) = -144,805, SE = 179; quadratic: \(\widehat{elpd}\) = -143,841, SE = 185). Experiment 3: \(\Delta\widehat{elpd}\) = -483, SE = 25 (linear: \(\widehat{elpd}\) = -147,611, SE = 227; quadratic: \(\widehat{elpd}\) = -147,129, SE = 227).
² In brms syntax: y ~ poly(occurrence, 2) * dependency + (poly(occurrence, 2) * sequence | participant), where sequence has 16 levels per participant comprising location sets, dependency types, transition location, and 4 different sequences per dependency type.

Concurrent learning of adjacent and nonadjacent dependencies

Jens Roeser

22/07/2020