Modelling parallel planning in written production as a Bayesian mixture process

Slides: rpubs.com/jensroes/rg-leicester-2025-talk

Understanding factors that influence interkey intervals requires a theory of how the mental processes that underlie the generation of keystrokes are coordinated.

How do we produce text?

General stages of word planning

Classic (serial) view of writing

Pauses before a writer starts a sentence are typically longer than pauses before a mid-sentence word, which are in turn longer than pauses between mid-word key presses (e.g. Conijn, Roeser, and van Zaanen 2019).
Long pauses are typically followed by production bursts (Hayes 2012; Kaufer, Hayes, and Flower 1986).
Planning the message and syntax of a sentence adds to the time needed to prepare the upcoming word and to plan the motor movements for its first keystroke (Baaijen, Galbraith, and De Glopper 2012; Roeser, Torrance, and Baguley 2019).

Parallel view of writing

Fluent output is maintained by processes that run in parallel (Olive 2014; Roeser et al. 2025; Van Galen 1991).
Variations in interkey intervals are not sufficiently explained by text location.
Writers often do not hesitate before sentence (or word) onset.
Sentence-initial durations are often shorter than one would expect if sentence-initial pauses reflected planning (Medimorec and Risko 2017; Rønneberg et al. 2022).
Sentence plans are usually incomplete at sentence onset (Nottbusch 2010; Roeser, Torrance, and Baguley 2019).

How can we represent serial and parallel models statistically?

Serial view: single-process model

\[ \begin{align} \log(\text{iki}_i) \sim\text{ } \mathcal{N}(\mu, \sigma_{e}^2) \end{align} \]

Serial view: single-process model

\[ \begin{align} \log(\text{iki}_i) \sim&\ \mathcal{N}(\mu_i, \sigma_{e}^2)\\ \mu_i =&\ \alpha + \beta_\text{diff} \times \text{textlocation[i]} \end{align} \]

Parallel view: two distributions mixture-process

Process 1 (fluent interkey interval): interkey intervals are determined by the time taken to move to the next key (executing motor movements usually takes 100-150 ms; Conijn, Roeser, and van Zaanen 2019; Van Waes et al. 2021).
Process 2 (hesitant interkey interval): interkey intervals are determined by the time taken to complete upstream processes.

Parallel view: two distributions mixture-process

\[ \begin{align} \log(\text{iki}_{i}) \sim&\ \theta \times \mathcal{N}(\mu, \sigma_{e}^2) \\ \theta =&\ 1 \end{align} \]

Parallel view: two distributions mixture-process

\[ \begin{align} \log(\text{iki}_{i}) \sim\text{ } &\ \theta \times \mathcal{N}(\mu_2, \sigma_{e'}^2) + \\ &\ (1 - \theta) \times \mathcal{N}(\mu_1, \sigma_{e}^2) \end{align} \]

Parallel view: two distributions mixture-process

\[ \begin{align} \log(\text{iki}_{i}) \sim\text{ } &\ \theta \times \mathcal{N}(\mu_2, \sigma_{e'}^2) + \\ &\ (1 - \theta) \times \mathcal{N}(\mu_1, \sigma_{e}^2)\\ \mu_1 = &\ \alpha\\ \mu_{2} = &\ \alpha + \delta_\text{diff} \\ \text{constraint:} &\ \delta_\text{diff} > 0 \end{align} \]

Simulation: jens-roeser.shinyapps.io/mixture-of-gaussians/
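
The two-distribution model above can be sketched in a few lines of Python. This is an illustration only, not the Stan implementation used for the analysis; the function names and all parameter values (e.g. `alpha=5.0`, roughly 150 ms on the log scale) are assumptions chosen for the example. The mixture density is evaluated on the log scale with a log-sum-exp for numerical stability, and `theta` must lie strictly between 0 and 1.

```python
import numpy as np

def normal_log_pdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def mixture_log_density(log_iki, alpha, delta, sigma_fluent, sigma_hesitant, theta):
    """Pointwise log density of log(iki) under the two-component mixture.

    mu_1 = alpha (fluent), mu_2 = alpha + delta (hesitant), delta > 0,
    theta = mixing proportion of the hesitant component.
    """
    if delta <= 0:
        raise ValueError("delta must be positive (constraint: delta_diff > 0)")
    fluent = np.log1p(-theta) + normal_log_pdf(log_iki, alpha, sigma_fluent)
    hesitant = np.log(theta) + normal_log_pdf(log_iki, alpha + delta, sigma_hesitant)
    return np.logaddexp(fluent, hesitant)

def simulate_log_iki(n, alpha, delta, sigma_fluent, sigma_hesitant, theta, rng):
    """Draw n log interkey intervals; each is hesitant with probability theta."""
    is_hesitant = rng.random(n) < theta
    mu = alpha + delta * is_hesitant
    sigma = np.where(is_hesitant, sigma_hesitant, sigma_fluent)
    return rng.normal(mu, sigma), is_hesitant

# Hypothetical parameter values, for illustration only.
params = dict(alpha=5.0, delta=1.5, sigma_fluent=0.3, sigma_hesitant=0.6, theta=0.2)
rng = np.random.default_rng(1)
log_iki, is_hesitant = simulate_log_iki(10_000, rng=rng, **params)
total_log_lik = mixture_log_density(log_iki, **params).sum()
print(f"proportion hesitant: {is_hesitant.mean():.3f}")
print(f"total log-likelihood: {total_log_lik:.1f}")
```

The positivity check on `delta` mirrors the constraint in the model: it fixes which component is the slower one, so the two components cannot swap labels.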

Parallel view: two distributions mixture-process

\[ \begin{align} \log(\text{iki}_{i}) \sim\text{ } &\ \theta_\text{textlocation[i]} \times \mathcal{N}(\mu_{2[i]}, \sigma_{e'_\text{textlocation[i]}}^2) + \\ &\ (1 - \theta_\text{textlocation[i]}) \times \mathcal{N}(\mu_1, \sigma_{e_\text{textlocation[i]}}^2)\\ \mu_1 = &\ \alpha\\ \mu_{2[i]} = &\ \alpha + \delta_\text{diff} \times \text{textlocation[i]}\\ \text{constraint:} &\ \delta_\text{diff} > 0 \end{align} \]

\[ \begin{align} \theta_\text{diff} = \theta_\text{textlocation[1]} - \theta_\text{textlocation[2]} \end{align} \]
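
One way to read the location-specific model is through the probability that a single interkey interval came from the hesitant component, given its duration and text location. The Python sketch below applies Bayes' rule to the two mixture components. It assumes that `textlocation[i]` acts as an index selecting location-specific values of the mixing proportion, slowdown and standard deviations; the function name and all numbers are hypothetical.

```python
import numpy as np

def normal_log_pdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def hesitation_probability(log_iki, location, alpha, delta,
                           sigma_fluent, sigma_hesitant, theta):
    """P(hesitant | log_iki, location) for each observation.

    delta, sigma_fluent, sigma_hesitant and theta are arrays with one
    entry per text location; location holds integer indices into them.
    """
    loc = np.asarray(location)
    log_fluent = np.log1p(-theta[loc]) + normal_log_pdf(
        log_iki, alpha, sigma_fluent[loc])
    log_hesitant = np.log(theta[loc]) + normal_log_pdf(
        log_iki, alpha + delta[loc], sigma_hesitant[loc])
    return np.exp(log_hesitant - np.logaddexp(log_fluent, log_hesitant))

# Hypothetical values; index 0 = within word, 1 = before word, 2 = before sentence.
theta = np.array([0.05, 0.30, 0.50])
delta = np.array([1.0, 1.0, 1.0])          # kept equal so only theta differs
sigma_fluent = np.array([0.3, 0.3, 0.3])
sigma_hesitant = np.array([0.6, 0.6, 0.6])

# The same 250 ms interval observed at each of the three locations.
location = np.array([0, 1, 2])
log_iki = np.log(np.array([250.0, 250.0, 250.0]))
prob = hesitation_probability(log_iki, location, 5.0, delta,
                              sigma_fluent, sigma_hesitant, theta)
print(np.round(prob, 3))
```

With everything except the mixing proportion held equal, the same interval is more likely to be classified as a hesitation at locations where hesitations are more common.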

How can we evaluate these models?

Six datasets with interkey intervals

Dataset	Source	Keylogger	Task	N	Age	Sample	Country	Language
C2L1	Rønneberg et al. (2022)	EyeWrite	Argumentative	126	12	6th graders	Norway	Norwegian
CATO	Torrance et al. (2016)	EyeWrite	Expository	52	17	Secondary school students (dyslexic, non dyslexic)	Norway	Norwegian
GE2	Ofstad Oxborough and Torrance (2011)	EyeWrite	Argumentative	45	19	Undergraduate students	UK	English
LIFT	Vandermeulen et al. (2020)	InputLog	Synthesis	658	17	Secondary school students	The Netherlands	Dutch
PLanTra	Rossetti and Van Waes (2022)	InputLog	Text simplification	47	23	Master students	Belgium	English (L2)
SPL2	Torrance, Roeser, and Chukharev (n.d.)	CyWrite	Argumentative	39	21	Undergraduate students	USA	English

Text location classifications

Location	Example
Within word	T\(^{\wedge}\)h\(^{\wedge}\)e c\(^{\wedge}\)a\(^{\wedge}\)t m\(^{\wedge}\)e\(^{\wedge}\)o\(^{\wedge}\)w\(^{\wedge}\)e\(^{\wedge}\)d. T\(^{\wedge}\)h\(^{\wedge}\)e\(^{\wedge}\)n i\(^{\wedge}\)t s\(^{\wedge}\)l\(^{\wedge}\)e\(^{\wedge}\)p\(^{\wedge}\)t.
Before word	The \(^{\wedge}\)cat \(^{\wedge}\)meowed. Then \(^{\wedge}\)it \(^{\wedge}\)slept.
Before sentence	The cat meowed. \(^{\wedge}\)Then it slept.
Note: Interkey intervals that terminated in a space or revision were removed.

Model overview

Models	Description
Serial
M1	Single distribution Gaussian
M2	Single distribution log-Gaussian
M3	Single distribution log-Gaussian with different variance components per text location
Parallel
M4	Two-distributions mixture of log-Gaussians

Implementation

Bayesian models were implemented in Stan (Carpenter et al. 2016) and run using rstan (Stan Development Team 2018) via \(R\).
Stan code was based on Vasishth et al. (2017; see also Roeser et al. 2024, 2025).
All models were implemented with random intercepts for participants.
Predictive performance was estimated with leave-one-out cross-validation as the sum of the expected log predictive density \(\widehat{elpd}\).
The difference between models \(\Delta\widehat{elpd}\) (Vehtari, Gelman, and Gabry 2015, 2017) was summarised as

\(\mid\frac{\Delta\widehat{elpd}}{\text{SE}_\text{diff}}\mid\),

i.e. the standardised change in predictive performance (Sivula et al. 2020).
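
As a rough illustration of this summary, the ratio can be computed from the pointwise contributions to \(\widehat{elpd}\) of two models. The sketch below assumes those pointwise values are already available (in practice they would come from the leave-one-out computation); the function name and the four example values are made up. The standard error of the difference is taken as the square root of the number of observations times the standard deviation of the pointwise differences.

```python
import numpy as np

def elpd_z_ratio(pointwise_a, pointwise_b):
    """Return (elpd difference, SE of the difference, absolute ratio).

    pointwise_a and pointwise_b hold one expected log predictive density
    value per observation, for the same observations in the same order.
    """
    diff = np.asarray(pointwise_a, dtype=float) - np.asarray(pointwise_b, dtype=float)
    elpd_diff = diff.sum()
    se_diff = np.sqrt(diff.size) * diff.std(ddof=1)
    return elpd_diff, se_diff, abs(elpd_diff / se_diff)

# Made-up pointwise values for two hypothetical models.
model_a = [-1.0, -1.2, -0.8, -1.1]
model_b = [-1.5, -1.3, -1.4, -1.2]
elpd_diff, se_diff, z_ratio = elpd_z_ratio(model_a, model_b)
print(f"elpd_diff = {elpd_diff:.2f}, SE = {se_diff:.2f}, |ratio| = {z_ratio:.2f}")
```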

Which model showed better performance?

Model comparisons

Values are the absolute ratio of the difference in predictive performance measured as \(\widehat{elpd}\) (Vehtari, Gelman, and Gabry 2015, 2017) and its standard error \(\mid\frac{\Delta\widehat{elpd}}{\text{SE}}\mid\) which corresponds to the \(z\)-score of the change in predictive performance (Sivula et al. 2020).
Data set	Mixture process (M3 vs M4)	Single process (unequal var.; M2 vs M3)	Single process (M1 vs M2)
C2L1	23	13	47
CATO	24	18	43
GE2	27	26	63
LIFT	40	21	46
PLanTra	26	17	64
SPL2	18	25	69

Model fit to data

How do text locations affect the mixture process?

Mixture distributions

Key location effects

Values indicate log BFs in support of the alternative hypothesis over the null hypothesis.
	Hesitation slowdown		Hesitation probability
Dataset	before sentence vs word	before vs within word	before sentence vs word	before vs within word
C2L1	0.14	-2.71	-0.39	46.46
CATO	18.6	-1.61	-0.41	21.97
GE2	33.55	3.29	18.31	23.16
LIFT	-0.69	-0.88	0.47	24.61
PLanTra	13.23	3.96	-1.27	28.82
SPL2	48.26	2.2	5.8	23.81

Planning text unfolds in parallel to production!

First robust evidence that written composition is a parallel process.
Writers do not necessarily pause at larger linguistic locations but plan utterances in parallel to writing.
Evidenced by
- stronger predictive performance for mixture models.
- pauses are not always more likely before sentences compared to words (but are often longer).
Pauses are consistently more likely at word-initial than at word-medial locations, but (1) they are not always longer, and (2) for sentence-initial interkey intervals, pausing behaviour varies with writing experience, language (e.g. young / L2 writers, students), and composition task.

Planning text unfolds in parallel to production!

Mixture models closely align with what we know about the mental process that underlies the generation of keystroke intervals.
Model parameters can be used to separate the writing process into fluent writing, the slowdown for hesitations, and the probability of hesitations.
Parameter estimates can be used to test hypotheses about factors that cause changes in hesitation patterns based on a principled theoretical / statistical framework.
Mixture models are useful for modelling data that are generated by more than one mental process (Gelman et al. 2014; Vasishth et al. 2017).

“Maybe mixture models are just always better?”

Simulation

Data were simulated from (i) a single log-normal distribution and (ii) a mixture of two log-normal distributions.
Both data sets were analysed with (i) a single-process model and (ii) a mixture model.
Data simulated with mixture process:
- Advantage for mixture model over single process model: \(\Delta\widehat{elpd} =\) -191.3 (16.5)
Data simulated with single process:
- Negligible difference between models: \(\Delta\widehat{elpd} =\) -0.5 (0.7)
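
The logic of this check can be sketched in Python with a deliberately simpler procedure than the one reported above: maximum-likelihood fits (EM for the mixture) and held-out log density on a random half of the data stand in for the Bayesian models and leave-one-out cross-validation. All names, parameter values and sample sizes are assumptions for illustration, and the numbers it prints are not comparable to the \(\Delta\widehat{elpd}\) values on this slide. Positive differences favour the mixture.

```python
import numpy as np

def normal_log_pdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def mixture_log_pdf(y, mu, sigma, theta):
    """Log density of (1 - theta) N(mu[0], sigma[0]^2) + theta N(mu[1], sigma[1]^2)."""
    return np.logaddexp(np.log1p(-theta) + normal_log_pdf(y, mu[0], sigma[0]),
                        np.log(theta) + normal_log_pdf(y, mu[1], sigma[1]))

def fit_mixture(y, n_iter=300):
    """EM for a two-component Gaussian mixture on the log scale."""
    low = y <= np.median(y)                      # initialise with a median split
    mu = np.array([y[low].mean(), y[~low].mean()])
    sigma = np.array([y[low].std(), y[~low].std()])
    theta = 0.5
    for _ in range(n_iter):
        # E-step: probability that each observation is from component 2.
        log_1 = np.log1p(-theta) + normal_log_pdf(y, mu[0], sigma[0])
        log_2 = np.log(theta) + normal_log_pdf(y, mu[1], sigma[1])
        resp = np.exp(log_2 - np.logaddexp(log_1, log_2))
        # M-step: weighted updates of mixing proportion, means and SDs.
        theta = np.clip(resp.mean(), 1e-6, 1 - 1e-6)
        w = np.stack([1 - resp, resp])
        mu = (w * y).sum(axis=1) / w.sum(axis=1)
        var = (w * (y - mu[:, None]) ** 2).sum(axis=1) / w.sum(axis=1)
        sigma = np.maximum(np.sqrt(var), 1e-3)   # guard against collapse
    return mu, sigma, theta

def compare(y, rng):
    """Held-out log-density difference (mixture minus single) and its SE."""
    idx = rng.permutation(y.size)
    train, test = y[idx[: y.size // 2]], y[idx[y.size // 2:]]
    mu, sigma, theta = fit_mixture(train)
    diff = (mixture_log_pdf(test, mu, sigma, theta)
            - normal_log_pdf(test, train.mean(), train.std()))
    return diff.sum(), np.sqrt(diff.size) * diff.std(ddof=1)

rng = np.random.default_rng(2025)
n = 4000
# (i) single process and (ii) mixture process, both on the log scale.
single_data = rng.normal(5.0, 0.5, n)
hesitant = rng.random(n) < 0.3
mixture_data = rng.normal(np.where(hesitant, 6.5, 5.0), np.where(hesitant, 0.6, 0.3))

diff_mixture_data, se_mixture_data = compare(mixture_data, rng)
diff_single_data, se_single_data = compare(single_data, rng)
print(f"mixture data: {diff_mixture_data:.1f} ({se_mixture_data:.1f})")
print(f"single data:  {diff_single_data:.1f} ({se_single_data:.1f})")
```

If mixture models were simply always better, the advantage would show up for both simulated data sets; the point of the check is that it should appear only when the data were generated by a mixture process.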

Thank you for listening!

Published in Roeser, J., Conijn, R., Chukharev, E., Ofstad, G. H., & Torrance, M. (2025). Typing in tandem: Language planning in multisentence text production is fundamentally parallel. Journal of Experimental Psychology: General, 154(7), 1824–1854.

For a tutorial on Bayesian mixed-effects mixture-model analysis for keystroke data see rpubs.com/jensroes/mixture-models-tutorial.

This work was supported by

US National Science Foundation 2016868: “ProWrite: Biometric feedback for improving college students’ writing processes.”
UKRI ESRC ES/W011832/1: “Can you use it in a sentence?: Establishing how word-production difficulties shape text formation.”

References

Baaijen, Veerle M., David Galbraith, and Kees De Glopper. 2012. “Keystroke Analysis: Reflections on Procedures and Measures.” Written Communication 29 (3): 246–77.

Carpenter, Bob, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Michael A. Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2016. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 20.

Conijn, Rianne, Jens Roeser, and Menno van Zaanen. 2019. “Understanding the Keystroke Log: The Effect of Writing Task on Keystroke Features.” Reading and Writing 32 (9): 2353–74.

Gelman, Andrew, J. B. Carlin, H. S. Stern, D. B. Dunson, Aki Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis. 3rd ed. Chapman and Hall/CRC.

Hayes, John R. 2012. “Evidence from Language Bursts, Revision, and Transcription for Translation and Its Relation to Other Writing Processes.” In Translation of Thought to Written Text While Composing: Advancing Theory, Knowledge, Methods, and Applications, edited by M. Fayol, D. Alamargot, and V. Berninger, 15–25. Psychology Press.

Kaufer, David S., John R. Hayes, and Linda S. Flower. 1986. “Composing Written Sentences.” Research in the Teaching of English, 121–40.

Medimorec, Srdan, and Evan F. Risko. 2017. “Pauses in Written Composition: On the Importance of Where Writers Pause.” Reading and Writing 30: 1267–85.

Nottbusch, Guido. 2010. “Grammatical Planning, Execution, and Control in Written Sentence Production.” Reading and Writing 23 (7): 777–801.

Ofstad Oxborough, Gunn Helen, and Mark Torrance. 2011. “Multilevel Analysis of Latency in Writing.” In 21st Annual Meeting of the Society for Text and Discourse. http://textanddiscourse2011.conference.univ-poitiers.fr/PROG_DEFIN.pdf.

Olive, Thierry. 2014. “Toward a Parallel and Cascading Model of the Writing System: A Review of Research on Writing Processes Coordination.” Journal of Writing Research 6 (2): 173–94.

Roeser, Jens, Rianne Conijn, Evgeny Chukharev, Gunn H. Ofstad, and Mark Torrance. 2025. “Typing in Tandem: Language Planning in Multisentence Text Production Is Fundamentally Parallel.” Journal of Experimental Psychology: General 154 (7): 1824–54. https://doi.org/10.1037/xge0001759.

Roeser, Jens, Sven De Maeyer, Mariëlle Leijten, and Luuk Van Waes. 2024. “Modelling Typing Disfluencies as Finite Mixture Process.” Reading and Writing 37 (2): 359–84.

Roeser, Jens, Mark Torrance, and Thom Baguley. 2019. “Advance Planning in Written and Spoken Sentence Production.” Journal of Experimental Psychology: Learning, Memory, and Cognition 45 (11): 1983–2009. https://doi.org/10.1037/xlm0000685.

Rønneberg, Vibeke, Mark Torrance, Per Henning Uppstad, and Christer Johansson. 2022. “The Process-Disruption Hypothesis: How Spelling and Typing Skill Affects Written Composition Process and Product.” Psychological Research 86 (7): 2239–55.

Rossetti, Alessandra, and Luuk Van Waes. 2022. “Text Simplification in Second Language: Process and Product Data.” Zenodo. https://doi.org/10.5281/zenodo.6720290.

Sivula, Tuomas, Måns Magnusson, Asael Alonzo Matamoros, and Aki Vehtari. 2020. “Uncertainty in Bayesian Leave-One-Out Cross-Validation Based Model Comparison.” arXiv Preprint arXiv:2008.10296.

Stan Development Team. 2018. “RStan: The R Interface to Stan.” https://mc-stan.org/.

Torrance, Mark, Jens Roeser, and Evgeny Chukharev. n.d. “Lookback Supports Cascaded, Just-in-Time Processing in Second Language Written Composition.”

Torrance, Mark, Vibeke Rønneberg, Christer Johansson, and Per Henning Uppstad. 2016. “Adolescent Weak Decoders Writing in a Shallow Orthography: Process and Product.” Scientific Studies of Reading 20 (5): 375–88.

Van Galen, Gerard P. 1991. “Handwriting: Issues for a Psychomotor Theory.” Human Movement Science 10 (2): 165–91.

Van Waes, Luuk, Mariëlle Leijten, Jens Roeser, Thierry Olive, and Joachim Grabowski. 2021. “Measuring and Assessing Typing Skills in Writing Research.” Journal of Writing Research 13 (1): 107–53. https://doi.org/10.17239/jowr-2021.13.01.04.

Vandermeulen, Nina, Sven De Maeyer, Elke Van Steendam, Marije Lesterhuis, Huub Van den Bergh, and Gert Rijlaarsdam. 2020. “Mapping Synthesis Writing in Various Levels of Dutch Upper-Secondary Education: A National Baseline Study on Text Quality, Writing Process and Students’ Perspectives on Writing.” Pedagogische Studiën: Tijdschrift Voor Onderwijskunde En Opvoedkunde 97 (3): 187–236.

Vasishth, Shravan, N. Chopin, R. Ryder, and Bruno Nicenboim. 2017. “Modelling Dependency Completion in Sentence Comprehension as a Bayesian Hierarchical Mixture Process: A Case Study Involving Chinese Relative Clauses.” ArXiv e-Prints.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2015. “Pareto Smoothed Importance Sampling.” arXiv Preprint arXiv:1507.02646.

———. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.