General stages of language planning

Classic (serial) view of writing

Parallel view of writing

Statistical models of typing key-interval data

Model | Description
Serial models
M1 | Single Gaussian distribution
M2 | Single log-Gaussian distribution
M3 | Single log-Gaussian distribution with separate variance components for each text location
Parallel model
M4 | Mixture of two log-Gaussian distributions
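Written out as likelihoods for a key interval \(y_i\) (in ms), the four models can be sketched as below. This is an assumed, simplified parameterization, not the exact specification used in the study: participant-level random effects and priors are omitted, text location is assumed to enter every model as a predictor of the mean, \(\ell[i]\) is a hypothetical index for the text location of interval \(i\) (e.g. within word, before word, before sentence), \(\theta_{\ell}\) is the probability of the long-pause component at location \(\ell\), and \(\delta > 0\) shifts that component towards longer intervals.

```latex
\begin{aligned}
\text{M1:}\quad & y_i \sim \mathcal{N}\!\left(\mu_{\ell[i]},\ \sigma^2\right) \\
\text{M2:}\quad & y_i \sim \operatorname{LogNormal}\!\left(\mu_{\ell[i]},\ \sigma^2\right) \\
\text{M3:}\quad & y_i \sim \operatorname{LogNormal}\!\left(\mu_{\ell[i]},\ \sigma^2_{\ell[i]}\right) \\
\text{M4:}\quad & y_i \sim \left(1-\theta_{\ell[i]}\right)\operatorname{LogNormal}\!\left(\mu_{\ell[i]},\ \sigma^2_{1}\right)
  + \theta_{\ell[i]}\,\operatorname{LogNormal}\!\left(\mu_{\ell[i]}+\delta,\ \sigma^2_{2}\right),
  \qquad \delta > 0
\end{aligned}
```

In this sketch, a value of \(\theta_{\ell}\) well below 1 before sentences would correspond to writers often starting a new sentence without pausing, a pattern that M1 to M3 have no separate component to express.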

Bayesian models were implemented in Stan (Carpenter et al., 2016) and fitted in R using rstan (Stan Development Team, 2018).
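The core of a two-component log-Gaussian mixture can be sketched in plain Python to show how it separates fluent key intervals from pauses. This is an illustration only, not the study's Stan code: the function names are hypothetical, random effects are omitted, and the parameter values (20% pauses, fluent intervals centred on 150 ms, a log-scale shift of 1.5 for the pause component) are made up for the example.

```python
import math


def lognormal_logpdf(y, mu, sigma):
    """Log density of a log-Gaussian at y > 0; mu and sigma are on the log scale."""
    z = (math.log(y) - mu) / sigma
    return -math.log(y * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def component_logs(y, theta, mu, delta, sigma_fluent, sigma_pause):
    """Weighted log densities of the fluent and pause components (theta = pause probability)."""
    log_fluent = math.log1p(-theta) + lognormal_logpdf(y, mu, sigma_fluent)
    log_pause = math.log(theta) + lognormal_logpdf(y, mu + delta, sigma_pause)
    return log_fluent, log_pause


def mixture_logpdf(y, theta, mu, delta, sigma_fluent, sigma_pause):
    """Log likelihood of one key interval under the two-component mixture."""
    return log_sum_exp(*component_logs(y, theta, mu, delta, sigma_fluent, sigma_pause))


def prob_pause(y, theta, mu, delta, sigma_fluent, sigma_pause):
    """Probability, given the parameters, that interval y came from the pause component."""
    log_fluent, log_pause = component_logs(y, theta, mu, delta, sigma_fluent, sigma_pause)
    return math.exp(log_pause - log_sum_exp(log_fluent, log_pause))


# Hypothetical parameters: 20% pauses, fluent intervals around 150 ms.
params = dict(theta=0.2, mu=math.log(150), delta=1.5, sigma_fluent=0.3, sigma_pause=0.6)
for y in (150, 400, 1500):
    print(y, round(prob_pause(y, **params), 3), round(mixture_logpdf(y, **params), 2))
```

Intervals near 150 ms are attributed almost entirely to the fluent component and very long ones to the pause component, while intermediate intervals receive a graded probability rather than a hard pause/no-pause label.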

Six datasets with key interval data

More than 5 million keystrokes produced by 967 participants of various demographics!

Dataset | Task | N | Age (years) | Sample | Language
Rønneberg et al. (2022) | Argumentative | 126 | 12 | 6th graders | Norwegian
Torrance et al. (2016) | Expository | 52 | 17 | Secondary school students (dyslexic and non-dyslexic) | Norwegian
Chukharev et al. (2025) | Argumentative | 39 | 21 | Undergraduate students | English
Rossetti & Van Waes (2022) | Text simplification | 47 | 23 | Master's students | English (L2)
Vandermeulen et al. (2020) | Synthesis | 658 | 17 | Secondary school students | Dutch
Ofstad Oxborough & Torrance (2011) | Argumentative | 45 | 19 | Undergraduate students | English

Text planning unfolds in parallel with production!

  • Leave-one-out cross-validation showed better predictive performance for the mixture model than for the single-distribution models in all six datasets.
  • Writers often do not pause before starting a new sentence, which the serial view cannot explain.
  • First robust evidence that written composition is a parallel process.
  • Similar effects across writer groups (e.g. young, L2 and student writers), languages, and composition tasks.
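The logic of the model comparison can be illustrated with a small simulation. This sketch swaps in a simpler technique than the study's: 5-fold cross-validation with maximum-likelihood fits (a closed-form single log-Gaussian versus a two-component mixture fitted by EM) instead of leave-one-out cross-validation on Bayesian posteriors. It works on log key intervals, so each log-Gaussian becomes a Gaussian; the Jacobian term is identical for both models and cancels in the comparison. All data are simulated and every parameter value and function name is hypothetical. The second scenario mimics a check for overfitting: on data from a single log-Gaussian, the mixture should not gain a comparable advantage.

```python
import math
import random


def normal_logpdf(x, mu, sd):
    """Gaussian log density; x is a log key interval, so this is a log-Gaussian on the log scale."""
    z = (x - mu) / sd
    return -math.log(sd * math.sqrt(2 * math.pi)) - 0.5 * z * z


def log_sum_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def fit_single(xs):
    """Maximum-likelihood mean and SD of a single (log-)Gaussian."""
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return mu, sd


def mixture_loglik(x, params):
    """Log likelihood of x under a fitted two-component mixture."""
    w, mu1, sd1, mu2, sd2 = params
    return log_sum_exp(math.log(w) + normal_logpdf(x, mu1, sd1),
                       math.log1p(-w) + normal_logpdf(x, mu2, sd2))


def fit_mixture(xs, n_iter=150, min_sd=1e-3):
    """EM for a two-component mixture, initialised by splitting the data at the median."""
    s = sorted(xs)
    (mu1, sd1), (mu2, sd2) = fit_single(s[: len(s) // 2]), fit_single(s[len(s) // 2:])
    w = 0.5
    for _ in range(n_iter):
        # E-step: responsibility of component 1 (short intervals) for each x.
        r = []
        for x in xs:
            a = math.log(w) + normal_logpdf(x, mu1, sd1)
            b = math.log1p(-w) + normal_logpdf(x, mu2, sd2)
            r.append(math.exp(a - log_sum_exp(a, b)))
        # M-step: weighted mixing weight, means and SDs (guarded against collapse).
        n1 = max(sum(r), 1e-9)
        n2 = max(len(xs) - n1, 1e-9)
        w = min(max(n1 / len(xs), 1e-6), 1 - 1e-6)
        mu1 = sum(ri * x for ri, x in zip(r, xs)) / n1
        mu2 = sum((1 - ri) * x for ri, x in zip(r, xs)) / n2
        sd1 = max(min_sd, math.sqrt(sum(ri * (x - mu1) ** 2 for ri, x in zip(r, xs)) / n1))
        sd2 = max(min_sd, math.sqrt(sum((1 - ri) * (x - mu2) ** 2 for ri, x in zip(r, xs)) / n2))
    return w, mu1, sd1, mu2, sd2


def kfold_loglik(xs, k=5, seed=1):
    """Summed held-out log likelihood of the single and the mixture model."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    single_total = mixture_total = 0.0
    for fold in range(k):
        held_idx = set(idx[fold::k])
        train = [xs[i] for i in range(len(xs)) if i not in held_idx]
        held = [xs[i] for i in held_idx]
        mu, sd = fit_single(train)
        params = fit_mixture(train)
        single_total += sum(normal_logpdf(x, mu, sd) for x in held)
        mixture_total += sum(mixture_loglik(x, params) for x in held)
    return single_total, mixture_total


rng = random.Random(2024)
# Hypothetical log key intervals (log ms): 70% fluent around 150 ms, 30% pauses about 4.5x longer.
mix_data = [rng.gauss(math.log(150), 0.25) if rng.random() < 0.7
            else rng.gauss(math.log(150) + 1.5, 0.5) for _ in range(1000)]
# Overfitting check: a single log-Gaussian with no pause component.
single_data = [rng.gauss(math.log(200), 0.5) for _ in range(1000)]

single_mix, mixture_mix = kfold_loglik(mix_data)
single_null, mixture_null = kfold_loglik(single_data)
print(f"mixture data: mixture - single = {mixture_mix - single_mix:.1f}")
print(f"single data:  mixture - single = {mixture_null - single_null:.1f}")
```

On the simulated mixture data the mixture should win clearly in held-out log likelihood; on the single-distribution data its advantage should be negligible or negative, which is the pattern a check for overfitting looks for.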

REF criteria

Originality: First empirical evidence to challenge serial models of multi-sentence text production, using an implementation of a parallel account in Bayesian mixed-effects mixture models.

Rigour: Multi-dataset analysis (more than 5 million keystrokes produced by 967 participants of various demographics), formal Bayesian model comparison using leave-one-out cross-validation, simulation checks to rule out overfitting, and transparent reporting of code and data (online tutorial).

Significance: Challenges a dominant theoretical assumption in writing research, with implications for cognitive models of language production and for the interpretation of process data across diverse writing contexts.

For reflection on our other REF 4* papers see rpubs.com/jensroes/ref-overview.

For a tutorial on Bayesian mixed-effects mixture-model analysis for keystroke data see rpubs.com/jensroes/mixture-models-tutorial.

References

Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., & Riddell, A. (2016). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32.

Chukharev, E., Roeser, J., & Torrance, M. (2025). Lookback supports semi-parallel, just-in-time processing in second language written composition. PLOS ONE, 20(11), 1–18. https://doi.org/10.1371/journal.pone.0334960

Ofstad Oxborough, G. H., & Torrance, M. (2011). Multilevel analysis of latency in writing. 21st Annual Meeting of the Society for Text and Discourse. http://textanddiscourse2011.conference.univ-poitiers.fr/PROG_DEFIN.pdf

Rønneberg, V., Torrance, M., Uppstad, P. H., & Johansson, C. (2022). The process-disruption hypothesis: How spelling and typing skill affects written composition process and product. Psychological Research, 86(7), 2239–2255.

Rossetti, A., & Van Waes, L. (2022). Text simplification in second language: Process and product data [Dataset]. Zenodo. https://doi.org/10.5281/zenodo.6720290

Stan Development Team. (2018). RStan: The R interface to Stan. https://mc-stan.org/

Torrance, M., Rønneberg, V., Johansson, C., & Uppstad, P. H. (2016). Adolescent weak decoders writing in a shallow orthography: Process and product. Scientific Studies of Reading, 20(5), 375–388.

Vandermeulen, N., De Maeyer, S., Van Steendam, E., Lesterhuis, M., Van den Bergh, H., & Rijlaarsdam, G. (2020). Mapping synthesis writing in various levels of Dutch upper-secondary education: A national baseline study on text quality, writing process and students’ perspectives on writing. Pedagogische Studiën: Tijdschrift Voor Onderwijskunde En Opvoedkunde, 97(3), 187–236.