Analysis: parallel planning in written naming

1 Data analysis

Keystroke transition duration data were analysed in Bayesian hierarchical mixture models as described in Roeser et al. (2021) using the R package brms (Bürkner, 2017b, 2017a) and the probabilistic programming language Stan (Carpenter et al., 2016; Hoffman & Gelman, 2014). Mixture models allow us to model the writing data as a mixture of two processes: (1) a smooth flow of activation from higher levels of activation into finger movements, restricted merely by the participant's ability to move their fingers on the keyboard, and (2) key transitions that were inhibited by delays on upstream levels, for example, during the retrieval of a lexical name. Rather than imposing threshold values to distinguish between simple and more demanding events, mixture models treat the data as a combination of two processes and use a mixing weight to capture the probability that processing difficulty occurs. Random by-participant intercepts were modelled for the pausing probability (i.e. mixing proportion) and the duration of fluent key transitions. These random effects allow the model to capture individual differences in typing style (typing speed and pausing frequency). In addition, random by-image-set intercepts were included to model differences between pictorial stimuli. A detailed description and a tutorial for Bayesian hierarchical mixed-effects mixture models for keystroke data can be found in Roeser et al. (2021); see also Hall et al. (2022), Baaijen et al. (2012), and Almond et al. (2012).

We report the most probable posterior (i.e. inferred) parameter value together with the 95% probability interval (henceforth, PI), that is, the interval that contains the posterior parameter value with a probability of 95%. We also quantified the statistical support for the effects of interest, that is, the support for the alternative hypothesis over the null hypothesis. This evidence was expressed as Bayes Factors (henceforth, BF) calculated using the Savage-Dickey method (see, e.g., Dickey et al., 1970; Wagenmakers et al., 2010). A BF larger than 5 indicates moderate evidence, and a BF larger than 10 strong evidence, for a statistically meaningful effect compared to the null hypothesis (see, e.g., Jeffreys, 1961; Lee & Wagenmakers, 2014). For example, BF = 2 indicates that the data are two times more likely under the alternative hypothesis than under the null hypothesis. Priors for all effects were weakly informative. Because BFs are sensitive to the distribution of the prior, we chose weakly informative priors for the slope parameters that favour the null hypothesis over the alternative hypothesis. Thus, our priors do not favour the alternative hypothesis.
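The Savage-Dickey method expresses the BF as the ratio of the prior density to the posterior density at the null value (here 0). The following Python sketch illustrates the idea under two assumptions that are not taken from the analysis itself: the prior is Normal(0, 1), and the posterior density is approximated by a normal distribution fitted to the posterior draws. The function names and the simulated draws are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf10(posterior_draws, prior_sd=1.0, null_value=0.0):
    """BF10 = prior density at the null value / posterior density at the null value.

    Assumes a Normal(0, prior_sd) prior and a normal approximation of the
    posterior; both are illustrative assumptions.
    """
    post_mean = statistics.fmean(posterior_draws)
    post_sd = statistics.stdev(posterior_draws)
    prior_at_null = normal_pdf(null_value, 0.0, prior_sd)
    posterior_at_null = normal_pdf(null_value, post_mean, post_sd)
    return prior_at_null / posterior_at_null


# Simulated (not real) posterior draws for a slope located away from 0.
rng = random.Random(1)
draws = [rng.gauss(0.8, 0.3) for _ in range(4000)]
bf10 = savage_dickey_bf10(draws)
print(f"BF10 = {bf10:.1f}")
```

A BF above 1 means that the posterior has moved density away from the null value relative to the prior; a BF below 1 means that the data have made the null value more plausible than it was a priori.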

Data were analysed in Bayesian mixed effects models (Gelman et al., 2014; McElreath, 2016). The R (R Core Team, 2020) package rstan (Stan Development Team, n.d.) was used to interface with the probabilistic programming language Stan (Carpenter et al., 2016), in which all models were implemented. Models were fitted with weakly informative priors (see McElreath, 2016) and run with at least 10,000 iterations on 3 chains, with a warm-up of 5,000 iterations and no thinning. Model convergence was confirmed by the Gelman-Rubin statistic (\(\hat{R}\) = 1) (Gelman & Rubin, 1992) and by inspection of the Markov chain Monte Carlo chains.
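To make the convergence criterion concrete, the sketch below computes \(\hat{R}\) in its original form (Gelman & Rubin, 1992) by comparing between-chain and within-chain variance for a single parameter. It is an illustration only: current Stan interfaces report refined split-chain variants of this statistic, and the simulated chains and the function name are hypothetical.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor for equally long chains of one parameter."""
    n = len(chains[0])  # number of post-warm-up draws per chain
    chain_means = [statistics.fmean(chain) for chain in chains]
    chain_vars = [statistics.variance(chain) for chain in chains]
    within = statistics.fmean(chain_vars)            # W: mean within-chain variance
    between = n * statistics.variance(chain_means)   # B: between-chain variance
    pooled = (n - 1) / n * within + between / n      # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(1)

# Three chains with 5,000 draws each that sample the same distribution.
good = [[rng.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(3)]
rhat_good = gelman_rubin(good)

# Three chains of which one is stuck in a different region.
bad = [[rng.gauss(shift, 1.0) for _ in range(5000)] for shift in (0.0, 0.0, 3.0)]
rhat_bad = gelman_rubin(bad)

print(f"converged chains: R-hat = {rhat_good:.3f}")
print(f"diverging chains: R-hat = {rhat_bad:.3f}")
```

Values close to 1 indicate that the chains agree with each other; values clearly above 1 indicate that at least one chain explores a different region of the parameter space.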

2 Mixture model details

The mixture model was implemented in the probabilistic programming language Stan. The written naming process that results in inter-keystroke intervals (iki) was constrained to be a mixture of two log-normal distributions of which one distribution represents fluent transitions and the other disfluencies. This model can be summarised as in Equation 2.1.

\[ \begin{align}\tag{2.1} \text{iki}_{ij} \sim \ & \theta_\text{verbtype[i], location[i], position[i]} \cdot \text{logN}(\beta_\text{position} + \delta_\text{location} + u_i + w_j, \sigma_\text{location}') \ +\\ & (1-\theta_\text{verbtype[i], location[i], position[i]}) \cdot \text{logN}(\beta_\text{position} + u_i + w_j, \sigma_\text{location})\\ \text{constraints: } & \delta > 0, \sigma' > \sigma\\ \end{align} \]

The positively constrained parameter \(\delta\) ensures that the distribution on the first line of the equation is located at longer durations than the distribution on the second line: both distributions share the location \(\beta\), but the first is incremented by a lag \(\delta\) between subsequent keystrokes. Therefore, \(\beta\) captures the average transition duration of fluent keystroke pairs, separately for each word position of the keystroke in the sentence, and \(\delta\) captures the slowdown for long keystroke pairs in addition to \(\beta\). The relative weighting of these two distributions is captured by the mixing proportion \(\theta\), which is parameterised so that it captures the relative weight of keystrokes that are disfluent and can, therefore, be understood as the pausing probability (Roeser et al., 2021). This is achieved by multiplying the distribution of slow key transitions by \(\theta\) and the distribution of fluent transitions by the complement \(1-\theta\).
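The density in Equation 2.1 can be sketched in a few lines of Python. The sketch omits the random effects \(u_i\) and \(w_j\) and uses made-up parameter values; it is not the Stan implementation used for the analysis, and all function names are hypothetical. It also shows how the two components translate into the probability that a single interval belongs to the disfluent component.

```python
import math


def lognormal_pdf(x, mu, sigma):
    """Density of a log-normal distribution with log-scale location mu and scale sigma."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(iki, theta, beta, delta, sigma, sigma_prime):
    """Two-component density of Equation 2.1 for one interval (random effects omitted)."""
    if not (0.0 <= theta <= 1.0 and delta > 0.0 and sigma_prime > sigma):
        raise ValueError("constraints violated: 0 <= theta <= 1, delta > 0, sigma' > sigma")
    disfluent = theta * lognormal_pdf(iki, beta + delta, sigma_prime)
    fluent = (1.0 - theta) * lognormal_pdf(iki, beta, sigma)
    return disfluent + fluent


def disfluency_responsibility(iki, theta, beta, delta, sigma, sigma_prime):
    """Probability that an observed interval was generated by the disfluent component."""
    disfluent = theta * lognormal_pdf(iki, beta + delta, sigma_prime)
    return disfluent / mixture_pdf(iki, theta, beta, delta, sigma, sigma_prime)


# Hypothetical parameter values on the log-msecs scale.
params = dict(theta=0.25, beta=math.log(230.0), delta=1.0, sigma=0.3, sigma_prime=0.6)

for iki in (150.0, 400.0, 1500.0):
    prob = disfluency_responsibility(iki, **params)
    print(f"iki = {iki:6.0f} msecs -> P(disfluent) = {prob:.3f}")
```

No duration threshold is involved: every interval receives a probability of being disfluent, and this probability increases smoothly as intervals become longer than typical fluent transitions.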

The slowdown for disfluent transitions was allowed to vary across transition locations. Transition location in this study was restricted to two locations: before-word transitions (the latency between the space preceding a word and the first letter of that word) and within-word transitions (the latencies between all key bigrams that form a word, that is, after the preceding space and before the subsequent space). [There needs to be an argument and discussion why the sentence initial keystroke was not included in the data collection.] These transition locations were reported in previous research to be psycholinguistically meaningful (Chukharev-Hudilainen et al., 2019; De Smet et al., 2018; Torrance et al., 2016). The two distributions were assumed to have unequal variances in two regards. First, disfluent key transitions have a larger variability than fluent key transitions, which is achieved by constraining \(\sigma'\) to be larger than \(\sigma\). Second, each transition location (location) has its own variance component.

The theoretically most interesting model parameter is the mixing proportion \(\theta\), which was parameterised to reflect the disfluency probability. The disfluency probability was allowed to vary for every combination of factor levels of transition location (levels: within words, before words), verbtype (levels: intransitive, transitive, ditransitive), and position, which is the word position in the utterance (levels: first determiner [DET1]¹, first noun [N1], verb, second determiner [DET2], second noun [N2], third determiner [DET3], third noun [N3]). From the posterior of the model we then calculated main effects and interactions as well as planned pairwise comparisons between verb types and estimated marginal cell means.

The model is a mixed effects model because we included random effects for participants and image sets. For participants, we allowed the average typing speed to vary: we assumed that some participants type more slowly than the average \(\beta\) and others type faster. This is captured by the parameter \(u_i\), where \(i\) indexes participants. The difference between the average typing speed and the typing speed of each participant was assumed to be distributed according to \(u \sim \text{N}(0, \sigma_u)\). We also assumed that the probability of long latencies varies across participants, hence the index \(i\) for \(\theta\). This captures the idea that people vary in their writing styles, that is, in how often they pause to plan upcoming utterances and how often they plan in parallel to production. Similarly to the random intercepts for participants, we allowed the average typing speed to vary across image sets. We assumed that some image sets are easier and others harder to name than average, leading to typing speed variability centred around the \(\beta\) parameter. This is captured by the parameter \(w_j\), where \(j\) indexes image sets. The difference between the average typing speed and the typing speed for each image set was assumed to be distributed according to \(w \sim \text{N}(0, \sigma_w)\).
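The role of the random intercepts can be illustrated by simulating data from the generative process described above. The following Python sketch draws one \(u_i\) per participant and one \(w_j\) per image set and then samples one interval for each combination. All parameter values are made up, the function name is hypothetical, and for simplicity the disfluency probability is held constant across participants, which is a simplification of the model described in the text.

```python
import math
import random
import statistics


def simulate_ikis(n_participants=20, n_image_sets=10, seed=1):
    """Simulate one inter-keystroke interval per participant and image set."""
    rng = random.Random(seed)
    beta, delta = math.log(230.0), 1.0   # fluent location and slowdown (log msecs)
    sigma, sigma_prime = 0.3, 0.6        # fluent and disfluent scale
    sigma_u, sigma_w = 0.2, 0.1          # sd of the random intercepts
    theta = 0.25                         # disfluency probability (held constant here)

    u = [rng.gauss(0.0, sigma_u) for _ in range(n_participants)]  # participants
    w = [rng.gauss(0.0, sigma_w) for _ in range(n_image_sets)]    # image sets

    rows = []
    for i in range(n_participants):
        for j in range(n_image_sets):
            disfluent = rng.random() < theta
            mu = beta + u[i] + w[j] + (delta if disfluent else 0.0)
            scale = sigma_prime if disfluent else sigma
            rows.append((i, j, disfluent, rng.lognormvariate(mu, scale)))
    return rows


rows = simulate_ikis()
fluent_mean = statistics.fmean(iki for _, _, d, iki in rows if not d)
disfluent_mean = statistics.fmean(iki for _, _, d, iki in rows if d)
print(f"mean fluent iki:    {fluent_mean:.0f} msecs")
print(f"mean disfluent iki: {disfluent_mean:.0f} msecs")
```

Because \(u_i\) and \(w_j\) enter both components, a slow typist or a difficult image set shifts fluent and disfluent transitions alike, whereas \(\delta\) separates the two components within each participant and image set.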

3 Results

3.1 Sample

TODO

number of participants in final sample with number of trials
number of trials by verb type
number of verbs by verb type
removed data

3.2 Raw data visualisation

Average keystroke intervals are shown in Figure 3.1 by transition location, sentence position and verb type.

Figure 3.1: Summary plots of mean inter-keystroke intervals with error bars of 1.96 standard errors (i.e. 95% confidence intervals) by transition location, position, and verb type.

3.3 Mixture model results

The parameter estimates of the hierarchical Bayesian mixture model are summarised in Table 3.1. For ease of interpretation we transformed the fluent typing estimate \(\beta\) and the disfluency slowdown \(\delta\) to the linear msecs scale (rather than displaying log msecs), and the disfluency probability to the probability scale (rather than displaying logits).

Table 3.1: Mixture model parameter estimates with 95% PI. \(\beta\) and \(\delta\) are shown in msecs and \(\theta\) on the probability scale.
			Transition location
Parameter	Verb type	Position	overall	before words	within words
\(\beta\)	–	DET1	233 [218, 248]	–	–
\(\beta\)	–	N1	235 [221, 249]	–	–
\(\beta\)	–	verb	229 [215, 244]	–	–
\(\beta\)	–	DET2	189 [177, 202]	–	–
\(\beta\)	–	N2	234 [220, 249]	–	–
\(\beta\)	–	DET3	194 [181, 208]	–	–
\(\beta\)	–	N3	233 [218, 249]	–	–
\(\delta\)	–	–	–	428 [371, 492]	152 [122, 190]
\(\theta\)	intransitive	DET1	–	–	.23 [.16, .30]
\(\theta\)	intransitive	N1	–	.57 [.50, .65]	.26 [.22, .31]
\(\theta\)	intransitive	verb	–	.72 [.64, .79]	.23 [.19, .27]
\(\theta\)	transitive	DET1	–	–	.21 [.14, .30]
\(\theta\)	transitive	N1	–	.59 [.50, .69]	.22 [.17, .27]
\(\theta\)	transitive	verb	–	.85 [.77, .93]	.27 [.22, .32]
\(\theta\)	transitive	DET2	–	.81 [.72, .90]	.10 [.05, .16]
\(\theta\)	transitive	N2	–	.58 [.50, .67]	.23 [.18, .28]
\(\theta\)	ditransitive	DET1	–	–	.16 [.09, .24]
\(\theta\)	ditransitive	N1	–	.64 [.55, .73]	.23 [.18, .28]
\(\theta\)	ditransitive	verb	–	.85 [.77, .93]	.16 [.12, .20]
\(\theta\)	ditransitive	DET2	–	.81 [.72, .89]	.10 [.06, .15]
\(\theta\)	ditransitive	N2	–	.60 [.51, .69]	.22 [.18, .27]
\(\theta\)	ditransitive	DET3	–	.34 [.26, .43]	.08 [.04, .14]
\(\theta\)	ditransitive	N3	–	.71 [.62, .80]	.22 [.17, .27]
\(\sigma\)	–	–	0.54 [0.53, 0.55]	–	–
\(\sigma_\text{diff}\)	–	–	0.22 [0.18, 0.28]	–	–
\(\sigma_u\)	–	–	0.21 [0.2, 0.23]	–	–
\(\sigma_w\)	–	–	0.12 [0.1, 0.14]	–	–

Figure 3.2 summarises the disfluency probability by location in the sentence, transition type and verb type.

Figure 3.2: Posterior estimates of disfluency probability.

Pairwise differences between verb types in the probability of disfluent transitions are summarised in Table 3.2 with 95% probability intervals and Bayes Factors indicating the evidence in support of the alternative hypothesis. Two effects stand out: (1) the disfluency probability before transitive verbs was substantially higher than before intransitive verbs, suggesting that the object was more likely to be planned along with transitive verbs; (2) the disfluency probability within transitive verbs was substantially higher than within ditransitive verbs, suggesting that for ditransitive verbs objects were less likely to be planned along with the verb, potentially because for transitive but not for ditransitive verbs it is unambiguous which object will immediately follow the verb.

Table 3.2: Differences for disfluency probability (on logit scale) with 95% PI and Bayes Factors (BF) indicating the evidence in favour of the alternative hypothesis over the null hypothesis.
	Transition location
Position	before words	within words
Comparison: ditransitive vs transitive
DET1	–	-0.39 [-1.1, 0.31], BF = 1.27
N1	0.21 [-0.31, 0.74], BF = 0.71	0.04 [-0.28, 0.37], BF = 0.35
verb	0 [-0.98, 0.93], BF = 0.91	-0.68 [-1.01, -0.35], BF = 588.6
DET2	-0.03 [-0.87, 0.77], BF = 0.78	-0.04 [-0.8, 0.75], BF = 0.78
N2	0.06 [-0.44, 0.55], BF = 0.52	-0.04 [-0.36, 0.3], BF = 0.34
Comparison: transitive vs intransitive
DET1	–	-0.08 [-0.67, 0.5], BF = 0.61
N1	0.09 [-0.38, 0.57], BF = 0.51	-0.24 [-0.53, 0.04], BF = 1.17
verb	0.86 [0.16, 1.76], BF = 14.27	0.22 [-0.06, 0.51], BF = 0.9
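Because Table 3.2 reports differences on the logit scale whereas Table 3.1 reports probabilities, it can help to convert between the two. The Python sketch below takes the point estimates for the verb position from Table 3.1 and computes their difference in logits. The results (about 0.79 before the verb and about -0.66 within the verb) are close to, but not identical with, the values in Table 3.2 (0.86 and -0.68); this is expected if the latter summarise the posterior distribution of the difference rather than the difference of two point estimates. The variable names are hypothetical.

```python
import math


def logit(p):
    """Map a probability to the log-odds scale."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Map log-odds back to the probability scale."""
    return 1.0 / (1.0 + math.exp(-x))


# Point estimates of the disfluency probability at the verb (Table 3.1).
before_verb = {"intransitive": 0.72, "transitive": 0.85}
within_verb = {"transitive": 0.27, "ditransitive": 0.16}

# Transitive vs intransitive, before-word transition.
diff_before = logit(before_verb["transitive"]) - logit(before_verb["intransitive"])
# Ditransitive vs transitive, within-word transitions.
diff_within = logit(within_verb["ditransitive"]) - logit(within_verb["transitive"])

print(f"before verb, transitive vs intransitive: {diff_before:.2f} logits")
print(f"within verb, ditransitive vs transitive: {diff_within:.2f} logits")
```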

3.4 Model comparison

For model comparisons we used out-of-sample predictions estimated using Pareto smoothed importance-sampling leave-one-out cross-validation (Vehtari et al., 2015, 2017). Predictive performance was estimated as the sum of the expected log pointwise predictive density (\(\widehat{elpd}\)) and the difference \(\Delta\widehat{elpd}\) between models. The advantage of using leave-one-out cross-validation is that models with more parameters are penalised to prevent overfitting. We fitted three models in which we incrementally increased the number of factors accounted for by the mixing proportion.

In the simplest model, the mixing proportion distinguished between word positions only: in other words, the pausing probability varied only as a function of the position of the word in the sentence. In the second model, the mixing proportion was allowed to vary across all combinations of levels of position and transition location, so that the pausing probability could depend both on the position of the word in the sentence and on whether the transition was within or before a word. Finally, the third model also distinguished between verb types (levels: intransitive, transitive, ditransitive), allowing pausing probabilities to change depending on the verb type used in the sentence.

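Results can be found in Table 3.3. Predictive performance was substantially increased by including transition location: the model with position only performed clearly worse than the best model (\(\Delta\widehat{elpd}\) = -312, SE = 24). Adding verbtype to the mixing proportion yielded only a small increase in predictive performance (\(\Delta\widehat{elpd}\) = -3, SE = 6).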

Table 3.3: Model comparisons. Predictive performance was indicated as expected log pointwise predictive density (\(\widehat{elpd}\)). The top row shows the model with the highest predictive performance, i.e. the highest \(\widehat{elpd}\); the differences \(\Delta\widehat{elpd}\) are relative to the model with the highest predictive performance.
Model	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)
Mixing proportion by position, location, verbtype	–	-116128 (188)
Mixing proportion by position, location	-3 (6)	-116131 (188)
Mixing proportion by position	-312 (24)	-116440 (192)
Note:
Standard errors are shown in parentheses.
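One way to read Table 3.3 is to relate each difference in \(\widehat{elpd}\) to its standard error. The Python sketch below does this for the two comparisons in the table. The cut-off of two standard errors is a common rule of thumb and an assumption of this sketch, not a criterion stated in the analysis; the function name is hypothetical.

```python
# (model, difference in elpd relative to the best model, standard error), Table 3.3
comparisons = [
    ("position, location", -3.0, 6.0),
    ("position", -312.0, 24.0),
]


def elpd_ratio(delta_elpd, se):
    """Size of the elpd difference expressed in standard errors."""
    return abs(delta_elpd) / se


ratios = {}
for label, delta_elpd, se in comparisons:
    ratios[label] = elpd_ratio(delta_elpd, se)
    # Rule of thumb (assumption): differences below 2 SE are not clearly distinguishable.
    verdict = "clearly worse" if ratios[label] > 2.0 else "not clearly distinguishable"
    print(f"{label}: {ratios[label]:.1f} SE from the best model ({verdict})")
```

On this reading, dropping transition location costs about 13 standard errors, whereas dropping verbtype costs half a standard error.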

4 Session info

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so;  LAPACK version 3.7.1

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] modelr_0.1.11    gtable_0.3.0     gridExtra_2.3    rmdformats_1.0.4
 [5] patchwork_1.2.0  polspline_1.1.24 brms_2.20.4      Rcpp_1.0.12     
 [9] kableExtra_1.3.4 lubridate_1.9.3  forcats_1.0.0    stringr_1.5.1   
[13] dplyr_1.1.4      purrr_1.0.2      readr_2.1.5      tidyr_1.3.0     
[17] tibble_3.2.1     ggplot2_3.4.4    tidyverse_2.0.0 

loaded via a namespace (and not attached):
  [1] inline_0.3.19        rlang_1.1.3          magrittr_2.0.3      
  [4] matrixStats_1.2.0    compiler_4.3.2       loo_2.6.0           
  [7] systemfonts_1.0.5    vctrs_0.6.5          reshape2_1.4.4      
 [10] rvest_1.0.3          crayon_1.3.4         pkgconfig_2.0.3     
 [13] fastmap_1.1.1        backports_1.1.7      ellipsis_0.3.1      
 [16] labeling_0.3         utf8_1.1.4           threejs_0.3.3       
 [19] promises_1.2.1       rmarkdown_2.25       markdown_1.1        
 [22] tzdb_0.4.0           bit_4.0.5            xfun_0.41           
 [25] cachem_1.0.8         jsonlite_1.8.8       highr_0.8           
 [28] later_1.3.2          tweenr_2.0.2         broom_1.0.5         
 [31] parallel_4.3.2       R6_2.4.1             dygraphs_1.1.1.6    
 [34] StanHeaders_2.32.5   bslib_0.6.1          stringi_1.8.3       
 [37] estimability_1.4.1   jquerylib_0.1.4      bookdown_0.37       
 [40] rstan_2.32.5         knitr_1.45           zoo_1.8-12          
 [43] base64enc_0.1-3      bayesplot_1.10.0     httpuv_1.6.13       
 [46] Matrix_1.6-3         igraph_1.6.0         timechange_0.2.0    
 [49] tidyselect_1.2.0     rstudioapi_0.15.0    abind_1.4-5         
 [52] yaml_2.2.1           codetools_0.2-19     miniUI_0.1.1.1      
 [55] pkgbuild_1.4.3       lattice_0.22-5       plyr_1.8.6          
 [58] shiny_1.8.0          withr_2.5.2          bridgesampling_1.1-2
 [61] posterior_1.5.0      coda_0.19-4          evaluate_0.23       
 [64] polyclip_1.10-6      RcppParallel_5.1.7   xts_0.13.1          
 [67] xml2_1.3.6           pillar_1.9.0         tensorA_0.36.2.1    
 [70] stats4_4.3.2         checkmate_2.3.1      DT_0.31             
 [73] shinyjs_2.1.0        distributional_0.3.2 generics_0.1.3      
 [76] vroom_1.6.5          hms_1.1.3            rstantools_2.3.1.1  
 [79] munsell_0.5.0        scales_1.3.0         gtools_3.9.5        
 [82] xtable_1.8-4         glue_1.7.0           emmeans_1.9.0       
 [85] tools_4.3.2          shinystan_2.6.0      colourpicker_1.3.0  
 [88] webshot_0.5.5        mvtnorm_1.2-4        QuickJSR_1.0.9      
 [91] crosstalk_1.2.1      colorspace_1.4-1     nlme_3.1-163        
 [94] ggforce_0.4.2        cli_3.6.2            fansi_0.4.1         
 [97] ggthemes_5.0.0       viridisLite_0.4.2    svglite_2.1.3       
[100] Brobdingnag_1.2-9    sass_0.4.8           digest_0.6.25       
[103] htmlwidgets_1.6.4    farver_2.0.3         htmltools_0.5.7     
[106] lifecycle_1.0.4      httr_1.4.7           mime_0.9            
[109] MASS_7.3-60          bit64_4.0.5          shinythemes_1.2.0

References

Almond, R., Deane, P., Quinlan, T., Wagner, M., & Sydorenko, T. (2012). A preliminary analysis of keystroke log data from a timed writing task. ETS Research Report Series, 2012(2), i–61.

Baaijen, V. M., Galbraith, D., & De Glopper, K. (2012). Keystroke analysis: Reflections on procedures and measures. Written Communication, 29(3), 246–277.

Bürkner, P.-C. (2017a). Advanced Bayesian multilevel modeling with the R package brms. arXiv Preprint arXiv:1705.11123.

Bürkner, P.-C. (2017b). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01

Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., & Riddell, A. (2016). Stan: A probabilistic programming language. Journal of Statistical Software, 20.

Chukharev-Hudilainen, E., Saricaoglu, A., Torrance, M., & Feng, H.-H. (2019). Combined deployable keystroke logging and eyetracking for investigating L2 writing fluency. Studies in Second Language Acquisition, 41(3), 583–604.

De Smet, M. J. R., Leijten, M., & Van Waes, L. (2018). Exploring the process of reading during writing using eye tracking and keystroke logging. Written Communication, 35(4), 411–447.

Dickey, J. M., Lientz, B. P., et al. (1970). The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. The Annals of Mathematical Statistics, 41(1), 214–226.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

Hall, S., Baaijen, V. M., & Galbraith, D. (2022). Constructing theoretically informed measures of pause duration in experimentally manipulated writing. Reading and Writing, 1–29.

Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.

Jeffreys, H. (1961). The theory of probability (Vol. 3). Oxford University Press, Clarendon Press.

Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.

McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Roeser, J., De Maeyer, S., Leijten, M., & Van Waes, L. (2021). Modelling typing disfluencies as finite mixture process. Reading and Writing, 1–26.

Stan Development Team. (n.d.). RStan: The R interface to Stan. https://mc-stan.org/

Torrance, M., Rønneberg, V., Johansson, C., & Uppstad, P. H. (2016). Adolescent weak decoders writing in a shallow orthography: Process and product. Scientific Studies of Reading, 20(5), 375–388.

Vehtari, A., Gelman, A., & Gabry, J. (2015). Pareto smoothed importance sampling. arXiv Preprint arXiv:1507.02646.

Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.

Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60(3), 158–189.

¹ Before-word transitions were not included for the first determiner because these data were not recorded.