1 Introduction

2 Model definitions

The models presented in the following can be divided into two general groups. The first three models are largely akin to models typically used in the literature: they assume a uni-modal process that generates keystroke data, as incorporated in statistical models such as analysis of variance and linear mixed-effects models. The last two models treat keystroke intervals as a weighted combination of two processes. One process represents a smooth flow of information from mind to finger; the other, which is of primary interest here, represents moments at which the information flow was interrupted, leading to longer latencies. These two models map directly onto the idea of a cascading model of writing.

3 Analysis

We reanalysed data sets including process information from participants writing text. For all data sets we fit a series of five models, each with random effects for participants. Probability functions used were normal and log-normal, in line with typical treatments in the literature, a log-normal distribution with unequal variances, and a bimodal mixed-effects (mixture) model in a constrained and an unconstrained version (see Table 5.1). Stan code for mixture models was based on Roeser et al. (2021). Text location (levels: before sentence, before word, within word) was included as a predictor in all models.

Data were analysed in Bayesian mixed-effects models (Gelman et al., 2014; McElreath, 2016). The R (R Core Team, 2020) package rstan (Stan Development Team, n.d.) was used to interface with the probabilistic programming language Stan (Carpenter et al., 2016), in which all models were implemented. Models were fitted with weakly informative priors (see McElreath, 2016) and run with 20,000 iterations on 3 chains, with a warm-up of 10,000 iterations and no thinning. Model convergence was confirmed by the Gelman-Rubin statistic (\(\hat{R}\) = 1) (Gelman & Rubin, 1992) and by inspection of the Markov chain Monte Carlo chains.
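The convergence check can be illustrated with a minimal sketch of the classic Gelman and Rubin (1992) potential scale reduction factor. This is not the exact diagnostic reported by Stan, which uses a refined variant of \(\hat{R}\); the function name `gelman_rubin` and the simulated chains are hypothetical and serve only to show how values near 1 arise when chains agree.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])  # draws per chain
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: between-chain variance, scaled by the number of draws
    between = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


# Post-warm-up draws implied by the settings in the text:
# (20,000 iterations - 10,000 warm-up) x 3 chains
post_warmup_draws = (20_000 - 10_000) * 3

rng = random.Random(2021)
# Three hypothetical chains sampling the same distribution
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(3)]
# Three hypothetical chains stuck in different regions
stuck = [[rng.gauss(shift, 1) for _ in range(1000)] for shift in (0, 3, 6)]

print(post_warmup_draws)
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))
```

Chains that explore the same distribution give a value close to 1, whereas chains that disagree about the location of the posterior give a value clearly above 1.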

4 Datasets

4.1 General overview

Six datasets with keystroke data from text production were used for analysis. An overview can be found in Table 4.1.

Table 4.1: Datasets in brief.
Dataset	Source	Keylogger	Writing task	N (participants)	Conditions	Mean age (years)	Language
C2L1	Rønneberg et al. (2022)	EyeWrite	Argumentative essays	126		11.80	Norwegian
CATO	Torrance et al. (2016)	EyeWrite	Expository texts	52	weak decoders / control; masked / unmasked	16.90	Norwegian
SPL2	Torrance et al. (n.d.)	CyWrite	Argumentative essays	39	write in L1 / L2	20.60	English (L1) / Spanish (L2)
PLanTra	Rossetti & Van Waes (2022)	InputLog	Text simplification	47	pre / post test trained in plain language principles and control	23.00	English (L2)
LIFT	Vandermeulen et al. (2020)	InputLog	Synthesis	658	Various topics and genres	16.95	Dutch
GUNNEXP2		EyeWrite		45	masked / unmasked	NA	Norwegian

4.1.1 GUNNEXP2

4.1.2 C2L1

The C2L1 data set comprises data from Norwegian 6th graders (N = 126, mean age 11 years 10 months) published in Rønneberg et al. (2022). The children composed argumentative essays in Norwegian, a language with a relatively shallow orthography.

TODO: might need to remove kids that don’t speak Norwegian at home (see github issue).

4.1.3 CATO

Data are published in Torrance et al. (2016). Norwegian upper secondary students (N = 26, mean age = 16.9 years) with weak decoding skills and 26 age-matched controls composed expository texts by keyboard under two conditions: normally, and with letters masked to prevent them from reading what they were writing.

4.1.4 PLanTra

The PLanTra (Plain Language for Financial Content: Assessing the Impact of Training on Students’ Revisions and Readers’ Comprehension) data set (Rossetti & Van Waes, 2022) involved the collection of keystroke data from 47 university students, who were randomly divided into an experimental and a control group. In a pre-test session, all students were assigned an extract of a corporate report dealing with sustainability and were instructed to revise it to make it easier to read for a lay audience. Subsequently, the experimental group received training on how to apply plain language principles to sustainability content, while the control group received training exclusively on the topic of sustainability. During a post-test session, both groups were instructed to revise a second extract of a corporate sustainability report with the same goal (i.e. making it easier to read for a lay audience) by applying what they had learned from their respective training. The texts were in English while the participants were native speakers of other languages (mainly Dutch), so writing took place in a second language. It should be pointed out that, while some students decided to revise the assigned texts, the majority opted for rewriting the texts from scratch.

4.1.5 LIFT

The LIFT (Improving Pre-university Students’ Performance in Academic Synthesis Tasks with Level-up Instructions and Feedback Tool) data set is available from Vandermeulen et al. (2020).

4.1.6 SPL2

Data will be published in Torrance et al. (n.d.).

Undergraduate university students (N = 39, 28 female, mean age = 20.6 years, SD = 1.51) wrote two short argumentative essays, one in English (the student’s first language in all cases; L1) and one in Spanish (L2), using CyWrite (Chukharev-Hudilainen et al., 2019). CyWrite provides a writing environment with basic word processing functionality (comparable to, e.g., Microsoft WordPad), including text selection by mouse action, and copy-and-paste. We recorded the time of each keystroke and mouse action, and tracked writers’ eye movements within their emerging text.

Writing tasks: Participants were given a 40-minute time limit. They wrote essays in response to each of two prompts, with order and L1 / L2 counterbalanced across participants.

4.2 Transition types

The transition types analysed in this study focus on those locations that previous research found to be psycholinguistically meaningful (e.g. Chukharev-Hudilainen et al., 2019; De Smet et al., 2018; Torrance et al., n.d., 2016) and are detailed in Table 4.2. Key transitions that terminated in an editing operation were excluded from the analysis. Transitions that occurred at the beginning of the text or the beginning of a paragraph were not treated as before-sentence transitions.

Table 4.2: Transition location classification.
Transition type	Description	Example
Within word	Transitions between any two letters	T^h^e c^a^t m^e^o^w^e^d. T^h^a^t[bsp][bsp]e^n i^t s^l^e^p^t.
Before word	Keypress after space followed by any letter	The ^cat ^meowed. That[bsp][bsp]en ^it ^slept.
Before sentence	Keypress following sentence-final punctuation and a space, preceding any letter	The cat meowed. ^That[bsp][bsp]en it slept.
Note:
‘^’ marks the transition location; [bsp] represents a backspace. Inter-keystroke intervals (IKIs) were timed to the shift keypress.
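The classification in Table 4.2 can be sketched as follows. This is a simplified illustration that labels characters in a finished text string; it ignores backspaces, editing operations and shift-key timing, which the actual keystroke logs contain. The function name `classify_transitions` and the set of sentence-final characters are assumptions, not part of the original analysis.

```python
SENTENCE_END = ".!?"  # assumed set of sentence-final punctuation marks


def classify_transitions(text):
    """Return one label (or None) per character of a finished text."""
    labels = []
    for i, ch in enumerate(text):
        label = None
        # Only transitions into a letter are classified; the very first
        # character of the text is never a before-sentence transition.
        if ch.isalpha() and i > 0:
            prev = text[i - 1]
            if prev.isalpha():
                label = "within word"
            elif prev == " " and i >= 2:
                before_space = text[i - 2]
                if before_space in SENTENCE_END:
                    label = "before sentence"
                elif not before_space.isspace():
                    label = "before word"
        labels.append(label)
    return labels


for ch, label in zip("Hi a. Be", classify_transitions("Hi a. Be")):
    print(repr(ch), label)
```

Paragraph-initial letters are left unlabelled because a line break is neither a letter nor a space, in line with the exclusion described in the text.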

4.3 Data reduction

For all datasets we only used transitions that were not followed by an editing operation.

In studies with within-participant factors, we removed participants who did not complete all conditions (leaving 343 participants in the LIFT data set and 41 participants in the PLanTra data set). We also removed participants who produced fewer than 10 sentences (LIFT: 109 participants; PLanTra: 3 participants; SPL2: 1 participant).

We removed keystroke intervals that were extremely short (\(\le\) 50 msecs) or extremely long (\(\ge\) 30 secs). The percentage of removed keystroke data can be found in Table 4.3.

From the remaining data we randomly sampled 100 observations per participant, per condition, and per transition location, with the exception of the LIFT data set. This was done for computational reasons, to reduce the time the Bayesian models need to complete. For the LIFT data set we reduced the number of participants to 100, which is still more than in most of the other data sets in our analysis. Because we included the large number of writing tasks in the LIFT data set as a fixed effect, we sampled 50 observations per condition, location and participant. The percentage of keystroke data that went into the final analysis can be found, by transition location, in Table 4.3.
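The trimming and subsampling steps can be sketched as below. The record layout (dictionaries with the keys `participant`, `condition`, `location` and `iki`, the interval in msecs), the function name `reduce_data` and the seed are hypothetical. Cells with fewer observations than requested are kept in full, which is consistent with the 100% before-sentence rates in Table 4.3.

```python
import random
from collections import defaultdict

MIN_MS = 50       # intervals <= 50 msecs are removed
MAX_MS = 30_000   # intervals >= 30 secs are removed


def reduce_data(rows, n_per_cell=100, seed=1):
    """Trim extreme intervals, then sample up to n_per_cell rows per cell."""
    rng = random.Random(seed)
    kept = [r for r in rows if MIN_MS < r["iki"] < MAX_MS]
    # One cell per participant x condition x transition location
    cells = defaultdict(list)
    for r in kept:
        cells[(r["participant"], r["condition"], r["location"])].append(r)
    sampled = []
    for key in sorted(cells):
        cell = cells[key]
        # Keep the whole cell when it holds fewer rows than requested
        sampled.extend(rng.sample(cell, min(n_per_cell, len(cell))))
    return sampled
```

For the LIFT data set the same function would be called with `n_per_cell=50`.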

Table 4.3: Data reduction. Mean percentage of extreme data removed and mean percentage of randomly sampled data by transition location. Standard error is shown in parentheses.
	Extreme values		Randomly sampled data
Dataset	\(\le\) 50 msecs	\(\ge\) 30 secs	before word	within word	before sentence
C2L1	0.19% (0.1%)	0.07% (0.06%)	84.5% (1.8%)	35.1% (2.6%)	100% (0%)
CATO	0.65% (0.15%)	0.02% (0.02%)	48.6% (2.2%)	14.9% (0.9%)	100% (0%)
LIFT	2.65% (0.16%)	0% (0%)	13.1% (0.9%)	3.2% (0.2%)	99.4% (0.1%)
PLanTra	2.49% (0.41%)	0.04% (0.03%)	36.6% (1.9%)	9.7% (0.6%)	100% (0%)
SPL2	2.29% (0.2%)	0.03% (0.02%)	22.6% (1.4%)	5.7% (0.4%)	100% (0%)
GUNNEXP2	2.16% (0.17%)	0.01% (0.01%)	22.5% (1.4%)	6.2% (0.4%)	100% (0%)

5 Out-of-sample cross-validation

For model comparisons we used out-of-sample predictions estimated using Pareto smoothed importance-sampling leave-one-out cross-validation (Vehtari et al., 2015, 2017). Predictive performance was estimated as the sum of the expected log pointwise predictive density (\(\widehat{elpd}\)), and models were compared using the difference \(\Delta\widehat{elpd}\) between them. The advantage of using leave-one-out cross-validation is that models with more parameters are penalised to prevent overfitting.
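How a summed \(\widehat{elpd}\), a difference and its standard error arise from pointwise values can be sketched as follows. The sketch assumes that pointwise log predictive densities have already been obtained (for example from a PSIS-LOO routine) and uses the usual standard error of a sum, \(\sqrt{n \cdot \text{var}}\), as described by Vehtari et al. (2017). The function names and the four example values are hypothetical.

```python
import math
import statistics


def elpd_summary(pointwise):
    """Sum of pointwise values and the standard error of that sum."""
    n = len(pointwise)
    total = sum(pointwise)
    se = math.sqrt(n * statistics.variance(pointwise))
    return total, se


def elpd_difference(pointwise_a, pointwise_b):
    """Difference in elpd (model A minus model B) with its standard error."""
    # Differences are taken observation by observation, so the standard
    # error reflects that both models were evaluated on the same data.
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    return elpd_summary(diffs)


# Hypothetical pointwise values for two models on four observations
model_a = [-1.0, -2.0, -1.5, -2.5]
model_b = [-1.2, -2.1, -2.0, -2.4]
diff, se = elpd_difference(model_a, model_b)
print(round(diff, 3), round(se, 3))
```

A positive difference favours model A; dividing the difference by its standard error gives the kind of ratio used later as a rough measure of the strength of evidence.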

Results for all data sets are shown in Table 5.1. For all data sets we found the same pattern. The mixture of log-normal distributions provided a substantially better fit than uni-modal distribution models. The unconstrained version of the mixture of log-normal distributions rendered a higher predictive performance than the constrained version that does not allow the distribution of short keystroke-intervals to vary across conditions.

Table 5.1: Model comparisons. The top row shows the models with the highest predictive performance. Standard error is shown in parentheses.
	GUNNEXP2		CATO		C2L1		LIFT		PLanTra		SPL2		SPL2 (shift + C)
Model	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)
Bimodal log-normal (unconstrained)	–	-120,675 (237)	–	-139,691 (230)	–	-165,434 (231)	–	-272,696 (327)	–	-107,324 (228)	–	-96,671 (217)	–	-97,266 (219)
Bimodal log-normal (constrained)	-554 (38)	-121,228 (235)	-607 (43)	-140,298 (230)	-546 (51)	-165,980 (237)	-500 (37)	-273,197 (327)	-130 (18)	-107,454 (228)	-564 (41)	-97,234 (216)	-593 (40)	-97,859 (219)
Unimodal log-normal (unequal variance)	-2,981 (94)	-123,656 (258)	-2,389 (79)	-142,080 (246)	-2,184 (83)	-167,617 (256)	-5,605 (131)	-278,301 (365)	-2,514 (92)	-109,837 (248)	-1,744 (68)	-98,415 (224)	-1,719 (66)	-98,984 (226)
Unimodal log-normal	-5,293 (111)	-125,968 (263)	-3,689 (100)	-143,379 (258)	-2,968 (98)	-168,402 (267)	-7,716 (147)	-280,412 (375)	-3,895 (94)	-111,219 (246)	-3,457 (81)	-100,128 (221)	-3,086 (76)	-100,351 (223)
Unimodal normal	-44,136 (660)	-164,811 (713)	-41,362 (927)	-181,052 (992)	-40,485 (856)	-205,919 (935)	-82,567 (1,691)	-355,263 (1,779)	-37,718 (572)	-145,042 (637)	-32,174 (450)	-128,845 (493)	-32,019 (437)	-129,284 (477)
Note:
\(\widehat{elpd}\) = predictive performance indicated as expected log pointwise predictive density; \(\Delta\widehat{elpd}\) = difference in predictive performance relative to the model with the highest predictive performance in the top row.

6 Cross-data set comparisons

6.1 Cross-data set visualisation

The model estimates for the mixture-model with the highest predictive performance are shown in Figure 6.1. In this visualisation we ignore dataset specific conditions that are presented in detail below.

Figure 6.1: Across studies. Posterior parameter distribution

6.2 Effect of transition location

It is generally believed that pausing is associated with syntactic edges, such that more and longer pauses are predicted for key transitions at larger syntactic edges, i.e. before sentence > before word > within word. We evaluated the differences between transition locations for all data sets. The results are shown in Table 6.1.

Results are largely consistent across data sets (with caveats) but differ, to some extent, from what the literature would predict. In line with the literature, hesitations are more frequent before words than within words. Hesitations are also longer at before-sentence transitions than at before-word transitions (except for dataset C2L1), and longer at before-word transitions than at within-word transitions (except for dataset LIFT). However, our results do not support the claim that writers pause more frequently at before-sentence locations than at before-word locations (except for dataset SPL2; this also shows that more pauses at before-sentence locations cannot be explained on the basis of multi-key combinations for sentence-initial capitalisation). We also observe that even fluent key transitions are slower at before-word locations than at within-word locations, but there is generally no difference between fluent before-sentence and fluent before-word transitions (except for dataset SPL2).

The datasets differ in whether sentence-initial key transitions do (PLanTra, LIFT) or do not (CATO, C2L1, SPL2) include the character following the shift key for capitalisation. In other words, the pause before sentences may sum across two key intervals, namely _^[shift]^C, or may involve only one key interval, namely _^[shift]. For the SPL2 dataset, we calculated location effects for sentence-initial transitions that do and do not involve the shift-to-key transition. The results were the same for the transition location effects. A comparison that disentangles the effects of the multi-key combination on the mixture-model estimates can be found in Table 8.3. In short, the duration of fluent transitions and the hesitation slowdown are affected but not the hesitation probability.

In conclusion, while pauses tend to be longer before sentences they are not more frequent than before words.

Table 6.1: Effect of transition location on keystroke intervals. Differences between transition locations are shown on log scale (for transition durations) and logit scale for probability of hesitant transitions. 95% PIs in brackets.
		Fluent transitions		Slowdown for hesitations		Probability of hesitations
Data set	Difference	Est. with 95% PIs	BF	Est. with 95% PIs	BF	Est. with 95% PIs	BF
C2L1	before sentence vs word	0.01 [-0.13, 0.15]	0.07	0.21 [-0.13, 0.54]	0.35	0.54 [-0.31, 1.43]	0.89
C2L1	before vs within word	0.4 [0.38, 0.42]	> 100	0.49 [0.39, 0.58]	> 100	1.1 [0.76, 1.44]	> 100
CATO (non-dyslexic unmasked)	before sentence vs word	-0.03 [-0.18, 0.11]	0.08	1.19 [0.88, 1.49]	> 100	0.41 [-0.44, 1.27]	0.67
CATO (non-dyslexic unmasked)	before vs within word	0.35 [0.31, 0.38]	> 100	0.35 [0.14, 0.54]	17.48	1.83 [1.19, 2.49]	> 100
GUNNEXP2 (unmasked)	before sentence vs word	0.22 [0.12, 0.32]	> 100	0.93 [0.78, 1.07]	> 100	1.32 [0.86, 1.78]	> 100
GUNNEXP2 (unmasked)	before vs within word	0.23 [0.21, 0.25]	> 100	0.38 [0.21, 0.54]	> 100	1.97 [1.59, 2.37]	> 100
LIFT	before sentence vs word	-0.04 [-0.13, 0.05]	0.07	0.35 [0.05, 0.72]	2.14	-0.49 [-1.03, 0]	1.63
LIFT	before vs within word	0.21 [0.13, 0.27]	> 100	0.26 [-0.16, 0.55]	0.71	1.56 [1.02, 2.16]	> 100
PLanTra	before sentence vs word	0.01 [-0.09, 0.11]	0.07	0.65 [0.46, 0.83]	> 100	0.13 [-0.25, 0.53]	0.25
PLanTra	before vs within word	0.16 [0.1, 0.21]	> 100	0.37 [0.2, 0.54]	> 100	1.83 [1.48, 2.17]	> 100
SPL2 (L1; shift + C)	before sentence vs word	0.73 [0.66, 0.81]	> 100	0.92 [0.78, 1.07]	> 100	1.16 [0.64, 1.69]	> 100
SPL2 (L1; shift + C)	before vs within word	0.3 [0.27, 0.34]	> 100	0.35 [0.15, 0.54]	22.82	1.84 [1.33, 2.35]	> 100
SPL2 (L1)	before sentence vs word	0.24 [0.17, 0.31]	> 100	1.39 [1.25, 1.52]	> 100	0.69 [0.17, 1.19]	8.01
SPL2 (L1)	before vs within word	0.31 [0.28, 0.34]	> 100	0.35 [0.14, 0.54]	16.25	1.94 [1.4, 2.5]	> 100
Note:
PIs are probability intervals. BF is the evidence in favour of the alternative hypothesis over the null hypothesis.
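Because the differences in Table 6.1 are reported on the log and logit scales, a short sketch may help with reading them: exponentiating a log-scale difference gives a ratio of durations, and exponentiating a logit-scale difference gives an odds ratio. The function name `exponentiate` is hypothetical; the input values are the C2L1 before-word versus within-word estimates from the table. Interval endpoints transform exactly if the intervals are quantile-based, whereas the transformed point estimate is only exact if it is a posterior median, which is an assumption here.

```python
import math


def exponentiate(estimate, lower, upper):
    """Map a log- or logit-scale difference and its interval to a ratio."""
    return tuple(math.exp(v) for v in (estimate, lower, upper))


# C2L1, before vs within word, fluent transitions (log scale)
fluent_ratio = exponentiate(0.40, 0.38, 0.42)
# C2L1, before vs within word, probability of hesitations (logit scale)
hesitation_odds_ratio = exponentiate(1.10, 0.76, 1.44)

print([round(v, 2) for v in fluent_ratio])
print([round(v, 2) for v in hesitation_odds_ratio])
```

On this reading, fluent before-word transitions in C2L1 are roughly 1.5 times as long as fluent within-word transitions, and the odds of a hesitation are roughly three times as high before a word as within a word.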

7 Model comparison with simulated data

A general concern with mixture models is that, because they have more parameters, they might in principle always provide a better fit, even though cross-validation addresses potential problems with overfitting.

To address this concern we simulated two sets of data. Both data sets have two conditions and 1,000 observations each. The first data set has as its underlying data generating process a mixture model with two mixture components, similar to the process described above. The difference between the two conditions is that the mixing proportion is larger for condition 2 than for condition 1, hence long observations are more likely in condition 2. This model can be summarised as follows:

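\[ \text{y}_i \sim \theta_\text{condition[i]} \cdot \text{logN}(\mu_2, \sigma^2_2) + (1 - \theta_\text{condition[i]}) \cdot \text{logN}(\mu_1, \sigma^2_1) \]

Here component 1 generates the shorter and component 2 the longer observations, so that \(\theta\) is the probability of a long observation; in terms of the parameters in Table 7.1, \(\mu_1\) corresponds to \(\beta\) and \(\mu_2\) to \(\beta + \delta\).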

The second data set was generated with an unequal-variance unimodal model as the data generating process. Condition 2 has a larger mean and standard deviation than condition 1. The model can be summarised as follows:

\[ \text{y}_i \sim \text{logN}(\mu_\text{condition[i]}, \sigma^2_\text{condition[i]}) \]

The true parameter values used for each of the two data simulations can be found in Table 7.1. The simulated data are visualised in Figure 7.1. The data are simulated to be similarly distributed to keystroke transitions.
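The two data generating processes can be sketched as below, using the true values listed in Table 7.1. Several points are assumptions: \(\theta\) is taken to be the probability of the slower component (following the statement that long observations are more likely in condition 2), the slower component is centred on \(\beta + \delta\), the values 0.25 and 0.5 are treated as standard deviations on the log scale, and 1,000 observations are drawn per condition. The function names are hypothetical.

```python
import math
import random


def simulate_bimodal(n, theta, beta=5.0, delta=1.0,
                     sigma1=0.25, sigma2=0.5, rng=None):
    """Draw n values from a two-component log-normal mixture."""
    rng = rng or random.Random()
    values = []
    for _ in range(n):
        if rng.random() < theta:
            # Slow component: shifted by delta, with the larger spread
            values.append(rng.lognormvariate(beta + delta, sigma2))
        else:
            # Fast component
            values.append(rng.lognormvariate(beta, sigma1))
    return values


def simulate_unimodal(n, mu, sigma, rng=None):
    """Draw n values from a single log-normal distribution."""
    rng = rng or random.Random()
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


def mean_log(values):
    """Mean of the log-transformed values."""
    return sum(math.log(v) for v in values) / len(values)


rng = random.Random(42)
# Bimodal data: conditions differ only in the mixing proportion
bimodal = {1: simulate_bimodal(1000, 0.10, rng=rng),
           2: simulate_bimodal(1000, 0.40, rng=rng)}
# Unimodal data: conditions differ in location and spread
unimodal = {1: simulate_unimodal(1000, 5.0, 0.25, rng=rng),
            2: simulate_unimodal(1000, 6.0, 0.5, rng=rng)}
```

In the bimodal data the conditions share both components and differ only in how often the slow component is sampled, whereas in the unimodal data the whole distribution shifts and widens.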

Figure 7.1: Data simulated with a bimodal process (left) and a unimodal process (right).

We ran four model fits: the mixture model was fitted to the data generated with a mixture process and to the data generated with the unimodal unequal-variance process, and the same was repeated with a unimodal unequal-variance model. Models were run with 3 chains of 6,000 iterations each, of which 3,000 were warm-up. The Stan models recovered the model parameters of their respective data sets successfully, as shown in Table 7.1, and no less so when the model was applied to data generated with the other underlying process.

Table 7.1: Recovered parameter estimates with 95% probability interval (PI) and true parameter values for each simulated data set, by model and parameter.
		Estimate with 95% PI
Parameter	True value	Bimodal data	Unimodal data
Model: Bimodal mixture model
\(\beta\)	5	5 [4.98, 5.01]	5 [4.99, 5.02]
\(\delta\)	1	0.99 [0.9, 1.06]	1.04 [1, 1.08]
\(\theta_\text{condition=1}\)	.10	.09 [.07, .12]	.01 [.00, .01]
\(\theta_\text{condition=2}\)	.40	.42 [.38, .47]	.97 [.95, .99]
\(\sigma^2_1\)	0.25	0.25 [0.24, 0.26]	0.25 [0.24, 0.26]
\(\sigma^2_2\)	0.5	0.48 [0.43, 0.54]	0.48 [0.46, 0.51]
Model: Unimodal process
\(\beta_\text{condition=1}\)	5	5.1 [5.07, 5.12]	5 [4.99, 5.02]
\(\beta_\text{condition=2}\)	6	5.41 [5.37, 5.44]	6.02 [5.99, 6.05]
\(\sigma_\text{condition=1}\)	0.25	0.4 [0.38, 0.42]	0.25 [0.24, 0.26]
\(\sigma_\text{condition=2}\)	0.5	0.61 [0.59, 0.64]	0.51 [0.48, 0.53]

We used LOO-CV to compare the fit of the two models for each data set. The model comparisons can be found for each data generating process in Table 7.2. The results show that the mixture model does not always lead to higher predictive performance. Indeed, the mixture model showed a lower predictive performance for the data that were generated with a unimodal process. For the data generated with a bimodal process, however, the mixture model shows a higher predictive performance. The ratio of \(\Delta\widehat{elpd}\) to its SE, as a metric for the strength of evidence, shows that the difference in favour of the mixture model is about 11 times its standard error for the data generated with a bimodal process. In comparison, for the unimodal data, the difference in favour of the unimodal model is only 3 times its standard error. Thus, the mixture model does not necessarily perform better for non-bimodal data, but it also does not necessarily perform much worse. This contrast is likely a reflection of the increased number of parameters in the mixture model.

Table 7.2: Model comparisons by data set. The top row shows the models with the highest predictive performance. Standard error is shown in parentheses.
Model	\(\Delta\widehat{elpd}\)	\(\widehat{elpd}\)
Data: Bimodal mixture process
Bimodal mixture model	0 (0)	-11,614 (58)
Unimodal unequal-variance model	-325 (30)	-11,939 (66)
Data: Unimodal process
Unimodal unequal-variance model	0 (0)	-11,788 (53)
Bimodal mixture model	-6 (2)	-11,794 (53)
Note:
\(\widehat{elpd}\) = predictive performance indicated as expected log pointwise predictive density; \(\Delta\widehat{elpd}\) = difference in predictive performance relative to the model with the highest predictive performance in the top row.
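As a worked check of the ratios quoted above, using the absolute differences and standard errors in Table 7.2:

\[ \frac{|\Delta\widehat{elpd}|}{SE} = \frac{325}{30} \approx 10.8 \ \text{(bimodal data)}, \qquad \frac{|\Delta\widehat{elpd}|}{SE} = \frac{6}{2} = 3 \ \text{(unimodal data)} \]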

8 Posterior by data set

8.1 GUNNEXP2

8.1.1 Fit to data

8.1.2 Unconstrained mixture model

8.1.2.1 Posterior parameter estimates

Figure 8.1: GUNNEXP2 (unconstrained model). Posterior parameter distribution

8.1.2.2 Masking effect

Table 8.1: Mixture model estimates for key transitions. Cell means are shown for the masked and unmasked writing task in msecs for fluent key-transitions, the slowdown for long transitions and the probability of disfluent transitions. The effect for masking is shown on log scale (for transition durations) and logit scale for probability of disfluent transitions. 95% PIs in brackets.
Transition location	Unmasked	Masked	Difference	BF
Fluent transitions
before sentence	220 [195, 248]	250 [224, 279]	0.13 [0.02, 0.23]	1.01
before word	177 [165, 189]	185 [173, 198]	0.05 [0.03, 0.07]	70.05
within word	140 [131, 150]	146 [137, 156]	0.04 [0.03, 0.06]	> 100
Disfluencies
before sentence	1,602 [1,336, 1,908]	2,310 [1,968, 2,703]	0.21 [0.05, 0.38]	1.97
before word	401 [355, 451]	472 [415, 535]	0.08 [-0.02, 0.18]	0.18
within word	173 [129, 227]	172 [124, 230]	-0.03 [-0.24, 0.19]	0.11
Probability of disfluencies
before sentence	.69 [.61, .77]	.69 [.62, .76]	0 [-0.45, 0.45]	0.23
before word	.38 [.32, .44]	.32 [.27, .38]	-0.24 [-0.57, 0.09]	0.46
within word	.08 [.06, .10]	.06 [.05, .09]	-0.22 [-0.65, 0.21]	0.36
Note:
PIs are probability intervals. BF is the evidence in favour of the alternative hypothesis over the null hypothesis.

8.1.3 Constrained mixture model

8.1.3.1 Posterior parameter estimates

The posterior of the constrained model is shown in Figure 8.2 showing the posterior slowdown for disfluent keystrokes (left panel) and the probability of disfluent keystrokes (right panel). Fluent keystroke transitions are distributed around a posterior mean of 157 msecs, PI: (147, 168).

Figure 8.2: GUNNEXP2 (constrained model). Posterior parameter distribution.

8.1.3.2 Masking effect

Table 8.2: Mixture model estimates for key transitions. Cell means are shown for the masked and unmasked writing task in msecs for the slowdown for long transitions and the probability of disfluent transitions. The effect for masking is shown on log scale (for transition durations) and logit scale for probability of disfluent transitions. 95% PIs in brackets.
Transition location	Unmasked	Masked	Difference	BF
Disfluencies
before sentence	1,124 [954, 1,307]	1,418 [1,218, 1,634]	0.21 [0.05, 0.36]	2.59
before word	363 [325, 403]	379 [341, 420]	0.03 [-0.05, 0.11]	0.05
within word	182 [105, 279]	209 [135, 306]	0.08 [-0.24, 0.4]	0.19
Probability of disfluencies
before sentence	.88 [.82, .93]	.90 [.85, .94]	0.16 [-0.46, 0.79]	0.36
before word	.47 [.40, .54]	.45 [.38, .52]	-0.06 [-0.46, 0.35]	0.21
within word	.05 [.03, .08]	.04 [.03, .06]	-0.29 [-0.88, 0.28]	0.48
Note:
PIs are probability intervals. BF is the evidence in favour of the alternative hypothesis over the null hypothesis.

8.2 Fit to data

8.2.1 C2L1

Figure 8.3: C2L1 data. Comparison of 100 simulated (predicted) sets of data to observed data, illustrated by model. For illustration the x-axis was truncated at 2,000 msecs.

8.2.2 CATO

Figure 8.4: CATO data. Comparison of 100 simulated (predicted) sets of data to observed data, illustrated by model. For illustration the x-axis was truncated at 2,000 msecs.

8.2.3 PLanTra

Figure 8.5: PLanTra data. Comparison of 100 simulated (predicted) sets of data to observed data, illustrated by model. For illustration the x-axis was truncated at 2,000 msecs.

8.2.4 LIFT

Figure 8.6: LIFT data. Comparison of 100 simulated (predicted) sets of data to observed data, illustrated by model. For illustration the x-axis was truncated at 2,000 msecs.

8.2.5 SPL2

Figure 8.7: SPL2 data. Comparison of 100 simulated (predicted) sets of data to observed data, illustrated by model. For illustration the x-axis was truncated at 2,000 msecs.

8.3 Posterior parameter estimates of mixture model

8.3.1 C2L1

Figure 8.8: C2L1. Posterior parameter distribution

8.3.2 CATO

Figure 8.9: CATO. Posterior parameter distribution

8.3.3 PLanTra

Figure 8.10: PLanTra. Posterior parameter distribution

8.3.4 LIFT

Figure 8.11: LIFT. Posterior parameter distribution

8.3.5 SPL2

Can slowdowns for sentence-initial pauses be explained by complex keystroke combinations that were summed across? No.

The data sets used differ in whether the keystroke interval before sentences does (PLanTra, LIFT) or does not (CATO, C2L1, SPL2) scope over the character following shift. In other words, the pause before sentences sums across two key intervals in the PLanTra and LIFT data, namely _^[shift]^C, but involves only one key interval, namely _^[shift], in the remaining data sets.

For the SPL2 dataset we tested whether the different patterns for sentence pauses can be explained by the key combination. We analysed the SPL2 data including and excluding the keystroke after shift. Model estimates are presented in Figure 8.12. The results of this comparison can be found in Table 8.3. Overall, fluent transition duration and hesitation duration, but not hesitation probability, were affected by whether or not the sentence-initial transition includes the interval between shift and the first character. Fluent key transitions were substantially longer when including the interval following the shift key. The slowdown for hesitations was affected too, but the difference is numerically small. There was no conclusive evidence for an increased hesitation probability. Taken together, including the character following shift affects the duration of fluent transitions more than it affects pause duration and frequency.

Figure 8.12: SPL2. Posterior parameter distribution

Table 8.3: Mixture model estimates for key transitions. Cell means are shown for transitions that do and do not involve the transition to the character following shift in msecs for fluent key-transitions, the slowdown for long transitions and the probability of disfluent transitions. The difference for including the transition duration to the character after shift is shown on log scale (for transition durations) and logit scale for probability of disfluent transitions. 95% PIs in brackets.
Language	Transition location	_^[shift] + C	_^[shift]	Difference	BF
Fluent transitions
L1	before sentence	390 [350, 434]	240 [216, 266]	0.48 [0.34, 0.63]	> 100
	within word	138 [127, 150]	138 [127, 150]	0 [-0.11, 0.11]	0.06
	before word	187 [172, 204]	188 [173, 205]	-0.01 [-0.12, 0.11]	0.06
L2	before sentence	448 [379, 521]	296 [253, 343]	0.41 [0.19, 0.63]	46.08
	within word	155 [143, 168]	156 [143, 169]	0 [-0.12, 0.11]	0.06
	before word	259 [236, 282]	259 [236, 284]	0 [-0.13, 0.12]	0.06
Disfluencies
L1	before sentence	2,398 [2,001, 2,836]	2,469 [2,119, 2,855]	-0.46 [-0.61, -0.3]	> 100
	within word	140 [96, 195]	138 [93, 196]	0.01 [-0.24, 0.25]	0.13
	before word	345 [292, 406]	343 [289, 404]	0.01 [-0.12, 0.14]	0.07
L2	before sentence	2,859 [2,407, 3,368]	2,769 [2,348, 3,236]	-0.34 [-0.51, -0.17]	> 100
	within word	170 [132, 215]	171 [132, 217]	0 [-0.17, 0.17]	0.09
	before word	764 [673, 867]	759 [667, 862]	0.01 [-0.09, 0.11]	0.05
Probability of disfluencies
L1	before sentence	.62 [.53, .71]	.50 [.41, .59]	0.49 [-0.04, 1.03]	1.43
	within word	.08 [.05, .11]	.07 [.05, .10]	0.12 [-0.46, 0.7]	0.32
	before word	.34 [.27, .42]	.34 [.26, .42]	0.02 [-0.48, 0.51]	0.25
L2	before sentence	.81 [.73, .88]	.72 [.63, .80]	0.48 [-0.16, 1.13]	0.94
	within word	.18 [.14, .24]	.18 [.13, .24]	0.03 [-0.47, 0.52]	0.25
	before word	.59 [.51, .67]	.59 [.51, .68]	0.01 [-0.48, 0.5]	0.25
Note:
PIs are probability intervals. BF is the evidence in favour of the alternative hypothesis over the null hypothesis.

References

Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., & Riddell, A. (2016). Stan: A probabilistic programming language. Journal of Statistical Software, 20.

Chukharev-Hudilainen, E., Saricaoglu, A., Torrance, M., & Feng, H.-H. (2019). Combined deployable keystroke logging and eyetracking for investigating L2 writing fluency. Studies in Second Language Acquisition, 41(3), 583–604.

De Smet, M. J., Leijten, M., & Van Waes, L. (2018). Exploring the process of reading during writing using eye tracking and keystroke logging. Written Communication, 35(4), 411–447.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Roeser, J., De Maeyer, S., Leijten, M., & Van Waes, L. (2021). Modelling typing disfluencies as finite mixture process. Reading and Writing, 1–26. https://osf.io/y3p4d/

Rossetti, A., & Van Waes, L. (2022). It’s not just a phase: Investigating text simplification in a second language from a process and product perspective. Frontiers in Artificial Intelligence, 5.

Rønneberg, V., Torrance, M., Uppstad, P. H., & Johansson, C. (2022). The process-disruption hypothesis: How spelling and typing skill affects written composition process and product. Psychological Research, 1–17.

Stan Development Team. (n.d.). RStan: The R interface to Stan. https://mc-stan.org/

Torrance, M., Roeser, J., & Chukharev-Hudilainen, E. (n.d.). Lookback in L1 and L2 writing: An eye movement study.

Torrance, M., Rønneberg, V., Johansson, C., & Uppstad, P. H. (2016). Adolescent weak decoders writing in a shallow orthography: Process and product. Scientific Studies of Reading, 20(5), 375–388.

Vandermeulen, N., Steendam, E. V., & Rijlaarsdam, G. (2020). DATASET - Baseline data LIFT Synthesis Writing project [Data set]. Zenodo. https://doi.org/10.5281/zenodo.3893538

Vehtari, A., Gelman, A., & Gabry, J. (2015). Pareto smoothed importance sampling. arXiv Preprint arXiv:1507.02646.

Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.

Modelling writing hesitations in text writing as finite mixture process