L2 writing fluency in Norwegian students

1 Data analysis

All data were analysed in Bayesian mixed-effects models using the R package brms (Bürkner, 2017b, 2017a) and the probabilistic programming language Stan (Carpenter et al., 2016; Hoffman & Gelman, 2014). Models were, where plausible, fitted with random intercepts for participants and by-participant slope adjustments for Language (text written in L1 or L2).

We report the most probable posterior (i.e. inferred) parameter value together with the interval that contains the parameter value with 95% probability (95% probability interval; henceforth, PI). We also quantified the statistical support for the effects of interest, that is, the support for the alternative hypothesis over the null hypothesis. This evidence was obtained using Bayes Factors (henceforth, BF) calculated with the Savage-Dickey method (see, e.g., Dickey et al., 1970; Wagenmakers et al., 2010). A BF larger than 5 indicates moderate evidence, and a BF larger than 10 strong evidence, for a statistically meaningful effect compared to the null hypothesis (see, e.g., Baguley, 2012; Jeffreys, 1961; Lee & Wagenmakers, 2014). For example, BF = 2 indicates that the data are twice as likely under the alternative hypothesis as under the null hypothesis. Because BFs are sensitive to the prior distribution, we used weakly informative priors for all effects; for the slope parameters these priors favour the null hypothesis rather than the alternative hypothesis.
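The Savage-Dickey density ratio can be illustrated with a small base-R sketch: the BF for the alternative hypothesis is the prior density at zero divided by the posterior density at zero. The N(0, 1) prior, the normal approximation to the posterior, and the helper name `savage_dickey_bf10` are assumptions made for illustration and are not taken from the analysis reported here.

```r
# Savage-Dickey density ratio: BF10 = prior density at 0 / posterior density at 0.
# Hypothetical helper; the N(0, prior_sd) prior is an illustrative assumption.
savage_dickey_bf10 <- function(draws, prior_sd = 1) {
  prior_at_zero <- dnorm(0, mean = 0, sd = prior_sd)
  # Normal approximation to the posterior density at zero
  posterior_at_zero <- dnorm(0, mean = mean(draws), sd = sd(draws))
  prior_at_zero / posterior_at_zero
}

set.seed(1)
draws_null   <- rnorm(4000, mean = 0.00, sd = 0.05)  # posterior centred on zero
draws_effect <- rnorm(4000, mean = 0.30, sd = 0.05)  # posterior far from zero

bf_null   <- savage_dickey_bf10(draws_null)    # below 1: evidence favours the null
bf_effect <- savage_dickey_bf10(draws_effect)  # far above 10: strong evidence
```

A posterior that piles up on zero yields a BF below 1, whereas a posterior that has moved away from zero yields a large BF; a wider prior lowers the prior density at zero and therefore lowers the BF, which is why the choice of prior matters.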

Transition duration was analysed in mixture models as described in Roeser et al. (2021). This is important for distinguishing fluent transitions, associated with a smooth flow of activation from higher levels of processing into finger movements, from transitions that were inhibited by, for example, difficulty retrieving the correct spelling or the right word. Rather than imposing threshold values to distinguish between simple and more demanding events, mixture models treat the data as a combination of two processes and use a mixing weight to capture the probability of processing difficulty occurring. Random by-participant intercepts were modelled for the pausing probability (i.e. mixing proportion) and the duration of fluent key transitions. These random effects allow the mixture models to capture individual differences in typing style. A detailed description and a tutorial for Bayesian mixed-effects mixture models for keystroke data can be found in Roeser et al. (2021); see also Hall et al. (2022); Baaijen et al. (2012); Li (2021); Almond et al. (2012).

Data were analysed in Bayesian mixed-effects models (Gelman et al., 2014; McElreath, 2016). The R (R Core Team, 2020) package rstan (Stan Development Team, n.d.) was used to interface with the probabilistic programming language Stan (Carpenter et al., 2016), in which all models were implemented. Models were fitted with weakly informative priors (see McElreath, 2016) and run with at least 4,000 iterations on 4 chains with a warm-up of 2,000 iterations and no thinning. Model convergence was confirmed by the Gelman-Rubin statistic ($\hat{R}$ = 1) (Gelman & Rubin, 1992) and by inspection of the Markov chain Monte Carlo chains. For mixture models we used 20,000 iterations with a warm-up of 10,000 iterations on 6 chains.
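To make the convergence criterion concrete, the following base-R sketch computes the classic potential scale reduction factor of Gelman and Rubin (1992) for a single parameter. It is a simplified illustration: the function name `rhat` and the simulated chains are assumptions, and current Stan releases report a refined (split, rank-normalised) variant rather than this exact formula.

```r
# Potential scale reduction factor for one parameter.
# `draws` is an iterations x chains matrix of post-warm-up samples.
rhat <- function(draws) {
  n <- nrow(draws)
  chain_means <- colMeans(draws)
  chain_vars  <- apply(draws, 2, var)
  B <- n * var(chain_means)           # between-chain variance
  W <- mean(chain_vars)               # within-chain variance
  var_hat <- (n - 1) / n * W + B / n  # pooled estimate of the posterior variance
  sqrt(var_hat / W)
}

set.seed(2)
# 4 chains with 2,000 retained iterations each, as a stand-in for sampler output
mixed <- matrix(rnorm(2000 * 4), nrow = 2000, ncol = 4)
stuck <- mixed
stuck[, 1] <- stuck[, 1] + 3  # one chain exploring a different region

rhat_mixed <- rhat(mixed)  # close to 1: chains agree
rhat_stuck <- rhat(stuck)  # clearly above 1: chains disagree
```

Values close to 1 indicate that between-chain and within-chain variability agree, which is the pattern expected when all chains sample from the same posterior.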

2 Copy task

Each participant (n = 101) completed a copy task in English and Norwegian.

2.1 Raw data visualisation

Density plots of the raw keystroke intervals are shown in Figure 2.1 by copy-task component and language.

Figure 2.1: Density plots of inter-keystroke intervals (IKIs) by copy-task component.

2.2 Data reduction

This analysis focuses on transition duration. Transition durations of 10 msecs or less were removed before analysis (M=1.18%, SE=0.12%).

Among the copy-task components, participants had to copy 4 word triplets (phrases), of which 3 consisted predominantly of high-frequency bigrams and 1 contained low-frequency bigrams. From the first 3 word triplets we removed bigrams that were not highly frequent, and from the last triplet we used only low-frequency bigrams for analysis. Therefore, the HF-words task included only high-frequency bigrams, and the LF-words task included only low-frequency bigrams.

To reduce the time and computational power necessary to run the statistical models, we randomly selected a subset of up to 50 observations per participant, language, and copy-task component. As a result, the models were fitted to subsets of the data for most copy-task components. The percentages below give the by-participant average proportion of observations used per copy-task component and language, with standard errors (SE) indicating the variability across participants: consonants task (Norwegian: M=100%, SE=0%, English: M=100%, SE=0%), HF-words task (Norwegian: M=13.6%, SE=3.4%, English: M=13.4%, SE=3.4%), LF-words task (Norwegian: M=100%, SE=0%, English: M=100%, SE=0%), sentence task (Norwegian: M=76.4%, SE=4.2%, English: M=74%, SE=4.4%), typing-speed task (Norwegian: M=40.7%, SE=4.9%, English: M=40.1%, SE=4.9%). The average number of bigrams used in the final sample is shown in Table 2.1.

Table 2.1: Average number of bigrams per participant in sample used for statistical modelling. Values are shown by copy-task component and language. Standard error (SE) is shown in parentheses.
Copy-task component	L1	L2
Consonants	9.1 (0.1)	15 (0.1)
HF words	50 (0)	50 (0)
LF words	26.1 (0.3)	24.7 (0.4)
Sentence	47.1 (0.8)	47.4 (0.7)
Typing speed	48.5 (0.7)	49.6 (0.3)
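The random subsampling described above can be sketched in base R as follows. The column names (`participant`, `language`, `component`, `iki`) and the helper `subsample_cells` are hypothetical; the sketch only illustrates the idea of capping each participant-by-language-by-component cell at 50 observations while leaving smaller cells intact.

```r
# Keep at most `n_max` randomly chosen rows per participant x language x component.
subsample_cells <- function(data, n_max = 50) {
  cells <- split(data, list(data$participant, data$language, data$component),
                 drop = TRUE)
  sampled <- lapply(cells, function(cell) {
    if (nrow(cell) <= n_max) return(cell)  # small cells are kept whole
    cell[sample(nrow(cell), n_max), , drop = FALSE]
  })
  do.call(rbind, sampled)
}

set.seed(3)
# Toy data: one participant, one language, two components of different size
toy <- data.frame(
  participant = 1,
  language    = "L1",
  component   = rep(c("HF words", "LF words"), times = c(300, 26)),
  iki         = rlnorm(326, meanlog = 5, sdlog = 0.5)
)
reduced    <- subsample_cells(toy)
cell_sizes <- table(reduced$component)
```

In this toy example the larger cell is reduced to 50 rows and the smaller cell is retained in full, which mirrors why some components contribute 100% of their observations and others only a fraction.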

2.3 Mixture model results

The mixture model was implemented in the probabilistic programming language Stan. The copy-typing process was constrained to be a mixture of two log-normal distributions of which one distribution represents fluent transitions and the other disfluencies.

\[ \begin{align}\tag{2.1} \text{iki}_{i} \sim \ & \theta_\text{condition[i]} \cdot \text{logN}(\beta_\text{condition} + \delta_\text{condition} + u_i, \sigma_\text{ctc}') \ +\\ & (1-\theta_\text{condition[i]}) \cdot \text{logN}(\beta_\text{condition} + u_i, \sigma_\text{ctc})\\ \text{constraints: } & \delta > 0, \sigma' > \sigma\\ \end{align} \]

The positively constrained parameter $\delta$ achieves this: both distributions share the location parameter $\beta$, but the distribution on the first line of the equation is shifted upwards by $\delta$, so that it captures longer transitions than the distribution on the second line. Therefore, $\beta$ captures the average transition duration of fluent keystroke pairs and $\delta$ the additional slowdown for long keystroke pairs. The relative weighting of the two distributions is captured by the mixing proportion $\theta$, which is parameterised so that it represents the relative weight of disfluent keystrokes and can, therefore, be understood as the pausing probability (Roeser et al., 2021). This is achieved by multiplying the distribution of slow key transitions by $\theta$ and the distribution of fluent transitions by the complement $1-\theta$.

The two distributions were assumed to have unequal variances in two regards. First, disfluent key transitions have a larger variability than fluent key transitions, which is achieved by constraining $\sigma'$ to be larger than $\sigma$. Second, each copy-task component (ctc) has its own variance component, as the tasks are inherently different in terms of complexity.
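As a minimal illustration of equation (2.1) for a single observation, the sketch below evaluates the two-component log-normal mixture density and the implied probability that a given interval stems from the disfluent component. The parameter values and the function names are illustrative assumptions, not estimates or code from the fitted Stan model, and the by-participant terms are omitted.

```r
# Mixture density of one inter-keystroke interval (msecs).
# theta: probability of a disfluent transition; beta: fluent location (log scale);
# delta: slowdown (> 0); sigma, sigma_prime: log-scale SDs with sigma_prime > sigma.
mixture_density <- function(iki, theta, beta, delta, sigma, sigma_prime) {
  stopifnot(delta > 0, sigma_prime > sigma, theta >= 0, theta <= 1)
  theta * dlnorm(iki, meanlog = beta + delta, sdlog = sigma_prime) +
    (1 - theta) * dlnorm(iki, meanlog = beta, sdlog = sigma)
}

# Probability that an interval belongs to the disfluent component (Bayes' rule).
disfluency_probability <- function(iki, theta, beta, delta, sigma, sigma_prime) {
  slow <- theta * dlnorm(iki, meanlog = beta + delta, sdlog = sigma_prime)
  slow / mixture_density(iki, theta, beta, delta, sigma, sigma_prime)
}

# Illustrative values only
pars <- list(theta = 0.25, beta = log(180), delta = 0.8,
             sigma = 0.25, sigma_prime = 0.6)
p_short <- do.call(disfluency_probability, c(list(iki = 180), pars))
p_long  <- do.call(disfluency_probability, c(list(iki = 1000), pars))
```

Under these assumed values a 180 msecs interval is attributed almost entirely to fluent typing, whereas a 1,000 msecs interval is attributed almost entirely to the disfluent component, without any fixed pause threshold being imposed.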

The model is a mixed-effects model because we included random effects for participants. First, we allowed the average typing speed to vary across participants. We assumed that some participants have a typing speed slower than the average $\beta$ and others are on average faster. This is captured by the parameter $u_i$, where $i$ indexes participants. The typing speed difference between average and participant $u$ was assumed to be distributed according to $u \sim \text{N}(0, \sigma_u)$. Second, we assumed that the probability of long latencies varies across participants, hence the index $i$ for $\theta$. This variability in pausing frequency reflects different copy-typing styles and captures the extent to which participants rely on memory or touch typing rather than pausing and looking up during copy typing.
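The role of the by-participant terms can be illustrated by simulating data from the assumed generative process. All numerical values below, including the spread of the by-participant pausing probabilities on the logit scale, are illustrative assumptions and not estimates from this study.

```r
set.seed(4)
n_participants <- 20
n_obs <- 200                      # intervals per participant
beta  <- log(180); delta <- 0.8   # illustrative population values (log scale)
sigma <- 0.25; sigma_prime <- 0.6
sigma_u <- 0.15                   # SD of by-participant speed deviations

u <- rnorm(n_participants, mean = 0, sd = sigma_u)  # u_i ~ N(0, sigma_u)
# By-participant pausing probability, varying around 0.25 on the logit scale
theta <- plogis(qlogis(0.25) + rnorm(n_participants, mean = 0, sd = 0.5))

sim <- do.call(rbind, lapply(seq_len(n_participants), function(i) {
  disfluent <- rbinom(n_obs, size = 1, prob = theta[i]) == 1
  meanlog <- beta + u[i] + ifelse(disfluent, delta, 0)
  sdlog   <- ifelse(disfluent, sigma_prime, sigma)
  data.frame(participant = i, disfluent = disfluent,
             iki = rlnorm(n_obs, meanlog = meanlog, sdlog = sdlog))
}))

# Observed pausing rate per participant differs because theta differs
pause_rate <- tapply(sim$disfluent, sim$participant, mean)
```

Each simulated participant has their own baseline speed and their own pausing rate, which is the kind of individual difference in typing style the random effects are meant to absorb.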

Finally, and importantly, condition was included as an indicator for every combination of language (levels: Norwegian (L1), English (L2)) and copy-task component (in short ctc; levels: consonants task, HF words, LF words, sentence, typing speed). This allows us to obtain estimates in every condition for each of the three parameters of interest, namely the fluent typing speed $\beta$, the slowdown for long transitions $\delta$, and the disfluency probability $\theta$. From the posterior of the model we then calculated main effects and interactions as well as estimated marginal cell means and planned pairwise comparisons between languages.

Note, though, that for the typing speed task we modelled $\beta$ but not the pausing probability, as disfluencies in the typing speed task have no cognitive interpretation and are assumed to be entirely random. This was achieved by including an if-condition in the model so that IKIs of the typing speed task are modelled as

\[ \begin{align}\tag{2.2} \text{iki}_{i} \sim \ \text{logN}(\beta_\text{condition[i]} + u_i, \sigma_\text{ctc})\\ \end{align} \]

which corresponds to the second line of the model in equation (2.1).

The results of the mixed-effects mixture model are summarised in Table 2.2, which shows main effects and interactions for language and copy-task component for each model parameter.

Table 2.2: Mixture model results of the inter-keystroke intervals with estimates for the distribution of fluent transitions, the slowdown for long transitions (both on log scale) and the probability of long transitions (on logit scale). Estimates are shown with 95% PI.
	Fluent transitions		Slowdown for disfluencies		Probability of disfluencies
Predictor	Estimate	$BF_{10}$	Estimate	$BF_{10}$	Estimate	$BF_{10}$
Main effects
Language (L1, L2)	-0.04 [-0.07, -0.01]	0.37	0.01 [-0.04, 0.06]	0.03	-0.2 [-0.4, 0]	0.67
CTC 1 (Consonants, LF words)	-0.06 [-0.17, 0.04]	0.09	0.64 [0.55, 0.73]	> 100	1.89 [1.44, 2.37]	> 100
CTC 2 (LF words, HF words)	0.33 [0.3, 0.35]	> 100	0.2 [0.12, 0.27]	> 100	0.68 [0.38, 0.97]	> 100
CTC 3 (LF words, HF words)	-0.14 [-0.15, -0.13]	> 100	-0.07 [-0.14, 0]	0.23	-0.21 [-0.46, 0.05]	0.47
CTC 4 (HF words, typing speed)	0.71 [0.7, 0.73]	> 100
Two-way interactions
Language : CTC1	-0.28 [-0.42, -0.14]	41.26	-0.02 [-0.18, 0.14]	0.08	-1.83 [-2.49, -1.2]	> 100
Language : CTC2	-0.02 [-0.06, 0.02]	0.04	-0.09 [-0.23, 0.05]	0.16	-1.07 [-1.6, -0.55]	> 100
Language : CTC3	0.07 [0.04, 0.1]	> 100	0.02 [-0.11, 0.15]	0.07	0.53 [0.06, 1.01]	2.69
Language : CTC4	0.13 [0.1, 0.16]	> 100
Note:
Colon indicates interactions. PI is the probability interval. $BF_{10}$ is the evidence in favour of the alternative hypothesis over the null hypothesis. CTC is short for ‘copy-task component’

Cell means with 95% PIs corresponding to the main effects and interactions in Table 2.2 are illustrated in Figure 2.2 and summarised numerically, along with planned by-language comparisons, in Table 2.3.

Fluent transitions were found to be longer in L2 in the HF words task, the LF words task, and the sentence task. As one would expect, the non-lexical tasks (the consonants task and the typing speed task) showed no language differences, as these tasks are language independent and identical in both languages. No language difference was observed for the disfluency duration parameter: the slowdown participants experienced in any task was the same for Norwegian and English, suggesting that the duration of pauses did not reflect L2-related processing. However, we observed a substantially higher probability of disfluencies in L2 copy-typing for the LF words task.

Figure 2.2: Estimated cell means of transition duration with 95% PIs (probability intervals). Shown are the estimates for the mixture component of long transition durations in msecs (on log scale) and the probability of long transition durations.

Table 2.3: Key transitions of copy task. Cell means for L1 and L2 with values in msecs for fluent key transitions and the slowdown for long transitions, and as probabilities for long transition durations. Language differences are shown on log scale (for transition durations) and on logit scale (for the probability of long transition durations). 95% PIs in brackets.
Copy-task component	L1	L2	Language effect	$BF_{10}$
Fluent transitions
Consonants	266 [227, 306]	232 [211, 256]	0.13 [-0.01, 0.26]	0.47
HF words	179 [172, 185]	202 [195, 210]	-0.12 [-0.14, -0.1]	> 100
LF words	245 [235, 255]	283 [270, 297]	-0.15 [-0.18, -0.11]	> 100
Sentence	161 [155, 167]	170 [164, 176]	-0.05 [-0.07, -0.04]	> 100
Typing speed	93 [90, 97]	93 [89, 96]	0.01 [-0.02, 0.03]	0.01
Disfluencies
Consonants	669 [612, 730]	608 [561, 657]	-0.03 [-0.15, 0.1]	0.07
HF words	104 [85, 126]	104 [86, 124]	0.05 [-0.04, 0.14]	0.07
LF words	207 [170, 250]	264 [230, 300]	-0.05 [-0.15, 0.06]	0.08
Sentence	80 [65, 96]	68 [53, 86]	0.06 [-0.03, 0.16]	0.12
Probability of disfluencies
Consonants	.85 [.78, .91]	.78 [.71, .84]	0.5 [0.02, 1.02]	1.85
HF words	.24 [.19, .28]	.28 [.24, .34]	-0.25 [-0.59, 0.08]	0.51
LF words	.26 [.20, .33]	.57 [.49, .64]	-1.32 [-1.73, -0.92]	> 100
Sentence	.25 [.20, .29]	.20 [.16, .24]	0.28 [-0.05, 0.61]	0.65
Note:
PIs are probability intervals. $BF_{10}$ is the evidence in favour of the alternative hypothesis over the null hypothesis.
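The relation between the cell means and the language effects in Table 2.3 can be checked approximately by hand. The sketch below uses the LF words row and assumes that the language effect is the L1 value minus the L2 value on the log scale (durations) or logit scale (probabilities). This reading is inferred from the table rather than stated in the text, and small discrepancies are expected because the reported effects are computed from the full posterior and the cell means are rounded.

```r
# Cell means for the LF words task, taken from Table 2.3
fluent_l1 <- 245; fluent_l2 <- 283  # fluent transitions in msecs
p_l1 <- 0.26;     p_l2 <- 0.57      # probability of disfluencies

log_diff   <- log(fluent_l1) - log(fluent_l2)  # language effect on log scale
logit_diff <- qlogis(p_l1) - qlogis(p_l2)      # language effect on logit scale

round(log_diff, 2)    # close to the reported -0.15
round(logit_diff, 2)  # close to the reported -1.32
```

Read this way, negative language effects correspond to larger values in L2 than in L1.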

3 Text compositions

3.1 Text features

As can be seen in Table 3.1, participants’ L2 texts were shorter in terms of number of words and characters but consisted of more sentences. In L2, sentences were shorter while words were longer; L2 texts were also informationally sparser (they had a lower ratio of open- to closed-class words) and lexically less diverse. Lexical diversity was measured using the MTLD statistic (McCarthy, 2005), a sensitive and text-length independent measure (Torruella & Capsada, 2013).

Table 3.1: Characteristics of texts written in L1 and L2. Mean (SD) and standardised effect of language (with 95% PI).
Measure	L1	L2	Standardised effect	BF
Character count	1793 (706)	1520 (635)	0.4 [0.12, 0.68]	7.04
Word count	446 (171)	362 (151)	0.5 [0.22, 0.78]	49.27
Mean word length	4.0 (.2)	4.2 (.2)	-0.84 [-1.1, -0.57]	>100
Sentence count	11.9 (6.6)	19.5 (9.6)	-0.83 [-1.09, -0.57]	>100
Mean sentence length	49.6 (53.6)	20.7 (8.4)	0.68 [0.39, 0.96]	>100
Open class / closed class	2.1 (.3)	1.1 (.5)	1.54 [1.35, 1.74]	>100
Lexical diversity (MTLD)	121.8 (31.2)	105.6 (28.3)	0.53 [0.25, 0.8]	>100
Note:
Parameter estimates and BFs from a multivariate linear model with language (L1 vs. L2) as predictor.

3.2 Sample

The final sample of data comes from 85 Norwegian students, each of whom produced a text in their L1 (Norwegian) and a text in their L2 (English). Seventeen participants were removed prior to analysis because they contributed only one text.

Table 3.2: Total number of writing events for analysis by language.
Participant id	L1	L2
1	3076	3493
2	2609	2666
3	3304	3560
4	2416	3111
5	2626	1990
6	2440	1850
7	2016	2247
8	767	1454
9	2725	2442
10	3984	2985
11	2612	1066
12	1034	519
13	1123	2427
14	3190	2613
15	1317	1136
16	1370	1389
17	2332	2304
18	3435	2436
19	2370	3040
20	2883	2957
21	5329	4283
22	2654	1436
23	2462	1483
24	2217	2137
25	4128	2827
26	4630	2573
27	3037	3016
28	4567	2373
29	2055	791
30	2552	1630
31	1700	1304
32	2775	1701
33	4389	3209
34	4021	3056
35	2706	2374
36	2852	2233
37	820	630
38	416	1386
39	3182	2877
40	577	616
41	2365	2540
42	3968	2904
43	62	1369
44	2317	2556
45	2816	2434
46	1748	1599
47	3381	3382
48	2915	3126
49	3700	3417
50	2631	2261
51	3771	3113
52	3020	3876
53	2907	1982
54	2043	1036
55	2409	2992
56	1669	1544
57	2932	2225
58	3273	3362
59	2303	3963
60	2515	438
61	2319	1923
62	2407	1789
63	2919	1150
64	6280	3659
65	3474	3043
66	3699	2125
67	2762	3059
68	3480	3995
69	2518	2436
70	2652	2151
71	1225	1081
72	3234	1559
73	1572	1142
74	784	654
75	3376	4422
76	5855	4104
77	2846	2184
78	3955	1042
79	2422	1763
80	2677	2995
81	1022	369
82	2829	2572
83	1310	2813
84	3923	2785
85	3228	2846

3.3 Number of transitions

# Run model
#source("../scripts/models/transition_counts.R")
# Load model posterior
fit <- readRDS(file = "../stanout/transition_counts.rda")

fit$formula

n_transitions ~ 1 + condition + (language | participant)

In this section we analysed the number of transitions in writing data. condition was coded with main effects of Language (levels: L1, L2), Edit (levels: editing, no editing; i.e. whether a key transition terminated in an editing operation), Transition location 1 (levels: before sentence, before word), Transition location 2 (levels: before word, within word), Transition location 3 (levels: after word, within word), and all two and three-way interactions by-Transition location.

fit$family


Family: negbinomial 
Link function: log

In addition we fitted a binomial model to account for the overall number of transitions by participant and language.

# Run model
#source("../scripts/models/transition_binomial.R")
# Load model posterior
fit_binom <- readRDS(file = "../stanout/transition_binomial.rda")

fit_binom$formula

n_transitions | trials(total) ~ condition + (language | participant)

fit_binom$family


Family: binomial 
Link function: logit

Main effects and interactions of the analysis of the transition counts are shown in Table 3.3. The results are illustrated in Figure 3.1, and the corresponding cell means are shown in Table 3.4 along with planned pairwise comparisons between languages.

Table 3.3: Transition-count coefficients.
	log number of transitions		logit prop. of transitions
Predictor	Estimate with 95% PI	$BF_{10}$	Estimate with 95% PI	$BF_{10}$
Main effects
Language (L1, L2)	-0.66 [-1.45, 0.16]	1.46	0.21 [0.04, 0.39]	1.44
Location 1 (before sentence, before word)	-8.54 [-8.82, -8.27]	> 100	-9.01 [-9.17, -8.84]	> 100
Location 2 (before word, within word)	4.2 [3.98, 4.43]	> 100	5.53 [5.45, 5.61]	> 100
Location 3 (after word, within word)	-2.54 [-2.76, -2.32]	> 100	-4 [-4.06, -3.93]	> 100
Edit (edit, no edit)	20.18 [19.82, 20.53]	> 100	23.08 [22.9, 23.26]	> 100
Two-way interactions
Language : Location 1	-0.14 [-0.42, 0.14]	0.22	-0.04 [-0.2, 0.12]	0.09
Language : Location 2	0.23 [0.01, 0.46]	0.91	0.33 [0.25, 0.41]	> 100
Language : Location 3	-0.07 [-0.3, 0.15]	0.14	-0.05 [-0.11, 0.01]	0.11
Language : Edit	-0.73 [-1.07, -0.39]	> 100	-0.54 [-0.72, -0.36]	> 100
Location 1 : Edit	-3 [-3.28, -2.73]	> 100	-3.24 [-3.4, -3.08]	> 100
Location 2 : Edit	0.3 [0.08, 0.53]	3.6	1.68 [1.6, 1.76]	> 100
Location 3 : Edit	-2.11 [-2.33, -1.89]	> 100	-3.43 [-3.5, -3.37]	> 100
Three-way interactions
Location 1 : Edit : Language	2.03 [1.74, 2.33]	> 100	2.19 [2.02, 2.36]	> 100
Location 2 : Edit : Language	-0.14 [-0.36, 0.08]	0.23	-0.19 [-0.27, -0.12]	> 100
Location 3 : Edit : Language	-0.11 [-0.32, 0.1]	0.18	-0.16 [-0.22, -0.1]	> 100
Note:
Colon indicates interactions. $BF_{10}$ is the evidence in favour of the alternative hypothesis over the null hypothesis

Figure 3.1: Estimated cell means for transition counts with 95% PIs (probability intervals).

Table 3.4: Transition counts and percentages. Cell means for L1 and L2 as counts and percentages, and their respective language differences on log scale. 95% PIs in brackets.
	Number of transitions				% transitions
Transition location	L1	L2	Language effect	$BF_{10}$	L1	L2	Language effect	$BF_{10}$
Editing
after words	43 [37, 50]	48 [42, 55]	-0.12 [-0.27, 0.03]	0.26	1.6 [1.6, 1.7]	2.1 [2.1, 2.2]	-0.26 [-0.31, -0.22]	> 100
before sentences	5 [4, 6]	4 [3, 5]	0.19 [-0.03, 0.41]	0.47	0.2 [0.2, 0.2]	0.2 [0.2, 0.2]	0.06 [-0.08, 0.21]	0.1
before words	20 [17, 23]	18 [16, 21]	0.07 [-0.1, 0.24]	0.12	0.8 [0.8, 0.9]	0.8 [0.8, 0.8]	0.05 [-0.01, 0.12]	0.13
within words	49 [42, 57]	54 [48, 62]	-0.11 [-0.26, 0.05]	0.21	1.9 [1.9, 2]	2.4 [2.3, 2.4]	-0.21 [-0.25, -0.17]	> 100
Writing
after words	477 [413, 555]	383 [340, 436]	0.22 [0.07, 0.37]	4.15	18.2 [18, 18.3]	17.1 [17, 17.3]	0.07 [0.06, 0.09]	> 100
before sentences	28 [24, 32]	22 [19, 25]	0.23 [0.07, 0.39]	4.2	1.1 [1.1, 1.1]	1 [1, 1.1]	0.08 [0.03, 0.14]	1.89
before words	493 [427, 573]	411 [364, 467]	0.18 [0.04, 0.33]	1.63	18.7 [18.6, 18.9]	18.2 [18, 18.4]	0.04 [0.02, 0.05]	> 100
within words	1484 [1284, 1727]	1302 [1153, 1480]	0.13 [-0.02, 0.28]	0.36	57.5 [57.3, 57.6]	58.3 [58, 58.5]	-0.03 [-0.04, -0.02]	> 100
Note:
PIs are probability intervals. $BF_{10}$ is the evidence in favour of the alternative hypothesis over the null hypothesis.

3.4 Editing frequency

# Run model
#source("../scripts/models/editing_frequency.R")

# Load model posterior
fit <- readRDS(file = "../stanout/editing_frequency.rda")

fit$formula

edits | trials(total) ~ 1 + condition + (language | participant)

This analysis concerns how frequently a key transition terminated in an editing operation (revision). condition was coded with main effects of Language (levels: L1, L2), Transition location 1 (levels: before sentence, before word), Transition location 2 (levels: before word, within word), Transition location 3 (levels: after word, within word) and all two-way interactions by-Transition location (i.e. no interactions that involve more than one Transition location).

fit$family


Family: binomial 
Link function: logit

Main effects and interactions of the analysis of the editing frequency are shown in Table 3.5. The results are illustrated in Figure 3.2, and the corresponding cell means are shown in Table 3.6 along with planned pairwise comparisons between languages. The results show that L1 and L2 writers edited their texts equally frequently, with one notable exception: L2 writers showed a higher proportion of transitions that terminated in an editing operation at after-word transition locations. There was substantial evidence for a difference at within-word locations as well, but the size of this difference was too small to be meaningful.

Table 3.5: Editing frequency effects on logit scale.
Predictor	Estimate with 95% PI	$BF_{10}$
Main effects
Language (L1, L2)	0.7 [0.43, 0.96]	> 100
Location 1 (before sentence, before word)	2.54 [2.36, 2.72]	> 100
Location 2 (before word, within word)	-0.32 [-0.39, -0.24]	> 100
Location 3 (within word, after word)	2.08 [2.02, 2.14]	> 100
Two-way interactions
Language : Location 1	0.02 [-0.15, 0.2]	0.17
Language : Location 2	0.19 [0.11, 0.27]	> 100
Language : Location 3	0.13 [0.07, 0.19]	> 100
Note:
Colon indicates interactions. $BF_{10}$ is the evidence in favour of the alternative hypothesis over the null hypothesis

Figure 3.2: Estimated cell means for editing frequency with 95% PIs (probability intervals).

Table 3.6: Editing frequency. Cell means for L1 and L2 in proportion and language difference on logit scale both shown with 95% PIs in brackets.
Transition location	L1	L2	Language effect	$BF_{10}$
after words	.08 [.07, .09]	.11 [.10, .12]	-0.37 [-0.44, -0.3]	>100
before sentences	.13 [.11, .14]	.14 [.12, .15]	-0.1 [-0.26, 0.07]	0.17
before words	.04 [.04, .04]	.04 [.04, .05]	-0.05 [-0.13, 0.03]	0.08
within words	.03 [.03, .03]	.04 [.04, .04]	-0.24 [-0.3, -0.17]	>100
Note:
PIs are probability intervals. $BF_{10}$ is the evidence in favour of the alternative hypothesis over the null hypothesis.

3.5 Transition duration

3.5.1 Raw data visualisation

Density plots of the raw keystroke intervals are shown in Figure 3.3 by transition location and language.

Figure 3.3: Density plots of inter-keystroke intervals (IKIs) by transition location and language.

3.5.2 Mixture model results

The model used to analyse transition durations in text composition is largely similar to the model for the copy-task data described in equation (2.1). We provide a brief description for completeness.

The mixture model was implemented in the probabilistic programming language Stan. The keystroke data resulting from the text production process were modelled as coming from a mixture of two log-normal distributions of which one distribution represents fluent transitions and the other disfluencies between keystroke intervals.

\[ \begin{align}\tag{3.1} \text{iki}_{i} \sim \ & \theta_\text{condition[i]} \cdot \text{logN}(\beta_\text{condition} + \delta_\text{condition} + u_i, \sigma_\text{location}') \ +\\ & (1-\theta_\text{condition[i]}) \cdot \text{logN}(\beta_\text{condition} + u_i, \sigma_\text{location})\\ \text{constraints: } & \delta > 0, \sigma' > \sigma\\ \end{align} \]

The two distributions were assumed to have unequal variances in two regards. First, disfluent key transitions have a larger variability than fluent key transitions, which is achieved by constraining $\sigma'$ to be larger than $\sigma$. Second, each transition location (location) has its own variance component.

condition was included as an indicator for every combination of language (levels: Norwegian (L1), English (L2)) and transition location (levels: after words, before words, within words, before sentences). This allows us to obtain estimates in every condition for each of the three parameters of interest, namely the fluent typing speed $\beta$, the slowdown for long transitions $\delta$, and the disfluency probability $\theta$. From the posterior of the model we then calculated main effects and interactions as well as estimated marginal cell means and planned pairwise comparisons between languages.

The model is a mixed effects model because we included random effects for participants: first, we allowed the average typing speed to vary across participants. We assumed that some participants have a typing speed slower than the average $\beta$ and others are on average faster. This is captured by the parameter $u_i$ where $i$ is indexing every participant. The typing speed difference between average and participant $u$ was assumed to be distributed according to $u \sim \text{N}(0, \sigma_u)$. Second, we assumed that the probability of long latencies varies across participants, hence the index $i$ for $\theta$. This is capturing the idea that people vary in their writing styles as to how often they pause to plan upcoming utterances and plan in parallel to production.

Prior to analysis, we removed transition durations that were shorter than (or equal to) 50 msecs (M=1.2%, SE=0.17%) or longer than 15 secs (M=0.2%, SE=0.03%). Values outside of this range were removed because they are unlikely to be representative of text composition and are more likely to be related to, for example, finger slips or the writer engaging in non-composition related activities (browsing, social media, extended reading), respectively. We also removed key intervals that terminated in a revision of the text (M=5.1%, SE=2.39%).
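The trimming rules can be written as a single filter. The sketch below is a base-R illustration in which the column names `iki` (interval in msecs) and `revision` (whether the interval terminated in a revision) and the helper `trim_transitions` are hypothetical.

```r
# Keep intervals longer than 50 msecs and no longer than 15 secs
# that did not terminate in a revision of the text.
trim_transitions <- function(data, lower_ms = 50, upper_ms = 15000) {
  keep <- data$iki > lower_ms & data$iki <= upper_ms & !data$revision
  data[keep, , drop = FALSE]
}

# Toy data covering each exclusion rule
toy <- data.frame(
  iki      = c(12, 50, 180, 950, 15000, 42000, 230),
  revision = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)
trimmed <- trim_transitions(toy)
```

In the toy data the 12 and 50 msecs intervals are dropped as too short, the 42,000 msecs interval as too long, and the 230 msecs interval because it ended in a revision.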

To reduce the time and computational power necessary to run the statistical models, we randomly selected a subset of 150 observations per participant, language, and transition location wherever the number of observations exceeded 150.

Therefore, the final sample contained a subset of the full text production data set for most transition locations. The percentages below give the by-participant average proportion of observations used per transition location and language, with standard errors (SE) indicating the variability across participants: after-words transitions (L1: M=35.7%, SE=5.2%; L2: M=44.2%, SE=5.4%), before-words transitions (L1: M=34.6%, SE=5.2%; L2: M=41.8%, SE=5.3%), within-words transitions (L1: M=13.2%, SE=3.7%; L2: M=14.3%, SE=3.8%), before-sentences transitions (L1: M=100%, SE=0%; L2: M=100%, SE=0%).

As discussed above, we modelled transition durations (time between first and second keypress) as finite mixture models with fixed effects for language, for location in text, and for keystrokes that represented ongoing production as opposed to keystrokes that were followed by a text revision. For an overview of mixture models see Gelman et al. (2014, pp. 519–524); for applications to keystroke data see Hall et al. (2022) and Roeser et al. (2021). We assumed two data-generating processes: one associated with fluent typing, generating a distribution of relatively short transition durations, and another associated with longer durations (i.e. disfluencies), in which the demands of upstream (higher-level) processing affect the time needed to plan the next keystroke.

The results of the mixed-effects mixture model are summarised in Table 3.7, which shows main effects and interactions for language and transition location for each model parameter.

We found strong evidence for longer transition intervals and a larger proportion of hesitations at higher-level text locations (after word < within word < before word / before sentence). In L2 writing, however, transitions were longer at before-sentence than at before-word locations, whereas no such difference was found in L1 writing; L1 writing also showed fewer pauses before words.

There was strong evidence of a main effect of language. This represents a general tendency of longer fluent transitions and a larger probability of hesitant transitions in L2 while the slowdown for hesitations was similar across languages.

Table 3.7: Mixture model results of the transition duration with the predictor estimates for the distribution of fluent transitions and the slowdown for long transitions (on log scale) and the probability of long transitions / disfluencies (on logit scale). Estimates are shown with 95% PI.
	Fluent transitions		Slowdown for disfluencies		Probability of disfluencies
Predictor	Estimate	$BF_{10}$	Estimate	$BF_{10}$	Estimate	$BF_{10}$
Main effects
Language (L1, L2)	-0.07 [-0.09, -0.06]	>100	-0.04 [-0.08, 0]	0.16	-0.33 [-0.49, -0.17]	>100
Location 1 (before sentence, before word)	0.15 [0.06, 0.23]	6.45	1.05 [0.91, 1.16]	>100	-0.87 [-1.23, -0.46]	>100
Location 2 (before word, within word)	0.33 [0.31, 0.34]	>100	0.35 [0.3, 0.4]	>100	1.6 [1.36, 1.84]	>100
Location 3 (after word, within word)	-0.15 [-0.16, -0.14]	>100	0.3 [0.24, 0.36]	>100	0.25 [0.03, 0.48]	1.34
Two-way interactions
Language : Location 1	0.16 [0.1, 0.21]	>100	-0.05 [-0.16, 0.06]	0.08	0.28 [-0.2, 0.76]	0.48
Language : Location 2	-0.11 [-0.14, -0.08]	>100	0.15 [0.06, 0.24]	8.86	-0.11 [-0.55, 0.34]	0.25
Language : Location 3	0.05 [0.04, 0.07]	>100	0.08 [-0.03, 0.18]	0.15	0.13 [-0.31, 0.56]	0.26
Note:
Colon indicates interactions. PI is the probability interval. $BF_{10}$ is the evidence in favour of the alternative hypothesis over the null hypothesis.

We unpacked these effects by calculating the effect of language by transition location. Cell means with 95% PIs corresponding to the main effects and interactions in Table 3.7 are illustrated in Figure 3.4 and summarised numerically, along with planned by-language comparisons, in Table 3.8. The following differences were found for writing in L2. Transition durations, by which we mean fluent transitions between keystrokes, were generally longer when writing in L2 at before-word and within-word locations. Writing in L2 also led to more pauses at the same locations. Interestingly, the slowdown for pauses was similar in L1 and L2 across all text locations. We also observed no changes related to L2 writing at before-sentence and after-word locations. These results suggest that disfluencies in L2 writing are associated with word-level difficulty, that is, with lexical retrieval and spelling retrieval, of which presumably the latter was not completed at word-production onset.

Figure 3.4: Estimated cell means of transition durations with 95% PIs (probability intervals). Shown are the estimates for each mixture component (short and long transitions) and the probability of disfluencies.
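
The three quantities shown in the figure can be read as the parameters of a two-component mixture. The following Python sketch simulates such a process under stated assumptions: log-normal components, a fluent location `beta`, a non-negative slowdown `delta` added on the log scale for disfluent transitions, and a mixing probability `theta`. The before-word L2 cell means from Table 3.8 are used as illustrative parameter values. The standard deviations and the function name `simulate_transitions` are hypothetical and not taken from the analysis.

```python
import math
import random
from statistics import median


def simulate_transitions(n, beta, delta, theta,
                         sigma_fluent, sigma_disfluent, seed=0):
    """Simulate transition durations (ms) from a two-component mixture.

    Returns a list of (duration, is_disfluent) pairs. With probability
    `theta` a transition is disfluent and its log duration is centred on
    beta + delta; otherwise it is fluent and centred on beta.
    """
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        is_disfluent = rng.random() < theta
        if is_disfluent:
            mu, sigma = beta + delta, sigma_disfluent
        else:
            mu, sigma = beta, sigma_fluent
        sample.append((math.exp(rng.gauss(mu, sigma)), is_disfluent))
    return sample


# Illustrative values: before-word transitions in L2 (Table 3.8).
beta = math.log(284)            # fluent transitions, about 284 ms
delta = math.log(856 / 284)     # slowdown on the log scale
theta = 0.59                    # probability of a disfluency

sample = simulate_transitions(20000, beta, delta, theta,
                              sigma_fluent=0.3,     # hypothetical
                              sigma_disfluent=0.6)  # hypothetical

share_disfluent = sum(flag for _, flag in sample) / len(sample)
median_fluent = median(d for d, flag in sample if not flag)
median_disfluent = median(d for d, flag in sample if flag)
print(f"share of disfluent transitions: {share_disfluent:.2f}")
print(f"median fluent duration: {median_fluent:.0f} ms")
print(f"median disfluent duration: {median_disfluent:.0f} ms")
```

In the simulation the component labels are known. In the fitted model they are not observed, and the mixing probability is estimated from the shape of the duration distribution.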

Table 3.8: Transition duration. Cell means for L1 and L2, with durations in ms and disfluencies as probabilities. Language differences are shown on the log scale for durations and on the logit scale for the probability of disfluencies. 95% PIs in brackets.
Transition location	L1	L2	Language effect	$BF_{10}$
Fluent transitions
after words	160 [152, 167]	162 [155, 170]	-0.02 [-0.03, -0.01]	1.09
before sentences	299 [270, 326]	306 [276, 335]	-0.02 [-0.08, 0.03]	0.04
before words	237 [226, 249]	284 [270, 299]	-0.18 [-0.2, -0.16]	>100
within words	181 [173, 190]	194 [185, 203]	-0.07 [-0.08, -0.06]	>100
Slowdown for disfluencies
after words	453 [420, 488]	481 [449, 515]	-0.04 [-0.12, 0.03]	0.08
before sentences	2,584 [2,014, 3,136]	2,706 [2,181, 3,222]	-0.02 [-0.13, 0.08]	0.06
before words	735 [690, 782]	856 [807, 907]	0.03 [-0.02, 0.07]	0.05
within words	367 [340, 396]	443 [413, 476]	-0.12 [-0.2, -0.04]	4.83
Probability of disfluencies
after words	.20 [.17, .24]	.25 [.21, .29]	-0.26 [-0.55, 0.04]	0.63
before sentences	.30 [.23, .39]	.34 [.27, .43]	-0.2 [-0.56, 0.16]	0.34
before words	.47 [.41, .52]	.59 [.53, .64]	-0.49 [-0.8, -0.17]	16.82
within words	.16 [.13, .19]	.21 [.18, .25]	-0.38 [-0.69, -0.07]	2.78
Note:
PIs are probability intervals. $BF_{10}$ is the evidence in favour of the alternative hypothesis over the null hypothesis.
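
The sign convention and scales of the language effects can be checked against the rounded cell means. The Python sketch below assumes that the language effect is the L1 value minus the L2 value, on the log scale for fluent transitions and on the logit scale for probabilities, and that the slowdown is the log ratio of disfluent to fluent durations within each language. These assumptions are inferred from the table, not stated in it. The reported estimates were presumably computed from posterior samples, so agreement with this point-estimate arithmetic holds only up to rounding.

```python
import math

# Rounded cell means (L1, L2) copied from Table 3.8.
fluent = {"before words": (237, 284), "within words": (181, 194)}
disfluent = {"before words": (735, 856), "within words": (367, 443)}
probability = {"before words": (0.47, 0.59)}


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1 - p))


def fluent_effect(location):
    """L1 minus L2 difference in fluent durations on the log scale."""
    l1, l2 = fluent[location]
    return math.log(l1) - math.log(l2)


def slowdown_effect(location):
    """L1 minus L2 difference in slowdown (log disfluent / fluent ratio)."""
    f1, f2 = fluent[location]
    d1, d2 = disfluent[location]
    return math.log(d1 / f1) - math.log(d2 / f2)


def probability_effect(location):
    """L1 minus L2 difference in disfluency probability on the logit scale."""
    p1, p2 = probability[location]
    return logit(p1) - logit(p2)


for location in fluent:
    print(location,
          round(fluent_effect(location), 2),
          round(slowdown_effect(location), 2))
print("before words (probability)",
      round(probability_effect("before words"), 2))
```

For before-word transitions this gives about -0.18 for fluent transitions, 0.03 for the slowdown and -0.48 for the probability of disfluencies, against -0.18, 0.03 and -0.49 in the table. Negative values therefore indicate longer durations or more frequent disfluencies in L2.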

References

Almond, R., Deane, P., Quinlan, T., Wagner, M., & Sydorenko, T. (2012). A preliminary analysis of keystroke log data from a timed writing task. ETS Research Report Series, 2012(2), i–61.

Baaijen, V. M., Galbraith, D., & De Glopper, K. (2012). Keystroke analysis: Reflections on procedures and measures. Written Communication, 29(3), 246–277.

Baguley, T. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.

Bürkner, P.-C. (2017a). Advanced Bayesian multilevel modeling with the R package brms. arXiv Preprint arXiv:1705.11123.

Bürkner, P.-C. (2017b). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01

Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., & Riddell, A. (2016). Stan: A probabilistic programming language. Journal of Statistical Software, 20.

Dickey, J. M., & Lientz, B. P. (1970). The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. The Annals of Mathematical Statistics, 41(1), 214–226.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman and Hall/CRC.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

Hall, S., Baaijen, V. M., & Galbraith, D. (2022). Constructing theoretically informed measures of pause duration in experimentally manipulated writing. Reading and Writing, 1–29.

Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.

Jeffreys, H. (1961). The theory of probability (Vol. 3). Oxford University Press, Clarendon Press.

Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.

Li, T. (2021). Identifying mixture components from large-scale keystroke log data. Frontiers in Psychology, 12, 628660.

McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) [PhD thesis]. The University of Memphis.

McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Roeser, J., De Maeyer, S., Leijten, M., & Van Waes, L. (2021). Modelling typing disfluencies as finite mixture process. Reading and Writing, 1–26.

Stan Development Team. (n.d.). RStan: The R interface to Stan. https://mc-stan.org/

Torruella, J., & Capsada, R. (2013). Lexical statistics and tipological structures: A measure of lexical richness. Procedia-Social and Behavioral Sciences, 95, 447–454.

Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60(3), 158–189.

	Fluent transitions		Slowdown for disfluencies		Probability of disfluencies
Predictor	Estimate	\(BF_{10}\)	Estimate	\(BF_{10}\)	Estimate	\(BF_{10}\)
Main effects
Language (L1, L2)	-0.04 [-0.07, -0.01]	0.37	0.01 [-0.04, 0.06]	0.03	-0.2 [-0.4, 0]	0.67
CTC 1 (Consonants, LF words)	-0.06 [-0.17, 0.04]	0.09	0.64 [0.55, 0.73]	> 100	1.89 [1.44, 2.37]	> 100
CTC 2 (LF words, HF words)	0.33 [0.3, 0.35]	> 100	0.2 [0.12, 0.27]	> 100	0.68 [0.38, 0.97]	> 100
CTC 3 (LF words, HF words)	-0.14 [-0.15, -0.13]	> 100	-0.07 [-0.14, 0]	0.23	-0.21 [-0.46, 0.05]	0.47
CTC 4 (HF words, typing speed)	0.71 [0.7, 0.73]	> 100
Two-way interactions
Language : CTC1	-0.28 [-0.42, -0.14]	41.26	-0.02 [-0.18, 0.14]	0.08	-1.83 [-2.49, -1.2]	> 100
Language : CTC2	-0.02 [-0.06, 0.02]	0.04	-0.09 [-0.23, 0.05]	0.16	-1.07 [-1.6, -0.55]	> 100
Language : CTC3	0.07 [0.04, 0.1]	> 100	0.02 [-0.11, 0.15]	0.07	0.53 [0.06, 1.01]	2.69
Language : CTC4	0.13 [0.1, 0.16]	> 100
Note:
Colon indicates interactions. PI is the probability interval. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis. CTC is short for ‘copy-task component’.