L2 writing fluency in Norwegian students

1 Data analysis

All data were analysed in Bayesian mixed-effects models using the R package brms (Bürkner, 2017b, 2017a) and the probabilistic programming language Stan (Carpenter et al., 2016; Hoffman & Gelman, 2014). Models were, where plausible, fitted with random intercepts for participants and by-participant slope adjustments for Language (text written in L1 or L2).

We report the most probable posterior (i.e. inferred) parameter value together with the interval that contains the parameter value with 95% probability, the 95% probability interval (henceforth, PI). We also quantified the statistical support for the effects of interest, that is, the support for the alternative hypothesis over the null hypothesis. This evidence was expressed as Bayes Factors (henceforth, BF) calculated using the Savage-Dickey method (see, e.g., Dickey et al., 1970; Wagenmakers et al., 2010). A BF larger than 5 indicates moderate evidence, and a BF larger than 10 strong evidence, for a statistically meaningful effect compared to the null hypothesis (see, e.g., Baguley, 2012; Jeffreys, 1961; Lee & Wagenmakers, 2014). For example, BF = 2 means that the data are two times more likely under the alternative hypothesis than under the null hypothesis. Priors for all effects were weakly informative. Because BFs are sensitive to the prior distribution, the weakly informative priors for the slope parameters were chosen so that they favour the null hypothesis rather than the alternative hypothesis.
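The Savage-Dickey calculation can be illustrated with a minimal base-R sketch. It is not the code used for the reported analyses: the Normal(0, 1) prior, the simulated stand-in for posterior draws, and all object names are assumptions made for illustration only.

```r
# Savage-Dickey density ratio: BF10 = prior density at 0 / posterior density at 0.
# All numbers below are hypothetical and only illustrate the calculation.
set.seed(1)

prior_sd  <- 1                                     # assumed prior: Normal(0, 1)
posterior <- rnorm(20000, mean = 0.25, sd = 0.1)   # stand-in for MCMC draws of a slope

# Prior density at the null value (known analytically for a normal prior)
prior_at_zero <- dnorm(0, mean = 0, sd = prior_sd)

# Posterior density at the null value, estimated from the draws by kernel density
dens <- density(posterior)
post_at_zero <- approx(dens$x, dens$y, xout = 0)$y

# Values above 1 favour the alternative hypothesis, values below 1 the null
bf10 <- prior_at_zero / post_at_zero
bf10
```

Because the posterior density at zero is estimated from the tail of the draws, the resulting BF becomes noisy when the posterior lies far from zero, which is one reason very large values are reported only as > 100.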

Transition durations were analysed in mixture models as described in Roeser et al. (2021). This approach is important for distinguishing fluent transition durations, associated with a smooth flow of activation from higher levels of activation into finger movements, from transitions that were inhibited by, for example, difficulty retrieving the correct spelling or the right word. Rather than imposing threshold values to distinguish between simple and more demanding events, mixture models treat the data as a combination of two processes and use a mixing weight to capture the probability that processing difficulty occurs. Random by-participant intercepts were modelled for the pausing probability (i.e. mixing proportion) and the duration of fluent key transitions. These random effects allow the mixture models to capture individual differences in typing style. A detailed description and a tutorial for Bayesian mixed-effects mixture models for keystroke data can be found in Roeser et al. (2021); see also Almond et al. (2012), Baaijen et al. (2012), Hall et al. (2022), and Li (2021).

Data were analysed in Bayesian mixed-effects models (Gelman et al., 2014; McElreath, 2016). The R (R Core Team, 2020) package rstan (Stan Development Team, n.d.) was used to interface with the probabilistic programming language Stan (Carpenter et al., 2016), in which all models were implemented. Models were fitted with weakly informative priors (see McElreath, 2016) and run with at least 4,000 iterations on 4 chains, with a warm-up of 2,000 iterations and no thinning. Model convergence was confirmed by the Gelman-Rubin statistic (\(\hat{R}\) = 1) (Gelman & Rubin, 1992) and by inspection of the Markov chain Monte Carlo chains. For mixture models we used 20,000 iterations with a warm-up of 10,000 iterations on 6 chains.
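To make the convergence criterion concrete, the following sketch computes the classic potential scale reduction factor of Gelman and Rubin (1992) for a single parameter. It is an illustration under stated assumptions: the function name `rhat_basic` and the simulated `draws` matrix are hypothetical, and Stan reports a refined split-chain variant of this statistic rather than this exact formula.

```r
# Classic potential scale reduction factor (Gelman & Rubin, 1992) for one parameter.
# `draws` is a hypothetical iterations x chains matrix of post-warm-up samples.
rhat_basic <- function(draws) {
  n <- nrow(draws)                       # iterations per chain
  chain_means <- colMeans(draws)
  B <- n * var(chain_means)              # between-chain variance
  W <- mean(apply(draws, 2, var))        # mean within-chain variance
  var_plus <- (n - 1) / n * W + B / n    # pooled estimate of the posterior variance
  sqrt(var_plus / W)
}

set.seed(2)
# Four well-mixed chains with 2,000 post-warm-up iterations each
draws <- matrix(rnorm(2000 * 4), nrow = 2000, ncol = 4)
rhat_good <- rhat_basic(draws)

# Shifting one chain away from the others mimics a failure to converge
draws_bad <- draws
draws_bad[, 1] <- draws_bad[, 1] + 3
rhat_bad <- rhat_basic(draws_bad)

c(converged = rhat_good, not_converged = rhat_bad)
```

Values close to 1 indicate that the between-chain and within-chain variances agree, whereas the shifted chain pushes the statistic well above 1.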

2 Copy task

Each participant (n = 101) completed a copy task in English and Norwegian.

2.1 Raw data visualisation

Density plots of the raw keystroke intervals are shown in Figure 2.1 by copy-task component and language.

Figure 2.1: Density plots of inter-keystroke intervals (IKIs) by copy-task component.

2.2 Data reduction

This analysis focuses on transition duration. Transition durations shorter than (or equal to) 10 msecs were removed before analysis (M=1.18%, SE=0.12%).

Among the copy-task components, participants had to copy 4 word triplets (phrases), of which 3 consisted predominantly of high-frequency bigrams and 1 contained low-frequency bigrams. From the first 3 word triplets we removed bigrams that were not highly frequent, and from the last triplet we used only low-frequency bigrams for analysis. Therefore, the HF-words task included only high-frequency bigrams, and the LF-words task included only low-frequency bigrams.

We randomly selected a subset of 50 observations per participant, language, and copy-task component to reduce the time and computational power necessary to run the statistical models. We therefore used subsets of the data for (most) copy-task components. The by-participant average percentage of observations used per copy-task component and language was as follows, with standard errors (SE) indicating the variability across participants: consonants task (Norwegian: M=100%, SE=0%; English: M=100%, SE=0%), HF-words task (Norwegian: M=13.6%, SE=3.4%; English: M=13.4%, SE=3.4%), LF-words task (Norwegian: M=100%, SE=0%; English: M=100%, SE=0%), sentence task (Norwegian: M=76.4%, SE=4.2%; English: M=74%, SE=4.4%), typing-speed task (Norwegian: M=40.7%, SE=4.9%; English: M=40.1%, SE=4.9%). The average number of bigrams used in the final sample is shown in Table 2.1.
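The subsampling step can be sketched in base R as follows. The data frame `d`, its column names, and the simulated values are hypothetical; only the cap of 50 observations per participant, language, and copy-task component is taken from the text.

```r
set.seed(3)
# Hypothetical long-format keystroke data: one row per key transition
d <- data.frame(
  participant = rep(1:3, each = 600),
  language    = rep(c("L1", "L2"), times = 900),
  component   = sample(c("consonants", "HF words", "sentence"), 1800, replace = TRUE),
  iki         = rlnorm(1800, meanlog = 5, sdlog = 0.5)
)

max_n <- 50  # cap per participant x language x component cell

# Split into cells, then keep small cells whole and subsample large ones
cells <- split(d, list(d$participant, d$language, d$component), drop = TRUE)
sub <- do.call(rbind, lapply(cells, function(cell) {
  if (nrow(cell) <= max_n) return(cell)
  cell[sample(nrow(cell), max_n), , drop = FALSE]
}))

# Largest cell size after subsampling
max(table(sub$participant, sub$language, sub$component))
```

Cells with fewer than 50 observations are kept in full, which is why some components in the percentages above are used at 100%.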

Table 2.1: Average number of bigrams per participant in sample used for statistical modelling. Values are shown by copy-task component and language. Standard error (SE) is shown in parentheses.
Copy-task component L1 L2
Consonants 9.1 (0.1) 15 (0.1)
HF words 50 (0) 50 (0)
LF words 26.1 (0.3) 24.7 (0.4)
Sentence 47.1 (0.8) 47.4 (0.7)
Typing speed 48.5 (0.7) 49.6 (0.3)

2.3 Mixture model results

The mixture model was implemented in the probabilistic programming language Stan. The copy-typing process was constrained to be a mixture of two log-normal distributions of which one distribution represents fluent transitions and the other disfluencies.

\[ \begin{align}\tag{2.1} \text{iki}_{i} \sim \ & \theta_\text{condition[i]} \cdot \text{logN}(\beta_\text{condition} + \delta_\text{condition} + u_i, \sigma_\text{ctc}') \ +\\ & (1-\theta_\text{condition[i]}) \cdot \text{logN}(\beta_\text{condition} + u_i, \sigma_\text{ctc})\\ \text{constraints: } & \delta > 0, \sigma' > \sigma\\ \end{align} \]

The positively constrained parameter \(\delta\) ensures that the distribution on the first line of the equation is located at larger values than the distribution on the second line: both distributions share the location \(\beta\), but the first is incremented by \(\delta\). Therefore, \(\beta\) captures the average transition duration of fluent keystroke pairs and \(\delta\) the additional slowdown for long keystroke pairs. The relative weighting of these two distributions is captured by the mixing proportion \(\theta\), which is parameterised so that it represents the relative weight of disfluent keystrokes and can, therefore, be understood as the pausing probability (Roeser et al., 2021). This is achieved by multiplying the distribution of slow key transitions by \(\theta\) and the distribution of fluent transitions by the complement \(1-\theta\).

The two distributions were assumed to have unequal variances in two regards. First, disfluent key transitions have a larger variability than fluent key transitions, which is achieved by constraining \(\sigma'\) to be larger than \(\sigma\). Second, each copy-task component (ctc) has its own variance component, as the tasks are inherently different in terms of complexity.
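The structure of the mixture in equation (2.1) can be illustrated for a single condition with a short base-R sketch. This is not the Stan implementation: the random effect \(u_i\) is omitted for brevity, and the function name `dmix` and all parameter values are made up for illustration.

```r
# Density of a two-component log-normal mixture, mirroring equation (2.1)
# for one condition (participant effect u omitted for brevity).
dmix <- function(x, beta, delta, theta, sigma, sigma_prime) {
  stopifnot(delta > 0, sigma_prime > sigma, theta >= 0, theta <= 1)
  theta * dlnorm(x, meanlog = beta + delta, sdlog = sigma_prime) +  # disfluent
    (1 - theta) * dlnorm(x, meanlog = beta, sdlog = sigma)          # fluent
}

# Simulate IKIs from the same mixture with hypothetical parameter values
set.seed(4)
n <- 10000
beta <- log(180); delta <- 1; theta <- 0.25
sigma <- 0.3; sigma_prime <- 0.6

disfluent <- rbinom(n, 1, theta) == 1          # which transitions are pauses
iki <- ifelse(disfluent,
              rlnorm(n, beta + delta, sigma_prime),
              rlnorm(n, beta, sigma))

mean(disfluent)             # should be close to theta
median(iki[!disfluent])     # should be close to exp(beta), i.e. 180 msecs
dmix(c(180, 500), beta, delta, theta, sigma, sigma_prime)
```

In the simulation the component membership is known, whereas the fitted model has to infer \(\theta\), \(\beta\), and \(\delta\) from the pooled durations alone, which is why no fixed pause threshold is needed.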

The model is a mixed-effects model because we included random effects for participants. First, we allowed the average typing speed to vary across participants. We assumed that some participants have a typing speed slower than the average \(\beta\) and others are on average faster. This is captured by the parameter \(u_i\), where \(i\) indexes participants. The typing speed difference between the average and each participant, \(u\), was assumed to be distributed according to \(u \sim \text{N}(0, \sigma_u)\). Second, we assumed that the probability of long latencies varies across participants, hence the index \(i\) for \(\theta\). This variability in pausing frequency reflects different copy-typing styles and captures the extent to which participants use their memory or touch typing rather than pausing and looking up during copy typing.

Finally, and importantly, condition was included as an indicator for every combination of language (levels: Norwegian (L1), English (L2)) and copy-task component (in short ctc; levels: consonants task, HF words, LF words, sentence, typing speed). This allows us to obtain estimates for each of the three parameters of interest, namely the fluent typing speed \(\beta\), the slowdown for long transitions \(\delta\), and the disfluency probability \(\theta\). From the posterior of the model we then calculated main effects and interactions as well as planned pairwise comparisons between languages and estimated marginal cell means.

Note, though, that for the typing speed task we modelled \(\beta\) but not the pausing probability, as disfluencies in the typing speed task have no cognitive interpretation but are assumed to be entirely random. This was achieved by including an if-condition in the model so that IKIs of the typing speed task are modelled as

\[ \begin{align}\tag{2.2} \text{iki}_{i} \sim \ \text{logN}(\beta_\text{condition[i]} + u_i, \sigma_\text{ctc})\\ \end{align} \]

which corresponds to the second line of the model in equation (2.1).

The results of the mixed-effects mixture model are summarised in Table 2.2, which shows main effects and interactions for language and copy-task component for each model parameter.

Table 2.2: Mixture model results of the inter-keystroke intervals with estimates for the distribution of fluent transitions, the slowdown for long transitions (both on log scale) and the probability of long transitions (on logit scale). Estimates are shown with 95% PI.
| Predictor | Fluent transitions: Estimate | Fluent transitions: \(BF_{10}\) | Slowdown for disfluencies: Estimate | Slowdown for disfluencies: \(BF_{10}\) | Probability of disfluencies: Estimate | Probability of disfluencies: \(BF_{10}\) |
|---|---|---|---|---|---|---|
| Main effects | | | | | | |
| Language (L1, L2) | -0.04 [-0.07, -0.01] | 0.37 | 0.01 [-0.04, 0.06] | 0.03 | -0.2 [-0.4, 0] | 0.67 |
| CTC 1 (Consonants, LF words) | -0.06 [-0.17, 0.04] | 0.09 | 0.64 [0.55, 0.73] | > 100 | 1.89 [1.44, 2.37] | > 100 |
| CTC 2 (LF words, HF words) | 0.33 [0.3, 0.35] | > 100 | 0.2 [0.12, 0.27] | > 100 | 0.68 [0.38, 0.97] | > 100 |
| CTC 3 (Sentence, HF words) | -0.14 [-0.15, -0.13] | > 100 | -0.07 [-0.14, 0] | 0.23 | -0.21 [-0.46, 0.05] | 0.47 |
| CTC 4 (HF words, typing speed) | 0.71 [0.7, 0.73] | > 100 | | | | |
| Two-way interactions | | | | | | |
| Language : CTC1 | -0.28 [-0.42, -0.14] | 41.26 | -0.02 [-0.18, 0.14] | 0.08 | -1.83 [-2.49, -1.2] | > 100 |
| Language : CTC2 | -0.02 [-0.06, 0.02] | 0.04 | -0.09 [-0.23, 0.05] | 0.16 | -1.07 [-1.6, -0.55] | > 100 |
| Language : CTC3 | 0.07 [0.04, 0.1] | > 100 | 0.02 [-0.11, 0.15] | 0.07 | 0.53 [0.06, 1.01] | 2.69 |
| Language : CTC4 | 0.13 [0.1, 0.16] | > 100 | | | | |
Note:
Colon indicates interactions. PI is the probability interval. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis. CTC is short for ‘copy-task component’

Cell means with 95% PIs corresponding to the main effects and interactions in Table 2.2 are illustrated in Figure 2.2 and summarised numerically, along with planned by-language comparisons, in Table 2.3.

Fluent transitions were found to be longer in L2 in the HF words task, the LF words task, and the sentence task. As one would expect, the non-lexical tasks (the consonants task and the typing speed task) showed no language differences, as these tasks are language independent and identical in both languages. No language difference was observed for the disfluency duration parameter: the slowdown participants experienced was the same for Norwegian and English in every task, suggesting that pauses did not reflect L2-related processing. However, we observed a substantially higher probability of disfluencies in L2 copy-typing for the LF words task.

Figure 2.2: Estimated cell means of transition duration with 95% PIs (probability intervals). Shown are the estimates for the mixture component of long transition durations in msecs (on log scale) and the probability of long transition durations.

Table 2.3: Key transitions of the copy task. Cell means for L1 and L2 with values in msecs for fluent key transitions and the slowdown for long transitions, and the probability of long transition durations. Language differences are shown on log scale (for transition durations) and on logit scale (for the probability of long transition durations). 95% PIs in brackets.
Copy-task component L1 L2 Language effect \(BF_{10}\)
Fluent transitions
Consonants 266 [227, 306] 232 [211, 256] 0.13 [-0.01, 0.26] 0.47
HF words 179 [172, 185] 202 [195, 210] -0.12 [-0.14, -0.1] > 100
LF words 245 [235, 255] 283 [270, 297] -0.15 [-0.18, -0.11] > 100
Sentence 161 [155, 167] 170 [164, 176] -0.05 [-0.07, -0.04] > 100
Typing speed 93 [90, 97] 93 [89, 96] 0.01 [-0.02, 0.03] 0.01
Disfluencies
Consonants 669 [612, 730] 608 [561, 657] -0.03 [-0.15, 0.1] 0.07
HF words 104 [85, 126] 104 [86, 124] 0.05 [-0.04, 0.14] 0.07
LF words 207 [170, 250] 264 [230, 300] -0.05 [-0.15, 0.06] 0.08
Sentence 80 [65, 96] 68 [53, 86] 0.06 [-0.03, 0.16] 0.12
Probability of disfluencies
Consonants .85 [.78, .91] .78 [.71, .84] 0.5 [0.02, 1.02] 1.85
HF words .24 [.19, .28] .28 [.24, .34] -0.25 [-0.59, 0.08] 0.51
LF words .26 [.20, .33] .57 [.49, .64] -1.32 [-1.73, -0.92] > 100
Sentence .25 [.20, .29] .20 [.16, .24] 0.28 [-0.05, 0.61] 0.65
Note:
PIs are probability intervals. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis.
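The relation between the cell means in msecs and the language effects on the log scale can be checked with a small worked example. The values are copied from the fluent-transition rows of Table 2.3. Reading the effect as an L1 minus L2 difference is an inference from the reported numbers rather than something stated in the text, and because the reported effects are summaries of the posterior, differences of the rounded cell means reproduce them only approximately.

```r
# Cell means (msecs) for fluent transitions copied from Table 2.3
l1 <- c(hf_words = 179, lf_words = 245, sentence = 161)
l2 <- c(hf_words = 202, lf_words = 283, sentence = 170)

# Language effect as an L1 minus L2 difference on the log scale (assumed coding)
log_diff <- log(l1) - log(l2)
round(log_diff, 3)

# The same difference expressed as the ratio of L2 to L1 durations
ratio_l2_l1 <- exp(-log_diff)
round(ratio_l2_l1, 2)
```

A negative log-scale effect therefore corresponds to a ratio above 1, that is, to longer fluent transitions in L2 than in L1.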

3 Text compositions

3.1 Text features

As can be seen in Table 3.1, participants’ L2 texts were shorter in terms of number of words and characters but consisted of more sentences. In L2, sentences were shorter while words were longer. L2 texts were also informationally more sparse (had a lower ratio of open to closed class words) and lexically less diverse, as measured by the MTLD statistic (McCarthy, 2005), a sensitive and text-length independent measure of lexical diversity (Torruella & Capsada, 2013).

Table 3.1: Characteristics of texts written in L1 and L2. Mean (SD) and standardised effect of language (with 95% PI).
Measure L1 L2 Standardised effect BF
Character count 1793 (706) 1520 (635) 0.4 [0.12, 0.68] 7.04
Word count 446 (171) 362 (151) 0.5 [0.22, 0.78] 49.27
Mean word length 4.0 (.2) 4.2 (.2) -0.84 [-1.1, -0.57] >100
Sentence count 11.9 (6.6) 19.5 (9.6) -0.83 [-1.09, -0.57] >100
Mean sentence length 49.6 (53.6) 20.7 (8.4) 0.68 [0.39, 0.96] >100
Open class / closed class 2.1 (.3) 1.1 (.5) 1.54 [1.35, 1.74] >100
Lexical diversity (MTLD) 121.8 (31.2) 105.6 (28.3) 0.53 [0.25, 0.8] >100
Note:
Parameter estimates and BFs from a multivariate linear model with language (L1 vs. L2) as predictor.

3.2 Sample

The final sample of data comes from 85 Norwegian students, each of whom produced a text in their L1 (Norwegian) and a text in their L2 (English). Seventeen participants were removed prior to analysis because they contributed only one text.

Table 3.2: Total number of writing events for analysis by language.
Participant id L1 L2
1 3076 3493
2 2609 2666
3 3304 3560
4 2416 3111
5 2626 1990
6 2440 1850
7 2016 2247
8 767 1454
9 2725 2442
10 3984 2985
11 2612 1066
12 1034 519
13 1123 2427
14 3190 2613
15 1317 1136
16 1370 1389
17 2332 2304
18 3435 2436
19 2370 3040
20 2883 2957
21 5329 4283
22 2654 1436
23 2462 1483
24 2217 2137
25 4128 2827
26 4630 2573
27 3037 3016
28 4567 2373
29 2055 791
30 2552 1630
31 1700 1304
32 2775 1701
33 4389 3209
34 4021 3056
35 2706 2374
36 2852 2233
37 820 630
38 416 1386
39 3182 2877
40 577 616
41 2365 2540
42 3968 2904
43 62 1369
44 2317 2556
45 2816 2434
46 1748 1599
47 3381 3382
48 2915 3126
49 3700 3417
50 2631 2261
51 3771 3113
52 3020 3876
53 2907 1982
54 2043 1036
55 2409 2992
56 1669 1544
57 2932 2225
58 3273 3362
59 2303 3963
60 2515 438
61 2319 1923
62 2407 1789
63 2919 1150
64 6280 3659
65 3474 3043
66 3699 2125
67 2762 3059
68 3480 3995
69 2518 2436
70 2652 2151
71 1225 1081
72 3234 1559
73 1572 1142
74 784 654
75 3376 4422
76 5855 4104
77 2846 2184
78 3955 1042
79 2422 1763
80 2677 2995
81 1022 369
82 2829 2572
83 1310 2813
84 3923 2785
85 3228 2846

3.3 Number of transitions

# Run model
#source("../scripts/models/transition_counts.R")
# Load model posterior
fit <- readRDS(file = "../stanout/transition_counts.rda")
fit$formula
n_transitions ~ 1 + condition + (language | participant) 

In this section we analysed the number of transitions in the writing data. The predictor condition was coded with main effects of Language (levels: L1, L2), Edit (levels: editing, no editing; i.e. whether a key transition terminated in an editing operation), Transition location 1 (levels: before sentence, before word), Transition location 2 (levels: before word, within word), Transition location 3 (levels: after word, within word), and all two-way and three-way interactions by Transition location.

fit$family

Family: negbinomial 
Link function: log 

In addition we fitted a binomial model to account for the overall number of transitions by participant and language.

# Run model
#source("../scripts/models/transition_binomial.R")
# Load model posterior
fit_binom <- readRDS(file = "../stanout/transition_binomial.rda")
fit_binom$formula
n_transitions | trials(total) ~ condition + (language | participant) 
fit_binom$family

Family: binomial 
Link function: logit 

Main effects and interactions of the analysis of the transition counts are shown in Table 3.3. The results are illustrated in Figure 3.1, and the corresponding cell means are shown in Table 3.4 along with planned pairwise comparisons between languages.

Table 3.3: Transition-count coefficients.
| Predictor | Log number of transitions: Estimate with 95% PI | Log number of transitions: \(BF_{10}\) | Logit prop. of transitions: Estimate with 95% PI | Logit prop. of transitions: \(BF_{10}\) |
|---|---|---|---|---|
| Main effects | | | | |
| Language (L1, L2) | -0.66 [-1.45, 0.16] | 1.46 | 0.21 [0.04, 0.39] | 1.44 |
| Location 1 (before sentence, before word) | -8.54 [-8.82, -8.27] | > 100 | -9.01 [-9.17, -8.84] | > 100 |
| Location 2 (before word, within word) | 4.2 [3.98, 4.43] | > 100 | 5.53 [5.45, 5.61] | > 100 |
| Location 3 (after word, within word) | -2.54 [-2.76, -2.32] | > 100 | -4 [-4.06, -3.93] | > 100 |
| Edit (edit, no edit) | 20.18 [19.82, 20.53] | > 100 | 23.08 [22.9, 23.26] | > 100 |
| Two-way interactions | | | | |
| Language : Location 1 | -0.14 [-0.42, 0.14] | 0.22 | -0.04 [-0.2, 0.12] | 0.09 |
| Language : Location 2 | 0.23 [0.01, 0.46] | 0.91 | 0.33 [0.25, 0.41] | > 100 |
| Language : Location 3 | -0.07 [-0.3, 0.15] | 0.14 | -0.05 [-0.11, 0.01] | 0.11 |
| Language : Edit | -0.73 [-1.07, -0.39] | > 100 | -0.54 [-0.72, -0.36] | > 100 |
| Location 1 : Edit | -3 [-3.28, -2.73] | > 100 | -3.24 [-3.4, -3.08] | > 100 |
| Location 2 : Edit | 0.3 [0.08, 0.53] | 3.6 | 1.68 [1.6, 1.76] | > 100 |
| Location 3 : Edit | -2.11 [-2.33, -1.89] | > 100 | -3.43 [-3.5, -3.37] | > 100 |
| Three-way interactions | | | | |
| Location 1 : Edit : Language | 2.03 [1.74, 2.33] | > 100 | 2.19 [2.02, 2.36] | > 100 |
| Location 2 : Edit : Language | -0.14 [-0.36, 0.08] | 0.23 | -0.19 [-0.27, -0.12] | > 100 |
| Location 3 : Edit : Language | -0.11 [-0.32, 0.1] | 0.18 | -0.16 [-0.22, -0.1] | > 100 |
Note:
Colon indicates interactions. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis

Figure 3.1: Estimated cell means for transition counts with 95% PIs (probability intervals).

Table 3.4: Transition counts and percentages. Cell means for L1 and L2 as counts and percentages, and their respective language differences on log scale. 95% PIs in brackets.
| Transition location | Number of transitions: L1 | Number of transitions: L2 | Number of transitions: Language effect | Number of transitions: \(BF_{10}\) | % transitions: L1 | % transitions: L2 | % transitions: Language effect | % transitions: \(BF_{10}\) |
|---|---|---|---|---|---|---|---|---|
| Editing | | | | | | | | |
| after words | 43 [37, 50] | 48 [42, 55] | -0.12 [-0.27, 0.03] | 0.26 | 1.6 [1.6, 1.7] | 2.1 [2.1, 2.2] | -0.26 [-0.31, -0.22] | > 100 |
| before sentences | 5 [4, 6] | 4 [3, 5] | 0.19 [-0.03, 0.41] | 0.47 | 0.2 [0.2, 0.2] | 0.2 [0.2, 0.2] | 0.06 [-0.08, 0.21] | 0.1 |
| before words | 20 [17, 23] | 18 [16, 21] | 0.07 [-0.1, 0.24] | 0.12 | 0.8 [0.8, 0.9] | 0.8 [0.8, 0.8] | 0.05 [-0.01, 0.12] | 0.13 |
| within words | 49 [42, 57] | 54 [48, 62] | -0.11 [-0.26, 0.05] | 0.21 | 1.9 [1.9, 2] | 2.4 [2.3, 2.4] | -0.21 [-0.25, -0.17] | > 100 |
| Writing | | | | | | | | |
| after words | 477 [413, 555] | 383 [340, 436] | 0.22 [0.07, 0.37] | 4.15 | 18.2 [18, 18.3] | 17.1 [17, 17.3] | 0.07 [0.06, 0.09] | > 100 |
| before sentences | 28 [24, 32] | 22 [19, 25] | 0.23 [0.07, 0.39] | 4.2 | 1.1 [1.1, 1.1] | 1 [1, 1.1] | 0.08 [0.03, 0.14] | 1.89 |
| before words | 493 [427, 573] | 411 [364, 467] | 0.18 [0.04, 0.33] | 1.63 | 18.7 [18.6, 18.9] | 18.2 [18, 18.4] | 0.04 [0.02, 0.05] | > 100 |
| within words | 1484 [1284, 1727] | 1302 [1153, 1480] | 0.13 [-0.02, 0.28] | 0.36 | 57.5 [57.3, 57.6] | 58.3 [58, 58.5] | -0.03 [-0.04, -0.02] | > 100 |
Note:
PIs are probability intervals. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis.

3.4 Editing frequency

# Run model
#source("../scripts/models/editing_frequency.R")

# Load model posterior
fit <- readRDS(file = "../stanout/editing_frequency.rda")
fit$formula
edits | trials(total) ~ 1 + condition + (language | participant) 

This analysis concerns how frequently a key transition terminated in an editing operation (revision). The predictor condition was coded with main effects of Language (levels: L1, L2), Transition location 1 (levels: before sentence, before word), Transition location 2 (levels: before word, within word), Transition location 3 (levels: after word, within word), and all two-way interactions by Transition location (i.e. no interactions that involve more than one Transition location).

fit$family

Family: binomial 
Link function: logit 

Main effects and interactions of the analysis of the editing frequency are shown in Table 3.5. The results are illustrated in Figure 3.2, and the corresponding cell means are shown in Table 3.6 along with planned pairwise comparisons between languages. The results show that L1 and L2 writers edited their text equally frequently, with one notable exception: L2 writers showed a higher proportion of transitions that terminated in an editing operation at after-word transition locations. There was substantial evidence for a difference at within-word locations as well, but the size of the difference was too small to be meaningful.

Table 3.5: Editing frequency effects on logit scale.
Predictor Estimate with 95% PI \(BF_{10}\)
Main effects
Language (L1, L2) 0.7 [0.43, 0.96] > 100
Location 1 (before sentence, before word) 2.54 [2.36, 2.72] > 100
Location 2 (before word, within word) -0.32 [-0.39, -0.24] > 100
Location 3 (within word, after word) 2.08 [2.02, 2.14] > 100
Two-way interactions
Language : Location 1 0.02 [-0.15, 0.2] 0.17
Language : Location 2 0.19 [0.11, 0.27] > 100
Language : Location 3 0.13 [0.07, 0.19] > 100
Note:
Colon indicates interactions. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis

Figure 3.2: Estimated cell means for editing frequency with 95% PIs (probability intervals).

Table 3.6: Editing frequency. Cell means for L1 and L2 in proportion and language difference on logit scale both shown with 95% PIs in brackets.
Transition location L1 L2 Language effect \(BF_{10}\)
after words .08 [.07, .09] .11 [.10, .12] -0.37 [-0.44, -0.3] >100
before sentences .13 [.11, .14] .14 [.12, .15] -0.1 [-0.26, 0.07] 0.17
before words .04 [.04, .04] .04 [.04, .05] -0.05 [-0.13, 0.03] 0.08
within words .03 [.03, .03] .04 [.04, .04] -0.24 [-0.3, -0.17] >100
Note:
PIs are probability intervals. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis.

3.5 Transition duration

3.5.1 Raw data visualisation

Density plots of the raw keystroke intervals are shown in Figure 3.3 by transition location and language.

Figure 3.3: Density plots of inter-keystroke intervals (IKIs) by transition location and language.

3.5.2 Mixture model results

The model used to analyse transition durations in text composition is largely similar to the model for the copy-task data described in equation (2.1). We provide a brief description for completeness.

The mixture model was implemented in the probabilistic programming language Stan. The keystroke data resulting from the text production process were modelled as coming from a mixture of two log-normal distributions of which one distribution represents fluent transitions and the other disfluencies between keystroke intervals.

\[ \begin{align}\tag{3.1} \text{iki}_{i} \sim \ & \theta_\text{condition[i]} \cdot \text{logN}(\beta_\text{condition} + \delta_\text{condition} + u_i, \sigma_\text{location}') \ +\\ & (1-\theta_\text{condition[i]}) \cdot \text{logN}(\beta_\text{condition} + u_i, \sigma_\text{location})\\ \text{constraints: } & \delta > 0, \sigma' > \sigma\\ \end{align} \]

The two distributions were assumed to have unequal variances in two regards. First, disfluent key transitions have a larger variability than fluent key transitions, which is achieved by constraining \(\sigma'\) to be larger than \(\sigma\). Second, each transition location (location) has its own variance component.

The predictor condition was included as an indicator for every combination of language (levels: Norwegian (L1), English (L2)) and transition location (levels: after words, before words, within words, before sentences). This allows us to obtain estimates for each of the three parameters of interest, namely the fluent typing speed \(\beta\), the slowdown for long transitions \(\delta\), and the disfluency probability \(\theta\). From the posterior of the model we then calculated main effects and interactions as well as planned pairwise comparisons between languages and estimated marginal cell means.

The model is a mixed effects model because we included random effects for participants: first, we allowed the average typing speed to vary across participants. We assumed that some participants have a typing speed slower than the average \(\beta\) and others are on average faster. This is captured by the parameter \(u_i\) where \(i\) is indexing every participant. The typing speed difference between average and participant \(u\) was assumed to be distributed according to \(u \sim \text{N}(0, \sigma_u)\). Second, we assumed that the probability of long latencies varies across participants, hence the index \(i\) for \(\theta\). This is capturing the idea that people vary in their writing styles as to how often they pause to plan upcoming utterances and plan in parallel to production.

Prior to analysis of the transition durations, we removed data that were shorter than (or equal to) 50 msecs (M=1.2%, SE=0.17%) or longer than 15 secs (M=0.2%, SE=0.03%). Values outside of this range were removed because they are unlikely to be representative of text composition and are more likely to be related to, for example, finger slips or the writer engaging in non-composition related activities (browsing, social media, extended reading), respectively. We also removed key intervals that terminated in a revision of the text (M=5.1%, SE=2.39%).
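The trimming rule can be written down as a short base-R sketch. The simulated durations and object names are hypothetical; only the two cut-offs (at or below 50 msecs, above 15 secs) come from the text.

```r
set.seed(5)
# Hypothetical transition durations in msecs, including a few implausible values
iki <- c(rlnorm(1000, meanlog = 5.3, sdlog = 0.8), 12, 50, 20000)

lower <- 50      # values at or below 50 msecs are dropped
upper <- 15000   # values above 15 secs are dropped
keep  <- iki > lower & iki <= upper

pct_too_short <- 100 * mean(iki <= lower)  # % removed as too short
pct_too_long  <- 100 * mean(iki > upper)   # % removed as too long
iki_trimmed   <- iki[keep]

round(c(too_short = pct_too_short, too_long = pct_too_long), 2)
range(iki_trimmed)
```

In the reported analysis the percentages removed were computed per participant and then averaged, which is what the M and SE values above summarise.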

We randomly selected a subset of 150 observations per participant, language, and transition location wherever the number of observations exceeded 150, to reduce the time and computational power necessary to run the statistical models.

The final sample therefore contained a subset of the full text production data set for (most) transition locations. The by-participant average percentage of observations used per transition location and language was as follows, with standard errors (SE) indicating the variability across participants: after-words transitions (L1: M=35.7%, SE=5.2%; L2: M=44.2%, SE=5.4%), before-words transitions (L1: M=34.6%, SE=5.2%; L2: M=41.8%, SE=5.3%), within-words transitions (L1: M=13.2%, SE=3.7%; L2: M=14.3%, SE=3.8%), before-sentences transitions (L1: M=100%, SE=0%; L2: M=100%, SE=0%).

As discussed above, we modelled transition durations (time between first and second keypress) as finite mixture models with fixed effects for language, for location in text, and for keystrokes that represented ongoing production as opposed to keystrokes that were followed by a text revision. For an overview of mixture models see Gelman et al. (2014, pp. 519-524); for applications to keystroke data see Hall et al. (2022) and Roeser et al. (2021). We assumed two data-generating processes: one associated with fluent typing, generating a distribution of relatively short transition durations, and another associated with longer durations (i.e. disfluencies), in which the demands of upstream (higher-level) processing affect the time needed to plan the next keystroke.

The results of the mixed-effects mixture model are summarised in Table 3.7, which shows main effects and interactions for language and transition location for each model parameter.

We found strong evidence for longer transition intervals and a larger proportion of hesitations at higher-level text locations (after word < within word < before word / before sentence). In L2 writing, transitions were longer before sentences than before words, whereas there was no such difference in L1 writing; in addition, there were fewer pauses before words in L1 writing.

There was strong evidence of a main effect of language. This represents a general tendency of longer fluent transitions and a larger probability of hesitant transitions in L2 while the slowdown for hesitations was similar across languages.

Table 3.7: Mixture model results of the transition duration with the predictor estimates for the distribution of fluent transitions and the slowdown for long transitions (on log scale) and the probability of long transitions / disfluencies (on logit scale). Estimates are shown with 95% PI.
| Predictor | Fluent transitions: Estimate | Fluent transitions: \(BF_{10}\) | Slowdown for disfluencies: Estimate | Slowdown for disfluencies: \(BF_{10}\) | Probability of disfluencies: Estimate | Probability of disfluencies: \(BF_{10}\) |
|---|---|---|---|---|---|---|
| Main effects | | | | | | |
| Language (L1, L2) | -0.07 [-0.09, -0.06] | >100 | -0.04 [-0.08, 0] | 0.16 | -0.33 [-0.49, -0.17] | >100 |
| Location 1 (before sentence, before word) | 0.15 [0.06, 0.23] | 6.45 | 1.05 [0.91, 1.16] | >100 | -0.87 [-1.23, -0.46] | >100 |
| Location 2 (before word, within word) | 0.33 [0.31, 0.34] | >100 | 0.35 [0.3, 0.4] | >100 | 1.6 [1.36, 1.84] | >100 |
| Location 3 (after word, within word) | -0.15 [-0.16, -0.14] | >100 | 0.3 [0.24, 0.36] | >100 | 0.25 [0.03, 0.48] | 1.34 |
| Two-way interactions | | | | | | |
| Language : Location 1 | 0.16 [0.1, 0.21] | >100 | -0.05 [-0.16, 0.06] | 0.08 | 0.28 [-0.2, 0.76] | 0.48 |
| Language : Location 2 | -0.11 [-0.14, -0.08] | >100 | 0.15 [0.06, 0.24] | 8.86 | -0.11 [-0.55, 0.34] | 0.25 |
| Language : Location 3 | 0.05 [0.04, 0.07] | >100 | 0.08 [-0.03, 0.18] | 0.15 | 0.13 [-0.31, 0.56] | 0.26 |
Note:
Colon indicates interactions. PI is the probability interval. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis.

We unpacked these effects by calculating the effect of language by transition location. Cell means with 95% PIs corresponding to the main effects and interactions in Table 3.7 are illustrated in Figure 3.4 and summarised numerically, along with planned by-language comparisons, in Table 3.8. The following differences were found for writing in L2. Transition durations, by which we mean fluent transitions between keystrokes, were generally longer when writing in L2 at before-word and within-word locations. Writing in L2 also led to more pauses at the same locations. Interestingly, the slowdown for pauses was similar in L1 and L2 across all text locations. We also observed no changes related to L2 writing at before-sentence and after-word locations. These results suggest that disfluencies in L2 writing are associated with word-level difficulty, that is, with lexical retrieval and spelling retrieval, of which presumably the latter was not completed at word-production onset.


Figure 3.4: Estimated cell means of transition durations with 95% PIs (probability intervals). Shown are the estimates for each mixture component (short and long transitions) and the probability of disfluencies.

Table 3.8: Transition duration. Cell means for L1 and L2, with durations in msecs and disfluencies as probabilities. Language differences are shown on the log scale for durations and on the logit scale for the probability of disfluencies. 95% PIs in brackets.
Transition location L1 L2 Language effect \(BF_{10}\)
Short transitions
after words 160 [152, 167] 162 [155, 170] -0.02 [-0.03, -0.01] 1.09
before sentences 299 [270, 326] 306 [276, 335] -0.02 [-0.08, 0.03] 0.04
before words 237 [226, 249] 284 [270, 299] -0.18 [-0.2, -0.16] >100
within words 181 [173, 190] 194 [185, 203] -0.07 [-0.08, -0.06] >100
Slowdown for disfluencies
after words 453 [420, 488] 481 [449, 515] -0.04 [-0.12, 0.03] 0.08
before sentences 2,584 [2,014, 3,136] 2,706 [2,181, 3,222] -0.02 [-0.13, 0.08] 0.06
before words 735 [690, 782] 856 [807, 907] 0.03 [-0.02, 0.07] 0.05
within words 367 [340, 396] 443 [413, 476] -0.12 [-0.2, -0.04] 4.83
Probability of disfluencies
after words .20 [.17, .24] .25 [.21, .29] -0.26 [-0.55, 0.04] 0.63
before sentences .30 [.23, .39] .34 [.27, .43] -0.2 [-0.56, 0.16] 0.34
before words .47 [.41, .52] .59 [.53, .64] -0.49 [-0.8, -0.17] 16.82
within words .16 [.13, .19] .21 [.18, .25] -0.38 [-0.69, -0.07] 2.78
Note:
PIs are probability intervals. \(BF_{10}\) is the evidence in favour of the alternative hypothesis over the null hypothesis.
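
As a rough check on how the language effects relate to the cell means, the sketch below recomputes two of them from the rounded values in Table 3.8. It assumes that the language effect is the L1 value minus the L2 value on the transformed scale (log for durations, logit for probabilities), which is consistent with the signs in the table but is an assumption here. It is applied only to the short-transition and probability rows. Because the inputs are rounded cell means rather than posterior samples, the results can only approximate the reported estimates. The function names are illustrative.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def log_difference(l1_msecs, l2_msecs):
    """L1 minus L2 on the log scale (assumed direction of the language effect)."""
    return math.log(l1_msecs) - math.log(l2_msecs)


def logit_difference(p_l1, p_l2):
    """L1 minus L2 on the logit scale (assumed direction of the language effect)."""
    return logit(p_l1) - logit(p_l2)


# Short transitions before words: 237 msecs (L1) vs 284 msecs (L2), reported -0.18
before_word_duration = log_difference(237, 284)

# Probability of disfluencies before words: .47 (L1) vs .59 (L2), reported -0.49
before_word_probability = logit_difference(0.47, 0.59)

# The same contrast as an odds ratio for a disfluency in L2 relative to L1
before_word_odds_ratio = math.exp(-before_word_probability)

print(round(before_word_duration, 2))
print(round(before_word_probability, 2))
print(round(before_word_odds_ratio, 2))
```

Read this way, a negative language effect means larger values in L2: longer fluent transitions in the duration rows and higher odds of a disfluency in the probability rows.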

References

Almond, R., Deane, P., Quinlan, T., Wagner, M., & Sydorenko, T. (2012). A preliminary analysis of keystroke log data from a timed writing task. ETS Research Report Series, 2012(2), i–61.
Baaijen, V. M., Galbraith, D., & De Glopper, K. (2012). Keystroke analysis: Reflections on procedures and measures. Written Communication, 29(3), 246–277.
Baguley, T. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.
Bürkner, P.-C. (2017a). Advanced Bayesian multilevel modeling with the R package brms. arXiv Preprint arXiv:1705.11123.
Bürkner, P.-C. (2017b). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. https://doi.org/10.18637/jss.v080.i01
Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M. A., Guo, J., Li, P., & Riddell, A. (2016). Stan: A probabilistic programming language. Journal of Statistical Software, 20.
Dickey, J. M., Lientz, B. P., & others. (1970). The weighted likelihood ratio, sharp hypotheses about chances, the order of a Markov chain. The Annals of Mathematical Statistics, 41(1), 214–226.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2014). Bayesian data analysis (3rd ed.). Chapman & Hall/CRC.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.
Hall, S., Baaijen, V. M., & Galbraith, D. (2022). Constructing theoretically informed measures of pause duration in experimentally manipulated writing. Reading and Writing, 1–29.
Hoffman, M. D., & Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1), 1593–1623.
Jeffreys, H. (1961). The theory of probability (Vol. 3). Oxford University Press, Clarendon Press.
Lee, M. D., & Wagenmakers, E.-J. (2014). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Li, T. (2021). Identifying mixture components from large-scale keystroke log data. Frontiers in Psychology, 12, 628660.
McCarthy, P. M. (2005). An assessment of the range and usefulness of lexical diversity measures and the potential of the measure of textual, lexical diversity (MTLD) [PhD thesis]. The University of Memphis.
McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC Press.
R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Roeser, J., De Maeyer, S., Leijten, M., & Van Waes, L. (2021). Modelling typing disfluencies as finite mixture process. Reading and Writing, 1–26.
Stan Development Team. (n.d.). RStan: The R interface to Stan. https://mc-stan.org/
Torruella, J., & Capsada, R. (2013). Lexical statistics and tipological structures: A measure of lexical richness. Procedia-Social and Behavioral Sciences, 95, 447–454.
Wagenmakers, E.-J., Lodewyckx, T., Kuriyal, H., & Grasman, R. (2010). Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cognitive Psychology, 60(3), 158–189.