All 40 participants who completed the task reported both English and a Chinese-family language or dialect, so no participants were excluded on language-background grounds. The final analysed sample included 36 Chinese-English bilinguals after excluding 4 participants with accuracy below 80%. Participants were aged 18-42 years (\(M = 23.14\), \(SD = 5.45\)). First-language responses were predominantly Mandarin/Chinese (28 participants), followed by English (4), Cantonese (3), and Fuzhou dialect (1). For second language, most participants reported English (28), with a smaller number reporting Mandarin/Chinese (4). Thirteen participants reported an additional language or dialect beyond their main Chinese-English background.
Language exposure was hand-coded from free-text LEAP-Q responses. Mandarin, Chinese, Cantonese, Fuzhou dialect, and Hokkien were grouped as Chinese-family exposure. Two responses whose percentages summed to more than 100 were rescaled proportionally. Exposure could be coded for 35 of the 36 participants. Mean English exposure was 48.59% (\(SD = 23.76\), range = 2.00-95.00), and mean Chinese-family exposure was 50.92% (\(SD = 23.46\), range = 5.00-97.00). Fourteen participants were English-dominant, 15 were Chinese-family-dominant, and 6 were approximately balanced (English and Chinese-family exposure within 10 percentage points of each other).
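The coding rules above can be illustrated with a minimal Python sketch. The function names, the example response, and the inclusive treatment of the 10-point boundary are assumptions made for illustration; the original coding was done by hand.

```python
def rescale_if_over_100(exposure):
    """Rescale percentages proportionally when they sum to more than 100."""
    total = sum(exposure.values())
    if total <= 100:
        return dict(exposure)
    return {language: 100 * value / total for language, value in exposure.items()}


def dominance(english, chinese_family, margin=10.0):
    """Classify a participant from two exposure percentages.

    Assumption: "within 10 percentage points" is treated as inclusive.
    """
    difference = english - chinese_family
    if abs(difference) <= margin:
        return "balanced"
    return "English-dominant" if difference > 0 else "Chinese-family-dominant"


# Hypothetical free-text response whose percentages sum to 120.
raw = {"English": 60.0, "Mandarin": 40.0, "Cantonese": 20.0}
scaled = rescale_if_over_100(raw)

# Mandarin and Cantonese are pooled into Chinese-family exposure.
chinese_family = scaled["Mandarin"] + scaled["Cantonese"]
group = dominance(scaled["English"], chinese_family)
```

Proportional rescaling keeps the relative weighting the participant reported while making responses comparable across participants.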
Reaction-time analyses are based on correct responses only. Because very slow responses are unlikely to reflect the intended speeded lexical decision process, an upper RT cutoff of 3000 ms was also applied. Exclusions were applied in sequence: the language-background and participant-accuracy exclusions first, then the restriction to correct responses, and finally the 3000 ms RT cutoff. The RT cutoff removed 110 correct trials, corresponding to 8.18% of correct trials and 7.64% of all trials in the analysed participant sample.
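The order of these exclusion steps can be sketched in Python as follows. The trial keys (`participant`, `correct`, `rt_ms`), the toy data, and the treatment of a response at exactly 3000 ms as retained are assumptions for illustration, not details taken from the analysis scripts.

```python
from collections import defaultdict

ACCURACY_THRESHOLD = 0.80  # participants below this accuracy are excluded
RT_CUTOFF_MS = 3000        # assumption: responses slower than this are removed


def filter_trials(trials):
    """Return the correct trials retained for the RT analysis."""
    n_total = defaultdict(int)
    n_correct = defaultdict(int)
    for trial in trials:
        n_total[trial["participant"]] += 1
        n_correct[trial["participant"]] += int(trial["correct"])

    # Step 1: participant-level accuracy exclusion.
    kept = {p for p in n_total if n_correct[p] / n_total[p] >= ACCURACY_THRESHOLD}
    # Step 2: keep correct responses from retained participants.
    correct = [t for t in trials if t["participant"] in kept and t["correct"]]
    # Step 3: apply the RT cutoff to the remaining correct responses.
    return [t for t in correct if t["rt_ms"] <= RT_CUTOFF_MS]


# Toy data: participant A is 80% accurate, participant B is 60% accurate.
toy = [
    {"participant": "A", "correct": True, "rt_ms": 600},
    {"participant": "A", "correct": True, "rt_ms": 700},
    {"participant": "A", "correct": True, "rt_ms": 3500},
    {"participant": "A", "correct": True, "rt_ms": 800},
    {"participant": "A", "correct": False, "rt_ms": 900},
    {"participant": "B", "correct": True, "rt_ms": 650},
    {"participant": "B", "correct": True, "rt_ms": 700},
    {"participant": "B", "correct": True, "rt_ms": 750},
    {"participant": "B", "correct": False, "rt_ms": 800},
    {"participant": "B", "correct": False, "rt_ms": 850},
]
retained = filter_trials(toy)
```

Because accuracy is computed before any RT trimming, slow but correct responses still count towards a participant's accuracy.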
Data were analysed using Bayesian mixed-effects models (Gelman et al. 2014; McElreath 2016; Nicenboim and Vasishth 2016). Reaction times were modelled with a shifted-lognormal distribution (Rouder 2005), which is appropriate for positively skewed response-time data because it estimates a non-decision-time shift in addition to the lognormal location and scale parameters. This approach follows recommendations to model response-time distributions directly rather than relying only on transformed Gaussian models (Lo and Andrews 2015). Accuracy was analysed with a Bernoulli model with a logit link. Both models included fixed effects of lexicality, script, and their interaction, with random intercepts for participants and items.
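As a sketch of the structure described above, the two models can be written as follows. The symbols are introduced here for illustration and are not taken from the source: \(n\) indexes trials, \(p[n]\) and \(i[n]\) give the participant and item of trial \(n\), \(\delta\) is the non-decision-time shift, and \(\tau\) denotes the group-level standard deviations.

\[
\begin{aligned}
RT_{n} &= \delta + T_{n}, \qquad T_{n} \sim \mathrm{LogNormal}(\mu_{n}, \sigma),\\
\mu_{n} &= \beta_{0} + \beta_{1}\,\mathrm{lex}_{n} + \beta_{2}\,\mathrm{script}_{n} + \beta_{3}\,\mathrm{lex}_{n}\,\mathrm{script}_{n} + u_{p[n]} + w_{i[n]},\\
u_{p} &\sim \mathcal{N}(0, \tau_{u}), \qquad w_{i} \sim \mathcal{N}(0, \tau_{w}).
\end{aligned}
\]

\[
\begin{aligned}
\mathrm{correct}_{n} &\sim \mathrm{Bernoulli}(\theta_{n}),\\
\operatorname{logit}(\theta_{n}) &= \alpha_{0} + \alpha_{1}\,\mathrm{lex}_{n} + \alpha_{2}\,\mathrm{script}_{n} + \alpha_{3}\,\mathrm{lex}_{n}\,\mathrm{script}_{n} + u'_{p[n]} + w'_{i[n]}.
\end{aligned}
\]

In this notation, the RT coefficients \(\beta\) act on the log scale of the lognormal component \(T_{n}\), not on the raw RT, which is why a coefficient does not translate into a fixed millisecond difference.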
Predictors were coded with scaled sum contrasts (-0.5, +0.5), following recommendations for contrast coding in mixed models (Brehm and Alday 2022; Schad et al. 2020). With this coding, two-level main effects are estimated on the full pairwise-difference scale and interaction terms are estimated on the corresponding difference-in-differences scale. This keeps main effects and interactions on comparable scales for ROPE evaluation. Models were fitted with brms (Bürkner 2017), which uses Stan and Hamiltonian Monte Carlo for Bayesian estimation (Carpenter et al. 2017; Hoffman and Gelman 2014).
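The scale property described above can be checked numerically. The following Python sketch uses hypothetical coefficient values, and the assignment of +0.5 to words and to Pinyin is an assumption, since the text does not state which level receives which code.

```python
import itertools

# Assumed coding: which level receives +0.5 is not stated in the text.
LEX = {"nonword": -0.5, "word": +0.5}
SCRIPT = {"Chinese": -0.5, "Pinyin": +0.5}

# Hypothetical coefficients on the log scale, for illustration only.
b0, b_lex, b_script, b_int = 7.0, -0.20, 0.02, 0.16

# Linear predictor for each of the four design cells.
cell = {
    (lex, scr): b0
    + b_lex * LEX[lex]
    + b_script * SCRIPT[scr]
    + b_int * LEX[lex] * SCRIPT[scr]
    for lex, scr in itertools.product(LEX, SCRIPT)
}

# Main effect of lexicality: word minus nonword, averaged over script.
lex_effect = (
    (cell["word", "Pinyin"] + cell["word", "Chinese"]) / 2
    - (cell["nonword", "Pinyin"] + cell["nonword", "Chinese"]) / 2
)

# Interaction: script difference for words minus script difference for nonwords.
diff_in_diff = (
    (cell["word", "Pinyin"] - cell["word", "Chinese"])
    - (cell["nonword", "Pinyin"] - cell["nonword", "Chinese"])
)
```

With codes of -0.5 and +0.5, `lex_effect` recovers `b_lex` and `diff_in_diff` recovers `b_int`. With unscaled codes of -1 and +1, the main-effect coefficient would instead be half the pairwise difference and the interaction coefficient a quarter of the difference-in-differences.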
Weakly informative priors were used to regularise estimates without strongly constraining the direction of the effects (McElreath 2016). For the RT model, fixed effects were assigned \(\mathcal{N}(0, 0.30)\) priors on the shifted-lognormal location scale, with a \(\mathcal{N}(\log(1000), 1)\) prior on the intercept and a \(\mathcal{N}(\log(250), 0.35)\) prior on the non-decision-time intercept. For the accuracy model, fixed effects were assigned \(\mathcal{N}(0, 1)\) priors on the log-odds scale, with a \(\mathcal{N}(2.5, 1.5)\) prior on the intercept. Model convergence was assessed using \(\hat{R}\), effective sample sizes, divergent transitions, and visual inspection of trace plots.
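To see what these priors imply on more familiar scales, their central 95% intervals can be back-transformed. The Python sketch below rests on two assumptions: the second argument of each normal prior is a standard deviation, and the non-decision-time intercept is on a log scale, as the \(\log(250)\) location suggests.

```python
from math import exp, log
from statistics import NormalDist


def central_interval(mean, sd, mass=0.95):
    """Central interval of a normal prior, assuming the second argument is an SD."""
    dist = NormalDist(mean, sd)
    tail = (1 - mass) / 2
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)


# RT intercept prior, back-transformed to ms for the lognormal component.
intercept_ms = tuple(exp(x) for x in central_interval(log(1000), 1))

# RT fixed-effect prior, expressed as a multiplicative change.
effect_ratio = tuple(exp(x) for x in central_interval(0, 0.30))

# Non-decision-time prior in ms; assumes a log scale for the shift.
shift_ms = tuple(exp(x) for x in central_interval(log(250), 0.35))

# Accuracy intercept prior, back-transformed from log-odds to a proportion.
accuracy = tuple(1 / (1 + exp(-x)) for x in central_interval(2.5, 1.5))
```

Under these assumptions, the intercept prior spans roughly 140 to 7100 ms, the fixed-effect prior allows multiplicative changes of roughly 0.56 to 1.80, and the shift prior spans roughly 126 to 496 ms. The accuracy intercept prior spans roughly 39% to over 99% correct.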
Posterior summaries are reported using posterior medians, 95% probability intervals (PIs), the posterior probability that the effect is negative [P(neg.)], and the posterior probability that the effect falls inside a region of practical equivalence (ROPE). PIs directly summarise uncertainty in the posterior distribution of the parameter (Sorensen et al. 2016). ROPE values quantify the posterior probability that an effect is practically equivalent to zero (Kruschke 2018; Kruschke and Liddell 2018). The ROPE was set to \(\pm .05\) log units for RT coefficients and \(\pm .10\) log-odds for accuracy coefficients. For pairwise comparisons in the tables, the ROPE was set to \(\pm 20\) ms for RT differences, following similar applications to psycholinguistic response-time data (Vasishth et al. 2018), and \(\pm 2\) percentage points for accuracy differences.
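Each of these summaries is a simple function of the posterior draws for a parameter. A minimal Python sketch follows, using simulated draws in place of real model output. The function names and the inclusive ROPE boundaries are assumptions.

```python
import random
from statistics import median


def quantile(sorted_draws, q):
    """Linear-interpolation quantile of pre-sorted draws."""
    position = q * (len(sorted_draws) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_draws) - 1)
    weight = position - lower
    return sorted_draws[lower] * (1 - weight) + sorted_draws[upper] * weight


def summarise(draws, rope):
    """Median, 95% PI, P(neg.), and ROPE probability for one parameter."""
    ordered = sorted(draws)
    n = len(ordered)
    return {
        "median": median(ordered),
        "pi95": (quantile(ordered, 0.025), quantile(ordered, 0.975)),
        "p_neg": sum(d < 0 for d in ordered) / n,
        # Assumption: the ROPE boundaries themselves count as inside.
        "rope": sum(-rope <= d <= rope for d in ordered) / n,
    }


# Simulated stand-in for posterior draws of one RT coefficient (log units).
rng = random.Random(2024)
fake_draws = [rng.gauss(0.01, 0.03) for _ in range(4000)]
summary = summarise(fake_draws, rope=0.05)
```

P(neg.) and the ROPE probability answer different questions: the first concerns the direction of an effect, the second whether its magnitude is negligible. An effect can therefore have P(neg.) close to .50 and still have a low ROPE probability if its posterior is wide.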
Reaction times were analysed using a Bayesian shifted-lognormal mixed-effects model. The model included fixed effects of lexicality, script, and their interaction, with random intercepts for participants and items. Estimated marginal means and pairwise script comparisons are shown in Table 1, and the RT estimates are plotted in Figure 1. For the RT model, the posterior summary for lexicality was Mdn = -0.18, 95% PI [-0.24, -0.13], P(neg.) > .99, ROPE < .01; for script, Mdn = 0.01, 95% PI [-0.05, 0.06], P(neg.) = 0.39, ROPE = 0.91; and for the lexicality by script interaction, Mdn = 0.18, 95% PI [0.07, 0.29], P(neg.) < .01, ROPE < .01.
Accuracy was analysed using a Bayesian Bernoulli mixed-effects model with fixed effects of lexicality, script, and their interaction, and random intercepts for participants and items. For the accuracy model, the posterior summary for lexicality was Mdn = 0.18, 95% PI [-0.24, 0.60], P(neg.) = 0.20, ROPE = 0.26; for script, Mdn = -0.18, 95% PI [-0.60, 0.23], P(neg.) = 0.80, ROPE = 0.26; and for the lexicality by script interaction, Mdn = 0.15, 95% PI [-0.64, 0.93], P(neg.) = 0.36, ROPE = 0.18.
| Outcome | Lexicality | Pinyin | Chinese | Difference (Pinyin - Chinese) | Ratio / OR (Pinyin vs. Chinese) | P(neg.) | ROPE |
|---|---|---|---|---|---|---|---|
| Reaction time | nonword | 1185 ms [1090, 1299] | 1258 ms [1155, 1381] | -72 ms [-145, -3] | 0.94 [0.89, 1.00] | 0.98 | 0.07 |
| Reaction time | word | 1109 ms [1024, 1211] | 1038 ms [961, 1130] | 71 ms [15, 128] | 1.07 [1.01, 1.12] | < .01 | 0.04 |
| Accuracy | nonword | 93.36% [90.03, 95.88] | 94.73% [91.77, 96.91] | -1.34 pp [-4.69, 1.64] | 0.78 [0.44, 1.35] | 0.80 | 0.64 |
| Accuracy | word | 94.76% [91.75, 96.88] | 95.24% [92.54, 97.21] | -0.45 pp [-3.37, 2.32] | 0.91 [0.51, 1.62] | 0.63 | 0.82 |
Pairwise comparisons within lexicality indicated that responses to real words were slower for Pinyin than for Chinese (71 ms, 95% PI [15, 128]), whereas responses to nonwords were slower for Chinese than for Pinyin (-72 ms, 95% PI [-145, -3]). For accuracy, the script differences were small and their 95% PIs included zero for both words (-0.45 pp, ROPE = 0.82) and nonwords (-1.34 pp, ROPE = 0.64).
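One way to obtain millisecond-scale comparisons of this kind from a shifted-lognormal model is to back-transform each posterior draw to a cell mean and then take the difference. The Python sketch below illustrates that idea only; it is an assumption that the reported estimates were computed this way, and the dictionary keys, the coding (Pinyin = +0.5, Chinese = -0.5), and the use of the distribution mean rather than the median are all hypothetical.

```python
from math import exp


def cell_mean_ms(mu, sigma, shift_ms):
    """Mean of a shifted lognormal: shift plus the lognormal mean."""
    return shift_ms + exp(mu + sigma ** 2 / 2)


def script_contrast(draw, lex_code, rope_ms=20.0):
    """Pinyin-minus-Chinese contrast in ms for one posterior draw.

    `draw` holds hypothetical keys b0, b_lex, b_script, b_int, sigma, shift_ms.
    Assumed coding: Pinyin = +0.5, Chinese = -0.5.
    """
    def mu(script_code):
        return (
            draw["b0"]
            + draw["b_lex"] * lex_code
            + draw["b_script"] * script_code
            + draw["b_int"] * lex_code * script_code
        )

    pinyin = cell_mean_ms(mu(+0.5), draw["sigma"], draw["shift_ms"])
    chinese = cell_mean_ms(mu(-0.5), draw["sigma"], draw["shift_ms"])
    difference = pinyin - chinese
    return {
        "difference_ms": difference,
        "ratio": pinyin / chinese,
        "in_rope": abs(difference) <= rope_ms,
    }


# One hypothetical posterior draw, evaluated for words (lex_code = +0.5).
example_draw = {"b0": 6.7, "b_lex": -0.2, "b_script": 0.0,
                "b_int": 0.2, "sigma": 0.4, "shift_ms": 250.0}
word_contrast = script_contrast(example_draw, lex_code=+0.5)
```

Applying the function to every draw and summarising the resulting differences gives a posterior median, a 95% PI, and the proportion of draws inside the \(\pm 20\) ms ROPE.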
Figure 1: Estimated marginal mean reaction times by lexicality and written form. Error bars show 95% probability intervals.
As a post hoc analysis, the RT model was extended by adding exposure-dominance group and all interactions with lexicality and script. This analysis included the 29 participants classified as English-dominant or Chinese-family-dominant (English-dominant: n = 14; Chinese-family-dominant: n = 15). The six approximately balanced participants and one participant whose exposure response could not be coded were not included in this exploratory subgroup analysis. The subgroup estimated marginal means and pairwise script comparisons are shown in Table 2 and Figure 2. In addition to the three-way interaction (Mdn = -0.09, 95% PI [-0.33, 0.13], P(neg.) = 0.79, ROPE = 0.25), the exploratory model estimated the exposure-group main effect (Mdn = 0.18, 95% PI [-0.02, 0.37], P(neg.) = 0.04, ROPE = 0.08), the lexicality by exposure-group interaction (Mdn = -0.26, 95% PI [-0.38, -0.14], P(neg.) > .99, ROPE < .01), and the script by exposure-group interaction (Mdn = -0.20, 95% PI [-0.32, -0.08], P(neg.) > .99, ROPE < .01). These subgroup estimates and pairwise comparisons should be treated as descriptive.
| Exposure group | Lexicality | Pinyin | Chinese | Difference (Pinyin - Chinese) | Ratio (Pinyin / Chinese) | P(neg.) | ROPE |
|---|---|---|---|---|---|---|---|
| Chinese-family-dominant | nonword | 1046 ms [937, 1172] | 1058 ms [951, 1183] | -11 ms [-95, 71] | 0.99 [0.91, 1.07] | 0.61 | 0.36 |
| Chinese-family-dominant | word | 1094 ms [979, 1229] | 954 ms [862, 1062] | 139 ms [62, 226] | 1.15 [1.06, 1.24] | < .01 | < .01 |
| English-dominant | nonword | 1231 ms [1090, 1397] | 1393 ms [1228, 1589] | -161 ms [-294, -41] | 0.88 [0.80, 0.97] | > .99 | < .01 |
| English-dominant | word | 1043 ms [934, 1169] | 1071 ms [956, 1208] | -29 ms [-114, 57] | 0.97 [0.90, 1.06] | 0.73 | 0.29 |
Figure 2: Estimated marginal mean reaction times by lexicality, written form, and language-exposure group. Error bars show 95% probability intervals.
Table 3 reports \(\hat{R}\) values for the main reported parameters from each fitted model. Trace plots are then shown for the same fixed-effect parameters. The chains showed adequate mixing for the parameters reported in the main text.
| Model | Parameter | R-hat |
|---|---|---|
| Reaction time | Intercept | 1.00 |
| Reaction time | Lexicality | 1.00 |
| Reaction time | Script | 1.00 |
| Reaction time | Lexicality x script | 1.00 |
| Reaction time | Non-decision time intercept | 1.00 |
| Accuracy | Intercept | 1.00 |
| Accuracy | Lexicality | 1.00 |
| Accuracy | Script | 1.00 |
| Accuracy | Lexicality x script | 1.00 |
| Exploratory reaction time | Intercept | 1.00 |
| Exploratory reaction time | Exposure group | 1.00 |
| Exploratory reaction time | Lexicality x exposure group | 1.00 |
| Exploratory reaction time | Script x exposure group | 1.00 |
| Exploratory reaction time | Lexicality x script x exposure group | 1.00 |
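For readers unfamiliar with \(\hat{R}\), the following Python sketch computes the classic split-\(\hat{R}\) described by Gelman et al. (2014) on simulated chains. It is illustrative only: current Stan-based software typically reports a rank-normalised variant, so values from this function may differ slightly from those in the table.

```python
import random
from statistics import mean, variance


def split_rhat(chains):
    """Classic split-R-hat for a list of equal-length chains of draws."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])  # drops a trailing draw if length is odd

    n = len(halves[0])
    within = mean([variance(h) for h in halves])        # W: mean within-chain variance
    between = n * variance([mean(h) for h in halves])   # B: between-chain variance
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return (pooled / within) ** 0.5


# Simulated chains: four that agree, and four where two sit at a different mean.
rng = random.Random(0)
mixed = [[rng.gauss(0, 1) for _ in range(500)] for _ in range(4)]
stuck = [[rng.gauss(m, 1) for _ in range(500)] for m in (0, 0, 5, 5)]

rhat_mixed = split_rhat(mixed)
rhat_stuck = split_rhat(stuck)
```

Values near 1.00 indicate that between-chain and within-chain variability agree, whereas the `stuck` example yields a value well above 1 because the chains disagree about the location of the posterior.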
Figure 3: Trace plots for the main reaction-time model fixed-effect parameters.
Figure 4: Trace plots for the accuracy model fixed-effect parameters.
Figure 5: Trace plots for the exploratory reaction-time model fixed-effect parameters involving language-exposure group.
Posterior predictive checks compare the observed data with draws from the posterior predictive distribution. For reaction times, the checks compare the observed RT distribution with replicated RT distributions generated by the fitted shifted-lognormal models. For accuracy, the check compares the observed distribution of binary responses with replicated binary-response data generated by the Bernoulli model.
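The logic of a posterior predictive check for RTs can be sketched in Python as follows. This is a deliberately simplified illustration, not the procedure used for the figures: it ignores predictors and random effects, it compares medians instead of overlaying full densities, and the "observed" data and posterior draws are simulated. All names and parameter values are hypothetical.

```python
import random
from math import log
from statistics import median


def replicate_rts(n_trials, mu, sigma, shift_ms, rng):
    """Simulate one replicated data set from a shifted lognormal."""
    return [shift_ms + rng.lognormvariate(mu, sigma) for _ in range(n_trials)]


def ppc_medians(observed, draws, rng):
    """Compare the observed median RT with medians of replicated data sets."""
    observed_median = median(observed)
    replicated = [
        median(replicate_rts(len(observed), d["mu"], d["sigma"], d["shift_ms"], rng))
        for d in draws
    ]
    # Share of replicated medians at or below the observed median.
    below = sum(r <= observed_median for r in replicated) / len(replicated)
    return observed_median, replicated, below


rng = random.Random(1)
# Simulated stand-ins for the observed RTs and for 50 posterior draws.
observed = replicate_rts(200, log(700), 0.4, 250.0, rng)
draws = [{"mu": log(700), "sigma": 0.4, "shift_ms": 250.0} for _ in range(50)]
obs_median, rep_medians, share_below = ppc_medians(observed, draws, rng)
```

A share near 0 or 1 would indicate that the model systematically over- or under-predicts the chosen statistic. Density overlays make the same comparison across the whole distribution at once.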
Figure 6: Posterior predictive check for the main reaction-time model. The dark line shows the observed data; lighter lines show replicated data sets generated from the posterior predictive distribution.
Figure 7: Posterior predictive check for the accuracy model. Bars compare the observed binary-response distribution with replicated data sets generated from the posterior predictive distribution.
Figure 8: Posterior predictive check for the exploratory reaction-time model. The dark line shows the observed data; lighter lines show replicated data sets generated from the posterior predictive distribution.