Participants

All 40 task completers reported both English and a Chinese-family language or dialect, so no participants were excluded on language-background grounds. The final analysed sample included 36 Chinese-English bilinguals after excluding 4 participants with accuracy below 80%. Participants were aged 18-42 years (\(M = 23.14\), \(SD = 5.45\)). First-language responses were predominantly Mandarin/Chinese (28 participants), followed by English (4), Cantonese (3), and Fuzhou dialect (1). For second language, most participants reported English (28), with a smaller number reporting Mandarin/Chinese (4). Thirteen participants reported an additional language or dialect beyond their main Chinese-English background.

Language exposure was hand-coded from free-text LEAP-Q responses. Mandarin, Chinese, Cantonese, Fuzhou dialect, and Hokkien were grouped as Chinese-family exposure. Two responses whose percentages summed to more than 100 were rescaled proportionally. Exposure could be coded for 35 participants. Mean English exposure was 48.59% (\(SD = 23.76\), range = 2.00-95.00), and mean Chinese-family exposure was 50.92% (\(SD = 23.46\), range = 5.00-97.00). Fourteen participants were English-dominant, 15 were Chinese-family-dominant, and 6 were approximately balanced (within 10 percentage points).
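
The rescaling and dominance rules described above can be sketched as follows. This is a minimal illustration, not the coding procedure actually used: the function names, the example response, and the choice to treat a difference of exactly 10 percentage points as balanced are assumptions.

```python
# Labels grouped as Chinese-family exposure, as listed in the text.
CHINESE_FAMILY = {"Mandarin", "Chinese", "Cantonese", "Fuzhou dialect", "Hokkien"}


def rescale_exposure(percentages):
    """Rescale exposure percentages proportionally if they sum to more than 100."""
    total = sum(percentages.values())
    if total <= 100:
        return dict(percentages)
    return {lang: 100 * value / total for lang, value in percentages.items()}


def classify_dominance(percentages, balance_margin=10.0):
    """Classify a participant from their (rescaled) exposure percentages."""
    scaled = rescale_exposure(percentages)
    english = scaled.get("English", 0.0)
    chinese = sum(v for lang, v in scaled.items() if lang in CHINESE_FAMILY)
    # Assumption: a gap of exactly `balance_margin` points still counts as balanced.
    if abs(english - chinese) <= balance_margin:
        return "balanced"
    return "English-dominant" if english > chinese else "Chinese-family-dominant"


# Hypothetical response whose percentages sum to 120.
example = {"English": 70, "Mandarin": 40, "Cantonese": 10}
rescaled = rescale_exposure(example)
print(rescaled)
print(classify_dominance(example))
```

Proportional rescaling preserves the ratio between languages, so it changes the size of the English versus Chinese-family gap but never its direction.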

Data Analysis

Reaction-time analyses are based on correct responses only. Very slow responses are unlikely to reflect the intended speeded lexical decision process. The analysis therefore applied the language-background and participant-accuracy exclusions, kept correct responses, and then applied a 3000 ms RT cutoff. The RT cutoff removed 110 correct trials, corresponding to 8.18% of correct trials and 7.64% of all trials in the analysed participant sample.
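
The order of these steps matters, because participant accuracy is computed before incorrect and slow trials are dropped. A minimal sketch of the sequence is given below. The trial structure, the key names, and the treatment of a response at exactly 3000 ms as retained are assumptions; the language-background step is omitted because it excluded no participants.

```python
ACCURACY_THRESHOLD = 0.80  # participants below this are excluded (from the text)
RT_CUTOFF_MS = 3000        # upper RT cutoff (from the text)


def clean_trials(trials):
    """Apply the participant-accuracy exclusion, keep correct trials, then trim slow RTs.

    `trials` is a list of dicts with hypothetical keys:
    'participant', 'correct' (bool), and 'rt' (milliseconds).
    """
    # Step 1: per-participant accuracy over all of that participant's trials.
    counts = {}
    for t in trials:
        n, k = counts.get(t["participant"], (0, 0))
        counts[t["participant"]] = (n + 1, k + int(t["correct"]))
    kept_ids = {p for p, (n, k) in counts.items() if k / n >= ACCURACY_THRESHOLD}

    # Step 2: keep correct responses from retained participants.
    correct = [t for t in trials if t["participant"] in kept_ids and t["correct"]]

    # Step 3: drop correct responses slower than the cutoff.
    return [t for t in correct if t["rt"] <= RT_CUTOFF_MS]


# Toy data: participant A is fully accurate, participant B is 60% accurate.
toy = (
    [{"participant": "A", "correct": True, "rt": rt} for rt in (800, 900, 3500, 1000, 1200)]
    + [{"participant": "B", "correct": c, "rt": 950} for c in (True, True, True, False, False)]
)
cleaned = clean_trials(toy)
print(len(cleaned))
```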

Data were analysed using Bayesian mixed-effects models (Gelman et al. 2014; McElreath 2016; Nicenboim and Vasishth 2016). Reaction times were modelled with a shifted-lognormal distribution (Rouder 2005), which is appropriate for positively skewed response-time data because it estimates a non-decision-time shift in addition to the lognormal location and scale parameters. This approach follows recommendations to model response-time distributions directly rather than relying only on transformed Gaussian models (Lo and Andrews 2015). Accuracy was analysed with a Bernoulli model with a logit link. Both models included fixed effects of lexicality, script, and their interaction, with random intercepts for participants and items.
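
One way to write the two models described above is given below. The notation is assumed here for illustration and is not taken from the fitted code: \(\delta\) denotes the non-decision-time shift, \(p[i]\) and \(j[i]\) index the participant and item of trial \(i\), and \(\tau_{u}\) and \(\tau_{w}\) are the random-intercept standard deviations. The accuracy model has its own coefficients and random intercepts, marked with a prime.

```latex
\[
\begin{aligned}
RT_{i} &= \delta + T_{i}, \qquad T_{i} \sim \mathrm{LogNormal}(\mu_{i}, \sigma), \\
\mu_{i} &= \beta_{0} + \beta_{1}\,\mathrm{lex}_{i} + \beta_{2}\,\mathrm{script}_{i}
          + \beta_{3}\,\mathrm{lex}_{i}\,\mathrm{script}_{i} + u_{p[i]} + w_{j[i]}, \\
u_{p} &\sim \mathcal{N}(0, \tau_{u}), \qquad w_{j} \sim \mathcal{N}(0, \tau_{w}), \\[4pt]
\mathrm{correct}_{i} &\sim \mathrm{Bernoulli}(\pi_{i}), \\
\mathrm{logit}(\pi_{i}) &= \beta'_{0} + \beta'_{1}\,\mathrm{lex}_{i} + \beta'_{2}\,\mathrm{script}_{i}
          + \beta'_{3}\,\mathrm{lex}_{i}\,\mathrm{script}_{i} + u'_{p[i]} + w'_{j[i]}.
\end{aligned}
\]
```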

Predictors were coded with scaled sum contrasts (-0.5, +0.5), following recommendations for contrast coding in mixed models (Brehm and Alday 2022; Schad et al. 2020). With this coding, two-level main effects are estimated on the full pairwise-difference scale and interaction terms are estimated on the corresponding difference-in-differences scale. This keeps main effects and interactions on comparable scales for ROPE evaluation. Models were fitted with brms (Bürkner 2017), which uses Stan and Hamiltonian Monte Carlo for Bayesian estimation (Carpenter et al. 2017; Hoffman and Gelman 2014).
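
The interpretation of the coefficients under this coding can be checked with a small worked example. The sketch below uses hypothetical cell means on the log scale (not estimates from this study) and assumes that word and Pinyin receive +0.5; the sign of each coefficient flips if the opposite assignment is used.

```python
# Hypothetical cell means on the log (location) scale; not values from the study.
cell_means = {
    ("word", "pinyin"): 6.76,
    ("word", "chinese"): 6.67,
    ("nonword", "pinyin"): 6.84,
    ("nonword", "chinese"): 6.92,
}

# Scaled sum contrasts; which level receives +0.5 is an assumption here.
LEX = {"nonword": -0.5, "word": +0.5}
SCRIPT = {"chinese": -0.5, "pinyin": +0.5}


def coefficients(m):
    """Closed-form coefficients for a saturated 2x2 design with +/-0.5 coding."""
    wp, wc = m[("word", "pinyin")], m[("word", "chinese")]
    np_, nc = m[("nonword", "pinyin")], m[("nonword", "chinese")]
    intercept = (wp + wc + np_ + nc) / 4          # grand mean of the four cells
    b_lex = (wp + wc) / 2 - (np_ + nc) / 2        # word minus nonword
    b_script = (wp + np_) / 2 - (wc + nc) / 2     # Pinyin minus Chinese
    b_inter = (wp - wc) - (np_ - nc)              # difference in differences
    return intercept, b_lex, b_script, b_inter


def predict(coefs, lex, script):
    """Rebuild a cell mean from the coefficients and the contrast codes."""
    b0, b1, b2, b3 = coefs
    x1, x2 = LEX[lex], SCRIPT[script]
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


coefs = coefficients(cell_means)
for (lex, script), mean in cell_means.items():
    print(lex, script, round(predict(coefs, lex, script), 4), mean)
```

Because the linear predictor reproduces every cell mean exactly, the main-effect coefficients are full pairwise differences and the interaction coefficient is the difference between the two script effects.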

Weakly informative priors were used to regularise estimates without strongly constraining the direction of the effects (McElreath 2016). For the RT model, fixed effects were assigned \(\mathcal{N}(0, 0.30)\) priors on the shifted-lognormal location scale, with a \(\mathcal{N}(\log(1000), 1)\) prior on the intercept and a \(\mathcal{N}(\log(250), 0.35)\) prior on the non-decision-time intercept. For the accuracy model, fixed effects were assigned \(\mathcal{N}(0, 1)\) priors on the log-odds scale, with a \(\mathcal{N}(2.5, 1.5)\) prior on the intercept. Model convergence was assessed using \(\hat{R}\), effective sample sizes, divergent transitions, and visual inspection of trace plots.
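
To see what these priors imply on more familiar scales, their central 95% ranges can be back-transformed. The sketch below assumes that the second argument of each normal prior is a standard deviation and that the non-decision-time prior is placed on the log scale, as its \(\log(250)\) centre suggests. The intercept range refers to the lognormal component only, before the shift is added.

```python
import math


def normal_interval(mean, sd, z=1.96):
    """Central ~95% interval of a normal prior, assuming `sd` is a standard deviation."""
    return mean - z * sd, mean + z * sd


def inv_logit(x):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# RT intercept: N(log(1000), 1) on the location scale, back-transformed to ms.
rt_intercept_ms = tuple(math.exp(v) for v in normal_interval(math.log(1000), 1.0))

# Non-decision time: N(log(250), 0.35), read here as a prior on the log scale.
ndt_ms = tuple(math.exp(v) for v in normal_interval(math.log(250), 0.35))

# RT fixed effects: N(0, 0.30) in log units, expressed as multiplicative ratios.
rt_effect_ratio = tuple(math.exp(v) for v in normal_interval(0.0, 0.30))

# Accuracy intercept: N(2.5, 1.5) in log-odds, expressed as proportion correct.
acc_intercept = tuple(inv_logit(v) for v in normal_interval(2.5, 1.5))
acc_prior_median = inv_logit(2.5)

print("RT intercept (ms):", [round(v) for v in rt_intercept_ms])
print("Non-decision time (ms):", [round(v) for v in ndt_ms])
print("RT effect ratio:", [round(v, 2) for v in rt_effect_ratio])
print("Accuracy intercept:", [round(v, 3) for v in acc_intercept])
```

Under these assumptions the RT intercept prior is centred on 1000 ms for the unshifted component, and the accuracy intercept prior is centred on roughly 92% correct.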

Posterior summaries are reported using posterior medians, 95% probability intervals (PIs), the posterior probability that the effect is negative [P(neg.)], and the posterior probability that the effect falls inside a region of practical equivalence (ROPE). PIs directly summarise uncertainty in the posterior distribution of the parameter (Sorensen et al. 2016). ROPE values quantify the posterior probability that an effect is practically equivalent to zero (Kruschke 2018; Kruschke and Liddell 2018). The ROPE was set to \(\pm .05\) log units for RT coefficients and \(\pm .10\) log-odds for accuracy coefficients. For pairwise comparisons in the tables, the ROPE was set to \(\pm 20\) ms for RT differences, following similar applications to psycholinguistic response-time data (Vasishth et al. 2018), and \(\pm 2\) percentage points for accuracy differences.
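
Each of these summaries is a simple function of the posterior draws for a parameter. The sketch below illustrates the calculations on simulated draws. The draws, the helper names, and the interpolation rule used for the quantiles are assumptions, so this shows the logic rather than the exact routine used to produce the reported values.

```python
import random
import statistics


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def summarise(draws, rope_half_width):
    """Posterior median, 95% PI, P(neg.), and proportion of draws inside the ROPE."""
    ordered = sorted(draws)
    n = len(ordered)
    return {
        "median": statistics.median(ordered),
        "pi_95": (quantile(ordered, 0.025), quantile(ordered, 0.975)),
        "p_neg": sum(d < 0 for d in ordered) / n,
        "rope": sum(-rope_half_width <= d <= rope_half_width for d in ordered) / n,
    }


# Simulated stand-in for the posterior draws of one RT coefficient (log units).
rng = random.Random(1)
draws = [rng.gauss(-0.18, 0.03) for _ in range(4000)]
summary = summarise(draws, rope_half_width=0.05)
print(summary)
```

A ROPE value near 1 indicates that almost all posterior mass lies within the practically negligible range, whereas a value near 0 indicates that the effect is credibly outside it.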

Results

Reaction times were analysed using a Bayesian shifted-lognormal mixed-effects model. The model included fixed effects of lexicality, script, and their interaction, with random intercepts for participants and items. Estimated marginal means and pairwise script comparisons are shown in Table 1, and the RT estimates are plotted in Figure 1. For the RT model, the posterior summary for lexicality was Mdn = -0.18, 95% PI [-0.24, -0.13], P(neg.) > .99, ROPE < .01; for script, Mdn = 0.01, 95% PI [-0.05, 0.06], P(neg.) = 0.39, ROPE = 0.91; and for the lexicality by script interaction, Mdn = 0.18, 95% PI [0.07, 0.29], P(neg.) < .01, ROPE < .01.

Accuracy was analysed using a Bayesian Bernoulli mixed-effects model with fixed effects of lexicality, script, and their interaction, and random intercepts for participants and items. For the accuracy model, the posterior summary for lexicality was Mdn = 0.18, 95% PI [-0.24, 0.60], P(neg.) = 0.20, ROPE = 0.26; for script, Mdn = -0.18, 95% PI [-0.60, 0.23], P(neg.) = 0.80, ROPE = 0.26; and for the lexicality by script interaction, Mdn = 0.15, 95% PI [-0.64, 0.93], P(neg.) = 0.36, ROPE = 0.18.

Table 1: Posterior estimated marginal means and pairwise script comparisons for reaction time and accuracy. Reaction-time differences are in milliseconds; accuracy differences are in percentage points. Intervals are 95% posterior intervals.
Outcome	Lexicality	Pinyin	Chinese	Difference	Ratio / OR	P(neg.)	ROPE
Reaction time	nonword	1185 ms [1090, 1299]	1258 ms [1155, 1381]	-72 ms [-145, -3]	0.94 [0.89, 1.00]	0.98	0.07
Reaction time	word	1109 ms [1024, 1211]	1038 ms [961, 1130]	71 ms [15, 128]	1.07 [1.01, 1.12]	< .01	0.04
Accuracy	nonword	93.36% [90.03, 95.88]	94.73% [91.77, 96.91]	-1.34 pp [-4.69, 1.64]	0.78 [0.44, 1.35]	0.80	0.64
Accuracy	word	94.76% [91.75, 96.88]	95.24% [92.54, 97.21]	-0.45 pp [-3.37, 2.32]	0.91 [0.51, 1.62]	0.63	0.82
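
The Difference and Ratio / OR columns can be read as Pinyin relative to Chinese, which is inferred here from the signs in the table. The sketch below recomputes them from the tabled point estimates. Small discrepancies from the tabled values are expected, because each tabled entry is presumably a summary of its own posterior distribution rather than a function of the other entries.

```python
def odds(p):
    """Convert a proportion correct to odds."""
    return p / (1 - p)


# Point estimates copied from Table 1, ordered as (Pinyin, Chinese).
rt_ms = {"nonword": (1185, 1258), "word": (1109, 1038)}
accuracy = {"nonword": (0.9336, 0.9473), "word": (0.9476, 0.9524)}

rt_difference = {lex: p - c for lex, (p, c) in rt_ms.items()}
rt_ratio = {lex: round(p / c, 2) for lex, (p, c) in rt_ms.items()}
acc_difference_pp = {lex: round(100 * (p - c), 2) for lex, (p, c) in accuracy.items()}
odds_ratio = {lex: round(odds(p) / odds(c), 2) for lex, (p, c) in accuracy.items()}

print("RT difference (ms):", rt_difference)
print("RT ratio:", rt_ratio)
print("Accuracy difference (pp):", acc_difference_pp)
print("Odds ratio:", odds_ratio)
```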

Pairwise comparisons within lexicality indicated that responses to real words were slower for Pinyin than for Chinese (difference = 71 ms, 95% PI [15, 128]). For nonwords, the pattern reversed: responses were slower for Chinese than for Pinyin (difference = -72 ms, 95% PI [-145, -3]). Accuracy differences between scripts were small for both words and nonwords, and their 95% PIs included zero.

Figure 1: Estimated marginal mean reaction times by lexicality and written form. Error bars show 95% probability intervals.

Exploratory RT Analysis by Language Exposure

As a post hoc analysis, the RT model was extended by adding exposure-dominance group and all interactions with lexicality and script. This analysis included the 29 participants classified as English-dominant or Chinese-family-dominant (English-dominant: n = 14; Chinese-family-dominant: n = 15). The six approximately balanced participants and one participant whose exposure response could not be coded were not included in this exploratory subgroup analysis. The subgroup estimated marginal means and pairwise script comparisons are shown in Table 2 and Figure 2. In addition to the three-way interaction (Mdn = -0.09, 95% PI [-0.33, 0.13], P(neg.) = 0.79, ROPE = 0.25), the exploratory model estimated the exposure-group main effect (Mdn = 0.18, 95% PI [-0.02, 0.37], P(neg.) = 0.04, ROPE = 0.08), the lexicality by exposure-group interaction (Mdn = -0.26, 95% PI [-0.38, -0.14], P(neg.) > .99, ROPE < .01), and the script by exposure-group interaction (Mdn = -0.20, 95% PI [-0.32, -0.08], P(neg.) > .99, ROPE < .01). These subgroup estimates and pairwise comparisons should be treated as descriptive.

Table 2: Posterior estimated marginal mean reaction times and pairwise script comparisons by language-exposure group. Differences are in milliseconds and intervals are 95% posterior intervals.
Exposure group	Lexicality	Pinyin	Chinese	Difference	Ratio	P(neg.)	ROPE
Chinese-family-dominant	nonword	1046 ms [937, 1172]	1058 ms [951, 1183]	-11 ms [-95, 71]	0.99 [0.91, 1.07]	0.61	0.36
Chinese-family-dominant	word	1094 ms [979, 1229]	954 ms [862, 1062]	139 ms [62, 226]	1.15 [1.06, 1.24]	< .01	< .01
English-dominant	nonword	1231 ms [1090, 1397]	1393 ms [1228, 1589]	-161 ms [-294, -41]	0.88 [0.80, 0.97]	> .99	< .01
English-dominant	word	1043 ms [934, 1169]	1071 ms [956, 1208]	-29 ms [-114, 57]	0.97 [0.90, 1.06]	0.73	0.29

Figure 2: Estimated marginal mean reaction times by lexicality, written form, and language-exposure group. Error bars show 95% probability intervals.

References

Brehm, Laurel, and Phillip M. Alday. 2022. “Contrast Coding Choices in a Decade of Mixed Models.” Journal of Memory and Language 125: 104334. https://doi.org/10.1016/j.jml.2022.104334.

Bürkner, Paul-Christian. 2017. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

Carpenter, Bob, Andrew Gelman, Matthew D. Hoffman, et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1): 1–32. https://doi.org/10.18637/jss.v076.i01.

Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2014. Bayesian Data Analysis. 3rd ed. Chapman and Hall/CRC.

Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15 (1): 1593–623.

Kruschke, John K. 2018. “Rejecting or Accepting Parameter Values in Bayesian Estimation.” Advances in Methods and Practices in Psychological Science 1 (2): 270–80. https://doi.org/10.1177/2515245918771304.

Kruschke, John K., and Torrin M. Liddell. 2018. “The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective.” Psychonomic Bulletin & Review 25 (1): 178–206. https://doi.org/10.3758/s13423-016-1221-4.

Lo, Steson, and Sally Andrews. 2015. “To Transform or Not to Transform: Using Generalized Linear Mixed Models to Analyse Reaction Time Data.” Frontiers in Psychology 6: 1171. https://doi.org/10.3389/fpsyg.2015.01171.

McElreath, Richard. 2016. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.

Nicenboim, Bruno, and Shravan Vasishth. 2016. “Statistical Methods for Linguistic Research: Foundational Ideas - Part II.” Language and Linguistics Compass 10 (11): 591–613. https://doi.org/10.1111/lnc3.12207.

Rouder, Jeffrey N. 2005. “Are Unshifted Distributional Models Appropriate for Response Time?” Psychometrika 70 (2): 377–81. https://doi.org/10.1007/s11336-005-1297-7.

Schad, Daniel J., Shravan Vasishth, Sven Hohenstein, and Reinhold Kliegl. 2020. “How to Capitalize on a Priori Contrasts in Linear (Mixed) Models: A Tutorial.” Journal of Memory and Language 110: 104038. https://doi.org/10.1016/j.jml.2019.104038.

Sorensen, Tanner, Sven Hohenstein, and Shravan Vasishth. 2016. “Bayesian Linear Mixed Models Using Stan: A Tutorial for Psychologists, Linguists, and Cognitive Scientists.” The Quantitative Methods for Psychology 12 (3): 175–200. https://doi.org/10.20982/tqmp.12.3.p175.

Vasishth, Shravan, Daniela Mertzen, Lena A. Jäger, and Andrew Gelman. 2018. “The Statistical Significance Filter Leads to Overoptimistic Expectations of Replicability.” Journal of Memory and Language 103: 151–75. https://doi.org/10.1016/j.jml.2018.07.004.

Appendix: Posterior Model Checks

Table 3 reports \(\hat{R}\) values for the main reported parameters from each fitted model. Trace plots are then shown for the same fixed-effect parameters. The chains showed adequate mixing for the parameters reported in the main text.

Table 3: R-hat values for the main reported parameters from each Bayesian model.
Model	Parameter	R-hat
Reaction time	Intercept	1.00
Reaction time	Lexicality	1.00
Reaction time	Script	1.00
Reaction time	Lexicality x script	1.00
Reaction time	Non-decision time intercept	1.00
Accuracy	Intercept	1.00
Accuracy	Lexicality	1.00
Accuracy	Script	1.00
Accuracy	Lexicality x script	1.00
Exploratory reaction time	Intercept	1.00
Exploratory reaction time	Exposure group	1.00
Exploratory reaction time	Lexicality x exposure group	1.00
Exploratory reaction time	Script x exposure group	1.00
Exploratory reaction time	Lexicality x script x exposure group	1.00
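
For readers unfamiliar with the diagnostic, the sketch below computes a classic split-\(\hat{R}\) from simulated chains. It is an illustration only: the values in Table 3 come from the fitted models, and the software may report a refined variant (for example, a rank-normalised version) rather than this basic formula. The chains and function name are hypothetical.

```python
import random
import statistics


def split_rhat(chains):
    """Classic split-R-hat for a list of equal-length chains of draws."""
    half = len(chains[0]) // 2
    # Split each chain into two halves so within-chain drift is also detected.
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    chain_means = [statistics.fmean(h) for h in halves]
    within = statistics.fmean([statistics.variance(h) for h in halves])
    between = n * statistics.variance(chain_means)
    # Pooled variance estimate that overestimates when chains disagree.
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


rng = random.Random(7)
# Four well-mixed chains drawn from the same distribution.
good_chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Four chains of which two are stuck around a different value.
bad_chains = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 0.0, 3.0, 3.0)]

print(round(split_rhat(good_chains), 3))
print(round(split_rhat(bad_chains), 3))
```

Values close to 1.00, as in Table 3, indicate that between-chain and within-chain variability agree; values clearly above 1 indicate that the chains have not converged to a common distribution.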

Figure 3: Trace plots for the main reaction-time model fixed-effect parameters.

Figure 4: Trace plots for the accuracy model fixed-effect parameters.

Figure 5: Trace plots for the exploratory reaction-time model fixed-effect parameters involving language-exposure group.

Posterior predictive checks compare the observed data with draws from the posterior predictive distribution. For reaction times, the checks compare the observed RT distribution with replicated RT distributions generated by the fitted shifted-lognormal models. For accuracy, the check compares the observed distribution of binary responses with replicated binary-response data generated by the Bernoulli model.
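
The logic of such a check can be sketched with simulated data. The example below replaces the density overlays shown in the figures with a simpler comparison of medians, and every number in it (the location, scale, shift, and sample sizes) is hypothetical rather than taken from the fitted models.

```python
import math
import random
import statistics


def simulate_shifted_lognormal(n, mu, sigma, shift_ms, rng):
    """Draw n response times as a shift plus a lognormal component."""
    return [shift_ms + rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(2024)

# Hypothetical "observed" RTs and hypothetical posterior draws of (mu, sigma, shift).
observed = simulate_shifted_lognormal(500, math.log(800), 0.35, 250, rng)
posterior_draws = [(math.log(800) + rng.gauss(0, 0.02), 0.35, 250) for _ in range(50)]

# One replicated data set per posterior draw, summarised by its median.
replicated_medians = [
    statistics.median(simulate_shifted_lognormal(len(observed), mu, sigma, shift, rng))
    for mu, sigma, shift in posterior_draws
]

observed_median = statistics.median(observed)
# Share of replicated data sets whose median exceeds the observed median.
ppc_share = sum(m > observed_median for m in replicated_medians) / len(replicated_medians)
print(round(observed_median), round(ppc_share, 2))
```

A share near 0 or 1 would suggest that the model systematically under- or over-predicts the chosen statistic, whereas intermediate values indicate that the observed statistic is typical of the replicated data.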

Figure 6: Posterior predictive check for the main reaction-time model. The dark line shows the observed data; lighter lines show replicated data sets generated from the posterior predictive distribution.

Figure 7: Posterior predictive check for the accuracy model. Bars compare the observed binary-response distribution with replicated data sets generated from the posterior predictive distribution.

Posterior predictive check for the exploratory reaction-time model. The dark line shows the observed data; lighter lines show replicated data sets generated from the posterior predictive distribution.