1 Results

Data were analysed in Bayesian linear mixed effects models (Gelman et al. 2014; McElreath 2016). The R (R Core Team 2020) package brms (Bürkner 2017b, 2018) was used to model the data using the probabilistic programming language Stan (Carpenter et al. 2016; Hoffman and Gelman 2014). Models were fitted with weakly informative priors (see McElreath 2016), except for effect parameters (see below) and run with 10,000 iterations on 3 chains with a warm-up of 5,000 iterations and no thinning. Model convergence was confirmed by the Rubin-Gelman statistic ($\hat{R}$ = 1) (Gelman and Rubin 1992) and inspection of the Markov chain Monte Carlo chains.

We calculated the statistical support for the effects of interest and the support for the null hypothesis over the alternative hypothesis. For the present data, the latter is as informative as the evidence in favour of effects as this would show that from our data we cannot conclude that typing compared to handwriting changes the students’ writing ability. Note that traditional statistical methods (null-hypothesis significance testing) does not allow this as null-hypothesis based inference is essentially testing the probability of the data assuming that the null-hypothesis is true. In contrast, the Bayesian framework used in our analysis allows us to infer how probable the null hypothesis is given the evidence (for discussion see Dienes 2014, 2016; Dienes and Mclatchie 2018; Schönbrodt et al. 2017; Wagenmakers et al. 2018). This evidence was obtained using Bayes Factors (henceforth, BF) calculated using the Savage-Dickey method (see, e.g., Dickey, Lientz, and others 1970; Wagenmakers et al. 2010). We calculated both the evidence for the alternative hypothesis H$_1$ over the null hypothesis H$_0$ (BF$_{10}$) and the evidence for the null hypothesis H$_0$ over the alternative hypothesis, given the data. A BF$_{10}$ larger than 5 indicate moderate and larger than 10 strong evidence for a statistically meaningful effect compared to the null hypothesis H$_0$ (see, e.g., Baguley 2012; Jeffreys 1961; Lee and Wagenmakers 2014). For example BF$_{10}$=2 reflect that the alternative hypothesis is two times more likely than the null hypothesis given the evidence. Similarly, a BF$_{01}$ larger than 10 indicates strong evidence in favour of the null hypothesis H$_0$ over the alternative hypothesis H$_1$.

Priors for all effects were Student’s t-distribution centered around 0 with a scale of 10 and 1 degree of freedom (essentially Cauchy). We used deliberately vague priors for the slope parameters as the Savage-Dickey method for BFs is sensitive to the priors on the parameter slopes. Further, we report the most probable posterior (i.e. inferred) parameter value $\mu$ as well as the interval around $\mu$ that contains 95% of the posterior probability mass; 95% probability intervals (henceforth, PI) (Kruschke, Aguinis, and Joo 2012; Nicenboim and Vasishth 2016; Sorensen, Hohenstein, and Vasishth 2016).

All models were fitted with random intercepts for schools and students nested in schools, and with by-schools adjustment for modality.

1.1 Text-production ability

1.1.1 Simple main effect of writing modality

Prior to analysing modality effects on various text quality measures, we want to investigate if modality affects students’ ability to produce a text. We operationalised this ability as the production of four or more words. Out of 140 handwritten texts, 121 texts contained four of more words; out of 140 typed text, 112 texts contained four words or more (102 students produced four or more words in both modalities; these data were used for the main analysis below). To test for a possible modality difference, we performed a Bayesian generalised mixed-effects model with Bernoulli link function for binomial data on the ability to produce a text (0: less than four words produced; 1: four or more words produced) with predictors for modality (levels: handwriting, typing), age, and their interaction. The modelling results can be found in Table 1.1. We found strong evidence against a modality by age interaction and negligible evidence against (or for) a modality difference suggesting that writing modality does not impact on students’ ability to produce a text.

Table 1.1: Modality effect for text production ability.
Predictor	Estimate	Lower	Upper	H₀	H₁
(Intercept)	-4.20	-15.49	6.63	–	–
Modality	-0.14	-2.40	1.98	1.03	0.97
Age	0.09	-0.05	0.25	6.22	0.16
Modality:Age	0.00	-0.04	0.03	62.60	0.02
Note:
Colon indicates interactions. Upper and lower indicate the bounds of 95% PIs. H₀ indicates the statistical support for the null hypothesis over the alternative hypothesis given the data determined as Bayes Factor BF₀₁. Conversely, the evidence for the alternative hypothesis H₁ is determined as BF₁₀ being the statistical support in favour of the alternative hypothesis over the null hypothesis.

1.1.2 Text-production ability moderated by literacy ability

In the previous section we determined the evidence for modality effects on text-production ability. To evaluate whether the modality effect varies by literacy abilities we run another logistic Bayesian mixed-effects model with Bernoulli distribution. In addition to modality, we added the the predictors grapheme-phoneme mapping, first sound pronunciation, blending, single-word reading, speller, vocabulary, age (in months) and all by-modality two-way interactions to test for a moderation of the modality effect. Table 1.2 summarises the modelling results. We found substantial to strong evidence in favour of the full suggest that writing modality is not moderated by age, blending, and vocabulary. However, there as substantial evidence for an interaction of writing modality by single-word reading. This interaction suggests that children with higher single-word reading scores are more likely to produce four or more words in typing than in handwriting. Evidence was all other interaction was negligible.

Table 1.2: Modality effect for text production ability moderated by literacy ability.
Predictor	Estimate	Lower	Upper	H₀	H₁
(Intercept)	-7.07	-22.73	7.41	–	–
Modality	-0.07	-2.37	2.16	0.97	1.03
Age	0.07	-0.12	0.28	8.48	0.12
First sound segment	0.35	0.02	0.75	0.78	1.28
Blending	0.23	-0.22	0.71	2.84	0.35
Speller	-0.35	-2.05	1.27	1.13	0.88
Vocabulary	-0.02	-0.24	0.19	9.74	0.10
Word reading	-0.64	-1.32	-0.02	0.43	2.34
Grapheme-Phoneme mapping	0.62	0.26	1.07	0.00	970.47
Modality:Age	0.00	-0.06	0.06	35.97	0.03
Modality:First sounds segment	-0.32	-0.75	0.07	1.59	0.63
Modality:Blending	0.01	-0.52	0.54	3.84	0.26
Modality:Speller	-0.55	-2.41	1.17	1.03	0.97
Modality:Vocabulary	0.16	-0.10	0.44	4.02	0.25
Modality:Word reading	0.79	0.09	1.53	0.27	3.72
Modality:Grapheme-Phoneme mapping	-0.45	-0.91	-0.08	0.36	2.75
Note:
Colon indicates interactions. Upper and lower indicate the bounds of 95% PIs. H₀ indicates the statistical support for the null hypothesis over the alternative hypothesis given the data determined as Bayes Factor BF₀₁. Conversely, the evidence for the alternative hypothesis H₁ is determined as BF₁₀ being the statistical support in favour of the alternative hypothesis over the null hypothesis.

1.1.3 Model comparison

We tested whether a model fitted on the text production data that included all literacy measures provide model fit with a higher predictive performance (i.e. explain more variance and covariance than a model without predictors and a model with the modality predictor only). This was tested in out-of-sample predictions estimated using Pareto smoothed importance-sampling leave-one-out cross-validation (Vehtari, Gelman, and Gabry 2015, 2017). Predictive performance was estimated as the sum of the expected log predictive density ($\widehat{elpd}$). The results of the model comparisons can be found in Table 1.3. $\Delta\widehat{elpd}$ are the differences in predictive performance compared to the best fitting model (top row). The model comparisons show that the inclusions of model predictors does not contribute to explaining change in the outcome variables. All models were found to have a similar predictive performance.

Table 1.3: Model comparisons. Predictive performance was indicated as expected log pointwise predictive density ($\widehat{elpd}$). The top row shows the model with the highest predictive performance, i.e. the highest $\widehat{elpd}$; the differences $\Delta\widehat{elpd}$ are relative to the model with the highest predictive performance.
Model	$\Delta\widehat{elpd}$	$\widehat{elpd}$
Literacy measures	–	-95 (11)
Modality + Literacy measures	0 (1)	-95 (11)
Modality * Literacy measures	-3 (4)	-98 (12)
Intercept only	-13 (6)	-108 (11)
Modality	-14 (7)	-109 (11)
Note:
Standard errors are shown in parentheses.

Next we compared the fit of models with incrementally increased complexity by adding to an intercept-only model (1), modality as main effect (2), then all literacy predictors (3), and all two-way interactions of literacy predictors and modality (4). The results can be found in Table 1.4. Adding modality does not render a higher predictive performance, but adding the literacy predictors does; the interaction terms did not increase the predictive performance of the model.

Table 1.4: Model comparisons. Predictive performance was compared for individual models indicated as difference between the expected log pointwide predictive densities$\Delta\widehat{elpd}$. Predictive performance is shown for the better fitting model ($\widehat{elpd}$).
Comparison	$\Delta\widehat{elpd}$	Better model	$\widehat{elpd}$
Modality vs. Intercept only	-1 (2)	Intercept only	-108 (11)
Simple main effects vs. Modality	-14 (7)	Modality + Literacy measures	-95 (11)
Interactions vs. Simple main effects	-3 (4)	Modality + Literacy measures	-95 (11)
Note:
Standard errors are shown in parentheses.

1.2 Modality effect

1.2.1 Simple main effect of modality on writing measures

The first Bayesian multivariet mixed effects model included modality (levels: handwriting, typing) as predictor variable. We used a multivariet model to account for correlations across outcome variables. In other words, instead of modelling each outcome variable individually, we can fit one model that includes all nine outcome variables. The modelling results can be found in Table 1.5. The model column indicates the distribution family (link functions) used for each outcome variables (see Bürkner 2017a, 2019; Bürkner and Vuorre 2019). Bernoulli distributions were used to binomial data. A Gaussian distribution was used for vocabulary mean age. Poisson distributions were used for count data. Sequential process models were used for ordinal data. Sequential process models are useful for variables in which the psychological distance between categories cannot be assumed to be identical. In particular story grammar and the holistic text quality scores were modeled as sequential process as the distance between a score of 0 and a score of 1 might not be equal to the distance between 1 and 2. A zero inflated negative binomial distribution was used for advanced syntax due to a overdispersion of zero observations. A zero inflated Poisson was used for events. While Table 1.5 shows negligible support for a modality effect in all outcome variables, there is substantial support for the null hypothesis observed in the outcome variables number of words and spelling correct. In other words, the use of handwriting or typing does neither influence the number of words the students produced nor their spelling accuracy. There was weak support for H₀ observed in vocabulary mean age, advanced syntax and events suggesting no difference for use of handwriting or typing.

Table 1.5: Modality effect by outcome variable (multivariate model).
Outcome variable	Estimate	Lower	Upper	H₀	H₁	Distribution family used
Advanced structures	0.15	-1.20	1.25	3.40	0.29	Bernoulli
Spacing	0.76	-0.34	1.92	1.36	0.74	Bernoulli
Vocabulary mean age	0.17	0.03	0.31	1.35	0.74	Gaussian
Number of words	0.03	-0.06	0.12	31.03	0.03	Poisson
Spelling correct	0.02	-0.11	0.14	30.94	0.03	Poisson
Story grammar	0.42	-0.42	1.19	2.49	0.40	Sequential process
Text quality	0.29	-0.35	0.93	4.15	0.24	Sequential process
Advanced syntax	0.13	-0.09	0.34	8.56	0.12	Zero-inflated negative-binomial
Events	0.12	-0.19	0.37	8.36	0.12	Zero-inflated Poisson
Note:
Colon indicates interactions. Upper and lower indicate the bounds of 95% PIs. H₀ indicates the statistical support for the null hypothesis over the alternative hypothesis given the data determined as Bayes Factor BF₀₁. Conversely, the evidence for the alternative hypothesis H₁ is determined as BF₁₀ being the statistical support in favour of the alternative hypothesis over the null hypothesis.

1.2.2 Modality effect moderated by literacy ability

In the previous section we determined the evidence for modality effects for each dependent variable. However, the modality effect might vary as a function literacy abilities specific to the tested children. Therefore we run another series of models fitted with, in addition modality, the predictors grapheme-phoneme mapping, first sound pronunciation, blending, single word reading, speller, vocabulary, age (in months) and all by-modality two-way interactions to test for a moderation of the modality effect.

An overview of the modelling results can be found in Figure 1.1. This figure shows the strength of the evidence for the null hypothesis H$_0$ determined by BF$_{01}$. Darker colours indicate higher BF$_{01}$s and thus stronger support in favour of the null hypothesis H$_0$. For details of each outcome variable see the following subsections. Note that the BFs for the factor modality changed after we included all covariates. In contrast to the results in Table 1.5, we do no longer observe support in favor of the null hypothesis. After including all covariates we observed weak to moderate support for the alternative hypothesis H$_1$ showing a larger number of words ($\hat{\mu}$ = 1.75, 95% PI[0.18, 3.31], BF$_{10}$ = 4) and higher spelling accuracy ($\hat{\mu}$ = 2.12, 95% PI[0.24, 4], BF$_{10}$ = 5) for typing compared to handwriting.

For the by-modality interactions, Figure 1.1 shows conclusive evidence that the writing modality does not impact on any measure as function of age. The support for the remaining by-modality interactions varies across outcome variable. For example, the ability to produce the first sounds segment in words does not impact differently on vocabulary mean age, number of words, spelling accuracy, advanced syntax and events depending on the writing modality. Neither does word reading and grapheme-phoneme mapping.

$Multivariate analysis. Shown is the strength of evidence in support of the null hypothesis H$_0$ over the alternative hypothesis H$_1$ given the data. BFs$_{01}$ are shown and represented as colour gradients. Darker colour indicates stronger evidence in favour of H$_0$. Rownames show predictor variables including main effects and modality effects moderated by literacy abilities (indicated by colon). Column names indicate the outcomes variables.$

Figure 1.1: Multivariate analysis. Shown is the strength of evidence in support of the null hypothesis H$_0$ over the alternative hypothesis H$_1$ given the data. BFs$_{01}$ are shown and represented as colour gradients. Darker colour indicates stronger evidence in favour of H$_0$. Rownames show predictor variables including main effects and modality effects moderated by literacy abilities (indicated by colon). Column names indicate the outcomes variables.

1.2.3 Model comparisons

We tested whether a model that includes literacy measures provide model fit with a higher predictive performance (i.e. explain more variance and covariance than a model without predictors and a model with the modality predictor only). This was tested in out-of-sample predictions estimated using Pareto smoothed importance-sampling leave-one-out cross-validation (Vehtari, Gelman, and Gabry 2015, 2017). Predictive performance was estimated as the sum of the expected log predictive density ($\widehat{elpd}$). The results of the model comparisons can be found in Table 1.6. $\Delta\widehat{elpd}$ are the differences in predictive performance compared to the best fitting model (top row). The model comparisons show that the inclusions of model predictors does not contribute to explaining change in the outcome variables. All models were found to have a similar predictive performance.

Table 1.6: Model comparisons. Predictive performance was indicated as expected log pointwise predictive density ($\widehat{elpd}$). The top row shows the model with the highest predictive performance, i.e. the highest $\widehat{elpd}$; the differences $\Delta\widehat{elpd}$ are relative to the model with the highest predictive performance.
Model	$\Delta\widehat{elpd}$	$\widehat{elpd}$
Intercept only	–	-2888 (67)
Modality	-1 (7)	-2889 (66)
Literacy measures	-15 (11)	-2903 (69)
Modality * Literacy measures	-26 (21)	-2914 (66)
Note:
Standard errors are shown in parentheses.

References

Baguley, Thomas. 2012. Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences. Basingstoke: Palgrave Macmillan.

Bürkner, Paul-Christian. 2017a. “Advanced Bayesian Multilevel Modeling with the R Package Brms.” arXiv Preprint arXiv:1705.11123.

———. 2017b. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.

———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms” 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.

———. 2019. “Bayesian Item Response Modelling in R with Brms and Stan.” arXiv Preprint arXiv:1905.09501.

Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal Regression Models in Psychology: A Tutorial.” Advances in Methods and Practices in Psychological Science 2 (1): 77–101.

Carpenter, Bob, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Michael A. Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2016. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 20.

Dickey, James M., B. P. Lientz, and others. 1970. “The Weighted Likelihood Ratio, Sharp Hypotheses about Chances, the Order of a Markov Chain.” The Annals of Mathematical Statistics 41 (1): 214–26.

Dienes, Zoltan. 2014. “Using Bayes to Get the Most Out of Non-Significant Results.” Frontiers in Psychology 5 (781): 1–17.

———. 2016. “How Bayes Factors Change Scientific Practice.” Journal of Mathematical Psychology 72: 78–89.

Dienes, Zoltan, and Neil Mclatchie. 2018. “Four Reasons to Prefer Bayesian Analyses over Significance Testing.” Psychonomic Bulletin & Review 25 (1): 207–18.

Gelman, Andrew, J. B. Carlin, H. S. Stern, D. B. Dunson, Aki Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis. 3rd ed. Chapman; Hall/CRC.

Gelman, Andrew, and Donald B. Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7 (4): 457–72.

Hoffman, Matthew D., and Andrew Gelman. 2014. “The No-U-Turn sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research 15 (1): 1593–623.

Jeffreys, Harold. 1961. The Theory of Probability. Vol. 3. Oxford: Oxford University Press, Clarendon Press.

Kruschke, John K., Herman Aguinis, and Harry Joo. 2012. “The Time Has Come: Bayesian Methods for Data Analysis in the Organizational Sciences.” Organizational Research Methods 15 (4): 722–52.

Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.

McElreath, Richard. 2016. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.

Nicenboim, Bruno, and Shravan Vasishth. 2016. “Statistical Methods for Linguistic Research: Foundational Ideas – Part II.” Language and Linguistics Compass 10 (11): 591–613.

R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Schönbrodt, Felix D., Eric-Jan Wagenmakers, Michael Zehetleitner, and Marco Perugini. 2017. “Sequential Hypothesis Testing with Bayes Factors: Efficiently Testing Mean Differences.” Psychological Methods 22 (2): 322–39.

Sorensen, Tanner, S. Hohenstein, and Shravan Vasishth. 2016. “Bayesian Linear Mixed Models Using Stan: A Tutorial for Psychologists, Linguists, and Cognitive Scientists.” Quantitative Methods for Psychology 12 (3): 175–200.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2015. “Pareto Smoothed Importance Sampling.” arXiv Preprint arXiv:1507.02646.

———. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.

Wagenmakers, Eric-Jan, Tom Lodewyckx, Himanshu Kuriyal, and Raoul Grasman. 2010. “Bayesian Hypothesis Testing for Psychologists: A Tutorial on the Savage–Dickey Method.” Cognitive Psychology 60 (3): 158–89.

Wagenmakers, Eric-Jan, Maarten Marsman, Tahira Jamil, Alexander Ly, Josine Verhagen, Jonathon Love, Ravi Selker, et al. 2018. “Bayesian Inference for Psychology. Part I: Theoretical Advantages and Practical Ramifications.” Psychonomic Bulletin & Review 25 (1): 35–57.

Handwriting and typing in Norwegian first graders

Eivor Finset Spilling, Vibeke Rønneberg, Wenke Mork Rogne, Jens Roeser, Mark Torrance

04/03/2020