1 Data analysis

Data were analysed using Bayesian linear mixed-effects models (Gelman et al. 2014; McElreath 2016). The R (R Core Team 2020) package brms (Bürkner 2017b, 2018) was used to model the data. Models were fitted with weakly informative priors (see McElreath 2016) and run with 10,000 iterations on 3 chains with a warm-up of 5,000 iterations and no thinning. Model convergence was confirmed by the Gelman-Rubin statistic (\(\hat{R}\) = 1) (Gelman and Rubin 1992) and by inspection of the Markov chain Monte Carlo chains.

We calculated the statistical support for the alternative hypothesis over the null hypothesis. This evidence was obtained using Bayes Factors (henceforth, BF) calculated with the Savage-Dickey method (see, e.g., Dickey and Lientz 1970; Wagenmakers et al. 2010). Specifically, we calculated the evidence for the alternative hypothesis H\(_1\) over the null hypothesis H\(_0\) given the data (BF\(_{10}\)). A BF\(_{10}\) larger than 5 indicates moderate, and larger than 10 strong, evidence for a statistically meaningful effect compared to the null hypothesis H\(_0\) (see, e.g., Baguley 2012; Jeffreys 1961; Lee and Wagenmakers 2014). For example, a BF\(_{10}\) of 2 indicates that the alternative hypothesis is two times more likely than the null hypothesis given the data. In contrast to traditional statistical methods (null-hypothesis significance testing), the Bayesian framework also allows us to quantify the evidence against the alternative hypothesis, typically corresponding to BF\(_{10}\)s smaller than 0.33. The evidence in favour of H\(_0\) is the inverse of BF\(_{10}\); for example, for a BF\(_{10}\) of 0.33 the evidence in favour of H\(_0\) is \(\frac{1}{0.33}=\) 3.03, indicating moderate evidence in favour of H\(_0\) (for discussion see Dienes 2014, 2016; Dienes and Mclatchie 2018; Schönbrodt et al. 2017; Wagenmakers et al. 2018).
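For a single coefficient, the Savage-Dickey BF can be computed in brms with the hypothesis() function, provided the model was fitted with a proper prior on that coefficient and with sample_prior = TRUE. A minimal sketch; the data frame, outcome, and coefficient name are hypothetical placeholders:

    library(brms)
    # Fit with prior samples so prior and posterior density at 0 can be compared
    fit <- brm(numberwords ~ modality, data = composition_data,
               prior = set_prior("normal(0, 2)", class = "b"),
               sample_prior = TRUE, chains = 3, iter = 10000, warmup = 5000)
    # Evid.Ratio is BF01 (support for the point null); BF10 is its inverse
    h <- hypothesis(fit, "modalityTyping = 0")
    1 / h$hypothesis$Evid.Ratio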

Further, we report the most probable posterior parameter value (i.e. the inferred estimate) as well as the posterior probability interval (henceforth, PI), i.e. the interval that contains the true parameter value with 95% probability (Kruschke, Aguinis, and Joo 2012; Nicenboim and Vasishth 2016; Sorensen, Hohenstein, and Vasishth 2016).

For model comparisons we used out-of-sample predictions estimated by Pareto-smoothed importance-sampling leave-one-out cross-validation (Vehtari, Gelman, and Gabry 2015, 2017). Predictive performance was estimated as the sum of the expected log predictive density (\(\widehat{elpd}\)), and models were compared via the difference \(\Delta\widehat{elpd}\). The advantage of leave-one-out cross-validation is that models with more parameters are penalised, which prevents overfitting.
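In brms this comparison can be run with loo() and loo_compare(); a sketch assuming two already fitted models fit0 (simpler) and fit1:

    library(brms)
    loo0 <- loo(fit0)  # PSIS-LOO for each model
    loo1 <- loo(fit1)
    # The best model is listed first; elpd_diff and its standard error
    # correspond to the Delta-elpd values reported in the tables below
    loo_compare(loo0, loo1)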

All models were fitted with random intercepts for schools and for students nested in schools, and with slope adjustments for time point (i.e. (time|s|child:school) + (time|p|school)).
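As a concrete illustration, a minimal brm() call combining the sampler settings above with this random-effects structure; the data frame composition_data, the outcome, and the prior are hypothetical placeholders:

    library(brms)
    fit <- brm(
      numberwords ~ time * modality +
        (time |s| child:school) + (time |p| school),
      data = composition_data,
      family = negbinomial(),
      prior = set_prior("normal(0, 2)", class = "b"),  # weakly informative
      chains = 3, iter = 10000, warmup = 5000          # no thinning
    )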

2 Participants

A total of 181 children were tested at 5 time points in either handwriting or keyboard typing; after data cleaning (removing incomplete rows) all 181 children remained. Not all children completed 5 sessions: 3 children contributed data at 3 time points, 22 children at 4 time points, and 156 children at all 5 time points; no child contributed data at fewer than 3 time points.

3 Classroom activity

The frequency of classroom-activity ratings is shown by modality and for each type of classroom activity in the following figure:

Figure 3.1: Frequency of classroom-activity ratings by modality.

We tested whether there is a modality difference for different types of classroom activities. We used a multivariate model with a cumulative distribution for the outcome variables discuss_text, discuss_structure, and write_story, with one observation per classroom (N=10, i.e. 5 per modality). Model comparisons using leave-one-out cross-validation revealed no improvement for the model with modality as a fixed effect.
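A sketch of how such a multivariate cumulative model can be specified in brms; the data frame name is a hypothetical placeholder:

    library(brms)
    # One cumulative (ordinal) submodel per activity rating; with a single
    # observation per classroom, no random effects are included
    f <- bf(discuss_text ~ modality, family = cumulative()) +
      bf(discuss_structure ~ modality, family = cumulative()) +
      bf(write_story ~ modality, family = cumulative())
    fit_activity <- brm(f, data = classroom_data,
                        chains = 3, iter = 10000, warmup = 5000)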

Table 3.1: Model comparisons for classroom activity. Predictive performance was indicated as expected log pointwise predictive density (\(\widehat{elpd}\)). The top row shows the model with the highest predictive performance, i.e. the highest \(\widehat{elpd}\); the differences \(\Delta\widehat{elpd}\) are relative to the model with the highest predictive performance.

| Model          | \(\Delta\widehat{elpd}\) | \(\widehat{elpd}\) |
|----------------|--------------------------|--------------------|
| Intercept only |                          | -34 (4)            |
| Modality       | -4 (3)                   | -38 (7)            |

Note: Standard errors are shown in parentheses.

Examining the posterior for each outcome variable individually (rather than for classroom activity overall, as in the preceding model comparison) revealed inconclusive evidence for a modality effect on the outcome measures discuss text and discuss structure, but moderate evidence for a higher write-story score for keyboard typing compared to handwriting.

Table 3.2: Modality effect by outcome variable (multivariate model).

| Outcome           | Estimate            | H\(_1\) |
|-------------------|---------------------|---------|
| Discuss text      | 1.27 [-1.17 – 4.14] | 0.71    |
| Discuss structure | -0.71 [-3.5 – 1.9]  | 0.53    |
| Write story       | 3.1 [0.34 – 6.71]   | 4.93    |

Note: Brackets indicate the bounds of 95% PIs. H\(_1\) indicates the statistical support for the alternative hypothesis over the null hypothesis (Bayes Factor BF\(_{10}\)).

Figure 3.2: Frequency of classroom-activity ratings by modality.

4 Text features

To provide an overview of the texts that were analysed, we summarise text features from the posterior of a multivariate mixed-effects model as means and 95% PIs.

Table 4.1: Posterior summary of text features showing the mean and 95% PIs in brackets.

| Modality by outcome | Time 1 | Time 2 | Time 3 | Time 4 | Time 5 |
|---|---|---|---|---|---|
| Number of words | | | | | |
| Handwriting | 1.83 [1.26 – 2.65] | 1.9 [1.34 – 2.66] | 3.33 [2.42 – 4.52] | 3.72 [2.69 – 5.06] | 3.52 [2.49 – 4.86] |
| Typing | 1.26 [0.86 – 1.84] | 1.56 [1.1 – 2.19] | 2.52 [1.84 – 3.46] | 2.96 [2.17 – 4.06] | 3.62 [2.62 – 5.03] |
| Number of spaces | | | | | |
| Handwriting | 1.95 [1.32 – 2.87] | 2.71 [1.88 – 3.9] | 3.57 [2.52 – 5.05] | 4.29 [3.02 – 6.03] | 4.24 [2.97 – 6.05] |
| Typing | 1.42 [0.94 – 2.12] | 1.98 [1.36 – 2.86] | 3.39 [2.4 – 4.8] | 3.84 [2.72 – 5.41] | 4.31 [3.02 – 6.12] |
| Number of terminators | | | | | |
| Handwriting | 10.86 [7.93 – 14.96] | 15.51 [11.46 – 21.17] | 19.55 [14.51 – 26.39] | 26.93 [19.99 – 36.34] | 27.88 [20.52 – 38.02] |
| Typing | 12.58 [9.18 – 17.39] | 17.13 [12.65 – 23.42] | 21.36 [15.81 – 28.98] | 26.76 [19.87 – 36.55] | 32.03 [23.51 – 44.2] |
| Syntactic complexity | | | | | |
| Handwriting | 0.33 [0.08 – 1.26] | 0.82 [0.22 – 2.99] | 1.19 [0.32 – 4.22] | 1.33 [0.35 – 4.8] | 1.03 [0.26 – 3.81] |
| Typing | 1.93 [0.51 – 6.89] | 1.19 [0.32 – 4.21] | 1.37 [0.37 – 4.85] | 1.49 [0.4 – 5.33] | 1.43 [0.38 – 5.23] |
| Event count | | | | | |
| Handwriting | 14.04 [10.65 – 18.45] | 18.96 [14.53 – 24.62] | 22.53 [17.31 – 29.19] | 29.96 [22.85 – 38.88] | 31.05 [23.58 – 40.58] |
| Typing | 14.73 [11.21 – 19.19] | 19.63 [15.14 – 25.29] | 24.54 [18.98 – 31.61] | 29.3 [22.56 – 37.76] | 34.5 [26.42 – 44.7] |
| Advanced narrative structures | | | | | |
| Handwriting | 0.63 [0.43 – 0.9] | 0.87 [0.63 – 1.21] | 1.25 [0.92 – 1.67] | 1.42 [1.05 – 1.88] | 1.23 [0.9 – 1.66] |
| Typing | 0.47 [0.31 – 0.7] | 0.79 [0.55 – 1.1] | 1.16 [0.85 – 1.57] | 1.35 [1 – 1.82] | 1.47 [1.08 – 1.98] |
| Story grammar | | | | | |
| Handwriting | 2.15 [1.52 – 2.95] | 2.95 [2.14 – 3.98] | 4.28 [3.14 – 5.7] | 5.61 [4.13 – 7.4] | 5.85 [4.3 – 7.79] |
| Typing | 2.04 [1.47 – 2.86] | 2.82 [2.07 – 3.87] | 4.53 [3.4 – 6.11] | 5.04 [3.79 – 6.76] | 6.2 [4.66 – 8.32] |

Note: Brackets indicate the bounds of 95% PIs.

5 Text composition data

5.1 Outcome variables and probability models

We used multivariate models to account for correlations across outcome variables. In other words, instead of modelling each outcome variable individually, we can fit one model that includes all nine outcome variables.

The outcome variables and their probability models (default link function throughout) are listed below (see Bürkner 2017a, 2019; Bürkner and Vuorre 2019):

Table 5.1: Outcome variables and probability models.

| Outcome variable | Probability model |
|---|---|
| Advanced narrative structures | Negative binomial |
| Event count | Negative binomial |
| Syntactic complexity | Negative binomial |
| Number of words | Negative binomial |
| Story grammar | Sequential process model (sratio) |
| Vocabulary sophistication | Gaussian |
| Spelling accuracy | Binomial |
| Spacing accuracy | Binomial |
| Terminators accuracy | Binomial |

Count data were modelled using a negative-binomial distribution to account for overdispersion. Sequential process models were used for ordinal data (story grammar); these models are useful for variables in which the psychological distance between categories cannot be assumed to be identical. In particular, story grammar was modelled as a sequential process because the distance between a score of 0 and a score of 1 might not equal the distance between a score of 1 and a score of 2. A Gaussian distribution was used for vocabulary mean age.
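The corresponding brms family calls, sketched with abbreviated (hypothetical) right-hand sides and the measure names of Table 5.9:

    library(brms)
    bf_words   <- bf(numberwords ~ time * modality, family = negbinomial())  # overdispersed counts
    bf_grammar <- bf(storygrammar ~ time * modality, family = sratio())      # sequential process
    bf_vocab   <- bf(vocabmeanage ~ time * modality, family = gaussian())    # vocabulary mean age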

Three variables were expressed as the number of correct uses relative to total use (correct and incorrect use) so that we can determine their respective improvements independently of how much text a child produced. This was done using a binomial probability model with the following specifications (a brms sketch follows the formulas):

\[ y_{\text{spelling}} = \frac{\text{correct}}{n_{\text{words}}} \]

\[ y_{\text{spacing}} = \frac{\text{correct}}{\text{correct} + \text{additional} + \text{missing}} \]

\[ y_{\text{terminators}} = \frac{\text{correct}}{\text{correct} + \text{additional} + \text{missing}} \]
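In brms these proportions are specified as aggregated binomial outcomes using the successes | trials(total) syntax; a sketch with hypothetical column names, with denominators following the formulas above:

    library(brms)
    composition_data$n_spaces <- with(composition_data,
      correct_spaces + additional_spaces + missing_spaces)
    composition_data$n_terminators <- with(composition_data,
      correct_terminators + additional_terminators + missing_terminators)
    bf_spelling    <- bf(correct_spelling | trials(n_words) ~ time * modality,
                         family = binomial())
    bf_spacing     <- bf(correct_spaces | trials(n_spaces) ~ time * modality,
                         family = binomial())
    bf_terminators <- bf(correct_terminators | trials(n_terminators) ~ time * modality,
                         family = binomial())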

We incrementally increased model complexity by first adding timepoint (1–5), then modality (levels: handwriting, typing), and finally their interaction:

  1. y ~ 1 + (time|s|child:school) + (time|p|school)
  2. y ~ time + (time|s|child:school) + (time|p|school)
  3. y ~ time + modality + (time|s|child:school) + (time|p|school)
  4. y ~ time + modality + time:modality + (time|s|child:school) + (time|p|school)

5.2 Correlation matrix

Residual correlations were estimated in a multivariate random-effects model with no fixed effects other than the intercept, and with random effects as specified above.
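Because several outcomes are non-Gaussian, between-outcome correlations in brms are carried by shared group-level IDs (the |s| and |p| terms): group-level effects with the same ID are modelled as correlated across submodels. A two-outcome sketch under that assumption:

    library(brms)
    f <- bf(numberwords ~ 1 + (time |s| child:school) + (time |p| school),
            family = negbinomial()) +
      bf(storygrammar ~ 1 + (time |s| child:school) + (time |p| school),
         family = sratio())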

Table 5.2: Estimated posterior mean for each outcome measure with all residual correlations between outcome measures.

| Outcome measure | Estimate | Terminator accuracy | Spacing accuracy | Spelling accuracy | Vocabulary sophistication | Story grammar | Number of words | Syntactic complexity | Event count |
|---|---|---|---|---|---|---|---|---|---|
| Advanced narrative structures | 2.96 [2.24 – 3.68] | -.05 [-.12 – .03] | -.05 [-.13 – .03] | -.15 [-.22 – -.07] | .17 [.10 – .24] | .50 [.45 – .56] | .53 [.47 – .58] | .52 [.46 – .57] | .63 [.58 – .67] |
| Event count | 3.58 [2.58 – 4.52] | -.06 [-.13 – .02] | -.06 [-.14 – .02] | -.08 [-.16 – -.00] | .13 [.06 – .20] | .38 [.32 – .45] | .74 [.70 – .78] | .72 [.68 – .75] | |
| Syntactic complexity | 5.28 [4.06 – 6.45] | -.02 [-.10 – .05] | -.06 [-.14 – .01] | -.08 [-.15 – .00] | .13 [.05 – .20] | .24 [.17 – .31] | .86 [.84 – .88] | | |
| Number of words | 27.72 [21.14 – 34.11] | -.08 [-.16 – -.01] | -.07 [-.15 – .01] | -.08 [-.16 – -.01] | .14 [.07 – .21] | .27 [.19 – .34] | | | |
| Story grammar | 1.13 [0.84 – 1.4] | .01 [-.06 – .09] | -.06 [-.13 – .02] | -.12 [-.20 – -.04] | .05 [-.02 – .13] | | | | |
| Vocabulary sophistication | 6.86 [6.77 – 6.95] | -.02 [-.10 – .05] | -.09 [-.16 – -.01] | -.29 [-.35 – -.22] | | | | | |
| Spelling accuracy | 0.71 [0.63 – 0.78] | .07 [-.00 – .15] | .23 [.15 – .30] | | | | | | |
| Spacing accuracy | 0.94 [0.9 – 0.97] | .04 [-.03 – .12] | | | | | | | |
| Terminator accuracy | 0.12 [0.03 – 0.34] | | | | | | | | |

Note: 95% probability intervals in brackets.

5.3 Timecourse function

Time was modelled as a first-order orthogonal polynomial (linear function) and with second- (quadratic) and third-order (cubic) polynomial terms. Model comparisons using leave-one-out cross-validation revealed the highest predictive performance for the cubic function, which, however, was not meaningfully different from the quadratic model. We therefore used the quadratic function to model the time course in the models with modality as a factor.
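The orthogonal polynomial terms can be constructed with R's poly(), which returns uncorrelated linear, quadratic, and cubic predictors; the data frame and column names are hypothetical:

    tp <- poly(1:5, degree = 3)  # orthogonal contrasts over the 5 timepoints
    composition_data$time1 <- tp[composition_data$timepoint, 1]  # linear
    composition_data$time2 <- tp[composition_data$timepoint, 2]  # quadratic
    composition_data$time3 <- tp[composition_data$timepoint, 3]  # cubic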

Table 5.3: Model comparisons for time function. Predictive performance was indicated as expected log pointwise predictive density (\(\widehat{elpd}\)). The top row shows the model with the highest predictive performance, i.e. the highest \(\widehat{elpd}\); the differences \(\Delta\widehat{elpd}\) are relative to the model with the highest predictive performance.

| Model          | \(\Delta\widehat{elpd}\) | \(\widehat{elpd}\) |
|----------------|--------------------------|--------------------|
| Cubic          |                          | -14810 (149)       |
| Quadratic      | -11 (9)                  | -14820 (150)       |
| Linear         | -80 (22)                 | -14890 (152)       |
| Intercept only | -82 (22)                 | -14892 (151)       |

Note: Standard errors are shown in parentheses.

5.4 Modality effect

The highest predictive performance was found for the model that included the timecourse-by-modality interaction. In other words, writing modality had a meaningful effect on the timecourse of writing development.

Table 5.4: Model comparisons for modality timecourse effects. Predictive performance was indicated as expected log pointwise predictive density (\(\widehat{elpd}\)). The top row shows the model with the highest predictive performance, i.e. the highest \(\widehat{elpd}\); the differences \(\Delta\widehat{elpd}\) are relative to the model with the highest predictive performance.

| Model | \(\Delta\widehat{elpd}\) | \(\widehat{elpd}\) |
|---|---|---|
| Modality \(\times\) (Time + Time\(^2\)) | | -14796 (148) |
| Modality + Time + Time\(^2\) | -19 (12) | -14814 (150) |
| Time + Time\(^2\) | -25 (12) | -14820 (150) |
| Time | -94 (23) | -14890 (152) |
| Intercept only | -96 (23) | -14892 (151) |

Note: Standard errors are shown in parentheses.

5.5 Model coefficients

The following table shows the parameter estimates for each outcome variable (rows) for the main effects of modality and timepoint and their interaction (columns). Evidence for an interaction of modality and the quadratic timepoint term is highlighted in red; evidence in support of the null hypothesis is highlighted in green.

Table 5.5: Model coefficients by outcome variable (multivariate model). Outcome measures are shown in rows and predictor variables in columns. Moderate to strong evidence in favour of a time-by-modality interaction is highlighted in red; evidence against the alternative hypothesis is highlighted in green.

| Outcome | Time: Estimate | Time: H\(_1\) | Time\(^2\): Estimate | Time\(^2\): H\(_1\) | Modality: Estimate | Modality: H\(_1\) | Modality \(\times\) Time: Estimate | Modality \(\times\) Time: H\(_1\) | Modality \(\times\) Time\(^2\): Estimate | Modality \(\times\) Time\(^2\): H\(_1\) |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of words | 8.28 [6.18 – 10.24] | >100 | -1.34 [-2.32 – -0.34] | 1.34 | 0.1 [-0.27 – 0.46] | 0.08 | -0.03 [-2.66 – 2.83] | 0.26 | 0.79 [-0.62 – 2.18] | 0.26 |
| Spacing accuracy | 13.45 [8.09 – 18.07] | >100 | -2.61 [-4.45 – -0.77] | 3.38 | 0.66 [-0.25 – 1.56] | 0.67 | -5.57 [-12.04 – 1.2] | 2.56 | 4.54 [1.74 – 7.31] | 46.28 |
| Terminator accuracy | 8.03 [-1.03 – 16.5] | 2.12 | -12.48 [-16.16 – -8.88] | >100 | 0.6 [-1.2 – 2.35] | 0.5 | -3.41 [-13.29 – 5.64] | 1.09 | 9.25 [4.43 – 14.18] | >100 |
| Spelling accuracy | 3.31 [0.04 – 6.45] | 1.3 | -2.26 [-3.42 – -1.1] | >100 | 0.38 [0.12 – 0.65] | 3.14 | -3.3 [-7.41 – 1.08] | 1.39 | 2.36 [0.68 – 4.01] | 6.92 |
| Vocabulary sophistication | 0.28 [-1.99 – 2.54] | 0.1 | -0.02 [-0.93 – 0.9] | 0.04 | -0.1 [-0.2 – 0] | 0.15 | 1.98 [-1.17 – 5.08] | 0.74 | 1.4 [0.11 – 2.7] | 1.09 |
| Syntactic complexity | 10.84 [8.58 – 13.07] | >100 | -2.4 [-3.71 – -1.08] | 50.8 | 0.02 [-0.38 – 0.42] | 0.08 | 0.3 [-2.64 – 3.29] | 0.28 | 0.46 [-1.39 – 2.33] | 0.21 |
| Event count | 8.43 [5.82 – 11.21] | >100 | -2.78 [-4.27 – -1.32] | 25.75 | -0.13 [-0.59 – 0.34] | 0.11 | 2.78 [-1 – 6.35] | 1.09 | -0.27 [-2.42 – 1.86] | 0.21 |
| Advanced narrative structures | 8.37 [5.12 – 11.64] | >100 | -2.71 [-4.3 – -1.13] | 23.67 | -0.21 [-0.63 – 0.2] | 0.17 | 2.49 [-1.84 – 6.92] | 0.83 | 1.19 [-1.14 – 3.52] | 0.38 |
| Story grammar | 25.15 [17.36 – 32.93] | >100 | -10.81 [-16.11 – -5.66] | >100 | -0.06 [-1.18 – 1.06] | 0.22 | 8.17 [-0.87 – 18.62] | 3.88 | 2.97 [-3.38 – 9.84] | 0.88 |

Note: Brackets indicate the bounds of 95% PIs. H\(_1\) indicates the statistical support for the alternative hypothesis over the null hypothesis (Bayes Factor BF\(_{10}\)).

The overall contribution of adding the by-modality interaction was determined by comparing a simple main-effects model to a model with time-by-modality interactions, separately for correct spacing and correct spelling. Models were compared using Bayes Factors estimated via bridge sampling (Bennett 1976; Meng and Wong 1996). We found evidence in favour of the interaction model for both spacing (BF\(_{10}\) = 100) and spelling (BF\(_{10}\) = 10).
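A sketch of this comparison in brms; both models must be fitted with save_pars(all = TRUE) so that bridge sampling can estimate the marginal likelihoods (model and column names are hypothetical):

    library(brms)
    fit_main <- brm(
      correct_spaces | trials(n_spaces) ~ time1 + time2 + modality +
        (time1 |s| child:school) + (time1 |p| school),
      data = composition_data, family = binomial(),
      save_pars = save_pars(all = TRUE),
      chains = 3, iter = 10000, warmup = 5000
    )
    fit_ia <- update(fit_main, formula. = . ~ . + modality:(time1 + time2))
    bayes_factor(fit_ia, fit_main)  # BF10 for the interaction model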

5.6 Timecourse visualisation

The posterior of the best fitting model was used to visualise the modality-timecourse effects for each outcome variable. For event count, number of words, and syntactic complexity we found indistinguishable timecourse effects for handwriting and typing; in other words, development in these skills had the same growth rate. Different growth rates were found for correct spacing, correct spelling, and correct terminators after accounting for the longer texts produced at later timepoints. For correct spacing and correct spelling we observe a learning curve for handwriting that catches up with the performance of the keyboard typists. The pattern observed in the posterior for correct terminators is inconclusive (we return to this below). For advanced narrative structures and story grammar we observe growth but no conclusive evidence as to whether the function differed by writing modality. The evidence for timecourse changes and modality-specific effects was negligible for vocabulary mean age.

Figure 5.1: Posterior of the multivariate analysis. Shown are the timecourse effects for each outcome variable by modality.

5.7 Terminator analysis

To explore the unclear results for terminator accuracy (evidence for a time-by-modality interaction but an unclear timecourse effect), we analysed the three variables underlying terminator accuracy separately, in a multivariate model with a negative-binomial distribution for correct terminators, missing terminators, and additional terminators.

5.7.1 Model comparisons

Model comparisons confirmed an interaction between modality and the quadratic function of time.

Table 5.6: Model comparisons for terminator analysis. Predictive performance was indicated as expected log pointwise predictive density (\(\widehat{elpd}\)). The top row shows the model with the highest predictive performance, i.e. the highest \(\widehat{elpd}\); the differences \(\Delta\widehat{elpd}\) are relative to the model with the highest predictive performance.

| Model | \(\Delta\widehat{elpd}\) | \(\widehat{elpd}\) |
|---|---|---|
| Modality \(\times\) (Time + Time\(^2\)) | | -3659 (65) |
| Modality + Time + Time\(^2\) | -18 (8) | -3676 (66) |
| Time + Time\(^2\) | -18 (8) | -3677 (66) |
| Intercept only | -23 (8) | -3682 (65) |
| Time | -96 (18) | -3755 (65) |

Note: Standard errors are shown in parentheses.

5.7.2 Model coefficients

The model coefficients show strong evidence for a modality-by-time interaction for correct terminators and moderate evidence for additional terminators and missing terminators.

Table 5.7: Model coefficients by terminator measure (multivariate model). Outcome measures are shown in rows and predictor variables in columns. Moderate to strong evidence in favour of a time-by-modality interaction is highlighted in red.

| Outcome | Time: Estimate | Time: H\(_1\) | Time\(^2\): Estimate | Time\(^2\): H\(_1\) | Modality: Estimate | Modality: H\(_1\) | Modality \(\times\) Time: Estimate | Modality \(\times\) Time: H\(_1\) | Modality \(\times\) Time\(^2\): Estimate | Modality \(\times\) Time\(^2\): H\(_1\) |
|---|---|---|---|---|---|---|---|---|---|---|
| Additional terminators | -4.28 [-14.92 – 5.85] | 0.67 | -2.34 [-10.81 – 5.43] | 0.43 | 0.33 [-1.25 – 1.97] | 0.36 | -6.15 [-19.89 – 4.72] | 1.7 | 7.91 [-0.74 – 18.57] | 3.26 |
| Correct terminators | 11.89 [6.5 – 16.77] | >100 | -8.29 [-11.27 – -5.31] | >100 | 0.61 [-0.65 – 1.87] | 0.44 | -7.59 [-13.72 – -0.67] | 7.41 | 8.07 [4.21 – 12.04] | >100 |
| Missing terminators | 3.24 [-1.14 – 7.89] | 0.62 | 0.87 [-1.03 – 2.78] | 0.14 | -0.06 [-0.75 – 0.65] | 0.14 | 3.08 [-2.9 – 8.87] | 1.04 | -3.54 [-6.27 – -0.85] | 6.66 |

Note: Brackets indicate the bounds of 95% PIs. H\(_1\) indicates the statistical support for the alternative hypothesis over the null hypothesis (Bayes Factor BF\(_{10}\)).

5.7.3 Timecourse visualisation

The posterior for each terminator variable is shown in the following figure. The timecourse functions show different profiles for typing and handwriting, albeit with substantial variability.

Figure 5.2: Posterior of the multivariate analysis. Shown are the timecourse effects for each terminator variable by modality.

5.7.4 Correct terminators (corrected for text length)

The strongest evidence for a modality-by-time interaction was found for the number of correct terminators. As the texts children produced were shorter at earlier timepoints, meaning they produced fewer terminators overall, we adjusted the analysis of the number of correct terminators for text length (i.e. number of words).
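A sketch of the adjusted model formula, adding (centred) number of words and its interactions to the terminator model; variable names are hypothetical:

    library(brms)
    f_adj <- bf(
      correct_terminators ~ nwords_c * modality * (time1 + time2) +
        (time1 |s| child:school) + (time1 |p| school),
      family = negbinomial()
    )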

5.7.4.1 Model coefficients

After accounting for the possibility that text length confounded the analysis of correct terminators, we no longer observe evidence for a modality-by-time interaction.

Table 5.8: Model coefficients for correct terminators adjusted for text length (multivariate model).

| Predictor | Estimate | H\(_1\) |
|---|---|---|
| Main effects | | |
| Time | 8.7 [1.93 – 15.59] | 6.91 |
| Time\(^2\) | -4.03 [-8.95 – 0.73] | 0.89 |
| Number of words | 0 [0 – 0.01] | 0 |
| Modality | 0.35 [-1.02 – 1.73] | 0.32 |
| Two-way interactions | | |
| Modality \(\times\) Number of words | 0.01 [0 – 0.02] | 0 |
| Modality \(\times\) Time | -2.07 [-9.84 – 5.32] | 0.77 |
| Modality \(\times\) Time\(^2\) | 1.84 [-3.86 – 7.75] | 0.63 |
| Number of words \(\times\) Time | 0.1 [-0.13 – 0.33] | 0.03 |
| Number of words \(\times\) Time\(^2\) | -0.14 [-0.29 – 0.01] | 0.07 |
| Three-way interactions | | |
| Number of words \(\times\) Modality \(\times\) Time | -0.29 [-0.55 – -0.03] | 0.25 |
| Number of words \(\times\) Modality \(\times\) Time\(^2\) | 0.26 [0.08 – 0.44] | 0.78 |

Note: Brackets indicate the bounds of 95% PIs. H\(_1\) indicates the statistical support for the alternative hypothesis over the null hypothesis (Bayes Factor BF\(_{10}\)).

5.7.4.2 Timecourse visualisation

Figure 5.3: Posterior of terminators correct after adjusting for text length. Shown are the timecourse effects for terminators correct by modality.

5.7.5 Distribution of sample

To help interpret the terminator results, note that some children produced extreme patterns, for example using terminators instead of spaces. The figure below shows the distribution of values for additional, correct, and missing terminators across all 5 time points and highlights these extreme values: there are extreme values of more than 20 missing terminators at timepoints 3-5, and an extreme case of more than 30 additional terminators at timepoint 1. At the same time, we also see children with 10 and more correct terminators across all timepoints.

Figure 5.4: Density distributions of additional, correct, and missing terminators across all 5 time points.

5.8 Intra-class correlations

Table 5.9: Intra-class correlations (ICC) for schools and children by written composition measure.

| Composition measure | ICC: Schools | ICC: Children |
|---|---|---|
| advancedstructures | .05 | .03 |
| correctspaces | .21 | .12 |
| correctterminators | .21 | .41 |
| events | .06 | .03 |
| numberwords | .06 | .02 |
| spellingcorrect | .05 | .01 |
| storygrammar | .29 | .17 |
| syntax | .09 | .02 |
| vocabmeanage | .01 | .00 |
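One way to obtain such ICCs is from the posterior draws of the variance components of an intercept-only fit; a rough sketch for a Gaussian outcome (the fit object is hypothetical, and latent-scale outcomes would require a distribution-specific residual variance instead of the residual sigma):

    library(brms)
    vc <- VarCorr(fit, summary = FALSE)       # posterior draws of the SDs
    sd_school <- vc$school$sd[, "Intercept"]
    sd_child  <- vc$`child:school`$sd[, "Intercept"]
    sd_resid  <- vc$residual__$sd[, 1]
    total     <- sd_school^2 + sd_child^2 + sd_resid^2
    icc_school <- mean(sd_school^2 / total)   # proportion of variance: schools
    icc_child  <- mean(sd_child^2 / total)    # proportion of variance: children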

References

Baguley, Thomas. 2012. Serious Stats: A Guide to Advanced Statistics for the Behavioral Sciences. Basingstoke: Palgrave Macmillan.
Bennett, Charles H. 1976. “Efficient Estimation of Free Energy Differences from Monte Carlo Data.” Journal of Computational Physics 22 (2): 245–68.
Bürkner, Paul-Christian. 2017a. “Advanced Bayesian Multilevel Modeling with the R Package brms.” arXiv Preprint arXiv:1705.11123.
———. 2017b. “brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80 (1): 1–28. https://doi.org/10.18637/jss.v080.i01.
———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.
———. 2019. “Bayesian Item Response Modelling in R with brms and Stan.” arXiv Preprint arXiv:1905.09501.
Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal Regression Models in Psychology: A Tutorial.” Advances in Methods and Practices in Psychological Science 2 (1): 77–101.
Dickey, James M., and B. P. Lientz. 1970. “The Weighted Likelihood Ratio, Sharp Hypotheses about Chances, the Order of a Markov Chain.” The Annals of Mathematical Statistics 41 (1): 214–26.
Dienes, Zoltan. 2014. “Using Bayes to Get the Most Out of Non-Significant Results.” Frontiers in Psychology 5 (781): 1–17.
———. 2016. “How Bayes Factors Change Scientific Practice.” Journal of Mathematical Psychology 72: 78–89.
Dienes, Zoltan, and Neil Mclatchie. 2018. “Four Reasons to Prefer Bayesian Analyses over Significance Testing.” Psychonomic Bulletin & Review 25 (1): 207–18.
Gelman, Andrew, J. B. Carlin, H. S. Stern, D. B. Dunson, Aki Vehtari, and D. B. Rubin. 2014. Bayesian Data Analysis. 3rd ed. Chapman & Hall/CRC.
Gelman, Andrew, and Donald B. Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7 (4): 457–72.
Jeffreys, Harold. 1961. Theory of Probability. 3rd ed. Oxford: Oxford University Press, Clarendon Press.
Kruschke, John K., Herman Aguinis, and Harry Joo. 2012. “The Time Has Come: Bayesian Methods for Data Analysis in the Organizational Sciences.” Organizational Research Methods 15 (4): 722–52.
Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.
McElreath, Richard. 2016. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. CRC Press.
Meng, Xiao-Li, and Wing Hung Wong. 1996. “Simulating Ratios of Normalizing Constants via a Simple Identity: A Theoretical Exploration.” Statistica Sinica, 831–60.
Nicenboim, Bruno, and Shravan Vasishth. 2016. “Statistical Methods for Linguistic Research: Foundational Ideas – Part II.” Language and Linguistics Compass 10 (11): 591–613.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Schönbrodt, Felix D., Eric-Jan Wagenmakers, Michael Zehetleitner, and Marco Perugini. 2017. “Sequential Hypothesis Testing with Bayes Factors: Efficiently Testing Mean Differences.” Psychological Methods 22 (2): 322–39.
Sorensen, Tanner, S. Hohenstein, and Shravan Vasishth. 2016. “Bayesian Linear Mixed Models Using Stan: A Tutorial for Psychologists, Linguists, and Cognitive Scientists.” Quantitative Methods for Psychology 12 (3): 175–200.
Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2015. “Pareto Smoothed Importance Sampling.” arXiv Preprint arXiv:1507.02646.
———. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.
Wagenmakers, Eric-Jan, Tom Lodewyckx, Himanshu Kuriyal, and Raoul Grasman. 2010. “Bayesian Hypothesis Testing for Psychologists: A Tutorial on the Savage–Dickey Method.” Cognitive Psychology 60 (3): 158–89.
Wagenmakers, Eric-Jan, Maarten Marsman, Tahira Jamil, Alexander Ly, Josine Verhagen, Jonathon Love, Ravi Selker, et al. 2018. “Bayesian Inference for Psychology. Part I: Theoretical Advantages and Practical Ramifications.” Psychonomic Bulletin & Review 25 (1): 35–57.