In case you weren’t around on Monday

Why generalised linear models?

  • Unified framework of advanced statistical techniques that covers a wide range of data-analysis problems.
    • Linear regression and General linear models
    • Generalised linear models: a transformation (link function) allows modelling of further data types, including categorical, ordinal, and count data.
    • Multi-level generalised linear models: data with hierarchical structure (repeated measures).
  • Backbone of other advanced models: machine learning, time series models, path analysis, structural equation models, factor analysis, signal detection models.

Normal linear model (formal description)

\[ \begin{aligned} y_i &\sim \mathcal{N}(\mu, \sigma^2) \end{aligned} \]

  • Model of how the distribution of one variable, known as the outcome, varies as a function of other variables, known as predictors.
  • Each observation \(y\) is a sample (indexed by \(i\)) from (“\(\sim\)”) a univariate normally distributed probability model \(\mathcal{N}\).
  • Greek letters are the parameters that describe this distribution: mean \(\mu\) and variance \(\sigma^2\).

Normal linear model (formal description)

\[ \begin{aligned} y_i &\sim \mathcal{N}(\mu_i, \sigma^2)\\ \mu_i &= \beta_0 + \beta_1 \cdot x_{1i} \\ \end{aligned} \]

  • Linear regression equation allows decomposition: the value of \(\mu_i\) is a linear function of the values of one (or more) predictor variables \(x\).
  • Intercept \(\beta_0\) is “constant”.
  • Slope \(\beta_1\) is weighted by the predictor \(x_{1i}\).
  • If \(x_{1i}\) takes on the value 0, \(\mu_i = \beta_0\).
  • If \(x_{1i}\) takes on the value 1, \(\mu_i = \beta_0 + \beta_1\).
  • \(\beta_1\) is therefore the difference in \(\mu_i\) between \(x_{1i}=1\) and \(x_{1i}=0\).

Simulation of normally distributed data

We can use lm to implement such a model.

model <- lm(y ~ x, data = data)

We can simulate data that exactly meet model assumptions (normal distribution, equal variance, independence etc.) with known model parameters.

Simulation of normally distributed data

  • rnorm generates unimodal, symmetrically distributed values.
  • r in rnorm = random; norm = normal
# Generate data
y <- rnorm(n = 1000, mean = 550, sd = 10)

Simulation of normally distributed data

  • rnorm generates unimodal, symmetrically distributed values.
  • r in rnorm = random; norm = normal
# Decompose the mean
beta_0 <- 500
beta_1 <- 50
# Generate data
y <- rnorm(n = 1000, mean = beta_0 + beta_1, sd = 10)

Simulation of normally distributed data

# Load the tidyverse for tibble, glimpse and pivot_longer
library(tidyverse)

# Set parameter values
n <- 1000
beta_0 <- 500 
beta_1 <- 50
sigma <- 100
# Random data for group 1
group_1 <- rnorm(n = n/2, mean = beta_0, sd = sigma)
# Random data for group 2
group_2 <- rnorm(n = n/2, mean = beta_0 + beta_1, sd = sigma)
# Generate data
sim_data <- tibble(group_1 = group_1,
                   group_2 = group_2) 

# Preview data
glimpse(sim_data)
Rows: 500
Columns: 2
$ group_1 <dbl> 560, 495, 485, 546, 291, 402, 397, 450, 355, 522, 415, 347, 41…
$ group_2 <dbl> 615, 640, 514, 592, 564, 496, 632, 678, 510, 565, 532, 677, 63…
# Change data format
sim_data <- pivot_longer(sim_data, cols = c(group_1, group_2), names_to = "x", values_to = "y")

# Preview data
glimpse(sim_data)
Rows: 1,000
Columns: 2
$ x <chr> "group_1", "group_2", "group_1", "group_2", "group_1", "group_2", "g…
$ y <dbl> 560, 615, 495, 640, 485, 514, 546, 592, 291, 564, 402, 496, 397, 632…

Simulation of normally distributed data

# As a reminder, these are the parameter values
beta_0 <- 500 
beta_1 <- 50
sigma <- 100

lm uses maximum likelihood estimation to determine the set of parameter values under which the data are most likely.

# Normal linear model of simulated data
model <- lm(y ~ x, data = sim_data)

coef(model) # Model coefficients
(Intercept)    xgroup_2 
      501.0        45.1 
sigma(model) # Standard deviation
[1] 96.9

Exercise: complete script normal_model_simulation.R

Real data

  • By using linear models to estimate parameter values from data we make assumptions about the process that generates the data.
  • For simulated data we know the process that generated the data:
    • normal distribution, because rnorm samples normally distributed data
    • both groups have the same variance, because sigma was the same.
  • Even when we know that the assumptions are met, the parameter estimates obtained are similar, but not identical, to the true parameter values (see the sketch below).
  • In real life we don’t know the true parameter values or the underlying process that generates the data.
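As a sketch of this point (hypothetical code reusing the parameter values from the simulation above), re-running the two-group simulation many times shows how the estimated slope scatters around the true \(\beta_1 = 50\):

# Hypothetical sketch: re-run the two-group simulation many times
# and inspect how the estimated slope varies around beta_1 = 50
estimates <- replicate(100, {
  y <- c(rnorm(500, mean = 500, sd = 100),
         rnorm(500, mean = 550, sd = 100))
  x <- rep(c("group_1", "group_2"), each = 500)
  coef(lm(y ~ x))[2]
})
summary(estimates) # centred near 50, but varying from sample to sample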

Real (hierarchical) data: Martin et al. (2010)

Martin et al. (2010) data (Experiment 3a)

data <- read_csv("../data/martin-etal-2010-exp3a.csv")
# A tibble: 1,036 × 4
   item   ppt    rt nptype   
  <dbl> <dbl> <dbl> <chr>    
1    11    10  1055 conjoined
2    11     9  2010 conjoined
3    11    11   461 conjoined
4    11     6   977 conjoined
5    11     7  1152 conjoined
# ℹ 1,031 more rows
summarise(data, across(c(ppt, item), list(n = ~length(unique(.))), .names = "{col}"))
# A tibble: 1 × 2
    ppt  item
  <int> <int>
1    12    48

Aggregation by participants (across items)

In standard repeated measures ANOVAs only one source of variance can be modelled at a time.

For example, one would aggregate across items within each participant, neglecting the by-items variance.

data_ppt <- summarise(data, rt = mean(rt), .by = c(nptype, ppt))
# A tibble: 24 × 3
  nptype      ppt    rt
  <chr>     <dbl> <dbl>
1 conjoined     1 1382.
2 simple        1 1323.
3 conjoined     2 1077.
4 simple        2 1108.
5 conjoined     3 1162.
# ℹ 19 more rows

Conditions nested within participants

Participant variation (by-participant means)

Model description

Each rt observation \(i \in 1 \dots N\) comes from a normal distribution with mean \(\mu_i\) and variance \(\sigma^2\).

\[ \text{rt}_i \sim \mathcal{N}(\mu_i, \sigma^2) \]

Using the linear regression equation, the mean \(\mu_i\) can be decomposed into

\[ \mu_i = \beta_0 + \beta_1 \cdot \text{nptype}_i + \text{ppt}_i \]

where \(\text{nptype}_i\) is dummy coded (by default as treatment contrast in alphabetical order)

\[ \text{nptype}_i = \left\{ \begin{array}{ll} 0, \text{ if nptype}_i = \texttt{conjoined}\\ 1, \text{ if nptype}_i = \texttt{simple}\\ \end{array} \right. \]

\(\beta_0\) is the average rt for conjoined NPs because

\(\beta_0 + \beta_1 \cdot 0 = \beta_0\),

the average rt for simple NPs is

\(\beta_0 + \beta_1 \cdot 1 = \beta_0 + \beta_1\),

and therefore \(\beta_1\) is the difference in rts between simple and conjoined NPs.

Finally \(\text{ppt}_i \sim \mathcal{N}(0, \sigma^2_\text{ppt})\).
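A minimal simulation sketch of this generative process (the parameter values are assumptions for illustration, not estimates from the data):

# Simulate the model above: each participant contributes
# a random offset drawn from N(0, sigma_ppt)
n_ppt <- 12
beta_0 <- 1100; beta_1 <- -35  # assumed fixed effects
sigma_ppt <- 150; sigma <- 100 # assumed variability
ppt_effect <- rnorm(n_ppt, mean = 0, sd = sigma_ppt)
sim <- expand.grid(ppt = 1:n_ppt, nptype = c(0, 1))
sim$rt <- rnorm(nrow(sim),
                mean = beta_0 + beta_1 * sim$nptype + ppt_effect[sim$ppt],
                sd = sigma)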

Model implementation in R

Repeated measures ANOVA:

library(afex)
m <- aov_car(rt ~ Error(ppt/nptype), data = data_ppt)
summary(m)
Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

              Sum Sq num Df Error SS den Df F value  Pr(>F)    
(Intercept) 28783750      1   587304     11  539.11 1.1e-10 ***
nptype          6217      1    13043     11    5.24   0.043 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Linear mixed-effects model:

library(lmerTest)
m <- lmer(rt ~ nptype + (1|ppt), data = data_ppt)
anova(m)
Type III Analysis of Variance Table with Satterthwaite's method
       Sum Sq Mean Sq NumDF DenDF F value Pr(>F)  
nptype   6217    6217     1    11    5.24  0.043 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Exercises: complete script repeated_measures_model.R

and after that mixed_effects_model.R.

Item variation (by-image sets means)

Crossed participants and items

What’s a mixed-effects model?

  • Combination of fixed and random effects
  • Allows us to address dependencies in the data: hierarchical (nested, multi-level) and crossed factors
  • Fixed effects: systematic / deterministic effects (e.g. age, trial id, grouping variables)
  • Random effects: non-systematic effects (e.g. ppts, items, class, country)
  • Random effects can be nested: people within country; students within classroom within school within area etc.
  • Sometimes the distinction can be blurry: e.g. comparing differences between two test sites rather than treating test site as a repeated-measures variable.

Crossed participants and items

  • Traditional repeated-measures ANOVA can only handle one source of variance (participant or item).
  • Linear mixed-effects models can capture one or more (Baayen, Davidson, and Bates 2008).
m <- lmer(rt ~ nptype + ( 1 | ppt ) + ( 1 | item ), data = data)

Crossed participants and items

summary(m)$varcor
 Groups   Name        Std.Dev.
 item     (Intercept)  40.6   
 ppt      (Intercept) 161.2   
 Residual             233.5   
sigma(m)
[1] 234

Random intercepts

Predicted outcomes

summary(m)$coef
             Estimate Std. Error t value
(Intercept)    1111.8       48.0    23.2
nptypesimple    -33.4       14.5    -2.3
(coefs <- m@beta) # extract estimates
[1] 1111.8  -33.4
# Predicted mean for np conjoined
coefs[1] + coefs[2] * 0
[1] 1112
# Predicted mean for np simple
coefs[1] + coefs[2] * 1
[1] 1078
# Check predictor contrasts (default: treatment)
contrasts(factor(data$nptype))
          simple
conjoined      0
simple         1

Null-hypothesis test for lmer models?

t-values with \(|t| > 2\) correspond roughly to \(\alpha = .05\) (Baayen 2008):

summary(m)$coef
             Estimate Std. Error t value
(Intercept)    1111.8       48.0    23.2
nptypesimple    -33.4       14.5    -2.3

Confidence intervals: the 95% CI shows the range of hypotheses (parameter values) that can’t be ruled out at \(\alpha = .05\):

confint(m, "nptypesimple")
             2.5 % 97.5 %
nptypesimple -61.8  -4.93

Satterthwaite approximation for degrees of freedom (lmerTest):

m <- lmerTest::lmer(rt ~ nptype + ( 1 | ppt ) + ( 1 | item ), data = data)
summary(m)$coef
             Estimate Std. Error    df t value Pr(>|t|)
(Intercept)    1111.8       48.0  11.9    23.2 3.11e-11
nptypesimple    -33.4       14.5 977.9    -2.3 2.16e-02

Likelihood-ratio test:

m_null <- lmer(rt ~ 1 + ( 1 | ppt ) + ( 1 | item ), data = data)
anova(m_null, m)
Data: data
Models:
m_null: rt ~ 1 + (1 | ppt) + (1 | item)
m: rt ~ nptype + (1 | ppt) + (1 | item)
       npar   AIC   BIC logLik deviance Chisq Df Pr(>Chisq)  
m_null    4 14319 14339  -7155    14311                      
m         5 14316 14340  -7153    14306  5.28  1      0.022 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Predicted effects by participants and items

Random intercepts and random slopes

  • Random intercepts: size of np-type effect is constant.
  • Random slopes: size of np-type effect varies by participants and items.
  • Correlation between random intercepts and random slopes: e.g. np-type effect might be weaker for participants / items with longer rts.
  • Maximal random effects structure (Barr et al. 2013): for repeated-measures designs with crossed participants and items, we need random intercepts for participants and items with by-participants and by-items random slopes.
# Random intercepts model
m <- lmer(rt ~ nptype + ( 1 | ppt ) + ( 1 | item ), data = data)

# Maximal random effects structure 
m_max <- lmer(rt ~ nptype + ( nptype | ppt ) + ( nptype | item ), data = data)

Random intercepts and random slopes

Model summary

# Maximal random effects model 
summary(m_max)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: rt ~ nptype + (nptype | ppt) + (nptype | item)
   Data: data

REML criterion at convergence: 14288

Scaled residuals: 
   Min     1Q Median     3Q    Max 
-4.505 -0.596 -0.147  0.449  5.002 

Random effects:
 Groups   Name         Variance Std.Dev. Corr 
 item     (Intercept)   2749.8   52.44        
          nptypesimple   522.1   22.85   -1.00
 ppt      (Intercept)  27159.7  164.80        
          nptypesimple    52.4    7.24   -1.00
 Residual              54359.2  233.15        
Number of obs: 1036, groups:  item, 48; ppt, 12

Fixed effects:
             Estimate Std. Error     df t value Pr(>|t|)    
(Intercept)    1111.9       49.3   11.5   22.58    7e-11 ***
nptypesimple    -33.6       15.0  125.7   -2.24    0.027 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr)
nptypesimpl -0.310
optimizer (nloptwrap) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

Hypothetical examples

Task: Pair designs with their maximal random-effects structure!

  • Design 1 (between participants / within items): both NP type conditions were presented for each item set but half of the participants saw simple NPs and the other half saw conjoined NPs.
  • Design 2 (within participants / within items): each participant saw both conditions and each item was presented in both conditions.
  • Design 3 (within participants / between items): half of all items were conjoined and the other half were simple NP type; each participant saw both conditions (simple NPs and conjoined NPs).
  • Design 4 (between participants / between items): half of all items were conjoined and half were simple NP type; half of the participants saw simple NPs and the other half saw conjoined NPs.
Model_A <- lmer(rt ~ nptype + ( 1 | ppt ) + ( nptype | item ), data = data) 
Model_B <- lmer(rt ~ nptype + ( 1 | ppt ) + ( 1 | item ), data = data)
Model_C <- lmer(rt ~ nptype + ( nptype | ppt ) + ( nptype | item ), data = data) 
Model_D <- lmer(rt ~ nptype + ( nptype | ppt ) + ( 1 | item ), data = data) 

Exercise: complete exercise script mixed_effects_model_rirs.R

Generalised linear models

Type of outcome variable

Continuous:

library(lme4)
model <- lmer(dv ~ iv + ( random slopes | grouping factor ), data = data)

Binary:

library(lme4)
model <- glmer(dv ~ iv + ( random slopes | grouping factor ), data = data, family = binomial())

Count:

library(lme4)
model <- glmer(dv ~ iv + ( random slopes | grouping factor ), data = data, family = poisson())

Also negative-binomial lme4::glmer.nb and zero-inflated count models NBZIMM::glmm.zinb.

Ordinal:

library(ordinal)
model <- clmm(dv ~ iv + ( random slopes | grouping factor ), data = data)

Frequentist mixed-effects models require familiarity with different packages.

Convergence failures

  • Maximal random effects structures were introduced as the gold standard for repeated-measures experiments (Barr et al. 2013; Baayen, Davidson, and Bates 2008).
  • Easy to fit for simple designs with balanced samples and in simulations (Barr et al. 2013).
  • Convergence failures are well documented in the literature (Bates et al. 2015; Eager and Roy 2017; Kimball et al. 2016).


  • Overparameterisation: unidentifiable parameter estimates (Bates et al. 2015).
  • lme4 author Bates proposed the use of “parsimonious” random-effects structures (Bates et al. 2015).
  • Model selection based on the failure to fit a more complex model.

Alternative

  • Convergence failures (and other problems like floor and ceiling effects) can be (easily) addressed in Bayesian models.
  • Bayesian models converge by definition (as the number of iterations approaches \(\infty\)).

Fitting mixed-effects models in brms

(Bürkner 2017, 2018)

brms (Bürkner 2017)

  • R package for Bayesian models
  • (Almost) no more complicated to fit than lme4 models.
  • Syntax deliberately mimics lme4.
  • More probability models than other packages: gaussian, lognormal, bernoulli, poisson, zero_inflated_poisson, skew_normal, shifted_lognormal, exgaussian, et cetera
  • Allows mixture models, nonlinear syntax, (un)equal variance signal-detection theory, multivariate models


  • More flexibility is available only in Stan itself (Annis, Miller, and Palmeri 2017; Carpenter et al. 2017).
  • brms creates Stan code to compile a probabilistic MCMC (Markov Chain Monte Carlo) sampler.
  • Compiling the sampler and obtaining the full posterior can take time.

Exercise: complete script mixed_effects_model_brms.R

R syntax

Frequentist mixed-effects models

library(lme4)
fit_lmer <- lmer(rt ~ nptype + ( nptype | ppt ) + ( nptype | item ), data = data)

Bayesian mixed-effects models

library(brms)
fit_brm <- brm(rt ~ nptype + ( nptype | ppt ) + ( nptype | item ), data = data)

Compare the fitted models

coef(summary(fit_lmer)) # Coefficients of frequentist model
             Estimate Std. Error    df t value Pr(>|t|)
(Intercept)    1111.9       49.3  11.5   22.58 6.96e-11
nptypesimple    -33.6       15.0 125.7   -2.24 2.70e-02
fixef(fit_brm) # Coefficients of Bayesian model
             Estimate Est.Error   Q2.5    Q97.5
Intercept      1108.0      50.2 1010.1 1206.889
nptypesimple    -33.2      17.2  -66.3    0.749

Why Bayesian?

see e.g. Kruschke (2014)

Two universes

Two different philosophical frameworks for generalising from the sample to the population: what do the data tell us about “the truth”?

  • Null-hypothesis significance testing
  • Bayesian inference

Null-hypothesis significance testing

  • Classical / frequentist statistics; p-value based statistics
  • p-value: How plausible are the data (or something more extreme) if we assume that there’s no effect?
  • Evaluating the probability of data (e.g. t-value, F-statistic) assuming that the null hypothesis \(H_0\) is true.
  • If the data are extreme enough, \(H_0\) is concluded to be implausible (rejected).
  • If \(H_0\) is implausible, we assume \(H_1\) (alternative hypothesis).
  • Statistically significant means data are unexpected under \(H_0\).

\[ Pr(\text{data} \mid H_0 = \text{TRUE}) \]

Its problems

  • Focus on significance: small effects can become significant when the sample is large enough.
  • People think of statistical significance as continuous, not binary.
  • How often do we actually believe in \(H_0\)?
  • What do you do if \(p=0.051\)?


  • Systematic misinterpretations of p-values (Colquhoun 2017; Greenland et al. 2016) and CIs (Hoekstra et al. 2014; Morey et al. 2016)
  • Optional stopping (Kruschke 2018)
  • An \(\alpha\)-level of 0.05 is commonly misread as a 5% chance that:
    • our effect isn’t real.
    • we’ll miss an effect that is real.

Bayesian inference

  • Uncertainty about a hypothesis is expressed in probability distributions.
  • One’s belief (the prior; example later) about a hypothesis (e.g. a difference between conditions, an interaction, etc.) is updated as new information becomes available.

\[ Pr(H \mid \text{data}) \]

Bayes’ Theorem

\[ \underbrace{Pr(\theta \mid \text{data})}_{\text{posterior}} \propto \underbrace{Pr(\theta)}_{\text{prior}} \cdot \underbrace{Pr(\text{data} \mid \theta)}_{\text{likelihood}} \]

  • Likelihood: probability of data given the model parameter value(s)
  • Prior: what do we already know about model parameter(s)
  • Posterior: our updated belief in the parameter value(s) after seeing the data

Why is Bayesian inference important?

Instead of concluding


\[ Pr(\text{data} \mid H_0 = \text{TRUE}) \]

we can answer


\[ Pr(H \mid \text{data}) \]

Why is Bayesian inference important?

It tells us what we want to know!


  • What’s the probability of an effect given the data (and what we already knew before)?
  • Summary stats of a Bayesian model often match what people mistakenly think frequentist quantities (p-values, CIs) mean (Nicenboim and Vasishth 2016):
  • What’s the relative evidence for one model (e.g., \(H_0\)) vs. another (e.g., \(H_1\))?
  • What interval contains an unknown parameter value with .95 probability?

Why is Bayesian inference important?

  • Estimate the posterior uncertainty over parameter values based on the observed data (Kruschke 2014).
  • For this we need
  1. a probability model (e.g. a normal distribution and a set of predictors)
  2. priors (on e.g. means, difference, variability)
  • brms uses probabilistic sampling to determine the posterior uncertainty about the parameter value.
  • This involves calculating the weighted probability of the likelihood of the data given a proposed parameter value in the (potentially high-dimensional) parameter space.
  • This can take time.

Convergence and model diagnostics

see Lambert (2018) for HMC

So what about this?

Progress of probabilistic sampler.

By default:

  • 2,000 iterations per chain for parameter estimation
  • the first half of each chain’s iterations are warm-up samples, which are discarded
  • 4 chains to establish convergence
  • How many posterior samples have we got for inference (= chains * (iterations - warm-up))? See the sketch below.
  • How many iterations / chains do you need?
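A quick sanity check of that arithmetic (a sketch using the brms defaults listed above); the result matches the 4,000 posterior draws we extract later:

chains <- 4
iterations <- 2000
warmup <- iterations / 2
chains * (iterations - warmup) # 4000 posterior samples for inference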

Parameter estimation

  • Hamiltonian Monte Carlo: a Markov chain Monte Carlo method for obtaining random samples.
  • The likelihood of each random sample is estimated given the data and the prior.
  • The direction changes if the probability of the proposed sample is lower than that of the previous one.

Parameter estimation (traceplot)

Model diagnostics and convergence checks

  • Traceplots: we want fat hairy caterpillars
  • \(\hat{R}\) convergence statistic; should be \(<1.1\) (Gelman and Rubin 1992)
  • Posterior predictive checks: compare data \(y\) and predictions \(y_{rep}\) (see the sketch after this list)
  • Exercise: complete script model_diagnostics.R
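A minimal sketch of these checks in brms (assuming the fit_brm object fitted earlier):

plot(fit_brm)     # traceplots and posterior densities per parameter
summary(fit_brm)  # includes the Rhat statistic for every parameter
pp_check(fit_brm) # posterior predictive check: data y vs predictions y_rep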

Where are the priors?

see e.g. chapter 7 in Lee and Wagenmakers (2014)

Priors

  • Prior knowledge about plausible parameter values.
  • This knowledge is expressed as probability distributions (e.g. normal distributions).
  • Priors help the probabilistic sampler by limiting the parameter space.
  • Small data samples are sensitive to prior information, which makes intuitive sense.
  • Otherwise the data typically overwhelm the prior (automatic Ockham’s razor).
  • Less common: testing the data against a (prior) effect suggested by the literature.

Priors: intercept

  • A priori, each value in the parameter space is equally possible.
  • Let’s think about the parameter space for rts.

Priors: intercept

  • Can rts range between -\(\infty\) and \(\infty\)?
  • What are plausible lower and upper ends?
  • Plausible probability distribution for rts:

\[ \beta_0 \sim \mathcal{N}(1000 \text{ msecs}, ???) \]

Priors: intercept

  • Can rts range between -\(\infty\) and \(\infty\)?
  • What are plausible lower and upper ends? (see the sketch below)
  • Plausible probability distribution for rts:

\[ \beta_0 \sim \mathcal{N}(1000 \text{ msecs}, 150\text{ msecs}) \]
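As a quick check of what this prior implies (a sketch, not part of the exercise scripts), the central 95% of \(\mathcal{N}(1000, 150)\) spans roughly 700 to 1300 msecs:

# Implied central 95% range of the intercept prior
qnorm(c(0.025, 0.975), mean = 1000, sd = 150) # ~706 and ~1294 msecs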

Priors: slope

  • What do we know about the difference between simple and conjoined NPs?
  • Let’s pretend we don’t know much: so a priori there might be a mean difference of 0 msecs.
  • How large are rt differences usually? Anything larger than 200 msecs would be massive, so let’s propose a humble standard deviation of 100 msecs.

\[ \beta_1 \sim \mathcal{N}(0\text{ msecs}, 100\text{ msecs}) \]

Priors

  • brms uses default priors that can be changed as appropriate.
  • Some defaults are sensible; the (flat) priors are worth specifying yourself (see the sketch below).

Check defaults used earlier:

prior_summary(fit_brm)
                     prior     class         coef group
                    (flat)         b                   
                    (flat)         b nptypesimple      
                    lkj(1)       cor                   
                    lkj(1)       cor               item
                    lkj(1)       cor                ppt
 student_t(3, 1039.5, 235) Intercept                   
      student_t(3, 0, 235)        sd                   
      student_t(3, 0, 235)        sd               item
      student_t(3, 0, 235)        sd    Intercept  item
      student_t(3, 0, 235)        sd nptypesimple  item
      student_t(3, 0, 235)        sd                ppt
      student_t(3, 0, 235)        sd    Intercept   ppt
      student_t(3, 0, 235)        sd nptypesimple   ppt
      student_t(3, 0, 235)     sigma                   
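A sketch of how these defaults could be replaced with the priors motivated above (the object name fit_brm_priors is an assumption; the exercise script covers the details):

# Refit with the intercept and slope priors from the earlier slides
fit_brm_priors <- brm(rt ~ nptype + (nptype | ppt) + (nptype | item),
                      data = data,
                      prior = c(prior(normal(1000, 150), class = Intercept),
                                prior(normal(0, 100), class = b)))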


Exercise: complete script mixed_effects_model_brms_with_priors.R (this time with priors)

Hypothesis testing

see e.g. Nicenboim and Vasishth (2016) for a tutorial

What results do I report?

  • To test a hypothesis about parameter values (e.g. difference between groups, interactions, change in outcome):
    • Summary of posterior parameter value
    • Region of practical equivalence (ROPE)
    • Savage-Dickey Bayes Factor: for nested models
  • Cross-validation (leave-one-out): comparison of different probability models

Posterior probability distribution

Get posterior of slope \(\beta\) (difference between conditions)

beta <- as_draws_df(fit_brm) %>% pull(b_nptypesimple)
length(beta)
[1] 4000
beta[1:5]
[1] -34.0 -45.2 -33.0 -37.6 -23.7

Posterior probability distribution

Posterior mean

mean(beta)
[1] -33.2

Posterior probability distribution

95% probability interval (conceptually different from confidence interval)

quantile(beta, probs = c(0.025, 0.975))
   2.5%   97.5% 
-66.281   0.749 

89% probability interval (McElreath 2020)

quantile(beta, probs = c(0.055, 0.945))
  5.5%  94.5% 
-60.83  -6.53 

Posterior probability distribution

Probability that slope \(\beta\) is negative: \(P(\hat{\beta}<0)\)

mean(beta < 0)
[1] 0.973

Posterior probability distribution

Probability that slope \(\beta\) is below \(-10\): \(P(\hat{\beta} < -10)\)

mean(beta < -10)
[1] 0.917

See exercises: hypothesis_testing_post.R

ROPE

e.g. Kruschke (2010)

ROPE

mean(beta < 0)
[1] 0.973
mean(beta < -5)
[1] 0.954
mean(beta < -10)
[1] 0.917

There is nothing special about 0.

ROPE

  • Point values don’t represent our beliefs about the null hypothesis: \(H_0 = 0\)
  • Region of practical equivalence: range of values that are equivalent to a null effect (Kruschke 2010, 2011).
  • Define region that is equivalent to no effect: values in interval of [-10, 10] msecs

library(bayestestR)
rope(beta, range = c(-10, 10)) 
# Proportion of samples inside the ROPE [-10.00, 10.00]:

inside ROPE
-----------
6.05 %     

ROPE

  • Point values don’t represent our beliefs about the null hypothesis: \(H_0 = 0\)
  • Region of practical equivalence: range of values that are equivalent to a null effect (Kruschke 2010, 2011).
  • A one-sided region is also possible: values in the interval [-10, Inf]
library(bayestestR)
rope(beta, range = c(-10, Inf))
# Proportion of samples inside the ROPE [-10.00, Inf]:

inside ROPE
-----------
6.05 %     

ROPE

  • Determine a ROPE to assess the uncertainty of the effect size (Makowski, Ben-Shachar, and Lüdecke 2019)
  • Calculate the standardised effect size \(\delta\) from the posterior: \(\delta = \frac{\beta}{\sigma}\)
  • A ROPE of [-0.1, 0.1] is the range of negligible effect sizes (Cohen 1988; Kruschke 2018).

sigma <- as_draws_df(fit_brm) %>% pull(sigma)
delta <- beta / sigma
abs(mean(delta)) # very small effect (see Cohen's d)
[1] 0.142
rope(delta, range = c(-0.1, 0.1)) 
# Proportion of samples inside the ROPE [-0.10, 0.10]:

inside ROPE
-----------
26.34 %    

ROPE

  • The ROPE value indicates the extent to which the posterior cannot rule out a negligible effect.
  • A meaningful effect size should have a small proportion of posterior samples within the ROPE.
rope(delta, range = c(-0.1, 0.1)) 
# Proportion of samples inside the ROPE [-0.10, 0.10]:

inside ROPE
-----------
26.34 %    

Exercise: complete script hypothesis_testing_rope.R

Bayes Factor

e.g. chapter 7.6 in Lee and Wagenmakers (2014)

Bayes Factor

p-values have two possible outcomes:

  1. reject the null (i.e. \(p<0.05\))
  2. inconclusive (i.e. \(p>0.05\) – go back to start)

BFs lie on a continuum that covers three possible conclusions (Dienes 2014, 2016; Dienes and Mclatchie 2018):

  1. Reject the null / alternative hypothesis
  2. Inconclusive (get more data!)
  3. Accept the null / alternative hypothesis

Bayes Factor

\[ \text{BF} = \frac{p(H_1 \mid y)}{p(H_0 \mid y)} \]

  • How much more probable is one model (hypothesis) over the other?
  • How convinced should we be about the evidence for our hypothesis \(H_1\) as opposed to, say, a null hypothesis \(H_0\)?
  • Savage-Dickey density ratio for nested models (Jeffreys 1961; Dickey, Lientz, et al. 1970):
  • Height of the posterior density at zero compared to the height of the prior density at zero.

Bayes Factor: Savage-Dickey density ratio

  • Posterior: \(\mathcal{N}(-33.2, 17.2)\)
  • Prior: \(\mathcal{N}(0, 100)\)
dnorm(0, 0, 100)
[1] 0.00399
dnorm(0, mean(beta), sd(beta))
[1] 0.00361
dnorm(0, 0, 100) / dnorm(0, mean(beta), sd(beta))
[1] 1.11

Bayes Factor: Savage-Dickey density ratio

  • Posterior: \(\mathcal{N}(-33.2, 17.2)\)
  • Prior: \(\mathcal{N}(0, 25)\)
dnorm(0, 0, 25)
[1] 0.016
dnorm(0, mean(beta), sd(beta))
[1] 0.00361
dnorm(0, 0, 25) / dnorm(0, mean(beta), sd(beta))
[1] 4.42

Bayes Factor: Savage-Dickey density ratio

  • Posterior: \(\mathcal{N}(-33.2, 17.2)\)
  • Prior: \(\mathcal{N}(0, 2000)\)
dnorm(0, 0, 2000)
[1] 0.000199
dnorm(0, mean(beta), sd(beta))
[1] 0.00361
dnorm(0, 0, 2000) / dnorm(0, mean(beta), sd(beta))
[1] 0.0553

Bayes Factor: Savage-Dickey density ratio

  • Regularising (skeptical) priors prevent the model from getting overexcited by the sample (his words: McElreath 2020).
  • If in doubt, use weakly informative priors like Student-t distributions that allow for fatter tails, e.g. student_t(3, 0, 100)

Student t-distribution: student_t(df, mean, sd)

  • Symmetric continuous distribution with fat tails, assigning more probability to extreme values (see the sketch below).
  • The \(\nu\) parameter controls the degrees of freedom.
  • \(\nu = 1\) is a Cauchy distribution.
  • As \(\nu \rightarrow \infty\) the t-distribution becomes Gaussian.
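For instance (a sketch using base R’s location-scale relation for the t-distribution), the probability mass beyond ±300 msecs is roughly twenty times larger under student_t(3, 0, 100) than under \(\mathcal{N}(0, 100)\):

# Tail probability beyond +/- 300 msecs
2 * pt(-300 / 100, df = 3) # ~0.058 under student_t(3, 0, 100)
2 * pnorm(-300 / 100)      # ~0.003 under normal(0, 100)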

Exercise

  • Explore hypothesis() in the exercise: hypothesis_testing_bf_1.R
  • Bonus: explore the Savage-Dickey method in hypothesis_testing_bf_2.R

Bayes Factor with hypothesis()

H: there is no difference between conjoined and simple NPs

hypothesis(fit_brm, "nptypesimple = 0")
Hypothesis Tests for class b:
          Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
1 (nptypesimple) = 0    -33.2      17.2    -66.3     0.75         NA        NA
  Star
1     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

H: simple NPs are faster than conjoined NPs

hypothesis(fit_brm, "nptypesimple < 0")
Hypothesis Tests for class b:
          Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob
1 (nptypesimple) < 0    -33.2      17.2    -61.4    -5.88         36      0.97
  Star
1    *
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

H: simple NPs are more than 10 msecs faster than conjoined NPs

hypothesis(fit_brm, "nptypesimple < -10")
Hypothesis Tests for class b:
                Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
1 (nptypesimple)-(-10) < 0    -23.2      17.2    -51.4     4.12       11.1
  Post.Prob Star
1      0.92     
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.

Families of distributions

Families of distributions


“Models are devices that connect theories to data. A model is an instantiation of a theory […]” (Rouder, Morey, and Wagenmakers 2016)


  • Our models describe how we understand reality.
  • Models allow us to describe reality qualitatively (with their parameters) and quantitatively (with parameter-value estimates).
  • The interpretation of parameters depends on the data-modelling context.
  • At a minimum, the distribution family (probability model) depends on the data type.

Families of distributions (some important ones)

  • Gaussian: data come from normal distribution
fit <- brm(outcome ~ predictor + (1|participant), data = data, family = gaussian())

Families of distributions (some important ones)

  • Gaussian: data come from normal distribution
  • Bernoulli: binomial outcomes (yes / no; correct / incorrect)
fit <- brm(outcome ~ predictor + (1|participant), data = data, family = bernoulli())

Families of distributions (some important ones)

  • Gaussian: data come from normal distribution
  • Bernoulli: binomial outcomes (yes / no; correct / incorrect)
  • Poisson: count data (number of words / mistakes) (Winter and Bürkner 2021)
fit <- brm(outcome ~ predictor + (1|participant), data = data, family = poisson())

Families of distributions (some important ones)

  • Gaussian: data come from normal distribution
  • Bernoulli: binomial outcomes (yes / no; correct / incorrect)
  • Poisson: count data (number of words / mistakes) (Winter and Bürkner 2021)
  • Zero-inflated Poisson: count data with lots of zeros (number of typos per word)
fit <- brm(outcome ~ predictor + (1|participant), data = data, family = zero_inflated_poisson())

Families of distributions (some important ones)

  • Gaussian: data come from normal distribution
  • Bernoulli: binomial outcomes (yes / no; correct / incorrect)
  • Poisson: count data (number of words / mistakes) (Winter and Bürkner 2021)
  • Zero-inflated Poisson: count data with lots of zeros (number of typos per word)
  • Cumulative: ordinal (ordered categorical) data (Bürkner and Vuorre 2019) (Likert scales)
fit <- brm(outcome ~ predictor + (1|participant), data = data, family = cumulative()) # or acat, sratio

Families of distributions (some important ones)

  • Gaussian: data come from normal distribution
  • Bernoulli: binomial outcomes (yes / no; correct / incorrect)
  • Poisson: count data (number of words / mistakes) (Winter and Bürkner 2021)
  • Zero-inflated Poisson: count data with lots of zeros (number of typos per word)
  • Cumulative: ordinal (ordered categorical) data (Bürkner and Vuorre 2019) (Likert scales)
  • Lognormal: zero-bounded data with positive skew; also skewed / shifted (log)normal, Wiener diffusion models etc. for rt data (Matzke and Wagenmakers 2009)
fit <- brm(outcome ~ predictor + (1|participant), data = data, family = lognormal())

Compare probability models of rt data

see e.g. Matzke and Wagenmakers (2009); Roeser et al. (2024)

Probability models of rt data

So far we used a Gaussian model which showed a poor fit to the data.

Alternative models: skewed normal, shifted (log)normal, Wiener Diffusion models, ex-Gaussian, mixture of lognormal etc. (see Matzke and Wagenmakers 2009; Roeser et al. 2024).

Gaussian

\[y \sim \mathcal{N}(\mu, \sigma^2)\]

  • Data generating process can be described as a normal distribution with mean \(\mu\) and standard deviation \(\sigma\).

Lognormal

Lognormal

\[ \log(y) \sim \mathcal{N}(\mu, \sigma^2) \]

  • Often used to address positive skew, e.g. in response times (Baayen 2008)
  • Log values are only defined for \(y > 0\)
  • De-emphasizes large values relative to small values.
  • Models percentage change rather than absolute differences (see the sketch below):
    • Rather than saying “keystroke intervals are 40 msecs slower”, you say “keystrokes are 7% slower”.
    • A difference of 40 msecs can be huge for fast typing but negligible for pauses around 10 secs.
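As a quick illustration of the multiplicative interpretation (the slope value 0.07 is hypothetical):

# A slope beta on the log scale implies a multiplicative change of exp(beta)
(exp(0.07) - 1) * 100 # ~7.3% slower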

Shifted lognormal

\[ \log(y - \exp(\text{ndt})) \sim \mathcal{N}(\mu, \sigma^2) \]

  • Variation of the lognormal distribution: with a shift of zero it reduces to the lognormal.
  • Includes a horizontal shift.
  • Useful for modelling rt data because rts are typically not just
    • right-skewed
    • non-negative
  • but also
    • positively shifted: there is a minimum time required for any response.

The shift parameter represents the minimum possible rt, accounting for the fact that no response can be instantaneous.
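A minimal simulation sketch (with assumed parameter values) showing that no simulated rt falls below the shift:

# Shifted lognormal: a constant shift ndt plus a lognormal component
n <- 1000
ndt <- 300 # assumed minimum possible rt in msecs
rt <- ndt + rlnorm(n, meanlog = 6, sdlog = 0.5)
min(rt) # always above ndt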

Shifted lognormal (simulated data)

Exercise

Fit two other probability models, namely a lognormal and a shifted lognormal model:

  • Complete script mixed_effects_models_brms_lognormal.R
  • Complete script mixed_effects_models_brms_shifted_lognormal.R

Model comparisons

chapter 11 in Gelman, Hill, and Vehtari (2020); chapter 6 in McElreath (2020)

The Gaussian model showed a poor fit to the data

Model comparisons

  • How accurately does the model predict new data?
  • \(R^2\) and friends: more complex models are generally better (\(R^2\) can’t go down).
  • Overfitting: models with more parameters can make unreasonable predictions.
  • Generalisations to the data that were used to fit the model are overoptimistic (Gelman, Hill, and Vehtari 2020).
  • We will look at both fit and predictive performance of the model.

Compare probability models of rt data

Leave-One-Out (LOO) cross validation

  • Cross-validation: performance of a model on new data to remove the overfitting problem.
  • Leave-One-Out (LOO) Information Criterion (LOO-IC):
    • Train the model on \(N-1\) observations.
    • Predict the remaining data point from the training model.
    • Repeat the process \(N\) times to predict every observation from a model of the remaining data.
  • Adding up the prediction results gives an estimate of the expected log predictive density (\(elpd\)), i.e. an approximation of the results that would be expected for new data.
  • loo() approximates LOO-IC using Pareto-smoothed importance sampling (Vehtari, Gelman, and Gabry 2015, 2017).

Leave-One-Out (LOO) cross validation

  • The approximation in loo() uses the log posterior predictive densities: how likely is each data point given the distribution parameter estimates?

Leave-One-Out (LOO) cross validation

loo(fit_brm)
Computed from 4000 by 1036 log-likelihood matrix.

         Estimate   SE
elpd_loo  -7141.4 40.8
p_loo        41.6  3.6
looic     14282.8 81.6
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.6, 1.5]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.

Leave-One-Out (LOO) cross validation

  • elpd_loo: sum of means (expected log predictive density)
  • p_loo: sum of variances
  • looic: \(-2 \cdot (\text{elpd\_loo} - \text{p\_loo})\) (for the deviance scale)
log_lik(fit_brm) %>% as_tibble() %>% 
  mutate(sample = 1:n()) %>%
  pivot_longer(-sample, names_to = "obs", 
               values_to = "loglik") %>%
  summarise(mean_loglik = mean(exp(loglik)),
            var_loglik = var(loglik),
            .by = obs) %>% 
  summarise(elpd_loo = sum(log(mean_loglik)), 
            p_loo = sum(var_loglik),
            looic = -2 * (elpd_loo - p_loo))
# A tibble: 1 × 3
  elpd_loo p_loo  looic
     <dbl> <dbl>  <dbl>
1   -7100.  41.4 14282.

Leave-One-Out (LOO) cross validation

  • elpd_loo and looic rarely have a direct interpretation (as opposed to loo_R2()): what matters are the differences between models (see the sketch below).
  • p_loo is the effective number of parameters: how flexible is the model fit.
  • Exercise: complete the script model_comparison.R
  • This script assumes that you have stored the posteriors of these models: Gaussian, lognormal, and shifted lognormal.
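A sketch of the comparison itself, assuming the three fitted objects are named fit_gaussian, fit_lognormal and fit_shifted_lognormal (as in the table below):

# Compare expected log predictive densities across the three models
loo_compare(loo(fit_gaussian),
            loo(fit_lognormal),
            loo(fit_shifted_lognormal))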

Model comparison

comparison                              elpd_diff   SE elpd_ratio
fit_gaussian vs fit_lognormal               -77.3 22.2        3.5
fit_gaussian vs fit_shifted_lognormal       -75.6 23.2        3.3
fit_lognormal vs fit_shifted_lognormal        1.7  1.2        1.4

Note: elpd_diff is often reported as e.g. \(\Delta\widehat{elpd}\).

The end

Summary

  • brms isn’t much more complicated than lme4 and comes with more flexibility.
  • Priors require some thought.
  • Bayesian models allow a straightforward interpretation of parameter values without an arbitrary threshold for significance.

Recommended reading

References

Annis, Jeffrey, Brent J Miller, and Thomas J Palmeri. 2017. “Bayesian Inference with Stan: A Tutorial on Adding Custom Distributions.” Behavior Research Methods 49: 863–86.

Baayen, R. Harald. 2008. Analyzing Linguistic Data. A Practical Introduction to Statistics Using R. Cambridge: Cambridge University Press.

Baayen, R. Harald, Douglas J. Davidson, and Douglas M. Bates. 2008. “Mixed-Effects Modeling with Crossed Random Effects for Subjects and Items.” Journal of Memory and Language 59 (4): 390–412.

Barr, Dale J, Roger Levy, Christoph Scheepers, and Harry J Tily. 2013. “Random Effects Structure for Confirmatory Hypothesis Testing: Keep It Maximal.” Journal of Memory and Language 68 (3): 255–78.

Bates, Douglas, Reinhold Kliegl, Shravan Vasishth, and Harald Baayen. 2015. “Parsimonious Mixed Models.” arXiv Preprint arXiv:1506.04967.

Bürkner, Paul-Christian. 2017. “Brms: An R Package for Bayesian Multilevel Models Using Stan.” Journal of Statistical Software 80: 1–28. https://doi.org/10.18637/jss.v080.i01.

———. 2018. “Advanced Bayesian Multilevel Modeling with the R Package brms.” The R Journal 10 (1): 395–411. https://doi.org/10.32614/RJ-2018-017.

———. 2019. “Bayesian Item Response Modeling in R with Brms and Stan.” arXiv Preprint arXiv:1905.09501.

Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal Regression Models in Psychology: A Tutorial.” Advances in Methods and Practices in Psychological Science 2 (1): 77–101.

Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76: 1–32.

Clayton, Aubrey. 2021. Bernoulli’s Fallacy: Statistical Illogic and the Crisis of Modern Science. Columbia University Press.

Cohen, Jacob. 1988. Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.

Colquhoun, David. 2017. “The Reproducibility of Research and the Misinterpretation of p-Values.” Royal Society Open Science 4 (12): 171085.

Dickey, James M., B. P. Lientz, et al. 1970. “The Weighted Likelihood Ratio, Sharp Hypotheses about Chances, the Order of a Markov Chain.” The Annals of Mathematical Statistics 41 (1): 214–26.

Dienes, Zoltan. 2014. “Using Bayes to Get the Most Out of Non-Significant Results.” Frontiers in Psychology 5 (781): 1–17.

———. 2016. “How Bayes Factors Change Scientific Practice.” Journal of Mathematical Psychology 72: 78–89.

Dienes, Zoltan, and Neil Mclatchie. 2018. “Four Reasons to Prefer Bayesian Analyses over Significance Testing.” Psychonomic Bulletin & Review 25 (1): 207–18.

Eager, Christopher D, and Joseph Roy. 2017. “Mixed Models Are Sometimes Terrible.” Linguistic Society of America, Austin, TX.

Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. Cambridge University Press.

Gelman, Andrew, and Donald B. Rubin. 1992. “Inference from Iterative Simulation Using Multiple Sequences.” Statistical Science 7 (4): 457–72.

Greenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, p Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50.

Hoekstra, Rink, Richard D. Morey, Jeffrey N. Rouder, and Eric-Jan Wagenmakers. 2014. “Robust Misinterpretation of Confidence Intervals.” Psychonomic Bulletin & Review 21 (5): 1157–64.

Jeffreys, Harold. 1961. The Theory of Probability. 3rd ed. Oxford: Oxford University Press, Clarendon Press.

Kimball, A, Kailen Shantz, C Eager, and Joseph Roy. 2016. “Beyond Maximal Random Effects for Logistic Regression: Moving Past Convergence Problems.” ArXiv e-Prints.

Kruschke, John K. 2010. “What to Believe: Bayesian Methods for Data Analysis.” Trends in Cognitive Sciences 14 (7): 293–300.

———. 2011. “Bayesian Assessment of Null Values via Parameter Estimation and Model Comparison.” Perspectives on Psychological Science 6 (3): 299–312.

———. 2014. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan. 2nd ed. London: Academic Press.

———. 2018. “Rejecting or Accepting Parameter Values in Bayesian Estimation.” Advances in Methods and Practices in Psychological Science 1 (2): 270–80.

Lambert, Ben. 2018. A Student’s Guide to Bayesian Statistics. Sage.

Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. Cambridge University Press.

Makowski, Dominique, Mattan S Ben-Shachar, and Daniel Lüdecke. 2019. “bayestestR: Describing Effects and Their Uncertainty, Existence and Significance Within the Bayesian Framework.” Journal of Open Source Software 4 (40): 1541.

Martin, Randi C, Jason E Crowther, Meredith Knight, Franklin P Tamborello II, and Chin-Lung Yang. 2010. “Planning in Sentence Production: Evidence for the Phrase as a Default Planning Scope.” Cognition 116 (2): 177–92.

Matzke, Dora, and Eric-Jan Wagenmakers. 2009. “Psychological Interpretation of the Ex-Gaussian and Shifted Wald Parameters: A Diffusion Model Analysis.” Psychonomic Bulletin & Review 16 (5): 798–817.

McElreath, Richard. 2020. Statistical Rethinking: A Bayesian Course with Examples in R and Stan. 2nd ed. Boca Raton, FL: CRC Press.

Morey, Richard D., Rink Hoekstra, Jeffrey N Rouder, Michael D. Lee, and Eric-Jan Wagenmakers. 2016. “The Fallacy of Placing Confidence in Confidence Intervals.” Psychonomic Bulletin & Review 23 (1): 103–23.

Nicenboim, Bruno, and Shravan Vasishth. 2016. “Statistical Methods for Linguistic Research: Foundational Ideas – Part II.” Language and Linguistics Compass 10 (11): 591–613.

Roeser, J., M. Torrance, M. Andrews, and T. Baguley. 2024. “No Default Syntactic Scope for Advance Planning in Sentence Production: Evidence from Finite Mixture Models,” December. https://doi.org/10.31219/osf.io/u7v36.

Rouder, Jeffrey N., Richard D. Morey, and Eric-Jan Wagenmakers. 2016. “The Interplay Between Subjectivity, Statistical Practice, and Psychological Science.” Collabra 2 (1): 1–12.

Sorensen, Tanner, S. Hohenstein, and Shravan Vasishth. 2016. “Bayesian Linear Mixed Models Using Stan: A Tutorial for Psychologists, Linguists, and Cognitive Scientists.” Quantitative Methods for Psychology 12 (3): 175–200.

Vasishth, Shravan, and Bruno Nicenboim. 2016. “Statistical Methods for Linguistic Research: Foundational Ideas – Part I.” arXiv Preprint arXiv:1601.01126.

Vehtari, Aki, Andrew Gelman, and Jonah Gabry. 2015. “Pareto Smoothed Importance Sampling.” arXiv Preprint arXiv:1507.02646.

———. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27 (5): 1413–32.

Winter, Bodo, and Paul-Christian Bürkner. 2021. “Poisson Regression for Linguists: A Tutorial Introduction to Modelling Count Data with Brms.” Language and Linguistics Compass 15 (11): e12439.