Swiss Graduate School for Cognition, Learning, and Memory

Introduction to Bayesian statistics

Andrew Ellis

2019-04-26

Plan for today

Introduction

Have you ever had this problem?






Example 1: Smart Drug


The group means, standard deviations and standard errors are:


Group       mean     sd     se
Placebo    100.36   2.52   0.39
SmartDrug  101.91   6.02   0.88


It is obvious that the data contain several ‘outliers’.


Two sample t-test / Welch test

We can perform a two-sample t-test, or a Welch test:

# Student's t-test (equal variances assumed); one-sided alternative:
# the placebo mean is less than the smart drug mean
t.test(IQ ~ Group,
       data = TwoGroupIQ,
       var.equal = TRUE,
       alternative = "less")
## 
##  Two Sample t-test
## 
## data:  IQ by Group
## t = -1.5587, df = 87, p-value = 0.06135
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf 0.1037991
## sample estimates:
##   mean in group Placebo mean in group SmartDrug 
##                100.3571                101.9149
# Welch test (equal variances not assumed)
t.test(IQ ~ Group,
       data = TwoGroupIQ,
       var.equal = FALSE,
       alternative = "less")
## 
##  Welch Two Sample t-test
## 
## data:  IQ by Group
## t = -1.6222, df = 63.039, p-value = 0.05488
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##        -Inf 0.04532157
## sample estimates:
##   mean in group Placebo mean in group SmartDrug 
##                100.3571                101.9149


Problem: neither test yields a significant result. What do we do?





Example 2: Kitchen rolls

The following is a commonly encountered problem: you would like to quantify evidence for the null hypothesis.

Imagine you have gone to the trouble of running a replication experiment in which you measured Openness to Experience scores for two groups of students. While filling out the personality questionnaire, both groups rotated a kitchen roll with their hands; one group clockwise, the other group counterclockwise (Wagenmakers et al. 2015).

library(tidyverse)

# Read the data, keep the relevant columns, and convert the participant
# number and rotation direction to factors
kitchenrolls <- read_csv("data/KitchenRolls.csv") %>%
  select(ParticipantNumber, Rotation, NEO = mean_NEO) %>%
  mutate_at(vars(ParticipantNumber, Rotation), ~as_factor(.))
kitchenrolls

We can compute means, standard deviations and standard errors:

kitchenrolls %>%
  group_by(Rotation) %>%
  summarise(N = n(),
            mean = mean(NEO),
            sd = sd(NEO),
            se = sd(NEO)/sqrt(n())) %>%
  mutate_if(is.numeric, ~round(., 3))
library(tidybayes)  # provides theme_tidybayes()

kitchenrolls %>%
  ggplot(aes(x = Rotation, y = NEO, fill = Rotation)) +
  geom_boxplot() +
  geom_jitter(width = 0.2) +
  scale_fill_viridis_d() +
  theme_tidybayes()

The hypothesis was that turning a kitchen roll in the clockwise direction should increase Openness to Experience.

# Welch test by default; one-sided alternative
t.test(NEO ~ Rotation,
       data = kitchenrolls,
       alternative = "less")
## 
##  Welch Two Sample t-test
## 
## data:  NEO by Rotation
## t = 0.75149, df = 97.315, p-value = 0.7729
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
##       -Inf 0.2321921
## sample estimates:
## mean in group counter   mean in group clock 
##              0.712963              0.640625

The effect does not replicate; if anything, it goes in the other direction. We would now like to quantify the evidence in favour of the null hypothesis that rotating a kitchen roll has no effect.

How can we do this?

Problems with null hypothesis significance testing (NHST)





The (Bayesian) New Statistics





Bayesian methods:

🤗


😧

Why Bayesian statistics?

It is important to distinguish between parameter estimation and hypothesis testing (Wagenmakers et al. 2018)





Benefits of Bayesian parameter estimation





Benefits of Bayesian hypothesis testing

What is Bayesian statistics?





Bayesian parameter estimation

In Bayesian parameter estimation, we focus on one model.

\[ p(\theta | y) = p(\theta) \cdot \frac{p(y | \theta)}{p(y)}\]
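To make this concrete, here is a minimal sketch of parameter estimation by grid approximation for a binomial model with a uniform prior (the data, 7 successes in 10 trials, are made up for illustration):

# Grid approximation of the posterior p(theta | y)
theta <- seq(0, 1, length.out = 1000)             # grid of parameter values
prior <- dbeta(theta, 1, 1)                       # uniform Beta(1, 1) prior
likelihood <- dbinom(7, size = 10, prob = theta)  # p(y | theta)
posterior <- prior * likelihood
posterior <- posterior / sum(posterior)           # normalize (the role of p(y))
plot(theta, posterior, type = "l")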

Bayesians cannot test precise hypotheses using confidence intervals. In classical statistics one frequently sees testing done by forming a confidence region for the parameter, and then rejecting a null value of the parameter if it does not lie in the confidence region. This is simply wrong if done in a Bayesian formulation (and if the null value of the parameter is believable as a hypothesis).





Bayesian hypothesis testing

Bayesian hypothesis testing is model comparison, in which we compare the ability of two or more competing models to predict data.

\[ \frac{p(\mathcal{M}_1 | y)}{p(\mathcal{M}_2 | y)} = \frac{P(y | \mathcal{M}_1) \, p(\mathcal{M}_1)}{P(y | \mathcal{M}_2) \, p(\mathcal{M}_2)}\]

When the goal is hypothesis testing, Bayesians need to go beyond the posterior distribution. To answer the question “To what extent do the data support the presence of a correlation?” one needs to compare two models.





The Bayes factor

Let’s have another look at Bayes rule (including the dependency of the parameters \(\mathbf{\theta}\) on the model \(\mathcal{M}\)):

\[ p(\theta | y, \mathcal{M}) = \frac{p(y|\theta, \mathcal{M}) p(\theta | \mathcal{M})}{p(y | \mathcal{M})}\]

where \(\mathcal{M}\) refers to a specific model. The marginal likelihood \(p(y | \mathcal{M})\) now gives the probability of the data, averaged over all possible parameter values under model \(\mathcal{M}\).





The marginal likelihood \(p(y | \mathcal{M})\) is usually neglected when looking at a single model, but becomes important when comparing models.





Writing out the marginal likelihood \(p(y | \mathcal{M})\): \[ p(y | \mathcal{M}) = \int{p(y | \theta, \mathcal{M}) p(\theta|\mathcal{M})d\theta}\]

we see that the likelihood is averaged over all possible values of \(\theta\) that the model allows.
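As a sketch of what this integral means, we can compute the marginal likelihood for the binomial example above by numerical integration, using base R's integrate() (the Beta prior parameters are illustrative):

# p(y | M): average the likelihood over the prior
marginal_likelihood <- function(y, n, a = 1, b = 1) {
  integrate(function(theta) dbinom(y, n, theta) * dbeta(theta, a, b),
            lower = 0, upper = 1)$value
}
marginal_likelihood(7, 10)  # 7 successes in 10 trials, uniform prior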





The priors on \(\theta\) are important.





The problem with making many predictions is that most of these predictions will turn out to be false.

The complexity of a model depends on (among other things) the number of free parameters and the width of the priors placed on them.

When the priors on a parameter are broad (uninformative), those parts of the parameter space where the likelihood is high are assigned low prior probability. Intuitively, if one hedges one's bets, one has to assign low probability to the parameter values that make good predictions, because the probability mass is spread over many possible parameter values.

As a result, more complex models have comparatively lower marginal likelihoods.
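Continuing the sketch above, we can see this penalty directly: a narrow prior concentrated near the observed proportion makes a more precise prediction and earns a higher marginal likelihood than the broad uniform prior (the prior parameters are made up):

marginal_likelihood(7, 10, a = 1, b = 1)   # broad (uniform) prior
marginal_likelihood(7, 10, a = 14, b = 6)  # narrow prior centred near 0.7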





We can also write Bayes rule applied to a comparison between models (marginalized over all parameters within the model):

\[ p(\mathcal{M}_1 | y) = \frac{P(y | \mathcal{M}_1) p(\mathcal{M}_1)}{p(y)}\]

and

\[ p(\mathcal{M}_2 | y) = \frac{P(y | \mathcal{M}_2) p(\mathcal{M}_2)}{p(y)}\]

This tells us that for model \(\mathcal{M}_m\), the posterior probability of the model is proportional to its marginal likelihood times its prior probability.

Now, one is usually less interested in absolute evidence than in relative evidence; we want to compare the predictive performance of one model over another.





To do this, we simply form the ratio of the model probabilities:

\[ \frac{p(\mathcal{M}_1 | y)}{p(\mathcal{M}_2 | y)} = \frac{P(y | \mathcal{M}_1) \, p(\mathcal{M}_1) / p(y)}{P(y | \mathcal{M}_2) \, p(\mathcal{M}_2) / p(y)}\]





The term \(p(y)\) cancels out, giving us: \[ \frac{p(\mathcal{M}_1 | y)}{p(\mathcal{M}_2 | y)} = \frac{P(y | \mathcal{M}_1)}{P(y | \mathcal{M}_2)} \cdot \frac{p(\mathcal{M}_1)}{p(\mathcal{M}_2)}\]





The ratio of marginal likelihoods

\[\frac{P(y | \mathcal{M}_1)}{P(y | \mathcal{M}_2)}\]

is the Bayes factor, and it can be interpreted as the change from prior odds to posterior odds that is indicated by the data.





If we consider the prior odds to be \(1\), i.e. we do not favour one model over another a priori, then we are only interested in the Bayes factor. We write this as:

\[ BF_{12} = \frac{P(y | \mathcal{M}_1)}{P(y | \mathcal{M}_2)}\]





Here, \(BF_{12}\) indicates the extent to which the data support model \(\mathcal{M}_1\) over model \(\mathcal{M}_2\).





As an example, if we obtain \(BF_{12} = 5\), this means that the data are 5 times more likely to have occurred under model 1 than under model 2. Conversely, if \(BF_{12} = 0.2\), then the data are 5 times more likely to have occurred under model 2.
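Using the binomial sketch once more: comparing \(\mathcal{M}_1\) (\(\theta \sim \text{Beta}(1, 1)\)) against a point null \(\mathcal{M}_0\) (\(\theta = 0.5\)), the Bayes factor is just the ratio of the marginal likelihoods; under the point null, the marginal likelihood reduces to the likelihood at \(\theta = 0.5\):

# BF10: marginal likelihood under M1 divided by that under the point null M0
BF10 <- marginal_likelihood(7, 10) / dbinom(7, 10, 0.5)
BF10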





We usually perform model comparisons between a null hypothesis \(\mathcal{H}_0\) and an alternative hypothesis \(\mathcal{H}_1\). The terms “model” and “hypothesis” are used synonymously.

In JASP, we will see Bayes factors reported as either

\[ BF_{10} = \frac{P(y | \mathcal{H}_1)}{P(y | \mathcal{H}_0)}\]

which indicates a BF for an undirected alternative \(\mathcal{H}_1\) versus the null, or

\[ BF_{+0} = \frac{P(y | \mathcal{H}_+)}{P(y | \mathcal{H}_0)}\]

which indicates a BF for a directed alternative \(\mathcal{H}_+\) versus \(\mathcal{H}_0\).

If we want a BF for the null \(\mathcal{H}_0\), we can simply take the inverse of \(BF_{10}\):

\[ BF_{01} = \frac{1}{BF_{10}}\]
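Returning to the kitchen rolls example, here is a sketch of how such Bayes factors can be computed in R, using the BayesFactor package with its default prior on the effect size (we will use JASP and brms in the hands-on sessions; this package is shown only for illustration):

library(BayesFactor)

# Two-sample Bayesian t-test; ttestBF expects a plain data frame
bf10 <- ttestBF(formula = NEO ~ Rotation, data = as.data.frame(kitchenrolls))
bf10      # BF10: evidence for the alternative over the null
1 / bf10  # BF01: evidence for the null over the alternative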





The following classification scheme is sometimes used, although it is rather unnecessary.

Principled Bayesian workflow

According to Gelman (2014), Bayesian data analysis is performed in three steps (sketched in code after this list):

  1. Set up a probability model (a joint probability distribution for observed quantities \(y\), \(x\) and latent quantities \(\theta\)).

  2. Condition on the observed data: calculate the posterior distribution \(p(\theta | y) \propto p(y | \theta) \cdot p(\theta)\).

  3. Evaluate model and implications of the posterior distribution.

    • How well does the model fit the data?
    • Are the substantive conclusions reasonable?
    • How sensitive are the results to the modelling assumptions?
    • Does the model need to be revised?
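A minimal sketch of these three steps in R, using the brms package (which we will meet in the hands-on session) and the TwoGroupIQ data from Example 1; the prior on the group difference is an illustrative assumption:

library(brms)

# Steps 1 and 2: set up the probability model (normal likelihood, weakly
# informative prior on the group difference) and condition on the data
fit <- brm(IQ ~ Group,
           data = TwoGroupIQ,
           prior = set_prior("normal(0, 10)", class = "b"))

# Step 3: evaluate the model and the implications of the posterior
summary(fit)   # are the substantive conclusions reasonable?
pp_check(fit)  # how well does the model fit the data?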





This fits very well with the iterative process described by Blei (2014).





In fact, we can describe a Bayesian workflow like this:





This highlights the distinction between posterior evaluation (estimation) of a model





and model comparison (hypothesis testing)

How to do Bayesian statistics?

Open notebook: 01-intro-bayesian-statistics.Rmd

Hands-on session with Jasp

Open notebook: 02-jasp-case-studies.Rmd

Hands-on session with R

Open notebook: 03-brms-case-studies.Rmd

References

Blei, David M. 2014. “Build, Compute, Critique, Repeat: Data Analysis with Latent Variable Models.” Annual Review of Statistics and Its Application 1 (1): 203–32. https://doi.org/10.1146/annurev-statistics-022513-115657.

Cumming, Geoff. 2014. “The New Statistics: Why and How.” Psychological Science 25 (1): 7–29. https://doi.org/10.1177/0956797613504966.

Dienes, Zoltan. 2014. “Using Bayes to Get the Most Out of Non-Significant Results.” Frontiers in Psychology 5. https://doi.org/10.3389/fpsyg.2014.00781.

Gelman, Andrew. 2014. Bayesian Data Analysis. Third edition. Chapman & Hall/CRC Texts in Statistical Science. Boca Raton: CRC Press.

Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. https://doi.org/10.1017/CBO9780511790942.

Gigerenzer, Gerd. 2004. “Mindless Statistics.” The Journal of Socio-Economics 33 (5): 587–606. https://doi.org/10.1016/j.socec.2004.09.033.

———. 2018. “Statistical Rituals: The Replication Delusion and How We Got There.” Advances in Methods and Practices in Psychological Science 1 (2): 198–218. https://doi.org/10.1177/2515245918771329.

Greenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” European Journal of Epidemiology 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.

Kruschke, John K. 2013. “Bayesian Estimation Supersedes the T Test.” Journal of Experimental Psychology: General 142 (2): 573–603. https://doi.org/10.1037/a0029146.

Kruschke, John K., and Torrin M. Liddell. 2018. “The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective.” Psychonomic Bulletin & Review 25 (1): 178–206. https://doi.org/10.3758/s13423-016-1221-4.

Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA’s Statement on P-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.

Lee, Michael D., and Eric-Jan Wagenmakers. 2014. Bayesian Cognitive Modeling: A Practical Course. 1st ed. Cambridge ; New York: Cambridge University Press. https://doi.org/10.1017/CBO9781139087759.

Wagenmakers, Eric-Jan, Titia F. Beek, Mark Rotteveel, Alex Gierholz, Dora Matzke, Helen Steingroever, Alexander Ly, et al. 2015. “Turning the Hands of Time Again: A Purely Confirmatory Replication Study and a Bayesian Analysis.” Frontiers in Psychology 6 (April). https://doi.org/10.3389/fpsyg.2015.00494.

Wagenmakers, Eric-Jan, Maarten Marsman, Tahira Jamil, Alexander Ly, Josine Verhagen, Jonathon Love, Ravi Selker, et al. 2018. “Bayesian Inference for Psychology. Part I: Theoretical Advantages and Practical Ramifications.” Psychonomic Bulletin & Review 25 (1): 35–57. https://doi.org/10.3758/s13423-017-1343-3.

Wagenmakers, Eric-Jan, Richard D. Morey, and Michael D. Lee. 2016. “Bayesian Benefits for the Pragmatic Researcher.” Current Directions in Psychological Science 25 (3): 169–76. https://doi.org/10.1177/0963721416643289.