Beyond Significant

Andrew Ellis

23/01/2019

Introduction

Overview

  • A practical guide to current Bayesian data analysis methods
    • setting up a Bayesian model
    • parameter estimation
    • model comparison: Bayes factors and posterior predictive checking.

The purpose of this talk is to familiarize the audience with Bayesian thinking and to demonstrate that with current software, Bayesian analysis is no longer restricted to a small number of expert statisticians.

To set the mood

IQ example

Summary: Two groups of people took an IQ test.

Group 1, \(N_1=47\), consumes a “smart drug”, and Group 2, \(N_2=42\), is a control group that consumes a placebo (Kruschke, 2013).

T-Test

Two Sample t-test

Welch Two Sample t-test
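
Both tests can be reproduced in R (a minimal sketch, assuming the IQ scores are stored in vectors smart and placebo):

# classical two-sample t-test, assuming equal variances
t.test(smart, placebo, var.equal = TRUE)

# Welch two-sample t-test (R's default; does not assume equal variances)
t.test(smart, placebo)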

Problems with NHST

Misinterpretations

  • The American Statistical Association (ASA) released a statement on p-values (Wasserstein and Lazar, 2016). Among its principles:

    • P-values can indicate how incompatible the data are with a specified statistical model.
    • P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  • Greenland, Senn, Rothman, Carlin, Poole, Goodman, and Altman (2016) provide a good discussion of common misinterpretations of p values and confidence intervals.
    • A confidence interval does not have a 95% chance of containing the true parameter.

The Bayesian New Statistics

  • Cumming (2014) claims that "we need to shift from reliance on NHST to estimation and other preferred techniques".

  • Kruschke and Liddell (2018) advocate that Bayesian methods are better suited to achieve this, for both hypothesis testing and parameter estimation.

  • According to Gigerenzer (2018), we need to stop relying on NHST and instead learn to use a statistical toolkit.

  • Many reviewers now demand Bayes factors (because a BF can provide evidence for/against hypotheses).

  • Bayesian data analysis is not limited to calculating Bayes factors.

Bayesian methods

🤗

  • more intuitive (quantification of uncertainty)

  • able to provide evidence for/against hypotheses

  • more flexible

  • can include prior knowledge

  • better for multilevel models (Gelman and Hill, 2006)

  • based on probability theory (Bayes' theorem)

😧

  • requires computing power

  • setting priors requires familiarity with probability distributions

  • (there is an ongoing discussion about parameter estimation vs. hypothesis testing; see e.g. here and here)

Some theory

Key ideas

We will have a brief look at the theoretical background, then dive straight into a practical example.

Parameters are treated as random variables, drawn from probability distributions that reflect our uncertainty about them.

The prior distribution is updated with the likelihood (data) to obtain a posterior distribution.
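
In symbols, with parameters \(\theta\) and data \(y\):

\[ p(\theta \mid y) = \frac{p(y \mid \theta) \, p(\theta)}{p(y)} \]

The denominator \(p(y)\) is the marginal likelihood; it reappears below as the quantity that bridge sampling estimates when computing Bayes factors.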

Bayesian workflow

Bayesian model comparison

A hands-on example

To make this more concrete, we return to the IQ example.

  • Our research hypothesis: the smart drug group has higher IQ scores than the placebo group (the null hypothesis: no difference).

  • We obtained a p-value of 0.055 (Welch test)

  • Even after removing outliers, \(p\) remains above 0.05

Bayes factor to the rescue?

Not in this case. The data provide little evidence either for or against our hypothesis.

  • The maximally attainable Bayes factor in favour of our hypothesis is \(\sim 1.8\) (see the sketch after this list).

  • A further benefit of going Bayesian is the ability to flexibly specify our generative model. We can make the model robust against outliers.
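
A quick way to check the maximal Bayes factor claim is a default Bayes factor t-test from the BayesFactor package (a sketch, not the talk's own analysis; the vectors smart and placebo are assumed as above):

library(BayesFactor)

# JZS Bayes factor for a two-sample design; rscale sets the width of the
# Cauchy prior on the standardized effect size
ttestBF(x = smart, y = placebo, rscale = "medium")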

T-Test as linear model

A t-test is really just a general linear model (assuming equal variances) \[ Y = \alpha + \beta X + \epsilon\] \[ \epsilon \sim N(0, \sigma^2) \]

where \(X\) is an indicator variable.

which can be read as:

\[ IQ = \alpha_{\text{Placebo}} + \beta \cdot \text{SmartDrug} + \epsilon\] \[ \epsilon \sim N(0, \sigma^2) \]

The intercept \(\alpha\) is the expected IQ in the placebo group, and the \(\beta\) parameter represents the difference between groups.
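
In R, this model is an ordinary linear regression (a sketch, assuming a data frame d with columns IQ and Group):

# frequentist fit of the same linear model; the coefficient named
# GroupSmartDrug estimates the group difference beta
fit_lm <- lm(IQ ~ Group, data = d)
summary(fit_lm)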

Generative model

Linear regression model as a probabilistic model

  • The graph on the right shows the data-generating process (dependencies among the random variables).

  • \(\alpha\) is our expectation for the placebo group, \(\beta\) is our expectation for the difference in means, and \(\sigma\) is the standard deviation of the outcome.

Software

Probabilistic programming languages:

  • WinBUGS: the classic BUGS implementation (Gibbs sampling)

  • JAGS: Just Another Gibbs Sampler

  • Stan: probabilistic programming language based on Hamiltonian Monte Carlo (NUTS), with R interfaces such as rstan and brms

  • PyMC3: probabilistic programming in Python

  • Turing: probabilistic programming language in Julia with an intuitive modelling interface

Inference using Stan
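
A minimal version of the model in Stan, run from R via rstan, might look like this (the priors are illustrative assumptions, not necessarily those used in the talk):

library(rstan)

model_code <- "
data {
  int<lower=1> N;
  vector[N] IQ;
  vector[N] SmartDrug;  // indicator: 0 = placebo, 1 = smart drug
}
parameters {
  real alpha;           // expected IQ in the placebo group
  real beta;            // group difference
  real<lower=0> sigma;  // residual standard deviation
}
model {
  alpha ~ normal(100, 10);  // illustrative priors
  beta  ~ normal(0, 2);
  sigma ~ cauchy(0, 5);
  IQ ~ normal(alpha + beta * SmartDrug, sigma);
}
"

fit_stan <- stan(model_code = model_code,
                 data = list(N = nrow(d), IQ = d$IQ,
                             SmartDrug = as.integer(d$Group == "SmartDrug")))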

Inference using brms
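
With brms the same model is written in standard R formula syntax, and the Stan code is generated automatically (a sketch; the prior is an assumption, and sample_prior = TRUE is set because the Savage-Dickey ratio below needs samples from the prior):

library(brms)

fit <- brm(IQ ~ Group, data = d,
           family = gaussian(),
           prior = set_prior("normal(0, 2)", class = "b"),
           sample_prior = TRUE)
summary(fit)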

Estimated parameter

We obtain the marginal posterior distribution of the regression parameter \(\beta\).

Criticize model (posterior predictive check)

We can draw samples from the posterior predictive distribution. The model predicts equal variances, and no outliers.
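
In brms this takes a single call (sketch):

# overlay densities of replicated data sets on the observed data
pp_check(fit)

# compare per-group standard deviations between data and replications
pp_check(fit, type = "stat_grouped", stat = "sd", group = "Group")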

Model revision

The Gaussian distribution is sensitive to outliers. We can instead model our data as being generated by a Student-t distribution, whose heavier tails accommodate extreme values.

Robust regression
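
Only the family argument needs to change (a sketch, using the same assumed data frame):

# Student-t likelihood: heavy tails make the fit robust to outliers;
# brms estimates the degrees of freedom nu from the data
fit_robust <- brm(IQ ~ Group, data = d,
                  family = student(),
                  prior = set_prior("normal(0, 2)", class = "b"),
                  sample_prior = TRUE)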

Predicted marginal means
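
The directional hypothesis test below was produced with brms::hypothesis(); a call along the lines of this sketch (object name assumed) yields it:

# one-sided test of the group coefficient (class b)
hypothesis(fit_robust, "GroupSmartDrug > 0")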

## Hypothesis Tests for class b:
##                 Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio
## 1 (GroupSmartDrug)-... > 0     1.03      0.41     0.34      Inf     101.56
##   Post.Prob Star
## 1      0.99    *
## ---
## '*': The expected value under the hypothesis lies outside the 95%-CI.
## Posterior probabilities of point hypotheses assume equal prior probabilities.

Posterior predictive distribution

Equal variance model

Bayes factors

We can compute Bayes factors using two different methods (both sketched below):

  • Savage-Dickey density ratio

  • Bridge sampling (used to obtain the marginal likelihood)

  • In both cases we compare the model with the grouping variable against the model without it (the restricted model).

  • For a more detailed description of Bayes factors, see here.

  • Bayes factors can be difficult to compute; closed-form solutions exist only for certain restricted models.
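
Both approaches are available in brms (a sketch; object names as assumed above):

# Savage-Dickey: ratio of posterior to prior density at beta = 0
# (this is why the model was fitted with sample_prior = TRUE);
# Evid.Ratio is the BF for the point null, so its inverse is the
# BF in favour of a group difference
hypothesis(fit_robust, "GroupSmartDrug = 0")

# bridge sampling: compare marginal likelihoods of the full model and the
# restricted model without the grouping variable; for this, both models
# must be fitted with save_pars = save_pars(all = TRUE)
fit_null <- update(fit_robust, formula. = ~ . - Group)
bayes_factor(fit_robust, fit_null)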

Bayes factors

Savage-Dickey density ratio:

## [1] 2.935822

Summary

We noted that the normal model does not describe the data well, so we modelled the outcome \(y\) as being drawn from a t distribution instead, which accounts for outliers.

  • Posterior predictive check: the data are well described by this model.

  • Parameter estimation: the parameter representing the group difference is positive, with a 95% credible interval of [0.23, 1.85].

  • Bayes factor in favour of our hypothesis lies between 2.94 and 3.49, depending on the method used.

Is this b-hacking?

  • This is exploratory.

  • We did not remove outliers.

  • We did not test more subjects.

  • Computing a BF requires carefully specifying prior distributions, and these need to be reported.

Message to PhD students

Read Statistical Rethinking and install.packages("brms")

Further reading

References

Cumming, G. (2014). “The New Statistics: Why and How”. In: Psychological Science 25.1, pp. 7-29. ISSN: 0956-7976. DOI: 10.1177/0956797613504966.

Gelman, A. and J. Hill (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press. ISBN: 978-0-511-79094-2. DOI: 10.1017/CBO9780511790942.

Gigerenzer, G. (2018). “Statistical Rituals: The Replication Delusion and How We Got There”. In: Advances in Methods and Practices in Psychological Science 1.2, pp. 198-218. ISSN: 2515-2459. DOI: 10.1177/2515245918771329.

Greenland, S., S. J. Senn, K. J. Rothman, et al. (2016). “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations”. In: European Journal of Epidemiology 31.4, pp. 337-350. ISSN: 1573-7284. DOI: 10.1007/s10654-016-0149-3.

Kruschke, J. K. (2013). “Bayesian Estimation Supersedes the t Test”. In: Journal of Experimental Psychology: General 142.2, pp. 573-603. ISSN: 0096-3445. DOI: 10.1037/a0029146.

Kruschke, J. K. and T. M. Liddell (2018). “The Bayesian New Statistics: Hypothesis Testing, Estimation, Meta-Analysis, and Power Analysis from a Bayesian Perspective”. In: Psychonomic Bulletin & Review 25.1, pp. 178-206. ISSN: 1531-5320. DOI: 10.3758/s13423-016-1221-4.

Lee, M. D. and E. Wagenmakers (2013). Bayesian Cognitive Modeling: A Practical Course. Cambridge: Cambridge University Press. ISBN: 978-1-107-01845-7.

Wasserstein, R. L. and N. A. Lazar (2016). “The ASA’s Statement on p-Values: Context, Process, and Purpose”. In: The American Statistician 70.2, pp. 129-133. ISSN: 0003-1305. DOI: 10.1080/00031305.2016.1154108.