11/05/2021

Lecturer

Name: Ruslan Klymentiev

Areas of Specialization: Statistics, Machine Learning, Neuroscience

Education: MSc in Integrative Neuroscience, OvGU

Current Position: Research Assistant at OvGU

DEVrepublik Courses: Python Programming, Statistics for Data Science



URL: www.defme.xyz
Twitter: @ruslan_kl
GitHub: ruslan-kl

Lecture Objectives

  • What is Frequentist inference
  • What is Bayesian inference
  • What makes them different
  • Not to judge, but to explore!

What is Inference?

“Scientists can draw very different meanings from the same data”

Quiz: What is the probability that the coin has landed HEADS up?

A: The probability that the coin has landed heads up is \(P(\text{Heads Up}) = 50\%\)

B: The coin has either landed heads up or not, \(P(\text{Heads Up}) \in \{0, 100\%\}\)

C: It is hard to say

D: What is Heads?

Are you Bayesian or Frequentist? (1/2)

A Frequentist would say that the true answer is either \(0\) or \(100\%\), \(P(\text{Heads Up}) \in \{0, 100\%\}\).

A Bayesian would assign a probability to the event, \(P(\text{Heads Up}) = 50\%\).

Are you Bayesian or Frequentist? (2/2)

Bayesian vs Frequentist Probabilities

Frequentist definition of probability:

\[P(A) = \lim_{n \to \infty} \frac{n_{A}}{n}\]
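As a quick illustration of this limiting-frequency view, here is a minimal simulation sketch (the seed and sample sizes are arbitrary choices, not part of the lecture):

```python
# The relative frequency n_A / n approaches P(A) as n grows.
import numpy as np

rng = np.random.default_rng(42)
for n in [100, 10_000, 1_000_000]:
    flips = rng.integers(0, 2, size=n)  # 1 = heads, 0 = tails
    print(f"n = {n:>9}: n_A/n = {flips.mean():.4f}")
# The ratio drifts toward P(Heads) = 0.5 as n increases.
```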

Bayesian definition of probability:

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

Problem (Disclaimer: numbers are made up)

Benzodiazepines (also known as tranquilizers) are a class of drugs commonly used to treat anxiety. However, drugs such as Xanax can evoke mild side effects like drowsiness or headaches. In previous studies, side effects were observed at a chance level (around \(50\%\) of the time). You believe that the rate of developing side effects is much lower for people under 30 years old. You have collected data from \(50\) young patients with anxiety disorder who were assigned Xanax, and \(21\) of them showed side effects after taking the drug, \(\hat{p} = 0.42\).

Is this result significantly different from random chance?

Frequentist Approach

Null hypothesis significance testing (NHST)

  • set the null hypothesis (\(H_0\)) to some constant value;
  • build the desired probability distribution assuming that \(H_0\) is true;
  • find the probability of the observed data in the direction of the alternative hypothesis (\(H_A\));
  • the direction can be one-sided (less/greater, \(<\) / \(>\)) or two-sided (not equal, \(\neq\)).

Formulating \(H_0\) and \(H_A\)

  • \(H_0\): the probability of side effects after Xanax for young adults is \(50\%\), \(P(\text{SiEf})=0.5\);
  • \(H_A\): the probability of side effects is less than \(50\%\), \(P(\text{SiEf})<0.5\);
  • Significance level \(\alpha=5\%\)
  • The significance level (or threshold value) is arbitrary, but in most cases is set to \(5\%\). We will reject the null hypothesis if the p-value is less than \(\alpha\).

Quiz: What is a p-value?

A: Probability that null hypothesis is true, given the data, \(P(H_0 \text{ is true} | \text{Data})\)

B: Probability that null hypothesis is false, given the data, \(P(H_0 \text{ is false} | \text{Data})\)

C: Probability of observing the data, given the null hypothesis is true, \(P(\text{Data}|H_0 \text{ is true})\)

D: Probability of observing the data, given the null hypothesis is false, \(P(\text{Data}|H_0 \text{ is false})\)

p-value

In statistical testing, the p-value is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct, \(P(\text{Data}|H_0 \text{ is true})\). A very small p-value means that such an extreme observed outcome would be very unlikely under the null hypothesis.

Quiz: What distribution does \(X\) come from?

\[X \in \{0,1\} \text{ or } \{\text{No Side Effects}, \text{Side Effects}\}\]

A: Bernoulli, \(X \sim Bern (p)\)

B: Binomial, \(X \sim B (n, p)\)

C: Poisson, \(X \sim Pois(\lambda)\)

D: Exponential, \(X \sim Exp(\lambda)\)

Null Distribution

\[X \sim B \left( n = 50, p = \frac{1}{2} \right)\]

p-value

\[\text{p-value} = \sum_{i=0}^{k} P(X=i) = P(X=0) + P(X=1) + \dots + P(X=21)\] \[=\binom{50}{0} \left( \frac{1}{2} \right) ^0 \left( 1 - \frac{1}{2} \right) ^{50-0} + \binom{50}{1}\left( \frac{1}{2} \right) ^1 \left( 1 - \frac{1}{2} \right) ^{50-1}\] \[ + \dots + \binom{50}{21} \left( \frac{1}{2} \right) ^{21} \left( 1 - \frac{1}{2} \right) ^{50-21}\] \[\approx 0.161\]

p-value \(> \alpha\)

The p-value equals \(0.161\), so we fail to reject the null hypothesis: there is not enough evidence to claim that the probability of developing side effects for young patients is less than random chance (\(0.5\)).
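This p-value can be reproduced with a short SciPy sketch (`binomtest` assumes SciPy ≥ 1.7):

```python
# One-sided p-value: P(X <= 21) for X ~ B(n=50, p=0.5).
from scipy.stats import binom, binomtest

print(round(binom.cdf(21, n=50, p=0.5), 3))  # ~0.161

# The same exact test via the built-in helper:
print(binomtest(k=21, n=50, p=0.5, alternative="less").pvalue)
```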

Changing the Hypothesis

  • \(H_0\): the probability of side effects is \(40\%\), \(P(\text{SiEf})=0.4\);
  • \(H_A\): the probability of side effects is greater than \(40\%\), \(P(\text{SiEf})>0.4\);
  • \(\alpha=5\%\)

Null Distribution

\[X \sim B \left( n = 50, p = \frac{2}{5} \right)\]

\[\text{p-value} = \sum_{i=k}^{n} P(X=i) = P(X=21) + P(X=22) + \dots + P(X=50)\] \[=\binom{50}{21} \left( \frac{4}{10} \right) ^{21} \left( 1 - \frac{4}{10} \right) ^{50-21} + \binom{50}{22} \left( \frac{4}{10} \right) ^{22} \left( 1 - \frac{4}{10} \right) ^{50-22}\] \[ + \dots + \binom{50}{50} \left( \frac{4}{10} \right) ^{50} \left( 1 - \frac{4}{10} \right) ^{50-50}\] \[ \approx 0.439\]

\[\text{p-value} > \alpha\]

Failed to reject the null hypothesis!
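The same sanity check for this upper-tail test, sketched with SciPy's survival function:

```python
# Upper-tail p-value: P(X >= 21) for X ~ B(n=50, p=0.4).
from scipy.stats import binom

# sf(20) = P(X > 20) = P(X >= 21)
print(round(binom.sf(20, n=50, p=0.4), 3))  # ~0.439
```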

Sensitivity to Null Hypothesis

We were unable to reject the hypothesis that the probability of side effects is 50%, but at the same time we were unable to reject the hypothesis that the probability is 40%. As we can see, NHST is very sensitive to the null hypothesis you choose: changing the hypotheses (even if the idea behind them stays much the same) can lead to contradictory results.

Normal Approximation to Binomial Distribution

Confidence Intervals (1/2)

For a large number of trials \(n\) in a binomial experiment, we can say that the random variable \(X\) (the sample proportion) approximately follows a normal distribution with mean \(\hat{p}\) and standard error \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\):

\[X \sim \mathcal{N} \left( \mu = \hat{p}, SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right)\]

Once we have a normal distribution, we can easily calculate the CI:

\[(1-\alpha) \cdot 100\% \text{ CI}: \mu \pm Z_{1-\alpha/2} \cdot SE\]

Confidence Intervals (2/2)

  • \(\hat{p} = \frac{k}{n}=0.42\);
  • \(Z_{1-0.05/2} = 1.96\);
  • \(SE = \sqrt{\frac{0.42(1-0.42)}{50}} \approx 0.07\)

\[95\text{% CI}: (0.28, 0.56)\]
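A minimal sketch of this normal-approximation interval (variable names are illustrative):

```python
# 95% CI for a proportion via the normal approximation.
import numpy as np
from scipy.stats import norm

n, k = 50, 21
p_hat = k / n                          # 0.42
se = np.sqrt(p_hat * (1 - p_hat) / n)  # ~0.07
z = norm.ppf(1 - 0.05 / 2)             # ~1.96
print(f"95% CI: ({p_hat - z * se:.2f}, {p_hat + z * se:.2f})")  # (0.28, 0.56)
```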

Quiz: What can we say about the CI?

\[95\text{% CI}: (0.28, 0.56)\]

A: The probability that the true value is in the (0.28, 0.56) range is 95%, \(P \big( P(\text{SiEf}) \in [0.28, 0.56] \big) = 95\%\)

B: The probability that the true value is in the (0.28, 0.56) range is 100%, \(P \big( P(\text{SiEf}) \in [0.28, 0.56] \big) = 100\%\)

C: The probability that the true value is in the (0.28, 0.56) range is 0, \(P \big( P(\text{SiEf}) \in [0.28, 0.56] \big) = 0\)

D: The probability that the true value is in the (0.28, 0.56) range is either 0 or 100%, \(P \big( P(\text{SiEf}) \in [0.28, 0.56] \big) \in \{0, 100\% \}\)

Interpretation of Confidence Intervals

We don’t know whether our particular CI contains the true value (the probability is either 0 or 1).

We can say, however, that if we were to draw more samples of size 50 and calculate a CI for each of them, 95% of those CIs would contain the true value.

Bayesian Approach

Under the Bayesian framework, we can specify two distinct hypotheses and check which one is more likely to be true:

  • \(H_1\): the probability of side effects is 50%, \(P(\text{SiEf})=0.5\);
  • \(H_2\): the probability of side effects is 40%, \(P(\text{SiEf})=0.4\).

We are going to apply Bayes rule to calculate the posterior probability after observing the data.

\[ P(\text{Model}|\text{Data}) = \frac{P(\text{Data|Model}) \cdot P(\text{Model})}{P(\text{Data})}\]

Interpretation of Bayes Rule

  • \(P(\text{Data|Model})\) is the likelihood: the probability of observing the data given that the model (hypothesis) is true.
  • \(P(\text{Model})\) is the prior probability of the model (hypothesis).
  • \(P(\text{Data})\) is the probability of the data. It is also referred to as the normalizing constant, since it ensures that the posterior probabilities sum to one.
  • \(P(\text{Model}|\text{Data})\) is the posterior probability of the hypothesis given the observed data.

Choosing Priors

Assume that we have no prior information about the probability of side effects, so we believe that both hypotheses have equal chances to be true.

\[P(H_1)=P(H_2)=\frac{1}{2}\]

Note that the prior probability mass function has to sum up to 1.

Likelihood

Let \(p_1 = 0.5\) and \(p_2 = 0.4\) denote the hypothesized probabilities of side effects under \(H_1\) and \(H_2\).

  • \(P(k = 21 | H_1 \text{ is true}) = \binom{n}{k} \cdot p_1^k \cdot (1-p_1)^{n-k}\)

    \(= \binom{50}{21} \cdot 0.5^{21} \cdot (1-0.5)^{50-21} \approx 0.0598\)

  • \(P(k = 21 | H_2 \text{ is true}) = \binom{n}{k} \cdot p_2^k \cdot (1-p_2)^{n-k}\)

    \(= \binom{50}{21} \cdot 0.4^{21} \cdot (1-0.4)^{50-21} \approx 0.109\)

Posterior Probabilities

  • \(P(H_1 \text{ is true|}k = 21) = \frac{P(k = 21 | H_1 \text{ is true}) \cdot P(H_1)}{P(\text{k = 21})}\)
  • \(P(H_2 \text{ is true|}k = 21) = \frac{P(k = 21 | H_2 \text{ is true}) \cdot P(H_2)}{P(\text{k = 21})}\)

\[\scriptsize P(k=21)=P(k = 21 | H_1 \text{ is true}) \cdot P(H_1) + P(k = 21 | H_2 \text{ is true}) \cdot P(H_2)\]

\[\scriptsize =0.0598 \cdot 0.5 + 0.109 \cdot 0.5 = 0.084\]

  • \(P(H_1 \text{ is true|}k = 21) = 0.354\)
  • \(P(H_2 \text{ is true|}k = 21) = 1 - P(H_1 \text{ is true|}k = 21) = 0.646\)
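These posteriors can be reproduced with a few lines of SciPy (a minimal sketch under the equal-prior assumption above):

```python
# Posterior probabilities of the two hypotheses via Bayes rule.
from scipy.stats import binom

n, k = 50, 21
prior_h1 = prior_h2 = 0.5
like_h1 = binom.pmf(k, n, 0.5)  # ~0.0598
like_h2 = binom.pmf(k, n, 0.4)  # ~0.109

evidence = like_h1 * prior_h1 + like_h2 * prior_h2  # P(k = 21), ~0.084
print(round(like_h1 * prior_h1 / evidence, 3))      # P(H1 | k = 21), ~0.354
print(round(like_h2 * prior_h2 / evidence, 3))      # P(H2 | k = 21), ~0.646
```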

Visual Illustration

Bayes Factor

The Bayes factor is a ratio of the marginal likelihoods of two competing hypotheses, usually a null and an alternative. It aims to quantify the support for one model over another, regardless of whether these models are correct.

\[\scriptsize \text{BF}(H_2:H_1)= \frac{\text{Likelihood}_2}{\text{Likelihood}_1} = \frac{P(k = 21 | H_2 \text{ is true})}{P(k = 21 | H_1 \text{ is true})} = \] \[\scriptsize \frac{\frac{P(H_2 \text{ is true}|k=21) P(k=21)}{P(H_2\text{ is true)}}}{\frac{P(H_1 \text{ is true}|k=21) P(k=21)}{P(H_1\text{ is true)}}} = \frac{\frac{P(H_2 \text{ is true}|k=21)}{P(H_1\text{ is true}|k=21)}}{\frac{P(H_2)}{P(H_1)}} = \frac{\text{Posterior Odds}}{\text{Prior Odds}}\]

\[\text{BF}(H_2:H_1)= \frac{\frac{0.646}{0.354}}{\frac{0.5}{0.5}} \approx 1.82\]
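Since the priors are equal, the Bayes factor reduces to the plain likelihood ratio, which makes for a two-line check:

```python
# Bayes factor BF(H2 : H1) as a likelihood ratio.
from scipy.stats import binom

print(round(binom.pmf(21, 50, 0.4) / binom.pmf(21, 50, 0.5), 2))  # ~1.82
```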

Bayes Factor Interpretation

To interpret the value we can refer to Harold Jeffreys’ interpretation table:

  • \(\text{BF} < 1\): negative (supports \(H_1\))
  • \(1\) to \(3.2\): barely worth mentioning
  • \(3.2\) to \(10\): substantial
  • \(10\) to \(31.6\): strong
  • \(31.6\) to \(100\): very strong
  • \(> 100\): decisive

Since \(\text{BF}(H_2:H_1) \approx 1.82\) falls in the “barely worth mentioning” range, there is not enough supporting evidence for \(H_2\) (that the probability of side effects is 40%).

Beta-Binomial Distribution (1/3)

We have specified two distinct hypotheses \(H_1\) and \(H_2\). But we could also define a whole prior probability distribution for the unknown parameter \(P(\text{SiEf})\):

  • start with \(P(\text{SiEf}) \sim \text{unif}(0,1)\);
  • replace the Uniform distribution with the Beta distribution \(\text{Beta}(\alpha,\beta)\) with parameters \(\alpha=1\), \(\beta=1\): \(P(\text{SiEf}) \sim \text{Beta}(1,1)\);
  • this makes calculations (and life) easier, since the Beta and Binomial distributions form a conjugate family;
  • the likelihood still follows a Binomial distribution.

Beta-Binomial Distribution (2/3)

Now, in order to find the posterior distribution we can just update the values of the Beta distribution:

  • \(\alpha^* = \alpha + k\)
  • \(\beta^* = \beta + n - k\)
  • \(n\) - total number of observations
  • \(k\) - number of “successes” (patients with side effects)

Note: You can find the calculations in references.

The expected value of the Beta distribution is:

\[E[X] = \frac{\alpha}{\alpha+\beta}\]

Beta-Binomial Distribution (3/3)

To summarize:

  • Prior: \(\scriptsize P(\text{SiEf}) \sim \text{Beta}(\alpha=1,\beta=1)\)

  • Likelihood: \(\scriptsize P \big( k = 21 \mid P(\text{SiEf}) \big) = \binom{n}{k} \cdot P(\text{SiEf})^k \cdot \big( 1-P(\text{SiEf}) \big) ^{n-k}\), i.e. \(\scriptsize k \sim B \big( n, P(\text{SiEf}) \big)\)

  • Posterior: \(\scriptsize P(\text{SiEf}) \sim \text{Beta}(\alpha^* = \alpha + k,\ \beta^* = \beta + n - k)\)

Results

  • Prior mean: \(0.5\)
  • Posterior mean: \(0.42\)
  • 95% credible interval: \((0.29, 0.56)\)
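These results follow directly from the conjugate update; a minimal sketch (`beta.interval` returns the central credible interval):

```python
# Conjugate Beta-Binomial update and the 95% credible interval.
from scipy.stats import beta

a, b = 1, 1                        # flat Beta(1, 1) prior
n, k = 50, 21
a_post, b_post = a + k, b + n - k  # posterior: Beta(22, 30)

print(round(a_post / (a_post + b_post), 2))  # posterior mean, ~0.42
print(beta.interval(0.95, a_post, b_post))   # ~(0.29, 0.56)
```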

Quiz: What can we say about the CrI?

\[95\text{% CrI}: (0.29, 0.56)\]

A: The probability that the true value is in the (0.29, 0.56) range is 95%, \(P \big( P(\text{SiEf}) \in [0.29, 0.56] \big) = 95\%\)

B: The probability that the true value is in the (0.29, 0.56) range is 100%, \(P \big( P(\text{SiEf}) \in [0.29, 0.56] \big) = 100\%\)

C: The probability that the true value is in the (0.29, 0.56) range is 0, \(P \big( P(\text{SiEf}) \in [0.29, 0.56] \big) = 0\)

D: The probability that the true value is in the (0.29, 0.56) range is either 0 or 100%, \(P \big( P(\text{SiEf}) \in [0.29, 0.56] \big) \in \{0, 100\% \}\)

Interpretation of Credible Intervals

A credible interval is an interval within which an unobserved parameter value falls with a particular probability.

\[P \big( P(\text{SiEf}) \in [0.29, 0.56] \big) = 95\%\]

Changing the Priors

Let’s say that we believe the probability of side effects is definitely higher than random chance (0.5), with an expected value of 0.75. We can use the parameters \(\alpha=15\), \(\beta=5\) for this. Prior: \(\scriptsize P(\text{SiEf}) \sim \text{Beta}(\alpha=15,\beta=5)\)

  • Prior mean: \(0.75\)
  • Posterior mean: \(0.51\)
  • 95% credible interval: \((0.4, 0.63)\)

More Data

Now imagine that we have 5 times more observations with the same ratio of 0.42: \(n=250\), \(k=105\).

  • Prior mean: \(0.75\)
  • Posterior mean: \(0.44\)
  • 95% credible interval: \((0.39, 0.5)\)
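The same update, sketched as a small helper function (the function name is illustrative), shows how the larger sample pulls the posterior away from the informative prior and back toward the observed ratio:

```python
# Posterior mean and 95% CrI for a Beta(a, b) prior and k successes in n trials.
from scipy.stats import beta

def beta_posterior(a, b, n, k):
    a_post, b_post = a + k, b + n - k
    return a_post / (a_post + b_post), beta.interval(0.95, a_post, b_post)

print(beta_posterior(15, 5, 50, 21))    # mean ~0.51, CrI ~(0.40, 0.63)
print(beta_posterior(15, 5, 250, 105))  # mean ~0.44, CrI ~(0.39, 0.50)
```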

Summary

Frequentist Approach | Bayesian Approach
Establishes the probability of the data given a model | Establishes the probability of a model given the data
Doesn’t rely on prior information about the unknown | Relies on prior information about the unknown (but prior beliefs become less significant as the sample size increases)
Sensitive to the choice of null hypothesis | Not sensitive to the choice of hypotheses
Estimates the degree of uncertainty using confidence intervals | Estimates the degree of uncertainty using credible intervals
Cannot assign a probability to the true value lying in a CI (it is either 0 or 1) | Can assign a probability to the true value lying in a CrI

Additional Links

Presentation