The problem: stating evidence for an invariance
Where null hypothesis significance testing fails us
Where Bayes factor could save us
Examples: coin flips, t-test, ANOVA
Some criticisms
August 18, 2015
The problem: stating evidence for an invariance
Where null hypothesis significance testing fails us
Where Bayes factor could save us
Examples: coin flips, t-test, ANOVA
Some criticisms
According to your theory, the effect of Factor A is expected to be invariant with respect to Factor B
The effect of Factor A should not be different across levels of Factor B
How can we provide statistical evidence for this invariance?
Fit a statistical model (i.e., ANOVA) to data where Factor A and Factor B were manipulated
Is the Factor A X Factor B interaction significant?
Calculate p value – what proportion of values will be as or more extreme as what we observed, under the null hypothesis?
Reject or fail to reject null hypothesis based on p value (i.e., reject if p < .05)
Fail to reject null hypothesis of no difference
Can't "accept" the null (even if we want to!)
State that the evidence for an effect is inconclusive
If you make a big deal of a null effect, chances are a reviewer will suggest your study was underpowered to find an interaction
Directly compare the likelihood of the data under null and non-null models by computing Bayes factor
To what extent should we revise our beliefs regarding null and alternative models after observing data?
Frequentist (Classical)
Probability (i.e., a p value) is the proportion of expected outcomes under repeated tests
Bayesian
Probability quantifies our subjective belief about the likelihood of an outcome, and is rationally updated given new data
Bayes rule in simple form:
\[ Pr ( H | D ) = \frac{ Pr( D | H ) Pr ( H ) }{ Pr ( D ) } \]
\(Pr(D|H)\) is the likelihood of the data under a hypothesis
\(Pr(H)\) is the prior probability of the hypothesis
\(Pr(D)\) is the probability of the data
Bayes factor compares the marginal likelihood of the data under competing models
\[ \frac{ Pr( M_0 | D )}{ Pr( M_1 | D )} = \frac{ Pr( D | M_0) }{ Pr (D | M_1 ) } \times \frac{ Pr(M_0) } { Pr ( M_1 )} \]
Are the data more probable under the null model than a specified alternative?
\(-2 * ln Pr(D|M_1)/Pr(D|M_2)\)
However, there is no direct mechanism that can favor a null (less complex) model, as the log-likelihood of the more complex of the nested models will always be as least as high as the reduced model
Bayes factor offers a clear advantage by eliminating this limitation
We want to determine the likelihood of flipping heads using a coin. Prior to flipping the coin 10 times, we may believe that all outcomes are equally likely.
\(Y | \theta \sim \text{Binomial}(\theta, N)\)
\(\theta \sim \text{Uniform} (0,1)\)
\(\theta \sim \text{Uniform} (0,1)\)
Another person may believe the coin to be fairly weighted, and is likely to land heads and tails evenly.
\(Y \sim \text{Binomial}(.5, N)\)
Let's say we observe 6 heads out of 10 flips.
\(Y = 6\)
\(Pr(D|M_1) = .09\)
\(Pr(D|M_2) = .21\)
\(BF_{12} = .44\)
\(BF_{21} = 2.26\)
The data are more than twice as likely under the 'fair' model as under the uniform model.
Kass & Raftery (1995) recommendations
BF | Strength of evidence |
---|---|
< 1 | Favors other model |
1 - 3 | Not worth a bare mention |
3 - 20 | Substantial |
20 - 150 | Strong |
> 150 | Very strong |
Bayes factor informs us how to revise our prior beliefs
People often hold different prior opinions regarding the truth of hypotheses (i.e., ESP, global warming)
However, we can agree on the impact that data have on our resulting beliefs
Subjectivity in choosing priors is a concern, as priors have a direct influence on Bayes factor (and posterior beliefs)
We often don't have point estimates for effects in to-be-conducted experiments
Complexity of this problem has prevented many from adopting Bayesian methods
Rouder et al. (2009, 2012) have developed "default" priors for use in a wide range of applications in a convenient R package
Priors are placed on effect size (unaffected by measurement scale) and variance
Warning: not to be used blindly
A t-test can indicate whether two independently sampled means are significantly different from one another. Under the null, the two means are assumed to be equal.
\(H_0\): \(\bar{Y_1} - \bar{Y_2} = 0\)
\(H_A\): \(\bar{Y_1} - \bar{Y_2} \neq 0\)
\[ t = \frac{ \bar{Y_1} - \bar{Y_2} } { s_p \sqrt { 1/N_1 + 1/N_2 } } \]
Compare observed \(t\) against \(t\) with \(df = N_1 + N_2 - 2\)
Assume no effect
\[ M_0: \delta = 0 \]
Null model (\(M_0\))
Jeffrey's (noninformative) prior on variance \(Pr(\sigma^2) = 1 / \sigma^2\)
Alternative model (\(M_1\))
Prior on effect size where:
\[ \delta = \frac{ \mu } { \sigma^2 } \]
\[ M_1: \delta \text{ ~ Cauchy} \]
When weaving, does yarn "A" break more than yarn "B"?
plot(breaks ~ factor(wool), warpbreaks)
Does yarn "A" break more than yarn "B"?
t.test(breaks ~ wool, data = warpbreaks, var.eq=TRUE)
## ## Two Sample t-test ## ## data: breaks by wool ## t = 1.6335, df = 52, p-value = 0.1084 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## -1.319679 12.875235 ## sample estimates: ## mean in group A mean in group B ## 31.03704 25.25926
ttestBF(formula = breaks ~ wool, data = warpbreaks)
## Bayes factor analysis ## -------------- ## [1] Alt., r=0.707 : 0.8184672 ±0.01% ## ## Against denominator: ## Null, mu1-mu2 = 0 ## --- ## Bayes factor type: BFindepSample, JZS
Plot code from E-J Wagenmakers
Plot code from E-J Wagenmakers
Rouder et al. (2012) p. 360
Bayesian implementation of the hierarchical ANOVA model (Gelman, 2005)
\[ Y_{ijk} = \mu + \sigma[A_i + B_j + (AB)ij] + \epsilon_{ijk} \]
Cauchy priors are placed on effect size for each term (A, B, AB)
Error terms are assumed to be independent for each effect
Fixed effect \(\alpha\)
\(\alpha* | g \sim \text{Normal}(0, g_{\alpha-1},gI_{\alpha-1})\)
\(g \sim \text{Inverse-}\chi^2(1)\)
Resulting effect-size distributions are multivariate Cauchy
Separate \(g\)-prior for each factor
Effect size used is \(R^2\)
Null model includes only random effects (i.e., subjects or items)
Each of the nested linear models in the ANOVA design is compared to the null model
Nested models can be directly compared by taking the ratio of the Bayes factor
E-Z Reader 10 (Reichle, Warren, & McConnell, 2009) asserts that word identification and integration occur in discrete serial stages
Word identification time is a function of frequency and cloze probability
Integration of word n can affect eye movements if it lags behind word n+1 identification or it fails outright (with probability P(F))
The man noticed the journal was missing from his desk. (HF, plausible)
The man noticed the stapler was missing from his desk. (LF, plausible)
The man angered the journal by placing it in the drawer. (HF, implausible)
The man angered the stapler by placing it in the drawer. (LF, implausible)
Frequency, but not plausibility, will affect word skipping
Plausibility, but not frequency, will affect regression rates
Frequency and plausibility will have additive (critically, not interactive) effects across all temporal measures (i.e., gaze duration)
Variable | F(1,111) | p |
---|---|---|
Frequency | 35.14 | < .001 |
Plausibility | 7.52 | < .01 |
Freq X Plaus | .33 | .57 |
ffd.aov.bf <- anovaBF(ffdR3 ~ Freq*Plaus + Subj, data=fpl.tarffd.byss, whichRandom="Subj", rscaleFixed=.5, iterations=100000) ffd.aov.bf
## Bayes factor analysis ## -------------- ## [1] Freq + Subj : 237815.4 ±3.3% ## [2] Plaus + Subj : 4.578714 ±0.48% ## [3] Freq + Plaus + Subj : 1473768 ±0.62% ## [4] Freq + Plaus + Freq:Plaus + Subj : 244388.3 ±0.86% ## ## Against denominator: ## ffdR3 ~ Subj ## --- ## Bayes factor type: BFlinearModel, JZS
plot(ffd.aov.bf)
What is the evidence for the additive effects model over the interactive effects model?
ffd.aov.bf[3] / ffd.aov.bf[4]
## Bayes factor analysis ## -------------- ## [1] Freq + Plaus + Subj : 6.030438 ±1.06% ## ## Against denominator: ## ffdR3 ~ Freq + Plaus + Freq:Plaus + Subj ## --- ## Bayes factor type: BFlinearModel, JZS
We can also sample from the posterior distribution of the parameters
ffd.chain <- data.frame(posterior(ffd.aov.bf[4], iterations=10000, columnFilter="^Subj$")) head(ffd.chain[,1:5])
## mu Freq.HF Freq.LF Plaus.IM Plaus.PL ## 1 245.8085 -7.556494 7.556494 3.956670 -3.956670 ## 2 243.9439 -8.633186 8.633186 5.928889 -5.928889 ## 3 242.8874 -9.918842 9.918842 5.867772 -5.867772 ## 4 250.2694 -5.541311 5.541311 4.786933 -4.786933 ## 5 243.7995 -9.204271 9.204271 4.652317 -4.652317 ## 6 249.9436 -7.153691 7.153691 2.312630 -2.312630
We can compute posterior odds using the BF object. For example, we can consider all models to be equally likely before observing the data (we have done this implicitly already).
prior.odds = newPriorOdds(ffd.aov.bf, type = "equal") prior.odds
## Prior odds ## -------------- ## [1] Freq + Subj : 1 ## [2] Plaus + Subj : 1 ## [3] Freq + Plaus + Subj : 1 ## [4] Freq + Plaus + Freq:Plaus + Subj : 1 ## ## Against denominator: ## ffdR3 ~ Subj ## --- ## Model type: BFlinearModel, JZS
Now compute the posterior odds
post.odds = prior.odds * ffd.aov.bf post.odds
## Posterior odds ## -------------- ## [1] Freq + Subj : 237815.4 ±3.3% ## [2] Plaus + Subj : 4.578714 ±0.48% ## [3] Freq + Plaus + Subj : 1473768 ±0.62% ## [4] Freq + Plaus + Freq:Plaus + Subj : 244388.3 ±0.86% ## ## Against denominator: ## ffdR3 ~ Subj ## --- ## Model type: BFlinearModel, JZS
And we can convert these to probabilities
post.prob = as.BFprobability(post.odds) post.prob
## Posterior probabilities ## -------------- ## [1] Freq + Subj : 0.1215839 ±NA% ## [2] Plaus + Subj : 2.340883e-06 ±NA% ## [3] Freq + Plaus + Subj : 0.7534689 ±NA% ## [4] Freq + Plaus + Freq:Plaus + Subj : 0.1249443 ±NA% ## [5] Subj : 5.112533e-07 ±NA% ## ## Normalized probability: 1 ## --- ## Model type: BFlinearModel, JZS
Mixed effects models are also possible - just pass additional factors to "whichRandom". NAs must be omitted.
fpl.nona <- subset(fpl, !is.na(fpl$ffdR3)) ffd.mixed.bf <- anovaBF(ffdR3 ~ Freq*Plaus + subj + item, data=fpl.nona, whichRandom=c("subj", "item"), rscaleFixed=.5) ffd.mixed.bf
## Bayes factor analysis ## -------------- ## [1] Freq + item + subj : 4641683 ±1.03% ## [2] Plaus + item + subj : 7.145093 ±1.92% ## [3] Freq + Plaus + item + subj : 32615663 ±2.11% ## [4] Freq + Plaus + Freq:Plaus + item + subj : 1723331 ±2.03% ## ## Against denominator: ## ffdR3 ~ item + subj ## --- ## Bayes factor type: BFlinearModel, JZS
plot(ffd.mixed.bf)
Is Bayes factor prejudiced against small effects? Uri Simonsohn
Krushke: Estimation and precision are the goal (i.e., posterior predictive checks)
Model comparison using Bayes factors is an attractive alternative to NHST, in particular when null effects are of interest
The information gained from Bayesian analysis is rich
Recent developments by Rouder, Morey, and colleagues make Bayesian analysis digestable (and available to the masses using JASP)
What are we waiting for?