The problem: stating evidence for an invariance

Where null hypothesis significance testing fails us

Where Bayes factor could save us

Examples: coin flips, t-test, ANOVA

Some criticisms

August 18, 2015

The problem: stating evidence for an invariance

Where null hypothesis significance testing fails us

Where Bayes factor could save us

Examples: coin flips, t-test, ANOVA

Some criticisms

According to your theory, the effect of Factor A is expected to be **invariant** with respect to Factor B

The effect of Factor A should not be different across levels of Factor B

How can we provide statistical evidence for this invariance?

Fit a statistical model (i.e., ANOVA) to data where Factor A and Factor B were manipulated

Is the Factor A X Factor B interaction significant?

Calculate

*p*value – what proportion of values will be as or more extreme as what we observed, under the null hypothesis?Reject or fail to reject null hypothesis based on

*p*value (i.e., reject if*p*< .05)

Fail to reject null hypothesis of no difference

Can't "accept" the null (even if we want to!)

State that the evidence for an effect is inconclusive

If you make a big deal of a null effect, chances are a reviewer will suggest your study was underpowered to find an interaction

**The absence of an effect is not itself evidence of absence**

Directly compare the likelihood of the data under null and non-null models by computing **Bayes factor**

To what extent should we revise our beliefs regarding null and alternative models after observing data?

**Frequentist (Classical)**

Probability (i.e., a *p* value) is the proportion of expected outcomes under repeated tests

**Bayesian**

Probability quantifies our subjective belief about the likelihood of an outcome, and is rationally updated given new data

Bayes rule in simple form:

\[ Pr ( H | D ) = \frac{ Pr( D | H ) Pr ( H ) }{ Pr ( D ) } \]

\(Pr(D|H)\) is the likelihood of the data under a hypothesis

\(Pr(H)\) is the prior probability of the hypothesis

\(Pr(D)\) is the probability of the data

Bayes factor compares the marginal likelihood of the data under competing models

\[ \frac{ Pr( M_0 | D )}{ Pr( M_1 | D )} = \frac{ Pr( D | M_0) }{ Pr (D | M_1 ) } \times \frac{ Pr(M_0) } { Pr ( M_1 )} \]

Are the data more probable under the null model than a specified alternative?

- When \(Pr(D|M)\) is computed by maximizing the likelihood, we can compare two
**nested**models by comparing the likelihood ratio

\(-2 * ln Pr(D|M_1)/Pr(D|M_2)\)

However, there is no direct mechanism that can favor a null (less complex) model, as the log-likelihood of the more complex of the nested models will always be as least as high as the reduced model

Bayes factor offers a clear advantage by eliminating this limitation

We want to determine the likelihood of flipping heads using a coin. Prior to flipping the coin 10 times, we may believe that all outcomes are equally likely.

\(Y | \theta \sim \text{Binomial}(\theta, N)\)

\(\theta \sim \text{Uniform} (0,1)\)

\(\theta \sim \text{Uniform} (0,1)\)

Another person may believe the coin to be fairly weighted, and is likely to land heads and tails evenly.

\(Y \sim \text{Binomial}(.5, N)\)

Let's say we observe 6 heads out of 10 flips.

\(Y = 6\)

\(Pr(D|M_1) = .09\)

\(Pr(D|M_2) = .21\)

\(BF_{12} = .44\)

\(BF_{21} = 2.26\)

The data are more than twice as likely under the 'fair' model as under the uniform model.

Kass & Raftery (1995) recommendations

BF | Strength of evidence |
---|---|

< 1 | Favors other model |

1 - 3 | Not worth a bare mention |

3 - 20 | Substantial |

20 - 150 | Strong |

> 150 | Very strong |

Bayes factor informs us how to revise our prior beliefs

People often hold different prior opinions regarding the truth of hypotheses (i.e., ESP, global warming)

However, we can agree on the impact that data have on our resulting beliefs

Subjectivity in choosing priors is a concern, as priors have a direct influence on Bayes factor (and posterior beliefs)

We often don't have point estimates for effects in to-be-conducted experiments

Complexity of this problem has prevented many from adopting Bayesian methods

Rouder et al. (2009, 2012) have developed "default" priors for use in a wide range of applications in a convenient R package

Priors are placed on effect size (unaffected by measurement scale) and variance

Warning: not to be used blindly

A t-test can indicate whether two independently sampled means are significantly different from one another. Under the null, the two means are assumed to be equal.

\(H_0\): \(\bar{Y_1} - \bar{Y_2} = 0\)

\(H_A\): \(\bar{Y_1} - \bar{Y_2} \neq 0\)

\[ t = \frac{ \bar{Y_1} - \bar{Y_2} } { s_p \sqrt { 1/N_1 + 1/N_2 } } \]

Compare observed \(t\) against \(t\) with \(df = N_1 + N_2 - 2\)

Assume no effect

\[ M_0: \delta = 0 \]

**Null model (\(M_0\))**

Jeffrey's (noninformative) prior on variance \(Pr(\sigma^2) = 1 / \sigma^2\)

**Alternative model (\(M_1\))**

Prior on effect size where:

\[ \delta = \frac{ \mu } { \sigma^2 } \]

\[ M_1: \delta \text{ ~ Cauchy} \]

When weaving, does yarn "A" break more than yarn "B"?

plot(breaks ~ factor(wool), warpbreaks)

Does yarn "A" break more than yarn "B"?

t.test(breaks ~ wool, data = warpbreaks, var.eq=TRUE)

## ## Two Sample t-test ## ## data: breaks by wool ## t = 1.6335, df = 52, p-value = 0.1084 ## alternative hypothesis: true difference in means is not equal to 0 ## 95 percent confidence interval: ## -1.319679 12.875235 ## sample estimates: ## mean in group A mean in group B ## 31.03704 25.25926

ttestBF(formula = breaks ~ wool, data = warpbreaks)

## Bayes factor analysis ## -------------- ## [1] Alt., r=0.707 : 0.8184672 ±0.01% ## ## Against denominator: ## Null, mu1-mu2 = 0 ## --- ## Bayes factor type: BFindepSample, JZS

Plot code from E-J Wagenmakers

Rouder et al. (2012) p. 360

Bayesian implementation of the hierarchical ANOVA model (Gelman, 2005)

\[ Y_{ijk} = \mu + \sigma[A_i + B_j + (AB)ij] + \epsilon_{ijk} \]

Cauchy priors are placed on effect size for each term (A, B, AB)

Error terms are assumed to be independent for each effect

Fixed effect \(\alpha\)

\(\alpha* | g \sim \text{Normal}(0, g_{\alpha-1},gI_{\alpha-1})\)

\(g \sim \text{Inverse-}\chi^2(1)\)

Resulting effect-size distributions are multivariate Cauchy

Separate \(g\)-prior for each factor

Effect size used is \(R^2\)

Null model includes only random effects (i.e., subjects or items)

Each of the nested linear models in the ANOVA design is compared to the null model

Nested models can be directly compared by taking the ratio of the Bayes factor

E-Z Reader 10 (Reichle, Warren, & McConnell, 2009) asserts that word identification and integration occur in discrete serial stages

Word identification time is a function of frequency and cloze probability

Integration of word n can affect eye movements if it lags behind word n+1 identification or it fails outright (with probability

*P(F)*)

The man noticed the

*journal*was missing from his desk. (HF, plausible)The man noticed the

*stapler*was missing from his desk. (LF, plausible)The man angered the

*journal*by placing it in the drawer. (HF, implausible)The man angered the

*stapler*by placing it in the drawer. (LF, implausible)

Frequency, but not plausibility, will affect word skipping

Plausibility, but not frequency, will affect regression rates

Frequency and plausibility will have additive (critically, not interactive) effects across all temporal measures (i.e., gaze duration)

Variable | F(1,111) |
p |
---|---|---|

Frequency | 35.14 | < .001 |

Plausibility | 7.52 | < .01 |

Freq X Plaus | .33 | .57 |

ffd.aov.bf <- anovaBF(ffdR3 ~ Freq*Plaus + Subj, data=fpl.tarffd.byss, whichRandom="Subj", rscaleFixed=.5, iterations=100000) ffd.aov.bf

## Bayes factor analysis ## -------------- ## [1] Freq + Subj : 237815.4 ±3.3% ## [2] Plaus + Subj : 4.578714 ±0.48% ## [3] Freq + Plaus + Subj : 1473768 ±0.62% ## [4] Freq + Plaus + Freq:Plaus + Subj : 244388.3 ±0.86% ## ## Against denominator: ## ffdR3 ~ Subj ## --- ## Bayes factor type: BFlinearModel, JZS

plot(ffd.aov.bf)

What is the evidence for the additive effects model over the interactive effects model?

ffd.aov.bf[3] / ffd.aov.bf[4]

## Bayes factor analysis ## -------------- ## [1] Freq + Plaus + Subj : 6.030438 ±1.06% ## ## Against denominator: ## ffdR3 ~ Freq + Plaus + Freq:Plaus + Subj ## --- ## Bayes factor type: BFlinearModel, JZS

We can also sample from the posterior distribution of the parameters

ffd.chain <- data.frame(posterior(ffd.aov.bf[4], iterations=10000, columnFilter="^Subj$")) head(ffd.chain[,1:5])

## mu Freq.HF Freq.LF Plaus.IM Plaus.PL ## 1 245.8085 -7.556494 7.556494 3.956670 -3.956670 ## 2 243.9439 -8.633186 8.633186 5.928889 -5.928889 ## 3 242.8874 -9.918842 9.918842 5.867772 -5.867772 ## 4 250.2694 -5.541311 5.541311 4.786933 -4.786933 ## 5 243.7995 -9.204271 9.204271 4.652317 -4.652317 ## 6 249.9436 -7.153691 7.153691 2.312630 -2.312630

We can compute posterior odds using the BF object. For example, we can consider all models to be equally likely before observing the data (we have done this implicitly already).

prior.odds = newPriorOdds(ffd.aov.bf, type = "equal") prior.odds

## Prior odds ## -------------- ## [1] Freq + Subj : 1 ## [2] Plaus + Subj : 1 ## [3] Freq + Plaus + Subj : 1 ## [4] Freq + Plaus + Freq:Plaus + Subj : 1 ## ## Against denominator: ## ffdR3 ~ Subj ## --- ## Model type: BFlinearModel, JZS

Now compute the posterior odds

post.odds = prior.odds * ffd.aov.bf post.odds

## Posterior odds ## -------------- ## [1] Freq + Subj : 237815.4 ±3.3% ## [2] Plaus + Subj : 4.578714 ±0.48% ## [3] Freq + Plaus + Subj : 1473768 ±0.62% ## [4] Freq + Plaus + Freq:Plaus + Subj : 244388.3 ±0.86% ## ## Against denominator: ## ffdR3 ~ Subj ## --- ## Model type: BFlinearModel, JZS

And we can convert these to probabilities

post.prob = as.BFprobability(post.odds) post.prob

## Posterior probabilities ## -------------- ## [1] Freq + Subj : 0.1215839 ±NA% ## [2] Plaus + Subj : 2.340883e-06 ±NA% ## [3] Freq + Plaus + Subj : 0.7534689 ±NA% ## [4] Freq + Plaus + Freq:Plaus + Subj : 0.1249443 ±NA% ## [5] Subj : 5.112533e-07 ±NA% ## ## Normalized probability: 1 ## --- ## Model type: BFlinearModel, JZS

Mixed effects models are also possible - just pass additional factors to "whichRandom". NAs must be omitted.

fpl.nona <- subset(fpl, !is.na(fpl$ffdR3)) ffd.mixed.bf <- anovaBF(ffdR3 ~ Freq*Plaus + subj + item, data=fpl.nona, whichRandom=c("subj", "item"), rscaleFixed=.5) ffd.mixed.bf

## Bayes factor analysis ## -------------- ## [1] Freq + item + subj : 4641683 ±1.03% ## [2] Plaus + item + subj : 7.145093 ±1.92% ## [3] Freq + Plaus + item + subj : 32615663 ±2.11% ## [4] Freq + Plaus + Freq:Plaus + item + subj : 1723331 ±2.03% ## ## Against denominator: ## ffdR3 ~ item + subj ## --- ## Bayes factor type: BFlinearModel, JZS

plot(ffd.mixed.bf)

Is Bayes factor prejudiced against small effects? Uri Simonsohn

Krushke: Estimation and precision are the goal (i.e., posterior predictive checks)