In traditional null hypothesis significance testing (NHST), a p-value tells you how surprising your data would be if the null hypothesis were true. But it doesn’t tell you how much evidence you have for one hypothesis relative to another. That is where Bayes Factors come in.
A Bayes Factor is a ratio that expresses how much more likely the observed data are under one hypothesis than under another.
There are two versions, and which one you see depends on which hypothesis is in the numerator:
\[ BF_{10} = \frac{\text{Likelihood of data under } H_1}{\text{Likelihood of data under } H_0} \]
\[ BF_{01} = \frac{\text{Likelihood of data under } H_0}{\text{Likelihood of data under } H_1} \]
(Note: \(H_0\) is the null hypothesis and \(H_1\) is the alternative hypothesis… as if there were only one…)
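To make the ratio concrete, here is a minimal Python sketch for a hypothetical coin-flip study. Everything specific in it is an assumption for illustration, not something from the text above: the data (15 heads in 20 flips), \(H_0\) as a fair coin (\(\theta = 0.5\)), and \(H_1\) as a uniform prior over every possible value of \(\theta\). Under that uniform prior, the likelihood averaged over \(\theta\) has the closed form \(1/(n+1)\).

```python
from math import comb


def binomial_likelihood(k, n, theta):
    """Probability of k heads in n flips when P(heads) = theta."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def marginal_likelihood_uniform(k, n):
    """Likelihood of k heads in n flips, averaged over a uniform prior on theta.

    For a uniform prior this average has the closed form 1 / (n + 1),
    whatever the value of k.
    """
    return 1 / (n + 1)


# Hypothetical data (an assumption for this example): 15 heads in 20 flips.
k, n = 15, 20

like_h0 = binomial_likelihood(k, n, theta=0.5)  # H0: the coin is fair
like_h1 = marginal_likelihood_uniform(k, n)     # H1: theta unknown, uniform prior

bf10 = like_h1 / like_h0  # H1 in the numerator
bf01 = like_h0 / like_h1  # H0 in the numerator

print(f"BF10 = {bf10:.2f}")
print(f"BF01 = {bf01:.2f}")
```

With these made-up numbers, \(BF_{10}\) comes out at roughly 3.2: the data are about three times more likely under this particular \(H_1\) than under \(H_0\). A different prior for \(H_1\) would give a different Bayes Factor, which is part of why "the" alternative hypothesis is a simplification.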
The subscript tells you the story: \(BF_{10}\) puts the alternative hypothesis (\(H_1\)) on top, so values greater than 1 favor \(H_1\) and values less than 1 favor \(H_0\). \(BF_{01}\) works the same way but in reverse: it puts the null hypothesis (\(H_0\)) on top, so values greater than 1 favor \(H_0\) and values less than 1 favor \(H_1\).
Different software packages report different versions. JASP, for example, reports both. Always check the subscript before interpreting.
Because \(BF_{10}\) and \(BF_{01}\) are just the same ratio flipped upside down, they are reciprocals (multiplicative inverses) of each other:
\[ BF_{01} = \frac{1}{BF_{10}} \qquad \text{and} \qquad BF_{10} = \frac{1}{BF_{01}} \]
This means if you are given one, you can always compute the other.
If \(BF_{10} = 5\):
\[ BF_{01} = \frac{1}{5} = 0.20 \]
If \(BF_{01} = 4\):
\[ BF_{10} = \frac{1}{4} = 0.25 \]
If \(BF_{10} = 1\):
\[ BF_{01} = \frac{1}{1} = 1 \]
When \(BF_{10} = BF_{01} = 1\), there is no evidence favoring either hypothesis.
This inverse relationship means a large \(BF_{10}\) and a small \(BF_{01}\) are saying exactly the same thing — strong evidence for \(H_1\). Students sometimes mistakenly treat a small \(BF_{01}\) as weak evidence, when in fact a \(BF_{01}\) of 0.05 is equivalent to a \(BF_{10}\) of 20: strong evidence for the alternative.
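The conversion is a single division. As a minimal sketch (the function name `flip_bayes_factor` is made up for this example), the worked values above can be checked like this:

```python
def flip_bayes_factor(bf):
    """Convert BF10 to BF01, or BF01 to BF10, by taking the reciprocal."""
    if bf <= 0:
        raise ValueError("A Bayes Factor must be positive.")
    return 1 / bf


print(flip_bayes_factor(5))               # BF10 = 5    -> BF01 = 0.2
print(flip_bayes_factor(4))               # BF01 = 4    -> BF10 = 0.25
print(round(flip_bayes_factor(0.05), 2))  # BF01 = 0.05 -> BF10 = 20.0
```

The same function works in both directions because flipping twice returns the value you started with.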
If you’re a student of numerical cognition like I am, you know that we should make this a rule: Always use whichever of \(BF_{01}\) or \(BF_{10}\) is greater than 1. People understand a number like 5 a lot more accurately than a decimal or fraction like 0.20.
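One way to apply that reporting rule is to pick whichever orientation is at least 1. This is only a sketch, and `readable_bayes_factor` is a hypothetical helper name, not a function from JASP or any other package:

```python
def readable_bayes_factor(bf10):
    """Return (label, value) for whichever orientation is at least 1."""
    if bf10 <= 0:
        raise ValueError("A Bayes Factor must be positive.")
    if bf10 >= 1:
        return "BF10", bf10
    return "BF01", 1 / bf10


label, value = readable_bayes_factor(0.2)
print(f"{label} = {value:.2f}")  # BF01 = 5.00
```

The label travels with the number, so the reader never has to guess which hypothesis is in the numerator.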
A study reports \(BF_{10} = 6\).
What is \(BF_{01}\)?
A study reports \(BF_{01} = 0.25\).
What is \(BF_{10}\)?
A researcher runs a Bayesian t-test and finds \(BF_{10} = 1\).
What does this mean?
A. There is strong evidence for \(H_1\)
B. There is strong evidence for \(H_0\)
C. The data are equally consistent with both hypotheses
D. The test failed to converge
A study reports \(BF_{01} = 12\).
Which hypothesis do the data favor, and by how much?
A. \(H_1\), the data are 12 times more likely under \(H_1\)
B. \(H_0\), the data are 12 times more likely under \(H_0\)
C. \(H_1\), the data are 0.08 times more likely under \(H_1\)
D. Neither, because 12 is not a valid Bayes Factor
A researcher reports \(BF_{10} = 0.08\).
What is the equivalent \(BF_{01}\), rounded to two decimal places?
A Bayes Factor is a continuous quantity — there is no sharp cutoff like p < .05. However, researchers commonly use interpretive conventions to communicate the strength of evidence a Bayes Factor represents.
The most widely used guidelines in psychology are adapted from Harold Jeffreys:
| \(BF_{10}\) | Interpretation |
|---|---|
| \(> 100\) | Extreme evidence for \(H_1\) |
| \(30 – 100\) | Very strong evidence for \(H_1\) |
| \(10 – 30\) | Strong evidence for \(H_1\) |
| \(3 – 10\) | Moderate evidence for \(H_1\) |
| \(1 – 3\) | Anecdotal evidence for \(H_1\) |
| \(1\) | No evidence |
| \(1/3 – 1\) | Anecdotal evidence for \(H_0\) |
| \(1/10 – 1/3\) | Moderate evidence for \(H_0\) |
| \(1/30 – 1/10\) | Strong evidence for \(H_0\) |
| \(1/100 – 1/30\) | Very strong evidence for \(H_0\) |
| \(< 1/100\) | Extreme evidence for \(H_0\) |
Note that the table is symmetric: the same thresholds apply in both directions, which is a direct consequence of the inverse relationship between BF10 and BF01.
For reference, here is the same table expressed in terms of \(BF_{01}\):
| \(BF_{01}\) | Interpretation |
|---|---|
| \(> 100\) | Extreme evidence for \(H_0\) |
| \(30 – 100\) | Very strong evidence for \(H_0\) |
| \(10 – 30\) | Strong evidence for \(H_0\) |
| \(3 – 10\) | Moderate evidence for \(H_0\) |
| \(1 – 3\) | Anecdotal evidence for \(H_0\) |
| \(1\) | No evidence |
| \(1/3 – 1\) | Anecdotal evidence for \(H_1\) |
| \(1/10 – 1/3\) | Moderate evidence for \(H_1\) |
| \(1/30 – 1/10\) | Strong evidence for \(H_1\) |
| \(1/100 – 1/30\) | Very strong evidence for \(H_1\) |
| \(< 1/100\) | Extreme evidence for \(H_1\) |
The two tables are mirror images of each other — which is just the inverse relationship made visual. If you find yourself staring at a \(BF_{01}\) and feeling uncertain, converting it to \(BF_{10}\) and using the first table is always a valid move.
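Because the tables mirror each other, one lookup can serve both directions. The sketch below takes a \(BF_{10}\), flips it when it is below 1, and then reads off the label. The function name `describe_bf10` is hypothetical, and the handling of exact boundary values (3, 10, 30, 100) is an assumption: the tables leave them ambiguous, so this sketch assigns a boundary value to the weaker category, matching the "\(> 100\)" row.

```python
def describe_bf10(bf10):
    """Label a BF10 using the conventions in the tables above."""
    if bf10 <= 0:
        raise ValueError("A Bayes Factor must be positive.")
    if bf10 == 1:
        return "No evidence"

    # Use whichever orientation is above 1, and remember which
    # hypothesis that orientation favors.
    if bf10 > 1:
        favored, strength = "H1", bf10
    else:
        favored, strength = "H0", 1 / bf10

    # Assumption: a value exactly on a threshold gets the weaker label.
    if strength > 100:
        label = "Extreme"
    elif strength > 30:
        label = "Very strong"
    elif strength > 10:
        label = "Strong"
    elif strength > 3:
        label = "Moderate"
    else:
        label = "Anecdotal"

    return f"{label} evidence for {favored}"


print(describe_bf10(45))   # Very strong evidence for H1
print(describe_bf10(0.2))  # Moderate evidence for H0
```

To classify a \(BF_{01}\) with the same function, convert it first by passing `1 / bf01`.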
These conventions are guidelines, not rules. A \(BF_{10}\) of 2.9 is not meaningfully different from a \(BF_{10}\) of 3.1, just as a p-value of .051 is not meaningfully different from .049. The label matters less than the overall pattern of evidence, including effect size, sample size, replication, and theoretical plausibility.
A study reports \(BF_{10} = 14.2\).
Using Jeffreys’ conventions, how would you describe the strength of evidence?
A. Anecdotal evidence for \(H_1\)
B. Moderate evidence for \(H_1\)
C. Strong evidence for \(H_1\)
D. Very strong evidence for \(H_1\)
A study reports \(BF_{01} = 15\).
Using Jeffreys’ conventions, how would you describe the strength of evidence?
A. Strong evidence for \(H_1\)
B. Moderate evidence for \(H_0\)
C. Strong evidence for \(H_0\)
D. Extreme evidence for \(H_0\)
A researcher reports \(BF_{10} = 2.1\).
What is the most appropriate conclusion?
A. There is moderate evidence for \(H_1\) and the null can be rejected
B. There is anecdotal evidence for \(H_1\), but it is not strong
C. There is anecdotal evidence for \(H_0\)
D. The Bayes Factor is too small to interpret