Fisher’s Exact Test can be used in place of a chi-square test for independence in cases where counts are too low to make the chi-square approximation appropriate for calculating a p-value. A rule of thumb often used for this is that a chi-square test is no longer reliable when any cell of the contingency table has an expected count of fewer than five or so.
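If you are not interested in why that is, skip this paragraph. If you are, read on.
In a chi-square test of count values we take each value in our contingency table, square the difference between it and an ‘expected’ value, then divide this squared difference by the expected value. The sum of these quantities over all the cells is the test statistic. It follows a chi-square distribution only approximately, and the approximation becomes poor when the expected values are small.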
Ronald Fisher devised an exact test that could be used for this scenario in situations where the marginal totals (i.e. the row totals and column totals) of the table were fixed by the experimenter.
Say we were devising a phone app that we hoped could distinguish between two visually similar species of grass given a photo. In an experiment, we show the app 10 cases of species A and 10 of species B and in each case we record the identification decision of the app. That will give four possibilities.
The counts of these decisions can be recorded in a two-way, in this case 2 x 2, table, in which each cell contains the counts recorded for each combination of the levels of each factor. In a test run, suppose the numbers recorded for each case were as follows:
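## [,1] [,2]
## [1,] 3 7
## [2,] 8 2
Each row is one of the two species shown to the app, and the columns are the app's decisions (A in the first column, B in the second).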
Here we see that the row totals are set by the experimenter while the column totals are set by the app. Thus we see that the app is slightly more likely to choose A than B, regardless of the actual identity of the plant, since it went for A 11 times out of 20, and for B only 9 times.
Our null hypothesis might be:
The app has no ability to correctly distinguish species A from species B.
We should not use a chi-square test for independence here since the numbers in two of the cells are less than 5.
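The rule of thumb can be checked directly. The following is a minimal sketch in base R, assuming the counts are those of the `app.mat` matrix used later in this section; the name `expected` is ours.

```r
# Counts from the test run, laid out as in the app.mat matrix
# used later in this section.
app.mat <- matrix(c(3, 8, 7, 2), nrow = 2)

# Expected count for each cell under independence:
# (row total x column total) / grand total.
expected <- outer(rowSums(app.mat), colSums(app.mat)) / sum(app.mat)
expected

# How many cells fall below the rule-of-thumb threshold of five?
sum(app.mat < 5)   # observed counts
sum(expected < 5)  # expected counts
```

Whether we look at the observed or the expected counts, some cells fall below five, so the chi-square approximation is doubtful here.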
In the Fisher’s Exact Test, we accept that the marginal totals are as observed. With this constraint we then determine the probability of getting each possible table of cells, under the null hypothesis that the two factors are independent of each other. The p-value for (lack of) independence of the two variables is then the cumulative sum of the probability of getting the table we actually got, plus those of all tables that are equally or less likely.
To do this, we notice that if the column and row totals have been fixed in advance, then only one of the cell values of the table is independent. We can choose any cell. Once this is known, all the others can be determined, using this first value and the marginal totals. It turns out that we can treat this number as a random variable that varies with probabilities governed by the hypergeometric distribution.
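To make the point about a single free cell concrete, here is a small sketch. The helper `fill_table()` is hypothetical, written only for this illustration, and it takes the top-left cell as the free value.

```r
# Fill in a 2 x 2 table from one cell and the fixed marginal totals.
# fill_table() is a hypothetical helper written for this illustration.
fill_table <- function(x, row.totals, col.totals) {
  top.right    <- row.totals[1] - x
  bottom.left  <- col.totals[1] - x
  bottom.right <- row.totals[2] - bottom.left
  # matrix() fills column by column.
  matrix(c(x, bottom.left, top.right, bottom.right), nrow = 2)
}

# With row totals 10 and 10 and column totals 11 and 9, choosing the
# top-left cell fixes the other three.
fill_table(3, row.totals = c(10, 10), col.totals = c(11, 9))
```

Trying other values of the first argument shows how the whole table changes with that one number while the margins stay the same.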
Let’s see how this works.
Imagine an urn with N balls, K of them white, the rest, N-K of them, black.
Draw n balls without replacement.
Let x be the number of white balls in the sample.
x is a random variable that follows a hypergeometric (N, K, n) distribution:
\[ \text{P}(X=x)=\frac{{K \choose x} {{N-K} \choose {n-x}}}{N\choose n} \]
This tells us the probability that we will get x white balls if we draw n balls without replacement (each ball that is drawn is not put back) from an urn that at first contains N balls, K of them white.
For example, if we have an urn with N = 20 balls, K = 11 of them white, the remaining N - K = 9 of them black, and draw n = 10 balls without replacement, the probability that x = 7 of these will be white is given by
\[ \begin{align*} N=20, K&=11, n=10, x=7\\ \\ \text{P}(X=7)& =\frac{{K \choose x} {{N-K} \choose {n-x}}}{N\choose n}\\ &=\frac{{11 \choose 7} {{20-11} \choose {10-7}}}{20\choose 10}\\ &=\frac{330 \times 84}{184756}\\ &= 0.150 \end{align*} \]
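In R, the hypergeometric distribution is available through four functions: dhyper(x, m, n, k) gives the probability of exactly x white balls, phyper(q, m, n, k) gives the cumulative probability of up to q white balls, qhyper(p, m, n, k) gives the quantile for a cumulative probability p, and rhyper(nn, m, n, k) generates nn random draws. Here, in R's notation, m is the number of white balls in the urn (our K), n is the number of black balls (our N - K) and k is the number of balls drawn (our n).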
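As a check on the urn example, the probability can be computed in R in two ways. This is a sketch using base R's `choose()` and `dhyper()`; the variable names are ours. Note that `dhyper()` expects the number of white balls, the number of black balls and the number drawn, in that order.

```r
N <- 20  # balls in the urn
K <- 11  # white balls
n <- 10  # balls drawn without replacement
x <- 7   # white balls in the sample

# Directly from binomial coefficients: ways to pick x white balls,
# times ways to pick the remaining n - x black balls, over all samples.
p.formula <- choose(K, x) * choose(N - K, n - x) / choose(N, n)

# The same probability from R's built-in density function.
p.dhyper <- dhyper(x, m = K, n = N - K, k = n)

c(formula = p.formula, dhyper = p.dhyper)
```

The two values should agree, which is a useful sanity check that the arguments have been passed to `dhyper()` in the right order.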
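In this test we first calculate the probability, under the null hypothesis, of getting the table that we actually observed, given its marginal totals. We then add to this the probabilities of all the other possible tables with the same marginal totals that are no more likely than the observed one.
In the case of the example, that means that we should calculate
\[ P(\text{observed table}|H_0) = P(X=3|H_0) = 0.032\\ X ∼ \text{Hypergeometric} (N=20, K=11, n=10) \] where X is the count in the top-left cell of the table (3 in the observed table), and then add the probabilities of every other table that is equally or less likely.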
In the figure below we show all possible tables, along with their respective probabilities. The table of the actual results obtained is picked out in colour.
We see that tables a), b), c), h) (the actual results), i) and j) all have probabilities equal to or less than that of the actual results. The combined total of these probabilities, which is the p-value of the test, is 0.06978.
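The sum described above can be reproduced with `dhyper()`. This sketch assumes that X is the top-left cell of the table, with observed value 3, so that the first column total is 11, the second is 9 and the first row total is 10. The small tolerance guards against floating-point ties between equally likely tables and is our own choice.

```r
m <- 11  # first column total (K)
n <- 9   # second column total (N - K)
k <- 10  # first row total (the number 'drawn')

# Possible values of the top-left cell given the margins.
x <- max(0, k - n):min(k, m)

# Probability of each possible table under the null hypothesis.
probs <- dhyper(x, m, n, k)

# Probability of the observed table (top-left cell = 3).
p.obs <- dhyper(3, m, n, k)

# Two-sided p-value: sum over tables no more likely than the observed one.
p.value <- sum(probs[probs <= p.obs * (1 + 1e-7)])
round(p.value, 5)
```

Here `x` runs from 1 to 10, giving the ten possible tables, and six of them contribute to the sum.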
This p-value is bigger than 0.05 so we would normally fail to reject the null hypothesis that there is no association between the two factors. There is no evidence from these data that the app does any better than guessing whether a plant is species A or species B!
If we already know the four counts for our \(2 \times 2\) table we can create a \(2 \times 2\) matrix of them like this:
## [,1] [,2]
## [1,] 3 7
## [2,] 8 2
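One way to produce the matrix printed above, as a sketch: `matrix()` fills column by column by default, so the counts are supplied one column at a time. The name `app.mat` is taken from the test output later in this section.

```r
# matrix() fills column-wise: the first column is 3, 8 and the second is 7, 2.
app.mat <- matrix(c(3, 8, 7, 2), nrow = 2)
app.mat

# Check the marginal totals.
rowSums(app.mat)  # fixed by the experimenter
colSums(app.mat)  # determined by the app's decisions
```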
If we have the data in a tidy data frame like we had for the ladybird
data we can convert it into a matrix using the xtabs()
command in the same way as we did there.
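As an illustration of the `xtabs()` route, here is a sketch with a made-up tidy data frame holding one row per photo. The data frame `app.df` and its column names `species` and `decision` are hypothetical, chosen to reproduce the counts above, and we assume that the first row of the table is species A.

```r
# One row per photo: the true species and the app's decision.
app.df <- data.frame(
  species  = rep(c("A", "B"), each = 10),
  decision = c(rep(c("A", "B"), times = c(3, 7)),
               rep(c("A", "B"), times = c(8, 2)))
)

# Cross-tabulate the two factors into a 2 x 2 table of counts.
app.tab <- xtabs(~ species + decision, data = app.df)
app.tab
```

The resulting two-way table can be passed to `fisher.test()` in the same way as a plain matrix.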
We use this matrix as the argument of the fisher.test()
function:
##
## Fisher's Exact Test for Count Data
##
## data: app.mat
## p-value = 0.06978
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.007870555 1.133635839
## sample estimates:
## odds ratio
## 0.1226533
You will see that this gives the same p-value that we have calculated manually.
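The call that produces this output would look something like the following sketch, which rebuilds `app.mat` so that it runs on its own. The name `app.fisher` is ours.

```r
app.mat <- matrix(c(3, 8, 7, 2), nrow = 2)

# Two-sided Fisher's exact test on the 2 x 2 table of counts.
app.fisher <- fisher.test(app.mat)
app.fisher

# Individual parts of the result can be extracted by name.
app.fisher$p.value
app.fisher$estimate  # conditional maximum likelihood estimate of the odds ratio
app.fisher$conf.int  # confidence interval for the odds ratio
```

Storing the result in an object makes it easy to reuse the p-value or the odds ratio in later calculations or in a report.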
If we use a chi-square test on these same data, R gives us a warning, because of the low values in the table:
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: app.mat
## X-squared = 3.2323, df = 1, p-value = 0.0722
This is a cautionary tale: you can run this or that test in R on data even when the data are unsuitable for the test, and you will get an output, but the output may not be reliable. Here, at least, we are given a warning that alerts us to this possibility, but that is not always the case. You need to think carefully as to whether a given test is the right one before you use it.
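To see the warning for yourself, the following sketch runs the chi-square test on the same matrix and captures the warning message instead of letting it scroll past. The name `chisq.warning` is ours, and the use of `tryCatch()` is one possible way to do this, not the only one.

```r
app.mat <- matrix(c(3, 8, 7, 2), nrow = 2)

# Run the chi-square test, capturing any warning rather than just printing it.
chisq.warning <- tryCatch(
  {
    chisq.test(app.mat)
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
chisq.warning

# The expected counts that the test checks before issuing its warning.
suppressWarnings(chisq.test(app.mat))$expected
```

Checking the expected counts before trusting the output is a quick habit that catches this kind of problem even when a warning is easy to miss.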