Fisher’s Exact Test

Fisher’s Exact Test can be used in place of a chi-square test for independence when counts are too low for the chi-square approximation to give a reliable p-value. A common rule of thumb is that a chi-square test is no longer reliable when the expected count in any cell of the contingency table is fewer than about five.

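If you are not interested in why that is, skip this paragraph. If you are, read on. In a chi-square test of count values we take each value in our contingency table, square the difference between it and an ‘expected’ value, then divide this squared difference by the expected value. The sum of these terms over all cells is the test statistic. Under the null hypothesis its distribution is only approximately chi-square, and the approximation becomes poor when the expected counts are small.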
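The calculation just described can be sketched in a few lines of R. The counts in `obs` are made up purely for illustration (they are not from any experiment in this text), and the variable names are arbitrary.

```r
# Hypothetical 2 x 2 table of observed counts (illustrative values only)
obs <- matrix(c(12, 8, 5, 15), nrow = 2)

# Expected count for each cell under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Squared difference from the expected value, divided by the expected value
contributions <- (obs - expected)^2 / expected

# The chi-square statistic is the sum over all cells
chi_sq <- sum(contributions)

# Approximate p-value from the chi-square distribution,
# with df = (rows - 1) * (columns - 1)
df <- (nrow(obs) - 1) * (ncol(obs) - 1)
p_approx <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq
p_approx
```

The p-value here relies on the chi-square approximation, which is exactly the step that becomes unreliable when the values in `expected` are small.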

Ronald Fisher devised an exact test for this scenario, for use when the marginal totals (i.e. the row totals and column totals) of the table are fixed by the experimenter.

Say we were devising a phone app that we hoped could distinguish between two visually similar species of grass given a photo. In an experiment, we show the app 10 cases of species A and 10 of species B and in each case we record the identification decision of the app. That gives four possibilities.

species is A, app identifies it as A
species is A, app identifies it as B
species is B, app identifies it as B
species is B, app identifies it as A

The counts of these decisions can be recorded in a two-way, in this case 2 x 2, table, in which each cell contains the counts recorded for each combination of the levels of each factor. In a test run, suppose the numbers recorded for each case were as follows:

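[Table of observed counts (not reproduced here): rows give the true species, 10 photos each; columns give the app’s decision, with A chosen 11 times and B chosen 9 times. The four cell counts are 3, 7, 8 and 2, as in the matrix shown under ‘Fisher’s Exact Test in R’.]

The probability of obtaining exactly this table under the null hypothesis, which is derived in the sections that follow, is

## [1] 0.03215051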

Here we see that the row totals are set by the experimenter while the column totals are set by the app. Thus we see that the app is slightly more likely to choose A than B, regardless of the actual identity of the plant, since it went for A 11 times out of 20, and for B only 9 times.

Our null hypothesis might be:

The app has no ability to correctly distinguish species A from species B.

We should not use a chi-square test for independence here since the counts in two of the cells are less than 5 (as are the corresponding expected counts, 4.5 each).

In Fisher’s Exact Test, we accept that the marginal totals are as observed. With this constraint we then determine the probability of getting every possible table of cells, under the null hypothesis that the two factors are independent of each other. The p-value for (lack of) independence of the two variables is then the sum of the probability of the table we actually got and the probabilities of all tables that are equally or less likely.

To do this, we notice that if the column and row totals have been fixed in advance, then only one of the cell values of the table is independent. We can choose any cell. Once its value is known, all the others can be determined from it and the marginal totals. It turns out that we can treat this number as a random variable whose probabilities are governed by the hypergeometric distribution.
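To make the ‘one free cell’ idea concrete, here is a minimal R sketch that rebuilds a whole 2 x 2 table from a single cell and the margins of the app example. The helper `fill_table()` is hypothetical, written only for this illustration, and the choice of the top-left cell as the free one is arbitrary.

```r
# Marginal totals from the app example
row_totals <- c(10, 10)   # photos shown of each species
col_totals <- c(11, 9)    # times the app chose A and B

# Hypothetical helper: rebuild the table from its top-left cell x.
# matrix() fills column by column, so the first two values form column 1.
fill_table <- function(x, row_totals, col_totals) {
  matrix(c(x,                 col_totals[1] - x,
           row_totals[1] - x, row_totals[2] - (col_totals[1] - x)),
         nrow = 2)
}

# Range of values the free cell can take without any cell going negative
x_min <- max(0, row_totals[1] + col_totals[1] - sum(row_totals))
x_max <- min(row_totals[1], col_totals[1])

fill_table(3, row_totals, col_totals)
c(x_min, x_max)
```

With these margins the free cell can only take the values 1 to 10, so there are just ten possible tables to consider.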

Let’s see how this works.

The Hypergeometric Distribution

Imagine an urn with N balls, K of them white, the rest, N-K of them, black.

Draw n balls without replacement

Let x be the number of white balls in the sample.

x is a random variable that follows a hypergeometric (N, K, n) distribution:

\[ \text{P}(X=x)=\frac{{K \choose x} {{N-K} \choose {n-x}}}{N\choose n} \]

This tells us the probability that we will get x white balls if we draw n balls without replacement (each ball that is drawn is not put back) from an urn that at first contains N balls, K of them white.

For example, if we have an urn with N = 20 balls, K = 11 of them white, the remaining N - K = 9 of them black, and draw n = 10 balls without replacement, the probability that x = 7 of these will be white is given by

\[ \begin{align*} N=20, K&=11, n=10, x=7\\ \\ \text{P}(X=7)& =\frac{{K \choose x} {{N-K} \choose {n-x}}}{N\choose n}\\ &=\frac{{11 \choose 7} {{20-11} \choose {10-7}}}{20\choose 10}\\ &=\frac{330 \times 84}{184756}\\ &\approx 0.150 \end{align*} \]

Hypergeometric Distribution in R

dhyper(x, m, n, k)
phyper(q, m, n, k)
qhyper(p, m, n, k)
rhyper(nn, m, n, k)

where, in R:

m = number of white balls in urn (K above)
n = number of black balls in urn (N - K above)
k = number of balls sampled without replacement (n above)
x, q = number of white balls in sample
p = probability
nn = number of random values to generate

Note that R’s n is the number of black balls, not the sample size.
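As a short illustration of how these arguments line up with the urn notation, the sketch below evaluates the urn example (N = 20, K = 11, 10 balls drawn, x = 7) in two ways. The variable names are arbitrary, and the hand calculation takes the second binomial coefficient to be the number of ways of choosing the remaining black balls for the sample.

```r
N <- 20        # balls in the urn
K <- 11        # white balls in the urn
n_drawn <- 10  # balls drawn without replacement
x <- 7         # white balls in the sample

# Direct evaluation with binomial coefficients
by_hand <- choose(K, x) * choose(N - K, n_drawn - x) / choose(N, n_drawn)

# Same probability from dhyper(): m = white, n = black, k = number drawn
from_r <- dhyper(x, m = K, n = N - K, k = n_drawn)

c(by_hand = by_hand, from_r = from_r)

# Cumulative probability P(X <= 7)
cum <- phyper(x, m = K, n = N - K, k = n_drawn)

# The probabilities over all values of x sum to 1
total <- sum(dhyper(0:n_drawn, m = K, n = N - K, k = n_drawn))
```

Naming the arguments (`m =`, `n =`, `k =`) is optional but helps avoid mixing up R’s `n` with the sample size.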

Back to Fisher’s Exact Test

In this test we

Assume the null hypothesis (independence) to be true.
Constrain the marginal counts to be as observed
Calculate the probability of the observed table, given the null hypothesis

In the case of the example, that means that we should calculate

\[ P(\text{observed table}|H_0) = P(X=8|H_0) \approx 0.032\\ X \sim \text{Hypergeometric} (N=20, K=11, n=10) \]

where X is the count in one cell of the column of K = 11 ‘A’ decisions, here the cell containing 8. (Tracking the other cell in that column, which contains 3, gives the same probability.)

In the same way, calculate the probabilities of all possible tables, given the constraints of the marginal totals.
Sum the probabilities of all tables that are as or more ‘extreme’ than the observed table, i.e. whose probability is less than or equal to that of the observed table.
The resulting sum is the p-value of the observed table, given the marginal totals.
If this p-value is less than a predetermined threshold value (usually 0.05) then we reject the null hypothesis and regard the data as providing evidence for an association between the factors. That is, we reject the idea that they are independent of each other. In the case presented above, that would mean rejecting the idea that the app has no ability to distinguish species A from species B.
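The steps above can be sketched in R using dhyper(). This is an illustration under stated assumptions rather than the code behind the original results: the free cell is taken to be the count of 8 in the column of 11 ‘A’ decisions (the count of 3 in the same column gives the same probability), and the variable names and the small tolerance factor are choices made for this sketch.

```r
# Margins of the app table
N <- 20       # photos in total
K <- 11       # 'A' decisions in total
n_row <- 10   # photos of one species

# Every value the free cell can take, given the margins
x_all <- max(0, n_row + K - N):min(n_row, K)

# Probability of each possible table under the null hypothesis
probs <- dhyper(x_all, m = K, n = N - K, k = n_row)

# Probability of the observed table (free cell = 8)
p_obs <- dhyper(8, m = K, n = N - K, k = n_row)

# Sum over tables that are as likely as, or less likely than, the observed one.
# The tolerance stops floating-point rounding from excluding exact ties.
p_value <- sum(probs[probs <= p_obs * (1 + 1e-7)])

p_obs     # approximately 0.0322
p_value   # approximately 0.0698
```

Six of the ten possible tables contribute to the sum: the three at each end of the range of the free cell.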

Example

In the figure below we show all possible tables, along with their respective probabilities. The table of the actual results obtained is picked out in colour.

We see that tables a), b), c), h) (the actual results), i) and j) all have probabilities equal to or less than that of the actual results. The sum of these probabilities, 0.06978, is the p-value.

This p-value is bigger than 0.05 so we would normally fail to reject the null hypothesis that there is no association between the two factors. These data do not give us convincing evidence that the app does any better than guessing whether a plant is species A or species B!

Fisher’s Exact Test in R

If we already know the four counts for our \(2 \times 2\) table we can create a \(2 \times 2\) matrix of them like this:

##      [,1] [,2]
## [1,]    3    7
## [2,]    8    2
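A matrix like the one printed above could be built as follows. The object name `app.mat` is taken from the test output later in this section; the exact call used originally is not shown, so this is one way of doing it among several.

```r
# matrix() fills column by column: 3 and 8 form column 1, 7 and 2 column 2
app.mat <- matrix(c(3, 8, 7, 2), nrow = 2)
app.mat

# Check the marginal totals against the design of the experiment
rowSums(app.mat)   # 10 photos of each species
colSums(app.mat)   # 11 'A' decisions and 9 'B' decisions
```

Passing `byrow = TRUE` to matrix() and listing the counts row by row, as `c(3, 7, 8, 2)`, would give the same result.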

If we have the data in a tidy data frame like we had for the ladybird data we can convert it into a matrix using the xtabs() command in the same way as we did there.

We use this matrix as the argument of the fisher.test() function:

## 
##  Fisher's Exact Test for Count Data
## 
## data:  app.mat
## p-value = 0.06978
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.007870555 1.133635839
## sample estimates:
## odds ratio 
##  0.1226533

You will see that this gives the same p-value that we have calculated manually.
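For reference, a minimal sketch of the call that produces output of this kind is given below. It rebuilds `app.mat` so that it runs on its own; storing the result in an object called `ft` is simply a convenience chosen here.

```r
app.mat <- matrix(c(3, 8, 7, 2), nrow = 2)

# Two-sided Fisher's Exact Test (the default alternative)
ft <- fisher.test(app.mat)
ft

# Individual parts of the result can be extracted by name
ft$p.value    # the p-value
ft$estimate   # the estimated odds ratio
ft$conf.int   # the 95 percent confidence interval for the odds ratio
```

Extracting `ft$p.value` is handy when the p-value is needed in later calculations or reports rather than just read off the printed output.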

What if we just used a chi-square test?

If we used a chi-square test on this same data, R gives us a warning, because of the low expected counts in the table:

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  app.mat
## X-squared = 3.2323, df = 1, p-value = 0.0722
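The output above could be reproduced with a call along the following lines, again rebuilding `app.mat` so that the sketch is self-contained. The object name `ct` is arbitrary.

```r
app.mat <- matrix(c(3, 8, 7, 2), nrow = 2)

# For a 2 x 2 table chisq.test() applies Yates' continuity correction by default.
# R issues a warning here that the chi-squared approximation may be incorrect.
ct <- chisq.test(app.mat)
ct

# The expected counts show why: two of them fall below 5
ct$expected
```

Inspecting `ct$expected` is a quick way to check the rule of thumb about expected counts before trusting the p-value.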

This is a cautionary tale: you can run this or that test in R on data even when the data are unsuitable for the test, and you will get an output, but the output may not be reliable. Here, at least, we are given a warning that alerts us to this possibility, but that is not always the case. You need to think carefully as to whether a given test is the right one before you use it.