McNemar’s test is the analogue of the paired t-test for binary outcomes, and should be used for the analysis of paired binary data. With paired data, Fisher’s Exact test does not provide a valid hypothesis test.
The motivation for this example comes from the following study examining the neuroscience of maternal nest building in several mouse models.
AGRP Neurons Project to the Medial Preoptic Area and Modulate Maternal Nest-Building Xing-Yu Li, Ying Han, Wen Zhang, Shao-Ran Wang, Yi-Chao Wei, Shuai-Shuai Li, Jun-Kai Lin, Jing-Jing Yan, Ai-Xiao Chen, Xin Zhang, Zheng-Dong Zhao, Wei L. Shen and Xiao-Hong Xu Journal of Neuroscience 16 January 2019, 39 (3) 456-471 (https://doi.org/10.1523/JNEUROSCI.0958-18.2018)
We focus on the behavioral experiments described in Table 1 of the paper, and in particular on the analyses where each animal is scored as 1/0 for completing or not completing a task. Specifically, eight animals were exposed first to an Ad libitum diet and then to a Starvation diet, so each animal was measured on two occasions. For illustration, we consider the Retrieval Finish task. Table 1 indicates that 8/8 animals completed the task under the Ad libitum diet and 3/8 animals completed it under the Starvation diet.
The publication shows the results of Fisher’s Exact test, which are partially reproduced here.
Fisher’s Exact test is valid for experiments with two groups, when each animal is assigned to exactly one intervention and a binary outcome is measured on each animal. Critically, the animals must be independent of each other.
As an example, consider the outcome Retrieval Finish: Yes versus No (third outcome in Table 1 of Li et al. 2019). For Fisher’s Exact test, the data are first summarized into a \(2\times2\) table. Specifically, the table used to carry out Fisher’s Exact test in Li et al. 2019 is reproduced below.
The table suggests there were 8 animals assigned to each diet, for a total of 16 animals. Of the 8 animals assigned to Ad libitum, all 8 accomplished Retrieval Finish; of the 8 animals assigned to Starvation, only 3 did. Of the 11 animals that completed Retrieval Finish, 8 were in the Ad libitum group and 3 were in the Starvation group. As there were only 8 unique animals, this table is inconsistent with the actual data in the study. Nevertheless, this is the table that was analyzed.

Retrieval Finish | Ad libitum | Starve | Total |
---|---|---|---|
Yes | 8 | 3 | 11 |
No | 0 | 5 | 5 |
Total | 8 | 8 | 16 |
The results of the analysis for Fisher’s Exact Test appear below. The \(p\)-value is 0.0256. This value is similar to the one presented in Table 1 of the paper (p=.020), which appears to have an incorrect digit in the third decimal place.
##
## Fisher's Exact Test for Count Data
##
## data: maternal_two_group
## p-value = 0.02564
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.325343 Inf
## sample estimates:
## odds ratio
## Inf
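The \(p\)-value above can be checked by hand. Conditioning on both margins, the Ad libitum/Yes cell follows a hypergeometric distribution, and the two-sided \(p\)-value sums the probabilities of every table that is no more likely than the observed one. The following is a minimal Python sketch of that rule using only the standard library; the function name `fisher_exact_two_sided` is our own and is not the code behind the R output shown above.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]].

    With both margins fixed, the top-left cell is hypergeometric; the p-value
    sums the probabilities of all tables no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell = x) given the fixed row and column totals
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # the small relative tolerance keeps numerically tied tables in the sum
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Rows: Retrieval Finish Yes/No; columns: Ad libitum, Starve (the table analyzed in the paper)
two_group = [[8, 3], [0, 5]]
print(round(fisher_exact_two_sided(two_group), 5))  # 0.02564
```

Only two tables with these margins are as extreme as the observed one (8 or 3 in the top-left cell), each with probability 56/4368, so the \(p\)-value is 112/4368, about 0.0256.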
In a paired or repeated measures design, each animal is exposed to both sets of experimental conditions. In the Li et al. experiment, animals were first exposed to Ad libitum feed and then, after a five-day period, to Starvation feed. The outcome, Retrieval Finish, was measured for each animal twice, once after each experimental condition. There are 16 measurements in total, but only 8 independent animals.
The correct \(2\times2\) table appears below. Here each animal is counted once, classified by its pair of Retrieval Finish outcomes under the two conditions. For example, 3 animals completed Retrieval Finish under both Ad libitum and Starvation; 5 animals completed Retrieval Finish under Ad libitum but not under Starvation.
Retrieval Finish | Starvation: Yes | Starvation: No | Total |
---|---|---|---|
Ad libitum: Yes | 3 | 5 | 8 |
Ad libitum: No | 0 | 0 | 0 |
Total | 3 | 5 | 8 |
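The two tables differ in what they count: the paired table classifies 8 animals by their pair of outcomes, while the table analyzed in Li et al. counts 16 measurements as if they came from 16 separate animals. The Python sketch below builds both from per-animal scores. The score vectors are hypothetical: only the counts (8/8 and 3/8) come from Table 1, and the ordering of animals is assumed.

```python
# Hypothetical per-animal scores (1 = completed Retrieval Finish). Only the
# counts come from Table 1 of Li et al.; the ordering of animals is assumed.
ad_lib = [1, 1, 1, 1, 1, 1, 1, 1]
starve = [1, 1, 1, 0, 0, 0, 0, 0]


def paired_table(x1, x2):
    """One count per animal: rows = x1 Yes/No, columns = x2 Yes/No."""
    table = [[0, 0], [0, 0]]
    for first, second in zip(x1, x2):
        table[1 - first][1 - second] += 1  # index 0 = Yes, 1 = No
    return table


def two_group_table(x1, x2):
    """One count per measurement: rows = Yes/No, columns = condition."""
    return [[sum(x1), sum(x2)],
            [len(x1) - sum(x1), len(x2) - sum(x2)]]


print(paired_table(ad_lib, starve))     # [[3, 5], [0, 0]]
print(two_group_table(ad_lib, starve))  # [[8, 3], [0, 5]]
```

Because every animal completed the task under Ad libitum, any ordering of the Starvation scores yields the same two tables. The paired table sums to 8 animals, whereas the two-group table sums to 16 measurements.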
The results of the exact McNemar’s test appear below. The \(p\)-value is 0.0625, considerably larger than the \(p\)-value of 0.020 reported for Fisher’s Exact test in Li et al. or the value of 0.026 determined above.
##
## Exact McNemar test (with central confidence intervals)
##
## data: maternal_paired
## b = 5, c = 0, p-value = 0.0625
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.9163559 Inf
## sample estimates:
## odds ratio
## Inf
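The exact McNemar test uses only the discordant pairs: b animals succeeded under Ad libitum but not Starvation, and c animals the reverse. Under the null hypothesis each discordant pair is equally likely to fall in either cell, so b follows a Binomial(b + c, 0.5) distribution. A minimal Python sketch of the two-sided exact \(p\)-value follows; `mcnemar_exact` is a hypothetical name, not the R function that produced the output above.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under H0, b ~ Binomial(b + c, 0.5). Because this binomial is symmetric,
    doubling the smaller tail gives the two-sided p-value (capped at 1).
    Concordant pairs do not enter the calculation.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# b = Ad libitum Yes / Starvation No; c = Ad libitum No / Starvation Yes
print(mcnemar_exact(5, 0))  # 0.0625
```

With only 5 discordant pairs, 0.0625 = 2/32 is the smallest \(p\)-value the exact test can produce, so no outcome of this experiment could reach \(p<0.05\).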
To illustrate the consequences of analyzing correlated paired data using Fisher’s Exact test, we simulated 1000 experiments. Initially each set of data involved 15 experimental units, each with values from two trials. The probability of success for each trial was 0.5. The pairs were generated either using completely independent data or by allowing a dependence between the trials.
In every simulated experiment the null hypothesis, no association between the intervention and the outcome, was true. We analyzed each data set with both Fisher’s Exact test and McNemar’s Exact test, and rejected the null hypothesis if p<.05.
The correlation parameters describe the probability of a success on trial 2 given a success or failure on trial 1. Thus \(P(X_2=1|X_1=1)\) is the probability of success on trial 2, \(X_2\), given that trial 1, \(X_1\), was a success; \(P(X_2=1|X_1=0)\) is the probability of success on trial 2 (\(X_2\)) given that trial 1 (\(X_1\)) was a failure. When the data are independent, the probability of success on \(X_2\) does not depend on the result of \(X_1\); both conditional probabilities are set to 0.5 by the design of the simulation. With positive correlation, the overall probability of success is still 0.5, but the probability of success on trial 2 depends on the result of trial 1.
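The joint distribution of \((X_1, X_2)\) follows directly from \(P(X_1=1)\) and the two conditional probabilities. The Python sketch below computes it, together with the marginal \(P(X_2=1)\) and the correlation between trials, and draws simulated pairs. It assumes \(P(X_1=1)=0.5\) as stated above; the function names and the round values 0.35 and 0.65 are hypothetical illustrations, not the generator actually used for the simulations reported here.

```python
import random


def pair_distribution(p_given_fail, p_given_success, p_first=0.5):
    """Joint distribution of (X1, X2) from P(X1=1) = p_first and the
    conditionals P(X2=1 | X1=0) = p_given_fail, P(X2=1 | X1=1) = p_given_success.

    Returns the four cell probabilities, the marginal P(X2=1) and the
    correlation between X1 and X2.
    """
    joint = {
        (1, 1): p_first * p_given_success,
        (1, 0): p_first * (1 - p_given_success),
        (0, 1): (1 - p_first) * p_given_fail,
        (0, 0): (1 - p_first) * (1 - p_given_fail),
    }
    p_second = joint[(1, 1)] + joint[(0, 1)]  # marginal P(X2 = 1)
    cov = joint[(1, 1)] - p_first * p_second
    corr = cov / (p_first * (1 - p_first) * p_second * (1 - p_second)) ** 0.5
    return joint, p_second, corr


def simulate_pairs(n, p_given_fail, p_given_success, rng, p_first=0.5):
    """Draw n (X1, X2) pairs: X1 ~ Bernoulli(p_first), then X2 given X1."""
    pairs = []
    for _ in range(n):
        x1 = int(rng.random() < p_first)
        p2 = p_given_success if x1 else p_given_fail
        pairs.append((x1, int(rng.random() < p2)))
    return pairs


# Hypothetical round values close to the "Modest" rows of the tables below
joint, p_second, corr = pair_distribution(0.35, 0.65)
print(round(p_second, 3), round(corr, 3))                # 0.5 0.3
print(round(joint[(1, 0)], 3), round(joint[(0, 1)], 3))  # 0.175 0.175

# One simulated experiment with 15 experimental units
experiment = simulate_pairs(15, 0.35, 0.65, random.Random(2019))
```

With these values the success rate on trial 2 stays at 0.5 and the correlation between trials is 0.3. The two discordant cells remain equally likely (0.175 each), so the null hypothesis of McNemar’s test (equal success rates on the two trials) still holds; the correlation only moves probability from the discordant cells (0.25 each under independence) to the concordant cells (0.325 each).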
For a sample of size 15, the table below shows the proportion of simulated experiments in which the null hypothesis was rejected at a nominal Type I error rate of 0.05. A valid test yields a false rejection rate no greater than 0.05. When the data are independent, both Fisher’s Exact test and McNemar’s Exact test are valid, with false rejection rates below 0.05. With minor and modest correlation, the false rejection rate for Fisher’s Exact test exceeds 0.05; in the modest correlation scenario, Fisher’s Exact test falsely rejects the null hypothesis in 15.5% of simulated experiments.
Correlation | P(X2=1\|X1=0) | P(X2=1\|X1=1) | Fisher’s Exact rejection rate | McNemar’s Exact rejection rate |
---|---|---|---|---|
Independent | 0.501 | 0.503 | 0.024 | 0.014 |
Minor | 0.426 | 0.578 | 0.070 | 0.021 |
Modest | 0.353 | 0.653 | 0.155 | 0.008 |
The simulations were then repeated for a sample of size 100. When the data are independent, both Fisher’s Exact test and McNemar’s Exact test again yield valid results, with false rejection rates below 0.05, and McNemar’s test remains valid under every correlation scenario. Fisher’s Exact test, however, shows extraordinarily high false rejection rates under correlation: a false rejection rate of 83.4% is more typical of the statistical power one would hope for when the null hypothesis is false.
Correlation | P(X2=1\|X1=0) | P(X2=1\|X1=1) | Fisher’s Exact rejection rate | McNemar’s Exact rejection rate |
---|---|---|---|---|
Independent | 0.501 | 0.503 | 0.039 | 0.042 |
Minor | 0.426 | 0.578 | 0.286 | 0.029 |
Modest | 0.353 | 0.653 | 0.834 | 0.027 |
Repeating the simulations on a different platform will produce slightly different results. Lastly, we note that these analyses were carried out with the exact forms of both Fisher’s and McNemar’s tests. For large samples, Pearson’s chi-square test or the asymptotic form of McNemar’s test could be used instead.
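The asymptotic form of McNemar’s test compares the discordant counts b and c with a chi-square distribution on 1 degree of freedom, optionally with a continuity correction. The Python sketch below uses the standard-library identity \(P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})\); the function name `mcnemar_asymptotic` is hypothetical.

```python
from math import erfc, sqrt


def mcnemar_asymptotic(b, c, continuity=True):
    """Chi-square (1 df) McNemar test from the discordant counts b and c.

    Statistic: (|b - c| - 1)^2 / (b + c) with continuity correction,
    (b - c)^2 / (b + c) without. For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    n = b + c
    if n == 0:
        return 1.0
    stat = (abs(b - c) - (1 if continuity else 0)) ** 2 / n
    return erfc(sqrt(stat / 2))


# Retrieval Finish data: b = 5, c = 0
p_corrected = mcnemar_asymptotic(5, 0)
p_uncorrected = mcnemar_asymptotic(5, 0, continuity=False)
print(p_corrected > 0.05, p_uncorrected < 0.05)  # True True
```

For the Retrieval Finish data the uncorrected statistic is 5, above the 3.84 critical value, while the corrected statistic is 3.2, below it. With only five discordant pairs the two asymptotic versions reach different conclusions, which is why the exact form is preferred for small samples like this one.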