Understanding and Coding in R
McNemar’s test is a non-parametric test used to analyze paired categorical data. In its standard form it applies to a 2x2 contingency table of paired binary responses and tests whether the row and column marginal frequencies are equal (that is, it checks the marginal homogeneity of two dichotomous variables).
It is commonly used when analyzing matched pairs, before-and-after data, or case-control studies. The samples are not drawn from two different populations or from different subsets, because the cases and controls are paired (i.e., they come together): once a case is selected, the control for that case is constrained to be one of a small subset of people who match the case in various ways.
The data should consist of paired observations.
The data must be categorical.
The pairs should be independent of one another (the two observations within a pair are, by design, related).
Each observation should belong to one of two mutually exclusive categories.
Suppose that one wants to see whether or not there is an association between a risk factor and a disease.
Null hypothesis (H0): There is no association between disease (case or control) and the presence or absence of risk factor (pb = pc).
Alternative hypothesis (Ha): There is an association between disease (case or control) and the presence or absence of risk factor (pb \(\neq\) pc).
The test statistic for McNemar’s test follows a chi-square distribution.
For a large sample (number of pairs n ≥ 30), use the test statistic:
\[ \chi^2 = \frac{(b-c)^2}{b+c} \]
Where:
b = number of pairs in the first discordant cell of the 2x2 table
c = number of pairs in the second discordant cell of the 2x2 table
The test statistic follows a chi-square distribution with 1 degree of freedom. Under H0, \(b \sim {\sf Binom}(n = b + c, \enspace p = 0.5)\). When \(n\) is small, an exact binomial test can be performed using the binom.test() function in R.
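The formula and the exact alternative described above can be sketched in a few lines of R. This is a minimal illustration, not a replacement for the packaged tests: the helper name `mcnemar_by_hand()` and the counts 10 and 25 are hypothetical and chosen only to show the arithmetic.

```r
# Sketch: McNemar's chi-square statistic and exact test from discordant counts.
# mcnemar_by_hand() is a hypothetical helper; b_count and c_count are
# illustrative values, not data from this article.
mcnemar_by_hand <- function(b_count, c_count) {
  n_discordant <- b_count + c_count
  chi_sq <- (b_count - c_count)^2 / n_discordant
  list(
    chi_sq       = chi_sq,
    # Upper-tail probability of a chi-square distribution with 1 df
    p_asymptotic = pchisq(chi_sq, df = 1, lower.tail = FALSE),
    # Exact two-sided test of b ~ Binom(b + c, 0.5)
    p_exact      = binom.test(b_count, n_discordant, p = 0.5)$p.value
  )
}

res <- mcnemar_by_hand(b_count = 10, c_count = 25)
res$chi_sq        # (10 - 25)^2 / 35 = 225 / 35
res$p_asymptotic
res$p_exact
```

Only the two discordant counts enter the calculation; the concordant cells of the table do not affect the statistic.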
Step 0: Install and load required package(s)
Step 1: Prepare the data
Step 2: Create a two-way table
Step 3: Perform the McNemar’s test
Step 4: Interpret results
Now let’s go through the steps to perform McNemar’s test in R. We will use the infer_mcnemar_test() function from the inferr package and the mcnemar.test() function from the stats package (which comes with base R).
For this example, we are investigating the following:
Research Question: Is there a significant association between maternal smoking (yes or no) and the occurrence of low birth weight (yes or no) in infants?
Null hypothesis (H0): There is no difference in the proportion of low birth weight infants between mothers who smoke and mothers who do not smoke.
Alternative hypothesis (Ha): There is a difference in the proportion of low birth weight infants between mothers who smoke and mothers who do not smoke.
We are going to need the MASS package to access a dataset for practice and the inferr package to “amp up” the stats package that comes standard with R. The inferr package specifically provides additional and flexible input options and more detailed and structured test results.
For this example, we will use the birthwt dataset, which is publicly available through the MASS package. This dataset contains information on birth weights and potential risk factors for low birth weight.
Note
You need a dataset with paired categorical variables. Each row represents a pair of observations.
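The structure listing that follows can be reproduced along these lines. This is a sketch of Steps 0 and 1; the original commands are not shown in the text, so the exact calls are an assumption. Loading inferr is guarded because it is a CRAN package that may not be installed.

```r
# Step 0: load packages. MASS ships with standard R distributions;
# inferr may need to be installed from CRAN first.
# install.packages("inferr")   # uncomment if inferr is not yet installed
library(MASS)
if (requireNamespace("inferr", quietly = TRUE)) {
  library(inferr)
}

# Step 1: load the birthwt data and inspect its structure
data(birthwt)
str(birthwt)
```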
'data.frame': 189 obs. of 10 variables:
$ low : int 0 0 0 0 0 0 0 0 0 0 ...
$ age : int 19 33 20 21 18 21 22 17 29 26 ...
$ lwt : int 182 155 105 108 107 124 118 103 123 113 ...
$ race : int 2 3 1 1 1 3 1 3 1 1 ...
$ smoke: int 0 0 1 1 1 0 0 0 1 1 ...
$ ptl : int 0 0 0 0 0 0 0 0 0 0 ...
$ ht : int 0 0 0 0 0 0 0 0 0 0 ...
$ ui : int 1 0 0 1 1 0 0 0 0 0 ...
$ ftv : int 0 3 1 2 0 0 1 1 1 0 ...
$ bwt : int 2523 2551 2557 2594 2600 2622 2637 2637 2663 2665 ...
# Load MASS for the birthwt dataset
library(MASS)
# Create a 2x2 contingency table (rows: smoke = 0, 1; columns: low = 0, 1)
table_data <- table(birthwt$smoke, birthwt$low)
# Name rows and columns in level order (0 first, then 1) for clarity in output
rownames(table_data) <- c("Mother Does Not Smoke", "Mother Smokes")
colnames(table_data) <- c("Normal Birth Weight", "Low Birth Weight")
# Show the contingency table
table_data
                        Normal Birth Weight Low Birth Weight
  Mother Does Not Smoke                  86               29
  Mother Smokes                          44               30
In McNemar’s test, we need data in a 2x2 contingency table format. In our example, we want to compare two binary variables: “smoke” (indicating whether the mother smokes) and “low” (indicating whether the baby has a low birth weight).
We need to create a new table containing the frequencies of the four possible outcomes: (1) the mother smokes and the baby has a low birth weight, (2) the mother smokes and the baby does not have a low birth weight, (3) the mother does not smoke and the baby has a low birth weight, and (4) the mother does not smoke and the baby does not have a low birth weight.
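As a small sketch of that step, the four frequencies can be pulled out of the cross-tabulation by level name, which avoids relying on row and column order. The variable names below (`tab`, `both_yes`, and so on) are illustrative and not part of the original code.

```r
library(MASS)

# Cross-tabulate smoking status (rows) against low birth weight (columns).
# Both variables are coded 0 = no, 1 = yes.
tab <- table(birthwt$smoke, birthwt$low, dnn = c("smoke", "low"))

# The four cells, addressed by level name
both_yes   <- tab["1", "1"]  # mother smokes, low birth weight
smoke_only <- tab["1", "0"]  # mother smokes, normal birth weight
low_only   <- tab["0", "1"]  # mother does not smoke, low birth weight
neither    <- tab["0", "0"]  # mother does not smoke, normal birth weight

# The off-diagonal (discordant) cells are the only ones McNemar's statistic uses
c(smoke_only = smoke_only, low_only = low_only)
```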
           Controls
---------------------------------
Cases       0       1       Total
---------------------------------
  0        86      29         115
  1        44      30          74
---------------------------------
Total     130      59         189
---------------------------------
McNemar's Test
----------------------------
McNemar's chi2 3.0822
DF 1
Pr > chi2 0.0792
Exact Pr >= chi2 0.1006
----------------------------
Kappa Coefficient
--------------------------------
Kappa 0.159
ASE 0.0723
95% Lower Conf Limit 0.0172
95% Upper Conf Limit 0.3008
--------------------------------
Proportion With Factor
----------------------
cases 0.6085
controls 0.6878
ratio 0.8846
odds ratio 0.6591
----------------------
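The call that produced the report above is not shown in the text. A plausible reconstruction, assuming the 2x2 table is passed directly to infer_mcnemar_test(), looks like this; the guard simply skips the call when inferr is not installed.

```r
library(MASS)

# Assumption: the 2x2 table is passed directly; the original call is not shown.
tab <- table(birthwt$smoke, birthwt$low)

# Run the inferr version only if the package is available
if (requireNamespace("inferr", quietly = TRUE)) {
  inferr_result <- inferr::infer_mcnemar_test(tab)
  print(inferr_result)
}
```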
McNemar's Chi-squared test with continuity correction
data: table_data
McNemar's chi-squared = 2.6849, df = 1, p-value = 0.1013
mcnemar.test(table_data, correct = FALSE) # equivalent to the chi-square test statistic reported by McNemar's test in the inferr package
McNemar's Chi-squared test
data: table_data
McNemar's chi-squared = 3.0822, df = 1, p-value = 0.07915
Exact binomial test
data: 29 and 29 + 44
number of successes = 29, number of trials = 73, p-value = 0.1006
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.2845273 0.5185986
sample estimates:
probability of success
0.3972603
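The three results above differ only in how the two discordant counts (29 and 44) are used. The sketch below recomputes them by hand to make the relationship explicit; the continuity-corrected statistic subtracts 1 from the absolute difference before squaring. The names `n12` and `n21` are illustrative.

```r
library(MASS)

tab <- table(birthwt$smoke, birthwt$low)
n12 <- tab[1, 2]   # 29: non-smoking mother, low birth weight
n21 <- tab[2, 1]   # 44: smoking mother, normal birth weight

# Uncorrected statistic: (n12 - n21)^2 / (n12 + n21) = 225 / 73
chi_sq_plain <- (n12 - n21)^2 / (n12 + n21)
# Continuity-corrected statistic: (|n12 - n21| - 1)^2 / (n12 + n21) = 196 / 73
chi_sq_corrected <- (abs(n12 - n21) - 1)^2 / (n12 + n21)

p_plain     <- pchisq(chi_sq_plain, df = 1, lower.tail = FALSE)
p_corrected <- pchisq(chi_sq_corrected, df = 1, lower.tail = FALSE)
# Exact two-sided binomial test on the discordant pairs
p_exact     <- binom.test(n12, n12 + n21, p = 0.5)$p.value

round(c(plain = p_plain, corrected = p_corrected, exact = p_exact), 4)
```

With these counts all three p-values fall above 0.05, so the choice of correction does not change the conclusion here.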
In this part, we first use the infer_mcnemar_test() function to perform McNemar’s test. Its output is more comprehensive than that of mcnemar.test(): it includes the 2x2 contingency table, the chi-square test result, the exact binomial test result, and other useful information.
We also use the mcnemar.test() function, passing the contingency table as its input. mcnemar.test() tests the null hypothesis that the proportions of low birth weight infants among mothers who smoke and mothers who do not smoke are equal. It reports the test statistic, the degrees of freedom, and the associated p-value.
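The three quantities mentioned above can also be read directly from the object that mcnemar.test() returns, which is a standard "htest" list. A minimal sketch, with `mcnemar_fit` as an illustrative name:

```r
library(MASS)

tab <- table(birthwt$smoke, birthwt$low)
mcnemar_fit <- mcnemar.test(tab)   # continuity correction is applied by default

mcnemar_fit$statistic   # McNemar's chi-squared
mcnemar_fit$parameter   # degrees of freedom
mcnemar_fit$p.value     # p-value
```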
# Store the test result so that its p-value can be inspected
mcnemar_result <- mcnemar.test(table_data)
mcnemar_result
McNemar's Chi-squared test with continuity correction
data: table_data
McNemar's chi-squared = 2.6849, df = 1, p-value = 0.1013
# Interpret the results
if (mcnemar_result$p.value < 0.05) {
  cat("There is significant evidence to reject the null hypothesis.")
} else {
  cat("There is not enough evidence to reject the null hypothesis.")
}
There is not enough evidence to reject the null hypothesis.
To interpret the results, we focus on the p-value, which indicates the significance of the test. In this code, we check if the p-value is less than 0.05 (a common significance level). If the p-value is less than 0.05, we conclude that there is significant evidence to reject the null hypothesis. Otherwise, if the p-value is greater than or equal to 0.05, we conclude that there is not enough evidence to reject the null hypothesis.
Note
Keep in mind that while McNemar’s test can show an association, it does not establish causality. It merely helps to determine if there is a significant relationship between the two binary variables in a paired/matched dataset.