What is the McNemar Test?

“The McNemar test is a non-parametric test used to analyze paired nominal data. It is a test on a 2 x 2 contingency table and checks the marginal homogeneity of two dichotomous variables. The test requires one nominal variable with two categories (dichotomous) and one independent variable with two dependent groups.”

Source - https://www.ncbi.nlm.nih.gov/books/NBK560699/#:~:text=The%20McNemar%20test%20is%20a,variable%20with%20two%20dependent%20groups.

In simpler words, it is a statistical test used to analyze the differences between paired nominal data, typically for two related or matched samples. It is often used in medical research, psychological studies, and other social sciences to determine whether the proportions of outcomes in two related samples differ significantly.

Analysing the Dataset

Consider a dataset on 250 subjects in which acid reflux is treated by a drug. The response is either success (reflux stopped) or failure (reflux still present). Half of the subjects are randomly selected to use drug A the first day they have reflux and drug B the second day they have reflux. The others were assigned to use drug B first, then drug A.

We want to know if the drugs have a different probability of successful relief. The outcomes for the relief given by the drugs is given in the table below.

Success Failure Total
Drug A 100 150 250
Drug B 125 125 250
Total 225 275 500

We wouldn’t want to analyze this data with a test of homogeneity because the observations in the table are not independent! We really only have 250 subjects so we can’t have 500 independent observations. Instead, we can look at a table of concordant and discordant pairs.

Drug B Relief Status
Success Failure Total
Drug A Relief Status Success 85 15 100
Failure 40 110 150
Total 125 125 250

The diagonal now represents the observations that ‘agreed’ for an individual (concordant pairs). The off diagonals represent observations that didn’t agree for individuals (discordant pairs).

To test whether or not the drugs have a different effect, we can now apply McNemar’s test on the table of concordant/discordant pairs. Let’s formally write this out. We can consider the table of probabilities given by the table of concordant/discordant pairs:

Drug B Relief Status
Success Failure Total
Drug A Relief Status Success \(\pi_{11}\) \(\pi_{12}\) \(\pi_{1\bullet}\)
Failure \(\pi_{21}\) \(\pi_{22}\) \(\pi_{2\bullet}\)
Total \(\pi_{\bullet 1}\) \(\pi_{\bullet 2}\) \(\pi_{\bullet\bullet}\)

We really want to test a restricted multinomial vs a free multinomial with this table! \(H_0\) : No relationship between drug and relief, or, \(\pi_{1\bullet}\) = \(\pi_{\bullet 1}\) and \(\pi_{2\bullet}\) = \(\pi_{\bullet 2}\) equivalent to \(\pi_{12}\) = \(\pi_{21}\)

\(H_A\) : cell probabilities are ’free’ (other than the sum to 1 constraint)

If we use Pearson’s Chi-Square test statistic (call it \(X^2\)), we have \[ X^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(Obs_{ij}-Exp_{ij})^2}{Exp_{ij}} = \frac{(n_{12}-n_{21})^2}{n_{12}+n_{21}} \] where \(n_{ij}\) represents the observed count in cell ij. (You’ll show this later on.) Our reference distribution is a \(\chi_1^2\) . This test can be done in R using the mcnemar.test() function from the stats package (automatically loaded into R). We simply pass the function the matrix of concordant/discordant pairs.

Part 1

First, find the value of the test statistic, the rejection region, and the p-value of the test using R (coding these parts yourself).

Next, create this data in R (matrix() function will work) and run McNemar’s test to determine if we have evidence that the drugs have different probabilities of relief using the mcnemar.test() function. Use correct = FALSE in mcnemar.test() so that you get the same answer here!

Answer

# Creating the above above data matrix
drug_test_matrix <- matrix(c(85, 15, 40, 110), nrow = 2, byrow = TRUE)
colnames(drug_test_matrix) <- c("Success", "Failure")
rownames(drug_test_matrix) <- c("Success", "Failure")

# Displaying the matrix
print(drug_test_matrix)
##         Success Failure
## Success      85      15
## Failure      40     110

Calculating the Test Statistic and p-value

We use the Pearson’s Chi-Square test statistic (call it \(X^2\)). We can calculate that as \[ X^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(Obs_{ij}-Exp_{ij})^2}{Exp_{ij}} = \frac{(n_{12}-n_{21})^2}{n_{12}+n_{21}} \]

We also know that the degrees of freedom are \[ df = (I-1)(J-1) = (2-1)(2-1) = 1 \]

Using this, and an \(\alpha\) of 0.05, we can calculate the required values, as shown in the code below

# Calculating the Pearson’s Chi-Square test statistic
n12 <- drug_test_matrix[1, 2]
n21 <- drug_test_matrix[2, 1]
test_stat <- (n12 - n21)^2 / (n12 + n21)

# Calculating p-value using the Chi-squared test distribution
p_value <- pchisq(test_stat, df = 1, lower.tail = FALSE)

# Displaying the p-value
cat("Pearson Chi-Square Test Statistic  = ", test_stat, "\n")
## Pearson Chi-Square Test Statistic  =  11.36364
cat("p-value  = ", p_value, "\n")
## p-value  =  0.0007489604

Since our p-value is less than \(\alpha\) (0.05), we reject our null hypothesis.

# Calculating the Rejection Region for an alpha of 0.05
alpha <- 0.05 # significance level
df <- 1 # degrees of freedom
rejection_region <- qchisq(1 - alpha, df)

# Displaying the rejection region
cat("RR: Test Statistic (X^2) >= ", rejection_region, "\n")
## RR: Test Statistic (X^2) >=  3.841459

The rejection of our null hypothesis is supported by the fact that the calculated test-statistic (\(X^2\)) lies in the rejection region.

Using McNemar’s Test to calculate the above values

# McNemar's test
mcnemar_results <- mcnemar.test(drug_test_matrix, correct = FALSE)

# Print the test results
print(mcnemar_results)
## 
##  McNemar's Chi-squared test
## 
## data:  drug_test_matrix
## McNemar's chi-squared = 11.364, df = 1, p-value = 0.000749

We can also see that the McNemar values are the same as the calculated values for the test statistic and p-value.

Part 2

Null Hypothesis

First, in the statement of the null hypothesis we wrote: \[ \pi_{1\bullet} = \pi_{\bullet 1} \text{ and } \pi_{2\bullet} = \pi_{\bullet 2} \text{ equivalent to } \pi_{12}=\pi_{21} \] Show that this is true.

Answer

We know from ST 501 that if A and B are independent, then \[ P(A \cap B ) = P(A)P(B) \] We know that \(\pi_{ij}\) represents the probability of the observation falling in class i and j, and since \(\pi_{i\bullet}\) and \(\pi_{\bullet j}\) represent the marginal probabilities along the row i and columns j respectively, we can use probability intersection equation as \[ \pi_{11} = (\pi_{1\bullet})(\pi_{\bullet 1}) \] \[ \pi_{12} = (\pi_{1\bullet})(\pi_{\bullet 2}) \] and so on. Therefore \[ \pi_{12} = (\pi_{1\bullet})(\pi_{\bullet 2}) \text{ and } \pi_{21} = (\pi_{2\bullet})(\pi_{\bullet 1}) \] Now since \[ \pi_{1\bullet} = \pi_{\bullet 1} \text{ and } \pi_{2\bullet} = \pi_{\bullet 2} \], we can use this in the above equation and show that \[ \pi_{12} = (\pi_{1\bullet})(\pi_{\bullet 2}) = (\pi_{2\bullet})(\pi_{\bullet 1}) = \pi_{21} \]

Finding Maximums

Second, under this null restriction on our multinomial, derive the maximum’s for \(\pi_{11}, \pi_{12}, \pi_{21}, \pi_{22}\). This can be done using Lagrange multipliers or by substituting in carefully to include the restriction. No need to show your resulting critical values are maximums.

Answer

All we need to do is find the maximum likehood estimate (MLE). The likelihood can be calculated as

\[ L(\pi_{11}, \pi_{12}, \ldots, \pi_{IJ}) \propto \pi_{11}^{n_{11}} \pi_{12}^{n_{12}} \cdot \ldots \cdot \pi_{IJ}^{n_{IJ}} \]

Taking log on both the sides, we get our log likelihood as

\[ l(\pi_{11}, \pi_{12}, \ldots, \pi_{IJ}) = c + n_{11}\ln(\pi_{11}) + n_{12}\ln(\pi_{12}) + \ldots + n_{IJ}\ln(\pi_{IJ}) \]

subject to the fact that \[ \sum_{i=1}^{I} \sum_{j=1}^{J} \pi_{ij} = 1 \] Under the null restriction, we can rewrite this log likelihood as

\[ l(\pi_{\bullet 1}, \pi_{\bullet 2}, ..., \pi_{\bullet J}, \pi_{1\bullet}, \pi_{2\bullet}, ..., \pi_{I\bullet}) = c + n_{11}\ln(\pi_{1\bullet}\pi_{\bullet 1}) + n_{12}\ln(\pi_{1\bullet}\pi_{\bullet2}) + ... + n_{IJ}\ln(\pi_{I\bullet}\pi_{\bullet J}) = c + \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij} (\ln(\pi_{i\bullet}) + \ln(\pi_{\bullet j})) \]

Given the sum to 1 constraint, we know that:

\[ \sum_{i=1}^{I} \pi_{i\bullet} = 1, \sum_{j=1}^{J} \pi_{\bullet j} = 1 \]

To maximize our likelihood with respect to these constraints, we can either substitute them in or use Lagrange multipliers We use Lagrange Multipliers here to maximize our log-likelihood, subject to given constraints:

  • Write constraints equal to 0

\[ \sum_{i=1}^{I} \pi_{i\bullet} - 1 = 0, \quad \sum_{j=1}^{J} \pi_{\bullet j} - 1 = 0 \]

  • Add these constraints to the our log likelihood with ‘Lagrange multipliers’ attached:

\[ l(\pi_{\bullet 1}, \pi_{\bullet 2}, ..., \pi_{\bullet J}, \pi_{1\bullet}, \pi_{2\bullet}, ..., \pi_{I\bullet})= c + \sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij} (\ln(\pi_{i\bullet}) + \ln(\pi_{\bullet j})) + \lambda_1 \left( \sum_{i=1}^{I} \pi_{i\bullet} - 1 \right) + \lambda_2 \left( \sum_{j=1}^{J} \pi_{\bullet j} - 1 \right) \]

We then take the usual derivatives for all variables of interest including the Lagrange Multipliers, and solve their respective equations.

\[ \frac{\partial l}{\partial \pi_{i\bullet}} = \frac{\sum_{j=1}^{J} n_{ij}}{\pi_{i\bullet}} - \lambda_1 \]

\[ \frac{\partial l}{\partial \pi_{\bullet j}} = \frac{\sum_{i=1}^{I} n_{ij}}{\pi_{\bullet j}} - \lambda_2 \]

\[ \frac{\partial l}{\partial \lambda_1} = \sum_{i=1}^{I} \pi_{i\bullet} - 1 \]

\[ \frac{\partial l}{\partial \lambda_2} = \sum_{j=1}^{J} \pi_{\bullet j} - 1 \]

Now we set all of these equal to 0. The 1st equation gives

\[ \pi_{i\bullet} = \frac{\sum_{j=1}^{J} n_{ij}}{\lambda_1} \]

The 3rd equation gives

\[ \sum_{i=1}^{I} \frac{\sum_{j=1}^{J} n_{ij}}{\lambda_1} = 1 \Rightarrow \lambda_1 = \sum_{i=1}^{I} \sum_{j=1}^{J} n_{ij} = n \]

Plugging this equation for \(\lambda_1\) back into the 1st equation and we get

\[ \tilde \pi_{i\bullet} = \frac{\sum_{j=1}^{J} n_{ij}}{n} \]

Similarly, we can obtain that

\[ \tilde \pi_{\bullet j} = \frac{\sum_{i=1}^{I} n_{ij}}{n} \] Substituting values of 1 and 2 for i and j and using R, we get respective maximums

n <- 250

for (i in 1:2) {
    for (j in 1:2) {
        # Calculate max pi for the current cell
        max_pi_ij <- drug_test_matrix[i, j]/n
        
        # print the result
        cat(max_pi_ij, "\n")
    }
}
## 0.34 
## 0.06 
## 0.16 
## 0.44

Deriving the LRT for this problem

Third, although we derived the form of the LRT generally for a restricted vs free multinomial, I’d like you to do this for this specific problem. You should show that

\[ -2\ln \left( \frac{L(\tilde{\pi}_{11}, \tilde{\pi}_{12}, \tilde{\pi}_{21}, \tilde{\pi}_{22})}{L(\hat{\pi}_{11}, \hat{\pi}_{12}, \hat{\pi}_{21}, \hat{\pi}_{22})} \right) = 2 \sum_{i=1}^{2}\sum_{j=1}^{2} Obs_{ij} \ln \left( \frac{Obs_{ij}}{Exp_{ij}} \right) \]

and then argue that the appropriate reference distribution has 1 degree of freedom. You don’t need to derive the overall maximums for our \(\pi_{ij}\) values. You can just use the usual MLE result as we did in class.

Answer

The LRT can be calculated from the LHS, i.e. \[ -2\ln \left( \frac{L(\tilde{\pi}_{11}, \tilde{\pi}_{12}, \tilde{\pi}_{21}, \tilde{\pi}_{22})}{L(\hat{\pi}_{11}, \hat{\pi}_{12}, \hat{\pi}_{21}, \hat{\pi}_{22})} \right) \] The likelihood function for the given multinomial distribution is calculated as: \[ L(\pi) = \prod_{i=1}^{2}\prod_{j=1}^{2} \pi_{ij}^{n_{ij}} \] But, \(n_{ij}\) is itself the observed cell count, i.e. \(Obs_{ij}\). Therefore,

\[ L(\pi) = \prod_{i=1}^{2}\prod_{j=1}^{2} \pi_{ij}^{Obs_{ij}} \] Taking log on both sides, we derive the log likelihood function as

\[ l(\pi) = \sum_{i=1}^{2}\sum_{j=1}^{2} Obs_{ij} \ln(\pi_{ij}) \] Therefore,

\[ -2\ln \left( \frac{L(\tilde{\pi}_{11}, \tilde{\pi}_{12}, \tilde{\pi}_{21}, \tilde{\pi}_{22})}{L(\hat{\pi}_{11}, \hat{\pi}_{12}, \hat{\pi}_{21}, \hat{\pi}_{22})} \right) = -2\ln \left( \frac{L(\tilde{\pi})}{L(\hat{\pi})} \right) = -2 \left[ \ln(L(\tilde{\pi}))) - \ln(L(\hat{\pi})) \right] = -2 [l(\tilde{\pi}) - l(\hat{\pi})] \] Substituting the value from the log likehood derviation above, we get \[ = -2 \left[\sum_{i=1}^{2}\sum_{j=1}^{2} Obs_{ij} \ln(\tilde\pi_{ij}) - \sum_{i=1}^{2}\sum_{j=1}^{2} Obs_{ij} \ln(\hat\pi_{ij}) \right] = 2 \sum_{i=1}^{2}\sum_{j=1}^{2} Obs_{ij} \left[ln(\hat\pi_{ij}) - ln(\tilde\pi_{ij})\right] = 2 \sum_{i=1}^{2}\sum_{j=1}^{2} Obs_{ij} ln (\frac{\hat\pi_{ij}}{\tilde\pi_{ij}}) \] We know that under the null hypothesis,

\[ Obs_{ij} = n \hat{\pi}_{ij} \text{ & } Exp_{ij} = n \tilde{\pi}_{ij} \]

where n is the total sample size. Therefore, \[ \frac{Obs_{ij}}{Exp_{ij}} = \frac{n \hat{\pi}_{ij}}{n \tilde{\pi}_{ij}} = \frac{\hat{\pi}_{ij}}{\tilde{\pi}_{ij}} \] Substituting in the previous equation, we get \[ \text{LRT} = -2\ln \left( \frac{L(\tilde{\pi}_{11}, \tilde{\pi}_{12}, \tilde{\pi}_{21}, \tilde{\pi}_{22})}{L(\hat{\pi}_{11}, \hat{\pi}_{12}, \hat{\pi}_{21}, \hat{\pi}_{22})} \right) = 2 \sum_{i=1}^{2}\sum_{j=1}^{2} Obs_{ij} ln \left(\frac{Obs_{ij}}{Exp_{ij}}\right) \]

Simplifying the Pearson Chi-Square Test Statistic

Lastly, we know we can use Pearson’s chi-square test statistic instead of this LRT and they are asymptotically equivalent. In our above data example, we used

\[ X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(Obs_{ij} - Exp_{ij})^2}{Exp_{ij}} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}} \]

Show that Pearson’s chi-square test statistic can be simplified into the form on the right.

Answer Let us work with the a sample 2x2 contingency table.

Success Failure
Test1 \(n_{11}\) \(n_{12}\)
Test2 \(n_{21}\) \(n_{22}\)

First, note that for a 2x2 contingency table, \(Obs_{ij}\) are the observed counts, and \(Exp_{ij}\) are the expected counts calculated based on the row and column totals.

Given the observed counts in the 2x2 contingency table:

\[ Obs_{11} = n_{11} \\ Obs_{12} = n_{12} \\ Obs_{21} = n_{21} \\ Obs_{22} = n_{22} \] Now, let’s express the Pearson Chi-Square Test Statistic \(X^2\) for this 2x2 contingency table:

\[ X^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(Obs_{ij} - Exp_{ij})^2}{Exp_{ij}} \] Expanding the formula for a 2x2 contingency table:

\[ X^2 = \frac{(n_{11} - Exp_{11})^2}{Exp_{11}} + \frac{(n_{12} - Exp_{12})^2}{Exp_{12}} + \frac{(n_{21} - Exp_{21})^2}{Exp_{21}} + \frac{(n_{22} - Exp_{22})^2}{Exp_{22}} \] The expected counts for a 2x2 contingency table can be calculated as

\[ Exp_{ij} = \frac{n_{i\bullet} \cdot n_{\bullet j}}{n} \] Now, under the McNemar null hypothesis, we have \[ \pi_{1\bullet} = \pi_{\bullet 1} \text{ and } \pi_{2\bullet} = \pi_{\bullet 2} \text{ equivalent to } \pi_{12}=\pi_{21} \] Therefore, the test statistic can be rewritten as: \[ X^2 = \frac{(n_{12} - Exp_{12})^2}{Exp_{12}} + \frac{(n_{21} - Exp_{21})^2}{Exp_{21}} \] Now let’s substitute the expected counts \(Exp_{12} = \frac{n_{1.}n_{.2}}{n}\) and \(Exp_{21} = \frac{n_{2.}n_{.1}}{n}\):

\[ X^2 = \frac{(n_{12} - \frac{n_{1.}n_{.2}}{n})^2}{\frac{n_{1.}n_{.2}}{n}} + \frac{(n_{21} - \frac{n_{2.}n_{.1}}{n})^2}{\frac{n_{2.}n_{.1}}{n}} \]

After simplifying the above formula and combining like terms, the formula simplifies to:

\[ X^2 = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}} \] This is the provided form of the test statistic, and it simplifies the test to focus on the difference between discordant pairs. This shows how the general form of the Pearson’s chi-square test statistic can be simplified to the provided form, i.e.

\[ X^2 = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{(Obs_{ij} - Exp_{ij})^2}{Exp_{ij}} = \frac{(n_{12} - n_{21})^2}{n_{12} + n_{21}} \]

Simulation Study

We might like to know how well our test does at detecting certain alternatives (i.e. its power). While we can try to derive these things, using simulation-based methods provides a nice method for finding approximate results.

The Pearson statistic can be derived as a Taylor series approximation to the LRT. For this last part of the project, we’ll investigate the \(\alpha\) control of the Pearson chi-square test and its power (so we don’t have to worry about the \(\ln(0)\) that can sometimes pop up for the LRT).

Goal of simulation study:

Setup

Simulation Study

  • We will generate correlated binary data using the draw.correlated.binary() function from the MultiRNG package. You will likely need to install this package.

    • The first argument, no.row, is the sample size.
    • The second argument, d, is the number of variables to create (we want 2).
    • The third argument, prop.vec, is the vector of true proportions of success (of length two, corresponding to our two variables).
    • The fourth argument, corr.mat, is a (2x2 for our purposes) correlation matrix. This should have 1’s along the diagonal and the value of the correlation in the off-diagonals.
  • We will generate data under all combinations of the following:

    • n = 25, 40, 80, and 200.
    • \(\pi_1\) = 0.1, 0.4, 0.8 (the success probability of ‘drug A’).
    • \(\pi_2\) = \(\pi_1\), \(\pi_2\) = \(\pi_1\) + 0.02, \(\pi_2\) = \(\pi_1\) + 0.05, \(\pi_2\) = \(\pi_1\) + 0.1 (the success probability of ‘drug B’).
    • rho = 0, 0.2, 0.5 (the correlation level for data generation).
  • We will generate N = 1000 datasets under each of these settings.

By examining the case when \(\pi_2\) = \(\pi_2\), we can assess α control. In other cases, we can investigate power under the alternative created by the difference in π’s and correlation.

# Creating a test function that performs the McNemar test and returns TRUE if the null hypothesis is rejected, FALSE otherwise
test_function <- function(n, d, pi1, pi2, rho) {
  # Generating the correlated binary data
  prop_vec <- c(pi1, pi2)
  corr_mat <- matrix(c(1, rho, rho, 1), nrow = 2)
  binary_data <- draw.correlated.binary(no.row = n, d = d, prop.vec = prop_vec, corr.mat = corr_mat)
  
  # Creating the table for McNemar's test
  test_matrix <- matrix(c(sum(binary_data[,1] == 1 & binary_data[,2] == 1),
                          sum(binary_data[,1] == 1 & binary_data[,2] == 0),
                          sum(binary_data[,1] == 0 & binary_data[,2] == 1),
                          sum(binary_data[,1] == 0 & binary_data[,2] == 0)), nrow = 2)
  
  test_result <- mcnemar.test(test_matrix, correct = FALSE)
  
  # Return TRUE if the null hypothesis is rejected, FALSE otherwise
  return(test_result$p.value < 0.05)
}

# Parameters
sample_sizes <- c(25, 40, 80, 200)
d <- 2
pi1_values <- c(0.1, 0.4, 0.8)
pi2_increments <- c(0, 0.02, 0.05, 0.1)
correls <- c(0, 0.2, 0.5)
N <- 1000  


results <- list() # Creating a list to store the simulation results

# Loop through the given values for n, pi1, pi2 and correlation, to run our test function N times
for(n in sample_sizes) {
  for(pi1 in pi1_values) {
    for(inc in pi2_increments) {
      for(rho in correls) {
        pi2 <- pi1 + inc
        rejections <- replicate(N, test_function(n, d, pi1, pi2, rho))
        rejections = na.omit(rejections)
        prop_rejections <- mean(rejections)
        print(rejections)
        results[[paste(n, pi1, pi2, rho)]] <- prop_rejections
      }
    }
  }
  print(n)
}


results_df <- do.call(rbind, lapply(seq_along(results), function(i) {
  params <- strsplit(names(results)[i], " ")[[1]]
  data.frame(
    sample_size = as.numeric(params[1]),
    pi1 = as.numeric(params[2]),
    pi2_increment = as.numeric(params[3]) - as.numeric(params[2]),
    correlation = as.numeric(params[4]),
    rejected_proportion = results[[i]],
    stringsAsFactors = FALSE
  )
}))
for (pi_1 in pi1_values){
  filtered_df <- results_df[results_df$pi1 == pi_1, ]
  pt <- ggplot(filtered_df, aes(x = pi2_increment, y = rejected_proportion, color = as.factor(correlation), group = correlation)) +
    geom_line() +
    facet_grid(. ~ sample_size) + # Facets for different sample sizes
    labs(
      title = sprintf("Power plot for different sample sizes and correlations with pi1 = %.2f", pi_1),
      x = expression(pi[2] - pi[1]),
      y = "Proportion Rejected",
      color = "Correlation"
    ) +
    theme_minimal() +
        theme(
            axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1, size = 10), # Rotate x-axis labels
            plot.title = element_text(face = "bold", size = 14), # Bold and increase size of the title
            axis.title.x = element_text(face = "bold", margin = margin(t = 10)), # Bold and add margin to x-axis label
            axis.title.y = element_text(face = "bold", margin = margin(r = 10)) # Bold and add margin to y-axis label
        )

  print(pt)
}

Observations from our simulation study

  • The power of the test tends to increase as the sample size grows (from 25 to 200), as evidenced by the steeper power curves in charts with larger sample sizes. This suggests that a larger sample size enhances the test’s ability to correctly reject the null hypothesis when it is false.
  • The shape and steepness of the power curves are also affected by the baseline probability. As \(\pi_1\) increases (from 0.1 to 0.8), the test’s ability to detect a difference improves, though the rate of increase depends on the correlation. Even with smaller sample sizes, higher baseline probabilities provide some power, especially with stronger correlations.
  • Moreover, higher correlation (blue line, correlation of 0.5) is associated with increased test power, as the test becomes more effective at detecting differences between \(\pi_1\) and \(\pi_2\) with greater correlation.
  • The ability to detect a difference improves across all levels of pi1 and correlation as the absolute difference between \(\pi_2\) and \(pi_1\) increases. This is reflected in the upward trend of the power curves and the x-axis (\(\pi_2\) - \(\pi_1\)). Naturally, power is at its lowest when there is no difference (\(\pi_2\) - \(\pi_1\) is close to 0).

Overall, these charts suggest that larger sample sizes, higher baseline probabilities, and stronger correlations between measurements contribute to increased test power. This implies that under certain conditions, there is a greater likelihood of detecting real differences between pi1 and pi2. Even with small sample sizes, substantial differences can lead to acceptable power, particularly when correlations are stronger.

END