This document simulates hypothesis tests for coin fairness using various statistical methods. We analyze Type I errors, false discovery rates (FDR), and the impact of multiple comparisons.
For each experiment, we flip a fair coin 10,000 times and reject the null hypothesis of fairness whenever the p-value falls below a significance threshold.
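Concretely, the two-tailed binomial p-value used throughout is \(p = \min\!\big(1,\; 2\,P(\mathrm{Bin}(n, 0.5) \le \min(h, n - h))\big)\) for \(h\) heads in \(n\) flips: we double the probability of the smaller tail and cap the result at 1.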
library(ggplot2)
binomial_test <- function(flips) {
  n_heads <- sum(flips)
  n <- length(flips)
  # Two-tailed test: double the smaller tail, capped at 1 so the
  # p-value cannot exceed 1 for a perfectly balanced outcome
  p_val <- min(1, 2 * pbinom(min(n_heads, n - n_heads), n, 0.5))
  return(p_val)
}
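As a quick sanity check (a minimal sketch, not in the original), the hand-rolled p-value should closely track R's built-in exact test. The two will not match exactly, because binom.test() defines the two-sided p-value by summing all outcomes no more likely than the observed one rather than by doubling a tail:

set.seed(1)  # illustrative seed
flips <- sample(c(0, 1), 1000, replace = TRUE)
binomial_test(flips)                                    # tail-doubling p-value
binom.test(sum(flips), length(flips), p = 0.5)$p.value  # exact two-sided p-value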
simulate_coin_test <- function(n_simulations = 50000, n_flips = 10000, alpha = 0.05) {
  set.seed(142)
  # One experiment = n_flips tosses of a fair coin, reduced to a p-value
  p_values <- replicate(n_simulations, {
    flips <- sample(c(0, 1), n_flips, replace = TRUE)
    binomial_test(flips)
  })
  df <- data.frame(p_values)
  ggplot(df, aes(x = p_values)) +
    geom_histogram(breaks = seq(0, 1, length.out = 11), fill = "blue", alpha = 0.7, color = "black") +
    geom_vline(xintercept = c(0.01, 0.05, 0.10), color = c("green", "red", "blue"), linetype = "dashed") +
    labs(title = "Histogram of p-values from Coin Fairness Test", x = "p-value", y = "Frequency")
}
simulate_coin_test()
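Because the null hypothesis is true in every simulation, the p-values should be roughly uniform on \([0, 1]\), so the fraction falling below any threshold \(\alpha\) should be close to \(\alpha\) itself. A minimal check of the empirical Type I error rate (re-running the simulation outside the plotting function; an addition not in the original):

set.seed(142)
p_vals <- replicate(50000, binomial_test(sample(c(0, 1), 10000, replace = TRUE)))
mean(p_vals < 0.05)  # empirical Type I error rate; expect a value near 0.05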
Now we simulate hypothesis tests where the p-value is checked every 1,000 flips and the experiment stops as soon as p-value < α. This optional stopping gives the test repeated chances to cross the threshold, so the overall Type I error rate is inflated well above α.
simulate_coin_test_stopping <- function(n_simulations = 10000, max_flips = 10000, check_interval = 1000, alpha = 0.05) {
  set.seed(142)
  p_values <- replicate(n_simulations, {
    flips <- c()
    p_value <- 1.0
    # Peek at the p-value after every check_interval flips; stop early on "significance"
    for (i in seq(check_interval, max_flips, by = check_interval)) {
      flips <- c(flips, sample(c(0, 1), check_interval, replace = TRUE))
      p_value <- binomial_test(flips)
      if (p_value < alpha) break
    }
    return(p_value)
  })
  df <- data.frame(p_values)
  ggplot(df, aes(x = p_values)) +
    geom_histogram(breaks = seq(0, 1, length.out = 11), fill = "blue", alpha = 0.7, color = "black") +
    geom_vline(xintercept = c(0.01, 0.05, 0.10), color = c("green", "red", "blue"), linetype = "dashed") +
    labs(title = "Histogram of p-values from Stopping Rule Test", x = "p-value", y = "Frequency")
}
simulate_coin_test_stopping()
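To quantify the inflation (a sketch reusing the same logic; not in the original), we can record how often any of the ten interim checks rejects:

set.seed(142)
rejected <- replicate(10000, {
  flips <- c()
  stopped <- FALSE
  for (i in 1:10) {
    flips <- c(flips, sample(c(0, 1), 1000, replace = TRUE))
    if (binomial_test(flips) < 0.05) { stopped <- TRUE; break }
  }
  stopped
})
mean(rejected)  # overall Type I error rate; noticeably above the nominal 0.05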
We now simulate batches of 20 coin fairness tests to show how multiple hypothesis testing leads to spurious rejections: even though every coin is fair, each test independently has probability α of rejecting.
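With \(m = 20\) independent tests at \(\alpha = 0.05\), the expected number of false rejections per batch is \(m\alpha = 1\), and the probability of at least one is \(1 - (1 - \alpha)^m = 1 - 0.95^{20} \approx 0.64\).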
simulate_multiple_coin_tests <- function(n_simulations = 10000, n_coins = 20, n_flips = 1000, alpha = 0.05) {
  set.seed(142)
  # Count how many of the n_coins fair coins are (falsely) rejected in each batch
  rejected_counts <- replicate(n_simulations, {
    sum(replicate(n_coins, binomial_test(sample(c(0, 1), n_flips, replace = TRUE)) < alpha))
  })
  df <- data.frame(rejected_counts)
  ggplot(df, aes(x = rejected_counts)) +
    geom_histogram(binwidth = 1, fill = "blue", alpha = 0.7, color = "black") +
    labs(title = "Histogram of Rejected Null Hypotheses", x = "Number of Rejections", y = "Frequency")
}
simulate_multiple_coin_tests()
We apply the Benjamini-Hochberg (BH) method to control the False Discovery Rate (FDR).
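The BH step-up procedure sorts the p-values \(p_{(1)} \le \dots \le p_{(m)}\), finds the largest \(k\) with \(p_{(k)} \le \frac{k}{m}\alpha\), and rejects the hypotheses corresponding to the \(k\) smallest p-values; under independence this keeps the expected FDR at or below \(\alpha\).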
library(stats)  # p.adjust() lives in the stats package, which is attached by default
simulate_fdr_test <- function(n_simulations = 1000, n_coins = 100, n_unfair = 10, n_flips = 1000, alpha = 0.05, unfair_prob = 0.6) {
  set.seed(142)
  fdr_results <- replicate(n_simulations, {
    # Randomly place n_unfair biased coins among n_coins total
    actual_unfair <- sample(c(rep(FALSE, n_coins - n_unfair), rep(TRUE, n_unfair)))
    p_values <- sapply(actual_unfair, function(unfair) {
      p <- if (unfair) unfair_prob else 0.5
      binomial_test(sample(c(0, 1), n_flips, replace = TRUE, prob = c(1 - p, p)))
    })
    # Realized FDR with BH-adjusted p-values
    rejected_fdr <- p.adjust(p_values, method = "BH") < alpha
    false_positives_fdr <- sum(rejected_fdr & !actual_unfair)
    total_rejected_fdr <- sum(rejected_fdr)
    fdr_fdr <- if (total_rejected_fdr > 0) false_positives_fdr / total_rejected_fdr else 0
    # Realized FDR with unadjusted p-values
    rejected_standard <- p_values < alpha
    false_positives_standard <- sum(rejected_standard & !actual_unfair)
    total_rejected_standard <- sum(rejected_standard)
    fdr_standard <- if (total_rejected_standard > 0) false_positives_standard / total_rejected_standard else 0
    return(c(fdr_fdr, fdr_standard))
  })
  df <- data.frame(FDR_Method = fdr_results[1, ], Standard_Method = fdr_results[2, ])
  ggplot() +
    geom_histogram(data = df, aes(x = FDR_Method), breaks = seq(0, 1, length.out = 11), fill = "green", alpha = 0.7, color = "black") +
    geom_histogram(data = df, aes(x = Standard_Method), breaks = seq(0, 1, length.out = 11), fill = "red", alpha = 0.5, color = "black") +
    labs(title = "False Discovery Rate (FDR) Comparison", x = "FDR", y = "Frequency") +
    theme_minimal()
}
simulate_fdr_test()
We demonstrate the built-in function for FDR correction, p.adjust().
p_values <- c(0.02, 0.15, 0.0005, 0.04, 0.10, 0.001, 0.25, 0.03, 0.07, 0.005)  # example p-values from 10 tests
adjusted_p_values <- p.adjust(p_values, method = "BH")
rejected <- adjusted_p_values < 0.05
print(data.frame(p_values, adjusted_p_values, rejected))
## p_values adjusted_p_values rejected
## 1 0.0200 0.05000000 FALSE
## 2 0.1500 0.16666667 FALSE
## 3 0.0005 0.00500000 TRUE
## 4 0.0400 0.06666667 FALSE
## 5 0.1000 0.12500000 FALSE
## 6 0.0010 0.00500000 TRUE
## 7 0.2500 0.25000000 FALSE
## 8 0.0300 0.06000000 FALSE
## 9 0.0700 0.10000000 FALSE
## 10 0.0050 0.01666667 TRUE
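To make the adjustment transparent, here is a hand-rolled BH computation (a sketch for illustration; p.adjust() remains the canonical interface). Each sorted p-value \(p_{(i)}\) is scaled by \(m/i\), monotonicity is enforced by a running minimum from the largest rank downward, and the result is capped at 1:

m <- length(p_values)
o <- order(p_values)                           # indices that sort the p-values
bh <- p_values[o] * m / seq_len(m)             # p_(i) * m / i
bh <- pmin(1, rev(cummin(rev(bh))))            # running min from the top, capped at 1
manual <- numeric(m); manual[o] <- bh          # map back to the original order
all.equal(manual, p.adjust(p_values, method = "BH"))  # TRUE

This also makes clear why 0.02 is not rejected: it adjusts to exactly 0.05, and the comparison against the threshold is strict.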
Rather than implementing the test by hand, we can now call a built-in function that computes the p-value directly.
Let’s say a random variable \(X\) follows a standard normal distribution, \(N(0, 1)\).
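The section does not specify the test, so as a sketch we assume a one-sample t-test of \(H_0\!: \mu = 0\) on samples of size 30 (both assumptions). With the null hypothesis true, the p-values are again approximately uniform:

set.seed(142)
p_null <- replicate(10000, t.test(rnorm(30), mu = 0)$p.value)  # H0 is true here
mean(p_null < 0.05)  # close to the nominal 0.05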
Now let’s simulate with \(X\) following a normal distribution \(N(1, 4)\), i.e., variance 4 and standard deviation 2.
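Under the same assumed setup, sampling from \(N(1, 4)\) makes \(H_0\!: \mu = 0\) false, so the p-values concentrate near zero and the rejection rate estimates the test's power:

set.seed(142)
p_alt <- replicate(10000, t.test(rnorm(30, mean = 1, sd = 2), mu = 0)$p.value)  # H0 is false
mean(p_alt < 0.05)  # statistical power; far above 0.05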