1 Introduction

This document reproduces the clinical trial comparison between Drug X and Drug Y using both frequentist and Bayesian approaches.
- Drug X improved 45 out of 150 patients.
- Drug Y improved 60 out of 150 patients.

We (1) run standard two-proportion tests, (2) compute Bayesian posterior distributions for response probabilities under a Beta-Binomial model with transparent priors, (3) estimate the probability that Drug Y is more effective than Drug X, and (4) visualize results with professional figures and commentary.

2 Libraries

library(tidyverse)
library(scales)

3 Data setup

In this section, we define the observed trial data for Drug X and Drug Y. Each drug was administered to 150 patients, and we recorded how many showed significant improvement. Drug X improved 45 patients, while Drug Y improved 60 patients.

This setup translates into two binomial outcomes—each with a fixed number of trials (patients treated) and a number of “successes” (patients improved). By structuring the data this way, we can:

Summarize observed rates: compute improvement proportions for each drug.
Prepare inputs for analysis: frequentist two-proportion tests and Bayesian Beta-Binomial models both require counts of successes and failures.
Enable visualization: bar charts and error bars can highlight observed differences before formal inference.

3.0.1 Outcomes explanation

From this setup:
- Drug X shows an observed improvement rate of 45/150 = 30%.
- Drug Y shows an observed improvement rate of 60/150 = 40%.

At face value, Drug Y appears to perform better. However, random variation in small samples could explain this difference. To determine whether this observed difference is statistically meaningful, we proceed to both frequentist and Bayesian analyses.

n_x <- 150; s_x <- 45; f_x <- n_x - s_x
n_y <- 150; s_y <- 60; f_y <- n_y - s_y

trial_df <- tibble(
  drug = c("Drug X", "Drug Y"),
  improved = c(s_x, s_y),
  not_improved = c(f_x, f_y)
) %>% mutate(rate = improved / (improved + not_improved))

trial_df

4 Quick descriptive visualization

Before applying statistical tests, it is useful to visually inspect the raw proportions. Visualization helps communicate differences intuitively and allows us to add error bars to illustrate uncertainty.

Here, we plot improvement rates for each drug as a bar chart with approximate 95% confidence intervals. This gives us a preliminary sense of whether the observed differences are large enough to warrant further testing.

ggplot(trial_df, aes(x = drug, y = rate, fill = drug)) +
  geom_col(width = 0.6, alpha = 0.9) +
  geom_errorbar(aes(ymin = rate - 1.96*sqrt(rate*(1-rate)/(improved+not_improved)),
                    ymax = rate + 1.96*sqrt(rate*(1-rate)/(improved+not_improved))),
                width = 0.15, linewidth = 0.8) +
  geom_text(aes(label = percent(rate, accuracy = 0.1)),
            vjust = -0.8, size = 4.2, fontface = "bold") +
  scale_y_continuous(labels = percent_format(), limits = c(0, 0.6)) +
  scale_fill_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
  labs(x = NULL, y = "Improvement Rate", fill = NULL) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "none",
        panel.grid.minor = element_blank(),
        plot.title = element_text(face = "bold")) +
  ggtitle("Observed Improvement Rates")

Observed improvements by drug with 95% normal-approximate error bars.

Expanded Comment: The plot shows that the observed improvement rate for Drug X is 30%, while for Drug Y it is 40%. At first glance, this suggests Drug Y is better. However, overlap in error bars indicates uncertainty—differences could be due to random chance. Therefore, we need formal statistical inference to determine if the difference is significant and meaningful.

5 Frequentist analysis

Frequentist methods test hypotheses by calculating probabilities under assumed null models. In this context, we test whether the difference in improvement rates between Drug X and Drug Y could have arisen by chance if both drugs were equally effective.

We use two approaches:
1. Two-proportion z-test (prop.test): compares proportions between two groups.
2. Fisher’s exact test: provides an exact p-value, especially useful for small sample sizes.

prop_res <- prop.test(x = c(s_x, s_y), n = c(n_x, n_y), correct = TRUE)
fisher_res <- fisher.test(matrix(c(s_x, f_x, s_y, f_y), nrow = 2, byrow = TRUE))

prop_res

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(s_x, s_y) out of c(n_x, n_y)
## X-squared = 2.8718, df = 1, p-value = 0.09014
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.21401832  0.01401832
## sample estimates:
## prop 1 prop 2 
##    0.3    0.4

fisher_res

## 
##  Fisher's Exact Test for Count Data
## 
## data:  matrix(c(s_x, f_x, s_y, f_y), nrow = 2, byrow = TRUE)
## p-value = 0.08991
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.3869809 1.0657990
## sample estimates:
## odds ratio 
##  0.6438247

Interpretation (frequentist):
- If p-values are small (typically <0.05), this suggests evidence against the null hypothesis of equal proportions.
- However, frequentist methods do not directly tell us the probability that Drug Y is better—they only measure how surprising the observed data would be if there were no difference.

6 Bayesian analysis (Optional Extension)

We now shift to a Bayesian perspective, which interprets probability as degree of belief rather than long-run frequency. By using Bayes’ theorem, we update our prior uncertainty with observed data to compute posterior distributions for the improvement rates of each drug.

6.1 Posterior summaries via Monte Carlo

Posterior summaries allow us to quantify uncertainty directly. Using simulation, we can compute:
- Credible intervals for each drug’s improvement rate.
- The probability that Drug Y is better than Drug X.
- The posterior distribution of the difference (Δ = pY − pX), which provides an intuitive view of effect size.

This approach goes beyond frequentist p-values by answering the clinically relevant question: Given the data, what is the probability Drug Y truly outperforms Drug X?

alpha0 <- 1; beta0 <- 1
post_x <- list(alpha = alpha0 + s_x, beta = beta0 + f_x)
post_y <- list(alpha = alpha0 + s_y, beta = beta0 + f_y)

S <- 2e6
px <- rbeta(S, post_x$alpha, post_x$beta)
py <- rbeta(S, post_y$alpha, post_y$beta)
delta <- py - px

prob_y_gt_x <- mean(py > px)
ci_px <- quantile(px, c(0.025, 0.5, 0.975))
ci_py <- quantile(py, c(0.025, 0.5, 0.975))
ci_delta <- quantile(delta, c(0.025, 0.5, 0.975))

list(
  `P(pY > pX | data)` = prob_y_gt_x,
  `CI pX` = ci_px,
  `CI pY` = ci_py,
  `CI delta` = ci_delta
)

## $`P(pY > pX | data)`
## [1] 0.964873
## 
## $`CI pX`
##      2.5%       50%     97.5% 
## 0.2324476 0.3018038 0.3777090 
## 
## $`CI pY`
##      2.5%       50%     97.5% 
## 0.3250290 0.4009072 0.4802171 
## 
## $`CI delta`
##         2.5%          50%        97.5% 
## -0.008263519  0.098887730  0.204744567

alpha0 <- 1; beta0 <- 1   # uniform priors

# Posterior parameters
post_x <- list(alpha = alpha0 + s_x, beta = beta0 + f_x)  # Beta(46, 106)
post_y <- list(alpha = alpha0 + s_y, beta = beta0 + f_y)  # Beta(61, 91)

post_x; post_y

## $alpha
## [1] 46
## 
## $beta
## [1] 106

## $alpha
## [1] 61
## 
## $beta
## [1] 91

6.2 Posterior summaries via Monte Carlo

# Draw from posteriors to estimate summaries and P(pY > pX)
S <- 2e6  # large for precise estimate; reduce if too slow
px <- rbeta(S, post_x$alpha, post_x$beta)
py <- rbeta(S, post_y$alpha, post_y$beta)
delta <- py - px

prob_y_gt_x <- mean(py > px)
ci_px <- quantile(px, c(0.025, 0.5, 0.975))
ci_py <- quantile(py, c(0.025, 0.5, 0.975))
ci_delta <- quantile(delta, c(0.025, 0.5, 0.975))

list(
  `P(pY > pX | data)` = prob_y_gt_x,
  `CI pX` = ci_px,
  `CI pY` = ci_py,
  `CI delta` = ci_delta
)

## $`P(pY > pX | data)`
## [1] 0.964989
## 
## $`CI pX`
##      2.5%       50%     97.5% 
## 0.2324183 0.3017726 0.3778188 
## 
## $`CI pY`
##      2.5%       50%     97.5% 
## 0.3251122 0.4008826 0.4800738 
## 
## $`CI delta`
##         2.5%          50%        97.5% 
## -0.008074613  0.098861293  0.204549449

Interpretation: \(P(p_Y > p_X \mid ext{data})\) is the posterior probability that Drug Y has a higher improvement rate than Drug X. This is the direct belief update that frequentist p-values do not provide.

6.3 Posterior density plots

# Tidy draws for plotting
sample_df <- tibble(px = px, py = py) %>%
  sample_n(100000) %>%  # thin for faster plotting
  pivot_longer(cols = everything(), names_to = "parameter", values_to = "p") %>%
  mutate(parameter = recode(parameter, px = "Drug X", py = "Drug Y"))

ggplot(sample_df, aes(x = p, fill = parameter, color = parameter)) +
  geom_density(alpha = 0.25, linewidth = 0.9) +
  geom_vline(xintercept = ci_px[c(1,3)], linetype = "dashed", linewidth = 0.7, color = "#2E86AB") +
  geom_vline(xintercept = ci_py[c(1,3)], linetype = "dashed", linewidth = 0.7, color = "#F18F01") +
  scale_fill_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
  scale_color_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
  scale_x_continuous(labels = percent_format()) +
  labs(x = "Improvement probability", y = "Posterior density",
       title = "Posterior Distributions: Drug X vs Drug Y",
       subtitle = paste0("Shaded densities with 95% central quantile lines")) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top",
        panel.grid.minor = element_blank())

Posterior distributions for improvement probabilities.

6.4 Posterior of the difference \(\,Δ = p_Y - p_X \,\)

delta_df <- tibble(delta = delta)

ggplot(delta_df, aes(x = delta)) +
  geom_histogram(aes(y = after_stat(density)), bins = 120, fill = "#D1E8E2", color = "grey30") +
  geom_density(linewidth = 1.0, color = "#1B9E77") +
  geom_vline(xintercept = 0, color = "firebrick", linetype = "dashed", linewidth = 0.9) +
  annotate("text", x = 0.02, y = 10, label = paste0("P(Δ > 0) = ", scales::percent(prob_y_gt_x, accuracy = 0.1)),
           hjust = 0, size = 4.2, fontface = "bold") +
  labs(x = expression(Delta~" = p[Y] - p[X]"),
       y = "Density",
       title = "Posterior of the Improvement Difference (Δ)") +
  theme_minimal(base_size = 13) +
  theme(panel.grid.minor = element_blank())

Posterior distribution of Δ = pY − pX. Shaded area is P(Δ > 0).

Interpretation: If most mass is right of 0 and \(P(Δ > 0)\) is high, Drug Y likely outperforms Drug X. The 95% credible interval for \(Δ\) offers an uncertainty range in absolute improvement difference.

6.5 Posterior predictive check (optional)

What is the predicted number of improvements out of 150 future patients for each drug?

pred_x <- rbinom(length(px), size = 150, prob = px)
pred_y <- rbinom(length(py), size = 150, prob = py)

pred_df <- tibble(
  `Drug X` = pred_x,
  `Drug Y` = pred_y
) %>%
  pivot_longer(everything(), names_to = "drug", values_to = "improved")

ggplot(pred_df, aes(x = improved, fill = drug, color = drug)) +
  geom_histogram(position = "identity", alpha = 0.25, bins = 60) +
  geom_density(linewidth = 1) +
  scale_fill_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
  scale_color_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
  labs(x = "Predicted improvements (out of 150)", y = "Density",
       title = "Posterior Predictive Distributions") +
  theme_minimal(base_size = 13) +
  theme(legend.position = "top", panel.grid.minor = element_blank())

Posterior predictive distributions for number of improved patients out of 150.

7 Sensitivity analysis (alternative priors)

Results should not hinge critically on the chosen prior. As a quick check, we try a slightly regularizing prior \(\mathrm{Beta}(2,2)\) that gently favors middle probabilities.

alpha0b <- 2; beta0b <- 2

post_x_b <- list(alpha = alpha0b + s_x, beta = beta0b + f_x)  # Beta(47, 107)
post_y_b <- list(alpha = alpha0b + s_y, beta = beta0b + f_y)  # Beta(62, 92)

S2 <- 8e5
px_b <- rbeta(S2, post_x_b$alpha, post_x_b$beta)
py_b <- rbeta(S2, post_y_b$alpha, post_y_b$beta)
prob_y_gt_x_b <- mean(py_b > px_b)

tibble(
  prior = c("Beta(1,1)", "Beta(2,2)"),
  `P(pY > pX | data)` = c(prob_y_gt_x, prob_y_gt_x_b)
)

8 Executive summary

Frequentist tests assess whether the proportions differ; they provide p-values but not the probability that one drug is better.
Bayesian analysis yields the posterior probability that Drug Y’s improvement rate exceeds Drug X’s, e.g., 96.5%, along with credible intervals for each drug and for the difference \(Δ\).
Visual diagnostics (posterior densities, \(Δ\) distribution, and posterior predictive checks) help communicate magnitude and uncertainty to decision-makers.

8.1 > Reporting tip: In clinical/biostat settings, complement the above with a pre-registered analysis plan, clarify prior choices, and include a region of practical equivalence (ROPE) if relevant to clinical significance.

Drug X vs Drug Y: Frequentist and Bayesian Analysis

Irene Tsapara

2025-10-03