This document reproduces the clinical trial comparison between
Drug X and Drug Y using both
frequentist and Bayesian
approaches.
- Drug X improved 45 out of 150 patients.
- Drug Y improved 60 out of 150 patients.
We (1) run standard two-proportion tests, (2) compute Bayesian posterior distributions for response probabilities under a Beta-Binomial model with transparent priors, (3) estimate the probability that Drug Y is more effective than Drug X, and (4) visualize results with professional figures and commentary.
library(tidyverse)
library(scales)
In this section, we define the observed trial data for Drug X and Drug Y. Each drug was administered to 150 patients, and we recorded how many showed significant improvement. Drug X improved 45 patients, while Drug Y improved 60 patients.
This setup translates into two binomial outcomes—each with a fixed number of trials (patients treated) and a number of “successes” (patients improved). By structuring the data this way, we can:
From this setup:
- Drug X shows an observed improvement rate of 45/150 =
30%.
- Drug Y shows an observed improvement rate of 60/150 =
40%.
At face value, Drug Y appears to perform better. However, random variation in small samples could explain this difference. To determine whether this observed difference is statistically meaningful, we proceed to both frequentist and Bayesian analyses.
n_x <- 150; s_x <- 45; f_x <- n_x - s_x
n_y <- 150; s_y <- 60; f_y <- n_y - s_y
trial_df <- tibble(
drug = c("Drug X", "Drug Y"),
improved = c(s_x, s_y),
not_improved = c(f_x, f_y)
) %>% mutate(rate = improved / (improved + not_improved))
trial_df
Before applying statistical tests, it is useful to visually inspect the raw proportions. Visualization helps communicate differences intuitively and allows us to add error bars to illustrate uncertainty.
Here, we plot improvement rates for each drug as a bar chart with approximate 95% confidence intervals. This gives us a preliminary sense of whether the observed differences are large enough to warrant further testing.
ggplot(trial_df, aes(x = drug, y = rate, fill = drug)) +
geom_col(width = 0.6, alpha = 0.9) +
geom_errorbar(aes(ymin = rate - 1.96*sqrt(rate*(1-rate)/(improved+not_improved)),
ymax = rate + 1.96*sqrt(rate*(1-rate)/(improved+not_improved))),
width = 0.15, linewidth = 0.8) +
geom_text(aes(label = percent(rate, accuracy = 0.1)),
vjust = -0.8, size = 4.2, fontface = "bold") +
scale_y_continuous(labels = percent_format(), limits = c(0, 0.6)) +
scale_fill_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
labs(x = NULL, y = "Improvement Rate", fill = NULL) +
theme_minimal(base_size = 13) +
theme(legend.position = "none",
panel.grid.minor = element_blank(),
plot.title = element_text(face = "bold")) +
ggtitle("Observed Improvement Rates")
Observed improvements by drug with 95% normal-approximate error bars.
Expanded Comment: The plot shows that the observed improvement rate for Drug X is 30%, while for Drug Y it is 40%. At first glance, this suggests Drug Y is better. However, overlap in error bars indicates uncertainty—differences could be due to random chance. Therefore, we need formal statistical inference to determine if the difference is significant and meaningful.
Frequentist methods test hypotheses by calculating probabilities under assumed null models. In this context, we test whether the difference in improvement rates between Drug X and Drug Y could have arisen by chance if both drugs were equally effective.
We use two approaches:
1. Two-proportion z-test (prop.test):
compares proportions between two groups.
2. Fisher’s exact test: provides an exact p-value,
especially useful for small sample sizes.
prop_res <- prop.test(x = c(s_x, s_y), n = c(n_x, n_y), correct = TRUE)
fisher_res <- fisher.test(matrix(c(s_x, f_x, s_y, f_y), nrow = 2, byrow = TRUE))
prop_res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(s_x, s_y) out of c(n_x, n_y)
## X-squared = 2.8718, df = 1, p-value = 0.09014
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.21401832 0.01401832
## sample estimates:
## prop 1 prop 2
## 0.3 0.4
fisher_res
##
## Fisher's Exact Test for Count Data
##
## data: matrix(c(s_x, f_x, s_y, f_y), nrow = 2, byrow = TRUE)
## p-value = 0.08991
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.3869809 1.0657990
## sample estimates:
## odds ratio
## 0.6438247
Interpretation (frequentist):
- If p-values are small (typically <0.05), this suggests evidence
against the null hypothesis of equal proportions.
- However, frequentist methods do not directly tell us the probability
that Drug Y is better—they only measure how surprising the observed data
would be if there were no difference.
We now shift to a Bayesian perspective, which interprets probability as degree of belief rather than long-run frequency. By using Bayes’ theorem, we update our prior uncertainty with observed data to compute posterior distributions for the improvement rates of each drug.
Posterior summaries allow us to quantify uncertainty directly. Using
simulation, we can compute:
- Credible intervals for each drug’s improvement
rate.
- The probability that Drug Y is better than Drug
X.
- The posterior distribution of the difference (Δ = pY −
pX), which provides an intuitive view of effect size.
This approach goes beyond frequentist p-values by answering the clinically relevant question: Given the data, what is the probability Drug Y truly outperforms Drug X?
alpha0 <- 1; beta0 <- 1
post_x <- list(alpha = alpha0 + s_x, beta = beta0 + f_x)
post_y <- list(alpha = alpha0 + s_y, beta = beta0 + f_y)
S <- 2e6
px <- rbeta(S, post_x$alpha, post_x$beta)
py <- rbeta(S, post_y$alpha, post_y$beta)
delta <- py - px
prob_y_gt_x <- mean(py > px)
ci_px <- quantile(px, c(0.025, 0.5, 0.975))
ci_py <- quantile(py, c(0.025, 0.5, 0.975))
ci_delta <- quantile(delta, c(0.025, 0.5, 0.975))
list(
`P(pY > pX | data)` = prob_y_gt_x,
`CI pX` = ci_px,
`CI pY` = ci_py,
`CI delta` = ci_delta
)
## $`P(pY > pX | data)`
## [1] 0.964873
##
## $`CI pX`
## 2.5% 50% 97.5%
## 0.2324476 0.3018038 0.3777090
##
## $`CI pY`
## 2.5% 50% 97.5%
## 0.3250290 0.4009072 0.4802171
##
## $`CI delta`
## 2.5% 50% 97.5%
## -0.008263519 0.098887730 0.204744567
alpha0 <- 1; beta0 <- 1 # uniform priors
# Posterior parameters
post_x <- list(alpha = alpha0 + s_x, beta = beta0 + f_x) # Beta(46, 106)
post_y <- list(alpha = alpha0 + s_y, beta = beta0 + f_y) # Beta(61, 91)
post_x; post_y
## $alpha
## [1] 46
##
## $beta
## [1] 106
## $alpha
## [1] 61
##
## $beta
## [1] 91
# Draw from posteriors to estimate summaries and P(pY > pX)
S <- 2e6 # large for precise estimate; reduce if too slow
px <- rbeta(S, post_x$alpha, post_x$beta)
py <- rbeta(S, post_y$alpha, post_y$beta)
delta <- py - px
prob_y_gt_x <- mean(py > px)
ci_px <- quantile(px, c(0.025, 0.5, 0.975))
ci_py <- quantile(py, c(0.025, 0.5, 0.975))
ci_delta <- quantile(delta, c(0.025, 0.5, 0.975))
list(
`P(pY > pX | data)` = prob_y_gt_x,
`CI pX` = ci_px,
`CI pY` = ci_py,
`CI delta` = ci_delta
)
## $`P(pY > pX | data)`
## [1] 0.964989
##
## $`CI pX`
## 2.5% 50% 97.5%
## 0.2324183 0.3017726 0.3778188
##
## $`CI pY`
## 2.5% 50% 97.5%
## 0.3251122 0.4008826 0.4800738
##
## $`CI delta`
## 2.5% 50% 97.5%
## -0.008074613 0.098861293 0.204549449
Interpretation: \(P(p_Y > p_X \mid ext{data})\) is the posterior probability that Drug Y has a higher improvement rate than Drug X. This is the direct belief update that frequentist p-values do not provide.
# Tidy draws for plotting
sample_df <- tibble(px = px, py = py) %>%
sample_n(100000) %>% # thin for faster plotting
pivot_longer(cols = everything(), names_to = "parameter", values_to = "p") %>%
mutate(parameter = recode(parameter, px = "Drug X", py = "Drug Y"))
ggplot(sample_df, aes(x = p, fill = parameter, color = parameter)) +
geom_density(alpha = 0.25, linewidth = 0.9) +
geom_vline(xintercept = ci_px[c(1,3)], linetype = "dashed", linewidth = 0.7, color = "#2E86AB") +
geom_vline(xintercept = ci_py[c(1,3)], linetype = "dashed", linewidth = 0.7, color = "#F18F01") +
scale_fill_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
scale_color_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
scale_x_continuous(labels = percent_format()) +
labs(x = "Improvement probability", y = "Posterior density",
title = "Posterior Distributions: Drug X vs Drug Y",
subtitle = paste0("Shaded densities with 95% central quantile lines")) +
theme_minimal(base_size = 13) +
theme(legend.position = "top",
panel.grid.minor = element_blank())
Posterior distributions for improvement probabilities.
delta_df <- tibble(delta = delta)
ggplot(delta_df, aes(x = delta)) +
geom_histogram(aes(y = after_stat(density)), bins = 120, fill = "#D1E8E2", color = "grey30") +
geom_density(linewidth = 1.0, color = "#1B9E77") +
geom_vline(xintercept = 0, color = "firebrick", linetype = "dashed", linewidth = 0.9) +
annotate("text", x = 0.02, y = 10, label = paste0("P(Δ > 0) = ", scales::percent(prob_y_gt_x, accuracy = 0.1)),
hjust = 0, size = 4.2, fontface = "bold") +
labs(x = expression(Delta~" = p[Y] - p[X]"),
y = "Density",
title = "Posterior of the Improvement Difference (Δ)") +
theme_minimal(base_size = 13) +
theme(panel.grid.minor = element_blank())
Posterior distribution of Δ = pY − pX. Shaded area is P(Δ > 0).
Interpretation: If most mass is right of 0 and \(P(Δ > 0)\) is high, Drug Y likely outperforms Drug X. The 95% credible interval for \(Δ\) offers an uncertainty range in absolute improvement difference.
What is the predicted number of improvements out of 150 future patients for each drug?
pred_x <- rbinom(length(px), size = 150, prob = px)
pred_y <- rbinom(length(py), size = 150, prob = py)
pred_df <- tibble(
`Drug X` = pred_x,
`Drug Y` = pred_y
) %>%
pivot_longer(everything(), names_to = "drug", values_to = "improved")
ggplot(pred_df, aes(x = improved, fill = drug, color = drug)) +
geom_histogram(position = "identity", alpha = 0.25, bins = 60) +
geom_density(linewidth = 1) +
scale_fill_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
scale_color_manual(values = c("Drug X" = "#2E86AB", "Drug Y" = "#F18F01")) +
labs(x = "Predicted improvements (out of 150)", y = "Density",
title = "Posterior Predictive Distributions") +
theme_minimal(base_size = 13) +
theme(legend.position = "top", panel.grid.minor = element_blank())
Posterior predictive distributions for number of improved patients out of 150.
Results should not hinge critically on the chosen prior. As a quick check, we try a slightly regularizing prior \(\mathrm{Beta}(2,2)\) that gently favors middle probabilities.
alpha0b <- 2; beta0b <- 2
post_x_b <- list(alpha = alpha0b + s_x, beta = beta0b + f_x) # Beta(47, 107)
post_y_b <- list(alpha = alpha0b + s_y, beta = beta0b + f_y) # Beta(62, 92)
S2 <- 8e5
px_b <- rbeta(S2, post_x_b$alpha, post_x_b$beta)
py_b <- rbeta(S2, post_y_b$alpha, post_y_b$beta)
prob_y_gt_x_b <- mean(py_b > px_b)
tibble(
prior = c("Beta(1,1)", "Beta(2,2)"),
`P(pY > pX | data)` = c(prob_y_gt_x, prob_y_gt_x_b)
)