How A/B Testing Works

  • Split users randomly into two groups
  • Group A (Control): the original version
  • Group B (Treatment): the new version
  • Track a metric (e.g. conversion rate) for both groups
  • Use a hypothesis test to decide whether the observed difference is real or plausibly due to chance
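
The random split in the first bullet can be sketched in R as follows. This is a minimal illustration, not a production assignment scheme: the user IDs, the seed, and the `assign_groups` helper are hypothetical names introduced here.

```r
# Hypothetical helper: randomly assign each user to group "A" or "B".
# Shuffling a balanced label vector gives equal group sizes (up to one user).
assign_groups <- function(user_ids) {
  n <- length(user_ids)
  labels <- rep(c("A", "B"), length.out = n)
  data.frame(user_id = user_ids, group = sample(labels))
}

set.seed(42)                      # assumed seed, only for reproducibility
users <- assign_groups(1:2400)    # 2,400 made-up user IDs
table(users$group)                # 1,200 users in each group
```

Shuffling a balanced label vector, rather than flipping a coin per user, keeps the two groups the same size, which matches the equal visitor counts used in the example.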

Example

A company tests two versions of a sign-up button on their landing page.

  • Group A (Control)
    • Button: Blue “Sign Up”
    • Visitors: 1,200
    • Conversions: 144 (12%)
  • Group B (Treatment)
    • Button: Green “Get Started Free”
    • Visitors: 1,200
    • Conversions: 192 (16%)
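
The percentages above follow directly from the counts. A short sketch that recomputes them and the gap between the groups; the terms "absolute lift" and "relative lift" and the variable names are introduced here for illustration and do not appear in the original example.

```r
# Counts taken from the example above
visitors    <- c(A = 1200, B = 1200)
conversions <- c(A = 144,  B = 192)

rates <- conversions / visitors                    # A = 0.12, B = 0.16
abs_lift <- unname(rates["B"] - rates["A"])        # 0.04 (4 percentage points)
rel_lift <- abs_lift / unname(rates["A"])          # about 0.333 (33% relative)

print(round(rates, 2))
cat(sprintf("absolute lift = %.2f, relative lift = %.1f%%\n",
            abs_lift, 100 * rel_lift))
## absolute lift = 0.04, relative lift = 33.3%
```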

Hypotheses

\(H_0: p_A = p_B\)

\(H_a: p_A \neq p_B\)

  • \(p_A\), \(p_B\) = true conversion rates for each group
  • Significance level: \(\alpha = 0.05\)
  • Reject \(H_0\) when \(|z| > 1.96\)
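
The 1.96 cutoff is the standard normal quantile that leaves \(\alpha/2 = 0.025\) in each tail. A brief sketch of where it comes from; `reject_h0` is a hypothetical helper name used only for illustration.

```r
alpha <- 0.05

# Two-sided test: put alpha/2 in each tail of the standard normal
z_crit <- qnorm(1 - alpha / 2)
round(z_crit, 2)
## [1] 1.96

# Hypothetical helper: TRUE when a z-score lands in the rejection region
reject_h0 <- function(z, alpha = 0.05) abs(z) > qnorm(1 - alpha / 2)
reject_h0(c(-2.5, 1.0))
## [1]  TRUE FALSE
```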

Test Statistic

\[z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}\,(1-\hat{p})\!\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}\]

Where \(\hat{p}\) is the pooled proportion across both groups:

\[\hat{p} = \frac{x_A + x_B}{n_A + n_B} = \frac{144 + 192}{2400} = 0.14\]
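
Substituting the example's numbers into the test statistic gives the following worked calculation (intermediate values are rounded):

\[\hat{p}_A = \frac{144}{1200} = 0.12, \qquad \hat{p}_B = \frac{192}{1200} = 0.16\]

\[z = \frac{0.12 - 0.16}{\sqrt{0.14 \times 0.86 \times \left(\frac{1}{1200} + \frac{1}{1200}\right)}} \approx \frac{-0.04}{0.014166} \approx -2.824\]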

Running the Test in R

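# Observed conversions (x) and visitors (n) for each group
x_A <- 144; n_A <- 1200
x_B <- 192; n_B <- 1200

# Pooled proportion and standard error under H0
p_hat <- (x_A + x_B) / (n_A + n_B)
se <- sqrt(p_hat * (1 - p_hat) * (1/n_A + 1/n_B))

# Two-sided z-test
z <- (x_A/n_A - x_B/n_B) / se
p_value <- 2 * pnorm(-abs(z))

cat(sprintf("z = %.3f, p-value = %.4f\n", z, p_value))
## z = -2.824, p-value = 0.0047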
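
As a cross-check, base R's `prop.test()` runs an equivalent two-sample test when the continuity correction is turned off. It reports a chi-squared statistic, which for two groups equals \(z^2\), so its p-value should match the manual calculation. A sketch using the same counts:

```r
# Same counts as the manual calculation
x <- c(A = 144, B = 192)     # conversions
n <- c(A = 1200, B = 1200)   # visitors

# correct = FALSE disables the continuity correction so the result
# matches the pooled z-test (the default, correct = TRUE, would not)
res <- prop.test(x, n, alternative = "two.sided", correct = FALSE)

z_from_chisq <- -sqrt(unname(res$statistic))  # negative because p_A < p_B
cat(sprintf("z = %.3f, p-value = %.4f\n", z_from_chisq, res$p.value))
## z = -2.824, p-value = 0.0047
```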

[Figure: Conversion Rate Comparison]

[Figure: Example Daily Conversion Rates Over 14 Days]

[Figure: Where Does Our Z-Score Fall?]
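
A plot of this kind could be drawn in base R roughly as follows. The colors, axis range, and label placement are assumptions made for this sketch, not the original figure's settings.

```r
z_obs  <- -2.824            # z-score from the R output above
z_crit <- qnorm(0.975)      # two-sided critical value for alpha = 0.05

# Standard normal density
curve(dnorm(x), from = -4, to = 4, xlab = "z", ylab = "Density",
      main = "Where Does Our Z-Score Fall?")

# Shade both rejection regions (|z| > 1.96)
left  <- seq(-4, -z_crit, length.out = 100)
right <- seq(z_crit, 4, length.out = 100)
polygon(c(-4, left, -z_crit), c(0, dnorm(left), 0), col = "tomato", border = NA)
polygon(c(z_crit, right, 4),  c(0, dnorm(right), 0), col = "tomato", border = NA)

# Mark the observed z-score
abline(v = z_obs, lty = 2, lwd = 2)
text(z_obs, 0.1, labels = sprintf("z = %.3f", z_obs), pos = 2)
```

The dashed line sits to the left of the shaded boundary at -1.96, which is what places the observed z-score inside the rejection region.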

Interpretation

With z = -2.824 and p-value = 0.0047:

  • \(p < 0.05\), so we reject \(H_0\)
  • Our z-score falls in the rejection region (past -1.96), as shown in the previous plot
  • The green “Get Started Free” button converts at a significantly higher rate than the original (16% vs. 12%)

Summary

  • A/B testing uses hypothesis testing to make data-driven product decisions
  • A low p-value means a difference at least this large would be unlikely if both versions truly converted at the same rate