A/B Testing

Why A/B Testing?

Split users randomly into two groups
Group A (Control): the original version
Group B (Treatment): the new version
Track a metric (e.g. conversion rate) for both groups
Use statistics to decide if the difference is real or just chance
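
The random split can be sketched in R as follows. This is a minimal illustration, assuming a hypothetical pool of 2,400 visitors and an equal 50/50 allocation; the names `n_visitors` and `group` are made up for this sketch.

```r
# Hypothetical sketch: randomly assign visitors to two equal-sized groups.
set.seed(42)                       # fixed seed so the shuffle is reproducible
n_visitors <- 2400                 # assumed total number of visitors

# Build 1,200 "A" labels and 1,200 "B" labels, then shuffle their order
group <- sample(rep(c("A", "B"), each = n_visitors / 2))

table(group)                       # counts per group: 1,200 each
```

Shuffling a fixed set of labels guarantees equal group sizes; assigning each visitor by an independent coin flip is also common and gives sizes that are only approximately equal.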

Example

A company tests two versions of a sign-up button on their landing page.

Group A (Control)
- Button: Blue “Sign Up”
- Visitors: 1,200
- Conversions: 144 (12%)
Group B (Treatment)
- Button: Green “Get Started Free”
- Visitors: 1,200
- Conversions: 192 (16%)

Hypotheses

\(H_0: p_A = p_B\)

\(H_a: p_A \neq p_B\)

\(p_A\), \(p_B\) = true conversion rates for each group
Significance level: \(\alpha = 0.05\)
Reject \(H_0\) when \(|z| > 1.96\)
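
The 1.96 cutoff is the two-sided critical value of the standard normal distribution at \(\alpha = 0.05\). A short sketch of where it comes from (the variable names `alpha` and `z_crit` are illustrative):

```r
alpha <- 0.05
# Two-sided test: put alpha/2 in each tail of the standard normal
z_crit <- qnorm(1 - alpha / 2)
round(z_crit, 2)   # 1.96
```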

Test Statistic

\[z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}\,(1-\hat{p})\!\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}\]

Where \(\hat{p}\) is the pooled proportion across both groups:

\[\hat{p} = \frac{x_A + x_B}{n_A + n_B} = \frac{144 + 192}{2400} = 0.14\]
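
Plugging the example counts into the formula above gives the following worked calculation (intermediate values are rounded):

\[\hat{p}_A = \frac{144}{1200} = 0.12, \qquad \hat{p}_B = \frac{192}{1200} = 0.16\]

\[\sqrt{\hat{p}\,(1-\hat{p})\!\left(\frac{1}{n_A} + \frac{1}{n_B}\right)} = \sqrt{0.14 \times 0.86 \times \frac{2}{1200}} \approx 0.01417\]

\[z = \frac{0.12 - 0.16}{0.01417} \approx -2.82\]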

Running the Test in R

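# Observed conversions (x) and visitors (n) for each group
x_A <- 144; n_A <- 1200
x_B <- 192; n_B <- 1200

# Pooled proportion and standard error under H0
p_hat <- (x_A + x_B) / (n_A + n_B)
se <- sqrt(p_hat * (1 - p_hat) * (1/n_A + 1/n_B))

# z statistic and two-sided p-value
z <- (x_A/n_A - x_B/n_B) / se
p_value <- 2 * pnorm(-abs(z))

cat(sprintf("z = %.3f, p-value = %.4f\n", z, p_value))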

## z = -2.824, p-value = 0.0047
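
As a cross-check, base R's `prop.test()` should give the same answer when the continuity correction is turned off. It reports a chi-squared statistic, which for two groups equals \(z^2\). This is a sketch of one way to verify the manual calculation; `res` is an illustrative name.

```r
# Two-sample test of equal proportions, without continuity correction
res <- prop.test(x = c(144, 192), n = c(1200, 1200), correct = FALSE)

# X-squared should equal z^2 (about 7.97); the p-value should match the z-test
cat(sprintf("X-squared = %.3f, p-value = %.4f\n", res$statistic, res$p.value))
```

With the default `correct = TRUE`, `prop.test()` applies Yates' continuity correction, so its statistic and p-value differ slightly from the manual z-test.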

[Figure: Conversion Rate Comparison]

[Figure: Example Daily Conversion Rates Over 14 Days]

[Figure: Where Does Our Z-Score Fall?]

Interpretation

With z = -2.824 and p-value = 0.0047:

\(p < 0.05\), so we reject \(H_0\)
Our z-score falls in the rejection region (past -1.96), as shown in the previous plot
The green “Get Started Free” button converts at a statistically significantly higher rate than the original blue “Sign Up” button (16% vs. 12%)
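
The test says whether the difference is statistically significant, not how large it is. One common way to express the size is a 95% confidence interval for the difference in conversion rates. The sketch below uses a Wald interval with the unpooled standard error, which is an assumption of this example and not a method stated in the slides; the names `diff_hat`, `se_diff`, and `ci` are illustrative.

```r
x_A <- 144; n_A <- 1200
x_B <- 192; n_B <- 1200

p_A <- x_A / n_A
p_B <- x_B / n_B
diff_hat <- p_B - p_A                      # observed lift: 0.04

# Unpooled standard error of the difference (no assumption that p_A = p_B)
se_diff <- sqrt(p_A * (1 - p_A) / n_A + p_B * (1 - p_B) / n_B)

# 95% Wald confidence interval for p_B - p_A
ci <- diff_hat + c(-1, 1) * qnorm(0.975) * se_diff
cat(sprintf("lift = %.3f, 95%% CI = [%.3f, %.3f]\n", diff_hat, ci[1], ci[2]))
```

The interval is roughly 1.2 to 6.8 percentage points and excludes zero, which agrees with rejecting \(H_0\).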

Summary

A/B testing uses hypothesis testing to make data-driven product decisions
A low p-value means a difference at least this large would be unlikely if the two versions truly had the same conversion rate