What is A/B testing? Why do we do it?

Design-led companies (Apple, Google, Airbnb, etc.) frequently apply design thinking to design new products (Naiman 2020). A/B testing (also known as split testing or bucket testing) is “a method of comparing two versions of a webpage or app against each other to determine which one performs better” (Optimizely, 2019). We do it so that design decisions rest on how users actually behave rather than on opinion.

How A/B testing works

Some people are randomly assigned to group A and others to group B. Each group is exposed to a different treatment of some underlying variable, such as the discount amount or the ad copy. This underlying variable is what gets “manipulated.” Marketing researchers or data scientists then observe one or more outcomes that might be affected by the manipulated variable.

We then create a dummy variable, XA, that equals 1 for people in group A and 0 for people in group B. To evaluate the effect of the two treatments (A and B), we run the following regression:

Y = β0 + β1*XA + ϵ

The coefficient β1 can be interpreted as the additional effect that treatment A has on the outcome variable compared to treatment B. β0 is the average outcome, i.e., the predicted value of Y, for group B (XA = 0). If one of the treatments is considered the “control,” the control group is usually used as the baseline in the regression, i.e., the group that is left out and absorbed into the intercept β0.
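To see these interpretations in action, here is a minimal R sketch on simulated data (the sample size, group means, and noise level are made-up assumptions, not real results). It randomly assigns people to groups A and B, builds the dummy XA, fits the regression with lm(), and checks that the intercept equals the mean of group B while the slope equals the difference in group means.

```r
# Hypothetical simulated data: 40 people, purchase amounts are made up
set.seed(42)
n <- 40

# Randomly assign exactly half of the people to group A and half to group B
group <- sample(rep(c("A", "B"), each = n / 2))

# Dummy variable XA: 1 for treatment A, 0 for the baseline group B
x_a <- ifelse(group == "A", 1, 0)

# Simulated outcome: group B averages about 100, group A about 20 higher
y <- 100 + 20 * x_a + rnorm(n, sd = 15)

# Fit Y = b0 + b1 * XA + error
fit <- lm(y ~ x_a)
b <- coef(fit)

# The intercept equals the sample mean of group B ...
mean_b <- mean(y[x_a == 0])
# ... and the slope equals the difference in means (A minus B)
diff_ab <- mean(y[x_a == 1]) - mean_b

print(c(intercept = unname(b[1]), mean_b = mean_b))
print(c(slope = unname(b[2]), diff_ab = diff_ab))
```

With a single 0/1 dummy these equalities hold for any data (up to floating-point error), because least squares simply fits each group's mean.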

What is dummy coding? Why do we need it?

Dummy coding is the standard way to include a categorical variable, such as treatment versus control, in a regression model. It turns the categorical variable into a series of dichotomous variables (variables that can only take the value zero or one): a variable with k levels becomes k - 1 dummy variables. Each dummy variable equals 1 for observations in the level it represents (e.g., the treatment group) and 0 otherwise, so the reference/control/placebo group is 0 on every dummy. This gives the intercept a clear meaning: since the intercept is the expected value of Y when all X = 0, it is the mean outcome of the reference group. For more details, please see the UCLA statistics site (available at https://stats.idre.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/).
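As a small illustration, R's model.matrix() shows the dummy coding that lm() builds from a factor under R's default treatment contrasts. The four-level variable ads below is hypothetical.

```r
# A hypothetical categorical variable with four levels; level 0 is the reference
ads <- factor(c(0, 1, 2, 3, 1, 0))

# model.matrix() applies R's default treatment (dummy) coding to the factor
X <- model.matrix(~ ads)
print(X)
# Four levels become an intercept plus three 0/1 columns (ads1, ads2, ads3);
# rows at the reference level 0 are zero in all three dummy columns
```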

Example - Analyzing the relationship between advertising exposures and product purchase

It is suggested that “the effect of advertising appears non-linear, with an optimum between two and three exposures per week” (Tellis, 1987). In the example below, we use regression analysis to test whether advertising exposure is related to product purchases. Our null hypothesis (usually denoted H0) is that there is no relationship between advertising exposure and product purchases. The alternative hypothesis (usually denoted H1) is that there is such a relationship. In terms of the regression coefficient β1, the hypothesis test can be written as follows:

  • Null hypothesis: H0: β1 = 0
  • Alternative hypothesis: H1: β1 ≠ 0
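A minimal sketch of how the test of H0: β1 = 0 is carried out: the t statistic is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with the residual degrees of freedom. The three input numbers are copied from the Ads1 row of the regression output in Example 1; because they are rounded, the results match that output only approximately.

```r
# Figures copied from the Ads1 row of Example 1's regression output
estimate  <- 41.571  # estimated beta1
std_error <- 9.630   # its standard error
df_resid  <- 36      # residual degrees of freedom (38 rows - 2 coefficients)

# t statistic for H0: beta1 = 0
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# Decision at the 5% significance level
reject_h0 <- p_value < 0.05

print(round(c(t = t_stat, p = p_value), 6))
print(reject_h0)
```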

First, we create a new variable for each level of the categorical variable except the reference level; each new variable has a value of one for observations at that level and zero for all others. In our example, using the variable Ads, the first new variable (Ads1) has a value of one for each observation in which the consumer was exposed to the 1st ad campaign and zero for all other observations. Likewise, Ads2 is 1 when the consumer was exposed to the 2nd ad campaign and 0 otherwise, and Ads3 is 1 when the consumer was exposed to the 3rd ad campaign and 0 otherwise. The level of the categorical variable that is coded as zero in all of the new variables is the reference level, the level to which all of the other levels are compared. In our example, the reference level is Ads = 0 (no ad exposure). Our objective is to see which ad campaign leads to more product sales.
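The dummy coding described above can be sketched in R as follows. The ads and purchase values are hypothetical, chosen only so the arithmetic is easy to check; the point is that the manually built Ads1, Ads2, and Ads3 give the same coefficients as letting lm() dummy-code factor(ads) itself.

```r
# Hypothetical data: campaign shown to each consumer (0 = no ad) and purchases
ads      <- c(0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3)
purchase <- c(50, 120, 90, 70, 60, 130, 85, 75, 55, 125, 95, 80)

# Manual dummy coding: one 0/1 column per non-reference level
Ads1 <- ifelse(ads == 1, 1, 0)
Ads2 <- ifelse(ads == 2, 1, 0)
Ads3 <- ifelse(ads == 3, 1, 0)
fit_manual <- lm(purchase ~ Ads1 + Ads2 + Ads3)

# Letting R build the same dummies from a factor (level 0 is the reference)
fit_factor <- lm(purchase ~ factor(ads))

print(coef(fit_manual))
print(coef(fit_factor))
```

Here the intercept (55) is the mean purchase at the reference level Ads = 0, and each coefficient is that campaign's mean minus 55 (for example, 125 - 55 = 70 for Ads1).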

Example 1 - A simple A/B test

You can also perform this analysis in Excel.

#setwd("C:/Users/zxu3/Documents/R/abtesting")
#install.packages("readr")
library(readr)
data <- read_csv("abtesting.csv")
## Rows: 38 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ls(data) # list the variables in the dataset
## [1] "Ads"      "Purchase"
head(data) #list the first 6 rows of the dataset
## # A tibble: 6 × 2
##     Ads Purchase
##   <dbl>    <dbl>
## 1     1      113
## 2     0       83
## 3     0       52
## 4     1      119
## 5     1      188
## 6     0       99
# creating the factor variable
data$Ads <- factor(data$Ads)
is.factor(data$Ads)
## [1] TRUE
# showing the first 15 rows of the variable "Ads"
data$Ads[1:15]
##  [1] 1 0 0 1 1 0 0 1 1 1 0 0 0 1 0
## Levels: 0 1
#now we do the regression analysis and examine the results
summary(lm(Purchase~Ads, data = data))
## 
## Call:
## lm(formula = Purchase ~ Ads, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.000 -23.250   3.071  22.643  51.000 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   95.429      6.441  14.816  < 2e-16 ***
## Ads1          41.571      9.630   4.317 0.000118 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.52 on 36 degrees of freedom
## Multiple R-squared:  0.3411, Adjusted R-squared:  0.3228 
## F-statistic: 18.64 on 1 and 36 DF,  p-value: 0.0001184

Example 2 - An A/B test in one line of code (no separate factor-conversion step)

#Alternatively, you can also use the factor function within the lm function, saving the step of creating the factor variable first.
summary(lm(Purchase~ factor(Ads), data))
## 
## Call:
## lm(formula = Purchase ~ factor(Ads), data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -57.000 -23.250   3.071  22.643  51.000 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    95.429      6.441  14.816  < 2e-16 ***
## factor(Ads)1   41.571      9.630   4.317 0.000118 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29.52 on 36 degrees of freedom
## Multiple R-squared:  0.3411, Adjusted R-squared:  0.3228 
## F-statistic: 18.64 on 1 and 36 DF,  p-value: 0.0001184

Example 3 - Performing an A/B Test on Advertising Effectiveness

# Load necessary libraries
library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Load the dataset
data_ab <- read_csv("ab_testing.csv")
## Rows: 80 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Ads, Purchase
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Check the structure of the dataset
str(data_ab)
## spc_tbl_ [80 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Ads     : num [1:80] 1 0 3 0 1 1 2 2 2 0 ...
##  $ Purchase: num [1:80] 152 21 77 65 183 87 121 104 116 82 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Ads = col_double(),
##   ..   Purchase = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
# Ensure the Ads column is correctly formatted as a factor
data_ab$Ads <- as.factor(data_ab$Ads)

# Check unique values in Ads
unique_ads <- unique(data_ab$Ads)
print(unique_ads)  # Print unique values to verify
## [1] 1 0 3 2
## Levels: 0 1 2 3
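# Keep only the control group (Ads = 0) and the first campaign (Ads = 1) so the
# t-test compares exactly two groups; droplevels() removes the now-empty levels 2 and 3
data_ab <- data_ab %>%
  filter(Ads %in% c("0", "1")) %>%
  droplevels()

# Student's two-sample t-test; var.equal = TRUE pools the variances, which makes
# it equivalent to a regression of Purchase on the Ads dummy
t_test_result <- t.test(Purchase ~ Ads, data = data_ab, var.equal = TRUE)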

# Display test results
print(t_test_result)
## 
##  Two Sample t-test
## 
## data:  Purchase by Ads
## t = -7.5782, df = 40, p-value = 2.975e-09
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -95.90693 -55.52164
## sample estimates:
## mean in group 0 mean in group 1 
##        55.38095       131.09524

Interpretation of Results

The t-test shows a statistically significant difference in purchases between the two groups (Ads = 0 and Ads = 1), with a p-value of 2.975e-09, far below the 0.05 significance level. We therefore reject the null hypothesis of equal means and conclude that the ad campaign had a significant effect on purchases.

Summary of Purchase Means

  • Control group (Ads = 0) → Average purchase: 55.38
  • Treatment group (Ads = 1) → Average purchase: 131.10

This suggests that people exposed to the ad purchased far more, on average, than those who were not.

Additionally, the 95% confidence interval for the difference in means (group 0 minus group 1) ranges from -95.91 to -55.52. Intervals constructed this way contain the true difference 95% of the time, so the data are consistent with the ad raising average purchases by between about 55.5 and 95.9. Because the entire interval lies below zero, it further supports the conclusion that the ad campaign led to a significant increase in purchases.
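The pooled-variance t-test in Example 3 and the dummy-variable regression in Example 1 are two views of the same test. The sketch below checks this on simulated data (the group sizes, means, and standard deviation are assumptions, not the tutorial's data): the t statistic has the same magnitude as the regression t value for the dummy, with the opposite sign, and the p-values are identical.

```r
set.seed(1)
# Hypothetical simulated purchases for a control (0) and a treatment (1) group
ads <- factor(rep(c(0, 1), each = 25))
purchase <- c(rnorm(25, mean = 55, sd = 30), rnorm(25, mean = 130, sd = 30))

# Student's t-test with pooled variance, as in Example 3
tt <- t.test(purchase ~ ads, var.equal = TRUE)

# The equivalent regression on the dummy-coded factor, as in Example 1
fit <- lm(purchase ~ ads)
reg_row <- coef(summary(fit))["ads1", ]

# Same |t| and p-value; the t-test sign is flipped because it reports
# mean(group 0) - mean(group 1), while the slope is group 1 - group 0
print(c(t_test = unname(tt$statistic), regression = unname(reg_row["t value"])))
print(c(t_test = tt$p.value, regression = unname(reg_row["Pr(>|t|)"])))
```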


Regression Analysis Interpretation (Example 1)

Since the p-value for Ads1 is 0.000118 (less than 0.05), we reject the null hypothesis in favor of the alternative hypothesis.

The coefficient for Ads1 in the regression output is 41.57, indicating that consumers exposed to the advertising campaign (Ads = 1) purchased 41.57 more, on average, than consumers with no advertising exposure (Ads = 0).

  • Estimated coefficients:
    • β₀ (Intercept) = 95.43
    • β₁ (Ads1 effect) = 41.57

Predictions:
- Control group (Ads = 0) expected sales = 95.43
- Treatment group (Ads = 1) expected sales = 95.43 + 41.57 × 1 = 137

This confirms that the advertising campaign significantly boosted purchases.
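The predictions above are just the regression equation evaluated at Ads1 = 0 and Ads1 = 1. A minimal sketch of that arithmetic, using the rounded coefficients from Example 1:

```r
# Coefficients as reported in Example 1's regression output
b0 <- 95.429  # intercept: mean purchases in the control group (Ads = 0)
b1 <- 41.571  # Ads1: additional purchases in the treatment group (Ads = 1)

# Predicted value Y = b0 + b1 * Ads1 for Ads1 = 0 and Ads1 = 1
ads1 <- c(control = 0, treatment = 1)
predicted <- b0 + b1 * ads1
print(predicted)
```

With the Example 1 data loaded and Ads converted to a factor, predict(lm(Purchase ~ Ads, data = data), newdata = data.frame(Ads = factor(c(0, 1)))) should return the same two values without rounding.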


Summary

The A/B test and regression results strongly indicate that the ad campaign had a positive and significant effect on sales in these data, which makes it a promising marketing strategy.


A question you might want to ask: why does this tutorial use both Excel and R?

You will find exactly the same coefficients using Regression in Excel's Data Analysis tool (part of the Analysis ToolPak add-in). However, Excel struggles with large datasets (hundreds of thousands of records) (Gapintelligence, 2020). Additionally, if you would like to perform the analysis and document the whole process (e.g., objectives, methods, hypotheses, results, and discussion), then RStudio with R Markdown is probably one of the best choices.

Reference: Understanding R programming over Excel for Data Analysis https://www.gapintelligence.com/blog/understanding-r-programming-over-excel-for-data-analysis/

References

Families buy more sugary cereal if advertising targets kids, not adults. https://www.npr.org/sections/shots-health-news/2025/02/04/nx-s1-5285413/cereal-sugar-kids-advertising-health

A/B Testing: Test Your Own Hypotheses & Prepare to be Wrong - Stuart Frisby. https://www.youtube.com/watch?v=VQpQ0YHSfqM&t=189s

Naiman 2020. Design Thinking as a Strategy for Innovation. https://www.creativityatwork.com/design-thinking-strategy-for-innovation/

Tellis 1987. Advertising Exposure, Loyalty, and Brand Purchase: A Two-Stage Model of Choice. Marketing Science Institute. https://www.msi.org/reports/advertising-exposure-loyalty-and-brand-purchase-a-two-stage-model-of-choice/

https://stats.idre.ucla.edu/r/modules/coding-for-categorical-variables-in-regression-models/

Create an A/B test, https://support.google.com/optimize/answer/6211930?hl=en

Experiments at Airbnb, https://medium.com/airbnb-engineering/experiments-at-airbnb-e2db3abf39e7

https://firstround.com/review/How-design-thinking-transformed-Airbnb-from-failing-startup-to-billion-dollar-business/

Your Step-by-Step Guide to A/B Testing with Google Optimize, https://www.crazyegg.com/blog/ab-testing-google-analytics/

https://firebase.google.com/products/ab-testing

https://www.sitepoint.com/perform-ab-testing-google-optimize/

https://marketingplatform.google.com/about/optimize/features/