Module on Adaptive Experimentation

Author

Fred Haochen Song & Ilya Musabirov

Published

November 15, 2024


Part I: Module Syllabus

Description

This module introduces adaptive experiments as a human-centered approach to experimental design that dynamically adjusts participant allocation based on observed outcomes. Building on previous topics of exploratory data analysis, visualization, and machine learning fairness, it explores how to balance exploration and exploitation in real-world settings.

Learning Goals

Understanding (Basic)

Define key concepts: adaptive experiments, response-adaptive randomization, exploration vs exploitation
Explain how adaptive allocation probabilities change based on accumulated evidence

Application (Intermediate)

Compare traditional RCTs vs adaptive experiments for specific use cases
Apply adaptive experiment principles to educational technology scenarios

Analysis (Advanced)

Analyze stakeholder interests and ethical implications in different experimental contexts
Evaluate tradeoffs between statistical validity and participant outcomes using situational decision-making

Integration with Previous Topics

Visualization: Using plots and diagrams to understand experimental allocation over time
Human-Centered ML: Considering fairness and impact on participants when designing experiments
Ethics: Balancing research goals with participant benefits and potential risks

These goals incorporate connections to the previous topics of the Interactive Data Science course: data visualization (Unit 2), human-centered approaches (Unit 3), and ethical considerations (Unit 5) while introducing new concepts in experimental design.

Content

For the IDS course, the module is organized into two parts:

An online module on the Open Learning Initiative (OLI) platform, completed by students before the class
A classroom activity for the class with individual and group work

Formative Assessment

The OLI module uses a combination of multiple-choice questions (with distractors designed to correspond to common misconceptions) and schema-building activities (mix and match, elaboration), supporting contextual understanding.
As this is a graduate-level course, some questions are designed to push understanding forward, nudging students to reflect on the material they just learned. This deliberate challenge is offset by the non-graded nature of these questions and by answer-specific feedback for every option, which explains the underlying misconception to the student.
Classroom tasks aim to connect this understanding to other course topics and to ethical decision-making frameworks, structuring the discussion around the applicability of research method choices and the trade-offs behind them.

Part II: Self-Reading and Practicing Content (Before Class):

Human-Centered Data Science gives us new ways to analyze and apply data. We can combine ideas from traditional experiments with machine learning methods to run an adaptive experiment.

In an adaptive experiment, we adjust our design (typically the allocation of participants to different conditions) based on what we learn from each interaction with participants. The goal is usually to balance exploration (testing various conditions to gather more information about their effectiveness) with exploitation (using the algorithm’s knowledge so far to allocate participants to the currently better-performing conditions).

One example of balancing exploration and exploitation is finding good restaurants in a new city. At first, you explore different options, trying various places to gather information about what’s good. Once you’ve identified a standout spot, you start visiting it more often, but still occasionally try new places in case there’s an even better one you haven’t discovered yet.

One strategy for balancing exploration and exploitation in practice is response-adaptive randomization, where the probabilities of assigning participants to each condition are adjusted based on accumulated evidence. As more data from participants is gathered over time, the allocation gradually shifts away from an even split (e.g., 50/50) toward favoring the conditions that perform better according to the available evidence.

Question:

Could you suggest another example illustrating the idea of an adaptive experiment?

Thanks for your response! Another example to consider is testing the effectiveness of two types of vaccines.

Question:

Select a true statement about randomization.

Response-adaptive randomization always starts from unequal probabilities of assignment for each condition.
Traditional (uniform-randomly assigned) experiments express our preference toward exploration compared to exploitation.
The goal of response-adaptive randomization is to achieve equal size of groups for each condition.
Randomization always assumes equal probabilities of assignment to conditions.

Traditional (uniform-randomly assigned) experiments express our preference toward exploration compared to exploitation:

Explanation of Options:

Option 1: False. Response-adaptive randomization typically begins with equal probabilities and adjusts them based on participant responses; it does not always start with unequal probabilities.
Option 2: True. Uniform randomization keeps the assignment probabilities equal no matter what has been observed, so it never exploits the evidence gathered so far. It is pure exploration.
Option 3: False. The goal of response-adaptive randomization is to allocate more participants to better-performing conditions, not to equalize group sizes.
Option 4: False. Randomization only requires that assignment be governed by chance. The probabilities can be unequal, as they are in response-adaptive randomization once evidence accumulates.

Let’s recap some definitions

Condition: A specific treatment, arm, or variation assigned to a participant during the experiment.

Reward: A measurable outcome or benefit that results from a participant’s interaction with one of the conditions. In the simplest case, the reward can be binary: a success or a failure, e.g., answering a test question correctly or not.

Response-adaptive randomization: A method used in experiments where the probability of assigning participants to different conditions changes over time based on observed results (e.g., rewards).

Adaptive experiment: A type of randomized trial (used, for example, in clinical trials and educational research) in which we adjust our design, typically the allocation to different conditions, based on what we learn from interactions with participants. In this module, we will focus on adaptive experiments that use statistical and machine learning methods to perform response-adaptive randomization.
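To make these definitions concrete, here is a minimal sketch of response-adaptive randomization with two conditions and a binary reward. It is an illustration under stated assumptions rather than the exact procedure used in this module: the true success rates are hypothetical, rewards are simulated, and allocation follows a Thompson-sampling-style rule with a uniform Beta(1, 1) starting belief for each condition.

```r
# Minimal sketch of response-adaptive randomization with a binary reward.
# Assumptions (not from the module): hypothetical true success rates,
# simulated rewards, and Thompson-sampling-style allocation with Beta(1, 1) priors.
set.seed(1)

true_rates <- c(0.50, 0.60)          # hypothetical success rate of each condition
n_participants <- 500
successes <- c(0, 0)                 # successes observed so far, per condition
failures  <- c(0, 0)                 # failures observed so far, per condition
assigned  <- integer(n_participants) # condition given to each participant

for (i in seq_len(n_participants)) {
  # Draw one plausible success rate per condition from its current Beta posterior
  draws <- rbeta(length(true_rates), successes + 1, failures + 1)
  # Assign the participant to the condition with the largest draw
  k <- which.max(draws)
  assigned[i] <- k
  # Observe the simulated binary reward and update that condition's counts
  reward <- rbinom(1, 1, true_rates[k])
  successes[k] <- successes[k] + reward
  failures[k]  <- failures[k] + (1 - reward)
}

# Share of participants assigned to each condition over the whole experiment
allocation <- tabulate(assigned, nbins = length(true_rates)) / n_participants
print(allocation)
```

Early on, both conditions are drawn about equally often because little is known. As rewards accumulate, the draws for the better-performing condition tend to be larger, so it receives more participants, while the other condition is still selected occasionally.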

Question:

Developing your example of an adaptive experiment from the task above, suggest conditions for your example. Feel free to discuss alternative ideas.

Thank you for your response! Example conditions could include two types of treatments such as:

Vaccine A with a higher initial dosage.
Vaccine B with a standard dosage and booster after one month.

These conditions would allow for comparing the effectiveness and adaptively assigning participants based on observed outcomes.

Question:

Developing your example of an adaptive experiment from the task above, suggest a reward for your example. Feel free to discuss alternative ideas.

Thank you for your response! An example reward could be whether an individual participant survives after receiving the treatment (mortality). Alternatively, it could measure improvements in the participant’s health, such as reduced symptom severity or time to recovery.

Question:

Which of the following statements about adaptive experiments is correct?

Adaptive experiments are not usually randomized.
Adaptive experiments are not usually uniformly randomized.
Adaptive experiments ensure an even split of participants across all conditions throughout the experiment.
The primary goal of adaptive experiments is to avoid randomization entirely.

Adaptive experiments are not usually uniformly randomized:

Explanation of Options:

Option 1: False. Adaptive experiments often involve randomization, but the randomization is adjusted dynamically based on prior responses.
Option 2: True. Adaptive experiments typically deviate from uniform randomization, dynamically adjusting the allocation probabilities based on observed data to optimize outcomes.
Option 3: False. Adaptive experiments do not aim to ensure an even split of participants across conditions; instead, they aim to allocate participants based on performance or other criteria.
Option 4: False. The primary goal of adaptive experiments is to improve efficiency and outcomes, not to eliminate randomization entirely.

Question:

Which of the following statements about adaptive experiments is correct?

Response-adaptive randomization always requires observing rewards immediately.
Rewards in adaptive experiments are always binary, representing success or failure.
One condition in adaptive experiments should always be control.
To update the probabilities of assigning the participants to different conditions, we have to observe rewards.

To update the probabilities of assigning the participants to different conditions, we have to observe rewards:

Explanation of Options:

Option 1: False. While observing rewards is necessary for updating probabilities, the observation does not always need to be immediate; delayed rewards can also be incorporated into adaptive randomization.
Option 2: False. Rewards in adaptive experiments can take various forms, such as continuous values, ordinal scales, or other metrics, not just binary success/failure.
Option 3: False. Adaptive experiments do not necessarily require a control condition; they can involve multiple experimental conditions with no distinct control group.
Option 4: True. Observing rewards is essential for updating the allocation probabilities in adaptive experiments, as the reward feedback informs how conditions are adjusted dynamically.

EXAMPLE

Full Example of an Adaptive Experiment

Now that we have some understanding of adaptive experiments, let’s examine a real-world-inspired example. This example will focus on analyzing results and comparing outcomes between a traditional uniformly-randomly assigned Randomized Controlled Trial (RCT) and an adaptive experiment.

Learning designer Yi wants to encourage student self-reflection after course activities. Her goal is to help students think critically about their responses and learning processes.

Experiment Design

Learning designer Yi selects a specific decision point in the course: a particular activity where students are encouraged to reflect on their responses. She defines two conditions for her experiment:

Condition 1: No self-explanation prompt is shown to the students.
Condition 2: A self-explanation prompt is provided, asking: Can you explain why you chose your answer?

Yi defines the correctness of the student’s answer to a multiple-choice question (related to the activity) as the reward for the experiment. This allows her to measure the impact of reflection prompts on learning outcomes.

Comparing Traditional and Adaptive Experiments

Imagine that we ran both a traditional uniformly-randomly assigned RCT and an adaptive experiment in parallel, using the same design, to compare their results.

Results of the Traditional Experiment

In the traditional experiment, there is a statistically significant difference between the mean rewards (where the mean can be interpreted as the share of positive rewards for each condition). The results are as follows:

Condition 2 (self-explanation prompt): M2=0.608 (SEM=0.032)
Condition 1 (no prompt): M1=0.512 (SEM=0.032)
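The reported means and standard errors are enough to check the significance claim by hand. The sketch below assumes two independent groups and a normal approximation (a two-sample z-test); the module does not state which test was actually used, so treat this as one plausible reconstruction.

```r
# Sketch: two-sample z-test from the reported means and standard errors.
# Assumption: independent groups and a normal approximation; the test
# actually used in the module is not stated.
m1 <- 0.512; sem1 <- 0.032   # Condition 1 (no prompt)
m2 <- 0.608; sem2 <- 0.032   # Condition 2 (self-explanation prompt)

diff_means <- m2 - m1                   # difference in mean reward
se_diff    <- sqrt(sem1^2 + sem2^2)     # standard error of the difference
z          <- diff_means / se_diff      # z statistic
p_value    <- 2 * pnorm(-abs(z))        # two-sided p-value

print(c(difference = diff_means, z = z, p_value = p_value))
```

Under these assumptions the z statistic is roughly 2.1 and the two-sided p-value is about 0.03, which is consistent with the statement that the difference is statistically significant at the 5% level.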

Results of the Adaptive Experiment

In the adaptive experiment, the computed estimates were:

Condition 2 (self-explanation prompt): M2=0.599
Condition 1 (no prompt): M1=0.539

These results also favor the self-explanation prompt (Condition 2). However, as discussed earlier, the primary advantage of adaptive experiments lies in the allocation of participants to different conditions.

Allocation Differences

In the traditional experiment, as expected, participants were assigned almost equally to the two conditions. By contrast, in the adaptive experiment, the assignment proportions differed substantially, reflecting the algorithm’s ability to prioritize better-performing conditions (in this case, Condition 2).

Only 176 students (40%) were assigned to Condition 1 (the worse-performing condition), while the algorithm allocated 262 students (60%) to Condition 2, which our analysis identified as the better option. Relative to an even split, this is a shift of roughly 10 percentage points toward the better-performing condition. It illustrates how adaptive A/B comparisons can facilitate the rapid use of gathered evidence, which is especially relevant in real classroom settings.

Question:

What is the key benefit of adaptive experimentation compared to traditional Randomized Controlled Trials, as described in the example?

Adaptive experiments assign participants equally to each condition.
Adaptive experiments aim for more participants to be assigned to the condition with better outcomes as the experiment progresses.
Adaptive experiments are faster to conduct with fewer participants.
Adaptive experiments provide statistically significant results without needing a control group.

Adaptive experiments aim for more participants to be assigned to the condition with better outcomes as the experiment progresses:

Explanation of Options:

Option 1: Incorrect. Unlike traditional RCTs, adaptive experiments dynamically adjust participant allocation based on observed outcomes, rather than assigning participants equally.
Option 2: Correct. The key benefit of adaptive experimentation is the ability to allocate more participants to conditions with better outcomes, improving efficiency and relevance.
Option 3: Incorrect. Adaptive experiments may not always be faster or require fewer participants; their primary benefit lies in the adaptive allocation process.
Option 4: Incorrect. Adaptive allocation does not by itself guarantee statistically significant results. The usual statistical requirements for significance still apply, whether or not the design includes a distinct control group.

DID I GET THIS?

Question:

How would you explain the idea of an adaptive experiment to a child in 150 words or less?

Thanks for your response!

Imagine you’re trying to figure out the best flavor of ice cream for your birthday party, but you don’t know which one everyone likes the most. Instead of giving everyone the same flavor, you start by letting a few people try chocolate and a few people try vanilla. After they taste it, you see which one people like better. If more people like chocolate, you give more chocolate to the next group of friends to try, but you still let some people try vanilla, just in case their opinion changes things.

Over time, you keep adjusting how much chocolate or vanilla you give out based on what people like. By the end, you’ll know which flavor is the best for your party. That’s how an Adaptive Experiment works—it learns as it goes and tries to do better each step!

Question:

If Condition 1 has reward statistics (S = 2, F = 10), and Condition 2 has (S = 10, F = 2), where S is the number of successes and F the number of failures, on the next step we:

Are more likely to select Condition 2
Will always select Condition 1
Are equally likely to select any condition
Are more likely to select Condition 1
Will always select Condition 2

Are more likely to select Condition 2:

Explanation of Options:

Option 1: Correct. Condition 2 has a much higher success-to-failure ratio (S = 10, F = 2) than Condition 1 (S = 2, F = 10), making it more likely to be selected in the next step.
Option 2: Incorrect. While Condition 1 is less successful, adaptive experiments use probabilities for selection, and no condition is guaranteed to be selected on every step.
Option 3: Incorrect. Conditions are not equally likely to be selected, as the selection probabilities depend on observed reward statistics.
Option 4: Incorrect. Condition 1 has a lower reward ratio compared to Condition 2, so it is less likely to be selected.
Option 5: Incorrect. Condition 2 is more likely to be selected, but it is not guaranteed to be chosen every time.
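The answer can also be checked numerically. The sketch below assumes the selection rule used in the simulation code later in this module: each condition gets a Beta(S + 1, F + 1) posterior, and a condition is selected when its posterior draw is the largest. The helper name `prob_second_beats_first` is ours, introduced only for this illustration.

```r
# Sketch: probability that Condition 2 is selected on the next step.
# Assumption: selection follows posterior draws from Beta(S + 1, F + 1),
# as in the estimate_ap helper later in this module.
# prob_second_beats_first is a hypothetical helper name.
prob_second_beats_first <- function(s1, f1, s2, f2) {
  # P(theta2 > theta1) = integral of density2(x) * P(theta1 < x) over [0, 1]
  integrand <- function(x) dbeta(x, s2 + 1, f2 + 1) * pbeta(x, s1 + 1, f1 + 1)
  integrate(integrand, lower = 0, upper = 1)$value
}

p_select_2 <- prob_second_beats_first(s1 = 2, f1 = 10, s2 = 10, f2 = 2)
p_select_1 <- 1 - p_select_2

print(c(condition_1 = p_select_1, condition_2 = p_select_2))
```

The probability for Condition 2 comes out very close to 1 but not equal to it, which matches the distinction between "more likely to select" and "will always select".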

The following table displays the results of the experiments:

Experiment	Condition	Successes	Failures
Experiment A	Condition 1	4	6
	Condition 2	6	4
Experiment B	Condition 1	8	12
	Condition 2	12	8

Question:

Compare two experiments based on the table above. In both Experiment A and Experiment B, the success rate for Condition 1 is the same (40%). In both Experiment A and Experiment B, the success rate for Condition 2 is also the same (60%).

We know more about the effectiveness of every condition in Experiment A than in Experiment B
We know more about the effectiveness of every condition in Experiment B than in Experiment A
We know the same amount of information about the effectiveness of every condition in Experiments A and B

We know more about the effectiveness of every condition in Experiment B than in Experiment A:

Explanation of Options:

Option 1: Incorrect. While Experiment A has fewer participants, the proportional success rates do not provide higher certainty compared to Experiment B.
Option 2: Correct. Experiment B has a larger sample size, reducing variability and providing more information about the effectiveness of each condition.
Option 3: Incorrect. Although the success rates are identical, the amount of information differs due to the sample sizes.

Question:

We are more likely to select Condition 1 in Experiment A than in Experiment B
We are more likely to select Condition 1 in Experiment B than in Experiment A
We are equally likely to select Condition 1 in Experiment A and in Experiment B

We are more likely to select Condition 1 in Experiment A than in Experiment B:

Explanation of Options:

Option 1: Correct. Experiment A has a smaller sample size, resulting in posterior distributions with greater variability. The posteriors for the two conditions therefore overlap more, which increases the likelihood of selecting Condition 1 compared to Experiment B, where the larger sample size reduces posterior variability and shifts the selection toward the condition with the higher observed success rate (Condition 2).
Option 2: Incorrect. In Experiment B, the posterior is more concentrated due to the larger sample size, which reduces the likelihood of selecting Condition 1.
Option 3: Incorrect. Despite identical success rates, the difference in posterior variability makes the likelihood of selecting Condition 1 different between the experiments.
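The effect of sample size on selection can be illustrated with the counts from the table. This sketch again assumes Beta(successes + 1, failures + 1) posteriors and selection of the condition with the larger posterior draw; `beta_sd` and `prob_select_first` are hypothetical helper names used only here.

```r
# Sketch: compare Experiments A and B under Beta(successes + 1, failures + 1)
# posteriors. beta_sd and prob_select_first are hypothetical helper names.

# Posterior standard deviation of a Beta(s + 1, f + 1) distribution
beta_sd <- function(s, f) {
  a <- s + 1
  b <- f + 1
  sqrt(a * b / ((a + b)^2 * (a + b + 1)))
}

# Probability that the first condition's posterior draw exceeds the second's
prob_select_first <- function(s1, f1, s2, f2) {
  integrand <- function(x) dbeta(x, s1 + 1, f1 + 1) * pbeta(x, s2 + 1, f2 + 1)
  integrate(integrand, lower = 0, upper = 1)$value
}

sd_a <- beta_sd(4, 6)     # Condition 1 in Experiment A
sd_b <- beta_sd(8, 12)    # Condition 1 in Experiment B

p_a <- prob_select_first(4, 6, 6, 4)      # P(select Condition 1) in Experiment A
p_b <- prob_select_first(8, 12, 12, 8)    # P(select Condition 1) in Experiment B

print(c(sd_A = sd_a, sd_B = sd_b, p_select_1_A = p_a, p_select_1_B = p_b))
```

The posterior for Condition 1 is wider in Experiment A than in Experiment B, and the probability of selecting Condition 1 is correspondingly higher in Experiment A, even though the observed success rates are identical.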

Part III: Simulation-Based Scenario Analysis

This section helps you see how the concepts covered in the Adaptive Experimentation module evolve as data are collected. Using a real-life example, we review how these quantities change and behave over the course of data collection. Recall the following definitions:

Condition: A specific treatment, arm, or variation assigned to a participant during the experiment.
Reward: A measurable outcome or benefit that results from a participant’s interaction with one of the conditions. In the simplest case, the reward can be binary: a success or a failure, e.g., answering a test question correctly or not.
Probability of Selection: A likelihood that reflects not only the average effectiveness of each condition but also our level of uncertainty about this effectiveness. If we are early in an adaptive experiment and have limited information, or if the data we get on effectiveness of the conditions fluctuates significantly, the assignment probabilities will aim to account for this, encouraging more exploration to better understand each condition.
Posterior Probability: A distribution that represents the updated probability of a hypothesis or parameter after observing new data. It combines prior beliefs with the likelihood of the observed data to provide a refined estimate. Consider an experiment testing a new website feature that either leads to a purchase (success) or not (failure). Initially, you believe there’s a 50% chance the feature will result in a purchase. After showing the feature to 10 users and observing 7 successes, you update your belief. The posterior distribution now reflects a higher probability that the feature is effective, guiding future decisions like increasing its rollout.
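The website-feature example can be worked through as a Beta-Binomial update. The sketch assumes the initial belief is a uniform Beta(1, 1) prior (whose mean is 50%), which is our reading of "a 50% chance"; other priors with the same mean would give different numbers.

```r
# Sketch: Beta-Binomial update for the website-feature example.
# Assumption: the initial 50% belief is modeled as a uniform Beta(1, 1) prior.
prior_alpha <- 1
prior_beta  <- 1

successes <- 7    # purchases among the 10 users
failures  <- 3    # non-purchases among the 10 users

# Conjugate update: add successes to alpha and failures to beta
post_alpha <- prior_alpha + successes
post_beta  <- prior_beta + failures

post_mean       <- post_alpha / (post_alpha + post_beta)   # posterior mean success rate
prob_above_half <- 1 - pbeta(0.5, post_alpha, post_beta)   # P(success rate > 0.5 | data)

print(c(posterior_mean = post_mean, prob_above_half = prob_above_half))
```

Under this prior the posterior is Beta(8, 4), with a mean of about 0.67 and a probability of about 0.89 that the true purchase rate exceeds 50%.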

Example: Starbucks Marketing Strategy

As a newly hired marketing strategist at Starbucks, you are tasked with finding the optimal way to advertise discounts to customers to encourage purchases while maintaining profit margins. In this context, you have designed the following three discount advertising conditions. The marketing team has decided to run a response-adaptive experiment to further validate which condition works best:

Condition 1: The customer receives a fixed $5 off their next purchase over $15.

Condition 2: The customer receives 40% off their next purchase of any amount.

Condition 3: The customer can buy one coffee and get complimentary snacks or breakfast on their next purchase.

Question:

What is the reward variable in your experiment design?

The reward variable is whether or not the customer makes a purchase.

Question:

Is this reward a Binary reward?

Yes, it is! The customer either makes a purchase (1) or does not make a purchase (0).

Scenario 1:

Since you don’t know which condition works best (typically, you could obtain some prior information by examining historical data), you assume an equal start for all three conditions.

In the illustrative chart below, you observe that all three conditions have an equal probability of selection. There are no reports of successes or failures in the middle chart because you have not received any data from participants yet. Consequently, the posterior distribution is flat, or in other words, a Uniform(0,1) distribution, as you do not assume any prior information about the likelihood of success for each condition. In this case, a success is defined as a customer making a purchase, which is a binary reward.

Scenario 2:

Moving on to the end of Day 1, you have collected the following data:

Out of 11 customers who were assigned Condition 1 (fixed $5 off), 9 customers made a purchase.
Out of 4 customers who were assigned Condition 2 (40% off), 2 customers made a purchase.
Out of 5 customers who were assigned Condition 3 (complimentary food), 3 customers made a purchase.

Looking at the illustrative charts below, we see that Condition 1 currently has a very high purchase rate, giving it the largest probability of selection (77%). Condition 3 still performs slightly better than Condition 2, resulting in a higher probability of selection compared to Condition 2. However, these are early estimates and reports. As shown in the middle chart, the variability of the estimated success rates for Conditions 2 and 3 remains very large. While Condition 1 has been more explored, the posterior distribution updates in the right chart indicate that all conditions are still in the learning stage. This means that the probabilities of selection are subject to change as more data are collected. Let’s illustrate this in Scenario 3 below.
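The Day 1 selection probabilities can be approximated with a short Monte Carlo sketch. It follows the same idea as the `estimate_ap` helper shown later in this module (uniform Beta(1, 1) starting belief, count how often each condition has the largest posterior draw). The seed and the number of draws are arbitrary choices, so the estimates will differ slightly from the figures quoted above.

```r
# Sketch: Monte Carlo estimate of the Day 1 probabilities of selection.
# Assumptions: Beta(successes + 1, failures + 1) posteriors; the seed and
# the number of simulated draws are arbitrary.
set.seed(2024)

successes <- c(9, 2, 3)   # purchases under Conditions 1, 2, 3
failures  <- c(2, 2, 2)   # non-purchases under Conditions 1, 2, 3
num_sim   <- 20000

# num_sim x 3 matrix: column k holds posterior draws for Condition k
draws <- sapply(seq_along(successes), function(k) {
  rbeta(num_sim, successes[k] + 1, failures[k] + 1)
})

# For each simulated draw, record which condition had the largest value
best <- max.col(draws, ties.method = "first")
selection_prob <- tabulate(best, nbins = length(successes)) / num_sim
names(selection_prob) <- paste("Condition", seq_along(successes))

print(round(selection_prob, 2))
```

Condition 1 should receive by far the largest share, while Conditions 2 and 3 keep non-trivial probabilities, so they continue to be explored.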

Scenario 3:

After a week of data collection, you have collected the following data:

Out of 70 customers who were assigned Condition 1 (fixed $5 off), 40 customers made a purchase.
Out of 142 customers who were assigned Condition 2 (40% off), 102 customers made a purchase.
Out of 55 customers who were assigned Condition 3 (complimentary food), 30 customers made a purchase.

Take a moment to think about what happened, try to answer the following questions on your own, and bring your answers to the classroom discussion on Thursday:

Discussion:

Looking at the left chart, why does Condition 2 have a significantly higher probability of selection?
If you recall from the data we collected on Day 1, Condition 2 had a very low probability of selection. How was it subsequently explored so many times?
Looking at the middle chart, where the blue bars represent one standard deviation from the mean, do you think we have enough statistical confidence to conclude that Condition 2 is the best condition?
- What about concluding that Condition 1 is better than Condition 3?
Looking at the right chart, could you highlight the area that represents the likelihood of Condition 2 being selected?
- What about the area for Condition 1?
The following link provides you with an interactive board to run analysis charts with any given number of successes and failures. Please try accumulating successes and failures in small steps.
- Could you simulate and illustrate what happens as the model transitions from Scenario 2 to Scenario 3?
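As a starting point for the discussion, the quantities behind the middle chart can be computed directly from the Scenario 3 counts. This sketch uses the normal-approximation standard error of a proportion and bars of plus or minus one standard error, matching the description in the question above; note that the plotting code later in this module draws wider bars (1.96 standard errors).

```r
# Sketch: success rates and one-standard-error bars for the Scenario 3 data.
# Assumption: normal-approximation standard error of a proportion.
assigned  <- c(70, 142, 55)   # customers assigned to Conditions 1, 2, 3
purchases <- c(40, 102, 30)   # customers who made a purchase

rate <- purchases / assigned               # observed purchase rate
se   <- sqrt(rate * (1 - rate) / assigned) # standard error of each rate

summary_table <- data.frame(
  condition = paste("Condition", seq_along(assigned)),
  rate  = round(rate, 3),
  se    = round(se, 3),
  lower = round(rate - se, 3),   # lower end of the one-SE bar
  upper = round(rate + se, 3)    # upper end of the one-SE bar
)

print(summary_table)
```

Comparing which bars overlap and which do not is a useful first step, although non-overlapping one-standard-error bars are not by themselves a formal significance test.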

Part I - Creating Probability of Selection Chart:

library(dplyr)



library(scales) 
library(stats)
library(ggplot2)
library(RColorBrewer)
library(tidyr)
library(patchwork)

# Helper 1: randomly initialize the state of each arm (condition).
# Input:  num_arm, the number of arms to set up.
# Output: a list of length num_arm; each element is a length-2 vector where
#   the first entry is the number of successes observed so far and
#   the second entry is the number of failures observed so far.

arm_generator <- function(num_arm) {
  replicate(num_arm, sample(1:10, 2, replace = TRUE), simplify = FALSE)
}

# Helper 2: estimate each arm's probability of selection by Monte Carlo.
# Draw num_sim samples from each arm's Beta(successes + 1, failures + 1)
# posterior and count how often each arm has the largest draw.

estimate_ap <- function(arms, num_sim = 5000) {
  K <- length(arms)
  # num_sim x K matrix: column k holds the posterior draws for arm k
  arm_sim <- sapply(seq_len(K), function(k) rbeta(num_sim, arms[[k]][1] + 1, arms[[k]][2] + 1))
  max_indices <- max.col(arm_sim, ties.method = "first")  # winning arm for each draw
  count <- tabulate(max_indices, nbins = K)
  ap <- count / num_sim
  return(ap)
}

set.seed(721)
num_arm <- 3
arms <- arm_generator(num_arm)

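# Here the arms are set manually, overriding the random draw above
# (comment this block out to keep the randomly generated arms):

arms <- list(c(1,2),
             c(1,1),
             c(2,3))
# Alternative: the Scenario 2 (end of Day 1) counts, as c(successes, failures):
# arms <- list(c(9,2),
#              c(2,2),
#              c(3,2))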

print(arms)

[[1]]
[1] 1 2

[[2]]
[1] 1 1

[[3]]
[1] 2 3

ap <- estimate_ap(arms)
# Create labels based on the number of probabilities
labels <- paste0("Condition ", seq_along(ap))

# Create a data frame
df <- data.frame(
  condition = labels,
  prob = ap
)

plt1 <- ggplot(df, aes(x = 2, y = prob, fill = condition)) +  # Set x to a constant (e.g., 2)
  geom_bar(stat = "identity", width = 1, color = "white") +     # White borders for separation
  coord_polar(theta = "y", start = 0) +                        # Polar coordinates for pie
  xlim(0.5, 2.5) +                                             # Adjust x limits to create the hole
  geom_text(
    aes(label = paste0(round(prob * 100), "%")), 
    position = position_stack(vjust = 0.5),                     # Center labels within slices
    color = "black",
    size = 5
  ) +
  ggtitle("Probabilities of Selection") +                       # Chart title                  
  theme_void(base_size = 14) +                                  # Blank theme (no axes or grid lines)
  theme(legend.position = "none") +
  scale_fill_brewer(palette = "Pastel1")                        # Consistent color palette

Part II - Success Rates Estimate of Each Condition

df <- data.frame(
  condition = paste0("Condition ", seq_along(arms)),
  successes = sapply(arms, `[`, 1),
  failures  = sapply(arms, `[`, 2)
)

df <- df %>%
  mutate(
    total = successes + failures,
    p     = successes / total,            # mean success rate
    se    = sqrt(p * (1 - p) / total),    # standard error for p
    lower = p - 1.96 * se,                # lower CI (normal approximation)
    upper = p + 1.96 * se                 # upper CI (normal approximation)
  )

# Ensure CI doesn't go below 0 or above 1
df$lower <- pmax(df$lower, 0)
df$upper <- pmin(df$upper, 1)

# Create a pretty bar plot
plt2 <- ggplot(df, aes(x = condition, y = p, fill = condition)) +
  geom_col(width = 0.6, color = "white") +          # White borders for a cleaner look
  geom_errorbar(
    aes(ymin = lower, ymax = upper),
    width = 0.2,
    color = "black"
  ) +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  scale_fill_brewer(palette = "Pastel1") +          # Use a pastel color palette
  ggtitle("Mean Success Rate by Condition")+
  labs(
    x = "Condition",
    y = "Success Rate"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none")

Part III - Posterior Distribution Plot:

# Create a data frame with condition labels and Beta shape parameters
df <- data.frame(
  condition = paste0("Condition ", seq_along(arms)),
  alpha = sapply(arms, function(x) x[1] + 1),
  beta  = sapply(arms, function(x) x[2] + 1)
)

# Generate a sequence of x-values from 0 to 1
x_vals <- seq(0, 1, length.out = 200)

# For each condition, compute the Beta PDF at each x
plot_data <- df %>%
  rowwise() %>%
  mutate(
    x = list(x_vals),
    y = list(dbeta(x_vals, alpha, beta))
  ) %>%
  unnest(cols = c(x, y))

# Plot each Beta distribution as a curve
plt3 <- ggplot(plot_data, aes(x = x, y = y, color = condition)) +
  geom_line(linewidth = 1) +                # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  # Optionally use a pastel palette to match prior visuals
  scale_color_brewer(palette = "Pastel1") +
  labs(
    title = "Posterior Distribution Update",
    x = "Success Rate",
    y = "Density",
    color = "Condition"
  ) +
  theme_minimal(base_size = 14)

Join Them Together (Horizontal Labeling)

# Combine them horizontally using patchwork
combined_plot <- plt1 | plt2 | plt3

# Print the combined figure
combined_plot
