Module on Adaptive Experimentation
Part I: Module Syllabus
Description
This module introduces adaptive experiments as a human-centered approach to experimental design that dynamically adjusts participant allocation based on observed outcomes. Building on previous topics of exploratory data analysis, visualization, and machine learning fairness, it explores how to balance exploration and exploitation in real-world settings.
Learning Goals
Understanding (Basic)
Define key concepts: adaptive experiments, response-adaptive randomization, exploration vs exploitation
Explain how adaptive allocation probabilities change based on accumulated evidence
Application (Intermediate)
Compare traditional RCTs vs adaptive experiments for specific use cases
Apply adaptive experiment principles to educational technology scenarios
Analysis (Advanced)
Analyze stakeholder interests and ethical implications in different experimental contexts
Evaluate tradeoffs between statistical validity and participant outcomes using situational decision-making
Integration with Previous Topics
Visualization: Using plots and diagrams to understand experimental allocation over time
Human-Centered ML: Considering fairness and impact on participants when designing experiments
Ethics: Balancing research goals with participant benefits and potential risks
These goals incorporate connections to the previous topics of the Interactive Data Science course: data visualization (Unit 2), human-centered approaches (Unit 3), and ethical considerations (Unit 5) while introducing new concepts in experimental design.
Content
For the IDS course, the module is organized into two parts:
An online module on the Open Learning Initiative (OLI) platform, completed by students before class
A classroom activity for the class with individual and group work
Formative Assessment
The OLI module uses a combination of multiple-choice questions (with distractors designed to correspond to common misconceptions) and schema-building activities (mix and match, elaboration) that support contextual understanding.
As this is a graduate-level course, some questions are designed to push understanding forward, nudging students to reflect on the material they have just learned. This deliberate challenge is offset by the ungraded nature of these questions and by answer-specific feedback for every option, which explains the underlying misconception to the student.
Classroom tasks aim to connect this understanding to other course topics and to ethical decision-making frameworks, structuring the discussion around the applicability of, and trade-offs behind, research method choices.
Part II: Self-Reading and Practicing Content (Before Class):
Human-Centered Data Science gives us new ways to analyze and apply data. We can combine ideas from traditional experiments with machine learning methods to do an adaptive experiment.
In an adaptive experiment, we adjust our design (typically the allocation of participants to different conditions) based on what we learn from each interaction with participants. The goal is usually to balance exploration, that is, testing various conditions to gather more information about their effectiveness, with exploitation, that is, using the algorithm’s knowledge so far to allocate participants to the currently better-performing conditions.
One example of balancing exploration and exploitation is finding good restaurants in a new city. At first, you explore different options, trying various places to gather information about what’s good. Once you’ve identified a standout spot, you start visiting it more often, but still occasionally try new places in case there’s an even better one you haven’t discovered yet.
One strategy for balancing exploration and exploitation in practice is response-adaptive randomization, where the probabilities of assigning participants to each condition are adjusted based on accumulated evidence. As more data from participants are gathered over time, the allocation gradually shifts away from an even split (e.g., 50/50) toward the conditions that perform better on the available evidence.
Question:
Could you suggest another example illustrating the idea of an adaptive experiment?
Thanks for your response! Another example is testing the effectiveness of two types of vaccines.
Question:
Select a true statement about randomization.
Randomization always assumes equal probabilities of assignment to conditions:
Explanation of Options:
Option 1: False. Response-adaptive randomization typically begins with equal probabilities and adjusts them based on participant responses, not always starting with unequal probabilities.
Option 2: False. Traditional uniform randomization treats all conditions equally without a preference for exploration or exploitation.
Option 3: False. The goal of response-adaptive randomization is to allocate more participants to better-performing conditions, not necessarily to equalize group sizes.
Option 4: True. Randomization ensures that each condition has an equal chance of assignment unless specified otherwise, maintaining the integrity of the experimental results.
Let’s recap some definitions
Condition: A specific treatment, arm, or variation assigned to a participant during the experiment.
Reward: A measurable outcome or benefit that results from a participant’s interaction with one of the conditions. In the simplest case, the reward can be binary: a success or a failure, e.g., answering a test question correctly or not.
Response-adaptive randomization: A method used in experiments where the probability of assigning participants to different conditions changes over time based on observed results (e.g. reward).
Adaptive experiment: A type of randomized trial (used, for example, in clinical trials and educational research) in which we adjust our design, typically the allocation to different conditions, based on what we learn from interactions with participants. In this module, we focus on adaptive experiments that use statistical and machine learning methods to perform response-adaptive randomization.
Question:
Developing your example of an adaptive experiment from the task above, suggest conditions for your example. Feel free to discuss alternative ideas.
Thank you for your response! Example conditions could include two types of treatments such as:
- Vaccine A with a higher initial dosage.
- Vaccine B with a standard dosage and booster after one month.
These conditions would allow for comparing the effectiveness and adaptively assigning participants based on observed outcomes.
Question:
Developing your example of an adaptive experiment from the task above, suggest a reward for your example. Feel free to discuss alternative ideas.
Thank you for your response! An example reward could be whether an individual participant survives after receiving the treatment (mortality). Alternatively, it could measure improvements in the participant’s health, such as reduced symptom severity or time to recovery.
Question:
Which of the following statements about adaptive experiments is correct?
Adaptive experiments are not usually uniformly randomized:
Explanation of Options:
Option 1: False. Adaptive experiments often involve randomization, but the randomization is adjusted dynamically based on prior responses.
Option 2: True. Adaptive experiments typically deviate from uniform randomization, dynamically adjusting the allocation probabilities based on observed data to optimize outcomes.
Option 3: False. Adaptive experiments do not aim to ensure an even split of participants across conditions; instead, they aim to allocate participants based on performance or other criteria.
Option 4: False. The primary goal of adaptive experiments is to improve efficiency and outcomes, not to eliminate randomization entirely.
Question:
Which of the following statements about adaptive experiments is correct?
To update the probabilities of assigning the participants to different conditions, we have to observe rewards:
Explanation of Options:
Option 1: False. While observing rewards is necessary for updating probabilities, the observation does not always need to be immediate; delayed rewards can also be incorporated into adaptive randomization.
Option 2: False. Rewards in adaptive experiments can take various forms, such as continuous values, ordinal scales, or other metrics, not just binary success/failure.
Option 3: False. Adaptive experiments do not necessarily require a control condition; they can involve multiple experimental conditions with no distinct control group.
Option 4: True. Observing rewards is essential for updating the allocation probabilities in adaptive experiments, as the reward feedback informs how conditions are adjusted dynamically.
WALKTHROUGH
How response-adaptive randomization works step-by-step:
Let’s illustrate with a simple example, tracing an adaptive experiment through three steps:
Question:
After Step 3, we can definitively conclude that Condition 1 is really better than Condition 2.
False:
Explanation of Options:
Option 1 (False): Correct. After Step 3, conclusions about the superiority of Condition 1 over Condition 2 cannot be definitively made. Adaptive experiments often require additional data or steps to confirm such claims with statistical confidence.
Option 2 (True): Incorrect. Conclusions about superiority require rigorous statistical evidence and may not be definitively established after just Step 3, as further steps or validation are typically needed.
Question:
After Step 3, our reward statistics were the following: - Condition 1 (S=2, F=0), probability of assignment 90% - Condition 2 (S=0, F=1), probability of assignment 10%
If on the next step (Step 4) we select Condition 2 for the next participant and get a positive reward, the reward statistics will be updated this way:
Condition 2 (S=1, F=1):
Explanation of Options:
Option 1: Incorrect. If a positive reward is observed for Condition 2, the success count (S) should increase, not the failure count (F).
Option 2: Correct. When a positive reward is observed for Condition 2, the statistics are updated to reflect one success (S=1) and one failure (F=1).
Option 3: Incorrect. This option mixes up Condition 1’s statistics, which should remain unchanged as it was not selected in Step 4.
Option 4: Incorrect. A positive reward does not affect the failure count (F) of Condition 1 or Condition 2.
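The bookkeeping behind this question can be sketched in a few lines of R. This is only an illustration: the function name `update_stats` and the list layout are assumptions made for this sketch, not part of the module’s own code.

```r
# Hypothetical helper: add one observed reward to the chosen condition's counts.
# stats is a list with one named vector c(S = successes, F = failures) per condition.
update_stats <- function(stats, condition, reward) {
  if (reward == 1) {
    stats[[condition]]["S"] <- stats[[condition]]["S"] + 1
  } else {
    stats[[condition]]["F"] <- stats[[condition]]["F"] + 1
  }
  stats
}

# Reward statistics after Step 3, as given in the question.
stats <- list(c(S = 2, F = 0), c(S = 0, F = 1))

# Step 4: Condition 2 is selected and a positive reward is observed.
stats <- update_stats(stats, condition = 2, reward = 1)
print(stats)
```

Only the selected condition changes: Condition 2 moves to S=1, F=1, while Condition 1 stays at S=2, F=0.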
Question:
If on Step 4 we selected Condition 2 for the next participant and got a positive reward, after Step 4:
Condition 1 would more likely be selected than Condition 2:
Explanation of Options:
Option 1: Incorrect. Although Condition 2 received a positive reward, Condition 1 already has a stronger history of success, making it more likely to be selected.
Option 2: Correct. Condition 1, with its stronger historical reward statistics, remains the more likely choice for selection despite Condition 2’s positive reward in Step 4.
Option 3: Incorrect. The probabilities of selection are updated dynamically based on reward statistics, and the conditions are unlikely to have equal probabilities unless their performance metrics align perfectly.
Question:
If on Step 4 we selected Condition 2 for the next participant and got a positive reward, after Step 4:
We believe that Condition 2 would be equally likely to bring success or failure:
Explanation of Options:
Option 1: Incorrect. Condition 1’s likelihood of failure isn’t directly inferred from the performance of Condition 2 in this context.
Option 2: Correct. After a positive reward for Condition 2, we update our belief, and the posterior distribution for Condition 2 becomes centered around equal success and failure probabilities, assuming prior uniformity.
Option 3: Incorrect. There’s no reason to update Condition 1’s statistics or belief as it was not selected on Step 4.
Option 4: Incorrect. A positive reward for Condition 2 suggests that it is not more likely to fail on the next step, rather it maintains a balanced likelihood.
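To check the last two questions numerically, here is a minimal sketch. It assumes that each condition’s success rate has a Beta(S + 1, F + 1) posterior (a flat prior updated with the counts) and that the probability of selecting Condition 1 is the probability that its rate exceeds Condition 2’s. The helper name `prob_first_beats_second` is made up for this illustration.

```r
# Probability that condition 1's success rate exceeds condition 2's,
# assuming independent Beta(S + 1, F + 1) posteriors (an assumption of this sketch).
prob_first_beats_second <- function(s1, f1, s2, f2) {
  integrand <- function(x) dbeta(x, s1 + 1, f1 + 1) * pbeta(x, s2 + 1, f2 + 1)
  integrate(integrand, lower = 0, upper = 1)$value
}

# After Step 3: Condition 1 (S=2, F=0), Condition 2 (S=0, F=1).
after_step3 <- prob_first_beats_second(2, 0, 0, 1)

# After Step 4: Condition 2 gains one success and becomes (S=1, F=1).
after_step4 <- prob_first_beats_second(2, 0, 1, 1)

# Posterior mean of Condition 2's success rate after Step 4: Beta(2, 2).
posterior_mean_c2 <- (1 + 1) / ((1 + 1) + (1 + 1))

round(c(after_step3 = after_step3, after_step4 = after_step4,
        posterior_mean_c2 = posterior_mean_c2), 3)
```

Under these assumptions the first value is 0.9, matching the 90% stated for Step 3, and the second is 0.8. Condition 1 therefore remains the more likely choice after Step 4, while our belief about Condition 2 is centered on 0.5.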
LEARN MORE
Technical Note: Probability of Assignment (or Selection)
In this introduction, we omit the technical details of how probability distributions for each condition are used to select a condition at each step of the algorithm. It is sufficient to understand that we can calculate the probability of assignment (or selection) for each condition at every step, reflecting our current understanding of the relative effectiveness of each condition. For a binary reward (e.g., success or failure), these probabilities are based on the observed numbers of successes and failures for each condition.
The probability of assignment reflects not only the average effectiveness of each condition but also our uncertainty about this effectiveness. Early in an adaptive experiment, when we have limited information or when the effectiveness data fluctuates significantly, the assignment probabilities account for this uncertainty by encouraging more exploration to better understand each condition.
We can observe changes in uncertainty by examining the proportion of allocation to conditions over time. In the graph below, the allocation to conditions remains constant during a traditional Randomized Controlled Trial (RCT) but varies in an adaptive experiment. During the early stages of the adaptive experiment, uncertainty about the effectiveness of the conditions is higher, leading to more reallocation and fluctuations. As the adaptive experiment progresses, our understanding improves, resulting in more stable allocation probabilities.
Question:
Increasing allocation proportion for Condition B on the graph for an adaptive experiment reflects:
Increasing exploitation of a better condition based on accumulated evidence:
Explanation of Options:
Option 1: Correct. The increasing allocation proportion for Condition B indicates that the algorithm is prioritizing this condition, exploiting it as evidence accumulates that it is the better-performing condition.
Option 2: Incorrect. Adaptive experiments do not aim to maintain equal allocation; instead, they adjust allocation to reflect evidence and favor better-performing conditions.
Option 3: Incorrect. An increasing allocation proportion does not reflect exploration; rather, it represents exploitation of the condition that has demonstrated better performance based on accumulated evidence.
Question:
Which statement about the probability of selection in adaptive experiments is correct?
The more relative successes we observe for the condition, the higher the probability of selection for this condition:
Explanation of Options:
Option 1: Incorrect. Selection probability is not solely determined by the number of participants in each condition; it depends on the observed successes and failures.
Option 2: Correct. Adaptive experiments dynamically adjust selection probabilities based on relative successes, increasing the likelihood of selecting better-performing conditions.
Option 3: Incorrect. Selection probabilities are not fixed before the experiment; they are updated as new data is observed.
Option 4: Incorrect. In adaptive experiments, selection probabilities change in response to observed rewards, reflecting the current understanding of condition performance.
Question:
What influences the probability of selection in adaptive experiments?
The probability of selection depends on our current knowledge about the effectiveness of every condition:
Explanation of Options:
Option 1: Correct. Adaptive experiments dynamically adjust selection probabilities based on the accumulated knowledge of each condition’s effectiveness.
Option 2: Incorrect. Initial randomization ratios may provide a starting point, but selection probabilities are updated throughout the experiment as new evidence is gathered.
Option 3: Incorrect. Selection probabilities are not determined solely by the number of participants; they also incorporate effectiveness metrics.
Option 4: Incorrect. Knowing the effectiveness of only the best condition is insufficient; selection probabilities require information about all conditions to balance exploration and exploitation.
Now, let us try to summarize how one version of an adaptive experiment algorithm might work:
Question:
Based on the walkthrough and the algorithm above, select the correct answer:
Using probabilities of selection instead of deterministic rules allows adaptive experiments to flexibly balance exploration and exploitation:
Explanation of Options:
Option 1: Incorrect. At the end of the experiment, conditions may still have varying probabilities of assignment, but no condition is guaranteed to have 100% probability.
Option 2: Correct. Adaptive experiments leverage probabilistic selection to balance the need to explore conditions with the need to exploit the best-performing ones.
Option 3: Incorrect. Adaptive experiments do not always select the condition with the highest probability; the selection is probabilistic and not deterministic.
Option 4: Incorrect. Observing a positive reward for the condition with the highest probability is not guaranteed, as rewards depend on real-world outcomes, which may vary.
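One common version of such an algorithm for binary rewards, often called Thompson sampling, can be sketched as follows. The function name `run_adaptive` and the success rates in `true_rates` are invented for the simulation; in a real experiment the rewards come from participants, not from `rbinom()`.

```r
# Sketch of a response-adaptive loop for binary rewards.
# true_rates is a made-up vector used only to simulate participant outcomes.
run_adaptive <- function(true_rates, n_participants) {
  K <- length(true_rates)
  successes <- rep(0, K)
  failures  <- rep(0, K)
  assigned  <- integer(n_participants)
  for (t in seq_len(n_participants)) {
    # 1. Draw one plausible success rate per condition from its Beta posterior.
    draws <- rbeta(K, successes + 1, failures + 1)
    # 2. Assign the participant to the condition with the highest draw.
    k <- which.max(draws)
    # 3. Observe a binary reward (simulated here).
    reward <- rbinom(1, 1, true_rates[k])
    # 4. Update the counts of the selected condition only.
    successes[k] <- successes[k] + reward
    failures[k]  <- failures[k] + (1 - reward)
    assigned[t]  <- k
  }
  list(successes = successes, failures = failures, assigned = assigned)
}

set.seed(42)
result <- run_adaptive(true_rates = c(0.5, 0.6), n_participants = 500)
table(result$assigned)                           # participants per condition
cumsum(result$assigned == 2)[c(50, 200, 500)]    # Condition 2 count over time
```

Because the assignment comes from random draws and not from a fixed rule, a condition that currently looks worse can still be chosen, which is how the loop keeps exploring.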
So when do we stop the experiment?
Looking at the graph, we can ask ourselves: “When should we stop an adaptive experiment? What is the dividing line between ‘during’ and ‘after’?”
There are three possible answers:
Never.
If the adaptive experiment is going well, it can serve as an automated decision-making tool, balancing the exploitation of a better condition with the exploration of alternatives in proportion to uncertainty. In this scenario, it is even possible to introduce a new condition and let the algorithm learn how it performs while continuing to use the knowledge gathered from previous conditions.
When the sample is complete.
Adaptive experiments are often used when we need to balance exploitation with exploration within a limited sample. At the end of the experiment, we can examine the final assignment probabilities and interpret them as estimates of the probability of superiority for each condition. These probabilities provide insights beyond just predicting the next step; they help identify which condition is actually better.
Based on power analysis.
Similar to traditional Randomized Controlled Trials (RCTs), we can calculate the sample size required to achieve a target level of statistical power. After obtaining a sufficiently large sample, statistical analysis can be applied to the results of the adaptive experiment.
Each of these approaches has its limitations, and the choice of when to stop an adaptive experiment ultimately depends on the researcher’s judgment and the specific context of the study.
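For the power-analysis option, base R’s `power.prop.test()` gives a quick sample-size estimate for comparing two proportions. The success rates (0.5 and 0.6), the 80% power target, and the 5% significance level below are illustrative assumptions, not values from this module.

```r
# Sample size per condition needed to detect a difference between two
# assumed success rates with a two-sided test of proportions.
plan <- power.prop.test(p1 = 0.5, p2 = 0.6, power = 0.80, sig.level = 0.05)
n_per_condition <- ceiling(plan$n)
n_per_condition
```

Under these assumptions the answer is close to 390 participants per condition. This calculation presumes equal group sizes, so for an adaptive experiment with uneven allocation it is best read as a rough guide.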
EXAMPLE
Full Example of an Adaptive Experiment
Now that we have some understanding of adaptive experiments, let’s examine a real-world-inspired example. This example will focus on analyzing results and comparing outcomes between a traditional uniformly-randomly assigned Randomized Controlled Trial (RCT) and an adaptive experiment.
Learning designer Yi wants to encourage student self-reflection after course activities. Her goal is to help students think critically about their responses and learning processes.
Experiment Design
Learning designer Yi selects a specific decision point in the course: a particular activity where students are encouraged to reflect on their responses. She defines two conditions for her experiment:
Condition 1: No self-explanation prompt is shown to the students.
Condition 2: A self-explanation prompt is provided, asking: Can you explain why you chose your answer?
Yi defines the correctness of the student’s answer to a multiple-choice question (related to the activity) as the reward for the experiment. This allows her to measure the impact of reflection prompts on learning outcomes.
Comparing Traditional and Adaptive Experiments
Imagine that we ran both a traditional uniformly-randomly assigned RCT and an adaptive experiment in parallel, using the same design, to compare their results.
Results of the Traditional Experiment
In the traditional experiment, there is a statistically significant difference between the mean rewards (where the mean can be interpreted as the share of positive rewards for each condition). The results are as follows:
Condition 2 (self-explanation prompt): M2=0.608 (SEM=0.032)
Condition 1 (no prompt): M1=0.512 (SEM=0.032)
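The text does not say which test produced the significant result. As a rough check, a normal-approximation z-test can be computed directly from the reported means and standard errors. Treating the two groups as independent is an assumption of this sketch.

```r
# Reported means and standard errors from the traditional experiment.
m1 <- 0.512; sem1 <- 0.032   # Condition 1 (no prompt)
m2 <- 0.608; sem2 <- 0.032   # Condition 2 (self-explanation prompt)

# Standard error of the difference, assuming independent groups.
se_diff <- sqrt(sem1^2 + sem2^2)

# z statistic and two-sided p-value under a normal approximation.
z <- (m2 - m1) / se_diff
p_value <- 2 * pnorm(-abs(z))

round(c(z = z, p_value = p_value), 3)
```

This gives z of about 2.12 and a two-sided p-value of about 0.034, consistent with a significant difference at the 5% level.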
Results of the Adaptive Experiment
In the adaptive experiment, the computed estimates were:
Condition 2 (self-explanation prompt): M2=0.599
Condition 1 (no prompt): M1=0.539
These results also favor the self-explanation prompt (Condition 2). However, as discussed earlier, the primary advantage of adaptive experiments lies in the allocation of participants to different conditions.
Allocation Differences
In the traditional experiment, as expected, participants were assigned almost equally to the two conditions. By contrast, in the adaptive experiment, the assignment proportions differed significantly, reflecting the algorithm’s ability to prioritize better-performing conditions (in this case, Condition 2).
Question:
What is the key benefit of adaptive experimentation compared to traditional Randomized Controlled Trials, as described in the example?
Adaptive experiments aim for more participants to be assigned to the condition with better outcomes as the experiment progresses:
Explanation of Options:
Option 1: Incorrect. Unlike traditional RCTs, adaptive experiments dynamically adjust participant allocation based on observed outcomes, rather than assigning participants equally.
Option 2: Correct. The key benefit of adaptive experimentation is the ability to allocate more participants to conditions with better outcomes, improving efficiency and relevance.
Option 3: Incorrect. Adaptive experiments may not always be faster or require fewer participants; their primary benefit lies in the adaptive allocation process.
Option 4: Incorrect. Adaptive experiments still require a control group to compare outcomes and do not bypass statistical requirements for significance.
DID I GET THIS?
Question:
How would you explain the idea of an adaptive experiment to a child in 150 words or fewer?
Thanks for your response!
Imagine you’re trying to figure out the best flavor of ice cream for your birthday party, but you don’t know which one everyone likes the most. Instead of giving everyone the same flavor, you start by letting a few people try chocolate and a few people try vanilla. After they taste it, you see which one people like better. If more people like chocolate, you give more chocolate to the next group of friends to try, but you still let some people try vanilla, just in case their opinion changes things.
Over time, you keep adjusting how much chocolate or vanilla you give out based on what people like. By the end, you’ll know which flavor is the best for your party. That’s how an Adaptive Experiment works—it learns as it goes and tries to do better each step!
Question:
If Condition 1 has reward statistics (S = 2, F = 10), and Condition 2 has (S = 10, F = 2), on the next step we:
Are more likely to select Condition 2:
Explanation of Options:
Option 1: Correct. Condition 2 has a significantly higher success-to-failure ratio (S = 10, F = 2) compared to Condition 1 (S = 2, F = 10), making it more likely to be selected in the next step.
Option 2: Incorrect. While Condition 1 is less successful, adaptive experiments use probabilities for selection, and no condition is guaranteed to be selected on every step.
Option 3: Incorrect. Conditions are not equally likely to be selected, as the selection probabilities depend on observed reward statistics.
Option 4: Incorrect. Condition 1 has a lower reward ratio compared to Condition 2, so it is less likely to be selected.
Option 5: Incorrect. Condition 2 is more likely to be selected, but it is not guaranteed to be chosen every time.
The following table displays the results of the experiments:
| Experiment | Condition | Successes | Failures |
|---|---|---|---|
| Experiment A | Condition 1 | 4 | 6 |
| Experiment A | Condition 2 | 6 | 4 |
| Experiment B | Condition 1 | 8 | 12 |
| Experiment B | Condition 2 | 12 | 8 |
Question:
Compare two experiments based on the table above. In both Experiment A and Experiment B, the success rate for Condition 1 is the same (40%). In both Experiment A and Experiment B, the success rate for Condition 2 is also the same (60%).
We know more about the effectiveness of every condition in Experiment B than in Experiment A:
Explanation of Options:
Option 1: Incorrect. While Experiment A has fewer participants, the proportional success rates do not provide higher certainty compared to Experiment B.
Option 2: Correct. Experiment B has a larger sample size, reducing variability and providing more information about the effectiveness of each condition.
Option 3: Incorrect. Although the success rates are identical, the amount of information differs due to the sample sizes.
Question:
Compare two experiments based on the table above. In both Experiment A and Experiment B, the success rate for Condition 1 is the same (40%). In both Experiment A and Experiment B, the success rate for Condition 2 is also the same (60%).
We are more likely to select Condition 1 in Experiment A than in Experiment B:
Explanation of Options:
Option 1: Correct. Experiment A has a smaller sample size, resulting in a posterior distribution with greater variability. This increases the likelihood of selecting Condition 1 compared to Experiment B, where the larger sample size reduces posterior variability and skews the selection toward the true success rates.
Option 2: Incorrect. In Experiment B, the posterior is more concentrated due to the larger sample size, which reduces the likelihood of selecting Condition 1.
Option 3: Incorrect. Despite identical success rates, the difference in posterior variability makes the likelihood of selecting Condition 1 different between the experiments.
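The two comparisons above can be checked with a short calculation. The sketch assumes Beta(S + 1, F + 1) posteriors for each condition. The helper names `beta_sd` and `prob_c1_selected` are invented for this illustration.

```r
# Posterior standard deviation of a success rate with a Beta(S + 1, F + 1) posterior.
beta_sd <- function(s, f) {
  a <- s + 1
  b <- f + 1
  sqrt(a * b / ((a + b)^2 * (a + b + 1)))
}

# Probability that Condition 1's rate exceeds Condition 2's, which is the
# chance of selecting Condition 1 under the assumptions of this sketch.
prob_c1_selected <- function(s1, f1, s2, f2) {
  integrand <- function(x) dbeta(x, s1 + 1, f1 + 1) * pbeta(x, s2 + 1, f2 + 1)
  integrate(integrand, lower = 0, upper = 1)$value
}

# Experiment A: Condition 1 (4, 6), Condition 2 (6, 4).
sd_a <- beta_sd(4, 6)
p_a  <- prob_c1_selected(4, 6, 6, 4)

# Experiment B: Condition 1 (8, 12), Condition 2 (12, 8).
sd_b <- beta_sd(8, 12)
p_b  <- prob_c1_selected(8, 12, 12, 8)

round(c(sd_a = sd_a, sd_b = sd_b, p_a = p_a, p_b = p_b), 3)
```

The posterior for Experiment B is narrower (smaller standard deviation), and the chance of selecting the weaker-looking Condition 1 is smaller in Experiment B than in Experiment A, even though the success rates are identical.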
Part III: Simulation-Based Scenario Analysis
This section helps you see how the concepts covered in the Adaptive Experimentation module change as data accumulate. Using a real-life example, we will review how these quantities evolve throughout data collection. Recall the following definitions:
Condition: A specific treatment, arm, or variation assigned to a participant during the experiment.
Reward: A measurable outcome or benefit that results from a participant’s interaction with one of the conditions. In the simplest case, the reward can be binary: a success or a failure, e.g., answering a test question correctly or not.
Probability of Selection: A likelihood that reflects not only the average effectiveness of each condition but also our level of uncertainty about this effectiveness. If we are early in an adaptive experiment and have limited information, or if the data we get on effectiveness of the conditions fluctuates significantly, the assignment probabilities will aim to account for this, encouraging more exploration to better understand each condition.
Posterior Probability: A distribution that represents the updated probability of a hypothesis or parameter after observing new data. It combines prior beliefs with the likelihood of the observed data to provide a refined estimate. Consider an experiment testing a new website feature that either leads to a purchase (success) or not (failure). Initially, you believe there’s a 50% chance the feature will result in a purchase. After showing the feature to 10 users and observing 7 successes, you update your belief. The posterior distribution now reflects a higher probability that the feature is effective, guiding future decisions like increasing its rollout.
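The website-feature example can be worked through in a few lines. The sketch assumes a flat Beta(1, 1) prior, whose mean matches the initial 50% belief, and treats the 10 users as 7 successes and 3 failures.

```r
# Assumption: a flat Beta(1, 1) prior, whose mean is the initial 50% belief.
prior_a <- 1
prior_b <- 1

# Observed data from the example: 7 purchases out of 10 users.
successes <- 7
failures  <- 3

# Conjugate update: add successes and failures to the prior parameters.
post_a <- prior_a + successes   # 8
post_b <- prior_b + failures    # 4

posterior_mean  <- post_a / (post_a + post_b)
credible_95     <- qbeta(c(0.025, 0.975), post_a, post_b)
prob_above_half <- 1 - pbeta(0.5, post_a, post_b)

round(c(mean = posterior_mean, lower = credible_95[1],
        upper = credible_95[2], prob_above_half = prob_above_half), 3)
```

The posterior mean rises from 0.5 to about 0.67, and the posterior puts roughly 89% of its mass on purchase rates above 50%. The interval is still wide, which reflects how little ten observations tell us.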
Example: Starbucks Marketing Strategy
As a newly hired marketing strategist at Starbucks, you are tasked with finding the optimal way to advertise discounts to customers to encourage purchases while maintaining profit margins. In this context, you have designed the following three discount advertising conditions. The marketing team has decided to run a response-adaptive experiment to further validate which condition works best:
Condition 1: The customer receives a fixed $5 off their next purchase over $15.
Condition 2: The customer receives 40% off their next purchase of any amount.
Condition 3: The customer can buy one coffee and get complimentary snacks or breakfast on their next purchase.
Question:
What is the reward variable in your experiment design?
The reward variable is whether or not the customer makes a purchase.
Question:
Is this reward a Binary reward?
Yes, it is! Either the customer makes a purchase (1) or does not make a purchase (0).
Scenario 1:
Since you don’t know which condition works best (typically, you could obtain some prior information by examining historical data), you assume an equal start for all three conditions.
In the illustrative chart below, you observe that all three conditions have an equal probability of selection. There are no reports of successes or failures in the middle chart because you have not received any data from participants yet. Consequently, the posterior distribution is flat, or in other words, a Uniform(0,1) distribution, as you do not assume any prior information about the likelihood of success for each condition. In this case, a success is defined as a customer making a purchase, which is a binary reward.
Scenario 2:
Moving on to the end of Day 1, you have collected the following data:
- Out of 11 customers who were assigned Condition 1 (fixed $5 off), 9 customers made a purchase.
- Out of 4 customers who were assigned Condition 2 (40% off), 2 customers made a purchase.
- Out of 5 customers who were assigned Condition 3 (complimentary food), 3 customers made a purchase.
Looking at the illustrative charts below, we see that Condition 1 currently has a very high purchase rate, giving it the largest probability of selection (77%). Condition 3 still performs slightly better than Condition 2, resulting in a higher probability of selection compared to Condition 2. However, these are early estimates and reports. As shown in the middle chart, the variability of the estimated success rates for Conditions 2 and 3 remains very large. While Condition 1 has been more explored, the posterior distribution updates in the right chart indicate that all conditions are still in the learning stage. This means that the probabilities of selection are subject to change as more data are collected. Let’s illustrate this in Scenario 3 below.
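The Day 1 selection probabilities can be approximated by simulation. The sketch assumes that each condition’s purchase rate has a Beta(successes + 1, failures + 1) posterior and that a condition’s selection probability is the share of random draws in which it has the highest sampled rate. The seed and number of draws are arbitrary choices.

```r
set.seed(2024)
n_draws <- 20000

# Day 1 data: purchases and non-purchases for Conditions 1, 2, and 3.
successes <- c(9, 2, 3)
failures  <- c(2, 2, 2)

# One column of posterior draws per condition.
draws <- sapply(seq_along(successes), function(k) {
  rbeta(n_draws, successes[k] + 1, failures[k] + 1)
})

# For each row, find the condition with the highest sampled purchase rate.
winner <- max.col(draws, ties.method = "first")
selection_prob <- tabulate(winner, nbins = length(successes)) / n_draws
names(selection_prob) <- paste("Condition", seq_along(successes))

round(selection_prob, 2)
```

If the charts use the same posteriors, the estimate for Condition 1 should land near the 77% quoted above, with Condition 3 ahead of Condition 2. Small differences are expected because the estimate is based on random draws.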
Scenario 3:
After a week of data collection, you have collected the following data:
- Out of 70 customers who were assigned Condition 1 (fixed $5 off), 40 customers made a purchase.
- Out of 142 customers who were assigned Condition 2 (40% off), 102 customers made a purchase.
- Out of 55 customers who were assigned Condition 3 (complimentary food), 30 customers made a purchase.
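As a starting point for reading these numbers, the observed purchase rate and a posterior summary for each condition can be tabulated as follows. The sketch assumes Beta(successes + 1, failures + 1) posteriors; the charts may be built differently.

```r
# Week 1 data for the three conditions.
scenario3 <- data.frame(
  condition = c("Condition 1", "Condition 2", "Condition 3"),
  assigned  = c(70, 142, 55),
  purchases = c(40, 102, 30)
)

# Beta posterior parameters under a flat prior (an assumption of this sketch).
a <- scenario3$purchases + 1
b <- scenario3$assigned - scenario3$purchases + 1

scenario3$observed_rate  <- scenario3$purchases / scenario3$assigned
scenario3$posterior_mean <- a / (a + b)
scenario3$posterior_sd   <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))

print(scenario3, digits = 3)
```

Comparing the posterior means against their standard deviations is one way to judge how well separated the conditions are.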
Take a few minutes to think about what happened, try to answer the following questions on your own, and bring your answers to the classroom discussion on Thursday:
Discussion:
Looking at the left chart, why does Condition 2 have a significantly higher probability of selection?
If you recall from the data we collected on Day 1, Condition 2 had a very low probability of selection. How was it subsequently explored so many times?
Looking at the middle chart, where the blue bars represent one standard deviation from the mean, do you think we have enough statistical confidence to conclude that Condition 2 is the best condition?
- What about concluding that Condition 1 is better than Condition 3?
Looking at the right chart, could you highlight the area that represents the likelihood of Condition 2 being selected?
- What about the area for Condition 1?
The following link provides you with an interactive board to run analysis charts with any given number of successes and failures. Please try accumulating successes and failures in small steps.
- Could you simulate and illustrate what happens as the model transitions from Scenario 2 to Scenario 3?
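If you would rather explore the transition offline, the sketch below simulates it one customer at a time. It is one possible way to do it, not the interactive board's actual implementation. The function name `simulate_ts`, the `true_rates` values, and the number of customers are all hypothetical; the starting counts reuse the commented-out alternative arms from Part I below. Each customer is assigned to the condition with the largest draw from its Beta posterior, and that condition's counts are then updated.

```r
# Sketch only: in practice the true purchase rates are unknown
simulate_ts <- function(arms, true_rates, n_customers) {
  for (i in seq_len(n_customers)) {
    # Draw one plausible success rate per condition from its Beta posterior
    draws <- sapply(arms, function(a) rbeta(1, a[1] + 1, a[2] + 1))
    k <- which.max(draws)                      # assign the customer to the largest draw
    purchased <- rbinom(1, 1, true_rates[k])   # simulate the customer's response
    # Update successes (first element) or failures (second element)
    idx <- if (purchased == 1) 1 else 2
    arms[[k]][idx] <- arms[[k]][idx] + 1
  }
  arms
}

set.seed(1)
start_arms <- list(c(9, 2), c(2, 2), c(3, 2))  # illustrative starting counts
true_rates <- c(0.55, 0.70, 0.55)              # hypothetical values, not taken from the scenarios
end_arms <- simulate_ts(start_arms, true_rates, n_customers = 250)
print(end_arms)
```

Rerunning with different seeds or a smaller `n_customers` shows how a condition that starts with a low probability of selection can still be sampled occasionally and then catch up.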
Part I - Creating Probability of Selection Chart:
library(dplyr)
library(scales)
library(stats)
library(ggplot2)
library(RColorBrewer)
library(tidyr)
library(patchwork)
# Helper 1: randomly initialize the state of each arm
# Input: num_arm, the number of arms to set up
# Output: a list of length num_arm, where each element is a vector of length 2:
# the first element is the number of successes observed so far
# the second element is the number of failures observed so far
arm_generator <- function(num_arm) {
replicate(num_arm, sample(1:10, 2, replace = TRUE), simplify = FALSE)
}
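# Helper 2: estimate each arm's probability of selection by Monte Carlo.
# Draw num_sim samples from each arm's Beta(successes + 1, failures + 1) posterior
# and count how often each arm has the largest draw.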
estimate_ap <- function(arms, num_sim = 5000) {
K <- length(arms)
arm_sim <- sapply(1:K, function(k) rbeta(num_sim, arms[[k]][1] + 1, arms[[k]][2] + 1))
max_indices <- max.col(arm_sim, ties.method = "first")
count <- tabulate(max_indices, nbins = K)
ap <- count/num_sim
return(ap)
}
set.seed(721)
num_arm <- 3
arms <- arm_generator(num_arm)
# Or set up the arms manually; this assignment overrides the random arms above
# (swap in the commented-out alternative below to try different counts):
arms <- list(c(1,2),
c(1,1),
c(2,3))
# arms <- list(c(9,2),
# c(2,2),
# c(3,2))
print(arms)
[[1]]
[1] 1 2
[[2]]
[1] 1 1
[[3]]
[1] 2 3
ap <- estimate_ap(arms)
# Create labels based on the number of probabilities
labels <- paste0("Condition ", seq_along(ap))
# Create a data frame
df <- data.frame(
condition = labels,
prob = ap
)
plt1 <- ggplot(df, aes(x = 2, y = prob, fill = condition)) + # Set x to a constant (e.g., 2)
geom_bar(stat = "identity", width = 1, color = "white") + # White borders for separation
coord_polar(theta = "y", start = 0) + # Polar coordinates for pie
xlim(0.5, 2.5) + # Adjust x limits to create the hole
geom_text(
aes(label = paste0(round(prob * 100), "%")),
position = position_stack(vjust = 0.5), # Center labels within slices
color = "black",
size = 5
) +
ggtitle("Probabilities of Selection") + # Chart title
theme_void(base_size = 14) + # a complete theme; an earlier theme_minimal() would be overridden anyway
theme(legend.position = "none") +
scale_fill_brewer(palette = "Pastel1") # Consistent color palette
Part II - Success Rates Estimate of Each Condition:
df <- data.frame(
condition = paste0("Condition ", seq_along(arms)),
successes = sapply(arms, `[`, 1),
failures = sapply(arms, `[`, 2)
)
df <- df %>%
mutate(
total = successes + failures,
p = successes / total, # mean success rate
se = sqrt(p * (1 - p) / total), # standard error for p
lower = p - 1.96 * se, # lower CI (normal approximation)
upper = p + 1.96 * se # upper CI (normal approximation)
)
# Ensure CI doesn't go below 0 or above 1
df$lower <- pmax(df$lower, 0)
df$upper <- pmin(df$upper, 1)
# Create a pretty bar plot
plt2 <- ggplot(df, aes(x = condition, y = p, fill = condition)) +
geom_col(width = 0.6, color = "white") + # White borders for a cleaner look
geom_errorbar(
aes(ymin = lower, ymax = upper),
width = 0.2,
color = "black"
) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
scale_fill_brewer(palette = "Pastel1") + # Use a pastel color palette
ggtitle("Mean Success Rate by Condition")+
labs(
x = "Condition",
y = "Success Rate"
) +
theme_minimal(base_size = 14) +
theme(legend.position = "none")
Part III - Posterior Distribution Plot:
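The `alpha` and `beta` columns built in this part add 1 to each condition's success and failure counts. One way to read this, assuming a uniform Beta(1, 1) prior on each success rate, is as the standard Beta-Binomial update. The symbols below are notation introduced here for illustration: s_k and f_k are the observed successes and failures for condition k, and θ_k is its unknown success rate.

```latex
\theta_k \mid \text{data} \sim \mathrm{Beta}(s_k + 1,\; f_k + 1),
\qquad
\mathbb{E}[\theta_k \mid \text{data}] = \frac{s_k + 1}{s_k + f_k + 2}
```

With few observations the curve is wide; as s_k + f_k grows, it narrows around the observed success rate.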
# Create a data frame with condition labels and Beta shape parameters
df <- data.frame(
condition = paste0("Condition ", seq_along(arms)),
alpha = sapply(arms, function(x) x[1] + 1),
beta = sapply(arms, function(x) x[2] + 1)
)
# Generate a sequence of x-values from 0 to 1
x_vals <- seq(0, 1, length.out = 200)
# For each condition, compute the Beta PDF at each x
plot_data <- df %>%
rowwise() %>%
mutate(
x = list(x_vals),
y = list(dbeta(x_vals, alpha, beta))
) %>%
unnest(cols = c(x, y))
# Plot each Beta distribution as a curve
plt3 <- ggplot(plot_data, aes(x = x, y = y, color = condition)) +
geom_line(linewidth = 1) + # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
# Optionally use a pastel palette to match prior visuals
scale_color_brewer(palette = "Pastel1") +
labs(
title = "Posterior Distribution Update",
x = "Success Rate",
y = "Density",
color = "Condition"
) +
theme_minimal(base_size = 14)
Join Them Together (Horizontal Labeling)
# Combine them horizontally using patchwork
combined_plot <- plt1 | plt2 | plt3
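As an optional cross-check on the Monte Carlo helper `estimate_ap` from Part I, the sketch below computes the same selection probabilities by numerical integration. The probability that arm k has the highest success rate is the integral, over the unit interval, of arm k's posterior density times the posterior CDFs of all other arms. The function name `exact_ap` is hypothetical, and the sketch assumes at least two arms and the same Beta(successes + 1, failures + 1) posteriors used throughout.

```r
# Probability that each arm has the highest success rate, by numerical integration
exact_ap <- function(arms) {
  K <- length(arms)
  sapply(seq_len(K), function(k) {
    others <- seq_len(K)[-k]
    integrand <- function(x) {
      # Posterior density of arm k at each x
      dens <- dbeta(x, arms[[k]][1] + 1, arms[[k]][2] + 1)
      # Posterior CDF of every other arm at each x (one column per arm)
      cdfs <- sapply(others, function(j) pbeta(x, arms[[j]][1] + 1, arms[[j]][2] + 1))
      cdfs <- matrix(cdfs, nrow = length(x))
      dens * apply(cdfs, 1, prod)
    }
    integrate(integrand, lower = 0, upper = 1)$value
  })
}

arms <- list(c(1, 2), c(1, 1), c(2, 3))  # same manual arms as in Part I
ap_exact <- exact_ap(arms)
print(round(ap_exact, 3))
```

The values should sum to 1 up to integration error and should sit close to the output of `estimate_ap(arms)`, with the gap shrinking as `num_sim` increases.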