library(tidyverse)
library(pwr)
library(ggplot2)
data <- read.csv("https://raw.githubusercontent.com/Aranaur/aranaur.rbind.io/main/datasets/ab_experiment/ab_experiment.csv")
data
Lab 04
Exercise 1
\(H_0:\) Delivery time did not change after the introduction of the new system.
\(H_1:\) Delivery time decreased after the introduction of the new system.
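Writing \(\mu_{\text{control}}\) and \(\mu_{\text{test}}\) for the mean delivery times in the two groups (notation introduced here for illustration only), the two hypotheses can be stated more formally as:

```latex
\[
H_0:\ \mu_{\text{test}} = \mu_{\text{control}}
\qquad
H_1:\ \mu_{\text{test}} < \mu_{\text{control}}
\]
```

The alternative is one-sided, whereas the t-test in Exercise 6 uses R's default two-sided alternative.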
Exercise 2
# Overlaid density-scaled histograms of delivery time for each group
ggplot(data, aes(x = delivery_time, fill = experiment_group)) +
  geom_histogram(aes(y = after_stat(density)),  # replaces the deprecated ..density..
                 alpha = 0.6,
                 position = "identity",
                 bins = 30,
                 color = "white") +
  scale_fill_manual(values = c("control" = "blue", "test" = "red")) +
  labs(title = "Delivery Time Distribution: Test vs Control",
       x = "Delivery Time (minutes)",
       y = "Probability Density") +
  theme_minimal()
Exercise 3
control_number <- data %>%
  filter(experiment_group == "control") %>%
  nrow()
test_number <- data %>%
  filter(experiment_group == "test") %>%
  nrow()
cat("Control number:", control_number, "Test number:", test_number)
Control number: 10092 Test number: 10104
Exercise 4
avg_time <- data %>%
  group_by(experiment_group) %>%
  summarise(avg_time = mean(delivery_time))
avg_time
Exercise 5
sd_time <- data %>%
  group_by(experiment_group) %>%
  summarize(
    sd_delivery_time = sd(delivery_time, na.rm = TRUE),
    .groups = "drop"
  )
sd_time
Exercise 6
t.test(data$delivery_time ~ data$experiment_group, var.equal = TRUE)
Two Sample t-test
data: data$delivery_time by data$experiment_group
t = 43.036, df = 20194, p-value < 2.2e-16
alternative hypothesis: true difference in means between group control and group test is not equal to 0
95 percent confidence interval:
5.744183 6.292393
sample estimates:
mean in group control mean in group test
45.06510 39.04681
Exercise 7
The t-test output in Exercise 6 already provides the 95% confidence interval for the difference in means (control minus test): [5.7, 6.3] minutes.
- We are 95% confident that the true difference in average delivery times between the control and test groups lies between 5.7 and 6.3 minutes.
- Because the entire confidence interval lies above 0, the data support the conclusion that the test group has a lower average delivery time than the control group.
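As a rough cross-check, the interval can be reproduced from the numbers printed in the t-test output alone. The sketch below assumes only the reported group means, t statistic, and degrees of freedom; the variable names are illustrative and not part of the original analysis.

```r
# Values copied from the t-test output in Exercise 6
mean_control <- 45.06510
mean_test    <- 39.04681
t_stat       <- 43.036
df           <- 20194

# Difference in means (control minus test)
mean_diff <- mean_control - mean_test

# The t statistic is the difference divided by its standard error,
# so the standard error can be recovered by division
se_diff <- mean_diff / t_stat

# Two-sided 95% interval: difference +/- critical t value * standard error
ci <- mean_diff + c(-1, 1) * qt(0.975, df) * se_diff
print(round(ci, 2))
```

Small deviations from the printed interval are expected because the inputs are rounded.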
Exercise 8
# Cohen's d: absolute difference in means divided by the pooled standard deviation
observed_mean_diff <- abs(avg_time$avg_time[1] - avg_time$avg_time[2])
pooled_sd <- sqrt((sd_time$sd_delivery_time[1]^2 + sd_time$sd_delivery_time[2]^2) / 2)
effect_size <- observed_mean_diff / pooled_sd
# Use the smaller group size as the per-group n
sample_size_control <- control_number
sample_size_test <- test_number
sample_size <- min(sample_size_control, sample_size_test)
power_result <- pwr.t.test(
  n = sample_size, d = effect_size, sig.level = 0.05,
  type = "two.sample", alternative = "two.sided"
)
power_result$power
[1] 1
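The same calculation can be sketched with base R's `power.t.test()`, which puts the power of 1 in context. This is a minimal illustration, not the lab's method: it hard-codes the group means and standard deviations reported in this lab, assumes equal groups of 10092 (the smaller group), and uses illustrative variable names.

```r
# Summary statistics reported in this lab (hard-coded for illustration)
mean_control <- 45.0651011
mean_test    <- 39.0468131
sd_control   <- 9.990017
sd_test      <- 9.8833084

# Effect size: difference in means over the pooled standard deviation
mean_diff <- mean_control - mean_test
pooled_sd <- sqrt((sd_control^2 + sd_test^2) / 2)
cohens_d  <- mean_diff / pooled_sd

# Power at the actual per-group sample size (assumed equal groups)
power_at_n <- power.t.test(
  n = 10092, delta = mean_diff, sd = pooled_sd,
  sig.level = 0.05, type = "two.sample", alternative = "two.sided"
)$power

# Per-group sample size that would give 80% power for the same effect
n_for_80 <- power.t.test(
  power = 0.80, delta = mean_diff, sd = pooled_sd,
  sig.level = 0.05, type = "two.sample", alternative = "two.sided"
)$n

print(c(cohens_d = cohens_d, power_at_n = power_at_n, n_for_80 = n_for_80))
```

With an effect of this size, a few dozen observations per group would already reach 80% power, so a power of 1 at roughly 10,000 per group is unsurprising.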
Exercise 9
The A/B test results provide strong evidence for the effectiveness of the new delivery algorithm:
The average delivery time is 45.07 minutes in the control group and 39.05 minutes in the test group, a reduction of 6.02 minutes in the test group.
The standard deviation of delivery times is nearly identical between the groups:
- Control group: 9.99 minutes
- Test group: 9.88 minutes
This indicates similar variability in both groups.
The t-test confirms that the difference in means is statistically significant (p-value < 2.2e-16, far below 0.05), with a 95% confidence interval of [5.7, 6.3] minutes.
The power of the test rounds to 1: with about 10,000 observations per group, a difference of the observed size would almost certainly be detected.
We can reject the null hypothesis that “there is no difference in the mean delivery time between couriers using the old algorithm and those using the new algorithm.”