A/B Testing & Casual Inference

Εισαγωγή

Στην παρούσα εργασία θα αναλύσουμε την αποτελεσματικότητα της διαφημιστικής καμπάνιας για μια fintech startup. Θα χρησιμοποιήσουμε για να διαπιστώσουμε αν η νέα διαφήμιση αυξάνει πραγματικά τις μετατροπές (conversions).

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(pwr)    
library(broom)   
library(scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(MatchIt)   

set.seed(94)


ads <- read_csv("marketing_AB.csv") |>
  janitor::clean_names() |>
  mutate(
    group     = factor(test_group, levels = c("psa", "ad")),
    converted = as.integer(converted)
  )

## New names:
## Rows: 588101 Columns: 7
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (2): test group, most ads day dbl (4): ...1, user id, total ads, most ads hour
## lgl (1): converted
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

glimpse(ads)

## Rows: 588,101
## Columns: 8
## $ x1            <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16…
## $ user_id       <dbl> 1069124, 1119715, 1144181, 1435133, 1015700, 1137664, 11…
## $ test_group    <chr> "ad", "ad", "ad", "ad", "ad", "ad", "ad", "ad", "ad", "a…
## $ converted     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
## $ total_ads     <dbl> 130, 93, 21, 355, 276, 734, 264, 17, 21, 142, 209, 47, 6…
## $ most_ads_day  <chr> "Monday", "Tuesday", "Tuesday", "Tuesday", "Friday", "Sa…
## $ most_ads_hour <dbl> 20, 22, 18, 10, 14, 10, 13, 18, 19, 14, 11, 13, 20, 13, …
## $ group         <fct> ad, ad, ad, ad, ad, ad, ad, ad, ad, ad, ad, ad, ad, ad, …

Μέρος Α - Προσωμοίωση Πειράματος

Σε αυτό το στάδιο προσομοιώνουμε τη συμπεριφορά των χρησών για να ελέγξουμε τη στατιστική μας μεθοδολογία με γνωστά εκ προτέρων δεδομένα.

n_control <- 8000
n_treatment <- 8000
p_control <- 0.08
p_treatment <- 0.10

# Α1 
experiment <- tibble(
  user_id = 1:(n_control + n_treatment),
  group = c(rep("control", n_control), rep("treatment", n_treatment)),
  clicked = c(rbinom(n_control, 1, p_control), 
              rbinom(n_treatment, 1, p_treatment))
)

# Α2 
summary_stats <- experiment %>%
  group_by(group) %>%
  summarise(
    n = n(),
    clicks = sum(clicked),
    ctp = mean(clicked),
    se = sqrt(ctp * (1 - ctp) / n),
    ci_lower = ctp - 1.96 * se,
    ci_upper = ctp + 1.96 * se
  )

print(summary_stats)

## # A tibble: 2 × 7
##   group         n clicks    ctp      se ci_lower ci_upper
##   <chr>     <int>  <int>  <dbl>   <dbl>    <dbl>    <dbl>
## 1 control    8000    675 0.0844 0.00311   0.0783   0.0905
## 2 treatment  8000    819 0.102  0.00339   0.0957   0.109

Παρατήρηση: Η ομάδα treatment παρουσιάζει αύξηση στο CTP. Τα διαστήματα εμπιστοσύνης δεν επικαλύπτονται, γεγονός που δηλώνει ότι η διαφορά είναι στατιστικά σημαντική πριν καν προχωρήσουμε σε ελέφχους υποθέσεων.

Οπτικοποίηση δεδομένων

  ggplot(summary_stats, aes(x = group , y = ctp , fill = group)) + geom_col(width = 0.6 , alpha = 0.8)+ geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper), width = 0.1)+ scale_y_continuous(labels = percent)+ labs(title = "Click-Through Probability ανά ομάδα", subtitle = "Προσομοίωση με n=8000", y = "CTP (%)" , x = "Ομάδα") + theme_minimal()

# Α4

test_result<- prop.test(
  x = c(summary_stats$clicks[1], summary_stats$clicks[2]),
  n = c(summary_stats$n[1], summary_stats$n[2]),
  correct  = FALSE 
)

tidy(test_result)

## # A tibble: 1 × 9
##   estimate1 estimate2 statistic   p.value parameter conf.low conf.high method   
##       <dbl>     <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>    
## 1    0.0844     0.102      15.3 0.0000913         1  -0.0270  -0.00899 2-sample…
## # ℹ 1 more variable: alternative <chr>

# A5 

p_pool <- sum(summary_stats$clicks) / sum(summary_stats$n)

se_pool <- sqrt(p_pool * (1-p_pool) * (1/n_control + 1/n_treatment))

delta <- summary_stats$ctp[2] - summary_stats$ctp[1]
cat("Διαφορά (DELTA) : " , delta, "\n")

## Διαφορά (DELTA) :  0.018

cat("Manual 95% CI :[", delta - 1.96 * se_pool, ",", delta + 1.96 * se_pool, "]\n")

## Manual 95% CI :[ 0.008983142 , 0.02701686 ]

# Α6 

h_effect <- ES.h(p1 = 0.10, p2 = 0.08)

pwr_calc_sim <- pwr.2p.test(
  h = h_effect, 
  sig.level = 0.05, 
  power = 0.80, 
  alternative = "two.sided"
)

print(pwr_calc_sim)

## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.069988
##               n = 3204.715
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: same sample sizes

n_req <- ceiling(pwr_calc_sim$n)

Το αποτέλεσμα του prop.test και της χειρωνακτικής επαλήθευσης ταυτίζονται. Το p-value είναι εξαιρετικά μικρό, άρα απορρίπτουμε τη μηδενική υπόθεση.

Μέρος Β - Πραγματικά δεδομένα (Kaggle Marketing AB)

Έλεγχος κατανομής των χρηστών ανά ημέρα

ads %>%
  count(group, most_ads_day) %>%
  group_by(group) %>%
  mutate(pct = n / sum(n)) %>%
  ggplot(aes(x = most_ads_day, y = pct, fill = group)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = percent) +
  labs(title = "Κατανομή Εμφάνισης Διαφημίσεων ανά Ημέρα",
       subtitle = "Έλεγχος Τυχαιοποίησης") +
  theme_minimal()

Υπολογισμοί

ads_summary <- ads %>%
  group_by(group) %>%
  summarize(n = n(), conv = sum(converted), rate=mean(converted))


prop.test(ads_summary$conv, ads_summary$n) %>% tidy()

## # A tibble: 1 × 9
##   estimate1 estimate2 statistic  p.value parameter conf.low conf.high method    
##       <dbl>     <dbl>     <dbl>    <dbl>     <dbl>    <dbl>     <dbl> <chr>     
## 1    0.0179    0.0255      54.0 2.00e-13         1 -0.00946  -0.00593 2-sample …
## # ℹ 1 more variable: alternative <chr>

h_effect <- ES.h(p1 = ads_summary$rate[1], p2 = ads_summary$rate[2])
pwr_calc <- pwr.2p.test(h = h_effect, sig.level = 0.05, power = 0.8)
pwr_calc

## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.05300258
##               n = 5587.823
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: same sample sizes

Το σύστημα έτρεξε πολύ μεγαλύτερο δείγμα από το απαιτούμενο (n_required = r round(pwr_calc$n ανά μονάδα). Η διαφορά είναι στατιστικά σημαντική, υποδεικνύοντας ότι η καμπάνια “ad” είναι αποτελεσματική .

conversion rate ανά ημέρα

daily_cr <- ads %>%
  group_by(most_ads_day, group) %>%
  summarize(conv_rate = mean(converted), .groups = 'drop')


days_order <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
daily_cr$most_ads_day <- factor(daily_cr$most_ads_day, levels= days_order)

ggplot(daily_cr, aes(x = most_ads_day, y = conv_rate, color = group, group = group)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_y_continuous(labels = percent) +
  labs(title = "Conversion Rate ανά Ημέρα και Ομάδα",
       subtitle = "Σύγκριση Ad vs PSA (Control)",
       x = "Ημέρα (Most Ads Day)",
       y = "Conversion Rate (%)",
       color = "Ομάδα") +
  theme_minimal()

daily_diff <- daily_cr %>%
  pivot_wider(names_from = group, values_from = conv_rate) %>%
  mutate(diff = ad- psa) %>%
  arrange(desc(diff))

print(daily_diff)

## # A tibble: 7 × 4
##   most_ads_day    psa     ad    diff
##   <fct>         <dbl>  <dbl>   <dbl>
## 1 Tuesday      0.0144 0.0304 0.0160 
## 2 Monday       0.0226 0.0332 0.0107 
## 3 Wednesday    0.0158 0.0254 0.00960
## 4 Saturday     0.0140 0.0213 0.00731
## 5 Friday       0.0163 0.0225 0.00616
## 6 Sunday       0.0206 0.0246 0.00402
## 7 Thursday     0.0202 0.0216 0.00141

Το γράφιμα δείχνει ότι η ομάδα “ad” υπερτερεί σταθερά της “psa” σχεδόν σε όλες τις ημέρες . Ώστοσο, παρατηρείται ότι η Δευτέρα παρουσιάζει τη μεγαλύτερη απόλυτη διαφορά στο conversion rate

Casual Inference

Εξετάζουμε την αιτιότητα όταν υπάρχουν συγχυτικοί παράγοντες .

Έλεγχος επιχειρηματικής ουσίας

business_threshold <- 0.005

p_ad <- ads_summary$rate[ads_summary$group == "ad"]
p_psa<- ads_summary$rate[ads_summary$group == "psa"]

absolute_lift <- p_ad - p_psa
relative_lift <- absolute_lift / p_psa

test_kaggle <- prop.test(ads_summary$conv, ads_summary$n, correct = FALSE)
ci_lower <- test_kaggle$conf.int[1]
ci_upper <- test_kaggle$conf.int[2]

cat("Absoloute lift : ", round(absolute_lift * 100, 2), "%\n")

## Absoloute lift :  0.77 %

cat("Relative lift : " , round(relative_lift * 100, 2), "%\n")

## Relative lift :  43.09 %

cat("95% Διάστημα εμπιστοσύνης : [" , round(ci_lower * 100 , 2), "% , ", round(ci_upper * 100 , 2), "%] \n")

## 95% Διάστημα εμπιστοσύνης : [ -0.94 % ,  -0.6 %]

cat("Υπερβαίνει το κατώφλι (0.5%) το χειρότερο σενάριο (Lower CI)" , ci_lower > business_threshold, "\n")

## Υπερβαίνει το κατώφλι (0.5%) το χειρότερο σενάριο (Lower CI) FALSE

Τελικό συμπέρασμα

Τα πραγματικά δεδομένα δείχνουν ότι η νέα καμπάνια πέτυχε σχεδόν 43% περισσότερες μετατροπές αναλογικά με το baseline . Επειδή το διάστημα εμπιστοσύνης βρίσκεται πάνω από το προκαθορισμένο όριο του 0.5% (καθώς στο χειρότερο δυνατό σενάριο έχουμε βελτίωση 0.6%), η νέα διαφημιστική καμπάνια κρίνεται απόλυτα πετυχημένη .