Code workflow is hidden below. The following knitr chunk options control what is shown and run:
# eval = FALSE (show the code, but don't run it; no output)
# echo = FALSE (don't show the code, but run it and show the output)
# results = "hide" (show the code and run it, but don't show the output)
# Load Libraries
library(tidyverse)  # data wrangling and ggplot2
library(tidyquant)  # theme_tq() / scale_color_tq() plot styling
library(plotly)     # interactive plots
library(readxl)     # Excel import
library(GGally)     # pairwise plot matrices
library(timetk)     # summarise_by_time() and other time-series helpers
library(resample)   # resampling utilities
library(mosaic)     # do() and shuffle() used in the resampling section
# 1. Loading data sources ----
promo_referrals <- read_csv("promo_referrals.csv")
promo_participants <- read_csv("referral_promo_participants.csv")
# 2. Joining data sources ----
joined_tbl <- promo_referrals %>%
left_join(promo_participants, by = c("sender_user_id" = "user_id"))
During initial EDA, I identified some bucket_timestamps dated later than receiver_account_timestamp and receiver_policy_timestamp.
The bucket_timestamp is recorded when the sender purchases a policy, which enters them into the A/B testing experiment; the promotional period starts after a time delta from the bucket_timestamp (for example, 2 days). This date must therefore occur before the receiver creates an account and before the receiver purchases a policy.
6 cases were identified where bucket_timestamp was later than receiver_policy_timestamp
(bucket_timestamp > receiver_policy_timestamp)
# 2.1 Error handling ----
# Initial EDA identified some bucket timestamps dated later than account creation and receiver policy dates
# Setting up flags to identify aforementioned cases
joined_tbl_2 <- joined_tbl %>%
# Error handling: Account creation was dated before bucket timestamp
mutate(exceed_account_date = case_when(bucket_timestamp > receiver_account_timestamp ~ 1,
TRUE ~ 0),
# Error handling: Receiver Policy was dated before bucket timestamp
exceed_policy_date = case_when(bucket_timestamp > receiver_policy_timestamp ~ 1,
TRUE ~ 0))
joined_tbl_2 %>%
filter(exceed_policy_date == 1) %>%
select(bucket, bucket_timestamp, 7) %>% head()
200 cases were identified where bucket_timestamp was later than receiver_account_timestamp
(bucket_timestamp > receiver_account_timestamp)
joined_tbl_2 %>%
filter(exceed_account_date == 1) %>%
select(bucket, bucket_timestamp, 3) %>% head()
Key Takeaway: Due to this date discrepancy, I have removed the 200 affected samples from the analysis.
What is the fairest way to compare the four buckets?
The fairest way to compare the four buckets is to assess two measures:
1. Conversion rates
Conversion to a quote (# of quotes / # of accounts)
Conversion to a policy (# of policies / # of accounts)
conversion_rates <- joined_tbl_2 %>%
filter(exceed_account_date == 0) %>%
mutate(bucket = as_factor(bucket)) %>%
group_by(bucket) %>%
summarize(account_created = n(),
quote_issued = sum(receiver_quote),
policy_purchased = sum(receiver_policy),
conversion_rate_to_quote = quote_issued/account_created,
conversion_rate_to_policy = policy_purchased/account_created) %>%
arrange(., desc(conversion_rate_to_policy), desc(conversion_rate_to_quote))
conversion_rates %>%
mutate(conversion_rate_to_quote = scales::percent(quote_issued/account_created, accuracy = 0.01),
conversion_rate_to_policy = scales::percent(policy_purchased/account_created, accuracy = 0.01)) %>%
knitr::kable()
bucket | account_created | quote_issued | policy_purchased | conversion_rate_to_quote | conversion_rate_to_policy |
---|---|---|---|---|---|
0hr | 296 | 105 | 32 | 35.47% | 10.81% |
48hr | 272 | 78 | 26 | 28.68% | 9.56% |
off | 192 | 65 | 18 | 33.85% | 9.38% |
168hr | 245 | 83 | 20 | 33.88% | 8.16% |
2. Average time from bucket timestamp to policy purchase
Bucket timestamp to account setup
Account setup to qualifying for quote
Qualifying for quote to policy purchase
*Note: the sum of the first three timeframes will not tie to the overall bucket-timestamp-to-policy-purchase time. Bucket-to-policy time is measured only on referrals that completed the entire conversion funnel, which is a subset of those included in the bucket-to-account measure.
# 3.2 Average time to open account
# bucket timestamp to opening an account to qualifying for quote to purchasing policy
avg_time_tbl <- joined_tbl_2 %>%
filter(exceed_account_date == 0) %>%
select(bucket,bucket_timestamp,3,5,7) %>%
mutate(bucket_to_account_time = difftime(receiver_account_timestamp, bucket_timestamp, units = c("days")),
account_to_quote_time = difftime(receiver_quote_timestamp, receiver_account_timestamp, units = c("days")),
quote_to_policy_time = difftime(receiver_policy_timestamp, receiver_quote_timestamp, units = c("days")),
bucket_to_policy_time = difftime(receiver_policy_timestamp, bucket_timestamp, units = c("days")),
account_to_policy_time= difftime(receiver_policy_timestamp, receiver_account_timestamp, units = c("days"))) %>%
group_by(bucket) %>%
summarize(bucket_to_account_time = mean(bucket_to_account_time, na.rm = TRUE),
account_to_quote_time = mean(account_to_quote_time, na.rm = TRUE),
quote_to_policy_time = mean(quote_to_policy_time, na.rm = TRUE),
bucket_to_policy_time = mean(bucket_to_policy_time, na.rm = TRUE))
avg_time_tbl %>%
knitr::kable()
bucket | bucket_to_account_time | account_to_quote_time | quote_to_policy_time | bucket_to_policy_time |
---|---|---|---|---|
0hr | 10.73398 days | 20.44656 days | 5.824777 days | 31.60464 days |
168hr | 13.42509 days | 22.09691 days | 3.528773 days | 36.08054 days |
48hr | 12.08793 days | 22.04081 days | 5.831035 days | 38.17651 days |
off | 11.57877 days | 20.01060 days | 3.852240 days | 34.75762 days |
To validate the fairness of the test, I reviewed the distributions of each category and variant. Reviewing the policy conversion rates (in green), we can confirm that there is no monthly factor affecting conversion (i.e., users who converted to a policy are not disproportionately allocated to a particular variant in any month).
# 9. Cost per day ----
by_day_tbl_1 <- joined_tbl_2 %>%
# Eliminate the error samples
filter(exceed_account_date == 0) %>%
# Replace NAs with 0 for summation
mutate_at(vars(sender_earned_amount_in_dollars, receiver_earned_amount_in_dollars), ~replace(.,is.na(.),0)) %>%
group_by(bucket) %>%
summarise_by_time(.date_var = bucket_timestamp,
.by = "day",
across(.cols = c(receiver_quote, receiver_policy, sender_earned_amount_in_dollars, receiver_earned_amount_in_dollars), # across() applies the same function to multiple columns
.fns = sum),
.type = "ceiling") %>% # ceiling rounds each date up to the end of the period
mutate(total_cost = sender_earned_amount_in_dollars + receiver_earned_amount_in_dollars)
# mutate(bucket_timestamp = bucket_timestamp %-time% "1 day")
by_day_tbl_2 <- joined_tbl_2 %>%
# Eliminate the error samples
filter(exceed_account_date == 0) %>%
# Replace NAs with 0 for summation
mutate_at(vars(sender_earned_amount_in_dollars, receiver_earned_amount_in_dollars), ~replace(.,is.na(.),0)) %>%
group_by(bucket) %>%
summarise_by_time(.date_var = bucket_timestamp,
.by = "day",
value = n(),
.type = "ceiling")
# mutate(bucket_timestamp = bucket_timestamp %-time% "1 day")
by_day_tbl <- by_day_tbl_1 %>%
left_join(by_day_tbl_2)
time_series_tbl <- by_day_tbl %>%
mutate(accounts_created = value) %>%
select(1:4, 9) %>%
select(1:2, 5, everything()) %>%
pivot_longer(cols = 3:5, names_to = "category", values_to = "value")
time_series_tbl %>%
mutate(category = category %>% as_factor() %>% fct_reorder(value, max) %>% fct_rev(),
text = str_glue("Bucket: {bucket}
Timestamp: {bucket_timestamp}
Category: {category}
Value: {value}")) %>%
ggplot(aes(x = bucket_timestamp, y = value)) +
geom_line(aes(color = category, linetype = category), size = .8) +
geom_point(aes(text = text), size =0.01) +
facet_wrap(~ bucket) +
theme_tq() +
scale_color_tq() +
labs(title = "Distribution of accounts created, quotes qualified and policies purchased",
subtitle = "Traffic of A/B testing is fairly evenly distributed across the months and variants for policy conversion",
y = "# of Accounts Created / Quotes Qualified / Policies Purchased",
x = "") +
theme(legend.position = "right",
legend.background = element_rect(
size=0.5, linetype="solid",
colour ="darkblue"))
In terms of generating referral activity (accounts, quotes, and policies) during the experiment, which variant of the test was most successful? Why do you think that is?
The case states the "A/B test is to increase referrals"; however, there are multiple interpretations: 1) increasing accounts created, 2) increasing quotes qualified, or 3) increasing policies purchased. There are different opportunities to explore under each interpretation. For the purposes of this proposal, I have focused on referral activity measured by policies purchased.
At first glance, sending the promotion immediately (0hr bucket) resulted in the highest number of accounts created, quotes qualified, and policies purchased. This can be explained by the sender's policy purchase still being recent and fresh in mind: a limited-time promotion spurs them to act on it immediately.
question_2 <- conversion_rates %>%
mutate(conversion_rate_to_quote = scales::percent(quote_issued/account_created, accuracy = 0.01),
conversion_rate_to_policy = scales::percent(policy_purchased/account_created, accuracy = 0.01))
question_2 %>% knitr::kable()
bucket | account_created | quote_issued | policy_purchased | conversion_rate_to_quote | conversion_rate_to_policy |
---|---|---|---|---|---|
0hr | 296 | 105 | 32 | 35.47% | 10.81% |
48hr | 272 | 78 | 26 | 28.68% | 9.56% |
off | 192 | 65 | 18 | 33.85% | 9.38% |
168hr | 245 | 83 | 20 | 33.88% | 8.16% |
Lift Rates
# 3.1 Conversion rates for control group
conv_rate_quote_control <- conversion_rates %>%
filter(bucket == "off") %>%
select(conversion_rate_to_quote) %>%
pluck(1)
conv_rate_policy_control <- conversion_rates %>%
filter(bucket == "off") %>%
select(conversion_rate_to_policy) %>%
pluck(1)
question_2_lift <- joined_tbl_2 %>%
filter(exceed_account_date == 0) %>%
group_by(bucket) %>%
summarize(
account_created = n(),
quote_issued = sum(receiver_quote),
policy_purchased = sum(receiver_policy),
conversion_rate_to_quote = quote_issued/account_created,
conversion_rate_to_policy = policy_purchased/account_created,
uplift_quote = conversion_rate_to_quote / conv_rate_quote_control-1,
uplift_policy = conversion_rate_to_policy / conv_rate_policy_control-1) %>%
select(1,5:8)
question_2_lift %>%
mutate(conversion_rate_to_quote = scales::percent(conversion_rate_to_quote, accuracy = 0.01),
conversion_rate_to_policy = scales::percent(conversion_rate_to_policy, accuracy = 0.01),
uplift_quote = scales::percent(uplift_quote, accuracy = 0.01),
uplift_policy = scales::percent(uplift_policy, accuracy = 0.01)) %>%
knitr::kable()
bucket | conversion_rate_to_quote | conversion_rate_to_policy | uplift_quote | uplift_policy |
---|---|---|---|---|
0hr | 35.47% | 10.81% | 4.78% | 15.3% |
168hr | 33.88% | 8.16% | 0.07% | -12.9% |
48hr | 28.68% | 9.56% | -15.29% | 2.0% |
off | 33.85% | 9.38% | 0.00% | 0.0% |
Average time from bucket timestamp to policy purchase
# 3.2 Average time to open account
# bucket timestamp to opening an account to qualifying for quote to purchasing policy
avg_time_to_policy_tbl <- joined_tbl_2 %>%
filter(exceed_account_date == 0) %>%
select(bucket,bucket_timestamp,3,5,7) %>%
mutate(bucket_to_account_time = difftime(receiver_account_timestamp, bucket_timestamp, units = c("days")),
account_to_quote_time = difftime(receiver_quote_timestamp, receiver_account_timestamp, units = c("days")),
quote_to_policy_time = difftime(receiver_policy_timestamp, receiver_quote_timestamp, units = c("days")),
bucket_to_policy_time = difftime(receiver_policy_timestamp, bucket_timestamp, units = c("days")),
account_to_policy_time= difftime(receiver_policy_timestamp, receiver_account_timestamp, units = c("days"))) %>%
group_by(bucket) %>%
# summarize(bucket_to_account_time = mean(bucket_to_account_time, na.rm = TRUE),
# account_to_quote_time = mean(account_to_quote_time, na.rm = TRUE),
# quote_to_policy_time = mean(quote_to_policy_time, na.rm = TRUE),
summarize(bucket_to_policy_time = mean(bucket_to_policy_time, na.rm = TRUE)) %>%
arrange(.,bucket_to_policy_time)
avg_time_to_policy_tbl %>%
knitr::kable()
bucket | bucket_to_policy_time |
---|---|
0hr | 31.60464 days |
off | 34.75762 days |
168hr | 36.08054 days |
48hr | 38.17651 days |
Key Takeaway: While the immediate promotion (0hr bucket) looks like the best variant, how do we know this difference isn't due to random chance?
We need to start with the null hypothesis. In our case, the null hypothesis is that the policy conversion rate of the control treatment is no less than the policy conversion rate of each of the three experimental treatments.
The alternative hypothesis is that an experimental treatment has a higher policy conversion rate. This is what we want to quantify to determine which variant was most successful.
Assuming the sampling distribution is approximately normal, a t-test is appropriate given that the mean and variance are unknown.
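For reference, since receiver_policy is a 0/1 indicator, the same comparison can also be written with t.test's formula interface, and a two-sample proportion test (prop.test) is an equally valid alternative for conversion rates. This is a minimal sketch only, assuming the same joined_tbl_2 and error filter as above; the pivot_wider approach below is what produced the reported results.
# Sketch: Welch t-test on the binary receiver_policy outcome via the formula interface
zero_v_off <- joined_tbl_2 %>%
filter(exceed_account_date == 0, bucket %in% c("0hr", "off"))
t.test(receiver_policy ~ bucket, data = zero_v_off)
# Sketch: two-sample proportion test on policy conversions (successes / trials per bucket)
prop.test(x = c(sum(zero_v_off$receiver_policy[zero_v_off$bucket == "0hr"]),
sum(zero_v_off$receiver_policy[zero_v_off$bucket == "off"])),
n = c(sum(zero_v_off$bucket == "0hr"),
sum(zero_v_off$bucket == "off")))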
1. T-Test between 0 day variant and control group:
# 4.1 T-Test ----
# T-Test of immediate and control group
joined_tbl_2 %>%
filter(bucket == "0hr" | bucket == "off") %>%
select(bucket, receiver_policy) %>%
group_by(bucket) %>%
summarise(value = list(receiver_policy)) %>%
pivot_wider(names_from = bucket, values_from = value) %>%
mutate(p_value = t.test(unlist(`0hr`), unlist(off))$p.value,
t_value = t.test(unlist(`0hr`), unlist(off))$statistic) %>%
select(3:4) %>%
mutate(p_value = scales::number(p_value, accuracy = 0.01),
t_value = scales::number(t_value, accuracy = 0.01)) %>%
knitr::kable()
p_value | t_value |
---|---|
0.91 | -0.12 |
2. T-Test between 2 day variant and control group:
# T-Test of 48hr and control group
joined_tbl_2 %>%
filter(bucket == "48hr" | bucket == "off") %>%
select(bucket, receiver_policy) %>%
group_by(bucket) %>%
summarise(value = list(receiver_policy)) %>%
pivot_wider(names_from = bucket, values_from = value) %>%
mutate(p_value = t.test(unlist(`48hr`), unlist(off))$p.value,
t_value = t.test(unlist(`48hr`), unlist(off))$statistic) %>%
select(3:4) %>%
mutate(p_value = scales::number(p_value, accuracy = 0.01),
t_value = scales::number(t_value, accuracy = 0.01)) %>%
knitr::kable()
p_value | t_value |
---|---|
0.20 | -1.28 |
3. T-Test between 7 day variant and control group:
# T-Test of 168hr and control group
joined_tbl_2 %>%
filter(bucket == "168hr" | bucket == "off") %>%
select(bucket, receiver_policy) %>%
group_by(bucket) %>%
summarise(value = list(receiver_policy)) %>%
pivot_wider(names_from = bucket, values_from = value) %>%
mutate(p_value = t.test(unlist(`168hr`), unlist(off))$p.value,
t_value = t.test(unlist(`168hr`), unlist(off))$statistic) %>%
select(3:4) %>%
mutate(p_value = scales::number(p_value, accuracy = 0.01),
t_value = scales::number(t_value, accuracy = 0.01)) %>%
knitr::kable()
p_value | t_value |
---|---|
0.11 | -1.60 |
Key Takeaway: While the immediate promotion variant is the best on paper, statistically we fail to reject the null hypothesis (since each p-value is greater than 0.05) at the 95% confidence level. From the tables, the immediate promotion conversion rate is not statistically higher than the control conversion rate.
Bootstrapping is an alternative approach to traditional hypothesis testing. It is easier to understand and valid under more conditions. In the resampling approach below, we draw a sample of 100 referrals from each bucket, shuffle the bucket labels 100 times, and use the spread of the shuffled differences to construct a confidence interval around the observed difference in conversion rates. If the confidence interval includes 0, the p-value is deemed to be greater than 0.05 and the difference in conversion is considered insignificant.
Confidence Interval between 0 day variant and control group
# Function: Building a confidence interval using bootstrap distribution
bootstrap_func <- function(data, samples = 100, shuffles = 100){
# Sample from initial jointed tbl
tbl_sample <- data %>%
group_by(bucket) %>%
sample_n(samples) %>%
ungroup()
# Compute policy conversion rates for the sampled data
tbl_conversion_sample <- tbl_sample %>%
group_by(bucket) %>%
summarize(account_created = n(),
quote_issued = sum(receiver_quote),
policy_purchased = sum(receiver_policy),
conversion_rate_to_quote = quote_issued/account_created,
conversion_rate_to_policy = policy_purchased/account_created)
# Observed difference in policy conversion rates between the two buckets
obs_diff <- diff(tbl_conversion_sample$conversion_rate_to_policy)
# Shuffled table (single shuffle, shown for illustration; not used in the calculation below)
tbl_shuffled_sample <- tbl_sample %>%
mutate(bucket = shuffle(bucket)) %>%
group_by(bucket) %>%
summarize(account_created = n(),
quote_issued = sum(receiver_quote),
policy_purchased = sum(receiver_policy),
conversion_rate_to_quote = quote_issued/account_created,
conversion_rate_to_policy = policy_purchased/account_created) %>%
select(1,5)
# Perform specified number of resamples
many_shuffles <- do(shuffles)*
(tbl_sample %>%
mutate(bucket = shuffle(bucket)) %>%
group_by(bucket) %>%
summarize(account_created = n(),
quote_issued = sum(receiver_quote),
policy_purchased = sum(receiver_policy),
conversion_rate_to_quote = quote_issued/account_created,
conversion_rate_to_policy = policy_purchased/account_created) %>%
select(1,5))
# Pull the distribution of conversion-rate differences across shuffles
rand_distn <- many_shuffles %>%
group_by(.index) %>%
summarize(diffconversion = diff(conversion_rate_to_policy))
# Calculate standard error
std_err <- rand_distn %>%
summarize(se = sd(diffconversion))
# Calculate 95% CI
lower_ci <- as.numeric(obs_diff - (2*std_err))
upper_ci <- as.numeric(obs_diff + (2*std_err))
tbl <- tibble(lower_ci,
upper_ci)
return(tbl)
}
# Bootstrap: 0hr vs Off A/B Testing
immediate_v_off <- joined_tbl_2 %>%
filter(bucket == "0hr" | bucket == "off")
set.seed(2021)
bootstrap_func(immediate_v_off) %>%
mutate(lower_ci = lower_ci %>% scales::number(accuracy = 0.01),
upper_ci = upper_ci %>% scales::number(accuracy = 0.01)) %>%
knitr::kable()
lower_ci | upper_ci |
---|---|
-0.18 | 0.10 |
Confidence Interval between 2 day variant and control group
# Bootstrap: 48hr vs Off A/B Testing
twoday_v_off <- joined_tbl_2 %>%
filter(bucket == "48hr" | bucket == "off")
set.seed(2022)
bootstrap_func(twoday_v_off) %>%
mutate(lower_ci = lower_ci %>% scales::number(accuracy = 0.01),
upper_ci = upper_ci %>% scales::number(accuracy = 0.01)) %>%
knitr::kable()
lower_ci | upper_ci |
---|---|
-0.08 | 0.18 |
Confidence Interval between 7 day variant and control group
# Bootstrap: 168hr vs Off A/B Testing
sevenday_v_off <- joined_tbl_2 %>%
filter(bucket == "168hr" | bucket == "off")
set.seed(2023)
bootstrap_func(sevenday_v_off)%>%
mutate(lower_ci = lower_ci %>% scales::number(accuracy = 0.01),
upper_ci = upper_ci %>% scales::number(accuracy = 0.01)) %>%
knitr::kable()
lower_ci | upper_ci |
---|---|
-0.10 | 0.16 |
Key Takeaway: Using two methods, traditional hypothesis testing and bootstrapping, we have shown that the observed differences are not significant. We fail to reject the null hypothesis and conclude that none of the variants converts at a statistically significantly higher rate than the control group.
Consider the fact that we’re paying more money per referred quote during the promo variants. How would you evaluate the tradeoff between more referral activity and more cost?
We will need to determine Root Insurance’s priority before making the tradeoff:
Profitable Growth: If the objective is to demonstrate prudent financial decisions, Root should review the ratio of lifetime value per customer (LTV) to customer acquisition cost (CAC). While LTV is not available in the dataset, Root could estimate it as Average Contract Value ($) × # repeat sales per year × # years of average retention. CAC can be estimated as (total sales and marketing expenses + promotion payouts of $50 or $100) / # new customers acquired. If the LTV / CAC ratio is above 1, we are generating more sales per dollar of sales and marketing (a rough sketch follows after the takeaway below).
Maximize Growth: If the objective is to demonstrate a high growth trajectory to future investors, Root should pursue higher referral activity, which will increase costs. Another reason may be to promote more aggressively in response to competitor activity. With this option, Root should be comfortable operating at a net loss to grow the top line. The benefits of growing faster include more negotiating power on insurance rates and greater brand recognition among customers.
Key Takeaway: For the purposes of this case, I have assumed Root is in a stage where it is pursuing profitable growth. The main objective would be to maximize sales per marketing dollar to drive ROI. If we can compare the LTV for each bucket against its CAC (including the average referral cost below), we can get a better sense of whether the increased cost of the promotions is driving more sales and a higher LTV / CAC ratio.
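To make the LTV / CAC logic concrete, here is a minimal sketch with hypothetical inputs; none of these figures come from the dataset, they are placeholders only to show the arithmetic.
# Hypothetical LTV / CAC sketch (all inputs are illustrative placeholders)
avg_contract_value  <- 900    # assumed average contract value ($)
repeat_sales_per_yr <- 1      # assumed repeat sales per year
avg_retention_yrs   <- 3      # assumed average retention (years)
ltv <- avg_contract_value * repeat_sales_per_yr * avg_retention_yrs
sales_marketing_exp <- 50000  # assumed total sales & marketing expenses ($)
promo_payouts       <- 10000  # e.g., total referral payouts for a bucket (see total_cost below)
new_customers       <- 300    # assumed number of new customers acquired
cac <- (sales_marketing_exp + promo_payouts) / new_customers
ltv / cac  # a ratio above 1 implies more sales per dollar of sales and marketing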
total_cost <- joined_tbl_2 %>%
# Eliminate the error samples
filter(exceed_account_date == 0) %>%
# Replace NAs with 0 for summation
mutate_at(vars(sender_earned_amount_in_dollars, receiver_earned_amount_in_dollars), ~replace(.,is.na(.),0)) %>%
group_by(bucket) %>%
summarize(account_created = n(),
quote_issued = sum(receiver_quote),
policy_purchased = sum(receiver_policy),
total_cost = sum(sender_earned_amount_in_dollars + receiver_earned_amount_in_dollars))
total_cost %>%
mutate(avg_referral_cost_per_policy = total_cost / policy_purchased) %>%
mutate(total_cost = scales::dollar(total_cost),
avg_referral_cost_per_policy = scales::dollar(avg_referral_cost_per_policy)) %>%
knitr::kable()
bucket | account_created | quote_issued | policy_purchased | total_cost | avg_referral_cost_per_policy |
---|---|---|---|---|---|
0hr | 296 | 105 | 32 | $10,225 | $319.53 |
168hr | 245 | 83 | 20 | $5,825 | $291.25 |
48hr | 272 | 78 | 26 | $6,300 | $242.31 |
off | 192 | 65 | 18 | $3,175 | $176.39 |
# 8. Cost per month ----
by_month_tbl_1 <- joined_tbl_2 %>%
# Eliminate the error samples
filter(exceed_account_date == 0) %>%
# Replace NAs with 0 for summation
mutate_at(vars(sender_earned_amount_in_dollars, receiver_earned_amount_in_dollars), ~replace(.,is.na(.),0)) %>%
group_by(bucket) %>%
summarise_by_time(.date_var = bucket_timestamp,
.by = "month",
across(.cols = c(receiver_quote, receiver_policy, sender_earned_amount_in_dollars, receiver_earned_amount_in_dollars), # across() applies the same function to multiple columns
.fns = sum),
.type = "ceiling") %>% # ceiling rounds each date up to the end of the period
mutate(total_cost = sender_earned_amount_in_dollars + receiver_earned_amount_in_dollars) %>%
mutate(bucket_timestamp = bucket_timestamp %-time% "1 day")
by_month_tbl_2 <- joined_tbl_2 %>%
# Eliminate the error samples
filter(exceed_account_date == 0) %>%
# Replace NAs with 0 for summation
mutate_at(vars(sender_earned_amount_in_dollars, receiver_earned_amount_in_dollars), ~replace(.,is.na(.),0)) %>%
group_by(bucket) %>%
summarise_by_time(.date_var = bucket_timestamp,
.by = "month",
value = n(),
.type = "ceiling") %>%
mutate(bucket_timestamp = bucket_timestamp %-time% "1 day")
by_month_tbl <- by_month_tbl_1 %>%
left_join(by_month_tbl_2)
#
#
# by_month_tbl %>%
# mutate(accounts_created = value) %>%
# select(1:4,7,9) %>%
# select(1:2, 6, everything()) %>%
# mutate(total_cost = scales::dollar(total_cost)) %>%
# knitr::kable()
Suppose today is 5/08/2018 at 2:15 PM (when the data was pulled). Based on your answers to questions 1-3, what should we do right now? Do you think we should roll out one of the four variants to everyone? Are any of the variants ready to be shut off? Should we leave it on to keep collecting more data? Why?
Given the current conditions, I recommend the following 2 decisions right now:
Following the analysis above, since statistical significance was not established, we fail to reject the null hypothesis and conclude there is no significant difference between the variants in the A/B experiment.
We then consider the business impact: even without statistical significance, it may still make business sense to launch the immediate promotion if our launch criterion is neutral-or-better conversion. This will require a discussion with the PM.
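One way to inform the "keep collecting data" question is a quick power calculation: how many accounts per bucket would be needed to reliably detect a policy-conversion difference of the size observed (roughly 9.4% for control vs. 10.8% for 0hr, taken from the conversion table above)? This is a sketch using base R's power.prop.test and is not part of the original analysis.
# Rough per-bucket sample size to detect the observed control vs 0hr difference
# at the 5% significance level with 80% power
power.prop.test(p1 = 0.0938,    # control (off) policy conversion rate
p2 = 0.1081,    # 0hr policy conversion rate
sig.level = 0.05,
power = 0.80)
With rates this close, the required sample size runs into the thousands of accounts per bucket, well beyond the few hundred collected so far, which suggests either continuing to collect data or, as noted above, making the launch call on business grounds rather than statistical significance.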
What other tests do you think we should run to further investigate and optimize referral activity?
If we assume the objective is to increase referral activity while improving policy purchase conversion among high-quality drivers, we should consider the following tests to further investigate:
1. Introduce a referral bonus for policy purchase
2. Segmenting customers to send more customized referral promotions