The estimated price elasticity \(\eta\)=-2.5642. A 1% price increase leads to a 2.56% product’s demand reduction. Demand is elastic since \(|\eta|\)>1. Model intercept is \(\alpha\)=16.91.
Log-linear demand model:
\(log(q)=16.91-2.56log(p) + \epsilon\)
\(q=e^{16.91}p^{-2.56}\)
\(\pi(p)=e^{16.91}p^{-2.56} (p)=e^{16.91}p^{-1.56}\). When p = 0.7p: \(\pi(0.7p)=e^{16.91}0.7p^{-1.56}\)
A 30% discount is predicted to increase profit by 74.7%. With elastic demand, the quantity increase from the discount outweighs the percentage loss in price per unit.
c)
#q1c identify potential bias of model 1b library(readr)library(dplyr)# Load the datasetq1data <-read_csv("store_weekly.csv")
Rows: 104 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (4): week, month, price, quantity
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# weekly profit with mc = 0 q1data <- q1data %>%mutate(profit = price * quantity)# Identify discount perioddiscount_p <-filter(q1data, month %in%c(11, 12))ndiscount_p <-filter(q1data, !(month %in%c(11, 12)))# Calculate average weekly profit in each perioddiscount_pavg <-mean(discount_p$profit) #profit during discount period ndiscount_pavg <-mean(ndiscount_p$profit) #profit during non discount period print(discount_pavg)
Prices and quantities are endogenous variables determined in equilibrium through supply and demand curves’ intersection. The log-linear model suffers from omitted variable bias. The discount occurs during November-December, when both supply side (deliberate price cuts) and demand side (baseline demand rises during holiday) changes occur, therefore the observed low prices and high quantities may either reflect elastic demand, shift in demand curve, or both. The model omits demand shifters such as seasonality (holiday season baseline demand changes), attributing all quantity changes to discount. Holiday baseline demand rises means positive demand shocks. Low prices coincide with positive demand shocks violate exogeneity.
d)
Proposed model: fixed effects panel regression which controls for unobserved time-invariant and unit-variant factors (store-specific events, fixed seasonal trends) through dummy variables. lt is suitable since the data-set spans two years of weekly sales data and isolates discount effects from seasonality.
# regression: month fixed effects to control for seasonal demand shiftsmodel_1d <-lm(log(quantity) ~log(price) +factor(month), data = q1data)summary(model_1d)
elasticity_1d <-coef(model_1d)["log(price)"]cat("model 1d price elasticity of demand is ",round(elasticity_1d,4), sep ="", "\n")
model 1d price elasticity of demand is -0.6198
#ped difference ped_dif_q1<- elasticity_1d- elasticity_1bcat("difference between 1d_elasticity to 1b_elasticity is ",round(ped_dif_q1,4), sep ="", "\n")
difference between 1d_elasticity to 1b_elasticity is 1.9444
# profit ratiosprofit_ratio_1b <-0.7^(1+ elasticity_1b)profit_ratio_1d <-0.7^(1+ elasticity_1d) cat("model 1d profit ratio is ",round(profit_ratio_1d,4), sep ="", "\n")
model 1d profit ratio is 0.8732
#F-statistics test model_restricted <-lm(log(quantity) ~log(price), data = q1data)anova_test <-anova(model_restricted, model_1d)cat("\n=== F-TEST FOR MONTH EFFECTS ===\n")
=== F-TEST FOR MONTH EFFECTS ===
print(anova_test)
Analysis of Variance Table
Model 1: log(quantity) ~ log(price)
Model 2: log(quantity) ~ log(price) + factor(month)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 102 2.53887
2 91 0.79937 11 1.7395 18.002 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
if (anova_test$`Pr(>F)`[2] <0.05) {cat("\n✓ Month effects are statistically significant (p < 0.05)\n")cat(" This confirms seasonal demand shifts are present and should be controlled for.\n")} else {cat("\n✗ Month effects are not statistically significant (p > 0.05)\n")}
✓ Month effects are statistically significant (p < 0.05)
This confirms seasonal demand shifts are present and should be controlled for.
From model_1d, \(\eta_{1d}\)=-0.6198 (p=0.0037) and \(\frac{\pi_{new}}{\pi_{old}}\)=0.8732, which means relative to old profit, at 0.7p (seasonality controlled), new profit decreases by 12.68%. The discount is therefore not justified.
Comparison of fixed-effects model to log-linear model:
Log-linear model \(R^2\)=0.857, fixed-effects model \(R^2\)=0.955, this means month effects substantially accounts for previously unexplained variance. This confirms that linear-log model exhibits optimistic bias, and shows that true demand is inelastic.
F-statistics measures if significant differences exist between the mean quantity sold each month.
\(F=\frac{between-group \ variability}{within-group\ variability}=18\ (p<0.001)\). There is enough evidence to reject the null hypothesis that all month effects are zero.
Q2:
a)
Equation: \(retention_{i}=\beta_{0}+\beta_1activation_{i}+\epsilon_i\), in which \(retention_i=1, activation_i=1\), subsequently represents retained customer and activated customer, and 0 otherwise.
#number of customers in treatment group q2a_treated <-sum(q2data$treated==1)#overall activation rate of whole dataset. q2a_activation<-mean(q2data$activation)#overall retention rate of whole dataset. q2a_retention <-mean(q2data$retention) #retention by activation retention_by_activation <- q2data %>%group_by(activation) %>%summarise(n =n(),retention_rate =mean(retention) )#regression model model_2a <-lm(retention ~ activation, data = q2data)summary(model_2a)
Call:
lm(formula = retention ~ activation, data = q2data)
Residuals:
Min 1Q Median 3Q Max
-0.7031 -0.4398 0.2969 0.5602 0.5602
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.43983 0.01386 31.72 <2e-16 ***
activation 0.26331 0.02199 11.97 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4813 on 1998 degrees of freedom
Multiple R-squared: 0.06695, Adjusted R-squared: 0.06648
F-statistic: 143.4 on 1 and 1998 DF, p-value: < 2.2e-16
#coefficient of activation beta_activation <-coef(model_2a)["activation"]cat("activation coefficient:", round(beta_activation, 4), "\n")
activation coefficient: 0.2633
intercept_b0<-coef(model_2a)["(Intercept)"]#given infocost_per_act <-5ltv_per_retained <-50#profit calculationexpected_value <- beta_activation*ltv_per_retainedcat("expected value per activation = £",round(expected_value,2), sep ="", "\n")
#total expected profit for 2000 customers n_customers <-nrow(q2data)t_expected_value <- profit_per_act*n_customerscat("total expected profit when offered to 2000 customers = £",round(t_expected_value,2), sep ="", "\n")
total expected profit when offered to 2000 customers = £16331.06
From model_2a:
\(retention_i=0.4398+0.2633activation_i\)
The baseline retention rate (\(\beta_0\) activation = 0) is 43.98% (p<0.001). Activation coefficient (\(\beta_1)\) is 0.2633 (p<0.001), meaning that activation increases retention by 26.33%.
Expected profit per activation is: \(\pi = \beta_1 *(LTV - CPA)\).
Extra LTV per activation is \(0.2633*50=13.17\). Expected profit per activation is therefore £8.17. It is profitable for the firm to provide the new feature.
b)
The initial treatment assignment is a plausible but not perfect instrument for measuring actual activation status.
A good instrument variable (IV) \(Z\) for an endogenous regressor \(X_1\) has to satisfy the following conditions:
Exogeneity: No correlation between IV and the error term \(Cov(Z,\epsilon)=0\)
Instrument relevance: IV must be correlated with endogenous variable \(Cov(Z, X_1) \neq0\).
Strength: Weak instruments (those that weakly correlate to endogenous variable) can cause severe biases in IV estimates.
Exclusion restriction: IV \(Z\) affects the dependent variable (\(Y\)) only through its effects on \(X_1\).
#q2blibrary(tidyverse)q2data<-read.csv("customer_data.csv")#activation rate among treatment group activation_rate_treated<-mean(q2data$activation[q2data$treated ==1]) #regression activation on treated model_2b<-lm(activation ~ treated, data = q2data)summary(model_2b)
Call:
lm(formula = activation ~ treated, data = q2data)
Residuals:
Min 1Q Median 3Q Max
-0.795 0.000 0.000 0.205 0.205
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.747e-14 9.032e-03 0.00 1
treated 7.950e-01 1.277e-02 62.24 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2856 on 1998 degrees of freedom
Multiple R-squared: 0.6598, Adjusted R-squared: 0.6596
F-statistic: 3874 on 1 and 1998 DF, p-value: < 2.2e-16
coef_treated <-coef(model_2b)["treated"]cat("coefficient of treated = ",coef_treated, sep ="", "\n")
coefficient of treated = 0.795
From the analysis:
Relevance and strength assumption holds: first stage regression \(activation_i=\gamma+\delta treated_i+u_i\). The treated coefficient \(\gamma\)=0.795 and F-statistics of 3874 exceeds the conventional threshold (10), indicating strong empirical evidence for relevance and instrument strength.
Exclusion restriction and exogeneity: Given the study design (50% treated, 50% control), it can be assumed that the firm intended to randomise assignment, making exogeneity and exclusion plausible design assumptions. Realistically, this may not hold if staff selectively chose more engaged clients. If true, this would mean a direct correlation of assignment to retention, violating the exclusion restriction. The 50/50 design supports but doesn’t guarantee exogeneity since its validity rests on unobservable procedure details.
c) d)
#q2c#q2c iv regression with initial treatment assignment library(tidyverse)#install.packages("AER")library(AER)
Warning: package 'AER' was built under R version 4.3.3
Loading required package: car
Warning: package 'car' was built under R version 4.3.3
Loading required package: carData
Warning: package 'carData' was built under R version 4.3.3
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
Loading required package: lmtest
Warning: package 'lmtest' was built under R version 4.3.3
Loading required package: zoo
Warning: package 'zoo' was built under R version 4.3.3
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Loading required package: sandwich
Warning: package 'sandwich' was built under R version 4.3.3
Loading required package: survival
q2data<-read.csv("customer_data.csv")#first stage: activation treated fs_2c<-lm(activation ~ treated, data = q2data)summary(fs_2c)
Call:
lm(formula = activation ~ treated, data = q2data)
Residuals:
Min 1Q Median 3Q Max
-0.795 0.000 0.000 0.205 0.205
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.747e-14 9.032e-03 0.00 1
treated 7.950e-01 1.277e-02 62.24 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2856 on 1998 degrees of freedom
Multiple R-squared: 0.6598, Adjusted R-squared: 0.6596
F-statistic: 3874 on 1 and 1998 DF, p-value: < 2.2e-16
q2data$activation_hat <-fitted(fs_2c)#second stage: retention to instrumented activation model_2c<-lm(retention~activation_hat, data=q2data)summary(model_2c)
Call:
lm(formula = retention ~ activation_hat, data = q2data)
Residuals:
Min 1Q Median 3Q Max
-0.574 -0.515 0.426 0.485 0.485
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.51500 0.01573 32.742 < 2e-16 ***
activation_hat 0.07421 0.02798 2.652 0.00806 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4974 on 1998 degrees of freedom
Multiple R-squared: 0.003509, Adjusted R-squared: 0.00301
F-statistic: 7.035 on 1 and 1998 DF, p-value: 0.008055
#alternative methods alternative_2c<-ivreg(retention~activation|treated, data = q2data)summary(alternative_2c)
Call:
ivreg(formula = retention ~ activation | treated, data = q2data)
Residuals:
Min 1Q Median 3Q Max
-0.5892 -0.5150 0.4108 0.4850 0.4850
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.51500 0.01550 33.228 < 2e-16 ***
activation 0.07421 0.02757 2.692 0.00717 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4901 on 1998 degrees of freedom
Multiple R-Squared: 0.03242, Adjusted R-squared: 0.03194
Wald test: 7.245 on 1 and 1998 DF, p-value: 0.007167
Using initial treatment assignment as IV for actual activation, the coefficient of instrumented activation on retention is 0.0742 (p<0.001), which means among treatment group, feature activation increases retention probability by 7.42%. Each activation yields an LTV of £3.71 and expected profit per activation is therefore (-£1.29). Providing the feature is therefore not profitable.
\(ATT= E(Y_i(1)-Y_i(0)|W_i=1)=\frac{E(Y_i|Z_i=1)-E(Y_i|Z_i=0)}{Pr(W_i=1|Z_i=1)}\) represents the average effect of the treatment among those treated, \(Pr(W_i=1|Z_i=1)\) is activation rate among treatment group. \(Z \in \{0,1\}\) - assignment status, \(Y \in \{0,1\}\) - retention status, \(W \in \{0,1\}\) - activation status.
Local average treatment effect (LATE) estimates the treatment effect among those who complied with the assignment, if and only if activation among control \(E(W_i|Z_i=0)\) then ATT = LATE.
ITT measures the retention effect based on initial assignment, regardless of actual activation status. Being assigned the treatment increases retention rate by 5.9%. Activation rate among treatment \(Pr(W_i=1|Z_i=1)\) is 79.5%. ATT is around 7.42%, which means in treat=1, activators have a 7.42% higher retention rate. Activation rate among control \(Pr(W_i=1|Z_i=0)=E(W_i|Z_i=0) = 0\) , ATT and LATE both yields 7.42%.
#q2dlibrary(tidyverse)q2data<-read.csv("customer_data.csv")#retention rate among treatment and control (y|z)retention_among_treated<-mean(q2data$retention[q2data$treated ==1])cat("retention among treated is ", retention_among_treated, sep ="", "\n")
retention among treated is 0.574
retention_among_control <-mean(q2data$retention[q2data$treated ==0]) cat("retention_among_control is ", retention_among_control, sep ="", "\n")
retention_among_control is 0.515
itt <- retention_among_treated- retention_among_controlcat("itt is ",itt, sep ="", "\n")
#profitability cpa<-5ltv_per_retention<-50net_benefit <- ltv_per_retention*att - cpa cat("net profit per customers is £", round(net_benefit,2), sep ="", "\n")
net profit per customers is £-1.29
n_customers<-nrow(q2data)expected_act<- n_customers *mean(q2data$activation)total_expected_benefit <-expected_act * net_benefitcat("total expected profit per 2000 customers is £", total_expected_benefit, sep ="", "\n")
total expected profit per 2000 customers is £-1025
Model 2a and 2c yields coefficient of activation of 0.2633 and 0.0742, respectively. Profit per activation is £8.17 and (-£1.29), subsequently for 2a and 2c. 2c and 2d’s results are identical.
Results from 2c and 2d model is more reliable since 2a suffers positive selection bias (more engaged customers are more likely to activate the feature and/or more likely to retain subscription regardless of activation) leading to overestimation.
The firm is therefore recommended to not implement the feature. Once activation endogeneity is addressed, causal effect is modest (7.24%). Additionally, the firm is expected to make a net loss of -£1.29 per activation. The firm should either redesign the feature to reduce costs below £3.71 per activation or target customers with a higher LTV.
#logistic regression renewal on treatment statusmodel_4a<-glm(renewed~treat, data = q4data, family= binomial )summary(model_4a)
Call:
glm(formula = renewed ~ treat, family = binomial, data = q4data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.44288 0.05168 -47.27 <2e-16 ***
treat 1.69360 0.06004 28.21 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 9960.7 on 9999 degrees of freedom
Residual deviance: 9000.0 on 9998 degrees of freedom
AIC: 9004
Number of Fisher Scoring iterations: 5
coef_treat <-coef(model_4a)["treat"]coef_treat_odds<-exp(coef_treat)intercept_4a<-coef(model_4a)["(Intercept)"]control_renewal_rate <- (exp(intercept_4a)/(1+exp(intercept_4a)))cat("renewal probability of control group (scenario 2) is ", round(control_renewal_rate,3), sep ="", "\n")
renewal probability of control group (scenario 2) is 0.08
intercept_4a_odds<-exp(intercept_4a)cat("renewal odds for control group is ", round(intercept_4a_odds,3), sep ="", "\n")
renewal odds for control group is 0.087
#renewal probability of treatment group treated_renewal_rate<-(exp(coef_treat+intercept_4a)/(1+exp(coef_treat+intercept_4a))) cat("renewal probability of treatment group (scenario 1) is ", round(treated_renewal_rate,3), sep ="", "\n")
renewal probability of treatment group (scenario 1) is 0.321
#treatment effect on renewal rate treatment_effect_on_renewal<- treated_renewal_rate - control_renewal_ratecat("treatment effect on renewal rate is ", round(treatment_effect_on_renewal,3), sep ="", "\n")
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.321 0.321 0.321 0.321 0.321 0.321
alltreated_avg<-mean(pred_alltreated)cat("renewal rate when everyone is treated is ",round( alltreated_avg,3), sep ="", "\n")
renewal rate when everyone is treated is 0.321
#predict renewal treated = no one. nonetreated_4a<- q4data %>%mutate(treat=0)pred_nonetreated <-predict.glm(model_4a, newdata=nonetreated_4a, type ="response")nonetreated_avg<-mean(pred_nonetreated) cat("renewal rate when no one is treated is ", round(nonetreated_avg,3), sep ="", "\n")
renewal rate when no one is treated is 0.08
#comparison effect_4a<- alltreated_avg - nonetreated_avgcat("treatment effect on renewal rate is ", round(effect_4a,3), sep ="", "\n")
treatment effect on renewal rate is 0.241
\(P(renew_i=1)=\lambda(\beta_0+\beta_1treat_i)\) in which \(\lambda(z)=\frac{e^z}{1+e^z}\)
The treatment coefficient (log-odds) is \(\beta_1=1.694\) (p<0.001), \(e^{\beta_0}\approx5.44\), treatment increases renewal odds by 444%. The intercept (log-odds) \(\beta_0=-2.443\) ( p<0.001), \(e^{\beta_0}=0.087\) is the odds of renewal for treat = 0.
Scenario 2: No one treated \(\hat{P}(renewed=1|treated=0)=\lambda(\beta_0)=(-2.443)\lambda\)
\(p_0=\frac{e^{\beta_0}}{1+e^{\beta_0}}\approx0.080\), expected renewal rate for scenario 2 is 8%.
Treatment effect on overall renewal rate is \(p_1-p_0=0.241\). Treatment increases overall retention rate by 24.1%, exceeding the threshold (10%).
b)
#q4b #install.packages("pscl")#library(pscl)library(tidyverse)q4data<-read.csv("fitness.csv") #logistic regression with treatment & past workouts model_4b <-glm(renewed ~ treat + past_workouts, data = q4data, family = binomial)summary(model_4b)
Call:
glm(formula = renewed ~ treat + past_workouts, family = binomial,
data = q4data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.47606 0.09438 -47.42 <2e-16 ***
treat 0.97308 0.06574 14.80 <2e-16 ***
past_workouts 0.43771 0.01497 29.25 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 9960.7 on 9999 degrees of freedom
Residual deviance: 7964.9 on 9997 degrees of freedom
AIC: 7970.9
Number of Fisher Scoring iterations: 5
treat4b_coef<-coef(model_4b)["treat"]pworkout<-coef(model_4b)["past_workouts"]intercept_4b<-coef(model_4b)["(Intercept)"]#renewal prob for user of treat = 0 and past workouts = 0baseline_renew_prob <- (exp(intercept_4b)/(1+exp(intercept_4b)))cat("renewal rate for user with treat = 0 and past_workouts = 0 is ",round(baseline_renew_prob,3), sep ="", "\n")
renewal rate for user with treat = 0 and past_workouts = 0 is 0.011
# scenario 1 - everyone treated all_treated <- q4data %>%mutate(treat =1)pred_treatment <-predict.glm(model_4b, newdata = all_treated, type ="response")avg_ptreated <-mean(pred_treatment)cat("renewal rate when everyone is treated (pastworkout constant) is ",round(avg_ptreated,4), sep ="", "\n")
renewal rate when everyone is treated (pastworkout constant) is 0.2471
# scenario 2 - no one treateall_control <- q4data %>%mutate(treat =0)pred_control <-predict.glm(model_4b, newdata = all_control, type ="response")avg_pcontrol <-mean(pred_control)cat("renewal rate when no one is treated (pastworkout constant) is ",round(avg_pcontrol,4), sep ="", "\n")
renewal rate when no one is treated (pastworkout constant) is 0.1245
effect_4b<- avg_ptreated - avg_pcontrolcat("treatment effect on renewal rate (pastworkouts constant) is ", round(effect_4b,3), sep ="", "\n")
treatment effect on renewal rate (pastworkouts constant) is 0.123
#past workouts control vs treatment avg_control_pwrk <-mean(q4data$past_workouts[q4data$treat ==0])avg_treat_pwrk <-mean(q4data$past_workouts[q4data$treat ==1]) #pearson correlation coefficient treatment and pastworkouts model4a validity cor(q4data$treat, q4data$past_workouts)
[1] 0.4401311
The treatment coefficient \(\beta_1=0.973\) (p<0.001), odds ratio \(e^{\beta_1}=2.65\), treatment increases renewal odds by 165% (past_workout controlled).
The past_workout coefficient \(\beta_2=0.438\) (p<0.001), \(e^{\beta_2}=1.55\), which means each additional past workout raises the renewal odds by 55% (treatment controlled).
The intercept (\(\beta_0\)) (p<0.001) is -4.48 represents the log-odds of renewal for a user with treat = 0 and past_workouts = 0.
Baseline activity controlled, the model predicts that treatment increases renewal by 12.26%, exceeding the threshold (10%). The company should therefore proceed with new feature.
The \(r\) for treatment and past_workout is 0.44 demonstrating moderate positive correlation, this indicates selection bias, more active users (those with higher past workouts) were more likely to receive treatment. This creates positive confounding, violating the random assignment assumption. Model 4a hence suffers from omitted variable bias.
c)
#q4c library(tidyverse)q4data <-read.csv("fitness.csv")# Logistic regression renewal on treatment, pworkout, and app freq model_4c <-glm(renewed ~ treat + past_workouts + app_opens, data = q4data, family = binomial)summary(model_4c)
Call:
glm(formula = renewed ~ treat + past_workouts + app_opens, family = binomial,
data = q4data)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.50806 0.09635 -46.789 < 2e-16 ***
treat 0.08070 0.07772 1.038 0.299
past_workouts 0.08360 0.02112 3.959 7.54e-05 ***
app_opens 0.35601 0.01578 22.556 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 9960.7 on 9999 degrees of freedom
Residual deviance: 7395.1 on 9996 degrees of freedom
AIC: 7403.1
Number of Fisher Scoring iterations: 5
treat_coef4c<-coef(model_4c)["treat"]pwrk_coef4c<-coef(model_4c)["past_workouts"]app_coef4c<-coef(model_4c)["app_opens"]intercept_4c<-coef(model_4c)["(Intercept)"] p0<- (exp(intercept_4c)/(1+exp(intercept_4c))) #app open is an in-trial parameter (not pre or post)# scenario 1: everyone treatedall_treated4c <- q4data %>%mutate(treat =1)pred_treated4c <-predict.glm(model_4c, newdata = all_treated4c, type ="response")ptreated_4c <-mean(pred_treated4c) cat("renewal rate when everyone is treated (pastworkout and app usage constant) is ",round(ptreated_4c,4), sep ="", "\n")
renewal rate when everyone is treated (pastworkout and app usage constant) is 0.2012
# scenario 2: no one treated all_control4c <- q4data %>%mutate(treat =0)pred_control4c <-predict.glm(model_4c, newdata = all_control4c, type ="response")pcontrol_4c <-mean(pred_control4c)cat("renewal rate when no one is treated (pastworkout and app usage constant) is ",round(pcontrol_4c,4), sep ="", "\n")
renewal rate when no one is treated (pastworkout and app usage constant) is 0.1918
# Treatment effect on overall retention rateeffect_4c <- ptreated_4c- pcontrol_4ccat("treatment effect on renewal rate (pastworkouts & app usage constant) is ", round(effect_4c,4), sep ="", "\n")
treatment effect on renewal rate (pastworkouts & app usage constant) is 0.0093
The intercept \(\beta_0=-4.51\) (\(p<0.001\)), \(p_0=0.0109\), which means renewal rate for baseline users (treat, post_workouts and app = 0) is 1.09%.
The treatment coefficient \(\beta_1=0.081\) (statistically insignificant \(p=0.299\)), which means the treatment has negligible effect (controlled past_workout app_opens). The past_workouts coefficient \(\beta_2=0.084\) (\(p<0.001\)), (\(e^{\beta_2}\)) of 1.09 which means each additional past workout increases renewal odds by 9%. The app usage coefficient \(\beta_3=0.356\) (\(p<0.001\)), \(e^{\beta_3}=1.43\). Each open increases renewal odds by 43%.
Controlled for past workouts and app usage, treatment affect on renewal rate is 0.93%, significantly smaller than threshold. Model_4c recommends the company to not implement the feature.
Post_workout is a pre-treatment confounders, affecting both the probability of being assigned treatment, and renewal. App open is a mediator in the indirect pathway.
Model comparison and conclusion:
Treatment affect on renewal rates across 4a, 4b and 4c are 24.1%, 12.26% and 0.93%, subsequently. Model 4a suffers from confounding due to positive selection bias because it omits the variable past_workouts, causing overestimation of treatment effect. Model 4c incorporates app usage (besides past workouts), which is a mediator (indirect pathway of how treatment affects outcome), and controlling for app opens would therefore block indirect treatment effect, resulting in underestimation. Model 4b controls for pre-treatment confounder and allows for complete estimation of treatment effect on outcome. Therefore based on model 4b, the new feature should be implemented because it raises renewal by 12.26%, exceeding the threshold (10%).
#q3av0<-0vv<-0.4vw<-0.2vt<-0.2denominator<-exp(v0)+exp(vv)+exp(vw)+exp(vt)#choice probability for brand v prob_v<-exp(vv)/denominator cat("choice probability for brand v is ",round(prob_v,4), sep ="", "\n")
choice probability for brand v is 0.3023
#choice probability for brand w prob_w<-exp(vw)/denominator cat("choice probability for brand w is ",round(prob_w,4), sep ="", "\n")
choice probability for brand w is 0.2475
#choice probability for brand t prob_t<-exp(vt)/denominator cat("choice probability for brand t is ",round(prob_t,4), sep ="", "\n")
choice probability for brand t is 0.2475
#choice probability for outside option prob_0<-exp(v0)/denominator cat("choice probability for outside option (0) is ",round(prob_0,4), sep ="", "\n")
choice probability for outside option (0) is 0.2026
b)
#q3b library(tidyverse)V_V <-0.4V_W <-0.2V_T <-0.2V_0 <-0n_consumers <-50000set.seed(123)# Generate Gumbel errors# If X ~ Exp(1), then -ln(X) ~ Gumbel(0,1)generate_gumbel <-function(n) {-log(rexp(n, rate =1))}#error termepsilon_0 <-generate_gumbel(n_consumers)epsilon_V <-generate_gumbel(n_consumers)epsilon_W <-generate_gumbel(n_consumers)epsilon_T <-generate_gumbel(n_consumers)# Calculate utilitiesU_0 <- V_0 + epsilon_0U_V <- V_V + epsilon_VU_W <- V_W + epsilon_WU_T <- V_T + epsilon_T# maximum utility for each consumerutilities_matrix <-cbind(U_0, U_V, U_W, U_T)max_utilities <-apply(utilities_matrix, 1, max)# Expected consumer surplusexpected_CS <-mean(max_utilities) cat("expected consumer surplus is ",round(expected_CS,4), sep ="", "\n")
expected consumer surplus is 2.1737
Consumer surplus represents the expected utility consumers receive from having access to these brands. Each consumer’s expected surplus is 2.17 from choosing their preferred among 4 options (V,W,T,0), benefiting from product variety (2.17>0).
c)
#q3c merger of W and T library(tidyverse)V_V <-0.4V_W <-0.2V_T <-0.2V_0 <-0n_consumers <-50000set.seed(123)#Gumbel errorsgenerate_gumbel <-function(n) {-log(rexp(n, rate =1))}# error term for each eps_0 <-generate_gumbel(n_consumers)eps_V <-generate_gumbel(n_consumers)eps_W <-generate_gumbel(n_consumers)eps_T<-generate_gumbel(n_consumers)# utilitiesU_0 <- V_0 + eps_0U_V <- V_V + eps_VU_W <- V_W + eps_WU_T <- V_T + eps_T#pre-merger utilities util_pre<-cbind(U_0, U_V, U_W, U_T)max_pre <-apply(util_pre, 1, max) cs_pre <-mean(max_pre) cat("pre-merger consumer surplus is ",round(cs_pre,4), sep ="", "\n")
post-merger consumer surplus changes by (rate) -0.1286
#market share pre and post #util_post <- cbind(U_0, U_V, U_W)post_choice<-max.col(util_post)post_shares<-prop.table(table(post_choice))#util_pre<- cbind(U_0, U_V, U_W, U_T)pre_choice<-max.col(util_pre)pre_shares <-prop.table(table(pre_choice))
The merger reduces aggregate consumer surplus by 12.86%. Reduced product variety (brand T eliminated) accounts for this. Additionally, despite V and W remains, losing T means those prefer T, must now switch to a less-preferred option. In addition, the IIA (independence of irrelevant alternatives) of \(mlogit\) means the T’s probability mass is reallocated equally across all remaining options, instead of W only. The model assumes W&T as perfect substitutes (\(V_W=V_T=0.2)\).
d)
The analysis reveals that merger would lead to a 12.86% reduction in consumer surplus, meaning a harm to consumer welfare.
Validity of policy implications based on model:
The model imposes IIA which might mean unrealistic substitution pattern. Perfect substitutes assumptions may not hold in practice.
The model is static since it assumes fixed product attributes, price, market dynamics, innovation post merger.
The model therefore provides a useful starting point but is insufficient. Real policy choices should be supplemented with potential price effects, competitive dynamics, empirical demand and efficiency claim analysis.
Market dynamics, innovation and barrier-to-entry - static analysis misses dynamic considerations. Considering competitive dynamics, the merger would likely result in reduced competition, which likely means complacency, reduced innovation and price rises. Authorities should also consider barrier to entry changes post-merger, if barriers to entry increase, consumer harm may increase in the long term.
Efficiencies and synergies - The merger might benefit from cost synergies given their shared supply chain (Eden Valley), economies of scale in bottling and distribution, or duplicative SG&A costs’ reduction. Loss in consumer surplus may decrease if efficiency gains are passed on.
Product and portfolio mix - Post-merger, the combined firm might modify their product portfolio mix, which can alter substitution patterns and welfare outcomes beyond the scenario assumption.
Distribution and fairness - authorities also consider distributional impacts and fairness perceptions. Specifically, the merger might disproportionately harming certain groups (e.g: small cafes reliant on brand T) more.