library(survival)
library(survminer)
library(tidyverse)
library(broom)
library(janitor)
survival_data = read_csv("survival_data.csv")
survival_data = survival_data %>%
mutate(status = ifelse(time_to_event <= 30,1,0))
survival_data %>%
group_by(return_status, status) %>%
summarise(
create_day = median(processed_pr_day),
return_pct = round(mean(ret_pct)),
time_event = median(time_to_event)
)
# A tibble: 4 × 5
# Groups: return_status [2]
return_status status create_day return_pct time_event
<chr> <dbl> <dbl> <dbl> <dbl>
1 NR 0 1 0 81
2 NR 1 1 0 8
3 R 0 2 19 72
4 R 1 4 7 4
survival_data %>%
tabyl(return_status,status) %>% adorn_totals() %>%
adorn_percentages() %>%
adorn_pct_formatting() %>%
adorn_ns(position = "front") %>%
adorn_title(row_name = "Return status",
col_name = "Order status")
Order status
Return status 0 1
NR 7,759 (64.6%) 4,246 (35.4%)
R 2,375 (22.1%) 8,368 (77.9%)
Total 10,134 (44.5%) 12,614 (55.5%)
surv_model = survfit(Surv(time_to_event,status) ~ 1, data = survival_data)
ggsurvplot(
surv_model,data = survival_data,
ggtheme = theme_minimal()
) +
labs(
title = expression(bold("Survival plot for overall group"))
)
surv_model_group = survfit(Surv(time_to_event,status) ~ strata(return_status), data = survival_data)
ggsurvplot(
surv_model_group,data = survival_data,fun = function(y) -log(y),
ggtheme = theme_minimal()
) +
labs(
title = expression(bold("Hazard plot for two group"))
)
cox_model = coxph(Surv(time_to_event,status) ~ strata(return_status), data = survival_data)
summary(cox_model)
Call: coxph(formula = Surv(time_to_event, status) ~ strata(return_status),
data = survival_data)
Null model
log likelihood= -111912.9
n= 22748
ggsurvplot(
survfit(cox_model,data = survival_data),
ggtheme = theme_minimal(),fun = function(y) -log(y),
) +
labs(
title = expression(bold("Hazard plot for two group")),
subtitle = expression(italic("Return unaffected(NR) , Return affected(R)"))
)
cox_cov = coxph(Surv(time_to_event,status) ~ return_status +processed_pr_day +ret_pct , data = survival_data)
summary(cox_cov)
Call:
coxph(formula = Surv(time_to_event, status) ~ return_status +
processed_pr_day + ret_pct, data = survival_data)
n= 22748, number of events= 12614
coef exp(coef) se(coef) z Pr(>|z|)
return_statusR 1.729e+00 5.636e+00 2.210e-02 78.23 < 2e-16 ***
processed_pr_day 5.565e-04 1.001e+00 9.434e-05 5.90 3.64e-09 ***
ret_pct -4.701e-02 9.541e-01 1.489e-03 -31.57 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
return_statusR 5.6363 0.1774 5.3973 5.8858
processed_pr_day 1.0006 0.9994 1.0004 1.0007
ret_pct 0.9541 1.0481 0.9513 0.9569
Concordance= 0.716 (se = 0.002 )
Likelihood ratio test= 6538 on 3 df, p=<2e-16
Wald test = 6309 on 3 df, p=<2e-16
Score (logrank) test = 7060 on 3 df, p=<2e-16
From the table , it can be interpreted that 1 unit increase in processed order per day reduces possibility of churn rate considering all other covariates constant. But it is surprisingly showed that return affected customers has a higher churn possibility than who are not return affected considering other covariates constant.
log_rank_test <- survdiff(Surv(time_to_event,status) ~ return_status, data = survival_data)
log_rank_test
Call:
survdiff(formula = Surv(time_to_event, status) ~ return_status,
data = survival_data)
N Observed Expected (O-E)^2/E (O-E)^2/V
return_status=NR 12005 4246 8008 1768 5195
return_status=R 10743 8368 4606 3074 5195
Chisq= 5195 on 1 degrees of freedom, p= <2e-16
The table shows that there are significant different survival time between two groups as the p-value is significant.
ret_mer = survival_data %>%
filter(return_status =="R")
ret_cox = coxph(Surv(time_to_event,status) ~ processed_pr_day +ret_pct , data = ret_mer)
summary(ret_cox)
Call:
coxph(formula = Surv(time_to_event, status) ~ processed_pr_day +
ret_pct, data = ret_mer)
n= 10743, number of events= 8368
coef exp(coef) se(coef) z Pr(>|z|)
processed_pr_day 5.336e-04 1.001e+00 9.721e-05 5.489 4.05e-08 ***
ret_pct -4.647e-02 9.546e-01 1.495e-03 -31.073 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
processed_pr_day 1.0005 0.9995 1.0003 1.0007
ret_pct 0.9546 1.0476 0.9518 0.9574
Concordance= 0.58 (se = 0.004 )
Likelihood ratio test= 1508 on 2 df, p=<2e-16
Wald test = 1000 on 2 df, p=<2e-16
Score (logrank) test = 955 on 2 df, p=<2e-16
ggsurvplot(
survfit(ret_cox,data = ret_mer),
ggtheme = theme_minimal(), palette = "#2E9FDF"
) +
labs(
title = expression(bold("Survival plot for return merchant without adjusting covariate "))
)
## Doing analysis for return affected merchant considering
covariates
ret_cox_cov = coxph(Surv(time_to_event,status) ~ processed_pr_day +ret_pct , data = ret_mer)
summary(ret_cox_cov)
Call:
coxph(formula = Surv(time_to_event, status) ~ processed_pr_day +
ret_pct, data = ret_mer)
n= 10743, number of events= 8368
coef exp(coef) se(coef) z Pr(>|z|)
processed_pr_day 5.336e-04 1.001e+00 9.721e-05 5.489 4.05e-08 ***
ret_pct -4.647e-02 9.546e-01 1.495e-03 -31.073 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
processed_pr_day 1.0005 0.9995 1.0003 1.0007
ret_pct 0.9546 1.0476 0.9518 0.9574
Concordance= 0.58 (se = 0.004 )
Likelihood ratio test= 1508 on 2 df, p=<2e-16
Wald test = 1000 on 2 df, p=<2e-16
Score (logrank) test = 955 on 2 df, p=<2e-16
ggsurvplot(
survfit(ret_cox_cov,data = ret_mer),
ggtheme = theme_minimal(), palette = "#2E9FDF"
) +
labs(
title = expression(bold("Survival plot for return merchant adjusting covariates "))
)
nret_mer = survival_data %>%
filter(return_status =="NR")
nret_cox = coxph(Surv(time_to_event,status) ~ processed_pr_day , data = nret_mer)
summary(nret_cox)
Call:
coxph(formula = Surv(time_to_event, status) ~ processed_pr_day,
data = nret_mer)
n= 12005, number of events= 4246
coef exp(coef) se(coef) z Pr(>|z|)
processed_pr_day 0.022639 1.022897 0.002316 9.773 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
processed_pr_day 1.023 0.9776 1.018 1.028
Concordance= 0.633 (se = 0.004 )
Likelihood ratio test= 47.15 on 1 df, p=7e-12
Wald test = 95.52 on 1 df, p=<2e-16
Score (logrank) test = 142.7 on 1 df, p=<2e-16
ggsurvplot(
survfit(nret_cox,data = nret_mer),
ggtheme = theme_minimal()
) +
labs(
title = expression(bold("Survival plot for return unaffected merchant without adjusting covariate "))
)
# Impact of return percentage on reordering time
return_data = survival_data %>%
filter(return_status == "R")
return_data %>%
dim()
[1] 10743 8
library(MatchIt)
# Create a propensity score model
matchit_model <- matchit(status ~ processed_pr_day + ret_pct , data = return_data, method = "nearest",distance = "glm")
# Create matched dataset
matched_data <- match.data(matchit_model)
ggplot(return_data, aes(x = ret_pct, group = status, color = factor(status))) +
geom_density() +
theme_minimal() +
labs(title = "Density Plot of Return Percentage by Status before matching",
x = "Return Percentage",
y = "Density",
color = "Status") +
theme(legend.position = "none")
## after matching
ggplot(matched_data, aes(x = ret_pct, group = status, color = factor(status))) +
geom_density() +
theme_minimal() +
labs(title = "Density Plot of Return Percentage by Status after matching",
x = "Return Percentage",
y = "Density",
color = "Status") +
theme(legend.position = "none")
matched_model <- lm(time_to_event ~ ret_pct , data = matched_data)
# Display the summary of the matched model
tidy(matched_model)
# A tibble: 2 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 27.1 0.580 46.7 0
2 ret_pct 1.40 0.0297 47.2 0
The table shows that return percentage has a positive coefficient that indicates that return percentage increases reorder time. so we have to think about that.
Business action
The analysis results bring positive insights for the business owner of Baby Cakes. It’s promising to note that customers affected by return orders tend to reorder faster compared to those who are not affected. This trend suggests a high level of trust and loyalty among return-affected customers, indicating their satisfaction with the cakes and their affinity towards the brand.
However, the analysis also raises an alarming concern. As the frequency of return orders increases, it implies a growing dissatisfaction among customers with the quality or taste of the cakes. This dissatisfaction could potentially lead to customer churn from Baby Cakes, posing a significant risk to the business’s reputation and revenue.
Business Actions: Quality Assurance and Product Improvement: Baby Cakes should prioritize efforts to enhance product quality and taste based on the feedback received from return orders. Conducting thorough evaluations of the reasons behind returns and addressing any issues related to ingredients, baking process, or packaging can help improve customer satisfaction and reduce return rates.
Customer Feedback Mechanism: Implementing a robust system for gathering and analyzing customer feedback can provide valuable insights into specific areas for improvement. Actively soliciting feedback through surveys, reviews, and direct communication channels allows Baby Cakes to address customer concerns promptly and proactively.
Communication and Transparency: Maintain transparent communication with customers regarding any changes or improvements made to the products and processes in response to their feedback. Demonstrating a commitment to quality and customer satisfaction can help rebuild trust and loyalty among dissatisfied customers.
Retention Strategies: Develop targeted retention strategies to engage and retain customers who have experienced return orders. Offering personalized incentives, discounts, or special promotions can encourage them to continue patronizing Baby Cakes while demonstrating appreciation for their loyalty.
By taking proactive measures to address customer concerns and improve product quality, Baby Cakes can mitigate the risk of churn and foster long-term relationships with its customers, ensuring sustained growth and success in the competitive online cake processing industry.