Many of the questions on this homework are deliberately open-ended. They invite you to apply what we covered in class — RFM construction, breakeven decision rules, segment-cell sample size, lift and gains, the cost of acting on noisy estimates — together with the strategic reasoning you have built across the rest of the programme. A strong answer is not one that recites the lab; it is one that shows you thinking like an analyst making a recommendation a marketing director could act on. Treat the homework as a reflection exercise: the goal is to consolidate what we have learned and generalise it to settings we did not cover.
This homework extends the work you did in the RFM lab. It is built around a new dataset — a transaction-level export from GourmetBox, an artisan food-and-wine retailer — which you will turn into RFM segments yourself. The lab’s StyleHive customer-level table is not used here.
.Rmd and the knitted output (pdf or
html)A note on AI tools. You may use AI tools to learn syntax or check your understanding. The questions are designed so that copy-pasted AI answers will not get you far: the qualitative questions reward specific, scenario-grounded reasoning, and the coding questions ask you to interpret your own numbers. Markers can tell when an answer doesn’t match the candidate’s code output.
# Setup — run this block first.
rm(list = ls())
options(scipen = 100, digits = 3)
library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
# Read the homework data (transactions and customer file).
transactions <- read.csv("data/gourmetbox_transactions.csv",
stringsAsFactors = FALSE)
transactions$order_date <- as.Date(transactions$order_date)
customers <- read.csv("data/gourmetbox_customers.csv",
stringsAsFactors = FALSE)
customers$signup_date <- as.Date(customers$signup_date)Pick one of the three scenarios below — none of them were covered in class. Stay with that single scenario for parts (a)–(c).
For your chosen scenario:
(a) Define R, F, and M operationally: which raw fields would you use, over what aggregation window, and as of what “snapshot” date? State the units.
(b) Identify two specific ways the StyleHive-style RFM framework would mislead in this scenario. Use specifics of your scenario, not generic complaints about RFM.
(c) Propose one modification — an extra dimension, an alternative metric, or a different segmentation rule — that addresses one of the failure modes from (b). State the trade-off your modification introduces (e.g., “this gives finer segmentation but doubles data requirements”).
(a)
For the chosen scenario of a (i) Subscription News Platform, the snapshot date is set to 3 May 2026 to represent the point at which targeting decisions are made in TODAY. The aggregation window covers the 90 days prior to this snapshot, a timeframe selected because it effectively captures recent engagement behavior while avoiding the noise associated with very short windows or the outdated signals found in longer horizons.
Recency (R) is measured by the number of days since a user’s last login or article read prior to the snapshot date (i.e. 3 May 2026). This metric is calculated using raw data fields for the “last login” and “last article read dates”, measured in units of “days”. A lower recency value indicates a more recently engaged user, which suggests they are more likely to maintain their news subscription.
Frequency (F) tracks the number of active days or articles read within the 90-day aggregation window. Derived from “login events” and “article read logs” raw fields, this count helps interpret user habits. A higher frequency signifies habitual usage and suggests that repeated readers possess high brand loyalty.
In a standard model, Monetary (M) would measure the average or total spend per transaction, which in this context refers to subscription fees. The raw fields used for this calculation include the “subscription fee” and the “number of subscription months”, measured in “GBP (£)”. However, because the subscription fee is fixed at £12 per month, average spend per transaction (M) is CONSTANT. Therefore, monetary value can be replaced with a duration-based metric (D), specifically the total time spent reading articles over the past 90 days. Using raw fields like “time spent per article” and “session duration”, this metric captures deeper user engagement, where a higher duration indicates a stronger perceived value from the subscription.
(b)
The main issue is that Monetary (M) doesn’t really tell us anything in a subscription model. Unlike the retail environment of “StyleHive” where spending varies significantly (some people spend way more than others), everyone here just pays a flat £12 a month. Since the price is the same for everyone, it breaks the RFM rule that “big spenders” are your most valuable customers. Hence, we end up with a model that only looks at “Recency” and “Frequency”. This means a “hardcore daily reader” and “rare reader” look exactly the same on paper because they both pay £12 a month, even though the rare reader is way more likely to cancel soon as they rarely use the subscription. If we stick to this, we’ll end up overvaluing people who aren’t actually engaged and making bad marketing calls.
Furthermore, using a “count-only” method for “frequency (F)” can misrepresent the quality of user engagement. While the StyleHive framework measures frequency through “transaction counts” to capture purchase habits, applying this to a news platform by counting logins or article reads ignores the depth of the interaction. A user who logs in frequently just to skim headlines may receive a high frequency score, whereas a user who visits less often but spends a lot of time on reading deep-dive articles appears less valuable under this metric. This approach may overvalue superficial activity and undervalue meaningful consumption, ultimately resulting in incorrect segmentation and flawed targeting strategies.
(c)
Since the money part doesn’t work, it is suggested to replace Monetary (M) by Engagement Duration (D). This would track the total time someone actually spends reading articles over the 90-day window. It’s a move we see in the real world—like how “Spotify” looks at listening time (Duration) instead of just the subscription fee to see how much someone actually likes the service (as mentioned in the lecture, Spotify is R,F,D). Duration is a much better way to see “behavioral intensity,” basically showing if a user actually cares about the content or is just paying for a subscription they don’t use. This fixes the issue I mentioned earlier where the monetary value was the same for everyone. By using time spent, the model can finally tell the difference between a high-value user who spends hours deep-diving into news and a low-value user who barely clicks anything. This makes the churn risk much easier to spot.
However, it is messier to track. The big trade-off here is that the data gets a lot more complicated. Subscription fees are very clean —you either paid £12 or you didn’t. Tracking duration means we need to monitor things like session time or how far someone scrolls, which can be noisy. For example, if someone leaves a tab open while they go make lunch, it might look like they’re “highly engaged” when they aren’t even in the room. It’s definitely more accurate for engagement, but the data is just harder to keep perfect.
GourmetBox ran an RFM-targeted email campaign last month. The team’s pre-launch deck projected a campaign-level ROMI of ~1.5×. The actual ROMI came in at 0.7×. The post-mortem report contains the table below, plus the operational facts beneath it.
| Segment | Predicted resp. rate | Actual resp. rate | n_mailed | n_responded |
|---|---|---|---|---|
| 111 (best) | 38% | 22% | 412 | 91 |
| 112 | 31% | 18% | 510 | 92 |
| 121 | 28% | 25% | 488 | 122 |
| 211 | 26% | 14% | 605 | 85 |
| 212 | 23% | 12% | 712 | 85 |
| 221 | 21% | 11% | 690 | 76 |
| 311 (mid) | 19% | 7% | 803 | 56 |
| 511 (worst) | 8% | 6% | 945 | 57 |
Operational facts (from the post-mortem):
n_mailed but generated no
responses.(a) Identify three plausible reasons for the gap between predicted and actual ROMI. Cite specific numbers from the table or facts above for each reason.
(b) For each reason, propose one corrective action that is testable in the next campaign.
(c) Rank your three reasons by your confidence that they were the primary driver of the underperformance, and explain in one sentence why your top reason wins.
(a)
The first big problem is that the data being used is both outdated and poorly selected. The predicted response rates are based on the previous quarter’s performance and a short 6-month lookback window, which doesn’t really show stable customer habits. While using OLD RFM segment rates can be helpful, applying them directly assumes that customer behavior and market conditions never change, which just is NOT TRUE in the real world (Environment is keep changing day by day). Also, because the aggregation window was shortened to 6 months—instead of the usual 12 months used in the lab, the model over-emphasizes recent activity. This means a customer who had a temporary spike in activity might get misclassified as “high-value,” which inflates the predicted response rate. All of this creates a systematic overestimation because the inputs and expected outputs aren’t aligned with how customers are actually behaving, leading to that gap in ROMI.
Second, the response rates are skewed because of delivery failures. According to the “operational facts,” 12% of emails bounced, yet they were still counted in the “n_mailed” total. Since these people never saw the offer, they had zero chance to respond. This inflates the denominator, making the campaign look less effective than it actually was and dragging down the observed ROMI.
Finally, the timing and the actual deal were just worse. Sending it at 10am on a Saturday is a big shift from the usual Tuesday–Thursday window, which likely hurt open rates and affect the response rates. The offer itself also had more friction: the minimum spend was increased to £40 (instead of £30) and the deadline was cut to just 5 days instead of 14. This explains why even the “111” segment—the best customers—saw their response rate tank from a predicted 38% to just 22%.
(b)
To address the issue of outdated and mis-specified data inputs, the firm should both improve how customers are scored and how response rates are estimated. First, aggregation window should be extended (e.g., to 12 months) to better capture stable behavioural patterns and reduce noise from short-term activity spikes. Second, we need to update our response rate estimates using more recent data instead of solely looking at what happened last quarter. These estimates should also be validated using a holdout group to measure actual campaign impact and ensure predictions reflect true incremental response. Finally, we should also keep running A/B tests to see how behavior changes and keep our segments up to date. Making these changes will make our targeting much more accurate and help us get a better handle on our real ROMI.
To fix the delivery issue, the team should clean the email list and exclude invalid addresses before deployment. This includes removing hard bounces, verifying emails, and so on. The next campaign can test this by comparing a “cleaned list” group versus a “raw list” group, measuring improvements in effective response rate (responses divided by successfully delivered emails). This ensures that performance metrics reflect TRUE customer behaviour rather than technical inefficiencies.
To improve campaign effectiveness, the team should optimise timing and offer design through controlled experiments. For example, test weekday (Tuesday–Thursday) versus weekend sends, and compare different offer structures (e.g., £10 off for £30 purchase vs £10 off for £40 purchase, and 5-day vs 14-day expiry). Running A/B tests helps figure out exactly how much the timing or the actual discount affects response rates. This lets the company find the best mix to get the most sales without losing money.
(c)
Ranking:
1-Outdated and mis-specified data inputs;
2-Weaker campaign design and timing;
3-Delivery inefficiency due to bounce rates.
The data issue is the primary driver because it causes systematic overestimation across all segments, whereas campaign design and delivery issues reduce performance but do not fully explain the consistent gap between predicted and actual response rates.
A junior analyst at GourmetBox has produced the following report for the marketing director. Read carefully — there are deliberate methodological errors. Your job is to catch them.
Memo: RFM rollout recommendation
To: Marketing Director From: Analyst
“I built an RFM model on Q1 data using customers who had opened at least one marketing email in the previous 6 months — these are our most engaged customers and the most relevant for evaluation. I scored each customer and assigned them to one of 125 segments using independent quintiles. I then computed the response rate per segment on the same Q1 data. The mean response rate across all 125 segments is 17.4%, well above the breakeven of 8%, so I recommend mailing every segment. The ROMI estimate from this analysis is 1.46× — a strong return.
I also recommend prioritising the top-performing segments in the next campaign. Segments 113, 152, and 313 had response rates of 88%, 75%, and 71% respectively — these are clearly the strongest cells and should receive repeat investment. (Segment 113 contained 4 customers, 152 had 4, and 313 had 7.)
For the monetary cutoffs, I used the median order value within each segment (rather than the mean) to make the segments more representative. The breakeven of 8% was computed using AOV = £45 × 35% margin minus £6 coupon value, divided by £2 cost per coupon.”
(a) Identify four distinct methodological errors in this report. For each, explain why it is a problem and propose a specific correction.
(b) Of the four errors, which is most likely to lead to real money lost if the recommendation is acted on? Defend in one sentence.
(a)
- The first error is severe selection bias caused by restricting the sample to customers who “opened at least one marketing email”. The analyst builds the RFM model using only customers who had opened an email in the past six months, which excludes less engaged/ inactive users. This creates a biased sample that over-represents highly responsive and active customers, inflating estimated response rates. As a result, the reported mean response rate of 17.4% is NOT representative of the full customer base and will overestimate expected campaign performance. This directly explains why the analyst recommends mailing every segment.
Correction for (1): Build the RFM model using the entire eligible customer base, not just prior openers, so that response rates reflect true population behaviour.
- The second error is that the model is both trained and evaluated on the same Q1 data. The analyst builds the RFM segments using Q1 data and then computes response rates on that “same Q1 dataset” as mentioned in the memo, which leads to overfitting. This approach inflates performance because the model is effectively capturing patterns specific to that dataset rather than learning behaviour that generalises to future campaigns. As a result, the estimated ROMI of 1.46× is likely overly optimistic and unreliable.
Correction for (2): To fix this, the model should be tested on different data rather than the same Q1 data used to build it. For example, the analyst could build the model using Q1 data and then check how well it performs on a later period (such as Q2) or on a separate group of customers.
- The third error is the recommendation to “mail every segment” based on an average response rate exceeding breakeven. Using the overall mean response rate (17.4%) to justify mailing all segments ignores the fundamental principle of RFM targeting: different segments have different profitability levels. Some segments—especially low-value ones—may still fall below the breakeven threshold of 8%, even if the overall average is above it. This leads to inefficient resource allocation and potentially negative returns in weaker segments.
Correction for (3): Evaluate response rates and profitability at the segment level, and only target segments whose expected response rates exceed breakeven, rather than relying on an aggregate average.
- The fourth error is drawing strong conclusions from extremely small segment sizes, leading to unreliable estimates.The analyst highlights segments 113 (88%), 152 (75%), and 313 (71%) as top performers, yet these segments contain only 4, 4, and 7 customers respectively. Such small sample sizes produce highly volatile and statistically unreliable response rates, meaning the observed performance is likely due to random variation rather than true underlying behaviour. Prioritising these segments risks overinvesting in noise rather than signal.
Correction for (4): Impose a minimum sample size threshold (e.g., at least 50–100 customers per segment) or apply statistical smoothing techniques before making decisions based on segment performance.
(b)
Using the average response rate to justify mailing all segments is most likely to lead to real money lost because it leads to contacting segments whose response rates fall below the breakeven threshold, where the cost per contact (including send fee and discount leakage) exceeds the expected profit per converter, resulting in systematic financial losses.
The CMO has just returned from a conference and has emailed the analytics team:
“After this week’s keynotes I’m convinced we should retire the segment-rule targeting we use today and replace it with a per-customer ML model. The conference vendors said they can build one for us in six months. Please prepare a one-page recommendation.”
Write a memo of around 250 words to the CMO. You must:
I recommend adopting a hybrid approach (RFM + ML) rather than fully replacing our current segment-rule targeting with a pure per-customer ML model.
First, targeting must remain anchored on the breakeven-as-decision rule, which determines whether contacting a customer is profitable. As shown in our analysis, a campaign should only target customer segments whose expected conversion rate exceeds the breakeven threshold (i.e. 8% in this case). RFM segmentation is well-suited for this because it provides stable, interpretable segment-level response rates that can be directly compared to breakeven. In contrast, a pure ML model optimises predicted probability, not profitability, and may target customers who are likely to respond but still generate negative returns after costs.
Second, ML should be used to improve performance through lift and gains, not to replace the segmentation structure. The lecture emphasises that a valuable model produces a “fat” gains curve, which basically means the model is great at finding the most likely buyers and putting them at the very top of the list. While RFM already does a good job of grouping customers, ML can make it even sharper. It can rank people within those high-performing RFM groups so we only spend our limited budget on the absolute best leads.The correct test is whether ML delivers incremental lift over the RFM baseline, not whether it is more complex.
Third, RFM is great because it’s stable. Since it groups people into segments, the numbers are more reliable than looking at each person individually. We can even use “shrinkage” techniques to fix the math for small groups so the estimates aren’t so jumpy.
Fourth, we need to make sure we’re actually causing people to buy, which is why we use holdout groups. As shown in the lab, these groups help us see if the campaign actually made a difference or if people were going to buy anyway. ML models are good at guessing who might buy, but they don’t prove the “why.” Without these tests, we might think the model is doing a better job than it actually is.
Risk: The main risk of a hybrid approach is that it might limit how much ML can improve performance, because we are still restricting decisions using RFM segments. This means we might miss some good customers that ML could identify outside those segments, so we don’t fully benefit from using ML. An early sign that this approach is not working well is if using ML does not actually improve results. For example, if customers selected using ML do not show higher response rates, or if ROMI does not increase compared to using RFM alone, then ML is not adding value and the approach may need to be changed.
You have two files:
gourmetbox_transactions.csv — one row per order line,
with order_id, customer_id, order_date, order_value, is_return, channel,
device.gourmetbox_customers.csv — one row per customer, with
customer_id, signup_date, first_order_days, acquisition_channel, device,
email_opens_6m, campaign, converted.The transactions span roughly two years and include returns. Your job is to construct the RFM table that the rest of the homework relies on. Several modelling decisions are yours to make and defend.
The maximum order date in the data is 2026-04-28.
In the chunk below:
is_return == 1): exclude returns from purchase counts?
subtract returned values from monetary? something else?Then write a short justification (3–5 sentences) below the chunk explaining each choice. Your three choices together should be internally consistent.
snapshot_date <- as.Date("2026-04-29")
lookback_days <- 365 #12 months
return_policy <- "Exclude returns from purchase counts and monetary value"Justification: The snapshot date was set to “2026-04-29”, ensuring a complete view of all transactions up to the maximum order date. A 365-day (12 months) lookback window was selected to account for the seasonality of the food-and-wine industry, effectively capturing annual cycles such as holidays (e.g., Christmas) and gift-giving (e.g., Thanksgiving) periods. To maintain the integrity of “successful demand,” returns were excluded from both frequency and monetary calculations, whcih means that these metrics reflect only completed orders and actual retained revenue.This allows more accurate estimates.
Build a customer-level table with columns customer_id,
recency, frequency, monetary. Use
dplyr — no custom helpers. Show head() and
dim() of the result.
Edge case: customers in customers.csv who do not have a
non-return purchase in your lookback window. Decide whether to drop
them, mark them with NA, or assign them a default (e.g., recency =
lookback_days, frequency = 0, monetary = 0). State your choice in a
comment.
rfm <- customers %>%
select(customer_id, campaign, converted) %>%
left_join( #keep all customers, even if they have no purchase in the lookback window.
transactions %>%
mutate(order_date = as.Date(order_date)) %>%
filter(
is_return == 0,
order_date >= snapshot_date - lookback_days,
order_date < snapshot_date
) %>%
group_by(customer_id) %>%
summarise(
recency = as.numeric(snapshot_date - max(order_date)),
frequency = n_distinct(order_id),
monetary = sum(order_value),
.groups = "drop"
),
by = "customer_id"
) %>%
# Edge case choice:
# Customers with no non-return purchase in the lookback window are kept.
# They receive recency = lookback_days, frequency = 0, monetary = 0.
mutate(
recency = if_else(is.na(recency), lookback_days, recency),
frequency = if_else(is.na(frequency), 0L, frequency),
monetary = if_else(is.na(monetary), 0, monetary)
)
head(rfm)## customer_id campaign converted recency frequency monetary
## 1 1 1 0 117 4 211
## 2 2 1 0 365 0 0
## 3 3 0 NA 7 3 149
## 4 4 1 0 365 0 0
## 5 5 1 0 14 7 199
## 6 6 1 0 37 6 141
## [1] 12000 6
Apply independent quintile segmentation using
ntile() (the same idiom as RFM_lab_student.Rmd
lines 41–44 — ntile(recency, 5) for R, and
6 - ntile(...) for F and M so that 1 = best on every
dimension). Concatenate into an RFM_indep code (e.g.,
“111”, “315”).
rfm <- rfm %>%
mutate(
Rgroup = ntile(recency, 5),
Fgroup = 6 - ntile(frequency, 5),
Mgroup = 6 - ntile(monetary, 5),
RFM_indep = paste0(Rgroup, Fgroup, Mgroup)
)
head(rfm)## customer_id campaign converted recency frequency monetary Rgroup Fgroup
## 1 1 1 0 117 4 211 3 3
## 2 2 1 0 365 0 0 4 5
## 3 3 0 NA 7 3 149 1 3
## 4 4 1 0 365 0 0 4 5
## 5 5 1 0 14 7 199 1 2
## 6 6 1 0 37 6 141 2 2
## Mgroup RFM_indep
## 1 2 332
## 2 5 455
## 3 3 133
## 4 5 455
## 5 3 123
## 6 3 223
##
## 1 2 3 4 5
## 2400 2400 2400 2400 2400
##
## 1 2 3 4 5
## 2400 2400 2400 2400 2400
##
## 1 2 3 4 5
## 2400 2400 2400 2400 2400
## [1] 48
Build a second version of frequency: orders in the last 90
days (rather than your lookback window from 5.1). Re-bin into
quintiles using ntile(). Call this column
Fgroup_90.
Build a contingency table comparing Fgroup (from your
5.1 lookback) and Fgroup_90. How many customers move
between top quintile (1) on one definition and bottom three
quintiles (3, 4, 5) on the other — a “definition
disagreement”?
Discuss: which definition would you use if GourmetBox’s marketing director asked for “frequent buyers”, and why? Are there business questions where the other definition would be more appropriate?
# Original Fgroup used 365 days (12 months) as the lookback window
# Following will creates another frequency measure using only the last 90 days (as provided in the question)
freq_90 <- transactions %>%
mutate(order_date = as.Date(order_date)) %>%
filter(
is_return == 0,
order_date >= snapshot_date - 90,
order_date < snapshot_date
) %>%
group_by(customer_id) %>%
summarise(
frequency_90 = n_distinct(order_id),
.groups = "drop"
)
# Remove old 90-day columns first, in case this chunk is re-run
rfm <- rfm %>%
select(-any_of(c("frequency_90", "Fgroup_90"))) %>%
left_join(freq_90, by = "customer_id") %>%
mutate(
frequency_90 = if_else(is.na(frequency_90), 0L, frequency_90),
Fgroup_90 = 6 - ntile(frequency_90, 5)
)
freq_compare <- table(rfm$Fgroup, rfm$Fgroup_90)
freq_compare##
## 1 2 3 4 5
## 1 1555 567 201 49 28
## 2 641 921 494 190 154
## 3 204 644 778 369 405
## 4 0 268 692 713 727
## 5 0 0 235 1079 1086
definition_disagreement <- rfm %>%
summarise(
n_disagree = sum(
(Fgroup == 1 & Fgroup_90 %in% c(3, 4, 5)) |
(Fgroup_90 == 1 & Fgroup %in% c(3, 4, 5))
)
)
definition_disagreement## n_disagree
## 1 482
Discussion (3–5 sentences): The contingency table shows that many customers are classified into similar frequency groups under both the 12-month and 90-day definitions, especially along the diagonal. However, there are still a number of customers who move between very different groups. In particular, 482 customers (201 + 49 + 28 + 204) fall into the top quintile in one definition but into the bottom three quintiles in the other, which shows a clear definition disagreement. This suggests that the choice of time window has a noticeable impact on how “frequent buyers” are identified.
If GourmetBox’s marketing director asked for “frequent buyers”, I would use the 12-month (365-day) definition. This is because it captures customers’ long-term purchasing behaviour and reflects whether they have a consistent buying habit over time. It is also more stable and less affected by short-term fluctuations.This is also suitable for the food and wine industry where purchases could be hugely impacted by seasonality.
However, the 90-day definition may be more appropriate for certain business questions. For example, if the purpose is to identify “recently active” customers or to target a short-term promotion, 90-day measure is more useful because it reflects current engagement situation.
The customer file includes a campaign flag and a
converted outcome for the customers who were emailed
(campaign == 1). Use the campaigned customers’ actual
conversions to evaluate the RFM-based targeting rule, then stress-test
the targeting decision against changes in the economic parameters.
Build a cumulative gains chart on the campaigned customers
(campaign == 1) by adapting the code in Appendix A
of RFM_lab_student.Rmd (“The lift / gains code,
for after class”). You will need to:
RFM_indep and take the mean of
converted. This is the homework’s equivalent of the lab’s
respRFM_df table.lift_table. Use
RFM_indep wherever the lab uses RFM_seq.respRFM_df <- rfm %>%
filter(campaign == 1) %>%
group_by(RFM_indep) %>%
summarise(
n_resp = sum(converted),
n_mail = n(),
resp_rate = mean(converted),
.groups = "drop"
)
scored <- rfm %>%
filter(campaign == 1) %>%
left_join(respRFM_df %>% select(RFM_indep, resp_rate), by = "RFM_indep") %>%
arrange(desc(resp_rate)) %>%
mutate(decile = ntile(desc(resp_rate), 10))
decile_summary <- scored %>%
group_by(decile) %>%
summarise(
n_customers = n(),
n_buyers = sum(converted),
response = round(mean(converted), 4),
.groups = "drop"
)
total_buyers <- sum(scored$converted)
total_customers <- nrow(scored)
overall_rate <- total_buyers / total_customers
lift_table <- decile_summary %>%
mutate(
cum_customers = cumsum(n_customers),
cum_buyers = cumsum(n_buyers),
cum_lift = cum_buyers / cum_customers / overall_rate,
cum_gains = cum_buyers / total_buyers,
pct_customers = cum_customers / total_customers
)
# Plot cumulative gains chart
ggplot(lift_table, aes(x = pct_customers, y = cum_gains)) +
geom_line(linewidth = 1.2) +
geom_point(size = 2) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
scale_x_continuous(labels = scales::percent) +
scale_y_continuous(labels = scales::percent) +
labs(
title = "Cumulative Gains Chart",
x = "Proportion of customers targeted",
y = "Proportion of converters captured"
) +
theme_minimal()Write your interpretation (~150 words suggested) covering:
The cumulative gains curve sits clearly above the 45-degree random-targeting baseline, showing that the RFM model delivers a meaningful lift. This indicates that the model is effective at ranking customers by their likelihood to convert. The lift is most pronounced in the top deciles, especially within the first 20% – 40% of customers, where the curve increases very steeply. For instance, targeting roughly the top 30% of customers captures about two-thirds of all converters, which is hugely better than random targeting.
After you reach about 50% - 60% of the list, the curve starts to flatten out. This means that there are diminishing marginal returns: for every extra person we target after that point, we get fewer and fewer new sales. It basically becomes less worth the effort and the efficicency is reducing.
To conclude, GourmetBox should stick to the top-ranked segments, probably the best 40% to 60% of customers. Targeting beyond this range may lead to higher costs without a proportional increase in conversions.
Use the homework’s economic parameters:
Compute the breakeven response rate. Identify the segments above breakeven on the campaigned data. Report total profit and ROMI across those segments (using the campaigned customers’ actual conversions, not predictions).
cost <- 2.00
aov <- 45.00
gross_margin <- 0.35
coupon_value <- 6.00
profit_per_converter <- gross_margin * aov - coupon_value
breakeven <- cost / profit_per_converter
cat(sprintf("Profit per converter: £%.2f\n", profit_per_converter))## Profit per converter: £9.75
## Breakeven response rate: 0.2051 (20.51%)
segment_profit <- respRFM_df %>%
mutate(
target = resp_rate >= breakeven,
revenue_from_converters = n_resp * profit_per_converter,
contact_cost = n_mail * cost,
profit = revenue_from_converters - contact_cost
)
target_segments <- segment_profit %>%
filter(target == TRUE) %>%
arrange(desc(resp_rate))
target_segments## # A tibble: 18 × 8
## RFM_indep n_resp n_mail resp_rate target revenue_from_converters contact_cost
## <chr> <int> <int> <dbl> <lgl> <dbl> <dbl>
## 1 242 1 1 1 TRUE 9.75 2
## 2 112 33 78 0.423 TRUE 322. 156
## 3 111 312 741 0.421 TRUE 3042 1482
## 4 122 142 375 0.379 TRUE 1384. 750
## 5 134 10 27 0.370 TRUE 97.5 54
## 6 121 26 71 0.366 TRUE 253. 142
## 7 212 27 78 0.346 TRUE 263. 156
## 8 133 61 191 0.319 TRUE 595. 382
## 9 211 152 478 0.318 TRUE 1482 956
## 10 234 16 53 0.302 TRUE 156 106
## 11 123 18 61 0.295 TRUE 175. 122
## 12 144 17 61 0.279 TRUE 166. 122
## 13 232 20 75 0.267 TRUE 195 150
## 14 321 13 54 0.241 TRUE 127. 108
## 15 222 95 396 0.240 TRUE 926. 792
## 16 311 48 203 0.236 TRUE 468 406
## 17 221 21 89 0.236 TRUE 205. 178
## 18 132 11 49 0.224 TRUE 107. 98
## # ℹ 1 more variable: profit <dbl>
target_summary <- target_segments %>%
summarise(
n_segments = n(),
n_customers = sum(n_mail),
n_converters = sum(n_resp),
total_profit = sum(profit),
total_cost = sum(contact_cost),
ROMI = total_profit / total_cost
)
target_summary## # A tibble: 1 × 6
## n_segments n_customers n_converters total_profit total_cost ROMI
## <int> <int> <int> <dbl> <dbl> <dbl>
## 1 18 3081 1023 3812. 6162 0.619
Build a 2 × 2 scenario grid: cost ∈ {−20%, +20%} × (gross_margin × AOV − coupon_value) ∈ {−20%, +20%}. For each of the four scenarios, compute:
Tabulate the four results. (Tip: a tidyr::expand_grid()
followed by a mutate() with the formulas keeps this
tidy.)
scenario_grid <- expand_grid(
cost_change = c(-0.20, 0.20),
margin_change = c(-0.20, 0.20)
) %>%
mutate(
scenario = paste0(
"Cost ", if_else(cost_change < 0, "-20%", "+20%"),
", Margin ", if_else(margin_change < 0, "-20%", "+20%")
),
scenario_cost = cost * (1 + cost_change),
scenario_profit_per_converter = profit_per_converter * (1 + margin_change),
scenario_breakeven = scenario_cost / scenario_profit_per_converter
)
scenario_results <- scenario_grid %>%
rowwise() %>%
mutate(
n_segments_pass = sum(respRFM_df$resp_rate >= scenario_breakeven),
total_profit = sum(
(respRFM_df$n_resp * scenario_profit_per_converter -
respRFM_df$n_mail * scenario_cost)[respRFM_df$resp_rate >= scenario_breakeven]
),
total_cost = sum(
(respRFM_df$n_mail * scenario_cost)[respRFM_df$resp_rate >= scenario_breakeven]
),
ROMI = total_profit / total_cost
) %>%
ungroup() %>%
select(
scenario,
scenario_cost,
scenario_profit_per_converter,
scenario_breakeven,
n_segments_pass,
total_profit,
ROMI
)
scenario_results## # A tibble: 4 × 7
## scenario scenario_cost scenario_profit_per_…¹ scenario_breakeven
## <chr> <dbl> <dbl> <dbl>
## 1 Cost -20%, Margin -20% 1.6 7.8 0.205
## 2 Cost -20%, Margin +20% 1.6 11.7 0.137
## 3 Cost +20%, Margin -20% 2.4 7.8 0.308
## 4 Cost +20%, Margin +20% 2.4 11.7 0.205
## # ℹ abbreviated name: ¹scenario_profit_per_converter
## # ℹ 3 more variables: n_segments_pass <int>, total_profit <dbl>, ROMI <dbl>
## # A tibble: 4 × 7
## scenario scenario_cost scenario_profit_per_converter
## <chr> <dbl> <dbl>
## 1 Cost -20%, Margin -20% 1.6 7.8
## 2 Cost -20%, Margin +20% 1.6 11.7
## 3 Cost +20%, Margin -20% 2.4 7.8
## 4 Cost +20%, Margin +20% 2.4 11.7
## scenario_breakeven n_segments_pass total_profit ROMI
## <dbl> <int> <dbl> <dbl>
## 1 0.205 18 3050. 0.619
## 2 0.137 25 7299. 1.23
## 3 0.308 9 1063. 0.217
## 4 0.205 18 4575. 0.619
A “robust” segment is one that is profitable (above its scenario-specific breakeven) under all four scenarios from 6.3. Report:
segment_scenario <- respRFM_df %>%
crossing(scenario_grid) %>%
mutate(
profitable = resp_rate >= scenario_breakeven
)
robust_segments <- segment_scenario %>%
group_by(RFM_indep, resp_rate) %>%
summarise(
profitable_scenarios = sum(profitable),
.groups = "drop"
) %>%
filter(profitable_scenarios == 4) %>%
arrange(desc(resp_rate))
robust_segments## # A tibble: 9 × 3
## RFM_indep resp_rate profitable_scenarios
## <chr> <dbl> <int>
## 1 242 1 4
## 2 112 0.423 4
## 3 111 0.421 4
## 4 122 0.379 4
## 5 134 0.370 4
## 6 121 0.366 4
## 7 212 0.346 4
## 8 133 0.319 4
## 9 211 0.318 4
fragile_segments <- segment_scenario %>%
group_by(RFM_indep, resp_rate) %>%
summarise(
profitable_scenarios = sum(profitable),
.groups = "drop"
) %>%
filter(profitable_scenarios >= 1, profitable_scenarios <= 3) %>%
arrange(desc(profitable_scenarios), desc(resp_rate))
fragile_segments## # A tibble: 16 × 3
## RFM_indep resp_rate profitable_scenarios
## <chr> <dbl> <int>
## 1 234 0.302 3
## 2 123 0.295 3
## 3 144 0.279 3
## 4 232 0.267 3
## 5 321 0.241 3
## 6 222 0.240 3
## 7 311 0.236 3
## 8 221 0.236 3
## 9 132 0.224 3
## 10 223 0.203 1
## 11 412 0.2 1
## 12 421 0.176 1
## 13 233 0.172 1
## 14 244 0.166 1
## 15 243 0.162 1
## 16 332 0.152 1
Recommendation: GourmetBox should mainly focus on the robust RFM segments since they stay profitable across all four scenarios, so they are actually the safest group to target. It might still be okay to include some fragile segments, but only if they have really high response rates and they only fail in the worst-case situations. I wouldn’t suggest sending to all fragile segments though, because their profit can change a lot if costs or margins change a bit. For the next campaign, GourmetBox should watch several things like actual response rate (are people really buying?), how much the coupons are really costing, and the margin per order (How much profit is left after everything is paid?). Also need to check if those fragile segments are still making profit after costs. This can help decide whether to keep using them, reduce them, or just stop targeting them later on.
This question is optional and goes beyond what we covered in class. The lab and lecture treated RFM as a single-snapshot tool; here you extend it to study how customers move between segments over time. Bonus marks are added on top of your Q1–Q6 total — there is no penalty for skipping this question, and a complete homework without it can still earn the full 100.
The transaction data spans roughly two years, so you can compute RFM
at two snapshot dates and analyse the migration. Restrict this
question to customers with first_order_days >= 365 (so
everyone has history at both snapshots).
Construct a second RFM table as if “today” were 180 days
before your snapshot from Q5.1. Use the same lookback window
and return policy you used in 5.1. Apply independent quintile
segmentation (same ntile() idiom).
Customers who appear in your earlier-snapshot must have at least one purchase in the earlier-snapshot’s lookback window — drop the rest with a comment in the code.
earlier_snapshot <- snapshot_date - 180
eligible_customers <- customers %>%
filter(first_order_days >= 365) %>%
select(customer_id)
rfm_earlier <- transactions %>%
mutate(order_date = as.Date(order_date)) %>%
semi_join(eligible_customers, by = "customer_id") %>%
filter(
is_return == 0,
order_date >= earlier_snapshot - lookback_days,
order_date < earlier_snapshot
) %>%
group_by(customer_id) %>%
summarise(
recency = as.numeric(earlier_snapshot - max(order_date)),
frequency = n_distinct(order_id),
monetary = sum(order_value),
.groups = "drop"
) %>%
mutate(
Rgroup_earlier = ntile(recency, 5),
Fgroup_earlier = 6 - ntile(frequency, 5),
Mgroup_earlier = 6 - ntile(monetary, 5),
RFM_earlier = paste0(Rgroup_earlier, Fgroup_earlier, Mgroup_earlier)
)
head(rfm_earlier)## # A tibble: 6 × 8
## customer_id recency frequency monetary Rgroup_earlier Fgroup_earlier
## <int> <dbl> <int> <dbl> <int> <dbl>
## 1 1 40 4 172. 2 4
## 2 3 117 3 127. 4 4
## 3 4 210 2 81.2 5 5
## 4 5 174 3 90.1 4 4
## 5 6 57 5 172. 3 3
## 6 7 26 3 104. 2 4
## # ℹ 2 more variables: Mgroup_earlier <dbl>, RFM_earlier <chr>
## [1] 8334 8
Build the full migration matrix between the earlier
and current segments as a long data frame — one row per
(earlier_segment, current_segment) pair, with a column
giving the number of customers in that cell. Keep this full table; Q7.3
will use it.
Then print the top 10 migration cells by customer
count, sorted from largest to smallest. A simple
arrange(desc(n)) %>% slice_head(n = 10) is enough.
migration_full <- rfm_earlier %>%
inner_join(
rfm %>%
semi_join(eligible_customers, by = "customer_id") %>%
select(
customer_id,
RFM_current = RFM_indep,
current_monetary = monetary
),
by = "customer_id"
) %>%
count(RFM_earlier, RFM_current, name = "n")
top10_migration <- migration_full %>%
arrange(desc(n)) %>%
slice_head(n = 10)
knitr::kable(top10_migration)| RFM_earlier | RFM_current | n |
|---|---|---|
| 555 | 555 | 620 |
| 111 | 111 | 275 |
| 211 | 111 | 196 |
| 111 | 211 | 178 |
| 444 | 444 | 131 |
| 211 | 211 | 120 |
| 455 | 444 | 105 |
| 544 | 555 | 104 |
| 111 | 311 | 95 |
| 311 | 111 | 91 |
A migration is consequential if it represents both (i) a meaningful number of customers and (ii) a meaningful change in expected value (e.g., high-value customers moving toward a “lost” segment costs more than low-value customers doing the same).
Define your own metric for “consequential” (e.g., customer count × current monetary, or count × the response-rate gap between the two segments). Defend your metric briefly. Then identify your top three migrations and report the customer count and your metric value for each.
migration_value <- rfm_earlier %>%
inner_join(
rfm %>%
semi_join(eligible_customers, by = "customer_id") %>%
select(
customer_id,
RFM_current = RFM_indep,
current_monetary = monetary
),
by = "customer_id"
) %>%
group_by(RFM_earlier, RFM_current) %>%
summarise(
n = n(),
avg_current_monetary = mean(current_monetary, na.rm = TRUE),
consequential_score = n * avg_current_monetary,
.groups = "drop"
) %>%
arrange(desc(consequential_score))
top3_migrations <- migration_value %>%
slice_head(n = 3)
knitr::kable(top3_migrations)| RFM_earlier | RFM_current | n | avg_current_monetary | consequential_score |
|---|---|---|---|---|
| 111 | 111 | 275 | 732 | 201184 |
| 211 | 111 | 196 | 715 | 140223 |
| 111 | 211 | 178 | 712 | 126793 |
Metric definition and rationale (3–5 sentences): I define a consequential migration by multiplying customer count by their current spending. This metric is helpful because it looks at both the volume of the movement and the actual money involved. It helps show that a huge group of people moving might not matter much if they barely spend anything, whereas a smaller group of big spenders moving could be a much bigger deal.
Pick one of your top-three migrations from 7.3. Propose an intervention campaign (~150 words suggested) that includes:
Include any code below for the breakeven and ROMI calculation.
selected_migration <- top3_migrations %>%
filter(RFM_earlier == "111", RFM_current == "211") # I chose 111 -> 211 customers
target_n <- selected_migration$n
cost <- 2
profit_per_converter <- 9.75
breakeven_rate <- cost / profit_per_converter
assumed_response_rate <- 0.30
# I assume a 30% response rate because these customers were previously highly engaged (111) and should respond better than average customers to a personalised retention campaign
expected_converters <- target_n * assumed_response_rate
campaign_cost <- target_n * cost
expected_profit <- expected_converters * profit_per_converter - campaign_cost
expected_ROMI <- expected_profit / campaign_cost
intervention_results <- tibble(
target_migration = paste0(
selected_migration$RFM_earlier,
" to ",
selected_migration$RFM_current
),
target_n = target_n,
breakeven_rate = breakeven_rate,
assumed_response_rate = assumed_response_rate,
expected_converters = expected_converters,
campaign_cost = campaign_cost,
expected_profit = expected_profit,
expected_ROMI = expected_ROMI
)
knitr::kable(intervention_results)| target_migration | target_n | breakeven_rate | assumed_response_rate | expected_converters | campaign_cost | expected_profit | expected_ROMI |
|---|---|---|---|---|---|---|---|
| 111 to 211 | 178 | 0.205 | 0.3 | 53.4 | 356 | 165 | 0.462 |
Intervention proposal (~150 words suggested):
I’m targeting on customers moving from 111 to 211 because they used to be our best shoppers but are now starting to slip. This group is still very important for the business, and if we step in now, we can stop them from becoming low-value customers and increase retention rate (which helps a lot with the CLV).
I’ll use email for this because it’s a relatively cheaper way and allows us to send personalised messages. The goal is to bring them back with a reminder message: “We miss you! Here is a special reward for your favorite GourmetBox items”. We can uses what we already know they like to make the message feel more personal.
I’m expecting a 30% response rate since these people were very active in the past and are usually more likely to buy than others. Since the breakeven point is roughly 20.5% (£2 / £9.75), this 30% estimate good enough for us to stay safe. If things go as planned, the campaign should bring good profit and strong ROMI, making it a solid plan to keep our customers.
In ~100 words, name two situations in which migration analysis is more useful than static RFM, and two situations in which it could mislead.
Migration analysis is better than just looking at a single RFM snapshot when we want to see if customers are getting better or worse over time. It’s great for preventing people from churning, because you can catch someone moving from a strong group to a weaker one (e.g., 111 -> 211 from above example) and message them before they stop buying entirely.
However, some situations may be misleading. First, seasonality, if your business is seasonal, a customer might look like they are “declining” just because it’s the off-season, not because they’ve actually lost interest. It can also be confusing for people who don’t buy on a regular schedule, we might think they are turning to a worse segment when they’re actually just taking a normal break between orders.
Before submitting, verify: