1 Introduction
2 Packages / Libraries
3 Data Overview
4 Exploratory Data Analysis (EDA) and Findings
5 Recommendations to the CEO
6 Summary and Limitations
7 Appendix A: Reproducibility Notes (Optional)
8 Appendix B: Full Analysis Code (Not Executed During Knit)

1 Introduction

Regork (a national grocery chain) invests heavily in promotions: in-store displays, mailers, and coupons. Promotions are intended to increase sales, but discounts can also erode revenue and profit if they do not generate enough incremental volume.

This report answers one focused business question:

Which customer segments respond profitably to promotions (campaigns/coupons), and where are we over-discounting with little incremental lift?

1.1 Why the CEO should care

Promotions are a major, controllable lever for growth. If discounting is not paying back in incremental revenue (and especially if it is not paying back in margin), then:

revenue growth becomes “expensive,”
profitability suffers, and
the organization may be training customers to buy only when items are discounted.

Finding which segments (and discount tools) are most efficient lets Regork reallocate promotional spend toward higher-return activity.

1.2 What I did (data + methodology)

Data used (2 required tables):

transactions: customer purchases (household, basket, product, quantities, net sales, discount components, week)
promotions: product-store-week display/mailer placements

Key engineering:

Created a gross sales proxy = sales_value + total_discount (since list price isn’t directly provided).
Flagged promotion exposure per product-store-week (promo_any, promo_display, promo_mailer).
Built behavioral segments:
- Value segment = quartiles of household net sales
- Promo-prone segment = quartiles of household share of spend on promoted items

Incremental lift approach (interpretable and robust):

For promotions, estimated within-product-store changes over time using fixed-effect regressions:
- controls for persistent differences by product-store
- controls for common week-level shocks (seasonality)
For coupons, estimated association between coupon dollars and gross basket value using basket-level fixed effects.

Important: The dataset does not include item-level margins/COGS, so “profitability” is assessed using a revenue-based ROI proxy. I highlight limitations and how to improve the measurement.

1.3 How the analysis helps Regork

This analysis produces:

A clear diagnostic: promotions appear to be over-discounting across all value segments (negative ROI proxy).
A targeting insight: couponing correlates with larger baskets and looks more efficient than broad promotions (under a revenue proxy).
Actionable next steps: reduce low-return promo exposure, reallocate spend toward targeted couponing, and implement holdout testing to measure true incrementality and profit.

2 Packages / Libraries

This report uses:

dplyr: data manipulation (filter, mutate, summarise)
readr: fast loading of CSV output tables
ggplot2: plots for trends and segment comparisons
scales: consistent number formatting (percent and compact notation)
knitr (+ optional kableExtra): tables in the HTML report

Messages/warnings from package loading are suppressed for readability and reproducibility.

3 Data Overview

inventory <- readr::read_csv(file.path(tbl_dir, "data_inventory.csv"), show_col_types = FALSE)
render_table(inventory, caption = "Data inventory (row/column counts from the analysis pipeline)")

Data inventory (row/column counts from the analysis pipeline)
table	rows	cols
transactions	1460438	19
promotions	20927744	6
products_from_get_data	0	0
demographics_from_get_data	0	0
tx_joined	1460438	23

Notes on available tables.
In this course distribution of Complete Journey, only transactions and promotions were available (no product hierarchy table, no demographics table). As a result:

segmentation is behavioral (value + promo-prone),
“where” analysis is done at the product_id level instead of department/category.

4 Exploratory Data Analysis (EDA) and Findings

4.1 1) Overall sales and discount trends over time

weekly <- readr::read_csv(file.path(tbl_dir, "weekly_summary.csv"), show_col_types = FALSE)
weekly_summary <- weekly %>%
  summarise(
    weeks = n(),
    avg_weekly_net_sales = mean(net_sales),
    avg_discount_rate = mean(discount_rate),
    min_discount_rate = min(discount_rate),
    max_discount_rate = max(discount_rate),
    avg_promo_share_net = mean(promo_share_net)
  )

render_table(
  weekly_summary %>%
    mutate(
      avg_weekly_net_sales = dollar(avg_weekly_net_sales),
      avg_discount_rate = percent(avg_discount_rate, accuracy = 0.1),
      min_discount_rate = percent(min_discount_rate, accuracy = 0.1),
      max_discount_rate = percent(max_discount_rate, accuracy = 0.1),
      avg_promo_share_net = percent(avg_promo_share_net, accuracy = 0.1)
    ),
  caption = "High-level weekly summary across the full time window"
)

High-level weekly summary across the full time window
weeks	avg_weekly_net_sales	avg_discount_rate	min_discount_rate	max_discount_rate	avg_promo_share_net
53	$86,717.49	15.1%	13.3%	18.7%	19.6%

knitr::include_graphics(c(
  file.path(fig_dir, "weekly_net_sales.png"),
  file.path(fig_dir, "weekly_discount_rate.png")
))

What stands out:

Net sales are relatively stable week-to-week after the early weeks.
Discount rates fluctuate meaningfully, suggesting promotion intensity (or coupon usage) varies over time.
This volatility motivates segment-level analysis: who is receiving/using discounts, and is it paying back?

4.2 2) Segment profiles: who drives sales and who consumes discounts?

4.2.1 Value segments (households split into quartiles by total net sales)

kpi_value <- readr::read_csv(file.path(tbl_dir, "segment_kpis_value_segment.csv"), show_col_types = FALSE) %>%
  mutate(
    net_sales = dollar(net_sales),
    gross_sales = dollar(gross_sales),
    discount = dollar(discount),
    discount_rate = percent(discount_rate, accuracy = 0.1),
    promo_net_sales = dollar(promo_net_sales),
    promo_share_net = percent(promo_share_net, accuracy = 0.1),
    coupon_discount = dollar(coupon_discount)
  )

render_table(kpi_value, caption = "Segment KPIs by Value Segment (quartiles of household net sales)")

Segment KPIs by Value Segment (quartiles of household net sales)
value_segment	households	trips	net_sales	gross_sales	discount	discount_rate	promo_net_sales	promo_share_net	coupon_discount
Value Q4 (Top)	617	80978	$2,846,046	$3,328,250	$482,204	14.5%	$530,990	18.7%	$16,622.95
Value Q3	617	43415	$1,110,066	$1,316,928	$206,861	15.7%	$229,408	20.7%	$5,492.73
Value Q2	617	22480	$501,210	$600,736	$99,526	16.6%	$113,511	22.6%	$1,693.64
Value Q1 (Low)	618	8645	$138,705	$166,121	$27,416	16.5%	$30,609	22.1%	$548.93

Interpretation:

Top value households (Value Q4) drive the majority of revenue, as expected.
Discount rates are slightly lower for high-value customers than for mid/low value groups.
Mid/low segments have higher promo share of spend, suggesting they are more promotion-reliant.

4.2.2 Promo-prone segments (households split into quartiles by promo share of spend)

kpi_promo <- readr::read_csv(file.path(tbl_dir, "segment_kpis_promo_prone_segment.csv"), show_col_types = FALSE) %>%
  mutate(
    net_sales = dollar(net_sales),
    gross_sales = dollar(gross_sales),
    discount = dollar(discount),
    discount_rate = percent(discount_rate, accuracy = 0.1),
    promo_net_sales = dollar(promo_net_sales),
    promo_share_net = percent(promo_share_net, accuracy = 0.1),
    coupon_discount = dollar(coupon_discount)
  )

render_table(kpi_promo, caption = "Segment KPIs by Promo-Prone Segment (quartiles of promo share)")

Segment KPIs by Promo-Prone Segment (quartiles of promo share)
promo_prone_segment	households	trips	net_sales	gross_sales	discount	discount_rate	promo_net_sales	promo_share_net	coupon_discount
Promo Med-Low	617	44928	$1,390,030	$1,606,986	$216,955	13.5%	$234,294	16.9%	$6,621.02
Promo Med-High	617	43062	$1,215,543	$1,455,331	$239,789	16.5%	$278,263	22.9%	$6,398.46
Promo Low	618	36104	$1,163,959	$1,299,016	$135,057	10.4%	$122,910	10.6%	$4,095.98
Promo High	617	31424	$826,495	$1,050,701	$224,206	21.3%	$269,049	32.6%	$7,242.79

Key pattern:
The Promo High segment has:

the highest discount rate, and
the highest promo share of spend,
while contributing a smaller share of overall net sales than the mid segments.

This is a classic sign of a segment that may be price conditioned (buying mainly when discounted).

4.3 3) Are promotions paying back? (Promo ROI proxy by Value Segment)

Because margins are not available, I use a revenue-based ROI proxy:

\[ \text{ROI proxy} = \frac{\text{Gross lift} - \text{Discount lift}}{\text{Discount lift}} \]

If ROI proxy > 0: gross lift exceeds discount cost (good, on a revenue basis)
If ROI proxy < 0: discount cost exceeds gross lift (over-discounting risk)

promo_lift <- readr::read_csv(file.path(tbl_dir, "promo_lift_by_value_segment.csv"), show_col_types = FALSE) %>%
  mutate(
    net_lift_promo_any = round(net_lift_promo_any, 4),
    discount_lift_promo_any = round(discount_lift_promo_any, 4),
    roi_proxy = round(roi_proxy, 4)
  )

render_table(promo_lift, caption = "Estimated promo lift and ROI proxy (promotion vs non-promotion) by Value Segment")

Estimated promo lift and ROI proxy (promotion vs non-promotion) by Value Segment
value_segment	net_lift_promo_any	discount_lift_promo_any	roi_proxy
Value Q1 (Low)	-0.2432	0.7873	-0.3088
Value Q2	-0.1716	0.7751	-0.2213
Value Q3	-0.2179	0.7650	-0.2848
Value Q4 (Top)	-0.1621	0.8734	-0.1855

knitr::include_graphics(file.path(fig_dir, "promo_roi_proxy_value_segment.png"))

Finding: ROI proxy is negative for every value segment.

The worst segment is Value Q1 (Low) (most negative ROI proxy).
Value Q4 (Top) is “least bad,” but still negative.

CEO implication (revenue proxy): current broad promotions look like over-discounting, especially for low-value households. If margins are thin (typical grocery), the profit impact is likely even worse.

4.4 4) Coupons: do they appear more efficient than broad promotions?

Coupon “efficiency” is evaluated at the basket level as:

estimated gross lift per $1 coupon discount
net return proxy = (gross lift per $1) − 1

coupon_eff <- readr::read_csv(file.path(tbl_dir, "coupon_efficiency_by_segment.csv"), show_col_types = FALSE) %>%
  mutate(
    coef_gross_per_1_coupon = round(coef_gross_per_1_coupon, 3),
    net_return_per_1_coupon = round(net_return_per_1_coupon, 3)
  )

render_table(coupon_eff, caption = "Coupon efficiency (revenue proxy) by segment")

Coupon efficiency (revenue proxy) by segment
segment_var	segment_level	coef_gross_per_1_coupon	net_return_per_1_coupon
value_segment	Value Q3	9.371	8.371
value_segment	Value Q1 (Low)	4.791	3.791
value_segment	Value Q4 (Top)	11.206	10.206
value_segment	Value Q2	11.728	10.728
promo_prone_segment	Promo Med-Low	11.198	10.198
promo_prone_segment	Promo Low	12.461	11.461
promo_prone_segment	Promo High	9.004	8.004
promo_prone_segment	Promo Med-High	11.131	10.131

coupon_highlights <- coupon_eff %>%
  group_by(segment_var) %>%
  arrange(desc(net_return_per_1_coupon)) %>%
  slice_head(n = 1) %>%
  ungroup()

render_table(coupon_highlights, caption = "Highest coupon net-return segments (within each segmentation)")

Highest coupon net-return segments (within each segmentation)
segment_var	segment_level	coef_gross_per_1_coupon	net_return_per_1_coupon
promo_prone_segment	Promo Low	12.461	11.461
value_segment	Value Q2	11.728	10.728

Interpretation (cautious):

All segments show positive coupon net-return proxies.
The lowest coupon net-return is in Value Q1 (Low).
The Promo Low segment has the highest coupon net-return proxy, which is consistent with coupons being more incremental for customers who are not already discount-conditioned.

Caution: Coupon usage is not randomly assigned. This result should be treated as a prioritization signal for testing, not a definitive causal estimate.

4.5 5) “Over-discounting” flags: where should we stop subsidizing?

promo_flags <- readr::read_csv(file.path(tbl_dir, "promo_overdiscount_flags.csv"), show_col_types = FALSE) %>%
  mutate(roi_proxy = round(roi_proxy, 4))

coupon_flags <- readr::read_csv(file.path(tbl_dir, "coupon_overdiscount_flags.csv"), show_col_types = FALSE) %>%
  mutate(
    coef_gross_per_1_coupon = round(coef_gross_per_1_coupon, 3),
    net_return_per_1_coupon = round(net_return_per_1_coupon, 3)
  )

render_table(promo_flags, caption = "Promotion over-discount flags (ROI proxy ≤ 0)")

Promotion over-discount flags (ROI proxy ≤ 0)
value_segment	net_lift_promo_any	discount_lift_promo_any	roi_proxy	over_discount_flag
Value Q1 (Low)	-0.2431593	0.7873239	-0.3088	TRUE
Value Q2	-0.1715567	0.7751448	-0.2213	TRUE
Value Q3	-0.2179016	0.7650251	-0.2848	TRUE
Value Q4 (Top)	-0.1620533	0.8733747	-0.1855	TRUE

render_table(coupon_flags, caption = "Coupon over-discount flags (net return < 0)")

Coupon over-discount flags (net return
segment_var	segment_level	coef_gross_per_1_coupon	net_return_per_1_coupon	over_discount_flag
value_segment	Value Q3	9.371	8.371	FALSE
value_segment	Value Q1 (Low)	4.791	3.791	FALSE
value_segment	Value Q4 (Top)	11.206	10.206	FALSE
value_segment	Value Q2	11.728	10.728	FALSE
promo_prone_segment	Promo Med-Low	11.198	10.198	FALSE
promo_prone_segment	Promo Low	12.461	11.461	FALSE
promo_prone_segment	Promo High	9.004	8.004	FALSE
promo_prone_segment	Promo Med-High	11.131	10.131	FALSE

Bottom line:

Promotions: flagged as over-discounting in every value segment.
Coupons: no segments flagged under the revenue proxy metric.

5 Recommendations to the CEO

Based on these results, the most defensible strategy is not “more promotions,” but “smarter promotions.” Specifically:

5.1 1) Reduce broad promotions where ROI proxy is most negative

Start with Value Q1 (Low) and Promo High households.
Pilot reducing promotion depth/frequency on products heavily purchased by these segments.

5.2 2) Reallocate spend toward targeted coupons (with guardrails)

Coupons appear more efficient than broad promotions in this dataset.
Guardrails to prevent waste:
- minimum basket thresholds,
- limit stacking,
- personalized offers for non–promo-conditioned customers.

5.3 3) Make it causal: implement holdout testing

To move from “correlation with fixed effects” to decision-grade evidence:

Create randomized holdouts (no promo / reduced promo) by store-week or customer-week
Measure incremental net revenue and incremental margin
Update targeting based on measured incrementality

6 Summary and Limitations

6.1 Summary

Promotions show negative ROI proxy across all value segments → likely over-discounting on a revenue basis.
Promo-prone households show much higher discount rates → suggests discount conditioning.
Coupons correlate with larger baskets and show positive net-return proxies across segments.

6.2 Limitations (what could improve)

No margin/COGS data. Revenue-based ROI can still be unprofitable if gross margins are thin.
Selection bias. Promotions/coupons are not randomly assigned.
Limited product hierarchy. Without a products table, “where” analysis is constrained to product_id rather than department/category.
Short/long-term effects. Promotions can shift future behavior (stockpiling, brand switching), not captured here.

6.3 How to build on this work

Add product hierarchy (department/commodity) + margin to compute true profit lift.
Run holdout experiments for promotion and coupon strategies.
Extend segmentation beyond behavioral quartiles (demographics, life stage, store region) if available.

7 Appendix A: Reproducibility Notes (Optional)

The report is designed to knit quickly by loading precomputed outputs in outputs/.
To reproduce outputs from raw data, run the full analysis script first (the one that: loads get_transactions() and get_promotions(), aggregates promotions, creates segments, runs fixed-effect models, and writes the tables/figures).

8 Appendix B: Full Analysis Code (Not Executed During Knit)

# ============================
# Full analysis script used to generate the outputs/ folder
# (shown for reproducibility; not executed when knitting)
# ============================

options(stringsAsFactors = FALSE)
set.seed(123)

pkgs <- c(
  "completejourney",
  "data.table",
  "janitor",
  "lubridate",
  "fixest",
  "ggplot2",
  "dplyr",
  "readr",
  "scales"
)

to_install <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(to_install) > 0) install.packages(to_install, repos = "https://cloud.r-project.org")

suppressPackageStartupMessages({
  library(completejourney)
  library(data.table)
  library(janitor)
  library(lubridate)
  library(fixest)
  library(ggplot2)
  library(dplyr)
  library(readr)
  library(scales)
})

dir.create("outputs", showWarnings = FALSE)
dir.create("outputs/tables", showWarnings = FALSE)
dir.create("outputs/figures", showWarnings = FALSE)
dir.create("outputs/models", showWarnings = FALSE)

# 1) Load data
transactions_raw <- completejourney::get_transactions()
promotions_raw   <- completejourney::get_promotions()

# 2) Clean transactions
tx <- transactions_raw %>%
  janitor::clean_names() %>%
  mutate(
    week = as.integer(week),
    transaction_timestamp = suppressWarnings(lubridate::ymd_hms(transaction_timestamp, tz = "UTC")),
    quantity = as.numeric(quantity),
    sales_value = as.numeric(sales_value),
    retail_disc = as.numeric(retail_disc),
    coupon_disc = as.numeric(coupon_disc),
    coupon_match_disc = as.numeric(coupon_match_disc)
  ) %>%
  filter(!is.na(household_id), !is.na(product_id), !is.na(store_id), !is.na(week)) %>%
  filter(!is.na(quantity), quantity > 0) %>%
  mutate(
    retail_disc_amt       = abs(coalesce(retail_disc, 0)),
    coupon_disc_amt       = abs(coalesce(coupon_disc, 0)),
    coupon_match_disc_amt = abs(coalesce(coupon_match_disc, 0)),
    coupon_total_amt      = coupon_disc_amt + coupon_match_disc_amt,
    total_disc_amt        = retail_disc_amt + coupon_total_amt,
    gross_sales_est       = sales_value + total_disc_amt,
    used_coupon           = as.integer(coupon_total_amt > 0),
    any_discount          = as.integer(total_disc_amt > 0)
  )

# 3) Promotions aggregation with data.table (fast)
prom_dt <- as.data.table(promotions_raw)
setnames(prom_dt, names(prom_dt), janitor::make_clean_names(names(prom_dt)))

need_cols <- c("product_id", "store_id", "week", "display_location", "mailer_location")
prom_dt <- prom_dt[, ..need_cols]
prom_dt[, week := as.integer(week)]
prom_dt[, display_location := as.character(display_location)]
prom_dt[, mailer_location  := as.character(mailer_location)]

prom_dt[, promo_display := fifelse(is.na(display_location) | display_location %in% c("0","NONE","none","","NA"), 0L, 1L)]
prom_dt[, promo_mailer  := fifelse(is.na(mailer_location)  | mailer_location  %in% c("0","NONE","none","","NA"), 0L, 1L)]

promotions <- prom_dt[
  ,
  .(
    promo_display = as.integer(max(promo_display)),
    promo_mailer  = as.integer(max(promo_mailer)),
    promo_any     = as.integer(max(promo_display | promo_mailer))
  ),
  by = .(product_id, store_id, week)
]

rm(promotions_raw, prom_dt); gc()

# Join promotions -> transactions
tx <- as.data.table(tx)
setDT(promotions)
setkey(tx, product_id, store_id, week)
setkey(promotions, product_id, store_id, week)

tx <- promotions[tx]  # left join
tx[is.na(promo_display), promo_display := 0L]
tx[is.na(promo_mailer),  promo_mailer  := 0L]
tx[is.na(promo_any),     promo_any     := 0L]

# 4) Household segments
hh <- tx[
  ,
  .(
    total_net_sales   = sum(sales_value, na.rm = TRUE),
    total_gross_sales = sum(gross_sales_est, na.rm = TRUE),
    total_discount    = sum(total_disc_amt, na.rm = TRUE),
    coupon_discount   = sum(coupon_total_amt, na.rm = TRUE),
    promo_net_sales   = sum(fifelse(promo_any == 1L, sales_value, 0), na.rm = TRUE),
    trips             = uniqueN(basket_id),
    weeks_active      = uniqueN(week),
    units             = sum(quantity, na.rm = TRUE)
  ),
  by = .(household_id)
]

hh[, discount_rate := fifelse(total_gross_sales > 0, total_discount / total_gross_sales, NA_real_)]
hh[, promo_share_net := fifelse(total_net_sales > 0, promo_net_sales / total_net_sales, NA_real_)]

hh[, value_quartile := dplyr::ntile(total_net_sales, 4)]
hh[, promo_quartile := dplyr::ntile(fifelse(is.na(promo_share_net), 0, promo_share_net), 4)]

hh[, value_segment := factor(value_quartile,
                             levels = 1:4,
                             labels = c("Value Q1 (Low)", "Value Q2", "Value Q3", "Value Q4 (Top)"))]
hh[, promo_prone_segment := factor(promo_quartile,
                                   levels = 1:4,
                                   labels = c("Promo Low", "Promo Med-Low", "Promo Med-High", "Promo High"))]

fwrite(hh, "outputs/tables/household_segments.csv")

# Attach segments
setkey(hh, household_id)
setkey(tx, household_id)
tx <- hh[tx]

# 5) Segment KPI tables (fixed)
seg_kpis <- function(segcol) {
  stopifnot(segcol %in% names(tx))
  tx[
    ,
    .(
      households = uniqueN(household_id),
      trips      = uniqueN(basket_id),
      net_sales  = sum(sales_value, na.rm = TRUE),
      gross_sales = sum(gross_sales_est, na.rm = TRUE),
      discount   = sum(total_disc_amt, na.rm = TRUE),
      discount_rate = ifelse(sum(gross_sales_est, na.rm = TRUE) > 0,
                             sum(total_disc_amt, na.rm = TRUE) / sum(gross_sales_est, na.rm = TRUE),
                             NA_real_),
      promo_net_sales = sum(fifelse(promo_any == 1L, sales_value, 0), na.rm = TRUE),
      promo_share_net = ifelse(sum(sales_value, na.rm = TRUE) > 0,
                               sum(fifelse(promo_any == 1L, sales_value, 0), na.rm = TRUE) / sum(sales_value, na.rm = TRUE),
                               NA_real_),
      coupon_discount = sum(coupon_total_amt, na.rm = TRUE)
    ),
    by = segcol
  ][order(-net_sales)]
}

k1 <- seg_kpis("value_segment"); fwrite(k1, "outputs/tables/segment_kpis_value_segment.csv")
k2 <- seg_kpis("promo_prone_segment"); fwrite(k2, "outputs/tables/segment_kpis_promo_prone_segment.csv")

# 6) Promo lift by value segment (product-store FE + week FE)
psw <- tx[
  ,
  .(
    gross_sales = sum(gross_sales_est, na.rm = TRUE),
    discount    = sum(total_disc_amt, na.rm = TRUE),
    promo_any   = as.integer(max(promo_any))
  ),
  by = .(value_segment, product_id, store_id, week)
]

promo_lift <- list()
for (seg in levels(psw$value_segment)) {
  d <- psw[value_segment == seg]

  m_g_any <- feols(gross_sales ~ promo_any | product_id^store_id + week, data = d, cluster = ~store_id)
  m_d_any <- feols(discount    ~ promo_any | product_id^store_id + week, data = d, cluster = ~store_id)

  net_lift <- as.numeric(coef(m_g_any)["promo_any"]) - as.numeric(coef(m_d_any)["promo_any"])
  disc_lift <- as.numeric(coef(m_d_any)["promo_any"])
  roi <- ifelse(abs(disc_lift) > 1e-9, net_lift / disc_lift, NA_real_)

  promo_lift[[seg]] <- data.frame(
    value_segment = seg,
    net_lift_promo_any = net_lift,
    discount_lift_promo_any = disc_lift,
    roi_proxy = roi
  )

  saveRDS(m_g_any, paste0("outputs/models/promo_", make.names(seg), "_gross_any.rds"))
  saveRDS(m_d_any, paste0("outputs/models/promo_", make.names(seg), "_disc_any.rds"))
}

promo_lift_df <- do.call(rbind, promo_lift)
write_csv(promo_lift_df, "outputs/tables/promo_lift_by_value_segment.csv")

p_roi <- ggplot(promo_lift_df, aes(x = value_segment, y = roi_proxy)) +
  geom_col() +
  labs(
    title = "Promo ROI Proxy by Value Segment",
    subtitle = "ROI proxy = (gross lift - discount lift) / discount lift (revenue proxy, not margin)",
    x = NULL, y = "ROI proxy"
  )
ggsave("outputs/figures/promo_roi_proxy_value_segment.png", p_roi, width = 9, height = 4.5, dpi = 160)

# 7) Coupon efficiency by segment (safe column names)
basket <- tx[
  ,
  .(
    gross_sales = sum(gross_sales_est, na.rm = TRUE),
    coupon_disc = sum(coupon_total_amt, na.rm = TRUE)
  ),
  by = .(household_id, basket_id, store_id, week, value_segment, promo_prone_segment)
]

coupon_eff <- function(segcol) {
  out <- list()
  seg_levels <- unique(basket[[segcol]])
  for (s in seg_levels) {
    d <- basket[get(segcol) == s]
    m <- feols(gross_sales ~ coupon_disc | household_id + week + store_id,
               data = d, cluster = ~household_id)
    b <- as.numeric(coef(m)["coupon_disc"])

    out[[as.character(s)]] <- data.frame(
      segment_var = segcol,
      segment_level = as.character(s),
      coef_gross_per_1_coupon = b,
      net_return_per_1_coupon = b - 1
    )
  }
  rbindlist(out, fill = TRUE)
}

coupon_df <- rbind(
  coupon_eff("value_segment"),
  coupon_eff("promo_prone_segment")
)
fwrite(coupon_df, "outputs/tables/coupon_efficiency_by_segment.csv")

# 8) Weekly summary + plots (label_number_si2 replacement)
label_number_si2 <- function(...) {
  scales::label_number(scale_cut = scales::cut_si(""), ...)
}

weekly <- tx[
  ,
  .(
    net_sales = sum(sales_value, na.rm = TRUE),
    gross_sales = sum(gross_sales_est, na.rm = TRUE),
    discount = sum(total_disc_amt, na.rm = TRUE),
    discount_rate = sum(total_disc_amt, na.rm = TRUE) / sum(gross_sales_est, na.rm = TRUE),
    promo_net_sales = sum(fifelse(promo_any == 1L, sales_value, 0), na.rm = TRUE),
    promo_share_net = sum(fifelse(promo_any == 1L, sales_value, 0), na.rm = TRUE) / sum(sales_value, na.rm = TRUE)
  ),
  by = .(week)
][order(week)]

fwrite(weekly, "outputs/tables/weekly_summary.csv")

p_week_sales <- ggplot(as.data.frame(weekly), aes(x = week, y = net_sales)) +
  geom_line() +
  labs(title = "Weekly Net Sales", x = "Week", y = "Net sales ($)") +
  scale_y_continuous(labels = label_number_si2())
ggsave("outputs/figures/weekly_net_sales.png", p_week_sales, width = 9, height = 4, dpi = 160)

p_week_disc <- ggplot(as.data.frame(weekly), aes(x = week, y = discount_rate)) +
  geom_line() +
  labs(title = "Weekly Discount Rate (Discount / Gross Sales)", x = "Week", y = "Discount rate") +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1))
ggsave("outputs/figures/weekly_discount_rate.png", p_week_disc, width = 9, height = 4, dpi = 160)

# Over-discount flags
promo_flags <- promo_lift_df %>%
  mutate(over_discount_flag = (discount_lift_promo_any > 0 & roi_proxy <= 0))
write_csv(promo_flags, "outputs/tables/promo_overdiscount_flags.csv")

coupon_flags <- as.data.table(coupon_df)
coupon_flags[, over_discount_flag := (net_return_per_1_coupon < 0)]
fwrite(coupon_flags, "outputs/tables/coupon_overdiscount_flags.csv")

Regork: Which Customer Segments Respond Profitably to Promotions and Coupons?

Danny

2026-02-26