Introduction
What we’re trying to figure out. Regork runs different campaign types (A/B/C), but we don’t really know which one actually drives extra dollars beyond what shoppers normally spend. We also want to know who reacts the most (by age/income), so we can point future spend in the right direction.
Why this matters for the CEO. If we focus money on the campaigns and people that actually lift sales, we can grow revenue without just blasting coupons everywhere. That’s cheaper, cleaner, and easier to scale.
What we did (short version). Using the full Complete Journey data (transactions, campaigns + descriptions, coupons, products, demographics), we compared each household’s during-campaign daily spend to their own 30-day pre-campaign baseline. That gives us uplift ($ per household per day). Then we sliced that by campaign type and by age × income to see who responds the most. We also looked at overlaps, coupon depth, top categories, and household engagement to understand the “why.”
What we’re proposing. Double down on the best-lifting campaign type, aim it at the top responding segments, keep overlaps to ≤1, and right-size coupon depth. The “Illustrative Impact” table shows what that could mean in dollars.
Methods at a Glance (for non-technical readers)
Line up the data. Join campaigns to transactions by household and keep only the purchases that happened inside each campaign’s date window.
Make a baseline. For the same households, look at the 30 days before each campaign starts to get “normal” daily spend.
Compute uplift.
rate_in = dollars during campaign / days active
rate_base = dollars in prior 30 days / 30
uplift = rate_in − rate_base → extra $/HH/day we can credit to the campaign
Compare groups. Average that uplift by campaign type and by age × income to find winners.
Reality check. Look at overlaps, coupon depth, top categories, and household engagement so the recommendations are practical.
ggplot(campaign_sales, aes(x = campaign_id, y = total_sales, fill = campaign_type)) +
geom_col() +
scale_y_continuous(labels = scales::dollar_format()) +
labs(
title = "Total Sales by Campaign (Unadjusted)",
x = "Campaign ID", y = "Total Sales ($)", fill = "Type"
) +
theme_minimal(base_size = 12) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_fill_manual(values = camp_cols)
What it shows: Calendar windows for each campaign by type.
ggplot(campaign_timeline, aes(x = start_date, xend = end_date,
y = reorder(factor(campaign_id), start_date),
yend = factor(campaign_id),
color = campaign_type)) +
geom_segment(size = 6) +
scale_x_date(date_labels = "%b", date_breaks = "1 month") +
labs(title = "Campaign Timelines Across the Year", subtitle = "Colored by Type",
x = "Date", y = "Campaign ID", color = "Type") +
scale_color_manual(values = camp_cols) +
theme_minimal(base_size = 12)
What it shows: Typical length differs by campaign type.
ggplot(avg_duration, aes(x = campaign_type, y = avg_duration_days, fill = campaign_type)) +
geom_col(width = 0.6) +
labs(title = "Average Campaign Duration", x = "Type", y = "Average Days", fill = "Type") +
scale_fill_manual(values = camp_cols) +
theme_minimal(base_size = 12)
What it shows: Redemption volume and average discount size.
coupon_summary %>%
dplyr::mutate(
coupons_redeemed = scales::comma(coupons_redeemed),
total_discount = fmt_dollar(total_discount),
avg_discount_per_coupon = fmt_dollar(avg_discount_per_coupon)
) %>%
gt::gt() %>%
gt::tab_header(title = "Coupon Effectiveness (Unadjusted)") %>%
gt::cols_label(
campaign_type = "Type",
coupons_redeemed = "Coupons Redeemed",
total_discount = "Total Discount",
avg_discount_per_coupon = "Avg Discount / Coupon"
)
| Coupon Effectiveness (Unadjusted) | |||
| Type | Coupons Redeemed | Total Discount | Avg Discount / Coupon |
|---|---|---|---|
| Type A | 373,468 | $29,372.27 | $0.08 |
| Type B | 46,121 | $4,320.17 | $0.09 |
| Type C | 17,476 | $917.55 | $0.05 |
ggplot(coupon_summary, aes(x = campaign_type, y = avg_discount_per_coupon, fill = campaign_type)) +
geom_col(width = 0.6) +
labs(title = "Average Discount per Coupon by Type", x = "Type", y = "Avg Discount ($)", fill = "Type") +
scale_fill_manual(values = camp_cols) +
theme_minimal(base_size = 12)
What it shows: Where dollars concentrate during campaign windows.
ggplot(top_products, aes(x = reorder(product_category, total_sales),
y = total_sales, fill = campaign_type)) +
geom_col(position = "dodge") +
coord_flip() +
scale_fill_manual(values = camp_cols) +
scale_y_continuous(labels = scales::dollar_format()) +
labs(title = "Top 5 Product Categories by Campaign (Unadjusted Totals)",
x = "Product Category", y = "Total Sales ($)", fill = "Type") +
theme_minimal(base_size = 12)
top_products %>%
dplyr::slice_max(total_sales, n = 15) %>%
dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
gt::gt() %>%
gt::tab_header(title = "Top Categories (Top 15 Across Campaigns)") %>%
gt::cols_label(
campaign_id = "Campaign",
campaign_type = "Type",
product_category = "Category",
total_sales = "Total Sales"
) %>%
gt::tab_source_note(gt::md("**Note.** Totals are within campaign windows (unadjusted)."))
| Top Categories (Top 15 Across Campaigns) | |||
| Campaign | Type | Category | Total Sales |
|---|---|---|---|
| 18 | Type A | COUPON/MISC ITEMS | $44,163.53 |
| 8 | Type A | COUPON/MISC ITEMS | $40,905.00 |
| 13 | Type A | COUPON/MISC ITEMS | $40,363.30 |
| 18 | Type A | SOFT DRINKS | $18,710.70 |
| 8 | Type A | SOFT DRINKS | $17,746.60 |
| 18 | Type A | BEEF | $17,378.91 |
| 8 | Type A | BEEF | $16,434.61 |
| 13 | Type A | SOFT DRINKS | $15,782.80 |
| 13 | Type A | BEEF | $15,710.55 |
| 14 | Type C | COUPON/MISC ITEMS | $14,189.96 |
| 18 | Type A | FLUID MILK PRODUCTS | $13,760.93 |
| 18 | Type A | CHEESE | $12,910.02 |
| 11 | Type B | COUPON/MISC ITEMS | $11,974.15 |
| 13 | Type A | FLUID MILK PRODUCTS | $10,556.46 |
| 8 | Type A | FLUID MILK PRODUCTS | $10,187.88 |
| Note. Totals are within campaign windows (unadjusted). | |||
What it shows: How many households engaged and how strongly by type.
household_engagement %>%
dplyr::mutate(
avg_transactions_per_household = round(avg_transactions_per_household, 2),
total_sales = fmt_dollar(total_sales),
avg_sales_per_household = fmt_dollar(avg_sales_per_household)
) %>%
gt::gt() %>%
gt::tab_header(title = "Household Engagement (Unadjusted)") %>%
gt::cols_label(
campaign_type = "Type",
unique_households = "Unique HHs",
total_transactions = "Transactions",
total_sales = "Total Sales",
avg_transactions_per_household = "Tx / HH",
avg_sales_per_household = "Sales / HH"
)
| Household Engagement (Unadjusted) | |||||
| Type | Unique HHs | Transactions | Total Sales | Tx / HH | Sales / HH |
|---|---|---|---|---|---|
| Type A | 1465 | 475004 | $1,500,846.33 | 324.23 | $1,024.47 |
| Type B | 968 | 254546 | $816,228.44 | 262.96 | $843.21 |
| Type C | 393 | 105041 | $334,913.86 | 267.28 | $852.20 |
What it shows: Highest-sales groups during campaign windows.
top_income_groups %>%
dplyr::group_by(campaign_type) %>%
dplyr::slice_head(n = 5) %>%
dplyr::ungroup() %>%
dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
gt::gt() %>%
gt::tab_header(title = "Top Income Groups by Sales (per Type)") %>%
gt::cols_label(
campaign_type = "Type",
income = "Income",
total_sales = "Total Sales"
)
| Top Income Groups by Sales (per Type) | ||
| Type | Income | Total Sales |
|---|---|---|
| Type A | 50-74K | $252,427.86 |
| Type A | 35-49K | $182,955.78 |
| Type A | 75-99K | $119,551.91 |
| Type A | 25-34K | $81,223.32 |
| Type A | Under 15K | $73,714.93 |
| Type B | 50-74K | $134,149.08 |
| Type B | 35-49K | $102,213.55 |
| Type B | 75-99K | $76,249.79 |
| Type B | 150-174K | $47,023.21 |
| Type B | 25-34K | $45,911.62 |
| Type C | 50-74K | $68,777.36 |
| Type C | 35-49K | $43,408.25 |
| Type C | 75-99K | $36,822.23 |
| Type C | Under 15K | $23,281.18 |
| Type C | 125-149K | $17,429.75 |
top_age_groups %>%
dplyr::group_by(campaign_type) %>%
dplyr::slice_head(n = 5) %>%
dplyr::ungroup() %>%
dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
gt::gt() %>%
gt::tab_header(title = "Top Age Groups by Sales (per Type)") %>%
gt::cols_label(
campaign_type = "Type",
age = "Age",
total_sales = "Total Sales"
)
| Top Age Groups by Sales (per Type) | ||
| Type | Age | Total Sales |
|---|---|---|
| Type A | 45-54 | $372,125.58 |
| Type A | 35-44 | $273,108.86 |
| Type A | 25-34 | $174,059.06 |
| Type A | 55-64 | $68,875.91 |
| Type A | 65+ | $63,902.99 |
| Type B | 45-54 | $200,751.84 |
| Type B | 35-44 | $169,966.62 |
| Type B | 25-34 | $127,509.78 |
| Type B | 55-64 | $32,324.52 |
| Type B | 19-24 | $21,793.90 |
| Type C | 45-54 | $85,153.41 |
| Type C | 35-44 | $77,717.01 |
| Type C | 25-34 | $43,232.42 |
| Type C | 19-24 | $13,942.66 |
| Type C | 55-64 | $11,208.53 |
top_household_sizes %>%
dplyr::group_by(campaign_type) %>%
dplyr::slice_head(n = 5) %>%
dplyr::ungroup() %>%
dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
gt::gt() %>%
gt::tab_header(title = "Top Household Sizes by Sales (per Type)") %>%
gt::cols_label(
campaign_type = "Type",
household_size = "HH Size",
total_sales = "Total Sales"
)
| Top Household Sizes by Sales (per Type) | ||
| Type | HH Size | Total Sales |
|---|---|---|
| Type A | 2 | $396,354.56 |
| Type A | 1 | $292,911.91 |
| Type A | 3 | $142,691.98 |
| Type A | 5+ | $95,988.81 |
| Type A | 4 | $70,928.29 |
| Type B | 2 | $209,143.35 |
| Type B | 1 | $144,288.05 |
| Type B | 3 | $93,420.64 |
| Type B | 4 | $62,645.21 |
| Type B | 5+ | $62,569.83 |
| Type C | 2 | $82,239.28 |
| Type C | 1 | $59,430.19 |
| Type C | 3 | $49,471.91 |
| Type C | 4 | $25,279.38 |
| Type C | 5+ | $24,719.39 |
What it shows: Whether campaign crowding correlates with sales/day.
ggplot(campaign_metrics, aes(x = overlap_count, y = sales_per_day)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~ campaign_type) +
labs(title = "Campaign Overlap vs. Sales Per Day (Unadjusted)",
x = "Number of Overlapping Campaigns", y = "Sales Per Day ($)") +
theme_minimal(base_size = 12)
What it shows: Incremental dollars per HH per day vs a 30-day baseline.
ggplot(uplift_summary, aes(x = reorder(campaign_type, avg_uplift_per_day),
y = avg_uplift_per_day, fill = campaign_type)) +
geom_col() +
coord_flip() +
scale_fill_manual(values = camp_cols) +
scale_y_continuous(labels = scales::dollar_format()) +
labs(
title = "Baseline-Adjusted Incremental Uplift by Campaign Type",
x = "Campaign Type", y = "Avg Uplift ($/HH/day)", fill = "Type"
) +
theme_minimal(base_size = 12)
uplift_summary %>%
dplyr::mutate(
households = scales::comma(households),
avg_uplift_per_day = fmt_dollar(avg_uplift_per_day),
med_uplift_per_day = fmt_dollar(med_uplift_per_day)
) %>%
gt::gt() %>%
gt::tab_header(title = "Incremental Uplift Summary") %>%
gt::cols_label(
campaign_type = "Type",
households = "Households",
avg_uplift_per_day = "Avg Uplift ($/HH/day)",
med_uplift_per_day = "Median Uplift ($/HH/day)"
)
| Incremental Uplift Summary | |||
| Type | Households | Avg Uplift ($/HH/day) | Median Uplift ($/HH/day) |
|---|---|---|---|
| Type A | 1,465 | $0.04 | $0.08 |
| Type B | 968 | -$0.70 | -$0.23 |
| Type C | 393 | -$2.41 | -$1.86 |
What it shows: Best age × income cells to target next.
top_segments %>%
dplyr::group_by(campaign_type) %>%
dplyr::slice_head(n = 5) %>%
dplyr::ungroup() %>%
dplyr::mutate(avg_uplift_pd = fmt_dollar(avg_uplift_pd),
hh = scales::comma(hh)) %>%
gt::gt() %>%
gt::tab_header(title = "Top Segments by Uplift (per Type)") %>%
gt::cols_label(
campaign_type = "Type",
age = "Age",
income = "Income",
avg_uplift_pd = "Avg Uplift ($/HH/day)",
hh = "HHs"
) %>%
gt::tab_source_note(gt::md("**Note.** Baseline = 30 days pre-campaign; observational (not causal)."))
| Top Segments by Uplift (per Type) | ||||
| Type | Age | Income | Avg Uplift ($/HH/day) | HHs |
|---|---|---|---|---|
| Type A | 25-34 | 200-249K | $9.07 | 1 |
| Type A | 35-44 | 175-199K | $7.04 | 1 |
| Type A | 55-64 | 100-124K | $3.66 | 5 |
| Type A | 45-54 | 250K+ | $3.63 | 4 |
| Type A | 19-24 | 250K+ | $2.74 | 2 |
| Type B | 65+ | 125-149K | $7.26 | 1 |
| Type B | 55-64 | Under 15K | $2.81 | 3 |
| Type B | 55-64 | 175-199K | $2.31 | 2 |
| Type B | 19-24 | 200-249K | $2.24 | 1 |
| Type B | 25-34 | 125-149K | $2.20 | 6 |
| Type C | 25-34 | 125-149K | $12.98 | 2 |
| Type C | 19-24 | Under 15K | $2.88 | 4 |
| Type C | 45-54 | 175-199K | $2.20 | 4 |
| Type C | 35-44 | 150-174K | $2.13 | 2 |
| Type C | 25-34 | 50-74K | $0.62 | 15 |
| Note. Baseline = 30 days pre-campaign; observational (not causal). | ||||
| Data Snapshot & QC | |
| Transactions date range: 2017-01-01 — 2018-01-01 | |
| Dataset | Rows (records) |
|---|---|
| transactions | 1,469,307 |
| campaigns | 6,589 |
| campaign_descriptions | 27 |
| coupon_redemptions | 2,102 |
| products | 92,331 |
| demographics | 801 |
Summary, Implications, and Limitations
What we found (one paragraph). One campaign type clearly delivers the highest baseline-adjusted uplift. A few age × income groups pop as strong responders. We also see that heavy campaign overlap can dilute sales/day, and coupon depth matters (bigger isn’t always better).
So what? (implications). Shift budget toward the winning type, target the top uplift segments, and limit overlaps (≤1 at a time). Tune coupon values by campaign type instead of using a one-size-fits-all discount.
Our recommendation. Run a 60-day pilot scaling the winning type into the top segments with controlled overlap and calibrated coupon depth. Use holdouts so we can measure true lift cleanly.
Limitations (keeping it honest). This isn’t a randomized experiment; a 30-day baseline can still miss seasonality; selection/overlap effects can sneak in; and we observe redemptions, not actual campaign exposure. That’s why we recommend a pilot with holdouts next.
Next steps. Do the pilot (HH-level randomization), consider geo-tests/MMM for media influence, tighten segment definitions if needed, and optimize coupon value by type/category based on response curves.Recommendation: Prioritize Type Type A (highest baseline-adjusted uplift) and scale into top uplift segments. Cap overlaps (≤1); calibrate coupon depth by type.
gt::gt(data.frame(
`Recommended Type` = as.character(best_type),
`Avg Uplift ($/HH/day)` = fmt_dollar(best_lift),
`Households` = scales::comma(N),
`Days` = D,
`Estimated Incremental Revenue` = fmt_dollar(est_incremental)
)) %>%
gt::tab_header(title = "Illustrative Impact of Scaling the Winning Campaign")
| Illustrative Impact of Scaling the Winning Campaign | ||||
| Recommended.Type | Avg.Uplift....HH.day. | Households | Days | Estimated.Incremental.Revenue |
|---|---|---|---|---|
| Type A | $0.04 | 100,000 | 60 | $216,798.81 |