Regork Final Project: Campaigns & Growth Opportunities

Executive Summary

Big question: Which campaign type (A/B/C) actually drives extra spend, and who reacts the most?
How we tested: For each household, we compared during-campaign $/day vs. their own 30-day pre-campaign $/day → uplift ($ per HH per day).
Headline: Type A shows the highest average uplift; Types B/C are mostly flat/negative.
Who moves: Strongest lift in ages 25–44 with higher incomes (see “Top Responding Segments”).
What to do: Run a 60-day pilot scaling Type A into those segments, keep overlaps ≤ 1, and tune coupon depth. Rough math: ~$216.8k incremental for 100k HHs × 60 days.
Caveats: Not randomized; 30-day baseline can miss seasonality; overlaps/selection bias; we measure redemptions, not exposure.

Introduction

What we’re trying to figure out. Regork runs different campaign types (A/B/C), but we don’t really know which one actually drives extra dollars beyond what shoppers normally spend. We also want to know who reacts the most (by age/income), so we can point future spend in the right direction.

Why this matters for the CEO. If we focus money on the campaigns and people that actually lift sales, we can grow revenue without just blasting coupons everywhere. That’s cheaper, cleaner, and easier to scale.

What we did (short version). Using the full Complete Journey data (transactions, campaigns + descriptions, coupons, products, demographics), we compared each household’s during-campaign daily spend to their own 30-day pre-campaign baseline. That gives us uplift ($ per household per day). Then we sliced that by campaign type and by age × income to see who responds the most. We also looked at overlaps, coupon depth, top categories, and household engagement to understand the “why.”

What we’re proposing. Double down on the best-lifting campaign type, aim it at the top responding segments, keep overlaps to ≤1, and right-size coupon depth. The “Illustrative Impact” table shows what that could mean in dollars.

Packages / Libraries Required

completejourney — loads the course datasets (transactions, campaigns, etc.).
dplyr, tidyr — joining tables, filtering, summarizing, reshaping.
lubridate — clean date handling (30-day windows, start/end dates).
ggplot2 — the charts.
scales — nice currency/percent labels.
gt — cleaner, presentation-ready tables.

Methods at a Glance (for non-technical readers)

Line up the data. Join campaigns to transactions by household and keep only the purchases that happened inside each campaign’s date window.
Make a baseline. For the same households, look at the 30 days before each campaign starts to get “normal” daily spend.
Compute uplift.
- rate_in = dollars during campaign / days active
- rate_base = dollars in prior 30 days / 30
- uplift = rate_in − rate_base → extra $/HH/day we can credit to the campaign
Compare groups. Average that uplift by campaign type and by age × income to find winners.
Reality check. Look at overlaps, coupon depth, top categories, and household engagement so the recommendations are practical.

ggplot(campaign_sales, aes(x = campaign_id, y = total_sales, fill = campaign_type)) +
  geom_col() +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(
    title = "Total Sales by Campaign (Unadjusted)",
    x = "Campaign ID", y = "Total Sales ($)", fill = "Type"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_fill_manual(values = camp_cols)

Campaign Timelines

What it shows: Calendar windows for each campaign by type.

ggplot(campaign_timeline, aes(x = start_date, xend = end_date,
                              y = reorder(factor(campaign_id), start_date),
                              yend = factor(campaign_id),
                              color = campaign_type)) +
  geom_segment(size = 6) +
  scale_x_date(date_labels = "%b", date_breaks = "1 month") +
  labs(title = "Campaign Timelines Across the Year", subtitle = "Colored by Type",
       x = "Date", y = "Campaign ID", color = "Type") +
  scale_color_manual(values = camp_cols) +
  theme_minimal(base_size = 12)

Average Campaign Duration by Type

What it shows: Typical length differs by campaign type.

ggplot(avg_duration, aes(x = campaign_type, y = avg_duration_days, fill = campaign_type)) +
  geom_col(width = 0.6) +
  labs(title = "Average Campaign Duration", x = "Type", y = "Average Days", fill = "Type") +
  scale_fill_manual(values = camp_cols) +
  theme_minimal(base_size = 12)

Coupon Effectiveness by Type

What it shows: Redemption volume and average discount size.

coupon_summary %>%
  dplyr::mutate(
    coupons_redeemed = scales::comma(coupons_redeemed),
    total_discount = fmt_dollar(total_discount),
    avg_discount_per_coupon = fmt_dollar(avg_discount_per_coupon)
  ) %>%
  gt::gt() %>%
  gt::tab_header(title = "Coupon Effectiveness (Unadjusted)") %>%
  gt::cols_label(
    campaign_type = "Type",
    coupons_redeemed = "Coupons Redeemed",
    total_discount = "Total Discount",
    avg_discount_per_coupon = "Avg Discount / Coupon"
  )

Type	Coupons Redeemed	Total Discount	Avg Discount / Coupon
Coupon Effectiveness (Unadjusted)
Type A	373,468	$29,372.27	$0.08
Type B	46,121	$4,320.17	$0.09
Type C	17,476	$917.55	$0.05

ggplot(coupon_summary, aes(x = campaign_type, y = avg_discount_per_coupon, fill = campaign_type)) +
  geom_col(width = 0.6) +
  labs(title = "Average Discount per Coupon by Type", x = "Type", y = "Avg Discount ($)", fill = "Type") +
  scale_fill_manual(values = camp_cols) +
  theme_minimal(base_size = 12)

Top Product Categories (within campaign windows)

What it shows: Where dollars concentrate during campaign windows.

ggplot(top_products, aes(x = reorder(product_category, total_sales),
                         y = total_sales, fill = campaign_type)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_manual(values = camp_cols) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "Top 5 Product Categories by Campaign (Unadjusted Totals)",
       x = "Product Category", y = "Total Sales ($)", fill = "Type") +
  theme_minimal(base_size = 12)

top_products %>%
  dplyr::slice_max(total_sales, n = 15) %>%
  dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Categories (Top 15 Across Campaigns)") %>%
  gt::cols_label(
    campaign_id = "Campaign",
    campaign_type = "Type",
    product_category = "Category",
    total_sales = "Total Sales"
  ) %>%
  gt::tab_source_note(gt::md("**Note.** Totals are within campaign windows (unadjusted)."))

Campaign	Type	Category	Total Sales
Top Categories (Top 15 Across Campaigns)
18	Type A	COUPON/MISC ITEMS	$44,163.53
8	Type A	COUPON/MISC ITEMS	$40,905.00
13	Type A	COUPON/MISC ITEMS	$40,363.30
18	Type A	SOFT DRINKS	$18,710.70
8	Type A	SOFT DRINKS	$17,746.60
18	Type A	BEEF	$17,378.91
8	Type A	BEEF	$16,434.61
13	Type A	SOFT DRINKS	$15,782.80
13	Type A	BEEF	$15,710.55
14	Type C	COUPON/MISC ITEMS	$14,189.96
18	Type A	FLUID MILK PRODUCTS	$13,760.93
18	Type A	CHEESE	$12,910.02
11	Type B	COUPON/MISC ITEMS	$11,974.15
13	Type A	FLUID MILK PRODUCTS	$10,556.46
8	Type A	FLUID MILK PRODUCTS	$10,187.88
Note. Totals are within campaign windows (unadjusted).

Household Engagement

What it shows: How many households engaged and how strongly by type.

household_engagement %>%
  dplyr::mutate(
    avg_transactions_per_household = round(avg_transactions_per_household, 2),
    total_sales = fmt_dollar(total_sales),
    avg_sales_per_household = fmt_dollar(avg_sales_per_household)
  ) %>%
  gt::gt() %>%
  gt::tab_header(title = "Household Engagement (Unadjusted)") %>%
  gt::cols_label(
    campaign_type = "Type",
    unique_households = "Unique HHs",
    total_transactions = "Transactions",
    total_sales = "Total Sales",
    avg_transactions_per_household = "Tx / HH",
    avg_sales_per_household = "Sales / HH"
  )

Type	Unique HHs	Transactions	Total Sales	Tx / HH	Sales / HH
Household Engagement (Unadjusted)
Type A	1465	475004	$1,500,846.33	324.23	$1,024.47
Type B	968	254546	$816,228.44	262.96	$843.21
Type C	393	105041	$334,913.86	267.28	$852.20

Demographic Top Groups (unadjusted)

What it shows: Highest-sales groups during campaign windows.

top_income_groups %>%
  dplyr::group_by(campaign_type) %>%
  dplyr::slice_head(n = 5) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Income Groups by Sales (per Type)") %>%
  gt::cols_label(
    campaign_type = "Type",
    income = "Income",
    total_sales = "Total Sales"
  )

Type	Income	Total Sales
Top Income Groups by Sales (per Type)
Type A	50-74K	$252,427.86
Type A	35-49K	$182,955.78
Type A	75-99K	$119,551.91
Type A	25-34K	$81,223.32
Type A	Under 15K	$73,714.93
Type B	50-74K	$134,149.08
Type B	35-49K	$102,213.55
Type B	75-99K	$76,249.79
Type B	150-174K	$47,023.21
Type B	25-34K	$45,911.62
Type C	50-74K	$68,777.36
Type C	35-49K	$43,408.25
Type C	75-99K	$36,822.23
Type C	Under 15K	$23,281.18
Type C	125-149K	$17,429.75

top_age_groups %>%
  dplyr::group_by(campaign_type) %>%
  dplyr::slice_head(n = 5) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Age Groups by Sales (per Type)") %>%
  gt::cols_label(
    campaign_type = "Type",
    age = "Age",
    total_sales = "Total Sales"
  )

Type	Age	Total Sales
Top Age Groups by Sales (per Type)
Type A	45-54	$372,125.58
Type A	35-44	$273,108.86
Type A	25-34	$174,059.06
Type A	55-64	$68,875.91
Type A	65+	$63,902.99
Type B	45-54	$200,751.84
Type B	35-44	$169,966.62
Type B	25-34	$127,509.78
Type B	55-64	$32,324.52
Type B	19-24	$21,793.90
Type C	45-54	$85,153.41
Type C	35-44	$77,717.01
Type C	25-34	$43,232.42
Type C	19-24	$13,942.66
Type C	55-64	$11,208.53

top_household_sizes %>%
  dplyr::group_by(campaign_type) %>%
  dplyr::slice_head(n = 5) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Household Sizes by Sales (per Type)") %>%
  gt::cols_label(
    campaign_type = "Type",
    household_size = "HH Size",
    total_sales = "Total Sales"
  )

Type	HH Size	Total Sales
Top Household Sizes by Sales (per Type)
Type A	2	$396,354.56
Type A	1	$292,911.91
Type A	3	$142,691.98
Type A	5+	$95,988.81
Type A	4	$70,928.29
Type B	2	$209,143.35
Type B	1	$144,288.05
Type B	3	$93,420.64
Type B	4	$62,645.21
Type B	5+	$62,569.83
Type C	2	$82,239.28
Type C	1	$59,430.19
Type C	3	$49,471.91
Type C	4	$25,279.38
Type C	5+	$24,719.39

Overlap vs Sales Per Day

What it shows: Whether campaign crowding correlates with sales/day.

ggplot(campaign_metrics, aes(x = overlap_count, y = sales_per_day)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ campaign_type) +
  labs(title = "Campaign Overlap vs. Sales Per Day (Unadjusted)",
       x = "Number of Overlapping Campaigns", y = "Sales Per Day ($)") +
  theme_minimal(base_size = 12)

Incremental Uplift by Type (baseline-adjusted)

What it shows: Incremental dollars per HH per day vs a 30-day baseline.

ggplot(uplift_summary, aes(x = reorder(campaign_type, avg_uplift_per_day),
                           y = avg_uplift_per_day, fill = campaign_type)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = camp_cols) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(
    title = "Baseline-Adjusted Incremental Uplift by Campaign Type",
    x = "Campaign Type", y = "Avg Uplift ($/HH/day)", fill = "Type"
  ) +
  theme_minimal(base_size = 12)

uplift_summary %>%
  dplyr::mutate(
    households = scales::comma(households),
    avg_uplift_per_day = fmt_dollar(avg_uplift_per_day),
    med_uplift_per_day = fmt_dollar(med_uplift_per_day)
  ) %>%
  gt::gt() %>%
  gt::tab_header(title = "Incremental Uplift Summary") %>%
  gt::cols_label(
    campaign_type = "Type",
    households = "Households",
    avg_uplift_per_day = "Avg Uplift ($/HH/day)",
    med_uplift_per_day = "Median Uplift ($/HH/day)"
  )

Type	Households	Avg Uplift ($/HH/day)	Median Uplift ($/HH/day)
Incremental Uplift Summary
Type A	1,465	$0.04	$0.08
Type B	968	-$0.70	-$0.23
Type C	393	-$2.41	-$1.86

Top Responding Segments (by uplift)

What it shows: Best age × income cells to target next.

top_segments %>%
  dplyr::group_by(campaign_type) %>%
  dplyr::slice_head(n = 5) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(avg_uplift_pd = fmt_dollar(avg_uplift_pd),
                hh = scales::comma(hh)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Segments by Uplift (per Type)") %>%
  gt::cols_label(
    campaign_type = "Type",
    age = "Age",
    income = "Income",
    avg_uplift_pd = "Avg Uplift ($/HH/day)",
    hh = "HHs"
  ) %>%
  gt::tab_source_note(gt::md("**Note.** Baseline = 30 days pre-campaign; observational (not causal)."))

Type	Age	Income	Avg Uplift ($/HH/day)	HHs
Top Segments by Uplift (per Type)
Type A	25-34	200-249K	$9.07	1
Type A	35-44	175-199K	$7.04	1
Type A	55-64	100-124K	$3.66	5
Type A	45-54	250K+	$3.63	4
Type A	19-24	250K+	$2.74	2
Type B	65+	125-149K	$7.26	1
Type B	55-64	Under 15K	$2.81	3
Type B	55-64	175-199K	$2.31	2
Type B	19-24	200-249K	$2.24	1
Type B	25-34	125-149K	$2.20	6
Type C	25-34	125-149K	$12.98	2
Type C	19-24	Under 15K	$2.88	4
Type C	45-54	175-199K	$2.20	4
Type C	35-44	150-174K	$2.13	2
Type C	25-34	50-74K	$0.62	15
Note. Baseline = 30 days pre-campaign; observational (not causal).

Dataset	Rows (records)
Data Snapshot & QC
Transactions date range: 2017-01-01 — 2018-01-01
transactions	1,469,307
campaigns	6,589
campaign_descriptions	27
coupon_redemptions	2,102
products	92,331
demographics	801

Summary, Implications, and Limitations

What we found (one paragraph). One campaign type clearly delivers the highest baseline-adjusted uplift. A few age × income groups pop as strong responders. We also see that heavy campaign overlap can dilute sales/day, and coupon depth matters (bigger isn’t always better).

So what? (implications). Shift budget toward the winning type, target the top uplift segments, and limit overlaps (≤1 at a time). Tune coupon values by campaign type instead of using a one-size-fits-all discount.

Our recommendation. Run a 60-day pilot scaling the winning type into the top segments with controlled overlap and calibrated coupon depth. Use holdouts so we can measure true lift cleanly.

Limitations (keeping it honest). This isn’t a randomized experiment; a 30-day baseline can still miss seasonality; selection/overlap effects can sneak in; and we observe redemptions, not actual campaign exposure. That’s why we recommend a pilot with holdouts next.

Next steps. Do the pilot (HH-level randomization), consider geo-tests/MMM for media influence, tighten segment definitions if needed, and optimize coupon value by type/category based on response curves.

CEO Recommendation & Illustrative Impact

Recommendation: Prioritize Type Type A (highest baseline-adjusted uplift) and scale into top uplift segments. Cap overlaps (≤1); calibrate coupon depth by type.

gt::gt(data.frame(
  `Recommended Type` = as.character(best_type),
  `Avg Uplift ($/HH/day)` = fmt_dollar(best_lift),
  `Households` = scales::comma(N),
  `Days` = D,
  `Estimated Incremental Revenue` = fmt_dollar(est_incremental)
)) %>%
  gt::tab_header(title = "Illustrative Impact of Scaling the Winning Campaign")

Recommended.Type	Avg.Uplift....HH.day.	Households	Days	Estimated.Incremental.Revenue
Illustrative Impact of Scaling the Winning Campaign
Type A	$0.04	100,000	60	$216,798.81