Regork Final Project: Campaigns & Growth Opportunities

Executive Summary

Big question: Which campaign type (A/B/C) actually drives extra spend, and who reacts the most?
How we tested: For each household, we compared during-campaign $/day vs. their own 30-day pre-campaign $/day → uplift ($ per HH per day).
Headline: Type A shows the highest average uplift; Type C is mostly flat/negative.
Who moves: Strongest lift in ages 25–44 with higher incomes (see “Top Responding Segments”).
What to do: Run a 60-day pilot scaling Type A into those segments, keep overlaps ≤ 1, and tune coupon depth. Rough math: ~$216.8k incremental for 100k HHs × 60 days.
Caveats: Not randomized; 30-day baseline can miss seasonality; overlaps/selection bias; we measure redemptions, not exposure.

Introduction

What we’re trying to figure out. Regork runs different campaign types (A/B/C) over the course of a given year. Assessing how these campaigns perform provides insights into key customer demographics and their responsiveness to different sales approaches. The differences across campaign types also have implications for related activities such as coupons or advertisements during key points in the fiscal year.

Why this matters for the CEO. By focusing on campaigns that drive sales and understanding why others fail, Regork can reduce spend on underperforming efforts. This means shifting funds toward campaigns and advertising that impact the bottom line and bring more customers to the store when Regork needs it most.

What we did (short version). Using the full Complete Journey data (transactions, campaigns + descriptions, coupons, products, demographics), we evaluated each campaign type’s overall performance. The three campaign types achieved varying levels of success from Type A being wildly profitable to Type C making meager gains. We compared each household’s during-campaign daily spend to their own 30-day pre-campaign baseline. That gives us uplift ($ per household per day). Then we sliced that by campaign type and by age × income to see who responds the most. We also looked at overlaps, coupon depth, top categories, and household engagement to understand the “why” behind each campaigns results.

What we’re proposing. Double down on the best-lifting campaign type, aim it at the top responding segments, keep overlaps to ≤1, and right-size coupon depth. These successes and lessons can be applied to poorly performing Type C campaigns. The “Illustrative Impact” table shows what that could mean in dollars.

Packages / Libraries Required

completejourney — loads the course datasets (transactions, campaigns, etc.).
dplyr, tidyr — joining tables, filtering, summarizing, reshaping.
lubridate — clean date handling (30-day windows, start/end dates).
ggplot2 — the charts.
scales — nice currency/percent labels.
gt — cleaner, presentation-ready tables.

Methods at a Glance (for non-technical readers)

Line up the data. Join campaigns to transactions by household and keep only the purchases that happened inside each campaign’s date window.
Make a baseline. For the same households, look at the 30 days before each campaign starts to get “normal” daily spend.
Compute uplift.
- rate_in = dollars during campaign / days active
- rate_base = dollars in prior 30 days / 30
- uplift = rate_in − rate_base → extra $/HH/day we can credit to the campaign
Compare groups. Average that uplift by campaign type and by age × income to find winners.
Reality check. Look at overlaps, coupon depth, top categories, and household engagement so the recommendations are practical.

ggplot(campaign_sales, aes(x = campaign_id, y = total_sales, fill = campaign_type)) +
  geom_col() +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(
    title = "Total Sales by Campaign (Unadjusted)",
    x = "Campaign ID", y = "Total Sales ($)", fill = "Type"
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_fill_manual(values = camp_cols)

Campaign Timelines

What it shows: Calendar windows for each campaign by type.

Each Campaign Type is spread across the entire calender year with no major focus on seasonality or clustering.Therefore you can not definitively conclude the varying performance is due to timing of the campaign types.

ggplot(campaign_timeline, aes(x = start_date, xend = end_date,
                              y = reorder(factor(campaign_id), start_date),
                              yend = factor(campaign_id),
                              color = campaign_type)) +
  geom_segment(size = 6) +
  scale_x_date(date_labels = "%b", date_breaks = "1 month") +
  labs(title = "Campaign Timelines Across the Year", subtitle = "Colored by Type",
       x = "Date", y = "Campaign ID", color = "Type") +
  scale_color_manual(values = camp_cols) +
  theme_minimal(base_size = 12)

Average Campaign Duration by Type

What it shows: Typical length differs by campaign type.

Type A has much greater sales than Type B or C despite not being much shorter or longer than it’s peers. Type C campaigns are however are 20+ days longer on average, signals a potential cross contamination effect or consumer aparthy due to its length.

ggplot(avg_duration, aes(x = campaign_type, y = avg_duration_days, fill = campaign_type)) +
  geom_col(width = 0.6) +
  labs(title = "Average Campaign Duration", x = "Type", y = "Average Days", fill = "Type") +
  scale_fill_manual(values = camp_cols) +
  theme_minimal(base_size = 12)

Coupon Effectiveness by Type

What it shows: Redemption volume and average discount size.

Type A once again shows that while Discounts may not be why a campaign overwhelmingly succeeds, it can be a potential reason it fails. Type A’s coupons were highly leveraged during it’s campaigns as Type C falls far behind its peers in avg discount amount and redemptions.

coupon_summary %>%
  dplyr::mutate(
    coupons_redeemed = scales::comma(coupons_redeemed),
    total_discount = fmt_dollar(total_discount),
    avg_discount_per_coupon = fmt_dollar(avg_discount_per_coupon)
  ) %>%
  gt::gt() %>%
  gt::tab_header(title = "Coupon Effectiveness (Unadjusted)") %>%
  gt::cols_label(
    campaign_type = "Type",
    coupons_redeemed = "Coupons Redeemed",
    total_discount = "Total Discount",
    avg_discount_per_coupon = "Avg Discount / Coupon"
  )

Type	Coupons Redeemed	Total Discount	Avg Discount / Coupon
Coupon Effectiveness (Unadjusted)
Type A	373,468	$29,372.27	$0.08
Type B	46,121	$4,320.17	$0.09
Type C	17,476	$917.55	$0.05

ggplot(coupon_summary, aes(x = campaign_type, y = avg_discount_per_coupon, fill = campaign_type)) +
  geom_col(width = 0.6) +
  labs(title = "Average Discount per Coupon by Type", x = "Type", y = "Avg Discount ($)", fill = "Type") +
  scale_fill_manual(values = camp_cols) +
  theme_minimal(base_size = 12)

Top Product Categories (within campaign windows)

What it shows: Where dollars concentrate during campaign windows.

Solving why certain campaign types succeed or fail is not in the categories, as the top categories overall remain popular regardless of the Campaign Type.

ggplot(top_products, aes(x = reorder(product_category, total_sales),
                         y = total_sales, fill = campaign_type)) +
  geom_col(position = "dodge") +
  coord_flip() +
  scale_fill_manual(values = camp_cols) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(title = "Top 5 Product Categories by Campaign (Unadjusted Totals)",
       x = "Product Category", y = "Total Sales ($)", fill = "Type") +
  theme_minimal(base_size = 12)

top_products %>%
  dplyr::slice_max(total_sales, n = 15) %>%
  dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Categories (Top 15 Across Campaigns)") %>%
  gt::cols_label(
    campaign_id = "Campaign",
    campaign_type = "Type",
    product_category = "Category",
    total_sales = "Total Sales"
  ) %>%
  gt::tab_source_note(gt::md("**Note.** Totals are within campaign windows (unadjusted)."))

Campaign	Type	Category	Total Sales
Top Categories (Top 15 Across Campaigns)
18	Type A	COUPON/MISC ITEMS	$44,163.53
8	Type A	COUPON/MISC ITEMS	$40,905.00
13	Type A	COUPON/MISC ITEMS	$40,363.30
18	Type A	SOFT DRINKS	$18,710.70
8	Type A	SOFT DRINKS	$17,746.60
18	Type A	BEEF	$17,378.91
8	Type A	BEEF	$16,434.61
13	Type A	SOFT DRINKS	$15,782.80
13	Type A	BEEF	$15,710.55
14	Type C	COUPON/MISC ITEMS	$14,189.96
18	Type A	FLUID MILK PRODUCTS	$13,760.93
18	Type A	CHEESE	$12,910.02
11	Type B	COUPON/MISC ITEMS	$11,974.15
13	Type A	FLUID MILK PRODUCTS	$10,556.46
8	Type A	FLUID MILK PRODUCTS	$10,187.88
Note. Totals are within campaign windows (unadjusted).

Household Engagement

What it shows: How many households engaged and how strongly by type.

Households and Demographics that show it in droves for Type A Campaigns, fail to drive that same enthusiasm with Type C Campaigns despite showing up in similar proportions.

household_engagement %>%
  dplyr::mutate(
    avg_transactions_per_household = round(avg_transactions_per_household, 2),
    total_sales = fmt_dollar(total_sales),
    avg_sales_per_household = fmt_dollar(avg_sales_per_household)
  ) %>%
  gt::gt() %>%
  gt::tab_header(title = "Household Engagement (Unadjusted)") %>%
  gt::cols_label(
    campaign_type = "Type",
    unique_households = "Unique HHs",
    total_transactions = "Transactions",
    total_sales = "Total Sales",
    avg_transactions_per_household = "Tx / HH",
    avg_sales_per_household = "Sales / HH"
  )

Type	Unique HHs	Transactions	Total Sales	Tx / HH	Sales / HH
Household Engagement (Unadjusted)
Type A	1465	475004	$1,500,846.33	324.23	$1,024.47
Type B	968	254546	$816,228.44	262.96	$843.21
Type C	393	105041	$334,913.86	267.28	$852.20

Demographic Top Groups (unadjusted)

What it shows: Highest-sales groups during campaign windows.

top_income_groups %>%
  dplyr::group_by(campaign_type) %>%
  dplyr::slice_head(n = 5) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Income Groups by Sales (per Type)") %>%
  gt::cols_label(
    campaign_type = "Type",
    income = "Income",
    total_sales = "Total Sales"
  )

Type	Income	Total Sales
Top Income Groups by Sales (per Type)
Type A	50-74K	$252,427.86
Type A	35-49K	$182,955.78
Type A	75-99K	$119,551.91
Type A	25-34K	$81,223.32
Type A	Under 15K	$73,714.93
Type B	50-74K	$134,149.08
Type B	35-49K	$102,213.55
Type B	75-99K	$76,249.79
Type B	150-174K	$47,023.21
Type B	25-34K	$45,911.62
Type C	50-74K	$68,777.36
Type C	35-49K	$43,408.25
Type C	75-99K	$36,822.23
Type C	Under 15K	$23,281.18
Type C	125-149K	$17,429.75

top_age_groups %>%
  dplyr::group_by(campaign_type) %>%
  dplyr::slice_head(n = 5) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Age Groups by Sales (per Type)") %>%
  gt::cols_label(
    campaign_type = "Type",
    age = "Age",
    total_sales = "Total Sales"
  )

Type	Age	Total Sales
Top Age Groups by Sales (per Type)
Type A	45-54	$372,125.58
Type A	35-44	$273,108.86
Type A	25-34	$174,059.06
Type A	55-64	$68,875.91
Type A	65+	$63,902.99
Type B	45-54	$200,751.84
Type B	35-44	$169,966.62
Type B	25-34	$127,509.78
Type B	55-64	$32,324.52
Type B	19-24	$21,793.90
Type C	45-54	$85,153.41
Type C	35-44	$77,717.01
Type C	25-34	$43,232.42
Type C	19-24	$13,942.66
Type C	55-64	$11,208.53

top_household_sizes %>%
  dplyr::group_by(campaign_type) %>%
  dplyr::slice_head(n = 5) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(total_sales = fmt_dollar(total_sales)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Household Sizes by Sales (per Type)") %>%
  gt::cols_label(
    campaign_type = "Type",
    household_size = "HH Size",
    total_sales = "Total Sales"
  )

Type	HH Size	Total Sales
Top Household Sizes by Sales (per Type)
Type A	2	$396,354.56
Type A	1	$292,911.91
Type A	3	$142,691.98
Type A	5+	$95,988.81
Type A	4	$70,928.29
Type B	2	$209,143.35
Type B	1	$144,288.05
Type B	3	$93,420.64
Type B	4	$62,645.21
Type B	5+	$62,569.83
Type C	2	$82,239.28
Type C	1	$59,430.19
Type C	3	$49,471.91
Type C	4	$25,279.38
Type C	5+	$24,719.39

Overlap vs Sales Per Day

What it shows: Whether campaign crowding correlates with sales/day.

ggplot(campaign_metrics, aes(x = overlap_count, y = sales_per_day)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ campaign_type) +
  labs(title = "Campaign Overlap vs. Sales Per Day (Unadjusted)",
       x = "Number of Overlapping Campaigns", y = "Sales Per Day ($)") +
  theme_minimal(base_size = 12)

Incremental Uplift by Type (baseline-adjusted)

What it shows: Incremental dollars per HH per day vs a 30-day baseline.

ggplot(uplift_summary, aes(x = reorder(campaign_type, avg_uplift_per_day),
                           y = avg_uplift_per_day, fill = campaign_type)) +
  geom_col() +
  coord_flip() +
  scale_fill_manual(values = camp_cols) +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(
    title = "Baseline-Adjusted Incremental Uplift by Campaign Type",
    x = "Campaign Type", y = "Avg Uplift ($/HH/day)", fill = "Type"
  ) +
  theme_minimal(base_size = 12)

uplift_summary %>%
  dplyr::mutate(
    households = scales::comma(households),
    avg_uplift_per_day = fmt_dollar(avg_uplift_per_day),
    med_uplift_per_day = fmt_dollar(med_uplift_per_day)
  ) %>%
  gt::gt() %>%
  gt::tab_header(title = "Incremental Uplift Summary") %>%
  gt::cols_label(
    campaign_type = "Type",
    households = "Households",
    avg_uplift_per_day = "Avg Uplift ($/HH/day)",
    med_uplift_per_day = "Median Uplift ($/HH/day)"
  )

Type	Households	Avg Uplift ($/HH/day)	Median Uplift ($/HH/day)
Incremental Uplift Summary
Type A	1,465	$0.04	$0.08
Type B	968	-$0.70	-$0.23
Type C	393	-$2.41	-$1.86

Top Responding Segments (by uplift)

What it shows: Best age × income cells to target next.

top_segments %>%
  dplyr::group_by(campaign_type) %>%
  dplyr::slice_head(n = 5) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(avg_uplift_pd = fmt_dollar(avg_uplift_pd),
                hh = scales::comma(hh)) %>%
  gt::gt() %>%
  gt::tab_header(title = "Top Segments by Uplift (per Type)") %>%
  gt::cols_label(
    campaign_type = "Type",
    age = "Age",
    income = "Income",
    avg_uplift_pd = "Avg Uplift ($/HH/day)",
    hh = "HHs"
  ) %>%
  gt::tab_source_note(gt::md("**Note.** Baseline = 30 days pre-campaign; observational (not causal)."))

Type	Age	Income	Avg Uplift ($/HH/day)	HHs
Top Segments by Uplift (per Type)
Type A	25-34	200-249K	$9.07	1
Type A	35-44	175-199K	$7.04	1
Type A	55-64	100-124K	$3.66	5
Type A	45-54	250K+	$3.63	4
Type A	19-24	250K+	$2.74	2
Type B	65+	125-149K	$7.26	1
Type B	55-64	Under 15K	$2.81	3
Type B	55-64	175-199K	$2.31	2
Type B	19-24	200-249K	$2.24	1
Type B	25-34	125-149K	$2.20	6
Type C	25-34	125-149K	$12.98	2
Type C	19-24	Under 15K	$2.88	4
Type C	45-54	175-199K	$2.20	4
Type C	35-44	150-174K	$2.13	2
Type C	25-34	50-74K	$0.62	15
Note. Baseline = 30 days pre-campaign; observational (not causal).

Dataset	Rows (records)
Data Snapshot & QC
Transactions date range: 2017-01-01 — 2018-01-01
transactions	1,469,307
campaigns	6,589
campaign_descriptions	27
coupon_redemptions	2,102
products	92,331
demographics	801

Summary, Implications, and Limitations

What we found. Type A campaigns deliver the highest baseline-adjusted uplift and overall sales, attracting Regok’s largest customer groups across demographics. We also see that heavy campaign overlap dilutes sales per day, and that coupon depth matters—bigger isn’t always better. By contrast, Type C campaigns show negative baseline-adjusted uplift and fail to attract these same households, driven by longer durations, fewer coupons, and lower average discounts.

So what? Shift budget toward the winning type, focus on the top uplift segments, and keep overlaps minimal (ideally no more than one at a time). Calibrate coupon values by campaign type rather than using a one-size-fits-all discount.

Our recommendation. Run a 60-day pilot scaling the winning type into the top segments with controlled overlap and tuned coupon depth. Include holdouts so we can measure true lift cleanly, and then apply those lessons to improve the worst-performing campaigns.

Limitations. This is not a randomized experiment; a 30-day baseline can still miss seasonality; selection and overlap effects can sneak in; and we observe redemption, not exposure. That’s why we recommend a pilot with holdouts next.

Next steps. Launch the pilot (HH-level randomization), explore geo-tests or MMM for media influence, refine segment definitions if needed, and optimize coupon value by type and category using response curves.

CEO Recommendation & Illustrative Impact

Recommendation: Prioritize Type Type A (highest baseline-adjusted uplift) and scale into top uplift segments. Cap overlaps (≤1); calibrate coupon depth by type.

gt::gt(data.frame(
  `Recommended Type` = as.character(best_type),
  `Avg Uplift ($/HH/day)` = fmt_dollar(best_lift),
  `Households` = scales::comma(N),
  `Days` = D,
  `Estimated Incremental Revenue` = fmt_dollar(est_incremental)
)) %>%
  gt::tab_header(title = "Illustrative Impact of Scaling the Winning Campaign")

Recommended.Type	Avg.Uplift....HH.day.	Households	Days	Estimated.Incremental.Revenue
Illustrative Impact of Scaling the Winning Campaign
Type A	$0.04	100,000	60	$216,798.81