Promotions are one of Regork’s largest controllable levers for driving short-term sales, but they also carry real costs: limited prime display space, ad “real estate,” and the opportunity cost of promoting one product over another. The CEO should care because not all promotion placements are equally effective some placements may meaningfully lift demand, while others may produce little benefit or even correlate with lower sales.
This analysis addresses a practical question that directly impacts revenue strategy and merchandising execution:
For Regork’s highest-selling products, which in-store display locations and mailer placements are associated with the largest lift in weekly sales (revenue and units), compared to weeks with no display/no ad controlling for product and store?
We used two CompleteJourney datasets:
Transactions: what customers bought, including weekly sales and units
Promotions: where products were promoted (display + mailer placement codes)
Methodologically, we:
Focused on the most recent 13 weeks to analyze recent promotion performance
Selected the Top 20 products by revenue to prioritize high-impact items
Aggregated sales to product × store × week (matching how promotions are defined)
Collapsed multiple promotion records into a single “most prominent placement” per product-store-week using placement priority rules
Estimated lift by comparing promoted weeks to a baseline of the same product in the same store during weeks with no display and not on ad (controls for product and store)
This analysis helps leadership decide where to invest premium promotional real estate. The proposed solution is to:
Prioritize placements that show consistent positive lift (especially those with enough evidence/observations)
Use these placements for the Top 20 products to maximize sales lift
Avoid or investigate placements that show negative or weak association with sales
The following R packages are required in order to run the code in this R project:
library(completejourney) # dataset tables (transactions/products/promotions)
library(tidyverse) # data wrangling + plotting toolkit
library(dplyr) # data manipulation + joins
library(stringr) # string/text cleaning
library(ggplot2) # charts/visualizations
library(scales) # axis/label formatting (percent, dollar, comma)
library(knitr) # R Markdown + kable tables
library(kableExtra) # styled/pretty tablesWe narrowed the analysis to the last 13 weeks and the Top 20 products by revenue. This ensures:
Results reflect recent promotion behavior
Findings focus on products with meaningful business impact
Output: top_products_tbl confirms which products drive the most revenue in the chosen window.
transactions <- get_transactions()
promotions <- get_promotions()
# BUSINESS PARAMETERS (dynamic)
n_weeks <- 13
top_n_products <- 20
# Baseline rule: at least 30% of window weeks or minimum 4
baseline_min_weeks <- max(4, floor(n_weeks * 0.3))We aggregated transactions to product × store × week, producing:
weekly_sales (revenue)
weekly_units (volume)
This grain is critical because promotions are defined at the same level.
Output: tx_psw becomes the foundation for lift measurement.
The promotions table can have multiple rows per product-store-week. To avoid duplicated joins and inflated metrics, we:
filtered promotions to only keys appearing in tx_psw
collapsed promotions into one record per product-store-week
selected the most prominent placement using priority rules
We also created interpretable flags and intensity measures:
has_display, has_mailer, promo_any
n_display_rows, n_mailer_rows
Output: promo_psw is joined to create tx_joined.
#Keys = All product-store-week combination in our aggregated transactions
keys <- tx_psw %>% distinct(product_id, store_id, week)
promo_filtered <- promotions %>% # only those promotions that match those keys
semi_join(keys, by = c("product_id", "store_id", "week"))display_priority <- c("0","A","7","9","2","1","6","5","4","3")
mailer_priority <- c("0","C","A","P","J","F","L","X","Z","H","D")
display_labels <- c(
"0"="No display",
"1"="Store front",
"2"="Store rear",
"3"="Front end cap",
"4"="Mid-aisle end cap",
"5"="Rear end cap",
"6"="Side aisle end cap",
"7"="In-aisle",
"9"="Secondary location display",
"A"="In-shelf"
)
mailer_labels <- c(
"0"="Not on ad",
"A"="Interior page feature",
"C"="Interior page line item",
"D"="Front page feature",
"F"="Back page feature",
"H"="Wrap front feature",
"J"="Wrap interior coupon",
"L"="Wrap back feature",
"P"="Interior page coupon",
"X"="Free on interior page",
"Z"="Free on front/back/wrap"
)
promo_psw <- promo_filtered %>%
mutate(
display_location = as.character(display_location),
mailer_location = as.character(mailer_location),
display_location = if_else(is.na(display_location) | display_location == "", "0", display_location),
mailer_location = if_else(is.na(mailer_location) | mailer_location == "", "0", mailer_location)
) %>%
group_by(product_id, store_id, week) %>%
summarise(
# pick most prominent by priority ordering
display_loc = display_location[which.max(match(display_location, display_priority, nomatch = 0))],
mailer_loc = mailer_location[ which.max(match(mailer_location, mailer_priority, nomatch = 0))],
# flags
has_display = as.integer(any(display_location != "0")),
has_mailer = as.integer(any(mailer_location != "0")),
promo_any = as.integer(any(display_location != "0" | mailer_location != "0")),
# intensity
n_display_rows = sum(display_location != "0"),
n_mailer_rows = sum(mailer_location != "0"),
.groups = "drop"
) %>%
mutate(
display_label = display_labels[display_loc],
mailer_label = mailer_labels[mailer_loc]
)tx_joined <- tx_psw %>%
left_join(promo_psw, by = c("product_id", "store_id", "week")) %>%
mutate(
display_loc = coalesce(display_loc, "0"),
mailer_loc = coalesce(mailer_loc, "0"),
display_label = coalesce(display_label, "No display"),
mailer_label = coalesce(mailer_label, "Not on ad"),
has_display = coalesce(has_display, 0L),
has_mailer = coalesce(has_mailer, 0L),
promo_any = coalesce(promo_any, 0L),
n_display_rows = coalesce(n_display_rows, 0L),
n_mailer_rows = coalesce(n_mailer_rows, 0L)
)This EDA is not simply “plot the raw data.” We engineered analytical features by combining transactions and promotions to uncover patterns that are not visible in either dataset alone.
To control for product and store, we computed a baseline within each product-store combination.
Baseline weeks satisfy:
No display (display_loc == “0”)
Not on ad (mailer_loc == “0”)
We required at least 4 baseline weeks for stability, preventing lift estimates from being driven by very small baseline samples.
baseline_min_weeks <- max(4, floor(n_weeks * 0.3))
tx_lift <- tx_joined %>%
group_by(product_id, store_id) %>%
mutate(
baseline_weeks = sum(display_loc == "0" & mailer_loc == "0", na.rm = TRUE),
baseline_sales = mean(weekly_sales[display_loc == "0" & mailer_loc == "0"], na.rm = TRUE),
baseline_units = mean(weekly_units[display_loc == "0" & mailer_loc == "0"], na.rm = TRUE)
) %>%
ungroup() %>%
filter(
baseline_weeks >= baseline_min_weeks,
!is.na(baseline_sales), baseline_sales > 0,
!is.na(baseline_units), baseline_units > 0
) %>%
mutate(
lift_sales_pct = (weekly_sales - baseline_sales) / baseline_sales,
lift_units_pct = (weekly_units - baseline_units) / baseline_units
)We produced:
Ranked table: Display location → avg sales lift, avg unit lift, n_obs
Ranked table: Mailer placement → avg sales lift, avg unit lift, n_obs
Two plots visualizing average lift by placement
These outputs reveal “placement effectiveness” patterns that are not self-evident in the raw data.
Display placements
Some display locations show extremely high lift but have very low n_obs (ex: mid-aisle end cap with n = 1). These should be treated as hypotheses, not major recommendations.
Front end cap shows a smaller positive lift with much stronger support (ex: n = 65), making it more believable.
Secondary location display shows negative lift in the observed window, suggesting it may be less effective or used in weaker demand conditions.
display_impact <- tx_lift %>%
group_by(display_loc, display_label) %>%
summarise(
avg_lift_sales_pct = mean(lift_sales_pct, na.rm = TRUE),
avg_lift_units_pct = mean(lift_units_pct, na.rm = TRUE),
n_obs = n(),
.groups = "drop"
) %>% slice_max(order_by = avg_lift_sales_pct, n = 6)
display_impact## # A tibble: 6 × 5
## display_loc display_label avg_lift_sales_pct avg_lift_units_pct n_obs
## <chr> <chr> <dbl> <dbl> <int>
## 1 4 Mid-aisle end cap 2.82 5 1
## 2 6 Side aisle end cap 1.27 1.92 3
## 3 2 Store rear 0.677 1.18 10
## 4 1 Store front 0.116 0.121 9
## 5 7 In-aisle 0.0934 0.536 9
## 6 3 Front end cap 0.0914 0.224 65
Mailer placements
Wrap front feature (H) and interior page feature (A) show large lift with moderate sample sizes (~50 observations).
Front page feature (D) shows smaller lift but has a very large sample size (ex: n = 824), making it a reliable “workhorse” placement.
“Free on interior page (X)” shows negative lift, which may reflect cannibalization or context effects.
This is strong EDA because it reports both signal and evidence strength (n_obs).
mailer_impact <- tx_lift %>%
group_by(mailer_loc, mailer_label) %>%
summarise(
avg_lift_sales_pct = mean(lift_sales_pct, na.rm = TRUE),
avg_lift_units_pct = mean(lift_units_pct, na.rm = TRUE),
n_obs = n(),
.groups = "drop"
) %>%
slice_max(order_by = avg_lift_sales_pct, n = 6)
mailer_impact## # A tibble: 6 × 5
## mailer_loc mailer_label avg_lift_sales_pct avg_lift_units_pct n_obs
## <chr> <chr> <dbl> <dbl> <int>
## 1 H Wrap front feature 0.849 1.95 54
## 2 A Interior page feature 0.652 1.12 50
## 3 F Back page feature 0.266 0.438 42
## 4 D Front page feature 0.227 0.706 824
## 5 0 Not on ad 0.000523 0.000361 5059
## 6 X Free on interior page -0.134 -0.158 123
subtitle_text <- paste0(
"Lift vs baseline (no display + no mailer), last ", n_weeks,
" weeks, top ", top_n_products, " products"
)display_plot <- display_impact %>%
mutate(
lift_pct = avg_lift_sales_pct,
label_txt = paste0(percent(lift_pct, accuracy = 0.1), " | n=", comma(n_obs))
)
ggplot(display_plot,
aes(x = reorder(display_label, lift_pct), y = lift_pct, fill = lift_pct)) +
geom_col(width = 0.75, alpha = 0.95, color = "white", linewidth = 0.6) +
geom_hline(yintercept = 0, linetype = "dashed", linewidth = 0.9, color = "gray35") +
geom_text(
aes(label = label_txt),
hjust = ifelse(display_plot$lift_pct >= 0, -0.10, 1.10),
size = 4.1,
fontface = "bold",
color = "#1f1f1f"
) +
coord_flip() +
scale_y_continuous(
labels = percent_format(accuracy = 1),
expand = expansion(mult = c(0.06, 0.28))
) +
scale_fill_gradient2(
low = "#FF4D6D", # vibrant negative
mid = "#F4F4F4", # neutral around 0
high = "#00B4D8", # vibrant positive
midpoint = 0,
labels = percent_format(accuracy = 1),
name = "Lift"
) +
labs(
title = "Average Sales Lift by Display Location (Top 5)",
subtitle = subtitle_text,
x = NULL,
y = "Avg Sales Lift"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, color = "#243665"),
plot.subtitle = element_text(size = 12, color = "gray30"),
axis.text.y = element_text(face = "bold"),
legend.position = "right",
panel.grid.major.y = element_blank()
)mailer_plot <- mailer_impact %>%
mutate(
lift_pct = avg_lift_sales_pct,
label_txt = paste0(percent(lift_pct, accuracy = 0.1), " | n=", comma(n_obs))
)
ggplot(mailer_plot,
aes(x = reorder(mailer_label, lift_pct), y = lift_pct, fill = lift_pct)) +
geom_col(width = 0.75, alpha = 0.95, color = "white", linewidth = 0.6) +
geom_hline(yintercept = 0, linetype = "dashed", linewidth = 0.9, color = "gray35") +
geom_text(
aes(label = label_txt),
hjust = ifelse(mailer_plot$lift_pct >= 0, -0.10, 1.10),
size = 4.1,
fontface = "bold",
color = "#1f1f1f"
) +
coord_flip() +
scale_y_continuous(
labels = percent_format(accuracy = 1),
expand = expansion(mult = c(0.06, 0.28))
) +
scale_fill_gradient2(
low = "#FF4D6D",
mid = "#F4F4F4",
high = "#00B4D8",
midpoint = 0,
labels = percent_format(accuracy = 1),
name = "Lift"
) +
labs(
title = "Average Sales Lift by Mailer Placement (Top 5)",
subtitle = subtitle_text,
x = NULL,
y = "Avg Sales Lift"
) +
theme_minimal(base_size = 14) +
theme(
plot.title = element_text(face = "bold", size = 18, color = "#243665"),
plot.subtitle = element_text(size = 12, color = "gray30"),
axis.text.y = element_text(face = "bold"),
legend.position = "right",
panel.grid.major.y = element_blank()
)
### Glimpse of Sales of top Products
top_products_tbl <- tx_13 %>%
semi_join(top_products, by = "product_id") %>%
group_by(product_id) %>%
summarise(total_sales_13w = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_sales_13w))
top_products_tbl## # A tibble: 20 × 2
## product_id total_sales_13w
## <chr> <dbl>
## 1 6534178 67676.
## 2 6533889 6958.
## 3 1029743 6668.
## 4 6533765 5016.
## 5 6534166 4610.
## 6 1106523 4571.
## 7 1082185 4180
## 8 916122 4045.
## 9 844179 3096.
## 10 995242 3000.
## 11 1044078 2418.
## 12 5569230 2400.
## 13 1126899 2351.
## 14 1070820 2105.
## 15 899624 2022.
## 16 1005186 1903.
## 17 1127831 1882.
## 18 866211 1803.
## 19 854852 1706.
## 20 948670 1685.
We evaluated which promotional placements (display and mailer) are most associated with sales lift for Regork’s highest-selling products.
Using CompleteJourney transactions and promotions:
restricted to the last 13 weeks
selected Top 20 products by revenue
aggregated to product-store-week
collapsed promotions to “most prominent placement”
computed lift relative to within product-store baseline weeks with no promotion exposure
High-visibility mailer placements (wrap/front features) are associated with higher lift
Front page placements show more modest lift but have strong evidence (large n_obs)
Some placements show negative associations and may be poor candidates for revenue-driven campaigns
Use Wrap front (H) and Interior feature (A) for Top 20 products when maximizing lift is the goal
Keep Front page feature (D) as a scalable default placement because evidence is strongest
Prioritize Front end caps for displays due to positive lift with meaningful support
Treat results with very low n_obs as hypotheses—collect more weeks or expand products before policy decisions
Investigate placements with negative association (ex: “Free on interior page (X)”)
This analysis is associational, not causal (placements are not randomized)
Some placements have low sample size in the 13-week window
Improvements:
extend to more weeks and/or more products
run a regression with week fixed effects (controls for seasonality)
incorporate price/discount variables to separate placement effect from price effect