Our current promotional strategies, while frequent, are not systematically quantified. We are spending time and money without a clear understanding of their impact on sales and profitability. This analysis aims to solve this problem by providing a data-driven evaluation of our promotional strategies.
This study will move us from a reactive to a strategic, data-driven approach by addressing these critical questions:
By answering these questions, we can transition our retail team from a broad, reactive approach to one that is strategic and data-driven, allowing us to optimize promotional strategies, increase overall sales revenue, and allocate marketing budgets more efficiently.
Our current approach to promotions is based on assumptions, not data. We can’t definitively say which promotions are working on our top revenue products and which are not. This analysis will change that by providing a clear understanding of our return on investment (ROI).
By implementing the insights from this study, we can:
Increase Revenue: Tailor promotions to these products and customer segments where they have the biggest impact, directly leading to higher sales.
Boost Profitability: Stop wasting resources on ineffective promotions and reallocate our marketing budget to strategies that deliver a proven return.
Empower Our Teams: Provide the retail and marketing teams with the specific data they need to make smarter, more profitable decisions.
Ultimately, this analysis is about ensuring that our promotions are a strategic driver of growth, not a shot in the dark. It will transform how we think about and execute promotions, leading to more efficient spending and a stronger bottom line.
Our methodology for this analysis is structured into a three-part approach to provide a comprehensive view of promotional effectiveness.
Phase 1: Product and Departmental Analysis
First, we will conduct a comparative analysis of sales performance during promotion and non-promotion periods for each product. Products will be grouped by department to identify which departments are most significantly affected by promotions. This will serve as the foundation for the subsequent phases of the analysis, directing our focus to the areas with the highest potential for growth.
Tools that will be used in this phase: Quasi-Experimental Design, Causal Inference, Correlation, Outlier Detection, Quadrant Analysis & Metrics of Interest
Phase 2: Customer Type Analysis
Following an initial focus on the top five products that exhibit the strongest response to promotions, we will conduct a more in-depth analysis of customer behavior. This will involve segmenting customer data by key demographic factors, such as marital status and income level. The primary objective is to identify which customer segments are most influenced by promotional activities, thereby providing actionable intelligence for developing highly targeted marketing campaigns.
Tools that will be used will be in this phase: Data Segmentation and Cohort Analysis
Phase 3: Promotion Type Analysis
In the final phase, we will examine the effectiveness of different promotion types within the selected high-impact products. By distinguishing between various promotional methods, we can observe their respective effects on these specific products. This will enable us to develop a “promotion blueprint” that prescribes the most effective type of promotion for these products, thereby maximizing the return on investment for future campaigns.
Tools that will be used will be in this phase: Experimental Design Analysis & Comparative Analysis
Numbers tell a compelling story. They will allow us to move beyond gut feelings and into data-driven decision-making. For the finance and marketing teams, this report provides a powerful new tool. It allows them to quantify the potential revenue from a new promotion before it even launches, turning a hopeful guess into a calculated business forecast. It’s about maximizing our return on investment and ensuring every dollar spent on marketing is working as hard as possible.
We’ve also uncovered our most valuable assets. The data pinpoints the specific products and customer segments that are most receptive to our promotions. This is our strategic advantage. Instead of a one-size-fits-all approach, we can now tailor our campaigns to the customers most likely to respond, leading to a much higher conversion rate.
This isn’t the end of the journey; it’s just the beginning. The insights within this report are the foundational blocks for all future analyses and strategic planning. We can now test new ideas, measure their impact with precision, and continuously refine our approach. We’ve replaced a guess with a compass, and the path to growth is now clearer than ever.
library(completejourney) # Load the Complete Journey dataset and associated functions
library(dplyr) # Load dplyr for data manipulation (filter, mutate, summarise, etc.)
library(tidyr) # Load tidyr for data tidying (pivot, separate, unite, etc.)
library(ggplot2) # Load ggplot2 for creating plots and visualizations
library(ggrepel) # Load ggrepel for improved text label placement in ggplot2
library(knitr) # Load knitr for dynamic report generation and R Markdown
library(gridExtra) # Load gridExtra for arranging multiple ggplot2 plots in a grid
library(DT) # Create interactive, scrollable, and searchable tables in HTML reports
library(tidyverse) # Load the tidyverse, a collection of packages for data science
library(plotly) # Load plotly for creating interactive and dynamic visualizations
library(kableExtra) # Enhance knitr tables with advanced formatting, styling, and HTML/LaTeX export options
promotions <- get_promotions()
transactions <- get_transactions()
products <- products
show_col_info <- function(...) {
data_list <- list(...)
names_list <- names(data_list)
# Capitalize first letter
capitalize_first <- function(x) {
paste0(toupper(substr(x, 1, 1)), substr(x, 2, nchar(x)))
}
# Aliases for display
aliases <- c(
Column = "Column",
Type = "Type",
NA_Count = "NA Count",
Unique_Values = "Unique Values",
Sample_Values = "Sample Values",
Category = "Category"
)
for (i in seq_along(data_list)) {
df <- data_list[[i]]
if (ncol(df) == 0) next
cat("\n### Dataset:", capitalize_first(names_list[i]), "\n\n")
col_class <- sapply(df, function(x) paste(class(x), collapse = "/"))
na_count <- sapply(df, function(x) sum(is.na(x)))
unique_count <- sapply(df, function(x) if (is.list(x)) NA else length(unique(x)))
sample_vals <- sapply(df, function(x) {
if (is.list(x)) "list-column" else paste(head(unique(x), 3), collapse = ", ")
})
info <- data.frame(
Column = names(df),
Type = col_class,
NA_Count = na_count,
Unique_Values = unique_count,
Sample_Values = sample_vals,
stringsAsFactors = FALSE
)
info <- info %>%
mutate(
Category = case_when(
grepl("id|ID|code|Code", Column) ~ "Potential ID",
Type %in% c("character", "factor") & Unique_Values < 20 ~ "Categorical",
Type %in% c("character", "factor") & Unique_Values >= 20 ~ "Text/High-cardinality",
Type %in% c("numeric", "integer") & Unique_Values < 20 ~ "Categorical Numeric",
Type %in% c("numeric", "integer") & Unique_Values >= 20 ~ "Continuous",
Type %in% c("Date", "POSIXct", "POSIXlt") ~ "Date/Time",
TRUE ~ "Other"
)
) %>%
select(Column, Type, NA_Count, Unique_Values, Sample_Values, Category)
# Print with aliases
print(
kable(info,
caption = paste("Column Info for", capitalize_first(names_list[i])),
row.names = FALSE,
col.names = aliases[names(info)]
) %>%
kable_styling(
full_width = FALSE,
position = "left",
bootstrap_options = c("striped", "hover", "condensed")
)
)
}
}
show_col_info(
promotions = promotions,
transactions = transactions,
products = products
)
Column | Type | NA Count | Unique Values | Sample Values | Category |
---|---|---|---|---|---|
product_id | character | 0 | 59800 | 1000050, 1000092, 1000106 | Potential ID |
store_id | character | 0 | 112 | 316, 337, 441 | Potential ID |
display_location | factor | 0 | 10 | 9, 3, 5 | Categorical |
mailer_location | factor | 0 | 11 | 0, A, D | Categorical |
week | integer | 0 | 53 | 1, 2, 3 | Continuous |
Column | Type | NA Count | Unique Values | Sample Values | Category |
---|---|---|---|---|---|
household_id | character | 0 | 2469 | 900, 1228, 906 | Potential ID |
store_id | character | 0 | 457 | 330, 406, 319 | Potential ID |
basket_id | character | 0 | 155848 | 31198570044, 31198570047, 31198655051 | Potential ID |
product_id | character | 0 | 68509 | 1095275, 9878513, 1041453 | Potential ID |
quantity | numeric | 0 | 8443 | 1, 2, 3 | Continuous |
sales_value | numeric | 0 | 5003 | 0.5, 0.99, 1.43 | Continuous |
retail_disc | numeric | 0 | 2054 | 0, 0.1, 0.15 | Continuous |
coupon_disc | numeric | 0 | 321 | 0, 0.55, 2 | Continuous |
coupon_match_disc | numeric | 0 | 66 | 0, 0.45, 0.25 | Continuous |
week | integer | 0 | 53 | 1, 2, 3 | Continuous |
transaction_timestamp | POSIXct/POSIXt | 0 | 155036 | 2017-01-01 06:53:26, 2017-01-01 07:10:28, 2017-01-01 07:26:30 | Other |
Column | Type | NA Count | Unique Values | Sample Values | Category |
---|---|---|---|---|---|
product_id | character | 0 | 92331 | 25671, 26081, 26093 | Potential ID |
manufacturer_id | character | 0 | 6471 | 2, 69, 16 | Potential ID |
department | character | 0 | 32 | GROCERY, MISCELLANEOUS, PASTRY | Text/High-cardinality |
brand | factor | 0 | 2 | National, Private | Categorical |
product_category | character | 540 | 304 | FRZN ICE, NA, BREAD | Text/High-cardinality |
product_type | character | 528 | 2379 | ICE - CRUSHED/CUBED, NA, BREAD:ITALIAN/FRENCH | Text/High-cardinality |
package_size | character | 30586 | 3706 | 22 LB, NA, 50 OZ | Text/High-cardinality |
The Impact of Promotions on Sales
Before we can target our efforts, we need to understand the fundamental question: Do promotions actually work? To answer this, we’ve conducted a side-by-side comparison of average sales during promotional weeks versus non-promotional weeks. We focused on two key metrics: the average weekly sales volume and the average weekly revenue.
Our analysis looked at every product, but we filtered the data to ensure our findings are reliable. We only included products that had sales in both promotional and non-promotional weeks, giving us a true basis for comparison. Additionally, we excluded products with an average of fewer than 10 units sold during non-promo weeks. These products have a low baseline, making them susceptible to outliers that could skew our results and obscure the trends we’re most interested in.
The data now tells us a clear story about the baseline performance of our products, setting the stage for a detailed comparison of how they respond when a promotion is introduced.
# Step 1: Weekly totals per product
weekly_sales <- transactions %>%
group_by(product_id, week) %>%
summarise(
weekly_qty = sum(quantity, na.rm = TRUE),
weekly_revenue = sum(sales_value, na.rm = TRUE),
.groups = "drop"
) %>%
left_join(products %>% select(product_id, department), by = "product_id")
# Step 2: Identify valid promo weeks
valid_promo_weeks <- promotions %>%
filter(!(display_location == 0 & mailer_location == 0)) %>%
distinct(product_id, week)
# Step 3: Mark promo vs non-promo
weekly_sales <- weekly_sales %>%
mutate(is_promo_week = paste(product_id, week) %in%
paste(valid_promo_weeks$product_id, valid_promo_weeks$week))
# Step 4: Compute averages per product
avg_by_prod <- weekly_sales %>%
group_by(department, product_id, is_promo_week) %>%
summarise(
avg_weekly_qty = mean(weekly_qty, na.rm = TRUE),
avg_weekly_revenue = mean(weekly_revenue, na.rm = TRUE),
.groups = "drop"
) %>%
pivot_wider(
names_from = is_promo_week,
values_from = c(avg_weekly_qty, avg_weekly_revenue),
names_sep = "_promo_"
) %>%
rename(
avg_nonpromo_weekly_qty = avg_weekly_qty_promo_FALSE,
avg_promo_weekly_qty = avg_weekly_qty_promo_TRUE,
avg_nonpromo_weekly_revenue = avg_weekly_revenue_promo_FALSE,
avg_promo_weekly_revenue = avg_weekly_revenue_promo_TRUE
) %>%
mutate(
promo_lift_qty = avg_promo_weekly_qty / avg_nonpromo_weekly_qty,
promo_lift_revenue = avg_promo_weekly_revenue / avg_nonpromo_weekly_revenue
) %>%
filter(!is.na(avg_promo_weekly_qty), !is.na(avg_nonpromo_weekly_qty))
# Filter products where the average weekly quantity in non-promo weeks is at least 10
avg_by_prod_filtered <- avg_by_prod %>%
filter(avg_nonpromo_weekly_qty >= 10)
# Prepare summarized data for stacked bar chart
plot_data_grouped <- avg_by_prod_filtered %>%
summarise(
qty_gt1 = mean(promo_lift_qty > 1),
qty_le1 = 1 - qty_gt1,
revenue_gt1 = mean(promo_lift_revenue > 1),
revenue_le1 = 1 - revenue_gt1,
both_gt1 = mean(promo_lift_qty > 1 & promo_lift_revenue > 1),
both_le1 = 1 - both_gt1
) %>%
pivot_longer(
everything(),
names_to = c("metric", "lift"),
names_sep = "_",
values_to = "proportion"
) %>%
mutate(
proportion = proportion * 100,
metric = case_when(
metric == "qty" ~ "Quantity",
metric == "revenue" ~ "Revenue",
metric == "both" ~ "Combined",
TRUE ~ metric
),
# Legend-friendly label
lift = factor(
lift,
levels = c("le1", "gt1"),
labels = c("No Positive Lift", "Positive Lift")
)
)
# Create stacked bar chart
ggplot(plot_data_grouped, aes(x = metric, y = proportion, fill = lift)) +
geom_col(position = "stack", width = 0.6) +
geom_text(
aes(label = paste0(round(proportion, 1), "%")),
position = position_stack(vjust = 0.5),
size = 4.2,
color = "white",
fontface = "bold"
) +
scale_fill_manual(values = c(
"No Positive Lift" = "#ff7f50",
"Positive Lift" = "#0047ab"
)) +
labs(
title = "Measuring the Impact of Promotions Across Products",
subtitle = "Percentage of Products with and without Positive Lift, by Metric",
x = "Performance Metric",
y = "Percentage of Products (%)",
fill = NULL # <- Remove legend title
) +
ylim(0, 100) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 13),
plot.subtitle = element_text(hjust = 0.5, face = "italic", size = 11),
axis.title.x = element_text(size = 10, face = "bold"),
axis.title.y = element_text(size = 10, face = "bold"),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
panel.grid = element_blank(),
legend.position = "bottom",
legend.box.margin = margin(t = 10)
)
When we put a product on promotion, the goal is simple: sell more and make more money. We looked at a group of 200 products that had at least 10 units in average weekly sales when they weren’t on promotion. For these products, we analyzed whether promotions successfully boosted sales quantity and revenue.
The results are encouraging:
71.1% of products saw an increase in average weekly sales quantity during their promotional weeks. This means the promotions are successfully getting more products out the door.
64.7% of products experienced an increase in average weekly sales revenue. So, not only are we selling more items, but they are also bringing in more money.
Most importantly, our data shows a strong combined effect. Approximately 64.2% of the products saw a positive lift in both sales quantity and revenue. This shows that for the vast majority of our products, our promotions are a win-win, driving both volume and value.
This data gives us a clear picture of what’s working and can help us refine our promotional strategies to improve these numbers even further.
plot_data_dept <- avg_by_prod_filtered %>%
group_by(department) %>%
summarise(
avg_nonpromo_weekly_qty = mean(avg_nonpromo_weekly_qty, na.rm = TRUE),
avg_promo_weekly_qty = mean(avg_promo_weekly_qty, na.rm = TRUE),
avg_nonpromo_weekly_revenue = mean(avg_nonpromo_weekly_revenue, na.rm = TRUE),
avg_promo_weekly_revenue = mean(avg_promo_weekly_revenue, na.rm = TRUE)
) %>%
pivot_longer(
cols = -department,
names_to = c("promo_status", "metric"),
names_pattern = "avg_(.*)_weekly_(.*)",
values_to = "avg_value"
) %>%
mutate(
promo_status = recode(promo_status,
"nonpromo" = "Non-Promo",
"promo" = "Promo"),
metric = recode(metric,
"qty" = "Sales Volume",
"revenue" = "Revenue"),
metric = factor(metric, levels = c("Sales Volume", "Revenue"))
)
dept_order <- plot_data_dept %>%
filter(metric == "Revenue") %>%
group_by(department) %>%
summarise(total_revenue = sum(avg_value, na.rm = TRUE)) %>%
arrange(total_revenue) %>%
pull(department)
plot_data_dept <- plot_data_dept %>%
mutate(department = factor(department, levels = dept_order))
# --- PLOT ---
ggplot(plot_data_dept, aes(x = department, y = avg_value, fill = promo_status)) +
geom_col(position = position_dodge(width = 0.8), width = 0.7) +
facet_wrap(~metric) +
geom_text(aes(label = round(avg_value, 1)),
position = position_dodge(width = 0.8),
hjust = -0.1, size = 3.5) +
scale_fill_manual(values = c("Non-Promo" = "#ff7f50", "Promo" = "#0047ab")) +
coord_flip() +
scale_y_continuous(expand = expansion(mult = c(0, 0.2))) +
labs(
title = "Average Weekly Sales Volume and Revenue by Department",
subtitle = "Comparison Between Promo and Non-Promo Weeks",
x = "Department",
y = "Average Weekly Value ($)",
fill = NULL
) +
theme_minimal(base_size = 13) +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 12),
plot.subtitle = element_text(hjust = 0.5, face = "italic", size = 12),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "white", color = NA),
strip.text = element_text(face = "bold", size = 12),
axis.text.y = element_text(size = 11),
legend.position = "bottom",
legend.box.margin = margin(t = 10)
)
It’s clear that promotions are working across most departments. For nearly every department, promotions significantly boost both the average quantity of products sold and the average weekly revenue.
In the Produce and Meat departments, promotions have a particularly strong impact. For Produce, promotions more than double the average weekly revenue from $49.80 to $71.30 and also result in a noticeable increase in the quantity of items sold. The Meat department sees an even more dramatic effect, with promotions nearly tripling the average weekly revenue from $73.60 to a staggering $179.00.
Conversely, promotional results are mixed for the Miscellaneous and Meat-Pckgd departments. For miscellaneous items, promotions lead to a slight decrease in both quantity sold and revenue. Similarly, the Meat-Pckgd department experiences a drop in revenue, despite a slight increase in the quantity of units sold.
The data indicates that promotions successfully drive both sales volume and revenue across most departments. However, a few specific areas may require a different strategic approach to fully capitalize on promotional efforts. Given the strong performance observed in the Produce and Meat departments, it is highly probable that our top-performing products under promotion are concentrated in these two areas.
The Connection Between Sales Volume and Revenue
By analyzing promotions across our product line, we can clearly see a strong, positive relationship between a product’s sales volume lift and its revenue lift. As the scatter plot and regression line show, when a promotion successfully increases the average weekly sales volume for a product, it almost always leads to a corresponding increase in its average weekly sales revenue. This tells us that our promotions are generally effective at driving both more sales and more income.
# 1️ Create descriptive product labels
products_with_label <- products %>%
mutate(
product_label = paste(
brand,
coalesce(product_category, ""),
coalesce(product_type, ""),
coalesce(package_size, ""),
sep = " - "
)
)
# 2️ Select top 5 products by promo lift quantity
top5_prod <- avg_by_prod_filtered %>%
arrange(desc(promo_lift_qty)) %>%
slice_head(n = 5) %>%
left_join(products_with_label %>% select(product_id, product_label),
by = "product_id") %>%
mutate(label_text = product_label)
# 3️ Create ggplot
p <- ggplot(avg_by_prod_filtered, aes(x = promo_lift_qty, y = promo_lift_revenue)) +
geom_point(aes(text = paste0("Product: ", product_id)),
color = "#333333", size = 2, alpha = 0.5,
position = position_jitter(width = 0.1, height = 0.1)) +
geom_smooth(method = "lm", se = FALSE, color = "#ff7f50", linewidth = 1) +
geom_hline(yintercept = 1, linetype = "dashed", color = "gray50") +
geom_vline(xintercept = 1, linetype = "dashed", color = "gray50") +
geom_point(data = top5_prod, aes(x = promo_lift_qty, y = promo_lift_revenue,
text = paste0("Product: ", label_text)),
color = "#0047ab", size = 3, shape = 21, stroke = 1.5, fill = "#0047ab") +
geom_text_repel(
data = top5_prod,
aes(x = promo_lift_qty, y = promo_lift_revenue, label = label_text),
size = 3,
color = "#0047ab",
fontface = "bold",
nudge_x = -0.25,
nudge_y = 0.2,
segment.color = "gray70",
segment.size = 0.5
) +
labs(
x = "Promo Lift - Quantity",
y = "Promo Lift - Revenue"
) +
theme_minimal(base_size = 10) +
theme(
axis.title.x = element_text(size = 10, face = "bold"),
axis.title.y = element_text(size = 10, face = "bold"),
axis.text.x = element_text(size = 10),
axis.text.y = element_text(size = 10),
panel.grid = element_blank()
)
# 4️ Convert to interactive plot
ggplotly(p, tooltip = "text") %>%
layout(
title = list(
text = "Promotional Impact on Products<br><sup>Sales quantity vs. revenue lift across all products</sup>",
x = 0.5,
y = 0.95,
xanchor = "center",
yanchor = "top"
),
xaxis = list(title = "Promo Lift - Quantity"),
yaxis = list(title = "Promo Lift - Revenue")
) %>%
config(displayModeBar = FALSE)
Top Performers: The Big Winners
While all promotions tend to increase both volume and revenue, some products stand out as exceptional performers. We’ve highlighted the Top 5 Products that saw the most significant revenue increases. These aren’t just selling more; they’re also generating a disproportionately high amount of revenue.
The top performer, with the largest revenue increase, is Beef Lean. This item saw an impressive 5.8x increase in average weekly sales volume, which translated to a 4.4x increase in average weekly sales revenue.
The other high-performing products are Berries - Strawberries, Apples, Citrus - Tangerines & Tangelos, and Beef - Primal.
By focusing on products that achieve this kind of promotional lift, we can maximize our sales and profit.
Understanding Our Demographics
Before this report, our marketing strategy was a “one-size-fits-all” model, casting a wide net to reach every potential customer. We ran campaigns that offered the same promotions to everyone, from a young couple living downtown to a large family in the suburbs. While this approach generated some sales, our return on investment was inconsistent, and we couldn’t pinpoint why some campaigns succeeded while others fell flat. It was like throwing a dart with our eyes closed, hoping to hit the bullseye.
Our data in this analysis, however, tells a different story. It reveals that our most engaged customers are not a monolithic group, but rather a collection of distinct communities, each with its own unique needs and motivations.
Regork isn’t just about selling products; it’s about building a connection. By listening to the story our data tells, we can now see that a premium offer on a new product resonates deeply with high-income households, while a digital coupon for daily essentials speaks directly to the needs of a different segment. This personalized approach not only drives immediate sales by targeting the right customers with the right message, but also builds lasting loyalty and trust. This report’s goal is to shift from a broad, wasteful approach to a precise, data-driven strategy, proving that understanding the “who” behind the numbers is the key to maximizing both short-term profit and long-term customer value.
plot_avg_weekly_spend_facet <- function(household_cols = c("household_size", "income"),
prod_filter = c(),
promo_colors = c("TRUE"="#0047ab", "FALSE"="#ff7f50"),
ncol_facets = 2) { # number of columns in facet wrap
# Define column names (aliases)
col_alias <- c(
household_size = "Household Size",
income = "Income",
age = "Household Age",
kids_count = "Number of Children",
household_comp = "Household Comp.",
household_age = "Household Age",
marital_status = "Marital Status"
)
# Merge transactions with products and filter top products
top_prod <- transactions %>%
left_join(products %>% select(product_id, department), by = "product_id") %>%
filter(product_id %in% prod_filter)
# Loop over household features and combine
all_data <- lapply(household_cols, function(household_col) {
friendly_label <- ifelse(household_col %in% names(col_alias),
col_alias[[household_col]],
household_col)
trans_demo <- top_prod %>%
left_join(demographics %>% select(household_id, !!sym(household_col)),
by = "household_id") %>%
rename(household_feature = !!sym(household_col)) %>%
mutate(
household_feature = as.character(household_feature),
is_promo = paste(product_id, week) %in%
paste(valid_promo_weeks$product_id, valid_promo_weeks$week),
household_col_name = friendly_label
) %>%
filter(!is.na(household_feature))
trans_demo %>%
group_by(household_id, household_feature, week, is_promo, household_col_name) %>%
summarise(total_spent = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
group_by(household_feature, is_promo, household_col_name) %>%
summarise(avg_weekly_spent = mean(total_spent, na.rm = TRUE), .groups = "drop") %>%
group_by(household_feature, household_col_name) %>%
mutate(spend_pct = avg_weekly_spent / sum(avg_weekly_spent) * 100) %>%
ungroup()
}) %>% bind_rows()
# Plot with facets
ggplot(all_data, aes(x = household_feature, y = spend_pct, fill = factor(is_promo))) +
geom_col(width = 0.5, color = "white") +
geom_text(aes(label = paste0(round(spend_pct, 1), "%")),
position = position_stack(vjust = 0.5), size = 2.5, color = "white") +
scale_fill_manual(values = promo_colors, labels = c("Non-Promo", "Promo")) +
labs(
title = "Average Weekly Spending by Household Feature for Top 5 Products",
subtitle = "Comparison between promotional and non-promotional purchases",
x = NULL,
y = "Percentage of Average Spend (%)",
fill = NULL
) +
coord_flip() +
theme_minimal(base_size = 10) +
theme(
panel.grid = element_blank(),
plot.background = element_rect(fill = "white", color = NA),
plot.title = element_text(hjust = 0.5, face = "bold", size = 12),
plot.subtitle = element_text(hjust = 0.5, face = "italic", size = 9),
axis.text.x = element_text(size = 7),
axis.text.y = element_text(size = 7),
legend.position = "bottom",
legend.box.margin = margin(t = 5),
strip.text = element_text(size = 8, face = "bold")
) +
facet_wrap(~household_col_name, scales = "free_y", ncol = ncol_facets)
}
# --------------------------
# Define top products
# --------------------------
top_prodIDs <- c(1044078, 1135476, 1069415, 867519, 844179)
Our Key Demographics:
Under $15K Income Bracket
For households earning under $15K annually, promotional spending reaches
the highest level at 56.1%. This value-driven group
relies heavily on promotions as a central part of their purchasing
decisions. Marketing campaigns for this segment should emphasize
discounts and savings on essential goods to appeal to their
price-conscious nature.
$250K+ Income Bracket
Conversely, households earning over $250K per year also show strong
engagement with promotional spending, allocating 54.9%
of their budgets to such products. This high-income segment presents a
different opportunity. Effective campaigns should focus on premium
products offered at reduced prices, limited-time promotions, or
“buy-more-save-more” deals that appeal to their pursuit of value without
compromising on quality.
# --------------------------
# Facet plot 1
# --------------------------
facet_plot_1 <- c("household_size", "income")
plot_avg_weekly_spend_facet(
household_cols = facet_plot_1,
prod_filter = top_prodIDs
)
# --------------------------
# Facet plot 2
# --------------------------
facet_plot_2 <- c("age", "kids_count")
plot_avg_weekly_spend_facet(
household_cols = facet_plot_2,
prod_filter = top_prodIDs
)
Households with 5+ People
Large households with five or more members dedicate 53%
of their spending to promotional products, likely reflecting their need
to manage larger family budgets. Marketing strategies targeting this
group should highlight bulk-purchase deals, multi-item discounts, and
family-sized offers to meet their consumption demands
efficiently.
2 Adults with No Kids
Meanwhile, households consisting of two adults with no children also
display high responsiveness to promotions, allocating
54% of their spending in this category. This segment
provides a clearly defined target for campaigns focusing on
discretionary or premium goods. Their spending patterns suggest a
preference for products that deliver value while enhancing lifestyle
quality, making them ideal candidates for promotions centered around
non-essential but appealing items.
# --------------------------
# Facet plot 3
# --------------------------
facet_plot_3 <- c("household_comp", "marital_status")
plot_avg_weekly_spend_facet(
household_cols = facet_plot_3,
prod_filter = top_prodIDs
)
By concentrating on these four distinct segments; Seniors (65+), High and Low Income Earners, Large Families, and Couples without Kids; we can create highly focused and effective marketing campaigns that maximize ROI. Each group’s motivation for engaging with promotions is different, allowing for tailored messaging and product selection of our Top 5 products that can enhance overall promotional impact.
Optimizing Promotional Strategy for Top 5 Products
In this final section, we will examine the effectiveness of different promotion types within the selected high-impact products. Distinguishing between various promotional methods allows us to see their unique effects on our products.
The effectiveness of a promotion is not just about the product; it’s also about the target audience. A mailer coupon may resonate with price-sensitive, budget-conscious households, while a “buy-one-get-one-free” digital offer might be more effective with younger, tech-savvy consumers. The ultimate goal is to connect the right product with the right promotion for the right customer. By analyzing sales data against demographic information, we can tailor our strategies to specific customer segments, significantly increasing the probability of a sale and the overall return on investment. This tailored approach ensures marketing spend is focused on the channels and methods that provide the most significant uplift.
# --------------------------
# Step 1: Still Focus on Top Performed Products
# --------------------------
top_prod <- transactions %>%
left_join(products %>% select(product_id, department), by = "product_id") %>%
filter(product_id %in% top_prodIDs)
# --------------------------
# Step 2: Classify promotion types in the promotions table
# --------------------------
promotions_labeled <- promotions %>%
mutate(
display_promo_type = case_when(
display_location == '0' ~ "No display",
display_location == '1' ~ "Store front",
display_location == '2' ~ "Store rear",
display_location == '3' ~ "Front end cap",
display_location == '4' ~ "Mid-aisle end cap",
display_location == '5' ~ "Rear end cap",
display_location == '6' ~ "Side aisle end cap",
display_location == '7' ~ "In-aisle",
display_location == '9' ~ "Secondary location display",
display_location == 'A' ~ "In-shelf",
TRUE ~ NA_character_
),
mail_promo_type = case_when(
mailer_location == '0' ~ "Not on ad",
mailer_location == 'A' ~ "Interior page feature",
mailer_location == 'C' ~ "Interior page line item",
mailer_location == 'D' ~ "Front page feature",
mailer_location == 'F' ~ "Back page feature",
mailer_location == 'H' ~ "Wrap front feature",
mailer_location == 'J' ~ "Wrap interior coupon",
mailer_location == 'L' ~ "Wrap back feature",
mailer_location == 'P' ~ "Interior page coupon",
mailer_location == 'X' ~ "Free on interior page",
mailer_location == 'Z' ~ "Free on front/back/wrap page",
TRUE ~ NA_character_
)
) %>%
select(product_id, week, store_id, display_promo_type, mail_promo_type)
# --------------------------
# Step 3: Join promotion type info to transactions
# --------------------------
produce_trans_promo <- top_prod %>%
left_join(promotions_labeled, by = c("product_id", "week", "store_id")) %>%
mutate(mail_promo_type = ifelse(is.na(mail_promo_type), "Not on ad", mail_promo_type),
display_promo_type = ifelse(is.na(display_promo_type), "No display", display_promo_type))
# --------------------------
# Step 4: Aggregate by product_id, week, and promo_type
# Calculate total sales_value per product per week under each promo type
# --------------------------
weekly_spend_by_promo <- produce_trans_promo %>%
group_by(product_id, week, store_id, mail_promo_type, display_promo_type) %>%
summarise(
total_spent = sum(sales_value, na.rm = TRUE),
.groups = "drop"
)
# --------------------------
# Step 5: Calculate mean sales per product and promo type
# --------------------------
avg_sales_by_promo <- weekly_spend_by_promo %>%
group_by(product_id, mail_promo_type, display_promo_type) %>%
summarise(
mean_sales = mean(total_spent, na.rm = TRUE),
.groups = "drop"
)
avg_sales_by_promo <- avg_sales_by_promo %>%
mutate(
promo_type = case_when(
mail_promo_type == "Not on ad" & display_promo_type == "No display" ~ "No Promo",
mail_promo_type == "Not on ad" & display_promo_type != "No display" ~ paste0("DISPLAY: ", display_promo_type),
mail_promo_type != "Not on ad" & display_promo_type == "No display" ~ paste0("MAIL: ", mail_promo_type),
mail_promo_type != "Not on ad" & display_promo_type != "No display" ~ "BOTH",
TRUE ~ NA_character_
)
)%>%
select(-mail_promo_type, -display_promo_type)
# --------------------------
# Step 6: Create product alias mapping
# --------------------------
# Create a lookup table for product aliases
product_aliases <- products %>%
mutate(
product_full = paste(
brand,
coalesce(product_category, ""),
coalesce(product_type, ""),
coalesce(package_size, ""),
sep = " - "
),
product_label = case_when(
product_full == "National - BEEF - LEAN - " ~ "Lean Beef",
product_full == "National - APPLES - APPLES GOLD DELICIOUS (BULK&BA - " ~ "Apples",
product_full == "National - BERRIES - STRAWBERRIES - 1 LB" ~ "Strawberries",
product_full == "National - BEEF - PRIMAL - " ~ "Primal Beef",
product_full == "National - CITRUS - TANGERINES & TANGELOS - " ~ "Tangerines & Tangelos",
TRUE ~ product_full # Keep original if no alias defined
)
) %>%
select(product_id, product_label)
# --------------------------
# Step 7: Pivot to wide format
# --------------------------
avg_sales_wide <- avg_sales_by_promo %>%
tidyr::pivot_wider(
names_from = promo_type,
values_from = mean_sales
) %>%
# Join with product aliases
left_join(product_aliases, by = "product_id") %>%
# Move product_label to the front and remove product_id
select(product_label, everything(), -product_id)
# --------------------------
# Step 8: Rename columns using alias names
# --------------------------
avg_sales_wide <- avg_sales_wide %>%
rename(
"Product" = product_label,
"Mail: Front Page Feature" = "MAIL: Front page feature",
"Mail: Interior Page Feature" = "MAIL: Interior page feature",
"No Promo" = "No Promo",
"Mail: Wrap Front Feature" = "MAIL: Wrap front feature",
"Mail: Back Page Feature" = "MAIL: Back page feature",
"Display: In-Aisle" = "DISPLAY: In-aisle"
)
# --------------------------
# Step 9: Display table
# --------------------------
DT::datatable(
avg_sales_wide,
options = list(
scrollX = TRUE,
paging = FALSE,
searching = FALSE,
info = FALSE,
ordering = TRUE,
autoWidth = TRUE,
columnDefs = list(list(className = 'dt-center', targets = "_all"))
),
caption = "Average Sales by Promotion"
) %>%
DT::formatRound(columns = 2:ncol(avg_sales_wide), digits = 2)
The analysis of promotion effectiveness across our high-impact products reveals two distinct performance narratives, underscoring the need for a segmented promotional strategy.
Mailer promotions are exceptionally effective for high-value staple items, demonstrating a significant sales lift for lean and primal beef cuts compared to the “No Promo” baseline. Specifically, the data from Table 1 shows that average sales for National Lean Beef rose to $8.69 with an “Interior Page Feature” and $8.21 with a “Wrap Front Feature,” considerably higher than the $6.03 generated with no promotion. This performance confirms that targeted mailer placements are powerful drivers of consumer response for these items. This insight presents a clear opportunity to maximize sales among high-value demographic groups, such as high-income households, by strategically leveraging these top-performing mailer placements.
Conversely, the current promotional mix is consistently ineffective for impulse-driven produce, indicating a fundamental disconnect between the promotional strategy and customer behavior. The data for strawberries shows that the Front Page Feature and In-Aisle display promotions resulted in lower average sales than the “No Promo” baseline of $5.61. Similarly, both apples and citrus exhibited minimal engagement, with sales remaining low across all tracked promotions. This challenges the rule of thumb that perishable goods would benefit from immediate in-store displays, demonstrating that the promotional strategy itself may not be the primary driver of purchase for these segments.
While this analysis provides valuable insights, it’s important to acknowledge its limitations and consider opportunities for future improvement. The primary limitation is the limited data scope; our findings are based solely on the provided dataset and do not account for external factors. To create a more comprehensive view, future analysis should incorporate outside data, such as public economic indicators or competitor promotions. Furthermore, while we’ve identified a strong correlation between promotions and sales, we have not established a direct causal link. Other uncaptured variables, like simultaneous marketing campaigns, could be influencing the results. A more sophisticated approach, such as a controlled A/B test, would be needed to isolate the true impact of promotions.
Another area for enhancement is customer segmentation. The current report’s general trends could be made more precise by analyzing different customer groups. Applying clustering algorithms, like K-Means, could help to identify distinct customer segments based on their purchasing habits and demographics. This deeper understanding would enable the development of more targeted and effective promotional strategies, moving beyond a one-size-fits-all approach to better meet the needs of specific customer groups.
Finally, while focusing promotions on the top five revenue-producing products is a sound, low-risk strategy, it presents potential downsides. It could lead to missed opportunities by neglecting other high-potential products, and it risks market saturation as customers may become accustomed to frequent discounts on the same items, potentially devaluing the brand. Moving forward, a more balanced strategy is recommended. It should continue to capitalize on the success of the top performers while also allocating a portion of the promotional budget to test and develop other products. This approach would help uncover future revenue streams and maintain a fresh, engaging brand perception.
Our analysis has transformed our understanding of how promotions influence sales performance, customer behavior, and profitability. The data clearly shows that promotions are a powerful growth driver, with over 70% of products experiencing increased sales volume and nearly 65% seeing a revenue lift during promotional periods. Departments like Produce and Meat were particularly successful, confirming that well-structured promotions in these areas deliver substantial returns.
By focusing closely on our Top 5 revenue-generating products, we were able to identify which items respond most strongly to promotional activity and which benefit less. This product-level insight highlights where future promotional investments will have the greatest payoff; allowing us to prioritize high-impact items and refine our promotional mix for maximum efficiency.
Our findings also revealed that success depends heavily on the target customer. Distinct, high-potential segments; such as seniors, high- and low-income households, large families, and couples without children; respond differently to promotional messaging. Understanding these differences enables us to create more personalized, data-driven marketing campaigns that resonate with each group.
This report marks a strategic shift from intuition-based promotions to precision marketing rooted in analytics. By evaluating the effectiveness of different promotion types and their impact on our Top 5 products, we now have a clear roadmap for optimizing future campaigns. This data-driven framework empowers our teams to make smarter, more strategic decisions, matching the right product and promotion to the right customer segment. The result is a system that not only maximizes revenue and return on investment but also establishes a sustainable competitive advantage through customer-centric promotional strategies.