Introduction

Business Problem: Endcap displays are a well-established retail tool. It is widely accepted that prominent product placement drives sales. But does endcap placement actually work for every product? Regork may be allocating premium display space to categories that would sell just as well on a regular shelf, representing a missed opportunity to use that space more effectively.

The Key Question The ultimate goal of this project was to characterize endcap performance to identify products that are low performers, to see if premium endcap space could be better utilized.

Methodology: To answer this, we utilized the completejourney dataset, joining the transactions, promotions, and products tables. We isolated products placed on endcaps and analyzed their performance across a reproducible sample of 100 stores. Our methodology focused on calculating “Sales Lift” (comparing a product’s average weekly sales when featured on an endcap versus standard shelving) as well as analyzing average revenue per transaction and overall exposure time, measured in “store-weeks.” A “store-week” represents the total count of unique stores + week combinations, which will be explained in more detail below.

Proposed Solution: By identifying which categories consistently respond to endcap placement and which ones don’t, we can give Regork a data-driven basis for reallocating display space away from non-responsive categories toward proven performers.

Packages and Libraries

The following packages are utilized in this analysis:

# Libraries
library(completejourney)
library(tidyverse)
library(stringr)
library(plotly)
library(DT)
# Get full datasets
transactions <- get_transactions()
promotions <- get_promotions()

Data Preparation

This first step included narrowing down the dataset to a samples of 100 stores due to CPU processing limitations. In this step, we also combined any endcap location into one single variable.

endcap_codes <- c("3", "4", "5", "6")

# Sample 100 stores due to processing constraints
set.seed(20260222) #So that the same 100 stores are chosen reproducibly each time the
#analysis is performed

sample_stores <- promotions %>%
  distinct(store_id) %>%
  sample_n(100)

transactions_100 <- transactions %>%
  filter(store_id %in% sample_stores$store_id)

df <- transactions_100 %>%
  left_join(products, by = "product_id") %>%
  left_join(promotions, by = c("product_id", "week", "store_id"), 
             relationship = "many-to-many") %>% 
  mutate(display_group = ifelse(display_location %in% endcap_codes, "endcap", "no_endcap"))

Exploratory Analysis of Endcap Allocation by Department

To explore this business question, the first exploratory analysis was to identify which department(s) got the most time on endcaps, measured in a metric called “store-weeks.” As mentioned in the introduction, a store-week is calculated as the total count of unique stores + week combinations for a given category. For example, if a product is displayed on an endcap for just one week at each of the 100 stores, this would equal 100 store-weeks.

Additionally, we also looked to see which departments earned the highest revenue from endcap placement during the year, by looking at revenue per store-weeks.

#Question 1: What department(s) get the most endcap exposure timewise?

endcap_time_by_dept <- df %>%
  filter(display_group == "endcap") %>%
  group_by(department) %>%
  summarize(n_store_weeks = n_distinct(str_c(store_id, week, sep = "_"))) %>%
    arrange(desc(n_store_weeks))

ggplot(endcap_time_by_dept, aes(x = reorder(department, n_store_weeks), y = n_store_weeks)) +
  geom_col(fill = "#004990") + 
  coord_flip() +
  labs(title = "Endcap Allocation by Department",
       subtitle = "Total number of store-weeks across 100 stores",
       x = "Department",
       y = "Store-weeks on endcap") 

endcap_revenue_by_dept <- df %>%
   filter(display_group == "endcap", !is.na(department)) %>%
   group_by(department) %>%
   summarize(endcap_sales = sum(sales_value, na.rm = TRUE),
             n_store_weeks = n_distinct(str_c(store_id, week, sep = "_")),
             revenue_per_store_week = endcap_sales / n_store_weeks) %>%
     arrange(desc(revenue_per_store_week))

ggplot(endcap_revenue_by_dept, aes(x = reorder(department, revenue_per_store_week), 
                                   y = revenue_per_store_week)) +
  geom_col(fill = "#004990") + 
  coord_flip() +
  labs(title = "Endcap Revenue Efficiency by Department",
       subtitle = "Average revenue per store-week",
       x = "Department",
       y = "Average Revenue per Store Week ($)") 

From this preliminary analysis, we found that the Grocery and Drug & General Merchandise departments were given the most endcap time by a large margin. In line with that finding, Grocery also had the highest amount of average revenue from endcap placements. However, despite receiving the second most amount of time on endcaps, Drug & General Merchandising ranked 9th in revenue per store-week.

Since Grocery and Drug & General Merchandising dominate endcap space, we next want to focus on the individual product categories in these departments. We plotted two variables simultaneously to see how much exposure a product category is getting, and how much revenue it generates per transaction, shown on the interactive bubble plot below.

scatter_data <- df %>%
  filter(display_group == "endcap",
         department %in% c("GROCERY", "DRUG GM"),
         !is.na(product_category)) %>%
  group_by(product_category, department) %>%
  summarize(
    avg_revenue_per_transaction = mean(sales_value, na.rm = TRUE),
    total_revenue = sum(sales_value, na.rm = TRUE),
    n_transactions = n(),
    n_store_weeks = n_distinct(str_c(store_id, week, sep = "_")),
    .groups = "drop"
  ) %>%
  filter(n_transactions >= 20)

p_scatter <- ggplot(scatter_data, aes(x = n_store_weeks, 
                          y = avg_revenue_per_transaction,
                          color = department,
                          text = paste("Category:", product_category))) +
  geom_vline(xintercept = median(scatter_data$n_store_weeks), 
             linetype = "dashed", color = "gray50") +
  geom_hline(yintercept = median(scatter_data$avg_revenue_per_transaction), 
             linetype = "dashed", color = "gray50") +
  geom_point(aes(size = total_revenue), alpha = 0.7) +
  annotate("text", x = max(scatter_data$n_store_weeks) * 0.75, 
           y = max(scatter_data$avg_revenue_per_transaction) * 0.95,
           label = "High Revenue\nLow Exposure", 
           color = "darkgreen", size = 3, fontface = "bold") +
  annotate("text", x = max(scatter_data$n_store_weeks) * 0.75,
           y = min(scatter_data$avg_revenue_per_transaction) * 1.0,
           label = "Low Revenue\nHigh Exposure", 
           color = "red", size = 3, fontface = "bold") +
  scale_size_continuous(name = "Total Revenue ($)") +
  labs(
    title = "Endcap Exposure vs Revenue per Transaction",
    subtitle = "Bubble size represents total revenue\nHover over the bubbles to see the product labels",
    x = "Endcap Store-Weeks (Exposure)",
    y = "Avg Revenue per Transaction ($)",
    color = "Department"
  ) 

ggplotly(p_scatter, tooltip = c("text", "size", "x", "y"))

Hover over each point to explore individual product categories.

The bubble plot shows that many Drug & General Merchandise categories cluster in the upper left, including Infant Formula, Diapers, Adult Incontinence, and Vitamins, meaning they generate strong returns from less endcap time.

Grocery categories, meanwhile, spread mostly along the x-axis, meaning that the average revenue per transaction is lower. However, given more exposure time, frozen pizza, bag snacks, and crackers generate a lot of total revenue, as evidenced by the bigger bubble size.

To look more closely at the product category performance within each department, we next graphed boxplots showing the Top and Bottom 10 performers for Grocery and Drug & General Merchandising departments separately.

# Get top and bottom 10 categories

grocery_endcap_raw <- df %>%
  filter(display_group == "endcap", department == "GROCERY",
         !is.na(product_category)) %>%
  group_by(product_category) %>%
  mutate(n_transactions = n()) %>%
  filter(n_transactions >= 10) %>%
  ungroup()

grocery_order <- grocery_endcap_raw %>%
  group_by(product_category) %>%
  summarize(med = median(sales_value)) %>%
  arrange(desc(med))

top_bottom <- c(head(grocery_order$product_category, 10),
                tail(grocery_order$product_category, 10))

grocery_endcap_raw %>%
  filter(product_category %in% top_bottom) %>%
  mutate(group = ifelse(product_category %in% head(grocery_order$product_category, 10),
                        "Top 10", "Bottom 10")) %>%
  ggplot(aes(x = reorder(product_category, sales_value, median),
             y = sales_value,
             fill = group)) +
  geom_boxplot(alpha = 0.7) +
  scale_fill_manual(values = c("Bottom 10" = "tomato", "Top 10" = "#004990")) +
  coord_flip(ylim = c(0,25)) +
  labs(title = "Grocery Department: Best vs Worst Endcap Performers",
       subtitle = "Distribution of transaction values",
       x = "Product Category",
       y = "Sales Value per Transaction ($)",
       caption = "Note: Transactions above $25 excluded for readability",
       fill = "") +
  theme_minimal()

The chart above shows the gap between the top and bottom grocery performers on endcaps. Top categories like Dog Foods, Domestic Wine, and Beers/Ales generate higher median transaction, suggesting customers are willing to spend more on these items when they are on display. It should be noted that there is a wide spread, suggesting that performance varies a bit from store to store.

The bottom performers show a different story. Eggs, Dry Noodles/Pasta, and Powdered Drink Mixes show consistently low transaction values with very little variance. Endcap placement does not generate as much revenue for these products. This may reflect the fact that staple purchases will be sought out by customers regardless of placement, potentially suggesting that endcap placement may be better utilized by displaying other products.

# Get top and bottom 10 categories for DRUG GM

drug_endcap_raw <- df %>%
  filter(display_group == "endcap", department == "DRUG GM",
         !is.na(product_type)) %>%
  group_by(product_type) %>%
  mutate(n_transactions = n()) %>%
  filter(n_transactions >= 5) %>% # low threshold - note as limitation
  ungroup()

drug_order <- drug_endcap_raw %>%
  group_by(product_type) %>%
  summarize(med = median(sales_value)) %>%
  arrange(desc(med))

top_bottom_drug <- c(head(drug_order$product_type, 10),
                     tail(drug_order$product_type, 10))

drug_endcap_raw %>%
  filter(product_type %in% top_bottom_drug) %>%
  mutate(group = ifelse(product_type %in% head(drug_order$product_type, 10),
                        "Top 10", "Bottom 10")) %>%
  ggplot(aes(x = reorder(product_type, sales_value, median),
             y = sales_value,
             fill = group)) +
  geom_boxplot(alpha = 0.7) +
  scale_fill_manual(values = c("Bottom 10" = "tomato", "Top 10" = "#004990")) +
  coord_flip() +
  labs(title = "Drug & General Merchandise:\nBest vs Worst Endcap Performers",
       subtitle = "Distribution of transaction values",
       x = "Product Type",
       y = "Sales Value per Transaction ($)",
       caption = "Note: Products with a transactions threshold <5 were filtered out",
       fill = "") +
  theme_minimal()

Interestingly, the top performers for Drug & Genernal Merchandise are high-ticket necessity items, including Vitamins, Infant Formula, Adult Incontinence, and Baby Diapers. Endcap placement for these categories is beneficial.

The bottom performers are almost entirely small candy and convenience items, such as Candy Bars, Chewing Gum, Whitening Systems. These are low-value impulse purchases that generate very little revenue per transaction regardless of placement.

Sales Lift

The next question we wanted to answer was whether or not endcap placement actually made a difference towards sales of a product, compared to sales of a product during weeks when that product did not have endcap placement. We calculated “sales lift” as the percentage increase in average weekly sales during weeks when a category was featured on an endcap compared to weeks when it was not.

For example, if a category averaged $100 in weekly sales during non-endcap weeks and $130 during endcap weeks, that would represent a 30% sales lift. A category with zero or negative lift is essentially selling the same amount regardless of whether it gets premium display space, suggesting that that space could be better utilized by displaying other products.

We calculated lift for every category across all 100 stores, requiring at least 5 endcap weeks and 5 non-endcap weeks at a given store to ensure each comparison is meaningful. Only categories that met this threshold at 5 or more stores are included. The table below shows the median and average lifts for each qualifying category.

#Sales Lift Analysis 

all_results_full <- data.frame()
for(store in unique(df$store_id)) {
  temp <- df %>%
    filter(store_id == store, !is.na(product_category)) %>%
    group_by(week, product_category) %>%
    summarize(total_weekly_sales = sum(sales_value, na.rm = TRUE),
              had_endcap = any(display_group == "endcap"),
              .groups = "drop") %>%
    mutate(week_type = ifelse(had_endcap, "Endcap Week", "No Display Week"),
           store_id = store)
  all_results_full <- rbind(all_results_full, temp)
}

lift_by_store_full <- all_results_full %>%
  group_by(store_id, product_category, week_type) %>%
  summarize(avg_weekly_sales = mean(total_weekly_sales, na.rm = TRUE),
            n_weeks = n(),
            .groups = "drop") %>%
  pivot_wider(names_from = week_type,
              values_from = c(avg_weekly_sales, n_weeks)) %>%
  rename(endcap_sales = `avg_weekly_sales_Endcap Week`,
         no_display_sales = `avg_weekly_sales_No Display Week`,
         endcap_weeks = `n_weeks_Endcap Week`,
         no_display_weeks = `n_weeks_No Display Week`) %>%
  mutate(sales_lift_pct = (endcap_sales - no_display_sales) / 
                           no_display_sales * 100) %>%
  filter(!is.na(sales_lift_pct),
         endcap_weeks >= 5,
         no_display_weeks >= 5)

lift_full <- lift_by_store_full %>%
  group_by(product_category) %>%
  summarize(
    avg_lift = mean(sales_lift_pct),
    median_lift = median(sales_lift_pct),
    n_stores = n()
  ) %>%
  filter(n_stores >= 5) %>%
  arrange(desc(median_lift))

# We are selecting a few categories for visualization.
categories_to_plot <- c(
  "BAKING NEEDS", "FROZEN PIZZA", "SOUP", "CRACKERS/MISC BKD FD",
  "BEERS/ALES", "BAG SNACKS", "COLD CEREAL", "COOKIES/CONES",
  "FRZN BREAKFAST FOODS", "FLUID MILK PRODUCTS", "BAKED BREAD/BUNS/ROLLS",
  "EGGS", "SALD DRSNG/SNDWCH SPRD",
  "SOFT DRINKS", "SHORTENING/OIL"
)

lift_by_store_full %>%
  filter(product_category %in% categories_to_plot) %>%
  ggplot(aes(x = reorder(product_category, sales_lift_pct, median), 
             y = sales_lift_pct)) +
  geom_boxplot(fill = "#004990", alpha = 0.7) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  coord_flip() +
  labs(title = "Distribution of Endcap Sales Lift Across Stores",
       subtitle = "Selected categories illustrating range of performance",
       x = "Product Category",
       y = "Sales Lift (%)",
       caption = "Categories selected to represent strong and weak endcap performers from full results table above.") +
  theme_minimal()

lift_by_store_full %>%
  filter(product_category %in% lift_full$product_category) %>%
  group_by(product_category) %>%
  summarize(
    median_lift = median(sales_lift_pct),
    avg_lift = mean(sales_lift_pct),
    pct_stores_positive = mean(sales_lift_pct > 0) * 100,
    n_stores = n()
  ) %>%
  arrange(desc(median_lift)) %>%
  datatable(
    colnames = c("Product Category", "Median Lift (%)", 
                 "Avg Lift (%)", "% Stores Positive", "N Stores"),
    caption = "Endcap Sales Lift by Product Category (All Stores)",
    options = list(pageLength = 15)
  ) %>%
  formatRound(columns = c(2, 3, 4), digits = 1)

The full results table above contains all qualifying categories showing the range of endcap performance. The chart shows a visual representation to illustrate the spectrum of performance.

The top performers include Baking Needs (+157%!) Frozen Pie/Desserts, Frozen Pizza, Canned Vegetables, Wine, and Beers/Ales. These products show consistently positive lift, meaning that they reliably respond well to endcap placement across most stores.

The middle of the chart shows products like Soft Drinks, Cereal, and Salad Dressing show modest but positive median lift, suggesting that endcaps help, but not dramatically. These are not priority candidates but aren’t wasted space either.

The bottom of the chart shows Frozen Breakfast Foods and Shortening/Oil show median lift near or below zero across many stores, suggesting that customers buy these products regardless of where they are in the store or even that endcap placement may be counterproductive or wasteful for that category. These are good candidates for endcap removal.

Mailer Effect on Endcap Performance

The last question we wanted to explore was whether or not mailers contributed to revenue of products displayed on endcaps. The interactive chart below shows the effect of combining a mailer with an endcap versus endcap placement alone. Each point represents a product category, plotted by the change in average revenue and average quantity per transaction.

mailer_quantity <- df %>%
  filter(display_group == "endcap",
         department %in% c("GROCERY", "DRUG GM"),
         !is.na(product_category)) %>%
  mutate(mailer_flag = ifelse(!is.na(mailer_location) & mailer_location != "0",
                               TRUE, FALSE)) %>%
  group_by(product_category, department, mailer_flag) %>%
  summarize(
    avg_revenue = mean(sales_value, na.rm = TRUE),
    avg_quantity = mean(quantity, na.rm = TRUE),
    n_transactions = n(),
    .groups = "drop"
  ) %>%
  pivot_wider(names_from = mailer_flag,
              values_from = c(avg_revenue, avg_quantity, n_transactions)) %>%
  rename(
    avg_revenue_endcap_only = avg_revenue_FALSE,
    avg_revenue_endcap_mailer = avg_revenue_TRUE,
    avg_quantity_endcap_only = avg_quantity_FALSE,
    avg_quantity_endcap_mailer = avg_quantity_TRUE
  ) %>%
  filter(!is.na(avg_revenue_endcap_only) & !is.na(avg_revenue_endcap_mailer)) %>%
  mutate(
    quantity_effect = avg_quantity_endcap_mailer - avg_quantity_endcap_only,
    revenue_effect = avg_revenue_endcap_mailer - avg_revenue_endcap_only
  )

p <- ggplot(mailer_quantity, aes(x = revenue_effect, 
                                  y = quantity_effect,
                                  color = department,
                                  text = product_category)) +
  geom_point(alpha = 0.7) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  annotate("text", x = max(mailer_quantity$revenue_effect) * 0.7,
           y = max(mailer_quantity$quantity_effect) * 0.9,
           label = "More units\nMore revenue", color = "darkgreen", 
           size = 3, fontface = "bold") +
  annotate("text", x = min(mailer_quantity$revenue_effect) * 0.7,
           y = max(mailer_quantity$quantity_effect) * 0.9,
           label = "More units\nLess revenue", color = "#004990", 
           size = 3, fontface = "bold") +
  annotate("text", x = max(mailer_quantity$revenue_effect) * 0.7,
           y = min(mailer_quantity$quantity_effect) * 0.9,
           label = "Fewer units\nMore revenue", color = "darkorange", 
           size = 3, fontface = "bold") +
  annotate("text", x = min(mailer_quantity$revenue_effect) * 0.7,
           y = min(mailer_quantity$quantity_effect) * 0.9,
           label = "Fewer units\nLess revenue", color = "red", 
           size = 3, fontface = "bold") +
  labs(
    title = "Mailer Effect on Quantity vs Revenue When on Endcap",
    x = "Change in Avg Revenue per Transaction ($)",
    y = "Change in Avg Quantity per Transaction",
    color = "Department"
  ) +
  theme_minimal()

ggplotly(p, tooltip = "text")

Hover over each point to explore individual product categories.

The majority of categories cluster tightly around zero on both axes, meaning that for most products, adding a mailer doesn’t meaningfully change either how much customers buy or how much they spend. Endcap placement appears to be doing most of the heavy lifting on its own.

Overall, mailers appear to have limited and inconsistent impact when a product is already on an endcap, with a few category-specific exceptions. A more dedicated, thorough analysis of promotions would be needed to generate actionable insights.

Summary and Recommendations

Problem Statement Regork allocates premium endcap display space under the assumption that prominent product placement drives sales. This analysis challenged that assumption by asking if there are product categories that would sell just as well on a regular shelf, and is Regork therefore wasting valuable real estate?

Approach Using the completejourney dataset, we joined transaction, promotion, and product data across a reproducible sample of 100 stores. We looked to see where endcap space was being given, the revenue per store-week of products on display, and sales lift, to look at the percentage difference in average weekly sales during endcap weeks versus non-endcap weeks. Finally, we set the stage for a more in-depth mailer interaction analysis, which would assess whether combined promotions amplified endcap returns. This is a project that Regork could explore to further optimize endcap product placement.

Insights Uncovered:

Implications and Recommendations for Regork Management:

Limitations and Future Improvements: