Overview

Introduction

.
.

Regork is an important company that manages a variety of products. Their catalog is so big that the need to track multiple items simultaneously is pivotal. One distinct thing that Regork does is their coupons and promotions throughout the store. As analysts, we believe there should be a placement strategy for promotions. To better utilize their marketing budget, we decided to dive deep into the statistics and find the most profitable strategy, whereby Regork will be able to increase sales and decrease costs of marketing.

Business Problem
Which products deliver the highest profitability for Regork and where should their promotions be placed? Is there any opportunity to strategize promotion location for multiple products?

How we addressed the problem
We will utilize the transaction information to analyze how different product categories contributed to the company’s revenue. Then we will analyze their contribution in terms of promotion locations to see Where they are located and Which location yielded the highest revenue. Lastly, we will expand our research by contemplating other potential areas of growth. We will find complementary goods that show most potential. Accompanying this will be a recommendation of locational strategies.

Why our analysis brings values to Regork
We believe in the values of this analysis as well as the feasibility of expanding it for other areas of Regork.

Datasets and tables

For our analysis we used the following packages:
Tidyverse: for data manipulation, visualization, and analysis
completejouney: dataset package containing all Regork data
dplyr: to facilitate data manipulation like filtering, selecting and transforming data
stringr: to makes text manipulation easy
lubridate: to handle date and time data and calculating differences between dates
ggplot2: for data visualization
scales: to format and adjust scales in plots
DT: to creates interactive data tables in R markdown

# Loading the packages
library(DT)
library(tidyverse)
library(completejourney)
library(dplyr)
library(stringr)
library(lubridate)
library(ggplot2)
library(scales)

Due to our business problem, we employed 3 main data sets including:
- Transactions
- Promotions
- Products

# Getting the full datasets
transactions <- get_transactions()
promotions <- get_promotions()

Since we planned to investigate the impact of promotions location on total sales, we created a complete table joining ‘transactions’, ‘products’ and ‘promotions’ datasets. This way we could easily granulate our analysis in different levels of depth.

completedataset <- transactions %>%
  inner_join(products) %>%
  inner_join(promotions, by = c("product_id", "store_id", "week")) %>%
  mutate(month = month(transaction_timestamp, label = TRUE)) %>%
  mutate(display_location = as.character(display_location)) %>%
  mutate(display_label = case_when(
    display_location == 0 ~ "No Display",
    display_location == 1 ~ "Display In-store",
    display_location == 2 ~ "Store Rear",
    display_location == 3 ~ "Front End Cap",
    display_location == 4 ~ "Rear End Cap",
    display_location == 5 ~ "Mid-aisle End Cap",
    display_location == 6 ~ "Side-aisle End Cap",
    display_location == 7 ~ "In Aisle",
    display_location == 9 ~ "Secondary Location",
    display_location == "A" ~ "In Shelf",
    TRUE ~ "Unknown"
  ))

datatable(head(completedataset), options = list(pageLength = 5, scrollX = TRUE))

Lastly, we performed a Pareto table in our annual sales to narrow down our investigation and pick the product categories that had the greatest impact on sales for the store as a whole.

# Create a data set with the total yearly sales (in dollars) grouped by Product Category
paretoyearly_sales <-  completedataset %>%
  filter(display_label != "No Display") %>%
  group_by(product_category) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  mutate(cumulative_sales = cumsum(total_sales), 
         cumulative_percent = cumulative_sales / sum(total_sales) * 100)

datatable(head(paretoyearly_sales), options = list(pageLength = 5, scrollX = TRUE))

Data Analysis

Initial Analysis

To create valuable insights for Regork, we performed a Pareto analysis to determine the top product categories that accounted for the highest sale annually. To our surprise, Soft Drinks was the most sold category, far surpassing other products considered, making approximately 50% of the total revenue. Upon seeing these results, we decided to move forwards with analyzing Soft Drink, since it was the greatest area of revenue for Regork.

ggplot(paretoyearly_sales, aes(x = reorder(product_category, -total_sales), y = total_sales)) + 
  geom_bar(stat = "identity", fill = "steelblue") +  # Bars for total sales 
  geom_line(aes(y = cumulative_percent * max(total_sales) / 100, group = 1), color = "red", size = 1) +  # Cumulative % 
  geom_point(aes(y = cumulative_percent * max(total_sales) / 100), color = "red", size = 2) +  # Points on line 
  scale_y_continuous(sec.axis = sec_axis(~ . / max(paretoyearly_sales$total_sales) * 100, name = "Cumulative Percentage")) +  # Secondary Y-axis 
  labs(title = "Pareto Analysis of Product Sales", 
       x = "Product Category", y = "Total Revenue") +
  scale_y_continuous(labels = scales::dollar) +
  theme(axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        plot.title = element_text(size = 13, face = "bold"),
        plot.subtitle = element_text(size = 9, face = "italic"))

Promotions Locations

Our second step was to inspect the financial influence of transactions conducted during which products and promotions were displayed versus not displayed. The reason was to see if displaying promotions yielded the results expected, since we had the hypothesis that product displayed had higher sales.

The analysis below confirmed our initial thoughts. Transactions of no display make up only 22.9% of the total sales. While products displayed contributed to the 77.1% of the whole revenue.

display_option <- completedataset %>%
  mutate(display_option = case_when(
    display_label == "No Display" ~ "No Display",
    display_label != "No Display" ~ "Displayed",
    TRUE ~ "Unknown"
  )) %>%
  group_by(product_id) %>%
  summarise(has_displayed = any(display_option == "Displayed"), 
            has_undisplayed = all(display_option == "No Display"),
            total_sales = sum(sales_value)) %>%
  mutate(category = ifelse(has_displayed, "Promoted", "No Promotion at all")) %>%
  group_by(category) %>%
  summarise(total_sales = sum(total_sales)) %>%
  arrange(desc(total_sales))

display_option <- display_option %>%
  mutate(percentage = total_sales / sum(total_sales) * 100)  # Calculate percentage

ggplot(display_option, aes(x = "", y = total_sales, fill = category)) +
  geom_bar(stat = "identity", width = 1) +  # Creates a bar chart
  coord_polar(theta = "y") +  # Converts it into a pie chart
  labs(title = "Sales Distribution by Display Option",
       subtitle = "Regork seemingly sold less products in which they invested 
no in-store marketing strategies at all.") +
  geom_text(aes(label = paste0(round(percentage, 1), "%")), color = "white", 
            position = position_stack(vjust = 0.5), size = 6) + 
  theme(axis.text.y = element_blank(),
        axis.text.x = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        plot.title = element_text(size = 13, face = "bold"),
        plot.subtitle = element_text(size = 9, face = "italic"))


After filtering out the no-displays, we looked at the total annual sales by promotion locations. Our results showed that the location IN AISLE rendered the highest annual sales, followed by Store Rear and Secondary Location. Our next step was to look at the relationship between Soft Drinks and the ‘In Aisle’ location.

display_locations <- completedataset %>%
  filter(display_label != "No Display") %>%
  group_by(display_label) %>%
  summarise(total_sales = round(sum(sales_value), 0)) %>%
  arrange(desc(total_sales)) 

ggplot(display_locations, aes(x = reorder(display_label, total_sales), y = total_sales, fill = display_label)) +
  geom_col()  +
  geom_text(aes(label = scales::dollar(total_sales)), hjust = 1.2, color = "white", size = 3) +
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Total Sales by Promotion Location",
       subtitle = "Products promoted in-aisle accounted for the most sales value.",
       x = "Location",
       y = "Total Sales") +
  coord_flip() +
  theme(legend.position = "none", 
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        plot.title = element_text(face = "bold", size = 13),
        plot.subtitle = element_text(size = 9, face = "italic"))

Products & Locations

After analyzing the highest contribution from a product category and the highest sales, we decided to analyze the relationship of soft drink and ‘In Aisle’ location to determine a better promotion mapping. The result showed that Soft Drinks was in fact the main contributor to sales of promotions displayed in aisle. However, we want to also see if ‘In Aisle’ location is the main reason for the high sales of Soft Drink. Following this will be the analysis.

in_aisle <- completedataset %>%
  group_by(product_category) %>%
  filter(display_label == "In Aisle")

top_10_in_aisle <- completedataset %>%
  filter(display_label == "In Aisle") %>%
  group_by(product_category) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  mutate(cumulative_sales = cumsum(total_sales), 
         cumulative_percent = cumulative_sales / sum(total_sales) * 100) %>%
  slice_max(order_by = total_sales, n = 10)


# Plot for Top 10 Products with In-aisle Promotions
ggplot(top_10_in_aisle, aes(x = reorder(product_category, -total_sales), y = total_sales, fill = product_category)) +
  geom_col() +  # Bars for total sales
  scale_y_continuous(
    name = "Total Sales",
    labels = scales::dollar_format(prefix = "$")
  ) +
  labs(title = "Top 10 Products with In-aisle Promotions in Total Sales",
       subtitle = "A breakdown of products promoted specifically in aisle.\nWe are interested in Soft Drinks, which observed the highest sales.",
       x = "Product Category", y = "Total Sales") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10)) +
  theme(legend.position = "none", 
        axis.title.x = element_text(face = "bold"),
        axis.title.y = element_text(face = "bold"),
        plot.title = element_text(face = "bold", size = 13),
        plot.subtitle = element_text(size = 9, face = "italic"),
        axis.text.x = element_text(angle = 45, hjust = 1, size = 10)) +
  scale_fill_viridis_d()

We then analyzed our data by weeks, since most products changed their promotion location weekly. We first divided the sales of soft drink by ‘In Aisle’ and ‘Elsewhere’ (which represented sales when the promotion was displayed elsewhere, NOT in aisle). We wanted to confirm whether or not ‘In Aisle’ was the most profitable location for promoting soft drinks.

From the chart below, our hypothesis was correct. The sales of soft drinks in aisle was almost as high as when displayed elsewhere. This represented that the sales when soft drinks was solely promoted in aisle was roughly the same as that when they were promoted in multiple other locations.

# Look at the sales throughout the year depending on the promotion location ONLY for SOFT DRINKS
soft_drink_weekly_sales <- completedataset %>%
  filter(product_category == "SOFT DRINKS" & display_label != "No Display") %>%
  mutate(location_group = ifelse(display_label == "In Aisle", "In Aisle", "Other Locations")) %>%
  group_by(week, location_group) %>%
  summarise(Tot_sales = sum(sales_value), .groups = "drop") 

soft_drink_weekly_sales %>%
  ggplot(aes(x = week, y = Tot_sales, color = location_group, group = location_group)) +
  geom_line(size = 1) +  
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Weekly Total Sales of Soft Drinks: In Aisle vs. Other Locations",
       subtitle = "Figures of In-Aisle promotions alone were comparable to those of other locations combined.",
       x = "Week",
       y = "Total Sales") +
  theme(axis.title.x = element_text(face = "bold", size = 9),
        axis.title.y = element_text(face = "bold"),
        plot.title = element_text(face = "bold", size = 13)) +
  scale_fill_viridis_d()

Complementary Goods

In our previous analysis, we determined that soft drinks and in-aisle promotions might have gone hand in hand in driving sales. However, we wanted to determine if there was any area of growth for strategizing promotion locations.

With that in mind, we analyzed the top 10 products that were most commonly bought along with soft drinks, and determined where their promotions were located. Specifically, We compared their sales when respective promotions were In Aisle vs Elsewhere.

nd <- completedataset %>%
  group_by(product_category) %>%
  filter(all(display_location == 0)) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales)) %>%
  mutate(cumulative_sales = cumsum(total_sales), 
         cumulative_percent = cumulative_sales / sum(total_sales) * 100)

elsewhere <- completedataset %>%
  group_by(product_category) %>%
  filter(display_label != "In Aisle")

# Top 10 Products purchased with Soft Drinks that are promoted in-aisle
soft_drinks_basket_ia <- completedataset %>%
  semi_join(in_aisle) %>%
  group_by(basket_id) %>%
  filter(any(product_category == "SOFT DRINKS"))

co_purchases_ia <- completedataset %>%
  semi_join(in_aisle) %>%
  filter(basket_id %in% soft_drinks_basket_ia$basket_id & product_category != "SOFT DRINKS")

co_purchases_sales_ia <- co_purchases_ia %>%
  group_by(product_category) %>%
  summarise(total_sales = round(sum(sales_value), 0)) %>%
  arrange(desc(total_sales)) %>%
  slice_max(order_by = total_sales, n = 10)

# The same 10 products but when they were promoted elsewhere
co_purchases_sales_ia_e <- completedataset %>%
  filter(product_category %in% elsewhere$product_category & display_label != "In Aisle") %>%
  semi_join(co_purchases_sales_ia, by = "product_category") %>%
  group_by(product_category) %>%
  summarise(total_sales = round(sum(sales_value), 0)) %>%
  arrange(desc(total_sales)) %>%
  slice_max(order_by = total_sales, n = 10)

# Top 10 Products purchased with Soft Drinks that are only promoted elsewhere
soft_drinks_basket_e <- completedataset %>%
  filter(product_category %in% elsewhere$product_category) %>%
  group_by(basket_id) %>%
  filter(any(product_category == "SOFT DRINKS"))

co_purchases_e <- completedataset %>%
  anti_join(co_purchases_ia) %>%
  filter(display_label != "No Display" & display_label != "In Aisle") %>%
  filter(basket_id %in% soft_drinks_basket_e$basket_id & product_category != "SOFT DRINKS")

co_purchases_sales_e <- co_purchases_e %>%
  anti_join(co_purchases_sales_ia) %>%
  group_by(product_category) %>%
  summarise(total_sales = round(sum(sales_value), 0)) %>%
  arrange(desc(total_sales)) %>%
  slice_max(order_by = total_sales, n = 10)

# Top 10 Products purchased with Soft Drinks not promoted
co_purchases_n <- completedataset %>%
  anti_join(co_purchases_ia) %>%
  anti_join(co_purchases_e) %>%
  filter(display_label == "No Display") %>%
  filter(basket_id %in% soft_drinks_basket_e$basket_id & product_category != "SOFT DRINKS")

co_purchases_sales_n <- co_purchases_n %>%
  filter(product_category %in% nd$product_category) %>%
  group_by(product_category) %>%
  summarise(total_sales = round(sum(sales_value), 0)) %>%
  arrange(desc(total_sales)) %>%
  slice_max(order_by = total_sales, n = 10)

# Combining
co_purchases_combined <- bind_rows(
  co_purchases_sales_ia %>% mutate(display_option = "Aisle"),
  co_purchases_sales_ia_e %>% mutate(display_option = "Elsewhere")
)
co_purchases_combined <- co_purchases_combined %>%
  group_by(display_option) %>%
  arrange(desc(total_sales)) %>%  # Ensuring proper sorting before converting to factor
  mutate(product_category = factor(product_category, levels = unique(product_category))) %>%
  ungroup()

datatable(co_purchases_combined, options = list(pageLength = 5, scrollX = TRUE))

Interestingly, we found out that the sales of complementary goods is highest when promoted NOT in aisle. This meant that having a more diverse placement of those products is more advantageous.

# Plot for Top 10 Complementary Products of Soft Drinks by Display Location
ggplot(co_purchases_combined, aes(x = product_category, y = total_sales, color = product_category)) +
  geom_segment(aes(x = product_category, xend = product_category, y = 0, yend = total_sales), size = 1.9) +
  geom_point(size = 8.5) +
  geom_text(aes(label = scales::comma(total_sales)), vjust = 0.4, color = "white", size = 2.5, fontface = "bold") +
  scale_y_continuous(labels = scales::dollar,
                     expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Top 10 Complementary Goods of Soft Drinks 
by Promotion Location and Total Sales",
       subtitle = "Below are the total sales of 10 products promoted in-aisle (leftward)
and their respective statistics when promoted elsewhere (rightward).",
       x = "Product Category",
       y = "Total Sales") +
  coord_flip() +
  facet_wrap(~display_option) +  # Creates separate panels
  theme(legend.position = "none", 
        axis.title.x = element_text(face = "bold", size = 9),
        axis.title.y = element_text(face = "bold"),
        plot.title = element_text(face = "bold", size = 13),
        plot.subtitle = element_text(size = 9, face = "italic"),
        plot.margin = margin(10, 10, 10, 30)) +
  scale_color_viridis_d() 

We further analyzed complementary products that were displayed elsewhere, NOT in aisle, versus the ones that were not displayed at all.

co_purchases_combined_e <- bind_rows(
  co_purchases_sales_e %>% mutate(display_option = "Displayed"),
  co_purchases_sales_n %>% mutate(display_option = "No Display")
)

co_purchases_combined_e <- co_purchases_combined_e %>%
  group_by(display_option) %>%
  arrange(desc(total_sales)) %>%  
  mutate(product_category = factor(product_category, levels = unique(product_category))) %>%
  ungroup()

datatable(co_purchases_combined_e, options = list(pageLength = 5, scrollX = TRUE))

This helped us discover another area of growth, since at the beginning we found that promoted products yielded higher sales than the ones not promoted anywhere. Specifically, we found a list of products which, despite not being promoted yet, had the potential to be co-purchased with soft drinks. This list included items such as chicken/poultry, stone fruits, salad bar, etc. Regork can try investing in promotion campaigns for those products and see if they really drive higher sales.

ggplot(co_purchases_combined_e, aes(x = product_category, y = total_sales, color = product_category)) +
  geom_segment(aes(x = product_category, xend = product_category, y = 0, yend = total_sales), size = 1.9) +
  geom_point(size = 8.5) +
  geom_text(aes(label = scales::comma(total_sales)), vjust = 0.4, color = "white", size = 2.5, fontface = "bold") +
  scale_y_continuous(labels = scales::dollar, 
                     expand = expansion(mult = c(0, 0.1))) +
  labs(title = "Top 10 Complementary Goods of Soft Drinks 
by Promotion Option and Total Sales",
       subtitle = "On the left are the total sales of 10 products purchased with soft drinks
but promoted elsewhere, while one the right are those of products \nnot promoted at all.",
       x = "Product Category",
       y = "Total Sales") +
  coord_flip() +
  facet_wrap(~display_option, scales = "free") +  # Creates separate panels
  theme(legend.position = "none", 
        axis.title.x = element_text(face = "bold", size = 9),
        axis.title.y = element_text(face = "bold"),
        axis.text.x = element_text( angle = 45),
        plot.title = element_text(face = "bold", size = 13),
        plot.subtitle = element_text(size = 9, face = "italic"),
        plot.margin = margin(10, 10, 10, 30)) +
  scale_color_viridis_d() 

Summary & Findings

Problem Statement
Our main business questions was to examine which products delivered the highest profitability for Regork and where their promotions should be placed. Following that is whether or not it is viable to strategize promotion locations for multiple products.

How we addressed problem
We addressed our problem by looking at the most profitable product category and its promotion location. We then analyzed their relation and determined the contribution of major promotion locations. Then, to find areas of growth, we analyzed the complementary goods. All of these were to create a complete picture as to how Regork can strategize the locations of product promotions.

Interesting Insights
Our primary insight was that soft drinks drove more than 50% of the company business,leading far ahead of other products. In terms of promotion locations, we found out that in-aisle promotions drove the highest sales among all locations. This is surprising considering the common assumption that store-front promotions appeal the most to consumers.

Another insight from our analysis was that promoted products were purchased more often than those without promotions. This shows that promotion campaigns were worth the investment.

Our most surprising insight was that customers bought more complementary goods of soft drinks when they are promoted elsewhere other than in aisle. We do want to clarify that this is what the analysis showed for Soft drinks. We would need further analysis on other big product categories. Nevertheless, the main point is the importance of having a diversified mapping of promotions.

What we propose to Regork

We propose that Regork strategize their promotion locations by decreasing promotions for Soft Drink in areas other than ‘In Aisle’. Ultimately, this will decrease the costs of marketing without affecting the sales of their major contributor, although we understand that it takes time for all the adjustments to kick in.

Our second recommendation is to move the promotions for complementary goods of soft drinks to other locations. Like the previous recommendation, this will reduce the material costs and betofter utilize promotion budget.

Finally, we encourage Regork to display promotions for complementary products such as chicken/poultry and salad bar, since they’re commonly bought with soft drinks and promoting these may drive sales for the business.

Limitations of Analysis

The limitation of our analysis lies in the relations between our findings and campaigns. If we had more time, we would’ve analyzed the performance of each location by campaign length. We would’ve also implemented the factor of seasonality. Additionally, as mentioned, we only analyzed certain product category that we assumed the most valuable for the company. This type of study could be done for other product categories.

.
.