Regork is an important company that manages a variety of products.
Their catalog is so big that the need to track multiple items
simultaneously is pivotal. One distinct thing that Regork does is their
coupons and promotions throughout the store. As analysts, we believe
there should be a placement strategy for promotions. To better utilize
their marketing budget, we decided to dive deep into the statistics and
find the most profitable strategy, whereby Regork will be able to
increase sales and decrease costs of marketing.
Business Problem
Which products deliver the
highest profitability for Regork and where should their promotions be
placed? Is there any opportunity to strategize promotion location for
multiple products?
How we addressed the problem
We will utilize
the transaction information to analyze how different product categories
contributed to the company’s revenue. Then we will analyze their
contribution in terms of promotion locations to see Where they are
located and Which location yielded the highest revenue. Lastly, we will
expand our research by contemplating other potential areas of growth. We
will find complementary goods that show most potential. Accompanying
this will be a recommendation of locational strategies.
Why our analysis brings values to Regork
We
believe in the values of this analysis as well as the feasibility of
expanding it for other areas of Regork.
For our analysis we used the following packages:
Tidyverse: for data manipulation, visualization, and
analysis
completejouney: dataset package
containing all Regork data
dplyr: to facilitate
data manipulation like filtering, selecting and transforming data
stringr: to makes text manipulation easy
lubridate: to handle date and time data and calculating
differences between dates
ggplot2: for data
visualization
scales: to format and adjust scales
in plots
DT: to creates interactive data tables in
R markdown
# Loading the packages
library(DT)
library(tidyverse)
library(completejourney)
library(dplyr)
library(stringr)
library(lubridate)
library(ggplot2)
library(scales)
Due to our business problem, we employed 3 main data sets including:
- Transactions
- Promotions
- Products
# Getting the full datasets
transactions <- get_transactions()
promotions <- get_promotions()
Since we planned to investigate the impact of promotions location on total sales, we created a complete table joining ‘transactions’, ‘products’ and ‘promotions’ datasets. This way we could easily granulate our analysis in different levels of depth.
completedataset <- transactions %>%
inner_join(products) %>%
inner_join(promotions, by = c("product_id", "store_id", "week")) %>%
mutate(month = month(transaction_timestamp, label = TRUE)) %>%
mutate(display_location = as.character(display_location)) %>%
mutate(display_label = case_when(
display_location == 0 ~ "No Display",
display_location == 1 ~ "Display In-store",
display_location == 2 ~ "Store Rear",
display_location == 3 ~ "Front End Cap",
display_location == 4 ~ "Rear End Cap",
display_location == 5 ~ "Mid-aisle End Cap",
display_location == 6 ~ "Side-aisle End Cap",
display_location == 7 ~ "In Aisle",
display_location == 9 ~ "Secondary Location",
display_location == "A" ~ "In Shelf",
TRUE ~ "Unknown"
))
datatable(head(completedataset), options = list(pageLength = 5, scrollX = TRUE))
Lastly, we performed a Pareto table in our annual sales to narrow down our investigation and pick the product categories that had the greatest impact on sales for the store as a whole.
# Create a data set with the total yearly sales (in dollars) grouped by Product Category
paretoyearly_sales <- completedataset %>%
filter(display_label != "No Display") %>%
group_by(product_category) %>%
summarise(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
mutate(cumulative_sales = cumsum(total_sales),
cumulative_percent = cumulative_sales / sum(total_sales) * 100)
datatable(head(paretoyearly_sales), options = list(pageLength = 5, scrollX = TRUE))
To create valuable insights for Regork, we performed a Pareto analysis to determine the top product categories that accounted for the highest sale annually. To our surprise, Soft Drinks was the most sold category, far surpassing other products considered, making approximately 50% of the total revenue. Upon seeing these results, we decided to move forwards with analyzing Soft Drink, since it was the greatest area of revenue for Regork.
ggplot(paretoyearly_sales, aes(x = reorder(product_category, -total_sales), y = total_sales)) +
geom_bar(stat = "identity", fill = "steelblue") + # Bars for total sales
geom_line(aes(y = cumulative_percent * max(total_sales) / 100, group = 1), color = "red", size = 1) + # Cumulative %
geom_point(aes(y = cumulative_percent * max(total_sales) / 100), color = "red", size = 2) + # Points on line
scale_y_continuous(sec.axis = sec_axis(~ . / max(paretoyearly_sales$total_sales) * 100, name = "Cumulative Percentage")) + # Secondary Y-axis
labs(title = "Pareto Analysis of Product Sales",
x = "Product Category", y = "Total Revenue") +
scale_y_continuous(labels = scales::dollar) +
theme(axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"),
plot.title = element_text(size = 13, face = "bold"),
plot.subtitle = element_text(size = 9, face = "italic"))
Our second step was to inspect the financial influence of transactions conducted during which products and promotions were displayed versus not displayed. The reason was to see if displaying promotions yielded the results expected, since we had the hypothesis that product displayed had higher sales.
The analysis below confirmed our initial thoughts. Transactions of no display make up only 22.9% of the total sales. While products displayed contributed to the 77.1% of the whole revenue.
display_option <- completedataset %>%
mutate(display_option = case_when(
display_label == "No Display" ~ "No Display",
display_label != "No Display" ~ "Displayed",
TRUE ~ "Unknown"
)) %>%
group_by(product_id) %>%
summarise(has_displayed = any(display_option == "Displayed"),
has_undisplayed = all(display_option == "No Display"),
total_sales = sum(sales_value)) %>%
mutate(category = ifelse(has_displayed, "Promoted", "No Promotion at all")) %>%
group_by(category) %>%
summarise(total_sales = sum(total_sales)) %>%
arrange(desc(total_sales))
display_option <- display_option %>%
mutate(percentage = total_sales / sum(total_sales) * 100) # Calculate percentage
ggplot(display_option, aes(x = "", y = total_sales, fill = category)) +
geom_bar(stat = "identity", width = 1) + # Creates a bar chart
coord_polar(theta = "y") + # Converts it into a pie chart
labs(title = "Sales Distribution by Display Option",
subtitle = "Regork seemingly sold less products in which they invested
no in-store marketing strategies at all.") +
geom_text(aes(label = paste0(round(percentage, 1), "%")), color = "white",
position = position_stack(vjust = 0.5), size = 6) +
theme(axis.text.y = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
plot.title = element_text(size = 13, face = "bold"),
plot.subtitle = element_text(size = 9, face = "italic"))
After filtering out the no-displays, we looked at the total annual
sales by promotion locations. Our results showed that the location
IN AISLE rendered the highest annual sales, followed by
Store Rear and Secondary Location. Our
next step was to look at the relationship between Soft Drinks and the
‘In Aisle’ location.
display_locations <- completedataset %>%
filter(display_label != "No Display") %>%
group_by(display_label) %>%
summarise(total_sales = round(sum(sales_value), 0)) %>%
arrange(desc(total_sales))
ggplot(display_locations, aes(x = reorder(display_label, total_sales), y = total_sales, fill = display_label)) +
geom_col() +
geom_text(aes(label = scales::dollar(total_sales)), hjust = 1.2, color = "white", size = 3) +
scale_y_continuous(labels = scales::dollar) +
labs(title = "Total Sales by Promotion Location",
subtitle = "Products promoted in-aisle accounted for the most sales value.",
x = "Location",
y = "Total Sales") +
coord_flip() +
theme(legend.position = "none",
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"),
plot.title = element_text(face = "bold", size = 13),
plot.subtitle = element_text(size = 9, face = "italic"))
After analyzing the highest contribution from a product category and the highest sales, we decided to analyze the relationship of soft drink and ‘In Aisle’ location to determine a better promotion mapping. The result showed that Soft Drinks was in fact the main contributor to sales of promotions displayed in aisle. However, we want to also see if ‘In Aisle’ location is the main reason for the high sales of Soft Drink. Following this will be the analysis.
in_aisle <- completedataset %>%
group_by(product_category) %>%
filter(display_label == "In Aisle")
top_10_in_aisle <- completedataset %>%
filter(display_label == "In Aisle") %>%
group_by(product_category) %>%
summarise(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
mutate(cumulative_sales = cumsum(total_sales),
cumulative_percent = cumulative_sales / sum(total_sales) * 100) %>%
slice_max(order_by = total_sales, n = 10)
# Plot for Top 10 Products with In-aisle Promotions
ggplot(top_10_in_aisle, aes(x = reorder(product_category, -total_sales), y = total_sales, fill = product_category)) +
geom_col() + # Bars for total sales
scale_y_continuous(
name = "Total Sales",
labels = scales::dollar_format(prefix = "$")
) +
labs(title = "Top 10 Products with In-aisle Promotions in Total Sales",
subtitle = "A breakdown of products promoted specifically in aisle.\nWe are interested in Soft Drinks, which observed the highest sales.",
x = "Product Category", y = "Total Sales") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 10)) +
theme(legend.position = "none",
axis.title.x = element_text(face = "bold"),
axis.title.y = element_text(face = "bold"),
plot.title = element_text(face = "bold", size = 13),
plot.subtitle = element_text(size = 9, face = "italic"),
axis.text.x = element_text(angle = 45, hjust = 1, size = 10)) +
scale_fill_viridis_d()
We then analyzed our data by weeks, since most products changed their promotion location weekly. We first divided the sales of soft drink by ‘In Aisle’ and ‘Elsewhere’ (which represented sales when the promotion was displayed elsewhere, NOT in aisle). We wanted to confirm whether or not ‘In Aisle’ was the most profitable location for promoting soft drinks.
From the chart below, our hypothesis was correct. The sales of soft drinks in aisle was almost as high as when displayed elsewhere. This represented that the sales when soft drinks was solely promoted in aisle was roughly the same as that when they were promoted in multiple other locations.
# Look at the sales throughout the year depending on the promotion location ONLY for SOFT DRINKS
soft_drink_weekly_sales <- completedataset %>%
filter(product_category == "SOFT DRINKS" & display_label != "No Display") %>%
mutate(location_group = ifelse(display_label == "In Aisle", "In Aisle", "Other Locations")) %>%
group_by(week, location_group) %>%
summarise(Tot_sales = sum(sales_value), .groups = "drop")
soft_drink_weekly_sales %>%
ggplot(aes(x = week, y = Tot_sales, color = location_group, group = location_group)) +
geom_line(size = 1) +
scale_y_continuous(labels = scales::dollar) +
labs(title = "Weekly Total Sales of Soft Drinks: In Aisle vs. Other Locations",
subtitle = "Figures of In-Aisle promotions alone were comparable to those of other locations combined.",
x = "Week",
y = "Total Sales") +
theme(axis.title.x = element_text(face = "bold", size = 9),
axis.title.y = element_text(face = "bold"),
plot.title = element_text(face = "bold", size = 13)) +
scale_fill_viridis_d()
In our previous analysis, we determined that soft drinks and in-aisle promotions might have gone hand in hand in driving sales. However, we wanted to determine if there was any area of growth for strategizing promotion locations.
With that in mind, we analyzed the top 10 products that were most commonly bought along with soft drinks, and determined where their promotions were located. Specifically, We compared their sales when respective promotions were In Aisle vs Elsewhere.
nd <- completedataset %>%
group_by(product_category) %>%
filter(all(display_location == 0)) %>%
summarise(total_sales = sum(sales_value)) %>%
arrange(desc(total_sales)) %>%
mutate(cumulative_sales = cumsum(total_sales),
cumulative_percent = cumulative_sales / sum(total_sales) * 100)
elsewhere <- completedataset %>%
group_by(product_category) %>%
filter(display_label != "In Aisle")
# Top 10 Products purchased with Soft Drinks that are promoted in-aisle
soft_drinks_basket_ia <- completedataset %>%
semi_join(in_aisle) %>%
group_by(basket_id) %>%
filter(any(product_category == "SOFT DRINKS"))
co_purchases_ia <- completedataset %>%
semi_join(in_aisle) %>%
filter(basket_id %in% soft_drinks_basket_ia$basket_id & product_category != "SOFT DRINKS")
co_purchases_sales_ia <- co_purchases_ia %>%
group_by(product_category) %>%
summarise(total_sales = round(sum(sales_value), 0)) %>%
arrange(desc(total_sales)) %>%
slice_max(order_by = total_sales, n = 10)
# The same 10 products but when they were promoted elsewhere
co_purchases_sales_ia_e <- completedataset %>%
filter(product_category %in% elsewhere$product_category & display_label != "In Aisle") %>%
semi_join(co_purchases_sales_ia, by = "product_category") %>%
group_by(product_category) %>%
summarise(total_sales = round(sum(sales_value), 0)) %>%
arrange(desc(total_sales)) %>%
slice_max(order_by = total_sales, n = 10)
# Top 10 Products purchased with Soft Drinks that are only promoted elsewhere
soft_drinks_basket_e <- completedataset %>%
filter(product_category %in% elsewhere$product_category) %>%
group_by(basket_id) %>%
filter(any(product_category == "SOFT DRINKS"))
co_purchases_e <- completedataset %>%
anti_join(co_purchases_ia) %>%
filter(display_label != "No Display" & display_label != "In Aisle") %>%
filter(basket_id %in% soft_drinks_basket_e$basket_id & product_category != "SOFT DRINKS")
co_purchases_sales_e <- co_purchases_e %>%
anti_join(co_purchases_sales_ia) %>%
group_by(product_category) %>%
summarise(total_sales = round(sum(sales_value), 0)) %>%
arrange(desc(total_sales)) %>%
slice_max(order_by = total_sales, n = 10)
# Top 10 Products purchased with Soft Drinks not promoted
co_purchases_n <- completedataset %>%
anti_join(co_purchases_ia) %>%
anti_join(co_purchases_e) %>%
filter(display_label == "No Display") %>%
filter(basket_id %in% soft_drinks_basket_e$basket_id & product_category != "SOFT DRINKS")
co_purchases_sales_n <- co_purchases_n %>%
filter(product_category %in% nd$product_category) %>%
group_by(product_category) %>%
summarise(total_sales = round(sum(sales_value), 0)) %>%
arrange(desc(total_sales)) %>%
slice_max(order_by = total_sales, n = 10)
# Combining
co_purchases_combined <- bind_rows(
co_purchases_sales_ia %>% mutate(display_option = "Aisle"),
co_purchases_sales_ia_e %>% mutate(display_option = "Elsewhere")
)
co_purchases_combined <- co_purchases_combined %>%
group_by(display_option) %>%
arrange(desc(total_sales)) %>% # Ensuring proper sorting before converting to factor
mutate(product_category = factor(product_category, levels = unique(product_category))) %>%
ungroup()
datatable(co_purchases_combined, options = list(pageLength = 5, scrollX = TRUE))
Interestingly, we found out that the sales of complementary goods is highest when promoted NOT in aisle. This meant that having a more diverse placement of those products is more advantageous.
# Plot for Top 10 Complementary Products of Soft Drinks by Display Location
ggplot(co_purchases_combined, aes(x = product_category, y = total_sales, color = product_category)) +
geom_segment(aes(x = product_category, xend = product_category, y = 0, yend = total_sales), size = 1.9) +
geom_point(size = 8.5) +
geom_text(aes(label = scales::comma(total_sales)), vjust = 0.4, color = "white", size = 2.5, fontface = "bold") +
scale_y_continuous(labels = scales::dollar,
expand = expansion(mult = c(0, 0.1))) +
labs(title = "Top 10 Complementary Goods of Soft Drinks
by Promotion Location and Total Sales",
subtitle = "Below are the total sales of 10 products promoted in-aisle (leftward)
and their respective statistics when promoted elsewhere (rightward).",
x = "Product Category",
y = "Total Sales") +
coord_flip() +
facet_wrap(~display_option) + # Creates separate panels
theme(legend.position = "none",
axis.title.x = element_text(face = "bold", size = 9),
axis.title.y = element_text(face = "bold"),
plot.title = element_text(face = "bold", size = 13),
plot.subtitle = element_text(size = 9, face = "italic"),
plot.margin = margin(10, 10, 10, 30)) +
scale_color_viridis_d()
We further analyzed complementary products that were displayed elsewhere, NOT in aisle, versus the ones that were not displayed at all.
co_purchases_combined_e <- bind_rows(
co_purchases_sales_e %>% mutate(display_option = "Displayed"),
co_purchases_sales_n %>% mutate(display_option = "No Display")
)
co_purchases_combined_e <- co_purchases_combined_e %>%
group_by(display_option) %>%
arrange(desc(total_sales)) %>%
mutate(product_category = factor(product_category, levels = unique(product_category))) %>%
ungroup()
datatable(co_purchases_combined_e, options = list(pageLength = 5, scrollX = TRUE))
This helped us discover another area of growth, since at the beginning we found that promoted products yielded higher sales than the ones not promoted anywhere. Specifically, we found a list of products which, despite not being promoted yet, had the potential to be co-purchased with soft drinks. This list included items such as chicken/poultry, stone fruits, salad bar, etc. Regork can try investing in promotion campaigns for those products and see if they really drive higher sales.
ggplot(co_purchases_combined_e, aes(x = product_category, y = total_sales, color = product_category)) +
geom_segment(aes(x = product_category, xend = product_category, y = 0, yend = total_sales), size = 1.9) +
geom_point(size = 8.5) +
geom_text(aes(label = scales::comma(total_sales)), vjust = 0.4, color = "white", size = 2.5, fontface = "bold") +
scale_y_continuous(labels = scales::dollar,
expand = expansion(mult = c(0, 0.1))) +
labs(title = "Top 10 Complementary Goods of Soft Drinks
by Promotion Option and Total Sales",
subtitle = "On the left are the total sales of 10 products purchased with soft drinks
but promoted elsewhere, while one the right are those of products \nnot promoted at all.",
x = "Product Category",
y = "Total Sales") +
coord_flip() +
facet_wrap(~display_option, scales = "free") + # Creates separate panels
theme(legend.position = "none",
axis.title.x = element_text(face = "bold", size = 9),
axis.title.y = element_text(face = "bold"),
axis.text.x = element_text( angle = 45),
plot.title = element_text(face = "bold", size = 13),
plot.subtitle = element_text(size = 9, face = "italic"),
plot.margin = margin(10, 10, 10, 30)) +
scale_color_viridis_d()
Problem Statement
Our main business questions
was to examine which products delivered the highest profitability for
Regork and where their promotions should be placed. Following that is
whether or not it is viable to strategize promotion locations for
multiple products.
How we addressed problem
We addressed our
problem by looking at the most profitable product category and its
promotion location. We then analyzed their relation and determined the
contribution of major promotion locations. Then, to find areas of
growth, we analyzed the complementary goods. All of these were to create
a complete picture as to how Regork can strategize the locations of
product promotions.
Interesting Insights
Our primary insight was
that soft drinks drove more than 50% of the company business,leading far
ahead of other products. In terms of promotion locations, we found out
that in-aisle promotions drove the highest sales among all locations.
This is surprising considering the common assumption that store-front
promotions appeal the most to consumers.
Another insight from our analysis was that promoted products were purchased more often than those without promotions. This shows that promotion campaigns were worth the investment.
Our most surprising insight was that customers bought more complementary goods of soft drinks when they are promoted elsewhere other than in aisle. We do want to clarify that this is what the analysis showed for Soft drinks. We would need further analysis on other big product categories. Nevertheless, the main point is the importance of having a diversified mapping of promotions.
What we propose to Regork
We propose that Regork strategize their promotion locations by decreasing promotions for Soft Drink in areas other than ‘In Aisle’. Ultimately, this will decrease the costs of marketing without affecting the sales of their major contributor, although we understand that it takes time for all the adjustments to kick in.
Our second recommendation is to move the promotions for complementary goods of soft drinks to other locations. Like the previous recommendation, this will reduce the material costs and betofter utilize promotion budget.
Finally, we encourage Regork to display promotions for complementary products such as chicken/poultry and salad bar, since they’re commonly bought with soft drinks and promoting these may drive sales for the business.
Limitations of Analysis
The limitation of our analysis lies in the relations between our findings and campaigns. If we had more time, we would’ve analyzed the performance of each location by campaign length. We would’ve also implemented the factor of seasonality. Additionally, as mentioned, we only analyzed certain product category that we assumed the most valuable for the company. This type of study could be done for other product categories.