The business problem that we are focusing on is how the sale of holiday items correlates with other products and discounts/coupons during that period. The holidays that we are using are the 4th of July and beer sales, Easter and ham sales, and Christmas and cookie sales. By using the transaction and products data we are able to recognize trends and relationships between these key holiday items and learn how to effectively gauge Regork’s customer base.
With using this data and finding these trends we will be able to make better decisions on key holiday items. We do this by finding out how often they are purchased, what items are bought with it, and how much revenue is made off of the sale of the product. Our first graph shown with each product group will be the amount of product bought by month over the course of a year. Our second graph shows for each product the percentage of transactions that a discount/coupon is used.
This analysis will help the Regork understand which products they should be promoting during certain times of the year. Whether it can be proximity to related products within a grocery store, or a similarity in packaging, or even the coupons given out to their customer base.
Packages Required library(dplyr) library(ggplot2) library(lubridate) library(completejourney)
Exploratory Data Analysis For our analysis, we highlighted three holidays and the sales of a holiday product. The holidays and products we chose are the 4th of July and beer, and Easter and Ham, and Christmas and seasonal products. By running this code we can locate the targeted transactions pertaining to the key holiday items.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
transactions<- get_transactions()%>%
left_join(products %>% select(product_id, product_category), by = "product_id")
alcohol_sales <- transactions %>%
filter(product_category == "BEERS/ALES") %>%
mutate(month = month(transaction_timestamp, label = TRUE))
alcohol_sales_by_month <- alcohol_sales %>%
mutate(month = floor_date(transaction_timestamp, "month")) %>%
group_by(month) %>%
summarise(total_sales = sum(quantity, na.rm = TRUE))
alcohol_sales_by_month$month <- factor(c("January", "February", "March", "April", "May",
"June", "July", "August", "September",
"October", "November", "December"),
levels = c("January", "February", "March", "April", "May",
"June", "July", "August", "September",
"October", "November", "December"))
print(alcohol_sales_by_month)
## # A tibble: 12 × 2
## month total_sales
## <fct> <dbl>
## 1 January 787
## 2 February 849
## 3 March 784
## 4 April 841
## 5 May 959
## 6 June 1033
## 7 July 1125
## 8 August 969
## 9 September 865
## 10 October 873
## 11 November 856
## 12 December 1042
ggplot(alcohol_sales_by_month, aes(x = month, y = total_sales)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = "Beer Sales Over the Year",
x = "Month",
y = "Total Beer Sales") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
This graph we can clearly see that in late spring/early summer there is an increase in the sale of beer/ale over the course of the year. There is a peak in July and a drop-off until later in the year in December. Someone within a firm could take a look at this graph and realize that certain products are going to sell better at different times of the year. In this case, beer sells the best in the summer months as well as in December. This can help a company look in directions that it might not have before.
transactions2 <- get_transactions() %>%
left_join(products %>% select(product_id, product_category, product_type), by = "product_id")
# Filter for beer/ales transactions and the Fourth of July period
beer_sales_july <- transactions %>%
filter(product_category == "BEERS/ALES" &
month(transaction_timestamp) == 7) # Only include transactions from July
# Find the top 5 items that are sold with beer/ales
top_items_july <- transactions %>%
filter(basket_id %in% beer_sales_july$basket_id) %>% # Find all transactions in the same baskets as beer/ales
group_by(product_category) %>% # Group by product type (or product name)
summarise(
total_appearances = n(), # Total appearances of each item
discounted_appearances = sum(retail_disc > 0, na.rm = TRUE), # Discounted appearances
discount_percentage = (discounted_appearances / total_appearances) * 100 # Calculate discount percentage
) %>%
arrange(desc(total_appearances)) %>%
slice(1:5) # Select the top 5 items
# Create a horizontal bar chart with total and discounted appearances
ggplot(top_items_july, aes(y = reorder(product_category, total_appearances))) +
geom_segment(aes(x = 0, xend = total_appearances, yend = product_category), color = "pink", size = 3) +
geom_point(aes(x = total_appearances), color = "seagreen3", size = 2.5) +
geom_point(aes(x = discounted_appearances), color = "seagreen3", size = 2.5) +
geom_text(aes(x = total_appearances, label = total_appearances), vjust = -0.5, size = 3, color = "black") +
geom_text(aes(x = discounted_appearances, label = discounted_appearances), vjust = -0.5, size = 3) +
geom_text(aes(x = discounted_appearances, label = paste0(round(discount_percentage), "%")),
vjust = 1.5, color = "black") +
labs(title = "Top 5 Items Sold with Beer/Ale Around the Fourth of July",
subtitle = "Percentage and count of discounted vs total appearances in transactions",
x = NULL, y = NULL) +
theme_minimal() +
theme(axis.text.x = element_blank(),
axis.text.y = element_text(size = 12, face = "bold"),
panel.grid.major.x = element_blank())
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
The graph above shows the products that are the most frequently bought with beer/ale. This can help any specific store understand that in months like June/July, other products like soft drinks, baked bread, bag snacks, and cheese. A store can advertise these products more heavily along with beer/ale in the months that it will sell the most. Doing so can increase their sales of certain items that they promote during different times of the year. Another way to promote their products would be through the use of discounts/coupons and looking at this graph you can see the different percentages at which a product is bought when it is on sale.
transactions3 <- get_transactions() %>%
left_join(products %>% select(product_id, product_category), by = "product_id")
# Step 2: Filter for "Smoked Ham" from the product category
smoked_meat_sales <- transactions %>%
filter(product_category == "SMOKED MEATS") %>%
mutate(month = floor_date(transaction_timestamp, "month")) # Create a month column
smoked_meat_sales_by_month <- smoked_meat_sales %>%
mutate(month = floor_date(transaction_timestamp, "month")) %>%
group_by(month) %>%
summarise(total_sales = sum(quantity, na.rm = TRUE))
smoked_meat_sales_by_month$month <- factor(c("January", "February", "March", "April", "May",
"June", "July", "August", "September",
"October", "November", "December"),
levels = c("January", "February", "March", "April", "May",
"June", "July", "August", "September",
"October", "November", "December"))
print(smoked_meat_sales_by_month)
## # A tibble: 12 × 2
## month total_sales
## <fct> <dbl>
## 1 January 152
## 2 February 135
## 3 March 209
## 4 April 282
## 5 May 155
## 6 June 171
## 7 July 206
## 8 August 182
## 9 September 191
## 10 October 212
## 11 November 344
## 12 December 400
ggplot(smoked_meat_sales_by_month, aes(x = month, y = total_sales)) +
geom_bar(stat = "identity", fill = "salmon") +
labs(title = "Smoked Meat Sales Over the Year",
x = "Month",
y = "Total Smoked Meat Sales") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
As we can see in the graph above there is a noticeable surge in smoked meat sales during the Easter season, highlighting a strong seasonal trend. Sales begin to rise just before Easter, peaking just before the holiday just before gradually declining until the end of the year. This pattern suggests that consumers associate Easter celebrations with smoked meats. This is probably due to traditional meals featuring smoked products. This data gives Regork an opportunity for strategic promotions and inventory planning to be able to maximize sales during a high demand period.
transactions4 <- get_transactions() %>%
left_join(products %>% select(product_id, product_category, product_type), by = "product_id")
# Step 2: Filter for Smoked Meat sales and the Easter period (March and April)
smoked_meat_sales_easter <- transactions %>%
filter(product_category == "SMOKED MEATS" &
month(transaction_timestamp) %in% c(3, 4)) # Filter for March and April (Easter period)
# Step 3: Identify top 5 items sold with smoked meats during the Easter period
top_items_easter <- transactions %>%
filter(basket_id %in% smoked_meat_sales_easter$basket_id) %>% # Same basket as smoked meats
group_by(product_category) %>% # Group by product type (or product name)
summarise(
total_appearances = n(), # Total appearances of each item
discounted_appearances = sum(retail_disc > 0, na.rm = TRUE), # Discounted appearances
discount_percentage = (discounted_appearances / total_appearances) * 100 # Calculate discount percentage
) %>%
arrange(desc(total_appearances)) %>%
slice(1:5) # Select the top 5 items
# Step 4: Create the chart showing total and discounted appearances
ggplot(top_items_easter, aes(y = reorder(product_category, total_appearances))) +
geom_segment(aes(x = 0, xend = total_appearances, yend = product_category), color = "salmon", size = 3) +
geom_point(aes(x = total_appearances), color = "blue", size = 2.5) +
geom_point(aes(x = discounted_appearances), color = "lightblue", size = 2.5) +
geom_text(aes(x = total_appearances, label = total_appearances), vjust = -0.5, size = 3, color = "black") +
geom_text(aes(x = discounted_appearances, label = discounted_appearances), vjust = -0.5, size = 3) +
geom_text(aes(x = discounted_appearances, label = paste0(round(discount_percentage), "%")),
vjust = 1.5, color = "black") +
labs(title = "Top 5 Items Sold with Smoked Meats Around Easter",
subtitle = "Percentage and count of discounted vs total appearances in transactions",
x = NULL, y = NULL) +
theme_minimal() +
theme(axis.text.x = element_blank(),
axis.text.y = element_text(size = 12, face = "bold"),
panel.grid.major.x = element_blank())
The analysis reveals clear purchasing patterns around the Easter season. This shows that cheese(83%) and baked bread(74%) are the most likely bought products around that time. Stores can use this information to strategically promote cheese and baked bread alongside smoked meats during the Easter season. By bundling these products together in advertisements or offering targeted discounts, retailers can encourage customers to purchase complementary items, ultimately increasing overall sales. Additionally, analyzing the impact of discounts and coupons on these products can help optimize promotional strategies. The graph also highlights the varying percentages at which these items are purchased when on sale, allowing stores to tailor their pricing and marketing efforts to maximize revenue during peak buying periods.
transactions5 <- get_transactions() %>%
left_join(products %>% select(product_id, product_category), by = "product_id")
# Step 2: Filter for "COOKIES/CONES" from the product category
christmas_sales <- transactions %>%
filter(product_category == "CHRISTMAS SEASONAL") %>%
mutate(month = floor_date(transaction_timestamp, "month")) # Create a month column
# Step 3: Summarize sales data by month
christmas_sales_by_month <- christmas_sales %>%
mutate(month = floor_date(transaction_timestamp, "month")) %>%
group_by(month) %>%
summarise(total_sales = sum(quantity, na.rm = TRUE))
christmas_sales_by_month$month <- factor(c("January", "February", "March", "April", "May",
"June", "July", "August", "September",
"October", "November", "December"),
levels = c("January", "February", "March", "April", "May",
"June", "July", "August", "September",
"October", "November", "December"))
print(christmas_sales_by_month)
## # A tibble: 12 × 2
## month total_sales
## <fct> <dbl>
## 1 January 335
## 2 February 61
## 3 March 20
## 4 April 16
## 5 May 13
## 6 June 30
## 7 July 10
## 8 August 8
## 9 September 18
## 10 October 130
## 11 November 511
## 12 December 1287
ggplot(christmas_sales_by_month, aes(x = month, y = total_sales)) +
geom_bar(stat = "identity", fill = "palegreen3") +
labs(title = "Christmas Seasonal Sales Over the Year",
x = "Month",
y = "Total Christmas Seasonal Sales") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The seasonal surge in sales for Christmas is unlike any other. The sales during and around Christmas time come so quickly and leave so quickly that it makes it quite unique. Referring to the graph above, the only real product sales are done in November and December. It makes sense for it to be this way but few other products are like this. Regork would most likely already understand the importance of placing promotions on the right products during a seasonal time but with this next graph they can also understand what products they should be promoting and which way they should be promoting them.
transactions6 <- get_transactions() %>%
left_join(products %>% select(product_id, product_category, product_type), by = "product_id")
# Step 2: Filter for "CHRISTMAS SEASONAL" sales during December
christmas_sales_december <- transactions %>%
filter(product_category == "CHRISTMAS SEASONAL" & month(transaction_timestamp) == 12) # December filter
# Step 3: Identify top 5 items bought with "CHRISTMAS SEASONAL" products during December
top_items_christmas <- transactions %>%
filter(basket_id %in% christmas_sales_december$basket_id) %>% # Same basket as Christmas items
group_by(product_category) %>% # Group by product type
summarise(
total_appearances = n(), # Count total appearances of each item
discounted_appearances = sum(retail_disc > 0, na.rm = TRUE), # Count discounted appearances
discount_percentage = (discounted_appearances / total_appearances) * 100 # Calculate discount percentage
) %>%
arrange(desc(total_appearances)) %>%
slice(1:5) # Select top 5 items
# Step 4: Create the chart showing total and discounted appearances
ggplot(top_items_christmas, aes(y = reorder(product_category, total_appearances))) +
geom_segment(aes(x = 0, xend = total_appearances, yend = product_category), color = "palegreen3", size = 3) +
geom_point(aes(x = total_appearances), color = "pink", size = 2.5) +
geom_point(aes(x = discounted_appearances), color = "red", size = 2.5) +
geom_text(aes(x = total_appearances, label = total_appearances), vjust = -0.5, size = 3, color = "black") +
geom_text(aes(x = discounted_appearances, label = discounted_appearances), vjust = -0.5, size = 3) +
geom_text(aes(x = discounted_appearances, label = paste0(round(discount_percentage), "%")),
vjust = 1.5, color = "black") +
labs(title = "Top 5 Items Sold with Christmas Seasonal Products Around Christmas",
subtitle = "Percentage and count of discounted vs total appearances in transactions",
x = NULL, y = NULL) +
theme_minimal() +
theme(axis.text.x = element_blank(),
axis.text.y = element_text(size = 12, face = "bold"),
panel.grid.major.x = element_blank())
Stores can use this information to strategically promote these items alongside soft drinks, cheese, and candy for the holiday season. By bundling these products together in advertisements or offering targeted discounts, retailers can encourage customers to purchase complementary items, ultimately increasing overall sales. Additionally, analyzing the impact of discounts and coupons on these products can help optimize promotional strategies. The graph also highlights the varying percentages at which these items are purchased when on sale, allowing stores to tailor their pricing and marketing efforts to maximize revenue during the peak holiday shopping period.
Our study examined the relationship between the sales of key holiday products—beer during the 4th of July, smoked meats for Easter, and cookies for Christmas—and their correlation with other frequently purchased items and promotional discounts. The goal was to help Regork Grocery Chain develop more effective marketing strategies and optimize product placement to maximize revenue during peak seasons. Being able to identify the purchasing power of Regork’s customer base and the impacts of discount during holiday periods, we aim to provide actionable insights to enhance holiday sales performance.
Starting this analysis, we utilized transaction and product data to track purchasing trends. This involved examining monthly sales fluctuations of a multitude of products and identifying which products were most commonly sold together. We then took it a step further and wanted to see how a discount affected the number of transactions of a certain product. The results revealed clear seasonal trends, such as increased beer sales in July and December, higher demand for smoked meats around Easter, and a concentrated spike in cookie purchases during November and December. Additionally, our findings showed that strategic discounting plays a role in driving sales, particularly for items frequently bought together.
Based on the insights above, we think Regork should focus on adjusting store layouts, potentially create bundled promotions and implement target discounts on certain products during holiday seasons. For example, displaying beer alongside snacks and soft drinks, pairing smoked meat with cheese and bread before Easter and lastly promoting cookies with seasonal treats in December could encourage higher sales.