Every year certain products will fly off the shelves for months, like popsicles in the summer, hot cocoa in the winter, and pumpkin spice in the fall. However, for some products, seasonal trends dramatically hinder transaction counts and stagnate their potential for year round sales. In this report, we hope to identify some possible solutions to spur seasonal product sales during their down season(s) and understand if any new product-pairings may be helpful to advertise.
By first analyzing total department transactions (using the transactions and products data) we were able to get a sense of which areas of product offering likely had the most potential for more sales. By then diving deeper into the appropriate product categories and types we were able to identify a few top products to initiate our promotional program. We then identified common products that people bought as complements that had positive sales trends. Finally, we completed a demographic overview (using the demographics data) to help the advertising team understand who to target the promotions toward.
Our analysis will hopefully allow Regork to not only minimize the impact of high seasonality in some staple products, but also hopefully allow them to begin understanding the impact that some complement based bundling might have on their bottom line. By utilizing our report as an example, Regork may be able to identify other product-complement pairings that would warrant significant transaction upticks if joint-promotions were implemented for targeted customers.
#load data
library(completejourney)
library(tidyverse)
library(ggplot2)
library(DT)
library(stringr)
library(ggthemes)
#import data sets
transactions <- get_transactions()
data(products)
data(demographics)
We began our analysis by examining the overarching trends present among each department. In this time series chart, the total transactions for each department is illustrated across every month of the year.
The first time we produced this, the grocery department greatly exceeded all other departments making it difficult to discover unique insights (can see Figure 1.1 in appendix). As a result, in the visual provided below, the groceries department was removed. We then noticed that the produce department had a very noticeable decline from July (14,250 total transactions) through December (11,000 transactions. This represents a 22.8% decline in total transactions.
# Total Monthly Transactions by Department Excluding the Grocery Department
transactions %>%
inner_join(products, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(department != "GROCERY") %>%
group_by(month, department) %>%
summarize(TT = n()) %>%
ggplot(aes(x = month, y = TT, color = department)) +
geom_line() +
scale_x_continuous("Month",
breaks = seq(1, 12, by = 1),
labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug","Sep","Oct","Nov","Dec")) +
scale_y_continuous("Total Transactions",
breaks = seq(0, 15000, by = 1000),
limits = c(0,15000),
expand = expansion(0, 0)) +
labs(color = "Department") +
ggtitle("Total Monthly Transactions by Department",
subtitle = "Excluding Grocery Department")
To understand what areas of the produce department were causing these seasonal trends, we broke out each different product type to examine their individual transaction totals. Please note, we removed any produce types that did not have any transactions as it would be difficult to understand their seasonal trends/store availability.
Upon reviewing these plots we identified berries and stone fruits to be the categories with the most significant and consistent seasonal trends. Moving forward, these were the areas that we chose to focus our efforts into.
# Total Transactions by Produce Type
transactions %>%
inner_join(products, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(department == "PRODUCE",
month %in% c(7, 8, 9, 10, 11, 12)) %>%
group_by(month, product_category) %>%
summarize(TT = n()) %>%
group_by(product_category) %>%
filter(sum(TT) > 250)%>%
ggplot(aes(x = month, y = TT)) +
geom_bar(position = 'dodge',
stat = 'identity') +
facet_wrap(~product_category,
labeller = labeller(product_category = label_wrap_gen(width = 10))) +
labs( x = "Month",
y = "Total Transactions",
title = "Total Transactions by Produce Type"
) + scale_x_continuous(breaks = seq(7,12, by = 1),
labels = c("Jul", "Aug", "Sep","Oct","Nov", "Dec")) +
scale_y_continuous(expand = expansion(0, 0)) +
theme_excel() +
theme(strip.text.x = element_text(size = 7))
When reviewing the stone fruit category, we found that “Peaches Yellow Flesh” had the most significant decline. Similarly within berries, “Strawberries” and “Blueberries” had the most significant drop off. When plotted against one another, the similar trends between the three products becomes self-evident as all three begin July at about 125 transactions a month or higher and finish at about 25 transactions or less by December.
#Total Transactions within the Stone Fruit Category
transactions %>%
inner_join(products, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(product_category == "STONE FRUIT",
month %in% c(7, 8, 9, 10, 11, 12)) %>%
group_by(month, product_type) %>%
summarize(TT = n()) %>%
ggplot(aes(x= month, y = TT)) +
geom_bar( position= 'dodge',
stat= 'identity') +
facet_wrap(~product_type) +
labs( x = "Month",
y = "Total Transactions",
title = "Total Transactions within Stone Fruit Category") +
scale_x_continuous(breaks = seq(7,12, by = 1),
labels = c("Jul", "Aug", "Sep","Oct","Nov", "Dec")) +
scale_y_continuous(expand = expansion(0, 0)) +
theme_excel()
#Total Transactions within the Berry Category
transactions %>%
inner_join(products, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(product_category == "BERRIES", month %in% c(7, 8, 9, 10, 11, 12)) %>%
group_by(month, product_type) %>%
summarize(TT = n()) %>%
ggplot(aes(x= month, y = TT)) +
geom_bar( position= 'dodge',
stat= 'identity') +
facet_wrap(~product_type) +
theme_excel() +
scale_x_continuous(breaks = seq(7,12, by = 1),
labels = c("Jul", "Aug", "Sep","Oct","Nov", "Dec")) +
labs( x = "Month",
y= "Total Transactions",
title = "Total Transactions within Berry Category") +
scale_y_continuous(expand = expansion(0, 0))
#Total Transactions for Blueberries, Peaches, and Strawberries
transactions %>%
inner_join(products, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(department=="PRODUCE",
product_type == c("BLUEBERRIES", "STRAWBERRIES", "PEACHES YELLOW FLESH"),
month %in% c(7, 8, 9, 10, 11, 12)) %>%
group_by(month, product_type) %>%
summarize(TT = n()) %>%
ggplot(aes(x= month, y = TT,
fill = product_type) ) +
geom_bar( position= 'dodge',
stat= 'identity') +
theme_excel() +
geom_smooth(method = "glm",
aes(group = product_type, color = product_type),
se = FALSE,
method.args = list(family = "quasipoisson")) +
labs( x = "Month",
y= "Total Transactions",
title = str_wrap("Total Transactions for Blueberries, Strawberries, and Peaches"),
fill = "Product") +
scale_x_continuous(breaks = seq(7,12, by = 1),
labels = c("Jul", "Aug", "Sep","Oct","Nov", "Dec")) +
scale_y_continuous(expand = expansion(0, 0)) +
scale_color_manual(name = "Trend Lines",
values = c("BLUEBERRIES" = "red",
"STRAWBERRIES" = "blue",
"PEACHES YELLOW FLESH" = "green"))
Since we identified our target products, the next step is to identify other items that were commonly bought with the target items and had increasing transaction rates throughout the year. To do this, we first created a new variable (called baskets_with_fruit) to group the transaction data with the product data to pull the list of every single basket ID in which a blueberry, strawberry, or peach (yellow flesh) was purchased.
# New variable creation: baskets with blueberries, strawberries, and peaches
baskets_with_fruit<-
transactions %>%
inner_join(products, "product_id") %>%
filter(product_type %in% c("BLUEBERRIES","STRAWBERRIES","PEACHES YELLOW FLESH")) %>%
group_by(basket_id) %>%
summarize(basketnumber=n()) %>%
select(basket_id)
Using the new variable we just created (list of basket IDs), we can now create a graph of the top 10 complements (for ease of viewing) and table of the top 50 complements. This will allow us to understand which products are the most frequently accompanied with the purchases of our target goods.
#Top 10 most bought items with strawberries, blueberries, and peaches.
transactions %>%
inner_join(products, "product_id") %>%
inner_join(baskets_with_fruit, by = "basket_id") %>%
group_by(basket_id, product_type) %>%
summarize(total_transactions = n()) %>%
group_by(product_type) %>%
summarize(transactions = sum(total_transactions)) %>%
filter(!product_type %in% c("STRAWBERRIES", "BLUEBERRIES","PEACHES YELLOW FLESH")) %>%
mutate(product_type = reorder(product_type, -transactions)) %>%
top_n(10) %>%
ggplot(aes(x=product_type,
y=transactions,
fill = product_type)) +
geom_col(label = NULL) +
theme_excel() +
labs( x = NULL,
y = "Transactions",
title = str_wrap("Most Frequently Bought Items With Blueberries, Strawberries, and Peaches (Yellow Flesh)",
width = 40),
fill = "Product") +
theme(axis.text.x = element_blank()) +
scale_y_continuous(expand = expansion(0, 0),
limits = c(0,4000))
#List creation: Top 50 items bought with strawberries, blueberries, and peahces.
complement_list<-
transactions %>%
inner_join(products, "product_id") %>%
inner_join(baskets_with_fruit, by = "basket_id") %>%
group_by(basket_id, product_type) %>%
summarize(popular_products = n()) %>%
group_by(product_type)%>%
summarize(popular_product = sum(popular_products)) %>%
filter(!product_type %in% c("STRAWBERRIES", "BLUEBERRIES", "PEACHES YELLOW FLESH")) %>%
arrange(desc(popular_product)) %>%
top_n(50)
Complement_List <- complement_list
names(Complement_List) <- c("Product Type", "Transactions")
datatable(Complement_List)
At first glance, yogurt, milk, and bananas seem to be the best complements for our target products. However, upon using an alternative method, we found that some other products may be better. Instead of just looking at volume of product, we looked to understand which products have positive sales trends in the later months of the year. That way, when customers are looking to get these products that hold heavy winter seasonality they will also be inclined to buy more of our target products. In the accompanying series of charts, the transaction totals by month are shown.
#Total transactions of products bought with blueberries, peaches, and strawberries
transactions %>%
inner_join(products, "product_id") %>%
inner_join(baskets_with_fruit, by = "basket_id") %>%
group_by(basket_id, product_type) %>%
summarize(popular_products = n()) %>%
group_by(product_type) %>%
summarize(popular_product = sum(popular_products)) %>%
filter(!product_type %in% c("STRAWBERRIES", "BLUEBERRIES", "PEACHES YELLOW FLESH")) %>%
arrange(desc(popular_product)) %>%
top_n(20) %>%
inner_join(products, "product_type") %>%
inner_join(transactions, "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(month %in% 7:12) %>%
group_by(month,product_type) %>%
summarize(total_transactions = n()) %>%
ggplot(aes( x = month,
y = total_transactions)) +
geom_col() +
facet_wrap(~product_type,
labeller = labeller(product_type = label_wrap_gen(width = 10))) +
theme_excel() +
labs( x = "Month", y = "Total Transactions", title = "Total Transactions of Complementary Products") +
scale_x_continuous(breaks = seq(7,12, by = 1),
labels = c("Jul", "Aug", "Sep","Oct","Nov","Dec")) + scale_y_continuous(expand = expansion(0, 0)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Just by looking at the plots of the top 20 most frequently purchased items in these basket IDs, only Sft Drink 2 Liter Btls has any kind of significant upward trend (can also see Figure 1.2 in appendix). To try and obtain a more accurate picture of how much a trend may exist, we created our own growth variable. By taking the end of year sales, subtracting July sales, and finding the percent change since July we were able to rank the complement products in terms of growth over the back half of the year. In our efforts, we increased our complement range to the top 50 items.
We then identified that RTS Soup, Cream Cheese, and Condensed Soups offered the greatest potential as they had the three largest growth rate among our fruit complements. When plotted against one another, as seen below, it appears that these three products would be great to create joint coupons/advertising for.
#List creation: Total transactions of compliment goods in December.
dec_transactions <-
transactions %>%
inner_join(products, by = "product_id") %>%
inner_join(baskets_with_fruit, by = "basket_id") %>%
group_by(basket_id, product_type) %>%
summarize(popular_products = n()) %>%
group_by(product_type) %>%
summarize(popular_product =
sum(popular_products)) %>%
filter(!product_type %in% c("STRAWBERRIES", "BLUEBERRIES", "PEACHES YELLOW FRUIT")) %>%
arrange(desc(popular_product)) %>%
top_n(50) %>%
inner_join(products, by = "product_type") %>%
inner_join(transactions, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(month == 12) %>%
group_by(product_type) %>%
summarize(total_transactions = n())
#List creation: Total transactions of compliment goods in July
july_transactions <-
transactions %>%
inner_join(products, by = "product_id") %>%
inner_join(baskets_with_fruit, by = "basket_id") %>%
group_by(basket_id, product_type) %>%
summarize(popular_products = n()) %>%
group_by(product_type) %>%
summarize(popular_product = sum(popular_products)) %>%
filter(!product_type %in%
c("STRAWBERRIES", "BLUEBERRIES", "PEACHES YELLOW FRUIT")) %>%
arrange(desc(popular_product)) %>%
top_n(50) %>%
inner_join(products, by = "product_type") %>%
inner_join(transactions, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(month == 7) %>%
group_by(product_type) %>%
summarize(total_transactions = n())
#Table creation: Growth rate of compliment goods from July to December.
growth <- dec_transactions %>%
inner_join(july_transactions, by = "product_type") %>%
mutate(growth_rate =
round(100*((total_transactions.x / total_transactions.y) - 1)),2) %>%
select(product_type, growth_rate) %>%
arrange(desc(growth_rate))
#Growth Table
names(growth) <- c("Product Type", "Growth Rate %")
datatable(growth)
#Total transactions of soups and fruits
transactions %>%
inner_join(products, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter((department == "PRODUCE" & product_type %in% c("BLUEBERRIES", "STRAWBERRIES", "PEACHES YELLOW FLESH")) |
product_type %in% c("CREAM CHEESE", "RTS SOUP: CHUNKY/HOMESTYLE ET", "CONDENSED SOUP"),
month %in% c(7, 8, 9, 10, 11, 12)) %>%
group_by(month, product_type) %>%
summarize(TT = n()) %>%
mutate(product_type = factor(product_type,
levels = c("BLUEBERRIES", "STRAWBERRIES", "PEACHES YELLOW FLESH", "CREAM CHEESE", "RTS SOUP: CHUNKY/HOMESTYLE ET", "CONDENSED SOUP"))) %>%
ggplot(aes(x = month,
y = TT,
fill = product_type)) +
geom_bar(position = 'dodge',
stat = 'identity') +
labs(x = "Month",
y = "Total Transactions",
title = str_wrap("Total Transactions for All Target Products and Complements", width = 40),
fill = "Product") +
theme_excel() +
geom_smooth(method = "glm",
aes(group = product_type,
color = product_type),
se = FALSE,
method.args = list(family = "quasipoisson")) +
scale_x_continuous(breaks = seq(7,12, by = 1),
labels = c("Jul", "Aug", "Sep","Oct","Nov", "Dec")) +
scale_y_continuous(expand = expansion(0, 0)) +
scale_color_manual(name = "Trend Lines",
values = c("BLUEBERRIES" = "red", "STRAWBERRIES" = "yellow", "PEACHES YELLOW FLESH" = "green", "CREAM CHEESE" = "blue", "RTS SOUP: CHUNKY/HOMESTYLE ET" = "navy", "CONDENSED SOUP" = "purple"))
Since we now understand which complements to target, we began to do some preliminary demographic analysis to understand how we can cater our ads to be as effective as possible.
# Total transactions by product type and income
transactions %>%
inner_join(products, "product_id") %>%
inner_join(demographics, "household_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(month %in% 7:12,
product_type %in% c("BLUEBERRIES", "STRAWBERRIES", "PEACHES YELLOW FLESH", "CREAM CHEESE", "RTS SOUP: CHUNKY/HOMESTYLE ET", "CONDENSED SOUP")) %>%
group_by(month, income, product_type) %>%
summarize(total_transactions = n()) %>%
# Reorder product_type levels
mutate(product_type = factor(product_type,
levels = c("BLUEBERRIES", "STRAWBERRIES", "PEACHES YELLOW FLESH", "CREAM CHEESE", "RTS SOUP: CHUNKY/HOMESTYLE ET", "CONDENSED SOUP"))) %>%
ggplot(aes(x = month,
y = total_transactions,
fill = product_type)) +
geom_col() +
scale_fill_manual(values = c("BLUEBERRIES" = "purple", "STRAWBERRIES" = "blue",
"PEACHES YELLOW FLESH" = "green",
"CREAM CHEESE" = "yellow",
"RTS SOUP: CHUNKY/HOMESTYLE ET" = "orange",
"CONDENSED SOUP" = "red"),
name = "Product Type") +
facet_wrap(~income) +
labs(x = "Month",
y = "Total Transactions",
title = "Total Transactions by Product Type and Income",
fill = "Product Type") + # Title for the legend
scale_y_continuous(expand = expansion(0,0),
breaks = seq(0, 500, by = 50)) +
scale_x_continuous(breaks = seq(7, 12, by = 1),
labels = c("Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
#Demographics of households who purchase RTS Soup, Condensed Soup, and Cream Cheese
transactions %>%
inner_join(products, "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(month %in% 7:12, product_type %in% c("RTS SOUP: CHUNKY/HOMESTYLE ET", "CONDENSED SOUP", "CREAM CHEESE")) %>%
distinct(household_id, .keep_all = TRUE) %>%
inner_join(demographics, "household_id") %>%
group_by(household_size, month) %>%
summarize(total_transactions = n()) %>%
ggplot(aes(x = month,
y = total_transactions)) +
geom_col() +
facet_wrap(~household_size) +
labs(x = "Month",
y = "Total Transactions") +
scale_y_continuous(expand = expansion(0, 0),
breaks = seq(0, 130,
by = 10),
limits = c(0,130)) +
scale_x_continuous(breaks = seq(7, 12, by = 1),
labels = c("Jul", "Aug", "Sep", "Oct", "Nov", "Dec")) +
theme_excel() +
ggtitle(str_wrap("Demographics of Households Who Purchase RTS Soup, Condensed Soup and Cream Cheese", width = 40))
After using both of the graphs above, it is clear that the target audience for these coupons and advertisements should be one or two individual households that have an income in the 35K - 74K range. By this logic, coupons should likely focus more on pairing individual-sized servings as opposed to “family-sized” meals and focus more on price deductions rather than increased volume.
At the beginning of the analysis we sought out to provide some cohesive solutions to the issue of seasonal products having dramatic down ticks in sales. Through analyzing department wide transaction data we were able to identify the Produce Department as being an area of extreme interest. While examining the product-categories within produce, we discovered that strawberries, blueberries, and peaches contributed to the significant downturn of the produce department. By backtracking the Basket IDs of orders in which strawberries, blueberries, or peaches were brought we then compiled a list of which products were most commonly in those baskets. Of these products, we examined which had the most significant growth rates by creating a new variable to analyze the change in total transactions across the back half of the year. This led us to find soups and cream cheese to be a viable solution to pair our fruit products with. Finally, we examined the consumer demographics of the individuals who bought all six products as well as who are currently buying soups and cream cheese to understand how a marketing/advertising campaign should be targeted. This concluded our strategic analysis and provided us with a clear recommendation for Regork.
Through our analysis we feel that creating a coupon campaign to promote the joint purchase of blueberries, strawberries, and peaches along with soups and cream cheese would be a very effective way to minimize the down-season trends that many of these fruit products have. Within the campaign, marketers should aim to focus more on pairing individual-sized servings as opposed to “family-sized” meals and focus more on price deductions rather than increased volume. In addition, it should be taken into account that most individuals who use these coupons will likely be in the 35K - 74K income range.
Although our analysis was thorough, it is limited by the groupings of the data. Marketing should look more closely into the trends of each individual product to examine what single complement(s) might be best for blueberries, strawberries, or peaches alone. Additionally, it would be useful to examine each product_type by their product_ids to understand if perhaps there are specific listings that contribute more significantly to these trends than others. Finally, it may be good to understand if there is any historical background to which income/demographic groups use coupons so that the campaign can be tailored for the best possible target audience.
#Total Monthly Transactions by Department Including the Grocery Department
transactions %>%
inner_join(products, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
group_by(month, department) %>%
summarize(TT = n()) %>%
ggplot(aes(x = month, y = TT, color = department)) +
geom_line() +
scale_x_continuous("Month",
breaks = seq(1, 12, by = 1),
labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug","Sep","Oct","Nov","Dec")) +
scale_y_continuous("Total Transactions",
breaks = seq(0, 85000, by = 5000),
limits = c(0,85000),
expand = expansion(0, 0)) +
labs(color = "Department") +
ggtitle("Total Monthly Transactions by Department")
#Seasonal Growth of Complement Procuts
complement_list %>%
top_n(20) %>%
inner_join(products, by = "product_type") %>%
inner_join(transactions, by = "product_id") %>%
mutate(month = month(transaction_timestamp)) %>%
filter(month %in% 7:12) %>%
group_by(month, product_type) %>%
summarize(total_transactions = n()) %>%
ggplot(aes(x = as.Date(paste0("2024-", month, "-01")),
y = total_transactions,
color = product_type)) +
geom_line() +
labs(x = "Month",
y = "Total Transactions",
title = "Seasonal Growth of Complement Products",
color = "Product Type") +
scale_x_date(date_labels = "%b",
date_breaks = "1 month") +
theme_minimal() +
scale_y_continuous(expand = expansion(0, 0))