The business problem, or growth opportunity, I am trying to solve is what products are often sold together and what discounts and campaigns can be marketed to increase the sales for these two items or related items.
I used the complete journey dataset, specifically the transactions and products datasets, by merging them together and filtering the transactions to focus on Gift Tags and Mini Ornaments. I calculated how often the two items appear together, the likelihood of one being purchased with the other, and the strength of association between the two products. I then used the data to come up with strategic recommendations for promotions within Regork.
In order to boost the combination sales of these products, Regork could create promotions offering discounts on the products if purchased together, or discounts on other related products, such as Christmas decorations, wrapping paper, Christmas lights, etc., when these two items are also purchased. Regork could post social media advertisements, as well as make end-cap displays with these products along with other Christmas essentials, promoting the discount, to catch customers’ eyes. They could also send specialized advertisements and promotions to customers who have bought these items in previous years or frequently in the past. These solutions would ultimately drive the sales pairs of gift tags and mini ornaments.
completejourney- data package that represents shopping transactions over one year from a group of 2,469 households who are buy products frequently at a national grocery chain, Regork.
tidyverse- system of packages used for data manipulation and visualization.
ggplot2- data visualization package that maps variables to aesthetics.
DT- this package gives you the ability to make datatables.
library(completejourney)
library(tidyverse)
library(ggplot2)
library(DT)
transactions <- get_transactions()
products
## # A tibble: 92,331 × 7
## product_id manufacturer_id department brand product_category product_type
## <chr> <chr> <chr> <fct> <chr> <chr>
## 1 25671 2 GROCERY Natio… FRZN ICE ICE - CRUSH…
## 2 26081 2 MISCELLANEOUS Natio… <NA> <NA>
## 3 26093 69 PASTRY Priva… BREAD BREAD:ITALI…
## 4 26190 69 GROCERY Priva… FRUIT - SHELF S… APPLE SAUCE
## 5 26355 69 GROCERY Priva… COOKIES/CONES SPECIALTY C…
## 6 26426 69 GROCERY Priva… SPICES & EXTRAC… SPICES & SE…
## 7 26540 69 GROCERY Priva… COOKIES/CONES TRAY PACK/C…
## 8 26601 69 DRUG GM Priva… VITAMINS VITAMIN - M…
## 9 26636 69 PASTRY Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691 16 GROCERY Priva… PNT BTR/JELLY/J… HONEY
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
transa_prod <- transactions %>%
left_join(products, by = "product_id") %>%
select(basket_id, product_type)
transa_prod <- transa_prod %>%
filter(!is.na(product_type))
datatable(head(transa_prod, 100), options = list(pageLength = 10))
After joining transactions and products by their product ID, and filtering the columns to only show basket ID and product type, with no blanks, I was left with 1,462,997 observations.
pairs <- transa_prod %>%
inner_join(transa_prod, by = "basket_id") %>%
filter(product_type.x != product_type.y) %>%
count(product_type.x, product_type.y, sort = TRUE)
top_pairs <- pairs %>%
top_n(20, wt = n) %>%
mutate(pair = paste(product_type.x, "&", product_type.y))
ggplot(top_pairs, aes(x = reorder(pair, n), y = n)) +
geom_col(fill = "steelblue") +
coord_flip() +
labs( title = "Top 10 Frequently Purchased Product Pairs",
x = "Product Pair",
y = "Number of Co-Purchases")

This bar chart shows the top 20 paired transactions by how many times the two products were bought together. We can see that single pack yogurts and milk are co-purchsed the most.
item_counts <- transa_prod %>%
count(product_type) %>%
rename(item_count = n)
pair_metrics <- pairs %>%
left_join(item_counts, by = c("product_type.x" = "product_type")) %>%
rename(count_x = item_count) %>%
left_join(item_counts, by = c("product_type.y" = "product_type")) %>%
rename(count_y = item_count) %>%
mutate(
support = n / nrow(transactions),
confidence = n / count_x,
lift = confidence / (count_y / nrow(transactions))
) %>%
arrange(desc(lift))
top_lift_matrix <- pair_metrics %>%
top_n(15, wt = lift)
ggplot(top_lift_matrix, aes(x = product_type.x, y = product_type.y, fill = lift)) +
geom_tile() +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(title = "Heatmap of Top Product Pairs by Lift",
x = "Product 1",
y = "Product 2",
fill = "Lift Score") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_discrete(expand = expansion(mult = .2)) +
theme(axis.text.y = element_text(size = 10))

This graph is a heatmap of the top product pairs by lift, or their strength of association. The darker blue markers represent those pairs with a higher lift score, and the light blue markers represent those with a lower lift score. As you can see, wings and drums, measure and layout products and bath hardware, and vodka and gin are the three top products by lift.
In this analysis, I identified frequently purchased product pairs, and which customers bought the products that I think would be best for advertising and promotions. Specifically, I focused on Gift Tags and Mini Ornaments.
To answer the business problem, I used the compelete journey data set, specifically the transactions and products data. I merged the two datasets together to analyze product pairings. I filtered the transactions to focus on gift tags and mini ornaments. I calculated the co-purchase frequency, confidence, and lift scores of product pairs to see the relationships between items. I was able to visualize all of these findings through different charts and graphs using the ggplot2 package.
The top 20 product pairings showed that yogurt and milk, vodka and gin, and drumsticks and wings are bought together the most. I, however, focused on gift tags and mini ornaments being purchased together, which also had a high lift score. The highest lift scores showed that products within the same category, such as household items, or food groups, are bought together the most often. When looking at seasonal items, gift tags and mini ornaments had a much higher lift score than other related products, such as wrapping paper.
I recommend offering discounts when purchasing gift tags and mini ornaments together, especially leading up to Christmas or right after. Providing additional discounts on related products, such as wrapping paper, Christmas lights, etc. could drive sales of these two products as well. I also recommend displaying gift tags and mini ornaments on end-cap displays so that they catch customers’ eyes and lead to impulse purchases. Making a holiday section in the store could also drive the combined sales of these products and related products. Using email and digital advertisements can be a way of targeting customers that have purchased similar items in the past or repeat customers. Launching these promotions and deals in late fall and continuing until January or February would maximize these sales opportunities.
This analysis could be improved or built on in numerous ways. One example would be to measure the seasonality trends of these products in order to refine the timing of promotions and sales. Another example is using the demographics data to dive deeper into age, income, purchase history, etc. of frequent customers who purchased these types of items. This would allow you to target specific customers and take marketing strategies even further. Last, diving into other product pairs with high lift scores, such as yogurt and milk, or vodka and gin, could benefit Regork and give many more marketing and promotion opportunities.