Introduction

Business Problem

The business problem, or growth opportunity, I am trying to solve is what products are often sold together and what discounts and campaigns can be marketed to increase the sales for these two items or related items.

How I addressed the problem

I used the complete journey dataset, specifically the transactions and products datasets, by merging them together and filtering the transactions to focus on Gift Tags and Mini Ornaments. I calculated how often the two items appear together, the likelihood of one being purchased with the other, and the strength of association between the two products. I then used the data to come up with strategic recommendations for promotions within Regork.

Proposed Solution

In order to boost the combination sales of these products, Regork could create promotions offering discounts on the products if purchased together, or discounts on other related products, such as Christmas decorations, wrapping paper, Christmas lights, etc., when these two items are also purchased. Regork could post social media advertisements, as well as make end-cap displays with these products along with other Christmas essentials, promoting the discount, to catch customers’ eyes. They could also send specialized advertisements and promotions to customers who have bought these items in previous years or frequently in the past. These solutions would ultimately drive the sales pairs of gift tags and mini ornaments.

Packages/Libraries Required

completejourney- data package that represents shopping transactions over one year from a group of 2,469 households who are buy products frequently at a national grocery chain, Regork.

tidyverse- system of packages used for data manipulation and visualization.

ggplot2- data visualization package that maps variables to aesthetics.

DT- this package gives you the ability to make datatables.

Load Packages

library(completejourney)
library(tidyverse)
library(ggplot2)
library(DT)

Load Transactions & Products

transactions <- get_transactions()
products
## # A tibble: 92,331 × 7
##    product_id manufacturer_id department    brand  product_category product_type
##    <chr>      <chr>           <chr>         <fct>  <chr>            <chr>       
##  1 25671      2               GROCERY       Natio… FRZN ICE         ICE - CRUSH…
##  2 26081      2               MISCELLANEOUS Natio… <NA>             <NA>        
##  3 26093      69              PASTRY        Priva… BREAD            BREAD:ITALI…
##  4 26190      69              GROCERY       Priva… FRUIT - SHELF S… APPLE SAUCE 
##  5 26355      69              GROCERY       Priva… COOKIES/CONES    SPECIALTY C…
##  6 26426      69              GROCERY       Priva… SPICES & EXTRAC… SPICES & SE…
##  7 26540      69              GROCERY       Priva… COOKIES/CONES    TRAY PACK/C…
##  8 26601      69              DRUG GM       Priva… VITAMINS         VITAMIN - M…
##  9 26636      69              PASTRY        Priva… BREAKFAST SWEETS SW GDS: SW …
## 10 26691      16              GROCERY       Priva… PNT BTR/JELLY/J… HONEY       
## # ℹ 92,321 more rows
## # ℹ 1 more variable: package_size <chr>
transa_prod <- transactions %>%
  left_join(products, by = "product_id") %>%
  select(basket_id, product_type)

transa_prod <- transa_prod %>%
  filter(!is.na(product_type))
  
datatable(head(transa_prod, 100), options = list(pageLength = 10))

After joining transactions and products by their product ID, and filtering the columns to only show basket ID and product type, with no blanks, I was left with 1,462,997 observations.

Top product pairings by basket ID

pairs <- transa_prod %>%
  inner_join(transa_prod, by = "basket_id") %>%
  filter(product_type.x != product_type.y) %>%
  count(product_type.x, product_type.y, sort = TRUE)

top_pairs <- pairs %>%
  top_n(20, wt = n) %>%
  mutate(pair = paste(product_type.x, "&", product_type.y))

ggplot(top_pairs, aes(x = reorder(pair, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs( title = "Top 10 Frequently Purchased Product Pairs",
        x = "Product Pair",
        y = "Number of Co-Purchases")

This bar chart shows the top 20 paired transactions by how many times the two products were bought together. We can see that single pack yogurts and milk are co-purchsed the most.

Top pairs by Lift

item_counts <- transa_prod %>%
  count(product_type) %>%
  rename(item_count = n)

pair_metrics <- pairs %>%
  left_join(item_counts, by = c("product_type.x" = "product_type")) %>%
  rename(count_x = item_count) %>%
  left_join(item_counts, by = c("product_type.y" = "product_type")) %>%
  rename(count_y = item_count) %>%
  mutate(
    support = n / nrow(transactions),
    confidence = n / count_x,
    lift = confidence / (count_y / nrow(transactions))
  ) %>%
  arrange(desc(lift))
  
top_lift_matrix <- pair_metrics %>%
  top_n(15, wt = lift)

ggplot(top_lift_matrix, aes(x = product_type.x, y = product_type.y, fill = lift)) +
  geom_tile() +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  labs(title = "Heatmap of Top Product Pairs by Lift",
       x = "Product 1",
       y = "Product 2",
       fill = "Lift Score") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  scale_y_discrete(expand = expansion(mult = .2)) +
  theme(axis.text.y = element_text(size = 10))

This graph is a heatmap of the top product pairs by lift, or their strength of association. The darker blue markers represent those pairs with a higher lift score, and the light blue markers represent those with a lower lift score. As you can see, wings and drums, measure and layout products and bath hardware, and vodka and gin are the three top products by lift.

Top Product Pairings for Gift Tags & Mini Ornaments

tag_pairs <- pair_metrics %>%
  filter(product_type.x == "GIFT TAGS" | product_type.y == "GIFT TAGS") %>%
  arrange(desc(lift))


ornament_pairs <- pair_metrics %>%
  filter(product_type.x == "MINI ORNAMENTS" | product_type.y == "MINI ORNAMENTS") %>%
  arrange(desc(lift))

combined_pairs <- bind_rows(tag_pairs, ornament_pairs) %>%
  top_n(10, lift) %>%  # Select top 10 lift values
  mutate(pair = paste(product_type.x, "&", product_type.y))

ggplot(combined_pairs, aes(x = lift, y = reorder(pair, lift))) +
  geom_point(size = 5) +
  labs(title = "Top Product Pairings for Gift Tags & Mini Ornaments",
       x = "Lift Score",
       y = "Product Pair")

This graph represents the top product pairings for either gift tags or mini ornaments by lift. These two items have the highest lift score of being purchased together. All other pairs have a significantly lower lift score.

transa_prod_2 <- transactions %>%
  left_join(products, by = "product_id") %>%
  select(basket_id, product_type, household_id)
  
  
gift_tags_transactions <- transa_prod_2 %>%
  filter(product_type == "GIFT TAGS")

mini_ornaments_transactions <- transa_prod_2 %>%
  filter(product_type == "MINI ORNAMENTS")

gift_tags_counts <- gift_tags_transactions %>%
  count(household_id, name = "gift_tags_purchases")

mini_ornaments_counts <- mini_ornaments_transactions %>%
  count(household_id, name = "mini_ornaments_purchases")

customer_purchases <- full_join(gift_tags_counts, mini_ornaments_counts, by = "household_id") %>%
  replace_na(list(gift_tags_purchases = 0, mini_ornaments_purchases = 0)) %>%
  mutate(total_purchases = gift_tags_purchases + mini_ornaments_purchases) %>%
  arrange(desc(total_purchases))

customer_purchases %>%
  top_n(10, total_purchases) %>%
  pivot_longer(cols = c(gift_tags_purchases, mini_ornaments_purchases), 
               names_to = "product", values_to = "count") %>%
  ggplot(aes(x = reorder(household_id, total_purchases), y = count, fill = product)) +
  geom_col() +
  coord_flip() +
  labs(title = "Customers by Gift Tags & Mini Ornament Purchases",
       x = "Household ID",
       y = "Purchase Count",
       fill = "Product")

This graph shows the households who purchased gift tags or mini ornaments. Regork could send these households personalized promotions or advertisements via email so that they want to buy more of these products.

Summary

In this analysis, I identified frequently purchased product pairs, and which customers bought the products that I think would be best for advertising and promotions. Specifically, I focused on Gift Tags and Mini Ornaments.

How I Answered the Business Problem

To answer the business problem, I used the compelete journey data set, specifically the transactions and products data. I merged the two datasets together to analyze product pairings. I filtered the transactions to focus on gift tags and mini ornaments. I calculated the co-purchase frequency, confidence, and lift scores of product pairs to see the relationships between items. I was able to visualize all of these findings through different charts and graphs using the ggplot2 package.

Insights

The top 20 product pairings showed that yogurt and milk, vodka and gin, and drumsticks and wings are bought together the most. I, however, focused on gift tags and mini ornaments being purchased together, which also had a high lift score. The highest lift scores showed that products within the same category, such as household items, or food groups, are bought together the most often. When looking at seasonal items, gift tags and mini ornaments had a much higher lift score than other related products, such as wrapping paper.

Implications and Recommendations

I recommend offering discounts when purchasing gift tags and mini ornaments together, especially leading up to Christmas or right after. Providing additional discounts on related products, such as wrapping paper, Christmas lights, etc. could drive sales of these two products as well. I also recommend displaying gift tags and mini ornaments on end-cap displays so that they catch customers’ eyes and lead to impulse purchases. Making a holiday section in the store could also drive the combined sales of these products and related products. Using email and digital advertisements can be a way of targeting customers that have purchased similar items in the past or repeat customers. Launching these promotions and deals in late fall and continuing until January or February would maximize these sales opportunities.

Limitations

This analysis could be improved or built on in numerous ways. One example would be to measure the seasonality trends of these products in order to refine the timing of promotions and sales. Another example is using the demographics data to dive deeper into age, income, purchase history, etc. of frequent customers who purchased these types of items. This would allow you to target specific customers and take marketing strategies even further. Last, diving into other product pairs with high lift scores, such as yogurt and milk, or vodka and gin, could benefit Regork and give many more marketing and promotion opportunities.