Market Basket Analysis: Charcuterie Box Association

Introduction

Business Questions:

  • The goal of this project is to identify areas of growth for a national grocery chain named Regork, where the company could invest future resources to increase revenue and profits. We will be using transaction data from the company database. More information about the data set can be found here

In this project, we plan to explore, analyze and visualize the data to ultimately answer the below three questions:

  1. What are the trends of charcuterie board items purchases? Do they tend to peak around at special holidays?

  2. Is there a certain demographic group is generating large amounts of revenue for these products and we could invest to capture market share in this area?

  3. What types of products are usually bought together for a common charcuterie board? How can we design effective promotions that encourage customers to buy more products together as listed in our common charcuterie board list?

This analysis will help the Regork better their marketing strategies to optimize sales and customer retention.

Common items that are usually included in charcuterie board/box are:

  • Cured Meats like prosciutto, salami, chorizo, or bresaola.

  • Cheeses: A variety of cheeses with different flavors and textures, such as brie, cheddar, goat cheese, or blue cheese.

  • Bread and Crackers: Baguette slices, crackers, or crostini.

  • Fruits: Grapes, figs, berries, or dried fruits like apricots or dates.

  • Nuts: Almonds, walnuts, pecans, or pistachios.

  • Olives: Kalamata or other olives.

  • Spreads: Honey, jam, chutney, or mustard.

Packages Required

The below libraries are used in this project. Most are data wrangling packages except completejourney, which is the transaction data and arules which is association rule mining package. You will need to install these packages to replicate the analysis

library(ggplot2)
library(completejourney)
library(tidyverse)
library(ggrepel)
library(scales)
library(arules)
library(arulesViz)
library(knitr)

Data Prep

Importing the Data

  1. We first import the data from completejournery library with additional of one field called transaction id which is the primary key of the table for us to later remove duplications after joining to products and coupons, household and redemptions.

For the purpose of this analysis - a table charcuterie_board included items mentioned in introduction is created to search againts the product_type

# Create a transaction ID with increment of 1 to remove dups after join
transactions <- get_transactions() %>%
  mutate(transaction_id = row_number()) %>%
  select(transaction_id, everything())

charcuterie_board <- list(
  Cured_Meats = c("Prosciutto", "Salami", "Chorizo", "Bresaola"),
  Cheeses = c("Cheese","Brie", "Cheddar", "Goat Cheese", "Blue Cheese"),
  Bread_and_Crackers = c("Baguette Slices", "Crackers","Bread","Crostini"),
  Fruits = c("Fruits","Grapes", "Figs"),
  Nuts = c("Almonds", "Walnuts", "Pecans", "Pistachios"),
  Olives = c("Kalamata Olives"),
  Spreads = c("Honey", "Mustard"))

data_joined <- transactions %>%
  left_join(products, by = "product_id") %>%
  left_join(coupons, by = "product_id", relationship = "many-to-many") %>%
  left_join(demographics, by = "household_id") %>%
  left_join(campaigns,
            by = c("household_id", "campaign_id"),
            relationship = "many-to-many") %>%
  left_join(
    coupon_redemptions,
    by = c("household_id", "campaign_id", "coupon_upc"),
    relationship = "many-to-many"
  ) %>%
  distinct(transaction_id, .keep_all = TRUE) %>% # Keep one row per transaction_id
  group_by(transaction_id) %>%
  filter(is.na(redemption_date) |
           row_number() == 1) %>%  # Retain one row if redemption_date is NULL
  ungroup()

Exploratory Data Analysis

  1. We filter our common items list for charcuterie board or any combinations of the occurrences and visualize the trend of these purchases throughout the year with the sales values and demographic groups that purchase these products.

Purchase Demographics

# Plot demographic group that purchase items
purchasing_demographics <- interested_transactions %>%
  mutate(loyalty_sales = sales_value - (retail_disc+coupon_match_disc) / quantity,
         loyalty_sales = if_else(loyalty_sales<0.0,0.0,loyalty_sales )) %>%
    mutate(loyalty_sales = replace_na(loyalty_sales, 0)) %>%  # Replace NA with 0
  group_by(age) %>%
  summarise( total_sales = sum(sales_value),
    total_loyalty_sales = sum(loyalty_sales)) 

purchasing_demographics_long <- purchasing_demographics %>%
  pivot_longer(cols = c(total_sales, total_loyalty_sales),
               names_to = "sales_type",
               values_to = "amount")

second_highest_age_group <- purchasing_demographics_long %>%
  filter(!is.na(age)) %>%
  slice_max(order_by= amount, n=1) 

ggplot(purchasing_demographics_long, aes(x = age, y = amount, fill = sales_type)) +
  geom_bar(stat = "identity", position = "dodge") +
    scale_y_continuous(labels = scales::dollar) +
  labs(title = "Total Sales and Loyalty Sales by Age",
       x = "Age Group",
       y = "Sales Amount") +
  theme_minimal() +
  geom_text(data = second_highest_age_group, aes(x = age, y = amount+1000, label = "Age group 45-54 is the second \n highest group in sales for selected products"), size = 5, vjust = 0)

  • The highest sales generated come from Age Group Unknown ~ $120,000 and Age Group 45-54 ~ $65,000. What interesting is that sales tends to increase with Age then decreases after the middle life mark - possibly they are not the ones doing the shopping but their children?

Association Analysis

  1. In our case, basket id is considered a transaction in terms of association analysis. So, we group baskets together to create dummy variables which indicates if an item is present in the basket to feed into our item set for basic association analysis.

  2. What is the likelihood that for a given basket, the items mentioned are purchased together? What is the likelihood when they are not or what is the likelihood that either any combinations of them are purchased together?

Before we interpret our result, we will use three important metrics used in association rule mining: Support, Confidence, and Lift.

Support measures the proportion of transactions that contain a specific itemset \(A\). In this case, what is the proportion of transactions that contains Cheese or Deli Meat? The total number of transactions in this case is the interested transactions that includes selected products, not all transactions at the store - so we don’t introduce noise to the model.

Formula: \[ \text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}} \]

Confidence measures how often items in \(B\) appear in transactions that contain \(A\). In this case, how often Deli Meat appear in transactions that contains Cheese?

Formula: \[ \text{Confidence}(A \Rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)} \]

Lift measures the increase in the likelihood of purchasing \(B\) when \(A\) is purchased, compared to the likelihood of purchasing \(B\) independently. Determine the effectiveness of a rule. A lift greater than 1 indicates a positive correlation, suggesting that the presence of \(A\) increases the likelihood of purchasing \(B\). In other words, Cheese might increase the likelihood of purchasing Meat or otherwise.

Formula: \[ \text{Lift}(A \Rightarrow B) = \frac{\text{Confidence}(A \Rightarrow B)}{\text{Support}(B)} \]

interested_basket <- interested_transactions %>%
  group_by(basket_id) %>%
  summarize(product_list = str_c(unique(product_type), collapse = ";"))

#dummy variable creation
model_basket <- interested_basket %>%
  mutate(
    cured_meats = if_else(str_detect(
      product_list, regex(paste(c("meat", unlist(charcuterie_board$Cured_Meats)), collapse = "|"), ignore_case = TRUE)), TRUE, FALSE),
    cheeses = if_else(str_detect(
      product_list, regex(paste(unlist(charcuterie_board$Cheeses), collapse = "|"), ignore_case = TRUE)), TRUE, FALSE),
    bread_and_crackers = if_else(str_detect(
      product_list, regex(paste(unlist(charcuterie_board$Bread_and_Crackers), collapse = "|"), ignore_case = TRUE)), TRUE, FALSE),
    fruits = if_else(str_detect(
      product_list, regex(paste(unlist(charcuterie_board$Fruits), collapse = "|"), ignore_case = TRUE)), TRUE, FALSE),
      nuts = if_else(str_detect(
      product_list, regex(paste(unlist(charcuterie_board$Nuts), collapse = "|"), ignore_case = TRUE)), TRUE, FALSE),
      olives = if_else(str_detect(
      product_list, regex(paste(unlist(charcuterie_board$Olives), collapse = "|"), ignore_case = TRUE)), TRUE, FALSE),
      spreads = if_else(str_detect(
      product_list, regex(paste(unlist(charcuterie_board$Spreads), collapse = "|"), ignore_case = TRUE)), TRUE, FALSE)
  ) %>%
  select(basket_id, cured_meats, cheeses, bread_and_crackers, fruits, nuts, olives, spreads) %>%
  as.data.frame() %>%
  `rownames<-`(.$basket_id) %>%
  select(-basket_id)

We will use the Apriori algorithm from arules library which is used in data mining for discovering frequent itemsets and generating association rules. It is particularly well-known for its application in market basket analysis, where it helps identify relationships between items purchased together.

basket_trans <- arules::transactions(model_basket)
itemFrequencyPlot(basket_trans,topN = 5)

  • The Frequent Set Items are, suggesting they are viewed as complementary:
    • Cheese
    • Bread and Crackers
    • Cured Meats
    • Fruits
  • Rule: The set of items in the antecedent (left side) leads to the consequent (right side).
rules_table <- head(its, n = 10)

kable(as(rules_table, "data.frame"), format = "markdown")
rules support confidence coverage lift count
2 {} => {cheeses} 0.6334796 0.6334796 1.0000000 1.000000 32348
1 {} => {bread_and_crackers} 0.6264100 0.6264100 1.0000000 1.000000 31987
3 {cured_meats} => {cheeses} 0.1302287 0.6957522 0.1871769 1.098303 6650
8 {cured_meats,bread_and_crackers} => {cheeses} 0.0853635 0.7728723 0.1104496 1.220043 4359
9 {cured_meats,cheeses} => {bread_and_crackers} 0.0853635 0.6554887 0.1302287 1.046421 4359
6 {bread_and_crackers,fruits} => {cheeses} 0.0374432 0.6697023 0.0559102 1.057181 1912
7 {cheeses,fruits} => {bread_and_crackers} 0.0374432 0.6697023 0.0559102 1.069112 1912
5 {cured_meats,fruits} => {cheeses} 0.0164499 0.7872540 0.0208953 1.242746 840
4 {cured_meats,fruits} => {bread_and_crackers} 0.0147266 0.7047798 0.0208953 1.125109 752
10 {cured_meats,bread_and_crackers,fruits} => {cheeses} 0.0123375 0.8377660 0.0147266 1.322483 630

Interpretation:

  • High Demand for Cheeses and Bread & Crackers

  • Rule 1 & 2 : {} => {cheeses} & {} => {bread_and_crackers}

    • Support: both have very similar support which is ~ 63% of all transactions

    • Confidence: both also have very similar confidence ~ 63%, meaning 63% of transactions include cheeses / bread and crackers, regardless of other items.

      => This rule suggests cheeses & bread and crackers are a popular choice overall.

  • Rule 3 & 9: {cured_meats} => {cheeses} & {cured_meats,cheeses} => {bread_and_crackers}

    • Cured meats frequently lead to purchases of cheeses (Rule 3: 69.5% confidence) and, to a lesser extent, bread & crackers (Rule 9: 65.5% confidence). Promoting cured meats as part of a charcuterie board could drive sales of complementary items like cheeses & bread and crackers.
  • Rule 4 & 5: {cured_meats,fruits} => {bread_and_crackers} & {cured_meats,fruits} => {cheeses}

    • Although these items have low support, indicating that they occur less frequently in transactions. They have a very high lift, they still offer valuable insights for targeted marketing and product placement.

Summary

Question 1:

What are the trends of charcuterie board items purchases? Do they tend to peak around at special holidays?

  • Items classified as common charcuterie board products sales reach a peak of $7,819 at the end of the year, while consistently ranging from $4,000 to $6,000 throughout the rest of the year. Notably, most weeks show minimal coupon redemption, as indicated by the low red line on the first graph in representing total discounts applied. This indicates informed marketing strategies - encouraging targeted promotions around major holidays while maintaining regular pricing strategies for consistent sales.

Question 2:

Is there a certain demographic group is generating large amounts of revenue for these products and we could invest to capture market share in this area? * The bar graph comparison suggests that older individuals may not be the primary shoppers; instead, their children might be making these purchases on their behalf. This insight could guide targeted marketing strategies to better capture market share in these demographics.

  • Personalized Promotions - Paper Coupons or Cashback: Create tailored offers that appeal to the preferences of consumers of 35-45 age group, potentially encouraging them to shop for charcuterie items for gatherings or family events as they might also want to save money if they bundle.

Question 3:

What types of products are usually bought together for a common charcuterie board? How can we design effective promotions that encourage customers to buy more products together as listed in our common charcuterie board introduction?

  • Based on the rule tables, the commonly Paired Products are typically bought together and have a strong connection indicated by rule table are:

    • Cured Meats
    • Cheeses
    • Bread & Crackers
    • Fruits
  • Effective Promotion Strategies: Curated Charcuterie Kits: Create bundles that include all essential items: cured meats, cheeses, assorted bread & crackers, and fruits. Offer these kits at a slight discount or cashback or double reward points compared to buying items individually to incentivize purchase. This can encourage bulk purchases and reward repeat customers.

  • For the target group of marketing is 45-54, hosting in-store tasting events where customers can sample combinations of cured meats, cheeses, fruits, and crackers. This can help them discover new pairings and encourage them to purchase the featured items from gourmet charcuterie board brands and store made board

Conclusion:

The analysis reveals distinct purchasing trends for charcuterie board items, highlighting peak sales around significant holidays such as New Year’s, Thanksgiving, and Christmas. These seasonal spikes indicate a strong consumer interest in charcuterie boards during festive occasions, suggesting that targeted promotional strategies should be developed to capitalize on this demand.

To optimize sales throughout the year, brands can create seasonal or holiday-themed charcuterie board/box that feature carefully curated selections of items like Cured Meats, Cheeses, Bread, Crackers, Fruits (indicated in Frequent Set Items). An area for growth could be to add nuts, spreads and olives to the charcuterie board items. There was no purchase or link of these items to our frequent set. This approach not only caters to customers looking for convenient options for gatherings but also enhances the shopping experience by simplifying decision-making during busy times and increase sales by bundling products.

Additionally, the insights regarding demographics suggest that marketing efforts should focus on middle-aged consumers, who may be purchasing on behalf of older family members. Tailored promotions, such as personalized paper coupons or cashback offers, could further engage other groups, encouraging them to buy charcuterie items for family gatherings and events.

Limitations:

There is only one year of transaction data, which might be bias because different year can have a different purchasing trends such as during COVID-19. We could use the same method of analysis to look at other complementary items such as BBQ and explore further on brand vs store products to drive sales.