REGORK GROCERY CHAIN: Optimal Marketing Campaign to Maximize Revenue and Customer Satisfaction

SYNOPSIS

Currently, the consumption trends of customer groups are completely different, fluctuating based on many different demographics. It is completely practical and urgent for grocery stores to grasp and orient their sales strategies, both bringing profits and outlining the long-term effectiveness of the strategy.

Recognizing and evaluating the trends of big names like Walmart, Tesco, grocery stores are tending to sell items together, also known as bundles, bringing in a lot of revenue, especially when sold at the right times. Based on the connection of data in the completejourney package, Regork can find suitable factors to deploy bundles and increase its sales.

Using the differences in customer demographics, Regork can analyze which age group will spend the most on its products, thereby focusing and building strategies, stimulating supply for the items that this age group needs. From there, we can combine the purchasing density of each product, deduce which products are “famous”, and then combine the “famous” products together, creating bundles with close connections. In addition, taking advantage of the times (holidays) and customer loyalty, Regork can completely promote the bundle trading strategy and bring in huge revenue.

In this report, we will analyze and evaluate each data that Regork needs and combine them together, creating a detailed and vivid statistical picture. From there, develop strategies for each product set. The most important thing is that the methods proposed through each statistical data will contribute to sales growth and enhance Regork’s potential development.

Packages Required

library(tidyverse)
library(dplyr)
library(ggplot2)
library(tidyr)
library(lubridate)
library(arules)
library(completejourney)
library(widyr)
library(kableExtra)

Data Preparation

Our analytical approach and data:

We use the join function to combine the get_transactions and product data to get more detailed product data.

Then we combine the demographics data and the get_promotions big data to get a more comprehensive view of consumer shopping behavior.

Finally, we choose to join the promotions dataset and transactions to get more information and assess the impact of marketing campaigns and coupons on purchases based on product category.

Importing data sets needed for analysis

# Load the datasets
transactions <- get_transactions()
products <- products
demographics <- demographics
promotions <- get_promotions()
# Data Frame
# Join transactions with products
transactions_products <- transactions %>%
  left_join(products, by = "product_id")
# Join transactions with demographics
transactions_demographics <- transactions %>%
  left_join(demographics, by = "household_id") %>%
  filter(!is.na(age))
# Join transactions with promotions
transactions_promotions <- promotions %>%
  inner_join(transactions, by = c('product_id', 'store_id', 'week'))

Exploratory Data Analysis

1. Demographic Analysis

How can demographic groups under-purchasing certain products (age, income, household size) generate different levels of revenue and sales for specific product categories?

Which demographic groups under-purchasing certain products, indicating potential untapped markets?

1.1 Aggregate Revenue by Demographic Group and Product Category

# Sales by demographic group
revenue_by_demographic <- transactions_demographics %>%
  inner_join(products, by = 'product_id') %>%
  group_by(age, income, household_size, product_category) %>%
  summarise(total_revenue = sum(sales_value), .groups = 'drop')

# Plot revenue by demographic group
ggplot(revenue_by_demographic, aes(x = age, y = total_revenue, fill = income)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Sales by Demographic Group", x = "Age Group", y = "Total Sales") 

1.2 Identify High-Revenue Generating Demographic Groups per Product Category

find the demographic group that generates the highest revenue for each product category.

# Find the highest revenue-generating demographic groups for each product category
high_revenue_demographics <- revenue_by_demographic %>%
  group_by(product_category) %>%
  top_n(1, total_revenue) %>%  # Select the demographic group with the highest revenue for each category
  arrange(product_category, desc(total_revenue))

# View the highest revenue-generating demographic groups per product category
high_revenue_demographics %>%
  head() %>%
  kable("html", caption = "High Revenue - Generating Demographics by Product Category") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE)
High Revenue - Generating Demographics by Product Category
age income household_size product_category total_revenue
65+ 35-49K 1 ADULT INCONTINENCE 489.56
45-54 50-74K 1 AIR CARE 427.67
45-54 50-74K 1 ANALGESICS 319.09
19-24 50-74K 3 ANTACIDS 364.42
45-54 50-74K 1 APPAREL 112.23
45-54 75-99K 2 APPLES 561.83

1.3 Identify Under-Purchasing Demographic Groups

# Calculate the average revenue per product category
avg_revenue_by_category <- revenue_by_demographic %>%
  group_by(product_category) %>%
  summarise(avg_revenue = mean(total_revenue, na.rm = TRUE), .groups = 'drop')

# Join the average revenue with the original data to find under-purchasing groups
under_purchasing_groups <- revenue_by_demographic %>%
  left_join(avg_revenue_by_category, by = "product_category") %>%
  filter(total_revenue < avg_revenue) %>%  # Identify groups with less than average revenue
  arrange(product_category, total_revenue)

# View the under-purchasing demographic groups
under_purchasing_groups %>%
  head() %>%
  kable("html", caption = "Under-Purchasing Demographic Groups") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
                full_width = FALSE, position = "center") %>%
  column_spec(1, bold = TRUE, color = "black", background = "#f2f2f2") %>%
  column_spec(2, width = "20em") 
Under-Purchasing Demographic Groups
age income household_size product_category total_revenue avg_revenue
25-34 50-74K 2 ADULT INCONTINENCE 2.79 52.05714
45-54 50-74K 1 ADULT INCONTINENCE 2.79 52.05714
65+ 50-74K 2 ADULT INCONTINENCE 2.79 52.05714
35-44 15-24K 3 ADULT INCONTINENCE 2.95 52.05714
35-44 15-24K 4 ADULT INCONTINENCE 2.99 52.05714
55-64 25-34K 3 ADULT INCONTINENCE 3.18 52.05714

2. Product Pairing and Bundling

We will analyze which products are frequently purchased together. Furthermore, we will also show how Regork can leverage this data to develop strategies such as cross-selling or bundling strategies to contribute to increasing revenue.

And most importantly, we will try to find products that are closely linked to customer consumption behavior (e.g., frozen pizzas and beer) to analyze their behavior for long-term projects.

# Ensure distinct transactions with relevant columns
transactions_products_pairing <- transactions_products %>%
  select(basket_id, product_category) %>%
  distinct()

# Find product pairs frequently purchased together
product_pairs <- transactions_products_pairing %>%
  group_by(basket_id) %>%
  summarise(products = list(sort(unique(product_category)))) %>%  # Sort product names to ensure consistent order
  unnest(products) %>%
  pairwise_count(products, basket_id, sort = TRUE) %>%
  mutate(pair = ifelse(item1 < item2, 
                       paste(item1, "&", item2), 
                       paste(item2, "&", item1))) %>%  # Standardize pair order
  group_by(pair) %>%
  summarise(n = sum(n)) %>%  # Aggregate counts for duplicate pairs
  arrange(desc(n))

# Get top 10 pairs
top_pairs <- top_n(product_pairs, 10, n)

# Plot the bar chart
ggplot(top_pairs, aes(x = reorder(pair, n), y = n)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Frequently Bought Together Product Pairs",
       x = "Product Pair",
       y = "Frequency") +
  theme_minimal()

3. Seasonal and Trend Analysis

We will first research and evaluate which products have appropriate and relevant categories that experience sales spikes, especially for special holidays of the year such as Halloween or Christmas.

We will also make many hypotheses about what factors influence consumer trends. To do that, we analyze data of shopping trends for items that change over the year to gain more insights into this factor and propose business strategies.

Seasonal Product Categories for Plots: Halloween → CANDY, CHOCOLATE, PUMPKIN Thanksgiving → TURKEY, STUFFING, PIES, CRANBERRY SAUCE Christmas → CHRISTMAS DECOR, HOLIDAY TREATS Valentine’s Day → VALENTINE GIFTS, CHOCOLATE Summer BBQ → BBQ MEATS, WATERMELON

# Add date information to transactions
transactions_products_seasonal <- transactions_products %>%
  mutate(date = as.Date(transaction_timestamp))
unique_product_types <- transactions_products_seasonal %>%
  distinct(product_type) %>%
  arrange(product_type)
# Simplified product category mapping
transactions_products_seasonal <- transactions_products_seasonal %>%
  mutate(product_group = case_when(
    str_detect(product_type, "CANDY|CHOCOLATE") ~ "Candy & Chocolate",
    str_detect(product_type, "PUMPKIN") ~ "Pumpkin",
    str_detect(product_type, "TURKEY") ~ "Turkey",
    str_detect(product_type, "HOLIDAY|CHRISTMAS") ~ "Holiday Treats",
    str_detect(product_type, "VALENTINE") ~ "Valentine Gifts",
    str_detect(product_type, "BBQ|HOT DOGS|BURGERS") ~ "BBQ Meats",
    str_detect(product_type, "WATERMELON") ~ "Watermelon",
    TRUE ~ NA_character_
  )) %>%
  filter(!is.na(product_group))

# Function to create sales trend plots for each holiday
plot_holiday_trends <- function(holiday_name, categories) {
  holiday_sales <- transactions_products_seasonal %>%
    filter(product_group %in% categories) %>%
    group_by(date, product_group) %>%
    summarise(total_sales = sum(quantity, na.rm = TRUE), .groups = 'drop')

  ggplot(holiday_sales, aes(x = date, y = total_sales, color = product_group)) +
    geom_line(size = 1) +
    labs(title = paste("Sales Trend for", holiday_name),
         x = "Date",
         y = "Total Sales",
         color = "Product Category") +
    theme_minimal()
}

# Generate plots for each holiday
plot_halloween <- plot_holiday_trends("Halloween", c("Candy & Chocolate", "Pumpkin"))
plot_thanksgiving <- plot_holiday_trends("Thanksgiving", "Turkey")
plot_christmas <- plot_holiday_trends("Christmas", c("Holiday Treats"))
plot_valentine <- plot_holiday_trends("Valentine's Day", c("Candy & Chocolate", "Valentine Gifts"))
plot_summer_bbq <- plot_holiday_trends("Summer BBQ", c("BBQ Meats", "Watermelon"))

All plots display Sales Trend for each Holiday

Halloween
print(plot_halloween)

Thanksgiving
print(plot_thanksgiving)

Christmas
print(plot_christmas)

Valentine
print(plot_valentine)

Summer BBQ
print(plot_summer_bbq)

4. Marketing Campaign Effectiveness

To analyze the marketing effectiveness of Regork, we base it on the profitability and revenue of marketing strategies related to promotions.

In addition, we also consider the influence of coupons or discounts on consumer purchasing behavior.

# Create two subsets: with promotions and without promotions
sales_with_promotions <- transactions_promotions %>%
  filter(retail_disc > 0 | coupon_disc > 0 | coupon_match_disc > 0)

sales_without_promotions <- transactions_promotions %>%
  filter(retail_disc == 0 & coupon_disc == 0 & coupon_match_disc == 0)

# Summarize sales with and without promotions
summary_comparison <- data.frame(
  category = c("With Promotions", "Without Promotions"),
  total_sales_value = c(
    sum(sales_with_promotions$sales_value, na.rm = TRUE),
    sum(sales_without_promotions$sales_value, na.rm = TRUE)
  ),
  total_quantity = c(
    sum(sales_with_promotions$quantity, na.rm = TRUE),
    sum(sales_without_promotions$quantity, na.rm = TRUE)
  ),
  avg_sales_value = c(
    mean(sales_with_promotions$sales_value, na.rm = TRUE),
    mean(sales_without_promotions$sales_value, na.rm = TRUE)
  ),
  avg_quantity = c(
    mean(sales_with_promotions$quantity, na.rm = TRUE),
    mean(sales_without_promotions$quantity, na.rm = TRUE)
  )
)

# Print the summary comparison
summary_comparison %>%
  kable("html", caption = "Comparison of Sales and Quantity with and without Promotions") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
                full_width = FALSE, position = "center") %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(2:5, width = "10em") %>%
  row_spec(0, bold = TRUE, font_size = 14)
Comparison of Sales and Quantity with and without Promotions
category total_sales_value total_quantity avg_sales_value avg_quantity
With Promotions 739529.5 397202 2.670977 1.434584
Without Promotions 166196.5 67380 3.267724 1.324813
# Reshape data for line plot
summary_comparison_long <- na.omit(summary_comparison) %>%
  pivot_longer(cols = c(total_sales_value, total_quantity),
               names_to = "metric", values_to = "value") 
# Plot: Stacked bar plot to compare all four metrics in one plot
ggplot(summary_comparison_long, aes(x = category, y = value, fill = metric)) +
  geom_bar(stat = "identity") +
  labs(title = "Comparison of Sales and Quantity with and without Promotions",
       x = "Promotion Type", y = "Value") +
  scale_fill_manual(values = c( "lightgreen","lightsalmon")) +
   scale_y_continuous(labels = scales::comma) +
  theme_minimal()

5. Customer Loyalty and Retention

Finally, loyalty and retention analysis is also a campaign that Regork should pay attention to. Can Regork enhance customer retention, through our analysis of the shopping habits of loyal customers and occasional shoppers?

In addition, we also analyze the repeatability of certain products and their categories specifically among those repeat customers.

5.1 Identify Loyal vs. Occasional Shoppers

First, we prioritize the analysis by taking the number of transactions per customer and then aggregating that number, inferring the loyalty of customers and then classifying them into occasional groups based on this.

# Calculate the number of transactions per household_id (representing the customer)
customer_transactions <- transactions_products %>%
  group_by(household_id) %>%
  summarise(transaction_count = n_distinct(basket_id))  # basket_id represents distinct purchases

# Classify customers as loyal or occasional based on the number of transactions
customer_transactions <- customer_transactions %>%
  mutate(customer_type = ifelse(transaction_count > 5, "Loyal", "Occasional"))

# Merge this classification back with the transactions_products dataset
transactions_products <- transactions_products %>%
  left_join(customer_transactions, by = "household_id")

# View the updated transactions_products dataset to confirm
transactions_products %>%
  head() %>%
  kable("html", caption = "Customer Transactions and Loyalty Classification") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
                full_width = FALSE, position = "center") %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(2) %>%
  column_spec(3, width = "20em") %>%
  column_spec(4, bold = TRUE) %>%
  column_spec(5, bold = TRUE) %>%
  row_spec(0, bold = TRUE, font_size = 14)
Customer Transactions and Loyalty Classification
household_id store_id basket_id product_id quantity sales_value retail_disc coupon_disc coupon_match_disc week transaction_timestamp manufacturer_id department brand product_category product_type package_size transaction_count customer_type
900 330 31198570044 1095275 1 0.50 0.00 0 0 1 2017-01-01 06:53:26 2 PASTRY National ROLLS ROLLS: BAGELS 4 OZ 674 Loyal
900 330 31198570047 9878513 1 0.99 0.10 0 0 1 2017-01-01 07:10:28 69 GROCERY Private FACIAL TISS/DNR NAPKIN FACIAL TISSUE & PAPER HANDKE 85 CT 674 Loyal
1228 406 31198655051 1041453 1 1.43 0.15 0 0 1 2017-01-01 07:26:30 69 GROCERY Private BAG SNACKS POTATO CHIPS 11.5 OZ 403 Loyal
906 319 31198705046 1020156 1 1.50 0.29 0 0 1 2017-01-01 07:30:27 2142 GROCERY National REFRGRATD DOUGH PRODUCTS REFRIGERATED BAGELS 17.1 OZ 211 Loyal
906 319 31198705046 1053875 2 2.78 0.80 0 0 1 2017-01-01 07:30:27 2326 GROCERY National SEAFOOD - SHELF STABLE TUNA 5.0 OZ 211 Loyal
906 319 31198705046 1060312 1 5.49 0.50 0 0 1 2017-01-01 07:30:27 608 GROCERY National FRZN MEAT/MEAT DINNERS FRZN BREADED PREPARED CHICK 30 OZ 211 Loyal

5.2 Analyze Purchasing Patterns (Loyal vs. Occasional)

We implemented and compared purchasing behavior (purchase quantity, average transaction value) between loyal and occasional customers.

# Summarize purchasing patterns for loyal vs. occasional customers
purchase_summary <- transactions_products %>%
  group_by(customer_type) %>%
  summarise(
    avg_transaction_value = mean(sales_value, na.rm = TRUE),
    total_quantity = sum(quantity, na.rm = TRUE),
    total_sales_value = sum(sales_value, na.rm = TRUE),
    avg_quantity_per_purchase = mean(quantity, na.rm = TRUE)
  )

# Print the summary
purchase_summary %>%
  kable("html", caption = "Purchasing Patterns of Loyal vs. Occasional Customers") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
                full_width = FALSE, position = "center") %>%
  column_spec(1, bold = TRUE) %>%
  column_spec(2) %>%
  column_spec(3) %>%
  column_spec(4) %>%
  column_spec(5) %>%
  row_spec(0, bold = TRUE, font_size = 14)
Purchasing Patterns of Loyal vs. Occasional Customers
customer_type avg_transaction_value total_quantity total_sales_value avg_quantity_per_purchase
Loyal 3.128353 152835591 4584329.75 104.29523
Occasional 3.007147 87517 11709.83 22.47483

Recommendations

  1. Propose Specific Marketing Strategies Based on Demographic Trends and Purchasing Patterns

We define the Target High-Revenue Demographics: We based on the demographic analysis including age, income, and household size. Then we identify which groups have the most revenue (from demographics)of each product category.

Recommendation:

-> Focus on personalized ads and promotions for customers within the age and income groups that generate the highest revenue for a specific product category. For instance, if air care products are generating higher revenue for the 45-54 age group with 50-74k income, we should continue to target our promotion and advertisement to this segment. -> For under-purchasing groups, consider pairing it with popular items with a slight discount to increase product visibility. For example, “Buy product A ( under-purchasing product), with product B ( popular product) with only $X more”. Also, offer loyalty rewards to convert occasional customers to loyal ones.

  1. Suggest Product Bundling or Cross-Selling Opportunities

We try to connect and define frequent product pairs from the Market Basket Analysis. We also identify the co-purchased product from the transaction data. Then lastly, we propose bundling opportunities for these products.

Recommendation:

-> Mixed Bundling: Use the mixed bundling technique to offer a slight discount for any top-frequent pair of items bought together as a bundle. For instance, “Buy a Bottle of Chocolate Milk, Get 10% off Coke.” -> Tiered Discounts: Offer tiered discounts based on quantity: “Buy 2 get 5% off, Buy 3 get 10% off, Buy 4 get 15% off”. This fosters sales and customer loyalty within that product. Cross-selling: Promoting frequent-paired products as recommendations for customers at checkout or online. If a customer buys a frozen pizza, for instance, we can suggest pairing it with Coke or Beer, as this is a common combination among other shoppers. This personalized approach improves customer experience while increasing sales.

  1. Recommend Optimal Timing and Types of Promotions to Maximize Sales

Based on the Seasonal and Trend Analysis, we will identify which products experience the sales peak during special times of the year such as Halloween, Christmas, and Valentine’s. By leveraging this advantageous information to suggest promotions for these times.

Recommendation: -> Leverage holiday sales trends by promoting high-demand products during peak seasons at 2-3 weeks before -> Boost sales with seasonal flash sales and time-sensitive offers to drive urgency

  1. Suggest Discount Strategies to Maximize Customer Engagement

We demonstrate the kind of Discount from Coupon and Discount Analysis. To do that, we try to identify various types of promotions ranging from retail to coupon and match, then determine which one is the most effective among the rest.

Recommendation:

Multi-Discount Promotions: . Offering a combination of retail discounts and coupon discounts at a time, such as 20% off retail price + $5 off coupon Loyalty Program Enhancements: Focus on the loyalty program with milestone-based incentives for loyal customers. For instance, “10% off on the 10th purchase at Regork”

  1. Leverage Customer Segmentation for Tailored Offers

We classify shoppers as loyal and Occasional, and then we can target the loyal segments with different promotional strategies to maximize their engagement.

Recommendation:

-> Loyal customers will have better promotions compared to normal customers, such as 10% extra on top of the 15% discount that normal customers get. -> Offering exclusive weekly deals for loyal customers to increase purchase frequency and higher customer retention. Customers will be returning every week if they know there will be good deals and stop looking elsewhere for similar products.