Introduction

Understanding the Business Problem

The grocery industry is highly competitive, with thin margins and increasing pressure from discount retailers and e-commerce platforms. One major strategy that retailers use to increase profitability and customer loyalty is investing in private label (store-brand) products.

According to Nielsen (2023), private label sales in the U.S. account for over 18% of total grocery sales, with European markets seeing even higher penetration levels of 30-40%. Private label products typically yield higher profit margins than national brands due to lower marketing costs and vertical integration in supply chains. However, consumer perception varies—while some shoppers actively seek private labels for affordability and quality, others remain loyal to national brands, often associating them with higher trust and consistency.

This analysis aims to answer the key business question:

How do customers choose between private label and national brand products, and how can Regork drive private label adoption to maximize profitability?

Methodology and Data Sources

To analyze customer behavior and private label preference, we use the Complete Journey dataset, which includes:

Transaction Data: Captures purchase history, allowing us to segment sales into private label vs. national brands.
Product Data: Identifies whether an item is a private label or a national brand.
Demographic Data: Provides insights into customer segments (income, household size, etc.).

Steps in Our Analysis

Data Preparation: Clean and merge datasets to classify products into private label vs. national brand.
Exploratory Data Analysis: Identify sales trends, customer demographics, and price sensitivity.
Statistical & Visual Analysis:

Compare private label adoption across different income groups.
Analyze price sensitivity—do customers switch to private labels when national brands are more expensive?
Evaluate promotion impact—do discounts significantly increase private label purchases?
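
The data preparation step can be sketched on a toy example. The rows below are invented for illustration and are not taken from the Complete Journey data; only the column names and the "Private"/"National" brand values follow the tables described in this report. The sketch also checks how many transactions an inner join would drop because they have no matching product record.

```r
library(dplyr)

# Toy stand-ins for the transactions and products tables (invented values)
toy_transactions <- data.frame(
  household_id = c("1", "1", "2", "3"),
  product_id   = c("A", "B", "A", "Z"),   # "Z" has no product record
  sales_value  = c(2.50, 4.00, 2.50, 3.00),
  stringsAsFactors = FALSE
)
toy_products <- data.frame(
  product_id = c("A", "B", "C"),
  brand      = c("Private", "National", "National"),
  stringsAsFactors = FALSE
)

# Transactions that an inner join would silently drop
unmatched <- anti_join(toy_transactions, toy_products, by = "product_id")

# Attach the brand to each transaction and derive a readable label
toy_labeled <- toy_transactions %>%
  inner_join(toy_products, by = "product_id") %>%
  mutate(label_type = ifelse(brand == "Private", "Private Label", "National Brand"))

print(unmatched)
print(toy_labeled)
```

Running the same `anti_join()` check on the real tables before aggregating would show how much sales value, if any, falls outside the joined data.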

Expected Impact & Proposed Solution

By understanding private label adoption trends, Regork can:

Enhance marketing strategies by targeting demographics most likely to convert to private label.
Optimize pricing & promotions to maximize private label sales while maintaining profitability.
Improve store layout & bundling strategies to increase visibility and preference for private label items.

This data-driven approach will provide actionable insights for Regork’s CEO and marketing team, enabling them to make strategic decisions that boost revenue, enhance customer loyalty, and drive higher margins.

Resources

Packages

Below are the key R packages used in this analysis:

tidyverse – A collection of essential R packages for data manipulation, visualization, and analysis. Includes ggplot2, dplyr, tidyr, readr, and more.
dplyr – Provides a fast, intuitive, and consistent grammar for data wrangling (filtering, summarizing, joining, etc.).
ggplot2 – Used for data visualization, creating high-quality and customizable graphs.
lubridate – Simplifies working with dates and times, crucial for analyzing time-based purchasing behavior.
stringr – Provides functions for working with text data, useful for handling product names and categories.
knitr – Converts R Markdown into formatted HTML reports.
kableExtra – Enhances tables in reports, allowing for styled and formatted tables in HTML.
scales – Improves number and date formatting for better data visualization.
gridExtra – Helps arrange multiple plots in a single output.
arules – Used for association rule mining (e.g., identifying which products are frequently purchased together).
arulesViz – Helps visualize association rules from market basket analysis.
janitor – Simplifies data cleaning, including removing duplicate column names and handling messy datasets.

Libraries

To use these packages, we need to load them in R using the library() function. Below is the code to load all necessary libraries for our analysis:

# load libraries

library(tidyverse)            # Data manipulation & visualization
library(dplyr)                # Efficient data wrangling
library(ggplot2)              # High-quality visualizations
library(lubridate)            # Handling date & time data
library(stringr)              # String/text manipulation
library(knitr)                # Markdown reporting
library(kableExtra)           # Enhanced table formatting
library(scales)               # Improves axis scaling in plots
library(gridExtra)            # Arranges multiple plots
library(arules)               # Market basket analysis
library(arulesViz)            # Association rule visualization
library(janitor)              # Data cleaning
library(completejourney)      # Data set

Data

Data Overview

The data for this project is available through the completejourney R package. The datasets are based on grocery shopping transactions from a group of 2,469 households. Transactions, demographics, products, coupons, and campaigns were recorded over one year, from January 2017 to December 2017.

We use three key datasets: transactions, products, and demographics. The tables below outline the variables used in our analysis.

All data preparation, including joins, filters, and new variables, is included in the code for each exploratory analysis.

Transactions Table

The transactions dataset records customer purchases, including details about sales and discounts.

# define transactions table structure

transactions_table <- data.frame(
  "Variable Name" = c("household_id", "product_id", "quantity", "sales_value", "retail_disc", "coupon_disc"),
  "Data Type" = c("character", "character", "numeric", "numeric", "numeric", "numeric"),
  "Variable Description" = c(
    "Uniquely identifies each household",
    "Uniquely identifies each product",
    "Number of the product purchased during the trip",
    "Amount of dollars the retailer receives from sale",
    "Discount applied due to the retailer’s loyalty card program",
    "Discount applied due to a manufacturer coupon"
  ),
  check.names = FALSE  # keep the spaces in the column headers
)

# display transactions table

kable(transactions_table, caption = "Transactions Table Definitions") %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = TRUE)

Transactions Table Definitions
Variable Name	Data Type	Variable Description
household_id	character	Uniquely identifies each household
product_id	character	Uniquely identifies each product
quantity	numeric	Number of the product purchased during the trip
sales_value	numeric	Amount of dollars the retailer receives from sale
retail_disc	numeric	Discount applied due to the retailer’s loyalty card program
coupon_disc	numeric	Discount applied due to a manufacturer coupon

library(completejourney)

# load data sets

transactions <- get_transactions()  # full transactions table, downloaded by the package

data("products")       # shipped with the package
data("demographics")   # shipped with the package

Products Table

The products dataset provides information about each product, such as its category, brand, and whether it’s a private label or national brand.

# define products table structure

products_table <- data.frame(
  "Variable Name" = c("product_id", "department", "brand"),
  "Data Type" = c("character", "character", "character"),
  "Variable Description" = c(
    "Uniquely identifies each product",
    "Department/category the product belongs to",
    "Indicates whether the product is a private label or national brand"
  ),
  check.names = FALSE  # keep the spaces in the column headers
)

# display products table

kable(products_table, caption = "Products Table Definitions") %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = TRUE)

Products Table Definitions
Variable Name	Data Type	Variable Description
product_id	character	Uniquely identifies each product
department	character	Department/category the product belongs to
brand	character	Indicates whether the product is a private label or national brand

Demographics Table

The demographics dataset provides information on household demographic data such as age, income, family size, and more.

# define demographics table structure

demographics_table <- data.frame(
  "Variable Name" = c("household_id", "age", "income", "household_size"),
  "Data Type" = c("character", "character", "character", "character"),
  "Variable Description" = c(
    "Uniquely identifies each household",
    "Age group of the primary shopper",
    "Income bracket of the household",
    "Household size category"
  ),
  check.names = FALSE  # keep the spaces in the column headers
)

# display demographics table

kable(demographics_table, caption = "Demographics Table Definitions") %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = TRUE)

Demographics Table Definitions
Variable Name	Data Type	Variable Description
household_id	character	Uniquely identifies each household
age	character	Age group of the primary shopper
income	character	Income bracket of the household
household_size	character	Household size category

Exploratory Data Analysis

Brand Market Share

1. Overall Regork Market

# load libraries

library(tidyverse)
library(ggplot2)
library(scales)

# join: transactions + product
# summarize total sales by brand type

brand_sales <- transactions %>%
  inner_join(products, by = "product_id")

brand_sales <- brand_sales %>%
  mutate(label_type = ifelse(brand == "Private", "Private Label", "National Brand")) %>%
  group_by(label_type) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop")

# create pie chart

ggplot(brand_sales, aes(x = "", y = total_sales, fill = label_type)) +
  geom_bar(stat = "identity", width = 1, color = "white") + 
  coord_polar(theta = "y", start = 0) +  
  scale_fill_manual(values = c("Private Label" = "#2E86C1", "National Brand" = "#E74C3C")) +  # same colors as the later charts
  geom_text(
    aes(label = paste0(round(total_sales / sum(total_sales) * 100, 1), "%")), 
    position = position_stack(vjust = 0.5), 
    size = 5, 
    color = "white",
    fontface = "bold"
  ) +
  labs(
    title = "Total Sales Distribution: Private Label vs. National Brands",
    subtitle = "Percentage share of total sales by brand type",
    fill = "Brand Type"
  ) +
  theme_void() +  
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5), 
    plot.subtitle = element_text(size = 12, hjust = 0.5, face = "italic"), 
    legend.position = "bottom",
    legend.title = element_text(size = 12), 
    legend.text = element_text(size = 11)
  )

Analysis

This pie chart provides a high-level overview of the market share distribution between Private Label and National Brands at Regork. The results indicate that National Brands dominate the grocery landscape, accounting for 71.7% of total sales, while Private Label products make up only 28.3%.

This suggests that while private label products hold a significant portion of sales, they remain secondary to national brands in consumer preference. The strong presence of national brands is likely driven by brand loyalty, perceived quality, and consumer trust, particularly in categories where national brands have long-established reputations. National brands also tend to benefit from larger marketing budgets, promotional campaigns, and widespread recognition, which can further contribute to their larger share.

However, the 28.3% private label share is not insignificant. A single year of data cannot show whether that share is growing, but it does indicate solid acceptance of store-branded products, plausibly due to competitive pricing and demand for cost-effective alternatives. Retailers more broadly have been investing in private label branding, positioning store-brand items as premium yet affordable alternatives to national brands.

Recommendations for Regork

For Regork, this data highlights an opportunity to expand private label penetration by enhancing product perception, strategic pricing, and promotional efforts. Strategies such as bundling private label products, emphasizing quality comparisons, and offering exclusive discounts can encourage customers to substitute national brands with private label options. Additionally, targeting specific consumer segments—such as budget-conscious shoppers or families purchasing in bulk—could further boost private label adoption.

Ultimately, while national brands continue to hold the majority of sales, there is clear potential for growth in private label sales. With targeted marketing and strategic pricing, Regork can work toward shifting consumer behavior and growing private label's share of total sales.

2. Overall Regork Market by Department

# load libraries

library(tidyverse)
library(ggplot2)
library(scales)

# join: transactions + products
# summarize total sales by brand and department

brand_dept_sales <- transactions %>%
  inner_join(products, by = "product_id") %>%
  mutate(label_type = ifelse(brand == "Private", "Private Label", "National Brand")) %>%
  filter(!is.na(department) & !is.na(label_type)) %>%
  group_by(department, label_type) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  group_by(department) %>%
  mutate(percentage = total_sales / sum(total_sales, na.rm = TRUE) * 100)

# order departments by private label share
# (a department with no private label sales gets a share of 0 instead of being dropped)

dept_order <- brand_dept_sales %>%
  group_by(department) %>%
  summarise(private_share = sum(percentage[label_type == "Private Label"]), .groups = "drop") %>%
  arrange(private_share)

brand_dept_sales <- brand_dept_sales %>%
  ungroup() %>%
  mutate(department = factor(department, levels = dept_order$department))

# create horizontal bar graph (highest private label share at the top)

ggplot(brand_dept_sales, aes(x = percentage, y = department, fill = label_type)) +
  geom_bar(stat = "identity", width = 0.7) +
  scale_x_continuous(labels = percent_format(scale = 1), limits = c(0, 100)) +
  scale_fill_manual(values = c("Private Label" = "#2E86C1", "National Brand" = "#E74C3C")) +

  labs(
    title = "Private Label vs. National Brand Market Share\nby Department",
    subtitle = "Percentage share of sales by brand type across departments",
    x = "Percentage of Department Sales",
    y = "Department",
    fill = "Brand Type"
  ) +

  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(size = 12, face = "italic", hjust = 0.5),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10),
    legend.position = "bottom",
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 11)
  ) +
  
  guides(fill = guide_legend(reverse = TRUE))

Analysis

This analysis provides a detailed look into the market share distribution between Private Label and National Brands across different grocery store departments at Regork. The findings reveal clear patterns in consumer preferences and highlight opportunities for growth in private label adoption.

Certain categories, such as Fuel, Automotive, Miscellaneous, and Packaged Seafood, show a strong presence of private label sales, making up more than 50% of department sales. Additionally, Pastry, Grocery, and Central/Store Supplies also exhibit a notable share of private label products. These findings suggest that consumers are more open to store-brand alternatives in everyday essentials and non-perishable goods, making these key areas for further private label expansion.

On the other hand, departments like Cosmetics, Floral, Meat, and Seafood remain heavily dominated by national brands. Private label penetration in these areas is minimal, likely due to consumer concerns about quality, safety, and brand trust. Similarly, Nutrition and Drug GM (General Merchandise) also lean toward national brands, reinforcing the idea that customers prefer established and recognizable brands when it comes to health-related and high-risk purchases.

A few departments, such as Deli, Produce, and Packaged Meat, demonstrate a more balanced distribution between private label and national brands. This suggests that customers may be willing to explore private label alternatives in these sections, but additional marketing efforts may be required to increase adoption.
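
The three groups described above can be derived from the department-level shares rather than read off the chart. The sketch below uses invented sales figures; the 50% cutoff comes from the text, while the 30% cutoff for a "balanced" department and the tier names are assumptions chosen for illustration.

```r
library(dplyr)

# Invented department sales, in the same long format as brand_dept_sales
toy_dept_sales <- data.frame(
  department  = rep(c("Fuel", "Deli", "Cosmetics"), each = 2),
  label_type  = rep(c("Private Label", "National Brand"), times = 3),
  total_sales = c(80, 20, 45, 55, 5, 95),
  stringsAsFactors = FALSE
)

dept_tiers <- toy_dept_sales %>%
  group_by(department) %>%
  summarise(
    private_share = sum(total_sales[label_type == "Private Label"]) / sum(total_sales) * 100,
    .groups = "drop"
  ) %>%
  mutate(tier = case_when(
    private_share > 50  ~ "Private label leads",       # cutoff taken from the text
    private_share >= 30 ~ "Balanced",                  # assumed cutoff
    TRUE                ~ "National brand dominated"
  )) %>%
  arrange(desc(private_share))

print(dept_tiers)
```

A table like this makes the grouping reproducible and shows how sensitive the "balanced" group is to the chosen cutoff.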

Recommendations for Regork

For Regork, these insights suggest three key strategies. First, the company should continue investing in high-performing private label categories, particularly Fuel, Packaged Goods, and Grocery, where private labels already hold a competitive edge. Offering bulk discounts, loyalty incentives, or premium packaging could further enhance store-brand loyalty. Second, in high-trust national brand categories like Cosmetics, Nutrition, and Drug GM, Regork can introduce premium private label branding, focusing on expert endorsements, quality assurances, and product transparency to build consumer confidence. Lastly, expanding private label offerings in balanced categories like Deli and Produce through taste tests, product bundling, and targeted promotions could encourage more customers to try and switch to store-brand alternatives.

By leveraging these insights, Regork can strengthen its private label strategy, drive higher profitability, and improve customer retention in key grocery departments.

Household Income Effect on Brand Choice

1. Spending Distributions by Income

# load libraries
library(tidyverse)

# join: transactions + products + demographics
# summarize income levels and brand type

spending_data <- transactions %>%
  inner_join(products, by = "product_id") %>%
  inner_join(demographics, by = "household_id") %>%
  mutate(
    label_type = ifelse(brand == "Private", "Private Label", "National Brand"),
    total_spending = sales_value
  ) %>%
  filter(!is.na(income), total_spending > 0) %>%  # Remove missing income and zero spending
  group_by(household_id, income, label_type) %>%
  summarise(total_spending = sum(total_spending), .groups = "drop")  # Sum spending per household

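# convert income to ordered factor for sorting
# (any bracket not listed below, e.g. a finer bracket over 149K, is pooled into "150K+"
# so that factor() does not turn it into NA)

income_levels <- c("Under 15K", "15-24K", "25-34K", "35-49K", 
                   "50-74K", "75-99K", "100-124K", "125-149K", "150K+")

spending_data <- spending_data %>%
  mutate(
    income_chr = as.character(income),
    income_bracket = factor(ifelse(income_chr %in% income_levels, income_chr, "150K+"),
                            levels = income_levels)
  )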

ggplot(spending_data, aes(x = income_bracket, y = total_spending / 1000, fill = label_type)) +
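  geom_boxplot(outlier.shape = NA, alpha = 0.7) +  # Hide outlier points
  
  # show y-axis in thousands; zoom with coord_cartesian() so households above 9K
  # still contribute to the box statistics instead of being dropped
  
  scale_y_continuous(labels = scales::label_comma(suffix = "K")) +
  coord_cartesian(ylim = c(0, 9)) +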
  
  # graph components
  
  scale_fill_manual(values = c("Private Label" = "#2E86C1", "National Brand" = "#E74C3C")) +
  labs(
    title = "Distribution of Spending by Income Level",
    subtitle = "Comparing spending behavior for private and national brands",
    x = "Income Level",
    y = "Total Spending ($ Thousands)",
    fill = "Brand Type"
  ) +
  
  # formatting theme
  
  theme_minimal() +
  theme(
    text = element_text(size = 10),
    plot.title = element_text(size = 14, hjust = 0, face = "bold"),
    plot.subtitle = element_text(size = 10, hjust = 0),  
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_text(size = 10, angle = 45, hjust = 1), 
    axis.text.y = element_text(size = 10),
    legend.position = "bottom"
  )

Analysis

The box plot illustrates the distribution of total spending across different income levels, comparing private label and national brand purchases. Across all income brackets, national brands consistently show higher spending compared to private labels, as reflected in the higher medians and wider interquartile ranges (IQRs). This suggests that, regardless of income level, consumers tend to allocate more of their budget to national brands.

Interestingly, lower-income households (under $15K) exhibit a more concentrated spending range with lower overall variance, indicating that their spending behavior is relatively constrained. As income levels rise, spending increases, particularly on national brands, with upper-income groups (e.g., $125K+) showing a greater spread, suggesting diverse purchasing habits within this segment. The presence of outliers in higher-income brackets further indicates that some households allocate significantly more to national brands compared to others in the same income group.

Despite the trend favoring national brands, private label spending remains relatively stable across income levels, suggesting a baseline demand that does not significantly fluctuate with income. This could imply that private label products serve as a consistent budget-friendly option for all consumers, while national brand purchases increase as discretionary income rises.
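
The medians and interquartile ranges that the box plot displays can also be tabulated, which makes comparisons across income brackets easier to verify. A minimal sketch with invented household totals (two income brackets, three households per group), assuming the same `income`, `label_type`, and `total_spending` columns as `spending_data`:

```r
library(dplyr)

# Invented per-household spending totals
toy_spending <- data.frame(
  income         = rep(c("Under 15K", "50-74K"), each = 6),
  label_type     = rep(rep(c("Private Label", "National Brand"), each = 3), times = 2),
  total_spending = c(300, 400, 500,   700,  900, 1100,
                     350, 450, 550,  1500, 2500, 3500),
  stringsAsFactors = FALSE
)

# Median, IQR, and group size per income bracket and brand type
spending_summary <- toy_spending %>%
  group_by(income, label_type) %>%
  summarise(
    median_spend = median(total_spending),
    iqr_spend    = IQR(total_spending),
    n_households = n(),
    .groups = "drop"
  )

print(spending_summary)
```

Including the group size alongside each median helps flag brackets where a wide box reflects only a handful of households.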

Recommendations for Regork

Regork should promote private label products as a cost-effective alternative, especially to lower-income consumers, whose spending on these brands is steady. Introducing premium-tier private label options for higher-income shoppers could capture additional share and improve profit margins, particularly in categories where national brands dominate.

2. Market Share by Income Level

# join data: transactions + products + demographics

customer_data <- transactions %>%
  left_join(products, by = "product_id") %>%
  left_join(demographics, by = "household_id")

# income levels in display order

income_levels <- c("Under 15K", "15-24K", "25-34K", "35-49K", "50-74K", 
                   "75-99K", "100-124K", "125-149K", "150K+")

# aggregate total spending by income level and brand type
# (any reported bracket not listed above, e.g. a finer bracket over 149K, is pooled
# into "150K+" before aggregating so that factor() does not turn it into NA)

income_brand_sales <- customer_data %>%
  mutate(
    income = as.character(income),
    income = ifelse(!is.na(income) & !(income %in% income_levels), "150K+", income),
    income = factor(income, levels = income_levels)
  ) %>%
  group_by(income, brand) %>%
  summarise(total_sales = sum(sales_value, na.rm = TRUE), .groups = "drop") %>%
  mutate(brand = ifelse(brand == "Private", "Private Label", "National Brand"))

# calculate percentage of total sales within each income level

income_brand_sales <- income_brand_sales %>%
  group_by(income) %>%
  mutate(percentage = total_sales / sum(total_sales) * 100) %>%
  ungroup()

# create stacked bar chart

ggplot(income_brand_sales, aes(x = income, y = percentage, fill = brand)) +
  geom_bar(stat = "identity", width = 0.7) +
  scale_y_continuous(labels = scales::percent_format(scale = 1), limits = c(0, 100)) +
  scale_fill_manual(values = c("Private Label" = "#2E86C1", "National Brand" = "#E74C3C")) +
  
  # graph components
  
  labs(
    title = "Market Share of Private Label vs. National Brands by Income Level",
    subtitle = "Comparing the percentage of sales for Private Label & National Brands across income groups",
    x = "Income Level",
    y = "Percentage of Total Sales",
    fill = "Brand Type"
  ) +

  # formatting theme
  
  theme_minimal() +
  theme(
    text = element_text(size = 10),
    plot.title = element_text(size = 14, hjust = 0, face = 'bold'),
    plot.subtitle = element_text(size = 10, hjust = 0, face = "italic"),  
    axis.title.x = element_text(size = 12),  
    axis.title.y = element_text(size = 12),  
    axis.text.x = element_text(size = 10, angle = 45, hjust = 1),  
    axis.text.y = element_text(size = 10),
    legend.position = "bottom",
    legend.text = element_text(size = 10)
  ) +
  
  guides(fill = guide_legend(reverse = TRUE))

Analysis

The stacked bar chart provides insights into how private label and national brand sales are distributed across different income levels. Overall, private label products account for a smaller share of total spending in all income brackets, with national brands consistently dominating. However, there is a noticeable trend where lower-income groups allocate a higher percentage of their spending to private label goods compared to higher-income groups. The share of private label sales is highest in the lowest income bracket (Under 15K) and gradually decreases as income increases, though the difference is not substantial. Interestingly, the highest income brackets still show a consistent portion of private label purchases, suggesting that even higher-income consumers find value in private label products.

The "NA" income level corresponds to households whose transactions have no matching row in the demographics table. The left joins keep those transactions, so their income is unknown rather than low or high. This segment shows a noticeably different purchasing pattern, with a much larger proportion of private label sales, but because it mixes households of every income level it should not be read as an income effect. Given that private label products offer cost savings, price sensitivity likely plays a role in influencing brand preference, particularly among lower-income consumers.
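
How much of the data falls into the NA income segment can be measured directly. The sketch below uses invented rows and assumes only the `household_id`, `sales_value`, and `income` columns described earlier; it counts households with and without an income value after a left join and reports the share of sales each group represents.

```r
library(dplyr)

# Invented transactions and demographics; household "3" has no demographic record
toy_transactions <- data.frame(
  household_id = c("1", "1", "2", "3", "3"),
  sales_value  = c(10, 5, 20, 8, 7),
  stringsAsFactors = FALSE
)
toy_demographics <- data.frame(
  household_id = c("1", "2"),
  income       = c("Under 15K", "50-74K"),
  stringsAsFactors = FALSE
)

# Households and sales with and without a known income after the left join
coverage <- toy_transactions %>%
  left_join(toy_demographics, by = "household_id") %>%
  mutate(has_income = !is.na(income)) %>%
  group_by(has_income) %>%
  summarise(
    households  = n_distinct(household_id),
    total_sales = sum(sales_value),
    .groups = "drop"
  ) %>%
  mutate(sales_share_pct = total_sales / sum(total_sales) * 100)

print(coverage)
```

If the households without income data account for a large share of sales, conclusions drawn from the income brackets apply only to the remaining subset.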

Recommendations for Regork

Retailers should consider targeted marketing efforts to further increase private label adoption in lower-income segments while also reinforcing value-driven messaging for higher-income consumers. Promotional campaigns emphasizing affordability and quality could resonate well with lower-income groups, encouraging continued preference for private label brands. Meanwhile, for higher-income shoppers, retailers could position premium private label products as a competitive alternative to national brands, leveraging quality perception rather than just cost savings.

Discount Effect on Brands

1. Brand Usage of Discount Levels

# load libraries

library(ggplot2)
library(dplyr)

# join: transactions + products
# summarize discounts by brand

data_joined <- transactions %>%
  inner_join(products, by = "product_id") %>%
  filter(coupon_disc > 0) %>%                          # Keep coupon-discounted transactions
  filter(coupon_disc < quantile(coupon_disc, 0.99))    # Then trim the top 1% of those discounts

# create the density plot
ggplot(data_joined, aes(x = coupon_disc, fill = brand)) +
  geom_density(alpha = 0.5, adjust = 1.2) +  # Density plot with transparency
  scale_fill_manual(values = c("Private" = "#2E86C1", "National" = "#E74C3C")) +  # Custom colors
 
  # graph components
   labs(
    title = "Distribution of Discounts for Private Label vs. National Brands",
    subtitle = "Comparing the frequency of discount levels across brand types",
    x = "Coupon Discount ($)",
    y = "Density",
    fill = "Brand Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, hjust = 0),  
    plot.subtitle = element_text(size = 10, hjust = 0, face = "italic"), 
    axis.title = element_text(size = 12),  
    axis.text = element_text(size = 10),  
    legend.position = "bottom"
  )

Analysis

The density plot reveals distinct discounting patterns between national brands and private labels. Private label brands tend to cluster around higher discount values, particularly in the range of $0.30 to $0.40, with a sharper peak than national brands. This suggests that private labels rely more heavily on structured discounts to attract price-sensitive consumers. Meanwhile, national brands show a broader distribution of discount values, with a notable presence at lower discount levels. This could indicate that while national brands do offer discounts, they are less reliant on them compared to private labels, potentially leveraging brand recognition and loyalty instead.

The overlap between private label and national brand discounts in the higher range suggests that when national brands discount aggressively, they compete more directly with private labels. However, the higher density of private label discounts in specific price points implies a more predictable and structured pricing strategy designed to convert price-sensitive shoppers consistently.
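
The outlier filter behind the density plot is sensitive to which rows the 99th percentile is computed over. Because most transactions carry no coupon discount, a percentile taken over all rows sits far lower than one taken over discounted rows only. A minimal base R sketch with invented discount values illustrates the difference:

```r
# Invented coupon discounts: 980 of 1,000 transactions carry no coupon at all
coupon_disc <- c(rep(0, 980), rep(0.25, 8), rep(0.50, 6), rep(1.00, 4), 2.00, 5.00)

# Cutoff computed over every transaction, zeros included
cutoff_all <- quantile(coupon_disc, 0.99)

# Cutoff computed over discounted transactions only
discounted <- coupon_disc[coupon_disc > 0]
cutoff_discounted <- quantile(discounted, 0.99)

# Discounted rows that survive each version of the filter
kept_all        <- sum(coupon_disc > 0 & coupon_disc < cutoff_all)
kept_discounted <- sum(discounted < cutoff_discounted)

print(c(cutoff_all = unname(cutoff_all), cutoff_discounted = unname(cutoff_discounted)))
print(c(kept_all = kept_all, kept_discounted = kept_discounted))
```

With these invented values the all-rows cutoff removes most of the discounted transactions, while the discounted-only cutoff removes just the largest one. Comparing the plotted range with the median discounts in the summary table is a quick way to check which behavior applies.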

Recommendations for Regork

Retailers and private label managers should take advantage of these findings by optimizing their discounting strategy. Since private labels are already benefiting from a structured discount model, they should explore whether deeper or more frequent promotions would lead to higher conversions, particularly in categories where national brands also compete aggressively on price. Additionally, private labels should test loyalty-driven discounts to encourage repeat purchases rather than one-time conversions.

2. Summary of Discount Transactions

# Load necessary libraries
library(dplyr)
library(knitr)
library(kableExtra)

# Join transactions with product data to link brands
joined_data <- transactions %>%
  inner_join(products, by = "product_id")

total_transactions <- nrow(joined_data)  # All joined transactions, both brand types

# Keep only transactions where a coupon discount was applied
discount_data <- joined_data %>%
  filter(coupon_disc > 0)

# Create a summary table of discounts by brand type
discount_summary <- discount_data %>%
  group_by(brand) %>%
  summarise(
    `Avg Discount` = round(mean(coupon_disc, na.rm = TRUE), 2),
    `Median Discount` = round(median(coupon_disc, na.rm = TRUE), 2),
    `Discounted Transactions` = n(),
    `Total Transactions` = total_transactions,
    `Percentage of Discounted Transactions` = round((`Discounted Transactions` / `Total Transactions`) * 100, 2)
  ) %>%
  arrange(desc(`Avg Discount`))

# Display as a formatted table
discount_summary %>%
  kable("html", escape = FALSE) %>%  # Prevents unwanted escape characters
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = FALSE) %>%
  column_spec(1, bold = TRUE) %>%  # Make brand names bold
  row_spec(0, bold = TRUE, background = "#f7f7f7")  # Style header row

brand	Avg Discount	Median Discount	Discounted Transactions	Total Transactions	Percentage of Discounted Transactions
Private	2.05	1.50	271	1464471	0.02
National	1.03	0.75	18649	1464471	1.27

Analysis

The table reveals a stark difference in coupon discounting between Private Label and National Brands. Private Label products receive a higher average ($2.05) and median ($1.50) coupon discount per discounted transaction than National Brands ($1.03 average, $0.75 median). However, only 271 Private Label transactions received a coupon discount, which is 0.02% of the 1,464,471 joined transactions. National Brands have far more discounted transactions (18,649), or 1.27% of the same total, despite lower individual discounts. Both percentages use all transactions as the denominator, so they are shares of total transactions rather than discount rates within each brand type.

This suggests that coupon discounts on Private Label products are rare, or that they are targeted at a much smaller customer base, while National Brands use coupons more broadly, potentially as a volume-based promotional strategy. Because coupon_disc records manufacturer coupons, part of the gap may reflect who issues coupons rather than Regork's own pricing; loyalty card discounts are recorded separately in retail_disc and are not covered in this table.
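
A within-brand discount rate gives a fairer comparison of how often each brand type is discounted, because it divides by that brand's own transaction count. The sketch below uses invented rows and assumes only the `brand` and `coupon_disc` columns; it reports both the within-brand rate and the share of all transactions side by side.

```r
library(dplyr)

# Invented joined transactions: 4 private label rows, 6 national brand rows
toy_joined <- data.frame(
  brand       = c(rep("Private", 4), rep("National", 6)),
  coupon_disc = c(0, 0, 0, 1.50,   0, 0, 0.75, 0, 1.00, 0),
  stringsAsFactors = FALSE
)

rate_by_brand <- toy_joined %>%
  group_by(brand) %>%
  summarise(
    brand_transactions      = n(),
    discounted_transactions = sum(coupon_disc > 0),
    .groups = "drop"
  ) %>%
  mutate(
    pct_of_brand = discounted_transactions / brand_transactions * 100,      # within-brand rate
    pct_of_all   = discounted_transactions / sum(brand_transactions) * 100  # share of all rows
  )

print(rate_by_brand)
```

The two percentages answer different questions: `pct_of_brand` describes how likely a purchase of that brand type is to be discounted, while `pct_of_all` depends on how large the brand type is to begin with.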

Recommendations for Regork

Given the data, Regork should consider increasing the frequency of Private Label discounts to better compete with National Brands. While Private Label products already offer a higher discount per transaction, the low volume of discounted transactions means that price-sensitive customers may not be incentivized to switch. Implementing more frequent but slightly lower-value discounts on Private Label could encourage trial and retention among customers accustomed to National Brands.

Additionally, targeting discount campaigns to specific demographics (such as lower-income households, identified in previous analyses) could be an effective strategy to drive conversion. Given the low rate of Private Label discounts, it’s also worth testing bundling or loyalty-based discount strategies to increase engagement and long-term switching behavior. Finally, A/B testing different discount structures (e.g., percentage-based vs. fixed-value discounts) could provide insights into the most effective promotional strategies for driving Private Label adoption.
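
The A/B test suggested above could be evaluated with a two-sample test of proportions. The counts below are hypothetical and do not come from the dataset; the arm definitions (percentage-based vs. fixed-value offer) and the conversion measure are assumptions for illustration. `prop.test()` is part of base R's stats package.

```r
# Hypothetical A/B test results (invented counts, not from the dataset)
# Arm A: percentage-based discount; Arm B: fixed-value discount
exposed   <- c(A = 1000, B = 1000)   # households shown each offer
converted <- c(A = 120,  B = 150)    # households that bought the private label item

# Two-sample test for equality of conversion rates
ab_test <- prop.test(x = converted, n = exposed)

conversion_rates <- converted / exposed
print(conversion_rates)
print(ab_test$p.value)
print(ab_test$conf.int)   # confidence interval for the difference in rates
```

In practice the sample size per arm would be chosen in advance so that a difference large enough to matter commercially can be detected.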

Summary

Problem Statement

The primary objective of this analysis was to assess the performance of Private Label brands compared to National Brands, identify the factors influencing consumer purchasing behavior, and determine strategies to improve Private Label sales. Specifically, we aimed to understand Private Label’s market share across different product categories, how income levels influence brand choice, the impact of discounts on consumer preference, and potential brand loyalty insights.

Approach to Analysis

To address these questions, we integrated three key datasets:

Transactions: Contained purchase-level data, including product_id, household_id, total purchase amount, and coupon discounts.
Products: Merged using product_id to associate each transaction with its respective brand (Private Label or National Brand).
Demographics: Merged using household_id to link consumer transactions with household income levels.
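The two merges described above can be sketched as keyed lookups. This is a minimal Python illustration under stated assumptions: only the key names product_id and household_id come from the text, while the other field names, the brand labels, and the rows themselves are hypothetical.

```python
# Hypothetical rows for illustration. Only the key names product_id and
# household_id come from the text; other fields and values are assumptions.
transactions = [
    {"household_id": "h1", "product_id": "p1", "sales_value": 3.50, "coupon_disc": 0.00},
    {"household_id": "h2", "product_id": "p2", "sales_value": 2.00, "coupon_disc": 0.50},
    {"household_id": "h3", "product_id": "p1", "sales_value": 4.25, "coupon_disc": 0.00},
]
products = {"p1": {"brand": "National"}, "p2": {"brand": "Private"}}
demographics = {"h1": {"income": "50-74K"}, "h2": {"income": "Under 15K"}}


def merge_transactions(transactions, products, demographics):
    """Left-join each transaction to its brand and household income."""
    merged = []
    for row in transactions:
        product = products.get(row["product_id"], {})
        household = demographics.get(row["household_id"], {})
        merged.append({
            **row,
            "brand": product.get("brand"),      # None if the product is unknown
            "income": household.get("income"),  # None if no demographic record
        })
    return merged


merged = merge_transactions(transactions, products, demographics)
for row in merged:
    print(row["household_id"], row["brand"], row["income"])
```

The sketch uses a left join so that transactions from households without demographic records are kept with a missing income value. An inner join would silently drop them, which changes sales totals and market share figures.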

We applied the following analytic approach:

Overall Market Share Analysis: A pie chart was created to visualize the proportion of total sales accounted for by Private Label vs. National Brands.
Category-Level Performance: A stacked bar chart was used to compare Private Label penetration across different product departments.
Income-Based Purchasing Behavior: A box plot was employed to assess total spending distribution by income level for both brand types.
Brand Market Share by Income Level: Another stacked bar chart analyzed Private Label vs. National Brand sales distribution across income groups.
Impact of Discounts: A density plot was used to examine how discount distribution differed between Private Label and National Brands, identifying whether Private Label products receive more or deeper discounts.
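The market share figures behind the pie and stacked bar charts reduce to one grouped calculation: Private Label sales divided by total sales within each group. A minimal Python sketch follows. The function name, field names, and rows are hypothetical stand-ins for the merged data.

```python
from collections import defaultdict


def private_label_share(rows, group_key):
    """Return Private Label's percentage of sales within each group."""
    totals = defaultdict(float)
    private = defaultdict(float)
    for row in rows:
        group = row[group_key]
        totals[group] += row["sales_value"]
        if row["brand"] == "Private":
            private[group] += row["sales_value"]
    # Skip groups with zero sales to avoid dividing by zero.
    return {g: 100 * private[g] / totals[g] for g in totals if totals[g] > 0}


# Hypothetical merged rows for illustration only.
rows = [
    {"department": "GROCERY", "brand": "Private", "sales_value": 30.0},
    {"department": "GROCERY", "brand": "National", "sales_value": 70.0},
    {"department": "MEAT", "brand": "National", "sales_value": 50.0},
]
print(private_label_share(rows, "department"))
```

Passing a different group_key, such as an income column, would give the income-level breakdown. Grouping every row under a single constant key would give the overall share.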

Insights

The analysis revealed that Private Label products account for only 28.3% of total sales, indicating that despite their affordability, consumers still favor National Brands. This suggests that brand perception, product quality, or marketing influence may play a significant role in purchasing decisions. Private Label is not without strengths, however: it has captured meaningful market share in select categories, which suggests that price-sensitive consumers exist but that their engagement is not uniform across product types.

Looking deeper into category-level performance, we found that certain departments have strong Private Label penetration, particularly in Grocery and Packaged Goods, where store brands are more common and accepted by consumers. However, categories such as Meat, Seafood, and Cosmetics overwhelmingly favor National Brands, likely due to perceived quality differences or brand loyalty in these segments. This highlights an opportunity for Private Label to expand strategically into categories where consumer trust is a key purchasing factor, whether through improved product positioning or targeted marketing efforts.

Income-based spending behavior also provided an interesting dynamic. While higher-income households allocate greater total spending to National Brands, Private Label’s share remains relatively stable across income brackets. This suggests that while wealthier consumers are willing to pay a premium for established brands, budget-conscious shoppers continue to rely on Private Label at similar rates regardless of earnings. However, because Private Label purchases are typically associated with lower transaction values, it remains unclear whether these consumers make repeat purchases or opt for Private Label only in specific cases.

Lastly, discounting patterns were critical to understanding consumer price sensitivity. The density plot of discount distributions showed that Private Label discounts, when applied, are moderate in size, but the summary of discount transactions showed that they reach very few transactions (271, versus 18,649 for National Brands). National Brands discount far more broadly, and this may be neutralizing Private Label’s primary advantage: price. If consumers see comparable savings on National Brands, the incentive to switch to Private Label diminishes, particularly if brand trust and perceived quality remain differentiating factors.

Together, these insights tell a compelling story about the nuanced challenges Private Label faces. It has carved out a steady share of the market, but growth opportunities remain largely in category expansion, differentiated marketing, and strategic discounting to further shift consumer preference.

Recommendations for Regork

The business implications of this analysis suggest that Regork should refine its Private Label strategy to drive growth without directly competing in areas where National Brands hold an unshakable advantage. Instead of trying to win across all categories, Regork should prioritize expansion in departments where Private Label already has traction, such as Grocery, Packaged Goods, and Store Supplies, while exploring strategic entry into high-margin but underpenetrated categories like Meat and Seafood, where consumers may be open to alternatives with the right positioning.

Strategic Pricing and Discounts

The findings on discounting behavior suggest that frequent National Brand discounts may be diluting Private Label’s price advantage. To counteract this, Regork should consider concentrating Private Label discounting on targeted promotions that drive trial and long-term adoption rather than on broad, general markdowns. For example, Regork could implement loyalty-based discounts for repeat Private Label purchases or bundle Private Label items with high-demand National Brand products to increase exposure and adoption.

Targeted Marketing by Income & Category

Income-based spending behavior suggests that Private Label’s share of spending does not vary significantly with income, meaning that price-conscious shopping exists across all income levels. This presents an opportunity to tailor marketing to consumer segments rather than relying on price-based messaging alone. For lower-income shoppers, promotions should reinforce the cost savings of Private Label, while for mid-to-upper-income consumers, messaging should focus on quality assurance, product innovation, and ethical sourcing to counter the perception of Private Label as a purely budget option.

Product Development & Brand Differentiation

The department-level market share analysis suggests that brand trust is a key barrier in higher-end categories like Cosmetics, Meat, and Seafood. To increase adoption in these areas, Regork should invest in elevating the perceived quality of Private Label through better branding, transparent sourcing, and enhanced packaging. Additionally, exclusive product lines that differentiate Private Label from National Brands—rather than simply imitating them—could drive interest and make Private Label a more attractive alternative.

Final Recommendation to the CEO

Regork should double down on its strongest categories, where Private Label is already winning, while strategically expanding into high-margin but underpenetrated areas with the right product positioning. Instead of relying solely on price-driven competition, the company should refine its discounting strategy to drive repeat purchases, market Private Label as a trusted alternative rather than a budget option, and create differentiation through innovation and branding. These efforts will help Regork grow its Private Label market share sustainably rather than by competing on price alone.

Limitations

While this analysis provides actionable insights, there are limitations. First, it focuses on transactional data without incorporating qualitative factors such as brand perception or consumer sentiment. Second, product availability was not considered, meaning National Brands may dominate certain categories due to stocking constraints rather than preference. Lastly, further analysis on repeat purchase behavior and long-term customer retention would strengthen insights into Private Label loyalty. Future research could integrate loyalty program data, competitor benchmarking, and customer feedback to refine recommendations further.

Regork Grocery Chain: Private Label vs. National Brand Preference

Sanjana Chenna

March 2, 2025
