Introduction

Problem Statement

Synopsis (business problem, how to address the problem, and how the analysis will help Regork):

The business problem we aim to solve is understanding how consumer demographics influence the purchase of national brands versus private brands. To achieve this, we will analyze total sales of both private and national brands across various demographic factors such as age, household size, income, and more. From there, we can identify trends, pinpoint key relationships between consumer backgrounds and brand preferences, and uncover insights that drive purchasing decisions.

Through our research, we found that private brand purchases generate higher profit margins for grocery chains. In addition to being more lucrative, private brands offer retailers greater control over product quality, pricing, and overall brand strategy. By identifying the demographics that tend to prefer national brands, we can develop strategies to shift their purchasing behavior toward private brands. More specifically, our goal is to use data-driven insights to understand why certain consumers favor national brands and determine how to influence their purchasing decisions.


Packages & Libraries Required

Packages:

library(tidyverse) - collection of R packages used for data manipulation, visualization, and analysis

tidyr - reshaping and cleaning data

ggplot2 - customizing data visualizations

stringr - for cleaning and transforming text data

dplyr - helpful for filtering, summarizing, and mutating data

library(completejourney) - provides access to the Complete Journey datasets used to analyze consumer transaction data

library(lubridate) - simplifies working with date and time data


# Load packages
library(tidyverse)
library(completejourney)
library(lubridate)

# Load datasets
transactions <- completejourney::transactions_sample
products <- completejourney::products
demographics <- completejourney::demographics

# Merge data
data <- transactions %>%
  left_join(products, by = "product_id") %>%
  left_join(demographics, by = "household_id")

# Create brand classification
data <- data %>% 
  mutate(brand_type = ifelse(brand == "Private", "Private Label", "National Brand"))

Sales Overview

Total Sales by Brand Type

This graph shows the total sales in dollars for both national brands and private brands. We are able to use the results from the graph to compare the performances.

brand_sales <- data %>%
  filter(!is.na(brand_type)) %>% 
  group_by(brand_type) %>%
  summarise(total_sales = sum(sales_value), total_transactions = n()) %>%
  arrange(desc(total_sales))

ggplot(brand_sales, aes(x = brand_type, y = total_sales, fill = brand_type)) +
  geom_col() +
  labs(title = "Total Sales by Brand Type", x = "Brand Type", y = "Total Sales ($)") +
  theme_minimal()

Finding: Our analysis shows that national brands far outsell private brands. National brands generate $167,798.07 in sales, while private brands account for only $66,132.50. Private brands therefore make up only about 28% of total sales, which points to a strong consumer preference for national brands. Because private brands are underrepresented in purchases, we next examine the demographic data to identify where that gap might be narrowed.
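The private brand share can be checked directly from the two totals reported in this finding. The sketch below is only a worked calculation on those figures; the object names `brand_totals` and `brand_share` are illustrative and not part of the original analysis.

```r
# Totals copied from the finding above
brand_totals <- c("National Brand" = 167798.07, "Private Label" = 66132.50)

# Each brand type's percentage share of combined sales
brand_share <- round(100 * brand_totals / sum(brand_totals), 1)
print(brand_share)
```

The same calculation could be added to `brand_sales` with a `mutate()` step, so the share is reported alongside the dollar totals.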


Demographic Analysis

Age Group

This graph compares national brand and private brand sales for each age group. It shows total sales for all 6 age levels, ranging from 19 years old to over 65.

age_sales <- data %>%
  filter(!is.na(brand_type)) %>% 
  group_by(age, brand_type) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales))


ggplot(age_sales, aes(x = age, y = total_sales, fill = brand_type)) +
  geom_col(position = "dodge") +
  labs(title = "Brand Sales by Age Group", x = "Age Group", y = "Total Sales") +
  theme_minimal()

Finding: Our analysis shows that the 45 to 54 age group purchases the most of both brand types. Total sales in this age group were $35,571.82 for national brands and $13,609.09 for private labels. This means people between 45 and 54 years old account for 42% of national brand purchases and 45% of private label purchases. The next highest sales for both brand types came from the 35 to 44 age group, which trails the leading group by about $10,000 for national brands and about $3,000 for private labels. Together, these two age groups make up 72% of national brand purchases and 80% of private label purchases, which marks them as the target market for these products.
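The percentages in the demographic findings are each group's share of a brand type's total sales, a step the summary code does not show. A minimal sketch of that step follows, assuming a summary table with `brand_type` and `total_sales` columns like `age_sales`. The helper name `add_brand_share` and the toy numbers are hypothetical and are not results from the data.

```r
library(dplyr)

# Hypothetical helper: adds each group's share of its brand type's total sales
add_brand_share <- function(sales_summary) {
  sales_summary %>%
    group_by(brand_type) %>%
    mutate(share_pct = round(100 * total_sales / sum(total_sales), 1)) %>%
    ungroup()
}

# Made-up numbers purely for illustration, not results from the data
toy_sales <- tibble(
  age = c("35-44", "45-54", "35-44", "45-54"),
  brand_type = c("National Brand", "National Brand", "Private Label", "Private Label"),
  total_sales = c(300, 700, 100, 300)
)

toy_shares <- add_brand_share(toy_sales)
print(toy_shares)
```

On the real data this could be applied as `add_brand_share(age_sales)`. Households without demographic records appear as an `NA` group after the left join, so whether they are filtered out first changes the denominators.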

Income Level

This graph compares national brand and private brand sales for each income level. It shows total sales for all 11 income levels, ranging from under $15,000 to over $250,000.

income_sales <- data %>%
  filter(!is.na(brand_type)) %>% 
  group_by(income, brand_type) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales))


ggplot(income_sales, aes(x = income, y = total_sales, fill = brand_type)) +
  geom_col(position = "dodge") +
  labs(title = "Brand Sales by Income Level", x = "Income Level", y = "Total Sales") +
  theme_minimal()

Finding: The income range with the highest purchases of both national and private brands is $50K - $74K. Consumers in this range account for $23,922.88 in national brand sales and $9,053.79 in private label sales. The $35K - $49K and $75K - $99K income levels also show relatively high total sales, while the remaining ranges each contribute only a small share.

Household Size

This graph compares national brand and private brand sales for each household size. It shows total sales for all 5 household sizes, ranging from 1 person to 5 or more people.

household_sales <- data %>%
  filter(!is.na(brand_type)) %>% 
  group_by(household_size, brand_type) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales))

ggplot(household_sales, aes(x = household_size, y = total_sales, fill = brand_type)) +
  geom_col(position = "dodge") +
  labs(title = "Brand Sales by Household Size", x = "Household Size", y = "Total Sales") +
  theme_minimal()

Finding: The results show that 2-person households are the highest-purchasing consumers, with single-person households close behind. People who live in a 1 or 2 person household account for 57% of national brand purchases and 42% of private label purchases. Although there are 5 household size categories, these smaller households are the main group purchasing private labels.

Marital Status

This graph compares total sales of national brands and private brands for both marital statuses, married and unmarried.

marital_sales <- data %>%
  filter(!is.na(brand_type)) %>% 
  group_by(marital_status, brand_type) %>%
  summarise(total_sales = sum(sales_value)) %>%
  arrange(desc(total_sales))

ggplot(marital_sales, aes(x = marital_status, y = total_sales, fill = brand_type)) +
  geom_col(position = "dodge") +
  labs(title = "Brand Sales by Marital Status", x = "Marital Status", y = "Total Sales") +
  theme_minimal()

Finding: The data reveals that both married and unmarried individuals contribute fairly equally to total sales, though married individuals purchase slightly more of both brands. For national brands, married individuals account for $44,512.23 in sales, representing 56% of purchases within this demographic, while unmarried individuals make up the remaining 44%. Similarly, for private brands, married individuals contribute $17,398.47 in sales, making up 55% of purchases.


Promotion Analysis

Promotional Impact by Discount Type

This graph displays total sales of national brands and private labels for purchases made with a promotion.

promo_impact <- data %>%
  filter((coupon_disc > 0 | retail_disc > 0) & !is.na(brand_type)) %>%
  group_by(brand_type) %>%
  summarise(
    avg_coupon_discount = mean(coupon_disc, na.rm = TRUE),
    avg_retail_discount = mean(retail_disc, na.rm = TRUE),
    total_sales_with_promo = sum(sales_value, na.rm = TRUE)
  )

ggplot(promo_impact, aes(x = brand_type, y = total_sales_with_promo, fill = brand_type)) +
  geom_col() +
  labs(title = "Total Sales with Promotions by Brand Type", x = "Brand Type", y = "Total Sales ($)") +
  theme_minimal()

Finding: Our analysis shows that national brands also outsell private brands among purchases made with a promotion. National brands generate $76,233.08 in promotional sales, while private brands account for $43,711.92. However, the split is broadly similar to the one in total sales (promotional and non-promotional combined): national brands make up about 64% of promotional sales, compared with about 72% of all sales.
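How much each brand type depends on promotions can be worked out from the promotional totals in this finding and the overall totals in the sales overview. The sketch below is a worked calculation on those reported figures only; the object names are illustrative.

```r
# Figures copied from the findings in this report
total_sales <- c("National Brand" = 167798.07, "Private Label" = 66132.50)
promo_sales <- c("National Brand" = 76233.08, "Private Label" = 43711.92)

# Share of each brand type's own sales that involved a discount
promo_share_of_brand <- round(100 * promo_sales / total_sales, 1)

# Each brand type's share of all promotional sales
brand_share_of_promo <- round(100 * promo_sales / sum(promo_sales), 1)

print(promo_share_of_brand)
print(brand_share_of_promo)
```

The first vector shows that discounted purchases make up a larger portion of private brand sales than of national brand sales, which is the pattern the recommendations rely on.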

Effectiveness of Coupon vs. Retail Discounts

This graph compares national brands and private brands for different discount types, coupon discount and retail discount.

promo_type_effectiveness <- data %>%
  filter((coupon_disc > 0 | retail_disc > 0) & !is.na(brand_type)) %>%
  # Note: a row with both discounts is labeled "Coupon Discount" because the coupon check runs first
  mutate(discount_type = ifelse(coupon_disc > 0, "Coupon Discount", "Retail Discount")) %>%
  group_by(brand_type, discount_type) %>%
  summarise(total_sales_with_promo = sum(sales_value, na.rm = TRUE)) %>%
  arrange(desc(total_sales_with_promo))

ggplot(promo_type_effectiveness, aes(x = discount_type, y = total_sales_with_promo, fill = brand_type)) +
  geom_col(position = "dodge") +
  labs(title = "Effectiveness of Coupon vs. Retail Discounts", x = "Discount Type", y = "Total Sales ($)") +
  theme_minimal()

Finding: The results show major differences in total sales between coupon discounts and retail discounts. Retail discounts were associated with total sales of $72,215.85 for national brands and $43,673.10 for private brands, whereas coupon discounts accounted for very little. National brand revenue with a coupon discount was $4,017.23, and private labels had almost none (roughly $38.82, the difference between their $43,711.92 promotional total and their retail-discount sales). This suggests that retail discounts accompanied far more sales and were more effective in attracting consumers.
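The classification above assigns a purchase that carries both discount types to the coupon group. One possible alternative, sketched here under the assumption that such purchases should be reported separately, uses `case_when()` with an explicit "Both" category. The toy rows are hypothetical and are not taken from the data.

```r
library(dplyr)

# Hypothetical rows for illustration only
toy_tx <- tibble(
  coupon_disc = c(0, 0.50, 0.25, 0),
  retail_disc = c(1.00, 0, 0.75, 0)
)

# Conditions are checked in order, so "Both" must come first
classified <- toy_tx %>%
  mutate(discount_type = case_when(
    coupon_disc > 0 & retail_disc > 0 ~ "Both",
    coupon_disc > 0 ~ "Coupon Discount",
    retail_disc > 0 ~ "Retail Discount",
    TRUE ~ "No Discount"
  ))

print(classified)
```

Separating the overlap would show whether coupon sales are mostly stand-alone or stacked on top of retail discounts.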

Comparison of Promotional Impact Over Time

This graph tracks the total sales with promotions for national and private brands over the time period of January 2017 to October 2017.

promo_time <- data %>%
  filter((coupon_disc > 0 | retail_disc > 0) & !is.na(brand_type)) %>%
  mutate(purchase_month = floor_date(transaction_timestamp, "month")) %>%
  group_by(purchase_month, brand_type) %>%
  summarise(total_sales_with_promo = sum(sales_value, na.rm = TRUE))

ggplot(promo_time, aes(x = purchase_month, y = total_sales_with_promo, color = brand_type)) +
  geom_line(linewidth = 1) +  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  labs(title = "Sales Impact of Promotions Over Time", x = "Month", y = "Total Sales with Promotions ($)") +
  theme_minimal()

Finding: This line graph shows the trends in total sales with promotions for both brand types. National brand sales fluctuated but had a slight positive slope overall, with high purchasing periods in March and at the end of May and the lowest total sales in April. Like national brands, private labels showed gains punctuated by many spikes and dips, but their overall trend shifted downward. Their major peaks came at the end of May, in July, and in August, and their lowest revenue came in March. However, both brand types ended October 2017 at their highest total sales and were still growing, which suggests that future sales will keep rising.
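The direction of a monthly series can be quantified with a fitted slope rather than read off the chart. The sketch below assumes a simple linear trend is adequate and uses made-up monthly totals, not results from the data. The names `toy_monthly` and `month_index` are hypothetical.

```r
# Made-up monthly promotional sales for one brand type, for illustration only
toy_monthly <- data.frame(
  month_index = 1:5,
  total_sales_with_promo = c(10, 12, 11, 14, 15)
)

# Fit a straight line; the slope estimates the average change per month
trend_fit <- lm(total_sales_with_promo ~ month_index, data = toy_monthly)
monthly_slope <- coef(trend_fit)[["month_index"]]
print(monthly_slope)
```

A positive slope is consistent with a rising trend. With only ten monthly points per brand type in the January to October window, any forecast based on it should be treated cautiously.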

Summary & Recommendations

Key Findings

Our business problem was to understand how consumer demographics influence the purchase of national brands versus private brands. To achieve this goal, we analyzed total sales of both brand types across various demographic factors, including age, household size, income, and marital status. This approach allowed us to identify trends, uncover key relationships between consumer backgrounds and brand preferences, and gain insights into the factors driving purchasing decisions.

Through our analysis, we found it interesting that private and national brands share the same target demographic market and similar promotion trends. Across every category we examined, national brands had roughly 2.5 times the total sales of private labels. Finding this pattern early was beneficial because it allowed us to refine our analysis, focus on key demographic segments, and better understand the impact of promotions on consumer purchasing behavior. Because both brand types attract similar demographics, we used the data to look for other ways to increase private label sales.
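The size of the gap between national and private sales can be checked from the dollar figures reported in the findings above. The sketch below is a worked calculation on those figures; the object names and segment labels are illustrative.

```r
# National and private totals copied from the findings in this report
national <- c(overall = 167798.07, age_45_54 = 35571.82,
              income_50_74k = 23922.88, married = 44512.23)
private  <- c(overall = 66132.50, age_45_54 = 13609.09,
              income_50_74k = 9053.79, married = 17398.47)

# National sales as a multiple of private sales
sales_ratio <- round(national / private, 2)

# The same gap expressed as "percent more than private"
pct_more <- round(100 * (national / private - 1))

print(sales_ratio)
print(pct_more)
```

A ratio near 2.5 means national sales are about 250% of private sales, which is about 150% more rather than 250% more.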

Proposed Strategy for Regork

What we propose: To gain more private brand customers, we propose promotions tailored to the private brand target market, along with improvements in product quality and range to widen the customer base. From the data, our highest-purchasing private brand consumer has an income between $50K and $74K, is married, lives in a household of 2, and is between 45 and 54 years old. There is an opportunity to gain more revenue by giving this core market reasons to return and buy private brand products again. Regork can run retail promotions on the private-label goods this group purchases most often to strengthen brand affinity and drive long-term loyalty. Strategies such as bundled deals, loyalty program perks, or premium private-label offerings that encourage repeat purchases should help total sales keep growing in these already lucrative segments. Since sales with promotions made up a large percentage of overall private brand revenue, we believe more visible and attractive discounts, similar to those used for national brands, will be beneficial.

We also believe Regork can continue to improve its private brand by shifting focus from price discounts alone to quality, exclusivity, and value perception. If private brands can match or exceed the quality of national brands while providing better value, they will be able to capture a new target market. For example, introducing specialty lines that cater to health-conscious, eco-friendly, or gourmet-focused consumers could bring in revenue that neither brand type currently captures. Additionally, by creating new and innovative products, flavors, and packaging, Regork can build a competitive advantage over national brands. Lastly, making the in-store and online shopping experience more effective and visually appealing can help reshape consumer perception and increase total sales.

Limitations & Future Work

  • Data limitations: The analysis is based on historical data and does not account for real-time market changes.
  • Consumer behavior variability: External factors such as economic conditions and brand loyalty were not directly analyzed.
  • Potential bias in promotional data: The study assumes all discounts were equally promoted to consumers.
  • Future improvements: Incorporating regional or store-specific data could refine targeting strategies.