Regork’s Business Opportunity

Introduction

As a data scientist as Regork, I have been asked to identify potential growth opportunities that Regork can invest in to grow revenue. Regork prides itself on being a premier grocer that adds value to consumer’s lives. This analysis explores which products Regork can target based on consumer income level for increased sales through marketing activities that will add value for Regork and its consumers.

Analysis Question: Which products can we target based on the customer’s income level to increase the quantity of items purchased per basket (basket size)?

To answer this question fully, I’ll address these additional questions along the way:

Q1: What is the average quantity of products purchased per basket?
Q2: Do larger basket sizes lead to higher sales?
Q3: Which brand of products has a higher sales value?
Q4: What are the most common products purchased by each income level?

Definitions of key variables:

basket_id: uniquely identifies each purchase occassion
product_id: uniquely identifies each product
quantity: describes how many of the unique product was purchased
sales_value: describes the amount in dollars Regork received from the sale after any discounts applied
department: groupings of similar products
brand: Indicator the describes if the product comes from a private or national brand
product_category: groupings of similar products at a level below department
product_type: groupings of products at a level below product_category

Analysis Setup

Libraries

library(tidyverse, quietly = TRUE) # compilation of packages used for tidying and visualizing data
library(completejourney, quietly = TRUE) # grocery store transaction, product, and demographic data
library(RColorBrewer) # improved colors for plots
library(reactable) # for displaying data tables

palette <- brewer.pal(n = 9, name = "Pastel1")

Tidying Data: Joining, Filtering, Mutating, and Factor Collapse

# Retrieving and creating data frame from completejourney library, then
# joining transactions, products, and demographics data tables
df <- get_transactions() %>%
  inner_join(demographics, by = "household_id") %>%
  left_join(products, by = "product_id") %>%
  select(household_id, basket_id, product_id, quantity, sales_value, income, department, brand, product_category, product_type) %>% # narrow down to variables of interest
  mutate(income = fct_collapse(
    income,
    "low income" = c("Under 15K", "15-24K", "25-34K", "35-49K"),
    "middle income" = c("50-74K", "75-99K", "100-124K"),
    "upper income" = c("125-149K", "150-174K", "175-199K", "200-249K", "250K+")
  )) # recode income levels for ease of analysis

Starting Point

# starting point

tibble(df)

Exploratory Data Analysis

Q1

What is the average quantity of products purchased per basket?

There are high outliers in the quantity field of this data set relating to two categories in the product_category column. The observations relating to COUPON/MISC ITEMS and FUEL have been filtered out. From this analysis we learn that on average, customers purchase 15 items (rounded up) on each trip to Regork.

df <- df %>%
  filter(product_category != "COUPON/MISC ITEMS") %>%
  filter(product_category != "FUEL")

# what is the average quantity of products per basket?
df %>%
  group_by(basket_id) %>%
  summarize(total_quantity = sum(quantity, na.rm = TRUE)) %>%
  summary(total_quantity)

##   basket_id         total_quantity  
##  Length:72992       Min.   :  0.00  
##  Class :character   1st Qu.:  3.00  
##  Mode  :character   Median :  8.00  
##                     Mean   : 14.59  
##                     3rd Qu.: 19.00  
##                     Max.   :220.00

Is the average basket size different for each income level?

The following analysis demonstrates that average basket size does change by income level. The average basket sizes based on income levels are:

Low Income: 13.03 items
Middle Income: 15.6 items
Upper Income: 17.89 items

Low Income

df %>%
  filter(income == "low income") %>%
  group_by(basket_id) %>%
  summarize(total_quantity = sum(quantity, na.rm = TRUE)) %>%
  summary(total_quantity)

##   basket_id         total_quantity  
##  Length:36715       Min.   :  0.00  
##  Class :character   1st Qu.:  3.00  
##  Mode  :character   Median :  7.00  
##                     Mean   : 13.03  
##                     3rd Qu.: 16.00  
##                     Max.   :220.00

Middle Income

df %>%
  filter(income == "middle income") %>%
  group_by(basket_id) %>%
  summarize(total_quantity = sum(quantity, na.rm = TRUE)) %>%
  summary(total_quantity)

##   basket_id         total_quantity 
##  Length:27340       Min.   :  0.0  
##  Class :character   1st Qu.:  3.0  
##  Mode  :character   Median :  9.0  
##                     Mean   : 15.6  
##                     3rd Qu.: 21.0  
##                     Max.   :219.0

Upper Income

df %>%
  filter(income == "upper income") %>%
  group_by(basket_id) %>%
  summarize(total_quantity = sum(quantity, na.rm = TRUE)) %>%
  summary(total_quantity)

##   basket_id         total_quantity  
##  Length:8937        Min.   :  0.00  
##  Class :character   1st Qu.:  4.00  
##  Mode  :character   Median : 10.00  
##                     Mean   : 17.89  
##                     3rd Qu.: 25.00  
##                     Max.   :210.00

Q2

Do larger basket sizes lead to higher sales?

We can see from this scatter plot a positive correlation between basket size and sales value. This affirms that if we can increase basket size through targeted marketing campaigns, revenue will increase.

df %>%
  group_by(basket_id) %>%
  summarize(
    basket_size = sum(quantity, na.rm = TRUE),
    total_sale = sum(sales_value, na.rm = TRUE)) %>%
  ggplot(aes(total_sale, basket_size)) +
  geom_point(alpha = .1) +
  geom_smooth(method = "lm", se = FALSE, color = palette[1]) +
  geom_smooth(method = "auto", se = FALSE, color = palette[2]) +
  coord_flip() +
  scale_x_continuous(labels = scales::dollar) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Relationship Between Basket Size & Total Sale",
    subtitle = "Based on Regork transaction data from 2016-2017",
    y = "Basket Size",
    x= "Total Sale"
  )

Q3

Which brand of products has a higher sales value?

As we work to determine which products we could target for consumers to purchase more of, it is relevant to ask how we might maximize the increase in sales value. One way to do that is to compare the sales_value of products based on whether it comes from a national or private brand.

Sales_value represents the dollar amount Regork receives from the sale of a particular product. We can see from this analysis that Regork makes more on average from selling national brand products.

National

# Average Sales Value per National Brand Product
asv_national <- df %>%
  filter(brand == "National") %>%
  summarise(
    total_sale = sum(sales_value, na.rm = TRUE),
    total_quantity = sum(quantity, na.rm = TRUE)
    ) %>%
  mutate(revenue_per_item = round((total_sale / total_quantity), 2))

reactable(asv_national, bordered = TRUE, striped = TRUE, compact = TRUE)

Private

# Average Sales Value per Private Brand Product
asv_private <- df %>%
  filter(brand == "Private") %>%
  summarise(
    total_sale = sum(sales_value, na.rm = TRUE),
    total_quantity = sum(quantity, na.rm = TRUE)
    ) %>%
  mutate(revenue_per_item = round((total_sale / total_quantity), 2))

reactable(asv_private, bordered = TRUE, striped = TRUE, compact = TRUE)

Q4

What are the most common products purchased by each income level?

This section of the analysis identifies the top 30 products purchased by each income level. Why 30? 30 represents roughly double the average number of products purchased on each visit to Regork. Remember that the average basket size across all income levels is ~15 items. By examining the top 30 products, we can identify the following:

the top 15 products relate to the products customers are already most likely to purchase
the next 15 identify products that consumer’s already regularly purchase (albeit less frequently)
the opportunity here is that if we can create promotions that incentivize consumers to purchase products in the ranking of 16-30 whenever they purchase items in the ranking of 1-15, overall basket size will increase, and therefore total sales.

Top 30 Products: Low Income

top_low_prods <- df %>%
  filter(income == "low income") %>%
  group_by(product_id) %>%
  summarize(total_quantity = sum(quantity)) %>%
  slice_max(total_quantity, n = 30) %>%
  left_join(products, by = "product_id") %>%
  select(-c(manufacturer_id, package_size))

reactable(top_low_prods, filterable = TRUE, searchable = TRUE, defaultPageSize = 5, highlight = TRUE, bordered = TRUE, striped = TRUE, compact = TRUE)

p1 <- top_low_prods %>%
  slice(1:15) %>%
  ggplot(aes(x = total_quantity, y = product_type)) +
  geom_point(aes(color = department), size = 3) +
  geom_segment(
    aes(x = 0, xend = total_quantity, y = product_type, yend = product_type, color = department),
    linewidth = 0.1
  ) +
  labs(title = "Top 15 Products Purchased by Low Income Households",
       subtitle = "Based on Regork transaction data from 2016-2017",
    x = "Total Quantity Purchased",
    y = "Product"
    
  )

p1

Top 30 Products: Middle Income

top_mid_prods <- df %>%
  filter(income == "middle income") %>%
  group_by(product_id) %>%
  summarize(total_quantity = sum(quantity)) %>%
  slice_max(total_quantity, n = 30) %>%
  left_join(products, by = "product_id") %>%
  select(-c(manufacturer_id, package_size))

reactable(top_mid_prods, filterable = TRUE, searchable = TRUE, defaultPageSize = 5, highlight = TRUE, bordered = TRUE, striped = TRUE, compact = TRUE)

p2 <- top_mid_prods %>%
  slice(1:15) %>%
  ggplot(aes(x = total_quantity, y = product_type)) +
  geom_point(aes(color = department), size = 3) +
  geom_segment(
    aes(x = 0, xend = total_quantity, y = product_type, yend = product_type, color = department),
    linewidth = 0.1
  ) +
  labs(title = "Top 15 Products Purchased by Middle Income Households",
       subtitle = "Based on Regork transaction data from 2016-2017",
    x = "Total Quantity Purchased",
    y = "Product"
    
  )

p2

Top 30 Products: Upper Income

top_up_prods <- df %>%
  filter(income == "upper income") %>%
  group_by(product_id) %>%
  summarize(total_quantity = sum(quantity)) %>%
  slice_max(total_quantity, n = 30) %>%
  left_join(products, by = "product_id") %>%
  select(-c(manufacturer_id, package_size))

reactable(top_up_prods, filterable = TRUE, searchable = TRUE, defaultPageSize = 5, highlight = TRUE, bordered = TRUE, striped = TRUE, compact = TRUE)

p3 <- top_up_prods %>%
  slice(1:15) %>%
  ggplot(aes(x = total_quantity, y = product_type)) +
  geom_point(aes(color = department), size = 3) +
  geom_segment(
    aes(x = 0, xend = total_quantity, y = product_type, yend = product_type, color = department),
    linewidth = 0.1
  ) +
  labs(title = "Top 15 Products Purchased by Upper Income Households",
       subtitle = "Based on Regork transaction data from 2016-2017",
    x = "Total Quantity Purchased",
    y = "Product"
  )

p3

Summary

Summary of Problem Statment

My goal was to analyze which products Regork could target in marketing campaigns to increase overall revenue by increasing basket size. By utilizing the available transaction data I was able to identify 15 products for each income level Regork can target for marketing and promotion offerings.

Methodology

In order to address the question of this analysis I broke the question down into sub-sections. I first sought to understand how many items customers tend to purchase in a single visit, the average of which is ~15 (this varies slightly by income level). Although it may seem obvious, I then confirmed that larger basket sizes do in fact correlate with larger sales_values, which means greater revenue to Regork. Next, I sought to understand if national brand or private brand goods yield greater revenue for Regork. This analysis showed that national brand product sales have a greater positive impact to our revenue. Finally, I sought to understand the 30 most common products purchased by each income level.

In this methodology is an assumption that it is easier to get customers to purchase more of products they already purchase than to push a customer to a new product altogether.

Findings

Based on this analysis, Regork should target the following products for cross-promotions to each income level:

Low Income Households

top_low_prods %>%
  slice(16:30) %>%
  select(product_id, product_type) %>%
  reactable(defaultPageSize = 10, highlight = TRUE, bordered = TRUE, striped = TRUE, compact = TRUE)

Middle Income Households

top_mid_prods %>%
  slice(16:30) %>%
  select(product_id, product_type) %>%
  reactable(defaultPageSize = 10, highlight = TRUE, bordered = TRUE, striped = TRUE, compact = TRUE)

Upper Income Households

top_up_prods %>%
  slice(16:30) %>%
  select(product_id, product_type) %>%
  reactable(defaultPageSize = 10, highlight = TRUE, bordered = TRUE, striped = TRUE, compact = TRUE)

Implications for Consumers

Based on this analysis, I propose that Regork designs promotions (i.e. discounts) in the following structure: Offer discounts so that when a customer purchases an item in the top 15 for their income level, they receive a discount on an item from the remaining items in the top 30 products, provided that this purchase is made during the same trip. Such a promotion would have the effect of increasing basket_size and therefore revenue for Regork; and for the consumer it would help them save on the products they already know and love. The paired item in the discount should be from a national brand to maximize revenue generation.

Limitations

This analysis makes an assumption that is worth testing to determine if it holds. Further analysis could also be done to maximize this approach and make targeted promotions even more individualized to the consumer. For example, the pairing of products for discounts could be changed by store_id to improve offerings at a local level. It would also be worth noting if there are significant changes if income levels were split into more subgroups (e.g. 5 groups, instead of 3).

About the Author

Contact Information

Email - moermojl@ucmail.uc.edu

LinkedIn - jmoermond

Education

Master of Arts

Educational Leadership

Eastern Michigan University

Graduated - May 2015

Bachelor of Arts

Communication Studies

Northern Kentucky University

Graduated - December 2012

Employment

University of Cincinnati

Assistant Director, Resident Education and Development January 2022 to Present

Serve as the primary liaison to the Office of the Dean of Students, Office of Conduct and Community Standards, and Title IX on all departmental student conduct and community standards matters
Provide ongoing professional development for staff on best practices and trends in student conduct practice
Develop and facilitate training on university policy, procedures, and guidelines
Collect, maintain and distribute information on student conduct in housing including charges, turn-around time, demographics, outstanding cases and sanctions and behavioral trends
Investigate and ensure timely intervention with residents of concern including the creation of intervention plans and follow-up to ensure compliance with recommendations

Area Coordinator, Resident Education and Development August 2020 to December 2021

Managed daily residence life operations for a neighborhood of 2,100 residents
Provided direct and indirect supervision to Community Coordinators and Resident Advisors
Adjudicated student conduct cases within expected timelines and protocol
Chaired Inclusive Excellence Committee to implement monthly diversity, equity, and inclusion development opportunities
Chaired the RA Recruitment Committee and led efforts to increase the number, quality, and diversity of RA applicants. Revised selection process to be more inclusive, reduce evaluator bias, and align with department values.
Implemented Mental Health First Aid training for department professional staff.

Eastern Michigan University

Complex Director, Housing and Residence Life October 2016 to August 2020

Southern Illinois University Edwardsville

Community Director, University Housing July 2015 to October 2016

2023 BANA 7025 Final Project

Joshua L. Moermond

2023-02-23

Regork’s Business Opportunity

Introduction

Analysis Setup

Exploratory Data Analysis

Q1

Is the average basket size different for each income level?

Low Income

Middle Income

Upper Income

Q2

Q3

National

Private

Q4

Top 30 Products: Low Income

Top 30 Products: Middle Income

Top 30 Products: Upper Income

Summary

Summary of Problem Statment

Methodology

Findings

Low Income Households

Middle Income Households

Upper Income Households

Implications for Consumers

Limitations

About the Author

Contact Information

Education

Master of Arts

Bachelor of Arts

Employment

University of Cincinnati

Eastern Michigan University

Southern Illinois University Edwardsville