Regork Growth Opportunities

Riley Brandenburg | 10/3/2023

Overview

Growth Opportunities

How does the size of a package and category type correlate to the total sales value of the transaction in accordance with household size? By analyzing how package size and department category influences sales value, Regork will be able to understand how promotions influence order volume and therefore increase sales.

My Solution

By joining Transactions / Demographics / Products I was able to pin point exactly what package sizes and catagories yield a greater or lesser return. From here Regork is able to optimize promotions to increase overall sales value.

Packages Required

completejourney - provides access to data sets characterizing household level transactions over one year from a group of 2,469 households.

ggplot2 -data visualization package.

dplyr - aims to provide a function for each basic verb of data manipulation.

tidyverse - an R programming package that helps to transform and better present data.

#Load the packages I plan to use:
library(completejourney)
## Welcome to the completejourney package! Learn more about these data
## sets at http://bit.ly/completejourney.
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ✔ readr     2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#Load the full data sets of Transactions, Promotions, Products, and Demographics:
Transactions <- get_transactions()
Demos <- demographics
Promotions <- get_promotions()
Prods <- products

#Joining data:

Trans_Demos <- inner_join(Demos, Transactions, by = join_by("household_id"))

Transs_Demos_Prods <- left_join(Trans_Demos, Prods)
## Joining with `by = join_by(product_id)`
glimpse(Transs_Demos_Prods)
## Rows: 828,850
## Columns: 24
## $ household_id          <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
## $ age                   <ord> 65+, 65+, 65+, 65+, 65+, 65+, 65+, 65+, 65+, 65+…
## $ income                <ord> 35-49K, 35-49K, 35-49K, 35-49K, 35-49K, 35-49K, …
## $ home_ownership        <ord> Homeowner, Homeowner, Homeowner, Homeowner, Home…
## $ marital_status        <ord> Married, Married, Married, Married, Married, Mar…
## $ household_size        <ord> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ household_comp        <ord> 2 Adults No Kids, 2 Adults No Kids, 2 Adults No …
## $ kids_count            <ord> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ store_id              <chr> "436", "436", "436", "436", "436", "436", "436",…
## $ basket_id             <chr> "31317046240", "31317046240", "31317046240", "31…
## $ product_id            <chr> "823721", "832990", "854920", "856942", "868401"…
## $ quantity              <dbl> 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, …
## $ sales_value           <dbl> 2.99, 5.98, 1.49, 2.99, 0.59, 2.50, 0.34, 2.42, …
## $ retail_disc           <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.89, 0.00, 0.00, …
## $ coupon_disc           <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, …
## $ coupon_match_disc     <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, …
## $ week                  <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ transaction_timestamp <dttm> 2017-01-07 13:55:24, 2017-01-07 13:55:24, 2017-…
## $ manufacturer_id       <chr> "317", "1266", "1251", "159", "593", "586", "179…
## $ department            <chr> "GROCERY", "GROCERY", "GROCERY", "GROCERY", "GRO…
## $ brand                 <fct> National, National, National, National, National…
## $ product_category      <chr> "CHEESE", "MOLASSES/SYRUP/PANCAKE MIXS", "SOUP",…
## $ product_type          <chr> "GRATED CHEESE", "MOLASSES & SYRUPS", "CONDENSED…
## $ package_size          <chr> "8 OZ", "14.5 OZ", "10.5 OZ", "16 OZ", "10 OZ", …

Visualizing the Numbers

Transs_Demos_Prods %>%
  ggplot(aes(x = package_size, y = sales_value, color = household_size)) +
  geom_point(aes(x=))+
  geom_smooth(method = "lm", se = FALSE, color = "set1") +
  scale_color_brewer() +
  labs(title = "Retailer Sales Value by Package Size",
        subtitle = "Household transaction data.",
        x = "Package Size",
        y = "Total Sales Value ($)",
        color = "Household Size")
## `geom_smooth()` using formula = 'y ~ x'

Transs_Demos_Prods %>%
  filter(sales_value < 20) %>%
  ggplot(aes(x = sales_value, color = department, fill = department)) +
  geom_density(alpha = .0004)

ggplot(Prods, aes(x = factor(department))) +  
  geom_bar(aes(y = (..count..)/sum(..count..))) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1L))+
  ggtitle("% of Products", ) +
  xlab('Departments') +
  theme(axis.text.x=element_text(angle=45, hjust=1))+
  ylab('Percentage')
## Warning: The dot-dot notation (`..count..`) was deprecated in ggplot2 3.4.0.
## ℹ Please use `after_stat(count)` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Interpreting the Data

Within my 3 data visualization break-downs, we are able to identify 3 major themes.

  1. As household size increases, package size and transaction volume also increase. This data analysis is helpful in understanding market segmentation and ways Regork can best optimize targeting different demographics. Based on these findings, I would recommend promotions specifically based on targeting the size of a household. By motivating larger households to increase the volume and quality of transactions, we would see an increase in overall sale value.

  2. By visualizing how a product category contributes to overall sales value, we are able to combine our knowledge of household size with category density to increase customers overall purchasing power. By specifically offering promotions within categories like “PROD-WHS SALES” and “POSTAL CENTER” Regork will be able to increase these contributions to sales value.

  3. Its also crucial to understand what volume of products contribute to each department. Now, rather than analyzing contribution to sales value, we can understand the actual volume of products that make up each product category. Here, we see that GROCERY and DRUG GM significantly make up the highest percentage of products. Based on these findings, Regork can center their marketing campaigns around these categories to increase customer awareness of product inventory and availability.