Introduction

In any organization, profitability can be determined by two metrics–revenue and expenses. There are two ways to maximize profitability. The first is to increase revenue given that expenses remain the same. The second is to try to reduce expenses while maintaining the same revenue. Regarding the latter, there are many approaches to how this can be achieved. You can negotiate better contracts from suppliers, streamline operations and distributions, or reduce any unnecessary waste or spend. For this analysis, I decided to evaluate the impact of reducing overall inventory by removing products that yield very small sales revenue. Reducing our inventory would help us reduce expenses for several reasons. Firstly, it eliminates the initial expense of purchasing these products that do not yield many sales to recover that cost. Secondly, it helps streamline the sourcing process as it would potentially reduce the number of suppliers we need to engage with. Additionally, reducing inventory could potentially free up shelf space to be used by products that yield higher revenue which could also allow us to generate more profit. This would be a secondary effect of this conclusion, but a positive one. If we can conclude that reducing products in our inventory will reduce expenses without significantly reducing revenue, this would result in higher profitability for Regork. As a result, this increased profitability would enable regork to evaluate a number of different strategies including, but not limited to expansion, new investments, philanthropy, etc.

The above thoughts can be summarized as follows. Are there certain products that are not producing enough sales? Can we remove these products from our inventory to help us save on expenses and free up shelf space for higher yielding products? Could reducing inventory help Regork increase profitability?

I addressed this question using the data provided in the completejourney package–more specifically the transactions and products datasets. I evaluated the top 10 product categories (by total sales) and found a threshold sales value as the minimum requirement for a product to remain in inventory. Any products that earn below this threshold are removed from inventory and total sales are calculated again. I then compared the original sales value to the new sales value with products removed. I then compared the number of products for each category before and after removal. The sales value decreased minimally in each category while the number of products decreased considerably. I would recommend consolidating our inventory. Some products do not generate enough revenue to help the company be profitable. The expenses associated with these products likely mean these products contribute negatively to our profitability. A deeper analysis with more detail data about the sourcing and supplier contracts is likely needed to truly determine the impact to expenses, however I feel confident in saying that we would be better served removing products from our inventory.

Libraries Required

For this evaluation, the following packages are required:

tidyverse: suite of packages that assist with data manipulation and visualization
completejourney: package containing all of the datasets used for evaluation
knitr: package that allows for rendering of nice data tables in html

library(tidyverse)
library(completejourney)
library(knitr)

Approach

My theory is that if we remove lower revenue products from our inventory, we can cut expenses and increase overall profitability. To do this, I evaluated the 10 product categories that generated the most sales. After identifying these categories (and removing the COUPON/MISC ITEMS category for being too generic), I looked at the sales value for each product within these groups and evaluated how the total sales for each category would change if we removed a certain number of products.

Exploratory Data Analysis

I start by loading the full transactions dataset.

# Load the full transactions dataset into the 'transactions' object
transactions <- get_transactions()

After loading the full dataset, I identify the top 10 product categories with the highest revenue.

# Find the product_category values that yield the top 10 sales amount
# Excluding COUPON/MISC ITEMS
top_10_catgs <- transactions %>%
  inner_join(products, by = 'product_id') %>%
  group_by(product_category) %>%
  summarize(
    total_sales = sum(sales_value),
    total_quantity = sum(quantity)
  ) %>%
  arrange(desc(total_sales)) %>%
  filter(product_category != 'COUPON/MISC ITEMS') %>%
  slice_head(n = 10)

The barchart below shows the total revenue for each fo the top 10 categories.

# Creating a bar chart for sales amount for the top 10 product_category values
top_10_catgs %>%
  ggplot(aes(
    fct_reorder(factor(product_category), total_sales),
    total_sales,
    fill = product_category)
  ) +
  geom_bar(stat = 'identity') +
  scale_y_continuous(labels = scales::dollar) +
  scale_fill_brewer(palette = 'Set3') +
  labs(
    title = 'Top 10 Product Categories by Total Sales',
    subtitle = 'NOTE: COUPON/MISC ITEMS category removed.',
    x = 'Product Category',
    y = 'Total Sales',
    fill = 'Product Category'
  ) +
  coord_flip() +
  theme(legend.position = 'none')

These categories generate a lot of sales revenue for Regork.

Now I evaluate the number of products in each of these categories.

# Counting the number of product_ids within each of the top 10 product_category
# values
catg_n_products <- top_10_catgs %>%
  inner_join(products, by = 'product_category') %>%
  group_by(product_category, total_sales) %>%
  summarize(
    n_products = n_distinct(product_id)
  )

# Creating a bar chart for count of products for the top 10 product_category
# values
catg_n_products %>%
  ggplot(aes(fct_reorder(factor(product_category), n_products), n_products, fill = product_category)) +
  geom_bar(stat = 'identity') +
  coord_flip() +
  scale_fill_brewer(palette = 'Set3') +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = 'Number of Products in Top 10 Product Categories',
    x = 'Product Category',
    y = 'Number of Products',
  ) +
  theme(legend.position = 'none')

Each category has a lot of unique products within them. The theory is that some of these products will generate a lot of revenue while others will product a very small revenue. Assuming a lot of these products could be very similar to one another, it’s reasonable to conclude that some products are more popular than other products of the same type. For example, in SOFT DRINKS, we could have many different varieties. Some of these products could be limited releases, but also some of them cold be regular products that simply do not get purchased very frequently.

The below charts break the total revenue for each product into different groupings to assess if there is a large number of these prooducts that generate lower sales.

top_10_catgs %>%
  inner_join(products, by = 'product_category') %>%
  inner_join(transactions, by = 'product_id') %>%
  group_by(product_category, product_id) %>%
  summarize(
    total_sales = sum(sales_value)
  ) %>%
  mutate(total_sales_group = case_when(
    total_sales <= 50 ~ '$0 - $50',
    total_sales <= 100 ~ '$50 - $100',
    total_sales <= 150 ~ '$100 - $150',
    total_sales <= 200 ~ '$150 - $200',
    total_sales <= 250 ~ '$200 - $250',
    total_sales <= 300 ~ '$250 - $300',
    total_sales <= 350 ~ '$300 - $350',
    total_sales <= 400 ~ '$350 - $400',
    total_sales <= 450 ~ '$400 - $450',
    total_sales <= 500 ~ '$450 - $500',
    TRUE ~ '$500 +'
  )) %>%
  ggplot(aes(total_sales_group)) +
  geom_bar(color = 'darkslategray3', fill = 'darkslategray3') +
  facet_wrap(~ product_category, ncol = 5) +
  coord_flip() +
  labs(
    title = 'Number of Products by Total Sales Bin',
    subtitle = 'Broken out by product category',
    x = 'Total Sales Bin',
    y = 'Number of Products'
  )

The charts above indicate that for each category, the overwhelming majority of the products total between $0 and $50 of sales revenue. Most products seem to generate very little overall revenue. It is worth looking into the lowest sales group further to see the distribution of sales value across this group for each product category. This will help us get a better idea of just how many really low revenue products we have. This will help us set a threshold of minimum revenue for our evaluation.

top_10_catgs %>%
  inner_join(products, by = 'product_category') %>%
  inner_join(transactions, by = 'product_id') %>%
  group_by(product_category, product_id) %>%
  summarize(
    total_sales = sum(sales_value)
  ) %>%
  filter(total_sales <= 50) %>%
  ggplot(aes(total_sales)) +
  geom_histogram(bins = 20, color = 'darkslategray3', fill = 'darkslategray3') +
  geom_vline(xintercept = 5, linetype = 'dashed', color = 'darkslategrey') +
  facet_wrap(~ product_category, ncol = 5) +
  labs(
    title = 'Histograms of Products with $0-$50 of Total Sales',
    subtitle = 'Broken out by product category',
    x = 'Total Sales',
    y = 'Number of Products'
  )

These histograms show that for every product category, there is a considerable number of products that generate $5 or less in sales revenue. The dashed vertical line is placed at $5 and we can see some peaks generally to the left of this line or at least around this line. The $5 threshold is somewhat arbitrary. It is a low enough amount to figure that cutting any products that do not reach this threshold likely would not overly impact the overall revenue. Additionally, the value is low enough to not inadvertently cut out products that are less popular, but still potentially some shoppers’ favorite.

So let’s suppose that we remove all of the products that have generated $5 in sales or less. How does this impact revenue for these products? If it has very low impact to the overall revenue for these groups, it’s possible that the cost to procure these products and shelve them could outweigh their revenue contribution. The below chart illustrates this.

all_products <- top_10_catgs %>%
  inner_join(products, by = 'product_category') %>%
  inner_join(transactions, by = 'product_id') %>%
  group_by(product_category, product_id) %>%
  summarize(
    total_sales = sum(sales_value)
  ) %>%
  group_by(product_category) %>%
  summarize(
    total_sales = sum(total_sales),
    n_products = n_distinct(product_id)
  )

# Calculate total sales and number of products per category for only products
# that generate > $5 in sales
gt5_products <- top_10_catgs %>%
  inner_join(products, by = 'product_category') %>%
  inner_join(transactions, by = 'product_id') %>%
  group_by(product_category, product_id) %>%
  summarize(
    total_sales = sum(sales_value)
  ) %>%
  filter(total_sales > 5) %>%
  group_by(product_category) %>%
  summarize(
    total_sales = sum(total_sales),
    n_products = n_distinct(product_id)
  )

product_compare <- all_products %>%
  inner_join(gt5_products, by = 'product_category')

names(product_compare) <- c('product_category', 'total_sales_all',
                            'n_products_all', 'total_sales_trim',
                            'n_products_trim')

product_compare %>%
  select(product_category, total_sales_all, total_sales_trim) %>%
  pivot_longer(cols = 2:3, names_to = 'type', values_to = 'sales_amount') %>%
  mutate(type = if_else(type == 'total_sales_all', 'Before', 'After')) %>%
  ggplot(aes(product_category, sales_amount, fill = type)) +
  geom_col(position = 'dodge') +
  scale_y_continuous(labels = scales::dollar) +
  coord_flip() +
  labs(
    title = 'Total Sales Comparison',
    subtitle = 'Broken out by product category',
    x = 'Product Category',
    y = 'Total Sales',
    fill = 'Before/After Product Removal'
  )

We can see that for each category, both bars are very close. While we do lose some revenue from removing products, the impact appears to be minimal.

On the other hand, let’s look at the difference in the number of unique products remaining after we remove those that generate $5 of revenue or less.

product_compare %>%
  select(product_category, n_products_all, n_products_trim) %>%
  pivot_longer(cols = 2:3, names_to = 'type', values_to = 'sales_amount') %>%
  mutate(type = if_else(type == 'n_products_all', 'Before', 'After')) %>%
  ggplot(aes(product_category, sales_amount, fill = type)) +
  geom_col(position = 'dodge') +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = 'Number of Products Comparison',
    subtitle = 'Broken out by product category',
    x = 'Product Category',
    y = 'Number of Products',
    fill = 'Before/After Product Removal'
  )

In this case, the difference between the original number of products and the new reduced number of products is drastic. By removing these low revenue products, we are able to cut out a significant portion of our inventory. It appears that removing these low revenue products has very minimal overall impact to revenue for each group, but reduces the number of products significantly. This is encouraging as it could mean that there is an overall trend for every department.

Below is a table indicating the difference in revenue and product count for each category.

product_compare_diffs <- product_compare %>%
  mutate(
    sales_diff = total_sales_all - total_sales_trim,
    pct_sales_diff =
      paste0(
        round((total_sales_all - total_sales_trim) / total_sales_all, 4) * 100,
        '%'
      ),
    products_diff = n_products_all - n_products_trim,
    pct_products_diff =
      paste0(
        round((n_products_all - n_products_trim) / n_products_all, 4) * 100,
        '%'
      )
  ) %>%
  select(product_category, sales_diff, pct_sales_diff, products_diff, pct_products_diff)

kable(
  product_compare_diffs,
  caption = 'Comparison of Revenue and Products (By Category)',
  col.names = c('Product Category', 'Sales',
                'Sales (%)', 'Number of Products',
                'Number of Products (%)'),
  align = c('l', 'r', 'r', 'r', 'r')
)

Comparison of Revenue and Products (By Category)
Product Category	Sales	Sales (%)	Number of Products	Number of Products (%)
BAG SNACKS	1159.54	1.37%	468	40.91%
BAKED BREAD/BUNS/ROLLS	844.01	1.02%	382	43.56%
BEEF	405.74	0.23%	120	16.06%
BEERS/ALES	174.13	0.21%	51	8.1%
CHEESE	819.10	0.77%	297	36.76%
COLD CEREAL	418.55	0.66%	155	34.52%
FLUID MILK PRODUCTS	313.51	0.27%	131	38.08%
FROZEN PIZZA	486.72	0.6%	181	26.93%
FRZN MEAT/MEAT DINNERS	637.20	0.68%	266	28.88%
SOFT DRINKS	1060.28	0.58%	468	35.89%

What can we get from this table? The sales revenue for each group only drops by 0.21% to 1.37% while the number of products reduced drops by 8.1% to 43.56%. If we exclude the two lowest changes of 8.1% and 16.06%, the percentage decrease in number of products range from 26.93% to 43.56%. This means that for most groups at least a quarter of the products are reduced and potentially up to almost half. This does not guarantee cutting half of the expense to procure these items, but it still could be a considerable cost saving.

But what if we apply this logic to the all of our products? How many products would we remove from our inventory and how much would sales reduce?

total_inventory <- transactions %>%
  summarize(
    total_sales_all = sum(sales_value),
    n_products_all = n_distinct(product_id)
  )

trimmed_inventory <- transactions %>%
  group_by(product_id) %>%
  summarize(
    total_sales = sum(sales_value)
  ) %>%
  filter(total_sales > 5) %>%
  summarize(
    total_sales_trim = sum(total_sales),
    n_products_trim = n_distinct(product_id)
  )

total_diffs <- total_inventory %>%
  cbind(trimmed_inventory) %>%
  mutate(
    sales_diff = total_sales_all - total_sales_trim,
    pct_sales_diff =
      paste0(
        round((total_sales_all - total_sales_trim) / total_sales_all, 4) * 100,
        '%'
      ),
    products_diff = n_products_all - n_products_trim,
    pct_products_diff =
      paste0(
        round((n_products_all - n_products_trim) / n_products_all, 4) * 100,
        '%'
      )
  ) %>%
  select(sales_diff, pct_sales_diff, products_diff, pct_products_diff)

kable(
  total_diffs,
  caption = 'Comparison of Revenue and Products (Overall)',
  col.names = c('Sales',
                'Sales (%)', 'Number of Products',
                'Number of Products (%)'),
  align = c('r', 'r', 'r', 'r')
)

Comparison of Revenue and Products (Overall)
Sales	Sales (%)	Number of Products	Number of Products (%)
58107.32	1.26%	22089	32.24%

By removing all products that generated $5 of revenue or less, we only decrease sales by 1.26% while decreasing the number of products by 32.24%. Of course there is a small reduction in the total company revenue, but this is quite likely more than offset by the reduction of products.

Summary

I wanted to evaluate if reducing product inventory could help increase profitability from a number of different angles. The theory being that it would reduce procurement expenses, help streamline operations, and potentially allow for more profitable products to reclaim some shelf space. This is entirely contingent on whether or not removing these products from our inventory have a minimal effect on revenue. This problem was addressed by using the transactions and products datasets. I combined these datasets to tie transaction information to product categories and then explored to find a good threshold value for product revenue. Once this was determined, I evaluated the impact of removing all products with $5 or less in sales revenue. The analysis revealed very minimal impact to the overall revenue of Regork–only a 1.26% reduction–while cutting nearly one-third (32.24%) of products. Removing these poorer performing products could result in many savings. It is worth considering that, for some customers, we could be cutting some of their favorite products. I would ultimately recommend to consolidate the inventory for the reasons mentioned above–lower procurement costs, streamline operations, and free up shelf space.

It is important to consider the limitations of this analysis. Some of the limitations are as follows:

We do not have access to data associate with the costs of acquiring these products. This data would be needed in order to determined total cost savings from removing any given number of products.
The threshold of $5 was chosen arbitrarily by visual context. A more formal analysis could help generate a better threshold for removing products.
Not all products or categories are created equal. Perhaps one threshold in one category may not be the optimal threshold for another. It could be helpful to consider different thresholds for different categories.
We are missing more detailed information about what the products are specifically. Some items could be limited edition products that are run as part of some promotion between brands. In such a case, these products likely do not apply as they will be removed by default once the promotional period expires.

Spring 2025 BANA7025 Final Project

Christian Dinevski

2025-02-27

Introduction

Libraries Required

Approach

Exploratory Data Analysis

Summary