ABC Analysis

Author

Tony Wang

Executive Summary:

This project successfully implemented an ABC Analysis framework combined with Sales Frequency Profiling to evaluate product performance, optimize inventory allocation, and uncover hidden demand patterns within our annual sales data. By categorizing our product portfolio and examining purchasing behavioral patterns (repeat purchases), this analysis provides actionable, data-driven recommendations to maximize working capital efficiency and secure supply chain resilience for top-performing assets.

Methodology & Technical Refinement:

  • ABC Value Segmentation: Products were successfully classified into A (High-Value), B (Mid-Value), and C (Low-Value) categories based on their cumulative contribution to annual sales revenue, adhering to the 80/20 Pareto principle.

  • Frequency Profiling via Boxplots: To evaluate customer retention and order consistency, we generated comparative boxplots assessing individual product purchase frequencies across the three distinct classes.

    • Data Quality Assurance: Initial iterations revealed identical distribution patterns due to a data-merging logical error (variable reference override). Through rigorous code auditing, the script was successfully debugged, and a Logarithmic Scale (\(Log_{10}\)) transformation was applied to the Y-axis. This advanced adjustment successfully mitigated extreme outlier skewness, exposing the genuine, underlying operational patterns of the business.

Load Packages

Code
library(cABCanalysis)
library(tidyverse)
library(dplyr)
library(readr)

Load data

Code
df <- read_csv("olist_order_items_dataset.csv")
head (df)
# A tibble: 6 × 7
  order_id          order_item_id product_id seller_id shipping_limit_date price
  <chr>                     <dbl> <chr>      <chr>     <dttm>              <dbl>
1 00010242fe8c5a6d…             1 4244733e0… 48436dad… 2017-09-19 09:45:35  58.9
2 00018f77f2f0320c…             1 e5f2d52b8… dd7ddc04… 2017-05-03 11:05:13 240. 
3 000229ec398224ef…             1 c777355d1… 5b51032e… 2018-01-18 14:48:30 199  
4 00024acbcdf0a6da…             1 7634da152… 9d7a1d34… 2018-08-15 10:10:18  13.0
5 00042b26cf59d7ce…             1 ac6c36230… df560393… 2017-02-13 13:57:51 200. 
6 00048cc3ae777c65…             1 ef92defde… 6426d21a… 2017-05-23 03:55:27  21.9
# ℹ 1 more variable: freight_value <dbl>
Code
nrow(df)
[1] 112650

The dataset comprises 112,650 transaction entries from a Brazilian e-commerce business, spanning a four-year period

Group data by item id for analysis

Code
# Conduct Analysis
annual_sales <- df %>%
  group_by(product_id) %>%
  summarise(
    value = sum(price, na.rm = TRUE),
    units_sold = n()
  ) %>%
  arrange(desc(value)) %>%
  mutate(
    grand_total = sum(value),
    pct = value / grand_total,
    cum_pct = cumsum(pct),
    abc_class = case_when(
      cum_pct <= 0.8 ~ "A",
      cum_pct <= 0.95 ~ "B",
      TRUE ~ "C"
         )
  )

nrow(annual_sales)
[1] 32951

There are a total of 32,951 unique skus in the data

ABC Analysis

Code
# Conduct the analysis
abc_result <- cABC_analysis(annual_sales$value, PlotIt = TRUE, useGGPlot = TRUE)

Code
# 3. View the corrected counts calculated by cABCanalysis
length(abc_result$Aind)
[1] 6557
Code
length(abc_result$Bind)
[1] 3813
Code
length(abc_result$Cind)
[1] 22581

Generate Analysis Report

Code
# Generate summary
executive_summary <- annual_sales %>%
  group_by(abc_class) %>%
  summarise(
    total_skus = n(),
    sku_percentage = n() / nrow(annual_sales) * 100,
    total_revenue = sum(value),
    revenue_percentage = sum(value) / max(grand_total) * 100
  )

print(executive_summary)
# A tibble: 3 × 5
  abc_class total_skus sku_percentage total_revenue revenue_percentage
  <chr>          <int>          <dbl>         <dbl>              <dbl>
1 A               8535           25.9     10873150.              80.0 
2 B              11301           34.3      2038874.              15.0 
3 C              13115           39.8       679620.               5.00

Key findings:

  • Class A: the “Vital Few” that contribute 80% of total revenue - 8,535 skus, 25.9% of total skus

  • Class B: the “Moderate Middle” that contribute 15% of total revenue - 11,301 skus, 34.3% of total skus

  • Class C: the “Trivial Many” that contribute merely 5% of total revenue - 13,115 skus, 39.8% of total skus

Export Products list in each categories

Code
class_A_items <- annual_sales[abc_result$Aind,]
class_B_items <- annual_sales[abc_result$Bind,]
class_C_items <- annual_sales[abc_result$Cind,]

Sales Pattern Comparison

Code
# Filter sales records by class
df_a <- df %>%
  filter(product_id %in% class_A_items$product_id)

df_b <- df %>%
  filter(product_id %in% class_B_items$product_id)

df_c <- df %>%
  filter(product_id %in% class_C_items$product_id)

# Label the filtered data
df_a <- df_a %>%
  mutate(class = "A")

df_b <- df_b %>%
  mutate(class = "B")

df_c <- df_c %>%
  mutate(class = "C")

# Merge data and calculate sales frequency
df_sales_freq <- bind_rows(df_a, df_b, df_c) %>%
  group_by(product_id, class) %>%
  summarise(sales_frequency = n(), .groups = "drop")

ggplot(df_sales_freq, aes(x = class, y = sales_frequency, fill = class)) +
  geom_boxplot(alpha = 0.7, outlier.colour = "red", outlier.size = 1) +
  scale_y_log10(labels = scales::comma) + 
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Sales Frequency Comparison",
    subtitle = "Log scaled",
    x = "ABC Classes",
    y = "Sales Frequency (Log Scale)"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Strategic Recommendations

To translate these analytical findings into immediate financial and operational value, management should adopt a differentiated inventory and commercial approach:

  1. Implement Aggressive Protection for Class A: Because Class A items represent high financial stakes and high customer purchase frequencies, stockouts pose a massive threat to client retention. Implement continuous cycle-counting and establish collaborative Just-In-Time (JIT) or Vendor-Managed Inventory (VMI) frameworks for the top-tier 300-order velocity items.

  2. Audit High-Frequency Class C Outliers: Audit the low-cost, high-frequency Class C outliers. Determine if they are priced too low, or if they serve as vital loss-leaders/hooks for Class A sales. If they are companion products, protect their availability despite low direct revenue.

  3. Execute “Inventory Slimming” on Standard Class C Items: For the vast sea of Class C products that sell only once a year, transition away from stock-holding models. Move these items to a “Just-In-Time” purchasing model, or initiate a structured inventory clearance to liquidate stagnant capital and free up vital distribution center capacity.