Code
library(cABCanalysis)
library(tidyverse)
library(dplyr)
library(readr)This project successfully implemented an ABC Analysis framework combined with Sales Frequency Profiling to evaluate product performance, optimize inventory allocation, and uncover hidden demand patterns within our annual sales data. By categorizing our product portfolio and examining purchasing behavioral patterns (repeat purchases), this analysis provides actionable, data-driven recommendations to maximize working capital efficiency and secure supply chain resilience for top-performing assets.
Methodology & Technical Refinement:
ABC Value Segmentation: Products were successfully classified into A (High-Value), B (Mid-Value), and C (Low-Value) categories based on their cumulative contribution to annual sales revenue, adhering to the 80/20 Pareto principle.
Frequency Profiling via Boxplots: To evaluate customer retention and order consistency, we generated comparative boxplots assessing individual product purchase frequencies across the three distinct classes.
library(cABCanalysis)
library(tidyverse)
library(dplyr)
library(readr)df <- read_csv("olist_order_items_dataset.csv")
head (df)# A tibble: 6 × 7
order_id order_item_id product_id seller_id shipping_limit_date price
<chr> <dbl> <chr> <chr> <dttm> <dbl>
1 00010242fe8c5a6d… 1 4244733e0… 48436dad… 2017-09-19 09:45:35 58.9
2 00018f77f2f0320c… 1 e5f2d52b8… dd7ddc04… 2017-05-03 11:05:13 240.
3 000229ec398224ef… 1 c777355d1… 5b51032e… 2018-01-18 14:48:30 199
4 00024acbcdf0a6da… 1 7634da152… 9d7a1d34… 2018-08-15 10:10:18 13.0
5 00042b26cf59d7ce… 1 ac6c36230… df560393… 2017-02-13 13:57:51 200.
6 00048cc3ae777c65… 1 ef92defde… 6426d21a… 2017-05-23 03:55:27 21.9
# ℹ 1 more variable: freight_value <dbl>
nrow(df)[1] 112650
The dataset comprises 112,650 transaction entries from a Brazilian e-commerce business, spanning a four-year period
# Conduct Analysis
annual_sales <- df %>%
group_by(product_id) %>%
summarise(
value = sum(price, na.rm = TRUE),
units_sold = n()
) %>%
arrange(desc(value)) %>%
mutate(
grand_total = sum(value),
pct = value / grand_total,
cum_pct = cumsum(pct),
abc_class = case_when(
cum_pct <= 0.8 ~ "A",
cum_pct <= 0.95 ~ "B",
TRUE ~ "C"
)
)
nrow(annual_sales)[1] 32951
There are a total of 32,951 unique skus in the data
# Conduct the analysis
abc_result <- cABC_analysis(annual_sales$value, PlotIt = TRUE, useGGPlot = TRUE)# 3. View the corrected counts calculated by cABCanalysis
length(abc_result$Aind)[1] 6557
length(abc_result$Bind)[1] 3813
length(abc_result$Cind)[1] 22581
# Generate summary
executive_summary <- annual_sales %>%
group_by(abc_class) %>%
summarise(
total_skus = n(),
sku_percentage = n() / nrow(annual_sales) * 100,
total_revenue = sum(value),
revenue_percentage = sum(value) / max(grand_total) * 100
)
print(executive_summary)# A tibble: 3 × 5
abc_class total_skus sku_percentage total_revenue revenue_percentage
<chr> <int> <dbl> <dbl> <dbl>
1 A 8535 25.9 10873150. 80.0
2 B 11301 34.3 2038874. 15.0
3 C 13115 39.8 679620. 5.00
Key findings:
Class A: the “Vital Few” that contribute 80% of total revenue - 8,535 skus, 25.9% of total skus
Class B: the “Moderate Middle” that contribute 15% of total revenue - 11,301 skus, 34.3% of total skus
Class C: the “Trivial Many” that contribute merely 5% of total revenue - 13,115 skus, 39.8% of total skus
class_A_items <- annual_sales[abc_result$Aind,]
class_B_items <- annual_sales[abc_result$Bind,]
class_C_items <- annual_sales[abc_result$Cind,]# Filter sales records by class
df_a <- df %>%
filter(product_id %in% class_A_items$product_id)
df_b <- df %>%
filter(product_id %in% class_B_items$product_id)
df_c <- df %>%
filter(product_id %in% class_C_items$product_id)
# Label the filtered data
df_a <- df_a %>%
mutate(class = "A")
df_b <- df_b %>%
mutate(class = "B")
df_c <- df_c %>%
mutate(class = "C")
# Merge data and calculate sales frequency
df_sales_freq <- bind_rows(df_a, df_b, df_c) %>%
group_by(product_id, class) %>%
summarise(sales_frequency = n(), .groups = "drop")
ggplot(df_sales_freq, aes(x = class, y = sales_frequency, fill = class)) +
geom_boxplot(alpha = 0.7, outlier.colour = "red", outlier.size = 1) +
scale_y_log10(labels = scales::comma) +
scale_fill_brewer(palette = "Set2") +
labs(
title = "Sales Frequency Comparison",
subtitle = "Log scaled",
x = "ABC Classes",
y = "Sales Frequency (Log Scale)"
) +
theme_minimal() +
theme(legend.position = "none")To translate these analytical findings into immediate financial and operational value, management should adopt a differentiated inventory and commercial approach:
Implement Aggressive Protection for Class A: Because Class A items represent high financial stakes and high customer purchase frequencies, stockouts pose a massive threat to client retention. Implement continuous cycle-counting and establish collaborative Just-In-Time (JIT) or Vendor-Managed Inventory (VMI) frameworks for the top-tier 300-order velocity items.
Audit High-Frequency Class C Outliers: Audit the low-cost, high-frequency Class C outliers. Determine if they are priced too low, or if they serve as vital loss-leaders/hooks for Class A sales. If they are companion products, protect their availability despite low direct revenue.
Execute “Inventory Slimming” on Standard Class C Items: For the vast sea of Class C products that sell only once a year, transition away from stock-holding models. Move these items to a “Just-In-Time” purchasing model, or initiate a structured inventory clearance to liquidate stagnant capital and free up vital distribution center capacity.