Executive Summary

This report uses k-means and PAM clustering to analyze 8,736 products (with >=3 orders) from the Olist Brazilian e-commerce marketplace. The goal: determine whether customer dissatisfaction is driven by logistics problems (late delivery) or product quality problems (inherent product flaws).

Data Overview

Metric	Value
Products analyzed	8,736
Categories represented	71
Avg review score	4.08 (SD: 0.81)
Avg bad review rate	15.0%
Avg delivery delay	-12.3 days
Avg on-time rate	93.7%
Avg price	R$ 119.24
Avg freight ratio	32.4%

Correlation between quality and logistics variables
	pct_bad_reviews	avg_delay	pct_on_time	avg_price	avg_freight_ratio	weight_g
pct_bad_reviews	1.00	0.17	-0.33	0.01	0.03	0.03
avg_delay	0.17	1.00	-0.56	-0.01	-0.03	0.02
pct_on_time	-0.33	-0.56	1.00	-0.05	0.03	-0.04
avg_price	0.01	-0.01	-0.05	1.00	-0.41	0.35
avg_freight_ratio	0.03	-0.03	0.03	-0.41	1.00	-0.14
weight_g	0.03	0.02	-0.04	0.35	-0.14	1.00

The correlation matrix shows that pct_bad_reviews has a weak positive correlation with avg_delay (r = 0.17) and a somewhat stronger negative correlation with pct_on_time (r = -0.33): late and unreliable deliveries are associated with worse reviews. Neither relationship is strong, which suggests that some poorly reviewed products arrive on time (a product problem) and some arrive late despite good review scores (a logistics issue).
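To put the strength of these associations in perspective, the squared coefficient gives the share of variance in pct_bad_reviews that a single linear relationship would account for. The following is a minimal sketch using the coefficients from the matrix above; the dictionary and function names are illustrative and not part of the original analysis.

```python
# Coefficients copied from the pct_bad_reviews row of the correlation matrix.
corr_with_bad_reviews = {
    "avg_delay": 0.17,
    "pct_on_time": -0.33,
    "avg_price": 0.01,
    "avg_freight_ratio": 0.03,
    "weight_g": 0.03,
}


def shared_variance(r: float) -> float:
    """Return r squared: the fraction of variance shared under a linear fit."""
    return r * r


for name, r in corr_with_bad_reviews.items():
    print(f"{name:>18}: r = {r:+.2f}, r^2 = {shared_variance(r):.1%}")
```

Read this way, avg_delay alone accounts for roughly 3% of the variance in the bad-review rate and pct_on_time for roughly 11%, which is why most of the variation has to be explained by something other than delivery timing.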

Clustering Methodology

Features Used

Six standardized features capture two quality dimensions plus product attributes:

Dimension	Features	Purpose
Dissatisfaction	`pct_bad_reviews`	% of 1-2 star scores
Logistics quality	`avg_delay`, `pct_on_time`	Delivery speed and reliability
Product economics	`avg_price_log`, `avg_freight_ratio`, `weight_g_log`	Product type and cost structure
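The report does not show how the features were prepared, so the following is only a sketch of one plausible pipeline: a log transform for the skewed price and weight variables, followed by z-score standardization. The use of log1p, the population standard deviation, and the sample prices are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def log_transform(values):
    """Compress right-skewed positive values; log1p is an assumed choice."""
    return [math.log1p(v) for v in values]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (population SD)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical prices in R$, for illustration only.
prices = [19.9, 59.0, 119.0, 349.0, 1290.0]
avg_price_log_z = standardize(log_transform(prices))
print([round(z, 2) for z in avg_price_log_z])
```

Standardizing matters here because k-means and PAM both rely on distances: without it, a feature measured in grams would dominate one measured as a proportion.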

Optimal Number of Clusters

We determine the optimal k using three criteria: the Gap statistic for k-means, and the average silhouette width for both k-means and PAM.

All three criteria converge on k = 3 clusters as the optimal choice.

Clustering Results

K-Means vs PAM: Method Selection

K-means achieved an average silhouette of 0.2494 vs PAM’s 0.1646. The k-means algorithm was selected for the final solution.
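For readers unfamiliar with the metric, the average silhouette width can be sketched in a few lines. This is a plain-Python illustration of the standard definition, not the code used to produce the figures above; the toy points and labels are hypothetical, and the function assumes at least two clusters.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); assumes at least two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # singleton clusters score 0 by convention
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical points: two tight, well-separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = average_silhouette(points, [0, 0, 1, 1])  # sensible grouping
bad = average_silhouette(points, [0, 1, 0, 1])   # deliberately wrong grouping
print(round(good, 3), round(bad, 3))
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values indicate points that sit closer to another cluster than to their own.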

Cluster Characterization

Cluster Size Distribution
Cluster	Products	% of Total
Excellent — Great Product & Logistics	4511	51.6
Good — Acceptable Quality	3019	34.6
Low Quality (Mixed)	1206	13.8

PCA Cluster Map (P2)

Interpretation: The clusters occupy distinct regions of PCA space, which supports the segmentation. The first two principal components explain 63.9% of the variance, so about 36% of the variation in the six features is not visible in this map. Problem products (product-driven and logistics-driven) fall in different parts of the quality space.

Cluster Profile Heatmap (P3)

Interpretation: The heatmap confirms the cluster differentiation. The “Excellent” cluster is below average in bad reviews and delay. Among the problem products, the “Logistics Problem” group is strongly above average in delivery delay, while the “Product Problem” group is above average in bad reviews but below average in delay: these products arrive on time but still disappoint.

Logistics vs Product Problem Quadrants (P4)

Interpretation: This is the core diagnostic plot. Each point is a product. The horizontal dashed line is the median bad-review rate; the vertical line at x=0 separates on-time (left) from late (right) deliveries. Products in the upper-left quadrant are Product Problems — they arrive on time but receive poor reviews. Products in the upper-right are Compound Problems or logistics issues. The clustering algorithm successfully separates these two problem types.
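The quadrant logic described above can be written as a small classifier. This is a sketch under stated assumptions: the labels for the two lower quadrants are not named in the report and are invented here, the strict inequalities are a choice, and the sample products are hypothetical.

```python
from statistics import median


def quadrant(avg_delay, pct_bad_reviews, bad_review_median):
    """Label a product by its position in the delay vs bad-review plot."""
    late = avg_delay > 0  # right of the vertical line at x = 0
    poor = pct_bad_reviews > bad_review_median  # above the horizontal line
    if poor and not late:
        return "product problem"
    if poor and late:
        return "compound or logistics problem"
    if late:
        return "late but well reviewed"  # assumed label, not from the report
    return "no problem"  # assumed label, not from the report


# Hypothetical (avg_delay in days, pct_bad_reviews as a fraction) pairs.
products = [(-12.0, 0.40), (3.5, 0.45), (2.0, 0.05), (-10.0, 0.02), (-8.0, 0.10)]
cut = median(rate for _, rate in products)
labels = [quadrant(delay, rate, cut) for delay, rate in products]
for (delay, rate), label in zip(products, labels):
    print(f"delay={delay:+.1f}  bad={rate:.2f}  ->  {label}")
```

Because the average delay in this dataset is strongly negative, most products sit left of x = 0, so the upper-left quadrant is where the bulk of poorly reviewed products will land.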

Silhouette Comparison (P5)

Interpretation: Both methods show similar silhouette profiles, but k-means achieves the higher average silhouette (0.2494 vs 0.1646 for PAM) and is used for the final cluster assignments. Products with a negative silhouette width may be misclassified; these are borderline cases near cluster boundaries.

Category-Level Analysis

Most Problematic Categories (P6)

Which categories have the highest proportion of products in problem clusters?

Interpretation: home_confort leads with 28.2% of its products flagged as problematic (about 11 of 39). The pattern across categories reveals whether the issue is systemic (e.g., a problematic category like construction materials always disappoints) or isolated (e.g., electronics has a mix of good and bad products).

Problem Type Breakdown by Category (P7)

For the worst categories, is the problem logistics-driven or product-driven?

Interpretation: For most categories, the problem type is mixed. Some categories (e.g., furniture, construction) lean toward logistics problems (heavy/bulky items are harder to deliver). Others (e.g., health & beauty, electronics) lean toward product problems — suggesting issues with product quality itself rather than delivery.

Product Attributes by Cluster (P8)

Interpretation: The “Product Problem” cluster tends to have lower-priced products on average, while “Logistics Problem” products span a wider price range. Weight distribution shows that logistics problem products are often heavier — suggesting that bulky items are harder to deliver reliably.

Key Findings

Overall problem rate: 13.8% of products are flagged as problematic (n = 1,206).

Note on problem type split: The 50/50 overall split between logistics and product problems is a methodological artifact. Problem types are divided at the median delay of problem products, which guarantees equal halves. The real insight lies in how individual categories deviate from this baseline (see Problem Type Breakdown by Category, P7).
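The artifact is easy to reproduce. The sketch below splits a deliberately skewed set of simulated delays at its median; the distribution, seed, and function name are assumptions chosen only to show that the outcome does not depend on the shape of the data.

```python
import random
from statistics import median


def split_at_median(delays):
    """Label each delay as logistics (above the median) or product (otherwise)."""
    cut = median(delays)
    return ["logistics" if d > cut else "product" for d in delays]


random.seed(0)
# Hypothetical, heavily skewed delays in days for 1,206 problem products.
delays = [random.expovariate(0.2) - 15 for _ in range(1206)]
labels = split_at_median(delays)
print(labels.count("logistics"), labels.count("product"))
```

With an even number of distinct values, the two groups are equal by construction, so the 603 / 603 split says nothing about the true balance between logistics and product causes.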

Problem Type Breakdown
Problem Type	Products	% of Problems	Root Cause
Logistics-driven	603	50	Late or unreliable delivery
Product-driven	603	50	Inherent product flaws despite on-time delivery
Low Quality (Mixed)	0	0	Combination of factors

Top 8 Categories by Problem Rate
Category	Problem Rate (%)	Products (n)
home_confort	28.2	39
construction_tools_lights	20.0	20
kitchen_dining_laundry_garden_furniture	19.2	26
bed_bath_table	18.7	871
fixed_telephony	18.2	22
audio	16.7	24
christmas_supplies	16.7	12
furniture_decor	16.5	674

Strategic Recommendations

Finding	Implication	Recommendation
home_confort has 28% problem rate	Highest-risk category for customer dissatisfaction	Investigate supplier quality; set stricter acceptance criteria
Problematic products are more expensive on average	Higher-priced products set higher expectations	Ensure premium products meet premium expectations
Logistics problem products are heavier	Bulky items are harder to deliver reliably	Offer specialized shipping for large items; manage delivery window expectations

Machine Learning — Predicting Quality Problems

Can we predict which products will get bad reviews using only product attributes (price, weight, freight, category) and does adding delivery data (delay, on-time rate) improve the prediction? This directly tests whether logistics or product factors drive quality problems.

Methodology

Two model sets were trained on 8,736 products (80/20 stratified split):

Model Set	Features	Purpose
A: Product Only	Price, weight, freight ratio, orders, category	Can we predict quality before shipping?
B: + Delivery	A + avg_delay, pct_on_time	Does delivery data improve predictions?

Both sets use Random Forest (500 trees) and XGBoost (100 rounds, early stopping). The target is above_median_bad (above-median bad review rate), which is defined without reference to the cluster labels and so avoids circularity.
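The target definition and the stratified split can be sketched as follows. This is an illustration under assumptions rather than the report's own code: the strict "greater than median" rule, the seed, and the simulated bad-review rates are all hypothetical.

```python
import random
from statistics import median


def make_target(pct_bad_reviews):
    """above_median_bad: 1 if a product's bad-review rate exceeds the median."""
    cut = median(pct_bad_reviews)
    return [1 if p > cut else 0 for p in pct_bad_reviews]


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), sampling test_fraction within each class."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


data_rng = random.Random(0)
rates = [data_rng.random() for _ in range(1000)]  # hypothetical bad-review rates
y = make_target(rates)
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx), sum(y[i] for i in test_idx))
```

Stratifying keeps the share of above-median products the same in the training and test sets, so the reported metrics are not distorted by an unlucky split.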

Results

Model B Performance (RF, Product + Delivery features)
Metric	Model B (+ Delivery)
Accuracy	0.732
Precision	0.775
Recall	0.649
F1	0.706
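As a quick consistency check, the F1 score can be recomputed from the reported precision and recall, since it is their harmonic mean. The function name below is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision, recall = 0.775, 0.649  # Model B values from the table above
f1 = f1_score(precision, recall)
print(round(f1, 3))
```

The result rounds to 0.706, matching the table. The gap between precision and recall shows that the model is more conservative than balanced: products it flags are usually above-median, but it misses roughly a third of them.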

Does Delivery Data Improve AUC?
Metric	Product Only (A)	+ Delivery (B)	Δ (B - A)
RF AUC	0.689	0.783	+0.0940
XGBoost AUC	0.714	0.799	+0.0854

Delivery features improve prediction: adding them raises AUC by 0.094 for Random Forest and 0.085 for XGBoost, well above the 0.01 threshold. Logistics data adds value beyond knowing the product type and price; delivery delay and on-time rate carry real signal about quality problems.

Feature Importance — XGBoost Model B (P9)

Interpretation: The feature importance plot shows which factors most strongly predict bad reviews. If delivery metrics (avg_delay, pct_on_time) rank high, logistics quality is diagnostically important. If product attributes (price, weight, freight ratio) dominate, the product itself drives dissatisfaction. Category dummies ranking lower suggests the effect is driven by continuous attributes, not category-level patterns.

ROC Curves — Product Only vs Product + Delivery (P10)

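Interpretation: The ROC curves compare Model A (product only) vs Model B (product + delivery). The gap between curves shows how much predictive power delivery data adds. A narrow gap means product attributes tell the whole story; a wide gap means logistics information is essential for identifying quality problems. The AUC gains reported above (+0.094 for Random Forest, +0.085 for XGBoost) point to the latter, although product attributes alone already reach an AUC of about 0.69 to 0.71.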
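The area under a ROC curve has a direct interpretation: it is the probability that a randomly chosen above-median product receives a higher score than a randomly chosen below-median one. The sketch below computes AUC from that definition for two sets of hypothetical scores; the data and function name are invented for illustration, and the pairwise loop is quadratic, so it suits small examples only.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for six products from two models.
y_true = [1, 1, 1, 0, 0, 0]
scores_a = [0.6, 0.4, 0.7, 0.5, 0.3, 0.65]  # product-only features
scores_b = [0.8, 0.6, 0.7, 0.5, 0.3, 0.65]  # product + delivery features
auc_a = roc_auc(y_true, scores_a)
auc_b = roc_auc(y_true, scores_b)
print(round(auc_a, 3), round(auc_b, 3), round(auc_b - auc_a, 3))
```

In this toy case the second model ranks two more positive-negative pairs correctly, which is exactly what a wider gap between ROC curves represents.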

Confusion Matrix — XGBoost (P11)

Interpretation: The confusion matrix shows how well the best model separates products with above-median bad-review rates from those below the median. High accuracy on non-problem products is expected; false negatives (missed problem products) are the cost of predicting quality from product and delivery attributes alone.

Report generated on 2026-06-23 · Data: Olist Brazilian E-Commerce Public Dataset (2016-2018) · 11 visualizations · Day 2: Product Quality Clustering

Product Quality Clustering — Logistics vs Product Problem

Olist Marketplace 2016-2018 | Day 2

Data Analyst
