This report uses k-means and PAM (Partitioning Around Medoids) clustering to analyze 8,736 products with at least 3 orders each from the Olist Brazilian e-commerce marketplace. The goal is to determine whether customer dissatisfaction is driven by logistics problems (late delivery) or by product quality problems (inherent product flaws).
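| Summary statistic | Value |
|---|---|
| Products analyzed | 8,736 |
| Categories represented | 71 |
| Avg review score | 4.08 (SD 0.81) |
| Avg bad review rate | 15.0% |
| Avg delivery delay | -12.3 days (negative values mean early delivery) |
| Avg on-time rate | 93.7% |
| Avg price | R$ 119.24 |
| Avg freight ratio | 32.4% |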
| | pct_bad_reviews | avg_delay | pct_on_time | avg_price | avg_freight_ratio | weight_g |
|---|---|---|---|---|---|---|
| pct_bad_reviews | 1.00 | 0.17 | -0.33 | 0.01 | 0.03 | 0.03 |
| avg_delay | 0.17 | 1.00 | -0.56 | -0.01 | -0.03 | 0.02 |
| pct_on_time | -0.33 | -0.56 | 1.00 | -0.05 | 0.03 | -0.04 |
| avg_price | 0.01 | -0.01 | -0.05 | 1.00 | -0.41 | 0.35 |
| avg_freight_ratio | 0.03 | -0.03 | 0.03 | -0.41 | 1.00 | -0.14 |
| weight_g | 0.03 | 0.02 | -0.04 | 0.35 | -0.14 | 1.00 |
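The correlation matrix shows that pct_bad_reviews has a weak positive correlation with avg_delay (r = 0.17) and a moderate negative correlation with pct_on_time (r = -0.33): late and unreliable deliveries are associated with worse reviews. Price, freight ratio, and weight are essentially uncorrelated with the bad-review rate (|r| ≤ 0.03). The delivery relationships are far from perfect, however. Some dissatisfied products arrive on time (pointing to a product problem), while others arrive late even though the product itself may be fine (pointing to a logistics problem).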
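A minimal pandas sketch of how the matrix can be computed, plus the r-squared arithmetic behind reading r = 0.17 as weak. It assumes a product-level DataFrame with the six column names shown in the matrix and Pearson correlation; the report does not state which correlation method it used.

```python
import pandas as pd

# Column names as they appear in the correlation matrix.
COLUMNS = ["pct_bad_reviews", "avg_delay", "pct_on_time",
           "avg_price", "avg_freight_ratio", "weight_g"]


def correlation_matrix(products: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlations (assumed method), rounded like the table."""
    return products[COLUMNS].corr(method="pearson").round(2)


def variance_explained(r: float) -> float:
    """Share of one variable's variance explained linearly by another: r squared."""
    return r ** 2


# Values taken from the pct_bad_reviews row of the matrix.
print(f"avg_delay:   {variance_explained(0.17):.3f}")   # 0.029
print(f"pct_on_time: {variance_explained(-0.33):.3f}")  # 0.109
```

On its own, delivery delay accounts for roughly 3% of the variance in the bad-review rate and the on-time rate for about 11%, which is why delivery performance cannot be the whole story.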
Six standardized features capture two quality dimensions plus product attributes:
| Dimension | Features | Purpose |
|---|---|---|
| Dissatisfaction | pct_bad_reviews | % of 1-2 star reviews |
| Logistics quality | avg_delay, pct_on_time | Delivery speed and reliability |
| Product economics | avg_price_log, avg_freight_ratio, weight_g_log | Product type and cost structure |
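A sketch of the feature preparation the table implies: log-transform price and weight, then z-score all six columns. The raw column names (avg_price, weight_g) follow the correlation matrix; using `log1p` rather than `log`, and the sample standard deviation, are assumptions rather than details the report states.

```python
import numpy as np
import pandas as pd


def build_feature_matrix(products: pd.DataFrame) -> pd.DataFrame:
    """Return the six clustering features: log-transform skewed columns, then z-score all."""
    features = pd.DataFrame({
        "pct_bad_reviews": products["pct_bad_reviews"],
        "avg_delay": products["avg_delay"],
        "pct_on_time": products["pct_on_time"],
        # log1p (assumption) compresses the long right tails of price and weight
        "avg_price_log": np.log1p(products["avg_price"]),
        "avg_freight_ratio": products["avg_freight_ratio"],
        "weight_g_log": np.log1p(products["weight_g"]),
    })
    # Standardize each column to mean 0 and sample standard deviation 1
    return (features - features.mean()) / features.std(ddof=1)
```

Standardizing matters because both k-means and PAM work on Euclidean distances; without it, weight in grams would dominate every distance computation.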
We choose k using three criteria: the gap statistic for k-means, and the average silhouette width for both k-means and PAM. All three criteria converge on k = 3 clusters.
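A scikit-learn sketch of the two k-means criteria. The gap statistic compares the log within-cluster dispersion with its expected value under uniform reference data drawn over the feature ranges. The selection rule shown (smallest k with Gap(k) ≥ Gap(k+1) - s(k+1)) is one common choice; the report does not say which rule or how many reference sets it used, and the function names and defaults here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def kmeans_wss(X, k, seed=0):
    """Total within-cluster sum of squares for a k-means fit."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_


def gap_statistic(X, k, n_refs=10, seed=0):
    """Return (Gap(k), s_k) using uniform reference sets over the data's bounding box."""
    rng = np.random.default_rng(seed)  # same reference sets for every k
    lo, hi = X.min(axis=0), X.max(axis=0)
    ref_log_wss = np.array([
        np.log(kmeans_wss(rng.uniform(lo, hi, size=X.shape), k, seed))
        for _ in range(n_refs)
    ])
    gap = ref_log_wss.mean() - np.log(kmeans_wss(X, k, seed))
    s_k = ref_log_wss.std() * np.sqrt(1 + 1 / n_refs)
    return gap, s_k


def choose_k_gap(X, k_max=8, **kwargs):
    """Smallest k such that Gap(k) >= Gap(k+1) - s_(k+1)."""
    stats = [gap_statistic(X, k, **kwargs) for k in range(1, k_max + 1)]
    for k in range(1, k_max):
        gap_k = stats[k - 1][0]
        gap_next, s_next = stats[k]
        if gap_k >= gap_next - s_next:
            return k
    return k_max


def choose_k_silhouette(X, k_max=8, seed=0):
    """k in 2..k_max with the highest average silhouette width under k-means."""
    scores = {}
    for k in range(2, k_max + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores
```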
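K-means achieved an average silhouette width of 0.2494 versus 0.1646 for PAM, so k-means was selected for the final solution. Both values are modest: an average silhouette around 0.25 or lower is usually read as weak cluster structure, so the cluster boundaries should be interpreted with caution.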
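PAM uses actual products as cluster centres (medoids) and minimizes the sum of distances rather than squared distances. The sketch below is a compact BUILD-then-SWAP implementation for illustration only; the report does not name its PAM implementation, and this sketch does not claim to reproduce it. It compares average silhouettes the same way the text does.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances, silhouette_score


def pam(X, k, max_iter=100):
    """Partitioning Around Medoids: greedy BUILD, then best-improvement SWAP."""
    D = pairwise_distances(X)  # Euclidean dissimilarities, shape (n, n)
    n = D.shape[0]

    # BUILD: start from the most central point, then add the point that lowers total cost most.
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:
        nearest = D[:, medoids].min(axis=1)
        gains = np.maximum(nearest[:, None] - D, 0).sum(axis=0)
        gains[medoids] = -1  # never re-pick an existing medoid
        medoids.append(int(np.argmax(gains)))
    cost = D[:, medoids].min(axis=1).sum()

    # SWAP: apply the single medoid/non-medoid exchange that lowers cost most; stop when none does.
    for _ in range(max_iter):
        best_cost, best_swap = cost, None
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids[:i] + [h] + medoids[i + 1:]
                trial_cost = D[:, trial].min(axis=1).sum()
                if trial_cost < best_cost:
                    best_cost, best_swap = trial_cost, (i, h)
        if best_swap is None:
            break
        cost = best_cost
        medoids[best_swap[0]] = best_swap[1]

    labels = D[:, medoids].argmin(axis=1)
    return np.array(medoids), labels, cost


def compare_silhouettes(X, k=3, seed=0):
    """Average silhouette width of k-means labels vs PAM labels on the same data."""
    km_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    _, pam_labels, _ = pam(X, k)
    return silhouette_score(X, km_labels), silhouette_score(X, pam_labels)
```

The SWAP loop scans every medoid/non-medoid pair, so it costs O(k·n²) distance lookups per pass; for all 8,736 products a faster variant (or an existing library implementation) would be the practical choice.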
| Cluster | Products | % of Total |
|---|---|---|
| Excellent — Great Product & Logistics | 4511 | 51.6 |
| Good — Acceptable Quality | 3019 | 34.6 |
| Low Quality (Mixed) | 1206 | 13.8 |
Interpretation: The clusters occupy distinct regions in PCA space, and the first two principal components explain 63.9% of the variance. Problem products occupy their own region of the quality space, which supports the segmentation, although the modest silhouette widths indicate that neighboring clusters overlap at their boundaries.
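The 63.9% figure is the cumulative explained-variance ratio of the first two principal components. A minimal sketch, assuming `X_std` (a hypothetical name) is the standardized six-feature matrix:

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_projection(X_std, n_components=2):
    """Project standardized features onto the leading principal components."""
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X_std)
    # Cumulative share of total variance captured by the retained components
    explained = pca.explained_variance_ratio_.sum()
    return scores, explained
```

The returned `scores` are the coordinates plotted in the PCA figure; about a third of the variance lies in the remaining four components, so points that look close in two dimensions may still be far apart in the full feature space.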
Interpretation: The heatmap confirms the cluster differentiation. The “Excellent” cluster is below average in both bad reviews and delay. Logistics-driven problem products are strongly above average in delivery delay, while product-driven problem products are above average in bad reviews but below average in delay: they arrive on time but still disappoint.
Interpretation: This is the core diagnostic plot. Each point is a product. The horizontal dashed line marks the median bad-review rate; the vertical line at x = 0 separates on-time deliveries (left) from late ones (right). Products in the upper-left quadrant are product problems: they arrive on time but still receive poor reviews. Products in the upper-right quadrant are logistics or compound problems. The clustering isolates problem products, while the split between these two problem types follows the median-delay rule described in the note on problem type split below.
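The two reference lines of the plot define four quadrants. A sketch that labels each product accordingly, assuming the column names used elsewhere in the report; the two lower-quadrant labels are hypothetical names, since the report only names the upper quadrants.

```python
import numpy as np
import pandas as pd


def diagnostic_quadrant(products: pd.DataFrame) -> pd.Series:
    """Label each product by the median bad-review line and the x = 0 delay line."""
    high_bad = products["pct_bad_reviews"] > products["pct_bad_reviews"].median()
    late = products["avg_delay"] > 0  # left of x = 0 means on time or early
    labels = np.select(
        [high_bad & ~late, high_bad & late, ~high_bad & late],
        ["product problem", "logistics or compound problem", "late but well reviewed"],
        default="on time and well reviewed",
    )
    return pd.Series(labels, index=products.index)
```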
Interpretation: Both methods show similar silhouette profiles, but k-means achieves a clearly higher average silhouette (0.249 vs 0.165) and is used for the final cluster assignments. Products with a negative silhouette width are, on average, closer to a neighboring cluster than to their own; these borderline cases near cluster boundaries may be misassigned.
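Per-product silhouette widths are what the silhouette plot draws; their mean is the average silhouette quoted for each method. A sketch that also returns the borderline (negative-width) products, assuming `X` is the standardized feature matrix and `labels` the final cluster assignments:

```python
import numpy as np
from sklearn.metrics import silhouette_samples


def silhouette_profile(X, labels):
    """Per-product silhouette widths, their mean, and indices of negative-width products."""
    widths = silhouette_samples(X, labels)
    # Negative width: mean distance to another cluster is smaller than to the product's own
    borderline = np.flatnonzero(widths < 0)
    return widths.mean(), widths, borderline
```

`len(borderline) / len(widths)` gives the share of products that sit closer to a neighboring cluster, a quick way to quantify how many assignments are doubtful.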
Which categories have the highest proportion of products in problem clusters?
Interpretation: home_confort leads with 28.2% of its products flagged as problematic, but that rate rests on only 39 products (11 problem products), so it is far less precise than the rate for a large category such as bed_bath_table (18.7% of 871 products). The pattern across categories indicates whether the issue is systemic (a whole category tends to disappoint) or isolated (a category mixes good and bad products).
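Small categories make the ranking noisy, so an interval around each rate helps. A sketch using the Wilson score interval, which is a swapped-in technique the report does not use; the `category` and `is_problem` columns and the `min_products` cut-off are hypothetical.

```python
import numpy as np
import pandas as pd


def wilson_interval(problems, n, z=1.96):
    """Approximate 95% Wilson score interval for the proportion problems / n."""
    p = problems / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * np.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def category_problem_rates(products: pd.DataFrame, min_products: int = 10) -> pd.DataFrame:
    """Problem rate per category with a Wilson interval (hypothetical column names)."""
    rates = (products.groupby("category")["is_problem"]
             .agg(n_problems="sum", n_products="count"))
    rates = rates[rates["n_products"] >= min_products].copy()
    rates["rate_pct"] = 100 * rates["n_problems"] / rates["n_products"]
    low, high = wilson_interval(rates["n_problems"], rates["n_products"])
    rates["ci_low_pct"], rates["ci_high_pct"] = 100 * low, 100 * high
    return rates.sort_values("rate_pct", ascending=False)


# home_confort: 28.2% of 39 products corresponds to 11 problem products.
low, high = wilson_interval(11, 39)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # 16.5% to 43.8%
```

An interval of roughly 16% to 44% overlaps the rates of most other categories in the table, so home_confort's lead is suggestive rather than conclusive.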
For the worst categories, is the problem logistics-driven or product-driven?
Interpretation: For most categories the problem type is mixed. Some categories (e.g., furniture, construction) lean toward logistics problems, plausibly because heavy or bulky items are harder to deliver. Others (e.g., health & beauty, electronics) lean toward product problems, which points to the quality of the product itself rather than to delivery.
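Because problem types are split 50/50 overall, a category's "lean" is best read as its deviation from that baseline. A sketch with a row-normalized crosstab; the `problem_type` values `"logistics"` and `"product"` are hypothetical labels.

```python
import pandas as pd


def problem_type_mix(problems: pd.DataFrame) -> pd.DataFrame:
    """Share of logistics- vs product-driven problems per category, relative to 50%."""
    mix = pd.crosstab(problems["category"], problems["problem_type"], normalize="index")
    # Positive values lean toward logistics problems, negative toward product problems
    mix["logistics_lean"] = mix.get("logistics", 0.0) - 0.5
    return mix.sort_values("logistics_lean", ascending=False)
```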
Interpretation: Product-driven problem products tend to be lower-priced on average, while logistics-driven problem products span a wider price range. The weight distributions show that logistics-driven problem products are often heavier, suggesting that bulky items are harder to deliver reliably.
Overall problem rate: 13.8% of products are flagged as problematic (n = 1,206).
Note on problem type split: The 50/50 overall split between logistics and product problems is a methodological artifact. Problem types are divided at the median delay of problem products, which (barring ties at the median) guarantees equal halves. The real insight lies in how individual categories deviate from this baseline (see the Problem Type by Category figure).
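A small demonstration of why the split is always 50/50: with 1,206 distinct delay values, exactly half lie above their own median. The delays below are synthetic, and assigning values equal to the median to the product side is an assumption about tie handling.

```python
import numpy as np


def split_by_median_delay(delays):
    """Label problem products 'logistics' above the median delay, 'product' at or below it."""
    delays = np.asarray(delays, dtype=float)
    return np.where(delays > np.median(delays), "logistics", "product")


rng = np.random.default_rng(0)
labels = split_by_median_delay(rng.normal(-5.0, 8.0, size=1206))  # synthetic delays
print((labels == "logistics").sum(), (labels == "product").sum())  # 603 603

# With ties at the median the halves are no longer equal:
print(split_by_median_delay([-3, 0, 0, 0]).tolist())  # ['product', 'product', 'product', 'product']
```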
| Problem Type | Products | % of Problems | Root Cause |
|---|---|---|---|
| Logistics-driven | 603 | 50 | Late or unreliable delivery |
| Product-driven | 603 | 50 | Inherent product flaws despite on-time delivery |
| Low Quality (Mixed) | 0 | 0 | Combination of factors |
| Category | Problem Rate (%) | Products (n) |
|---|---|---|
| home_confort | 28.2 | 39 |
| construction_tools_lights | 20.0 | 20 |
| kitchen_dining_laundry_garden_furniture | 19.2 | 26 |
| bed_bath_table | 18.7 | 871 |
| fixed_telephony | 18.2 | 22 |
| audio | 16.7 | 24 |
| christmas_supplies | 16.7 | 12 |
| furniture_decor | 16.5 | 674 |
| Finding | Implication | Recommendation |
|---|---|---|
| home_confort has a 28.2% problem rate (n = 39) | Highest-risk category for customer dissatisfaction, though based on few products | Investigate supplier quality; set stricter acceptance criteria |
| Problematic products are more expensive on average | Higher-priced products set higher expectations | Ensure premium products meet premium expectations |
| Logistics-driven problem products are heavier on average | Bulky items are harder to deliver reliably | Offer specialized shipping for large items; manage delivery-window expectations |
Can we predict which products will get bad reviews using only product attributes (price, weight, freight, category), and does adding delivery data (delay, on-time rate) improve the prediction? Comparing the two model sets tests whether delivery information carries signal about bad reviews beyond what product attributes already provide.
Two model sets were trained on 8,736 products (80/20 stratified split):
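A sketch of the target and split, assuming the target is computed from the median over all products and that the random seed (42 here) is arbitrary; the report states neither.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def make_target_and_split(products: pd.DataFrame, seed: int = 42):
    """Binary target above_median_bad, then an 80/20 split stratified on it."""
    y = (products["pct_bad_reviews"] > products["pct_bad_reviews"].median()).astype(int)
    train_idx, test_idx = train_test_split(
        products.index, test_size=0.2, stratify=y, random_state=seed
    )
    return y, train_idx, test_idx
```

Stratifying keeps the roughly 50/50 class balance identical in the training and test sets, so AUC differences between Model A and Model B are not distorted by an unlucky split.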
| Model Set | Features | Purpose |
|---|---|---|
| A: Product Only | Price, weight, freight ratio, orders, category | Can we predict quality before shipping? |
| B: + Delivery | A + avg_delay, pct_on_time | Does delivery data improve predictions? |
Both sets use Random Forest (500 trees) and XGBoost (100 rounds, early stopping). The target is above_median_bad (whether a product's bad-review rate is above the median). It is defined directly from the review data rather than from the cluster labels, which avoids circularity.
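A sketch of the Model A vs Model B comparison for the Random Forest arm only; the XGBoost arm is omitted to avoid guessing its other settings. Array names are hypothetical, and `sklearn.metrics.roc_curve` on the same predicted probabilities would give the curves shown in the ROC figure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def compare_feature_sets(X_product, X_delivery, y, train_idx, test_idx, seed=0):
    """Test-set AUC for a 500-tree random forest: product-only vs product + delivery."""
    feature_sets = {
        "A: product only": X_product,
        "B: + delivery": np.hstack([X_product, X_delivery]),
    }
    aucs = {}
    for name, X in feature_sets.items():
        forest = RandomForestClassifier(n_estimators=500, random_state=seed)
        forest.fit(X[train_idx], y[train_idx])
        aucs[name] = roc_auc_score(y[test_idx], forest.predict_proba(X[test_idx])[:, 1])
    aucs["delta (B - A)"] = aucs["B: + delivery"] - aucs["A: product only"]
    return aucs
```

Both models share the same split and seed, so the AUC difference reflects the added features rather than sampling noise from different partitions.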
| Metric | Model B (+ Delivery) |
|---|---|
| Accuracy | 0.732 |
| Precision | 0.775 |
| Recall | 0.649 |
| F1 | 0.706 |
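F1 is the harmonic mean of precision and recall, so the table can be checked directly. The sketch also derives all four metrics from confusion-matrix counts; the counts in the usage line are toy values, not the report's. A recall of 0.649 means roughly 35% of above-median products are missed (false negatives).

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_score_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score_from(0.775, 0.649), 3))  # 0.706, matching the table
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))  # toy counts
```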
| Metric | Product Only (A) | + Delivery (B) | Δ (B - A) |
|---|---|---|---|
| RF AUC | 0.689 | 0.783 | +0.0940 |
| XGBoost AUC | 0.714 | 0.799 | +0.0854 |
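Delivery features improve prediction: adding them raises AUC by 0.094 for Random Forest and 0.085 for XGBoost, well above the 0.01 threshold used here for a meaningful gain. Logistics data therefore adds value beyond knowing the product type and price; delivery delay and on-time rate carry real signal about quality problems.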
Interpretation: The feature importance plot shows which factors most strongly predict bad reviews. If delivery metrics (avg_delay, pct_on_time) rank high, logistics quality is diagnostically important; if product attributes (price, weight, freight ratio) dominate, the product itself drives dissatisfaction. If category dummies rank lower, the effect is driven by continuous attributes rather than by category-level patterns.
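Impurity-based importances (the default `feature_importances_` of scikit-learn forests) can favor continuous, high-cardinality features over binary dummies, so a low rank for category dummies may partly reflect that bias. Permutation importance on held-out data is a useful cross-check; this is a swapped-in technique, since the report does not say which importance measure its plot uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance


def rank_features(X_train, y_train, X_test, y_test, feature_names, seed=0):
    """Rank features by the mean drop in test AUC when each one is shuffled."""
    forest = RandomForestClassifier(n_estimators=500, random_state=seed).fit(X_train, y_train)
    result = permutation_importance(
        forest, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=seed
    )
    order = np.argsort(result.importances_mean)[::-1]
    return [(feature_names[i], float(result.importances_mean[i])) for i in order]
```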
Interpretation: The ROC curves compare Model A (product only) with Model B (product + delivery). The gap between the curves shows how much predictive power delivery data adds: a narrow gap would mean product attributes tell the whole story, while a wide gap means logistics information is essential for identifying quality problems. Here the AUC gap is 0.085 to 0.094.
Interpretation: The confusion matrix shows how well the best model separates products with above-median bad-review rates from those below the median. High accuracy on non-problem products is expected; false negatives (missed problem products) are the cost of predicting quality from product and delivery attributes alone.
Report generated on 2026-06-23 · Data: Olist Brazilian E-Commerce Public Dataset (2016-2018) · 11 visualizations · Day 2: Product Quality Clustering