Executive Summary

This report applies k-means and PAM clustering to 8,736 products (each with at least 3 orders) from the Olist Brazilian e-commerce marketplace. The goal is to determine whether customer dissatisfaction is driven mainly by logistics problems (late delivery) or by product quality problems (inherent product flaws).


Data Overview

Products analyzed: 8,736
Categories represented: 71
Avg review score: 4.08 (SD: 0.81)
Avg bad review rate: 15.0%
Avg delivery delay: -12.3 days (negative = delivered before the estimated date)
Avg on-time rate: 93.7%
Avg price: R$ 119.24
Avg freight ratio: 32.4%
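The overview figures above are product-level aggregates of order-level reviews and delivery dates. A minimal sketch of that aggregation follows, assuming each order row carries a review score and a delay in days (actual minus estimated delivery date). The row layout, the `product_metrics` helper, and the rule that "on time" means a delay of zero days or less are illustrative assumptions, not the report's exact code.

```python
from collections import defaultdict
from statistics import mean

# Toy order-level rows (not Olist data): (product_id, review_score, delay_days).
# Assumption: delay_days = actual minus estimated delivery date, so negative = early.
orders = [
    ("p1", 5, -10), ("p1", 1, 3), ("p1", 4, -8),
    ("p2", 5, -15), ("p2", 5, -12),  # only 2 orders, so p2 is excluded
    ("p3", 2, 1), ("p3", 1, 4), ("p3", 5, -2), ("p3", 4, -6),
]

MIN_ORDERS = 3  # the report keeps products with at least 3 orders


def product_metrics(rows, min_orders=MIN_ORDERS):
    """Aggregate order-level rows into per-product dissatisfaction and delivery metrics."""
    by_product = defaultdict(list)
    for product_id, score, delay in rows:
        by_product[product_id].append((score, delay))

    metrics = {}
    for product_id, items in by_product.items():
        if len(items) < min_orders:
            continue
        scores = [s for s, _ in items]
        delays = [d for _, d in items]
        metrics[product_id] = {
            "n_orders": len(items),
            "avg_review": mean(scores),
            # Share of 1-2 star reviews, in percent
            "pct_bad_reviews": 100 * sum(s <= 2 for s in scores) / len(scores),
            "avg_delay": mean(delays),
            # Assumed rule: on time = delivered no later than the estimate (delay <= 0)
            "pct_on_time": 100 * sum(d <= 0 for d in delays) / len(delays),
        }
    return metrics


metrics = product_metrics(orders)
```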

Correlation between quality and logistics variables
                   pct_bad_reviews  avg_delay  pct_on_time  avg_price  avg_freight_ratio  weight_g
pct_bad_reviews               1.00       0.17        -0.33       0.01               0.03      0.03
avg_delay                     0.17       1.00        -0.56      -0.01              -0.03      0.02
pct_on_time                  -0.33      -0.56         1.00      -0.05               0.03     -0.04
avg_price                     0.01      -0.01        -0.05       1.00              -0.41      0.35
avg_freight_ratio             0.03      -0.03         0.03      -0.41               1.00     -0.14
weight_g                      0.03       0.02        -0.04       0.35              -0.14      1.00

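The correlation matrix shows that pct_bad_reviews has a weak positive correlation with avg_delay (r = 0.17) and a moderate negative correlation with pct_on_time (r = -0.33): late or unreliable deliveries are associated with worse reviews. The relationship is far from perfect, which suggests that some dissatisfied products arrive on time (pointing to a product problem) while some sound products are let down by late delivery (pointing to a logistics problem). By contrast, the product economics variables (avg_price, avg_freight_ratio, weight_g) are almost uncorrelated with pct_bad_reviews (|r| ≤ 0.03).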
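Squaring a correlation coefficient gives the share of variance one variable linearly explains in the other, which makes the strengths above concrete: avg_delay (r = 0.17) accounts for only about 3% of the variance in pct_bad_reviews (0.17² ≈ 0.029), and pct_on_time (r = -0.33) for about 11% (0.33² ≈ 0.109). The sketch below computes a Pearson coefficient from scratch on made-up toy values; that the report's matrix uses Pearson rather than, say, Spearman correlation is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up product-level values, for illustration only (not Olist data).
pct_bad = [0.0, 10.0, 33.3, 50.0, 20.0]
avg_delay = [-15.0, -12.0, -5.0, 2.0, -14.0]
r = pearson(pct_bad, avg_delay)

# Variance in pct_bad_reviews linearly explained by each reported coefficient
explained_by_delay = 0.17 ** 2    # about 0.029
explained_by_on_time = 0.33 ** 2  # about 0.109
```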


Clustering Methodology

Features Used

Six standardized features capture two quality dimensions plus product attributes:

Dimension          Features                                        Purpose
Dissatisfaction    pct_bad_reviews                                 Share of 1-2 star reviews
Logistics quality  avg_delay, pct_on_time                          Delivery speed and reliability
Product economics  avg_price_log, avg_freight_ratio, weight_g_log  Product type and cost structure
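A minimal sketch of this feature preparation on toy rows: price and weight are log-transformed (the `_log` suffixes suggest this; using `log1p` specifically is an assumption), and every column is then z-scored so that no single unit, such as grams or reais, dominates the Euclidean distances used by k-means and PAM. The sketch uses the population standard deviation; the sample standard deviation would rescale every column by the same factor and leave the cluster assignments unchanged.

```python
import math
from statistics import mean, pstdev


def zscore(values):
    """Standardize a column to mean 0 and population standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


# Toy rows (not Olist data):
# (pct_bad_reviews, avg_delay, pct_on_time, price, freight_ratio, weight_g)
rows = [
    (0.0, -15.0, 100.0, 49.9, 0.35, 300),
    (33.3, -5.0, 66.7, 129.0, 0.20, 1200),
    (50.0, 2.0, 50.0, 899.0, 0.05, 15000),
    (10.0, -12.0, 90.0, 19.9, 0.60, 150),
]

columns = [list(col) for col in zip(*rows)]
# Right-skewed price and weight are log-transformed first (log1p is an assumption).
columns[3] = [math.log1p(p) for p in columns[3]]
columns[5] = [math.log1p(w) for w in columns[5]]

standardized = [zscore(col) for col in columns]
features = [list(row) for row in zip(*standardized)]  # one row per product again
```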

Optimal Number of Clusters

We determine the optimal k using three criteria: the gap statistic (k-means), the average silhouette width for k-means, and the average silhouette width for PAM.

All three criteria converge on k = 3 clusters as the optimal choice.
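The silhouette criterion can be sketched directly. For each point, a is its mean distance to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster; s = (b - a) / max(a, b) ranges from -1 to 1, and the k with the highest average is preferred. The per-point values also identify borderline points with negative silhouettes. This plain-Python version with Euclidean distance is an illustrative assumption, not the code behind the report, and the gap statistic is not shown.

```python
import math


def silhouette_values(points, labels):
    """Per-point silhouette width s(i) = (b - a) / max(a, b), using Euclidean distance."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    values = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster label
        dists = {c: [] for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                dists[labels[j]].append(math.dist(p, q))
        own = dists[labels[i]]
        if not own:  # singleton cluster: s(i) is conventionally 0
            values.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for c, d in dists.items() if c != labels[i] and d)
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


def average_silhouette(points, labels):
    values = silhouette_values(points, labels)
    return sum(values) / len(values)


# Two well-separated toy groups, labeled correctly and incorrectly
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = [0, 0, 1, 1]
bad = [0, 1, 0, 1]
borderline = [i for i, s in enumerate(silhouette_values(points, bad)) if s < 0]
```

With the correct labeling the average is close to 0.86; with the wrong one every point has a negative silhouette.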


Clustering Results

K-Means vs PAM: Method Selection

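K-means achieved an average silhouette width of 0.2494, versus 0.1646 for PAM, so k-means was selected for the final solution. Both values are modest, which indicates that the clusters overlap rather than forming sharply separated groups.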
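For reference, k-means alternates two steps: assign each product to its nearest centroid, then move each centroid to the mean of its members. PAM instead restricts cluster centers to actual data points (medoids) and minimizes total dissimilarity, which makes it more robust to outliers but here produced the lower silhouette. The sketch below is a bare Lloyd's-algorithm k-means with a single random start, written for illustration only; practical implementations typically run several starts and keep the solution with the lowest within-cluster sum of squares.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm with one random start; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
```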

Cluster Characterization

Cluster Size Distribution
Cluster                                 Products  % of Total
Excellent — Great Product & Logistics       4511        51.6
Good — Acceptable Quality                   3019        34.6
Low Quality (Mixed)                         1206        13.8

PCA Cluster Map (P2)

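Interpretation: The first two principal components explain 63.9% of the variance, so the map shows most, but not all, of the structure in the six features. The three clusters occupy distinct regions of PCA space, although some overlap at their boundaries is expected given the modest average silhouette (0.25). The Low Quality (Mixed) cluster, which holds all problem products, occupies its own region of the quality space, supporting the segmentation.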

Cluster Profile Heatmap (P3)

Interpretation: The heatmap confirms the cluster differentiation. The “Excellent” cluster is below average in bad reviews and delay. Among problem products (the Low Quality (Mixed) cluster, split into logistics-driven and product-driven halves), the “Logistics Problem” group is strongly above average in delivery delay, while the “Product Problem” group is above average in bad reviews but below average in delay: these products arrive on time but still disappoint.

Logistics vs Product Problem Quadrants (P4)

Interpretation: This is the core diagnostic plot. Each point is a product. The horizontal dashed line marks the median bad-review rate; the vertical line at x = 0 separates on-time or early deliveries (left) from late ones (right). Products in the upper-left quadrant are product problems: they arrive on time but receive poor reviews. Products in the upper-right quadrant are logistics problems or compound problems (late delivery plus poor reviews). Note that the logistics-driven and product-driven labels in Key Findings come from a median-delay split within the Low Quality (Mixed) cluster, not from the clustering algorithm itself, so their cutoff need not coincide with x = 0.
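The quadrant logic of this plot amounts to two thresholds. A minimal sketch, assuming the x-axis is avg_delay in days and the y-axis is pct_bad_reviews; the `quadrant` helper and its label strings are hypothetical.

```python
from statistics import median


def quadrant(pct_bad, avg_delay, median_bad):
    """Place one product in the diagnostic quadrants (thresholds assumed from the plot)."""
    high_bad = pct_bad > median_bad  # above the median bad-review rate
    late = avg_delay > 0             # right of x = 0: late on average
    if high_bad and not late:
        return "upper-left: product problem"
    if high_bad and late:
        return "upper-right: logistics or compound problem"
    if late:
        return "lower-right: late but well reviewed"
    return "lower-left: on time and well reviewed"


# Toy products: (pct_bad_reviews, avg_delay_days)
products = [(40.0, -6.0), (45.0, 4.0), (5.0, 3.0), (0.0, -14.0), (20.0, -10.0)]
cutoff = median(p for p, _ in products)
labels = [quadrant(p, d, cutoff) for p, d in products]
```

Under this rule a product that arrives early on average still lands in the upper-left quadrant whenever its bad-review rate exceeds the median.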

Silhouette Comparison (P5)

Interpretation: Both methods show similar silhouette profiles, but k-means achieves a clearly higher average silhouette (0.25 vs 0.16) and is used for the final cluster assignments. Products with a negative silhouette width sit closer, on average, to a neighboring cluster than to their own; these borderline cases near cluster boundaries may be misassigned.


Category-Level Analysis

Most Problematic Categories (P6)

Which categories have the highest proportion of products in problem clusters?

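Interpretation: home_confort leads, with 28.2% of its 39 products (about 11) flagged as problematic. Several of the top categories are small (12 to 39 products), so their rates are noisy; bed_bath_table (18.7% of 871 products) and furniture_decor (16.5% of 674) are the more robust signals. The pattern across categories shows whether an issue is systemic (most products in a category disappoint) or isolated (a category mixes good and bad products).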
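Because several of the top categories contain fewer than 40 products, a confidence interval helps judge how far to trust the ranking. The sketch below uses the Wilson score interval, a standard choice for small-sample proportions (the report does not state that it computed intervals), with counts back-calculated from the rounded rates: 28.2% of 39 is 11 products, and 18.7% of 871 is 163.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts back-calculated from the rounded rates in the category table
home_low, home_high = wilson_interval(11, 39)  # home_confort: 28.2% of 39
bed_low, bed_high = wilson_interval(163, 871)  # bed_bath_table: 18.7% of 871
```

For home_confort this gives roughly 16.5% to 43.8%, an interval that comfortably contains bed_bath_table's 18.7%; the much narrower interval for bed_bath_table (roughly 16.3% to 21.4%) makes it the more reliable signal.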

Problem Type Breakdown by Category (P7)

For the worst categories, is the problem logistics-driven or product-driven?

Interpretation: For most categories, the problem type is mixed. Some categories (e.g., furniture, construction) lean toward logistics problems (heavy/bulky items are harder to deliver). Others (e.g., health & beauty, electronics) lean toward product problems — suggesting issues with product quality itself rather than delivery.

Product Attributes by Cluster (P8)

Interpretation: Product-driven problem products tend to be lower-priced on average, while logistics-driven problem products span a wider price range. Logistics-driven products are also often heavier, which suggests that bulky items are harder to deliver reliably.


Key Findings

Overall problem rate: 13.8% of products are flagged as problematic (n = 1,206).

Note on problem type split: The 50/50 overall split between logistics-driven and product-driven problems is a methodological artifact: problem products are divided at their median delay, which by construction yields two equal halves. The real insight lies in how individual categories deviate from this baseline (see the Problem Type Breakdown by Category figure, P7).
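The artifact is easy to reproduce. In the sketch below (toy delays drawn at random, not Olist data), labeling problem products by whether their delay exceeds the group's median splits any 1,206 distinct values into exactly 603 and 603; ties at the median could make the split slightly uneven. The `split_problem_types` helper is hypothetical.

```python
import random
from statistics import median


def split_problem_types(delays):
    """Label problem products relative to the group's median delay.

    Delay above the median -> "logistics-driven"; at or below -> "product-driven".
    """
    cutoff = median(delays)
    return ["logistics-driven" if d > cutoff else "product-driven" for d in delays]


# Toy delays (days; negative = early) for 1,206 hypothetical problem products
rng = random.Random(42)
toy_delays = [rng.gauss(-8.0, 10.0) for _ in range(1206)]
labels = split_problem_types(toy_delays)
n_logistics = labels.count("logistics-driven")
n_product = labels.count("product-driven")
```

Since the report's average delay is -12.3 days, the median delay of problem products may itself be negative; if so, some products labeled logistics-driven still arrive before their estimated date on average.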

Problem Type Breakdown
Problem Type         Products  % of Problems  Root Cause
Logistics-driven          603             50  Late or unreliable delivery
Product-driven            603             50  Inherent product flaws despite on-time delivery
Low Quality (Mixed)         0              0  Combination of factors

Top Most Problematic Categories

Top 8 Categories by Problem Rate
Category                                 Problem Rate (%)  Products (n)
home_confort                                         28.2            39
construction_tools_lights                            20.0            20
kitchen_dining_laundry_garden_furniture              19.2            26
bed_bath_table                                       18.7           871
fixed_telephony                                      18.2            22
audio                                                16.7            24
christmas_supplies                                   16.7            12
furniture_decor                                      16.5           674

Strategic Recommendations

Finding: home_confort has a 28% problem rate
Implication: Highest-risk category for customer dissatisfaction, although the rate rests on only 39 products
Recommendation: Investigate supplier quality; set stricter acceptance criteria

Finding: Problematic products are more expensive on average
Implication: Higher-priced products set higher expectations (the product-level correlation between price and bad-review rate is only 0.01, so any price effect is weak)
Recommendation: Ensure premium products meet premium expectations

Finding: Logistics-driven problem products are heavier
Implication: Bulky items are harder to deliver reliably
Recommendation: Offer specialized shipping for large items; manage delivery-window expectations

Machine Learning — Predicting Quality Problems

Can we predict which products will get bad reviews using only product attributes (price, weight, freight ratio, number of orders, category)? And does adding delivery data (delay, on-time rate) improve the prediction? Comparing the two answers tests whether logistics data carries information about dissatisfaction beyond what product factors provide.

Methodology

Two model sets were trained on 8,736 products (80/20 stratified split):

Model Set        Features                                        Purpose
A: Product Only  Price, weight, freight ratio, orders, category  Can we predict quality before shipping?
B: + Delivery    A + avg_delay, pct_on_time                      Does delivery data improve predictions?

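Both sets use Random Forest (500 trees) and XGBoost (up to 100 boosting rounds, with early stopping). The target is above_median_bad (1 if a product's bad-review rate is above the median, else 0). Because this target is defined directly from review scores rather than from the cluster labels, the models avoid circularity.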
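A minimal sketch of the target and the 80/20 stratified split, assuming the split stratifies on above_median_bad. The feature-name lists, `above_median_target`, and `stratified_split` are hypothetical illustrations; the report's actual pipeline and its category encoding may differ. Stratifying keeps the class ratio the same in training and test data, which matters if ties at the median (for example, many products with no bad reviews) leave the classes unbalanced.

```python
import random
from statistics import median

# Hypothetical column names for the two feature sets (category dummies omitted)
PRODUCT_FEATURES = ["avg_price", "weight_g", "avg_freight_ratio", "n_orders"]
DELIVERY_FEATURES = ["avg_delay", "pct_on_time"]
MODEL_A_FEATURES = PRODUCT_FEATURES
MODEL_B_FEATURES = PRODUCT_FEATURES + DELIVERY_FEATURES


def above_median_target(pct_bad):
    """1 if a product's bad-review rate is strictly above the median, else 0."""
    cutoff = median(pct_bad)
    return [1 if v > cutoff else 0 for v in pct_bad]


def stratified_split(labels, test_frac=0.2, seed=42):
    """Split indices so each class keeps (about) the same train/test proportion."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```

For example, with bad-review rates `[0, 0, 0, 0, 0, 0, 10, 20, 30, 40]` the median is 0, so only 4 of 10 products are positive.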

Results

Model B Performance (RF, Product + Delivery features)
Metric     Model B (+ Delivery)
Accuracy                  0.732
Precision                 0.775
Recall                    0.649
F1                        0.706
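The reported metrics can be cross-checked against the standard definitions. The sketch below computes them from confusion-matrix counts and confirms that the reported F1 is the harmonic mean of the reported precision and recall. Rearranging the definitions also gives errors per true positive, FP/TP = 1/precision - 1 and FN/TP = 1/recall - 1, which is how the relative size of false alarms and misses can be read off these two numbers.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Cross-check of the reported Model B (RF) values
precision, recall = 0.775, 0.649
f1_from_reported = 2 * precision * recall / (precision + recall)  # about 0.706
fp_per_tp = 1 / precision - 1  # false alarms per true positive, about 0.29
fn_per_tp = 1 / recall - 1     # missed problem products per true positive, about 0.54
```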
Does Delivery Data Improve AUC?
Metric       Product Only (A)  + Delivery (B)  Δ (B - A)
RF AUC                  0.689           0.783    +0.0940
XGBoost AUC             0.714           0.799    +0.0854

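Delivery features improve prediction: AUC rises by about 0.09 for both models (+0.094 for RF, +0.085 for XGBoost), far above the 0.01 improvement threshold. Logistics data therefore adds value beyond knowing the product type and price; delivery delay and on-time rate carry real signal about quality problems. Because delivery outcomes are known only after a product ships, Model B is a diagnostic model rather than a pre-shipment predictor.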
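AUC has a direct interpretation that makes these gains concrete: it is the probability that a randomly chosen above-median product receives a higher score than a randomly chosen below-median one, with ties counting one half. An RF AUC of 0.783 for Model B therefore means such a pair is ranked correctly about 78% of the time, versus about 69% for Model A. A minimal pairwise sketch of that definition follows (quadratic in the number of products; rank-based formulas are faster on large sets). `roc_auc` and `delta_auc` are illustrative helpers, not the report's code.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def delta_auc(scores_a, scores_b, labels):
    """AUC gain of model B over model A on the same labels."""
    return roc_auc(scores_b, labels) - roc_auc(scores_a, labels)


labels = [0, 0, 1, 1]
scores_b = [0.1, 0.4, 0.35, 0.8]  # toy scores from a "model B"
scores_a = [0.5, 0.5, 0.5, 0.5]   # an uninformative "model A"
gain = delta_auc(scores_a, scores_b, labels)
```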

Feature Importance — XGBoost Model B (P9)

Interpretation: The feature importance plot shows which factors most strongly predict bad reviews. If delivery metrics (avg_delay, pct_on_time) rank high, logistics quality is diagnostically important. If product attributes (price, weight, freight ratio) dominate, the product itself drives dissatisfaction. Category dummies ranking lower suggests the effect is driven by continuous attributes, not category-level patterns.

ROC Curves — Product Only vs Product + Delivery (P10)

Interpretation: The ROC curves compare Model A (product only) with Model B (product + delivery). The gap between the curves shows how much predictive power delivery data adds: a narrow gap would mean product attributes tell the whole story, while a wide gap means logistics information is essential for identifying quality problems. With AUC rising from 0.689 to 0.783 (RF) and from 0.714 to 0.799 (XGBoost), the gap here is substantial.
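An ROC curve is traced by lowering the decision threshold from the highest score to the lowest and recording the false positive rate and true positive rate at each step; the area under the resulting curve (trapezoid rule) equals the pairwise AUC. A minimal sketch with hypothetical helper names:

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from sweeping the threshold down through every distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


curve = roc_points([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
area = trapezoid_area(curve)
```

Plotting two such curves on the same axes, one per feature set, reproduces the kind of comparison shown in this figure.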

Confusion Matrix — XGBoost (P11)

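Interpretation: The confusion matrix shows how well the best model (XGBoost with delivery features, which has the highest AUC at 0.799) separates products with above-median bad-review rates from those below the median. False negatives (missed problem products) are the main cost of predicting quality from product and delivery attributes alone. The Random Forest metrics above point the same way: recall (0.649) is lower than precision (0.775), so the model misses more problem products than it falsely flags.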


Report generated on 2026-06-23 · Data: Olist Brazilian E-Commerce Public Dataset (2016-2018) · 11 visualizations · Day 2: Product Quality Clustering