Customer Satisfaction: Predicting the Silent Customer

Executive Summary
1. Data & Methodology
2. Model Performance
3. What Drives Customer Sentiment?
4. Recommendations for Olist

Executive Summary

Many customers on the Olist marketplace do not leave a written review or score. To understand the operational drivers of customer satisfaction—and proactively identify at-risk “silent” customers—we developed an XGBoost machine learning model.

The model was trained on customers who did leave reviews, learning how the logistics of each order relate to the review score the customer eventually gave. We grouped review scores into three actionable classes:

Mad (Review Score 1-2)
Neutral (Review Score 3)
Happy (Review Score 4-5)
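Here is a minimal sketch of this labeling step. It assumes review scores are integers from 1 to 5. The function name `score_to_sentiment` and the integer class order are hypothetical, chosen for illustration.

```python
def score_to_sentiment(score: int) -> str:
    """Map a 1-5 review score to the three sentiment classes used in this report."""
    if score not in range(1, 6):
        raise ValueError(f"review score must be 1-5, got {score!r}")
    if score <= 2:
        return "Mad"
    if score == 3:
        return "Neutral"
    return "Happy"


# Integer codes for a multi-class model (this ordering is an assumption).
CLASS_TO_ID = {"Mad": 0, "Neutral": 1, "Happy": 2}

scores = [5, 1, 3, 4, 2]
labels = [score_to_sentiment(s) for s in scores]
print(labels)                                # ['Happy', 'Mad', 'Neutral', 'Happy', 'Mad']
print([CLASS_TO_ID[lab] for lab in labels])  # [2, 0, 1, 2, 0]
```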

This report details the operational drivers of these sentiments.

1. Data & Methodology

Instead of relying on review text, which silent customers by definition do not provide, we engineered features that describe the Customer Journey. These features are consistent with the methodology of our previous clustering analyses:

Logistics: Delivery time (days), delay vs. estimate, and an “is late” indicator.
Financials: Total order value and freight value.
Complexity: Number of items, unique sellers, and product categories.
Payments: Number of payment installments.
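Below is a minimal sketch of how these features could be computed for a single order. It assumes the column names of the public Olist dataset, already joined into plain dicts: `order_purchase_timestamp`, `order_delivered_customer_date` and `order_estimated_delivery_date`, plus per-item `price`, `freight_value`, `seller_id` and `product_category_name`. Several details are assumptions rather than the report's confirmed definitions:

- the function name `order_features`;
- treating total order value as the sum of item prices;
- using a `log1p` transform to produce `log_freight_value`.

```python
import math
from datetime import datetime


def order_features(order: dict, items: list, n_installments: int) -> dict:
    """Build the journey features for one delivered order (field names assumed)."""
    purchased = order["order_purchase_timestamp"]
    delivered = order["order_delivered_customer_date"]
    estimated = order["order_estimated_delivery_date"]

    # Logistics: elapsed delivery time and delay vs. the promised date, in days.
    delivery_time_days = (delivered - purchased).total_seconds() / 86400
    delay_days = (delivered - estimated).total_seconds() / 86400
    freight = sum(i["freight_value"] for i in items)

    return {
        "delivery_time_days": delivery_time_days,
        "delay_days": delay_days,
        "is_late": int(delay_days > 0),
        # Financials (total = sum of item prices is an assumption)
        "total_order_value": sum(i["price"] for i in items),
        "freight_value": freight,
        "log_freight_value": math.log1p(freight),  # log1p transform is an assumption
        # Complexity
        "n_items": len(items),
        "n_sellers": len({i["seller_id"] for i in items}),
        "n_categories": len({i["product_category_name"] for i in items}),
        # Payments
        "payment_installments": n_installments,
    }


order = {
    "order_purchase_timestamp": datetime(2018, 1, 1, 10, 0),
    "order_delivered_customer_date": datetime(2018, 1, 11, 10, 0),
    "order_estimated_delivery_date": datetime(2018, 1, 8, 10, 0),
}
items = [
    {"price": 50.0, "freight_value": 10.0, "seller_id": "s1", "product_category_name": "toys"},
    {"price": 30.0, "freight_value": 5.0, "seller_id": "s2", "product_category_name": "toys"},
]
feats = order_features(order, items, n_installments=3)
print(feats["delay_days"], feats["is_late"])  # 3.0 1
```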

2. Model Performance

We trained an XGBoost multi-class classification model to predict the customer’s sentiment class.
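XGBoost's multi-class objectives (`multi:softprob` and `multi:softmax`) grow one tree per class in each boosting round. They then turn the per-class raw scores into probabilities with a softmax. Below is a minimal stdlib sketch of that final step. The class order and margin values are invented for illustration and are not output from the report's model.

```python
import math

CLASSES = ["Mad", "Neutral", "Happy"]  # class order is an assumption


def softmax(margins: list) -> list:
    """Turn per-class raw scores into probabilities that sum to 1."""
    m = max(margins)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in margins]
    total = sum(exps)
    return [e / total for e in exps]


def predict_class(margins: list) -> str:
    """Argmax over the probabilities (multi:softmax returns this as a class index)."""
    probs = softmax(margins)
    return CLASSES[probs.index(max(probs))]


# Illustrative raw scores for one order (not real model output).
margins = [1.2, -0.3, 0.4]
print([round(p, 3) for p in softmax(margins)])
print(predict_class(margins))  # Mad
```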


Confusion Matrix

The confusion matrix below shows how well the model differentiates between Happy, Neutral, and Mad customers on unseen test data.

(Note: Overall performance is constrained by severe class imbalance. "Happy" customers dominate the platform, so the "Neutral" and "Mad" classes offer far fewer training examples. Even so, the model identifies the conditions that push a customer into the "Mad" category.)
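The sketch below shows why overall accuracy says little on imbalanced data. It compares accuracy with per-class recall on a toy label set, using only the stdlib; the counts are invented for illustration and are not Olist results. A baseline that always predicts "Happy" scores high accuracy yet never catches a "Mad" customer.

```python
from collections import Counter

CLASSES = ["Mad", "Neutral", "Happy"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows = actual class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in classes] for t in classes]


def per_class_recall(cm, classes=CLASSES):
    """Share of each actual class that is labeled correctly."""
    return {c: (row[i] / sum(row) if sum(row) else 0.0)
            for i, (c, row) in enumerate(zip(classes, cm))}


def accuracy(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))


# Toy imbalanced test set (illustrative counts only).
y_true = ["Happy"] * 80 + ["Neutral"] * 8 + ["Mad"] * 12
y_base = ["Happy"] * len(y_true)  # trivial majority-class baseline

cm = confusion_matrix(y_true, y_base)
print(accuracy(cm))          # 0.8
print(per_class_recall(cm))  # {'Mad': 0.0, 'Neutral': 0.0, 'Happy': 1.0}
```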

3. What Drives Customer Sentiment?

Predicting the score is only half the battle; we need to know why. We use two techniques to extract business insights from the model.

Global View: Feature Importance

Feature importance tells us which variables the model relies on most heavily across all customers.

Insight: Logistical features, specifically delivery_time_days and delay_days, are consistently the ones the model relies on most when predicting customer satisfaction.
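The report does not state which importance measure was used; XGBoost itself offers several, such as gain and weight. Permutation importance is a model-agnostic alternative that measures reliance directly: shuffle one feature and see how much accuracy drops. Below is a minimal sketch of that technique. The rule-based `toy_model` is a hypothetical stand-in for the trained classifier, and the data are synthetic.

```python
import random


def toy_model(row: dict) -> str:
    """Hypothetical stand-in for the trained classifier (not the real model)."""
    return "Mad" if row["delay_days"] > 3 else "Happy"


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, feature, n_repeats=20, seed=0):
    """Mean drop in accuracy when `feature` is shuffled across rows."""
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(base - accuracy(model, permuted, labels))
    return sum(drops) / n_repeats


# Synthetic data: delay drives the label, freight is pure noise.
data_rng = random.Random(42)
rows = [{"delay_days": data_rng.uniform(-5, 10), "freight_value": data_rng.uniform(5, 50)}
        for _ in range(200)]
labels = [toy_model(r) for r in rows]

for feat in ["delay_days", "freight_value"]:
    print(feat, round(permutation_importance(toy_model, rows, labels, feat), 3))
```

A feature the model ignores scores exactly zero, because shuffling it cannot change any prediction.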

Local View: SHAP Values

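SHAP (SHapley Additive exPlanations) values break each individual prediction into per-feature contributions. They show how much, and in which direction, each feature moved the model's output for that customer.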

How to read this plot:

Each dot represents a single customer.
The color represents the actual value of the feature (e.g., Red = High delay, Blue = Low delay).
The X-axis shows the SHAP value, i.e., the feature's impact on the model's output for the class being plotted. A positive SHAP value pushes the prediction toward that class; a negative value pushes it away.
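The additive idea behind SHAP can be shown with exact Shapley values for a tiny model. Below is a minimal sketch under two assumptions: a hypothetical linear "Mad" score, and a single baseline customer who supplies the values of "absent" features. For tree models, the shap library instead uses a faster tree-specific algorithm (TreeExplainer). The sketch checks that the baseline score plus all contributions equals the prediction, and that positive contributions push toward "Mad".

```python
from itertools import combinations
from math import factorial, isclose


def mad_score(x: dict) -> float:
    """Hypothetical raw score for the 'Mad' class (higher = more likely Mad)."""
    return (0.4 * x["delay_days"]
            + 0.05 * x["delivery_time_days"]
            + 0.3 * x["log_freight_value"])


def shapley_values(model, x: dict, baseline: dict) -> dict:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    features = list(x)
    n = len(features)

    def value(coalition: set) -> float:
        mixed = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return model(mixed)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # size of the coalition that f joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))  # marginal contribution
        phi[f] = total
    return phi


baseline = {"delay_days": -10.0, "delivery_time_days": 12.0, "log_freight_value": 2.8}
late_order = {"delay_days": 6.0, "delivery_time_days": 25.0, "log_freight_value": 3.2}

phi = shapley_values(mad_score, late_order, baseline)
print({f: round(v, 3) for f, v in phi.items()})
# Additivity: baseline score + sum of contributions = model output.
print(isclose(mad_score(baseline) + sum(phi.values()), mad_score(late_order)))  # True
```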

Key Business Takeaways:

Delivery Delay is Toxic: High values of delay_days (red dots) strongly push the model toward predicting the "Mad" class.
Speed equals Happiness: Very low delivery_time_days (fast shipping) strongly pushes the prediction toward the "Happy" class.
Freight Sensitivity: High log_freight_value has a slight effect toward lower satisfaction, suggesting that customers are sensitive to disproportionately high shipping costs.

4. Recommendations for Olist

Proactive Intervention: Run this model daily on all delivered orders that have not received a review within 3 days. If the model predicts that the customer is "Mad", trigger a customer service outreach email with a discount code.
Seller Logistics Enforcement: Since delivery speed and delays are the dominant drivers of anger, Olist must penalize sellers with consistently high carrier handover times.
Manage Expectations: If an order is predicted to be late while in transit, notify the customer immediately. Setting expectations before the delay happens may blunt the shift from "Happy" to "Mad".
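The daily job in the Proactive Intervention recommendation can be sketched as a filter followed by a prediction step. Everything here is an assumption for illustration:

- the order field names;
- exposing the 3-day wait as a parameter;
- `predict_sentiment`, a hypothetical stand-in for the trained model.

```python
from datetime import datetime, timedelta


def predict_sentiment(order: dict) -> str:
    """Hypothetical stand-in for the trained XGBoost model."""
    return "Mad" if order["delay_days"] > 3 else "Happy"


def orders_needing_outreach(orders, now, review_wait_days=3, model=predict_sentiment):
    """Delivered, still-unreviewed orders past the wait window that the model flags as Mad."""
    cutoff = now - timedelta(days=review_wait_days)
    flagged = []
    for o in orders:
        delivered = o.get("delivered_at")
        if delivered is None or o.get("has_review"):
            continue  # not delivered yet, or the customer already left a review
        if delivered > cutoff:
            continue  # still inside the review window
        if model(o) == "Mad":
            flagged.append(o["order_id"])
    return flagged


now = datetime(2018, 6, 10)
orders = [
    {"order_id": "a", "delivered_at": datetime(2018, 6, 1), "has_review": False, "delay_days": 7},
    {"order_id": "b", "delivered_at": datetime(2018, 6, 1), "has_review": True, "delay_days": 9},
    {"order_id": "c", "delivered_at": datetime(2018, 6, 9), "has_review": False, "delay_days": 8},
    {"order_id": "d", "delivered_at": datetime(2018, 6, 2), "has_review": False, "delay_days": -2},
    {"order_id": "e", "delivered_at": None, "has_review": False, "delay_days": 0},
]
print(orders_needing_outreach(orders, now))  # ['a']
```

The outreach email and discount code would then be triggered downstream for each returned order id.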