Many customers on the Olist marketplace do not leave a written review or score. To understand the operational drivers of customer satisfaction—and proactively identify at-risk “silent” customers—we developed an XGBoost machine learning model.
The model was trained on customers who did leave reviews, learning the complex relationships between the logistics of their order and their ultimate satisfaction. We categorized satisfaction into three actionable classes:
This report details the operational drivers of these sentiments.
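As a minimal sketch of how a three-class target could be derived from the 1-5 review score, the snippet below maps scores to the Happy, Neutral, and Mad labels. The cut-offs (1-2, 3, 4-5), the function name `sentiment_class`, and the integer codes are assumptions made for illustration; the report's actual boundaries may differ.

```python
def sentiment_class(review_score: int) -> str:
    """Map a 1-5 review score to a sentiment class.

    ASSUMPTION: these cut-offs are illustrative only; they are not
    taken from the report.
    """
    if not 1 <= review_score <= 5:
        raise ValueError(f"review_score must be 1-5, got {review_score}")
    if review_score <= 2:
        return "Mad"
    if review_score == 3:
        return "Neutral"
    return "Happy"


# Integer codes in the 0..n-1 form that multi-class classifiers expect
# (hypothetical ordering).
CLASS_TO_ID = {"Mad": 0, "Neutral": 1, "Happy": 2}

labels = [sentiment_class(score) for score in (1, 3, 5)]
print(labels)
```

Keeping the mapping in one small function makes it easy to test alternative boundaries without touching the rest of the pipeline.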
Instead of relying on textual data, we engineered features that describe the Customer Journey, matching methodologies from previous clustering analyses:
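A rough sketch of how the logistics features named later in this report (delivery_time_days, delay_days, log_freight_value) might be computed for a single order follows. The formulas, the argument names, and the choice of `log1p` are assumptions for illustration, not the report's confirmed definitions.

```python
import math
from datetime import datetime


def journey_features(purchased_at, delivered_at, estimated_at, freight_value):
    """Derive logistics features for one delivered order.

    ASSUMPTION: these definitions are illustrative guesses, not the
    report's actual formulas.
    """
    seconds_per_day = 86_400
    # Elapsed time from purchase to delivery.
    delivery_time_days = (delivered_at - purchased_at).total_seconds() / seconds_per_day
    # Positive = arrived after the promised date, negative = arrived early.
    delay_days = (delivered_at - estimated_at).total_seconds() / seconds_per_day
    # log1p compresses the long right tail and stays defined at zero.
    log_freight_value = math.log1p(freight_value)
    return {
        "delivery_time_days": delivery_time_days,
        "delay_days": delay_days,
        "log_freight_value": log_freight_value,
    }


# Made-up order: delivered 10 days after purchase, 3 days past the estimate.
example = journey_features(
    purchased_at=datetime(2018, 1, 1, 12, 0),
    delivered_at=datetime(2018, 1, 11, 12, 0),
    estimated_at=datetime(2018, 1, 8, 12, 0),
    freight_value=15.0,
)
print(example)
```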
We trained an XGBoost multi-class classification model to predict the customer’s sentiment class.
## Training model: XGBoost Base...
The confusion matrix below shows how well the model differentiates between Happy, Neutral, and Mad customers on unseen test data.
(Note: Overall accuracy is constrained by extreme class imbalance, since “Happy” reviews dominate the platform. However, the model successfully identifies the boundary conditions that push a customer into the “Mad” category.)
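To illustrate why overall accuracy says little under heavy imbalance, the sketch below builds a confusion matrix for a baseline that always predicts “Happy” and compares accuracy with per-class recall. The 80/10/10 class split is invented for the example and is not the platform's real distribution.

```python
from collections import Counter

CLASSES = ["Mad", "Neutral", "Happy"]


def confusion_matrix(y_true, y_pred):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in CLASSES] for t in CLASSES]


def per_class_recall(matrix):
    """Share of each true class that was predicted correctly."""
    return {
        cls: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (cls, row) in enumerate(zip(CLASSES, matrix))
    }


# Made-up labels: 80 Happy, 10 Neutral, 10 Mad.
y_true = ["Happy"] * 80 + ["Neutral"] * 10 + ["Mad"] * 10
# Degenerate baseline that predicts the majority class every time.
y_pred = ["Happy"] * 100

matrix = confusion_matrix(y_true, y_pred)
accuracy = sum(matrix[i][i] for i in range(len(CLASSES))) / len(y_true)
recall = per_class_recall(matrix)

print(accuracy)  # 0.8
print(recall)    # {'Mad': 0.0, 'Neutral': 0.0, 'Happy': 1.0}
```

The baseline reaches 80% accuracy while catching no “Mad” customers at all, which is why recall on the minority classes is the more useful number to read off the matrix.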
Predicting the score is only half the battle; we need to know why. We use two techniques to extract business insights from the model.
Feature importance tells us which variables the model relies on most heavily across all customers.
Insight: Logistical factors, specifically delivery_time_days and delay_days, are consistently the most important drivers of customer satisfaction.
SHAP (SHapley Additive exPlanations) values break down exactly how each feature impacts the prediction.
How to read this plot:
Key Business Takeaways:
High delay_days values (red dots) strongly push the model to predict the “Mad” class.
Low delivery_time_days values (fast shipping) strongly push the prediction into the “Happy” class.
log_freight_value has a slight pushing effect towards lower satisfaction, indicating customers are sensitive to disproportionately high shipping costs.