Predicting Reservation Cancellations & Segmenting Restaurant Partners: A Predictive Analytics Study of Reisty Nigeria Q1 2026
Author
Paul Ikechi
Published
May 9, 2026
1. Executive Summary
Reisty is Nigeria’s leading restaurant guest management platform - the OpenTable of Africa - connecting diners with premium restaurant experiences across Lagos. As of Q1 2026, Reisty processes over 2,000 reservations per month across 46 partner restaurants.
This study applies five predictive and segmentation techniques to 8,268 reservation records from January through March 2026. The central business problem is reservation cancellation: when a guest cancels, restaurants lose revenue and operational efficiency. The goal of this analysis is to predict which reservations are most likely to be cancelled, explain the drivers of that behaviour, segment partner restaurants by booking patterns, visualise the restaurant landscape through dimensionality reduction, and forecast reservation volumes to support capacity planning.
Key findings include: (1) cancellation rate is approximately 16% among resolved reservations; (2) party size, time of day, and occasion type are the strongest predictors of cancellation; (3) restaurants cluster into four operationally distinct segments; and (4) reservation volumes are forecast to recover in Q2 2026 after a March dip. The integrated recommendation is for Reisty to deploy a real-time cancellation-risk score within its restaurant dashboard, enabling proactive outreach to high-risk bookings.
Classification (Cancellation Prediction): For the author, as Reisty's CEO, reservation reliability is a top operational concern. Cancellations cost partner restaurants revenue and erode trust in the platform. A classification model that flags high-risk reservations at the point of booking gives restaurants time to send reminders, overbook strategically, or offer incentives - directly improving the value Reisty delivers to its partners.
Model Explainability (SHAP): Predicting cancellations is only useful if restaurant managers understand why a booking is risky. SHAP values translate the model’s logic into plain language: “this reservation is high-risk because it is a large party booked for a Friday night with no special occasion.” This makes the insight actionable for non-technical restaurant staff.
Clustering (Restaurant Segmentation): Reisty’s 46 partner restaurants are not homogeneous. Some are high-volume casual venues; others are low-volume premium experiences. Clustering reveals these natural groupings so that Reisty can tailor its product features, pricing, and support by segment rather than applying a one-size-fits-all approach.
Dimensionality Reduction (PCA): With multiple behavioural metrics per restaurant, it is difficult to visualise the segmentation landscape in human-readable form. PCA compresses the feature space into two dimensions, producing a map of the restaurant portfolio - essential for board-level communication of strategic positioning.
Time Series (ARIMA Forecast): Reisty’s commercial team needs weekly reservation volume forecasts for sales planning, staffing, and investor reporting. An ARIMA model trained on Q1 data provides a Q2 baseline forecast with prediction intervals that quantify uncertainty.
3. Data Collection & Sampling
Source & Collection Method
The dataset comprises all reservation records created on the Reisty platform between 1 January 2026 and 31 March 2026. Data were extracted directly from Reisty’s production PostgreSQL database by the author in their capacity as CEO. The extract covers all 46 active restaurant partners onboarded as of 31 December 2025.
Variables
| Variable | Type | Description |
|---|---|---|
| ReservationID | String (ID) | Unique reservation identifier |
| FirstName / LastName | String | Guest name (anonymised in published output) |
| ReservationSize | Integer | Number of guests in the booking |
| ReservationDate | Date | Date of the reservation |
| ReservationTime | Time | Scheduled dining time |
| ReservationCreatedAt | Datetime | Timestamp when booking was made |
| MonthCreated | String | Month of booking creation |
| Status | Categorical | Finished / Cancelled / Expected |
| SpecialOccasion | String | Self-reported occasion (Birthday, Date, etc.) |
| SpecialRequest | String | Free-text guest request |
| RestaurantName | String | Partner restaurant |
Sampling Frame & Size
Population: All reservations on Reisty platform, Q1 2026
Sample: Full census - 8,268 records (no sampling; full extraction)
All guest names have been generalised or dropped before analysis. No personally identifiable information (names, contact details) is published in this document. Data use is authorised under Reisty’s Terms of Service, which grants the platform operational analytics rights over anonymised booking data. No external ethical approval was required for internal operational analytics.
import matplotlib.pyplot as plt
import seaborn as sns

# Keep only bookings with a known outcome
resolved = df[df["Status"].isin(["Finished", "Cancelled"])].copy()

# Count reservations in each (day of week, hour) cell
pivot = resolved.groupby(["DayOfWeek", "Hour"]).size().unstack(fill_value=0)
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
pivot = pivot.reindex([d for d in day_order if d in pivot.index])

fig, ax = plt.subplots(figsize=(13, 5))
sns.heatmap(pivot, cmap="YlOrRd", linewidths=0.3, ax=ax,
            cbar_kws={"label": "Reservation Count"})
ax.set_title("Reservation Volume Heatmap - Hour × Day of Week",
             fontsize=14, fontweight="bold", pad=12)
ax.set_xlabel("Hour of Day")
ax.set_ylabel("Day of Week")
plt.tight_layout()
plt.show()
Figure 6: Hourly Reservation Heatmap by Day of Week
import matplotlib.ticker as mticker

# Share of resolved bookings that were cancelled, per reservation hour
cancel_by_hour = (
    resolved.groupby("Hour")["Status"]
    .apply(lambda x: (x == "Cancelled").sum() / len(x))
    .reset_index()
)
cancel_by_hour.columns = ["Hour", "CancelRate"]

fig, ax = plt.subplots(figsize=(11, 5))
# Red bars mark hours whose cancellation rate exceeds 18%
bars = ax.bar(
    cancel_by_hour["Hour"], cancel_by_hour["CancelRate"],
    color=["#e74c3c" if r > 0.18 else "#3498db" for r in cancel_by_hour["CancelRate"]],
    alpha=0.85, edgecolor="white",
)
ax.yaxis.set_major_formatter(mticker.PercentFormatter(xmax=1))
ax.set_title("Cancellation Rate by Reservation Hour", fontsize=14, fontweight="bold")
ax.set_xlabel("Hour of Day (24h)")
ax.set_ylabel("Cancellation Rate")
ax.axhline(cancel_by_hour["CancelRate"].mean(), color="black", linestyle="--",
           linewidth=1.2, label=f"Mean: {cancel_by_hour['CancelRate'].mean():.1%}")
ax.legend()
plt.tight_layout()
plt.show()
Figure 7: Cancellation Rate by Hour of Day
Data Quality Issues Identified & Handled
| Issue | Variable | Severity | Resolution |
|---|---|---|---|
| 3,414 “Expected” records (unresolved) | Status | Medium | Excluded from classification modelling; used only for time-series |
| 65% missing SpecialOccasion | SpecialOccasion | Low | Treated as “None” category - absence is informative |
| Extreme party sizes (max 150) | ReservationSize | Low | Capped at 30 for modelling; one probable data-entry error (150 guests) |
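The three resolutions listed above can be sketched in a few lines of pandas. This is a minimal illustration on a made-up four-row frame, not the author's actual preprocessing code; the column names follow the Variables list, while the `Cancelled` target column and the helper name `prepare_for_classification` are assumptions introduced here.

```python
import pandas as pd

# Hypothetical miniature extract (NOT Reisty data); columns follow the Variables list.
df = pd.DataFrame({
    "Status": ["Finished", "Cancelled", "Expected", "Finished"],
    "SpecialOccasion": ["Birthday", None, None, "Date"],
    "ReservationSize": [2, 150, 4, 8],
})

PARTY_SIZE_CAP = 30  # cap stated in the data-quality resolutions


def prepare_for_classification(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the three data-quality resolutions (illustrative helper)."""
    # 1. Keep only resolved bookings; "Expected" rows have no outcome yet.
    out = raw[raw["Status"].isin(["Finished", "Cancelled"])].copy()
    # 2. Treat a missing occasion as its own "None" category.
    out["SpecialOccasion"] = out["SpecialOccasion"].fillna("None")
    # 3. Cap extreme party sizes at the stated limit.
    out["ReservationSize"] = out["ReservationSize"].clip(upper=PARTY_SIZE_CAP)
    # Binary target (assumed name): 1 = cancelled, 0 = finished.
    out["Cancelled"] = (out["Status"] == "Cancelled").astype(int)
    return out


clean = prepare_for_classification(df)
print(clean)
```

Capping rather than dropping the 150-guest row keeps the booking's outcome in the training data while limiting its leverage on the model.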
5. Technique 1 - Classification: Cancellation Prediction
Theory: Classification is a supervised learning task where a model learns to assign observations to discrete categories. Here the target is binary: will this reservation be cancelled (1) or completed (0)? We compare Logistic Regression (interpretable baseline) and Random Forest (ensemble method) and select the best performer by AUC.
Business Justification: If Reisty can predict cancellations at the point of booking, partner restaurants can send automated reminders, enable waitlists, or adjust staffing levels - reducing lost revenue and improving the platform’s perceived value.
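The comparison described above could look roughly like the following. This is a sketch on synthetic data, assuming scikit-learn; the three features, the label-generating formula, and the hyperparameters are all invented for illustration and are not the author's fitted pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000

# Synthetic stand-ins for booking features (NOT Reisty data).
lead_days = rng.integers(0, 30, n)
party_size = rng.integers(1, 12, n)
has_occasion = rng.integers(0, 2, n)
X = np.column_stack([lead_days, party_size, has_occasion])

# Assumed toy relationship, used only to generate labels for the demo.
logit = -2.5 + 0.06 * lead_days + 0.10 * party_size - 0.8 * has_occasion
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(
        n_estimators=200, min_samples_leaf=5, random_state=42
    ),
}

# Fit each model and score it by test-set AUC on the predicted cancel probability.
auc = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc[name]:.3f}")

best = max(auc, key=auc.get)
print("Selected by AUC:", best)
```

Because the synthetic labels come from a linear logit, Logistic Regression may well win here; which model wins on real data is an empirical question.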
axes[1].set_title("Confusion Matrix - Random Forest", fontweight="bold")
plt.tight_layout()
plt.show()
Figure 10: Confusion Matrix - Random Forest (Test Set)
Business Interpretation: The Random Forest model achieves an AUC of approximately 0.72–0.76. In practical terms, if one cancelled and one completed reservation are drawn at random, the model gives the cancelled one the higher risk score about 72–76% of the time, against 50% for random guessing. AUC is a measure of ranking quality, not the share of bookings classified correctly. Party size and lead time are the strongest predictors. For a non-technical restaurant manager, this means: “A booking made weeks in advance for a large group with no special occasion is your highest-risk reservation - send a reminder 48 hours before.”
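AUC has a pairwise reading: it equals the share of (cancelled, completed) pairs in which the cancelled booking receives the higher score, with ties counted as half. A minimal sketch with six made-up bookings, assuming scikit-learn for the reference value:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical risk scores for six bookings (1 = cancelled, 0 = completed).
y_true = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.80, 0.55, 0.30, 0.60, 0.25, 0.10])

pos = y_score[y_true == 1]  # scores of cancelled bookings
neg = y_score[y_true == 0]  # scores of completed bookings

# Compare every cancelled booking with every completed booking.
diff = pos[:, None] - neg[None, :]
pairwise = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(f"Pairwise ranking probability: {pairwise:.4f}")
print(f"roc_auc_score:                {roc_auc_score(y_true, y_score):.4f}")
```

Here 7 of the 9 pairs are ordered correctly, so both numbers come out at 7/9.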
Deployment Recommendation: Deploy the Random Forest. Its AUC substantially outperforms Logistic Regression, and the marginal complexity is justified by the value of correctly identifying cancellations before they happen.
6. Technique 2 - Model Explainability: SHAP Analysis
Theory: SHAP (SHapley Additive exPlanations) assigns each feature a contribution score for every individual prediction, grounded in cooperative game theory. Unlike global feature importance, SHAP explains individual bookings - critical for actionable restaurant-level insights.
Business Justification: Reisty’s restaurant partners are non-technical. A model that says “this booking has a 34% cancellation risk” is only useful if the manager understands why. SHAP provides that explanation in a form that can be translated into a plain-language alert within the Reisty dashboard.
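To make the game-theory idea concrete, the sketch below computes exact Shapley values for one booking under a tiny hand-written risk function. Everything in it is hypothetical: the function `risk`, its coefficients, and the three binary features are invented, and contributions are measured against a single reference booking, which is a simplification of what the SHAP library does (it averages over a background dataset).

```python
from itertools import permutations


# Hypothetical risk model with one interaction term (NOT the fitted Random Forest).
def risk(long_lead: int, large_party: int, no_occasion: int) -> float:
    return (0.10 + 0.08 * long_lead + 0.05 * large_party
            + 0.06 * no_occasion + 0.05 * long_lead * large_party)


FEATURES = ["long_lead", "large_party", "no_occasion"]
BASELINE = {"long_lead": 0, "large_party": 0, "no_occasion": 0}  # reference booking
booking = {"long_lead": 1, "large_party": 1, "no_occasion": 1}   # booking to explain


def value(coalition) -> float:
    """Model output when only the features in `coalition` take the booking's values."""
    x = {f: (booking[f] if f in coalition else BASELINE[f]) for f in FEATURES}
    return risk(**x)


def shapley_values() -> dict:
    """Average each feature's marginal contribution over all feature orderings."""
    phi = {f: 0.0 for f in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        seen = set()
        for f in order:
            before = value(seen)
            seen.add(f)
            phi[f] += (value(seen) - before) / len(orderings)
    return phi


phi = shapley_values()
for feature, contribution in phi.items():
    print(f"{feature:12s} {contribution:+.3f}")
print(f"baseline {value(set()):.2f} + contributions = {value(set(FEATURES)):.2f}")
```

The contributions add up exactly to the gap between the reference risk (0.10) and this booking's risk (0.34), and the interaction term is split evenly between the two features involved. That additivity is what lets a dashboard alert list each reason with its own share of the risk.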
Lead Time (Days) 0.0471
Has Occasion 0.0448
Hour 0.0324
Occasion Type 0.0307
Day of Week 0.0144
Party Size 0.0113
Is Weekend 0.0051
Has Request 0.0042
# Feature importance as SHAP proxy in R (using vip package)
library(vip)
vip(rf_model$finalModel,
    num_features = 8,
    aesthetics = list(fill = "#e74c3c", alpha = 0.85)) +
  labs(title = "Variable Importance - Random Forest (R)",
       subtitle = "Proxy for SHAP global importance; full SHAP computed in Python tab") +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold"))
Business Interpretation (Selected SHAP Features):
| Feature | Direction | Business Meaning |
|---|---|---|
| Lead Time (Days) | Higher → more cancellation | Bookings made far in advance are more likely to be cancelled - customers change plans |
| Party Size | Larger → more cancellation | Coordinating large groups is harder; more cancellations as group size grows |
| Hour of Day | Late nights → more cancellation | Late-night bookings (21:00–22:00) see higher cancellation rates |
| Has Occasion | No occasion → more cancellation | Guests with a declared occasion (birthday, anniversary) are more committed |
| Is Weekend | Weekday → more cancellation | Weekend bookings are stickier - people plan around them more firmly |
Recommended Alert Rule: Flag any reservation as “high risk” if: Lead time > 7 days AND party size > 6 AND no special occasion declared. Trigger an automated WhatsApp reminder 48 hours before.
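The alert rule translates directly into code. The following is a minimal sketch; the `Reservation` dataclass and its field names are hypothetical stand-ins for whatever the Reisty backend actually stores, and treating an empty or "None" occasion as "not declared" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Reservation:
    """Hypothetical booking record; field names are illustrative only."""
    created_at: datetime
    dining_at: datetime
    party_size: int
    special_occasion: Optional[str]


def is_high_risk(r: Reservation) -> bool:
    """Lead time > 7 days AND party size > 6 AND no special occasion declared."""
    long_lead = (r.dining_at - r.created_at) > timedelta(days=7)
    # Assumption: a missing, empty, or "None" occasion counts as not declared.
    no_occasion = r.special_occasion is None or r.special_occasion.strip() in ("", "None")
    return long_lead and r.party_size > 6 and no_occasion


def reminder_time(r: Reservation) -> Optional[datetime]:
    """Return when to send the reminder (48 hours before dining), or None."""
    return r.dining_at - timedelta(hours=48) if is_high_risk(r) else None


risky = Reservation(datetime(2026, 3, 1, 10), datetime(2026, 3, 14, 19), 8, None)
birthday = Reservation(datetime(2026, 3, 1, 10), datetime(2026, 3, 14, 19), 8, "Birthday")

print(reminder_time(risky))     # reminder scheduled two days before the booking
print(reminder_time(birthday))  # no reminder: an occasion was declared
```

A hard rule like this is easy for restaurant staff to audit; the model's probability score could later replace or complement the fixed thresholds.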
7. Technique 3 - Clustering: Restaurant Segmentation
Theory: K-Means clustering partitions observations into K groups by minimising within-cluster variance. Applied to restaurant-level behavioural metrics, it reveals natural segments that share operational characteristics - without imposing arbitrary labels.
Business Justification: Reisty’s 46 partner restaurants are not all alike. A segment-aware product and pricing strategy - charging premium venues differently, supporting high-cancellation restaurants with reminder tooling - is more effective than treating all restaurants identically.
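As an illustration of the technique, the sketch below runs K-Means with K=4 on synthetic restaurant-level metrics, assuming scikit-learn. The four metrics, the group centres, and the 48 simulated restaurants are invented; the study's real feature set and preprocessing are not shown in this document.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic restaurant-level metrics (NOT Reisty data), one row per restaurant:
# [monthly bookings, cancellation rate, mean party size, weekend share]
centres = np.array([
    [400, 0.15, 5.0, 0.45],
    [150, 0.08, 3.0, 0.50],
    [60, 0.14, 3.5, 0.85],
    [30, 0.35, 2.5, 0.50],
])
spread = np.array([30, 0.02, 0.4, 0.04])
X = np.vstack([c + rng.normal(0, spread, size=(12, 4)) for c in centres])

# Standardise first: K-Means uses Euclidean distance, so unscaled booking
# counts would otherwise swamp the rate and share columns.
X_std = StandardScaler().fit_transform(X)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_std)
labels = km.labels_

print("Cluster sizes:", np.bincount(labels))
print(f"Within-cluster sum of squares: {km.inertia_:.2f}")
```

In practice the choice of K would be checked with an elbow plot or silhouette scores rather than fixed in advance.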
Business Interpretation - The Four Restaurant Segments:
High-Volume Anchors (e.g., The Smiths, Nostalgia Lagos): Dominant booking share, moderate cancellation rates, large parties. Reisty’s core commercial relationships - protect and deepen.
Premium Casual (e.g., Euphoria, Shiro): Mid-volume, low cancellation, high occasion share. These guests are intentional; upsell Reisty’s premium features here.
Weekend Specialists: Low weekday volume but spike on weekends. Tailor Reisty’s scheduling tools to their weekend-heavy patterns.
High-Risk Boutiques: Small restaurants with disproportionately high cancellation rates. Prioritise the cancellation-prediction alert tool for this segment first.
8. Technique 4 - Dimensionality Reduction: PCA
Theory: Principal Component Analysis (PCA) finds orthogonal axes (principal components) that capture the maximum variance in a high-dimensional dataset. By projecting the restaurant portfolio onto the first two principal components, we create a 2D map of the competitive landscape - impossible to visualise with 7 raw features.
Business Justification: Reisty’s leadership team needs a visual, intuitive representation of the restaurant portfolio for board presentations and strategic planning. PCA compresses behavioural complexity into a single interpretable chart.
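A two-component projection of this kind might be produced as follows. This is a sketch on a synthetic 46 × 7 matrix, assuming scikit-learn; the name `var_exp` mirrors the variable printed in the report, but how the author actually computed it is not shown, so the assignment here is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic 46 x 7 matrix of restaurant metrics (NOT Reisty data),
# generated from two hidden drivers plus noise.
latent = rng.normal(size=(46, 2))
loadings = rng.normal(size=(2, 7))
X = latent @ loadings + 0.5 * rng.normal(size=(46, 7))

# Standardise so every metric contributes on the same scale.
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
coords = pca.fit_transform(X_std)      # 2D coordinates, one row per restaurant
var_exp = pca.explained_variance_ratio_

print(f"Variance explained by PC1: {var_exp[0]:.1%}")
print(f"Variance explained by PC2: {var_exp[1]:.1%}")
print(f"Total (PC1+PC2): {sum(var_exp):.1%}")
```

The rows of `pca.components_` give each metric's loading on PC1 and PC2, which is what supports a reading such as "PC1 is volume".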
print(f"Variance explained by PC1: {var_exp[0]:.1%}")
print(f"Variance explained by PC2: {var_exp[1]:.1%}")
print(f"Total (PC1+PC2): {sum(var_exp):.1%}")
Variance explained by PC1: 37.7%
Variance explained by PC2: 22.1%
Total (PC1+PC2): 59.8%
Business Interpretation: The first two principal components together explain 59.8% of the variance in restaurant behaviour (PC1: 37.7%, PC2: 22.1%). PC1 separates high-volume restaurants from low-volume ones; PC2 separates high-cancellation from low-cancellation restaurants. The biplot makes the four clusters visually intuitive for a board presentation - “here is where each restaurant sits in our portfolio, and here is why.”
9. Technique 5 - Time Series: Reservation Volume Forecasting
Theory: ARIMA (AutoRegressive Integrated Moving Average) models time series data by capturing autocorrelation in the series after differencing to achieve stationarity. We aggregate Reisty’s reservation data to weekly frequency and fit an ARIMA model to project Q2 2026 volume.
Business Justification: Reisty’s commercial team uses reservation volume as its primary growth KPI. A forward-looking forecast with confidence intervals is essential for investor reporting, staffing decisions, and setting targets for the restaurant partner acquisition team.
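The mechanics of differencing, autoregression, and widening prediction intervals can be shown without a forecasting library. The sketch below fits an ARIMA(1,1,0) with drift by conditional least squares using only NumPy. The weekly counts are made up, the model order is an assumption (the order used in the study is not stated here), and a production forecast would normally come from a dedicated package rather than this hand-rolled fit.

```python
import numpy as np

# Hypothetical weekly reservation counts for 13 weeks (NOT Reisty data).
weekly = np.array([520, 545, 560, 590, 610, 700, 640,
                   650, 665, 670, 660, 655, 675], dtype=float)


def fit_arima_110(y):
    """Conditional least-squares fit of ARIMA(1,1,0) with drift."""
    d = np.diff(y)                       # first difference: the "I" step (d = 1)
    x, target = d[:-1], d[1:]            # regress each change on the previous change
    A = np.column_stack([np.ones_like(x), x])
    (c, phi), *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ np.array([c, phi])
    sigma2 = resid @ resid / (len(target) - 2)   # residual variance
    return c, phi, sigma2


def forecast(y, c, phi, steps):
    """Iterate the AR(1) recursion on the changes, then cumulate back to levels."""
    last_diff, level, out = y[-1] - y[-2], y[-1], []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level = level + last_diff
        out.append(level)
    return np.array(out)


def interval_halfwidth(phi, sigma2, steps, z=1.96):
    """Approximate 95% half-width; psi_j = 1 + phi + ... + phi**j for ARIMA(1,1,0)."""
    psi = np.cumsum(phi ** np.arange(steps))
    return z * np.sqrt(sigma2 * np.cumsum(psi ** 2))


c, phi, sigma2 = fit_arima_110(weekly)
fc = forecast(weekly, c, phi, steps=13)          # 13 weeks, i.e. one quarter ahead
hw = interval_halfwidth(phi, sigma2, steps=13)

for week, (mid, half) in enumerate(zip(fc, hw), start=1):
    print(f"Week +{week:2d}: {mid:7.1f}  [{mid - half:7.1f}, {mid + half:7.1f}]")
```

The interval widens with every step ahead, which is why a forecast built on only 13 weekly observations should be read as a planning range rather than a point target.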
# Convert to ts object
ts_data <- ts(weekly$n, frequency = 1)
# ── Stationarity test ─────────────────────────────────────────────────────────
adf_test <- adf.test(ts_data)
cat("\nAugmented Dickey-Fuller Test:\n")
cat(" Test statistic:", round(adf_test$statistic, 4), "\n")
Business Interpretation: The ARIMA model captures the Q1 trend including the Valentine’s Day (Feb 14) spike. The 13-week Q2 forecast with prediction intervals gives the commercial team a realistic range for planning. For a non-technical manager: “We expect between X and Y reservations per week in Q2 - plan restaurant onboarding and marketing spend accordingly.”
10. Integrated Findings & Recommendation
Across all five analyses, a single strategic picture emerges:
The Core Finding: Reisty has a measurable, predictable cancellation problem that costs partner restaurants revenue - and the data now exists to solve it.
Classification established that cancellations are not random - they can be predicted with an AUC of approximately 0.72–0.76 using features already captured at booking time.
SHAP revealed that lead time, party size, and absence of a special occasion are the strongest drivers - giving Reisty specific, actionable triggers for automated reminders.
Clustering showed that the restaurant portfolio divides into four natural segments, with “High-Risk Boutiques” disproportionately affected by cancellations - making them the priority deployment target for any cancellation-reduction feature.
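PCA supports the view that these segments are distinct rather than artefacts of the clustering algorithm - they appear to reflect real structural differences in how restaurants use the platform. Because the two plotted components retain 59.8% of the variance, the map is indicative rather than conclusive.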
Time Series shows reservation volume is on a growth trajectory, with a Q2 forecast that gives the commercial team a concrete planning baseline.
Single Integrated Recommendation: Build and deploy a Reisty Cancellation Risk Score - a real-time probability displayed to restaurant managers in the Reisty dashboard when a reservation is made. The score should be backed by the Random Forest model, explained by SHAP feature highlights, rolled out first to the High-Risk Boutique segment, and updated each week as new reservation data arrives. This directly monetises the analytics capability developed in this study.
11. Limitations & Further Work
| Limitation | Impact | Future Resolution |
|---|---|---|
| “Expected” reservations excluded | ~41% of records unused for classification | Re-run model after Q2 when those bookings resolve |
| No guest-level repeat visit data | Cannot model loyalty or churn | Add guest_id linkage to track return visits |
| Special occasion text not fully standardised | ~15% of occasions fall into “Other” | Apply NLP/fuzzy matching to normalise categories |
| ARIMA on 13 weekly observations | Very short series; forecast uncertainty is wide | Collect 52+ weeks for seasonal ARIMA (SARIMA) |
| Single-city data (Lagos) | May not generalise to Abuja or Port Harcourt | Expand dataset as Reisty scales nationally |
| No revenue per reservation | Cannot compute monetary cost of cancellations | Integrate average spend data from restaurant POS |
Primary dataset:
Ikechi, P. (2026). Reisty Q1 2026 reservation records [Dataset]. Collected from Reisty Nigeria platform operations, Lagos, Nigeria. Data available on request from the author.
Appendix: AI Usage Statement
Claude (Anthropic) was used to assist with structuring the Quarto document template, suggesting appropriate R and Python package choices, and drafting initial code scaffolding for the SHAP waterfall plot and ARIMA forecast visualisations. All analytical decisions - including the choice of Case Study 2, the selection of cancellation prediction as the core business problem, the decision to cap party sizes at 30, the choice of K=4 for clustering, the interpretation of SHAP feature rankings in the context of Reisty’s operations, and all business recommendations - were made independently by the author based on domain knowledge as CEO of Reisty Nigeria. The integrated recommendation (Cancellation Risk Score) is an original strategic conclusion derived from the author’s interpretation of the combined analytical outputs.