The dataset is imported from the specified file path, and an output directory is created to store results if it does not already exist. This ensures reproducibility and organized storage of analysis outputs.
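A minimal sketch of this setup step in base R is shown below. The file name, the directory name, and the two-row stand-in file are assumptions made so the example runs on its own; the report does not state the actual paths.

```r
# Hypothetical locations; the real paths are not given in the report.
data_path  <- file.path(tempdir(), "orders.csv")
output_dir <- file.path(tempdir(), "outputs")

# Write a tiny stand-in file so the sketch runs end to end.
write.csv(
  data.frame(Order_ID = c("O1", "O2"), Order_Amount = c(410.5, 207.0)),
  data_path,
  row.names = FALSE
)

# Create the output directory only if it does not already exist.
if (!dir.exists(output_dir)) {
  dir.create(output_dir, recursive = TRUE)
}

raw <- read.csv(data_path, stringsAsFactors = FALSE)
```

Guarding `dir.create()` with `dir.exists()` keeps the step safe to re-run, which is what makes the workflow repeatable.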
The raw dataset is read. Column names are standardized to lowercase for consistency. A copy of the dataset is kept as a backup, and missing values across all variables are checked to assess data quality before processing.
## order_id customer_id order_date order_amount
## 0 0 0 0
## city cuisine_type delivery_time vendor_id
## 0 0 0 0
## rating is_repeat_customer
## 0 0
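The cleaning step that produces a per-column count like the one above can be sketched as follows. The toy data frame is invented for illustration and deliberately contains missing values, unlike the real dataset, where every count is zero.

```r
# Toy data with mixed-case names and two missing values (illustrative only).
raw <- data.frame(
  Order_ID     = c("O1", "O2", "O3"),
  Order_Amount = c(410.5, NA, 282.0),
  City         = c("Delhi", "Mumbai", NA),
  stringsAsFactors = FALSE
)

# Standardize column names to lowercase.
names(raw) <- tolower(names(raw))

# Keep an untouched copy before any further processing.
backup <- raw

# Count missing values per column.
missing_counts <- colSums(is.na(raw))
print(missing_counts)
```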
Key data transformations are applied to make the dataset analysis-ready. Dates and times are converted to proper date-time formats, and new variables such as day of the week, hour of order, weekend flag, and time slot are derived from them. Numeric variables such as order amount, delivery time, and rating are standardized, and missing repeat-customer flags are imputed. The resulting structured dataset is stored for further analysis.
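The derived time variables can be sketched in base R as below. Several details are assumptions rather than facts from the report: the input column layout, the use of 0 to fill missing repeat flags, and the hour range for the "Late Evening" slot, whose label is truncated in the printed output (hours 20 to 23 are simply what remains after the other slots).

```r
orders <- data.frame(
  order_date = c("2025-06-26", "2025-06-28", "2025-06-29", "2025-06-30"),
  order_time = c("21:05", "02:43", "15:40", "08:13"),
  is_repeat_customer = c(1, NA, 0, 1),
  stringsAsFactors = FALSE
)

# Combine date and time into a single date-time value.
orders$order_datetime <- as.POSIXct(
  paste(orders$order_date, orders$order_time),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

# Hour of order and ISO weekday number (1 = Monday, 7 = Sunday).
orders$hour    <- as.integer(format(orders$order_datetime, "%H"))
orders$dow_num <- as.integer(format(orders$order_datetime, "%u"))

# Weekend flag: Saturday (6) or Sunday (7).
orders$is_weekend <- as.integer(orders$dow_num >= 6)

# Time slots; intervals are (-1,5], (5,11], (11,16], (16,19], (19,23].
orders$time_slot <- cut(
  orders$hour,
  breaks = c(-1, 5, 11, 16, 19, 23),
  labels = c("Overnight", "Breakfast", "Lunch", "Evening", "Late Evening")
)

# Assumed imputation rule: treat a missing flag as "not a repeat customer".
orders$is_repeat_customer[is.na(orders$is_repeat_customer)] <- 0
```

Using the numeric `%u` weekday avoids depending on locale-specific day names when building the weekend flag.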
To identify the most valuable customers, revenue, order counts, average ratings, order frequency, and preferences are aggregated at the customer level. The top 10% of customers by revenue are selected, highlighting their contribution to overall business performance.
## [1] "Total customers in top 10%: 1000. Here's a glimpse of the top 10:"
## # A tibble: 10 × 9
## customer_id revenue orders avg_rating aov orders_per_month fav_time_slot
## <chr> <dbl> <int> <dbl> <dbl> <dbl> <chr>
## 1 C14617 46208. 132 3.60 350. 11 Lunch (12–16)
## 2 C12102 45225. 114 3.35 397. 9.5 Lunch (12–16)
## 3 C19551 45062. 121 3.61 372. 10.1 Late Evening (2…
## 4 C19539 45026. 139 3.68 324. 11.6 Overnight (0–5)
## 5 C15333 43493. 131 3.53 332. 10.9 Evening (17–19)
## 6 C19493 43041. 116 3.77 371. 9.67 Breakfast (6–11)
## 7 C10000 42940. 118 3.42 364. 9.83 Overnight (0–5)
## 8 C17332 42808. 128 3.59 334. 10.7 Late Evening (2…
## 9 C11940 42775. 123 3.42 348. 10.2 Breakfast (6–11)
## 10 C13336 42731. 133 3.37 321. 11.1 Breakfast (6–11)
## # ℹ 2 more variables: fav_cuisine <chr>, fav_city <chr>
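The customer-level aggregation and the top-10% cut can be sketched in base R as follows. The toy data and the choice of the 90th-percentile revenue as the cutoff are assumptions; the report does not say how the top 10% was selected.

```r
# Toy orders: customer Ck places k orders of 100 each (illustrative only).
orders <- data.frame(
  customer_id  = rep(sprintf("C%02d", 1:20), times = 1:20),
  order_amount = 100,
  rating       = rep(c(3, 4), length.out = 210)
)

# Aggregate to one row per customer.
revenue    <- tapply(orders$order_amount, orders$customer_id, sum)
n_orders   <- tapply(orders$order_amount, orders$customer_id, length)
avg_rating <- tapply(orders$rating, orders$customer_id, mean)

cust <- data.frame(
  customer_id = names(revenue),
  revenue     = as.numeric(revenue),
  orders      = as.integer(n_orders),
  avg_rating  = as.numeric(avg_rating)
)
cust$aov <- cust$revenue / cust$orders  # average order value

# Keep customers at or above the 90th percentile of revenue.
cutoff        <- quantile(cust$revenue, 0.90)
top_customers <- cust[cust$revenue >= cutoff, ]
top_customers <- top_customers[order(-top_customers$revenue), ]
```

With 20 toy customers, the cut keeps the two highest-revenue customers, mirroring the 1,000 of 10,000 kept in the report.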
This pie chart shows the distribution of the most common order time slots among the top 10% of customers. It highlights when the highest-value customers prefer to place orders: breakfast, lunch, evening, late evening, or overnight.
This pie chart summarizes which cuisines are most popular among top customers. It provides insight into which food categories drive loyalty and high spending.
This pie chart shows the cities where top customers are most active, highlighting the geographic concentration of the most valuable customers.
Customers are segmented based on whether they placed repeat orders or only a single order. For each segment, customer counts, revenue, order behavior, delivery time, and satisfaction (ratings) are summarized. This helps in contrasting loyal customers against one-time buyers.
## # A tibble: 2 × 11
## segment customers total_orders avg_orders_per_cust avg_revenue total_revenue
## <chr> <int> <int> <dbl> <dbl> <dbl>
## 1 One-time… 10000 399197 39.9 308. 123082793.
## 2 Repeat C… 10000 600803 60.1 308. 185207349.
## # ℹ 5 more variables: avg_delivery_time <dbl>, avg_rating <dbl>,
## # fav_time_slot <chr>, fav_cuisine <chr>, fav_city <chr>
Orders are analyzed across different days of the week and time slots. Visualizations show which weekdays and time windows attract the highest number of orders, offering insights into peak demand periods.
The dataset is grouped by weekday versus weekend to compare order volumes, revenue, and ratings. This highlights customer demand shifts between workdays and leisure days, supported by visual comparisons.
## # A tibble: 2 × 4
## segment total_orders avg_revenue avg_rating
## <chr> <int> <dbl> <dbl>
## 1 Weekday 714817 308. 3.55
## 2 Weekend 285183 309. 3.55
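Because a week has five weekdays and only two weekend days, the totals above are easier to compare after dividing by the number of days in each segment. The short calculation below uses the order counts printed in the table; the variable names are illustrative.

```r
# Order totals taken from the table above.
total_orders    <- c(Weekday = 714817, Weekend = 285183)
days_in_segment <- c(Weekday = 5, Weekend = 2)

# Orders per day of the week within each segment.
orders_per_day <- total_orders / days_in_segment
ratio <- orders_per_day[["Weekday"]] / orders_per_day[["Weekend"]]
print(round(orders_per_day))
```

Both segments come out at roughly 143,000 orders per day of the week, so the larger weekday total reflects the number of days rather than stronger weekday demand.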
Monthly order trends are tracked across different cities to identify growth patterns or declines. The month-over-month growth rate is calculated, and a line chart illustrates the performance of each city over time, highlighting regional strengths and weaknesses.
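The month-over-month growth rate within each city can be computed as in the sketch below. The monthly counts are made-up values chosen to keep the arithmetic easy to follow; they are not figures from the dataset.

```r
# Toy monthly order counts per city (illustrative values only).
monthly <- data.frame(
  city   = rep(c("Delhi", "Mumbai"), each = 4),
  month  = rep(c("2025-01", "2025-02", "2025-03", "2025-04"), times = 2),
  orders = c(100, 90, 99, 110, 200, 180, 198, 198)
)

# Sort so that each city's months are in chronological order.
monthly <- monthly[order(monthly$city, monthly$month), ]

# Growth = (this month - previous month) / previous month, per city.
# The first month of each city has no predecessor, hence NA.
monthly$mom_growth_pct <- ave(
  monthly$orders, monthly$city,
  FUN = function(x) c(NA, diff(x) / head(x, -1) * 100)
)
```

Computing the rate inside each city group prevents the first month of one city from being compared with the last month of another.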
To better understand customer preferences, cuisines were analyzed at the city level. The most popular cuisine in each city was identified, and cuisine types were compared by average order value and total revenue. Results highlight both high-demand cuisines and those that drive premium spending. Vendor performance was also assessed, ranking vendors by orders, revenue, and customer ratings to identify top performers and opportunities for improvement.
## $by_orders
## # A tibble: 10 × 5
## vendor_id orders revenue avg_rating aov
## <chr> <int> <dbl> <dbl> <dbl>
## 1 V4020 310 97033. 3.52 313.
## 2 V3887 309 92662. 3.50 300.
## 3 V3610 303 95746. 3.58 316.
## 4 V3604 303 93907. 3.53 310.
## 5 V4560 299 92513. 3.61 309.
## 6 V3362 298 93397. 3.46 313.
## 7 V3394 297 91594. 3.59 308.
## 8 V2936 296 98286. 3.59 332.
## 9 V1104 296 88353. 3.49 298.
## 10 V1425 295 94125. 3.62 319.
##
## $by_revenue
## # A tibble: 10 × 5
## vendor_id orders revenue avg_rating aov
## <chr> <int> <dbl> <dbl> <dbl>
## 1 V4188 292 99263. 3.61 340.
## 2 V2936 296 98286. 3.59 332.
## 3 V3947 287 98031. 3.63 342.
## 4 V4020 310 97033. 3.52 313.
## 5 V4233 279 96214. 3.44 345.
## 6 V1976 274 96043. 3.65 351.
## 7 V2244 288 95916. 3.52 333.
## 8 V3610 303 95746. 3.58 316.
## 9 V3962 287 95716. 3.53 334.
## 10 V1339 282 95068. 3.59 337.
##
## $by_rating
## # A tibble: 10 × 5
## vendor_id orders revenue avg_rating aov
## <chr> <int> <dbl> <dbl> <dbl>
## 1 V1641 226 68732. 3.79 304.
## 2 V2601 270 82079. 3.79 304.
## 3 V4206 237 69703. 3.78 294.
## 4 V2092 226 74370. 3.76 329.
## 5 V1091 233 73381. 3.75 315.
## 6 V2650 242 73483. 3.75 304.
## 7 V3380 241 72071. 3.75 299.
## 8 V3740 263 80211. 3.75 305.
## 9 V2249 229 76447. 3.74 334.
## 10 V2478 260 81004. 3.74 312.
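A named list of rankings like the one above can be built with a small helper. The sketch below reuses four vendor rows from the printed tables (values as rounded in the output); the helper name `top_n_by` is hypothetical and not taken from the original analysis.

```r
# Four vendors copied from the tables above (rounded as printed).
vendors <- data.frame(
  vendor_id  = c("V4020", "V4188", "V1641", "V2936"),
  orders     = c(310L, 292L, 226L, 296L),
  revenue    = c(97033, 99263, 68732, 98286),
  avg_rating = c(3.52, 3.61, 3.79, 3.59)
)
vendors$aov <- vendors$revenue / vendors$orders  # average order value

# Hypothetical helper: top n rows by a given column, highest first.
top_n_by <- function(df, col, n = 2) {
  head(df[order(df[[col]], decreasing = TRUE), ], n)
}

rankings <- list(
  by_orders  = top_n_by(vendors, "orders"),
  by_revenue = top_n_by(vendors, "revenue"),
  by_rating  = top_n_by(vendors, "avg_rating")
)
```

Keeping the three rankings in one list makes it easy to see that a vendor such as V1641 can lead on rating while trailing on orders and revenue.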
Order activity patterns were explored using a day-by-hour heatmap, which revealed peak ordering times across the week. Additionally, a Pareto analysis showed that a small share of customers contributed disproportionately to overall revenue, supporting the “80/20 rule” in customer value distribution. These insights help focus retention and promotional strategies on high-value time slots and customer groups.
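The Pareto calculation behind such a statement can be sketched as follows: sort customers by revenue, accumulate their share of the total, and find the smallest share of customers that reaches 80% of revenue. The revenue vector is a toy example, not data from the report.

```r
# Toy per-customer revenue (illustrative only).
revenue <- c(500, 320, 80, 50, 30, 10, 5, 3, 1, 1)

sorted         <- sort(revenue, decreasing = TRUE)
cum_share      <- cumsum(sorted) / sum(sorted)        # cumulative revenue share
customer_share <- seq_along(sorted) / length(sorted)  # cumulative customer share

# Smallest share of customers that accounts for at least 80% of revenue.
share_for_80 <- customer_share[which(cum_share >= 0.80)[1]]
```

In this toy case the top two of ten customers already exceed 80% of revenue; on real data the same quantity shows how closely the "80/20" pattern actually holds.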
Pivot-style summaries were created to examine cuisine performance across different cities. These tables compare total orders, revenue, and average ratings by cuisine and city, allowing us to pinpoint regional strengths, local specialties, and areas where cuisine-specific marketing may drive growth.
## # A tibble: 36 × 9
## city total_revenue avg_rating Chinese Continental Italian Mexican
## <chr> <dbl> <dbl> <int> <int> <int> <int>
## 1 Bangalore 8479350. 3.55 27674 0 0 0
## 2 Bangalore 8554579. 3.55 0 27726 0 0
## 3 Bangalore 8549064. 3.55 0 0 27780 0
## 4 Bangalore 8564959. 3.55 0 0 0 27947
## 5 Bangalore 8514884. 3.55 0 0 0 0
## 6 Bangalore 8583921. 3.55 0 0 0 0
## 7 Chennai 8693959. 3.54 28109 0 0 0
## 8 Chennai 8587596. 3.55 0 27943 0 0
## 9 Chennai 8546724. 3.56 0 0 27681 0
## 10 Chennai 8467507. 3.55 0 0 0 27503
## # ℹ 26 more rows
## # ℹ 2 more variables: `North Indian` <int>, `South Indian` <int>
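The table above has 36 rows, each with a single non-zero cuisine column, so it is effectively one row per city and cuisine pair. If one row per city is wanted instead, the order counts can be cross-tabulated and the city-level totals attached separately. The sketch below shows one way to do this in base R on toy data; it is an assumption about the intended layout, not the method used in the report.

```r
# Toy orders (illustrative only).
orders <- data.frame(
  city         = c("Delhi", "Delhi", "Delhi", "Mumbai", "Mumbai"),
  cuisine_type = c("Chinese", "Italian", "Chinese", "Italian", "Italian"),
  order_amount = c(300, 320, 310, 290, 305)
)

# City x cuisine order counts: one row per city, one column per cuisine.
order_counts <- table(orders$city, orders$cuisine_type)

# City-level revenue, computed independently of cuisine.
revenue <- tapply(orders$order_amount, orders$city, sum)

pivot <- data.frame(
  city          = rownames(order_counts),
  total_revenue = as.numeric(revenue[rownames(order_counts)]),
  as.data.frame.matrix(order_counts)
)
rownames(pivot) <- NULL
```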
To predict customer return behavior, survival models were fitted using multiple distributions (Weibull, lognormal, exponential, and log-logistic). Model performance was compared with AIC, and the best-fitting model was used to estimate median time to next purchase for repeat customers. Predicted return dates provide valuable inputs for targeted re-engagement campaigns and churn prevention.
## weibull lognormal exponential loglogistic
## 3463072 3485374 3486244 3447133
## ✅ Best distribution based on AIC: loglogistic
## # A tibble: 6 × 20
## order_id customer_id order_date order_amount city cuisine_type
## <chr> <chr> <dttm> <dbl> <fct> <fct>
## 1 O1911174 C10000 2025-06-26 21:05:00 410. Delhi Italian
## 2 O1178813 C10001 2025-06-30 02:43:00 408. Bangalore Italian
## 3 O1320164 C10002 2025-06-29 15:40:00 944. Mumbai Mexican
## 4 O1086640 C10003 2025-06-16 02:18:00 411. Delhi Chinese
## 5 O1969590 C10004 2025-06-18 21:49:00 207. Bangalore South Indian
## 6 O1830655 C10005 2025-06-30 20:13:00 282. Kolkata Chinese
## # ℹ 14 more variables: delivery_time <dbl>, vendor_id <chr>, rating <dbl>,
## # is_repeat_customer <dbl>, order_datetime <dttm>, date <date>, month <chr>,
## # dow <ord>, order_time <chr>, hour <int>, is_weekend <dbl>, time_slot <fct>,
## # predicted_gap_days <dbl>, predicted_next_purchase <dttm>
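The distribution comparison can be sketched with `survreg()` from the `survival` package, which supports all four distributions named above. The simulated gaps, the intercept-only model, and the treatment of every gap as an observed event are assumptions made for illustration; the report does not describe the model formula or any censoring.

```r
library(survival)

set.seed(42)
# Simulated days between purchases (toy data, not from the report).
gaps <- data.frame(
  gap_days = rweibull(300, shape = 1.5, scale = 10),
  event    = 1  # every gap treated as fully observed
)

# Fit one intercept-only parametric model per candidate distribution.
dists <- c("weibull", "lognormal", "exponential", "loglogistic")
fits <- lapply(dists, function(d) {
  survreg(Surv(gap_days, event) ~ 1, data = gaps, dist = d)
})
names(fits) <- dists

# Lower AIC indicates a better trade-off between fit and complexity.
aic  <- sapply(fits, AIC)
best <- names(which.min(aic))

# Refit the winner and read off the predicted median gap (p = 0.5).
best_fit   <- survreg(Surv(gap_days, event) ~ 1, data = gaps, dist = best)
median_gap <- predict(best_fit, type = "quantile", p = 0.5)[1]
```

Adding the predicted median gap to a customer's last order date would give a predicted next-purchase date of the kind shown in the `predicted_next_purchase` column.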
An RFM (Recency, Frequency, Monetary) framework was applied to segment customers by purchasing behavior. Customers were classified into groups such as “Champions,” “Loyal Customers,” and “At Risk.” Segment summaries highlight the number of customers, their spending power, and contribution to revenue. This analysis provides a foundation for personalized loyalty strategies and customer lifecycle management.
## # A tibble: 6 × 6
## Loyalty_segment customers avg_recency avg_frequency avg_monetary total_revenue
## <chr> <int> <dbl> <dbl> <dbl> <dbl>
## 1 At Risk 3255 6 92.9 28328. 92207097.
## 2 Loyal Customers 2325 1.7 104 31836. 74017567.
## 3 Big Spenders 1478 6.6 108. 34171. 50504090.
## 4 Champions 1398 0.9 111. 34511. 48246737.
## 5 Recent but Low… 1510 0.9 90.9 28024. 42316324.
## 6 Frequent Low S… 34 7.4 104 29363. 998327.
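An RFM table and a rule-based segmentation can be sketched as below. The snapshot date, the median-split scoring, and the segment rules are all hypothetical; the report does not state how customers were assigned to the segments listed above.

```r
snapshot <- as.Date("2025-07-01")  # hypothetical reference date

# Toy orders (illustrative only).
orders <- data.frame(
  customer_id  = c("A", "A", "A", "B", "C", "C", "D"),
  order_date   = as.Date(c("2025-06-30", "2025-06-20", "2025-06-01",
                           "2025-03-01", "2025-06-28", "2025-06-25",
                           "2025-01-15")),
  order_amount = c(400, 350, 500, 900, 100, 120, 150)
)

# Recency (days since last order), frequency (orders), monetary (revenue).
last_order <- tapply(as.numeric(orders$order_date), orders$customer_id, max)
rfm <- data.frame(
  customer_id = names(last_order),
  recency     = as.numeric(snapshot) - as.numeric(last_order),
  frequency   = as.numeric(tapply(orders$order_amount, orders$customer_id, length)),
  monetary    = as.numeric(tapply(orders$order_amount, orders$customer_id, sum))
)

# Assumed scoring: split each dimension at its median.
recent   <- rfm$recency   <= median(rfm$recency)
frequent <- rfm$frequency >= median(rfm$frequency)
big      <- rfm$monetary  >= median(rfm$monetary)

# Assumed rules, checked from the most to the least valuable segment.
rfm$segment <- ifelse(recent & frequent & big, "Champions",
               ifelse(recent & frequent,       "Loyal Customers",
               ifelse(big,                     "Big Spenders",
               ifelse(recent,                  "Recent but Low Frequency",
                                               "At Risk"))))
```

Lower recency means a more recent purchase, which is why the recency test uses `<=` while the frequency and monetary tests use `>=`.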
The analysis of the food delivery dataset provides several key insights:
Top Customers: The top 10% of revenue-generating customers show a clear preference for breakfast and overnight orders, while their favorite cuisines and cities are more evenly distributed without strong dominance in any one category.
Order Trends: Order activity remains broadly stable across the week. Weekdays generate more orders in total than weekends (714,817 versus 285,183), but the per-day averages are similar, indicating balanced demand on a per-day basis. Monday shows the largest hour-to-hour fluctuations in order volume, suggesting irregular demand at the start of the week.
City-Level Dynamics: Overall city-level order trends remain steady, with a noticeable dip in February 2025 but otherwise consistent performance. Across cities, all cuisine categories perform at similar levels, with average order values just above ₹300 regardless of cuisine type.
Forecasting Next Orders: Survival models were tested to forecast customer return behavior. Based on AIC scores, the log-logistic distribution provided the best fit for modeling purchase intervals among the four distributions compared, making it the preferred basis for predicting the timing of the next order.
Customer Loyalty (RFM Analysis): Loyalty segmentation revealed distinct groups.
Champions (1,398 customers) and Loyal Customers (2,325 customers) combine recent activity with high order frequency and spending, making them critical to retain.
Big Spenders (1,478 customers) contribute significantly to revenue even though their most recent orders are less recent.
At Risk (3,255 customers) form the largest segment, representing a retention challenge and an opportunity for targeted re-engagement.
Other smaller segments such as Recent but Low Frequency and Frequent Low Spenders highlight diverse customer behaviors that may need tailored strategies.
Overall, the platform enjoys a stable demand base with strong contributions from a small group of top-tier customers. Retaining loyal and high-spending segments, addressing “At Risk” customers, and capitalizing on predictable ordering patterns (like breakfast and overnight slots) can significantly enhance growth and customer lifetime value.
The analysis covered customer behavior, order trends, cuisine preferences, vendor performance, and loyalty dynamics. Key findings show that a small proportion of customers and vendors drive the majority of revenue, ordering patterns are strongly time-dependent, and customer loyalty can be segmented effectively using RFM. These insights enable data-driven decision-making to boost retention, optimize vendor partnerships, and enhance marketing strategies.