LOAD LIBRARIES

Configuration of dataset

The dataset is imported from the specified file path, and an output directory is created to store results if it does not already exist. This ensures reproducibility and organized storage of analysis outputs.
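As an illustration of this setup step, the sketch below creates the output directory only when it is missing. It is written in Python as a neutral example, since the report does not show its own code; the folder name `outputs` and the helper name `ensure_output_dir` are assumptions.

```python
import tempfile
from pathlib import Path

def ensure_output_dir(out_dir):
    """Create out_dir (and any parents) if it does not already exist."""
    path = Path(out_dir)
    path.mkdir(parents=True, exist_ok=True)
    return path

# Demo inside a temporary folder so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out = ensure_output_dir(Path(tmp) / "outputs")
    again = ensure_output_dir(Path(tmp) / "outputs")  # second call changes nothing
    created = out.is_dir() and out == again
```

Because `exist_ok=True` makes the call idempotent, rerunning the analysis does not fail when the folder is already present.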

LOAD DATA

The raw dataset is read. Column names are standardized to lowercase for consistency. A copy of the dataset is kept as a backup, and missing values across all variables are checked to assess data quality before processing.

##           order_id        customer_id         order_date       order_amount 
##                  0                  0                  0                  0 
##               city       cuisine_type      delivery_time          vendor_id 
##                  0                  0                  0                  0 
##             rating is_repeat_customer 
##                  0                  0
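The loading checks described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the report's own code: the helper names `load_rows` and `missing_counts` are hypothetical, and treating empty strings and `NA` as missing is an assumption about how the file encodes gaps.

```python
import csv
import io

def load_rows(handle):
    """Read CSV rows and lowercase every column name."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({k.strip().lower(): v for k, v in raw.items() if k is not None})
    return rows

def missing_counts(rows):
    """Count missing values per column (empty string or 'NA' is assumed to mean missing)."""
    counts = {col: 0 for col in rows[0]} if rows else {}
    for row in rows:
        for col, value in row.items():
            if value is None or value.strip() in ("", "NA"):
                counts[col] += 1
    return counts

# Tiny made-up sample, used only to exercise the functions.
sample = io.StringIO("Order_ID,Order_Amount,Rating\nO1,410.5,4\nO2,,NA\n")
rows = load_rows(sample)
backup = [dict(r) for r in rows]  # untouched copy kept before any cleaning
print(missing_counts(rows))
```

On the real dataset every count is zero, as the output above shows, so no rows need to be dropped or imputed at this stage.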

DATA CLEANING

Key data transformations are applied to make the dataset analysis-ready. Dates and times are converted to proper formats, and new variables such as day of the week, hour of order, weekend flag, and time slot are created. Numeric variables like order amount, delivery time, and ratings are standardized, while missing customer repeat flags are imputed. The resulting structured dataset is stored for further analysis.
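A minimal Python sketch of the derived time variables is shown below. The slot labels for hours 0 to 19 follow the ones that appear in the report's tables; the "Late Evening" label is truncated there, so assigning it the remaining hours 20 to 23 is an assumption, as are the function names and the timestamp format.

```python
from datetime import datetime

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def time_slot(hour):
    """Map an hour of day (0-23) to a named time slot."""
    if hour <= 5:
        return "Overnight (0–5)"
    if hour <= 11:
        return "Breakfast (6–11)"
    if hour <= 16:
        return "Lunch (12–16)"
    if hour <= 19:
        return "Evening (17–19)"
    return "Late Evening"  # hours 20-23 assumed; the exact label is truncated in the output

def add_time_features(order_datetime):
    """Derive day of week, hour, weekend flag, and time slot from a timestamp string."""
    dt = datetime.strptime(order_datetime, "%Y-%m-%d %H:%M:%S")
    return {
        "dow": DAYS[dt.weekday()],
        "hour": dt.hour,
        "is_weekend": int(dt.weekday() >= 5),  # Saturday or Sunday
        "time_slot": time_slot(dt.hour),
    }

example = add_time_features("2025-06-29 15:40:00")
```

Keeping the slot boundaries in one function means the same definition is reused by every later chart and table that groups by time slot.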

Customer Segmentation & Behavior

Top 10% Revenue Customers

To identify the most valuable customers, revenue, order counts, average ratings, order frequency, and preferences are aggregated at the customer level. The top 10% of customers by revenue are selected, highlighting their contribution to overall business performance.

## [1] "Total customers in top 10% is 1000. Here's a glimpse of the top 10"
## # A tibble: 10 × 9
##    customer_id revenue orders avg_rating   aov orders_per_month fav_time_slot   
##    <chr>         <dbl>  <int>      <dbl> <dbl>            <dbl> <chr>           
##  1 C14617       46208.    132       3.60  350.            11    Lunch (12–16)   
##  2 C12102       45225.    114       3.35  397.             9.5  Lunch (12–16)   
##  3 C19551       45062.    121       3.61  372.            10.1  Late Evening (2…
##  4 C19539       45026.    139       3.68  324.            11.6  Overnight (0–5) 
##  5 C15333       43493.    131       3.53  332.            10.9  Evening (17–19) 
##  6 C19493       43041.    116       3.77  371.             9.67 Breakfast (6–11)
##  7 C10000       42940.    118       3.42  364.             9.83 Overnight (0–5) 
##  8 C17332       42808.    128       3.59  334.            10.7  Late Evening (2…
##  9 C11940       42775.    123       3.42  348.            10.2  Breakfast (6–11)
## 10 C13336       42731.    133       3.37  321.            11.1  Breakfast (6–11)
## # ℹ 2 more variables: fav_cuisine <chr>, fav_city <chr>
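The customer-level aggregation behind a table like this one can be sketched in Python as follows. The function names are hypothetical and the sample orders are made up; the only detail taken from the table is that `aov` equals revenue divided by orders (for example, 46208 / 132 is roughly 350).

```python
from collections import defaultdict

def customer_summary(orders):
    """Aggregate revenue, order count, and average order value per customer."""
    totals = defaultdict(lambda: {"revenue": 0.0, "orders": 0})
    for o in orders:
        t = totals[o["customer_id"]]
        t["revenue"] += o["order_amount"]
        t["orders"] += 1
    summary = [
        {"customer_id": cid, "revenue": t["revenue"], "orders": t["orders"],
         "aov": t["revenue"] / t["orders"]}
        for cid, t in totals.items()
    ]
    summary.sort(key=lambda r: r["revenue"], reverse=True)
    return summary

def top_decile(summary):
    """Return the top 10% of customers by revenue (at least one, rounded up)."""
    n = max(1, (len(summary) + 9) // 10)  # integer ceiling of len / 10
    return summary[:n]

# Made-up data: 20 customers with two orders each.
orders = [{"customer_id": f"C{i}", "order_amount": 100.0 * (i + 1)}
          for i in range(20) for _ in range(2)]
top = top_decile(customer_summary(orders))
```

Average rating, orders per month, and the favourite time slot, cuisine, and city would be added to the same per-customer record in the same pass.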

Favorite Time Slots (Top Customers)

This pie chart shows the distribution of the most common order time slots among the top 10% of customers. It highlights when the highest-value customers prefer to place orders: at breakfast, lunch, evening, or late-night hours.

Favorite Cuisines (Top Customers)

This pie chart summarizes which cuisines are most popular among top customers. It provides insight into which food categories drive loyalty and high spending.

Favorite Cities (Top Customers)

This pie chart shows the cities where top customers are most active, highlighting the geographic concentration of the most valuable customers.

Repeat vs One-time Customers

Customers are segmented based on whether they placed repeat orders or only a single order. For each segment, customer counts, revenue, order behavior, delivery time, and satisfaction (ratings) are summarized. This helps in contrasting loyal customers against one-time buyers.

## # A tibble: 2 × 11
##   segment   customers total_orders avg_orders_per_cust avg_revenue total_revenue
##   <chr>         <int>        <int>               <dbl>       <dbl>         <dbl>
## 1 One-time…     10000       399197                39.9        308.    123082793.
## 2 Repeat C…     10000       600803                60.1        308.    185207349.
## # ℹ 5 more variables: avg_delivery_time <dbl>, avg_rating <dbl>,
## #   fav_time_slot <chr>, fav_cuisine <chr>, fav_city <chr>

Order Trend Analysis

Orders by Day & Time Slot

Orders are analyzed across different days of the week and time slots. Visualizations show which weekdays and time windows attract the highest number of orders, offering insights into peak demand periods.

Weekday vs Weekend Orders

The dataset is grouped by weekday versus weekend to compare order volumes, revenue, and ratings. This highlights customer demand shifts between workdays and leisure days, supported by visual comparisons.

## # A tibble: 2 × 4
##   segment total_orders avg_revenue avg_rating
##   <chr>          <int>       <dbl>      <dbl>
## 1 Weekday       714817        308.       3.55
## 2 Weekend       285183        309.       3.55
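The totals in the table above can be put on a per-day basis with simple arithmetic: the weekday total is spread over five days of the week and the weekend total over two. A short Python check using only the figures printed above (the variable names are illustrative):

```python
# Total orders taken from the weekday/weekend table above.
total_orders = {"Weekday": 714817, "Weekend": 285183}
days_in_segment = {"Weekday": 5, "Weekend": 2}

# Orders attributable to each single day of the week within a segment.
per_day = {seg: total_orders[seg] / days_in_segment[seg] for seg in total_orders}

# Relative gap between the two per-day figures.
relative_gap = abs(per_day["Weekday"] - per_day["Weekend"]) / per_day["Weekend"]
print(per_day, round(relative_gap * 100, 2))
```

The two per-day figures differ by well under 1%, which is why the larger weekday total reflects the number of days rather than stronger weekday demand.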

City-Level Growth/Decline

Monthly order trends are tracked across different cities to identify growth patterns or declines. The month-over-month growth rate is calculated, and a line chart illustrates the performance of each city over time, highlighting regional strengths and weaknesses.
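Month-over-month growth is conventionally computed as (current - previous) / previous × 100. The Python sketch below applies that formula to one city's monthly order counts; the function name and the sample numbers are made up for illustration, and month keys are assumed to sort chronologically (for example `YYYY-MM`).

```python
def mom_growth(monthly_orders):
    """Return month-over-month growth in percent, keyed by the later month.

    monthly_orders maps a sortable month key (e.g. '2025-01') to an order count.
    Growth is None when the previous month had zero orders.
    """
    months = sorted(monthly_orders)
    growth = {}
    for prev, cur in zip(months, months[1:]):
        base = monthly_orders[prev]
        if base == 0:
            growth[cur] = None  # undefined: cannot divide by zero
        else:
            growth[cur] = (monthly_orders[cur] - base) / base * 100
    return growth

# Made-up counts for a single city.
city_orders = {"2025-01": 1000, "2025-02": 900, "2025-03": 990}
growth = mom_growth(city_orders)
```

The first month has no growth value because there is no earlier month to compare against; running the function once per city gives the series plotted in the line chart.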

Cuisine & Vendor Insights

To better understand customer preferences, cuisines were analyzed at the city level. The most popular cuisine in each city was identified, and cuisine types were compared by average order value and total revenue. Results highlight both high-demand cuisines and those that drive premium spending. Vendor performance was also assessed, ranking vendors by orders, revenue, and customer ratings to identify top performers and opportunities for improvement.

## $by_orders
## # A tibble: 10 × 5
##    vendor_id orders revenue avg_rating   aov
##    <chr>      <int>   <dbl>      <dbl> <dbl>
##  1 V4020        310  97033.       3.52  313.
##  2 V3887        309  92662.       3.50  300.
##  3 V3610        303  95746.       3.58  316.
##  4 V3604        303  93907.       3.53  310.
##  5 V4560        299  92513.       3.61  309.
##  6 V3362        298  93397.       3.46  313.
##  7 V3394        297  91594.       3.59  308.
##  8 V2936        296  98286.       3.59  332.
##  9 V1104        296  88353.       3.49  298.
## 10 V1425        295  94125.       3.62  319.
## 
## $by_revenue
## # A tibble: 10 × 5
##    vendor_id orders revenue avg_rating   aov
##    <chr>      <int>   <dbl>      <dbl> <dbl>
##  1 V4188        292  99263.       3.61  340.
##  2 V2936        296  98286.       3.59  332.
##  3 V3947        287  98031.       3.63  342.
##  4 V4020        310  97033.       3.52  313.
##  5 V4233        279  96214.       3.44  345.
##  6 V1976        274  96043.       3.65  351.
##  7 V2244        288  95916.       3.52  333.
##  8 V3610        303  95746.       3.58  316.
##  9 V3962        287  95716.       3.53  334.
## 10 V1339        282  95068.       3.59  337.
## 
## $by_rating
## # A tibble: 10 × 5
##    vendor_id orders revenue avg_rating   aov
##    <chr>      <int>   <dbl>      <dbl> <dbl>
##  1 V1641        226  68732.       3.79  304.
##  2 V2601        270  82079.       3.79  304.
##  3 V4206        237  69703.       3.78  294.
##  4 V2092        226  74370.       3.76  329.
##  5 V1091        233  73381.       3.75  315.
##  6 V2650        242  73483.       3.75  304.
##  7 V3380        241  72071.       3.75  299.
##  8 V3740        263  80211.       3.75  305.
##  9 V2249        229  76447.       3.74  334.
## 10 V2478        260  81004.       3.74  312.

Heatmaps and Pareto Charts

Order activity patterns were explored using a day-by-hour heatmap, which revealed peak ordering times across the week. Additionally, a Pareto analysis showed that a small share of customers contributed disproportionately to overall revenue, supporting the “80/20 rule” in customer value distribution. These insights help focus retention and promotional strategies on high-value time slots and customer groups.
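The Pareto calculation can be sketched as follows: sort customers by revenue, accumulate, and read off the share held by the top fraction. This is a hedged Python illustration with made-up revenues; the function names and the 20% default are assumptions, not the report's settings.

```python
def cumulative_share(revenues):
    """Cumulative revenue share after sorting customers from highest to lowest."""
    ordered = sorted(revenues, reverse=True)
    total = sum(ordered)
    shares, running = [], 0.0
    for value in ordered:
        running += value
        shares.append(running / total)
    return shares

def top_share(revenues, top_fraction=0.2):
    """Share of total revenue contributed by the top `top_fraction` of customers."""
    k = max(1, round(len(revenues) * top_fraction))
    return cumulative_share(revenues)[k - 1]

# Made-up revenues for five customers.
revenues = [5, 50, 10, 30, 5]
share = top_share(revenues)
```

Plotting `cumulative_share` against customer rank gives the Pareto curve; the closer the curve hugs the top-left corner, the more concentrated the revenue.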

Pivot tables

Pivot-style summaries were created to examine cuisine performance across different cities. These tables compare total orders, revenue, and average ratings by cuisine and city, allowing us to pinpoint regional strengths, local specialties, and areas where cuisine-specific marketing may drive growth.

## # A tibble: 36 × 9
##    city      total_revenue avg_rating Chinese Continental Italian Mexican
##    <chr>             <dbl>      <dbl>   <int>       <int>   <int>   <int>
##  1 Bangalore      8479350.       3.55   27674           0       0       0
##  2 Bangalore      8554579.       3.55       0       27726       0       0
##  3 Bangalore      8549064.       3.55       0           0   27780       0
##  4 Bangalore      8564959.       3.55       0           0       0   27947
##  5 Bangalore      8514884.       3.55       0           0       0       0
##  6 Bangalore      8583921.       3.55       0           0       0       0
##  7 Chennai        8693959.       3.54   28109           0       0       0
##  8 Chennai        8587596.       3.55       0       27943       0       0
##  9 Chennai        8546724.       3.56       0           0   27681       0
## 10 Chennai        8467507.       3.55       0           0       0   27503
## # ℹ 26 more rows
## # ℹ 2 more variables: `North Indian` <int>, `South Indian` <int>
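The printed tibble has 36 rows, which suggests one row per city and cuisine pair. A fully pivoted layout would instead have one row per city and one column per cuisine. The Python sketch below shows that reshaping for order counts; the function name and sample rows are illustrative assumptions.

```python
from collections import defaultdict

def pivot_orders(orders):
    """Count orders per city and cuisine, returning one row (dict) per city."""
    counts = defaultdict(lambda: defaultdict(int))
    cuisines = set()
    for o in orders:
        counts[o["city"]][o["cuisine_type"]] += 1
        cuisines.add(o["cuisine_type"])
    columns = sorted(cuisines)
    # Fill cuisines a city never ordered with 0 so every row has the same columns.
    return {city: {c: by_cuisine.get(c, 0) for c in columns}
            for city, by_cuisine in sorted(counts.items())}

# Made-up orders.
sample_orders = [
    {"city": "Delhi", "cuisine_type": "Italian"},
    {"city": "Delhi", "cuisine_type": "Chinese"},
    {"city": "Delhi", "cuisine_type": "Italian"},
    {"city": "Mumbai", "cuisine_type": "Mexican"},
]
table = pivot_orders(sample_orders)
```

Revenue and average rating per city can be carried alongside as extra columns once they are aggregated at the city level rather than the city and cuisine level.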

Predict Next Purchase Date

FORECASTING

To predict customer return behavior, survival models were fitted using multiple distributions (Weibull, lognormal, exponential, and log-logistic). Model performance was compared with AIC, and the best-fitting model was used to estimate median time to next purchase for repeat customers. Predicted return dates provide valuable inputs for targeted re-engagement campaigns and churn prevention.

##     weibull   lognormal exponential loglogistic 
##     3463072     3485374     3486244     3447133
## ✅ Best distribution based on AIC: loglogistic
## # A tibble: 6 × 20
##   order_id customer_id order_date          order_amount city      cuisine_type
##   <chr>    <chr>       <dttm>                     <dbl> <fct>     <fct>       
## 1 O1911174 C10000      2025-06-26 21:05:00         410. Delhi     Italian     
## 2 O1178813 C10001      2025-06-30 02:43:00         408. Bangalore Italian     
## 3 O1320164 C10002      2025-06-29 15:40:00         944. Mumbai    Mexican     
## 4 O1086640 C10003      2025-06-16 02:18:00         411. Delhi     Chinese     
## 5 O1969590 C10004      2025-06-18 21:49:00         207. Bangalore South Indian
## 6 O1830655 C10005      2025-06-30 20:13:00         282. Kolkata   Chinese     
## # ℹ 14 more variables: delivery_time <dbl>, vendor_id <chr>, rating <dbl>,
## #   is_repeat_customer <dbl>, order_datetime <dttm>, date <date>, month <chr>,
## #   dow <ord>, order_time <chr>, hour <int>, is_weekend <dbl>, time_slot <fct>,
## #   predicted_gap_days <dbl>, predicted_next_purchase <dttm>
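To make the model-comparison step concrete, the sketch below fits two of the four distributions by maximum likelihood, compares them with AIC = 2k - 2 ln L, and turns the winning model's median gap into a predicted date. It is a simplified Python illustration: it covers only the exponential and lognormal cases (which have closed-form estimates), ignores censoring, and uses made-up gaps. The function names are hypothetical and this is not the report's fitting code.

```python
import math
from datetime import datetime, timedelta

def fit_exponential(gaps):
    """Closed-form MLE for an exponential model of inter-purchase gaps (days)."""
    n, total = len(gaps), sum(gaps)
    rate = n / total
    loglik = n * math.log(rate) - rate * total
    return {"loglik": loglik, "k": 1, "median": math.log(2) / rate}

def fit_lognormal(gaps):
    """Closed-form MLE for a lognormal model of inter-purchase gaps (days)."""
    logs = [math.log(g) for g in gaps]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((x - mu) ** 2 for x in logs) / n
    loglik = -sum(logs) - n / 2 * math.log(2 * math.pi * var) - n / 2
    return {"loglik": loglik, "k": 2, "median": math.exp(mu)}

def aic(fit):
    """Akaike information criterion: lower is better."""
    return 2 * fit["k"] - 2 * fit["loglik"]

# Made-up gaps (in days) between consecutive orders.
gaps = [2, 3, 3, 4, 5, 8]
fits = {"exponential": fit_exponential(gaps), "lognormal": fit_lognormal(gaps)}
best = min(fits, key=lambda name: aic(fits[name]))

# Predicted next purchase = last order time + median gap of the best model.
last_order = datetime(2025, 6, 26, 21, 5)
predicted_next = last_order + timedelta(days=fits[best]["median"])
```

The Weibull and log-logistic fits need numerical optimisation, so they are left out here; the comparison logic is the same once each model reports a log-likelihood and a parameter count.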

Customer Loyalty (RFM Analysis)

An RFM (Recency, Frequency, Monetary) framework was applied to segment customers by purchasing behavior. Customers were classified into groups such as “Champions,” “Loyal Customers,” and “At Risk.” Segment summaries highlight the number of customers, their spending power, and contribution to revenue. This analysis provides a foundation for personalized loyalty strategies and customer lifecycle management.

## # A tibble: 6 × 6
##   Loyalty_segment customers avg_recency avg_frequency avg_monetary total_revenue
##   <chr>               <int>       <dbl>         <dbl>        <dbl>         <dbl>
## 1 At Risk              3255         6            92.9       28328.     92207097.
## 2 Loyal Customers      2325         1.7         104         31836.     74017567.
## 3 Big Spenders         1478         6.6         108.        34171.     50504090.
## 4 Champions            1398         0.9         111.        34511.     48246737.
## 5 Recent but Low…      1510         0.9          90.9       28024.     42316324.
## 6 Frequent Low S…        34         7.4         104         29363.       998327.
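The RFM metrics and scores behind a segmentation like this can be sketched as follows. This Python version is illustrative only: measuring recency in days, the rank-based 1 to 5 scoring, and the function names are assumptions, and the rules that map scores to named segments are not shown in the report, so none are reproduced here.

```python
from datetime import date

def rfm_table(orders, as_of):
    """Compute recency (days since last order), frequency, and monetary value per customer."""
    table = {}
    for o in orders:
        row = table.setdefault(
            o["customer_id"], {"last": o["date"], "frequency": 0, "monetary": 0.0})
        row["last"] = max(row["last"], o["date"])
        row["frequency"] += 1
        row["monetary"] += o["order_amount"]
    for row in table.values():
        row["recency"] = (as_of - row.pop("last")).days
    return table

def rank_score(values, higher_is_better=True, bins=5):
    """Rank-based score from 1 (worst) upward; reaches `bins` when there are at least `bins` customers."""
    order = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(order)
    return {key: 1 + (i * bins) // n for i, key in enumerate(order)}

# Made-up orders for two customers.
orders = [
    {"customer_id": "A", "date": date(2025, 6, 1), "order_amount": 100.0},
    {"customer_id": "A", "date": date(2025, 6, 29), "order_amount": 200.0},
    {"customer_id": "B", "date": date(2025, 5, 1), "order_amount": 50.0},
]
rfm = rfm_table(orders, as_of=date(2025, 6, 30))
r_score = rank_score({c: v["recency"] for c, v in rfm.items()}, higher_is_better=False)
f_score = rank_score({c: v["frequency"] for c, v in rfm.items()})
m_score = rank_score({c: v["monetary"] for c, v in rfm.items()})
```

Recency is scored in reverse because a smaller value (a more recent order) is better; segment labels are then assigned from combinations of the three scores.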

CONCLUSION

The analysis of the food delivery dataset provides several key insights:

  • Top Customers: The top 10% of revenue-generating customers show a clear preference for breakfast and overnight orders, while their favorite cuisines and cities are more evenly distributed without strong dominance in any one category.

  • Order Trends: Order values remain broadly stable across the week, with average revenue per order of about 308 on weekdays and 309 on weekends. Weekdays generate more orders in total than weekends (714,817 versus 285,183), but the daily averages are similar, indicating balanced demand on a per-day basis. Monday shows the largest hourly fluctuations in order volume, suggesting irregular demand at the start of the week.

  • City-Level Dynamics: Overall city-level order trends remain steady, with a noticeable dip in February 2025 but otherwise consistent performance. Across cities, all cuisine categories perform at similar levels, with average order values just above ₹300 regardless of cuisine type.

  • Forecasting Next Orders: Survival models were tested to forecast customer return behavior. The log-logistic distribution had the lowest AIC (3,447,133) of the four candidates, so it was used to model purchase intervals and predict the timing of the next order.

  • Customer Loyalty (RFM Analysis): Loyalty segmentation revealed distinct groups.

    • Champions (1,398 customers) and Loyal Customers (2,325 customers) combine the most recent activity (average recency of 0.9 and 1.7) with high order frequency and spend, making them critical to retain.

    • Big Spenders (1,478 customers) contribute significantly to revenue even though their purchases are less recent (average recency of 6.6).

    • At Risk (3,255 customers) form the largest segment, representing a retention challenge and an opportunity for targeted re-engagement.

    • Other smaller segments such as Recent but Low Frequency and Frequent Low Spenders highlight diverse customer behaviors that may need tailored strategies.

  • Overall, the platform enjoys a stable demand base with strong contributions from a small group of top-tier customers. Retaining loyal and high-spending segments, addressing “At Risk” customers, and capitalizing on predictable ordering patterns (like breakfast and overnight slots) can significantly enhance growth and customer lifetime value.

The analysis covered customer behavior, order trends, cuisine preferences, vendor performance, and loyalty dynamics. Key findings show that a small proportion of customers and vendors drive the majority of revenue, ordering patterns are strongly time-dependent, and customer loyalty can be segmented effectively using RFM. These insights enable data-driven decision-making to boost retention, optimize vendor partnerships, and enhance marketing strategies.