Overview

In this project, we explore data from a study of food delivery times in minutes (i.e., the time from the initial order to receiving the food) for a single restaurant. The data contain 10,012 orders and come from the modeldata R package. My primary goal is to satisfy my curiosity about this restaurant and the operational flow of its delivery business, and to draw insights that can be used for business decision making.

To get started, these are the initial questions I am hoping to answer through my exploration.

  • Which menu items are most likely to be ordered during breakfast, lunch, or dinner hours?

  • Which days get the most orders, and which items are ordered the most on those days?

  • Is the delivery time of an order strongly related to the customer distance from the restaurant?

  • Out of the 10,000+ orders made to this restaurant, which item is most commonly ordered?

    • Does the most commonly ordered item differ based on the customer's distance from the restaurant?
    • Does the most commonly ordered item differ based on the day of the week?

Lastly, we will fit regression and classification models to predict how long a delivery takes and the day of the week on which an order was placed.

Exploratory Data Analysis

Figure 1: (Left) Most orders are delivered in under an hour, and the average delivery time is around 25 minutes; most orders take between 20 and 30 minutes. (Right) Most orders come from within a 3-mile radius of the restaurant. Only a few orders come from far away, such as 9 or 12 miles.

Figure 2: The restaurant appears to open at 11 AM and close at 9 PM, or at least that is when delivery orders are taken. Orders dip around 2 PM and pick up again during the evening hours.

Figure 3: (Left) We see a nonlinear relationship between distance and delivery time, which is unexpected. Some orders 9 miles from the restaurant take about 20 minutes, whereas some orders about 2 miles away take almost an hour. (Right) We see a quadratic relationship between the hour an order is placed and its delivery time: orders earlier in the day are delivered faster, which makes sense because there is less order traffic. At peak time in the evening, orders take longer, and after the peak delivery times start to fall again, which gives the curve its quadratic shape.

  • This raises the question: why do some far-away orders arrive quickly while some nearby orders take much longer?
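One way to put a number on how strongly delivery time tracks distance is the Pearson correlation. The sketch below is a minimal illustration in Python (the original analysis was done in R), and the six (distance, time) pairs are made-up values, not rows from the data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical orders: distance in miles, delivery time in minutes.
distance = [1.0, 2.0, 2.5, 3.0, 9.0, 12.0]
minutes = [18.0, 55.0, 24.0, 27.0, 20.0, 41.0]

r = pearson(distance, minutes)
print(f"correlation between distance and delivery time: {r:.2f}")
```

A value near 0 only rules out a strong linear relationship. A curved pattern like the one described in Figure 3 can still be present, so the scatter plot remains the primary evidence.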

Exploring Orders

Figure 4: Number of orders by day of the week. The distribution appears left-skewed, meaning more orders arrive toward the end of the week. Friday and Saturday received the most orders, while Monday received the fewest, with fewer than 500 orders.

Figure 5: On most days, delivery times follow a right-skewed distribution, with most orders delivered in under 30 minutes. On Friday and Saturday, orders take much longer than on other days, and the distribution even appears bimodal.

Figure 6: Across all days of the week, most orders are placed in the evening, which confirms the pattern we observed earlier.

Exploring Menu Items

In this section, we will uncover the following:

  • Number of menu items ordered at once

  • Most ordered menu item

  • Most ordered menu item per day and time of day

  • Most ordered menu item by distance group/neighborhood

Figure 7: (Left) An order most often contains 3 different menu items, and most orders contain between 2 and 4 different items. (Right) Orders most often contain 3 items at once, and most orders fall in the range of 2 to 4 items. Orders with 5 different items appear to be more common than orders with a single item.

Figure 8: Item 4 appears to be the most ordered item and Item 19 the least ordered. Specifically, Items 4, 8, and 24 are ordered the most, while Items 20, 21, 16, and 19 are ordered the least.
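The ranking in Figure 8 comes down to summing the quantity of each item across all orders. The sketch below illustrates that tally in Python under stated assumptions: the four orders and the item names (`item_04`, `item_08`, and so on) are hypothetical and are not taken from the data.

```python
from collections import Counter

# Hypothetical orders: each dict maps a menu item to the quantity ordered.
orders = [
    {"item_04": 2, "item_08": 1},
    {"item_04": 1, "item_24": 1, "item_19": 1},
    {"item_08": 2, "item_24": 1},
    {"item_04": 1},
]

totals = Counter()
for order in orders:
    totals.update(order)  # adds each item's quantity to the running total

ranked = totals.most_common()  # sorted from most to least ordered
print("most ordered:", ranked[0])
print("least ordered:", ranked[-1])

# Number of different items in each order (the quantity shown in Figure 7, left).
distinct_items = [len(order) for order in orders]
print("different items per order:", distinct_items)
```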

  • Insights: Saturday is our busiest day of the week, and deliveries generally take longer in the evening.

Modeling and Forecasting Delivery Duration using Supervised Learning

Delivery Time Prediction and Model Evaluation

Train/Test Split
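The cross-validation folds reported later each cover 7,509 rows (for example, 6,756 + 753), which is 75% of the 10,012 orders, so the data were presumably split 75/25 into training and test sets. The sketch below shows such a split in Python. The seed and the use of a plain (unstratified) shuffle are assumptions, since the report does not state how the split was made.

```python
import random

def train_test_split(n_rows, prop=0.75, seed=123):
    """Return shuffled row indices split into training and test sets."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    n_train = int(n_rows * prop)
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = train_test_split(10_012)
print(len(train_idx), len(test_idx))  # 7509 2503
```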

Model 1: Linear Regression

Model 2: K-Nearest Neighbors

neighbors  .metric      mean   n    std_err
        1  rmse     3.342966  10  0.0654968
        6  rmse     2.688308  10  0.0446421
       11  rmse     2.594064  10  0.0392295
       17  rmse     2.560397  10  0.0347248
       22  rmse     2.548658  10  0.0341018
       28  rmse     2.544904  10  0.0349100
       33  rmse     2.545263  10  0.0353429
       39  rmse     2.548994  10  0.0354578
       44  rmse     2.553932  10  0.0354950
       50  rmse     2.562203  10  0.0355529

Model 3: Decision Tree

## # A tibble: 100 × 2
##    tree_depth cost_complexity
##         <int>           <dbl>
##  1          1    0.0000000001
##  2          2    0.0000000001
##  3          4    0.0000000001
##  4          5    0.0000000001
##  5          7    0.0000000001
##  6          8    0.0000000001
##  7         10    0.0000000001
##  8         11    0.0000000001
##  9         13    0.0000000001
## 10         15    0.0000000001
## # ℹ 90 more rows

Selecting Best Parameters: Using One-Standard Error Rule
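The one-standard-error rule picks the simplest model whose cross-validated error is within one standard error of the best model. The sketch below applies it in Python to the kNN tuning results listed under Model 2. Treating a larger number of neighbors as "simpler" (a smoother fit) is an assumption made here for illustration; the report does not say which direction was used.

```python
# (neighbors, mean RMSE, standard error) copied from the kNN tuning table.
results = [
    (1, 3.342966, 0.0654968),
    (6, 2.688308, 0.0446421),
    (11, 2.594064, 0.0392295),
    (17, 2.560397, 0.0347248),
    (22, 2.548658, 0.0341018),
    (28, 2.544904, 0.0349100),
    (33, 2.545263, 0.0353429),
    (39, 2.548994, 0.0354578),
    (44, 2.553932, 0.0354950),
    (50, 2.562203, 0.0355529),
]

def one_std_err_choice(rows):
    """Return (chosen row, best row, threshold) under the one-standard-error rule."""
    best = min(rows, key=lambda row: row[1])       # lowest mean RMSE
    threshold = best[1] + best[2]                  # best mean + its standard error
    within = [row for row in rows if row[1] <= threshold]
    # Assumption: more neighbors means a smoother, simpler model.
    chosen = max(within, key=lambda row: row[0])
    return chosen, best, threshold

chosen, best, threshold = one_std_err_choice(results)
print(f"numerically best k = {best[0]} (RMSE {best[1]:.3f})")
print(f"threshold = {threshold:.3f}")
print(f"one-standard-error choice: k = {chosen[0]} (RMSE {chosen[1]:.3f})")
```

With these numbers the lowest RMSE is at k = 28, the threshold is 2.544904 + 0.0349100 = 2.579814, and every k from 17 to 50 falls within it, so the rule selects k = 50 under the stated assumption.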

Residual Analysis

Evaluating Model Performance

Model                    rmse        rsq       mae
Linear Regression    2.619468  0.8583001  1.855581
k-Nearest Neighbors  2.511664  0.8696189  1.721909
Decision Tree        2.495521  0.8716732  1.769594
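For reference, the three metrics in the table can be computed as in the Python sketch below. The delivery times are made-up values, and computing rsq as the squared correlation between observed and predicted values is an assumption about how the reported figure was obtained.

```python
from math import sqrt

def rmse(truth, pred):
    """Root mean squared error: penalizes large misses more heavily."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))

def mae(truth, pred):
    """Mean absolute error: the average size of a miss, in minutes."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)

def rsq(truth, pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_t * var_p)

# Hypothetical observed and predicted delivery times in minutes.
observed = [20.0, 25.0, 30.0, 35.0]
predicted = [22.0, 24.0, 29.0, 37.0]

print(f"rmse = {rmse(observed, predicted):.3f}")
print(f"mae  = {mae(observed, predicted):.3f}")
print(f"rsq  = {rsq(observed, predicted):.3f}")
```

Because RMSE and MAE are both in minutes, the table can be read directly: a typical prediction from any of the three models is off by roughly 2 to 3 minutes.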

Predicting Customer Ordering Patterns: A Multiclass Classification Approach for Predicting the Day of an Order

## #  10-fold cross-validation using stratification 
## # A tibble: 10 × 2
##    splits             id    
##    <list>             <chr> 
##  1 <split [6756/753]> Fold01
##  2 <split [6756/753]> Fold02
##  3 <split [6757/752]> Fold03
##  4 <split [6758/751]> Fold04
##  5 <split [6758/751]> Fold05
##  6 <split [6758/751]> Fold06
##  7 <split [6759/750]> Fold07
##  8 <split [6759/750]> Fold08
##  9 <split [6760/749]> Fold09
## 10 <split [6760/749]> Fold10
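Stratification keeps the share of each day roughly equal across folds, which matters here because some days have far more orders than others. The Python sketch below illustrates one simple way to build stratified folds; it is not the procedure used by the R tooling, and the per-day counts are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=1):
    """Assign row indices to k held-out folds, balancing each class across folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled rows round-robin, continuing where the last class stopped
        # so that overall fold sizes also stay balanced.
        for i, idx in enumerate(members):
            folds[(i + offset) % k].append(idx)
        offset += len(members)
    return folds

# Hypothetical number of training orders per day.
counts = {"Mon": 45, "Tue": 60, "Wed": 70, "Thu": 85, "Fri": 110, "Sat": 130, "Sun": 95}
labels = [day for day, n in counts.items() for _ in range(n)]

folds = stratified_folds(labels, k=10)
for fold_number, held_out in enumerate(folds, start=1):
    print(f"Fold{fold_number:02d}: train on {len(labels) - len(held_out)}, assess on {len(held_out)}")
```

Each fold plays the role of the smaller number in the split sizes shown above (for example, 753 in `[6756/753]`), with the remaining rows used for fitting.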

Day  Metric    Estimator   Estimate
Mon  accuracy  multiclass  0.2500000
Tue  accuracy  multiclass  0.4152047
Wed  accuracy  multiclass  0.1873278
Thu  accuracy  multiclass  0.2974239
Fri  accuracy  multiclass  0.3789683
Sat  accuracy  multiclass  0.5113835
Sun  accuracy  multiclass  0.2782369

Multinomial Logistic Regression

Table 1: Multinomial Logistic Regression Accuracy by Class
Day  Metric    Estimator   Estimate
Mon  accuracy  multiclass  0.3076923
Tue  accuracy  multiclass  0.1754386
Wed  accuracy  multiclass  0.1652893
Thu  accuracy  multiclass  0.1569087
Fri  accuracy  multiclass  0.2361111
Sat  accuracy  multiclass  0.5639229
Sun  accuracy  multiclass  0.2093664
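A per-day accuracy like the ones tabulated above can be read as the share of orders from a given day that the model assigns to that day. The Python sketch below computes it that way, which is an assumption about how the table was produced. The true and predicted labels are made up.

```python
from collections import defaultdict

def accuracy_by_class(truth, pred):
    """For each true class, the fraction of its rows that were predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(truth, pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {label: correct[label] / total[label] for label in total}

# Hypothetical true and predicted days for ten orders.
truth = ["Sat", "Sat", "Sat", "Sat", "Mon", "Mon", "Wed", "Wed", "Wed", "Wed"]
pred = ["Sat", "Sat", "Sat", "Fri", "Mon", "Sat", "Wed", "Sat", "Sat", "Fri"]

per_day = accuracy_by_class(truth, pred)
for day, value in per_day.items():
    print(f"{day}: {value:.2f}")
```

In this toy example the model leans toward predicting Saturday, so Saturday scores well while Wednesday scores poorly. The same pattern appears in the tables above, where Saturday has the highest accuracy.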

Decision Tree: Classification

## # A tibble: 10 × 7
##    cost_complexity .metric .estimator  mean     n std_err .config         
##              <dbl> <chr>   <chr>      <dbl> <int>   <dbl> <chr>           
##  1    0.0000000001 roc_auc hand_till  0.726    10 0.00290 pre0_mod01_post0
##  2    0.000000001  roc_auc hand_till  0.726    10 0.00290 pre0_mod02_post0
##  3    0.00000001   roc_auc hand_till  0.726    10 0.00290 pre0_mod03_post0
##  4    0.0000001    roc_auc hand_till  0.726    10 0.00290 pre0_mod04_post0
##  5    0.000001     roc_auc hand_till  0.726    10 0.00290 pre0_mod05_post0
##  6    0.00001      roc_auc hand_till  0.726    10 0.00290 pre0_mod06_post0
##  7    0.0001       roc_auc hand_till  0.727    10 0.00279 pre0_mod07_post0
##  8    0.001        roc_auc hand_till  0.767    10 0.00226 pre0_mod08_post0
##  9    0.01         roc_auc hand_till  0.695    10 0.00420 pre0_mod09_post0
## 10    0.1          roc_auc hand_till  0.5      10 0       pre0_mod10_post0

Random Forest

## # A tibble: 20 × 7
##    trees .metric .estimator  mean     n std_err .config         
##    <int> <chr>   <chr>      <dbl> <int>   <dbl> <chr>           
##  1     1 roc_auc hand_till  0.647    10 0.00324 pre0_mod01_post0
##  2    27 roc_auc hand_till  0.784    10 0.00252 pre0_mod02_post0
##  3    53 roc_auc hand_till  0.787    10 0.00289 pre0_mod03_post0
##  4    79 roc_auc hand_till  0.790    10 0.00278 pre0_mod04_post0
##  5   106 roc_auc hand_till  0.790    10 0.00272 pre0_mod05_post0
##  6   132 roc_auc hand_till  0.791    10 0.00291 pre0_mod06_post0
##  7   158 roc_auc hand_till  0.791    10 0.00296 pre0_mod07_post0
##  8   184 roc_auc hand_till  0.791    10 0.00282 pre0_mod08_post0
##  9   211 roc_auc hand_till  0.791    10 0.00290 pre0_mod09_post0
## 10   237 roc_auc hand_till  0.792    10 0.00276 pre0_mod10_post0
## 11   263 roc_auc hand_till  0.791    10 0.00303 pre0_mod11_post0
## 12   289 roc_auc hand_till  0.791    10 0.00290 pre0_mod12_post0
## 13   316 roc_auc hand_till  0.792    10 0.00272 pre0_mod13_post0
## 14   342 roc_auc hand_till  0.791    10 0.00298 pre0_mod14_post0
## 15   368 roc_auc hand_till  0.791    10 0.00302 pre0_mod15_post0
## 16   394 roc_auc hand_till  0.792    10 0.00279 pre0_mod16_post0
## 17   421 roc_auc hand_till  0.792    10 0.00292 pre0_mod17_post0
## 18   447 roc_auc hand_till  0.791    10 0.00266 pre0_mod18_post0
## 19   473 roc_auc hand_till  0.792    10 0.00293 pre0_mod19_post0
## 20   500 roc_auc hand_till  0.792    10 0.00320 pre0_mod20_post0

Model Comparison

## # A tibble: 4 × 4
##   .metric .estimator .estimate Model             
##   <chr>   <chr>          <dbl> <chr>             
## 1 roc_auc hand_till      0.715 MultiNominal Logit
## 2 roc_auc hand_till      0.768 Decision Tree     
## 3 roc_auc hand_till      0.786 Random Forest     
## 4 roc_auc hand_till      0.790 kNN
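The `hand_till` estimator in the output refers to the Hand and Till generalization of ROC AUC to more than two classes: an AUC is computed for every pair of classes and the pairwise values are averaged. The Python sketch below illustrates the idea on made-up predicted probabilities for three days; it is not the implementation used to produce the numbers above.

```python
from itertools import combinations

def auc(pos_scores, neg_scores):
    """Probability that a random positive scores above a random negative (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def hand_till_auc(truth, probs):
    """Average of pairwise AUCs over all pairs of classes."""
    classes = sorted(set(truth))
    pair_aucs = []
    for a, b in combinations(classes, 2):
        # How well the probability of class a separates true a rows from true b rows.
        a_given_b = auc(
            [p[a] for t, p in zip(truth, probs) if t == a],
            [p[a] for t, p in zip(truth, probs) if t == b],
        )
        # The same question from class b's side.
        b_given_a = auc(
            [p[b] for t, p in zip(truth, probs) if t == b],
            [p[b] for t, p in zip(truth, probs) if t == a],
        )
        pair_aucs.append((a_given_b + b_given_a) / 2)
    return sum(pair_aucs) / len(pair_aucs)

# Hypothetical true days and predicted class probabilities.
truth = ["Fri", "Fri", "Sat", "Sat", "Sun", "Sun"]
probs = [
    {"Fri": 0.6, "Sat": 0.3, "Sun": 0.1},
    {"Fri": 0.3, "Sat": 0.5, "Sun": 0.2},
    {"Fri": 0.2, "Sat": 0.7, "Sun": 0.1},
    {"Fri": 0.4, "Sat": 0.4, "Sun": 0.2},
    {"Fri": 0.1, "Sat": 0.2, "Sun": 0.7},
    {"Fri": 0.3, "Sat": 0.3, "Sun": 0.4},
]

score = hand_till_auc(truth, probs)
print(f"multiclass AUC: {score:.3f}")
```

A value of 0.5 corresponds to guessing and 1.0 to perfect separation, so the reported values of roughly 0.72 to 0.79 indicate moderate ability to tell the days apart.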

Appendix

  • dayFri: On Fridays, we expect deliveries to take 7.7 minutes longer on average.

  • daySat: On Saturdays, orders are expected to take 8 minutes longer to arrive, after accounting for the hour of the day and the distance.

(Intercept): -71.797533. Orders within a 3-mile radius of the restaurant on a Monday night are expected to take 60 - 71.80 = -11.80, i.e., about 12 minutes less on average, holding all other conditions constant.