Welcome to an Exploratory Data Analysis for food demand forecasting!
This is a Kaggle machine learning project: https://www.kaggle.com/datasets/kannanaikkal/food-demand-forecasting. The client is a meal delivery company that operates in multiple cities. It has various fulfillment centers in these cities for dispatching meal orders to customers. The client wants help forecasting demand for the upcoming weeks so that these centers can plan their stock of raw materials accordingly.
Most raw materials are replenished on a weekly basis, and because they are perishable, procurement planning is of utmost importance. Given the following information, the task is to predict the demand for the next 10 weeks (weeks 146-155) for the center-meal combinations in the test set.
Let’s get started!

Center data (first rows):

| center_id | city_code | region_code | center_type | op_area |
|---|---|---|---|---|
| 11 | 679 | 56 | TYPE_A | 3.7 |
| 13 | 590 | 56 | TYPE_B | 6.7 |
| 124 | 590 | 56 | TYPE_C | 4.0 |
| 66 | 648 | 34 | TYPE_A | 4.1 |

Meal data (first rows):

| meal_id | category | cuisine |
|---|---|---|
| 1885 | Beverages | Thai |
| 1993 | Beverages | Thai |
| 2539 | Beverages | Thai |
| 1248 | Beverages | Indian |

Test data (first rows):

| id | week | center_id | meal_id | checkout_price | base_price | emailer_for_promotion | homepage_featured |
|---|---|---|---|---|---|---|---|
| 1028232 | 146 | 55 | 1885 | 158.11 | 159.11 | 0 | 0 |
| 1127204 | 146 | 55 | 1993 | 160.11 | 159.11 | 0 | 0 |
| 1212707 | 146 | 55 | 2539 | 157.14 | 159.14 | 0 | 0 |
| 1082698 | 146 | 55 | 2631 | 162.02 | 162.02 | 0 | 0 |

Train data (first rows):

| id | week | center_id | meal_id | checkout_price | base_price | emailer_for_promotion | homepage_featured | num_orders |
|---|---|---|---|---|---|---|---|---|
| 1379560 | 1 | 55 | 1885 | 136.83 | 152.29 | 0 | 0 | 177 |
| 1466964 | 1 | 55 | 1993 | 136.83 | 135.83 | 0 | 0 | 270 |
| 1346989 | 1 | 55 | 2539 | 134.86 | 135.86 | 0 | 0 | 189 |
| 1338232 | 1 | 55 | 2139 | 339.50 | 437.53 | 0 | 0 | 54 |

Findings:
1. The store (center) and meal data can be connected to train by center_id and meal_id, respectively.
2. The id column in train seems to be unique.
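
Both findings can be checked with a few lines of R. The sketch below is only an illustration, assuming dplyr is available (the `summarise()` output in this analysis suggests it is). The names `center`, `meal`, `joined` and `id_is_unique` are made up for the example, and the data frames hold just the preview rows shown above, with some train columns omitted for brevity.

```r
library(dplyr)

# Preview rows copied from the tables above (not the full files).
train <- data.frame(
  id         = c(1379560, 1466964, 1346989, 1338232),
  week       = c(1, 1, 1, 1),
  center_id  = c(55, 55, 55, 55),
  meal_id    = c(1885, 1993, 2539, 2139),
  num_orders = c(177, 270, 189, 54)
)
meal <- data.frame(
  meal_id  = c(1885, 1993, 2539, 1248),
  category = "Beverages",
  cuisine  = c("Thai", "Thai", "Thai", "Indian")
)
center <- data.frame(
  center_id   = c(11, 13, 124, 66),
  city_code   = c(679, 590, 590, 648),
  region_code = c(56, 56, 56, 34),
  center_type = c("TYPE_A", "TYPE_B", "TYPE_C", "TYPE_A"),
  op_area     = c(3.7, 6.7, 4.0, 4.1)
)

# Left joins keep every train row; keys without a match get NA
# in the columns that come from the joined table.
joined <- train %>%
  left_join(center, by = "center_id") %>%
  left_join(meal, by = "meal_id")

# id is unique when there are as many distinct ids as rows.
id_is_unique <- n_distinct(train$id) == nrow(train)

print(joined)
print(id_is_unique)
```

With only the preview rows, center 55 and meal 2139 have no match, so their joined columns come out as NA. That reflects the tiny excerpt used here; with the complete tables the same joins would presumably fill them in.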
## `summarise()` has grouped output by 'center_id'. You can override using the
## `.groups` argument.
## # A tibble: 3,597 × 3
## # Groups: center_id [77]
## center_id meal_id count
## <int> <int> <int>
## 1 101 1571 1
## 2 145 1571 1
## 3 145 2104 1
## 4 24 1248 3
## 5 41 1248 3
## 6 92 2577 3
## 7 97 2956 3
## 8 139 2577 3
## 9 24 2492 4
## 10 39 2956 4
## # … with 3,587 more rows
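
The output above looks like a count of train rows per center-meal pair, sorted from the rarest pair upward. The original code is not shown, so the following is a hedged reconstruction rather than the author's exact call. `train_toy` is a hypothetical stand-in with made-up week values, built so that its pair counts mirror three rows of that output.

```r
library(dplyr)

# Toy stand-in for train: key columns only, week values are invented.
train_toy <- data.frame(
  week      = c(1, 2, 3, 1, 1),
  center_id = c(24, 24, 24, 101, 145),
  meal_id   = c(1248, 1248, 1248, 1571, 1571)
)

# Count rows per center-meal pair, then sort so the rarest pairs come first.
# .groups = "drop" returns an ungrouped result and silences the grouping message.
pair_counts <- train_toy %>%
  group_by(center_id, meal_id) %>%
  summarise(count = n(), .groups = "drop") %>%
  arrange(count)

print(pair_counts)
```

If train covers weeks 1 to 145, which the forecast horizon of weeks 146-155 suggests, a pair with a count well below 145 is absent from most weeks.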
## id week center_id
## 0 0 0
## meal_id checkout_price base_price
## 0 0 0
## emailer_for_promotion homepage_featured num_orders
## 0 0 0
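
Findings:
1. There are a lot of missing records: many center-meal combinations appear in only a handful of weeks (some in just one), so their weekly series have gaps.
2. There are no NAs in the data set.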
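
A per-column NA tally like the all-zero output above can be produced in base R with `colSums(is.na(...))`. This is a minimal sketch, assuming that is roughly how the check was done. The data frame holds only the train preview rows, with a subset of columns for brevity, and `na_per_column` is an illustrative name.

```r
# Preview rows of train copied from the table above (subset of columns).
train <- data.frame(
  id             = c(1379560, 1466964, 1346989, 1338232),
  week           = c(1, 1, 1, 1),
  checkout_price = c(136.83, 136.83, 134.86, 339.50),
  num_orders     = c(177, 270, 189, 54)
)

# is.na() flags each missing cell; colSums() totals the flags per column.
na_per_column <- colSums(is.na(train))

print(na_per_column)
```

This tally only counts cells that are explicitly NA. Weeks in which a center-meal pair has no row at all do not show up here, which is why the pair counts are needed as a separate check.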