Introduction

Instacart is an American company that operates a same-day grocery delivery service. Customers select groceries through a web application, and the orders are fulfilled and delivered by personal shoppers from various retailers. With its large customer base, the company collects data on users’ transaction behaviour and purchase history. Instacart has published anonymized data on customer orders collected over time. The dataset consists of more than 3 million grocery orders from more than 200,000 Instacart users.

In this report, exploratory data analysis was performed to derive insights from the customer data.

Data Understanding

Orders

The orders table contains one row per unique order_id, together with the user_id of the customer who placed it. order_number gives the position of the order in that user's order sequence. eval_set indicates whether the order is a prior, train, or test order: all but the last order of every user are classified as prior, and the last order of every user is classified as either train or test. order_dow gives the day of the week and order_hour_of_day the hour of the day at which the order was placed. days_since_prior_order gives the number of days between the order and the user's previous order, and is NULL (shown as NA) for the first order of every user. There are more than 3 million order_ids for more than 200,000 distinct users.

order_id  user_id  eval_set  order_number  order_dow  order_hour_of_day  days_since_prior_order
2539329   1        prior     1             2          8                  NA
2398795   1        prior     2             3          7                  15
473747    1        prior     3             3          12                 21
2254736   1        prior     4             4          7                  29
431534    1        prior     5             4          15                 28
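The eval_set and days_since_prior_order conventions described above can be checked user by user. Below is a minimal Python sketch. It assumes the orders table has been loaded as plain tuples in the column order shown, with None standing for NA. The function name check_orders is hypothetical and is not part of the original analysis.

```python
from collections import defaultdict

# Sample rows from the orders table above; None stands for NA.
# Assumption: tuples follow the table's column order.
ORDERS = [
    # (order_id, user_id, eval_set, order_number, order_dow, order_hour_of_day, days_since_prior_order)
    (2539329, 1, "prior", 1, 2, 8, None),
    (2398795, 1, "prior", 2, 3, 7, 15),
    (473747, 1, "prior", 3, 3, 12, 21),
    (2254736, 1, "prior", 4, 4, 7, 29),
    (431534, 1, "prior", 5, 4, 15, 28),
]


def check_orders(rows):
    """Hypothetical helper: return a list of convention violations (empty if none)."""
    problems = []
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[1]].append(row)
    for user_id, user_rows in by_user.items():
        user_rows.sort(key=lambda r: r[3])  # sort by order_number
        for order_id, _, _, order_number, _, _, days in user_rows:
            # Only a user's first order should lack days_since_prior_order.
            if (order_number == 1) != (days is None):
                problems.append(f"order {order_id}: unexpected days_since_prior_order")
        *earlier, last = user_rows
        # All but the last order are prior; the last is train or test.
        if any(r[2] != "prior" for r in earlier):
            problems.append(f"user {user_id}: non-final order not labelled prior")
        if last[2] not in ("train", "test"):
            problems.append(f"user {user_id}: last order not labelled train/test")
    return problems
```

On the five sample rows alone, check_orders flags user 1's last order as not labelled train or test. That happens because these rows are only the first five of that user's orders, so the user's final train/test order is not among them.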

Prior Orders

The prior table lists the product_id of every product in each prior order_id, so it shows which products were included in every order. add_to_cart_order gives the position at which the customer added each product to their shopping cart. The reordered column is coded 1 if the customer had ordered that product previously and 0 otherwise. It is the largest table, with over 32 million rows.

order_id  product_id  add_to_cart_order  reordered
2         33120       1                  1
2         28985       2                  1
2         9327        3                  0
2         45918       4                  1
2         30035       5                  0
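The coding of the reordered column described above can be reproduced from a user's order history. Below is a minimal sketch. It assumes one user's orders are available as lists of product_ids sorted by order_number. The function reordered_flags and the example baskets are hypothetical illustrations, not Instacart's own pipeline.

```python
def reordered_flags(baskets):
    """Hypothetical helper: given one user's baskets in order_number order,
    return the reordered flag (1 or 0) for every product in every basket."""
    seen = set()  # products this user ordered in earlier orders
    flags = []
    for basket in baskets:
        # A product counts as reordered only if it appeared in an earlier basket.
        flags.append([1 if product_id in seen else 0 for product_id in basket])
        seen.update(basket)
    return flags
```

For example, reordered_flags([[33120, 28985], [33120, 9327], [9327, 45918, 28985]]) returns [[0, 0], [1, 0], [1, 0, 1]]. A product's first appearance is always coded 0.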

Products

The products table contains product_id, product_name, aisle_id and department_id for each product.

product_id  product_name                                                       aisle_id  department_id
1           Chocolate Sandwich Cookies                                         61        19
2           All-Seasons Salt                                                   104       13
3           Robust Golden Unsweetened Oolong Tea                               94        7
4           Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce  38        1
5           Green Chile Anytime Sauce                                          5         13

Aisles

The aisles table contains aisle_id and the name of the aisle in which each product is placed.

aisle_id  aisle
1         prepared soups salads
2         specialty cheeses
3         energy granola bars
4         instant foods
5         marinades meat preparation

Departments

The departments table contains department_id and the department name.

department_id  department
1              frozen
2              other
3              bakery
4              produce
5              alcohol
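The tables above are linked through their id columns. product_id connects the prior table to products, and aisle_id and department_id connect products to aisles and departments. Below is a minimal left-join sketch in Python. It assumes each lookup table has been loaded into a dict keyed by its id. The name describe_product is hypothetical, and the dicts hold only the sample rows shown, so some lookups come back empty.

```python
# Lookup dicts built from the sample rows shown above.
# Assumption: the full tables would be loaded the same way.
PRODUCTS = {
    1: ("Chocolate Sandwich Cookies", 61, 19),
    2: ("All-Seasons Salt", 104, 13),
    3: ("Robust Golden Unsweetened Oolong Tea", 94, 7),
    4: ("Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce", 38, 1),
    5: ("Green Chile Anytime Sauce", 5, 13),
}
AISLES = {
    1: "prepared soups salads",
    2: "specialty cheeses",
    3: "energy granola bars",
    4: "instant foods",
    5: "marinades meat preparation",
}
DEPARTMENTS = {1: "frozen", 2: "other", 3: "bakery", 4: "produce", 5: "alcohol"}


def describe_product(product_id, products=PRODUCTS, aisles=AISLES, departments=DEPARTMENTS):
    """Hypothetical helper: resolve a product_id to (product_name, aisle, department).

    Behaves like a left join: aisle or department is None when its id is
    missing from the lookup dict (as happens with these partial samples)."""
    product_name, aisle_id, department_id = products[product_id]
    return product_name, aisles.get(aisle_id), departments.get(department_id)
```

With these samples, describe_product(5) resolves the aisle ("marinades meat preparation") but not department 13, which is absent from the five department rows shown.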

Data Preparation

  • Apart from days_since_prior_order, which is NA by design for the first order of every user, there are no missing values in the dataset. There are no obvious outliers, as every user has placed at least 4 orders with multiple products in each order.

  • Some initial data processing was done to recode categorical variables such as order_hour_of_day, product_name, aisle and department as factors.
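The missing-value check, the minimum-orders check and the recoding described above can be sketched with the Python standard library. This sketch assumes the orders table is available as tuples with None for NA. to_factor is a hypothetical, rough analogue of R's factor(): it returns sorted levels and 0-based integer codes, whereas R's codes are 1-based.

```python
from collections import Counter

COLUMNS = ("order_id", "user_id", "eval_set", "order_number",
           "order_dow", "order_hour_of_day", "days_since_prior_order")

# Sample orders rows; None stands for NA.
ORDERS = [
    (2539329, 1, "prior", 1, 2, 8, None),
    (2398795, 1, "prior", 2, 3, 7, 15),
    (473747, 1, "prior", 3, 3, 12, 21),
    (2254736, 1, "prior", 4, 4, 7, 29),
    (431534, 1, "prior", 5, 4, 15, 28),
]


def missing_by_column(rows, columns=COLUMNS):
    """Count missing (None) values in each column."""
    counts = Counter()
    for row in rows:
        for name, value in zip(columns, row):
            if value is None:
                counts[name] += 1
    return counts


def min_orders_per_user(rows, user_index=1):
    """Smallest number of orders placed by any single user."""
    return min(Counter(row[user_index] for row in rows).values())


def to_factor(values):
    """Hypothetical analogue of R's factor(): sorted levels plus 0-based codes."""
    levels = sorted(set(values))
    position = {level: i for i, level in enumerate(levels)}
    return levels, [position[v] for v in values]
```

On these rows, missing_by_column finds a single missing value, in days_since_prior_order. to_factor([row[5] for row in ORDERS]) returns levels [7, 8, 12, 15] with codes [1, 0, 2, 0, 3].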

Exploratory Data Analysis

The exploratory data analysis was divided into three parts:

  • Order level Analysis
  • Product level Analysis
  • Product Portfolio

Order level Analysis

Order level analysis was performed to answer questions such as: When do people order? When do they order again? How many prior orders do they have? How many items do they buy?

  • Most orders are placed on Sunday and Monday, between 9:00 AM and 6:00 PM.
  • On every day of the week, most orders are placed between 9:00 AM and 6:00 PM.
  • There appear to be two groups of customers: those who reorder after 7 days and those who reorder after 30 days. Because days_since_prior_order is capped at 30 in this dataset, the 30-day peak also includes longer gaps between orders.
  • Every user has at least 3 prior orders, and people most often order around 5 items.
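The statistics behind these findings are frequency counts. Below is a minimal sketch. It assumes order_dow, order_hour_of_day and days_since_prior_order values are available as Python lists (None for NA), and the prior table as (order_id, product_id) pairs. The helper names are hypothetical.

```python
from collections import Counter


def most_common_values(values, top=2):
    """The `top` most frequent non-missing values; ties keep first-seen order."""
    counts = Counter(v for v in values if v is not None)
    return [value for value, _ in counts.most_common(top)]


def typical_basket_size(order_products):
    """Most common number of items per order, from (order_id, product_id) pairs."""
    items_per_order = Counter(order_id for order_id, _ in order_products)
    return most_common_values(items_per_order.values(), top=1)[0]


def prior_order_counts(orders):
    """Number of prior orders per user, from (user_id, eval_set) pairs."""
    return Counter(user_id for user_id, eval_set in orders if eval_set == "prior")
```

For instance, most_common_values([2, 3, 3, 4, 4]) returns [3, 4]. On the full data, min(prior_order_counts(...).values()) gives the minimum number of prior orders per user.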


Product level Analysis

  • Banana and Bag of Organic Bananas are the most frequently ordered products.
  • Around 60% of ordered items are reorders.
  • Among the top 10 products with the highest probability of being reordered, 2% Lactose free milk and Organic Low fat milk have the highest reorder probability.
  • When Multifold Towels are ordered, they are the first item added to the cart around 66% of the time.
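All three product-level measures can be computed from the prior table's add_to_cart_order and reordered columns: the overall reorder share, the per-product reorder probability and the first-in-cart share. Below is a minimal sketch. It assumes rows are (order_id, product_id, add_to_cart_order, reordered) tuples, and that a product's reorder probability is the mean of its reordered flags. The function names are hypothetical. The sketch does not filter out rarely ordered products, which a top-10 ranking may need.

```python
from collections import defaultdict


def reorder_stats(rows):
    """Hypothetical helper. Returns (overall reorder share,
    per-product reorder probability,
    per-product share of its orders in which it was added to the cart first)."""
    times_ordered = defaultdict(int)
    times_reordered = defaultdict(int)
    times_first = defaultdict(int)
    for _order_id, product_id, cart_position, reordered in rows:
        times_ordered[product_id] += 1
        times_reordered[product_id] += reordered
        times_first[product_id] += int(cart_position == 1)
    overall = sum(times_reordered.values()) / sum(times_ordered.values())
    reorder_prob = {p: times_reordered[p] / n for p, n in times_ordered.items()}
    first_share = {p: times_first[p] / n for p, n in times_ordered.items()}
    return overall, reorder_prob, first_share


def top_products(scores, k=10):
    """The k product_ids with the highest score (ties keep input order)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

On the five sample rows shown for the prior table, the overall reorder share is 0.6. Product 33120 is the only one with a first-in-cart share of 1.0, since it is the only product added first.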

Product Portfolio

  • The products are organised into 21 departments containing 134 aisles. The organisation of aisles within departments is shown below. The personal care and edible item departments have the largest number of product offerings.
  • The size of each box shows the number of products in each category (department/aisle). The candy chocolate and ice cream ice aisles offer the widest variety of products.
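The box sizes in such a view are simply product counts per department and per aisle. Below is a minimal sketch. It assumes the products table is loaded as (product_id, product_name, aisle_id, department_id) tuples. The function name portfolio_counts is hypothetical.

```python
from collections import Counter

# Sample products rows from the products table above.
PRODUCTS = [
    (1, "Chocolate Sandwich Cookies", 61, 19),
    (2, "All-Seasons Salt", 104, 13),
    (3, "Robust Golden Unsweetened Oolong Tea", 94, 7),
    (4, "Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce", 38, 1),
    (5, "Green Chile Anytime Sauce", 5, 13),
]


def portfolio_counts(products):
    """Hypothetical helper: product counts per department_id and per
    (department_id, aisle_id), plus distinct aisles per department."""
    per_department = Counter(dept for _, _, _, dept in products)
    per_aisle = Counter((dept, aisle) for _, _, aisle, dept in products)
    aisles_per_department = Counter(dept for dept, _ in per_aisle)
    return per_department, per_aisle, aisles_per_department
```

sum(aisles_per_department.values()) counts department-aisle pairs. This equals the number of aisles when each aisle belongs to a single department.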

Conclusion

  • Most orders are placed on Sunday and Monday, between 9:00 AM and 6:00 PM.
  • Customers appear to fall into two groups: those who reorder after 7 days and those who reorder after 30 days.
  • Every user has at least 3 prior orders, and people most often order around 5 items.
  • Banana and Bag of Organic Bananas are the most frequently ordered products.
  • Around 60% of ordered items are reorders.
  • When Multifold Towels are ordered, they are the first item added to the cart around 66% of the time.

Market basket analysis was also performed to propose recommendations on store layout and marketing, and on catalogue arrangement. Associations between products and between aisles were identified using the Apriori algorithm. The application of the Apriori algorithm, the interpretation of its results and the resulting recommendations are explained in a separate analysis report.
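The core idea of Apriori is that an itemset can only be frequent if all of its subsets are frequent. Candidate pairs are therefore built only from frequent single items. Below is a minimal Python sketch of that idea, restricted to item pairs. The threshold min_support and the example baskets are hypothetical, and this is not the implementation used in the separate report.

```python
from collections import Counter
from itertools import combinations


def apriori_pairs(baskets, min_support):
    """Frequent item pairs via two Apriori passes (hypothetical helper).

    baskets: list of sets of items (for example aisle names).
    Returns {(a, b): (support, confidence a->b, confidence b->a)} for pairs
    whose support is at least min_support.
    """
    n = len(baskets)
    # Pass 1: support of single items; only frequent items may form pairs.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent = {item for item, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count candidate pairs built from frequent items only (Apriori pruning).
    pair_counts = Counter()
    for basket in baskets:
        pair_counts.update(combinations(sorted(basket & frequent), 2))
    result = {}
    for (a, b), c in pair_counts.items():
        if c / n >= min_support:
            result[(a, b)] = (c / n, c / item_counts[a], c / item_counts[b])
    return result


# Hypothetical example baskets, not taken from the Instacart data.
EXAMPLE_BASKETS = [{"banana", "milk"}, {"banana", "milk", "bread"}, {"banana", "bread"}, {"milk"}]
```

With min_support=0.5, the example baskets yield two pairs: (banana, bread) and (banana, milk). Every basket containing bread also contains banana, so the confidence of bread -> banana is 1.0.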