Instacart is an American company that operates a same-day grocery delivery service. Customers select groceries through a web application, and personal shoppers review each order and deliver it from various retailers. With its large customer base, the company collects data on users' transaction behaviour and purchasing history. Instacart published anonymized data on customer orders collected over time. The dataset consists of more than 3 million grocery orders from more than 200,000 Instacart users.
In this report, exploratory data analysis was performed to derive insights from the customer data.
The orders data set contains a unique order_id for each order placed by a user. order_number gives the sequence number of the order for that user. eval_set denotes whether the order is a prior, train, or test order: all but the last order of every user are classified as prior, and the last order of every user is classified as either train or test. order_dow gives the day of the week and order_hour_of_day gives the hour of the day at which the order was placed. days_since_prior_order gives the number of days since the user's previous order and is NULL (shown as NA in the sample) for the first order of every user. There are more than 3 million order_id values for more than 200,000 different users.
order_id | user_id | eval_set | order_number | order_dow | order_hour_of_day | days_since_prior_order |
---|---|---|---|---|---|---|
2539329 | 1 | prior | 1 | 2 | 8 | NA |
2398795 | 1 | prior | 2 | 3 | 7 | 15 |
473747 | 1 | prior | 3 | 3 | 12 | 21 |
2254736 | 1 | prior | 4 | 4 | 7 | 29 |
431534 | 1 | prior | 5 | 4 | 15 | 28 |
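
The rule that days_since_prior_order is missing only for a user's first order can be checked directly. The following is a minimal sketch in Python using only the standard library; the choice of language, the CSV layout, and the helper names `parse_orders` and `first_orders_have_no_gap` are assumptions for illustration, not part of the original analysis. It uses the five sample rows shown above.

```python
import csv
import io

# The five sample rows shown above, written as CSV text (NA marks a missing value).
SAMPLE_ORDERS = """order_id,user_id,eval_set,order_number,order_dow,order_hour_of_day,days_since_prior_order
2539329,1,prior,1,2,8,NA
2398795,1,prior,2,3,7,15
473747,1,prior,3,3,12,21
2254736,1,prior,4,4,7,29
431534,1,prior,5,4,15,28
"""


def parse_orders(text):
    """Parse orders CSV text into dicts with typed fields; NA or empty becomes None."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        days = raw["days_since_prior_order"]
        rows.append({
            "order_id": int(raw["order_id"]),
            "user_id": int(raw["user_id"]),
            "eval_set": raw["eval_set"],
            "order_number": int(raw["order_number"]),
            "order_dow": int(raw["order_dow"]),
            "order_hour_of_day": int(raw["order_hour_of_day"]),
            "days_since_prior_order": None if days in ("", "NA") else float(days),
        })
    return rows


def first_orders_have_no_gap(rows):
    """True if the gap is missing exactly on rows where order_number is 1."""
    return all(
        (r["days_since_prior_order"] is None) == (r["order_number"] == 1)
        for r in rows
    )


orders = parse_orders(SAMPLE_ORDERS)
print(first_orders_have_no_gap(orders))
```

On the full data set the same check would be run over every row of the orders file rather than the embedded sample.
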
The prior table contains the product_id values for every order_id, and so gives information about the products included in every order. add_to_cart_order gives the position in which the customer added each product to the shopping cart. Each product_id is coded as 1 in the reordered column if the customer had ordered it previously and 0 otherwise. It is the largest table, with over 32 million rows of data.
order_id | product_id | add_to_cart_order | reordered |
---|---|---|---|
2 | 33120 | 1 | 1 |
2 | 28985 | 2 | 1 |
2 | 9327 | 3 | 0 |
2 | 45918 | 4 | 1 |
2 | 30035 | 5 | 0 |
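
As an illustration of how the reordered flag can be summarised, here is a small Python sketch (standard library only) that computes the share of reordered products for the sample order above. The tuple layout and the function name `reorder_rate` are assumptions made for this example.

```python
# Sample rows from the prior table above:
# (order_id, product_id, add_to_cart_order, reordered)
sample_prior = [
    (2, 33120, 1, 1),
    (2, 28985, 2, 1),
    (2, 9327, 3, 0),
    (2, 45918, 4, 1),
    (2, 30035, 5, 0),
]


def reorder_rate(rows):
    """Fraction of product rows flagged as reordered (0.0 for an empty input)."""
    if not rows:
        return 0.0
    return sum(reordered for _, _, _, reordered in rows) / len(rows)


# Product that was placed in the cart first, using add_to_cart_order.
first_added = min(sample_prior, key=lambda row: row[2])[1]

print(reorder_rate(sample_prior))  # 3 of the 5 sample products are reorders
print(first_added)
```
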
product_id | product_name | aisle_id | department_id |
---|---|---|---|
1 | Chocolate Sandwich Cookies | 61 | 19 |
2 | All-Seasons Salt | 104 | 13 |
3 | Robust Golden Unsweetened Oolong Tea | 94 | 7 |
4 | Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce | 38 | 1 |
5 | Green Chile Anytime Sauce | 5 | 13 |
aisle_id | aisle |
---|---|
1 | prepared soups salads |
2 | specialty cheeses |
3 | energy granola bars |
4 | instant foods |
5 | marinades meat preparation |
department_id | department |
---|---|
1 | frozen |
2 | other |
3 | bakery |
4 | produce |
5 | alcohol |
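
The products table links to the aisle and department tables through aisle_id and department_id. A minimal Python sketch of that lookup is shown below; the function name `describe_products` is hypothetical. Because only the first five rows of each table are reproduced here, some ids in the product sample have no match and resolve to `None`.

```python
# Sample rows copied from the tables above; the full tables are larger.
products = [
    (1, "Chocolate Sandwich Cookies", 61, 19),
    (2, "All-Seasons Salt", 104, 13),
    (3, "Robust Golden Unsweetened Oolong Tea", 94, 7),
    (4, "Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce", 38, 1),
    (5, "Green Chile Anytime Sauce", 5, 13),
]
aisles = {
    1: "prepared soups salads",
    2: "specialty cheeses",
    3: "energy granola bars",
    4: "instant foods",
    5: "marinades meat preparation",
}
departments = {1: "frozen", 2: "other", 3: "bakery", 4: "produce", 5: "alcohol"}


def describe_products(products, aisles, departments):
    """Attach aisle and department names to each product (None if the id is unknown)."""
    return [
        {
            "product_id": product_id,
            "product_name": name,
            "aisle": aisles.get(aisle_id),
            "department": departments.get(department_id),
        }
        for product_id, name, aisle_id, department_id in products
    ]


catalog = describe_products(products, aisles, departments)
for row in catalog:
    print(row)
```
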
Apart from days_since_prior_order, which is NULL for the first order of every user, there are no missing values in the dataset. There are no obvious outliers, as every user has placed at least 4 orders with multiple products in each order.
Some initial data processing was done to recode categorical variables as factors, such as order hour of day, product name, aisle, and department.
Exploratory data analysis was divided into 3 parts:
Order-level analysis was performed to answer questions such as when people order, when they order again, how many prior orders they have placed, and how many items they buy, among others.
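
Questions of this kind reduce to simple counts and averages over the orders data. The sketch below, in Python with the standard library only, tallies orders by day of week and hour of day and averages the gap between orders, using the five sample order rows shown earlier. The variable names are illustrative, and no mapping from order_dow codes to weekday names is assumed.

```python
from collections import Counter
from statistics import mean

# Sample order rows from earlier in this report:
# (order_number, order_dow, order_hour_of_day, days_since_prior_order)
sample_orders = [
    (1, 2, 8, None),
    (2, 3, 7, 15),
    (3, 3, 12, 21),
    (4, 4, 7, 29),
    (5, 4, 15, 28),
]

# When do people order? Count orders per day-of-week code and per hour.
orders_per_dow = Counter(dow for _, dow, _, _ in sample_orders)
orders_per_hour = Counter(hour for _, _, hour, _ in sample_orders)

# When do they order again? Average the gaps, skipping the missing first-order value.
gaps = [days for _, _, _, days in sample_orders if days is not None]
mean_gap = mean(gaps)

print(dict(orders_per_dow))
print(dict(orders_per_hour))
print(mean_gap)
```
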
Market basket analysis was performed to propose recommendations for store layout, marketing, and catalogue arrangement. The associations between different products and aisles were found using the Apriori algorithm. The application of the Apriori algorithm, its interpretation, and the resulting recommendations are explained in a separate analysis report.
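
For readers unfamiliar with the idea, the following Python sketch shows the first two levels of an Apriori-style search: items below a minimum support are pruned before candidate pairs are counted, and a confidence value is then computed for one rule. It is only an illustration under stated assumptions. The baskets are hypothetical toy data, not Instacart orders, and the function names and the 0.5 threshold are chosen for the example rather than taken from the report's actual analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy baskets, NOT taken from the Instacart data.
baskets = [
    {"banana", "milk", "bread"},
    {"banana", "milk"},
    {"banana", "yogurt"},
    {"milk", "bread"},
    {"banana", "milk", "yogurt"},
]


def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs meeting min_support.

    Apriori pruning: a pair can only be frequent if both of its items are
    frequent, so infrequent items are dropped before pairs are counted.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(basket & frequent_items)
        pair_counts.update(combinations(kept, 2))
    return {p: c / n for p, c in pair_counts.items() if c / n >= min_support}


def confidence(baskets, antecedent, consequent):
    """Share of baskets containing the antecedent that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


pairs = frequent_pairs(baskets, min_support=0.5)
print(pairs)
print(confidence(baskets, "banana", "milk"))
```

In this toy data, bread and yogurt each appear in only 2 of 5 baskets, so they are pruned at the 0.5 threshold and no pair containing them is ever counted.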