In this RMD file, I explore the basic information about the given dataset. The dataset is a set of files describing Instacart customers’ orders over time.
Objective:
The goal is to predict which products will be in a user’s next order. The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, between 4 and 100 of their orders are given, with the sequence of products purchased in each order.
When do people order?
Hour of Day - There is a clear effect of hour of day on order volume. Most orders are placed between 8:00 and 18:00.
Day of Week - There is a clear effect of day of the week on order volume. Most orders are placed on days 0 and 1.
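The hour-of-day and day-of-week counts come down to a simple tabulation. Here is a minimal sketch in base R, not the code behind the original plots. It assumes an `orders` data frame with the columns `order_hour_of_day` and `order_dow` (the names used in the public Instacart `orders.csv`), and uses a tiny made-up sample instead of the real file.

```r
# Toy stand-in for orders.csv (made-up values, for illustration only).
orders <- data.frame(
  order_id          = 1:8,
  order_hour_of_day = c(9, 10, 10, 14, 15, 10, 22, 9),
  order_dow         = c(0, 0, 1, 1, 0, 3, 5, 1)
)

# Number of orders for each hour of the day and each day of the week.
orders_by_hour <- table(orders$order_hour_of_day)
orders_by_dow  <- table(orders$order_dow)

# Hour with the highest order count in this sample.
busiest_hour <- as.integer(names(which.max(orders_by_hour)))

print(orders_by_hour)
print(orders_by_dow)
print(busiest_hour)
```

On the real data, the same two tables can be passed straight to `barplot()` to get the volume-by-hour and volume-by-day charts.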
When do they order again?
People seem most likely to order again exactly 1 week after their previous order.
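One way to check the weekly pattern is to tabulate the gap between consecutive orders. This is a sketch under assumptions: the column `days_since_prior_order` follows the public `orders.csv`, a user's first order is assumed to carry `NA` in that column, and the rows below are made up.

```r
# Toy stand-in for orders.csv (made-up values, for illustration only).
orders <- data.frame(
  user_id                = c(1, 1, 1, 2, 2, 2, 2),
  order_number           = c(1, 2, 3, 1, 2, 3, 4),
  days_since_prior_order = c(NA, 7, 7, NA, 3, 7, 30)
)

# A first order has no previous order, so its gap is NA and is dropped here.
gaps <- orders$days_since_prior_order[!is.na(orders$days_since_prior_order)]

# How often each gap (in days) occurs, and the most frequent gap.
gap_counts      <- table(gaps)
most_common_gap <- as.integer(names(which.max(gap_counts)))

print(gap_counts)
print(most_common_gap)
```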
How many prior orders are there?
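A rough way to answer this is to count, per user, the orders flagged as prior. The sketch below assumes an `eval_set` column with the value `"prior"` for historical orders, as in the public `orders.csv`; the data is a made-up sample.

```r
# Toy stand-in for orders.csv (made-up values, for illustration only).
orders <- data.frame(
  order_id = 1:7,
  user_id  = c(1, 1, 1, 1, 2, 2, 2),
  eval_set = c("prior", "prior", "prior", "train", "prior", "prior", "test"),
  stringsAsFactors = FALSE
)

# Keep only the historical ("prior") orders.
prior_orders <- orders[orders$eval_set == "prior", ]

# Number of prior orders for each user.
prior_per_user <- table(prior_orders$user_id)

# Distribution: how many users have a given number of prior orders.
users_by_prior_count <- table(as.vector(prior_per_user))

print(prior_per_user)
print(users_by_prior_count)
```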
How many items do people buy?
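Basket size can be estimated by counting rows per order in the order-products table. This is a minimal sketch, assuming one row per (order, product) pair with an `order_id` column as in the public `order_products__prior.csv`; the values are made up.

```r
# Toy stand-in for the order-products table (made-up values).
order_products <- data.frame(
  order_id   = c(1, 1, 1, 2, 2, 3),
  product_id = c(10, 11, 12, 10, 13, 10)
)

# One row per purchased item, so rows per order_id = items in that order.
items_per_order <- table(order_products$order_id)

# Distribution of basket sizes: how many orders contain n items.
basket_size_counts <- table(as.vector(items_per_order))

print(items_per_order)
print(summary(as.vector(items_per_order)))
print(basket_size_counts)
```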
Bestsellers
Let’s have a look at which products are sold most often (top 30).
And the clear winner is: Bananas
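A bestseller ranking like this can be built by counting rows per product and joining in the product names. The sketch below is one possible way to do it in base R, not necessarily how the original chart was produced. The column names `product_id` and `product_name` are assumed from the public files, and the products and counts are made up.

```r
# Toy stand-ins for products.csv and the order-products table (made-up values).
products <- data.frame(
  product_id   = c(10, 11, 12),
  product_name = c("Bananas", "Strawberries", "Milk"),
  stringsAsFactors = FALSE
)
order_products <- data.frame(
  order_id   = c(1, 1, 2, 2, 3, 3, 3),
  product_id = c(10, 11, 10, 12, 10, 11, 12)
)

# Count how many times each product was sold (one row = one sold item).
order_products$n_sold <- 1
counts <- aggregate(n_sold ~ product_id, data = order_products, FUN = sum)

# Attach product names and sort from most to least sold.
bestsellers <- merge(counts, products, by = "product_id")
bestsellers <- bestsellers[order(-bestsellers$n_sold), ]

# Top N products; N = 2 here, 30 for the ranking in the text.
top_products <- head(bestsellers, 2)
print(top_products)
```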
A look at the products which are reordered the most (top 10)
A look at the aisles where reordering happens the most (top 30)
A look at the aisles which are reordered the most and their reorder percentage (top 20)
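The reorder percentage of an aisle is the share of its purchased items that are flagged as reorders. The sketch below assumes a 0/1 `reordered` column in the order-products table and an `aisle_id` key linking `products` to `aisles`, as in the public files; all values are made up. The same pattern works per product by grouping on `product_id` instead.

```r
# Toy stand-ins for aisles.csv, products.csv and the order-products table.
aisles <- data.frame(
  aisle_id = c(1, 2),
  aisle    = c("fresh fruits", "milk"),
  stringsAsFactors = FALSE
)
products <- data.frame(
  product_id = c(10, 11, 12),
  aisle_id   = c(1, 1, 2)
)
order_products <- data.frame(
  product_id = c(10, 10, 11, 12, 12, 12, 10, 11),
  reordered  = c(0, 1, 0, 0, 1, 1, 1, 0)
)

# Attach the aisle of each purchased item.
joined <- merge(order_products, products, by = "product_id")

# The mean of a 0/1 flag is the share of items that were reorders.
reorder_stats <- aggregate(reordered ~ aisle_id, data = joined, FUN = mean)
names(reorder_stats)[2] <- "reorder_share"
reorder_stats$reorder_pct <- 100 * reorder_stats$reorder_share

# Add aisle names and sort from highest to lowest reorder percentage.
reorder_stats <- merge(reorder_stats, aisles, by = "aisle_id")
reorder_stats <- reorder_stats[order(-reorder_stats$reorder_pct), ]
print(reorder_stats)
```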
How many unique products are offered in each department/aisle?
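Counting distinct products per department and per aisle only needs the products table. Here is a minimal sketch, assuming `products` carries `aisle_id` and `department_id` keys and `departments` maps ids to names, as in the public files; the rows are made up.

```r
# Toy stand-ins for products.csv and departments.csv (made-up values).
products <- data.frame(
  product_id    = 1:6,
  aisle_id      = c(1, 1, 2, 3, 3, 3),
  department_id = c(1, 1, 1, 2, 2, 2)
)
departments <- data.frame(
  department_id = c(1, 2),
  department    = c("produce", "dairy eggs"),
  stringsAsFactors = FALSE
)

# Helper: number of distinct ids in a group.
n_unique <- function(ids) length(unique(ids))

# Unique products per department, with department names attached.
per_department <- aggregate(product_id ~ department_id, data = products, FUN = n_unique)
names(per_department)[2] <- "n_products"
per_department <- merge(per_department, departments, by = "department_id")

# Unique products per aisle within each department.
per_aisle <- aggregate(product_id ~ department_id + aisle_id, data = products, FUN = n_unique)
names(per_aisle)[3] <- "n_products"

print(per_department)
print(per_aisle)
```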
How are aisles organized within departments?