In this RMD file, I explore basic information about the given dataset. The dataset is a set of files describing Instacart customers’ orders over time.

Objective:

The goal is to predict which products will be in a user’s next order. The dataset is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, between 4 and 100 of their orders are given, with the sequence of products purchased in each order.

When do people order?

Hour of Day - There is a clear effect of hour of day on order volume. Most orders are placed between 8:00 and 18:00.

Day of Week - There is a clear effect of day of the week on order volume. Most orders are placed on days 0 and 1.
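The two counts above can be reproduced with a simple tabulation. The following is a minimal base R sketch on a toy data frame; the column names `order_hour_of_day` and `order_dow` are assumptions about the orders table and are not taken from the text above.

```r
# Toy stand-in for the orders table (hypothetical values).
# Column names order_hour_of_day and order_dow are assumptions.
orders <- data.frame(
  order_id          = 1:8,
  order_hour_of_day = c(9, 10, 10, 14, 15, 10, 22, 3),
  order_dow         = c(0, 0, 1, 1, 0, 3, 5, 0)
)

# Count orders per hour of day and per day of week.
orders_by_hour <- table(orders$order_hour_of_day)
orders_by_dow  <- table(orders$order_dow)

# Share of orders placed in the 8:00-18:00 window.
share_daytime <- mean(orders$order_hour_of_day >= 8 &
                      orders$order_hour_of_day <= 18)
```

Passing either table to `barplot()` gives a bar chart of order volume by hour or by day.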

When do they order again?

People seem to order again most often after exactly 1 week.
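One way to check for a weekly pattern is to tabulate the gap between consecutive orders. This is a sketch on toy data, assuming a column named `days_since_prior_order` that is `NA` for a user's first order; both the name and the `NA` convention are assumptions here.

```r
# Toy stand-in for the orders table (hypothetical values).
# days_since_prior_order is an assumed column name; NA marks a first order.
orders <- data.frame(
  order_id               = 1:7,
  days_since_prior_order = c(NA, 7, 7, 14, 7, 30, 3)
)

# First orders have no predecessor, so drop NA values before counting.
gaps <- orders$days_since_prior_order[!is.na(orders$days_since_prior_order)]

# Frequency of each gap length, and the most frequent gap in days.
gap_counts      <- table(gaps)
most_common_gap <- as.numeric(names(gap_counts)[which.max(gap_counts)])
```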

How many prior orders are there?

How many items do people buy?

Bestsellers

Let’s have a look at which products are sold most often (top 30).

And the clear winner is: Bananas
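A bestseller ranking like this can be built by joining the order lines to the product names and counting. The sketch below uses toy data in base R; the table names `order_products` and `products` and their columns are assumptions for illustration.

```r
# Toy order lines and product lookup (hypothetical values and names).
order_products <- data.frame(
  order_id   = c(1, 1, 2, 2, 3, 3, 3),
  product_id = c(10, 20, 10, 30, 10, 20, 30)
)
products <- data.frame(
  product_id   = c(10, 20, 30),
  product_name = c("Banana", "Organic Strawberries", "Whole Milk")
)

# Attach product names to each order line, then count lines per product.
sold   <- merge(order_products, products, by = "product_id")
counts <- sort(table(sold$product_name), decreasing = TRUE)

# Keep at most the 30 most frequently sold products.
top_products <- head(counts, 30)
```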

A look at the products which are reordered the most (top 10)

A look at the aisles where reordering happens the most (top 30)

A look at the aisles which are reordered the most and their reorder percentage (top 20)
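The reorder percentage per aisle can be computed as the mean of a 0/1 reorder flag within each aisle. This is a sketch under stated assumptions: a `reordered` flag on each order line and an `aisle` name attached directly to each product (in practice the aisle name may live in a separate lookup table). All names and values are hypothetical.

```r
# Toy order lines with an assumed 0/1 reordered flag.
order_products <- data.frame(
  product_id = c(10, 10, 10, 20, 20, 30, 30, 30),
  reordered  = c(1, 1, 0, 0, 0, 1, 1, 1)
)
# Toy product lookup; the aisle name is stored here for simplicity.
products <- data.frame(
  product_id = c(10, 20, 30),
  aisle      = c("fresh fruits", "spices seasonings", "milk")
)

# Join, then average the reordered flag within each aisle.
joined           <- merge(order_products, products, by = "product_id")
reorder_by_aisle <- aggregate(reordered ~ aisle, data = joined, FUN = mean)
reorder_by_aisle$reorder_pct <- 100 * reorder_by_aisle$reordered

# Sort from highest to lowest reorder percentage and keep at most 20 aisles.
reorder_by_aisle <- reorder_by_aisle[order(-reorder_by_aisle$reorder_pct), ]
top_aisles       <- head(reorder_by_aisle, 20)
```

Replacing `mean` with `sum` gives the raw reorder count per aisle instead of the percentage.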

Visualizing the Product Portfolio

How many unique products are offered in each department/aisle?
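Counting unique products per department, and per aisle within a department, can be sketched as follows. The `products` table and its `department` and `aisle` columns are assumptions, and the rows are toy values.

```r
# Toy product catalogue (hypothetical names and values).
products <- data.frame(
  product_id = 1:6,
  department = c("produce", "produce", "produce",
                 "dairy eggs", "dairy eggs", "pantry"),
  aisle      = c("fresh fruits", "fresh fruits", "fresh vegetables",
                 "milk", "yogurt", "spices seasonings")
)

# Helper: number of distinct values in a vector.
n_unique <- function(x) length(unique(x))

# Unique products per department.
per_department <- aggregate(product_id ~ department, data = products,
                            FUN = n_unique)
names(per_department)[2] <- "n_products"

# Unique products per aisle, keeping the parent department.
per_aisle <- aggregate(product_id ~ department + aisle, data = products,
                       FUN = n_unique)
names(per_aisle)[3] <- "n_products"
```

The `per_aisle` table, with its department and aisle columns, is the kind of input a treemap of the product portfolio would need.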

How are aisles organized within departments?