Introduction

Instacart is an American company that operates a same-day grocery delivery service. Customers select groceries through a web application, and personal shoppers review each order and deliver it from various retailers. With its large customer base, the company collects data on users' transaction behaviour and purchasing history. The dataset contains a sample of over 3 million grocery orders from more than 200,000 Instacart users.

In this report, Market Basket Analysis (MBA) was performed to propose recommendations in two areas:

1. Store Layout & Marketing :-

Products that frequently co-occur or are highly associated should be placed close to one another to improve the customer shopping experience. Marketing can also run targeted campaigns: customers who bought specific products can be given offers on the associated products that are likely to interest them.

2. Catalogue Arrangement :-

Associations between aisles were identified using association analysis. Highly associated aisles are recommended to be placed close together, to boost sales of products from those aisles and to reduce the time customers spend finding products across multiple aisles.

Data Understanding

Orders

The orders table lists a unique order_id for each order placed by a user. order_number gives the sequence number of the order for that user. eval_set denotes whether the order belongs to the prior, train, or test set: all but the last order of every user are classified as prior, and the last order of every user is classified as either train or test. order_dow gives the day of the week and order_hour_of_day the hour of the day of the order. days_since_prior_order gives the time elapsed since the user's previous order and is NULL (shown as NA) for the first order of every user. There are more than 3 million order_id values for more than 200,000 different users.

order_id  user_id  eval_set  order_number  order_dow  order_hour_of_day  days_since_prior_order
2539329   1        prior     1             2          8                  NA
2398795   1        prior     2             3          7                  15
473747    1        prior     3             3          12                 21
2254736   1        prior     4             4          7                  29
431534    1        prior     5             4          15                 28

Products

The products table consists of product_id, product_name, aisle_id and department_id.

product_id  product_name                                                       aisle_id  department_id
1           Chocolate Sandwich Cookies                                         61        19
2           All-Seasons Salt                                                   104       13
3           Robust Golden Unsweetened Oolong Tea                               94        7
4           Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce  38        1
5           Green Chile Anytime Sauce                                          5         13

Aisles

The aisles table consists of the aisle id and the name of the aisle in which products are placed.

aisle_id  aisle
1         prepared soups salads
2         specialty cheeses
3         energy granola bars
4         instant foods
5         marinades meat preparation

Departments

The departments table consists of the department id and the department name.

department_id  department
1              frozen
2              other
3              bakery
4              produce
5              alcohol

Data Preparation for MBA

Our goal was to identify associated products and associated aisles. Hence, data preparation was performed for both products and aisles. The Apriori algorithm was then run on both transaction datasets, followed by interpretation and analysis of the association rules.

For Products

  • A customer shopping basket was created, where each row is an order ID with the list of products purchased in that order.

  • The shopping basket was then converted to transaction data, where each order is one transaction.

  • Order ID was used as the transaction ID and product names as the items for the Apriori algorithm.

  • The transaction summary shows that the most frequently bought item is Banana, and that a transaction may contain one, two, or more items.

For Aisles

  • A customer shopping basket was created, where each row is an order ID with the list of aisles from which products were purchased in that order.

  • The shopping basket was then converted to transaction data, where each order is one transaction.

  • Order ID was used as the transaction ID and aisle names as the items for the Apriori algorithm.

  • The transaction summary shows that the aisle from which items were most frequently purchased is “Fresh Fruits”, and that a transaction may span one, two, or more aisles.
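
The basket-building steps above can be sketched roughly as follows. The report's analysis was carried out in R; this is only a small Python illustration using the standard library. The `order_rows` sample, its values, and the helpers `build_baskets` and `item_frequency` are hypothetical stand-ins for the result of joining order lines to the products and aisles tables, which is not shown in this report.

```python
from collections import Counter, defaultdict

# Hypothetical (order_id, product_name, aisle) rows; made-up sample data,
# standing in for order lines joined to the products and aisles tables.
order_rows = [
    (1, "Banana", "fresh fruits"),
    (1, "Limes", "fresh fruits"),
    (1, "Organic Cilantro", "fresh herbs"),
    (2, "Banana", "fresh fruits"),
    (3, "Whole Milk", "milk"),
    (3, "Banana", "fresh fruits"),
]

def build_baskets(rows, column):
    """Group rows by order_id; each basket is the set of distinct values in `column`."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[0]].add(row[column])
    return dict(baskets)

def item_frequency(baskets):
    """Count how many transactions contain each item."""
    return Counter(item for basket in baskets.values() for item in basket)

product_baskets = build_baskets(order_rows, 1)  # one transaction per order, by product
aisle_baskets = build_baskets(order_rows, 2)    # one transaction per order, by aisle

most_common_product = item_frequency(product_baskets).most_common(1)[0]
```

Using sets means that an aisle is counted once per order even when several products come from it, which matches the idea of one transaction per order.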

MBA:-Apriori Algorithm

Overview of Market Basket Analysis (MBA):-

Market Basket Analysis (MBA) is a modelling technique based on the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. The output of MBA is a set of rules. A rule can be simple, {A ==> B}: when a customer buys item A, it is likely that the customer also buys item B. More complex rules are also possible, such as {A, B ==> D, F}: when a customer buys items A and B, it is likely that they also buy items D and F.

Support:-
Support is the basic probability of an event occurring. For the event of buying product A, Support(A) is the number of transactions that include A divided by the total number of transactions.

Confidence:-
Confidence is a conditional probability. For a rule {A ==> B}, it is the probability that B is bought given that A has been bought, that is, Support(A and B) divided by Support(A).

Lift:-
Lift is the ratio of confidence to expected confidence. Equivalently, it is the probability of all of the items in a rule occurring together (the support of the rule) divided by the product of the probabilities of the left-hand and right-hand sides, which is the support expected if there were no association between them.

The lift value tells us how much better a rule is at predicting something than random guessing. A lift above 1 indicates a positive association, and the higher the lift, the stronger the association.
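
To make the three measures concrete, here is a minimal Python sketch on a made-up set of five transactions. The function names and the toy data are illustrative assumptions, not part of the original analysis.

```python
# Made-up toy transactions, for illustration only.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
    {"C"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """P(rhs | lhs) = support(lhs and rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """Confidence divided by the support of rhs (the expected confidence)."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

support_a = support({"A"}, transactions)           # 3 of 5 transactions
conf_a_b = confidence({"A"}, {"B"}, transactions)  # 2 of the 3 transactions with A
lift_a_b = lift({"A"}, {"B"}, transactions)
```

In this toy data, A appears in three of five transactions and two of those also contain B, so the rule {A ==> B} has a lift slightly above 1.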

The Apriori algorithm was used for frequent itemset mining and association rule learning over the transaction data. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger itemsets, as long as those itemsets appear sufficiently often in the database. The frequent itemsets determined by Apriori can then be used to derive association rules that highlight general trends in the database.
The “arules” package in R, which implements the Apriori algorithm, can be used to analyse the customer shopping baskets. Two threshold parameters are set: support, which measures how frequently a pattern occurs in the data (how often a certain set of items is bought together), and confidence, which measures how strong an association rule is.
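
The level-wise search described above can be sketched in plain Python as follows. This is an unoptimised illustration of the general Apriori idea, not the arules implementation used for the report. The function names and toy transactions are assumptions, and the rule generator is restricted to single-item right-hand sides for simplicity.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent(candidates):
        # Keep only the candidates whose support reaches the threshold.
        result = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent individual items.
    level = frequent({frozenset([i]) for t in transactions for i in t})
    all_frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent(candidates)
        all_frequent.update(level)
        k += 1
    return all_frequent

def generate_rules(frequent_itemsets, min_confidence):
    """Build (lhs, rhs, support, confidence, lift) rules, sorted by lift."""
    rules = []
    for itemset, supp in frequent_itemsets.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            rhs = frozenset([item])
            lhs = itemset - rhs
            conf = supp / frequent_itemsets[lhs]
            if conf >= min_confidence:
                rules.append((lhs, rhs, supp, conf, conf / frequent_itemsets[rhs]))
    return sorted(rules, key=lambda r: r[4], reverse=True)

# Made-up toy transactions, for illustration only.
toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
itemsets = apriori(toy, min_support=0.4)
rules = generate_rules(itemsets, min_confidence=0.7)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are, so most candidates are discarded before their support is counted.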

The Apriori algorithm was run on both the product and the aisle transaction data.

On Products

The parameters used to generate the rules were: Support = 0.005, Confidence = 0.25

Some of the generated rules were repeated or redundant (for instance, one rule being a super-rule of another), so they were pruned. A total of 27 rules remained after pruning. The first rule indicates that customers who buy Organic Cilantro are 6.21 times as likely to buy Limes as customers in general. The top 5 rules, sorted by lift, are shown below:

lhs                                            rhs                       support    confidence  lift      count
{Organic Cilantro}                             {Limes}                   0.0076748  0.2855927   6.211228  1007
{Limes}                                        {Large Lemon}             0.0121562  0.2643792   4.264159  1595
{Organic Hass Avocado,Organic Strawberries}    {Bag of Organic Bananas}  0.0054112  0.4613385   3.910321  710
{Organic Raspberries}                          {Organic Strawberries}    0.0127278  0.3011179   3.626710  1670
{Bag of Organic Bananas,Organic Hass Avocado}  {Organic Strawberries}    0.0054112  0.2933884   3.533615  710
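
As a consistency check on the table above, the definitions of support, confidence and lift can be rearranged to recover quantities that are not printed. The short Python sketch below uses only the values in the first two rows. The variable names are illustrative, and the results are approximate because the table values are rounded.

```python
# Values copied from the first two rows of the product rules table.
cilantro_limes = {"support": 0.0076748, "confidence": 0.2855927,
                  "lift": 6.211228, "count": 1007}
limes_lemon = {"support": 0.0121562, "confidence": 0.2643792,
               "lift": 4.264159, "count": 1595}

# support = count / N, so the number of transactions N = count / support.
implied_transactions = cilantro_limes["count"] / cilantro_limes["support"]

# lift = confidence / support(rhs), so support(Limes) = confidence / lift.
support_limes_from_rule1 = cilantro_limes["confidence"] / cilantro_limes["lift"]

# confidence = support(rule) / support(lhs), so support(Limes) = support / confidence.
support_limes_from_rule2 = limes_lemon["support"] / limes_lemon["confidence"]
```

Both routes give a support for Limes of about 0.046, and the count and support of the first rule imply roughly 131,000 transactions, so the reported measures are mutually consistent.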

On Aisles

The parameters used to generate the rules were: Support = 0.007, Confidence = 0.4

Some of the generated rules were repeated or redundant (for instance, one rule being a super-rule of another), so they were pruned. A total of 9361 rules remained after pruning. The first rule indicates that customers who buy from both the cereal and lunch meat aisles are 2.79 times as likely to buy from the bread aisle as customers in general. The top 5 rules, sorted by lift, are shown below:

lhs                                          rhs               support    confidence  lift      count
{cereal,lunch meat}                          {bread}           0.0076595  0.4574420   2.793210  1005
{chips pretzels,lunch meat,packaged cheese}  {bread}           0.0073699  0.4472710   2.731105  967
{lunch meat,milk,yogurt}                     {bread}           0.0080406  0.4429051   2.704446  1055
{lunch meat,milk,packaged cheese}            {bread}           0.0084826  0.4427208   2.703320  1113
{packaged cheese,preserved dips spreads}     {chips pretzels}  0.0072708  0.4760479   2.694408  954
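
The report does not state which pruning criterion was applied. One common choice, sketched below in Python as an assumption, treats a rule as redundant when a more general rule (a proper subset of its left-hand side with the same right-hand side) has confidence at least as high. The `prune_redundant` helper and the example rules with items x, y and z are made up for illustration.

```python
def prune_redundant(rules):
    """Drop rules for which a more general rule has equal or higher confidence.

    Each rule is a (lhs, rhs, confidence) tuple with lhs as a frozenset.
    """
    kept = []
    for lhs, rhs, conf in rules:
        redundant = any(
            other_rhs == rhs and other_lhs < lhs and other_conf >= conf
            for other_lhs, other_rhs, other_conf in rules
        )
        if not redundant:
            kept.append((lhs, rhs, conf))
    return kept

# Hypothetical rules; the confidences are invented for the example.
example_rules = [
    (frozenset({"x"}), "z", 0.45),
    (frozenset({"x", "y"}), "z", 0.44),  # more specific, no gain: pruned
    (frozenset({"x", "w"}), "z", 0.50),  # more specific, higher confidence: kept
]
pruned = prune_redundant(example_rules)
```

With this criterion, adding an item to the left-hand side only survives pruning if it actually improves the rule's confidence.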

Interpretation & Analysis

It is important to identify how frequently each product was sold in our dataset. Visualisation is an effective way to analyse the associations.

For Products

Products Associations


1. Frequency Histogram :-

The item frequency histogram shows how many times an item occurs in our dataset compared to the others.

The relative frequency plot shows that “Banana” and “Bag of Organic Bananas” together account for around a quarter of the transaction dataset, that is, roughly a quarter of the orders include one of these items. This means that many customers buy them.

Other items can therefore be placed around the more frequently purchased items to boost sales. For instance, Organic Grape Tomatoes can be placed beside Banana and Bag of Organic Bananas.

2. Scatter Plot :-

Rules with high lift typically have relatively low support.

3. Network Graph visualization :-

In this visualisation, each node represents a product in the shopping basket and each rule “from ==> to” is an edge of the graph. The graph shows that a customer who buys Bag of Organic Bananas is likely to also buy Organic Lemon, Organic Large Extra Fancy Fuji Apple, and so on.


For Aisles

Aisles Associations

1. Frequency Histogram :-

The frequency histogram shows how many times an aisle occurs in our dataset compared to the others. The relative frequency plot shows that the first four aisles shown in the graph account for around one fifth of the transaction dataset, that is, around one fifth of total sales come from these aisles. This means that many customers buy from these aisles.


2. Scatter Plot :-

Rules with high lift typically have relatively low support.

3. Network Graph visualization :-

In this visualisation, each node represents an aisle in the shopping basket and each rule “from ==> to” is an edge of the graph. The graph shows that a customer who buys from the packaged vegetables fruits aisle is likely to also buy from the fresh vegetables and packaged cheese aisles.



Conclusion

Instacart is a web app, and in web analytics the data reflects both the way users behave and the way they are encouraged to behave by earlier website design decisions. Market Basket Analysis can be used to drive business decision-making by using the association results. There are a number of ways in which MBA can be used:

  • Associated aisles should be placed together on the web application platform to boost sales and reduce the time spent finding an aisle. For instance, the cereal, lunch meat and bread aisles should be placed together, as they are highly associated.

  • Associated products such as Organic Cilantro and Limes should be placed close to one another to improve the customer shopping experience and the store layout. Marketing can benefit from these associations, for example by targeting customers who buy Organic Cilantro with offers on Limes, to encourage them to add more to their shopping basket.

  • The list of rules can be used to place recommendations on product pages and cart pages. For each product, the applicable rules with high lift whose recommended product has a high margin should be prioritised, which can drive a significant uplift in profit.
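
The last recommendation can be sketched in Python as below, as one possible approach rather than a prescribed one. The lifts are taken from the product rules table, while the `margins` values, the `min_margin` threshold and the `recommend` helper are hypothetical, since the dataset contains no margin information.

```python
# (lhs, rhs, lift) rules; lifts are taken from the product rules table.
rules = [
    (frozenset({"Organic Cilantro"}), "Limes", 6.211228),
    (frozenset({"Limes"}), "Large Lemon", 4.264159),
    (frozenset({"Organic Raspberries"}), "Organic Strawberries", 3.626710),
]

# Hypothetical profit margins; not part of the Instacart dataset.
margins = {"Limes": 0.30, "Large Lemon": 0.25, "Organic Strawberries": 0.10}

def recommend(cart, rules, margins, min_margin):
    """Suggest products for a cart: applicable rules only, highest lift first."""
    cart = set(cart)
    candidates = [
        (lift, rhs)
        for lhs, rhs, lift in rules
        if lhs <= cart and rhs not in cart and margins.get(rhs, 0.0) >= min_margin
    ]
    ordered = []
    for _, rhs in sorted(candidates, reverse=True):
        if rhs not in ordered:
            ordered.append(rhs)
    return ordered

suggestions = recommend({"Organic Cilantro", "Limes"}, rules, margins, min_margin=0.2)
```

A rule applies only when its whole left-hand side is already in the cart, and products already in the cart or below the margin threshold are skipped.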