Instacart is an American company that operates a same-day grocery delivery service. Customers select groceries through a web application, and personal shoppers review each order and deliver it from various retailers. With its large customer base, the company collects data on users’ transaction behaviour and purchasing history. The dataset contains a sample of over 3 million grocery orders from more than 200,000 Instacart users.
In this report, Market Basket Analysis (MBA) was performed to propose recommendations in two areas:
1. Store Layout & Marketing:-
Products that co-occur or are highly associated should be placed close to one another to improve the customer shopping experience. Marketing can run targeted campaigns aimed at customers who bought specific products, with offers on the associated products that are likely to interest them.
2. Catalogue Arrangement:-
Associations between Aisles were identified with association analysis. Highly associated Aisles are recommended to be placed close together, to boost sales of products from those Aisles and to reduce the time spent finding products across multiple Aisles.
The orders data set contains a unique order_id for each order made by a user. order_number gives the sequence number of the order for that user. eval_set denotes whether the order is a prior, train, or test order: all but the last order of every user are classified as prior, and the last order of every user is classified as either train or test. order_dow gives the day of the week and order_hour_of_day the hour of the day. days_since_prior_order gives the time difference between two consecutive orders and is NULL (shown as NA) for the first order of every user. There are more than 3 million order_id values for more than 200,000 different users.
| order_id | user_id | eval_set | order_number | order_dow | order_hour_of_day | days_since_prior_order |
|---|---|---|---|---|---|---|
| 2539329 | 1 | prior | 1 | 2 | 8 | NA |
| 2398795 | 1 | prior | 2 | 3 | 7 | 15 |
| 473747 | 1 | prior | 3 | 3 | 12 | 21 |
| 2254736 | 1 | prior | 4 | 4 | 7 | 29 |
| 431534 | 1 | prior | 5 | 4 | 15 | 28 |

Sample rows from the products data:

| product_id | product_name | aisle_id | department_id |
|---|---|---|---|
| 1 | Chocolate Sandwich Cookies | 61 | 19 |
| 2 | All-Seasons Salt | 104 | 13 |
| 3 | Robust Golden Unsweetened Oolong Tea | 94 | 7 |
| 4 | Smart Ones Classic Favorites Mini Rigatoni With Vodka Cream Sauce | 38 | 1 |
| 5 | Green Chile Anytime Sauce | 5 | 13 |

Sample rows from the aisles data:

| aisle_id | aisle |
|---|---|
| 1 | prepared soups salads |
| 2 | specialty cheeses |
| 3 | energy granola bars |
| 4 | instant foods |
| 5 | marinades meat preparation |

Sample rows from the departments data:

| department_id | department |
|---|---|
| 1 | frozen |
| 2 | other |
| 3 | bakery |
| 4 | produce |
| 5 | alcohol |
Our goal was to identify associated products and associated Aisles. Hence, data preparation was performed for both products and Aisles. The Apriori algorithm was then run on both transaction datasets, followed by interpretation and analysis of the association rules.
For products, a customer shopping basket was created in which each row is an order ID with the list of products purchased in that order.
The shopping basket was then converted to transaction data, where each order is one transaction.
Order ID was used as the transaction ID and product names as the items for the Apriori algorithm.
The transaction summary shows that the most frequently bought item is Banana and that a transaction may contain one, two, or more items.
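The basket-building step above can be sketched in a few lines. The report does not show its code (it mentions R's arules package later), so the Python below is only an illustration: the function name `build_baskets` and the sample rows are hypothetical, not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical (order_id, product_name) rows, as one might get after joining
# order lines with the products table. Values are illustrative only.
order_lines = [
    (1, "Banana"),
    (1, "Limes"),
    (2, "Banana"),
    (2, "Organic Cilantro"),
    (2, "Limes"),
    (3, "Large Lemon"),
]


def build_baskets(rows):
    """Group item names by order ID: one basket (transaction) per order."""
    baskets = defaultdict(set)
    for order_id, item in rows:
        baskets[order_id].add(item)  # a set ignores repeats within an order
    return dict(baskets)


baskets = build_baskets(order_lines)
for order_id, items in sorted(baskets.items()):
    print(order_id, sorted(items))
```

The same grouping would apply to Aisles if each row carried an Aisle name instead of a product name.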
For Aisles, a customer shopping basket was created in which each row is an order ID with the list of Aisles from which products were purchased in that order.
The shopping basket was then converted to transaction data, where each order is one transaction.
Order ID was used as the transaction ID and Aisle names as the items for the Apriori algorithm.
The transaction summary shows that the Aisle from which items were most frequently purchased is “Fresh Fruits” and that a transaction may contain items from one, two, or more Aisles.
Overview of Market Basket Analysis (MBA):-
Market Basket Analysis (MBA) is a modelling technique based on the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items. The output of MBA is a set of rules. A rule can be simple, {A ==> B}: when a customer buys item A, the customer is likely to buy item B. More complex rules are also possible, such as {A, B ==> D, F}: when a customer buys items A and B, the customer is likely to buy items D and F.
Support:-
Support is the basic probability of an event occurring. For the event of buying product A, Support(A) is the number of transactions that include A divided by the total number of transactions.
Confidence:-
Confidence is a conditional probability. For a rule {A ==> B}, it is the probability that B is bought given that A has been bought: Support(A and B) divided by Support(A).
Lift:-
Lift is the ratio of confidence to expected confidence. Equivalently, it is the probability of all the items in a rule occurring together (the support of the rule) divided by the product of the probabilities of the left-hand and right-hand sides, which is what the joint probability would be if there were no association between them.
The lift value tells us how much better a rule is at predicting something than randomly guessing. The higher the lift, the stronger the association.
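To make the three measures concrete, here is a minimal sketch that computes them on a toy set of five transactions. The item labels A, B, C and the helper names are illustrative assumptions, not part of the Instacart data or of any library.

```python
# Toy transactions; items A, B, C are placeholders.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """P(rhs | lhs) = support(lhs and rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence divided by the support of the right-hand side."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


print(f"support(A, B)   = {support({'A', 'B'}, transactions):.4f}")
print(f"confidence(A=>B) = {confidence({'A'}, {'B'}, transactions):.4f}")
print(f"lift(A=>B)       = {lift({'A'}, {'B'}, transactions):.4f}")
```

In this toy example A and B each appear in 4 of 5 transactions and together in 3, so the lift of {A ==> B} falls slightly below 1: the pair occurs a little less often than independence would predict.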
The Apriori algorithm was used for frequent item set mining and association rule learning over transactional databases. It proceeds by identifying the frequent individual items in the database and extending them to larger and larger item sets as long as those item sets appear sufficiently often in the database. The frequent item sets determined by Apriori can be used to derive association rules that highlight general trends in the database.
The arules package in R implements the Apriori algorithm and can be used for analysing the customer shopping basket. It requires two parameters to be set: Support, which measures how frequently a pattern occurs in the data (how often a certain set of items is bought together), and Confidence, which measures how strong an association rule is.
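The level-wise search described above can be sketched as follows. This is a simplified, unoptimised illustration in plain Python, not the implementation used for the report; the function name and the toy transactions are assumptions.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def keep_frequent(candidates):
        result = {}
        for c in candidates:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                result[c] = s
        return result

    # Level 1: frequent individual items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    all_frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        all_frequent.update(current)
        k += 1
    return all_frequent


toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori_frequent_itemsets(toy, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search tractable: the three-item candidate is discarded without counting because one of its pairs, {milk, butter}, is not frequent.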
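The effect of the two thresholds can be illustrated with a small rule generator. This sketch enumerates itemsets by brute force rather than with Apriori, emits rules with a single item on the right-hand side, and uses hypothetical names (`mine_rules`, `max_len`); it is not the arules API.

```python
from itertools import combinations


def mine_rules(transactions, min_support, min_confidence, max_len=3):
    """Brute-force rule mining with one item on the right-hand side."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({i for t in transactions for i in t})
    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = supp(itemset)
            if s < min_support:  # support threshold on the whole rule
                continue
            for rhs_item in combo:
                lhs = itemset - {rhs_item}
                conf = s / supp(lhs)
                if conf >= min_confidence:  # confidence threshold
                    rules.append({
                        "lhs": lhs,
                        "rhs": rhs_item,
                        "support": s,
                        "confidence": conf,
                        "lift": conf / supp(frozenset([rhs_item])),
                    })
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


toy = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
loose = mine_rules(toy, min_support=0.5, min_confidence=0.6)
strict = mine_rules(toy, min_support=0.5, min_confidence=0.9)
print(len(loose), "rules at confidence 0.6;", len(strict), "at confidence 0.9")
```

Raising either threshold shrinks the rule set, which is the trade-off behind the parameter choices reported for products and Aisles.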
The Apriori algorithm was run on both the Products and Aisles transaction data.
For products, the parameters used to generate rules were Support = 0.005 and Confidence = 0.25.
Some of the generated rules were repeated or redundant (for instance, one rule being a super rule of another), so they were pruned. After pruning, 27 rules remained. The first rule says that customers who buy Organic Cilantro are 6.21 times more likely to buy Limes than would be expected by chance. The top 5 rules by lift are shown below:
| lhs | rhs | support | confidence | lift | count |
|---|---|---|---|---|---|
| {Organic Cilantro} | {Limes} | 0.0076748 | 0.2855927 | 6.211228 | 1007 |
| {Limes} | {Large Lemon} | 0.0121562 | 0.2643792 | 4.264159 | 1595 |
| {Organic Hass Avocado,Organic Strawberries} | {Bag of Organic Bananas} | 0.0054112 | 0.4613385 | 3.910321 | 710 |
| {Organic Raspberries} | {Organic Strawberries} | 0.0127278 | 0.3011179 | 3.626710 | 1670 |
| {Bag of Organic Bananas,Organic Hass Avocado} | {Organic Strawberries} | 0.0054112 | 0.2933884 | 3.533615 | 710 |
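The report does not state the exact pruning criterion. One common definition treats a rule as redundant when a more general rule (same right-hand side, a strict subset of the left-hand side) has equal or higher confidence. The sketch below assumes that definition; the rules and confidence values in it are made up for illustration.

```python
def prune_redundant(rules):
    """Drop rules that a more general rule with >= confidence already covers."""
    kept = []
    for r in rules:
        redundant = any(
            o is not r
            and o["rhs"] == r["rhs"]
            and o["lhs"] < r["lhs"]  # strict subset: o is more general
            and o["confidence"] >= r["confidence"]
            for o in rules
        )
        if not redundant:
            kept.append(r)
    return kept


# Hypothetical rules, not taken from the report's results.
example_rules = [
    {"lhs": frozenset({"a"}), "rhs": "c", "confidence": 0.60},
    {"lhs": frozenset({"a", "b"}), "rhs": "c", "confidence": 0.55},  # redundant
    {"lhs": frozenset({"a", "d"}), "rhs": "c", "confidence": 0.80},  # adds information
]
pruned = prune_redundant(example_rules)
print([sorted(r["lhs"]) for r in pruned])
```

Under this criterion a longer rule survives only if the extra items on its left-hand side actually raise the confidence.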
For Aisles, the parameters used to generate rules were Support = 0.007 and Confidence = 0.4.
Some of the generated rules were repeated or redundant (for instance, one rule being a super rule of another), so they were pruned. After pruning, 9361 rules remained. The first rule says that customers who buy products from the cereal and lunch meat Aisles are 2.79 times more likely to buy items from the bread Aisle than would be expected by chance. The top 5 rules by lift are shown below:
| lhs | rhs | support | confidence | lift | count |
|---|---|---|---|---|---|
| {cereal,lunch meat} | {bread} | 0.0076595 | 0.4574420 | 2.793210 | 1005 |
| {chips pretzels,lunch meat,packaged cheese} | {bread} | 0.0073699 | 0.4472710 | 2.731105 | 967 |
| {lunch meat,milk,yogurt} | {bread} | 0.0080406 | 0.4429051 | 2.704446 | 1055 |
| {lunch meat,milk,packaged cheese} | {bread} | 0.0084826 | 0.4427208 | 2.703320 | 1113 |
| {packaged cheese,preserved dips spreads} | {chips pretzels} | 0.0072708 | 0.4760479 | 2.694408 | 954 |
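The columns of the rule tables are linked by simple arithmetic: count / support gives the number of transactions, confidence / lift gives the support of the right-hand side, and support / confidence gives the support of the left-hand side. The sketch below applies these identities to the first row of each table above; the function name `implied_quantities` is an assumption for illustration.

```python
# First row of the product table and of the Aisle table, copied from above.
product_rule = {"support": 0.0076748, "confidence": 0.2855927,
                "lift": 6.211228, "count": 1007}
aisle_rule = {"support": 0.0076595, "confidence": 0.4574420,
              "lift": 2.793210, "count": 1005}


def implied_quantities(rule):
    """Back out quantities that the table does not list directly."""
    n_transactions = rule["count"] / rule["support"]
    lhs_support = rule["support"] / rule["confidence"]
    rhs_support = rule["confidence"] / rule["lift"]
    return n_transactions, lhs_support, rhs_support


for name, rule in [("products", product_rule), ("aisles", aisle_rule)]:
    n, lhs_s, rhs_s = implied_quantities(rule)
    print(f"{name}: ~{n:,.0f} transactions, "
          f"support(lhs) ~ {lhs_s:.4f}, support(rhs) ~ {rhs_s:.4f}")
```

Both rows imply roughly the same number of transactions, which is what one would expect if the product and Aisle baskets were built from the same set of orders. Small differences come from rounding in the table.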
It is important to identify how frequently each product was sold in our dataset. Visualisation is an effective way to analyse the associations.
The item frequency histogram shows how many times an item occurs in our dataset compared to the others.
The relative frequency plot shows that “Banana” and “Bag of Organic Bananas” together account for around one quarter of the transaction dataset, which means that many customers buy these items.
Other items can therefore be placed around the more frequently purchased items to boost sales. For instance, Organic Grape Tomatoes can be placed beside Banana and Bag of Organic Bananas.
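The quantity behind a relative item frequency plot is the share of transactions that contain each item. A minimal sketch, using hypothetical baskets rather than the real data:

```python
from collections import Counter


def relative_item_frequency(transactions):
    """Share of transactions containing each item, most frequent first."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item: c / n for item, c in counts.most_common()}


# Hypothetical baskets for illustration only.
toy_baskets = [
    {"Banana", "Limes"},
    {"Banana", "Organic Grape Tomatoes"},
    {"Banana"},
    {"Limes"},
]
freq = relative_item_frequency(toy_baskets)
for item, share in freq.items():
    print(f"{item}: {share:.2f}")
```

Because the measure counts transactions rather than units or revenue, it says how many baskets contain an item, not how much of it was sold.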
Rules with high lift typically have relatively low support.
In this visualisation, each node represents a product in the shopping basket and each rule “from ==> to” is an edge of the graph. The graph shows that a customer who buys Bag of Organic Bananas is likely to also buy Organic Lemon, Organic Large Extra Fancy Fuji Apple, etc.
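The graph view described above can be represented as a simple adjacency list, with one edge from each left-hand item to each right-hand item of a rule. The sketch below uses three single-item rules from the product table; the function name `rules_to_graph` is hypothetical and the plotting itself is left out.

```python
from collections import defaultdict


def rules_to_graph(rules):
    """Map each left-hand item to the items its rules point to."""
    graph = defaultdict(set)
    for lhs, rhs in rules:
        for source in lhs:
            for target in rhs:
                graph[source].add(target)
    return {node: sorted(targets) for node, targets in graph.items()}


# (lhs, rhs) pairs taken from the top product rules listed earlier.
product_rules = [
    ({"Organic Cilantro"}, {"Limes"}),
    ({"Limes"}, {"Large Lemon"}),
    ({"Organic Raspberries"}, {"Organic Strawberries"}),
]
graph = rules_to_graph(product_rules)
for node, targets in graph.items():
    print(node, "==>", targets)
```

Chains such as Organic Cilantro ==> Limes ==> Large Lemon become visible in this form, which is harder to see in a flat rule table.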
The frequency histogram shows how many times an Aisle occurs in our dataset compared to the others. The relative frequency plot shows that the first four Aisles in the graph account for around one fifth of the transaction dataset, which means that many customers buy from these Aisles.
Rules with high lift typically have relatively low support.
In this visualisation, each node represents an Aisle in the shopping basket and each rule “from ==> to” is an edge of the graph. The graph shows that a customer who buys from the Packaged Vegetables Fruits Aisle is likely to also buy from the Fresh Vegetables and Packaged Cheese Aisles.
Instacart is a web app, and in web analytics the data reflects the way users behave and the way they are encouraged to behave by earlier website design decisions. Market Basket Analysis can drive business decision making through its association results. There are a number of ways in which MBA can be used:
Associated Aisles should be placed together on the web application platform to boost sales and reduce the time spent finding an Aisle. For instance, the cereal, lunch meat and bread Aisles should be placed together, as they are highly associated.
Associated products such as Organic Cilantro and Limes should be placed close to one another to improve the customer shopping experience and the store layout. Marketing can benefit from these associations, for example by targeting customers who buy Organic Cilantro with offers on Limes, to encourage them to spend more on their shopping basket.
The list of rules can be used to place recommendations on product pages and cart pages. For each product, the applicable rules with high lift whose recommended product has a high margin should be considered, as they can drive a significant uplift in profit.
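A cart-page recommender along these lines could be sketched as below. The rules and lift values come from the product table earlier in the report; the margin figures, the `recommend` function, and the lift-times-margin score are hypothetical choices made for illustration, not something the report specifies.

```python
# (lhs, rhs, lift) from the top product rules listed earlier.
rules = [
    ({"Organic Cilantro"}, "Limes", 6.211228),
    ({"Limes"}, "Large Lemon", 4.264159),
    ({"Organic Hass Avocado", "Organic Strawberries"}, "Bag of Organic Bananas", 3.910321),
    ({"Organic Raspberries"}, "Organic Strawberries", 3.626710),
]

# Hypothetical profit margins per product; real values would come from the business.
margins = {
    "Limes": 0.30,
    "Large Lemon": 0.25,
    "Bag of Organic Bananas": 0.15,
    "Organic Strawberries": 0.40,
}


def recommend(cart, rules, margins, top_n=3):
    """Rank products whose rule applies to the cart by lift x margin."""
    cart = set(cart)
    scores = {}
    for lhs, rhs, lift in rules:
        # A rule applies when its whole left-hand side is already in the cart
        # and the suggested product is not.
        if lhs <= cart and rhs not in cart:
            score = lift * margins.get(rhs, 0.0)
            scores[rhs] = max(score, scores.get(rhs, 0.0))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


suggestions = recommend({"Organic Cilantro", "Organic Raspberries"}, rules, margins)
print(suggestions)
```

Multiplying lift by margin is only one possible weighting; a team might instead filter on a minimum lift first and then sort by margin.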