Our goal is to earn customer loyalty and increase revenue through machine-readable marketing data, which allows us to create a user behavior profile and more accurately predict revenue.
The graph above shows that most orders are placed between 10:00 AM and 5:00 PM on Sunday and Monday.
Actions: Develop strategies to increase sales during the quietest days of the week:
Offer discounts on certain items.
Offer free delivery with orders over a specific dollar amount.
Positive note: Customers who shop on a Wednesday or Thursday are unlikely to encounter “out of stock” items or delivery/pickup delays.
The most frequently ordered products are Banana, Bag of Organic Bananas, Organic Strawberries, and Organic Baby Spinach. To increase sales of strawberries, the retailer could place them near the bananas.
Breaking the orders down by category shows where targeted marketing strategies could improve sales.
Product | Number_of_Orders | Category |
Green Chile Anytime Sauce | 1 | Non-organic |
Pure Coconut Water With Orange | 1 | Non-organic |
Saline Nasal Mist | 1 | Non-organic |
Fresh Scent Dishwasher Cleaner | 1 | Non-organic |
Mint Chocolate Flavored Syrup | 1 | Non-organic |
Product | Number_of_Orders | Category |
Organic Spaghetti Style Pasta | 1 | Organic |
Organic Vegetable Greens & Greens Juice Blend | 1 | Organic |
Organic Peppermint Lemonade | 1 | Organic |
Organic Lemon Gingersnap | 1 | Organic |
Reg; Organic Apple Cider Vinegar 16 Fl Oz | 1 | Organic |
To conduct association rule mining (market basket analysis), we first need to build the transaction dataset from user_id, order_id, and products_name. The product names must be collapsed into one row per order, so that each row lists all the items bought in the same transaction.
##   user_id order_id
## 1  112108        1
## 2   79431       36
## 3   42756       38
## 4   17227       96
## 5   56463       98
## 6  125030      112
## V1
## 1 Bag of Organic Bananas,Bulgarian Yogurt,Organic Whole String Cheese,Organic 4% Milk Fat Whole Milk Cottage Cheese,Organic Hass Avocado,Cucumber Kirby,Lightly Smoked Sardines in Olive Oil,Organic Celery Hearts
## 2 Super Greens Salad,Grated Pecorino Romano Cheese,Organic Garnet Sweet Potato (Yam),Organic Half & Half,Prosciutto Americano,Asparagus,Spring Water,Cage Free Extra Large Grade AA Eggs
## 3 Flat Parsley Bunch,Organic Biologique Limes,Bunched Cilantro,Fresh Dill,Shelled Pistachios,Organic Raw Unfiltered Apple Cider Vinegar,Green Peas,Organic Hot House Tomato,Organic Baby Arugula
## 4 Organic Whole Strawberries,Organic Grape Tomatoes,Organic Cucumber,Organic Blueberries,Roasted Turkey,Organic Pomegranate Kernels,Organic Raspberries
## 5 Natural Spring Water,Organic Orange Juice With Calcium & Vitamin D,Olive Oil & Aloe Vera Hand Soap,Baby Swiss Slices Cheese,Organic Free Range Chicken Broth,Organic Raspberries,Raspberry Sorbet Pops,Aluminum Foil,Organic Whole Grassmilk Milk,Plastic Wrap,Black Beans,Corn Maize Tortillas,Organic Extra Virgin Oil Olive,Organic Lemonade,Lavender Scent Laundry Detergent,Queso Fresco,Mild Diced Green Chiles,Organic Raw Kombucha Gingerade,Organic Sliced Provalone Cheese,Organic Coconut Milk,Organic Ketchup,Uncured Applewood Smoked Bacon,Organic Corn Starch,Garbanzo Beans,Organic Chocolate Almondmilk Pudding,Uncured Genoa Salami,Tomatoes Crushed Organic,Organic Stringles Mozzarella String Cheese,Black Beans No Salt Added,Organic 2% Buttermilk,Organic Zucchini,Organic Cinnamon Apple Sauce,Organic Seasoned Yukon Select Potatoes Hashed Browns,Whole Milk Greek Blended Vanilla Bean Yogurt,Unsalted Cultured Butter,Organic Yellow Onion,Natural Chicken & Maple Breakfast Sausage Patty,Geranium Liquid Dish Soap,Pinto Beans No Salt Added,Organic Italian Parsley Bunch,Guacamole,100% Organic Unbleached All-Purpose Flour,Bag of Organic Bananas,Organic Hothouse Cucumbers,Sliced Pepperoni,Crackers Oyster,Organic Unsweetened Almond Milk,Organic Garlic,Plastic Spoons
## 6 Organic Lemon,Coconut Water Kefir,Organic Hass Avocado,Umcka Elderberry Intensive Cold + Flu Berry Flavor,Premium Epsom Salt,Hickory Honey Barbeque Baked Potato Chips,Baked Sea Salt & Vinegar Potato Chips,Sea Salt Baked Potato Chips,Marinara Pasta Sauce,I Heart Baby Kale,Fresh Cauliflower
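The row-to-basket conversion described above can be sketched in a few lines of Python. This is only an illustration, not the R code that produced the output shown; the function name `build_baskets` and the sample rows are assumptions made for the example.

```python
from collections import defaultdict


def build_baskets(rows):
    """Group (order_id, product_name) rows into one comma-joined basket per order."""
    grouped = defaultdict(list)
    for order_id, product_name in rows:
        grouped[order_id].append(product_name)
    # One entry per order: all products bought together, joined by commas
    return {order_id: ",".join(items) for order_id, items in grouped.items()}


# Hypothetical input rows; the product names are borrowed from the output above
rows = [
    (1, "Bag of Organic Bananas"),
    (1, "Bulgarian Yogurt"),
    (36, "Super Greens Salad"),
    (36, "Asparagus"),
    (1, "Organic Hass Avocado"),
]

baskets = build_baskets(rows)
for order_id, basket in sorted(baskets.items()):
    print(order_id, basket)
```

One caveat of a comma-joined format: a product name that itself contains a comma would be split into several items when the file is parsed back, so a different separator or quoting may be safer.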
## transactions as itemMatrix in sparse format with
## 131210 rows (elements/itemsets/transactions) and
## 306433 columns (items) and a density of 3.866984e-05
##
## most frequent items:
##                 Banana Bag of Organic Bananas   Organic Strawberries
##                  17611                  14604                  10241
##   Organic Baby Spinach            Large Lemon                (Other)
##                   9326                   7724                1495295
##
## element (itemset/transaction) length distribution:
## sizes
##    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18
## 7930 8337 8887 8798 9332 8999 8719 8098 7239 6454 5888 5229 4636 4220 3603 3332
##   19   20   21   22   23   24   25   26   27   28   29   30   31   32   33   34
## 2893 2521 2279 1877 1656 1461 1271 1118  924  821  726  548  524  448  373  328
##   35   36   37   38   39   40   41   42   43   44   45   46   47   48   49   50
##  264  237  212  142  134  125  107   66   59   65   42   55   33   28   28   19
##   51   52   53   54   55   56   57   58   59   60   61   62   63   64   65   66
##   16   17   16   17   10    5    5    6    3    3    1    5    1    1    2    4
##   67   68   70   72   74   76   78   82
##    2    2    1    3    1    1    2    1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    3.00    6.00   10.00   11.85   15.00   82.00
##
## includes extended item information - examples:
## labels
## 1 #2 Coffee Filters
## 2 #2 Cone White Coffee Filters
## 3 #2 Mechanical Pencils
summary(tr) is a useful command that describes our transaction object. Let’s take a look at what the output above says:
There are 131,210 transactions (rows) and 306,433 items (columns). The 306,433 items are the distinct product names in the dataset, and each of the 131,210 transactions is a collection of these items.
The summary can also tell you the most frequent items.
Element (itemset/transaction) length distribution: This tells you how many transactions contain a given number of items. In each pair of rows, the first row gives the number of items in a transaction, and the second row gives the number of transactions of that size.
For example, 7,930 transactions contain three items, 8,337 transactions contain four items, and the single largest transaction contains 82 items.
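As a quick sanity check, the density and the mean basket size reported in the summary can be recomputed from the other figures it prints. The sketch below uses only numbers copied from the output above; it assumes that the density is the share of non-empty cells in the transaction-by-item matrix, and the variable names are illustrative.

```python
# Figures copied from the summary output above
n_transactions = 131_210
n_items = 306_433
# Counts of the five most frequent items plus the "(Other)" bucket
item_counts = [17_611, 14_604, 10_241, 9_326, 7_724, 1_495_295]

# Every purchased item is one non-empty cell in the sparse matrix
total_purchases = sum(item_counts)

# Density: share of non-empty cells among all (transaction, item) pairs
density = total_purchases / (n_transactions * n_items)

# Mean basket size: average number of items per transaction
mean_basket_size = total_purchases / n_transactions

print(f"total item occurrences: {total_purchases}")
print(f"density: {density:.6e}")
print(f"mean basket size: {mean_basket_size:.2f}")
```

Both values should agree with the summary (a density of about 3.87e-05 and a mean of 11.85 items per transaction), which suggests the output is internally consistent.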
We use the Apriori algorithm from the arules library to mine frequent itemsets and association rules.
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      15  rules TRUE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 131
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[306433 item(s), 131210 transaction(s)] done [1.17s].
## sorting and recoding items ... [1729 item(s)] done [0.05s].
## creating transaction tree ... done [0.07s].
## checking subsets of size 1 2 3 4 done [0.06s].
## writing ... [873 rule(s)] done [0.00s].
## creating S4 object ... done [0.04s].
## set of 873 rules
##
## rule length distribution (lhs + rhs):sizes
##   2   3   4
## 380 485   8
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   2.000   2.000   3.000   2.574   3.000   4.000
##
## summary of quality measures:
##     support           confidence        coverage             lift
##  Min.   :0.001006   Min.   :0.2000   Min.   :0.001936   Min.   : 1.498
##  1st Qu.:0.001136   1st Qu.:0.2261   1st Qu.:0.004344   1st Qu.: 2.167
##  Median :0.001372   Median :0.2571   Median :0.005243   Median : 3.120
##  Mean   :0.001981   Mean   :0.2747   Mean   :0.007634   Mean   : 5.137
##  3rd Qu.:0.001936   3rd Qu.:0.3051   3rd Qu.:0.007599   3rd Qu.: 4.152
##  Max.   :0.021004   Max.   :0.5714   Max.   :0.078050   Max.   :85.582
##      count
##  Min.   : 132
##  1st Qu.: 149
##  Median : 180
##  Mean   : 260
##  3rd Qu.: 254
##  Max.   :2756
##
## mining info:
## data ntransactions support confidence
## tr 131210 0.001 0.2
## call
## apriori(data = tr, parameter = list(supp = 0.001, conf = 0.2, maxlen = 15))
The summary of the rules gives us some information:
The number of rules: 873.
The distribution of rules by length: rules with 3 items are the most common (485 of the 873 rules).
The summary of quality measures: ranges of support, confidence, and lift.
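To make the mining step more concrete, here is a minimal sketch of a level-wise, Apriori-style search in plain Python. It is not the arules implementation used above. The toy baskets, the thresholds, and the function names `frequent_itemsets` and `rules_from` are assumptions chosen for illustration, and the sketch ignores the performance optimizations a real library applies.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: count}."""
    n = len(transactions)
    # Level 1: count single items
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune candidates that contain an infrequent (k-1)-subset
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c / n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def rules_from(frequent, n, min_confidence):
    """Build rules with a single item on the right-hand side."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue
        for rhs in itemset:
            lhs = itemset - {rhs}
            confidence = count / frequent[lhs]
            if confidence >= min_confidence:
                lift = confidence / (frequent[frozenset([rhs])] / n)
                rules.append((sorted(lhs), rhs, count / n, confidence, lift))
    return rules


# Toy baskets (hypothetical); item names borrowed from the summary above
toy_baskets = [
    {"Banana", "Organic Strawberries", "Organic Baby Spinach"},
    {"Banana", "Organic Strawberries"},
    {"Banana", "Large Lemon"},
    {"Organic Baby Spinach", "Large Lemon"},
    {"Banana", "Organic Strawberries", "Large Lemon"},
]

frequent = frequent_itemsets(toy_baskets, min_support=0.4)
rules = rules_from(frequent, len(toy_baskets), min_confidence=0.6)
for lhs, rhs, support, confidence, lift in sorted(rules, key=lambda r: -r[3]):
    print(f"{lhs} => {rhs}  support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

The pruning step relies on the fact that every subset of a frequent itemset must itself be frequent, which is what lets the search discard most candidates without counting them.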
Since there are 873 rules, let’s print only the first 10:
## lhs rhs support confidence coverage lift count
## [1] {Organic Fuji Apples} => {Bag of Organic Bananas} 0.001021264 0.3292383 0.003101898 2.958050 134
## [2] {Packaged Grape Tomatoes} => {Hass Avocados} 0.001173691 0.2558140 0.004588065 15.229287 154
## [3] {Packaged Grape Tomatoes} => {Strawberries} 0.001105099 0.2408638 0.004588065 5.159794 145
## [4] {Baby Cucumbers} => {Hass Avocados} 0.001028885 0.2170418 0.004740492 12.921077 135
## [5] {Baby Cucumbers} => {Raspberries} 0.001036506 0.2186495 0.004740492 9.177544 136
## [6] {Baby Cucumbers} => {Strawberries} 0.001028885 0.2170418 0.004740492 4.649478 135
## [7] {Baby Cucumbers} => {Bag of Organic Bananas} 0.001127963 0.2379421 0.004740492 2.137797 148
## [8] {Nonfat Icelandic Style Strawberry Yogurt} => {Icelandic Style Skyr Blueberry Non-fat Yogurt} 0.001097477 0.4161850 0.002636994 81.625755 144
## [9] {Icelandic Style Skyr Blueberry Non-fat Yogurt} => {Nonfat Icelandic Style Strawberry Yogurt} 0.001097477 0.2152466 0.005098697 81.625755 144
## [10] {Sweet Potato Yam} => {Banana} 0.001028885 0.3638814 0.002827528 2.711083 135
The output above lists the first 10 association rules mined with a minimum support of 0.001 and a minimum confidence of 0.2. For each rule it reports the support, confidence, coverage, and lift, along with the count of transactions in which the products occur together.
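The quality measures can be checked by hand for rule [1], {Organic Fuji Apples} => {Bag of Organic Bananas}. The sketch below uses the standard definitions (support is the share of transactions containing both sides, confidence is support divided by the coverage of the left-hand side, and lift is confidence divided by the support of the right-hand side). All input numbers are copied from the outputs above; the variable names are illustrative.

```python
n_transactions = 131_210   # from the transaction summary
count_both = 134           # "count" column of rule [1]
coverage = 0.003101898     # "coverage" column of rule [1] (support of the LHS)
count_rhs = 14_604         # Bag of Organic Bananas, from "most frequent items"

# Support: share of all transactions that contain both LHS and RHS
support = count_both / n_transactions

# Confidence: of the transactions containing the LHS, the share that also contain the RHS
confidence = support / coverage

# Lift: confidence relative to how often the RHS appears overall
lift = confidence / (count_rhs / n_transactions)

print(f"support={support:.9f} confidence={confidence:.7f} lift={lift:.6f}")
```

A lift well above 1, as here, indicates that the two products appear together more often than would be expected if they were bought independently.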
Let’s select 10 rules from subRules having the highest confidence.
Plot an interactive graph:
The network graph below shows associations between selected items. Larger circles imply higher support, while red circles imply higher lift. Graphs only work well with very few rules, which is why we use only a subset of 10 rules from our data:
The next plot represents the rules (or itemsets) as a parallel coordinates plot (from LHS to RHS).
The top rule shows that when a customer has Organic Raspberries and Organic Hass Avocados in their shopping cart, they are likely to also buy a Bag of Organic Bananas, and so on for the other rules. With this information, retailers can anticipate customers’ purchases and influence their buying decisions.
Retailers can recommend related products and target their marketing strategies to increase revenue. They also can design a better store layout and improve the customer buying experience.
Conduct a cluster analysis to identify and understand customer behavior. The buying behavior of each group can then be examined separately on measures such as brand loyalty, willingness to pay, and purchase frequency. For this analysis, consider k-means clustering, using the gap statistic, average silhouette, or elbow method to choose the number of clusters.
Conduct a basket analysis by department or aisle to increase the confidence of the rules. Use the lift metric to find strong associations and randomly sample from them.
Instacart Market Basket Analysis. Available online: https://www.kaggle.com/c/instacart-market-basket-analysis/data
Market Basket Analysis using R. Available online: https://www.datacamp.com/community/tutorials/market-basket-analysis-r
A Gentle Introduction on Market Basket Analysis. Association Rules. Available online: https://datascienceplus.com/a-gentle-introduction-on-market-basket-analysis%E2%80%8A-%E2%80%8Aassociation-rules/