Market Basket Analysis, also known as affinity analysis or association rule mining, is a data mining technique used mostly in retail to increase sales by finding purchase patterns in transactional data. The Apriori algorithm generates association rules. An association rule says that if product 1 occurs in a transaction, then product 2 occurs with a certain probability.
What do my customers buy? What do they buy together? These are the common questions this analysis answers. It reveals what people buy together, and the results can be used to plan product placement in the store and to design promotional campaigns.
library(arules)
transactions <- read.transactions('transaction_sample.txt', sep = '|', format = 'single', cols = c(1,2))
summary(transactions)
## transactions as itemMatrix in sparse format with
## 8833 rows (elements/itemsets/transactions) and
## 171 columns (items) and a density of 0.01560668
##
## most frequent items:
##  Local Vegetables         Chocolate       Local Fruit   Regular Biscuit
##              1860              1337              1129               802
##            Spices           (Other)
##               726             17719
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 4129 1909 1031 590 332 219 145 117 57 55 44 36 24 18 21 10
## 17 18 19 20 21 22 23 24 25 26 27 28 29 32 33 34
## 5 9 11 7 11 11 4 8 5 2 5 5 4 2 3 1
## 36 38
## 1 2
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 2.669 3.000 38.000
##
## includes extended item information - examples:
## labels
## 1 100 Pct Juices
## 2 After Shave
## 3 After-Wash
##
## includes extended transaction information - examples:
## transactionID
## 1 1002_J3591_3591060710_01122015
## 2 1002_J3591_3591060711_01122015
## 3 1002_J3591_3591060712_01122015
## items transactionID
## [1] {Local Vegetables} 1002_J3591_3591060710_01122015
## [2] {Toilet Cleaner} 1002_J3591_3591060711_01122015
## [3] {Cream Biscuit} 1002_J3591_3591060712_01122015
## [4] {Dessert,Sauce} 1002_J3591_3591060714_01122015
## [5] {Chocolate,Crisp Corn Snack} 1002_J3591_3591060715_01122015
Sampling the data
If the data is too large to handle, we can work with a sample. For association rules, the sample size must take the minimum support of the itemsets into account. The formula for the sample size is
$$ n = \frac{-2 \ln(c)}{\text{support} \cdot \epsilon^2} $$
Before going ahead with a sample, we must make sure that it is a good replica of the full dataset. To check that, we look at the item frequency plot with the additional parameter lift. Since our data is not that big, we can use the entire dataset.
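The sample-size formula above can be evaluated directly in base R. This is a minimal sketch: the reading of `epsilon` as the acceptable relative error of the support and of `1 - c` as the confidence level follows the usual presentation of this formula, and the values of `epsilon` and `c_level` below are illustrative assumptions, not figures from this analysis. The support value is the minimum support that appears in the Apriori parameter output further down.

```r
# Illustrative inputs (epsilon and c_level are assumed values)
support <- 0.005   # minimum support used for the rules in this document
epsilon <- 0.1     # assumed: acceptable relative error of the support (10%)
c_level <- 0.1     # assumed: 1 - c_level = 90% confidence level

# n = -2 * ln(c) / (support * epsilon^2); log() is the natural logarithm in R
n <- -2 * log(c_level) / (support * epsilon^2)
ceiling(n)
```

Under these assumed settings the required sample is larger than the 8833 transactions available, which is consistent with using the entire dataset.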
apriori() is the function for association rule mining in R (package arules). The data must be of class transactions.
Support: this measure tells how frequently an itemset occurs across all transactions.
Support({X} -> {Y}) = Transactions containing both X and Y / Total number of transactions
The support value helps us identify the rules worth considering for further analysis.
Confidence: this measure gives the likelihood that the consequent is in the cart, given that the cart already contains the antecedent.
Confidence({X} -> {Y}) = Transactions containing both X and Y / Transactions containing X
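The call that produced the output below is not shown, so the following is only a sketch of what such a call might look like. The thresholds (support 0.005, confidence 0.8) are taken from the "Parameter specification" block of that output; `toy_baskets` is a hypothetical stand-in dataset so that the example runs on its own, and its results will not match the output below.

```r
library(arules)

# Hypothetical toy baskets standing in for the real transaction file
toy_baskets <- list(
  c("Pulses", "Vermicelli", "Edible Oil"),
  c("Pulses", "Vermicelli", "Edible Oil"),
  c("Flours", "Spices"),
  c("Salt", "Spices"),
  c("Pulses", "Edible Oil"),
  c("Chocolate")
)
toy_transactions <- as(toy_baskets, "transactions")

# Mine rules with the thresholds reported in the parameter specification
rules <- apriori(toy_transactions,
                 parameter = list(support = 0.005, confidence = 0.8))

# Show the strongest rules first
inspect(head(sort(rules, by = "lift"), 5))
```

With the real data, `toy_transactions` would be replaced by the `transactions` object read in earlier.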
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 44
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[171 item(s), 8833 transaction(s)] done [0.00s].
## sorting and recoding items ... [83 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.00s].
## writing ... [795 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## lhs rhs support confidence lift count
## [1] {Pulses,Vermicelli} => {Edible Oil} 0.0054 0.8136 11.3526 48
## [2] {Flours,Pooja Needs} => {Spices} 0.0055 0.8033 9.7732 49
## [3] {Hair Oil,Pulses} => {Spices} 0.0053 0.8103 9.8592 47
## [4] {Marketing,Salt} => {Pulses} 0.0053 0.8545 14.3230 47
## [5] {Marketing,Salt} => {Edible Oil} 0.0055 0.8909 12.4319 49
## [6] {Marketing,Salt} => {Flours} 0.0053 0.8545 10.5422 47
## [7] {Marketing,Salt} => {Spices} 0.0059 0.9455 11.5030 52
## [8] {Marketing,Sugar} => {Pulses} 0.0059 0.8125 13.6182 52
## [9] {Marketing,Sugar} => {Flours} 0.0059 0.8125 10.0235 52
## [10] {Marketing,Utensil Cleaner} => {In-Wash} 0.0065 0.8028 17.4662 57
LHS is the antecedent and RHS is the consequent: {Pulses,Vermicelli} implies {Edible Oil}.
support: Of all bills, 0.54% contain {Pulses, Vermicelli and Edible Oil}.
confidence: Of the bills that contain {Pulses and Vermicelli}, 81.36% also contain {Edible Oil}. This looks like a high confidence value, but confidence alone can be misleading: it ignores how often {Edible Oil} is bought overall, so a popular consequent produces high confidence even without a real association. Lift is introduced to overcome this.
lift: (Transactions containing both X and Y / Transactions containing X) / Fraction of transactions containing Y
A lift greater than 1 means that {X} and {Y} occur together more often than would be expected if they were independent; a lift of 1 indicates no association. The larger the lift, the stronger the association.
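The three measures can be computed by hand, which makes the definitions concrete. The sketch below uses base R only; `baskets` is a small hypothetical dataset and `count_containing()` is a helper introduced here for illustration, neither comes from the analysis above. The last line checks one figure that is in the output: a rule with count 48 among 8833 transactions has the printed support of 0.0054.

```r
# Hypothetical toy baskets (10 transactions)
baskets <- list(
  c("Pulses", "Vermicelli", "Edible Oil"),
  c("Pulses", "Vermicelli", "Edible Oil"),
  c("Pulses", "Vermicelli"),
  c("Edible Oil", "Salt"),
  c("Salt", "Spices"),
  c("Chocolate"),
  c("Chocolate", "Spices"),
  c("Flours", "Spices"),
  c("Flours"),
  c("Local Fruit")
)
n_total <- length(baskets)

# Number of baskets that contain every item in `items`
count_containing <- function(items) {
  sum(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

X <- c("Pulses", "Vermicelli")   # antecedent (LHS)
Y <- "Edible Oil"                # consequent (RHS)

support    <- count_containing(c(X, Y)) / n_total              # both X and Y / all
confidence <- count_containing(c(X, Y)) / count_containing(X)  # both / those with X
lift       <- confidence / (count_containing(Y) / n_total)     # confidence / P(Y)

c(support = support, confidence = confidence, lift = lift)

# Check against the rule table: count 48 out of 8833 transactions
round(48 / 8833, 4)
```

In this toy data the lift is above 1, so baskets with {Pulses, Vermicelli} contain {Edible Oil} more often than baskets in general do.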
## lhs rhs support confidence lift count
## [1] {Edible Oil,
## Marketing,
## Spices} => {Salt} 0.0052 0.8070 30.3336 46
## [2] {Edible Oil,
## Flours,
## In-Wash,
## Pulses,
## Spices,
## Utensil Cleaner} => {Salt} 0.0051 0.8036 30.2040 45
## [3] {Flours,
## In-Wash,
## Marketing} => {Utensil Cleaner} 0.0054 0.9057 21.9170 48
## [4] {Edible Oil,
## Flours,
## In-Wash,
## Pulses,
## Salt} => {Utensil Cleaner} 0.0054 0.8727 21.1200 48
## [5] {Edible Oil,
## Flours,
## In-Wash,
## Pulses,
## Salt,
## Spices} => {Utensil Cleaner} 0.0051 0.8654 20.9423 45
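# rules_sorted is assumed to hold the rules sorted by lift (highest first).
# A rule is flagged as redundant when a rule ranked above it is a subset of it.
subset.matrix <- is.subset(rules_sorted, rules_sorted, sparse = FALSE)
subset.matrix[lower.tri(subset.matrix, diag = TRUE)] <- NA
redundant <- colSums(subset.matrix, na.rm = TRUE) >= 1
# Keep only the non-redundant rules
rules_pruned <- rules_sorted[!redundant]
rules_pruned
## set of 345 rules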
## lhs rhs support confidence lift count
## [1] {Edible Oil,
## Marketing,
## Spices} => {Salt} 0.0052 0.8070 30.3336 46
## [2] {Edible Oil,
## Flours,
## In-Wash,
## Pulses,
## Spices,
## Utensil Cleaner} => {Salt} 0.0051 0.8036 30.2040 45
## [3] {Flours,
## In-Wash,
## Marketing} => {Utensil Cleaner} 0.0054 0.9057 21.9170 48
## [4] {Edible Oil,
## Flours,
## In-Wash,
## Pulses,
## Salt} => {Utensil Cleaner} 0.0054 0.8727 21.1200 48
## [5] {Flours,
## In-Wash,
## Pulses,
## Salt,
## Spices} => {Utensil Cleaner} 0.0058 0.8644 20.9186 51
The scatterplot method draws a two-dimensional scatterplot with different measures of interestingness (parameter “measure”) on the axes; a third measure (parameter “shading”) is represented by the color of the points.
The graph method represents the rules (or itemsets) as a graph with items as labeled vertices, and rules (or itemsets) represented as vertices connected to items using arrows. For rules, the LHS items are connected with arrows pointing to the vertex representing the rule, and the RHS has an arrow pointing to the item.
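The plotting calls for the two visualizations described above are not shown, so here is a hedged sketch of how they might be produced with the arulesViz package. It uses the `Groceries` dataset that ships with arules as a stand-in, and the thresholds are illustrative assumptions; with the rules from this analysis, `rules` would be replaced by the pruned rule set.

```r
library(arules)
library(arulesViz)

# Stand-in data: the Groceries transactions bundled with arules
data("Groceries")
rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.8),
                 control = list(verbose = FALSE))

# Scatterplot: support and confidence on the axes, lift as point color
plot(rules, measure = c("support", "confidence"), shading = "lift")

# Graph: items and rules as vertices; limited to the 10 highest-lift
# rules so that the graph stays readable
top_rules <- head(sort(rules, by = "lift"), 10)
plot(top_rules, method = "graph")
```

Graph plots become cluttered quickly, which is why the sketch plots only a handful of rules.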
## Warning: Unknown control parameters: type
Graph visualization with different layout
To see what leads to a particular product, filter the rules on their right-hand side.
filter <- 'Spices'
rules_filtered <- subset(rules_pruned, subset = rhs %in% filter)
inspect(rules_filtered)
## lhs rhs support confidence lift count
## [1] {Flours,
## Salt,
## Tooth Paste} => {Spices} 0.0054 0.9600 11.6800 48
## [2] {Flours,
## Salt,
## Sugar,
## Utensil Cleaner} => {Spices} 0.0054 0.9600 11.6800 48
## [3] {Marketing,
## Salt} => {Spices} 0.0059 0.9455 11.5030 52
## [4] {Flours,
## In-Wash,
## Salt,
## Sugar} => {Spices} 0.0058 0.9444 11.4907 51
## [5] {Flours,
## Sugar,
## Tooth Paste} => {Spices} 0.0055 0.9423 11.4647 49
## [6] {In-Wash,
## Salt,
## Sugar} => {Spices} 0.0063 0.9333 11.3556 56
## [7] {Salt,
## Sugar,
## Utensil Cleaner} => {Spices} 0.0058 0.9273 11.2818 51
## [8] {Cereals,
## Flours,
## Salt} => {Spices} 0.0053 0.9216 11.2124 47
## [9] {Salt,
## Soap,
## Sugar} => {Spices} 0.0052 0.9200 11.1933 46
## [10] {Flours,
## Fly Insecticide / Repellant,
## Soap} => {Spices} 0.0051 0.9184 11.1735 45
## [11] {Flours,
## Salt,
## Soap} => {Spices} 0.0062 0.9167 11.1528 55
## [12] {Edible Oil,
## In-Wash,
## Sugar} => {Spices} 0.0076 0.9054 11.0158 67
## [13] {Edible Oil,
## Sugar,
## Tooth Paste} => {Spices} 0.0052 0.9020 10.9739 46
## [14] {Flours,
## In-Wash,
## Sugar} => {Spices} 0.0080 0.8987 10.9346 71
## [15] {Flours,
## In-Wash,
## Salt} => {Spices} 0.0076 0.8933 10.8689 67
## [16] {Flours,
## Soap,
## Sugar} => {Spices} 0.0062 0.8871 10.7930 55
## [17] {Flours,
## In-Wash,
## Marketing} => {Spices} 0.0053 0.8868 10.7893 47
## [18] {In-Wash,
## Pulses,
## Salt} => {Spices} 0.0069 0.8841 10.7560 61
## [19] {Edible Oil,
## Soap,
## Sugar} => {Spices} 0.0058 0.8793 10.6983 51
## [20] {Flours,
## Salt,
## Sugar} => {Spices} 0.0072 0.8767 10.6667 64
## [21] {Flours,
## Soap,
## Tooth Paste} => {Spices} 0.0055 0.8750 10.6458 49
## [22] {Flours,
## Fly Insecticide / Repellant,
## Salt} => {Spices} 0.0054 0.8727 10.6182 48
## [23] {Flours,
## In-Wash,
## Tooth Paste} => {Spices} 0.0067 0.8676 10.5564 59
## [24] {Salt,
## Tea} => {Spices} 0.0059 0.8667 10.5444 52
## [25] {Cereals,
## Salt} => {Spices} 0.0059 0.8667 10.5444 52
## [26] {Pulses,
## Soap,
## Sugar} => {Spices} 0.0058 0.8644 10.5169 51
## [27] {Flours,
## Fly Insecticide / Repellant,
## In-Wash} => {Spices} 0.0065 0.8636 10.5076 57
## [28] {In-Wash,
## Sugar,
## Utensil Cleaner} => {Spices} 0.0066 0.8529 10.3775 58
## [29] {Flours,
## Tooth Paste,
## Utensil Cleaner} => {Spices} 0.0059 0.8525 10.3716 52
## [30] {Salt,
## Tooth Paste} => {Spices} 0.0063 0.8485 10.3232 56
## [31] {Edible Oil,
## Pulses,
## Tea} => {Spices} 0.0061 0.8438 10.2656 54
## [32] {Edible Oil,
## Fly Insecticide / Repellant,
## Pulses} => {Spices} 0.0054 0.8421 10.2456 48
## [33] {In-Wash,
## Sugar} => {Spices} 0.0096 0.8416 10.2393 85
## [34] {Cereals,
## Flours,
## Pulses} => {Spices} 0.0066 0.8406 10.2271 58
## [35] {Fly Insecticide / Repellant,
## Sugar} => {Spices} 0.0062 0.8333 10.1389 55
## [36] {Soap,
## Sugar,
## Utensil Cleaner} => {Spices} 0.0051 0.8333 10.1389 45
## [37] {Edible Oil,
## Pulses,
## Sugar} => {Spices} 0.0094 0.8300 10.0983 83
## [38] {Sugar,
## Tooth Paste} => {Spices} 0.0066 0.8286 10.0810 58
## [39] {Pulses,
## Tooth Paste,
## Utensil Cleaner} => {Spices} 0.0053 0.8246 10.0322 47
## [40] {In-Wash,
## Salt,
## Soap} => {Spices} 0.0058 0.8226 10.0081 51
## [41] {In-Wash,
## Salt} => {Spices} 0.0088 0.8211 9.9895 78
## [42] {Pulses,
## Salt} => {Spices} 0.0103 0.8198 9.9745 91
## [43] {Edible Oil,
## In-Wash,
## Pulses} => {Spices} 0.0088 0.8125 9.8854 78
## [44] {Hair Oil,
## Pulses} => {Spices} 0.0053 0.8103 9.8592 47
## [45] {Salt,
## Soap} => {Spices} 0.0072 0.8101 9.8565 64
## [46] {Flours,
## In-Wash,
## Soap} => {Spices} 0.0072 0.8101 9.8565 64
## [47] {Pulses,
## Tea} => {Spices} 0.0077 0.8095 9.8492 68
## [48] {Salt,
## Utensil Cleaner} => {Spices} 0.0076 0.8072 9.8213 67
## [49] {Soap,
## Sugar} => {Spices} 0.0075 0.8049 9.7927 66
## [50] {Flours,
## Tooth Paste} => {Spices} 0.0084 0.8043 9.7862 74
## [51] {Flours,
## Pooja Needs} => {Spices} 0.0055 0.8033 9.7732 49
## [52] {Flours,
## Soap,
## Utensil Cleaner} => {Spices} 0.0060 0.8030 9.7702 53