Association mining, also known as market basket analysis, is a useful way to understand product purchase patterns. Those patterns can inform decisions such as promotional schemes, product recommendations on a website, product placement in a store layout, and fast-mover versus slow-mover analysis.
Let’s load the data.
The dataset is available at https://www.kaggle.com/gorkhachatryan01/purchase-behaviour
Then we plot the item frequencies for the 25 most frequent items.
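The loading and plotting code is not shown above, so here is a minimal sketch of how it might look with the `arules` package. The file contents are made up for illustration and the name `toy_trans` is hypothetical. Item labels that appear later in the output, such as `sauce,` and `all-`, may indicate that the original file was split on whitespace rather than on commas; passing `sep = ","` to `read.transactions()` is one way to keep multi-word products together.

```r
library(arules)

# Hypothetical stand-in for the downloaded file: one basket per line,
# items separated by commas (contents invented for this sketch).
path <- tempfile(fileext = ".csv")
writeLines(c(
  "spaghetti sauce,paper towels,sandwich bags",
  "spaghetti sauce,sandwich loaves",
  "paper towels,laundry detergent,sandwich bags",
  "sandwich bags,sandwich loaves,spaghetti sauce"
), path)

# sep = "," keeps multi-word products such as "spaghetti sauce" as one item.
toy_trans <- read.transactions(path, format = "basket", sep = ",")
summary(toy_trans)

# Relative frequency of the most common items (at most 25).
itemFrequencyPlot(toy_trans, topN = min(25, ncol(toy_trans)))
```

For the real data, the path would point at the downloaded file and `topN = 25` could be used directly.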
We follow the standard process and first run the algorithm on the data without setting any parameters, so that it returns rules under its default thresholds.
rules <- apriori(mydata)
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target   ext
##      10  rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 149
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[95 item(s), 1499 transaction(s)] done [0.00s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [86 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
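The output shows the default parameters: a minimum support of 0.1 (an absolute count of 149 out of 1,499 transactions), a minimum confidence of 0.8, and rule lengths from 1 to 10. Let's see how many rules the algorithm found.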
## set of 86 rules
Now let's plot the default rules.
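The plotting call itself is not shown, so the following is only a sketch of one common way to do it, assuming the `arulesViz` package. It uses the `Groceries` data set bundled with `arules` rather than the article's data, and the support threshold is lowered so that the example data produces some rules; `demo_rules` is a hypothetical name.

```r
library(arules)
library(arulesViz)

data("Groceries")  # example data bundled with arules, not the article's data

demo_rules <- apriori(
  Groceries,
  parameter = list(support = 0.001, confidence = 0.8),
  control   = list(verbose = FALSE)
)

# Default plot for rules: a scatter plot of support against confidence,
# with points shaded by lift.
plot(demo_rules)
```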
With a first look at the data done, we can set our own parameters and mine rules by experimenting with support, confidence, and lift.
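To make the three measures concrete, here is a small base-R sketch that computes them by hand for a single rule. The baskets are invented for illustration and are not taken from the dataset.

```r
# Toy baskets (made up for illustration).
toy_baskets <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("milk"),
  c("bread", "butter", "jam")
)

# Fraction of baskets that contain every item in `items`.
support <- function(items) {
  mean(vapply(toy_baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"
rhs <- "butter"

supp_rule <- support(c(lhs, rhs))      # share of baskets with both lhs and rhs
conf_rule <- supp_rule / support(lhs)  # share of lhs baskets that also contain rhs
lift_rule <- conf_rule / support(rhs)  # confidence relative to the base rate of rhs

c(support = supp_rule, confidence = conf_rule, lift = lift_rule)
```

For the rule {bread} => {butter} this gives a support of 0.6, a confidence of 0.75, and a lift of 1.25. A lift above 1 means butter appears with bread more often than its overall frequency alone would suggest.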
We can also pick an individual product and mine its associations. Suppose we want to sell bags and target the customers who buy other products along with bags. We restrict the search to rules that have bags on the right-hand side.
bags_rules <- apriori(mydata, appearance = list(default = "lhs", rhs = "bags"))
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target   ext
##      10  rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 149
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[95 item(s), 1499 transaction(s)] done [0.00s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Let's check the number of rules.
## set of 0 rules
No rules are found at the default thresholds, so we adjust the parameters and look again. Here the minimum support is lowered to 0.001 and the minimum confidence is raised to 0.9.
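The call that produced the output below is not shown. A sketch of what such a call might look like follows, using the support and confidence values printed in the parameter table. It runs on the `Groceries` data bundled with `arules` with "whole milk" as the target item, both of which are stand-ins; `milk_rules` is a hypothetical name.

```r
library(arules)

data("Groceries")  # stand-in data; the article uses `mydata`

milk_rules <- apriori(
  Groceries,
  # Lower support, higher confidence than the defaults (0.1 and 0.8).
  parameter  = list(support = 0.001, confidence = 0.9),
  # Any item may appear on the left; only the target item on the right.
  appearance = list(default = "lhs", rhs = "whole milk"),
  control    = list(verbose = FALSE)
)

milk_rules
inspect(head(milk_rules, 5))
```

On the article's data the equivalent call would presumably pass `mydata` and `rhs = "bags"`. Adding `maxlen` to the `parameter` list is one way to limit rule length and avoid the "maxlen reached" warning shown below.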
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##      10  rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 1
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[95 item(s), 1499 transaction(s)] done [0.00s].
## sorting and recoding items ... [95 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10
## Warning in apriori(mydata, appearance = list(default = "lhs", rhs =
## "bags"), : Mining stopped (maxlen reached). Only patterns up to a length of
## 10 returned!
## done [4.38s].
## writing ... [4803 rule(s)] done [0.44s].
## creating S4 object ... done [0.14s].
Let's check the number of rules.
## set of 4803 rules
Lowering the support to 0.001 yields 4,803 rules. Now let's plot them.
## Warning: plot: Too many rules supplied. Only plotting the best 100 rules
## using 'lift' (change control parameter max if needed)
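Rather than relying on the plot method to truncate the rule set, one option is to select the strongest rules explicitly before plotting. The sketch below assumes `arulesViz` and again uses the bundled `Groceries` data as a stand-in; the cut-off of 20 rules and the names `demo_rules` and `top_rules` are arbitrary choices for illustration.

```r
library(arules)
library(arulesViz)

data("Groceries")  # stand-in data, not the article's data

demo_rules <- apriori(
  Groceries,
  parameter  = list(support = 0.001, confidence = 0.9),
  appearance = list(default = "lhs", rhs = "whole milk"),
  control    = list(verbose = FALSE)
)

# Keep only the 20 rules with the highest lift, then draw them as a graph.
top_rules <- head(sort(demo_rules, by = "lift"), 20)
plot(top_rules, method = "graph")
```

Choosing the subset yourself makes it clear which rules end up in the figure and by which measure they were ranked.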
Similarly, we can find associations with sandwich.
sand_rules <- apriori(mydata, appearance = list(default = "lhs", rhs = "sandwich"))
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target   ext
##      10  rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 149
##
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[95 item(s), 1499 transaction(s)] done [0.01s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Let's check the number of rules.
## set of 6 rules
Let's plot the rules.
To inspect individual rules, sort them by confidence and look at the top ten:
rules <- sort(rules, by = "confidence", decreasing = TRUE)
inspect(rules[1:10])
##      lhs                    rhs           support   confidence lift
## [1]  {loaves,}           => {sandwich}    0.2368245 1          2.320433
## [2]  {bags,}             => {sandwich}    0.2414943 1          2.320433
## [3]  {rolls,}            => {dinner}      0.2414943 1          3.863402
## [4]  {sauce,}            => {spaghetti}   0.2434957 1          3.934383
## [5]  {towels,}           => {paper}       0.2481654 1          3.785354
## [6]  {detergent,}        => {laundry}     0.2501668 1          3.785354
## [7]  {purpose,}          => {all-}        0.2508339 1          3.794937
## [8]  {foil,}             => {aluminum}    0.2521681 1          3.785354
## [9]  {meals,}            => {individual}  0.2568379 1          3.683047
## [10] {liquid/detergent,} => {dishwashing} 0.2541694 1          3.728856
Similarly, we can sort by lift:
rules <- sort(rules, by = "lift", decreasing = TRUE)
inspect(rules[1:10])
##      lhs                        rhs         support   confidence lift
## [1]  {sandwich,spaghetti}    => {sauce,}    0.1060707 0.9636364  3.957509
## [2]  {sauce,}                => {spaghetti} 0.2434957 1.0000000  3.934383
## [3]  {sauce,,soap,}          => {spaghetti} 0.1014009 1.0000000  3.934383
## [4]  {sandwich,sauce,}       => {spaghetti} 0.1060707 1.0000000  3.934383
## [5]  {sauce,,vegetables,}    => {spaghetti} 0.1334223 1.0000000  3.934383
## [6]  {spaghetti}             => {sauce,}    0.2434957 0.9580052  3.934383
## [7]  {spaghetti,vegetables,} => {sauce,}    0.1334223 0.9569378  3.929999
## [8]  {soap,,spaghetti}       => {sauce,}    0.1014009 0.9500000  3.901507
## [9]  {all-,sandwich}         => {purpose,}  0.1107405 0.9707602  3.870132
## [10] {dinner}                => {rolls,}    0.2414943 0.9329897  3.863402