Introduction
Association rules are an unsupervised machine learning method widely used for market basket analysis. They reveal which products tend to be bought together, that is, the co-occurrence patterns in customers' purchases. This knowledge can help shop owners make their business more profitable, for example by arranging products around the shop accordingly.
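In this project, the Apriori algorithm was used to extract these relationships from the data. Apriori limits the number of itemsets it has to examine. It first identifies the individual items that are frequent, then extends them step by step into larger itemsets, keeping only those that are still frequent. This works because an itemset can only be frequent if all of its subsets are frequent. Apriori is controlled by two thresholds, a minimum “support” and a minimum “confidence”. The support of an itemset is the proportion of transactions that contain it. The confidence of a rule X => Y is the conditional probability that a transaction contains Y given that it contains X, estimated as support(X and Y together) / support(X).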
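The following base-R sketch computes the two measures by hand on five hypothetical baskets. The baskets and the helper names `itemset_support()` and `rule_confidence()` are illustrative assumptions and are not part of arules. In the analysis itself, `apriori()` computes these values.

```r
# Hypothetical toy baskets, for illustration only
baskets <- list(
  c("eggs", "milk", "spaghetti"),
  c("eggs", "mineral water"),
  c("milk", "spaghetti"),
  c("eggs", "milk"),
  c("mineral water", "spaghetti")
)

# Support: share of baskets that contain every item of the itemset
itemset_support <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Confidence of lhs => rhs: support(lhs and rhs together) / support(lhs),
# an estimate of P(rhs in basket | lhs in basket)
rule_confidence <- function(lhs, rhs, baskets) {
  itemset_support(c(lhs, rhs), baskets) / itemset_support(lhs, baskets)
}

itemset_support("eggs", baskets)             # 3 of 5 baskets: 0.6
itemset_support(c("eggs", "milk"), baskets)  # 2 of 5 baskets: 0.4
rule_confidence("eggs", "milk", baskets)     # 0.4 / 0.6, about 0.667
```

In this toy data, two thirds of the baskets that contain eggs also contain milk. Apriori would report this rule only if both its support (0.4) and its confidence (about 0.667) reach the chosen thresholds.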
The data used for the analysis come from Kaggle: https://www.kaggle.com/roshansharma/market-basket-optimization.
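# Loading the data (arules: read.transactions(), apriori(); RColorBrewer and ggplot2: plots below). Adjust the path to your local copy.
library(arules); library(RColorBrewer); library(ggplot2)
tr <- read.transactions("C:\\Users\\sikor\\OneDrive\\Dokumenty\\DATA SCIENCE\\USL\\Market_Basket_Optimisation.csv", format = 'basket', sep=',')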
## Warning in asMethod(object): removing duplicated items in transactions
summary(tr)
## transactions as itemMatrix in sparse format with
## 7501 rows (elements/itemsets/transactions) and
## 119 columns (items) and a density of 0.03288973
##
## most frequent items:
## mineral water eggs spaghetti french fries chocolate
## 1788 1348 1306 1282 1229
## (Other)
## 22405
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 1754 1358 1044 816 667 493 391 324 259 139 102 67 40 22 17 4
## 18 19 20
## 1 2 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 3.914 5.000 20.000
##
## includes extended item information - examples:
## labels
## 1 almonds
## 2 antioxydant juice
## 3 asparagus
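The figures in this summary are consistent with each other. The transaction-length distribution accounts for all 7501 baskets and 29358 item occurrences. This is the same total as the five most frequent items plus "(Other)": 1788 + 1348 + 1306 + 1282 + 1229 + 22405. The density is that total divided by the number of cells in the 7501 x 119 item matrix. The following base-R check uses only numbers printed above. The variable names are illustrative.

```r
# Transaction-length distribution copied from summary(tr) above (no basket has 17 items)
sizes  <- c(1:16, 18, 19, 20)
counts <- c(1754, 1358, 1044, 816, 667, 493, 391, 324, 259, 139,
            102, 67, 40, 22, 17, 4, 1, 2, 1)

n_transactions <- sum(counts)          # 7501 baskets
n_items        <- 119                  # distinct products (matrix columns)
total_items    <- sum(sizes * counts)  # 29358 item occurrences in total

# Expand to one length per basket and summarise, as summary(tr) does
lengths_vec <- rep(sizes, times = counts)
summary(lengths_vec)  # should agree with the Min./1st Qu./Median/Mean/3rd Qu./Max. line

# Density: share of non-empty cells in the 7501 x 119 binary item matrix
matrix_density <- total_items / (n_transactions * n_items)
matrix_density        # about 0.0329, as reported
```

The density is also the mean basket length divided by the number of items (3.914 / 119). On average, each customer buys only a small fraction of the assortment.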
itemFrequency(tr, type="relative")
## almonds antioxydant juice asparagus
## 0.0203972804 0.0089321424 0.0047993601
## avocado babies food bacon
## 0.0333288895 0.0045327290 0.0086655113
## barbecue sauce black tea blueberries
## 0.0107985602 0.0142647647 0.0091987735
## body spray bramble brownies
## 0.0114651380 0.0018664178 0.0337288362
## bug spray burger sauce burgers
## 0.0086655113 0.0058658845 0.0871883749
## butter cake candy bars
## 0.0301293161 0.0810558592 0.0097320357
## carrots cauliflower cereals
## 0.0153312892 0.0047993601 0.0257299027
## champagne chicken chili
## 0.0467937608 0.0599920011 0.0061325157
## chocolate chocolate bread chutney
## 0.1638448207 0.0042660979 0.0041327823
## cider clothes accessories cookies
## 0.0105319291 0.0083988801 0.0803892814
## cooking oil corn cottage cheese
## 0.0510598587 0.0047993601 0.0318624183
## cream dessert wine eggplant
## 0.0009332089 0.0043994134 0.0131982402
## eggs energy bar energy drink
## 0.1797093721 0.0270630583 0.0266631116
## escalope extra dark chocolate flax seed
## 0.0793227570 0.0119984002 0.0090654579
## french fries french wine fresh bread
## 0.1709105453 0.0225303293 0.0430609252
## fresh tuna fromage blanc frozen smoothie
## 0.0222636982 0.0135981869 0.0633248900
## frozen vegetables gluten free bar grated cheese
## 0.0953206239 0.0069324090 0.0523930143
## green beans green grapes green tea
## 0.0086655113 0.0090654579 0.1321157179
## ground beef gums ham
## 0.0982535662 0.0134648714 0.0265297960
## hand protein bar herb & pepper honey
## 0.0051993068 0.0494600720 0.0474603386
## hot dogs ketchup light cream
## 0.0323956806 0.0043994134 0.0155979203
## light mayo low fat yogurt magazines
## 0.0271963738 0.0765231302 0.0109318757
## mashed potato mayonnaise meatballs
## 0.0041327823 0.0061325157 0.0209305426
## melons milk mineral water
## 0.0119984002 0.1295827223 0.2383682176
## mint mint green tea muffins
## 0.0174643381 0.0055992534 0.0241301160
## mushroom cream sauce napkins nonfat milk
## 0.0190641248 0.0006665778 0.0103986135
## oatmeal oil olive oil
## 0.0043994134 0.0230635915 0.0658578856
## pancakes parmesan cheese pasta
## 0.0950539928 0.0198640181 0.0157312358
## pepper pet food pickles
## 0.0265297960 0.0065324623 0.0059992001
## protein bar red wine rice
## 0.0185308626 0.0281295827 0.0187974937
## salad salmon salt
## 0.0049326756 0.0425276630 0.0091987735
## sandwich shallot shampoo
## 0.0045327290 0.0077323024 0.0049326756
## shrimp soda soup
## 0.0714571390 0.0062658312 0.0505265965
## spaghetti sparkling water spinach
## 0.1741101187 0.0062658312 0.0070657246
## strawberries strong cheese tea
## 0.0213304893 0.0077323024 0.0038661512
## tomato juice tomato sauce tomatoes
## 0.0303959472 0.0141314491 0.0683908812
## toothpaste turkey vegetables mix
## 0.0081322490 0.0625249967 0.0257299027
## water spray white wine whole weat flour
## 0.0003999467 0.0165311292 0.0093320891
## whole wheat pasta whole wheat rice yams
## 0.0294627383 0.0585255299 0.0114651380
## yogurt cake zucchini
## 0.0273296894 0.0094654046
itemFrequency(tr, type="absolute")
## almonds antioxydant juice asparagus
## 153 67 36
## avocado babies food bacon
## 250 34 65
## barbecue sauce black tea blueberries
## 81 107 69
## body spray bramble brownies
## 86 14 253
## bug spray burger sauce burgers
## 65 44 654
## butter cake candy bars
## 226 608 73
## carrots cauliflower cereals
## 115 36 193
## champagne chicken chili
## 351 450 46
## chocolate chocolate bread chutney
## 1229 32 31
## cider clothes accessories cookies
## 79 63 603
## cooking oil corn cottage cheese
## 383 36 239
## cream dessert wine eggplant
## 7 33 99
## eggs energy bar energy drink
## 1348 203 200
## escalope extra dark chocolate flax seed
## 595 90 68
## french fries french wine fresh bread
## 1282 169 323
## fresh tuna fromage blanc frozen smoothie
## 167 102 475
## frozen vegetables gluten free bar grated cheese
## 715 52 393
## green beans green grapes green tea
## 65 68 991
## ground beef gums ham
## 737 101 199
## hand protein bar herb & pepper honey
## 39 371 356
## hot dogs ketchup light cream
## 243 33 117
## light mayo low fat yogurt magazines
## 204 574 82
## mashed potato mayonnaise meatballs
## 31 46 157
## melons milk mineral water
## 90 972 1788
## mint mint green tea muffins
## 131 42 181
## mushroom cream sauce napkins nonfat milk
## 143 5 78
## oatmeal oil olive oil
## 33 173 494
## pancakes parmesan cheese pasta
## 713 149 118
## pepper pet food pickles
## 199 49 45
## protein bar red wine rice
## 139 211 141
## salad salmon salt
## 37 319 69
## sandwich shallot shampoo
## 34 58 37
## shrimp soda soup
## 536 47 379
## spaghetti sparkling water spinach
## 1306 47 53
## strawberries strong cheese tea
## 160 58 29
## tomato juice tomato sauce tomatoes
## 228 106 513
## toothpaste turkey vegetables mix
## 61 469 193
## water spray white wine whole weat flour
## 3 124 70
## whole wheat pasta whole wheat rice yams
## 221 439 86
## yogurt cake zucchini
## 205 71
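The two listings carry the same information. Each relative frequency is the absolute count divided by the number of transactions (7501), so an item's relative frequency is the support of the one-item itemset containing it. The following base-R check uses the five most frequent items, with counts copied from the output above.

```r
n_transactions <- 7501

# Absolute counts from itemFrequency(tr, type = "absolute") above
abs_counts <- c("mineral water" = 1788, "eggs" = 1348, "spaghetti" = 1306,
                "french fries" = 1282, "chocolate" = 1229)

# Relative frequency = count / number of transactions = support of the item
rel_freq <- abs_counts / n_transactions
round(rel_freq, 4)  # 0.2384, 0.1797, 0.1741, 0.1709, 0.1638
```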
# Plotting the absolute frequencies of the 15 most frequent items
itemFrequencyPlot(tr,topN=15,type="absolute",col=brewer.pal(10,'Paired'), xlab="Product name",
ylab="Frequency in absolute terms", main="Absolute Item Frequency Plot")
# Trying different support and confidence thresholds to see how many rules each combination generates,
# in order to choose appropriate threshold values later.
support <- c(0.1, 0.05, 0.01, 0.005)
confidencelvl <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)
# One vector of rule counts per support level, with one entry per confidence level
support1 <- integer(length(confidencelvl))
support2 <- integer(length(confidencelvl))
support3 <- integer(length(confidencelvl))
support4 <- integer(length(confidencelvl))
for (i in seq_along(confidencelvl)) {
  support1[i] <- length(apriori(tr, parameter=list(sup=support[1], conf=confidencelvl[i], target="rules")))
}
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 750
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
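The apriori logs show how the support threshold is applied. With 7501 transactions, 10% support corresponds to an "Absolute minimum support count" of 750. Only 7 items reach that count: mineral water, eggs, spaghetti, french fries, chocolate, green tea and milk. This leaves very few candidate itemsets and explains why at most 7 rules appear. The following sketch reproduces the logged counts from the item frequencies. The helper names `min_count()` and `frequent_items()` are illustrative. The truncation in `min_count()` is inferred from the logged values (for example, 0.005 * 7501 = 37.505 is reported as 37), not taken from the arules documentation.

```r
n_transactions <- 7501

# Items bought at least 300 times, copied from itemFrequency(tr, type = "absolute").
# This subset is enough because the thresholds checked below are all above 300.
item_counts <- c(
  "mineral water" = 1788, "eggs" = 1348, "spaghetti" = 1306, "french fries" = 1282,
  "chocolate" = 1229, "green tea" = 991, "milk" = 972, "ground beef" = 737,
  "frozen vegetables" = 715, "pancakes" = 713, "burgers" = 654, "cake" = 608,
  "cookies" = 603, "escalope" = 595, "low fat yogurt" = 574, "shrimp" = 536,
  "tomatoes" = 513, "olive oil" = 494, "frozen smoothie" = 475, "turkey" = 469,
  "chicken" = 450, "whole wheat rice" = 439, "grated cheese" = 393,
  "cooking oil" = 383, "soup" = 379, "herb & pepper" = 371, "honey" = 356,
  "champagne" = 351, "fresh bread" = 323, "salmon" = 319
)

# Assumed rule: minimum count = support * number of transactions, truncated
min_count <- function(sup, n = n_transactions) floor(sup * n)

# Items that survive the first pass ("sorting and recoding items ... [k item(s)]")
frequent_items <- function(sup) names(item_counts)[item_counts >= min_count(sup)]

min_count(c(0.1, 0.05, 0.01, 0.005))  # 750 375 75 37, as in the logs
length(frequent_items(0.1))           # 7 items at 10% support
length(frequent_items(0.05))          # 25 items at 5% support (soup 379 in, herb & pepper 371 out)
```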
for (i in seq_along(confidencelvl)) {
  support2[i] <- length(apriori(tr, parameter=list(sup=support[2], conf=confidencelvl[i], target="rules")))
}
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [7 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [13 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
for (i in seq_along(confidencelvl)) {
  support3[i] <- length(apriori(tr, parameter=list(sup=support[3], conf=confidencelvl[i], target="rules")))
}
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [18 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [63 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [164 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [316 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
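At 1% support the log reports "checking subsets of size 1 2 3 4", while at 5% and 10% it stops at size 2. This reflects Apriori's level-wise search. Frequent k-itemsets are joined into (k+1)-item candidates, candidates with an infrequent subset are pruned without counting, and the rest are counted against the minimum support. The following base-R sketch illustrates these join, prune and count steps on six hypothetical baskets with a hypothetical minimum count of 2. It is a textbook formulation and does not show how arules implements Apriori internally.

```r
# Hypothetical toy baskets (not taken from the Kaggle data)
toy_baskets <- list(
  c("eggs", "milk", "spaghetti"),
  c("eggs", "milk", "mineral water"),
  c("eggs", "milk", "spaghetti", "mineral water"),
  c("milk", "spaghetti"),
  c("eggs", "spaghetti"),
  c("mineral water")
)
minsup_count <- 2  # hypothetical absolute minimum support count

# Number of baskets containing every item of the itemset
count_itemset <- function(itemset) {
  sum(vapply(toy_baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(toy_baskets)))
level <- Filter(function(s) count_itemset(s) >= minsup_count, as.list(items))
frequent <- list()
k <- 1
while (length(level) > 0) {
  frequent[[k]] <- level
  # Join: merge pairs of frequent k-itemsets whose union has k + 1 items
  candidates <- list()
  for (i in seq_along(level)) {
    for (j in seq_along(level)) {
      if (i < j) {
        u <- sort(union(level[[i]], level[[j]]))
        if (length(u) == k + 1) candidates[[length(candidates) + 1]] <- u
      }
    }
  }
  candidates <- unique(candidates)
  # Prune: drop candidates with an infrequent k-subset (no counting needed)
  in_level <- function(s) any(vapply(level, identical, logical(1), s))
  candidates <- Filter(function(cand) {
    all(vapply(combn(cand, k, simplify = FALSE), in_level, logical(1)))
  }, candidates)
  # Count: keep candidates that reach the minimum support count
  level <- Filter(function(s) count_itemset(s) >= minsup_count, candidates)
  k <- k + 1
}

lengths(frequent)  # number of frequent itemsets of size 1, 2 and 3
```

Here the search finds 4, 5 and 2 frequent itemsets of size 1, 2 and 3. The candidate {eggs, mineral water, spaghetti} is discarded in the prune step without being counted, because its subset {mineral water, spaghetti} appears in only one basket. A lower support threshold lets more itemsets survive each level, which is why the search in the logs goes deeper at 1% and 0.5%.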
for (i in seq_along(confidencelvl)) {
  support4[i] <- length(apriori(tr, parameter=list(sup=support[4], conf=confidencelvl[i], target="rules")))
}
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [20 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [91 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [261 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [599 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 37
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [101 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [1066 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
#Plotting the results
rulez <- data.frame(support1,support2,support3,support4, confidencelvl)
ggplot(data=rulez, aes(x=confidencelvl)) +
geom_line(aes(y=support1, colour="Support 10%")) +
geom_point(aes(y=support1, colour="Support 10%")) +
geom_line(aes(y=support2, colour="Support 5%")) +
geom_point(aes(y=support2, colour="Support 5%")) +
geom_line(aes(y=support3, colour="Support 1%")) +
geom_point(aes(y=support3, colour="Support 1%")) +
geom_line(aes(y=support4, colour="Support 0.5%")) +
geom_point(aes(y=support4, colour="Support 0.5%")) +
labs(x="Confidence", y="Rules found",
title="Choosing appropriate support level") +
theme_bw() +
theme(legend.title=element_blank())
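The loop that produced `support1` … `support4` is not shown in this section. As a rough, self-contained illustration of what such a grid search measures, the base-R sketch below counts rules on a small hypothetical basket list (`grid_baskets` is made up, not taken from the Kaggle file). As a simplification it only considers rules with one item on each side; the real `apriori()` call also mines larger itemsets, so the numbers are not comparable with the plot above, only the trend is.

```r
# Hypothetical toy baskets (NOT the Kaggle data), used only to show how the
# number of rules reacts to the support and confidence thresholds
grid_baskets <- list(
  c("mineral water", "eggs", "spaghetti"),
  c("mineral water", "chocolate"),
  c("spaghetti", "mineral water", "ground beef"),
  c("eggs", "french fries"),
  c("chocolate", "spaghetti", "mineral water"),
  c("ground beef", "spaghetti"),
  c("mineral water", "eggs"),
  c("french fries", "chocolate")
)
grid_items <- sort(unique(unlist(grid_baskets)))
# Logical transaction-by-item matrix: TRUE if the basket contains the item
grid_m <- t(sapply(grid_baskets, function(b) grid_items %in% b))
colnames(grid_m) <- grid_items

# Count rules {a} => {b} (one item on each side) that meet both thresholds
count_pair_rules <- function(m, min_supp, min_conf) {
  n <- nrow(m)
  found <- 0
  for (a in colnames(m)) {
    for (b in colnames(m)) {
      if (a == b) next
      both <- sum(m[, a] & m[, b])
      if (both / n >= min_supp && both / sum(m[, a]) >= min_conf) found <- found + 1
    }
  }
  found
}

# Same idea as the original grid: decreasing confidence, several support levels
conf_grid <- seq(0.9, 0.1, by = -0.1)
supp_grid <- c(0.25, 0.125)
rule_counts <- sapply(supp_grid, function(s)
  sapply(conf_grid, function(cf) count_pair_rules(grid_m, s, cf)))
dimnames(rule_counts) <- list(confidence = conf_grid, support = supp_grid)
rule_counts
```

Each column can only grow as confidence drops, and the lower-support column is never below the higher one, which is the pattern the elbow plot relies on.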
# A closer look at support levels of 5% and 10%
plot1 <- qplot(confidencelvl, support2, geom=c("point", "line"),
xlab="Confidence", ylab="Rules",
main="Support 5%") +
theme_bw()
# Number of rules found with a support level of 10%
plot2 <- qplot(confidencelvl, support1, geom=c("point", "line"),
xlab="Confidence", ylab="Rules",
main="Support 10%") +
scale_y_continuous(breaks=seq(0, 10, 2)) +
theme_bw()
grid.arrange(plot1, plot2, ncol=2)
# Running Apriori with a support level of 5% and a confidence level of 30%
resulting_rules1 <- apriori(tr, parameter=list(sup=support[2], conf=confidencelvl[7], target="rules"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 375
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(resulting_rules1)
## lhs rhs support confidence coverage lift
## [1] {chocolate} => {mineral water} 0.05265965 0.3213995 0.1638448 1.348332
## [2] {spaghetti} => {mineral water} 0.05972537 0.3430322 0.1741101 1.439085
## count
## [1] 395
## [2] 448
We get two rules as a result. The first indicates that 32% of customers who purchased chocolate also bought mineral water; the second, that 34% of customers who bought spaghetti also purchased mineral water.
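The support, confidence and lift of the first rule can be reproduced by hand from counts already shown in this analysis: 7501 transactions, chocolate in 1229 of them and mineral water in 1788 (from `summary(tr)`), and both together in 395 (the `count` column). The base-R check below is only a worked example of the standard definitions, not part of the original workflow; the variable names are illustrative and deliberately avoid overwriting `support`.

```r
# Counts taken from the outputs above
n_trans <- 7501      # transactions
n_chocolate <- 1229  # transactions containing chocolate (lhs)
n_water <- 1788      # transactions containing mineral water (rhs)
n_both <- 395        # transactions containing both

rule_support <- n_both / n_trans                    # P(chocolate and mineral water)
rule_coverage <- n_chocolate / n_trans              # P(chocolate), support of the lhs
rule_confidence <- n_both / n_chocolate             # P(mineral water | chocolate)
rule_lift <- rule_confidence / (n_water / n_trans)  # confidence / P(mineral water)

round(c(support = rule_support, coverage = rule_coverage,
        confidence = rule_confidence, lift = rule_lift), 4)
```

The rounded values agree with the support, coverage, confidence and lift columns of rule [1]. A lift above 1 (about 1.35 here) means mineral water turns up in chocolate baskets more often than in an average basket.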
To see more rules, we can use different support and confidence levels. Let's look at the results for a lower support, for example support = 0.01. For support = 0.01, the elbow point lies at a confidence level of around 40%.
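The "Absolute minimum support count" lines in the Apriori logs (375 for support 0.05, 75 for 0.01 and 37 for 0.005) are consistent with multiplying the relative support by the 7501 transactions and dropping the fractional part. The snippet below reproduces those three numbers; treating truncation as the rule is an inference from the logs, not a documented detail of `arules`.

```r
n_trans <- 7501
min_supp <- c(0.05, 0.01, 0.005)
# Relative support times number of transactions, truncated to a whole count
min_count <- floor(min_supp * n_trans)
min_count
```

In practical terms, lowering support from 0.05 to 0.01 lets itemsets that occur in roughly 75 baskets, rather than 375, take part in the rules.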
resulting_rules2 <- apriori(tr, parameter=list(sup=support[3], conf=confidencelvl[6], target="rules"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 75
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[119 item(s), 7501 transaction(s)] done [0.00s].
## sorting and recoding items ... [75 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [18 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(resulting_rules2[1:10])
## lhs rhs support confidence
## [1] {salmon} => {mineral water} 0.01706439 0.4012539
## [2] {soup} => {mineral water} 0.02306359 0.4564644
## [3] {olive oil} => {mineral water} 0.02759632 0.4190283
## [4] {ground beef} => {mineral water} 0.04092788 0.4165536
## [5] {olive oil,spaghetti} => {mineral water} 0.01026530 0.4476744
## [6] {pancakes,spaghetti} => {mineral water} 0.01146514 0.4550265
## [7] {frozen vegetables,milk} => {mineral water} 0.01106519 0.4689266
## [8] {frozen vegetables,spaghetti} => {mineral water} 0.01199840 0.4306220
## [9] {ground beef,milk} => {mineral water} 0.01106519 0.5030303
## [10] {chocolate,ground beef} => {mineral water} 0.01093188 0.4739884
## coverage lift count
## [1] 0.04252766 1.683336 128
## [2] 0.05052660 1.914955 173
## [3] 0.06585789 1.757904 207
## [4] 0.09825357 1.747522 307
## [5] 0.02293028 1.878079 77
## [6] 0.02519664 1.908923 86
## [7] 0.02359685 1.967236 83
## [8] 0.02786295 1.806541 90
## [9] 0.02199707 2.110308 83
## [10] 0.02306359 1.988472 82
summary(resulting_rules2)
## set of 18 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3
## 4 14
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 3.000 3.000 2.778 3.000 3.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.01013 Min. :0.4013 Min. :0.02000 Min. :1.683
## 1st Qu.:0.01117 1st Qu.:0.4175 1st Qu.:0.02400 1st Qu.:1.763
## Median :0.01373 Median :0.4355 Median :0.03266 Median :1.844
## Mean :0.01621 Mean :0.4414 Mean :0.03737 Mean :1.888
## 3rd Qu.:0.01706 3rd Qu.:0.4561 3rd Qu.:0.04049 3rd Qu.:1.954
## Max. :0.04093 Max. :0.5067 Max. :0.09825 Max. :2.395
## count
## Min. : 76.00
## 1st Qu.: 83.75
## Median :103.00
## Mean :121.61
## 3rd Qu.:128.00
## Max. :307.00
##
## mining info:
## data ntransactions support confidence
## tr 7501 0.01 0.4
Interpretation:
There are 18 rules at these support and confidence levels, and all ten shown above have mineral water as the consequent. For example, 40% of customers who bought salmon also purchased mineral water, 50% of customers who bought ground beef and milk also purchased mineral water, and 43% of customers who bought frozen vegetables and spaghetti also purchased mineral water.
#The most meaningful rules (highest lift)
inspect(sort(resulting_rules2, by = "lift")[1:5])
## lhs rhs support confidence
## [1] {ground beef,mineral water} => {spaghetti} 0.01706439 0.4169381
## [2] {eggs,ground beef} => {mineral water} 0.01013198 0.5066667
## [3] {ground beef,milk} => {mineral water} 0.01106519 0.5030303
## [4] {chocolate,ground beef} => {mineral water} 0.01093188 0.4739884
## [5] {frozen vegetables,milk} => {mineral water} 0.01106519 0.4689266
## coverage lift count
## [1] 0.04092788 2.394681 128
## [2] 0.01999733 2.125563 76
## [3] 0.02199707 2.110308 83
## [4] 0.02306359 1.988472 82
## [5] 0.02359685 1.967236 83
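42% of customers who bought ground beef and mineral water also purchased spaghetti, 51% of customers who bought eggs and ground beef also purchased mineral water, and 50% of customers who bought ground beef and milk also purchased mineral water. The first rule has the highest lift (2.39): baskets containing ground beef and mineral water are about 2.4 times as likely to contain spaghetti as an average basket.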
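In the lift-sorted list, four of the five rules share the consequent {mineral water}. Since lift is confidence divided by the support of the consequent, and mineral water appears in 1788 of the 7501 transactions, those four lifts are simply the confidences multiplied by one constant, so ordering them by lift or by confidence gives the same result. The check below uses the confidence values copied from the output above; the vector names are illustrative.

```r
# Support of the shared consequent {mineral water}
p_water <- 1788 / 7501
# Confidences of the four {...} => {mineral water} rules in the lift-sorted output
conf_water <- c("eggs,ground beef"       = 0.5066667,
                "ground beef,milk"       = 0.5030303,
                "chocolate,ground beef"  = 0.4739884,
                "frozen vegetables,milk" = 0.4689266)
# lift = confidence / P(rhs); with a fixed rhs this is a constant rescaling
lift_water <- conf_water / p_water
round(lift_water, 6)
```

This is also why the confidence-sorted list below repeats the same mineral-water rules in the same order: lift only reorders rules with different consequents.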
#Ordered by confidence
inspect(sort(resulting_rules2, by = "confidence")[1:5])
## lhs rhs support confidence
## [1] {eggs,ground beef} => {mineral water} 0.01013198 0.5066667
## [2] {ground beef,milk} => {mineral water} 0.01106519 0.5030303
## [3] {chocolate,ground beef} => {mineral water} 0.01093188 0.4739884
## [4] {frozen vegetables,milk} => {mineral water} 0.01106519 0.4689266
## [5] {soup} => {mineral water} 0.02306359 0.4564644
## coverage lift count
## [1] 0.01999733 2.125563 76
## [2] 0.02199707 2.110308 83
## [3] 0.02306359 1.988472 82
## [4] 0.02359685 1.967236 83
## [5] 0.05052660 1.914955 173
#Ordered by support
inspect(sort(resulting_rules2, by = "support")[1:5])
## lhs rhs support confidence coverage
## [1] {ground beef} => {mineral water} 0.04092788 0.4165536 0.09825357
## [2] {olive oil} => {mineral water} 0.02759632 0.4190283 0.06585789
## [3] {soup} => {mineral water} 0.02306359 0.4564644 0.05052660
## [4] {salmon} => {mineral water} 0.01706439 0.4012539 0.04252766
## [5] {ground beef,spaghetti} => {mineral water} 0.01706439 0.4353741 0.03919477
## lift count
## [1] 1.747522 307
## [2] 1.757904 207
## [3] 1.914955 173
## [4] 1.683336 128
## [5] 1.826477 128
#Answering some questions
# What is driving people to buy mineral water?
# Only rules with mineral water on the right-hand side
rules.water <- apriori(data=tr, parameter=list(supp=0.01, conf=0.4),
appearance=list(default="lhs", rhs="mineral water"), control=list(verbose=FALSE))
rules.water.byconf <- sort(rules.water, by="confidence", decreasing=TRUE)
inspect(head(rules.water.byconf))
## lhs rhs support confidence
## [1] {eggs,ground beef} => {mineral water} 0.01013198 0.5066667
## [2] {ground beef,milk} => {mineral water} 0.01106519 0.5030303
## [3] {chocolate,ground beef} => {mineral water} 0.01093188 0.4739884
## [4] {frozen vegetables,milk} => {mineral water} 0.01106519 0.4689266
## [5] {soup} => {mineral water} 0.02306359 0.4564644
## [6] {pancakes,spaghetti} => {mineral water} 0.01146514 0.4550265
## coverage lift count
## [1] 0.01999733 2.125563 76
## [2] 0.02199707 2.110308 83
## [3] 0.02306359 1.988472 82
## [4] 0.02359685 1.967236 83
## [5] 0.05052660 1.914955 173
## [6] 0.02519664 1.908923 86
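The rules with the highest confidence have {eggs, ground beef} and {ground beef, milk} on the left-hand side: about 51% and 50% of the baskets containing them also contain mineral water. Mineral water is also the most frequent item in the data (1788 of 7501 transactions), which helps explain why it appears so often as the consequent.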
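The `appearance=list(default="lhs", rhs="mineral water")` argument restricts the search to rules with a fixed consequent. To make that idea concrete without `arules`, the sketch below brute-forces every left-hand side of one or two items for a fixed right-hand side on a small hypothetical basket list (`toy_baskets`, `target_item`, `min_supp` and `min_conf` are illustrative names and values). It is plain enumeration, not Apriori: there is no support-based pruning, which only matters for large item sets.

```r
# Hypothetical toy baskets (illustration only, not the Kaggle data)
toy_baskets <- list(
  c("eggs", "ground beef", "mineral water"),
  c("ground beef", "milk", "mineral water"),
  c("eggs", "ground beef"),
  c("chocolate", "mineral water"),
  c("milk", "spaghetti"),
  c("ground beef", "milk", "mineral water"),
  c("eggs", "ground beef", "mineral water"),
  c("chocolate", "spaghetti")
)
target_item <- "mineral water"
min_supp <- 0.2
min_conf <- 0.4
n <- length(toy_baskets)

# Candidate left-hand sides: every single item and every pair, excluding the target
cand_items <- setdiff(sort(unique(unlist(toy_baskets))), target_item)
lhs_sets <- c(as.list(cand_items), combn(cand_items, 2, simplify = FALSE))
has_target <- vapply(toy_baskets, function(b) target_item %in% b, logical(1))

rule_stats <- do.call(rbind, lapply(lhs_sets, function(lhs) {
  has_lhs <- vapply(toy_baskets, function(b) all(lhs %in% b), logical(1))
  both <- sum(has_lhs & has_target)
  data.frame(lhs = paste(lhs, collapse = ","),
             support = both / n,
             confidence = if (any(has_lhs)) both / sum(has_lhs) else 0,
             stringsAsFactors = FALSE)
}))

# Keep rules passing both thresholds, strongest confidence first
water_rules <- rule_stats[rule_stats$support >= min_supp &
                            rule_stats$confidence >= min_conf, ]
water_rules <- water_rules[order(-water_rules$confidence, -water_rules$support), ]
water_rules
```

On this toy data, {ground beef, milk} comes out on top, echoing the shape of the real result, but that is a property of the made-up baskets rather than evidence about the Kaggle data.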
#Plotting the results
plot(resulting_rules2, col="#EEACACFF")
plot(resulting_rules2, col="#EEACACFF", method="grouped")
plot(resulting_rules2, method="graph")
The Apriori algorithm allowed us to discover relationships between purchases of certain products. For example, buying eggs and ground beef, or ground beef and milk, is associated with also buying mineral water. This suggests that a shop owner may increase sales by placing mineral water close to beef and milk, or to eggs and beef.