Association rules are used in data mining to discover relationships between variables in large data sets. By analyzing patterns in the data, we can uncover connections and dependencies that might not be readily apparent, and use them to make predictions and recommendations or to gain other insights.
For example, a retailer can use association rules to gain insights into customer buying behavior. By analyzing transaction data, it may discover that customers who buy milk are more likely to also buy bread and eggs. With this knowledge, the retailer can create targeted promotions or product bundles that encourage customers to purchase all three items together, which may increase sales and customer satisfaction.
Association rules can also be applied to a wide range of other fields, such as healthcare, finance, and social networks. In healthcare, they can be used to identify patterns in patient data that could be indicative of certain diseases or conditions. In finance, they can help detect fraudulent transactions by identifying unusual patterns of activity. And in social networks, they can be used to understand how users interact with each other, and to make recommendations for new connections and content.
In this project, we use the Groceries dataset from the arules package to analyze customer purchase patterns and uncover association rules that could help retailers design targeted promotions and product bundles.
library(arules)
library(arulesViz)
library(tidyverse)
library(fim4r)
The Groceries data set contains 9835 transactions and 169 distinct items.
data(Groceries)
transactions <- Groceries
The product names in the Groceries data set:
# Find the products in the data set
products <- itemLabels(transactions)
# View the unique products
products
## [1] "frankfurter" "sausage"
## [3] "liver loaf" "ham"
## [5] "meat" "finished products"
## [7] "organic sausage" "chicken"
## [9] "turkey" "pork"
## [11] "beef" "hamburger meat"
## [13] "fish" "citrus fruit"
## [15] "tropical fruit" "pip fruit"
## [17] "grapes" "berries"
## [19] "nuts/prunes" "root vegetables"
## [21] "onions" "herbs"
## [23] "other vegetables" "packaged fruit/vegetables"
## [25] "whole milk" "butter"
## [27] "curd" "dessert"
## [29] "butter milk" "yogurt"
## [31] "whipped/sour cream" "beverages"
## [33] "UHT-milk" "condensed milk"
## [35] "cream" "soft cheese"
## [37] "sliced cheese" "hard cheese"
## [39] "cream cheese " "processed cheese"
## [41] "spread cheese" "curd cheese"
## [43] "specialty cheese" "mayonnaise"
## [45] "salad dressing" "tidbits"
## [47] "frozen vegetables" "frozen fruits"
## [49] "frozen meals" "frozen fish"
## [51] "frozen chicken" "ice cream"
## [53] "frozen dessert" "frozen potato products"
## [55] "domestic eggs" "rolls/buns"
## [57] "white bread" "brown bread"
## [59] "pastry" "roll products "
## [61] "semi-finished bread" "zwieback"
## [63] "potato products" "flour"
## [65] "salt" "rice"
## [67] "pasta" "vinegar"
## [69] "oil" "margarine"
## [71] "specialty fat" "sugar"
## [73] "artif. sweetener" "honey"
## [75] "mustard" "ketchup"
## [77] "spices" "soups"
## [79] "ready soups" "Instant food products"
## [81] "sauces" "cereals"
## [83] "organic products" "baking powder"
## [85] "preservation products" "pudding powder"
## [87] "canned vegetables" "canned fruit"
## [89] "pickled vegetables" "specialty vegetables"
## [91] "jam" "sweet spreads"
## [93] "meat spreads" "canned fish"
## [95] "dog food" "cat food"
## [97] "pet care" "baby food"
## [99] "coffee" "instant coffee"
## [101] "tea" "cocoa drinks"
## [103] "bottled water" "soda"
## [105] "misc. beverages" "fruit/vegetable juice"
## [107] "syrup" "bottled beer"
## [109] "canned beer" "brandy"
## [111] "whisky" "liquor"
## [113] "rum" "liqueur"
## [115] "liquor (appetizer)" "white wine"
## [117] "red/blush wine" "prosecco"
## [119] "sparkling wine" "salty snack"
## [121] "popcorn" "nut snack"
## [123] "snack products" "long life bakery product"
## [125] "waffles" "cake bar"
## [127] "chewing gum" "chocolate"
## [129] "cooking chocolate" "specialty chocolate"
## [131] "specialty bar" "chocolate marshmallow"
## [133] "candy" "seasonal products"
## [135] "detergent" "softener"
## [137] "decalcifier" "dish cleaner"
## [139] "abrasive cleaner" "cleaner"
## [141] "toilet cleaner" "bathroom cleaner"
## [143] "hair spray" "dental care"
## [145] "male cosmetics" "make up remover"
## [147] "skin care" "female sanitary products"
## [149] "baby cosmetics" "soap"
## [151] "rubbing alcohol" "hygiene articles"
## [153] "napkins" "dishes"
## [155] "cookware" "kitchen utensil"
## [157] "cling film/bags" "kitchen towels"
## [159] "house keeping products" "candles"
## [161] "light bulbs" "sound storage medium"
## [163] "newspapers" "photo/film"
## [165] "pot plants" "flower soil/fertilizer"
## [167] "flower (seeds)" "shopping bags"
## [169] "bags"
summary(transactions)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
## whole milk other vegetables rolls/buns soda
## 2513 1903 1809 1715
## yogurt (Other)
## 1372 34055
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55 46
## 17 18 19 20 21 22 23 24 26 27 28 29 32
## 29 14 14 9 11 4 6 1 1 1 1 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 4.409 6.000 32.000
##
## includes extended item information - examples:
## labels level2 level1
## 1 frankfurter sausage meat and sausage
## 2 sausage sausage meat and sausage
## 3 liver loaf sausage meat and sausage
Number of transactions:
nrow(transactions)
## [1] 9835
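Average basket size (number of items per transaction):
# Avoid naming the result `size`, which would shadow the size() function
basket_size <- size(transactions)
mean(basket_size)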
## [1] 4.409456
This bar plot shows the relative frequency of every product with a support of at least 5%.
itemFrequencyPlot(transactions, support = 0.05, cex.names = 0.8, col = "pink")
The 30 most frequent products:
itemFrequencyPlot(transactions, topN=30, col="pink")
sort(itemFrequency(transactions, type="relative"))
## baby food sound storage medium preservation products
## 0.0001016777 0.0001016777 0.0002033554
## kitchen utensil bags frozen chicken
## 0.0004067107 0.0004067107 0.0006100661
## baby cosmetics toilet cleaner salad dressing
## 0.0006100661 0.0007117438 0.0008134215
## whisky make up remover liqueur
## 0.0008134215 0.0008134215 0.0009150991
## rubbing alcohol hair spray frozen fruits
## 0.0010167768 0.0011184545 0.0012201322
## cream honey decalcifier
## 0.0013218099 0.0015251652 0.0015251652
## organic products specialty vegetables ready soups
## 0.0016268429 0.0017285206 0.0018301983
## flower soil/fertilizer prosecco organic sausage
## 0.0019318760 0.0020335536 0.0022369090
## cocoa drinks tidbits pudding powder
## 0.0022369090 0.0023385867 0.0023385867
## cooking chocolate soap bathroom cleaner
## 0.0025419420 0.0026436197 0.0027452974
## cookware potato products fish
## 0.0027452974 0.0028469751 0.0029486528
## snack products nut snack artif. sweetener
## 0.0030503305 0.0031520081 0.0032536858
## canned fruit syrup nuts/prunes
## 0.0032536858 0.0032536858 0.0033553635
## abrasive cleaner skin care specialty fat
## 0.0035587189 0.0035587189 0.0036603965
## tea brandy light bulbs
## 0.0038637519 0.0041687850 0.0041687850
## ketchup meat spreads rum
## 0.0042704626 0.0042704626 0.0044738180
## male cosmetics liver loaf curd cheese
## 0.0045754957 0.0050838841 0.0050838841
## cleaner spices jam
## 0.0050838841 0.0051855618 0.0053889171
## sauces softener sparkling wine
## 0.0054905948 0.0054905948 0.0055922725
## cereals dental care kitchen towels
## 0.0056939502 0.0057956279 0.0059989832
## female sanitary products finished products vinegar
## 0.0061006609 0.0065073716 0.0065073716
## soups zwieback popcorn
## 0.0068124047 0.0069140824 0.0072191154
## instant coffee rice liquor (appetizer)
## 0.0074224708 0.0076258261 0.0079308592
## Instant food products turkey house keeping products
## 0.0080325369 0.0081342145 0.0083375699
## frozen potato products specialty cheese dog food
## 0.0084392476 0.0085409253 0.0085409253
## candles sweet spreads chocolate marshmallow
## 0.0089476360 0.0090493137 0.0090493137
## mayonnaise photo/film pet care
## 0.0091509914 0.0092526690 0.0094560244
## condensed milk roll products flower (seeds)
## 0.0102694459 0.0102694459 0.0103711235
## dish cleaner frozen dessert salt
## 0.0104728012 0.0107778343 0.0107778343
## canned vegetables liquor spread cheese
## 0.0107778343 0.0110828673 0.0111845450
## cling film/bags frozen fish mustard
## 0.0113879004 0.0116929334 0.0119979664
## packaged fruit/vegetables cake bar seasonal products
## 0.0130147433 0.0132180986 0.0142348754
## pasta canned fish herbs
## 0.0150482969 0.0150482969 0.0162684291
## processed cheese soft cheese pot plants
## 0.0165734621 0.0170818505 0.0172852059
## flour dishes semi-finished bread
## 0.0173868836 0.0175902389 0.0176919166
## baking powder pickled vegetables white wine
## 0.0176919166 0.0178952720 0.0190137265
## red/blush wine detergent chewing gum
## 0.0192170819 0.0192170819 0.0210472801
## grapes cat food sliced cheese
## 0.0223690900 0.0232841891 0.0245043213
## hard cheese ice cream meat
## 0.0245043213 0.0250127097 0.0258261312
## ham beverages specialty bar
## 0.0260294865 0.0260294865 0.0273512964
## butter milk oil frozen meals
## 0.0279613625 0.0280630402 0.0283680732
## misc. beverages candy specialty chocolate
## 0.0283680732 0.0298932384 0.0304016268
## onions hygiene articles hamburger meat
## 0.0310116929 0.0329435689 0.0332486019
## berries UHT-milk sugar
## 0.0332486019 0.0334519573 0.0338586680
## dessert long life bakery product salty snack
## 0.0371123538 0.0374173869 0.0378240976
## waffles cream cheese white bread
## 0.0384341637 0.0396542959 0.0420945602
## chicken frozen vegetables chocolate
## 0.0429079817 0.0480935435 0.0496187087
## napkins beef curd
## 0.0523640061 0.0524656838 0.0532791052
## butter pork coffee
## 0.0554143366 0.0576512456 0.0580579563
## margarine frankfurter domestic eggs
## 0.0585663447 0.0589730554 0.0634468734
## brown bread whipped/sour cream fruit/vegetable juice
## 0.0648703610 0.0716827656 0.0722928317
## pip fruit canned beer newspapers
## 0.0756481952 0.0776817489 0.0798169802
## bottled beer citrus fruit pastry
## 0.0805287239 0.0827656329 0.0889679715
## sausage shopping bags tropical fruit
## 0.0939501779 0.0985256736 0.1049313676
## root vegetables bottled water yogurt
## 0.1089984748 0.1105236401 0.1395017794
## soda rolls/buns other vegetables
## 0.1743772242 0.1839349263 0.1934926284
## whole milk
## 0.2555160142
Some products have an extremely low frequency, so they are too rare to take part in rules at the thresholds used below. Products with a relative frequency of 0.005 or lower are therefore dropped from the data set, which leaves 120 of the 169 items.
transactions <- transactions[, itemFrequency(transactions)>0.005]
sort(itemFrequency(transactions, type="relative"))
## liver loaf curd cheese cleaner
## 0.005083884 0.005083884 0.005083884
## spices jam sauces
## 0.005185562 0.005388917 0.005490595
## softener sparkling wine cereals
## 0.005490595 0.005592272 0.005693950
## dental care kitchen towels female sanitary products
## 0.005795628 0.005998983 0.006100661
## finished products vinegar soups
## 0.006507372 0.006507372 0.006812405
## zwieback popcorn instant coffee
## 0.006914082 0.007219115 0.007422471
## rice liquor (appetizer) Instant food products
## 0.007625826 0.007930859 0.008032537
## turkey house keeping products frozen potato products
## 0.008134215 0.008337570 0.008439248
## specialty cheese dog food candles
## 0.008540925 0.008540925 0.008947636
## sweet spreads chocolate marshmallow mayonnaise
## 0.009049314 0.009049314 0.009150991
## photo/film pet care condensed milk
## 0.009252669 0.009456024 0.010269446
## roll products flower (seeds) dish cleaner
## 0.010269446 0.010371124 0.010472801
## frozen dessert salt canned vegetables
## 0.010777834 0.010777834 0.010777834
## liquor spread cheese cling film/bags
## 0.011082867 0.011184545 0.011387900
## frozen fish mustard packaged fruit/vegetables
## 0.011692933 0.011997966 0.013014743
## cake bar seasonal products pasta
## 0.013218099 0.014234875 0.015048297
## canned fish herbs processed cheese
## 0.015048297 0.016268429 0.016573462
## soft cheese pot plants flour
## 0.017081851 0.017285206 0.017386884
## dishes semi-finished bread baking powder
## 0.017590239 0.017691917 0.017691917
## pickled vegetables white wine red/blush wine
## 0.017895272 0.019013726 0.019217082
## detergent chewing gum grapes
## 0.019217082 0.021047280 0.022369090
## cat food sliced cheese hard cheese
## 0.023284189 0.024504321 0.024504321
## ice cream meat ham
## 0.025012710 0.025826131 0.026029487
## beverages specialty bar butter milk
## 0.026029487 0.027351296 0.027961362
## oil frozen meals misc. beverages
## 0.028063040 0.028368073 0.028368073
## candy specialty chocolate onions
## 0.029893238 0.030401627 0.031011693
## hygiene articles hamburger meat berries
## 0.032943569 0.033248602 0.033248602
## UHT-milk sugar dessert
## 0.033451957 0.033858668 0.037112354
## long life bakery product salty snack waffles
## 0.037417387 0.037824098 0.038434164
## cream cheese white bread chicken
## 0.039654296 0.042094560 0.042907982
## frozen vegetables chocolate napkins
## 0.048093543 0.049618709 0.052364006
## beef curd butter
## 0.052465684 0.053279105 0.055414337
## pork coffee margarine
## 0.057651246 0.058057956 0.058566345
## frankfurter domestic eggs brown bread
## 0.058973055 0.063446873 0.064870361
## whipped/sour cream fruit/vegetable juice pip fruit
## 0.071682766 0.072292832 0.075648195
## canned beer newspapers bottled beer
## 0.077681749 0.079816980 0.080528724
## citrus fruit pastry sausage
## 0.082765633 0.088967972 0.093950178
## shopping bags tropical fruit root vegetables
## 0.098525674 0.104931368 0.108998475
## bottled water yogurt soda
## 0.110523640 0.139501779 0.174377224
## rolls/buns other vegetables whole milk
## 0.183934926 0.193492628 0.255516014
The Apriori algorithm extracts frequent item sets from transactional data. It generates candidate item sets and eliminates those that fail to meet the minimum support threshold. The process is iterative: each round generates candidate item sets of length k and eliminates the infrequent ones, and the frequent item sets obtained in one round serve as the basis for the candidates of the next round. Because every subset of a frequent item set must itself be frequent, any candidate with an infrequent subset can be discarded without counting its support.
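To make the level-wise procedure concrete, here is a minimal base-R sketch of Apriori's frequent-item-set phase. It rests on simplifying assumptions (transactions as plain character vectors, support counted by brute force) and is not the implementation behind `arules::apriori()`; the function name `apriori_sketch` and the toy `baskets` data are hypothetical.

```r
# Minimal level-wise Apriori sketch (frequent item sets only), for illustration
apriori_sketch <- function(transactions, min_support) {
  # Support: share of transactions that contain every item of the item set
  support_of <- function(itemset) {
    mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
  }
  # Level 1: frequent single items
  items <- sort(unique(unlist(transactions)))
  level <- Filter(function(s) support_of(s) >= min_support, as.list(items))
  frequent <- level
  k <- 2
  while (length(level) > 1) {
    # Join step: unite pairs of frequent (k-1)-item sets into k-item candidates
    candidates <- list()
    for (i in seq_len(length(level) - 1)) {
      for (j in (i + 1):length(level)) {
        cand <- sort(union(level[[i]], level[[j]]))
        if (length(cand) == k) candidates[[length(candidates) + 1]] <- cand
      }
    }
    candidates <- unique(candidates)
    # Prune step: drop candidates that have an infrequent (k-1)-subset ...
    keys <- vapply(level, paste, character(1), collapse = ",")
    candidates <- Filter(function(cand) {
      all(vapply(seq_along(cand),
                 function(d) paste(cand[-d], collapse = ",") %in% keys,
                 logical(1)))
    }, candidates)
    # ... then count support only for the surviving candidates
    level <- Filter(function(s) support_of(s) >= min_support, candidates)
    frequent <- c(frequent, level)
    k <- k + 1
  }
  frequent
}

# Hypothetical toy data: five shopping baskets
baskets <- list(
  c("bread", "milk"),
  c("bread", "eggs", "milk"),
  c("eggs", "milk"),
  c("bread", "eggs", "milk"),
  c("bread")
)
frequent_sets <- apriori_sketch(baskets, min_support = 0.5)
vapply(frequent_sets, paste, character(1), collapse = ",")
# "bread" "eggs" "milk" "bread,milk" "eggs,milk"
```

With a minimum support of 0.5, {bread, eggs} appears in only two of the five baskets, so the candidate {bread, eggs, milk} is pruned through that subset without its own support ever being counted.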
The FP-Growth (frequent pattern growth) algorithm also extracts frequent item sets from large data sets. It is typically faster and more memory-efficient than the Apriori algorithm because it does not generate candidate item sets explicitly.
The FP-Growth algorithm has two phases:
Build the FP-Tree: The algorithm first scans the data set to count item frequencies, then scans it again and inserts each transaction's frequent items, sorted by descending frequency, as a path into an FP-Tree. The FP-Tree is a compressed prefix tree of the transactions: the root represents the empty set, each node represents an item and stores a count, and transactions that share a prefix share the same nodes. Nodes for the same item are linked together so that all occurrences of an item can be visited quickly.
Extract frequent item sets: The algorithm then mines the FP-Tree, starting from the least frequent items at the bottom of the tree and working its way up. For each item, it collects the paths that lead to that item (its conditional pattern base), builds a conditional FP-Tree from them, and mines that tree recursively to grow larger frequent item sets.
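The tree-building phase can be sketched in a few lines of base R. As a simplifying assumption, each node is identified by its path from the root (for example `"bread>milk"`) and stored as a named count; the node-links and the mining phase are omitted. `build_fp_tree` and `baskets` are hypothetical names used only for illustration.

```r
# Sketch of FP-Growth's first phase: building the FP-Tree. Each node is keyed
# by its path from the root (e.g. "bread>milk") and stores a count.
build_fp_tree <- function(transactions, min_count) {
  # Scan 1: count every item and keep the frequent ones
  counts <- table(unlist(transactions))
  frequent <- counts[counts >= min_count]
  # Global item order: descending frequency, ties broken alphabetically
  item_order <- names(frequent)[order(-as.vector(frequent), names(frequent))]
  nodes <- integer(0)
  # Scan 2: insert each transaction as a path of its frequent items in that order
  for (t in transactions) {
    path <- item_order[item_order %in% t]
    for (depth in seq_along(path)) {
      key <- paste(path[seq_len(depth)], collapse = ">")
      # Transactions with a common prefix share (and increment) the same nodes
      nodes[key] <- if (is.na(nodes[key])) 1L else nodes[key] + 1L
    }
  }
  list(item_order = item_order, nodes = nodes)
}

# Hypothetical toy data: five shopping baskets
baskets <- list(
  c("bread", "milk"),
  c("bread", "eggs", "milk"),
  c("eggs", "milk"),
  c("bread", "eggs", "milk"),
  c("bread")
)
tree <- build_fp_tree(baskets, min_count = 2)
tree$item_order  # "bread" "milk" "eggs"
tree$nodes       # bread 4, bread>milk 3, bread>milk>eggs 2, milk 1, milk>eggs 1
```

The five baskets contain 11 item occurrences but produce only 5 nodes, because baskets that start with the same items share nodes; this sharing is where the FP-Tree's compression comes from.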
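These algorithms are controlled by support and confidence thresholds, which should be chosen according to the goal of the analysis. A high support threshold restricts the output to item sets that occur frequently, while a high confidence threshold keeps only strong rules, that is, rules whose consequent is very likely to appear when the antecedent does.
The size of the data set is also an important factor when choosing the support level. Because support is a proportion, the same threshold corresponds to more transactions in a larger data set: in the 9835 Groceries transactions, a support of 0.01 already requires an item set to appear in about 98 transactions. A small data set may therefore need a higher support level to avoid rules based on a handful of transactions, whereas a lower support level may be appropriate for a large data set.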
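The link between a relative support threshold and the number of transactions it implies is simple arithmetic. The sketch below uses a hypothetical helper, `min_support_count`, to show it for the Groceries data and for a hypothetical data set of 500 transactions.

```r
# Hypothetical helper: number of transactions an item set must appear in
# to reach a given relative support threshold
min_support_count <- function(support, n_transactions) {
  support * n_transactions
}

min_support_count(0.01, 9835)  # Groceries: about 98 transactions
min_support_count(0.01, 500)   # 500 transactions: only 5
min_support_count(0.05, 500)   # a higher threshold on the small data set: 25
```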
Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) is another popular algorithm for extracting frequent item sets from large data sets.
Eclat uses a vertical data representation: instead of storing each transaction as a list of items, it stores for each item the list of IDs of the transactions that contain it (its tid-list). The support of an item set is then the length of the intersection of its items' tid-lists. Eclat generates candidate item sets by intersecting tid-lists and prunes those that don't meet the minimum support threshold; only item sets whose support reaches the threshold are considered frequent.
By recursively intersecting the tid-lists of frequent item sets, Eclat builds larger sets depth-first and stops when no more frequent item sets can be generated. It is particularly efficient for data sets with many items but relatively few transactions, because every tid-list is at most as long as the number of transactions, which keeps the intersections cheap.
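A minimal base-R sketch of Eclat follows, assuming transactions are character vectors and the support threshold is an absolute count. It builds the tid-lists, then searches depth-first: each member of an equivalence class is extended by intersecting its tid-list with those of the later members. The names `eclat_sketch` and `baskets` are hypothetical, and this is not the implementation used by `fim4r()` further down.

```r
# Minimal Eclat sketch: vertical tid-lists and depth-first intersections
eclat_sketch <- function(transactions, min_count) {
  # Vertical layout: for each item, the IDs of the transactions containing it
  items <- sort(unique(unlist(transactions)))
  tidlists <- lapply(items, function(i) {
    which(vapply(transactions, function(t) i %in% t, logical(1)))
  })
  names(tidlists) <- items
  found <- integer(0)
  # `class_tids` holds the tid-lists of the frequent item sets that extend
  # `prefix` by one item (one equivalence class)
  mine <- function(prefix, class_tids) {
    for (k in seq_along(class_tids)) {
      itemset <- c(prefix, names(class_tids)[k])
      found[paste(itemset, collapse = ",")] <<- length(class_tids[[k]])
      # Support of each extension = size of the tid-list intersection
      later <- class_tids[-seq_len(k)]
      next_class <- lapply(later, intersect, class_tids[[k]])
      next_class <- next_class[lengths(next_class) >= min_count]
      if (length(next_class) > 0) mine(itemset, next_class)
    }
  }
  mine(character(0), tidlists[lengths(tidlists) >= min_count])
  found
}

# Hypothetical toy data: five shopping baskets
baskets <- list(
  c("bread", "milk"),
  c("bread", "eggs", "milk"),
  c("eggs", "milk"),
  c("bread", "eggs", "milk"),
  c("bread")
)
eclat_sketch(baskets, min_count = 3)
# bread 4, bread,milk 3, eggs 3, eggs,milk 3, milk 4
```

With `min_count = 3`, {bread, eggs} (two transactions) is dropped, while {bread, milk} and {eggs, milk} survive with three transactions each.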
The main differences between the FP-Growth and Apriori algorithms are:
Scanning the data set: Apriori scans the data set once per iteration to count the support of the candidate item sets, whereas FP-Growth scans it only twice: once to count item frequencies and once to build the FP-Tree.
Data structure: Apriori generates candidate item sets and stores them in structures such as lists, hash trees, or prefix trees, whereas FP-Growth stores the compressed transactions in an FP-Tree.
Memory usage: Apriori can require a lot of memory on large data sets because it may generate a very large number of candidate item sets. In contrast, FP-Growth uses a more compact data structure and usually requires less memory.
Speed: FP-Growth is usually faster than Apriori, especially on large data sets, because Apriori must generate and count many candidate item sets, whereas FP-Growth works on a series of smaller conditional FP-Trees.
Pruning: Apriori prunes candidate item sets that are infrequent or that contain an infrequent subset, whereas FP-Growth discards infrequent items before building the FP-Tree (and again in each conditional FP-Tree), so infrequent patterns are never generated.
Handling noise: Both algorithms are exact and return the same frequent item sets for the same thresholds. Noisy data with many rare item combinations mainly hurts Apriori's running time, because it produces many candidate item sets that turn out to be infrequent, whereas FP-Growth only builds conditional FP-Trees for frequent items.
Overall, FP-Growth is generally more efficient than Apriori, especially on large data sets. However, Apriori is still widely used because it is easy to understand and implement, and it works well on a variety of data sets.
Next, the Apriori algorithm is applied to the data set.
Support is the proportion of transactions in the data set that contain a particular item set. It measures how often the item set occurs and is calculated as the number of transactions containing the item set divided by the total number of transactions.
Confidence measures the strength of a rule A => B between two item sets A and B. It is defined as the proportion of transactions containing A that also contain B, that is, the support of A and B together divided by the support of A. In other words, it is the conditional probability of finding item set B in a transaction given that item set A is present in the same transaction.
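The two definitions translate directly into code. The base-R sketch below computes support and confidence, plus lift (confidence divided by the support of the right-hand side), on a hypothetical five-basket example; `itemset_support`, `rule_confidence`, `rule_lift`, and `baskets` are illustrative names, not arules functions.

```r
# Hypothetical toy data: five shopping baskets
baskets <- list(
  c("bread", "milk"),
  c("bread", "eggs", "milk"),
  c("eggs", "milk"),
  c("bread", "eggs", "milk"),
  c("bread")
)

# Support: share of transactions that contain every item of the item set
itemset_support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}
# Confidence of lhs => rhs: support(lhs and rhs) / support(lhs), i.e. P(rhs | lhs)
rule_confidence <- function(lhs, rhs, transactions) {
  itemset_support(c(lhs, rhs), transactions) / itemset_support(lhs, transactions)
}
# Lift of lhs => rhs: confidence / support(rhs); independence would give 1
rule_lift <- function(lhs, rhs, transactions) {
  rule_confidence(lhs, rhs, transactions) / itemset_support(rhs, transactions)
}

itemset_support(c("bread", "milk"), baskets)  # 3 of 5 baskets: 0.6
rule_confidence("bread", "milk", baskets)     # 0.6 / 0.8, about 0.75
rule_lift("bread", "milk", baskets)           # 0.75 / 0.8, about 0.94
```

Here bread => milk has a confidence of 0.75 but a lift slightly below 1, because milk is already in 80% of the baskets: the rule is common without telling us much beyond milk's own popularity.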
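Support and confidence are used together to identify rules. The user sets a minimum support and a minimum confidence; the Apriori algorithm first finds all item sets whose support meets or exceeds the minimum support, then derives association rules from them and keeps only the rules whose confidence meets the minimum confidence. The resulting rules can then be analyzed and visualized further. Besides support and confidence, the outputs below report coverage, the support of the left-hand side A, and lift, the confidence of A => B divided by the support of B; a lift above 1 means that A and B appear together more often than they would if they were independent.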
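The rule-generation step can be sketched as follows: given one frequent item set, each item in turn becomes the single right-hand side (all rules in the outputs below have one item on the right), and a rule is kept if its confidence reaches the minimum. `rules_from_itemset` and `baskets` are hypothetical names, and for brevity the sketch recomputes supports by brute force instead of reusing counts from the item-set phase.

```r
# Hypothetical toy data: five shopping baskets
baskets <- list(
  c("bread", "milk"),
  c("bread", "eggs", "milk"),
  c("eggs", "milk"),
  c("bread", "eggs", "milk"),
  c("bread")
)

# Hypothetical helper: rules lhs => rhs with a single-item rhs from one item set
rules_from_itemset <- function(itemset, transactions, min_confidence) {
  # Share of transactions that contain every item in s
  itemset_support <- function(s) {
    mean(vapply(transactions, function(t) all(s %in% t), logical(1)))
  }
  set_support <- itemset_support(itemset)
  # Each item in turn is the consequent; the remaining items form the antecedent
  lhs <- vapply(itemset, function(r) paste(setdiff(itemset, r), collapse = ","),
                character(1), USE.NAMES = FALSE)
  conf <- vapply(itemset,
                 function(r) set_support / itemset_support(setdiff(itemset, r)),
                 numeric(1), USE.NAMES = FALSE)
  rules <- data.frame(lhs = lhs, rhs = itemset, support = set_support,
                      confidence = conf, stringsAsFactors = FALSE)
  # Keep only the rules that meet the minimum confidence
  rules[rules$confidence >= min_confidence, , drop = FALSE]
}

rules_from_itemset(c("bread", "eggs", "milk"), baskets, min_confidence = 0.7)
# keeps only bread,eggs => milk (confidence 1); the other two rules reach about 0.67
```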
# Mine rules with support >= 1%, confidence >= 25% and at least two items (lhs + rhs)
association_rules <- apriori(transactions, parameter = list(support = 0.01, confidence = 0.25, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.25 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 98
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[120 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [170 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
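# fim4r() takes thresholds in percent (conf = 10 means 10% minimum confidence),
# so these rules are not directly comparable to the apriori() run above
fpgrowth <- fim4r(transactions, method = "fpgrowth", target = "rules", supp = 10, conf = 10)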
summary(fpgrowth)
## set of 137 rules
##
## rule length distribution (lhs + rhs):sizes
## 1 2
## 8 129
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 2.000 1.942 2.000 2.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.01068 Min. :0.1007 Min. :0.8991 Min. : 105.0
## 1st Qu.:0.01739 1st Qu.:0.1151 1st Qu.:1.2121 1st Qu.: 171.0
## Median :0.02420 Median :0.1394 Median :1.4904 Median : 238.0
## Mean :0.03342 Mean :0.1657 Mean :1.5294 Mean : 328.6
## 3rd Qu.:0.03325 3rd Qu.:0.1935 3rd Qu.:1.7716 3rd Qu.: 327.0
## Max. :0.25552 Max. :0.4487 Max. :3.0404 Max. :2513.0
# Same thresholds with Eclat; both algorithms are exact, so they find the same 137 rules
eclat <- fim4r(transactions, method = "eclat", target = "rules", supp = 10, conf = 10)
summary(eclat)
## set of 137 rules
##
## rule length distribution (lhs + rhs):sizes
## 1 2
## 8 129
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 2.000 1.942 2.000 2.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.01068 Min. :0.1007 Min. :0.8991 Min. : 105.0
## 1st Qu.:0.01739 1st Qu.:0.1151 1st Qu.:1.2121 1st Qu.: 171.0
## Median :0.02420 Median :0.1394 Median :1.4904 Median : 238.0
## Mean :0.03342 Mean :0.1657 Mean :1.5294 Mean : 328.6
## 3rd Qu.:0.03325 3rd Qu.:0.1935 3rd Qu.:1.7716 3rd Qu.: 327.0
## Max. :0.25552 Max. :0.4487 Max. :3.0404 Max. :2513.0
# The 10 rules with the highest lift
inspect(head(sort(association_rules, by = "lift"), 10))
## lhs rhs support
## [1] {citrus fruit, other vegetables} => {root vegetables} 0.01037112
## [2] {tropical fruit, other vegetables} => {root vegetables} 0.01230300
## [3] {beef} => {root vegetables} 0.01738688
## [4] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [5] {tropical fruit, root vegetables} => {other vegetables} 0.01230300
## [6] {other vegetables, whole milk} => {root vegetables} 0.02318251
## [7] {whole milk, curd} => {yogurt} 0.01006609
## [8] {other vegetables, yogurt} => {root vegetables} 0.01291307
## [9] {other vegetables, yogurt} => {tropical fruit} 0.01230300
## [10] {other vegetables, rolls/buns} => {root vegetables} 0.01220132
## confidence coverage lift count
## [1] 0.3591549 0.02887646 3.295045 102
## [2] 0.3427762 0.03589222 3.144780 121
## [3] 0.3313953 0.05246568 3.040367 171
## [4] 0.5862069 0.01769192 3.029608 102
## [5] 0.5845411 0.02104728 3.020999 121
## [6] 0.3097826 0.07483477 2.842082 228
## [7] 0.3852140 0.02613116 2.761356 99
## [8] 0.2974239 0.04341637 2.728698 127
## [9] 0.2833724 0.04341637 2.700550 121
## [10] 0.2863962 0.04260295 2.627525 120
# The 10 rules with the highest confidence
inspect(head(sort(association_rules, by = "confidence"), 10))
## lhs rhs support
## [1] {citrus fruit, root vegetables} => {other vegetables} 0.01037112
## [2] {tropical fruit, root vegetables} => {other vegetables} 0.01230300
## [3] {curd, yogurt} => {whole milk} 0.01006609
## [4] {other vegetables, butter} => {whole milk} 0.01148958
## [5] {tropical fruit, root vegetables} => {whole milk} 0.01199797
## [6] {root vegetables, yogurt} => {whole milk} 0.01453991
## [7] {other vegetables, domestic eggs} => {whole milk} 0.01230300
## [8] {yogurt, whipped/sour cream} => {whole milk} 0.01087951
## [9] {root vegetables, rolls/buns} => {whole milk} 0.01270971
## [10] {pip fruit, other vegetables} => {whole milk} 0.01352313
## confidence coverage lift count
## [1] 0.5862069 0.01769192 3.029608 102
## [2] 0.5845411 0.02104728 3.020999 121
## [3] 0.5823529 0.01728521 2.279125 99
## [4] 0.5736041 0.02003050 2.244885 113
## [5] 0.5700483 0.02104728 2.230969 118
## [6] 0.5629921 0.02582613 2.203354 143
## [7] 0.5525114 0.02226741 2.162336 121
## [8] 0.5245098 0.02074225 2.052747 107
## [9] 0.5230126 0.02430097 2.046888 125
## [10] 0.5175097 0.02613116 2.025351 133
# The 10 rules with the highest support
inspect(head(sort(association_rules, by = "support"), 10))
## lhs rhs support confidence coverage
## [1] {other vegetables} => {whole milk} 0.07483477 0.3867578 0.1934926
## [2] {whole milk} => {other vegetables} 0.07483477 0.2928770 0.2555160
## [3] {rolls/buns} => {whole milk} 0.05663447 0.3079049 0.1839349
## [4] {yogurt} => {whole milk} 0.05602440 0.4016035 0.1395018
## [5] {root vegetables} => {whole milk} 0.04890696 0.4486940 0.1089985
## [6] {root vegetables} => {other vegetables} 0.04738180 0.4347015 0.1089985
## [7] {yogurt} => {other vegetables} 0.04341637 0.3112245 0.1395018
## [8] {tropical fruit} => {whole milk} 0.04229792 0.4031008 0.1049314
## [9] {tropical fruit} => {other vegetables} 0.03589222 0.3420543 0.1049314
## [10] {bottled water} => {whole milk} 0.03436706 0.3109476 0.1105236
## lift count
## [1] 1.513634 736
## [2] 1.513634 736
## [3] 1.205032 557
## [4] 1.571735 551
## [5] 1.756031 481
## [6] 2.246605 466
## [7] 1.608457 427
## [8] 1.577595 416
## [9] 1.767790 353
## [10] 1.216940 338
# All rules: support vs. confidence, shaded by lift
plot(association_rules, measure = c("support", "confidence"), shading = "lift", col = "#FF66CC")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
# Rules with whole milk on the right-hand side, sorted by confidence
milk.rules <- sort(subset(association_rules, subset = rhs %in% "whole milk"), by = "confidence")
inspect(milk.rules)
## lhs rhs support confidence coverage lift count
## [1] {curd,
## yogurt} => {whole milk} 0.01006609 0.5823529 0.01728521 2.2791250 99
## [2] {other vegetables,
## butter} => {whole milk} 0.01148958 0.5736041 0.02003050 2.2448850 113
## [3] {tropical fruit,
## root vegetables} => {whole milk} 0.01199797 0.5700483 0.02104728 2.2309690 118
## [4] {root vegetables,
## yogurt} => {whole milk} 0.01453991 0.5629921 0.02582613 2.2033536 143
## [5] {other vegetables,
## domestic eggs} => {whole milk} 0.01230300 0.5525114 0.02226741 2.1623358 121
## [6] {yogurt,
## whipped/sour cream} => {whole milk} 0.01087951 0.5245098 0.02074225 2.0527473 107
## [7] {root vegetables,
## rolls/buns} => {whole milk} 0.01270971 0.5230126 0.02430097 2.0468876 125
## [8] {pip fruit,
## other vegetables} => {whole milk} 0.01352313 0.5175097 0.02613116 2.0253514 133
## [9] {tropical fruit,
## yogurt} => {whole milk} 0.01514997 0.5173611 0.02928317 2.0247698 149
## [10] {other vegetables,
## yogurt} => {whole milk} 0.02226741 0.5128806 0.04341637 2.0072345 219
## [11] {other vegetables,
## whipped/sour cream} => {whole milk} 0.01464159 0.5070423 0.02887646 1.9843854 144
## [12] {other vegetables,
## fruit/vegetable juice} => {whole milk} 0.01047280 0.4975845 0.02104728 1.9473713 103
## [13] {butter} => {whole milk} 0.02755465 0.4972477 0.05541434 1.9460530 271
## [14] {curd} => {whole milk} 0.02613116 0.4904580 0.05327911 1.9194805 257
## [15] {root vegetables,
## other vegetables} => {whole milk} 0.02318251 0.4892704 0.04738180 1.9148326 228
## [16] {tropical fruit,
## other vegetables} => {whole milk} 0.01708185 0.4759207 0.03589222 1.8625865 168
## [17] {citrus fruit,
## yogurt} => {whole milk} 0.01026945 0.4741784 0.02165735 1.8557678 101
## [18] {domestic eggs} => {whole milk} 0.02999492 0.4727564 0.06344687 1.8502027 295
## [19] {pork,
## other vegetables} => {whole milk} 0.01016777 0.4694836 0.02165735 1.8373939 100
## [20] {other vegetables,
## pastry} => {whole milk} 0.01057448 0.4684685 0.02257245 1.8334212 104
## [21] {yogurt,
## rolls/buns} => {whole milk} 0.01555669 0.4526627 0.03436706 1.7715630 153
## [22] {citrus fruit,
## other vegetables} => {whole milk} 0.01301474 0.4507042 0.02887646 1.7638982 128
## [23] {whipped/sour cream} => {whole milk} 0.03223183 0.4496454 0.07168277 1.7597542 317
## [24] {root vegetables} => {whole milk} 0.04890696 0.4486940 0.10899847 1.7560310 481
## [25] {tropical fruit,
## rolls/buns} => {whole milk} 0.01098119 0.4462810 0.02460600 1.7465872 108
## [26] {sugar} => {whole milk} 0.01504830 0.4444444 0.03385867 1.7393996 148
## [27] {hamburger meat} => {whole milk} 0.01474326 0.4434251 0.03324860 1.7354101 145
## [28] {ham} => {whole milk} 0.01148958 0.4414062 0.02602949 1.7275091 113
## [29] {sliced cheese} => {whole milk} 0.01077783 0.4398340 0.02450432 1.7213560 106
## [30] {other vegetables,
## bottled water} => {whole milk} 0.01077783 0.4344262 0.02480935 1.7001918 106
## [31] {other vegetables,
## soda} => {whole milk} 0.01392984 0.4254658 0.03274021 1.6651240 137
## [32] {frozen vegetables} => {whole milk} 0.02043721 0.4249471 0.04809354 1.6630940 201
## [33] {other vegetables,
## rolls/buns} => {whole milk} 0.01789527 0.4200477 0.04260295 1.6439194 176
## [34] {cream cheese } => {whole milk} 0.01647178 0.4153846 0.03965430 1.6256696 162
## [35] {butter milk} => {whole milk} 0.01159126 0.4145455 0.02796136 1.6223854 114
## [36] {margarine} => {whole milk} 0.02419929 0.4131944 0.05856634 1.6170980 238
## [37] {hard cheese} => {whole milk} 0.01006609 0.4107884 0.02450432 1.6076815 99
## [38] {chicken} => {whole milk} 0.01759024 0.4099526 0.04290798 1.6044106 173
## [39] {white bread} => {whole milk} 0.01708185 0.4057971 0.04209456 1.5881474 168
## [40] {beef} => {whole milk} 0.02125064 0.4050388 0.05246568 1.5851795 209
## [41] {tropical fruit} => {whole milk} 0.04229792 0.4031008 0.10493137 1.5775950 416
## [42] {oil} => {whole milk} 0.01128622 0.4021739 0.02806304 1.5739675 111
## [43] {yogurt} => {whole milk} 0.05602440 0.4016035 0.13950178 1.5717351 551
## [44] {pip fruit} => {whole milk} 0.03009659 0.3978495 0.07564820 1.5570432 296
## [45] {onions} => {whole milk} 0.01209964 0.3901639 0.03101169 1.5269647 119
## [46] {hygiene articles} => {whole milk} 0.01281139 0.3888889 0.03294357 1.5219746 126
## [47] {brown bread} => {whole milk} 0.02521607 0.3887147 0.06487036 1.5212930 248
## [48] {other vegetables} => {whole milk} 0.07483477 0.3867578 0.19349263 1.5136341 736
## [49] {pork} => {whole milk} 0.02216573 0.3844797 0.05765125 1.5047187 218
## [50] {yogurt,
## soda} => {whole milk} 0.01047280 0.3828996 0.02735130 1.4985348 103
## [51] {sausage,
## other vegetables} => {whole milk} 0.01016777 0.3773585 0.02694459 1.4768487 100
## [52] {napkins} => {whole milk} 0.01972547 0.3766990 0.05236401 1.4742678 194
## [53] {pastry} => {whole milk} 0.03324860 0.3737143 0.08896797 1.4625865 327
## [54] {dessert} => {whole milk} 0.01372649 0.3698630 0.03711235 1.4475140 135
## [55] {citrus fruit} => {whole milk} 0.03050330 0.3685504 0.08276563 1.4423768 300
## [56] {fruit/vegetable juice} => {whole milk} 0.02663955 0.3684951 0.07229283 1.4421604 262
## [57] {long life bakery product} => {whole milk} 0.01352313 0.3614130 0.03741739 1.4144438 133
## [58] {berries} => {whole milk} 0.01179461 0.3547401 0.03324860 1.3883281 116
## [59] {frankfurter} => {whole milk} 0.02053889 0.3482759 0.05897306 1.3630295 202
## [60] {newspapers} => {whole milk} 0.02735130 0.3426752 0.07981698 1.3411103 269
## [61] {chocolate} => {whole milk} 0.01667514 0.3360656 0.04961871 1.3152427 164
## [62] {waffles} => {whole milk} 0.01270971 0.3306878 0.03843416 1.2941961 125
## [63] {coffee} => {whole milk} 0.01870869 0.3222417 0.05805796 1.2611408 184
## [64] {sausage} => {whole milk} 0.02989324 0.3181818 0.09395018 1.2452520 294
## [65] {bottled water} => {whole milk} 0.03436706 0.3109476 0.11052364 1.2169396 338
## [66] {rolls/buns} => {whole milk} 0.05663447 0.3079049 0.18393493 1.2050318 557
## [67] {salty snack} => {whole milk} 0.01118454 0.2956989 0.03782410 1.1572618 110
## [68] {bottled beer} => {whole milk} 0.02043721 0.2537879 0.08052872 0.9932367 201
# Rules with yogurt on the right-hand side, sorted by confidence
yogurt.rules <- sort(subset(association_rules, subset = rhs %in% "yogurt"), by = "confidence")
inspect(yogurt.rules)
## lhs rhs support confidence
## [1] {whole milk, curd} => {yogurt} 0.01006609 0.3852140
## [2] {tropical fruit, whole milk} => {yogurt} 0.01514997 0.3581731
## [3] {other vegetables, whipped/sour cream} => {yogurt} 0.01016777 0.3521127
## [4] {tropical fruit, other vegetables} => {yogurt} 0.01230300 0.3427762
## [5] {whole milk, whipped/sour cream} => {yogurt} 0.01087951 0.3375394
## [6] {citrus fruit, whole milk} => {yogurt} 0.01026945 0.3366667
## [7] {curd} => {yogurt} 0.01728521 0.3244275
## [8] {berries} => {yogurt} 0.01057448 0.3180428
## [9] {cream cheese } => {yogurt} 0.01240468 0.3128205
## [10] {other vegetables, whole milk} => {yogurt} 0.02226741 0.2975543
## [11] {root vegetables, whole milk} => {yogurt} 0.01453991 0.2972973
## [12] {whipped/sour cream} => {yogurt} 0.02074225 0.2893617
## [13] {tropical fruit} => {yogurt} 0.02928317 0.2790698
## [14] {whole milk, rolls/buns} => {yogurt} 0.01555669 0.2746858
## [15] {root vegetables, other vegetables} => {yogurt} 0.01291307 0.2725322
## [16] {other vegetables, rolls/buns} => {yogurt} 0.01148958 0.2696897
## [17] {butter} => {yogurt} 0.01464159 0.2642202
## [18] {citrus fruit} => {yogurt} 0.02165735 0.2616708
## [19] {whole milk, soda} => {yogurt} 0.01047280 0.2614213
## [20] {fruit/vegetable juice} => {yogurt} 0.01870869 0.2587904
## [21] {frozen vegetables} => {yogurt} 0.01240468 0.2579281
## coverage lift count
## [1] 0.02613116 2.761356 99
## [2] 0.04229792 2.567516 149
## [3] 0.02887646 2.524073 100
## [4] 0.03589222 2.457146 121
## [5] 0.03223183 2.419607 107
## [6] 0.03050330 2.413350 101
## [7] 0.05327911 2.325615 170
## [8] 0.03324860 2.279848 104
## [9] 0.03965430 2.242412 122
## [10] 0.07483477 2.132979 219
## [11] 0.04890696 2.131136 143
## [12] 0.07168277 2.074251 204
## [13] 0.10493137 2.000475 288
## [14] 0.05663447 1.969049 153
## [15] 0.04738180 1.953611 127
## [16] 0.04260295 1.933235 113
## [17] 0.05541434 1.894027 144
## [18] 0.08276563 1.875752 213
## [19] 0.04006101 1.873964 103
## [20] 0.07229283 1.855105 184
## [21] 0.04809354 1.848924 122
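The `coverage`, `count` and `lift` columns are all derived from support and confidence: coverage is the support of the LHS alone (support / confidence), count is the support multiplied by the number of transactions, and lift is the confidence divided by the support of {yogurt}. For rule [1], 0.01006609 / 0.3852140 ≈ 0.0261 and 0.01006609 × 9835 ≈ 99, matching the printed coverage and count. The sketch below checks these identities for every yogurt rule. It rebuilds the rule set with `supp = 0.01`, `conf = 0.25` and `minlen = 2`; those thresholds are an assumption inferred from the smallest values in the output, not necessarily the ones used to create `association_rules`.

```r
library(arules)
data(Groceries)

# Assumed thresholds; the original association_rules call may differ
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.25, minlen = 2),
                 control = list(verbose = FALSE))
yogurt_rules <- rules[rhs(rules) %in% "yogurt"]
q <- quality(yogurt_rules)

n <- length(Groceries)                              # number of transactions
supp_yogurt <- itemFrequency(Groceries)[["yogurt"]] # relative support of {yogurt}

# coverage = support / confidence; count = support * n; lift = confidence / supp(rhs)
check <- data.frame(
  coverage_ok = abs(q$coverage - q$support / q$confidence) < 1e-6,
  count_ok    = q$count == round(q$support * n),
  lift_ok     = abs(q$lift - q$confidence / supp_yogurt) < 1e-6
)
colSums(check)  # number of rules satisfying each identity
```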
root_vegetables.rules <- sort(subset(association_rules, subset = rhs %in% "root vegetables"), by = "confidence")
inspect(root_vegetables.rules)
## lhs rhs support
## [1] {citrus fruit, other vegetables} => {root vegetables} 0.01037112
## [2] {tropical fruit, other vegetables} => {root vegetables} 0.01230300
## [3] {beef} => {root vegetables} 0.01738688
## [4] {other vegetables, whole milk} => {root vegetables} 0.02318251
## [5] {other vegetables, yogurt} => {root vegetables} 0.01291307
## [6] {other vegetables, rolls/buns} => {root vegetables} 0.01220132
## [7] {tropical fruit, whole milk} => {root vegetables} 0.01199797
## [8] {whole milk, yogurt} => {root vegetables} 0.01453991
## [9] {chicken} => {root vegetables} 0.01087951
## confidence coverage lift count
## [1] 0.3591549 0.02887646 3.295045 102
## [2] 0.3427762 0.03589222 3.144780 121
## [3] 0.3313953 0.05246568 3.040367 171
## [4] 0.3097826 0.07483477 2.842082 228
## [5] 0.2974239 0.04341637 2.728698 127
## [6] 0.2863962 0.04260295 2.627525 120
## [7] 0.2836538 0.04229792 2.602365 118
## [8] 0.2595281 0.05602440 2.381025 143
## [9] 0.2535545 0.04290798 2.326221 107
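The same subset-and-sort pattern is repeated for yogurt and root vegetables. A small helper can wrap it so that other products are easy to explore. `rules_for_item()` below is a hypothetical name, not part of arules; it indexes with `rhs(rules) %in% item` directly instead of calling `subset()`, which keeps the item name out of `subset()`'s non-standard evaluation. The rule set is rebuilt with assumed thresholds (`supp = 0.01`, `conf = 0.25`, `minlen = 2`).

```r
library(arules)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.25, minlen = 2),  # assumed thresholds
                 control = list(verbose = FALSE))

# Hypothetical helper: rules whose RHS contains `item`, sorted in decreasing order of `by`,
# optionally truncated to the first `n` rules
rules_for_item <- function(rules, item, by = "confidence", n = NULL) {
  selected <- sort(rules[rhs(rules) %in% item], by = by)
  if (!is.null(n)) selected <- head(selected, n)
  selected
}

root_rules <- rules_for_item(rules, "root vegetables")
inspect(head(root_rules, 3))

# Same helper with a different item and ordering measure
veg_by_lift <- rules_for_item(rules, "other vegetables", by = "lift", n = 5)
inspect(veg_by_lift)
```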
Rules for selected products
plot(milk.rules,method="graph",shading="lift",col="#FF66CC")
plot(yogurt.rules,method="graph",shading="lift",col="#FF66CC")
plot(root_vegetables.rules,method="graph",shading="lift",col="#FF66CC")
plot(association_rules, method="paracoord", control=list(reorder=TRUE),col="#FF66CC")
The next visualization is an interactive network plot of the association rules derived from the Apriori algorithm. Network plots make it easier to spot groups of interconnected items and rules, and can reveal more complex associations between items in the dataset.
# Create a network plot of the association rules
plot(association_rules, method = "graph", measure = "lift", shading = "confidence", engine = "htmlwidget", itemCol = "pink", max = 200)
plot(fpgrowth, method = "graph", measure = "lift", shading = "confidence", engine = "htmlwidget", itemCol = "pink", max = 200)
plot(eclat, method = "graph", measure = "lift", shading = "confidence", engine = "htmlwidget", itemCol = "pink", max = 200)
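Even with `max = 200`, a network of this size can be hard to read. One option, sketched below, is to plot only the rules with the highest lift. The cut-off of 20 rules and the thresholds used to rebuild the rule set (`supp = 0.01`, `conf = 0.25`, `minlen = 2`) are assumptions chosen for illustration; `measure` and `shading` map lift and confidence onto the rule nodes as in the calls above.

```r
library(arules)
library(arulesViz)
data(Groceries)

rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.25, minlen = 2),  # assumed thresholds
                 control = list(verbose = FALSE))

# Keep only the 20 rules with the highest lift to make the network readable
top_rules <- head(sort(rules, by = "lift"), 20)

# Interactive network (a visNetwork htmlwidget)
widget <- plot(top_rules, method = "graph", measure = "lift",
               shading = "confidence", engine = "htmlwidget")

# Display it when running interactively (knitr renders it automatically in a chunk)
if (interactive()) print(widget)
```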
In conclusion, analyzing the Groceries dataset with the Apriori algorithm has provided useful insights into the relationships and patterns between items in the data set. By choosing suitable support and confidence thresholds, we generated a large set of association rules that describe which items tend to be bought together in the same transaction.
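The effect of the thresholds can be made concrete: raising the minimum support or confidence can only remove rules, never add new ones. The sketch below counts the rules Apriori returns on a small grid of thresholds; the grid values are chosen for illustration and are not the settings used in this analysis.

```r
library(arules)
data(Groceries)

# Count rules for a small grid of thresholds (illustrative values)
grid <- expand.grid(supp = c(0.005, 0.01, 0.02), conf = c(0.25, 0.5))
grid$n_rules <- mapply(function(s, cf) {
  length(apriori(Groceries,
                 parameter = list(supp = s, conf = cf, minlen = 2),
                 control = list(verbose = FALSE)))
}, grid$supp, grid$conf)

grid  # rule counts shrink as either threshold increases
```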
The visualizations, including item frequency plots, scatterplots, a parallel coordinates plot, and network plots, helped us explore the dependencies between items and identify clusters of related items and rules in the dataset.
Beyond retail, the same approach could be applied in other domains:
Healthcare: Association rules can be applied to patient data to identify patterns and correlations that could be indicative of certain diseases or conditions. For example, analyzing symptoms, treatments, and medications in electronic health records could help identify patients at high risk for a particular disease.
Fraud Detection: Association rules can be used to detect fraudulent activities by identifying unusual patterns of behavior. For example, credit card transactions that deviate from a customer’s typical spending habits could indicate fraud.
Recommender Systems: Association rules can be used to generate personalized recommendations for products or services based on a customer’s purchase history or preferences.