Identify the association rules for Market Basket Analysis
library(data.table)
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
library(gridExtra)
library(ggplot2)
## Loading & inspecting the data as a data.table
DT <- fread("~/Groceries_dataset.csv")
DT[1:10]
## Let's have a look at the data table.
str(DT)
## Classes 'data.table' and 'data.frame': 38765 obs. of 3 variables:
## $ Member_number : int 1808 2552 2300 1187 3037 4941 4501 3803 2762 4119 ...
## $ Date : chr "21-07-2015" "05-01-2015" "19-09-2015" "12-12-2015" ...
## $ itemDescription: chr "tropical fruit" "whole milk" "pip fruit" "other vegetables" ...
## - attr(*, ".internal.selfref")=<externalptr>
There are 3 variables (columns) and a total of 38,765 observations (rows) in the data table.
## Let's see the summary of data table
summary(DT)
## Member_number Date itemDescription
## Min. :1000 Length:38765 Length:38765
## 1st Qu.:2002 Class :character Class :character
## Median :3005 Mode :character Mode :character
## Mean :3004
## 3rd Qu.:4007
## Max. :5000
There are a total of 38,765 rows and three columns in the data table. The Date column is not in a date format, and itemDescription is categorical but is currently stored as character. Let's convert the columns to proper data types.
## Changing the `Date` column to a Date object and `itemDescription` to a factor.
DT$Date <- as.Date(DT$Date, format="%d-%m-%Y")
DT$itemDescription <- as.factor(DT$itemDescription)
DT[1:10]
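The conversion matters for the sorting step that follows: day-first strings sort lexically, not chronologically. A minimal base R sketch, using three date strings taken from the `str()` output above (the variable names `raw_dates` and `parsed` are illustrative only):

```r
# Three raw date strings as they appear in the Date column (dd-mm-yyyy)
raw_dates <- c("21-07-2015", "05-01-2015", "19-09-2015")

# Parse with an explicit format; the default ISO format would not match
parsed <- as.Date(raw_dates, format = "%d-%m-%Y")
class(parsed)

# Character sorting compares the day first, so September lands before July
sort(raw_dates)

# Date sorting is chronological
sort(parsed)
```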
library(DataExplorer)
introduce(DT)
plot_intro(DT)
This shows us that there are no missing observations in any of the three columns in the data table.
## Sort the data table by Member Number & Date
setkey(DT,"Member_number", "Date")
DT[1:10]
## Merging all the items purchased by a member on a specific date to one row.
itemList <- DT[, .(itemList= paste(itemDescription, collapse=",")), by=list(Member_number,Date)]
itemList[1:10]
## Remove Member_number & Date Columns
itemList <- itemList[,c("itemList"), with=FALSE]
itemList[1:10]
write.csv(itemList,"ItemList.csv", quote = FALSE, row.names = TRUE)
itemList[1:10]
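The grouping step above can be tried on a few made-up rows. This sketch uses base R's `aggregate()` rather than the data.table syntax of the analysis, and the member numbers, dates and items in `toy` are hypothetical:

```r
# Hypothetical purchases: member 1000 buys two items on the same day
toy <- data.frame(
  Member_number   = c(1000, 1000, 1000, 1001),
  Date            = as.Date(c("2015-03-15", "2015-03-15", "2015-06-24", "2015-01-20")),
  itemDescription = c("sausage", "whole milk", "pastry", "frankfurter")
)

# One row per (member, date) pair, with the items pasted into a single string
baskets <- aggregate(itemDescription ~ Member_number + Date,
                     data = toy, FUN = paste, collapse = ",")
baskets
```

Each resulting row corresponds to one basket, which is the unit that `read.transactions()` treats as a transaction.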
## Creating transactions object in basket format
trans = read.transactions(file="ItemList.csv", rm.duplicates= TRUE, format="basket",sep=",",cols=1);
## distribution of transactions with duplicates:
## items
## 1 2 3 4
## 662 39 5 1
## Removing quotes from the item labels
trans@itemInfo$labels <- gsub("\"","",trans@itemInfo$labels)
summary(trans)
## transactions as itemMatrix in sparse format with
## 14964 rows (elements/itemsets/transactions) and
## 168 columns (items) and a density of 0.01511843
##
## most frequent items:
## whole milk other vegetables rolls/buns soda
## 2363 1827 1646 1453
## yogurt (Other)
## 1285 29433
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10
## 206 10012 2727 1273 338 179 113 96 19 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 2.00 2.54 3.00 10.00
##
## includes extended item information - examples:
## labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3 baby cosmetics
##
## includes extended transaction information - examples:
## transactionID
## 1
## 2 1
## 3 2
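The headline figures in this summary can be cross-checked by hand. The sketch below, in base R, copies the basket-size distribution from the output above and recomputes the number of transactions, the mean basket size and the density (the share of non-empty cells in the transaction-by-item matrix). The variable names are illustrative only:

```r
# Basket-size distribution copied from the summary(trans) output
sizes  <- 1:10
counts <- c(206, 10012, 2727, 1273, 338, 179, 113, 96, 19, 1)

n_transactions <- sum(counts)          # number of rows in the sparse matrix
n_items        <- 168                  # number of columns (distinct items)
total_entries  <- sum(sizes * counts)  # total item occurrences across all baskets

mean_size <- total_entries / n_transactions
density   <- total_entries / (n_transactions * n_items)

c(n_transactions = n_transactions, mean_size = mean_size, density = density)
```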
The Apriori algorithm generates the most relevant set of rules from the given transaction data. It also reports the support, confidence and lift of those rules. These three measures can be used to judge the relative strength of the rules.
Let's consider the rule A => B in order to compute these metrics.
Support is the ratio of the number of transactions with both A & B to the total number of transactions.
\[Support = P(A \cap B)\]
Confidence is the ratio of the number of transactions with both A & B to the total number of transactions with A.
\[Confidence = \frac {P(A\cap B)}{P(A)}\]
Expected Confidence is the ratio of the number of transactions with B to the total number of transactions.
\[Expected \ Confidence = {P(B)}\]
And finally, Lift is the ratio of Confidence to Expected Confidence.
\[Lift = \frac {P(A \cap B)}{P(A) \cdot P(B)}\]
Lift is the factor by which the co-occurrence of A and B exceeds the expected probability of A and B co-occurring, had they been independent. A lift above 1 means A and B occur together more often than expected under independence, and a lift below 1 means less often. So, the higher the lift, the higher the chance of A and B occurring together.
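As a worked example of these formulas, the sketch below recomputes the three measures for the rule {sausage,yogurt} => {whole milk}, using counts taken from the rule output reported later in this analysis. The count of 86 for the left-hand side is not printed directly; it is derived here as coverage times the number of transactions, which is an assumption of this sketch:

```r
# Counts for the rule {sausage,yogurt} => {whole milk}
n_total <- 14964  # all transactions
n_lhs   <- 86     # transactions with sausage and yogurt (derived: coverage * n_total)
n_rhs   <- 2363   # transactions with whole milk
n_both  <- 22     # transactions with sausage, yogurt and whole milk

support             <- n_both / n_total   # P(A and B)
confidence          <- n_both / n_lhs     # P(A and B) / P(A)
expected_confidence <- n_rhs / n_total    # P(B)
lift                <- confidence / expected_confidence

c(support = support, confidence = confidence, lift = lift)
```

The results should agree with the support, confidence and lift that `inspect()` prints for this rule.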
frequentItems <- eclat(trans, parameter = list(supp = 0.05, maxlen = 15)) # calculates support for frequent items
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.05 1 15 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 748
##
## create itemset ...
## set transactions ...[168 item(s), 14964 transaction(s)] done [0.01s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating sparse bit matrix ... [11 row(s), 14964 column(s)] done [0.00s].
## writing ... [11 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(head(frequentItems,10))
## items support transIdenticalToItemsets count
## [1] {whole milk} 0.15791232 2363 2363
## [2] {other vegetables} 0.12209302 1827 1827
## [3] {rolls/buns} 0.10999733 1646 1646
## [4] {soda} 0.09709971 1453 1453
## [5] {yogurt} 0.08587276 1285 1285
## [6] {tropical fruit} 0.06776263 1014 1014
## [7] {root vegetables} 0.06956696 1041 1041
## [8] {sausage} 0.06034483 903 903
## [9] {bottled water} 0.06067896 908 908
## [10] {citrus fruit} 0.05312751 795 795
# plot frequent items
itemFrequencyPlot(trans, topN=10, type="absolute", main="Item Frequency")
## Finding Association Rules
rules <- apriori(trans)
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 1496
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[168 item(s), 14964 transaction(s)] done [0.01s].
## sorting and recoding items ... [3 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules
## set of 0 rules
With the default parameters (support = 0.1, confidence = 0.8), the algorithm does not return any rules.
We need to fine-tune the parameters to get some association rules.
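To see why the defaults are too strict for this data, the base R sketch below applies a 10% support threshold to the five most frequent items listed in the `summary(trans)` output, and then computes the smallest transaction count an itemset needs at a few lower support levels. The variable names are illustrative, and the lower support levels are simply candidate values for tuning:

```r
n_transactions <- 14964

# Counts of the five most frequent items, from summary(trans)
item_counts <- c("whole milk" = 2363, "other vegetables" = 1827,
                 "rolls/buns" = 1646, "soda" = 1453, "yogurt" = 1285)

# Items whose relative frequency reaches the default support of 0.1
frequent <- item_counts[item_counts / n_transactions >= 0.1]
names(frequent)

# Smallest whole number of transactions that satisfies each support level
support_levels <- c(0.1, 0.05, 0.01, 0.005, 0.001)
min_counts <- ceiling(support_levels * n_transactions)
data.frame(support = support_levels, min_count = min_counts)
```

Only three single items pass the 10% threshold, which matches the `[3 item(s)]` line in the Apriori log. The "Absolute minimum support count" printed in the log appears to be the truncated product (for example 1496 for 0.1), so the smallest count that actually qualifies is one higher.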
# Support and confidence values
supportLevels <- c(0.05, 0.01, 0.005,0.001)
confidenceLevels <- c(0.5,0.45,0.4,0.35,0.3,0.25,0.2,0.15,0.1)
# Empty integers
rules_sup5 <- integer(length=9)
rules_sup1 <- integer(length=9)
rules_sup0.5 <- integer(length=9)
rules_sup0.1 <- integer(length=9)
# Apriori algorithm with a support level of 5%
for (i in seq_along(confidenceLevels)) {
  rules_sup5[i] <- length(apriori(trans, parameter=list(sup=supportLevels[1],
                                  conf=confidenceLevels[i], target="rules")))
}
# Apriori algorithm with a support level of 1%
for (i in seq_along(confidenceLevels)) {
  rules_sup1[i] <- length(apriori(trans, parameter=list(sup=supportLevels[2],
                                  conf=confidenceLevels[i], target="rules")))
}
# Apriori algorithm with a support level of 0.5%
for (i in seq_along(confidenceLevels)) {
  rules_sup0.5[i] <- length(apriori(trans, parameter=list(sup=supportLevels[3],
                                    conf=confidenceLevels[i], target="rules")))
}
# Apriori algorithm with a support level of 0.1%
for (i in seq_along(confidenceLevels)) {
  rules_sup0.1[i] <- length(apriori(trans, parameter=list(sup=supportLevels[4],
                                    conf=confidenceLevels[i], target="rules")))
}
# Data frame
num_rules <- data.table(rules_sup5, rules_sup1, rules_sup0.5, rules_sup0.1, confidenceLevels)
ggplot(data=num_rules, aes(x=confidenceLevels)) +
  # Plot line and points (support level of 5%)
  geom_line(aes(y=rules_sup5, colour="Support level of 5%")) +
  geom_point(aes(y=rules_sup5, colour="Support level of 5%")) +
  # Plot line and points (support level of 1%)
  geom_line(aes(y=rules_sup1, colour="Support level of 1%")) +
  geom_point(aes(y=rules_sup1, colour="Support level of 1%")) +
  # Plot line and points (support level of 0.5%)
  geom_line(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  geom_point(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  # Plot line and points (support level of 0.1%)
  geom_line(aes(y=rules_sup0.1, colour="Support level of 0.1%")) +
  geom_point(aes(y=rules_sup0.1, colour="Support level of 0.1%")) +
  # Labs and theme
  labs(x="Confidence levels", y="Number of rules found",
       title="Apriori algorithm with different support levels") +
  theme_bw() +
  theme(legend.title=element_blank())
Based on the graph above, we will use a support level of 0.1% and a confidence level of 10%.
## Rules with specified parameters (support = 0.1%, confidence = 10%, minimum length of 2)
rules <- apriori(trans, parameter=list(minlen=2,
                                       supp=0.001,
                                       conf=0.1))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 14
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[168 item(s), 14964 transaction(s)] done [0.01s].
## sorting and recoding items ... [149 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [131 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
summary(rules)
## set of 131 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3
## 114 17
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 2.00 2.00 2.13 2.00 3.00
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.001002 Min. :0.1000 Min. :0.005346 Min. :0.6458
## 1st Qu.:0.001337 1st Qu.:0.1098 1st Qu.:0.010325 1st Qu.:0.8075
## Median :0.001938 Median :0.1215 Median :0.016573 Median :0.8795
## Mean :0.002933 Mean :0.1257 Mean :0.023743 Mean :0.9465
## 3rd Qu.:0.003776 3rd Qu.:0.1347 3rd Qu.:0.031910 3rd Qu.:1.0320
## Max. :0.014836 Max. :0.2558 Max. :0.122093 Max. :2.1831
## count
## Min. : 15.00
## 1st Qu.: 20.00
## Median : 29.00
## Mean : 43.89
## 3rd Qu.: 56.50
## Max. :222.00
##
## mining info:
## data ntransactions support confidence
## trans 14964 0.001 0.1
inspect(head(rules,20))
## lhs rhs support confidence
## [1] {frozen fish} => {whole milk} 0.001069233 0.1568627
## [2] {seasonal products} => {rolls/buns} 0.001002406 0.1415094
## [3] {pot plants} => {other vegetables} 0.001002406 0.1282051
## [4] {pot plants} => {whole milk} 0.001002406 0.1282051
## [5] {pasta} => {whole milk} 0.001069233 0.1322314
## [6] {pickled vegetables} => {whole milk} 0.001002406 0.1119403
## [7] {packaged fruit/vegetables} => {rolls/buns} 0.001202887 0.1417323
## [8] {detergent} => {yogurt} 0.001069233 0.1240310
## [9] {detergent} => {rolls/buns} 0.001002406 0.1162791
## [10] {detergent} => {whole milk} 0.001403368 0.1627907
## [11] {semi-finished bread} => {other vegetables} 0.001002406 0.1056338
## [12] {semi-finished bread} => {whole milk} 0.001670676 0.1760563
## [13] {red/blush wine} => {rolls/buns} 0.001336541 0.1273885
## [14] {red/blush wine} => {other vegetables} 0.001136060 0.1082803
## [15] {flour} => {tropical fruit} 0.001069233 0.1095890
## [16] {flour} => {whole milk} 0.001336541 0.1369863
## [17] {herbs} => {yogurt} 0.001136060 0.1075949
## [18] {herbs} => {whole milk} 0.001136060 0.1075949
## [19] {processed cheese} => {root vegetables} 0.001069233 0.1052632
## [20] {processed cheese} => {rolls/buns} 0.001470195 0.1447368
## coverage lift count
## [1] 0.006816359 0.9933534 16
## [2] 0.007083667 1.2864807 15
## [3] 0.007818765 1.0500611 15
## [4] 0.007818765 0.8118754 15
## [5] 0.008086073 0.8373723 16
## [6] 0.008954825 0.7088763 15
## [7] 0.008487036 1.2885066 18
## [8] 0.008620690 1.4443580 16
## [9] 0.008620690 1.0571081 15
## [10] 0.008620690 1.0308929 21
## [11] 0.009489441 0.8651911 15
## [12] 0.009489441 1.1148993 25
## [13] 0.010491847 1.1581057 20
## [14] 0.010491847 0.8868668 17
## [15] 0.009756750 1.6172489 16
## [16] 0.009756750 0.8674833 20
## [17] 0.010558674 1.2529577 17
## [18] 0.010558674 0.6813587 17
## [19] 0.010157712 1.5131200 16
## [20] 0.010157712 1.3158214 22
### Visualizing the rules using two-key scatter plot
plot(rules,jitter=2, method = "two-key plot")
The plot above shows an inverse relationship between order (the number of items in a rule) and support: the order-3 rules are confined to the lowest support values.
plot(rules, measure=c("support", "lift"), shading = "confidence", engine="ggplot2")
## Rules sorted by confidence.
rules_conf <- sort (rules, by="confidence", decreasing=TRUE)
inspect(head(rules_conf,20))
## lhs rhs support confidence
## [1] {sausage,yogurt} => {whole milk} 0.001470195 0.2558140
## [2] {rolls/buns,sausage} => {whole milk} 0.001136060 0.2125000
## [3] {sausage,soda} => {whole milk} 0.001069233 0.1797753
## [4] {semi-finished bread} => {whole milk} 0.001670676 0.1760563
## [5] {rolls/buns,yogurt} => {whole milk} 0.001336541 0.1709402
## [6] {sausage,whole milk} => {yogurt} 0.001470195 0.1641791
## [7] {detergent} => {whole milk} 0.001403368 0.1627907
## [8] {ham} => {whole milk} 0.002739909 0.1601562
## [9] {bottled beer} => {whole milk} 0.007150495 0.1578171
## [10] {frozen fish} => {whole milk} 0.001069233 0.1568627
## [11] {candy} => {whole milk} 0.002138466 0.1488372
## [12] {sausage} => {whole milk} 0.008954825 0.1483942
## [13] {onions} => {whole milk} 0.002940390 0.1452145
## [14] {processed cheese} => {rolls/buns} 0.001470195 0.1447368
## [15] {processed cheese} => {whole milk} 0.001470195 0.1447368
## [16] {newspapers} => {whole milk} 0.005613472 0.1443299
## [17] {domestic eggs} => {whole milk} 0.005279337 0.1423423
## [18] {packaged fruit/vegetables} => {rolls/buns} 0.001202887 0.1417323
## [19] {seasonal products} => {rolls/buns} 0.001002406 0.1415094
## [20] {cat food} => {whole milk} 0.001670676 0.1412429
## coverage lift count
## [1] 0.005747126 1.6199746 22
## [2] 0.005346164 1.3456835 17
## [3] 0.005947608 1.1384500 16
## [4] 0.009489441 1.1148993 25
## [5] 0.007818765 1.0825005 20
## [6] 0.008954825 1.9118880 22
## [7] 0.008620690 1.0308929 21
## [8] 0.017107725 1.0142100 41
## [9] 0.045308741 0.9993970 107
## [10] 0.006816359 0.9933534 16
## [11] 0.014367816 0.9425307 32
## [12] 0.060344828 0.9397255 134
## [13] 0.020248597 0.9195895 44
## [14] 0.010157712 1.3158214 22
## [15] 0.010157712 0.9165646 22
## [16] 0.038893344 0.9139875 84
## [17] 0.037089014 0.9014011 79
## [18] 0.008487036 1.2885066 18
## [19] 0.007083667 1.2864807 15
## [20] 0.011828388 0.8944390 25
## Rules sorted by Highest Lift.
rules_lift <- sort (rules, by="lift", decreasing=TRUE)
inspect(head(rules_lift,20))
## lhs rhs support confidence
## [1] {whole milk,yogurt} => {sausage} 0.001470195 0.1317365
## [2] {sausage,whole milk} => {yogurt} 0.001470195 0.1641791
## [3] {sausage,yogurt} => {whole milk} 0.001470195 0.2558140
## [4] {flour} => {tropical fruit} 0.001069233 0.1095890
## [5] {processed cheese} => {root vegetables} 0.001069233 0.1052632
## [6] {soft cheese} => {yogurt} 0.001269714 0.1266667
## [7] {detergent} => {yogurt} 0.001069233 0.1240310
## [8] {chewing gum} => {yogurt} 0.001403368 0.1166667
## [9] {rolls/buns,sausage} => {whole milk} 0.001136060 0.2125000
## [10] {processed cheese} => {rolls/buns} 0.001470195 0.1447368
## [11] {packaged fruit/vegetables} => {rolls/buns} 0.001202887 0.1417323
## [12] {seasonal products} => {rolls/buns} 0.001002406 0.1415094
## [13] {herbs} => {yogurt} 0.001136060 0.1075949
## [14] {oil} => {soda} 0.001804330 0.1210762
## [15] {sausage,whole milk} => {soda} 0.001069233 0.1194030
## [16] {beverages} => {soda} 0.001871157 0.1129032
## [17] {red/blush wine} => {rolls/buns} 0.001336541 0.1273885
## [18] {sausage,whole milk} => {rolls/buns} 0.001136060 0.1268657
## [19] {rolls/buns,soda} => {other vegetables} 0.001136060 0.1404959
## [20] {sausage,soda} => {whole milk} 0.001069233 0.1797753
## coverage lift count
## [1] 0.011160118 2.183062 22
## [2] 0.008954825 1.911888 22
## [3] 0.005747126 1.619975 22
## [4] 0.009756750 1.617249 16
## [5] 0.010157712 1.513120 16
## [6] 0.010024058 1.475051 19
## [7] 0.008620690 1.444358 16
## [8] 0.012028869 1.358599 21
## [9] 0.005346164 1.345683 17
## [10] 0.010157712 1.315821 22
## [11] 0.008487036 1.288507 18
## [12] 0.007083667 1.286481 15
## [13] 0.010558674 1.252958 17
## [14] 0.014902433 1.246927 27
## [15] 0.008954825 1.229695 16
## [16] 0.016573109 1.162756 28
## [17] 0.010491847 1.158106 20
## [18] 0.008954825 1.153352 17
## [19] 0.008086073 1.150728 17
## [20] 0.005947608 1.138450 16
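The rule {whole milk,yogurt} => {sausage} shows the strongest association: its lift of 2.18 means sausage appears in baskets with whole milk and yogurt about twice as often as it would if they were independent. Its confidence of 0.13 is above the median for this rule set, though the rule rests on only 22 transactions.
We will reduce the number of rules by keeping only those with a confidence above 0.12, which is approximately the median confidence of the rule set.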
subrules <- rules[quality(rules)$confidence > 0.12]
subrules
## set of 71 rules
inspect(subrules)
## lhs rhs support confidence
## [1] {frozen fish} => {whole milk} 0.001069233 0.1568627
## [2] {seasonal products} => {rolls/buns} 0.001002406 0.1415094
## [3] {pot plants} => {other vegetables} 0.001002406 0.1282051
## [4] {pot plants} => {whole milk} 0.001002406 0.1282051
## [5] {pasta} => {whole milk} 0.001069233 0.1322314
## [6] {packaged fruit/vegetables} => {rolls/buns} 0.001202887 0.1417323
## [7] {detergent} => {yogurt} 0.001069233 0.1240310
## [8] {detergent} => {whole milk} 0.001403368 0.1627907
## [9] {semi-finished bread} => {whole milk} 0.001670676 0.1760563
## [10] {red/blush wine} => {rolls/buns} 0.001336541 0.1273885
## [11] {flour} => {whole milk} 0.001336541 0.1369863
## [12] {processed cheese} => {rolls/buns} 0.001470195 0.1447368
## [13] {processed cheese} => {whole milk} 0.001470195 0.1447368
## [14] {soft cheese} => {yogurt} 0.001269714 0.1266667
## [15] {cat food} => {whole milk} 0.001670676 0.1412429
## [16] {chewing gum} => {whole milk} 0.001670676 0.1388889
## [17] {hygiene articles} => {whole milk} 0.001737503 0.1268293
## [18] {candy} => {whole milk} 0.002138466 0.1488372
## [19] {ice cream} => {whole milk} 0.001937984 0.1277533
## [20] {grapes} => {whole milk} 0.001937984 0.1342593
## [21] {oil} => {soda} 0.001804330 0.1210762
## [22] {oil} => {other vegetables} 0.001804330 0.1210762
## [23] {oil} => {whole milk} 0.001937984 0.1300448
## [24] {hard cheese} => {whole milk} 0.001871157 0.1272727
## [25] {meat} => {other vegetables} 0.002138466 0.1269841
## [26] {meat} => {whole milk} 0.002205293 0.1309524
## [27] {ham} => {whole milk} 0.002739909 0.1601562
## [28] {frozen meals} => {other vegetables} 0.002138466 0.1274900
## [29] {sugar} => {whole milk} 0.002472601 0.1396226
## [30] {long life bakery product} => {whole milk} 0.002405774 0.1343284
## [31] {waffles} => {whole milk} 0.002606255 0.1407942
## [32] {onions} => {whole milk} 0.002940390 0.1452145
## [33] {berries} => {other vegetables} 0.002673082 0.1226994
## [34] {hamburger meat} => {whole milk} 0.003074044 0.1406728
## [35] {cream cheese} => {whole milk} 0.002873563 0.1214689
## [36] {chocolate} => {whole milk} 0.002940390 0.1246459
## [37] {white bread} => {whole milk} 0.003140871 0.1309192
## [38] {chicken} => {whole milk} 0.003408180 0.1223022
## [39] {frozen vegetables} => {whole milk} 0.003809142 0.1360382
## [40] {coffee} => {whole milk} 0.003809142 0.1205074
## [41] {margarine} => {whole milk} 0.004076450 0.1265560
## [42] {beef} => {whole milk} 0.004677894 0.1377953
## [43] {fruit/vegetable juice} => {whole milk} 0.004410585 0.1296660
## [44] {curd} => {whole milk} 0.004143277 0.1230159
## [45] {butter} => {whole milk} 0.004677894 0.1328273
## [46] {pork} => {whole milk} 0.005012029 0.1351351
## [47] {domestic eggs} => {whole milk} 0.005279337 0.1423423
## [48] {newspapers} => {whole milk} 0.005613472 0.1443299
## [49] {frankfurter} => {other vegetables} 0.005145683 0.1362832
## [50] {frankfurter} => {whole milk} 0.005279337 0.1398230
## [51] {bottled beer} => {whole milk} 0.007150495 0.1578171
## [52] {canned beer} => {whole milk} 0.006014435 0.1282051
## [53] {shopping bags} => {whole milk} 0.006348570 0.1334270
## [54] {pip fruit} => {whole milk} 0.006615878 0.1348774
## [55] {pastry} => {whole milk} 0.006482224 0.1253230
## [56] {citrus fruit} => {whole milk} 0.007150495 0.1345912
## [57] {sausage} => {whole milk} 0.008954825 0.1483942
## [58] {tropical fruit} => {whole milk} 0.008219727 0.1213018
## [59] {yogurt} => {whole milk} 0.011160118 0.1299611
## [60] {rolls/buns} => {whole milk} 0.013966854 0.1269745
## [61] {other vegetables} => {whole milk} 0.014835605 0.1215107
## [62] {sausage,yogurt} => {whole milk} 0.001470195 0.2558140
## [63] {sausage,whole milk} => {yogurt} 0.001470195 0.1641791
## [64] {whole milk,yogurt} => {sausage} 0.001470195 0.1317365
## [65] {sausage,soda} => {whole milk} 0.001069233 0.1797753
## [66] {rolls/buns,sausage} => {whole milk} 0.001136060 0.2125000
## [67] {sausage,whole milk} => {rolls/buns} 0.001136060 0.1268657
## [68] {rolls/buns,yogurt} => {whole milk} 0.001336541 0.1709402
## [69] {other vegetables,yogurt} => {whole milk} 0.001136060 0.1404959
## [70] {rolls/buns,soda} => {other vegetables} 0.001136060 0.1404959
## [71] {rolls/buns,soda} => {whole milk} 0.001002406 0.1239669
## coverage lift count
## [1] 0.006816359 0.9933534 16
## [2] 0.007083667 1.2864807 15
## [3] 0.007818765 1.0500611 15
## [4] 0.007818765 0.8118754 15
## [5] 0.008086073 0.8373723 16
## [6] 0.008487036 1.2885066 18
## [7] 0.008620690 1.4443580 16
## [8] 0.008620690 1.0308929 21
## [9] 0.009489441 1.1148993 25
## [10] 0.010491847 1.1581057 20
## [11] 0.009756750 0.8674833 20
## [12] 0.010157712 1.3158214 22
## [13] 0.010157712 0.9165646 22
## [14] 0.010024058 1.4750506 19
## [15] 0.011828388 0.8944390 25
## [16] 0.012028869 0.8795317 25
## [17] 0.013699546 0.8031626 26
## [18] 0.014367816 0.9425307 32
## [19] 0.015169741 0.8090142 29
## [20] 0.014434643 0.8502139 29
## [21] 0.014902433 1.2469269 27
## [22] 0.014902433 0.9916720 27
## [23] 0.014902433 0.8235256 29
## [24] 0.014701951 0.8059708 28
## [25] 0.016840417 1.0400605 32
## [26] 0.016840417 0.8292727 33
## [27] 0.017107725 1.0142100 41
## [28] 0.016773590 1.0442041 32
## [29] 0.017709169 0.8841783 37
## [30] 0.017909650 0.8506515 36
## [31] 0.018511093 0.8915974 39
## [32] 0.020248597 0.9195895 44
## [33] 0.021785619 1.0049664 40
## [34] 0.021852446 0.8908284 46
## [35] 0.023656776 0.7692175 43
## [36] 0.023589949 0.7893361 44
## [37] 0.023990912 0.8290627 47
## [38] 0.027866881 0.7744941 51
## [39] 0.028000535 0.8614792 57
## [40] 0.031609195 0.7631285 57
## [41] 0.032210639 0.8014322 61
## [42] 0.033948142 0.8726062 70
## [43] 0.034014969 0.8211266 66
## [44] 0.033680834 0.7790138 62
## [45] 0.035217856 0.8411460 70
## [46] 0.037089014 0.8557605 75
## [47] 0.037089014 0.9014011 79
## [48] 0.038893344 0.9139875 84
## [49] 0.037757284 1.1162242 77
## [50] 0.037757284 0.8854471 79
## [51] 0.045308741 0.9993970 107
## [52] 0.046912590 0.8118754 90
## [53] 0.047580861 0.8449433 95
## [54] 0.049051056 0.8541283 99
## [55] 0.051724138 0.7936239 97
## [56] 0.053127506 0.8523160 107
## [57] 0.060344828 0.9397255 134
## [58] 0.067762630 0.7681590 123
## [59] 0.085872761 0.8229952 167
## [60] 0.109997327 0.8040822 209
## [61] 0.122093023 0.7694819 222
## [62] 0.005747126 1.6199746 22
## [63] 0.008954825 1.9118880 22
## [64] 0.011160118 2.1830624 22
## [65] 0.005947608 1.1384500 16
## [66] 0.005346164 1.3456835 17
## [67] 0.008954825 1.1533523 17
## [68] 0.007818765 1.0825005 20
## [69] 0.008086073 0.8897081 17
## [70] 0.008086073 1.1507281 17
## [71] 0.008086073 0.7850365 15
plot(subrules, method = "grouped", control = list(k = 50),engine="grid")
This shows a strong association between {whole milk, yogurt} and {sausage}.
## Removing Redundancy
redundant <- is.redundant(rules, measure="lift")
which(redundant)
## [1] 129 130 131
rules.pruned <- rules[!redundant]
inspect(rules.pruned)
## lhs rhs support
## [1] {frozen fish} => {whole milk} 0.001069233
## [2] {seasonal products} => {rolls/buns} 0.001002406
## [3] {pot plants} => {other vegetables} 0.001002406
## [4] {pot plants} => {whole milk} 0.001002406
## [5] {pasta} => {whole milk} 0.001069233
## [6] {pickled vegetables} => {whole milk} 0.001002406
## [7] {packaged fruit/vegetables} => {rolls/buns} 0.001202887
## [8] {detergent} => {yogurt} 0.001069233
## [9] {detergent} => {rolls/buns} 0.001002406
## [10] {detergent} => {whole milk} 0.001403368
## [11] {semi-finished bread} => {other vegetables} 0.001002406
## [12] {semi-finished bread} => {whole milk} 0.001670676
## [13] {red/blush wine} => {rolls/buns} 0.001336541
## [14] {red/blush wine} => {other vegetables} 0.001136060
## [15] {flour} => {tropical fruit} 0.001069233
## [16] {flour} => {whole milk} 0.001336541
## [17] {herbs} => {yogurt} 0.001136060
## [18] {herbs} => {whole milk} 0.001136060
## [19] {processed cheese} => {root vegetables} 0.001069233
## [20] {processed cheese} => {rolls/buns} 0.001470195
## [21] {processed cheese} => {whole milk} 0.001470195
## [22] {soft cheese} => {yogurt} 0.001269714
## [23] {soft cheese} => {rolls/buns} 0.001002406
## [24] {soft cheese} => {other vegetables} 0.001202887
## [25] {soft cheese} => {whole milk} 0.001202887
## [26] {white wine} => {whole milk} 0.001269714
## [27] {cat food} => {whole milk} 0.001670676
## [28] {chewing gum} => {yogurt} 0.001403368
## [29] {chewing gum} => {whole milk} 0.001670676
## [30] {specialty bar} => {other vegetables} 0.001670676
## [31] {specialty bar} => {whole milk} 0.001670676
## [32] {hygiene articles} => {other vegetables} 0.001403368
## [33] {hygiene articles} => {whole milk} 0.001737503
## [34] {candy} => {rolls/buns} 0.001470195
## [35] {candy} => {whole milk} 0.002138466
## [36] {sliced cheese} => {other vegetables} 0.001403368
## [37] {sliced cheese} => {whole milk} 0.001470195
## [38] {ice cream} => {rolls/buns} 0.001737503
## [39] {ice cream} => {whole milk} 0.001937984
## [40] {grapes} => {other vegetables} 0.001603849
## [41] {grapes} => {whole milk} 0.001937984
## [42] {oil} => {soda} 0.001804330
## [43] {oil} => {other vegetables} 0.001804330
## [44] {oil} => {whole milk} 0.001937984
## [45] {hard cheese} => {rolls/buns} 0.001670676
## [46] {hard cheese} => {other vegetables} 0.001670676
## [47] {hard cheese} => {whole milk} 0.001871157
## [48] {specialty chocolate} => {other vegetables} 0.001670676
## [49] {meat} => {other vegetables} 0.002138466
## [50] {meat} => {whole milk} 0.002205293
## [51] {beverages} => {soda} 0.001871157
## [52] {beverages} => {other vegetables} 0.001737503
## [53] {beverages} => {whole milk} 0.001937984
## [54] {ham} => {whole milk} 0.002739909
## [55] {frozen meals} => {other vegetables} 0.002138466
## [56] {frozen meals} => {whole milk} 0.001937984
## [57] {sugar} => {whole milk} 0.002472601
## [58] {long life bakery product} => {whole milk} 0.002405774
## [59] {waffles} => {whole milk} 0.002606255
## [60] {salty snack} => {rolls/buns} 0.001937984
## [61] {salty snack} => {other vegetables} 0.002205293
## [62] {salty snack} => {whole milk} 0.001937984
## [63] {onions} => {whole milk} 0.002940390
## [64] {UHT-milk} => {other vegetables} 0.002138466
## [65] {UHT-milk} => {whole milk} 0.002539428
## [66] {berries} => {other vegetables} 0.002673082
## [67] {berries} => {whole milk} 0.002272120
## [68] {hamburger meat} => {other vegetables} 0.002205293
## [69] {hamburger meat} => {whole milk} 0.003074044
## [70] {dessert} => {whole milk} 0.002405774
## [71] {napkins} => {whole milk} 0.002405774
## [72] {cream cheese} => {whole milk} 0.002873563
## [73] {chocolate} => {rolls/buns} 0.002806736
## [74] {chocolate} => {whole milk} 0.002940390
## [75] {white bread} => {other vegetables} 0.002606255
## [76] {white bread} => {whole milk} 0.003140871
## [77] {chicken} => {rolls/buns} 0.002873563
## [78] {chicken} => {whole milk} 0.003408180
## [79] {frozen vegetables} => {other vegetables} 0.003140871
## [80] {frozen vegetables} => {whole milk} 0.003809142
## [81] {coffee} => {whole milk} 0.003809142
## [82] {margarine} => {whole milk} 0.004076450
## [83] {beef} => {whole milk} 0.004677894
## [84] {fruit/vegetable juice} => {rolls/buns} 0.003742315
## [85] {fruit/vegetable juice} => {whole milk} 0.004410585
## [86] {curd} => {other vegetables} 0.003541834
## [87] {curd} => {whole milk} 0.004143277
## [88] {butter} => {whole milk} 0.004677894
## [89] {pork} => {other vegetables} 0.003942796
## [90] {pork} => {whole milk} 0.005012029
## [91] {domestic eggs} => {whole milk} 0.005279337
## [92] {brown bread} => {whole milk} 0.004477412
## [93] {newspapers} => {whole milk} 0.005613472
## [94] {frankfurter} => {other vegetables} 0.005145683
## [95] {frankfurter} => {whole milk} 0.005279337
## [96] {whipped/sour cream} => {whole milk} 0.004611067
## [97] {bottled beer} => {other vegetables} 0.004677894
## [98] {bottled beer} => {whole milk} 0.007150495
## [99] {canned beer} => {whole milk} 0.006014435
## [100] {shopping bags} => {other vegetables} 0.004945202
## [101] {shopping bags} => {whole milk} 0.006348570
## [102] {pip fruit} => {rolls/buns} 0.004945202
## [103] {pip fruit} => {other vegetables} 0.004945202
## [104] {pip fruit} => {whole milk} 0.006615878
## [105] {pastry} => {whole milk} 0.006482224
## [106] {citrus fruit} => {whole milk} 0.007150495
## [107] {bottled water} => {whole milk} 0.007150495
## [108] {sausage} => {whole milk} 0.008954825
## [109] {root vegetables} => {whole milk} 0.007551457
## [110] {tropical fruit} => {whole milk} 0.008219727
## [111] {yogurt} => {whole milk} 0.011160118
## [112] {soda} => {whole milk} 0.011627907
## [113] {rolls/buns} => {whole milk} 0.013966854
## [114] {other vegetables} => {whole milk} 0.014835605
## [115] {sausage,yogurt} => {whole milk} 0.001470195
## [116] {sausage,whole milk} => {yogurt} 0.001470195
## [117] {whole milk,yogurt} => {sausage} 0.001470195
## [118] {sausage,soda} => {whole milk} 0.001069233
## [119] {sausage,whole milk} => {soda} 0.001069233
## [120] {rolls/buns,sausage} => {whole milk} 0.001136060
## [121] {sausage,whole milk} => {rolls/buns} 0.001136060
## [122] {rolls/buns,yogurt} => {whole milk} 0.001336541
## [123] {whole milk,yogurt} => {rolls/buns} 0.001336541
## [124] {other vegetables,yogurt} => {whole milk} 0.001136060
## [125] {whole milk,yogurt} => {other vegetables} 0.001136060
## [126] {rolls/buns,soda} => {other vegetables} 0.001136060
## [127] {other vegetables,soda} => {rolls/buns} 0.001136060
## [128] {other vegetables,rolls/buns} => {soda} 0.001136060
## confidence coverage lift count
## [1] 0.1568627 0.006816359 0.9933534 16
## [2] 0.1415094 0.007083667 1.2864807 15
## [3] 0.1282051 0.007818765 1.0500611 15
## [4] 0.1282051 0.007818765 0.8118754 15
## [5] 0.1322314 0.008086073 0.8373723 16
## [6] 0.1119403 0.008954825 0.7088763 15
## [7] 0.1417323 0.008487036 1.2885066 18
## [8] 0.1240310 0.008620690 1.4443580 16
## [9] 0.1162791 0.008620690 1.0571081 15
## [10] 0.1627907 0.008620690 1.0308929 21
## [11] 0.1056338 0.009489441 0.8651911 15
## [12] 0.1760563 0.009489441 1.1148993 25
## [13] 0.1273885 0.010491847 1.1581057 20
## [14] 0.1082803 0.010491847 0.8868668 17
## [15] 0.1095890 0.009756750 1.6172489 16
## [16] 0.1369863 0.009756750 0.8674833 20
## [17] 0.1075949 0.010558674 1.2529577 17
## [18] 0.1075949 0.010558674 0.6813587 17
## [19] 0.1052632 0.010157712 1.5131200 16
## [20] 0.1447368 0.010157712 1.3158214 22
## [21] 0.1447368 0.010157712 0.9165646 22
## [22] 0.1266667 0.010024058 1.4750506 19
## [23] 0.1000000 0.010024058 0.9091130 15
## [24] 0.1200000 0.010024058 0.9828571 18
## [25] 0.1200000 0.010024058 0.7599154 18
## [26] 0.1085714 0.011694734 0.6875425 19
## [27] 0.1412429 0.011828388 0.8944390 25
## [28] 0.1166667 0.012028869 1.3585992 21
## [29] 0.1388889 0.012028869 0.8795317 25
## [30] 0.1196172 0.013966854 0.9797220 25
## [31] 0.1196172 0.013966854 0.7574914 25
## [32] 0.1024390 0.013699546 0.8390244 21
## [33] 0.1268293 0.013699546 0.8031626 26
## [34] 0.1023256 0.014367816 0.9302552 22
## [35] 0.1488372 0.014367816 0.9425307 32
## [36] 0.1000000 0.014033681 0.8190476 21
## [37] 0.1047619 0.014033681 0.6634182 22
## [38] 0.1145374 0.015169741 1.0412748 26
## [39] 0.1277533 0.015169741 0.8090142 29
## [40] 0.1111111 0.014434643 0.9100529 24
## [41] 0.1342593 0.014434643 0.8502139 29
## [42] 0.1210762 0.014902433 1.2469269 27
## [43] 0.1210762 0.014902433 0.9916720 27
## [44] 0.1300448 0.014902433 0.8235256 29
## [45] 0.1136364 0.014701951 1.0330830 25
## [46] 0.1136364 0.014701951 0.9307359 25
## [47] 0.1272727 0.014701951 0.8059708 28
## [48] 0.1046025 0.015971665 0.8567444 25
## [49] 0.1269841 0.016840417 1.0400605 32
## [50] 0.1309524 0.016840417 0.8292727 33
## [51] 0.1129032 0.016573109 1.1627556 28
## [52] 0.1048387 0.016573109 0.8586790 26
## [53] 0.1169355 0.016573109 0.7405089 29
## [54] 0.1601562 0.017107725 1.0142100 41
## [55] 0.1274900 0.016773590 1.0442041 32
## [56] 0.1155378 0.016773590 0.7316582 29
## [57] 0.1396226 0.017709169 0.8841783 37
## [58] 0.1343284 0.017909650 0.8506515 36
## [59] 0.1407942 0.018511093 0.8915974 39
## [60] 0.1032028 0.018778401 0.9382305 29
## [61] 0.1174377 0.018778401 0.9618709 33
## [62] 0.1032028 0.018778401 0.6535452 29
## [63] 0.1452145 0.020248597 0.9195895 44
## [64] 0.1000000 0.021384657 0.8190476 32
## [65] 0.1187500 0.021384657 0.7519996 38
## [66] 0.1226994 0.021785619 1.0049664 40
## [67] 0.1042945 0.021785619 0.6604581 34
## [68] 0.1009174 0.021852446 0.8265618 33
## [69] 0.1406728 0.021852446 0.8908284 46
## [70] 0.1019830 0.023589949 0.6458204 36
## [71] 0.1087613 0.022119754 0.6887450 36
## [72] 0.1214689 0.023656776 0.7692175 43
## [73] 0.1189802 0.023589949 1.0816642 42
## [74] 0.1246459 0.023589949 0.7893361 44
## [75] 0.1086351 0.023990912 0.8897732 39
## [76] 0.1309192 0.023990912 0.8290627 47
## [77] 0.1031175 0.027866881 0.9374547 43
## [78] 0.1223022 0.027866881 0.7744941 51
## [79] 0.1121718 0.028000535 0.9187408 47
## [80] 0.1360382 0.028000535 0.8614792 57
## [81] 0.1205074 0.031609195 0.7631285 57
## [82] 0.1265560 0.032210639 0.8014322 61
## [83] 0.1377953 0.033948142 0.8726062 70
## [84] 0.1100196 0.034014969 1.0002029 56
## [85] 0.1296660 0.034014969 0.8211266 66
## [86] 0.1051587 0.033680834 0.8613001 53
## [87] 0.1230159 0.033680834 0.7790138 62
## [88] 0.1328273 0.035217856 0.8411460 70
## [89] 0.1063063 0.037089014 0.8706993 59
## [90] 0.1351351 0.037089014 0.8557605 75
## [91] 0.1423423 0.037089014 0.9014011 79
## [92] 0.1190053 0.037623630 0.7536165 67
## [93] 0.1443299 0.038893344 0.9139875 84
## [94] 0.1362832 0.037757284 1.1162242 77
## [95] 0.1398230 0.037757284 0.8854471 79
## [96] 0.1055046 0.043704892 0.6681213 69
## [97] 0.1032448 0.045308741 0.8456244 70
## [98] 0.1578171 0.045308741 0.9993970 107
## [99] 0.1282051 0.046912590 0.8118754 90
## [100] 0.1039326 0.047580861 0.8512574 74
## [101] 0.1334270 0.047580861 0.8449433 95
## [102] 0.1008174 0.049051056 0.9165444 74
## [103] 0.1008174 0.049051056 0.8257428 74
## [104] 0.1348774 0.049051056 0.8541283 99
## [105] 0.1253230 0.051724138 0.7936239 97
## [106] 0.1345912 0.053127506 0.8523160 107
## [107] 0.1178414 0.060678963 0.7462458 107
## [108] 0.1483942 0.060344828 0.9397255 134
## [109] 0.1085495 0.069566961 0.6874034 113
## [110] 0.1213018 0.067762630 0.7681590 123
## [111] 0.1299611 0.085872761 0.8229952 167
## [112] 0.1197522 0.097099706 0.7583464 174
## [113] 0.1269745 0.109997327 0.8040822 209
## [114] 0.1215107 0.122093023 0.7694819 222
## [115] 0.2558140 0.005747126 1.6199746 22
## [116] 0.1641791 0.008954825 1.9118880 22
## [117] 0.1317365 0.011160118 2.1830624 22
## [118] 0.1797753 0.005947608 1.1384500 16
## [119] 0.1194030 0.008954825 1.2296946 16
## [120] 0.2125000 0.005346164 1.3456835 17
## [121] 0.1268657 0.008954825 1.1533523 17
## [122] 0.1709402 0.007818765 1.0825005 20
## [123] 0.1197605 0.011160118 1.0887581 20
## [124] 0.1404959 0.008086073 0.8897081 17
## [125] 0.1017964 0.011160118 0.8337610 17
## [126] 0.1404959 0.008086073 1.1507281 17
## [127] 0.1172414 0.009689922 1.0658566 17
## [128] 0.1075949 0.010558674 1.1080872 17
Hence, there are no redundant rules now.
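As a quick sanity check on the quality columns printed above, the measures are tied together by simple identities: confidence is support divided by coverage (the support of the left-hand side), lift is confidence divided by the support of the right-hand side, and count is support times the number of transactions. The sketch below, in base R, checks these for rule [115] using values copied from the output. The variable names are illustrative, and the transaction total it backs out is an inference from the rounded figures, not a value reported by the notebook.

```r
# Values copied from rule [115] {sausage,yogurt} => {whole milk} in the output above
support    <- 0.001470195
confidence <- 0.2558140
coverage   <- 0.005747126
lift       <- 1.6199746
count      <- 22

# confidence = support(lhs and rhs) / support(lhs); coverage is support(lhs)
conf_check <- support / coverage

# lift = confidence / support(rhs), so the support of {whole milk} can be backed out
rhs_support <- confidence / lift

# count = support * number of transactions, so the total can be backed out
# (approximate, because the printed support is rounded)
n_transactions <- round(count / support)

print(c(conf_check = conf_check,
        rhs_support = rhs_support,
        n_transactions = n_transactions))
```

If the recomputed confidence matches the printed one to several decimals, the columns are internally consistent. A lift above 1 then means the right-hand side appears more often in baskets containing the left-hand side than it does overall.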
Graph-based visualization offers a very clear representation of rules, but the plots easily become cluttered and are therefore only viable for very small rule sets.
plot(rules.pruned, method = "graph")
## Warning: Too many rules supplied. Only plotting the best 100 rules using lift
## (change control parameter max if needed)
For the following plots we select the 10 rules with the highest lift.
subrules2 <- head(rules.pruned, n = 10, by = "lift")
inspect(subrules2)
## lhs rhs support confidence
## [1] {whole milk,yogurt} => {sausage} 0.001470195 0.1317365
## [2] {sausage,whole milk} => {yogurt} 0.001470195 0.1641791
## [3] {sausage,yogurt} => {whole milk} 0.001470195 0.2558140
## [4] {flour} => {tropical fruit} 0.001069233 0.1095890
## [5] {processed cheese} => {root vegetables} 0.001069233 0.1052632
## [6] {soft cheese} => {yogurt} 0.001269714 0.1266667
## [7] {detergent} => {yogurt} 0.001069233 0.1240310
## [8] {chewing gum} => {yogurt} 0.001403368 0.1166667
## [9] {rolls/buns,sausage} => {whole milk} 0.001136060 0.2125000
## [10] {processed cheese} => {rolls/buns} 0.001470195 0.1447368
## coverage lift count
## [1] 0.011160118 2.183062 22
## [2] 0.008954825 1.911888 22
## [3] 0.005747126 1.619975 22
## [4] 0.009756750 1.617249 16
## [5] 0.010157712 1.513120 16
## [6] 0.010024058 1.475051 19
## [7] 0.008620690 1.444358 16
## [8] 0.012028869 1.358599 21
## [9] 0.005346164 1.345683 17
## [10] 0.010157712 1.315821 22
plot(subrules2, method = "graph", engine="igraph")
## Conditional rules: keep rules with confidence > 0.12 and lift > 1.2
subrules3 <- rules.pruned[quality(rules.pruned)$confidence > 0.12 & quality(rules.pruned)$lift > 1.2]
subrules3 <- head(subrules3, n = 10, by = c("lift","confidence"))
inspect(subrules3)
## lhs rhs support confidence
## [1] {whole milk,yogurt} => {sausage} 0.001470195 0.1317365
## [2] {sausage,whole milk} => {yogurt} 0.001470195 0.1641791
## [3] {sausage,yogurt} => {whole milk} 0.001470195 0.2558140
## [4] {soft cheese} => {yogurt} 0.001269714 0.1266667
## [5] {detergent} => {yogurt} 0.001069233 0.1240310
## [6] {rolls/buns,sausage} => {whole milk} 0.001136060 0.2125000
## [7] {processed cheese} => {rolls/buns} 0.001470195 0.1447368
## [8] {packaged fruit/vegetables} => {rolls/buns} 0.001202887 0.1417323
## [9] {seasonal products} => {rolls/buns} 0.001002406 0.1415094
## [10] {oil} => {soda} 0.001804330 0.1210762
## coverage lift count
## [1] 0.011160118 2.183062 22
## [2] 0.008954825 1.911888 22
## [3] 0.005747126 1.619975 22
## [4] 0.010024058 1.475051 19
## [5] 0.008620690 1.444358 16
## [6] 0.005346164 1.345683 17
## [7] 0.010157712 1.315821 22
## [8] 0.008487036 1.288507 18
## [9] 0.007083667 1.286481 15
## [10] 0.014902433 1.246927 27
plot(subrules3, method = "graph",engine="igraph")
plot(subrules3, method = "paracoord", control=list(reorder=TRUE))
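The selection logic used for subrules3 (keep rules above both thresholds, then take the top 10 ordered by lift and then confidence) can be illustrated on a plain data frame. This is only a minimal base R sketch on a handful of rules copied by hand from the tables above, with hypothetical names `rules_df`, `kept`, and `top`. It is not a description of how arules implements `head(..., by = ...)` internally.

```r
# A few rules copied from the inspect() output above (confidence and lift only)
rules_df <- data.frame(
  rule = c("{whole milk,yogurt} => {sausage}",
           "{flour} => {tropical fruit}",
           "{sausage,yogurt} => {whole milk}",
           "{chewing gum} => {yogurt}",
           "{soft cheese} => {yogurt}",
           "{oil} => {soda}"),
  confidence = c(0.1317365, 0.1095890, 0.2558140, 0.1166667, 0.1266667, 0.1210762),
  lift       = c(2.183062, 1.617249, 1.619975, 1.358599, 1.475051, 1.246927),
  stringsAsFactors = FALSE
)

# Keep only rules that pass both thresholds
kept <- rules_df[rules_df$confidence > 0.12 & rules_df$lift > 1.2, ]

# Order by lift (descending), using confidence (descending) to break ties
kept <- kept[order(-kept$lift, -kept$confidence), ]

# Take at most the first 10 rows
top <- head(kept, n = 10)
print(top)
```

Rules such as {flour} => {tropical fruit} have a high lift but fall below the confidence threshold, which is why they appear in the top 10 by lift yet drop out of the conditional selection.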
We performed a market basket analysis using association rule mining with the Apriori algorithm. The dataset used in this notebook can be downloaded from
https://www.kaggle.com/heeraldedhia/groceries-dataset.
Based on the mined association rules, the following items have comparatively high lift and confidence scores, which suggests a higher affinity and a greater tendency to be purchased together.