Association Rule Mining

Identify the association rules for Market Basket Analysis

library(data.table)
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
library(gridExtra)
library(ggplot2)
## Loading & inspecting the data as a data.table
DT <- fread("~/Groceries_dataset.csv")
DT[1:10]
## Let's have a look at the structure of the data table.
str(DT)
## Classes 'data.table' and 'data.frame':   38765 obs. of  3 variables:
##  $ Member_number  : int  1808 2552 2300 1187 3037 4941 4501 3803 2762 4119 ...
##  $ Date           : chr  "21-07-2015" "05-01-2015" "19-09-2015" "12-12-2015" ...
##  $ itemDescription: chr  "tropical fruit" "whole milk" "pip fruit" "other vegetables" ...
##  - attr(*, ".internal.selfref")=<externalptr>

There are 3 variables (columns) and a total of 38,765 observations (rows) in the data table.

## Let's see the summary of data table
summary(DT)
##  Member_number      Date           itemDescription   
##  Min.   :1000   Length:38765       Length:38765      
##  1st Qu.:2002   Class :character   Class :character  
##  Median :3005   Mode  :character   Mode  :character  
##  Mean   :3004                                        
##  3rd Qu.:4007                                        
##  Max.   :5000

There are 38,765 rows and three columns in the data table. The Date column is stored as character rather than as a date, and itemDescription is categorical but is also stored as character. Let’s convert both columns to proper data types.

## Converting the `Date` column to a Date object and `itemDescription` to a factor.
DT$Date <- as.Date(DT$Date, format="%d-%m-%Y")
DT$itemDescription <- as.factor(DT$itemDescription)
DT[1:10]
library(DataExplorer)
introduce(DT)
plot_intro(DT)

This shows that there are no missing observations in any of the three columns of the data table.
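As a cross-check that does not depend on DataExplorer, missing values and failed date parses can be counted in base R. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the grocery data) and shows that a date string which does not match `%d-%m-%Y` becomes `NA`, which `colSums(is.na(...))` then surfaces.

```r
# Hypothetical toy data: the third date deliberately uses a different format.
toy <- data.frame(
  Member_number = c(1808L, 2552L, 2300L),
  Date = c("21-07-2015", "05-01-2015", "2015-09-19"),
  stringsAsFactors = FALSE
)

# Parse with the same format string used above; mismatches turn into NA.
toy$Date <- as.Date(toy$Date, format = "%d-%m-%Y")

# Count missing values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Running the same `colSums(is.na(DT))` call on the real table after the conversion would flag any date that failed to parse.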

## Sort the data table by Member Number & Date
setkey(DT,"Member_number", "Date")
DT[1:10]
## Merging all the items purchased by a member on a specific date to one row.
itemList <- DT[, .(itemList= paste(itemDescription, collapse=",")), by=list(Member_number,Date)]
itemList[1:10]
## Remove Member_number & Date Columns
itemList <- itemList[,c("itemList"), with=FALSE]
itemList[1:10]
write.csv(itemList,"ItemList.csv", quote = FALSE, row.names = TRUE)
itemList[1:10]
## Creating transactions object in basket format
trans <- read.transactions(file="ItemList.csv", rm.duplicates=TRUE, format="basket", sep=",", cols=1)
## distribution of transactions with duplicates:
## items
##   1   2   3   4 
## 662  39   5   1
## Removing quotes from the item labels
trans@itemInfo$labels <- gsub("\"","",trans@itemInfo$labels)
summary(trans)
## transactions as itemMatrix in sparse format with
##  14964 rows (elements/itemsets/transactions) and
##  168 columns (items) and a density of 0.01511843 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2363             1827             1646             1453 
##           yogurt          (Other) 
##             1285            29433 
## 
## element (itemset/transaction) length distribution:
## sizes
##     1     2     3     4     5     6     7     8     9    10 
##   206 10012  2727  1273   338   179   113    96    19     1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00    2.00    2.54    3.00   10.00 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
## 
## includes extended transaction information - examples:
##   transactionID
## 1              
## 2             1
## 3             2
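The transactionID examples above start with an empty ID, which suggests that the header line of ItemList.csv may have been read as a transaction of its own. One way to sidestep the CSV round trip is to coerce a list of baskets to a transactions object in memory. The following is a minimal sketch on a hypothetical toy table (`toy`, `baskets` and `toy_trans` are illustrative names, not part of the analysis above); duplicates within a basket are dropped with `unique()`, mirroring `rm.duplicates = TRUE`.

```r
library(arules)

# Hypothetical toy purchases: member 2 buys whole milk twice on the same day.
toy <- data.frame(
  Member_number = c(1, 1, 1, 2, 2),
  Date = as.Date(c("2015-01-05", "2015-01-05", "2015-02-01",
                   "2015-01-05", "2015-01-05")),
  itemDescription = c("whole milk", "yogurt", "soda",
                      "whole milk", "whole milk"),
  stringsAsFactors = FALSE
)

# One basket per member and date; drop = TRUE skips empty combinations.
baskets <- split(toy$itemDescription,
                 list(toy$Member_number, toy$Date),
                 drop = TRUE)

# Remove repeated items inside a basket before coercion.
baskets <- lapply(baskets, unique)

# Coerce the list of character vectors to a transactions object.
toy_trans <- as(baskets, "transactions")
summary(toy_trans)
```

Because no file is written, there is no header line or row-name column that could be mistaken for an item or a transaction.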

Apriori Algorithm

The Apriori algorithm mines association rules that meet minimum support and confidence thresholds from transaction data. For each rule it reports the support, confidence and lift. These three measures can be used to judge the relative strength of the rules.

Let’s consider the rule A => B in order to compute these metrics.

Support is the ratio of the number of transactions containing both A and B to the total number of transactions.

\[Support = {P(A \cap B)}\]

Confidence is the ratio of the number of transactions containing both A and B to the number of transactions containing A.

\[Confidence = \frac {P(A\cap B)}{P(A)}\]

Expected Confidence is the ratio of the number of transactions containing B to the total number of transactions.

\[Expected \ Confidence = {P(B)}\]

And finally, Lift is the ratio of Confidence to Expected Confidence.

\[Lift = \frac {P(A \cap B)}{P(A) \cdot P(B)}\]

Lift is the factor by which the observed co-occurrence of A and B exceeds the co-occurrence expected if A and B were independent. A lift above 1 means A and B occur together more often than independence would predict; a lift below 1 means they occur together less often.
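To make the three definitions concrete, here is a small base R sketch that computes them by hand for the rule A => B. The five baskets are made up for illustration, and `toy_baskets` and `prob()` are hypothetical helper names rather than arules functions.

```r
# Hypothetical toy baskets, each a character vector of items.
toy_baskets <- list(
  c("A", "B"),
  c("A", "B", "C"),
  c("A"),
  c("B"),
  c("C")
)

# Fraction of baskets that contain every item in `items`.
prob <- function(items) {
  mean(vapply(toy_baskets, function(b) all(items %in% b), logical(1)))
}

support    <- prob(c("A", "B"))      # P(A and B)
confidence <- support / prob("A")    # P(A and B) / P(A)
expected   <- prob("B")              # P(B), the expected confidence
lift       <- confidence / expected  # P(A and B) / (P(A) * P(B))

print(c(support = support, confidence = confidence, lift = lift))
```

Here A and B each appear in three of five baskets and together in two, so the lift is slightly above 1.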

frequentItems <- eclat(trans, parameter = list(supp = 0.05, maxlen = 15)) # calculates support for frequent items
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.05      1     15 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 748 
## 
## create itemset ... 
## set transactions ...[168 item(s), 14964 transaction(s)] done [0.01s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating sparse bit matrix ... [11 row(s), 14964 column(s)] done [0.00s].
## writing  ... [11 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(head(frequentItems,10))
##      items              support    transIdenticalToItemsets count
## [1]  {whole milk}       0.15791232 2363                     2363 
## [2]  {other vegetables} 0.12209302 1827                     1827 
## [3]  {rolls/buns}       0.10999733 1646                     1646 
## [4]  {soda}             0.09709971 1453                     1453 
## [5]  {yogurt}           0.08587276 1285                     1285 
## [6]  {tropical fruit}   0.06776263 1014                     1014 
## [7]  {root vegetables}  0.06956696 1041                     1041 
## [8]  {sausage}          0.06034483  903                      903 
## [9]  {bottled water}    0.06067896  908                      908 
## [10] {citrus fruit}     0.05312751  795                      795
## Plot the ten most frequent items
itemFrequencyPlot(trans, topN=10, type="absolute", main="Item Frequency")
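The support column reported by eclat is each item's count divided by the number of transactions. A quick base R check, using counts copied by hand from the output above (`n_trans` and `item_counts` are illustrative names):

```r
# Values copied from the eclat output above.
n_trans <- 14964
item_counts <- c("whole milk" = 2363,
                 "other vegetables" = 1827,
                 "rolls/buns" = 1646)

# Relative support = absolute count / number of transactions.
item_support <- item_counts / n_trans
print(round(item_support, 8))
```

The results agree with the support values printed by `inspect()` for the same items.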

## Finding Association Rules
rules <- apriori(trans)
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 1496 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[168 item(s), 14964 transaction(s)] done [0.01s].
## sorting and recoding items ... [3 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules
## set of 0 rules

With the default parameters (minimum support of 0.1 and minimum confidence of 0.8), the algorithm returns no rules.

We need to fine-tune the parameters to get some association rules.

# Support and confidence values
supportLevels <- c(0.05, 0.01, 0.005,0.001)
confidenceLevels <- c(0.5,0.45,0.4,0.35,0.3,0.25,0.2,0.15,0.1)

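# Pre-allocate one rule counter per confidence level
rules_sup5 <- integer(length(confidenceLevels))
rules_sup1 <- integer(length(confidenceLevels))
rules_sup0.5 <- integer(length(confidenceLevels))
rules_sup0.1 <- integer(length(confidenceLevels))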

# Apriori algorithm with a support level of 5%
for (i in 1:length(confidenceLevels)) {
  
  rules_sup5[i] <- length(apriori(trans, parameter=list(sup=supportLevels[1], 
                                   conf=confidenceLevels[i], target="rules")))
  
}

# Apriori algorithm with a support level of 1%
for (i in 1:length(confidenceLevels)){
  
  rules_sup1[i] <- length(apriori(trans, parameter=list(sup=supportLevels[2], 
                                  conf=confidenceLevels[i], target="rules")))
  
}

# Apriori algorithm with a support level of 0.5%
for (i in 1:length(confidenceLevels)){
  
  rules_sup0.5[i] <- length(apriori(trans, parameter=list(sup=supportLevels[3], 
                                  conf=confidenceLevels[i], target="rules")))
  
}

# Apriori algorithm with a support level of 0.1%
for (i in 1:length(confidenceLevels)){
  
  rules_sup0.1[i] <- length(apriori(trans, parameter=list(sup=supportLevels[4], 
                                    conf=confidenceLevels[i], target="rules")))
  
}
# Collect the rule counts in a data.table for plotting
num_rules <- data.table(rules_sup5, rules_sup1, rules_sup0.5, rules_sup0.1, confidenceLevels)

ggplot(data=num_rules, aes(x=confidenceLevels)) +
  
  # Plot line and points (support level of 5%)
  geom_line(aes(y=rules_sup5, colour="Support level of 5%")) + 
  geom_point(aes(y=rules_sup5, colour="Support level of 5%")) +
  
  # Plot line and points (support level of 1%)
  geom_line(aes(y=rules_sup1, colour="Support level of 1%")) +
  geom_point(aes(y=rules_sup1, colour="Support level of 1%")) +
  
  # Plot line and points (support level of 0.5%)
  geom_line(aes(y=rules_sup0.5, colour="Support level of 0.5%")) + 
  geom_point(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  
  # Plot line and points (support level of 0.1%)
  geom_line(aes(y=rules_sup0.1, colour="Support level of 0.1%")) +
  geom_point(aes(y=rules_sup0.1, colour="Support level of 0.1%")) +
  
  # Labs and theme
  labs(x="Confidence levels", y="Number of rules found", 
       title="Apriori algorithm with different support levels") +
  theme_bw() +
  theme(legend.title=element_blank())

After analyzing the graph above:

  • Support level 5%: Almost no rules are found, so this level is of no use to us.
  • Support level 1%: Some rules are found, but only at a very low confidence level of around 3.5%. We need to look at support levels below 1% to get rules with a reasonable level of confidence.
  • Support level 0.5%: Almost 25 rules are found at a confidence level of 10%.
  • Support level 0.1%: Almost 120 rules are found at a confidence level of 10%.

We will consider a support level of 0.1% and a confidence level of 10%.
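Rule counts like these can also be tabulated directly instead of being read off a plot. The sketch below is one compact way to run the same kind of sweep. To stay self-contained it assumes the `Groceries` dataset bundled with arules as a stand-in, which is not the CSV used in this analysis, so its counts will differ; `count_rules` and `rule_counts` are hypothetical names.

```r
library(arules)
data("Groceries")  # stand-in data bundled with arules, not the CSV above

supportLevels    <- c(0.05, 0.01, 0.005, 0.001)
confidenceLevels <- c(0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1)

# Number of rules found for one (support, confidence) pair.
count_rules <- function(supp, conf) {
  length(apriori(Groceries,
                 parameter = list(support = supp, confidence = conf,
                                  target = "rules"),
                 control = list(verbose = FALSE)))
}

# Rows are support levels, columns are confidence levels.
rule_counts <- outer(supportLevels, confidenceLevels, Vectorize(count_rules))
dimnames(rule_counts) <- list(support = as.character(supportLevels),
                              confidence = as.character(confidenceLevels))
print(rule_counts)
```

Replacing `Groceries` with `trans` would produce the table for the data analysed here, and a single matrix avoids maintaining four separate loops.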

## Rules with the chosen parameters (support = 0.1%, confidence = 10%, minimum length of 2)
rules <- apriori(trans, parameter=list(minlen=2,
                                        supp=0.001, 
                                        conf=0.1))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5   0.001      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 14 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[168 item(s), 14964 transaction(s)] done [0.01s].
## sorting and recoding items ... [149 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [131 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
summary(rules)
## set of 131 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3 
## 114  17 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00    2.00    2.00    2.13    2.00    3.00 
## 
## summary of quality measures:
##     support           confidence        coverage             lift       
##  Min.   :0.001002   Min.   :0.1000   Min.   :0.005346   Min.   :0.6458  
##  1st Qu.:0.001337   1st Qu.:0.1098   1st Qu.:0.010325   1st Qu.:0.8075  
##  Median :0.001938   Median :0.1215   Median :0.016573   Median :0.8795  
##  Mean   :0.002933   Mean   :0.1257   Mean   :0.023743   Mean   :0.9465  
##  3rd Qu.:0.003776   3rd Qu.:0.1347   3rd Qu.:0.031910   3rd Qu.:1.0320  
##  Max.   :0.014836   Max.   :0.2558   Max.   :0.122093   Max.   :2.1831  
##      count       
##  Min.   : 15.00  
##  1st Qu.: 20.00  
##  Median : 29.00  
##  Mean   : 43.89  
##  3rd Qu.: 56.50  
##  Max.   :222.00  
## 
## mining info:
##   data ntransactions support confidence
##  trans         14964   0.001        0.1
inspect(head(rules,20))
##      lhs                            rhs                support     confidence
## [1]  {frozen fish}               => {whole milk}       0.001069233 0.1568627 
## [2]  {seasonal products}         => {rolls/buns}       0.001002406 0.1415094 
## [3]  {pot plants}                => {other vegetables} 0.001002406 0.1282051 
## [4]  {pot plants}                => {whole milk}       0.001002406 0.1282051 
## [5]  {pasta}                     => {whole milk}       0.001069233 0.1322314 
## [6]  {pickled vegetables}        => {whole milk}       0.001002406 0.1119403 
## [7]  {packaged fruit/vegetables} => {rolls/buns}       0.001202887 0.1417323 
## [8]  {detergent}                 => {yogurt}           0.001069233 0.1240310 
## [9]  {detergent}                 => {rolls/buns}       0.001002406 0.1162791 
## [10] {detergent}                 => {whole milk}       0.001403368 0.1627907 
## [11] {semi-finished bread}       => {other vegetables} 0.001002406 0.1056338 
## [12] {semi-finished bread}       => {whole milk}       0.001670676 0.1760563 
## [13] {red/blush wine}            => {rolls/buns}       0.001336541 0.1273885 
## [14] {red/blush wine}            => {other vegetables} 0.001136060 0.1082803 
## [15] {flour}                     => {tropical fruit}   0.001069233 0.1095890 
## [16] {flour}                     => {whole milk}       0.001336541 0.1369863 
## [17] {herbs}                     => {yogurt}           0.001136060 0.1075949 
## [18] {herbs}                     => {whole milk}       0.001136060 0.1075949 
## [19] {processed cheese}          => {root vegetables}  0.001069233 0.1052632 
## [20] {processed cheese}          => {rolls/buns}       0.001470195 0.1447368 
##      coverage    lift      count
## [1]  0.006816359 0.9933534 16   
## [2]  0.007083667 1.2864807 15   
## [3]  0.007818765 1.0500611 15   
## [4]  0.007818765 0.8118754 15   
## [5]  0.008086073 0.8373723 16   
## [6]  0.008954825 0.7088763 15   
## [7]  0.008487036 1.2885066 18   
## [8]  0.008620690 1.4443580 16   
## [9]  0.008620690 1.0571081 15   
## [10] 0.008620690 1.0308929 21   
## [11] 0.009489441 0.8651911 15   
## [12] 0.009489441 1.1148993 25   
## [13] 0.010491847 1.1581057 20   
## [14] 0.010491847 0.8868668 17   
## [15] 0.009756750 1.6172489 16   
## [16] 0.009756750 0.8674833 20   
## [17] 0.010558674 1.2529577 17   
## [18] 0.010558674 0.6813587 17   
## [19] 0.010157712 1.5131200 16   
## [20] 0.010157712 1.3158214 22
### Visualizing the rules using two-key scatter plot
plot(rules,jitter=2, method = "two-key plot")

The plot above shows a strong inverse relationship between order (the number of items in a rule) and support: the longer rules have the lowest support.

plot(rules, measure=c("support", "lift"), shading = "confidence", engine="ggplot2")

## Rules sorted by confidence.
rules_conf <- sort(rules, by="confidence", decreasing=TRUE)
inspect(head(rules_conf,20))
##      lhs                            rhs          support     confidence
## [1]  {sausage,yogurt}            => {whole milk} 0.001470195 0.2558140 
## [2]  {rolls/buns,sausage}        => {whole milk} 0.001136060 0.2125000 
## [3]  {sausage,soda}              => {whole milk} 0.001069233 0.1797753 
## [4]  {semi-finished bread}       => {whole milk} 0.001670676 0.1760563 
## [5]  {rolls/buns,yogurt}         => {whole milk} 0.001336541 0.1709402 
## [6]  {sausage,whole milk}        => {yogurt}     0.001470195 0.1641791 
## [7]  {detergent}                 => {whole milk} 0.001403368 0.1627907 
## [8]  {ham}                       => {whole milk} 0.002739909 0.1601562 
## [9]  {bottled beer}              => {whole milk} 0.007150495 0.1578171 
## [10] {frozen fish}               => {whole milk} 0.001069233 0.1568627 
## [11] {candy}                     => {whole milk} 0.002138466 0.1488372 
## [12] {sausage}                   => {whole milk} 0.008954825 0.1483942 
## [13] {onions}                    => {whole milk} 0.002940390 0.1452145 
## [14] {processed cheese}          => {rolls/buns} 0.001470195 0.1447368 
## [15] {processed cheese}          => {whole milk} 0.001470195 0.1447368 
## [16] {newspapers}                => {whole milk} 0.005613472 0.1443299 
## [17] {domestic eggs}             => {whole milk} 0.005279337 0.1423423 
## [18] {packaged fruit/vegetables} => {rolls/buns} 0.001202887 0.1417323 
## [19] {seasonal products}         => {rolls/buns} 0.001002406 0.1415094 
## [20] {cat food}                  => {whole milk} 0.001670676 0.1412429 
##      coverage    lift      count
## [1]  0.005747126 1.6199746  22  
## [2]  0.005346164 1.3456835  17  
## [3]  0.005947608 1.1384500  16  
## [4]  0.009489441 1.1148993  25  
## [5]  0.007818765 1.0825005  20  
## [6]  0.008954825 1.9118880  22  
## [7]  0.008620690 1.0308929  21  
## [8]  0.017107725 1.0142100  41  
## [9]  0.045308741 0.9993970 107  
## [10] 0.006816359 0.9933534  16  
## [11] 0.014367816 0.9425307  32  
## [12] 0.060344828 0.9397255 134  
## [13] 0.020248597 0.9195895  44  
## [14] 0.010157712 1.3158214  22  
## [15] 0.010157712 0.9165646  22  
## [16] 0.038893344 0.9139875  84  
## [17] 0.037089014 0.9014011  79  
## [18] 0.008487036 1.2885066  18  
## [19] 0.007083667 1.2864807  15  
## [20] 0.011828388 0.8944390  25
## Rules sorted by highest lift.
rules_lift <- sort(rules, by="lift", decreasing=TRUE)
inspect(head(rules_lift,20))
##      lhs                            rhs                support     confidence
## [1]  {whole milk,yogurt}         => {sausage}          0.001470195 0.1317365 
## [2]  {sausage,whole milk}        => {yogurt}           0.001470195 0.1641791 
## [3]  {sausage,yogurt}            => {whole milk}       0.001470195 0.2558140 
## [4]  {flour}                     => {tropical fruit}   0.001069233 0.1095890 
## [5]  {processed cheese}          => {root vegetables}  0.001069233 0.1052632 
## [6]  {soft cheese}               => {yogurt}           0.001269714 0.1266667 
## [7]  {detergent}                 => {yogurt}           0.001069233 0.1240310 
## [8]  {chewing gum}               => {yogurt}           0.001403368 0.1166667 
## [9]  {rolls/buns,sausage}        => {whole milk}       0.001136060 0.2125000 
## [10] {processed cheese}          => {rolls/buns}       0.001470195 0.1447368 
## [11] {packaged fruit/vegetables} => {rolls/buns}       0.001202887 0.1417323 
## [12] {seasonal products}         => {rolls/buns}       0.001002406 0.1415094 
## [13] {herbs}                     => {yogurt}           0.001136060 0.1075949 
## [14] {oil}                       => {soda}             0.001804330 0.1210762 
## [15] {sausage,whole milk}        => {soda}             0.001069233 0.1194030 
## [16] {beverages}                 => {soda}             0.001871157 0.1129032 
## [17] {red/blush wine}            => {rolls/buns}       0.001336541 0.1273885 
## [18] {sausage,whole milk}        => {rolls/buns}       0.001136060 0.1268657 
## [19] {rolls/buns,soda}           => {other vegetables} 0.001136060 0.1404959 
## [20] {sausage,soda}              => {whole milk}       0.001069233 0.1797753 
##      coverage    lift     count
## [1]  0.011160118 2.183062 22   
## [2]  0.008954825 1.911888 22   
## [3]  0.005747126 1.619975 22   
## [4]  0.009756750 1.617249 16   
## [5]  0.010157712 1.513120 16   
## [6]  0.010024058 1.475051 19   
## [7]  0.008620690 1.444358 16   
## [8]  0.012028869 1.358599 21   
## [9]  0.005346164 1.345683 17   
## [10] 0.010157712 1.315821 22   
## [11] 0.008487036 1.288507 18   
## [12] 0.007083667 1.286481 15   
## [13] 0.010558674 1.252958 17   
## [14] 0.014902433 1.246927 27   
## [15] 0.008954825 1.229695 16   
## [16] 0.016573109 1.162756 28   
## [17] 0.010491847 1.158106 20   
## [18] 0.008954825 1.153352 17   
## [19] 0.008086073 1.150728 17   
## [20] 0.005947608 1.138450 16

The rule {whole milk,yogurt} => {sausage} shows the strongest association: it has the highest lift (about 2.18) and a confidence (about 0.13) above the median, although it is supported by only 22 transactions.

Hence a customer who has purchased whole milk and yogurt is roughly twice as likely to also buy sausage as a customer picked at random.
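The reported quality measures for this rule can be reproduced from counts that appear in the output. In the sketch below the variable names are illustrative: the rule count (22) and the sausage count (903) are copied from the tables above, and the left-hand-side count of 167 is inferred from the rule's coverage (0.011160118 × 14964 ≈ 167).

```r
n_trans    <- 14964  # total number of transactions
count_rule <- 22     # baskets with whole milk, yogurt and sausage
count_lhs  <- 167    # baskets with whole milk and yogurt (from coverage)
count_rhs  <- 903    # baskets with sausage

support    <- count_rule / n_trans           # P(A and B)
confidence <- count_rule / count_lhs         # P(A and B) / P(A)
lift       <- confidence / (count_rhs / n_trans)  # confidence / P(B)

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

The three values match the support, confidence and lift that `inspect()` prints for {whole milk,yogurt} => {sausage}.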

We will reduce the number of rules by filtering out those with low confidence, keeping only rules with a confidence above 0.12 (approximately the median confidence).

subrules <- rules[quality(rules)$confidence > 0.12]
subrules
## set of 71 rules
inspect(subrules)
##      lhs                            rhs                support     confidence
## [1]  {frozen fish}               => {whole milk}       0.001069233 0.1568627 
## [2]  {seasonal products}         => {rolls/buns}       0.001002406 0.1415094 
## [3]  {pot plants}                => {other vegetables} 0.001002406 0.1282051 
## [4]  {pot plants}                => {whole milk}       0.001002406 0.1282051 
## [5]  {pasta}                     => {whole milk}       0.001069233 0.1322314 
## [6]  {packaged fruit/vegetables} => {rolls/buns}       0.001202887 0.1417323 
## [7]  {detergent}                 => {yogurt}           0.001069233 0.1240310 
## [8]  {detergent}                 => {whole milk}       0.001403368 0.1627907 
## [9]  {semi-finished bread}       => {whole milk}       0.001670676 0.1760563 
## [10] {red/blush wine}            => {rolls/buns}       0.001336541 0.1273885 
## [11] {flour}                     => {whole milk}       0.001336541 0.1369863 
## [12] {processed cheese}          => {rolls/buns}       0.001470195 0.1447368 
## [13] {processed cheese}          => {whole milk}       0.001470195 0.1447368 
## [14] {soft cheese}               => {yogurt}           0.001269714 0.1266667 
## [15] {cat food}                  => {whole milk}       0.001670676 0.1412429 
## [16] {chewing gum}               => {whole milk}       0.001670676 0.1388889 
## [17] {hygiene articles}          => {whole milk}       0.001737503 0.1268293 
## [18] {candy}                     => {whole milk}       0.002138466 0.1488372 
## [19] {ice cream}                 => {whole milk}       0.001937984 0.1277533 
## [20] {grapes}                    => {whole milk}       0.001937984 0.1342593 
## [21] {oil}                       => {soda}             0.001804330 0.1210762 
## [22] {oil}                       => {other vegetables} 0.001804330 0.1210762 
## [23] {oil}                       => {whole milk}       0.001937984 0.1300448 
## [24] {hard cheese}               => {whole milk}       0.001871157 0.1272727 
## [25] {meat}                      => {other vegetables} 0.002138466 0.1269841 
## [26] {meat}                      => {whole milk}       0.002205293 0.1309524 
## [27] {ham}                       => {whole milk}       0.002739909 0.1601562 
## [28] {frozen meals}              => {other vegetables} 0.002138466 0.1274900 
## [29] {sugar}                     => {whole milk}       0.002472601 0.1396226 
## [30] {long life bakery product}  => {whole milk}       0.002405774 0.1343284 
## [31] {waffles}                   => {whole milk}       0.002606255 0.1407942 
## [32] {onions}                    => {whole milk}       0.002940390 0.1452145 
## [33] {berries}                   => {other vegetables} 0.002673082 0.1226994 
## [34] {hamburger meat}            => {whole milk}       0.003074044 0.1406728 
## [35] {cream cheese}              => {whole milk}       0.002873563 0.1214689 
## [36] {chocolate}                 => {whole milk}       0.002940390 0.1246459 
## [37] {white bread}               => {whole milk}       0.003140871 0.1309192 
## [38] {chicken}                   => {whole milk}       0.003408180 0.1223022 
## [39] {frozen vegetables}         => {whole milk}       0.003809142 0.1360382 
## [40] {coffee}                    => {whole milk}       0.003809142 0.1205074 
## [41] {margarine}                 => {whole milk}       0.004076450 0.1265560 
## [42] {beef}                      => {whole milk}       0.004677894 0.1377953 
## [43] {fruit/vegetable juice}     => {whole milk}       0.004410585 0.1296660 
## [44] {curd}                      => {whole milk}       0.004143277 0.1230159 
## [45] {butter}                    => {whole milk}       0.004677894 0.1328273 
## [46] {pork}                      => {whole milk}       0.005012029 0.1351351 
## [47] {domestic eggs}             => {whole milk}       0.005279337 0.1423423 
## [48] {newspapers}                => {whole milk}       0.005613472 0.1443299 
## [49] {frankfurter}               => {other vegetables} 0.005145683 0.1362832 
## [50] {frankfurter}               => {whole milk}       0.005279337 0.1398230 
## [51] {bottled beer}              => {whole milk}       0.007150495 0.1578171 
## [52] {canned beer}               => {whole milk}       0.006014435 0.1282051 
## [53] {shopping bags}             => {whole milk}       0.006348570 0.1334270 
## [54] {pip fruit}                 => {whole milk}       0.006615878 0.1348774 
## [55] {pastry}                    => {whole milk}       0.006482224 0.1253230 
## [56] {citrus fruit}              => {whole milk}       0.007150495 0.1345912 
## [57] {sausage}                   => {whole milk}       0.008954825 0.1483942 
## [58] {tropical fruit}            => {whole milk}       0.008219727 0.1213018 
## [59] {yogurt}                    => {whole milk}       0.011160118 0.1299611 
## [60] {rolls/buns}                => {whole milk}       0.013966854 0.1269745 
## [61] {other vegetables}          => {whole milk}       0.014835605 0.1215107 
## [62] {sausage,yogurt}            => {whole milk}       0.001470195 0.2558140 
## [63] {sausage,whole milk}        => {yogurt}           0.001470195 0.1641791 
## [64] {whole milk,yogurt}         => {sausage}          0.001470195 0.1317365 
## [65] {sausage,soda}              => {whole milk}       0.001069233 0.1797753 
## [66] {rolls/buns,sausage}        => {whole milk}       0.001136060 0.2125000 
## [67] {sausage,whole milk}        => {rolls/buns}       0.001136060 0.1268657 
## [68] {rolls/buns,yogurt}         => {whole milk}       0.001336541 0.1709402 
## [69] {other vegetables,yogurt}   => {whole milk}       0.001136060 0.1404959 
## [70] {rolls/buns,soda}           => {other vegetables} 0.001136060 0.1404959 
## [71] {rolls/buns,soda}           => {whole milk}       0.001002406 0.1239669 
##      coverage    lift      count
## [1]  0.006816359 0.9933534  16  
## [2]  0.007083667 1.2864807  15  
## [3]  0.007818765 1.0500611  15  
## [4]  0.007818765 0.8118754  15  
## [5]  0.008086073 0.8373723  16  
## [6]  0.008487036 1.2885066  18  
## [7]  0.008620690 1.4443580  16  
## [8]  0.008620690 1.0308929  21  
## [9]  0.009489441 1.1148993  25  
## [10] 0.010491847 1.1581057  20  
## [11] 0.009756750 0.8674833  20  
## [12] 0.010157712 1.3158214  22  
## [13] 0.010157712 0.9165646  22  
## [14] 0.010024058 1.4750506  19  
## [15] 0.011828388 0.8944390  25  
## [16] 0.012028869 0.8795317  25  
## [17] 0.013699546 0.8031626  26  
## [18] 0.014367816 0.9425307  32  
## [19] 0.015169741 0.8090142  29  
## [20] 0.014434643 0.8502139  29  
## [21] 0.014902433 1.2469269  27  
## [22] 0.014902433 0.9916720  27  
## [23] 0.014902433 0.8235256  29  
## [24] 0.014701951 0.8059708  28  
## [25] 0.016840417 1.0400605  32  
## [26] 0.016840417 0.8292727  33  
## [27] 0.017107725 1.0142100  41  
## [28] 0.016773590 1.0442041  32  
## [29] 0.017709169 0.8841783  37  
## [30] 0.017909650 0.8506515  36  
## [31] 0.018511093 0.8915974  39  
## [32] 0.020248597 0.9195895  44  
## [33] 0.021785619 1.0049664  40  
## [34] 0.021852446 0.8908284  46  
## [35] 0.023656776 0.7692175  43  
## [36] 0.023589949 0.7893361  44  
## [37] 0.023990912 0.8290627  47  
## [38] 0.027866881 0.7744941  51  
## [39] 0.028000535 0.8614792  57  
## [40] 0.031609195 0.7631285  57  
## [41] 0.032210639 0.8014322  61  
## [42] 0.033948142 0.8726062  70  
## [43] 0.034014969 0.8211266  66  
## [44] 0.033680834 0.7790138  62  
## [45] 0.035217856 0.8411460  70  
## [46] 0.037089014 0.8557605  75  
## [47] 0.037089014 0.9014011  79  
## [48] 0.038893344 0.9139875  84  
## [49] 0.037757284 1.1162242  77  
## [50] 0.037757284 0.8854471  79  
## [51] 0.045308741 0.9993970 107  
## [52] 0.046912590 0.8118754  90  
## [53] 0.047580861 0.8449433  95  
## [54] 0.049051056 0.8541283  99  
## [55] 0.051724138 0.7936239  97  
## [56] 0.053127506 0.8523160 107  
## [57] 0.060344828 0.9397255 134  
## [58] 0.067762630 0.7681590 123  
## [59] 0.085872761 0.8229952 167  
## [60] 0.109997327 0.8040822 209  
## [61] 0.122093023 0.7694819 222  
## [62] 0.005747126 1.6199746  22  
## [63] 0.008954825 1.9118880  22  
## [64] 0.011160118 2.1830624  22  
## [65] 0.005947608 1.1384500  16  
## [66] 0.005346164 1.3456835  17  
## [67] 0.008954825 1.1533523  17  
## [68] 0.007818765 1.0825005  20  
## [69] 0.008086073 0.8897081  17  
## [70] 0.008086073 1.1507281  17  
## [71] 0.008086073 0.7850365  15
plot(subrules, method = "grouped", control = list(k = 50),engine="grid")

This clearly shows a strong association between {whole milk, yogurt} and {sausage}.
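Many of the rules kept by the confidence filter still have a lift below 1 (for example, {other vegetables} => {whole milk} at about 0.77), meaning the items co-occur less often than independence would predict. A possible refinement, sketched here under the assumption that only positively associated rules are of interest, is to filter on lift as well using `subset()`. The example runs on the `Groceries` data bundled with arules as a stand-in, and `demo_rules` and `positive_rules` are hypothetical names.

```r
library(arules)
data("Groceries")  # stand-in data bundled with arules, not the CSV above

# Mine rules on the stand-in data (thresholds chosen only for illustration).
demo_rules <- apriori(Groceries,
                      parameter = list(support = 0.01, confidence = 0.1,
                                       minlen = 2),
                      control = list(verbose = FALSE))

# Keep rules that clear the confidence cut-off and have lift above 1.
positive_rules <- subset(demo_rules, subset = confidence > 0.12 & lift > 1)

inspect(head(sort(positive_rules, by = "lift"), 5))
```

Applying the same `subset()` call to `rules` would keep only the rules whose left- and right-hand sides appear together more often than expected by chance.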

## Removing redundant rules
redundant <- is.redundant(rules, measure="lift")
which(redundant)
## [1] 129 130 131
rules.pruned <- rules[!redundant]
inspect(rules.pruned)
##       lhs                              rhs                support    
## [1]   {frozen fish}                 => {whole milk}       0.001069233
## [2]   {seasonal products}           => {rolls/buns}       0.001002406
## [3]   {pot plants}                  => {other vegetables} 0.001002406
## [4]   {pot plants}                  => {whole milk}       0.001002406
## [5]   {pasta}                       => {whole milk}       0.001069233
## [6]   {pickled vegetables}          => {whole milk}       0.001002406
## [7]   {packaged fruit/vegetables}   => {rolls/buns}       0.001202887
## [8]   {detergent}                   => {yogurt}           0.001069233
## [9]   {detergent}                   => {rolls/buns}       0.001002406
## [10]  {detergent}                   => {whole milk}       0.001403368
## [11]  {semi-finished bread}         => {other vegetables} 0.001002406
## [12]  {semi-finished bread}         => {whole milk}       0.001670676
## [13]  {red/blush wine}              => {rolls/buns}       0.001336541
## [14]  {red/blush wine}              => {other vegetables} 0.001136060
## [15]  {flour}                       => {tropical fruit}   0.001069233
## [16]  {flour}                       => {whole milk}       0.001336541
## [17]  {herbs}                       => {yogurt}           0.001136060
## [18]  {herbs}                       => {whole milk}       0.001136060
## [19]  {processed cheese}            => {root vegetables}  0.001069233
## [20]  {processed cheese}            => {rolls/buns}       0.001470195
## [21]  {processed cheese}            => {whole milk}       0.001470195
## [22]  {soft cheese}                 => {yogurt}           0.001269714
## [23]  {soft cheese}                 => {rolls/buns}       0.001002406
## [24]  {soft cheese}                 => {other vegetables} 0.001202887
## [25]  {soft cheese}                 => {whole milk}       0.001202887
## [26]  {white wine}                  => {whole milk}       0.001269714
## [27]  {cat food}                    => {whole milk}       0.001670676
## [28]  {chewing gum}                 => {yogurt}           0.001403368
## [29]  {chewing gum}                 => {whole milk}       0.001670676
## [30]  {specialty bar}               => {other vegetables} 0.001670676
## [31]  {specialty bar}               => {whole milk}       0.001670676
## [32]  {hygiene articles}            => {other vegetables} 0.001403368
## [33]  {hygiene articles}            => {whole milk}       0.001737503
## [34]  {candy}                       => {rolls/buns}       0.001470195
## [35]  {candy}                       => {whole milk}       0.002138466
## [36]  {sliced cheese}               => {other vegetables} 0.001403368
## [37]  {sliced cheese}               => {whole milk}       0.001470195
## [38]  {ice cream}                   => {rolls/buns}       0.001737503
## [39]  {ice cream}                   => {whole milk}       0.001937984
## [40]  {grapes}                      => {other vegetables} 0.001603849
## [41]  {grapes}                      => {whole milk}       0.001937984
## [42]  {oil}                         => {soda}             0.001804330
## [43]  {oil}                         => {other vegetables} 0.001804330
## [44]  {oil}                         => {whole milk}       0.001937984
## [45]  {hard cheese}                 => {rolls/buns}       0.001670676
## [46]  {hard cheese}                 => {other vegetables} 0.001670676
## [47]  {hard cheese}                 => {whole milk}       0.001871157
## [48]  {specialty chocolate}         => {other vegetables} 0.001670676
## [49]  {meat}                        => {other vegetables} 0.002138466
## [50]  {meat}                        => {whole milk}       0.002205293
## [51]  {beverages}                   => {soda}             0.001871157
## [52]  {beverages}                   => {other vegetables} 0.001737503
## [53]  {beverages}                   => {whole milk}       0.001937984
## [54]  {ham}                         => {whole milk}       0.002739909
## [55]  {frozen meals}                => {other vegetables} 0.002138466
## [56]  {frozen meals}                => {whole milk}       0.001937984
## [57]  {sugar}                       => {whole milk}       0.002472601
## [58]  {long life bakery product}    => {whole milk}       0.002405774
## [59]  {waffles}                     => {whole milk}       0.002606255
## [60]  {salty snack}                 => {rolls/buns}       0.001937984
## [61]  {salty snack}                 => {other vegetables} 0.002205293
## [62]  {salty snack}                 => {whole milk}       0.001937984
## [63]  {onions}                      => {whole milk}       0.002940390
## [64]  {UHT-milk}                    => {other vegetables} 0.002138466
## [65]  {UHT-milk}                    => {whole milk}       0.002539428
## [66]  {berries}                     => {other vegetables} 0.002673082
## [67]  {berries}                     => {whole milk}       0.002272120
## [68]  {hamburger meat}              => {other vegetables} 0.002205293
## [69]  {hamburger meat}              => {whole milk}       0.003074044
## [70]  {dessert}                     => {whole milk}       0.002405774
## [71]  {napkins}                     => {whole milk}       0.002405774
## [72]  {cream cheese}                => {whole milk}       0.002873563
## [73]  {chocolate}                   => {rolls/buns}       0.002806736
## [74]  {chocolate}                   => {whole milk}       0.002940390
## [75]  {white bread}                 => {other vegetables} 0.002606255
## [76]  {white bread}                 => {whole milk}       0.003140871
## [77]  {chicken}                     => {rolls/buns}       0.002873563
## [78]  {chicken}                     => {whole milk}       0.003408180
## [79]  {frozen vegetables}           => {other vegetables} 0.003140871
## [80]  {frozen vegetables}           => {whole milk}       0.003809142
## [81]  {coffee}                      => {whole milk}       0.003809142
## [82]  {margarine}                   => {whole milk}       0.004076450
## [83]  {beef}                        => {whole milk}       0.004677894
## [84]  {fruit/vegetable juice}       => {rolls/buns}       0.003742315
## [85]  {fruit/vegetable juice}       => {whole milk}       0.004410585
## [86]  {curd}                        => {other vegetables} 0.003541834
## [87]  {curd}                        => {whole milk}       0.004143277
## [88]  {butter}                      => {whole milk}       0.004677894
## [89]  {pork}                        => {other vegetables} 0.003942796
## [90]  {pork}                        => {whole milk}       0.005012029
## [91]  {domestic eggs}               => {whole milk}       0.005279337
## [92]  {brown bread}                 => {whole milk}       0.004477412
## [93]  {newspapers}                  => {whole milk}       0.005613472
## [94]  {frankfurter}                 => {other vegetables} 0.005145683
## [95]  {frankfurter}                 => {whole milk}       0.005279337
## [96]  {whipped/sour cream}          => {whole milk}       0.004611067
## [97]  {bottled beer}                => {other vegetables} 0.004677894
## [98]  {bottled beer}                => {whole milk}       0.007150495
## [99]  {canned beer}                 => {whole milk}       0.006014435
## [100] {shopping bags}               => {other vegetables} 0.004945202
## [101] {shopping bags}               => {whole milk}       0.006348570
## [102] {pip fruit}                   => {rolls/buns}       0.004945202
## [103] {pip fruit}                   => {other vegetables} 0.004945202
## [104] {pip fruit}                   => {whole milk}       0.006615878
## [105] {pastry}                      => {whole milk}       0.006482224
## [106] {citrus fruit}                => {whole milk}       0.007150495
## [107] {bottled water}               => {whole milk}       0.007150495
## [108] {sausage}                     => {whole milk}       0.008954825
## [109] {root vegetables}             => {whole milk}       0.007551457
## [110] {tropical fruit}              => {whole milk}       0.008219727
## [111] {yogurt}                      => {whole milk}       0.011160118
## [112] {soda}                        => {whole milk}       0.011627907
## [113] {rolls/buns}                  => {whole milk}       0.013966854
## [114] {other vegetables}            => {whole milk}       0.014835605
## [115] {sausage,yogurt}              => {whole milk}       0.001470195
## [116] {sausage,whole milk}          => {yogurt}           0.001470195
## [117] {whole milk,yogurt}           => {sausage}          0.001470195
## [118] {sausage,soda}                => {whole milk}       0.001069233
## [119] {sausage,whole milk}          => {soda}             0.001069233
## [120] {rolls/buns,sausage}          => {whole milk}       0.001136060
## [121] {sausage,whole milk}          => {rolls/buns}       0.001136060
## [122] {rolls/buns,yogurt}           => {whole milk}       0.001336541
## [123] {whole milk,yogurt}           => {rolls/buns}       0.001336541
## [124] {other vegetables,yogurt}     => {whole milk}       0.001136060
## [125] {whole milk,yogurt}           => {other vegetables} 0.001136060
## [126] {rolls/buns,soda}             => {other vegetables} 0.001136060
## [127] {other vegetables,soda}       => {rolls/buns}       0.001136060
## [128] {other vegetables,rolls/buns} => {soda}             0.001136060
##       confidence coverage    lift      count
## [1]   0.1568627  0.006816359 0.9933534  16  
## [2]   0.1415094  0.007083667 1.2864807  15  
## [3]   0.1282051  0.007818765 1.0500611  15  
## [4]   0.1282051  0.007818765 0.8118754  15  
## [5]   0.1322314  0.008086073 0.8373723  16  
## [6]   0.1119403  0.008954825 0.7088763  15  
## [7]   0.1417323  0.008487036 1.2885066  18  
## [8]   0.1240310  0.008620690 1.4443580  16  
## [9]   0.1162791  0.008620690 1.0571081  15  
## [10]  0.1627907  0.008620690 1.0308929  21  
## [11]  0.1056338  0.009489441 0.8651911  15  
## [12]  0.1760563  0.009489441 1.1148993  25  
## [13]  0.1273885  0.010491847 1.1581057  20  
## [14]  0.1082803  0.010491847 0.8868668  17  
## [15]  0.1095890  0.009756750 1.6172489  16  
## [16]  0.1369863  0.009756750 0.8674833  20  
## [17]  0.1075949  0.010558674 1.2529577  17  
## [18]  0.1075949  0.010558674 0.6813587  17  
## [19]  0.1052632  0.010157712 1.5131200  16  
## [20]  0.1447368  0.010157712 1.3158214  22  
## [21]  0.1447368  0.010157712 0.9165646  22  
## [22]  0.1266667  0.010024058 1.4750506  19  
## [23]  0.1000000  0.010024058 0.9091130  15  
## [24]  0.1200000  0.010024058 0.9828571  18  
## [25]  0.1200000  0.010024058 0.7599154  18  
## [26]  0.1085714  0.011694734 0.6875425  19  
## [27]  0.1412429  0.011828388 0.8944390  25  
## [28]  0.1166667  0.012028869 1.3585992  21  
## [29]  0.1388889  0.012028869 0.8795317  25  
## [30]  0.1196172  0.013966854 0.9797220  25  
## [31]  0.1196172  0.013966854 0.7574914  25  
## [32]  0.1024390  0.013699546 0.8390244  21  
## [33]  0.1268293  0.013699546 0.8031626  26  
## [34]  0.1023256  0.014367816 0.9302552  22  
## [35]  0.1488372  0.014367816 0.9425307  32  
## [36]  0.1000000  0.014033681 0.8190476  21  
## [37]  0.1047619  0.014033681 0.6634182  22  
## [38]  0.1145374  0.015169741 1.0412748  26  
## [39]  0.1277533  0.015169741 0.8090142  29  
## [40]  0.1111111  0.014434643 0.9100529  24  
## [41]  0.1342593  0.014434643 0.8502139  29  
## [42]  0.1210762  0.014902433 1.2469269  27  
## [43]  0.1210762  0.014902433 0.9916720  27  
## [44]  0.1300448  0.014902433 0.8235256  29  
## [45]  0.1136364  0.014701951 1.0330830  25  
## [46]  0.1136364  0.014701951 0.9307359  25  
## [47]  0.1272727  0.014701951 0.8059708  28  
## [48]  0.1046025  0.015971665 0.8567444  25  
## [49]  0.1269841  0.016840417 1.0400605  32  
## [50]  0.1309524  0.016840417 0.8292727  33  
## [51]  0.1129032  0.016573109 1.1627556  28  
## [52]  0.1048387  0.016573109 0.8586790  26  
## [53]  0.1169355  0.016573109 0.7405089  29  
## [54]  0.1601562  0.017107725 1.0142100  41  
## [55]  0.1274900  0.016773590 1.0442041  32  
## [56]  0.1155378  0.016773590 0.7316582  29  
## [57]  0.1396226  0.017709169 0.8841783  37  
## [58]  0.1343284  0.017909650 0.8506515  36  
## [59]  0.1407942  0.018511093 0.8915974  39  
## [60]  0.1032028  0.018778401 0.9382305  29  
## [61]  0.1174377  0.018778401 0.9618709  33  
## [62]  0.1032028  0.018778401 0.6535452  29  
## [63]  0.1452145  0.020248597 0.9195895  44  
## [64]  0.1000000  0.021384657 0.8190476  32  
## [65]  0.1187500  0.021384657 0.7519996  38  
## [66]  0.1226994  0.021785619 1.0049664  40  
## [67]  0.1042945  0.021785619 0.6604581  34  
## [68]  0.1009174  0.021852446 0.8265618  33  
## [69]  0.1406728  0.021852446 0.8908284  46  
## [70]  0.1019830  0.023589949 0.6458204  36  
## [71]  0.1087613  0.022119754 0.6887450  36  
## [72]  0.1214689  0.023656776 0.7692175  43  
## [73]  0.1189802  0.023589949 1.0816642  42  
## [74]  0.1246459  0.023589949 0.7893361  44  
## [75]  0.1086351  0.023990912 0.8897732  39  
## [76]  0.1309192  0.023990912 0.8290627  47  
## [77]  0.1031175  0.027866881 0.9374547  43  
## [78]  0.1223022  0.027866881 0.7744941  51  
## [79]  0.1121718  0.028000535 0.9187408  47  
## [80]  0.1360382  0.028000535 0.8614792  57  
## [81]  0.1205074  0.031609195 0.7631285  57  
## [82]  0.1265560  0.032210639 0.8014322  61  
## [83]  0.1377953  0.033948142 0.8726062  70  
## [84]  0.1100196  0.034014969 1.0002029  56  
## [85]  0.1296660  0.034014969 0.8211266  66  
## [86]  0.1051587  0.033680834 0.8613001  53  
## [87]  0.1230159  0.033680834 0.7790138  62  
## [88]  0.1328273  0.035217856 0.8411460  70  
## [89]  0.1063063  0.037089014 0.8706993  59  
## [90]  0.1351351  0.037089014 0.8557605  75  
## [91]  0.1423423  0.037089014 0.9014011  79  
## [92]  0.1190053  0.037623630 0.7536165  67  
## [93]  0.1443299  0.038893344 0.9139875  84  
## [94]  0.1362832  0.037757284 1.1162242  77  
## [95]  0.1398230  0.037757284 0.8854471  79  
## [96]  0.1055046  0.043704892 0.6681213  69  
## [97]  0.1032448  0.045308741 0.8456244  70  
## [98]  0.1578171  0.045308741 0.9993970 107  
## [99]  0.1282051  0.046912590 0.8118754  90  
## [100] 0.1039326  0.047580861 0.8512574  74  
## [101] 0.1334270  0.047580861 0.8449433  95  
## [102] 0.1008174  0.049051056 0.9165444  74  
## [103] 0.1008174  0.049051056 0.8257428  74  
## [104] 0.1348774  0.049051056 0.8541283  99  
## [105] 0.1253230  0.051724138 0.7936239  97  
## [106] 0.1345912  0.053127506 0.8523160 107  
## [107] 0.1178414  0.060678963 0.7462458 107  
## [108] 0.1483942  0.060344828 0.9397255 134  
## [109] 0.1085495  0.069566961 0.6874034 113  
## [110] 0.1213018  0.067762630 0.7681590 123  
## [111] 0.1299611  0.085872761 0.8229952 167  
## [112] 0.1197522  0.097099706 0.7583464 174  
## [113] 0.1269745  0.109997327 0.8040822 209  
## [114] 0.1215107  0.122093023 0.7694819 222  
## [115] 0.2558140  0.005747126 1.6199746  22  
## [116] 0.1641791  0.008954825 1.9118880  22  
## [117] 0.1317365  0.011160118 2.1830624  22  
## [118] 0.1797753  0.005947608 1.1384500  16  
## [119] 0.1194030  0.008954825 1.2296946  16  
## [120] 0.2125000  0.005346164 1.3456835  17  
## [121] 0.1268657  0.008954825 1.1533523  17  
## [122] 0.1709402  0.007818765 1.0825005  20  
## [123] 0.1197605  0.011160118 1.0887581  20  
## [124] 0.1404959  0.008086073 0.8897081  17  
## [125] 0.1017964  0.011160118 0.8337610  17  
## [126] 0.1404959  0.008086073 1.1507281  17  
## [127] 0.1172414  0.009689922 1.0658566  17  
## [128] 0.1075949  0.010558674 1.1080872  17

Hence, there are no redundant rules now.
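
The claim above can be checked programmatically. The following is a minimal sketch, assuming the `Groceries` transactions bundled with `arules` as a stand-in for the CSV used in this notebook and illustrative thresholds (not necessarily the ones used earlier). It uses `is.redundant()`, which flags a rule when a more general rule with the same right-hand side has the same or higher confidence.

```r
library(arules)

# Assumption: the Groceries data shipped with arules stands in for the notebook's CSV
data("Groceries")

# Illustrative thresholds, chosen only to keep the example small
rules <- apriori(Groceries,
                 parameter = list(support = 0.005, confidence = 0.1, minlen = 2),
                 control = list(verbose = FALSE))

# Logical vector: TRUE where a more general rule has equal or higher confidence
redundant <- is.redundant(rules)
pruned <- rules[!redundant]

# Rule counts before and after pruning, and redundant rules remaining in the pruned set
length(rules)
length(pruned)
sum(is.redundant(pruned))
```

Running `sum(is.redundant(rules.pruned))` on the notebook's own rule set would give the same kind of check.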

Graph-based visualizations offer a very clear representation of rules, but they tend to become cluttered easily and are therefore only viable for very small sets of rules.

plot(rules.pruned, method = "graph")
## Warning: Too many rules supplied. Only plotting the best 100 rules using lift
## (change control parameter max if needed)

For the following plots we select the 10 rules with the highest lift.

subrules2 <- head(rules.pruned, n = 10, by = "lift")
inspect(subrules2)
##      lhs                     rhs               support     confidence
## [1]  {whole milk,yogurt}  => {sausage}         0.001470195 0.1317365 
## [2]  {sausage,whole milk} => {yogurt}          0.001470195 0.1641791 
## [3]  {sausage,yogurt}     => {whole milk}      0.001470195 0.2558140 
## [4]  {flour}              => {tropical fruit}  0.001069233 0.1095890 
## [5]  {processed cheese}   => {root vegetables} 0.001069233 0.1052632 
## [6]  {soft cheese}        => {yogurt}          0.001269714 0.1266667 
## [7]  {detergent}          => {yogurt}          0.001069233 0.1240310 
## [8]  {chewing gum}        => {yogurt}          0.001403368 0.1166667 
## [9]  {rolls/buns,sausage} => {whole milk}      0.001136060 0.2125000 
## [10] {processed cheese}   => {rolls/buns}      0.001470195 0.1447368 
##      coverage    lift     count
## [1]  0.011160118 2.183062 22   
## [2]  0.008954825 1.911888 22   
## [3]  0.005747126 1.619975 22   
## [4]  0.009756750 1.617249 16   
## [5]  0.010157712 1.513120 16   
## [6]  0.010024058 1.475051 19   
## [7]  0.008620690 1.444358 16   
## [8]  0.012028869 1.358599 21   
## [9]  0.005346164 1.345683 17   
## [10] 0.010157712 1.315821 22
plot(subrules2, method = "graph", engine="igraph")
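
To make the lift column concrete, here is a small worked example in base R that recomputes confidence and lift for rule [1] from the values printed above. The support of `{sausage}` on its own is not shown in this table; it is read from the coverage of rule [108] `{sausage} => {whole milk}` in the full rule listing. The variable names are illustrative only.

```r
# Values copied from rule [1] {whole milk,yogurt} => {sausage}
support_rule <- 0.001470195   # share of transactions containing lhs and rhs together
coverage_lhs <- 0.011160118   # share of transactions containing {whole milk,yogurt}

# Support of the rhs item alone, taken from the coverage of rule [108]
support_rhs <- 0.060344828

# confidence = support(lhs and rhs) / support(lhs)
confidence <- support_rule / coverage_lhs

# lift = confidence / support(rhs)
lift <- confidence / support_rhs

round(confidence, 4)  # 0.1317
round(lift, 3)        # 2.183
```

A lift of about 2.18 means that sausage appears in baskets containing whole milk and yogurt roughly twice as often as it would if the purchases were independent.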

## Keep only rules with confidence > 0.12 and lift > 1.2, then take the top 10 by lift and confidence
subrules3 <- rules.pruned[quality(rules.pruned)$confidence > 0.12 & quality(rules.pruned)$lift > 1.2]
subrules3 <- head(subrules3, n = 10, by = c("lift","confidence"))
inspect(subrules3)
##      lhs                            rhs          support     confidence
## [1]  {whole milk,yogurt}         => {sausage}    0.001470195 0.1317365 
## [2]  {sausage,whole milk}        => {yogurt}     0.001470195 0.1641791 
## [3]  {sausage,yogurt}            => {whole milk} 0.001470195 0.2558140 
## [4]  {soft cheese}               => {yogurt}     0.001269714 0.1266667 
## [5]  {detergent}                 => {yogurt}     0.001069233 0.1240310 
## [6]  {rolls/buns,sausage}        => {whole milk} 0.001136060 0.2125000 
## [7]  {processed cheese}          => {rolls/buns} 0.001470195 0.1447368 
## [8]  {packaged fruit/vegetables} => {rolls/buns} 0.001202887 0.1417323 
## [9]  {seasonal products}         => {rolls/buns} 0.001002406 0.1415094 
## [10] {oil}                       => {soda}       0.001804330 0.1210762 
##      coverage    lift     count
## [1]  0.011160118 2.183062 22   
## [2]  0.008954825 1.911888 22   
## [3]  0.005747126 1.619975 22   
## [4]  0.010024058 1.475051 19   
## [5]  0.008620690 1.444358 16   
## [6]  0.005346164 1.345683 17   
## [7]  0.010157712 1.315821 22   
## [8]  0.008487036 1.288507 18   
## [9]  0.007083667 1.286481 15   
## [10] 0.014902433 1.246927 27
plot(subrules3, method = "graph",engine="igraph")

plot(subrules3, method = "paracoord", control=list(reorder=TRUE))
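
The threshold filter used for `subrules3` can also be written with the `subset()` method that `arules` provides for rules, which lets the quality measures be referenced by name. This is a sketch under assumptions: it mines the `Groceries` data bundled with `arules` with illustrative thresholds rather than the notebook's `rules.pruned`.

```r
library(arules)

# Assumption: the Groceries data shipped with arules stands in for the notebook's CSV
data("Groceries")

# Illustrative thresholds, chosen only to keep the example small
rules <- apriori(Groceries,
                 parameter = list(support = 0.005, confidence = 0.1, minlen = 2),
                 control = list(verbose = FALSE))

# Filter by indexing on the quality data frame, as done above
by_index <- rules[quality(rules)$confidence > 0.12 & quality(rules)$lift > 1.2]

# Same filter expressed with subset()
by_subset <- subset(rules, confidence > 0.12 & lift > 1.2)

# Keep at most the 10 rules with the highest lift
top_rules <- head(by_subset, n = 10, by = "lift")
inspect(top_rules)
```

Both forms select the same rules; `subset()` is mainly a readability choice.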

We performed market basket analysis using association rule mining with the Apriori algorithm. The dataset used in the notebook can be downloaded from https://www.kaggle.com/heeraldedhia/groceries-dataset.

Based on the association rule mining results, the following rules have comparatively high lift (above 1.2) and confidence (above 0.12), which suggests that the items in each rule have a higher affinity and tend to be purchased together:

  1. {whole milk,yogurt} => {sausage}
  2. {soft cheese} => {yogurt}
  3. {detergent} => {yogurt}
  4. {rolls/buns,sausage} => {whole milk}
  5. {processed cheese} => {rolls/buns}
  6. {packaged fruit/vegetables} => {rolls/buns}
  7. {seasonal products} => {rolls/buns}
  8. {oil} => {soda}

References:

  1. arulesViz documentation: https://cran.r-project.org/web/packages/arulesViz/index.html
  2. arules vignette: Introduction to arules – A computational environment for mining association rules and frequent item sets: https://cran.r-project.org/web/packages/arules/vignettes/arules.pdf