Introduction

library(arules)
library(arulesViz)
library(dplyr)


data <- read.transactions('GroceryDataSet.csv', format = 'basket', sep=',')
data
## transactions in sparse format with
##  9835 transactions (rows) and
##  169 items (columns)
summary(data)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
data %>% itemFrequencyPlot(topN=15,col= 'Blue', main="Grocery")

The product with the highest frequency is “whole milk”. The plot shows the 15 most frequent products.
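The ranking in the plot can also be checked numerically. The following is a minimal sketch, assuming a small hand-made transaction set (`toy` is a hypothetical stand-in for `data`, not the grocery file). It uses `itemFrequency()` from arules to list items by absolute count.

```r
library(arules)

# Hypothetical toy baskets, used here instead of GroceryDataSet.csv
toy <- as(list(
  c("whole milk", "yogurt"),
  c("whole milk", "rolls/buns"),
  c("soda"),
  c("whole milk", "soda", "yogurt")
), "transactions")

# Absolute count per item, sorted from most to least frequent
freq <- sort(itemFrequency(toy, type = "absolute"), decreasing = TRUE)
head(freq, 3)
```

Replacing `toy` with `data` should give counts that match the "most frequent items" block of `summary(data)`, for example 2513 for whole milk.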

rule <- data %>% apriori(list(supp=0.001, conf=0.8,maxlen=5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##       5  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5
## Warning in apriori(., list(supp = 0.001, conf = 0.8, maxlen = 5)): Mining
## stopped (maxlen reached). Only patterns up to a length of 5 returned!
##  done [0.01s].
## writing ... [398 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(rule)
## set of 398 rules
## 
## rule length distribution (lhs + rhs):sizes
##   3   4   5 
##  29 229 140 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   4.000   4.279   5.000   5.000 
## 
## summary of quality measures:
##     support           confidence          lift            count     
##  Min.   :0.001017   Min.   :0.8000   Min.   : 3.131   Min.   :10.0  
##  1st Qu.:0.001017   1st Qu.:0.8333   1st Qu.: 3.312   1st Qu.:10.0  
##  Median :0.001220   Median :0.8462   Median : 3.588   Median :12.0  
##  Mean   :0.001251   Mean   :0.8658   Mean   : 3.942   Mean   :12.3  
##  3rd Qu.:0.001322   3rd Qu.:0.9091   3rd Qu.: 4.307   3rd Qu.:13.0  
##  Max.   :0.003152   Max.   :1.0000   Max.   :11.235   Max.   :31.0  
## 
## mining info:
##  data ntransactions support confidence
##     .          9835   0.001        0.8
inspect(head(rule))
##     lhs                                 rhs            support     confidence
## [1] {liquor,red/blush wine}          => {bottled beer} 0.001931876 0.9047619 
## [2] {cereals,curd}                   => {whole milk}   0.001016777 0.9090909 
## [3] {cereals,yogurt}                 => {whole milk}   0.001728521 0.8095238 
## [4] {butter,jam}                     => {whole milk}   0.001016777 0.8333333 
## [5] {bottled beer,soups}             => {whole milk}   0.001118454 0.9166667 
## [6] {house keeping products,napkins} => {whole milk}   0.001321810 0.8125000 
##     lift      count
## [1] 11.235269 19   
## [2]  3.557863 10   
## [3]  3.168192 17   
## [4]  3.261374 10   
## [5]  3.587512 11   
## [6]  3.179840 13

The table shows the first rules produced by the Apriori algorithm. For example, rule [1] has a confidence of about 0.90: roughly 90% of the transactions that contain “liquor” and “red/blush wine” also contain “bottled beer”.
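To make the quality columns concrete, the block below writes out the usual definitions of support, confidence and lift for a rule X ⇒ Y over N transactions and fills in the values reported for rule [1]. The symbols X, Y and N are notation introduced here for illustration. The numbers are taken from the output above.

```latex
\begin{align*}
\operatorname{supp}(X \Rightarrow Y) &= \frac{\operatorname{count}(X \cup Y)}{N}
  = \frac{19}{9835} \approx 0.00193 \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}
  \approx 0.905 \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
  \approx 11.24
\end{align*}
```

Here X = {liquor, red/blush wine} and Y = {bottled beer}. A lift of about 11.2 means the rule's confidence is roughly eleven times the overall share of transactions containing bottled beer, which works out to about 0.905 / 11.24 ≈ 0.08.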

# All mined rules already have confidence >= 0.8, so this filter keeps every rule
plot(rule[quality(rule)$confidence > 0.3], method = "paracoord")

The results show that the high-confidence rules usually pair breakfast products (cereals, yogurt, butter) with “whole milk”, and liquor products (liquor, red/blush wine) with “bottled beer”.
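One way to check this kind of observation is to restrict the rules to a single right-hand side and rank them by confidence. The following is a minimal sketch on a hypothetical toy transaction set (`toy`, `toy_rules` and `milk_rules` are illustrative names, and the thresholds are chosen only so that the tiny example yields rules). The same `subset()` and `sort()` calls should apply to `rule`.

```r
library(arules)

# Hypothetical toy baskets, not the grocery data
toy <- as(list(
  c("cereals", "whole milk"),
  c("cereals", "yogurt", "whole milk"),
  c("liquor", "bottled beer"),
  c("yogurt", "whole milk"),
  c("liquor", "red/blush wine", "bottled beer")
), "transactions")

# Mine rules; minlen = 2 skips rules with an empty left-hand side
toy_rules <- apriori(toy,
                     parameter = list(supp = 0.2, conf = 0.8, minlen = 2),
                     control = list(verbose = FALSE))

# Keep only the rules that predict whole milk, highest confidence first
milk_rules <- subset(toy_rules, rhs %in% "whole milk")
inspect(sort(milk_rules, by = "confidence"))
```

On the real rules, `inspect(head(sort(rule, by = "lift"), 10))` would give a similar ranked view by lift instead of confidence.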