library(arules)
library(arulesViz)
library(dplyr)
data <- read.transactions('GroceryDataSet.csv', format = 'basket', sep=',')
data
## transactions in sparse format with
## 9835 transactions (rows) and
## 169 items (columns)
summary(data)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda
##             2513             1903             1809             1715
##           yogurt          (Other)
##             1372            34055
##
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46
##   17   18   19   20   21   22   23   24   26   27   28   29   32
##   29   14   14    9   11    4    6    1    1    1    1    3    1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   3.000   4.409   6.000  32.000
##
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
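As a quick check on the summary above, the reported density and mean transaction size can be recomputed from the printed item counts. This is a minimal sketch in base R; the variable names are illustrative, and the numbers are copied by hand from the summary output rather than read from the transactions object.

```r
# Values copied from the summary(data) output (assumption: transcribed correctly)
n_transactions <- 9835
n_items <- 169
item_counts <- c(
  "whole milk" = 2513, "other vegetables" = 1903, "rolls/buns" = 1809,
  "soda" = 1715, "yogurt" = 1372, "(Other)" = 34055
)

# Total number of item occurrences across all baskets
total_items <- sum(item_counts)

# Density: share of filled cells in the transactions x items matrix
density <- total_items / (n_transactions * n_items)

# Mean basket size: item occurrences per transaction
mean_basket <- total_items / n_transactions

print(round(density, 8))
print(round(mean_basket, 3))
```

Both values should agree with the density and the mean length reported by summary(data), up to rounding.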
data %>% itemFrequencyPlot(topN=15,col= 'Blue', main="Grocery")
The plot shows the 15 most frequent products. The product with the highest frequency is “whole milk”.
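The item counts can also be expressed as relative frequencies, that is, the share of all transactions that contain each item. The sketch below uses only the five counts printed by summary(data); top_counts and item_support are hypothetical names introduced for illustration.

```r
# Top five item counts, copied from the summary(data) output
n_transactions <- 9835
top_counts <- c(
  "whole milk" = 2513, "other vegetables" = 1903,
  "rolls/buns" = 1809, "soda" = 1715, "yogurt" = 1372
)

# Relative frequency (support) of each single item
item_support <- top_counts / n_transactions

# Most frequent item among the five
most_frequent <- names(which.max(item_support))

print(round(item_support, 3))
print(most_frequent)
```

Under these numbers, whole milk appears in roughly a quarter of all baskets.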
rule <- data %>% apriori(list(supp=0.001, conf=0.8,maxlen=5))
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##       5  rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5
## Warning in apriori(., list(supp = 0.001, conf = 0.8, maxlen = 5)): Mining
## stopped (maxlen reached). Only patterns up to a length of 5 returned!
## done [0.01s].
## writing ... [398 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
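To see what the support threshold of 0.001 means in absolute terms, it can be multiplied by the number of transactions. The sketch below is plain arithmetic in base R. That the logged value of 9 comes from truncating the product is an assumption based on the numbers, not something the log states.

```r
n_transactions <- 9835
min_support <- 0.001

# Threshold expressed as a number of transactions
threshold <- min_support * n_transactions

# Truncated value, consistent with "Absolute minimum support count: 9"
truncated_count <- floor(threshold)

# Smallest whole count whose support is at least min_support
smallest_qualifying <- ceiling(threshold)

# Support of a rule that occurs exactly that many times
support_at_minimum <- smallest_qualifying / n_transactions

print(c(threshold, truncated_count, smallest_qualifying))
print(signif(support_at_minimum, 7))
```

A rule therefore needs to occur in at least 10 transactions to pass the threshold, which matches the minimum count and minimum support shown in summary(rule).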
summary(rule)
## set of 398 rules
##
## rule length distribution (lhs + rhs):sizes
##   3   4   5
##  29 229 140
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   3.000   4.000   4.000   4.279   5.000   5.000
##
## summary of quality measures:
##     support           confidence          lift            count
##  Min.   :0.001017   Min.   :0.8000   Min.   : 3.131   Min.   :10.0
##  1st Qu.:0.001017   1st Qu.:0.8333   1st Qu.: 3.312   1st Qu.:10.0
##  Median :0.001220   Median :0.8462   Median : 3.588   Median :12.0
##  Mean   :0.001251   Mean   :0.8658   Mean   : 3.942   Mean   :12.3
##  3rd Qu.:0.001322   3rd Qu.:0.9091   3rd Qu.: 4.307   3rd Qu.:13.0
##  Max.   :0.003152   Max.   :1.0000   Max.   :11.235   Max.   :31.0
##
## mining info:
##  data ntransactions support confidence
##     .          9835   0.001        0.8
inspect(head(rule))
##     lhs                                 rhs            support     confidence
## [1] {liquor,red/blush wine}          => {bottled beer} 0.001931876 0.9047619
## [2] {cereals,curd}                   => {whole milk}   0.001016777 0.9090909
## [3] {cereals,yogurt}                 => {whole milk}   0.001728521 0.8095238
## [4] {butter,jam}                     => {whole milk}   0.001016777 0.8333333
## [5] {bottled beer,soups}             => {whole milk}   0.001118454 0.9166667
## [6] {house keeping products,napkins} => {whole milk}   0.001321810 0.8125000
##     lift      count
## [1] 11.235269 19
## [2]  3.557863 10
## [3]  3.168192 17
## [4]  3.261374 10
## [5]  3.587512 11
## [6]  3.179840 13
The table shows the first rules produced by the Apriori algorithm. From the first row we can say, with about 90% confidence, that customers who bought “liquor, red/blush wine” also bought “bottled beer”.
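The quality measures of the first rule are linked by simple arithmetic, which can be checked by hand. The sketch below uses the standard definitions (support = count / transactions, confidence = rule count / LHS count, lift = confidence / RHS support) and the values printed by inspect(); the variable names are illustrative.

```r
# Values for rule [1] {liquor,red/blush wine} => {bottled beer}, copied from inspect()
n_transactions <- 9835
rule_count <- 19
confidence <- 0.9047619
lift <- 11.235269

# Support of the rule: share of all transactions containing LHS and RHS together
support <- rule_count / n_transactions

# Confidence = rule_count / lhs_count, so the LHS count can be recovered
lhs_count <- round(rule_count / confidence)

# Lift = confidence / support(RHS), so the RHS support can be recovered
rhs_support <- confidence / lift

# Implied number of transactions that contain bottled beer
rhs_count <- round(rhs_support * n_transactions)

print(signif(support, 7))
print(lhs_count)
print(round(rhs_support, 4))
```

Read this way, 19 of the 21 baskets containing both liquor and red/blush wine also contain bottled beer, while bottled beer appears in only about 8% of all baskets. That gap is what the lift of about 11 expresses.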
plot(rule[quality(rule)$confidence>0.3],method ="paracoord")
The result shows that the highest-confidence rules usually pair breakfast products (cereals, yogurt, butter) with “whole milk”, and liquor products (red wine) with “bottled beer”.
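Note that head(rule) returns the first six rules in mining order, not the strongest ones. To rank rules, arules provides sort(rule, by = "confidence") or by = "lift". The sketch below shows the same idea in base R on the six rules printed earlier; the data frame rules_head is a hand-copied illustration, not the rules object itself.

```r
# The six rules from inspect(head(rule)), copied by hand for illustration
rules_head <- data.frame(
  lhs = c("{liquor,red/blush wine}", "{cereals,curd}", "{cereals,yogurt}",
          "{butter,jam}", "{bottled beer,soups}",
          "{house keeping products,napkins}"),
  rhs = c("{bottled beer}", rep("{whole milk}", 5)),
  confidence = c(0.9047619, 0.9090909, 0.8095238,
                 0.8333333, 0.9166667, 0.8125000),
  lift = c(11.235269, 3.557863, 3.168192, 3.261374, 3.587512, 3.179840),
  stringsAsFactors = FALSE
)

# Rank by confidence, highest first
by_confidence <- rules_head[order(-rules_head$confidence), ]

# Rank by lift, highest first
by_lift <- rules_head[order(-rules_head$lift), ]

print(by_confidence[, c("lhs", "rhs", "confidence")], row.names = FALSE)
print(by_lift[, c("lhs", "rhs", "lift")], row.names = FALSE)
```

Among these six, the two rankings disagree on the top rule, so it is worth stating which measure a conclusion rests on.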
Code snippets and references from: https://www.datacamp.com/community/tutorials/market-basket-analysis-r