# Preview the raw CSV; header = FALSE because the file has no header row (each row is one basket)
df <- read.csv("G:\\RStudio\\udemy\\ml\\Machine Learning AZ\\Part 5 - Association Rule Learning\\Section 28 - Apriori\\Apriori\\Market_Basket_Optimisation.csv", header = FALSE)
head(df)
Build the sparse matrix
# install.packages("arules")
library(arules)
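# Re-read the file as transactions (sparse item matrix); rm.duplicates = TRUE drops items repeated within a basket
df <- read.transactions("G:\\RStudio\\udemy\\ml\\Machine Learning AZ\\Part 5 - Association Rule Learning\\Section 28 - Apriori\\Apriori\\Market_Basket_Optimisation.csv", sep = ",", rm.duplicates = TRUE)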
distribution of transactions with duplicates:
1
5
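To make the "sparse matrix" idea concrete, here is a minimal base R sketch of the same encoding on three made-up baskets. The baskets and variable names are hypothetical, and a dense logical matrix stands in for the sparse itemMatrix that arules actually builds.

```r
# Toy baskets (hypothetical data); the first one repeats "milk"
baskets <- list(
  c("milk", "eggs", "milk"),
  c("bread"),
  c("eggs", "bread", "butter")
)

# Drop repeated items within each basket, similar in spirit to rm.duplicates = TRUE
baskets <- lapply(baskets, unique)

# One column per distinct item, one row per basket
items <- sort(unique(unlist(baskets)))
incidence <- t(vapply(baskets, function(b) items %in% b, logical(length(items))))
colnames(incidence) <- items

# Density = share of cells that are TRUE
density <- mean(incidence)

print(incidence)
print(density)
```

With 119 items and baskets of about 4 items on average, most cells are empty, which is why a sparse representation is used for the real data.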
summary(df)
transactions as itemMatrix in sparse format with
7501 rows (elements/itemsets/transactions) and
119 columns (items) and a density of 0.03288973
most frequent items:
mineral water          eggs     spaghetti  french fries     chocolate       (Other)
         1788          1348          1306          1282          1229         22405
element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   18   19   20
1754 1358 1044  816  667  493  391  324  259  139  102   67   40   22   17    4    1    2    1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   3.000   3.914   5.000  20.000
includes extended item information - examples:
labels
1 almonds
2 antioxydant juice
3 asparagus
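The figures in the summary are consistent with each other. The sketch below, which uses only numbers copied from the output above (variable names are illustrative), recomputes the density and the mean basket size from the length distribution.

```r
# Figures copied from the summary() output
n_transactions <- 7501
n_items <- 119
sizes  <- c(1:16, 18, 19, 20)
counts <- c(1754, 1358, 1044, 816, 667, 493, 391, 324, 259, 139,
            102, 67, 40, 22, 17, 4, 1, 2, 1)

# Every basket appears exactly once in the length distribution
stopifnot(sum(counts) == n_transactions)

# Total number of non-empty cells in the transaction matrix
total_items <- sum(sizes * counts)

# Density = filled cells / all cells; mean length = filled cells / baskets
density  <- total_items / (n_transactions * n_items)
mean_len <- total_items / n_transactions

print(total_items)
print(round(density, 8))
print(round(mean_len, 3))
```

The total of 29358 item occurrences also equals the sum of the "most frequent items" counts, including the (Other) bucket.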
# frequency plot
itemFrequencyPlot(df, topN = 10)
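# Minimum support: 3 purchases a day over the whole week would give 7*3/7500 = 0.0028.
# The threshold used here is 0.004, i.e. about 4 purchases a day (7*4/7500 is about 0.0037).
# Set minlen = 2 so that every itemset contains at least 2 items.
# eclat() mines frequent itemsets only (no rules), even though the result is named `rules`.
rules <- eclat(data = df, parameter = list(support = 0.004, minlen = 2))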
Eclat
parameter specification:
tidLists support minlen maxlen            target   ext
   FALSE   0.004      2     10 frequent itemsets FALSE
algorithmic control:
sparse sort verbose
     7   -2    TRUE
Absolute minimum support count: 30
create itemset ...
set transactions ...[119 item(s), 7501 transaction(s)] done [0.01s].
sorting and recoding items ... [114 item(s)] done [0.00s].
creating sparse bit matrix ... [114 row(s), 7501 column(s)] done [0.00s].
writing ... [845 set(s)] done [0.04s].
Creating S4 object ... done [0.00s].
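The "Absolute minimum support count: 30" line in the log can be related back to the relative threshold. The following is a small sketch; the truncation to a whole number is an assumption that happens to match the logged value, and the daily rate assumes the data covers one week.

```r
n_transactions <- 7501   # from the eclat log
min_support    <- 0.004  # relative threshold passed to eclat()

# Assumption: the relative threshold is truncated to a whole count,
# which is consistent with the 30 printed in the log
min_count <- as.integer(min_support * n_transactions)

# Purchases per day implied by that count, assuming one week of data
per_day <- min_count / 7

print(min_count)
print(per_day)
```

An itemset therefore has to appear in at least 30 baskets, a little over 4 per day, to be reported.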
Visualize the results
# show the top 20 itemsets, sorted by decreasing support (eclat returns itemsets, not rules)
inspect(sort(rules, by = 'support')[1:20])
items support
[1] {mineral water,spaghetti} 0.05972537
[2] {chocolate,mineral water} 0.05265965
[3] {eggs,mineral water} 0.05092654
[4] {milk,mineral water} 0.04799360
[5] {ground beef,mineral water} 0.04092788
[6] {ground beef,spaghetti} 0.03919477
[7] {chocolate,spaghetti} 0.03919477
[8] {eggs,spaghetti} 0.03652846
[9] {eggs,french fries} 0.03639515
[10] {frozen vegetables,mineral water} 0.03572857
[11] {milk,spaghetti} 0.03546194
[12] {chocolate,french fries} 0.03439541
[13] {mineral water,pancakes} 0.03372884
[14] {french fries,mineral water} 0.03372884
[15] {chocolate,eggs} 0.03319557
[16] {chocolate,milk} 0.03212905
[17] {green tea,mineral water} 0.03106252
[18] {eggs,milk} 0.03079589
[19] {burgers,eggs} 0.02879616
[20] {french fries,green tea} 0.02852953
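Support values are easier to interpret as basket counts. The sketch below converts the top three supports back to counts and, as an extra check that the eclat output itself does not include, compares the top pair with what independence between the two items would predict (the lift). All numbers are copied from the outputs above; the variable names are illustrative.

```r
n_transactions <- 7501

# Supports of the three most frequent pairs, copied from the inspect() output
pair_support <- c("mineral water,spaghetti" = 0.05972537,
                  "chocolate,mineral water" = 0.05265965,
                  "eggs,mineral water"      = 0.05092654)

# Number of baskets containing each pair
pair_count <- round(pair_support * n_transactions)

# Single-item counts copied from the summary() output
item_count   <- c("mineral water" = 1788, "spaghetti" = 1306)
item_support <- item_count / n_transactions

# Lift of the top pair: observed support / support expected under independence
lift <- pair_support[["mineral water,spaghetti"]] / prod(item_support)

print(pair_count)
print(round(lift, 3))
```

A lift above 1 suggests the two items are bought together more often than their individual frequencies alone would explain, which support on its own cannot show because mineral water is frequent in general.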