In this report, I build association rules from a dataset taken from the Weka datasets. The dataset is available at data/supermarket.csv. supermarket.csv lists the items purchased in each transaction.
## 'data.frame': 79626 obs. of 2 variables:
## $ TID : int 1 1 1 1 1 1 1 1 1 1 ...
## $ name: Factor w/ 100 levels "750ml red imp ",..: 5 11 7 53 10 16 23 24 85 27 ...
The data has 79,626 observations and 2 columns/variables (TID and name). The summary of each column is:
## TID name
## Min. : 1 bread and cake : 3330
## 1st Qu.:1162 fruit : 2962
## Median :2324 vegetables : 2961
## Mean :2317 milk cream : 2939
## 3rd Qu.:3476 baking needs : 2795
## Max. :4627 frozen foods : 2717
## (Other) :61922
The output of tail() and summary() shows that a single transaction ID covers several purchased products. The data must therefore be converted to transactions form so that processing it in R does not take much memory and time.
The most frequently purchased items are ‘bread and cake’, ‘fruit’, ‘vegetables’, ‘milk cream’, and ‘baking needs’.
The items purchased in each transaction, shown here for the first three transaction IDs, are:
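The grouping step can be sketched in base R, assuming a long-format data frame with the same TID and name columns as above. The rows in `toy` are made up for illustration and are not taken from supermarket.csv. With the arules package loaded, a list like `items_by_tid` would typically be coerced with `as(items_by_tid, "transactions")`; that call is not run here.

```r
# Toy long-format data (hypothetical rows, same column names as the report)
toy <- data.frame(
  TID  = c(1, 1, 1, 2, 2, 3),
  name = c("bread and cake", "fruit", "milk cream",
           "fruit", "vegetables", "bread and cake"),
  stringsAsFactors = TRUE
)

# One character vector of items per transaction ID
items_by_tid <- split(as.character(toy$name), toy$TID)

# Binary incidence matrix: rows = transactions, columns = distinct items
incidence <- table(toy$TID, toy$name) > 0

# Number of transactions x number of distinct items
dim(incidence)
```

Storing one row per transaction and one column per item is what makes the later support counts cheap to compute.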
## $`1`
## [1] baby needs bread and cake baking needs juice sat cord ms
## [5] biscuits canned vegetables cleaners polishers coffee
## [9] sauces gravy pkle confectionary dishcloths scour frozen foods
## [13] razor blades party snack foods tissues paper prd wrapping
## [17] mens toiletries cheese milk cream margarine
## [21] small goods fruit vegetables 750ml white nz
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
##
## $`2`
## [1] canned fish meat canned fruit canned vegetables sauces gravy pkle
## [5] deod disinfectant frozen foods pet foods laundry needs
## [9] tissues paper prd deodorants soap haircare milk cream
## [13] fruit vegetables
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
##
## $`3`
## [1] bread and cake baking needs juice sat cord ms biscuits
## [5] canned fruit sauces gravy pkle puddings deserts wrapping
## [9] health food other small goods dairy foods beef
## [13] lamb fruit vegetables stationary
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
## [1] 4601 100
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.75 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 460
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[100 item(s), 4601 transaction(s)] done [0.01s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.07s].
## writing ... [9958 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## set of 9958 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4 5 6 7
## 41 720 3440 4367 1320 70
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 4.000 5.000 4.644 5.000 7.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.1002 Min. :0.7500 Min. :0.1098 Min. :1.042
## 1st Qu.:0.1067 1st Qu.:0.7716 1st Qu.:0.1324 1st Qu.:1.196
## Median :0.1161 Median :0.7995 Median :0.1452 Median :1.252
## Mean :0.1259 Mean :0.8080 Mean :0.1563 Mean :1.260
## 3rd Qu.:0.1332 3rd Qu.:0.8408 3rd Qu.:0.1674 3rd Qu.:1.313
## Max. :0.5079 Max. :0.9205 Max. :0.6438 Max. :1.594
## count
## Min. : 461.0
## 1st Qu.: 491.0
## Median : 534.0
## Mean : 579.2
## 3rd Qu.: 613.0
## Max. :2337.0
##
## mining info:
## data ntransactions support confidence
## supermarket_trans 4601 0.1 0.75
Interpretation: the ‘rule length distribution’ section shows that the generated rules contain between 2 and 7 items (lhs + rhs), so the longest rule combines 7 items. In total, 9,958 rules were generated.
The following output shows the top 10 rules generated by apriori(), sorted by lift. Lift matters because it indicates how much the antecedent (lhs) items raise the probability that a customer buys the consequent (rhs) item.
## lhs rhs support confidence coverage lift count
## [1] {baking needs ,
## biscuits,
## bread and cake ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1010650 0.8072917 0.1251902 1.594141 465
## [2] {laundry needs ,
## wrapping} => {tissues paper prd } 0.1038905 0.7697262 0.1349707 1.576106 478
## [3] {biscuits,
## bread and cake ,
## frozen foods ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1056292 0.7928222 0.1332319 1.565569 486
## [4] {biscuits,
## margarine,
## wrapping} => {tissues paper prd } 0.1006303 0.7565359 0.1330146 1.549097 463
## [5] {baking needs ,
## biscuits,
## cheese,
## tissues paper prd } => {margarine} 0.1019344 0.7701149 0.1323625 1.548645 469
## [6] {biscuits,
## juice sat cord ms,
## sauces gravy pkle,
## tissues paper prd } => {party snack foods } 0.1025864 0.7840532 0.1308411 1.548253 472
## [7] {baking needs ,
## bread and cake ,
## frozen foods ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1028037 0.7831126 0.1312758 1.546395 473
## [8] {baking needs ,
## biscuits,
## bread and cake ,
## laundry needs } => {tissues paper prd } 0.1049772 0.7523364 0.1395349 1.540498 483
## [9] {biscuits,
## frozen foods ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1225820 0.7800830 0.1571398 1.540413 564
## [10] {baking needs ,
## biscuits,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1169311 0.7797101 0.1499674 1.539677 538
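The quality columns in the table above (support, confidence, coverage, lift) can be reproduced from plain transaction counts. The following is a minimal sketch in base R on a hypothetical toy list of transactions; `toy_trans` and `support_of()` are names introduced here for illustration and are not part of arules.

```r
# Hypothetical toy transactions (not from supermarket.csv)
toy_trans <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# Fraction of transactions that contain every item in `items`
support_of <- function(items, trans) {
  mean(vapply(trans, function(tr) all(items %in% tr), logical(1)))
}

lhs <- "bread"
rhs <- "milk"

support    <- support_of(c(lhs, rhs), toy_trans)  # P(lhs and rhs)
coverage   <- support_of(lhs, toy_trans)          # P(lhs)
confidence <- support / coverage                  # P(rhs | lhs)
lift       <- confidence / support_of(rhs, toy_trans)

round(c(support = support, coverage = coverage,
        confidence = confidence, lift = lift), 4)
```

A lift above 1 means the lhs items make the rhs more likely than its baseline frequency. In this toy example the lift of {bread} => {milk} is below 1, whereas every rule in the table above has a lift above 1.5.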
Based on the rules generated by apriori() above, here is the visualization:
If you want to examine product categories in more detail, try tuning the parameters passed to apriori() when generating rules: lowering support or confidence widens the set of rules that are returned.
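To get a feel for what lowering support means, the sketch below converts a few support thresholds into the smallest transaction count that still satisfies them, using the 4,601 transactions reported above. At support 0.1 the result agrees with the minimum count of 461 in the quality summary. The thresholds 0.05 and 0.01 and the confidence 0.6 in the commented call are illustrative assumptions, not values used in this report.

```r
n_trans <- 4601  # number of transactions reported in the apriori() output

# Smallest transaction count that still meets each support threshold
supports   <- c(0.10, 0.05, 0.01)
min_counts <- ceiling(supports * n_trans)

data.frame(support = supports, min_count = min_counts)

# With arules loaded, a lower threshold would be passed like this (not run here):
# rules_wide <- apriori(supermarket_trans,
#                       parameter = list(support = 0.05, confidence = 0.6))
```

Lower thresholds usually produce many more rules, so it may help to sort or filter them by lift before inspecting them.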