Load the required packages. I use the Apriori algorithm, so I need the arules library, plus arulesViz for visualization.
In this exercise we will build association rules from a dataset taken from the Weka datasets. The dataset is available at data/supermarket.csv. supermarket.csv contains the list of items purchased in each transaction.
Summarize the data to get an overall view of it.
##       TID                   name
##  Min.   :   1   bread and cake: 3330
##  1st Qu.:1162   fruit         : 2962
##  Median :2324   vegetables    : 2961
##  Mean   :2317   milk cream    : 2939
##  3rd Qu.:3476   baking needs  : 2795
##  Max.   :4627   frozen foods  : 2717
##                 (Other)       :61922
I will start by showing the items most frequently purchased at this supermarket.
Next, I build the rules with the Apriori algorithm. First, convert the data frame into a list.
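The code for this step is not shown on the page. As a minimal sketch, assuming the CSV has one row per purchased item with the columns `TID` and `name` seen in the summary output, the item counts can be tabulated in base R. The data frame name `supermarket` and the toy rows are hypothetical stand-ins for data/supermarket.csv.

```r
# Hypothetical toy stand-in for the real file, which would presumably be read with
# read.csv("data/supermarket.csv", stringsAsFactors = TRUE).
supermarket <- data.frame(
  TID  = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  name = factor(c("bread and cake", "milk cream", "fruit",
                  "bread and cake", "milk cream",
                  "bread and cake", "fruit", "vegetables",
                  "bread and cake", "vegetables"))
)

# Overall view of both columns.
print(summary(supermarket))

# Count how often each item appears and list the most frequent first.
item_counts <- sort(table(supermarket$name), decreasing = TRUE)
print(item_counts)
```

Once the data is a transactions object, `itemFrequencyPlot()` from arules draws the same ranking as a bar chart.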
## $`1`
## [1] baby needs bread and cake baking needs juice sat cord ms
## [5] biscuits canned vegetables cleaners polishers coffee
## [9] sauces gravy pkle confectionary dishcloths scour frozen foods
## [13] razor blades party snack foods tissues paper prd wrapping
## [17] mens toiletries cheese milk cream margarine
## [21] small goods fruit vegetables 750ml white nz
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
##
## $`2`
## [1] canned fish meat canned fruit canned vegetables sauces gravy pkle
## [5] deod disinfectant frozen foods pet foods laundry needs
## [9] tissues paper prd deodorants soap haircare milk cream
## [13] fruit vegetables
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
##
## $`3`
## [1] bread and cake baking needs juice sat cord ms biscuits
## [5] canned fruit sauces gravy pkle puddings deserts wrapping
## [9] health food other small goods dairy foods beef
## [13] lamb fruit vegetables stationary
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
##
## $`4`
## [1] bread and cake baking needs juice sat cord ms biscuits
## [5] canned vegetables breakfast food cleaners polishers frozen foods
## [9] jams spreads pet foods party snack foods tissues paper prd
## [13] deodorants soap mens toiletries cheese margarine
## [17] dairy foods beef stationary prepared meals
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
##
## $`5`
## [1] bread and cake baking needs juice sat cord ms tea
## [5] cleaners polishers coffee sauces gravy pkle frozen foods
## [9] jams spreads laundry needs wrapping deodorants soap
## [13] haircare dental needs meat misc milk cream
## [17] margarine beef poultry potatoes
## [21] vegetables condiments small goods2
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
##
## $`6`
## [1] bread and cake baking needs juice sat cord ms tea
## [5] biscuits canned vegetables breakfast food confectionary
## [9] frozen foods spices party snack foods tissues paper prd
## [13] deodorants soap margarine dairy foods fruit
## [17] potatoes vegetables stationary bake off products
## 100 Levels: 750ml red imp 750ml red nz 750ml white imp ... wrapping
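A list of this shape (one element per transaction ID, each holding that transaction's items as a factor) is what base R's `split()` returns. The following is a sketch under that assumption; the names `supermarket` and `transaction_list` are hypothetical and the rows are toy data, not the real dataset.

```r
# Hypothetical toy data frame with the same two columns as the dataset.
supermarket <- data.frame(
  TID  = c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  name = factor(c("bread and cake", "milk cream", "fruit",
                  "bread and cake", "milk cream",
                  "bread and cake", "fruit", "vegetables",
                  "bread and cake", "vegetables"))
)

# Group the item column by transaction ID: one list element per TID.
transaction_list <- split(supermarket$name, supermarket$TID)
print(head(transaction_list))
```

Because `name` is a factor, every element keeps the full set of levels, which matches the "100 Levels" line printed under each transaction above.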
Then convert the list into an object of class transactions. This makes it easier to run the Apriori algorithm.
## items transactionID
## [1] {750ml white nz ,
## baby needs ,
## baking needs ,
## biscuits,
## bread and cake ,
## canned vegetables ,
## cheese,
## cleaners polishers,
## coffee,
## confectionary,
## dishcloths scour,
## frozen foods ,
## fruit,
## juice sat cord ms,
## margarine,
## mens toiletries ,
## milk cream,
## party snack foods ,
## razor blades ,
## sauces gravy pkle,
## small goods ,
## tissues paper prd ,
## vegetables,
## wrapping} 1
## [2] {canned fish meat ,
## canned fruit ,
## canned vegetables ,
## deod disinfectant,
## deodorants soap,
## frozen foods ,
## fruit,
## haircare,
## laundry needs ,
## milk cream,
## pet foods ,
## sauces gravy pkle,
## tissues paper prd ,
## vegetables} 2
## [3] {baking needs ,
## beef,
## biscuits,
## bread and cake ,
## canned fruit ,
## dairy foods ,
## fruit,
## health food other ,
## juice sat cord ms,
## lamb,
## puddings deserts,
## sauces gravy pkle,
## small goods ,
## stationary,
## vegetables,
## wrapping} 3
## [1] 4601 100
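With arules, a named list of item vectors can be coerced with `as(..., "transactions")`. The sketch below uses ten hypothetical toy baskets; the object name `sup_transaction` is taken from the mining info printed by the rule summary, while `baskets` and `transaction_list` are illustrative names.

```r
library(arules)

# Ten hypothetical toy baskets, one comma-separated string per transaction.
baskets <- c("bread and cake,milk cream,fruit", "bread and cake,milk cream",
             "bread and cake,fruit,vegetables", "milk cream,fruit",
             "bread and cake,milk cream,vegetables", "fruit,vegetables",
             "bread and cake,milk cream,fruit", "vegetables",
             "bread and cake,milk cream", "fruit,vegetables")
transaction_list <- strsplit(baskets, ",", fixed = TRUE)
names(transaction_list) <- seq_along(transaction_list)  # list names become transaction IDs

# Coerce the list to the sparse transactions class used by arules.
sup_transaction <- as(transaction_list, "transactions")

inspect(sup_transaction[1:3])  # first three transactions
dim(sup_transaction)           # number of transactions, number of distinct items
```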
Build the rules with the apriori function, using the parameters supp = 0.1 and conf = 0.75.
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.75 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 460
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[100 item(s), 4601 transaction(s)] done [0.00s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.05s].
## writing ... [9958 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
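A call with these parameters might look like the sketch below. It runs on ten hypothetical toy baskets rather than the real data, so the number of rules differs from the log above; only `supp = 0.1` and `conf = 0.75` come from the text.

```r
library(arules)

# Ten hypothetical toy baskets standing in for the supermarket transactions.
baskets <- c("bread and cake,milk cream,fruit", "bread and cake,milk cream",
             "bread and cake,fruit,vegetables", "milk cream,fruit",
             "bread and cake,milk cream,vegetables", "fruit,vegetables",
             "bread and cake,milk cream,fruit", "vegetables",
             "bread and cake,milk cream", "fruit,vegetables")
transaction_list <- strsplit(baskets, ",", fixed = TRUE)
names(transaction_list) <- seq_along(transaction_list)
sup_transaction <- as(transaction_list, "transactions")

# Mine rules whose support is at least 0.1 and whose confidence is at least 0.75.
rules <- apriori(sup_transaction, parameter = list(supp = 0.1, conf = 0.75))

# Rule length distribution and quality measures, then the rules themselves.
summary(rules)
inspect(rules)
```

Lowering `supp` or `conf` yields more rules, and raising them yields fewer. The 9958 rules in the log above suggest that stricter thresholds may be worth trying on the real data.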
Summarize the rules.
## set of 9958 rules
##
## rule length distribution (lhs + rhs):sizes
## 2 3 4 5 6 7
## 41 720 3440 4367 1320 70
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 4.000 5.000 4.644 5.000 7.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.1002 Min. :0.7500 Min. :0.1098 Min. :1.042
## 1st Qu.:0.1067 1st Qu.:0.7716 1st Qu.:0.1324 1st Qu.:1.196
## Median :0.1161 Median :0.7995 Median :0.1452 Median :1.252
## Mean :0.1259 Mean :0.8080 Mean :0.1563 Mean :1.260
## 3rd Qu.:0.1332 3rd Qu.:0.8408 3rd Qu.:0.1674 3rd Qu.:1.313
## Max. :0.5079 Max. :0.9205 Max. :0.6438 Max. :1.594
## count
## Min. : 461.0
## 1st Qu.: 491.0
## Median : 534.0
## Mean : 579.2
## 3rd Qu.: 613.0
## Max. :2337.0
##
## mining info:
## data ntransactions support confidence
## sup_transaction 4601 0.1 0.75
Compare the rules by confidence, support, and lift.
## lhs rhs support confidence coverage lift count
## [1] {biscuits,
## frozen foods ,
## milk cream,
## pet foods ,
## vegetables} => {bread and cake } 0.1032384 0.9205426 0.1121495 1.271897 475
## [2] {baking needs ,
## biscuits,
## fruit,
## margarine,
## milk cream,
## vegetables} => {bread and cake } 0.1008476 0.9188119 0.1097587 1.269506 464
## [3] {biscuits,
## frozen foods ,
## margarine,
## milk cream,
## vegetables} => {bread and cake } 0.1167138 0.9179487 0.1271463 1.268313 537
## [4] {biscuits,
## canned vegetables ,
## frozen foods ,
## fruit,
## vegetables} => {bread and cake } 0.1069333 0.9179104 0.1164964 1.268260 492
## [5] {baking needs ,
## frozen foods ,
## fruit,
## margarine,
## milk cream,
## vegetables} => {bread and cake } 0.1030211 0.9168279 0.1123669 1.266764 474
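Rankings like the one above can be produced with `sort()` and `inspect()` from arules. This is a sketch on ten hypothetical toy baskets, so the rules it prints are not those of the supermarket data. The names `by_confidence`, `by_support`, and `by_lift` are illustrative.

```r
library(arules)

# Ten hypothetical toy baskets standing in for the supermarket transactions.
baskets <- c("bread and cake,milk cream,fruit", "bread and cake,milk cream",
             "bread and cake,fruit,vegetables", "milk cream,fruit",
             "bread and cake,milk cream,vegetables", "fruit,vegetables",
             "bread and cake,milk cream,fruit", "vegetables",
             "bread and cake,milk cream", "fruit,vegetables")
transaction_list <- strsplit(baskets, ",", fixed = TRUE)
names(transaction_list) <- seq_along(transaction_list)
sup_transaction <- as(transaction_list, "transactions")
rules <- apriori(sup_transaction, parameter = list(supp = 0.1, conf = 0.75))

# sort() orders rules by a quality measure, highest value first by default.
by_confidence <- sort(rules, by = "confidence")
by_support    <- sort(rules, by = "support")
by_lift       <- sort(rules, by = "lift")

# Show the top five rules of each ranking.
inspect(head(by_confidence, 5))
inspect(head(by_support, 5))
inspect(head(by_lift, 5))
```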
A high confidence value indicates a high probability that item B is bought given that item A has been bought. In the table above, the rules with the highest confidence all have bread and cake on the right-hand side. This shows that the probability of bread and cake being bought is very high: many combinations of purchased items lead a customer to buy bread and cake as well.
##     lhs                rhs               support   confidence coverage  lift     count
## [1] {milk cream}    => {bread and cake } 0.5079331 0.7951684  0.6387742 1.098670 2337
## [2] {fruit}         => {bread and cake } 0.5053249 0.7849426  0.6437731 1.084541 2325
## [3] {vegetables}    => {bread and cake } 0.4994566 0.7760892  0.6435557 1.072308 2298
## [4] {baking needs } => {bread and cake } 0.4762008 0.7838998  0.6074766 1.083100 2191
## [5] {frozen foods } => {bread and cake } 0.4627255 0.7835848  0.5905238 1.082665 2129
Support is the probability that a transaction contains an item or a combination of items. In other words, it is the share of all transactions that include the whole itemset. In this dataset, the rule milk cream => bread and cake has the highest support, about 0.51, so roughly half of all transactions contain both items. How likely a milk cream buyer is to also buy bread and cake is given by the confidence of this rule, about 0.80.
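As a worked check, the support, confidence, and coverage of milk cream => bread and cake can be recomputed from counts printed earlier on this page. The sketch assumes that each transaction lists an item at most once, so the 2939 rows for milk cream in the data summary equal the number of transactions containing it. The variable names are illustrative.

```r
# Counts taken from the outputs on this page.
n_transactions <- 4601  # number of transactions (dim of the transactions object)
count_both     <- 2337  # transactions containing milk cream AND bread and cake
count_lhs      <- 2939  # transactions containing milk cream (data summary)

support    <- count_both / n_transactions  # share of all transactions with both items
confidence <- count_both / count_lhs       # share of milk cream baskets that also have bread and cake
coverage   <- count_lhs / n_transactions   # share of all transactions with the left-hand side

print(round(c(support = support, confidence = confidence, coverage = coverage), 4))
# support 0.5079, confidence 0.7952, coverage 0.6388
```

These agree with the first row of the support-ranked table, which supports the assumption about the counts.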
## lhs rhs support confidence coverage lift count
## [1] {baking needs ,
## biscuits,
## bread and cake ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1010650 0.8072917 0.1251902 1.594141 465
## [2] {laundry needs ,
## wrapping} => {tissues paper prd } 0.1038905 0.7697262 0.1349707 1.576106 478
## [3] {biscuits,
## bread and cake ,
## frozen foods ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1056292 0.7928222 0.1332319 1.565569 486
## [4] {biscuits,
## margarine,
## wrapping} => {tissues paper prd } 0.1006303 0.7565359 0.1330146 1.549097 463
## [5] {baking needs ,
## biscuits,
## cheese,
## tissues paper prd } => {margarine} 0.1019344 0.7701149 0.1323625 1.548645 469
Lift measures how strongly one event raises the likelihood of another. Here it describes how much buying certain items increases the chance that another item is bought, relative to that item's overall purchase rate. In the table above, rule 1 has a lift of about 1.59. A lift > 1 means that buying baking needs, biscuits, bread and cake, juice sat cord ms, and sauces gravy pkle increases the likelihood of also buying party snack foods.
Visualize the rules that have been created as a graph so they are easier to analyze.
The graph above shows how the purchase of one item is related to the purchase of others. The arrows point to the item that tends to be bought along with the others. Here we can see which items are bought most often and which are linked to the purchase of other items. This method is well suited to the retail industry because it can help with stock planning (supply chain) and can also help increase monthly revenue. By knowing the sales of a particular item, a supplier can identify the items most likely to be sold next.
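Lift is the confidence of a rule divided by the support of its right-hand side. The sketch below checks this against numbers printed on this page. It assumes that the 3330 rows for bread and cake in the data summary equal the number of transactions containing it. The variable names are illustrative.

```r
# Rule milk cream => bread and cake, recomputed from counts on this page.
n_transactions <- 4601
confidence_mc  <- 2337 / 2939           # confidence of the rule
support_rhs    <- 3330 / n_transactions # share of all transactions with bread and cake
lift_mc        <- confidence_mc / support_rhs
print(round(lift_mc, 4))                # 1.0987, matching the table

# Rule 1 of the lift ranking: the table gives confidence and lift,
# so the implied baseline purchase rate of party snack foods follows.
confidence_top <- 0.8072917
lift_top       <- 1.594141
baseline_rhs   <- confidence_top / lift_top
print(round(baseline_rhs, 3))           # 0.506
```

In other words, about 51% of all baskets contain party snack foods, but about 81% of the baskets containing the five left-hand-side items do, which is the factor of roughly 1.59.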
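The plotting code is not shown on the page. One common way to draw rules as a graph is `plot(..., method = "graph")` from arulesViz. The sketch below is an assumption about how such a figure could be made, run on ten hypothetical toy baskets. Limiting the plot to the top rules by lift is my choice, not something stated in the text.

```r
library(arules)
library(arulesViz)

# Ten hypothetical toy baskets standing in for the supermarket transactions.
baskets <- c("bread and cake,milk cream,fruit", "bread and cake,milk cream",
             "bread and cake,fruit,vegetables", "milk cream,fruit",
             "bread and cake,milk cream,vegetables", "fruit,vegetables",
             "bread and cake,milk cream,fruit", "vegetables",
             "bread and cake,milk cream", "fruit,vegetables")
transaction_list <- strsplit(baskets, ",", fixed = TRUE)
names(transaction_list) <- seq_along(transaction_list)
sup_transaction <- as(transaction_list, "transactions")
rules <- apriori(sup_transaction, parameter = list(supp = 0.1, conf = 0.75))

# Keep at most ten rules with the highest lift so the graph stays readable.
top_rules <- head(sort(rules, by = "lift"), 10)

# Items and rules become nodes; arrows run from left-hand-side items to the
# rule node and from the rule node to the right-hand-side item.
plot(top_rules, method = "graph")
```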
So, that’s all for the process of market basket analysis and visualization with the Apriori algorithm using R packages. I hope this page helps you understand the problem and the solution behind it.
See you on the next page!
Author,
Alfado Sembiring