In this exercise we will build association rules from a dataset taken from the Weka datasets. The dataset is available at data/supermarket.csv. supermarket.csv lists the items purchased in each transaction.
## 'data.frame': 79626 obs. of 2 variables:
## $ TID : int 1 1 1 1 1 1 1 1 1 1 ...
## $ name: chr "baby needs " "bread and cake " "baking needs " "juice sat cord ms" ...
## TID name
## Min. : 1 Length:79626
## 1st Qu.:1162 Class :character
## Median :2324 Mode :character
## Mean :2317
## 3rd Qu.:3476
## Max. :4627
The data consists of two columns, TID and name. TID is the transaction ID and name describes the product purchased. We can obtain summary statistics with summary.
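The commands behind the structure and summary output are not shown. A minimal sketch of how they might look follows, assuming the data frame is called `supermarket` (a hypothetical name). To keep the example self-contained it writes a three-row stand-in CSV to a temporary file; for the real data the path would be data/supermarket.csv.

```r
# Write a tiny stand-in CSV with the same two columns as the real file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("TID,name",
             "1,baby needs",
             "1,bread and cake",
             "2,canned fruit"), csv_path)

# For the real data, replace csv_path with "data/supermarket.csv".
supermarket <- read.csv(csv_path, stringsAsFactors = FALSE)

str(supermarket)      # column types and first values
summary(supermarket)  # five-number summary for TID, length/class for name
```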
## [1] "baby needs " "bread and cake " "baking needs "
## [4] "juice sat cord ms" "biscuits" "canned vegetables "
## [7] "cleaners polishers" "coffee" "sauces gravy pkle"
## [10] "confectionary" "dishcloths scour" "frozen foods "
## [13] "razor blades " "party snack foods " "tissues paper prd "
## [16] "wrapping" "mens toiletries " "cheese"
## [19] "milk cream" "margarine" "small goods "
## [22] "fruit" "vegetables" "750ml white nz "
## [25] "canned fish meat " "canned fruit " "deod disinfectant"
## [28] "pet foods " "laundry needs " "deodorants soap"
## [31] "haircare" "puddings deserts" "health food other "
## [34] "dairy foods " "beef" "lamb"
## [37] "stationary" "breakfast food " "jams spreads"
## [40] "prepared meals " "tea" "dental needs "
## [43] "meat misc " "poultry" "potatoes"
## [46] "condiments" "small goods2 " "spices"
## [49] "bake off products " "soft drinks " "sanitary pads "
## [52] "electrical" "750ml red imp " "sparkling nz "
## [55] "pork" "beverages hot " "lotions creams"
## [58] "cough cold pain" "pet food " "grocery misc "
## [61] "cigs tobacco pkts " "pkt canned soup " "produce misc "
## [64] "deli gourmet " "variety misc " "kitchen"
## [67] "imported cheese " "fuels garden aids " "cold meats"
## [70] "casks white wine " "cooking oils " "offal"
## [73] "manchester" "plasticware" "insecticides"
## [76] "brushware" "non host support " "haberdashery"
## [79] "port and sherry " "delicatessen misc " "health beauty misc "
## [82] "dried vegetables " "medicines" "trim pork "
## [85] "cigarette cartons " "plants" "preserving needs "
## [88] "750ml white imp " "hogget" "fruit drinks "
## [91] "750ml red nz " "casks red wine " "trim lamb "
## [94] "mutton" "sparkling imp " "pantyhose"
## [97] "salads" "chickens" "gourmet meat "
## [100] "veal"
The most frequently purchased item is bread and cake, with a frequency of 3330.
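The frequency quoted above can be obtained by counting how often each item name occurs. The sketch below uses base R on a small made-up data frame (`toy` is a hypothetical stand-in for the real data), so the counts differ from the real ones.

```r
# Toy stand-in for the supermarket data frame (columns TID and name).
toy <- data.frame(
  TID  = c(1, 1, 1, 2, 2, 3),
  name = c("bread and cake", "milk cream", "fruit",
           "bread and cake", "fruit", "bread and cake"),
  stringsAsFactors = FALSE
)

# Count how often each item appears, most frequent first.
item_counts <- sort(table(toy$name), decreasing = TRUE)
print(item_counts)

# Name and count of the most frequently purchased item.
top_item  <- names(item_counts)[1]
top_count <- as.integer(item_counts[1])
```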
We can see what was bought in each transaction by grouping the item names by TID; the first three transactions are:
## $`1`
## [1] "baby needs " "bread and cake " "baking needs "
## [4] "juice sat cord ms" "biscuits" "canned vegetables "
## [7] "cleaners polishers" "coffee" "sauces gravy pkle"
## [10] "confectionary" "dishcloths scour" "frozen foods "
## [13] "razor blades " "party snack foods " "tissues paper prd "
## [16] "wrapping" "mens toiletries " "cheese"
## [19] "milk cream" "margarine" "small goods "
## [22] "fruit" "vegetables" "750ml white nz "
##
## $`2`
## [1] "canned fish meat " "canned fruit " "canned vegetables "
## [4] "sauces gravy pkle" "deod disinfectant" "frozen foods "
## [7] "pet foods " "laundry needs " "tissues paper prd "
## [10] "deodorants soap" "haircare" "milk cream"
## [13] "fruit" "vegetables"
##
## $`3`
## [1] "bread and cake " "baking needs " "juice sat cord ms"
## [4] "biscuits" "canned fruit " "sauces gravy pkle"
## [7] "puddings deserts" "wrapping" "health food other "
## [10] "small goods " "dairy foods " "beef"
## [13] "lamb" "fruit" "vegetables"
## [16] "stationary"
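The listing above has the shape of a list with one character vector per TID. One way to build such a list is base R's `split`. This is a sketch on made-up data, and `toy` and `toy_list` are hypothetical names.

```r
# Toy stand-in for the supermarket data frame (columns TID and name).
toy <- data.frame(
  TID  = c(1, 1, 1, 2, 2, 3),
  name = c("bread and cake", "milk cream", "fruit",
           "bread and cake", "fruit", "bread and cake"),
  stringsAsFactors = FALSE
)

# Group the item names by transaction ID: one character vector per transaction.
toy_list <- split(toy$name, toy$TID)
print(toy_list)

# Number of items bought in each transaction.
basket_size <- lengths(toy_list)
print(basket_size)
```

The list names become the transaction IDs, which is what lets the later conversion to transactions carry a transactionID column.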
supermarket_transactions <- as(supermarket_list, "transactions")
supermarket_transactions %>%
  head(2) %>%
  inspect()
## items transactionID
## [1] {750ml white nz ,
## baby needs ,
## baking needs ,
## biscuits,
## bread and cake ,
## canned vegetables ,
## cheese,
## cleaners polishers,
## coffee,
## confectionary,
## dishcloths scour,
## frozen foods ,
## fruit,
## juice sat cord ms,
## margarine,
## mens toiletries ,
## milk cream,
## party snack foods ,
## razor blades ,
## sauces gravy pkle,
## small goods ,
## tissues paper prd ,
## vegetables,
## wrapping} 1
## [2] {canned fish meat ,
## canned fruit ,
## canned vegetables ,
## deod disinfectant,
## deodorants soap,
## frozen foods ,
## fruit,
## haircare,
## laundry needs ,
## milk cream,
## pet foods ,
## sauces gravy pkle,
## tissues paper prd ,
## vegetables} 2
## [1] "ngCMatrix"
## attr(,"package")
## [1] "Matrix"
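The three output lines above look like the class of the sparse matrix stored inside a transactions object, but the command that printed them is not shown, so the following is an assumption. The sketch converts a small made-up list (`toy_list`, hypothetical) to transactions with arules and inspects the class of its `data` slot.

```r
library(arules)

# Toy baskets; the list names act as transaction IDs.
toy_list <- list(`1` = c("bread", "milk"),
                 `2` = c("bread", "fruit"),
                 `3` = c("milk"))

# Convert the list to the sparse transactions format used by apriori().
toy_trans <- as(toy_list, "transactions")

# The item-by-transaction incidence matrix is a sparse logical matrix.
print(class(toy_trans@data))

# Number of transactions and number of distinct items.
print(dim(toy_trans))
```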
# Mine association rules with minimum support 0.1 and minimum confidence 0.7
supermarket_rules <- apriori(data = supermarket_transactions, parameter = list(supp = 0.1, conf = 0.7))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 460
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[100 item(s), 4601 transaction(s)] done [0.01s].
## sorting and recoding items ... [47 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 done [0.06s].
## writing ... [15513 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Note: the computer could not handle supp = 0.01, so the rest of the analysis uses supp = 0.1.
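To see why the lower threshold is so much heavier, it helps to turn the relative support into a transaction count. The arithmetic below uses the 4601 transactions reported in the apriori log; the variable names are illustrative only.

```r
n_transactions <- 4601  # from the apriori log above

# Relative support thresholds expressed as transaction counts.
min_count_01  <- 0.1  * n_transactions  # about 460.1
min_count_001 <- 0.01 * n_transactions  # about 46.01

# An itemset occurs in a whole number of transactions, so the smallest
# qualifying count is the next integer up.
needed_01  <- ceiling(min_count_01)
needed_001 <- ceiling(min_count_001)
print(c(needed_01, needed_001))
```

With supp = 0.1 an itemset must appear in at least 461 transactions, which agrees with the minimum count in the rule summary. With supp = 0.01 it would only need 47, so far more itemsets and rules would survive the filter.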
## set of 15513 rules
##
## rule length distribution (lhs + rhs):sizes
## 1 2 3 4 5 6 7
## 1 92 1889 6093 5794 1570 74
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 4.000 4.000 4.456 5.000 7.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.1002 Min. :0.7000 Min. :0.1098 Min. :1.000
## 1st Qu.:0.1074 1st Qu.:0.7364 1st Qu.:0.1378 1st Qu.:1.174
## Median :0.1178 Median :0.7689 Median :0.1528 Median :1.235
## Mean :0.1291 Mean :0.7788 Mean :0.1668 Mean :1.247
## 3rd Qu.:0.1378 3rd Qu.:0.8164 3rd Qu.:0.1782 3rd Qu.:1.307
## Max. :0.7238 Max. :0.9205 Max. :1.0000 Max. :1.594
## count
## Min. : 461
## 1st Qu.: 494
## Median : 542
## Mean : 594
## 3rd Qu.: 634
## Max. :3330
##
## mining info:
## data ntransactions support confidence
## supermarket_transactions 4601 0.1 0.7
## lhs rhs support confidence coverage lift count
## [1] {biscuits,
## frozen foods ,
## milk cream,
## pet foods ,
## vegetables} => {bread and cake } 0.1032384 0.9205426 0.1121495 1.271897 475
## [2] {baking needs ,
## biscuits,
## fruit,
## margarine,
## milk cream,
## vegetables} => {bread and cake } 0.1008476 0.9188119 0.1097587 1.269506 464
## [3] {biscuits,
## frozen foods ,
## margarine,
## milk cream,
## vegetables} => {bread and cake } 0.1167138 0.9179487 0.1271463 1.268313 537
## [4] {biscuits,
## canned vegetables ,
## frozen foods ,
## fruit,
## vegetables} => {bread and cake } 0.1069333 0.9179104 0.1164964 1.268260 492
## [5] {baking needs ,
## frozen foods ,
## fruit,
## margarine,
## milk cream,
## vegetables} => {bread and cake } 0.1030211 0.9168279 0.1123669 1.266764 474
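The command that produced the listing above is not shown. A minimal sketch of ranking rules by confidence with arules is given below on made-up baskets; `toy_list`, `toy_trans` and `toy_rules` are hypothetical names, and `minlen = 2` is an extra assumption that drops rules with an empty left-hand side. Nested calls are used instead of `%>%` so that no pipe package is needed.

```r
library(arules)

# Five toy baskets.
toy_list <- list(c("bread", "milk"),
                 c("bread", "milk", "fruit"),
                 c("bread", "fruit"),
                 c("milk"),
                 c("bread", "milk"))
toy_trans <- as(toy_list, "transactions")

# Mine rules; verbose = FALSE suppresses the progress log.
toy_rules <- apriori(toy_trans,
                     parameter = list(supp = 0.3, conf = 0.7, minlen = 2),
                     control   = list(verbose = FALSE))

# Sort by confidence (descending) and keep at most five rules.
top_conf <- head(sort(toy_rules, by = "confidence"), 5)
inspect(top_conf)
```

On these baskets the best rule is {fruit} => {bread}: both transactions containing fruit also contain bread.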
A high Confidence value indicates how likely a customer is to buy the consequent item given that they have already bought the antecedent items. The rule with the highest Confidence is {biscuits, frozen foods, milk cream, pet foods, vegetables} => {bread and cake}.
This means that when a customer buys those five items, they are very likely to buy bread and cake as well: of all transactions that contain the five antecedent items, 0.9205 or 92.05% also contain bread and cake.
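The 92.05% figure can be reproduced from the other columns of the top rule. The sketch below takes count = 475 and coverage = 0.1121495 from the rule listing and 4601 transactions from the apriori log; the variable names are illustrative only.

```r
n          <- 4601       # total transactions
count_rule <- 475        # transactions containing antecedent and consequent
coverage   <- 0.1121495  # share of transactions containing the antecedent

# Number of transactions that contain the five antecedent items.
count_lhs <- round(coverage * n)

# Support: share of all transactions that contain the whole rule.
support <- count_rule / n

# Confidence: share of antecedent transactions that also contain the consequent.
confidence <- count_rule / count_lhs

print(c(count_lhs = count_lhs, support = support, confidence = confidence))
```

So 475 of the 516 transactions that contain the antecedent also contain bread and cake.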
The weakness of ranking rules by Confidence is that Confidence only looks at the transactions containing the antecedent and ignores how often the consequent occurs in all other transactions. We therefore also need to check how much knowing that a customer bought the antecedent items raises the probability of buying the consequent, compared with not knowing it. So let us examine the 10 rules with the highest Lift.
## lhs rhs support confidence coverage lift count
## [1] {baking needs ,
## biscuits,
## bread and cake ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1010650 0.8072917 0.1251902 1.594141 465
## [2] {laundry needs ,
## wrapping} => {tissues paper prd } 0.1038905 0.7697262 0.1349707 1.576106 478
## [3] {biscuits,
## bread and cake ,
## frozen foods ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1056292 0.7928222 0.1332319 1.565569 486
## [4] {frozen foods ,
## party snack foods ,
## prepared meals } => {sauces gravy pkle} 0.1049772 0.7442219 0.1410563 1.555731 483
## [5] {biscuits,
## margarine,
## wrapping} => {tissues paper prd } 0.1006303 0.7565359 0.1330146 1.549097 463
## [6] {baking needs ,
## biscuits,
## cheese,
## tissues paper prd } => {margarine} 0.1019344 0.7701149 0.1323625 1.548645 469
## [7] {biscuits,
## juice sat cord ms,
## sauces gravy pkle,
## tissues paper prd } => {party snack foods } 0.1025864 0.7840532 0.1308411 1.548253 472
## [8] {baking needs ,
## bread and cake ,
## frozen foods ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1028037 0.7831126 0.1312758 1.546395 473
## [9] {baking needs ,
## biscuits,
## bread and cake ,
## laundry needs } => {tissues paper prd } 0.1049772 0.7523364 0.1395349 1.540498 483
## [10] {biscuits,
## frozen foods ,
## juice sat cord ms,
## sauces gravy pkle} => {party snack foods } 0.1225820 0.7800830 0.1571398 1.540413 564
Based on the results above, the rule {baking needs, biscuits, bread and cake, juice sat cord ms, sauces gravy pkle} => {party snack foods} has the largest Lift, 1.594141. Because this Lift is greater than 1, buying {baking needs, biscuits, bread and cake, juice sat cord ms, sauces gravy pkle} does increase the probability that a customer buys party snack foods. By comparison, the rule with the highest Confidence, {biscuits, frozen foods, milk cream, pet foods, vegetables} => {bread and cake}, has a Lift of only 1.271897. Buying those five items does raise the probability of buying bread and cake, but the effect is small compared with other rules.
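The gap between the two rules comes from how common each consequent is, since Lift is confidence divided by the support of the consequent. The sketch below recomputes the Lift of the top-confidence rule from figures in the output (475 of 516 antecedent transactions, bread and cake in 3330 of 4601 transactions) and backs out the implied support of party snack foods from the top-lift rule. The variable names are illustrative only.

```r
n <- 4601  # total transactions

# Top-confidence rule: consequent is bread and cake.
conf_bread <- 475 / 516           # confidence, about 0.9205
supp_bread <- 3330 / n            # support of bread and cake, about 0.7238
lift_bread <- conf_bread / supp_bread

# Top-lift rule: consequent is party snack foods.
conf_snack <- 0.8072917
lift_snack <- 1.594141
supp_snack <- conf_snack / lift_snack    # implied support of the consequent
count_snack <- round(supp_snack * n)     # implied number of transactions

print(c(lift_bread = lift_bread, supp_snack = supp_snack,
        count_snack = count_snack))
```

Bread and cake is already in about 72% of all baskets, so even a confidence of 92% is only a modest improvement. Party snack foods appears in about 51% of baskets, so a confidence of 81% is a larger relative gain.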
Among the generated rules, only a few have both a high Confidence and a high Lift; most have only one of the two. No rule has a Lift below 1, so none of the rules lowers the probability of buying its consequent. The minimum of 1.000 in the summary is consistent with the single length-1 rule, whose empty antecedent cannot change that probability; the remaining rules raise it. All confidence values are at least 0.7, which follows directly from the minimum confidence threshold passed to apriori.
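Rules that are strong on both measures can be picked out with `subset` from arules. The sketch below reuses made-up baskets, not the supermarket data; the names and the cut-offs 0.9 and 1.2 are arbitrary assumptions chosen for illustration. Unlike the supermarket result, this toy set does contain rules with Lift below 1.

```r
library(arules)

# Five toy baskets.
toy_list <- list(c("bread", "milk"),
                 c("bread", "milk", "fruit"),
                 c("bread", "fruit"),
                 c("milk"),
                 c("bread", "milk"))
toy_trans <- as(toy_list, "transactions")

toy_rules <- apriori(toy_trans,
                     parameter = list(supp = 0.3, conf = 0.7, minlen = 2),
                     control   = list(verbose = FALSE))

# Keep only rules with both high confidence and high lift.
strong_rules <- subset(toy_rules, subset = confidence >= 0.9 & lift > 1.2)
inspect(strong_rules)

# Count rules whose antecedent makes the consequent less likely.
n_lift_below_1 <- sum(quality(toy_rules)$lift < 1)
print(n_lift_below_1)
```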
Some rules achieve both a very good confidence and a very good lift.
The relationships among the generated rules can also be viewed as a graph or network, in which each circle or node is a rule and the arrows connect a rule to its items.