Association analysis is a methodology for finding interesting relationships (associations) hidden in large data sets. One application of association rules is market basket analysis.
Reading the Excel (.xlsx) data
library(xlsx)
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Loading required package: grid
# Read the first sheet of the workbook; the first row holds the column names
data.kafe. = read.xlsx("GeorgiaCafe.xlsx", 1, header = TRUE)
head(data.kafe.)
##   Txn.no. Tea Coffee Frappe Patties Samosa Soft.drinks Burgers Chips
## 1     561   0      0      0       0      0           0       0     1
## 2     368   0      3      0       2      0           0       0     0
## 3     668   3      2      0       0      1           0       0     0
## 4     549   1      1      0       0      1           0       0     0
## 5     381   3      3      0       0      2           0       1     0
## 6     456   3      0      0       3      0           0       0     0
Checking for missing values
colSums(is.na(data.kafe.))
##     Txn.no.         Tea      Coffee      Frappe     Patties      Samosa
##           0           0           0           0           0           0
## Soft.drinks     Burgers       Chips
##           0           0           0
Inspecting the data structure
str(data.kafe.)
## 'data.frame': 148 obs. of 9 variables:
## $ Txn.no. : num 561 368 668 549 381 456 506 569 607 351 ...
## $ Tea : num 0 0 3 1 3 3 2 0 3 0 ...
## $ Coffee : num 0 3 2 1 3 0 0 2 2 2 ...
## $ Frappe : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Patties : num 0 2 0 0 0 3 1 3 0 0 ...
## $ Samosa : num 0 0 1 1 2 0 0 0 1 0 ...
## $ Soft.drinks: num 0 0 0 0 0 0 0 1 0 0 ...
## $ Burgers : num 0 0 0 0 1 0 0 0 1 0 ...
## $ Chips : num 1 0 0 0 0 0 0 1 0 1 ...
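Converting the data to logical (0/1) values so that it can be coerced to the transactions type
# Any non-zero quantity becomes TRUE (item bought); zero becomes FALSE
data.kafe. = apply(data.kafe., c(1, 2), as.logical)
# Turn TRUE/FALSE back into 1/0
data.kafe. = apply(data.kafe., c(1, 2), as.numeric)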
str(data.kafe.)
## num [1:148, 1:9] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:9] "Txn.no." "Tea" "Coffee" "Frappe" ...
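The same binarization can be written without `apply()`. The following is a minimal sketch, not the code used in this analysis: `toy` and `toy.logical` are hypothetical names, and the values are the first three rows of the `head()` output above, restricted to three items.

```r
# Hypothetical toy data with the same layout as data.kafe.
# (an ID column followed by item quantities)
toy <- data.frame(
  Txn.no. = c(561, 368, 668),
  Tea     = c(0, 0, 3),
  Coffee  = c(0, 3, 2),
  Chips   = c(1, 0, 0)
)

# Drop the ID column, then flag every item with a positive quantity
toy.logical <- as.matrix(toy[, -1]) > 0
toy.logical
```

A logical matrix of this shape can be handed to `as(..., "transactions")` in the same way as the 0/1 matrix built above.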
Converting the data to the transactions type
# Drop the Txn.no. column (column 1) so that only the items remain
data.kafe.transaksi = as(data.kafe.[,-1],"transactions")
head(data.kafe.transaksi)
## transactions in sparse format with
## 6 transactions (rows) and
## 8 items (columns)
Using the apriori algorithm to mine association rules
Minimum_confidence  <- 0.8    # keep rules with confidence >= 0.8
Minimum_support     <- 0.001  # minimum relative support
Minimum_rule_length <- 2      # at least two items per rule (non-empty LHS)
Maximum_rule_length <- 5      # at most five items per rule
rules1 = apriori(data.kafe.transaksi,
                 parameter = list(supp = Minimum_support, conf = Minimum_confidence,
                                  maxlen = Maximum_rule_length, minlen = Minimum_rule_length))
## Apriori
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      2
##  maxlen target   ext
##       5  rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## Absolute minimum support count: 0
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[8 item(s), 148 transaction(s)] done [0.00s].
## sorting and recoding items ... [8 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5
## Warning in apriori(data.kafe.transaksi, parameter = list(supp =
## Minimum_support, : Mining stopped (maxlen reached). Only patterns up to a
## length of 5 returned!
## done [0.00s].
## writing ... [76 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
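The log line "Absolute minimum support count: 0" follows from the chosen threshold: 0.001 of 148 transactions is less than one transaction, so any itemset that occurs at least once passes the support filter. The arithmetic below is a small sketch of that; `min_count` is a hypothetical stricter cut-off, not a value used in this analysis.

```r
n_transactions <- 148     # from the apriori log above
min_support    <- 0.001   # Minimum_support used above

# Number of transactions that the support threshold corresponds to
support_in_transactions <- min_support * n_transactions
support_in_transactions   # 0.148, i.e. less than one transaction

# Hypothetical stricter rule: require at least 5 supporting transactions
min_count <- 5
stricter_support <- min_count / n_transactions
stricter_support          # about 0.034
```

Passing a value such as `stricter_support` as `supp` would be one way to avoid rules that rest on a single basket.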
Visualizing the association rules for the cafe data
plot(rules1)
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
Visualization with the grouped method
plot(rules1, method = "grouped")
Visualization with the paracoord (parallel coordinates) method
plot(rules1, method = "paracoord")
Visualization with the graph method
plot(rules1, method = "graph",
     control = list(layout = igraph::with_fr()))
Sorting the rules by the largest lift and support values
top.lift = sort(rules1, by ="lift")
df.top.lift<-DATAFRAME(top.lift, separate = TRUE)
head(df.top.lift)
##                             LHS      RHS     support confidence     lift count
## 4           {Soft.drinks,Chips} {Coffee} 0.006756757          1 1.644444     1
## 10              {Burgers,Chips} {Coffee} 0.013513514          1 1.644444     2
## 20               {Samosa,Chips} {Coffee} 0.074324324          1 1.644444    11
## 25 {Patties,Soft.drinks,Chips} {Coffee} 0.006756757          1 1.644444     1
## 29      {Samosa,Burgers,Chips} {Coffee} 0.006756757          1 1.644444     1
## 31     {Patties,Burgers,Chips} {Coffee} 0.013513514          1 1.644444     2
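The lift values above can be cross-checked by hand. Assuming the standard definition lift = confidence / support(RHS), a rule with confidence 1 and lift 1.644444 implies how often {Coffee} occurs overall. The variable names below are illustrative only.

```r
confidence <- 1          # from the table above
lift       <- 1.644444   # from the table above
n          <- 148        # number of transactions

# Implied relative support of the right-hand side {Coffee}
support_rhs <- confidence / lift
support_rhs              # about 0.608

# Implied number of transactions that contain Coffee
coffee_count <- support_rhs * n
round(coffee_count)      # about 90 of the 148 transactions
```

Because all six rules share the same right-hand side and a confidence of 1, they necessarily share the same lift.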
top.support <- sort(rules1, decreasing = TRUE, na.last = NA, by = "support")
df.top.support<-DATAFRAME(top.support, separate = TRUE)
head(df.top.support)
Converting the factor columns to character type
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:arules':
##
## intersect, recode, setdiff, setequal, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
df.top.support %>% mutate_if(is.factor, as.character) -> df.top.support
Removing the curly braces { and } from the LHS column
df.top.support$LHS <- gsub("\\{|\\}", "", df.top.support$LHS)
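To see what the pattern does in isolation, here is a small sketch on two sample values taken from the rules above. `lhs`, `cleaned`, and `cleaned.alt` are hypothetical names; the character-class pattern is an equivalent alternative, not the form used in this analysis.

```r
# Sample LHS values in the format produced by DATAFRAME()
lhs <- c("{Samosa,Chips}", "{Frappe}")

# Pattern used above: match either "{" or "}" and replace it with nothing
cleaned <- gsub("\\{|\\}", "", lhs)
cleaned

# Equivalent form using a character class
cleaned.alt <- gsub("[{}]", "", lhs)
cleaned.alt
```

Only the LHS column is cleaned in this analysis, so the RHS column keeps its braces; the same call on `df.top.support$RHS` would presumably strip them as well.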
Rounding the support, confidence, and lift values
df.top.support$support <- round(df.top.support$support,digits = 3)
df.top.support$confidence <- round(df.top.support$confidence,digits = 2)
df.top.support$lift <- round(df.top.support$lift,digits = 2)
The df.top.support data after cleanup
df.top.support
## LHS RHS support confidence lift count
## 1 Coffee,Samosa {Tea} 0.318 0.82 1.05 47
## 2 Frappe {Tea} 0.142 0.81 1.03 21
## 3 Burgers {Tea} 0.108 0.94 1.20 16
## 4 Frappe,Samosa {Tea} 0.108 0.80 1.02 16
## 5 Coffee,Burgers {Tea} 0.074 0.92 1.17 11
## 6 Samosa,Chips {Coffee} 0.074 1.00 1.64 11
## 7 Patties,Burgers {Tea} 0.068 0.91 1.16 10
## 8 Samosa,Burgers {Tea} 0.054 0.89 1.13 8
## 9 Patties,Samosa,Chips {Coffee} 0.054 1.00 1.64 8
## 10 Tea,Samosa,Chips {Coffee} 0.054 1.00 1.64 8
## 11 Coffee,Frappe {Tea} 0.047 0.88 1.12 7
## 12 Coffee,Patties,Burgers {Tea} 0.047 0.88 1.12 7
## 13 Coffee,Frappe,Samosa {Tea} 0.041 1.00 1.28 6
## 14 Tea,Coffee,Frappe {Samosa} 0.041 0.86 1.31 6
## 15 Patties,Samosa,Burgers {Tea} 0.034 0.83 1.06 5
## 16 Coffee,Frappe,Patties {Tea} 0.034 0.83 1.06 5
## 17 Tea,Patties,Samosa,Chips {Coffee} 0.034 1.00 1.64 5
## 18 Soft.drinks {Tea} 0.027 0.80 1.02 4
## 19 Coffee,Samosa,Burgers {Tea} 0.027 0.80 1.02 4
## 20 Coffee,Frappe,Patties,Samosa {Tea} 0.027 1.00 1.28 4
## 21 Tea,Coffee,Frappe,Patties {Samosa} 0.027 0.80 1.22 4
## 22 Frappe,Chips {Patties} 0.020 1.00 1.48 3
## 23 Frappe,Chips {Tea} 0.020 1.00 1.28 3
## 24 Frappe,Patties,Chips {Tea} 0.020 1.00 1.28 3
## 25 Tea,Frappe,Chips {Patties} 0.020 1.00 1.48 3
## 26 Coffee,Soft.drinks {Patties} 0.014 1.00 1.48 2
## 27 Samosa,Soft.drinks {Tea} 0.014 1.00 1.28 2
## 28 Burgers,Chips {Coffee} 0.014 1.00 1.64 2
## 29 Burgers,Chips {Patties} 0.014 1.00 1.48 2
## 30 Frappe,Burgers {Samosa} 0.014 1.00 1.53 2
## 31 Frappe,Burgers {Patties} 0.014 1.00 1.48 2
## 32 Frappe,Burgers {Tea} 0.014 1.00 1.28 2
## 33 Coffee,Burgers,Chips {Patties} 0.014 1.00 1.48 2
## 34 Patties,Burgers,Chips {Coffee} 0.014 1.00 1.64 2
## 35 Frappe,Samosa,Burgers {Patties} 0.014 1.00 1.48 2
## 36 Frappe,Patties,Burgers {Samosa} 0.014 1.00 1.53 2
## 37 Frappe,Samosa,Burgers {Tea} 0.014 1.00 1.28 2
## 38 Tea,Frappe,Burgers {Samosa} 0.014 1.00 1.53 2
## 39 Frappe,Patties,Burgers {Tea} 0.014 1.00 1.28 2
## 40 Tea,Frappe,Burgers {Patties} 0.014 1.00 1.48 2
## 41 Frappe,Patties,Samosa,Burgers {Tea} 0.014 1.00 1.28 2
## 42 Tea,Frappe,Samosa,Burgers {Patties} 0.014 1.00 1.48 2
## 43 Tea,Frappe,Patties,Burgers {Samosa} 0.014 1.00 1.53 2
## 44 Soft.drinks,Chips {Coffee} 0.007 1.00 1.64 1
## 45 Soft.drinks,Chips {Patties} 0.007 1.00 1.48 1
## 46 Frappe,Soft.drinks {Samosa} 0.007 1.00 1.53 1
## 47 Frappe,Soft.drinks {Tea} 0.007 1.00 1.28 1
## 48 Coffee,Soft.drinks,Chips {Patties} 0.007 1.00 1.48 1
## 49 Patties,Soft.drinks,Chips {Coffee} 0.007 1.00 1.64 1
## 50 Frappe,Samosa,Soft.drinks {Tea} 0.007 1.00 1.28 1
## 51 Tea,Frappe,Soft.drinks {Samosa} 0.007 1.00 1.53 1
## 52 Tea,Coffee,Soft.drinks {Patties} 0.007 1.00 1.48 1
## 53 Samosa,Burgers,Chips {Coffee} 0.007 1.00 1.64 1
## 54 Tea,Burgers,Chips {Coffee} 0.007 1.00 1.64 1
## 55 Samosa,Burgers,Chips {Patties} 0.007 1.00 1.48 1
## 56 Tea,Burgers,Chips {Patties} 0.007 1.00 1.48 1
## 57 Coffee,Frappe,Chips {Samosa} 0.007 1.00 1.53 1
## 58 Frappe,Samosa,Chips {Coffee} 0.007 1.00 1.64 1
## 59 Coffee,Frappe,Chips {Patties} 0.007 1.00 1.48 1
## 60 Coffee,Frappe,Chips {Tea} 0.007 1.00 1.28 1
## 61 Frappe,Samosa,Chips {Patties} 0.007 1.00 1.48 1
## 62 Frappe,Samosa,Chips {Tea} 0.007 1.00 1.28 1
## 63 Coffee,Samosa,Burgers,Chips {Patties} 0.007 1.00 1.48 1
## 64 Patties,Samosa,Burgers,Chips {Coffee} 0.007 1.00 1.64 1
## 65 Tea,Coffee,Burgers,Chips {Patties} 0.007 1.00 1.48 1
## 66 Tea,Patties,Burgers,Chips {Coffee} 0.007 1.00 1.64 1
## 67 Coffee,Frappe,Samosa,Chips {Patties} 0.007 1.00 1.48 1
## 68 Coffee,Frappe,Patties,Chips {Samosa} 0.007 1.00 1.53 1
## 69 Frappe,Patties,Samosa,Chips {Coffee} 0.007 1.00 1.64 1
## 70 Coffee,Frappe,Samosa,Chips {Tea} 0.007 1.00 1.28 1
## 71 Tea,Coffee,Frappe,Chips {Samosa} 0.007 1.00 1.53 1
## 72 Tea,Frappe,Samosa,Chips {Coffee} 0.007 1.00 1.64 1
## 73 Coffee,Frappe,Patties,Chips {Tea} 0.007 1.00 1.28 1
## 74 Tea,Coffee,Frappe,Chips {Patties} 0.007 1.00 1.48 1
## 75 Frappe,Patties,Samosa,Chips {Tea} 0.007 1.00 1.28 1
## 76 Tea,Frappe,Samosa,Chips {Patties} 0.007 1.00 1.48 1
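Many of the rules listed above rest on only one or two transactions. One possible follow-up, sketched here under assumptions, is to keep only rules with a minimum count. `rules.sample` holds three rows copied from the table as a stand-in for `df.top.support`, and the cut-off of 5 is a hypothetical choice.

```r
# Three rows copied from the table above, standing in for df.top.support
rules.sample <- data.frame(
  LHS   = c("Coffee,Samosa", "Samosa,Chips", "Soft.drinks,Chips"),
  RHS   = c("{Tea}", "{Coffee}", "{Coffee}"),
  count = c(47, 11, 1)
)

# Hypothetical cut-off: keep rules seen in at least 5 transactions
min.count  <- 5
rules.kept <- rules.sample[rules.sample$count >= min.count, ]
rules.kept
```

The same row filter would apply unchanged to the full `df.top.support` data frame, since it has a `count` column.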
Support is an important measure because a rule with low support may reflect nothing more than coincidence. Low-support rules also tend to be uninteresting from a business perspective, since promoting items that customers rarely buy together is unlikely to be profitable. For this reason, support is often used to filter out uninteresting rules. Confidence measures the reliability of the inference made by a rule: for a rule X → Y, the higher the confidence, the more likely Y is to be present in transactions that contain X. Confidence also provides an estimate of the conditional probability of Y given X.
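These measures can be computed by hand on a small example. The sketch below uses a made-up set of five baskets (not the cafe data) and the rule {Samosa} → {Tea}, assuming the usual definitions: support is the share of transactions containing both sides, confidence is that count divided by the number of transactions containing the left-hand side, and lift is confidence divided by the support of the right-hand side.

```r
# Five made-up baskets; TRUE means the item was bought
baskets <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    1, 1, 1,
    0, 1, 0,
    1, 0, 1),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("Tea", "Coffee", "Samosa"))
) == 1

has_lhs <- baskets[, "Samosa"]   # X = {Samosa}
has_rhs <- baskets[, "Tea"]      # Y = {Tea}

# Share of all baskets that contain both X and Y
support <- mean(has_lhs & has_rhs)

# Among baskets that contain X, the share that also contain Y
confidence <- sum(has_lhs & has_rhs) / sum(has_lhs)

# How much more often Y appears with X than it does overall
lift <- confidence / mean(has_rhs)

c(support = support, confidence = confidence, lift = lift)
```

In this toy data every basket with Samosa also contains Tea, so the confidence is 1, while the lift stays modest because Tea is already present in four of the five baskets.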