Basket analysis is a technique for analyzing customer behavior from transaction history. It helps the sales team understand customers' tastes and buying habits so that marketing campaigns can be planned sensibly.
When customers go to the supermarket, they tend to buy several products at once. For example, customer A's basket contains (milk, bread, beer, cigarettes), while customer B's basket contains (milk, bread, toothbrush, toothpaste), and so on.
The questions are: which products do customers usually buy, and after buying product X, which product Y or Z are they likely to buy next?
\(\Rightarrow\) Basket analysis answers these questions.
Basket analysis relies on a few basic concepts: association rules, support, confidence, lift, and conviction.
An association rule is defined as follows:
\[X \Rightarrow Y \quad \text{where } X \subset I,\ Y \subset I \text{ and } X \cap Y = \emptyset \]
\(\Rightarrow\) Put simply, when a customer buys the group of products X, they are likely to also buy product Y with some probability.
| Transaction | Items |
|---|---|
| t1 | {T-shirt, Trousers, Belt} |
| t2 | {T-shirt, Jacket} |
| t3 | {Jacket, Gloves} |
| t4 | {T-shirt, Trousers, Jacket} |
| t5 | {T-shirt, Trousers, Sneakers, Jacket, Belt} |
| t6 | {Trousers, Sneakers, Belt} |
| t7 | {Trousers, Belt, Sneakers} |
Let the set of products (items) be \(I=\{i_1, i_2, \ldots, i_k\}\). Here: \(I = \{\text{T-shirt}, \text{Trousers}, \text{Belt}, \text{Jacket}, \text{Gloves}, \text{Sneakers}\}\)
Transactions: \(T = \{t_1, t_2, \ldots, t_n\}\). For example: \(t_1=\{\text{T-shirt}, \text{Trousers}, \text{Belt}\}\)
\(\Rightarrow\) Association rule: \(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}\)
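The example table and rule can be written down in base R. This is only a minimal sketch: the names `transactions`, `items`, `X`, `Y`, and `is_valid_rule` are illustrative assumptions and do not come from any package.

```r
# Toy transactions from the table above, stored as a named list of item vectors
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# The item set I is the union of all items seen in the transactions
items <- unique(unlist(transactions))

# Candidate rule X => Y
X <- c("T-shirt", "Trousers")
Y <- "Belt"

# A valid rule needs X and Y inside I and no item shared between X and Y
is_valid_rule <- all(X %in% items) && all(Y %in% items) &&
  length(intersect(X, Y)) == 0
print(is_valid_rule)
```

A plain list of character vectors is enough for hand calculations on a table this small.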
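Support of a rule: how frequently the product groups X and Y appear together across all baskets, i.e. the number of baskets that contain both X and Y divided by the total number of baskets. Writing \(supp(X)\) for the share of transactions that contain every item of \(X\), we have \(supp(X \Rightarrow Y) = supp(X \cup Y)\).
\[ supp(X \Rightarrow Y)=\dfrac{|\{t \in T : X \cup Y \subseteq t\}|}{n} \]
For example:
\(supp(\text{T-shirt} \Rightarrow \text{Trousers})=\dfrac{3}{7} \approx 43 \%\)
\(supp(\text{Trousers} \Rightarrow \text{Belt})=\dfrac{4}{7} \approx 57 \%\)
\(supp(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\})=\dfrac{2}{7} \approx 29 \%\)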
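The support values above can be checked with a few lines of base R. This is a sketch under the assumption that the table is stored as a list; `supp` and `supp_rule` are hypothetical helper names, not functions from `arules`.

```r
# Toy transactions from the table above
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Support of an itemset: share of transactions containing every item in it
supp <- function(itemset, trans) {
  mean(vapply(trans, function(basket) all(itemset %in% basket), logical(1)))
}

# Support of a rule X => Y is the support of the combined itemset
supp_rule <- function(X, Y, trans) supp(union(X, Y), trans)

s1 <- supp_rule("T-shirt", "Trousers", transactions)
s2 <- supp_rule("Trousers", "Belt", transactions)
s3 <- supp_rule(c("T-shirt", "Trousers"), "Belt", transactions)
print(round(c(s1, s2, s3), 2))
```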
Confidence of a rule: the percentage of baskets containing the product group X that also contain Y.
\[ conf(X \Rightarrow Y)=\dfrac{supp(X \cup Y)}{supp(X)} \]
\[ conf(\text{Trousers} \Rightarrow \text{Belt})=\dfrac{4/7}{5/7}= 80 \% \]
Similarly for other groups:
\(conf(\text{T-shirt} \Rightarrow \text{Belt})=\dfrac{2/7}{4/7}=50 \%\)
\(conf(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\})=\dfrac{2/7}{3/7} \approx 67 \%\)
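The same toy data can be used to reproduce the confidence figures. The sketch below assumes the list representation of the table; `n_with` and `conf` are illustrative names only.

```r
# Toy transactions from the table above
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Number of transactions that contain every item of the itemset
n_with <- function(itemset, trans) {
  sum(vapply(trans, function(basket) all(itemset %in% basket), logical(1)))
}

# conf(X => Y) = supp(X u Y) / supp(X); the 1/n factors cancel, so counts suffice
conf <- function(X, Y, trans) {
  n_with(union(X, Y), trans) / n_with(X, trans)
}

c1 <- conf("Trousers", "Belt", transactions)
c2 <- conf("T-shirt", "Belt", transactions)
c3 <- conf(c("T-shirt", "Trousers"), "Belt", transactions)
print(round(c(c1, c2, c3), 2))
```

Note that confidence is not symmetric: swapping X and Y changes the denominator.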
Lift of a rule: the ratio of the support of X and Y appearing together to the product of the support of X and the support of Y.
\(\Rightarrow\) The larger the lift, the stronger the association between X and Y; a lift of 1 means X and Y occur independently of each other.
\[ lift(X \Rightarrow Y)=\dfrac{supp(X \cup Y)}{supp(X)\,supp(Y)}\]
For example:
\(lift(\text{T-shirt} \Rightarrow \text{Trousers})=\dfrac{3/7}{(4/7)(5/7)}= 1.05\)
\(lift(\text{Trousers} \Rightarrow \text{Belt})=\dfrac{4/7}{(5/7)(4/7)}= 1.4\)
\(lift(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\})=\dfrac{2/7}{(3/7)(4/7)} \approx 1.17\)
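A short base R sketch can confirm the lift values on the toy table. The helper names `supp` and `lift` are assumptions made for illustration.

```r
# Toy transactions from the table above
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Share of transactions that contain every item of the itemset
supp <- function(itemset, trans) {
  mean(vapply(trans, function(basket) all(itemset %in% basket), logical(1)))
}

# lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y))
lift <- function(X, Y, trans) {
  supp(union(X, Y), trans) / (supp(X, trans) * supp(Y, trans))
}

l1 <- lift("T-shirt", "Trousers", transactions)
l2 <- lift("Trousers", "Belt", transactions)
l3 <- lift(c("T-shirt", "Trousers"), "Belt", transactions)
print(round(c(l1, l2, l3), 2))
```

Unlike confidence, lift is symmetric in X and Y, since both appear in the denominator.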
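Conviction of a rule is defined as follows:
\[ conv(X \Rightarrow Y)=\dfrac{1-supp(Y)}{1-conf(X \Rightarrow Y) } \]
\(\Rightarrow\) This measure compares how often X would occur without Y if X and Y were independent with how often X actually occurs without Y. Values above 1 mean the rule fails less often than it would by chance.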
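The source gives no worked example for conviction, so here is a hedged sketch that applies the formula above to the same toy table. The helper names `n_with` and `conv` are assumptions; the rules chosen are the ones already used in the support, confidence, and lift examples.

```r
# Toy transactions from the table above
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Number of transactions that contain every item of the itemset
n_with <- function(itemset, trans) {
  sum(vapply(trans, function(basket) all(itemset %in% basket), logical(1)))
}

# conv(X => Y) = (1 - supp(Y)) / (1 - conf(X => Y))
conv <- function(X, Y, trans) {
  supp_y <- n_with(Y, trans) / length(trans)
  conf_xy <- n_with(union(X, Y), trans) / n_with(X, trans)
  (1 - supp_y) / (1 - conf_xy)  # Inf when the rule never fails (conf = 1)
}

v1 <- conv("Trousers", "Belt", transactions)
v2 <- conv(c("T-shirt", "Trousers"), "Belt", transactions)
v3 <- conv("Sneakers", "Belt", transactions)
print(c(v1, v2, v3))
```

In R the division by zero yields `Inf` for a rule with confidence 1, which matches the usual reading that such a rule is never wrong in the data.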
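In summary: support measures how often X and Y appear together, confidence measures how often Y appears in baskets that contain X, lift measures how much more often X and Y co-occur than they would if independent, and conviction measures how much less often the rule fails than it would if X and Y were independent.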
We use the BreadBasket dataset. It has 4 columns, but only columns 3 and 4 are used for the analysis: column 3 is the transaction id (`Transaction`) and column 4 is the item (`Item`).
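# Packages used below: readr, dplyr and ggplot2 (via tidyverse), arules, gridExtra
library(tidyverse)
library(arules)
library(gridExtra)
trans_csv <- read_csv("data/BreadBasket_DMS.csv")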
## Parsed with column specification:
## cols(
## Date = col_date(format = ""),
## Time = col_time(format = ""),
## Transaction = col_double(),
## Item = col_character()
## )
trans_csv %>% head()
## # A tibble: 6 x 4
## Date Time Transaction Item
## <date> <time> <dbl> <chr>
## 1 2016-10-30 09:58 1 Bread
## 2 2016-10-30 10:05 2 Scandinavian
## 3 2016-10-30 10:05 2 Scandinavian
## 4 2016-10-30 10:07 3 Hot chocolate
## 5 2016-10-30 10:07 3 Jam
## 6 2016-10-30 10:07 3 Cookies
Read the data from disk in transaction form; see ?read.transactions for details on the arguments.
trans <- read.transactions("data/BreadBasket_DMS.csv", format="single", cols=c(3,4), sep=",", rm.duplicates=TRUE)
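Conceptually, the "single" format groups the item column by the transaction id column, and `rm.duplicates=TRUE` drops repeated items within a basket. The base R sketch below illustrates that idea on a hypothetical data frame that mirrors the six rows printed earlier; it is not how `arules` implements the step, and `long` and `baskets` are made-up names.

```r
# Hypothetical long-format data: one row per (transaction id, item)
long <- data.frame(
  Transaction = c(1, 2, 2, 3, 3, 3),
  Item = c("Bread", "Scandinavian", "Scandinavian",
           "Hot chocolate", "Jam", "Cookies"),
  stringsAsFactors = FALSE
)

# Group items by transaction id, then drop repeated items within each basket
baskets <- lapply(split(long$Item, long$Transaction), unique)
print(baskets)
```

Transaction 2 lists the same item twice in the raw data, so its basket collapses to a single item.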
First we try a grid of support and confidence thresholds in order to find a suitable pair. If the thresholds are set too low, the algorithm takes longer to run and returns more rules.
The choice of thresholds is a trade-off between support and confidence.
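The trade-off can be seen on the small clothing table from earlier. The sketch below is a brute-force illustration restricted to single-item rules {a} => {b}; it is not the Apriori algorithm, and `n_with`, `rules`, and `count_rules` are hypothetical names.

```r
# Toy transactions from the clothing example
transactions <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)
items <- unique(unlist(transactions))

# Number of transactions that contain every item of the itemset
n_with <- function(itemset) {
  sum(vapply(transactions, function(basket) all(itemset %in% basket), logical(1)))
}

# Every ordered pair of distinct items is a candidate rule {lhs} => {rhs}
rules <- expand.grid(lhs = items, rhs = items, stringsAsFactors = FALSE)
rules <- rules[rules$lhs != rules$rhs, ]
rules$n_xy <- mapply(function(a, b) n_with(c(a, b)), rules$lhs, rules$rhs,
                     USE.NAMES = FALSE)
rules$n_x <- vapply(rules$lhs, n_with, integer(1), USE.NAMES = FALSE)
rules$support <- rules$n_xy / length(transactions)
rules$confidence <- rules$n_xy / rules$n_x

# Count the candidate rules that pass both thresholds
count_rules <- function(min_sup, min_conf) {
  sum(rules$support >= min_sup & rules$confidence >= min_conf)
}

print(count_rules(0.5, 0.7))  # strict support threshold
print(count_rules(0.4, 0.7))  # looser support threshold
print(count_rules(0.4, 0.9))  # looser support, stricter confidence
```

Lowering either threshold can only keep or increase the number of rules, which is why the counts grow as the thresholds are relaxed.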
The plots below show the results for support levels of 10%, 5%, 1% and 0.5%.
# Support and confidence values
supportLevels <- c(0.1, 0.05, 0.01, 0.005)
confidenceLevels <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)
# Empty integers
rules_sup10 <- integer(length=9)
rules_sup5 <- integer(length=9)
rules_sup1 <- integer(length=9)
rules_sup0.5 <- integer(length=9)
# Apriori algorithm with a support level of 10%
for (i in 1:length(confidenceLevels)) {
rules_sup10[i] <- length(apriori(trans, parameter=list(sup=supportLevels[1],
conf=confidenceLevels[i], target="rules")))
}
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 661
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Apriori algorithm with a support level of 5%
for (i in 1:length(confidenceLevels)) {
rules_sup5[i] <- length(apriori(trans, parameter=list(sup=supportLevels[2],
conf=confidenceLevels[i], target="rules")))
}
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [4 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [5 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.05 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 330
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [10 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Apriori algorithm with a support level of 1%
for (i in 1:length(confidenceLevels)) {
rules_sup1[i] <- length(apriori(trans, parameter=list(sup=supportLevels[3],
conf=confidenceLevels[i], target="rules")))
}
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.01s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [1 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [13 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [18 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [22 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [36 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [48 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Apriori algorithm with a support level of 0.5%
for (i in 1:length(confidenceLevels)) {
rules_sup0.5[i] <- length(apriori(trans, parameter=list(sup=supportLevels[4],
conf=confidenceLevels[i], target="rules")))
}
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.9 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.7 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.01s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [19 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.4 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [32 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [42 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [73 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.1 0.1 1 none FALSE TRUE 5 0.005 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 33
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [123 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
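# Plot the number of rules found against the confidence level for one support level
f_plot <- function(confidenceLevels, rules_sup, sup_lev = "10%") {
  # ggplot() is used here because qplot() is deprecated in recent ggplot2 releases
  ggplot(data.frame(confidenceLevels, rules_sup),
         aes(x = confidenceLevels, y = rules_sup)) +
    geom_point() +
    geom_line() +
    labs(x = "Confidence level", y = "Number of rules found",
         title = paste0("Apriori with a support level of ", sup_lev)) +
    theme_bw()
}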
# Number of rules found with a support level of 10%
plot1 <- f_plot(confidenceLevels, rules_sup10, "10%")
# Number of rules found with a support level of 5%
plot2 <- f_plot(confidenceLevels, rules_sup5, "5%")
# Number of rules found with a support level of 1%
plot3 <- f_plot(confidenceLevels, rules_sup1, "1%")
# Number of rules found with a support level of 0.5%
plot4 <- f_plot(confidenceLevels, rules_sup0.5, "0.5%")
# Subplot
grid.arrange(plot1, plot2, plot3, plot4, ncol=2)
# Data frame
num_rules <- data.frame(rules_sup10, rules_sup5, rules_sup1, rules_sup0.5, confidenceLevels)
# Number of rules found with a support level of 10%, 5%, 1% and 0.5%
ggplot(data=num_rules, aes(x=confidenceLevels)) +
  # Plot line and points (support level of 10%)
  geom_line(aes(y=rules_sup10, colour="Support level of 10%")) +
  geom_point(aes(y=rules_sup10, colour="Support level of 10%")) +
  # Plot line and points (support level of 5%)
  geom_line(aes(y=rules_sup5, colour="Support level of 5%")) +
  geom_point(aes(y=rules_sup5, colour="Support level of 5%")) +
  # Plot line and points (support level of 1%)
  geom_line(aes(y=rules_sup1, colour="Support level of 1%")) +
  geom_point(aes(y=rules_sup1, colour="Support level of 1%")) +
  # Plot line and points (support level of 0.5%)
  geom_line(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  geom_point(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  # Labs and theme
  labs(x="Confidence levels", y="Number of rules found",
       title="Apriori algorithm with different support levels") +
  theme_bw() +
  theme(legend.title=element_blank())
With support = 10%, only a few rules are found, and only at low confidence levels. This means there are no frequent associations between products at this threshold, so we cannot use this support level.
With support = 5%, the highest confidence that still yields rules is about 50%. This suggests choosing a support level below 5% so that rules with a reasonable, higher confidence can be found.
With support = 1%, dozens of rules are found.
With support = 0.5%, there are too many rules to analyze!
# Support of 1% (supportLevels[3]) and confidence of 20% (confidenceLevels[8])
rules_sup1_conf20 <- apriori(trans, parameter = list(sup=supportLevels[3],
                             conf = confidenceLevels[8], target = "rules"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.2 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 66
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [36 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
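As a quick check on the log, the "Absolute minimum support count" line appears consistent with multiplying the relative support by the number of transactions and truncating to an integer. This is an inference from the numbers shown here, not a statement about how the library computes it internally:

\[ \lfloor 0.01 \times 6614 \rfloor = \lfloor 66.14 \rfloor = 66 \]

The earlier run with support = 0.005 fits the same pattern: \(\lfloor 0.005 \times 6614 \rfloor = \lfloor 33.07 \rfloor = 33\). In other words, an itemset must appear in at least 66 baskets to be kept at the 1% level.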
inspect(rules_sup1_conf20)
## lhs rhs support confidence lift count
## [1] {} => {Bread} 0.32446326 0.3244633 1.0000000 2146
## [2] {} => {Coffee} 0.48200786 0.4820079 1.0000000 3188
## [3] {Tiffin} => {Coffee} 0.01058361 0.5468750 1.1345769 70
## [4] {Spanish Brunch} => {Coffee} 0.01406108 0.6326531 1.3125368 93
## [5] {Scone} => {Coffee} 0.01844572 0.5422222 1.1249240 122
## [6] {Muffin} => {Coffee} 0.01799214 0.4958333 1.0286831 119
## [7] {Toast} => {Coffee} 0.02570305 0.7296137 1.5136967 170
## [8] {Soup} => {Coffee} 0.01708497 0.4870690 1.0105000 113
## [9] {Alfajores} => {Bread} 0.01118839 0.2761194 0.8510036 74
## [10] {Alfajores} => {Coffee} 0.02237678 0.5522388 1.1457050 148
## [11] {Brownie} => {Bread} 0.01179317 0.2689655 0.8289552 78
## [12] {Brownie} => {Coffee} 0.02086483 0.4758621 0.9872496 138
## [13] {Juice} => {Coffee} 0.02131842 0.5300752 1.0997231 141
## [14] {Hot chocolate} => {Bread} 0.01194436 0.2309942 0.7119270 79
## [15] {Hot chocolate} => {Coffee} 0.02721500 0.5263158 1.0919237 180
## [16] {Medialuna} => {Bread} 0.01632900 0.2849604 0.8782517 108
## [17] {Medialuna} => {Coffee} 0.03296039 0.5751979 1.1933372 218
## [18] {Cookies} => {Bread} 0.01511944 0.2673797 0.8240677 100
## [19] {Cookies} => {Coffee} 0.02978530 0.5267380 1.0927995 197
## [20] {NONE} => {Tea} 0.01693378 0.2357895 1.6572918 112
## [21] {NONE} => {Bread} 0.01874811 0.2610526 0.8045676 124
## [22] {NONE} => {Coffee} 0.04172966 0.5810526 1.2054837 276
## [23] {Sandwich} => {Bread} 0.01693378 0.2271805 0.7001733 112
## [24] {Sandwich} => {Coffee} 0.04233444 0.5679513 1.1783030 280
## [25] {Pastry} => {Bread} 0.02963411 0.3402778 1.0487406 196
## [26] {Pastry} => {Coffee} 0.04868461 0.5590278 1.1597897 322
## [27] {Cake} => {Tea} 0.02615664 0.2492795 1.7521093 173
## [28] {Cake} => {Bread} 0.02328394 0.2219020 0.6839049 154
## [29] {Cake} => {Coffee} 0.05654672 0.5389049 1.1180417 374
## [30] {Tea} => {Bread} 0.02948292 0.2072264 0.6386743 195
## [31] {Tea} => {Coffee} 0.05170850 0.3634431 0.7540191 342
## [32] {Bread} => {Coffee} 0.08980950 0.2767940 0.5742521 594
## [33] {Bread,Pastry} => {Coffee} 0.01133958 0.3826531 0.7938731 75
## [34] {Coffee,Pastry} => {Bread} 0.01133958 0.2329193 0.7178602 75
## [35] {Cake,Tea} => {Coffee} 0.01118839 0.4277457 0.8874247 74
## [36] {Coffee,Tea} => {Cake} 0.01118839 0.2163743 2.0621029 74
plot(rules_sup1_conf20, method = "graph")
plot(rules_sup1_conf20, method = "graph", control = list(layout=igraph::in_circle()))
plot(rules_sup1_conf20, method = "grouped")
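Each rule has a left-hand side (L.H.S, the items the rule starts from) and a right-hand side (R.H.S, the items the rule predicts).
Examples:
- Rules 1 and 2 above have an empty L.H.S ({}). Such a rule does not mean the basket holds only one product. It says the R.H.S item is bought regardless of what else is in the basket, so its confidence equals the item's overall support (Bread appears in 2146 of 6614 transactions) and its lift is 1.
- Rule 3, {Tiffin} => {Coffee}: the L.H.S is {Tiffin} and the R.H.S is {Coffee}.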
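To see how the columns of the rule table relate to each other, here is a worked check for rule 7, assuming the usual definition of lift as confidence divided by the support of the R.H.S. The support of Coffee is read from rule 2, \(\{\} \Rightarrow \{Coffee\}\):

\[ lift(\{Toast\} \Rightarrow \{Coffee\}) = \dfrac{conf(\{Toast\} \Rightarrow \{Coffee\})}{supp(\{Coffee\})} = \dfrac{0.7296}{0.4820} \approx 1.51 \]

This matches the lift of 1.5136967 reported by inspect(). A lift above 1 means that baskets containing Toast contain Coffee more often than baskets in general.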
The seller wants to know: when customers buy Cookies, what else do they usually buy?
rules_sup1_conf20 %>% subset(lhs %in% "Cookies") %>% inspect()
## lhs rhs support confidence lift count
## [1] {Cookies} => {Bread} 0.01511944 0.2673797 0.8240677 100
## [2] {Cookies} => {Coffee} 0.02978530 0.5267380 1.0927995 197
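The idea behind filtering on the L.H.S can be sketched in base R without any package. The block below is only an illustration under stated assumptions: it uses the seven toy transactions from the example table earlier in this post (not the bakery data), it only considers rules with a single item on each side, and `rules_for_lhs` is a hypothetical helper name, not a function from arules.

```r
# Toy transactions from the example table earlier in this post
transactions <- list(
  c("T-shirt", "Trousers", "Belt"),
  c("T-shirt", "Jacket"),
  c("Jacket", "Gloves"),
  c("T-shirt", "Trousers", "Jacket"),
  c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  c("Trousers", "Sneakers", "Belt"),
  c("Trousers", "Belt", "Sneakers")
)

# Hypothetical helper: support, confidence and lift for every
# single-item rule {lhs_item} => {rhs}, sorted by decreasing lift.
# Assumes lhs_item occurs in at least one transaction.
rules_for_lhs <- function(transactions, lhs_item) {
  n <- length(transactions)
  has_lhs <- vapply(transactions, function(tr) lhs_item %in% tr, logical(1))
  items <- setdiff(sort(unique(unlist(transactions))), lhs_item)
  rows <- lapply(items, function(rhs) {
    has_rhs <- vapply(transactions, function(tr) rhs %in% tr, logical(1))
    both <- sum(has_lhs & has_rhs)          # baskets with lhs and rhs together
    confidence <- both / sum(has_lhs)       # share of lhs baskets that hold rhs
    data.frame(lhs = lhs_item, rhs = rhs,
               support = both / n,
               confidence = confidence,
               lift = confidence / (sum(has_rhs) / n),
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  out[order(-out$lift), ]
}

res <- rules_for_lhs(transactions, "T-shirt")
print(res)
```

On this toy data, Jacket comes out on top for T-shirt with a lift of about 1.31. Unlike apriori(), the sketch applies no minimum support or confidence threshold, so it also lists rules that never occur (for example T-shirt with Gloves, with support 0).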
The seller wants to know: what do customers usually buy before they buy “Bread”, that is, which rules have Bread on the R.H.S?
rules_sup1_conf20 %>% subset(rhs %in% "Bread") %>% inspect()
## lhs rhs support confidence lift count
## [1] {} => {Bread} 0.32446326 0.3244633 1.0000000 2146
## [2] {Alfajores} => {Bread} 0.01118839 0.2761194 0.8510036 74
## [3] {Brownie} => {Bread} 0.01179317 0.2689655 0.8289552 78
## [4] {Hot chocolate} => {Bread} 0.01194436 0.2309942 0.7119270 79
## [5] {Medialuna} => {Bread} 0.01632900 0.2849604 0.8782517 108
## [6] {Cookies} => {Bread} 0.01511944 0.2673797 0.8240677 100
## [7] {NONE} => {Bread} 0.01874811 0.2610526 0.8045676 124
## [8] {Sandwich} => {Bread} 0.01693378 0.2271805 0.7001733 112
## [9] {Pastry} => {Bread} 0.02963411 0.3402778 1.0487406 196
## [10] {Cake} => {Bread} 0.02328394 0.2219020 0.6839049 154
## [11] {Tea} => {Bread} 0.02948292 0.2072264 0.6386743 195
## [12] {Coffee,Pastry} => {Bread} 0.01133958 0.2329193 0.7178602 75