1 Introduction

Basket analysis is a technique for analyzing customer behavior from transaction history. It helps the sales team understand customers' tastes and buying habits, so that marketing campaigns can be targeted sensibly

2 Example

  • When shopping at a supermarket, a customer tends to buy several products at once. For example, customer A's basket contains (milk, bread, beer, cigarettes), while customer B's basket contains (milk, bread, toothbrush, toothpaste) …

  • The questions are: which products do customers usually buy together, and after buying product X, which product Y or Z will a customer go on to buy?

\(\Rightarrow\) Basket analysis answers these questions

3 Basic concepts

Basket analysis relies on a few basic concepts: association rules, support, confidence, lift and conviction

3.1 Association rules

An association rule is defined as follows:

\[X \Rightarrow Y \quad \text{where } X \subset I, \ Y \subset I \text{ and } X \cap Y = \emptyset \]

\(\Rightarrow\) Put simply: when a customer buys the item set X, they are likely, with some probability, to also buy the item set Y

  • Example: a clothing store has the following 7 transactions:
Transaction  Items
t1           {T-shirt, Trousers, Belt}
t2           {T-shirt, Jacket}
t3           {Jacket, Gloves}
t4           {T-shirt, Trousers, Jacket}
t5           {T-shirt, Trousers, Sneakers, Jacket, Belt}
t6           {Trousers, Sneakers, Belt}
t7           {Trousers, Belt, Sneakers}
  • Let the set of items be \(I=\{i_1, i_2, \ldots, i_k\}\). In this example: \(I = \{\text{T-shirt}, \text{Trousers}, \text{Belt}, \text{Jacket}, \text{Gloves}, \text{Sneakers}\}\)

  • Let the set of transactions be \(T = \{t_1, t_2, \ldots, t_n \}\). For example: \(t_1=\{\text{T-shirt}, \text{Trousers}, \text{Belt}\}\)

\(\Rightarrow\) Association rule: \(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\}\)

3.1.1 Support

Support of a rule: how often the item sets X and Y appear together across all baskets, i.e. the number of baskets that contain both X and Y divided by the total number of baskets

\[ supp(X \Rightarrow Y)=supp(X \cup Y)=\dfrac{|\{t \in T : X \cup Y \subseteq t\}|}{n} \]

Example:

  • \(supp(\text{T-shirt} \Rightarrow \text{Trousers})=\dfrac{3}{7}\approx 43 \%\)

  • \(supp(\text{Trousers} \Rightarrow \text{Belt})=\dfrac{4}{7}\approx 57 \%\)

  • \(supp(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\})=\dfrac{2}{7}\approx 29 \%\)
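The support values above can be reproduced with a few lines of base R. This is a minimal sketch: the list `baskets` and the helper `supp()` are names introduced here for illustration and do not come from any package.

```r
# The seven example transactions as a named list of character vectors.
baskets <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# supp(): fraction of baskets that contain every item in `items`.
# Hypothetical helper for illustration only.
supp <- function(items, data = baskets) {
  mean(vapply(data, function(basket) all(items %in% basket), logical(1)))
}

supp(c("T-shirt", "Trousers"))          # 3/7, about 0.43
supp(c("Trousers", "Belt"))             # 4/7, about 0.57
supp(c("T-shirt", "Trousers", "Belt"))  # 2/7, about 0.29
```

Because support only depends on the union of X and Y, the helper takes a single vector of items rather than a separate left-hand and right-hand side.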

3.1.2 Confidence

Confidence of a rule: the percentage of baskets containing the item set X that also contain Y

\[ conf(X \Rightarrow Y)=\dfrac{supp(X \cup Y)}{supp(X)} \]

  • Example: Trousers appears in 5 of the 7 baskets, and Trousers and Belt appear together in 4 of the 7 baskets. Then

\[ conf(\text{Trousers} \Rightarrow \text{Belt})=\dfrac{4/7}{5/7}= 80 \% \]

  • Similarly for other item sets:

    • \(conf(\text{T-shirt} \Rightarrow \text{Belt})=\dfrac{2/7}{4/7}=50 \%\)

    • \(conf(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\})=\dfrac{2/7}{3/7}\approx 67 \%\)

3.1.3 Lift

Lift of a rule: the ratio of the observed support of X and Y together to the support that would be expected if X and Y were independent, i.e. the product of the support of X and the support of Y.

\(\Rightarrow\) The larger the lift, the stronger the association between X and Y; a lift of 1 means X and Y appear together exactly as often as independence would predict

\[ lift(X \Rightarrow Y)=\dfrac{supp(X \cup Y)}{supp(X)supp(Y) }\]

  • Example:

    • \(lift(\text{T-shirt} \Rightarrow \text{Trousers})=\dfrac{3/7}{(4/7)(5/7)}= 1.05\)

    • \(lift(\text{Trousers} \Rightarrow \text{Belt})=\dfrac{4/7}{(5/7)(4/7)}= 1.4\)

    • \(lift(\{\text{T-shirt}, \text{Trousers}\} \Rightarrow \{\text{Belt}\})=\dfrac{2/7}{(3/7)(4/7)}\approx 1.17\)
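Confidence and lift are both simple ratios of supports, so they can be checked by hand in base R. The sketch below stores the seven example transactions as a logical incidence matrix (one row per basket, one column per item); the names `baskets`, `incidence`, `supp()`, `conf()` and `lift()` are hypothetical helpers chosen for this illustration.

```r
baskets <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)

# Logical incidence matrix: TRUE where a basket (row) contains an item (column).
items <- sort(unique(unlist(baskets)))
incidence <- t(vapply(baskets, function(b) items %in% b, logical(length(items))))
colnames(incidence) <- items

# Support of an item set: share of rows where all of its columns are TRUE.
supp <- function(x) mean(rowSums(incidence[, x, drop = FALSE]) == length(x))
# Confidence and lift, written exactly as in the formulas above.
conf <- function(x, y) supp(c(x, y)) / supp(x)
lift <- function(x, y) supp(c(x, y)) / (supp(x) * supp(y))

conf("Trousers", "Belt")                 # 0.8
conf(c("T-shirt", "Trousers"), "Belt")   # 2/3, about 0.67
lift("Trousers", "Belt")                 # 1.4
lift(c("T-shirt", "Trousers"), "Belt")   # 7/6, about 1.17
```

The incidence-matrix layout is also roughly how transaction data is stored internally by rule-mining packages, which is why it is used here instead of a plain list.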

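3.1.4 Conviction

Conviction of a rule is defined as follows:

\[ conv(X \Rightarrow Y)=\dfrac{1-supp(Y)}{1-conf(X \Rightarrow Y) } \]

\(\Rightarrow\) This measure compares the share of baskets lacking Y that independence would predict among baskets containing X, \(1-supp(Y)\), with the share of baskets containing X that actually lack Y, \(1-conf(X \Rightarrow Y)\). A value above 1 means X appears without Y less often than chance would suggest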
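As a worked example (added here for illustration, using the seven clothing transactions and the support and confidence values computed earlier), take \(supp(\text{Belt}) = 4/7\):

\[ conv(\text{Trousers} \Rightarrow \text{Belt})=\dfrac{1-4/7}{1-0.8}=\dfrac{3/7}{1/5}=\dfrac{15}{7}\approx 2.14 \]

\[ conv(\text{T-shirt} \Rightarrow \text{Belt})=\dfrac{1-4/7}{1-0.5}=\dfrac{3/7}{1/2}=\dfrac{6}{7}\approx 0.86 \]

The first rule has a conviction above 1: baskets with Trousers lack a Belt less often than independence would predict. The second is below 1, so buying a T-shirt makes a Belt slightly less likely than average in this small sample.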

In summary:

  • Support measures how often the rule \(X \Rightarrow Y\) occurs among all baskets
  • Confidence measures how likely Y is to appear in baskets that contain the item set X
  • Lift measures the strength of the association in the rule \(X \Rightarrow Y\) (the larger, the stronger)
  • Conviction measures how much less often X appears without Y than independence would predict (the larger, the better)
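The four measures can be cross-checked against the arules package on the seven example transactions. This is a sketch under a few assumptions: the thresholds `supp = 0.25` and `conf = 0.5` are arbitrary values chosen so that the example rule survives, and `baskets`, `trans_toy` and `metrics` are names introduced here.

```r
library(arules)

baskets <- list(
  t1 = c("T-shirt", "Trousers", "Belt"),
  t2 = c("T-shirt", "Jacket"),
  t3 = c("Jacket", "Gloves"),
  t4 = c("T-shirt", "Trousers", "Jacket"),
  t5 = c("T-shirt", "Trousers", "Sneakers", "Jacket", "Belt"),
  t6 = c("Trousers", "Sneakers", "Belt"),
  t7 = c("Trousers", "Belt", "Sneakers")
)
trans_toy <- as(baskets, "transactions")

# Mine rules with at least two items (minlen = 2 skips rules with an empty
# left-hand side); verbose = FALSE suppresses the progress log.
rules <- apriori(
  trans_toy,
  parameter = list(supp = 0.25, conf = 0.5, minlen = 2, target = "rules"),
  control = list(verbose = FALSE)
)

# apriori() reports support, confidence and lift; conviction is computed
# separately with interestMeasure().
metrics <- quality(rules)
metrics$conviction <- interestMeasure(rules, "conviction", transactions = trans_toy)
rownames(metrics) <- labels(rules)

metrics["{Trousers} => {Belt}", c("support", "confidence", "lift", "conviction")]
```

The row for `{Trousers} => {Belt}` should agree with the hand calculations: support 4/7, confidence 0.8 and lift 1.4.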

4 Application

4.1 Data

We use the BreadBasket dataset. It has 4 columns, but only columns 3 and 4 are used in the analysis: column 3 is the transaction id (Transaction) and column 4 is the item (Item)

library(readr)  # read_csv()
library(dplyr)  # %>%
trans_csv <- read_csv("data/BreadBasket_DMS.csv")
## Parsed with column specification:
## cols(
##   Date = col_date(format = ""),
##   Time = col_time(format = ""),
##   Transaction = col_double(),
##   Item = col_character()
## )
trans_csv %>% head()
## # A tibble: 6 x 4
##   Date       Time   Transaction Item         
##   <date>     <time>       <dbl> <chr>        
## 1 2016-10-30 09:58            1 Bread        
## 2 2016-10-30 10:05            2 Scandinavian 
## 3 2016-10-30 10:05            2 Scandinavian 
## 4 2016-10-30 10:07            3 Hot chocolate
## 5 2016-10-30 10:07            3 Jam          
## 6 2016-10-30 10:07            3 Cookies

Read the data from disk as transactions; see ?read.transactions for details on the arguments

library(arules)  # read.transactions(), apriori()
trans <- read.transactions("data/BreadBasket_DMS.csv", format="single", cols=c(3,4), sep=",", rm.duplicates=TRUE)
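When the data is already in memory as a data frame, transactions can also be built without re-reading the file. The sketch below is an alternative, not the approach used in this post: it uses a tiny data frame whose rows are copied from the head() output above, and `df`, `item_lists` and `trans_small` are names introduced for illustration.

```r
library(arules)

# Six rows shaped like the BreadBasket data (transaction id and item only).
df <- data.frame(
  Transaction = c(1, 2, 2, 3, 3, 3),
  Item = c("Bread", "Scandinavian", "Scandinavian",
           "Hot chocolate", "Jam", "Cookies"),
  stringsAsFactors = FALSE
)

# One character vector of items per transaction id. unique() drops items
# repeated within a basket, which is what rm.duplicates = TRUE does on read.
item_lists <- lapply(split(df$Item, df$Transaction), unique)
trans_small <- as(item_lists, "transactions")

length(trans_small)   # number of transactions
size(trans_small)     # number of distinct items in each transaction
inspect(trans_small)  # print the baskets
```

Transaction 2 contains "Scandinavian" twice in the raw data, so it ends up as a basket with a single item.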

4.2 Choosing support and confidence

First we try a grid of support and confidence thresholds to find a suitable combination. If the thresholds are set too low, the algorithm takes longer to run and returns more rules.

The thresholds to use depend on the trade-off between support and confidence
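The grid idea can be sketched compactly by counting rules for every pair of thresholds in one call. This example is an illustration under stated assumptions: it uses the `Groceries` dataset that ships with arules as a stand-in for the BreadBasket file, the threshold values are arbitrary, and `count_rules()` is a hypothetical helper.

```r
library(arules)

# Stand-in data: the Groceries transactions bundled with arules.
data("Groceries", package = "arules")

support_levels <- c(0.01, 0.005)
confidence_levels <- c(0.5, 0.3, 0.1)

# Number of rules found for one (support, confidence) pair.
# verbose = FALSE keeps apriori() from printing its progress log.
count_rules <- function(supp, conf) {
  rules <- apriori(
    Groceries,
    parameter = list(supp = supp, conf = conf, target = "rules"),
    control = list(verbose = FALSE)
  )
  length(rules)
}

# Rows are support levels, columns are confidence levels.
rule_counts <- outer(support_levels, confidence_levels, Vectorize(count_rules))
dimnames(rule_counts) <- list(support = support_levels,
                              confidence = confidence_levels)
rule_counts
```

Reading the matrix left to right and top to bottom shows the trade-off directly: lowering either threshold can only keep or increase the number of rules.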

See the plots below for support levels of 10%, 5%, 1% and 0.5% in turn

# Support and confidence values
supportLevels <- c(0.1, 0.05, 0.01, 0.005)
confidenceLevels <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)

# Empty integers 
rules_sup10 <- integer(length=9)
rules_sup5 <- integer(length=9)
rules_sup1 <- integer(length=9)
rules_sup0.5 <- integer(length=9)

# Apriori algorithm with a support level of 10%
for (i in 1:length(confidenceLevels)) {
  
  rules_sup10[i] <- length(apriori(trans, parameter=list(sup=supportLevels[1], 
                                   conf=confidenceLevels[i], target="rules")))
  
}
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5     0.1      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 661 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [4 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

The same block is printed for each of the remaining confidence levels; only the confidence value and the rule count change. Rules written at confidence 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2 and 0.1: 0, 0, 0, 0, 1, 2, 2 and 4.

# Apriori algorithm with a support level of 5%
for (i in 1:length(confidenceLevels)) {
  
  rules_sup5[i] <- length(apriori(trans, parameter=list(sup=supportLevels[2], 
                                  conf=confidenceLevels[i], target="rules")))
  
}
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5    0.05      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 330 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

The same block is printed for each of the remaining confidence levels; only the confidence value and the rule count change. Rules written at confidence 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2 and 0.1: 0, 0, 0, 1, 2, 4, 5 and 10.

# Apriori algorithm with a support level of 1%
for (i in 1:length(confidenceLevels)) {
  
  rules_sup1[i] <- length(apriori(trans, parameter=list(sup=supportLevels[3], 
                                  conf=confidenceLevels[i], target="rules")))
  
}
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 66 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.01s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

The same block is printed for each of the remaining confidence levels; only the confidence value and the rule count change. Rules written at confidence 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2 and 0.1: 0, 1, 2, 13, 18, 22, 36 and 48.

# Apriori algorithm with a support level of 0.5%
for (i in 1:length(confidenceLevels)) {
  
  rules_sup0.5[i] <- length(apriori(trans, parameter=list(sup=supportLevels[4], 
                                    conf=confidenceLevels[i], target="rules")))
  
}
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.9    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 33 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [0 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

The same block is printed for the confidence levels 0.8 down to 0.2; only the confidence value and the rule count change. Rules written at confidence 0.8, 0.7, 0.6, 0.5, 0.4, 0.3 and 0.2: 0, 2, 6, 19, 32, 42 and 73. The output for confidence 0.1 follows.

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.1    0.1    1 none FALSE            TRUE       5   0.005      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 33 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [36 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [123 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
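# Plot the number of rules found against the confidence level for one support level
f_plot <- function(confidenceLevels, rules_sup, sup_lev = "10%") {
  # Return the plot object explicitly rather than via an invisible assignment
  qplot(confidenceLevels, rules_sup, geom = c("point", "line"),
        xlab = "Confidence level", ylab = "Number of rules found",
        main = paste0("Apriori with a support level of ", sup_lev)) +
    theme_bw()
}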

# Number of rules found with a support level of 10%
plot1 <- f_plot(confidenceLevels, rules_sup10, "10%")

# Number of rules found with a support level of 5%
plot2 <- f_plot(confidenceLevels, rules_sup5, "5%")


# Number of rules found with a support level of 1%
plot3 <- f_plot(confidenceLevels, rules_sup1, "1%")


# Number of rules found with a support level of 0.5%
plot4 <- f_plot(confidenceLevels, rules_sup0.5, "0.5%")

# Subplot

grid.arrange(plot1, plot2, plot3, plot4, ncol=2)

  • Combine all the lines into a single plot
# Data frame
num_rules <- data.frame(rules_sup10, rules_sup5, rules_sup1, rules_sup0.5, confidenceLevels)

# Number of rules found with a support level of 10%, 5%, 1% and 0.5%
ggplot(data=num_rules, aes(x=confidenceLevels)) +
  
  # Plot line and points (support level of 10%)
  geom_line(aes(y=rules_sup10, colour="Support level of 10%")) + 
  geom_point(aes(y=rules_sup10, colour="Support level of 10%")) +
  
  # Plot line and points (support level of 5%)
  geom_line(aes(y=rules_sup5, colour="Support level of 5%")) +
  geom_point(aes(y=rules_sup5, colour="Support level of 5%")) +
  
  # Plot line and points (support level of 1%)
  geom_line(aes(y=rules_sup1, colour="Support level of 1%")) + 
  geom_point(aes(y=rules_sup1, colour="Support level of 1%")) +
  
  # Plot line and points (support level of 0.5%)
  geom_line(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  geom_point(aes(y=rules_sup0.5, colour="Support level of 0.5%")) +
  
  # Labs and theme
  labs(x="Confidence levels", y="Number of rules found", 
       title="Apriori algorithm with different support levels") +
  theme_bw() +
  theme(legend.title=element_blank())

4.3 Analyzing the results

  • With support = 10%, we find very few rules, and only at low confidence levels. This means that no item associations are frequent enough at this threshold, so we cannot use this support level

  • With support = 5%, rules are found only up to a confidence of about 50%. This suggests choosing a support level below 5% in order to obtain rules at a reasonable, higher confidence

  • With support = 1%, there are dozens of candidate rules

  • With support = 0.5%, there are too many rules to analyze!
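
As a rough check on what these support levels mean in absolute terms, the sketch below converts each relative support into a minimum transaction count for the 6614 transactions reported in the apriori log. The vector `support_levels` is written out here as an assumption (10%, 5%, 1% and 0.5%), and truncating to an integer is also an assumption; it reproduces the "Absolute minimum support count" values of 33 and 66 printed in the logs for 0.5% and 1% support.

```r
# Assumed support levels (10%, 5%, 1%, 0.5%); the original vector is defined earlier in the document
support_levels <- c(0.1, 0.05, 0.01, 0.005)
n_transactions <- 6614  # from the apriori log: "6614 transaction(s)"

# Minimum number of transactions an itemset must appear in
# (truncating to an integer is an assumption that matches the printed counts)
min_counts <- as.integer(support_levels * n_transactions)
names(min_counts) <- paste0(support_levels * 100, "%")
print(min_counts)
```

At 10% support an itemset has to appear in several hundred baskets, which helps explain why so few rules survive at that level.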

rules_sup1_conf20 <- apriori(trans, parameter = list(sup=supportLevels[3], 
                             conf = confidenceLevels[8], target = "rules"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 66 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[104 item(s), 6614 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [36 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
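  • Display the rules that were found. Note that inspect() prints them in the order apriori() returned them, not sorted by support; sort(rules_sup1_conf20, by = "support") would order them by support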
inspect(rules_sup1_conf20)
##      lhs                 rhs      support    confidence lift      count
## [1]  {}               => {Bread}  0.32446326 0.3244633  1.0000000 2146 
## [2]  {}               => {Coffee} 0.48200786 0.4820079  1.0000000 3188 
## [3]  {Tiffin}         => {Coffee} 0.01058361 0.5468750  1.1345769   70 
## [4]  {Spanish Brunch} => {Coffee} 0.01406108 0.6326531  1.3125368   93 
## [5]  {Scone}          => {Coffee} 0.01844572 0.5422222  1.1249240  122 
## [6]  {Muffin}         => {Coffee} 0.01799214 0.4958333  1.0286831  119 
## [7]  {Toast}          => {Coffee} 0.02570305 0.7296137  1.5136967  170 
## [8]  {Soup}           => {Coffee} 0.01708497 0.4870690  1.0105000  113 
## [9]  {Alfajores}      => {Bread}  0.01118839 0.2761194  0.8510036   74 
## [10] {Alfajores}      => {Coffee} 0.02237678 0.5522388  1.1457050  148 
## [11] {Brownie}        => {Bread}  0.01179317 0.2689655  0.8289552   78 
## [12] {Brownie}        => {Coffee} 0.02086483 0.4758621  0.9872496  138 
## [13] {Juice}          => {Coffee} 0.02131842 0.5300752  1.0997231  141 
## [14] {Hot chocolate}  => {Bread}  0.01194436 0.2309942  0.7119270   79 
## [15] {Hot chocolate}  => {Coffee} 0.02721500 0.5263158  1.0919237  180 
## [16] {Medialuna}      => {Bread}  0.01632900 0.2849604  0.8782517  108 
## [17] {Medialuna}      => {Coffee} 0.03296039 0.5751979  1.1933372  218 
## [18] {Cookies}        => {Bread}  0.01511944 0.2673797  0.8240677  100 
## [19] {Cookies}        => {Coffee} 0.02978530 0.5267380  1.0927995  197 
## [20] {NONE}           => {Tea}    0.01693378 0.2357895  1.6572918  112 
## [21] {NONE}           => {Bread}  0.01874811 0.2610526  0.8045676  124 
## [22] {NONE}           => {Coffee} 0.04172966 0.5810526  1.2054837  276 
## [23] {Sandwich}       => {Bread}  0.01693378 0.2271805  0.7001733  112 
## [24] {Sandwich}       => {Coffee} 0.04233444 0.5679513  1.1783030  280 
## [25] {Pastry}         => {Bread}  0.02963411 0.3402778  1.0487406  196 
## [26] {Pastry}         => {Coffee} 0.04868461 0.5590278  1.1597897  322 
## [27] {Cake}           => {Tea}    0.02615664 0.2492795  1.7521093  173 
## [28] {Cake}           => {Bread}  0.02328394 0.2219020  0.6839049  154 
## [29] {Cake}           => {Coffee} 0.05654672 0.5389049  1.1180417  374 
## [30] {Tea}            => {Bread}  0.02948292 0.2072264  0.6386743  195 
## [31] {Tea}            => {Coffee} 0.05170850 0.3634431  0.7540191  342 
## [32] {Bread}          => {Coffee} 0.08980950 0.2767940  0.5742521  594 
## [33] {Bread,Pastry}   => {Coffee} 0.01133958 0.3826531  0.7938731   75 
## [34] {Coffee,Pastry}  => {Bread}  0.01133958 0.2329193  0.7178602   75 
## [35] {Cake,Tea}       => {Coffee} 0.01118839 0.4277457  0.8874247   74 
## [36] {Coffee,Tea}     => {Cake}   0.01118839 0.2163743  2.0621029   74
plot(rules_sup1_conf20, method = "graph")

  • The same graph in a circular layout, arranged by increasing support and labelled in alphabetical order (a→z)
plot(rules_sup1_conf20, method = "graph", control = list(layout=igraph::in_circle()))

plot(rules_sup1_conf20, method = "grouped")

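4.4 Selecting rules

Rule selection is based on the L.H.S (left-hand-side items, the antecedent) and the R.H.S (right-hand-side items, the consequent) of each rule.

Example:

  • Rules 1 and 2 above have an empty L.H.S. Such a rule holds regardless of what else is in the basket, so its confidence is simply the overall share of transactions that contain the R.H.S item, and its lift is 1

  • Rule 3, {Tiffin} => {Coffee}: the L.H.S is {Tiffin} and the R.H.S is {Coffee}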
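
The measures in the inspect() table can be recomputed by hand, which helps when reading a rule's L.H.S and R.H.S. The sketch below does this for rule 7, {Toast} => {Coffee}, using the counts printed above. The number of transactions containing Toast (233) is not shown in the output; it is back-derived here from count / confidence, so treat it as an assumption.

```r
n        <- 6614  # total transactions (from the apriori log)
n_coffee <- 3188  # count of rule {} => {Coffee}
n_both   <- 170   # count of rule {Toast} => {Coffee}
n_toast  <- 233   # assumption: derived as 170 / 0.7296137, not printed in the output

support    <- n_both / n                   # share of baskets containing both items
confidence <- n_both / n_toast             # share of Toast baskets that also contain Coffee
lift       <- confidence / (n_coffee / n)  # confidence relative to Coffee's baseline share

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

For a rule with an empty L.H.S the same formulas give confidence equal to support and a lift of exactly 1, which is what rules 1 and 2 show.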

4.4.1 Case 1:

The seller wants to know: once a customer has bought Cookies, which products do they usually buy as well?

rules_sup1_conf20 %>% subset(lhs %in% "Cookies") %>% inspect()
##     lhs          rhs      support    confidence lift      count
## [1] {Cookies} => {Bread}  0.01511944 0.2673797  0.8240677 100  
## [2] {Cookies} => {Coffee} 0.02978530 0.5267380  1.0927995 197

4.4.2 Case 2:

The seller wants to know: what do customers usually buy before buying “Bread”?

rules_sup1_conf20 %>% subset(rhs %in% "Bread") %>% inspect()
##      lhs                rhs     support    confidence lift      count
## [1]  {}              => {Bread} 0.32446326 0.3244633  1.0000000 2146 
## [2]  {Alfajores}     => {Bread} 0.01118839 0.2761194  0.8510036   74 
## [3]  {Brownie}       => {Bread} 0.01179317 0.2689655  0.8289552   78 
## [4]  {Hot chocolate} => {Bread} 0.01194436 0.2309942  0.7119270   79 
## [5]  {Medialuna}     => {Bread} 0.01632900 0.2849604  0.8782517  108 
## [6]  {Cookies}       => {Bread} 0.01511944 0.2673797  0.8240677  100 
## [7]  {NONE}          => {Bread} 0.01874811 0.2610526  0.8045676  124 
## [8]  {Sandwich}      => {Bread} 0.01693378 0.2271805  0.7001733  112 
## [9]  {Pastry}        => {Bread} 0.02963411 0.3402778  1.0487406  196 
## [10] {Cake}          => {Bread} 0.02328394 0.2219020  0.6839049  154 
## [11] {Tea}           => {Bread} 0.02948292 0.2072264  0.6386743  195 
## [12] {Coffee,Pastry} => {Bread} 0.01133958 0.2329193  0.7178602   75
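
Most of the rules above have a lift below 1, meaning Bread appears less often together with that L.H.S than it does overall. A minimal sketch of filtering on lift follows. It copies four rows of the output into a plain data.frame (an illustration only, not the arules object), and the cutoff of 1 is an assumed rule of thumb rather than something prescribed by the analysis.

```r
# Four rows copied by hand from the inspect() output above
bread_rules <- data.frame(
  lhs        = c("Alfajores", "Pastry", "Cake", "Tea"),
  confidence = c(0.2761194, 0.3402778, 0.2219020, 0.2072264),
  lift       = c(0.8510036, 1.0487406, 0.6839049, 0.6386743),
  stringsAsFactors = FALSE
)

# Keep only rules whose L.H.S makes Bread more likely than its baseline (lift > 1)
useful <- bread_rules[bread_rules$lift > 1, ]
print(useful)
```

On the rules object itself, the analogous filter would be subset(rules_sup1_conf20, lift > 1), in the same style as the subset() calls used in the two cases above.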