AR APRIORI

##### AP APRIORI
library(arules)
## Warning: package 'arules' was built under R version 4.4.3
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
## Warning: package 'arulesViz' was built under R version 4.4.3
library(RColorBrewer)

data("Groceries")
summary(Groceries)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##        labels  level2           level1
## 1 frankfurter sausage meat and sausage
## 2     sausage sausage meat and sausage
## 3  liver loaf sausage meat and sausage
inspect(Groceries[1:5])
##     items                     
## [1] {citrus fruit,            
##      semi-finished bread,     
##      margarine,               
##      ready soups}             
## [2] {tropical fruit,          
##      yogurt,                  
##      coffee}                  
## [3] {whole milk}              
## [4] {pip fruit,               
##      yogurt,                  
##      cream cheese ,           
##      meat spreads}            
## [5] {other vegetables,        
##      whole milk,              
##      condensed milk,          
##      long life bakery product}
# supp = 0.01 means an itemset must appear in at least 1% of the transactions.
# conf = 0.2 means only rules with a confidence of at least 20% are kept.
rules <- apriori(Groceries, parameter = list(supp = 0.01, conf = 0.2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [232 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules
## set of 232 rules
inspect(rules[1:10])
##      lhs                rhs                support    confidence coverage  
## [1]  {}              => {whole milk}       0.25551601 0.2555160  1.00000000
## [2]  {hard cheese}   => {whole milk}       0.01006609 0.4107884  0.02450432
## [3]  {butter milk}   => {other vegetables} 0.01037112 0.3709091  0.02796136
## [4]  {butter milk}   => {whole milk}       0.01159126 0.4145455  0.02796136
## [5]  {ham}           => {whole milk}       0.01148958 0.4414062  0.02602949
## [6]  {sliced cheese} => {whole milk}       0.01077783 0.4398340  0.02450432
## [7]  {oil}           => {whole milk}       0.01128622 0.4021739  0.02806304
## [8]  {onions}        => {other vegetables} 0.01423488 0.4590164  0.03101169
## [9]  {onions}        => {whole milk}       0.01209964 0.3901639  0.03101169
## [10] {berries}       => {yogurt}           0.01057448 0.3180428  0.03324860
##      lift     count
## [1]  1.000000 2513 
## [2]  1.607682   99 
## [3]  1.916916  102 
## [4]  1.622385  114 
## [5]  1.727509  113 
## [6]  1.721356  106 
## [7]  1.573968  111 
## [8]  2.372268  140 
## [9]  1.526965  119 
## [10] 2.279848  104
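
The quality columns in the listing above can be read against the usual textbook definitions of the measures. The block below is a sketch of those definitions, not something taken from the notebook itself. The symbols are assumptions introduced here: X is the LHS itemset, Y the RHS itemset, t a transaction, and N = 9835 the number of transactions. Rule [8], {onions} => {other vegetables}, serves as a worked check, using the count of 1903 for other vegetables from the summary(Groceries) output.

```latex
\begin{align*}
\operatorname{supp}(X \Rightarrow Y) &= \frac{\lvert\{\, t : X \cup Y \subseteq t \,\}\rvert}{N}, &
\operatorname{coverage}(X \Rightarrow Y) &= \operatorname{supp}(X), \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, &
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}.
\end{align*}

% Worked check for rule [8]: {onions} => {other vegetables}
\begin{align*}
\operatorname{supp} &= \frac{140}{9835} \approx 0.014235, \\
\operatorname{conf} &= \frac{0.01423488}{0.03101169} \approx 0.4590, \\
\operatorname{lift} &= \frac{0.4590164}{1903 / 9835} \approx 2.372.
\end{align*}
```

These values agree with the support, confidence, and lift columns printed for rule [8].
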
arules::itemFrequencyPlot(Groceries, topN = 20, 
                          col = brewer.pal(8, 'Pastel2'),
                          main = 'Relative Item Frequency Plot',
                          type = "relative",
                          ylab = "Item Frequency (Relative)")

The Relative Item Frequency Plot shows that whole milk is the most frequent item, appearing in about 25.6% of transactions, followed by other vegetables (about 19.3%) and rolls/buns (about 18.4%). Soda, yogurt, and bottled water are also bought often, with relative frequencies of roughly 17%, 14%, and 11% respectively. At the bottom of the top 20, whipped/sour cream, brown bread, and domestic eggs each have a relative frequency of roughly 6% to 7%. This pattern suggests that high-frequency products, especially whole milk and other vegetables, are strong candidates for further analysis when building association rules, and that they can be placed strategically in the store to encourage joint purchases.
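
The percentages in the paragraph above can be reproduced from the counts printed by summary(Groceries). The following is a minimal base R sketch, assuming only those printed counts. The variable names n_transactions, item_counts, and relative_freq are illustrative and do not appear in the original notebook.

```r
# Counts copied from the "most frequent items" part of summary(Groceries)
n_transactions <- 9835
item_counts <- c("whole milk"       = 2513,
                 "other vegetables" = 1903,
                 "rolls/buns"       = 1809,
                 "soda"             = 1715,
                 "yogurt"           = 1372)

# Relative frequency = number of transactions containing the item / all transactions
relative_freq <- item_counts / n_transactions

# Print as percentages with one decimal place
print(round(100 * relative_freq, 1))
```

With arules loaded, itemFrequency(Groceries, type = "relative") should return the same proportions for all 169 items without copying counts by hand.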

AR FP-GROWTH

##### AP FP-Growth
# Load data
library(arules)
data(Groceries)
class(Groceries)
## [1] "transactions"
## attr(,"package")
## [1] "arules"
summary(Groceries)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##        labels  level2           level1
## 1 frankfurter sausage meat and sausage
## 2     sausage sausage meat and sausage
## 3  liver loaf sausage meat and sausage
inspect(head(Groceries, 3))
##     items                 
## [1] {citrus fruit,        
##      semi-finished bread, 
##      margarine,           
##      ready soups}         
## [2] {tropical fruit,      
##      yogurt,              
##      coffee}              
## [3] {whole milk}
frequentItems <- eclat(Groceries, parameter = list(supp = 0.07, maxlen = 15)) # mines frequent itemsets (support >= 0.07) with Eclat and reports their support
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.07      1     15 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 688 
## 
## create itemset ... 
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [18 item(s)] done [0.00s].
## creating sparse bit matrix ... [18 row(s), 9835 column(s)] done [0.00s].
## writing  ... [19 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(frequentItems)
##      items                          support    count
## [1]  {other vegetables, whole milk} 0.07483477  736 
## [2]  {whole milk}                   0.25551601 2513 
## [3]  {other vegetables}             0.19349263 1903 
## [4]  {rolls/buns}                   0.18393493 1809 
## [5]  {yogurt}                       0.13950178 1372 
## [6]  {soda}                         0.17437722 1715 
## [7]  {root vegetables}              0.10899847 1072 
## [8]  {tropical fruit}               0.10493137 1032 
## [9]  {bottled water}                0.11052364 1087 
## [10] {sausage}                      0.09395018  924 
## [11] {shopping bags}                0.09852567  969 
## [12] {citrus fruit}                 0.08276563  814 
## [13] {pastry}                       0.08896797  875 
## [14] {pip fruit}                    0.07564820  744 
## [15] {whipped/sour cream}           0.07168277  705 
## [16] {fruit/vegetable juice}        0.07229283  711 
## [17] {newspapers}                   0.07981698  785 
## [18] {bottled beer}                 0.08052872  792 
## [19] {canned beer}                  0.07768175  764
itemFrequencyPlot(Groceries, topN=10, type="absolute", main="Item Frequency") # plot frequent items

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5)) # minimum support 0.001, minimum confidence 0.5
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 4 5 6 done [0.02s].
## writing ... [5668 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
rules_conf <- sort(rules, by = "confidence", decreasing = TRUE) # sort so the high-confidence rules come first
inspect(head(rules_conf)) # show support, confidence, coverage, lift and count for the first 6 rules
##     lhs                      rhs                    support confidence    coverage     lift count
## [1] {rice,                                                                                       
##      sugar}               => {whole milk}       0.001220132          1 0.001220132 3.913649    12
## [2] {canned fish,                                                                                
##      hygiene articles}    => {whole milk}       0.001118454          1 0.001118454 3.913649    11
## [3] {root vegetables,                                                                            
##      butter,                                                                                     
##      rice}                => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [4] {root vegetables,                                                                            
##      whipped/sour cream,                                                                         
##      flour}               => {whole milk}       0.001728521          1 0.001728521 3.913649    17
## [5] {butter,                                                                                     
##      soft cheese,                                                                                
##      domestic eggs}       => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [6] {citrus fruit,                                                                               
##      root vegetables,                                                                            
##      soft cheese}         => {other vegetables} 0.001016777          1 0.001016777 5.168156    10
rules <- apriori(Groceries, parameter = list (supp = 0.001, conf = 0.2, maxlen=3)) # maxlen = 3 limits the elements in a rule to 3
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##       3  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3
## Warning in apriori(Groceries, parameter = list(supp = 0.001, conf = 0.2, :
## Mining stopped (maxlen reached). Only patterns up to a length of 3 returned!
##  done [0.01s].
## writing ... [9958 rule(s)] done [0.00s].
## creating S4 object  ... done [0.01s].
#summary of rules
summary(rules)
## set of 9958 rules
## 
## rule length distribution (lhs + rhs):sizes
##    1    2    3 
##    1  620 9337 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   3.000   2.938   3.000   3.000 
## 
## summary of quality measures:
##     support           confidence        coverage             lift        
##  Min.   :0.001017   Min.   :0.2000   Min.   :0.001118   Min.   : 0.8028  
##  1st Qu.:0.001220   1st Qu.:0.2439   1st Qu.:0.003762   1st Qu.: 1.8658  
##  Median :0.001627   Median :0.3077   Median :0.005287   Median : 2.3603  
##  Mean   :0.002554   Mean   :0.3452   Mean   :0.008272   Mean   : 2.6338  
##  3rd Qu.:0.002542   3rd Qu.:0.4194   3rd Qu.:0.008236   3rd Qu.: 3.0742  
##  Max.   :0.255516   Max.   :1.0000   Max.   :1.000000   Max.   :35.7158  
##      count        
##  Min.   :  10.00  
##  1st Qu.:  12.00  
##  Median :  16.00  
##  Mean   :  25.12  
##  3rd Qu.:  25.00  
##  Max.   :2513.00  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835   0.001        0.2
##                                                                               call
##  apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.2, maxlen = 3))
# Inspect rules
#inspect(rules)
#inspect top 5 rules by highest lift
inspect(head(sort(rules, by ="lift"),5))
##     lhs                               rhs                     support    
## [1] {bottled beer, red/blush wine} => {liquor}                0.001931876
## [2] {hamburger meat, soda}         => {Instant food products} 0.001220132
## [3] {ham, white bread}             => {processed cheese}      0.001931876
## [4] {bottled beer, liquor}         => {red/blush wine}        0.001931876
## [5] {Instant food products, soda}  => {hamburger meat}        0.001220132
##     confidence coverage    lift     count
## [1] 0.3958333  0.004880529 35.71579 19   
## [2] 0.2105263  0.005795628 26.20919 12   
## [3] 0.3800000  0.005083884 22.92822 19   
## [4] 0.4130435  0.004677173 21.49356 19   
## [5] 0.6315789  0.001931876 18.99565 12

The output above lists the five rules with the highest lift, mined with apriori() (the code in this section calls eclat() and apriori() rather than an FP-Growth implementation). Among these five, rule 5 has the highest confidence, 63.16%: of the transactions that contain Instant food products and soda, 63.16% also contain hamburger meat. Its lift of 18.99565 is well above 1, which indicates a positive association between the LHS items and the RHS item, so rule 5 is worth analyzing further. Its support is 0.12%: Instant food products, soda, and hamburger meat were bought together in 12 of the 9835 transactions. The other rules are interpreted in the same way.
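
The figures quoted for rule 5 can be cross-checked against each other with a few lines of base R. This is a rough sketch that uses only the values printed in the inspect() output. The variable names are hypothetical, and the LHS count is recovered from the coverage column rather than read from the data.

```r
# Values copied from row [5] of the inspect() output
n_transactions <- 9835
rule_count     <- 12            # transactions containing LHS and RHS together
support        <- 0.001220132
confidence     <- 0.6315789
coverage       <- 0.001931876   # support of the LHS alone
lift           <- 18.99565

# Support is the joint count divided by the number of transactions
support_check <- rule_count / n_transactions

# Coverage times N gives the number of transactions containing the LHS
lhs_count <- round(coverage * n_transactions)

# Confidence is the joint count divided by the LHS count
confidence_check <- rule_count / lhs_count

# Lift = confidence / support(RHS), so the RHS support implied by the output is:
rhs_support_implied <- confidence / lift

print(c(support = support_check,
        lhs_count = lhs_count,
        confidence = confidence_check,
        rhs_support = rhs_support_implied))
```

The implied RHS support of about 3.3% explains the high lift: hamburger meat is in few baskets overall, yet it appears in well over half of the baskets that contain both LHS items. With a joint count of only 12, the estimate rests on few transactions and is best read with some caution.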