DEFINITION

FP-Growth is an algorithm that finds frequent itemsets by mining an FP-tree. It was developed as an improvement on the Apriori algorithm and addresses Apriori's main weaknesses, notably candidate generation and repeated scans of the database. The algorithm finds the frequent itemsets that end with a particular suffix, using a divide-and-conquer strategy to split the problem into smaller subproblems. In short, FP-Growth is an algorithm for finding sets of items that frequently occur together in a dataset by building a tree data structure.
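The tree-building and suffix-based steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not a library implementation: the transactions and `min_count` are hypothetical, and the tree is stored as a named vector keyed by each node's path from the root, which is a simplification of a real FP-tree with node links.

```r
# Toy transactions (hypothetical data, not taken from Groceries)
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)
min_count <- 3  # assumed minimum support count

# Pass 1: count items, keep the frequent ones, most frequent first
item_counts <- table(unlist(transactions))
frequent <- item_counts[item_counts >= min_count]
item_order <- names(frequent)[order(-as.integer(frequent), names(frequent))]

# Pass 2: insert each filtered, reordered transaction into a prefix tree.
# A node is identified by its path from the root, e.g. "bread/diapers".
fp_tree <- integer(0)
for (txn in transactions) {
  path_items <- item_order[item_order %in% txn]
  for (k in seq_along(path_items)) {
    node <- paste(path_items[seq_len(k)], collapse = "/")
    fp_tree[node] <- if (is.na(fp_tree[node])) 1L else fp_tree[node] + 1L
  }
}

# Conditional pattern base for the suffix "beer": every node that ends in
# "beer", identified by the prefix path leading to it, with its count
suffix <- "beer"
is_suffix_node <- grepl(paste0("(^|/)", suffix, "$"), names(fp_tree))
cond_pattern_base <- fp_tree[is_suffix_node]
print(cond_pattern_base)
```

The counts in `cond_pattern_base` sum to the support count of the suffix item. A full FP-Growth run would recurse on these prefix paths to grow longer itemsets ending in that suffix.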

LEARNING OBJECTIVES

Students can analyze and interpret data and information, and make sound decisions, using association analysis (CPMK1, CPMK2, KUE, KKB).
- Affinity analysis
- The Apriori algorithm in R Studio
- FP-Growth in R Studio

Affinity Analysis

Affinity analysis is the study of attributes or characteristics that "go together". Methods for affinity analysis, also known as market basket analysis, seek to uncover associations among these attributes; that is, they look for rules that quantify the relationship between two or more attributes. Association rules take the form "If antecedent, then consequent", together with measures of the support and confidence associated with the rule.

Apriori in R Studio

apriori() generates the most relevant set of rules from a given set of transaction data. It also reports the support, confidence, and lift of those rules. These three measures can be used to judge the relative strength of a rule. So what do these terms mean for a rule "if A, then B"?

Support \(=\frac{\text{Number of transactions with both } A \text{ and } B}{\text{Total number of transactions}}=P(A \cap B)\)

Confidence \(=\frac{\text{Number of transactions with both } A \text{ and } B}{\text{Total number of transactions with } A}=\frac{P(A \cap B)}{P(A)}\)

Expected Confidence \(=\frac{\text{Number of transactions with } B}{\text{Total number of transactions}}=P(B)\)

Lift \(=\frac{\text{Confidence}}{\text{Expected Confidence}}=\frac{P(A \cap B)}{P(A) \cdot P(B)}\)
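The four measures can be computed by hand on a small example. The sketch below uses hypothetical baskets and the rule {bread} => {milk}; none of the data comes from the Groceries set.

```r
# Toy baskets (hypothetical data) used to evaluate the rule {bread} => {milk}
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk", "eggs")
)
n <- length(baskets)

has_a  <- vapply(baskets, function(b) "bread" %in% b, logical(1))  # A present
has_b  <- vapply(baskets, function(b) "milk" %in% b, logical(1))   # B present
has_ab <- has_a & has_b                                            # both present

support             <- sum(has_ab) / n            # P(A and B)
confidence          <- sum(has_ab) / sum(has_a)   # P(A and B) / P(A)
expected_confidence <- sum(has_b) / n             # P(B)
lift                <- confidence / expected_confidence

print(c(support = support, confidence = confidence,
        expected_confidence = expected_confidence, lift = lift))
```

A lift below 1, as in this toy case, means the consequent is slightly less likely when the antecedent is present than it is overall.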

STEPS

Algorithm and Packages

library(Matrix)
library(arules)

## Warning: package 'arules' was built under R version 4.4.2

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

library(arulesViz)

## Warning: package 'arulesViz' was built under R version 4.4.2

library(grid)
data("Groceries")
class(Groceries)

## [1] "transactions"
## attr(,"package")
## [1] "arules"

inspect(head(Groceries, 3))

##     items                 
## [1] {citrus fruit,        
##      semi-finished bread, 
##      margarine,           
##      ready soups}         
## [2] {tropical fruit,      
##      yogurt,              
##      coffee}              
## [3] {whole milk}
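Groceries ships with arules as a ready-made transactions object. For other data, a minimal sketch of building the same kind of object from a list of baskets might look like this; `my_baskets` and `my_txn` are hypothetical names and the baskets are made up.

```r
library(arules)

# Hypothetical baskets; each list element is one transaction
my_baskets <- list(
  c("citrus fruit", "margarine"),
  c("tropical fruit", "yogurt", "coffee"),
  c("whole milk")
)

# Coerce the list to the arules "transactions" class
my_txn <- as(my_baskets, "transactions")

length(my_txn)   # number of transactions
size(my_txn)     # number of items in each transaction
inspect(my_txn)
```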

Most Frequent Products

frequentItems <- eclat(Groceries, parameter = list(supp = 0.07, maxlen = 15)) # Find itemsets with support of at least 0.07

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.07      1     15 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 688 
## 
## create itemset ... 
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [18 item(s)] done [0.00s].
## creating sparse bit matrix ... [18 row(s), 9835 column(s)] done [0.00s].
## writing  ... [19 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].

inspect(frequentItems)

##      items                          support    count
## [1]  {other vegetables, whole milk} 0.07483477  736 
## [2]  {whole milk}                   0.25551601 2513 
## [3]  {other vegetables}             0.19349263 1903 
## [4]  {rolls/buns}                   0.18393493 1809 
## [5]  {yogurt}                       0.13950178 1372 
## [6]  {soda}                         0.17437722 1715 
## [7]  {root vegetables}              0.10899847 1072 
## [8]  {tropical fruit}               0.10493137 1032 
## [9]  {bottled water}                0.11052364 1087 
## [10] {sausage}                      0.09395018  924 
## [11] {shopping bags}                0.09852567  969 
## [12] {citrus fruit}                 0.08276563  814 
## [13] {pastry}                       0.08896797  875 
## [14] {pip fruit}                    0.07564820  744 
## [15] {whipped/sour cream}           0.07168277  705 
## [16] {fruit/vegetable juice}        0.07229283  711 
## [17] {newspapers}                   0.07981698  785 
## [18] {bottled beer}                 0.08052872  792 
## [19] {canned beer}                  0.07768175  764
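The support column in this table is the count divided by the number of transactions. A short check in base R, using figures copied from the output above (the variable names are illustrative):

```r
n_transactions <- 9835  # from the eclat log

# Counts copied from the inspect() table
counts <- c("whole milk" = 2513,
            "other vegetables" = 1903,
            "other vegetables, whole milk" = 736)

support <- counts / n_transactions
print(round(support, 8))

# Smallest whole count that satisfies supp = 0.07
min_count <- ceiling(0.07 * n_transactions)
print(min_count)
```

Since 0.07 × 9835 = 688.45, which the log shows as 688, an itemset needs at least 689 occurrences to pass. The smallest count in the table, 705 for {whipped/sour cream}, is consistent with that.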

itemFrequencyPlot(Groceries, topN=10, type="absolute", main="Item Frequency")
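The same information as the plot can be read as numbers. This is a small sketch using `itemFrequency()` from arules; the variable names are illustrative.

```r
library(arules)
data("Groceries")

# Absolute counts for every item, then the 10 most frequent
item_counts <- itemFrequency(Groceries, type = "absolute")
top10 <- head(sort(item_counts, decreasing = TRUE), 10)
print(top10)
```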

Product Recommendation Rules

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5)) # Minimum support is 0.001, minimum confidence is 0.5

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [5668 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

rules_conf <- sort(rules, by = "confidence", decreasing = TRUE) # Order rules from highest to lowest confidence
inspect(head(rules_conf)) # Show support, confidence, and lift for the first six rules

##     lhs                      rhs                    support confidence    coverage     lift count
## [1] {rice,                                                                                       
##      sugar}               => {whole milk}       0.001220132          1 0.001220132 3.913649    12
## [2] {canned fish,                                                                                
##      hygiene articles}    => {whole milk}       0.001118454          1 0.001118454 3.913649    11
## [3] {root vegetables,                                                                            
##      butter,                                                                                     
##      rice}                => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [4] {root vegetables,                                                                            
##      whipped/sour cream,                                                                         
##      flour}               => {whole milk}       0.001728521          1 0.001728521 3.913649    17
## [5] {butter,                                                                                     
##      soft cheese,                                                                                
##      domestic eggs}       => {whole milk}       0.001016777          1 0.001016777 3.913649    10
## [6] {citrus fruit,                                                                               
##      root vegetables,                                                                            
##      soft cheese}         => {other vegetables} 0.001016777          1 0.001016777 5.168156    10
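The lift of 3.913649 shared by the first five rules follows directly from the lift formula: each has confidence 1 and the consequent {whole milk}. A quick check in base R, using values printed earlier in this document:

```r
n_transactions   <- 9835
count_whole_milk <- 2513   # from the frequent-itemset table earlier

confidence          <- 1                                   # {rice, sugar} => {whole milk}
expected_confidence <- count_whole_milk / n_transactions   # P(whole milk)
lift                <- confidence / expected_confidence

print(round(lift, 6))
```

With confidence fixed at 1, lift depends only on how common the consequent is. That is why the sixth rule, whose consequent {other vegetables} is rarer, shows a higher lift.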

rules_lift <- sort(rules, by = "lift", decreasing = TRUE) # Order rules from highest to lowest lift
inspect(head(rules_lift)) # Show support, confidence, and lift for the first six rules

##     lhs                         rhs                  support confidence    coverage     lift count
## [1] {Instant food products,                                                                       
##      soda}                   => {hamburger meat} 0.001220132  0.6315789 0.001931876 18.99565    12
## [2] {soda,                                                                                        
##      popcorn}                => {salty snack}    0.001220132  0.6315789 0.001931876 16.69779    12
## [3] {flour,                                                                                       
##      baking powder}          => {sugar}          0.001016777  0.5555556 0.001830198 16.40807    10
## [4] {ham,                                                                                         
##      processed cheese}       => {white bread}    0.001931876  0.6333333 0.003050330 15.04549    19
## [5] {whole milk,                                                                                  
##      Instant food products}  => {hamburger meat} 0.001525165  0.5000000 0.003050330 15.03823    15
## [6] {other vegetables,                                                                            
##      curd,                                                                                        
##      yogurt,                                                                                      
##      whipped/sour cream}     => {cream cheese }  0.001016777  0.5882353 0.001728521 14.83409    10
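The coverage column is the support of the left-hand side, so confidence can be recovered as support divided by coverage. A check for the first rule, {Instant food products, soda} => {hamburger meat}, using the printed values:

```r
n_transactions <- 9835
support  <- 0.001220132   # P(lhs and rhs), from the table
coverage <- 0.001931876   # P(lhs), from the table

confidence <- support / coverage
print(round(confidence, 4))

# The same ratio expressed in raw transaction counts
count_rule <- 12                                 # from the count column
count_lhs  <- round(coverage * n_transactions)   # transactions containing the lhs
print(count_rule / count_lhs)
```

This rule rests on only 12 transactions out of 9835, so a high lift here should be read with some caution.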

Apriori Algorithm

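# Mine rules from the Groceries transactions and store them in a separate object,
# so that the Groceries dataset itself is not overwritten
grocery_rules <- apriori(Groceries, parameter = list(minlen = 2, supp = 0.001, conf = 0.05, target = "rules"))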

Total rules generated

summary(Groceries)

## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##        labels  level2           level1
## 1 frankfurter sausage meat and sausage
## 2     sausage sausage meat and sausage
## 3  liver loaf sausage meat and sausage

Inspecting the basket rules

inspect(Groceries[1:20])

##      items                      
## [1]  {citrus fruit,             
##       semi-finished bread,      
##       margarine,                
##       ready soups}              
## [2]  {tropical fruit,           
##       yogurt,                   
##       coffee}                   
## [3]  {whole milk}               
## [4]  {pip fruit,                
##       yogurt,                   
##       cream cheese ,            
##       meat spreads}             
## [5]  {other vegetables,         
##       whole milk,               
##       condensed milk,           
##       long life bakery product} 
## [6]  {whole milk,               
##       butter,                   
##       yogurt,                   
##       rice,                     
##       abrasive cleaner}         
## [7]  {rolls/buns}               
## [8]  {other vegetables,         
##       UHT-milk,                 
##       rolls/buns,               
##       bottled beer,             
##       liquor (appetizer)}       
## [9]  {pot plants}               
## [10] {whole milk,               
##       cereals}                  
## [11] {tropical fruit,           
##       other vegetables,         
##       white bread,              
##       bottled water,            
##       chocolate}                
## [12] {citrus fruit,             
##       tropical fruit,           
##       whole milk,               
##       butter,                   
##       curd,                     
##       yogurt,                   
##       flour,                    
##       bottled water,            
##       dishes}                   
## [13] {beef}                     
## [14] {frankfurter,              
##       rolls/buns,               
##       soda}                     
## [15] {chicken,                  
##       tropical fruit}           
## [16] {butter,                   
##       sugar,                    
##       fruit/vegetable juice,    
##       newspapers}               
## [17] {fruit/vegetable juice}    
## [18] {packaged fruit/vegetables}
## [19] {chocolate}                
## [20] {specialty bar}

Visualizing Association Rules

plot(Groceries, jitter = 0)

## Warning in plot.itemMatrix(Groceries, jitter = 0): Use image() instead of
## plot().

plot(Groceries, method = "grouped", control = list(k = 5))

## Warning in plot.itemMatrix(Groceries, method = "grouped", control = list(k =
## 5)): Use image() instead of plot().

Graph of the first 20 rules

plot(Groceries[1:20], method="graph")

## Warning in plot.itemMatrix(Groceries[1:20], method = "graph"): Use image()
## instead of plot().

Graph of the first 50 rules

plot(Groceries[1:50], method="graph")

## Warning in plot.itemMatrix(Groceries[1:50], method = "graph"): Use image()
## instead of plot().

Parallel coordinates plot

plot(Groceries[1:20], method="paracoord")

## Warning in plot.itemMatrix(Groceries[1:20], method = "paracoord"): Use image()
## instead of plot().
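The "Use image() instead of plot()" warnings appear because the object passed to plot() is a transactions object rather than a set of rules. A hedged sketch of the same visualizations applied to a rules object follows; `rules_to_plot` and `top20` are hypothetical names, and the thresholds are chosen only for illustration.

```r
library(arules)
library(arulesViz)
data("Groceries")

rules_to_plot <- apriori(
  Groceries,
  parameter = list(minlen = 2, supp = 0.001, conf = 0.5),
  control = list(verbose = FALSE)
)

# Keep the 20 rules with the highest lift so the graph stays readable
top20 <- head(sort(rules_to_plot, by = "lift"), 20)

plot(top20, method = "graph")       # items and rules drawn as a network
plot(top20, method = "paracoord")   # parallel coordinates plot
```

Sorting by lift before subsetting is a design choice: indexing with `[1:20]` would instead take the first 20 rules in mining order, which are not necessarily the most interesting ones.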

Controlling the Number of Rules in the Output

rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.2, maxlen = 3)) # maxlen = 3 limits each rule to at most 3 items (lhs + rhs)

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target  ext
##       3  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3

## Warning in apriori(Groceries, parameter = list(supp = 0.001, conf = 0.2, :
## Mining stopped (maxlen reached). Only patterns up to a length of 3 returned!

##  done [0.01s].
## writing ... [9958 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

summary(rules)

## set of 9958 rules
## 
## rule length distribution (lhs + rhs):sizes
##    1    2    3 
##    1  620 9337 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   3.000   3.000   2.938   3.000   3.000 
## 
## summary of quality measures:
##     support           confidence        coverage             lift        
##  Min.   :0.001017   Min.   :0.2000   Min.   :0.001118   Min.   : 0.8028  
##  1st Qu.:0.001220   1st Qu.:0.2439   1st Qu.:0.003762   1st Qu.: 1.8658  
##  Median :0.001627   Median :0.3077   Median :0.005287   Median : 2.3603  
##  Mean   :0.002554   Mean   :0.3452   Mean   :0.008272   Mean   : 2.6338  
##  3rd Qu.:0.002542   3rd Qu.:0.4194   3rd Qu.:0.008236   3rd Qu.: 3.0742  
##  Max.   :0.255516   Max.   :1.0000   Max.   :1.000000   Max.   :35.7158  
##      count        
##  Min.   :  10.00  
##  1st Qu.:  12.00  
##  Median :  16.00  
##  Mean   :  25.12  
##  3rd Qu.:  25.00  
##  Max.   :2513.00  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835   0.001        0.2
##                                                                               call
##  apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.2, maxlen = 3))
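The length distribution shows one rule of length 1. Because minlen defaults to 1, apriori() can return a rule with an empty left-hand side whose confidence is simply the support of its consequent. A sketch for inspecting and dropping it (`rules_nonempty` is a hypothetical name):

```r
library(arules)
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(supp = 0.001, conf = 0.2, maxlen = 3),
  control = list(verbose = FALSE)
)

# The single rule of length 1 has an empty left-hand side
inspect(rules[size(rules) == 1])

# Keep only rules with at least one item on each side
rules_nonempty <- rules[size(rules) >= 2]
length(rules_nonempty)
```

Setting `minlen = 2` in the parameter list is an alternative way to exclude such rules at mining time.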

inspect(head(sort(rules, by = "lift"), 5)) # Inspect the 5 rules with the highest lift

##     lhs                               rhs                     support    
## [1] {bottled beer, red/blush wine} => {liquor}                0.001931876
## [2] {hamburger meat, soda}         => {Instant food products} 0.001220132
## [3] {ham, white bread}             => {processed cheese}      0.001931876
## [4] {bottled beer, liquor}         => {red/blush wine}        0.001931876
## [5] {Instant food products, soda}  => {hamburger meat}        0.001220132
##     confidence coverage    lift     count
## [1] 0.3958333  0.004880529 35.71579 19   
## [2] 0.2105263  0.005795628 26.20919 12   
## [3] 0.3800000  0.005083884 22.92822 19   
## [4] 0.4130435  0.004677173 21.49356 19   
## [5] 0.6315789  0.001931876 18.99565 12

plot(rules)

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

plot(rules , shading="order", control=list(main="two-key plot"))

## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.

RulesBev1 <- subset(rules, subset = rhs %ain% "soda")
summary(RulesBev1)

## set of 699 rules
## 
## rule length distribution (lhs + rhs):sizes
##   2   3 
##  64 635 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   3.000   2.908   3.000   3.000 
## 
## summary of quality measures:
##     support           confidence        coverage             lift      
##  Min.   :0.001017   Min.   :0.2000   Min.   :0.001322   Min.   :1.147  
##  1st Qu.:0.001220   1st Qu.:0.2390   1st Qu.:0.004067   1st Qu.:1.371  
##  Median :0.001627   Median :0.2807   Median :0.005592   Median :1.610  
##  Mean   :0.002408   Mean   :0.2994   Mean   :0.008989   Mean   :1.717  
##  3rd Qu.:0.002440   3rd Qu.:0.3415   3rd Qu.:0.009049   3rd Qu.:1.958  
##  Max.   :0.038332   Max.   :0.7692   Max.   :0.183935   Max.   :4.411  
##      count       
##  Min.   : 10.00  
##  1st Qu.: 12.00  
##  Median : 16.00  
##  Mean   : 23.68  
##  3rd Qu.: 24.00  
##  Max.   :377.00  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835   0.001        0.2
##                                                                               call
##  apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.2, maxlen = 3))
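The filter above uses `%ain%`, one of three item-matching operators that arules provides for subset(). The sketch below contrasts them; the object names other than `rules` are hypothetical.

```r
library(arules)
data("Groceries")

rules <- apriori(
  Groceries,
  parameter = list(supp = 0.001, conf = 0.2, maxlen = 3),
  control = list(verbose = FALSE)
)

# %in%  : matches if ANY of the listed items is present
# %ain% : matches only if ALL of the listed items are present
# %pin% : partial (substring) match on item labels
soda_rules     <- subset(rules, subset = rhs %in% "soda")
beer_lhs_rules <- subset(rules, subset = lhs %pin% "beer")
combo_rules    <- subset(rules, subset = lhs %ain% c("whole milk", "yogurt"))

length(soda_rules)
```

With a single item on the right-hand side, `%in%` and `%ain%` select the same rules, so `soda_rules` matches the 699 rules summarized above.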

inspect(head(sort(RulesBev1, by ="lift"),5))

##     lhs                              rhs    support     confidence coverage   
## [1] {coffee, misc. beverages}     => {soda} 0.001016777 0.7692308  0.001321810
## [2] {pastry, misc. beverages}     => {soda} 0.001220132 0.6315789  0.001931876
## [3] {chicken, waffles}            => {soda} 0.001220132 0.5714286  0.002135231
## [4] {tropical fruit, canned beer} => {soda} 0.001728521 0.5666667  0.003050330
## [5] {bottled water, cake bar}     => {soda} 0.001016777 0.5555556  0.001830198
##     lift     count
## [1] 4.411303 10   
## [2] 3.621912 12   
## [3] 3.276968 12   
## [4] 3.249660 17   
## [5] 3.185941 10

FP-Growth

Syahron Fitrahjon-Institut Teknologi Statistika dan Bisnis Muhammadiyah

2024-11-26