Index

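  • Introduction to R and basic concepts of association rules
  • The Apriori algorithm
  • Example: simple passenger data from the Titanic
  • Applying the Apriori algorithm to the data
  • Pruning redundant rules
  • Visualization
  • Interpretation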

Introduction to R

Basic concepts of association rules

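Support:

  The probability, over all transactions, that events A and B occur together: P(A∩B)

Confidence:

  The probability that B also occurs given that A has occurred, i.e. the conditional probability P(B∣A) = P(A∩B) / P(A)

Lift:

  Lift = Confidence / Expected Confidence (here the expected confidence is P(B))

            Lift > 1: A and B are positively associated
            Lift = 1: A and B are independent
            Lift < 1: A and B are negatively associated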
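
The three measures can be checked by hand against one rule from the Titanic example later in these slides, {Class=Crew} => {Age=Adult}. The sketch below is only an illustration: the function name `rule_metrics` is hypothetical, and the counts come from the data summary (2201 passengers, 885 crew, 2092 adults) together with the reported confidence of 1, which means all 885 crew members are adults.

```python
def rule_metrics(n_total, n_lhs, n_rhs, n_both):
    """Support, confidence and lift of a rule A => B from raw counts.

    n_total: number of transactions
    n_lhs:   transactions containing A
    n_rhs:   transactions containing B
    n_both:  transactions containing both A and B
    """
    support = n_both / n_total             # P(A and B)
    confidence = n_both / n_lhs            # P(B | A)
    lift = confidence / (n_rhs / n_total)  # P(B | A) / P(B)
    return support, confidence, lift


# {Class=Crew} => {Age=Adult}, counts taken from the Titanic summary
support, confidence, lift = rule_metrics(
    n_total=2201, n_lhs=885, n_rhs=2092, n_both=885
)
print(f"support={support:.7f} confidence={confidence:.7f} lift={lift:.7f}")
```

The lift is only slightly above 1 because almost every passenger is an adult, so knowing that someone is crew adds little information about age.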

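The Apriori algorithm

Apriori is one of the most influential algorithms for mining the frequent itemsets of Boolean association rules. At its core it is an iterative, level-wise algorithm built on a two-phase idea: first find all frequent itemsets, then derive rules from them. The rules it mines are classified as single-dimensional, single-level, Boolean association rules. Every itemset whose support is at least the minimum support is called a frequent itemset, also commonly known as a large itemset.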

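Advantages:

  Simple, easy to understand, and undemanding in its data requirements.

Disadvantages:

  1. Each step that generates candidate itemsets produces too many combinations, because items that should not take part in the combinations are not excluded beforehand;

  2. Every time itemset support is computed, all records in database D are scanned and compared. For a large database this scanning greatly increases the system's I/O cost, and that cost grows geometrically as the number of records increases.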

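Apriori: finding the large (frequent) itemsets

  1. Count the occurrences of every one-item itemset and keep those whose support is not less than the minimum support; these are the large 1-itemsets.

  2. Loop until no more large itemsets are generated. In step k, generate the candidate k-itemsets from the large (k-1)-itemsets found in step k-1, scan the database to obtain the support of each candidate, and compare it with the minimum support to find the large k-itemsets.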
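
The two steps above can be sketched as a short level-wise loop. This is a minimal illustration rather than the implementation used by any particular package: the function name `frequent_itemsets` and the five toy baskets are made up for the example, and candidates are formed by simply uniting pairs of frequent itemsets from the previous level.

```python
def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        # One full pass over the data per itemset, as described in step 2
        return sum(itemset <= t for t in transactions) / n

    # Step 1: frequent 1-itemsets
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    # Step 2: repeat until a level produces no frequent itemsets
    while level:
        for itemset in level:
            result[itemset] = support(itemset)
        k += 1
        # Candidate k-itemsets built from the frequent (k-1)-itemsets
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates if support(c) >= min_support}
    return result


# Hypothetical toy data: five shopping baskets
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
found = frequent_itemsets(baskets, min_support=0.6)
for itemset, supp in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
```

With a minimum support of 0.6, every single item and every pair is frequent, but the triple appears in only two of the five baskets, so the loop stops after the third level.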

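Apriori generates its candidate itemsets using the Apriori property: every subset of a frequent itemset must itself be frequent. Candidate generation consists of two steps:

  • Join
  • Prune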
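
A sketch of the join and prune steps, assuming itemsets are kept as sorted tuples. The function name `apriori_gen` and the item labels A to D are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def apriori_gen(frequent_prev, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    prev_set = set(prev)
    candidates = []
    for a, b in combinations(prev, 2):
        # Join: merge two itemsets that agree on their first k-2 items
        if a[:k - 2] == b[:k - 2]:
            cand = a + (b[-1],)
            # Prune: drop the candidate if any (k-1)-subset is not frequent
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates


# Hypothetical frequent 2-itemsets
frequent_2 = [{"A", "B"}, {"A", "C"}, {"A", "D"}, {"B", "C"}]
candidates_3 = apriori_gen(frequent_2, k=3)
print(candidates_3)
```

Here the join step proposes (A, B, C), (A, B, D) and (A, C, D). The prune step discards the last two because {B, D} and {C, D} are not frequent, so only (A, B, C) needs to be counted against the database.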

Example: simple passenger data from the Titanic

The Titanic dataset has four columns: [Class], [Sex], [Age], and [Survived].

Each row is the record of one person.

##    Class  Sex   Age Survived
## 1    3rd Male Child       No
## 2    3rd Male Child       No
## 3    3rd Male Child       No
## 4    3rd Male Child       No
## 5    3rd Male Child       No
## 6    3rd Male Child       No
## 7    3rd Male Child       No
## 8    3rd Male Child       No
## 9    3rd Male Child       No
## 10   3rd Male Child       No

Example: simple passenger data from the Titanic

Five randomly sampled records

##      Class  Sex   Age Survived
## 1774  Crew Male Adult      Yes
## 1598   1st Male Adult      Yes
## 86     1st Male Adult       No
## 1292  Crew Male Adult       No
## 1165  Crew Male Adult       No

Data summary:

##   Class         Sex          Age       Survived  
##  1st :325   Female: 470   Adult:2092   No :1490  
##  2nd :285   Male  :1731   Child: 109   Yes: 711  
##  3rd :706                                        
##  Crew:885

Applying the Apriori algorithm to the data

  • Minimum support: supp=0.1
  • Minimum confidence: conf=0.8
  • Maximum rule length (number of items per rule): maxlen=10

## Loading required package: Matrix
## 
## Attaching package: 'arules'
## 
## The following objects are masked from 'package:base':
## 
##     %in%, write
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport support minlen maxlen
##         0.8    0.1    1 none FALSE            TRUE     0.1      1     10
##  target   ext
##   rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[10 item(s), 2201 transaction(s)] done [0.00s].
## sorting and recoding items ... [9 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [27 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
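
To see where the 27 rules come from, the same thresholds (supp=0.1, conf=0.8) can be applied by brute-force enumeration. This is a sketch under stated assumptions, not the arules implementation: it tests every itemset instead of pruning candidates, and the cell counts are those of R's built-in `Titanic` contingency table, which is assumed to be the source of the slides' data (its marginals agree with the data summary shown earlier).

```python
from itertools import combinations

COLUMNS = ("Class", "Sex", "Age", "Survived")
# Passenger counts per (Class, Sex, Age, Survived) cell; empty cells omitted.
# Assumption: the slides' 2201 records are the row-expanded form of this table.
CELLS = {
    ("3rd", "Male", "Child", "No"): 35, ("3rd", "Female", "Child", "No"): 17,
    ("1st", "Male", "Adult", "No"): 118, ("2nd", "Male", "Adult", "No"): 154,
    ("3rd", "Male", "Adult", "No"): 387, ("Crew", "Male", "Adult", "No"): 670,
    ("1st", "Female", "Adult", "No"): 4, ("2nd", "Female", "Adult", "No"): 13,
    ("3rd", "Female", "Adult", "No"): 89, ("Crew", "Female", "Adult", "No"): 3,
    ("1st", "Male", "Child", "Yes"): 5, ("2nd", "Male", "Child", "Yes"): 11,
    ("3rd", "Male", "Child", "Yes"): 13, ("1st", "Female", "Child", "Yes"): 1,
    ("2nd", "Female", "Child", "Yes"): 13, ("3rd", "Female", "Child", "Yes"): 14,
    ("1st", "Male", "Adult", "Yes"): 57, ("2nd", "Male", "Adult", "Yes"): 14,
    ("3rd", "Male", "Adult", "Yes"): 75, ("Crew", "Male", "Adult", "Yes"): 192,
    ("1st", "Female", "Adult", "Yes"): 140, ("2nd", "Female", "Adult", "Yes"): 80,
    ("3rd", "Female", "Adult", "Yes"): 76, ("Crew", "Female", "Adult", "Yes"): 20,
}
MIN_SUPP, MIN_CONF = 0.1, 0.8
N = sum(CELLS.values())  # 2201 passengers


def support(itemset):
    """Fraction of passengers whose record contains every (column, value) item."""
    hits = sum(n for cell, n in CELLS.items()
               if all(cell[COLUMNS.index(col)] == val for col, val in itemset))
    return hits / N


items = sorted({(col, val) for cell in CELLS for col, val in zip(COLUMNS, cell)})

rules = []  # (lhs, rhs, support, confidence, lift)
for size in range(1, len(COLUMNS) + 1):
    for itemset in combinations(items, size):
        supp = support(itemset)
        if supp < MIN_SUPP:
            continue
        # Try each item of a frequent itemset as the single right-hand side
        for rhs in itemset:
            lhs = tuple(i for i in itemset if i != rhs)
            conf = supp / support(lhs)  # support of the empty lhs is 1
            if conf >= MIN_CONF:
                rules.append((lhs, rhs, supp, conf, conf / support((rhs,))))

print(len(rules))  # 27
```

Itemsets that name the same column twice (for example Sex=Male together with Sex=Female) have zero support and are filtered out by the minimum-support test, so no special handling is needed.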

Applying the Apriori algorithm to the data

Display the association rules

##    lhs               rhs             support confidence      lift
## 1  {}             => {Age=Adult}   0.9504771  0.9504771 1.0000000
## 2  {Class=2nd}    => {Age=Adult}   0.1185825  0.9157895 0.9635051
## 3  {Class=1st}    => {Age=Adult}   0.1449341  0.9815385 1.0326798
## 4  {Sex=Female}   => {Age=Adult}   0.1930940  0.9042553 0.9513700
## 5  {Class=3rd}    => {Age=Adult}   0.2848705  0.8881020 0.9343750
## 6  {Survived=Yes} => {Age=Adult}   0.2971377  0.9198312 0.9677574
## 7  {Class=Crew}   => {Sex=Male}    0.3916402  0.9740113 1.2384742
## 8  {Class=Crew}   => {Age=Adult}   0.4020900  1.0000000 1.0521033
## 9  {Survived=No}  => {Sex=Male}    0.6197183  0.9154362 1.1639949
## 10 {Survived=No}  => {Age=Adult}   0.6533394  0.9651007 1.0153856
## 11 {Sex=Male}     => {Age=Adult}   0.7573830  0.9630272 1.0132040
## 12 {Sex=Female,                                                  
##     Survived=Yes} => {Age=Adult}   0.1435711  0.9186047 0.9664669
## 13 {Class=3rd,                                                   
##     Sex=Male}     => {Survived=No} 0.1917310  0.8274510 1.2222950
## 14 {Class=3rd,                                                   
##     Survived=No}  => {Age=Adult}   0.2162653  0.9015152 0.9484870
## 15 {Class=3rd,                                                   
##     Sex=Male}     => {Age=Adult}   0.2099046  0.9058824 0.9530818
## 16 {Sex=Male,                                                    
##     Survived=Yes} => {Age=Adult}   0.1535666  0.9209809 0.9689670
## 17 {Class=Crew,                                                  
##     Survived=No}  => {Sex=Male}    0.3044071  0.9955423 1.2658514
## 18 {Class=Crew,                                                  
##     Survived=No}  => {Age=Adult}   0.3057701  1.0000000 1.0521033
## 19 {Class=Crew,                                                  
##     Sex=Male}     => {Age=Adult}   0.3916402  1.0000000 1.0521033
## 20 {Class=Crew,                                                  
##     Age=Adult}    => {Sex=Male}    0.3916402  0.9740113 1.2384742
## 21 {Sex=Male,                                                    
##     Survived=No}  => {Age=Adult}   0.6038164  0.9743402 1.0251065
## 22 {Age=Adult,                                                   
##     Survived=No}  => {Sex=Male}    0.6038164  0.9242003 1.1751385
## 23 {Class=3rd,                                                   
##     Sex=Male,                                                    
##     Survived=No}  => {Age=Adult}   0.1758292  0.9170616 0.9648435
## 24 {Class=3rd,                                                   
##     Age=Adult,                                                   
##     Survived=No}  => {Sex=Male}    0.1758292  0.8130252 1.0337773
## 25 {Class=3rd,                                                   
##     Sex=Male,                                                    
##     Age=Adult}    => {Survived=No} 0.1758292  0.8376623 1.2373791
## 26 {Class=Crew,                                                  
##     Sex=Male,                                                    
##     Survived=No}  => {Age=Adult}   0.3044071  1.0000000 1.0521033
## 27 {Class=Crew,                                                  
##     Age=Adult,                                                   
##     Survived=No}  => {Sex=Male}    0.3044071  0.9955423 1.2658514

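List only the rules whose rhs is Survived. The supports below go as low as 0.006, so this run evidently used a minimum support lower than the 0.1 shown earlier.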

##    lhs             rhs            support confidence  lift
## 1  {Class=2nd,                                            
##     Age=Child}  => {Survived=Yes}   0.011      1.000 3.096
## 2  {Class=2nd,                                            
##     Sex=Female} => {Survived=Yes}   0.042      0.877 2.716
## 3  {Class=2nd,                                            
##     Sex=Male}   => {Survived=No}    0.070      0.860 1.271
## 4  {Class=1st,                                            
##     Sex=Female} => {Survived=Yes}   0.064      0.972 3.010
## 5  {Class=Crew,                                           
##     Sex=Female} => {Survived=Yes}   0.009      0.870 2.692
## 6  {Class=3rd,                                            
##     Sex=Male}   => {Survived=No}    0.192      0.827 1.222
## 7  {Class=2nd,                                            
##     Sex=Female,                                           
##     Age=Child}  => {Survived=Yes}   0.006      1.000 3.096
## 8  {Class=2nd,                                            
##     Sex=Female,                                           
##     Age=Adult}  => {Survived=Yes}   0.036      0.860 2.663
## 9  {Class=2nd,                                            
##     Sex=Male,                                             
##     Age=Adult}  => {Survived=No}    0.070      0.917 1.354
## 10 {Class=1st,                                            
##     Sex=Female,                                           
##     Age=Adult}  => {Survived=Yes}   0.064      0.972 3.010
## 11 {Class=Crew,                                           
##     Sex=Female,                                           
##     Age=Adult}  => {Survived=Yes}   0.009      0.870 2.692
## 12 {Class=3rd,                                            
##     Sex=Male,                                             
##     Age=Adult}  => {Survived=No}    0.176      0.838 1.237

Pruning redundant rules
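
The slides do not state how redundancy is defined. One common convention, used here as an assumption, treats a rule as redundant when a more general rule (a proper subset of its lhs with the same rhs) has equal or higher lift. The sketch below applies that test to the 12 Survived rules listed above; the helper name `is_redundant` is hypothetical.

```python
# (lhs, rhs, lift) copied from the 12 Survived rules listed above
rules = [
    ({"Class=2nd", "Age=Child"}, "Survived=Yes", 3.096),
    ({"Class=2nd", "Sex=Female"}, "Survived=Yes", 2.716),
    ({"Class=2nd", "Sex=Male"}, "Survived=No", 1.271),
    ({"Class=1st", "Sex=Female"}, "Survived=Yes", 3.010),
    ({"Class=Crew", "Sex=Female"}, "Survived=Yes", 2.692),
    ({"Class=3rd", "Sex=Male"}, "Survived=No", 1.222),
    ({"Class=2nd", "Sex=Female", "Age=Child"}, "Survived=Yes", 3.096),
    ({"Class=2nd", "Sex=Female", "Age=Adult"}, "Survived=Yes", 2.663),
    ({"Class=2nd", "Sex=Male", "Age=Adult"}, "Survived=No", 1.354),
    ({"Class=1st", "Sex=Female", "Age=Adult"}, "Survived=Yes", 3.010),
    ({"Class=Crew", "Sex=Female", "Age=Adult"}, "Survived=Yes", 2.692),
    ({"Class=3rd", "Sex=Male", "Age=Adult"}, "Survived=No", 1.237),
]


def is_redundant(rule, all_rules):
    """True if a more general rule with the same rhs has lift at least as high."""
    lhs, rhs, lift = rule
    return any(
        other_lhs < lhs and other_rhs == rhs and other_lift >= lift
        for other_lhs, other_rhs, other_lift in all_rules
    )


pruned = [r for r in rules if not is_redundant(r, rules)]
for lhs, rhs, lift in pruned:
    print(sorted(lhs), "=>", rhs, lift)
```

Under this definition rules 7, 8, 10 and 11 are dropped because the extra lhs item adds no lift over rules 1, 2, 4 and 5, while rules 9 and 12 are kept because adding Age=Adult raises the lift.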

Visualization

## Loading required package: grid
## 
## Attaching package: 'arulesViz'
## 
## The following object is masked from 'package:base':
## 
##     abbreviate