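- Introduction to R and basic concepts of association rules
- The Apriori algorithm
- Example: a simple Titanic passenger dataset
- Applying the Apriori algorithm to the data
- Pruning redundant rules
- Visualization
- Interpretation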
For background on R, see:
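Support:
Across all transactions, the probability that events A and B occur together: P(A ∩ B).
Confidence:
Given that event A has occurred, the probability that B occurs as well, i.e. the conditional probability P(B | A).
Lift:
Lift = Confidence / Expected Confidence (here, the expected confidence is P(B)).
- When Lift > 1, A and B are positively associated.
- When Lift = 1, A and B are unrelated.
- When Lift < 1, A and B are negatively associated.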
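To make the three measures concrete, here is a small worked example in base R for the rule {Class=Crew} => {Sex=Male}. It is only a sketch: it assumes the `Titanic` contingency table that ships with R's `datasets` package, and the variable names (`counts`, `is_A`, `is_B`) are made up for illustration.

```r
# Worked example for the rule {Class=Crew} => {Sex=Male}, base R only.
# Assumption: the built-in `Titanic` table stands in for the article's data.
counts <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq
n <- sum(counts$Freq)              # total number of people (transactions)

is_A <- counts$Class == "Crew"     # antecedent A
is_B <- counts$Sex == "Male"       # consequent B

n_AB <- sum(counts$Freq[is_A & is_B])          # people matching both A and B

support    <- n_AB / n                         # P(A and B)
confidence <- n_AB / sum(counts$Freq[is_A])    # P(B | A)
expected   <- sum(counts$Freq[is_B]) / n       # P(B), the expected confidence
lift       <- confidence / expected

round(c(support = support, confidence = confidence, lift = lift), 4)
```

The same rule appears as rule 7 in the rule listing later in this article, with support 0.3916402, confidence 0.9740113 and lift 1.2384742. A lift above 1 says that crew members were male more often than people on board in general.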
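Apriori is one of the most influential algorithms for mining the frequent itemsets behind Boolean association rules. At its core it is a level-wise, iterative algorithm built on a two-phase frequent-itemset idea. The rules it produces are classified as single-dimensional, single-level, Boolean association rules. Here, every itemset whose support is not less than the minimum support is called a frequent itemset, also commonly called a large itemset.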
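Advantages:
Simple, easy to understand, and undemanding of the data.
Disadvantages:
1. At each step, generating candidate itemsets produces too many combinations, because elements that should not take part in a combination are not excluded.
2. Every time the support of an itemset is computed, all records in database D are scanned and compared once. For a large database these scans greatly increase the system's I/O overhead, and this cost grows rapidly as the number of records increases.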
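First, count how often each single-item itemset occurs and keep those whose support is not less than the minimum support. These are the 1-item large itemsets.
Then loop until no further large itemsets are generated. In step k, candidate k-itemsets are generated from the (k-1)-item large itemsets found in step k-1. The database is then scanned to obtain the support of each candidate, which is compared with the minimum support to find the k-item large itemsets.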
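The two steps above can be sketched in base R. This is a teaching sketch, not the implementation used by any package: the function name `apriori_sketch`, its arguments, and the toy `baskets` data are all hypothetical. Pruning relies on the fact that every subset of a frequent itemset must itself be frequent.

```r
# Level-wise frequent itemset search (illustrative sketch only).
# `transactions`: list of character vectors; `min_support`: fraction in [0, 1].
apriori_sketch <- function(transactions, min_support) {
  # Fraction of transactions that contain every item of `itemset`
  support_of <- function(itemset) {
    mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
  }

  # Step 1: frequent 1-itemsets
  items <- sort(unique(unlist(transactions)))
  current <- Filter(function(s) support_of(s) >= min_support, as.list(items))

  frequent <- list()
  k <- 1
  while (length(current) > 0) {
    frequent <- c(frequent, current)
    k <- k + 1

    # Join: unions of two frequent (k-1)-itemsets that have exactly k items
    candidates <- list()
    if (length(current) > 1) {
      for (i in seq_len(length(current) - 1)) {
        for (j in (i + 1):length(current)) {
          u <- sort(union(current[[i]], current[[j]]))
          if (length(u) == k) candidates[[paste(u, collapse = ",")]] <- u
        }
      }
    }

    # Prune: drop candidates that have an infrequent (k-1)-subset
    prev_keys <- vapply(current, paste, character(1), collapse = ",")
    candidates <- Filter(function(cand) {
      all(vapply(seq_along(cand), function(d) {
        paste(cand[-d], collapse = ",") %in% prev_keys
      }, logical(1)))
    }, candidates)

    # Scan: keep the candidates that reach the minimum support
    current <- unname(Filter(function(s) support_of(s) >= min_support, candidates))
  }
  frequent
}

# Toy data, invented for this example
baskets <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer"),
  c("milk", "diapers", "beer"),
  c("bread", "milk", "diapers"),
  c("bread", "milk", "beer")
)
result <- apriori_sketch(baskets, min_support = 0.5)
print(result)
```

With a minimum support of 0.5, the four single items and the pair {bread, milk} are frequent, and the loop stops at k = 3 because a single frequent pair cannot be joined into a 3-item candidate.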
The Apriori algorithm uses the property above to generate candidate itemsets. Candidate generation mainly consists of
The Titanic dataset has four columns: Class (passenger class), Sex, Age, and Survived.
Each row holds the data for one person.
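The article does not say how the data were loaded. As a sketch, a data frame with the same four columns can be rebuilt from the `Titanic` contingency table that ships with R's `datasets` package. The object name `titanic.raw` is an assumption made for illustration.

```r
# Expand the built-in 4-way contingency table into one row per person.
# Assumption: `Titanic` from the datasets package matches the article's data.
counts <- as.data.frame(Titanic)          # columns: Class, Sex, Age, Survived, Freq
titanic.raw <- counts[rep(seq_len(nrow(counts)), counts$Freq), 1:4]
rownames(titanic.raw) <- NULL             # renumber rows 1..n

head(titanic.raw, 10)                          # first ten people
titanic.raw[sample(nrow(titanic.raw), 5), ]    # five randomly chosen rows
summary(titanic.raw)                           # counts per level of each column
```

Row order, row numbers, and the order of factor levels may differ from the listings in this article, but the counts per category should agree with the summary shown there.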
##    Class  Sex   Age Survived
## 1    3rd Male Child       No
## 2    3rd Male Child       No
## 3    3rd Male Child       No
## 4    3rd Male Child       No
## 5    3rd Male Child       No
## 6    3rd Male Child       No
## 7    3rd Male Child       No
## 8    3rd Male Child       No
## 9    3rd Male Child       No
## 10   3rd Male Child       No
Five randomly sampled rows:
##      Class  Sex   Age Survived
## 1774  Crew Male Adult      Yes
## 1598   1st Male Adult      Yes
## 86     1st Male Adult       No
## 1292  Crew Male Adult       No
## 1165  Crew Male Adult       No
Data summary:
##   Class         Sex          Age       Survived
##  1st :325   Female: 470   Adult:2092   No :1490
##  2nd :285   Male  :1731   Child: 109   Yes: 711
##  3rd :706
##  Crew:885
## Loading required package: Matrix
##
## Attaching package: 'arules'
##
## The following objects are masked from 'package:base':
##
##     %in%, write
##
## Parameter specification:
##  confidence minval smax arem  aval originalSupport support minlen maxlen
##         0.8    0.1    1 none FALSE            TRUE     0.1      1     10
##  target   ext
##   rules FALSE
##
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
##
## apriori - find association rules with the apriori algorithm
## version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[10 item(s), 2201 transaction(s)] done [0.00s].
## sorting and recoding items ... [9 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [27 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
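The parameter block in the log above (support 0.1, confidence 0.8, maxlen 10) matches the defaults of `apriori()` in the `arules` package, so a call along these lines would likely produce a similar log. This is a sketch under assumptions: the data frame is rebuilt from R's built-in `Titanic` table, and the names `titanic.raw` and `rules` are chosen for illustration.

```r
library(arules)

# Assumption: rebuild one row per person from the built-in `Titanic` table,
# since the article does not name its data file.
counts <- as.data.frame(Titanic)
titanic.raw <- counts[rep(seq_len(nrow(counts)), counts$Freq), 1:4]

# With no `parameter` argument, apriori() uses its defaults:
# support = 0.1, confidence = 0.8, maxlen = 10.
rules <- apriori(titanic.raw)

length(rules)    # number of rules found; the log above reports 27
inspect(rules)   # print lhs, rhs and the quality measures of each rule
```

`apriori()` converts the data frame to transactions itself, turning each column/value pair such as `Class=Crew` into one item, which is why the log reports 10 items for 2201 transactions.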
Displaying the association rules:
##    lhs                                     rhs             support   confidence lift
## 1  {}                                   => {Age=Adult}     0.9504771 0.9504771  1.0000000
## 2  {Class=2nd}                          => {Age=Adult}     0.1185825 0.9157895  0.9635051
## 3  {Class=1st}                          => {Age=Adult}     0.1449341 0.9815385  1.0326798
## 4  {Sex=Female}                         => {Age=Adult}     0.1930940 0.9042553  0.9513700
## 5  {Class=3rd}                          => {Age=Adult}     0.2848705 0.8881020  0.9343750
## 6  {Survived=Yes}                       => {Age=Adult}     0.2971377 0.9198312  0.9677574
## 7  {Class=Crew}                         => {Sex=Male}      0.3916402 0.9740113  1.2384742
## 8  {Class=Crew}                         => {Age=Adult}     0.4020900 1.0000000  1.0521033
## 9  {Survived=No}                        => {Sex=Male}      0.6197183 0.9154362  1.1639949
## 10 {Survived=No}                        => {Age=Adult}     0.6533394 0.9651007  1.0153856
## 11 {Sex=Male}                           => {Age=Adult}     0.7573830 0.9630272  1.0132040
## 12 {Sex=Female, Survived=Yes}           => {Age=Adult}     0.1435711 0.9186047  0.9664669
## 13 {Class=3rd, Sex=Male}                => {Survived=No}   0.1917310 0.8274510  1.2222950
## 14 {Class=3rd, Survived=No}             => {Age=Adult}     0.2162653 0.9015152  0.9484870
## 15 {Class=3rd, Sex=Male}                => {Age=Adult}     0.2099046 0.9058824  0.9530818
## 16 {Sex=Male, Survived=Yes}             => {Age=Adult}     0.1535666 0.9209809  0.9689670
## 17 {Class=Crew, Survived=No}            => {Sex=Male}      0.3044071 0.9955423  1.2658514
## 18 {Class=Crew, Survived=No}            => {Age=Adult}     0.3057701 1.0000000  1.0521033
## 19 {Class=Crew, Sex=Male}               => {Age=Adult}     0.3916402 1.0000000  1.0521033
## 20 {Class=Crew, Age=Adult}              => {Sex=Male}      0.3916402 0.9740113  1.2384742
## 21 {Sex=Male, Survived=No}              => {Age=Adult}     0.6038164 0.9743402  1.0251065
## 22 {Age=Adult, Survived=No}             => {Sex=Male}      0.6038164 0.9242003  1.1751385
## 23 {Class=3rd, Sex=Male, Survived=No}   => {Age=Adult}     0.1758292 0.9170616  0.9648435
## 24 {Class=3rd, Age=Adult, Survived=No}  => {Sex=Male}      0.1758292 0.8130252  1.0337773
## 25 {Class=3rd, Sex=Male, Age=Adult}     => {Survived=No}   0.1758292 0.8376623  1.2373791
## 26 {Class=Crew, Sex=Male, Survived=No}  => {Age=Adult}     0.3044071 1.0000000  1.0521033
## 27 {Class=Crew, Age=Adult, Survived=No} => {Sex=Male}      0.3044071 0.9955423  1.2658514
##    lhs                                    rhs              support confidence lift
## 1  {Class=2nd, Age=Child}              => {Survived=Yes}   0.011   1.000      3.096
## 2  {Class=2nd, Sex=Female}             => {Survived=Yes}   0.042   0.877      2.716
## 3  {Class=2nd, Sex=Male}               => {Survived=No}    0.070   0.860      1.271
## 4  {Class=1st, Sex=Female}             => {Survived=Yes}   0.064   0.972      3.010
## 5  {Class=Crew, Sex=Female}            => {Survived=Yes}   0.009   0.870      2.692
## 6  {Class=3rd, Sex=Male}               => {Survived=No}    0.192   0.827      1.222
## 7  {Class=2nd, Sex=Female, Age=Child}  => {Survived=Yes}   0.006   1.000      3.096
## 8  {Class=2nd, Sex=Female, Age=Adult}  => {Survived=Yes}   0.036   0.860      2.663
## 9  {Class=2nd, Sex=Male, Age=Adult}    => {Survived=No}    0.070   0.917      1.354
## 10 {Class=1st, Sex=Female, Age=Adult}  => {Survived=Yes}   0.064   0.972      3.010
## 11 {Class=Crew, Sex=Female, Age=Adult} => {Survived=Yes}   0.009   0.870      2.692
## 12 {Class=3rd, Sex=Male, Age=Adult}    => {Survived=No}    0.176   0.838      1.237
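The listing above contains only rules whose right-hand side is a `Survived` item, with supports as low as 0.006. A call along the following lines would produce rules of that shape. The thresholds (`support = 0.005`, `confidence = 0.8`, `minlen = 2`), the three-digit rounding, and the object names are assumptions chosen to be consistent with the listing, not settings stated in this article. The pruning step uses `is.redundant()` from `arules` as one possible way to drop rules such as rule 7, which adds `Sex=Female` to rule 1 without changing its confidence.

```r
library(arules)

# Assumption: data rebuilt from the built-in `Titanic` table.
counts <- as.data.frame(Titanic)
titanic.raw <- counts[rep(seq_len(nrow(counts)), counts$Freq), 1:4]

# Mine only rules that predict survival (assumed thresholds, see lead-in).
rules_surv <- apriori(
  titanic.raw,
  parameter  = list(minlen = 2, support = 0.005, confidence = 0.8),
  appearance = list(rhs = c("Survived=No", "Survived=Yes"), default = "lhs"),
  control    = list(verbose = FALSE)
)

# Drop rules for which a more general rule in the set has the same
# or a higher confidence.
pruned <- rules_surv[!is.redundant(rules_surv)]

# Round the quality measures for display only.
quality(rules_surv) <- round(quality(rules_surv), digits = 3)
inspect(rules_surv)
inspect(pruned)
```

Restricting `rhs` through `appearance` keeps the search focused on the question of interest (who survived) instead of rules such as {Class=Crew} => {Age=Adult}, which are true but uninformative.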
## Loading required package: grid
##
## Attaching package: 'arulesViz'
##
## The following object is masked from 'package:base':
##
##     abbreviate