Using arules for Association Rules in R

This post walks through several examples of association rule mining.

Download the Titanic data from: http://www.rdatamining.com/data/titanic.raw.rdata?attredirects=0&d=1

# Change the folder path to match your local machine
load("C:/Users/rahul/Google Drive/IISWBM Study Materials/Data Mining 1/titanic.raw.rdata")

titanic_r <- titanic.raw
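
If the download link is unavailable, a passenger-level table of the same shape can probably be rebuilt from the `Titanic` contingency table that ships with R's datasets package. This is only a sketch: it assumes the built-in table holds the same counts as titanic.raw, and the names `titanic_counts` and `titanic_alt` are made up here for illustration.

```r
# Assumption: the built-in Titanic table matches the counts in titanic.raw.
titanic_counts <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq

# Repeat each combination Freq times to get one row per passenger.
row_index   <- rep(seq_len(nrow(titanic_counts)), titanic_counts$Freq)
titanic_alt <- titanic_counts[row_index, c("Class", "Sex", "Age", "Survived")]
rownames(titanic_alt) <- NULL

nrow(titanic_alt)
table(titanic_alt$Age, titanic_alt$Survived)
```

The factor levels may come out in a different order than in titanic.raw, so the rows of the table can be swapped even when the counts agree.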

titanic_r[1:3,]
##   Class  Sex   Age Survived
## 1   3rd Male Child       No
## 2   3rd Male Child       No
## 3   3rd Male Child       No
table(titanic_r$Age,titanic_r$Survived)
##        
##           No  Yes
##   Adult 1438  654
##   Child   52   57
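
The raw counts are easier to compare as row proportions. The sketch below re-enters the four counts printed above by hand (the name `age_by_survival` is illustrative, not part of the original code) and uses `prop.table()` to convert each row into shares.

```r
# Counts copied from the table above; rows are Age, columns are Survived.
age_by_survival <- matrix(
  c(1438, 654,
      52,  57),
  nrow = 2, byrow = TRUE,
  dimnames = list(Age = c("Adult", "Child"), Survived = c("No", "Yes"))
)

# margin = 1 divides each cell by its row total.
survival_share <- prop.table(age_by_survival, margin = 1)
round(survival_share, 4)
```

Roughly 52% of children survived compared with about 31% of adults. The same numbers appear later as the confidence of the single-item age rules.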
require(arules)
## Loading required package: arules
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
# Visualization
require(arulesViz)
## Loading required package: arulesViz
## Loading required package: grid
rule <- apriori(titanic_r[2:4], 
                # minimum rule length, support and confidence thresholds
                parameter=list(minlen=2, supp=0.001, conf=0.001),  
                appearance = list(default = "lhs", rhs=c("Survived=Yes","Survived=No")))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##       0.001    0.1    1 none FALSE            TRUE       5   0.001      2
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[2 item(s)] done [0.00s].
## set transactions ...[6 item(s), 2201 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [16 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rule)
##      lhs                       rhs            support     confidence
## [1]  {Age=Child}            => {Survived=Yes} 0.025897319 0.5229358 
## [2]  {Age=Child}            => {Survived=No}  0.023625625 0.4770642 
## [3]  {Sex=Female}           => {Survived=Yes} 0.156292594 0.7319149 
## [4]  {Sex=Female}           => {Survived=No}  0.057246706 0.2680851 
## [5]  {Sex=Male}             => {Survived=Yes} 0.166742390 0.2120162 
## [6]  {Age=Adult}            => {Survived=Yes} 0.297137665 0.3126195 
## [7]  {Sex=Male}             => {Survived=No}  0.619718310 0.7879838 
## [8]  {Age=Adult}            => {Survived=No}  0.653339391 0.6873805 
## [9]  {Sex=Female,Age=Child} => {Survived=Yes} 0.012721490 0.6222222 
## [10] {Sex=Female,Age=Child} => {Survived=No}  0.007723762 0.3777778 
## [11] {Sex=Male,Age=Child}   => {Survived=Yes} 0.013175829 0.4531250 
## [12] {Sex=Male,Age=Child}   => {Survived=No}  0.015901863 0.5468750 
## [13] {Sex=Female,Age=Adult} => {Survived=Yes} 0.143571104 0.7435294 
## [14] {Sex=Female,Age=Adult} => {Survived=No}  0.049522944 0.2564706 
## [15] {Sex=Male,Age=Adult}   => {Survived=Yes} 0.153566561 0.2027594 
## [16] {Sex=Male,Age=Adult}   => {Survived=No}  0.603816447 0.7972406 
##      lift      count
## [1]  1.6188209   57 
## [2]  0.7047103   52 
## [3]  2.2657450  344 
## [4]  0.3960103  126 
## [5]  0.6563257  367 
## [6]  0.9677574  654 
## [7]  1.1639949 1364 
## [8]  1.0153856 1438 
## [9]  1.9261760   28 
## [10] 0.5580462   17 
## [11] 1.4027118   29 
## [12] 0.8078335   35 
## [13] 2.3016993  316 
## [14] 0.3788535  109 
## [15] 0.6276702  338 
## [16] 1.1776688 1329
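
To see where the support, confidence and lift columns come from, rule [1], {Age=Child} => {Survived=Yes}, can be recomputed by hand from the counts shown earlier. This is a worked check using the standard definitions of the three measures; the variable names are illustrative.

```r
n_total     <- 2201       # transactions reported by apriori
n_child     <- 52 + 57    # all children
n_child_yes <- 57         # children who survived
n_yes       <- 654 + 57   # all survivors

support    <- n_child_yes / n_total            # share of all passengers matching the whole rule
confidence <- n_child_yes / n_child            # share of children who survived
lift       <- confidence / (n_yes / n_total)   # confidence relative to the overall survival rate

round(c(support = support, confidence = confidence, lift = lift), 4)
```

A lift above 1 means the group survived more often than passengers overall, and a lift below 1 means it survived less often.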
sort.rule <- sort(rule, by="lift")
plot(sort.rule, method="graph", control=list(nodeCol="red", edgeCol="blue"))
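
With 16 rules, it can help to keep only those whose lift exceeds 1 before inspecting or plotting. The sketch below is self-contained: it assumes the arules package is installed and rebuilds the data from R's built-in `Titanic` table on the assumption that it matches titanic.raw. The names `passengers`, `rules_alt` and `strong` are illustrative.

```r
library(arules)

# Assumption: the built-in Titanic table matches titanic.raw.
counts     <- as.data.frame(Titanic)
passengers <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                     c("Sex", "Age", "Survived")]

# Same settings as above, with the progress log switched off.
rules_alt <- apriori(passengers,
                     parameter  = list(minlen = 2, supp = 0.001, conf = 0.001),
                     appearance = list(default = "lhs",
                                       rhs = c("Survived=Yes", "Survived=No")),
                     control    = list(verbose = FALSE))

# Keep rules whose group is over-represented in the outcome, then show the top three.
strong <- subset(rules_alt, subset = lift > 1)
inspect(head(sort(strong, by = "lift"), 3))
```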

rule1 <- apriori(titanic_r[2:4], 
                # minimum rule length, support and confidence thresholds
                parameter=list(minlen=2, supp=0.001, conf=0.001),  
                appearance = list(lhs=c("Age=Child","Age=Adult"), rhs=c("Survived=Yes")))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##       0.001    0.1    1 none FALSE            TRUE       5   0.001      2
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 2 
## 
## set item appearances ...[3 item(s)] done [0.00s].
## set transactions ...[3 item(s), 2201 transaction(s)] done [0.00s].
## sorting and recoding items ... [3 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(rule1)
##     lhs            rhs            support    confidence lift      count
## [1] {Age=Child} => {Survived=Yes} 0.02589732 0.5229358  1.6188209  57  
## [2] {Age=Adult} => {Survived=Yes} 0.29713766 0.3126195  0.9677574 654
sort.rule1 <- sort(rule1, by="lift")
inspect(sort.rule1)
##     lhs            rhs            support    confidence lift      count
## [1] {Age=Child} => {Survived=Yes} 0.02589732 0.5229358  1.6188209  57  
## [2] {Age=Adult} => {Survived=Yes} 0.29713766 0.3126195  0.9677574 654
plot(sort.rule1)

plot(sort.rule1, method="graph", control=list(nodeCol="red", edgeCol="blue"))

plot(sort.rule1, method="grouped", control=list(col=2))

Conclusion:

Children are far fewer than adults (109 versus 2092), but a larger proportion of them survived: about 52% compared with 31%. Adult females have the highest survival rate of any group, with a confidence of 0.74 and a lift of 2.30. Confidence is the proportion of the group that survived, and lift is that proportion divided by the overall survival rate. Even among children, females survived at a higher rate than males (62% versus 45%). Because adults greatly outnumber children, the adult rules have higher counts (support), so they appear as larger dots in the plots.