This section walks through several examples of association rule mining on the Titanic data.
Download the Titanic data set (titanic.raw.rdata) from: http://www.rdatamining.com/data/titanic.raw.rdata?attredirects=0&d=1
# Change the folder path below to match your local machine
load("C:/Users/rahul/Google Drive/IISWBM Study Materials/Data Mining 1/titanic.raw.rdata")
titanic_r <- titanic.raw
titanic_r[1:3,]
## Class Sex Age Survived
## 1 3rd Male Child No
## 2 3rd Male Child No
## 3 3rd Male Child No
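If the download is unavailable, a data frame with the same four columns can likely be rebuilt from the `Titanic` contingency table that ships with base R. This is only a sketch: it assumes the raw file is an expanded form of that built-in table, the name `titanic_alt` is made up for illustration, and the row order may differ from the downloaded file.

```r
# Sketch: rebuild a one-row-per-passenger data frame from datasets::Titanic.
# Assumption: titanic.raw is an expanded form of this built-in table.
freq <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq

# Repeat each combination Freq times, then keep only the four factor columns.
titanic_alt <- freq[rep(seq_len(nrow(freq)), freq$Freq), 1:4]
rownames(titanic_alt) <- NULL

# Cross-check against the Age by Survived counts reported in this section.
nrow(titanic_alt)
table(titanic_alt$Age, titanic_alt$Survived)
```

If the assumption holds, the row count and the cross-tabulation should agree with the figures reported for `titanic_r`.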
table(titanic_r$Age,titanic_r$Survived)
##
## No Yes
## Adult 1438 654
## Child 52 57
require(arules)
## Loading required package: arules
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
# Visualization
require(arulesViz)
## Loading required package: arulesViz
## Loading required package: grid
rule <- apriori(titanic_r[2:4],
# min support & confidence
parameter=list(minlen=2, supp=0.001, conf=0.001),
appearance = list(default = "lhs", rhs=c("Survived=Yes","Survived=No")))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.001 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[2 item(s)] done [0.00s].
## set transactions ...[6 item(s), 2201 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [16 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rule)
## lhs rhs support confidence
## [1] {Age=Child} => {Survived=Yes} 0.025897319 0.5229358
## [2] {Age=Child} => {Survived=No} 0.023625625 0.4770642
## [3] {Sex=Female} => {Survived=Yes} 0.156292594 0.7319149
## [4] {Sex=Female} => {Survived=No} 0.057246706 0.2680851
## [5] {Sex=Male} => {Survived=Yes} 0.166742390 0.2120162
## [6] {Age=Adult} => {Survived=Yes} 0.297137665 0.3126195
## [7] {Sex=Male} => {Survived=No} 0.619718310 0.7879838
## [8] {Age=Adult} => {Survived=No} 0.653339391 0.6873805
## [9] {Sex=Female,Age=Child} => {Survived=Yes} 0.012721490 0.6222222
## [10] {Sex=Female,Age=Child} => {Survived=No} 0.007723762 0.3777778
## [11] {Sex=Male,Age=Child} => {Survived=Yes} 0.013175829 0.4531250
## [12] {Sex=Male,Age=Child} => {Survived=No} 0.015901863 0.5468750
## [13] {Sex=Female,Age=Adult} => {Survived=Yes} 0.143571104 0.7435294
## [14] {Sex=Female,Age=Adult} => {Survived=No} 0.049522944 0.2564706
## [15] {Sex=Male,Age=Adult} => {Survived=Yes} 0.153566561 0.2027594
## [16] {Sex=Male,Age=Adult} => {Survived=No} 0.603816447 0.7972406
## lift count
## [1] 1.6188209 57
## [2] 0.7047103 52
## [3] 2.2657450 344
## [4] 0.3960103 126
## [5] 0.6563257 367
## [6] 0.9677574 654
## [7] 1.1639949 1364
## [8] 1.0153856 1438
## [9] 1.9261760 28
## [10] 0.5580462 17
## [11] 1.4027118 29
## [12] 0.8078335 35
## [13] 2.3016993 316
## [14] 0.3788535 109
## [15] 0.6276702 338
## [16] 1.1776688 1329
sort.rule <- sort(rule, by="lift")
plot(sort.rule, method="graph", control=list(nodeCol="red", edgeCol="blue"))
rule1 <- apriori(titanic_r[2:4],
# min support & confidence
parameter=list(minlen=2, supp=0.001, conf=0.001),
appearance = list(lhs=c("Age=Child","Age=Adult"), rhs=c("Survived=Yes")))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.001 0.1 1 none FALSE TRUE 5 0.001 2
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2
##
## set item appearances ...[3 item(s)] done [0.00s].
## set transactions ...[3 item(s), 2201 transaction(s)] done [0.00s].
## sorting and recoding items ... [3 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [2 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rule1)
## lhs rhs support confidence lift count
## [1] {Age=Child} => {Survived=Yes} 0.02589732 0.5229358 1.6188209 57
## [2] {Age=Adult} => {Survived=Yes} 0.29713766 0.3126195 0.9677574 654
sort.rule1 <- sort(rule1, by="lift")
inspect(sort.rule1)
## lhs rhs support confidence lift count
## [1] {Age=Child} => {Survived=Yes} 0.02589732 0.5229358 1.6188209 57
## [2] {Age=Adult} => {Survived=Yes} 0.29713766 0.3126195 0.9677574 654
plot(sort.rule1)
plot(sort.rule1, method="graph", control=list(nodeCol="red", edgeCol="blue"))
plot(sort.rule1, method="grouped", control=list(col=2))
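Children are far fewer than adults, but a larger proportion of them survived: the confidence of {Age=Child} => {Survived=Yes} is 0.52, against 0.31 for adults. Adult females have the highest survival rate of any group, with a confidence of 0.74 and the highest lift (2.30). Lift is the rule's confidence divided by the overall survival rate, so a value above 1 indicates better-than-average survival. Even among children, females survived at a higher rate than males (0.62 versus 0.45). Because adults far outnumber children, the adult rules have larger support and counts, which is why they appear as the bigger dots in the plots.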
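The support, confidence and lift values for the child rule can be checked by hand from the Age by Survived table printed earlier. The sketch below uses only base R and the four counts from that table; the variable names are illustrative and not part of the `arules` API.

```r
# Counts copied from the table(titanic_r$Age, titanic_r$Survived) output.
child_yes <- 57;  child_no <- 52
adult_yes <- 654; adult_no <- 1438

n_total    <- child_yes + child_no + adult_yes + adult_no  # all passengers
n_child    <- child_yes + child_no                         # all children
n_survived <- child_yes + adult_yes                        # all survivors

# Rule {Age=Child} => {Survived=Yes}
support    <- child_yes / n_total                   # P(Child and Survived)
confidence <- child_yes / n_child                   # P(Survived | Child)
lift       <- confidence / (n_survived / n_total)   # confidence vs. baseline rate

round(c(support = support, confidence = confidence, lift = lift), 4)
```

The three numbers should agree with the first row of the `inspect(rule)` output (0.0259, 0.5229 and 1.6188 after rounding). Swapping in the adult counts reproduces the {Age=Adult} => {Survived=Yes} rule in the same way.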