Association Rules

Introduction

The Titanic disaster remains one of the most analyzed historical events, both because of its tragic loss of life and because of the notable patterns among survivors (and probably also because the dataset is so popular and easy to get :D). This project uses association rules, mined with the Apriori algorithm, to uncover relationships between passenger attributes and survival outcomes, looking at demographics, ticket class, family structure, and embarkation point. The analysis uses the Titanic dataset from Kaggle (https://www.kaggle.com/c/titanic/data?select=train.csv), transformed into a transaction-based format for association rule mining.

str(titanic_data)

## 'data.frame':    891 obs. of  12 variables:
##  $ PassengerId: int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Survived   : int  0 1 1 1 0 0 0 0 1 1 ...
##  $ Pclass     : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Name       : chr  "Braund, Mr. Owen Harris" "Cumings, Mrs. John Bradley (Florence Briggs Thayer)" "Heikkinen, Miss. Laina" "Futrelle, Mrs. Jacques Heath (Lily May Peel)" ...
##  $ Sex        : chr  "male" "female" "female" "female" ...
##  $ Age        : num  22 38 26 35 35 NA 54 2 27 14 ...
##  $ SibSp      : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch      : int  0 0 0 0 0 0 0 1 2 0 ...
##  $ Ticket     : chr  "A/5 21171" "PC 17599" "STON/O2. 3101282" "113803" ...
##  $ Fare       : num  7.25 71.28 7.92 53.1 8.05 ...
##  $ Cabin      : chr  "" "C85" "" "C123" ...
##  $ Embarked   : chr  "S" "C" "S" "S" ...

summary(titanic_data)

##   PassengerId       Survived          Pclass          Name          
##  Min.   :  1.0   Min.   :0.0000   Min.   :1.000   Length:891        
##  1st Qu.:223.5   1st Qu.:0.0000   1st Qu.:2.000   Class :character  
##  Median :446.0   Median :0.0000   Median :3.000   Mode  :character  
##  Mean   :446.0   Mean   :0.3838   Mean   :2.309                     
##  3rd Qu.:668.5   3rd Qu.:1.0000   3rd Qu.:3.000                     
##  Max.   :891.0   Max.   :1.0000   Max.   :3.000                     
##                                                                     
##      Sex                 Age            SibSp           Parch       
##  Length:891         Min.   : 0.42   Min.   :0.000   Min.   :0.0000  
##  Class :character   1st Qu.:20.12   1st Qu.:0.000   1st Qu.:0.0000  
##  Mode  :character   Median :28.00   Median :0.000   Median :0.0000  
##                     Mean   :29.70   Mean   :0.523   Mean   :0.3816  
##                     3rd Qu.:38.00   3rd Qu.:1.000   3rd Qu.:0.0000  
##                     Max.   :80.00   Max.   :8.000   Max.   :6.0000  
##                     NA's   :177                                     
##     Ticket               Fare           Cabin             Embarked        
##  Length:891         Min.   :  0.00   Length:891         Length:891        
##  Class :character   1st Qu.:  7.91   Class :character   Class :character  
##  Mode  :character   Median : 14.45   Mode  :character   Mode  :character  
##                     Mean   : 32.20                                        
##                     3rd Qu.: 31.00                                        
##                     Max.   :512.33                                        
##

Data preparation

To prepare the dataset for analysis I performed some transformations. First, I replaced the NA values in “Age” with the median to reduce their impact, then converted “Age” into a discrete variable with levels “Child” (0-12 years), “Teenager” (13-18 years), “Adult” (19-60 years) and “Senior” (over 60 years). For “SibSp” (the number of siblings or spouses a passenger had on board) and “Parch” (the same for parents or children) I replaced the counts with “yes” if the passenger had any and “no” if the value was 0, for easier interpretation. Missing values in “Embarked” were replaced with the mode and the port codes were spelled out as city names. For readability I recoded the values 0 and 1 in “Survived” as “No” and “Yes”. Columns not used in the analysis (PassengerId, Name, Ticket, Fare, Cabin) were dropped.
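The preparation code itself is not shown here, so the following is only a minimal sketch of how these steps could look in base R. The helper name prepare_titanic and the toy data frame are hypothetical; the column names follow the Kaggle train.csv layout, and the age breaks follow the groups described above.

```r
# Hypothetical helper: one possible implementation of the described steps.
prepare_titanic <- function(df) {
  # Impute missing ages with the median, then bin into the four age groups.
  age <- ifelse(is.na(df$Age), median(df$Age, na.rm = TRUE), df$Age)
  age_group <- cut(age, breaks = c(-Inf, 12, 18, 60, Inf),
                   labels = c("Child", "Teenager", "Adult", "Senior"))

  # Replace blank or missing embarkation codes with the most frequent port.
  emb <- df$Embarked
  missing_emb <- is.na(emb) | emb == ""
  emb[missing_emb] <- names(which.max(table(emb[!missing_emb])))

  data.frame(
    Survived = factor(df$Survived, levels = c(0, 1), labels = c("No", "Yes")),
    Pclass   = factor(df$Pclass),
    Sex      = factor(df$Sex),
    SibSp    = factor(ifelse(df$SibSp > 0, "yes", "no"), levels = c("no", "yes")),
    Parch    = factor(ifelse(df$Parch > 0, "yes", "no"), levels = c("no", "yes")),
    Embarked = factor(emb, levels = c("C", "Q", "S"),
                      labels = c("Cherbourg", "Queenstown", "Southampton")),
    AgeGroup = age_group
  )
}

# Toy rows (made up) so the sketch runs without the Kaggle file.
toy <- data.frame(
  Survived = c(0, 1, 1, 0), Pclass = c(3, 1, 2, 3),
  Sex = c("male", "female", "female", "male"),
  Age = c(22, NA, 8, 70), SibSp = c(1, 0, 0, 0), Parch = c(0, 0, 2, 0),
  Embarked = c("S", "C", "", "S"), stringsAsFactors = FALSE
)
prepared <- prepare_titanic(toy)
summary(prepared)
```

Returning only factor columns matters here, because the conversion to transactions expects every column to be categorical.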

summary(titanic_data)

##  Survived  Pclass      Sex      SibSp     Parch            Embarked  
##  No :549   1:216   female:314   no :608   no :678   Cherbourg  :168  
##  Yes:342   2:184   male  :577   yes:283   yes:213   Queenstown : 77  
##            3:491                                    Southampton:646  
##                                                                      
##      AgeGroup  
##  Child   : 69  
##  Teenager: 70  
##  Adult   :730  
##  Senior  : 22

titanic_transactions <- as(titanic_data, "transactions")
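To see what this conversion does, here is a small sketch on a made-up two-column data frame (toy_df is hypothetical, not part of the analysis). Each factor level becomes one item labelled "variable=level", and each row becomes one transaction.

```r
library(arules)

# Made-up data: two factor columns, three passengers.
toy_df <- data.frame(
  Sex      = factor(c("male", "female", "female")),
  Survived = factor(c("No", "Yes", "Yes"))
)

toy_trans <- as(toy_df, "transactions")

itemLabels(toy_trans)   # item names built as "variable=level"
length(toy_trans)       # one transaction per row
```

This naming scheme is why the rules later in the analysis refer to items such as Sex=male or Survived=No.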

Apriori

I chose the Apriori algorithm because the Titanic dataset consists mostly of categorical attributes, the resulting rules are easy to visualize and interpret, and the method doesn’t require a pre-defined dependent variable, which helps to find a wide variety of rules.

After experimenting a little with the parameters I settled on support = 0.3 and confidence = 0.8 for a general exploration of the rules, which gives 119 rules in total.

rules <- apriori(titanic_transactions, 
                 parameter = list(support = 0.3, confidence = 0.8,  minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5     0.3      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 267 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[18 item(s), 891 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [119 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

inspect(sort(rules, by="lift")[1:20])

##      lhs                        rhs             support confidence  coverage     lift count
## [1]  {Survived=No,                                                                         
##       SibSp=no,                                                                            
##       Parch=no,                                                                            
##       Embarked=Southampton}  => {Sex=male}    0.3063973  0.9349315 0.3277217 1.443716   273
## [2]  {Survived=No,                                                                         
##       SibSp=no,                                                                            
##       Parch=no,                                                                            
##       AgeGroup=Adult}        => {Sex=male}    0.3524130  0.9317507 0.3782267 1.438804   314
## [3]  {Survived=No,                                                                         
##       SibSp=no,                                                                            
##       Parch=no}              => {Sex=male}    0.3894501  0.9278075 0.4197531 1.432715   347
## [4]  {Survived=No,                                                                         
##       SibSp=no,                                                                            
##       Embarked=Southampton}  => {Sex=male}    0.3153760  0.9213115 0.3423120 1.422684   281
## [5]  {Survived=No,                                                                         
##       Parch=no,                                                                            
##       Embarked=Southampton,                                                                
##       AgeGroup=Adult}        => {Sex=male}    0.3198653  0.9163987 0.3490460 1.415097   285
## [6]  {Survived=No,                                                                         
##       SibSp=no,                                                                            
##       AgeGroup=Adult}        => {Sex=male}    0.3670034  0.9159664 0.4006734 1.414430   327
## [7]  {Survived=No,                                                                         
##       Parch=no,                                                                            
##       AgeGroup=Adult}        => {Sex=male}    0.4118967  0.9152120 0.4500561 1.413265   367
## [8]  {Survived=No,                                                                         
##       Parch=no,                                                                            
##       Embarked=Southampton}  => {Sex=male}    0.3546577  0.9132948 0.3883277 1.410304   316
## [9]  {Pclass=3,                                                                            
##       Sex=male}              => {Survived=No} 0.3367003  0.8645533 0.3894501 1.403128   300
## [10] {Survived=No,                                                                         
##       Parch=no}              => {Sex=male}    0.4534231  0.9078652 0.4994388 1.401920   404
## [11] {Survived=No,                                                                         
##       SibSp=no}              => {Sex=male}    0.4051627  0.9070352 0.4466891 1.400638   361
## [12] {Sex=male,                                                                            
##       Embarked=Southampton,                                                                
##       AgeGroup=Adult}        => {Survived=No} 0.3512907  0.8505435 0.4130191 1.380390   313
## [13] {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Parch=no,                                                                            
##       Embarked=Southampton}  => {Survived=No} 0.3063973  0.8504673 0.3602694 1.380267   273
## [14] {Sex=male,                                                                            
##       Parch=no,                                                                            
##       Embarked=Southampton}  => {Survived=No} 0.3546577  0.8471850 0.4186308 1.374940   316
## [15] {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Parch=no}              => {Survived=No} 0.3894501  0.8442822 0.4612795 1.370229   347
## [16] {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Embarked=Southampton}  => {Survived=No} 0.3153760  0.8438438 0.3737374 1.369517   281
## [17] {Sex=male,                                                                            
##       Parch=no,                                                                            
##       Embarked=Southampton,                                                                
##       AgeGroup=Adult}        => {Survived=No} 0.3198653  0.8431953 0.3793490 1.368464   285
## [18] {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       AgeGroup=Adult}        => {Survived=No} 0.3670034  0.8406170 0.4365881 1.364280   327
## [19] {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Parch=no,                                                                            
##       AgeGroup=Adult}        => {Survived=No} 0.3524130  0.8395722 0.4197531 1.362584   314
## [20] {Sex=male,                                                                            
##       Parch=no}              => {Survived=No} 0.4534231  0.8347107 0.5432099 1.354694   404
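The quality columns in this table follow directly from passenger counts. As a check, rule [9] ({Pclass=3, Sex=male} => {Survived=No}) can be recomputed by hand; the counts below are read off the outputs above (300 matching passengers, 347 passengers covered by the lhs, 549 non-survivors), and the variable names are only illustrative.

```r
n_total <- 891   # all passengers (transactions)
n_rule  <- 300   # 3rd-class men who died: the "count" column
n_lhs   <- 347   # all 3rd-class men: coverage 0.3894501 * 891
n_rhs   <- 549   # all passengers with Survived=No

support    <- n_rule / n_total                 # P(lhs and rhs)
confidence <- n_rule / n_lhs                   # P(rhs | lhs)
lift       <- confidence / (n_rhs / n_total)   # P(rhs | lhs) / P(rhs)

round(c(support = support, confidence = confidence, lift = lift), 6)
```

A lift above 1 means the rhs is more common among passengers matching the lhs than among passengers overall.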

Since “Sex” doesn’t make much sense as the rhs of a rule, I filtered out the rules that have it there.

rules_filtered <- subset(rules, !(rhs %pin% "Sex=male" | rhs %pin% "Sex=female"))
length(rules_filtered)

## [1] 101

inspect(sort(rules_filtered, by="lift")[1:20])

##      lhs                        rhs             support confidence  coverage     lift count
## [1]  {Pclass=3,                                                                            
##       Sex=male}              => {Survived=No} 0.3367003  0.8645533 0.3894501 1.403128   300
## [2]  {Sex=male,                                                                            
##       Embarked=Southampton,                                                                
##       AgeGroup=Adult}        => {Survived=No} 0.3512907  0.8505435 0.4130191 1.380390   313
## [3]  {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Parch=no,                                                                            
##       Embarked=Southampton}  => {Survived=No} 0.3063973  0.8504673 0.3602694 1.380267   273
## [4]  {Sex=male,                                                                            
##       Parch=no,                                                                            
##       Embarked=Southampton}  => {Survived=No} 0.3546577  0.8471850 0.4186308 1.374940   316
## [5]  {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Parch=no}              => {Survived=No} 0.3894501  0.8442822 0.4612795 1.370229   347
## [6]  {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Embarked=Southampton}  => {Survived=No} 0.3153760  0.8438438 0.3737374 1.369517   281
## [7]  {Sex=male,                                                                            
##       Parch=no,                                                                            
##       Embarked=Southampton,                                                                
##       AgeGroup=Adult}        => {Survived=No} 0.3198653  0.8431953 0.3793490 1.368464   285
## [8]  {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       AgeGroup=Adult}        => {Survived=No} 0.3670034  0.8406170 0.4365881 1.364280   327
## [9]  {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Parch=no,                                                                            
##       AgeGroup=Adult}        => {Survived=No} 0.3524130  0.8395722 0.4197531 1.362584   314
## [10] {Sex=male,                                                                            
##       Parch=no}              => {Survived=No} 0.4534231  0.8347107 0.5432099 1.354694   404
## [11] {Sex=male,                                                                            
##       SibSp=no}              => {Survived=No} 0.4051627  0.8317972 0.4870932 1.349966   361
## [12] {Sex=male,                                                                            
##       Parch=no,                                                                            
##       AgeGroup=Adult}        => {Survived=No} 0.4118967  0.8303167 0.4960718 1.347563   367
## [13] {Sex=male,                                                                            
##       AgeGroup=Adult}        => {Survived=No} 0.4534231  0.8295688 0.5465769 1.346349   404
## [14] {Sex=male,                                                                            
##       Embarked=Southampton}  => {Survived=No} 0.4085297  0.8253968 0.4949495 1.339578   364
## [15] {Sex=male}              => {Survived=No} 0.5252525  0.8110919 0.6475870 1.316362   468
## [16] {Pclass=3,                                                                            
##       Embarked=Southampton}  => {Survived=No} 0.3209877  0.8101983 0.3961841 1.314912   286
## [17] {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Embarked=Southampton,                                                                
##       AgeGroup=Adult}        => {Parch=no}    0.3243547  0.9730640 0.3333333 1.278761   289
## [18] {Survived=No,                                                                         
##       Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Embarked=Southampton}  => {Parch=no}    0.3063973  0.9715302 0.3153760 1.276746   273
## [19] {Sex=male,                                                                            
##       SibSp=no,                                                                            
##       Embarked=Southampton}  => {Parch=no}    0.3602694  0.9639640 0.3737374 1.266802   321
## [20] {Survived=No,                                                                         
##       Sex=male,                                                                            
##       Parch=no,                                                                            
##       Embarked=Southampton}  => {SibSp=no}    0.3063973  0.8639241 0.3546577 1.266047   273

This leaves 101 rules. They also need to be filtered by lift, since in this study I don’t want to focus on rules whose two sides are (nearly) uncorrelated.

hist(quality(rules)$lift,
     breaks = 30,
     col='navy',
     main = "Lift distribution", 
     xlab = "Lift", 
     ylab = "Number of rules"
)

There are few negatively correlated rules (lift below 1), and their lift is close to 1, so I limited the scope to rules with a lift of at least 1.2.

rules_uncorr <- subset(rules, lift >= 1.2)
hist(quality(rules_uncorr)$lift,
     breaks = 30,
     col='navy',
     main = "Lift distribution", 
     xlab = "Lift", 
     ylab = "Number of rules"
)

length(rules_uncorr)

## [1] 62

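In the end this gives 62 rules, which is a much better number to visualize. Note that the subset is taken from the full set of 119 rules, so rules with “Sex” on the rhs are included again.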

summary(rules_uncorr)

## set of 62 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3  4  5 
##  2 18 29 13 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   3.000   4.000   3.855   4.000   5.000 
## 
## summary of quality measures:
##     support         confidence        coverage           lift      
##  Min.   :0.3064   Min.   :0.8029   Min.   :0.3154   Min.   :1.202  
##  1st Qu.:0.3277   1st Qu.:0.8434   1st Qu.:0.3799   1st Qu.:1.246  
##  Median :0.3620   Median :0.8598   Median :0.4153   Median :1.278  
##  Mean   :0.3758   Mean   :0.8814   Mean   :0.4280   Mean   :1.309  
##  3rd Qu.:0.4085   3rd Qu.:0.9266   3rd Qu.:0.4593   3rd Qu.:1.369  
##  Max.   :0.5421   Max.   :0.9731   Max.   :0.6476   Max.   :1.444  
##      count      
##  Min.   :273.0  
##  1st Qu.:292.0  
##  Median :322.5  
##  Mean   :334.9  
##  3rd Qu.:364.0  
##  Max.   :483.0  
## 
## mining info:
##                  data ntransactions support confidence
##  titanic_transactions           891     0.3        0.8
##                                                                                                 call
##  apriori(data = titanic_transactions, parameter = list(support = 0.3, confidence = 0.8, minlen = 2))

Visualization

plot(rules_uncorr, 
     method = "graph", 
     measure = "support", 
     colors = c("lightblue", "navy")
)

The chart suggests that men who embarked at Southampton and traveled alone (SibSp=no, Parch=no) in 3rd class were the least likely to survive.

plot(rules_uncorr, method="paracoord", control=list(reorder=TRUE))

This chart leads to a very similar conclusion: men traveling alone in 3rd class from Southampton had poor survival chances.

Why Southampton?

Most passengers embarked there (646 of 891, about 73%), so this is not a surprising result.

Who survived then?

rules.Survived <- apriori(titanic_transactions, 
                 parameter = list(support = 0.08, confidence = 0.6, minlen = 2), 
                 appearance = list(default="lhs", rhs="Survived=Yes"), 
                 control = list(verbose=FALSE))

rules.Survived.bylift <- sort(rules.Survived, by="lift", decreasing=TRUE)
inspect(head(rules.Survived.bylift))

##     lhs                 rhs               support confidence   coverage     lift count
## [1] {Pclass=1,                                                                        
##      Sex=female,                                                                      
##      AgeGroup=Adult} => {Survived=Yes} 0.08866442  0.9753086 0.09090909 2.540936    79
## [2] {Pclass=1,                                                                        
##      Sex=female}     => {Survived=Yes} 0.10213244  0.9680851 0.10549944 2.522116    91
## [3] {Sex=female,                                                                      
##      Parch=no,                                                                        
##      AgeGroup=Adult} => {Survived=Yes} 0.14927048  0.7964072 0.18742985 2.074850   133
## [4] {Sex=female,                                                                      
##      SibSp=no,                                                                        
##      Parch=no,                                                                        
##      AgeGroup=Adult} => {Survived=Yes} 0.09652076  0.7889908 0.12233446 2.055529    86
## [5] {Sex=female,                                                                      
##      Parch=no}       => {Survived=Yes} 0.17171717  0.7886598 0.21773288 2.054666   153
## [6] {Sex=female,                                                                      
##      SibSp=no}       => {Survived=Yes} 0.15375982  0.7873563 0.19528620 2.051270   137

Adult women travelling in 1st class had the best chances of survival (confidence of about 98%). Note that support is measured against all passengers, not just survivors: around 15% of all passengers were adult women without parents or children on board who survived, around 15% were women without siblings or spouses on board who survived, and around 10% were adult women travelling alone (without siblings, spouses, parents or children) who survived.
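Because support is a share of all 891 passengers, turning a rule into a share of the 342 survivors takes one extra division. A small sketch, using the counts of rules [3], [6] and [4] from the table above (the vector names are just labels chosen for readability):

```r
n_passengers <- 891
n_survivors  <- 342   # Survived = Yes in the summary

# Rule counts copied from the "count" column above.
counts <- c(adult_women_no_parch = 133,
            women_no_sibsp       = 137,
            adult_women_alone    = 86)

support            <- counts / n_passengers   # share of all passengers
share_of_survivors <- counts / n_survivors    # share of the 342 survivors

round(rbind(support, share_of_survivors), 3)
```

The second row is the figure to quote when the question is "what fraction of survivors belonged to this group".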

What about kids?

Children and teenagers were a small group on board (around 16% of all passengers), so I had to lower the thresholds considerably.

##     lhs                        rhs               support confidence   coverage     lift count
## [1] {Pclass=2,                                                                               
##      AgeGroup=Child}        => {Survived=Yes} 0.01907969  1.0000000 0.01907969 2.605263    17
## [2] {Pclass=2,                                                                               
##      Parch=yes,                                                                              
##      AgeGroup=Child}        => {Survived=Yes} 0.01907969  1.0000000 0.01907969 2.605263    17
## [3] {Pclass=2,                                                                               
##      Embarked=Southampton,                                                                   
##      AgeGroup=Child}        => {Survived=Yes} 0.01683502  1.0000000 0.01683502 2.605263    15
## [4] {Pclass=2,                                                                               
##      Parch=yes,                                                                              
##      Embarked=Southampton,                                                                   
##      AgeGroup=Child}        => {Survived=Yes} 0.01683502  1.0000000 0.01683502 2.605263    15
## [5] {SibSp=no,                                                                               
##      AgeGroup=Child}        => {Survived=Yes} 0.01571268  0.8235294 0.01907969 2.145511    14
## [6] {Sex=female,                                                                             
##      SibSp=no,                                                                               
##      AgeGroup=Teenager}     => {Survived=Yes} 0.02020202  0.7826087 0.02581369 2.038902    18
## [7] {Sex=female,                                                                             
##      AgeGroup=Teenager}     => {Survived=Yes} 0.03030303  0.7500000 0.04040404 1.953947    27
## [8] {Sex=female,                                                                             
##      Embarked=Southampton,                                                                   
##      AgeGroup=Teenager}     => {Survived=Yes} 0.01571268  0.7368421 0.02132435 1.919668    14

Young children appear to have had better chances than teenagers, which is not surprising. There is also a visible association with 2nd class, but probably because not many children traveled in 1st class. Among teenagers, girls had better chances of survival, in line with the general analysis. Around 3% of all passengers were teenage girls who survived, which is about 39% of all teenagers (27 of 70).
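The code behind the rules listed above is not included, so the following is only a sketch of one way to obtain rules of this kind. The helper mine_young_survivors is hypothetical, and its default thresholds (support 0.015, confidence 0.7) are guesses chosen to be compatible with the table, not the values actually used. The toy data at the end is made up so that the block runs on its own.

```r
library(arules)

# Hypothetical helper; thresholds are illustrative assumptions.
mine_young_survivors <- function(transactions,
                                 min_support = 0.015,
                                 min_confidence = 0.7) {
  rules <- apriori(transactions,
                   parameter = list(support = min_support,
                                    confidence = min_confidence,
                                    minlen = 2),
                   appearance = list(default = "lhs", rhs = "Survived=Yes"),
                   control = list(verbose = FALSE))
  # Keep only rules whose lhs mentions a child or a teenager.
  young <- subset(rules,
                  lhs %pin% "AgeGroup=Child" | lhs %pin% "AgeGroup=Teenager")
  sort(young, by = "lift", decreasing = TRUE)
}

# Made-up data: 4 children, 4 teenagers, 12 adults.
toy <- data.frame(
  AgeGroup = factor(c(rep("Child", 4), rep("Teenager", 4), rep("Adult", 12))),
  Sex      = factor(rep(c("female", "male"), 10)),
  Survived = factor(c("Yes", "Yes", "Yes", "No",
                      "Yes", "No", "Yes", "No",
                      rep(c("Yes", "No", "No", "No"), 3)))
)

toy_rules <- mine_young_survivors(as(toy, "transactions"), min_support = 0.1)
inspect(toy_rules)
```

On the real data the call would presumably be mine_young_survivors(titanic_transactions), with the thresholds tuned until the rule list is short enough to read.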

Did people with families have better or worse chances to survive?

summary(titanic_data)

##  Survived  Pclass      Sex      SibSp     Parch            Embarked  
##  No :549   1:216   female:314   no :608   no :678   Cherbourg  :168  
##  Yes:342   2:184   male  :577   yes:283   yes:213   Queenstown : 77  
##            3:491                                    Southampton:646  
##                                                                      
##      AgeGroup  
##  Child   : 69  
##  Teenager: 70  
##  Adult   :730  
##  Senior  : 22

rules.FamilySurvived <- apriori(titanic_transactions, 
                 parameter = list(support = 0.02, confidence = 0.4, minlen = 2), 
                 appearance = list(lhs=c("SibSp=yes","SibSp=no", "Parch=yes", "Parch=no"), rhs="Survived=Yes"), 
                 control = list(verbose=FALSE))

rules.FamilySurvived.bylift <- sort(rules.FamilySurvived, by="lift", decreasing=TRUE)

inspect(rules.FamilySurvived.bylift)

##     lhs                       rhs            support    confidence coverage   lift     count
## [1] {SibSp=no, Parch=yes}  => {Survived=Yes} 0.05274972 0.6619718  0.07968575 1.724611  47
## [2] {Parch=yes}            => {Survived=Yes} 0.12233446 0.5117371  0.23905724 1.333210 109
## [3] {SibSp=yes, Parch=no}  => {Survived=Yes} 0.07856341 0.4964539  0.15824916 1.293393  70
## [4] {SibSp=yes}            => {Survived=Yes} 0.14814815 0.4664311  0.31762065 1.215176 132
## [5] {SibSp=yes, Parch=yes} => {Survived=Yes} 0.06958474 0.4366197  0.15937149 1.137509  62

From this we can see that people with no siblings or spouses but with a parent or child on board had better chances of survival (about 66%, against 38% overall). Around 12% of all passengers survived and had a parent or child on board, and around 15% survived and had a sibling or spouse on board. This may make some sense when it comes to motivation to rescue someone, but neither confidence nor support is high enough to treat these as valuable insights.

Who had worse chances of survival despite traveling in 1st class?

rules.1stClassDied <- apriori(titanic_transactions, 
                 parameter = list(support = 0.01, confidence = 0.6, minlen = 2), 
                 appearance = list(default="lhs", rhs="Survived=No"), 
                 control = list(verbose=FALSE))

rules.1stClassDied <- subset(rules.1stClassDied, lhs %pin% "Pclass=1")

rules.1stClassDied.bylift <- sort(rules.1stClassDied, by="lift", decreasing=TRUE)

inspect(head(rules.1stClassDied.bylift))

##     lhs                  rhs              support confidence   coverage     lift count
## [1] {Pclass=1,                                                                        
##      Sex=male,                                                                        
##      AgeGroup=Senior} => {Survived=No} 0.01234568  0.9166667 0.01346801 1.487705    11
## [2] {Pclass=1,                                                                        
##      Sex=male,                                                                        
##      SibSp=no,                                                                        
##      AgeGroup=Senior} => {Survived=No} 0.01010101  0.9000000 0.01122334 1.460656     9
## [3] {Pclass=1,                                                                        
##      SibSp=no,                                                                        
##      AgeGroup=Senior} => {Survived=No} 0.01010101  0.8181818 0.01234568 1.327869     9
## [4] {Pclass=1,                                                                        
##      AgeGroup=Senior} => {Survived=No} 0.01234568  0.7857143 0.01571268 1.275176    11
## [5] {Pclass=1,                                                                        
##      Sex=male,                                                                        
##      SibSp=no,                                                                        
##      Parch=yes}       => {Survived=No} 0.01010101  0.6923077 0.01459035 1.123581     9
## [6] {Pclass=1,                                                                        
##      Sex=male,                                                                        
##      Parch=yes,                                                                       
##      AgeGroup=Adult}  => {Survived=No} 0.01234568  0.6875000 0.01795735 1.115779    11

Again the results are as expected. Senior men had the worst chances of survival: 11 of the 12 in first class died, as did 9 of the 10 who travelled without a sibling or spouse. However, the support is very low here, as not that many people traveled in first class and, more importantly, there were not many seniors on the Titanic.
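To get a feeling for how little such a small group pins down, one option (not part of the original analysis) is to put an exact binomial confidence interval around the confidence of rule [1], using base R's binom.test. The counts come from the table above: 11 deaths among 12 senior men in 1st class.

```r
# Rule [1]: 11 of the 12 senior men in 1st class died.
deaths <- 11
group  <- 12

# Exact (Clopper-Pearson) 95% interval for the underlying death rate.
ci <- binom.test(x = deaths, n = group)$conf.int

round(c(confidence = deaths / group, lower = ci[1], upper = ci[2]), 3)
```

The interval is wide, which is the numeric counterpart of the low support noted above.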

Conclusion

The application of association rules effectively uncovered key survival patterns among Titanic passengers. The results confirmed well-known trends, such as the higher survival rates of women and children, particularly in first and second class, while also revealing the impact (or rather the correlation) of embarkation city and family presence. Notably, passengers from Cherbourg had a better survival rate, likely due to a higher proportion of first-class travelers, whereas men traveling alone in third class from Southampton had the lowest chances of survival.

While the Apriori method proved valuable in identifying meaningful relationships, the analysis could be extended. Incorporating additional features such as ticket price or cabin location could provide further insights. Future work could also compare association rule mining with predictive models, offering a broader perspective on survival probabilities.

Overall, even though I’m aware this was one of the most basic choices of dataset, I’m not disappointed by the results of the analysis. I find some of the insights about the city of departure and the company during the trip interesting.