Introduction to the topic and data set

Association rules represent relationships or dependencies between items in a dataset.They are used to uncover frequent itemsets and find interesting relationships between items.

A happy marriage is every couple’s dream, unfortunately things don’t always go according to plan. What is the recipe for a happy marriage? Or what prompts the other half to cheat on you? There is no single accepted definition of infidelity. In a general and broad sense, betrayal is the intentional and fully conscious letting down of a given trust. Based on Fair’s Extramarital Affairs Data, an analysis will be made using the association rules. Original dataset can be found here: https://www.kaggle.com/datasets/utkarshx27/fairs-extramarital-affairs-data, dataset used for analysis has been adjusted. Also materials used to determine the reliability of the results can be found here: https://media.statsoft.pl/_old_dnn/downloads/niewiernosc_malzenska_analiza_ankiety_internetowej.pdf

Analysis

How dataset looks like:

head(Affairs)
##   affairs gender age          yearsmarried children     religiousness
## 1      no   male  37       almost 10 years       no          normally
## 2      no female  27 between 2 and 8 years       no more than average
## 3      no female  32       almost 15 years      yes              anti
## 4      no   male  57       almost 15 years      yes              very
## 5      no   male  22     less than 2 years       no            little
## 6      no female  32     less than 2 years       no            little
##              education                                   occupation
## 1        master degree       professionals, high-ranking executives
## 2              college       lower-level managers, teachers, nurses
## 3 high school graduate unemployed, individuals not in the workforce
## 4        master degree       lower-level managers, teachers, nurses
## 5        graduate work       lower-level managers, teachers, nurses
## 6        graduate work       small business owners, skilled workers
##   rating_marriage
## 1           happy
## 2           happy
## 3           happy
## 4      very happy
## 5   average happy
## 6      very happy

Are there missing values?

sum(is.na(Affairs))
## [1] 0

Descriptive stats

summary(Affairs)
##    affairs             gender               age        yearsmarried      
##  Length:601         Length:601         Min.   :17.50   Length:601        
##  Class :character   Class :character   1st Qu.:27.00   Class :character  
##  Mode  :character   Mode  :character   Median :32.00   Mode  :character  
##                                        Mean   :32.49                     
##                                        3rd Qu.:37.00                     
##                                        Max.   :57.00                     
##    children         religiousness       education          occupation       
##  Length:601         Length:601         Length:601         Length:601        
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##  rating_marriage   
##  Length:601        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Gender distribution

table_gender <- table(Affairs$gender)
library("wesanderson")
barplot(table_gender, col = wes_palette('GrandBudapest1', 2, type = ('continuous')), main = "Gender distribution", xlab = "Gender", ylab = "Count")

What is the most frequent in dataset?

itemFrequencyPlot(data_trans, topN=15, type="relative", main="ItemFrequency", col = wes_palette('GrandBudapest1', 15, type = ('continuous'))) 

APRIORI

Affair_rules <- apriori(Affairs, parameter = list(support = 0.2, confidence = 0.75, minlen = 2))
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9 not logical or factor. Applying
## default discretization (see '? discretizeDF').
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.75    0.1    1 none FALSE            TRUE       5     0.2      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 120 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[37 item(s), 601 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [24 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
Affair_rules
## set of 24 rules

Graph for 24 rules

set.seed(123) 
plot(Affair_rules, method="graph", colors = wes_palette('GrandBudapest1', 3, type = ('continuous')))

plot(Affair_rules, method="grouped", col = wes_palette('GrandBudapest1', 3, type = ('continuous')))

Let’s inspect the most meaningful rules

inspect(sort(Affair_rules, by = "lift")[1:5])
##     lhs                                rhs                              support confidence  coverage     lift count
## [1] {age=[37,57],                                                                                                  
##      children=yes}                  => {yearsmarried=almost 15 years} 0.2778702  0.8608247 0.3227953 2.536057   167
## [2] {affairs=no,                                                                                                   
##      yearsmarried=almost 15 years}  => {age=[37,57]}                  0.2063228  0.8732394 0.2362729 2.499128   124
## [3] {yearsmarried=almost 15 years}  => {age=[37,57]}                  0.2895175  0.8529412 0.3394343 2.441036   174
## [4] {age=[37,57]}                   => {yearsmarried=almost 15 years} 0.2895175  0.8285714 0.3494176 2.441036   174
## [5] {yearsmarried=almost 15 years,                                                                                 
##      children=yes}                  => {age=[37,57]}                  0.2778702  0.8520408 0.3261231 2.438460   167
inspect(sort(Affair_rules, by = "support")[1:5])
##     lhs                               rhs            support   confidence
## [1] {gender=female}                => {affairs=no}   0.4043261 0.7714286 
## [2] {age=[27,37)}                  => {children=yes} 0.3344426 0.7500000 
## [3] {rating_marriage=very happy}   => {affairs=no}   0.3294509 0.8534483 
## [4] {yearsmarried=almost 15 years} => {children=yes} 0.3261231 0.9607843 
## [5] {age=[37,57]}                  => {children=yes} 0.3227953 0.9238095 
##     coverage  lift     count
## [1] 0.5241265 1.028001 243  
## [2] 0.4459235 1.048256 201  
## [3] 0.3860233 1.137300 198  
## [4] 0.3394343 1.342864 196  
## [5] 0.3494176 1.291185 194
inspect(sort(Affair_rules, by = "confidence")[1:5])
##     lhs                               rhs              support confidence  coverage     lift count
## [1] {affairs=no,                                                                                  
##      yearsmarried=almost 15 years} => {children=yes} 0.2279534  0.9647887 0.2362729 1.348461   137
## [2] {yearsmarried=almost 15 years} => {children=yes} 0.3261231  0.9607843 0.3394343 1.342864   196
## [3] {age=[37,57],                                                                                 
##      yearsmarried=almost 15 years} => {children=yes} 0.2778702  0.9597701 0.2895175 1.341446   167
## [4] {age=[37,57]}                  => {children=yes} 0.3227953  0.9238095 0.3494176 1.291185   194
## [5] {affairs=no,                                                                                  
##      age=[37,57]}                  => {children=yes} 0.2346090  0.9215686 0.2545757 1.288053   141

It is possible to distinguish some nice connections for example first rule suggests that when affairs are not present, and the years married is almost 15 years, there is a high confidence (96.5%) that children are present, with a lift of 1.35.

ECLAT

freq.items<-eclat(Affairs, parameter=list(supp=0.2, maxlen=3)) 
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9 not logical or factor. Applying
## default discretization (see '? discretizeDF').
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE     0.2      1      3 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 120 
## 
## create itemset ... 
## set transactions ...[37 item(s), 601 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating bit matrix ... [20 row(s), 601 column(s)] done [0.00s].
## writing  ... [58 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
freq.rules<-ruleInduction(freq.items, data_trans, confidence=0.8)
freq.rules
## set of 16 rules
sorted_rules <- sort(freq.rules, by = "confidence", decreasing = TRUE)
inspect(freq.rules)
##      lhs                                  rhs                              support confidence     lift itemset
## [1]  {yearsmarried=less than 2 years}  => {affairs=no}                   0.2046589  0.8785714 1.170779       1
## [2]  {affairs=yes}                     => {children=yes}                 0.2046589  0.8200000 1.146093       2
## [3]  {children=no}                     => {affairs=no}                   0.2396007  0.8421053 1.122185       4
## [4]  {religiousness=more than average} => {affairs=no}                   0.2612313  0.8263158 1.101144       8
## [5]  {affairs=no,                                                                                             
##       yearsmarried=almost 15 years}    => {age=[37,57]}                  0.2063228  0.8732394 2.499128      12
## [6]  {affairs=no,                                                                                             
##       age=[37,57]}                     => {yearsmarried=almost 15 years} 0.2063228  0.8104575 2.387671      12
## [7]  {yearsmarried=almost 15 years,                                                                           
##       children=yes}                    => {age=[37,57]}                  0.2778702  0.8520408 2.438460      13
## [8]  {age=[37,57],                                                                                            
##       children=yes}                    => {yearsmarried=almost 15 years} 0.2778702  0.8608247 2.536057      13
## [9]  {age=[37,57],                                                                                            
##       yearsmarried=almost 15 years}    => {children=yes}                 0.2778702  0.9597701 1.341446      13
## [10] {affairs=no,                                                                                             
##       yearsmarried=almost 15 years}    => {children=yes}                 0.2279534  0.9647887 1.348461      14
## [11] {yearsmarried=almost 15 years}    => {children=yes}                 0.3261231  0.9607843 1.342864      16
## [12] {yearsmarried=almost 15 years}    => {age=[37,57]}                  0.2895175  0.8529412 2.441036      17
## [13] {age=[37,57]}                     => {yearsmarried=almost 15 years} 0.2895175  0.8285714 2.441036      17
## [14] {affairs=no,                                                                                             
##       age=[37,57]}                     => {children=yes}                 0.2346090  0.9215686 1.288053      20
## [15] {age=[37,57]}                     => {children=yes}                 0.3227953  0.9238095 1.291185      22
## [16] {rating_marriage=very happy}      => {affairs=no}                   0.3294509  0.8534483 1.137300      24

The association rule from ECLAT algorithm suggests a strong relationship between individuals who have been married for less than 2 years and the absence of affairs, as the high confidence (87.86%) indicates a significant likelihood of not having affairs among those with less than 2 years of marriage. Additionally, the support value (20.47%) indicates that this association is observed in approximately 20.47% of the transactions in the dataset. In approximately 20.47% of cases where affairs are present, there is an 82.00% confidence that children are also present. The lift of 1.15 suggests a moderate positive correlation, indicating that the presence of “affairs=yes” slightly increases the likelihood of “children=yes.” In opposite in approximately 23.96% of cases where there are no children, there is an 84.21% confidence that there are no affairs. Next rule implies a positive correlation between higher religiousness and the absence of affairs. This rule indicates that when the level of religiousness is more than average, there is an 82.63% confidence that there are no affairs.

Let’s go deeper and see what the rules are that lead to betrayal

AFFAIRS ANALYSIS

Apperance of affair

rules.yes <- apriori(data = data_trans,
                      parameter = list(supp = 0.08, conf = 0.2),
                      appearance = list(default = "lhs", rhs = "affairs=yes"),
                      control = list(verbose = FALSE))

rules_lift_sorted_yes <- sort(rules.yes, by = "confidence")

if (length(rules_lift_sorted_yes) >= 10) {
  inspect(rules_lift_sorted_yes[1:10])
} else {
  inspect(rules_lift_sorted_yes)
}
##      lhs                                rhs              support confidence  coverage     lift count
## [1]  {yearsmarried=almost 15 years}  => {affairs=yes} 0.10316140  0.3039216 0.3394343 1.217712    62
## [2]  {yearsmarried=almost 15 years,                                                                 
##       children=yes}                  => {affairs=yes} 0.09816972  0.3010204 0.3261231 1.206088    59
## [3]  {age=[27,37),                                                                                  
##       children=yes}                  => {affairs=yes} 0.09983361  0.2985075 0.3344426 1.196020    60
## [4]  {gender=male,                                                                                  
##       children=yes}                  => {affairs=yes} 0.10482529  0.2943925 0.3560732 1.179533    63
## [5]  {age=[37,57],                                                                                  
##       yearsmarried=almost 15 years}  => {affairs=yes} 0.08319468  0.2873563 0.2895175 1.151341    50
## [6]  {children=yes}                  => {affairs=yes} 0.20465890  0.2860465 0.7154742 1.146093   123
## [7]  {gender=female,                                                                                
##       children=yes}                  => {affairs=yes} 0.09983361  0.2777778 0.3594010 1.112963    60
## [8]  {age=[27,37)}                   => {affairs=yes} 0.12312812  0.2761194 0.4459235 1.106318    74
## [9]  {age=[37,57],                                                                                  
##       children=yes}                  => {affairs=yes} 0.08818636  0.2731959 0.3227953 1.094605    53
## [10] {gender=male}                   => {affairs=yes} 0.12978369  0.2727273 0.4758735 1.092727    78
plot(rules.yes, method = "graph", colors = wes_palette('GrandBudapest1', 3, type = ('continuous')))

The strongest relationship is between individuals who have been married almost 15 years and the apperance of affairs.There is also a strong relationship between the occurrence of infidelity and the age range of 27-37 or 37-57. Having children is also a strong relationship with the occurrence of infidelity. There is a stereotype that men are more likely to cheat in a relationship, but the tendency among women to cheat is increasing, according to StatSoft. From a survey of a sample of 3,200 people, opportunities for infidelity are used by 26.9% of women and 28.9% of men.

source: StatSoft

Absence of affair

rules.no<-apriori(data=data_trans, parameter=list(supp=0.3,conf = 0.7), 
                 appearance=list(default="lhs", rhs="affairs=no"), control=list(verbose=F)) 

rules_lift_sorted_no <- sort(rules.no, by = "confidence")

if (length(rules_lift_sorted_no) >= 10) {
  inspect(rules_lift_sorted_no[1:10])
} else {
  inspect(rules_lift_sorted_no)
}
##     lhs                             rhs          support   confidence coverage 
## [1] {rating_marriage=very happy} => {affairs=no} 0.3294509 0.8534483  0.3860233
## [2] {gender=female}              => {affairs=no} 0.4043261 0.7714286  0.5241265
## [3] {}                           => {affairs=no} 0.7504160 0.7504160  1.0000000
## [4] {gender=male}                => {affairs=no} 0.3460899 0.7272727  0.4758735
## [5] {age=[27,37)}                => {affairs=no} 0.3227953 0.7238806  0.4459235
## [6] {children=yes}               => {affairs=no} 0.5108153 0.7139535  0.7154742
##     lift      count
## [1] 1.1373003 198  
## [2] 1.0280013 243  
## [3] 1.0000000 451  
## [4] 0.9691594 208  
## [5] 0.9646391 194  
## [6] 0.9514103 307
plot(rules.no, method = "graph", colors = wes_palette('GrandBudapest1', 3, type = ('continuous')))

In approximately 32.95% of cases where the marriage rating is very happy, there is an 85.34% confidence that there are no affairs.

So what is recipe for happy marriage?

rules.happy <- apriori(data = data_trans,
                       parameter = list(supp = 0.15, conf = 0.25),
                       appearance = list(default = "lhs", rhs = "rating_marriage=very happy"),
                       control = list(verbose = FALSE))

rules_lift_sorted <- sort(rules.happy, by = "confidence")

if (length(rules_lift_sorted) >= 10) {
  inspect(rules_lift_sorted[1:10])
} else {
  inspect(rules_lift_sorted)
}
##     lhs                rhs                            support confidence  coverage      lift count
## [1] {children=no}   => {rating_marriage=very happy} 0.1597338  0.5614035 0.2845258 1.4543255    96
## [2] {affairs=no,                                                                                  
##      gender=female} => {rating_marriage=very happy} 0.1863561  0.4609053 0.4043261 1.1939833   112
## [3] {affairs=no}    => {rating_marriage=very happy} 0.3294509  0.4390244 0.7504160 1.1373003   198
## [4] {gender=female} => {rating_marriage=very happy} 0.2163062  0.4126984 0.5241265 1.0691024   130
## [5] {age=[27,37)}   => {rating_marriage=very happy} 0.1780366  0.3992537 0.4459235 1.0342737   107
## [6] {}              => {rating_marriage=very happy} 0.3860233  0.3860233 1.0000000 1.0000000   232
## [7] {affairs=no,                                                                                  
##      children=yes}  => {rating_marriage=very happy} 0.1896839  0.3713355 0.5108153 0.9619510   114
## [8] {gender=male}   => {rating_marriage=very happy} 0.1697171  0.3566434 0.4758735 0.9238908   102
## [9] {children=yes}  => {rating_marriage=very happy} 0.2262895  0.3162791 0.7154742 0.8193264   136
plot(rules.happy, method="graph", colors = wes_palette('GrandBudapest1', 3, type = ('continuous')))

These rules collectively provide insights into the relationships between the absence of children, the absence of affairs, being female, and the likelihood of a very happy marriage.

Conclusions

According to StatSoft each year of the relationship increases the risk of infidelity by about 8.2%, which would confirm the strong relationship between the variable: years of marriage = almost 15 years. Statistics also confirm the strong relationship between infidelity and age, as a person is less likely to take advantage of infidelity opportunities as he or she ages, and the risk decreases by 5.8% with each year.

source: StatSoft