Association rules represent relationships or dependencies between items in a dataset.They are used to uncover frequent itemsets and find interesting relationships between items.
A happy marriage is every couple’s dream, unfortunately things don’t always go according to plan. What is the recipe for a happy marriage? Or what prompts the other half to cheat on you? There is no single accepted definition of infidelity. In a general and broad sense, betrayal is the intentional and fully conscious letting down of a given trust. Based on Fair’s Extramarital Affairs Data, an analysis will be made using the association rules. Original dataset can be found here: https://www.kaggle.com/datasets/utkarshx27/fairs-extramarital-affairs-data, dataset used for analysis has been adjusted. Also materials used to determine the reliability of the results can be found here: https://media.statsoft.pl/_old_dnn/downloads/niewiernosc_malzenska_analiza_ankiety_internetowej.pdf
How dataset looks like:
head(Affairs)
## affairs gender age yearsmarried children religiousness
## 1 no male 37 almost 10 years no normally
## 2 no female 27 between 2 and 8 years no more than average
## 3 no female 32 almost 15 years yes anti
## 4 no male 57 almost 15 years yes very
## 5 no male 22 less than 2 years no little
## 6 no female 32 less than 2 years no little
## education occupation
## 1 master degree professionals, high-ranking executives
## 2 college lower-level managers, teachers, nurses
## 3 high school graduate unemployed, individuals not in the workforce
## 4 master degree lower-level managers, teachers, nurses
## 5 graduate work lower-level managers, teachers, nurses
## 6 graduate work small business owners, skilled workers
## rating_marriage
## 1 happy
## 2 happy
## 3 happy
## 4 very happy
## 5 average happy
## 6 very happy
Are there missing values?
sum(is.na(Affairs))
## [1] 0
Descriptive stats
summary(Affairs)
## affairs gender age yearsmarried
## Length:601 Length:601 Min. :17.50 Length:601
## Class :character Class :character 1st Qu.:27.00 Class :character
## Mode :character Mode :character Median :32.00 Mode :character
## Mean :32.49
## 3rd Qu.:37.00
## Max. :57.00
## children religiousness education occupation
## Length:601 Length:601 Length:601 Length:601
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## rating_marriage
## Length:601
## Class :character
## Mode :character
##
##
##
Gender distribution
table_gender <- table(Affairs$gender)
library("wesanderson")
barplot(table_gender, col = wes_palette('GrandBudapest1', 2, type = ('continuous')), main = "Gender distribution", xlab = "Gender", ylab = "Count")
What is the most frequent in dataset?
itemFrequencyPlot(data_trans, topN=15, type="relative", main="ItemFrequency", col = wes_palette('GrandBudapest1', 15, type = ('continuous')))
Affair_rules <- apriori(Affairs, parameter = list(support = 0.2, confidence = 0.75, minlen = 2))
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9 not logical or factor. Applying
## default discretization (see '? discretizeDF').
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.75 0.1 1 none FALSE TRUE 5 0.2 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 120
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[37 item(s), 601 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [24 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
Affair_rules
## set of 24 rules
Graph for 24 rules
set.seed(123)
plot(Affair_rules, method="graph", colors = wes_palette('GrandBudapest1', 3, type = ('continuous')))
plot(Affair_rules, method="grouped", col = wes_palette('GrandBudapest1', 3, type = ('continuous')))
Let’s inspect the most meaningful rules
inspect(sort(Affair_rules, by = "lift")[1:5])
## lhs rhs support confidence coverage lift count
## [1] {age=[37,57],
## children=yes} => {yearsmarried=almost 15 years} 0.2778702 0.8608247 0.3227953 2.536057 167
## [2] {affairs=no,
## yearsmarried=almost 15 years} => {age=[37,57]} 0.2063228 0.8732394 0.2362729 2.499128 124
## [3] {yearsmarried=almost 15 years} => {age=[37,57]} 0.2895175 0.8529412 0.3394343 2.441036 174
## [4] {age=[37,57]} => {yearsmarried=almost 15 years} 0.2895175 0.8285714 0.3494176 2.441036 174
## [5] {yearsmarried=almost 15 years,
## children=yes} => {age=[37,57]} 0.2778702 0.8520408 0.3261231 2.438460 167
inspect(sort(Affair_rules, by = "support")[1:5])
## lhs rhs support confidence
## [1] {gender=female} => {affairs=no} 0.4043261 0.7714286
## [2] {age=[27,37)} => {children=yes} 0.3344426 0.7500000
## [3] {rating_marriage=very happy} => {affairs=no} 0.3294509 0.8534483
## [4] {yearsmarried=almost 15 years} => {children=yes} 0.3261231 0.9607843
## [5] {age=[37,57]} => {children=yes} 0.3227953 0.9238095
## coverage lift count
## [1] 0.5241265 1.028001 243
## [2] 0.4459235 1.048256 201
## [3] 0.3860233 1.137300 198
## [4] 0.3394343 1.342864 196
## [5] 0.3494176 1.291185 194
inspect(sort(Affair_rules, by = "confidence")[1:5])
## lhs rhs support confidence coverage lift count
## [1] {affairs=no,
## yearsmarried=almost 15 years} => {children=yes} 0.2279534 0.9647887 0.2362729 1.348461 137
## [2] {yearsmarried=almost 15 years} => {children=yes} 0.3261231 0.9607843 0.3394343 1.342864 196
## [3] {age=[37,57],
## yearsmarried=almost 15 years} => {children=yes} 0.2778702 0.9597701 0.2895175 1.341446 167
## [4] {age=[37,57]} => {children=yes} 0.3227953 0.9238095 0.3494176 1.291185 194
## [5] {affairs=no,
## age=[37,57]} => {children=yes} 0.2346090 0.9215686 0.2545757 1.288053 141
It is possible to distinguish some nice connections for example first rule suggests that when affairs are not present, and the years married is almost 15 years, there is a high confidence (96.5%) that children are present, with a lift of 1.35.
freq.items<-eclat(Affairs, parameter=list(supp=0.2, maxlen=3))
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7, 8, 9 not logical or factor. Applying
## default discretization (see '? discretizeDF').
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.2 1 3 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 120
##
## create itemset ...
## set transactions ...[37 item(s), 601 transaction(s)] done [0.00s].
## sorting and recoding items ... [20 item(s)] done [0.00s].
## creating bit matrix ... [20 row(s), 601 column(s)] done [0.00s].
## writing ... [58 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
freq.rules<-ruleInduction(freq.items, data_trans, confidence=0.8)
freq.rules
## set of 16 rules
sorted_rules <- sort(freq.rules, by = "confidence", decreasing = TRUE)
inspect(freq.rules)
## lhs rhs support confidence lift itemset
## [1] {yearsmarried=less than 2 years} => {affairs=no} 0.2046589 0.8785714 1.170779 1
## [2] {affairs=yes} => {children=yes} 0.2046589 0.8200000 1.146093 2
## [3] {children=no} => {affairs=no} 0.2396007 0.8421053 1.122185 4
## [4] {religiousness=more than average} => {affairs=no} 0.2612313 0.8263158 1.101144 8
## [5] {affairs=no,
## yearsmarried=almost 15 years} => {age=[37,57]} 0.2063228 0.8732394 2.499128 12
## [6] {affairs=no,
## age=[37,57]} => {yearsmarried=almost 15 years} 0.2063228 0.8104575 2.387671 12
## [7] {yearsmarried=almost 15 years,
## children=yes} => {age=[37,57]} 0.2778702 0.8520408 2.438460 13
## [8] {age=[37,57],
## children=yes} => {yearsmarried=almost 15 years} 0.2778702 0.8608247 2.536057 13
## [9] {age=[37,57],
## yearsmarried=almost 15 years} => {children=yes} 0.2778702 0.9597701 1.341446 13
## [10] {affairs=no,
## yearsmarried=almost 15 years} => {children=yes} 0.2279534 0.9647887 1.348461 14
## [11] {yearsmarried=almost 15 years} => {children=yes} 0.3261231 0.9607843 1.342864 16
## [12] {yearsmarried=almost 15 years} => {age=[37,57]} 0.2895175 0.8529412 2.441036 17
## [13] {age=[37,57]} => {yearsmarried=almost 15 years} 0.2895175 0.8285714 2.441036 17
## [14] {affairs=no,
## age=[37,57]} => {children=yes} 0.2346090 0.9215686 1.288053 20
## [15] {age=[37,57]} => {children=yes} 0.3227953 0.9238095 1.291185 22
## [16] {rating_marriage=very happy} => {affairs=no} 0.3294509 0.8534483 1.137300 24
The association rule from ECLAT algorithm suggests a strong relationship between individuals who have been married for less than 2 years and the absence of affairs, as the high confidence (87.86%) indicates a significant likelihood of not having affairs among those with less than 2 years of marriage. Additionally, the support value (20.47%) indicates that this association is observed in approximately 20.47% of the transactions in the dataset. In approximately 20.47% of cases where affairs are present, there is an 82.00% confidence that children are also present. The lift of 1.15 suggests a moderate positive correlation, indicating that the presence of “affairs=yes” slightly increases the likelihood of “children=yes.” In opposite in approximately 23.96% of cases where there are no children, there is an 84.21% confidence that there are no affairs. Next rule implies a positive correlation between higher religiousness and the absence of affairs. This rule indicates that when the level of religiousness is more than average, there is an 82.63% confidence that there are no affairs.
Let’s go deeper and see what the rules are that lead to betrayal
Apperance of affair
rules.yes <- apriori(data = data_trans,
parameter = list(supp = 0.08, conf = 0.2),
appearance = list(default = "lhs", rhs = "affairs=yes"),
control = list(verbose = FALSE))
rules_lift_sorted_yes <- sort(rules.yes, by = "confidence")
if (length(rules_lift_sorted_yes) >= 10) {
inspect(rules_lift_sorted_yes[1:10])
} else {
inspect(rules_lift_sorted_yes)
}
## lhs rhs support confidence coverage lift count
## [1] {yearsmarried=almost 15 years} => {affairs=yes} 0.10316140 0.3039216 0.3394343 1.217712 62
## [2] {yearsmarried=almost 15 years,
## children=yes} => {affairs=yes} 0.09816972 0.3010204 0.3261231 1.206088 59
## [3] {age=[27,37),
## children=yes} => {affairs=yes} 0.09983361 0.2985075 0.3344426 1.196020 60
## [4] {gender=male,
## children=yes} => {affairs=yes} 0.10482529 0.2943925 0.3560732 1.179533 63
## [5] {age=[37,57],
## yearsmarried=almost 15 years} => {affairs=yes} 0.08319468 0.2873563 0.2895175 1.151341 50
## [6] {children=yes} => {affairs=yes} 0.20465890 0.2860465 0.7154742 1.146093 123
## [7] {gender=female,
## children=yes} => {affairs=yes} 0.09983361 0.2777778 0.3594010 1.112963 60
## [8] {age=[27,37)} => {affairs=yes} 0.12312812 0.2761194 0.4459235 1.106318 74
## [9] {age=[37,57],
## children=yes} => {affairs=yes} 0.08818636 0.2731959 0.3227953 1.094605 53
## [10] {gender=male} => {affairs=yes} 0.12978369 0.2727273 0.4758735 1.092727 78
plot(rules.yes, method = "graph", colors = wes_palette('GrandBudapest1', 3, type = ('continuous')))
The strongest relationship is between individuals who have been married almost 15 years and the apperance of affairs.There is also a strong relationship between the occurrence of infidelity and the age range of 27-37 or 37-57. Having children is also a strong relationship with the occurrence of infidelity. There is a stereotype that men are more likely to cheat in a relationship, but the tendency among women to cheat is increasing, according to StatSoft. From a survey of a sample of 3,200 people, opportunities for infidelity are used by 26.9% of women and 28.9% of men.
source: StatSoft
Absence of affair
rules.no<-apriori(data=data_trans, parameter=list(supp=0.3,conf = 0.7),
appearance=list(default="lhs", rhs="affairs=no"), control=list(verbose=F))
rules_lift_sorted_no <- sort(rules.no, by = "confidence")
if (length(rules_lift_sorted_no) >= 10) {
inspect(rules_lift_sorted_no[1:10])
} else {
inspect(rules_lift_sorted_no)
}
## lhs rhs support confidence coverage
## [1] {rating_marriage=very happy} => {affairs=no} 0.3294509 0.8534483 0.3860233
## [2] {gender=female} => {affairs=no} 0.4043261 0.7714286 0.5241265
## [3] {} => {affairs=no} 0.7504160 0.7504160 1.0000000
## [4] {gender=male} => {affairs=no} 0.3460899 0.7272727 0.4758735
## [5] {age=[27,37)} => {affairs=no} 0.3227953 0.7238806 0.4459235
## [6] {children=yes} => {affairs=no} 0.5108153 0.7139535 0.7154742
## lift count
## [1] 1.1373003 198
## [2] 1.0280013 243
## [3] 1.0000000 451
## [4] 0.9691594 208
## [5] 0.9646391 194
## [6] 0.9514103 307
plot(rules.no, method = "graph", colors = wes_palette('GrandBudapest1', 3, type = ('continuous')))
In approximately 32.95% of cases where the marriage rating is very happy, there is an 85.34% confidence that there are no affairs.
So what is recipe for happy marriage?
rules.happy <- apriori(data = data_trans,
parameter = list(supp = 0.15, conf = 0.25),
appearance = list(default = "lhs", rhs = "rating_marriage=very happy"),
control = list(verbose = FALSE))
rules_lift_sorted <- sort(rules.happy, by = "confidence")
if (length(rules_lift_sorted) >= 10) {
inspect(rules_lift_sorted[1:10])
} else {
inspect(rules_lift_sorted)
}
## lhs rhs support confidence coverage lift count
## [1] {children=no} => {rating_marriage=very happy} 0.1597338 0.5614035 0.2845258 1.4543255 96
## [2] {affairs=no,
## gender=female} => {rating_marriage=very happy} 0.1863561 0.4609053 0.4043261 1.1939833 112
## [3] {affairs=no} => {rating_marriage=very happy} 0.3294509 0.4390244 0.7504160 1.1373003 198
## [4] {gender=female} => {rating_marriage=very happy} 0.2163062 0.4126984 0.5241265 1.0691024 130
## [5] {age=[27,37)} => {rating_marriage=very happy} 0.1780366 0.3992537 0.4459235 1.0342737 107
## [6] {} => {rating_marriage=very happy} 0.3860233 0.3860233 1.0000000 1.0000000 232
## [7] {affairs=no,
## children=yes} => {rating_marriage=very happy} 0.1896839 0.3713355 0.5108153 0.9619510 114
## [8] {gender=male} => {rating_marriage=very happy} 0.1697171 0.3566434 0.4758735 0.9238908 102
## [9] {children=yes} => {rating_marriage=very happy} 0.2262895 0.3162791 0.7154742 0.8193264 136
plot(rules.happy, method="graph", colors = wes_palette('GrandBudapest1', 3, type = ('continuous')))
These rules collectively provide insights into the relationships between the absence of children, the absence of affairs, being female, and the likelihood of a very happy marriage.
According to StatSoft each year of the relationship increases the risk of infidelity by about 8.2%, which would confirm the strong relationship between the variable: years of marriage = almost 15 years. Statistics also confirm the strong relationship between infidelity and age, as a person is less likely to take advantage of infidelity opportunities as he or she ages, and the risk decreases by 5.8% with each year.
source: StatSoft