“We introduced the problem of mining association rules between sets of items in a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We are interested in finding those rules that have:
● Minimum transactional support s — the union of items in the consequent and antecedent of the rule is present in a minimum of s% of transactions in the database.
● Minimum confidence c — at least c% of transactions in the database that satisfy the antecedent of the rule also satisfy the consequent of the rule.
The rules that we discover have one item in the consequent and a union of any number of items in the antecedent. We solve this problem by decomposing it into two subproblems:
1. Finding all itemsets, called large itemsets, that are present in at least s% of transactions.
2. Generating from each large itemset, rules that use items from the large itemset.
Having obtained the large itemsets and their transactional support count, the solution to the second subproblem is rather straightforward. A simple solution to the first subproblem is to form all itemsets and obtain their support in one pass over the data. However, this solution is computationally infeasible — if there are m items in the database, there will be 2^m possible itemsets, and m can easily be more than 1000.”
(Rakesh Agrawal, Tomasz Imielinski, Arun Swami, 1993, p. 9)
https://dl.acm.org/doi/10.1145/170035.170072#abstract
This paper introduced the concepts of support and confidence for association rules. Its mining algorithm, later known as AIS, was the precursor of the Apriori algorithm, which Agrawal and Srikant published in 1994.
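To make the two thresholds from the quote concrete, here is a minimal base R sketch that computes support and confidence by hand. The transactions and the helper names `support()` and `confidence()` are made up for illustration; they are not part of the survey data or of any package.

```r
# Toy transactions (made-up data, not taken from the survey)
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("milk"),
  c("bread", "butter", "milk")
)

# Support: fraction of transactions that contain every item in `itemset`
support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Confidence of the rule antecedent => consequent:
# support of the combined itemset divided by support of the antecedent
confidence <- function(antecedent, consequent, transactions) {
  support(c(antecedent, consequent), transactions) /
    support(antecedent, transactions)
}

supp_rule <- support(c("bread", "butter"), transactions)   # 3 of 5 transactions
conf_rule <- confidence("bread", "butter", transactions)   # 3 of the 4 with bread
```

In this toy example the rule {bread} => {butter} has a support of 0.6 and a confidence of 0.75, so it would pass thresholds of, say, s = 50% and c = 70%.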
In this analysis, we will apply the Apriori algorithm to a dataset of survey responses from young people to uncover interesting associations between attributes such as smoking habits, alcohol consumption, and lifestyle choices across genders and age groups. By identifying these patterns, we aim to gain insights into the relationships between various behaviors and demographics.
Dataset: https://www.kaggle.com/datasets/miroslavsabo/young-people-survey
The effectiveness of the Apriori algorithm depends on several factors:
● The algorithm requires a clean, well-structured dataset of transactional or categorical data (e.g., market baskets, survey responses). Missing or inconsistent data can lead to inaccurate results.
● Setting appropriate values for minimum support and minimum confidence is crucial. If the thresholds are too high, the algorithm may miss important patterns. If they are too low, it may generate too many irrelevant rules.
● Apriori can be computationally expensive for very large datasets because it requires multiple passes over the data to identify frequent itemsets.
● Understanding the context of the data is essential for interpreting the rules. For example, a rule like {Diapers} → {Beer} might seem surprising without knowing that it reflects a real-world shopping pattern among young parents.
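The "multiple passes" mentioned above come from Apriori's level-wise search: frequent itemsets of size k-1 are combined into candidates of size k, candidates with an infrequent subset are pruned, and one pass over the data counts the support of the survivors. The following is a simplified base R sketch of that idea on made-up transactions. It is an illustration under these assumptions, not the implementation used by the arules package.

```r
# Toy transactions (made-up data)
transactions <- list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B", "C")
)
min_support <- 0.5

# Fraction of transactions containing every item of `itemset`
support <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(transactions)))
frequent <- Filter(function(s) support(s) >= min_support, as.list(items))
all_frequent <- frequent

k <- 2
while (length(frequent) > 1) {
  # Candidates: size-k unions of pairs of frequent (k-1)-itemsets
  candidates <- list()
  for (i in seq_len(length(frequent) - 1)) {
    for (j in (i + 1):length(frequent)) {
      u <- sort(union(frequent[[i]], frequent[[j]]))
      if (length(u) == k) candidates[[paste(u, collapse = ",")]] <- u
    }
  }
  # Prune: every (k-1)-subset of a candidate must itself be frequent
  prev_keys <- vapply(frequent, paste, character(1), collapse = ",")
  candidates <- Filter(function(cand) {
    subsets <- combn(cand, k - 1, simplify = FALSE)
    all(vapply(subsets, function(s) paste(s, collapse = ",") %in% prev_keys,
               logical(1)))
  }, candidates)
  # One pass over the data per level to count support
  frequent <- unname(Filter(function(cand) support(cand) >= min_support,
                            candidates))
  all_frequent <- c(all_frequent, frequent)
  k <- k + 1
}

frequent_labels <- vapply(all_frequent, paste, character(1), collapse = ",")
```

With a minimum support of 0.5, all single items and all pairs are frequent here, while {A, B, C} appears in only two of five transactions and is dropped. Raising `min_support` shrinks the result quickly, which is the trade-off described in the threshold bullet above.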
First, I look at the structure of the data, some sample entries, and the number of missing values per feature to decide what preprocessing is needed to obtain satisfactory results.
# Display the structure and summary of the data
str(data)
## 'data.frame': 1010 obs. of 150 variables:
## $ Music : int 5 4 5 5 5 5 5 5 5 5 ...
## $ Slow.songs.or.fast.songs : int 3 4 5 3 3 3 5 3 3 3 ...
## $ Dance : int 2 2 2 2 4 2 5 3 3 2 ...
## $ Folk : int 1 1 2 1 3 3 3 2 1 5 ...
## $ Country : int 2 1 3 1 2 2 1 1 1 2 ...
## $ Classical.music : int 2 1 4 1 4 3 2 2 2 2 ...
## $ Musical : int 1 2 5 1 3 3 2 2 4 5 ...
## $ Pop : int 5 3 3 2 5 2 5 4 3 3 ...
## $ Rock : int 5 5 5 2 3 5 3 5 5 5 ...
## $ Metal.or.Hardrock : int 1 4 3 1 1 5 1 1 5 2 ...
## $ Punk : int 1 4 4 4 2 3 1 2 1 3 ...
## $ Hiphop..Rap : int 1 1 1 2 5 4 3 3 1 2 ...
## $ Reggae..Ska : int 1 3 4 2 3 3 1 2 2 4 ...
## $ Swing..Jazz : int 1 1 3 1 2 4 1 2 2 4 ...
## $ Rock.n.roll : int 3 4 5 2 1 4 2 3 2 4 ...
## $ Alternative : int 1 4 5 5 2 5 3 1 NA 4 ...
## $ Latino : int 1 2 5 1 4 3 3 2 1 5 ...
## $ Techno..Trance : int 1 1 1 2 2 1 5 3 1 1 ...
## $ Opera : int 1 1 3 1 2 3 2 2 1 2 ...
## $ Movies : int 5 5 5 5 5 5 4 5 5 5 ...
## $ Horror : int 4 2 3 4 4 5 2 4 1 2 ...
## $ Thriller : int 2 2 4 4 4 5 1 4 5 1 ...
## $ Comedy : int 5 4 4 3 5 5 5 5 5 5 ...
## $ Romantic : int 4 3 2 3 2 2 3 2 4 5 ...
## $ Sci.fi : int 4 4 4 4 3 3 1 3 4 1 ...
## $ War : int 1 1 2 3 3 3 3 3 5 3 ...
## $ Fantasy.Fairy.tales : int 5 3 5 1 4 4 5 4 4 4 ...
## $ Animated : int 5 5 5 2 4 3 5 4 4 4 ...
## $ Documentary : int 3 4 2 5 3 3 3 3 5 4 ...
## $ Western : int 1 1 2 1 1 2 1 1 1 1 ...
## $ Action : int 2 4 1 2 4 4 2 3 1 2 ...
## $ History : int 1 1 1 4 3 5 3 5 3 3 ...
## $ Psychology : int 5 3 2 4 2 3 3 2 2 2 ...
## $ Politics : int 1 4 1 5 3 4 1 3 1 3 ...
## $ Mathematics : int 3 5 5 4 2 2 1 1 1 3 ...
## $ Physics : int 3 2 2 1 2 3 1 1 1 1 ...
## $ Internet : int 5 4 4 3 2 4 2 5 1 5 ...
## $ PC : int 3 4 2 1 2 4 1 4 1 1 ...
## $ Economy.Management : int 5 5 4 2 2 1 3 1 1 4 ...
## $ Biology : int 3 1 1 3 3 4 5 2 3 2 ...
## $ Chemistry : int 3 1 1 3 3 4 5 2 1 1 ...
## $ Reading : int 3 4 5 5 5 3 3 2 5 4 ...
## $ Geography : int 3 4 2 4 2 3 3 3 1 4 ...
## $ Foreign.languages : int 5 5 5 4 3 4 4 4 1 5 ...
## $ Medicine : int 3 1 2 2 3 4 5 1 1 1 ...
## $ Law : int 1 2 3 5 2 3 3 2 1 1 ...
## $ Cars : int 1 2 1 1 3 5 4 1 1 1 ...
## $ Art.exhibitions : int 1 2 5 5 1 2 1 1 1 4 ...
## $ Religion : int 1 1 5 4 4 2 1 2 2 4 ...
## $ Countryside..outdoors : int 5 1 5 1 4 5 4 2 4 4 ...
## $ Dancing : int 3 1 5 1 1 1 3 1 1 5 ...
## $ Musical.instruments : int 3 1 5 1 3 5 2 1 2 3 ...
## $ Writing : int 2 1 5 3 1 1 1 1 1 1 ...
## $ Passive.sport : int 1 1 5 1 3 5 5 4 4 4 ...
## $ Active.sport : int 5 1 2 1 1 4 3 5 1 4 ...
## $ Gardening : int 5 1 1 1 4 2 3 1 1 1 ...
## $ Celebrities : int 1 2 1 2 3 1 1 3 5 2 ...
## $ Shopping : int 4 3 4 4 3 2 3 3 2 4 ...
## $ Science.and.technology : int 4 3 2 3 3 3 4 2 1 3 ...
## $ Theatre : int 2 2 5 1 2 1 3 2 5 5 ...
## $ Fun.with.friends : int 5 4 5 2 4 3 5 4 4 5 ...
## $ Adrenaline.sports : int 4 2 5 1 2 3 1 2 1 2 ...
## $ Pets : int 4 5 5 1 1 2 5 5 1 2 ...
## $ Flying : int 1 1 1 2 1 3 1 3 2 4 ...
## $ Storm : int 1 1 1 1 2 2 3 2 3 5 ...
## $ Darkness : int 1 1 1 1 1 2 2 4 1 4 ...
## $ Heights : int 1 2 1 3 1 2 1 3 5 5 ...
## $ Spiders : int 1 1 1 5 1 1 1 1 5 3 ...
## $ Snakes : int 5 1 1 5 1 2 5 5 5 4 ...
## $ Rats : int 3 1 1 5 2 2 1 3 2 4 ...
## $ Ageing : int 1 3 1 4 2 1 4 1 2 3 ...
## $ Dangerous.dogs : int 3 1 1 5 4 1 1 2 3 5 ...
## $ Fear.of.public.speaking : int 2 4 2 5 3 3 1 4 4 3 ...
## $ Smoking : chr "never smoked" "never smoked" "tried smoking" "former smoker" ...
## $ Alcohol : chr "drink a lot" "drink a lot" "drink a lot" "drink a lot" ...
## $ Healthy.eating : int 4 3 3 3 4 2 4 2 1 3 ...
## $ Daily.events : int 2 3 1 4 3 2 3 3 1 4 ...
## $ Prioritising.workload : int 2 2 2 4 1 2 5 1 2 2 ...
## $ Writing.notes : int 5 4 5 4 2 3 5 3 1 2 ...
## $ Workaholism : int 4 5 3 5 3 3 5 2 4 3 ...
## $ Thinking.ahead : int 2 4 5 3 5 3 3 4 2 3 ...
## $ Final.judgement : int 5 1 3 1 5 1 3 3 5 5 ...
## $ Reliability : int 4 4 4 3 5 3 4 3 5 4 ...
## $ Keeping.promises : int 4 4 5 4 4 4 5 3 4 5 ...
## $ Loss.of.interest : int 1 3 1 5 2 3 3 1 1 3 ...
## $ Friends.versus.money : int 3 4 5 2 3 2 4 4 4 4 ...
## $ Funniness : int 5 3 2 1 3 3 4 4 2 3 ...
## $ Fake : int 1 2 4 1 2 1 1 2 2 1 ...
## $ Criminal.damage : int 1 1 1 5 1 4 2 1 1 2 ...
## $ Decision.making : int 3 2 3 5 3 2 2 3 4 5 ...
## $ Elections : int 4 5 5 5 5 5 5 5 1 5 ...
## $ Self.criticism : int 1 4 4 5 5 4 3 3 3 4 ...
## $ Judgment.calls : int 3 4 4 4 5 4 5 5 2 5 ...
## $ Hypochondria : int 1 1 1 3 1 1 1 2 2 1 ...
## $ Empathy : int 3 2 5 3 3 4 4 1 5 4 ...
## $ Eating.to.survive : int 1 1 5 1 1 2 1 2 1 1 ...
## $ Giving : int 4 2 5 1 3 3 5 3 1 4 ...
## $ Compassion.to.animals : int 5 4 4 2 3 5 5 5 4 5 ...
## $ Borrowed.stuff : int 4 3 2 5 4 5 5 2 5 4 ...
## [list output truncated]
# Names of columns with NAs
colnames(data)[colSums(is.na(data)) > 0]
## [1] "Music" "Slow.songs.or.fast.songs"
## [3] "Dance" "Folk"
## [5] "Country" "Classical.music"
## [7] "Musical" "Pop"
## [9] "Rock" "Metal.or.Hardrock"
## [11] "Punk" "Hiphop..Rap"
## [13] "Reggae..Ska" "Swing..Jazz"
## [15] "Rock.n.roll" "Alternative"
## [17] "Latino" "Techno..Trance"
## [19] "Opera" "Movies"
## [21] "Horror" "Thriller"
## [23] "Comedy" "Romantic"
## [25] "Sci.fi" "War"
## [27] "Fantasy.Fairy.tales" "Animated"
## [29] "Documentary" "Western"
## [31] "Action" "History"
## [33] "Psychology" "Politics"
## [35] "Mathematics" "Physics"
## [37] "Internet" "PC"
## [39] "Economy.Management" "Biology"
## [41] "Chemistry" "Reading"
## [43] "Geography" "Foreign.languages"
## [45] "Medicine" "Law"
## [47] "Cars" "Art.exhibitions"
## [49] "Religion" "Countryside..outdoors"
## [51] "Dancing" "Musical.instruments"
## [53] "Writing" "Passive.sport"
## [55] "Active.sport" "Gardening"
## [57] "Celebrities" "Shopping"
## [59] "Science.and.technology" "Theatre"
## [61] "Fun.with.friends" "Adrenaline.sports"
## [63] "Pets" "Flying"
## [65] "Storm" "Darkness"
## [67] "Heights" "Spiders"
## [69] "Rats" "Ageing"
## [71] "Dangerous.dogs" "Fear.of.public.speaking"
## [73] "Healthy.eating" "Daily.events"
## [75] "Prioritising.workload" "Writing.notes"
## [77] "Workaholism" "Thinking.ahead"
## [79] "Final.judgement" "Reliability"
## [81] "Keeping.promises" "Loss.of.interest"
## [83] "Friends.versus.money" "Funniness"
## [85] "Fake" "Criminal.damage"
## [87] "Decision.making" "Elections"
## [89] "Self.criticism" "Judgment.calls"
## [91] "Hypochondria" "Empathy"
## [93] "Giving" "Compassion.to.animals"
## [95] "Borrowed.stuff" "Loneliness"
## [97] "Cheating.in.school" "Health"
## [99] "Changing.the.past" "God"
## [101] "Charity" "Waiting"
## [103] "New.environment" "Mood.swings"
## [105] "Appearence.and.gestures" "Socializing"
## [107] "Achievements" "Responding.to.a.serious.letter"
## [109] "Children" "Assertiveness"
## [111] "Getting.angry" "Knowing.the.right.people"
## [113] "Public.speaking" "Unpopularity"
## [115] "Life.struggles" "Happiness.in.life"
## [117] "Energy.levels" "Small...big.dogs"
## [119] "Personality" "Finding.lost.valuables"
## [121] "Getting.up" "Interests.or.hobbies"
## [123] "Parents..advice" "Questionnaires.or.polls"
## [125] "Finances" "Shopping.centres"
## [127] "Branded.clothing" "Entertainment.spending"
## [129] "Spending.on.looks" "Spending.on.healthy.eating"
## [131] "Age" "Height"
## [133] "Weight" "Number.of.siblings"
# Number of NA rows
sum(rowSums(is.na(data)) > 0)
## [1] 324
Given the nature of our data, it is hard to impute individual human traits such as fear of Rats or interest in Law. I therefore filled the missing Age, Height and Weight values with the column mean and dropped every row that still contained an NA.
# Fill in mean for Age, Height and Weight
data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)
data$Height[is.na(data$Height)] <- mean(data$Height, na.rm = TRUE)
data$Weight[is.na(data$Weight)] <- mean(data$Weight, na.rm = TRUE)
# Count the rows that still contain NAs after the mean imputation
sum(rowSums(is.na(data)) > 0)
## [1] 309
# Drop rows with remaining NA values
data <- data[complete.cases(data), ]
colnames(data)[colSums(is.na(data)) > 0]
## character(0)
sum(rowSums(is.na(data)) > 0)
## [1] 0
Association rule mining requires categorical data, but most of the dataset currently holds numerical values.
With the function below, I binned the survey responses, which range from 1 to 5, into:
● ‘Low’ for values of 1-2
● ‘Medium’ for a value of 3
● ‘High’ for values of 4-5
Separate binning logic was performed for Age, Height, Weight and Number of Siblings.
The column name is prepended to each value so that the resulting rules are easier to interpret later on.
# Bin numeric values into labelled categories and prefix each label with the column name
bin_numerical <- function(x, col_name) {
  # Character columns whose values also get the column-name prefix
  prefixed_chr_cols <- c("Punctuality", "Internet.usage", "Lying",
                         "Smoking", "Alcohol", "Only.child")
  if (is.numeric(x)) {
    if (col_name == "Age") {
      # Age binning: (-Inf, 20], (20, 25], (25, Inf)
      cut(x,
          breaks = c(-Inf, 20, 25, Inf),
          labels = paste0(col_name, "_", c("15-20", "20-25", "25-30")),
          right = TRUE)
    } else if (col_name == "Height") {
      # Height binning
      cut(x,
          breaks = c(-Inf, 160, 180, Inf),
          labels = paste0(col_name, "_", c("Short", "Medium", "Tall")),
          right = TRUE)
    } else if (col_name == "Number.of.siblings") {
      # Number.of.siblings binning
      cut(x,
          breaks = c(-Inf, 0, 1, 3, 5, Inf),
          labels = paste0(col_name, "_", c("Zero", "One", "Two or Three", "Four or Five", "Six or more")),
          right = TRUE)
    } else if (col_name == "Weight") {
      # Weight binning
      cut(x,
          breaks = c(-Inf, 50, 80, Inf),
          labels = paste0(col_name, "_", c("Low", "Medium", "Big")),
          right = TRUE)
    } else {
      # General binning for the 1-5 survey scale
      cut(x,
          breaks = c(-Inf, 2, 3, 5),
          labels = paste0(col_name, "_", c("Low", "Medium", "High")),
          right = TRUE)
    }
  } else if (col_name %in% prefixed_chr_cols) {
    paste0(col_name, "_", x)
  } else {
    # Other character columns (e.g. Gender, Education) are left unchanged
    x
  }
}
# Apply the binning function to the dataset
transformed_data <- data %>%
mutate(across(everything(), ~bin_numerical(., cur_column())))
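As a quick check of the general 1-5 binning, the sketch below applies the same `cut()` call to a small made-up vector. The column name "Music" and the response values are chosen only for illustration.

```r
# Hypothetical 1-5 responses for a single survey column, including a missing value
responses <- c(1, 2, 3, 4, 5, NA)

binned <- cut(responses,
              breaks = c(-Inf, 2, 3, 5),
              labels = paste0("Music", "_", c("Low", "Medium", "High")),
              right = TRUE)  # intervals are (-Inf, 2], (2, 3] and (3, 5]

binned_chr <- as.character(binned)
# "Music_Low" "Music_Low" "Music_Medium" "Music_High" "Music_High" NA
```

Because `right = TRUE` closes each interval on the right, a response of 2 falls into ‘Low’ and a response of 3 into ‘Medium’. Note that the upper break is 5, so any value above 5 would become NA rather than ‘High’.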
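# Save the preprocessed data
write.csv(transformed_data, file = "data\\final.csv", row.names = FALSE)
# Association rules: read the saved file back in as transactions.
# Caution: write.csv() also writes a header row of column names; with skip=0 that row is
# read as one extra transaction. Setting skip=1 would exclude it, at the cost of slightly
# different support values from those shown below.
trans1 <- read.transactions("data\\final.csv", format="basket", sep=",", skip=0)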
# View the transformed dataset
head(transformed_data)
## Music Slow.songs.or.fast.songs Dance Folk
## 1 Music_High Slow.songs.or.fast.songs_Medium Dance_Low Folk_Low
## 2 Music_High Slow.songs.or.fast.songs_High Dance_Low Folk_Low
## 3 Music_High Slow.songs.or.fast.songs_High Dance_Low Folk_Low
## 5 Music_High Slow.songs.or.fast.songs_Medium Dance_High Folk_Medium
## 6 Music_High Slow.songs.or.fast.songs_Medium Dance_Low Folk_Medium
## 7 Music_High Slow.songs.or.fast.songs_High Dance_High Folk_Medium
## Country Classical.music Musical Pop Rock
## 1 Country_Low Classical.music_Low Musical_Low Pop_High Rock_High
## 2 Country_Low Classical.music_Low Musical_Low Pop_Medium Rock_High
## 3 Country_Medium Classical.music_High Musical_High Pop_Medium Rock_High
## 5 Country_Low Classical.music_High Musical_Medium Pop_High Rock_Medium
## 6 Country_Low Classical.music_Medium Musical_Medium Pop_Low Rock_High
## 7 Country_Low Classical.music_Low Musical_Low Pop_High Rock_Medium
## Metal.or.Hardrock Punk Hiphop..Rap Reggae..Ska
## 1 Metal.or.Hardrock_Low Punk_Low Hiphop..Rap_Low Reggae..Ska_Low
## 2 Metal.or.Hardrock_High Punk_High Hiphop..Rap_Low Reggae..Ska_Medium
## 3 Metal.or.Hardrock_Medium Punk_High Hiphop..Rap_Low Reggae..Ska_High
## 5 Metal.or.Hardrock_Low Punk_Low Hiphop..Rap_High Reggae..Ska_Medium
## 6 Metal.or.Hardrock_High Punk_Medium Hiphop..Rap_High Reggae..Ska_Medium
## 7 Metal.or.Hardrock_Low Punk_Low Hiphop..Rap_Medium Reggae..Ska_Low
## Swing..Jazz Rock.n.roll Alternative Latino
## 1 Swing..Jazz_Low Rock.n.roll_Medium Alternative_Low Latino_Low
## 2 Swing..Jazz_Low Rock.n.roll_High Alternative_High Latino_Low
## 3 Swing..Jazz_Medium Rock.n.roll_High Alternative_High Latino_High
## 5 Swing..Jazz_Low Rock.n.roll_Low Alternative_Low Latino_High
## 6 Swing..Jazz_High Rock.n.roll_High Alternative_High Latino_Medium
## 7 Swing..Jazz_Low Rock.n.roll_Low Alternative_Medium Latino_Medium
## Techno..Trance Opera Movies Horror Thriller
## 1 Techno..Trance_Low Opera_Low Movies_High Horror_High Thriller_Low
## 2 Techno..Trance_Low Opera_Low Movies_High Horror_Low Thriller_Low
## 3 Techno..Trance_Low Opera_Medium Movies_High Horror_Medium Thriller_High
## 5 Techno..Trance_Low Opera_Low Movies_High Horror_High Thriller_High
## 6 Techno..Trance_Low Opera_Medium Movies_High Horror_High Thriller_High
## 7 Techno..Trance_High Opera_Low Movies_High Horror_Low Thriller_Low
## Comedy Romantic Sci.fi War
## 1 Comedy_High Romantic_High Sci.fi_High War_Low
## 2 Comedy_High Romantic_Medium Sci.fi_High War_Low
## 3 Comedy_High Romantic_Low Sci.fi_High War_Low
## 5 Comedy_High Romantic_Low Sci.fi_Medium War_Medium
## 6 Comedy_High Romantic_Low Sci.fi_Medium War_Medium
## 7 Comedy_High Romantic_Medium Sci.fi_Low War_Medium
## Fantasy.Fairy.tales Animated Documentary Western
## 1 Fantasy.Fairy.tales_High Animated_High Documentary_Medium Western_Low
## 2 Fantasy.Fairy.tales_Medium Animated_High Documentary_High Western_Low
## 3 Fantasy.Fairy.tales_High Animated_High Documentary_Low Western_Low
## 5 Fantasy.Fairy.tales_High Animated_High Documentary_Medium Western_Low
## 6 Fantasy.Fairy.tales_High Animated_Medium Documentary_Medium Western_Low
## 7 Fantasy.Fairy.tales_High Animated_High Documentary_Medium Western_Low
## Action History Psychology Politics
## 1 Action_Low History_Low Psychology_High Politics_Low
## 2 Action_High History_Low Psychology_Medium Politics_High
## 3 Action_Low History_Low Psychology_Low Politics_Low
## 5 Action_High History_Medium Psychology_Low Politics_Medium
## 6 Action_High History_High Psychology_Medium Politics_High
## 7 Action_Low History_Medium Psychology_Medium Politics_Low
## Mathematics Physics Internet PC
## 1 Mathematics_Medium Physics_Medium Internet_High PC_Medium
## 2 Mathematics_High Physics_Low Internet_High PC_High
## 3 Mathematics_High Physics_Low Internet_High PC_Low
## 5 Mathematics_Low Physics_Low Internet_Low PC_Low
## 6 Mathematics_Low Physics_Medium Internet_High PC_High
## 7 Mathematics_Low Physics_Low Internet_Low PC_Low
## Economy.Management Biology Chemistry Reading
## 1 Economy.Management_High Biology_Medium Chemistry_Medium Reading_Medium
## 2 Economy.Management_High Biology_Low Chemistry_Low Reading_High
## 3 Economy.Management_High Biology_Low Chemistry_Low Reading_High
## 5 Economy.Management_Low Biology_Medium Chemistry_Medium Reading_High
## 6 Economy.Management_Low Biology_High Chemistry_High Reading_Medium
## 7 Economy.Management_Medium Biology_High Chemistry_High Reading_Medium
## Geography Foreign.languages Medicine Law
## 1 Geography_Medium Foreign.languages_High Medicine_Medium Law_Low
## 2 Geography_High Foreign.languages_High Medicine_Low Law_Low
## 3 Geography_Low Foreign.languages_High Medicine_Low Law_Medium
## 5 Geography_Low Foreign.languages_Medium Medicine_Medium Law_Low
## 6 Geography_Medium Foreign.languages_High Medicine_High Law_Medium
## 7 Geography_Medium Foreign.languages_High Medicine_High Law_Medium
## Cars Art.exhibitions Religion Countryside..outdoors
## 1 Cars_Low Art.exhibitions_Low Religion_Low Countryside..outdoors_High
## 2 Cars_Low Art.exhibitions_Low Religion_Low Countryside..outdoors_Low
## 3 Cars_Low Art.exhibitions_High Religion_High Countryside..outdoors_High
## 5 Cars_Medium Art.exhibitions_Low Religion_High Countryside..outdoors_High
## 6 Cars_High Art.exhibitions_Low Religion_Low Countryside..outdoors_High
## 7 Cars_High Art.exhibitions_Low Religion_Low Countryside..outdoors_High
## Dancing Musical.instruments Writing Passive.sport
## 1 Dancing_Medium Musical.instruments_Medium Writing_Low Passive.sport_Low
## 2 Dancing_Low Musical.instruments_Low Writing_Low Passive.sport_Low
## 3 Dancing_High Musical.instruments_High Writing_High Passive.sport_High
## 5 Dancing_Low Musical.instruments_Medium Writing_Low Passive.sport_Medium
## 6 Dancing_Low Musical.instruments_High Writing_Low Passive.sport_High
## 7 Dancing_Medium Musical.instruments_Low Writing_Low Passive.sport_High
## Active.sport Gardening Celebrities Shopping
## 1 Active.sport_High Gardening_High Celebrities_Low Shopping_High
## 2 Active.sport_Low Gardening_Low Celebrities_Low Shopping_Medium
## 3 Active.sport_Low Gardening_Low Celebrities_Low Shopping_High
## 5 Active.sport_Low Gardening_High Celebrities_Medium Shopping_Medium
## 6 Active.sport_High Gardening_Low Celebrities_Low Shopping_Low
## 7 Active.sport_Medium Gardening_Medium Celebrities_Low Shopping_Medium
## Science.and.technology Theatre Fun.with.friends
## 1 Science.and.technology_High Theatre_Low Fun.with.friends_High
## 2 Science.and.technology_Medium Theatre_Low Fun.with.friends_High
## 3 Science.and.technology_Low Theatre_High Fun.with.friends_High
## 5 Science.and.technology_Medium Theatre_Low Fun.with.friends_High
## 6 Science.and.technology_Medium Theatre_Low Fun.with.friends_Medium
## 7 Science.and.technology_High Theatre_Medium Fun.with.friends_High
## Adrenaline.sports Pets Flying Storm Darkness
## 1 Adrenaline.sports_High Pets_High Flying_Low Storm_Low Darkness_Low
## 2 Adrenaline.sports_Low Pets_High Flying_Low Storm_Low Darkness_Low
## 3 Adrenaline.sports_High Pets_High Flying_Low Storm_Low Darkness_Low
## 5 Adrenaline.sports_Low Pets_Low Flying_Low Storm_Low Darkness_Low
## 6 Adrenaline.sports_Medium Pets_Low Flying_Medium Storm_Low Darkness_Low
## 7 Adrenaline.sports_Low Pets_High Flying_Low Storm_Medium Darkness_Low
## Heights Spiders Snakes Rats Ageing
## 1 Heights_Low Spiders_Low Snakes_High Rats_Medium Ageing_Low
## 2 Heights_Low Spiders_Low Snakes_Low Rats_Low Ageing_Medium
## 3 Heights_Low Spiders_Low Snakes_Low Rats_Low Ageing_Low
## 5 Heights_Low Spiders_Low Snakes_Low Rats_Low Ageing_Low
## 6 Heights_Low Spiders_Low Snakes_Low Rats_Low Ageing_Low
## 7 Heights_Low Spiders_Low Snakes_High Rats_Low Ageing_High
## Dangerous.dogs Fear.of.public.speaking Smoking
## 1 Dangerous.dogs_Medium Fear.of.public.speaking_Low Smoking_never smoked
## 2 Dangerous.dogs_Low Fear.of.public.speaking_High Smoking_never smoked
## 3 Dangerous.dogs_Low Fear.of.public.speaking_Low Smoking_tried smoking
## 5 Dangerous.dogs_High Fear.of.public.speaking_Medium Smoking_tried smoking
## 6 Dangerous.dogs_Low Fear.of.public.speaking_Medium Smoking_never smoked
## 7 Dangerous.dogs_Low Fear.of.public.speaking_Low Smoking_tried smoking
## Alcohol Healthy.eating Daily.events
## 1 Alcohol_drink a lot Healthy.eating_High Daily.events_Low
## 2 Alcohol_drink a lot Healthy.eating_Medium Daily.events_Medium
## 3 Alcohol_drink a lot Healthy.eating_Medium Daily.events_Low
## 5 Alcohol_social drinker Healthy.eating_High Daily.events_Medium
## 6 Alcohol_never Healthy.eating_Low Daily.events_Low
## 7 Alcohol_social drinker Healthy.eating_High Daily.events_Medium
## Prioritising.workload Writing.notes Workaholism
## 1 Prioritising.workload_Low Writing.notes_High Workaholism_High
## 2 Prioritising.workload_Low Writing.notes_High Workaholism_High
## 3 Prioritising.workload_Low Writing.notes_High Workaholism_Medium
## 5 Prioritising.workload_Low Writing.notes_Low Workaholism_Medium
## 6 Prioritising.workload_Low Writing.notes_Medium Workaholism_Medium
## 7 Prioritising.workload_High Writing.notes_High Workaholism_High
## Thinking.ahead Final.judgement Reliability
## 1 Thinking.ahead_Low Final.judgement_High Reliability_High
## 2 Thinking.ahead_High Final.judgement_Low Reliability_High
## 3 Thinking.ahead_High Final.judgement_Medium Reliability_High
## 5 Thinking.ahead_High Final.judgement_High Reliability_High
## 6 Thinking.ahead_Medium Final.judgement_Low Reliability_Medium
## 7 Thinking.ahead_Medium Final.judgement_Medium Reliability_High
## Keeping.promises Loss.of.interest Friends.versus.money
## 1 Keeping.promises_High Loss.of.interest_Low Friends.versus.money_Medium
## 2 Keeping.promises_High Loss.of.interest_Medium Friends.versus.money_High
## 3 Keeping.promises_High Loss.of.interest_Low Friends.versus.money_High
## 5 Keeping.promises_High Loss.of.interest_Low Friends.versus.money_Medium
## 6 Keeping.promises_High Loss.of.interest_Medium Friends.versus.money_Low
## 7 Keeping.promises_High Loss.of.interest_Medium Friends.versus.money_High
## Funniness Fake Criminal.damage Decision.making
## 1 Funniness_High Fake_Low Criminal.damage_Low Decision.making_Medium
## 2 Funniness_Medium Fake_Low Criminal.damage_Low Decision.making_Low
## 3 Funniness_Low Fake_High Criminal.damage_Low Decision.making_Medium
## 5 Funniness_Medium Fake_Low Criminal.damage_Low Decision.making_Medium
## 6 Funniness_Medium Fake_Low Criminal.damage_High Decision.making_Low
## 7 Funniness_High Fake_Low Criminal.damage_Low Decision.making_Low
## Elections Self.criticism Judgment.calls Hypochondria
## 1 Elections_High Self.criticism_Low Judgment.calls_Medium Hypochondria_Low
## 2 Elections_High Self.criticism_High Judgment.calls_High Hypochondria_Low
## 3 Elections_High Self.criticism_High Judgment.calls_High Hypochondria_Low
## 5 Elections_High Self.criticism_High Judgment.calls_High Hypochondria_Low
## 6 Elections_High Self.criticism_High Judgment.calls_High Hypochondria_Low
## 7 Elections_High Self.criticism_Medium Judgment.calls_High Hypochondria_Low
## Empathy Eating.to.survive Giving
## 1 Empathy_Medium Eating.to.survive_Low Giving_High
## 2 Empathy_Low Eating.to.survive_Low Giving_Low
## 3 Empathy_High Eating.to.survive_High Giving_High
## 5 Empathy_Medium Eating.to.survive_Low Giving_Medium
## 6 Empathy_High Eating.to.survive_Low Giving_Medium
## 7 Empathy_High Eating.to.survive_Low Giving_High
## Compassion.to.animals Borrowed.stuff Loneliness
## 1 Compassion.to.animals_High Borrowed.stuff_High Loneliness_Medium
## 2 Compassion.to.animals_High Borrowed.stuff_Medium Loneliness_Low
## 3 Compassion.to.animals_High Borrowed.stuff_Low Loneliness_High
## 5 Compassion.to.animals_Medium Borrowed.stuff_High Loneliness_Medium
## 6 Compassion.to.animals_High Borrowed.stuff_High Loneliness_Low
## 7 Compassion.to.animals_High Borrowed.stuff_High Loneliness_Medium
## Cheating.in.school Health Changing.the.past God
## 1 Cheating.in.school_Low Health_Low Changing.the.past_Low God_Low
## 2 Cheating.in.school_High Health_High Changing.the.past_High God_Low
## 3 Cheating.in.school_Medium Health_Low Changing.the.past_High God_High
## 5 Cheating.in.school_High Health_Medium Changing.the.past_High God_High
## 6 Cheating.in.school_High Health_Medium Changing.the.past_Medium God_Medium
## 7 Cheating.in.school_Low Health_Medium Changing.the.past_Low God_High
## Dreams Charity Number.of.friends
## 1 Dreams_High Charity_Low Number.of.friends_Medium
## 2 Dreams_Medium Charity_Low Number.of.friends_Medium
## 3 Dreams_Low Charity_Medium Number.of.friends_Medium
## 5 Dreams_Medium Charity_Medium Number.of.friends_Medium
## 6 Dreams_Medium Charity_Low Number.of.friends_Medium
## 7 Dreams_Medium Charity_Medium Number.of.friends_Medium
## Punctuality Lying
## 1 Punctuality_i am always on time Lying_never
## 2 Punctuality_i am often early Lying_sometimes
## 3 Punctuality_i am often running late Lying_sometimes
## 5 Punctuality_i am always on time Lying_everytime it suits me
## 6 Punctuality_i am often early Lying_only to avoid hurting someone
## 7 Punctuality_i am often early Lying_never
## Waiting New.environment Mood.swings
## 1 Waiting_Medium New.environment_High Mood.swings_Medium
## 2 Waiting_Medium New.environment_High Mood.swings_High
## 3 Waiting_Low New.environment_Medium Mood.swings_High
## 5 Waiting_Medium New.environment_High Mood.swings_Low
## 6 Waiting_Medium New.environment_High Mood.swings_Medium
## 7 Waiting_High New.environment_High Mood.swings_High
## Appearence.and.gestures Socializing Achievements
## 1 Appearence.and.gestures_High Socializing_Medium Achievements_High
## 2 Appearence.and.gestures_High Socializing_High Achievements_Low
## 3 Appearence.and.gestures_Medium Socializing_High Achievements_Medium
## 5 Appearence.and.gestures_Medium Socializing_Medium Achievements_Medium
## 6 Appearence.and.gestures_Medium Socializing_High Achievements_Low
## 7 Appearence.and.gestures_High Socializing_High Achievements_High
## Responding.to.a.serious.letter Children Assertiveness
## 1 Responding.to.a.serious.letter_Medium Children_High Assertiveness_Low
## 2 Responding.to.a.serious.letter_High Children_Low Assertiveness_Low
## 3 Responding.to.a.serious.letter_High Children_High Assertiveness_Medium
## 5 Responding.to.a.serious.letter_Medium Children_High Assertiveness_High
## 6 Responding.to.a.serious.letter_Low Children_Medium Assertiveness_High
## 7 Responding.to.a.serious.letter_Medium Children_Low Assertiveness_Medium
## Getting.angry Knowing.the.right.people Public.speaking
## 1 Getting.angry_Low Knowing.the.right.people_Medium Public.speaking_High
## 2 Getting.angry_High Knowing.the.right.people_High Public.speaking_High
## 3 Getting.angry_High Knowing.the.right.people_Medium Public.speaking_Low
## 5 Getting.angry_Low Knowing.the.right.people_Medium Public.speaking_High
## 6 Getting.angry_Medium Knowing.the.right.people_High Public.speaking_High
## 7 Getting.angry_Medium Knowing.the.right.people_High Public.speaking_Medium
## Unpopularity Life.struggles Happiness.in.life
## 1 Unpopularity_High Life.struggles_Low Happiness.in.life_High
## 2 Unpopularity_High Life.struggles_Low Happiness.in.life_High
## 3 Unpopularity_High Life.struggles_High Happiness.in.life_High
## 5 Unpopularity_High Life.struggles_Low Happiness.in.life_Medium
## 6 Unpopularity_High Life.struggles_Medium Happiness.in.life_Medium
## 7 Unpopularity_Medium Life.struggles_High Happiness.in.life_High
## Energy.levels Small...big.dogs Personality
## 1 Energy.levels_High Small...big.dogs_Low Personality_High
## 2 Energy.levels_Medium Small...big.dogs_High Personality_Medium
## 3 Energy.levels_High Small...big.dogs_Medium Personality_Medium
## 5 Energy.levels_High Small...big.dogs_Medium Personality_Medium
## 6 Energy.levels_High Small...big.dogs_High Personality_Medium
## 7 Energy.levels_High Small...big.dogs_Medium Personality_Medium
## Finding.lost.valuables Getting.up Interests.or.hobbies
## 1 Finding.lost.valuables_Medium Getting.up_Low Interests.or.hobbies_Medium
## 2 Finding.lost.valuables_High Getting.up_High Interests.or.hobbies_Medium
## 3 Finding.lost.valuables_Medium Getting.up_High Interests.or.hobbies_High
## 5 Finding.lost.valuables_Low Getting.up_High Interests.or.hobbies_Medium
## 6 Finding.lost.valuables_Medium Getting.up_Medium Interests.or.hobbies_High
## 7 Finding.lost.valuables_Low Getting.up_Low Interests.or.hobbies_High
## Parents..advice Questionnaires.or.polls
## 1 Parents..advice_High Questionnaires.or.polls_Medium
## 2 Parents..advice_Low Questionnaires.or.polls_Medium
## 3 Parents..advice_Medium Questionnaires.or.polls_Low
## 5 Parents..advice_Medium Questionnaires.or.polls_Medium
## 6 Parents..advice_Medium Questionnaires.or.polls_High
## 7 Parents..advice_High Questionnaires.or.polls_High
## Internet.usage Finances
## 1 Internet.usage_few hours a day Finances_Medium
## 2 Internet.usage_few hours a day Finances_Medium
## 3 Internet.usage_few hours a day Finances_Low
## 5 Internet.usage_few hours a day Finances_High
## 6 Internet.usage_few hours a day Finances_Low
## 7 Internet.usage_less than an hour a day Finances_High
## Shopping.centres Branded.clothing Entertainment.spending
## 1 Shopping.centres_High Branded.clothing_High Entertainment.spending_Medium
## 2 Shopping.centres_High Branded.clothing_Low Entertainment.spending_High
## 3 Shopping.centres_High Branded.clothing_Low Entertainment.spending_High
## 5 Shopping.centres_Medium Branded.clothing_High Entertainment.spending_Medium
## 6 Shopping.centres_Medium Branded.clothing_Medium Entertainment.spending_Medium
## 7 Shopping.centres_Medium Branded.clothing_Low Entertainment.spending_Medium
## Spending.on.looks Spending.on.gadgets
## 1 Spending.on.looks_Medium Spending.on.gadgets_Low
## 2 Spending.on.looks_Low Spending.on.gadgets_High
## 3 Spending.on.looks_Medium Spending.on.gadgets_High
## 5 Spending.on.looks_Medium Spending.on.gadgets_Low
## 6 Spending.on.looks_Low Spending.on.gadgets_High
## 7 Spending.on.looks_High Spending.on.gadgets_Low
## Spending.on.healthy.eating Age Height Weight
## 1 Spending.on.healthy.eating_Medium Age_15-20 Height_Medium Weight_Low
## 2 Spending.on.healthy.eating_Low Age_15-20 Height_Medium Weight_Medium
## 3 Spending.on.healthy.eating_Low Age_15-20 Height_Medium Weight_Medium
## 5 Spending.on.healthy.eating_High Age_15-20 Height_Medium Weight_Medium
## 6 Spending.on.healthy.eating_High Age_15-20 Height_Tall Weight_Medium
## 7 Spending.on.healthy.eating_High Age_15-20 Height_Medium Weight_Low
## Number.of.siblings Gender Left...right.handed
## 1 Number.of.siblings_One female right handed
## 2 Number.of.siblings_Two or Three female right handed
## 3 Number.of.siblings_Two or Three female right handed
## 5 Number.of.siblings_One female right handed
## 6 Number.of.siblings_One male right handed
## 7 Number.of.siblings_One female right handed
## Education Only.child Village...town House...block.of.flats
## 1 college/bachelor degree Only.child_no village block of flats
## 2 college/bachelor degree Only.child_no city block of flats
## 3 secondary school Only.child_no city block of flats
## 5 secondary school Only.child_no village house/bungalow
## 6 secondary school Only.child_no city block of flats
## 7 secondary school Only.child_no village house/bungalow
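With the data in this form, we can start the association rule mining. The first five calls fix a demographic item on the left-hand side (lhs) of the rule and let any other item appear on the right-hand side (rhs); the last three fix a habit on the right-hand side and let any combination of items appear on the left.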
rules <- list()
rules$male <- apriori(data=trans1, parameter=list(supp=0.3, conf=0.4, minlen=2), appearance=list(default="rhs", lhs="male"), control=list(verbose=FALSE))
rules$male.byconf <- head(sort(rules$male, by="confidence", decreasing=TRUE), 15)
inspect(rules$male.byconf)
## lhs rhs support confidence coverage
## [1] {male} => {Music_High} 0.3732194 0.9357143 0.3988604
## [2] {male} => {Movies_High} 0.3618234 0.9071429 0.3988604
## [3] {male} => {Comedy_High} 0.3603989 0.9035714 0.3988604
## [4] {male} => {Fun.with.friends_High} 0.3547009 0.8892857 0.3988604
## [5] {male} => {Storm_Low} 0.3532764 0.8857143 0.3988604
## [6] {male} => {right handed} 0.3532764 0.8857143 0.3988604
## [7] {male} => {Internet_High} 0.3447293 0.8642857 0.3988604
## [8] {male} => {Darkness_Low} 0.3333333 0.8357143 0.3988604
## [9] {male} => {Gardening_Low} 0.3319088 0.8321429 0.3988604
## [10] {male} => {Writing_Low} 0.3176638 0.7964286 0.3988604
## [11] {male} => {Only.child_no} 0.3162393 0.7928571 0.3988604
## [12] {male} => {Action_High} 0.3148148 0.7892857 0.3988604
## [13] {male} => {Hypochondria_Low} 0.3119658 0.7821429 0.3988604
## [14] {male} => {Dancing_Low} 0.3105413 0.7785714 0.3988604
## [15] {male} => {Internet.usage_few hours a day} 0.3062678 0.7678571 0.3988604
## lift count
## [1] 0.9833405 262
## [2] 0.9873090 254
## [3] 1.0068367 253
## [4] 0.9924938 249
## [5] 1.1753713 248
## [6] 0.9853747 248
## [7] 1.1235714 242
## [8] 1.3066179 234
## [9] 1.1105785 233
## [10] 1.0629142 223
## [11] 1.0345459 222
## [12] 1.4134657 221
## [13] 1.0518473 219
## [14] 1.3330662 218
## [15] 1.0326355 215
rules$female <- apriori(data=trans1, parameter=list(supp=0.3, conf=0.4, minlen=2), appearance=list(default="rhs", lhs="female"), control=list(verbose=FALSE))
rules$female.byconf <- head(sort(rules$female, by="confidence", decreasing=TRUE), 15)
inspect(rules$female.byconf)
## lhs rhs support confidence coverage
## [1] {female} => {Music_High} 0.5754986 0.9642005 0.5968661
## [2] {female} => {Movies_High} 0.5541311 0.9284010 0.5968661
## [3] {female} => {right handed} 0.5427350 0.9093079 0.5968661
## [4] {female} => {Fun.with.friends_High} 0.5384615 0.9021480 0.5968661
## [5] {female} => {Comedy_High} 0.5356125 0.8973747 0.5968661
## [6] {female} => {Height_Medium} 0.5042735 0.8448687 0.5968661
## [7] {female} => {Western_Low} 0.4914530 0.8233890 0.5968661
## [8] {female} => {Weight_Medium} 0.4772080 0.7995227 0.5968661
## [9] {female} => {Physics_Low} 0.4743590 0.7947494 0.5968661
## [10] {female} => {Borrowed.stuff_High} 0.4601140 0.7708831 0.5968661
## [11] {female} => {Compassion.to.animals_High} 0.4544160 0.7613365 0.5968661
## [12] {female} => {Empathy_High} 0.4487179 0.7517900 0.5968661
## [13] {female} => {Only.child_no} 0.4487179 0.7517900 0.5968661
## [14] {female} => {Keeping.promises_High} 0.4472934 0.7494033 0.5968661
## [15] {female} => {Fake_Low} 0.4415954 0.7398568 0.5968661
## lift count
## [1] 1.0132765 404
## [2] 1.0104457 389
## [3] 1.0116230 381
## [4] 1.0068488 378
## [5] 0.9999318 376
## [6] 1.2304935 354
## [7] 1.1967269 345
## [8] 1.0630017 335
## [9] 1.1551016 333
## [10] 1.0268689 323
## [11] 1.1088345 319
## [12] 1.1017882 315
## [13] 0.9809602 315
## [14] 0.9944823 314
## [15] 1.0284742 310
rules$Age15_20 <- apriori(data=trans1, parameter=list(supp=0.3, conf=0.4, minlen=2), appearance=list(default="rhs", lhs="Age_15-20"), control=list(verbose=FALSE))
rules$Age15_20.byconf <- head(sort(rules$Age15_20, by="confidence", decreasing=TRUE), 15)
inspect(rules$Age15_20.byconf)
## lhs rhs support confidence
## [1] {Age_15-20} => {Music_High} 0.6039886 0.9636364
## [2] {Age_15-20} => {Movies_High} 0.5811966 0.9272727
## [3] {Age_15-20} => {Comedy_High} 0.5712251 0.9113636
## [4] {Age_15-20} => {Fun.with.friends_High} 0.5698006 0.9090909
## [5] {Age_15-20} => {right handed} 0.5683761 0.9068182
## [6] {Age_15-20} => {Internet_High} 0.4900285 0.7818182
## [7] {Age_15-20} => {Storm_Low} 0.4886040 0.7795455
## [8] {Age_15-20} => {Weight_Medium} 0.4829060 0.7704545
## [9] {Age_15-20} => {Internet.usage_few hours a day} 0.4814815 0.7681818
## [10] {Age_15-20} => {Gardening_Low} 0.4800570 0.7659091
## [11] {Age_15-20} => {Borrowed.stuff_High} 0.4772080 0.7613636
## [12] {Age_15-20} => {Hypochondria_Low} 0.4700855 0.7500000
## [13] {Age_15-20} => {Writing_Low} 0.4672365 0.7454545
## [14] {Age_15-20} => {Only.child_no} 0.4672365 0.7454545
## [15] {Age_15-20} => {Judgment.calls_High} 0.4558405 0.7272727
## coverage lift count
## [1] 0.6267806 1.0126837 424
## [2] 0.6267806 1.0092178 408
## [3] 0.6267806 1.0155195 401
## [4] 0.6267806 1.0145975 400
## [5] 0.6267806 1.0088532 399
## [6] 0.6267806 1.0163636 344
## [7] 0.6267806 1.0344819 343
## [8] 0.6267806 1.0243543 339
## [9] 0.6267806 1.0330721 338
## [10] 0.6267806 1.0221829 337
## [11] 0.6267806 1.0141884 335
## [12] 0.6267806 1.0086207 330
## [13] 0.6267806 0.9948842 328
## [14] 0.6267806 0.9726935 328
## [15] 0.6267806 1.0293255 320
# Age_20-25 covers only about 31% of transactions, so the support threshold is lowered to 0.2
rules$Age20_25 <- apriori(data=trans1, parameter=list(supp=0.2, conf=0.4, minlen=2), appearance=list(default="rhs", lhs="Age_20-25"), control=list(verbose=FALSE))
rules$Age20_25.byconf <- head(sort(rules$Age20_25, by="confidence", decreasing=TRUE), 15)
inspect(rules$Age20_25.byconf)
## lhs rhs support confidence coverage
## [1] {Age_20-25} => {Music_High} 0.2934473 0.9406393 0.3119658
## [2] {Age_20-25} => {right handed} 0.2820513 0.9041096 0.3119658
## [3] {Age_20-25} => {Movies_High} 0.2820513 0.9041096 0.3119658
## [4] {Age_20-25} => {Fun.with.friends_High} 0.2763533 0.8858447 0.3119658
## [5] {Age_20-25} => {Comedy_High} 0.2749288 0.8812785 0.3119658
## [6] {Age_20-25} => {Keeping.promises_High} 0.2521368 0.8082192 0.3119658
## [7] {Age_20-25} => {Only.child_no} 0.2478632 0.7945205 0.3119658
## [8] {Age_20-25} => {Writing_Low} 0.2421652 0.7762557 0.3119658
## [9] {Age_20-25} => {Chemistry_Low} 0.2407407 0.7716895 0.3119658
## [10] {Age_20-25} => {Hypochondria_Low} 0.2350427 0.7534247 0.3119658
## [11] {Age_20-25} => {city} 0.2321937 0.7442922 0.3119658
## [12] {Age_20-25} => {Weight_Medium} 0.2321937 0.7442922 0.3119658
## [13] {Age_20-25} => {Fake_Low} 0.2293447 0.7351598 0.3119658
## [14] {Age_20-25} => {Internet_High} 0.2293447 0.7351598 0.3119658
## [15] {Age_20-25} => {Reliability_High} 0.2279202 0.7305936 0.3119658
## lift count
## [1] 0.9885161 206
## [2] 1.0058398 198
## [3] 0.9840076 198
## [4] 0.9886534 194
## [5] 0.9819961 193
## [6] 1.0725328 177
## [7] 1.0367164 174
## [8] 1.0359915 170
## [9] 1.0812895 169
## [10] 1.0132263 165
## [11] 1.0366928 163
## [12] 0.9895704 163
## [13] 1.0219449 161
## [14] 0.9557078 161
## [15] 1.0866032 160
# Age_25-30 covers only about 6% of transactions, so both thresholds are lowered
rules$Age25_30 <- apriori(data=trans1, parameter=list(supp=0.03, conf=0.2, minlen=2), appearance=list(default="rhs", lhs="Age_25-30"), control=list(verbose=FALSE))
rules$Age25_30.byconf <- head(sort(rules$Age25_30, by="confidence", decreasing=TRUE), 15)
inspect(rules$Age25_30.byconf)
## lhs rhs support confidence coverage
## [1] {Age_25-30} => {Movies_High} 0.05555556 0.9285714 0.05982906
## [2] {Age_25-30} => {Music_High} 0.05413105 0.9047619 0.05982906
## [3] {Age_25-30} => {Only.child_no} 0.05128205 0.8571429 0.05982906
## [4] {Age_25-30} => {Comedy_High} 0.05128205 0.8571429 0.05982906
## [5] {Age_25-30} => {Internet_High} 0.04985755 0.8333333 0.05982906
## [6] {Age_25-30} => {Fun.with.friends_High} 0.04985755 0.8333333 0.05982906
## [7] {Age_25-30} => {right handed} 0.04843305 0.8095238 0.05982906
## [8] {Age_25-30} => {Borrowed.stuff_High} 0.04700855 0.7857143 0.05982906
## [9] {Age_25-30} => {Keeping.promises_High} 0.04558405 0.7619048 0.05982906
## [10] {Age_25-30} => {Reliability_High} 0.04415954 0.7380952 0.05982906
## [11] {Age_25-30} => {city} 0.04415954 0.7380952 0.05982906
## [12] {Age_25-30} => {Fake_Low} 0.04415954 0.7380952 0.05982906
## [13] {Age_25-30} => {PC_High} 0.04273504 0.7142857 0.05982906
## [14] {Age_25-30} => {Thriller_High} 0.04273504 0.7142857 0.05982906
## [15] {Age_25-30} => {Happiness.in.life_High} 0.04273504 0.7142857 0.05982906
## lift count
## [1] 1.0106312 39
## [2] 0.9508127 38
## [3] 1.1184280 36
## [4] 0.9551020 36
## [5] 1.0833333 35
## [6] 0.9300477 35
## [7] 0.9006113 34
## [8] 1.0466251 33
## [9] 1.0110721 32
## [10] 1.0977603 31
## [11] 1.0280612 31
## [12] 1.0260255 31
## [13] 1.6826462 30
## [14] 1.4085072 30
## [15] 1.1044682 30
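The support thresholds differ between the five demographic calls because the groups differ in size. Since the support of a rule equals its confidence times the coverage of its left-hand side, a minimum support also implies a minimum confidence of min_supp / coverage. The sketch below works this out in base R from the coverage values printed above and the parameters passed to apriori(); the column names and the conf_floor helper column are illustrative and not part of arules.

```r
# Coverage values are copied from the inspect() output; thresholds from the apriori() calls.
groups <- data.frame(
  lhs      = c("male", "female", "Age_15-20", "Age_20-25", "Age_25-30"),
  coverage = c(0.3988604, 0.5968661, 0.6267806, 0.3119658, 0.05982906),
  min_supp = c(0.3, 0.3, 0.3, 0.2, 0.03),
  min_conf = c(0.4, 0.4, 0.4, 0.4, 0.2)
)

# support = confidence * coverage, so support >= min_supp implies
# confidence >= min_supp / coverage; the effective floor is the larger of the two bounds.
groups$conf_floor <- pmax(groups$min_conf, groups$min_supp / groups$coverage)
print(groups)
```

Under these numbers the support threshold, not the conf argument, is the binding constraint in every call, and the floor is highest for males (about 0.75). This is worth keeping in mind when comparing the rule lists across groups.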
rules$Smoking <- apriori(data=trans1, parameter=list(supp=0.1, conf=0.22, minlen=2), appearance=list(default="lhs", rhs="Smoking_current smoker"), control=list(verbose=FALSE))
rules$Smoking.byconf <- head(sort(rules$Smoking, by="confidence", decreasing=TRUE), 15)
inspect(rules$Smoking.byconf)
## lhs rhs support confidence coverage lift count
## [1] {Cheating.in.school_High,
## Fun.with.friends_High,
## Judgment.calls_High} => {Smoking_current smoker} 0.1039886 0.2607143 0.3988604 1.475979 73
## [2] {Cheating.in.school_High,
## Chemistry_Low,
## Fun.with.friends_High} => {Smoking_current smoker} 0.1054131 0.2596491 0.4059829 1.469949 74
## [3] {Folk_Low,
## Fun.with.friends_High,
## Religion_Low} => {Smoking_current smoker} 0.1039886 0.2588652 0.4017094 1.465511 73
## [4] {Fun.with.friends_High,
## Judgment.calls_High,
## Religion_Low} => {Smoking_current smoker} 0.1039886 0.2588652 0.4017094 1.465511 73
## [5] {Cheating.in.school_High,
## Chemistry_Low,
## Music_High} => {Smoking_current smoker} 0.1068376 0.2568493 0.4159544 1.454099 75
## [6] {Entertainment.spending_High} => {Smoking_current smoker} 0.1039886 0.2561404 0.4059829 1.450085 73
## [7] {Chemistry_Low,
## Folk_Low,
## Fun.with.friends_High} => {Smoking_current smoker} 0.1054131 0.2560554 0.4116809 1.449604 74
## [8] {Judgment.calls_High,
## Music_High,
## Religion_Low} => {Smoking_current smoker} 0.1082621 0.2558923 0.4230769 1.448680 76
## [9] {Folk_Low,
## Fun.with.friends_High,
## Judgment.calls_High} => {Smoking_current smoker} 0.1025641 0.2553191 0.4017094 1.445436 72
## [10] {Chemistry_Low,
## Fun.with.friends_High,
## Religion_Low} => {Smoking_current smoker} 0.1054131 0.2551724 0.4131054 1.444605 74
## [11] {Folk_Low,
## Music_High,
## Religion_Low} => {Smoking_current smoker} 0.1082621 0.2550336 0.4245014 1.443819 76
## [12] {Chemistry_Low,
## Music_High,
## Religion_Low} => {Smoking_current smoker} 0.1096866 0.2549669 0.4301994 1.443442 77
## [13] {Cheating.in.school_High,
## Judgment.calls_High,
## Music_High} => {Smoking_current smoker} 0.1039886 0.2534722 0.4102564 1.434980 73
## [14] {Folk_Low,
## Judgment.calls_High,
## Music_High} => {Smoking_current smoker} 0.1068376 0.2533784 0.4216524 1.434449 75
## [15] {Cheating.in.school_High,
## Chemistry_Low} => {Smoking_current smoker} 0.1096866 0.2516340 0.4358974 1.424573 77
rules$Drinking <- apriori(data=trans1, parameter=list(supp=0.1, conf=0.22, minlen=2), appearance=list(default="lhs", rhs="Alcohol_drink a lot"), control=list(verbose=FALSE))
rules$Drinking.byconf <- head(sort(rules$Drinking, by="confidence", decreasing=TRUE), 15)
inspect(rules$Drinking.byconf)
## lhs rhs support confidence coverage lift count
## [1] {Charity_Low,
## Entertainment.spending_High,
## Storm_Low} => {Alcohol_drink a lot} 0.1011396 0.5071429 0.1994302 2.239084 71
## [2] {Dancing_Low,
## Entertainment.spending_High,
## Fun.with.friends_High} => {Alcohol_drink a lot} 0.1111111 0.5064935 0.2193732 2.236217 78
## [3] {Entertainment.spending_High,
## Getting.up_High,
## Internet.usage_few hours a day} => {Alcohol_drink a lot} 0.1025641 0.5000000 0.2051282 2.207547 72
## [4] {Charity_Low,
## Entertainment.spending_High,
## Internet_High} => {Alcohol_drink a lot} 0.1039886 0.4965986 0.2094017 2.192530 73
## [5] {Charity_Low,
## Entertainment.spending_High,
## Internet.usage_few hours a day} => {Alcohol_drink a lot} 0.1025641 0.4965517 0.2065527 2.192323 72
## [6] {Dancing_Low,
## Entertainment.spending_High,
## Music_High} => {Alcohol_drink a lot} 0.1096866 0.4935897 0.2222222 2.179245 77
## [7] {Entertainment.spending_High,
## Getting.up_High,
## Storm_Low} => {Alcohol_drink a lot} 0.1082621 0.4935065 0.2193732 2.178878 76
## [8] {Dancing_Low,
## Entertainment.spending_High,
## right handed} => {Alcohol_drink a lot} 0.1039886 0.4932432 0.2108262 2.177715 73
## [9] {Dancing_Low,
## Entertainment.spending_High} => {Alcohol_drink a lot} 0.1139601 0.4907975 0.2321937 2.166917 80
## [10] {Dancing_Low,
## Entertainment.spending_High,
## Movies_High} => {Alcohol_drink a lot} 0.1011396 0.4829932 0.2094017 2.132461 71
## [11] {Charity_Low,
## Entertainment.spending_High,
## Gardening_Low} => {Alcohol_drink a lot} 0.1054131 0.4774194 0.2207977 2.107851 74
## [12] {city,
## Entertainment.spending_High,
## Getting.up_High} => {Alcohol_drink a lot} 0.1025641 0.4768212 0.2150997 2.105211 72
## [13] {Entertainment.spending_High,
## Getting.up_High,
## Only.child_no} => {Alcohol_drink a lot} 0.1011396 0.4765101 0.2122507 2.103837 71
## [14] {Entertainment.spending_High,
## Getting.up_High,
## Internet_High} => {Alcohol_drink a lot} 0.1096866 0.4753086 0.2307692 2.098532 77
## [15] {Chemistry_Low,
## Entertainment.spending_High,
## Getting.up_High} => {Alcohol_drink a lot} 0.1039886 0.4740260 0.2193732 2.092869 73
rules$Healthy <- apriori(data=trans1, parameter=list(supp=0.1, conf=0.22, minlen=2), appearance=list(default="lhs", rhs="Healthy.eating_High"), control=list(verbose=FALSE))
rules$Healthy.byconf <- head(sort(rules$Healthy, by="confidence", decreasing=TRUE), 15)
inspect(rules$Healthy.byconf)
## lhs rhs support confidence coverage lift count
## [1] {Interests.or.hobbies_High,
## Keeping.promises_High,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1096866 0.4610778 0.2378917 1.778443 77
## [2] {Borrowed.stuff_High,
## Interests.or.hobbies_High,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1039886 0.4534161 0.2293447 1.748891 73
## [3] {city,
## Energy.levels_High,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1054131 0.4484848 0.2350427 1.729870 74
## [4] {Eating.to.survive_Low,
## Energy.levels_High,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1025641 0.4444444 0.2307692 1.714286 72
## [5] {Punctuality_i am always on time,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1011396 0.4437500 0.2279202 1.711607 71
## [6] {Energy.levels_High,
## Reliability_High,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1111111 0.4431818 0.2507123 1.709416 78
## [7] {city,
## Documentary_High,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1054131 0.4404762 0.2393162 1.698980 74
## [8] {city,
## Lying_sometimes,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1011396 0.4382716 0.2307692 1.690476 71
## [9] {Internet_High,
## Lying_sometimes,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1011396 0.4382716 0.2307692 1.690476 71
## [10] {Borrowed.stuff_High,
## Energy.levels_High,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1111111 0.4382022 0.2535613 1.690209 78
## [11] {Energy.levels_High,
## Keeping.promises_High,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1111111 0.4357542 0.2549858 1.680766 78
## [12] {Interests.or.hobbies_High,
## Only.child_no,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1011396 0.4355828 0.2321937 1.680105 71
## [13] {Energy.levels_High,
## Flying_Low,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1011396 0.4355828 0.2321937 1.680105 71
## [14] {female,
## Only.child_no,
## Spending.on.healthy.eating_High} => {Healthy.eating_High} 0.1039886 0.4345238 0.2393162 1.676020 73
## [15] {Borrowed.stuff_High,
## Spending.on.healthy.eating_High,
## Thinking.ahead_High} => {Healthy.eating_High} 0.1025641 0.4337349 0.2364672 1.672978 72
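For the three habit rules, confidence is easier to read next to the base rate of the habit itself. Because lift is defined as confidence divided by the support of the right-hand side, that base rate can be recovered from the printed columns. The sketch below does this in base R for the top rule of each habit; the data frame and its column names are illustrative, and the values are copied from rule [1] of each output above.

```r
# Confidence and lift of rule [1] for each habit, copied from the inspect() output.
targets <- data.frame(
  rhs        = c("Smoking_current smoker", "Alcohol_drink a lot", "Healthy.eating_High"),
  confidence = c(0.2607143, 0.5071429, 0.4610778),
  lift       = c(1.475979, 2.239084, 1.778443)
)

# lift = confidence / support(rhs), so support(rhs) = confidence / lift.
targets$base_rate <- targets$confidence / targets$lift
print(targets)
```

Read this way, the best smoking rule raises the share of current smokers from roughly 18% to 26%, while the best drinking rule raises the share of heavy drinkers from roughly 23% to 51%.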
## Male
plot(rules$male.byconf, method="graph")
plot(rules$male.byconf, method="paracoord", control=list(reorder=TRUE), main='Top 15 rules for males')
## Female
plot(rules$female.byconf, method="graph")
plot(rules$female.byconf, method="paracoord", control=list(reorder=TRUE), main='Top 15 rules for females')
## 15 - 20
plot(rules$Age15_20.byconf, method="graph")
plot(rules$Age15_20.byconf, method="paracoord", control=list(reorder=TRUE), main='Top 15 rules for ages 15 - 20')
## 21 - 25
plot(rules$Age20_25.byconf, method="graph")
plot(rules$Age20_25.byconf, method="paracoord", control=list(reorder=TRUE), main='Top 15 rules for ages 21 - 25')
## 26 - 30
plot(rules$Age25_30.byconf, method="graph")
plot(rules$Age25_30.byconf, method="paracoord", control=list(reorder=TRUE), main='Top 15 rules for ages 26 - 30')
## Drinking
plot(rules$Drinking.byconf, method="graph")
plot(rules$Drinking.byconf, method="paracoord", control=list(reorder=TRUE), main='Top 15 rules for High Drinking')
## Smoking
plot(rules$Smoking.byconf, method="graph")
plot(rules$Smoking.byconf, method="paracoord", control=list(reorder=TRUE), main='Top 15 rules for current smokers')
## Healthy Eating
plot(rules$Healthy.byconf, method="graph")
plot(rules$Healthy.byconf, method="paracoord", control=list(reorder=TRUE), main='Top 15 rules for Healthy Eating')
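The results of the Apriori association rule mining show that interests and habits do differ between genders and age groups, although less than the confidence values alone suggest. Music_High, Movies_High, Comedy_High, Fun.with.friends_High and right handed appear near the top of every list with lift between roughly 0.9 and 1.02, so they describe the sample as a whole rather than any one group. The differences show up in the rules with higher lift, for example Action_High (lift 1.41) for males, Height_Medium (1.23) for females and PC_High (1.68) for ages 25-30.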
The more intriguing part of the analysis concerns the habits themselves: smoking, drinking alcohol and eating healthily. The rules show which combinations of traits co-occur with each habit, and the dependencies are complex. They are associations rather than causes, and each rule rests on only about 70 to 80 respondents. With a bigger sample it would be possible to conduct a better analysis and perhaps use it to guide younger people away from bad habits before those habits form.
For males, the highest-confidence rules point to high interest in music, movies and comedy, to being right-handed, and to low interest in gardening and writing. Most of these rules have lift close to 1 (0.98 to 1.11), which means the same traits are about as common in the whole sample as among males. The rules that set males apart more clearly are Action_High (lift 1.41), Dancing_Low (1.33) and Darkness_Low (1.31).
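To see which of the male rules are specific to males rather than common to the whole sample, the rules can be ranked by lift instead of confidence. The sketch below does this in base R on six of the fifteen rules, with confidence and lift copied from the output above; the male_rules data frame is illustrative and stands in for the rules object.

```r
# Six of the top-15 male rules, with values copied from the inspect() output.
male_rules <- data.frame(
  rhs        = c("Music_High", "Movies_High", "Comedy_High",
                 "Darkness_Low", "Action_High", "Dancing_Low"),
  confidence = c(0.9357143, 0.9071429, 0.9035714, 0.8357143, 0.7892857, 0.7785714),
  lift       = c(0.9833405, 0.9873090, 1.0068367, 1.3066179, 1.4134657, 1.3330662)
)

# Order by lift, highest first: rules with lift well above 1 are over-represented among males.
by_lift <- male_rules[order(male_rules$lift, decreasing = TRUE), ]
print(by_lift)
```

The ordering is reversed compared with the confidence ranking: the three rules with the highest confidence end up at the bottom, because music, movies and comedy are popular with almost everyone in the sample.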
Females show the same high-confidence rules for music, movies and social activities, again with lift close to 1. The more distinctive rules are Height_Medium (lift 1.23), Western_Low (1.20) and Physics_Low (1.16), followed by Compassion.to.animals_High (1.11) and Empathy_High (1.10). The last two hint at somewhat higher empathy and compassion toward animals among female respondents, although the effect is modest.
Rules related to smoking highlight associations with high entertainment spending, low interest in religion, and a high level of cheating in school. Confidence is modest (about 0.25 to 0.26, with lift between 1.4 and 1.5), so these patterns at most suggest that smoking may be linked to risk-taking and non-conformist tendencies.
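High alcohol consumption is associated above all with high entertainment spending, which appears in all 15 rules, combined with low interest in dancing or charity and with Getting.up_High. With confidence around 0.5 and lift above 2, these are the strongest rules in the analysis, and they tie drinking to social and leisure activities. Being right-handed appears in only one rule and most likely reflects how common right-handedness is in the sample (about 90%) rather than any link to drinking.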
Healthy eating habits are associated with high spending on healthy food, which appears in all 15 rules, together with high energy levels, reliability and keeping promises. Confidence is about 0.43 to 0.46 and lift about 1.7 to 1.8. These rules suggest that individuals who prioritize health also tend to describe themselves as disciplined and responsible.