Association rule mining [1] is a technique in data science
for uncovering hidden patterns and relationships within large data sets.
This paper applies the Apriori algorithm, implemented in R with the
arules and arulesViz packages, to a data set of demographic and
health-related attributes such as age, gender, weight change, and
physical activity level. The data set [2] was prepared by grouping each
variable into categorical factors. For example, body weight was ranked
by size from XXS to XXL, and weight change was graded from weight gain
to extreme weight loss.
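A grouping of this kind could be done in base R with `cut()`. The following is a minimal sketch, not the preprocessing actually used for this paper: the input values, the cut points, and the intermediate labels "moderate loss" and "slight loss" are all assumptions made for illustration.

```r
# Hypothetical weight-change values in pounds (not taken from the study data)
weight_change_lbs <- c(3.5, 0.2, -4.0, -12.5, -25.0)

# Assumed cut points; the paper does not state the thresholds it used
breaks <- c(-Inf, -20, -10, -1, 1, Inf)
labels <- c("extreme loss", "moderate loss", "slight loss", "no change", "gained")

# cut() turns each numeric value into one level of an ordered set of bins
weight_change_cat <- cut(weight_change_lbs, breaks = breaks, labels = labels)

print(table(weight_change_cat))
```

Storing every column as a factor in this way also means the later coercion to transactions does not need to fall back on default discretization.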
By setting minimum support and confidence thresholds, we generate candidate association rules, which are then filtered by lift to retain the strongest relationships. The results are visualized to make the rules easier to interpret and to help identify factors that influence health outcomes. This approach is well established in the literature on pattern discovery.
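To make the three measures concrete, the sketch below computes support, confidence, and lift by hand for a single rule, {sleep=Poor} => {change=extreme loss}, in base R. The ten records and the column names `sleep` and `change` are invented for illustration and are not taken from the study data.

```r
# Toy data: 10 hypothetical records, not taken from rule.csv
toy <- data.frame(
  sleep  = c("Poor", "Poor", "Poor", "Good", "Good",
             "Good", "Good", "Poor", "Good", "Good"),
  change = c("extreme loss", "extreme loss", "no change", "no change", "gained",
             "no change", "gained", "extreme loss", "gained", "no change")
)

# Logical vectors marking which records contain each side of the rule
lhs <- toy$sleep == "Poor"            # antecedent
rhs <- toy$change == "extreme loss"   # consequent

support    <- mean(lhs & rhs)         # share of records containing both sides
confidence <- support / mean(lhs)     # share of antecedent records that also match
lift       <- confidence / mean(rhs)  # confidence relative to the consequent's base rate

metrics <- c(support = support, confidence = confidence, lift = lift)
print(round(metrics, 3))
```

On this toy data the rule has a confidence of 0.75, so a minimum confidence of 0.8 would discard it even though its lift is well above 1.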
# Load necessary libraries
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
library(readr) # For read_csv
# Load the dataset
data_trans <- read_csv("C:/Users/Ceee/Desktop/data science/2400-DS1UL/papers/rule.csv")
## Rows: 100 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): Age, Gender, weight ranking, Weight Change (lbs), Duration (weeks),...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert to transactions if necessary
data_trans <- as(data_trans, "transactions") # Ensure proper format for Apriori
## Warning: Column(s) 1, 2, 3, 4, 5, 6, 7 not logical or factor. Applying default
## discretization (see '? discretizeDF').
# Generate association rules
rules <- apriori(
  data_trans,
  parameter = list(supp = 0.04, conf = 0.8) # Adjust thresholds as needed
)
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.04 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 4
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[30 item(s), 100 transaction(s)] done [0.00s].
## sorting and recoding items ... [30 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [125 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Inspect rules
#inspect(rules)
# Filter rules by lift
strong_rules <- subset(rules, lift > 5)
inspect(strong_rules)
##     lhs                                         rhs                                support confidence coverage     lift count
## [1] {Duration (weeks)=1-3 wks,
##      Sleep Quality=Excellent}                => {Weight Change (lbs)=no change}       0.04  1.0000000     0.04 5.882353     4
## [2] {Physical Activity Level=Sedentary,
##      Sleep Quality=Excellent}                => {Weight Change (lbs)=no change}       0.04  1.0000000     0.04 5.882353     4
## [3] {Age=31-40,
##      Gender=M,
##      Sleep Quality=Poor}                     => {Weight Change (lbs)=extreme loss}    0.04  0.8000000     0.05 5.333333     4
## [4] {weight ranking=M,
##      Physical Activity Level=Lightly Active,
##      Sleep Quality=Poor}                     => {Weight Change (lbs)=extreme loss}    0.05  0.8333333     0.06 5.555556     5
## [5] {Gender=M,
##      weight ranking=M,
##      Physical Activity Level=Lightly Active} => {Weight Change (lbs)=extreme loss}    0.04  1.0000000     0.04 6.666667     4
## [6] {Gender=M,
##      weight ranking=M,
##      Physical Activity Level=Lightly Active,
##      Sleep Quality=Poor}                     => {Weight Change (lbs)=extreme loss}    0.04  1.0000000     0.04 6.666667     4
# Save the results
write(rules, file = "association_rules_output.csv", sep = ",")
# Visualize the rules
plot(rules, method = "graph", engine = "htmlwidget") # Interactive network graph of the rules
## Warning: Too many rules supplied. Only plotting the best 100 using 'lift'
## (change control parameter max if needed).
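The code uses the Apriori algorithm [3] to find associations among the attributes in the data set: age, gender, weight ranking, weight change, duration, physical activity level, and sleep quality. With a minimum support of 0.04 and a minimum confidence of 0.8, it generates 125 rules from the 100 records, of which six have a lift above 5.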
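The columns of the inspect() output above follow the usual definitions of the three measures. They are written out here for a rule X => Y over N transactions, then applied to rule [5]. The symbols X, Y, t, and N are notation introduced for this illustration. The base rate of about 0.15 for "extreme loss" is implied by the reported confidence and lift and is not printed directly in the output.

```latex
\begin{align*}
\operatorname{supp}(X \Rightarrow Y) &= \frac{\lvert \{\, t : X \cup Y \subseteq t \,\} \rvert}{N}, \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}, \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}.
\end{align*}

% Rule [5]: count = 4, N = 100, coverage = supp(X) = 0.04, lift = 6.666667
\begin{align*}
\operatorname{supp}(X \Rightarrow Y) &= \frac{4}{100} = 0.04, \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{0.04}{0.04} = 1.0, \\
\operatorname{supp}(Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{lift}(X \Rightarrow Y)}
  = \frac{1.0}{6.666667} \approx 0.15 .
\end{align*}
```

The same calculation for rule [1] gives a base rate of 1.0 / 5.882353, or about 0.17, for "no change". A lift above 5 therefore means the consequent occurs more than five times as often among records matching the antecedent as it does in the data set overall.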
Some of the rules from the data are listed below:
Rule 1
Antecedent: Age=51-60, Gender=M, Weight Change=no change
Consequent: Sleep Quality=Excellent
Men aged 51-60 with no weight change are highly likely to have excellent sleep quality.
Rule 2
Antecedent: Age=18-30, Gender=F, Physical Activity Level=Sedentary
Consequent: Sleep Quality=Poor
Sedentary women aged 18-30 are highly likely to have poor sleep quality.
Rule 3
Antecedent: Age=31-40, Gender=M, Weight Change=extreme loss
Consequent: Sleep Quality=Poor
Men aged 31-40 with extreme weight loss are highly likely to have poor sleep quality.
Rule 4
Antecedent: Age=41-50, Gender=F, Physical Activity Level=Very Active
Consequent: Weight Change=gained
Very active women aged 41-50 are highly likely to have gained weight.
Rule 5
Antecedent: Age=18-30, Gender=M, Duration=10-12 wks
Consequent: Weight Change=extreme loss
Men aged 18-30 with a duration of 10-12 weeks are highly likely to experience extreme weight loss.
Someone who is very active is generally expected to lose weight. Rule 4, however, suggests that very active women aged 41-50 are likely to gain weight. One possible explanation is that poorly performed physical activity can contribute to metabolic disorders and other chronic diseases [4]. This indicates that weight gain or loss is a complex process influenced by many factors, and that individual responses to activity levels vary considerably. Because people respond differently to lifestyle variables such as diet, exercise, and metabolism, individuals need to find a routine that works for them.
References
1. Wolfgang Härdle, Bernd Rönz (2002). Proceedings in Computational Statistics. Springer-Verlag Berlin Heidelberg GmbH.
2. https://fragilestatesindex.org/excel
3. https://rajeshreddycse.wordpress.com/wp-content/uploads/2023/05/apriori-algorithm-example-problems.pdf
4. Rivan Virlando Suryadinata, Bambang Wirjatmadi, Merryana Adriani, Amelia Lorensia. Effect of age and weight on physical activity.