This part of the project applies association rule mining to the RHMCD-20 dataset, which consists of comprehensive survey data about depression and mental health. The dataset captures various factors, such as stress levels, changes in habits, social interactions, and work-related dynamics, providing a rich source of information for analysis.
The primary goal of this analysis is to uncover meaningful associations and patterns among the survey responses. By identifying relationships between different variables, such as the connection between stress and changes in work interest or the link between social weakness and coping struggles, the project aims to provide deeper insights into the factors influencing mental health.
Association rule mining is particularly suited for this task, as it enables the discovery of hidden connections that might not be immediately apparent through traditional analysis. These insights have the potential to inform mental health interventions, guide future research, and contribute to a better understanding of how different aspects of life interact to affect mental well-being. Ultimately, this approach underscores the importance of data-driven methods in addressing complex mental health challenges.
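For reference, the three quality measures reported in the rule output of this analysis (support, confidence, and lift) can be written as follows. This uses the standard textbook definitions; the symbols $T$ (the set of all transactions, here survey respondents), $X$ (the left-hand-side itemset), and $Y$ (the right-hand-side itemset) are notation introduced here and do not appear in the original report.

```latex
\begin{align}
\operatorname{supp}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)}
\end{align}
```

A lift above 1 means that $X$ and $Y$ occur together more often than would be expected if they were independent.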
# Set a CRAN mirror
options(repos = c(CRAN = "https://cloud.r-project.org"))
# Install any required packages that are not already installed
required_packages <- c("tidyverse", "arules", "arulesViz", "dplyr")
new_packages <- required_packages[!(required_packages %in% installed.packages()[, "Package"])]
if (length(new_packages)) install.packages(new_packages, dependencies = TRUE)
# Load necessary libraries
lapply(required_packages, library, character.only = TRUE)
## Warning: package 'tidyverse' was built under R version 4.4.2
## Warning: package 'ggplot2' was built under R version 4.4.2
## Warning: package 'readr' was built under R version 4.4.2
## Warning: package 'dplyr' was built under R version 4.4.2
## Warning: package 'forcats' was built under R version 4.4.2
## Warning: package 'lubridate' was built under R version 4.4.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Warning: package 'arules' was built under R version 4.4.2
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
##
## Attaching package: 'arules'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following objects are masked from 'package:base':
##
## abbreviate, write
## Warning: package 'arulesViz' was built under R version 4.4.2
## [[1]]
## [1] "lubridate" "forcats" "stringr" "dplyr" "purrr" "readr"
## [7] "tidyr" "tibble" "ggplot2" "tidyverse" "stats" "graphics"
## [13] "grDevices" "utils" "datasets" "methods" "base"
##
## [[2]]
## [1] "arules" "Matrix" "lubridate" "forcats" "stringr" "dplyr"
## [7] "purrr" "readr" "tidyr" "tibble" "ggplot2" "tidyverse"
## [13] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
## [19] "base"
##
## [[3]]
## [1] "arulesViz" "arules" "Matrix" "lubridate" "forcats" "stringr"
## [7] "dplyr" "purrr" "readr" "tidyr" "tibble" "ggplot2"
## [13] "tidyverse" "stats" "graphics" "grDevices" "utils" "datasets"
## [19] "methods" "base"
##
## [[4]]
## [1] "arulesViz" "arules" "Matrix" "lubridate" "forcats" "stringr"
## [7] "dplyr" "purrr" "readr" "tidyr" "tibble" "ggplot2"
## [13] "tidyverse" "stats" "graphics" "grDevices" "utils" "datasets"
## [19] "methods" "base"
# Load the dataset
data <- read.csv("C:/Users/MAGWALI/Downloads/mental_health_finaldata_1 (1).csv")
# Display the first few rows
head(data)
## Age Gender Occupation Days_Indoors Growing_Stress
## 1 20-25 Female Corporate 1-14 days Yes
## 2 30-Above Male Others 31-60 days Yes
## 3 30-Above Female Student Go out Every day No
## 4 25-30 Male Others 1-14 days Yes
## 5 16-20 Female Student More than 2 months Yes
## 6 25-30 Male Housewife More than 2 months No
## Quarantine_Frustrations Changes_Habits Mental_Health_History Weight_Change
## 1 Yes No Yes Yes
## 2 Yes Maybe No No
## 3 No Yes No No
## 4 No Maybe No Maybe
## 5 Yes Yes No Yes
## 6 Yes Yes Yes Yes
## Mood_Swings Coping_Struggles Work_Interest Social_Weakness
## 1 Medium No No Yes
## 2 High No No Yes
## 3 Medium Yes Maybe No
## 4 Medium No Maybe Yes
## 5 Medium Yes Maybe No
## 6 Medium No Maybe Maybe
# Check the structure and dimensions of the dataset
str(data)
## 'data.frame': 824 obs. of 13 variables:
## $ Age : chr "20-25" "30-Above" "30-Above" "25-30" ...
## $ Gender : chr "Female" "Male" "Female" "Male" ...
## $ Occupation : chr "Corporate" "Others" "Student" "Others" ...
## $ Days_Indoors : chr "1-14 days" "31-60 days" "Go out Every day" "1-14 days" ...
## $ Growing_Stress : chr "Yes" "Yes" "No" "Yes" ...
## $ Quarantine_Frustrations: chr "Yes" "Yes" "No" "No" ...
## $ Changes_Habits : chr "No" "Maybe" "Yes" "Maybe" ...
## $ Mental_Health_History : chr "Yes" "No" "No" "No" ...
## $ Weight_Change : chr "Yes" "No" "No" "Maybe" ...
## $ Mood_Swings : chr "Medium" "High" "Medium" "Medium" ...
## $ Coping_Struggles : chr "No" "No" "Yes" "No" ...
## $ Work_Interest : chr "No" "No" "Maybe" "Maybe" ...
## $ Social_Weakness : chr "Yes" "Yes" "No" "Yes" ...
dim(data)
## [1] 824 13
# Check for missing values
cat("Missing values per column:\n")
## Missing values per column:
print(colSums(is.na(data)))
## Age Gender Occupation
## 0 0 0
## Days_Indoors Growing_Stress Quarantine_Frustrations
## 0 0 0
## Changes_Habits Mental_Health_History Weight_Change
## 0 0 0
## Mood_Swings Coping_Struggles Work_Interest
## 0 0 0
## Social_Weakness
## 0
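# Impute missing values (if any) with the mode for categorical variables
impute_mode <- function(x) {
  # Use observed values only, so NA can never be picked as the fill value
  ux <- unique(x[!is.na(x)])
  if (length(ux) == 0) return(NA)  # column is entirely missing: nothing to impute with
  ux[which.max(tabulate(match(x, ux)))]
}
for (col in names(data)) {
  if (!is.numeric(data[[col]])) {
    data[[col]][is.na(data[[col]])] <- impute_mode(data[[col]])
  }
}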
# Confirm no missing values remain
cat("Missing values after imputation:\n")
## Missing values after imputation:
print(colSums(is.na(data)))
## Age Gender Occupation
## 0 0 0
## Days_Indoors Growing_Stress Quarantine_Frustrations
## 0 0 0
## Changes_Habits Mental_Health_History Weight_Change
## 0 0 0
## Mood_Swings Coping_Struggles Work_Interest
## 0 0 0
## Social_Weakness
## 0
# Convert all variables to factors for association rule mining
data <- data %>% mutate(across(everything(), as.factor))
# Check structure
str(data)
## 'data.frame': 824 obs. of 13 variables:
## $ Age : Factor w/ 4 levels "16-20","20-25",..: 2 4 4 3 1 3 1 3 4 2 ...
## $ Gender : Factor w/ 2 levels "Female","Male": 1 2 1 2 1 2 1 1 2 2 ...
## $ Occupation : Factor w/ 5 levels "Business","Corporate",..: 2 4 5 4 5 3 1 5 4 2 ...
## $ Days_Indoors : Factor w/ 5 levels "1-14 days","15-30 days",..: 1 3 4 1 5 5 4 1 4 4 ...
## $ Growing_Stress : Factor w/ 3 levels "Maybe","No","Yes": 3 3 2 3 3 2 3 3 3 1 ...
## $ Quarantine_Frustrations: Factor w/ 3 levels "Maybe","No","Yes": 3 3 2 2 3 3 3 2 3 1 ...
## $ Changes_Habits : Factor w/ 3 levels "Maybe","No","Yes": 2 1 3 1 3 3 1 1 3 3 ...
## $ Mental_Health_History : Factor w/ 3 levels "Maybe","No","Yes": 3 2 2 2 2 3 2 1 2 3 ...
## $ Weight_Change : Factor w/ 3 levels "Maybe","No","Yes": 3 2 2 1 3 3 3 1 3 3 ...
## $ Mood_Swings : Factor w/ 3 levels "High","Low","Medium": 3 1 3 3 3 3 2 1 3 2 ...
## $ Coping_Struggles : Factor w/ 2 levels "No","Yes": 1 1 2 1 2 1 1 1 2 1 ...
## $ Work_Interest : Factor w/ 3 levels "Maybe","No","Yes": 2 2 1 1 1 1 1 2 1 1 ...
## $ Social_Weakness : Factor w/ 3 levels "Maybe","No","Yes": 3 3 2 3 2 1 1 3 1 2 ...
dim(data)
## [1] 824 13
# Convert the dataset to a transaction format
transactions <- as(data, "transactions")
# Summary of transactions
summary(transactions)
## transactions as itemMatrix in sparse format with
## 824 rows (elements/itemsets/transactions) and
## 42 columns (items) and a density of 0.3095238
##
## most frequent items:
## Gender=Female Coping_Struggles=No Coping_Struggles=Yes
## 434 414 410
## Gender=Male Changes_Habits=Yes (Other)
## 390 305 8759
##
## element (itemset/transaction) length distribution:
## sizes
## 13
## 824
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 13 13 13 13 13 13
##
## includes extended item information - examples:
## labels variables levels
## 1 Age=16-20 Age 16-20
## 2 Age=20-25 Age 20-25
## 3 Age=25-30 Age 25-30
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 2
## 3 3
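The headline numbers in this summary can be reproduced by hand. The sketch below is only arithmetic on values already printed above (the factor level counts from `str(data)`, the 824 rows, and the five most frequent item counts); the variable names are introduced here for illustration.

```r
# Number of levels per variable, copied from the str(data) output above
levels_per_variable <- c(
  Age = 4, Gender = 2, Occupation = 5, Days_Indoors = 5,
  Growing_Stress = 3, Quarantine_Frustrations = 3, Changes_Habits = 3,
  Mental_Health_History = 3, Weight_Change = 3, Mood_Swings = 3,
  Coping_Struggles = 2, Work_Interest = 3, Social_Weakness = 3
)

# Each variable=level pair becomes one item (one column of the item matrix)
n_items <- sum(levels_per_variable)           # 42

# Every respondent has exactly one level per variable, hence 13 items per row
items_per_row <- length(levels_per_variable)  # 13

# Density is the share of filled cells in the 824 x 42 item matrix
density <- items_per_row / n_items            # 0.3095238

# The "(Other)" count is all item occurrences minus the five most frequent items
n_transactions <- 824
top_five <- c(434, 414, 410, 390, 305)
other <- n_transactions * items_per_row - sum(top_five)  # 8759

print(c(n_items = n_items, density = round(density, 7), other = other))
```

Because every transaction has the same length (13), the length distribution in the summary is constant, which is expected for a survey with no missing answers.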
# Generate rules using the Apriori algorithm
rules <- apriori(transactions, parameter = list(supp = 0.01, conf = 0.8, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 8
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[42 item(s), 824 transaction(s)] done [0.00s].
## sorting and recoding items ... [42 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.03s].
## writing ... [1300 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
# Summary of the rules
summary(rules)
## set of 1300 rules
##
## rule length distribution (lhs + rhs):sizes
## 4 5 6
## 146 1057 97
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.000 5.000 5.000 4.962 5.000 6.000
##
## summary of quality measures:
## support confidence coverage lift
## Min. :0.01092 Min. :0.8000 Min. :0.01092 Min. :1.519
## 1st Qu.:0.01092 1st Qu.:0.8182 1st Qu.:0.01335 1st Qu.:1.617
## Median :0.01214 Median :0.8333 Median :0.01335 Median :1.709
## Mean :0.01244 Mean :0.8517 Mean :0.01467 Mean :1.837
## 3rd Qu.:0.01335 3rd Qu.:0.9000 3rd Qu.:0.01578 3rd Qu.:1.866
## Max. :0.03277 Max. :1.0000 Max. :0.04005 Max. :4.013
## count
## Min. : 9.00
## 1st Qu.: 9.00
## Median :10.00
## Mean :10.25
## 3rd Qu.:11.00
## Max. :27.00
##
## mining info:
## data ntransactions support confidence
## transactions 824 0.01 0.8
## call
## apriori(data = transactions, parameter = list(supp = 0.01, conf = 0.8, minlen = 2))
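To put the support values in perspective, relative support converts to respondent counts by multiplying by the 824 transactions. The sketch below is plain arithmetic on numbers taken from the summary above; it does not describe how `arules` applies the threshold internally.

```r
n_transactions <- 824
min_support <- 0.01

# The relative threshold expressed as a (fractional) number of respondents
threshold_count <- min_support * n_transactions        # 8.24

# Smallest whole count whose relative support reaches the threshold
smallest_qualifying_count <- ceiling(threshold_count)  # 9

# Counts from the summary (min, median, max) converted back to support
counts <- c(min = 9, median = 10, max = 27)
supports <- round(counts / n_transactions, 5)
print(supports)  # 0.01092 0.01214 0.03277, matching the support column
```

A count of 8 would give 8 / 824 ≈ 0.0097, which is below 0.01, consistent with the smallest count in the summary being 9. Most rules therefore rest on roughly 9 to 11 respondents.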
# Inspect the first 10 rules (in mining order, not yet ranked)
inspect(head(rules, 10))
## lhs rhs support confidence coverage lift count
## [1] {Age=25-30,
## Occupation=Business,
## Days_Indoors=Go out Every day} => {Coping_Struggles=Yes} 0.01213592 0.9090909 0.01334951 1.827051 10
## [2] {Age=25-30,
## Occupation=Business,
## Days_Indoors=Go out Every day} => {Gender=Female} 0.01092233 0.8181818 0.01334951 1.553414 9
## [3] {Occupation=Business,
## Days_Indoors=Go out Every day,
## Quarantine_Frustrations=No} => {Coping_Struggles=Yes} 0.01213592 0.9090909 0.01334951 1.827051 10
## [4] {Occupation=Business,
## Days_Indoors=Go out Every day,
## Mental_Health_History=Yes} => {Coping_Struggles=Yes} 0.01092233 0.9000000 0.01213592 1.808780 9
## [5] {Gender=Male,
## Occupation=Business,
## Days_Indoors=Go out Every day} => {Coping_Struggles=Yes} 0.01213592 0.8333333 0.01456311 1.674797 10
## [6] {Age=25-30,
## Occupation=Business,
## Growing_Stress=Yes} => {Gender=Female} 0.01092233 0.8181818 0.01334951 1.553414 9
## [7] {Age=30-Above,
## Occupation=Business,
## Growing_Stress=No} => {Coping_Struggles=Yes} 0.01092233 0.8181818 0.01334951 1.644346 9
## [8] {Age=30-Above,
## Occupation=Business,
## Mood_Swings=Medium} => {Gender=Male} 0.01213592 0.8333333 0.01456311 1.760684 10
## [9] {Occupation=Business,
## Mental_Health_History=Maybe,
## Mood_Swings=High} => {Gender=Female} 0.01456311 0.8571429 0.01699029 1.627386 12
## [10] {Occupation=Business,
## Growing_Stress=Maybe,
## Weight_Change=No} => {Coping_Struggles=Yes} 0.01213592 0.8333333 0.01456311 1.674797 10
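As a worked check, the quality measures of rule [1] can be recomputed from counts that appear in the output above. The left-hand-side count of 11 is not printed directly; it is inferred here from the coverage value (0.01334951 × 824 = 11), so treat it as a derived assumption.

```r
n_transactions <- 824
count_both <- 10   # respondents matching both sides of rule [1] (count column)
count_lhs  <- 11   # inferred from coverage: 0.01334951 * 824
count_rhs  <- 410  # Coping_Struggles=Yes, from the item frequency summary

support    <- count_both / n_transactions             # 0.01213592
coverage   <- count_lhs / n_transactions              # 0.01334951
confidence <- count_both / count_lhs                  # 0.9090909
lift       <- confidence / (count_rhs / n_transactions)  # 1.827051

print(round(c(support = support, coverage = coverage,
              confidence = confidence, lift = lift), 6))
```

The lift of about 1.83 says that respondents matching the left-hand side reported coping struggles about 1.83 times as often as the sample overall, though the rule is based on only 10 respondents.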
# Keep rules with lift > 1 (every rule mined above already qualifies; the minimum lift is 1.519)
filtered_rules <- subset(rules, lift > 1)
# Sort rules by confidence
sorted_rules <- sort(filtered_rules, by = "confidence", decreasing = TRUE)
# Inspect the top 10 sorted rules
inspect(head(sorted_rules, 10))
## lhs rhs support confidence coverage lift count
## [1] {Age=25-30,
## Occupation=Others,
## Days_Indoors=More than 2 months} => {Gender=Female} 0.01213592 1 0.01213592 1.898618 10
## [2] {Age=25-30,
## Days_Indoors=Go out Every day,
## Social_Weakness=Yes} => {Quarantine_Frustrations=No} 0.01092233 1 0.01092233 3.244094 9
## [3] {Age=16-20,
## Weight_Change=Yes,
## Work_Interest=Yes} => {Gender=Female} 0.01456311 1 0.01456311 1.898618 12
## [4] {Days_Indoors=15-30 days,
## Changes_Habits=No,
## Weight_Change=No,
## Work_Interest=No} => {Gender=Female} 0.01092233 1 0.01092233 1.898618 9
## [5] {Age=25-30,
## Gender=Female,
## Occupation=Others,
## Social_Weakness=Yes} => {Coping_Struggles=Yes} 0.01092233 1 0.01092233 2.009756 9
## [6] {Age=25-30,
## Occupation=Others,
## Growing_Stress=Yes,
## Coping_Struggles=Yes} => {Gender=Female} 0.01092233 1 0.01092233 1.898618 9
## [7] {Gender=Female,
## Occupation=Others,
## Weight_Change=Maybe,
## Social_Weakness=No} => {Coping_Struggles=Yes} 0.01334951 1 0.01334951 2.009756 11
## [8] {Gender=Male,
## Occupation=Corporate,
## Growing_Stress=Maybe,
## Changes_Habits=Yes} => {Coping_Struggles=No} 0.01213592 1 0.01213592 1.990338 10
## [9] {Days_Indoors=More than 2 months,
## Quarantine_Frustrations=Yes,
## Weight_Change=Maybe,
## Social_Weakness=Maybe} => {Growing_Stress=Yes} 0.01092233 1 0.01092233 2.737542 9
## [10] {Occupation=Student,
## Days_Indoors=Go out Every day,
## Quarantine_Frustrations=Maybe,
## Weight_Change=Maybe} => {Mood_Swings=High} 0.01092233 1 0.01092233 3.097744 9
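Many of the highest-confidence rules above have a demographic item such as `Gender=Female` on the right-hand side. One possible way to focus on a mental health outcome instead is the `appearance` argument of `apriori()`, which can restrict which items may appear on each side. The sketch below runs on a small made-up data frame (`toy`, not the survey data) with thresholds chosen only so that the toy example yields rules.

```r
library(arules)

# Hypothetical toy survey; values are invented purely for illustration
toy <- data.frame(
  Growing_Stress = c("Yes", "Yes", "Yes", "No", "No", "Yes"),
  Work_Interest  = c("No", "No", "No", "Yes", "Yes", "Maybe"),
  Gender         = c("Female", "Male", "Female", "Male", "Female", "Male"),
  stringsAsFactors = TRUE
)
toy_transactions <- as(toy, "transactions")

# Only allow Growing_Stress=Yes on the right-hand side;
# all other items may appear on the left-hand side only
targeted_rules <- apriori(
  toy_transactions,
  parameter  = list(supp = 0.3, conf = 0.8, minlen = 2),
  appearance = list(rhs = "Growing_Stress=Yes", default = "lhs"),
  control    = list(verbose = FALSE)
)

inspect(sort(targeted_rules, by = "lift"))
```

Applied to the real `transactions` object, the same `appearance` list (with the survey's own item labels, for example `"Growing_Stress=Yes"`) would return only rules that predict that outcome.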
# Plot the rules using arulesViz
plot(sorted_rules, method = "graph", engine = "htmlwidget")
## Warning: Too many rules supplied. Only plotting the best 100 using 'lift'
## (change control parameter max if needed).
# Scatterplot of rules
plot(sorted_rules, method = "scatterplot", measure = c("support", "confidence"), shading = "lift")
## To reduce overplotting, jitter is added! Use jitter = 0 to prevent jitter.
The association rules analysis conducted on the RHMCD-20 dataset reveals several interesting patterns and relationships between variables related to mental health. These insights provide valuable information for understanding the factors influencing mental well-being and can guide targeted interventions:
1. Work and Mental Health
Insight: Respondents who reported growing stress often also reported losing interest in work.
Implication: Work-related stress appears closely tied to mental health, highlighting the need for workplace interventions that prioritize employee well-being, such as flexible schedules, stress management programs, and mental health resources.
2. Social Weakness and Coping Struggles
Insight: Social weakness (difficulty interacting with others) was strongly linked to coping struggles (an inability to handle daily stress).
Implication: Social support networks are crucial for mental health. Programs that encourage community-building, peer support, or counseling can help individuals develop stronger coping mechanisms.
3. Impact of Quarantine and Stress
Insight: Responses indicating quarantine frustration were frequently associated with increasing stress levels and changes in habits (eating and sleeping patterns).
Implication: Prolonged isolation during quarantine negatively affects mental health. Future public health policies should incorporate mental health support systems during periods of isolation or restricted movement.
4. Mood Swings and Weight Changes
Insight: High mood swings were often linked to weight changes during the survey period.
Implication: Significant mood fluctuations could be a sign of deeper mental health challenges, such as depression or anxiety. This association underscores the importance of screening individuals experiencing mood swings for additional mental health indicators.
5. Generational Mental Health History
Insight: Those with a family history of mental health issues often reported challenges such as mood swings, coping struggles, and work-related dissatisfaction.
Implication: A generational history of mental health conditions could predispose individuals to similar challenges. Early intervention strategies targeting at-risk groups can be effective.
6. Stress and Habit Changes
Insight: Increasing stress levels were frequently associated with changes in eating and sleeping habits.
Implication: Stress management techniques, such as mindfulness training and therapy, can be implemented to help individuals maintain healthier lifestyles and mitigate the effects of stress.
7. Frustration and Social Weakness
Insight: Participants experiencing quarantine frustration often reported feeling socially weak.
Implication: Isolation has a detrimental effect on social interactions. Providing online social engagement platforms or encouraging digital group activities can help mitigate this.
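Each pairwise insight above can be checked directly against the data by computing support, confidence, and lift for a single-item rule. The sketch below uses base R only; `pair_rule_metrics` is a hypothetical helper introduced here, and `toy` is a made-up data frame rather than the survey data.

```r
# Hypothetical helper: metrics for the rule {lhs_col = lhs_val} => {rhs_col = rhs_val}
pair_rule_metrics <- function(df, lhs_col, lhs_val, rhs_col, rhs_val) {
  lhs <- df[[lhs_col]] == lhs_val
  rhs <- df[[rhs_col]] == rhs_val
  count      <- sum(lhs & rhs)     # respondents matching both sides
  support    <- count / nrow(df)   # share of all respondents
  confidence <- count / sum(lhs)   # share of lhs respondents who also match rhs
  lift       <- confidence / mean(rhs)  # confidence relative to the rhs base rate
  c(support = support, confidence = confidence, lift = lift, count = count)
}

# Invented toy responses, for illustration only
toy <- data.frame(
  Growing_Stress = c("Yes", "Yes", "Yes", "No", "No", "Yes", "No", "Yes"),
  Work_Interest  = c("No", "No", "Maybe", "Yes", "Yes", "No", "No", "Maybe")
)

metrics <- pair_rule_metrics(toy, "Growing_Stress", "Yes", "Work_Interest", "No")
print(round(metrics, 3))  # support 0.375, confidence 0.6, lift 1.2, count 3
```

On the survey data, a call such as `pair_rule_metrics(data, "Growing_Stress", "Yes", "Work_Interest", "No")` would show how strongly the first insight holds, with the count indicating how many respondents it rests on.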
Overall Themes from the Analysis
Workplace Stress
Strongly linked to mental health issues, emphasizing the role of job satisfaction and workplace dynamics in overall well-being.
Social Interactions
Social struggles are often paired with other mental health challenges, reinforcing the importance of support systems.
Lifestyle Changes
Eating and sleeping habits are key indicators of mental health, often tied to stress and mood fluctuations.
Generational Factors
A history of mental health issues within families plays a significant role in predicting mental health challenges.
These insights not only deepen our understanding of the relationships within the dataset but also provide actionable knowledge that can be applied to design targeted mental health interventions and policies.