why this question matters to you: Music is something most people engage with daily, yet its effect on mental health is rarely examined. Our senses constantly shape how we feel, and music as an auditory experience is no exception. With mental health becoming an increasingly important public concern, understanding whether and how music listening habits correlate with mental wellbeing is both personally meaningful and socially relevant.
Source: Music & Mental Health Survey Results, collected via an online survey and publicly available on Kaggle. (https://www.kaggle.com/datasets/catherinerasgaitis/mxmh-survey-results)
Sample size: 736 respondents
Structure: 33 variables capturing music listening behaviour, genre preferences, and self-reported mental health scores.
Key variables:
Age, Hours per day, BPM -
numeric
While working, Fav genre, Instrumentalist - categorical
Frequency [Genre] columns (16 genres) - ordinal (Never → Very
frequently)
Anxiety, Depression, Insomnia, OCD - numeric scores
(0–10) ← mental health outcomes
Music effects — your target variable
(Improve / No effect / Worsen)
Target variable distribution: Improve: 74% (542
respondents)
No effect: 23% (169 respondents)
Worsen: 3% (17
respondents)
Notable data quality issues addressed: BPM
contained extreme outliers (up to 999,999,999) which is filtered to
realistic range (40–250)
107 missing BPM values which is removed
during cleaning
8 missing target variable rows which is also
removed
Final clean dataset: ~620 rows
# Cleaning
mu<- music_data%>%
filter(!is.na(`Music effects`))%>%
mutate(BPM = ifelse(BPM < 40 | BPM > 250, NA, BPM)) %>%
drop_na(Age, BPM) %>%
mutate(music_effect = factor(`Music effects`,
levels = c("Worsen", "No effect", "Improve"))) %>%
mutate(improves = as.factor(ifelse(`Music effects` == "Improve", 1, 0))) %>%
mutate(across(starts_with("Frequency"),
~ factor(., levels = c("Never", "Rarely", "Sometimes",
"Very frequently"), ordered = TRUE))) %>%
mutate(across(c(`While working`, Instrumentalist, Composer,
Exploratory, `Foreign languages`), as.factor))%>%
print()
## # A tibble: 616 × 35
## Timestamp Age Primary streaming se…¹ `Hours per day` `While working`
## <chr> <dbl> <chr> <dbl> <fct>
## 1 8/27/2022 21:28… 18 Spotify 4 No
## 2 8/27/2022 21:40… 61 YouTube Music 2.5 Yes
## 3 8/27/2022 21:54… 18 Spotify 4 Yes
## 4 8/27/2022 21:56… 18 Spotify 5 Yes
## 5 8/27/2022 22:00… 18 YouTube Music 3 Yes
## 6 8/27/2022 22:18… 21 Spotify 1 Yes
## 7 8/27/2022 22:33… 19 Spotify 6 Yes
## 8 8/27/2022 22:44… 18 I do not use a stream… 1 Yes
## 9 8/27/2022 23:00… 19 YouTube Music 8 Yes
## 10 8/27/2022 23:12… 19 Spotify 2 Yes
## # ℹ 606 more rows
## # ℹ abbreviated name: ¹`Primary streaming service`
## # ℹ 30 more variables: Instrumentalist <fct>, Composer <fct>,
## # `Fav genre` <chr>, Exploratory <fct>, `Foreign languages` <fct>, BPM <dbl>,
## # `Frequency [Classical]` <ord>, `Frequency [Country]` <ord>,
## # `Frequency [EDM]` <ord>, `Frequency [Folk]` <ord>,
## # `Frequency [Gospel]` <ord>, `Frequency [Hip hop]` <ord>, …
mu %>%
count(music_effect) %>%
mutate(pct = n / sum(n),
label = paste0(round(pct * 100), "%")) %>%
ggplot(aes(x = "", y = pct, fill = music_effect)) +
geom_col(width = 1, color = "white") +
coord_polar(theta = "y") +
geom_text(aes(label = label),
position = position_stack(vjust = 0.5),
size = 5, color = "white", fontface = "bold") +
labs(title = "Target Variable Distribution", fill = "Effect") +
theme_void() +
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5, face = "bold", size = 14))
It’s very clear that the majority of the responder reported that music improves their mental health, while about 23% reported no effect and only about 3% reported it worsened. This class imbalance will be accounted for in modeling.
mu %>%
dplyr::select(music_effect, Anxiety, Depression, Insomnia, OCD) %>%
pivot_longer(cols = c(Anxiety, Depression, Insomnia, OCD),
names_to = "Condition", values_to = "Score") %>%
ggplot(aes(x = music_effect, y = Score, fill = music_effect)) +
geom_boxplot() +
facet_wrap(~ Condition) +
labs(title = "Mental Health Scores by Music Effect",
x = "Music Effect", y = "Score (0-10)") +
theme_minimal() +
theme(legend.position = "none")
Respondents who reported that music worsens their mental health consistently show the highest median scores for anxiety, depression, and insomnia. This suggests that individuals with more severe mental health symptoms may have a more negative experience with music. Interestingly, the “Improve” and “No effect” groups show similar mental health profiles, indicating that symptom severity alone does not fully explain who benefits from music. OCD scores remain relatively low and consistent across all three groups.
mu %>%
count(`Fav genre`, music_effect) %>%
ggplot(aes(x = reorder(`Fav genre`, n), y = n, fill = music_effect)) +
geom_col(position = "fill") +
coord_flip() +
labs(title = "Music Effect by Favourite Genre",
x = "Genre", y = "Proportion", fill = "Effect") +
theme_minimal()
Based on the graph, Gospel and Lofi listeners report near-universal improvement, suggesting these calmer, mood-focused genres may be particularly beneficial. In contrast, Latin and Video game music listeners show the highest proportions of “No effect” and “Worsen” responses. Rap stands out as the genre with the most “Worsen” responses proportionally. These patterns suggest that genre may be a meaningful predictor in our models.
library(tidyverse)
library(tidymodels)
## ── Attaching packages ────────────────────────────────────── tidymodels 1.5.0 ──
## ✔ broom 1.0.12 ✔ rsample 1.3.2
## ✔ dials 1.4.3 ✔ tailor 0.1.0
## ✔ infer 1.1.0 ✔ tune 2.1.0
## ✔ modeldata 1.5.1 ✔ workflows 1.3.0
## ✔ parsnip 1.5.0 ✔ workflowsets 1.1.1
## ✔ recipes 1.3.2 ✔ yardstick 1.4.0
## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ foreach::accumulate() masks purrr::accumulate()
## ✖ scales::discard() masks purrr::discard()
## ✖ dplyr::filter() masks stats::filter()
## ✖ recipes::fixed() masks stringr::fixed()
## ✖ dplyr::lag() masks stats::lag()
## ✖ yardstick::spec() masks readr::spec()
## ✖ recipes::step() masks stats::step()
## ✖ foreach::when() masks purrr::when()
library(discrim)
##
## Attaching package: 'discrim'
## The following object is masked from 'package:dials':
##
## smoothness
# Select predictors and drop remaining NAs
model_data <- mu %>%
dplyr::select(improves, Age, `Hours per day`, BPM,
Anxiety, Depression, Insomnia, OCD,
`While working`, `Fav genre`) %>%
drop_na()
# Train/test split (80/20)
set.seed(42)
split <- initial_split(model_data, prop = 0.8)
train_data <- training(split)
test_data <- testing(split)
cat("Training rows:", nrow(train_data), "\n")
## Training rows: 492
cat("Test rows:", nrow(test_data), "\n")
## Test rows: 123
log_model <- logistic_reg(mode = "classification") %>%
fit(data = train_data, improves ~ .)
# View coefficients sorted by p-value
broom::tidy(log_model) %>% arrange(p.value)
## # A tibble: 24 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Anxiety 0.169 0.0497 3.39 0.000699
## 2 `While working`Yes 0.724 0.276 2.63 0.00866
## 3 `Fav genre`Video game music -1.36 0.557 -2.44 0.0146
## 4 `Fav genre`Hip hop 1.47 0.855 1.72 0.0856
## 5 Depression -0.0732 0.0459 -1.60 0.111
## 6 `Fav genre`EDM 0.589 0.662 0.890 0.374
## 7 `Fav genre`Latin -1.24 1.55 -0.800 0.424
## 8 `Fav genre`Jazz 0.669 0.891 0.751 0.452
## 9 OCD 0.0291 0.0443 0.657 0.511
## 10 Age -0.00646 0.0101 -0.638 0.524
## # ℹ 14 more rows
Anxiety (p = 0.0007) which show positive effect ( positive estimate),
most significant!
While working: Yes (p = 0.009) => listening
while working increases odds of improvement ( positive estimate)
Fav
genre: Video game music (p = 0.015) we see negative effect, reduces odds
of improvement (negative estimate)
# Class predictions
log_preds <- predict(log_model, test_data, type = "class")
# Add to test data and compute metrics
test_results <- test_data %>%
bind_cols(log_preds)
# Confusion matrix
conf_mat(test_results, truth = improves, estimate = .pred_class)
## Truth
## Prediction 0 1
## 0 3 4
## 1 30 86
# Accuracy + Kappa
metrics(test_results, truth = improves, estimate = .pred_class)
## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.724
## 2 kap binary 0.0619
Confusion matrix:
Predicted almost everything as “1” (Improve) ( only 7 predicted as
“0”)
This is the class imbalance problem with 74% of data is
“Improve” so model just predicts that every time
# Probability predictions
log_probs <- predict(log_model, test_data, type = "prob")
test_data %>%
bind_cols(log_probs) %>%
roc_curve(improves, .pred_1) %>%
autoplot() +
labs(title = "ROC Curve - Logistic Regression")
Accuracy: 72.4% is about decent, better than random guessing (~74%
“Improve” baseline so actually close)
Kappa: 0.062 -> very
low.This means the model barely beats a naive classifier ROC curve hugs
the diagonal confirms that the model is weak overall
broom::tidy(log_model) %>%
filter(term != "(Intercept)") %>%
mutate(significant = p.value < 0.05) %>%
ggplot(aes(x = reorder(term, estimate),
y = estimate, fill = significant)) +
geom_col() +
coord_flip() +
labs(title = "Logistic Regression Coefficients",
x = "Predictor", y = "Estimate",
fill = "Significant (p<0.05)") +
theme_minimal()
The logistic regression model achieved 72.4% accuracy but a very
low kappa of 0.06 suggesting it barely outperforms a naive classifier
that always predicts “Improve.” This happens because of the class
imbalance causing the model to default to predicting improvement in most
cases.
Only three predictors were statistically significant: Anxiety
(positive), listening While Working (positive), and preferring Video
Game Music (negative).
The ROC curve closely follows the diagonal, indicating limited
discriminative ability.
=> These results suggest logistic regression alone is
insufficient, motivating the use of more flexible methods like LDA and
Random Forest.
library(discrim)
lda_model <- discrim_linear() %>%
fit(data = train_data, improves ~ .)
# View discriminant coefficients
lda_model$fit$scaling
## LD1
## Age -0.0103034496
## `Hours per day` 0.0018367935
## BPM 0.0010447607
## Anxiety 0.2274756629
## Depression -0.0918722871
## Insomnia 0.0008089612
## OCD 0.0317814150
## `While working`Yes 1.0775087396
## `Fav genre`Country 0.5280938036
## `Fav genre`EDM 0.6630677149
## `Fav genre`Folk 0.1282064955
## `Fav genre`Gospel 2.1925266667
## `Fav genre`Hip hop 1.3961215090
## `Fav genre`Jazz 0.7555282925
## `Fav genre`K pop 0.1572000266
## `Fav genre`Latin -2.0352668169
## `Fav genre`Lofi 1.8139094522
## `Fav genre`Metal 0.1536635673
## `Fav genre`Pop 0.2562262745
## `Fav genre`R&B 0.0112128965
## `Fav genre`Rap -0.1178674244
## `Fav genre`Rock -0.1484036579
## `Fav genre`Video game music -2.2126542933
# Class predictions
lda_preds <- predict(lda_model, test_data, type = "class")
# Add to test data
lda_results <- test_data %>%
bind_cols(lda_preds)
# Confusion matrix
conf_mat(lda_results, truth = improves, estimate = .pred_class)
## Truth
## Prediction 0 1
## 0 3 7
## 1 30 83
# Accuracy + Kappa
metrics(lda_results, truth = improves, estimate = .pred_class)
## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.699
## 2 kap binary 0.0169
lda_probs <- predict(lda_model, test_data, type = "prob")
test_data %>%
bind_cols(lda_probs) %>%
roc_curve(improves, .pred_1) %>%
autoplot() +
labs(title = "ROC Curve - LDA")
curve stays very close to the diagonal almost the entire way, only
pulling away near the top right. This confirms its kappa of 0.017 which
is barely useful.
log_kap <- metrics(test_data %>%
bind_cols(predict(log_model, test_data, type = "class")),
truth = improves, estimate = .pred_class) %>%
filter(.metric == "kap") %>% pull(.estimate)
lda_kap <- metrics(lda_results,
truth = improves, estimate = .pred_class) %>%
filter(.metric == "kap") %>% pull(.estimate)
tibble(
Model = c("Logistic Regression", "LDA"),
Kappa = c(log_kap, lda_kap)
) %>%
ggplot(aes(x = Model, y = Kappa, fill = Model)) +
geom_col(width = 0.4) +
labs(title = "Model Comparison: Logistic Regression vs LDA",
y = "Kappa Score") +
theme_minimal() +
theme(legend.position = "none")
LDA achieved 69.9% accuracy and a kappa of 0.017, performing worse than
logistic regression.
Both models struggled with class imbalance, defaulting to predicting
“Improve” in most cases.
The LDA scaling coefficients highlight
genre as the strongest discriminating factor, particularly Gospel and
Lofi (positive) versus Video game music and Latin (negative), consistent
with our EDA findings.
Since both linear classifiers underperform,
we proceed to Random Forest which can capture non-linear patterns and
handle class imbalance more effectively.
library(ranger)
rf_model <- rand_forest(
mtry = 3,
trees = 500,
min_n = 10
) %>%
set_engine("ranger", importance = "impurity") %>%
set_mode("classification") %>%
fit(improves ~ ., data = train_data)
rf_model
## parsnip model object
##
## Ranger result
##
## Call:
## ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~3, x), num.trees = ~500, min.node.size = min_rows(~10, x), importance = ~"impurity", num.threads = 1, verbose = FALSE, seed = sample.int(10^5, 1), probability = TRUE)
##
## Type: Probability estimation
## Number of trees: 500
## Sample size: 492
## Number of independent variables: 9
## Mtry: 3
## Target node size: 10
## Variable importance mode: impurity
## Splitrule: gini
## OOB prediction error (Brier s.): 0.1741619
# Class predictions
rf_preds <- predict(rf_model, test_data, type = "class")
# Add to test data
rf_results <- test_data %>%
bind_cols(rf_preds)
# Confusion matrix
conf_mat(rf_results, truth = improves, estimate = .pred_class)
## Truth
## Prediction 0 1
## 0 6 4
## 1 27 86
# Accuracy + Kappa
metrics(rf_results, truth = improves, estimate = .pred_class)
## # A tibble: 2 × 3
## .metric .estimator .estimate
## <chr> <chr> <dbl>
## 1 accuracy binary 0.748
## 2 kap binary 0.176
rf_probs <- predict(rf_model, test_data, type = "prob")
test_data %>%
bind_cols(rf_probs) %>%
roc_curve(improves, .pred_1) %>%
autoplot() +
labs(title = "ROC Curve - Random Forest")
curve pulls away from the diagonal more consistently across the middle range, showing it genuinely learns some patterns. Consistent with its kappa of 0.176.
rf_model %>%
extract_fit_engine() %>%
pluck("variable.importance") %>%
enframe(name = "variable", value = "importance") %>%
arrange(desc(importance)) %>%
slice_head(n = 10) %>%
ggplot(aes(x = reorder(variable, importance),
y = importance)) +
geom_col(fill = "steelblue") +
coord_flip() +
labs(title = "Top 10 Most Important Variables - Random Forest",
x = "Variable", y = "Importance") +
theme_minimal()
# Get kappa for all 3 models
get_kappa <- function(results) {
metrics(results, truth = improves,
estimate = .pred_class) %>%
filter(.metric == "kap") %>%
pull(.estimate)
}
tibble(
Model = c("Logistic Regression", "LDA", "Random Forest"),
Kappa = c(
get_kappa(test_data %>% bind_cols(predict(log_model, test_data, type = "class"))),
get_kappa(lda_results),
get_kappa(rf_results)
)
) %>%
ggplot(aes(x = reorder(Model, Kappa), y = Kappa, fill = Model)) +
geom_col(width = 0.4) +
labs(title = "Model Comparison: All Three Models",
x = "Model", y = "Kappa Score") +
theme_minimal() +
theme(legend.position = "none")
Random Forest outperformed both linear classifiers, achieving 74.8% accuracy and a kappa of 0.176 which is nearly three times higher than logistic regression. This suggests the relationship between music habits and mental health outcomes is non-linear, which linear models cannot capture.
The variable importance plot reveals that BPM and Age are the strongest predictors, followed closely by Anxiety, favourite genre, and Depression. This is a notable contrast to logistic regression, where BPM and Age were not significant which indicating these variables have complex, non-linear relationships with music’s effect that only tree-based methods can detect.
Despite the improvement, all three models still struggle with class imbalance.
library(mgcv)
## Loading required package: nlme
##
## Attaching package: 'nlme'
## The following object is masked from 'package:dplyr':
##
## collapse
## This is mgcv 1.9-4. For overview type '?mgcv'.
##
## Attaching package: 'mgcv'
## The following objects are masked from 'package:gam':
##
## gam, gam.control, gam.fit, s
# GAM works better with numeric outcome, so we use
# the original 3-class variable as ordered numeric
gam_data <- mu %>%
mutate(music_score = case_when(
`Music effects` == "Worsen" ~ 0,
`Music effects` == "No effect" ~ 1,
`Music effects` == "Improve" ~ 2
)) %>%
dplyr::select(music_score, Age, `Hours per day`, BPM,
Anxiety, Depression, Insomnia, OCD) %>%
rename(Hours_per_day = `Hours per day`) %>%
drop_na()
# Fit GAM with smooth terms for numeric predictors
gam_model <- gam(
music_score ~ s(Anxiety) + s(Depression) +
s(Age) + s(BPM) + s(Insomnia) +
s(OCD) + s(`Hours_per_day`),
data = gam_data,
method = "REML"
)
summary(gam_model)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## music_score ~ s(Anxiety) + s(Depression) + s(Age) + s(BPM) +
## s(Insomnia) + s(OCD) + s(Hours_per_day)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.73052 0.01936 89.39 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(Anxiety) 2.013 2.540 4.776 0.00628 **
## s(Depression) 5.321 6.420 3.368 0.00223 **
## s(Age) 2.002 2.529 1.049 0.39519
## s(BPM) 1.000 1.001 0.002 0.96637
## s(Insomnia) 1.000 1.001 0.592 0.44235
## s(OCD) 1.008 1.016 0.083 0.77884
## s(Hours_per_day) 1.981 2.487 2.255 0.11575
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0614 Deviance explained = 8.32%
## -REML = 450.41 Scale est. = 0.23087 n = 616
par(mfrow = c(2, 4)) # 2 rows, 4 plots per row
plot(gam_model, shade = TRUE, col = "steelblue",
shade.col = "lightblue", seWithMean = TRUE)
# R-squared and deviance explained
cat("R-squared: ", summary(gam_model)$r.sq, "\n")
## R-squared: 0.0613555
cat("Deviance explained:", summary(gam_model)$dev.expl, "\n")
## Deviance explained: 0.08321963
The GAM reveals that only Anxiety and Depression have statistically
significant nonlinear relationships with music’s effect (p < 0.01).
Depression shows the most complex pattern (edf = 5.32), rising at
moderate levels then dropping sharply at high scores, suggesting that
music benefits those with moderate depression but may be less effective
for severe cases. Anxiety shows a gentler upward curve (edf = 2.01),
consistent with findings from logistic regression.
The overall model fit is weak (R-squared = 0.06, deviance explained =
8.3%), indicating that mental health scores alone cannot fully explain
music’s effect. This aligns with our earlier finding that genre and
listening habits also play important roles.
set.seed(42)
# Use same model_data from before
cv_folds <- vfold_cv(model_data, v = 5, strata = improves)
cv_folds
## # 5-fold cross-validation using stratification
## # A tibble: 5 × 2
## splits id
## <list> <chr>
## 1 <split [491/124]> Fold1
## 2 <split [492/123]> Fold2
## 3 <split [492/123]> Fold3
## 4 <split [492/123]> Fold4
## 5 <split [493/122]> Fold5
# Logistic Regression
log_spec <- logistic_reg(mode = "classification") %>%
set_engine("glm")
# LDA
lda_spec <- discrim_linear() %>%
set_engine("MASS")
# Random Forest
rf_spec <- rand_forest(
mtry = 3, trees = 500, min_n = 10
) %>%
set_engine("ranger") %>%
set_mode("classification")
# Define a simple recipe
recipe_spec <- recipe(improves ~ ., data = model_data)
# Workflow + fit for each model
log_cv <- workflow() %>%
add_recipe(recipe_spec) %>%
add_model(log_spec) %>%
fit_resamples(cv_folds,
metrics = metric_set(accuracy, kap))
lda_cv <- workflow() %>%
add_recipe(recipe_spec) %>%
add_model(lda_spec) %>%
fit_resamples(cv_folds,
metrics = metric_set(accuracy, kap))
## → A | warning: no non-missing arguments to min; returning Inf
## There were issues with some computations A: x3There were issues with some computations A: x3
rf_cv <- workflow() %>%
add_recipe(recipe_spec) %>%
add_model(rf_spec) %>%
fit_resamples(cv_folds,
metrics = metric_set(accuracy, kap))
# Collect results
log_res <- collect_metrics(log_cv) %>% mutate(model = "Logistic Regression")
lda_res <- collect_metrics(lda_cv) %>% mutate(model = "LDA")
rf_res <- collect_metrics(rf_cv) %>% mutate(model = "Random Forest")
# Combine and display
cv_results <- bind_rows(log_res, lda_res, rf_res)
cv_results %>%
select(model, .metric, mean, std_err) %>%
arrange(.metric, desc(mean))
## # A tibble: 6 × 4
## model .metric mean std_err
## <chr> <chr> <dbl> <dbl>
## 1 Random Forest accuracy 0.759 0.00419
## 2 Logistic Regression accuracy 0.752 0.00960
## 3 LDA accuracy 0.747 0.0107
## 4 Random Forest kap 0.105 0.0166
## 5 Logistic Regression kap 0.0920 0.0421
## 6 LDA kap 0.0873 0.0445
cv_results %>%
filter(.metric == "kap") %>%
ggplot(aes(x = reorder(model, mean),
y = mean, fill = model)) +
geom_col(width = 0.4) +
geom_errorbar(aes(ymin = mean - std_err,
ymax = mean + std_err),
width = 0.1) +
labs(title = "5-Fold Cross-Validation: Kappa by Model",
x = "Model", y = "Mean Kappa") +
theme_minimal() +
theme(legend.position = "none")
5-fold cross-validation confirms Random Forest as the best model,
achieving the highest mean accuracy (75.9%) and kappa (0.105).
Crucially, Random Forest also has the smallest standard error (0.017),
indicating it is the most stable and consistent model across different
data splits. In contrast, Logistic Regression and LDA show much higher
variance (std_err ~0.04), suggesting they are more sensitive to which
data points end up in the training set.
The CV kappa scores are lower than single split results, which is
expected because cross-validation gives a more honest estimate of true
model performance by averaging across multiple splits rather than
relying on one lucky split.
Across all four methods, the results consistently show that music
does have a measurable but modest effect on mental health. The majority
of respondents (74%) reported improvement, but predicting WHO benefits
from music remains difficult.
The three most important findings are:
Anxiety is the strongest and most consistent predictor across all
models: people with higher anxiety are more likely to report music helps
them.
Genre matters: Gospel and Lofi listeners report the most
improvement, while Video Game Music and Latin listeners report the
least, confirmed by both logistic regression and LDA.
BPM and Age are important in nonlinear ways: Random Forest
identified them as the top predictors, while GAM showed their effects
are complex and cannot be captured by linear models alone.
All models achieved kappa scores below 0.2, indicating limited
predictive power. This is likely due to:
- Class imbalance (74%
“Improve”) causing models to default to predicting improvement
-
Music’s effect on mental health is highly subjective and influenced by
factors not captured in this dataset (e.g. listening context, mood,
personal history)
Class imbalance: 74% “Improve” makes it hard for models to learn
patterns for minority classes. Future work could apply SMOTE
oversampling to balance the classes.
Self-reported data: all responses are subjective. Two people with
the same anxiety score may experience music very differently.
Cross-sectional survey: we cannot establish causation. People who
already feel better may listen to more music, rather than music causing
improvement.
Small “Worsen” class: only 17 respondents reported worsening,
making it nearly impossible to model this group reliably.