Research question: Can individual music listening habits and mental health profiles predict whether music has a positive, neutral, or negative effect on mental health?

why this question matters to you: Music is something most people engage with daily, yet its effect on mental health is rarely examined. Our senses constantly shape how we feel, and music as an auditory experience is no exception. With mental health becoming an increasingly important public concern, understanding whether and how music listening habits correlate with mental wellbeing is both personally meaningful and socially relevant.
Source: Music & Mental Health Survey Results, collected via an online survey and publicly available on Kaggle. (https://www.kaggle.com/datasets/catherinerasgaitis/mxmh-survey-results)
Sample size: 736 respondents
Structure: 33 variables capturing music listening behaviour, genre preferences, and self-reported mental health scores.
Key variables:
Age, Hours per day, BPM - numeric
While working, Fav genre, Instrumentalist - categorical
Frequency [Genre] columns (16 genres) - ordinal (Never → Very frequently)
Anxiety, Depression, Insomnia, OCD - numeric scores (0–10) ← mental health outcomes
Music effects — your target variable (Improve / No effect / Worsen)
Target variable distribution: Improve: 74% (542 respondents)
No effect: 23% (169 respondents)
Worsen: 3% (17 respondents)
Notable data quality issues addressed: BPM contained extreme outliers (up to 999,999,999) which is filtered to realistic range (40–250)
107 missing BPM values which is removed during cleaning
8 missing target variable rows which is also removed
Final clean dataset: ~620 rows

Cleaning Data

# Cleaning

mu<- music_data%>%
  filter(!is.na(`Music effects`))%>%
  mutate(BPM = ifelse(BPM < 40 | BPM > 250, NA, BPM)) %>%
  drop_na(Age, BPM) %>%
  mutate(music_effect = factor(`Music effects`, 
                                levels = c("Worsen", "No effect", "Improve"))) %>%
  mutate(improves = as.factor(ifelse(`Music effects` == "Improve", 1, 0))) %>%
  mutate(across(starts_with("Frequency"), 
                ~ factor(., levels = c("Never", "Rarely", "Sometimes", 
                                        "Very frequently"), ordered = TRUE))) %>%
  mutate(across(c(`While working`, Instrumentalist, Composer, 
                  Exploratory, `Foreign languages`), as.factor))%>%
  print()

## # A tibble: 616 × 35
##    Timestamp          Age Primary streaming se…¹ `Hours per day` `While working`
##    <chr>            <dbl> <chr>                            <dbl> <fct>          
##  1 8/27/2022 21:28…    18 Spotify                            4   No             
##  2 8/27/2022 21:40…    61 YouTube Music                      2.5 Yes            
##  3 8/27/2022 21:54…    18 Spotify                            4   Yes            
##  4 8/27/2022 21:56…    18 Spotify                            5   Yes            
##  5 8/27/2022 22:00…    18 YouTube Music                      3   Yes            
##  6 8/27/2022 22:18…    21 Spotify                            1   Yes            
##  7 8/27/2022 22:33…    19 Spotify                            6   Yes            
##  8 8/27/2022 22:44…    18 I do not use a stream…             1   Yes            
##  9 8/27/2022 23:00…    19 YouTube Music                      8   Yes            
## 10 8/27/2022 23:12…    19 Spotify                            2   Yes            
## # ℹ 606 more rows
## # ℹ abbreviated name: ¹`Primary streaming service`
## # ℹ 30 more variables: Instrumentalist <fct>, Composer <fct>,
## #   `Fav genre` <chr>, Exploratory <fct>, `Foreign languages` <fct>, BPM <dbl>,
## #   `Frequency [Classical]` <ord>, `Frequency [Country]` <ord>,
## #   `Frequency [EDM]` <ord>, `Frequency [Folk]` <ord>,
## #   `Frequency [Gospel]` <ord>, `Frequency [Hip hop]` <ord>, …

EDA:

General view

mu %>%
  count(music_effect) %>%
  mutate(pct = n / sum(n),
         label = paste0(round(pct * 100), "%")) %>%
  ggplot(aes(x = "", y = pct, fill = music_effect)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5),
            size = 5, color = "white", fontface = "bold") +
  labs(title = "Target Variable Distribution", fill = "Effect") +
  theme_void() +
  theme(legend.position = "bottom",
        plot.title = element_text(hjust = 0.5, face = "bold", size = 14))

It’s very clear that the majority of the responder reported that music improves their mental health, while about 23% reported no effect and only about 3% reported it worsened. This class imbalance will be accounted for in modeling.

Mental Health Scores Distribution

mu %>%
  dplyr::select(music_effect, Anxiety, Depression, Insomnia, OCD) %>%
  pivot_longer(cols = c(Anxiety, Depression, Insomnia, OCD),
               names_to = "Condition", values_to = "Score") %>%
  ggplot(aes(x = music_effect, y = Score, fill = music_effect)) +
  geom_boxplot() +
  facet_wrap(~ Condition) +
  labs(title = "Mental Health Scores by Music Effect",
       x = "Music Effect", y = "Score (0-10)") +
  theme_minimal() +
  theme(legend.position = "none")

Respondents who reported that music worsens their mental health consistently show the highest median scores for anxiety, depression, and insomnia. This suggests that individuals with more severe mental health symptoms may have a more negative experience with music. Interestingly, the “Improve” and “No effect” groups show similar mental health profiles, indicating that symptom severity alone does not fully explain who benefits from music. OCD scores remain relatively low and consistent across all three groups.

Favourite Genre by Music Effect

mu %>%
  count(`Fav genre`, music_effect) %>%
  ggplot(aes(x = reorder(`Fav genre`, n), y = n, fill = music_effect)) +
  geom_col(position = "fill") +
  coord_flip() +
  labs(title = "Music Effect by Favourite Genre",
       x = "Genre", y = "Proportion", fill = "Effect") +
  theme_minimal()

Based on the graph, Gospel and Lofi listeners report near-universal improvement, suggesting these calmer, mood-focused genres may be particularly beneficial. In contrast, Latin and Video game music listeners show the highest proportions of “No effect” and “Worsen” responses. Rap stands out as the genre with the most “Worsen” responses proportionally. These patterns suggest that genre may be a meaningful predictor in our models.

Logistic Regression

Setup & Data Prep

library(tidyverse)
library(tidymodels)

## ── Attaching packages ────────────────────────────────────── tidymodels 1.5.0 ──

## ✔ broom        1.0.12     ✔ rsample      1.3.2 
## ✔ dials        1.4.3      ✔ tailor       0.1.0 
## ✔ infer        1.1.0      ✔ tune         2.1.0 
## ✔ modeldata    1.5.1      ✔ workflows    1.3.0 
## ✔ parsnip      1.5.0      ✔ workflowsets 1.1.1 
## ✔ recipes      1.3.2      ✔ yardstick    1.4.0

## ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
## ✖ foreach::accumulate() masks purrr::accumulate()
## ✖ scales::discard()     masks purrr::discard()
## ✖ dplyr::filter()       masks stats::filter()
## ✖ recipes::fixed()      masks stringr::fixed()
## ✖ dplyr::lag()          masks stats::lag()
## ✖ yardstick::spec()     masks readr::spec()
## ✖ recipes::step()       masks stats::step()
## ✖ foreach::when()       masks purrr::when()

library(discrim)

## 
## Attaching package: 'discrim'

## The following object is masked from 'package:dials':
## 
##     smoothness

# Select predictors and drop remaining NAs
model_data <- mu %>%
  dplyr::select(improves, Age, `Hours per day`, BPM,
         Anxiety, Depression, Insomnia, OCD,
         `While working`, `Fav genre`) %>%
  drop_na()

# Train/test split (80/20)
set.seed(42)
split <- initial_split(model_data, prop = 0.8)
train_data <- training(split)
test_data  <- testing(split)

cat("Training rows:", nrow(train_data), "\n")

## Training rows: 492

cat("Test rows:", nrow(test_data), "\n")

## Test rows: 123

Fit Logistic Regression

log_model <- logistic_reg(mode = "classification") %>%
  fit(data = train_data, improves ~ .)

# View coefficients sorted by p-value
broom::tidy(log_model) %>% arrange(p.value)

## # A tibble: 24 × 5
##    term                        estimate std.error statistic  p.value
##    <chr>                          <dbl>     <dbl>     <dbl>    <dbl>
##  1 Anxiety                      0.169      0.0497     3.39  0.000699
##  2 `While working`Yes           0.724      0.276      2.63  0.00866 
##  3 `Fav genre`Video game music -1.36       0.557     -2.44  0.0146  
##  4 `Fav genre`Hip hop           1.47       0.855      1.72  0.0856  
##  5 Depression                  -0.0732     0.0459    -1.60  0.111   
##  6 `Fav genre`EDM               0.589      0.662      0.890 0.374   
##  7 `Fav genre`Latin            -1.24       1.55      -0.800 0.424   
##  8 `Fav genre`Jazz              0.669      0.891      0.751 0.452   
##  9 OCD                          0.0291     0.0443     0.657 0.511   
## 10 Age                         -0.00646    0.0101    -0.638 0.524   
## # ℹ 14 more rows

Anxiety (p = 0.0007) which show positive effect ( positive estimate), most significant!
While working: Yes (p = 0.009) => listening while working increases odds of improvement ( positive estimate)
Fav genre: Video game music (p = 0.015) we see negative effect, reduces odds of improvement (negative estimate)

Evaluate on Test Set

# Class predictions
log_preds <- predict(log_model, test_data, type = "class")

# Add to test data and compute metrics
test_results <- test_data %>%
  bind_cols(log_preds)

# Confusion matrix
conf_mat(test_results, truth = improves, estimate = .pred_class)

##           Truth
## Prediction  0  1
##          0  3  4
##          1 30 86

# Accuracy + Kappa
metrics(test_results, truth = improves, estimate = .pred_class)

## # A tibble: 2 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary        0.724 
## 2 kap      binary        0.0619

Confusion matrix:

Predicted almost everything as “1” (Improve) ( only 7 predicted as “0”)
This is the class imbalance problem with 74% of data is “Improve” so model just predicts that every time

ROC Curve

# Probability predictions
log_probs <- predict(log_model, test_data, type = "prob")

test_data %>%
  bind_cols(log_probs) %>%
  roc_curve(improves, .pred_1) %>%
  autoplot() +
  labs(title = "ROC Curve - Logistic Regression")

Accuracy: 72.4% is about decent, better than random guessing (~74% “Improve” baseline so actually close)
Kappa: 0.062 -> very low.This means the model barely beats a naive classifier ROC curve hugs the diagonal confirms that the model is weak overall

Visualise Coefficients

broom::tidy(log_model) %>%
  filter(term != "(Intercept)") %>%
  mutate(significant = p.value < 0.05) %>%
  ggplot(aes(x = reorder(term, estimate), 
             y = estimate, fill = significant)) +
  geom_col() +
  coord_flip() +
  labs(title = "Logistic Regression Coefficients",
       x = "Predictor", y = "Estimate",
       fill = "Significant (p<0.05)") +
  theme_minimal()

The logistic regression model achieved 72.4% accuracy but a very low kappa of 0.06 suggesting it barely outperforms a naive classifier that always predicts “Improve.” This happens because of the class imbalance causing the model to default to predicting improvement in most cases.
Only three predictors were statistically significant: Anxiety (positive), listening While Working (positive), and preferring Video Game Music (negative).
The ROC curve closely follows the diagonal, indicating limited discriminative ability.

=> These results suggest logistic regression alone is insufficient, motivating the use of more flexible methods like LDA and Random Forest.

Linear Discriminant Analysis (LDA)

Fit LDA Model

library(discrim)

lda_model <- discrim_linear() %>%
  fit(data = train_data, improves ~ .)

# View discriminant coefficients
lda_model$fit$scaling

##                                       LD1
## Age                         -0.0103034496
## `Hours per day`              0.0018367935
## BPM                          0.0010447607
## Anxiety                      0.2274756629
## Depression                  -0.0918722871
## Insomnia                     0.0008089612
## OCD                          0.0317814150
## `While working`Yes           1.0775087396
## `Fav genre`Country           0.5280938036
## `Fav genre`EDM               0.6630677149
## `Fav genre`Folk              0.1282064955
## `Fav genre`Gospel            2.1925266667
## `Fav genre`Hip hop           1.3961215090
## `Fav genre`Jazz              0.7555282925
## `Fav genre`K pop             0.1572000266
## `Fav genre`Latin            -2.0352668169
## `Fav genre`Lofi              1.8139094522
## `Fav genre`Metal             0.1536635673
## `Fav genre`Pop               0.2562262745
## `Fav genre`R&B               0.0112128965
## `Fav genre`Rap              -0.1178674244
## `Fav genre`Rock             -0.1484036579
## `Fav genre`Video game music -2.2126542933

Evaluate on Test Set

# Class predictions
lda_preds <- predict(lda_model, test_data, type = "class")

# Add to test data
lda_results <- test_data %>%
  bind_cols(lda_preds)

# Confusion matrix
conf_mat(lda_results, truth = improves, estimate = .pred_class)

##           Truth
## Prediction  0  1
##          0  3  7
##          1 30 83

# Accuracy + Kappa
metrics(lda_results, truth = improves, estimate = .pred_class)

## # A tibble: 2 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary        0.699 
## 2 kap      binary        0.0169

ROC Curve

lda_probs <- predict(lda_model, test_data, type = "prob")

test_data %>%
  bind_cols(lda_probs) %>%
  roc_curve(improves, .pred_1) %>%
  autoplot() +
  labs(title = "ROC Curve - LDA")

curve stays very close to the diagonal almost the entire way, only pulling away near the top right. This confirms its kappa of 0.017 which is barely useful.

Compare Logistic Regression vs LDA

log_kap <- metrics(test_data %>% 
                     bind_cols(predict(log_model, test_data, type = "class")),
                   truth = improves, estimate = .pred_class) %>%
  filter(.metric == "kap") %>% pull(.estimate)

lda_kap <- metrics(lda_results, 
                   truth = improves, estimate = .pred_class) %>%
  filter(.metric == "kap") %>% pull(.estimate)

tibble(
  Model = c("Logistic Regression", "LDA"),
  Kappa = c(log_kap, lda_kap)
) %>%
  ggplot(aes(x = Model, y = Kappa, fill = Model)) +
  geom_col(width = 0.4) +
  labs(title = "Model Comparison: Logistic Regression vs LDA",
       y = "Kappa Score") +
  theme_minimal() +
  theme(legend.position = "none")

LDA achieved 69.9% accuracy and a kappa of 0.017, performing worse than logistic regression.

Both models struggled with class imbalance, defaulting to predicting “Improve” in most cases.
The LDA scaling coefficients highlight genre as the strongest discriminating factor, particularly Gospel and Lofi (positive) versus Video game music and Latin (negative), consistent with our EDA findings.
Since both linear classifiers underperform, we proceed to Random Forest which can capture non-linear patterns and handle class imbalance more effectively.

Random Forest

Fit Random Forest

library(ranger)

rf_model <- rand_forest(
  mtry = 3,
  trees = 500,
  min_n = 10
) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("classification") %>%
  fit(improves ~ ., data = train_data)

rf_model

## parsnip model object
## 
## Ranger result
## 
## Call:
##  ranger::ranger(x = maybe_data_frame(x), y = y, mtry = min_cols(~3,      x), num.trees = ~500, min.node.size = min_rows(~10, x), importance = ~"impurity",      num.threads = 1, verbose = FALSE, seed = sample.int(10^5,          1), probability = TRUE) 
## 
## Type:                             Probability estimation 
## Number of trees:                  500 
## Sample size:                      492 
## Number of independent variables:  9 
## Mtry:                             3 
## Target node size:                 10 
## Variable importance mode:         impurity 
## Splitrule:                        gini 
## OOB prediction error (Brier s.):  0.1741619

Evaluate on Test Set

# Class predictions
rf_preds <- predict(rf_model, test_data, type = "class")

# Add to test data
rf_results <- test_data %>%
  bind_cols(rf_preds)

# Confusion matrix
conf_mat(rf_results, truth = improves, estimate = .pred_class)

##           Truth
## Prediction  0  1
##          0  6  4
##          1 27 86

# Accuracy + Kappa
metrics(rf_results, truth = improves, estimate = .pred_class)

## # A tibble: 2 × 3
##   .metric  .estimator .estimate
##   <chr>    <chr>          <dbl>
## 1 accuracy binary         0.748
## 2 kap      binary         0.176

ROC Curve

rf_probs <- predict(rf_model, test_data, type = "prob")

test_data %>%
  bind_cols(rf_probs) %>%
  roc_curve(improves, .pred_1) %>%
  autoplot() +
  labs(title = "ROC Curve - Random Forest")

curve pulls away from the diagonal more consistently across the middle range, showing it genuinely learns some patterns. Consistent with its kappa of 0.176.

Variable Importance

rf_model %>%
  extract_fit_engine() %>%
  pluck("variable.importance") %>%
  enframe(name = "variable", value = "importance") %>%
  arrange(desc(importance)) %>%
  slice_head(n = 10) %>%
  ggplot(aes(x = reorder(variable, importance), 
             y = importance)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Most Important Variables - Random Forest",
       x = "Variable", y = "Importance") +
  theme_minimal()

Compare All Models

# Get kappa for all 3 models
get_kappa <- function(results) {
  metrics(results, truth = improves, 
          estimate = .pred_class) %>%
    filter(.metric == "kap") %>% 
    pull(.estimate)
}

tibble(
  Model = c("Logistic Regression", "LDA", "Random Forest"),
  Kappa = c(
    get_kappa(test_data %>% bind_cols(predict(log_model, test_data, type = "class"))),
    get_kappa(lda_results),
    get_kappa(rf_results)
  )
) %>%
  ggplot(aes(x = reorder(Model, Kappa), y = Kappa, fill = Model)) +
  geom_col(width = 0.4) +
  labs(title = "Model Comparison: All Three Models",
       x = "Model", y = "Kappa Score") +
  theme_minimal() +
  theme(legend.position = "none")

Random Forest outperformed both linear classifiers, achieving 74.8% accuracy and a kappa of 0.176 which is nearly three times higher than logistic regression. This suggests the relationship between music habits and mental health outcomes is non-linear, which linear models cannot capture.

The variable importance plot reveals that BPM and Age are the strongest predictors, followed closely by Anxiety, favourite genre, and Depression. This is a notable contrast to logistic regression, where BPM and Age were not significant which indicating these variables have complex, non-linear relationships with music’s effect that only tree-based methods can detect.

Despite the improvement, all three models still struggle with class imbalance.

Generalized Additive Model (GAM)

Fit GAM

library(mgcv)

## Loading required package: nlme

## 
## Attaching package: 'nlme'

## The following object is masked from 'package:dplyr':
## 
##     collapse

## This is mgcv 1.9-4. For overview type '?mgcv'.

## 
## Attaching package: 'mgcv'

## The following objects are masked from 'package:gam':
## 
##     gam, gam.control, gam.fit, s

# GAM works better with numeric outcome, so we use 
# the original 3-class variable as ordered numeric
gam_data <- mu %>%
  mutate(music_score = case_when(
    `Music effects` == "Worsen"    ~ 0,
    `Music effects` == "No effect" ~ 1,
    `Music effects` == "Improve"   ~ 2
  )) %>%
  dplyr::select(music_score, Age, `Hours per day`, BPM,
         Anxiety, Depression, Insomnia, OCD) %>%
  rename(Hours_per_day = `Hours per day`) %>% 
  drop_na()

# Fit GAM with smooth terms for numeric predictors
gam_model <- gam(
  music_score ~ s(Anxiety) + s(Depression) + 
                s(Age) + s(BPM) + s(Insomnia) + 
                s(OCD) + s(`Hours_per_day`),
  data = gam_data,
  method = "REML"
)

summary(gam_model)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## music_score ~ s(Anxiety) + s(Depression) + s(Age) + s(BPM) + 
##     s(Insomnia) + s(OCD) + s(Hours_per_day)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.73052    0.01936   89.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                    edf Ref.df     F p-value   
## s(Anxiety)       2.013  2.540 4.776 0.00628 **
## s(Depression)    5.321  6.420 3.368 0.00223 **
## s(Age)           2.002  2.529 1.049 0.39519   
## s(BPM)           1.000  1.001 0.002 0.96637   
## s(Insomnia)      1.000  1.001 0.592 0.44235   
## s(OCD)           1.008  1.016 0.083 0.77884   
## s(Hours_per_day) 1.981  2.487 2.255 0.11575   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0614   Deviance explained = 8.32%
## -REML = 450.41  Scale est. = 0.23087   n = 616

Plot Smooth Terms

par(mfrow = c(2, 4))  # 2 rows, 4 plots per row
plot(gam_model, shade = TRUE, col = "steelblue",
     shade.col = "lightblue", seWithMean = TRUE)

Check Model Fit

# R-squared and deviance explained
cat("R-squared:        ", summary(gam_model)$r.sq, "\n")

## R-squared:         0.0613555

cat("Deviance explained:", summary(gam_model)$dev.expl, "\n")

## Deviance explained: 0.08321963

The GAM reveals that only Anxiety and Depression have statistically significant nonlinear relationships with music’s effect (p < 0.01). Depression shows the most complex pattern (edf = 5.32), rising at moderate levels then dropping sharply at high scores, suggesting that music benefits those with moderate depression but may be less effective for severe cases. Anxiety shows a gentler upward curve (edf = 2.01), consistent with findings from logistic regression.

The overall model fit is weak (R-squared = 0.06, deviance explained = 8.3%), indicating that mental health scores alone cannot fully explain music’s effect. This aligns with our earlier finding that genre and listening habits also play important roles.

Cross-Validation

Setup 5-Fold Cross-Validation

set.seed(42)

# Use same model_data from before
cv_folds <- vfold_cv(model_data, v = 5, strata = improves)

cv_folds

## #  5-fold cross-validation using stratification 
## # A tibble: 5 × 2
##   splits            id   
##   <list>            <chr>
## 1 <split [491/124]> Fold1
## 2 <split [492/123]> Fold2
## 3 <split [492/123]> Fold3
## 4 <split [492/123]> Fold4
## 5 <split [493/122]> Fold5

Define Models

# Logistic Regression
log_spec <- logistic_reg(mode = "classification") %>%
  set_engine("glm")

# LDA
lda_spec <- discrim_linear() %>%
  set_engine("MASS")

# Random Forest
rf_spec <- rand_forest(
  mtry = 3, trees = 500, min_n = 10
) %>%
  set_engine("ranger") %>%
  set_mode("classification")

Run Cross-Validation

# Define a simple recipe
recipe_spec <- recipe(improves ~ ., data = model_data)

# Workflow + fit for each model
log_cv <- workflow() %>%
  add_recipe(recipe_spec) %>%
  add_model(log_spec) %>%
  fit_resamples(cv_folds, 
                metrics = metric_set(accuracy, kap))

lda_cv <- workflow() %>%
  add_recipe(recipe_spec) %>%
  add_model(lda_spec) %>%
  fit_resamples(cv_folds,
                metrics = metric_set(accuracy, kap))

## → A | warning: no non-missing arguments to min; returning Inf

## There were issues with some computations   A: x3There were issues with some computations   A: x3

rf_cv <- workflow() %>%
  add_recipe(recipe_spec) %>%
  add_model(rf_spec) %>%
  fit_resamples(cv_folds,
                metrics = metric_set(accuracy, kap))

Compare CV Results

# Collect results
log_res <- collect_metrics(log_cv) %>% mutate(model = "Logistic Regression")
lda_res <- collect_metrics(lda_cv) %>% mutate(model = "LDA")
rf_res  <- collect_metrics(rf_cv)  %>% mutate(model = "Random Forest")

# Combine and display
cv_results <- bind_rows(log_res, lda_res, rf_res)
cv_results %>%
  select(model, .metric, mean, std_err) %>%
  arrange(.metric, desc(mean))

## # A tibble: 6 × 4
##   model               .metric    mean std_err
##   <chr>               <chr>     <dbl>   <dbl>
## 1 Random Forest       accuracy 0.759  0.00419
## 2 Logistic Regression accuracy 0.752  0.00960
## 3 LDA                 accuracy 0.747  0.0107 
## 4 Random Forest       kap      0.105  0.0166 
## 5 Logistic Regression kap      0.0920 0.0421 
## 6 LDA                 kap      0.0873 0.0445

Visualise CV Kappa

cv_results %>%
  filter(.metric == "kap") %>%
  ggplot(aes(x = reorder(model, mean), 
             y = mean, fill = model)) +
  geom_col(width = 0.4) +
  geom_errorbar(aes(ymin = mean - std_err,
                    ymax = mean + std_err),
                width = 0.1) +
  labs(title = "5-Fold Cross-Validation: Kappa by Model",
       x = "Model", y = "Mean Kappa") +
  theme_minimal() +
  theme(legend.position = "none")

5-fold cross-validation confirms Random Forest as the best model, achieving the highest mean accuracy (75.9%) and kappa (0.105). Crucially, Random Forest also has the smallest standard error (0.017), indicating it is the most stable and consistent model across different data splits. In contrast, Logistic Regression and LDA show much higher variance (std_err ~0.04), suggesting they are more sensitive to which data points end up in the training set.

The CV kappa scores are lower than single split results, which is expected because cross-validation gives a more honest estimate of true model performance by averaging across multiple splits rather than relying on one lucky split.

Conclusions & Limitations

Across all four methods, the results consistently show that music does have a measurable but modest effect on mental health. The majority of respondents (74%) reported improvement, but predicting WHO benefits from music remains difficult.

The three most important findings are:

Anxiety is the strongest and most consistent predictor across all models: people with higher anxiety are more likely to report music helps them.
Genre matters: Gospel and Lofi listeners report the most improvement, while Video Game Music and Latin listeners report the least, confirmed by both logistic regression and LDA.
BPM and Age are important in nonlinear ways: Random Forest identified them as the top predictors, while GAM showed their effects are complex and cannot be captured by linear models alone.

Why all models performed modestly

All models achieved kappa scores below 0.2, indicating limited predictive power. This is likely due to:
- Class imbalance (74% “Improve”) causing models to default to predicting improvement
- Music’s effect on mental health is highly subjective and influenced by factors not captured in this dataset (e.g. listening context, mood, personal history)

Limitations

Class imbalance: 74% “Improve” makes it hard for models to learn patterns for minority classes. Future work could apply SMOTE oversampling to balance the classes.
Self-reported data: all responses are subjective. Two people with the same anxiety score may experience music very differently.
Cross-sectional survey: we cannot establish causation. People who already feel better may listen to more music, rather than music causing improvement.
Small “Worsen” class: only 17 respondents reported worsening, making it nearly impossible to model this group reliably.

Future directions

Apply SMOTE to address class imbalance
Collect longitudinal data to establish causation
Include additional features like listening context and mood
Try a Bayesian approach to quantify uncertainty in predictions

Linh Le- DI - Final

Linh Le

2026-05-10

Research question: Can individual music listening habits and mental health profiles predict whether music has a positive, neutral, or negative effect on mental health?

Cleaning Data

EDA:

General view

Mental Health Scores Distribution

Favourite Genre by Music Effect

Logistic Regression

Setup & Data Prep

Fit Logistic Regression

Evaluate on Test Set

ROC Curve

Visualise Coefficients

Linear Discriminant Analysis (LDA)

Fit LDA Model

Evaluate on Test Set

ROC Curve

Compare Logistic Regression vs LDA

Random Forest

Fit Random Forest

Evaluate on Test Set

ROC Curve

Variable Importance

Compare All Models

Generalized Additive Model (GAM)

Fit GAM

Plot Smooth Terms

Check Model Fit

Cross-Validation

Setup 5-Fold Cross-Validation

Define Models

Run Cross-Validation

Compare CV Results

Visualise CV Kappa

Conclusions & Limitations

Why all models performed modestly

Limitations

Future directions