Recently, a good friend of mine mentioned that listening to the ukulele and learning to play it really helped her cope during the height of the COVID-19 pandemic. I have had similar experiences of music lifting my mood, and I wondered how widely shared this experience might be. So when, shortly thereafter, I was tasked with finding a dataset for our DATA 110 final project, I thought it was an opportune time to explore music and its potential impact on mood and mental health.
I was lucky to find such a dataset on Kaggle. The Music and Mental Health (MxMH) data was collected via a Google Form survey and is managed by Catherine Rasgaitis, a Computer Science student at the University of Washington. Participation was solicited through “various Reddit forums, Discord servers, and social media platforms. Posters and business cards were also used to advertise the form in libraries, parks, and other public locations”.
The dataset has 33 variables and 736 observations. The variables I anticipate being especially useful to my exploration include age, hours per day, fav genre, anxiety, depression, insomnia, OCD, and music effects. I would first like to establish whether the data show an association between music and mood or mental health. Then I will explore questions like: (1) Which musical genres, if any, have the greatest impact on mental health? (2) Does age have an impact on the effectiveness of music on mental health? (3) Does the number of listening hours per day have an impact on the effectiveness of music on mental health? Below is a full list of the dataset variables and descriptions:
Find the dataset here: https://www.kaggle.com/datasets/catherinerasgaitis/mxmh-survey-results
| Variable | Description |
|---|---|
| Timestamp | Date and time when form was submitted |
| Age | Respondent’s age |
| Primary streaming service | Respondent’s primary streaming service |
| Hours per day | Number of hours the respondent listens to music per day |
| While working | Does the respondent listen to music while studying/working? |
| Instrumentalist | Does the respondent play an instrument regularly? |
| Composer | Does the respondent compose music? |
| Fav genre | Respondent’s favorite or top genre |
| Exploratory | Does the respondent actively explore new artists/genres? |
| Foreign languages | Does the respondent regularly listen to music with lyrics in a language they are not fluent in? |
| BPM | Beats per minute of favorite genre |
| Frequency | Respondents rank how often they listen to 16 music genres, where they can select: “Never”, “Rarely”, “Sometimes”, “Very frequently” |
| Anxiety | Respondents rank Anxiety on a scale of 0 to 10, where 0 = I do not experience this and 10 = I experience this regularly, constantly/or to an extreme. |
| Depression | Respondents rank Depression on a scale of 0 to 10, where 0 = I do not experience this and 10 = I experience this regularly, constantly/or to an extreme. |
| Insomnia | Respondents rank Insomnia on a scale of 0 to 10, where 0 = I do not experience this and 10 = I experience this regularly, constantly/or to an extreme. |
| OCD | Respondents rank OCD on a scale of 0 to 10, where 0 = I do not experience this and 10 = I experience this regularly, constantly/or to an extreme. |
| Music effects | Does music improve/worsen respondent’s mental health conditions? |
| Permissions | Permissions to publicize data. |
# Loading libraries.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(dplyr)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(RColorBrewer)
# Reading in the dataset and viewing the head.
mxmh <- read_csv("musicxmentalhealth.csv")
## Rows: 736 Columns: 33
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (26): Timestamp, Primary streaming service, While working, Instrumentali...
## dbl (7): Age, Hours per day, BPM, Anxiety, Depression, Insomnia, OCD
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(mxmh)
## # A tibble: 6 × 33
## Timestamp Age Primary streaming se…¹ `Hours per day` `While working`
## <chr> <dbl> <chr> <dbl> <chr>
## 1 8/27/2022 19:29:… 18 Spotify 3 Yes
## 2 8/27/2022 19:57:… 63 Pandora 1.5 Yes
## 3 8/27/2022 21:28:… 18 Spotify 4 No
## 4 8/27/2022 21:40:… 61 YouTube Music 2.5 Yes
## 5 8/27/2022 21:54:… 18 Spotify 4 Yes
## 6 8/27/2022 21:56:… 18 Spotify 5 Yes
## # ℹ abbreviated name: ¹`Primary streaming service`
## # ℹ 28 more variables: Instrumentalist <chr>, Composer <chr>,
## # `Fav genre` <chr>, Exploratory <chr>, `Foreign languages` <chr>, BPM <dbl>,
## # `Frequency [Classical]` <chr>, `Frequency [Country]` <chr>,
## # `Frequency [EDM]` <chr>, `Frequency [Folk]` <chr>,
## # `Frequency [Gospel]` <chr>, `Frequency [Hip hop]` <chr>,
## # `Frequency [Jazz]` <chr>, `Frequency [K pop]` <chr>, …
The colSums output below shows that most variables have no missing values and that a few have only a handful. BPM is the exception, with 107 NAs, which is more than 14% of the 736 observations. I believe the high number of missing values in the BPM column is attributable to it being one of the “harder” items to answer and being marked “optional” in the survey, according to Catherine Rasgaitis.
# Finding the NAs in the dataset.
colSums(is.na(mxmh))
## Timestamp Age
## 0 1
## Primary streaming service Hours per day
## 1 0
## While working Instrumentalist
## 3 4
## Composer Fav genre
## 1 0
## Exploratory Foreign languages
## 0 4
## BPM Frequency [Classical]
## 107 0
## Frequency [Country] Frequency [EDM]
## 0 0
## Frequency [Folk] Frequency [Gospel]
## 0 0
## Frequency [Hip hop] Frequency [Jazz]
## 0 0
## Frequency [K pop] Frequency [Latin]
## 0 0
## Frequency [Lofi] Frequency [Metal]
## 0 0
## Frequency [Pop] Frequency [R&B]
## 0 0
## Frequency [Rap] Frequency [Rock]
## 0 0
## Frequency [Video game music] Anxiety
## 0 0
## Depression Insomnia
## 0 0
## OCD Music effects
## 0 8
## Permissions
## 0
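The share of missing values quoted above (107 of 736 for BPM, roughly 14.5%) can be checked directly rather than by hand. The following is only a minimal sketch: `na_share()` is a hypothetical helper name that does not appear in the original analysis, and the final call assumes the data frame is still named `mxmh` as above.

```r
# Hypothetical helper (not part of the original analysis):
# percentage of missing values per column, keeping only columns that have NAs.
na_share <- function(df) {
  pct <- colMeans(is.na(df)) * 100       # is.na() on a data frame gives a logical matrix
  sort(pct[pct > 0], decreasing = TRUE)  # largest share of missing values first
}

# Run on the survey data only if it has been loaded under the name used above.
if (exists("mxmh")) {
  print(round(na_share(mxmh), 1))
}
```

Reporting percentages instead of raw counts makes it easier to judge whether dropping rows with missing values removes a meaningful part of the sample.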
# Removing capital letters and spaces from variable names.
mxmh <- clean_names(mxmh)
# Removing the variables I don't need, and removing missing values from the remaining variables that have NAs.
mxmh1 <- mxmh |>
select(-exploratory, -instrumentalist, -composer, -foreign_languages, -permissions) |>
drop_na(music_effects, age, while_working, primary_streaming_service, bpm)
# Creating a bar plot to show the distribution of music effects.
mxmh1 |>
select(age, music_effects) |>
filter(age > 17) |>
mutate(music_effects = recode(music_effects, "No effect" = "No Change")) |>
count(music_effects) |>
plot_ly(x=~music_effects, y=~n, color = ~music_effects, text = ~n, textposition = "outside", hovertext = ~paste("Change=", music_effects, "\n", "Count=", n), hoverinfo = "text") |>
add_bars() |>
layout(title = "Distribution of Changes in Mental Health, Ages 18-89", xaxis = list(title = "Music Effects"),
yaxis = list(title = "Respondent Count")) |>
layout(annotations = list(x = 1, y = -0.1, text = "Source: Catherine Rasgaitis via Kaggle", showarrow = FALSE, xref = "paper", yref = "paper", xanchor = "center", yanchor = "auto", xshift = 0, yshift = 0, font = list(size = 10)))
The bar graph above shows that the majority of respondents self-reported an improvement in mood or mental health symptoms as a result of listening to music. A smaller group reported no change, and a very small group reported that their mood or symptoms worsened as a result of their music consumption. Which music genres do the respondents who reported a worsening of mood or symptoms favor? I will explore this question with my second visualization.
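Before moving on, the counts behind the bar graph can also be expressed as percentages, which makes "majority" and "very small" concrete. This is a hedged sketch rather than part of the original analysis: `effect_share()` is a hypothetical helper, and it assumes a data frame with the `age` and `music_effects` columns produced by the cleaning step above.

```r
library(dplyr)

# Hypothetical helper (not in the original analysis): count and percentage
# of adult respondents (age > 17) in each music_effects category.
effect_share <- function(df) {
  df |>
    filter(age > 17) |>
    count(music_effects) |>
    mutate(pct = round(100 * n / sum(n), 1))
}

# Run on the cleaned survey data only if it exists under the name used above.
if (exists("mxmh1")) {
  print(effect_share(mxmh1))
}
```

The same age filter as in the plot is applied so that the table and the chart describe the same respondents.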
mxmh1 |>
select(age, fav_genre, music_effects, hours_per_day) |>
filter(age > 17) |>
mutate(music_effects = recode(music_effects, "No effect" = "No Change")) |>
hchart('column', hcaes(x = fav_genre, y = hours_per_day, group = music_effects)) |>
hc_title(text = "Relationship Between Favorite Genre and Music Effects, Ages 18-89", align = "center") |>
hc_xAxis(title = list(text = "Favorite Music Genre")) |>
hc_yAxis(title = list(text = "Listening Hours Per Day")) |>
hc_legend(align = "center") |>
hc_caption(text= "Source: Catherine Rasgaitis via Kaggle") |>
hc_add_theme(hc_theme_smpl()) |>
hc_colors(c("#9989c6", "#00cc99", "#d2691e"))
The graph above shows the relationship between respondents’ favorite music genre and self-reported music effects. The three genres where respondents reported a worsening of mood or mental health are video game music, rock, and pop. Conversely, gospel, lofi, and Latin are the only three genres associated solely with an improvement of mood or mental health.
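The genre observations above can be cross-checked with a simple count table, since a chart of listening hours does not show how many respondents fall into each genre and effect combination. The sketch below is an assumption-laden illustration: `genre_effect_counts()` is a hypothetical helper, and it expects the `age`, `fav_genre`, and `music_effects` columns from the cleaned data.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not in the original analysis): one row per favorite
# genre, one column per music_effects category, cells hold respondent counts.
genre_effect_counts <- function(df) {
  df |>
    filter(age > 17) |>
    count(fav_genre, music_effects) |>
    pivot_wider(names_from = music_effects, values_from = n, values_fill = 0)
}

# Run on the cleaned survey data only if it exists under the name used above.
if (exists("mxmh1")) {
  print(genre_effect_counts(mxmh1), n = Inf)
}
```

Genres with a zero in every column except the improvement column are the ones associated only with improvement, and the table also shows how small some genre groups are.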
# Statistical analysis of fav genre and daily listening hours with boxplots.
mxmh1 |>
select(fav_genre, hours_per_day) |>
filter(fav_genre %in% c("Video game music", "Rock", "Pop", "Gospel", "Lofi", "Latin")) |>
ggplot(aes(hours_per_day, fav_genre, fill = fav_genre)) +
geom_boxplot() +
labs(x="Listening Hours Per Day", y="Favorite Genre", title = "Daily Music Consumption by Favorite Genre", caption = "Source: Catherine Rasgaitis via Kaggle") +
theme_bw() +
scale_fill_brewer(palette = "Blues")
Above, I created boxplots of the number of hours respondents reported listening to music daily for the six favorite genres identified in the previous graph. Although there is not enough data to establish that respondents listen to their favorite genre all or most of the time, for the purpose of this project I will assume that they listen to it more often than not.
While several favorite genres have outliers in daily music consumption, rock’s outlier is stark: a respondent who favors this genre reported listening to music 24 hours a day. This is likely an error. The rest of the graph shows that at least half of the respondents in each favorite-genre category listen to less than 5 hours of music a day. The exception is Latin, where a majority of respondents listen to between 5 and 10 hours a day.
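The boxplot readings above can be backed by the underlying quartiles, and the 24-hour value shows up as the maximum for its genre. This is a minimal sketch under stated assumptions: `hours_summary()` is a hypothetical helper that does not appear in the original analysis, and it relies on the `fav_genre` and `hours_per_day` columns of the cleaned data.

```r
library(dplyr)

# Hypothetical helper (not in the original analysis): group size, quartiles,
# and maximum of daily listening hours for a chosen set of favorite genres.
hours_summary <- function(df, genres) {
  df |>
    filter(fav_genre %in% genres) |>
    group_by(fav_genre) |>
    summarise(
      respondents  = n(),
      q1_hours     = quantile(hours_per_day, 0.25, names = FALSE),
      median_hours = median(hours_per_day),
      q3_hours     = quantile(hours_per_day, 0.75, names = FALSE),
      max_hours    = max(hours_per_day),
      .groups = "drop"
    )
}

# Run on the cleaned survey data only if it exists under the name used above.
if (exists("mxmh1")) {
  six_genres <- c("Video game music", "Rock", "Pop", "Gospel", "Lofi", "Latin")
  print(hours_summary(mxmh1, six_genres))
}
```

Including the group size alongside the quartiles helps show when a boxplot is based on only a few respondents.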
Music has a longstanding reputation for its healing and cathartic effects. Music as therapy has been professionally recognized since the periods following WWI and WWII, when hospitals commissioned musicians to perform for veterans suffering from war-related physical and emotional traumas. In response to the growing evidence of music’s therapeutic effects, medical practitioners called for formal education and training to be developed. In the 1940s, Music Therapy (MT) curricula began to be integrated into academic institutions, which paved the way for MT to be established as a clinical profession.
The first visualization in this exploration shows that, based on their own self-reports, most respondents feel that music has had an impact on their mood or mental health. The majority reported an improvement, which is consistent with the literature on the therapeutic nature of music. I found the results of the second visualization interesting. All cases where a worsening of mental health is attributed to music consumption were concentrated in three favorite genres: video game music, rock, and pop. I believe this would be a great starting point for another project that looks more closely at these genres. It was also interesting to see that gospel, lofi, and Latin music were associated solely with improved mental health in this dataset. One can make assumptions based on the uplifting and spiritual lyrics of most gospel music or the vibrant and upbeat rhythms of Latin music, but I think this would also be a good point for further exploration in future projects.