Dataset Information

This dataset looks at survey results on music taste, music listening habits, and self-reported mental health scores for anxiety, depression, insomnia, and OCD. After the two administrative columns are dropped (see the loading step), the dataset has 736 observations (participants) and 31 variables. The survey was created, distributed, and formatted into a data frame by computer science professor Catherine Rasgaitis at the University of Washington. The variables are as follows:

| Variable Name | Description |
| --- | --- |
| Age | Respondent’s age |
| Primary Streaming Service | The streaming app the respondent uses the most to listen to music |
| Hours per day | Average hours the respondent spends listening to music per day |
| While working | Whether or not the respondent listens to music while working |
| Instrumentalist | Whether or not the respondent plays an instrument regularly |
| Composer | Whether or not the respondent composes music |
| Fav genre | The respondent’s favorite genre |
| Exploratory | Whether or not the respondent actively explores new artists/genres |
| Foreign languages | Whether or not the respondent regularly listens to music with lyrics in a language they are not fluent in |
| BPM | Average BPM of some of their favorite songs |
| Frequency [Classical, Country, EDM, Folk, Gospel, Hip Hop, Jazz, K Pop, Latin, Lofi, Metal, Pop, R&B, Rap, Rock, Video Game Music] (each in their own respective column) | Respondents’ rank of how often they listen to each of the music genres where they can select: Never, Rarely, Sometimes, or Very Frequently |
| Anxiety, Depression, Insomnia, OCD (each in their own respective column) | Respondents’ self-reported ranks to these feelings where: 0 = “I do not experience this.” and 10 = “I experience this regularly, constantly, or extreme.” |
| Music effects | Whether or not music improves/worsens respondent’s mental health conditions |

Loading the data

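library(tidyverse)
# Read the raw survey responses
surveyraw <- read_csv("mxmhsurvey.csv")
# Drop the submission timestamp and the consent ("Permissions") column
surveyraw <- surveyraw %>% select(-c(Timestamp, Permissions))
# Replace spaces in column names with underscores
names(surveyraw) <- gsub(" ", "_", names(surveyraw))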
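
The renaming step only replaces spaces, so the square brackets in the genre columns stay in the names. A minimal sketch of what that means in practice, using made-up column names in the style of the survey headers and a toy data frame (both are assumptions for illustration, not the real file):

```r
# Made-up column names in the style of the raw survey headers
raw_names <- c("Hours per day", "Fav genre", "Frequency [Classical]")

# Same replacement as in the loading code: every space becomes an underscore
clean_names <- gsub(" ", "_", raw_names)
print(clean_names)

# The brackets remain, so such a column has to be quoted with backticks
toy <- data.frame(c("Never", "Rarely"), stringsAsFactors = FALSE)
names(toy) <- "Frequency_[Classical]"
print(toy$`Frequency_[Classical]`)
```

Inside dplyr verbs the same backtick quoting applies, for example when selecting one of the frequency columns by name.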

Examining missing values

sum(is.na(surveyraw))
## [1] 129

A more detailed look:

colSums(is.na(surveyraw))
##                          Age    Primary_streaming_service 
##                            1                            1 
##                Hours_per_day                While_working 
##                            0                            3 
##              Instrumentalist                     Composer 
##                            4                            1 
##                    Fav_genre                  Exploratory 
##                            0                            0 
##            Foreign_languages                          BPM 
##                            4                          107 
##        Frequency_[Classical]          Frequency_[Country] 
##                            0                            0 
##              Frequency_[EDM]             Frequency_[Folk] 
##                            0                            0 
##           Frequency_[Gospel]          Frequency_[Hip_hop] 
##                            0                            0 
##             Frequency_[Jazz]            Frequency_[K_pop] 
##                            0                            0 
##            Frequency_[Latin]             Frequency_[Lofi] 
##                            0                            0 
##            Frequency_[Metal]              Frequency_[Pop] 
##                            0                            0 
##              Frequency_[R&B]              Frequency_[Rap] 
##                            0                            0 
##             Frequency_[Rock] Frequency_[Video_game_music] 
##                            0                            0 
##                      Anxiety                   Depression 
##                            0                            0 
##                     Insomnia                          OCD 
##                            0                            0 
##                Music_effects 
##                            8

BPM is the column with the most missing data, which is no surprise. It was anticipated that many respondents would not go through the hassle of researching song data, so the question was left optional on the survey.

Examining percentage of missing values

mean(is.na(surveyraw) * 100)
## [1] 0.5653927

This is about 0.565%, which is a very small portion of the dataset. On to a more detailed look:

colMeans(is.na(surveyraw) * 100)
##                          Age    Primary_streaming_service 
##                    0.1358696                    0.1358696 
##                Hours_per_day                While_working 
##                    0.0000000                    0.4076087 
##              Instrumentalist                     Composer 
##                    0.5434783                    0.1358696 
##                    Fav_genre                  Exploratory 
##                    0.0000000                    0.0000000 
##            Foreign_languages                          BPM 
##                    0.5434783                   14.5380435 
##        Frequency_[Classical]          Frequency_[Country] 
##                    0.0000000                    0.0000000 
##              Frequency_[EDM]             Frequency_[Folk] 
##                    0.0000000                    0.0000000 
##           Frequency_[Gospel]          Frequency_[Hip_hop] 
##                    0.0000000                    0.0000000 
##             Frequency_[Jazz]            Frequency_[K_pop] 
##                    0.0000000                    0.0000000 
##            Frequency_[Latin]             Frequency_[Lofi] 
##                    0.0000000                    0.0000000 
##            Frequency_[Metal]              Frequency_[Pop] 
##                    0.0000000                    0.0000000 
##              Frequency_[R&B]              Frequency_[Rap] 
##                    0.0000000                    0.0000000 
##             Frequency_[Rock] Frequency_[Video_game_music] 
##                    0.0000000                    0.0000000 
##                      Anxiety                   Depression 
##                    0.0000000                    0.0000000 
##                     Insomnia                          OCD 
##                    0.0000000                    0.0000000 
##                Music_effects 
##                    1.0869565
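
The counts and percentages above can also be collected into one table, which makes the columns with the most missing data easier to spot. A minimal sketch, assuming a small made-up data frame in place of surveyraw (the values are illustrative only):

```r
# Toy data frame standing in for surveyraw (values are made up)
toy <- data.frame(
  Age = c(18, 21, NA, 35),
  BPM = c(120, NA, NA, 98),
  Fav_genre = c("Rock", "Pop", "Metal", "Rock"),
  stringsAsFactors = FALSE
)

# One row per column: count and percentage of missing values
missing_summary <- data.frame(
  column      = names(toy),
  n_missing   = colSums(is.na(toy)),
  pct_missing = colMeans(is.na(toy)) * 100,
  row.names   = NULL
)

# Sort so the columns with the most missing data come first
missing_summary <- missing_summary[order(-missing_summary$n_missing), ]
print(missing_summary)

# Overall share of missing cells, as a percentage
overall_pct <- mean(is.na(toy)) * 100
print(overall_pct)
```

Note that the value returned by this calculation is already a percentage, so it should be read as is rather than divided by 100 again.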

What are the average anxiety scores for each favorite genre, by age group?

The age groups are as follows:

| Description | Age Range |
| --- | --- |
| Kid | 10 to 17 |
| Young Adult | 18 to 22 |
| Adult | 23 to 40 |
| Mid-Life Adult | 41 to 64 |
| Elderly | 65+ |

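age_intervals <- c(0, 17, 22, 40, 64, Inf)
age_labels <- c("Kid", "Young Adult", "Adult", "Mid-Life Adult", "Elderly")
# Drop every row that has a missing value in any column
surveyraw <- na.omit(surveyraw)
# right = TRUE closes each interval on the right, (17, 22] and so on, so that
# ages 17, 22, 40, and 64 fall in the groups listed in the table
surveyraw$age_range <- cut(surveyraw$Age, breaks = age_intervals, labels = age_labels, right = TRUE)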
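
How `cut()` assigns ages that sit exactly on a break depends on its `right` argument. A minimal sketch with made-up boundary ages (an assumption for illustration, not survey data) that compares both settings against the age table:

```r
age_intervals <- c(0, 17, 22, 40, 64, Inf)
age_labels <- c("Kid", "Young Adult", "Adult", "Mid-Life Adult", "Elderly")

# Made-up ages sitting on either side of each group boundary
boundary_ages <- c(17, 18, 22, 23, 40, 41, 64, 65)

# right = TRUE: intervals are (0,17], (17,22], (22,40], (40,64], (64,Inf]
closed_right <- cut(boundary_ages, breaks = age_intervals,
                    labels = age_labels, right = TRUE)

# right = FALSE: intervals are [0,17), [17,22), [22,40), [40,64), [64,Inf)
closed_left <- cut(boundary_ages, breaks = age_intervals,
                   labels = age_labels, right = FALSE)

print(data.frame(age = boundary_ages, closed_right, closed_left))
```

With `right = TRUE` a 17-year-old is labeled "Kid", as in the table; with `right = FALSE` the same respondent is labeled "Young Adult", and ages 22, 40, and 64 likewise move up one group.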

Visual per age group

(The x-axis labels were difficult to fit, so the bars are color coded by favorite genre and identified in a legend rather than labeled along the x-axis.)

# Mean anxiety score for every favorite genre within every age group
mean_anx <- surveyraw %>% 
  group_by(Fav_genre, age_range) %>% 
  summarise(Mean_Anxiety = mean(Anxiety, na.rm = TRUE), .groups = "drop")
anx_kid <- mean_anx %>% filter(age_range == "Kid")
barplot(height = anx_kid$Mean_Anxiety,
        beside = TRUE,
        xlab = "Favorite Genre",
        ylab = "Mean Self-Reported Anxiety Scores",
        main = "Mean Self-Reported Anxiety Scores of Kids (10 to 17)",
        col = rainbow(length(unique(anx_kid$Fav_genre))),
        legend.text = unique(anx_kid$Fav_genre),
        args.legend = list(x = "topright", bty = "n", cex = 0.35))

anx_ya <- mean_anx %>% filter(age_range == "Young Adult")
barplot(height = anx_ya$Mean_Anxiety,
        beside = TRUE,
        ylim = c(0, 10),
        xlab = "Favorite Genre",
        ylab = "Mean Self-Reported Anxiety Scores",
        main = "Mean Self-Reported Anxiety Scores of Young Adults (18 to 22)",
        col = rainbow(length(unique(anx_ya$Fav_genre))),
        legend.text = unique(anx_ya$Fav_genre),
        args.legend = list(x = "topleft", bty = "n", cex = 0.3))

anx_adult <- mean_anx %>% filter(age_range == "Adult")
barplot(height = anx_adult$Mean_Anxiety,
        beside = TRUE,
        ylim = c(0, 10),
        xlab = "Favorite Genre",
        ylab = "Mean Self-Reported Anxiety Scores",
        main = "Mean Self-Reported Anxiety Scores of Adults (23 to 40)",
        col = rainbow(length(unique(anx_adult$Fav_genre))),
        legend.text = unique(anx_adult$Fav_genre),
        args.legend = list(x = "topleft", bty = "n", cex = 0.4))

anx_mla <- mean_anx %>% filter(age_range == "Mid-Life Adult")
barplot(height = anx_mla$Mean_Anxiety,
        beside = TRUE,
        ylim = c(0, 10),
        xlab = "Favorite Genre",
        ylab = "Mean Self-Reported Anxiety Scores",
        main = "Mean Self-Reported Anxiety Scores of Mid-Life Adults (41 to 64)",
        col = rainbow(length(unique(anx_mla$Fav_genre))),
        legend.text = unique(anx_mla$Fav_genre),
        args.legend = list(x = "topright", bty = "n", cex = 0.4))

anx_eld <- mean_anx %>% filter(age_range == "Elderly")
barplot(height = anx_eld$Mean_Anxiety,
        beside = TRUE,
        xlab = "Favorite Genre",
        ylab = "Mean Self-Reported Anxiety Scores",
        main = "Mean Self-Reported Anxiety Scores of the Elderly (65+)",
        col = rainbow(length(unique(anx_eld$Fav_genre))),
        legend.text = unique(anx_eld$Fav_genre),
        args.legend = list(x = "topright", bty = "n", cex = 1))
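
The five charts differ only in the age group and title, so the repeated code could be wrapped in one function. The sketch below is a hypothetical helper, not part of the original analysis: `plot_mean_anxiety` and the toy summary table are assumptions, and it labels the bars with rotated genre names (`las = 2`) as one possible way around crowded x-axis labels.

```r
# Hypothetical helper: one bar chart of mean anxiety by favorite genre
# for a single age group, with rotated genre labels under the bars.
plot_mean_anxiety <- function(mean_anx, group, title_suffix = "") {
  one_group <- mean_anx[mean_anx$age_range == group, ]
  op <- par(mar = c(8, 4, 4, 1))   # extra bottom margin for rotated labels
  on.exit(par(op))                 # restore the previous margins afterwards
  barplot(height = one_group$Mean_Anxiety,
          names.arg = one_group$Fav_genre,
          las = 2, cex.names = 0.8, ylim = c(0, 10),
          ylab = "Mean Self-Reported Anxiety Scores",
          main = paste("Mean Self-Reported Anxiety Scores:", group, title_suffix))
  invisible(one_group)             # return the plotted rows for inspection
}

# Made-up summary table with the same columns as mean_anx
toy_mean_anx <- data.frame(
  Fav_genre = c("Rock", "Pop", "Rock", "Pop"),
  age_range = c("Kid", "Kid", "Adult", "Adult"),
  Mean_Anxiety = c(6.1, 5.4, 5.8, 6.3),
  stringsAsFactors = FALSE
)

pdf(NULL)  # null device so the sketch runs without opening a plot window
kid_rows <- plot_mean_anxiety(toy_mean_anx, "Kid", "(10 to 17)")
dev.off()
```

In an interactive session the `pdf(NULL)` and `dev.off()` lines can be dropped so the chart appears in the usual plot pane. Using the same `ylim` for every group also keeps the five charts comparable.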

Data Exploration Analysis

What do our findings tell us about the data? The graphs offer plenty to dissect about how favorite genres are distributed and how self-reported anxiety scores vary by favorite genre within each age group. Rock, Pop, and Metal make up the majority of the selected favorites, with 30.1% saying Rock is their favorite genre, 18.3% going with Pop, and 14.1% choosing Metal. This also gives us (to a very minimal extent) some insight into the listening habits of Washingtonians, given that the majority of people who completed the survey were Washington State locals. The latter portion of the analysis also gave us rich insight into mean self-reported anxiety scores by both age group and favorite genre. These means, of course, may not be representative of the population, since each one aggregates a different (and sometimes small) number of respondents, but their interpretation is nonetheless an interesting one. Let us discuss the upper and lower limits of each category.
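
Genre shares like the ones quoted here, and the number of respondents behind each genre and age-group mean, can be tabulated with base R. A minimal sketch on made-up responses (the toy data frame is an assumption and does not reproduce the survey percentages):

```r
# Made-up responses standing in for surveyraw (illustration only)
toy <- data.frame(
  Fav_genre = c("Rock", "Rock", "Pop", "Metal", "Rock",
                "Pop", "Metal", "Rock", "Pop", "Rock"),
  age_range = c("Kid", "Young Adult", "Young Adult", "Adult", "Adult",
                "Kid", "Young Adult", "Young Adult", "Adult", "Elderly"),
  stringsAsFactors = FALSE
)

# Share of respondents naming each genre as their favorite, in percent
genre_share <- round(prop.table(table(toy$Fav_genre)) * 100, 1)
print(sort(genre_share, decreasing = TRUE))

# Respondents per genre and age group: a mean built on one or two
# respondents is far less stable than one built on dozens
cell_counts <- table(toy$Fav_genre, toy$age_range)
print(cell_counts)
```

Checking the cell counts before reading the highest and lowest means helps separate patterns from the noise of very small groups.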

Kids - The highest anxiety score was attributed to R&B music, and the lowest to rap. This trend follows my personal experience with music growing up as a teen in the early 2010s, as rap was getting more and more creative and unorthodox, while R&B was yet to innovate and had a “stuck in the past” feel to it that didn’t resonate with the young crowds.

Young Adults - The highest anxiety score was attributed to Folk music, and the lowest to Gospel. Gospel music is associated with Christianity, and many young adults find themselves in religion. This may contribute to bettering themselves and having more mental clarity. Folk music transcends generations and originates from tradition. It may be the case that some young adults are heavily surrounded by this music and, while they enjoy it, it somehow affects their anxiety levels. This would make a great study.

Adults - The highest anxiety scores were attributed to both K-Pop and Lofi, and the lowest to R&B. Adults can be reluctant to open their ears and hearts to newer styles of music, and K-Pop and Lofi are novel semi-mainstream genres. R&B, however, is incredibly popular among this age group, especially for those who found love in the 2000s, as is the case for the latter half of this age group.

Mid-Life Adults - The highest anxiety scores were attributed to EDM, and the lowest to R&B. This one in particular made me chuckle. EDM is my favorite genre, and the number of times I’ve been told it’s obnoxious, loud, irritating, etc. is too many to count. R&B, however? A classic, and the genre that most closely resembles the cadences of songs from their childhood.

Elderly - The highest anxiety scores were attributed to Gospel, and the lowest to Rap. It’s difficult to even try and understand this one, as I am confident in saying that very few would have guessed it to be this way.

4. Does listening to more music throughout the day affect self-reported depression levels in young adults? (ages 18 to 22)

survey_ya <- surveyraw %>% filter(age_range == "Young Adult")
plot(survey_ya$Hours_per_day, survey_ya$Depression, 
     main = "Depression Levels vs. Daily Hours Spent Listening to Music for Young Adults",
     xlab = "Hours Spent Listening to Music (Per Day)",
     ylab = "Self-Reported Depression Level")
abline(lm(survey_ya$Depression ~ survey_ya$Hours_per_day))

Looking at the linear model

# Fit depression score as a linear function of daily listening hours
depression_model <- lm(survey_ya$Depression ~ survey_ya$Hours_per_day, data = survey_ya)
summary(depression_model)
## 
## Call:
## lm(formula = survey_ya$Depression ~ survey_ya$Hours_per_day, 
##     data = survey_ya)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2319 -2.4711  0.0217  2.2753  5.5289 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)               4.2176     0.2888  14.603   <2e-16 ***
## survey_ya$Hours_per_day   0.1268     0.0611   2.075    0.039 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.874 on 253 degrees of freedom
## Multiple R-squared:  0.01673,    Adjusted R-squared:  0.01285 
## F-statistic: 4.306 on 1 and 253 DF,  p-value: 0.039

Linear Model for our variables: Depression Score = 0.1268x + 4.2176, where x is the number of hours spent listening to music throughout the day.

The p-value of this linear model is 0.039, so at an alpha level of 0.05, it is statistically significant. Our R-squared value, however, is 0.01673 (adjusted: 0.01285), which indicates that daily listening hours explain only about 1.7% of the variation in depression scores. This is a very weak relationship, but it does not mean that there is no correlation at all.
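
To make the size of the effect concrete, the numbers in the summary output can be worked through by hand. A minimal sketch that copies the estimates from the output above (the chosen listening times are arbitrary illustration values):

```r
# Estimates copied from the summary() output above
intercept <- 4.2176
slope     <- 0.1268

# Predicted depression score for a few daily listening times (hours)
hours <- c(0, 2, 4, 8)
predicted <- intercept + slope * hours
print(data.frame(hours, predicted))

# With a single predictor, the F-statistic is the square of the slope's t value
t_value <- 2.075
f_from_t <- t_value^2               # about 4.306, the reported F-statistic

# The correlation is the square root of R-squared, with the sign of the slope
r_squared <- 0.01673
r <- sign(slope) * sqrt(r_squared)  # about 0.129
print(c(f_from_t = f_from_t, r = r))
```

Going from 0 to 8 hours of music per day moves the predicted score by only about one point on the 0 to 10 scale, which is small next to the residual standard error of 2.874.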

Ethical Concerns

The first ethical concern that struck me was that participants under the age of 18 submitted their information. They signed off on the “permissions” tab, which is their consent to allow the school to use their information. While they remained anonymous, it still raises a concern about the lack of a safeguard that shows at least an effort to ensure that those who sign their consent are of age to do so.

Also, something I would like to consider is how exactly this data is going to be used. While it was a fascinating exploration, it could cause some social damage if the data were used to sway people toward or away from a particular music genre or listening habit. Lastly, I personally would have included a column on whether or not respondents have received a mental health diagnosis in the past. My concern is that, without it, we discredit those who are perhaps better educated on how to rate their own levels appropriately; accounting for a diagnosis could reduce the variance in the scores, which would polish our statistical calculations.