Python Data Visualization: Music & Mental Health Survey Results

Introduction

The data set, Music & Mental Health Survey Results, provided by Kaggle.com, contains survey responses on listeners' music taste and self-reported mental health.

For the purposes of this analysis, the data visualizations focus on listener age, daily listening hours, favorite genres, primary streaming service, self-assessed mental health scores, and the reported effects of music on listeners' mental health.

Distribution of Mental Illnesses According to Listeners' Self-Reports

The first data visualization shows the distribution of listeners' self-assessed mental health scores. Each listener rated each condition on a scale from 0 to 10, and the stacked histogram shows how many listeners gave each score for each condition. The conditions discussed in this data analysis are Anxiety, Depression, Insomnia, and OCD.


import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns

df=pd.read_csv('mxmh_survey_results.csv')

df=df.drop(columns=['Timestamp','Instrumentalist','Composer','Exploratory','Foreign languages','Permissions'])

df=df.rename(columns={'Primary streaming service':'Streaming_Service','Hours per day':'Hrs_Daily','While working':'While_Working','Fav genre':'Fav_Genre','Frequency [Classical]':'Classical_FRQ','Frequency [Country]':'Country_FRQ','Frequency [EDM]':'EDM_FRQ','Frequency [Folk]':'Folk_FRQ','Frequency [Gospel]':'Gospel_FRQ','Frequency [Hip hop]':'HipHop_FRQ','Frequency [Jazz]':'Jazz_FRQ','Frequency [K pop]':'KPop_FRQ','Frequency [Latin]':'Latin_FRQ','Frequency [Lofi]':'Lofi_FRQ','Frequency [Metal]':'Metal_FRQ','Frequency [Pop]':'Pop_FRQ','Frequency [R&B]':'RB_FRQ','Frequency [Rap]':'Rap_FRQ','Frequency [Rock]':'Rock_FRQ','Frequency [Video game Music]':'VideoGame_FRQ','Music effects':'Music_Effects'})

newdf=df.dropna()

plt.title('Distribution of Mental Illness Disorders')
# Scores run from 0 to 10, so center one bin on each integer score;
# np.arange(0,10,1) would leave every score of 10 out of the histogram
plt.hist([newdf.Anxiety,newdf.Depression,newdf.Insomnia,newdf.OCD],
        bins=np.arange(-0.5,11,1),
        stacked=True)
plt.xlabel('Self-assessed score')
plt.ylabel('Number of listeners')
plt.legend(['Anxiety','Depression','Insomnia','OCD'])
plt.show()
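To see why the bin edges matter, the sketch below runs `np.histogram` on a handful of made-up scores (hypothetical values, not survey data) and compares edges of `np.arange(0,10,1)` with edges centered on each integer score. It assumes scores are whole numbers from 0 to 10.

```python
import numpy as np

# Hypothetical scores on the survey's 0-10 scale (not real survey data)
scores = np.array([0, 3, 5, 7, 10, 10])

# Edges 0, 1, ..., 9 give 9 bins, so scores of 10 fall outside every bin
old_counts, _ = np.histogram(scores, bins=np.arange(0, 10, 1))

# Edges -0.5, 0.5, ..., 10.5 give one bin per possible score
new_counts, new_edges = np.histogram(scores, bins=np.arange(-0.5, 11, 1))

print(old_counts.sum(), new_counts.sum())  # 4 6
```

The same edges passed to `plt.hist` keep the top score in the chart and place each bar directly over its integer label.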

Improved Mental Health by Mental Illness

The second data visualization shows the distribution of scores for each condition among listeners who reported that music improved their mental health.

# Keep only listeners who said music improved their mental health
improved=newdf[newdf['Music_Effects']=="Improve"]

plt.figure(figsize=(16,10),dpi=80)

# fill=True replaces the deprecated shade=True argument (seaborn >= 0.11)
sns.kdeplot(improved["Anxiety"],fill=True,color="g",label="Anxiety",alpha=.7)
sns.kdeplot(improved["Depression"],fill=True,color="deeppink",label="Depression",alpha=.7)
sns.kdeplot(improved["Insomnia"],fill=True,color="dodgerblue",label="Insomnia",alpha=.7)
sns.kdeplot(improved["OCD"],fill=True,color="orange",label="OCD",alpha=.7)

plt.xlabel("Self-assessed score")
plt.title('Density Plot of Improved Mental Health by Illness',fontsize=22)
plt.legend()
plt.show()
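Each curve in the density plot is a kernel density estimate: a smoothed histogram in which every score contributes a small Gaussian bump. The sketch below is a minimal NumPy-only version, assuming Scott's rule for the bandwidth (the default `bw_method` of seaborn's `kdeplot`); the function name `gaussian_kde_1d` and the sample scores are hypothetical. Because each bump has tails, part of the estimated density falls below 0 or above 10 even though no score can.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid` (illustrative helper)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule: bandwidth = sample std (ddof=1) * n ** (-1/5)
    h = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, averaged over all samples
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

# Hypothetical anxiety scores from listeners who reported "Improve"
scores = [2, 4, 5, 6, 6, 7, 7, 8, 9, 10]
grid = np.linspace(-5, 15, 2001)
density = gaussian_kde_1d(scores, grid)

# The curve integrates to about 1 and puts some mass outside the 0-10 scale
step = grid[1] - grid[0]
area = density.sum() * step
outside = density[(grid < 0) | (grid > 10)].sum() * step
print(round(area, 3), outside > 0)
```

If the spill past the scale is distracting, seaborn's `clip` parameter (for example `clip=(0, 10)`) limits where the curve is evaluated.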

Number of Listeners per Streaming Service

The third data visualization shows how many survey participants use each platform as their primary streaming service.

# Count listeners per primary streaming service; counts stay numeric for plt.pie
streamingdf=newdf.groupby('Streaming_Service').size().reset_index(name='count')

plt.figure(figsize=(8,8))
labels=streamingdf['Streaming_Service']
sizes=streamingdf['count']
colors=['#FF0000','#FFA500','#00FF00','#FFFF00','#FF69B4','#0000FF']
textprops={"fontsize":8,'color':'black'}
plt.pie(sizes,labels=labels,colors=colors,autopct='%.2f%%',pctdistance=0.9,shadow=False,textprops=textprops,wedgeprops={'linewidth':3.0,'edgecolor':'white'},)

centre_circle=plt.Circle((0,0),0.65,color='grey',fc='white',linewidth=1.00)
fig=plt.gcf()
fig.gca().add_artist(centre_circle)
plt.axis('equal')

plt.show()
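The `autopct='%.2f%%'` argument labels each wedge with its percentage of the total, formatted to two decimal places. The sketch below reproduces that arithmetic with pandas on a few hypothetical responses (not the real survey counts), so the wedge labels can be checked against a table.

```python
import pandas as pd

# Hypothetical responses, not the real survey counts
responses = pd.DataFrame(
    {"Streaming_Service": ["Spotify", "Spotify", "YouTube Music", "Pandora", "Spotify"]}
)

# Same counting step as the donut chart
counts = responses.groupby("Streaming_Service").size().reset_index(name="count")

# Each wedge's share of the total as a percentage (0-100), formatted like autopct
counts["percent"] = 100 * counts["count"] / counts["count"].sum()
counts["label"] = counts["percent"].map(lambda p: "%.2f%%" % p)
print(counts)
```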

Listener Age by Favorite Genre

The fourth data visualization illustrates which age ranges predominantly listen to each favorite genre. The outliers help highlight which age ranges are most represented in this survey.

plt.figure(figsize=(30,30),dpi=80)
# Fix the genre order so the boxes and the annotations below line up
order=sorted(newdf['Fav_Genre'].unique())
sns.boxplot(x='Fav_Genre',y='Age',data=newdf,order=order)

def add_n_obs(data,group_col,y,order):
    # Medians and counts both come from the cleaned data, keyed by genre
    medians=data.groupby(group_col)[y].median()
    counts=data.groupby(group_col)[y].size()
    # Box number x on the axis corresponds to order[x]
    for x,label in enumerate(order):
        plt.text(x,medians[label]*1.01,"#obs : "+str(counts[label]),horizontalalignment='center',fontdict={'size':14},color='white')

add_n_obs(newdf,group_col='Fav_Genre',y='Age',order=order)

plt.title('Box Plot: Listener Age by Favorite Genre',fontsize=22)
plt.ylim(12,89)

plt.show()
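The dots beyond the whiskers are outliers under the usual box-plot rule, which matplotlib and seaborn both use by default (`whis=1.5`): a point is an outlier if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. The sketch below applies that rule with pandas to a few hypothetical ages; `iqr_outliers` is a helper introduced here for illustration.

```python
import pandas as pd

# Hypothetical ages, not survey data
ages = pd.DataFrame({
    "Fav_Genre": ["Rock"] * 7 + ["Pop"] * 5,
    "Age": [18, 19, 20, 21, 22, 23, 60, 14, 16, 17, 18, 19],
})

def iqr_outliers(values, whis=1.5):
    """Return the values outside the box-plot whisker fences (illustrative helper)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - whis * iqr, q3 + whis * iqr
    return values[(values < low) | (values > high)]

# Median and number of observations per genre, as annotated on the box plot
summary = ages.groupby("Fav_Genre")["Age"].agg(["median", "size"])

# Outliers per genre under the 1.5 * IQR rule
outliers = {genre: iqr_outliers(group).tolist()
            for genre, group in ages.groupby("Fav_Genre")["Age"]}
print(summary)
print(outliers)
```

Here the single 60-year-old Rock listener is flagged, while the Pop ages all fall inside the fences.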

Number of Listening Hours/Day by Age per Favorite Genre

The fifth data visualization plots listener age against the number of hours spent listening to music each day, colored by favorite genre. Daily listening habits may relate to whether music improves or worsens mental health, which the final visualization examines.

genres=np.unique(newdf['Fav_Genre'])
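# tab20 has 20 distinct colors; tab10 repeats colors once there are more than 10 genres
colors=[plt.cm.tab20(i%20) for i in range(len(genres))]
plt.figure(figsize=(16,10),dpi=80,facecolor='w',edgecolor='k')

for i, Fav_Genre in enumerate(genres):
    # color= sets one color for all points; a c= RGBA tuple can be misread as per-point values
    plt.scatter('Age','Hrs_Daily',
               data=newdf.loc[newdf.Fav_Genre==Fav_Genre,:],
               s=20,color=colors[i],label=str(Fav_Genre))

# Daily listening hours cannot exceed 24, so show the full range
plt.gca().set(xlim=(0,100),ylim=(0,24),
            xlabel='Age',ylabel='# of Listening Hours per Day')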

plt.xticks(fontsize=12);plt.yticks(fontsize=12)

plt.title("Scatterplot of Favorite Genres: Age vs Listening Hours per Day ",fontsize=22)
plt.legend(fontsize=12)
plt.show()
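The choice of colormap matters once there are more than ten genres. `tab10` holds only ten colors, so spreading sixteen genres across it reuses some of them, while indexing `tab20` with integers gives each genre its own color (up to twenty). The sketch below counts the distinct colors each approach yields; the figure of sixteen genres is an assumption for illustration.

```python
from matplotlib import cm

n_genres = 16  # assumption: number of distinct favorite genres

# Spread the genres evenly over the 10-color tab10 map (float inputs)
tab10_colors = {cm.tab10(i / float(n_genres - 1)) for i in range(n_genres)}

# Integer inputs pick tab20's colors one by one
tab20_colors = {cm.tab20(i % 20) for i in range(n_genres)}

print(len(tab10_colors), len(tab20_colors))  # 10 16
```

Float inputs in [0, 1] are mapped onto the colormap's fixed lookup table, whereas integer inputs index it directly, which is why the integer version yields one color per genre.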

Music Effects: Number of Listening Hours/Day by Age

Building on the previous two visualizations, the last data visualization shows that listening to music mostly improves mental health regardless of listeners' daily listening habits or age. Most of the few listeners who reported that music worsened their mental health are younger adults who do not spend much time listening to music each day, and this should be taken into consideration.


df_select=newdf.loc[newdf.Music_Effects.isin(["Improve","Worsen"]),:]

sns.set_style("white")
# hue makes the Set1 palette apply; robust=True fits a robust regression and requires statsmodels
gridobj=sns.lmplot(x="Age",y="Hrs_Daily",
                  data=df_select,
                  hue="Music_Effects",
                  height=7,
                  robust=True,
                  palette='Set1',
                  col="Music_Effects",
                  scatter_kws=dict(s=60,linewidths=.7,edgecolors='black'))

gridobj.set(xlim=(10,90),ylim=(0,24))
plt.show()
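The observation that most worsened responses come from younger listeners with modest daily listening can also be checked numerically rather than by eye. The sketch below assumes a few hypothetical rows shaped like `df_select` and age bands chosen only for illustration; it cross-tabulates the reported effect by age band and compares median daily hours per effect.

```python
import pandas as pd

# Hypothetical rows shaped like df_select (not real survey responses)
sample = pd.DataFrame({
    "Age": [16, 19, 22, 25, 31, 45, 18, 21, 60],
    "Hrs_Daily": [1, 0.5, 2, 4, 3, 2, 1, 6, 1],
    "Music_Effects": ["Worsen"] * 3 + ["Improve"] * 6,
})

# Assumed age bands: (0, 24], (24, 39], (39, 100]
sample["Age_Band"] = pd.cut(sample["Age"], bins=[0, 24, 39, 100],
                            labels=["<25", "25-39", "40+"]).astype(str)

# Count of each reported effect within each age band
table = pd.crosstab(sample["Age_Band"], sample["Music_Effects"])

# Median daily listening hours for each reported effect
hours = sample.groupby("Music_Effects")["Hrs_Daily"].median()
print(table)
print(hours)
```

On the real data, the same `pd.cut`, `pd.crosstab`, and `groupby` steps can be run on `df_select` to put numbers behind the visual impression.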

Conclusion

Ultimately, it appears that listening to music, at any amount of daily listening, improves mental health for most listeners in this survey. The context provided with this data set on Kaggle.com explains that music therapy is recognized as an evidence-based practice and that listening to more music that you enjoy helps produce “happy” hormones that improve mood.