The data set Music & Mental Health Survey Results, provided by Kaggle.com, contains survey responses on music taste and self-reported mental health.
For the purposes of this analysis, the data visualizations focus on listener age, daily listening hours, favorite genre, primary streaming service, self-assessed mental health scores, and the reported effect of music on each listener's mental health.
The first data visualization shows the distribution of all listeners' self-assessed symptom scores. Each respondent rated each condition on a scale of 0 to 10, and the stacked bars show how many respondents gave each score. The conditions discussed in this data analysis are Anxiety, Depression, Insomnia, and OCD.
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv('mxmh_survey_results.csv')
df=df.drop(columns=['Timestamp','Instrumentalist','Composer','Exploratory','Foreign languages','Permissions'])
df=df.rename(columns={'Primary streaming service':'Streaming_Service','Hours per day':'Hrs_Daily','While working':'While_Working','Fav genre':'Fav_Genre','Frequency [Classical]':'Classical_FRQ','Frequency [Country]':'Country_FRQ','Frequency [EDM]':'EDM_FRQ','Frequency [Folk]':'Folk_FRQ','Frequency [Gospel]':'Gospel_FRQ','Frequency [Hip hop]':'HipHop_FRQ','Frequency [Jazz]':'Jazz_FRQ','Frequency [K pop]':'KPop_FRQ','Frequency [Latin]':'Latin_FRQ','Frequency [Lofi]':'Lofi_FRQ','Frequency [Metal]':'Metal_FRQ','Frequency [Pop]':'Pop_FRQ','Frequency [R&B]':'RB_FRQ','Frequency [Rap]':'Rap_FRQ','Frequency [Rock]':'Rock_FRQ','Frequency [Video game Music]':'VideoGame_FRQ','Music effects':'Music_Effects'})
newdf=df.dropna()
# Self-reported scores (0-10) for each condition
conditions=['Anxiety','Depression','Insomnia','OCD']
plt.title('Distribution of Self-Reported Mental Health Scores')
# Bin edges at -0.5, 0.5, ..., 10.5 center one bin on each score and keep scores of 10 in the last bin
plt.hist([newdf[c] for c in conditions],
         bins=np.arange(-0.5,11.5,1),
         stacked=True)
plt.xlabel('Self-reported score (0-10)')
plt.ylabel('Number of respondents')
plt.legend(conditions)
plt.show()
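The stacked histogram can be cross-checked in numbers. The sketch below, assuming the four score columns hold numeric scores on the 0-10 scale, counts how many respondents gave each score for each condition using the same bin edges as the plot. The helper name `score_counts` and the tiny example frame are hypothetical, made up only to show the mechanics.

```python
import numpy as np
import pandas as pd

def score_counts(frame, conditions):
    """Count respondents per score (0-10) for each condition column.

    Hypothetical helper: bin edges -0.5, 0.5, ..., 10.5 put each integer
    score in its own bin, mirroring the histogram above.
    """
    edges = np.arange(-0.5, 11.5, 1)
    counts = {}
    for col in conditions:
        hist, _ = np.histogram(frame[col].dropna(), bins=edges)
        counts[col] = hist
    # One row per score (0-10), one column per condition
    return pd.DataFrame(counts, index=range(11))

# Made-up illustration data, not survey results
example = pd.DataFrame({
    "Anxiety": [0, 3, 10, 10],
    "Depression": [2, 2, 5, 9],
})
table = score_counts(example, ["Anxiety", "Depression"])
print(table.loc[10])
```

On the survey data the call would be `score_counts(newdf, ['Anxiety','Depression','Insomnia','OCD'])`.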
The second data visualization shows the distribution of scores for each condition among listeners who reported that music improves their mental health.
# Keep only respondents who said music improves their mental health
improved=newdf[newdf['Music_Effects']=="Improve"]
plt.figure(figsize=(16,10),dpi=80)
# One filled density curve per condition (fill=True replaces the deprecated shade=True)
for condition,color in [("Anxiety","g"),("Depression","deeppink"),("Insomnia","dodgerblue"),("OCD","orange")]:
    sns.kdeplot(improved[condition],fill=True,color=color,label=condition,alpha=.7)
plt.xlabel("Self-reported score (0-10)")
plt.title('Density Plot of Scores for Listeners Reporting Improvement',fontsize=22)
plt.legend()
plt.show()
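The density plot covers only respondents who answered "Improve". To compare that group with the others in numbers, one option is to take the median of each score within each `Music_Effects` answer. This is a minimal sketch assuming the column names used above; the helper `median_scores_by_effect` and the example rows are hypothetical.

```python
import pandas as pd

CONDITIONS = ["Anxiety", "Depression", "Insomnia", "OCD"]

def median_scores_by_effect(frame, conditions=CONDITIONS):
    """Median self-reported score per condition for each Music_Effects answer."""
    return frame.groupby("Music_Effects")[conditions].median()

# Made-up illustration rows, not survey results
example = pd.DataFrame({
    "Music_Effects": ["Improve", "Improve", "No effect", "Worsen"],
    "Anxiety": [6, 8, 3, 9],
    "Depression": [4, 6, 2, 8],
    "Insomnia": [2, 4, 1, 7],
    "OCD": [0, 2, 0, 5],
})
summary = median_scores_by_effect(example)
print(summary)
```

Applied to `newdf`, this gives one row per answer, which makes it easy to see whether the "Improve" group's scores differ from the rest.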
The third data visualization shows how many survey participants use each streaming service as their primary platform.
# Number of respondents per primary streaming service (counts stay numeric for plt.pie)
streamingdf=newdf.groupby(['Streaming_Service'])['Streaming_Service'].count().reset_index(name='count')
plt.figure(figsize=(8,8))
labels=streamingdf['Streaming_Service']
sizes=streamingdf['count']
colors=['#FF0000','#FFA500','#00FF00','#FFFF00','#FF69B4','#0000FF']
textprops={"fontsize":8,'color':'black'}
plt.pie(sizes,labels=labels,colors=colors,autopct='%.2f%%',pctdistance=0.9,shadow=False,textprops=textprops,wedgeprops={'linewidth':3.0,'edgecolor':'white'})
# A white circle over the center turns the pie into a donut chart
centre_circle=plt.Circle((0,0),0.65,ec='grey',fc='white',linewidth=1.00)
plt.gca().add_artist(centre_circle)
plt.axis('equal')
plt.title('Primary Streaming Service of Respondents')
plt.show()
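The percentages that `autopct` prints on the donut chart are each service's share of respondents. They can also be computed directly with pandas `value_counts(normalize=True)`, which skips the `groupby`/`count` step. A minimal sketch; the helper `service_shares` and the example rows are hypothetical.

```python
import pandas as pd

def service_shares(frame, column="Streaming_Service"):
    """Percentage of respondents per streaming service, largest first."""
    return (frame[column].value_counts(normalize=True) * 100).round(2)

# Made-up illustration rows, not survey results
example = pd.DataFrame({
    "Streaming_Service": ["Spotify", "Spotify", "Spotify", "Pandora"],
})
shares = service_shares(example)
print(shares)
```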
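The fourth data visualization illustrates which age range predominantly listens to each genre. Each box spans the middle half of the ages for that genre, and the outlier points mark ages that are unusual for it, which helps show which age ranges are most represented in this survey.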
plt.figure(figsize=(30,30),dpi=80)
# Fix the genre order so the boxes and the count labels line up
genre_order=sorted(newdf['Fav_Genre'].unique())
sns.boxplot(x='Fav_Genre',y='Age',data=newdf,order=genre_order)
def add_n_obs(data,group_col,y,order):
    # Median and number of observations for each group, looked up by label
    medians=data.groupby(group_col)[y].median()
    counts=data.groupby(group_col)[y].size()
    for x,label in enumerate(order):
        # Write the count just above the median line of each box
        plt.text(x,medians[label]*1.01,"#obs : "+str(counts[label]),horizontalalignment='center',fontdict={'size':14},color='white')
add_n_obs(newdf,group_col='Fav_Genre',y='Age',order=genre_order)
plt.title('Box Plot: Listener Age by Favorite Genre',fontsize=22)
plt.ylim(12,89)
plt.show()
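The outlier points in the box plot follow the usual 1.5 × IQR rule: with seaborn's default `whis=1.5`, the whiskers reach the furthest observation within 1.5 interquartile ranges of the box, and anything beyond is drawn as a separate point. The sketch below flags those ages per genre with pandas, assuming linear-interpolated quartiles (the pandas `quantile` default); the helper `age_outliers` and the example rows are hypothetical.

```python
import pandas as pd

def age_outliers(frame, group_col="Fav_Genre", value_col="Age", whis=1.5):
    """Return rows whose value lies more than whis * IQR outside the box of its group."""
    grouped = frame.groupby(group_col)[value_col]
    # Per-row first and third quartile of the row's own group
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    # Fences that separate whisker points from outlier points
    low, high = q1 - whis * iqr, q3 + whis * iqr
    mask = (frame[value_col] < low) | (frame[value_col] > high)
    return frame[mask]

# Made-up illustration rows, not survey results
example = pd.DataFrame({
    "Fav_Genre": ["Rock"] * 5 + ["Jazz"] * 4,
    "Age": [18, 19, 20, 21, 64, 30, 31, 32, 33],
})
outliers = age_outliers(example)
print(outliers)
```

Calling `age_outliers(newdf)` would list the respondents drawn as individual points in the box plot.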
The fifth data visualization uses the same age and favorite-genre information but also incorporates the number of hours spent listening to music each day. This adds daily listening habits to the picture, which the final visualization relates to whether music improves or worsens mental health.
genres=np.unique(newdf['Fav_Genre'])
# tab20 has 20 distinct colors, so up to 20 genres get their own color (tab10 would repeat them)
colors=[plt.cm.tab20(i%20) for i in range(len(genres))]
plt.figure(figsize=(16,10),dpi=80,facecolor='w',edgecolor='k')
for i,Fav_Genre in enumerate(genres):
    # One scatter layer per favorite genre
    plt.scatter('Age','Hrs_Daily',
                data=newdf.loc[newdf.Fav_Genre==Fav_Genre,:],
                s=20,color=colors[i],label=str(Fav_Genre))
# The y-axis runs to 24 so no daily listening value is clipped
plt.gca().set(xlim=(0,100),ylim=(0,24),
              xlabel='Age',ylabel='# of Listening Hours per Day')
plt.xticks(fontsize=12);plt.yticks(fontsize=12)
plt.title("Scatterplot of Favorite Genres: Age vs Listening Hours per Day",fontsize=22)
plt.legend(fontsize=12)
plt.show()
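The scatter plot is crowded where most respondents cluster. One way to read the age-versus-hours relationship in numbers is to bin ages with `pd.cut` and take the median daily hours in each band. The band edges and labels below are an arbitrary, hypothetical choice for illustration, and the example rows are made up.

```python
import pandas as pd

# Hypothetical age bands: (0, 17], (17, 24], (24, 34], (34, 49], (49, 100]
AGE_BINS = [0, 17, 24, 34, 49, 100]
AGE_LABELS = ["<18", "18-24", "25-34", "35-49", "50+"]

def median_hours_by_age(frame):
    """Median Hrs_Daily within each age band (bands with no respondents are dropped)."""
    bands = pd.cut(frame["Age"], bins=AGE_BINS, labels=AGE_LABELS)
    return frame.groupby(bands, observed=True)["Hrs_Daily"].median()

# Made-up illustration rows, not survey results
example = pd.DataFrame({
    "Age": [16, 19, 22, 40, 41],
    "Hrs_Daily": [5, 2, 4, 1, 3],
})
medians = median_hours_by_age(example)
print(medians)
```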
Like the previous visualizations, the last data visualization suggests that listening to music mostly improves mental health regardless of the listener's age or daily listening habits. Most of the few respondents who reported worsened mental health are younger adults who do not spend much time listening to music each day, and this should be taken into consideration.
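# Compare only the respondents who said music improved or worsened their mental health
df_select=newdf.loc[newdf.Music_Effects.isin(["Improve","Worsen"]),:]
sns.set_style("white")
# robust=True fits a robust regression line and requires statsmodels;
# hue is needed for the Set1 palette to take effect
gridobj=sns.lmplot(x="Age",y="Hrs_Daily",
                   data=df_select,
                   height=7,
                   robust=True,
                   hue="Music_Effects",
                   palette='Set1',
                   col="Music_Effects",
                   scatter_kws=dict(s=60,linewidths=.7,edgecolors='black'))
gridobj.set(xlim=(10,90),ylim=(0,24))
gridobj.set_axis_labels('Age','# of Listening Hours per Day')
plt.show()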
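The observation that most respondents reporting worsened mental health are younger adults with little daily listening can be checked with a cross-tabulation. This sketch counts `Music_Effects` answers within age and listening-hour groups using `pd.crosstab`; the helper `worsen_breakdown`, its cut-offs (age 24, 2 hours a day), and the example rows are hypothetical assumptions for illustration.

```python
import pandas as pd

def worsen_breakdown(frame, young_max_age=24, low_hours=2):
    """Count Music_Effects answers split by age group and listening level.

    young_max_age and low_hours are hypothetical cut-offs for illustration.
    """
    age_group = frame["Age"].le(young_max_age).map({True: "young", False: "older"})
    listening = frame["Hrs_Daily"].le(low_hours).map({True: "low", False: "high"})
    return pd.crosstab(
        [age_group, listening],
        frame["Music_Effects"],
        rownames=["age_group", "listening"],
    )

# Made-up illustration rows, not survey results
example = pd.DataFrame({
    "Age": [18, 20, 22, 45, 50, 19],
    "Hrs_Daily": [1, 0.5, 6, 1, 4, 3],
    "Music_Effects": ["Worsen", "Worsen", "Improve", "Improve", "Improve", "Improve"],
})
table = worsen_breakdown(example)
print(table)
```

On the survey data, `worsen_breakdown(df_select)` would show where the "Worsen" answers concentrate; changing the cut-offs tests how sensitive that pattern is.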
Ultimately, it appears that listening to music, at any amount of daily listening, improves mental health. The context provided with this data set on Kaggle.com explains that music therapy is recognized as an evidence-based practice and that listening to more of the music you enjoy helps release “happy” hormones that improve mood.