Spotify User Mood Prediction

9 December 2019

Business Context

WHAT? The problem is to understand a listener’s current mood so that we can recommend tracks that fit it.
WHY? We want to enhance the listener’s user experience. For example, if an angry person is recommended party music, the result may be a bad user experience.
HOW? Develop a predictive analytics product that predicts a person’s mood from data about their recently played songs.

The above-mentioned data files contain the information we used for our analysis.

Recap of our proposal:

We achieved what we proposed, with better accuracy than initially expected.

Taking into account the time at which a user listens to a particular genre: Not done, because the data has no timestamps.
Are there more attributes available? If yes, could you prioritize among all the available attributes to make better decisions?: Yes, more are available, but we do not need to consider them because we get the same results with the default 5 attributes.
Do you collect the mood data from the same APIs as the other 5 features?: No, we labelled the clusters ourselves, using human judgment.
How can you use this to boost your profit?: Rule 1: know your customer better to grow revenue.

The data used in this project was collected by downloading a list of 3,000 artists from the following GitHub repository
This list is simply the 3,000 most popular artists in the United States
These songs have 5 musical attributes as follows:

No other data was pulled outside of the API.
The collected data contains more than 20,000 instances.
R package used to make the Spotify API calls: spotifyr (https://cran.r-project.org/web/packages/spotifyr)
Limits on the number, frequency, and speed of API calls were handled by planning the calls in advance.

Each audio feature describes the song using numeric values.
Every feature is on a scale from 0 to 1, except loudness.
Loudness is on a scale from -60 dB to 0 dB and was therefore rescaled to vary from 0 to 1.
Normalization was the only pre-processing required. The data was normalized so that all variables are on the same scale.
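The loudness rescaling described above can be sketched as a simple min-max transform. This is a minimal Python illustration, not the project’s original code: the function name `rescale_loudness` is hypothetical, and clipping values that fall outside the -60 dB to 0 dB range is an assumption.

```python
def rescale_loudness(loudness_db, low=-60.0, high=0.0):
    """Map a loudness value in dB from [low, high] onto [0, 1].

    Values outside the range are clipped first (an assumption; the
    original report does not say how out-of-range values were handled).
    """
    clipped = max(low, min(high, loudness_db))
    return (clipped - low) / (high - low)


# Example: the midpoint of the dB range maps to the midpoint of [0, 1].
print(rescale_loudness(-30.0))
```

After this step, loudness sits on the same 0 to 1 scale as the other features, so no single feature dominates a distance calculation.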

The collected data was visualized using several methods.
The first method was plotting a distance matrix.
Based on it, it seemed reasonable to use a distance metric and group the set of songs using a clustering algorithm.
We used the K-means elbow method to find the optimal number of clusters. The elbow method results indicate that the optimal number of clusters is 5.
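The elbow method mentioned above can be sketched as follows: run K-means for a range of k values, record the within-cluster sum of squares (WCSS) for each, and look for the k after which the curve flattens. This is a minimal standard-library Python sketch, not the project’s original code. The function names, the number of restarts, and the toy two-dimensional data are all assumptions made for illustration; the real data has five features per song.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Return the index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def kmeans_wcss(points, k, n_iter=100, seed=0):
    """Run Lloyd's algorithm once and return the within-cluster sum of squares."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[j])  # an empty cluster keeps its centroid
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow_curve(points, k_max=8, n_init=5):
    """Best WCSS over n_init random starts for each k in 1..k_max."""
    return {
        k: min(kmeans_wcss(points, k, seed=s) for s in range(n_init))
        for k in range(1, k_max + 1)
    }


# Toy data (hypothetical): three loose groups of points in the unit square.
_rng = random.Random(42)
_centers = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
points = [
    (cx + _rng.gauss(0, 0.03), cy + _rng.gauss(0, 0.03))
    for cx, cy in _centers
    for _ in range(30)
]
curve = elbow_curve(points, k_max=6)
for k, wcss in curve.items():
    print(k, round(wcss, 4))
```

Plotting WCSS against k and picking the point where the decrease levels off gives the elbow. Several random restarts per k are used here because a single K-means run can stop in a poor local optimum.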

Add cluster labels to original dataset
We chose K-means clustering because it is a simple, intuitive, and time-proven clustering method. To determine the optimal number of clusters, we used the elbow method.
Split the data into training and testing datasets
Start training supervised classification algorithms on the training set.
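The split step above can be sketched as a stratified split, so that every cluster label appears in both the training and the testing set. This is a minimal standard-library Python sketch, not the project’s original code: the function name `stratified_split`, the 20% test fraction, and the fixed seed are assumptions, since the report does not state the split ratio.

```python
import random
from collections import defaultdict


def stratified_split(rows, labels, test_fraction=0.2, seed=0):
    """Split rows and labels into train and test sets, label by label.

    Each label contributes roughly test_fraction of its rows to the test
    set, so the label proportions are similar in both sets.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train_rows = [rows[i] for i in train_idx]
    test_rows = [rows[i] for i in test_idx]
    train_labels = [labels[i] for i in train_idx]
    test_labels = [labels[i] for i in test_idx]
    return train_rows, test_rows, train_labels, test_labels


# Hypothetical example: 100 songs with 5 cluster labels (0..4).
rows = [(i,) for i in range(100)]
labels = [i % 5 for i in range(100)]
train_rows, test_rows, train_labels, test_labels = stratified_split(rows, labels)
print(len(train_rows), len(test_rows))
```

A supervised classifier is then fitted on the training rows and labels only, and the held-out test rows are used to measure how well it predicts the cluster label.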

We tried SVM and random forest.
SVM achieved an F1 score of 0.98 and stood out as the best.
Other algorithms performed well too.
Key take-aways from our findings: Using this data to monitor a person’s mood in real time is a sophisticated process for Spotify. However, we believe that the way these insights translate into revenue in a business setting makes this analytical tool worth the time and money needed to implement it.
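For reference, an F1 score like the one reported above can be computed from true and predicted labels as follows. This is a minimal standard-library Python sketch. The report does not say how the F1 score was averaged across the five classes, so macro averaging is an assumption here, and the function names and example labels are hypothetical.

```python
def f1_per_class(y_true, y_pred):
    """Return a dict mapping each class to its F1 score."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro averaging)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical cluster labels for five songs.
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
print(f1_per_class(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

Macro averaging treats every mood cluster equally regardless of its size; a weighted average would instead favour the larger clusters.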