9 December 2019

Business Context

  • WHAT? Infer a listener's current mood so that we can recommend tracks that match it.

  • WHY? We want to enhance the listener’s user experience. For example, suggesting party music to an angry listener may lead to a bad user experience.

  • HOW? Develop a predictive analytics product that predicts a listener's mood from data about their recently played songs.

Key to our data vault

Folder that contains data we collected from the API: https://github.com/doyeunlim87/CIS-8392-Course-Project/

The data files in this folder contain the information that we used for our analysis.

Recap of our proposal:

We achieved what we proposed, with better accuracy than initially expected.

Answers to top peer comments

  • Taking into account the time at which a user listens to a particular genre: Not possible, since the data we collected has no timestamps.

  • Are there more attributes available? If yes, could you prioritize among all the available attributes to make better decisions?: Yes, more attributes are available, but we did not need them because we get the same results with the default 5 attributes.

  • Do you collect the mood data from the same APIs as the other 5 features?: No, we labelled the clusters ourselves using human judgment.

  • How can you use this to boost your profit? Rule 1: Know your customer better to grow revenue.

Data Collection

  • The data used in this project was collected starting from a list of 3,000 artists downloaded from the following GitHub repository

  • This list is simply the top 3,000 most popular artists in the United States
  • Each song has 5 musical attributes, as follows:

  1. Acousticness
  2. Danceability
  3. Liveness
  4. Loudness
  5. Speechiness

Data Collection- Cont.

  • No data was pulled from outside the API.
  • Our collected data contains more than 20,000 instances.
  • R package used to make the Spotify API calls: spotifyr (https://cran.r-project.org/web/packages/spotifyr)
  • We worked around limits on the number, frequency, and speed of our API calls through careful planning.

Data Wrangling

  • Each audio feature describes the song using numeric values.
  • Every feature except loudness is on a scale from 0 to 1.
  • Loudness is on a scale from -60 dB to 0 dB and was therefore rescaled to vary from 0 to 1.
  • Normalization was the only pre-processing required. It puts all variables on the same scale.
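The loudness rescaling can be sketched as a min-max transform. This is an illustrative Python sketch, not the project's own code (the project used R). The function name `rescale_loudness` and the clipping of out-of-range values are assumptions.

```python
def rescale_loudness(loudness_db, low=-60.0, high=0.0):
    """Min-max rescale a loudness value in dB to the range [0, 1]."""
    # Assumption: values outside [low, high] are clipped to the range first.
    clipped = max(low, min(high, loudness_db))
    return (clipped - low) / (high - low)


print(rescale_loudness(-60.0))  # 0.0
print(rescale_loudness(-30.0))  # 0.5
print(rescale_loudness(0.0))    # 1.0
```

After this step, loudness is comparable with the other four features, which already lie between 0 and 1.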

Data Analytics

  • The collected data was visualized using several methods.
  • The first method was plotting a distance matrix.
  • Based on this, it seemed reasonable to use a distance metric and group the songs with a clustering algorithm.
  • Use the K-means Elbow Method to find the optimal number of clusters. The Elbow Method results indicate that the optimal number of clusters is 5.
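A minimal sketch of the Elbow Method, assuming plain Lloyd's K-means in pure Python. The data is synthetic (5 hypothetical blobs in a 5-feature space) and the helper names are illustrative. It is not the project's R code or data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels, wcss)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random data points
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    # Within-cluster sum of squares: the quantity plotted in an elbow chart.
    wcss = sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))
    return centroids, labels, wcss


# Synthetic stand-in data: 5 blobs of 40 points each, 5 features per point.
rng = random.Random(42)
centers = [tuple(rng.random() for _ in range(5)) for _ in range(5)]
points = [tuple(v + rng.gauss(0, 0.03) for v in center)
          for center in centers for _ in range(40)]

wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 9)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 3))
```

The "elbow" is the value of k after which the printed within-cluster sum of squares stops dropping sharply. Reading it off remains a judgment call.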

Data Analytics- Cont.

  • Add cluster labels to original dataset
  • We chose K-means clustering since it is a simple, intuitive, and time-proven method for clustering. To determine the optimal number of clusters, we used the Elbow Method.
  • Split the data into training and testing datasets
  • Start training supervised classification algorithms on the training set.
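The labelling and splitting steps above can be sketched as follows. This is a hypothetical Python illustration: the feature values, the 80/20 split ratio, and the helper `train_test_split` are all assumptions, not details from the project.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then split into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical songs (two features shown for brevity) and their cluster ids.
features = [(0.1, 0.8), (0.2, 0.7), (0.9, 0.1), (0.8, 0.2), (0.5, 0.5)]
cluster_labels = [0, 0, 1, 1, 2]

# Attach each cluster label to its song so it can serve as the target.
labelled = [{"features": f, "cluster": c}
            for f, c in zip(features, cluster_labels)]

train, test = train_test_split(labelled, test_fraction=0.2)
print(len(train), len(test))  # 4 1
```

A classifier is then fitted on `train` only, and `test` is held back for evaluation.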

Conclusion

  • We tried SVM and Random Forest.
  • SVM achieved an F1 score of 0.98 and stood out as the best.
  • Other algorithms performed well too.
  • Key takeaways from our findings: Using this data to monitor a person’s mood in real time is a sophisticated process for Spotify. However, we believe that the way these insights translate into money in a business setting makes this analytical tool worth the investment of time and money to implement.
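For reference, a sketch of how an F1 score can be computed for a multi-class problem such as the 5 clusters. The slides do not say which averaging was used, so macro-averaging here is an assumption, and the labels below are made-up examples rather than project results.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Made-up true and predicted cluster ids for six songs.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
print(round(macro_f1(y_true, y_pred), 4))  # 0.8222
```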