Spotify Music Recommendation:

Spotify is an on-demand music streaming service with a built-in recommendation system. It operates at a large scale:

  1. 20 million songs
  2. 24 million active users
  3. 6 million paying users
  4. 8 million daily active users
  5. 1 TB of compressed user-generated data per day
  6. 700-node Hadoop cluster
  7. 1 million years' worth of music streamed
  8. 1 billion user-generated playlists

Recommendation at Spotify:

  1. Discover (personalized recommendation)
  2. Artist Radio station
  3. Related Artists
  4. Now Playing

Ways to find good recommendations:

  1. Manual Curation: Works well for services with a small catalog, such as Songza.
  2. Manual Tag Attributes: Experts tag each track with a set of attributes, and these attributes are then used for recommendation. The drawback is that tagging takes a lot of valuable expert time.
  3. Audio Content, Metadata, Text Analysis: Analyze the audio content itself, or run text analysis on music blogs and news articles.
  4. Collaborative Filtering: This is the approach Spotify uses. It looks at what listeners are actually playing, finds relationships in that data, and recommends on that basis.
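
To make the collaborative filtering idea concrete, here is a minimal neighborhood-style sketch in Python. It is not the matrix factorization approach and not Spotify's code. The `listens` data, the track names, and the functions `item_similarities` and `recommend` are all hypothetical, chosen only to show how "who listens to what" can be turned into recommendations.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical listening history: user -> set of tracks they played.
listens = {
    "ann": {"track_a", "track_b", "track_c"},
    "bob": {"track_a", "track_b"},
    "cat": {"track_b", "track_c", "track_d"},
    "dan": {"track_d", "track_e"},
}


def item_similarities(listens):
    """Cosine similarity between tracks, based on shared listeners."""
    listeners = defaultdict(set)
    for user, tracks in listens.items():
        for track in tracks:
            listeners[track].add(user)

    sims = defaultdict(dict)
    for a in listeners:
        for b in listeners:
            if a == b:
                continue
            shared = len(listeners[a] & listeners[b])
            if shared:
                sims[a][b] = shared / sqrt(len(listeners[a]) * len(listeners[b]))
    return sims


def recommend(user, listens, top_n=3):
    """Rank tracks the user has not heard by summed similarity to heard ones."""
    sims = item_similarities(listens)
    scores = defaultdict(float)
    for heard in listens[user]:
        for other, sim in sims[heard].items():
            if other not in listens[user]:
                scores[other] += sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("bob", listens))  # ['track_c', 'track_d']
```

No tags, audio analysis, or editors are involved here: the only signal is which users played which tracks.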

Explicit Matrix Factorization:

Spotify is based on Implicit Matrix Factorization:
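
The notes do not give the objective here. For reference, one widely used way to write an implicit-feedback factorization is the confidence-weighted form of Hu, Koren and Volinsky. All symbols below are assumptions for illustration: r_{ui} is how often user u played item i, p_{ui} is the binary preference derived from it, c_{ui} is the confidence in that preference, x_u and y_i are the user and item vectors, and \alpha and \lambda are tuning constants.

```latex
\min_{x,\,y} \; \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \Bigr),
\qquad
p_{ui} =
\begin{cases}
  1 & \text{if } r_{ui} > 0 \\
  0 & \text{if } r_{ui} = 0
\end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}
```

In an explicit formulation, by contrast, the model fits observed ratings directly and the sum runs only over the user-item pairs that were actually rated.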

Alternating Least Squares:

  1. Initialize user and item vectors to random noise.
  2. Fix the item vectors and solve for the optimal user vectors:
       1. Take the derivative of the loss function with respect to each user's vector, set it equal to 0, and solve.
       2. This results in a system of linear equations with a closed-form solution.
  3. Fix the user vectors and solve for the optimal item vectors.
  4. Repeat until convergence.
  5. Note that:
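
The alternating steps listed above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not Spotify's implementation. The function names `implicit_als` and `weighted_loss`, the confidence weighting `1 + alpha * count`, the default hyperparameters, and the toy play-count matrix are all made up for the example.

```python
import numpy as np


def weighted_loss(R, X, Y, alpha=40.0, reg=0.1):
    """Confidence-weighted squared error plus L2 regularization (assumed form)."""
    P = (R > 0).astype(float)          # binary preference: played or not
    C = 1.0 + alpha * R                # confidence grows with play count
    err = P - X @ Y.T
    return float(np.sum(C * err ** 2) + reg * (np.sum(X ** 2) + np.sum(Y ** 2)))


def solve_side(fixed, conf, pref, reg):
    """Solve for one side's vectors while the other side is held fixed.

    Setting the gradient of the loss to zero for a single row gives the
    linear system (F^T C F + reg * I) x = F^T C p, solved here in closed form.
    """
    rank = fixed.shape[1]
    out = np.empty((conf.shape[0], rank))
    for r in range(conf.shape[0]):
        weighted = fixed * conf[r][:, None]        # C F for this row
        A = fixed.T @ weighted + reg * np.eye(rank)
        b = weighted.T @ pref[r]
        out[r] = np.linalg.solve(A, b)
    return out


def implicit_als(R, rank=2, alpha=40.0, reg=0.1, iterations=15, seed=0):
    """Factor a (users x items) play-count matrix into user and item vectors."""
    rng = np.random.default_rng(seed)
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R

    # Step 1: initialize user and item vectors to random noise.
    X = rng.normal(scale=0.1, size=(R.shape[0], rank))
    Y = rng.normal(scale=0.1, size=(R.shape[1], rank))

    # A fixed iteration count stands in for a real convergence check.
    for _ in range(iterations):
        X = solve_side(Y, C, P, reg)        # fix items, solve for users
        Y = solve_side(X, C.T, P.T, reg)    # fix users, solve for items
    return X, Y


# Toy play counts: rows are users, columns are tracks.
R = np.array([[5, 3, 0, 0],
              [4, 0, 0, 1],
              [0, 0, 4, 5],
              [0, 1, 5, 4]], dtype=float)

X, Y = implicit_als(R)
scores = X @ Y.T    # predicted preference for every user-track pair
```

Because each half-step solves its subproblem exactly, the loss cannot increase from one iteration to the next, which is why a simple "repeat until the loss stops changing" rule works as a convergence check.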

Problems:

  1. A custom serializer had to be written to deal with performance issues.
  2. Runs on larger datasets often resulted in failed executors.