Spotify Music Recommendation:
Spotify is an on-demand music service with a recommendation system. It has a huge catalog and user base:
- 20 Million Songs
- 24 Million Active users
- 6 Million paying users
- 8 million daily active users
- 1TB of compressed data generated from users per day.
- 700 node Hadoop cluster
- 1 Million years' worth of music streamed
- 1 Billion user-generated playlists
Recommendation at Spotify
- Discover (personalized recommendation)
- Artist Radio station
- Related Artists
- Now Playing
Ways to find good recommendations:
- Manual Curation: Good for systems with a small catalog, such as Songza.
- Manual Tag Attributes: Experts tag each song with a set of attributes, and these attributes are used for recommendation. The drawback is that tagging takes a lot of valuable time.
- Audio Content, Metadata, Text Analysis: Analyze the audio content itself, or run text analysis on music blogs and news articles.
- Collaborative Filtering: Spotify is based on this. It looks at what listeners are listening to, finds the relationships between them, and recommends on that basis.
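As a rough illustration of the collaborative filtering idea (not Spotify's actual system), the sketch below ranks songs a user has not played by how similar their listening patterns are to songs the user has played. The play-count matrix and the function names `item_similarity` and `recommend` are made up for this example, and cosine similarity is just one possible choice of relationship measure.

```python
import numpy as np

# Hypothetical play counts: rows are users, columns are songs.
plays = np.array([
    [5, 3, 0, 0],
    [4, 0, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 5, 4],
], dtype=float)

def item_similarity(matrix):
    """Cosine similarity between song (column) vectors."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for songs nobody played
    normalized = matrix / norms
    return normalized.T @ normalized

def recommend(user, matrix, top_n=2):
    """Rank songs the user has not played by similarity to songs they did play."""
    sim = item_similarity(matrix)
    scores = sim @ matrix[user]          # weight similarities by the user's play counts
    scores[matrix[user] > 0] = -np.inf   # exclude songs already played
    ranked = np.argsort(scores)[::-1]
    return [int(i) for i in ranked[:top_n] if np.isfinite(scores[i])]

print(recommend(0, plays))
```

User 0 has only played songs 0 and 1, so the ranking is driven entirely by which other listeners played those same songs.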
Explicit Matrix Factorization:
Spotify is based on Implicit Matrix Factorization:
Alternating Least Squares:
- Initialize user and item vectors to random noise.
- Fix item vectors and solve for optimal user vectors.
- To do so, take the derivative of the loss function with respect to the user's vector, set it equal to 0, and solve.
- This results in a system of linear equations with a closed-form solution.
- Fix user vectors and solve for optimal item vectors.
- Repeat until convergence.
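The steps above can be sketched on a single machine with NumPy. This is a minimal illustration under stated assumptions, not Spotify's implementation: it assumes a squared-error loss with L2 regularization, treats every entry of a small dense matrix as observed with equal weight (so it omits the per-entry confidence weights an implicit-feedback model would use), and runs a fixed number of iterations instead of testing for convergence. The names `als`, `rank`, `reg`, and the sample matrix are hypothetical. Setting the derivative with respect to a user vector x_u to 0 gives the linear system (Y^T Y + reg * I) x_u = Y^T r_u, where Y holds the item vectors and r_u is the user's row; the item update is symmetric.

```python
import numpy as np

def als(ratings, rank=2, reg=0.1, iterations=20, seed=0):
    """Alternating least squares on a fully observed dense matrix (a sketch)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    # Step 1: initialize user and item vectors to random noise.
    users = rng.normal(scale=0.1, size=(n_users, rank))
    items = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    losses = []
    for _ in range(iterations):
        # Step 2: fix items, solve (Y^T Y + reg I) x_u = Y^T r_u for all users at once.
        users = np.linalg.solve(items.T @ items + ridge, items.T @ ratings.T).T
        # Step 3: fix users, solve (X^T X + reg I) y_i = X^T r_i for all items at once.
        items = np.linalg.solve(users.T @ users + ridge, users.T @ ratings).T
        # Track the regularized squared-error loss after each full sweep.
        error = ratings - users @ items.T
        loss = np.sum(error ** 2) + reg * (np.sum(users ** 2) + np.sum(items ** 2))
        losses.append(float(loss))
    return users, items, losses

# Hypothetical play counts: rows are users, columns are songs.
ratings = np.array([
    [5, 3, 0, 0],
    [4, 0, 0, 1],
    [0, 0, 4, 5],
    [0, 1, 5, 4],
], dtype=float)

users, items, losses = als(ratings)
print(np.round(users @ items.T, 2))  # low-rank approximation of the matrix
```

Because each half-step exactly minimizes the loss over one block of vectors while the other is held fixed, the recorded loss should never increase from one sweep to the next, which is a handy sanity check when experimenting with `rank` and `reg`.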
Note that:
Problems:
- Had to write a custom serializer to deal with performance issues.
- Running with larger datasets often results in failed executors.