Music Recommendations at Scale with Spark

This week’s discussion covers Christopher Johnson’s presentation on how the music recommendation systems at Spotify have evolved, and on how Spark is used to scale those systems to millions of users and songs.

ALS - Alternating Least Squares:

I found the first half the most interesting because Johnson details how the recommender schema progressed over time, using linear algebra concepts that apply directly to our program’s studies.

$\approx$ 8 minutes into the video

The key insight for me was that, with the user vectors $X$ held fixed, you can solve for the optimal item (song) vectors, and that $X^TX$ only has to be computed once and can be reused for every song. The remaining per-song term involves only the users who actually listened to that song, so only observed listen history enters each solve. This matters because it significantly reduces the amount of data that has to be processed.
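To make the saving concrete, here is a minimal NumPy sketch of a single song update. It assumes the implicit-feedback form of ALS, in which every user who listened to the song carries a confidence weight greater than 1 and everyone else has weight 1 and preference 0. The function name `solve_item_vector`, its parameters, and the regularization value are hypothetical, chosen for illustration rather than taken from the presentation.

```python
import numpy as np

def solve_item_vector(X, XtX, listeners, confidence, reg=0.1):
    """Solve for one song's vector with the user vectors X held fixed.

    X          : (n_users, k) user factor matrix
    XtX        : (k, k) Gram matrix X.T @ X, computed once and shared
    listeners  : indices of the users who listened to this song
    confidence : confidence weights (> 1) for those users
    reg        : regularization strength (assumed value)
    """
    k = X.shape[1]
    X_l = X[listeners]  # only the rows for actual listeners
    # X^T C X = X^T X + X_l^T (C_l - I) X_l, since C - I is zero elsewhere
    A = XtX + (X_l.T * (confidence - 1.0)) @ X_l + reg * np.eye(k)
    # X^T C p, where the preference p is 1 for listeners and 0 otherwise
    b = X_l.T @ confidence
    return np.linalg.solve(A, b)
```

Only the rows of `X` for the song's listeners are touched inside the function; the full matrix enters only through the precomputed `XtX`.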

This demonstration of ALS was one of the clearest real-world applications of linear algebra that I have encountered. It also served as a good introduction to how very large problems can be broken into subsets and processed across a distributed cluster, something he covers in the second portion of the presentation.
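As a small illustration of why this kind of problem splits well, the sketch below shows that $X^TX$ can be built from independent chunks of rows and summed afterward. It uses plain NumPy as a stand-in for a cluster; the helper `gram_from_partitions` is hypothetical and is not meant to reproduce the Spark code from the presentation.

```python
import numpy as np

def gram_from_partitions(partitions):
    """Sum per-partition Gram matrices; equals X.T @ X for the stacked rows."""
    return sum(block.T @ block for block in partitions)

# Hypothetical data: 1,000 users with 8 latent factors, split into 4 chunks.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))
partitions = np.array_split(X, 4)   # each chunk could live on a different worker
XtX = gram_from_partitions(partitions)
```

Each chunk's contribution is a small $k \times k$ matrix, so only those small matrices need to be combined, regardless of how many users each chunk holds.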

DATA643 Discussion2

jbrnbrg

June 18, 2018
