DAT 612 - Discussion 2

Music Recommendations at Scale with Spark by Christopher Johnson (Spotify)

Synopsis:

This was a great video at a high-level of how Spotify make recommendations to its end-users.

Recommendation at Spotify utilized the following:

“Experts” performed manual tagging on certain attributes; these attributes used for recommendations but the dis-advantages is the huge amount of time needed to process the taggings -Audio Content, Metadata, Text Analysis : Look at audio content or text analysis on music blog on news articles. Collaborative Filtering: Spotify uses this. This is based on what listeners are listening to and finding the relationships and then recommending on that basis; related artists

While Netflix, one of the best recommender systems in the business uses Collaborative Filtering & Explicit Factorization, Spotify uses an Implicit Factorization format which has subtle differences.

In explicit factorization:

The motivation here is to get \(r_{ui}\) = \(q^{T}_{i}\)*\(p_{u}\)
Penalized by some \(\lambda\) to correct for overfitting, as illustrated below

Discussion I

However in Spotify’s case, an Implicit Factorization is applied instead,

Instead of explicitly capturing the users’ ratinigs like in Netflix, Spotify simply used a binary system. With 1 = streamed while 0 = never streamed. The structure looks similar to the explicit and the motivation is still the same; minimization of the RSME but on a weighted basis contraints by some regularization factor, \(\lambda\). Regularization is simply a constraint on the miminzation problem to prevent overfitting.

In the you-tube video, the Alternating Least Squares was shown to solve this problem: The main idea is to alternate between holding the songs fixed and solving for the users and holding the users fixed and solve for the songs vectors. This back and forth is why its called the Alternating Least Squares Method:

Initialize user and item vectors to random noise
Fix Item vectors and solve optimal user vectors which makes it a weighted ridge regression
Take the derivative of loss function with respect to user’s vector, set equal to 0, and solve
Result in a system of linear equations with closed form solutions
Fix user vectors and solve for optimal item vector
Repeat until convergence

Summary & Recommendation

Although I was able to follow and understand the motiviation of using factorization and the Alternating Least Squares method at a high-level, I would love for the Professor to show a detailed numerical example as to how this can be implemented at the math and code level. Perhaps start with a simple utility matrix and go from there.
In the case of scaling and the problems associated with scaling to an industrial size, I was “lost” and since I have no experience nor knowledge of Hadoop or Spark, so I again, love to have a detailed explanations of scaling in class either by my peers or the professor.

DAT 612 - Discussion 2

sufian

6/7/2020

Music Recommendations at Scale with Spark by Christopher Johnson (Spotify)

Discussion I

Summary & Recommendation