Synopsis:
This was a great video at a high-level of how Spotify make recommendations to its end-users.
Recommendation at Spotify utilized the following:
While Netflix, one of the best recommender systems in the business uses Collaborative Filtering & Explicit Factorization, Spotify uses an Implicit Factorization format which has subtle differences.
In explicit factorization:
The motivation here is to get \(r_{ui}\) = \(q^{T}_{i}\)*\(p_{u}\)
Penalized by some \(\lambda\) to correct for overfitting, as illustrated below
However in Spotify’s case, an Implicit Factorization is applied instead,
Instead of explicitly capturing the users’ ratinigs like in Netflix, Spotify simply used a binary system. With 1 = streamed while 0 = never streamed. The structure looks similar to the explicit and the motivation is still the same; minimization of the RSME but on a weighted basis contraints by some regularization factor, \(\lambda\). Regularization is simply a constraint on the miminzation problem to prevent overfitting.
In the you-tube video, the Alternating Least Squares was shown to solve this problem: The main idea is to alternate between holding the songs fixed and solving for the users and holding the users fixed and solve for the songs vectors. This back and forth is why its called the Alternating Least Squares Method:
Initialize user and item vectors to random noise
Fix Item vectors and solve optimal user vectors which makes it a weighted ridge regression
Take the derivative of loss function with respect to user’s vector, set equal to 0, and solve
Result in a system of linear equations with closed form solutions
Fix user vectors and solve for optimal item vector
Repeat until convergence
Although I was able to follow and understand the motiviation of using factorization and the Alternating Least Squares method at a high-level, I would love for the Professor to show a detailed numerical example as to how this can be implemented at the math and code level. Perhaps start with a simple utility matrix and go from there.
In the case of scaling and the problems associated with scaling to an industrial size, I was “lost” and since I have no experience nor knowledge of Hadoop or Spark, so I again, love to have a detailed explanations of scaling in class either by my peers or the professor.