Summary and Reflection on Music Recommendations at Scale with Spark - Christopher Johnson (Spotify)
https://www.youtube.com/watch?v=3LBgiFch4_g&feature=youtu.be
Christopher Johnson discussed how to use matrix factorization to build a recommender system. He described writing the ratings matrix, which has one row per user and one column per song, as the product of two lower-dimensional matrices.
Co-clustering groups similar users and similar items into categories at the same time. I read an example on the following website that explained this very clearly: https://datasciencemadesimpler.wordpress.com/tag/alternating-least-squares/ Suppose two users read similar types of news articles. Even if their tastes match perfectly, they are unlikely to have read exactly the same articles, because so many articles are available. Clustering users only by which articles they have read is therefore likely to be unproductive, and co-clustering, which also groups the articles themselves into categories, is more useful.
Matrix factorization can be used to solve co-clustering problems. A diagram in the article I cited above illustrates this: the ratings matrix R can be written as the product of two lower-dimensional matrices, U (user factors) and P (song factors). Each rating in R is modeled as a combination of k effects, and k is the width of both U and P. ("Width" here means the number of columns, as can be seen in the diagram.)
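To make the shapes concrete, here is a minimal sketch in Python with NumPy. The sizes (4 users, 5 songs, k = 2) and the random values are made up for illustration, and I am assuming that U and P each store one row of k numbers per user or song, so the product is written as U times the transpose of P.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_users, n_songs, k = 4, 5, 2

rng = np.random.default_rng(0)
U = rng.normal(size=(n_users, k))  # one row of k factors per user
P = rng.normal(size=(n_songs, k))  # one row of k factors per song

# The full (n_users x n_songs) ratings matrix implied by the two factors.
R_hat = U @ P.T

# A single predicted rating is the dot product of one user row and one song row.
user, song = 1, 3
single_rating = U[user] @ P[song]

print(R_hat.shape)
print(np.isclose(R_hat[user, song], single_rating))
```

The point of the factorization is the size saving: U and P together hold (n_users + n_songs) * k numbers, while R holds n_users * n_songs.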
Alternating Least Squares
To determine the values of the matrices U and P, one can fix P (songs) and use ordinary least squares to solve for U (users). One can then fix U and solve for P, again using ordinary least squares. Johnson described alternating between these two steps until the values converge.
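The back-and-forth procedure can be sketched in a few lines of Python with NumPy. This is only a toy version under assumptions of my own: every entry of R is observed, there is no regularization, and a fixed number of iterations stands in for a convergence check. It is not the Spark implementation from the talk. The function name `als` and its parameters are hypothetical.

```python
import numpy as np

def als(R, k, n_iters=20, seed=0):
    """Toy alternating least squares for a fully observed matrix R."""
    rng = np.random.default_rng(seed)
    n_users, n_songs = R.shape
    U = rng.normal(size=(n_users, k))
    P = rng.normal(size=(n_songs, k))
    errors = []
    for _ in range(n_iters):
        # Fix P and solve P @ U.T ~= R.T for U by ordinary least squares.
        U = np.linalg.lstsq(P, R.T, rcond=None)[0].T
        # Fix U and solve U @ P.T ~= R for P by ordinary least squares.
        P = np.linalg.lstsq(U, R, rcond=None)[0].T
        # Track the reconstruction error after each full pass.
        errors.append(np.linalg.norm(R - U @ P.T))
    return U, P, errors

# Made-up ratings matrix that is exactly rank 2 (6 users, 5 songs).
rng = np.random.default_rng(1)
R = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))

U, P, errors = als(R, k=2)
print(errors[-1])  # reconstruction error after the last pass
```

Each half-step is an ordinary least squares problem because, with one matrix held fixed, the product is linear in the other one. A real ratings matrix is mostly empty, so a practical version would have to restrict each solve to the observed entries and would usually add regularization.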