Christopher Johnson showcased the evolution of Spotify’s music recommender system. The presentation also gave insight into the approaches Spotify uses, such as auto content, meta and text analysis, and content filtering, as opposed to manual curation and manual attribute tagging, which are methods employed by Spotify’s competitors. It also covered explicit and implicit matrix factorization, full and half gridify classification, Hadoop and Spark.
The explicit approach takes a user’s previous ratings (e.g. rating a song 4 out of 5 stars) into consideration and uses them to predict songs that would be of interest to the user. The implicit approach, however, does not take the user’s actual ratings into consideration. Instead, it looks at whether the user has played a song and applies a weight based on the number of times the user has listened to it.
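To make the implicit approach described above more concrete, here is a minimal sketch of how play counts might be turned into the two inputs an implicit factorization works with: a binary "has played" preference and a weight that grows with the number of plays. The play counts, the name `implicit_targets`, the linear weighting formula and the value of `ALPHA` are all assumptions for illustration, not details taken from the presentation.

```python
# Hypothetical play counts: one row per user, one column per song.
play_counts = [
    [0, 3, 12],
    [1, 0, 0],
]

ALPHA = 40.0  # assumed scaling factor, not a value from the talk


def implicit_targets(plays, alpha=ALPHA):
    """Return (preference, confidence) matrices built from raw play counts."""
    # Preference: 1 if the user has played the song at all, otherwise 0.
    preference = [[1 if n > 0 else 0 for n in row] for row in plays]
    # Confidence: a weight that increases with the number of plays
    # (assumed linear form: 1 + alpha * plays).
    confidence = [[1.0 + alpha * n for n in row] for row in plays]
    return preference, confidence


preference, confidence = implicit_targets(play_counts)
```

In an explicit setup the star ratings themselves would be the values to predict. Here the factorization would instead try to predict the 0/1 preference, with errors on frequently played songs weighted more heavily.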
Hadoop and Spark both facilitate the processing of large datasets. However, Hadoop uses disk space for storage during calculations, while Spark uses system memory. The presentation seemed biased towards Spark, since its use of system memory along with caching outperforms Hadoop’s disk I/O approach. However, disk space is far cheaper than system memory, which may imply that deploying a Spark solution could be more costly than a Hadoop implementation.