The video explains how Spotify recommends music to users. With a catalog of more than 40 million songs, Spotify helps users sort through the items with features such as Discover (personalized recommendations), Radio, Related Artists, and Now Playing. Recommendations are generated with techniques such as audio content analysis, metadata, text analysis, and collaborative filtering.
Explicit Matrix Factorization: Users explicitly state their interest in a product, for example by rating it. These ratings are then used to predict a user's preference for similar items. This is done by approximating the original ratings matrix with the product of two low-dimensional matrices, one of user factors and one of item factors. The factors are chosen to reduce the RMSE between predicted and known ratings; a lower RMSE means a better fit.
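The low-rank approximation described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not the method used in the video: the toy ratings matrix, the function names `factorize` and `rmse`, and the hyperparameters (`rank`, `lr`, `reg`, `steps`) are all assumptions made for the example, and plain gradient descent is used for simplicity.

```python
import numpy as np

def factorize(ratings, mask, rank=2, steps=3000, lr=0.01, reg=0.02, seed=0):
    """Approximate `ratings` by U @ V.T using only the observed entries.

    Hypothetical helper: full-batch gradient descent on the squared error
    of the known ratings, with L2 regularization on both factor matrices.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    V = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    for _ in range(steps):
        err = mask * (ratings - U @ V.T)  # error on known ratings only
        U_step = err @ V - reg * U
        V_step = err.T @ U - reg * V
        U += lr * U_step
        V += lr * V_step
    return U, V

def rmse(ratings, mask, U, V):
    """Root mean squared error over the observed ratings."""
    err = mask * (ratings - U @ V.T)
    return float(np.sqrt((err ** 2).sum() / mask.sum()))

# Toy data (assumed): rows are users, columns are songs, 0 means "not rated".
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)
M = (R > 0).astype(float)

U, V = factorize(R, M)
print("RMSE on known ratings:", round(rmse(R, M, U, V), 3))
print("Predicted ratings:")
print(np.round(U @ V.T, 2))
```

The entries of `U @ V.T` at positions where `R` is 0 are the predicted ratings for songs the user has not rated yet.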
Implicit Matrix Factorization: Most of the time explicit data is not available, so what users may like is inferred implicitly from their behavior: what they listen to, their purchase history, browsing history, search patterns, or even mouse movements. One method for this is Alternating Least Squares (ALS). ALS is implemented in Spark and is built for large-scale collaborative filtering problems. The method also copes with the scalability and sparseness of the ratings data, especially with very large datasets.
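To make the ALS idea concrete, here is a small NumPy sketch of one common formulation for implicit feedback, in which play counts become a binary preference weighted by a confidence value. It is not Spark's implementation and the video's exact formulation is not given here: the confidence rule `1 + alpha * count`, the names `implicit_als` and `weighted_loss`, and the toy play counts are assumptions for illustration.

```python
import numpy as np

def implicit_als(counts, rank=2, alpha=40.0, reg=0.1, iters=10, seed=0):
    """Alternating least squares on implicit feedback (illustrative sketch).

    Assumed model: preference p = 1 if the user played the song, else 0;
    confidence c = 1 + alpha * play_count weights each squared error.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = counts.shape
    P = (counts > 0).astype(float)
    C = 1.0 + alpha * counts
    X = rng.normal(scale=0.1, size=(n_users, rank))  # user factors
    Y = rng.normal(scale=0.1, size=(n_items, rank))  # item factors
    lam = reg * np.eye(rank)
    for _ in range(iters):
        # Fix item factors, solve a small weighted least-squares problem per user.
        for u in range(n_users):
            Yw = Y.T * C[u]  # each item column scaled by its confidence
            X[u] = np.linalg.solve(Yw @ Y + lam, Yw @ P[u])
        # Fix user factors, solve the same kind of problem per item.
        for i in range(n_items):
            Xw = X.T * C[:, i]
            Y[i] = np.linalg.solve(Xw @ X + lam, Xw @ P[:, i])
    return X, Y

def weighted_loss(counts, X, Y, alpha=40.0, reg=0.1):
    """Confidence-weighted squared error plus L2 penalty (the ALS objective)."""
    P = (counts > 0).astype(float)
    C = 1.0 + alpha * counts
    fit = (C * (P - X @ Y.T) ** 2).sum()
    return float(fit + reg * ((X ** 2).sum() + (Y ** 2).sum()))

# Toy play counts (assumed): rows are users, columns are songs.
plays = np.array([
    [3, 0, 1, 0, 0],
    [5, 1, 0, 0, 0],
    [0, 0, 0, 4, 2],
    [0, 0, 1, 3, 6],
], dtype=float)

X, Y = implicit_als(plays)
print("objective:", round(weighted_loss(plays, X, Y), 3))
print("scores:")
print(np.round(X @ Y.T, 2))
```

Each alternating step has a closed-form solution and the per-user (or per-item) solves are independent of one another, which is what makes the method straightforward to distribute across a cluster.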
Spark vs. Hadoop: When processing the data, Hadoop has a lot of overhead because it has to read from and write to disk on every iteration. Spark, on the other hand, loads the data into memory, caches it, and joins the other data to where the ratings are cached. Processing speed therefore differs significantly: Spark may be up to 100 times faster.
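The difference in I/O pattern can be illustrated with a toy example in plain Python. This is not Spark or Hadoop code; it only counts how often an iterative job goes back to disk with and without an in-memory cache. The class `RatingsSource`, the function `run_iterations`, and the sample file are hypothetical and exist only for this sketch.

```python
import csv
import os
import tempfile

class RatingsSource:
    """Hypothetical helper that counts how often the ratings file is read."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0
        self._cache = None

    def load(self, use_cache):
        # Serve from memory when caching is enabled and the data is loaded.
        if use_cache and self._cache is not None:
            return self._cache
        self.disk_reads += 1
        with open(self.path, newline="") as f:
            rows = [(user, song, float(rating)) for user, song, rating in csv.reader(f)]
        if use_cache:
            self._cache = rows
        return rows

def run_iterations(source, iterations, use_cache):
    """Stand-in for an iterative algorithm that needs the ratings every pass."""
    total = 0.0
    for _ in range(iterations):
        rows = source.load(use_cache)
        total += sum(rating for _, _, rating in rows)
    return total

# Write a tiny sample ratings file (assumed data) to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "ratings.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([("u1", "s1", 3), ("u1", "s2", 1), ("u2", "s1", 5)])

disk_bound = RatingsSource(path)  # re-reads the file on every iteration
cached = RatingsSource(path)      # reads once, then reuses memory
total_disk = run_iterations(disk_bound, iterations=10, use_cache=False)
total_mem = run_iterations(cached, iterations=10, use_cache=True)

print("disk reads without cache:", disk_bound.disk_reads)
print("disk reads with cache:", cached.disk_reads)
```

Both runs compute the same result; only the number of trips to disk differs, and that number grows with the iteration count in the uncached case.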