For this discussion item, please watch the following talk and summarize what you found to be the most important or interesting points. The first half covers some of the mathematical techniques from this unit’s reading, and the second half covers some of the data management challenges in an industrial-scale recommendation system. The video, on music recommendations at scale, can be found at http://www.youtube.com/watch?v=3LBgiFch4_g
This video showed how Spotify recommends music using Spark. Spotify is a platform with a large music catalogue, and it relies mainly on collaborative filtering to recommend songs to its users. It delivers personalized recommendations through several channels, e.g. radio, now playing, and discover. Beyond collaborative filtering, Spotify also draws on audio content, text, and metadata. It uses binary labels, with 1 meaning streamed and 0 meaning never streamed, and then measures RMSE to evaluate the recommendation model. Spotify originally used Hadoop, but because of the growing volume of data it moved to Spark in 2014. The reason given was that Spark ran more efficiently than Hadoop, saving time and memory, which mattered a great deal given the 40 million users and 20 million songs involved. Staying on Hadoop would have become very inefficient at that scale. Also, one limitation was that it could not run the recommendation system on the entire dataset.
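As a rough illustration of the evaluation idea mentioned above (binary labels scored with RMSE), here is a minimal Python sketch. It is not Spotify's code: the function names `binary_labels` and `rmse`, the toy stream counts, and the predicted scores are all made up for this example.

```python
import math

def binary_labels(stream_counts):
    """Map raw stream counts to 1 (streamed) or 0 (never streamed)."""
    return [[1 if count > 0 else 0 for count in row] for row in stream_counts]

def rmse(labels, predictions):
    """Root-mean-square error between binary labels and predicted scores."""
    squared_errors = [
        (p - y) ** 2
        for label_row, pred_row in zip(labels, predictions)
        for y, p in zip(label_row, pred_row)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical toy data: rows are users, columns are songs.
stream_counts = [[3, 0, 1], [0, 0, 7]]
# Hypothetical scores that a recommendation model might output.
predictions = [[0.9, 0.2, 0.6], [0.1, 0.3, 0.8]]

labels = binary_labels(stream_counts)
error = rmse(labels, predictions)
print(labels)
print(round(error, 4))
```

A lower RMSE means the predicted scores sit closer to the 0/1 streaming labels, so the number gives a simple way to compare two versions of a model on the same data.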