Summary of interesting points

Spotify has about 40M audio files in its data store; a catalog that large definitely requires an automated recommender system
ALS - Applied the Alternating Least Squares (ALS) method for matrix factorization
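To make the ALS point concrete, here is a minimal pure-Python sketch for explicit ratings; the implicit variant used at Spotify adds per-entry confidence weights on top of the same idea. It alternates between fixing the item vectors and solving a small ridge regression for every user, then fixing the users and solving for every item. The function names and defaults (`als`, `rank`, `lam`, `iterations`) are our own illustration, not Spotify's code.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def update(fixed, entries_by_row, rank, lam):
    """Holding the `fixed` vectors constant, solve one ridge regression per row."""
    new = {}
    for row, entries in entries_by_row.items():
        # Normal equations: (sum_j y_j y_j^T + lam*I) x = sum_j r_j y_j
        a = [[lam if p == q else 0.0 for q in range(rank)] for p in range(rank)]
        b = [0.0] * rank
        for col, r in entries:
            y = fixed[col]
            for p in range(rank):
                b[p] += r * y[p]
                for q in range(rank):
                    a[p][q] += y[p] * y[q]
        new[row] = solve(a, b)
    return new


def als(ratings, rank=2, lam=0.1, iterations=10, seed=0):
    """ratings: list of (user, item, value) triples for observed entries only."""
    rng = random.Random(seed)
    by_user, by_item = {}, {}
    for u, i, r in ratings:
        by_user.setdefault(u, []).append((i, r))
        by_item.setdefault(i, []).append((u, r))
    items = {i: [rng.random() for _ in range(rank)] for i in by_item}
    users = {}
    for _ in range(iterations):
        users = update(items, by_user, rank, lam)  # fix items, solve for users
        items = update(users, by_item, rank, lam)  # fix users, solve for items
    return users, items


def predict(users, items, u, i):
    """The predicted rating is the dot product of the user and item vectors."""
    return sum(x * y for x, y in zip(users[u], items[i]))
```

For example, `users, items = als([(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)])` followed by `predict(users, items, 1, 1)` fills in the unobserved cell. Because each half-step solves its subproblem exactly, the regularized squared error never increases from one iteration to the next.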
Scaling with Hadoop - Applied implicit matrix factorization, but ran into heavy I/O overhead, since each Hadoop iteration reads its input from disk and writes its output back to disk
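A common way to write implicit matrix factorization, assumed here rather than taken from the talk, is the confidence-weighted objective of Hu, Koren and Volinsky. In it, r_ui is the raw interaction count for user u and item i (for Spotify, plausibly play counts), p_ui is a binary preference, c_ui is a confidence weight, and alpha and lambda are tuning constants. Fixing the item matrix Y gives each user vector x_u in closed form, which is the step ALS repeats:

```latex
\min_{X,Y} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda\Bigl(\sum_u \lVert x_u\rVert^2 + \sum_i \lVert y_i\rVert^2\Bigr),
\qquad
p_{ui} = \begin{cases} 1 & r_{ui} > 0 \\ 0 & r_{ui} = 0 \end{cases},
\qquad
c_{ui} = 1 + \alpha\, r_{ui}

x_u = \bigl(Y^{\top} C^{u} Y + \lambda I\bigr)^{-1} Y^{\top} C^{u}\, p(u),
\qquad
Y^{\top} C^{u} Y = Y^{\top} Y + Y^{\top}\bigl(C^{u} - I\bigr) Y
```

Here C^u is the diagonal matrix of c_ui for user u and p(u) is that user's preference vector. The sum runs over every user-item pair, not just observed ones, so the second identity matters: Y^T Y is computed once per iteration, and C^u - I is nonzero only on the items user u actually played.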
Scaling with Spark - Applied the Gridify and Half Gridify techniques, both of which ran much faster than the Hadoop version
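One way to read the two Spark layouts is sketched below in plain Python, under our own assumptions (block assignment by ID modulo block count, hypothetical function names). Full Gridify buckets ratings into a K x L grid of (user block, item block) cells; Half Gridify buckets them by user block only, so each block must hold the vectors of every item its users rated.

```python
def full_gridify(ratings, k, l):
    """Bucket (user, item, value) triples into a K x L grid of (user block, item block) cells."""
    grid = {}
    for u, i, r in ratings:
        grid.setdefault((u % k, i % l), []).append((u, i, r))
    return grid


def half_gridify(ratings, k):
    """Bucket triples by user block only, so all of a user's ratings share one block."""
    blocks = {}
    for u, i, r in ratings:
        blocks.setdefault(u % k, []).append((u, i, r))
    return blocks


def item_vectors_needed(cells):
    """For each cell or block, the item IDs whose factor vectors it must hold."""
    return {key: sorted({i for _, i, _ in triples}) for key, triples in cells.items()}


def users_split_across_cells(cells):
    """Users whose ratings land in more than one cell, so their partial results need aggregating."""
    seen = {}
    for key, triples in cells.items():
        for u, _, _ in triples:
            seen.setdefault(u, set()).add(key)
    return sorted(u for u, keys in seen.items() if len(keys) > 1)
```

In the full grid a user's ratings can be spread over up to L cells, so per-user partial sums have to be shuffled and aggregated before each solve; in the half grid they sit in one block, so the solve can run locally at the cost of more item vectors in memory per block. That trade-off is one plausible reason Half Gridify came out faster.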
Speed - Half Gridify was the fastest technique, and it delivered results when tested on a huge dataset
Learnings - Use PairRDDFunctions to group data by key and assign each resulting partition to a node to work on
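The PairRDDFunctions point can be illustrated without a cluster. The plain-Python sketch below mimics what `rdd.partitionBy(new HashPartitioner(n))` followed by `groupByKey` does for integer keys: Spark's `HashPartitioner` takes the key's `hashCode` modulo n (made non-negative), and a Java `Int`'s `hashCode` is the value itself. The helper names and the play-count records are hypothetical.

```python
def hash_partition(pairs, num_partitions):
    """Mimic partitionBy(new HashPartitioner(n)) for int keys: partition = key mod n."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[key % num_partitions].append((key, value))
    return parts


def group_by_key(partition):
    """Mimic groupByKey inside one partition: gather every value seen for each key."""
    groups = {}
    for key, value in partition:
        groups.setdefault(key, []).append(value)
    return groups


# Hypothetical play-count records keyed by user ID: (user, (track, plays)).
plays = [(0, ("a", 3)), (1, ("b", 1)), (2, ("c", 5)), (0, ("d", 2)), (3, ("a", 1))]
partitions = hash_partition(plays, 2)
per_node = [group_by_key(p) for p in partitions]
```

Every record for a given user lands in the same partition, so the node holding that partition can process the user without further data movement. In Spark, partitioning once and caching the result also lets later key-based operations that use the same partitioner, such as joining ratings with user vectors, reuse the layout instead of reshuffling.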
Learnings - Writing your own serializers or using Kryo works better than relying on the default Java serializer
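Switching to Kryo is a configuration change. A minimal `spark-defaults.conf` fragment, assuming a recent Spark release, might look like the following; the class names are hypothetical placeholders for an application's own record types. Hand-written serializers are usually plugged in through a registrator class named by `spark.kryo.registrator`.

```properties
# Replace the default JavaSerializer with Kryo
spark.serializer                  org.apache.spark.serializer.KryoSerializer
# Optional: fail fast if an unregistered class gets serialized
spark.kryo.registrationRequired   true
# Comma-separated classes to register (hypothetical names)
spark.kryo.classesToRegister      com.example.als.Rating,com.example.als.FactorBlock
# Raise the maximum buffer if large factor blocks overflow it
spark.kryoserializer.buffer.max   256m
```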
Learnings - Running with larger datasets results in failed executors

612_Spark_RecommenderSystem_discussion