Salient points from “Music Recommendations at Scale with Spark”: https://www.youtube.com/watch?v=3LBgiFch4_g
Here are the points I found salient and have retained.
Software development is an iterative process.
RDDs of key-value pairs are the best practice for recommender systems (RecSys) on big datasets.
Running with larger datasets often results in failed executors and a job that never recovers, so partition the data.
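To make the key-value and partitioning points concrete, here is a minimal sketch in plain Python rather than Spark. It only imitates the idea of hash-partitioning a pair dataset by key so that each worker handles a bounded slice. The toy ratings and the helper names `partition_by_key` and `group_by_key` are my own assumptions, not anything from the talk. In PySpark, the comparable operation on a pair RDD would be `partitionBy(numPartitions)`.

```python
# Hypothetical toy data: (user_id, (item_id, play_count)) pairs.
ratings = [
    (1, (10, 3.0)),
    (2, (10, 1.0)),
    (1, (11, 5.0)),
    (3, (12, 2.0)),
    (4, (11, 4.0)),
    (2, (12, 1.0)),
]


def partition_by_key(pairs, num_partitions):
    """Send each (key, value) pair to the partition chosen by hashing its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def group_by_key(partition):
    """Collect all values for each key inside a single partition."""
    grouped = {}
    for key, value in partition:
        grouped.setdefault(key, []).append(value)
    return grouped


partitions = partition_by_key(ratings, num_partitions=2)
per_partition = [group_by_key(p) for p in partitions]

for index, grouped in enumerate(per_partition):
    print(index, grouped)
```

Because every pair with the same key lands in the same partition, per-user work needs no data from other partitions, and one oversized slice is less likely to take down an executor.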
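Spark is much faster than Hadoop MapReduce for this kind of iterative workload, because RDDs can be cached in memory between iterations instead of being reread from disk.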
ALS (alternating least squares) and other basic matrix factorization algorithms have a place in big business.
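As a reminder of what ALS actually does, here is a minimal sketch in plain Python, assuming explicit ratings and a single latent factor per user and item. This is a simplification of my own, not the model from the talk, and the toy data and the names `als_rank1` and `objective` are hypothetical. Each half-step fixes one side's factors and solves the regularized least-squares problem for the other side in closed form.

```python
def als_rank1(ratings, num_users, num_items, reg=0.1, iterations=10):
    """Rank-1 ALS on (user, item, rating) triples; returns (user, item) factors."""
    x = [1.0] * num_users  # one factor per user
    y = [1.0] * num_items  # one factor per item
    for _ in range(iterations):
        # Fix item factors, solve for each user: x_u = sum(r*y) / (sum(y^2) + reg).
        num = [0.0] * num_users
        den = [reg] * num_users
        for u, i, r in ratings:
            num[u] += r * y[i]
            den[u] += y[i] * y[i]
        x = [n / d for n, d in zip(num, den)]
        # Fix user factors, solve for each item in the same way.
        num = [0.0] * num_items
        den = [reg] * num_items
        for u, i, r in ratings:
            num[i] += r * x[u]
            den[i] += x[u] * x[u]
        y = [n / d for n, d in zip(num, den)]
    return x, y


def objective(ratings, x, y, reg):
    """Squared error on observed ratings plus an L2 penalty on all factors."""
    squared_error = sum((r - x[u] * y[i]) ** 2 for u, i, r in ratings)
    penalty = reg * (sum(v * v for v in x) + sum(v * v for v in y))
    return squared_error + penalty


# Hypothetical toy data: (user, item, rating) triples.
toy_ratings = [
    (0, 0, 5.0), (0, 1, 3.0),
    (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 2.0), (2, 2, 1.0),
]

x, y = als_rank1(toy_ratings, num_users=3, num_items=3)
print(objective(toy_ratings, x, y, reg=0.1))
```

Each user's update depends only on that user's ratings and the current item factors (and vice versa), which is why the algorithm maps well onto partitioned key-value data. For real workloads, Spark MLlib ships its own ALS implementation.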