Discussion 2

Salient points in “Music recommendations at Scale with Spark”: https://www.youtube.com/watch?v=3LBgiFch4_g

Here are the points I found salient and have retained.

Software development is an iterative process.

RDD-based key-value pairs are a best practice for recommender systems (RecSys) on big datasets.

Running on larger datasets often results in failed executors, and the job never recovers, so partition the data.
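
To make the key-value and partitioning points concrete for myself, here is a minimal plain-Python sketch of hash partitioning by key. It is not Spark code and not the approach shown in the talk; the function names and the (user_id, (track_id, play_count)) record layout are my own assumptions for illustration. The idea it shows is that once all pairs for a key sit in one partition, each partition can be processed on its own and stays small enough for one worker.

```python
from collections import defaultdict

def partition_by_key(pairs, num_partitions):
    """Assign each (key, value) pair to a partition by hashing its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

def group_by_user(partition):
    """Group one partition's values by key; no other partition is consulted."""
    grouped = defaultdict(list)
    for user, play in partition:
        grouped[user].append(play)
    return dict(grouped)

# Hypothetical (user_id, (track_id, play_count)) records.
plays = [(1, ("a", 3)), (2, ("b", 1)), (1, ("c", 7)), (3, ("a", 2)), (2, ("c", 4))]

parts = partition_by_key(plays, num_partitions=2)
results = [group_by_user(p) for p in parts]
```

Because the partition is chosen from the key alone, every record for a given user lands in the same partition, so the per-user grouping needs no data from any other partition.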

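Spark is much faster than Hadoop MapReduce when working with RDDs, since RDDs can be cached in memory between iterations instead of being re-read from disk.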

ALS and other basic matrix factorization algorithms have a place in big business.
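
For reference, this is roughly what alternating least squares (ALS) does, as a small NumPy sketch. It is not the Spark implementation from the talk; the toy ratings matrix, the `rank` and `reg` values, and the function names are assumptions I picked for illustration. A zero in `R` is treated as "not observed".

```python
import numpy as np

def als(R, mask, rank=2, reg=0.1, iters=20, seed=0):
    """Factor R ~ U @ V.T using only the entries where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    ridge = reg * np.eye(rank)
    for _ in range(iters):
        # Hold V fixed and solve a small ridge regression for each user.
        for u in range(n_users):
            Vo = V[mask[u]]
            U[u] = np.linalg.solve(Vo.T @ Vo + ridge, Vo.T @ R[u, mask[u]])
        # Hold U fixed and solve a small ridge regression for each item.
        for i in range(n_items):
            Uo = U[mask[:, i]]
            V[i] = np.linalg.solve(Uo.T @ Uo + ridge, Uo.T @ R[mask[:, i], i])
    return U, V

def rmse(R, mask, U, V):
    """Root-mean-square error over the observed entries only."""
    err = (R - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

# Toy user-by-item ratings; 0 means the user has not rated the item.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0

U, V = als(R, mask)
predicted = U @ V.T  # includes estimates for the unobserved cells
```

Each half-step is a set of independent least-squares problems, one per user or one per item, which is the property that makes the method easy to spread across partitions.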