Please complete the research discussion assignment in a Jupyter or R Markdown notebook. You should post the GitHub link to your research in a new discussion thread.

For this discussion item, please watch the following talk and summarize what you found to be the most important or interesting points. The first half covers some of the mathematical techniques from this unit's reading, and the second half covers some of the data management challenges in an industrial-scale recommendation system.

Please make your post before our meetup on Thursday, and respond to at least one other student's post by our meetup on Tuesday.

Comparing Spark and Hadoop, Hadoop's weakness is that it has to read from and write to disk on every iteration of the algorithm. Spark, by contrast, can load the rating matrix into memory, which avoids rereading the matrix from disk on each iteration. What is interesting is the difference in running time: Hadoop took 10 hours to run the alternating least squares (ALS) algorithm, while Spark took 3.5 hours with full gridify and 1.5 hours with half gridify.
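To make the iteration pattern concrete, here is a minimal single-machine sketch of alternating least squares in plain Python. It is not the implementation from the talk: it assumes a rank-1 factorization (one number per user and per item) so each update has a closed form, and the function name `als_rank1`, the `reg` parameter, and the toy ratings are all made up for illustration. The point to notice is that the `ratings` dictionary is built once and stays in memory while every iteration reuses it, which is the access pattern the post credits for the shorter running time.

```python
def als_rank1(ratings, reg=0.1, iterations=10):
    """Rank-1 ALS sketch. `ratings` maps (user, item) -> rating and is
    kept in memory for the whole run (hypothetical helper, not from the talk)."""
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    user_f = {u: 1.0 for u in users}   # one latent factor per user
    item_f = {i: 1.0 for i in items}   # one latent factor per item
    losses = []

    for _ in range(iterations):
        # Step 1: hold item factors fixed, solve each user's
        # regularized least-squares problem in closed form.
        for u in users:
            obs = [(i, r) for (uu, i), r in ratings.items() if uu == u]
            num = sum(item_f[i] * r for i, r in obs)
            den = sum(item_f[i] ** 2 for i, _ in obs) + reg
            user_f[u] = num / den

        # Step 2: hold user factors fixed, solve for each item the same way.
        for i in items:
            obs = [(u, r) for (u, ii), r in ratings.items() if ii == i]
            num = sum(user_f[u] * r for u, r in obs)
            den = sum(user_f[u] ** 2 for u, _ in obs) + reg
            item_f[i] = num / den

        # Regularized squared error over the observed ratings only.
        fit = sum((r - user_f[u] * item_f[i]) ** 2
                  for (u, i), r in ratings.items())
        penalty = reg * (sum(x ** 2 for x in user_f.values())
                         + sum(x ** 2 for x in item_f.values()))
        losses.append(fit + penalty)

    return user_f, item_f, losses


# Toy data, invented for this example.
toy_ratings = {
    ("a", "x"): 5.0, ("a", "y"): 3.0,
    ("b", "x"): 4.0, ("b", "z"): 1.0,
    ("c", "y"): 2.0, ("c", "z"): 1.0,
}
user_f, item_f, losses = als_rank1(toy_ratings, iterations=15)
print(losses[0], losses[-1])
```

Because each step exactly minimizes the objective over one set of factors while the other is fixed, the recorded loss should never increase from one iteration to the next. A system that had to reload `ratings` from disk at the top of every loop pass would do the same arithmetic but pay the I/O cost each time.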