Please complete the research discussion assignment in a Jupyter or R Markdown notebook. You should post the GitHub link to your research in a new discussion thread.
For this discussion item, please watch the following talk and summarize what you found to be the most important or interesting points. The first half will cover some of the mathematical techniques covered in this unit’s reading and the second half some of the data management challenges in an industrial-scale recommendation system.
Please make your post before our meetup on Thursday, and respond to at least one other student’s posts by our meetup on Tuesday.
Chris Johnson explains several concepts very clearly in this video. These are the ones I found most interesting:
Manually tag attributes
Collaborative Filtering
Implicit/Explicit Matrix Factorization
Audio Content, Metadata, Text Analysis
Implicit Matrix Factorization with Hadoop
Technique
Running on larger datasets often resulted in failed executions because it required a lot of local memory. They use Hadoop and Spark to solve the issue. The ratings matrix is loaded as a set of blocks, grouped into K x L partitions. He partitions by block ID and caches the blocks in memory, which reduces the time spent reading from disk on every iteration. Each iteration can then join against the cached ratings and keep iterating. This also reduces the memory requirements.
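To make the blocking idea concrete, here is a minimal pure-Python sketch of how I understood it. It is not the code from the talk: the toy ratings, the modulo-based block assignment, and the helper names (`block_id`, `partition_ratings`, `ratings_for_user_block`) are all my own assumptions for illustration. In Spark, the roughly analogous steps would be partitioning an RDD by key and caching it.

```python
from collections import defaultdict

def block_id(user, item, K, L):
    """Assign a rating to one of K x L blocks (assumed scheme: modulo on the ids)."""
    return (user % K, item % L)

def partition_ratings(ratings, K, L):
    """Group (user, item, value) triples by block id."""
    blocks = defaultdict(list)
    for user, item, value in ratings:
        blocks[block_id(user, item, K, L)].append((user, item, value))
    return dict(blocks)

def ratings_for_user_block(cached, k, L):
    """Collect every rating whose user falls in user-block k (one row of blocks)."""
    rows = []
    for l in range(L):
        rows.extend(cached.get((k, l), []))
    return rows

# Toy data (hypothetical): (user_id, item_id, play_count)
ratings = [(0, 0, 3), (0, 1, 1), (1, 2, 5), (2, 1, 2), (3, 3, 4), (3, 0, 1)]
K, L = 2, 2

# Partition once and keep the result in memory; every later iteration
# reads from this cache instead of reloading the ratings from disk.
cached_blocks = partition_ratings(ratings, K, L)

for iteration in range(3):
    for k in range(K):
        block_ratings = ratings_for_user_block(cached_blocks, k, L)
        # A real ALS step would update the user vectors for block k here,
        # using only block_ratings and the matching item vectors.
        print(iteration, k, len(block_ratings))
```

The point of the sketch is that each worker only needs the ratings in its own blocks, so no single machine has to hold the full ratings matrix, and the cached blocks are reused across iterations.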
**Runtime**