Your task is to analyze an existing recommender system that you find interesting. You should:
- Perform a Scenario Design analysis as described below. Consider whether it makes sense for your selected recommender system to perform scenario design twice, once for the organization (e.g. Amazon.com) and once for the organization’s customers.
- Attempt to reverse engineer what you can about the site, from the site interface and any available information that you can find on the Internet or elsewhere.
- Include specific recommendations about how to improve the site’s recommendation capabilities going forward.
- Create your report using an R Markdown file, and create a discussion thread with a link to the GitHub repo where your Markdown file notebook resides. You are not expected to need to write code for this discussion assignment.
Definition:
Collaborative filtering (CF) is a recommender systems technique that helps people discover items that are most relevant to them.
What are the target users?
Facebook’s average dataset for CF has:
- 100 billion ratings
- more than a billion users
- millions of items
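To put these numbers in perspective, here is a back-of-the-envelope calculation. It assumes exactly 100 billion ratings, 1 billion users and 1 million items (the low end of "millions"); these exact counts are assumptions, since the source only gives approximate figures. Because the real user and item counts are at least this large, the density below is an upper bound.

```python
# Back-of-the-envelope figures. The exact counts are assumptions: the source
# only says "100 billion", "more than a billion" and "millions".
n_ratings = 100e9
n_users = 1e9
n_items = 1e6  # assumed low end of "millions of items"

# Fraction of the full user-item matrix that actually holds a rating.
density = n_ratings / (n_users * n_items)
# Average number of ratings contributed by one user.
ratings_per_user = n_ratings / n_users

print(f"density ~ {density:.0e}")                    # prints density ~ 1e-04
print(f"ratings per user ~ {ratings_per_user:.0f}")  # prints ratings per user ~ 100
```

In other words, at most about 0.01% of the possible (user, item) cells are filled, which is why CF methods have to generalize from very sparse data.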
What are their key goals?
Facebook uses the historical item ratings of like-minded people to predict how someone would rate an item. Items might include pages, groups, events, games, and more. CF is based on the idea that the best recommendations come from people who have similar tastes.
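The "like-minded people" idea can be illustrated with classic user-based CF. The sketch below is a minimal illustration, not Facebook's method (which, as described later, relies on matrix factorization): it measures how similar two users are with the Pearson correlation over the items both have rated, then predicts a rating as the similarity-weighted average of ratings from positively correlated users. The users, items and ratings are hypothetical toy data.

```python
import math

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"page_a": 5, "group_b": 3, "event_c": 4},
    "bob": {"page_a": 4, "group_b": 2, "event_c": 5, "game_d": 4},
    "carol": {"page_a": 1, "group_b": 5, "game_d": 1},
    "dave": {"page_a": 5, "group_b": 2, "game_d": 5},
}


def pearson(u, v):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    du = [u[i] - mean_u for i in common]
    dv = [v[i] - mean_v for i in common]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return num / den if den else 0.0


def predict(user, item, ratings=RATINGS):
    """Similarity-weighted average of the item's ratings by like-minded users."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = pearson(ratings[user], theirs)
        if sim <= 0:  # skip users whose tastes are dissimilar or unknown
            continue
        num += sim * theirs[item]
        den += sim
    return num / den if den else None  # None: no like-minded user rated it


print(round(predict("alice", "game_d"), 2))  # prints 4.6
```

Carol's tastes run opposite to Alice's (negative correlation), so her low rating of `game_d` is ignored, while Dave and Bob pull the prediction toward their high ratings.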
How can you help them accomplish their goals?
- By better matrix factorization, or by a parent/child tree implementation
- By better item mapping and iterations
- By smart partitioning and provisioning to reduce the huge amount of network traffic
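Matrix factorization learns a short latent vector for every user and every item so that their dot product approximates each known rating. The sketch below fits such factors with stochastic gradient descent, one common way to train MF. It is a single-machine illustration under stated assumptions and is not claimed to be Facebook's implementation, which distributes the work across many workers; the rank `k`, learning rate `lr`, regularization `reg`, epoch count and the rating triples are all hypothetical choices.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_mf(triples, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Fit user factors P and item factors Q so that dot(P[u], Q[i]) ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - dot(P[u], Q[i])
            for f in range(k):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization,
                # using the old values of both factors.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q


def rmse(triples, P, Q):
    """Root-mean-square error of the factor model on the given ratings."""
    se = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in triples)
    return math.sqrt(se / len(triples))


# Hypothetical (user, item, rating) triples: 4 users, 4 items.
TRIPLES = [(0, 0, 5), (0, 1, 3), (0, 2, 4), (1, 0, 4), (1, 1, 2), (1, 3, 4),
           (2, 1, 5), (2, 3, 1), (3, 0, 5), (3, 2, 5)]

P0, Q0 = train_mf(TRIPLES, 4, 4, epochs=0)  # untrained starting point
P, Q = train_mf(TRIPLES, 4, 4)
print(rmse(TRIPLES, P0, Q0), rmse(TRIPLES, P, Q))
```

Once trained, a missing rating such as user 2's rating of item 0 is estimated as `dot(P[2], Q[0])`.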
Item recommendation computation:
To get the actual recommendations for all users, we need to find the items with the highest predicted ratings for each user, where a predicted rating is the dot product of the user's and the item's factor vectors. With data sets this large, computing that dot product for every (user, item) pair becomes infeasible, even if we distribute the problem across more workers. We need a faster way to find the top K recommendations for each user, or a good approximation of them.
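For reference, the exact approach being ruled out looks like this: score every item for a user and keep the K best. Each user costs one dot product per item, so with more than a billion users and millions of items the full job needs at least 10^9 × 10^6 = 10^15 dot products. The function name and the toy vectors below are hypothetical.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k_brute_force(user_vec, item_vecs, k):
    """Exact top-K: score every item, keep the ids of the K highest scores.

    Cost is O(n_items * d) per user, i.e. O(n_users * n_items * d) overall.
    """
    return heapq.nlargest(k, range(len(item_vecs)),
                          key=lambda j: dot(user_vec, item_vecs[j]))


# Hypothetical 2-dimensional item factor vectors.
items = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(top_k_brute_force([1.0, 0.5], items, 2))  # prints [2, 0]
```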
One possible solution is to use a ball tree data structure to hold the item vectors. A ball tree is a binary tree whose leaves contain subsets of the item vectors, and in which each inner node defines a ball that surrounds all vectors within its subtree.
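The reason a ball helps is a simple bound: for a query (user) vector q and any item vector x inside a ball with center c and radius r, the Cauchy-Schwarz inequality gives q·x ≤ q·c + ‖q‖·r. If that bound cannot beat the current K-th best score, the whole subtree can be skipped. The sketch below is a minimal exact top-K search built on that bound; the split rule (widest dimension at its median), the leaf size and all names are hypothetical choices, and this is not a description of Facebook's code.

```python
import heapq
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class BallNode:
    """Ball tree node: a center and radius enclosing every item vector below it."""

    def __init__(self, ids, vecs, leaf_size=8):
        pts = [vecs[i] for i in ids]
        dim = len(pts[0])
        self.center = [sum(p[f] for p in pts) / len(pts) for f in range(dim)]
        self.radius = max(
            math.sqrt(sum((p[f] - self.center[f]) ** 2 for f in range(dim))) for p in pts
        )
        self.ids, self.children = ids, []
        if len(ids) > leaf_size:
            # Split on the widest dimension at its median (one simple choice).
            f = max(range(dim), key=lambda d: max(p[d] for p in pts) - min(p[d] for p in pts))
            ordered = sorted(ids, key=lambda i: vecs[i][f])
            mid = len(ordered) // 2
            self.children = [BallNode(ordered[:mid], vecs, leaf_size),
                             BallNode(ordered[mid:], vecs, leaf_size)]
            self.ids = []  # inner nodes hold no vectors directly

    def bound(self, q):
        # Upper bound on dot(q, x) for any x inside this ball (Cauchy-Schwarz).
        return dot(q, self.center) + math.sqrt(dot(q, q)) * self.radius


def _search(node, q, k, vecs, heap):
    # heap is a min-heap of (score, id); heap[0] is the current K-th best.
    if len(heap) == k and node.bound(q) <= heap[0][0]:
        return  # nothing in this ball can beat the current top K
    for i in node.ids:
        s = dot(q, vecs[i])
        if len(heap) < k:
            heapq.heappush(heap, (s, i))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, i))
    # Visit the more promising child first so pruning kicks in sooner.
    for child in sorted(node.children, key=lambda c: c.bound(q), reverse=True):
        _search(child, q, k, vecs, heap)


def top_k(root, q, k, vecs):
    """Return the ids of the k items with the largest dot product with q."""
    heap = []
    _search(root, q, k, vecs, heap)
    return [i for _, i in sorted(heap, reverse=True)]


# Hypothetical deterministic item vectors in 3 dimensions.
vecs = [[math.sin(i * (f + 1.7)) for f in range(3)] for i in range(100)]
root = BallNode(list(range(100)), vecs)
print(top_k(root, [0.3, -0.5, 0.8], 5, vecs))
```

This version returns exactly the same items as scoring every item, but only descends into balls that could still contain a top-K item. Stopping the search early, for example after a fixed number of leaves, would turn it into the kind of "good approximation" the text mentions.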