I will be building a recommender system on the Book-Crossing dataset, which contains 1.1 million ratings, 270,000 items, and 90,000 users. Some ratings are explicit, on a scale from 1 to 10, and the rest are implicit. The dataset poses two main challenges: the mix of rating types and the sparsity of the ratings.
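To make the two challenges concrete, here is a minimal sketch in plain Python (not Spark) that separates the two rating types and estimates how full the rating matrix is. It assumes that implicit feedback is stored as a rating of 0, as in the commonly distributed CSV release of the dataset, and that the 90,000 figure above counts users. The function names and the sample rows are hypothetical.

```python
def split_ratings(rows):
    """Split (user, isbn, rating) rows into explicit and implicit feedback.

    Assumption: a rating of 0 marks an implicit interaction, while 1-10 is
    an explicit score.
    """
    explicit = [(u, b, r) for u, b, r in rows if 1 <= r <= 10]
    implicit = [(u, b) for u, b, r in rows if r == 0]
    return explicit, implicit


def density(n_ratings, n_users, n_items):
    """Fraction of the user-item matrix that holds an observed rating."""
    return n_ratings / (n_users * n_items)


# Made-up sample rows, for illustration only.
sample = [("u1", "isbn-a", 0), ("u1", "isbn-b", 8), ("u2", "isbn-a", 5)]
explicit, implicit = split_ratings(sample)

# Rounded figures quoted in the proposal.
fill = density(1_100_000, 90_000, 270_000)
print(f"explicit={len(explicit)} implicit={len(implicit)} density={fill:.6%}")
```

With these rounded figures, fewer than 5 in every 100,000 cells of the matrix are filled, which is why the cold-start handling described next matters.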
I hope to implement two versions of this recommender. The first will primarily use an ALS algorithm on the implicit ratings, and the second will use a FunkSVD algorithm on the explicit ratings. Because the rating matrix is so sparse, both recommenders will need hybrid components to handle the cold-start problem. Users whose number of ratings falls below a certain threshold will be given content-based recommendations. I will also hold out some users from the dataset and use a user-based collaborative filtering system to make predictions for them as new users, comparing similarity between users within the latent factor space.
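As a rough illustration of the explicit-rating side of this plan, the sketch below fits a FunkSVD-style factorization by stochastic gradient descent and then ranks users by cosine similarity in the learned latent space. It is plain Python rather than Spark, it predicts a rating as the global mean plus a dot product of user and item factors (one common variant), and every function name, hyperparameter value, and toy rating in it is an assumption made for the example.

```python
import math
import random


def train_funk_svd(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_ui ~ mu + p_u . q_i by SGD over observed (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # Sorted ids keep the random initialisation reproducible across runs.
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    order = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(order)
        for u, i, r in order:
            pu, qi = P[u], Q[i]
            err = r - (mu + sum(a * b for a, b in zip(pu, qi)))
            for f in range(n_factors):
                puf, qif = pu[f], qi[f]
                # Gradient step with L2 regularisation on both factors.
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return mu, P, Q


def rmse(ratings, mu, P, Q):
    """Root mean squared error of the model on the given triples."""
    se = sum(
        (r - (mu + sum(a * b for a, b in zip(P[u], Q[i])))) ** 2
        for u, i, r in ratings
    )
    return math.sqrt(se / len(ratings))


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_users(target, P, k=1):
    """Return the k users whose latent vectors are most similar to target's."""
    scored = [(cosine(P[target], vec), u) for u, vec in P.items() if u != target]
    return [u for _, u in sorted(scored, reverse=True)[:k]]


# Toy ratings: u1 and u2 favour books a and b; u3 and u4 favour c and d.
toy = [
    ("u1", "a", 9), ("u1", "b", 9), ("u1", "c", 2), ("u1", "d", 2),
    ("u2", "a", 9), ("u2", "b", 9), ("u2", "c", 2),
    ("u3", "a", 2), ("u3", "c", 9), ("u3", "d", 9),
    ("u4", "a", 2), ("u4", "b", 2), ("u4", "c", 9), ("u4", "d", 9),
]
mu, P, Q = train_funk_svd(toy)
print("training RMSE:", round(rmse(toy, mu, P, Q), 3))
print("nearest to u1:", nearest_users("u1", P))
```

For a held-out user, the same `nearest_users` lookup could be run once a latent vector has been estimated from that user's few ratings; how that vector is estimated is left open here. The implicit-rating recommender would rely on an ALS implementation instead of this SGD loop.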
Lastly, I will attempt to optimize for a metric other than accuracy in order to improve the long-term usefulness of the recommender. All of this will be implemented in Spark.
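The metric itself is not chosen yet. Purely as an example of what a beyond-accuracy measure can look like, the sketch below computes catalog coverage, the share of all items that show up in at least one user's recommendation list. Picking coverage here is my assumption rather than a decision made in this plan, the function name and sample lists are hypothetical, and the code is plain Python rather than Spark.

```python
def catalog_coverage(recommendations, catalog_size):
    """Fraction of the catalog that appears in at least one recommendation list.

    `recommendations` maps each user id to the list of item ids shown to them.
    """
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended) / catalog_size


# Two users and a six-book catalog: three distinct books get recommended.
recs = {"u1": ["a", "b"], "u2": ["b", "c"]}
print(catalog_coverage(recs, catalog_size=6))
```

A recommender that only ever surfaces the most popular books can score well on accuracy while keeping coverage low, which is the kind of trade-off a second metric is meant to expose.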