This project aims to build a complete recommender system using several recommendation techniques. The system will provide a solution to the following
The dataset for this project was originally posted on the SNAP website (http://snap.stanford.edu/data/web-FineFoods.html) and is now also available on Kaggle. It contains 500k reviews of 74k products from 250k users. As the data is parse
Citation for data
J. McAuley and J. Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. WWW, 2013. http://i.stanford.edu/~julian/pdfs/www13.pdf
Exploratory analysis helps in understanding the data and in identifying any scaling or normalization required. The data will also be checked for missing values, distributions, and similar properties in order to identify any data preparation that is required.
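As a minimal sketch of the checks described above, the function below counts missing values per field and tabulates the score distribution. It assumes the reviews have already been loaded as a list of dictionaries; the field names `user`, `product`, and `score` are hypothetical and may not match the column names in the actual files.

```python
from collections import Counter


def summarize_reviews(rows, fields=("user", "product", "score")):
    """Count missing values per field and tabulate the score distribution.

    `rows` is a list of dicts. The field names are illustrative
    assumptions, not the dataset's real column names.
    """
    def is_missing(value):
        # Treat both None and the empty string as missing.
        return value is None or value == ""

    missing = {
        field: sum(1 for row in rows if is_missing(row.get(field)))
        for field in fields
    }
    # Only rows with a usable score contribute to the distribution.
    scores = Counter(
        row["score"] for row in rows if not is_missing(row.get("score"))
    )
    return missing, dict(sorted(scores.items()))
```

A heavily skewed score distribution or a large missing count for any field would be a signal that normalization or row filtering is needed before modeling.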
Although the data contains reviews for 74k products, many products do not have enough reviews and so cannot be considered for collaborative filtering. The data preparation step will analyze the ratings and, if required, scale and/or normalize them. For collaborative filtering, a subset of the data that has enough information will be identified. This step will also split the dataset into training and test sets in order to evaluate the model.
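The preparation step could be sketched roughly as follows, assuming ratings are held as `(user, product, score)` tuples. The thresholds, the single-pass filter, and the random 80/20 split are illustrative assumptions rather than choices made by the project.

```python
import random
from collections import Counter


def filter_dense(ratings, min_product_reviews=2, min_user_reviews=1):
    """Keep ratings whose product and user both have enough reviews.

    Counts are taken once over the full input, so this is a single
    pass, not an iterative pruning. Thresholds are assumptions.
    """
    product_counts = Counter(product for _, product, _ in ratings)
    user_counts = Counter(user for user, _, _ in ratings)
    return [
        (user, product, score)
        for user, product, score in ratings
        if product_counts[product] >= min_product_reviews
        and user_counts[user] >= min_user_reviews
    ]


def train_test_split(ratings, test_fraction=0.2, seed=0):
    """Shuffle a copy of the ratings and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

A purely random split can place a user's only rating in the test set; a per-user holdout is a common alternative if that turns out to matter.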
This project will address the cold start problem from a user perspective (that is, recommending items to a new user). As no user profile is available, recommendations for a new user will be based on item popularity.
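One possible popularity baseline is sketched below. It assumes "popularity" means review count, with ties broken by mean score; that definition, the `min_reviews` threshold, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def popular_items(ratings, k=3, min_reviews=2):
    """Return up to k product ids ranked by popularity.

    `ratings` is a list of (user, product, score) tuples. Popularity is
    assumed here to be review count, with mean score as a tie-breaker.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, product, score in ratings:
        totals[product] += score
        counts[product] += 1

    # Ignore products with too few reviews to be ranked reliably.
    eligible = [p for p in counts if counts[p] >= min_reviews]
    # Most reviewed first, then highest mean score, then id for determinism.
    eligible.sort(key=lambda p: (-counts[p], -totals[p] / counts[p], p))
    return eligible[:k]
```

Because the ranking does not depend on the user, the same list can be computed once and served to every new user.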
A collaborative system that always recommends the same set of products is not attractive. Also, recommending items based only on a single recommendation scheme restricts the user's opportunity to explore other products. To provide better choice, an explore/exploit scheme will be used for recommendation.
The following recommendation methods will be explored to arrive at a suitable recommender
Based on the selected explore/exploit scheme, part of the recommendations will be made based on the scoring provided by the recommendation system, and the rest will be made using the explore method identified.
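The split between scored and exploratory recommendations might look like the sketch below. The explore method is not fixed by the text, so uniform random sampling from the rest of the catalog is used here purely as a stand-in; `explore_ratio`, `catalog`, and the function name are hypothetical.

```python
import random


def mix_recommendations(ranked_items, catalog, k=5, explore_ratio=0.2, seed=None):
    """Fill k slots: the top-scored items first, then exploratory picks.

    `ranked_items` is the recommender's output, best first. `catalog` is
    assumed to be a list of unique item ids. Exploration here is uniform
    random sampling, which is an assumption, not the project's choice.
    """
    rng = random.Random(seed)
    n_explore = min(k, max(0, round(k * explore_ratio)))
    n_exploit = k - n_explore

    exploit = list(ranked_items[:n_exploit])
    chosen = set(exploit)
    # Exploratory candidates are all catalog items not already chosen.
    pool = [item for item in catalog if item not in chosen]
    explore = rng.sample(pool, min(n_explore, len(pool)))
    return exploit + explore
```

Setting `explore_ratio=0` reduces this to pure exploitation, which gives a convenient baseline when comparing different ratios.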
The performance of the model will be evaluated on the test set. The performance of the various recommendation methods will be compared.
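The evaluation metric is not named here. As one common option for rating prediction, a root mean squared error (RMSE) check on the held-out ratings could be sketched as follows; the `predict(user, product)` callable is a hypothetical interface standing in for any of the recommendation methods.

```python
import math


def rmse(predict, test_ratings):
    """Root mean squared error of `predict` over held-out ratings.

    `predict` is any callable taking (user, product) and returning a
    score; `test_ratings` is a list of (user, product, score) tuples.
    """
    if not test_ratings:
        raise ValueError("test_ratings must not be empty")
    squared_errors = [
        (predict(user, product) - score) ** 2
        for user, product, score in test_ratings
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Passing each method's prediction function to the same `rmse` call on the same test set keeps the comparison between methods like for like.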
The overall performance of the system will be analyzed, and how the system responds to different explore/exploit ratios will be studied.
The outcomes of the project, as well as the lessons learned, will be summarized and presented.