1 Project Instructions

The goal for your final project is to build a recommender system using a large dataset (e.g., 1M+ ratings, or 10k+ users and 10k+ items). There are three deliverables, each with its own due date.

The requirements for the Final Project Planning Document (Proposal) are as follows:

[1] Planning Document:

Find an interesting dataset and describe the system you plan to build. If you would like to use one of the datasets you have already worked with, you should add a unique element or incorporate additional data (e.g., explicit features scraped from another source, such as image analysis of movie posters). The overall goal, however, is to produce quality recommendations by extracting insights from a large dataset. You may do so using Spark or another distributed computing method, OR by effectively applying one of the more advanced mathematical techniques we have covered. There is no preference for one over the other, as long as your recommender works! The planning document should be written up and published as a notebook on GitHub or RPubs. Please submit the link in the Unit 4 folder, due Thursday, July 5.


2 Rationale

Books are our best friends!

We yearn, learn and embark on a different journey every time we read a book!

Each person is different, yet the right book can quench a thirst for knowledge, creativity or entertainment, and offer a therapeutic, satiating feeling. To each their own.

Through “Amigo De Libro”, we aim to put you in the good company of yet another friend, a good book, that may take you on a magical journey.

2.1 Amigo De Libro, the RecSys, Answers...

What should a user who is reading a book on R programming be recommended to read next? If this user reads both computer-related books and children’s books, how do we represent his or her interest in books?

A user who is interested in R programming and also in the Harry Potter books may also like Practical Statistics for Data Scientists, or even love The Cicada Prophecy, in which scientists discover a cure for aging and everyone rushes to drink from the fountain of youth.
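A minimal sketch of one way to represent such mixed interests, assuming each book carries a set of subject tags (for instance, subjects like those available from ISBNdb): the user profile sums the user's ratings per subject across the books he or she has rated, and unseen books are ranked by cosine similarity to that profile, a plain content-based approach. The titles, tags and ratings below are hypothetical, and Python is used only for illustration.

```python
import math
from collections import defaultdict

# Hypothetical subject tags per book, for illustration only.
BOOK_SUBJECTS = {
    "Learning R": {"computers", "statistics"},
    "Harry Potter": {"children", "fiction"},
    "Practical Statistics for Data Scientists": {"computers", "statistics"},
    "The Cicada Prophecy": {"fiction", "science"},
    "Garden Birds": {"nature"},
}


def user_profile(ratings):
    """Map each subject to the sum of the user's ratings for books carrying it."""
    profile = defaultdict(float)
    for title, rating in ratings.items():
        for subject in BOOK_SUBJECTS[title]:
            profile[subject] += rating
    return dict(profile)


def cosine(profile, subjects):
    """Cosine similarity between a weighted profile and a book's binary subject vector."""
    dot = sum(profile.get(s, 0.0) for s in subjects)
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_book = math.sqrt(len(subjects))
    if norm_profile == 0 or norm_book == 0:
        return 0.0
    return dot / (norm_profile * norm_book)


def recommend(ratings, k=2):
    """Rank books the user has not rated by similarity to the user's profile."""
    profile = user_profile(ratings)
    unseen = [t for t in BOOK_SUBJECTS if t not in ratings]
    unseen.sort(key=lambda t: cosine(profile, BOOK_SUBJECTS[t]), reverse=True)
    return unseen[:k]


# A reader of an R book and of Harry Potter (hypothetical ratings on a 1-10 scale).
print(recommend({"Learning R": 9, "Harry Potter": 8}))
# prints ['Practical Statistics for Data Scientists', 'The Cicada Prophecy']
```

Because the profile keeps weight on both the computing subjects and the fiction subject, the ranking can surface books from either side of the reader's interests rather than collapsing onto one.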

2.2 ...Continued

This recommender system answers questions like these and, through novelty and diversity, strives to make buying books both an exciting and a rewarding experience for the reader.

Each reader receives recommendations personalized for him or her, based on attributes such as the reader’s age and the book’s publication date.

The recommender system uses techniques covered in this class’s curriculum to represent a user’s interest in books. User-based and item-based collaborative filtering algorithms may be built, along with a content-based method.
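To make the item-based idea concrete, here is a minimal sketch (in Python, for illustration only) that treats each book as a vector of user ratings, measures item-item similarity with cosine similarity (missing ratings count as 0), and predicts a user's rating for a target book as the similarity-weighted average of that user's ratings on other books. The toy data is hypothetical. In R, the recommenderlab package provides ready-made item-based (`IBCF`) and user-based (`UBCF`) recommenders, which may be a more practical starting point for the project.

```python
import math

# Hypothetical ratings: user -> {book: rating}, for illustration only.
RATINGS = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2},
    "u3": {"B": 5, "C": 1},
}


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine_sim(a, b):
    """Cosine similarity of two sparse rating vectors (missing ratings count as 0)."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    if dot == 0:
        return 0.0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(ratings, user, target):
    """Similarity-weighted average of the user's ratings on books similar to `target`."""
    items = item_vectors(ratings)
    num = den = 0.0
    for item, r in ratings[user].items():
        if item == target:
            continue
        s = cosine_sim(items[target], items[item])
        if s > 0:
            num += s * r
            den += s
    return num / den if den else None  # None: no similar books rated by this user


print(round(predict(RATINGS, "u2", "C"), 2))  # prints 3.06
```

Book A is the closer neighbor of C, so u2's rating of A (4) gets more weight than the rating of B (2), pulling the prediction toward 4.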

3 Salient Features

- Item-Based / User-Based Collaborative Filtering Recommendation System

- Content- and/or Context-Based Recommendation (based on attributes from the user profile and item profile, such as the reader’s age and the book’s publication date)

- Sentiment Analysis, if relevant
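For the user-based variant in the first bullet, a common formulation (sketched here as an assumption, not a fixed design choice) finds neighbors who rated the target book, weights them by Pearson correlation computed over co-rated books, and adds their mean-centered ratings to the target user's own mean. Mean-centering compensates for users who rate systematically high or low. Usernames and ratings below are hypothetical; Python is used only for illustration.

```python
import math

# Hypothetical users and ratings, for illustration only.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 3, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def mean(row):
    """Average rating of one user."""
    return sum(row.values()) / len(row)


def pearson(ru, rv):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    common = ru.keys() & rv.keys()
    if len(common) < 2:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    su = sum((ru[i] - mu) ** 2 for i in common)
    sv = sum((rv[i] - mv) ** 2 for i in common)
    return num / math.sqrt(su * sv) if su and sv else 0.0


def predict_user_based(ratings, user, item, k=2):
    """Predict `user`'s rating of `item` from the k most similar positive neighbors."""
    neighbors = []
    for other, row in ratings.items():
        if other != user and item in row:
            s = pearson(ratings[user], row)
            if s > 0:  # keep only positively correlated neighbors
                neighbors.append((s, other))
    top = sorted(neighbors, reverse=True)[:k]
    base = mean(ratings[user])
    if not top:
        return base  # no usable neighbors: fall back to the user's mean
    num = sum(s * (ratings[o][item] - mean(ratings[o])) for s, o in top)
    den = sum(s for s, _ in top)
    return base + num / den


print(predict_user_based(RATINGS, "alice", "D"))  # prints 5.5
```

Predictions are not clamped here, so a real implementation would typically clip them to the rating scale of the dataset.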

4 Data Sources

The dataset will be downloaded from IIF (Institut für Informatik, University of Freiburg). It consists of three files in CSV format: users, books and ratings.

- Primarily the Book-Crossing datasets, sourced from http://www2.informatik.uni-freiburg.de/~cziegler/BX/

The dataset consists of:

- User Profile Data (BX-Users.csv): userid, location, age

- Book Profile Data (BX-Books.csv): ISBN, BookTitle, BookAuthor, YearofPublication, Publisher, ImageURLSizeS, ImageURLSizeM, ImageURLSizeL

- Ratings Data (BX-Book-Ratings.csv): userid, ISBN, rating

- Book subjects from ISBNdb, retrieved through its API: http://isbndb.com/
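A minimal sketch of reading the ratings file and separating explicit from implicit feedback, assuming the commonly distributed layout: semicolon-delimited, quoted fields, a header row of `User-ID`, `ISBN` and `Book-Rating`, and Latin-1 encoding. In the Book-Crossing data, ratings from 1 to 10 are explicit, while 0 marks an implicit interaction, so the two are usually handled separately. Python is used only for illustration, and the header names should be checked against the downloaded files.

```python
import csv


def load_ratings(lines):
    """Split Book-Crossing rating rows into explicit (1-10) and implicit (0) lists.

    `lines` is any iterable of text lines, e.g. an open file object.
    Each record is returned as a (user_id, isbn, rating) tuple.
    """
    explicit, implicit = [], []
    for row in csv.DictReader(lines, delimiter=";"):
        record = (row["User-ID"], row["ISBN"], int(row["Book-Rating"]))
        if record[2] == 0:
            implicit.append(record)
        else:
            explicit.append(record)
    return explicit, implicit


# Example usage (assumed local file name and encoding):
# with open("BX-Book-Ratings.csv", encoding="latin-1", newline="") as f:
#     explicit, implicit = load_ratings(f)
```

Keeping user and ISBN values as strings avoids losing leading zeros or the trailing "X" check digit that some ISBNs carry.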

5 Tentative Flow Chart

6 Methodology & Evaluation Measures

The R programming language will be used to accomplish the task. Time permitting, Simio, a simulation software package, may be used to present the topic on July 20.
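The section heading names evaluation measures without listing them yet. As one possible choice (an assumption, not part of the plan above), rating predictions are often scored by the root-mean-square error (RMSE) on a held-out portion of the explicit ratings. The sketch below, in Python for illustration, shows a seeded random hold-out split and the RMSE computation; the `predict` function in the usage comment is hypothetical.

```python
import math
import random


def holdout_split(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records and split off a test set of the given fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs."""
    if not pairs:
        raise ValueError("rmse() needs at least one pair")
    return math.sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))


print(rmse([(3.0, 5.0), (4.0, 4.0)]))  # prints 1.4142135623730951

# Example usage with a hypothetical predict(user, isbn) function:
# train, test = holdout_split(explicit_ratings)
# score = rmse([(predict(u, isbn), r) for u, isbn, r in test])
```

Fixing the seed makes the split reproducible across runs. Since the rationale also values novelty and diversity, an accuracy measure like RMSE alone may not capture everything the project aims for.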

6.1 Assumptions

The model will be based on reasonable assumptions about the characteristics of the user pool. The types and number of recommenders may change as the project develops.

7 References

7.1 Journal References

The authors provide an overview of content-based recommender systems and discuss in detail some of the widely used techniques for representing items and user profiles, along with their advantages and drawbacks.

Some of the widely used techniques for representing items and user profiles are discussed, as are trends that may lead to the next generation of systems. Topics include taking evolving vocabularies into account and using serendipity when recommending.

The author describes how combining a knowledge-based system with collaborative filtering improves performance when recommending restaurants, and how semantic ratings collected from the knowledge-based component enhance the effectiveness of collaborative filtering.

7.2 Other References

Recommender systems - How they work and their impacts

Overspecialization and new-user issues are discussed, along with flow charts.

New-user issues are also discussed here, along with flow charts.