Deliverable
Planning Document
Find an interesting dataset and describe the system you plan to build out. If you would like to use one of the datasets you have already worked with, you should add a unique element or incorporate additional data (e.g., explicit features you scrape from another source, like image analysis on movie posters). The overall goal, however, will be to produce quality recommendations by extracting insights from a large dataset.
This project explores the goodbooks-10k dataset and develops a collaborative filtering recommender system for books. Collaborative filtering relies only on user preferences, expressed through the ratings that users have provided, and does not take the features or content of the items into account. The recommender system will then be analyzed and evaluated to see how well it performs when recommending items.
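To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in base R. It is not the system this project will build: the 3 x 3 rating matrix is made up, the helpers `cosine_sim` and `predict_rating` are hypothetical names, and cosine similarity over co-rated books is only one of several possible similarity measures (mean-centering the ratings first is a common refinement).

```r
# Toy ratings: rows are users, columns are books; NA means "not rated".
ratings_mat <- matrix(c(5, 4, NA,
                        4, 5, 2,
                        1, 2, 5),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(c("u1", "u2", "u3"),
                                      c("b1", "b2", "b3")))

# Cosine similarity between two users, using only books both have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Predict a user's rating for a book as the similarity-weighted average
# of the ratings given to that book by the other users.
predict_rating <- function(mat, user, item) {
  others <- setdiff(rownames(mat)[!is.na(mat[, item])], user)
  sims <- sapply(others, function(o) cosine_sim(mat[user, ], mat[o, ]))
  sum(sims * mat[others, item]) / sum(abs(sims))
}

pred <- predict_rating(ratings_mat, "u1", "b3")
print(pred)
```

Note that no book attribute (author, genre, year) enters the calculation; the prediction depends only on who rated what.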
The objective is to build a book recommender system using a large dataset.
For the final project I will use the goodbooks-10k dataset from Kaggle.
The dataset contains ratings for ten thousand popular books. As to the source, let’s say that these ratings were found on the internet. Generally, each book has 100 ratings, although some have fewer. Ratings go from one to five. Both book IDs and user IDs are contiguous: books are numbered 1-10000 and users 1-53424. All users have made at least two ratings, and the median number of ratings per user is 8. The dataset also includes the books each user has marked "to read", book metadata (author, year, etc.), and tags.
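These figures can be checked once the ratings file is loaded. The sketch below shows one way to do so, assuming the ratings table has the columns `user_id`, `book_id`, and `rating`. The helper `summarise_ratings` is a hypothetical name, and the small data frame is made up so that the example runs on its own.

```r
# Summarise a ratings table with columns user_id, book_id, rating (assumed names).
summarise_ratings <- function(ratings) {
  per_user <- as.vector(table(ratings$user_id))  # number of ratings per user
  per_book <- as.vector(table(ratings$book_id))  # number of ratings per book
  list(
    n_users         = length(per_user),
    n_books         = length(per_book),
    rating_range    = range(ratings$rating),
    min_per_user    = min(per_user),
    median_per_user = median(per_user),
    # TRUE when user IDs run 1, 2, ..., n_users with no gaps
    users_contiguous = all(sort(unique(ratings$user_id)) == seq_along(per_user))
  )
}

# Made-up data standing in for ratings.csv.
toy <- data.frame(
  user_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  book_id = c(1, 2, 3, 1, 3, 2, 3, 4, 1),
  rating  = c(5, 4, 3, 2, 5, 1, 4, 4, 3)
)

stats <- summarise_ratings(toy)
str(stats)
```

Calling `summarise_ratings(ratings)` on the full table would then report the user count, book count, rating range, and per-user minimum and median for comparison with the description above.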
# Load the book metadata, ratings, and tag tables from the project repository
books <- read.csv("https://raw.githubusercontent.com/ekhahm/Data612/master/Final%20project/books.csv")
ratings <- read.csv("https://raw.githubusercontent.com/ekhahm/Data612/master/Final%20project/ratings.csv")
btags <- read.csv("https://raw.githubusercontent.com/ekhahm/Data612/master/Final%20project/book_tags.csv")
tags <- read.csv("https://raw.githubusercontent.com/ekhahm/Data612/master/Final%20project/tags.csv")
head(books)
## id book_id best_book_id work_id books_count isbn isbn13
## 1 1 2767052 2767052 2792775 272 439023483 9.780439e+12
## 2 2 3 3 4640799 491 439554934 9.780440e+12
## 3 3 41865 41865 3212258 226 316015849 9.780316e+12
## 4 4 2657 2657 3275794 487 61120081 9.780061e+12
## 5 5 4671 4671 245494 1356 743273567 9.780743e+12
## 6 6 11870085 11870085 16827462 226 525478817 9.780525e+12
## authors original_publication_year
## 1 Suzanne Collins 2008
## 2 J.K. Rowling, Mary GrandPré 1997
## 3 Stephenie Meyer 2005
## 4 Harper Lee 1960
## 5 F. Scott Fitzgerald 1925
## 6 John Green 2012
## original_title
## 1 The Hunger Games
## 2 Harry Potter and the Philosopher's Stone
## 3 Twilight
## 4 To Kill a Mockingbird
## 5 The Great Gatsby
## 6 The Fault in Our Stars
## title language_code
## 1 The Hunger Games (The Hunger Games, #1) eng
## 2 Harry Potter and the Sorcerer's Stone (Harry Potter, #1) eng
## 3 Twilight (Twilight, #1) en-US
## 4 To Kill a Mockingbird eng
## 5 The Great Gatsby eng
## 6 The Fault in Our Stars eng
## average_rating ratings_count work_ratings_count work_text_reviews_count
## 1 4.34 4780653 4942365 155254
## 2 4.44 4602479 4800065 75867
## 3 3.57 3866839 3916824 95009
## 4 4.25 3198671 3340896 72586
## 5 3.89 2683664 2773745 51992
## 6 4.26 2346404 2478609 140739
## ratings_1 ratings_2 ratings_3 ratings_4 ratings_5
## 1 66715 127936 560092 1481305 2706317
## 2 75504 101676 455024 1156318 3011543
## 3 456191 436802 793319 875073 1355439
## 4 60427 117415 446835 1001952 1714267
## 5 86236 197621 606158 936012 947718
## 6 47994 92723 327550 698471 1311871
## image_url
## 1 https://images.gr-assets.com/books/1447303603m/2767052.jpg
## 2 https://images.gr-assets.com/books/1474154022m/3.jpg
## 3 https://images.gr-assets.com/books/1361039443m/41865.jpg
## 4 https://images.gr-assets.com/books/1361975680m/2657.jpg
## 5 https://images.gr-assets.com/books/1490528560m/4671.jpg
## 6 https://images.gr-assets.com/books/1360206420m/11870085.jpg
## small_image_url
## 1 https://images.gr-assets.com/books/1447303603s/2767052.jpg
## 2 https://images.gr-assets.com/books/1474154022s/3.jpg
## 3 https://images.gr-assets.com/books/1361039443s/41865.jpg
## 4 https://images.gr-assets.com/books/1361975680s/2657.jpg
## 5 https://images.gr-assets.com/books/1490528560s/4671.jpg
## 6 https://images.gr-assets.com/books/1360206420s/11870085.jpg
books.csv has metadata for each book (Goodreads IDs, authors, title, average rating, etc.).
ratings.csv contains one row per rating, with user_id, book_id, and rating.
to_read.csv provides IDs of the books marked “to read” by each user, as user_id,book_id pairs.
book_tags.csv contains tags/shelves/genres assigned by users to books. Tags in this file are represented by their IDs.
tags.csv translates tag IDs to names.
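Because book_tags.csv stores only tag IDs, it has to be joined with tags.csv before the tags are readable. The sketch below illustrates that join with `merge()` on made-up rows. The column names `goodreads_book_id`, `tag_id`, `count`, and `tag_name` are assumptions about the file layout and should be checked against `names(btags)` and `names(tags)`.

```r
# Made-up rows standing in for book_tags.csv and tags.csv (column names assumed).
btags_toy <- data.frame(
  goodreads_book_id = c(1, 1, 2, 2),
  tag_id            = c(10, 11, 10, 12),
  count             = c(50, 20, 5, 40)
)
tags_toy <- data.frame(
  tag_id   = c(10, 11, 12),
  tag_name = c("to-read", "fantasy", "classics"),
  stringsAsFactors = FALSE
)

# Attach the tag name to every (book, tag) pair.
labelled <- merge(btags_toy, tags_toy, by = "tag_id")

# Keep the most frequently applied tag for each book.
labelled <- labelled[order(labelled$goodreads_book_id, -labelled$count), ]
top_tag <- labelled[!duplicated(labelled$goodreads_book_id),
                    c("goodreads_book_id", "tag_name", "count")]
print(top_tag)
```

The same two steps applied to `btags` and `tags` would give a per-book tag table that could later serve as additional item information.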
Collaborative filtering
SVD and ALS matrix factorization
Accuracy comparison between SVD and ALS using error metrics
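The plan above can be sketched end to end in base R on a toy matrix. This is only an illustration under stated assumptions, not the project's final implementation: the rating matrix is made up, missing cells are filled with the global mean before the SVD, the ALS routine is a plain regularized version with arbitrary settings (`k`, `lambda`, `n_iter`), and RMSE and MAE are assumed to be the error metrics of interest. All function names are hypothetical.

```r
# Toy user x book rating matrix; NA marks an unrated book.
R <- matrix(c( 5,  4, NA,  1,
               4, NA,  2,  1,
               1,  2,  5, NA,
              NA,  1,  4,  5),
            nrow = 4, byrow = TRUE)

# Hold out two known ratings as a test set.
test_idx <- rbind(c(1, 2), c(3, 3))
actual   <- R[test_idx]
train    <- R
train[test_idx] <- NA

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Truncated SVD: fill missing cells with the global mean, keep the top k factors.
svd_predict <- function(M, k = 2) {
  filled <- M
  filled[is.na(filled)] <- mean(M, na.rm = TRUE)
  s <- svd(filled)
  s$u[, 1:k, drop = FALSE] %*% diag(s$d[1:k], k, k) %*%
    t(s$v[, 1:k, drop = FALSE])
}

# ALS: alternately solve a ridge regression for each user and each book,
# using only the observed ratings.
als_predict <- function(M, k = 2, lambda = 0.1, n_iter = 20, seed = 1) {
  set.seed(seed)
  U <- matrix(rnorm(nrow(M) * k, sd = 0.1), nrow(M), k)
  V <- matrix(rnorm(ncol(M) * k, sd = 0.1), ncol(M), k)
  for (iter in seq_len(n_iter)) {
    for (u in seq_len(nrow(M))) {
      idx <- which(!is.na(M[u, ]))
      Vu  <- V[idx, , drop = FALSE]
      U[u, ] <- solve(crossprod(Vu) + lambda * diag(k), crossprod(Vu, M[u, idx]))
    }
    for (i in seq_len(ncol(M))) {
      idx <- which(!is.na(M[, i]))
      Ui  <- U[idx, , drop = FALSE]
      V[i, ] <- solve(crossprod(Ui) + lambda * diag(k), crossprod(Ui, M[idx, i]))
    }
  }
  U %*% t(V)
}

svd_hat <- svd_predict(train)
als_hat <- als_predict(train)

# Compare the two models on the held-out ratings.
results <- data.frame(
  model = c("SVD", "ALS"),
  RMSE  = c(rmse(actual, svd_hat[test_idx]), rmse(actual, als_hat[test_idx])),
  MAE   = c(mae(actual, svd_hat[test_idx]),  mae(actual, als_hat[test_idx]))
)
print(results)
```

On the full dataset a dedicated package would likely be more practical than these hand-written loops, and the held-out set would need to be far larger for the comparison to be meaningful.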