Eleanor
The goal of the final project is to build a recommender system on a large dataset (e.g., 1M+ ratings, or 10k+ users and 10k+ items).
The purpose of this project is to produce quality recommendations by extracting insights from a large dataset.
I will use the Amazon Fine Food Reviews dataset, which I found on Kaggle.
I will apply the recommender-system methods covered in the course, namely user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), and singular value decomposition (SVD), and compare the models to see which one gives the best results.
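As a rough illustration of the kind of model being compared, the sketch below shows a minimal user-based collaborative filter with cosine similarity, plus an RMSE helper. The toy ratings, the function names (`cosine`, `predict_ubcf`, `rmse`), and the choice of RMSE as the comparison metric are all assumptions made for illustration; the project itself may use a library implementation and a different metric.

```python
from math import sqrt

# Toy ratings (hypothetical values, not taken from the dataset): user -> {item: score}
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict_ubcf(user, item, ratings):
    """Similarity-weighted average of the other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    # None signals that no neighbour has rated the item
    return num / den if den else None

def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))

# Predict how u1 would rate item "d", which u1 has not rated yet
pred = predict_ubcf("u1", "d", ratings)
print(pred)
```

An item-based variant follows the same pattern with the roles of users and items swapped. Scoring each model's predictions on a held-out set of ratings with the same error function is one way to make the comparison concrete.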
I will then use Spark for distributed processing and compare the performance and recommendation accuracy of the centralized and distributed systems.
Source: SNAP (Stanford Network Analysis Project)

| Id | ProductId | UserId | ProfileName | HelpfulnessNumerator | HelpfulnessDenominator | Score | Time | Summary | Text |
|---|---|---|---|---|---|---|---|---|---|
| 1 | B001E4KFG0 | A3SGXH7AUHU8GW | delmartian | 1 | 1 | 5 | 1303862400 | Good Quality Dog Food | I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than most. |

| Field name | Description |
|---|---|
| Id | unique identifier for the review |
| ProductId | unique identifier for the product |
| UserId | unique identifier for the user |
| ProfileName | profile name of the user |
| HelpfulnessNumerator | number of users who found the review helpful |
| HelpfulnessDenominator | number of users who indicated whether they found the review helpful |
| Score | rating from 1 to 5, with 1 being the worst and 5 being the best |
| Time | timestamp |
| Summary | brief summary of the review |
| Text | content of the review |
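
To show how these fields might feed a recommender, the sketch below parses one review into a (user, item, score) record with a helpfulness ratio and a readable date. It assumes the data arrives as CSV with the column names above as its header row, and that `Time` is a Unix timestamp in seconds. The helper `parse_review` and its output keys are hypothetical, and the `Text` value is shortened from the sample row.

```python
import csv
import io
from datetime import datetime, timezone

# One row copied from the sample above (Text shortened); CSV layout is an assumption.
sample = """Id,ProductId,UserId,ProfileName,HelpfulnessNumerator,HelpfulnessDenominator,Score,Time,Summary,Text
1,B001E4KFG0,A3SGXH7AUHU8GW,delmartian,1,1,5,1303862400,Good Quality Dog Food,I have bought several of the Vitality canned dog food products
"""

def parse_review(row):
    """Convert one CSV row (a dict of strings) into typed fields."""
    denom = int(row["HelpfulnessDenominator"])
    return {
        "user": row["UserId"],
        "item": row["ProductId"],
        "score": int(row["Score"]),
        # Share of voters who found the review helpful; None if nobody voted
        "helpfulness": int(row["HelpfulnessNumerator"]) / denom if denom else None,
        # Assumes Time is seconds since the Unix epoch
        "reviewed_at": datetime.fromtimestamp(int(row["Time"]), tz=timezone.utc),
    }

reviews = [parse_review(r) for r in csv.DictReader(io.StringIO(sample))]
print(reviews[0])
```

Only `UserId`, `ProductId`, and `Score` are needed for collaborative filtering. The helpfulness ratio and timestamp are kept here because they could be used to filter or weight reviews.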