DATA 643 Final Project Proposal | Large-Dataset-Based Recommender System

Objective
Implementation
Results
Data Sources
References

Objective

The purpose of this project is to build a live, interactive movie recommendation system trained on a large dataset.

Implementation

We plan to use a distributed computing framework such as Apache Spark to reduce the execution time of our model, drawing on tools such as sparklyr, Databricks, and H2O. Users will be able to enter one or more movies they currently like, and the system will recommend possible matches, each with a percentage match score.
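
The proposal does not yet fix a recommendation algorithm, so the following is only a minimal sketch of the "liked movies in, ranked matches with a percentage out" step, assuming item-to-item cosine similarity. The function names (item_vectors, cosine, recommend) and the toy ratings are hypothetical, and the "percentage match" is assumed here to be the mean similarity to the liked movies scaled to 0 to 100.

```python
import math
from collections import defaultdict


def item_vectors(ratings):
    """Group (user_id, movie_id, rating) triples into one sparse vector per movie."""
    vectors = defaultdict(dict)
    for user_id, movie_id, rating in ratings:
        vectors[movie_id][user_id] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {user_id: rating} dicts."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings, liked, top_n=5):
    """Rank movies not in `liked` by their mean similarity to the liked movies.

    The score is reported as a percentage (assumed definition). With positive
    ratings (MovieLens uses 0.5 to 5 stars) cosine similarity lies in [0, 1].
    """
    vectors = item_vectors(ratings)
    scores = []
    for movie_id, vec in vectors.items():
        if movie_id in liked:
            continue
        sims = [cosine(vec, vectors[m]) for m in liked if m in vectors]
        if sims:
            scores.append((movie_id, round(100.0 * sum(sims) / len(sims), 1)))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Toy ratings, made up for illustration: (user_id, movie_id, rating).
ratings = [
    (1, "A", 5.0), (1, "B", 4.0), (1, "C", 1.0),
    (2, "A", 4.0), (2, "B", 5.0),
    (3, "C", 5.0), (3, "D", 4.0),
    (4, "A", 3.0), (4, "D", 1.0),
]
print(recommend(ratings, {"A"}))  # [('B', 88.3), ('C', 13.9), ('D', 10.3)]
```

These nested loops only work for small samples. At the scale of 20 million ratings, the same similarity computation would need to run in a distributed framework such as Spark, or be swapped for a matrix-factorization approach such as Spark's ALS collaborative filtering.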

Results

Anyone can use the system's interactive interface to obtain recommendations for a single movie or a set of movies. The recommendations will be generated by a model that has been validated within our system.
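
The proposal does not say how the model will be validated. One common approach, sketched here as an assumption, is to hold out part of the ratings and report the root-mean-square error (RMSE) of predictions on that held-out set. The baseline model below (global mean plus a per-movie offset) and all function names are hypothetical placeholders for whatever model the project ends up using.

```python
import math
import random


def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle (user_id, movie_id, rating) triples and hold out a test fraction."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_movie_baseline(train):
    """Placeholder model: global mean rating plus a per-movie offset."""
    global_mean = sum(r for _, _, r in train) / len(train)
    offset_sums, counts = {}, {}
    for _, movie_id, rating in train:
        offset_sums[movie_id] = offset_sums.get(movie_id, 0.0) + rating - global_mean
        counts[movie_id] = counts.get(movie_id, 0) + 1
    offsets = {m: offset_sums[m] / counts[m] for m in offset_sums}
    # Movies unseen in training fall back to the global mean.
    return lambda user_id, movie_id: global_mean + offsets.get(movie_id, 0.0)


def rmse(model, test):
    """Root-mean-square error of the model's predictions on held-out ratings."""
    squared = [(model(u, m) - r) ** 2 for u, m, r in test]
    return math.sqrt(sum(squared) / len(squared))


train = [(1, "A", 4.0), (2, "A", 2.0), (1, "B", 5.0), (2, "B", 5.0)]
test = [(3, "A", 3.0), (3, "B", 4.0)]
model = fit_movie_baseline(train)
print(rmse(model, test))  # 0.7071067811865476
```

A lower held-out RMSE means the model predicts unseen ratings more accurately, which gives a concrete criterion for calling a candidate model "validated" before it is exposed through the interface.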

Data Sources

The data used to build the system will be sourced from the MovieLens 20M Dataset, which contains 20,000,263 ratings from 138,493 users.
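
As a small sketch of a first pass over the data, assuming the dataset's ratings.csv file with columns userId, movieId, rating, and timestamp, the function below streams the rows and counts ratings, distinct users, and distinct movies without holding all 20 million rows in memory. The function name is hypothetical, the sample rows are illustrative rather than taken from the dataset, and this plain-Python loop stands in for what Spark would do across a cluster.

```python
import csv
import io


def summarize_ratings(handle):
    """Stream a MovieLens-style ratings.csv and count ratings, users, and movies."""
    reader = csv.DictReader(handle)
    n_ratings = 0
    users, movies = set(), set()
    for row in reader:
        n_ratings += 1
        users.add(int(row["userId"]))
        movies.add(int(row["movieId"]))
    return n_ratings, len(users), len(movies)


# Illustrative rows in the assumed ratings.csv format.
sample = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,2,3.5,1000000000\n"
    "1,29,3.5,1000000001\n"
    "2,2,4.0,1000000002\n"
)
print(summarize_ratings(sample))  # (3, 2, 2)
```

On the real file (the path ml-20m/ratings.csv is an assumption about where the archive is extracted), opening it with `open(path, newline="")` and passing the handle to this function should reproduce the rating and user counts stated above.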

References

https://grouplens.org/datasets/movielens/20m/