The purpose of this project is to build a new live interactive movie recommendation system using a large dataset.
We plan to use a distributed computing framework such as Spark to reduce our model's execution time, specifically through tools such as sparklyr, Databricks, and H2O. Users will enter one or more movies that they currently like, and the system will recommend possible matches, each with a percentage match.
Anyone can use our recommendation system’s interactive interface to obtain recommendations for a movie or set of movies. The recommendations will be based on a validated model in our system.
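To make the intended interaction concrete, the following is a minimal single-machine sketch of how liked movies could be turned into recommendations with a percentage match. It assumes item-based cosine similarity, which is only one possible approach and not necessarily the model the project will adopt. The toy ratings and the helper names (RATINGS, item_vectors, cosine, recommend) are hypothetical and exist only for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {movie: rating}); not taken from the real dataset.
RATINGS = {
    "u1": {"Toy Story": 5.0, "Jumanji": 3.0, "Heat": 1.0},
    "u2": {"Toy Story": 4.0, "Jumanji": 4.0},
    "u3": {"Toy Story": 1.0, "Heat": 5.0},
    "u4": {"Toy Story": 5.0, "Jumanji": 5.0, "Heat": 2.0},
}


def item_vectors(ratings):
    """Invert user -> movie ratings into movie -> {user: rating} vectors."""
    vectors = {}
    for user, movies in ratings.items():
        for movie, rating in movies.items():
            vectors.setdefault(movie, {})[user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts keyed by user)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    # Clamp to 1.0 to guard against floating-point overshoot.
    return min(1.0, dot / (norm_a * norm_b))


def recommend(liked, ratings, top_n=5):
    """Return up to top_n (movie, percentage match) pairs for the liked movies."""
    vectors = item_vectors(ratings)
    known = [m for m in liked if m in vectors]
    if not known:
        return []
    scores = {}
    for candidate, vec in vectors.items():
        if candidate in liked:
            continue  # do not recommend what the user already entered
        sims = [cosine(vec, vectors[m]) for m in known]
        # Assumption: the percentage match is the mean similarity scaled to 0-100.
        scores[candidate] = 100.0 * sum(sims) / len(sims)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    for movie, pct in recommend(["Toy Story"], RATINGS):
        print(f"{movie}: {pct:.0f}% match")
```

With this toy data, a user who enters "Toy Story" sees "Jumanji" ranked above "Heat". At the scale of the full dataset, the same pairwise similarity computation would be delegated to a distributed framework rather than run in plain Python.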
The data used to create the system will be sourced from the MovieLens 20M Dataset. The dataset contains 20,000,263 ratings from 138,493 users.