The goal for your final project is for you to build out a recommender system using a large dataset (ex: 1M+ ratings or 10k+ users, 10k+ items). There are three deliverables, with separate dates:

[1] Planning Document: Find an interesting dataset and describe the system you plan to build out. If you would like to use one of the datasets you have already worked with, you should add a unique element or incorporate additional data (e.g. explicit features you scrape from another source, like image analysis on movie posters). The overall goal, however, will be to produce quality recommendations by extracting insights from a large dataset. You may do so using Spark, or another distributed computing method, OR by effectively applying one of the more advanced mathematical techniques we have covered.

Recommender System

I will be using the MovieLens dataset, since I have used it in my previous projects and am familiar with its contents.

I have not yet decided which filtering approach I will use. In previous projects I have used user-based and item-based collaborative filtering.
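As a reminder of how the two approaches I have used before differ, here is a minimal sketch in plain Python. The toy ratings, the function names, and the choice of cosine similarity are all assumptions made for illustration; they are not a decision about the final system.

```python
from math import sqrt

# Toy ratings (hypothetical data): user -> {item: rating}
ratings = {
    "u1": {"A": 5.0, "B": 3.0, "C": 4.0},
    "u2": {"A": 4.0, "B": 2.0},
    "u3": {"B": 5.0, "C": 1.0},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def transpose(user_ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = {}
    for user, row in user_ratings.items():
        for item, rating in row.items():
            by_item.setdefault(item, {})[user] = rating
    return by_item

def predict_user_based(user_ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    target = user_ratings.get(user, {})
    num = den = 0.0
    for other, row in user_ratings.items():
        if other == user or item not in row:
            continue
        sim = cosine(target, row)
        num += sim * row[item]
        den += abs(sim)
    return num / den if den else None

def predict_item_based(user_ratings, user, item):
    """Item-based prediction: the same weighting applied to the transposed matrix."""
    return predict_user_based(transpose(user_ratings), item, user)

user_cf = predict_user_based(ratings, "u2", "C")
item_cf = predict_item_based(ratings, "u2", "C")
```

User-based filtering compares rows of the ratings matrix (people), while item-based filtering compares columns (movies). On a 10M+ dataset these nested loops would be far too slow, so this is only meant to show the idea.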

Dataset

This dataset is publicly available online at https://grouplens.org/datasets/movielens/

The site hosts a number of MovieLens datasets. For the final project, I might use a dataset with more than 1M ratings. There are 10M, 20M, and 25M versions available.

Process

I might download the dataset into my GitHub account for easy and fast access. I will then perform whatever wrangling is needed, such as sorting and transforming the data.
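A rough sketch of what the wrangling step could look like is below. It assumes rating records with four fields (user, movie, rating, timestamp) separated by `::`, which is my understanding of the 10M `ratings.dat` layout; the 20M and 25M files are comma-separated with a header row, so the delimiter would need to change. The function names and sample lines are hypothetical.

```python
def parse_ratings(lines, delimiter="::"):
    """Yield (user_id, movie_id, rating, timestamp) tuples, skipping bad rows."""
    for line in lines:
        parts = line.strip().split(delimiter)
        if len(parts) != 4:
            continue  # wrong number of fields
        try:
            yield int(parts[0]), int(parts[1]), float(parts[2]), int(parts[3])
        except ValueError:
            continue  # header row or corrupt record

def keep_latest(rows):
    """If a user rated the same movie more than once, keep the newest rating."""
    latest = {}
    for user, movie, rating, ts in rows:
        key = (user, movie)
        if key not in latest or ts >= latest[key][1]:
            latest[key] = (rating, ts)
    # Sort by (user, movie) so the output order is deterministic.
    return [(u, m, r, ts) for (u, m), (r, ts) in sorted(latest.items())]

# Made-up sample lines, including one duplicate and one malformed row.
sample = [
    "1::122::5::838985046",
    "1::122::4::838985999",
    "not a rating line",
    "2::185::3.5::838983525",
]
clean = keep_latest(parse_ratings(sample))
```

For the real files, `sample` would be replaced by an open file handle, for example `open("ratings.dat", encoding="utf-8")`, where the path is again an assumption.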

Once I have a clean dataset, I will split the data into training and testing datasets for further analysis. Then I will run different kinds of models on these split datasets to build the best recommendation model.
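The split-and-compare step could be sketched as follows. The 80/20 ratio, the fixed seed, the synthetic ratings, and the use of RMSE against a global-mean baseline are all assumptions for illustration; the actual models and metric are still to be decided.

```python
import random
from math import sqrt

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows reproducibly and cut them into train and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def rmse(predictions, actuals):
    """Root mean squared error between predicted and true ratings."""
    squared = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return sqrt(sum(squared) / len(squared))

# Synthetic (user, movie, rating) rows standing in for the cleaned dataset.
rows = [(u, m, float((u + m) % 5 + 1)) for u in range(20) for m in range(10)]

train, test = train_test_split(rows)

# Simplest possible model: predict the training-set mean for every test rating.
global_mean = sum(r for _, _, r in train) / len(train)
baseline_rmse = rmse([global_mean] * len(test), [r for _, _, r in test])
```

Any candidate model can then be scored on the same `test` rows, and it is only worth keeping if its error is lower than `baseline_rmse`.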