The final project will rely more on Python packages and libraries than on R’s much-vaunted recommenderlab, mainly to learn about other tools beyond the recommenderlab package. To that extent, this is an opportunity to venture beyond the book’s procedural guidance and discover for ourselves whether there are tools besides recommenderlab that one could use to build a recommender system, with built-in capabilities as powerful as recommenderlab’s.
The goal is to create and evaluate an end-to-end recommender system using the MovieLens data and determine which algorithms produce the best predictive models.
We will try multiple recommender libraries to determine which is most effective. Within the data analysis, comparisons of accuracy metrics such as RMSE, ROC curves, and precision-recall curves will be used to determine which approach produces the best results. “Home-made” IBCF, UBCF, and SVD methods will be implemented and tested for accuracy and performance against Python’s Turicreate and Surprise (scikit-surprise) libraries.
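To illustrate what a “home-made” item-based method involves, the sketch below builds an item-item cosine similarity matrix and a weighted-average prediction with pandas and scikit-learn. The file name, column names, and neighborhood size are illustrative assumptions, not requirements of the project.

```python
# Minimal sketch of a "home-made" IBCF step, assuming a ratings file with
# columns user_id, item_id, rating (hypothetical names, for illustration only).
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

ratings = pd.read_csv("ratings.csv")  # hypothetical file name

# Pivot into a user x item matrix; unrated entries become 0 in this rough sketch
ui_matrix = ratings.pivot_table(index="user_id", columns="item_id",
                                values="rating").fillna(0)

# Item-item cosine similarity: compare item columns against each other
item_sim = pd.DataFrame(cosine_similarity(ui_matrix.T),
                        index=ui_matrix.columns, columns=ui_matrix.columns)

def predict_ibcf(user_id, item_id, k=20):
    """Predict a rating as a similarity-weighted average of the user's k
    most similar already-rated items."""
    user_ratings = ui_matrix.loc[user_id]
    rated = user_ratings[user_ratings > 0].drop(item_id, errors="ignore")
    sims = item_sim.loc[item_id, rated.index].nlargest(k)
    if sims.sum() == 0:
        return rated.mean()           # fall back to the user's average rating
    return (sims * rated[sims.index]).sum() / sims.sum()
```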
The following steps will be applied throughout the project:

- Basic data ETL followed by standard data exploration and visualizations
- Basic statistical checks of the underlying data
- Training and test split of the datasets
- Hyper-parameter tuning via cross-validation (see the sketch after this list)
- Implementation of the different algorithms/models
- Comparison of accuracy across the datasets for the different algorithms

Predictive accuracy will rely on root mean squared error (RMSE). Comparative analysis will be used to determine which model gives the best results via learning/accuracy curves and computational speeds.
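As a sketch of the cross-validation step, the Surprise library offers a grid-search helper that reports mean RMSE across folds. The file name, column names, and parameter grid below are illustrative assumptions.

```python
# Hedged sketch of hyper-parameter tuning via 5-fold cross-validation with
# Surprise (scikit-surprise); grid values are assumptions for illustration.
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import GridSearchCV

ratings = pd.read_csv("ratings.csv")              # hypothetical file name
reader = Reader(rating_scale=(1, 5))              # MovieLens ratings are 1-5
data = Dataset.load_from_df(ratings[["user_id", "item_id", "rating"]], reader)

param_grid = {"n_factors": [50, 100],
              "reg_all": [0.02, 0.1],
              "lr_all": [0.005, 0.01]}
gs = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=5)
gs.fit(data)

print(gs.best_score["rmse"])    # best mean RMSE across the 5 folds
print(gs.best_params["rmse"])   # parameter combination that achieved it
```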
Finally, we will generate recommendations for a few randomly selected users to see whether the recommender systems we have built make common-sense suggestions.
Fig 1: Process workflow
Step 1: We will first import the dataset from GitHub, clean it, and perform standard data exploration via statistical summaries and visualizations.
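A minimal sketch of this step, assuming the standard MovieLens 100k layout (tab-separated `u.data` with user, item, rating, and timestamp columns) sits locally under `ml-100k/`; the path and column names are assumptions.

```python
# Sketch of Step 1: load and explore the MovieLens 100k ratings file.
import pandas as pd
import matplotlib.pyplot as plt

cols = ["user_id", "item_id", "rating", "timestamp"]
ratings = pd.read_csv("ml-100k/u.data", sep="\t", names=cols)

print(ratings.shape)                    # expected: (100000, 4)
print(ratings["rating"].describe())     # basic statistical checks
print(ratings["user_id"].nunique(), ratings["item_id"].nunique())

# Simple visualization: distribution of ratings
ratings["rating"].value_counts().sort_index().plot(kind="bar")
plt.xlabel("Rating")
plt.ylabel("Count")
plt.show()
```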
Step 2/3: Transform and/or combine the datasets for further analysis if necessary before splitting them into training and testing sets. Once the data is deemed “clean”, it will be loaded into either Python or R.
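The split itself could look like the sketch below, using Surprise’s train/test helper; the 80/20 ratio, random seed, and file path are assumptions for illustration.

```python
# Hedged sketch of the train/test split using Surprise's model_selection helper.
import pandas as pd
from surprise import Dataset, Reader
from surprise.model_selection import train_test_split

ratings = pd.read_csv("ml-100k/u.data", sep="\t",
                      names=["user_id", "item_id", "rating", "timestamp"])
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings[["user_id", "item_id", "rating"]], reader)

# Hold out 20% of the ratings for testing; seed fixed for reproducibility
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)
```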
Step 4: Introduce and implement User-Based Collaborative Filtering (UBCF), Item-Based Collaborative Filtering (IBCF), Singular Value Decomposition (SVD), and other algorithms with different parameters (e.g., similarity methods, normalization techniques) from the Python packages.
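A sketch of how these algorithms might be set up with the Surprise library is shown below; the neighborhood sizes, similarity choices, and factor counts are illustrative assumptions rather than final settings.

```python
# Sketch of Step 4 with Surprise: UBCF and IBCF via k-NN models, plus an
# SVD-style matrix factorization model. Parameter values are assumptions.
from surprise import Dataset, KNNBasic, KNNWithMeans, SVD

# load_builtin may prompt to download MovieLens 100k the first time it runs
data = Dataset.load_builtin("ml-100k")
trainset = data.build_full_trainset()

# User-Based CF: cosine similarity between users
ubcf = KNNBasic(k=40, sim_options={"name": "cosine", "user_based": True})

# Item-Based CF: Pearson similarity between items, with mean-centering
ibcf = KNNWithMeans(k=40, sim_options={"name": "pearson", "user_based": False})

# Matrix factorization (SVD-style latent factors)
svd = SVD(n_factors=100, n_epochs=20)

for model in (ubcf, ibcf, svd):
    model.fit(trainset)
```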
Step 5: Evaluate model performance and accuracy using RMSE and precision-recall metrics. These metrics will in turn guide the choice of the final recommender system to be built for production: the model with the best overall accuracy and computational performance.
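The evaluation could be sketched as follows: RMSE on the hold-out set via Surprise’s accuracy module, plus a simple precision/recall-at-k computed from the predictions. The relevance threshold of 3.5 and k of 10 are illustrative assumptions.

```python
# Hedged sketch of Step 5: RMSE and precision/recall-at-k on a hold-out set.
from collections import defaultdict
from surprise import Dataset, SVD, accuracy
from surprise.model_selection import train_test_split

data = Dataset.load_builtin("ml-100k")   # may prompt to download the data
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

model = SVD()
model.fit(trainset)
predictions = model.test(testset)
accuracy.rmse(predictions)               # prints and returns the RMSE

def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Average precision and recall at k, treating ratings >= threshold as relevant."""
    user_est_true = defaultdict(list)
    for pred in predictions:
        user_est_true[pred.uid].append((pred.est, pred.r_ui))
    precisions, recalls = [], []
    for user_ratings in user_est_true.values():
        user_ratings.sort(key=lambda x: x[0], reverse=True)
        n_rel = sum(true >= threshold for _, true in user_ratings)
        n_rec_k = sum(est >= threshold for est, _ in user_ratings[:k])
        n_hit_k = sum((true >= threshold and est >= threshold)
                      for est, true in user_ratings[:k])
        precisions.append(n_hit_k / n_rec_k if n_rec_k else 0)
        recalls.append(n_hit_k / n_rel if n_rel else 0)
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)

print(precision_recall_at_k(predictions, k=10, threshold=3.5))
```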
References:
MovieLens data, downloaded from Kaggle: https://www.kaggle.com/prajitdatta/movielens-100k-dataset?
Building a Recommendation System with R by Suresh K. Gorakala & Michele Usuelli