Project 1

  • Briefly describe the recommender system that you’re going to build out from a business perspective, e.g. “This system recommends data science books to readers.”
  • Find a dataset, or build out your own toy dataset. As a minimum requirement for complexity, please include numeric ratings for at least five users, across at least five items, with some missing data.
  • Load your data into (for example) an R or pandas dataframe, a Python dictionary or list of lists (or another data structure of your choosing). From there, create a user-item matrix.
  • If you choose to work with a large dataset, you’re encouraged to also create a small, relatively dense “user-item” matrix as a subset so that you can hand-verify your calculations.
  • Break your ratings into separate training and test datasets.
  • Using your training data, calculate the raw average (mean) rating for every user-item combination.
  • Calculate the RMSE for raw average for both your training data and your test data.
  • Using your training data, calculate the bias for each user and each item.
  • From the raw average, and the appropriate user and item biases, calculate the baseline predictors for every user-item combination.
  • Calculate the RMSE for the baseline predictors for both your training data and your test data.
  • Summarize your results.
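For reference, the quantities the steps above ask for can be written as follows. Here $\mu$ is the raw average over the training ratings $R_{\text{train}}$, $\bar{r}_u$ and $\bar{r}_i$ are the mean training ratings of user $u$ and item $i$, and $R$ is whichever set (training or test) the RMSE is evaluated on. Treating each bias as a plain mean offset, with no regularization, is an assumption; it appears consistent with the baseline table later in this report.

```latex
\begin{aligned}
\mu &= \frac{1}{|R_{\text{train}}|} \sum_{(u,i) \in R_{\text{train}}} r_{ui}
  && \text{raw average; its prediction is } \hat{r}_{ui} = \mu \\
b_u &= \bar{r}_u - \mu, \qquad b_i = \bar{r}_i - \mu
  && \text{user and item biases} \\
\hat{r}_{ui} &= \mu + b_u + b_i
  && \text{baseline predictor} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{|R|} \sum_{(u,i) \in R} \left(r_{ui} - \hat{r}_{ui}\right)^2}
  && R = \text{training or test ratings}
\end{aligned}
```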

    Data Processing

    Built a sample dataset of 10 users and 10 books, randomly assigning ratings from 1 to 5 and leaving some ratings missing (NA).

    User-Book Ratings
    User    Book_1  Book_2  Book_3  Book_4  Book_5  Book_6  Book_7  Book_8  Book_9  Book_10
    User_1  4       1       5       4       3       1       5       4       2       1
    User_2  3       5       4       1       NA      1       3       1       3       5
    User_3  2       4       1       2       4       3       1       3       5       5
    User_4  1       4       4       3       1       1       5       4       5       5
    User_5  1       4       NA      1       4       4       3       4       2       5
    User_6  3       NA      1       3       3       4       NA      4       2       5
    User_7  3       2       NA      2       1       2       4       4       3       5
    User_8  2       NA      4       2       NA      5       2       1       2       1
    User_9  5       4       5       5       1       4       4       1       NA      5
    User_10 2       1       NA      3       NA      4       5       5       3       3
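A minimal Python sketch, not the code used for this report, of loading the table above into a user-item matrix. It assumes a nested dictionary keyed by user and then book, with `None` standing in for `NA`; `BOOKS`, `RATINGS_TEXT`, and `parse_matrix` are hypothetical names.

```python
# Hypothetical sketch: parse the ratings table into a user-item matrix
# stored as {user: {book: rating}}, where None marks a missing (NA) rating.
BOOKS = [f"Book_{j}" for j in range(1, 11)]

RATINGS_TEXT = """
User_1 4 1 5 4 3 1 5 4 2 1
User_2 3 5 4 1 NA 1 3 1 3 5
User_3 2 4 1 2 4 3 1 3 5 5
User_4 1 4 4 3 1 1 5 4 5 5
User_5 1 4 NA 1 4 4 3 4 2 5
User_6 3 NA 1 3 3 4 NA 4 2 5
User_7 3 2 NA 2 1 2 4 4 3 5
User_8 2 NA 4 2 NA 5 2 1 2 1
User_9 5 4 5 5 1 4 4 1 NA 5
User_10 2 1 NA 3 NA 4 5 5 3 3
"""


def parse_matrix(text):
    """Turn rows like "User_1 4 1 NA ..." into {user: {book: float or None}}."""
    matrix = {}
    for line in text.strip().splitlines():
        user, *values = line.split()
        matrix[user] = {
            book: None if value == "NA" else float(value)
            for book, value in zip(BOOKS, values)
        }
    return matrix


ratings = parse_matrix(RATINGS_TEXT)
observed = [r for row in ratings.values() for r in row.values() if r is not None]
n_missing = sum(r is None for row in ratings.values() for r in row.values())
print(len(observed), n_missing)  # 90 10
```

A nested dictionary keeps missing cells explicit and needs no third-party packages; a pandas DataFrame with NaN for missing ratings would serve the same purpose.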

    Train Dataset

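    Built the training dataset by removing nine of the observed ratings from the full matrix; those nine ratings make up the test dataset.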

    Training Dataset
    User    Book_1  Book_2  Book_3  Book_4  Book_5  Book_6  Book_7  Book_8  Book_9  Book_10
    User_1  4       1       5       4       NA      1       5       4       2       1
    User_2  3       5       4       1       NA      NA      3       NA      3       5
    User_3  2       4       1       2       4       3       1       3       5       5
    User_4  1       4       4       3       1       1       5       4       5       5
    User_5  1       4       NA      1       NA      4       3       4       2       NA
    User_6  3       NA      NA      3       3       4       NA      4       2       5
    User_7  3       2       NA      2       1       2       4       4       3       5
    User_8  NA      NA      4       2       NA      5       2       1       2       1
    User_9  5       4       5       5       1       4       4       1       NA      5
    User_10 2       1       NA      3       NA      4       5       NA      3       NA

    Test Dataset

    Built the test dataset from the nine held-out ratings; every other cell is NA.

    Test Dataset
    User    Book_1  Book_2  Book_3  Book_4  Book_5  Book_6  Book_7  Book_8  Book_9  Book_10
    User_1  NA      NA      NA      NA      3       NA      NA      NA      NA      NA
    User_2  NA      NA      NA      NA      NA      1       NA      1       NA      NA
    User_3  NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
    User_4  NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
    User_5  NA      NA      NA      NA      4       NA      NA      NA      NA      5
    User_6  NA      NA      1       NA      NA      NA      NA      NA      NA      NA
    User_7  NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
    User_8  2       NA      NA      NA      NA      NA      NA      NA      NA      NA
    User_9  NA      NA      NA      NA      NA      NA      NA      NA      NA      NA
    User_10 NA      NA      NA      NA      NA      NA      NA      5       NA      3
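The report does not say how the nine test ratings were chosen. One common approach, sketched here as an assumption rather than the method actually used, is to hold out a random fraction of the observed ratings with a fixed seed so the split is reproducible. `split_train_test`, `test_fraction`, and `seed` are hypothetical names; with `test_fraction=0.1` the sketch also holds out nine of the 90 ratings, though not necessarily the same nine.

```python
import random

BOOKS = [f"Book_{j}" for j in range(1, 11)]

# Full ratings table from the Data Processing section ("NA" = missing).
RATINGS_TEXT = """
User_1 4 1 5 4 3 1 5 4 2 1
User_2 3 5 4 1 NA 1 3 1 3 5
User_3 2 4 1 2 4 3 1 3 5 5
User_4 1 4 4 3 1 1 5 4 5 5
User_5 1 4 NA 1 4 4 3 4 2 5
User_6 3 NA 1 3 3 4 NA 4 2 5
User_7 3 2 NA 2 1 2 4 4 3 5
User_8 2 NA 4 2 NA 5 2 1 2 1
User_9 5 4 5 5 1 4 4 1 NA 5
User_10 2 1 NA 3 NA 4 5 5 3 3
"""


def parse_matrix(text):
    """Turn rows like "User_1 4 1 NA ..." into {user: {book: float or None}}."""
    matrix = {}
    for line in text.strip().splitlines():
        user, *values = line.split()
        matrix[user] = {
            book: None if value == "NA" else float(value)
            for book, value in zip(BOOKS, values)
        }
    return matrix


def split_train_test(matrix, test_fraction=0.1, seed=42):
    """Hold out a random share of the observed ratings as a test set.

    Returns (train, test) with the same users and books as `matrix`;
    every observed rating lands in exactly one of the two.
    """
    cells = [(u, b) for u, row in matrix.items() for b, r in row.items() if r is not None]
    n_test = max(1, round(test_fraction * len(cells)))
    held_out = set(random.Random(seed).sample(cells, n_test))
    train = {u: {b: None if (u, b) in held_out else r for b, r in row.items()}
             for u, row in matrix.items()}
    test = {u: {b: r if (u, b) in held_out else None for b, r in row.items()}
            for u, row in matrix.items()}
    return train, test


ratings = parse_matrix(RATINGS_TEXT)
train, test = split_train_test(ratings)
```

With data this small, it is worth checking that every user and every book keeps at least one training rating after the split; otherwise its bias cannot be computed from the training data.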

    Raw Average

    Used the replicate function to build a user-item matrix in which every cell holds the raw average (mean) of all training ratings, 3.11.

    Raw Average Matrix
    User    Book_1  Book_2  Book_3  Book_4  Book_5  Book_6  Book_7  Book_8  Book_9  Book_10
    User_1  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_2  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_3  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_4  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_5  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_6  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_7  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_8  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_9  3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
    User_10 3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11    3.11
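A sketch of the raw-average step and its RMSE on both splits, assuming the nested-dictionary layout and using the training and test tables above. `TRAIN_TEXT`, `TEST`, `parse_matrix`, and `rmse` are hypothetical names. Every prediction is the same number: the mean of the 81 training ratings.

```python
from math import sqrt

BOOKS = [f"Book_{j}" for j in range(1, 11)]

# Training table from above ("NA" = missing or held out for testing).
TRAIN_TEXT = """
User_1 4 1 5 4 NA 1 5 4 2 1
User_2 3 5 4 1 NA NA 3 NA 3 5
User_3 2 4 1 2 4 3 1 3 5 5
User_4 1 4 4 3 1 1 5 4 5 5
User_5 1 4 NA 1 NA 4 3 4 2 NA
User_6 3 NA NA 3 3 4 NA 4 2 5
User_7 3 2 NA 2 1 2 4 4 3 5
User_8 NA NA 4 2 NA 5 2 1 2 1
User_9 5 4 5 5 1 4 4 1 NA 5
User_10 2 1 NA 3 NA 4 5 NA 3 NA
"""

# The nine held-out ratings from the test table: (user, book, rating).
TEST = [
    ("User_1", "Book_5", 3), ("User_2", "Book_6", 1), ("User_2", "Book_8", 1),
    ("User_5", "Book_5", 4), ("User_5", "Book_10", 5), ("User_6", "Book_3", 1),
    ("User_8", "Book_1", 2), ("User_10", "Book_8", 5), ("User_10", "Book_10", 3),
]


def parse_matrix(text):
    """Turn rows like "User_1 4 1 NA ..." into {user: {book: float or None}}."""
    matrix = {}
    for line in text.strip().splitlines():
        user, *values = line.split()
        matrix[user] = {
            book: None if value == "NA" else float(value)
            for book, value in zip(BOOKS, values)
        }
    return matrix


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    pairs = list(pairs)
    return sqrt(sum((actual - pred) ** 2 for actual, pred in pairs) / len(pairs))


train = parse_matrix(TRAIN_TEXT)
train_ratings = [r for row in train.values() for r in row.values() if r is not None]
raw_avg = sum(train_ratings) / len(train_ratings)

# Raw-average predictions: the same value in every user-book cell.
raw_matrix = {user: {book: raw_avg for book in BOOKS} for user in train}

train_rmse = rmse((r, raw_avg) for r in train_ratings)
test_rmse = rmse((r, raw_avg) for _, _, r in TEST)
print(f"raw average = {raw_avg:.2f}")  # raw average = 3.11
print(f"RMSE: train = {train_rmse:.4f}, test = {test_rmse:.4f}")
```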

    Baseline Predictor

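    Calculated the baseline predictor for every user-item combination as the raw average plus the user's bias plus the book's bias, where each bias is that user's (or book's) mean training rating minus the raw average.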
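A sketch of the bias and baseline calculation, assuming each bias is a plain mean offset from the raw average (no regularization). For example, User_1's training mean is 27/9 = 3.00 and Book_1's is 24/9 ≈ 2.67, so the baseline is 3.11 + (3.00 - 3.11) + (2.67 - 3.11) ≈ 2.56, the first cell of the table below. The table's values appear to have been computed from means rounded to two decimals, so this unrounded version can differ by 0.01 in a few cells (User_1, Book_2 comes out as 3.01 rather than 3.02). `TRAIN_TEXT` and `parse_matrix` are hypothetical names.

```python
from statistics import mean

BOOKS = [f"Book_{j}" for j in range(1, 11)]

# Training table ("NA" = missing or held out for testing).
TRAIN_TEXT = """
User_1 4 1 5 4 NA 1 5 4 2 1
User_2 3 5 4 1 NA NA 3 NA 3 5
User_3 2 4 1 2 4 3 1 3 5 5
User_4 1 4 4 3 1 1 5 4 5 5
User_5 1 4 NA 1 NA 4 3 4 2 NA
User_6 3 NA NA 3 3 4 NA 4 2 5
User_7 3 2 NA 2 1 2 4 4 3 5
User_8 NA NA 4 2 NA 5 2 1 2 1
User_9 5 4 5 5 1 4 4 1 NA 5
User_10 2 1 NA 3 NA 4 5 NA 3 NA
"""


def parse_matrix(text):
    """Turn rows like "User_1 4 1 NA ..." into {user: {book: float or None}}."""
    matrix = {}
    for line in text.strip().splitlines():
        user, *values = line.split()
        matrix[user] = {
            book: None if value == "NA" else float(value)
            for book, value in zip(BOOKS, values)
        }
    return matrix


train = parse_matrix(TRAIN_TEXT)
mu = mean(r for row in train.values() for r in row.values() if r is not None)

# b_u: each user's mean training rating minus the raw average.
user_bias = {
    user: mean(r for r in row.values() if r is not None) - mu
    for user, row in train.items()
}
# b_i: each book's mean training rating (down its column) minus the raw average.
item_bias = {
    book: mean(train[user][book] for user in train if train[user][book] is not None) - mu
    for book in BOOKS
}
# Baseline predictor mu + b_u + b_i for every user-book cell, rated or not.
baseline = {
    user: {book: mu + user_bias[user] + item_bias[book] for book in BOOKS}
    for user in train
}
print(f"{baseline['User_1']['Book_1']:.2f}")  # 2.56
```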

    Baseline Predictors
    User    Book_1  Book_2  Book_3  Book_4  Book_5  Book_6  Book_7  Book_8  Book_9  Book_10
    User_1  2.56    3.02    3.72    2.49    1.89    3.00    3.45    3.02    2.89    3.89
    User_2  2.99    3.45    4.15    2.92    2.32    3.43    3.88    3.45    3.32    4.32
    User_3  2.56    3.02    3.72    2.49    1.89    3.00    3.45    3.02    2.89    3.89
    User_4  2.86    3.32    4.02    2.79    2.19    3.30    3.75    3.32    3.19    4.19
    User_5  2.27    2.73    3.43    2.20    1.60    2.71    3.16    2.73    2.60    3.60
    User_6  2.99    3.45    4.15    2.92    2.32    3.43    3.88    3.45    3.32    4.32
    User_7  2.45    2.91    3.61    2.38    1.78    2.89    3.34    2.91    2.78    3.78
    User_8  1.99    2.45    3.15    1.92    1.32    2.43    2.88    2.45    2.32    3.32
    User_9  3.34    3.80    4.50    3.27    2.67    3.78    4.23    3.80    3.67    4.67
    User_10 2.56    3.02    3.72    2.49    1.89    3.00    3.45    3.02    2.89    3.89
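A sketch of the remaining baseline step: the RMSE of the baseline predictors on the training ratings and on the nine held-out test ratings, assuming plain mean-offset biases as above. The block is self-contained, so it repeats the parsing and bias code; `TRAIN_TEXT`, `TEST`, `rmse`, and `predict` are hypothetical names.

```python
from math import sqrt
from statistics import mean

BOOKS = [f"Book_{j}" for j in range(1, 11)]

TRAIN_TEXT = """
User_1 4 1 5 4 NA 1 5 4 2 1
User_2 3 5 4 1 NA NA 3 NA 3 5
User_3 2 4 1 2 4 3 1 3 5 5
User_4 1 4 4 3 1 1 5 4 5 5
User_5 1 4 NA 1 NA 4 3 4 2 NA
User_6 3 NA NA 3 3 4 NA 4 2 5
User_7 3 2 NA 2 1 2 4 4 3 5
User_8 NA NA 4 2 NA 5 2 1 2 1
User_9 5 4 5 5 1 4 4 1 NA 5
User_10 2 1 NA 3 NA 4 5 NA 3 NA
"""

# The nine held-out ratings from the test table: (user, book, rating).
TEST = [
    ("User_1", "Book_5", 3), ("User_2", "Book_6", 1), ("User_2", "Book_8", 1),
    ("User_5", "Book_5", 4), ("User_5", "Book_10", 5), ("User_6", "Book_3", 1),
    ("User_8", "Book_1", 2), ("User_10", "Book_8", 5), ("User_10", "Book_10", 3),
]


def parse_matrix(text):
    """Turn rows like "User_1 4 1 NA ..." into {user: {book: float or None}}."""
    matrix = {}
    for line in text.strip().splitlines():
        user, *values = line.split()
        matrix[user] = {
            book: None if value == "NA" else float(value)
            for book, value in zip(BOOKS, values)
        }
    return matrix


def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    pairs = list(pairs)
    return sqrt(sum((actual - pred) ** 2 for actual, pred in pairs) / len(pairs))


train = parse_matrix(TRAIN_TEXT)
mu = mean(r for row in train.values() for r in row.values() if r is not None)
user_bias = {u: mean(r for r in row.values() if r is not None) - mu for u, row in train.items()}
item_bias = {b: mean(train[u][b] for u in train if train[u][b] is not None) - mu for b in BOOKS}


def predict(user, book):
    """Baseline predictor: raw average + user bias + item bias."""
    return mu + user_bias[user] + item_bias[book]


train_rmse = rmse(
    (r, predict(u, b)) for u, row in train.items() for b, r in row.items() if r is not None
)
test_rmse = rmse((r, predict(u, b)) for u, b, r in TEST)
print(f"baseline RMSE: train = {train_rmse:.4f}, test = {test_rmse:.4f}")
```

Comparing these two numbers with the raw-average RMSE on the same splits shows whether the biases actually help on unseen ratings; with only nine test ratings, that comparison is noisy.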