Project 1

Briefly describe the recommender system that you’re going to build out from a business perspective, e.g. “This system recommends data science books to readers.”

This system recommends movies to users.

Find a dataset, or build out your own toy dataset. As a minimum requirement for complexity, please include numeric ratings for at least five users, across at least five items, with some missing data.

I created a dataset of ratings from 20 users for 7 movies, with two missing ratings. The dataset is located here: https://raw.githubusercontent.com/ashishsm1986/Cuny-Assignments/master/Data-612/Week1/ashish-movies.csv

Load your data into (for example) an R or pandas dataframe, a Python dictionary or list of lists, (or another data structure of your choosing). From there, create a user-item matrix.

##    userId Toy.Story Jumanji Interstellar GoldenEye American.President Turning The.Martian
## 1       1       3.5     3.0          4.0       3.0                5.0     4.5         3.0
## 2       2       4.0     3.0          3.0       2.0                3.0     4.0         2.0
## 3       3       3.5     3.5          5.0       5.0                4.0     2.0         2.5
## 4       4       3.0     4.0          3.0       4.0                3.0      NA         5.0
## 5       5       2.5     2.5          4.0       4.5                4.5     3.5         1.5
## 6       6       2.5     3.0          3.5       3.5                3.0     4.0         2.0
## 7       7       4.0     3.0          5.0        NA                4.0     4.5         2.0
## 8       8       3.0     5.0          4.0       4.0                3.0     5.0         2.0
## 9       9       4.0     4.0          3.0       2.0                5.0     5.0         3.5
## 10     10       4.0     5.0          2.0       4.0                5.0     4.0         3.0
## 11     11       3.0     1.5          4.5       2.0                4.0     4.5         4.0
## 12     12       4.0     3.0          1.0       3.0                3.0     4.0         2.0
## 13     13       3.0     3.5          5.0       3.0                4.0     4.0         5.0
## 14     14       3.5     3.0          2.0       3.0                4.0     3.0         3.0
## 15     15       2.0     2.0          4.0       4.0                5.0     2.0         3.0
## 16     16       4.0     4.0          2.0       1.0                2.0     3.0         4.0
## 17     17       4.5     4.0          3.0       3.0                4.0     2.0         2.0
## 18     18       5.0     3.5          3.0       4.0                3.0     3.0         2.0
## 19     19       4.0     4.0          2.0       4.0                3.0     4.0         2.0
## 20     20       4.0     4.0          5.0       4.0                4.0     2.0         4.0
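
The code that produced the table above is not shown. As a rough illustration only, here is a minimal Python sketch of loading wide-format ratings into a user-item matrix held as a dictionary of dictionaries. The inline sample copies the first four rows of the table; the column names, the "NA" marker for missing ratings, and the helper name load_user_item_matrix are assumptions, not details confirmed by the CSV file.

```python
import csv
import io

# Inline sample mirroring the first four rows of the table above.
# Assumption: the real CSV uses these column names and writes missing ratings as "NA".
SAMPLE_CSV = """userId,Toy.Story,Jumanji,Interstellar,GoldenEye,American.President,Turning,The.Martian
1,3.5,3.0,4.0,3.0,5.0,4.5,3.0
2,4.0,3.0,3.0,2.0,3.0,4.0,2.0
3,3.5,3.5,5.0,5.0,4.0,2.0,2.5
4,3.0,4.0,3.0,4.0,3.0,NA,5.0
"""


def load_user_item_matrix(text):
    """Return {user_id: {movie: rating or None}} from wide-format CSV text."""
    matrix = {}
    for row in csv.DictReader(io.StringIO(text)):
        user_id = int(row.pop("userId"))
        # Empty cells and "NA" are treated as missing ratings.
        matrix[user_id] = {
            movie: (None if value in ("", "NA") else float(value))
            for movie, value in row.items()
        }
    return matrix


ratings = load_user_item_matrix(SAMPLE_CSV)
print(ratings[1]["Toy.Story"])  # 3.5
print(ratings[4]["Turning"])    # None (missing rating)
```

Keeping missing ratings as None, rather than 0, means later averages can skip them instead of treating them as low scores.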

Break your ratings into separate training and test datasets.

The split is by user: 70% of the users (14) are assigned to the training set and 30% (6) to the test set.

Training set (14 users)

##    userId Toy.Story Jumanji Interstellar GoldenEye American.President Turning The.Martian
## 2       2       4.0     3.0          3.0       2.0                3.0     4.0         2.0
## 11     11       3.0     1.5          4.5       2.0                4.0     4.5         4.0
## 15     15       2.0     2.0          4.0       4.0                5.0     2.0         3.0
## 19     19       4.0     4.0          2.0       4.0                3.0     4.0         2.0
## 9       9       4.0     4.0          3.0       2.0                5.0     5.0         3.5
## 16     16       4.0     4.0          2.0       1.0                2.0     3.0         4.0
## 5       5       2.5     2.5          4.0       4.5                4.5     3.5         1.5
## 7       7       4.0     3.0          5.0        NA                4.0     4.5         2.0
## 13     13       3.0     3.5          5.0       3.0                4.0     4.0         5.0
## 3       3       3.5     3.5          5.0       5.0                4.0     2.0         2.5
## 17     17       4.5     4.0          3.0       3.0                4.0     2.0         2.0
## 6       6       2.5     3.0          3.5       3.5                3.0     4.0         2.0
## 20     20       4.0     4.0          5.0       4.0                4.0     2.0         4.0
## 12     12       4.0     3.0          1.0       3.0                3.0     4.0         2.0

Test set (6 users)

##    userId Toy.Story Jumanji Interstellar GoldenEye American.President Turning The.Martian
## 1       1       3.5     3.0            4         3                  5     4.5           3
## 4       4       3.0     4.0            3         4                  3      NA           5
## 8       8       3.0     5.0            4         4                  3     5.0           2
## 10     10       4.0     5.0            2         4                  5     4.0           3
## 14     14       3.5     3.0            2         3                  4     3.0           3
## 18     18       5.0     3.5            3         4                  3     3.0           2
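
The split code and its random seed are not shown in the source. The following Python sketch illustrates one way a 70/30 split by user could be done; the function name split_users and the seed value are hypothetical, so it will not reproduce the exact users listed above.

```python
import random


def split_users(user_ids, train_fraction=0.7, seed=0):
    """Shuffle user ids and split them into (train_ids, test_ids)."""
    ids = list(user_ids)
    rng = random.Random(seed)  # fixed seed (an assumption) keeps the split repeatable
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_users(range(1, 21))
print(len(train_ids), len(test_ids))  # 14 6
```

Splitting by user, as the source does, means every test user is unseen during training. A common alternative is to hold out individual ratings, so that each test user still has some training history.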

Using your training data, calculate the bias for each user and each item.

Bias for users (each user's mean rating minus the raw average of the training ratings)

##    userId         bias
## 1       2 -0.355670103
## 2      11  0.001472754
## 3      15 -0.212812960
## 4      19 -0.069955817
## 5       9  0.430044183
## 6      16 -0.498527246
## 7       5 -0.069955817
## 8       7  0.394329897
## 9      13  0.572901325
## 10      3  0.287187040
## 11     17 -0.141384389
## 12      6 -0.284241532
## 13     20  0.501472754
## 14     12 -0.498527246

Bias for movies (each movie's mean rating minus the raw average of the training ratings)

##                movie       bias
## 2          Toy.Story  0.1443299
## 3            Jumanji -0.1413844
## 4       Interstellar  0.2157585
## 5          GoldenEye -0.2018239
## 6 American.President  0.3943299
## 7            Turning  0.1086156
## 8        The.Martian -0.5342415
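
The bias tables above can be reproduced from the training ratings. The Python sketch below is an illustration, not the code that generated the output: it computes the raw average over the 97 observed training ratings, then subtracts it from each user's mean and each movie's mean. The variable names and the mean helper are assumptions introduced here.

```python
MOVIES = ["Toy.Story", "Jumanji", "Interstellar", "GoldenEye",
          "American.President", "Turning", "The.Martian"]

# Training rows copied from the training table; None marks a missing rating.
TRAIN = {
    2:  [4.0, 3.0, 3.0, 2.0, 3.0, 4.0, 2.0],
    11: [3.0, 1.5, 4.5, 2.0, 4.0, 4.5, 4.0],
    15: [2.0, 2.0, 4.0, 4.0, 5.0, 2.0, 3.0],
    19: [4.0, 4.0, 2.0, 4.0, 3.0, 4.0, 2.0],
    9:  [4.0, 4.0, 3.0, 2.0, 5.0, 5.0, 3.5],
    16: [4.0, 4.0, 2.0, 1.0, 2.0, 3.0, 4.0],
    5:  [2.5, 2.5, 4.0, 4.5, 4.5, 3.5, 1.5],
    7:  [4.0, 3.0, 5.0, None, 4.0, 4.5, 2.0],
    13: [3.0, 3.5, 5.0, 3.0, 4.0, 4.0, 5.0],
    3:  [3.5, 3.5, 5.0, 5.0, 4.0, 2.0, 2.5],
    17: [4.5, 4.0, 3.0, 3.0, 4.0, 2.0, 2.0],
    6:  [2.5, 3.0, 3.5, 3.5, 3.0, 4.0, 2.0],
    20: [4.0, 4.0, 5.0, 4.0, 4.0, 2.0, 4.0],
    12: [4.0, 3.0, 1.0, 3.0, 3.0, 4.0, 2.0],
}


def mean(values):
    """Average of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


# Raw average over every observed training rating.
raw_average = mean(v for row in TRAIN.values() for v in row)

# Bias = mean rating for that user (or movie) minus the raw average.
user_bias = {user: mean(row) - raw_average for user, row in TRAIN.items()}
item_bias = {
    movie: mean(row[j] for row in TRAIN.values()) - raw_average
    for j, movie in enumerate(MOVIES)
}

print(round(raw_average, 4))             # 3.3557
print(round(user_bias[2], 4))            # -0.3557
print(round(item_bias["Toy.Story"], 4))  # 0.1443
```

A negative bias marks a user who rates below the overall average or a movie that is rated below it. The.Martian has the lowest movie bias in this training set.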

From the raw average, and the appropriate user and item biases, calculate the baseline predictors for every user-item combination.

##    userId Toy.Story Jumanji Interstellar GoldenEye American.President Turning The.Martian
## 2       2      3.14    2.86         3.22      2.80               3.39    3.11        2.47
## 11     11      3.50    3.22         3.57      3.16               3.75    3.47        2.82
## 15     15      3.29    3.00         3.36      2.94               3.54    3.25        2.61
## 19     19      3.43    3.14         3.50      3.08               3.68    3.39        2.75
## 9       9      3.93    3.64         4.00      3.58               4.18    3.89        3.25
## 16     16      3.00    2.72         3.07      2.66               3.25    2.97        2.32
## 5       5      3.43    3.14         3.50      3.08               3.68    3.39        2.75
## 7       7      3.89    3.61         3.97      3.55               4.14    3.86        3.22
## 13     13      4.07    3.79         4.14      3.73               4.32    4.04        3.39
## 3       3      3.79    3.50         3.86      3.44               4.04    3.75        3.11
## 17     17      3.36    3.07         3.43      3.01               3.61    3.32        2.68
## 6       6      3.22    2.93         3.29      2.87               3.47    3.18        2.54
## 20     20      4.00    3.72         4.07      3.66               4.25    3.97        3.32
## 12     12      3.00    2.72         3.07      2.66               3.25    2.97        2.32
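
Each entry in the table above is the raw average plus the user's bias plus the movie's bias. The Python sketch below checks two cells as a worked example. The raw average is not printed in the source; the value 325.5 / 97 (about 3.3557) is recomputed here from the sum and count of the observed training ratings. The bias values are copied from the bias tables, and the function name baseline is an assumption.

```python
# Sum (325.5) and count (97) of the observed training ratings.
RAW_AVERAGE = 325.5 / 97

# A few biases copied from the bias tables in this report.
USER_BIAS = {2: -0.355670103, 13: 0.572901325}
ITEM_BIAS = {"Toy.Story": 0.1443299, "American.President": 0.3943299}


def baseline(user_id, movie):
    """Baseline predictor: raw average + user bias + item bias."""
    return RAW_AVERAGE + USER_BIAS[user_id] + ITEM_BIAS[movie]


print(round(baseline(2, "Toy.Story"), 2))            # 3.14
print(round(baseline(13, "American.President"), 2))  # 4.32
```

The predictions here are left unclipped, which matches the table. On a dataset where the sum could fall outside the rating scale, clipping to the valid range is a common extra step.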

Calculate the RMSE for the baseline predictors for both your training data and your test data.

For Train Dataset

User bias for the test set (each test user's mean rating minus the raw average of the training ratings)

##   userId         bias
## 1      1  0.358615611
## 2      4  0.310996564
## 3      8  0.358615611
## 4     10  0.501472754
## 5     14 -0.284241532
## 6     18  0.001472754

Baseline predictions for the test set (training raw average plus test user bias plus training movie bias)

##    userId Toy.Story Jumanji Interstellar GoldenEye American.President Turning The.Martian
## 1       1      3.86    3.57         3.93      3.51               4.11    3.82        3.18
## 4       4      3.81    3.53         3.88      3.46               4.06    3.78        3.13
## 8       8      3.86    3.57         3.93      3.51               4.11    3.82        3.18
## 10     10      4.00    3.72         4.07      3.66               4.25    3.97        3.32
## 14     14      3.22    2.93         3.29      2.87               3.47    3.18        2.54
## 18     18      3.50    3.22         3.57      3.16               3.75    3.47        2.82

Test RMSE Calculation

The RMSE of the baseline predictors is 0.915 on the training data and 0.847 on the test data.
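
The RMSE code is not shown in the source. As a hedged sketch, the Python function below computes the root mean squared error over only the observed ratings, skipping missing ones. The example uses test user 4's actual ratings and rounded baseline predictions from the tables in this report. It covers a single user, so it is not expected to match the pooled test RMSE of 0.847. The function name rmse and the variable names are assumptions.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over pairs where the actual rating is observed."""
    squared_errors = [
        (a - p) ** 2 for a, p in zip(actual, predicted) if a is not None
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Test user 4: actual ratings (None is the missing "Turning" rating)
# and the rounded baseline predictions for the same movies.
actual_user4 = [3.0, 4.0, 3.0, 4.0, 3.0, None, 5.0]
predicted_user4 = [3.81, 3.53, 3.88, 3.46, 4.06, 3.78, 3.13]

print(round(rmse(actual_user4, predicted_user4), 2))  # 1.05
```

To get a dataset-level figure, the same function can be applied to the flattened list of all actual ratings and the matching flattened list of predictions.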

Summarize your results.

Both the training and test RMSE improved once the user and movie biases were added to the raw average. The test RMSE improved by roughly 6.72% and the training RMSE by roughly 10.56%. This is a modest improvement, probably because the dataset covers only a small set of movies. The results could possibly be improved by using a larger dataset with a wider variety of movies.
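
The percentages above compare the baseline RMSE with the RMSE obtained when every prediction is just the raw average. The raw-average RMSE values are not reported in the source, so the Python sketch below only shows the arithmetic. Its inputs are illustrative placeholders, not results from this dataset, and the function name percent_improvement is an assumption.

```python
def percent_improvement(raw_average_rmse, baseline_rmse):
    """Percentage reduction in RMSE relative to the raw-average predictor."""
    return (raw_average_rmse - baseline_rmse) / raw_average_rmse * 100


# Placeholder inputs chosen only to show the calculation.
print(percent_improvement(2.0, 1.5))  # 25.0
```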