Basic Recommender System

Task: Create a recommender that predicts a user's rating of a U.S. president from a user bias and a president bias, each measured relative to the overall average rating.

Fifteen people rated 10 U.S. presidents on a scale of 1 to 5. The presidents rated were Ronald Reagan, Abraham Lincoln, John Quincy Adams, Gerald Ford, John F Kennedy, Franklin Pierce, Herbert Hoover, William McKinley, Andrew Jackson and Thomas Jefferson. Each person rated between 5 and 10 of the presidents and left the rest unrated.

To create a prediction for each user/president pair, we first calculate the mean of all entries. We then measure how lenient or harsh each rater is as the difference between that user's average and the overall average, and how well or poorly each president is rated as the difference between that president's average and the overall average. The predicted rating is the sum of the overall average, the user bias and the president bias.

The matrix of training ratings, where NA marks a rating that is missing or held out for testing, is:

##    Ronald Reagan Abraham Lincoln John Quincy Adams Gerald Ford John F Kennedy Franklin Pierce Herbert Hoover
## 1             NA               5                NA           3             NA              NA             NA
## 2              2              NA               3.0           3            2.0              NA             NA
## 3              3              NA               1.5          NA            3.5               1              1
## 4             NA               5                NA          NA             NA              NA             NA
## 5              3               4               2.0           2            4.0              NA             NA
## 6              4               5                NA           4            4.0               3             NA
## 7             NA               4               4.0          NA            5.0               3              4
## 8              3               5               1.0           2             NA              NA              1
## 9             NA               5               2.0           2            2.0               1             NA
## 10             2               5               4.0          NA            3.0              NA             NA
## 11             2              NA                NA           3            3.0              NA             NA
## 12            NA              NA               4.0           3            3.0               2             NA
## 13             2               5               4.0           2             NA              NA             NA
## 14             3               5               3.0          NA             NA              NA              1
## 15             3               5               3.0           3            4.0              NA             NA
##    William McKinley Andrew Jackson Thomas Jefferson
## 1                NA              1                4
## 2                 2              1                4
## 3                 2              1                3
## 4                 4              1                3
## 5                NA              1                5
## 6                NA              4                4
## 7                 2             NA                4
## 8                 3             NA                4
## 9                 3              3                4
## 10               NA              1                5
## 11                2              3               NA
## 12                2              1                4
## 13                2              2                5
## 14                4              5               NA
## 15                2              2                5
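In symbols, the prediction described above can be written as follows. The notation is introduced here for illustration and does not appear in the original analysis: $K$ is the set of observed training (user, president) pairs, $r_{ui}$ is the rating user $u$ gave president $i$, and $\bar{r}_u$ and $\bar{r}_i$ are the means of the observed training ratings for user $u$ and president $i$.

```latex
\mu = \frac{1}{|K|} \sum_{(u,i) \in K} r_{ui}, \qquad
b_u = \bar{r}_u - \mu, \qquad
b_i = \bar{r}_i - \mu, \qquad
\hat{r}_{ui} = \mu + b_u + b_i
```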

The mean rating for the training set is

## [1] 3.030612

The mean squared error for the training set when every rating is predicted by the overall mean (its square root, the root mean square error, is about 1.27) is

## [1] 1.61641
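As a rough check of the two figures above, the sketch below recomputes them in Python from the training matrix. The report's own code is not shown, so the names `TRAIN`, `observed`, `mean_rating`, `mse` and `rmse` are illustrative assumptions, and `None` stands in for `NA`. The mean squared deviation from the mean and its square root are computed separately; the printed value 1.61641 corresponds to the former.

```python
import math

# Training ratings: rows are users 1-15, columns are the 10 presidents in the
# order shown above. N (None) marks a missing (NA) entry.
N = None
TRAIN = [
    [N, 5, N, 3, N, N, N, N, 1, 4],
    [2, N, 3.0, 3, 2.0, N, N, 2, 1, 4],
    [3, N, 1.5, N, 3.5, 1, 1, 2, 1, 3],
    [N, 5, N, N, N, N, N, 4, 1, 3],
    [3, 4, 2.0, 2, 4.0, N, N, N, 1, 5],
    [4, 5, N, 4, 4.0, 3, N, N, 4, 4],
    [N, 4, 4.0, N, 5.0, 3, 4, 2, N, 4],
    [3, 5, 1.0, 2, N, N, 1, 3, N, 4],
    [N, 5, 2.0, 2, 2.0, 1, N, 3, 3, 4],
    [2, 5, 4.0, N, 3.0, N, N, N, 1, 5],
    [2, N, N, 3, 3.0, N, N, 2, 3, N],
    [N, N, 4.0, 3, 3.0, 2, N, 2, 1, 4],
    [2, 5, 4.0, 2, N, N, N, 2, 2, 5],
    [3, 5, 3.0, N, N, N, 1, 4, 5, N],
    [3, 5, 3.0, 3, 4.0, N, N, 2, 2, 5],
]

# Keep only the ratings that are actually present.
observed = [r for row in TRAIN for r in row if r is not None]

mean_rating = sum(observed) / len(observed)

# Mean squared deviation from the overall mean, and its square root.
mse = sum((r - mean_rating) ** 2 for r in observed) / len(observed)
rmse = math.sqrt(mse)

print(round(mean_rating, 6), round(mse, 5), round(rmse, 4))
```

Filtering out the missing entries before averaging plays the same role as skipping `NA` values; averaging over all 150 cells would bias the mean toward whatever placeholder is used.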

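The mean rating for each column (president) and each row (user) in the training set is shown below. Only the first 10 entries of the columns row are president means; entries 11 to 15 are padding.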

##         [,1]     [,2]     [,3] [,4] [,5] [,6]     [,7]     [,8] [,9]    [,10] [,11]    [,12]    [,13] [,14] [,15]
## columns 2.70 4.818182 2.863636 2.70 3.35    2 1.750000 2.545455 2.00 4.153846   0.0 0.000000 0.000000   0.0 0.000
## rows    3.25 2.428571 2.000000 3.25 3.00    4 3.714286 2.714286 2.75 3.333333   2.6 2.714286 3.142857   3.5 3.375

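The bias for each column (president) and each row (user), that is, its mean minus the overall training mean, is shown below. The NA entries in the columns row are padding.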

##               [,1]       [,2]       [,3]       [,4]        [,5]       [,6]       [,7]       [,8]       [,9]     [,10]
## columns -0.3306122  1.7875696 -0.1669759 -0.3306122  0.31938776 -1.0306122 -1.2806122 -0.4851577 -1.0306122 1.1232339
## rows     0.2193878 -0.6020408 -1.0306122  0.2193878 -0.03061224  0.9693878  0.6836735 -0.3163265 -0.2806122 0.3027211
##              [,11]      [,12]     [,13]     [,14]     [,15]
## columns         NA         NA        NA        NA        NA
## rows    -0.4306122 -0.3163265 0.1122449 0.4693878 0.3443878
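The means and biases above can be sketched in a few lines of Python. This is a minimal illustration on a small made-up 3 x 3 matrix, not the presidents data, and the names `ratings`, `mean_ignoring_missing`, `row_bias` and `col_bias` are hypothetical. The point it shows is that missing entries are skipped when averaging, and that each bias is a mean minus the overall mean.

```python
# Hypothetical 3-user x 3-item ratings; None marks a missing rating.
ratings = [
    [5, None, 2],
    [4, 2, None],
    [None, 1, 4],
]


def mean_ignoring_missing(values):
    """Average the values that are present; return None if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


# Overall mean over every observed rating.
overall = mean_ignoring_missing([v for row in ratings for v in row])

# Per-row (user) and per-column (item) means; zip(*ratings) transposes.
row_means = [mean_ignoring_missing(row) for row in ratings]
col_means = [mean_ignoring_missing(col) for col in zip(*ratings)]

# Bias = mean minus the overall mean.
row_bias = [m - overall for m in row_means]
col_bias = [m - overall for m in col_means]

print(overall, row_means, col_means, row_bias, col_bias)
```

Keeping the row and column results in separate lists avoids having to pad the shorter one, which is where the zeros and NA values in positions 11 to 15 of the printed tables come from.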

The baseline predictor for each user/president pair, the overall mean plus the user bias plus the president bias, is

##    Ronald Reagan Abraham Lincoln John Quincy Adams Gerald Ford John F Kennedy Franklin Pierce Herbert Hoover
## 1       2.919388        5.037570          3.083024    2.919388       3.569388       2.2193878      1.9693878
## 2       2.097959        4.216141          2.261596    2.097959       2.747959       1.3979592      1.1479592
## 3       1.669388        3.787570          1.833024    1.669388       2.319388       0.9693878      0.7193878
## 4       2.919388        5.037570          3.083024    2.919388       3.569388       2.2193878      1.9693878
## 5       2.669388        4.787570          2.833024    2.669388       3.319388       1.9693878      1.7193878
## 6       3.669388        5.787570          3.833024    3.669388       4.319388       2.9693878      2.7193878
## 7       3.383673        5.501855          3.547310    3.383673       4.033673       2.6836735      2.4336735
## 8       2.383673        4.501855          2.547310    2.383673       3.033673       1.6836735      1.4336735
## 9       2.419388        4.537570          2.583024    2.419388       3.069388       1.7193878      1.4693878
## 10      3.002721        5.120903          3.166357    3.002721       3.652721       2.3027211      2.0527211
## 11      2.269388        4.387570          2.433024    2.269388       2.919388       1.5693878      1.3193878
## 12      2.383673        4.501855          2.547310    2.383673       3.033673       1.6836735      1.4336735
## 13      2.812245        4.930427          2.975881    2.812245       3.462245       2.1122449      1.8622449
## 14      3.169388        5.287570          3.333024    3.169388       3.819388       2.4693878      2.2193878
## 15      3.044388        5.162570          3.208024    3.044388       3.694388       2.3443878      2.0943878
##    William McKinley Andrew Jackson Thomas Jefferson
## 1          2.764842      2.2193878         4.373234
## 2          1.943414      1.3979592         3.551805
## 3          1.514842      0.9693878         3.123234
## 4          2.764842      2.2193878         4.373234
## 5          2.514842      1.9693878         4.123234
## 6          3.514842      2.9693878         5.123234
## 7          3.229128      2.6836735         4.837520
## 8          2.229128      1.6836735         3.837520
## 9          2.264842      1.7193878         3.873234
## 10         2.848176      2.3027211         4.456567
## 11         2.114842      1.5693878         3.723234
## 12         2.229128      1.6836735         3.837520
## 13         2.657699      2.1122449         4.266091
## 14         3.014842      2.4693878         4.623234
## 15         2.889842      2.3443878         4.498234
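As a worked example, two entries of the table above can be rebuilt by hand from the overall mean (3.030612) and the printed biases. The subscript notation is illustrative only.

```latex
\hat{r}_{1,\text{Reagan}} = 3.030612 + 0.2193878 + (-0.3306122) \approx 2.919388 \\
\hat{r}_{6,\text{Lincoln}} = 3.030612 + 0.9693878 + 1.7875696 \approx 5.787570
```

The second value lies above the top of the 1 to 5 scale, so the baseline predictions in the table are not clipped to the rating range.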

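The mean squared deviation of the baseline predictions from the overall training mean, taken over all 150 user/president pairs, is shown below. It measures how widely the predictions spread rather than their error against the observed training ratings.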

## [1] 1.130327

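After predicting a rating for every user/president pair, we test the predictions against the held-out test data. Without a prediction model, the mean squared error of the test data, measured as the mean squared deviation of the test ratings from their own mean (its square root, the root mean square error, is about 1.39), is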

## [1] 1.941389

The matrix of test ratings, with columns in the same president order as the training matrix, is:

##    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## 1     2   NA   NA   NA   NA   NA   NA   NA   NA    NA
## 2    NA  4.0   NA   NA   NA   NA    1   NA   NA    NA
## 3    NA  4.5   NA    1   NA   NA   NA   NA   NA    NA
## 4     3   NA   NA   NA   NA   NA   NA   NA   NA    NA
## 5    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## 6    NA   NA    2   NA   NA   NA    4    2   NA    NA
## 7     1   NA   NA    5   NA   NA   NA   NA    4    NA
## 8    NA   NA   NA   NA   NA   NA   NA   NA    3    NA
## 9     3   NA   NA   NA   NA   NA    2   NA   NA    NA
## 10   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
## 11   NA  5.0   NA   NA   NA    1    1   NA   NA     4
## 12    3  5.0   NA   NA   NA   NA    1   NA   NA    NA
## 13   NA   NA   NA   NA    3   NA    3   NA   NA    NA
## 14   NA   NA   NA    3    4    3   NA   NA   NA     5
## 15   NA   NA   NA   NA   NA    1    1   NA   NA    NA

The mean squared error for the test data under the recommender model (its square root, the root mean square error, is about 0.97) is

## [1] 0.9413736

The ratio of the raw mean squared error to the modeled mean squared error on the test data is:

## [1] 2.062294

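The two test figures above, 1.941389 and 0.9413736, are mean squared errors, so the model cut the mean squared error on the test data roughly in half. The root mean square error, the square root of each figure, fell by about 30%, from about 1.39 to about 0.97.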
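The test-set comparison can be reproduced with a short Python sketch. It assumes that the test matrix columns follow the same president order as the training matrix and that the figure without a model is taken around the test-set mean; under those assumptions it matches the printed 1.941389 and 0.9413736 to within rounding. The names `MU`, `USER_MEAN`, `PRES_MEAN`, `TEST` and `predict` are illustrative and are not taken from the original analysis.

```python
import math

MU = 3.030612  # overall training mean reported above

# Training means per user (rows) and per president (columns), as reported above.
USER_MEAN = [3.25, 2.428571, 2.0, 3.25, 3.0, 4.0, 3.714286, 2.714286,
             2.75, 3.333333, 2.6, 2.714286, 3.142857, 3.5, 3.375]
PRES_MEAN = [2.70, 4.818182, 2.863636, 2.70, 3.35, 2.0, 1.75, 2.545455,
             2.0, 4.153846]

# Held-out test ratings as (user, president column, rating), both 1-based.
TEST = [
    (1, 1, 2), (2, 2, 4.0), (2, 7, 1), (3, 2, 4.5), (3, 4, 1), (4, 1, 3),
    (6, 3, 2), (6, 7, 4), (6, 8, 2), (7, 1, 1), (7, 4, 5), (7, 9, 4),
    (8, 9, 3), (9, 1, 3), (9, 7, 2), (11, 2, 5.0), (11, 6, 1), (11, 7, 1),
    (11, 10, 4), (12, 1, 3), (12, 2, 5.0), (12, 7, 1), (13, 5, 3),
    (13, 7, 3), (14, 4, 3), (14, 5, 4), (14, 6, 3), (14, 10, 5),
    (15, 6, 1), (15, 7, 1),
]


def predict(user, pres):
    """Baseline prediction: overall mean + user bias + president bias."""
    user_bias = USER_MEAN[user - 1] - MU
    pres_bias = PRES_MEAN[pres - 1] - MU
    return MU + user_bias + pres_bias


ratings = [r for _, _, r in TEST]
test_mean = sum(ratings) / len(ratings)

# Without a model: mean squared deviation of test ratings from their own mean.
raw_mse = sum((r - test_mean) ** 2 for r in ratings) / len(ratings)

# With the model: mean squared difference between rating and baseline prediction.
model_mse = sum((r - predict(u, p)) ** 2 for u, p, r in TEST) / len(TEST)

mse_ratio = raw_mse / model_mse
rmse_ratio = math.sqrt(mse_ratio)  # ratio of the two root mean square errors

print(raw_mse, model_mse, mse_ratio, rmse_ratio)
```

Because the square root is monotonic, the model improves both measures, but a halved mean squared error corresponds to a root mean square error ratio of about 1.44 rather than 2.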