Recommender System - PonyPickR

PonyPickR recommends horses to horse players based on ratings from popular speed figure providers. The system should not only recommend desirable horses but also provide some insight into who the best raters are. The ratings utilized by PonyPickR come from Timeform US and Ragozin speed figures.

Load and Preprocess Data

Data for PonyPickR was compiled from Timeform US and Ragozin speed figures. Information was gathered for the first race at Aqueduct Racetrack on February 8, 2020, a seven-horse field racing over a mile on the dirt course. Because the two providers rate on different scales, we use the scales package to place the data on a consistent 1-100 scale. Tables 1 and 2 below set forth the raw and scaled ratings for our data set.

Table 1. Raw Data

horse tfus_lst tfus_lst3 rags_lst rags_lst3 tfus_avg rags_avg
Tequilla_Sunday 57 69.30 30.50 23.08333 63.150 26.79167
Lookbothways 73 81.33 19.25 19.41667 77.165 19.33333
Our_Ticket 65 64.60 21.00 23.41667 64.800 22.20833
Sister_Alexa 83 83.30 22.00 23.75000 83.150 22.87500
Movie_Score 60 66.00 29.00 27.83333 63.000 28.41667
Solitary_Gem 94 89.30 16.75 16.16667 91.650 16.45833
OK_Honey 83 75.00 17.50 22.00000 79.000 19.75000

Table 2. Scaled Data - Scaled to a Range of 1 to 100

The scaled data frame is assigned to the user_matrix variable.

horse tfus_lst tfus_lst3 rags_lst rags_lst3 tfus_avg rags_avg
Tequilla_Sunday 43.97674 53.41628 67.15865 74.21875 48.69651 70.68870
Lookbothways 56.25581 62.64860 77.86779 77.70913 59.45221 77.78846
Our_Ticket 50.11628 49.80930 76.20192 73.90144 49.96279 75.05168
Sister_Alexa 63.93023 64.16047 75.25000 73.58413 64.04535 74.41707
Movie_Score 46.27907 50.88372 68.58654 69.69712 48.58140 69.14183
Solitary_Gem 72.37209 68.76512 80.24760 80.80288 70.56860 80.52524
OK_Honey 63.93023 57.79070 79.53365 75.25000 60.86047 77.39183
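
The relationship between Tables 1 and 2 can be illustrated with a simple linear rescale. The sketch below is written in Python rather than with the R scales package used for the report, and the input ranges are assumptions inferred from the two tables (1 to 130 for Timeform US, and 100 down to -4 for Ragozin, where a lower raw figure is the better one). The source does not state these ranges, and the function name `rescale` is only illustrative.

```python
def rescale(x, from_range, to_range=(1.0, 100.0)):
    """Linearly map x from from_range onto to_range."""
    from_lo, from_hi = from_range
    to_lo, to_hi = to_range
    return to_lo + (x - from_lo) * (to_hi - to_lo) / (from_hi - from_lo)


# Assumed input ranges, inferred from Tables 1 and 2 (not stated in the source).
TFUS_RANGE = (1.0, 130.0)    # higher Timeform US figure = better
RAGS_RANGE = (100.0, -4.0)   # reversed: lower Ragozin figure = better

tfus_scaled = rescale(57, TFUS_RANGE)      # Tequilla_Sunday, tfus_lst
rags_scaled = rescale(30.50, RAGS_RANGE)   # Tequilla_Sunday, rags_lst
print(round(tfus_scaled, 5), round(rags_scaled, 5))
```

Under these assumed ranges the two results should agree with the Tequilla_Sunday entries in Table 2 (43.97674 and 67.15865). Reversing the Ragozin range is what makes a higher scaled value better for both providers.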

Create Training and Test Data Sets

Training Data Set

rater Lookbothways Movie_Score OK_Honey Our_Ticket Sister_Alexa Solitary_Gem Tequilla_Sunday
rags_avg NA 69.14183 77.39183 NA 74.41707 80.52524 NA
rags_lst 77.86779 68.58654 79.53365 76.20192 75.25000 80.24760 NA
rags_lst3 77.70913 NA 75.25000 NA 73.58413 80.80288 74.21875
tfus_avg 59.45221 NA 60.86047 49.96279 64.04535 70.56860 48.69651
tfus_lst 56.25581 46.27907 63.93023 50.11628 NA 72.37209 43.97674
tfus_lst3 NA 50.88372 57.79070 49.80930 64.16047 NA 53.41628

Test Data Set

rater Lookbothways Movie_Score Our_Ticket Sister_Alexa Solitary_Gem Tequilla_Sunday
rags_avg 77.78846 NA 75.05168 NA NA 70.68870
rags_lst NA NA NA NA NA 67.15865
rags_lst3 NA 69.69712 73.90144 NA NA NA
tfus_avg NA 48.58140 NA NA NA NA
tfus_lst NA NA NA 63.93023 NA NA
tfus_lst3 62.64860 NA NA NA 68.76512 NA
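
The split above can be pictured as masking a set of (rater, horse) cells out of the scaled matrix. The source does not say how the held-out cells were chosen, so the Python sketch below simply takes them as given. It uses only the tfus_lst3 ratings from Table 2, and `split_ratings` is a hypothetical helper name.

```python
# Scaled tfus_lst3 ratings, copied from Table 2.
ratings = {
    ("tfus_lst3", "Tequilla_Sunday"): 53.41628,
    ("tfus_lst3", "Lookbothways"): 62.64860,
    ("tfus_lst3", "Our_Ticket"): 49.80930,
    ("tfus_lst3", "Sister_Alexa"): 64.16047,
    ("tfus_lst3", "Movie_Score"): 50.88372,
    ("tfus_lst3", "Solitary_Gem"): 68.76512,
    ("tfus_lst3", "OK_Honey"): 57.79070,
}

# Cells held out for testing, as they appear in the test table for this rater.
held_out = {("tfus_lst3", "Lookbothways"), ("tfus_lst3", "Solitary_Gem")}


def split_ratings(ratings, held_out):
    """Return (train, test) dicts; held-out cells go to test only."""
    train = {k: v for k, v in ratings.items() if k not in held_out}
    test = {k: v for k, v in ratings.items() if k in held_out}
    return train, test


train_set, test_set = split_ratings(ratings, held_out)
print(len(train_set), "training cells,", len(test_set), "test cells")
```

A cell that lands in the test set is treated as missing (NA) in the training matrix, which is why the two tables never share a value for the same rater and horse.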

Using your training data, calculate the raw average (mean) rating for every user-item combination.

Calculate the RMSE for raw average for both your training data and your test data

The train_data raw mean and RMSE appear in the raw_mean and rmse columns below.

rater raw_mean rmse Lookbothways Movie_Score OK_Honey Our_Ticket Sister_Alexa Solitary_Gem Tequilla_Sunday
rags_avg 65.72828 11.65349 NA 69.14183 77.39183 NA 74.41707 80.52524 NA
rags_lst 65.72828 11.65349 77.86779 68.58654 79.53365 76.20192 75.25000 80.24760 NA
rags_lst3 65.72828 11.65349 77.70913 NA 75.25000 NA 73.58413 80.80288 74.21875
tfus_avg 65.72828 11.65349 59.45221 NA 60.86047 49.96279 64.04535 70.56860 48.69651
tfus_lst 65.72828 11.65349 56.25581 46.27907 63.93023 50.11628 NA 72.37209 43.97674
tfus_lst3 65.72828 11.65349 NA 50.88372 57.79070 49.80930 64.16047 NA 53.41628

The test_data RMSE, measured against the raw mean from the training data, appears in the rmse column below.

rater raw_mean rmse Lookbothways Movie_Score Our_Ticket Sister_Alexa Solitary_Gem Tequilla_Sunday
rags_avg 65.72828 8.108842 77.78846 NA 75.05168 NA NA 70.68870
rags_lst 65.72828 8.108842 NA NA NA NA NA 67.15865
rags_lst3 65.72828 8.108842 NA 69.69712 73.90144 NA NA NA
tfus_avg 65.72828 8.108842 NA 48.58140 NA NA NA NA
tfus_lst 65.72828 8.108842 NA NA NA 63.93023 NA NA
tfus_lst3 65.72828 8.108842 62.64860 NA NA NA 68.76512 NA
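
As a check on the figures above, the raw mean and both RMSE values can be recomputed from the training and test tables. This is a Python sketch rather than the original analysis code. Missing cells are written as `None`, and `rmse_constant` is an illustrative helper that scores a single constant prediction.

```python
import math

NA = None
# Training matrix; columns are Lookbothways, Movie_Score, OK_Honey,
# Our_Ticket, Sister_Alexa, Solitary_Gem, Tequilla_Sunday.
train = {
    "rags_avg":  [NA, 69.14183, 77.39183, NA, 74.41707, 80.52524, NA],
    "rags_lst":  [77.86779, 68.58654, 79.53365, 76.20192, 75.25000, 80.24760, NA],
    "rags_lst3": [77.70913, NA, 75.25000, NA, 73.58413, 80.80288, 74.21875],
    "tfus_avg":  [59.45221, NA, 60.86047, 49.96279, 64.04535, 70.56860, 48.69651],
    "tfus_lst":  [56.25581, 46.27907, 63.93023, 50.11628, NA, 72.37209, 43.97674],
    "tfus_lst3": [NA, 50.88372, 57.79070, 49.80930, 64.16047, NA, 53.41628],
}
# The ten held-out ratings from the test table.
test_ratings = [77.78846, 75.05168, 70.68870, 67.15865, 69.69712,
                73.90144, 48.58140, 63.93023, 62.64860, 68.76512]


def rmse_constant(ratings, prediction):
    """RMSE when every rating is predicted by the same constant."""
    return math.sqrt(sum((r - prediction) ** 2 for r in ratings) / len(ratings))


train_ratings = [r for row in train.values() for r in row if r is not None]
raw_mean = sum(train_ratings) / len(train_ratings)
train_rmse = rmse_constant(train_ratings, raw_mean)
test_rmse = rmse_constant(test_ratings, raw_mean)
print(round(raw_mean, 5), round(train_rmse, 5), round(test_rmse, 5))
```

The raw mean is computed from the 32 observed training ratings only, and that same training mean is used as the prediction when scoring the test ratings.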

Using your training data, calculate the bias for each user and each item.

The user (rater) bias appears in the user_bias column of the table below.

rater raw_mean rmse user_bias Lookbothways Movie_Score OK_Honey Our_Ticket Sister_Alexa Solitary_Gem Tequilla_Sunday
rags_avg 65.72828 11.65349 9.640709 NA 69.14183 77.39183 NA 74.41707 80.52524 NA
rags_lst 65.72828 11.65349 10.552969 77.86779 68.58654 79.53365 76.20192 75.25000 80.24760 NA
rags_lst3 65.72828 11.65349 10.584700 77.70913 NA 75.25000 NA 73.58413 80.80288 74.21875
tfus_avg 65.72828 11.65349 -6.797293 59.45221 NA 60.86047 49.96279 64.04535 70.56860 48.69651
tfus_lst 65.72828 11.65349 -10.239909 56.25581 46.27907 63.93023 50.11628 NA 72.37209 43.97674
tfus_lst3 65.72828 11.65349 -10.516188 NA 50.88372 57.79070 49.80930 64.16047 NA 53.41628

The item (horse) bias appears in the item_bias column of the table below.

horse raw_mean rmse item_bias rags_avg rags_lst rags_lst3 tfus_avg tfus_lst tfus_lst3
Lookbothways 65.72828 11.65349 2.092955 NA 77.86779 77.70913 59.45221 56.25581 NA
Movie_Score 65.72828 11.65349 -7.005492 69.14183 68.58654 NA NA 46.27907 50.88372
OK_Honey 65.72828 11.65349 3.397865 77.39183 79.53365 75.25000 60.86047 63.93023 57.79070
Our_Ticket 65.72828 11.65349 -9.205707 NA 76.20192 NA 49.96279 50.11628 49.80930
Sister_Alexa 65.72828 11.65349 4.563122 74.41707 75.25000 73.58413 64.04535 NA 64.16047
Solitary_Gem 65.72828 11.65349 11.175003 80.52524 80.24760 80.80288 70.56860 72.37209 NA
Tequilla_Sunday 65.72828 11.65349 -10.651210 NA NA 74.21875 48.69651 43.97674 53.41628
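
Each bias above is the mean of a rater's (or a horse's) observed training ratings minus the raw mean. A minimal Python sketch for one rater and one horse follows. The values are copied from the training table, the raw mean is taken as reported, and `bias` is an illustrative helper name.

```python
RAW_MEAN = 65.72828  # training raw mean as reported in the tables


def bias(observed, raw_mean):
    """Mean of the non-missing training ratings minus the raw mean."""
    seen = [r for r in observed if r is not None]
    return sum(seen) / len(seen) - raw_mean


# Training ratings given by the rags_avg rater (one row of the training table).
rags_avg_row = [None, 69.14183, 77.39183, None, 74.41707, 80.52524, None]
# Training ratings received by Lookbothways, in the order rags_avg, rags_lst,
# rags_lst3, tfus_avg, tfus_lst, tfus_lst3.
lookbothways_col = [None, 77.86779, 77.70913, 59.45221, 56.25581, None]

user_bias = bias(rags_avg_row, RAW_MEAN)
item_bias = bias(lookbothways_col, RAW_MEAN)
print(round(user_bias, 4), round(item_bias, 4))
```

A positive user bias means the rater tends to score above the overall average on the scaled data, and a positive item bias means the horse tends to be rated above average.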

From the raw average, and the appropriate user and item biases, calculate the baseline predictors for every user-item combination.

The baseline prediction (bl_pred) for every rater-horse combination, the train_data baseline RMSE (rmse2), and the relative decrease from the raw_mean RMSE (decrease_in_rmse) are shown below. The original raw_mean RMSE is in the rmse column.

horse rater rating user_bias raw_mean rmse item_bias bl_pred rmse2 decrease_in_rmse
Tequilla_Sunday tfus_lst 43.97674 -10.239909 65.72828 11.65349 -10.651210 44.83716 4.517599 -0.6123392
Lookbothways tfus_lst 56.25581 -10.239909 65.72828 11.65349 2.092955 57.58133 4.517599 -0.6123392
Our_Ticket tfus_lst 50.11628 -10.239909 65.72828 11.65349 -9.205707 46.28266 4.517599 -0.6123392
Sister_Alexa tfus_lst 63.93023 -10.239909 65.72828 11.65349 4.563122 60.05149 4.517599 -0.6123392
Movie_Score tfus_lst 46.27907 -10.239909 65.72828 11.65349 -7.005492 48.48288 4.517599 -0.6123392
Solitary_Gem tfus_lst 72.37209 -10.239909 65.72828 11.65349 11.175003 66.66337 4.517599 -0.6123392
OK_Honey tfus_lst 63.93023 -10.239909 65.72828 11.65349 3.397865 58.88624 4.517599 -0.6123392
Tequilla_Sunday tfus_lst3 53.41628 -10.516188 65.72828 11.65349 -10.651210 44.56088 4.517599 -0.6123392
Lookbothways tfus_lst3 62.64860 -10.516188 65.72828 11.65349 2.092955 57.30505 4.517599 -0.6123392
Our_Ticket tfus_lst3 49.80930 -10.516188 65.72828 11.65349 -9.205707 46.00639 4.517599 -0.6123392
Sister_Alexa tfus_lst3 64.16047 -10.516188 65.72828 11.65349 4.563122 59.77522 4.517599 -0.6123392
Movie_Score tfus_lst3 50.88372 -10.516188 65.72828 11.65349 -7.005492 48.20660 4.517599 -0.6123392
Solitary_Gem tfus_lst3 68.76512 -10.516188 65.72828 11.65349 11.175003 66.38710 4.517599 -0.6123392
OK_Honey tfus_lst3 57.79070 -10.516188 65.72828 11.65349 3.397865 58.60996 4.517599 -0.6123392
Tequilla_Sunday rags_lst 67.15865 10.552969 65.72828 11.65349 -10.651210 65.63004 4.517599 -0.6123392
Lookbothways rags_lst 77.86779 10.552969 65.72828 11.65349 2.092955 78.37421 4.517599 -0.6123392
Our_Ticket rags_lst 76.20192 10.552969 65.72828 11.65349 -9.205707 67.07554 4.517599 -0.6123392
Sister_Alexa rags_lst 75.25000 10.552969 65.72828 11.65349 4.563122 80.84437 4.517599 -0.6123392
Movie_Score rags_lst 68.58654 10.552969 65.72828 11.65349 -7.005492 69.27576 4.517599 -0.6123392
Solitary_Gem rags_lst 80.24760 10.552969 65.72828 11.65349 11.175003 87.45625 4.517599 -0.6123392
OK_Honey rags_lst 79.53365 10.552969 65.72828 11.65349 3.397865 79.67911 4.517599 -0.6123392
Tequilla_Sunday rags_lst3 74.21875 10.584700 65.72828 11.65349 -10.651210 65.66177 4.517599 -0.6123392
Lookbothways rags_lst3 77.70913 10.584700 65.72828 11.65349 2.092955 78.40594 4.517599 -0.6123392
Our_Ticket rags_lst3 73.90144 10.584700 65.72828 11.65349 -9.205707 67.10727 4.517599 -0.6123392
Sister_Alexa rags_lst3 73.58413 10.584700 65.72828 11.65349 4.563122 80.87610 4.517599 -0.6123392
Movie_Score rags_lst3 69.69712 10.584700 65.72828 11.65349 -7.005492 69.30749 4.517599 -0.6123392
Solitary_Gem rags_lst3 80.80288 10.584700 65.72828 11.65349 11.175003 87.48798 4.517599 -0.6123392
OK_Honey rags_lst3 75.25000 10.584700 65.72828 11.65349 3.397865 79.71085 4.517599 -0.6123392
Tequilla_Sunday tfus_avg 48.69651 -6.797293 65.72828 11.65349 -10.651210 48.27978 4.517599 -0.6123392
Lookbothways tfus_avg 59.45221 -6.797293 65.72828 11.65349 2.092955 61.02394 4.517599 -0.6123392
Our_Ticket tfus_avg 49.96279 -6.797293 65.72828 11.65349 -9.205707 49.72528 4.517599 -0.6123392
Sister_Alexa tfus_avg 64.04535 -6.797293 65.72828 11.65349 4.563122 63.49411 4.517599 -0.6123392
Movie_Score tfus_avg 48.58140 -6.797293 65.72828 11.65349 -7.005492 51.92550 4.517599 -0.6123392
Solitary_Gem tfus_avg 70.56860 -6.797293 65.72828 11.65349 11.175003 70.10599 4.517599 -0.6123392
OK_Honey tfus_avg 60.86047 -6.797293 65.72828 11.65349 3.397865 62.32885 4.517599 -0.6123392
Tequilla_Sunday rags_avg 70.68870 9.640709 65.72828 11.65349 -10.651210 64.71778 4.517599 -0.6123392
Lookbothways rags_avg 77.78846 9.640709 65.72828 11.65349 2.092955 77.46195 4.517599 -0.6123392
Our_Ticket rags_avg 75.05168 9.640709 65.72828 11.65349 -9.205707 66.16328 4.517599 -0.6123392
Sister_Alexa rags_avg 74.41707 9.640709 65.72828 11.65349 4.563122 79.93211 4.517599 -0.6123392
Movie_Score rags_avg 69.14183 9.640709 65.72828 11.65349 -7.005492 68.36350 4.517599 -0.6123392
Solitary_Gem rags_avg 80.52524 9.640709 65.72828 11.65349 11.175003 86.54399 4.517599 -0.6123392
OK_Honey rags_avg 77.39183 9.640709 65.72828 11.65349 3.397865 78.76686 4.517599 -0.6123392
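
Each bl_pred value in the table above is the raw mean plus the rater's bias plus the horse's bias. A short Python sketch with a few values copied from the bias tables illustrates the arithmetic. The `baseline` function name is an assumption made for this example.

```python
RAW_MEAN = 65.72828  # training raw mean as reported in the tables

# A subset of the biases reported above.
user_bias = {"tfus_lst": -10.239909, "rags_lst": 10.552969}
item_bias = {"Tequilla_Sunday": -10.651210, "Solitary_Gem": 11.175003}


def baseline(rater, horse):
    """Baseline predictor: raw mean + user (rater) bias + item (horse) bias."""
    return RAW_MEAN + user_bias[rater] + item_bias[horse]


low = baseline("tfus_lst", "Tequilla_Sunday")
high = baseline("rags_lst", "Solitary_Gem")
print(round(low, 5), round(high, 5))
```

These two results should agree with the bl_pred entries of 44.83716 and 87.45625 in the table.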

The test_data baseline predictor (bl_pred, calculated from train_data), its RMSE (rmse2), and the relative decrease from the raw_mean RMSE (decrease_in_rmse) are shown below. The raw_mean RMSE is in the rmse column.

horse rater rating raw_mean rmse bl_pred rmse2 decrease_in_rmse
Sister_Alexa tfus_lst 63.93023 65.72828 8.108842 60.05149 4.731253 -0.4165316
Lookbothways tfus_lst3 62.64860 65.72828 8.108842 57.30505 4.731253 -0.4165316
Solitary_Gem tfus_lst3 68.76512 65.72828 8.108842 66.38710 4.731253 -0.4165316
Tequilla_Sunday rags_lst 67.15865 65.72828 8.108842 65.63004 4.731253 -0.4165316
Our_Ticket rags_lst3 73.90144 65.72828 8.108842 67.10727 4.731253 -0.4165316
Movie_Score rags_lst3 69.69712 65.72828 8.108842 69.30749 4.731253 -0.4165316
Movie_Score tfus_avg 48.58140 65.72828 8.108842 51.92550 4.731253 -0.4165316
Tequilla_Sunday rags_avg 70.68870 65.72828 8.108842 64.71778 4.731253 -0.4165316
Lookbothways rags_avg 77.78846 65.72828 8.108842 77.46195 4.731253 -0.4165316
Our_Ticket rags_avg 75.05168 65.72828 8.108842 66.16328 4.731253 -0.4165316
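
The rmse2 and decrease_in_rmse columns can be reproduced from the ten test rows above. The Python sketch below copies the rating and bl_pred pairs from the table and takes the raw_mean RMSE as reported. The variable names are illustrative.

```python
import math

# (rating, bl_pred) pairs copied from the test table, in row order.
test_rows = [
    (63.93023, 60.05149), (62.64860, 57.30505), (68.76512, 66.38710),
    (67.15865, 65.63004), (73.90144, 67.10727), (69.69712, 69.30749),
    (48.58140, 51.92550), (70.68870, 64.71778), (77.78846, 77.46195),
    (75.05168, 66.16328),
]
RAW_MEAN_RMSE = 8.108842  # test RMSE of the raw mean, as reported

# RMSE of the baseline predictions on the held-out ratings.
squared_errors = [(rating - pred) ** 2 for rating, pred in test_rows]
baseline_rmse = math.sqrt(sum(squared_errors) / len(squared_errors))

# Relative change versus the raw mean; negative means the error went down.
decrease = (baseline_rmse - RAW_MEAN_RMSE) / RAW_MEAN_RMSE
print(round(baseline_rmse, 4), round(decrease, 4))
```

The decrease is expressed as a fraction of the raw_mean RMSE, so a value near -0.417 corresponds to the 41.7% reduction on the test data.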

Summarized Results

In Project 1 we gathered horse ratings from speed figure makers. We treated this information like survey data to build a recommender system using two methods: the raw mean and the baseline predictor. The raw mean is the cruder approach, since it does not factor in rater or horse biases. The baseline predictor adds the user (rater) and item (horse) biases to the raw mean. Accordingly, it is expected that the baseline predictor would yield better results, and this is exactly what we found. The raw mean approach had train_data and test_data RMSEs of 11.65 and 8.11, respectively, compared with 4.52 and 4.73 for the baseline predictor. The baseline predictor reduced the RMSE on the train and test data sets by 61.2% and 41.7%, respectively.


Table 3. Summary Results

Method              Data Set   Raw Mean   RMSE
raw_mean            train      65.73      11.65
raw_mean            test       65.73       8.11
baseline predictor  train      65.73       4.52
baseline predictor  test       65.73       4.73