Recommender System - PonyPickR
PonyPickR recommends horses to horse players based on ratings from popular speed figure providers. The system should not only recommend desirable horses, but also provide some insight into which raters are best. The ratings utilized by PonyPickR include:
tfus_lst - Timeform US Most Recent Speed Figure
tfus_lst3 - Timeform US Average of Last 3 Speed Figures
tfus_avg - Timeform US Average of the Most Recent Figure and the Last 3 Average
rags_lst - Ragozin Sheet Most Recent Speed Figure
rags_lst3 - Ragozin Sheet Average of Last 3 Speed Figures
rags_avg - Ragozin Sheet Average of the Most Recent Figure and the Last 3 Average
Load and Preprocess Data
Data for PonyPickR was compiled from Timeform US and Ragozin speed figures for the first race at Aqueduct racetrack on February 8, 2020, a seven-horse field racing over a mile on the dirt course. The two providers rate horses on different scales, so we use the scales package to place all ratings on a consistent 1-100 scale. Tables 1 and 2 below set forth the raw and scaled ratings for our data set.
Table 1. Raw Data
| horse | tfus_lst | tfus_lst3 | rags_lst | rags_lst3 | tfus_avg | rags_avg |
|---|---|---|---|---|---|---|
| Tequilla_Sunday | 57 | 69.30 | 30.50 | 23.08333 | 63.150 | 26.79167 |
| Lookbothways | 73 | 81.33 | 19.25 | 19.41667 | 77.165 | 19.33333 |
| Our_Ticket | 65 | 64.60 | 21.00 | 23.41667 | 64.800 | 22.20833 |
| Sister_Alexa | 83 | 83.30 | 22.00 | 23.75000 | 83.150 | 22.87500 |
| Movie_Score | 60 | 66.00 | 29.00 | 27.83333 | 63.000 | 28.41667 |
| Solitary_Gem | 94 | 89.30 | 16.75 | 16.16667 | 91.650 | 16.45833 |
| OK_Honey | 83 | 75.00 | 17.50 | 22.00000 | 79.000 | 19.75000 |
Table 2. Scaled Data - Scaled to a Range of 1 to 100
The scaled data frame is assigned to the user_matrix variable.
| horse | tfus_lst | tfus_lst3 | rags_lst | rags_lst3 | tfus_avg | rags_avg |
|---|---|---|---|---|---|---|
| Tequilla_Sunday | 43.97674 | 53.41628 | 67.15865 | 74.21875 | 48.69651 | 70.68870 |
| Lookbothways | 56.25581 | 62.64860 | 77.86779 | 77.70913 | 59.45221 | 77.78846 |
| Our_Ticket | 50.11628 | 49.80930 | 76.20192 | 73.90144 | 49.96279 | 75.05168 |
| Sister_Alexa | 63.93023 | 64.16047 | 75.25000 | 73.58413 | 64.04535 | 74.41707 |
| Movie_Score | 46.27907 | 50.88372 | 68.58654 | 69.69712 | 48.58140 | 69.14183 |
| Solitary_Gem | 72.37209 | 68.76512 | 80.24760 | 80.80288 | 70.56860 | 80.52524 |
| OK_Honey | 63.93023 | 57.79070 | 79.53365 | 75.25000 | 60.86047 | 77.39183 |
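The rescaling behind Table 2 can be sketched as a linear min-max mapping onto 1-100. The source does this with the R scales package and does not state the input ranges it used. The ranges below are assumptions inferred from the table values: they happen to reproduce Table 2, and the reversed Ragozin range reflects that lower Ragozin figures are better. The function name `rescale` is only illustrative here.

```python
def rescale(x, from_range, to_range=(1.0, 100.0)):
    """Linearly map x from from_range onto to_range (min-max rescaling)."""
    lo, hi = from_range
    new_lo, new_hi = to_range
    return new_lo + (x - lo) * (new_hi - new_lo) / (hi - lo)

# Assumed input ranges, inferred from Tables 1 and 2 (not stated in the source).
TFUS_RANGE = (1.0, 130.0)
# Reversed endpoints flip the direction, so a lower (better) Ragozin
# figure maps to a higher scaled rating.
RAGS_RANGE = (100.0, -4.0)

# Tequilla_Sunday, first row of Table 1.
tfus_lst_scaled = rescale(57, TFUS_RANGE)
rags_lst_scaled = rescale(30.50, RAGS_RANGE)

print(round(tfus_lst_scaled, 5), round(rags_lst_scaled, 5))
```

Under these assumed ranges the two printed values agree with the tfus_lst and rags_lst entries for Tequilla_Sunday in Table 2.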
Create Training and Test Data Sets
We utilize tidymodels to create 75/25 training and test data sets. A rating assigned to one data set appears as NA in the other.
Training Data Set
| rater | Lookbothways | Movie_Score | OK_Honey | Our_Ticket | Sister_Alexa | Solitary_Gem | Tequilla_Sunday |
|---|---|---|---|---|---|---|---|
| rags_avg | NA | 69.14183 | 77.39183 | NA | 74.41707 | 80.52524 | NA |
| rags_lst | 77.86779 | 68.58654 | 79.53365 | 76.20192 | 75.25000 | 80.24760 | NA |
| rags_lst3 | 77.70913 | NA | 75.25000 | NA | 73.58413 | 80.80288 | 74.21875 |
| tfus_avg | 59.45221 | NA | 60.86047 | 49.96279 | 64.04535 | 70.56860 | 48.69651 |
| tfus_lst | 56.25581 | 46.27907 | 63.93023 | 50.11628 | NA | 72.37209 | 43.97674 |
| tfus_lst3 | NA | 50.88372 | 57.79070 | 49.80930 | 64.16047 | NA | 53.41628 |
Test Data Set
| rater | Lookbothways | Movie_Score | Our_Ticket | Sister_Alexa | Solitary_Gem | Tequilla_Sunday |
|---|---|---|---|---|---|---|
| rags_avg | 77.78846 | NA | 75.05168 | NA | NA | 70.68870 |
| rags_lst | NA | NA | NA | NA | NA | 67.15865 |
| rags_lst3 | NA | 69.69712 | 73.90144 | NA | NA | NA |
| tfus_avg | NA | 48.58140 | NA | NA | NA | NA |
| tfus_lst | NA | NA | NA | 63.93023 | NA | NA |
| tfus_lst3 | 62.64860 | NA | NA | NA | 68.76512 | NA |
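The split itself can be sketched as a random partition of the 42 rater-horse pairs. The source uses tidymodels and does not show the call or a seed, so this Python stand-in will not reproduce the exact assignment in the tables above. The seed and the round-up rule (which gives the 32/10 sizes seen in the tables) are assumptions.

```python
import math
import random

raters = ["tfus_lst", "tfus_lst3", "tfus_avg", "rags_lst", "rags_lst3", "rags_avg"]
horses = ["Tequilla_Sunday", "Lookbothways", "Our_Ticket", "Sister_Alexa",
          "Movie_Score", "Solitary_Gem", "OK_Honey"]

# One record per (rater, horse) pair: 6 raters x 7 horses = 42 ratings.
pairs = [(rater, horse) for rater in raters for horse in horses]

def split_pairs(pairs, prop=0.75, seed=2020):
    """Randomly assign a share `prop` of the pairs to training, the rest to test."""
    rng = random.Random(seed)   # hypothetical seed, only for repeatability
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    # Rounding up is an assumption; it yields 32 of 42 pairs for training.
    n_train = math.ceil(prop * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

train_pairs, test_pairs = split_pairs(pairs)
print(len(train_pairs), len(test_pairs))
```

Pivoting each half back to a rater-by-horse grid leaves NA wherever a pair went to the other half. A horse with no held-out ratings, such as OK_Honey above, has no column in the test table.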
Using your training data, calculate the raw average (mean) rating for every user-item combination.
Calculate the RMSE for raw average for both your training data and your test data
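The train_data raw_mean and RMSE are shown in the raw_mean and rmse columns below. Both are single values computed over all training ratings, so they repeat on every row.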
| rater | raw_mean | rmse | Lookbothways | Movie_Score | OK_Honey | Our_Ticket | Sister_Alexa | Solitary_Gem | Tequilla_Sunday |
|---|---|---|---|---|---|---|---|---|---|
| rags_avg | 65.72828 | 11.65349 | NA | 69.14183 | 77.39183 | NA | 74.41707 | 80.52524 | NA |
| rags_lst | 65.72828 | 11.65349 | 77.86779 | 68.58654 | 79.53365 | 76.20192 | 75.25000 | 80.24760 | NA |
| rags_lst3 | 65.72828 | 11.65349 | 77.70913 | NA | 75.25000 | NA | 73.58413 | 80.80288 | 74.21875 |
| tfus_avg | 65.72828 | 11.65349 | 59.45221 | NA | 60.86047 | 49.96279 | 64.04535 | 70.56860 | 48.69651 |
| tfus_lst | 65.72828 | 11.65349 | 56.25581 | 46.27907 | 63.93023 | 50.11628 | NA | 72.37209 | 43.97674 |
| tfus_lst3 | 65.72828 | 11.65349 | NA | 50.88372 | 57.79070 | 49.80930 | 64.16047 | NA | 53.41628 |
The test_data RMSE is shown in the rmse column below. It measures the test ratings against the raw mean calculated from the training data.
| rater | raw_mean | rmse | Lookbothways | Movie_Score | Our_Ticket | Sister_Alexa | Solitary_Gem | Tequilla_Sunday |
|---|---|---|---|---|---|---|---|---|
| rags_avg | 65.72828 | 8.108842 | 77.78846 | NA | 75.05168 | NA | NA | 70.68870 |
| rags_lst | 65.72828 | 8.108842 | NA | NA | NA | NA | NA | 67.15865 |
| rags_lst3 | 65.72828 | 8.108842 | NA | 69.69712 | 73.90144 | NA | NA | NA |
| tfus_avg | 65.72828 | 8.108842 | NA | 48.58140 | NA | NA | NA | NA |
| tfus_lst | 65.72828 | 8.108842 | NA | NA | NA | 63.93023 | NA | NA |
| tfus_lst3 | 65.72828 | 8.108842 | 62.64860 | NA | NA | NA | 68.76512 | NA |
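The raw-mean step can be sketched as follows: average every non-NA training rating, then use that single number as the prediction for every rating in both data sets. The ratings are copied from the training and test tables above with NA cells dropped. The variable and function names are illustrative, not taken from the source.

```python
import math

# Non-NA scaled ratings per rater, copied from the Training Data Set table.
train_ratings = {
    "rags_avg":  [69.14183, 77.39183, 74.41707, 80.52524],
    "rags_lst":  [77.86779, 68.58654, 79.53365, 76.20192, 75.25000, 80.24760],
    "rags_lst3": [77.70913, 75.25000, 73.58413, 80.80288, 74.21875],
    "tfus_avg":  [59.45221, 60.86047, 49.96279, 64.04535, 70.56860, 48.69651],
    "tfus_lst":  [56.25581, 46.27907, 63.93023, 50.11628, 72.37209, 43.97674],
    "tfus_lst3": [50.88372, 57.79070, 49.80930, 64.16047, 53.41628],
}

# Non-NA scaled ratings per rater, copied from the Test Data Set table.
test_ratings = {
    "rags_avg":  [77.78846, 75.05168, 70.68870],
    "rags_lst":  [67.15865],
    "rags_lst3": [69.69712, 73.90144],
    "tfus_avg":  [48.58140],
    "tfus_lst":  [63.93023],
    "tfus_lst3": [62.64860, 68.76512],
}

def flatten(table):
    """Collect every rating in the table into one flat list."""
    return [rating for row in table.values() for rating in row]

def rmse(ratings, prediction):
    """Root mean squared error of a single constant prediction."""
    return math.sqrt(sum((r - prediction) ** 2 for r in ratings) / len(ratings))

train_flat = flatten(train_ratings)
raw_mean = sum(train_flat) / len(train_flat)          # training data only
train_rmse = rmse(train_flat, raw_mean)
test_rmse = rmse(flatten(test_ratings), raw_mean)     # same mean, held-out ratings

print(f"raw_mean={raw_mean:.5f} train_rmse={train_rmse:.5f} test_rmse={test_rmse:.5f}")
```

Up to rounding of the inputs, this reproduces the raw_mean of 65.72828 and the RMSE values of 11.65349 (train) and 8.108842 (test) shown in the tables.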
Using your training data, calculate the bias for each user and each item.
The user (rater) bias is shown in the user_bias column of the table below. It is the mean of a rater's training ratings minus the raw mean.
| rater | raw_mean | rmse | user_bias | Lookbothways | Movie_Score | OK_Honey | Our_Ticket | Sister_Alexa | Solitary_Gem | Tequilla_Sunday |
|---|---|---|---|---|---|---|---|---|---|---|
| rags_avg | 65.72828 | 11.65349 | 9.640709 | NA | 69.14183 | 77.39183 | NA | 74.41707 | 80.52524 | NA |
| rags_lst | 65.72828 | 11.65349 | 10.552969 | 77.86779 | 68.58654 | 79.53365 | 76.20192 | 75.25000 | 80.24760 | NA |
| rags_lst3 | 65.72828 | 11.65349 | 10.584700 | 77.70913 | NA | 75.25000 | NA | 73.58413 | 80.80288 | 74.21875 |
| tfus_avg | 65.72828 | 11.65349 | -6.797293 | 59.45221 | NA | 60.86047 | 49.96279 | 64.04535 | 70.56860 | 48.69651 |
| tfus_lst | 65.72828 | 11.65349 | -10.239909 | 56.25581 | 46.27907 | 63.93023 | 50.11628 | NA | 72.37209 | 43.97674 |
| tfus_lst3 | 65.72828 | 11.65349 | -10.516188 | NA | 50.88372 | 57.79070 | 49.80930 | 64.16047 | NA | 53.41628 |
The item (horse) bias is shown in the item_bias column of the table below. It is the mean of a horse's training ratings minus the raw mean.
| horse | raw_mean | rmse | item_bias | rags_avg | rags_lst | rags_lst3 | tfus_avg | tfus_lst | tfus_lst3 |
|---|---|---|---|---|---|---|---|---|---|
| Lookbothways | 65.72828 | 11.65349 | 2.092955 | NA | 77.86779 | 77.70913 | 59.45221 | 56.25581 | NA |
| Movie_Score | 65.72828 | 11.65349 | -7.005492 | 69.14183 | 68.58654 | NA | NA | 46.27907 | 50.88372 |
| OK_Honey | 65.72828 | 11.65349 | 3.397865 | 77.39183 | 79.53365 | 75.25000 | 60.86047 | 63.93023 | 57.79070 |
| Our_Ticket | 65.72828 | 11.65349 | -9.205707 | NA | 76.20192 | NA | 49.96279 | 50.11628 | 49.80930 |
| Sister_Alexa | 65.72828 | 11.65349 | 4.563122 | 74.41707 | 75.25000 | 73.58413 | 64.04535 | NA | 64.16047 |
| Solitary_Gem | 65.72828 | 11.65349 | 11.175003 | 80.52524 | 80.24760 | 80.80288 | 70.56860 | 72.37209 | NA |
| Tequilla_Sunday | 65.72828 | 11.65349 | -10.651210 | NA | NA | 74.21875 | 48.69651 | 43.97674 | 53.41628 |
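Both biases in the tables above follow the same arithmetic: average the non-NA training ratings in one row (rater) or one column (horse), then subtract the raw mean. A minimal sketch using one rater and one horse from the tables; the helper name `bias` is illustrative.

```python
RAW_MEAN = 65.72828  # training raw mean reported in the tables

def bias(ratings, raw_mean=RAW_MEAN):
    """Mean of the observed (non-NA) ratings minus the global raw mean."""
    return sum(ratings) / len(ratings) - raw_mean

# Training ratings given by the rater rags_avg (NA cells dropped).
rags_avg_row = [69.14183, 77.39183, 74.41707, 80.52524]
# Training ratings received by the horse Lookbothways (NA cells dropped).
lookbothways_col = [77.86779, 77.70913, 59.45221, 56.25581]

user_bias = bias(rags_avg_row)
item_bias = bias(lookbothways_col)

print(f"user_bias(rags_avg)={user_bias:.4f} item_bias(Lookbothways)={item_bias:.4f}")
```

A positive bias means the rater (or horse) sits above the overall average. Here the three Ragozin raters are all positive and the three Timeform US raters are all negative, which reflects the two scaled ranges more than any difference in opinion about the horses.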
From the raw average, and the appropriate user and item biases, calculate the baseline predictors for every user-item combination.
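The table below gives the baseline prediction (bl_pred = raw_mean + user_bias + item_bias) for all 42 horse-rater combinations, including the 10 held out for testing. The rmse column repeats the raw-mean RMSE for the training data, rmse2 is the baseline RMSE, and decrease_in_rmse is the proportional change, (rmse2 - rmse) / rmse. Note that the reported rmse2 of 4.517599 matches the error over all 42 rows; it is not restricted to the 32 training ratings.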
| horse | rater | rating | user_bias | raw_mean | rmse | item_bias | bl_pred | rmse2 | decrease_in_rmse |
|---|---|---|---|---|---|---|---|---|---|
| Tequilla_Sunday | tfus_lst | 43.97674 | -10.239909 | 65.72828 | 11.65349 | -10.651210 | 44.83716 | 4.517599 | -0.6123392 |
| Lookbothways | tfus_lst | 56.25581 | -10.239909 | 65.72828 | 11.65349 | 2.092955 | 57.58133 | 4.517599 | -0.6123392 |
| Our_Ticket | tfus_lst | 50.11628 | -10.239909 | 65.72828 | 11.65349 | -9.205707 | 46.28266 | 4.517599 | -0.6123392 |
| Sister_Alexa | tfus_lst | 63.93023 | -10.239909 | 65.72828 | 11.65349 | 4.563122 | 60.05149 | 4.517599 | -0.6123392 |
| Movie_Score | tfus_lst | 46.27907 | -10.239909 | 65.72828 | 11.65349 | -7.005492 | 48.48288 | 4.517599 | -0.6123392 |
| Solitary_Gem | tfus_lst | 72.37209 | -10.239909 | 65.72828 | 11.65349 | 11.175003 | 66.66337 | 4.517599 | -0.6123392 |
| OK_Honey | tfus_lst | 63.93023 | -10.239909 | 65.72828 | 11.65349 | 3.397865 | 58.88624 | 4.517599 | -0.6123392 |
| Tequilla_Sunday | tfus_lst3 | 53.41628 | -10.516188 | 65.72828 | 11.65349 | -10.651210 | 44.56088 | 4.517599 | -0.6123392 |
| Lookbothways | tfus_lst3 | 62.64860 | -10.516188 | 65.72828 | 11.65349 | 2.092955 | 57.30505 | 4.517599 | -0.6123392 |
| Our_Ticket | tfus_lst3 | 49.80930 | -10.516188 | 65.72828 | 11.65349 | -9.205707 | 46.00639 | 4.517599 | -0.6123392 |
| Sister_Alexa | tfus_lst3 | 64.16047 | -10.516188 | 65.72828 | 11.65349 | 4.563122 | 59.77522 | 4.517599 | -0.6123392 |
| Movie_Score | tfus_lst3 | 50.88372 | -10.516188 | 65.72828 | 11.65349 | -7.005492 | 48.20660 | 4.517599 | -0.6123392 |
| Solitary_Gem | tfus_lst3 | 68.76512 | -10.516188 | 65.72828 | 11.65349 | 11.175003 | 66.38710 | 4.517599 | -0.6123392 |
| OK_Honey | tfus_lst3 | 57.79070 | -10.516188 | 65.72828 | 11.65349 | 3.397865 | 58.60996 | 4.517599 | -0.6123392 |
| Tequilla_Sunday | rags_lst | 67.15865 | 10.552969 | 65.72828 | 11.65349 | -10.651210 | 65.63004 | 4.517599 | -0.6123392 |
| Lookbothways | rags_lst | 77.86779 | 10.552969 | 65.72828 | 11.65349 | 2.092955 | 78.37421 | 4.517599 | -0.6123392 |
| Our_Ticket | rags_lst | 76.20192 | 10.552969 | 65.72828 | 11.65349 | -9.205707 | 67.07554 | 4.517599 | -0.6123392 |
| Sister_Alexa | rags_lst | 75.25000 | 10.552969 | 65.72828 | 11.65349 | 4.563122 | 80.84437 | 4.517599 | -0.6123392 |
| Movie_Score | rags_lst | 68.58654 | 10.552969 | 65.72828 | 11.65349 | -7.005492 | 69.27576 | 4.517599 | -0.6123392 |
| Solitary_Gem | rags_lst | 80.24760 | 10.552969 | 65.72828 | 11.65349 | 11.175003 | 87.45625 | 4.517599 | -0.6123392 |
| OK_Honey | rags_lst | 79.53365 | 10.552969 | 65.72828 | 11.65349 | 3.397865 | 79.67911 | 4.517599 | -0.6123392 |
| Tequilla_Sunday | rags_lst3 | 74.21875 | 10.584700 | 65.72828 | 11.65349 | -10.651210 | 65.66177 | 4.517599 | -0.6123392 |
| Lookbothways | rags_lst3 | 77.70913 | 10.584700 | 65.72828 | 11.65349 | 2.092955 | 78.40594 | 4.517599 | -0.6123392 |
| Our_Ticket | rags_lst3 | 73.90144 | 10.584700 | 65.72828 | 11.65349 | -9.205707 | 67.10727 | 4.517599 | -0.6123392 |
| Sister_Alexa | rags_lst3 | 73.58413 | 10.584700 | 65.72828 | 11.65349 | 4.563122 | 80.87610 | 4.517599 | -0.6123392 |
| Movie_Score | rags_lst3 | 69.69712 | 10.584700 | 65.72828 | 11.65349 | -7.005492 | 69.30749 | 4.517599 | -0.6123392 |
| Solitary_Gem | rags_lst3 | 80.80288 | 10.584700 | 65.72828 | 11.65349 | 11.175003 | 87.48798 | 4.517599 | -0.6123392 |
| OK_Honey | rags_lst3 | 75.25000 | 10.584700 | 65.72828 | 11.65349 | 3.397865 | 79.71085 | 4.517599 | -0.6123392 |
| Tequilla_Sunday | tfus_avg | 48.69651 | -6.797293 | 65.72828 | 11.65349 | -10.651210 | 48.27978 | 4.517599 | -0.6123392 |
| Lookbothways | tfus_avg | 59.45221 | -6.797293 | 65.72828 | 11.65349 | 2.092955 | 61.02394 | 4.517599 | -0.6123392 |
| Our_Ticket | tfus_avg | 49.96279 | -6.797293 | 65.72828 | 11.65349 | -9.205707 | 49.72528 | 4.517599 | -0.6123392 |
| Sister_Alexa | tfus_avg | 64.04535 | -6.797293 | 65.72828 | 11.65349 | 4.563122 | 63.49411 | 4.517599 | -0.6123392 |
| Movie_Score | tfus_avg | 48.58140 | -6.797293 | 65.72828 | 11.65349 | -7.005492 | 51.92550 | 4.517599 | -0.6123392 |
| Solitary_Gem | tfus_avg | 70.56860 | -6.797293 | 65.72828 | 11.65349 | 11.175003 | 70.10599 | 4.517599 | -0.6123392 |
| OK_Honey | tfus_avg | 60.86047 | -6.797293 | 65.72828 | 11.65349 | 3.397865 | 62.32885 | 4.517599 | -0.6123392 |
| Tequilla_Sunday | rags_avg | 70.68870 | 9.640709 | 65.72828 | 11.65349 | -10.651210 | 64.71778 | 4.517599 | -0.6123392 |
| Lookbothways | rags_avg | 77.78846 | 9.640709 | 65.72828 | 11.65349 | 2.092955 | 77.46195 | 4.517599 | -0.6123392 |
| Our_Ticket | rags_avg | 75.05168 | 9.640709 | 65.72828 | 11.65349 | -9.205707 | 66.16328 | 4.517599 | -0.6123392 |
| Sister_Alexa | rags_avg | 74.41707 | 9.640709 | 65.72828 | 11.65349 | 4.563122 | 79.93211 | 4.517599 | -0.6123392 |
| Movie_Score | rags_avg | 69.14183 | 9.640709 | 65.72828 | 11.65349 | -7.005492 | 68.36350 | 4.517599 | -0.6123392 |
| Solitary_Gem | rags_avg | 80.52524 | 9.640709 | 65.72828 | 11.65349 | 11.175003 | 86.54399 | 4.517599 | -0.6123392 |
| OK_Honey | rags_avg | 77.39183 | 9.640709 | 65.72828 | 11.65349 | 3.397865 | 78.76686 | 4.517599 | -0.6123392 |
The test_data baseline predictions (bl_pred, calculated from the train_data raw mean and biases), the baseline RMSE (rmse2) and the proportional change from the raw_mean RMSE (decrease_in_rmse) are shown below, alongside the raw_mean RMSE (rmse).
| horse | rater | rating | raw_mean | rmse | bl_pred | rmse2 | decrease_in_rmse |
|---|---|---|---|---|---|---|---|
| Sister_Alexa | tfus_lst | 63.93023 | 65.72828 | 8.108842 | 60.05149 | 4.731253 | -0.4165316 |
| Lookbothways | tfus_lst3 | 62.64860 | 65.72828 | 8.108842 | 57.30505 | 4.731253 | -0.4165316 |
| Solitary_Gem | tfus_lst3 | 68.76512 | 65.72828 | 8.108842 | 66.38710 | 4.731253 | -0.4165316 |
| Tequilla_Sunday | rags_lst | 67.15865 | 65.72828 | 8.108842 | 65.63004 | 4.731253 | -0.4165316 |
| Our_Ticket | rags_lst3 | 73.90144 | 65.72828 | 8.108842 | 67.10727 | 4.731253 | -0.4165316 |
| Movie_Score | rags_lst3 | 69.69712 | 65.72828 | 8.108842 | 69.30749 | 4.731253 | -0.4165316 |
| Movie_Score | tfus_avg | 48.58140 | 65.72828 | 8.108842 | 51.92550 | 4.731253 | -0.4165316 |
| Tequilla_Sunday | rags_avg | 70.68870 | 65.72828 | 8.108842 | 64.71778 | 4.731253 | -0.4165316 |
| Lookbothways | rags_avg | 77.78846 | 65.72828 | 8.108842 | 77.46195 | 4.731253 | -0.4165316 |
| Our_Ticket | rags_avg | 75.05168 | 65.72828 | 8.108842 | 66.16328 | 4.731253 | -0.4165316 |
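The test-set figures above can be re-derived in a few lines: add the rater and horse biases to the raw mean to get each baseline prediction, then compare its RMSE with the RMSE of predicting the raw mean everywhere. The constants are copied from the tables; the function names are illustrative, and reading decrease_in_rmse as (rmse2 - rmse) / rmse is an inference from the reported numbers.

```python
import math

RAW_MEAN = 65.72828  # training raw mean, from the tables above

# Rater (user) and horse (item) biases, copied from the bias tables.
USER_BIAS = {"rags_avg": 9.640709, "rags_lst": 10.552969, "rags_lst3": 10.584700,
             "tfus_avg": -6.797293, "tfus_lst": -10.239909, "tfus_lst3": -10.516188}
ITEM_BIAS = {"Lookbothways": 2.092955, "Movie_Score": -7.005492, "OK_Honey": 3.397865,
             "Our_Ticket": -9.205707, "Sister_Alexa": 4.563122,
             "Solitary_Gem": 11.175003, "Tequilla_Sunday": -10.651210}

# Held-out test ratings as (horse, rater, scaled rating).
TEST = [
    ("Sister_Alexa", "tfus_lst", 63.93023),
    ("Lookbothways", "tfus_lst3", 62.64860),
    ("Solitary_Gem", "tfus_lst3", 68.76512),
    ("Tequilla_Sunday", "rags_lst", 67.15865),
    ("Our_Ticket", "rags_lst3", 73.90144),
    ("Movie_Score", "rags_lst3", 69.69712),
    ("Movie_Score", "tfus_avg", 48.58140),
    ("Tequilla_Sunday", "rags_avg", 70.68870),
    ("Lookbothways", "rags_avg", 77.78846),
    ("Our_Ticket", "rags_avg", 75.05168),
]

def baseline(horse, rater):
    """Baseline predictor: raw mean + rater bias + horse bias."""
    return RAW_MEAN + USER_BIAS[rater] + ITEM_BIAS[horse]

def rmse(errors):
    """Root mean squared error of a list of residuals."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

raw_rmse = rmse([rating - RAW_MEAN for _, _, rating in TEST])
bl_rmse = rmse([rating - baseline(horse, rater) for horse, rater, rating in TEST])
relative_change = (bl_rmse - raw_rmse) / raw_rmse   # negative means lower error

print(f"raw-mean RMSE {raw_rmse:.4f}, baseline RMSE {bl_rmse:.4f}, "
      f"change {relative_change:.1%}")
```

Read this way, a decrease_in_rmse of -0.4165316 means the baseline predictor cuts the test RMSE by roughly 42 percent relative to the raw mean.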
Summarized Results
Table 3. Summary Results
| method | data set | raw_mean | rmse |
|---|---|---|---|
| raw_mean | train | 65.73 | 11.65 |
| raw_mean | test | 65.73 | 8.11 |
| base line pred | train | 65.73 | 4.52 |
| base line pred | test | 65.73 | 4.73 |