The Literature Review is broken down into two primary groups:
In 1981 Henry described the outcome of races as being the consequence of a stochastic ordering of the permutations which define the outcome of a race. Henry also developed formulas. Henry’s formulas were better at predicting win, place and show runners, but also were significantly more challenging from a mathematical perspective.
In 1986, Bolton and Chapman performed a study on 200 races from five U.S. tracks. The study employed 10 basic handicapping variables yielded returns of approximately 3.3% utilizing a simple win-bet wagering strategy.
In 1994, Benter published Computer Based Horse Race Handicapping and Wagering Systems: A Report. This study utilized Hong Kong racing data and employed at least 20 sophisticated independent handicapping variables. Benter clearly demonstrated that a computerized handicapping system can beat the races. The author stated that at least at sometimes, at some tracks, a statistically derived fundamental handicapping model can achieve a significant positive expectation. It should be noted that prior to writing his paper, Mr. Benter had spent the previous five years in Hong Kong developing a handicapping team and model. In 2019, Mr. Benter was featured in the Bloomberg article, The Gambler Who Cracked the Horse Race Code – A Billion Dollars Later He Tells His Story.
In 2008, Chapman, published Still Searching For Positive Returns At The Track: Empirical Results From 2,000 Hong Kong Races. Chapman utilized 20 sophisticated handicapping variables model upon the Benter study and produced return in excess of 20%, employing a simple win-bet wagering strategy.
In 2011, CXWong published Precision - Statistical and Mathematical Methods in Horse Racing. Chapter 8 introduces the concepts of hedging and derivatives and introduces the mathematics behind horse racing hedging strategies. The chapter goes own to introduces derivatives and how alternative wager pools can be utilize to simulate a win wagers.
In September 2016 The College Mathematics Journal , Vol. 47, No. 4 published Horse Racing Odds: Can You Beat the Track by Hedging Your Bets? by Joel Pasternack and Stewart Venit. This paper develops a formula that provides a bettor the means to determine if its possible, by betting the correct amount on each horse in a race (given odds for each horse), to win money regardless of the outcome of the race. Converting the given odds to probabilities and summing those probabilities yields an easily calculated parameter that indicates whether the answer to this question is “yes” or “no.” This parameter also determines the percentage of the total amount bet on the race returned to the winning bettors and the percentage retained by the track.
Data for this project was obtained from Time Form US (“TFUS”). TFUS is a leading provider of horse racing data, including past performances and race charts. Data was obtained from Race Charts from North American Thoroughbred race tracks. Specifically, the data is comprised of 12,741 win bet wager results and payoffs.
We wrote two R-programs, one to scrape the data and one to wrangle the wager and race data. The TFChart_Scraper.R program logs into the Timeform site and scrapes race chart data including wager results and payoffs. The scraped data is exported to a csv file. The Race.R program imports the csv data performs some data tidying and then inserts data into a MySQL database that includes a Wager table and a race_starter tables. R-code is included in the Appendix.
Below, we connect to the database, retrieve the wager table and then filter on the win bets. The win bet data along with some race starter data will be utilized for the analysis. We utilize the skim function to provide a brief summary of the win_bets data.
# Esblish Connection with wager table
con <- dbConnect(odbc::odbc(), "MySQL")
# Retrieve
wagers <- tbl(con, "wagers")## Warning: Couldn't find skimmers for class: integer64; No user-defined `sfl`
## provided. Falling back to `character`.
| Name | win_bets |
| Number of rows | 12741 |
| Number of columns | 12 |
| _______________________ | |
| Column type frequency: | |
| character | 7 |
| Date | 1 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| trk_code | 0 | 1 | 2 | 3 | 0 | 62 | 0 |
| wager_typ | 0 | 1 | 5 | 5 | 0 | 1 | 0 |
| wager_winner | 0 | 1 | 5 | 27 | 0 | 11812 | 0 |
| wager_cat | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| wager_code | 0 | 1 | 14 | 15 | 0 | 12741 | 0 |
| race_code | 0 | 1 | 14 | 15 | 0 | 12741 | 0 |
| wager_id | 0 | 1 | 17 | 21 | 0 | 12741 | 0 |
Variable type: Date
| skim_variable | n_missing | complete_rate | min | max | median | n_unique |
|---|---|---|---|---|---|---|
| wager_dt | 0 | 1 | 2008-07-02 | 2020-08-22 | 2019-03-06 | 1611 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| race_nbr | 0 | 1 | 5.86 | 2.83 | 1 | 4 | 6 | 8 | 15 | ▆▇▇▂▁ |
| wager_legs | 0 | 1 | 1.00 | 0.00 | 1 | 1 | 1 | 1 | 1 | ▁▁▇▁▁ |
| wager_min | 0 | 1 | 200.00 | 0.00 | 200 | 200 | 200 | 200 | 200 | ▁▁▇▁▁ |
| wager_payoff | 0 | 1 | 1246.27 | 1414.20 | 0 | 520 | 800 | 1380 | 18720 | ▇▁▁▁▁ |
This hedging analysis will compare returns from various hedging and derivative strategies to the returns of simple win-bet strategy.
Here are some key definitions.
Key Horse - The key horse is the horse that is picked to win the race.
Full Hedge - A full hedge allows one to pick multiple horses and earn the same return if either of the selected horses wins.
Fixed Hedge - A fixed hedge allows one to pick one or more hedge horses. If one of the hedge horses wins the return is equal to the total amount wagered.
Exacta - An exacta is a wager where you win by pick the winner (slot 1) and second place horse (slot 2) in order.
Daily Double - is a wager where you pick the winner of two consecutive races. Similar to the exacta the daily double has two slots - winner of first race and winner of second race.
Win_Exacta - The Win_Exacta derivative is a synthetic win bet that is created by using the Key Horse in the first slot of the exacta and all (or substantially all) other horses in the second slot. Since all the horse are used in the second slot the Win_Exacta is essentially a Win Bet.
Win_Double - The Win_Double derivative, similar to the Win_Exacta, is a synthetic win bet that is created by using the Key Horse in the first slot (first race) and all (or substantially all) other horses in the second slot (second race). Since all the horse are used in the second slot the Win_Double is essentially a Win Bet.
The table below sets forth candidate hedging and derivative strategies.
| Hedge and Derivative Strategies | ||
|---|---|---|
| Understanding the betting strategies | ||
| Type | Name | Description |
| Hedge | Full_1 | Full hedge using key horse and 1 hedge horse |
| Hedge | Full_2 | Full hedge using key horse and 2 hedge horses |
| Hedge | Fixed_1 | Fixed hedge using key horse and 1 hedge horse |
| Hedge | Fixed_2 | Fixed hedge using key horse and 2 hedge horses |
| Derivative | Win_Exacta | Derivative win bet using the exacta pool |
| Derivative | Win_Double | Derivative win bet using the daily double pool |
Before we dive right into the hedging algorithm, lets look at a hedging example that demonstrate the key concepts.
In the example below, the 3 horse is the key horse and the 5 and 6 horses are the two hedge horses. The odds for each horse have been input into the odds column. We are also assuming a $1000 total wager. Based upon these inputs, the hedge calculator provides the following information:
If $462, $308, and $231 are bet on the #3, #5, and #6 horses, respectively, the return, if any one of these horses wins, will be $1,848. This compares to $4,000 if the whole $1,000 dollars were bet on the #3 horse, the key horse. Hedging increases your chance of winning, but decreases your return.
Alternatively, if the Fixed flag is employed (guaranteeing a fixed return equal to amount wagered) then if the #3 horse Wins the return is $2,832 and if the #5 or #6 horse wins the return is $1000 - the amount wagered.
In our example above we have three horses that will return $4, $6 and $8 for a $2 dollar wager. If we want to determine how much to bet on each horse so that no matter which horse wins we get the same return, We would want to solve for the values of A, B, C and K from the equations below:
4.0 x A = K
6.0 x B = K
8.0 x C = K
Where A, B and C are the amounts bet on the horses and K is the return earned from the bet.
Using the various payoffs from above, if we calculate
P1 = 6 x 8 = 48, p2 = 4 x 8 = 32 and p3 = 4 x 6 = 24.
and then use the proportion p1 : p2 : p3 = 48 : 32 : 24 and then convert to a percentage, then the proportions for bet 1 through bet 3 would be;
Horse #3 - 48 / 104 = 0.461538
Horse #5 - 32 / 104 = 0.307692
Horse #6 - 24 / 104 = 0.230769
Now if you apply the bet proportions to the amount bet (1000) we can derive the amount to bet on each horse
Therefore,
Horse #3 bet = 1000 x 0.46153 = $462
Horse #5 bet = 1000 x 0.30769 = $308
Horse #6 bet = 1000 x 0.23076 = $231
These bet amounts are consistent with the calculator results above.
Below is an embedded shiny app hedge calculator. The app enables one to wager on up to three different horses. The base hedge will ensure a fixed return a no matter which selected horse wins. The calculator also allows for a fixed return - the wager amount is set to yield a return equal to the amount wagered. This approach enables the bettor to wager more on her key horse and less on the hedge or defensive horses.
The hedge calculator can be accessed here: https://mundymsds.shinyapps.io/HedgeCalc/
The Code for the calculator is set forth below:
odds_list <- c('--','1/9','1/5','2/5','1/2','3/5','4/5','1/1','6/5','7/5',
'3/2','8/5','9/5','2/1','5/2','3/1','7/2','4/1','9/2','5/1',
'6/1','7/1','8/1','9/1','10/1','11/1','12/1','13/1','14/1','15/1',
'16/1','17/1','18/1','19/1','20/1','25/1','30/1','40/1','50/1',
'60/1','70/1','80/1','90/1','99/1')
value_list <- c(0,0.20,0.40,0.80,1.0,1.20,1.60,2.0,2.4,2.80,
3,3.20,3.60,4.0,5.0,6.0,7.0,8.0,9.0,10.0,
12,14,16,18,20,22,24,26,28,30,
32,34,36,38,40,50,60,80,100,
120,140,160,180,198)
odds_df <- tibble(odds=odds_list,value=value_list)
get_odds <- odds_df$value
names(get_odds) <- odds_df$odds
wager_list <- c(10, 20, 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000)
return_list <- c("Y","N")
num_list <- c('-',seq(1:20))
ui <- fluidPage(
sidebarLayout(
sidebarPanel(
fluidRow(
column(3,
selectInput("no1",label="No.",choices=num_list),
selectInput("no2",label=NULL,choices=num_list),
selectInput("no3",label=NULL,choices=num_list),
),
column(4,
selectInput("odd1",label="Odds",choices=odds_list),
selectInput("odd2",label=NULL,choices=odds_list),
selectInput("odd3",label=NULL,choices=odds_list),
),
column(3,
selectInput("fix1",label="Fixed",choices=return_list, selected = 'N'),
selectInput("fix2",label=NULL,choices=return_list, selected = 'N'),
selectInput("fix3",label=NULL,choices=return_list, selected = 'N')
),
selectInput("wager",label=' Wager Amount:',choices=wager_list,selected = 1000),
br(),
actionButton("action_Calc", label = "Refresh & Calculate"),
br(),
br(),
br()
)
,width = 5),
mainPanel(
tabsetPanel(
tabPanel("Output",
# p(h5("Your entered values:")),
# textOutput("text_odd1"),
# textOutput("text_fix1"),
# textOutput("text_odd2"),
# textOutput("text_fix2"),
# textOutput("text_odd3"),
# textOutput("text_fix3"),
# textOutput("text_wager"),
# br(),
p(h5("Calculated values:")),
fluidRow(
column(4,
textOutput("text_no1"),
textOutput("text_odd1"),
textOutput("text_b1"),
textOutput("text_m1"),),
column(4,
textOutput("text_no2"),
textOutput("text_odd2"),
textOutput("text_b2"),
textOutput("text_m2"),),
column(4,
textOutput("text_no3"),
textOutput("text_odd3"),
textOutput("text_b3"),
textOutput("text_m3"),),),
br(),
),
tabPanel("Documentation",
p(h4("Hedge Calculator:")),
br(),
helpText("This application calculates the appropriate hedge given user inputs."),
HTML("<u><b>Complete the following steps: </b></u>
<br> <br>
1. Select the program number for each horse</br>
2. Enter Odds for each horse entered</br>
3. If a Fixed return is desired, select Y in the Fixed drop down</br>
4. Select a wager amount from the drop down</br>
5. Click the Refresh & Calculate button
")
)
),
width = 7)
)
)
server <- function(input,output){
values <- reactiveValues()
# Calculate the interest and amount
observe({
input$action_Calc
values$b1 <- round(isolate({
if_else(get_odds[[input$odd2]] == 0 & get_odds[[input$odd3]]==0, as.numeric(input$wager), as.numeric(input$wager) * ((max(1,get_odds[[input$odd2]])*max(1,get_odds[[input$odd3]]))/((max(1,get_odds[[input$odd2]])*if_else(get_odds[[input$odd1]]==0,0,1)*max(1,get_odds[[input$odd3]]))+(max(1,get_odds[[input$odd1]])*if_else(get_odds[[input$odd2]]==0,0,1)*max(1,get_odds[[input$odd3]]))+(max(1,get_odds[[input$odd2]])*if_else(get_odds[[input$odd3]]==0,0,1)*max(1,get_odds[[input$odd1]])))))
}),0)
values$b2 <- round(isolate({
if_else(get_odds[[input$odd1]] == 0 & get_odds[[input$odd3]]==0, as.numeric(input$wager), as.numeric(input$wager) * ((max(1,get_odds[[input$odd1]])*if_else(get_odds[[input$odd2]]==0,0,1)*max(1,get_odds[[input$odd3]]))/((max(1,get_odds[[input$odd2]])*if_else(get_odds[[input$odd1]]==0,0,1)*max(1,get_odds[[input$odd3]]))+(max(1,get_odds[[input$odd1]])*if_else(get_odds[[input$odd2]]==0,0,1)*max(1,get_odds[[input$odd3]]))+(max(1,get_odds[[input$odd2]])*if_else(get_odds[[input$odd3]]==0,0,1)*max(1,get_odds[[input$odd1]])))))
}),0)
values$b3 <- round(isolate({
if_else(get_odds[[input$odd2]] == 0 & get_odds[[input$odd1]]==0, as.numeric(input$wager) ,as.numeric(input$wager) * ((max(1,get_odds[[input$odd2]])*if_else(get_odds[[input$odd3]]==0,0,1)*max(1,get_odds[[input$odd1]]))/((max(1,get_odds[[input$odd2]])*if_else(get_odds[[input$odd1]]==0,0,1)*max(1,get_odds[[input$odd3]]))+(max(1,get_odds[[input$odd1]])*if_else(get_odds[[input$odd2]]==0,0,1)*max(1,get_odds[[input$odd3]]))+(max(1,get_odds[[input$odd2]])*if_else(get_odds[[input$odd3]]==0,0,1)*max(1,get_odds[[input$odd1]])))))
}),0)
if(input$fix3 == 'Y' & input$fix1=='N' & input$fix2 == 'N') {
values$b3 <- (as.numeric(input$wager) / get_odds[[input$odd3]])
values$b1 <- (as.numeric(input$wager) - as.numeric(values$b3)) * (isolate(get_odds[[input$odd2]]) / (isolate(get_odds[[input$odd2]])+isolate(get_odds[[input$odd1]])))
values$b2 <- (as.numeric(input$wager) - as.numeric(values$b3)) * (isolate(get_odds[[input$odd1]]) / (isolate(get_odds[[input$odd2]])+isolate(get_odds[[input$odd1]])))
}
if(input$fix3 == 'Y' & input$fix2=='Y' & input$fix1 == 'N') {
values$b3 <- round((as.numeric(input$wager) / get_odds[[input$odd3]]),0)
values$b2 <- round((as.numeric(input$wager) / get_odds[[input$odd2]]),0)
values$b1 <- round((as.numeric(input$wager)) - values$b2 - values$b3,0)
}
if(input$fix3 == 'Y' & input$fix2=='Y' & input$fix1 == 'Y') {
values$b3 <- round((as.numeric(input$wager) / get_odds[[input$odd3]]),0)
values$b2 <- round((as.numeric(input$wager) / get_odds[[input$odd2]]),0)
values$b1 <- round((as.numeric(input$wager) / get_odds[[input$odd1]]),0)
}
if(input$fix3 == 'N' & input$fix1=='N' & input$fix2 == 'Y') {
values$b2 <- (as.numeric(input$wager) / get_odds[[input$odd2]])
values$b1 <- (as.numeric(input$wager) - as.numeric(values$b2)) * (isolate(get_odds[[input$odd3]]) / (isolate(get_odds[[input$odd3]])+isolate(get_odds[[input$odd1]])))
values$b3 <- (as.numeric(input$wager) - as.numeric(values$b2)) * (isolate(get_odds[[input$odd1]]) / (isolate(get_odds[[input$odd3]])+isolate(get_odds[[input$odd1]])))
}
if(input$fix3 == 'N' & input$fix1=='Y' & input$fix2 == 'N') {
values$b1 <- (as.numeric(input$wager) / get_odds[[input$odd1]])
values$b2 <- (as.numeric(input$wager) - as.numeric(values$b1)) * (isolate(get_odds[[input$odd3]]) / (isolate(get_odds[[input$odd3]])+isolate(get_odds[[input$odd2]])))
values$b3 <- (as.numeric(input$wager) - as.numeric(values$b1)) * (isolate(get_odds[[input$odd2]]) / (isolate(get_odds[[input$odd3]])+isolate(get_odds[[input$odd2]])))
}
if(input$fix1 == 'Y' & input$fix2=='Y' & input$fix3 == 'N') {
values$b1 <- round((as.numeric(input$wager) / get_odds[[input$odd1]]),0)
values$b2 <- round((as.numeric(input$wager) / get_odds[[input$odd2]]),0)
values$b3 <- round((as.numeric(input$wager)) - values$b2 - values$b1,0)
}
if(input$fix1 == 'Y' & input$fix2=='N' & input$fix3 == 'Y') {
values$b1 <- round((as.numeric(input$wager) / get_odds[[input$odd1]]),0)
values$b3 <- round((as.numeric(input$wager) / get_odds[[input$odd3]]),0)
values$b2 <- round((as.numeric(input$wager)) - values$b3 - values$b1,0)
}
values$m1 <- round(isolate(get_odds[[input$odd1]] * values$b1),2)
values$m2 <- round(isolate(get_odds[[input$odd2]] * values$b2),2)
values$m3 <- round(isolate(get_odds[[input$odd3]] * values$b3),2)
values$text_no1 <- textOutput("text_no1")
})
# Display values entered
output$text_no1 <- renderText({
input$action_Calc
paste("Pgm No.: ", isolate(input$no1))
})
output$text_odd1 <- renderText({
input$action_Calc
paste("Odds: ", isolate(input$odd1))
})
output$text_fix1 <- renderText({
input$action_Calc
paste("Fixed Return: ", isolate(input$fix1))
})
output$text_no2 <- renderText({
input$action_Calc
paste("Pgm No.: ", isolate(input$no2))
})
output$text_odd2 <- renderText({
input$action_Calc
paste("Odds: ", isolate(input$odd2))
})
output$text_fix2 <- renderText({
input$action_Calc
paste("Fixed Return: ", isolate(input$fix2))
})
output$text_no3 <- renderText({
input$action_Calc
paste("Pgm No.: ", isolate(input$no3))
})
output$text_odd3 <- renderText({
input$action_Calc
paste("Odds: ", isolate(input$odd3))
})
output$text_fix3 <- renderText({
input$action_Calc
paste("Fixed Return: ", isolate(input$fix3))
})
output$text_wager <- renderText({
input$action_Calc
paste("Wager Amount: ", isolate(input$wager))
})
# Display calculated values
output$text_b1 <- renderText({
if(input$action_Calc == 0) ""
else
paste("Bet Amount: $", values$b1)
})
output$text_m1 <- renderText({
if(input$action_Calc == 0) ""
else
paste("Payout: $", values$m1)
})
output$text_b2 <- renderText({
if(input$action_Calc == 0) ""
else
paste("Bet Amount: $", values$b2)
})
output$text_m2 <- renderText({
if(input$action_Calc == 0) ""
else
paste("Payout: $", values$m2)
})
output$text_b3 <- renderText({
if(input$action_Calc == 0) ""
else
paste("Bet Amount: $", values$b3)
})
output$text_m3 <- renderText({
if(input$action_Calc == 0) ""
else
paste("Payout: $", values$m3)
}
)
}
shinyApp(ui=ui,server=server, options = list(height = 800))
library(httr)
library(rjson)
library(XML)
library(stringr)
library(curl)
httr::set_config(config(ssl_verifypeer = 0L))
options(HTTPUserAgent="Mozilla/5.0 (Windows NT 6.1; WOW64; rv:47.0) Gecko/20100101 Firefox/47.0")
#Define Input
abb_race_list = list(list("GP","AIK","AJX","ABT","ALB","ANF","AQU","ARP","AP","ASD","ATH","ATL","ATO","BSR","BM","BMF","BEL","BTP","BHP","BEU","BRD","BRO","BCF","CRC","CAM","CBY","CAS","CWF","CHA","CHL","CPW","CD","CNL","NS2","CLS","DAY","DMR","NS9","DEL","DED","DEP","DET","DXD","DUQ","UN","ELK","ELP","EMD","EMT","ED","EUR","EVD","FG","FAI","FMT","FPL","FAX","FP","FPX","FH","FER","FL","FON","FE","FTP","FX","FNO","GV","GIL","GLN","GG","SAF","GN","GPR","GRP","GBF","GF","GLD","GRM","N11","GRA","DUN","GP","GPW","BRN","HST","HAW","HP","HIA","CT","HOL","HOO","HPO","HCF","IND","ING","JRM","KSP","KAM","KEE","KD","KIN","LAM","LBT","LRL","BOI","LBG","LEX","LNN","LEV","LS","LA","LRC","LAD","MVR","MAL","MAN","MAF","MAR","MD","MED","MDA","MID","MC","MS","MIL","MON","NS4","MTH","MTP","MOR","MNR","MPM","S11","FAR","NP","NVD","OKR","OTH","OTP","OSA","OP","OTC","ONE","PMB","PRX","NS6","PEN","NS5","UNI","PW","PHA","PIC","PIM","PMT","PNL","PLN","POD","PM","PRM","PID","RB","RDM","RP","RET","RIL","RD","RKM","RPD","RUI","RUP","SAC","HOU","SJD","SLR","SAN","SDY","SA","SON","SFE","SR","SAR","NS8","SEO","SHW","SHD","SOL","SOP","SPT","STP","STK","STN","SH","SUF","N12","SUD","SND","SUN","SRP","SWF","TAM","WDS","TDN","TIL","TIM","TGD","TRY","TUP","TP","WTS","WW","WBR","WMF","WPR","ELY","WRD","WIL","WNT","WO","WYO","YM","YAV","YD","ZIA"),list("GULFSTREAM PARK","AIKEN","AJAX DOWNS","ALBERTA DOWNS","ALBUQUERQUE","ANTHONY DOWNS","AQUEDUCT","ARAPAHOE PARK","ARLINGTON","ASSINIBOIA DOWNS","ATLANTA","ATLANTIC CITY","ATOKAD DOWNS","BARRETTS RACE MEET AT FAIRPLEX","BAY MEADOWS","BAY MEADOWS FAIR","BELMONT PARK","BELTERRA PARK","BETFAIR HOLLYWOOD PARK","BEULAH PARK","BLUE RIBBON DOWNS","BROOKHILL FARM","BROWN COUNTY FAIR","CALDER RACE COURSE","CAMDEN","CANTERBURY PARK","CASSIA COUNTY FAIR","CENTRAL WYOMING FAIR","CHARLESTON","CHARLOTTE","CHIPPEWA DOWNS","CHURCHILL DOWNS","COLONIAL DOWNS","COLONIAL NSA","COLUMBUS","DAYTON","DEL MAR","DELAWARE NSA","DELAWARE PARK","DELTA DOWNS","DESERT PARK","DETROIT RACE COURSE","DIXIE DOWNS","DU QUOIN","EASTERN OREGON LIVESTOCK SHOW","ELKO COUNTY FAIR","ELLIS PARK","EMERALD DOWNS","EMMETT","ENERGY DOWNS","EUREKA","EVANGELINE DOWNS","FAIR GROUNDS","FAIR HILL","FAIR MEADOWS","FAIR PLAY PARK","FAIRFAX","FAIRMOUNT PARK","FAIRPLEX PARK","FAR HILLS","FERNDALE","FINGER LAKES","FONNER PARK","FORT ERIE","FORT PIERRE","FOXFIELD","FRESNO","GENESEE VALLEY","GILLESPIE COUNTY FAIRGROUND","GLYNDON","GOLDEN GATE FIELDS","GRAHAM COUNTY FAIR @ SAFFORD","GRAND NATIONAL","GRANDE PRAIRIE","GRANTS PASS","GREAT BARRINGTON FAIR","GREAT FALLS","GREAT LAKES DOWNS","GREAT MEADOW","GREAT MEADOW NSA","GREEN ACRES","GREENELEE COUNTY FAIR @ DUNCAN","GULFSTREAM PARK WEST","HARNEY COUNTY FAIR","HASTINGS RACECOURSE","HAWTHORNE","HAZEL PARK","HIALEAH PARK","HOLLYWOOD CASINO AT CHARLES TOWN RACES","HOLLYWOOD PARK","HOOSIER PARK","HORSEMEN'S PARK","HUMBOLDT COUNTY FAIR","INDIANA GRAND RACE COURSE","INGLESIDE","JEROME COUNTY FAIR","KALISPELL","KAMLOOPS","KEENELAND","KENTUCKY DOWNS","KIN PARK","LA MESA PARK","LAUREL BROWN RACETRACK","LAUREL PARK","LES BOIS PARK","LETHBRIDGE","LEXINGTON","LINCOLN RACE COURSE","LITTLE EVERGLADES","LONE STAR PARK","LOS ALAMITOS","LOS ALAMITOS RACE COURSE","LOUISIANA DOWNS","MAHONING VALLEY RACE COURSE","MALVERN","MANOR DOWNS","MARIAS FAIR","MARLBORO","MARQUIS DOWNS","MEADOWLANDS","MELVILLE DISTRICT AGRIPAR","MIDDLEBURG","MILES CITY","MILL SPRING","MILLARVILLE","MONKTON","MONMOUTH NSA","MONMOUTH PARK","MONTPELIER","MORVEN PARK","MOUNTAINEER CASINO RACETRACK & RESORT","MT. PLEASANT MEADOWS","Monterrico Peru","NORTH DAKOTA HORSE PARK","NORTHLANDS PARK","NORTHVILLE DOWNS","OAK RIDGE","OAK TREE AT HOLLYWOOD PARK","OAK TREE AT PLEASANTON","OAK TREE AT SANTA ANITA","OAKLAWN PARK","OCALA TRAINING CENTER","ONEIDA COUNTY FAIR","PALM BEACH POLO CLUB","PARX RACING","PARX RACING NSA","PENN NATIONAL","PENN NATIONAL NSA","PENNSYLVANIA HUNT CUP","PERCY WARNER","PHILADELPHA PARK","PICOV DOWNS","PIMLICO","PINE MTN-CALLAWAY GARDEN","PINNACLE RACE COURSE","PLEASANTON","POCATELLO DOWNS","PORTLAND MEADOWS","PRAIRIE MEADOWS","PRESQUE ISLE DOWNS","RED BANK","RED MILE","REMINGTON PARK","RETAMA PARK","RILLITO","RIVER DOWNS","ROCKINGHAM PARK","ROSSBURN PARKLAND DOWNS","RUIDOSO DOWNS","RUPERT DOWNS","SACRAMENTO","SAM HOUSTON RACE PARK","SAN JUAN DOWNS","SAN LUIS REY TRAINING CENTER","SANDOWN PARK","SANDY DOWNS","SANTA ANITA PARK","SANTA CRUZ COUNTY FAIR @ SONOITA","SANTA FE","SANTA ROSA","SARATOGA","SARATOGA NSA","SEARCH ENGINE OPTIMIZATION","SHAWAN DOWNS","SHENNANDOAH DOWNS","SOLANO","SOUTHERN PINES","SPORTSMAN'S PARK","STAMPEDE PARK","STOCKTON","STONEYBROOK AT FIVE POINTS","STRAWBERRY HILL","SUFFOLK DOWNS","SUFFOLK NSA","SUN DOWNS","SUNFLOWER","SUNLAND PARK","SUNRAY PARK","SWEETWATER DOWNS","TAMPA BAY DOWNS","THE WOODLANDS","THISTLEDOWN","TILLAMOOK COUNTY FAIR","TIMONIUM","TIOGA DOWNS","TRYON","TURF PARADISE","TURFWAY PARK","WAITSBURG RACE TRACK","WALLA WALLA","WEBER DOWNS","WESTERN MT FAIR","WHITE PINE RACEWAY","WHITE PINE RACING","WILL ROGERS DOWNS","WILLOWDALE STEEPLECHASE","WINTERTHUR","WOODBINE","WYOMING DOWNS","YAKIMA MEADOWS","YAVAPAI DOWNS","YELLOWSTONE DOWNS","ZIA PARK"))
MonthConverter <- list(Jan="1",Feb="2",Mar="3",Apr="4",May="5",Jun="6",Jul="7",Aug="8",Sep="9",Oct="10",Nov="11",Dec="12")
TrackConventer <- list("Albuquerque"="ALB","Arlington Park"="AP ","Aqueduct"="AQU","Arapahoe Park"="ARP","Assiniboia"="ASD","Atlantic City"="ATL","Atokad Downs"="ATO","Belmont Park"="BEL","Beulah Park"="BEU","BetfairHollywood"="BHP","Le Bois"="BOI","Fairplex Park"="BSR","Belterra Park"="BTP","Canterbury Park"="CBY","Churchill Downs"="CD ","Columbus"="CLS","Colonial Downs"="CNL","Calder"="CRC","Charles Town"="CT ","Delta Downs"="DED","Delaware Park"="DEL","Derby Future Pool"="DFP","Del Mar"="DMR","Meydan"="DUB","Ellis Park"="ELP","Emerald Downs"="EMD","Emmett"="EMT","Evangeline"="EVD","Fort Erie"="FE ","Ferndale"="FER","Fair Grounds"="FG ","Finger Lakes"="FL ","Fair Meadows"="FMT","Fresno"="FNO","Fonner Park"="FON","Fairmount Park"="FP ","Fairplex Park"="FPX","Fort Pierre"="FTP","Golden Gate"="GG ","Gulfstream Park"="GP","Gulfstream West"="GPW","Hawthorne"="HAW","Humboldt Fair"="HCF","Hollywood Park"="HOL","Hoosier Park"="HOO","Sam Houston"="HOU","Horsemen's Park"="HPO","Hastings"="HST","Indiana Downs"="IND","Kentucky Downs"="KD ","Keeneland"="KEE","Los Al Night"="LA ","Louisiana Downs"="LAD","Lincoln"="LNN","Los Al Day"="LRC","Laurel Park"="LRL","Lone Star Park"="LS ","Manor Downs"="MAN","Marquis Downs"="MD ","Meadowlands"="MED","Millarville"="MIL","Mountaineer"="MNR","Monmouth Park"="MTH","Mahoning Valley"="MVR","Northlands Park"="NP ","Oaklawn Park"="OP ","Santa Anita"="SA","Hollywood Park"="HOL","Pleasanton"="PLN","Penn National"="PEN","Philadelphia Park"="PHA","Presque Isle"="PID","Pimlico"="PIM","Pleasanton"="PLN","Portland Meadow"="PM ","Pinnacle"="PNL","Prairie Meadows"="PRM","Parx Racing"="PRX","River Downs"="RD ","Retama Park"="RET","Rockingham Park"="RKM","Remington Park"="RP ","Ruidoso Downs"="RUI","Santa Anita"="SA ","Sacramento"="SAC","Saratoga"="SAR","Keeneland"="SAX","Solano"="SOL","Santa Rosa"="SR ","Sunray Park"="SRP","Stockton"="STK","Stampede Park"="STP","Suffolk Downs"="SUF","Sunland Park"="SUN","Tampa Bay"="TAM","Thistledown"="TDN","Timonium"="TIM","Turfway Park"="TP ","Turf Paradise"="TUP","The Woodlands"="WDS","Woodbine"="WO ","Will Rogers"="WRD","Wyoming Downs"="WYO","Yavapai Downs"="YAV","Zia Park"="ZIA","Oak Tree at Santa Anita"="OSA","Ocala Training Center"="OTC", "Oak Tree HD"="OTH")
TrackCodeList <- list("ALB","AP ","AQU","ARP","ASD","ATL","ATO","BEL","BEU","BHP","BOI","BSR","BTP","CBY","CD ","CLS","CNL","CRC","CT ","DED","DEL","DFP","DMR","DUB","ELP","EMD","EMT","EVD","FE ","FER","FG ","FL ","FMT","FNO","FON","FP ","FPX","FTP","GG ","GP ","GPW","HAW","HCF","HOL","HOO","HOU","HPO","HST","IND","KD ","KEE","LA ","LAD","LNN","LRC","LRL","LS ","MAN","MD ","MED","MIL","MNR","MTH","MVR","NP ","OP ","OSA","OTC","OTH","OTP","PEN","PHA","PID","PIM","PLN","PM ","PNL","PRM","PRX","RD ","RET","RKM","RP ","RUI","SA ","SAC","SAR","SAX","SOL","SR ","SRP","STK","STP","SUF","SUN","TAM","TDN","TIM","TP ","TUP","WDS","WO ","WRD","WYO","YAV","ZIA")
repeat_link <- c()
year <- "2020"
race_date <- "2020-09-05"
track_name <- "Churchill Downs"
x <- track_name
x <- gsub(" ", "",x)
x <- sapply(x, tolower)
y <- race_date
y <- gsub("-","",y)
z <- ".csv"
file_name <- paste(x,y,z,sep="")
h<-"_header_c"
header_file<-paste(x,y,h,z,sep="")
r<-"_races_c"
race_file<-paste(x,y,r,z,sep="")
s<-"_starters_c"
starter_file<-paste(x,y,s,z,sep="")
f<-"_footnotes_c"
footnote_file<-paste(x,y,f,z,sep="")
w<-"_wagers_c"
wager_file<-paste(x,y,w,z,sep="")
login <- function(){
#Grab CSRF Token
r <- GET("https://timeformus.com/login?ReturnUrl=/")
temp<-content(r, "text")
doc <- htmlParse(temp)
RequestVerificationToken<-xpathSApply(doc,"/html/body/div/div/div[2]/div/form/input[1]", xmlGetAttr, 'value')[[1]]
#Login
r <- POST("https://timeformus.com/Login", body = list('__RequestVerificationToken'=RequestVerificationToken,returnUrl="/",UserName="Myuserid",Password="MyPassword"), encode = "form")
temp<-content(r, "text")
#Keep cookie
a<-cookies(r)
.AspNet.ApplicationCookie <-a[7][[1]][3]
ASP.NET_SessionId <-a[7][[1]][1]
RequestVerificationToken <- a[7][[1]][2]
}
substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))
}
purchase_track <- function(race_date,name){
#Check track and purchase
url<-paste(c("https://timeformus.com/buytracks/",race_date),collapse='')
print(url)
r <- GET(url)
temp<-content(r, "text")
doc <- htmlParse(temp)
track_name <- xpathSApply(doc,"//form/div[@class=\"race clearfix\"]/div/strong",xmlValue)
#track_time <-xpathSApply(doc,"//form/div[@class=\"race clearfix\"]/div[2]/span/strong",xmlValue)
track_id <- xpathSApply(doc,"//form/div[@class=\"race clearfix\"]/input[1]", xmlGetAttr, 'value')
track_status <- xpathSApply(doc,"//form/div[@class=\"race clearfix\"]/div[3]", xmlValue)
for(i in 1:length(track_status))
{
if(regexpr(name,track_name[i])!=-1)
{
if(regexpr("Add",track_status[i])!=-1)
{
print("Purchase")
#Purchase
r <- POST("https://timeformus.com/BuyTracks/Add", body = list(trackRaceDateId=track_id[i],raceDate=race_date), encode = "form")
r <- POST("https://timeformus.com/buypps/checkout", body = list(payWith="537b15b2-15af-4450-abdd-4b15d0b6fa9d"), encode = "form")
}
else
{
print('Already own')
}
}
}
}
#Add double quote and clear speical char
clear_text <- function(raw){
raw <- paste(c('"',gsub("^\\s+|\\s+$", "", raw),'"'),collapse='')
return(raw)
}
#Open output file
setwd(file.path("C:","Users", "mutue", "OneDrive", "Documents", "TimeForm", "Data", "Charts"))
headerConn<-file(header_file,"w")
wagerConn<-file(wager_file,"w")
raceConn<-file(race_file,"w")
starterConn<-file(starter_file,"w")
footnoteConn<-file(footnote_file,"w")
fileURL <- file("url.csv","w")
###Program Start Here
#Login
login()
#Check / Purchase track
purchase_track(race_date,track_name)
print("Chart Process Complete")
#Get Race date page
r <- GET(paste(c("https://timeformus.com/account/purchases/",race_date),collapse=''))
temp<-content(r, "text")
doc <- htmlParse(temp)
#Grab track link
link<-xpathSApply(doc,"//div[@class=\"race clearfix\"]/div[3]/a", xmlGetAttr,'href')
name <-xpathSApply(doc,"//div[@class=\"race clearfix\"]/div[1]/strong",xmlValue)
href<-''
for(i in 1:length(name))
{
if(regexpr(track_name,name[i])!=-1)
{
href<-link[i]
print("Found link")
}
}
#Go to track link
url <- paste(c("https://timeformus.com",href),collapse='')
print(url)
r <- GET(url)
temp<-content(r, "text")
doc <- htmlParse(temp)
#Start Loggin result
printf <- function(...) cat(sprintf(...))
`%not in%` <- function (x, table) is.na(match(x, table, nomatch=NA_integer_))
#Loop through all race no
count_race <- 0
#loop tho all charts
for(href in xpathSApply(doc,"//div[@class=\"race-options\"]/select/option", xmlGetAttr,'value'))
{
url <- paste(c("https://timeformus.com",href),collapse='')
r <- GET(url)
temp<-content(r, "text")
doc <- htmlParse(temp)
doc_id <- htmlParse(temp)
print(url)
Link_list_ID <- c()
HorseID <- c()
PastPerformanceLink <-c()
HorseNamePre <- c()
HorseID <-append(HorseID,xpathSApply(doc,"//div[@class=\"list selected\"]",xmlGetAttr,'data-id'))
HorseID <-append(HorseID,xpathSApply(doc,"//div[@class=\"list selected important-message-bg\"]",xmlGetAttr,'data-id'))
HorseID <-append(HorseID,strsplit(xpathSApply(doc,"//div[@class=\"list\"]",xmlGetAttr,'data-id')," "))
if(length(xpathSApply(doc,"//div[@class=\"list important-message-bg\"]",xmlGetAttr,'data-id'))>0)
{
HorseID <-append(HorseID,xpathSApply(doc,"//div[@class=\"list important-message-bg\"]",xmlGetAttr,'data-id'))
}
HorseNamePre <- append(HorseNamePre,xpathSApply(doc,"//div[@class=\"list selected\"]",xmlGetAttr,'data-name'))
HorseNamePre <- append(HorseNamePre,strsplit(xpathSApply(doc,"//div[@class=\"list\"]",xmlGetAttr,'data-name')," "))
if(length(xpathSApply(doc,"//div[@class=\"list important-message-bg\"]",xmlGetAttr,'data-name'))>0)
{
HorseNamePre <- append(HorseNamePre,strsplit(xpathSApply(doc,"//div[@class=\"list important-message-bg\"]",xmlGetAttr,'data-name')," "))
}
count_race <- count_race +1
#collect horses in chart
for(ID in HorseID)
{
Link_list_ID <- append(Link_list_ID,paste("https://timeformus.com/basicpps/starterPastPerformancesAjax?starterId=",ID,sep=''))
}
print("List ID ")
print((Link_list_ID))
k <- 0
for(Link in Link_list_ID)
{
printf ("Race: %d,\n",count_race)
r_t <- GET(Link,encode = "form",add_headers("X-Requested-With" = "XMLHttpRequest"))
temp_t<-content(r_t, "text")
doc_t <- htmlParse(temp_t)
PastPerformanceLink <- c()
PastPerformanceLink <- append(PastPerformanceLink,(xpathSApply(doc_t,"//td[@class=\"no-wrap winner-places\"]/a",xmlGetAttr,'href')))
PastChartCodeTemp <- c()
PastChartCode <- c()
PastChartCodeTemp <- append(PastChartCodeTemp,xpathSApply(doc_t,"//td[@class=\"no-wrap track-race-date\"]/div/div/a/b",xmlValue))
print("********")
print(PastChartCodeTemp)
print("********")
count_pf <- 0
for(i in 1: length(PastChartCodeTemp))
{
if(!is.null(PastPerformanceLink) & !is.null(PastChartCodeTemp) ){
if(nchar(PastChartCodeTemp[i])<=3)
{
PastChartCode <- append(PastChartCode,PastChartCodeTemp[i])
}
}
}
if(is.null(PastChartCode[i])){
PastChartCode[i] <- "XXX"
}
count_PastPerformanceLink <- 1
#loop tho past perf in each horse
if (length(PastPerformanceLink)>0){
for( PastLink in PastPerformanceLink)
{
count_pf<- count_pf +1
short_link <- substring(PastLink,9,nchar(PastLink)-7)
if (length(short_link)==0 | short_link %in% repeat_link | !(is.element(PastChartCode[count_pf],TrackCodeList))) {
printf("Link %d repeted or race was not in US skip to a next link \n",count_PastPerformanceLink )
count_PastPerformanceLink <- count_PastPerformanceLink +1
}
else{
repeat_link <- append(repeat_link, substring(PastLink,9,nchar(PastLink)-7))
writeLines(paste(c(PastLink),collapse=','), fileURL)
url <- paste(c("https://timeformus.com",PastLink),collapse='')
r <- GET(url)
temp<-content(r, "text")
doc <- htmlParse(temp)
#go to past performance
xpath_string <- "//div[@class=\"race-options\"]/select"
xpath_string <- "//li[@class=\"selected\"]"
number_races <- xpathSApply(doc,"//div[@class=\"race-options\"]/select/option",xmlGetAttr,'value')
printf("Scraping Past Performance link %d of %d \n",count_PastPerformanceLink,length(PastPerformanceLink))
count_PastPerformanceLink <- count_PastPerformanceLink +1
count_past_race <- 1
length_race_number <- length(number_races)
record_type<-'"H"'
TrackName <- xpathSApply(doc,"//div[@class=\"track-options\"]/select/option[@selected=\"selected\"]",xmlValue)
TrackName <- substr(TrackName,1,nchar(TrackName)-1)
ChartDate <- xpathSApply(doc,"//div[@class=\"track-date\"]/input",xmlGetAttr,'value')
day <- substr(ChartDate,1,2)
month <- MonthConverter[substr(ChartDate,4,6)]
year <- substr(ChartDate,9,12)
RaceDate <- paste(year,month,day,sep="")
RaceDate <- clear_text(RaceDate)
bias_color <- xpathSApply(doc,"//div[@class=\"track-diagram-dirt\"]/div/div/span",xmlValue)
TrackCode <- clear_text(TrackConventer[TrackName])
TrackName <- clear_text(TrackName)
#Header Record Here
writeLines(paste(c(record_type,'"USA"',TrackCode,RaceDate,clear_text(length(number_races)),clear_text("D"),TrackName),collapse=','), headerConn)
#start
NumberofRace <- xpathSApply(doc,"//div[@class=\"race-options\"]/select/option[@selected=\"selected\"]",xmlValue)
NumberofRace <- gsub("Race ","",NumberofRace)
NumberofRace <- clear_text(NumberofRace)
record_type<-'"R"'
RaceText <- clear_text( xpathSApply(doc,"//div[@class=\"content-info diagram-info\"]/p",xmlValue)[5])
RaceType <- clear_text(xpathSApply(doc,"//div[@class=\"content-info diagram-info\"]/p",xmlValue)[2])
TrackSurface <- clear_text(xpathSApply(doc,"//div[@class=\"content-info diagram-info\"]/p",xmlValue)[1])
TrackCondition <- clear_text(xpathSApply(doc,"//div[@class=\"type\"]/div",xmlValue)[3])
TrackSurface <- paste(TrackSurface, TrackCondition)
Purse <- toString(clear_text(xpathSApply(doc,"//div[@class=\"content-info diagram-info\"]/p",xmlValue)[4]))
Purse <- gsub(",","",Purse)
RaceDistance <- clear_text( xpathSApply(doc,"//div[@class=\"content-info diagram-info\"]/p",xmlValue)[3])
HorseNumber <-length(xpathSApply(doc,"//td[@class=\"entries\"]/span[1]",xmlValue))
RaceFrac1 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[1]/td[2]",xmlValue))
RaceFrac2 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[1]/td[3]",xmlValue))
RaceFrac3 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[1]/td[4]",xmlValue))
RaceFrac4 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[1]/td[5]",xmlValue))
RaceFrac5 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[1]/td[6]",xmlValue))
Ajdtime1 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[2]/td[2]",xmlValue))
Ajdtime2 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[2]/td[3]",xmlValue))
Ajdtime3 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[2]/td[4]",xmlValue))
Ajdtime4 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[2]/td[5]",xmlValue))
Ajdtime5 <- clear_text(xpathSApply(doc,"//div[@class=\"cols race-fractions\"]/table/tr[2]/td[6]",xmlValue))
FootNote <- clear_text(xpathSApply(doc,"//div[@class=\"footnote-modal\"]/div",xmlValue))
is.number <- function(x) grepl("[[:digit:]]", x)
temp_i <- c()
temp_j <- list()
for(i in 1: length(xpathSApply(doc,"//div[@id=\"horse-chart-list\"]/div/table/tr",xmlValue) ))
{
k <- xpathSApply(doc, paste("//div[@id=\"horse-chart-list\"]/div/table/tr[",i,"]/td[5]/div/div/div/*",sep=""),xmlValue)
temp_j <- c(temp_j,list(k))
}
NextOutWinner <- 0
NextOutRunner <-0
for (i in 1: length(temp_j))
{
for (j in 1: length(temp_j[[i]]))
if(length(temp_j[[i]]) != 0)
{
if (nchar(temp_j[[i]][[j]])<10)
{
#locate center
index <- j
if( index-1 !=0)
{
NextOutRunner <- NextOutRunner+1
if( grepl("st",temp_j[[i]][[j-1]]))
{
NextOutWinner <- NextOutWinner+1
}
}
}
}
}
NextOutRunner < toString(NextOutRunner)
NextOutRunner <-clear_text(NextOutRunner)
NextOutWinner <- toString(NextOutWinner)
NextOutWinner <- clear_text(NextOutWinner)
#RaceRecord_Past
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,RaceType,RaceText,TrackSurface,RaceDistance,Purse,HorseNumber,
RaceFrac1,RaceFrac2,RaceFrac3,RaceFrac4,RaceFrac5,Ajdtime1,Ajdtime2,Ajdtime3,Ajdtime4,Ajdtime5,
NextOutRunner,NextOutWinner,FootNote),collapse=','), raceConn)
record_type<-'"W"'
empty_table <- xpathSApply(doc,"//div[@class=\"exotic-container\"]",xmlValue)
if(nchar(empty_table)>30){
FarRightTable <- xpathSApply(doc,"//div[@class=\"exotic-container\"]/table/tr",xmlValue)
FarRightTable<- gsub(" \r\n ","", FarRightTable)
Wagertype1 <- substring(FarRightTable,8,gregexpr("\\ \\(\\d", FarRightTable))
Wagertype1 <- gsub(" ","",Wagertype1)
WiningNumber<- substring(FarRightTable,gregexpr(" \\(\\d", FarRightTable),gregexpr("[P][a][i][d]", FarRightTable))
WiningNumber <- substring(WiningNumber,1,nchar(WiningNumber)-1)
WiningNumber <- gsub(" ","",WiningNumber)
for (i in 1:length(WiningNumber)){
if(str_count(WiningNumber[i],"\\(")>1)
{
WiningNumber[i] <- substring(WiningNumber[i],2,gregexpr("[L][(]", WiningNumber[i]))
}
else
{
WiningNumber[i] <- substring(WiningNumber[i],2,gregexpr("\\d\\)", WiningNumber[i]))
}
MinPayOff <- strsplit(as.character(as.numeric(substring(FarRightTable,2,5))*100)," ")
PayOffAmount <- substring(FarRightTable,gregexpr(" [$]\\d", FarRightTable),gregexpr("$", FarRightTable))
PayOffAmount <- gsub("[$]|,|\r\n ","",PayOffAmount)
PayOffAmount <- as.character(as.numeric(PayOffAmount)*100)
WagerPool <- '"0"'
CarryOver <- '"0"'
Wageringleg <- list(
Exacta='"2"',
Quinella='"2"',
Trifecta='"3"',
Superfecta='"3"',
HighFive='"5"',
SuperHighFive='"5"',
Z5SuperHi5='"5"',
DailyDouble='"2"',
Pick3='"3"',
ConsolationPick3='"3"',
Pick4='"4"',
Pick5='"5"',
Pick6='"6"',
PlacePick9='"9"',
TwinTrifeca='"3"',
ConsolationDouble='"2"',
SuperHighFiveJackpot='"5"',
Pick6Jackpot='"6"',
GrandSlam='"4"'
)
CenterTable <- xpathSApply(doc,"//div[@class=\"race-result-container\"]/table/tr/td",xmlValue)
WinnerW <- clear_text(CenterTable[1])
WinW <- clear_text(CenterTable[2])
WinP <- clear_text(CenterTable[3])
WinS <- clear_text(CenterTable[4])
WinnerP <- clear_text(CenterTable[5])
PlaceP <- clear_text(CenterTable[7])
PlaceS <- clear_text(CenterTable[8])
WinnerS <- clear_text(CenterTable[9])
ShowS <- clear_text(CenterTable[12])
}
for(count in 1:length(Wagertype1)){
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,clear_text(Wagertype1[count])
,(Wageringleg[Wagertype1[count]]),clear_text(WiningNumber[count]),clear_text(MinPayOff[count]),WagerPool,
clear_text(PayOffAmount[count]),CarryOver),collapse=','), wagerConn)
}
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,'"Win_w"'
,'"1"',WinnerW,'"200"',WagerPool,WinW,CarryOver),collapse=','), wagerConn)
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,'"Win_p"'
,'"1"',WinnerW,'"200"',WagerPool,WinP,CarryOver),collapse=','), wagerConn)
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,'"Win_s"'
,'"1"',WinnerW,'"200"',WagerPool,WinS,CarryOver),collapse=','), wagerConn)
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,'"Place_p"'
,'"1"',WinnerP,'"200"',WagerPool,PlaceP,CarryOver),collapse=','), wagerConn)
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,'"Place_s"'
,'"1"',WinnerP,'"200"',WagerPool,PlaceS,CarryOver),collapse=','), wagerConn)
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,'"Show_s"'
,'"1"',WinnerS,'"200"',WagerPool,ShowS,CarryOver),collapse=','), wagerConn)
record_type<-'"S"'
Prognum_XML <- xpathSApply(doc,"//div[@id=\"horse-chart-list\"]/div/table/tr/td/span",xmlValue)
#extracing Horse name and prog number
ProgNumber <- c()
TrainerName = strsplit(xpathSApply(doc,"//div[@id=\"horse-chart-list\"]/div/table/tr/td/span[2]/i[1]",xmlValue)," ")
HorseName = xpathSApply(doc,"//div[@id=\"horse-chart-list\"]/div/table/tr/td/span[2]/b",xmlValue)
for(count in 1:length(Prognum_XML))
{
if( count %% 2 == 1)
{
ProgNumber <- append(ProgNumber,Prognum_XML[count])
}
}
JockeyName <- xpathSApply(doc,"//td[@class=\"jockey\"]",xmlValue)
JockeyName <- gsub("\r\n ","", JockeyName)
jockeyDetail <- gsub(" ", "", JockeyName)
jockeyDetail <- substring(jockeyDetail,gregexpr("\\d{3}", jockeyDetail),nchar(jockeyDetail))
for(i in 1:length(JockeyName)) {
JockeyName[i]<-substring(JockeyName[i],0,gregexpr("\\d{3}", JockeyName[i])[[1]]-1)}
JockeyName <- gsub('[0-9]+', '', JockeyName)
jockeyDetailSplit <- c()
# POS = xpathSApply(doc,"//td[@class=\"fraction-figure\"]/div/div/span/b",xmlValue)
# Lengths = xpathSApply(doc,"//td[@class=\"fraction-figure\"]/div/div/span/sup",xmlValue)
# Frac = xpathSApply(doc,"//td[@class=\"fraction-figure\"]/div/div/span[2]",xmlValue)
Claiming = xpathSApply(doc,"//td[@class=\"claiming\"]",xmlValue)
for (i in 1: length(Claiming ))
{
if( nchar(Claiming[i]) ==0)
{
Claiming[i] <- "0"
}
Claiming[i] <- clear_text( Claiming[i])
}
#Merge data
for(count in 1:length(ProgNumber)){
jockeyDetailSplit <- as.list(strsplit(jockeyDetail, "[|]")[[count]])
POS <- c()
Lengths <- c()
Frac <- c()
for(index in 1:5)
{
pos_elem <- xpathSApply(doc,paste0("//div[@class=\"horse-charts-container\"]/table/tr[",count,"]/td[@class=\"fraction-figure\"]/div[1]/div[",index,"]/span/b"),xmlValue)
if(length(pos_elem)!=0)
POS <-c(POS,paste(pos_elem, collapse = ''))
else
POS <- c(POS,' ')
length_elem <- xpathSApply(doc,paste0("//div[@class=\"horse-charts-container\"]/table/tr[",count,"]/td[@class=\"fraction-figure\"]/div[1]/div[",index,"]/span/sup"),xmlValue)
if(length(length_elem)!=0)
Lengths <-c(Lengths,paste(length_elem, collapse = ''))
else
Lengths <-c(Lengths,' ')
frac_elem<-xpathSApply(doc,paste0("//div[@class=\"horse-charts-container\"]/table/tr[",count,"]/td[@class=\"fraction-figure\"]/div[1]/div[",index,"]/span[2]"),xmlValue)
if(length(frac_elem)!=0)
Frac <- c(Frac,paste(frac_elem, collapse = ''))
else
Frac <- c(Frac,' ')
}
#StartRecord
writeLines(paste(c(record_type,TrackCode,RaceDate,NumberofRace,clear_text(ProgNumber[count]),clear_text(HorseName[count]),
clear_text(TrainerName[count]),Claiming[count],clear_text(JockeyName[count]),clear_text(jockeyDetailSplit[1]),
clear_text(jockeyDetailSplit[2]),clear_text(jockeyDetailSplit[3]),
clear_text(POS[1]),clear_text(Lengths[1]),
clear_text(POS[2]),clear_text(Lengths[2]),
clear_text(POS[3]),clear_text(Lengths[3]),
clear_text(POS[4]),clear_text(Lengths[4]),
clear_text(POS[5]),clear_text(Lengths[5]),
clear_text(Frac[1]),clear_text(Frac[2]),
clear_text(Frac[3]),clear_text(Frac[4]),
clear_text(Frac[5])),collapse=','), starterConn)
}
record_type<-'"F"'
HorseList <- c()
temp <- ""
FootNote <- clear_text(xpathSApply(doc,"//div[@class=\"footnote-modal\"]/div",xmlValue))
for(count in 1:length(HorseName))
{
if (grepl(toupper(HorseName[count]),FootNote))
{
HorseList <- append(HorseList,clear_text(HorseName[count]))
temp <- paste(temp, paste(clear_text(HorseName[count]),",",sep=" "),sep=" ")
}
}
temp <- substr(temp,1,nchar(temp)-2)
#FootNote OUtput
writeLines(paste(c(record_type,NumberofRace,TrackCode,RaceDate,FootNote),collapse=','), footnoteConn)
}
}
}
}
}
}
close(raceConn)
close(starterConn)
close(footnoteConn)
close(wagerConn)
close(headerConn)
close(fileURL)
library(tidyverse)
library(lubridate)
library(RCurl)
library(dbplyr)
library(DBI)
library(RMariaDB)
library(skimr)
get_l <- function(l) {
case_when(
l=='head' ~ '0.10',
l=='neck' ~ '0.25',
l=='nose' ~ '0.05',
l=='start' ~ '0',
str_detect(l, ('0')) ~ l,
str_detect(l,'(^\\d+$)') ~ l,
str_detect(l,'(^1\\/2$)') ~ '0.5',
str_detect(l,'(^1\\/4$)') ~ '0.25',
str_detect(l,'(^3\\/4$)') ~ '0.75',
str_detect(l, '(nk.*)') ~ '0.25',
str_detect(l, '(hd\\s?)') ~ '0.10',
str_detect(l,'(^\\d+\\.\\d\\d)') ~ l,
TRUE ~ str_trim(l)
)
}
get_frac <- function(f){
case_when(
str_length(f) < 6 ~ as.double(f),
TRUE ~ (minute(ms(f))*60 + second(ms(f)))
)
}
get_furlongs <- function(d) {
case_when(
str_detect(d,'3 Furlongs') ~ 300,
str_detect(d,'3 1/2 Furlongs') ~ 350,
str_detect(d,'3 3/4 Furlongs') ~ 375,
str_detect(d,'4 Furlongs') ~ 400,
str_detect(d,'4 1/4 Furlongs') ~ 425,
str_detect(d,'4 1/2 Furlongs') ~ 450,
str_detect(d,'5 Furlongs') ~ 500,
str_detect(d,'About 5 Furlongs') ~ 500,
str_detect(d,'5 1/4 Furlongs') ~ 525,
str_detect(d,'5 Furlongs 80 Yards') ~ 536,
str_detect(d,'5 1/2 Furlongs') ~ 550,
str_detect(d,'About 5 1/2 Furlongs') ~ 550,
str_detect(d,'6 Furlongs') ~ 600,
str_detect(d,'About 6 Furlongs') ~ 600,
str_detect(d,'6 1/2 Furlongs') ~ 650,
str_detect(d,'About 6 1/2 Furlongs') ~ 650,
str_detect(d,'7 Furlongs') ~ 700,
str_detect(d,'About 7 Furlongs') ~ 700,
str_detect(d,'7 1/2 Furlongs') ~ 750,
str_detect(d,'1 Mile') ~ 800,
str_detect(d,'About 1 Mile') ~ 800,
str_detect(d,'1 Mile 40 Yards') ~ 818,
str_detect(d,'About 1 Mile 40 Yards') ~ 818,
str_detect(d,'1 Mile 70 Yards') ~ 832,
str_detect(d,'About 1 Mile 70 Yards') ~ 832,
str_detect(d,'1 1/16 Miles') ~ 850,
str_detect(d,'About 1 1/16 Miles') ~ 850,
str_detect(d,'1 1/8 Miles') ~ 900,
str_detect(d,'1 3/16 Miles') ~ 950,
str_detect(d,'1 1/4 Miles') ~ 1000,
str_detect(d,'1 5/16 Miles') ~ 1040,
str_detect(d,'1 5/16 Miles') ~ 1050,
str_detect(d,'1 3/8 Miles') ~ 1100,
str_detect(d,'1 7/16 Miles') ~ 1150,
str_detect(d,'1 1/2 Miles') ~ 1200,
str_detect(d,'1 9/16 Miles') ~ 1250,
str_detect(d,'1 5/8 Miles') ~ 1300,
str_detect(d,'1 11/16 Miles') ~ 1350,
str_detect(d,'1 3/4 Miles') ~ 1400,
str_detect(d,'1 13/16 Miles') ~ 1450,
str_detect(d,'1 7/8 Miles') ~ 1500,
str_detect(d,'1 15/16 Miles') ~ 1550,
str_detect(d,'2 Miles') ~ 1600,
TRUE ~ 0
)
}
get_race_type_code <- function(type) {
case_when(
str_detect(type,"Allowance Optional Claimer") ~ "AOC",
str_detect(type,"Starter Allowance") ~ "STR",
str_detect(type,"Maiden Claiming") ~ "MCL",
str_detect(type,"Allowance") ~ "ALW",
str_detect(type,"Maiden Special Weight") ~ "MSW",
str_detect(type,"(MAIDEN|Maiden)") ~ "MDN",
str_detect(type,"G1") ~ "STK1",
str_detect(type,"G2") ~ "STK2",
str_detect(type,"G3") ~ "STK3",
str_detect(type,"Trial") ~ "TRL",
str_detect(type,"Claiming") ~ "CLM",
str_detect(type,"Starter Optional Claiming") ~ "SOC",
str_detect(type,"Starter Stakes") ~ "SST",
str_detect(type,"(S.|OAKS|Oaks)") ~ "NGS",
str_detect(type,"H.") ~ "HCP",
str_detect(type,"Maiden Optional Claiming") ~ "MOC",
str_detect(type,"Match Race") ~ "MAT",
str_detect(type,"Claiming Stakes") ~ "CST",
str_detect(type,"Handicap") ~ "HCP",
str_detect(type,"Starter Handicap") ~ "SHP",
str_detect(type,"Maiden Stakes") ~ "MSTK",
str_detect(type,"Optional Claiming") ~ "OCL",
str_detect(type,"Maiden Starter Allowance") ~ "MSA",
str_detect(type,"Optional Claiming Stakes") ~ "OCS",
str_detect(type,"Optional Claiming Handicap") ~ "OCH",
TRUE ~ "RACE_TYPE_CODE"
)
}
get_age_re <- function(txt){
case_when(
str_detect(txt,regex('THREE YEARS? OLDS? AND UPWARD',ignore_case=TRUE)) ~ '3YU',
str_detect(txt,regex('FOUR YEARS? OLDS? AND UPWARD',ignore_case=TRUE)) ~ '4YU',
str_detect(txt,regex('TWO YEARS? OLDS?',ignore_case=TRUE)) ~ '2YO',
str_detect(txt,regex('THREE YEARS? OLDS?',ignore_case=TRUE)) ~ '3YO',
str_detect(txt,regex('FOUR YEARS? OLDS?',ignore_case=TRUE)) ~ '4YO',
str_detect(txt,regex('FIVE YEARS? OLDS?',ignore_case=TRUE)) ~ '5YO',
str_detect(txt,regex('THREE-YEARS?-OLDS? AND UPWARD',ignore_case=TRUE)) ~ '3YU',
str_detect(txt,regex('FOUR-YEARS?-OLDS? AND UPWARD',ignore_case=TRUE)) ~ '4YU',
str_detect(txt,regex('TWO-YEARS?-OLDS?',ignore_case=TRUE)) ~ '2YO',
str_detect(txt,regex('THREE-YEARS?-OLDS?',ignore_case=TRUE)) ~ '3YO',
str_detect(txt,regex('FOUR-YEARS?-OLDS?',ignore_case=TRUE)) ~ '4YO',
str_detect(txt,regex('FIVE-YEARS?-OLDS?',ignore_case=TRUE)) ~ '5YO',
TRUE ~ 'RESEARCH'
)
}
get_race_cat <- function(type) {
case_when(
str_detect(type,"STR") ~ "aw",
str_detect(type,"MSW") ~ "MS",
str_detect(type,"MDN") ~ "MS",
str_detect(type,"MCL") ~ "MS",
str_detect(type,"ALW") ~ "AW",
str_detect(type,"STK1") ~ "AW",
str_detect(type,"STK2") ~ "AW",
str_detect(type,"STK3") ~ "AW",
str_detect(type,"TRL") ~ "AW",
str_detect(type,"CLM") ~ "CLM",
str_detect(type,"SOC") ~ "AW",
str_detect(type,"NGSTK") ~ "AW",
str_detect(type,"MOC") ~ "MS",
str_detect(type,"MAT") ~ "AW",
str_detect(type,"CST") ~ "AW",
str_detect(type,"HCP") ~ "AW",
str_detect(type,"MST") ~ "AW",
str_detect(type,"SST") ~ "AW",
str_detect(type,"OCL") ~ "AW",
str_detect(type,"MS") ~ "AW",
str_detect(type,"OCS") ~ "AW",
str_detect(type,"AOC") ~ "AW",
str_detect(type,"OCH") ~ "AW",
TRUE ~ "AW"
)
}
cln_horse <- function(h){
h <- str_replace_all(h,'\\"','')
h <- str_replace(h, '\\(.*','')
h <- str_trim(h)
h
}
get_race_code <- function(dt_str, track, race) {
d <- str_trunc(dt_str,2, side = "left", ellipsis = "")
m <- str_sub(dt_str,5,2)
m <- month(ymd(dt_str),label=TRUE, abbr=TRUE)
yr <- str_trunc(dt_str,4, side = "right", ellipsis = "")
yr <- str_trunc(yr,2, side = "left", ellipsis = "")
r <- str_pad(race, 2, "left", "0")
s <- "_"
x <- str_c(d,m,yr,s,r,track,"00", sep="")
}
get_chart_code <- function(dt_str, track, race) {
d <- str_trunc(dt_str,2, side = "left", ellipsis = "")
m <- str_sub(dt_str,5,2)
m <- month(ymd(dt_str),label=TRUE, abbr=TRUE)
yr <- str_trunc(dt_str,4, side = "right", ellipsis = "")
yr <- str_trunc(yr,2, side = "left", ellipsis = "")
r <- '00'
s <- "_"
x <- str_c(d,m,yr,s,r,track,"00", sep="")
}
get_pp_code <- function(dt_str, track, race, finish) {
d <- str_trunc(dt_str,2, side = "left", ellipsis = "")
m <- str_sub(dt_str,5,2)
m <- month(ymd(dt_str),label=TRUE, abbr=TRUE)
yr <- str_trunc(dt_str,4, side = "right", ellipsis = "")
yr <- str_trunc(yr,2, side = "left", ellipsis = "")
r <- str_pad(race, 2, "left", "0")
p <- str_pad(finish,2, "left", "0")
s <- "_"
x <- str_c(d,m,yr,s,r,track,p, sep="")
}
to_int_str <- function(x) {
x <- str_sub(x, 2, str_length(x))
x <- (as.numeric(x) * 100)
x <- as.character(x)
}
get_wager_cat <- function(w){
case_when(
w == 'Win_w' | w=='Win_p' | w== 'Win_s' ~ 'V',
w == 'Place_p' | w== 'Place_s' ~ 'V',
w == 'Show_s' ~ 'V',
w == 'Exacta' | w == 'Trifecta' | w == 'Superfecta' ~ 'V',
str_detect(w,'(5|FIVE|Five)') & str_detect(w,'(High|HIGH|Super|SUPER|Hi)') ~ 'V',
w == 'DailyDouble' ~ 'H',
w == 'Pick3' ~ 'H',
w == 'Pick4' ~ 'H',
w == 'Pick5' ~ 'H',
w == 'Pick6' ~ 'H',
str_detect(w,'(Consolation|CONSOLATION)') ~ 'V',
TRUE ~ 'V'
)
}
get_wager_legs <- function(w){
case_when(
w == 'Win_w' | w=='Win_p' | w== 'Win_s' ~ 1L,
w == 'Place_p' | w== 'Place_s' ~ 1L,
w == 'Show_s' ~ 1L,
w == 'Exacta' ~ 2L,
w == 'Trifecta' ~ 3L,
w == 'Superfecta' ~ 4L,
str_detect(w,'(5|FIVE|Five)') & str_detect(w,'(High|HIGH|Super|SUPER|Hi)') ~ 5L,
w == 'DailyDouble' ~ 2L,
w == 'Pick3' ~ 3L,
w == 'Pick4' ~ 4L,
w == 'Pick5' ~ 5L,
w == 'Pick6' ~ 6L,
str_detect(w,'(Double|DOUBLE)') ~ 2L,
str_detect(w,'(GRAND|Grand)') ~ 4L,
str_detect(w,'(Pick3|PICK3)') ~ 3L,
str_detect(w,'(Pick4|PICK4)') ~ 4L,
str_detect(w,'(Pick5|PICK5)') ~ 5L,
str_detect(w,'(Pick6|PICK6)') ~ 6L,
str_detect(w,'(9)') ~ 9L,
TRUE ~ 0L
)
}
#Key Inputs
path1 <- "C:/users/mutue/OneDrive/Documents/TimeForm/Data/sacred/"
path2 <- '20200328/'
path3 <- 'gulfstreampark'
path4 <- str_extract(path2,'([0-9]+)')
path5 <- '_races_c.csv'
path6 <- '_starters_c.csv'
path7 <- '_wagers_c.csv'
path8 <- '.csv'
path <- str_c(path1,path2,path3,path4)
race_file <- str_c(path, path5)
starter_file <- str_c(path, path6)
wager_file <- str_c(path, path7)
pp_file <- str_c(path, path8)
races <- read_csv(race_file,
col_names = c("h","trk","dte","nbr","typ","txt","srf","dst","pur","fld","fr1","fr2","fr3","fr4","fr5","fr1a", "fr2a","fr3a","fr4a","fr5a","nor","now","ftn"),
col_types = ("ccccccccccccccccccccccc"))
race_starter <- read_csv(starter_file,
col_names = c("h2","trk2","dte2","nbr2","pgm","nme","trn","cl2","jky","wt","meq","odd","p1","l1","p2","l2", "p3","l3","p4","l4","p5","l5","sfr1","sfr2","sfr3","sfr4","sfr5"),
col_types = ("ccciccciciccicicicicicccccc"))
wager <- read_csv(wager_file, col_names = c("w_h","w_trk","w_dt","w_nbr","w_typ","w_leg","w_wnr","w_min","w_pl","w_rtn","w_co"), col_types = ("cccicccddcd"))
pps <- read_csv(pp_file, col_names = c("c1","c2","c3","c4","c5","c6","c7","c8","c9","c10","c11","c12","c13","c14","c15","c16",
"c17","c18","c19","c20","c21","c22","c23","c24","c25","c26","c27","c28","c29","c30","c31","c32","c33","c34","c35","c36","c37","c38","c39","c40","c41","c42","c43","c44","c45","c46"),
col_types = ("cccccccccccccccccccccccccccccccccccccccccccccc"))
races <- races %>%
select(-h) %>%
#trk trimed
mutate(trk = str_trim(trk)) %>%
#dte character clean - consistent length
mutate(ld =str_length(dte)) %>%
mutate(dte = if_else(ld==7,str_replace(dte,'((\\d{4})(\\d)(\\d{2}))','\\20\\3\\4'),dte)) %>%
#date string
mutate(dt_str=dte) %>%
#convert dte to true date
mutate(dte = ymd(dte)) %>%
select(-ld) %>%
#nbr to integer
mutate(nbr=as.integer(nbr)) %>%
#type trimmed
mutate(typ = str_trim(typ)) %>%
#text trimmed
mutate(txt=str_trim(txt)) %>%
#surface condition
mutate(srf_cnd =srf) %>%
mutate(srf_cnd=str_replace(srf_cnd,'((.*?)\\s(.*))','\\3')) %>%
mutate(srf_cnd=str_replace(srf_cnd,'(\\")','')) %>%
mutate(srf_cnd=str_extract(srf_cnd,'([a-zA-Z]{1,20})')) %>%
#surface
mutate(srf=str_replace_all(srf,'(\\")','')) %>%
mutate(srf=str_replace_all(srf,srf_cnd,'')) %>%
mutate(srf=str_extract(srf,'([a-zA-Z]{1,20})')) %>%
#distance string
mutate(dst_str=dst) %>%
#distance in furlongs (integer)
mutate(dst=get_furlongs(dst)) %>%
#distance type aka route sprint
mutate(rs = if_else(dst<800, 'Sprint','Route')) %>%
#purse
mutate(pur=str_extract(pur,'(\\d{1,10})')) %>%
mutate(pur=as.integer(pur)) %>%
#field
mutate(fld=as.integer(fld)) %>%
#next out winners
mutate(now=if_else(as.integer(nor)+as.integer(now)>0,round((as.integer(now)/as.integer(nor))*100),0)) %>%
#footnote trimmed
mutate(ftn=str_trim(ftn)) %>%
#fractions
mutate(fr1 = if_else(fr1=='-',0,get_frac(fr1))) %>%
mutate(fr2 = if_else(fr2=='-',0,get_frac(fr2))) %>%
mutate(fr3 = if_else(fr3=='-',0,get_frac(fr3))) %>%
mutate(fr4 = if_else(fr4=='-',0,get_frac(fr4))) %>%
mutate(fr5 = if_else(fr5=='-',0,get_frac(fr5))) %>%
mutate(fr1a = if_else(fr1a=='-',0,get_frac(fr1a))) %>%
mutate(fr2a = if_else(fr2a=='-',0,get_frac(fr2a))) %>%
mutate(fr3a = if_else(fr3a=='-',0,get_frac(fr3a))) %>%
mutate(fr4a = if_else(fr4a=='-',0,get_frac(fr4a))) %>%
mutate(fr5a = if_else(fr5a=='-',0,get_frac(fr5a))) %>%
#rail at
mutate(rat =str_extract(txt,'(Rail\\sat\\s\\d+)')) %>%
mutate(rat=as.integer(str_extract(rat,'(\\d+)'))) %>%
mutate(rat = if_else(is.na(rat),0L,rat)) %>%
rename(race_rat = rat) %>%
#Restrictions
#Age
mutate(rea = get_age_re(txt)) %>%
#mutate(rea = str_extract(txt,'(TWO|THREE|FOUR|FIVE|SIX)\\s(YEARS OLD AND UPWARD|YEARS OLD AND UPWARD|YEARS OLD|YEAR OLD)')) %>%
#mutate(rea = age_vec[re_a]) %>%
#Sex
mutate(reg = if_else(str_detect(txt, '(FILLIES|MARES|Fillies|Mares)'),'F','M')) %>%
#state
mutate(res = if_else(str_detect(txt, '(FOALED|BRED|Bred|Foaled|REGISTERED|CONCEIVED|Registered)'),'S','O')) %>%
#Course type: INNER | DOWNHILL | MAIN
mutate(cty = if_else(str_detect(txt, '^(DOWNHILL|Downhill)'),'D',if_else(str_detect(txt, '^(INNER|Inner)'),"I","M"))) %>%
#Grade
mutate(grd = if_else(str_detect(typ, '^(G1)'),1,if_else(str_detect(typ, '^(G2)'),2,if_else(str_detect(typ, '^(G3)'),3,0)))) %>%
#Claiming Prices
mutate(xcl = if_else(str_detect(typ, '(\\:\\s+\\$\\d+,?\\d\\d\\d)'),str_extract(typ, '(\\:\\s+\\$\\d+,?\\d\\d\\d)'),'0')) %>%
mutate(xcl = str_replace(xcl,'(\\:\\s+)','')) %>%
mutate(xcl = str_replace(xcl,'(\\$)','')) %>%
mutate(xcl = str_trim(xcl)) %>%
mutate(xcl = str_replace(xcl,'(\\,)','')) %>%
mutate(xcl = as.integer(xcl)) %>%
mutate(ncl = if_else(str_detect(typ, '(\\-\\s+\\$\\d+,?\\d\\d\\d)'),str_extract(typ, '(\\-\\s+\\$\\d+,?\\d\\d\\d)'),'0')) %>%
mutate(ncl = str_replace(ncl,'(\\-\\s+)','')) %>%
mutate(ncl = str_replace(ncl,'(\\$)','')) %>%
mutate(ncl = str_trim(ncl)) %>%
mutate(ncl = str_replace(ncl,'(\\,)','')) %>%
mutate(ncl = as.integer(ncl)) %>%
mutate(ncl= if_else(xcl>0 & ncl==0,xcl,ncl)) %>%
#Class Code
mutate(cls = get_race_type_code(typ)) %>%
mutate(rct = get_race_cat(cls)) %>%
#Race Code
mutate(rcd = get_race_code(dt_str,trk,nbr)) %>%
#Chart Code
mutate(ccd = get_chart_code(dt_str,trk,nbr)) %>%
rename(trk_code =trk,
race_date = dte,
race_nbr = nbr,
race_txt = txt,
race_srf = srf,
race_dst = dst,
race_dst_str = dst_str,
race_pur = pur,
race_fld = fld,
race_fr1 = fr1,
race_fr2 = fr2,
race_fr3 = fr3,
race_fr4 = fr4,
race_fr5 = fr5,
race_fr1a = fr1a,
race_fr2a = fr2a,
race_fr3a = fr3a,
race_fr4a = fr4a,
race_fr5a = fr5a,
race_now = now,
race_footnote = ftn,
race_srf_cnd = srf_cnd,
race_rs = rs,
race_rea = rea,
race_reg = reg,
race_res = res,
race_crs = cty,
race_grd = grd,
race_xcl = xcl,
race_ncl = ncl,
race_cls = cls,
race_cat = rct,
race_code = rcd,
chart_code = ccd) %>%
select(-typ, -dt_str) %>%
select(-nor) -> races
footnote <- races %>%
select(race_code, race_footnote) %>%
mutate(ftn=str_replace_all(race_footnote,'([A-Z])([a-z])','\\L\\1\\2')) %>%
mutate(ftn=str_replace(ftn,'S. S. ','SS')) %>%
mutate(ftn=str_replace(ftn,'ST. ','ST')) %>%
mutate(ftn=str_replace(ftn,'([A-Z]{1,4})(\\.\\s)','\\1 ')) %>%
mutate(ftn=str_replace(ftn,"(,\\.)",",")) %>%
mutate(ftn=str_split(ftn,'\\.\\s+',n=20)) %>%
unnest() %>%
mutate(nme = str_trim(str_extract(ftn,"(^[A-Z\\.'\\s+]{1,18})"))) %>%
mutate(nme = str_replace_all(nme,'[^A-Za-z0-9]+',"")) %>%
rename(horse = nme,
footnote = ftn) %>%
mutate(ftn_key = str_c(horse, race_code)) %>%
select(ftn_key, footnote)
# View(head(footnote, n=250))
race_starter %>%
#dte2 character clean - consistent length
mutate(ld2 =str_length(dte2)) %>%
mutate(dte2 = if_else(ld2==7,str_replace(dte2,'((\\d{4})(\\d)(\\d{2}))','\\20\\3\\4'),dte2)) %>%
#date string
mutate(dt2_str=dte2) %>%
#convert dte2 to true date
mutate(dte2 = ymd(dte2)) %>%
select(-ld2) %>%
#fix lengths superscripts
#l1
mutate(l1= str_replace_all(l1,'^\\"(\\d+)(\\\xa0)(\\d\\/\\d)\"$','\\1\\3')) %>%
mutate(l1= str_replace_all(l1,'(\\d+)(.)(\\d\\/\\d)','\\1\\3')) %>%
mutate(l1=str_replace(l1,'(1\\/2)',".50")) %>%
#l2
mutate(l2= str_replace_all(l2,'^\\"(\\d+)(\\\xa0)(\\d\\/\\d)\"$','\\1\\3')) %>%
mutate(l2= str_replace_all(l2,'(\\d+)(.)(\\d\\/\\d)','\\1\\3')) %>%
mutate(l2=str_replace(l2,'(1\\/2)',".50")) %>%
#l3
mutate(l3= str_replace_all(l3,'^\\"(\\d+)(\\\xa0)(\\d\\/\\d)\"$','\\1\\3')) %>%
mutate(l3= str_replace_all(l3,'(\\d+)(.)(\\d\\/\\d)','\\1\\3')) %>%
mutate(l3=str_replace(l3,'(1\\/2)',".50")) %>%
#l4
mutate(l4= str_replace_all(l4,'^\\"(\\d+)(\\\xa0)(\\d\\/\\d)\"$','\\1\\3')) %>%
mutate(l4= str_replace_all(l4,'(\\d+)(.)(\\d\\/\\d)','\\1\\3')) %>%
mutate(l4=str_replace(l4,'(1\\/2)',".50")) %>%
#l5
mutate(l5= str_replace_all(l5,'^\\"(\\d+)(\\\xa0)(\\d\\/\\d)\"$','\\1\\3')) %>%
mutate(l5= str_replace_all(l5,'(\\d+)(.)(\\d\\/\\d)','\\1\\3')) %>%
mutate(l5=str_replace(l5,'(1\\/2)',".50")) %>%
mutate(l5=str_replace(l5,'(1\\/4)',".25")) %>%
mutate(l5=str_replace(l5,'(3\\/4)',".75")) %>%
#Turn NAs to Zero for all ls
mutate(l1 = if_else(is.na(l1),'0', l1)) %>%
mutate(l2 = if_else(is.na(l2),'0', l2)) %>%
mutate(l3 = if_else(is.na(l3),'0', l3)) %>%
mutate(l4 = if_else(is.na(l4),'0', l4)) %>%
mutate(l5 = if_else(is.na(l5),'0', l5)) %>%
#l2
mutate(l1 = get_l(l1)) %>%
mutate(l2 = get_l(l2)) %>%
mutate(l3 = get_l(l3)) %>%
mutate(l4 = get_l(l4)) %>%
#l5
mutate(l5 = if_else(l5=='DH',l4,l5)) %>%
mutate(l5 = get_l(l5)) %>%
#Group By Race Date and Race Number
group_by(dte2, trk2, nbr2) %>%
arrange(p2, .by_group=TRUE) %>%
mutate(l1 =as.double(l1)) %>%
mutate(l2 =as.double(l2)) %>%
mutate(l3 =as.double(l3)) %>%
mutate(l4 =as.double(l4)) %>%
mutate(l5 =as.double(l5)) %>%
mutate(bl =sum(l2)) %>%
mutate(fld = n()) %>%
mutate(mybl =cumsum(l2)) %>%
mutate(mybl2 = if_else(p2==1,0,lag(mybl))) %>%
mutate(rs = if_else(mybl2<1,"E","z")) %>%
mutate(rs = if_else(rs=="z" & p2/fld >= 0.66 | mybl>7, "C",rs)) %>%
mutate(rs = if_else(rs=="z" & mybl2 >=1 & mybl2 <=1.5, "S", rs)) %>%
mutate(rs = if_else(rs=="z" & mybl2 >1.5 & mybl2 <3.7, "P",rs)) %>%
mutate(rs = if_else(rs=="z","M", rs)) %>%
ungroup() %>%
group_by(dte2, trk2, nbr2, rs) %>%
arrange(p2, .by_group=TRUE) %>%
mutate(pir = 1:n()) %>%
ungroup() %>%
group_by(dte2, trk2, nbr2) %>%
arrange(p2, .by_group=TRUE) %>%
mutate(rsr= str_c(rs,pir)) %>%
ungroup() %>%
group_by(dte2, trk2, nbr2) %>%
mutate(rshp=max(rsr)) %>%
rename(str_bl_2c=mybl2) %>%
rename(strung = bl) %>%
ungroup() %>%
#Race Code
mutate(rcd = get_race_code(dt2_str,trk2,nbr2)) %>%
#Chart Code
mutate(ccd = get_chart_code(dt2_str,trk2,nbr2)) %>%
#pp Code
mutate(pcd = get_pp_code(dt2_str,trk2,nbr2,p5)) -> race_starter
#Join race_starter and footnote
race_starter <- race_starter %>%
mutate(nme = str_replace(nme,'(.*)(\\s+\\([A-Z]{2,3}\\))','\\1')) %>%
mutate(temp_horse = nme) %>%
mutate(nme = str_replace_all(nme,'[^A-Za-z0-9]+',"")) %>%
rename(horse = nme,
race_code = rcd) %>%
mutate(horse = str_to_upper(horse)) %>%
mutate(horse = cln_horse(horse)) %>%
mutate(ftn_key = str_c(horse, race_code))
race_starter <- left_join(race_starter, footnote, by="ftn_key") %>%
mutate(horse = temp_horse) %>%
mutate(horse = cln_horse(horse)) %>%
mutate(horse = str_to_upper(horse)) %>%
mutate(cl2 = if_else(is.na(cl2)==TRUE,0L,cl2)) %>%
mutate(odd = str_replace(odd,('\\/'),"-")) %>%
mutate(num = as.integer(str_replace(odd,'(\\d+)(\\-.*)','\\1'))) %>%
mutate(den = as.integer(str_replace(odd,'(\\d+)(\\-)(\\d+)','\\3'))) %>%
mutate(odd = (num / den)*100) %>%
mutate(sfr1 = if_else(sfr1=='-',0,get_frac(sfr1))) %>%
mutate(sfr2 = if_else(sfr2=='-',0,get_frac(sfr2))) %>%
mutate(sfr3 = if_else(sfr3=='-',0,get_frac(sfr3))) %>%
mutate(sfr4 = if_else(sfr4=='-',0,get_frac(sfr4))) %>%
mutate(sfr5 = if_else(sfr5=='-',0,get_frac(sfr5))) %>%
rename(trk_code = trk2,
race_date = dte2,
race_nbr = nbr2,
off_odds = odd,
chart_code = ccd,
starter_code = pcd
) %>%
mutate(l1 = as.integer(l1 * 100)) %>%
mutate(l2 = as.integer(l2 * 100)) %>%
mutate(l3 = as.integer(l3 * 100)) %>%
mutate(l4 = as.integer(l4 * 100)) %>%
mutate(l5 = as.integer(l5 * 100)) %>%
mutate(final_frac = sfr5) %>%
mutate(meq = if_else(is.na(meq),'--', meq)) %>%
mutate(sfr5 = if_else(sfr4==0, sfr5 + sfr3, sfr5 + sfr4)) %>%
select(-num,-den, -h2, -cl2, -pir, -mybl, -str_bl_2c, -ftn_key, -dt2_str, - temp_horse)
#View(head(races, n=400))
#skim(races)
# print(head(race_starter,n=50))
# Wager section
#dte character clean - consistent length
wager <- wager %>%
mutate(w_ld =str_length(w_dt)) %>%
mutate(w_dt = if_else(w_ld==7,str_replace(w_dt,'((\\d{4})(\\d)(\\d{2}))','\\20\\3\\4'),w_dt)) %>%
mutate(w_rtn = ifelse(w_leg == 1,to_int_str(w_rtn),w_rtn)) %>%
mutate(w_rtn = ifelse(w_typ =="X-5SuperHighFive" & is.na(w_rtn), 0,w_rtn)) %>%
mutate(w_cat = get_wager_cat(w_typ)) %>%
mutate(w_leg = get_wager_legs(w_typ)) %>%
mutate(wager_id = case_when(
w_typ == "Win_w" ~ 11L,
w_typ == "Win_p" ~ 12L,
w_typ == "Win_s" ~ 13L,
w_typ == "Place_p" ~ 22L,
w_typ == "Place_s" ~ 23L,
w_typ == "Show_s" ~ 33L,
w_typ == "Exacta" ~ 20L,
w_typ == "Quinella" ~ 25L,
w_typ == "Trifecta" ~ 30L,
w_typ == "Superfecta" ~ 40L,
w_typ == "X-5SuperHighFive" ~ 50L,
w_typ == "DailyDouble" ~ 92L,
w_typ == "ConsolationDouble" ~ 920L,
w_typ == "Consolation Double" ~ 920L,
w_typ == "Pick3" ~ 93L,
w_typ == "ConsolationPick3" ~ 930L,
w_typ == "Consolation Pick 3" ~ 930L,
w_typ == "Pick4" ~ 94L,
w_typ == "ConsolationPick4" ~ 940L,
w_typ == "Pick5" ~ 95L,
w_typ == "ConsolationPick5" ~ 950L,
w_typ == "Pick6" ~ 96L,
w_typ == "ConsolationPick6" ~ 960L,
TRUE ~ 99L)) %>%
mutate(w_rtn = as.integer(w_rtn)) %>%
#date string
mutate(w_dt_str=w_dt) %>%
#convert dte to true date
mutate(w_dt = ymd(w_dt)) %>%
select(-w_ld) %>%
mutate(race_code = get_race_code(w_dt_str,w_trk, w_nbr),
wager_code =get_pp_code(w_dt_str,w_trk, w_nbr,wager_id)) %>%
rename(trk_code = w_trk,
wager_dt = w_dt,
race_nbr = w_nbr,
wager_typ = w_typ,
wager_legs = w_leg,
wager_winner = w_wnr,
wager_min = w_min,
wager_payoff = w_rtn,
wager_cat = w_cat) %>%
mutate(wager_payoff = if_else(is.na(wager_payoff),0L,wager_payoff)) %>%
mutate(race_nbr = as.integer(race_nbr)) %>%
mutate(wager_win = as.integer(wager_min)) %>%
select(trk_code, wager_dt, race_nbr, wager_typ, wager_legs, wager_winner, wager_min, wager_payoff, wager_cat, wager_code, race_code) %>%
filter(!is.na(wager_typ))
#View(head(wager, n=100))
#View(head(races, n=100))
#print(races, n = 200)
#skim(wager)
# database activities Insert tibble into MySQL tables
# Wager
# Ensure no duplicate data
wager <- wager %>%
distinct() %>%
group_by(wager_code) %>%
mutate(n = seq(n())) %>%
ungroup() %>%
mutate(wager_code = if_else(n>1, str_c(wager_code,n),wager_code)) %>%
select(-n)
# Esblish Connection
con <- dbConnect(odbc::odbc(), "MySQL")
#Insert data
insert <- dbSendQuery(con, "INSERT ignore INTO wagers(`trk_code`, `wager_dt`, `race_nbr`, `wager_typ`, `wager_legs`, `wager_winner`, `wager_min`, `wager_payoff`, `wager_cat`, `wager_code`, `race_code`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)")
dbBind(insert, wager)
dbClearResult(insert)
con <- dbDisconnect(con)
# Race_Starter
# Ensure no duplicate data
race_starter <- race_starter %>%
distinct() %>%
group_by(starter_code) %>%
mutate(n = seq(n())) %>%
ungroup() %>%
mutate(starter_code = if_else(n>1, str_c(starter_code,n),starter_code)) %>%
select(-n)
#skim(race_starter)
#Esblish Connection
con <- dbConnect(odbc::odbc(), "MySQL")
insert <- dbSendQuery(con, "INSERT ignore INTO race_starters(`trk_code`, `race_date`, `race_nbr`, `pgm`, `horse`, `trn`, `jky`, `wt`, `meq`, `off_odds`, `p1`, `l1`, `p2`, `l2`, `p3`, `l3`, `p4`, `l4`, `p5`, `l5`, `sfr1`, `sfr2`, `sfr3`, `sfr4`, `sfr5`, `final_frac`, `strung`, `fld`, `rs`, `rsr`, `rshp`, `race_code`, `chart_code`, `starter_code`, `footnote`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)")
dbBind(insert, race_starter)
dbClearResult(insert)
con <- dbDisconnect(con)
# Races
# Ensure no duplicate data
races <- races %>%
distinct() %>%
group_by(race_code) %>%
mutate(n = seq(n())) %>%
ungroup() %>%
filter(n==1) %>%
select(-n)
#skim(race_starter)
#Esblish Connection
con <- dbConnect(odbc::odbc(), "MySQL")
insert <- dbSendQuery(con, "INSERT ignore INTO races(`trk_code`, `race_date`, `race_nbr`, `race_txt`, `race_srf`, `race_rat`, `race_dst`, `race_dst_str`, `race_pur`, `race_fld`, `race_fr1`, `race_fr2`, `race_fr3`, `race_fr4`, `race_fr5`, `race_fr1a`, `race_fr2a`, `race_fr3a`, `race_fr4a`, `race_fr5a`, `race_now`, `race_footnote`, `race_srf_cnd`, `race_rs`, `race_rea`, `race_reg`, `race_res`, `race_crs`, `race_grd`, `race_xcl`, `race_ncl`, `race_cls`, `race_cat`, `race_code`, `chart_code`) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)")
dbBind(insert, races)
dbClearResult(insert)
con <- dbDisconnect(con)
#skim(races)