Final Project Proposal

Overview

I decided to pursue a Master's in Data Science to improve my horse-picking abilities. This decision was based largely on the belief that data science techniques could give me an informational edge. Accordingly, my final project will seek to gain insight into the role that the distance run by each horse plays in the final order of finish in a horse race.

Data

My primary source of data for this project will be the Trakus website. Trakus provides comprehensive horse racing performance analysis. Specifically, Trakus T-charts provide horse racing enthusiasts with segment times, distance run, distance from the rail, average velocity, and beaten lengths for each sixteenth of a mile of a particular horse race. Table 1 below sets forth the 1/16th T-chart for Race 1 at Aqueduct Racetrack on November 15, 2019.

Table 1. Sample T-Chart

Workflow

My anticipated workflow for this project is consistent with Hadley Wickham’s Grammar of Data Science and is summarized below:

Import

  • Screen scrape T-chart data from the Trakus website
  • Write scraped data to a CSV file for future processing
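
As a minimal sketch of the second Import step, the snippet below writes already-parsed T-chart rows to CSV using Python's standard csv module. The column names in FIELDS, the write_rows helper, and the sample values are all hypothetical placeholders of my own; the actual T-chart layout may differ, and the scraping itself is not shown because it depends on the structure of the Trakus pages.

```python
import csv
import io

# Hypothetical column names (an assumption, not the actual Trakus layout).
FIELDS = [
    "race_id", "horse", "sixteenth", "segment_time",
    "distance_run", "distance_from_rail", "average_velocity", "beaten_lengths",
]


def write_rows(rows, handle):
    """Write an iterable of dicts to an open text handle as CSV with a header row."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up placeholder values for illustration only, not real Trakus data.
sample_rows = [
    {"race_id": "R1", "horse": "A", "sixteenth": 1, "segment_time": 6.5,
     "distance_run": 330.0, "distance_from_rail": 4.0,
     "average_velocity": 34.5, "beaten_lengths": 0.0},
    {"race_id": "R1", "horse": "B", "sixteenth": 1, "segment_time": 6.7,
     "distance_run": 332.0, "distance_from_rail": 9.0,
     "average_velocity": 33.8, "beaten_lengths": 1.5},
]

# Demonstrate with an in-memory buffer so the sketch has no side effects.
buffer = io.StringIO()
write_rows(sample_rows, buffer)
```

To write an actual file, the buffer can be swapped for a handle opened with open(path, "w", newline=""), which is how the csv module expects output files to be opened.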

Tidy / Transform

  • Read the CSV data from the Import step and perform the necessary tidying operations
  • Transform the data into a shape that facilitates charts and other products
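
One transformation I expect to need is collapsing the per-sixteenth rows into a total distance run for each horse, and then expressing that total relative to the horse that covered the least ground. The sketch below illustrates the idea with the standard library only. The column names, the total_distance_by_horse helper, and the embedded values are hypothetical placeholders, not real Trakus data.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder data: one row per horse per sixteenth (hypothetical columns).
RAW_CSV = """horse,sixteenth,distance_run
A,1,330.0
A,2,331.5
B,1,332.0
B,2,334.5
"""


def total_distance_by_horse(handle):
    """Sum the distance_run column for each horse across all segments."""
    totals = defaultdict(float)
    for row in csv.DictReader(handle):
        totals[row["horse"]] += float(row["distance_run"])
    return dict(totals)


totals = total_distance_by_horse(io.StringIO(RAW_CSV))

# Extra ground covered relative to the horse with the shortest trip.
shortest = min(totals.values())
extra_distance = {horse: total - shortest for horse, total in totals.items()}
```

Working with extra distance rather than raw totals should make races of different lengths easier to compare, although that is a design choice I would revisit once real data is in hand.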

Visualize Data

  • Use the tidy data to visualize the race information with a view toward better understanding the role distance run plays in the outcome of horse races

Analyze / Model

  • Continue to analyze and/or model the data to solidify my findings
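
One possible first analysis, offered only as an assumption about where this step might start, is a Spearman rank correlation between each horse's extra distance run and its finish position within a race. The sketch below implements it with the standard library; the average_ranks and spearman helpers and the sample numbers are my own illustrative placeholders, and the made-up data says nothing about what real races will show.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input has no variation.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up placeholder values for a single four-horse race.
extra_distance = [0.0, 5.0, 12.5, 8.0]
finish_position = [1, 2, 4, 3]
rho = spearman(extra_distance, finish_position)
```

A rank-based measure seems a reasonable fit because finish position is ordinal, but a single race offers very few observations, so any real analysis would need to pool results across many races.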

Communicate

  • Use presentation-quality charts and tables to present conclusions

Potential Challenges

Potential challenges I could encounter during this project include:

  • Collecting enough data
  • Screen scraping the data
  • Arriving at meaningful findings
  • Creating compelling visuals that convey my findings