Project Overview
I love playing and watching golf, so I wanted to see whether players’ rankings, earnings, and performance statistics are correlated.
I intend to compare and analyze different metrics and statistics of the top players on the tour. The data was scraped from ESPN and combines the “Regular Statistics”, “Expanded I Statistics”, and “Expanded II Statistics” tables, joined by player name. The data is suitable because the tables include the key measurements used to assess how good a golfer is. An explanation of the variables is below.
| Column Name | Explanation |
|---|---|
| PLAYER | Player Name |
| AGE | Player Age |
| OVERALL RANKING | Player Ranking Based on Earnings |
| EVENTS | Number of Events Played |
| ROUNDS | Number of Rounds Played |
| CUTS MADE | Number of Cuts Made |
| TOP 10 | Number of Top Ten Finishes |
| WINS | Number of Wins |
| CUP POINTS | Total Cup Points* |
| EARNINGS | Total Earnings |
| SAVE PCT RANK | Ranking Based on Driving and Putting Statistics** |
| YDS/DRIVE | Average Yards per Drive |
| DRIVING ACC | Driving Accuracy |
| DRIVE TOTAL | Driving Distance and Accuracy Ranking |
| GREENS IN REG | Percentage of Greens in Regulation |
| PUTT AVG | Average Number of Putts |
| SAVE PCT | Percentage of Times Player Gets ‘Up and Down’ From a Bunker |
| SCORING AVG RANK | Ranking Based on Scoring Statistics** |
| EAGLES | Number of Eagles |
| BIRDIES | Number of Birdies |
| PARS | Number of Pars |
| BOGEYS | Number of Bogeys |
| BIRDIES/RD | Average Number of Birdies per Round |
| HOLES/EAGLE | Number of Holes Played per One Eagle |
*Tour members earn FedExCup points based on their finish at each tournament, with an emphasis placed on wins and high finishes. At the conclusion of the season, the top 125 players in the FedExCup standings are eligible to play in the FedExCup Playoffs. For more information, visit: https://www.pgatour.com/fedexcup/fedexcup-overview.html
**When players were tied in the rankings for these statistics, the cell on the website was blank. When R scraped the data, it filled those cells with a funky “A” symbol. If this were my final project, I would have cleaned the data and replaced those values. However, those columns are rankings derived from other statistics, so they were not necessary to complete my analysis.
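For what it's worth, the cleanup described above could look something like the sketch below. It is written in Python rather than R, and it assumes the stray symbol is "Â" (possibly followed by a non-breaking space), which is a common result of a blank cell being decoded with the wrong character encoding. The function name `clean_rank` and the sample values are hypothetical.

```python
# Assumption: blank (tied) cells arrive as "Â", sometimes with a trailing
# non-breaking space. Adjust this set if the scraped symbol differs.
PLACEHOLDERS = {"", "Â"}


def clean_rank(cell):
    """Return the rank as an int, or None when the cell is a tie placeholder."""
    text = cell.strip()  # str.strip() also removes non-breaking spaces
    if text in PLACEHOLDERS:
        return None
    return int(text)


# Hypothetical scraped column: two real ranks and two tie placeholders.
scraped = ["12", "Â\u00a0", "14", "\u00a0"]
cleaned = [clean_rank(cell) for cell in scraped]
print(cleaned)
```

Returning `None` rather than a guessed rank keeps the tie visible as a missing value, so it cannot quietly distort any later calculation.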
Summary Statistics
| PLAYER | CUP POINTS | OVERALL RANKING | TOP 10 | WINS | DRIVING ACC | GREENS IN REG | BIRDIES/RD |
|---|---|---|---|---|---|---|---|
| Bryson DeChambeau | 1577 | 2 | 5 | 2 | 57.7 | 66.7 | 4.500 |
| Justin Thomas | 1552 | 1 | 5 | 1 | 58.5 | 69.4 | 5.095 |
| Stewart Cink | 1348 | 11 | 3 | 2 | 58.7 | 73.6 | 4.360 |
| Xander Schauffele | 1335 | 3 | 6 | 0 | 59.4 | 69.7 | 4.690 |
| Patrick Cantlay | 1280 | 8 | 4 | 1 | 61.1 | 69.2 | 4.500 |
| Viktor Hovland | 1259 | 7 | 4 | 1 | 63.4 | 69.7 | 4.630 |
| Jordan Spieth | 1250 | 6 | 6 | 1 | 52.3 | 64.8 | 4.417 |
| Hideki Matsuyama | 1244 | 5 | 2 | 1 | 61.6 | 67.3 | 3.966 |
| Dustin Johnson | 1194 | 4 | 4 | 1 | 59.5 | 69.0 | 4.324 |
| Billy Horschel | 1100 | 9 | 4 | 1 | 66.2 | 67.9 | 3.769 |
| Tony Finau | 1085 | 13 | 6 | 0 | 55.1 | 67.4 | 4.438 |
| Jon Rahm | 1071 | 10 | 8 | 0 | 63.1 | 72.2 | 4.341 |
| Harris English | 1055 | 17 | 5 | 1 | 65.9 | 66.4 | 4.375 |
| Corey Conners | 993 | 15 | 7 | 0 | 70.0 | 71.8 | 4.274 |
| Joaquin Niemann | 992 | 24 | 3 | 0 | 60.7 | 71.2 | 4.558 |
| Cameron Smith | 981 | 12 | 5 | 0 | 58.0 | 66.4 | 4.460 |
| Daniel Berger | 978 | 22 | 4 | 1 | 64.3 | 69.9 | 4.523 |
| Patrick Reed | 973 | 19 | 4 | 1 | 64.7 | 64.2 | 4.750 |
| Collin Morikawa | 968 | 18 | 4 | 1 | 70.5 | 72.9 | 4.636 |
| Brooks Koepka | 960 | 20 | 4 | 1 | 54.7 | 67.5 | 4.375 |
| Max Homa | 955 | 23 | 3 | 1 | 57.2 | 64.7 | 4.220 |
| Matt Jones | 954 | 29 | 3 | 1 | 56.0 | 67.7 | 3.919 |
| Sungjae Im | 940 | 21 | 3 | 0 | 70.2 | 69.3 | 4.278 |
| Jason Kokrak | 933 | 14 | 4 | 1 | 61.7 | 69.4 | 4.320 |
| Si Woo Kim | 909 | 26 | 3 | 1 | 59.7 | 67.5 | 3.915 |
Above are the twenty-five players with the most cup points so far this season. Interestingly, the cup-point order does not match the earnings ranking, which is listed next to the cup points. Next are the number of top-ten finishes and wins. These should correlate with cup points, since cup points are awarded based on wins and high finishes. Driving accuracy and the percentage of greens hit in regulation are two statistics that indicate strong skill sets. However, the players with the most cup points are not the most accurate or precise, interestingly enough. The last variable I included in the table is the average number of birdies per round. These last three are variables I would expect to correlate with cup points and earnings … we shall see!
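As a quick sanity check on the table above, the sketch below computes a Pearson correlation between cup points and two of the other columns, using only the twenty-five rows shown. The helper `pearson` is my own small implementation, not something from the original analysis, and because the rows are restricted to the top of the standings, the result says little about the full field.

```python
from math import sqrt

# Values copied from the summary table, in the same row order.
cup_points = [1577, 1552, 1348, 1335, 1280, 1259, 1250, 1244, 1194, 1100,
              1085, 1071, 1055, 993, 992, 981, 978, 973, 968, 960,
              955, 954, 940, 933, 909]
overall_ranking = [2, 1, 11, 3, 8, 7, 6, 5, 4, 9,
                   13, 10, 17, 15, 24, 12, 22, 19, 18, 20,
                   23, 29, 21, 14, 26]
birdies_per_round = [4.500, 5.095, 4.360, 4.690, 4.500, 4.630, 4.417, 3.966,
                     4.324, 3.769, 4.438, 4.341, 4.375, 4.274, 4.558, 4.460,
                     4.523, 4.750, 4.636, 4.375, 4.220, 3.919, 4.278, 4.320,
                     3.915]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r_ranking = pearson(cup_points, overall_ranking)
r_birdies = pearson(cup_points, birdies_per_round)
print(f"cup points vs. overall ranking: {r_ranking:.2f}")
print(f"cup points vs. birdies/round:   {r_birdies:.2f}")
```

A negative value for the ranking column is expected, since a smaller ranking number means higher earnings.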
Driving Distance vs. Accuracy
There is a negative correlation between the two variables, indicating that players with high accuracy do not drive as far as those with low accuracy. Around fifty-seven percent accuracy, the slope becomes more negative, so the fitted line decreases more steeply. In other words, beyond that point each additional percentage point of accuracy is associated with a larger drop in the player’s average drive. However, many points fall outside the grey area, the confidence interval. That band describes the uncertainty in the fitted line rather than the spread of individual players, so points outside it are not necessarily outliers, but the wide scatter does show that the line captures the relationship only loosely.
Earnings Regression: Driving Accuracy, Sand Saves, and Greens in Regulation
Regression results (dependent variable: EARNINGS)

| Term | Coefficient | Std. Error |
|---|---|---|
| `DRIVING ACC` | -1,537.971 | 11,578.460 |
| `SAVE PCT` | 31,105.270*** | 8,849.917 |
| `GREENS IN REG` | 163,653.600*** | 22,333.060 |
| Constant | -11,186,187.000*** | 1,430,658.000 |

| Statistic | Value |
|---|---|
| Observations | 203 |
| R² | 0.274 |
| Adjusted R² | 0.263 |
| Residual Std. Error | 912,383.300 (df = 199) |
| F Statistic | 25.074*** (df = 3; 199) |

Note: *p<0.1; **p<0.05; ***p<0.01
I wanted to see whether players’ driving accuracy, save percentage, and percentage of greens hit in regulation are determinants of their earnings. Looking at these regression results, save percentage and greens in regulation are statistically significant, while driving accuracy is not. Greens in regulation has the larger coefficient: holding the other variables fixed, one additional percentage point is associated with about 163,654 more in earnings, versus about 31,105 for one additional percentage point of save percentage.
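To make the reading of the table a bit more concrete, here is a small sketch that recomputes the t-statistics (coefficient divided by standard error) and evaluates the fitted equation. The coefficients and standard errors are copied from the regression results above; the function names and the example inputs are hypothetical. With 199 degrees of freedom, an absolute t-statistic above roughly 2.6 corresponds to the p<0.01 level marked by three stars.

```python
# (estimate, standard error) pairs copied from the earnings regression table.
COEFS = {
    "DRIVING ACC": (-1537.971, 11578.460),
    "SAVE PCT": (31105.270, 8849.917),
    "GREENS IN REG": (163653.600, 22333.060),
}
INTERCEPT = -11186187.0


def t_statistic(estimate, std_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


def predict_earnings(driving_acc, save_pct, greens_in_reg):
    """Fitted earnings for the given percentages, using the table's coefficients."""
    return (INTERCEPT
            + COEFS["DRIVING ACC"][0] * driving_acc
            + COEFS["SAVE PCT"][0] * save_pct
            + COEFS["GREENS IN REG"][0] * greens_in_reg)


t_stats = {name: t_statistic(est, se) for name, (est, se) in COEFS.items()}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")

# Hypothetical inputs: the same player with one more point of greens in regulation.
gain = predict_earnings(60, 50, 71) - predict_earnings(60, 50, 70)
print(f"fitted gain from one extra GIR point: {gain:,.0f}")
```

Note that comparing raw coefficients only compares effects per percentage point; the two statistics may vary over different ranges across players.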
Cup Points Regression: Scoring
Regression results (dependent variable: `CUP POINTS`)

| Term | Coefficient | Std. Error |
|---|---|---|
| EAGLES | 28.649*** | 6.292 |
| BIRDIES | 8.885*** | 0.681 |
| PARS | -1.311*** | 0.300 |
| BOGEYS | -7.745*** | 0.941 |
| Constant | 316.100*** | 70.784 |

| Statistic | Value |
|---|---|
| Observations | 204 |
| R² | 0.654 |
| Adjusted R² | 0.647 |
| Residual Std. Error | 203.859 (df = 199) |
| F Statistic | 93.950*** (df = 4; 199) |

Note: *p<0.1; **p<0.05; ***p<0.01
Scoring (eagles, birdies, pars, and bogeys), as I assumed, has a strong, direct relationship to players’ standings, since all four variables are statistically significant. Eagles and birdies are associated with more cup points, while pars and bogeys are associated with fewer. Eagles are the rarest, so each one has the largest impact on cup points. Furthermore, about sixty-five percent of the variance in cup points (the dependent variable) is explained by the scoring variables (the independent variables), which is not too shabby!
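The summary rows of both regression tables can be cross-checked with the standard formulas for adjusted R² and the overall F statistic. The sketch below does that from the rounded R² values reported above, so the F values come out close to, but not exactly equal to, the reported ones. The function names are my own.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(r2, n_obs, n_predictors):
    """Overall F statistic for a linear regression with an intercept."""
    residual_df = n_obs - n_predictors - 1
    return (r2 / n_predictors) / ((1 - r2) / residual_df)


# Cup points model: R2 = 0.654, 204 observations, 4 predictors.
cup_adj = adjusted_r2(0.654, 204, 4)
cup_f = f_statistic(0.654, 204, 4)

# Earnings model: R2 = 0.274, 203 observations, 3 predictors.
earn_adj = adjusted_r2(0.274, 203, 3)
earn_f = f_statistic(0.274, 203, 3)

print(f"cup points: adjusted R2 = {cup_adj:.3f}, F = {cup_f:.1f}")
print(f"earnings:   adjusted R2 = {earn_adj:.3f}, F = {earn_f:.1f}")
```

Both adjusted R² values round to the figures in the tables (0.647 and 0.263), which suggests the reported observation counts and degrees of freedom are consistent with each other.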
Cup Points Distribution by Age Groups
I wanted to see how age factored into standings. Overall, it looks like the older the players, the fewer cup points they earn; granted, there are outliers. The median for players in their twenties is about 500 points, the highest of all age groups. The twenties group, though, also has the largest range (from 0 to above 1,500 points). For players in their thirties, the median is about 325 points, with a few outliers upwards of 1,000 points. Players in their forties are not far behind, with a median around 175 points. Again, there are outliers on the higher end. While the oldies, those 50+, do not have high cup points overall, they are the only age group whose minimum is above 0.
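For readers curious how the age groups behind a comparison like this might be formed, here is a minimal sketch that buckets players by decade and takes the median cup points per bucket. The grouping rule (decades, with everyone 50 and over pooled) is inferred from the description above, the function names are hypothetical, and the sample pairs are made up purely to exercise the code; they are not the ESPN data.

```python
from collections import defaultdict
from statistics import median


def age_group(age):
    """Label an age by decade, pooling everyone aged 50 and over."""
    if age >= 50:
        return "50+"
    return f"{(age // 10) * 10}s"


def median_by_age_group(players):
    """Map each age-group label to the median cup points of its players."""
    groups = defaultdict(list)
    for age, points in players:
        groups[age_group(age)].append(points)
    return {label: median(values) for label, values in groups.items()}


# Made-up (age, cup points) pairs for illustration only; NOT real player data.
sample = [(24, 600), (27, 400), (33, 300), (38, 350), (45, 175), (52, 90)]
medians = median_by_age_group(sample)
print(medians)
```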
Further Validation of Findings
Building a predictive model probably would not work all that well, because so many other factors go into how well someone plays over the few days of a tournament. However, pulling other statistics from different websites and comparing them to these findings would be interesting and would (hopefully) further validate them. If I were to scrape data from another site and expand my analysis, I would look at the PGA Tour’s website, which has thorough statistics including “Off the Tee”, “Approach the Green”, “Around the Green”, “Putting”, “Scoring”, “Streaks”, “Money/Finishes”, “Points/Rankings”, and more (https://www.pgatour.com/stats/stat.186.html).