Assignment 6: PGA Tour

Taylor Latona

4/24/2021

Project Overview

I love playing and watching golf, so I wanted to see whether players’ rankings, earnings, and other statistics are correlated.

I intend to compare and analyze different metrics and statistics for the top players on the Tour. The data was scraped from ESPN and combines the “Regular Statistics”, “Expanded I Statistics”, and “Expanded II Statistics” tables, joined by player name. The data is suitable because the tables include the key measurements that assess how good a golfer is. An explanation of the variables is below.

Column Name       Explanation
PLAYER            Player Name
AGE               Player Age
OVERALL RANKING   Player Ranking Based on Earnings
EVENTS            Number of Events Played
ROUNDS            Number of Rounds Played
CUTS MADE         Number of Cuts Made
TOP 10            Number of Top Ten Finishes
WINS              Number of Wins
CUP POINTS        Total Cup Points*
EARNINGS          Total Earnings
SAVE PCT RANK     Ranking Based on Driving and Putting Statistics**
YDS/DRIVE         Average Yards per Drive
DRIVING ACC       Driving Accuracy
DRIVE TOTAL       Driving Distance and Accuracy Ranking
GREENS IN REG     Percentage of Greens in Regulation
PUTT AVG          Average Number of Putts
SAVE PCT          Percentage of Times Player Gets ‘Up and Down’ From a Bunker
SCORING AVG RANK  Ranking Based on Scoring Statistics**
EAGLES            Number of Eagles
BIRDIES           Number of Birdies
PARS              Number of Pars
BOGEYS            Number of Bogeys
BIRDIES/RD        Average Number of Birdies per Round
HOLES/EAGLE       Number of Holes Played per One Eagle

*Tour members earn FedExCup points based on their finish at each tournament, with an emphasis on wins and high finishes. At the conclusion of the season, the top 125 players in the FedExCup standings are eligible to play in the FedExCup Playoffs. For more information, visit: https://www.pgatour.com/fedexcup/fedexcup-overview.html

**When players were tied in the rankings for these statistics, the cell on the website was blank. When R scraped the data, it inserted a funky “A” symbol in those cells. If this were my final project, I would have cleaned the data and replaced those symbols. However, those columns are rankings derived from other statistics, so they were not necessary to complete my analysis.
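The combine-and-clean step described above could be sketched roughly as follows. This is only an illustration in Python (the scraping for this report was done in R). The function names, the inner-join behaviour, the sample rows, and the exact placeholder character are all assumptions, not the code that produced the data.

```python
# Assumption: the stray symbol is the character "Â" (U+00C2); the report only
# describes it as a funky "A" symbol, so treat this value as a guess.
PLACEHOLDER = "\u00c2"


def merge_by_player(*tables):
    """Inner-join lists of row dicts on the PLAYER column (assumed join type)."""
    merged = {}
    for table in tables:
        for row in table:
            merged.setdefault(row["PLAYER"], {}).update(row)
    # Keep only players that appear in every table.
    in_all = set.intersection(*({row["PLAYER"] for row in table} for table in tables))
    return [merged[name] for name in sorted(in_all)]


def blank_to_none(rows, column, placeholder):
    """Replace a stray placeholder in one column with None (a missing value)."""
    return [
        {**row, column: None if row[column] == placeholder else row[column]}
        for row in rows
    ]


# Hypothetical sample rows; the values are illustrative, not scraped.
regular = [{"PLAYER": "Player A", "WINS": 1}, {"PLAYER": "Player B", "WINS": 0}]
expanded_1 = [
    {"PLAYER": "Player A", "SAVE PCT RANK": PLACEHOLDER},
    {"PLAYER": "Player B", "SAVE PCT RANK": "12"},
]
expanded_2 = [{"PLAYER": "Player A", "EAGLES": 3}, {"PLAYER": "Player B", "EAGLES": 5}]

players = merge_by_player(regular, expanded_1, expanded_2)
players = blank_to_none(players, "SAVE PCT RANK", PLACEHOLDER)
print(players)
```

Storing tied ranks as missing values instead of a stray character keeps the ranking columns usable if they are ever needed in a later analysis.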

Summary Statistics

PLAYER             CUP POINTS  OVERALL RANKING  TOP 10  WINS  DRIVING ACC  GREENS IN REG  BIRDIES/RD
Bryson DeChambeau        1577                2       5     2         57.7           66.7       4.500
Justin Thomas            1552                1       5     1         58.5           69.4       5.095
Stewart Cink             1348               11       3     2         58.7           73.6       4.360
Xander Schauffele        1335                3       6     0         59.4           69.7       4.690
Patrick Cantlay          1280                8       4     1         61.1           69.2       4.500
Viktor Hovland           1259                7       4     1         63.4           69.7       4.630
Jordan Spieth            1250                6       6     1         52.3           64.8       4.417
Hideki Matsuyama         1244                5       2     1         61.6           67.3       3.966
Dustin Johnson           1194                4       4     1         59.5           69.0       4.324
Billy Horschel           1100                9       4     1         66.2           67.9       3.769
Tony Finau               1085               13       6     0         55.1           67.4       4.438
Jon Rahm                 1071               10       8     0         63.1           72.2       4.341
Harris English           1055               17       5     1         65.9           66.4       4.375
Corey Conners             993               15       7     0         70.0           71.8       4.274
Joaquin Niemann           992               24       3     0         60.7           71.2       4.558
Cameron Smith             981               12       5     0         58.0           66.4       4.460
Daniel Berger             978               22       4     1         64.3           69.9       4.523
Patrick Reed              973               19       4     1         64.7           64.2       4.750
Collin Morikawa           968               18       4     1         70.5           72.9       4.636
Brooks Koepka             960               20       4     1         54.7           67.5       4.375
Max Homa                  955               23       3     1         57.2           64.7       4.220
Matt Jones                954               29       3     1         56.0           67.7       3.919
Sungjae Im                940               21       3     0         70.2           69.3       4.278
Jason Kokrak              933               14       4     1         61.7           69.4       4.320
Si Woo Kim                909               26       3     1         59.7           67.5       3.915

Above are the twenty-five players with the most Cup Points so far this season. Interestingly, the Cup Points standings do not match the earnings-based overall ranking, which is listed next to each player’s Cup Points. Next are the number of top-ten finishes and wins. These should correlate with Cup Points, since Cup Points are awarded based on wins and high finishes. Driving accuracy and the percentage of greens hit in regulation are two statistics that indicate a strong skill set. Interestingly enough, though, the players with the most Cup Points are not the most accurate drivers, nor do they hit the most greens. The last variable I included in the table is the average number of birdies per round. Driving accuracy, greens in regulation, and birdies per round are the three variables I would expect to correlate with Cup Points and earnings ... we shall see!
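A quick way to check these hunches on the table above is a Pearson correlation between Cup Points and each of the other columns. The sketch below is in Python for illustration only (the report itself was produced in R), and the `pearson` helper and variable names are hypothetical. The numbers are copied from the table, so the result covers only these twenty-five players and may differ from the correlation across the whole field.

```python
from math import sqrt

# Columns copied from the summary table above (top 25 players by Cup Points).
cup_points = [1577, 1552, 1348, 1335, 1280, 1259, 1250, 1244, 1194, 1100,
              1085, 1071, 1055, 993, 992, 981, 978, 973, 968, 960,
              955, 954, 940, 933, 909]
columns = {
    "TOP 10": [5, 5, 3, 6, 4, 4, 6, 2, 4, 4, 6, 8, 5, 7, 3, 5, 4, 4, 4, 4,
               3, 3, 3, 4, 3],
    "WINS": [2, 1, 2, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1,
             1, 1, 0, 1, 1],
    "DRIVING ACC": [57.7, 58.5, 58.7, 59.4, 61.1, 63.4, 52.3, 61.6, 59.5, 66.2,
                    55.1, 63.1, 65.9, 70.0, 60.7, 58.0, 64.3, 64.7, 70.5, 54.7,
                    57.2, 56.0, 70.2, 61.7, 59.7],
    "GREENS IN REG": [66.7, 69.4, 73.6, 69.7, 69.2, 69.7, 64.8, 67.3, 69.0, 67.9,
                      67.4, 72.2, 66.4, 71.8, 71.2, 66.4, 69.9, 64.2, 72.9, 67.5,
                      64.7, 67.7, 69.3, 69.4, 67.5],
    "BIRDIES/RD": [4.500, 5.095, 4.360, 4.690, 4.500, 4.630, 4.417, 3.966, 4.324,
                   3.769, 4.438, 4.341, 4.375, 4.274, 4.558, 4.460, 4.523, 4.750,
                   4.636, 4.375, 4.220, 3.919, 4.278, 4.320, 3.915],
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


correlations = {name: pearson(cup_points, col) for name, col in columns.items()}
for name, r in correlations.items():
    print(f"{name:14s} r = {r:+.2f}")
```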

Driving Distance vs. Accuracy

There is a negative correlation between the two variables, indicating that players with high accuracy do not drive as far as those with low accuracy. At around fifty-seven percent accuracy, the slope becomes more negative, so the fitted line decreases more steeply. Beyond that point, each one-percent increase in accuracy is associated with a larger drop in the player’s average drive. However, many points fall outside the grey area, the confidence interval. That band reflects uncertainty in the fitted line rather than the spread of individual players, so the wide scatter suggests that accuracy alone does not fully capture the relationship between the variables.

Earnings Regression: Driving Accuracy, Sand Saves, and Greens in Regulation

## 
## Regression Results
## ======================================================
##                            Dependent variable:        
##                     ----------------------------------
##                                  EARNINGS             
## ------------------------------------------------------
## `DRIVING ACC`            -1,537.971 (11,578.460)      
## `SAVE PCT`              31,105.270*** (8,849.917)     
## `GREENS IN REG`        163,653.600*** (22,333.060)    
## Constant            -11,186,187.000*** (1,430,658.000)
## ------------------------------------------------------
## Observations                       203                
## R2                                0.274               
## Adjusted R2                       0.263               
## Residual Std. Error       912,383.300 (df = 199)      
## F Statistic              25.074*** (df = 3; 199)      
## ======================================================
## Note:                      *p<0.1; **p<0.05; ***p<0.01

I wanted to see whether players’ driving accuracy, save percentage, and the percentage of greens they hit in regulation are determinants of their earnings. Looking at these regression results, save percentage and greens in regulation are statistically significant (p < 0.01), while driving accuracy is not. Greens in regulation has the greater impact: each additional percentage point is associated with about 163,654 more in earnings, compared with about 31,105 for save percentage. Overall, the model explains about twenty-seven percent of the variance in earnings.
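The significance stars in the table can be reproduced approximately by dividing each coefficient by its standard error (the number in parentheses). The sketch below is Python for illustration only, not the R code behind the table. The cutoff of 2.6 is an approximate two-sided 1% critical value for about 199 degrees of freedom, and the variable names are hypothetical.

```python
# (coefficient, standard error) pairs copied from the earnings regression table.
earnings_model = {
    "DRIVING ACC": (-1537.971, 11578.460),
    "SAVE PCT": (31105.270, 8849.917),
    "GREENS IN REG": (163653.600, 22333.060),
}

# Approximate two-sided 1% cutoff for a t distribution with ~199 degrees of freedom.
T_CRITICAL_1PCT = 2.6

# t statistic = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in earnings_model.items()}

for name, t in t_stats.items():
    flag = "significant at 1%" if abs(t) > T_CRITICAL_1PCT else "not significant at 1%"
    print(f"{name:14s} t = {t:6.2f}  ({flag})")
```

Driving accuracy has a t statistic close to zero, which is why it carries no stars in the table, while the other two predictors clear the cutoff comfortably.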

Cup Points Regression: Scoring

## 
## Regression Results
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                            `CUP POINTS`        
## -----------------------------------------------
## EAGLES                   28.649*** (6.292)     
## BIRDIES                  8.885*** (0.681)      
## PARS                     -1.311*** (0.300)     
## BOGEYS                   -7.745*** (0.941)     
## Constant                316.100*** (70.784)    
## -----------------------------------------------
## Observations                    204            
## R2                             0.654           
## Adjusted R2                    0.647           
## Residual Std. Error     203.859 (df = 199)     
## F Statistic           93.950*** (df = 4; 199)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

As I assumed, scoring (eagles, birdies, pars, and bogeys) has a strong, direct relationship to players’ standings, since all four variables are statistically significant. Eagles are the rarest, so each one has the largest impact on Cup Points (about 29 points per eagle, versus about 9 per birdie). Furthermore, about sixty-five percent of the variance in Cup Points (the dependent variable) is explained by the scoring variables (the independent variables), which is not too shabby!
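As a consistency check on the two regression tables, the adjusted R2 can be recomputed from the reported R2, the number of observations, and the number of predictors. This is a small Python sketch for illustration (the tables themselves came from R); the function name is hypothetical, and the formula is the standard adjustment 1 - (1 - R2)(n - 1)/(n - k - 1).

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Standard adjusted R-squared: penalizes R2 for the number of predictors."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Inputs copied from the two regression tables above.
cup_adj = adjusted_r2(0.654, n_obs=204, n_predictors=4)
earnings_adj = adjusted_r2(0.274, n_obs=203, n_predictors=3)

print(round(cup_adj, 3), round(earnings_adj, 3))  # 0.647 0.263
```

Both values match the adjusted R2 rows in the tables, so the small gap between R2 and adjusted R2 reflects only the modest number of predictors relative to roughly 200 players.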

Cup Points Distribution by Age Groups

I wanted to see how age factored into standings. Overall, it looks like the older the players, the fewer Cup Points they earn; granted, there are outliers. The median for players in their twenties is about 500 points, the highest of all age groups. The twenties, though, also have the largest range (from 0 to above 1,500 points). For players in their thirties, the median is about 325 points, with a few outliers upwards of 1,000 points. The players in their forties are not far behind, with a median of around 175 points. Again, there are outliers on the higher end. While the oldies, those 50+, do not have high Cup Points overall, they are the only age group whose minimum is above 0.
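The grouping behind a comparison like this could be sketched as below: bin each player by decade of age, then take the median Cup Points per bin. This is illustrative Python rather than the R code used for the plot. The bin edges, the `age_group` helper, and the sample (age, Cup Points) pairs are assumptions made up for the example, not values from the data.

```python
from collections import defaultdict
from statistics import median


def age_group(age):
    """Label an age by decade, with everyone 50 and over in one group (assumed bins)."""
    if age >= 50:
        return "50+"
    return f"{(age // 10) * 10}s"


# Hypothetical (age, cup points) pairs, invented for illustration only.
sample = [(24, 510), (27, 0), (29, 1500), (33, 300), (38, 1100),
          (44, 150), (47, 900), (51, 90)]

points_by_group = defaultdict(list)
for age, points in sample:
    points_by_group[age_group(age)].append(points)

medians = {group: median(values) for group, values in points_by_group.items()}
for group in sorted(medians):
    print(f"{group:4s} median = {medians[group]}")
```

Using the median rather than the mean keeps a handful of very high scorers in a group from pulling that group's summary upward.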

Further Validation of Findings

Building a predictive model probably would not work all that well, because so many other factors go into how well someone plays over the few days of a tournament. However, pulling other statistics from different websites and comparing them to these findings would be interesting and would (hopefully) further validate them. If I had to scrape data from another site and expand my analysis, I would look at the PGA Tour’s website, which has thorough statistics including “Off the Tee”, “Approach the Green”, “Around the Green”, “Putting”, “Scoring”, “Streaks”, “Money/Finishes”, “Points/Rankings”, and more (https://www.pgatour.com/stats/stat.186.html).