Golf Data Analysis
PGA Tour Players Analysis
With The Masters having wrapped up just a week ago and the weather getting nicer, my love for golf is back. While looking through different websites with golf statistics, ESPN stood out from the others because it lists each golfer's earnings by year, dating back to 2003. Because golf is not what it was in 2003, I decided to analyze only the last 10 years, going back to 2014.
The Question at Hand
With all of this data, I would like to see which stats have the most influence on a player's earnings.
Using earnings as the success metric, I will compare it against other variables to see what affects a player's earnings. It may be interesting to find out that winning the most doesn't equate to more money, or whether hitting the ball the farthest is beneficial. Other variables, such as wins, top tens, or cup points, could also serve as success metrics, but going into this I was most focused on earnings, and that is what attracted me to this data to begin with.
Data wrangling
Overall, this data was very clean and did not need much extra work. The one thing this data frame did need was a new variable denoting the year. Each web page that I scraped covered a different year, and once I combined them all, there was no way to tell the years apart. On top of that, many players appear on the list every year, so there would be no way to distinguish one player's seasons from each other either.
To do this, I included the code below within my function so that it loops and performs the same action for each year's webpage:
year_data <- data.frame(
  Year = year  # tag every row with the season being scraped; the other scraped columns are not shown here
)
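As a rough sketch of how that year tag fits into the loop, the snippet below stacks one data frame per season. `scrape_year()` is a hypothetical stand-in for the real scraping function (it returns placeholder rows rather than requesting any ESPN page), and the `2014:2023` range is an assumption based on the 10 seasons described above.

```r
# Hypothetical stand-in for the real scraper: returns placeholder rows
# for one season instead of requesting the ESPN page.
scrape_year <- function(year) {
  data.frame(name = c("Player A", "Player B"), stringsAsFactors = FALSE)
}

# Scrape each season, tag its rows with the year, then stack the results.
scrape_all_years <- function(years) {
  per_year <- lapply(years, function(year) {
    year_data <- scrape_year(year)
    year_data$Year <- year  # without this column the seasons are indistinguishable
    year_data
  })
  do.call(rbind, per_year)
}

combined <- scrape_all_years(2014:2023)  # assumed range of seasons
```

Because the same player can appear in several seasons, the pair of `name` and `Year` is what identifies a row after stacking.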
Another thing I had to do before performing any analysis was to convert all of my variables, other than name, to numeric.
scraped_data_multiple_years <- scraped_data_multiple_years %>%
  mutate(across(-name, as.numeric))
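One last thing I needed to do was get rid of the dollar sign and the commas in the earnings column. This cleanup has to run before the numeric conversion, because as.numeric() turns a string such as "$1,000" into NA.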
# Clean the "Earnings" column by removing every non-digit character ("$" and ","),
# then store the result as a number
scraped_data_multiple_years$Earnings <- as.numeric(gsub("\\D", "", scraped_data_multiple_years$Earnings))
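To make the order of the two cleanup steps concrete, here is a minimal sketch on placeholder rows. The column names `name`, `Earnings`, and `Wins` and the values are assumptions for illustration only, and the conversion is written in base R so it runs without extra packages.

```r
# Placeholder rows in the shape the scrape is assumed to produce:
# every column arrives as text, and Earnings carries "$" and ",".
raw <- data.frame(
  name     = c("Player A", "Player B"),
  Earnings = c("$1,234,567", "$89,000"),
  Wins     = c("2", "0"),
  stringsAsFactors = FALSE
)

# Step 1: strip every non-digit character from Earnings.
raw$Earnings <- gsub("\\D", "", raw$Earnings)

# Step 2: convert every column except name to numeric.
numeric_cols <- setdiff(names(raw), "name")
raw[numeric_cols] <- lapply(raw[numeric_cols], as.numeric)

# Converting first would have lost the values:
# as.numeric("$1,234,567") is NA (with a warning).
```

Note that `"\\D"` also removes decimal points, so this pattern only suits amounts recorded in whole dollars.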
Performing the Analysis
Importing the data
I have already gone ahead and scraped the data, then uploaded it to my OneDrive to host it in the cloud.
I can now import the data using the link that OneDrive gave me, as shown below:
scraped_data_multiple_years <- read.csv("https://myxavier-my.sharepoint.com/:x:/r/personal/mogensenj_xavier_edu/Documents/spring%2024/BAIS%20462/scraped_data_multiple_years.csv?d=w736c93b2f07644e49701b3b06ca5206e&csf=1&web=1&e=paapMH")
Visuals
Analysis 1
For my first visual, I would like to compare earnings to wins, as a sanity check that players who win more also earn more.
As you can see, for the most part, average earnings rise with the number of wins. Players with 4 wins show unusually high earnings, but other than that, this graph is as expected.
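The averages behind a chart like this can be computed with a small helper. This is only a sketch: the column names `Wins` and `Earnings` are assumptions about the scraped data, and the rows below are made up to show the shape of the result.

```r
# Average earnings for each win total.
# Assumes `data` has numeric Wins and Earnings columns (names are assumptions).
mean_earnings_by_wins <- function(data) {
  aggregate(Earnings ~ Wins, data = data, FUN = mean)
}

# Made-up rows, only to show the shape of the result.
toy <- data.frame(
  Wins     = c(0, 0, 1, 1, 2),
  Earnings = c(100, 300, 500, 700, 900)
)
by_wins <- mean_earnings_by_wins(toy)

# One way to draw it with base graphics:
# barplot(by_wins$Earnings, names.arg = by_wins$Wins,
#         xlab = "Wins", ylab = "Average earnings")
```

Running `table(data$Wins)` next to the averages is one way to check whether the spike at 4 wins rests on only a handful of player seasons.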
Analysis 2
Now I would like to see if hitting the ball farther really equates to more money.
The visual gives some support to the idea that more driving distance leads to more earnings, but the correlation does not appear to be strong at all. Because of this, I would not recommend that a pro golfer spend practice time chasing distance.
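A judgment like "not strong" can be backed with a number. The sketch below wraps base R's `cor.test()` to report the Pearson correlation and its p-value for any pair of columns. The column names `Distance` and `Earnings` and the values are placeholders, not figures from the scraped data.

```r
# Pearson correlation between two numeric columns of `data`, with its p-value.
# Column names are passed as strings because the real names are not shown here.
correlation_summary <- function(data, x, y) {
  result <- cor.test(data[[x]], data[[y]], method = "pearson")
  c(r = unname(result$estimate), p_value = result$p.value)
}

# Made-up values, only to show the call.
toy <- data.frame(
  Distance = c(285, 290, 295, 300, 305, 310),
  Earnings = c(1.2, 0.9, 2.1, 1.5, 2.4, 1.8)
)
distance_vs_earnings <- correlation_summary(toy, "Distance", "Earnings")
```

An `r` near 0 would match the weak relationship described above, while values near 1 or -1 would indicate a strong one. The same call works for any other pair of stats.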
Analysis 3
There is an old saying that goes “drive for show, putt for dough” meaning that putting is really where you make your difference on the golf course.
This visual shows that the players with the highest earnings average fewer putts. Once again, the correlation is not the strongest, but there is a visible negative relationship.
Analysis 4
This next visual is meant to show whether being more aggressive on the golf course leads to a lower score. To do this, I will compare birdies to score. The relationship may seem obvious, since more birdies lower a score, but a player who is constantly trying for birdies may also make more mistakes.
This visual gives us one of the better correlations that we have seen, showing that more birdies do go with a lower score. I would have thought that we would see some instances of a high number of birdies with a higher score, signifying a more aggressive player, but that is not the case.
Analysis 5
My last visual is meant to show whether FedEx Cup points equate to more earnings.
This is the visual with perhaps the weakest correlation: we see some high earners with few cup points and also some low earners with many cup points. If I were to give any advice to a pro golfer, it would be not to chase cup points, as they do not always equate to more earnings.
Summary
Looking back at the results, it is unfortunate that many of the visuals showed little to no correlation. If I had to guess, the reason is that earnings depend mostly on performance at the bigger events. Sure, a player can have great stats, but the way he makes his money is by performing well at the majors.
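One way to put the five comparisons side by side would be to rank every numeric stat by its correlation with earnings. The helper below is a sketch under the assumption that the cleaned data frame has a numeric `Earnings` column. The other column names and all values in the example are placeholders, not results from the scraped data.

```r
# Correlation of every numeric column with Earnings, strongest first.
# Assumes a numeric Earnings column; non-numeric columns such as name are skipped.
rank_by_earnings_correlation <- function(data) {
  numeric_data <- data[sapply(data, is.numeric)]
  stats <- setdiff(names(numeric_data), "Earnings")
  r <- sapply(stats, function(col) {
    cor(numeric_data[[col]], numeric_data$Earnings, use = "complete.obs")
  })
  r[order(abs(r), decreasing = TRUE)]
}

# Made-up rows, only to show the call.
toy <- data.frame(
  name     = c("A", "B", "C", "D", "E"),
  Earnings = c(1, 2, 3, 4, 5),
  Wins     = c(0, 1, 1, 2, 3),
  Putts    = c(30, 29.5, 29.8, 29, 28.5),
  stringsAsFactors = FALSE
)
ranking <- rank_by_earnings_correlation(toy)
```

Sorting by absolute value keeps strong negative relationships, such as fewer putts going with higher earnings, near the top of the list instead of at the bottom.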