I am a big sports guy and always have been. I grew up playing golf and soccer, and I still play both to this day. Golf first introduced me to sports analytics, and I've been fascinated with the field for a long time. I still play golf competitively and keep stats for every round I play, which I use to analyze my game and understand where I can improve. Applying an analytics-based approach to my own golf game has transformed it over the past few years.
That approach has shown me where my game needed to grow and improve. Golf analytics as a field has also gained traction and keeps growing: statistics tracked at PGA Tour events are now part of the everyday vocabulary of golfers around the world. With this in mind, I like to keep an eye on professional statistics and compare them with my own, both to understand just how good the pros are and to see where I can improve.
One of the PGA Tour's stops in Mexico used to be a WGC event, an invitational for the best players in the world, named the WGC Mexico Championship. This season, however, ushered in a new era: starting this year, WGC events shrank to only two per season, and the Mexico Open has taken that event's place as the Tour's new stop in Mexico. The Mexico Open at Vidanta was held in Puerto Vallarta, Mexico, from April 28 to May 1, 2022.
In this edition of the new Mexico Open, Jon Rahm captured the title with a score of -17 over 72 holes. Conditions were generally good, although a steady, significant wind throughout the week made the course play harder than normal. The purpose of this analysis is to look at some key stats and see whether any of them were strong indicators of a player's performance at the Mexico Open. A few key stats might also hint at who will play well in future tournaments with similar course setups.
This dataset combines a few sources. Most of it is scraped directly from ESPN's page for the tournament.
The link to that ESPN page is here: https://www.espn.com/golf/leaderboard?tournamentId=401353229
The leaderboard and key stats are taken from a couple of tabs on that ESPN page for the Mexico Open. Some other stats came from assorted sources on the Internet.
Below is an in-depth data dictionary describing the variables in the dataset:
| Variable | Description |
|---|---|
| POS | Finishing position at the end of the tournament |
| PLAYER | Name of the player |
| SCORE | Score relative to par for the tournament |
| R1 | Score each player shot in Round 1 (Thursday) |
| R2 | Score each player shot in Round 2 (Friday) |
| R3 | Score each player shot in Round 3 (Saturday) |
| R4 | Score each player shot in Round 4 (Sunday) |
| TOT | Total strokes for the tournament (R1 + R2 + R3 + R4) |
| EARNINGS | Prize money each player won (in $), based on finishing position |
| FEDEX.PTS | FedEx Cup points each player won, based on finishing position |
| YDS/DRV | Average yards per drive |
| DRV ACC | Driving accuracy (percentage of fairways hit) |
| GIR | Greens in regulation (percentage of greens hit in regulation) |
| PP GIR | Putts per GIR (average number of putts on holes where the green was hit in regulation) |
Before getting into the analysis, I've included an interactive table with the full final leaderboard and the key stats described in the data dictionary above for the Mexico Open.
This part of the analysis looks at Jon Rahm's standing throughout the tournament by comparing each of his rounds against the field average.
Jon Rahm got off to a very hot start. His 64 in Round 1 put him in contention early; he and five others shot 64 on day 1. These six players beat the field average by more than 4 shots on day 1.
However, Rahm was the only one of them to follow up day 1 with a similarly low score, and that is where he separated himself from the field. The table above shows the 10 lowest scores from R1, how those players fared in R2, and where they finished at the end of the tournament. Of the six players who shot 64 on day 1, Rahm was one of only two to shoot in the 60s again in R2.
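For readers who want to rebuild a table like this one, here is a minimal Python sketch. It assumes the leaderboard has been loaded as a list of rows keyed by the data-dictionary column names (`PLAYER`, `R1`, `R2`, `POS`); the helper names and every score below are hypothetical placeholders, not the real Mexico Open results.

```python
# Hypothetical rows mirroring the data-dictionary columns; NOT real tournament data.
players = [
    {"PLAYER": "Player A", "R1": 64, "R2": 66, "POS": "1"},
    {"PLAYER": "Player B", "R1": 64, "R2": 69, "POS": "T5"},
    {"PLAYER": "Player C", "R1": 64, "R2": 72, "POS": "T20"},
    {"PLAYER": "Player D", "R1": 65, "R2": 70, "POS": "T9"},
    {"PLAYER": "Player E", "R1": 68, "R2": 67, "POS": "T2"},
    {"PLAYER": "Player F", "R1": 71, "R2": 74, "POS": "CUT"},
]


def lowest_r1(rows, n=10):
    """Return the n lowest Round 1 scores; tied rows keep their original order (stable sort)."""
    return sorted(rows, key=lambda row: row["R1"])[:n]


def followed_up_in_60s(rows, r1_score):
    """Players who shot `r1_score` in R1 and then broke 70 again in R2."""
    return [row["PLAYER"] for row in rows if row["R1"] == r1_score and row["R2"] < 70]


for row in lowest_r1(players, n=5):
    print(f'{row["PLAYER"]:<10} R1={row["R1"]} R2={row["R2"]} finish={row["POS"]}')

print(followed_up_in_60s(players, 64))  # ['Player A', 'Player B']
```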
Rahm again beat the field average comfortably: his 66 was a little over 3 shots better than the field average. He shot close to the field average in R3 and R4, so his hot start is really what separated him from the field.
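Comparing a round to the field average is just the field's mean score for that round minus the player's score. The Python sketch below uses hypothetical helper names and invented scores, not the real leaderboard. It also skips players who did not play a round, since players who missed the cut have no R3 or R4 score and should not affect those averages.

```python
from statistics import mean

# Hypothetical scores per player: [R1, R2, R3, R4]; None marks a round not played
# (e.g. after missing the cut). Invented numbers, NOT the real Mexico Open results.
rounds = {
    "Player A": [64, 66, 68, 69],
    "Player B": [64, 71, 67, 68],
    "Player C": [70, 69, 70, 71],
    "Player D": [72, 70, 69, 73],
    "Player E": [71, 73, 72, 70],
    "Player F": [73, 75, None, None],
}


def field_averages(rounds):
    """Average score of everyone who played each round."""
    n_rounds = len(next(iter(rounds.values())))
    return [
        mean(s[r] for s in rounds.values() if s[r] is not None)
        for r in range(n_rounds)
    ]


def strokes_vs_field(player, rounds):
    """Strokes better (positive) or worse (negative) than the field average, per round."""
    return [
        None if score is None else round(avg - score, 2)
        for avg, score in zip(field_averages(rounds), rounds[player])
    ]


print(field_averages(rounds))
print(strokes_vs_field("Player A", rounds))  # [5.0, 4.67, 1.2, 1.2]
```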
Now let's get into some key statistics and see how Rahm compared to the field, and which stats stand out as potential reasons he won the Mexico Open.
We start with GIR analysis.
Jon Rahm's approach game (represented by GIR) was solid: his GIR percentage was well above the field average and near the top of the field. This is very likely one reason he won, since he gave himself far more birdie (or better) putts than the average player in the field.
The table above backs up Rahm's solid approach game. He had the 7th-highest GIR percentage in the field, hitting just over 76% of greens in regulation (55 of 72), which is huge. That means he had a birdie (or better) putt on 55 of the 72 holes he played. Even when those were not great looks, each one meant he was very likely to walk away with at least par. Tony Finau led the field, hitting a little over 83% of greens, but he likely struggled with the putter.
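The GIR arithmetic is worth spelling out: over 72 holes, each green hit is worth 100/72 ≈ 1.39 percentage points, so a tournament GIR percentage always maps back to a whole number of greens. A small Python sketch, with hypothetical helper names, assuming exactly 72 holes were played:

```python
HOLES = 72  # assumed: 4 rounds x 18 holes


def gir_percentage(greens_hit: int, holes: int = HOLES) -> float:
    """Share of holes where the green was reached in regulation, as a percentage."""
    return 100 * greens_hit / holes


def greens_from_percentage(pct: float, holes: int = HOLES) -> int:
    """Recover the whole number of greens hit from a rounded GIR percentage."""
    return round(pct / 100 * holes)


print(round(gir_percentage(55), 2))  # 76.39
print(greens_from_percentage(83.33))  # 60
```

Going the other way is handy when a table only shows a rounded percentage: 83.33% over 72 holes corresponds to 60 greens.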
The table above shows the top 10 players in driving distance, which also looks like a clear factor in Rahm's success. He was a monster off the tee, averaging a ridiculous 340.4 yards per drive. That likely left him plenty of short irons and wedges into greens, helping him hit so many greens in regulation and set up good looks at birdie or better.
It is worth noting that the ball may naturally travel farther in that part of Mexico. Local conditions such as air moisture, altitude, and atmospheric pressure all affect how far the ball flies, but the science behind this is beyond the scope of this analysis.
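To make the "short irons into greens" reasoning concrete, here is a rough Python sketch of the yardage left after an average drive. The hole lengths and the 300-yard comparison drive are hypothetical, and the sketch treats every hole as a straight line played with driver, which real courses and real course management do not.

```python
# Hypothetical par-4 yardages; NOT the actual Vidanta Vallarta scorecard.
hole_lengths = [410, 445, 470, 500]


def approach_left(hole_length: float, drive: float) -> float:
    """Yards remaining to the green after a drive (straight-line simplification)."""
    return max(hole_length - drive, 0.0)


for length in hole_lengths:
    rahm = approach_left(length, 340.4)     # Rahm's reported average drive
    typical = approach_left(length, 300.0)  # hypothetical shorter hitter
    print(f"{length} yds: {rahm:.1f} left vs {typical:.1f} left")
```

On these made-up holes, a 40-yard edge off the tee turns 110 to 200-yard approaches into 70 to 160-yard approaches, which is the short-iron and wedge advantage described above.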
Driving accuracy was not Rahm's strong suit, and it did not seem to be a meaningful indicator of who performed well. The table above shows the top 20 players in driving accuracy; Rahm ranked 15th.
Brandon Wu, who made a late charge on Sunday and finished T2, ranked 10th in driving accuracy. This suggests the course was probably fairly open: missing the fairway generally did not put anyone in serious trouble, and players could still scramble for par or hit the green and have a look at birdie.
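Ranks like "15th" or "T2" in these tables follow the usual sports convention: tied players share a rank, and the next rank is skipped. Here is a minimal Python sketch of that convention (assumed here, not confirmed as ESPN's exact method), with a flag for stats like PP GIR where lower is better. The helper name and the accuracy percentages are invented.

```python
def competition_rank(values, target, higher_is_better=True):
    """Standard competition ("1224") rank: 1 + the number of strictly better values."""
    if higher_is_better:
        better = sum(1 for v in values if v > target)
    else:
        better = sum(1 for v in values if v < target)
    return 1 + better


# Hypothetical driving-accuracy percentages, NOT the real field data.
drv_acc = [78.6, 75.0, 75.0, 73.2, 71.4, 69.6]
print(competition_rank(drv_acc, 75.0))  # 2 (shared with the other 75.0, so "T2")
print(competition_rank(drv_acc, 71.4))  # 5

# For putts per GIR, fewer putts is better.
print(competition_rank([1.60, 1.70, 1.73], 1.73, higher_is_better=False))  # 3
```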
Putts per GIR did not appear to have much impact on tournament performance. The table above lists the top 20 players in putts per GIR, and Rahm was not even close to that group. His PP GIR was 1.73, which is not great: averaging 1.73 putts every time he hit a green in regulation means he had plenty of birdie looks but did not convert many of them. He probably avoided anything worse than bogey (so he never dropped a significant number of shots at once) and likely holed out from off the green a few times.
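PP GIR only counts putts on holes where the green was hit in regulation. The Python sketch below computes it from hypothetical hole-by-hole data. The `(hit_gir, putts)` format and the helper name are assumptions for illustration. It then works backwards from Rahm's reported 1.73 PP GIR and 55 greens hit.

```python
def putts_per_gir(holes):
    """Average putts on holes where the green was hit in regulation.

    `holes` is a list of (hit_gir, putts) pairs, one per hole played.
    """
    gir_putts = [putts for hit_gir, putts in holes if hit_gir]
    if not gir_putts:
        raise ValueError("no greens hit in regulation")
    return sum(gir_putts) / len(gir_putts)


# Hypothetical 6-hole sample, NOT real round data.
sample = [(True, 2), (True, 1), (False, 1), (True, 2), (True, 2), (False, 2)]
print(putts_per_gir(sample))  # 1.75 (7 putts on 4 greens in regulation)

# Working backwards from the tournament numbers: 1.73 putts on each of 55 GIR holes.
print(round(1.73 * 55))  # 95
```

At roughly 95 putts on 55 greens, only a small minority of those greens-in-regulation ended with a one-putt, which fits the "birdie looks, not many makes" reading above.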
As mentioned earlier, this was the first Mexico Open held as a PGA Tour event. Because it is no longer a WGC event, the field was not as strong as it normally is for the Tour's event in Mexico. To gauge the impact of losing WGC status (and becoming just a normal stop on Tour), I pulled Twitter data to analyze engagement with tweets from the official PGA Tour account (@PGATour) during the tournament period in the last week of April.
This data was scraped directly from Twitter using the rtweet R package. The pull contains the 400 most recent tweets as of May 6, 2022, which covers roughly the past month of tweets from the PGA Tour account.
The table below shows the average likes (favorites) and retweets per tweet for each day in the dataset. It includes tweets from before, during, and after the Mexico Open. The data from before and after was intentionally kept to compare engagement during the Mexico Open with the events before it (the Zurich Classic and the RBC Heritage) and the event after it (the Wells Fargo Championship, which is currently ongoing).
Looking at the average favorites and retweets over roughly a month, there is a stark difference in engagement during the Mexico Open week compared with the other three weeks and tournaments mentioned.
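The daily averages in that table amount to a group-by on the tweet date. The actual pull used rtweet in R; the Python sketch below only illustrates the aggregation. It uses invented tweet counts and assumes each tweet has already been reduced to a calendar date. In practice, the `created_at` timestamps would first need converting to a date in a chosen time zone.

```python
from collections import defaultdict
from datetime import date

# Hypothetical tweets as (date, favorite_count, retweet_count). The field names echo
# rtweet's columns, but these numbers are invented, NOT real @PGATour data.
tweets = [
    (date(2022, 4, 17), 1800, 210),
    (date(2022, 4, 17), 2600, 330),
    (date(2022, 4, 29), 500, 40),
    (date(2022, 4, 29), 700, 60),
]


def daily_engagement(tweets):
    """Average favorites and retweets per tweet for each calendar day, in date order."""
    totals = defaultdict(lambda: [0, 0, 0])  # favorites, retweets, tweet count
    for day, favs, rts in tweets:
        t = totals[day]
        t[0] += favs
        t[1] += rts
        t[2] += 1
    return {day: (favs / n, rts / n) for day, (favs, rts, n) in sorted(totals.items())}


for day, (avg_fav, avg_rt) in daily_engagement(tweets).items():
    print(day, f"{avg_fav:.0f} favorites, {avg_rt:.0f} retweets")
```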
During the Mexico Open, the highest daily averages were about 617 favorites and 55 retweets. These averages are relatively low, especially compared to days like April 17, the Sunday of the RBC Heritage, when Jordan Spieth beat Patrick Cantlay in a playoff for his first Tour win in a while. Neither Spieth nor Cantlay played in the Mexico Open, and one of the few "big names" in golf right now who did play ended up winning it: Jon Rahm.
The relatively weak field compared to the weeks immediately before and after, combined with the loss of WGC status, likely had a negative impact on the tournament's popularity. Plenty of people I know personally did not take this tournament as seriously or care about it as much. It seems to have stepped down from an important tournament to something like the Corales Championship, which lacks the allure of a major or The Players: still a big deal because it is a Tour event, but without the strength of field of a big tournament.
The following analysis takes an in-depth look at the stats in this dataset (YDS/DRV, DRV ACC, GIR, and PP GIR) to see whether any of them stand out as overwhelmingly important contributors to players' performance in this tournament.
A simple linear regression is run for each stat against each player's total score (TOT) to determine whether there are meaningful relationships between final score and any of the stats captured in this dataset. As touched on in the Descriptive Analysis, we should expect some correlation between scores and driving distance (Rahm was 1st in this category) and GIR (Rahm was 7th, and other top finishers also performed very well in this category).
```
## Call:
## lm(formula = TOT ~ GIR, data = mexico_open)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -8.2745 -2.8775 -0.3336  2.5694  9.7492
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 309.31876    5.55212  55.712  < 2e-16 ***
## GIR          -0.47055    0.08014  -5.872 1.22e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.865 on 72 degrees of freedom
##   (70 observations deleted due to missingness)
## Multiple R-squared:  0.3238,  Adjusted R-squared:  0.3144
## F-statistic: 34.48 on 1 and 72 DF,  p-value: 1.223e-07
```

```
## Call:
## lm(formula = TOT ~ `DRV ACC`, data = mexico_open)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.2307 -2.4772  0.3405  2.6408 12.8368
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 285.49284    3.55012  80.418   <2e-16 ***
## `DRV ACC`    -0.13863    0.05615  -2.469   0.0159 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.513 on 72 degrees of freedom
##   (70 observations deleted due to missingness)
## Multiple R-squared:  0.07805,  Adjusted R-squared:  0.06524
## F-statistic: 6.095 on 1 and 72 DF,  p-value: 0.01593
```

```
## Call:
## lm(formula = TOT ~ `YDS/DRV`, data = mexico_open)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.0388 -2.8533 -0.3937  2.4692 11.1870
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 327.36597   12.97212  25.236  < 2e-16 ***
## `YDS/DRV`    -0.16130    0.04137  -3.899 0.000215 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.271 on 72 degrees of freedom
##   (70 observations deleted due to missingness)
## Multiple R-squared:  0.1743,  Adjusted R-squared:  0.1629
## F-statistic: 15.2 on 1 and 72 DF,  p-value: 0.0002146
```
Each stat gets its own simple regression, with a summary table of the key regression statistics (such as R²). Each model is fit on 74 players; the 70 rows dropped due to missingness are presumably players without a complete set of stats, such as those who missed the cut. Looking over the summary tables and visualizations, no single variable stands out as strongly explaining total score.
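For readers who want to see what each simple regression computes, here is a minimal ordinary-least-squares sketch in Python. It is not the R code used for the output above. It mirrors `lm(TOT ~ x)` with one predictor and drops incomplete rows, much as `lm()` does by default. The helper name and the numbers are invented.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (intercept, slope, r_squared, n). Pairs with a missing value (None)
    are dropped first, similar in spirit to lm()'s default handling of NA rows.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if xi is not None and yi is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 complete observations")
    x_bar = sum(xi for xi, _ in pairs) / n
    y_bar = sum(yi for _, yi in pairs) / n
    sxx = sum((xi - x_bar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in pairs)
    ss_tot = sum((yi - y_bar) ** 2 for _, yi in pairs)
    return intercept, slope, 1 - ss_res / ss_tot, n


# Hypothetical GIR percentages and total scores (None = no total, e.g. missed cut).
gir = [60, 70, 80, None]
tot = [285, 280, 272, None]
a, b, r2, n = simple_ols(gir, tot)
print(round(a, 2), round(b, 2), round(r2, 4), n)  # 324.5 -0.65 0.9826 3
```

A negative slope, as in all three models above, means a higher stat value goes with a lower (better) total score.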
What we can say is that driving accuracy and putts per GIR explained very little about performance: driving accuracy's adjusted R² is only about 0.07, even though its p-value (0.016) is below 0.05. GIR and driving distance have clearer relationships with total score; both are statistically significant (p < 0.001), but each still explains only a modest share of the variation in scores. GIR has by far the highest adjusted R², at about 0.31 (driving distance is about 0.16). This does not indicate a strong relationship, but it is still the strongest of the group.
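The adjusted R² values can be checked by hand from the regression output: each model has 72 residual degrees of freedom and one predictor, so it was fit on n = 74 players. A short Python sketch (the helper name is hypothetical) reproduces the reported values up to rounding of the displayed R², and turns the GIR slope into a more readable effect size.

```python
def adjusted_r_squared(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (excluding the intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# From the lm() output: 72 residual df with 1 predictor -> n = 72 + 1 + 1 = 74.
N, P = 74, 1
for name, r2 in [("GIR", 0.3238), ("YDS/DRV", 0.1743), ("DRV ACC", 0.07805)]:
    print(name, round(adjusted_r_squared(r2, N, P), 4))

# GIR slope from the same output: each extra percentage point of GIR is associated
# with about 0.47 fewer strokes over the tournament (an association, not a cause).
GIR_SLOPE = -0.47055
print(round(10 * GIR_SLOPE, 1))  # -4.7 strokes for +10 points of GIR
```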
In conclusion, no single traditional statistic, not even GIR, explains Jon Rahm's win at the Mexico Open on its own. Still, he ranked 7th in GIR and 1st in driving distance, which certainly helped him score so well early in the tournament, when he built a solid lead by beating the field average by a wide margin in R1 and R2.