Project Goal : Fans of the PGA Tour and golf in general know that driving distance has increased significantly over the past few decades.
One of my friends, David and I were discussing this development and we both commented that everybody seems to be a long hitter these days and we had a few questions that we hope the statistics can answer:
What kind of advantage do the longest hitters have when it comes to the really important measures like winning and making money?
Is there more distance parity now than there was in past decades?
Are the shorter hitters at a greater disadvantage now then they used to be?
Note : This article, like most of mine on R-Pubs, is both instructional and investigative. You can re-create any of these charts and tables from the embedded R code. Click the “Code” button to view.
# Load libraries, functions and clear environment
rm(list=ls())
options(scipen=999)
suppressPackageStartupMessages({
require(tidyverse)
require(lubridate)
require(scales)
require(kableExtra)
require(tinytex)
})
# Create a function to return zero for null and blank values
zna <- function(x) {
x <- ifelse(is.null(x), 0,
ifelse(is.na(x), 0,
ifelse(x == "", 0, x)))
return(x)
}
RJETBlue = "#003365"
PrettyTable = function(TableObject, TableTitle) {
TableObject %>%
kable("html", escape = FALSE,
caption = paste0('<p style="color:black; font-size:18px">',
TableTitle,
'</p>')) %>%
kable_styling("striped",
bootstrap_options = c("hover", "condensed"),
full_width = T) %>%
row_spec(0, color = "white", background = RJETBlue)
}
#
# Load Data from Github
# (Another program was used to scrape the data into these CSV files)
#
DrivingDistance = read_csv("https://raw.githubusercontent.com/ChrisAtSavvy/SavvyDemoProjects/main/R/PGAStats_DrivingDistance.csv", col_types = "nnccnnn")
DrivingDistance = DrivingDistance %>%
filter(!str_detect(Player, "tour"))
DrivingAccuracy = read_csv("https://raw.githubusercontent.com/ChrisAtSavvy/SavvyDemoProjects/main/R/PGAStats_DrivingAccuracy.csv", col_types = "nnccnnn")
DrivingAccuracy = DrivingAccuracy %>%
filter(!str_detect(Player, "tour"))
GreensInRegulation = read_csv("https://raw.githubusercontent.com/ChrisAtSavvy/SavvyDemoProjects/main/R/PGAStats_GreensInRegulation.csv", col_types = "nnccnnn")
GreensInRegulation = GreensInRegulation %>%
filter(!str_detect(Player, "tour"))
PuttsPerRound = read_csv("https://raw.githubusercontent.com/ChrisAtSavvy/SavvyDemoProjects/main/R/PGAStats_PuttsPerRound.csv", col_types = "nnccnnn")
TournamentResults = read_csv("https://raw.githubusercontent.com/ChrisAtSavvy/SavvyDemoProjects/main/R/PGAStats_TournamentResults.csv", col_types = "nncnccn")
TournamentResults = TournamentResults %>%
mutate(PrizeMoney = if_else(!is.na(PrizeMoney), PrizeMoney, 0))
AGGDistanceYear = DrivingDistance %>%
group_by(Year) %>%
summarize(MinDistance = min(AvgDist),
Q1Distance = quantile(AvgDist, 0.25),
MedDistance = median(AvgDist),
MeanDistance = mean(AvgDist),
Q3Distance = quantile(AvgDist, 0.75),
MaxDistance = max(AvgDist),
.groups = "drop") %>%
mutate(MinDistance_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
MinDistance, NA),
MeanDistance_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
MeanDistance, NA),
MaxDistance_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
MaxDistance, NA),
MinVsMean = (MinDistance / MeanDistance) - 1,
MaxVsMean = (MaxDistance / MeanDistance) - 1,
MinVsMean_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
MinVsMean, NA),
MaxVsMean_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
MaxVsMean, NA)
)
# Aggregate Player Tournament Results by Year
AGGPlayerYear = TournamentResults %>%
rename(Year = EventYear) %>%
group_by(Year) %>%
mutate(YearlyPrizeMoney = sum(PrizeMoney)) %>%
ungroup() %>%
mutate(Wins = if_else(Order == 1, 1, 0),
Top10 = if_else(Order <= 10, 1, 0)) %>%
group_by(Year, Player) %>%
summarize(Wins = sum(Wins),
Top10 = sum(Top10),
Cuts = n(),
PrizeMoney = sum(PrizeMoney),
.groups = "drop")
# Add Stats
AGGPlayerYear = AGGPlayerYear %>%
inner_join(DrivingDistance %>%
rename(DistanceRank = Rank) %>%
select(Year, Player, DistanceRank, AvgDist),
by = c("Year" = "Year", "Player" = "Player"),
multiple = "all") %>%
inner_join(DrivingAccuracy %>%
rename(DriveAccuracyRank = Rank) %>%
select(Year, Player, DriveAccuracyRank, FairwayRate),
by = c("Year" = "Year", "Player" = "Player"),
multiple = "all") %>%
inner_join(GreensInRegulation %>%
rename(GreensAccuracyRank = Rank) %>%
select(Year, Player, GreensAccuracyRank, GreensRate),
by = c("Year" = "Year", "Player" = "Player"),
multiple = "all") %>%
inner_join(PuttsPerRound %>%
rename(PuttsRank = Rank) %>%
select(Year, Player, PuttsRank, AvgPutts),
by = c("Year" = "Year", "Player" = "Player"),
multiple = "all") %>%
group_by(Year) %>%
mutate(YearlyPrizeMoney = sum(PrizeMoney),
DrivingPercentile = 1 - rescale(rank(DistanceRank + DriveAccuracyRank), to = c(0, 1)),
DistancePercentile = 1 - rescale(rank(DistanceRank), to = c(0, 1)),
GreensPercentile = 1 - rescale(GreensAccuracyRank, to = c(0, 1)),
PuttsPercentile = 1 - rescale(PuttsRank, to = c(0, 1)),
MoneyPercentile = 1- rescale(rank(PrizeMoney), to = c(1, 0))) %>%
group_by(Year, Player) %>%
mutate(PercentOfMoney = PrizeMoney / YearlyPrizeMoney) %>%
ungroup() %>%
arrange(Year, desc(MoneyPercentile))
Let’s look at the average driving distance of the longest, average and shortest players on the tour since 1987.
AGGDistanceYear %>%
ggplot() +
geom_line(aes(Year, MaxDistance, color = "Longest Player"), linewidth = 1) +
geom_line(aes(Year, MedDistance, color = "Average Player"), linewidth = 1) +
geom_line(aes(Year, MinDistance, color = "Shortest Player"), linewidth = 1) +
geom_smooth(aes(Year, MaxDistance, color = "Longest Player"),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_smooth(aes(Year, MedDistance, color = "Average Player"),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_smooth(aes(Year, MinDistance, color = "Shortest Player"),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_label(aes(Year, MaxDistance, label = comma(MaxDistance_Label, accuracy = 0.1),
color = "Longest Player"), fill = "white", na.rm = T, show.legend = FALSE) +
geom_label(aes(Year, MeanDistance, label = comma(MeanDistance_Label, accuracy = 0.1,
color = "Average Player")), fill = "white", na.rm = T, show.legend = FALSE) +
geom_label(aes(Year, MinDistance, label = comma(MinDistance_Label, accuracy = 0.1),
color = "Shortest Player"), fill = "white", na.rm = T, show.legend = FALSE) +
scale_color_manual(name = "", values = c("Longest Player" = "blue",
"Average Player" = "black",
"Shortest Player" = "#e75480")) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(title = "PGA Driving Distances by Year",
y = "Average Yards Per Drive",
x = "Year")
So clearly everybody had gotten longer across the board with average to long hitters gaining 38 yards and even shorter hitters picking up 33 yards which is about 14% in all cases.
Let’s put some names to the stats and see who the top two longest and shortest were by year.
LongDF = DrivingDistance %>%
filter(Rank %in% c(1,2)) %>%
mutate(`Longest Player` = Player, `Longest Avg` = AvgDist) %>%
select(Year, Rank, `Longest Player`, `Longest Avg`)
ShortDF = DrivingDistance %>%
group_by(Year) %>%
mutate(ShortRank = rev(Rank)) %>%
ungroup() %>%
filter(ShortRank %in% c(1,2)) %>%
mutate(Rank = ShortRank, `Shortest Player` = Player, `Shortest Avg` = AvgDist) %>%
select(Year, Rank, `Shortest Player`, `Shortest Avg`)
suppressWarnings({
TopDistanceDF =
LongDF %>%
inner_join(ShortDF, by = c("Year", "Rank"))})
PrettyTable(TopDistanceDF, "Longest and Shortest by Year")
Year | Rank | Longest Player | Longest Avg | Shortest Player | Shortest Avg |
---|---|---|---|---|---|
1987 | 1 | John McComish | 283.9 | Bob Murphy | 244.2 |
1987 | 2 | Davis Love III | 281.5 | Dave Stockton | 244.6 |
1988 | 1 | Steve Thomas | 284.6 | Jack Renner | 246.2 |
1988 | 2 | Craig Stadler | 279.5 | John Inman | 249.5 |
1989 | 1 | Steve Thomas | 284.6 | Jack Renner | 246.2 |
1989 | 2 | Craig Stadler | 279.5 | John Inman | 249.5 |
1990 | 1 | Tom Purtzer | 279.6 | Calvin Peete | 244.2 |
1990 | 2 | John Adams | 279.4 | Tommy Moore | 245.2 |
1991 | 1 | John Daly | 288.9 | Scott Verplank | 240.8 |
1991 | 2 | Greg Norman | 282.3 | Morris Hatalsky | 247.4 |
1992 | 1 | John Daly | 283.4 | Ed Fiori | 240.6 |
1992 | 2 | Phil Blackmar | 276.5 | Morris Hatalsky | 242.6 |
1993 | 1 | John Daly | 288.9 | David Delong | 240.9 |
1993 | 2 | Davis Love III | 280.2 | Morris Hatalsky | 243.2 |
1994 | 1 | Davis Love III | 283.8 | Stan Utley | 237.9 |
1994 | 2 | Dennis Paulson | 283.0 | Billy Ray Brown | 239.3 |
1995 | 1 | John Daly | 289.0 | Bill Britton | 246.6 |
1995 | 2 | Davis Love III | 284.6 | Tim Loustalot | 248.4 |
1996 | 1 | John Daly | 288.8 | Ed Fiori | 239.5 |
1996 | 2 | John Adams | 286.7 | Corey Pavin | 248.1 |
1997 | 1 | John Daly | 302.0 | Ed Fiori | 244.8 |
1997 | 2 | Tiger Woods | 294.8 | John Morse | 250.6 |
1998 | 1 | John Daly | 299.4 | Corey Pavin | 249.0 |
1998 | 2 | Tiger Woods | 296.3 | John Morse | 250.8 |
1999 | 1 | John Daly | 305.6 | Corey Pavin | 252.5 |
1999 | 2 | Chris Couch | 295.8 | Loren Roberts | 254.8 |
2000 | 1 | John Daly | 301.4 | Corey Pavin | 251.3 |
2000 | 2 | Tiger Woods | 298.0 | Mike Reid | 251.6 |
2001 | 1 | John Daly | 306.7 | Corey Pavin | 252.7 |
2001 | 2 | Brett Quigley | 298.5 | Loren Roberts | 261.6 |
2002 | 1 | John Daly | 306.8 | Pete Jordan | 256.6 |
2002 | 2 | Boo Weekley | 297.0 | Corey Pavin | 258.1 |
2003 | 1 | Hank Kuehne | 321.4 | Loren Roberts | 265.9 |
2003 | 2 | John Daly | 314.3 | Corey Pavin | 268.9 |
2004 | 1 | Hank Kuehne | 314.4 | Corey Pavin | 268.2 |
2004 | 2 | Scott Hend | 312.6 | Loren Roberts | 269.1 |
2005 | 1 | Scott Hend | 318.9 | Corey Pavin | 258.7 |
2005 | 2 | Tiger Woods | 316.1 | Jeff Hart | 265.7 |
2006 | 1 | Bubba Watson | 319.6 | Corey Pavin | 265.9 |
2006 | 2 | J.B. Holmes | 318.8 | Brad Faxon | 270.3 |
2007 | 1 | Bubba Watson | 315.2 | Corey Pavin | 265.3 |
2007 | 2 | John Daly | 312.9 | Gavin Coles | 268.3 |
2008 | 1 | Bubba Watson | 315.1 | Corey Pavin | 261.4 |
2008 | 2 | Robert Garrigus | 311.0 | Brian Gay | 270.5 |
2009 | 1 | Robert Garrigus | 312.0 | Corey Pavin | 259.0 |
2009 | 2 | Bubba Watson | 311.4 | Brian Gay | 268.5 |
2010 | 1 | Robert Garrigus | 315.5 | Brian Gay | 266.4 |
2010 | 2 | Bubba Watson | 309.8 | Craig Bowden | 270.0 |
2011 | 1 | J.B. Holmes | 318.4 | Brian Gay | 269.8 |
2011 | 2 | Bubba Watson | 314.9 | Paul Goydos | 274.8 |
2012 | 1 | Bubba Watson | 315.5 | Nick O’Hern | 268.9 |
2012 | 2 | Charlie Beljan | 311.6 | Gavin Coles | 271.2 |
2013 | 1 | Luke List | 306.3 | Mike Weir | 270.5 |
2013 | 2 | Dustin Johnson | 305.8 | Colt Knost | 271.7 |
2014 | 1 | Bubba Watson | 314.3 | Justin Leonard | 270.3 |
2014 | 2 | Dustin Johnson | 311.0 | Mike Weir | 271.2 |
2015 | 1 | Dustin Johnson | 317.7 | David Toms | 270.0 |
2015 | 2 | Bubba Watson | 315.2 | Ben Crane | 271.0 |
2016 | 1 | J.B. Holmes | 314.5 | David Toms | 269.7 |
2016 | 2 | Dustin Johnson | 313.6 | Colt Knost | 269.7 |
2017 | 1 | Rory McIlroy | 317.2 | Jim Furyk | 271.4 |
2017 | 2 | Dustin Johnson | 315.0 | Brian Stuard | 272.6 |
2018 | 1 | Rory McIlroy | 319.7 | Brian Stuard | 278.9 |
2018 | 2 | Trey Mullinax | 318.3 | Zac Blair | 279.5 |
2019 | 1 | Cameron Champ | 317.9 | Scott Langley | 271.3 |
2019 | 2 | Rory McIlroy | 313.5 | Brian Gay | 274.9 |
2020 | 1 | Bryson DeChambeau | 322.1 | Andrew Putnam | 277.9 |
2020 | 2 | Cameron Champ | 322.0 | Jim Furyk | 280.2 |
2021 | 1 | Bryson DeChambeau | 323.7 | K.J. Choi | 269.2 |
2021 | 2 | Rory McIlroy | 319.3 | David Hearn | 274.7 |
2022 | 1 | Cameron Champ | 321.4 | Brian Stuard | 277.4 |
2022 | 2 | Rory McIlroy | 321.3 | Brendon Todd | 279.0 |
AGGDistanceYear %>%
ggplot() +
geom_line(aes(Year, MaxVsMean, color = "Longest vs Average"), linewidth = 1) +
geom_line(aes(Year, MinVsMean, color = "Shortest vs Average"), linewidth = 1) +
geom_hline(aes(yintercept = 0, color = "Tour Average"), linewidth = 1) +
# geom_hline(yintercept = 0, color = "black", linewidth = 2) +
geom_smooth(aes(Year, MaxVsMean, color = "Longest vs Average"),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_smooth(aes(Year, MinVsMean, color = "Shortest vs Average"),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_label(aes(Year, MaxVsMean, label = percent(MaxVsMean_Label, accuracy = 0.1),
color = "Longest vs Average"), fill = "white", na.rm = T, show.legend = FALSE) +
geom_label(aes(Year, MinVsMean, label = percent(MinVsMean_Label, accuracy = 0.1),
color = "Shortest vs Average"), fill = "white", na.rm = T, show.legend = FALSE) +
scale_color_manual(name = "", values = c("Longest vs Average" = "blue",
"Tour Average" = "black",
"Shortest vs Average" = "#e75480")) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(title = "PGA Driving Distance Differential by Year",
y = "Difference From Average",
x = "Year")
The longest players are minutely less long than the average player versus 1987 although there was a peak in the early 2000’s where the longest players peaked versus the average. The shortest players have maintained their position versus the average and in the last few years have lost ground.
However given the very small changes of less than 2% over 36 years I think we’ve got the answer question two question 2. The distance parity is basically unchanged over the last 36 years
So does that length matter? Let’s look at the ultimate test, prize money!
AGGDistanceQuartiles = AGGPlayerYear %>%
mutate(DistanceQuartile = case_when(
DistancePercentile > 0.66 ~ " Longest Third",
DistancePercentile > 0.33 ~ " Average Third",
TRUE ~ " Shortest Third")) %>%
filter(!is.na(DistanceQuartile) & !is.na(PercentOfMoney) & !is.na(Wins)) %>%
group_by(Year) %>%
mutate(YearlyWins = sum(Wins)) %>%
ungroup() %>%
group_by(Year, DistanceQuartile) %>%
summarize(PercentOfMoney = sum(PercentOfMoney),
PercentOfWins = sum(Wins) / min(YearlyWins),
.groups = "drop") %>%
mutate(PercentOfMoney_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
PercentOfMoney, NA),
PercentOfWins_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
PercentOfWins, NA))
AGGDistanceQuartiles %>%
ggplot() +
geom_line(aes(Year, PercentOfMoney, color = DistanceQuartile), linewidth = 1) +
geom_smooth(aes(Year, PercentOfMoney, color = DistanceQuartile),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_label(aes(Year, PercentOfMoney, label = percent(PercentOfMoney_Label, accuracy = 0.1),
color = DistanceQuartile), fill = "white", na.rm = T, show.legend = FALSE) +
scale_color_manual(values = c("blue", "black", "#e75480")) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(title = "PGA Annual Prize Money Percentage Per Distance Cohort by Year",
color = "",
y = "Annual Percentage of Prize Money Won",
x = "Year")
The longest third of drivers on the tour have increased their percentage of total prize money across each season from 34.9% in 1987 to almost 51% in 2022. This has been at the direct expense of the shortest third of drivers who have seen their share drop from 27.3% to 14.4%.
You can also see how this trend has accelerated since about 2009 where there was movement toward convergence and then the long drivers started taking the money!
Let’s see if there is a similar trend in wins.
AGGDistanceQuartiles %>%
ggplot() +
geom_line(aes(Year, PercentOfWins, color = DistanceQuartile), linewidth = 1) +
geom_smooth(aes(Year, PercentOfWins, color = DistanceQuartile),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_label(aes(Year, PercentOfWins, label = percent(PercentOfWins_Label, accuracy = 0.1),
color = DistanceQuartile), fill = "white", na.rm = T, show.legend = FALSE) +
scale_color_manual(values = c("blue", "black", "#e75480")) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(title = "PGA Annual Percentage of Wins Per Distance Cohort by Year",
color = "",
y = "Annual Wins",
x = "Year")
The longest third of drivers on the tour have increased their share of wins each season from 35.7% in 1987 when they were dead even with the shortest third. Now that share has increased to 73.3% while the shortest have dropped to 4.4%.
This is a much more dramatic shift than I anticipated. Before we move on from money and wins I want to validate this by taking a narrow look at just the top 10% of long drivers over the years.
AGGTopDrivers = AGGPlayerYear %>%
mutate(DistanceQuartile = case_when(
DistancePercentile > 0.90 ~ "Longest 10%",
TRUE ~ " Shortest 90%")) %>%
filter(!is.na(DistanceQuartile) & !is.na(PercentOfMoney) & !is.na(Wins)) %>%
group_by(Year) %>%
mutate(YearlyWins = sum(Wins)) %>%
ungroup() %>%
group_by(Year, DistanceQuartile) %>%
summarize(PercentOfMoney = sum(PercentOfMoney),
PercentOfWins = sum(Wins) / min(YearlyWins),
FairwayRate = mean(FairwayRate),
GreensRate = mean(GreensRate),
AvgPutts = mean(AvgPutts),
.groups = "drop") %>%
mutate(PercentOfMoney_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
PercentOfMoney, NA),
PercentOfWins_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
PercentOfWins, NA),
FairwayRate_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
FairwayRate, NA),
GreensRate_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
GreensRate, NA),
AvgPutts_Label = if_else(Year %in% c(min(Year), round(mean(Year), 0), max(Year)),
AvgPutts, NA)) %>%
filter(DistanceQuartile == "Longest 10%") %>%
rename(`Percent of Money` = PercentOfMoney, `Percent of Wins` = PercentOfWins,
`Fairway Accuracy` = FairwayRate, `Greens In Regulation` = GreensRate,
`Putts Per Round` = AvgPutts) %>%
pivot_longer(cols = c(`Percent of Money`, `Percent of Wins`,
`Fairway Accuracy`, `Greens In Regulation`, `Putts Per Round`),
names_to = "MetricName", values_to = "MetricValue") %>%
mutate(PercentOfMoney_Label = if_else(!MetricName == "Percent of Money",
NA, PercentOfMoney_Label),
PercentOfWins_Label = if_else(!MetricName == "Percent of Wins",
NA, PercentOfWins_Label),
FairwayRate_Label = if_else(!MetricName == "Fairway Accuracy",
NA, FairwayRate_Label),
GreensRate_Label = if_else(!MetricName == "Greens In Regulation",
NA, GreensRate_Label),
AvgPutts_Label = if_else(!MetricName == "Putts Per Round",
NA, AvgPutts_Label)
)
AGGTopDrivers %>%
filter(MetricName %in% c("Percent of Money", "Percent of Wins")) %>%
ggplot() +
geom_line(aes(Year, MetricValue, color = MetricName), linewidth = 1) +
geom_smooth(aes(Year, MetricValue, color = MetricName),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_label(aes(Year, MetricValue, label = percent(PercentOfMoney_Label, accuracy = 0.1),
color = MetricName), fill = "white", na.rm = T, show.legend = FALSE) +
geom_label(aes(Year, MetricValue, label = percent(PercentOfWins_Label, accuracy = 0.1),
color = MetricName), fill = "white", na.rm = T, show.legend = FALSE) +
scale_color_manual(values = c("darkgreen", "black")) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(title = "PGA Annual Prize and Win Share For Longest 10% of Drivers",
color = "",
y = "Annual Percentage Won",
x = "Year")
If all things were equal, the longest 10% would get about 10% of the wins but again we can see that even for the longest 20 or so hitters they have moved from taking just over 14% of the wins and prize money in 1987 to 22%-24% in 2022.
So how did the longest hitters go from slightly above average to domination?
Let’s wrap this up by looking at some basic tee to green statistics for the longest 10%.
AGGTopDrivers %>%
filter(MetricName %in% c("Fairway Accuracy", "Greens In Regulation")) %>%
ggplot() +
geom_line(aes(Year, MetricValue, color = MetricName), linewidth = 1) +
geom_smooth(aes(Year, MetricValue, color = MetricName),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_label(aes(Year, MetricValue, label = percent(FairwayRate_Label, accuracy = 0.1),
color = MetricName), fill = "white", na.rm = T, show.legend = FALSE) +
geom_label(aes(Year, MetricValue, label = percent(GreensRate_Label, accuracy = 0.1),
color = MetricName), fill = "white", na.rm = T, show.legend = FALSE) +
scale_color_manual(values = c("darkgreen", "black")) +
scale_y_continuous(labels = percent_format(accuracy = 1)) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(title = "PGA Fairway Accuracy and Greens In Regulation For Longest 10% of Drivers",
color = "",
y = "Accuracy Rate",
x = "Year")
So driving accuracy has suffered slightly for the longest 10% of hitters since 1987 as they are hitting about 55.4% or 7.7 / 14 fairways in 2022 versus 61.5% or 8.6 / 14 fairways in 1987.
However that hasn’t seemed to impact greens hit in regulation which has basically remained unchanged. It seems that the being 38 yards further down the fairway offsets the slight decrease in accuracy.
For our final stat we will look at putts.
AGGTopDrivers %>%
filter(MetricName %in% c("Putts Per Round")) %>%
ggplot() +
geom_line(aes(Year, MetricValue, color = MetricName), linewidth = 1) +
geom_smooth(aes(Year, MetricValue, color = MetricName),
linewidth = 2, formula = "y ~ x", method = "loess", se = F) +
geom_label(aes(Year, MetricValue, label = comma(AvgPutts_Label, accuracy = 0.1),
color = MetricName), fill = "white", na.rm = T, show.legend = FALSE) +
scale_color_manual(values = c("darkgreen")) +
theme_minimal() +
theme(legend.position = "bottom") +
labs(title = "PGA Putts Per Round For Longest 10% of Drivers",
color = "",
y = "Putts Per Round",
x = "Year")
This doesn’t look like much but we can see that the longest hitters have improved by 0.7 putts per round which directly impacts their score by about 3 strokes per 72 hole tournament.
Since Greens in Regulation were about the same over the years, the improvement in putting is presumably driven by proximity to the hole. Unfortunately we don’t have data back to 1987 on proximity.
Conclusions : So to answer our original questions:
Longer hitters have increased their advantage over the past 36 years with the longest 10% nearly doubling their share of prize money and wins.
The parity in just distance has remained constant with the long, short and average drivers all gaining about the same percentages in distance.
Shorter hitters are now at a significant disadvantage. The shortest third of hitters only wins about 15% of prize money now versus nearly 30% in 1987. The share of wins has dropped even more dramatically from 35% to 4%.
So if you can’t be Savvy; be long.