Analysing Tiger Woods’s performance (2005-2010)

Author

Pauric O’Shea

library(kableExtra)
knitr::include_graphics("Tiger Woods.png")

Introduction

Tiger Woods was the world’s top-ranked golfer for a record 281 consecutive weeks from 2005 to 2010. During that reign he won a long list of tournaments, including all four major championships: the Masters Tournament, the U.S. Open, The Open Championship, and the PGA Championship. These achievements established him as one of the most influential figures in the history of golf and highlighted his talent and consistency on the course. This report compares Woods’ performance with that of other top golfers, namely those who appeared among the top 10 earners at least twice during the period. Measured against this group, the extent of his dominance becomes even more apparent.

Preparing Data

Loading the necessary libraries and the attached dataset.

# Load the necessary libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()     masks stats::filter()
✖ dplyr::group_rows() masks kableExtra::group_rows()
✖ dplyr::lag()        masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(Lahman)
library(ggplot2)
library(forcats)
library(dplyr)
library(readr)
Updated_Golf_Stats <- read_csv("~/Sports Analytics/6. Quarto(3)/6. Quarto/Updated-Golf-Stats.csv")
Rows: 36 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Player
dbl (17): Year, Rank, Age, Events, Rounds, CM, TT, Wins, YD, DA, GIR, PA, SS...
num  (1): Earnings

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(Updated_Golf_Stats)

I created additional variables to help organize the particular data points I am analyzing.

# Create a variable for the total holes counted each year for every competitor (eagles, birdies, pars and bogeys)
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, Total_Holes = Eagles + Birdies + Pars + Bogies)
# Create a variable for the number of times a golfer made the cut but finished outside the top ten
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, Outside_TT = CM - TT)

I converted the counts into percentages to adjust for differences in the number of tournaments each golfer entered, so that the players can be compared on the same scale.

#Creating a variable that represents % of Eagles, Birdies, Pars and Bogies each year
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, Eagles_Pct = Eagles/ Total_Holes *100)
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, Birdies_Pct = Birdies/ Total_Holes *100)
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, Pars_Pct = Pars / Total_Holes *100) 
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, Bogies_Pct = Bogies / Total_Holes *100)

# Create variables for the % of events with a Cut Made, an Outside Top Ten finish, a Top Ten finish and a Win
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, CM_Pct = CM / Events *100)
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, Outside_TT_Pct = Outside_TT / Events *100)
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, TT_Pct = TT / Events *100)
Updated_Golf_Stats <- mutate(Updated_Golf_Stats, Wins_Pct = Wins / Events *100)
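
The eight percentage columns above all follow one pattern: a count divided by a denominator, multiplied by 100. As a minimal sketch, assuming dplyr 1.0 or later, the same columns could be produced with `across()`. The two-row `toy` tibble and its values are invented for illustration and are not taken from Updated-Golf-Stats.csv.

```r
library(dplyr)

# Hypothetical two-row stand-in for Updated_Golf_Stats (values are invented)
toy <- tibble(
  Player  = c("Player A", "Player B"),
  Events  = c(20, 25),
  CM      = c(18, 20),
  TT      = c(9, 5),
  Wins    = c(4, 1),
  Eagles  = c(10, 5),
  Birdies = c(240, 200),
  Pars    = c(630, 650),
  Bogies  = c(120, 145)
)

toy_pct <- toy %>%
  mutate(
    Total_Holes = Eagles + Birdies + Pars + Bogies,
    Outside_TT  = CM - TT
  ) %>%
  mutate(
    # Per-hole outcomes as a share of all holes counted
    across(c(Eagles, Birdies, Pars, Bogies),
           ~ .x / Total_Holes * 100, .names = "{.col}_Pct"),
    # Tournament outcomes as a share of events entered
    across(c(CM, Outside_TT, TT, Wins),
           ~ .x / Events * 100, .names = "{.col}_Pct")
  )

toy_pct %>% select(Player, ends_with("_Pct"))
```

Because the four per-hole percentages share the denominator `Total_Holes`, they sum to 100 for every row, which is a quick sanity check on the transformation.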

I categorized the data into several datasheets, each displaying the average percentage of players’ performance from 2005 to 2010 when they were among the top earners.

# Find the mean % of Eagles, Birdies, Pars and Bogies over 2005-2010. First compute the overall mean from Updated_Golf_Stats, then group by player and store each player's mean in a new column called mean_<metric>, sorted in descending order.
summarise(Updated_Golf_Stats, AV_Eagles_Pct = mean(Eagles_Pct))
# A tibble: 1 × 1
  AV_Eagles_Pct
          <dbl>
1         0.604
AV_Eagles_Pct <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_Eagles_Pct = mean(Eagles_Pct)) %>%
  arrange(desc(mean_Eagles_Pct))

AV_Birdies_Pct <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_Birdies_Pct = mean(Birdies_Pct)) %>%
  arrange(desc(mean_Birdies_Pct))

AV_Pars_Pct <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_Pars_Pct = mean(Pars_Pct)) %>%
  arrange(desc(mean_Pars_Pct))

AV_Bogies_Pct <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_Bogies_Pct = mean(Bogies_Pct)) %>%
  arrange(desc(mean_Bogies_Pct))


# Find the mean Yards Driven, Driving Accuracy, Greens in Regulation, Putting Average and Sand Save Percentage over 2005-2010. Group Updated_Golf_Stats by player and store each player's mean in a new column called mean_<metric>, sorted in descending order.
YD <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_YD = mean(YD)) %>%
  arrange(desc(mean_YD))

DA <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_DA = mean(DA)) %>%
  arrange(desc(mean_DA))

GIR <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_GIR = mean(GIR)) %>%
  arrange(desc(mean_GIR))

PA <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_PA = mean(PA)) %>%
  arrange(desc(mean_PA))

SSP <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_SSP = mean(SSP)) %>%
  arrange(desc(mean_SSP))

# Find the mean % of Cuts Made, Outside Top Ten finishes, Top Ten finishes and Wins over 2005-2010. First compute the overall mean from Updated_Golf_Stats, then group by player and store each player's mean in a new column called mean_<metric>, sorted in descending order.
summarise(Updated_Golf_Stats, AV_CM_Pct = mean(CM_Pct))
# A tibble: 1 × 1
  AV_CM_Pct
      <dbl>
1      88.3
AV_CM_Pct <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_CM_Pct = mean(CM_Pct)) %>%
  arrange(desc(mean_CM_Pct))

AV_OutsideTT_Pct <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_Outside_TT_Pct = mean(Outside_TT_Pct)) %>%
  arrange(desc(mean_Outside_TT_Pct))

AV_TT_Pct <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_TT_Pct = mean(TT_Pct)) %>%
  arrange(desc(mean_TT_Pct))

AV_Wins_Pct <- Updated_Golf_Stats %>%
  group_by(Player) %>%
  summarise(mean_Wins_Pct = mean(Wins_Pct)) %>%
  arrange(desc(mean_Wins_Pct))
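
The per-metric summaries above repeat the same group-and-average step. A minimal sketch of doing it in one pass, assuming dplyr 1.0 or later, is shown below. The `toy` tibble and the name `player_means` are hypothetical and the values are invented; the result already has one row per player, so no joins are needed afterwards.

```r
library(dplyr)

# Hypothetical yearly rows for two players (values are invented)
toy <- tibble(
  Player   = c("Player A", "Player A", "Player B", "Player B"),
  Year     = c(2005, 2006, 2005, 2006),
  TT_Pct   = c(60, 70, 30, 40),
  Wins_Pct = c(30, 40, 5, 15)
)

player_means <- toy %>%
  group_by(Player) %>%
  summarise(
    # Average every column whose name ends in "_Pct", prefixing the result with "mean_"
    across(ends_with("_Pct"), ~ mean(.x), .names = "mean_{.col}"),
    .groups = "drop"
  ) %>%
  arrange(desc(mean_TT_Pct))

player_means
```

The trade-off is that a single table can only be sorted one way at a time, whereas the separate tables above are each sorted by their own metric.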

I used inner joins on the “Player” column, which is common to every summary table, to group the data into four datasheets. “Performance per Hole” covers player scores on individual holes; “Course Management” covers performance metrics with the clubs used; “Overall Performance Outcome” covers tournament success over the period; and “Refined_Golf2_” combines the other three into a single refined dataset for further analysis. Together these give a thorough picture of the different facets of each player’s game and feed the graphs and tables that follow.

#Using the inner join function to merge AV_Eagles_Pct with AV_Birdies_Pct to create Eagles_Birdies_Pct. Using "Player" because it is common to both datasets. 
Eagles_Birdies_Pct <- AV_Eagles_Pct %>%
  inner_join(AV_Birdies_Pct, by = "Player")

#Using the inner join function to merge AV_Pars_Pct with AV_Bogies_Pct to create Pars_Bogies_Pct. Using "Player" because it is common to both datasets.
Pars_Bogies_Pct <- AV_Pars_Pct %>%
  inner_join(AV_Bogies_Pct, by = "Player")

#Using the inner join function to merge Eagles_Birdies_Pct with Pars_Bogies_Pct to create Performance_Per_Hole. Using "Player" because it is common to both datasets.
Performance_Per_Hole <- Eagles_Birdies_Pct %>%
  inner_join(Pars_Bogies_Pct, by = "Player")

#Using the inner join function to merge YD with DA to create YD_DA. Using "Player" because it is common to both datasets.
YD_DA <- YD %>%
  inner_join(DA, by = "Player")

#Using the inner join function to merge GIR with PA to create GIR_PA. Using "Player" because it is common to both datasets.
GIR_PA <- GIR %>%
  inner_join(PA, by = "Player")

#Using the inner join function to merge GIR_PA with SSP to create GIR_PA_SSP. Using "Player" because it is common to both datasets.
GIR_PA_SSP <- GIR_PA %>%
  inner_join(SSP, by = "Player")

#Using the inner join function to merge YD_DA with GIR_PA_SSP to create Course_Management. Using "Player" because it is common to both datasets.
Course_Management <- YD_DA %>%
  inner_join(GIR_PA_SSP, by = "Player")

#Using the inner join function to merge DA with GIR_PA_SSP to create Course_Management2. Using "Player" because it is common to both datasets.
Course_Management2 <- DA %>%
  inner_join(GIR_PA_SSP, by = "Player")

#Using the inner join function to merge AV_CM_Pct with AV_OutsideTT_Pct to create CM_OutsideTT_Pct. Using "Player" because it is common to both datasets.
CM_OutsideTT_Pct <- AV_CM_Pct %>%
  inner_join(AV_OutsideTT_Pct, by = "Player")

#Using the inner join function to merge AV_TT_Pct with AV_Wins_Pct to create TT_Wins_Pct. Using "Player" because it is common to both datasets.
TT_Wins_Pct <- AV_TT_Pct %>%
  inner_join(AV_Wins_Pct, by = "Player")

#Using the inner join function to merge TT_Wins_Pct with CM_OutsideTT_Pct to create Overall_Performance_Outcomes. Using "Player" because it is common to both datasets.
Overall_Performance_Outcomes <- TT_Wins_Pct %>%
  inner_join(CM_OutsideTT_Pct, by = "Player")

#Using the inner join function to merge Performance_Per_Hole with Overall_Performance_Outcomes to create Refined_Golf_. Using "Player" because it is common to both datasets.
Refined_Golf_ <- Overall_Performance_Outcomes %>%
  inner_join(Performance_Per_Hole, by = "Player")

#Using the inner join function to merge Refined_Golf_ with Course_Management to create Refined_Golf2_. Using "Player" because it is common to both datasets. 
Refined_Golf2_ <- Refined_Golf_ %>%
  inner_join(Course_Management, by = "Player")
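
The chain of pairwise joins above can also be written as a single fold over a list of tables. The sketch below assumes the purrr package is available and uses three small hypothetical tables (`yd_toy`, `da_toy`, `gir_toy`) with invented values; `reduce()` applies `inner_join()` from left to right and passes `by = "Player"` to every call.

```r
library(dplyr)
library(purrr)

# Hypothetical per-player summary tables (values are invented)
yd_toy  <- tibble(Player = c("Player A", "Player B"), mean_YD  = c(300, 290))
da_toy  <- tibble(Player = c("Player A", "Player B"), mean_DA  = c(60, 65))
gir_toy <- tibble(Player = c("Player B", "Player A"), mean_GIR = c(66, 68))

# Join all tables on Player in one step
combined <- reduce(list(yd_toy, da_toy, gir_toy), inner_join, by = "Player")

combined
```

Row order in the inputs does not matter, because rows are matched on the `Player` key rather than on position.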

Performance Per Hole

The “Performance per Hole” component focuses on the occurrence of Eagles, Birdies, Pars, and Bogies, using boxplots and tables to present the results. This gives both a visual and a numerical comparison of Tiger Woods’ performance with that of the other players and reveals the differences that set him apart.

Eagle%

Tiger Woods is the only golfer whose average eagle rate exceeds 1 percent, at 1.04 percent. His lower-quartile eagle rate is also higher than that of every other player except Phil Mickelson. Mickelson, Tiger’s closest competitor, averages 0.74 percent.

ggplot(Updated_Golf_Stats, aes(x = as.factor(Player), y = Eagles_Pct, fill = Player == "Tiger Woods")) + 
  geom_boxplot() + 
  scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "gray25")) +  # Tiger Woods in red, everyone else in gray
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        panel.grid.major = element_blank(),   # Remove major grid lines
        panel.grid.minor = element_blank(),   # Remove minor grid lines
        legend.position = "none") +          # Remove legend
  xlab("Player")

# Create a column to specify the color
Performance_Per_Hole$color <- ifelse(Performance_Per_Hole$Player == "Tiger Woods", "red", "gray25")

AV_Eagles_Pct <- data.frame(
  Player = c("Tiger Woods", "Phil Mickelson", "Vijay Singh", "Zach Johnson", "Sergio Garcia", "Kenny Perry", "Steve Stricker", "Geoff Ogilvy", "Jim Furyk", "Luke Donald"),
  Eagles_Percentage = c(1.04, 0.74, 0.66, 0.65, 0.55, 0.41, 0.39, 0.39, 0.34, 0.31)
)
# Generate the table with conditional formatting
knitr::kable(AV_Eagles_Pct,
      digits = c(0, 2),
      align = "lr",
      col.names = c("Player", "Eagles%"),
      caption = "Performance Percentages Per Hole",
      table.attr = 'data-quarto-disable-processing = "true"') %>%
  kable_styling(full_width = F) %>%
  row_spec(which(AV_Eagles_Pct$Player == "Tiger Woods"), bold = TRUE, color ="red", background = "white") 
Performance Percentages Per Hole

Player            Eagles%
Tiger Woods          1.04
Phil Mickelson       0.74
Vijay Singh          0.66
Zach Johnson         0.65
Sergio Garcia        0.55
Kenny Perry          0.41
Steve Stricker       0.39
Geoff Ogilvy         0.39
Jim Furyk            0.34
Luke Donald          0.31
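
The same highlighted boxplot is drawn for each per-hole metric, with only the y variable changing. A minimal sketch of a helper that wraps it is shown below. `highlight_boxplot()`, its arguments and the `toy` data are hypothetical and not part of the original analysis. Mapping `fill` to a logical flag means both colours given to `scale_fill_manual()` match a value that actually occurs in the data.

```r
library(ggplot2)

# Hypothetical helper: boxplot of one metric per player, with one player highlighted
highlight_boxplot <- function(data, y_col, star = "Tiger Woods") {
  # Logical flag per row: TRUE for the highlighted player
  data$Highlight <- data$Player == star
  ggplot(data, aes(x = Player, y = .data[[y_col]], fill = Highlight)) +
    geom_boxplot() +
    scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "gray25")) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 60, hjust = 1),
          panel.grid.major = element_blank(),
          panel.grid.minor = element_blank(),
          legend.position = "none") +
    labs(x = "Player", y = y_col)
}

# Invented yearly values for two players, for illustration only
toy <- data.frame(
  Player     = rep(c("Tiger Woods", "Player B"), each = 4),
  Eagles_Pct = c(1.0, 1.1, 0.9, 1.2, 0.4, 0.5, 0.6, 0.3)
)

p <- highlight_boxplot(toy, "Eagles_Pct")
p
```

With a helper like this, a call such as `highlight_boxplot(Updated_Golf_Stats, "Birdies_Pct")` would draw the next metric without copying the plotting code.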

Birdies%

The table shows each player’s Birdies %, with Tiger Woods leading at 24.00 percent. Phil Mickelson is in second place at 22.74 percent, a strong figure but still behind Tiger. In the box plot Tiger’s distribution sits apart from the rest of the field, highlighting how unusual his birdie rate is. Steve Stricker, Jim Furyk, Vijay Singh, and other players have good birdie percentages, but none regularly matches the level set by Tiger Woods.

ggplot(Updated_Golf_Stats, aes(x = as.factor(Player), y = Birdies_Pct, fill = Player == "Tiger Woods")) + 
  geom_boxplot() +
  scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "gray25")) +  # Tiger Woods in red, everyone else in gray
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        panel.grid.major = element_blank(),   # Remove major grid lines
        panel.grid.minor = element_blank(),   # Remove minor grid lines
        legend.position = "none") +           # Remove legend
  xlab("Player")

# Create a column to specify the color
Performance_Per_Hole$color <- ifelse(Performance_Per_Hole$Player == "Tiger Woods", "red", "gray25")

AV_Birdies_Pct <- data.frame(
  Player = c("Tiger Woods", "Phil Mickelson", "Steve Stricker", "Jim Furyk", "Vijay Singh", "Kenny Perry", "Geoff Ogilvy", "Sergio Garcia", "Luke Donald", "Zach Johnson"),
  Birdies_Percentage = c(24.00, 22.74, 21.56, 21.18, 21.18, 21.14, 21.11, 20.41, 20.41, 19.4)
)
# Generate the table with conditional formatting
knitr::kable(AV_Birdies_Pct,
      digits = c(0, 2),
      align = "lr",
      col.names = c("Player", "Birdies%"),
      caption = "Performance Percentages Per Hole",
      table.attr = 'data-quarto-disable-processing = "true"') %>%
  kable_styling(full_width = F, bootstrap_options = "condensed") %>%
  row_spec(which(AV_Birdies_Pct$Player == "Tiger Woods"), bold = TRUE, color = "red", background = "white")
Performance Percentages Per Hole

Player            Birdies%
Tiger Woods          24.00
Phil Mickelson       22.74
Steve Stricker       21.56
Jim Furyk            21.18
Vijay Singh          21.18
Kenny Perry          21.14
Geoff Ogilvy         21.11
Sergio Garcia        20.41
Luke Donald          20.41
Zach Johnson         19.40

Pars%

The data reveals that Tiger’s Par percentage is the second lowest at 62.57%, above only Phil Mickelson’s 62.12%. This aligns with expectations, given his notably high share of Birdies and Eagles compared with his counterparts. It suggests that Tiger plays more aggressively than the others, choosing riskier shots that yield higher birdie and eagle rates at the cost of fewer pars. In the boxplots, Tiger Woods’ upper quartile is comparable to that of Vijay Singh, Sergio Garcia, Kenny Perry, and Geoff Ogilvy. A low Par percentage could, however, also point to inconsistency if it were accompanied by a disproportionately high Bogies percentage.

ggplot(Updated_Golf_Stats, aes(x = as.factor(Player), y = Pars_Pct, fill = Player == "Tiger Woods")) + 
  geom_boxplot() +
  scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "gray25")) +  # Tiger Woods in red, everyone else in gray
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        panel.grid.major = element_blank(),      # Remove major grid lines
        panel.grid.minor = element_blank(),      # Remove minor grid lines
        legend.position = "none") +             # Remove legend
  xlab("Player")

# Create a column to specify the color
Performance_Per_Hole$color <- ifelse(Performance_Per_Hole$Player == "Tiger Woods", "red", "gray25")

AV_Pars_Pct <- data.frame(
  Player = c( "Tiger Woods", "Luke Donald", "Zach Johnson", "Jim Furyk", "Steve Stricker", "Vijay Singh", "Sergio Garcia", "Geoff Ogilvy", "Kenny Perry","Phil Mickelson"),
  Pars_Percentage = c(62.57, 67.03, 65.97, 65.85, 65.46, 64.06, 64.03, 63.79, 63.73, 62.12)
)
# Generate the table with conditional formatting
knitr::kable(AV_Pars_Pct,
      digits = c(0,2),
      align = "lr",
      col.names = c("Player", "Pars%"),
      caption = "Performance Percentages Per Hole",
      table.attr = 'data-quarto-disable-processing = "true"') %>%
  kable_styling(full_width = F) %>%
  row_spec(which(AV_Pars_Pct$Player == "Tiger Woods"), bold = TRUE, color ="red", background = "white") 
Performance Percentages Per Hole

Player            Pars%
Tiger Woods       62.57
Luke Donald       67.03
Zach Johnson      65.97
Jim Furyk         65.85
Steve Stricker    65.46
Vijay Singh       64.06
Sergio Garcia     64.03
Geoff Ogilvy      63.79
Kenny Perry       63.73
Phil Mickelson    62.12

Bogies%

The data shows the bogey percentage for each player. Tiger Woods has the second-lowest rate at 12.37%, behind only Luke Donald (12.25%) and just ahead of Steve Stricker (12.58%) and Jim Furyk (12.62%). Sergio Garcia, Kenny Perry, and Geoff Ogilvy have the highest rates at 15.00%, 14.71%, and 14.71% respectively, followed by Phil Mickelson (14.39%) and Vijay Singh (14.10%). Tiger’s low bogey percentage demonstrates his consistency and skill in limiting errors. Despite an aggressive playing style that produces many birdies and eagles, he avoids bogeys about as well as the players with the highest par rates. This points to a well-rounded game and composure on the course.

ggplot(Updated_Golf_Stats, aes(x = as.factor(Player), y = Bogies_Pct, fill = ifelse(Player == "Tiger Woods", "Tiger Woods", "Other"))) + 
  geom_boxplot() + 
  scale_fill_manual(values = c("Tiger Woods" = "red", "Other" = "gray25"), guide = "none") +
  theme_bw() + 
  theme(axis.text.x = element_text(angle = 60, hjust = 1),
        panel.grid.major = element_blank(),      # Remove major grid lines
        panel.grid.minor = element_blank(),      # Remove minor grid lines
        legend.position = "none") +             # Remove legend
  xlab("Player")

# Create a column to specify the color
AV_Bogies_Pct$color <- ifelse(AV_Bogies_Pct$Player == "Tiger Woods", "red", "gray25")

AV_Bogies_Pct <- data.frame(
  Player = c("Tiger Woods", "Sergio Garcia", "Kenny Perry", "Geoff Ogilvy", "Phil Mickelson", "Vijay Singh", "Zach Johnson", "Jim Furyk", "Steve Stricker", "Luke Donald"),
  Bogies_Percentage = c(12.37, 15, 14.71, 14.71, 14.39, 14.10, 13.90, 12.62, 12.58, 12.25)
)
# Generate the table with conditional formatting
knitr::kable(AV_Bogies_Pct,
             digits = c(0,2),
             align = "lr",
             col.names = c("Player", "Bogies%"),
             caption = "Performance Percentages Per Hole",
             table.attr = 'data-quarto-disable-processing = "true"') %>%
  kable_styling(full_width = FALSE) %>%
  row_spec(which(AV_Bogies_Pct$Player == "Tiger Woods"), bold = TRUE, color ="red")
Performance Percentages Per Hole

Player            Bogies%
Tiger Woods         12.37
Sergio Garcia       15.00
Kenny Perry         14.71
Geoff Ogilvy        14.71
Phil Mickelson      14.39
Vijay Singh         14.10
Zach Johnson        13.90
Jim Furyk           12.62
Steve Stricker      12.58
Luke Donald         12.25

Course Management

The “Course Management” section compares Tiger Woods with the other players on the following metrics, providing insight into his strategic performance.

  1. Driving Distance (YD): Measures tee shot distance.
  2. Driving Accuracy (DA): Tracks fairway hits off the tee.
  3. Greens in Regulation (GIR): Records green reach in regulation.
  4. Sand Save Percentage (SSP): Gauges success in bunker play.
  5. Putting Average (PA): Evaluates putting efficiency.

Each of these performance metrics provides valuable insights into a player’s overall game and strategic decision-making on the course. They help identify strengths and weaknesses in different aspects of a player’s game.

Driving Distance

These results provide the mean yards driven (mean_YD) off the tee for each player, with Tiger Woods averaging 303 yards, the highest among all players listed. This suggests that Tiger consistently achieves greater driving distance compared to his peers. Phil Mickelson follows closely behind with an average of 299 yards, while the rest of the players exhibit progressively lower mean distances. Tiger’s superior driving distance underscores his exceptional power off the tee, potentially providing him with a significant advantage in setting up for subsequent shots and attacking the course aggressively.

YD %>%
  arrange(mean_YD) %>%
  mutate(Player = factor(Player, levels = Player)) %>%
  ggplot(aes(x = Player, y = mean_YD, color = ifelse(Player == "Tiger Woods", "Tiger Woods", "Other"))) +
  geom_segment(aes(xend = Player, yend = 0)) +
  geom_point(size = 2) +
  scale_color_manual(values = c("Tiger Woods" = "red", "Other" = "gray25"), guide = "none") +             
  coord_flip() +
  xlab("") +                                     # Remove the player axis label
  theme_bw() +
  theme(panel.grid.major = element_blank(),      # Remove major grid lines
        panel.grid.minor = element_blank())      # Remove minor grid lines

Long Game Accuracy

The results give the average yards driven (mean_YD) and the average driving accuracy (mean_DA) off the tee for each player. Tiger Woods leads in mean_YD with 303 yards, showcasing his excellent driving distance, but his mean_DA of 59% is lower than that of several of his contemporaries. Phil Mickelson has a mean_YD of 299 yards and a mean_DA of 56%, making him slightly less accurate off the tee than Woods. Kenny Perry and Sergio Garcia combine impressive distance with decent accuracy, with mean_YD values of 298 and 297 yards and mean_DA values of 64% and 58% respectively. Steve Stricker and Jim Furyk drive the ball shorter but have exceptional mean_DA values of 66% and 71% respectively.

Tiger Woods therefore has exceptional driving distance, but his accuracy off the tee is not as reliable as that of some of his peers: he can hit the ball far, but there is room to hit the fairway more regularly. This isn’t surprising, because players who hit the ball farther off the tee tend to sacrifice some accuracy, while those who prioritize accuracy may not achieve the same distance.

ggplot(data = Course_Management) +
  geom_point(mapping = aes(x = mean_DA, y = mean_YD, colour = ifelse(Player == "Tiger Woods", "Tiger Woods", "Other"))) +
  geom_text(mapping = aes(x = mean_DA, y = mean_YD, label = Player), hjust = 0.4, vjust = 1.1, size = 2.3) +
  scale_color_manual(values = c("Tiger Woods" = "red", "Other" = "gray25"), guide = "none") +
  ggtitle("Long Game") +                         # Add title
  geom_smooth(mapping = aes(x = mean_DA, y = mean_YD)) +
  theme_bw() +
  theme(panel.grid.major = element_blank(),      # Remove major grid lines
        panel.grid.minor = element_blank())      # Remove minor grid lines
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'

DA and GIR

Despite a relatively low Driving Accuracy, Tiger Woods achieves the highest GIR percentage among the listed players, with a mean_GIR of 68%. This highlights his exceptional ball-striking and his proficiency in hitting greens in regulation.

Maintaining a high GIR percentage with a lower DA points to an aggressive strategy. While some players prioritize accuracy off the tee to avoid hazards, Woods may prioritize distance or line to set up more favorable approach shots. This lets him attack pins frequently and create birdie opportunities, compensating for inaccuracy off the tee.

Woods’ game is also characterized by strategic adaptability. Although he does not always hit fairways, he plays to his strengths, such as his iron play and his ability to recover from difficult positions, and so performs at a high level even when his tee shots miss the fairway.

# Create scatter plot, labelling and colouring only Tiger Woods
ggplot(Course_Management, aes(x = mean_DA, y = mean_GIR)) +
  geom_point(aes(color = Player == "Tiger Woods"), size = 2) +
  geom_text(aes(label = ifelse(Player == "Tiger Woods", "Tiger Woods", "")), 
            size = 3, vjust = 1.3) +  # Label only the Tiger Woods point
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "black")) +  # Define color palette
  labs(x = "Mean Driving Accuracy", y = "Mean Greens in Regulation", 
       title = "Driving Accuracy vs. Greens in Regulation") +
  theme_minimal() +  # Use minimal theme
  theme(legend.position = "none",  # Remove legend
        panel.background = element_blank())  # Remove background color

GIR

Tiger Woods is the only player in the dataset with a GIR percentage exceeding 68.5%, showcasing his exceptional proficiency in hitting greens in regulation. Five players fall within the 67% to 68.5% range, a strong performance but below Woods’ level. Three players have GIR percentages between 65% and 66%, which is still solid, and one player has a GIR below 62%, indicating lower proficiency in this aspect of the game. Hitting greens in regulation at such a high rate consistently puts Woods in position to score, sets him apart from his peers, and reinforces his reputation as one of the greatest golfers of all time.

ggplot(data = GIR) +
  geom_histogram(mapping = aes(x = mean_GIR, fill = ifelse(Player == "Tiger Woods", "Tiger Woods", "Other Players")), binwidth = 1.5, color = "black") +
  scale_fill_manual(values = c("Tiger Woods" = "red", "Other Players" = "gray25")) +
  ggtitle("GIR of the Players") +
  xlab("Mean GIR") +
  ylab("Frequency") +
  theme_minimal() +  # Use minimal theme
  theme(
    legend.position = "none",  # Remove legend
    panel.grid.major = element_blank(),  # Remove major gridlines
    panel.grid.minor = element_blank(),  # Remove minor gridlines
    panel.background = element_blank()  # Remove panel background
    )
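
The band counts quoted for the histogram can be checked numerically by binning the mean GIR values directly. The sketch below is illustrative only: the four `mean_GIR` values are invented, and the bin edges (starting at 61 in steps of 1.5) are an assumption that will not necessarily match the edges `geom_histogram()` chooses for `binwidth = 1.5`.

```r
# Hypothetical mean GIR values in percent (invented, not taken from the dataset)
gir_toy <- data.frame(
  Player   = c("Player A", "Player B", "Player C", "Player D"),
  mean_GIR = c(68.7, 67.5, 65.2, 61.8)
)

# Assumed bin edges: 61.0, 62.5, 64.0, 65.5, 67.0, 68.5, 70.0
breaks <- seq(61, 70, by = 1.5)

# Assign each player to a band and count players per band
gir_toy$Band <- cut(gir_toy$mean_GIR, breaks = breaks, include.lowest = TRUE)
band_counts <- table(gir_toy$Band)

band_counts
```

Applied to the real `GIR` table, the same two lines would show how many players fall into each band.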

Sand Save Percentage

Tiger Woods’ mean_SSP of 54% places him in the middle of the group. Although his sand save percentage is not the highest, it still shows that he recovers efficiently from greenside bunkers. Combined with his excellent Greens in Regulation (GIR) figure, it demonstrates his adaptability and skill in handling difficult situations on the course. Even where he does not excel in one particular aspect, Woods’ comprehensive skill set and ability to adjust contribute to his reputation as one of the greatest golfers in history.

SSP %>%
  mutate(Player = fct_reorder(Player, mean_SSP)) %>%
  ggplot(aes(x = Player, y = mean_SSP, fill = ifelse(Player == "Tiger Woods", "Tiger Woods", "Other"))) +
  geom_bar(stat = "identity", alpha = 0.6, width = 0.4) +
  scale_fill_manual(values = c("Tiger Woods" = "red", "Other" = "gray25"), guide = "none") +
  coord_flip() +
  xlab("") +
  theme_bw() +
   theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank())

Putting Average

In the violin plots of putting average (where lower is better), Tiger Woods stands out. A large share of his yearly putting averages sit at 1.74 or below, showing his strength on the greens. Steve Stricker, Phil Mickelson, and Zach Johnson have had competitive stretches with averages below 1.74, but none has matched Tiger Woods consistently. The difference shows in the range of their data compared with Tiger Woods’s, which underlines how consistently he performed.

# Create one violin plot per player, each in its own facet
ggplot(Updated_Golf_Stats, aes(x = Player, y = PA, fill = Player == "Tiger Woods")) +
  geom_violin(trim = FALSE, alpha = 0.5) +  # Do not trim the violin tails to the data range
  geom_boxplot(width = 0.1, fill = "white", color = "black") +  # Add a boxplot for reference
  scale_fill_manual(values = c("TRUE" = "red", "FALSE" = "gray25"), guide = "none") +  # Tiger Woods in red, everyone else in gray
  labs(y = "Putting Average", x = NULL, fill = NULL, title = "PA") +
  theme_minimal() +  # Set a minimal theme
  facet_wrap(~ Player, scales = "free")  # Facet by individual players with free scales

# Save the plot as an image file if needed
ggsave("violin_plots.png", width = 25, height = 20, dpi = 300)

Overall Performance Outcomes

This section examines golfers’ performance using metrics such as Top Ten Finishes (TT), Wins, Cuts Made (CM), and Outside of Top Ten Finishes (OTT). TT measures how consistently a golfer finishes in the Top Ten of a tournament. Wins demonstrate a player’s ability to beat opponents in high-stakes events. CM shows how often a golfer makes the cut and advances to the later rounds. OTT counts tournaments where a golfer makes the cut but finishes outside the top ten. Together these measures capture golfers’ consistency, success, and competitiveness on the professional tour.

Preparing Data

I filtered the data into separate datasheets for Tiger Woods and for the rest of the players so that the two can be compared, and then created a datasheet holding the average of the rest.

#Filtering and creating new datasheets
the_restPO <- filter(Overall_Performance_Outcomes, Player %in% c ("Zach Johnson", "Phil Mickelson", "Jim Furyk", "Steve Stricker", "Luke Donald", "Kenny Perry", "Geoff Ogilvy", "Vijay Singh", "Sergio Garcia"))

Tiger_WoodsPO <- filter(Overall_Performance_Outcomes, Player %in% c ("Tiger Woods"))

# Calculate the averages
mean_TT_Pct <- mean(the_restPO$mean_TT_Pct, na.rm = TRUE)
mean_Wins_Pct <- mean(the_restPO$mean_Wins_Pct, na.rm = TRUE)
mean_CM_Pct <- mean(the_restPO$mean_CM_Pct, na.rm = TRUE)
mean_Outside_TT_Pct <- mean(the_restPO$mean_Outside_TT_Pct, na.rm = TRUE)

# Create a new dataset
rest_average <- data.frame(
  Player = c("Rest Average"),
  mean_TT_Pct = mean_TT_Pct,
  mean_Wins_Pct = mean_Wins_Pct,
  mean_CM_Pct = mean_CM_Pct,
  mean_Outside_TT_Pct = mean_Outside_TT_Pct
)

# Print the new dataset
print(rest_average)
        Player mean_TT_Pct mean_Wins_Pct mean_CM_Pct mean_Outside_TT_Pct
1 Rest Average    37.16198      8.536309    86.94061            49.77863
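
As an alternative to filtering, averaging each column by hand and rebuilding a data frame, the split into “Tiger Woods” and “Rest Average” could be expressed as a grouped summary. This is a minimal sketch assuming dplyr 1.0 or later; the `outcomes` tibble stands in for Overall_Performance_Outcomes, and its player names and values are invented.

```r
library(dplyr)

# Hypothetical per-player outcome table (values are invented)
outcomes <- tibble(
  Player        = c("Tiger Woods", "Player B", "Player C"),
  mean_TT_Pct   = c(70, 40, 30),
  mean_Wins_Pct = c(40, 10, 6)
)

tiger_vs_rest <- outcomes %>%
  # Label every row as either Tiger Woods or part of the rest
  mutate(Category = if_else(Player == "Tiger Woods", "Tiger Woods", "Rest Average")) %>%
  group_by(Category) %>%
  # Average each mean_* column within the two categories
  summarise(across(starts_with("mean_"), ~ mean(.x, na.rm = TRUE)), .groups = "drop")

tiger_vs_rest
```

As in the code above, this is an unweighted average of per-player means, so every player in the rest counts equally regardless of how many events they entered.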

The second stage combines the datasheets so that I can compare the rest’s overall performance with Tiger Woods’s.

# Combine the Rest Average and Tiger Woods rows into one dataset
Tiger_Rest <- bind_rows(
  mutate(rest_average, Player = "Rest Average"),
  mutate(Tiger_WoodsPO, Player = "Tiger Woods")
)

# Print the combined dataset
print(Tiger_Rest)
        Player mean_TT_Pct mean_Wins_Pct mean_CM_Pct mean_Outside_TT_Pct
1 Rest Average    37.16198      8.536309    86.94061            49.77863
2  Tiger Woods    68.37302     38.535053    91.56085            23.18783
# Rename the columns
Tiger_Rest <- Tiger_Rest %>%
  rename(
    Category = Player,
    TT = mean_TT_Pct,
    Wins = mean_Wins_Pct,
    CM = mean_CM_Pct,
    OTT = mean_Outside_TT_Pct
  )
# Round up the values in TT, Wins, CM, and OTT columns
Tiger_Rest <- Tiger_Rest %>%
  mutate(
    TT = ceiling(TT),  # Round up
    Wins = ceiling(Wins),  # Round up
    CM = ceiling(CM),  # Round up
    OTT = ceiling(OTT)  # Round up
  )

# Print the modified dataset
print(Tiger_Rest)
      Category TT Wins CM OTT
1 Rest Average 38    9 87  50
2  Tiger Woods 69   39 92  24
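
Note that `ceiling()` always rounds up, which is why 37.16 appears as 38 and 23.19 as 24 in the table. The short sketch below, using the unrounded values printed earlier, compares `ceiling()` with `round()`; the object names `rest`, `tiger` and `rounding` are introduced here for illustration.

```r
# Unrounded percentages as printed in the combined dataset
rest  <- c(TT = 37.16198, Wins = 8.536309,  CM = 86.94061, OTT = 49.77863)
tiger <- c(TT = 68.37302, Wins = 38.535053, CM = 91.56085, OTT = 23.18783)

# One row per rounding rule and group
rounding <- rbind(
  rest_ceiling  = ceiling(rest),
  rest_round    = round(rest),
  tiger_ceiling = ceiling(tiger),
  tiger_round   = round(tiger)
)

print(rounding)
```

The two rules differ only where the decimal part is below 0.5: TT for the rest (38 vs. 37), and TT (69 vs. 68) and OTT (24 vs. 23) for Tiger Woods.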

Radar chart and Table

Tiger Woods rates notably higher than the Rest Average in the “TT” category, meaning a higher share of Top Ten finishes. He also surpasses the Rest Average in the “Wins” and “CM” categories, representing more victories and more cuts made. The Rest Average is higher only in the “OTT” category, indicating more frequent finishes outside the Top Ten. Tiger Woods finished in the top ten in 69% of his events, 31 percentage points more than his peers (38%). Similarly, he won 39% of his events against 9% for the rest, a gap of 30 percentage points. Cuts made are closer: 92% for Tiger Woods and 87% for the rest. The rest were about twice as likely to make the cut but finish outside the top ten (50%) as Tiger Woods (24%).
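
The gaps between Tiger Woods and the rest can be expressed either as a difference in percentage points or as a ratio, and the two read very differently. The sketch below computes both from the unrounded values printed earlier; the data frame `comparison` and its column names are introduced here for illustration.

```r
# Unrounded percentages as printed in the combined dataset
rest  <- c(TT = 37.16198, Wins = 8.536309,  CM = 86.94061, OTT = 49.77863)
tiger <- c(TT = 68.37302, Wins = 38.535053, CM = 91.56085, OTT = 23.18783)

comparison <- data.frame(
  Category   = names(rest),
  Rest       = rest,
  Tiger      = tiger,
  Point_Diff = tiger - rest,   # difference in percentage points
  Ratio      = tiger / rest,   # how many times the rest's rate
  row.names  = NULL
)

print(comparison, digits = 3)
```

For example, the gap in wins is about 30 percentage points, but as a ratio Tiger Woods’s win rate is more than four times that of the rest.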

library(ggplot2)
library(tidyr)

# Provided data
data <- data.frame(
  Category = c("TT", "Wins", "CM", "OTT"),
  Rest_Average = c(37.16198, 8.536309, 86.94061, 49.77863),
  Tiger_Woods = c(68.37302, 38.535053, 91.56085, 23.18783)
)

# Convert data to long format
data_long <- pivot_longer(data, -Category, names_to = "Player", values_to = "Value")

# Plot radar chart with solid fill color
ggplot(data_long, aes(x = Category, y = Value, group = Player, fill = Player)) +
  geom_polygon(linewidth = 1, alpha = 0.5) +
  geom_point(aes(color = Player), size = 3, show.legend = FALSE) +
  ylim(0, 100) +  # Adjust the y-axis limits as needed
  scale_color_manual(values = c(Tiger_Woods = "red", Rest_Average = "gray25")) +  # Set colors for Tiger Woods and Rest Average
  scale_fill_manual(values = c(Tiger_Woods = "red", Rest_Average = "gray25")) +  # Set fill colors for Tiger Woods and Rest Average
  coord_polar() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 0, vjust = 0.5, hjust = 1)) +
  labs(title = "Tiger Woods vs The Rest")

# Print the modified dataset as a styled table
knitr::kable(Tiger_Rest,
             align = "lr",  # Align columns to the left and right
             col.names = c("Category", "TT", "Wins", "CM", "OTT"),  # Column names
             caption = "Overall Performance Outcomes",  # Table caption
             table.attr = 'data-quarto-disable-processing = "true"') %>%  
  kable_styling(full_width = F) %>%  # Set width and position of the table
  row_spec(which(Tiger_Rest$Category == "Tiger Woods"), bold = TRUE, color = "white", background = "red") %>%  # Highlight "Tiger Woods" row
  row_spec(which(Tiger_Rest$Category == "Rest Average"), bold = TRUE, color = "white", background = "gray25")  # Highlight "Rest Average" row
Overall Performance Outcomes
Category TT Wins CM OTT
Rest Average 38 9 87 50
Tiger Woods 69 39 92 24

Conclusion

Tiger Woods demonstrated remarkable performance in Performance per Hole, Course Management, and Overall Performance Outcomes from 2005 to 2010. His performance on each hole was exceptional, with an Eagle percentage of 1.04, a Birdie percentage of 24, and a Bogey percentage of 12.37. Woods was able to play an assertive game consistently, built around his Driving Distance average of 303 yards. His Greens in Regulation (GIR) percentage of 68.73% let him sustain that style even though his Driving Accuracy was only 59%. Woods’s sand save percentage was ordinary at 54%, but his putting average was outstanding at 1.74, and he maintained that level consistently. Woods’s results were far beyond those of the others, especially his win percentage (39% vs. 9%) and his rate of top-ten finishes (69% vs. 38%).