Golf Data Analysis


Lucas Young
Original: 31 October 2020
Updated: 9 Aug 2022


From 2018 through 2021 I collected data on most of my golf rounds. After each round, I entered the data into a spreadsheet in which each row corresponds to a 9-hole round. Some entries are incomplete because of weather delays, running out of daylight, time constraints, etc. The vast majority of rounds were played on the public courses in Fargo, North Dakota, but several other courses appear in the data set.


Fargo public golf courses:

  • Rose Creek (Front and Back)

  • Edgewood (Front and Back)

  • Osgood

  • Prairiewood

  • El Zagal


In 2018 I collected the following variables:


In 2019, 2020, and 2021 I collected all of the variables from 2018 and added the following:


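In this analysis I will attempt to answer the following questions:

  • Have my scores changed over time?

  • Have my FIRs, GIRs, and putts per hole changed over time?

  • How do FIRs, GIRs, and putts per hole affect score?

  • Do wind speed and temperature affect score?

  • Does score vary by course?

  • Does tee time affect score?

  • Does score vary by weekday?

  • Does the number of playing partners affect score?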



Data Processing

Load Relevant Libraries

library(tidyverse)
library(magrittr)
library(janitor)
library(lubridate)
library(ggthemes)
library(scales)
library(ggpubr)
library(corrplot)


Set Global Image Options

knitr::opts_chunk$set(dpi = 200)


Load 2018 Data

Golf_2018 <- read_csv('./2018 Golf.csv', n_max = 105)


Load 2019 Data

Golf_2019 <- read_csv('./2019 Golf.csv', n_max = 143)


Load 2020 Data

Golf_2020 <- read_csv('./2020 Golf.csv', n_max = 177)


Load 2021 Data

Golf_2021 <- read_csv('./2021 Golf.csv', n_max = 225)
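The four load steps differ only in the year and the row limit, so they could be folded into one helper. The sketch below is one possible way to do that, not the method used in this analysis: `RowLimits` and `LoadSeason` are hypothetical names, and the row limits are simply copied from the `read_csv()` calls.

```r
library(readr)

# Row limits copied from the four read_csv() calls
RowLimits <- c('2018' = 105, '2019' = 143, '2020' = 177, '2021' = 225)

# Hypothetical helper: read one season's file and keep only the first RowLimit rows
LoadSeason <- function(Year, RowLimit, Dir = '.'){
  read_csv(file.path(Dir, paste(Year, 'Golf.csv')), n_max = RowLimit)
}

# With the CSV files in the working directory, all seasons load in one step:
# Seasons <- Map(LoadSeason, names(RowLimits), RowLimits)
```

`Seasons` would then be a named list with one data frame per year, ready to pass to `bind_rows()`.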


Merge Data

Merged_Golf <- bind_rows(Golf_2018, Golf_2019, Golf_2020, Golf_2021)

names(Merged_Golf) <- 
  make_clean_names(
    names(Merged_Golf),
    case = 'upper_camel'
)

Merged_Golf %<>% rename(FIR = Fir, GIR = Gir)

Merged_Golf$Date %<>% strptime(format = '%d-%b-%y') %>% as.Date
Merged_Golf$Course %<>% as.factor
Merged_Golf$Wind %<>% parse_number
Merged_Golf$Temp %<>% parse_number
Merged_Golf$Tee %<>% as.factor
Merged_Golf$FeeAvoided %<>% parse_number

# Ordered factor so weekdays sort Monday through Sunday rather than alphabetically
Merged_Golf$Weekday <- 
  factor(
    weekdays(Merged_Golf$Date), 
    levels = c(
      'Monday',
      'Tuesday',
      'Wednesday',
      'Thursday',
      'Friday',
      'Saturday',
      'Sunday'
    ), 
    ordered = TRUE
  )

Merged_Golf$Year <- as.factor(year(Merged_Golf$Date))
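To make the cleaning step concrete, here is a small sketch of what the date format string and `parse_number()` do. The raw values are made up, and the assumption that cells look like `'12 mph'` is mine; the real spreadsheet may be formatted differently.

```r
library(readr)

# Made-up raw values; the real spreadsheet cells may be formatted differently
RawDate <- '31-Oct-20'
RawWind <- c('12 mph', '5', 'calm')

# '%d-%b-%y' reads day, abbreviated month name, two-digit year
# (month names are matched in the current locale, assumed English here)
ParsedDate <- as.Date(strptime(RawDate, format = '%d-%b-%y'))

# parse_number() keeps the first number it finds and drops surrounding text;
# entries with no number become NA (with a parsing warning)
ParsedWind <- parse_number(RawWind)
```

Because unparseable entries silently become `NA`, it is worth checking `sum(is.na(Merged_Golf$Wind))` after cleaning, which is part of what the histograms in the next section do visually.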


Approximate Number of Playing Partners

PartnerCounter <- function(Partners){
  
  # Solo rounds are recorded as NA or 'None'
  if (is.na(Partners) || Partners == 'None'){
    return(0)
  }
  
  # Each separator between names adds one more partner.
  # 'and' is matched as a whole word so names such as 'Brandon' are not counted.
  CommaCount <- str_count(Partners, ',')
  
  AndCount <- str_count(Partners, '\\band\\b')
  
  AndSymbolCount <- str_count(Partners, '&')
  
  PartnerCount <- CommaCount + AndCount + AndSymbolCount + 1
  
  return(PartnerCount)
  
}

Merged_Golf$PartnerCount <- sapply(Merged_Golf$Partners, PartnerCounter)
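Since `str_count()` is already vectorized, the same count can be computed without `sapply()`. The sketch below is a hypothetical variant (`PartnerCounterVec` is not part of the original analysis) run on made-up entries. It matches 'and' only as a whole word so that a name such as 'Brandon' is not treated as a separator.

```r
library(stringr)

# Hypothetical vectorized variant of PartnerCounter
PartnerCounterVec <- function(Partners){
  # Number of separators between names: ',', the word 'and', or '&'
  Separators <- str_count(Partners, ',|\\band\\b|&')
  # NA or 'None' means a solo round; otherwise names = separators + 1
  ifelse(is.na(Partners) | Partners == 'None', 0, Separators + 1)
}

# Made-up example entries
Counts <- PartnerCounterVec(c('None', NA, 'Alex', 'Alex & Sam', 'Alex, Sam and Pat'))
Counts
```

The count stays approximate: an entry written with a serial comma, such as 'Alex, Sam, and Pat', contains three separators and would be counted as four partners.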



Data Verification

Date:

Merged_Golf %>% 
filter(!is.na(Total)) %>% 
  
ggplot(aes(x = Date)) + 
geom_histogram(binwidth = 15, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year, scales = 'free_x') +
theme_bw() + 
xlab(paste('\n','Month')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Date', '\n')) + 
theme(plot.title = element_text(hjust = 0.5))


Course:

sort(unique(Merged_Golf$Course))
##  [1] 3 Bears                       Albany Back                  
##  [3] Albany Front                  Applewood Hills (Stillwater) 
##  [5] Buffalo Heights               Chaska Par 30                
##  [7] Columbia Front                Cottonwood                   
##  [9] Edelweiss Back                Edelweiss Front              
## [11] Edgebrook                     Edgewood Back                
## [13] Edgewood Front                El Zagal                     
## [15] Emerald Greens Back           Emerald Greens Front         
## [17] Emerald Greens Gold           Garrison                     
## [19] Hawley Back                   Hawley Front                 
## [21] Hiawatha Back                 Hiawatha Front               
## [23] Maple River Back              Maple River Front            
## [25] Meadows Back                  Meadows Front                
## [27] Moorhead CC Back              Moorhead CC Front            
## [29] Norsk                         Osgood                       
## [31] Osgood 3 Hole                 Pinewood                     
## [33] Pleasant View Lake            Pleasant View Woods          
## [35] Prairiewood                   Rose Creek Back              
## [37] Rose Creek Front              Royal Golf Club Back         
## [39] Royal Golf Club Front         South Hills (Waterloo)       
## [41] Stillwater Oaks Back          Stillwater Oaks Front        
## [43] Sunrise Ridge                 The Lakes                    
## [45] Theodore Wirth Front          Valley Golf (Wilmar)         
## [47] Village Green Back            Village Green Front          
## [49] Waverly Municipal Golf Course
## 49 Levels: 3 Bears Albany Back Albany Front ... Waverly Municipal Golf Course


Holes:

HoleScores <- 
  Merged_Golf %>% 
  select(Hole1, Hole2, Hole3, Hole4, Hole5, Hole6, Hole7, Hole8, Hole9, Year) %>%
  pivot_longer(
    cols = c(Hole1, Hole2, Hole3, Hole4, Hole5, Hole6, Hole7, Hole8, Hole9)
  )

names(HoleScores) <- c('Year', 'Hole', 'Score')

HoleScores %>% 
filter(!is.na(Score)) %>% 
  
ggplot(aes(x = Score)) + 
geom_histogram(binwidth = 1, fill = '#DE5246', color = 'black') + 
scale_x_continuous(labels = c('1', '2', '3', '4', '5', '6', '7', '8', '9'), 
breaks = c(1, 2, 3, 4, 5, 6, 7, 8, 9)) +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Individual Hole Score')) + 
ylab(paste('Number of Holes', '\n')) +
ggtitle(paste('Histogram of Individual Hole Scores', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
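The reshape behind this histogram turns one row per round into one row per hole. A minimal sketch on made-up data shows the idea; `Rounds` and `HoleScoresToy` are hypothetical names, and `names_to` / `values_to` are used here to name the new columns directly instead of renaming afterwards.

```r
library(tidyr)
library(tibble)

# Two made-up rounds with three holes each, just to show the reshape
Rounds <- tibble(
  Year  = c(2020, 2021),
  Hole1 = c(4, 5),
  Hole2 = c(3, 4),
  Hole3 = c(6, NA)   # NA stands for a hole that was not played
)

# One row per (round, hole); unplayed holes are dropped
HoleScoresToy <- pivot_longer(
  Rounds,
  cols = starts_with('Hole'),
  names_to = 'Hole',
  values_to = 'Score',
  values_drop_na = TRUE
)
```

The result has the columns `Year`, `Hole`, and `Score`, with five rows for the five holes that have a recorded score.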


Total Score:

Merged_Golf %>% 
filter(!is.na(Total)) %>% 
  
ggplot(aes(x = Total)) +
geom_histogram(binwidth = 2, fill = '#DE5246', color = 'black') + 
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'Score (Total)')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of 9-Hole Round Scores', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


To Par:

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
  
ggplot(aes(x = ToPar)) +
geom_histogram(binwidth = 2, fill = '#DE5246', color = 'black') + 
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'Score (To Par)')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of 9-Hole Round Scores Relative to Par', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Wind:

Merged_Golf %>% 
filter(!is.na(Wind)) %>%
  
ggplot(aes(x = Wind)) +       
geom_histogram(binwidth = 2, fill = '#DE5246', color = 'black') + 
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'Wind Speed (mph)')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Wind Speed', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Temp:

Merged_Golf %>% 
filter(!is.na(Temp)) %>% 
  
ggplot(aes(x = Temp)) + 
geom_histogram(binwidth = 5, fill = '#DE5246', color = 'black') + 
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'Temp (°F)')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Temperature', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


FIR:

Merged_Golf %>% 
filter(!is.na(FIR)) %>% 
  
ggplot(aes(x = FIR)) + 
geom_histogram(binwidth = 0.1, fill = '#DE5246', color = 'black') + 
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'FIRs (%)')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of FIRs', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


GIR:

Merged_Golf %>% 
filter(!is.na(GIR)) %>% 
  
ggplot(aes(x = GIR)) + 
geom_histogram(binwidth = 0.11, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'GIRs (%)')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of GIRs', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Putts / Hole:

Merged_Golf %>% 
filter(!is.na(PuttsPerHole)) %>% 
  
ggplot(aes(x = PuttsPerHole)) + 
geom_histogram(binwidth = 0.12, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'Putts / Hole')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Putts / Hole', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Time:

Merged_Golf %>% 
filter(!is.na(Time)) %>% 
filter(!is.na(Total)) %>% 
  
ggplot(aes(x = Time)) + 
geom_histogram(bins = 15, fill = '#DE5246', color = 'black') +
scale_x_time(labels = time_format('%H:%M')) + 
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'Start Time')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of 9-Hole Round Start Times', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Partner Count:

Merged_Golf %>% 
filter(!is.na(PartnerCount)) %>% 
filter(!is.na(Total)) %>% 
  
ggplot(aes(x = PartnerCount)) + 
geom_histogram(binwidth = 1, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() + 
xlab(paste('\n', 'Number of Playing Partners')) + 
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Playing Partners', '\n')) +
theme(plot.title = element_text(hjust = 0.5))



Results

Have my scores changed over time?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>% 
  
ggplot(aes(x = Date, y = ToPar)) + 
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.97, 
  label.y.npc = 0.90, 
  hjust = 1
) +
theme_bw() + 
xlab(paste('\n', 'Month')) + 
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('9-Hole Scores Over Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))

Merged_Golf %>% 
filter(!is.na(ToPar)) %>% 
  
ggplot(aes(y = ToPar, fill = Year)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of 9-Hole Round Scores', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  axis.title.x=element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank()
)

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
  
ggplot(aes(x = ToPar, fill = Year)) +
geom_density(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() + 
xlab(paste('\n', 'Score (To Par)')) + 
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of 9-Hole Round Scores', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
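The R and p values printed on the scatter plot by `stat_cor()` come from a Pearson correlation test between date and score. As a rough sketch of the same calculation outside the plot, the example below runs `cor.test()` on synthetic data; the numbers are generated for illustration and are not my actual rounds.

```r
# Synthetic stand-in for one season: made-up dates and scores, for illustration only
set.seed(1)
Toy <- data.frame(
  Date  = as.Date('2020-05-01') + 0:59,
  ToPar = 8 - 0.05 * (0:59) + rnorm(60)
)

# cor.test() needs numeric input, so the date is converted to days since 1970-01-01
Result <- cor.test(as.numeric(Toy$Date), Toy$ToPar, method = 'pearson')

Result$estimate  # Pearson correlation coefficient
Result$p.value   # p-value for the null hypothesis of zero correlation
```

On the real data, the equivalent call for one year would use `Merged_Golf$Date` and `Merged_Golf$ToPar` after filtering to that year and dropping missing scores.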


Have my FIRs changed over time?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>% 
filter(!is.na(FIR)) %>% 
  
ggplot(aes(x = Date, y = FIR)) + 
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.97, 
  label.y.npc = 0.97, 
  hjust = 1
) +
theme_bw() + 
xlab(paste('\n', 'Month')) + 
ylab(paste('FIRs (%)', '\n')) +
ggtitle(paste('FIRs Over Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


How do FIRs affect score?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>% 
filter(!is.na(FIR)) %>% 
  
ggplot(aes(x = FIR, y = ToPar)) + 
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.05, 
  label.y.npc = 0.93, 
  hjust = 0
) +
theme_bw() + 
xlab(paste('\n', 'FIRs (%)')) + 
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs FIRs', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Have my GIRs changed over time?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>% 
filter(!is.na(GIR)) %>% 
  
ggplot(aes(x = Date, y = GIR)) + 
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.05, 
  label.y.npc = 0.95, 
  hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'Month')) + 
ylab(paste('GIRs (%)', '\n')) +
ggtitle(paste('GIRs Over Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


How do GIRs affect score?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>% 
filter(!is.na(GIR)) %>% 
  
ggplot(aes(x = GIR, y = ToPar)) + 
geom_point(color = '#DE5246', size = 1.5, alpha = 0.3) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.97, 
  label.y.npc = 0.97, 
  hjust = 1
) +
xlim(0, 1) +
theme_bw() +
xlab(paste('\n', 'GIRs (%)')) + 
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs GIRs', '\n')) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(caption = paste(
  '\n', 
  'Note: Transparency has been added to show that', 
  'several points share the same coordinates.')
) + 
theme(plot.caption = element_text(hjust = 0))


Have my putts per hole changed over time?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>% 
filter(!is.na(PuttsPerHole)) %>% 
  
ggplot(aes(x = Date, y = PuttsPerHole)) + 
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.97, 
  label.y.npc = 0.97, 
  hjust = 1
) +
theme_bw() +
xlab(paste('\n', 'Month')) + 
ylab(paste('Putts / Hole', '\n')) +
ggtitle(paste('Putts / Hole Over Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


How do putts per hole affect score?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>% 
filter(!is.na(PuttsPerHole)) %>% 
  
ggplot(aes(x = PuttsPerHole, y = ToPar)) + 
geom_point(color = '#DE5246', size = 1.5, alpha = 0.3) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.00, 
  label.y.npc = 0.89, 
  hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'Putts / Hole')) + 
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Putts / Hole', '\n')) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(caption = paste(
  '\n', 
  'Note: Transparency has been added to show that', 
  'several points share the same coordinates.')
) + 
theme(plot.caption = element_text(hjust = 0))


Does wind speed affect score?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>%
filter(!is.na(Wind)) %>% 
  
ggplot(aes(x = Wind, y = ToPar)) + 
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.95, 
  label.y.npc = 0.98, 
  hjust = 1
) +
theme_bw() +
xlab(paste('\n', 'Wind Speed (mph)')) + 
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Wind Speed', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Does temperature affect score?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>% 
filter(!is.na(Temp)) %>% 
  
ggplot(aes(x = Temp, y = ToPar)) + 
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.00, 
  label.y.npc = 0.88, 
  hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'Temp (°F)')) + 
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Temp', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Does score vary by course?

Merged_Golf_Fargo <- 
  Merged_Golf %>%
  filter(
    !is.na(ToPar),
    Course == 'Rose Creek Front' |  
    Course == 'Rose Creek Back' |
    Course == 'Edgewood Front' |
    Course == 'Edgewood Back' |
    Course == 'Osgood' |
    Course == 'Prairiewood' |
    Course == 'El Zagal'
  )

Merged_Golf_Fargo$Course <- 
  factor(
    Merged_Golf_Fargo$Course, 
    levels = c(
      'Rose Creek Front',
      'Rose Creek Back',
      'Edgewood Front',
      'Edgewood Back',
      'Osgood',
      'Prairiewood',
      'El Zagal'
    ), 
    ordered = TRUE
  )

Merged_Golf_Fargo %>% 
filter(Year == 2020) %>% 
filter(!is.na(ToPar)) %>% 
filter(
  Course == 'Rose Creek Front' |  
  Course == 'Rose Creek Back' |
  Course == 'Edgewood Front' |
  Course == 'Edgewood Back'
) %>% 
  
ggplot(aes(x = ToPar, fill = Course)) +
facet_grid(. ~ Course) +
geom_density(alpha = 0.5) +
xlim(-3, 13) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) + 
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of Par 5 Course Scores (2020)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  legend.position = 'none'
)

Merged_Golf_Fargo %>% 
filter(Year == 2021) %>% 
filter(!is.na(ToPar)) %>% 
filter(
  Course == 'Rose Creek Front' |  
  Course == 'Rose Creek Back' |
  Course == 'Edgewood Front' |
  Course == 'Edgewood Back'
) %>% 
  
ggplot(aes(x = ToPar, fill = Course)) +
facet_grid(. ~ Course) +
geom_density(alpha = 0.5) +
xlim(-3, 10) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) + 
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of Par 5 Course Scores (2021)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  legend.position = 'none'
)

Merged_Golf_Fargo %>% 
filter(Year == 2020) %>% 
filter(!is.na(ToPar)) %>% 
filter(Course == 'Rose Creek Front' |  
       Course == 'Rose Creek Back' |
       Course == 'Edgewood Front' |
       Course == 'Edgewood Back'
) %>%  
  
ggplot(aes(Course, ToPar, fill = Course)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Par 5 Course Scores (2020)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  axis.title.x=element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank()
)

Merged_Golf_Fargo %>% 
filter(
  Year == 2021,
  !is.na(ToPar),
  Course == 'Rose Creek Front' |  
  Course == 'Rose Creek Back' |
  Course == 'Edgewood Front' |
  Course == 'Edgewood Back'
) %>%  

ggplot(aes(Course, ToPar, fill = Course)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Par 5 Course Scores (2021)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  axis.title.x=element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank()
)

Merged_Golf_Fargo %>% 
filter(
  Year == 2020,
  !is.na(ToPar), 
  Course == 'Osgood' |
  Course == 'Prairiewood' |
  Course == 'El Zagal'
) %>% 
  
ggplot(aes(x = ToPar, fill = Course)) +
facet_grid(. ~ Course) +
geom_density(alpha = 0.5) +
xlim(-1, 10) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) + 
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of Par 3 Course Scores (2020)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  legend.position = 'none'
)

Merged_Golf_Fargo %>% 
filter(
  Year == 2021,
  !is.na(ToPar), 
  Course == 'Osgood' |
  Course == 'Prairiewood' |
  Course == 'El Zagal'
) %>% 
  
ggplot(aes(x = ToPar, fill = Course)) +
facet_grid(. ~ Course) +
geom_density(alpha = 0.5) +
xlim(-3, 10) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) + 
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of Par 3 Course Scores (2021)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  legend.position = 'none'
)

Merged_Golf_Fargo %>% 
filter(
  Year == 2020,
  !is.na(ToPar),
  Course == 'Osgood' |
  Course == 'Prairiewood' |
  Course == 'El Zagal'
) %>%

ggplot(aes(Course, ToPar, fill = Course)) +
geom_boxplot(alpha = 0.5) +
ylim(0, 9) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Par 3 Course Scores (2020)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  axis.title.x=element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank()
)
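The repeated `Course == ... | Course == ...` filters in this section could be written more compactly with `%in%`. The sketch below shows the idea on made-up rounds; `Toy`, `Par3Courses`, and `Filtered` are hypothetical names.

```r
library(dplyr)

# Made-up rounds, only to show the filter
Toy <- tibble(
  Course = c('Osgood', 'Norsk', 'El Zagal', 'Prairiewood'),
  ToPar  = c(3, 4, NA, 5)
)

# The three courses compared in the "Par 3 Course" plots
Par3Courses <- c('Osgood', 'Prairiewood', 'El Zagal')

# Course %in% Par3Courses is TRUE for any listed course,
# replacing the chain of == comparisons joined with |
Filtered <- Toy %>% 
  filter(!is.na(ToPar), Course %in% Par3Courses)
```

Keeping each course group in a named vector also means the list only has to be edited in one place if a course is added.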


Does tee time affect score?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>% 
filter(!is.na(Time)) %>% 
  
ggplot(aes(x = Time, y = ToPar)) + 
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year) +
scale_x_time(labels = time_format('%H:%M')) + 
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.2, 
  label.y.npc = 0.86, 
  hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'Tee Time')) + 
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Tee Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))


Does score vary by weekday?

Merged_Golf %>% 
filter(
  Year == 2020,
  !is.na(ToPar),
  !is.na(Weekday)
) %>% 
  
ggplot(aes(Weekday, ToPar, fill = Weekday)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Scores by Weekday (2020)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  axis.title.x=element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank()
)

Merged_Golf %>% 
filter(
  Year == 2021,
  !is.na(ToPar),
  !is.na(Weekday)
) %>% 
    
ggplot(aes(Weekday, ToPar, fill = Weekday)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Scores by Weekday (2021)', '\n')) +
theme(
  plot.title = element_text(hjust = 0.5),
  axis.title.x=element_blank(),
  axis.text.x = element_blank(),
  axis.ticks.x = element_blank()
)


Does the number of playing partners affect score?

Merged_Golf %>% 
filter(!is.na(ToPar)) %>% 
filter(!is.na(PartnerCount)) %>% 
  
ggplot(aes(x = PartnerCount, y = ToPar)) + 
geom_point(color = '#DE5246', size = 1.5, alpha = 0.3) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
  method = 'pearson', 
  color = 'black',
  size = 4,
  label.x.npc = 0.0, 
  label.y.npc = 0.875, 
  hjust = 0
) +
xlim(0,4) +
theme_bw() +
xlab(paste('\n', 'Number of Playing Partners')) + 
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Number of Playing Partners', '\n')) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(caption = paste(
  '\n', 
  'Note: Transparency has been added to show that', 
  'several points share the same coordinates.')
) + 
theme(plot.caption = element_text(hjust = 0))


Correlation Plot of All Numeric Variables

Merged_Golf %>% 
select(
  ToPar, 
  Wind, 
  Temp, 
  FIR, 
  GIR, 
  PuttsPerHole, 
  PartnerCount
) %>% 
cor(use = 'complete.obs') %>% 
corrplot.mixed(
  lower = 'number', 
  upper = 'circle',
  tl.col = 'black',
  lower.col = 'black', 
  tl.cex = 0.65
)
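One thing to keep in mind when reading this plot: `use = 'complete.obs'` drops every round that is missing any of the selected variables, so rounds without FIR, GIR, or putts data (which appears to include all of 2018, given the `Year != 2018` filters above) do not contribute to any cell, including ToPar vs Wind. The toy example below, with made-up numbers, shows how that differs from `'pairwise.complete.obs'`.

```r
# Made-up data: the first two rounds have no GIR recorded
Toy <- data.frame(
  ToPar = c(5, 3, 4, 2, 6, 1),
  Wind  = c(10, 6, 9, 4, 12, 3),
  GIR   = c(NA, NA, 0.2, 0.5, 0.1, 0.6)
)

# Number of rounds that survive when every variable must be present
sum(complete.cases(Toy))

# Every correlation uses only the rounds with no missing values
Complete <- cor(Toy, use = 'complete.obs')

# Each pair of variables uses all rounds where that pair is present
Pairwise <- cor(Toy, use = 'pairwise.complete.obs')
```

Here the ToPar vs Wind entry is computed from four rounds in `Complete` and from all six in `Pairwise`. Neither choice is wrong, but they answer slightly different questions.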



Conclusion

Variables that affect score:

  • Date (later date correlates with lower score) - I appear to be getting better at golf :P

  • GIRs (more GIRs correlates with lower score)

  • Putts / Hole (fewer putts / hole correlates with lower score)

  • Wind in 2018 & 2019 (more wind correlates with higher score) - interestingly, this effect went away in 2020

  • Temp (warmer temp correlates with lower score) - this is likely confounded by the fact that it’s colder early in the year and my scores have been decreasing with time

  • Course (certain courses correlate with lower scores) - in addition to some courses being harder than others, I also play certain courses a lot more (Rose Creek and Prairiewood)

  • Number of Playing Partners (fewer playing partners correlates with lower scores) - this is likely confounded by the number of playing partners affecting pace of play; I tend to play better at a fast pace
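The suspected confounding of Temp by date could be checked by fitting score against both variables at once. The sketch below illustrates the idea on synthetic data in which score improves with time only and temperature merely rises alongside it. All numbers are generated for illustration and are not my rounds.

```r
# Synthetic data, generated only to illustrate confounding
set.seed(42)
Day   <- 1:200                                   # days since the first round
Temp  <- 50 + 0.15 * Day + rnorm(200, sd = 5)    # gets warmer as time goes on
ToPar <- 12 - 0.03 * Day + rnorm(200, sd = 2)    # improves with time, not with Temp

Unadjusted <- lm(ToPar ~ Temp)        # Temp on its own appears to lower score
Adjusted   <- lm(ToPar ~ Temp + Day)  # Temp effect after accounting for time

coef(Unadjusted)['Temp']
coef(Adjusted)['Temp']
```

In this synthetic setup the Temp coefficient shrinks toward zero once `Day` is in the model. A comparable check on the real data, probably best done within a single year, would be `lm(ToPar ~ Temp + as.numeric(Date), data = Merged_Golf)`; a Temp coefficient that stays clearly negative would suggest temperature matters beyond the time trend.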


Variables that don’t affect score:

  • FIRs

  • Wind in 2020 - this was probably the most surprising finding of this analysis

  • Tee Time