Lucas Young
Original: 31 October 2020
Updated: 9 Aug 2022
Since 2018 I have collected data on most of my golf rounds. After each round, I compiled the data into a spreadsheet in which each row corresponds to a 9-hole round. Some entries are incomplete because of weather delays, running out of daylight, time constraints, etc. The vast majority of rounds were played on the public courses in Fargo, North Dakota, but several other courses appear in the data set.
Fargo public golf courses:
Rose Creek: 18 holes - par 71
Edgewood: 18 holes - par 71
Osgood: 9 holes - par 33
Prairiewood: 9 holes - par 32
El Zagal: 9 holes - par 27
In 2018 I collected the following variables:
Date - date of the 9-hole round
Course - golf course the 9-hole round took place on
Score for each hole - number of strokes required to complete each hole
Score (Total) - total number of strokes required to complete the 9-hole round
Score (To Par) - Score (Total) minus par for the 9-hole course
Wind - wind speed during the 9-hole round measured in miles per hour
Temp - air temperature during the 9-hole round measured in degrees Fahrenheit
Tee - which tees the 9-hole round was played from (red, white, blue, black, etc.)
Playing partners - other people I played the 9-hole round with
In 2019, 2020, and 2021 I collected all of the variables from 2018 and added the following:
Putts / Hole (Putts Per Hole) - calculated by taking the total number of putts for the 9-hole round and dividing by 9
FIR (Fairway In Regulation) - recorded as a percentage for the 9-hole round, # of fairways hit from the tee on par 4 and par 5 holes divided by the number of par 4 and par 5 holes in the 9-hole round
GIR (Green In Regulation) - recorded as a percentage for the 9-hole round, # of greens hit in par minus 2 (or fewer) strokes divided by 9
Time - time of day the 9-hole round started
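To make the definitions above concrete, here is a minimal worked example in base R. The hole-by-hole numbers are made up for illustration, and the column names (Par, StrokesToGreen, Putts, FairwayHit) are hypothetical rather than columns from the actual spreadsheet.

```r
# Hypothetical 9-hole round; every value below is invented for illustration
holes <- data.frame(
  Par            = c(4, 3, 5, 4, 4, 3, 4, 5, 4),
  StrokesToGreen = c(3, 1, 4, 2, 3, 2, 2, 3, 3),
  Putts          = c(2, 2, 2, 2, 2, 1, 1, 2, 2),
  # Fairways only count on par 4 and par 5 holes, so par 3 holes are NA
  FairwayHit     = c(TRUE, NA, FALSE, TRUE, TRUE, NA, FALSE, TRUE, FALSE)
)

# Score (Total) and Score (To Par)
total  <- sum(holes$StrokesToGreen + holes$Putts)  # 39
to_par <- total - sum(holes$Par)                   # 39 - 36 = 3

# Putts / Hole: total putts divided by 9
putts_per_hole <- sum(holes$Putts) / 9             # 16 / 9

# FIR: fairways hit divided by the number of par 4 and par 5 holes
driving_holes <- holes$Par >= 4
fir <- sum(holes$FairwayHit[driving_holes]) / sum(driving_holes)  # 4 / 7

# GIR: greens reached in (par - 2) strokes or fewer, divided by 9
gir <- sum(holes$StrokesToGreen <= holes$Par - 2) / 9             # 4 / 9

round(c(ToPar = to_par, PuttsPerHole = putts_per_hole, FIR = fir, GIR = gir), 2)
```

Note that FIR and GIR come out as proportions between 0 and 1 here, which matches the scale used on the histogram axes later in the analysis.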
In this analysis I will attempt to answer the following questions:
Have my scores changed over time?
Have my FIRs changed over time?
How do FIRs affect score?
Have my GIRs changed over time?
How do GIRs affect score?
Have my putts per hole changed over time?
How do putts per hole affect score?
Does wind speed affect score?
Does temperature affect score?
Does score vary by course?
Does tee time affect score?
Does score vary by weekday?
Does the number of playing partners affect score?
Load Relevant Libraries
library(tidyverse)
library(magrittr)
library(janitor)
library(lubridate)
library(ggthemes)
library(scales)
library(ggpubr)
library(corrplot)
Set Global Image Options
knitr::opts_chunk$set(dpi = 200)
Load 2018 Data
Golf_2018 <- read_csv('./2018 Golf.csv', n_max = 105)
Load 2019 Data
Golf_2019 <- read_csv('./2019 Golf.csv', n_max = 143)
Load 2020 Data
Golf_2020 <- read_csv('./2020 Golf.csv', n_max = 177)
Load 2021 Data
Golf_2021 <- read_csv('./2021 Golf.csv', n_max = 225)
Merge Data
Merged_Golf <- bind_rows(Golf_2018, Golf_2019, Golf_2020, Golf_2021)
names(Merged_Golf) <-
make_clean_names(
names(Merged_Golf),
case = 'upper_camel'
)
Merged_Golf %<>% rename(FIR = Fir, GIR = Gir)
Merged_Golf$Date %<>% as.Date(format = '%d-%b-%y')
Merged_Golf$Course %<>% as.factor
Merged_Golf$Wind %<>% parse_number
Merged_Golf$Temp %<>% parse_number
Merged_Golf$Tee %<>% as.factor
Merged_Golf$FeeAvoided %<>% parse_number
Merged_Golf$Weekday <- as.factor(weekdays(Merged_Golf$Date))
Merged_Golf$Weekday <-
factor(
Merged_Golf$Weekday,
levels = c(
'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday',
'Sunday'
),
ordered = TRUE
)
Merged_Golf$Year <- as.factor(year(Merged_Golf$Date))
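As a quick sanity check of the date handling above, the sketch below parses one made-up date string in the day-month-year layout that the format string '%d-%b-%y' expects, then derives the year and weekday from it. It uses base R only and assumes an English locale, since %b and weekdays() both depend on the locale's month and day names.

```r
# Made-up date string in the '%d-%b-%y' layout
# (day, abbreviated month name, two-digit year)
raw_date <- '31-Oct-20'
parsed <- as.Date(raw_date, format = '%d-%b-%y')

format(parsed, '%Y-%m-%d')  # "2020-10-31"
format(parsed, '%Y')        # year as text; lubridate::year() returns a number
weekdays(parsed)            # full weekday name in the current locale

# An ordered factor keeps weekdays in calendar order instead of alphabetical order
day_levels <- c('Monday', 'Tuesday', 'Wednesday', 'Thursday',
                'Friday', 'Saturday', 'Sunday')
weekday <- factor(weekdays(parsed), levels = day_levels, ordered = TRUE)
```

Without the explicit levels, a factor of weekday names would sort alphabetically (Friday first), which is why the levels are spelled out before plotting by weekday.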
Approximate Number of Playing Partners
PartnerCounter <- function(Partners){
  # A missing or 'None' entry means the round was played alone
  if (is.na(Partners) || Partners == 'None'){
    return(0)
  }
  # Each separator (a comma, the word 'and', or '&') adds one more partner.
  # '\\band\\b' matches 'and' only as a whole word, so names such as
  # 'Brandon' do not inflate the count.
  CommaCount <- str_count(Partners, ',')
  AndCount <- str_count(Partners, '\\band\\b')
  AndSymbolCount <- str_count(Partners, '&')
  return(CommaCount + AndCount + AndSymbolCount + 1)
}
Merged_Golf$PartnerCount <- sapply(Merged_Golf$Partners, PartnerCounter, USE.NAMES = FALSE)
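To show what the separator-counting approach returns, here is a small base R sketch of the same idea applied to a handful of made-up partner strings. The helper count_partners and the example names are hypothetical, and the sketch assumes 'and' should only count when it appears as a whole word.

```r
# Base R sketch of the separator-counting idea (hypothetical helper,
# not the function used in the analysis)
count_partners <- function(partners) {
  # Treat missing entries the same as an explicit 'None'
  partners[is.na(partners)] <- 'None'

  # Number of times a pattern occurs in each string
  count_matches <- function(pattern) {
    lengths(regmatches(partners, gregexpr(pattern, partners)))
  }

  separators <- count_matches(',') +
    count_matches('\\band\\b') +
    count_matches('&')

  # One more partner than separators, unless the round was played alone
  ifelse(partners == 'None', 0, separators + 1)
}

# Made-up partner strings, for illustration only
example_partners <- c(NA, 'None', 'Alex', 'Alex and Sam', 'Alex, Sam & Jo')
count_partners(example_partners)  # 0 0 1 2 3
```

The count is only approximate: a string written with a serial comma, such as 'Alex, Sam, and Jo', contains three separators and would be counted as four partners.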
Date:
Merged_Golf %>%
filter(!is.na(Total)) %>%
ggplot(aes(x = Date)) +
geom_histogram(binwidth = 15, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year, scales = 'free_x') +
theme_bw() +
xlab(paste('\n','Month')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Date', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Course:
sort(unique(Merged_Golf$Course))
## [1] 3 Bears Albany Back
## [3] Albany Front Applewood Hills (Stillwater)
## [5] Buffalo Heights Chaska Par 30
## [7] Columbia Front Cottonwood
## [9] Edelweiss Back Edelweiss Front
## [11] Edgebrook Edgewood Back
## [13] Edgewood Front El Zagal
## [15] Emerald Greens Back Emerald Greens Front
## [17] Emerald Greens Gold Garrison
## [19] Hawley Back Hawley Front
## [21] Hiawatha Back Hiawatha Front
## [23] Maple River Back Maple River Front
## [25] Meadows Back Meadows Front
## [27] Moorhead CC Back Moorhead CC Front
## [29] Norsk Osgood
## [31] Osgood 3 Hole Pinewood
## [33] Pleasant View Lake Pleasant View Woods
## [35] Prairiewood Rose Creek Back
## [37] Rose Creek Front Royal Golf Club Back
## [39] Royal Golf Club Front South Hills (Waterloo)
## [41] Stillwater Oaks Back Stillwater Oaks Front
## [43] Sunrise Ridge The Lakes
## [45] Theodore Wirth Front Valley Golf (Wilmar)
## [47] Village Green Back Village Green Front
## [49] Waverly Municipal Golf Course
## 49 Levels: 3 Bears Albany Back Albany Front ... Waverly Municipal Golf Course
Holes:
HoleScores <-
Merged_Golf %>%
select(Hole1, Hole2, Hole3, Hole4, Hole5, Hole6, Hole7, Hole8, Hole9, Year) %>%
pivot_longer(
cols = c(Hole1, Hole2, Hole3, Hole4, Hole5, Hole6, Hole7, Hole8, Hole9)
)
names(HoleScores) <- c('Year', 'Hole', 'Score')
HoleScores %>%
filter(!is.na(Score)) %>%
ggplot(aes(x = Score)) +
geom_histogram(binwidth = 1, fill = '#DE5246', color = 'black') +
scale_x_continuous(labels = c('1', '2', '3', '4', '5', '6', '7', '8', '9'),
breaks = c(1, 2, 3, 4, 5, 6, 7, 8, 9)) +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Individual Hole Score')) +
ylab(paste('Number of Holes', '\n')) +
ggtitle(paste('Histogram of Individual Hole Scores', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Total Score:
Merged_Golf %>%
filter(!is.na(Total)) %>%
ggplot(aes(x = Total)) +
geom_histogram(binwidth = 2, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Score (Total)')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of 9-Hole Round Scores', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
To Par:
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
ggplot(aes(x = ToPar)) +
geom_histogram(binwidth = 2, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of 9-Hole Round Scores Relative to Par', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Wind:
Merged_Golf %>%
filter(!is.na(Wind)) %>%
ggplot(aes(x = Wind)) +
geom_histogram(binwidth = 2, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Wind Speed (mph)')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Wind Speed', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Temp:
Merged_Golf %>%
filter(!is.na(Temp)) %>%
ggplot(aes(x = Temp)) +
geom_histogram(binwidth = 5, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Temp (°F)')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Temperature', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
FIR:
Merged_Golf %>%
filter(!is.na(FIR)) %>%
ggplot(aes(x = FIR)) +
geom_histogram(binwidth = 0.1, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'FIRs (%)')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of FIRs', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
GIR:
Merged_Golf %>%
filter(!is.na(GIR)) %>%
ggplot(aes(x = GIR)) +
geom_histogram(binwidth = 0.11, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'GIRs (%)')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of GIRs', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Putts / Hole:
Merged_Golf %>%
filter(!is.na(PuttsPerHole)) %>%
ggplot(aes(x = PuttsPerHole)) +
geom_histogram(binwidth = 0.12, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Putts / Hole')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Putts / Hole', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Time:
Merged_Golf %>%
filter(!is.na(Time)) %>%
filter(!is.na(Total)) %>%
ggplot(aes(x = Time)) +
geom_histogram(bins = 15, fill = '#DE5246', color = 'black') +
scale_x_time(labels = time_format('%H:%M')) +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Start Time')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of 9-Hole Round Start Times', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Partner Count:
Merged_Golf %>%
filter(!is.na(PartnerCount)) %>%
filter(!is.na(Total)) %>%
ggplot(aes(x = PartnerCount)) +
geom_histogram(binwidth = 1, fill = '#DE5246', color = 'black') +
facet_grid(. ~ Year) +
theme_bw() +
xlab(paste('\n', 'Number of Playing Partners')) +
ylab(paste('Number of 9-Hole Rounds', '\n')) +
ggtitle(paste('Histogram of Playing Partners', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Have my scores changed over time?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
ggplot(aes(x = Date, y = ToPar)) +
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.97,
label.y.npc = 0.90,
hjust = 1
) +
theme_bw() +
xlab(paste('\n', 'Month')) +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('9-Hole Scores Over Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
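The scatter plots in this analysis pair a least-squares line (geom_smooth with method = 'lm') with a Pearson correlation label (stat_cor). The base R sketch below shows the quantities behind those two layers on simulated data; the numbers are random draws, not my scores, and the variable names are placeholders.

```r
set.seed(1)  # reproducible made-up data

# Simulated season: 40 rounds whose score drifts down slightly over time
day    <- 1:40
to_par <- 8 - 0.05 * day + rnorm(40)

# Least-squares line, the same kind of fit geom_smooth(method = 'lm') draws
fit   <- lm(to_par ~ day)
slope <- unname(coef(fit)['day'])

# Pearson correlation and its p-value, the statistics stat_cor() prints
test <- cor.test(day, to_par, method = 'pearson')
r    <- unname(test$estimate)
p    <- test$p.value

# For a single predictor, r is the slope rescaled by the two standard deviations
all.equal(r, slope * sd(day) / sd(to_par))
```

Because r and the slope always share a sign, a downward-sloping line in a facet goes with a negative R in its label, and the p-value indicates how surprising that R would be if there were no linear trend.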
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
ggplot(aes(y = ToPar, fill = Year)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of 9-Hole Round Scores', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x=element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
ggplot(aes(x = ToPar, fill = Year)) +
geom_density(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) +
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of 9-Hole Round Scores', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Have my FIRs changed over time?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>%
filter(!is.na(FIR)) %>%
ggplot(aes(x = Date, y = FIR)) +
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.97,
label.y.npc = 0.97,
hjust = 1
) +
theme_bw() +
xlab(paste('\n', 'Month')) +
ylab(paste('FIRs (%)', '\n')) +
ggtitle(paste('FIRs Over Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
How do FIRs affect score?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>%
filter(!is.na(FIR)) %>%
ggplot(aes(x = FIR, y = ToPar)) +
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.05,
label.y.npc = 0.93,
hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'FIRs (%)')) +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs FIRs', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Have my GIRs changed over time?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>%
filter(!is.na(GIR)) %>%
ggplot(aes(x = Date, y = GIR)) +
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.05,
label.y.npc = 0.95,
hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'Month')) +
ylab(paste('GIRs (%)', '\n')) +
ggtitle(paste('GIRs Over Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
How do GIRs affect score?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>%
filter(!is.na(GIR)) %>%
ggplot(aes(x = GIR, y = ToPar)) +
geom_point(color = '#DE5246', size = 1.5, alpha = 0.3) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.97,
label.y.npc = 0.97,
hjust = 1
) +
xlim(0, 1) +
theme_bw() +
xlab(paste('\n', 'GIRs (%)')) +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs GIRs', '\n')) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(caption = paste(
'\n',
'Note: Transparency has been added to show that',
'several points share the same coordinates.')
) +
theme(plot.caption = element_text(hjust = 0))
Have my putts per hole changed over time?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>%
filter(!is.na(PuttsPerHole)) %>%
ggplot(aes(x = Date, y = PuttsPerHole)) +
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year, scales = 'free_x') +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.97,
label.y.npc = 0.97,
hjust = 1
) +
theme_bw() +
xlab(paste('\n', 'Month')) +
ylab(paste('Putts / Hole', '\n')) +
ggtitle(paste('Putts / Hole Over Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
How do putts per hole affect score?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(Year != 2018) %>%
filter(!is.na(PuttsPerHole)) %>%
ggplot(aes(x = PuttsPerHole, y = ToPar)) +
geom_point(color = '#DE5246', size = 1.5, alpha = 0.3) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.00,
label.y.npc = 0.89,
hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'Putts / Hole')) +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Putts / Hole', '\n')) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(caption = paste(
'\n',
'Note: Transparency has been added to show that',
'several points share the same coordinates.')
) +
theme(plot.caption = element_text(hjust = 0))
Does wind speed affect score?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(!is.na(Wind)) %>%
ggplot(aes(x = Wind, y = ToPar)) +
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.95,
label.y.npc = 0.98,
hjust = 1
) +
theme_bw() +
xlab(paste('\n', 'Wind Speed (mph)')) +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Wind Speed', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Does temperature affect score?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(!is.na(Temp)) %>%
ggplot(aes(x = Temp, y = ToPar)) +
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.00,
label.y.npc = 0.88,
hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'Temp (°F)')) +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Temp', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Does score vary by course?
Merged_Golf_Fargo <-
Merged_Golf %>%
filter(
!is.na(ToPar),
Course == 'Rose Creek Front' |
Course == 'Rose Creek Back' |
Course == 'Edgewood Front' |
Course == 'Edgewood Back' |
Course == 'Osgood' |
Course == 'Prairiewood' |
Course == 'El Zagal'
)
Merged_Golf_Fargo$Course <-
factor(
Merged_Golf_Fargo$Course,
levels = c(
'Rose Creek Front',
'Rose Creek Back',
'Edgewood Front',
'Edgewood Back',
'Osgood',
'Prairiewood',
'El Zagal'
),
ordered = TRUE
)
Merged_Golf_Fargo %>%
filter(Year == 2020) %>%
filter(!is.na(ToPar)) %>%
filter(
Course == 'Rose Creek Front' |
Course == 'Rose Creek Back' |
Course == 'Edgewood Front' |
Course == 'Edgewood Back'
) %>%
ggplot(aes(x = ToPar, fill = Course)) +
facet_grid(. ~ Course) +
geom_density(alpha = 0.5) +
xlim(-3, 13) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) +
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of Par 5 Course Scores (2020)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
legend.position = 'none'
)
Merged_Golf_Fargo %>%
filter(Year == 2021) %>%
filter(!is.na(ToPar)) %>%
filter(
Course == 'Rose Creek Front' |
Course == 'Rose Creek Back' |
Course == 'Edgewood Front' |
Course == 'Edgewood Back'
) %>%
ggplot(aes(x = ToPar, fill = Course)) +
facet_grid(. ~ Course) +
geom_density(alpha = 0.5) +
xlim(-3, 10) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) +
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of Par 5 Course Scores (2021)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
legend.position = 'none'
)
Merged_Golf_Fargo %>%
filter(Year == 2020) %>%
filter(!is.na(ToPar)) %>%
filter(Course == 'Rose Creek Front' |
Course == 'Rose Creek Back' |
Course == 'Edgewood Front' |
Course == 'Edgewood Back'
) %>%
ggplot(aes(Course, ToPar, fill = Course)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Par 5 Course Scores (2020)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x=element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Merged_Golf_Fargo %>%
filter(
Year == 2021,
!is.na(ToPar),
Course == 'Rose Creek Front' |
Course == 'Rose Creek Back' |
Course == 'Edgewood Front' |
Course == 'Edgewood Back'
) %>%
ggplot(aes(Course, ToPar, fill = Course)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Par 5 Course Scores (2021)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x=element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Merged_Golf_Fargo %>%
filter(
Year == 2020,
!is.na(ToPar),
Course == 'Osgood' |
Course == 'Prairiewood' |
Course == 'El Zagal'
) %>%
ggplot(aes(x = ToPar, fill = Course)) +
facet_grid(. ~ Course) +
geom_density(alpha = 0.5) +
xlim(-1, 10) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) +
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of Par 3 Course Scores (2020)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
legend.position = 'none'
)
Merged_Golf_Fargo %>%
filter(
Year == 2021,
!is.na(ToPar),
Course == 'Osgood' |
Course == 'Prairiewood' |
Course == 'El Zagal'
) %>%
ggplot(aes(x = ToPar, fill = Course)) +
facet_grid(. ~ Course) +
geom_density(alpha = 0.5) +
xlim(-3, 10) +
scale_fill_gdocs() +
theme_bw() +
xlab(paste('\n', 'Score (To Par)')) +
ylab(paste('Density', '\n')) +
ggtitle(paste('Density Plot of Par 3 Course Scores (2021)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
legend.position = 'none'
)
Merged_Golf_Fargo %>%
filter(
Year == 2020,
!is.na(ToPar),
Course == 'Osgood' |
Course == 'Prairiewood' |
Course == 'El Zagal'
) %>%
ggplot(aes(Course, ToPar, fill = Course)) +
geom_boxplot(alpha = 0.5) +
ylim(0, 9) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Par 3 Course Scores (2020)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x=element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Does tee time affect score?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(!is.na(Time)) %>%
ggplot(aes(x = Time, y = ToPar)) +
geom_point(color = '#DE5246', size = 1.5) +
facet_grid(. ~ Year) +
scale_x_time(labels = time_format('%H:%M')) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.2,
label.y.npc = 0.86,
hjust = 0
) +
theme_bw() +
xlab(paste('\n', 'Tee Time')) +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Tee Time', '\n')) +
theme(plot.title = element_text(hjust = 0.5))
Does score vary by weekday?
Merged_Golf %>%
filter(
Year == 2020,
!is.na(ToPar),
!is.na(Weekday)
) %>%
ggplot(aes(Weekday, ToPar, fill = Weekday)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Scores by Weekday (2020)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x=element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Merged_Golf %>%
filter(
Year == 2021,
!is.na(ToPar),
!is.na(Weekday)
) %>%
ggplot(aes(Weekday, ToPar, fill = Weekday)) +
geom_boxplot(alpha = 0.5) +
scale_fill_gdocs() +
theme_bw() +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Comparison of Scores by Weekday (2021)', '\n')) +
theme(
plot.title = element_text(hjust = 0.5),
axis.title.x=element_blank(),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
)
Does the number of playing partners affect score?
Merged_Golf %>%
filter(!is.na(ToPar)) %>%
filter(!is.na(PartnerCount)) %>%
ggplot(aes(x = PartnerCount, y = ToPar)) +
geom_point(color = '#DE5246', size = 1.5, alpha = 0.3) +
facet_grid(. ~ Year) +
geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = '#4C8BF5') +
stat_cor(
method = 'pearson',
color = 'black',
size = 4,
label.x.npc = 0.0,
label.y.npc = 0.875,
hjust = 0
) +
xlim(0,4) +
theme_bw() +
xlab(paste('\n', 'Number of Playing Partners')) +
ylab(paste('Score (To Par)', '\n')) +
ggtitle(paste('Score vs Number of Playing Partners', '\n')) +
theme(plot.title = element_text(hjust = 0.5)) +
labs(caption = paste(
'\n',
'Note: Transparency has been added to show that',
'several points share the same coordinates.')
) +
theme(plot.caption = element_text(hjust = 0))
Correlation Plot of All Numeric Variables
Merged_Golf %>%
select(
ToPar,
Wind,
Temp,
FIR,
GIR,
PuttsPerHole,
PartnerCount
) %>%
cor(use = 'complete.obs') %>%
corrplot.mixed(
lower = 'number',
upper = 'circle',
tl.col = 'black',
lower.col = 'black',
tl.cex = 0.65
)
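One detail of cor(use = 'complete.obs') is worth spelling out: it drops every row that has a missing value in any of the selected columns. Since FIR, GIR, and putts per hole were not recorded in 2018, the matrix above is presumably built from 2019 to 2021 rounds only, including its wind and temperature entries. The toy data below is made up to show how this differs from 'pairwise.complete.obs'.

```r
# Made-up miniature data set; the NA values mimic a variable
# that was not recorded in the first season
toy <- data.frame(
  ToPar = c(9, 8, 7, 6, 5, 4),
  Wind  = c(20, 5, 15, 10, 12, 8),
  GIR   = c(NA, NA, 0.22, 0.33, 0.33, 0.56)
)

# 'complete.obs' keeps only rows with no missing value in any column
complete <- cor(toy, use = 'complete.obs')

# 'pairwise.complete.obs' uses every row available for each pair of columns
pairwise <- cor(toy, use = 'pairwise.complete.obs')

sum(complete.cases(toy))    # 4 rows survive listwise deletion
complete['ToPar', 'Wind']   # computed from rows 3 to 6 only
pairwise['ToPar', 'Wind']   # computed from all 6 rows
```

Neither choice is wrong, but the two can give different numbers for the same pair of variables, so it helps to know which rows a correlation plot is actually summarising.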
Variables that affect score:
Date (later date correlates with lower score) - I appear to be getting better at golf :P
GIRs (more GIRs correlate with lower score)
Putts / Hole (fewer putts / hole correlate with lower score)
Wind in 2018 & 2019 (more wind correlates with higher score) - interestingly, this effect went away in 2020
Temp (warmer temp correlates with lower score) - this is likely confounded by the fact that it’s colder early in the year and my scores have been decreasing with time
Course (certain courses correlate with lower scores) - in addition to some courses being harder than others, I also play certain courses a lot more (Rose Creek and Prairiewood)
Number of Playing Partners (fewer playing partners correlate with lower scores) - this is likely confounded by pace of play: more partners means a slower round, and I tend to play better at a fast pace
Variables that don’t affect score:
FIRs
Wind in 2020 - this was probably the most surprising finding of this analysis
Tee Time
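The temperature caveat above (scores and temperature both trend with the calendar) could be probed by fitting a regression that includes both temperature and date. The sketch below does this on simulated data in which score depends only on the day and temperature merely rises with the day; it illustrates the idea under those assumptions and is not a result from my rounds.

```r
set.seed(42)  # reproducible made-up data

n      <- 100
day    <- 1:n
temp   <- 50 + 0.3 * day + rnorm(n, sd = 5)   # warms as the season goes on
to_par <- 10 - 0.05 * day + rnorm(n, sd = 1)  # improves with time, not with temp

# Temperature on its own appears to lower the score
marginal <- unname(coef(lm(to_par ~ temp))['temp'])

# With day in the model, the temperature coefficient should shrink toward zero
adjusted <- unname(coef(lm(to_par ~ temp + day))['temp'])

c(marginal = marginal, adjusted = adjusted)
```

On the real data, the analogous models would use ToPar, Temp, and as.numeric(Date) from Merged_Golf; a temperature coefficient that stays clearly negative after adjusting for date would suggest the effect is not just a side effect of improving over time.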