Homework 4 Assignment: PGA Tour Data

The Data: The data was gathered from the PGATour.com website. It covers player statistics from the 2021-2022 PGA Tour campaign: scoring rank, total rounds played, stroke average, total strokes taken, total adjustment to those strokes, total rounds completed, number of events entered, average and total money earned, and number of victories.

My Analysis: To better depict the effect of scoring average on the total and average money earned, I created the scatter plots below. As I suspected, the plot suggests that a lower scoring average over the course of a season goes with higher earnings. However, this relationship is heavily influenced by the variable “Victories”, which counts a player's wins on the PGA Tour. In the graph below, larger dots mark players with 1 or more victories, and as expected these players have higher cash earnings than the rest. This leads me to believe that the players shown as smaller dots, those with 0 victories, are likely to break through and get their first victory in time.

Package I used:

## load packages
library(ggplot2) # for creating plots and graphics

I loaded the ggplot2 library to create plots and graphs.

All data is taken from the PGA Tour database and is ready for analysis.

Golf <- readr::read_table('PGATour_2021_2022.txt')
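As a quick sanity check on the import, the sketch below writes the header and six rows shown in the table further down to a temporary file and reads them back with readr::read_table, to confirm that the whitespace-delimited layout yields ten columns. The temporary file and the names sample_lines, tmp, and golf_sample are assumptions made for illustration; they are not part of the original analysis.

```r
# Header and first six rows, copied from the table printed in this report.
sample_lines <- c(
  "Rank Prev_Rank Name Rounds Average Strokes Adjustment Events Cash Victories",
  "1 2 MatthewWolff 16 68.847 1080 21.552 4 1294658 0",
  "2 1 CollinMorikawa 8 68.917 539 12.335 2 1312322 0",
  "3 4 SamBurns 16 69.064 1077 28.029 4 1944031 1",
  "4 6 SungjaeIm 16 69.314 1081 28.029 4 1632198 1",
  "5 3 RoryMcIlroy 4 69.318 263 14.271 1 1755000 1",
  "6 47 CameronSmith 8 69.485 545 10.877 2 369375 0"
)

# Write the sample to a temporary file (a hypothetical stand-in for the real .txt file).
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# read_table() splits columns on runs of whitespace, so a player name
# containing a space would be split across two columns.
golf_sample <- suppressMessages(readr::read_table(tmp))

# Check the shape and the column names of the imported data.
print(dim(golf_sample))
print(names(golf_sample))
```

Running the same two checks on the full Golf data frame is a cheap way to catch a misaligned column before plotting.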

Final Table and Analysis

knitr::kable(head(Golf))
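| Rank| Prev_Rank|Name           | Rounds| Average| Strokes| Adjustment| Events|    Cash| Victories|
|----:|---------:|:--------------|------:|-------:|-------:|----------:|------:|-------:|---------:|
|    1|         2|MatthewWolff   |     16|  68.847|    1080|     21.552|      4| 1294658|         0|
|    2|         1|CollinMorikawa |      8|  68.917|     539|     12.335|      2| 1312322|         0|
|    3|         4|SamBurns       |     16|  69.064|    1077|     28.029|      4| 1944031|         1|
|    4|         6|SungjaeIm      |     16|  69.314|    1081|     28.029|      4| 1632198|         1|
|    5|         3|RoryMcIlroy    |      4|  69.318|     263|     14.271|      1| 1755000|         1|
|    6|        47|CameronSmith   |      8|  69.485|     545|     10.877|      2|  369375|         0|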
ggplot2::ggplot(data = Golf, aes(x = Average, y = Cash)) + geom_point(aes(size = Victories)) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
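To put a number on the claim that players with 1 or more victories earn more, one option is to compare mean cash earnings between the two groups. The sketch below does this in base R using only the six rows printed in the table above, re-entered by hand, so it illustrates the calculation rather than a result for the full data set. The names golf_head, Winner, and cash_by_winner are assumptions introduced for this example.

```r
# The six rows printed by head(Golf), re-entered by hand for illustration.
golf_head <- data.frame(
  Name      = c("MatthewWolff", "CollinMorikawa", "SamBurns",
                "SungjaeIm", "RoryMcIlroy", "CameronSmith"),
  Average   = c(68.847, 68.917, 69.064, 69.314, 69.318, 69.485),
  Cash      = c(1294658, 1312322, 1944031, 1632198, 1755000, 369375),
  Victories = c(0, 0, 1, 1, 1, 0)
)

# Hypothetical helper column: TRUE for players with at least one victory.
golf_head$Winner <- golf_head$Victories >= 1

# Mean cash earnings for non-winners ("FALSE") and winners ("TRUE").
cash_by_winner <- tapply(golf_head$Cash, golf_head$Winner, mean)
print(cash_by_winner)
```

Replacing golf_head with the full Golf data frame would give the same comparison across all players, which is a more reliable check than reading dot sizes off the plot.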