Goal: Find players who are high performing but not highly paid, so that we can recruit them and get the team to the playoffs!
Details:
We first imported all of our data and merged our two data sets. We then cleaned the merged dataset to exclude any rows that have NAs and any duplicate players, and normalized the variables we are interested in looking at.
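The cleaning and normalization step can be sketched as follows, in plain Python rather than the code the report itself ran. The column names (`player`, `salary`, `minutes`) and the toy rows are hypothetical, and min-max scaling is an assumption, since the report does not say which normalization was used.

```python
def drop_incomplete_and_duplicates(rows, key="player"):
    """Keep the first row per player, skipping rows with missing values."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # row contains an NA
        if row[key] in seen:
            continue  # duplicate player
        seen.add(row[key])
        cleaned.append(row)
    return cleaned


def min_max(values):
    """Rescale values to the [0, 1] range (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy rows for illustration only, not real player data.
rows = [
    {"player": "A", "salary": 10.0, "minutes": 30.0},
    {"player": "B", "salary": None, "minutes": 20.0},  # dropped: NA
    {"player": "A", "salary": 10.0, "minutes": 30.0},  # dropped: duplicate
    {"player": "C", "salary": 30.0, "minutes": 10.0},
    {"player": "D", "salary": 20.0, "minutes": 20.0},
]
cleaned = drop_incomplete_and_duplicates(rows)
salary_scaled = min_max([row["salary"] for row in cleaned])
```

Scaling each variable to a common range keeps salary, which is measured in much larger units than minutes or age, from dominating the distance calculations used later in clustering.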
We can see from the three plots below that each of these three variables (points scored, minutes played, and age) correlates well with salary. Because of this, we want to use these variables to create our clusters.
We begin clustering with K = 2 to see how it performs. We will consider running more clusters after examining the elbow and NbClust methods.
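For readers who want to see what the clustering step does, here is a minimal k-means sketch in plain Python. It uses Lloyd's algorithm with a deterministic farthest-first initialization, which is a simplification chosen for reproducibility and not necessarily the algorithm or initialization the report used. The function names and the toy points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, max_iters=100):
    """Return (labels, centers) from Lloyd's algorithm."""
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_point(members)
    return labels, centers


# Toy normalized points (illustration only): two obvious groups.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
labels, centers = kmeans(toy, k=2)
```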
We want to visualize what our two salary clusters look like across three-pointers and minutes played. From this plot we can see that the dark blue points in the top right are the players we may be interested in recruiting: they are scoring a lot of threes and playing a lot of minutes, yet are still paid on the lower end of the pay scale.
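The visual rule of thumb above (top right of the plot, low salary) can also be written down as an explicit filter. This is only a sketch: the field names, the thresholds `perf_min` and `salary_max`, and the toy records are all hypothetical, and the values are assumed to be already normalized to the [0, 1] range.

```python
def underpaid_candidates(players, perf_min=0.7, salary_max=0.3):
    """Names of players with high scaled output but low scaled salary."""
    return [
        p["name"]
        for p in players
        if p["threes"] >= perf_min
        and p["minutes"] >= perf_min
        and p["salary"] <= salary_max
    ]


# Toy, already-normalized records for illustration only.
players = [
    {"name": "Player A", "threes": 0.9, "minutes": 0.8, "salary": 0.2},
    {"name": "Player B", "threes": 0.9, "minutes": 0.9, "salary": 0.9},
    {"name": "Player C", "threes": 0.2, "minutes": 0.3, "salary": 0.1},
]
candidates = underpaid_candidates(players)
```

A filter like this is a complement to the plot, not a replacement for it: the thresholds are arbitrary and should be tuned against what the clusters actually show.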
We also want to view this in 3D so we can add our third variable, age. This lets us see how age relates to minutes played and three-pointers scored, and whether it gives us additional information on which players we should consider underpaid.
We see from this that the variance accounted for by clusters is about 0.44. We will use this as a baseline for comparison when we try other values of K.
## [1] 0.4448182
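The "variance accounted for by clusters" figure is the ratio of between-cluster sum of squares to total sum of squares. A minimal sketch of that calculation in plain Python follows; the function names and toy values are hypothetical. If the report used R's `kmeans`, the same quantity would be `betweenss / totss`, though that is an assumption about how the number above was produced.

```python
def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sum_sq(points):
    """Sum of squared distances from each point to the mean of `points`."""
    center = mean_point(points)
    return sum((x - c) ** 2 for p in points for x, c in zip(p, center))


def variance_explained(points, labels):
    """Between-cluster sum of squares divided by total sum of squares."""
    total = sum_sq(points)
    within = 0.0
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        within += sum_sq(members)
    # Between-cluster SS equals total SS minus within-cluster SS.
    return (total - within) / total


# Toy one-dimensional example (illustration only).
toy_points = [(0.0,), (2.0,), (10.0,), (12.0,)]
toy_labels = [0, 0, 1, 1]
ratio = variance_explained(toy_points, toy_labels)
```

A value near 1 means the clusters are tight relative to the overall spread; a value near 0 means cluster membership tells us little about where a point lies.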
We run the elbow method to assess whether it would be worth using a K value other than 2. The plot shows that returns start to diminish sharply at K = 3: moving from K = 2 to K = 3 helps, but the ratio of inter-cluster variance to total variance does not increase much after K = 3. Thus, it may be worth bumping our K up to 3.
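The elbow check amounts to repeating the clustering for several values of K and recording the variance-explained ratio each time. A self-contained sketch in plain Python is below. It assumes Lloyd's algorithm with a deterministic farthest-first initialization (a simplification, not necessarily what the report used), and the three toy groups are invented for illustration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans_labels(points, k, max_iters=100):
    """Cluster labels from Lloyd's algorithm, farthest-first initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_point(members)
    return labels


def sum_sq(points):
    center = mean_point(points)
    return sum(sq_dist(p, center) for p in points)


def variance_explained(points, labels):
    total = sum_sq(points)
    within = sum(
        sum_sq([p for p, lab in zip(points, labels) if lab == label])
        for label in set(labels)
    )
    return (total - within) / total


# Toy data (illustration only): three well-separated groups.
toy = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 0.0), (10.0, 1.0), (11.0, 0.0),
    (0.0, 10.0), (1.0, 10.0), (0.0, 11.0),
]
ratios = {k: variance_explained(toy, kmeans_labels(toy, k)) for k in range(1, 5)}
for k, r in ratios.items():
    print(f"K = {k}: variance explained = {r:.3f}")
```

The "elbow" is the K after which the gain from adding another cluster drops off; on this toy data that happens at K = 3 because there are exactly three groups.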
We also want to look at NbClust to see how many clusters its various methods recommend. The histogram shows that K = 2, 3, and 5 are the most frequently suggested numbers of clusters. Because we have already done K = 2, and the elbow method showed diminishing returns after 3, my suggestion would be to also run K = 3 and see if we can gather additional insights from three clusters.
In the steps below I repeat our first few steps, now using K = 3. Then we will re-plot and re-visualize in order to pick the players whom we believe to be underpaid and should consider recruiting for the Wizards.
As expected, the variance accounted for by clusters is higher with K = 3 (about 0.62) than with K = 2 (about 0.44).
## [1] 0.6212126
Ultimately, after looking at our visualizations, my suggestion would be to recruit Donovan Mitchell from the Utah Jazz, Trae Young from the Atlanta Hawks, and Luka Dončić from the Dallas Mavericks. I selected these three players based on the clustering process above. They are our best choices because they are high performing, yet underpaid. From our visualizations we can see that these players are playing a lot of minutes and scoring a lot of three-pointers, but are paid relatively little in comparison to other top performers. They are also relatively young compared with the rest of our players. Thus, this clustering exercise suggests that our best bet for making it to the finals this year is to recruit these players to the Wizards.