The goal of this project was to find underpaid players who are performing well. Using clustering, I extracted a list of 10 players who perform almost as well as the highest-paid NBA athletes while receiving minimal compensation. The clusters were built from a set of scaled variables that measure each player's performance.
The variables were:
- Points Scored
- Field Goal Percentage
- Assists
Each of these variables was rescaled to the range [0, 1] with min-max normalization:
\[x' = \frac{x - \min(x)}{\max(x) - \min(x)}\]
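The formula maps each variable's minimum to 0 and its maximum to 1. Below is a minimal Python sketch of it. The original analysis appears to have been done in R, so this illustrates the formula rather than reproducing the author's code; the function name `min_max_scale`, the sample values, and the handling of a constant column are assumptions.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(x - lo) / span for x in values]


# Hypothetical points-per-game values, for illustration only.
points = [10.0, 25.0, 40.0]
print(min_max_scale(points))  # [0.0, 0.5, 1.0]
```

Putting every variable on the same [0, 1] range keeps one statistic from dominating the distance calculations used in clustering. The trade-off is sensitivity to outliers: a single extreme value sets the maximum for the whole column.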
After clustering, several graphs of the performance variables were plotted against player salary, and the lowest-paid players in the “higher performing” cluster were extracted. Based on the data, Trae Young ranks first: he is the highest-performing player among those receiving the lowest compensation. The top 10 players are:
1. Trae Young (PG)
2. Donovan Mitchell (SG)
3. Pascal Siakam (PF)
4. Ben Simmons (PG)
5. Brandon Ingram (SF)
6. Jayson Tatum (PF)
7. Derrick Rose (PG)
8. Jamal Murray (PG)
9. Domantas Sabonis (PF)
10. Devonte’ Graham (PG)
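The extraction step described above can be sketched in Python as follows. The report does not say exactly how the players were ranked, so this sketch assumes two things: the “higher performing” cluster is the one with the highest mean scaled score, and players in it are ordered by salary from lowest to highest. The function names, dictionary keys, and the roster are hypothetical.

```python
def mean_score(player):
    """Average of a player's scaled performance variables."""
    return sum(player["scores"]) / len(player["scores"])


def underpaid_high_performers(roster, labels, top_n=10):
    """Lowest-paid members of the cluster with the highest mean performance."""
    clusters = {}
    for player, label in zip(roster, labels):
        clusters.setdefault(label, []).append(player)
    # Assumption: the "higher performing" cluster is the one whose members
    # score best on average.
    best = max(
        clusters,
        key=lambda c: sum(mean_score(p) for p in clusters[c]) / len(clusters[c]),
    )
    # Within that cluster, rank by salary from lowest to highest.
    return sorted(clusters[best], key=lambda p: p["salary"])[:top_n]


# Hypothetical roster: salaries in millions of dollars, scores already in [0, 1].
roster = [
    {"name": "Player A", "salary": 30.0, "scores": [0.9, 0.8, 0.7]},
    {"name": "Player B", "salary": 3.0, "scores": [0.8, 0.7, 0.8]},
    {"name": "Player C", "salary": 2.0, "scores": [0.2, 0.3, 0.1]},
    {"name": "Player D", "salary": 8.0, "scores": [0.85, 0.75, 0.6]},
]
labels = [1, 1, 0, 1]  # hypothetical labels from a 2-cluster fit
print([p["name"] for p in underpaid_high_performers(roster, labels, top_n=2)])
# ['Player B', 'Player D']
```

Player C is the cheapest overall but sits in the lower-performing cluster, so filtering by cluster before sorting by salary is what keeps the list focused on strong performers.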
Below are the graphs and cluster-validation results for reference.
A custom function evaluates clustering performance for 1 to 10 clusters and plots the result, showing the diminishing marginal improvement as the number of clusters increases. Based on the plot, 2 clusters is optimal.
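The function itself is not shown, so the sketch below assumes the common elbow approach: run k-means for each k from 1 to 10 and record the total within-cluster sum of squares (WSS). It is plain Python with k-means++ seeding and a few restarts per k (an assumption, comparable in spirit to the `nstart` argument of R's `kmeans`). All function names are hypothetical and the data is synthetic.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k, rng):
    """k-means++ seeding: spread the initial centroids apart."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest chosen centroid.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        if sum(weights) == 0:  # every point coincides with a centroid
            centroids.append(list(rng.choice(points)))
        else:
            centroids.append(list(rng.choices(points, weights=weights, k=1)[0]))
    return centroids


def kmeans(points, k, seed=0, iterations=100):
    """Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = init_centroids(points, k, rng)
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Total within-cluster sum of squares (lower means tighter clusters)."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow_curve(points, max_k=10, n_init=5):
    """Best WSS over n_init restarts for each k = 1..max_k (needs >= max_k points)."""
    curve = []
    for k in range(1, max_k + 1):
        best = min(
            within_cluster_ss(points, *kmeans(points, k, seed=s))
            for s in range(n_init)
        )
        curve.append((k, best))
    return curve


# Hypothetical, already-scaled (points, FG%, assists) rows forming two groups.
demo_rng = random.Random(42)
players = [
    [center + demo_rng.uniform(-0.05, 0.05) for _ in range(3)]
    for center in [0.2] * 20 + [0.8] * 20
]
for k, wss in elbow_curve(players):
    print(k, round(wss, 3))
```

Plotting k against WSS gives the elbow curve. The bend, where adding another cluster stops producing a large drop in WSS, marks the suggested number of clusters. On these two well-separated synthetic groups, the drop from k = 1 to k = 2 dwarfs every later change.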
NbClust is an R function (from the package of the same name) that computes a large set of cluster-validity indices and tallies how many indices “vote” for each candidate number of clusters. Two clusters received the most votes (10 of the 24 indices that proposed a value), so 2 is the optimal number of clusters by majority rule.
## *** : The Hubert index is a graphical method of determining the number of clusters.
## In the plot of Hubert index, we seek a significant knee that corresponds to a
## significant increase of the value of the measure i.e the significant peak in Hubert
## index second differences plot.
##
## *** : The D index is a graphical method of determining the number of clusters.
## In the plot of D index, we seek a significant knee (the significant peak in Dindex
## second differences plot) that corresponds to a significant increase of the value of
## the measure.
##
## *******************************************************************
## * Among all indices:
## * 10 proposed 2 as the best number of clusters
## * 5 proposed 3 as the best number of clusters
## * 1 proposed 4 as the best number of clusters
## * 1 proposed 5 as the best number of clusters
## * 2 proposed 6 as the best number of clusters
## * 3 proposed 11 as the best number of clusters
## * 2 proposed 14 as the best number of clusters
##
## ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is 2
##
##
## *******************************************************************
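The majority rule in the conclusion amounts to picking the candidate with the most votes. A minimal Python tally of the counts printed above (copied by hand from the output; this is not NbClust's own code):

```python
from collections import Counter

# Votes from the NbClust output: {number of clusters: indices proposing it}.
votes = Counter({2: 10, 3: 5, 4: 1, 5: 1, 6: 2, 11: 3, 14: 2})

best_k, best_votes = votes.most_common(1)[0]
total = sum(votes.values())
print(f"best k = {best_k} ({best_votes} of {total} votes)")
# best k = 2 (10 of 24 votes)
```

Here the winner is unambiguous. A tie between two candidates would need an explicit tie-break rule, which this sketch does not implement.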