View Results of 2-Cluster K-Means Procedure

## K-means clustering with 2 clusters of sizes 120, 110
## 
## Cluster means:
##          PPG        MPG
## 1 -0.7521341 -0.7931782
## 2  0.8205099  0.8652853
## 
## Clustering vector:
##   [1] 2 1 2 2 1 2 2 1 2 2 1 2 2 1 1 1 2 2 1 1 2 2 2 1 2 2 2 2 1 2 2 1 2 2 2 2 2
##  [38] 1 1 2 2 2 1 1 1 1 1 2 2 1 2 2 2 1 2 2 1 1 1 1 1 1 2 2 1 2 1 2 1 2 1 1 1 2
##  [75] 2 2 1 2 2 2 1 1 1 1 1 1 2 1 2 2 1 2 2 1 2 1 1 2 2 2 2 1 2 1 1 2 1 2 2 2 1
## [112] 2 1 2 2 1 2 1 2 1 2 2 2 1 2 2 2 1 2 1 2 1 2 1 2 2 1 1 2 1 1 1 1 1 2 1 2 2
## [149] 2 1 2 2 2 2 1 1 2 1 1 1 1 2 1 2 1 1 2 2 2 1 2 1 1 1 1 1 2 1 2 1 2 1 2 1 1
## [186] 1 1 1 1 1 2 2 2 1 1 2 1 1 2 1 1 2 1 1 2 1 1 1 1 2 2 1 1 1 1 2 2 1 1 1 2 1
## [223] 1 2 1 1 1 1 1 1
## 
## Within cluster sum of squares by cluster:
## [1] 72.06630 86.13815
##  (between_SS / total_SS =  65.5 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
## $totss
## [1] 458
## 
## $withinss
## [1] 72.06630 86.13815
## 
## $tot.withinss
## [1] 158.2044
## 
## $betweenss
## [1] 299.7956
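
The percentage in the printed summary (between_SS / total_SS) can be recovered from the components reported above. The following is a minimal check that uses only numbers copied from this output; the variable names are illustrative.

```r
# Values copied from the k-means output above
totss     <- 458
withinss  <- c(72.06630, 86.13815)
betweenss <- 299.7956

# Total within-cluster sum of squares is the sum over clusters
tot_withinss <- sum(withinss)

# Share of the total sum of squares that lies between the clusters
var_explained <- betweenss / totss

round(100 * var_explained, 1)  # 65.5, the percentage printed by kmeans
```

A total sum of squares of 458 is also what two standardized variables measured on 230 observations would give, since each standardized column contributes n - 1 = 229.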

Visualize Outputs

Evaluate the quality of the clustering

Create Elbow Chart

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a 
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot. 
## 

## *** : The D index is a graphical method of determining the number of clusters. 
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure. 
##  
## ******************************************************************* 
## * Among all indices:                                                
## * 8 proposed 2 as the best number of clusters 
## * 6 proposed 3 as the best number of clusters 
## * 1 proposed 4 as the best number of clusters 
## * 1 proposed 9 as the best number of clusters 
## * 1 proposed 11 as the best number of clusters 
## * 5 proposed 14 as the best number of clusters 
## * 1 proposed 15 as the best number of clusters 
## 
##                    ***** Conclusion *****                            
##  
## * According to the majority rule, the best number of clusters is  2 
##  
##  
## *******************************************************************
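
An elbow chart of the kind named in this section can be sketched as follows. This is not the code used for the report: it assumes the built-in `faithful` data set as a stand-in for the standardized PPG/MPG data, and the range of k values is an arbitrary choice.

```r
set.seed(42)  # k-means uses random starts, so fix the seed

# Stand-in data (assumption): two standardized numeric columns
x <- scale(faithful)

# Total within-cluster sum of squares for k = 1..10.
# For k = 1 this is the sum of squared deviations from the column means,
# which are all zero after scale().
ks  <- 1:10
wss <- c(
  sum(x^2),
  sapply(ks[-1], function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
)

# The "elbow" is the k after which adding clusters stops reducing WSS much
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow chart (stand-in data)")
```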

Compare quality 3 clusters vs 2 clusters
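
One way to make this comparison is to fit both models and compare the same between_SS / total_SS ratio that `kmeans` prints. The sketch below assumes the built-in `faithful` data set as a stand-in for the player data; `fit2`, `fit3`, and `ratio` are illustrative names.

```r
set.seed(1)  # reproducible random starts

# Stand-in data (assumption): two standardized numeric columns
x <- scale(faithful)

fit2 <- kmeans(x, centers = 2, nstart = 25)
fit3 <- kmeans(x, centers = 3, nstart = 25)

# Share of total variance explained by each clustering
ratio <- c(k2 = fit2$betweenss / fit2$totss,
           k3 = fit3$betweenss / fit3$totss)
round(ratio, 3)
```

This ratio typically rises as clusters are added, so a higher value for three clusters is not by itself a reason to prefer three. The size of the gain is what matters, which is why it is usually read alongside an elbow chart or index-based summaries.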

Summary

Overall, our approach to this assignment was to find variables that were correlated with a higher salary in order to identify players who were underpaid. We found that minutes per game (MPG) and points per game (PPG) were the two variables most highly correlated with NBA salary, so we chose them for the k-means clustering analysis. We then normalized MPG and PPG so that both would be on the same scale, and recoded salary as a categorical variable: 1 if a player was paid above the mean salary and 0 if below. The two clusters accounted for about 65.5% of the total variance (between_SS / total_SS), which we found encouraging. The red 2's at the top right of the graph comparing PPG vs. MPG are the players we should put the most effort into targeting, since they are among the highest-performing players yet are paid below the league mean. Finally, the histogram shown above, which tallies the number of clusters proposed by each index, suggested using two clusters.
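
The steps described in this summary can be sketched end to end as follows. The data frame, its column names, and its values are hypothetical and randomly generated; only the sequence of steps (standardize, recode salary, cluster, flag underpaid players in the higher-performing cluster) follows the text.

```r
set.seed(7)

# Hypothetical player data (assumption): values are random, not real NBA figures
players <- data.frame(
  ppg    = runif(50, 2, 30),
  mpg    = runif(50, 5, 38),
  salary = runif(50, 1e6, 4e7)
)

# Standardize the clustering variables to mean 0 and standard deviation 1
players_z <- scale(players[, c("ppg", "mpg")])

# Recode salary: 1 = above the mean salary, 0 = at or below it
players$above_mean_pay <- as.integer(players$salary > mean(players$salary))

# Two-cluster k-means on the standardized variables
fit <- kmeans(players_z, centers = 2, nstart = 25)

# The higher-performing cluster is the one whose center is larger on both axes
high_cluster <- which.max(rowSums(fit$centers))

# Candidate targets: high-performing cluster, paid at or below the mean
targets <- players[fit$cluster == high_cluster & players$above_mean_pay == 0, ]

# Plot cluster labels, colored red for players paid at or below the mean
plot(players_z, type = "n", xlab = "PPG (standardized)", ylab = "MPG (standardized)")
text(players_z, labels = fit$cluster,
     col = ifelse(players$above_mean_pay == 0, "red", "black"))
```

Picking the cluster by its center rather than by its label matters because k-means numbers its clusters arbitrarily, so cluster 2 is not guaranteed to be the high-performing group on every run.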