#NBA Clustering Exercise Details:

clust_data_nba = nbastats1[, c("PTS","AST","salary")]

set.seed(50) #ensuring reproducibility 

kmeansObjNBA <- kmeans(clust_data_nba, centers = 3)

kmeansObjNBA
## K-means clustering with 3 clusters of sizes 271, 89, 49
## 
## Cluster means:
##        PTS       AST   salary
## 1 204.0627  44.16605  3368922
## 2 366.8764  78.44944 13787278
## 3 584.3878 143.24490 31761019
## 
## Clustering vector:
##   [1] 2 1 1 1 3 1 1 1 1 1 2 1 3 2 1 3 1 3 2 1 1 1 1 1 1 3 1 3 1 1 1 3 1 1 3 2 1
##  [38] 1 1 1 3 1 1 1 1 2 1 1 2 1 1 3 1 2 1 1 3 1 1 2 1 1 2 1 1 2 1 1 3 1 1 3 1 1
##  [75] 2 2 2 1 1 1 1 1 1 1 1 1 2 1 1 2 1 2 2 3 1 1 2 1 1 1 1 2 2 1 1 1 1 1 3 1 1
## [112] 2 2 1 1 1 1 2 1 1 1 2 1 1 1 1 2 2 1 1 2 1 1 1 2 1 1 1 2 1 2 1 3 1 1 2 3 2
## [149] 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 1 1 1 3 1 3 3 3 2 2 1 1 1 1
## [186] 1 1 1 3 1 2 1 1 2 2 1 1 3 2 2 3 1 1 3 2 2 1 1 1 1 1 1 2 3 1 2 1 1 1 2 3 3
## [223] 1 1 2 2 3 1 1 1 2 1 3 1 1 3 1 1 3 1 1 3 2 1 3 3 1 3 1 1 1 2 1 3 1 2 1 1 1
## [260] 1 1 1 1 2 2 1 1 2 2 2 1 1 2 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 2 1 1 2 1 1 1
## [297] 1 1 3 1 1 3 3 2 1 1 1 3 1 1 3 1 2 1 1 2 3 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1
## [334] 1 2 1 1 2 3 1 3 1 1 1 1 1 2 1 1 1 1 1 1 2 1 3 1 3 1 1 2 1 1 2 2 2 1 1 2 2
## [371] 2 1 1 1 1 2 1 3 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 1 2 1 1 1 2
## [408] 1 2
## 
## Within cluster sum of squares by cluster:
## [1] 1.137900e+15 1.273720e+15 1.321829e+15
##  (between_SS / total_SS =  90.6 %)
## 
## Available components:
## 
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
NBAclust = as.factor(kmeansObjNBA$cluster)

Significant correlations were found for PTS and AST.
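As a minimal sketch of how such correlations could be checked, the block below runs `cor.test()` on a small simulated stand-in for `nbastats1`. The real data is not reproduced here, so the `toy_nba` data frame and every value in it are assumptions for illustration only.

```r
# Hypothetical stand-in for nbastats1: simulated values, not real player data
set.seed(50)
n <- 100
toy_nba <- data.frame(PTS = round(runif(n, 50, 700)))
toy_nba$AST    <- round(0.25 * toy_nba$PTS + rnorm(n, sd = 20))
toy_nba$salary <- round(40000 * toy_nba$PTS + rnorm(n, sd = 4e6))

# Pearson correlation of each performance metric with salary
cor_pts <- cor.test(toy_nba$PTS, toy_nba$salary)
cor_ast <- cor.test(toy_nba$AST, toy_nba$salary)

# Collect the estimates and p-values in one small table
cor_summary <- data.frame(
  metric   = c("PTS", "AST"),
  estimate = unname(c(cor_pts$estimate, cor_ast$estimate)),
  p_value  = c(cor_pts$p.value, cor_ast$p.value)
)
cor_summary
```

On the real data, the same two calls would take `nbastats1$PTS`, `nbastats1$AST`, and `nbastats1$salary` as inputs.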

Visualizing initial options with the k-means base plot visualization function

Some other variables we experimented with

##FG Percentage vs. 3P%

Offensive Rebounds vs. Steals

#KMeans with Standardized PTS & AST
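The unscaled run earlier reports within-cluster sums of squares on the order of 1e15, which suggests that salary (in dollars) dominates the Euclidean distances. A minimal sketch of a standardized k-means run on PTS and AST follows. It uses simulated stand-in data (`toy_nba` is hypothetical) and is not the code that produced the dashboard's charts.

```r
# Hypothetical stand-in for nbastats1: simulated values, not real player data
set.seed(50)
n <- 150
toy_nba <- data.frame(
  PTS = round(runif(n, 50, 700)),
  AST = round(runif(n, 5, 200))
)

# scale() centers each column to mean 0 and rescales it to sd 1,
# so PTS and AST contribute comparably to the distance calculation
scaled_nba <- scale(toy_nba[, c("PTS", "AST")])

set.seed(50)  # k-means starts from random centers
kmeans_scaled <- kmeans(scaled_nba, centers = 3, nstart = 25)

# Attach cluster labels as a factor for plotting or grouping
toy_nba$cluster <- as.factor(kmeans_scaled$cluster)
table(toy_nba$cluster)
```

The `nstart = 25` argument is a choice made for this sketch: it reruns the algorithm from several random starts and keeps the best solution, which makes the result less sensitive to the seed.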

#Points per minute / Assists per minute
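Per-minute rates could be derived as in the sketch below. The minutes-played column `MP` and the `toy_nba` data frame are assumptions, since the source does not show which column of `nbastats1` holds minutes.

```r
# Hypothetical stand-in data; MP (minutes played) is an assumed column name
toy_nba <- data.frame(
  PTS = c(600, 250, 40, 0),
  AST = c(150, 30, 10, 0),
  MP  = c(1200, 800, 100, 0)
)

# Avoid dividing by zero for players with no minutes played
toy_nba$PTS_per_min <- ifelse(toy_nba$MP > 0, toy_nba$PTS / toy_nba$MP, NA)
toy_nba$AST_per_min <- ifelse(toy_nba$MP > 0, toy_nba$AST / toy_nba$MP, NA)

toy_nba
```

Rates computed from very few minutes are noisy, so a minimum-minutes filter may be worth applying before clustering on them.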

Final Kmeans Cluster Chart

Set interactivity for insights


Note: Players who sit high and wide on the chart with competitive production and a low salary are undervalued and worth pursuing through a signing or trade.

Undervalued Players Recommendation:

– Trae Young

– Luka Doncic

– Devonte’ Graham (the highest-placed low-paid player among the competitive and high-paid players)

– T.J. McConnell

– De’Aaron Fox

Evaluation & Considerations

The core question of this lab was to determine the relationship between player performance and salary, and to identify instances of potential value arbitrage for a free-agent signing or trade.

We feel that our correlation calculations and early k-means models with other variables made it clear that Points and Assists, in standardized form, are the most telling variables for a player’s marginal production. Because this graphic is framed in terms of player salary, any outliers beyond the clusters are players whose performance is distinct relative to their salary. The list above is just a handful of those instances.

However, there are clear risks in this model and in implementing the insights that come from it. Not only does the model not include a defensive metric, but it also does not consider external conditions and circumstances that could contribute to offensive performance, such as intangibles like team chemistry or fluidity, or tangible realities like the performance averages of a player’s teammates or the market value of his jersey. Next steps should include a detailed analysis of the data to ensure there are no errors and, additionally, an improved performance metric to align and compare with player salary.
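One way to make "distinct performers relative to their salary" concrete is sketched below: combine standardized PTS and AST into a single score, then flag players in the top quartile of that score whose salary is below the median. The thresholds, the equal weighting, and the `toy_nba` data are all assumptions for illustration, not the method used for the recommendations above.

```r
# Hypothetical stand-in data: player labels and numbers are made up
toy_nba <- data.frame(
  player = c("A", "B", "C", "D", "E", "F", "G", "H"),
  PTS    = c(650, 600, 580, 300, 280, 200, 150, 100),
  AST    = c(180, 160, 40, 90, 60, 30, 20, 10),
  salary = c(3.0e7, 4.0e6, 2.5e7, 1.2e7, 3.0e6, 8.0e6, 2.0e6, 1.5e6)
)

# Equal-weight composite of standardized points and assists
toy_nba$perf_score <- as.numeric(scale(toy_nba$PTS)) +
  as.numeric(scale(toy_nba$AST))

# Flag: top-quartile performance paired with below-median salary
high_perf  <- toy_nba$perf_score >= quantile(toy_nba$perf_score, 0.75)
low_salary <- toy_nba$salary < median(toy_nba$salary)
toy_nba$undervalued <- high_perf & low_salary

toy_nba[toy_nba$undervalued, c("player", "perf_score", "salary")]
```

A rule like this makes the selection reproducible, but it inherits the same limitations noted above: it sees only offensive counting stats and nothing about defense or team context.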

Clustering Validation & Quality Control

explained_variance = function(data_in, k){
  set.seed(50)
  kmeans_obj = kmeans(data_in, centers = k, algorithm = "Lloyd", iter.max = 50)
  
  # Variance explained by the clustering: between-cluster sum of squares
  # divided by the total sum of squares
  var_exp = kmeans_obj$betweenss / kmeans_obj$totss
  var_exp
}


explained_var_NBA = sapply(1:10, explained_variance, data_in = clust_data_nba)



elbow_data_NBA = data.frame(k = 1:10, explained_var_NBA)

Three clear kinks in the elbow graph indicate that three clusters are appropriate.
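The elbow can also be read numerically by looking at how much explained variance each additional cluster adds. The sketch below does this on simulated data with three well-separated groups. The `toy_pts` matrix and the `explained_variance_toy` helper are hypothetical stand-ins that mirror the `explained_variance()` function above.

```r
# Simulated data with three well-separated groups (illustrative only)
set.seed(50)
toy_pts <- rbind(
  matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 5,  sd = 0.5), ncol = 2),
  matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2)
)

# Same ratio as explained_variance(): between-cluster SS / total SS
explained_variance_toy <- function(data_in, k) {
  set.seed(50)
  km <- kmeans(data_in, centers = k, nstart = 10)
  km$betweenss / km$totss
}

var_exp <- sapply(1:6, explained_variance_toy, data_in = toy_pts)

# Gain in explained variance from moving from k - 1 to k clusters
gain <- diff(var_exp)
data.frame(k = 2:6, gain = round(gain, 3))
```

In this toy setup the gain stays large up to k = 3 and then becomes negligible, which is the pattern an elbow plot shows visually. Real data is rarely this clean, so the plot remains a useful check.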

# Plotting data.
nbaelbow <- ggplot(elbow_data_NBA, 
       aes(x = k,  
           y = explained_var_NBA)) + 
  geom_point(size = 3) +          
  geom_line(size = 1) +            
  xlab('k') + 
  ylab('Inter-cluster Variance / Total Variance') + 
  theme_light()


Elbow Graph