You are a data-focused scout for the worst team in the NBA, probably the Wizards. Your general manager just heard about Data Science and thinks it can solve all the team’s problems! She wants you to figure out a way to find players that are high performing but maybe not highly paid that you can steal (offer contracts/trade) to get your team out of last place!

Pre-processing function and setup

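library(ggplot2)  # needed for the plot further down

# Read both files, merge them on Player, drop incomplete rows, and keep
# each player's highest-PTS row when a player appears more than once.
nba_pre_processing <- function(stats_file = "nba2020-21.csv-1",
                               salaries_file = "nba_salaries_21.csv-1"){
  nba <- read.csv(stats_file)
  nba_salaries <- read.csv(salaries_file)
  merged <- merge(nba, nba_salaries, by = "Player")
  merged <- na.omit(merged)
  merged <- merged[order(merged[, 'Player'], -merged[, 'PTS']), ]
  merged <- merged[!duplicated(merged$Player), ]

  return(merged)
}

merged <- nba_pre_processing()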
clust_data_nba <- merged[,-c(1,2,4)]
head(clust_data_nba)
##   Age  G GS  MP  FG FGA   FG. X3P X3PA  X3P. X2P X2PA  X2P.  eFG. FT FTA   FT.
## 1  25 19 19 552  91 213 0.427  31   84 0.369  60  129 0.465 0.500 49  80 0.613
## 2  24 35  6 667  95 256 0.371  38  108 0.352  57  148 0.385 0.445 29  40 0.725
## 3  21 18  0 281  25  63 0.397  17   48 0.354   8   15 0.533 0.532  8  12 0.667
## 4  27 19  0 273  44  90 0.489  13   33 0.394  31   57 0.544 0.561 20  25 0.800
## 5  34 24 24 677 138 312 0.442  47  132 0.356  91  180 0.506 0.518 14  18 0.778
## 6  30  9  6 163  17  43 0.395   6   15 0.400  11   28 0.393 0.465  7   9 0.778
##   ORB DRB TRB AST STL BLK TOV PF PTS X2020.21
## 1  33 104 137  80  14  16  53 38 262 18136364
## 2   7  39  46  63  20   4  29 62 257  2345640
## 3  10  34  44   7   3   5  11 37  75  3458400
## 4   3  43  46  15   6   7  13 24 121  1752950
## 5  23 137 160  84  21  21  27 43 337 27500000
## 6   9  28  37  15   8   5  10  9  47  9720900
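The ordering and de-duplication steps inside nba_pre_processing() can be checked on a small example. The sketch below uses a made-up data frame (the player names, teams, and numbers are hypothetical, not taken from the real files) to show that sorting by Player and then by descending PTS keeps each player's highest-scoring row.

```r
# Toy stats table: "Player B" appears twice (hypothetical data for illustration).
toy <- data.frame(
  Player = c("Player A", "Player B", "Player B", "Player C"),
  Tm     = c("WAS", "HOU", "BRK", "ORL"),
  PTS    = c(300, 120, 450, 80),
  stringsAsFactors = FALSE
)

# Same two steps as in nba_pre_processing(): sort by Player, then by PTS
# descending, and keep only the first row seen for each player.
toy <- toy[order(toy[, "Player"], -toy[, "PTS"]), ]
toy_unique <- toy[!duplicated(toy$Player), ]

print(toy_unique)
```

Because duplicated() flags every row after the first occurrence, the sort order decides which row survives.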

Correlation Matrix

From the correlation matrix, assists and points are among the variables most strongly correlated with salary.

res <- cor(merged[,-c(1,2,4)]) #taking out categorical variables
new <- as.data.frame(round(res, 2)) #correlation matrix
# Run an algorithm with 3 centers.
set.seed(1)
kmeans_obj_nba = kmeans(clust_data_nba, centers = 3, 
                        algorithm = "Lloyd")
# The cluster labels are converted to factors before plotting so that ggplot2
# (the graphing package) can read them as category labels instead of
# continuous variables (numeric variables).
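To read the correlation matrix more directly, it may help to pull out only the salary column and sort it. The sketch below is one way to do that; salary_correlations is a hypothetical helper (not part of the original analysis) and the toy data frame is made up so the example runs on its own. With the real data, clust_data_nba could be passed in instead.

```r
# Hypothetical helper: correlation of every numeric column with the salary
# column, sorted from strongest to weakest by absolute value.
salary_correlations <- function(df, salary_col = "X2020.21") {
  cors <- cor(df)[, salary_col]
  cors <- cors[names(cors) != salary_col]  # drop salary's correlation with itself
  cors[order(abs(cors), decreasing = TRUE)]
}

# Toy data (made up for illustration only).
toy <- data.frame(
  PTS      = c(100, 250, 400, 600, 900),
  AST      = c(20, 60, 90, 150, 260),
  PF       = c(50, 30, 60, 40, 55),
  X2020.21 = c(1e6, 2e6, 5e6, 9e6, 2e7)
)

ranked <- salary_correlations(toy)
print(round(ranked, 2))
```

Sorting by absolute value keeps strongly negative correlations near the top as well, which a plain descending sort would push to the bottom.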

Subset players we want to display: higher performing and lower cost

playerswewant<- merged[(merged$PTS>500)&
                          (merged$AST>175) &
                          (merged$X2020.21<10000000),]
#Visualize the output
clusters_nba = as.factor(kmeans_obj_nba$cluster)
labels_nba = merged$Player

# Note: k-means numbers its clusters arbitrarily, so check
# kmeans_obj_nba$centers to confirm which cluster matches each label below.
ggplot(merged, aes(x = AST,
                   y = PTS,
                   shape = clusters_nba,
                   color = X2020.21)) +
  geom_point(size = 4) +
  geom_text(aes(label = ifelse(PTS > 500 & AST > 175 & X2020.21 < 10000000,
                               Player, '')),
            hjust = 0, vjust = 0) +
  ggtitle("AST & PTS vs. Salary Clustering") +
  xlab("Assists") +
  ylab("Points") +
  scale_color_gradient(low = "red", high = "blue") +
  scale_shape_manual(name = "Cluster",
                     labels = c("High performing", "Mid performance level", "Lower performing"),
                     values = c("1", "2", "3")) +
  theme_light()
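One caveat when reading the clusters in this plot: k-means assigns points by Euclidean distance, so a column measured in dollars (values in the millions) can outweigh columns measured in points or assists. The sketch below, on made-up toy data, shows one common way to put the columns on a comparable footing with scale() before clustering. It is an optional variation under that assumption, not the method used above.

```r
set.seed(1)

# Toy data (made up): two box-score columns and a salary column in dollars.
toy <- data.frame(
  PTS      = c(100, 120, 110, 800, 820, 790),
  AST      = c(20, 25, 22, 200, 210, 190),
  X2020.21 = c(2e7, 1e6, 9e6, 1.9e7, 2e6, 1e7)
)

# Standard deviations before scaling: salary is orders of magnitude larger.
print(sapply(toy, sd))

# scale() centers each column at 0 and rescales it to standard deviation 1.
toy_scaled <- scale(toy)
print(apply(toy_scaled, 2, sd))

# Cluster on the scaled columns, using the same algorithm as above.
km_scaled <- kmeans(toy_scaled, centers = 2, algorithm = "Lloyd")
print(km_scaled$cluster)
```

Whether scaling is appropriate depends on how much weight salary should carry relative to the performance columns.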

# List of Players we want
playerswewant$Player
## [1] "Bam Adebayo"             "De'Aaron Fox"           
## [3] "Donovan Mitchell"        "LaMelo Ball"            
## [5] "Luka Dončić"             "Shai Gilgeous-Alexander"
## [7] "Trae Young"

First, we determined which performance variables are most correlated with salary and settled on assists and points. To decide which players to recommend that our GM try to acquire, we subset players with a salary under $10,000,000, more than 500 points, and more than 175 assists. In the clustering plot, these are the players in the upper right quadrant (high performance stats) who are bright red (low salary). This yielded: Donovan Mitchell, De’Aaron Fox, LaMelo Ball, Trae Young, Bam Adebayo, Luka Dončić, and Shai Gilgeous-Alexander. These players offer good value and are good trade targets for the Wizards GM.
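The cut-offs described above can also be wrapped in a small function so the GM can try different thresholds. This is a minimal sketch: find_value_players is a hypothetical helper, the defaults mirror the thresholds used in this analysis, and the toy data frame is made up so the example is self-contained.

```r
# Hypothetical helper: players above the points and assists cut-offs
# and below the salary cut-off.
find_value_players <- function(df, min_pts = 500, min_ast = 175,
                               max_salary = 10000000) {
  df[df$PTS > min_pts & df$AST > min_ast & df$X2020.21 < max_salary, ]
}

# Toy data (made up for illustration only).
toy <- data.frame(
  Player   = c("Player A", "Player B", "Player C", "Player D"),
  PTS      = c(650, 700, 300, 900),
  AST      = c(200, 150, 220, 400),
  X2020.21 = c(4e6, 3e6, 2e6, 3e7),
  stringsAsFactors = FALSE
)

default_picks <- find_value_players(toy)$Player                 # "Player A"
looser_picks  <- find_value_players(toy, min_ast = 100)$Player  # "Player A" "Player B"
print(default_picks)
print(looser_picks)
```

With the real data, find_value_players(merged) should return the same rows as playerswewant, assuming the column names match.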

Drawbacks to this method include the fact that k-means works on numeric distances, so we are unable to incorporate categorical variables into this type of analysis, even though categorical factors like position can be relevant to decision-making here. We also can’t reasonably assume that underpaid players will be willing to switch teams and continue to be paid the same salary.
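One lightweight way to bring position back into the picture, without changing the clustering itself, is to break the shortlist down by position afterwards. The sketch below assumes the position column is named Pos (an assumption about the data file) and uses a made-up shortlist so it runs on its own.

```r
# Toy shortlist (made up); Pos is assumed to be the position column.
shortlist <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  Pos    = c("PG", "PG", "C", "SG"),
  stringsAsFactors = FALSE
)

# Count shortlisted players per position.
pos_counts <- table(shortlist$Pos)
print(pos_counts)

# Group player names by position for a quick roster-needs check.
by_pos <- split(shortlist$Player, shortlist$Pos)
print(by_pos)
```

If the real column is present, the same two calls could be applied to playerswewant to see whether the targets are concentrated at one position.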