DATA 621 Blog 4: NBA Player Similarity using K-Means

David Quarshie

Intro

PCA Plot

In the last blog we ran a Principal Component Analysis on NBA players to see which players are unique. Players that sat far away from the main group on the PCA plot differed the most from that group. We saw that MVPs Russell Westbrook and James Harden were more different than the others. We also saw that certain players branched off and formed their own group: Rudy Gobert, DeAndre Jordan, Hassan Whiteside, and Marcin Gortat can be found in the upper left, creating their own group of centers. Looking at their stats and at the PCA plot, we can say that these players are alike and can be placed in their own group. So how can we use R to create groups for all players in our dataset?

K-Means

The idea behind k-means clustering is to split a dataset into k clusters, in which each cluster is centered around its centroid (the mean of its members). To do this, the algorithm calculates the Euclidean distance between each player and each centroid. This distance defines how similar players are and shapes the clusters. The algorithm starts by picking a designated number of players (k) as the initial centroids, then assigns every player in the dataset to the closest centroid based on that Euclidean distance. As players are placed in clusters, each centroid is recalculated as the mean of its cluster, and the assignment is repeated until the clusters stop changing, improving the fit along the way.
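To make the steps above concrete, here is a minimal sketch of the assign-and-recompute loop in base R. It is only an illustration: the data is synthetic (not real NBA stats), the `player_XX` names are made up, and the loop does not handle edge cases such as a cluster ending up empty. In practice R's built-in `kmeans()` does this work for us.

```r
# Synthetic stand-in for scaled per-player stats (NOT real NBA data).
set.seed(621)
stats <- rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 5), ncol = 2)
)
rownames(stats) <- sprintf("player_%02d", seq_len(nrow(stats)))  # hypothetical names

k <- 2
# Step 1: pick k players at random as the starting centroids.
centroids <- stats[sample(nrow(stats), k), , drop = FALSE]

for (iter in 1:20) {
  # Step 2: Euclidean distance from every player to every centroid (n x k matrix).
  dists <- sapply(seq_len(k), function(j) {
    sqrt(rowSums(sweep(stats, 2, centroids[j, ])^2))
  })
  # Step 3: assign each player to the nearest centroid.
  assignment <- max.col(-dists, ties.method = "first")
  # Step 4: recompute each centroid as the mean of its members.
  new_centroids <- t(sapply(seq_len(k), function(j) {
    colMeans(stats[assignment == j, , drop = FALSE])
  }))
  # Stop once the centroids no longer move.
  if (all(abs(new_centroids - centroids) < 1e-8)) break
  centroids <- new_centroids
}

table(assignment)  # how many players landed in each cluster
```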

Gap Statistic

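A downside of k-means clustering is that the number of clusters, k, must be given before running the algorithm, and determining that number is difficult. For this study we used the gap statistic, which compares the within-cluster variation at each k against what we would expect from data with no cluster structure; it pointed to k = 10.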
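One way to compute a gap statistic in R is `clusGap()` from the `cluster` package. The sketch below is an assumption about how this could be done, not necessarily the exact call used for this study, and it runs on synthetic data with three well-separated groups rather than the NBA dataset. The values of `K.max`, `B`, and `nstart` are arbitrary choices for the example.

```r
library(cluster)

# Synthetic stand-in data (NOT real NBA stats): three groups of 30 "players".
set.seed(621)
stats <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 6), ncol = 2),
  matrix(rnorm(60, mean = 12), ncol = 2)
)
stats <- scale(stats)  # put every stat on the same scale before clustering

# Gap statistic for k = 1..6, using kmeans as the clustering function.
# B is the number of reference datasets; nstart is passed on to kmeans.
gap <- clusGap(stats, FUNcluster = kmeans, K.max = 6, B = 20, nstart = 10)

# Pick k with the rule from the original gap statistic paper.
best_k <- maxSE(gap$Tab[, "gap"], gap$Tab[, "SE.sim"], method = "Tibs2001SEmax")
best_k
```

Calling `plot(gap)` draws the gap value for each k, which is a quick way to eyeball where the curve levels off.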

K-Means Plot

With k chosen, we can finally run k-means to see which players fall into which groups.
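As a rough sketch of this step, base R's `kmeans()` can be run with `centers = 10` and the resulting cluster labels used to list the members of each group. The data and the `player_XXX` names here are synthetic placeholders, and `nstart` and `iter.max` are example settings, not the values used for the plot.

```r
# Synthetic stand-in for the scaled player stats (NOT real NBA data).
set.seed(621)
stats <- scale(matrix(rnorm(400), ncol = 4))
rownames(stats) <- sprintf("player_%03d", seq_len(nrow(stats)))  # hypothetical names

# Run k-means with k = 10; nstart tries several random starts and keeps the best.
fit <- kmeans(stats, centers = 10, nstart = 25, iter.max = 50)

# Group player names by the cluster they were assigned to.
groups <- split(rownames(stats), fit$cluster)
groups[[1]]  # players in the first cluster

fit$size     # number of players per cluster
```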

We can see that the centers Rudy Gobert and DeAndre Jordan formed their own cluster, as we expected. James Harden and Russell Westbrook also formed their own cluster, as did superstars like LeBron James, Kawhi Leonard, and Stephen Curry. The remaining players who play alike were grouped into the other clusters.