DATA 698 - Research Proposal: NBA Player Comparisons
Author: David Quarshie
Intro
Is LeBron the next Jordan? Was Kobe’s playing style closest to Jordan’s, or was it more like Magic’s? Which NBA star’s game does Duke’s star, Zion Williamson, most resemble? If you’ve paid attention to any NBA coverage recently, you’ve heard plenty of questions like these. Long-winded debates have gone on over which player is best, who is most like the great players of the past, and what the future of the league will look like. When discussing these topics, people usually watch video of players to compare styles, count championship rings to rank greatness, and look at certain stats to see who is more like whom. But in today’s world, where we have vast amounts of data and machine learning, shouldn’t there be a better way to make comparisons?
For my project I will look at datasets of past and present NBA players and of current NCAA basketball players in an attempt to find player similarities that we can’t get from just watching the game. There are plenty of sources with basic NBA and NCAA stats like points, rebounds, and assists, and there are also sources that give newer, more in-depth stats like player efficiency rating, offensive plus/minus, and usage rate. My plan is to apply various machine learning methods, clustering, and modeling to this data to make comparisons and see if we can pull out any other interesting insights.
Background
As previously stated, there are plenty of NBA datasets, and many statisticians have used them to come up with interesting findings.
Estimating the Ability of NBA Players
https://arxiv.org/pdf/1008.0705.pdf
In a paper posted to arXiv in 2010, Paul Fearnhead and Benjamin M. Taylor looked at the 2008-2009 NBA season to see whether the year-end awards (MVP, Defensive Player of the Year, and Rookie of the Year) measured up to the stats in their data. Using the data they collected, the two built a model to estimate each player’s ability in a season and then ranked players by that ability. Overall, the model did a great job of determining players’ offensive ability but was weaker on defense. As for the rankings, the model showed that LeBron James, who won the MVP in the 2008-09 season, was top ranked in ability. Then-rookie Russell Westbrook was also highly ranked, yet he lost the Rookie of the Year award to Derrick Rose. This study shows that models can use data to reveal things we may be oblivious to. While Derrick Rose did win the Rookie of the Year award and later became the MVP, Russell Westbrook has also won an MVP award and is currently regarded as a better player than Rose.
Using Datarobot to Predict NBA Player Performance
https://blog.datarobot.com/using-datarobot-to-predict-nba-player-performance
Looking at a more recent time frame, Benjamin Miller took a look at the 2018 NBA season, using regression to rank players’ performance. Rather than building a custom model as Fearnhead and Taylor did, Miller used a formula derived by John Hollinger that takes basic stats like field goals attempted/made, free throws attempted/made, rebounds, steals, and turnovers and produces a “game score”. In summary, the game score is a measure of a player’s productivity for a single game. With the DataRobot program, Miller was able to load a CSV of players’ stats, see their overall game scores, and compare players in 2018. The program also showed predicted versus actual player performance, giving insight into how accurate the model is.
Overall, the model told us that Bucks star Giannis Antetokounmpo was a great player, with a score of 29.14. Given that he stands 6’11” and can jump over players who are 6’6”, we can accept the model’s result with confidence. However, the model also gave Devin Booker a low score of -0.14. DataRobot actually took in text input from a news report, saw that Booker was coming off an injury, and lowered his score accordingly. This feature is impressive, giving the model an extra layer that doesn’t rely on numbers alone.
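As a rough illustration of the game score idea, the sketch below uses the commonly published form of Hollinger’s Game Score. The coefficients come from that published formula, not from the DataRobot post, and the stat line is made up for illustration. Python is used here only for compactness; the same arithmetic is a one-liner in R.

```python
def game_score(pts, fgm, fga, ftm, fta, orb, drb, stl, ast, blk, pf, tov):
    """Single-game productivity using the commonly published Game Score weights.

    Assumption: these weights follow the widely cited Hollinger formula;
    they are not taken from the DataRobot blog post itself.
    """
    return (
        pts
        + 0.4 * fgm          # reward made field goals
        - 0.7 * fga          # penalize shot attempts
        - 0.4 * (fta - ftm)  # penalize missed free throws
        + 0.7 * orb
        + 0.3 * drb
        + stl
        + 0.7 * ast
        + 0.7 * blk
        - 0.4 * pf
        - tov
    )


# Hypothetical stat line (not a real player's game).
score = game_score(pts=30, fgm=11, fga=20, ftm=6, fta=8,
                   orb=2, drb=8, stl=1, ast=5, blk=1, pf=2, tov=3)
print(round(score, 1))  # prints 24.8
```

Because the score is a plain weighted sum, it can be computed for every row of a box-score table and then averaged per player to compare seasons.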
Tableau K-Means Clustering Analysis w/ NBA Data
https://anthonysmoak.com/2018/06/16/tableau-k-means-clustering-analysis-w-nba-data/
So far, we’ve looked at ways to use NBA data to assess players’ skill and to rank players. But how about using that same data to create groups and see who’s most alike? Using Tableau, Anthony Smoak did just that. Known for its ease of use and visualization powers, Tableau has quickly become one of the most used tools, not just among data scientists but among anyone who wants to view data easily. By simply connecting his data source of 2017-18 NBA season data, Smoak was able to look at point guards and centers to see how they compare. In Tableau he took the number of assists and blocks per game these players had and created two clusters. While it may seem obvious which cluster point guards and centers will fall into when they are separated by blocks and assists, the analysis produced an interesting result: Nikola Jokic, Denver’s 7’ center, ended up in the point guard cluster. Jokic is clearly not a point guard, and he did not appear as an outlier on Tableau’s scatterplot, but his average of 8 assists a game means he classifies more as a point guard than as a center. Using only two stats, we were able to get useful clusters and compare players easily. It will be great to see the results when we cluster with more stats.
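The two-stat clustering described above can be sketched without Tableau. The snippet below is a minimal k-means (Lloyd’s algorithm) in plain Python, not Smoak’s workflow: the player names and per-game numbers are hypothetical, and the starting centroids are seeded by hand so the result is reproducible. In R, the built-in `kmeans()` function would play the same role.

```python
from math import dist


def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Made-up (assists, blocks) per-game values for hypothetical players.
players = {
    "guard_a": (9.1, 0.3),
    "guard_b": (7.4, 0.2),
    "guard_c": (6.8, 0.4),
    "center_a": (1.5, 2.1),
    "center_b": (2.0, 1.7),
    "passing_center": (6.1, 0.8),
}
names = list(players)
points = [players[n] for n in names]

# Seed one centroid at a guard and one at a center (an illustrative choice).
labels, centroids = kmeans(points, [players["guard_a"], players["center_a"]])
clusters = dict(zip(names, labels))
print(clusters)
```

With these illustrative numbers the high-assist center lands in the same cluster as the guards, which mirrors the kind of result described above. With more stats on different scales, the features would likely need to be standardized first so that no single stat dominates the distance.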
Project Assessment
As an avid basketball fan, I expect this project to be very interesting. While I can look at certain players’ playing styles and stats to make comparisons and create groups, it will be fun to see what differences the data shows. However, what I am most interested in is looking at NCAA stats to see where these players fall when clustering. Several differences between the two levels must be taken into account when doing this: college players face lower-skilled teams, the college game is slower and lower scoring than the NBA, and college teams play far fewer games than NBA teams. Using R, I should be able to wrangle data, make needed edits, and build several models and clusters to evaluate players. I’m also intrigued by the DataRobot tool that gets us the game score, ranking players’ ability. That may be used in addition to R’s modeling and overall clustering. The project should produce several charts and plots that will help show readers what the results are.
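One possible way to handle the NBA versus college differences before clustering is to standardize each stat within its own league, so a player is measured against his peers and not against raw totals. The sketch below is only an assumption about how that step might look; the field names (`league`, `ppg`) and the numbers are hypothetical, and in R the same idea could be expressed with `scale()` applied per group.

```python
from statistics import mean, pstdev


def standardize_within_group(rows, group_key, stat_keys):
    """Return copies of rows with each stat replaced by its z-score within the row's group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)

    out = []
    for row in rows:
        peers = groups[row[group_key]]
        scaled = dict(row)
        for key in stat_keys:
            values = [p[key] for p in peers]
            spread = pstdev(values)
            # Guard against a group where every player has the same value.
            scaled[key] = (row[key] - mean(values)) / spread if spread else 0.0
        out.append(scaled)
    return out


# Made-up points-per-game values for hypothetical players.
rows = [
    {"player": "nba_1", "league": "NBA", "ppg": 10.0},
    {"player": "nba_2", "league": "NBA", "ppg": 20.0},
    {"player": "nba_3", "league": "NBA", "ppg": 30.0},
    {"player": "ncaa_1", "league": "NCAA", "ppg": 8.0},
    {"player": "ncaa_2", "league": "NCAA", "ppg": 14.0},
    {"player": "ncaa_3", "league": "NCAA", "ppg": 20.0},
]
scaled = standardize_within_group(rows, "league", ["ppg"])
by_player = {r["player"]: r["ppg"] for r in scaled}
print(by_player)
```

Here the top scorer in each league ends up with the same standardized value even though the raw scoring levels differ, which is the property that makes cross-league clustering less sensitive to pace and schedule length.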