Intro

The 2024 NBA draft has been widely regarded as one of the weirdest classes in recent history. There is no clear MVP- or All-NBA-level talent like Victor Wembanyama last year, and no definitive answer to what the Atlanta Hawks will do with the first overall pick. Even with the negative sentiment surrounding this draft class, many prospects have the potential to become decent starters or role players in the league, if not better. In my opinion, and from what I gather from NBA and draft analysts, the variation in where prospects could be picked is quite high this year because there is no extreme talent gap between players, meaning teams will generally draft for fit (which I believe could start as early as pick 7 with the Blazers or pick 9 with the Grizzlies).

To figure out whether this high variation in pick selection actually exists, I compiled complete 2-round mock drafts from 12 different sources and calculated the average mocked pick for each prospect along with the standard deviation of those mocked picks. The mock draft sources were: Bleacher Report, CBS Sports, ESPN, nbadraft.net, NBA Draft Room, Net Scouts Basketball, SB Nation, Sports Illustrated, Tankathon, The Athletic, The Ringer, and Yahoo Sports. After gathering the mock averages and standard deviations, I modeled the mock average using Random Forest and XGBoost regression to determine which features affect mock draft position the most. Features for these models came from prospect bios, combine data, and box-score and advanced stats for the prospects’ 2023-24 season. More information on the data and models is provided in the sections that follow.

Set Up

To start my analysis, I first collected the data previously mentioned from NBA.com (Bios and Combine), basketballreference.com (College/G-League Stats), and RealGM.com (International Stats). Biography data included variables such as position, age, and status (college, international, or G-League) and combine data included statistics for all combine drills. Previous season data consisted of regular box score and advanced statistics.

library(tidyverse)
library(caret)
library(car)
library(vip)
prospect_data <- read_csv("prospect_data.csv")

prospect_data <- prospect_data %>%
  mutate(across(where(is.character), ~ iconv(.x, 
                                             to = "UTF-8")))

Calculating Average and Standard Deviation Mock Pick

After loading in the data, I first calculated the mean mock draft pick for each player by averaging the mock pick across all 12 mock drafts and then adding the number of NA values to that average. This penalizes prospects that were left out of one or more mock drafts (mocked to go undrafted) by pushing them later in the order. Adding the count of NAs is a basic penalization method, and different methods should be explored in future work. Along with the average mock pick, I also calculated the standard deviation of each prospect's mock picks to see if the variation in mock draft pick is as pronounced as it is claimed to be. The table below displays each prospect that appeared in at least 3 of the chosen mock drafts.

prospect_data <- prospect_data %>%
  rowwise() %>%
  # Average over the mocks a prospect appears in, plus one pick of penalty
  # for every mock that left the prospect out (NA)
  mutate(mock_average = mean(c_across(contains("Mock")), 
                             na.rm = TRUE) + 
           sum(is.na(c_across(contains("Mock")))),
         # contains() ignores case by default, so this selector also matches
         # the mock_average column created just above
         mock_sd = sd(c_across(contains("Mock")), na.rm = TRUE)
  ) %>%
  arrange(mock_average) %>%
  ungroup() %>%
  mutate(Mock_Pick = row_number())
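
As a small base-R sketch of the same calculation on made-up picks (the three `*_Mock` columns and both prospects are hypothetical), the penalized average and the spread can be computed from an explicit list of source columns. Assuming the penalty is meant to move a prospect later in the order, it is written here as an addition. Listing the columns by name is a deliberate choice: tidyselect's `contains()` matches case-insensitively by default, so a selector such as `contains("Mock")` would also match a derived column like `mock_average` once it exists.

```r
# Hypothetical toy data: three mock sources, NA = left out of that mock
toy <- data.frame(
  Name       = c("Prospect A", "Prospect B"),
  ESPN_Mock  = c(2, 40),
  CBS_Mock   = c(1, NA),
  Yahoo_Mock = c(3, 44)
)

# Explicit source columns, so derived columns can never leak into the stats
mock_cols <- c("ESPN_Mock", "CBS_Mock", "Yahoo_Mock")
picks <- as.matrix(toy[, mock_cols])

# Number of mocks each prospect is missing from
n_missing <- rowSums(is.na(picks))

# Penalty: each missing mock pushes the average one pick later
toy$mock_average <- rowMeans(picks, na.rm = TRUE) + n_missing

# Spread of the picks a prospect actually received
toy$mock_sd <- apply(picks, 1, sd, na.rm = TRUE)

print(toy[, c("Name", "mock_average", "mock_sd")])
```

In this toy case Prospect B averages 42 across two mocks and is moved to 43 for the one mock that left him out.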

mock_mean_sd <- prospect_data %>%
  select(Name, Mock_Pick, mock_average, mock_sd) %>%
  arrange(Mock_Pick)

knitr::kable(mock_mean_sd)
Name Mock_Pick mock_average mock_sd
Zaccharie Risacher 1 1.333333 0.8498366
Alexandre Sarr 2 1.833333 0.3726780
Reed Sheppard 3 3.666667 1.1785113
Stephon Castle 4 4.833333 1.5723302
Matas Buzelis 5 5.416667 0.9537936
Donovan Clingan 6 5.750000 1.4790199
Dalton Knecht 7 9.083333 2.0999339
Ron Holland II 8 9.750000 4.2056510
Cody Williams 9 9.750000 2.9474565
Tidjane Salaun 10 10.666667 2.2852182
Devin Carter 11 10.916667 2.7525241
Rob Dillingham 12 11.083333 3.6846152
Nikola Topic 13 11.666667 4.2295258
Jared McCain 14 16.083333 1.3202483
JaKobe Walter 15 16.500000 3.7969286
Zach Edey 16 18.333333 4.4783429
Tristan da Silva 17 18.583333 4.6629449
Isaiah Collier 18 18.833333 5.4594465
Carlton Carrington 19 19.750000 4.7980031
Kelel Ware 20 20.250000 5.1498382
Kyshawn George 21 20.916667 4.3866907
Yves Missi 22 21.250000 4.0645828
Tyler Kolek 23 22.083333 3.3530666
Kyle Filipowski 24 24.166667 5.4441610
Johnny Furphy 25 24.333333 3.6817870
DaRon Holmes II 26 24.916667 4.9406196
Tyler Smith 27 27.500000 7.3541372
Terrence Shannon Jr. 28 28.416667 4.5727150
Jaylon Tyson 29 29.416667 3.9255219
Ryan Dunn 30 29.666667 5.0881125
Pacome Dadiet 31 31.333333 3.9440532
Bobi Klintman 32 31.916667 6.2777163
Baylor Scheierman 33 32.250000 5.4791271
Cam Christie 34 36.500000 7.3200638
Kevin McCullar Jr. 35 37.750000 6.7838657
AJ Johnson 36 38.272727 7.4290492
NFaly Dante 37 39.333333 10.8230721
KJ Simpson 38 40.200000 10.2138943
Trentyn Flowers 39 40.833333 7.7855687
Dillon Jones 40 41.272727 7.2431691
Jamal Shead 41 41.363636 6.6705739
Adem Bona 42 41.583333 5.9225323
Justin Edwards 43 41.666667 8.6922699
Harrison Ingram 44 41.818182 6.4846184
Trey Alexander 45 41.833333 8.5993263
PJ Hall 46 41.857143 6.1959911
Jonathan Mogbo 47 42.250000 5.1659946
Nikola Djurisic 48 43.083333 4.7338908
Enrique Freeman 49 44.285714 8.1705357
Ulrich Chomche 50 44.444444 3.7464386
Keshad Johnson 51 45.100000 5.6083542
Pelle Larsson 52 45.300000 6.1622753
Melvin Ajinca 53 46.000000 6.2589330
Juan Nunez 54 46.083333 6.6640620
Ajay Mitchell 55 46.333333 8.0966385
Cam Spencer 56 47.750000 6.3415517
Tristen Newton 57 48.200000 3.5670207
Jaylen Wells 58 48.250000 7.9175438
Antonio Reeves 59 48.666667 6.2902040
Reece Beekman 60 48.750000 9.4597833
Jalen Bridges 61 49.200000 4.4411301
Oso Ighodaro 62 49.727273 5.2492292
Isaac Jones 63 50.500000 6.2888791
Bronny James 64 52.666667 5.0387388
Boogie Ellis 65 53.000000 7.6376262

From the table, you can see that Zaccharie Risacher had the lowest average mock pick, followed by Alex Sarr, Reed Sheppard, and Stephon Castle (the most frequent first four picks in mocks). Among the 65 prospects looked at, USC teammates Boogie Ellis and Bronny James had the highest (latest) average mock picks. Ellis was only picked in a few mocks and the majority of mocks had the Suns taking Bronny at pick 55. Looking at standard deviation, the lowest variation in mock draft pick belonged to Alex Sarr, who went either first or second in every mock (sd under 1). Sarr was followed by fellow lottery picks such as Risacher, Sheppard, Matas Buzelis, and Donovan Clingan. On the opposite side of the spectrum, the prospects with the highest variation in mock draft position were N’Faly Dante from Oregon (mock pick 37) and KJ Simpson from Colorado (mock pick 38), who both had standard deviations over 10 (meaning they could be picked around 10 spots higher or lower than their average pick). Unsurprisingly, both players are projected to go in the second round, where teams’ draft decisions change frequently.

Some other interesting observations: Tyler Kolek has the lowest standard deviation among projected non-lottery picks (many mocks have him going 22 to the Suns), Nikola Topic and Ron Holland have the largest standard deviations among projected lottery picks, Tristen Newton has the lowest standard deviation among projected second rounders (13th lowest overall), and Jared McCain has the 5th lowest standard deviation despite being projected as the 14th pick. Overall, only 30 of the 65 prospects had a standard deviation below 5, meaning the other 35 (more than half of the draft class) could be picked 5+ spots higher or lower than their average mock pick.
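
To make the "spots higher or lower" reading concrete, here is a minimal base-R sketch that turns an average and a standard deviation into a one-standard-deviation pick window. The four rows are copied from the table above; the column names `earliest`, `latest`, and `wide_range` are introduced only for this illustration, and the window is a rough guide rather than a formal interval.

```r
# Values copied from the mock average / standard deviation table
spread <- data.frame(
  Name         = c("Alexandre Sarr", "Jared McCain", "NFaly Dante", "KJ Simpson"),
  mock_average = c(1.833333, 16.083333, 39.333333, 40.200000),
  mock_sd      = c(0.3726780, 1.3202483, 10.8230721, 10.2138943)
)

# One-standard-deviation window around the average mock pick
# (pmax keeps the early end from dropping below pick 1)
spread$earliest <- pmax(1, spread$mock_average - spread$mock_sd)
spread$latest   <- spread$mock_average + spread$mock_sd

# Flag prospects whose mocks disagree by five or more picks
spread$wide_range <- spread$mock_sd >= 5

print(spread)
```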

Random Forest

To better understand why some players are mocked (on average) higher than others and which variables most influence average mock draft pick, I ran a random forest using the caret package in R. The response variable was the average mock pick and the explanatory variables were: Offensive Win Shares, Defensive Win Shares, Two-point percentage, Three-point rate, Free Throw Percentage, Wingspan, Standing Vertical Leap, Age, Personal Fouls, Position, Minutes Per Game, Usage Percentage, Turnover Percentage, and Block Percentage. Explanatory features were chosen based on low RMSE, relatively high R^2, reasonable predictions, and overall relevance to the model. Win shares were the only all-encompassing advanced statistic that was calculated for all prospects and positively contributed to the model (PER did not). The combination of efficiency statistics chosen (2P%, 3PAr, and FT%) produced a smaller RMSE than every other combination of efficiency metrics tried (TS%, eFG%, 3P%, FTr, and attempts and makes instead of percentages). Wingspan and standing vertical leap were the only statistically relevant combine metrics. Age was chosen over prospect status (college year, international, or G-League) due to better model performance. The only counting stats chosen were personal fouls and minutes per game, as the variation in games played between prospects altered counting statistics substantially. Unlike the other counting stats, however, both chosen metrics improved model performance without noticeably changing predictions. Once features were selected, the data was split 70/30 into training and test sets, centered, and scaled, and the model was run with 5-fold cross-validation repeated 10 times.

features <- prospect_data %>%
  select(Name, mock_average, OWS, DWS, `2P%`, `FT%`,
         Wingspan, `Standing Vertical Leap`, Age, PF, Pos,
         MPG, `USG%`, `TOV%`, `BLK%`, `3PAr`)

set.seed(123)
trainIndex <- createDataPartition(features$mock_average, 
                                  p = .7, 
                                  list = FALSE, 
                                  times = 1)
train <- features[trainIndex, ]
test  <- features[-trainIndex, ]

fitControl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 10)

rf_model <- train(
  mock_average ~ ., 
  data = train[, -1],
  method = "rf",
  trControl = fitControl,
  preProcess = c("center", "scale")
)
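
With `number = 5` and `repeats = 10`, the resampling scheme above redraws a 5-fold split ten times, for 50 resamples in total. A simplified base-R sketch of that idea follows. It assigns folds purely at random (caret's own fold creation is more careful about balancing the outcome across folds), and the training size of 48 and the simulated `age` vector are assumptions made for the illustration.

```r
set.seed(123)
n_train <- 48   # assumed training size after a 70% split of 65 prospects
k       <- 5    # folds per repeat
repeats <- 10   # number of times the fold split is redrawn

# One column per repeat; each entry is the fold (1..k) that row is held out in
fold_ids <- replicate(repeats, sample(rep(seq_len(k), length.out = n_train)))

# Held-out rows per fold and repeat: a k x repeats matrix
holdout_sizes <- apply(fold_ids, 2, tabulate, nbins = k)
total_resamples <- k * repeats

# Center and scale a feature using only the rows used for fitting,
# then apply those same statistics to every row (held-out rows included)
age <- rnorm(n_train, mean = 21, sd = 2)   # simulated stand-in for Age
fit_rows <- fold_ids[, 1] != 1             # repeat 1 with fold 1 held out
age_scaled <- (age - mean(age[fit_rows])) / sd(age[fit_rows])

print(total_resamples)
print(holdout_sizes[, 1])
```

Estimating the centering and scaling statistics on the fitting rows only keeps information from the held-out rows out of the preprocessing step.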

The final random forest model explained 20.63% of the variation in mock pick average for the training set. That is not particularly good, but relative to models with other combinations of variables it provided the best trade-off between model complexity and predictive power. The test set RMSE for this model was 11.88, which was also low relative to those alternatives. Using this model, a mock draft pick was then predicted for every prospect (training and test sets combined), producing an RMSE of 7.73. The model's draft predictions, as well as the initial mock pick average and the residuals, are shown below.

From the table below, you can see predictions ranged from about 7 to 46, with a mean of 30.89 and a majority of prospects having pick predictions between 20 and 40. Because of this compressed range, I decided to rank the prospects by their model predictions and used that ranking as the model's “mock draft”. Pittsburgh point guard Bub Carrington and Baylor big Yves Missi jumped up to picks 10 and 11, while projected lottery picks Nikola Topic and Dalton Knecht fell to picks 22 and 41 respectively. Knecht had the biggest negative residual, followed by fellow projected first rounders Tristan da Silva and Kyshawn George. The prospects with the largest positive residuals were Melvin Ajinca, Isaac Jones, Nikola Djurisic, and Bronny James, all projected second-round prospects.

test_prediction <- predict(rf_model, test)

rmse <- function(actual, predicted) {
  sqrt(mean((predicted - actual)^2))
}
rmse(test$mock_average, test_prediction)
## [1] 11.87708
rf_predictions <- predict(rf_model, features)
rmse(features$mock_average, rf_predictions)
## [1] 7.727638
mock_picks <- features %>%
  mutate(rf_pred = rf_predictions) %>%
  arrange(rf_pred) %>%
  mutate(Rank = row_number(),
         rf_residual = mock_average - rf_pred) %>%
  select(Name, mock_average, rf_pred, rf_residual, Rank)

knitr::kable(mock_picks)
Name mock_average rf_pred rf_residual Rank
Zaccharie Risacher 1.333333 6.997912 -5.6645787 1
Stephon Castle 4.833333 9.272980 -4.4396467 2
Matas Buzelis 5.416667 10.185224 -4.7685577 3
Alexandre Sarr 1.833333 11.858385 -10.0250512 4
Donovan Clingan 5.750000 13.763617 -8.0136168 5
Ron Holland II 9.750000 14.927029 -5.1770285 6
Reed Sheppard 3.666667 15.724277 -12.0576108 7
Tidjane Salaun 10.666667 16.082777 -5.4161108 8
Cody Williams 9.750000 17.785251 -8.0352508 9
Carlton Carrington 19.750000 19.505192 0.2448080 10
Yves Missi 21.250000 20.766727 0.4832731 11
Kelel Ware 20.250000 21.355050 -1.1050502 12
Devin Carter 10.916667 21.555295 -10.6386284 13
Rob Dillingham 11.083333 21.645070 -10.5617370 14
Tyler Smith 27.500000 22.120956 5.3790441 15
Kyle Filipowski 24.166667 22.598074 1.5685925 16
Zach Edey 18.333333 22.764336 -4.4310025 17
Jared McCain 16.083333 22.966886 -6.8835524 18
Johnny Furphy 24.333333 23.194785 1.1385479 19
JaKobe Walter 16.500000 23.862587 -7.3625872 20
Isaiah Collier 18.833333 24.101406 -5.2680726 21
Nikola Topic 11.666667 25.034453 -13.3677863 22
DaRon Holmes II 24.916667 27.220231 -2.3035645 23
Tyler Kolek 22.083333 27.893503 -5.8101696 24
Pacome Dadiet 31.333333 28.914096 2.4192370 25
Melvin Ajinca 46.000000 29.334860 16.6651402 26
Terrence Shannon Jr. 28.416667 30.968809 -2.5521421 27
Nikola Djurisic 43.083333 32.057005 11.0263286 28
Ryan Dunn 29.666667 32.307017 -2.6403505 29
Jaylon Tyson 29.416667 33.376324 -3.9596570 30
Baylor Scheierman 32.250000 33.872855 -1.6228553 31
Jamal Shead 41.363636 34.021793 7.3418435 32
Bobi Klintman 31.916667 34.390368 -2.4737014 33
Cam Christie 36.500000 34.786834 1.7131659 34
Ulrich Chomche 44.444444 34.937049 9.5073958 35
Justin Edwards 41.666667 35.100633 6.5660336 36
AJ Johnson 38.272727 35.193657 3.0790707 37
Kyshawn George 20.916667 35.597078 -14.6804113 38
Juan Nunez 46.083333 35.930116 10.1532178 39
Tristan da Silva 18.583333 36.108122 -17.5247882 40
Dalton Knecht 9.083333 36.118379 -27.0350457 41
Trey Alexander 41.833333 36.243704 5.5896292 42
Dillon Jones 41.272727 36.374544 4.8981829 43
Kevin McCullar Jr. 37.750000 36.583071 1.1669294 44
NFaly Dante 39.333333 36.700985 2.6323484 45
Trentyn Flowers 40.833333 37.167548 3.6657850 46
Adem Bona 41.583333 37.495626 4.0877074 47
Enrique Freeman 44.285714 37.910549 6.3751653 48
Isaac Jones 50.500000 38.313360 12.1866398 49
PJ Hall 41.857143 38.353300 3.5038425 50
KJ Simpson 40.200000 38.560479 1.6395207 51
Tristen Newton 48.200000 39.281590 8.9184100 52
Keshad Johnson 45.100000 39.415559 5.6844414 53
Jonathan Mogbo 42.250000 40.147405 2.1025951 54
Jaylen Wells 48.250000 41.554553 6.6954468 55
Harrison Ingram 41.818182 41.698179 0.1200029 56
Bronny James 52.666667 42.496154 10.1705129 57
Ajay Mitchell 46.333333 42.946228 3.3871049 58
Oso Ighodaro 49.727273 43.154260 6.5730130 59
Antonio Reeves 48.666667 43.226551 5.4401159 60
Jalen Bridges 49.200000 43.693157 5.5068431 61
Cam Spencer 47.750000 44.529400 3.2206000 62
Reece Beekman 48.750000 44.689514 4.0604864 63
Pelle Larsson 45.300000 44.814170 0.4858296 64
Boogie Ellis 53.000000 46.145090 6.8549104 65
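
As a short base-R illustration of how the residual and rank columns relate, the sketch below recomputes them for five rows copied from the table above. The `subset_rank` column is introduced only for this example and ranks within the five rows, not within the full class.

```r
# A few rows copied from the random forest table
preds <- data.frame(
  Name = c("Zaccharie Risacher", "Carlton Carrington", "Nikola Topic",
           "Melvin Ajinca", "Dalton Knecht"),
  mock_average = c(1.333333, 19.750000, 11.666667, 46.000000, 9.083333),
  rf_pred      = c(6.997912, 19.505192, 25.034453, 29.334860, 36.118379)
)

# Residual = consensus minus model: negative means the model has the
# prospect going later than the mocks do, positive means earlier
preds$rf_residual <- preds$mock_average - preds$rf_pred

# Ranking the raw predictions spreads them back onto a 1, 2, 3, ... pick scale
# (ranks here are within this five-row subset only)
preds$subset_rank <- rank(preds$rf_pred, ties.method = "first")

print(preds[order(preds$subset_rank), ])
```

Ranking is what lets a model whose raw predictions cluster between 20 and 40 still produce a full draft order.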

Variable importance was calculated using the R package vip (variable importance plots) and is plotted below. Age was far and away the most important variable, followed by Wingspan, BLK%, 2P%, and USG%. Both position dummy variables (with center as the omitted reference level) were the least important but were kept in the model because predictive performance decreased when they were removed.

vip(rf_model, n = 15)
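
One way to sanity-check an importance ranking is permutation importance: shuffle one feature at a time and see how much the error grows. The base-R sketch below is not the calculation vip performed above (by default vip is understood to use the fitted model's own importance scores). It uses a linear model as a stand-in and fully simulated data, so every column name, coefficient, and value is made up.

```r
set.seed(123)
n <- 200

# Simulated prospects: older and shorter-armed players go later, Noise is irrelevant
toy <- data.frame(
  Age      = rnorm(n, mean = 20, sd = 1.5),
  Wingspan = rnorm(n, mean = 82, sd = 3),
  Noise    = rnorm(n)
)
toy$pick <- 30 + 6 * (toy$Age - 20) - 1 * (toy$Wingspan - 82) + rnorm(n, sd = 2)

# Stand-in model (a linear fit rather than a random forest)
fit <- lm(pick ~ Age + Wingspan + Noise, data = toy)

rmse <- function(actual, predicted) sqrt(mean((predicted - actual)^2))
baseline <- rmse(toy$pick, predict(fit, toy))

# Importance of a feature = increase in RMSE after shuffling that feature
perm_importance <- sapply(c("Age", "Wingspan", "Noise"), function(v) {
  shuffled <- toy
  shuffled[[v]] <- sample(shuffled[[v]])
  rmse(toy$pick, predict(fit, shuffled)) - baseline
})

print(sort(perm_importance, decreasing = TRUE))
```

Because the approach only needs predictions, the same loop could in principle be pointed at any fitted model.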

XGBoost

Although the random forest produced interesting results, I also fit an XGBoost regression model to see whether it improved model performance. Due to time constraints and for computational efficiency, I used a single set of basic parameter values rather than tuning them. The model was run with the same centered and scaled variables and the same repeated 5-fold cross-validation.

xgb_grid <- expand.grid(
  nrounds = 100,
  max_depth = 6,
  eta = 0.3,
  gamma = 0,
  colsample_bytree = 0.8,
  min_child_weight = 1,
  subsample = 0.8
)

set.seed(123)
xgb_model <- train(mock_average ~ ., 
                   data = train[, -1], 
                   method = "xgbTree", 
                   trControl = fitControl, 
                   tuneGrid = xgb_grid,
                   preProcess = c("center", "scale"))
print(xgb_model)
## eXtreme Gradient Boosting 
## 
## 48 samples
## 14 predictors
## 
## Pre-processing: centered (15), scaled (15) 
## Resampling: Cross-Validated (5 fold, repeated 10 times) 
## Summary of sample sizes: 40, 39, 38, 37, 38, 39, ... 
## Resampling results:
## 
##   RMSE     Rsquared   MAE     
##   15.2571  0.1825577  12.92009
## 
## Tuning parameter 'nrounds' was held constant at a value of 100
## Tuning
##  held constant at a value of 1
## Tuning parameter 'subsample' was held
##  constant at a value of 0.8
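
Because the grid above has a single row, every parameter is held constant and nothing is actually tuned. A wider search could be sketched with the same seven column names. The candidate values below are arbitrary placeholders, not recommendations, and the fit count is an upper bound that assumes the same 5-fold, 10-repeat resampling.

```r
# Hypothetical wider grid: same seven parameters, several values for four of them
xgb_grid_wide <- expand.grid(
  nrounds = c(50, 100, 200),
  max_depth = c(2, 4, 6),
  eta = c(0.05, 0.1, 0.3),
  gamma = 0,
  colsample_bytree = 0.8,
  min_child_weight = c(1, 3),
  subsample = 0.8
)

# Number of parameter combinations to evaluate
n_candidates <- nrow(xgb_grid_wide)

# Upper bound on model fits: every candidate on every resample
max_fits <- n_candidates * 5 * 10

print(n_candidates)
print(max_fits)
```

With only 48 training prospects, shallower trees and a smaller learning rate are the kind of settings such a search would be able to compare.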

Across the cross-validation resamples of the training set, RMSE was 15.26 and R^2 was 18.26%, both worse than the random forest's. The test set RMSE was 13.63, lower than the cross-validated figure but higher than the random forest's 11.88. With only 65 prospects in total, the small test sample likely contributes to swings like this. Given the limited time frame (I wanted to finish the project before the start of the draft), I decided to use this as my final XGBoost model. After predicting the mock pick average for every prospect, the RMSE was 6.97, lower than the random forest's 7.73. The XGBoost model's predictions and residuals were added to the random forest predictions table, which is presented below.

Judged by RMSE across all 65 prospects (a figure that includes the prospects the model was trained on), the XGBoost model outperformed the random forest, with its first six predicted picks matching the order of the average mock picks. Predictions ranged from 1 to 53 with a mean of 30.54. The first deviation from the average mock draft comes with Kansas forward Johnny Furphy, who jumped to the 7th overall pick (third highest positive residual). Other notable observations include Nikola Topic and Dalton Knecht again falling out of the lottery, albeit with smaller residuals (picks 23 and 27), and projected second rounders Melvin Ajinca and Jamal Shead entering the first round. The largest negative residuals remained the same (Knecht, da Silva, and George) while the largest positive residuals included Ajinca, Shead, Furphy, and Tristen Newton.
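
The all-prospect RMSE mixes training and test prospects, so it helps to split it back apart. The sketch below uses only the RMSE values reported in this post and assumes 48 training and 17 test prospects (65 in total, with the all-prospect figure covering exactly those two groups). The `rmse_train` column is derived here, not reported by either model.

```r
# Reported RMSE values; 48 training and 17 test prospects assumed
n_train <- 48
n_test  <- 17
n_all   <- n_train + n_test

reported <- data.frame(
  model     = c("random forest", "XGBoost"),
  rmse_test = c(11.87708, 13.62956),
  rmse_all  = c(7.727638, 6.970269)
)

# Total squared error splits into a training part and a test part:
#   n_all * rmse_all^2 = n_train * rmse_train^2 + n_test * rmse_test^2
sse_train <- n_all * reported$rmse_all^2 - n_test * reported$rmse_test^2

# pmax() guards against a tiny negative value caused by rounding in the inputs
reported$rmse_train <- sqrt(pmax(0, sse_train) / n_train)

print(reported)
```

Under these assumptions the random forest's implied training RMSE is about 5.6, while XGBoost's is essentially zero, so XGBoost's lower all-prospect RMSE mostly reflects how closely it fits the prospects it was trained on.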

test_predictions <- predict(xgb_model, test)
rmse(test$mock_average, test_predictions)
## [1] 13.62956
xgb_predictions <- predict(xgb_model, features)
rmse(features$mock_average, xgb_predictions)
## [1] 6.970269
# xgb_predictions follows the row order of `features`, while mock_picks was
# re-sorted by rf_pred, so the predictions are matched by Name, not by position
xgb_by_name <- features %>%
  mutate(xgb_pred = xgb_predictions) %>%
  select(Name, xgb_pred)

mock_picks <- mock_picks %>%
  left_join(xgb_by_name, by = "Name") %>%
  arrange(xgb_pred) %>%
  mutate(Rank = row_number(),
         xgb_residual = mock_average - xgb_pred) %>%
  select(Name, mock_average, xgb_pred, xgb_residual, 
         rf_pred, rf_residual, Rank)
knitr::kable(mock_picks)
Name mock_average xgb_pred xgb_residual rf_pred rf_residual Rank
Zaccharie Risacher 1.333333 1.333115 0.000218 6.997912 -5.6645787 1
Alexandre Sarr 1.833333 1.833295 0.000038 11.858385 -10.0250512 2
Reed Sheppard 3.666667 3.665965 0.000702 15.724277 -12.0576108 3
Stephon Castle 4.833333 4.834532 -0.001199 9.272980 -4.4396467 4
Matas Buzelis 5.416667 5.416933 -0.000266 10.185224 -4.7685577 5
Donovan Clingan 5.750000 5.750646 -0.000646 13.763617 -8.0136168 6
Johnny Furphy 24.333333 6.154612 18.178721 23.194785 1.1385479 7
Ron Holland II 9.750000 9.750295 -0.000295 14.927029 -5.1770285 8
Cody Williams 9.750000 10.435095 -0.685095 17.785251 -8.0352508 9
Tidjane Salaun 10.666667 10.666607 0.000060 16.082777 -5.4161108 10
Devin Carter 10.916667 10.917864 -0.001197 21.555295 -10.6386284 11
Rob Dillingham 11.083333 12.703759 -1.620426 21.645070 -10.5617370 12
Jared McCain 16.083333 16.083364 -0.000031 22.966886 -6.8835524 13
JaKobe Walter 16.500000 16.500229 -0.000229 23.862587 -7.3625872 14
Zach Edey 18.333333 18.333504 -0.000171 22.764336 -4.4310025 15
Melvin Ajinca 46.000000 18.751251 27.248749 29.334860 16.6651402 16
Carlton Carrington 19.750000 19.749514 0.000486 19.505192 0.2448080 17
Kelel Ware 20.250000 20.250557 -0.000557 21.355050 -1.1050502 18
Yves Missi 21.250000 21.250176 -0.000176 20.766727 0.4832731 19
Tyler Kolek 22.083333 22.083941 -0.000608 27.893503 -5.8101696 20
Jamal Shead 41.363636 22.233538 19.130098 34.021793 7.3418435 21
Isaiah Collier 18.833333 23.921852 -5.088519 24.101406 -5.2680726 22
Nikola Topic 11.666667 24.021381 -12.354714 25.034453 -13.3677863 23
DaRon Holmes II 24.916667 24.916666 0.000001 27.220231 -2.3035645 24
Tyler Smith 27.500000 27.499571 0.000429 22.120956 5.3790441 25
Terrence Shannon Jr. 28.416667 28.416672 -0.000005 30.968809 -2.5521421 26
Dalton Knecht 9.083333 28.781212 -19.697879 36.118379 -27.0350457 27
Jaylon Tyson 29.416667 29.416868 -0.000201 33.376324 -3.9596570 28
Ryan Dunn 29.666667 29.666752 -0.000085 32.307017 -2.6403505 29
Kyle Filipowski 24.166667 30.212368 -6.045701 22.598074 1.5685925 30
Pacome Dadiet 31.333333 31.333406 -0.000073 28.914096 2.4192370 31
Bobi Klintman 31.916667 31.917181 -0.000514 34.390368 -2.4737014 32
Baylor Scheierman 32.250000 32.249912 0.000088 33.872855 -1.6228553 33
Trey Alexander 41.833333 35.865795 5.967538 36.243704 5.5896292 34
Keshad Johnson 45.100000 36.243801 8.856199 39.415559 5.6844414 35
Cam Christie 36.500000 36.500771 -0.000771 34.786834 1.7131659 36
Nikola Djurisic 43.083333 36.586975 6.496358 32.057005 11.0263286 37
Tristan da Silva 18.583333 37.498913 -18.915580 36.108122 -17.5247882 38
Tristen Newton 48.200000 37.633964 10.566036 39.281590 8.9184100 39
Kevin McCullar Jr. 37.750000 37.749859 0.000141 36.583071 1.1669294 40
AJ Johnson 38.272727 38.272659 0.000068 35.193657 3.0790707 41
NFaly Dante 39.333333 39.332607 0.000726 36.700985 2.6323484 42
Kyshawn George 20.916667 39.799248 -18.882581 35.597078 -14.6804113 43
KJ Simpson 40.200000 40.200134 -0.000134 38.560479 1.6395207 44
Dillon Jones 41.272727 41.272423 0.000304 36.374544 4.8981829 45
Adem Bona 41.583333 41.583473 -0.000140 37.495626 4.0877074 46
Justin Edwards 41.666667 41.665756 0.000911 35.100633 6.5660336 47
Harrison Ingram 41.818182 41.818790 -0.000608 41.698179 0.1200029 48
PJ Hall 41.857143 41.856918 0.000225 38.353300 3.5038425 49
Jonathan Mogbo 42.250000 42.250164 -0.000164 40.147405 2.1025951 50
Isaac Jones 50.500000 43.599632 6.900368 38.313360 12.1866398 51
Enrique Freeman 44.285714 44.284580 0.001134 37.910549 6.3751653 52
Ulrich Chomche 44.444444 44.444447 -0.000003 34.937049 9.5073958 53
Pelle Larsson 45.300000 45.299885 0.000115 44.814170 0.4858296 54
Juan Nunez 46.083333 46.082729 0.000604 35.930116 10.1532178 55
Ajay Mitchell 46.333333 46.333057 0.000276 42.946228 3.3871049 56
Cam Spencer 47.750000 47.749931 0.000069 44.529400 3.2206000 57
Jaylen Wells 48.250000 48.249256 0.000744 41.554553 6.6954468 58
Antonio Reeves 48.666667 48.666973 -0.000306 43.226551 5.4401159 59
Reece Beekman 48.750000 48.749405 0.000595 44.689514 4.0604864 60
Jalen Bridges 49.200000 49.199665 0.000335 43.693157 5.5068431 61
Oso Ighodaro 49.727273 49.726696 0.000577 43.154260 6.5730130 62
Trentyn Flowers 40.833333 49.946163 -9.112830 37.167548 3.6657850 63
Bronny James 52.666667 52.666843 -0.000176 42.496154 10.1705129 64
Boogie Ellis 53.000000 52.999729 0.000271 46.145090 6.8549104 65

Just as for the random forest model, vip was used to calculate variable importance. There wasn’t much change in importance between models, as age also dominated the XGBoost model. Small changes can be seen among the other top variables, most notably the rise of defensive win shares to the fourth most important variable. Position still has little importance, along with 3PAr and FT%.

vip(xgb_model, n = 15)

Conclusion

With no definitive top-tier talent and high variability in mock draft positions, the 2024 NBA draft presents a challenging landscape for teams. Teams are likely to prioritize fit and potential over an established hierarchy, making this draft unpredictable and likely to yield surprises in player selections.

Both random forest and XGBoost were used to model average mock pick based on several game- and player-level statistics. XGBoost produced the lower RMSE when predicting every prospect, and in both models attributes such as Age, Wingspan, BLK%, Defensive Win Shares, 2P%, and USG% all strongly influence mock draft position, while a player's position on the court, 3PAr, and FT% have little to no impact relative to the other variables.

This analysis provides an understanding of how statistical modeling can assist in predicting draft outcomes, although further refinements and data enhancements could improve predictive accuracy and robustness. Future work could account for the ordinal nature of draft picks and how standard deviation in mock draft pick affects model predictions.

Sources