Predicting a NBA player’s career by Draft Selection

Load Packages

library(tidyr)
library(dplyr)
library(psych)
library(tidyverse)
library(ggplot2)

Research question

In the NBA, can draft pick location predict success in the league? If so, what are the odds of a second round player, having a successful career in the NBA?

The goal of this project is to predict how successful an NBA player is based on the Draft number in the NBA Draft. Every year, around 60 players get drafted to play in the NBA, but not everyone is successful. To define success, I looked at the amount of years played In the league and VORP (Value Over Replacement Player).

Type of study

This is an observational study.

Explanatory

The explanatory variables are number of years in the league (numerical), VORP(numerical) and experience (categorical).

Data Collection

Data was collected from https://www.basketball-reference.com/draft/ and the list shows career statistics of NBA player’s drafted from 1994 to 2014.

nba_players <- read.csv("https://raw.githubusercontent.com/mikegankhuyag/606-HW/master/NBA%20Draft%20Picks%201994-2014.csv", stringsAsFactors = FALSE)

nba.players <- data.frame(nba_players)

head(nba.players)

##   Year Rk Pk  Tm         Player                   College Yrs   G   MP
## 1 2014  1  1 CLE Andrew Wiggins      University of Kansas   4 251 9069
## 2 2014  2  2 MIL  Jabari Parker           Duke University   3 152 4874
## 3 2014  3  3 PHI    Joel Embiid      University of Kansas   2  36  924
## 4 2014  4  4 ORL   Aaron Gordon     University of Arizona   4 209 5091
## 5 2014  5  5 UTA     Dante Exum                             2 148 3045
## 6 2014  6  6 BOS   Marcus Smart Oklahoma State University   4 211 5995
##    PTS  TRB AST   FG.  X3P.   FT. MP.1 PTS.1 TRB.1 AST.1   WS WS.48  BPM
## 1 5111 1020 532 0.450 0.332 0.755 36.1  20.4   4.1   2.1 10.5 0.055 -2.3
## 2 2403  847 314 0.491 0.341 0.748 32.1  15.8   5.6   2.1  9.0 0.088 -1.3
## 3  730  294  79 0.469 0.347 0.789 25.7  20.3   8.2   2.2  2.1 0.107  2.7
## 4 2069 1119 322 0.463 0.300 0.701 24.4   9.9   5.4   1.5 10.8 0.102  0.1
## 5  805  263 309 0.385 0.308 0.743 20.6   5.4   1.8   2.1  1.1 0.017 -3.3
## 6 1964  803 772 0.357 0.291 0.757 28.4   9.3   3.8   3.7  9.1 0.073  0.4
##   VORP
## 1 -0.8
## 2  0.9
## 3  1.1
## 4  2.7
## 5 -1.0
## 6  3.7

Data Preperation

nba.players$Yrs <-as.character(nba.players$Yrs)
nba.players$Yrs <- as.numeric(nba.players$Yrs)

nba.players$Yrs[is.na(nba.players$Yrs)] <- 0

nba.players$VORP <- as.character(nba.players$VORP)
nba.players$VORP <- as.numeric(nba.players$VORP)

nba.players$VORP[is.na(nba.players$VORP)] <- 0

nba.players$Pk <- as.character(nba.players$Pk)
nba.players$Pk <- as.numeric(nba.players$Pk)

nba.players$Draft_Inverse <- nba.players$Pk*-1

Categorize

For this project, I wanted to categorize success in the NBA by the amount of years a player played in the league.

I categorized years in the league below.

1.Legend - 10 or more years 2.Very Experienced - 8-9 Years 3.Experienced - 6-7 Years 4.Some Experience - 4-5 Years 5.Little Experience - 1-3 Years 6-Unsuccessful - 0 years

Experience_grade <- data.frame(
  Years = 0:23,
  Experience_g = c("Unsuccessful(0)","Little Experience(1-3)","Little Experience(1-3)","Little Experience(1-3)", "Some Experience(4-5)", "Some Experience(4-5)", "Experienced(6-7)", "Experienced(6-7)", "Very Experienced(8-9)","Very Experienced(8-9)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)","Legend(10+)"))

nba.players$Experience <- Experience_grade$Experience_g[match(nba.players$Yrs,Experience_grade$Years)]
head(nba.players$Experience)

## [1] Some Experience(4-5)   Little Experience(1-3) Little Experience(1-3)
## [4] Some Experience(4-5)   Little Experience(1-3) Some Experience(4-5)  
## 6 Levels: Experienced(6-7) Legend(10+) ... Very Experienced(8-9)

Relevant summary statistics

Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

describe(nba.players$Yrs)

##    vars    n mean   sd median trimmed  mad min max range skew kurtosis
## X1    1 1234 5.59 4.76      4    5.12 4.45   0  21    21 0.72    -0.36
##      se
## X1 0.14

describe(nba.players$VORP)

##    vars    n mean   sd median trimmed  mad  min   max range skew kurtosis
## X1    1 1234 3.96 10.8      0    1.55 1.19 -8.3 116.6 124.9 4.24    25.14
##      se
## X1 0.31

table(nba.players$Experience, useNA = 'ifany')

## 
##       Experienced(6-7)            Legend(10+) Little Experience(1-3) 
##                    130                    291                    342 
##   Some Experience(4-5)        Unsuccessful(0)  Very Experienced(8-9) 
##                    178                    181                    112

describeBy(nba.players$Yrs, 
           group = nba.players$Experience, mat=TRUE)

##     item                 group1 vars   n      mean        sd median
## X11    1       Experienced(6-7)    1 130  6.423077 0.4959586      6
## X12    2            Legend(10+)    1 291 12.608247 2.4978199     12
## X13    3 Little Experience(1-3)    1 342  1.941520 0.7863021      2
## X14    4   Some Experience(4-5)    1 178  4.370787 0.4843779      4
## X15    5        Unsuccessful(0)    1 181  0.000000 0.0000000      0
## X16    6  Very Experienced(8-9)    1 112  8.464286 0.5009643      8
##       trimmed    mad min max range      skew   kurtosis         se
## X11  6.403846 0.0000   6   7     1 0.3078134 -1.9198419 0.04349845
## X12 12.248927 2.9652  10  21    11 1.0244646  0.4998887 0.14642479
## X13  1.927007 1.4826   1   3     2 0.1025859 -1.3809722 0.04251835
## X14  4.340278 0.0000   4   5     1 0.5305261 -1.7281563 0.03630566
## X15  0.000000 0.0000   0   0     0       NaN        NaN 0.00000000
## X16  8.455556 0.0000   8   9     1 0.1413091 -1.9976293 0.04733668

describeBy(nba.players$VORP, 
           group = nba.players$Experience, mat=TRUE)

##     item                 group1 vars   n       mean         sd median
## X11    1       Experienced(6-7)    1 130  2.3538462  5.4015380   0.65
## X12    2            Legend(10+)    1 291 13.9927835 17.5389198   9.20
## X13    3 Little Experience(1-3)    1 342 -0.3114035  0.7378846  -0.20
## X14    4   Some Experience(4-5)    1 178  0.1494382  2.4126373  -0.40
## X15    5        Unsuccessful(0)    1 181  0.0000000  0.0000000   0.00
## X16    6  Very Experienced(8-9)    1 112  5.2303571  8.5144587   2.50
##        trimmed      mad  min   max range     skew  kurtosis         se
## X11  1.3778846  3.18759 -4.1  25.4  29.5 1.843642  3.494965 0.47374622
## X12 10.9828326 10.97124 -7.1 116.6 123.7 2.215044  6.765234 1.02814968
## X13 -0.2908759  0.29652 -2.9   8.6  11.5 5.089910 63.413006 0.03990023
## X14 -0.1902778  1.03782 -4.1  12.6  16.7 2.215084  7.409229 0.18083481
## X15  0.0000000  0.00000  0.0   0.0   0.0      NaN       NaN 0.00000000
## X16  3.9366667  5.63388 -8.3  42.3  50.6 1.916635  5.015057 0.80454072

Ploting the Data

ggplot(nba.players, aes(x=nba.players$Yrs)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(nba.players, aes(x=nba.players$VORP)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

##Models ###Predicting years in the league by draft location.

Visualizing the relationship between draft pick position and years in the NBA.

ggplot(data = nba.players, mapping = aes(x=nba.players$Yrs, y= nba.players$Pk))+
  geom_point() +
  ggtitle("Relationship between Draft Pick and Years in the League")

cor(nba.players$Yrs,nba.players$Pk)

## [1] -0.5552688

Yrs_P<-lm(Yrs ~ Pk, data = nba.players)
summary(Yrs_P)

## 
## Call:
## lm(formula = Yrs ~ Pk, data = nba.players)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.925 -2.815 -1.083  2.496 14.627 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.236187   0.228203   44.86   <2e-16 ***
## Pk          -0.155503   0.006636  -23.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.961 on 1232 degrees of freedom
## Multiple R-squared:  0.3083, Adjusted R-squared:  0.3078 
## F-statistic: 549.2 on 1 and 1232 DF,  p-value: < 2.2e-16

The correlation tells the model to calculate a players amount of years in the league by draft spot, is 10.236 plus the multiplication of -0.1555 and draft spot.

Years in NBA = -0.155503*(pick) + 10.236

#In 2017, Lonzo Ball was drafted numeber 2 and based on this model, he will last 9.93 years in the league. Josh Hart was drafted number 30 and based on the model, he will last 5.57 years in the league.

(-0.155503*2)+10.236187

## [1] 9.925181

(-0.155503*30)+10.236187

## [1] 5.571097

The linear model above.,gives us a R squared of 30.8%. This tells us that 30.8% of variability in years in the NBA is explained by draft pick location.

Prediction and prediction errors

Lets check the conditions of the linear model.

plot(nba.players$Yrs ~ nba.players$Pk)
abline(Yrs_P)

Model Assessment: 1. linearity 2. nearly normal residuals 3.constant variability.

plot(Yrs_P$residuals ~ nba.players$Pk)
abline(h = 0, lty = 3)

hist(Yrs_P$residuals)

qqnorm(Yrs_P$residuals)
qqline(Yrs_P$residuals)

Based on the Model Assessment, the linear model looks good.

Predicting VORP with draft location.

ggplot(data = nba.players, mapping = aes(x=nba.players$VORP, y= nba.players$Pk))+
  geom_point()+
  ggtitle("Relationship between Draft Pick and Player's VORP")

cor(nba.players$VORP,nba.players$Pk)

## [1] -0.3465931

VORP_P<-lm(VORP ~ Pk, data = nba.players)
summary(VORP_P)

## 
## Call:
## lm(formula = VORP ~ Pk, data = nba.players)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.619  -4.931  -1.514   1.567 106.281 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.53909    0.58368   18.06   <2e-16 ***
## Pk          -0.22012    0.01697  -12.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.13 on 1232 degrees of freedom
## Multiple R-squared:  0.1201, Adjusted R-squared:  0.1194 
## F-statistic: 168.2 on 1 and 1232 DF,  p-value: < 2.2e-16

The correlation tells the model to calculate a player’s VORP by draft spot, is 10.539 plus the multiplication of -0.22012 and draft spot.

Years in NBA = -0.22012*(pick) + 10.539

#In 2017, Lonzo Ball was drafted numeber 2 and based on this model, his VORP score in the league will be 10.1. Josh Hart was drafted number 30 and based on the model, his VORP score in the league will be 3.94.

(-0.22012 *2)+ 10.53909

## [1] 10.09885

(-0.22012 *30) +10.53909

## [1] 3.93549

The R^2 in the linear model is 12%, which means that 12% of variability in VORP is predicted by draft pick.

12% is very low, which means draft pick locations aren’t very good for predicting VORP. Let’s check if the model is skewed.

nba.players %>% select(Pk, Player, PTS, VORP, Experience) %>% 
filter(VORP > 60 | VORP < -5)

##    Pk             Player   PTS  VORP            Experience
## 1  16         Nick Young  7643  -5.1           Legend(10+)
## 2   4         Chris Paul 15614  71.7           Legend(10+)
## 3  13  Sebastian Telfair  4183  -5.8           Legend(10+)
## 4   1       LeBron James 28959 116.6           Legend(10+)
## 5  41       Willie Green  6059  -5.9           Legend(10+)
## 6   1 Michael Olowokandi  4135  -8.3 Very Experienced(8-9)
## 7   9      Dirk Nowitzki 30331  65.8           Legend(10+)
## 8  10        Paul Pierce 26397  61.5           Legend(10+)
## 9   1         Tim Duncan 26496  89.3           Legend(10+)
## 10 14     Maurice Taylor  5889  -5.9 Very Experienced(8-9)
## 11 27      Jacque Vaughn  3463  -7.1           Legend(10+)
## 12 13        Kobe Bryant 33643  72.1           Legend(10+)
## 13  5      Kevin Garnett 26071  94.0           Legend(10+)
## 14  2         Jason Kidd 17529  78.2           Legend(10+)

We see that only 8 players have a VORP score 60 and 6 players have a score of -5.

Lets transform VORP using log +1 transformation to deal with the skew.

nba.players$LOGVORP <- log(nba.players$VORP +1)

## Warning in log(nba.players$VORP + 1): NaNs produced

nba.players$LOGVORP[which(is.infinite(nba.players$LOGVORP))] =0
nba.players$LOGVORP[which(is.nan(nba.players$LOGVORP))] =0
LOGVORP_P <- lm(LOGVORP ~ Pk, data = nba.players)
summary(LOGVORP_P)

## 
## Call:
## lm(formula = LOGVORP ~ Pk, data = nba.players)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.8109 -0.8082 -0.1140  0.6161  4.1415 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.540108   0.070767   21.76   <2e-16 ***
## Pk          -0.031835   0.002058  -15.47   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.228 on 1232 degrees of freedom
## Multiple R-squared:  0.1627, Adjusted R-squared:  0.162 
## F-statistic: 239.4 on 1 and 1232 DF,  p-value: < 2.2e-16

VORP with a log + 1 transformation did not make a significant change.

plot(LOGVORP_P$residuals ~ nba.players$Pk)
abline(h = 0, lty = 3)

hist(LOGVORP_P$residuals)

qqnorm(LOGVORP_P$residuals)
qqline(LOGVORP_P$residuals)

Since VORP is not a good predictor, lets focus on years in the league to score success in the league.

ggplot(nba.players, aes(x=Experience, y=Pk)) +
  geom_boxplot(fill = 'yellow') +
  ggtitle("Players Draft Picks & Experience")+
  coord_flip()

Based on the categories, lets calculate the odds of each pick falling into each category.

nba_counts <- nba.players %>% group_by(Pk) %>% count(Experience)
nba_draft <- nba.players %>% group_by(Pk) %>% count(Pk)
Prob <- left_join(nba_counts,nba_draft, by = "Pk")
colnames(Prob) <- c("Pk","Experience","Experience_count","Pick_count")
probability_nba <- mutate(Prob, probability = Experience_count/Pick_count)
probability_nba

## # A tibble: 303 x 5
## # Groups:   Pk [60]
##       Pk             Experience Experience_count Pick_count probability
##    <dbl>                 <fctr>            <int>      <int>       <dbl>
##  1     1       Experienced(6-7)                2         21  0.09523810
##  2     1            Legend(10+)               11         21  0.52380952
##  3     1 Little Experience(1-3)                1         21  0.04761905
##  4     1   Some Experience(4-5)                2         21  0.09523810
##  5     1  Very Experienced(8-9)                5         21  0.23809524
##  6     2       Experienced(6-7)                2         21  0.09523810
##  7     2            Legend(10+)               10         21  0.47619048
##  8     2 Little Experience(1-3)                2         21  0.09523810
##  9     2   Some Experience(4-5)                2         21  0.09523810
## 10     2  Very Experienced(8-9)                5         21  0.23809524
## # ... with 293 more rows

probability_nba_spread <- probability_nba %>% select(Pk, Experience, probability) %>% spread(Experience, probability)
probability_nba_spread[is.na(probability_nba_spread)] = 0
probability_nba_spread <-select(probability_nba_spread,'Pk','Unsuccessful(0)','Little Experience(1-3)','Some Experience(4-5)','Experienced(6-7)','Very Experienced(8-9)','Legend(10+)')
probability_nba_spread[sample(nrow(probability_nba_spread), 10), ]

## # A tibble: 10 x 7
## # Groups:   Pk [10]
##       Pk `Unsuccessful(0)` `Little Experience(1-3)` `Some Experience(4-5)`
##    <dbl>             <dbl>                    <dbl>                  <dbl>
##  1    27         0.0000000                0.2380952              0.1904762
##  2    22         0.0000000                0.3809524              0.3333333
##  3    49         0.3333333                0.3333333              0.0952381
##  4    55         0.4000000                0.3500000              0.0500000
##  5    56         0.4000000                0.4000000              0.0000000
##  6    48         0.2857143                0.4761905              0.1904762
##  7    46         0.2857143                0.4285714              0.0952381
##  8     2         0.0000000                0.0952381              0.0952381
##  9    44         0.4285714                0.2380952              0.1428571
## 10    21         0.0000000                0.2857143              0.1428571
## # ... with 3 more variables: `Experienced(6-7)` <dbl>, `Very
## #   Experienced(8-9)` <dbl>, `Legend(10+)` <dbl>

Visualizing the predictions based on the categories.

par(mfrow=c(3,2))
plot(probability_nba_spread$`Unsuccessful(0)`,main="Draft Pick Spot of Players That Never Played")
plot(probability_nba_spread$`Little Experience(1-3)`,main="Draft Pick Spots of Players with 1-3 Yrs Experience")
plot(probability_nba_spread$`Some Experience(4-5)`,main="Draft Pick Spots of Players with 4-5 Yrs Experience")
plot(probability_nba_spread$`Experienced(6-7)`,main="Draft Pick Spots of Players with 6-7 Yrs Experience")
plot(probability_nba_spread$`Very Experienced(8-9)`,main="Draft Pick Spots of Players with 8-9 Yrs Experience")
plot(probability_nba_spread$`Legend(10+)`,main="Draft Pick Spots of Players with Over 10 Yrs Experience")

As, expected, there ins an inverse relationship, between players that never played and players that played 10 or more years. There isn’t much of a relationship between players that played 4-7.

We can see the overall trend below.

plot(nba.players %>% 
       select(Pk, Yrs) %>% 
       group_by(Pk) %>% 
       summarise(mean = mean(Yrs)), main ="Average NBA Career span(Years) based on Draft Spot")

2nd Rounders.

Below is the visualization of average second rounders by draft pick(picks 31-60).

plot(nba.players %>% 
       select(Pk, Yrs) %>% 
       filter(Pk >=31) %>% 
       group_by(Pk) %>% 
       summarise(mean = mean(Yrs)), main ="Average NBA Career span(Years) of Second Rounders")

Visualizing only second rounders, the model is not linear as all picks. Let’s check which draft spots has been most successful.

probability_nba %>%
  select(Pk, Experience, probability) %>%
  filter(Pk >=31 & Experience == 'Legend(10+)' & probability >0.15)

## # A tibble: 4 x 3
## # Groups:   Pk [4]
##      Pk  Experience probability
##   <dbl>      <fctr>       <dbl>
## 1    35 Legend(10+)   0.1904762
## 2    37 Legend(10+)   0.2857143
## 3    42 Legend(10+)   0.1904762
## 4    43 Legend(10+)   0.1904762

Based on the data, a player has a 28.5% chance of playing 10 or more years if they are selected 37th overall. Players selected, 35,42, and 43 have a 19% chance of playing 10 or more years in the league.

A lot of second rounders do not play in the league, as their contracts aren’t guaranteed. Which spots had the highest chances of playing any years in the league?

probability_nba %>%
  select(Pk, Experience, probability) %>%
  filter(Pk >=31 & Experience !='Unsuccessful(0)' & probability >0.50)

## # A tibble: 5 x 3
## # Groups:   Pk [5]
##      Pk             Experience probability
##   <dbl>                 <fctr>       <dbl>
## 1    32 Little Experience(1-3)   0.6666667
## 2    33 Little Experience(1-3)   0.5238095
## 3    38 Little Experience(1-3)   0.5238095
## 4    40 Little Experience(1-3)   0.5238095
## 5    41 Little Experience(1-3)   0.5238095

there was a 67% chance of playing in the league if you are selected 32nd and 52% chance of playing in the league if you are selected 33rd, 38th, 40th, or 41st. All surprisingly more successful than the 31st pick.

Are other NBA statistics a better predictor for Experience?

From our initial data, I tested the other stats to check if they were better predictors of success.

Exp_all <- lm(Yrs ~ PTS + TRB + AST +FG. +X3P.+ FT., data = nba.players)
summary(Exp_all)

## 
## Call:
## lm(formula = Yrs ~ PTS + TRB + AST + FG. + X3P. + FT., data = nba.players)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.103 -1.554 -0.321  1.431  7.899 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.401e-01  7.630e-01   0.184    0.854    
## PTS         2.488e-04  4.273e-05   5.823 7.94e-09 ***
## TRB         9.606e-04  7.924e-05  12.124  < 2e-16 ***
## AST         6.672e-04  9.980e-05   6.685 3.95e-11 ***
## FG.         6.207e+00  1.258e+00   4.935 9.45e-07 ***
## X3P.        7.245e-01  5.987e-01   1.210    0.227    
## FT.         1.128e+00  7.346e-01   1.535    0.125    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.264 on 944 degrees of freedom
##   (283 observations deleted due to missingness)
## Multiple R-squared:  0.7433, Adjusted R-squared:  0.7417 
## F-statistic: 455.6 on 6 and 944 DF,  p-value: < 2.2e-16

From the data we can see that Pts, Rebounds, Assists and FG% is correlated with years in the NBA. This is unsurprising, considering that points, rebounds, assists, and field goal percentages are the most crucial statistics in the NBA.

Conclusion

To predict how successful an NBA player is, I modeled two numerical values, years in the league vs draft pick location and VORP score and draft pick location. Based on our linear model, draft pick location vs VORP had an R^2 of 12% and was heavily skewed. The model for draft pick location and years in the league had a strong R^2 of 30%.

Since, years in the league had a stronger correlation, I decided to categorize the amount of years in the league into the 6 categories below. 1.Legend - 10 or more years 2.Very Experienced - 8-9 Years 3.Experienced - 6-7 Years 4.Some Experience - 4-5 Years 5.Little Experience - 1-3 Years 6-Unsuccessful - 0 years

Using the categories above, I was able to find the probability of each location falling into the category. As expected there was an inverse relationship between players that played 10 years in the league vs players that played 0 years in the league. The probabilities of players that played between 4-7 years in the league were pretty evenly selected in the draft. Based on the probability scores above, I decided to look into second rounders. There was a surprising statistic that the 37th pick overall had a 28% chance of playing 10 years or more, which is significantly higher than any other picks. Picks 35, 42, and 43 were second highest with 19% chance of playing 10 or more years in the league.

Since playing in the league isn’t always guaranteed, I wanted to see which location had the highest chance or playing in the league at all. It was the 32nd pick that had a chance of 66% of playing 1-3 years in the league. Also, to stay in the league, other statistics are always important. Based on the overall statistics of the players from 1994 to 2014, points, assists, rebounds and field goal percentage had the highest correlation to the number of years in the league.

Overall, this if a player was known to be going into the second round, they have a 66% chance of playing the league if they are drafted 32nd and a 28% chance to play in the league more than 10 years if they are 37th overall. Since our model only has a 30% R^2, its important to consider other statistics such as points, assists, rebounds and field goal percentage.