Intro

I wanted to see whether I could build a model to predict if a candidate should be drafted in the 2015 NFL draft, using historical combine data (1999-2014) and Madden 15 player ratings. I’m a big fan of the Madden games, and I believe they do a good job of rating the players on their rosters. I downloaded the data from the following sites and assumed that any candidate with a Madden rating should be drafted. Note: a few draft candidates share a name with a current NFL player. Their ratings were zeroed out.
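The labeling rule described above can be sketched in R roughly as follows. This is only a minimal sketch, not the code used for this post: the data frame `combine`, its toy rows, and the `same_name_as_current_player` flag are hypothetical, and treating a missing rating as 0 is an assumption. Only the `Rating` and `Draft` column names come from the data itself.

```r
# Toy stand-in for the merged combine/Madden data (all values are made up).
combine <- data.frame(
  Name   = c("Player A", "Player B", "Player C"),
  Year   = c(2010, 2015, 2012),
  Rating = c(93, 88, NA),
  # Hypothetical flag: a 2015 candidate who shares a name with a rated NFL player.
  same_name_as_current_player = c(FALSE, TRUE, FALSE)
)

# Assumption: candidates without a Madden rating get a rating of 0.
combine$Rating[is.na(combine$Rating)] <- 0

# Zero out ratings that were matched to the wrong person by name.
combine$Rating[combine$same_name_as_current_player] <- 0

# Label: any Madden rating above 0 counts as "should be drafted".
combine$Draft <- factor(as.integer(combine$Rating > 0), levels = c(0, 1))
print(combine[, c("Name", "Rating", "Draft")])
```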

Data overview

There are 5346 players in total from 1999-2015, with 15 variables. At this point, there appear to be many NAs for Wonderlic, Bench Press, Shuttle and 3Cone. The lack of Wonderlic scores was a surprise because I thought most candidates would have that published.

## 'data.frame':    5346 obs. of  15 variables:
##  $ Year       : int  2000 2011 2010 2010 2015 2011 2009 2005 2001 2013 ...
##  $ Name       : Factor w/ 5262 levels "A.C. Leonard",..: 4877 2861 298 1995 3664 1477 3664 746 3664 3267 ...
##  $ College    : Factor w/ 335 levels "Abilene Christian",..: 150 89 44 209 89 209 280 215 330 151 ...
##  $ POS        : Factor w/ 18 levels "C","CB","DE",..: 14 11 18 4 18 15 3 11 15 15 ...
##  $ Height     : int  77 75 70 76 74 73 76 75 69 73 ...
##  $ Weight     : int  211 270 186 295 202 213 274 236 207 230 ...
##  $ Wonderlic  : int  33 NA NA NA NA NA NA NA NA NA ...
##  $ Bench.Press: int  NA 30 13 23 NA 21 24 20 13 24 ...
##  $ VertLeap   : num  24.5 36.5 33.5 30.5 NA 34.5 31 45.5 39.5 31.5 ...
##  $ BroadJump  : int  99 125 105 114 NA 130 110 130 NA 118 ...
##  $ Shuttle    : num  4.38 4.37 4.18 4.48 NA 4.18 NA 4.13 NA 4.24 ...
##  $ X3Cone     : num  7.2 6.95 6.98 7.32 NA 7.28 NA 7.12 NA 6.75 ...
##  $ X40.Yard   : num  5.28 4.62 4.56 5.04 4.59 4.37 5 4.65 4.38 4.6 ...
##  $ Rating     : int  93 93 92 95 0 89 89 96 89 82 ...
##  $ Draft      : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 2 2 2 ...
##       Year                    Name                            College    
##  Min.   :1999   Anthony Davis   :   3   Florida                   : 111  
##  1st Qu.:2003   Brandon Williams:   3   Florida State             : 110  
##  Median :2008   Chris Brown     :   3   Georgia                   : 105  
##  Mean   :2008   Chris Davis     :   3   Alabama                   :  99  
##  3rd Qu.:2013   Chris Jones     :   3   Southern Californiaifornia:  93  
##  Max.   :2015   Josh Davis      :   3   Oklahoma                  :  92  
##                 (Other)         :5328   (Other)                   :4736  
##       POS           Height          Weight        Wonderlic    
##  WR     : 684   Min.   :65.00   Min.   :155.0   Min.   : 4.00  
##  CB     : 534   1st Qu.:73.00   1st Qu.:207.0   1st Qu.:20.00  
##  RB     : 467   Median :74.00   Median :236.0   Median :25.00  
##  DE     : 429   Mean   :74.04   Mean   :245.1   Mean   :24.53  
##  OT     : 418   3rd Qu.:76.00   3rd Qu.:287.8   3rd Qu.:30.00  
##  DT     : 410   Max.   :82.00   Max.   :386.0   Max.   :48.00  
##  (Other):2404                                   NA's   :5061   
##   Bench.Press       VertLeap       BroadJump        Shuttle     
##  Min.   : 0.00   Min.   : 0.00   Min.   : 74.0   Min.   :0.000  
##  1st Qu.:17.00   1st Qu.:30.00   1st Qu.:108.0   1st Qu.:4.190  
##  Median :21.00   Median :33.00   Median :114.0   Median :4.340  
##  Mean   :21.32   Mean   :32.86   Mean   :113.3   Mean   :4.383  
##  3rd Qu.:25.00   3rd Qu.:36.00   3rd Qu.:120.0   3rd Qu.:4.560  
##  Max.   :51.00   Max.   :46.00   Max.   :147.0   Max.   :5.560  
##  NA's   :1702    NA's   :1052    NA's   :1183    NA's   :1682   
##      X3Cone         X40.Yard         Rating      Draft   
##  Min.   :0.000   Min.   :4.210   Min.   : 0.00   0:3913  
##  1st Qu.:6.990   1st Qu.:4.550   1st Qu.: 0.00   1:1433  
##  Median :7.220   Median :4.710   Median : 0.00           
##  Mean   :7.304   Mean   :4.795   Mean   :20.16           
##  3rd Qu.:7.570   3rd Qu.:5.000   3rd Qu.:64.00           
##  Max.   :9.120   Max.   :6.050   Max.   :99.00           
##  NA's   :1689    NA's   :15
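The NA counts in the summary above (for example, 5061 of 5346 rows, or about 95%, have no Wonderlic score) can also be pulled out directly. A minimal sketch, assuming a data frame `df` with the same column names; the toy values here are made up:

```r
# Toy data frame with the same kind of gaps (values are made up).
df <- data.frame(
  Wonderlic   = c(33, NA, NA, NA),
  Bench.Press = c(NA, 30, 13, 23),
  Shuttle     = c(4.38, 4.37, NA, 4.48),
  X3Cone      = c(7.20, 6.95, NA, 7.32)
)

# Number of missing values per column, and the share of rows they represent.
na_counts <- colSums(is.na(df))
na_share  <- round(na_counts / nrow(df), 2)
print(data.frame(na_counts, na_share))
```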

I wanted to explore the distributions of the 7 combine metrics: Wonderlic, Bench Press, Vertical Leap, Broad Jump, Shuttle, 3Cone and 40 yd dash. Grey shows the previous draft candidates and red shows the 2015 candidates.
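One way to draw this kind of grey-versus-red comparison in base R is sketched below for a single metric. This is an illustration rather than the original plotting code: the data are simulated, and the choice of overlaid density-scaled histograms is an assumption.

```r
# Toy data (simulated): 40-yard dash times for past and 2015 candidates.
set.seed(1)
df <- data.frame(
  Year     = c(rep(2010, 200), rep(2015, 50)),
  X40.Yard = c(rnorm(200, mean = 4.80, sd = 0.3),
               rnorm(50,  mean = 4.75, sd = 0.3))
)

past    <- df$X40.Yard[df$Year < 2015]
current <- df$X40.Yard[df$Year == 2015]

# Shared breaks so both histograms use the same bins.
breaks    <- pretty(range(df$X40.Yard), n = 20)
h_past    <- hist(past,    breaks = breaks, plot = FALSE)
h_current <- hist(current, breaks = breaks, plot = FALSE)

# Density scale (freq = FALSE) keeps the groups comparable despite
# their different sizes.
ymax <- max(h_past$density, h_current$density)
plot(h_past, freq = FALSE, col = "grey", border = "white",
     ylim = c(0, ymax), main = "40 yd dash", xlab = "Seconds")
plot(h_current, freq = FALSE, col = rgb(1, 0, 0, 0.5), border = "white",
     add = TRUE)
```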

Modeling

I created a simple logistic classification model using past candidates’ combine performance metrics and Madden ratings to determine whether the current candidates should be drafted.
Due to the poor data quality of the Wonderlic metric, I’m setting all of its NAs to zero. I am also using only complete observations to train the model. Luckily, there are 2411 complete observations, which should be a good representative sample. Of those 2411 observations, 771 (32%) were drafted to the NFL.
##       Year                   Name            College          POS      
##  Min.   :1999   Aaron Williams :   2   Nebraska  :  47   CB     : 254  
##  1st Qu.:2003   Larry Brown    :   2   Iowa      :  43   OT     : 242  
##  Median :2007   Levi Brown     :   2   Notre Dame:  43   DE     : 227  
##  Mean   :2007   Michael Johnson:   2   Georgia   :  42   OLB    : 218  
##  3rd Qu.:2011   Mike Walker    :   2   Oklahoma  :  42   RB     : 206  
##  Max.   :2014   Ryan Grant     :   2   Clemson   :  40   DT     : 202  
##                 (Other)        :2399   (Other)   :2154   (Other):1062  
##      Height          Weight        Wonderlic        Bench.Press  
##  Min.   :65.00   Min.   :166.0   Min.   : 0.0000   Min.   : 0.0  
##  1st Qu.:73.00   1st Qu.:212.0   1st Qu.: 0.0000   1st Qu.:17.0  
##  Median :74.00   Median :248.0   Median : 0.0000   Median :21.0  
##  Mean   :74.28   Mean   :253.4   Mean   : 0.4529   Mean   :21.4  
##  3rd Qu.:76.00   3rd Qu.:300.0   3rd Qu.: 0.0000   3rd Qu.:25.0  
##  Max.   :80.00   Max.   :386.0   Max.   :34.0000   Max.   :51.0  
##                                                                  
##     VertLeap       BroadJump        Shuttle          X3Cone     
##  Min.   : 0.00   Min.   : 82.0   Min.   :0.000   Min.   :0.000  
##  1st Qu.:29.50   1st Qu.:106.0   1st Qu.:4.210   1st Qu.:7.000  
##  Median :33.00   Median :114.0   Median :4.370   Median :7.260  
##  Mean   :32.55   Mean   :112.5   Mean   :4.404   Mean   :7.329  
##  3rd Qu.:35.50   3rd Qu.:119.0   3rd Qu.:4.590   3rd Qu.:7.620  
##  Max.   :45.50   Max.   :139.0   Max.   :5.560   Max.   :9.040  
##                                                                 
##     X40.Yard         Rating     Draft   
##  Min.   :4.210   Min.   : 0.0   0:1640  
##  1st Qu.:4.570   1st Qu.: 0.0   1: 771  
##  Median :4.750   Median : 0.0           
##  Mean   :4.827   Mean   :24.2           
##  3rd Qu.:5.070   3rd Qu.:68.0           
##  Max.   :6.000   Max.   :99.0           
## 
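The preparation steps described above (Wonderlic NAs set to zero, then complete observations only) could look roughly like this. It is a sketch under assumptions: the toy rows are made up, and restricting the training set with `Year < 2015` is inferred from the maximum year of 2014 in the summary.

```r
# Toy data (made up) with the same column names as the real data set.
df <- data.frame(
  Year        = c(2000, 2011, 2010, 2015, 2009),
  Wonderlic   = c(33, NA, NA, NA, NA),
  Bench.Press = c(20, 30, NA, 21, 24),
  X40.Yard    = c(5.28, 4.62, 4.56, 4.59, 5.00),
  Draft       = factor(c(1, 1, 1, 0, 0), levels = c(0, 1))
)

# Treat a missing Wonderlic score as 0 so it does not drop the whole row.
df$Wonderlic[is.na(df$Wonderlic)] <- 0

# Keep past candidates only, then keep rows with no remaining NA.
past         <- df[df$Year < 2015, ]
trainingdata <- past[complete.cases(past), ]

# Class balance of the training set.
print(table(trainingdata$Draft))
print(round(prop.table(table(trainingdata$Draft)), 2))
```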

Training 1st trial

I first fit the classification model using position, height, weight, Wonderlic, bench press, vertical leap, broad jump, shuttle, 3Cone and 40 yd dash as predictors of whether or not the candidate is drafted.
From the model output, bench press does not seem to be a very strong predictor (with a p-value of about 0.019, it is the weakest of the combine metrics). I fed the training data back into the model to validate it. From the confusion matrix, the model drafted 413 out of 2411 candidates, with 157 false positives and 515 false negatives. Of the 413 predicted picks, 256 were true positives, which gives a precision of 256 / (256 + 157) = 62% and a recall of 256 / (256 + 515) = 33%.
## 
## Call:
## glm(formula = Draft ~ POS + Height + Weight + Wonderlic + Bench.Press + 
##     VertLeap + BroadJump + Shuttle + X3Cone + X40.Yard, family = binomial, 
##     data = trainingdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4119  -0.8505  -0.5596   1.0391   2.5522  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  19.805418   3.899124   5.079 3.79e-07 ***
## POSCB         0.367359   0.549945   0.668 0.504139    
## POSDE        -0.791971   0.340890  -2.323 0.020166 *  
## POSDT        -0.639610   0.290455  -2.202 0.027659 *  
## POSFB        -1.059816   0.468788  -2.261 0.023774 *  
## POSFS        -0.302096   0.536582  -0.563 0.573434    
## POSILB       -0.670943   0.424751  -1.580 0.114195    
## POSK          8.688904 882.745206   0.010 0.992147    
## POSLS       -23.844516 882.745651  -0.027 0.978450    
## POSOG         0.007986   0.292633   0.027 0.978228    
## POSOLB       -0.464583   0.401690  -1.157 0.247448    
## POSOT         0.104489   0.287851   0.363 0.716606    
## POSP        -21.523642 882.745225  -0.024 0.980547    
## POSQB       -14.405258 271.048630  -0.053 0.957615    
## POSRB        -0.718476   0.483170  -1.487 0.137014    
## POSSS        -0.211465   0.523777  -0.404 0.686411    
## POSTE        -0.568981   0.386264  -1.473 0.140741    
## POSWR         0.482998   0.536366   0.901 0.367854    
## Height       -0.111030   0.035038  -3.169 0.001531 ** 
## Weight        0.031028   0.004994   6.213 5.20e-10 ***
## Wonderlic     0.101337   0.022027   4.601 4.21e-06 ***
## Bench.Press   0.025175   0.010693   2.354 0.018555 *  
## VertLeap     -0.074123   0.020599  -3.598 0.000320 ***
## BroadJump     0.043611   0.011557   3.773 0.000161 ***
## Shuttle       2.824002   0.373751   7.556 4.16e-14 ***
## X3Cone       -2.865776   0.266635 -10.748  < 2e-16 ***
## X40.Yard     -3.011864   0.455140  -6.617 3.65e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3022.0  on 2410  degrees of freedom
## Residual deviance: 2592.3  on 2384  degrees of freedom
## AIC: 2646.3
## 
## Number of Fisher Scoring iterations: 13
##    lcmpred
##        0    1
##   0 1483  157
##   1  515  256
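For reference, the fit-and-validate loop behind an output like the one above can be sketched as follows. The data are simulated and use only two predictors, the model name `lcm` is hypothetical, and the 0.5 probability cutoff is an assumption (the cutoff actually used is not stated).

```r
set.seed(42)
n <- 500
# Toy training data (simulated): faster 40 times raise the draft probability.
trainingdata <- data.frame(
  X40.Yard = rnorm(n, mean = 4.8, sd = 0.3),
  VertLeap = rnorm(n, mean = 33, sd = 4)
)
p <- plogis(-1 - 3 * (trainingdata$X40.Yard - 4.8))
trainingdata$Draft <- factor(rbinom(n, 1, p), levels = c(0, 1))

# Logistic regression, like the glm() call above but with fewer predictors.
lcm <- glm(Draft ~ X40.Yard + VertLeap, family = binomial, data = trainingdata)

# Predicted probability of being drafted, cut at 0.5 (an assumed threshold).
prob    <- predict(lcm, newdata = trainingdata, type = "response")
lcmpred <- factor(as.integer(prob > 0.5), levels = c(0, 1))

# Rows are the actual labels, columns are the model's predictions.
cm <- table(trainingdata$Draft, lcmpred)
print(cm)
```

Because the model is scored on the same rows it was trained on, a table like this is an optimistic view of how it would do on unseen candidates.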

Training 2nd trial

Looking at the predictive power and collinearity tables, I decided to take out bench press (not significant), Wonderlic (poor data), broad jump (not significant), and height and weight (positionally unique) for the next iteration. This time the model drafted 362 out of 2411 candidates, with 150 false positives and 559 false negatives. That is a slight bump in false negatives, but not too big of a difference.
## 
## Call:
## glm(formula = Draft ~ POS + VertLeap + Shuttle + X3Cone + X40.Yard, 
##     family = binomial, data = trainingdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8869  -0.8738  -0.5990   1.1224   2.1954  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  23.84178    2.66613   8.942  < 2e-16 ***
## POSCB        -2.30202    0.37840  -6.084 1.18e-09 ***
## POSDE        -1.62892    0.31371  -5.192 2.08e-07 ***
## POSDT        -0.39844    0.28372  -1.404   0.1602    
## POSFB        -2.39270    0.42174  -5.673 1.40e-08 ***
## POSFS        -2.66838    0.39703  -6.721 1.81e-11 ***
## POSILB       -2.02460    0.36528  -5.543 2.98e-08 ***
## POSK          6.60800  882.74505   0.007   0.9940    
## POSLS       -22.60638  882.74549  -0.026   0.9796    
## POSOG         0.19366    0.28589   0.677   0.4982    
## POSOLB       -1.83216    0.33986  -5.391 7.01e-08 ***
## POSOT         0.28289    0.27123   1.043   0.2969    
## POSP        -22.66549  882.74508  -0.026   0.9795    
## POSQB       -15.63707  302.53842  -0.052   0.9588    
## POSRB        -2.54794    0.37217  -6.846 7.58e-12 ***
## POSSS        -2.41490    0.40622  -5.945 2.77e-09 ***
## POSTE        -1.89693    0.33512  -5.660 1.51e-08 ***
## POSWR        -2.01588    0.39129  -5.152 2.58e-07 ***
## VertLeap     -0.03141    0.01750  -1.795   0.0726 .  
## Shuttle       2.95647    0.36278   8.149 3.65e-16 ***
## X3Cone       -2.74950    0.25454 -10.802  < 2e-16 ***
## X40.Yard     -3.12900    0.41634  -7.516 5.67e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 3022.0  on 2410  degrees of freedom
## Residual deviance: 2703.9  on 2389  degrees of freedom
## AIC: 2747.9
## 
## Number of Fisher Scoring iterations: 13
##    lcm2pred
##        0    1
##   0 1490  150
##   1  559  212
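To put the two trials side by side, the sketch below recomputes precision, recall and accuracy from the two confusion matrices printed above, and runs a likelihood-ratio test on the reported residual deviances. The helper `metrics()` is hypothetical, and the test assumes both models were fit to the same 2411 rows, which the identical null deviances suggest.

```r
# Confusion-matrix counts copied from the two outputs above
# (rows = actual, columns = predicted). matrix() fills column by column.
cm1 <- matrix(c(1483, 515, 157, 256), nrow = 2,
              dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))
cm2 <- matrix(c(1490, 559, 150, 212), nrow = 2,
              dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

# Hypothetical helper: standard metrics from a 2x2 confusion matrix.
metrics <- function(cm) {
  tp <- cm["1", "1"]
  fp <- cm["0", "1"]
  fn <- cm["1", "0"]
  c(precision = tp / (tp + fp),
    recall    = tp / (tp + fn),
    accuracy  = sum(diag(cm)) / sum(cm))
}

print(round(rbind(model1 = metrics(cm1), model2 = metrics(cm2)), 3))

# Likelihood-ratio test: model 2 is model 1 minus five predictors, so the
# deviance difference is compared against a chi-square with 5 df.
dev_diff <- 2703.9 - 2592.3
df_diff  <- 2389 - 2384
p_value  <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)
print(c(dev_diff = dev_diff, df_diff = df_diff, p_value = p_value))
```

By these numbers the first model has a precision of about 62% and a recall of about 33%, against about 59% and 27% for the second, which matches the higher AIC of the second model (2747.9 vs 2646.3).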

Results

I tested the second model with the 2015 draft class data and cross-referenced the results with the NFL.com rankings (http://www.nfl.com/top50).

Results below:

Name                  College                 POS  NFL.com Ranking
Dorial Green-Beckham  Oklahoma                WR   26
Phillip Dorsett       Miami                   WR   47
Mario Alford          West Virginia           WR
Kenny Bell            Nebraska                WR
Justin Hardy          East Carolina           WR
Ali Marpet            Hobart & William Smith  OG
Trae Waynes           Michigan State          CB   10
Jalen Collins         Louisiana State         CB   30
Xavier Cooper         Washington State        DT
Jake Fisher           Oregon                  OT   45
Kevin White           West Virginia           WR   2
Amari Cooper          Alabama                 WR   3
J.J. Nelson           UAB                     WR
Senquez Golson        Mississippi             CB
Stefon Diggs          Maryland                WR
Kaelin Clay           Utah                    WR
Cameron Clear         Texas AM                OT
Chris Conley          Georgia                 WR
Deon Long             Maryland                WR
Derrick Lott          Tennessee-Chattanooga   DT
DeAndrew White        Alabama                 WR
Tyler Lockett         Kansas State            WR
Troy Hill             Oregon                  CB
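Scoring a new draft class and cross-referencing it with a ranking list can be sketched as below. Everything here is illustrative: the data are simulated, the names `past`, `class2015`, `rankings` and `picks` are hypothetical, and the 0.5 cutoff is an assumption.

```r
set.seed(7)
# Toy past data (simulated) and a toy 2015 class.
past <- data.frame(X40.Yard = rnorm(300, mean = 4.8, sd = 0.3))
past$Draft <- factor(rbinom(300, 1, plogis(-1 - 4 * (past$X40.Yard - 4.8))),
                     levels = c(0, 1))
class2015 <- data.frame(
  Name     = c("Candidate A", "Candidate B", "Candidate C"),
  X40.Yard = c(4.25, 4.80, 5.40)
)
# Hypothetical ranking list; unranked candidates are simply absent.
rankings <- data.frame(Name = "Candidate A", NFL.com.Ranking = 10)

model <- glm(Draft ~ X40.Yard, family = binomial, data = past)

# Score the new class and keep those above the (assumed) 0.5 cutoff.
class2015$prob <- predict(model, newdata = class2015, type = "response")
picks <- class2015[class2015$prob > 0.5, ]

# Left join on Name so unranked picks stay in the table with NA ranking,
# then list the strongest predictions first.
picks <- merge(picks, rankings, by = "Name", all.x = TRUE)
picks <- picks[order(-picks$prob), ]
print(picks)
```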