Every major American sport has a system for bringing in young, talented players who are just coming out of college. This system is called the draft. The purpose of the draft is for teams to pick the best college players and build their rosters for the future. The NFL draft has 7 rounds and around 249 to 254 picks. Other than watching players live or on film, the main way to evaluate this talent has been the combine. The combine is a set of exercises and drills that talent scouts use to evaluate college players. There are six drills: forty yard dash, vertical jump, bench press reps, broad jump, three cone drill, and shuttle time. The two I will break down are the forty yard dash, which times how fast you can run 40 yards in a straight line, and the vertical jump, which measures in inches how high you can jump with your arms over your head. For the longest time there has been a debate over which of these drills matters most and how they affect a GM's or scout's decision to draft someone.
The goal of my visualization is to show which drill is most highly valued, based on where players are drafted. For example, if a player runs a super fast forty yard dash (under 4.3 seconds) and gets drafted in the first round, then that player is more highly valued than a player who runs a 4.4 to 4.5 (which is average) and gets picked in the second or third round. In my example I will use wide receivers, measuring their forty yard dash and vertical jump to see how those correlate with their draft position.
library(readr)
## Warning: package 'readr' was built under R version 4.3.3
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(RColorBrewer)
nfl_combine <- read_csv("nfl_combine.csv")
## Rows: 6128 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): position
## dbl (7): Round, forty, vertical, bench_reps, broad_jump, three_cone, shuttle
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Keep only rows with a recorded draft round and results for all six drills
drill_cols <- c("Round", "forty", "vertical", "bench_reps", "broad_jump", "three_cone", "shuttle")
rm_combine <- nfl_combine[complete.cases(nfl_combine[, drill_cols]), ]
head(rm_combine)
## # A tibble: 6 × 8
## position Round forty vertical bench_reps broad_jump three_cone shuttle
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 TE 7 4.89 30 17 106 7.86 4.45
## 2 OT 1 5.49 25 27 98 8.25 4.85
## 3 OT 4 5.07 30.5 34 106 8.09 4.78
## 4 OT 6 5.44 27 22 101 7.56 4.4
## 5 TE 3 4.87 33 22 115 7.21 4.19
## 6 OG 4 5.23 28.5 22 110 7.85 4.61
round_selected <-rm_combine$Round
draft_position <- rm_combine$position
forty_time <- rm_combine$forty
wr_data <- rm_combine %>% filter(position == "WR")
fastest_wr <- wr_data %>% arrange(forty)
top_100_fastest_wr <- head(fastest_wr, 100)
draft_position <- top_100_fastest_wr$Round
forty_time <- top_100_fastest_wr$forty
vertical_jump <- top_100_fastest_wr$vertical
model <- lm(forty_time ~ draft_position + vertical_jump, data = top_100_fastest_wr)
summary(model)
##
## Call:
## lm(formula = forty_time ~ draft_position + vertical_jump, data = top_100_fastest_wr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.140167 -0.031351 0.004884 0.036123 0.104688
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.526109 0.060226 75.152 < 2e-16 ***
## draft_position 0.006968 0.002587 2.693 0.00833 **
## vertical_jump -0.003994 0.001594 -2.506 0.01386 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.04859 on 97 degrees of freedom
## Multiple R-squared: 0.1338, Adjusted R-squared: 0.116
## F-statistic: 7.495 on 2 and 97 DF, p-value: 0.0009406
# Color is mapped inside geom_point() so geom_smooth() fits a single trend line
# across all rounds instead of dropping the colour aesthetic with a warning
ggplot(top_100_fastest_wr, aes(x = forty, y = vertical)) +
  geom_point(aes(color = Round)) +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(
    title = "Forty Times/Vertical Jump vs Round Selected For WR",
    x = "Forty Times (Measured In Seconds)",
    y = "Vertical Jump (Measured In Inches)",
    color = "Round Drafted",
    caption = "Source: Sports Reference StatHead Football"
  ) +
  theme_minimal(base_size = 12)
This data set has a lot of observations with missing values (NAs). As you can see in the code above, the data set went from 6,128 observations to 1,930 once all the NAs were removed. I used !is.na() to remove them. But I wasn’t done there. I needed a smaller data set, so I used the filter command to keep only one position group (wide receivers). Even then there were still too many observations, so I chose to include only the 100 fastest. Why the top 100? The average forty yard dash time for wide receivers is 4.48 seconds, and the slowest time in the top 100 is also 4.48 seconds, so this group runs from the fastest players up to the average. I wanted to see whether, once you are in the top 100, a 0.10 second difference really affects your draft position.
What I found interesting is that there is not a strong relationship between draft position and forty time and vertical jump: the model's adjusted R-squared is only about 12% (0.116). As someone who follows sports this is shocking, as commentators and coaches are always big on speed, and some of the biggest draft selection shocks happen because a player ran under 4.3 seconds. But there is an interesting correlation between forty times and vertical jumps. It seems that players who run faster tend to jump higher, and players who run slower tend to jump lower. Looking at the graph itself, forty times alone do start affecting your draft position once you pass 4.4 seconds. Out of everyone picked in the last 4 rounds (51 players total), only 9 ran below 4.4 seconds, so you are more likely to get picked later if you run over a 4.4. As for vertical jump, there wasn’t a correlation between how high you can jump and how high you get drafted. As an example, the player with the lowest jump, measured at 26 inches, went in the third round, and the player with the highest jump, measured at 45 inches, also went in the third round.
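The counts and the forty/vertical relationship described above can be checked directly rather than read off the graph. Here is a minimal sketch, assuming a data frame with the Round, forty, and vertical columns shown earlier. The helper name summarize_speed_vs_round and the toy rows are made up for illustration and are not the real combine data.

```r
# Hypothetical helper (not part of the original analysis).
# Expects a data frame with numeric columns Round, forty, and vertical.
summarize_speed_vs_round <- function(df, cutoff = 4.4, late_rounds = 4:7) {
  late <- df[df$Round %in% late_rounds, ]
  list(
    # Pearson correlation; a negative value means faster runners jump higher
    forty_vertical_cor = cor(df$forty, df$vertical),
    # Number of players picked in the late rounds
    n_late = nrow(late),
    # How many of those late picks ran faster than the cutoff
    n_late_under_cutoff = sum(late$forty < cutoff)
  )
}

# Toy rows with invented values, only to show how the helper is called
toy <- data.frame(
  Round    = c(1, 2, 3, 5, 6, 7),
  forty    = c(4.28, 4.35, 4.41, 4.38, 4.45, 4.47),
  vertical = c(41, 39, 36, 37, 34, 33)
)
res <- summarize_speed_vs_round(toy)
```

On the real data the call would be summarize_speed_vs_round(top_100_fastest_wr), which should give the late-round counts quoted above if they were tallied the same way.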
I wish I had worked on this for all positions and with all the other variables, like bench press, broad jump, and three cone drill, to find which variables matter the most for draft position and for which positions these drills do not correlate with it. For example, the reason I didn’t do quarterbacks is that none of these drills is an important measure of how well a quarterback can read a defense or how accurately he can throw the ball. All these drills tell you about a quarterback is how good an athlete he is, not what he can do with the ball. But these drills do correlate well with skill positions like wide receivers, running backs, cornerbacks, and safeties. I would also like to see how these players turned out as pros: does doing well in these drills lead to a Hall of Fame career?
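As a rough idea of what that all-positions extension could look like, the sketch below fits one linear model per position with draft round as the response and all six drills as predictors, then compares adjusted R-squared across positions. The helper name fit_by_position is hypothetical, the toy data is random noise used only to show the call, and treating Round as a plain number is a simplifying assumption (an ordinal model might suit it better).

```r
# Hypothetical helper (not part of the original analysis).
# Expects the columns of the combine data: position, Round, and the six drills.
fit_by_position <- function(df) {
  drills <- c("forty", "vertical", "bench_reps", "broad_jump", "three_cone", "shuttle")
  # Builds the formula Round ~ forty + vertical + ... from the drill names
  f <- reformulate(drills, response = "Round")
  # One model per position; keep only the adjusted R-squared of each fit
  sapply(split(df, df$position), function(d) {
    summary(lm(f, data = d))$adj.r.squared
  })
}

# Random toy data (invented, carries no real signal), 20 rows per position
set.seed(1)
n <- 40
toy_all <- data.frame(
  position   = rep(c("WR", "RB"), each = n / 2),
  Round      = sample(1:7, n, replace = TRUE),
  forty      = runif(n, 4.3, 4.7),
  vertical   = runif(n, 28, 42),
  bench_reps = sample(8:25, n, replace = TRUE),
  broad_jump = runif(n, 110, 135),
  three_cone = runif(n, 6.6, 7.3),
  shuttle    = runif(n, 4.0, 4.5)
)
adj_r2 <- fit_by_position(toy_all)
```

With the real data the call would be fit_by_position(rm_combine). Positions with a higher adjusted R-squared would be the ones where the drills explain more of the variation in draft round.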