Essay Part 1

Every major American sport has a system for bringing in young, talented players who are just coming out of college. This system is called the draft. The purpose of the draft is for teams to pick the best college players and build their rosters for the future. The NFL draft is 7 rounds and around 250 to 260 picks. Apart from watching players live or on film, the main way to evaluate this talent has been the combine. The combine is a set of exercises and drills that talent scouts use to evaluate college prospects. There are six exercises used to judge them: the forty-yard dash, vertical jump, bench press reps, broad jump, three-cone drill, and shuttle. The two I will break down are the forty-yard dash, which times how fast a player can run 40 yards in a straight line, and the vertical jump, which measures in inches how high a player can jump with his arms over his head. For a long time there has been a debate about which of these exercises matters most and how they affect a GM's or scout's decision to draft someone.

The goal of my visualization is to show which drill is valued most highly, based on the draft position of the player who ran it. For example, suppose a player runs a super-fast forty-yard dash (under 4.3 seconds) and gets drafted in the first round. That player is more highly valued than a player who runs an average time of 4.4 to 4.5 seconds and gets picked in the second or third round. In my example I will use the wide receiver position group, measure their forty-yard dash and vertical jump, and see how those correlate with their draft position.
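The speed tiers in this example can be made concrete with a short base R sketch that sorts forty times into bands using `cut()`. The band edges, the labels, and the helper name `speed_tier` are my own illustrative assumptions, not official combine categories.

```r
# Hypothetical helper: sort forty-yard dash times (in seconds) into speed bands.
# The band edges are illustrative assumptions based on the tiers described above.
speed_tier <- function(forty) {
  cut(forty,
      breaks = c(-Inf, 4.3, 4.4, 4.5, Inf),
      labels = c("under 4.30", "4.30-4.39", "4.40-4.49", "4.50 and up"),
      right = FALSE)  # intervals are [lower, upper), so exactly 4.40 falls in "4.40-4.49"
}

speed_tier(c(4.28, 4.35, 4.45, 4.52))
```

With the combine data, `table(speed_tier(top_100_fastest_wr$forty), top_100_fastest_wr$Round)` would count how many receivers in each band went in each round.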

Data sets and libraries I loaded in

library(readr)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
nfl_combine <- read_csv("nfl_combine.csv")
## Rows: 6128 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): position
## dbl (7): Round, forty, vertical, bench_reps, broad_jump, three_cone, shuttle
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

How I removed every row with missing values, going from 6,128 observations to 1,930 observations

rm_combine <- nfl_combine[!is.na(nfl_combine$Round) & !is.na(nfl_combine$vertical) & !is.na(nfl_combine$three_cone) & !is.na(nfl_combine$bench_reps) & !is.na(nfl_combine$shuttle) & !is.na(nfl_combine$broad_jump) & !is.na(nfl_combine$forty),]
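The chain of `!is.na()` checks above keeps a row only when all seven numeric columns are present. Base R's `complete.cases()` does the same thing more compactly. The sketch below shows this on a small made-up data frame (`toy`). Its values are invented for illustration and are not from the combine data.

```r
# Small made-up data frame with a few missing drill results (illustrative only)
toy <- data.frame(
  position = c("WR", "TE", "OT", "WR"),
  Round    = c(1, NA, 4, 2),
  forty    = c(4.35, 4.70, NA, 4.45),
  vertical = c(38, 33, 30, 36)
)

# Approach used above: chain one !is.na() check per column
by_is_na <- toy[!is.na(toy$Round) & !is.na(toy$forty) & !is.na(toy$vertical), ]

# Equivalent: complete.cases() is TRUE only for rows with no NA in the selected columns
by_complete <- toy[complete.cases(toy[, c("Round", "forty", "vertical")]), ]

identical(by_is_na, by_complete)
```

On the real data, this would look like `nfl_combine[complete.cases(nfl_combine[, -1]), ]`. That version assumes `position` is the first column, as in the `head()` output below.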

The first 6 rows of rm_combine

head(rm_combine)
## # A tibble: 6 × 8
##   position Round forty vertical bench_reps broad_jump three_cone shuttle
##   <chr>    <dbl> <dbl>    <dbl>      <dbl>      <dbl>      <dbl>   <dbl>
## 1 TE           7  4.89     30           17        106       7.86    4.45
## 2 OT           1  5.49     25           27         98       8.25    4.85
## 3 OT           4  5.07     30.5         34        106       8.09    4.78
## 4 OT           6  5.44     27           22        101       7.56    4.4 
## 5 TE           3  4.87     33           22        115       7.21    4.19
## 6 OG           4  5.23     28.5         22        110       7.85    4.61

The vectors I used to pull individual columns out of the dataset

round_selected <- rm_combine$Round
player_position <- rm_combine$position   # position group (WR, TE, OT, ...)
forty_time <- rm_combine$forty

I filtered and arranged the data so that, instead of 1,930 observations, I had just 100.

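# Keep only wide receivers
wr_data <- rm_combine %>% filter(position == "WR")

# Sort from fastest (lowest forty time) to slowest
fastest_wr <- wr_data %>% arrange(forty)

# Keep the 100 fastest receivers
top_100_fastest_wr <- head(fastest_wr, 100)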
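One detail is worth checking with `head(fastest_wr, 100)`. If several receivers share the 100th-fastest time, `head()` keeps some of them and drops others, depending on row order. The base R sketch below keeps every player tied at the cutoff. `fastest_n_with_ties` is a hypothetical helper name, and the example times are made up. dplyr's `slice_min(forty, n = 100)` also keeps ties by default.

```r
# Hypothetical helper: keep the n fastest rows, plus anyone tied with the nth time.
# Assumes the forty column has no missing values (they were removed earlier).
fastest_n_with_ties <- function(df, n) {
  cutoff <- sort(df$forty)[n]        # the nth-fastest forty time
  kept <- df[df$forty <= cutoff, ]   # everyone at or under the cutoff
  kept[order(kept$forty), ]          # fastest first, like arrange(forty)
}

# Made-up times: two receivers tie at 4.48 for the 3rd-fastest spot
times <- data.frame(player = c("A", "B", "C", "D", "E"),
                    forty  = c(4.48, 4.31, 4.52, 4.48, 4.40))

head(times[order(times$forty), ], 3)  # keeps A but drops D, who also ran 4.48
fastest_n_with_ties(times, 3)         # keeps both 4.48 runners
```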

I created a linear regression model relating forty time to draft position and vertical jump.

draft_position <- top_100_fastest_wr$Round
forty_time <- top_100_fastest_wr$forty
vertical_jump <- top_100_fastest_wr$vertical

# Forty time is the response; draft round and vertical jump are the predictors
model <- lm(forty_time ~ draft_position + vertical_jump, data = top_100_fastest_wr)
summary(model)
## 
## Call:
## lm(formula = forty_time ~ draft_position + vertical_jump, data = top_100_fastest_wr)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.140167 -0.031351  0.004884  0.036123  0.104688 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     4.526109   0.060226  75.152  < 2e-16 ***
## draft_position  0.006968   0.002587   2.693  0.00833 ** 
## vertical_jump  -0.003994   0.001594  -2.506  0.01386 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04859 on 97 degrees of freedom
## Multiple R-squared:  0.1338, Adjusted R-squared:  0.116 
## F-statistic: 7.495 on 2 and 97 DF,  p-value: 0.0009406
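A small worked example helps put the coefficients above in perspective. It plugs the estimates from `summary(model)` into the fitted equation, forty = 4.526109 + 0.006968 × round - 0.003994 × vertical. The 36-inch vertical used for the comparison is an arbitrary illustrative value.

```r
# Coefficient estimates copied from the summary(model) output above
b0      <-  4.526109  # intercept
b_round <-  0.006968  # change in forty time (seconds) per later draft round
b_vert  <- -0.003994  # change in forty time (seconds) per extra inch of vertical

predict_forty <- function(round, vertical) b0 + b_round * round + b_vert * vertical

# Same vertical jump (36 inches, an arbitrary example), first round vs seventh round
first   <- predict_forty(1, 36)
seventh <- predict_forty(7, 36)
seventh - first  # about 0.042 seconds
```

Going from the first round to the seventh is associated with only about 0.042 seconds of forty time. Ten extra inches of vertical is associated with about 0.040 seconds less. Both are smaller than the residual standard error of 0.04859 seconds.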

I used a scatter plot to graph all this data. It turns out there is a correlation.

ggplot(top_100_fastest_wr, aes(x = forty_time, y = vertical_jump)) +
  geom_point(aes(color = draft_position)) +  # color only the points; the trend line is fit to all of them
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Forty Times/Vertical Jump vs Round Selected For WR", x = "Forty Times (Measured In Seconds)",
       y = "Vertical Jump (Measured In Inches)", color = "Round Drafted",
       caption = "Source: Sports Reference StatHead Football") +
  theme_minimal(base_size = 12)
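The scatter plot shows the relationship between forty time and vertical jump visually. A correlation coefficient puts a number on it. Below is a minimal sketch that assumes the column names from `rm_combine` (`forty`, `vertical`); `drill_correlation` is a hypothetical helper name. The three-row data frame is invented only to check the helper, so the perfect correlation it reports says nothing about the real receivers.

```r
# Hypothetical helper: Pearson correlation between forty time and vertical jump
drill_correlation <- function(df) {
  cor(df$forty, df$vertical, method = "pearson")
}

# Invented values, perfectly linear so the expected answer is known (r = -1)
check <- data.frame(forty = c(4.30, 4.40, 4.50), vertical = c(40, 38, 36))
drill_correlation(check)

# On the real data (not run here), cor.test() also reports a p-value:
# cor.test(top_100_fastest_wr$forty, top_100_fastest_wr$vertical)
```

A negative r means lower (faster) forty times go with higher verticals.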

Essay Part 2

This data set has a lot of observations with N/A values in them. As you can see in the code above, the data set went from 6,128 observations to 1,930 once every row with an N/A was removed. I used !is.na() to remove all of those N/As. But I wasn't done there. I needed a smaller subset, so I used the filter() function to keep only one position group (wide receivers). Even then there were still too many observations, so I chose to include only the 100 fastest.

Why the top 100? The average forty-yard dash time for wide receivers is 4.48 seconds, and I wanted to get close to that average. In fact, the slowest time in the top 100 is 4.48 seconds. I also wanted to measure the fastest players. Among receivers who made it into the top 100, does a 0.10-second difference really affect draft position?

What I found interesting is that the relationship between draft position, forty time, and vertical jump is weak. The adjusted R-squared is only 0.116, so the model explains only about 12% of the variation in forty time. As someone who follows sports, I find this surprising. Commentators and coaches are always big on speed, and some of the biggest draft-day surprises come from players who run under 4.3 seconds.

But there is an interesting correlation between forty times and vertical jumps: players who run faster tend to jump higher, and players who run slower tend to jump lower. Looking at the graph itself, forty times alone do start to affect draft position once they pass 4.4 seconds. Of the players picked in the last 4 rounds (51 players total), only 9 ran below 4.4 seconds. So a receiver who runs slower than 4.4 is more likely to be picked later.

As for the vertical jump, there wasn't a correlation between how high a player can jump and how high he gets drafted. For example, the player with the lowest jump (26 inches) went in the third round, and the player with the highest jump (45 inches) also went in the third round.
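The late-round counts above can be reproduced with a cross-tabulation. Below is a minimal base R sketch. It assumes "the last 4 rounds" means rounds 4 through 7 and that the input has the `Round`, `forty`, and `vertical` columns used above. `late_round_speed` is a hypothetical name, and the sample rows are made up.

```r
# Hypothetical helper: cross-tabulate late-round picks against sub-4.4 speed.
# Fixed factor levels keep every cell in the table even when a group is empty.
late_round_speed <- function(df) {
  late  <- factor(df$Round >= 4, levels = c(FALSE, TRUE),
                  labels = c("rounds 1-3", "rounds 4-7"))
  speed <- factor(df$forty < 4.4, levels = c(FALSE, TRUE),
                  labels = c("4.4 or slower", "under 4.4"))
  table(late, speed)
}

# Made-up rows, only to show the shape of the output
sample_wr <- data.frame(Round    = c(1, 2, 4, 5, 7, 6),
                        forty    = c(4.28, 4.41, 4.38, 4.45, 4.47, 4.43),
                        vertical = c(41, 26, 37, 45, 33, 35))

tab <- late_round_speed(sample_wr)
tab["rounds 4-7", "under 4.4"]  # late-round picks who ran under 4.4
sum(tab["rounds 4-7", ])        # all late-round picks

# Draft rounds of the players with the lowest and highest vertical jumps
sample_wr$Round[which.min(sample_wr$vertical)]
sample_wr$Round[which.max(sample_wr$vertical)]
```

On the real data, the call would be `late_round_speed(top_100_fastest_wr)`. The essay reports 51 late-round picks, 9 of whom ran under 4.4.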

I wish I had done this for all positions, with all the other variables like bench press, broad jump, three-cone drill, etc. That would show which variables matter most for draft position, and for which positions these drills do not correlate with draft position. For example, I didn't do quarterbacks because none of these drills measure how well a quarterback can read a defense or how accurately he can throw the ball. All these drills tell you about a quarterback is how good an athlete he is, not what he can do with the ball. These drills are more likely to matter for skill positions like wide receivers, running backs, cornerbacks, and safeties. I would also like to look at how these players turned out as pros, using their career statistics, to see whether doing well in these drills translates into a Hall of Fame career.
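The all-positions study described above could start from a loop like the following minimal sketch. It is one possible approach, not something run for this essay. It fits a separate linear model of draft round on the drill results for each position and collects the R-squared values. Unlike the model earlier, it puts `Round` on the left-hand side, because the question here is how well drills predict draft position. Treating `Round` as a continuous number is a simplification. The helper name `r_squared_by_position` and the simulated data are assumptions for illustration.

```r
# Hypothetical helper: fit Round ~ drills separately for each position and
# return each model's R-squared (share of the variation in Round explained)
r_squared_by_position <- function(df, drills = c("forty", "vertical")) {
  model_formula <- reformulate(drills, response = "Round")
  by_position <- split(df, df$position)
  sapply(by_position, function(pos_df) {
    summary(lm(model_formula, data = pos_df))$r.squared
  })
}

# Simulated rows (random, for illustration only) for two positions
set.seed(1)
sim <- data.frame(position = rep(c("WR", "RB"), each = 20),
                  Round    = sample(1:7, 40, replace = TRUE),
                  forty    = round(runif(40, 4.3, 4.7), 2),
                  vertical = round(runif(40, 28, 42), 1))

r_squared_by_position(sim)
```

With the cleaned data, the call might be `r_squared_by_position(rm_combine, c("forty", "vertical", "bench_reps", "broad_jump", "three_cone", "shuttle"))`. Positions with only a handful of players would give unreliable fits.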