With the NFL Combine ending this weekend, teams hold differing views on how much the event matters. Which drills are useful for predicting player success? Can we find a signal (a high correlation) among the drills and discard the drills that are noise (no correlation)? For this project, I investigated the correlations between players’ PFF grades and their NFL Combine results. Because each position plays a different role on the field, a drill may be more predictive for one position than for another. I therefore grouped players by position and computed the correlation between grade and each combine drill within each group.
First, I loaded the data, along with the dplyr package used for the data wrangling below.
library(dplyr)
collegeCFFdata <- read.csv("~/Desktop/ProjectPFF/collegeCFFdata.csv", stringsAsFactors = FALSE)
proPFFdata <- read.csv("~/Desktop/ProjectPFF/proPFFdata.csv", stringsAsFactors = FALSE)
Next, I listed every distinct position code in the dataset.
# Position groupings
unique(proPFFdata$CareerPos[order(proPFFdata$CareerPos)])
## [1] "34DE" "34LE" "34OLB" "34RE" "43DE" "43LE" "43OLB" "43RE"
## [9] "DC" "DS" "DT" "FB" "FS" "HB" "ILB" "K"
## [17] "LC" "LG" "LS" "LT" "NT" "OC" "P" "QB"
## [25] "RC" "RG" "RS" "RT" "SS" "TE" "UT" "WR"
## [33] NA
Because there are so many position codes, I grouped them into the common positions used across teams. For example, 3-4 outside linebackers (34OLB) and 4-3 defensive ends (43DE) were combined into a single EDGE group. I did this for each of the common positions in today’s NFL; codes that fit none of the groups (FB, K, LS, P, RS) are left out, and quarterbacks are dropped later.
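The same grouping can also be written as a single named lookup vector, which keeps the whole mapping in one place and makes it easy to audit. This is a sketch, not the code used in this project: `pos_map` and `group_positions` are hypothetical names, and the mapping simply mirrors the groups described above, with unlisted codes and missing values mapping to NA.

```r
# Hypothetical lookup table: names are PFF CareerPos codes, values are position groups
pos_map = c(
  "43DE" = "EDGE", "43LE" = "EDGE", "43RE" = "EDGE", "34OLB" = "EDGE",
  "43OLB" = "LB", "ILB" = "LB",
  "34DE" = "DT", "34LE" = "DT", "34RE" = "DT", "DT" = "DT", "NT" = "DT", "UT" = "DT",
  "SS" = "S", "FS" = "S", "DS" = "S",
  "RC" = "CB", "LC" = "CB", "DC" = "CB",
  "QB" = "QB", "WR" = "WR", "TE" = "TE", "HB" = "HB",
  "LT" = "OT", "RT" = "OT",
  "LG" = "IOL", "RG" = "IOL", "OC" = "IOL"
)

# Indexing a named vector by name returns the matching group; codes that are
# not in the table, and NA codes, come back as NA
group_positions = function(career_pos) unname(pos_map[career_pos])

group_positions(c("34OLB", "NT", "K", NA))  # "EDGE" "DT" NA NA
```

With a helper like this, the chain of assignments could be replaced by `proPFFdata$pos = group_positions(proPFFdata$CareerPos)`, and the `!is.na(pos)` filter would then remove every unmapped row.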
# Map each PFF CareerPos code to a common position group; unmatched (and NA) codes keep "N/A"
proPFFdata$pos = "N/A"
proPFFdata$pos[proPFFdata$CareerPos %in% c("43DE", "43LE", "43RE", "34OLB")] = "EDGE"
proPFFdata$pos[proPFFdata$CareerPos %in% c("43OLB", "ILB")] = "LB"
proPFFdata$pos[proPFFdata$CareerPos %in% c("34DE", "34LE", "34RE", "DT", "NT", "UT")] = "DT"
proPFFdata$pos[proPFFdata$CareerPos %in% c("SS", "FS", "DS")] = "S"
proPFFdata$pos[proPFFdata$CareerPos %in% c("RC", "LC", "DC")] = "CB"
proPFFdata$pos[proPFFdata$CareerPos %in% "QB"] = "QB"
proPFFdata$pos[proPFFdata$CareerPos %in% "WR"] = "WR"
proPFFdata$pos[proPFFdata$CareerPos %in% "TE"] = "TE"
proPFFdata$pos[proPFFdata$CareerPos %in% "HB"] = "HB"
proPFFdata$pos[proPFFdata$CareerPos %in% c("LT", "RT")] = "OT"
proPFFdata$pos[proPFFdata$CareerPos %in% c("LG", "RG", "OC")] = "IOL"
# Total grade: the sum of the individual skill grades in columns 12 to 21.
# rowSums() returns NA for any row with a missing skill grade; pass na.rm = TRUE
# to sum only the skills a player was actually graded on.
proPFFdata$total_grade = rowSums(proPFFdata[, 12:21])
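# Keep the player ID, position group, total grade, and the 13 combine measurements
# (a20Y was listed twice), then drop unmapped positions and quarterbacks
pros = proPFFdata %>%
  select(pff_GSISPLAYERID, pos, total_grade, mHt, mWt, mArm, mHand, aReps, aSS20, a3C, aVJ, aBJ, a10Y, a20Y, aF20, a40Y) %>%
  filter(!is.na(pos), pos != "N/A", pos != "QB")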
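Because `total_grade` is a plain row sum, missing skill grades matter. The toy example below uses made-up values, and the column names `pass_block`, `run_block`, and `penalties` are hypothetical stand-ins for the real skill-grade columns 12 to 21; it only shows the difference between the default `rowSums()` and `na.rm = TRUE`.

```r
# Toy skill grades for three player rows (made-up values, not PFF data)
skills = data.frame(
  pass_block = c(70, 65, NA),
  run_block  = c(60, NA, NA),
  penalties  = c(80, 75, 90)
)

# Default: any missing grade makes the row total NA (rows 2 and 3 here)
rowSums(skills)

# na.rm = TRUE: sum only the grades that exist (210, 140, 90)
rowSums(skills, na.rm = TRUE)
```

Either choice shapes the sample. With the default, rows with any missing skill grade get an NA total and are skipped later by `use = "complete.obs"`; with `na.rm = TRUE` they stay in, but players graded on more skills get larger totals. `rowMeans(..., na.rm = TRUE)` is one way to avoid that size effect.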
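With the dataset reduced to the common positions, I computed, for each position group, the correlation between players’ total PFF grades and each combine measurement. Each player’s rows are averaged first so that every player counts once, and the results go into a data frame with one row per drill and one column per position.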
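drills = c("mHt", "mWt", "mArm", "mHand", "aReps", "aSS20", "a3C", "aVJ", "aBJ", "a10Y", "a20Y", "aF20", "a40Y")
# One row per drill, one column per position group
df = data.frame(matrix(NA, nrow = length(drills), ncol = length(unique(pros$pos))))
i = 1
for (p in unique(pros$pos)) {
  # Average each player's rows so every player counts once
  # (across() replaces the deprecated funs())
  pff = pros %>%
    filter(pos == p) %>%
    group_by(pff_GSISPLAYERID) %>%
    summarise(across(c(total_grade, all_of(drills)), mean)) %>%
    ungroup()
  # Correlations among total grade and the drills, using only players with no missing values
  df1 = cor(pff[, -1], use = "complete.obs")
  df[, i] = df1[-1, 1]  # first column: each drill's correlation with total_grade
  colnames(df)[i] = p
  i = i + 1
}
rownames(df) = rownames(df1)[-1]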
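One detail worth making visible: `use = "complete.obs"` keeps only players who have every measurement, so each position's correlations may rest on far fewer players than the group contains. The base-R sketch below repeats the per-position calculation on simulated data and also reports how many players survive. The toy data, `toy_drills`, and `position_cor` are hypothetical, and the two-drill subset is only for brevity.

```r
# Simulated player rows: 20 players, two rows each, two positions (not real data)
set.seed(1)
toy = data.frame(
  pff_GSISPLAYERID = rep(1:20, each = 2),
  pos = rep(c("OT", "CB"), each = 20),
  total_grade = rnorm(40, 70, 10),
  a40Y = rep(rnorm(20, 4.8, 0.2), each = 2),
  aVJ = rep(c(rnorm(19, 32, 3), NA), each = 2)  # last player has no vertical jump
)
toy_drills = c("a40Y", "aVJ")

# For one position: average rows per player, keep complete players only,
# then correlate each drill with total_grade and append the player count n
position_cor = function(d, drills) {
  per_player = aggregate(d[, c("total_grade", drills)],
                         by = list(id = d$pff_GSISPLAYERID), FUN = mean)
  m = as.matrix(per_player[, c("total_grade", drills)])
  keep = complete.cases(m)
  c(cor(m[keep, , drop = FALSE])[-1, 1], n = sum(keep))
}

# One column per position (CB, OT); rows a40Y, aVJ, n
res = sapply(split(toy, toy$pos), position_cor, drills = toy_drills)
round(res, 2)
```

Here the CB column rests on 9 players and the OT column on 10. The other common choice is `use = "pairwise.complete.obs"`, where each drill uses every player who has that drill, at the cost of a different n in each cell.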
The code below builds the heatmap. Blue tiles indicate a positive correlation: higher values on the drill go with higher grades at that position. This pattern is common for the explosion drills (aVJ, aBJ), where a bigger number is better. Red tiles indicate a negative correlation: lower values go with higher grades, which is common for the speed drills, where a lower time is better. White tiles indicate little or no correlation, meaning the drill is unlikely to predict success at that position.
cor_col_mat = data.matrix(df)
# 41 breaks from -1 to 1 give 40 bins: 20 red-to-white (r < 0) and 20 white-to-blue (r > 0)
bk = seq(-1, 1, by = 0.05)
mycols = c(colorRampPalette(colors = c("red", "white"))(20), colorRampPalette(colors = c("white", "blue"))(20))
# Rowv/Colv = NA: no clustering or reordering; scale = "none": colour the raw correlations
heatmap(cor_col_mat, Rowv = NA, Colv = NA, xlab = "Position", ylab = "Drill", col = mycols, scale = "none", breaks = bk, cexCol = 1.25, main = "Correlation Map of PFF Grades")
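To see how a correlation becomes a tile colour, the sketch below rebuilds the same breaks and palette (writing `length(bk) - 21` out as 20) and looks up the bin a value falls in. It uses `cut()` with right-closed intervals and the lowest break included; my understanding is that this matches how `image()`, which `heatmap()` draws with, assigns values to breaks, but treat that equivalence as an assumption. `tile_bin` and `tile_colour` are hypothetical helpers.

```r
# 41 breaks -> 40 intervals -> 40 colours (20 red-to-white, 20 white-to-blue)
bk = seq(-1, 1, by = 0.05)
mycols = c(colorRampPalette(c("red", "white"))(20),
           colorRampPalette(c("white", "blue"))(20))
stopifnot(length(mycols) == length(bk) - 1)

# Index of the interval (lower, upper] that contains each correlation
tile_bin = function(r) cut(r, breaks = bk, labels = FALSE, include.lowest = TRUE)
# Palette entry for that interval
tile_colour = function(r) mycols[tile_bin(r)]

tile_bin(c(-0.32, 0.12))      # bins 14 and 23: red half and blue half
tile_colour(c(-0.99, 0.99))   # "#FF0000" "#0000FF"
```

Because the 20th and 21st colours are both white, any correlation in (-0.05, 0.05] is drawn pure white, and tiles just outside that range are only faintly tinted.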
As the map shows, most tiles are pale: no combine drill correlates strongly with PFF grades at any position. A few positions, however, have several clearly colored tiles. Edge rushers tend to earn higher grades when they post low (fast) times in the speed and quickness drills and jump high in the vertical jump (aVJ). Halfbacks tell a similar story: running backs who post strong broad and vertical jumps (a sign of explosion) and run well in the 40-yard dash tend to have more success in the pros. The last position with noticeably non-zero correlations is offensive tackle: OTs with good explosion numbers and solid short shuttle (aSS20) and first-20-yard (aF20) times tend to grade well.
Overall, the combine shouldn’t make or break a prospect’s evaluation, but a few drills look like leading indicators of NFL success at specific positions.
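How pale is pale enough to ignore depends on how many players a tile rests on. One standard yardstick, not part of the original analysis, is the smallest |r| that a two-sided t-test would call different from zero at the 5% level, found by solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r. The helper name `critical_r` is hypothetical; base R's `cor.test()` runs the full test on a pair of vectors.

```r
# Smallest |r| significantly different from zero (two-sided, level alpha) for n players.
# Solving t = r * sqrt(n - 2) / sqrt(1 - r^2) for r gives r = t / sqrt(n - 2 + t^2).
critical_r = function(n, alpha = 0.05) {
  t = qt(1 - alpha / 2, df = n - 2)
  t / sqrt(n - 2 + t^2)
}

round(critical_r(c(30, 100)), 2)  # about 0.36 and 0.20
```

So with 30 players at a position, a tile would need |r| of roughly 0.36 before it stands out from noise at that level. Since the map holds many drill-by-position correlations at once, a few tiles can clear that bar by chance alone.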
Can we incorporate college grades as well? The college PFF data loaded at the start is not used in this analysis; combining college grades with combine results might show whether college production adds signal beyond the drills.
Can we isolate the individual skill grades to find their correlations? In this project, I summed the skill grades into a single total for every player, but there is probably value in checking which drills correlate with which skill sets. For example, if we were looking for a receiving TE, we might find that TE receiving grade correlates well with speed even though overall grade shows no correlation with speed.
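Mechanically, this only needs a rectangular correlation table: skill grades on one side, drills on the other. A minimal sketch on simulated data follows; `receiving_grade` and `blocking_grade` are hypothetical stand-ins for the real skill-grade columns, and the link between receiving grade and 40 time is built into the simulation, so the output illustrates the mechanics rather than any finding.

```r
# Simulated per-player data (not PFF data): two drills and two skill grades
set.seed(42)
n = 50
players = data.frame(a40Y = rnorm(n, 4.7, 0.15), aVJ = rnorm(n, 33, 3))
# Receiving grade is built to improve with a faster (lower) 40; blocking grade is pure noise
players$receiving_grade = 70 - 20 * (players$a40Y - 4.7) + rnorm(n, 0, 3)
players$blocking_grade = rnorm(n, 65, 8)

# cor(x, y) returns a matrix: rows are skill grades, columns are drills.
# pairwise.complete.obs uses every player who has both values for a given cell.
skill_cor = cor(players[, c("receiving_grade", "blocking_grade")],
                players[, c("a40Y", "aVJ")],
                use = "pairwise.complete.obs")
round(skill_cor, 2)
```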