Rookie potential depends on several factors: how the player performed in college, how the team plays, how much playing time he will get during the season, and psychological factors such as how he fits with the team and his resilience, among other variables. The jump from college football to the NFL is a big step, and however good the players are, they need opportunities in order to become important to a team. This analysis estimates the opportunities that rookie wide receivers, running backs, and tight ends will have on their teams. It was also carried out with the help of ChatGPT’s pro tool, Code Interpreter, to show what the future of analysis and data science could look like.
A rookie’s opportunities depend on the team he lands on and on how important the current players are to that team. Here we want to know whether, overall, rookies on teams with more offensive power get more opportunities because of the number of chances created per game.
I asked ChatGPT to filter the data via prompting after uploading a file with the NFL data.
nfl2022_per_game <- read_csv("~/Downloads/nfl_team_per_game_stats_2022.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## date = col_date(format = ""),
## team = col_character(),
## pass_attempts = col_double(),
## rushing_attempts = col_double()
## )
head(nfl2022_per_game)
## # A tibble: 6 x 4
## date team pass_attempts rushing_attempts
## <date> <chr> <dbl> <dbl>
## 1 2022-01-02 Falcons 23 22
## 2 2022-01-02 Packers 42 32
## 3 2022-01-02 Cowboys 39 17
## 4 2022-01-02 49ers 23 37
## 5 2022-01-02 Chargers 31 35
## 6 2022-01-02 Ravens 32 32
Rushing attempts
To calculate the total offensive plays for each team, I’ll sum up the pass attempts and rushing attempts.
The percentage of pass plays for each team is calculated as:
\[ \text{Pass Percentage} = \left( \frac{\text{Total Pass Attempts}}{\text{Total Offensive Plays}} \right) \times 100 \]
Similarly, the percentage of rush plays for each team is:
\[ \text{Rush Percentage} = \left( \frac{\text{Total Rushing Attempts}}{\text{Total Offensive Plays}} \right) \times 100 \]
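The two formulas above can be sketched in a few lines of base R. This is only an illustration: the column names follow the per-game tibble shown earlier, but the rows are made-up toy values and the `totals` data frame is a hypothetical name, not the object used in the original session.

```r
# Toy per-game data using the column names from the tibble above.
# The values are illustrative, not the real 2022 file.
per_game <- data.frame(
  team = c("Falcons", "Falcons", "Packers", "Packers"),
  pass_attempts = c(23, 30, 42, 35),
  rushing_attempts = c(22, 28, 32, 25)
)

# Sum pass and rushing attempts per team across all of its games.
totals <- aggregate(
  cbind(pass_attempts, rushing_attempts) ~ team,
  data = per_game,
  FUN = sum
)

# Total offensive plays = pass attempts + rushing attempts.
totals$total_offensive_plays <- totals$pass_attempts + totals$rushing_attempts

# Share of each play type, expressed as a percentage of total plays.
totals$pass_percentage <- 100 * totals$pass_attempts / totals$total_offensive_plays
totals$rush_percentage <- 100 * totals$rushing_attempts / totals$total_offensive_plays

print(totals)
```

By construction, the two percentages add up to 100 for every team, which is a quick sanity check on the real file as well.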
Here’s a summary of the total offensive plays each team executed in 2022:
nfl_off <- read_csv("~/Downloads/nfl_team_offensive_percentage_2022.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## team = col_character(),
## total_offensive_plays = col_double(),
## pass_percentage = col_double(),
## rush_percentage = col_double()
## )
## team total_offensive_plays pass_percentage rush_percentage
## Length:32 Min. : 939 Min. :42.19 Min. :33.39
## Class :character 1st Qu.:1009 1st Qu.:50.51 1st Qu.:40.98
## Mode :character Median :1053 Median :56.24 Median :43.76
## Mean :1085 Mean :55.17 Mean :44.83
## 3rd Qu.:1158 3rd Qu.:59.02 3rd Qu.:49.49
## Max. :1286 Max. :66.61 Max. :57.81
nfl_off %>%
arrange(desc(total_offensive_plays))
## # A tibble: 32 x 4
## team total_offensive_plays pass_percentage rush_percentage
## <chr> <dbl> <dbl> <dbl>
## 1 Bengals 1286 60.7 39.3
## 2 Chiefs 1283 61.2 38.8
## 3 Buccaneers 1279 66.6 33.4
## 4 Rams 1219 57.0 43.0
## 5 Bills 1212 55.9 44.1
## 6 Cowboys 1203 52.3 47.7
## 7 49ers 1182 49.7 50.3
## 8 Cardinals 1174 60.4 39.6
## 9 Steelers 1152 56.3 43.7
## 10 Eagles 1151 49.7 50.3
## # … with 22 more rows
We now have a rough estimate of how the teams play and of their passing and rushing percentages. The mean pass percentage is about 55 percent and the median about 56 percent, so teams favor passing overall, and the maximum reaches 66.61 percent.
Summary statistics of offensive plays per game:
- Count: 572 games
- Mean: approximately 60.68 plays
- Standard deviation: approximately 8.59 plays
- Minimum: 37 plays
- 25th percentile: 55 plays
- Median (50th percentile): 60 plays
- 75th percentile: 66 plays
- Maximum: 88 plays
The distribution looks bimodal, which might be due to different playing styles or to teams trailing or leading. However, a kernel density estimate (KDE) shows only one mode, at 60.69, so I will continue the analysis assuming a normal distribution.
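A mode check of this kind could look roughly like the sketch below. It is an assumption on my part that a Gaussian kernel with the default bandwidth was used; the `plays` vector is simulated from the summary statistics above and only stands in for the real plays-per-game values.

```r
set.seed(42)

# Simulated stand-in for the 572 plays-per-game observations (hypothetical data).
plays <- rnorm(572, mean = 60.68, sd = 8.59)

# Kernel density estimate with base R defaults (Gaussian kernel, nrd0 bandwidth).
kde <- density(plays)

# Location of the highest peak of the estimated density.
kde_mode <- kde$x[which.max(kde$y)]

# Count local maxima: grid points where the slope flips from rising to falling.
slope_sign <- sign(diff(kde$y))
n_modes <- sum(diff(slope_sign) == -2)

cat("Highest peak at:", round(kde_mode, 2), "\n")
cat("Number of local maxima:", n_modes, "\n")
```

The number of local maxima depends on the bandwidth, so a single mode under one bandwidth does not rule out a second bump under a narrower one.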
A one-tailed t-test determines whether one mean is greater (or less) than another. In this case, we want to test whether a team’s average number of offensive plays per game is greater than the NFL 2022 average.
The approach will involve performing a one-sample t-test.
The “Significantly Greater” column indicates whether a team’s average is statistically greater than the NFL 2022 average, with the significance level set at \(\alpha = 0.1\).
From the results:
- Teams like the Buccaneers, Chargers, and Browns have averages that are significantly greater than the league average.
- Most teams, however, do not have a statistically significant difference in their average number of offensive plays when compared to the overall NFL 2022 average.
This classification provides a clearer indication of which teams might have had an offensive approach that involved running more plays than the typical NFL team in the 2022 season.
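For a single team, the one-sample, one-tailed test described above could be sketched as follows. The `team_plays` values are simulated and hypothetical; `league_mean` is the 60.68 plays per game reported earlier and is treated as a fixed constant, which is a simplifying assumption.

```r
set.seed(1)

league_mean <- 60.68  # NFL 2022 average plays per game, from the summary above
alpha <- 0.1          # significance level used for the classification

# Hypothetical plays-per-game values for one team over 17 games.
team_plays <- rnorm(17, mean = 66, sd = 8)

# H0: team mean <= league mean; H1: team mean > league mean.
res <- t.test(team_plays, mu = league_mean, alternative = "greater")

# Flag the team when the one-tailed p-value does not exceed alpha.
significantly_greater <- res$p.value <= alpha

cat("t =", round(unname(res$statistic), 3),
    " one-tailed p =", signif(res$p.value, 3),
    " significant:", significantly_greater, "\n")
```

Repeating this for each team yields one t-statistic, one p-value, and one TRUE/FALSE flag per team, which is the shape of the results table.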
## Warning: Missing column names filled in: 'X1' [1]
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## X1 = col_character(),
## `T-Statistic` = col_double(),
## `One-Tailed P-Value` = col_double(),
## `Significantly Greater (α=0.1)` = col_logical()
## )
## # A tibble: 10 x 4
## X1 `T-Statistic` `One-Tailed P-Valu… `Significantly Greater (\u03b1=…
## <chr> <dbl> <dbl> <lgl>
## 1 Buccaneers 4.13 0.000315 TRUE
## 2 Chargers 3.24 0.00258 TRUE
## 3 Browns 2.85 0.00579 TRUE
## 4 Cardinals 2.65 0.00836 TRUE
## 5 Steelers 2.00 0.0309 TRUE
## 6 Eagles 1.75 0.0493 TRUE
## 7 Chiefs 1.71 0.0515 TRUE
## 8 Cowboys 1.62 0.0616 TRUE
## 9 Commanders 1.46 0.0825 TRUE
## 10 Bills 1.41 0.0880 TRUE
At the 10% significance level, these 10 teams ran significantly more offensive plays than the NFL average. Now that we have classified teams as offense-inclined or average, we can check whether rookies on offense-inclined teams had more opportunities, and carry that inference forward to next season. Our question is: do rookies who play on offense get more playing time on offense-inclined teams?
The premise of this analysis is that teams will play the 2023 season much as they played throughout 2022.
The coach-quarterback pair is very important to a team’s performance, so if the coach or the QB changes for the upcoming season, the estimate for that team won’t be as reliable.
rook_data_1 <- read.csv("~/Downloads/rookies.csv")
head(rook_data_1)
## Rnd. Pick.No. NFL.team Player Pos. College
## 1 1 1 Jacksonville Jaguars Travon Walker DE Georgia
## 2 1 2 Detroit Lions Aidan Hutchinson DE Michigan
## 3 1 3 Houston Texans Derek Stingley Jr. CB LSU
## 4 1 4 New York Jets Sauce Gardner† CB Cincinnati
## 5 1 5 New York Giants Kayvon Thibodeaux DE Oregon
## 6 1 6 Carolina Panthers Ikem Ekwonu OT NC State
## Conf. Notes
## 1 SEC
## 2 Big Ten 2021Lombardi Award,Lott TrophyandTed Hendricks Awardwinner
## 3 SEC
## 4 The American
## 5 Pac-12
## 6 ACC
rook_data_w_stats <- read.csv("~/Downloads/merged_on_name_uppercase_rookies.csv")
head(rook_data_w_stats)
## Rnd. Pick.No. NFL.team Player Pos. College
## 1 1 6 Carolina Panthers IKEM EKWONU OT NC State
## 2 1 7 New York Giants EVAN NEAL OT Alabama
## 3 1 8 Atlanta Falcons DRAKE LONDON WR USC
## 4 1 9 Seattle Seahawks CHARLES CROSS OT Mississippi State
## 5 1 10 New York Jets GARRETT WILSON WR Ohio State
## 6 1 11 New Orleans Saints CHRIS OLAVE WR Ohio State
## Conf. Notes Unnamed..0 Season Tm Age Pos G GS RshTD
## 1 ACC NA NA NA NA NA NA
## 2 SEC from Chicago[R1 - 1] NA NA NA NA NA NA
## 3 Pac-12 137 2022 ATL 21 WR 17 15 NA
## 4 SEC from Denver[R1 - 2] NA NA NA NA NA NA
## 5 Big Ten from Seattle[R1 - 3] 160 2022 NYJ 22 WR 17 12 NA
## 6 Big Ten from Washington[R1 - 4] 139 2022 NOR 22 WR 15 9 NA
## RecTD PR.TD KR.TD FblTD IntTD OthTD AllTD X2PM X2PA D2P XPM XPA FGM FGA Sfty
## 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 3 4 NA NA NA NA NA 4 1 NA 0 NA NA NA NA NA
## 4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## 5 4 NA NA NA NA NA 4 NA NA 0 NA NA NA NA NA
## 6 4 NA NA NA NA NA 4 1 NA 0 NA NA NA NA NA
## Pts Pts.G
## 1 NA NA
## 2 NA NA
## 3 26 1.5
## 4 NA NA
## 5 24 1.4
## 6 26 1.7
From this visualization we can see the distribution of games played by rookies last season, with Tampa Bay and the Steelers being the teams that used their rookies the most.
data_check <- read.csv("~/Downloads/NFL Player Stats(1922 - 2022).csv")
data_check <- data_check %>%
filter(Season == 2022)
To understand the demographics of the rookies, I’ll look at how they are distributed by position and by NFL team.
Let’s start with the position distribution of the rookies.
The bar chart above displays the position distribution of rookies.
Observations:
1. Wide Receivers (WR) and Cornerbacks (CB) dominate the rookie list.
2. Offensive Tackles (OT), Defensive Ends (DE), and Linebackers (LB) also have a significant presence among the rookies.
3. Fewer rookies play positions like Punter (P) and Fullback (FB).
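Counts like the ones behind these observations can be reproduced with a frequency table. The sketch below uses a tiny made-up data frame with the same `Pos.` column name as the rookies file; the real counts come from the full draft data.

```r
# Toy stand-in for the rookies data; only the Pos. column name matches the real file.
rookies <- data.frame(
  Pos. = c("DE", "DE", "CB", "CB", "OT", "WR", "WR", "WR")
)

# Count rookies per position and order from most to least common.
position_counts <- sort(table(rookies$Pos.), decreasing = TRUE)

print(position_counts)

# A bar chart of the same counts could be drawn with: barplot(position_counts)
```

The same pattern applies to the team column to get the number of rookies per team.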
Next, let’s visualize the distribution of rookies across different NFL teams.
The bar chart above displays the distribution of rookies across different NFL teams.
Observations:
1. Teams like “CIN” (Cincinnati Bengals), “BAL” (Baltimore Ravens), and “CLE” (Cleveland Browns) have a higher number of rookies.
2. Teams like “SF” (San Francisco 49ers) and “KC” (Kansas City Chiefs) have fewer rookies.
This completes the exploratory data analysis for the demographics of rookies.
To adjust our analysis based on \(\alpha = 0.052\) for determining if a team is offensive:
Let’s start by reclassifying the teams based on the new \(\alpha\) value of 0.052.
offensive_teams_final_data <- nfl_off %>%
  merge(hypothesis_testing, by.x = "team", by.y = "X1") %>%
  filter(`One-Tailed P-Value` <= 0.052) %>%
  arrange(desc(total_offensive_plays))
head(offensive_teams_final_data)
## team total_offensive_plays pass_percentage rush_percentage T-Statistic
## 1 Chiefs 1283 61.18472 38.81528 1.712747
## 2 Buccaneers 1279 66.61454 33.38546 4.129638
## 3 Cardinals 1174 60.39182 39.60818 2.653686
## 4 Steelers 1152 56.33681 43.66319 1.999291
## 5 Eagles 1151 49.69592 50.30408 1.747961
## 6 Chargers 1138 64.23550 35.76450 3.236272
## One-Tailed P-Value Significantly Greater (\u03b1=0.1)
## 1 0.0515155568 TRUE
## 2 0.0003146666 TRUE
## 3 0.0083560575 TRUE
## 4 0.0309109538 TRUE
## 5 0.0492531096 TRUE
## 6 0.0025840725 TRUE
With the adjusted classification based on \(\alpha = 0.052\):
Given our significance level for the t-test (\(\alpha = 0.05\)), the \(p\)-value of \(0.000068\) is well below \(\alpha\). Therefore, we can again reject the null hypothesis (\(H_0\)) and conclude that there is a statistically significant difference in the average games played by players (WR, TE, RB) from teams classified as high offensive (based on \(\alpha = 0.052\)) compared to those from average offensive teams. The difference is unlikely to be due to random chance.
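A comparison of this kind could be sketched as below. The original session does not state which variant of the test was run, so the two-sided Welch test here is an assumption, and the `games` data frame is simulated; only the `G` (games played) column name comes from the rookie stats file.

```r
set.seed(7)

# Simulated games played (G) for rookie WR/TE/RB, split by team classification.
# The group labels and all values are hypothetical.
games <- data.frame(
  G = c(round(runif(20, min = 8, max = 17)),   # rookies on high-offense teams
        round(runif(40, min = 1, max = 15))),  # rookies on average-offense teams
  group = rep(c("high", "average"), times = c(20, 40))
)

# Welch two-sample t-test (R's default, no equal-variance assumption).
# H0: both groups have the same mean number of games played.
res <- t.test(G ~ group, data = games)

alpha <- 0.05
reject_h0 <- res$p.value <= alpha

print(res$estimate)  # group means
cat("p-value:", signif(res$p.value, 3), " reject H0:", reject_h0, "\n")
```

Welch's version is a cautious default when the two groups differ in size, as they do when only a handful of teams are classified as high offensive.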
Conclusion
We can expect rookies (WR, TE, RB) from the Chargers, Chiefs, and Eagles to have more playing time opportunities. The other teams have changed either their coach or their QB, which makes it riskier to say that they will play a style similar to last year’s. But if we see their average plays per game stay close to what they already have, we could expect their rookies to have opportunities as well.
Application
In this case I would add Quentin Johnston to my fantasy team watch list because of the high volume of passes the Chargers throw. For further analysis, we could look at the distribution of pass plays that WRs Allen and Williams and RB Ekeler had last season to better understand Johnston’s potential. The other prospect I have is Rashee Rice from Kansas City, because the Chiefs have the most offensive plays among the teams classified as offensive (1283) and a high pass_percentage. Kansas City is also a safe bet because of the stability of its coach-QB pair. This, alongside the loss of JuJu for the upcoming season, could give Rice some value.
In this case we didn’t have play-by-play data for all the teams. Such data would help to analyze individual performance and to gather other variables, such as the success rate of offensive plays, which can be important for assessing how likely a rookie is to get more opportunities on a team.
This analysis is biased towards measuring opportunity with just chances created, but it is a good starting point for making a better-informed decision.
This analysis is only meant to show the direction teams could take next season. It does not predict fantasy points, because they rely on pass completions, the distribution of plays among players, and other variables that are not addressed here.
This analysis was done using analysis copilot techniques with ChatGPT’s Code Interpreter. As a data scientist, it is important to me and to our profession to be transparent about our techniques, so I am providing a link to the session: https://chat.openai.com/share/31a8e29a-dfe1-434d-91f3-b1f549d6390c
Data was gathered from: https://www.kaggle.com/datasets/kristofanderson/2012-2022-nfl-defense-and-offensive-statistics, https://en.wikipedia.org/wiki/2022_NFL_Draft