Introduction

Rookie potential depends on several factors: how the player performed in college, how the team plays, how much playing time he will get during the season, and psychological factors such as how he fits in with the team and his resilience, among other variables. The jump from college football to the NFL is a big step, and however good the players are, they need opportunities in order to become important to a team. This analysis estimates the opportunities that rookie wide receivers, running backs, and tight ends get on their teams. It was also done with the help of ChatGPT's Code Interpreter (a premium feature), to show what the future of analysis and data science may look like.

A rookie's opportunities depend on the team he lands on and on how important the current players are to that team. Here we want to know whether, overall, rookies on teams with greater offensive output get more opportunities because of the number of plays those teams create per game.

Exploratory Data Analysis

After uploading a file with the NFL data, I asked ChatGPT to filter it through prompts.

library(tidyverse)  # read_csv(), dplyr verbs and the %>% pipe used below
nfl2022_per_game <- read_csv("~/Downloads/nfl_team_per_game_stats_2022.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   date = col_date(format = ""),
##   team = col_character(),
##   pass_attempts = col_double(),
##   rushing_attempts = col_double()
## )
head(nfl2022_per_game)
## # A tibble: 6 x 4
##   date       team     pass_attempts rushing_attempts
##   <date>     <chr>            <dbl>            <dbl>
## 1 2022-01-02 Falcons             23               22
## 2 2022-01-02 Packers             42               32
## 3 2022-01-02 Cowboys             39               17
## 4 2022-01-02 49ers               23               37
## 5 2022-01-02 Chargers            31               35
## 6 2022-01-02 Ravens              32               32

Passing and rushing attempts

To calculate the total offensive plays for each team, I’ll sum up the pass attempts and rushing attempts.

The percentage of pass plays for each team is calculated as:

\[ \text{Pass Percentage} = \left( \frac{\text{Total Pass Attempts}}{\text{Total Offensive Plays}} \right) \times 100 \]

Similarly, the percentage of rush plays for each team is:

\[ \text{Rush Percentage} = \left( \frac{\text{Total Rushing Attempts}}{\text{Total Offensive Plays}} \right) \times 100 \]
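As a minimal base-R sketch of these two formulas, assuming per-game rows with the same `team`, `pass_attempts`, and `rushing_attempts` columns as `nfl2022_per_game`: the toy rows below are hypothetical values, and `games` and `totals` are illustrative names rather than objects from the original session.

```r
# Toy per-game rows (hypothetical values) with the same columns as nfl2022_per_game
games <- data.frame(
  team             = c("Falcons", "Falcons", "Packers"),
  pass_attempts    = c(23, 30, 42),
  rushing_attempts = c(22, 25, 32)
)

# Sum pass and rush attempts per team across all of its games
totals <- aggregate(cbind(pass_attempts, rushing_attempts) ~ team,
                    data = games, FUN = sum)

# Total offensive plays = total pass attempts + total rushing attempts
totals$total_offensive_plays <- totals$pass_attempts + totals$rushing_attempts

# Pass and rush percentages of total offensive plays
totals$pass_percentage <- 100 * totals$pass_attempts / totals$total_offensive_plays
totals$rush_percentage <- 100 * totals$rushing_attempts / totals$total_offensive_plays

print(totals)
```

By construction the two percentages of each team add up to 100.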

Here's a summary of the total offensive plays and the pass/rush split for each team in 2022:

nfl_off <- read_csv("~/Downloads/nfl_team_offensive_percentage_2022.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   team = col_character(),
##   total_offensive_plays = col_double(),
##   pass_percentage = col_double(),
##   rush_percentage = col_double()
## )
summary(nfl_off)
##      team           total_offensive_plays pass_percentage rush_percentage
##  Length:32          Min.   : 939          Min.   :42.19   Min.   :33.39  
##  Class :character   1st Qu.:1009          1st Qu.:50.51   1st Qu.:40.98  
##  Mode  :character   Median :1053          Median :56.24   Median :43.76  
##                     Mean   :1085          Mean   :55.17   Mean   :44.83  
##                     3rd Qu.:1158          3rd Qu.:59.02   3rd Qu.:49.49  
##                     Max.   :1286          Max.   :66.61   Max.   :57.81
nfl_off %>%
  arrange(desc(total_offensive_plays))
## # A tibble: 32 x 4
##    team       total_offensive_plays pass_percentage rush_percentage
##    <chr>                      <dbl>           <dbl>           <dbl>
##  1 Bengals                     1286            60.7            39.3
##  2 Chiefs                      1283            61.2            38.8
##  3 Buccaneers                  1279            66.6            33.4
##  4 Rams                        1219            57.0            43.0
##  5 Bills                       1212            55.9            44.1
##  6 Cowboys                     1203            52.3            47.7
##  7 49ers                       1182            49.7            50.3
##  8 Cardinals                   1174            60.4            39.6
##  9 Steelers                    1152            56.3            43.7
## 10 Eagles                      1151            49.7            50.3
## # … with 22 more rows

This gives a rough picture of how teams play and how they split passing and rushing. The mean pass percentage is about 55% and the median about 56%, so teams favor passing overall, and the maximum reaches 66.61% of plays being pass attempts.

Summary of the distribution of the number of offensive plays per game across all teams in 2022:

  • Count: 572 games
  • Mean: approximately 60.68 plays
  • Standard deviation: approximately 8.59 plays
  • Minimum: 37 plays
  • 25th percentile: 55 plays
  • Median (50th percentile): 60 plays
  • 75th percentile: 66 plays
  • Maximum: 88 plays
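The summary statistics above can be computed with base R functions. This is a sketch on a short hypothetical vector `plays` (one value per team-game), not the real 572 values. Note that R's default `quantile()` (type 7) interpolates linearly between order statistics, and `sd()` uses the n - 1 denominator.

```r
# Hypothetical plays-per-game values (one entry per team-game), NOT the real data
plays <- c(55, 60, 66, 37, 88, 60, 58, 63)

summary_stats <- c(
  count  = length(plays),
  mean   = mean(plays),
  sd     = sd(plays),                       # sample SD (n - 1 denominator)
  min    = min(plays),
  q25    = unname(quantile(plays, 0.25)),   # type 7 linear interpolation
  median = median(plays),
  q75    = unname(quantile(plays, 0.75)),
  max    = max(plays)
)

round(summary_stats, 2)
```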

At first glance the distribution of plays per game looks bimodal, which could reflect different playing styles, such as when a team is trailing or leading. However, a kernel density estimate (KDE) shows a single mode at about 60.69 plays, so I continue the analysis treating the distribution as approximately normal.
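One way to count modes is to look for local maxima of a kernel density estimate. The sketch below assumes simulated data (normal draws with the reported mean and standard deviation, standing in for the real per-game values) and uses R's `density()` with its default Gaussian kernel and bandwidth; the original session may have used a different KDE implementation and bandwidth. On real data, small wiggles in the tails can show up as extra local maxima, so the height of each peak is worth checking too.

```r
set.seed(1)
# Simulated plays per game (assumption): 572 normal draws with the reported
# mean (~60.68) and SD (~8.59), rounded to whole plays. NOT the real data.
plays <- round(rnorm(572, mean = 60.68, sd = 8.59))

# Gaussian kernel density estimate; bandwidth defaults to bw.nrd0 (Silverman's rule)
kde <- density(plays)

# Local maxima: grid points where the density's slope changes from + to -
peaks <- which(diff(sign(diff(kde$y))) == -2) + 1
modes <- kde$x[peaks]

# Location of the highest peak (the global mode)
global_mode <- kde$x[which.max(kde$y)]

length(modes)
global_mode
```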

Hypothesis testing

A one-tailed t-test determines whether a mean is greater (or less) than a reference value. Here, we want to test whether a team's average number of offensive plays per game is greater than the NFL 2022 average.

  1. Null Hypothesis (\(H_0\)): The average number of offensive plays for a given team is less than or equal to the NFL 2022 average.
  2. Alternative Hypothesis (\(H_a\)): The average number of offensive plays for a given team is greater than the NFL 2022 average.

For each team, we perform a one-sample t-test comparing its per-game offensive plays with the league average.
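A minimal base-R sketch of this per-team test, assuming a list of per-game play counts for each team. `TeamA`, `TeamB`, and their values are hypothetical; `league_mean` uses the approximate 2022 average reported earlier. Setting `alternative = "greater"` makes `t.test()` one-tailed, matching \(H_a\).

```r
# Hypothetical plays per game for two teams (NOT real 2022 data)
team_plays <- list(
  TeamA = c(68, 70, 65, 72, 69, 71, 66, 70),
  TeamB = c(58, 61, 60, 57, 63, 59, 62, 60)
)
league_mean <- 60.68   # approximate 2022 league average plays per game
alpha <- 0.1

results <- do.call(rbind, lapply(names(team_plays), function(tm) {
  # H0: team mean <= league mean; Ha: team mean > league mean
  tt <- t.test(team_plays[[tm]], mu = league_mean, alternative = "greater")
  data.frame(team                  = tm,
             t_statistic           = unname(tt$statistic),
             one_tailed_p          = tt$p.value,
             significantly_greater = tt$p.value < alpha)
}))

print(results)
```

TeamB's mean (60) is below `league_mean`, so its t-statistic is negative and its one-tailed p-value exceeds 0.5.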

The “Significantly Greater” column indicates whether a team’s average is statistically greater than the NFL 2022 average, with the significance level set at \(\alpha = 0.1\).

From the results:

  • Teams like the Buccaneers, Chargers, and Browns have averages that are significantly greater than the league average.
  • Most teams, however, show no statistically significant difference in their average number of offensive plays compared with the overall NFL 2022 average.

This classification provides a clearer indication of which teams might have had an offensive approach that involved running more plays than the typical NFL team in the 2022 season.

## Warning: Missing column names filled in: 'X1' [1]
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   X1 = col_character(),
##   `T-Statistic` = col_double(),
##   `One-Tailed P-Value` = col_double(),
##   `Significantly Greater (α=0.1)` = col_logical()
## )
## # A tibble: 10 x 4
##    X1         `T-Statistic` `One-Tailed P-Valu… `Significantly Greater (α=…
##    <chr>              <dbl>               <dbl> <lgl>                           
##  1 Buccaneers          4.13            0.000315 TRUE                            
##  2 Chargers            3.24            0.00258  TRUE                            
##  3 Browns              2.85            0.00579  TRUE                            
##  4 Cardinals           2.65            0.00836  TRUE                            
##  5 Steelers            2.00            0.0309   TRUE                            
##  6 Eagles              1.75            0.0493   TRUE                            
##  7 Chiefs              1.71            0.0515   TRUE                            
##  8 Cowboys             1.62            0.0616   TRUE                            
##  9 Commanders          1.46            0.0825   TRUE                            
## 10 Bills               1.41            0.0880   TRUE

At the 10% significance level, these 10 teams ran significantly more offensive plays per game than the NFL average. Now that we have classified teams as offense-inclined or average, we can check whether rookies on offense-inclined teams had more opportunities and extend this to next season. Our question is: do offensive rookies get more playing time on offense-inclined teams?

Prediction

Team style of playing

The premise of this analysis is that teams will play the 2023 season in a style fairly similar to how they played throughout 2022.

The coach-quarterback pair is very important to a team's performance, so if a team changes its coach or QB for the upcoming season, the estimate for that team will be less reliable.

Data transformation

rook_data_1 <- read.csv("~/Downloads/rookies.csv")
head(rook_data_1)
##   Rnd. Pick.No.             NFL.team             Player Pos.    College
## 1    1        1 Jacksonville Jaguars      Travon Walker   DE    Georgia
## 2    1        2        Detroit Lions   Aidan Hutchinson   DE   Michigan
## 3    1        3       Houston Texans Derek Stingley Jr.   CB        LSU
## 4    1        4        New York Jets     Sauce Gardner†   CB Cincinnati
## 5    1        5      New York Giants  Kayvon Thibodeaux   DE     Oregon
## 6    1        6    Carolina Panthers        Ikem Ekwonu   OT   NC State
##          Conf.                                                      Notes
## 1          SEC                                                           
## 2      Big Ten 2021Lombardi Award,Lott TrophyandTed Hendricks Awardwinner
## 3          SEC                                                           
## 4 The American                                                           
## 5       Pac-12                                                           
## 6          ACC
rook_data_w_stats <- read.csv("~/Downloads/merged_on_name_uppercase_rookies.csv")
head(rook_data_w_stats)
##   Rnd. Pick.No.           NFL.team         Player Pos.           College
## 1    1        6  Carolina Panthers    IKEM EKWONU   OT          NC State
## 2    1        7    New York Giants      EVAN NEAL   OT           Alabama
## 3    1        8    Atlanta Falcons   DRAKE LONDON   WR               USC
## 4    1        9   Seattle Seahawks  CHARLES CROSS   OT Mississippi State
## 5    1       10      New York Jets GARRETT WILSON   WR        Ohio State
## 6    1       11 New Orleans Saints    CHRIS OLAVE   WR        Ohio State
##     Conf.                   Notes Unnamed..0 Season  Tm Age Pos  G GS RshTD
## 1     ACC                                 NA     NA      NA     NA NA    NA
## 2     SEC    from Chicago[R1 - 1]         NA     NA      NA     NA NA    NA
## 3  Pac-12                                137   2022 ATL  21  WR 17 15    NA
## 4     SEC     from Denver[R1 - 2]         NA     NA      NA     NA NA    NA
## 5 Big Ten    from Seattle[R1 - 3]        160   2022 NYJ  22  WR 17 12    NA
## 6 Big Ten from Washington[R1 - 4]        139   2022 NOR  22  WR 15  9    NA
##   RecTD PR.TD KR.TD FblTD IntTD OthTD AllTD X2PM X2PA D2P XPM XPA FGM FGA Sfty
## 1    NA    NA    NA    NA    NA    NA    NA   NA   NA  NA  NA  NA  NA  NA   NA
## 2    NA    NA    NA    NA    NA    NA    NA   NA   NA  NA  NA  NA  NA  NA   NA
## 3     4    NA    NA    NA    NA    NA     4    1   NA   0  NA  NA  NA  NA   NA
## 4    NA    NA    NA    NA    NA    NA    NA   NA   NA  NA  NA  NA  NA  NA   NA
## 5     4    NA    NA    NA    NA    NA     4   NA   NA   0  NA  NA  NA  NA   NA
## 6     4    NA    NA    NA    NA    NA     4    1   NA   0  NA  NA  NA  NA   NA
##   Pts Pts.G
## 1  NA    NA
## 2  NA    NA
## 3  26   1.5
## 4  NA    NA
## 5  24   1.4
## 6  26   1.7

This visualization shows the distribution of games played by rookies last season, with Tampa Bay and the Steelers using their rookies the most.
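The file name `merged_on_name_uppercase_rookies.csv` suggests that the draft list was joined to the season statistics on upper-cased player names. Here is a minimal sketch of such a join in base R, assuming that was the approach; the toy data frames and the `name_key` column are hypothetical. A left join (`all.x = TRUE`) keeps drafted players without a stats row and fills their statistics with `NA`, much like the offensive linemen in the merged output above.

```r
# Hypothetical draft list and season stats (toy rows, not the real files)
draft <- data.frame(
  Player = c("Drake London", "Ikem Ekwonu"),
  Pos.   = c("WR", "OT")
)
stats <- data.frame(
  Player = c("DRAKE LONDON"),
  G      = 17,
  RecTD  = 4
)

# Build an upper-cased, whitespace-trimmed join key on both sides
draft$name_key <- toupper(trimws(draft$Player))
stats$name_key <- toupper(trimws(stats$Player))

# Left join: every drafted player is kept; players without stats get NA
merged <- merge(draft, stats[, c("name_key", "G", "RecTD")],
                by = "name_key", all.x = TRUE)
print(merged)
```

Matching on names is fragile (suffixes such as "Jr.", footnote markers, or nicknames break it), so a player ID would be a safer key when one is available.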

Getting rookie demographics

data_check <- read.csv("~/Downloads/NFL Player Stats(1922 - 2022).csv")
data_check <- data_check %>%
  filter(Season == 2022)

To understand the demographics of the rookies, I’ll focus on the following aspects:

  1. Position Distribution: A breakdown of rookies by their playing position.
  2. Team Distribution: A visualization to understand which teams have the most rookies.

Let’s start with the position distribution of the rookies.

The bar chart above displays the position distribution of rookies.

Observations:

  1. Wide Receivers (WR) and Cornerbacks (CB) dominate the rookie list.
  2. Offensive Tackles (OT), Defensive Ends (DE), and Linebackers (LB) also have a significant presence among the rookies.
  3. Fewer rookies play positions like Punter (P) and Fullback (FB).

Next, let’s visualize the distribution of rookies across different NFL teams.

The bar chart above displays the distribution of rookies across different NFL teams.

Observations:

  1. Teams like “CIN” (Cincinnati Bengals), “BAL” (Baltimore Ravens), and “CLE” (Cleveland Browns) have a higher number of rookies.
  2. Teams like “SF” (San Francisco 49ers) and “KC” (Kansas City Chiefs) have fewer rookies.
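The counts behind bar charts like these can be produced with `table()`. This is a sketch on hypothetical toy rows; the `rookies` data frame is an assumption, with `Pos` and `Tm` columns loosely modeled on the columns of the player stats file.

```r
# Hypothetical rookie rows (toy data, not the real 2022 draft class)
rookies <- data.frame(
  Pos = c("WR", "CB", "WR", "OT", "CB", "WR", "RB"),
  Tm  = c("CIN", "BAL", "CLE", "CIN", "CIN", "BAL", "KC")
)

# Count rookies by position and by team, most frequent first
pos_counts  <- sort(table(rookies$Pos), decreasing = TRUE)
team_counts <- sort(table(rookies$Tm),  decreasing = TRUE)

pos_counts
team_counts

# barplot(pos_counts) would draw these counts as a bar chart
```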

This completes the exploratory data analysis for the demographics of rookies.

Analysis of the relationship between offensive chances created and rookie opportunities

To determine whether a team is offensive, we now adjust the analysis to use \(\alpha = 0.052\), a cutoff that still includes the Chiefs (one-tailed p-value of about 0.0515):

  1. We'll reclassify the teams as “high offensive” or “average offensive” using the new \(\alpha\) value of 0.052 instead of 0.1.
  2. We'll then select the rookie WRs, TEs, and RBs from the teams in each group.
  3. Finally, we'll conduct a two-sample t-test comparing the games played by rookies from the two categories.

Let’s start by reclassifying the teams based on the new \(\alpha\) value of 0.052.

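# Join team totals with the t-test results, then keep only "high offensive"
# teams (one-tailed p-value <= 0.052), ordered by total offensive plays
offensive_teams_final_data <- nfl_off %>%
  merge(hypothesis_testing, by.x = "team", by.y = "X1") %>%
  filter(`One-Tailed P-Value` <= 0.052) %>%
  arrange(desc(total_offensive_plays))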

head(offensive_teams_final_data)
##         team total_offensive_plays pass_percentage rush_percentage T-Statistic
## 1     Chiefs                  1283        61.18472        38.81528    1.712747
## 2 Buccaneers                  1279        66.61454        33.38546    4.129638
## 3  Cardinals                  1174        60.39182        39.60818    2.653686
## 4   Steelers                  1152        56.33681        43.66319    1.999291
## 5     Eagles                  1151        49.69592        50.30408    1.747961
## 6   Chargers                  1138        64.23550        35.76450    3.236272
##   One-Tailed P-Value Significantly Greater (α=0.1)
## 1       0.0515155568                               TRUE
## 2       0.0003146666                               TRUE
## 3       0.0083560575                               TRUE
## 4       0.0309109538                               TRUE
## 5       0.0492531096                               TRUE
## 6       0.0025840725                               TRUE

With the adjusted classification based on \(\alpha = 0.052\), the two-sample t-test on games played gives:

  • \(t\)-statistic: \(4.4358\)
  • \(p\)-value: \(0.000068\)

Using a significance level of \(\alpha = 0.05\) for this t-test, the \(p\)-value of \(0.000068\) is well below \(\alpha\). We therefore reject the null hypothesis (\(H_0\)) and conclude that there is a statistically significant difference in the average number of games played by rookie WRs, TEs, and RBs on high-offensive teams (classified with \(\alpha = 0.052\)) compared with those on average-offensive teams, a difference that is unlikely to be due to random chance.
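The three steps can be sketched in base R as follows. The team list is the set of teams with a one-tailed p-value at or below 0.052 in the hypothesis-testing table; the `rookies` rows and their games played are hypothetical toy values. `t.test()` defaults to Welch's unequal-variance test, which may differ from the test used in the original session, and setting the factor levels to `high` then `average` makes a positive t-statistic mean that high-offensive rookies played more games.

```r
# Step 1: teams with one-tailed p-value <= 0.052 ("high offensive")
high_offensive <- c("Buccaneers", "Chargers", "Browns", "Cardinals",
                    "Steelers", "Eagles", "Chiefs")

# Hypothetical rookies: team, position and games played (toy values)
rookies <- data.frame(
  team  = c("Chiefs", "Chargers", "Eagles", "Steelers", "Browns",
            "Bears", "Giants", "Jets", "Lions", "Saints", "Bears"),
  pos   = c("WR", "WR", "TE", "RB", "WR", "RB", "TE", "WR", "WR", "WR", "OT"),
  games = c(17, 16, 17, 15, 16, 9, 11, 10, 8, 12, 17)
)

# Step 2: keep only WR, TE and RB rookies
skill <- rookies[rookies$pos %in% c("WR", "TE", "RB"), ]

# Label each rookie by his team's group; "high" first so t = mean(high) - mean(average) over SE
skill$group <- factor(ifelse(skill$team %in% high_offensive, "high", "average"),
                      levels = c("high", "average"))

# Step 3: two-sample t-test of games played (Welch by default)
tt <- t.test(games ~ group, data = skill)
tt$statistic
tt$p.value
```

Passing `var.equal = TRUE` would switch to Student's pooled-variance test instead.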

Conclusion

We can expect rookie WRs, TEs, and RBs on the Chargers, Chiefs, and Eagles to have more playing-time opportunities. The other teams have changed either their coach or their QB, which makes it riskier to assume they will play in a style similar to last year's. Still, if their average plays per game stay close to their 2022 levels, we could expect their rookies to get opportunities as well.

Application

In this case I would add Quentin Johnston to my fantasy watch list because of the Chargers' high passing volume. For further analysis, we could look at how pass plays were distributed last season among WRs Allen and Williams and RB Ekeler to better understand Johnston's potential. The other prospect I like is Rashee Rice of Kansas City, because the Chiefs ran the most offensive plays among the high-offensive teams (1,283) with a high pass percentage (61.2%). Kansas City is also a safe bet thanks to the stability of its coach-QB pair. Together with JuJu Smith-Schuster's departure before the upcoming season, this could give Rice some value.

Limitations

In this case, we didn't have play-by-play data for all teams. Such data would help analyze individual performance and gather other variables, such as the success rate of offensive plays, which can be important for determining how likely a rookie is to get more opportunities on a team.

This analysis is biased because it measures opportunity only through the number of plays created, but it is a good starting point for making a better-informed decision.

This analysis only aims to understand the direction teams could take next season. It does not predict fantasy points, because those depend on pass completions, the distribution of plays among players, and other variables that are not addressed here.

Disclosure

This analysis was done using analysis-copilot techniques with ChatGPT's Code Interpreter. As a data scientist, I believe it is important for me and for our profession to be transparent about our techniques, so here is a link to the session: https://chat.openai.com/share/31a8e29a-dfe1-434d-91f3-b1f549d6390c

Resources

Data was gathered from: https://www.kaggle.com/datasets/kristofanderson/2012-2022-nfl-defense-and-offensive-statistics, https://en.wikipedia.org/wiki/2022_NFL_Draft