Introduction

     The goal of any sports league is to produce games that are as competitive and entertaining as possible. After all, sport is entertainment, and the more competitive the games are, the larger the audience they can attract. With this in mind, we have to ask what the optimal distribution of team wins should be. Intuitively, we do not want it to be too spread out, since that would mean there is a large number of mediocre and bad teams. We also do not want the distribution to be too narrow, as that would make it hard to determine which teams make the playoffs. The three distributions below illustrate the three cases.



     In this project, I analyze the competitive balance of the National Football League (NFL), using the variance of team Win/Loss ratios as the indicator of competition. To avoid bias stemming from individual team runs within a single season, I use data from 2003 through the most recent complete season (2023). The paper concludes with an assessment of the current state of competition in the NFL.

Data


     The data used in this project come from the “Team Rankings” website, a sports data bank that includes, among other things, the win and loss data for each NFL team. For the sake of convenience, I omit games that ended in a draw from my calculations. The table below shows each team's W/L/D statistics since 2003.


NFL Team Performance Data since 2003

| Team | Wins | Losses | Draws | Win % |
|---|---|---|---|---|
| New England | 269 | 112 | 0 | 70.6 |
| Pittsburgh | 229 | 135 | 2 | 62.9 |
| Green Bay | 223 | 145 | 2 | 60.6 |
| Indianapolis | 218 | 148 | 1 | 59.6 |
| Baltimore | 216 | 150 | 0 | 59.0 |
| Kansas City | 215 | 153 | 0 | 58.4 |
| Seattle | 214 | 155 | 1 | 58.0 |
| Philadelphia | 208 | 156 | 2 | 57.1 |
| New Orleans | 205 | 154 | 0 | 57.1 |
| Dallas | 201 | 156 | 0 | 56.3 |
| Minnesota | 185 | 167 | 2 | 52.6 |
| Denver | 187 | 169 | 0 | 52.5 |
| LA Chargers | 186 | 169 | 0 | 52.4 |
| Atlanta | 175 | 179 | 0 | 49.4 |
| Buffalo | 174 | 178 | 0 | 49.4 |
| Cincinnati | 173 | 178 | 4 | 49.3 |
| San Francisco | 175 | 186 | 1 | 48.5 |
| Tennessee | 169 | 184 | 0 | 47.9 |
| Carolina | 170 | 186 | 1 | 47.8 |
| Chicago | 163 | 187 | 0 | 46.6 |
| NY Giants | 164 | 192 | 1 | 46.1 |
| Miami | 158 | 188 | 0 | 45.7 |
| LA Rams | 159 | 196 | 1 | 44.8 |
| Arizona | 155 | 195 | 2 | 44.3 |
| Tampa Bay | 156 | 197 | 0 | 44.2 |
| Houston | 155 | 198 | 1 | 43.9 |
| NY Jets | 147 | 204 | 0 | 41.9 |
| Washington | 138 | 208 | 2 | 39.9 |
| Detroit | 133 | 213 | 2 | 38.4 |
| Jacksonville | 134 | 216 | 0 | 38.3 |
| Las Vegas | 125 | 219 | 0 | 36.3 |
| Cleveland | 119 | 225 | 1 | 34.6 |
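
     Because draws are omitted, the Win % column should equal wins divided by decided games (wins plus losses). The sketch below recomputes it for three rows copied from the table. The data frame `sample_rows` and its column names are illustrative assumptions, not the author's actual `data` object.

```r
# Hypothetical three-row excerpt of the table above; column names are assumed.
sample_rows <- data.frame(
  Team   = c("New England", "Pittsburgh", "Cleveland"),
  Wins   = c(269, 229, 119),
  Losses = c(112, 135, 225),
  Draws  = c(0, 2, 1)
)

# Win % over decided games only: draws are left out of the denominator.
decided <- sample_rows$Wins + sample_rows$Losses
sample_rows$Win_Percentage <- round(100 * sample_rows$Wins / decided, 1)

sample_rows
```

     The recomputed values agree with the table (70.6, 62.9 and 34.6), which would not be the case for Pittsburgh or Cleveland if draws were counted in the denominator.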

Data Checks


     To check the data for accuracy, I run a simple test that sums all of the wins and all of the losses. If the dataset is complete, the two totals should be equal, since for every game one team wins, another team must lose.

TotalWins <- sum(data$Wins)
TotalWins
## [1] 5698
TotalLosses <- sum(data$Loss)
TotalLosses
## [1] 5698
TotalGames <- (TotalWins+TotalLosses)
TotalGames
## [1] 11396
AvergeWP <- (TotalWins)/(TotalWins+TotalLosses)
AvergeWP
## [1] 0.5


     The total numbers of wins and losses are equal (5,698 each). The data are balanced, and we may proceed with the analysis.
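
     The same check can be written as a hard assertion, so that an incomplete dataset stops the script instead of relying on a visual comparison. The sketch below re-enters the Wins and Losses columns from the table as plain vectors. The names `wins` and `losses` are illustrative and stand in for the author's `data` object.

```r
# Wins and losses per team, copied from the table in the Data section
# (same team order, New England first, Cleveland last).
wins <- c(269, 229, 223, 218, 216, 215, 214, 208, 205, 201, 185, 187,
          186, 175, 174, 173, 175, 169, 170, 163, 164, 158, 159, 155,
          156, 155, 147, 138, 133, 134, 125, 119)
losses <- c(112, 135, 145, 148, 150, 153, 155, 156, 154, 156, 167, 169,
            169, 179, 178, 178, 186, 184, 186, 187, 192, 188, 196, 195,
            197, 198, 204, 208, 213, 216, 219, 225)

# One entry per team, and every win must be matched by a loss.
stopifnot(length(wins) == 32, length(losses) == 32)
stopifnot(sum(wins) == sum(losses))

# Decided games per team; these counts are not identical across teams.
decided_games <- wins + losses
range(decided_games)
```

     Equal totals are a necessary condition only: two offsetting entry errors would still pass, so this is a sanity check rather than a full validation.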

Calculations


     In this analysis, we will rely on three main variables:


\(x = \text{total number of wins}\)
\(M = \text{total number of games}\)
\(p = \text{total team win probability}\)


     We estimated above that \(p\) equals 0.5. We also know that the total number of games in the dataset, \(M\), is 11,396 (wins plus losses, so each game is counted once for each of its two teams), and that the total number of wins, \(x\), is 5,698.
     The remaining calculations are fairly simple. To calculate the expected number of wins, we use the following formula:


\(E(x) = p \cdot M\)


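     Treating each game as an independent trial with win probability \(p\), the number of wins is binomial, so the wins variance formula becomes


\(Var(x) = (1 - p) \cdot E(x)\)
\(Var(x) = p \cdot (1 - p) \cdot M\)
\(Var(x) = p^{2} \cdot M\), since \(1 - p = p\) when \(p = 0.5\)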


     To extend this formula to the variance of the Win/Loss ratio \(x/M\), we divide by \(M^2\):


\[\text{Var}\left(\frac{x}{M}\right)=\frac{p^2 \cdot M}{M^2}=\frac{p^2}{M}\]

      Taking the square root turns this into a standard deviation formula:
\[\text{SD}\left(\frac{x}{M}\right)=\frac{p}{\sqrt{M}}\]
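
     The standard deviation of a win share under independent, evenly matched games can be checked by simulation. The sketch below draws binomial win counts with `rbinom()` and compares the simulated spread of \(x/M\) with the closed form \(\sqrt{p(1-p)/M}\), which equals \(p/\sqrt{M}\) when \(p = 0.5\). The seed and the number of replications are arbitrary choices made for illustration.

```r
set.seed(42)      # arbitrary seed, used only for reproducibility

M <- 11396        # game count used in the text
p <- 0.5          # win probability under perfect balance
n_rep <- 100000   # number of simulated win totals (illustrative choice)

# Simulate win totals out of M games and convert them to win shares.
sim_share <- rbinom(n_rep, size = M, prob = p) / M

sim_sd    <- sd(sim_share)             # spread observed in the simulation
theory_sd <- sqrt(p * (1 - p) / M)     # binomial standard deviation of x / M

c(simulated = sim_sd, theoretical = theory_sd)
```

     The two values should agree closely, which supports using the closed form as the benchmark instead of simulating it each time.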


Ideal Balance


     Using the formula we derived above, the ideal competitive balance (standard deviation) is the average win percentage divided by the square root of the total number of games.

Ideal <- AvergeWP/sqrt(TotalGames)
Ideal
## [1] 0.004683751


     The calculation shows that the optimal W/L standard deviation for the NFL is about 0.0047, or less than half a percentage point. Assuming the ratios are approximately normally distributed, this means that about 68% of the teams should have a W/L ratio between 0.49532 and 0.50468.
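
     The one-standard-deviation band quoted above is simple arithmetic around 0.5, and the 68% figure is the share of a normal distribution that lies within one standard deviation of its mean. A minimal sketch, with the variable names `ideal_sd`, `lower` and `upper` introduced here for illustration:

```r
ideal_sd <- 0.5 / sqrt(11396)   # same quantity as Ideal in the text

# One standard deviation on either side of a 0.5 win share.
lower <- 0.5 - ideal_sd
upper <- 0.5 + ideal_sd
round(c(lower = lower, upper = upper), 5)

# Share of a normal distribution within one SD of the mean (about 0.68).
within_one_sd <- pnorm(1) - pnorm(-1)
within_one_sd
```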

Actual Balance


      However, when we estimate the standard deviation of the actual W/L ratios in the league, we find that it is much greater.

Actual <- sd(data$Win_Percentage/100) 
Actual
## [1] 0.08465147


      The drastically larger real SD compared with the ideal SD indicates the presence of the Large Competitive Imbalance scenario outlined in the Introduction of this write-up.
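
      The actual standard deviation can be reproduced directly from the Win % column of the table. The sketch below re-enters that column as a plain vector (the name `win_pct` is illustrative) and also shows that `sd()` uses the sample formula with an \(n - 1\) denominator, which is slightly larger than the population version.

```r
# Win % column copied from the table in the Data section, same team order.
win_pct <- c(70.6, 62.9, 60.6, 59.6, 59.0, 58.4, 58.0, 57.1, 57.1, 56.3,
             52.6, 52.5, 52.4, 49.4, 49.4, 49.3, 48.5, 47.9, 47.8, 46.6,
             46.1, 45.7, 44.8, 44.3, 44.2, 43.9, 41.9, 39.9, 38.4, 38.3,
             36.3, 34.6)

share <- win_pct / 100   # convert percentages to fractions

# sd() divides by n - 1 (sample standard deviation).
actual_sd <- sd(share)

# Population version (divides by n), shown only for comparison.
pop_sd <- sqrt(mean((share - mean(share))^2))

c(sample = actual_sd, population = pop_sd)
```

      Because the table values are rounded to one decimal place, the result may differ from the figure reported above in the last digits.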

Competitive Balance Assessment


      The calculation below tests how large the competitive imbalance is; values greater than one indicate the presence of imbalance.

Z <- Actual/Ideal
Z
## [1] 18.07344


      The results, unsurprisingly, show that the actual standard deviation is about 18 times the ideal competitive balance level. Looking back at the table of teams and their records over the past 21 seasons, it becomes clear that, while some teams may have had a good season or two, there is still a clear separation between the blue-blooded franchises, such as the Patriots, and the poverty franchises, such as the Browns. Therefore, the league should focus on furthering competition until this inequity is at least partially under control.
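
      The size of this ratio depends on which game count enters the benchmark. As a sensitivity check, the sketch below assumes that \(M\) is the average number of decided games per team instead of the league-wide total. This per-team convention is an assumption that the calculation above does not use, and the variable names are introduced here for illustration.

```r
total_decided <- 11396       # TotalGames from the data check (wins + losses, all teams)
n_teams       <- 32          # number of teams in the table
actual_sd     <- 0.08465147  # Actual, as reported in the text

# Assumption: benchmark each team against its own number of decided games.
games_per_team <- total_decided / n_teams

# Ideal SD of a single team's win share over that many evenly matched games.
ideal_per_team <- 0.5 / sqrt(games_per_team)

# Ratio of actual to ideal spread under the per-team assumption.
ratio_per_team <- actual_sd / ideal_per_team

c(games_per_team = games_per_team,
  ideal_per_team = ideal_per_team,
  ratio_per_team = ratio_per_team)
```

      Under this assumption the ratio comes out at about 3.2, well below 18. It is still greater than one, so the qualitative finding of imbalance holds under either choice of \(M\).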