The goal of any sports league is to make its games as competitive and entertaining as possible. After all, it is entertainment, and the more competitive the games are, the larger the audience they will attract. With this in mind, we have to ask what the optimal distribution of team wins should be. Intuitively, we do not want it to be too spread out, since that would mean there is a large number of mediocre and bad teams. We also do not want the distribution to be too narrow, as that would make it hard to decide which teams make the playoffs. The three distributions below illustrate the three options.
In this project, I will analyze the competitive balance of the
National Football League (NFL) using the team Win/Loss ratio variance as
the indicator of competition. To avoid bias stemming from individual
team runs within a given season, I will be using data from 2003 until
the most recent complete season (2023). The paper concludes with a
statement of the current state of competition in the NFL.
The data used in this project come from the “Team Rankings”
website, a sports data bank that, among other things, includes the win
and loss data for each NFL team. For the sake of convenience, I will omit
games that ended in a draw from my calculations. The table below
shows each team's W/L/D statistics since 2003.
| Team | Wins | Losses | Draws | Win % |
|---|---|---|---|---|
| New England | 269 | 112 | 0 | 70.6 |
| Pittsburgh | 229 | 135 | 2 | 62.9 |
| Green Bay | 223 | 145 | 2 | 60.6 |
| Indianapolis | 218 | 148 | 1 | 59.6 |
| Baltimore | 216 | 150 | 0 | 59.0 |
| Kansas City | 215 | 153 | 0 | 58.4 |
| Seattle | 214 | 155 | 1 | 58.0 |
| Philadelphia | 208 | 156 | 2 | 57.1 |
| New Orleans | 205 | 154 | 0 | 57.1 |
| Dallas | 201 | 156 | 0 | 56.3 |
| Minnesota | 185 | 167 | 2 | 52.6 |
| Denver | 187 | 169 | 0 | 52.5 |
| LA Chargers | 186 | 169 | 0 | 52.4 |
| Atlanta | 175 | 179 | 0 | 49.4 |
| Buffalo | 174 | 178 | 0 | 49.4 |
| Cincinnati | 173 | 178 | 4 | 49.3 |
| San Francisco | 175 | 186 | 1 | 48.5 |
| Tennessee | 169 | 184 | 0 | 47.9 |
| Carolina | 170 | 186 | 1 | 47.8 |
| Chicago | 163 | 187 | 0 | 46.6 |
| NY Giants | 164 | 192 | 1 | 46.1 |
| Miami | 158 | 188 | 0 | 45.7 |
| LA Rams | 159 | 196 | 1 | 44.8 |
| Arizona | 155 | 195 | 2 | 44.3 |
| Tampa Bay | 156 | 197 | 0 | 44.2 |
| Houston | 155 | 198 | 1 | 43.9 |
| NY Jets | 147 | 204 | 0 | 41.9 |
| Washington | 138 | 208 | 2 | 39.9 |
| Detroit | 133 | 213 | 2 | 38.4 |
| Jacksonville | 134 | 216 | 0 | 38.3 |
| Las Vegas | 125 | 219 | 0 | 36.3 |
| Cleveland | 119 | 225 | 1 | 34.6 |
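The code in the rest of this write-up refers to a data frame called `data`, whose construction is not shown. A minimal sketch of how it could be rebuilt from the table above follows. The column names `Team`, `Wins`, `Losses`, `Draws` and `Win_Percentage` are assumptions chosen to be compatible with the later calls, not necessarily the names used in the original file. The win percentage is computed as wins over wins plus losses, which is how the Win % column appears to be derived once draws are omitted.

```r
# Hypothetical reconstruction of `data` from the table above.
# Column names are assumptions; the original import step is not shown.
data <- data.frame(
  Team = c("New England", "Pittsburgh", "Green Bay", "Indianapolis",
           "Baltimore", "Kansas City", "Seattle", "Philadelphia",
           "New Orleans", "Dallas", "Minnesota", "Denver",
           "LA Chargers", "Atlanta", "Buffalo", "Cincinnati",
           "San Francisco", "Tennessee", "Carolina", "Chicago",
           "NY Giants", "Miami", "LA Rams", "Arizona",
           "Tampa Bay", "Houston", "NY Jets", "Washington",
           "Detroit", "Jacksonville", "Las Vegas", "Cleveland"),
  Wins = c(269, 229, 223, 218, 216, 215, 214, 208,
           205, 201, 185, 187, 186, 175, 174, 173,
           175, 169, 170, 163, 164, 158, 159, 155,
           156, 155, 147, 138, 133, 134, 125, 119),
  Losses = c(112, 135, 145, 148, 150, 153, 155, 156,
             154, 156, 167, 169, 169, 179, 178, 178,
             186, 184, 186, 187, 192, 188, 196, 195,
             197, 198, 204, 208, 213, 216, 219, 225),
  Draws = c(0, 2, 2, 1, 0, 0, 1, 2,
            0, 0, 2, 0, 0, 0, 0, 4,
            1, 0, 1, 0, 1, 0, 1, 2,
            0, 1, 0, 2, 2, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Win percentage with draws omitted, rounded to one decimal as in the table
data$Win_Percentage <- round(100 * data$Wins / (data$Wins + data$Losses), 1)

# Show the first few rows as a sanity check
print(head(data, 3))
```

With these names, a call such as `data$Loss` still resolves to the `Losses` column, because `$` on a data frame allows partial matching of column names.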
To check for data accuracy, I run a simple test of summing
all of the wins and losses. If the dataset is complete, the number of
wins and losses should be equal to each other, as for each game a team
wins, another team needs to lose.
TotalWins <- sum(data$Wins)
TotalWins
## [1] 5698
TotalLosses <- sum(data$Loss)
TotalLosses
## [1] 5698
TotalGames <- (TotalWins+TotalLosses)
TotalGames
## [1] 11396
AvergeWP <- (TotalWins)/(TotalWins+TotalLosses)
AvergeWP
## [1] 0.5
The numbers of wins and losses are equal. The data are
balanced, and we may proceed with the analysis.
In this analysis, we will rely on three main variables: the probability of winning a game, \(p\); the total number of games, \(M\); and the total number of wins, \(x\).
We have estimated above that \(p\) is equal to 0.5. We also know
that the total number of games in the dataset, \(M\), amounts to 11,396, and that the
total number of wins, \(x\), is equal to
5,698.
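The remaining calculations are fairly simple. Treating each game as an
independent trial with win probability \(p\), the expected team win
percentage is the share of games won:
\[ p = \frac{x}{M} \]
The wins variance formula then becomes
\[ \sigma_{x}^{2} = M p (1 - p) \]
To extend this formula to calculate the Win/Loss ratio variance, we divide the number of wins by \(M\), which adjusts the formula to
\[ \sigma_{x/M}^{2} = \frac{p (1 - p)}{M}, \qquad \sigma_{x/M} = \sqrt{\frac{p (1 - p)}{M}} = \frac{p}{\sqrt{M}} \quad \text{when } p = 0.5 \]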
Using the formula we derived above, the ideal competitive
balance (standard deviation) should therefore be calculated as the
Average Win Percentage divided by the square root of the total number of games.
Ideal <- AvergeWP/sqrt(TotalGames)
Ideal
## [1] 0.004683751
The calculations reveal that the optimal W/L standard
deviation for the NFL is about 0.0047, or less than half a percentage point. This
means that roughly 68% of the teams should have a W/L ratio between 0.49532 and
0.50468.
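As a quick check of the arithmetic, the sketch below recomputes the ideal standard deviation from the totals reported above and derives the one-standard-deviation band around 0.5. The names `p`, `M`, `ideal_sd` and `band` are illustrative, and the 68% figure relies on the usual normal approximation, which is an assumption rather than something the data confirm.

```r
# Totals reported earlier in the write-up
M <- 11396   # wins + losses summed over all teams
p <- 0.5     # average win probability

# Binomial standard deviation of a win proportion;
# this equals p / sqrt(M) only because p = 0.5
ideal_sd <- sqrt(p * (1 - p) / M)

# One-standard-deviation band around p
# (about 68% of teams under a normal approximation)
band <- c(lower = p - ideal_sd, upper = p + ideal_sd)

print(ideal_sd)
print(round(band, 5))
```

Writing the standard deviation as `sqrt(p * (1 - p) / M)` rather than `p / sqrt(M)` keeps the sketch valid if `p` ever differs from 0.5.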
However, after estimating the standard deviation of the
actual W/L ratio in the league, we find out that the SD is much
greater.
Actual <- sd(data$Win_Percentage/100)
Actual
## [1] 0.08465147
The drastically larger real SD compared to the
ideal SD indicates the presence of the Large Competitive Imbalance
scenario outlined in the Introduction of this write-up.
The calculation below tests how large
the competitive imbalance is, with values greater than one indicating
the presence of said imbalance.
Z <- Actual/Ideal
Z
## [1] 18.07344
The results, unsurprisingly, show that the actual standard deviation
is about 18 times the size of the optimal competitive balance level. If we
look back at the table of teams and their records over the past 21
years, it becomes obvious that, while some teams may have had a good
season or two, there is still a clear separation between the blue-blooded
franchises, such as the Patriots, and the poverty franchises, such as the
Browns. Therefore, the league should focus on fostering
competition until this inequity is at least partially
under control.
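A common variant of this benchmark, often called the Noll-Scully ratio, sizes the ideal standard deviation by the number of games each team plays rather than by the league-wide total. The sketch below is offered only as a point of comparison and is not part of the analysis above. It assumes every team played roughly the same number of decided games, taken as the league total divided by 32 teams, and it reuses the actual standard deviation reported earlier. The names `games_per_team`, `ideal_per_team` and `ratio` are illustrative.

```r
total_team_games <- 11396   # wins + losses summed over all teams
n_teams <- 32               # number of teams in the table
actual_sd <- 0.08465147     # SD of team win percentages reported above

# Assumption: every team played about the same number of decided games
games_per_team <- total_team_games / n_teams

# Ideal SD of a single team's win proportion over its own schedule
ideal_per_team <- 0.5 / sqrt(games_per_team)

# Ratio of actual to ideal spread; values above 1 indicate imbalance
ratio <- actual_sd / ideal_per_team

print(games_per_team)
print(ideal_per_team)
print(ratio)
```

Under this per-team benchmark the ratio is smaller than the league-wide figure but still well above one, so the qualitative reading of an imbalanced league is unchanged.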