Introduction

This paper analyzes the level of competitive balance in Major League Baseball (MLB) by studying changes in the standard deviation of teams’ win/loss records from 2002 through 2022. The 2020 season is excluded from the analysis because COVID-19 affected both the length of the season and the dynamics of gameplay.

Perfect competitive balance would give every team a win/loss record of \(0.500\). In practice, competitive balance is judged against an ideal that allows some teams to outperform others: even if every game were a coin flip, chance alone would spread teams’ records around \(0.500\). Under "ideal competitive balance," the standard deviation of win/loss records should be at or close to \(P/\sqrt{M}\). Here \(P\) is the ideal win/loss record (\(0.500\)) and \(M\) is the number of games played in a standard season. For the MLB, the ideal standard deviation of win/loss records is therefore \(0.5/\sqrt{162} \approx 0.0393\) (Rotthoff).
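The ideal value can be checked directly. The sketch below is a minimal illustration, not the author's code. It uses the equivalent form \(\sqrt{p(1-p)/M}\), which equals \(P/\sqrt{M}\) when \(p = 0.5\) because \(\sqrt{0.5 \times 0.5} = 0.5\). The function name `ideal_sd` is hypothetical.

```python
import math


def ideal_sd(games_per_season: int, p: float = 0.5) -> float:
    """Idealized standard deviation of win/loss records (hypothetical helper).

    Assumes every game is an independent coin flip won with probability p.
    For p = 0.5, sqrt(p * (1 - p) / M) equals P / sqrt(M) = 0.5 / sqrt(M).
    """
    return math.sqrt(p * (1.0 - p) / games_per_season)


# A standard 162-game MLB season.
print(round(ideal_sd(162), 4))  # 0.0393
```

The ideal shrinks as \(M\) grows, so a shorter season has a larger ideal standard deviation. For example, `ideal_sd(60)` is about 0.0645, which is one way to see why a season of a very different length is hard to compare directly with 162-game seasons.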

The only data used in this study were team win/loss records from 2002 through 2022. Only a few games remained in the 2022 season when the data were collected, so that season was treated as complete for the sake of simplicity (Baseball-Reference, 2022).

Data Exploration

Standard deviation comparisons and tests of the statistical significance of changes in the variance of win/loss records support most of the findings of this paper. Before analyzing these values, however, it was of interest to visualize the distribution of win/loss records in the MLB and how that distribution has changed over time. This approach is somewhat subjective. It still fits within the framework of ideal competitive balance, because it tracks how both the shape and the extremes of the win/loss distribution change over time.

Below are two histograms that show the changing distribution of teams’ win/loss records overall (Figure 1) and at home (Figure 2). Comparing the light-grey filled density curve (2016 season) with the black density curve (2022 season), the distribution of results appears to have flattened over the last several years of play, both overall and at home. This is consistent with the rise in the standard deviation of win/loss records from 0.0662 in 2016 to 0.0924 in 2022, reported in the Data Development section. It is not yet clear whether these changes are statistically significant, but the findings suggest a change in the competitiveness of the sport.
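One rough way to put a number on this flattening is to compare the height of a normal density at its center, \(1/(\sigma\sqrt{2\pi})\). This sketch assumes that season win/loss records are approximately normal; the figures themselves show density curves, not a normal fit. It uses the 2016 and 2022 values of SD_WL_SEASON from the table in the Data Development section. The helper name is hypothetical.

```python
import math


def normal_peak_density(sd: float) -> float:
    """Height of a normal density at its mean: 1 / (sd * sqrt(2 * pi))."""
    return 1.0 / (sd * math.sqrt(2.0 * math.pi))


# SD_WL_SEASON values for 2016 and 2022, copied from the season table.
sd_2016, sd_2022 = 0.0661766, 0.0924093

peak_2016 = normal_peak_density(sd_2016)
peak_2022 = normal_peak_density(sd_2022)

# A ratio below 1 means the 2022 curve is flatter (lower and wider) than 2016's.
print(round(peak_2016, 2), round(peak_2022, 2), round(peak_2022 / peak_2016, 3))
# 6.03 4.32 0.716
```

Under this normal approximation, the 2022 peak is roughly 72% as tall as the 2016 peak. The ratio reduces to \(\sigma_{2016}/\sigma_{2022}\).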

To better understand changes at the extremes of the sport (i.e., the minimum and maximum win/loss records), scatterplots were developed with each team’s rank on the x-axis and its win/loss record on the y-axis. In Figure 3, where points are color-coded by season, the lighter points tend toward increasingly extreme win/loss records. These findings could imply competitive imbalance, so it was also of interest to determine whether the same teams occupied the same ranks and win/loss positions. Figure 4, which is color-coded by team, shows no concentration of any one color at the extremes. This suggests that how well a given team performs varies from season to season, even as win/loss records have become more extreme.
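The question of whether the same teams keep occupying the extreme ranks can also be checked numerically rather than by eye. The sketch below assumes a hypothetical nested mapping of season to team to win/loss record. The team labels and values in the example are made up for illustration and are not MLB data. It ranks teams within each season and counts how often each team lands in the top or bottom `k` positions. Counts concentrated on a few teams would correspond to the concentration of color that Figure 4 does not show.

```python
from collections import Counter


def rank_within_season(win_pct_by_team: dict[str, float]) -> list[str]:
    """Teams ordered from best (rank 1) to worst win/loss record.

    Ties keep their input order because Python's sort is stable.
    """
    return sorted(win_pct_by_team, key=win_pct_by_team.get, reverse=True)


def extreme_occupants(records: dict[int, dict[str, float]], k: int = 3) -> tuple[Counter, Counter]:
    """Count how often each team finishes in the top-k and bottom-k ranks."""
    top, bottom = Counter(), Counter()
    for by_team in records.values():
        ordered = rank_within_season(by_team)
        top.update(ordered[:k])
        bottom.update(ordered[-k:])
    return top, bottom


# Hypothetical toy input: four made-up teams over two seasons.
toy_records = {
    2021: {"A": 0.600, "B": 0.550, "C": 0.450, "D": 0.400},
    2022: {"A": 0.420, "B": 0.580, "C": 0.610, "D": 0.390},
}
top, bottom = extreme_occupants(toy_records, k=1)
print(top)     # A and C each reach rank 1 once
print(bottom)  # D finishes last in both seasons
```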

Data Development

The figures above suggest that competitive imbalance in the MLB may be growing. To examine the competitive nature of the sport more rigorously, several statistics were computed for each season beginning in 2003. Some of these statistics rely on lagged values, so 2002 serves only as the base year. It appears in the table, but its comparison columns all equal 1 and should not be read as test results. The statistics are summarized as follows and are shown in the table below:

  • SD_WL_SEASON - the standard deviation of win/loss records in a season;
  • SD_IDEAL_COMPARE - the ratio of the season’s win/loss standard deviation to the ideal of \(0.0393\);
  • SD_PRIOR_COMPARE - the ratio of the season’s win/loss standard deviation to the prior season’s (2021 is compared with 2019, since 2020 is excluded);
  • SD_S1_COMPARE - the ratio of the season’s win/loss standard deviation to that of the first season in the study, 2002;
  • VAR_FTEST_PRIOR - 1 minus the p-value of an F-test comparing the variance of the current season’s win/loss records with the prior season’s; values above 0.95 indicate a significant change at \(\alpha = 0.05\);
  • VAR_FTEST_S1 - 1 minus the p-value of an F-test comparing the variance of the current season’s win/loss records with that of the first season in the study, 2002;
  • MEAN_WL - the mean win/loss record in the season (roughly 0.500 across the board); and
  • SD_IDEAL_VALUE - the ideal standard deviation of win/loss records.
SEASON  SD_WL_SEASON  SD_IDEAL_COMPARE  SD_PRIOR_COMPARE  SD_S1_COMPARE  VAR_FTEST_PRIOR  VAR_FTEST_S1  MEAN_WL    SD_IDEAL_VALUE
2002    0.0914709     2.328468          1.0000000         1.0000000      1.0000000        1.0000000     0.5001000  0.0392837
2003    0.0826037     2.102748          0.9030607         0.9030607      0.4135051        0.4135051     0.5000000  0.0392837
2004    0.0832845     2.120076          1.0082406         0.9105024      0.0349486        0.3830730     0.4999000  0.0392837
2005    0.0666987     1.697873          0.8008547         0.7291802      0.7623471        0.9056005     0.4999667  0.0392837
2006    0.0622315     1.584155          0.9330234         0.6803422      0.2885996        0.9578418     0.5000000  0.0392837
2007    0.0572080     1.456278          0.9192772         0.6254230      0.3466435        0.9861932     0.4999333  0.0392837
2008    0.0682038     1.736185          1.1922075         0.7456340      0.6508518        0.8803274     0.4999667  0.0392837
2009    0.0704312     1.792885          1.0326579         0.7699848      0.1362146        0.8347903     0.5000000  0.0392837
2010    0.0679371     1.729395          0.9645877         0.7427179      0.1526278        0.8851197     0.5000667  0.0392837
2011    0.0705241     1.795251          1.0380802         0.7710008      0.1581360        0.8326710     0.5000667  0.0392837
2012    0.0735677     1.872728          1.0431567         0.8042746      0.1784447        0.7532956     0.5000000  0.0392837
2013    0.0754867     1.921577          1.0260846         0.8252538      0.1093420        0.6933717     0.5000333  0.0392837
2014    0.0592967     1.509448          0.7855257         0.6482581      0.8004063        0.9773792     0.5000333  0.0392837
2015    0.0644090     1.639585          1.0862146         0.7041474      0.3409866        0.9362904     0.5000333  0.0392837
2016    0.0661766     1.684581          1.0274439         0.7234719      0.1149247        0.9133920     0.5000333  0.0392837
2017    0.0712293     1.813203          1.0763523         0.7787107      0.3054240        0.8160036     0.4999667  0.0392837
2018    0.0903678     2.300389          1.2686879         0.9879408      0.7940249        0.0516462     0.4999333  0.0392837
2019    0.0979602     2.493660          1.0840167         1.0709443      0.3331530        0.2854810     0.4999333  0.0392837
2021    0.0892582     2.272144          0.9111684         0.9758106      0.3803278        0.1040089     0.5000000  0.0392837
2022    0.0924093     2.352356          1.0353023         1.0102589      0.1469342        0.0434571     0.4999000  0.0392837
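The three SD_*_COMPARE columns are ratios of standard deviations, so they can be recomputed from SD_WL_SEASON alone. The sketch below is an assumed reconstruction, not the author's code. It copies the SD_WL_SEASON values from the table and uses 2002 as the base year. It treats the previous season present in the data as the prior season, so 2021 is compared with 2019. The function name `comparison_columns` is hypothetical.

```python
import math

# SD_WL_SEASON copied from the table above; 2020 is absent from the data.
SD_WL_SEASON = {
    2002: 0.0914709, 2003: 0.0826037, 2004: 0.0832845, 2005: 0.0666987,
    2006: 0.0622315, 2007: 0.0572080, 2008: 0.0682038, 2009: 0.0704312,
    2010: 0.0679371, 2011: 0.0705241, 2012: 0.0735677, 2013: 0.0754867,
    2014: 0.0592967, 2015: 0.0644090, 2016: 0.0661766, 2017: 0.0712293,
    2018: 0.0903678, 2019: 0.0979602, 2021: 0.0892582, 2022: 0.0924093,
}
IDEAL_SD = 0.5 / math.sqrt(162)  # the 0.0393 ideal for a 162-game season


def comparison_columns(sd_by_season: dict[int, float]) -> dict[int, dict[str, float]]:
    """Ratio columns relative to the ideal, the prior season, and the base year."""
    seasons = sorted(sd_by_season)
    base_sd = sd_by_season[seasons[0]]  # 2002 is the base year
    rows = {}
    for i, season in enumerate(seasons):
        sd = sd_by_season[season]
        # The base year has no prior season, so it is compared with itself (ratio 1).
        prior_sd = sd_by_season[seasons[i - 1]] if i > 0 else sd
        rows[season] = {
            "SD_IDEAL_COMPARE": sd / IDEAL_SD,
            "SD_PRIOR_COMPARE": sd / prior_sd,
            "SD_S1_COMPARE": sd / base_sd,
        }
    return rows


columns = comparison_columns(SD_WL_SEASON)
print(round(columns[2021]["SD_PRIOR_COMPARE"], 4))  # 0.9112 (2021 relative to 2019)
```

Every recomputed SD_IDEAL_COMPARE value is above 1, which is the numerical form of the finding that no season reaches the ideal.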

Results

Before examining the question of competitive balance further, every season’s standard deviation of win/loss records was compared to the ideal of \(0.0393\); a graph plotting each season’s result is below.

Every season’s standard deviation of win/loss records differs significantly from, and exceeds, the ideal. As the SD_IDEAL_COMPARE column shows, seasonal standard deviations range from about 1.46 (2007) to 2.49 (2019) times the ideal value. Using the ideal standard deviation as the general rule of thumb for competitive balance, the MLB is not balanced. It is arguable, however, that the MLB has remained a popular league even though it has historically fallen short of the ideal. It was therefore of additional interest to determine whether the changes observed in the Data Exploration section reflect a statistically significant shift toward competitive balance or imbalance.
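The section does not state which test underlies the comparison with the ideal. One standard option, assumed here for illustration, is a chi-square test of a single variance. With \(n = 30\) teams, the statistic \((n-1)s^2/\sigma_0^2\) is compared with a chi-square distribution with \(n - 1 = 29\) degrees of freedom. Here \(s\) is the season's standard deviation and \(\sigma_0 \approx 0.0393\) is the ideal. Because \(s/\sigma_0\) is exactly SD_IDEAL_COMPARE, the statistic equals 29 times the square of that column. This test treats win/loss records as approximately normal. The critical value is hard-coded from a standard chi-square table to keep the sketch dependency-free, and the names are hypothetical.

```python
import math

N_TEAMS = 30  # MLB clubs per season during 2002-2022
IDEAL_SD = 0.5 / math.sqrt(162)
# Upper 5% critical value of chi-square with 29 degrees of freedom (standard table).
CHI2_CRIT_DF29 = 42.557


def chi2_variance_stat(season_sd: float, ideal_sd: float = IDEAL_SD, n: int = N_TEAMS) -> float:
    """(n - 1) * s^2 / sigma0^2 for a one-sample test of variance."""
    return (n - 1) * (season_sd / ideal_sd) ** 2


# The smallest and largest SD_WL_SEASON values in the table (2007 and 2019).
for season, sd in [(2007, 0.0572080), (2019, 0.0979602)]:
    stat = chi2_variance_stat(sd)
    verdict = "exceeds ideal" if stat > CHI2_CRIT_DF29 else "not significant"
    print(season, round(stat, 1), verdict)
```

Even the smallest ratio in the table, about 1.456 in 2007, gives a statistic of roughly 61.5, well above 42.557. Under this test, every season in the table therefore differs significantly from the ideal.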

The two plots below compare each season’s standard deviation of win/loss records with that of the prior season and with that of the first season in the study (2002). A point highlighted in red in either plot indicates a statistically significant change in the standard deviation, either toward or away from competitive balance.
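The VAR_FTEST columns can be sketched with an F-test for equal variances. The version below makes assumptions about the setup rather than reproducing the author's code:

- it uses 30 teams per season, giving 29 degrees of freedom in each sample;
- it uses a two-sided p-value, taken as twice the smaller tail;
- it builds the F distribution in pure Python from the regularized incomplete beta function, evaluated by a standard continued fraction (modified Lentz method), so it needs no third-party library.

A score above 0.95 corresponds to \(p < 0.05\). All function names are hypothetical.

```python
import math

_TINY = 1e-300  # guards against division by zero in the continued fraction
_EPS = 3e-14    # convergence tolerance


def _clamp(value: float) -> float:
    """Replace values too close to zero with a tiny number (modified Lentz)."""
    return value if abs(value) > _TINY else _TINY


def _beta_cf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    c = 1.0
    d = 1.0 / _clamp(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < _EPS:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction directly where it converges quickly,
    # otherwise apply the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_cdf(f: float, d1: int, d2: int) -> float:
    """P(F <= f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return _betai(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def var_ftest_score(sd_current: float, sd_base: float,
                    n_current: int = 30, n_base: int = 30) -> float:
    """1 minus the two-sided p-value of an F-test of equal variances."""
    f_stat = (sd_current / sd_base) ** 2
    lower = f_cdf(f_stat, n_current - 1, n_base - 1)
    p_value = min(1.0, 2.0 * min(lower, 1.0 - lower))
    return 1.0 - p_value


# 2003 vs 2002 and 2007 vs 2002, using SD_WL_SEASON from the table.
print(round(var_ftest_score(0.0826037, 0.0914709), 4))
print(round(var_ftest_score(0.0572080, 0.0914709), 4))
```

For comparison, the table reports VAR_FTEST_S1 values of 0.4135051 for 2003 and 0.9861932 for 2007. If the assumptions above match the original analysis, the sketch should land close to those values.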

As the figures above show, with 2002 as the base year, there is no clear trend in the MLB toward or away from its historical level of competitive balance. The only seasons with a statistically significant shift from the base year at \(\alpha = 0.05\) were 2006, 2007, and 2014, and in each case the shift was toward greater competitive balance.
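The seasons named above can be read directly from the table. The sketch below copies VAR_FTEST_S1 and SD_S1_COMPARE for 2003 through 2022 and flags a season as significant when VAR_FTEST_S1 exceeds \(1 - \alpha = 0.95\). It labels the direction by whether the season's standard deviation is below 2002's (toward balance) or above it (away from balance). The dictionary layout and function name are hypothetical.

```python
# (VAR_FTEST_S1, SD_S1_COMPARE) per season, copied from the table; 2020 is absent.
BASE_YEAR_TESTS = {
    2003: (0.4135051, 0.9030607), 2004: (0.3830730, 0.9105024),
    2005: (0.9056005, 0.7291802), 2006: (0.9578418, 0.6803422),
    2007: (0.9861932, 0.6254230), 2008: (0.8803274, 0.7456340),
    2009: (0.8347903, 0.7699848), 2010: (0.8851197, 0.7427179),
    2011: (0.8326710, 0.7710008), 2012: (0.7532956, 0.8042746),
    2013: (0.6933717, 0.8252538), 2014: (0.9773792, 0.6482581),
    2015: (0.9362904, 0.7041474), 2016: (0.9133920, 0.7234719),
    2017: (0.8160036, 0.7787107), 2018: (0.0516462, 0.9879408),
    2019: (0.2854810, 1.0709443), 2021: (0.1040089, 0.9758106),
    2022: (0.0434571, 1.0102589),
}
ALPHA = 0.05


def significant_shifts(tests: dict[int, tuple[float, float]],
                       alpha: float = ALPHA) -> dict[int, str]:
    """Seasons whose 1 - p score exceeds 1 - alpha, with the direction of the shift."""
    return {
        season: "toward balance" if sd_ratio < 1.0 else "away from balance"
        for season, (score, sd_ratio) in sorted(tests.items())
        if score > 1.0 - alpha
    }


print(significant_shifts(BASE_YEAR_TESTS))
# {2006: 'toward balance', 2007: 'toward balance', 2014: 'toward balance'}
```

Applying the same threshold to the VAR_FTEST_PRIOR column flags no season, since its largest value is 0.8004063 (2014).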

Conclusions and Suggested Follow-ups

Using MLB data from 2002 through 2022 (excluding 2020), this paper finds that the MLB is not competitively balanced relative to an ideal standard deviation of win/loss records of \(0.0393\). However, the competitive balance of the MLB has not changed significantly year-over-year since 2002: no comparison of a season's variance with the prior season's was significant at \(\alpha = 0.05\). Furthermore, when each season's distribution of win/loss records was compared with 2002, the statistically significant shifts were only ever in the direction of greater competitive balance.

Further research on this topic could examine Minor League Baseball (MiLB), since a share of MLB players begin their careers in the minor leagues. Beyond measuring the share of players who come from MiLB teams, such a study could help explain the lagged effects of competitive balance or imbalance in the minor leagues on the MLB.

References

Baseball-Reference. (2022). Major League Baseball standings [Dataset]. Retrieved from https://www.baseball-reference.com/leagues/majors/2022-standings.shtml

Rotthoff, K. (n.d.). Sports Econ Week 5 part 5 - Ideal Balance [Lecture recording]. Canvas Learning Management System. Retrieved from https://login.bc.edu/nidp/idff/sso?id=4&sid=0&option=credential&sid=0&target=https%3A%2F%2Fservices.bc.edu%2Fcommoncore%2Fmyservices.do

Rotthoff, K. (n.d.). Sports Econ Week 5 part 6 - Ideal Applications [Lecture recording]. Canvas Learning Management System. Retrieved from https://login.bc.edu/nidp/idff/sso?id=4&sid=0&option=credential&sid=0&target=https%3A%2F%2Fservices.bc.edu%2Fcommoncore%2Fmyservices.do