One of MLB's most prominent issues right now is the payroll gap. It could cause a lockout next year: a new collective bargaining agreement (CBA) is up for negotiation, and the owners may push for a salary cap, which the players would oppose. The issue is that big-market teams are willing to pay more for the biggest stars, and there is no hard limit on how much they can pay players. Once the best players on small-market teams hit free agency, the big-market teams sign them because they can outbid the smaller markets. The payroll gap appears to cause a competitive imbalance in MLB; however, some low-payroll teams still find a way to compete each year. So is the payroll gap that big of a problem?
The data I use to start comes from a Kaggle dataset created by Christopher Treasure. It covers the payrolls of MLB teams from 2011 to 2024, including how much of each payroll was injured, buried, retained, and on the active 26-man roster. It also includes the average age of each team. Finally, it has each team's record and whether the team won its division, made the playoffs as a wild card, or missed the playoffs. The first rows of the dataset look like this:
  Team Team.Name           Year Average.Age Total.Payroll.Allocations Active.26.Man  Injured Retained   Buried Wins Losses Postseason
1 OAK  Oakland Athletics   2024        26.5                  62132581      28956713 15581092 15557073  1763221   69     93 No Playoffs
2 PIT  Pittsburgh Pirates  2024        27.7                  84050989      51220210 14524211 15341351  2965217   76     86 No Playoffs
3 TB   Tampa Bay Rays      2024        26.8                  89707422      37691876 13179262 34675167  1706572   80     82 No Playoffs
4 DET  Detroit Tigers      2024        26.0                  96961614      33226992 26677166 36920494  1070295   86     76 Wildcard
5 CLE  Cleveland Guardians 2024        26.3                 105224582      50885032 21120833 22945837 10272880   92     69 Division Winner
6 BAL  Baltimore Orioles   2024        28.4                 109335494      65994548 10808540 30650082  1882324   91     71 Wildcard
Analysis
To start, I wanted to introduce the problem visually, using boxplots to show how the payroll gap widens over the years included in the dataset:
library(dplyr)
library(ggplot2)

# One box per season, showing the spread of total payroll across all teams
mlb_payroll %>%
  ggplot(aes(x = as.factor(Year), y = Total.Payroll.Allocations)) +
  geom_boxplot() +
  labs(title = "Total Payroll Across The MLB By Year", x = "Year", y = "Payroll")
As you can see, the gap widens over time. 2020 is an outlier because it was the COVID-19 season, which was shortened to 60 games. In the other years, the minimum stays at roughly the same level while the maximum grows. You can also see more teams joining the top tier of spending as the high outliers disappear: the biggest spenders are no longer alone at the top. This pulls the league average up over time, which could distract from the issue, so the range is what to focus on. More teams spending more can be seen as both good and bad for MLB. It looks good from the perspective that more teams want to compete and are willing to spend money to do it. It is bad, however, because low-payroll teams now have to compete with even more star-powered teams each year.
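To put a number on the range instead of reading it off the boxplots, a per-season summary of the lowest payroll, the highest payroll, and the gap between them could be computed. The sketch below is a hypothetical base R helper (`payroll_spread` is my own name, not part of the dataset); it assumes a data frame with the `Year` and `Total.Payroll.Allocations` columns shown above. The toy values are made up only to show the shape of the output.

```r
# Hypothetical helper: per-season payroll spread.
# Assumes columns Year and Total.Payroll.Allocations, as in mlb_payroll.
payroll_spread <- function(df) {
  # aggregate() returns one row per Year, sorted by Year
  mins <- aggregate(Total.Payroll.Allocations ~ Year, data = df, FUN = min)
  maxs <- aggregate(Total.Payroll.Allocations ~ Year, data = df, FUN = max)
  out <- data.frame(
    Year = mins$Year,
    Min  = mins$Total.Payroll.Allocations,
    Max  = maxs$Total.Payroll.Allocations
  )
  out$Range <- out$Max - out$Min  # dollar gap between top and bottom payroll
  out$Ratio <- out$Max / out$Min  # how many times larger the top payroll is
  out
}

# Made-up toy values, only to illustrate the output
toy <- data.frame(
  Year = c(2011, 2011, 2012, 2012),
  Total.Payroll.Allocations = c(50, 200, 55, 300)
)
payroll_spread(toy)
# With the real data: payroll_spread(mlb_payroll)
```

Run on `mlb_payroll`, a growing `Range` and `Ratio` across the seasons (2020 aside) would support the widening gap seen in the boxplots.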
My next visual shows the average payroll for each team from 2011 to 2024:
# Average total payroll per team across all seasons, sorted from lowest to highest
mlb_payroll %>%
  group_by(Team) %>%
  summarise(Average.Payroll = mean(Total.Payroll.Allocations)) %>%
  ggplot(aes(x = reorder(Team, Average.Payroll), y = Average.Payroll)) +
  geom_col() +
  coord_flip() +
  labs(title = "Average Payroll for Teams from 2011-2024", y = "Average Payroll", x = "Team")
The New York Yankees and Los Angeles Dodgers run away with it: they have by far the two highest payrolls, which is what I expected. Each of their average payrolls is close to double that of half the teams in the league and close to triple that of one or two teams at the bottom. Because these are averages over 14 years rather than a single season, it is even more striking that the Dodgers and Yankees are that far ahead.
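The "close to double" and "close to triple" comparisons can be checked directly by dividing each team's average payroll by the median team's average payroll. This is a hypothetical base R helper (`payroll_multiples` is an illustrative name), assuming the `Team` and `Total.Payroll.Allocations` columns from the dataset; a value near 2 would mean a team averaged about twice the median team's payroll. The toy values are made up.

```r
# Hypothetical helper: each team's average payroll as a multiple of the
# median team's average payroll. Assumes Team and Total.Payroll.Allocations.
payroll_multiples <- function(df) {
  # Named numeric vector: one average payroll per team
  avg <- sapply(split(df$Total.Payroll.Allocations, df$Team), mean)
  sort(avg / median(avg), decreasing = TRUE)
}

# Made-up toy values: A averages 300, B 150, C 100
toy <- data.frame(
  Team = c("A", "A", "B", "B", "C", "C"),
  Total.Payroll.Allocations = c(280, 320, 150, 150, 90, 110)
)
payroll_multiples(toy)
# With the real data: payroll_multiples(mlb_payroll)
```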
Next, I compare each team's average payroll over the 14 years to its average win total:
# One point per team: 14-season average wins vs. 14-season average payroll
mlb_payroll %>%
  group_by(Team) %>%
  summarise(Average.Payroll = mean(Total.Payroll.Allocations),
            Average.Wins = mean(Wins)) %>%
  ggplot(aes(x = Average.Wins, y = Average.Payroll)) +
  geom_point() +
  labs(title = "Average Payroll for Teams from 2011-2024 Compared to their Average Wins",
       y = "Average Payroll", x = "Average Win Totals")
The graph does not show a strong correlation. Some teams have regular-season success without spending much on payroll, and plenty of teams win more than teams with higher payrolls. In general, however, the relationship looks positive and roughly linear. The amount of scatter suggests that factors other than payroll also contribute to a team's regular-season success.
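The strength of the relationship can be put into numbers with a correlation coefficient and a simple linear fit on the team averages. Below is a minimal base R sketch, assuming the `Team`, `Total.Payroll.Allocations`, and `Wins` columns; `team_averages` and `payroll_wins_fit` are hypothetical names. The example input is only the six 2024 rows from the table above, which is far too small a sample to say anything about the league; it just shows the mechanics.

```r
# Sketch: quantify the payroll/wins relationship in the scatter plot.
# Assumes Team, Total.Payroll.Allocations and Wins columns.
team_averages <- function(df) {
  teams <- split(df, df$Team)
  data.frame(
    Team            = names(teams),
    Average.Payroll = sapply(teams, function(t) mean(t$Total.Payroll.Allocations)),
    Average.Wins    = sapply(teams, function(t) mean(t$Wins))
  )
}

payroll_wins_fit <- function(avgs) {
  # Payroll in millions so the slope reads as "extra wins per $1M"
  fit <- lm(Average.Wins ~ I(Average.Payroll / 1e6), data = avgs)
  list(
    r                = cor(avgs$Average.Payroll, avgs$Average.Wins),  # Pearson r
    r_squared        = summary(fit)$r.squared,
    wins_per_million = unname(coef(fit)[2])
  )
}

# Example input: the six 2024 rows from the table above
rows_2024 <- data.frame(
  Team = c("OAK", "PIT", "TB", "DET", "CLE", "BAL"),
  Total.Payroll.Allocations = c(62132581, 84050989, 89707422,
                                96961614, 105224582, 109335494),
  Wins = c(69, 76, 80, 86, 92, 91)
)
payroll_wins_fit(team_averages(rows_2024))
# With the real data: payroll_wins_fit(team_averages(mlb_payroll))
```

For a simple linear fit, R² equals the squared correlation, so either number can be reported; the slope in wins per million dollars is often the easier one to explain.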
One of those factors is injuries, so the next visualization compares the average percentage of payroll spent on injured players with the average win total over the 14-year period.
# One point per team: share of payroll on injured players vs. average wins
mlb_payroll %>%
  group_by(Team) %>%
  summarise(Average.Payroll = mean(Total.Payroll.Allocations),
            Average.Injury.Payroll = mean(Injured, na.rm = TRUE),
            Average.Wins = mean(Wins)) %>%
  ggplot(aes(x = Average.Wins, y = Average.Injury.Payroll / Average.Payroll)) +
  geom_point() +
  scale_y_continuous(labels = scales::percent) +  # the ratio is a proportion; label it as a percent
  labs(title = "Average Percent of the Payroll Injured for Teams from 2011-2024 Compared to their Average Wins",
       y = "Average Percent of the Payroll Injured", x = "Average Win Totals")
# Same injured share, one bar per team, sorted from lowest to highest
mlb_payroll %>%
  group_by(Team) %>%
  summarise(Average.Payroll = mean(Total.Payroll.Allocations),
            Average.Injury.Payroll = mean(Injured, na.rm = TRUE)) %>%
  mutate(Injury.Share = Average.Injury.Payroll / Average.Payroll) %>%
  ggplot(aes(x = reorder(Team, Injury.Share), y = Injury.Share)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Average Percent of the Payroll Injured for Teams from 2011-2024",
       y = "Average Percent of the Payroll Injured", x = "Team")
I also included a visual that matches each team to its average percentage of payroll injured. Apart from a couple of outliers, the scatter plot shows a negative, roughly linear trend: teams that lose more of their payroll to injuries tend to win less. The strength of that relationship looks similar to the one in the average-payroll graph, so both variables seem to have a similar impact on regular-season success for MLB teams.
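One detail worth checking is how the "average percent of payroll injured" is defined. The plotting code above divides the average injured payroll by the average total payroll (a ratio of averages), which gives high-payroll seasons more weight. Averaging each season's injured share instead counts every season equally. The sketch below computes both so they can be compared; `injury_share` is a hypothetical helper assuming the `Team`, `Injured`, `Total.Payroll.Allocations`, and `Wins` columns, and the toy values are made up.

```r
# Sketch: two ways to define a team's "average percent of payroll injured".
# Assumes Team, Injured, Total.Payroll.Allocations and Wins columns.
injury_share <- function(df) {
  teams <- split(df, df$Team)
  data.frame(
    Team = names(teams),
    # Ratio of averages: what the plotting code above computes
    Ratio.Of.Means = sapply(teams, function(t)
      mean(t$Injured, na.rm = TRUE) / mean(t$Total.Payroll.Allocations)),
    # Average of each season's injured share: every season counts equally
    Mean.Of.Ratios = sapply(teams, function(t)
      mean(t$Injured / t$Total.Payroll.Allocations, na.rm = TRUE)),
    Average.Wins = sapply(teams, function(t) mean(t$Wins))
  )
}

# Made-up toy team: half its payroll injured one season, none the next
toy <- data.frame(
  Team = c("A", "A"),
  Injured = c(50, 0),
  Total.Payroll.Allocations = c(100, 300),
  Wins = c(70, 90)
)
injury_share(toy)
# With the real data:
# shares <- injury_share(mlb_payroll)
# cor(shares$Mean.Of.Ratios, shares$Average.Wins)
```

In the toy team the two definitions disagree (12.5% vs. 25%) because the injury-heavy season had the smaller payroll.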
Next, I look at the average payroll of division winners, wild-card teams, and teams that missed the playoffs.
# Average payroll for each postseason outcome (one bar per category)
mlb_payroll %>%
  group_by(Postseason) %>%
  summarise(Average.Payroll = mean(Total.Payroll.Allocations)) %>%
  ggplot(aes(x = Postseason, y = Average.Payroll)) +
  geom_col() +
  labs(title = "Average Payroll for Teams by Whether They Made The Postseason from 2011-2024",
       y = "Average Payroll", x = "Whether a Team Made it to the Postseason or not")
There is not a huge difference between the columns. One explanation is divisional makeup: a couple of divisions have no big spenders, yet one of their teams still has to win the division, which pulls the division-winner average down. A high-payroll team having a rough, injury-filled year could also sway the chart, though probably not by much. Looking at postseason success as a whole, rather than just at who got in, would be more informative.
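Since a single high-payroll team with a bad year can pull a group average around, comparing the median alongside the mean is one way to check how much outliers matter. Here is a minimal base R sketch, assuming the `Postseason` and `Total.Payroll.Allocations` columns (`postseason_payroll` is a hypothetical name). The example input reuses the six 2024 rows from the table above, so it only illustrates the calculation.

```r
# Sketch: mean vs. median payroll for each postseason outcome.
# Assumes Postseason and Total.Payroll.Allocations columns.
postseason_payroll <- function(df) {
  groups <- split(df$Total.Payroll.Allocations, df$Postseason)
  data.frame(
    Postseason = names(groups),
    N          = sapply(groups, length),  # team-seasons in each group
    Mean       = sapply(groups, mean),
    Median     = sapply(groups, median)   # less sensitive to one outlier season
  )
}

# Example input: the six 2024 rows from the table above
rows_2024 <- data.frame(
  Postseason = c("No Playoffs", "No Playoffs", "No Playoffs",
                 "Wildcard", "Division Winner", "Wildcard"),
  Total.Payroll.Allocations = c(62132581, 84050989, 89707422,
                                96961614, 105224582, 109335494)
)
postseason_payroll(rows_2024)
# With the real data: postseason_payroll(mlb_payroll)
```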
Supplemental Data
I used another data source to visualize postseason success. I scraped data from StatMuse on the postseason records of MLB teams since 2010 to compare them with the payroll data. I made two visualizations: one ranks teams by postseason wins and the other by postseason winning percentage. I only included teams that played at least 10 postseason games over that period.
# Postseason records scraped from StatMuse
playoff <- read.csv("playoff_df.csv")

# Postseason winning percentage (W.) for teams with at least 10 postseason games
playoff %>%
  group_by(TEAM) %>%
  filter(G >= 10) %>%
  ggplot(aes(x = reorder(TEAM, W.), y = W.)) +
  geom_col() +
  coord_flip() +
  labs(title = "Postseason winning percentage of MLB teams since 2010",
       y = "Postseason winning percentage", x = "Team")

# Total postseason games won, same 10-game minimum
playoff %>%
  group_by(TEAM) %>%
  filter(G >= 10) %>%
  ggplot(aes(x = reorder(TEAM, W), y = W)) +
  geom_col() +
  coord_flip() +
  labs(title = "Postseason games won of MLB teams since 2010",
       y = "Postseason games won", x = "Team")

# Average payroll chart from earlier, repeated for side-by-side comparison
mlb_payroll %>%
  group_by(Team) %>%
  summarise(Average.Payroll = mean(Total.Payroll.Allocations)) %>%
  ggplot(aes(x = reorder(Team, Average.Payroll), y = Average.Payroll)) +
  geom_col() +
  coord_flip() +
  labs(title = "Average Payroll for Teams from 2011-2024", y = "Average Payroll", x = "Team")
The teams with the most playoff success, such as the Dodgers, Red Sox, Astros, Giants, Phillies, and Rangers, are near the top of the average-payroll list as well. Teams lower on the postseason lists, such as the Marlins, Rays, Pirates, Orioles, and Brewers, sit near the bottom of the payroll list and have not had much playoff success. There are a couple of outliers, like the Royals, who are in the middle of the payroll list but have the best postseason winning percentage and a mid-tier number of postseason wins. This suggests to me that they spend money occasionally and have been able to make a couple of postseason runs.
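The comparison above is done by eye across three charts. One way to summarise it in a single number is Spearman's rank correlation between each team's average payroll and its postseason results; it compares only the orderings, which fits a "top of one list, top of the other" argument. This is my substitution, not the method used above. The sketch assumes that the payroll data's `Team` codes match the playoff data's `TEAM` values (in practice a lookup table may be needed) and that the playoff file has the `G`, `W`, and `W.` columns used in the plotting code; `payroll_vs_postseason` is a hypothetical name and the toy data is made up.

```r
# Sketch: rank correlation between average payroll and postseason results.
# Assumptions: payroll_df has Team and Total.Payroll.Allocations; playoff_df
# has TEAM, G, W and W. columns; both use the same team abbreviations.
payroll_vs_postseason <- function(payroll_df, playoff_df, min_games = 10) {
  avg <- aggregate(Total.Payroll.Allocations ~ Team, data = payroll_df, FUN = mean)
  names(avg)[2] <- "Average.Payroll"
  # Keep teams with at least min_games postseason games
  po <- playoff_df[playoff_df$G >= min_games, c("TEAM", "W", "W.")]
  # Inner join: teams missing from either table are dropped
  merged <- merge(avg, po, by.x = "Team", by.y = "TEAM")
  list(
    n_teams     = nrow(merged),
    rho_win_pct = cor(merged$Average.Payroll, merged$W., method = "spearman"),
    rho_wins    = cor(merged$Average.Payroll, merged$W, method = "spearman")
  )
}

# Made-up toy data: payroll ranks line up exactly with postseason results
toy_pay <- data.frame(Team = c("A", "A", "B", "C", "D"),
                      Total.Payroll.Allocations = c(100, 200, 120, 90, 80))
toy_po <- data.frame(TEAM = c("A", "B", "C", "D", "E"),
                     G  = c(30, 20, 15, 5, 12),
                     W  = c(18, 11, 7, 2, 6),
                     W. = c(0.600, 0.550, 0.467, 0.400, 0.500))
payroll_vs_postseason(toy_pay, toy_po)
# With the real data: payroll_vs_postseason(mlb_payroll, playoff)
```

Teams with fewer than `min_games` postseason games, or with no match in the payroll data, are dropped before the correlation is computed, so `n_teams` is worth checking.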
Conclusion
The payroll gap does affect the competitiveness of MLB. Some teams manage to spend little and still have regular-season success, but most teams that do not spend do not succeed. In the postseason, however, the payroll gap seems to catch up with some of these teams, and they are unable to compete with teams that spend more. The payroll gap is a problem that needs to be fixed because it keeps getting worse over time: the big-market teams keep spending more, while the small-market teams are content with what they currently spend on payroll.