As a Chicago White Sox fan in 2025, it is hard to get excited about upcoming baseball seasons, especially when teams like the Los Angeles Dodgers and New York Mets have owners who want to win. The White Sox have had seven winning seasons since 2005, which is embarrassing for a team in a top-4 market. Chicago has a passionate fan base, yet the owner refuses to invest in the team and give the city a winner. The White Sox have never signed a free agent for more than $100 million. Meanwhile, the Dodgers throw everything they have at the best player on the market, most recently Shohei Ohtani. As a fan of a team whose owner refuses to spend, I wanted to see how a salary cap would affect teams like the Dodgers and New York Yankees, and whether it would let teams like the White Sox and Pittsburgh Pirates compete with them.
Loading in Data for Analysis
library(ggplot2)
library(tidyverse)
library(rvest)
library(dplyr)
library(ggrepel)
library(scales)
New names:
• `` -> `...1`
Rows: 420 Columns: 21
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (3): Team, Team_Name, Postseason
dbl (18): ...1, Year, Average_Age, Total_Payroll_Allocations, Active_26_Man,...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
To answer this question, I found a data set on Kaggle with every MLB payroll dating back to 2011. With payrolls going up and players making more money every year, I wanted to compare how much owners spend with how much success their teams have. Right now, a lot of teams sit below a $200 million payroll, while a couple of standout teams like the Dodgers and Yankees are above $260 million. I added a cost-per-win column and a hypothetical $250 million cap that no team could spend over. I then added two more columns: each team's estimated wins after the cap and how many wins the cap would take away.
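The code that builds these columns isn't shown here, so this is a minimal sketch of one way it could be done. The toy rows are made up, `Cost_Per_Win` is a hypothetical column name, and the rate of 0.5 wins per $10 million cut is purely an assumption for illustration, not necessarily the rule behind the graphs in this post. `Payroll_Capped_250M`, `Wins_Lost`, and `Wins_After_Cap` match the column names used in the plots.

```r
library(dplyr)

# Toy rows standing in for the Kaggle data (made-up teams and numbers)
toy_payrolls <- data.frame(
  Team = c("AAA", "BBB", "CCC"),
  Total_Payroll_Allocations = c(300e6, 250e6, 100e6),
  Wins = c(100, 90, 70)
)

cap <- 250e6

# ASSUMPTION: every $10M removed from payroll costs half a win.
# This rate is illustrative only; the real analysis may use another rule.
wins_per_dollar <- 0.5 / 10e6

toy_capped <- toy_payrolls %>%
  mutate(
    # Dollars spent per win (hypothetical column name)
    Cost_Per_Win = Total_Payroll_Allocations / Wins,
    # Payroll after the cap: unchanged below the cap, clipped above it
    Payroll_Capped_250M = pmin(Total_Payroll_Allocations, cap),
    # Only the dollars above the cap translate into lost wins
    Wins_Lost = (Total_Payroll_Allocations - Payroll_Capped_250M) * wins_per_dollar,
    Wins_After_Cap = Wins - Wins_Lost
  )

print(toy_capped)
```

Under these assumptions only the team above the cap loses wins, and teams at or below $250 million are left untouched.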
1. Payroll Distribution by Postseason Result
For this graph, I wanted to show how spending relates to qualifying for the playoffs. It shows that since 2011, teams that spend money tend to win their division or at least qualify for a wildcard spot. The hard part about baseball is that the divisions vary so much. In a division like the AL East, every team spends money and competes for the title. In a division like the AL Central, teams don’t spend, but the division winner still automatically makes the playoffs. That is why the AL Central pulls the minimum payroll for a division winner so low. If more teams had owners who spend, the division winner range would be a lot tighter. The AL East is also a prime example of why the payroll for the wildcard spot is so high. It is arguably the toughest division in baseball, and with the season being so long, a team with a high payroll can lose a few players to injury and fall to second place. The No Playoffs box plot makes a lot of sense: these teams don’t have the players to make a playoff push and get through the 162-game season.
ggplot(MLB_payrolls, aes(x = Postseason, y = Total_Payroll_Allocations)) +
  geom_boxplot(outlier.alpha = 0.6) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Payroll Distribution by Postseason Result",
    x = "Postseason Result",
    y = "Total Payroll Allocations"
  ) +
  theme_minimal()
2. Average Payroll by Postseason Result
Because this data set goes back to 2011, it covers a period when spending looked a lot different. In 2011, players weren’t signing $700 million contracts and driving payrolls up. What hasn’t changed since 2011 is that the teams that spend the most, on average, win the division. This graph should be shown to MLB owners as proof that investing in your team gives you a good ROI: you get more fans in the seats for playoff baseball and more revenue from around the stadium. Again, divisions vary every year, and plenty of teams spend money but fall short of winning the division because other teams in the division outplay them. That puts the Wildcard average a bit behind the Division Winner average. Sometimes teams get screwed by how competitive their division is, and I think that’s why the No Playoffs tier is so high.
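The code for this graph isn't shown, so here is a rough sketch of how an average-payroll bar chart could be built. The toy rows and the three `Postseason` labels are assumptions (the real labels in the Kaggle file may be spelled differently), and `Average_Payroll` is a hypothetical column name.

```r
library(dplyr)
library(ggplot2)

# Toy rows standing in for MLB_payrolls (made-up numbers and labels)
toy_payrolls <- data.frame(
  Postseason = c("Division Winner", "Division Winner", "Wildcard",
                 "No Playoffs", "No Playoffs"),
  Total_Payroll_Allocations = c(240e6, 180e6, 200e6, 120e6, 100e6)
)

# One row per postseason result with its mean payroll
avg_payroll <- toy_payrolls %>%
  group_by(Postseason) %>%
  summarise(Average_Payroll = mean(Total_Payroll_Allocations), .groups = "drop")

# Bar chart of the averages, styled like the other plots in this post
avg_plot <- ggplot(avg_payroll, aes(x = Postseason, y = Average_Payroll)) +
  geom_col() +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Average Payroll by Postseason Result",
    x = "Postseason Result",
    y = "Average Payroll"
  ) +
  theme_minimal()

print(avg_payroll)
```

Swapping `toy_payrolls` for `MLB_payrolls` and printing `avg_plot` would draw the chart from the real data.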
3. Payroll vs Wins
I really like this graph because it shows how money correlates with wins. Spending looked a lot different in 2011, but across the whole period you can still see that as the money goes up, the wins go up. You can also see that as player contracts grow, the teams start to separate in wins. There isn’t a team above $200 million that has had fewer than 60 wins. As players get more expensive, the difference between owners starts to matter, and that is why we see the same teams in the playoffs every year. You are always going to have the typical Cinderella team that makes the playoffs without spending, but the teams that spend will be in the playoffs every year. With no salary cap, I think the gap will only get wider over time because the old-school owners don’t want to pay their players.
ggplot(MLB_payrolls, aes(x = Total_Payroll_Allocations, y = Wins)) +
  geom_point(alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE) +
  scale_x_continuous(labels = dollar) +
  labs(
    title = "Payroll vs Wins",
    x = "Total Payroll",
    y = "Wins"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
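To put a number on the trend line, the same linear fit that `geom_smooth(method = "lm")` draws can be computed directly. This is only a sketch on made-up rows, so the slope and correlation it prints say nothing about the real data; `toy_payrolls` and `slope_per_10m` are hypothetical names.

```r
# Toy rows standing in for MLB_payrolls (made-up numbers)
toy_payrolls <- data.frame(
  Total_Payroll_Allocations = c(100e6, 150e6, 200e6, 250e6, 300e6),
  Wins = c(68, 80, 82, 92, 98)
)

# Pearson correlation between payroll and wins
payroll_win_cor <- cor(toy_payrolls$Total_Payroll_Allocations, toy_payrolls$Wins)

# Same straight-line model that geom_smooth(method = "lm") fits
fit <- lm(Wins ~ Total_Payroll_Allocations, data = toy_payrolls)

# The slope is wins per dollar; rescale to wins per extra $10M of payroll
slope_per_10m <- coef(fit)[["Total_Payroll_Allocations"]] * 10e6

cat("Correlation:", round(payroll_win_cor, 3), "\n")
cat("Extra wins per $10M:", round(slope_per_10m, 2), "\n")
```

Running the same two calls on `MLB_payrolls` would show how strong the relationship is in the actual payroll data.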
4. Payroll vs Wins by Postseason Result
This is like the graph above, but it also shows each team’s postseason result. At the top, the teams that spend tend to win their division. There are exceptions in divisions that don’t typically spend, but on average, the more you spend, the better your chance of winning the division. There is one outlier that spent a lot of money but missed the playoffs. That could be because injuries piled up during the year and threw off the team’s season.
ggplot(MLB_payrolls,
       aes(x = Total_Payroll_Allocations, y = Wins, color = Postseason)) +
  geom_point(size = 2, alpha = 0.8) +
  scale_x_continuous(labels = dollar) +
  labs(
    title = "Payroll vs Wins by Postseason Result",
    x = "Total Payroll",
    y = "Wins"
  ) +
  theme_minimal()
5. Estimated Wins Lost Under a $250M Salary Cap
This graph shows why cheaper owners want a salary cap. The teams that spend would have to move pieces around or restructure contracts to fit under the $250 million cap. If they have to move pieces, their win totals would go down because they would be getting rid of expensive, good players. That would bring more competitive balance to the league and give the smaller teams a better chance at making the playoffs. As a fan of a cheap team, I like this idea because it lets you look forward to the baseball season instead of watching your team lose 100 games a year. It would bring more variety and change up the league quite a bit.
ggplot(MLB_payrolls,
       aes(x = Total_Payroll_Allocations, y = Wins_Lost)) +
  geom_point(alpha = 0.7, size = 2) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray40") +
  geom_text_repel(
    data = MLB_payrolls %>% filter(Wins_Lost > 0),
    aes(label = Team),
    size = 3,
    max.overlaps = 15
  ) +
  scale_x_continuous(labels = dollar) +
  labs(
    title = "Estimated Wins Lost Under a $250M Salary Cap",
    subtitle = "Only teams above the cap lose wins",
    x = "Original Payroll",
    y = "Estimated Wins Lost"
  ) +
  theme_minimal()
6. Estimated Wins Under a $250M Salary Cap
This graph shows how good a salary cap would be for MLB. The cap pulls the higher-spending teams closer to the average teams and adds more variety to the league. It shows me how beneficial this would be, and with the season already being extremely long, it would add more stakes to every game because everyone is in the same boat. Fans of teams with cheaper owners would also stay interested later into the season because their teams would actually have a chance at competing in the postseason.
ggplot(MLB_payrolls,
       aes(x = Payroll_Capped_250M, y = Wins_After_Cap)) +
  geom_point(size = 2, alpha = 0.7, color = "steelblue") +
  geom_vline(xintercept = 250000000, linetype = "dashed", color = "gray40") +
  scale_x_continuous(labels = dollar) +
  labs(
    title = "Estimated Wins Under a $250M Salary Cap",
    subtitle = "High-spending teams are constrained to the cap",
    x = "Payroll After Cap",
    y = "Estimated Wins"
  ) +
  theme_minimal(base_size = 12)
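One rough way to check whether the cap really pulls teams closer together is to compare the spread of wins before and after it. The sketch below uses the standard deviation of wins as the balance measure, which is my own choice of yardstick, and the three rows are made up. `Wins` and `Wins_After_Cap` are the column names from the plots; everything else is hypothetical.

```r
# Toy rows standing in for MLB_payrolls (made-up numbers)
toy_capped <- data.frame(
  Team = c("AAA", "BBB", "CCC"),
  Wins = c(100, 90, 70),
  Wins_After_Cap = c(97.5, 90, 70)
)

# Standard deviation of wins: smaller means teams are bunched closer together
spread_before <- sd(toy_capped$Wins)
spread_after <- sd(toy_capped$Wins_After_Cap)

cat("Spread of wins before cap:", round(spread_before, 2), "\n")
cat("Spread of wins after cap: ", round(spread_after, 2), "\n")
```

If the spread shrinks on the real data, that would back up the idea that a cap tightens the league. If it barely moves, the cap alone is not doing much for balance.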
Part 2: Do the Top Teams Have the Top Players?
To figure this out, I scraped Baseball Reference to get the top players of the 2024 season to align with the Kaggle data set above. I am using this data to get the baseball stat WAR. WAR stands for Wins Above Replacement and basically shows how many wins a player contributes to his team.
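The scraping code isn't shown, so here is a small sketch of the general `rvest` pattern for pulling a table out of a page. To keep it runnable offline, it parses an inline HTML snippet instead of the real site. The table id `toy_war`, the player names, and the numbers are all made up, and the actual Baseball Reference page layout may differ.

```r
library(rvest)

# Inline HTML standing in for a downloaded page (made-up players and values).
# With a live page you would pass the page URL to read_html() instead.
page <- minimal_html('
  <table id="toy_war">
    <tr><th>Player</th><th>Team</th><th>WAR</th></tr>
    <tr><td>Player A</td><td>LAD</td><td>9.2</td></tr>
    <tr><td>Player B</td><td>NYY</td><td>8.1</td></tr>
    <tr><td>Player C</td><td>HOU</td><td>6.5</td></tr>
  </table>
')

# Select the table by its (hypothetical) id and convert it to a data frame
war_table <- page %>%
  html_element("table#toy_war") %>%
  html_table()

print(war_table)
```

`html_table()` converts numeric-looking columns automatically, so `WAR` comes back as a number that is ready for sorting and plotting.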
New names:
• `` -> `...1`
Rows: 27 Columns: 3
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Team
dbl (2): ...1, Players_in_Top50
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
New names:
• `` -> `...1`
Rows: 10 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Player, Team
dbl (3): ...1, WAR, G
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
1. Number of Players in Top 50 WAR by Team (2024)
This graph shows that the teams that spend have the best players. WAR is widely considered the best single stat for judging how good a player is. In 2024, the Dodgers and Astros were two teams with a payroll above $250 million, and both had four players in the top 50. Other notable teams here are the Padres, Yankees, and Mets, who each have multiple players on the list. This is just another reason why the teams that spend are really good teams.
ggplot(team_top50_count,
       aes(x = reorder(Team, Players_in_Top50), y = Players_in_Top50)) +
  geom_col(fill = "#1f77b4") +
  coord_flip() +
  labs(
    title = "Number of Players in Top 50 WAR by Team (2024)",
    x = "",
    y = "Players in Top 50 WAR"
  ) +
  theme_minimal()
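A table like `team_top50_count` could be built from a player-level WAR table roughly as follows. This is a sketch on six made-up players with a cutoff of 4 instead of 50 so the result is easy to check by hand. `toy_war` and `cutoff` are hypothetical names, and ties at the cutoff are simply dropped here.

```r
library(dplyr)

# Toy player table (made-up players and WAR values)
toy_war <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D", "Player E", "Player F"),
  Team = c("LAD", "LAD", "HOU", "NYY", "HOU", "CHW"),
  WAR = c(9, 8, 7, 6, 5, 1)
)

cutoff <- 4  # would be 50 for the real top-50 list

team_top_count <- toy_war %>%
  # Keep the `cutoff` best players by WAR
  slice_max(WAR, n = cutoff, with_ties = FALSE) %>%
  # Count how many of them each team has
  count(Team, name = "Players_in_Top50") %>%
  arrange(desc(Players_in_Top50))

print(team_top_count)
```

Teams with nobody inside the cutoff drop out of the result entirely, which would explain a count table with fewer than 30 rows.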
2. Top Dodgers Players by WAR (2024)
In 2024, the Dodgers had four MVP candidates on their team alone. These players all play most of the games in a season and are a big part of why the Dodgers have so much success. Shohei Ohtani is probably the greatest player ever, and the $700 million they paid him has been worth the investment. The Dodgers are the most successful franchise in baseball right now and are not afraid to pay to improve their team. This mentality widens the gap every year and leaves the smaller-market teams in the dust.
ggplot(dodgers_top,
       aes(x = reorder(Player, WAR), y = WAR)) +
  geom_col(fill = "#005A9C") +
  coord_flip() +
  labs(
    title = "Top Dodgers Players by WAR (2024)",
    subtitle = "Batting WAR — Baseball-Reference",
    x = "",
    y = "WAR"
  ) +
  theme_minimal()
Conclusion
In conclusion, a salary cap is the only thing at this point that can save the smaller teams in MLB. The gap between teams keeps getting wider because the owners at the top aren’t afraid to spend money or trade for players. Meanwhile, the teams at the bottom don’t want to pay for players and just trade away their good ones once they want more money.
As a White Sox fan in 2025, I have watched this team produce a lot of talent, but because of our cheap owner, we end up trading away those good players for younger talent, and it becomes an endless cycle. This team knows how to get your excitement up for the season and then break your heart in the end.
I think a salary cap would be beneficial because it would create more even competition for everyone and make the league more entertaining. The season is already long, and adding more competitive balance would increase viewership and keep fans interested in the season longer than they are now.