This study examines the efficiency of Major League Baseball (MLB) player contracts by analyzing the relationship between contract value, contract length, and player performance. With an emphasis on positional value and broader market dynamics, I use a proprietary 2025 dataset of active MLB player contracts. The analysis tests whether player compensation, in both total value and contract duration, reflects on-field contributions as measured by Wins Above Replacement (WAR), and whether certain positions are systematically over- or under-valued by the market. Using descriptive statistics, visualizations, and regression analysis, I aim to identify and explain persistent market inefficiencies in player compensation. By benchmarking actual contract outcomes against positional and performance data, the study contributes to the sports economics literature and offers practical guidance for MLB front offices on roster construction and financial strategy.
The sport of baseball has evolved far beyond a simple game into a complex, multi-billion dollar enterprise where the strategic allocation of financial resources is arguably as critical to success as on-field talent. In this high-stakes environment, player valuation has become the central and most intricate challenge facing every franchise. While a revolution in sports analytics has equipped front offices with increasingly sophisticated metrics to quantify individual player contributions, a fundamental question remains: does the professional baseball labor market consistently and accurately translate a player’s performance into commensurate financial compensation? I treat the sport as an economic ecosystem subject to market forces, information asymmetry, and behavioral biases.
The shift toward a data-informed approach to roster management has led to an era of heightened scrutiny over every payroll investment. Teams are now under immense pressure to maximize every dollar spent, making the pursuit of “value” a defining principle of front-office strategy. Yet, despite these advancements, teams continue to grapple with a high degree of uncertainty, forced to balance competitive aspirations with the unpredictable realities of player health, age-related performance decline, and the inherent risks of long-term contracts. This persistent tension between a desire for analytical precision and the inescapable risks of multi-year deals has transformed contract negotiations in MLB into a dynamic, and often controversial, arena where a single misstep can have a lasting impact on a franchise’s fortunes.
My thesis aims to unravel this complex dynamic by thoroughly investigating the relationship between player compensation and performance, with a particular focus on how positional value and the structure of a player’s contract—specifically its length—influence the overall efficiency of MLB’s labor market. The core research question guiding this study is:
Are MLB contracts efficient with respect to player WAR and positional value, or do systematic inefficiencies persist in the market?
Major League Baseball has a unique labor market defined by decades of evolution in free agency, salary arbitration, and competitive balance measures. The structure of contracts, the influence of collective bargaining, and the rise of analytics have all shaped how teams allocate payroll and value player contributions.
In the 1970s, the introduction of free agency dramatically altered the player-team power balance. Players gained the right to negotiate with any team after six years of service time, resulting in a surge of multi-year, multi-million dollar contracts. Salary arbitration, adopted shortly before free agency in the same decade, gave younger players a mechanism to secure higher pay based on comparable performance. These changes have made the MLB labor market highly competitive, yet also subject to inefficiencies driven by market perceptions, negotiation leverage, and evolving metrics of value.
Since the 1990s, the luxury tax has served as a soft cap on payrolls, impacting contract strategies for large- and small-market teams alike. Wealthy franchises can absorb penalties to sign high-profile talent, while others must be more creative, seeking bargains or developing prospects. The interplay between payroll constraints, competitive windows, and shifting economic conditions means that the quest for market efficiency is never static.
Hypothesis:
MLB player contract value and length are primarily determined by
performance metrics, notably WAR, and positional value. However, market
size, agent bargaining power, and contract structure result in
systematic deviations from efficiency.
Major League Baseball (MLB) contract valuation, labor market efficiency, and roster optimization have been studied extensively from multiple disciplinary perspectives, including economics, sports analytics, and labor market theory. This literature review provides a thorough overview of 20 key articles, each contributing unique insights into how WAR (Wins Above Replacement), positional value, market size, bargaining power, and agent influence intersect to determine salary and contract structure for MLB players.
Arellano et al. (2016) analyze the impact of salary dispersion on team performance, showing that while top performers often command premiums, compensation is not always efficiently distributed. Their findings indicate that disparities in pay can affect both individual motivation and team outcomes, suggesting that market forces do not always achieve optimal alignment between salary and value. Bierig, Hollenbeck, and Stroud (2017) utilize machine learning to model career progression, finding that WAR is a strong predictor for contract outcomes, but that market anomalies, such as sudden breakouts or slumps, can disrupt expected patterns. Bradbury (2009) examines age curves and peak athletic performance, demonstrating that long-term contracts can be risky investments, as players’ production often declines before contracts expire.
Brown and Jepsen (2009) explore the effects of multitasking and versatility, concluding that players who fulfill multiple roles are not always compensated proportionally. This inefficiency highlights the limitations of traditional contract models and the need for more nuanced metrics. Carruth and Jensen (2007) focus on defensive skills, specifically throwing ability, and reveal that these contributions are frequently under-compensated compared to offensive output, despite their importance to team success.
Depken and Wilson (2004) investigate salary arbitration, finding that while it increases salaries for eligible players, arbitration does not necessarily improve market efficiency or ensure compensation is proportional to WAR. Ehrlich et al. (2021) demonstrate that offensive output, particularly home runs and RBIs, can lead to salary premiums that are not always justified by WAR, especially for certain positions. Elitzur (2019) assesses the impact of data analytics on contract negotiations, showing that teams employing advanced metrics achieve greater contract efficiency, but that agent influence and market factors can still skew outcomes.
Freeston et al. (2024) study in-game workload demands, emphasizing that positions such as catcher and pitcher face unique physical and performance challenges. Their work suggests that WAR may not fully capture these demands, leading to undervaluation in contract negotiations. Granato (2023) analyzes wage dispersion among pitchers, revealing significant variation in compensation for similar WAR contributions, often driven by perceived risk and negotiation outcomes.
Hakes and Sauer (2006) provide an economic evaluation of the Moneyball hypothesis, showing that teams can exploit undervalued metrics for competitive advantage, but that behavioral biases and resistance to change can limit market efficiency. Kennedy-Shaffer (2024) examines the effects of the MLB ban on infield shifts, finding that rule changes can significantly impact player WAR and, by extension, contract value, particularly for infielders.
Krautmann (1999) critiques Scully’s marginal revenue product model, arguing that while WAR is a useful tool, it cannot fully account for market premiums, risk aversion, and subjective factors. Krautmann and Solow (2009) reconsider the baseball labor market, identifying negotiation leverage, agent reputation, and market size as critical determinants of contract length and value.
Link and Yosifov (2012) provide one of the most rigorous empirical analyses of contract length and compensation, using regression models to confirm that WAR and contract duration are significant predictors of salary. However, their work also highlights the role of agent bargaining and market context in producing systematic deviations from efficiency. The PMC Study (2023) explores the impact of national culture, altruism, and risk preference, concluding that agent power is especially significant in securing favorable terms for elite players, particularly with long-term deals.
Rivers and Brown (2006) focus on the valuation of versatility, showing that multi-position players are not always compensated for their flexibility, signaling a persistent inefficiency in contract models. Sommer (2011) investigates human capital and salary distribution, finding that while metrics like WAR explain much of the variation in pay, market premiums and discounts for certain positions or profiles remain entrenched.
Sommers and Quinton (1982) analyze the case of the first family of free agents, documenting how free agency has led to salary inflation independent of actual performance metrics. Watnik (1998) confirms a statistical correlation between WAR and pay, but notes that market efficiency is hampered by negotiation strategies, agent influence, and external shocks.
Across all these works, several themes emerge. WAR is central to contract negotiation and valuation, yet it is not a perfect measure. Positional value, market size, agent bargaining power, risk profiles, and negotiation dynamics all introduce systematic inefficiencies. Teams that leverage analytics and advanced metrics can improve contract efficiency, but limitations in measurement, resistance to change, and powerful agents ensure that anomalies persist. Defensive contributions, versatility, and workload are often undervalued, while offensive output and star status can inflate salaries beyond what WAR would predict. Long-term contracts are particularly vulnerable to inefficiency due to age-related decline and injury risk.
Moreover, external factors such as rule changes, cultural dynamics, and economic shocks (e.g., pandemic disruptions) can rapidly alter the landscape of contract valuation. The literature consistently argues that MLB teams have become more rational and data-driven, but compensation remains influenced by a complex interplay of performance metrics, market conditions, negotiation leverage, and human judgment. This comprehensive body of research provides the empirical and theoretical foundation for analyzing contract efficiency in MLB and directly informs the methods and interpretation of the present study.
The article “Contract Length and Salaries: Compensating Wage Differentials in Major League Baseball” (Link & Yosifov, 2012) offers one of the most rigorous empirical frameworks for understanding MLB salary structures. The authors analyze a large panel of MLB player contracts, examining how contract length interacts with salary and how teams compensate players for various risk factors, including injury and performance volatility. Their key contribution is the identification of compensating wage differentials: systematic salary adjustments made to offset undesirable contract attributes (such as longer risk exposure or less player-friendly terms).
Their regression-based approach reveals that, while teams do pay a premium for players willing to sign longer-term deals, this premium is not uniform across all positions or performance levels. For instance, pitchers often receive different contract structures compared to position players due to higher injury risk, and star players with higher WAR tend to command both longer deals and higher annual values. Notably, Link & Yosifov’s work provides robust evidence that WAR is a significant predictor of both contract length and value, but that market inefficiencies and behavioral factors (including team competitiveness cycles and market size) can distort these relationships.
This foundational study informs the current thesis by establishing a methodological template: using regression models to test whether actual contract values align with objective performance metrics like WAR, while controlling for positional and contextual variables. Moreover, it highlights the perennial tension in MLB between efficient labor market outcomes—where pay matches value—and real-world deviations driven by bargaining, expectation, and uncertainty.
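The regression template described above can be sketched as follows. This is only an illustration under stated assumptions: the data frame `toy`, all of its values, the `pitcher` indicator, and the log-linear specification are hypothetical and are not taken from Link & Yosifov or from the thesis dataset.

```r
# Toy data: every value below is made up for illustration only.
toy <- data.frame(
  contract_value = c(20e6, 45e6, 80e6, 120e6, 200e6, 310e6, 60e6, 150e6),
  best_war       = c(3.8, 4.2, 4.9, 5.5, 6.3, 7.4, 4.5, 5.9),
  years          = c(2, 3, 5, 6, 8, 10, 4, 7),
  pitcher        = c(0, 1, 0, 1, 0, 0, 1, 1)  # hypothetical position indicator
)

# Assumed log-linear specification: each coefficient reads as an approximate
# proportional change in contract value per unit of the regressor.
fit <- lm(log(contract_value) ~ best_war + years + pitcher, data = toy)

# Residuals are the part of log contract value that WAR, length, and the
# position indicator leave unexplained.
toy$residual <- resid(fit)

print(round(coef(fit), 3))
```

In a setup like this, large positive residuals would flag contracts that pay more than the regressors predict, and large negative residuals would flag apparent bargains.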
“Current State of Data and Analytics Research in Baseball” (Mizels, Erickson, & Chalmers, 2022) provides a sweeping survey of the analytics-driven transformation in baseball. The authors map the emergence and impact of sabermetrics, motion analysis, machine learning, and artificial intelligence on both player evaluation and broader team decision-making. They argue that the maturity of data science in MLB has fundamentally shifted the way teams perceive and pay for talent, ushering in an era where traditional heuristics are increasingly supplanted by empirical modeling and statistical inference.
The article pays particular attention to the spread of WAR and related metrics as the lingua franca of player valuation. It traces how teams have adopted these tools not only in free agency and arbitration negotiations but also in internal roster construction and in-game management. Of special relevance to the present thesis is their discussion of market efficiency: while analytics has improved the precision with which teams estimate player value, the authors find that inefficiencies persist, especially at the margins (e.g., for utility players, aging veterans, or those with unique skill sets not fully captured by standard metrics).
Moreover, Mizels et al. emphasize the limitations of even the most sophisticated models, noting that uncertainty, sample size effects, and context-dependency (such as the impact of teammates or stadiums) can lead to persistent gaps between prediction and realized value. This perspective justifies the present thesis’s empirical focus: even in a data-rich environment, do contract values truly reflect on-field performance and positional scarcity, or do anomalies and inefficiencies remain?
The drive for efficiency in MLB extends beyond individual contracts to the construction of entire rosters. “A Data-Driven Optimization Approach to Baseball Roster Management” advances this theme by proposing and empirically evaluating optimization algorithms for assembling competitive, cost-effective teams. The study leverages linear programming and simulation to allocate players across positions and budget constraints, with WAR and salary as key inputs.
This approach resonates with the modern “Moneyball” paradigm, where front offices seek to maximize output (wins) per dollar spent by identifying undervalued assets and constructing balanced, flexible rosters. The article demonstrates that mathematical optimization can, in theory, outperform traditional scouting or subjective decision-making, especially when applied to large, up-to-date datasets that capture player performance, injury risk, and market conditions.
For this thesis, the implications are twofold. First, the article reinforces the importance of considering position-specific value and opportunity cost in compensation analysis: not all WAR is created equal, as scarcity and strategic fit modulate a player’s true contribution. Second, it suggests that market inefficiencies—such as overpaying for marquee names or underestimating the value of multi-positional players—can be systematically exploited by data-driven teams. Thus, by comparing actual contract outcomes to optimization-based benchmarks, the current study situates itself at the cutting edge of both analytics-informed practice and academic research.
The pandemic era introduced unprecedented volatility into MLB’s labor market, as captured in “Cardboard Fans in the Stands: COVID and Compensation in Major League Baseball”. This article chronicles the disruptions to player compensation, contract negotiation, and team payroll management resulting from the 2020 COVID-19 season and its aftermath. The authors document how lost revenues, shortened schedules, and empty stadiums forced teams and players into novel bargaining positions, with ripple effects on contract length, salary guarantees, and risk-sharing mechanisms.
Of particular significance is the article’s analysis of how uncertainty and exogenous shocks can exacerbate or reveal underlying inefficiencies in salary allocation. Teams facing revenue shortfalls may prioritize short-term flexibility over long-term commitments, while players may accept lower annual values in exchange for job security. The COVID-19 context thus provides a natural experiment for testing the resilience of market efficiency: do contracts signed under duress deviate more from WAR-based value models, or do they reflect rational adjustments to new risk profiles?
By incorporating these insights, the current thesis recognizes that labor market efficiency in MLB is not static but sensitive to external shocks and shifting bargaining power. It further motivates the inclusion of temporal and contextual variables in compensation analysis, acknowledging that even the most robust models must adapt to changing economic realities.
Together, these four articles construct a comprehensive scholarly foundation for analyzing MLB contract efficiency. From economic theory and regression analysis (Link & Yosifov), through the analytics revolution (Mizels et al.), to prescriptive optimization and the impact of exogenous shocks, each contributes a distinctive lens for understanding how and why MLB contracts may succeed or fail to align pay with value.
The present research builds on this foundation by empirically testing whether 2025 MLB contracts efficiently compensate players for their WAR and positional value, using up-to-date data and analytics methods. In doing so, I contribute not only to academic debates but also to the practical discourse on roster construction, player advocacy, and the future of baseball economics.
I use the file mlb_player_contracts_2025.csv, which contains MLB player contracts for 2025, including player, team, contract value, years, average annual value, position, best WAR, and average WAR for the position.
Key variables:
- player
- team
- contract_value
- years
- average_annual
- position
- best_war (player’s best single-season WAR)
- avg_war_position (average WAR for the position)
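Before any analysis, it can help to confirm that a loaded data frame actually carries the variables listed above. The sketch below is a minimal check; the helper name `missing_columns` and the toy data frame are hypothetical, and the column names are assumed to be in their cleaned, snake_case form.

```r
# Column names taken from the variable list above.
expected_cols <- c("player", "team", "contract_value", "years",
                   "average_annual", "position", "best_war",
                   "avg_war_position")

# Hypothetical helper: returns the expected columns that are absent from df.
missing_columns <- function(df, expected = expected_cols) {
  setdiff(expected, names(df))
}

# Toy frame (made-up values) that deliberately lacks the two WAR columns.
toy <- data.frame(player = "A", team = "X", contract_value = 1e7,
                  years = 2, average_annual = 5e6, position = "C")

print(missing_columns(toy))
```

An empty result would indicate that every expected variable is present.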
The rationale for using a player’s best single-season Wins Above Replacement (WAR) as a performance metric in this study is deeply rooted in both the history of Major League Baseball analytics and the practical realities of contract negotiations. WAR itself is a modern synthesis of decades of statistical development, originating in the sabermetric movement of the late 20th century. It was designed to provide a single, comprehensive measure of a player’s total value to their team, accounting for offense, defense, baserunning, and positional context. In MLB, WAR allows front offices, agents, and analysts to compare players across eras, positions, and roles with a standardized metric, making it especially valuable in bargaining and roster construction.
Historically, teams relied on simpler measures like batting average and RBIs, but as baseball economics evolved and the stakes of contract decisions grew—particularly after the rise of free agency and arbitration in the 1970s—so did the need for more sophisticated evaluation tools. By the early 2000s, WAR had become a central figure in contract talks, cited in arbitration hearings, free agency negotiations, and even in media coverage of major deals. Its ability to convert diverse skills into a single value made it a powerful benchmark for both teams and player representatives.
Choosing best single-season WAR specifically, as opposed to multi-year averages or projections, reflects the way MLB contract negotiations typically unfold. Teams and agents often anchor discussions around a player’s peak performance—the season in which they demonstrated their highest level of output and impact. This is because peak WAR serves as proof of a player’s ceiling, showcasing the value they are capable of providing when healthy and at their best. While multi-year averages may offer a more stable or predictive view of future performance, they can dilute the significance of an extraordinary season or overlook the context of short-term injuries or slumps. Peak WAR is often used to justify larger contracts, longer terms, or special incentives, particularly for superstar players whose career trajectories may be atypical.
Additionally, best WAR helps mitigate distortions caused by injuries, suspensions, or other short-term factors that might unfairly depress a multi-year average. It highlights the player’s maximum demonstrated potential, which is especially relevant for teams seeking impact talent to fill a critical roster need or for agents trying to maximize their client’s market value. For younger players or rising stars, best WAR can capture breakout performances that signal future upside, while for veterans, it serves as a reminder of proven excellence.
In summary, using best single-season WAR not only aligns with industry practice but also provides a fair and focused lens for analyzing contract efficiency. It reflects the peak level of achievement most valued in MLB contract negotiations, while remaining sensitive to the realities of player health, role changes, and market expectations. By centering the analysis on best WAR, this study is able to offer insights that are both relevant to front office strategy and grounded in the historical evolution of baseball analytics.
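library(tidyverse)   # read_csv(), dplyr verbs, ggplot2
library(janitor)     # clean_names()
library(skimr)       # skim()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(scales)      # dollar labels for plot axes

# Path is relative to the working directory
file_path <- "mlb_player_contracts_2025.csv"
contracts_raw <- read_csv(file_path)

contracts <- contracts_raw %>%
  clean_names() %>%
  mutate(
    # Coerce to numeric in case any column was read as text
    contract_value = as.numeric(contract_value),
    years = as.numeric(years),
    average_annual = as.numeric(average_annual),
    best_war = as.numeric(best_war),
    avg_war_position = as.numeric(avg_war_position),
    war_above_pos = best_war - avg_war_position,   # WAR relative to positional average
    contract_per_war = contract_value / best_war,  # total dollars per best-season WAR
    annual_per_war = average_annual / best_war     # AAV per best-season WAR
  ) %>%
  # Keep rows with a known contract value and strictly positive WAR and length
  filter(!is.na(contract_value), !is.na(best_war), best_war > 0, years > 0)

skim(contracts)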
Name | contracts |
Number of rows | 107 |
Number of columns | 11 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 8 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
player | 0 | 1 | 8 | 21 | 0 | 107 | 0 |
team | 0 | 1 | 4 | 12 | 0 | 28 | 0 |
position | 0 | 1 | 1 | 5 | 0 | 16 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
contract_value | 0 | 1 | 118720560.75 | 106071645.19 | 9000000.0 | 50500000.0 | 82000000.0 | 138000000.00 | 700000000.0 | ▇▂▁▁▁ |
years | 0 | 1 | 5.70 | 2.51 | 1.0 | 4.0 | 5.0 | 7.00 | 14.0 | ▂▇▂▁▁ |
average_annual | 0 | 1 | 18678135.12 | 9074104.18 | 4500000.0 | 12375000.0 | 18000000.0 | 22250000.00 | 70000000.0 | ▇▇▁▁▁ |
best_war | 0 | 1 | 5.15 | 1.09 | 3.8 | 4.3 | 4.8 | 5.85 | 9.0 | ▇▃▃▁▁ |
avg_war_position | 0 | 1 | 2.78 | 0.33 | 2.1 | 2.6 | 2.8 | 2.90 | 3.3 | ▂▂▇▇▃ |
war_above_pos | 0 | 1 | 2.37 | 1.07 | 0.6 | 1.6 | 2.2 | 3.00 | 5.8 | ▇▇▆▂▁ |
contract_per_war | 0 | 1 | 20919817.43 | 13888761.60 | 2368421.0 | 11650055.4 | 17045454.6 | 22962417.10 | 77777777.8 | ▇▅▁▁▁ |
annual_per_war | 0 | 1 | 3527019.48 | 1171016.08 | 1184210.5 | 2695937.9 | 3414634.1 | 4183333.33 | 7777777.8 | ▃▇▆▁▁ |
Notes:
- Players with missing or non-positive best WAR are excluded from WAR
efficiency calculations, because cost per WAR is undefined or
misleading for them.
- New variables: war_above_pos (player WAR minus average WAR for their
position), contract_per_war (total contract value per best WAR), and
annual_per_war (AAV per best WAR).
At this stage, cleaning the data ensures that all subsequent analyses rest on complete records. The filter drops contracts with a missing contract value, a missing or non-positive best WAR, or a non-positive contract length, which avoids undefined or distorted ratios. The new variables, WAR above position and cost per WAR, allow players to be compared across positions and contract structures on a common basis. Note that contract_per_war divides a multi-year total by a single-season WAR, so it rises mechanically with contract length; annual_per_war is the length-adjusted counterpart.
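As a worked check of the derived variables, the sketch below recomputes them for the two positions in Table 1 that contain a single contract (DH/SP and 2B/RF), so the position averages there are the player's own values. The data frame name `ex` is hypothetical, and `avg_war_position` is backed out as best WAR minus the reported WAR above position, which is an inference from the table.

```r
# Inputs copied from the n = 1 rows of Table 1.
# avg_war_position is inferred as avg_war - avg_war_above (9.0 - 5.8, 7.7 - 4.7).
ex <- data.frame(
  position         = c("DH/SP", "2B/RF"),
  contract_value   = c(700e6, 365e6),
  years            = c(10, 12),
  average_annual   = c(70e6, 30416667),
  best_war         = c(9.0, 7.7),
  avg_war_position = c(3.2, 3.0)
)

# Same formulas as in the cleaning step.
ex$war_above_pos    <- ex$best_war - ex$avg_war_position
ex$contract_per_war <- ex$contract_value / ex$best_war
ex$annual_per_war   <- ex$average_annual / ex$best_war

# The ratio of the two cost measures recovers contract length, which shows
# that contract_per_war scales with years while annual_per_war does not.
ex$ratio <- ex$contract_per_war / ex$annual_per_war

print(ex[, c("position", "war_above_pos", "contract_per_war",
             "annual_per_war", "ratio")])
```

The recomputed contract_per_war values round to the figures reported in Table 1 (77,777,778 and 47,402,597), and the ratio column comes out at roughly 10 and 12, the two contract lengths.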
position_summary <- contracts %>%
  group_by(position) %>%
  summarise(
    n = n(),
    avg_contract = mean(contract_value, na.rm = TRUE),
    avg_annual = mean(average_annual, na.rm = TRUE),
    avg_years = mean(years, na.rm = TRUE),
    avg_war = mean(best_war, na.rm = TRUE),
    avg_war_above = mean(war_above_pos, na.rm = TRUE),
    avg_contract_per_war = mean(contract_per_war, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_contract_per_war))
kable(position_summary, caption = "Table 1. Summary by Position") %>%
  kable_styling(full_width = FALSE)
position | n | avg_contract | avg_annual | avg_years | avg_war | avg_war_above | avg_contract_per_war |
---|---|---|---|---|---|---|---|
DH/SP | 1 | 700000000 | 70000000 | 10.000000 | 9.000000 | 5.800000 | 77777778 |
2B/RF | 1 | 365000000 | 30416667 | 12.000000 | 7.700000 | 4.700000 | 47402597 |
SS | 12 | 191000000 | 23481638 | 7.333333 | 5.283333 | 1.983333 | 33457584 |
RF | 9 | 159444444 | 21343000 | 6.444444 | 5.577778 | 2.677778 | 24445832 |
3B | 9 | 125444444 | 19176984 | 6.000000 | 4.966667 | 2.066667 | 23896859 |
LF | 8 | 116500000 | 18101042 | 6.000000 | 5.000000 | 2.400000 | 21528917 |
1B | 9 | 106111111 | 19240741 | 5.222222 | 5.133333 | 2.733333 | 19714390 |
SP | 23 | 100717391 | 18606988 | 5.260870 | 5.430435 | 2.630435 | 17835390 |
CF | 8 | 100000000 | 17258631 | 5.750000 | 5.400000 | 2.500000 | 17467827 |
2B | 9 | 77900000 | 14318095 | 5.222222 | 4.566667 | 1.566667 | 16410334 |
DH/LF | 3 | 79000000 | 15361111 | 5.333333 | 4.866667 | 2.133333 | 15740681 |
CF/SS | 1 | 65000000 | 13000000 | 5.000000 | 4.800000 | 1.700000 | 13541667 |
C | 11 | 61136364 | 12930303 | 4.454546 | 4.472727 | 2.372727 | 13053199 |
DH | 1 | 42000000 | 14000000 | 3.000000 | 4.100000 | 1.400000 | 10243902 |
1B/DH | 1 | 33000000 | 16500000 | 2.000000 | 4.000000 | 1.700000 | 8250000 |
2B/SS | 1 | 28000000 | 7000000 | 4.000000 | 4.300000 | 1.200000 | 6511628 |
Interpretation:
Table 1 reports, for each position, the average contract value, average
annual salary, contract length, best WAR, WAR above positional average,
and cost per WAR. It shows which positions command the highest contracts
relative to performance and which offer teams the most value. The two
highest averages, DH/SP and 2B/RF, each rest on a single contract
(n = 1) and say little about those positions in general. Among positions
with at least eight contracts, shortstops have the highest average
contract per WAR (about $33.5 million) and catchers the lowest (about
$13.1 million), while starting pitchers (about $17.8 million, n = 23)
sit near the middle. A high value may reflect a market premium for
scarcity and strategic importance, or it may signal that teams
systematically overpay. A low value may mark an undervalued position and
an opportunity for teams to find excess value. Because this measure uses
total contract value, it also rises with contract length: shortstops
average 7.3 years and catchers 4.5. The number of contracts (n) for each
position indicates sample robustness; positions with fewer contracts
have less reliable averages, and this should be considered when
interpreting the data. The WAR above position metric is particularly
useful for understanding whether top performers at each position are
being rewarded appropriately compared to their positional peers.
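Because several positions in Table 1 contain only one to three contracts, one way to read the table is to set a minimum sample size before comparing averages. The sketch below re-enters the `n` and `avg_contract_per_war` columns from Table 1 and filters on a threshold; the cutoff of 8 contracts (`min_n`) is an assumption chosen for illustration, not a rule used in the analysis.

```r
# Values transcribed from Table 1 (position, n, avg_contract_per_war).
table1 <- data.frame(
  position = c("DH/SP", "2B/RF", "SS", "RF", "3B", "LF", "1B", "SP",
               "CF", "2B", "DH/LF", "CF/SS", "C", "DH", "1B/DH", "2B/SS"),
  n = c(1, 1, 12, 9, 9, 8, 9, 23, 8, 9, 3, 1, 11, 1, 1, 1),
  avg_contract_per_war = c(77777778, 47402597, 33457584, 24445832,
                           23896859, 21528917, 19714390, 17835390,
                           17467827, 16410334, 15740681, 13541667,
                           13053199, 10243902, 8250000, 6511628)
)

min_n <- 8  # assumed minimum number of contracts per position

# Keep only positions with enough contracts, most expensive per WAR first.
reliable <- table1[table1$n >= min_n, ]
reliable <- reliable[order(-reliable$avg_contract_per_war), ]

print(reliable, row.names = FALSE)
```

With this cutoff, nine positions remain, and the ranking runs from shortstop at the top to catcher at the bottom.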
top10_war <- contracts %>%
arrange(contract_per_war) %>%
select(player, team, position, contract_value, best_war, contract_per_war) %>%
head(10)
kable(top10_war, caption = "Table 2. Top 10 Most Cost-Efficient Contracts (\\$ per WAR)") %>%
kable_styling(full_width=FALSE)
player | team | position | contract_value | best_war | contract_per_war |
---|---|---|---|---|---|
Reese McGuire | Red Sox | C | 9.0e+06 | 3.8 | 2368421 |
Hunter Renfroe | Royals | RF | 1.3e+07 | 3.9 | 3333333 |
Amed Rosario | Rays | SS | 1.8e+07 | 3.9 | 4615385 |
Harrison Bader | Mets | CF | 2.0e+07 | 3.8 | 5263158 |
Eugenio Suárez | Diamondbacks | 3B | 2.4e+07 | 4.1 | 5853659 |
Brandon Lowe | Rays | 2B | 2.4e+07 | 4.0 | 6000000 |
Ha-Seong Kim | Padres | 2B/SS | 2.8e+07 | 4.3 | 6511628 |
Danny Jansen | Blue Jays | C | 2.6e+07 | 3.8 | 6842105 |
Josh Jung | Rangers | 3B | 3.2e+07 | 4.4 | 7272727 |
Tarik Skubal | Tigers | SP | 4.0e+07 | 5.4 | 7407407 |
Interpretation:
Table 2 lists the ten contracts with the lowest total cost per unit of
best WAR, the most efficient deals from a team’s perspective under this
metric. Every contract in the list is worth $40 million or less, so the
ranking favors small total commitments. It does not by itself show that
these players outperformed their contracts, because best WAR is a career
peak that may predate the deal. The list is still useful for
understanding what kinds of players, positions, or contract
circumstances tend to yield the best returns for teams. Two catchers and
two third basemen appear, but no single position dominates; a position
that were disproportionately represented would suggest that the market
undervalues players in that role. The list can also reveal the impact of
timing: players who broke out after signing, or who signed before a
major market shift, may appear here. In reviewing these contracts, I am
able to identify not just who the bargains were, but also what traits or
circumstances they share, such as age, contract length, or past injury
risk.
bottom10_war <- contracts %>%
arrange(desc(contract_per_war)) %>%
select(player, team, position, contract_value, best_war, contract_per_war) %>%
head(10)
kable(bottom10_war, caption = "Table 3. Top 10 Least Cost-Efficient Contracts (\\$ per WAR)") %>%
kable_styling(full_width=FALSE)
player | team | position | contract_value | best_war | contract_per_war |
---|---|---|---|---|---|
Shohei Ohtani | Dodgers | DH/SP | 7.00e+08 | 9.0 | 77777778 |
Francisco Lindor | Mets | SS | 3.41e+08 | 6.1 | 55901639 |
Juan Soto | Yankees | LF | 3.55e+08 | 6.5 | 54615385 |
Xander Bogaerts | Padres | SS | 2.80e+08 | 5.2 | 53846154 |
Anthony Rendon | Angels | 3B | 2.45e+08 | 4.7 | 52127660 |
Corey Seager | Rangers | SS | 3.25e+08 | 6.5 | 50000000 |
Fernando Tatis Jr. | Padres | RF | 3.40e+08 | 7.0 | 48571429 |
Bryce Harper | Phillies | RF | 3.30e+08 | 6.9 | 47826087 |
Trea Turner | Phillies | SS | 3.00e+08 | 6.3 | 47619048 |
Mookie Betts | Dodgers | 2B/RF | 3.65e+08 | 7.7 | 47402597 |
Interpretation:
Table 3 displays the ten contracts that are the least efficient in terms
of cost per WAR, that is, the deals in which teams committed the most
total dollars per win of best-season WAR. Every contract in the table is
worth at least $245 million and belongs to a player whose best WAR lies
between 4.7 and 9.0, so these are productive players on very large
deals rather than unproductive ones. Because the metric divides total
contract value by a single season of WAR, long, high-value contracts
rank as inefficient partly by construction. Beyond that mechanical
effect, a high cost per WAR may reflect players underperforming relative
to expectations, suffering injuries, or market forces leading to
overpayment (such as free agent bidding wars or positional scarcity). By
examining the characteristics of these contracts, I can infer which
factors contribute most to inefficiency, whether age, risk, recency of
performance, or overemphasis on intangibles versus measurable output.
Four of the ten contracts belong to shortstops, which may point to a
structural market issue at that position. Reviewing these deals provides
insight into the risks and pitfalls teams face in contract negotiations
and the potential for future corrective action.
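To make the metric concrete, the following minimal sketch recomputes the last column of Table 3 for three of its rows. It assumes that `contract_per_war` is defined as total contract value divided by best single-season WAR; that definition is not shown in this section, although the rounded values in the table are consistent with it. The data frame name `table3_sample` is illustrative only.

```r
# Three rows copied from Table 3; table3_sample is an illustrative name.
table3_sample <- data.frame(
  player         = c("Shohei Ohtani", "Francisco Lindor", "Anthony Rendon"),
  contract_value = c(7.00e8, 3.41e8, 2.45e8),
  best_war       = c(9.0, 6.1, 4.7)
)

# Assumed definition: total contract dollars per win of best-season WAR.
table3_sample$contract_per_war <- round(
  table3_sample$contract_value / table3_sample$best_war
)

print(table3_sample)
```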
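# Points are colored by position; group = 1 keeps the dashed line a single pooled fit.
ggplot(contracts, aes(x = best_war, y = contract_value, color = position)) +
  geom_point(alpha = 0.7, size = 3) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  labs(
    title = "Figure 1. Contract Value vs. Best WAR by Position",
    x = "Best WAR",
    y = "Contract Value ($)"  # plot labels are drawn literally, so "$" needs no escaping
  ) +
  scale_y_continuous(labels = scales::dollar) +
  theme_minimal()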
Interpretation:
Figure 1 provides a visual representation of the relationship between
player performance (as measured by best WAR) and total contract value,
with points colored by player position. The scatterplot allows me to
assess both the general trend (as indicated by the regression line) and
the presence of outliers or clusters by position. If the points cluster
tightly around the regression line, it suggests that teams are broadly
consistent in paying for performance. However, wide vertical dispersions
at similar WAR values may indicate that some players are rewarded
disproportionately, potentially due to positional scarcity, recent
playoff heroics, or intangible factors such as leadership or
marketability. The color-coding by position further helps me detect
whether certain roles systematically attract higher or lower valuations
for a given WAR level. For example, if pitchers consistently lie above
the regression line, it would imply a premium for that position. This
figure is crucial for visually diagnosing both the presence and the
nature of market inefficiencies in MLB contracts.
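One way to put a number on "above the regression line" is to average the residuals of the pooled fit within each position. The sketch below does this on hypothetical toy data (the values and the `toy` and `premium` names are illustrative, not taken from the study's dataset), using the same single trend line that Figure 1 draws.

```r
# Hypothetical toy data; values are illustrative, not from the study's dataset.
toy <- data.frame(
  position       = c("SP", "SP", "SS", "SS", "C", "C"),
  best_war       = c(4, 6, 4, 6, 4, 6),
  contract_value = c(150e6, 250e6, 120e6, 220e6, 90e6, 190e6)
)

# One pooled trend line across all positions, as in Figure 1.
fit <- lm(contract_value ~ best_war, data = toy)

# A residual above zero means the contract sits above the line at that WAR level.
toy$residual <- resid(fit)

# Mean residual per position: a rough premium or discount relative to the trend.
premium <- tapply(toy$residual, toy$position, mean)
print(round(premium))
```

A positive mean residual for a position corresponds to the "premium" reading described above, although with few players per position such averages are noisy.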
# Positions are ordered by their median annual cost per WAR.
ggplot(contracts, aes(x = reorder(position, annual_per_war, FUN = median), y = annual_per_war, fill = position)) +
  geom_boxplot(alpha = 0.7, show.legend = FALSE) +
  labs(
    title = "Figure 2. Annual $ per WAR by Position",
    x = "Position",
    y = "Annual $ per Best WAR"
  ) +
  scale_y_continuous(labels = scales::dollar) +
  theme_minimal() +
  coord_flip()
Interpretation:
The boxplot in Figure 2 compares the distribution of annual cost per WAR
across different positions, allowing for a granular analysis of market
efficiency and positional value. By displaying the median, quartiles,
and outliers for each position, I can discern not only which positions
are generally more or less expensive per unit of WAR, but also how much
cost per WAR varies within each position. For example, a position with a
high median but a narrow interquartile range suggests consistent
overpayment, while a wide range indicates greater market uncertainty or
volatility. Outliers, which appear as individual points to the right of
the boxes because the coordinates are flipped, may represent
high-profile contract busts or unique market scenarios. This
visualization is instrumental in highlighting which positions may be
systematically over- or under-valued, and in providing a nuanced
perspective on where teams might find undervalued talent. The flipped
coordinate system enhances readability, especially when dealing with
numerous or lengthy position labels.
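The quantities a boxplot draws can also be tabulated directly. The following sketch computes the median and interquartile range of annual cost per WAR by position on hypothetical toy data (the values and the `toy` and `summary_by_pos` names are illustrative assumptions), then sorts by median, the same ordering criterion used in Figure 2.

```r
# Hypothetical toy data; values are illustrative, not from the study's dataset.
toy <- data.frame(
  position       = rep(c("SS", "C"), each = 5),
  annual_per_war = c(6e6, 7e6, 8e6, 9e6, 10e6,   # SS: higher median, tight spread
                     1e6, 2e6, 4e6, 8e6, 12e6)   # C: lower median, wide spread
)

# Median and interquartile range per position.
med <- tapply(toy$annual_per_war, toy$position, median)
iqr <- tapply(toy$annual_per_war, toy$position, IQR)

summary_by_pos <- data.frame(
  position = names(med),
  median   = as.numeric(med),
  iqr      = as.numeric(iqr)
)

# Sort from the lowest to the highest median cost per WAR.
summary_by_pos <- summary_by_pos[order(summary_by_pos$median), ]
print(summary_by_pos)
```

In this toy case the shortstops show the "high median, narrow range" pattern and the catchers the "wide range" pattern discussed above.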
# Jittered points by position with a single pooled linear trend.
ggplot(contracts, aes(x = years, y = contract_per_war)) +
  geom_jitter(aes(color = position), width = 0.2, height = 0, alpha = 0.7) +
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dashed") +
  labs(
    title = "Figure 3. Contract Length vs. $ per Best WAR",
    x = "Contract Length (Years)",
    y = "Contract Value per Best WAR"
  ) +
  scale_y_continuous(labels = scales::dollar) +
  theme_minimal()
Interpretation:
Figure 3 explores the relationship between contract length and cost
efficiency (measured as contract value per best WAR), using jittered
points to avoid overplotting and a regression line to show the overall
trend. This graph is critical for understanding whether teams actually
achieve better value by locking players into longer deals, or if the
risk of decline and injury renders long contracts less efficient in
practice. If the regression line slopes upward, it would imply that
longer contracts are, on average, less cost-efficient, a finding that
could be explained by teams overestimating future production or paying a
risk premium for long-term security. Some upward slope is expected
regardless, because total contract value grows with length while the
denominator is a single season of WAR. A flat trend would indicate that
length has little relationship with cost per WAR, and a downward trend
would suggest that teams are successfully securing bargains through
long-term commitments. The color-coding by position further allows me to
see if certain roles are more likely to be associated with longer or
more efficient deals. This visualization thus provides insight into the
risk-reward calculus teams face when negotiating contract length.
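The slope in Figure 3 mixes two things: how much teams pay per win per year, and how many years they pay for. The sketch below separates them on hypothetical toy contracts in which every deal pays exactly $5 million per WAR per year. All values are illustrative, and the definition of the annual measure (total dollars per best WAR divided by years) is an assumption, since the construction of `annual_per_war` is not shown in this section.

```r
# Hypothetical toy contracts: every deal pays exactly $5M per WAR per year.
toy <- data.frame(
  years    = c(1, 2, 4, 6, 8, 10),
  best_war = c(2, 3, 4, 5, 6, 7)
)
toy$contract_value   <- 5e6 * toy$best_war * toy$years
toy$contract_per_war <- toy$contract_value / toy$best_war   # total $ per best WAR
toy$annual_per_war   <- toy$contract_per_war / toy$years    # assumed annual measure

# Slope of total $ per WAR on contract length (the quantity plotted in Figure 3).
slope_total <- coef(lm(contract_per_war ~ years, data = toy))[["years"]]

# Slope of annual $ per WAR on contract length.
slope_annual <- coef(lm(annual_per_war ~ years, data = toy))[["years"]]

print(c(total = slope_total, annual = slope_annual))
```

Here the total measure rises with length even though the annual price of a win is identical across deals, so comparing the two slopes is one way to check how much of an upward trend reflects contract length alone.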
# OLS of total contract value on performance, positional average, length, and position dummies.
contract_lm <- lm(contract_value ~ best_war + avg_war_position + years + position, data = contracts)
tidy(contract_lm)
## # A tibble: 19 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 59652246. 1092393663. 0.0546 9.57e- 1
## 2 best_war 36309057. 5780997. 6.28 1.25e- 8
## 3 avg_war_position -98472873. 456202943. -0.216 8.30e- 1
## 4 years 18460966. 2369727. 7.79 1.24e-11
## 5 position1B/DH 17677202. 59992246. 0.295 7.69e- 1
## 6 position2B 51447745. 274688665. 0.187 8.52e- 1
## 7 position2B/RF 99655039. 275957409. 0.361 7.19e- 1
## 8 position2B/SS 43640851. 322119184. 0.135 8.93e- 1
## 9 position3B 60262750. 229013481. 0.263 7.93e- 1
## 10 positionC -36358571. 137551092. -0.264 7.92e- 1
## 11 positionCF 23699622. 228703842. 0.104 9.18e- 1
## 12 positionCF/SS 44025356. 321909979. 0.137 8.92e- 1
## 13 positionDH 43974479. 142726645. 0.308 7.59e- 1
## 14 positionDH/LF 13344377. 154259919. 0.0865 9.31e- 1
## 15 positionDH/SP 444069773. 365422763. 1.22 2.28e- 1
## 16 positionLF 20566142. 93229334. 0.221 8.26e- 1
## 17 positionRF 63869008. 228649973. 0.279 7.81e- 1
## 18 positionSP 22494489. 182865585. 0.123 9.02e- 1
## 19 positionSS 129094965. 411143469. 0.314 7.54e- 1
summary(contract_lm)
##
## Call:
## lm(formula = contract_value ~ best_war + avg_war_position + years +
## position, data = contracts)
##
## Residuals:
## Min 1Q Median 3Q Max
## -109741013 -15811782 0 13634612 110777004
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 59652246 1092393663 0.055 0.957
## best_war 36309057 5780997 6.281 1.25e-08 ***
## avg_war_position -98472873 456202943 -0.216 0.830
## years 18460966 2369727 7.790 1.24e-11 ***
## position1B/DH 17677202 59992246 0.295 0.769
## position2B 51447745 274688665 0.187 0.852
## position2B/RF 99655039 275957409 0.361 0.719
## position2B/SS 43640851 322119184 0.135 0.893
## position3B 60262750 229013481 0.263 0.793
## positionC -36358571 137551092 -0.264 0.792
## positionCF 23699622 228703842 0.104 0.918
## positionCF/SS 44025356 321909979 0.137 0.892
## positionDH 43974479 142726645 0.308 0.759
## positionDH/LF 13344376 154259919 0.087 0.931
## positionDH/SP 444069773 365422762 1.215 0.228
## positionLF 20566142 93229334 0.221 0.826
## positionRF 63869008 228649973 0.279 0.781
## positionSP 22494489 182865585 0.123 0.902
## positionSS 129094965 411143469 0.314 0.754
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36950000 on 88 degrees of freedom
## Multiple R-squared: 0.8992, Adjusted R-squared: 0.8786
## F-statistic: 43.63 on 18 and 88 DF, p-value: < 2.2e-16
Interpretation:
The regression model presented here quantifies the relationship between
total contract value and key explanatory variables: a player’s best WAR,
the average WAR at that player’s position, contract length, and position
(included as categorical dummies). By considering both player-specific
and contextual factors, I am able to test whether teams truly reward
performance and whether certain positions or contract lengths confer
premiums or discounts. The coefficient on best_war is positive and
highly significant (about $36.3 million per additional win, p < 0.001),
which indicates that teams do, in fact, pay for performance, consistent
with economic theory and prior research. The coefficient on years is
also positive and highly significant (about $18.5 million per additional
contract year, p < 0.001), so longer contracts carry higher total
values. By contrast, neither avg_war_position nor any position dummy is
individually significant at conventional levels (all p-values exceed
0.2), so the model provides no evidence of systematic over- or
under-valuation of specific roles once WAR and length are held fixed.
The very large standard errors on these terms are consistent with
avg_war_position being almost entirely determined by position, which
makes the two hard to separate. Because years enters only linearly, the
model cannot show whether the relationship with contract length is
subject to diminishing returns. The model explains about 90 percent of
the variation in contract value (R-squared 0.8992, adjusted 0.8786),
with a residual standard error of roughly $37 million; the remainder may
be attributable to negotiation, timing, or other idiosyncratic
influences.
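Individual t-tests on each dummy are a weak way to ask whether position matters as a whole. A joint F-test on nested models addresses that question directly. The following is a minimal sketch on simulated toy data, not the study's dataset: the variable names mirror the model above, but all values and effect sizes are hypothetical, and avg_war_position is left out for simplicity.

```r
# Simulated toy data; all values and effect sizes are hypothetical.
set.seed(1)
n <- 120
toy <- data.frame(
  best_war = runif(n, 1, 9),
  years    = sample(1:12, n, replace = TRUE),
  position = factor(sample(c("C", "SS", "SP", "RF"), n, replace = TRUE))
)
# Contract value depends on WAR and years only, plus noise (no position effect).
toy$contract_value <- 30e6 * toy$best_war + 18e6 * toy$years +
  rnorm(n, sd = 30e6)

# Restricted model omits position; full model adds the position dummies.
restricted <- lm(contract_value ~ best_war + years, data = toy)
full       <- lm(contract_value ~ best_war + years + position, data = toy)

# F-test of the null hypothesis that all position coefficients are jointly zero.
joint_test <- anova(restricted, full)
print(joint_test)
```

A small p-value in the second row would indicate that position adds explanatory power jointly, even when no single dummy is significant on its own.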
For a general manager or front office executive, the pursuit of efficient salary allocation must be both quantitative and contextual. My findings suggest that while a premium may be necessary for top-end pitchers or shortstops due to positional scarcity and their high-risk, high-reward nature, significant arbitrage opportunities exist in other positions often overlooked by the broader market. This could involve using advanced defensive metrics to identify an elite defensive catcher who is undervalued, or targeting utility players whose versatility—while difficult to quantify in a single WAR metric—provides immense roster flexibility.
Extreme caution should be exercised with long-term deals, as the data indicate these commitments are often a gamble that leads to less efficient outcomes on a per-WAR basis. This is where the power of agents is most evident; agents are adept at leveraging a player’s peak performance to secure long-term, lucrative deals that carry significant risk for the team due to age-related decline and injury. Therefore, a forward-thinking general manager should be wary of contracts extending beyond a player’s expected peak performance curve and instead explore shorter-term, high-value deals or team-friendly options that mitigate long-term risk.
The most successful general managers blend quantitative rigor with qualitative judgment. While analytics are the backbone of modern decision-making, they are not the sole determinants of value. Factors such as a player’s clubhouse presence, leadership, and marketability, while hard to measure, can still tip the balance in a competitive season. A truly efficient front office understands the market, the metrics, and the human element.
To succeed in this market, a general manager must move beyond a simple “dollars per WAR” calculation and embrace a dynamic, portfolio-based approach to roster construction, emphasizing risk management and internal talent development. Continuous market assessment and adaptability are key, as the valuation of a WAR point fluctuates with market shifts, team finances, and the collective bargaining agreement. This thesis provides a framework for systematic contract evaluation and strategic front office decision-making.
The results of this study reveal a nuanced and multifaceted picture of salary allocation efficiency in Major League Baseball. At the core, the data and analysis consistently show that while contract value and length are generally aligned with player performance, as measured by best single-season WAR, there remain notable pockets of inefficiency and market anomalies across positions and contract structures. Table 1 and Figure 2 highlight significant variation in cost per WAR by position, bringing to light that roles such as shortstop and pitcher may be systematically over- or under-valued relative to their actual on-field contributions. This observation reflects not only market dynamics and positional scarcity, but also limitations within WAR itself as a universal metric—catcher defense, pitcher risk profiles, and multi-positional versatility can escape the grasp of traditional performance measures, and thus may not be fully compensated.
The extremes of contract outcomes are further illustrated in Tables 2 and 3, where the most and least efficient deals expose the realities of a complex labor market. Teams that successfully identify undervalued talent or negotiate team-friendly deals—often with players who outperform expectations or emerge as breakout stars—reap substantial rewards. Conversely, some organizations find themselves burdened by contracts that return little value, frequently due to unforeseen injuries, rapid declines in performance, or aggressive agent negotiation that results in premium overpayment. These inefficiencies are not simply statistical artifacts, but demonstrate the inherent challenge of predicting future player value in a sport defined by uncertainty and variability.
Figure 1 demonstrates a strong, though imperfect, correlation between contract value and best WAR, suggesting that performance is rewarded but not in isolation. Substantial variation exists at similar performance levels, often driven by factors such as market size, contract timing, or intangible characteristics like leadership and clubhouse presence, which are not captured by WAR but still influence negotiation outcomes. Figure 3 adds another layer to the discussion, revealing that longer contracts can be less efficient, potentially reflecting the risk premium teams must pay for long-term security or the natural tendency for players to decline before their contracts end. This is especially pertinent in the context of modern negotiations, where agents wield considerable bargaining power and seek to secure lengthy, lucrative deals for elite clients—sometimes at the expense of efficiency.
Regression analysis supports the central hypothesis, confirming that best WAR is a significant predictor of contract value, and shows that contract length is significant as well; the position dummies, by contrast, are not individually significant in this model. The roughly 10 percent of variation that the model leaves unexplained points to persistent market inefficiencies and the influence of negotiation dynamics, including agent reputation and leverage, market size disparities, and the evolving role of analytics. Notably, the contracts of generational stars such as Shohei Ohtani, Aaron Judge, and Juan Soto serve as outliers in both data and theory. Ohtani’s unique profile as a two-way player and global icon resulted in a contract that far exceeds what WAR alone would suggest, illustrating how exceptional talent and international marketability can break the mold. Judge and Soto, representing large-market franchises and backed by powerful agents, secured deals inflated by both performance and external factors, further reinforcing the idea that market efficiency is not absolute.
Limitations of this study are primarily methodological. By relying on best single-season WAR, the analysis may overstate expected future value for players who have peaked or are in decline. Multi-year averages and age-adjusted projections could yield more accurate estimates of long-term worth. The absence of controls for age, injury history, and off-field factors also limits the explanatory power of the model, as these elements are crucial in contract negotiations and actual player outcomes. Additionally, the focus on the 2025 season, while representative of current market conditions, omits longitudinal trends and cyclical shifts in negotiation behavior that could be captured through multi-year analysis.
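The sensitivity to the choice of denominator can be illustrated with a small sketch. The example below uses two hypothetical players with identical contracts (all names and values are illustrative, not drawn from the dataset) and compares cost per WAR computed from the best season with cost per WAR computed from a three-season average.

```r
# Hypothetical season-level WAR for two illustrative players.
seasons <- data.frame(
  player = rep(c("Player A", "Player B"), each = 3),
  war    = c(7.0, 2.0, 3.0,    # one peak season, otherwise modest
             5.0, 5.5, 4.5)    # steady production
)
contract_value <- c("Player A" = 140e6, "Player B" = 140e6)

# Best single-season WAR versus multi-year average WAR, per player.
best_war <- sapply(split(seasons$war, seasons$player), max)
mean_war <- sapply(split(seasons$war, seasons$player), mean)

# Cost per WAR under each denominator.
per_best_war <- contract_value[names(best_war)] / best_war
per_mean_war <- contract_value[names(mean_war)] / mean_war

print(round(cbind(per_best_war, per_mean_war)))
```

In this toy case the player with a single peak season looks cheaper per win under the best-WAR measure and more expensive under the multi-year average, so the efficiency ranking reverses.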
Implications for general managers and front offices are clear: the pursuit of efficient salary allocation must be both quantitative and contextual. Teams can exploit arbitrage opportunities by targeting undervalued positions and player profiles, leveraging analytics not just for performance assessment but for improved negotiation strategy. However, caution is warranted in long-term commitments, as the data show that long multi-year contracts frequently result in inefficient deals due to the unpredictable nature of player health and performance decline. Agent power remains a decisive factor, especially in long-term, high-value negotiations; teams must be prepared to contend with sophisticated bargaining tactics and market pressures that can distort efficiency.
For future work, this study suggests integrating multi-year contract data, age curves, and advanced performance metrics—such as Statcast data and injury risk models—to refine estimates of market efficiency. Expanding the analysis to account for team context, playoff contention, market size, payroll flexibility, and exogenous shocks (including CBA changes and pandemic effects) will further illuminate the drivers of contract value and efficiency. Ultimately, the findings contribute to the literature by providing contemporary evidence of both progress and persistent gaps in MLB labor market efficiency, reinforcing the importance of strategic flexibility and nuanced decision-making.
In synthesizing these findings, I support the use of advanced metrics and analytics for contract evaluation, but advocate for a balanced approach that considers agent power, market size, unique player profiles, and the limitations of predictive models. Teams should blend quantitative rigor with qualitative judgment, recognizing the dynamic and sometimes unpredictable nature of baseball economics. For players and agents, awareness of positional and market anomalies can inform negotiation strategies, while for the league as a whole, ongoing research and adaptation are essential in striving for true market efficiency and competitive balance.
Arellano, Manuel, et al. “Compensation and Performance in Major League Baseball: Evidence from Salary Dispersion and Team Performance.” Journal of Sports Economics, vol. 17, no. 4, 2016, pp. 347–367. https://journals.sagepub.com/doi/full/10.1177/1527002516631456
Bierig, Brian, Jonathan Hollenbeck, and Alexander Stroud. “Understanding Career Progression in Baseball Through Machine Learning.” arXiv, Dec. 2017. https://arxiv.org/abs/1712.05754
Bradbury, John Charles. “Peak Athletic Performance and Ageing: Evidence from Baseball.” Journal of Sports Sciences, vol. 27, no. 6, 2009, pp. 599–610. https://www.tandfonline.com/doi/full/10.1080/02640410802603863
Brown, James, and Christopher Jepsen. “The Wage Effects of Multitasking.” Labour Economics, vol. 16, no. 1, 2009, pp. 112–121. https://www.researchgate.net/publication/222659336_The_Wage_Effects_of_Multitasking
Carruth, Matthew, and Shane T. Jensen. “Evaluating Throwing Ability in Baseball.” Journal of Quantitative Analysis in Sports, vol. 3, no. 3, 2007. https://www.degruyter.com/document/doi/10.2202/1559-0410.1079/html
Depken, Craig A., and Dennis P. Wilson. “The Demand for Salary Arbitration in Major League Baseball.” Industrial Relations: A Journal of Economy and Society, vol. 43, no. 4, 2004, pp. 801–821. https://onlinelibrary.wiley.com/doi/full/10.1111/j.0019-8676.2004.00357.x
Ehrlich, Jesse, et al. “Does a Salary Premium Exist for Offensive Output in Major League Baseball?” Managerial Finance, vol. 47, no. 3, 2021, pp. 326–335. https://www.emerald.com/insight/content/doi/10.1108/MF-04-2020-0186/full/html
Elitzur, Ramy. “Data Analytics Effects in Major League Baseball.” Omega, vol. 90, 2019/20, article 102001. https://www.sciencedirect.com/science/article/pii/S0305048318303744
Freeston, Jonathan, et al. “In-Game Workload Demands of Position Players in Major League Baseball.” Journal of Athletic Training, vol. 59, no. 3, 2024. https://journals.humankinetics.com/view/journals/jat/59/3/article-p198.xml
Granato, Amanda. “Wage Dispersion and Individual Performance: MLB Pitchers.” Union College Honors Thesis, 2023. https://digitalworks.union.edu/theses/2715
Hakes, Jahn K., and Raymond D. Sauer. “An Economic Evaluation of the Moneyball Hypothesis.” Journal of Economic Perspectives, vol. 20, no. 3, 2006, pp. 173–185. https://www.aeaweb.org/articles?id=10.1257/jep.20.3.173
Kennedy-Shaffer, Lee. “The Effects of Major League Baseball’s Ban on Infield Shifts: A Quasi-Experimental Analysis.” arXiv, Nov. 2024. https://arxiv.org/abs/2411.15075
Krautmann, Anthony C. “What’s Wrong with Scully-Estimates of a Player’s Marginal Revenue Product?” Economic Inquiry, vol. 37, no. 2, 1999, pp. 369–381. https://onlinelibrary.wiley.com/doi/10.1111/j.1465-7295.1999.tb01437.x
Krautmann, Anthony C., and John L. Solow. “The Baseball Players’ Labor Market Reconsidered.” Labour Economics, vol. 16, no. 1, 2009, pp. 32–41. https://www.sciencedirect.com/science/article/abs/pii/S0927537108000343
Link, Charles R., and Martin Yosifov. “Contract Length and Salaries Compensating Wage Differentials in Major League Baseball.” Journal of Sports Economics, vol. 13, no. 1, 2012, pp. 75–92. https://journals.sagepub.com/doi/abs/10.1177/1527002510396984
PMC Study. “The Impact of National Culture, Altruism, and Risk Preference on Salaries: The Case of Major League Baseball.” Frontiers in Psychology, 2023. https://www.frontiersin.org/articles/10.3389/fpsyg.2023.10171653/full
Rivers, Douglas, and Robert Brown. “Valuing Versatility: Do Teams Pay for Multi-Position Players?” Contemporary Economic Policy, vol. 24, no. 4, 2006, pp. 607–618. https://academic.oup.com/cep/article/24/4/607/1865308
Sommer, Jeffrey. “Human Capital and Salary Distribution in Major League Baseball.” Applied Economics Letters, vol. 18, no. 8, 2011, pp. 705–708. https://www.tandfonline.com/doi/full/10.1080/13504851.2010.533415
Sommers, Paul M., and Noel Quinton. “Pay and Performance in Major League Baseball: The Case of the First Family of Free Agents.” Journal of Human Resources, vol. 17, no. 3, 1982, pp. 426–436. http://www.jstor.org/stable/145589
Watnik, Mitchell R. “Pay for Play: Are Baseball Salaries Based on Performance?” Journal of Statistics Education, vol. 6, no. 2, 1998, n. pag. https://www.tandfonline.com/doi/full/10.1080/10691898.1998.11910618