Abstract

This study examines the efficiency of Major League Baseball (MLB) player contracts by analyzing the relationship between contract value, contract length, and player performance. Focusing on positional value and broader market dynamics, I use a proprietary 2025 dataset of active MLB player contracts to test whether compensation, in both total value and duration, reflects on-field contributions as measured by Wins Above Replacement (WAR). I also investigate whether certain positions are systematically over- or under-valued by the market. Using descriptive statistics, visualizations, and regression analysis, the study aims to identify and explain persistent inefficiencies in player compensation. By benchmarking actual contract outcomes against positional and performance data, I contribute to the sports economics literature and offer practical guidance for MLB front offices on roster construction and financial strategy.

Introduction

Baseball has evolved from a game into a multi-billion dollar enterprise in which the allocation of financial resources is arguably as critical to success as on-field talent. In this environment, valuing that talent has become a central challenge for every franchise. While a revolution in sports analytics has equipped front offices with increasingly sophisticated metrics to quantify individual player contributions, a fundamental question remains: does the professional baseball labor market consistently and accurately translate a player’s performance into commensurate financial compensation? I approach the question empirically, treating the sport as an economic system subject to market forces, information asymmetry, and behavioral biases.

The shift toward a data-informed approach to roster management has led to an era of heightened scrutiny over every payroll investment. Teams are now under immense pressure to maximize every dollar spent, making the pursuit of “value” a defining principle of front-office strategy. Yet, despite these advancements, teams continue to grapple with a high degree of uncertainty, forced to balance competitive aspirations with the unpredictable realities of player health, age-related performance decline, and the inherent risks of long-term contracts. This persistent tension between a desire for analytical precision and the inescapable risks of multi-year deals has transformed contract negotiations in MLB into a dynamic, and often controversial, arena where a single misstep can have a lasting impact on a franchise’s fortunes.

My thesis aims to unravel this complex dynamic by thoroughly investigating the relationship between player compensation and performance, with a particular focus on how positional value and the structure of a player’s contract—specifically its length—influence the overall efficiency of MLB’s labor market. The core research question guiding this study is:

Are MLB contracts efficient with respect to player WAR and positional value, or do systematic inefficiencies persist in the market?

Background: The MLB Labor Market

Major League Baseball has a unique labor market defined by decades of evolution in free agency, salary arbitration, and competitive balance measures. The structure of contracts, the influence of collective bargaining, and the rise of analytics have all shaped how teams allocate payroll and value player contributions.

The Rise of Free Agency and Salary Arbitration

In the 1970s, the introduction of free agency dramatically altered the balance of power between players and teams. Players gained the right to negotiate with any team after six years of service time, resulting in a surge of multi-year, multi-million dollar contracts. Salary arbitration, introduced a few years before free agency, gives players who have not yet reached free agency a mechanism to secure higher pay based on comparable players. These changes have made the MLB labor market highly competitive, yet also subject to inefficiencies driven by market perceptions, negotiation leverage, and evolving metrics of value.

Luxury Tax, Market Inequality, and Payroll Constraints

Since the 1990s, the luxury tax has served as a soft cap on payrolls, impacting contract strategies for large- and small-market teams alike. Wealthy franchises can absorb penalties to sign high-profile talent, while others must be more creative, seeking bargains or developing prospects. The interplay between payroll constraints, competitive windows, and shifting economic conditions means that the quest for market efficiency is never static.

Hypothesis:
MLB player contract value and length are primarily determined by performance metrics, notably WAR, and positional value. However, market size, agent bargaining power, and contract structure result in systematic deviations from efficiency.

Literature Review

Major League Baseball (MLB) contract valuation, labor market efficiency, and roster optimization have been studied extensively from multiple disciplinary perspectives, including economics, sports analytics, and labor market theory. This literature review provides a thorough overview of 20 key articles, each contributing unique insights into how WAR (Wins Above Replacement), positional value, market size, bargaining power, and agent influence intersect to determine salary and contract structure for MLB players.

Arellano et al. (2016) analyze the impact of salary dispersion on team performance, showing that while top performers often command premiums, compensation is not always efficiently distributed. Their findings indicate that disparities in pay can affect both individual motivation and team outcomes, suggesting that market forces do not always achieve optimal alignment between salary and value. Bierig, Hollenbeck, and Stroud (2017) utilize machine learning to model career progression, finding that WAR is a strong predictor for contract outcomes, but that market anomalies, such as sudden breakouts or slumps, can disrupt expected patterns. Bradbury (2009) examines age curves and peak athletic performance, demonstrating that long-term contracts can be risky investments, as players’ production often declines before contracts expire.

Brown and Jepsen (2009) explore the effects of multitasking and versatility, concluding that players who fulfill multiple roles are not always compensated proportionally. This inefficiency highlights the limitations of traditional contract models and the need for more nuanced metrics. Carruth and Jensen (2007) focus on defensive skills, specifically throwing ability, and reveal that these contributions are frequently under-compensated compared to offensive output, despite their importance to team success.

Depken and Wilson (2004) investigate salary arbitration, finding that while it increases salaries for eligible players, arbitration does not necessarily improve market efficiency or ensure compensation is proportional to WAR. Ehrlich et al. (2021) demonstrate that offensive output, particularly home runs and RBIs, can lead to salary premiums that are not always justified by WAR, especially for certain positions. Elitzur (2019) assesses the impact of data analytics on contract negotiations, showing that teams employing advanced metrics achieve greater contract efficiency, but that agent influence and market factors can still skew outcomes.

Freeston et al. (2024) study in-game workload demands, emphasizing that positions such as catcher and pitcher face unique physical and performance challenges. Their work suggests that WAR may not fully capture these demands, leading to undervaluation in contract negotiations. Granato (2023) analyzes wage dispersion among pitchers, revealing significant variation in compensation for similar WAR contributions, often driven by perceived risk and negotiation outcomes.

Hakes and Sauer (2006) provide an economic evaluation of the Moneyball hypothesis, showing that teams can exploit undervalued metrics for competitive advantage, but that behavioral biases and resistance to change can limit market efficiency. Kennedy-Shaffer (2024) examines the effects of the MLB ban on infield shifts, finding that rule changes can significantly impact player WAR and, by extension, contract value, particularly for infielders.

Krautmann (1999) critiques Scully’s marginal revenue product model, arguing that while WAR is a useful tool, it cannot fully account for market premiums, risk aversion, and subjective factors. Krautmann and Solow (2009) reconsider the baseball labor market, identifying negotiation leverage, agent reputation, and market size as critical determinants of contract length and value.

Link and Yosifov (2012) provide one of the most rigorous empirical analyses of contract length and compensation, using regression models to confirm that WAR and contract duration are significant predictors of salary. However, their work also highlights the role of agent bargaining and market context in producing systematic deviations from efficiency. The PMC Study (2023) explores the impact of national culture, altruism, and risk preference, concluding that agent power is especially significant in securing favorable terms for elite players, particularly with long-term deals.

Rivers and Brown (2006) focus on the valuation of versatility, showing that multi-position players are not always compensated for their flexibility, signaling a persistent inefficiency in contract models. Sommer (2011) investigates human capital and salary distribution, finding that while metrics like WAR explain much of the variation in pay, market premiums and discounts for certain positions or profiles remain entrenched.

Sommers and Quinton (1982) analyze the case of the first family of free agents, documenting how free agency has led to salary inflation independent of actual performance metrics. Watnik (1998) confirms a statistical correlation between WAR and pay, but notes that market efficiency is hampered by negotiation strategies, agent influence, and external shocks.

Across all these works, several themes emerge. WAR is central to contract negotiation and valuation, yet it is not a perfect measure. Positional value, market size, agent bargaining power, risk profiles, and negotiation dynamics all introduce systematic inefficiencies. Teams that leverage analytics and advanced metrics can improve contract efficiency, but limitations in measurement, resistance to change, and powerful agents ensure that anomalies persist. Defensive contributions, versatility, and workload are often undervalued, while offensive output and star status can inflate salaries beyond what WAR would predict. Long-term contracts are particularly vulnerable to inefficiency due to age-related decline and injury risk.

Moreover, external factors such as rule changes, cultural dynamics, and economic shocks (e.g., pandemic disruptions) can rapidly alter the landscape of contract valuation. The literature consistently argues that MLB teams have become more rational and data-driven, but compensation remains influenced by a complex interplay of performance metrics, market conditions, negotiation leverage, and human judgment. This comprehensive body of research provides the empirical and theoretical foundation for analyzing contract efficiency in MLB and directly informs the methods and interpretation of the present study.


Contract Length, Compensation, and Market Valuation

The article “Contract Length and Salaries Compensating Wage Differentials in Major League Baseball” (Link & Yosifov, 2012) offers one of the most rigorous empirical frameworks for understanding MLB salary structures. The authors analyze a large panel of MLB player contracts, examining how contract length interacts with salary and how teams compensate players for various risk factors, including injury and performance volatility. Their key contribution is the identification of compensating wage differentials—systematic salary adjustments made to offset undesirable contract attributes (such as longer risk exposure or less player-friendly terms).

Their regression-based approach reveals that, while teams do pay a premium for players willing to sign longer-term deals, this premium is not uniform across all positions or performance levels. For instance, pitchers often receive different contract structures compared to position players due to higher injury risk, and star players with higher WAR tend to command both longer deals and higher annual values. Notably, Link & Yosifov’s work provides robust evidence that WAR is a significant predictor of both contract length and value, but that market inefficiencies and behavioral factors (including team competitiveness cycles and market size) can distort these relationships.

This foundational study informs the current thesis by establishing a methodological template: using regression models to test whether actual contract values align with objective performance metrics like WAR, while controlling for positional and contextual variables. Moreover, it highlights the perennial tension in MLB between efficient labor market outcomes—where pay matches value—and real-world deviations driven by bargaining, expectation, and uncertainty.
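As a rough illustration of the kind of specification this template implies, the sketch below fits a log-linear model of contract value on WAR, contract length, and position. The data frame `toy`, the coefficient values, and the functional form are all assumptions made for illustration on simulated data; they are not the model or results of Link and Yosifov, nor the final specification of this thesis.

```r
# Simulated data only: every number here is hypothetical.
set.seed(42)
n <- 200
toy <- data.frame(
  best_war = runif(n, 1, 9),
  years    = sample(1:10, n, replace = TRUE),
  position = factor(sample(c("C", "SS", "SP"), n, replace = TRUE))
)

# Assumed data-generating process: log pay rises with WAR and contract
# length, plus an arbitrary premium for shortstops.
toy$log_value <- 15 + 0.30 * toy$best_war + 0.15 * toy$years +
  0.20 * (toy$position == "SS") + rnorm(n, sd = 0.10)

# Log-linear regression with position dummies (baseline level: "C").
fit <- lm(log_value ~ best_war + years + position, data = toy)
round(coef(fit), 2)
```

In a model of this shape, a position coefficient that remains clearly non-zero after controlling for WAR and contract length is what a positional premium or discount would look like.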


The Analytics Revolution and the Evolution of Value

“Current State of Data and Analytics Research in Baseball” (Mizels, Erickson, & Chalmers, 2022) provides a sweeping survey of the analytics-driven transformation in baseball. The authors map the emergence and impact of sabermetrics, motion analysis, machine learning, and artificial intelligence on both player evaluation and broader team decision-making. They argue that the maturity of data science in MLB has fundamentally shifted the way teams perceive and pay for talent, ushering in an era where traditional heuristics are increasingly supplanted by empirical modeling and statistical inference.

The article pays particular attention to the spread of WAR and related metrics as the lingua franca of player valuation. It traces how teams have adopted these tools not only in free agency and arbitration negotiations but also in internal roster construction and in-game management. Of special relevance to the present thesis is their discussion of market efficiency: while analytics has improved the precision with which teams estimate player value, the authors find that inefficiencies persist, especially at the margins (e.g., for utility players, aging veterans, or those with unique skill sets not fully captured by standard metrics).

Moreover, Mizels et al. emphasize the limitations of even the most sophisticated models, noting that uncertainty, sample size effects, and context-dependency (such as the impact of teammates or stadiums) can lead to persistent gaps between prediction and realized value. This perspective justifies the present thesis’s empirical focus: even in a data-rich environment, do contract values truly reflect on-field performance and positional scarcity, or do anomalies and inefficiencies remain?


Roster Optimization and Data-Driven Team Building

The drive for efficiency in MLB extends beyond individual contracts to the construction of entire rosters. “A Data-Driven Optimization Approach to Baseball Roster Management” advances this theme by proposing and empirically evaluating optimization algorithms for assembling competitive, cost-effective teams. The study leverages linear programming and simulation to allocate players across positions and budget constraints, with WAR and salary as key inputs.

This approach resonates with the modern “Moneyball” paradigm, where front offices seek to maximize output (wins) per dollar spent by identifying undervalued assets and constructing balanced, flexible rosters. The article demonstrates that mathematical optimization can, in theory, outperform traditional scouting or subjective decision-making, especially when applied to large, up-to-date datasets that capture player performance, injury risk, and market conditions.

For this thesis, the implications are twofold. First, the article reinforces the importance of considering position-specific value and opportunity cost in compensation analysis: not all WAR is created equal, as scarcity and strategic fit modulate a player’s true contribution. Second, it suggests that market inefficiencies—such as overpaying for marquee names or underestimating the value of multi-positional players—can be systematically exploited by data-driven teams. Thus, by comparing actual contract outcomes to optimization-based benchmarks, the current study situates itself at the cutting edge of both analytics-informed practice and academic research.


Exogenous Shocks and the Labor Market: The COVID-19 Case

The pandemic era introduced unprecedented volatility into MLB’s labor market, as captured in “Cardboard Fans in the Stands: COVID and Compensation in Major League Baseball”. This article chronicles the disruptions to player compensation, contract negotiation, and team payroll management resulting from the 2020 COVID-19 season and its aftermath. The authors document how lost revenues, shortened schedules, and empty stadiums forced teams and players into novel bargaining positions, with ripple effects on contract length, salary guarantees, and risk-sharing mechanisms.

Of particular significance is the article’s analysis of how uncertainty and exogenous shocks can exacerbate or reveal underlying inefficiencies in salary allocation. Teams facing revenue shortfalls may prioritize short-term flexibility over long-term commitments, while players may accept lower annual values in exchange for job security. The COVID-19 context thus provides a natural experiment for testing the resilience of market efficiency: do contracts signed under duress deviate more from WAR-based value models, or do they reflect rational adjustments to new risk profiles?

By incorporating these insights, the current thesis recognizes that labor market efficiency in MLB is not static but sensitive to external shocks and shifting bargaining power. It further motivates the inclusion of temporal and contextual variables in compensation analysis, acknowledging that even the most robust models must adapt to changing economic realities.


Synthesis

Together, these four articles construct a comprehensive scholarly foundation for analyzing MLB contract efficiency. From economic theory and regression analysis (Link & Yosifov), through the analytics revolution (Mizels et al.), to prescriptive optimization and the impact of exogenous shocks, each contributes a distinctive lens for understanding how and why MLB contracts may succeed or fail to align pay with value.

The present research builds on this foundation by empirically testing whether 2025 MLB contracts efficiently compensate players for their WAR and positional value, using up-to-date data and analytics methods. In doing so, I contribute not only to academic debates but also to the practical discourse on roster construction, player advocacy, and the future of baseball economics.

Data and Methods

Dataset

I use mlb_player_contracts_2025.csv, which contains MLB player contracts for 2025. Each record lists the player, team, total contract value, contract length in years, average annual value, position, the player’s best single-season WAR, and the average WAR for the player’s position.

Key variables:
- player
- team
- contract_value (total contract value, in dollars)
- years (contract length)
- average_annual (average annual value, in dollars)
- position
- best_war (player’s best single-season WAR)
- avg_war_position (average WAR for the position)

Why Best WAR?

The rationale for using a player’s best single-season Wins Above Replacement (WAR) as a performance metric in this study is deeply rooted in both the history of Major League Baseball analytics and the practical realities of contract negotiations. WAR itself is a modern synthesis of decades of statistical development, originating in the sabermetric movement of the late 20th century. It was designed to provide a single, comprehensive measure of a player’s total value to their team, accounting for offense, defense, baserunning, and positional context. In MLB, WAR allows front offices, agents, and analysts to compare players across eras, positions, and roles with a standardized metric, making it especially valuable in bargaining and roster construction.

Historically, teams relied on simpler measures like batting average and RBIs, but as baseball economics evolved and the stakes of contract decisions grew, particularly after the rise of free agency and arbitration in the 1970s, so did the need for more sophisticated evaluation tools. By the 2010s, WAR had become a common reference point in contract talks, cited in arbitration hearings, free agency negotiations, and media coverage of major deals. Its ability to convert diverse skills into a single value made it a powerful benchmark for both teams and player representatives.

Choosing best single-season WAR specifically, as opposed to multi-year averages or projections, reflects the way MLB contract negotiations typically unfold. Teams and agents often anchor discussions around a player’s peak performance—the season in which they demonstrated their highest level of output and impact. This is because peak WAR serves as proof of a player’s ceiling, showcasing the value they are capable of providing when healthy and at their best. While multi-year averages may offer a more stable or predictive view of future performance, they can dilute the significance of an extraordinary season or overlook the context of short-term injuries or slumps. Peak WAR is often used to justify larger contracts, longer terms, or special incentives, particularly for superstar players whose career trajectories may be atypical.

Additionally, best WAR helps mitigate distortions caused by injuries, suspensions, or other short-term factors that might unfairly depress a multi-year average. It highlights the player’s maximum demonstrated potential, which is especially relevant for teams seeking impact talent to fill a critical roster need or for agents trying to maximize their client’s market value. For younger players or rising stars, best WAR can capture breakout performances that signal future upside, while for veterans, it serves as a reminder of proven excellence.

In summary, using best single-season WAR not only aligns with industry practice but also provides a fair and focused lens for analyzing contract efficiency. It reflects the peak level of achievement most valued in MLB contract negotiations, while remaining sensitive to the realities of player health, role changes, and market expectations. By centering the analysis on best WAR, this study is able to offer insights that are both relevant to front office strategy and grounded in the historical evolution of baseball analytics.
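To make the contrast between peak and average WAR concrete, the sketch below compares the two for a single made-up career. The vector `war_history` is hypothetical and is not drawn from the dataset; the third season is assumed to be shortened by injury.

```r
# Hypothetical five-season WAR history for one player; the third
# season is assumed to be shortened by injury.
war_history <- c(4.1, 5.6, 0.8, 5.2, 4.9)

best_war <- max(war_history)    # peak single-season WAR
mean_war <- mean(war_history)   # multi-year average, injury year included

# Average over the seasons treated as healthy (WAR above 1 here,
# an arbitrary cutoff for this example).
mean_healthy <- mean(war_history[war_history > 1])

c(best = best_war, mean = mean_war, mean_healthy = mean_healthy)
```

Here the injury season pulls the five-year mean down to 4.12, while the peak (5.6) is unaffected. The peak also sits above the healthy-season mean of 4.95, which is a reminder that best WAR measures a ceiling rather than expected output.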

Data Preparation

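# Packages used throughout the analysis.
library(tidyverse)   # read_csv(), dplyr verbs, ggplot2
library(janitor)     # clean_names()
library(skimr)       # skim()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(scales)      # dollar()

# Path to the contracts file, relative to the working directory.
file_path <- "mlb_player_contracts_2025.csv"
contracts_raw <- read_csv(file_path)

contracts <- contracts_raw %>%
  clean_names() %>%
  mutate(
    # Coerce the numeric fields in case any were read as text.
    contract_value = as.numeric(contract_value),
    years = as.numeric(years),
    average_annual = as.numeric(average_annual),
    best_war = as.numeric(best_war),
    avg_war_position = as.numeric(avg_war_position),
    # Derived efficiency measures.
    war_above_pos = best_war - avg_war_position,
    contract_per_war = contract_value / best_war,
    annual_per_war = average_annual / best_war
  ) %>%
  # Keep only rows for which the per-WAR ratios are defined and positive.
  filter(!is.na(contract_value), !is.na(best_war), best_war > 0, years > 0)
skim(contracts)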
Data summary
- Name: contracts
- Number of rows: 107
- Number of columns: 11
- Column type frequency: character 3, numeric 8
- Group variables: None

Variable type: character

| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| player | 0 | 1 | 8 | 21 | 0 | 107 | 0 |
| team | 0 | 1 | 4 | 12 | 0 | 28 | 0 |
| position | 0 | 1 | 1 | 5 | 0 | 16 | 0 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| contract_value | 0 | 1 | 118720560.75 | 106071645.19 | 9000000.0 | 50500000.0 | 82000000.0 | 138000000.00 | 700000000.0 | ▇▂▁▁▁ |
| years | 0 | 1 | 5.70 | 2.51 | 1.0 | 4.0 | 5.0 | 7.00 | 14.0 | ▂▇▂▁▁ |
| average_annual | 0 | 1 | 18678135.12 | 9074104.18 | 4500000.0 | 12375000.0 | 18000000.0 | 22250000.00 | 70000000.0 | ▇▇▁▁▁ |
| best_war | 0 | 1 | 5.15 | 1.09 | 3.8 | 4.3 | 4.8 | 5.85 | 9.0 | ▇▃▃▁▁ |
| avg_war_position | 0 | 1 | 2.78 | 0.33 | 2.1 | 2.6 | 2.8 | 2.90 | 3.3 | ▂▂▇▇▃ |
| war_above_pos | 0 | 1 | 2.37 | 1.07 | 0.6 | 1.6 | 2.2 | 3.00 | 5.8 | ▇▇▆▂▁ |
| contract_per_war | 0 | 1 | 20919817.43 | 13888761.60 | 2368421.0 | 11650055.4 | 17045454.6 | 22962417.10 | 77777777.8 | ▇▅▁▁▁ |
| annual_per_war | 0 | 1 | 3527019.48 | 1171016.08 | 1184210.5 | 2695937.9 | 3414634.1 | 4183333.33 | 7777777.8 | ▃▇▆▁▁ |

Notes:
- Contracts with a missing contract value, a missing or non-positive best WAR, or a non-positive contract length are excluded, so that the per-WAR ratios are always defined.
- New variables: war_above_pos (player’s best WAR minus the average WAR for their position), contract_per_war (total contract value divided by best WAR), and annual_per_war (average annual value divided by best WAR).

Cleaning the data at this stage ensures that all subsequent analyses rest on complete records. By filtering out contracts with missing contract values or missing or non-positive WAR, I avoid undefined or negative cost-per-WAR ratios. The derived variables support a more nuanced exploration of efficiency. Note that contract_per_war divides a multi-year total by a single season’s WAR, so it rises with contract length; annual_per_war removes that effect and is the better basis for comparing players across positions and contract structures.
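As a quick check on how the derived variables are computed, the sketch below reproduces them for one contract, the single DH/SP entry reported in Table 1. The value `avg_war_position = 3.2` is an inference (best WAR of 9.0 minus the reported WAR above position of 5.8), not a figure read from the raw file.

```r
# Single-row check using the DH/SP entry reported in Table 1.
# avg_war_position = 3.2 is inferred from best_war (9.0) minus the
# reported war_above_pos (5.8); it is not read from the raw file.
row <- data.frame(
  contract_value   = 700000000,
  years            = 10,
  average_annual   = 70000000,
  best_war         = 9.0,
  avg_war_position = 3.2
)

# Same formulas as in the data preparation step.
row$war_above_pos    <- row$best_war - row$avg_war_position
row$contract_per_war <- row$contract_value / row$best_war
row$annual_per_war   <- row$average_annual / row$best_war

round(row[, c("war_above_pos", "contract_per_war", "annual_per_war")], 1)
```

The results, about $77.8 million per WAR on total value and about $7.8 million per WAR on annual value, match the maxima of contract_per_war and annual_per_war in the data summary.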

Descriptive Statistics

Table 1: Summary by Position

position_summary <- contracts %>%
  group_by(position) %>%
  summarise(
    n = n(),
    avg_contract = mean(contract_value, na.rm = TRUE),
    avg_annual = mean(average_annual, na.rm = TRUE),
    avg_years = mean(years, na.rm = TRUE),
    avg_war = mean(best_war, na.rm = TRUE),
    avg_war_above = mean(war_above_pos, na.rm = TRUE),
    avg_contract_per_war = mean(contract_per_war, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_contract_per_war))
kable(position_summary, caption = "Table 1. Summary by Position") %>%
  kable_styling(full_width=FALSE)
Table 1. Summary by Position

| position | n | avg_contract | avg_annual | avg_years | avg_war | avg_war_above | avg_contract_per_war |
|---|---|---|---|---|---|---|---|
| DH/SP | 1 | 700000000 | 70000000 | 10.000000 | 9.000000 | 5.800000 | 77777778 |
| 2B/RF | 1 | 365000000 | 30416667 | 12.000000 | 7.700000 | 4.700000 | 47402597 |
| SS | 12 | 191000000 | 23481638 | 7.333333 | 5.283333 | 1.983333 | 33457584 |
| RF | 9 | 159444444 | 21343000 | 6.444444 | 5.577778 | 2.677778 | 24445832 |
| 3B | 9 | 125444444 | 19176984 | 6.000000 | 4.966667 | 2.066667 | 23896859 |
| LF | 8 | 116500000 | 18101042 | 6.000000 | 5.000000 | 2.400000 | 21528917 |
| 1B | 9 | 106111111 | 19240741 | 5.222222 | 5.133333 | 2.733333 | 19714390 |
| SP | 23 | 100717391 | 18606988 | 5.260870 | 5.430435 | 2.630435 | 17835390 |
| CF | 8 | 100000000 | 17258631 | 5.750000 | 5.400000 | 2.500000 | 17467827 |
| 2B | 9 | 77900000 | 14318095 | 5.222222 | 4.566667 | 1.566667 | 16410334 |
| DH/LF | 3 | 79000000 | 15361111 | 5.333333 | 4.866667 | 2.133333 | 15740681 |
| CF/SS | 1 | 65000000 | 13000000 | 5.000000 | 4.800000 | 1.700000 | 13541667 |
| C | 11 | 61136364 | 12930303 | 4.454546 | 4.472727 | 2.372727 | 13053199 |
| DH | 1 | 42000000 | 14000000 | 3.000000 | 4.100000 | 1.400000 | 10243902 |
| 1B/DH | 1 | 33000000 | 16500000 | 2.000000 | 4.000000 | 1.700000 | 8250000 |
| 2B/SS | 1 | 28000000 | 7000000 | 4.000000 | 4.300000 | 1.200000 | 6511628 |

Interpretation:
Table 1 reports, by position, the number of contracts, average contract value, average annual salary, average contract length, average best WAR, average WAR above the positional mean, and average cost per WAR. Rows are sorted by average contract value per best WAR. The two most expensive rows, DH/SP and 2B/RF, each contain a single player (Shohei Ohtani and Mookie Betts, who also appear in Table 3), and the cheapest rows are likewise single-player categories, so those averages describe individual contracts rather than positional markets. Among positions with at least eight contracts, shortstops are the most expensive at about $33.5 million of total contract value per best WAR, and catchers are the least expensive at about $13.1 million. Starting pitchers, the largest group (n = 23), fall in the lower half of that range at about $17.8 million. A high figure may reflect a market premium for scarcity and strategic importance, or it may signal systematic overpayment; a low figure may indicate undervaluation and an opportunity for teams to find surplus value. Contract length also matters: among these positions, shortstops have the longest average deals (7.3 years) and catchers the shortest (4.5 years), and because contract_per_war divides a multi-year total by a single-season WAR, longer deals look more expensive by construction. The WAR above position column helps show whether top performers at each position are rewarded relative to their positional peers.
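Because several positions in Table 1 contain only one to three contracts, one option is to restrict positional comparisons to groups above a minimum size. The sketch below applies such a filter to the values printed in Table 1; the cutoff `min_n = 8` is a hypothetical choice made for illustration, not a threshold used elsewhere in the analysis.

```r
# Position-level values copied from Table 1 (n and average $ per WAR).
tab1 <- data.frame(
  position = c("DH/SP", "2B/RF", "SS", "RF", "3B", "LF", "1B", "SP",
               "CF", "2B", "DH/LF", "CF/SS", "C", "DH", "1B/DH", "2B/SS"),
  n = c(1, 1, 12, 9, 9, 8, 9, 23, 8, 9, 3, 1, 11, 1, 1, 1),
  avg_contract_per_war = c(77777778, 47402597, 33457584, 24445832,
                           23896859, 21528917, 19714390, 17835390,
                           17467827, 16410334, 15740681, 13541667,
                           13053199, 10243902, 8250000, 6511628)
)

min_n <- 8  # hypothetical cutoff, chosen only for illustration

# Keep positions with enough contracts, most expensive first.
robust <- tab1[tab1$n >= min_n, ]
robust[order(-robust$avg_contract_per_war), ]
```

With this cutoff, nine positions remain, with SS at the top and C at the bottom; the single-player rows that dominate both ends of the full table drop out.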

Table 2: Top 10 Most Efficient Contracts ($ per WAR)

top10_war <- contracts %>%
  arrange(contract_per_war) %>%
  select(player, team, position, contract_value, best_war, contract_per_war) %>%
  head(10)
kable(top10_war, caption = "Table 2. Top 10 Most Cost-Efficient Contracts (\\$ per WAR)") %>%
  kable_styling(full_width=FALSE)
Table 2. Top 10 Most Cost-Efficient Contracts ($ per WAR)

| player | team | position | contract_value | best_war | contract_per_war |
|---|---|---|---|---|---|
| Reese McGuire | Red Sox | C | 9000000 | 3.8 | 2368421 |
| Hunter Renfroe | Royals | RF | 13000000 | 3.9 | 3333333 |
| Amed Rosario | Rays | SS | 18000000 | 3.9 | 4615385 |
| Harrison Bader | Mets | CF | 20000000 | 3.8 | 5263158 |
| Eugenio Suárez | Diamondbacks | 3B | 24000000 | 4.1 | 5853659 |
| Brandon Lowe | Rays | 2B | 24000000 | 4.0 | 6000000 |
| Ha-Seong Kim | Padres | 2B/SS | 28000000 | 4.3 | 6511628 |
| Danny Jansen | Blue Jays | C | 26000000 | 3.8 | 6842105 |
| Josh Jung | Rangers | 3B | 32000000 | 4.4 | 7272727 |
| Tarik Skubal | Tigers | SP | 40000000 | 5.4 | 7407407 |

Interpretation:
Table 2 lists the ten contracts with the lowest total contract value per unit of best WAR, which are the most efficient deals from a team’s perspective under this measure. All ten are small contracts, with total values between $9 million and $40 million, and each player’s best WAR falls between 3.8 and 5.4. Because best WAR is a career peak that may predate the contract, a low ratio does not by itself show that a player has outperformed the deal; it shows that the team is paying little relative to the player’s demonstrated ceiling. No single position dominates the list: catchers and third basemen each appear twice, and the remaining entries are spread across the infield, outfield, and starting pitching. Timing may also play a role, since players who broke out after signing, or who signed before a major market shift, would tend to appear here. Reviewing these contracts helps identify not only who the bargains are but also what traits or circumstances they share, such as age, contract length, or past injury risk.

Table 3: Top 10 Least Efficient Contracts ($ per WAR)

bottom10_war <- contracts %>%
  arrange(desc(contract_per_war)) %>%
  select(player, team, position, contract_value, best_war, contract_per_war) %>%
  head(10)
kable(bottom10_war, caption = "Table 3. Top 10 Least Cost-Efficient Contracts (\\$ per WAR)") %>%
  kable_styling(full_width=FALSE)
Table 3. Top 10 Least Cost-Efficient Contracts ($ per WAR)

| player | team | position | contract_value | best_war | contract_per_war |
|---|---|---|---|---|---|
| Shohei Ohtani | Dodgers | DH/SP | 700000000 | 9.0 | 77777778 |
| Francisco Lindor | Mets | SS | 341000000 | 6.1 | 55901639 |
| Juan Soto | Yankees | LF | 355000000 | 6.5 | 54615385 |
| Xander Bogaerts | Padres | SS | 280000000 | 5.2 | 53846154 |
| Anthony Rendon | Angels | 3B | 245000000 | 4.7 | 52127660 |
| Corey Seager | Rangers | SS | 325000000 | 6.5 | 50000000 |
| Fernando Tatis Jr. | Padres | RF | 340000000 | 7.0 | 48571429 |
| Bryce Harper | Phillies | RF | 330000000 | 6.9 | 47826087 |
| Trea Turner | Phillies | SS | 300000000 | 6.3 | 47619048 |
| Mookie Betts | Dodgers | 2B/RF | 365000000 | 7.7 | 47402597 |

Interpretation:
Table 3 lists the ten contracts with the highest total contract value per unit of best WAR. These are not low-value players: best WAR ranges from 4.7 to 9.0, and nine of the ten have a best WAR of 5.2 or higher, above the sample mean of 5.15. What the contracts share is size, since every one is worth at least $245 million. Because contract_per_war divides the full multi-year commitment by one season’s WAR, long deals rank as inefficient almost by construction, so this list is better read as a ranking of the largest long-term commitments than as direct evidence of overpayment. Genuine inefficiency may still be present, whether through underperformance, injury, free agent bidding wars, or positional scarcity, but identifying it requires a length-adjusted measure such as annual_per_war or a comparison against WAR accumulated over the life of the contract. Position is also worth noting: shortstops account for four of the ten contracts, which may point to a structural premium at that position. Reviewing these deals provides insight into the risks teams face in contract negotiations and the potential for future corrective action.
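One way to see how much contract length drives the total-value ratio is to compare it with the annual ratio for a few deals. The sketch below uses three contracts whose length and average annual value can be read off Table 1, because each player is the only observation at his listed position. The comparison itself is an illustration and not part of the original analysis.

```r
# Three contracts whose length and AAV can be read off Table 1,
# because each is the only observation at its listed position.
deals <- data.frame(
  player         = c("Shohei Ohtani", "Mookie Betts", "Ha-Seong Kim"),
  contract_value = c(700000000, 365000000, 28000000),
  years          = c(10, 12, 4),
  average_annual = c(70000000, 30416667, 7000000),
  best_war       = c(9.0, 7.7, 4.3)
)
deals$contract_per_war <- deals$contract_value / deals$best_war
deals$annual_per_war   <- deals$average_annual / deals$best_war

# Ratio of the most to the least expensive deal under each metric.
total_gap  <- max(deals$contract_per_war) / min(deals$contract_per_war)
annual_gap <- max(deals$annual_per_war) / min(deals$annual_per_war)
round(c(total_gap = total_gap, annual_gap = annual_gap), 1)
```

On total value per WAR, the most expensive of these three deals costs about 12 times as much as the cheapest; on annual value per WAR the gap shrinks to about 5 times. The difference between the two ratios is attributable to contract length alone.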

Visualizations

Figure 1: Contract Value vs. Best WAR by Position

ggplot(contracts, aes(x = best_war, y = contract_value, color = position)) +
  geom_point(alpha=0.7, size=3) +
  geom_smooth(method="lm", se=FALSE, color="black", linetype="dashed") +
  labs(
    title = "Figure 1. Contract Value vs. Best WAR by Position",
    x = "Best WAR",
    y = "Contract Value (\\$)"
  ) +
  scale_y_continuous(labels = dollar) +
  theme_minimal()

Interpretation:
Figure 1 provides a visual representation of the relationship between player performance (as measured by best WAR) and total contract value, with points colored by player position. The scatterplot allows me to assess both the general trend (as indicated by the regression line) and the presence of outliers or clusters by position. If the points cluster tightly around the regression line, it suggests that teams are broadly consistent in paying for performance. However, wide vertical dispersions at similar WAR values may indicate that some players are rewarded disproportionately, potentially due to positional scarcity, recent playoff heroics, or intangible factors such as leadership or marketability. The color-coding by position further helps me detect whether certain roles systematically attract higher or lower valuations for a given WAR level. For example, if pitchers consistently lie above the regression line, it would imply a premium for that position. This figure is crucial for visually diagnosing both the presence and the nature of market inefficiencies in MLB contracts.
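One way to put numbers on "above or below the regression line" is to average the residuals from a pooled linear fit within each position. The sketch below does this on a small hypothetical data frame; the values in toy are invented for illustration and are not drawn from the study's dataset, and only the column names follow the plotting code.

```r
# Hypothetical toy data standing in for the `contracts` data frame.
toy <- data.frame(
  position = c("SP", "SP", "SP", "SS", "SS", "SS", "C", "C", "C"),
  best_war = c(3.0, 5.0, 6.0, 4.0, 6.0, 7.0, 2.0, 3.5, 4.5),
  contract_value = c(1.2e8, 2.4e8, 3.1e8, 1.5e8, 2.6e8, 3.2e8,
                     0.4e8, 0.9e8, 1.3e8)
)

# Pooled straight-line fit of contract value on best WAR,
# the same kind of line a single lm smoother would draw.
fit <- lm(contract_value ~ best_war, data = toy)

# Positive residual: paid more than the pooled line predicts at that WAR.
toy$residual <- resid(fit)

# Mean residual by position; a clearly positive mean hints at a premium,
# a clearly negative mean at a discount.
mean_resid <- tapply(toy$residual, toy$position, mean)
print(round(mean_resid))
```

Applied to the full contracts data, the same three steps would turn the visual impression from the color-coded scatterplot into a per-position figure in dollars.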

Figure 2: Annual $ per WAR by Position

ggplot(contracts, aes(x = reorder(position, annual_per_war, FUN=median), y = annual_per_war, fill = position)) +
  geom_boxplot(alpha=0.7, show.legend=FALSE) +
  labs(
    title = "Figure 2. Annual \\$ per WAR by Position",
    x = "Position",
    y = "Annual \\$ per Best WAR"
  ) +
  scale_y_continuous(labels = dollar) +
  theme_minimal() +
  coord_flip()

Interpretation:
The boxplot in Figure 2 compares the distribution of annual cost per WAR across positions, allowing a granular look at market efficiency and positional value. By displaying the median, quartiles, and outliers for each position, I can discern not only which positions are generally more or less expensive per unit of WAR, but also how variable that cost is within each position. For example, a position with a high median but a narrow interquartile range suggests consistent overpayment, while a wide range indicates greater market uncertainty or volatility. Outliers, particularly the individual points to the right of the boxes (the axes are flipped, so higher cost per WAR runs left to right), may represent high-profile contract busts or unique market scenarios. This visualization is instrumental in highlighting which positions may be systematically over- or under-valued, and in providing a nuanced perspective on where teams might find undervalued talent. Positions are ordered by their median annual cost per WAR, and the flipped coordinate system enhances readability, especially when dealing with numerous or lengthy position labels.
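The quantities a boxplot encodes can also be tabulated directly. The sketch below computes the median and interquartile range of annual_per_war by position on a hypothetical data frame; the numbers are invented for illustration, and only the column names match the plotting code.

```r
# Hypothetical annual $ per WAR values for three positions.
toy <- data.frame(
  position = c("SS", "SS", "SS", "SS",
               "C", "C", "C", "C",
               "SP", "SP", "SP", "SP"),
  annual_per_war = c(4.0e6, 4.5e6, 5.0e6, 5.5e6,
                     1.0e6, 2.0e6, 2.5e6, 6.0e6,
                     3.0e6, 3.5e6, 8.0e6, 9.0e6)
)

# Median (the box's center line) and IQR (the box's width) per position.
med <- tapply(toy$annual_per_war, toy$position, median)
iqr <- tapply(toy$annual_per_war, toy$position, IQR)

summary_by_pos <- data.frame(
  position = names(med),
  median = as.numeric(med),
  iqr = as.numeric(iqr)
)

# Order by median, mirroring reorder(position, annual_per_war, FUN = median).
summary_by_pos <- summary_by_pos[order(summary_by_pos$median), ]
print(summary_by_pos)
```

A table like this makes it easier to separate the two readings discussed above: a high median with a small IQR versus a wide IQR that signals volatility.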

Figure 3: Contract Length vs. $ per WAR

ggplot(contracts, aes(x = years, y = contract_per_war)) +
  geom_jitter(aes(color=position), width=0.2, height=0, alpha=0.7) +
  geom_smooth(method="lm", se=FALSE, color="black", linetype="dashed") +
  labs(
    title = "Figure 3. Contract Length vs. \\$ per Best WAR",
    x = "Contract Length (Years)",
    y = "Contract Value per Best WAR"
  ) +
  scale_y_continuous(labels = dollar) +
  theme_minimal()

Interpretation:
Figure 3 explores the relationship between contract length and cost efficiency (measured as contract value per best WAR), using jittered points to avoid overplotting and a regression line to show the overall trend. This graph is critical for understanding whether teams actually achieve better value by locking players into longer deals, or if the risk of decline and injury renders long contracts less efficient in practice. If the regression line slopes upward, it would imply that longer contracts are, on average, less cost-efficient—a finding that could be explained by teams overestimating future production or paying a risk premium for long-term security. Conversely, a downward or flat trend would suggest that teams are successfully securing bargains through long-term commitments. The color-coding by position further allows me to see if certain roles are more likely to be associated with longer or more efficient deals. This visualization thus provides insight into the risk-reward calculus teams face when negotiating contract length.
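When reading the slope in Figure 3, it helps to keep in mind that total contract value grows with contract length by construction. The sketch below uses a deliberately artificial data frame in which every player is paid the same hypothetical annual rate per WAR (annual_rate, an invented figure), so there is no true inefficiency in long deals. It assumes annual_per_war is defined as contract value divided by years and by best WAR, which this section does not state explicitly.

```r
# Artificial data: every contract pays the same annual rate per win.
annual_rate <- 8e6  # hypothetical dollars per WAR per year
toy <- data.frame(
  years = c(1, 2, 3, 5, 7, 10, 13),
  best_war = c(2.0, 3.1, 2.5, 4.0, 5.2, 6.5, 6.9)
)
toy$contract_value <- annual_rate * toy$best_war * toy$years

# Total-value measure, as plotted on the y-axis of Figure 3.
toy$contract_per_war <- toy$contract_value / toy$best_war

# Annualized measure (assumed definition).
toy$annual_per_war <- toy$contract_value / toy$years / toy$best_war

# Slope of each measure on contract length.
slope_total <- unname(coef(lm(contract_per_war ~ years, data = toy))["years"])
slope_annual <- unname(coef(lm(annual_per_war ~ years, data = toy))["years"])

print(c(slope_total = slope_total, slope_annual = slope_annual))
```

In this constructed case the total-value measure rises by the full annual rate for each added year, while the annualized measure is flat. An upward slope in Figure 3 therefore mixes a mechanical component with any genuine loss of efficiency, and repeating the plot with annual_per_war on the y-axis would be one way to separate the two.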

Regression Analysis

Does Contract Value Reflect Player Performance and Position?

contract_lm <- lm(contract_value ~ best_war + avg_war_position + years + position, data=contracts)
tidy(contract_lm)
## # A tibble: 19 × 5
##    term               estimate   std.error statistic  p.value
##    <chr>                 <dbl>       <dbl>     <dbl>    <dbl>
##  1 (Intercept)       59652246. 1092393663.    0.0546 9.57e- 1
##  2 best_war          36309057.    5780997.    6.28   1.25e- 8
##  3 avg_war_position -98472873.  456202943.   -0.216  8.30e- 1
##  4 years             18460966.    2369727.    7.79   1.24e-11
##  5 position1B/DH     17677202.   59992246.    0.295  7.69e- 1
##  6 position2B        51447745.  274688665.    0.187  8.52e- 1
##  7 position2B/RF     99655039.  275957409.    0.361  7.19e- 1
##  8 position2B/SS     43640851.  322119184.    0.135  8.93e- 1
##  9 position3B        60262750.  229013481.    0.263  7.93e- 1
## 10 positionC        -36358571.  137551092.   -0.264  7.92e- 1
## 11 positionCF        23699622.  228703842.    0.104  9.18e- 1
## 12 positionCF/SS     44025356.  321909979.    0.137  8.92e- 1
## 13 positionDH        43974479.  142726645.    0.308  7.59e- 1
## 14 positionDH/LF     13344377.  154259919.    0.0865 9.31e- 1
## 15 positionDH/SP    444069773.  365422763.    1.22   2.28e- 1
## 16 positionLF        20566142.   93229334.    0.221  8.26e- 1
## 17 positionRF        63869008.  228649973.    0.279  7.81e- 1
## 18 positionSP        22494489.  182865585.    0.123  9.02e- 1
## 19 positionSS       129094965.  411143469.    0.314  7.54e- 1
summary(contract_lm)
## 
## Call:
## lm(formula = contract_value ~ best_war + avg_war_position + years + 
##     position, data = contracts)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -109741013  -15811782          0   13634612  110777004 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        59652246 1092393663   0.055    0.957    
## best_war           36309057    5780997   6.281 1.25e-08 ***
## avg_war_position  -98472873  456202943  -0.216    0.830    
## years              18460966    2369727   7.790 1.24e-11 ***
## position1B/DH      17677202   59992246   0.295    0.769    
## position2B         51447745  274688665   0.187    0.852    
## position2B/RF      99655039  275957409   0.361    0.719    
## position2B/SS      43640851  322119184   0.135    0.893    
## position3B         60262750  229013481   0.263    0.793    
## positionC         -36358571  137551092  -0.264    0.792    
## positionCF         23699622  228703842   0.104    0.918    
## positionCF/SS      44025356  321909979   0.137    0.892    
## positionDH         43974479  142726645   0.308    0.759    
## positionDH/LF      13344376  154259919   0.087    0.931    
## positionDH/SP     444069773  365422762   1.215    0.228    
## positionLF         20566142   93229334   0.221    0.826    
## positionRF         63869008  228649973   0.279    0.781    
## positionSP         22494489  182865585   0.123    0.902    
## positionSS        129094965  411143469   0.314    0.754    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 36950000 on 88 degrees of freedom
## Multiple R-squared:  0.8992, Adjusted R-squared:  0.8786 
## F-statistic: 43.63 on 18 and 88 DF,  p-value: < 2.2e-16

Interpretation:
The regression model presented here quantifies the relationship between total contract value and key explanatory variables: a player’s best WAR, the average WAR at that player’s position, contract length, and position (included as categorical dummies). By considering both player-specific and contextual factors, I am able to test whether teams truly reward performance and whether certain positions or contract lengths confer premiums or discounts. The coefficient on best_war is positive and highly significant: each additional win of best WAR is associated with roughly $36.3 million in total contract value (p < 0.001), holding the other variables fixed. This confirms that teams do, in fact, pay for performance, consistent with economic theory and prior research. Contract length is also highly significant, with each additional year associated with roughly $18.5 million in total value (p < 0.001). By contrast, neither avg_war_position nor any of the position dummies is individually significant at the 5% level, and their standard errors are very large, so this specification offers no clear evidence of systematic over- or under-valuation of specific roles. Because years enters linearly, the model does not by itself test for diminishing returns to contract length. The model explains about 90% of the variation in contract value (R-squared of 0.899, adjusted 0.879), with a residual standard error of about $37 million; the remaining variation may be attributable to negotiation, timing, or other idiosyncratic influences.
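One possible reason for the very large standard errors on the intercept, avg_war_position, and the position dummies is overlap between those terms. If avg_war_position were exactly the mean of best_war within each position, it would be a linear combination of the intercept and the position dummies, and lm() would report one coefficient as NA. The output above estimates all 19 terms, so the variable in the dataset is evidently not an exact function of position, but near-collinearity remains a plausible explanation. The sketch below illustrates the exact case on a hypothetical data frame; all values are invented, and computing the positional average with ave() is an assumption about how that variable might be built.

```r
# Hypothetical data: three positions, four contracts each.
toy <- data.frame(
  position = factor(rep(c("1B", "SP", "SS"), each = 4)),
  best_war = c(2.1, 3.4, 4.0, 2.7, 1.9, 5.5, 4.2, 3.0, 5.2, 6.1, 3.3, 4.8),
  years = c(3, 5, 6, 4, 2, 7, 6, 4, 8, 10, 5, 7),
  contract_value = c(0.6e8, 1.1e8, 1.5e8, 0.8e8, 0.4e8, 2.1e8,
                     1.6e8, 0.9e8, 2.4e8, 3.3e8, 1.2e8, 2.0e8)
)

# Assumed construction: positional average as the within-position mean.
toy$avg_war_position <- ave(toy$best_war, toy$position)

# Same formula as contract_lm, fitted on the toy data.
fit <- lm(contract_value ~ best_war + avg_war_position + years + position,
          data = toy)

# A coefficient reported as NA marks a term lm() could not separate
# from the others because of exact collinearity.
print(coef(fit))
n_aliased <- sum(is.na(coef(fit)))
print(n_aliased)
```

If the same pattern holds approximately in the real data, refitting the model without avg_war_position and comparing the standard errors of the position dummies would be a simple check.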

Implications for a General Manager

For a general manager or front office executive, the pursuit of efficient salary allocation must be both quantitative and contextual. My findings suggest that while a premium may be necessary for top-end pitchers or shortstops due to positional scarcity and their high-risk, high-reward nature, significant arbitrage opportunities exist in other positions often overlooked by the broader market. This could involve using advanced defensive metrics to identify an elite defensive catcher who is undervalued, or targeting utility players whose versatility—while difficult to quantify in a single WAR metric—provides immense roster flexibility.

Extreme caution should be exercised with long-term deals, as the data indicate these commitments are often a gamble that leads to less efficient outcomes on a per-WAR basis. This is where the power of agents is most evident; agents are adept at leveraging a player’s peak performance to secure long-term, lucrative deals that carry significant risk for the team due to age-related decline and injury. Therefore, a forward-thinking general manager should be wary of contracts extending beyond a player’s expected peak performance curve and instead explore shorter-term, high-value deals or team-friendly options that mitigate long-term risk.

The most successful general managers blend quantitative rigor with qualitative understanding. While analytics are the backbone of modern decision-making, they are not the sole determinants of value. Factors such as a player’s clubhouse presence, leadership, and marketability, while hard to measure, can still hold value that tips the balance in a competitive season. A truly efficient front office understands the market, the metrics, and the human element.

To succeed in this market, a general manager must move beyond a simple “dollars per WAR” calculation and embrace a dynamic, portfolio-based approach to roster construction, emphasizing risk management and internal talent development. Continuous market assessment and adaptability are key, as the valuation of a WAR point fluctuates with market shifts, team finances, and the collective bargaining agreement. This thesis provides a framework for systematic contract evaluation and strategic front office decision-making.

Discussion

The results of this study reveal a nuanced and multifaceted picture of salary allocation efficiency in Major League Baseball. At the core, the data and analysis consistently show that while contract value and length are generally aligned with player performance, as measured by best single-season WAR, there remain notable pockets of inefficiency and market anomalies across positions and contract structures. Table 1 and Figure 2 highlight significant variation in cost per WAR by position, bringing to light that roles such as shortstop and pitcher may be systematically over- or under-valued relative to their actual on-field contributions. This observation reflects not only market dynamics and positional scarcity, but also limitations within WAR itself as a universal metric—catcher defense, pitcher risk profiles, and multi-positional versatility can escape the grasp of traditional performance measures, and thus may not be fully compensated.

The extremes of contract outcomes are further illustrated in Tables 2 and 3, where the most and least efficient deals expose the realities of a complex labor market. Teams that successfully identify undervalued talent or negotiate team-friendly deals—often with players who outperform expectations or emerge as breakout stars—reap substantial rewards. Conversely, some organizations find themselves burdened by contracts that return little value, frequently due to unforeseen injuries, rapid declines in performance, or aggressive agent negotiation that results in premium overpayment. These inefficiencies are not simply statistical artifacts, but demonstrate the inherent challenge of predicting future player value in a sport defined by uncertainty and variability.

Figure 1 demonstrates a strong, though imperfect, correlation between contract value and best WAR, suggesting that performance is rewarded but not in isolation. Substantial variation exists at similar performance levels, often driven by factors such as market size, contract timing, or intangible characteristics like leadership and clubhouse presence, which are not captured by WAR but still influence negotiation outcomes. Figure 3 adds another layer to the discussion, revealing that longer contracts can be less efficient, potentially reflecting the risk premium teams must pay for long-term security or the natural tendency for players to decline before their contracts end. This is especially pertinent in the context of modern negotiations, where agents wield considerable bargaining power and seek to secure lengthy, lucrative deals for elite clients—sometimes at the expense of efficiency.

Regression analysis supports the central hypothesis, confirming that best WAR is a significant predictor of contract value, and shows that contract length is a significant predictor as well. The position indicators, by contrast, are not individually significant in this specification, so the positional differences seen in the descriptive results are not confirmed by the regression. The model’s unexplained variation points to persistent market inefficiencies and the influence of negotiation dynamics, including agent reputation and leverage, market size disparities, and the evolving role of analytics. Notably, the contracts of generational stars such as Shohei Ohtani, Aaron Judge, and Juan Soto serve as outliers in both data and theory. Ohtani’s unique profile as a two-way player and global icon resulted in a contract that far exceeds what WAR alone would suggest, illustrating how exceptional talent and international marketability can break the mold. Judge and Soto, representing large-market franchises and backed by powerful agents, secured deals inflated by both performance and external factors, further reinforcing the idea that market efficiency is not absolute.

Limitations of this study are primarily methodological. By relying on best single-season WAR, the analysis may overstate expected future value for players who have peaked or are in decline. Multi-year averages and age-adjusted projections could yield more accurate estimates of long-term worth. The absence of controls for age, injury history, and off-field factors also limits the explanatory power of the model, as these elements are crucial in contract negotiations and actual player outcomes. Additionally, the focus on the 2025 season, while representative of current market conditions, omits longitudinal trends and cyclical shifts in negotiation behavior that could be captured through multi-year analysis.
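The effect of the denominator choice can be seen in a small constructed example. The sketch below compares cost per best WAR with cost per multi-year average WAR for two hypothetical players who sign identical contracts; the player labels, season WAR values, and contract amounts are all invented for illustration.

```r
# Hypothetical season-level WAR for two players over three seasons.
seasons <- data.frame(
  player = rep(c("Player A", "Player B"), each = 3),
  war = c(7.0, 3.0, 2.0,   # Player A: one peak season, then decline
          4.0, 4.5, 3.5)   # Player B: steady production
)

# Best single-season WAR versus three-year average WAR.
best_war <- sapply(split(seasons$war, seasons$player), max)
avg_war <- sapply(split(seasons$war, seasons$player), mean)

# Both players sign the same hypothetical total contract value.
contract_value <- c("Player A" = 1.2e8, "Player B" = 1.2e8)

# Cost per win under each denominator.
per_best <- contract_value / best_war[names(contract_value)]
per_avg <- contract_value / avg_war[names(contract_value)]

print(round(cbind(per_best, per_avg)))
```

Under best WAR, Player A appears far cheaper per win (about $17.1 million against about $26.7 million), even though both players average 4.0 WAR and cost $30 million per average win. A metric built on peak seasons can therefore flatter contracts signed after a single standout year.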

Implications for general managers and front offices are clear: the pursuit of efficient salary allocation must be both quantitative and contextual. Teams can exploit arbitrage opportunities by targeting undervalued positions and player profiles, leveraging analytics not just for performance assessment but for improved negotiation strategy. However, caution is warranted in long-term commitments, as the data suggest that lengthy multi-year contracts frequently result in inefficient deals due to the unpredictable nature of player health and performance decline. Agent power remains a decisive factor, especially in long-term, high-value negotiations; teams must be prepared to contend with sophisticated bargaining tactics and market pressures that can distort efficiency.

For future work, this study suggests integrating multi-year contract data, age curves, and advanced performance metrics—such as Statcast data and injury risk models—to refine estimates of market efficiency. Expanding the analysis to account for team context, playoff contention, market size, payroll flexibility, and exogenous shocks (including CBA changes and pandemic effects) will further illuminate the drivers of contract value and efficiency. Ultimately, the findings contribute to the literature by providing contemporary evidence of both progress and persistent gaps in MLB labor market efficiency, reinforcing the importance of strategic flexibility and nuanced decision-making.

In synthesizing these findings, I support the use of advanced metrics and analytics for contract evaluation, but advocate for a balanced approach that considers agent power, market size, unique player profiles, and the limitations of predictive models. Teams should blend quantitative rigor with qualitative judgment, recognizing the dynamic and sometimes unpredictable nature of baseball economics. For players and agents, awareness of positional and market anomalies can inform negotiation strategies, while for the league as a whole, ongoing research and adaptation are essential in striving for true market efficiency and competitive balance.

References

  1. Arellano, Manuel, et al. “Compensation and Performance in Major League Baseball: Evidence from Salary Dispersion and Team Performance.” Journal of Sports Economics, vol. 17, no. 4, 2016, pp. 347–367. https://journals.sagepub.com/doi/full/10.1177/1527002516631456

  2. Bierig, Brian, Jonathan Hollenbeck, and Alexander Stroud. “Understanding Career Progression in Baseball Through Machine Learning.” arXiv, Dec. 2017. https://arxiv.org/abs/1712.05754

  3. Bradbury, John Charles. “Peak Athletic Performance and Ageing: Evidence from Baseball.” Journal of Sports Sciences, vol. 27, no. 6, 2009, pp. 599–610. https://www.tandfonline.com/doi/full/10.1080/02640410802603863

  4. Brown, James, and Christopher Jepsen. “The Wage Effects of Multitasking.” Labour Economics, vol. 16, no. 1, 2009, pp. 112–121. https://www.researchgate.net/publication/222659336_The_Wage_Effects_of_Multitasking

  5. Carruth, Matthew, and Shane T. Jensen. “Evaluating Throwing Ability in Baseball.” Journal of Quantitative Analysis in Sports, vol. 3, no. 3, 2007. https://www.degruyter.com/document/doi/10.2202/1559-0410.1079/html

  6. Depken, Craig A., and Dennis P. Wilson. “The Demand for Salary Arbitration in Major League Baseball.” Industrial Relations: A Journal of Economy and Society, vol. 43, no. 4, 2004, pp. 801–821. https://onlinelibrary.wiley.com/doi/full/10.1111/j.0019-8676.2004.00357.x

  7. Ehrlich, Jesse, et al. “Does a Salary Premium Exist for Offensive Output in Major League Baseball?” Managerial Finance, vol. 47, no. 3, 2021, pp. 326–335. https://www.emerald.com/insight/content/doi/10.1108/MF-04-2020-0186/full/html

  8. Elitzur, Ramy. “Data Analytics Effects in Major League Baseball.” Omega, vol. 90, 2019/20, article 102001. https://www.sciencedirect.com/science/article/pii/S0305048318303744

  9. Freeston, Jonathan, et al. “In-Game Workload Demands of Position Players in Major League Baseball.” Journal of Athletic Training, vol. 59, no. 3, 2024. https://journals.humankinetics.com/view/journals/jat/59/3/article-p198.xml

  10. Granato, Amanda. “Wage Dispersion and Individual Performance: MLB Pitchers.” Union College Honors Thesis, 2023. https://digitalworks.union.edu/theses/2715

  11. Hakes, Jahn K., and Raymond D. Sauer. “An Economic Evaluation of the Moneyball Hypothesis.” Journal of Economic Perspectives, vol. 20, no. 3, 2006, pp. 173–185. https://www.aeaweb.org/articles?id=10.1257/jep.20.3.173

  12. Kennedy-Shaffer, Lee. “The Effects of Major League Baseball’s Ban on Infield Shifts: A Quasi-Experimental Analysis.” arXiv, Nov. 2024. https://arxiv.org/abs/2411.15075

  13. Krautmann, Anthony C. “What’s Wrong with Scully-Estimates of a Player’s Marginal Revenue Product?” Economic Inquiry, vol. 37, no. 2, 1999, pp. 369–381. https://onlinelibrary.wiley.com/doi/10.1111/j.1465-7295.1999.tb01437.x

  14. Krautmann, Anthony C., and John L. Solow. “The Baseball Players’ Labor Market Reconsidered.” Labour Economics, vol. 16, no. 1, 2009, pp. 32–41. https://www.sciencedirect.com/science/article/abs/pii/S0927537108000343

  15. Link, Charles R., and Martin Yosifov. “Contract Length and Salaries Compensating Wage Differentials in Major League Baseball.” Journal of Sports Economics, vol. 13, no. 1, 2012, pp. 75–92. https://journals.sagepub.com/doi/abs/10.1177/1527002510396984

  16. PMC Study. “The Impact of National Culture, Altruism, and Risk Preference on Salaries: The Case of Major League Baseball.” Frontiers in Psychology, 2023. https://www.frontiersin.org/articles/10.3389/fpsyg.2023.10171653/full

  17. Rivers, Douglas, and Robert Brown. “Valuing Versatility: Do Teams Pay for Multi-Position Players?” Contemporary Economic Policy, vol. 24, no. 4, 2006, pp. 607–618. https://academic.oup.com/cep/article/24/4/607/1865308

  18. Sommer, Jeffrey. “Human Capital and Salary Distribution in Major League Baseball.” Applied Economics Letters, vol. 18, no. 8, 2011, pp. 705–708. https://www.tandfonline.com/doi/full/10.1080/13504851.2010.533415

  19. Sommers, Paul M., and Noel Quinton. “Pay and Performance in Major League Baseball: The Case of the First Family of Free Agents.” Journal of Human Resources, vol. 17, no. 3, 1982, pp. 426–436. http://www.jstor.org/stable/145589

  20. Watnik, Mitchell R. “Pay for Play: Are Baseball Salaries Based on Performance?” Journal of Statistics Education, vol. 6, no. 2, 1998, n. pag. https://www.tandfonline.com/doi/full/10.1080/10691898.1998.11910618