Question 1

Recommendation: I would walk Robbie Grossman and take my chances with Luis Garcia against Zack Short. Garcia has performed vastly better against right-handed hitters (Grossman is a switch hitter; Short bats right-handed), while Short has struggled against right-handed pitching, with an OPS of just 0.438 in 92 plate appearances.

Rationale: Starting with Luis Garcia: as a right-handed pitcher, it is reasonable to assume that he would fare better against right-handed batters than against the switch-hitting Grossman. His numbers against righties are far better than those against lefties: he held righties to a 0.168 wOBA with a 1.92 FIP, compared with a 0.358 wOBA and 4.35 FIP against lefties. Zack Short (RHB) is in the lineup after pinch-running for Miguel Cabrera. With the switch-hitting Victor Reyes already off the bench, it is reasonable to plan to face Short after Grossman.

When evaluating the threat of Robbie Grossman versus Zack Short, Short certainly appears to be the less imposing matchup for Garcia. Grossman had posted a 106 wRC+ to date in the 2021 season against right-handed pitchers, while Short posted a 28 wRC+ against right-handed pitchers. The most pressing threat is the runner on third, and with two outs it will likely take a hit to score the runner; Grossman’s batting average against RHP is 0.222 while Short’s is 0.158.
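The comparison above can be put into rough numbers. The sketch below is a back-of-the-envelope calculation, not a full run-expectancy model: it assumes the runner on third scores only on a hit, and it ignores walks, errors, wild pitches, and the extra baserunner created by the intentional walk. The variable names are illustrative.

```r
# Batting averages vs. RHP quoted above (2021 season to date)
ba_grossman <- 0.222
ba_short    <- 0.158

# Simplifying assumption: with two outs, the runner on third scores only on a hit
abs_drop <- ba_grossman - ba_short   # absolute drop in the chance the run scores
rel_drop <- abs_drop / ba_grossman   # drop relative to facing Grossman

cat(sprintf("Absolute drop: %.3f\n", abs_drop))
cat(sprintf("Relative drop: %.1f%%\n", 100 * rel_drop))
```

Under these assumptions, facing Short instead of Grossman lowers the chance of the run scoring on a hit by roughly six percentage points, or a little over a quarter in relative terms.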

Other important considerations include the threat of the run Grossman represents. Fortunately, the Cardinals have Yadier Molina behind the plate, who was in the 69th percentile in blocks above average and the 98th percentile in caught stealing above average in 2021. Garcia’s fastball velocity was in the 99th percentile in 2021, while Grossman’s sprint speed was in the 67th percentile. Thus, given Molina’s ability to hold runners, it would likely take an extra-base hit from Short to score an additional run. Daz Cameron, the runner on third, did demonstrate 79th percentile sprint speed in 2021, though, so throwing through in a steal situation may not be advisable. Nonetheless, it is reasonable to assume that Molina’s ability behind the plate is a meaningful deterrent to Grossman stealing second. It is also important to consider that Jeimer Candelario, who follows Short, is an above-average hitter against RHPs in batting average and BB%. Ultimately, though, I think it is logical to assume that facing Zack Short or another bench bat gives the Cardinals a noticeably better chance to escape the inning without giving up a run, even if an expected value calculation may show only a marginal change in expected runs given the potential additional damage Grossman represents.

Question 2

Recommendation: Irrespective of team needs and payroll constraints, I would offer Anthony Santander a contract on par with Chris Taylor’s in 2021 and Lourdes Gurriel Jr.’s in 2023: in the neighborhood of 4 years/$68,000,000.

Rationale: In beginning to evaluate a potential contract for Anthony Santander, a good place to start is comparing his performance in 2023 and 2024 to that of players with similar profiles in their contract years. The table below compares Santander to some of the similar profiles I’ve identified. WAR (Wins Above Replacement), OAA (Outs Above Average), and OPS (On-Base Plus Slugging) are averages of the contract year and the year prior, while the trend columns indicate each player’s trajectory over those two seasons in each statistical category.

Player Performance Trends
Comparison of WAR, OAA, and OPS Averages and Trends in Contract Season and Prior Season
Player               Pos  Season  Age  WAR   Trend  OAA    Trend  OPS    Trend  Contract
Anthony Santander    OF   2024    29   2.80         −0.95         0.799         ?
Chris Taylor         INF  2021    30   3.88         0.19          0.812         $60M/4 years
Kris Bryant          OF   2021    29   2.05         −0.92         0.739         $182M/7 years
Lourdes Gurriel Jr.  OF   2023    29   2.51         −1.21         0.795         $42M/3 years
Michael Conforto     OF   2021    28   3.69         −1.50         0.828         $36M/2 years

Based on the market rate for players who produce as Santander has to this point, with an above-average bat and a limited glove, the trend has generally been toward shorter deals at around 7% of the luxury tax threshold in AAV.

While it may be difficult to predict whether Santander will face the same pitfalls that players like Bryant and Conforto have, a shorter deal will hedge against this risk and also give the player leverage for another contract. Thus, offering Santander a deal similar to Gurriel’s or Taylor’s seems appropriate after adjusting for the roughly 12% inflation in the luxury tax threshold since Taylor’s contract in 2021 and the 2% inflation since Gurriel’s. I would propose an AAV of $17,000,000 for Santander.
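The inflation adjustment can be written out as simple arithmetic. The sketch below uses only the contract figures from the table and the 12% and 2% adjustments quoted above; the variable names are illustrative.

```r
# Average annual value (AAV) of the two comparison contracts
taylor_aav  <- 60e6 / 4   # $60M over 4 years
gurriel_aav <- 42e6 / 3   # $42M over 3 years

# Scale each AAV by the luxury tax threshold inflation quoted in the text
taylor_adj  <- taylor_aav  * 1.12
gurriel_adj <- gurriel_aav * 1.02

# Proposed offer
proposed_aav   <- 17e6
proposed_total <- proposed_aav * 4

cat(sprintf("Taylor, adjusted:  $%.2fM\n", taylor_adj / 1e6))
cat(sprintf("Gurriel, adjusted: $%.2fM\n", gurriel_adj / 1e6))
cat(sprintf("Proposed total:    $%.0fM over 4 years\n", proposed_total / 1e6))
```

The adjusted comparables come out to about $16.8M and $14.3M per year, so a $17M AAV sits at the top of that range.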

Using the precedents of similar profiles’ longevity (i.e. Chris Taylor, Kris Bryant, and Michael Conforto), I would seek to sign Santander to a shorter deal that would allow him to have a reasonable expectation of another multi-year contract without my team being on the hook for it. Below is a plot demonstrating an aging curve for WAR production.

[Plot: aging curve for WAR production]

From the above plot, it is clear that by age 33, and perhaps slightly sooner, players’ production is likely to decline more rapidly. Thus, I would ideally suggest a three-year contract, but given Santander’s negotiating position, a deal of four or more years will likely be needed to be competitive in the market. Ideally, a market-rate AAV and the opportunity to pursue another multi-year contract will make this offer competitive.

Other considerations matter here, such as where a team is in its rebuilding process and its short- and long-term needs and roster construction at the OF and DH positions. The above data suggests Santander is a serviceable outfielder for the time being, but players of similar archetypes often transition to designated hitter.

Question 3

Recommendation: In order to maximize value per dollar spent, teams should seek to build a roster core via the MLB rookie draft as well as the international free agent pool given the outsized value provided per dollar spent on players who produce before signing an MLB free agent deal, or even an extension after reaching the major leagues.

Rationale: To see where players have traditionally brought the most value per dollar spent, I use salary data from 1985 to 2016 from the Lahman package. Combining this with FanGraphs’ WAR leaderboards, we can get a sense of which players brought their teams the most value and how they were acquired. Below are the top 20 player seasons in value added per dollar spent after adjusting for salary inflation.

Most Productive Seasons per Dollar Spent (Inflation Adjusted)
Inflation adjustment: salary evaluated relative to the yearly median salary
Rank Player Season WAR Salary WAR/$100K
1 Albert Pujols 2001 7.22 $200,000 3.61
2 Mike Trout 2013 10.15 $510,000 1.99
3 Lance Berkman 2001 6.84 $305,000 2.24
4 Paul Lo Duca 2001 5.12 $230,000 2.23
5 Ben Zobrist 2009 8.69 $415,900 2.09
6 Nomar Garciaparra 1997 6.39 $150,000 4.26
7 Mike Piazza 1993 7.43 $126,000 5.90
8 Frank Thomas 1991 7.26 $120,000 6.05
9 Troy Glaus 2000 8.19 $275,000 2.98
10 Mookie Betts 2016 7.37 $566,000 1.30
11 Manny Machado 2015 6.59 $548,000 1.20
12 Eric Hinske 2002 4.84 $200,000 2.42
13 Corey Koskie 2001 5.75 $300,000 1.92
14 Chase Utley 2005 7.15 $345,000 2.07
15 Kenny Lofton 1992 5.78 $110,000 5.26
16 Torii Hunter 2001 4.23 $230,000 1.84
17 Curtis Granderson 2007 7.90 $410,000 1.93
18 Josh Donaldson 2013 7.24 $492,500 1.47
19 Matt Carpenter 2013 7.18 $504,000 1.42
20 Kris Bryant 2016 7.49 $652,000 1.15
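As a minimal sketch of how the WAR/$100K column can be computed, the snippet below reproduces three rows of the table. The helper `war_per_median_salary()` is hypothetical: it shows one way the median-salary adjustment described above could be expressed, and is not necessarily the exact calculation behind the ranking.

```r
# A few rows copied from the table above
seasons <- data.frame(
  player = c("Albert Pujols", "Mike Trout", "Frank Thomas"),
  season = c(2001, 2013, 1991),
  war    = c(7.22, 10.15, 7.26),
  salary = c(200000, 510000, 120000)
)

# Nominal value: WAR per $100K of salary
seasons$war_per_100k <- round(seasons$war / (seasons$salary / 1e5), 2)

# Hypothetical helper: express salary in multiples of that season's median
# salary, so that seasons from different eras are comparable
war_per_median_salary <- function(war, salary, median_salary) {
  war / (salary / median_salary)
}

print(seasons)
```

With a table of yearly median salaries (for example, one computed from the salary data in the Lahman package), `war_per_median_salary()` could be applied row by row and the seasons ranked by the result.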

It’s clear that players provide the most value on rookie deals with their debut MLB team. Of the twenty players listed above, only one generated such outsized value on a free agent contract. With this said, there is a survivorship bias to consider. The 20 players on this list clearly made it to the major leagues with great success, but the risk involved with the rookie draft is baked into this price. According to Baseball America, about 22% of MLB first-round draft picks reach 1,000 hits, while 9% of second-round picks reach the same milestone. So for every player like Albert Pujols who rises through the ranks with great success, dozens ultimately amount to a sunk cost that will never show up as an unproductive contract on the major league team’s books.

With that disclaimer, though, these younger players are more likely to benefit from power law effects similar to those found in early-stage venture capital bets, where a single small bet can provide enough value to support an entire fund. Despite the inevitable bets that will fail, I believe the upside of the unicorns that may emerge provides enough value to warrant leaning heavily into the strategy of acquiring players before they reach the major leagues, specifically through the MLB rookie draft. By obtaining more of these players, teams can increase variance in the hope of achieving a small number of winning bets that provide large amounts of value, while freeing up more capital to be spent in more efficiently evaluated markets.

Alternatively, the international free agent pool has provided similar value in recent years. Recent notable international free agents include Ronald Acuña, Fernando Tatís Jr., and Juan Soto, each of whom brought disproportionate value on pre-arbitration rookie deals. There is less public data on the success rates here, and as a result this approach is likely more labor intensive, yet it is another way to make the most of each dollar spent.

Question 4

Recommendation: With appropriate data, I would use an XGBoost model trained on historical data to predict outcomes in a Monte Carlo simulation that would ultimately provide probabilities that can be used in game.

Rationale: To model stolen bases, there are two primary problems to solve: training a model on historical data, and then leveraging that model to produce in-game insights despite the unknowns before every pitch.

Choosing a model

An XGBoost model would be a reasonable starting point because of its ability to handle mixed data types and capture the non-linear relationships that are certainly present in this problem. We will frame this as a binary classification problem: is the stolen base attempt successful or not?

Training a model

The first step is training the model. Given clean data on the batter, fielders, and pitchers, this part is straightforward enough. The data I would look for includes the runner’s proximity to the starting base, pitch location, pitch spin, pitcher time to home, the runner’s maximum speed, and the runner’s average jump (let’s call it the distance a runner gains between the start of the pitcher’s motion and the release of the ball).

It’s clear that some of these variables cannot be known before a pitch. Thus, there are two primary approaches we could take to a model that estimates the probability of a successful steal. The first is a model, or series of models, that incorporates the necessary attributes to impute what the resulting pitch may look like. The second is to train a model on historical data and generate simulation data based on possible scenarios. Using the second approach, we can use the aforementioned data as a starting point and train a model with cross-validation to avoid overfitting.

Generating Simulation Data

Limitations of explainable models lead me to prefer the second approach. To implement it, we need to generate potential scenarios from the in-game context known before the pitch. So if Batter X is batting against Pitcher Y with two outs and runners at the corners, we can use either real data or synthetic data to predict potential details such as time to plate, pitch velocity, and pitch spin. This can be done by bootstrapping real data or by creating reasonable synthetic data from already observed data. The result should be a representative batch of the pitches Batter X could expect to see.

Creating Probabilities

For each plausible scenario, the synthetic data should give us a batch of data points that would be reasonable to see. We can use measures of variance for each attribute that is a random variable to account for the range of potential outcomes. Once this data is created, we can use predict_proba() to estimate the likelihood of a successful stolen base for each row, then collapse the rows into a single probability by taking the mean of the predicted probabilities. This can be done for each plausible scenario.
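The steps above can be sketched end to end in base R. This is only an illustration under loud assumptions: the historical data is entirely synthetic, the feature names are hypothetical, and a logistic regression (`glm`) stands in for the XGBoost classifier so that the example runs without extra packages. Here `predict(..., type = "response")` plays the role of predict_proba().

```r
set.seed(42)

# Synthetic "historical" steal attempts (illustration only, not real data)
n <- 2000
past <- data.frame(
  time_to_plate = rnorm(n, 1.35, 0.10),  # pitcher delivery time, seconds
  pitch_velo    = rnorm(n, 92, 4),       # mph
  runner_speed  = rnorm(n, 27.5, 1.2)    # ft/s
)
# Synthetic outcome: slow deliveries and fast runners favor the runner
lin <- 1 + 6 * (past$time_to_plate - 1.35) -
  0.05 * (past$pitch_velo - 92) + 0.6 * (past$runner_speed - 27.5)
past$success <- rbinom(n, 1, plogis(lin))

# Stand-in classifier: logistic regression instead of XGBoost
model <- glm(success ~ time_to_plate + pitch_velo + runner_speed,
             data = past, family = binomial())

# Scenario known before the pitch: a given runner against a given pitcher.
# The unknown pitch features are bootstrapped from that pitcher's
# (hypothetical) history; the runner's speed is treated as known.
pitcher_past <- past[past$time_to_plate > 1.40, c("time_to_plate", "pitch_velo")]
draws <- pitcher_past[sample(nrow(pitcher_past), 1000, replace = TRUE), ]
draws$runner_speed <- 28.5

# One predicted probability per simulated pitch, collapsed by the mean
p <- predict(model, newdata = draws, type = "response")
steal_prob <- mean(p)
cat("Estimated steal success probability:", round(steal_prob, 3), "\n")
```

Swapping in a gradient-boosted model would change only the fitting and prediction calls; the bootstrap and the averaging step stay the same.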

Question 5

Recommendation: Given historical data suggesting that divisional deficits beyond four games going into the All-Star game are unlikely to be surmounted, the team should not buy, and it should sell if its core is mostly past its “prime” (approximately age 29).

Rationale: The most recent data I could find on teams’ probabilities of winning their division came from a 2002 Baseball America article, which gave the probability of a team winning its division at various deficits behind the leader around the trade deadline. The table is shown below:

Deficit at Deadline Sample Size Won Division Pct.
0-½ 159 107 67%
1-1½ 19 8 42%
2-2½ 21 3 14%
3-3½ 31 10 32%
4-4½ 37 2 5%
5-5½ 32 3 9%
6-6½ 36 1 3%
7-7½ 36 2 6%
8-8½ 33 1 3%
9-9½ 33 0 0%
10 20 0 0%
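The counts in the table can also be pooled to summarize how rarely large deficits are overcome. The sketch below re-enters the table as a data frame (column names are illustrative), recomputes the percentage column, and pools every row with a deficit of four games or more.

```r
# Table above, re-entered by hand
deadline <- data.frame(
  deficit = c("0-0.5", "1-1.5", "2-2.5", "3-3.5", "4-4.5", "5-5.5",
              "6-6.5", "7-7.5", "8-8.5", "9-9.5", "10"),
  n   = c(159, 19, 21, 31, 37, 32, 36, 36, 33, 33, 20),
  won = c(107, 8, 3, 10, 2, 3, 1, 2, 1, 0, 0)
)

# Recompute the Pct. column from the raw counts
deadline$pct <- round(100 * deadline$won / deadline$n)

# Pool all rows with a deficit of four games or more (rows 5 through 11)
four_plus   <- deadline[5:11, ]
pooled_rate <- sum(four_plus$won) / sum(four_plus$n)   # 9 of 227

cat(sprintf("Teams 4+ games back that won the division: %d of %d (%.1f%%)\n",
            sum(four_plus$won), sum(four_plus$n), 100 * pooled_rate))
```

Pooled this way, 9 of 227 teams trailing by four or more games went on to win the division, or about 4%.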

From this table, it’s clear that our team should not make any moves with the division title in mind, so our attention turns to the wild card, where our team is six games back. Estimating our team’s probability of winning a wild card spot requires more inference, though.

To estimate the chances of catching the current leaders, we can run a quick simulation to see how often we finish in the top 3. I will assume that a team’s current winning percentage is an indicator of its future performance, and that there are three wild card leaders and three chasers, including our team.

With 10,000 simulations, we will see how often our team lands among the top 3 to get an idea of how realistic it is to make the playoffs.

set.seed(123)

# Simulation settings
n_sims <- 10000
games_remaining <- 60        # 102 played + 60 remaining = 162
wins_U <- 50                 # our team's current wins
games_played <- 102
potential_war_added <- 0     # wins added through acquisitions; 0 = stand pat

# Column order: our team, the two other chasers, then the three wild card holders
teams <- c("Team_U", "Team_C1", "Team_C2", "Team_3", "Team_2", "Team_1")
final_wins <- matrix(nrow = n_sims, ncol = length(teams))
colnames(final_wins) <- teams

for (i in 1:n_sims) {
  # Draw each wild card holder's lead over our team, in wins
  deficit_T1 <- sample(6:10, 1)
  deficit_T3 <- sample(1:6, 1)
  # Caution: when deficit_T3 == deficit_T1 (both 6), sample(6:6, 1) draws from
  # 1:6, because sample() treats a single number n as 1:n
  deficit_T2 <- sample(deficit_T3:deficit_T1, 1)
  wins_T1 <- wins_U + deficit_T1
  wins_T3 <- wins_U + deficit_T3
  wins_T2 <- wins_U + deficit_T2
  wins_C1 <- wins_U  # Tied with your team
  deficit_C2 <- sample(1:5, 1)
  wins_C2 <- wins_U - deficit_C2  # Behind your team

  # Current winning percentage is used as each team's true talent going forward
  win_pct_U <- (wins_U + potential_war_added) / games_played
  win_pct_T1 <- wins_T1 / games_played
  win_pct_T2 <- wins_T2 / games_played
  win_pct_T3 <- wins_T3 / games_played
  win_pct_C1 <- wins_C1 / games_played
  win_pct_C2 <- wins_C2 / games_played

  # Simulate the remaining games as independent binomial draws
  sim_wins_U <- rbinom(1, games_remaining, win_pct_U)
  sim_wins_T1 <- rbinom(1, games_remaining, win_pct_T1)
  sim_wins_T2 <- rbinom(1, games_remaining, win_pct_T2)
  sim_wins_T3 <- rbinom(1, games_remaining, win_pct_T3)
  sim_wins_C1 <- rbinom(1, games_remaining, win_pct_C1)
  sim_wins_C2 <- rbinom(1, games_remaining, win_pct_C2)

  # Final win totals = current wins + simulated wins
  total_wins_U <- wins_U + sim_wins_U
  total_wins_T1 <- wins_T1 + sim_wins_T1
  total_wins_T2 <- wins_T2 + sim_wins_T2
  total_wins_T3 <- wins_T3 + sim_wins_T3
  total_wins_C1 <- wins_C1 + sim_wins_C1
  total_wins_C2 <- wins_C2 + sim_wins_C2
  final_wins[i, ] <- c(total_wins_U, total_wins_C1, total_wins_C2, total_wins_T3, total_wins_T2, total_wins_T1)
}

# For each simulation, check whether our team ranks in the top 3 by final wins
# (ties are resolved by column order)
is_top3 <- apply(final_wins, 1, function(x) {
  rankings <- order(x, decreasing = TRUE)
  which(rankings == which(teams == "Team_U")) <= 3
})
prob_top3 <- mean(is_top3)
cat("Probability of your team finishing among the top 3 wild card teams:", prob_top3, "\n")
## Probability of your team finishing among the top 3 wild card teams: 0.1974

Based on the simulation, we can expect our team to make the playoffs about 20% of the time. This pretty much eliminates buying, considering our team would need to find about 8 wins above replacement through 100 games to have a 50% chance of making the playoffs.
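One way to explore how many added wins it takes to move that probability is to wrap the simulation in a function and sweep the equivalent of `potential_war_added`. The sketch below is a compact, vectorized variant rather than the code that produced the number above: the helper names `draw_int()` and `sim_top3()` are hypothetical, ties are resolved in our team's favor, and the random draws differ from the original run, so its estimates will not match exactly.

```r
set.seed(123)

# Uniform integer draws between lo and hi (safe even when lo == hi)
draw_int <- function(lo, hi) lo + floor(runif(length(lo)) * (hi - lo + 1))

sim_top3 <- function(added_wins = 0, n_sims = 10000, wins_U = 50,
                     games_played = 102, games_remaining = 60) {
  # Same deficit ranges as the simulation above
  d_T1 <- draw_int(rep(6, n_sims), rep(10, n_sims))
  d_T3 <- draw_int(rep(1, n_sims), rep(6, n_sims))
  d_T2 <- draw_int(d_T3, d_T1)
  d_C2 <- draw_int(rep(1, n_sims), rep(5, n_sims))

  # Current wins per simulation; first column is our team
  current <- cbind(U = wins_U, C1 = wins_U, C2 = wins_U - d_C2,
                   T3 = wins_U + d_T3, T2 = wins_U + d_T2, T1 = wins_U + d_T1)

  # Winning percentages; added wins only raise our team's assumed talent
  win_pct <- current / games_played
  win_pct[, "U"] <- (wins_U + added_wins) / games_played

  # Simulate remaining games for every team in every simulation
  sim <- matrix(rbinom(length(win_pct), games_remaining, win_pct), nrow = n_sims)
  final <- current + sim

  # Our rank = 1 + number of teams with strictly more wins
  rank_U <- rowSums(final[, -1] > final[, 1]) + 1
  mean(rank_U <= 3)
}

# Sweep a few hypothetical levels of added wins
added <- c(0, 4, 8)
probs <- sapply(added, sim_top3)
print(data.frame(added_wins = added, prob_top3 = probs))
```

Sweeping a finer grid of added wins would show roughly where the top-3 probability crosses 50% under these assumptions.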

The next consideration is where our team is in the team-building process. If the team seems to be ascendant and has a core that is largely younger than 28-29, I would suggest the team stand pat and add value in the offseason through free agency and, ideally, through the development of the farm system. Given the success of wild card teams in the current playoff structure, there is value in keeping together teams that have a chance to make the playoffs, even as a six seed. Oftentimes the signings that have pushed these teams over the edge have not required exorbitant amounts of money. The six seed has made the NLCS in each of the last three years. But as the aging curve above shows, after age 28 or 29 the drop-off accelerates. If the team can trade those pieces to contenders looking for a boost, that will likely net prospects who can return more long-term value than an aging player on a team that is barely good enough to compete for the playoffs. In the case of an aging core or a weak farm system, this team should sell and not hold out for the off chance of making the playoffs.

Thanks!