Valuing Stolen Bases

Author

Jackson Scott

Being able to steal a base is a valuable skill, but how valuable is it really? League-wide stolen base totals climbed steadily through the 1900s, peaking in the 1980s and 1990s, but the Moneyball era led to a decline, and totals eventually leveled off at around 2,700 a year, or a little more than one per game.

Decade Total Stolen Bases
1970 2458
1980 3126
1990 3138
2000 2796
2010 2725
2023 3503
2024 3617

As the table shows, total stolen bases have risen sharply since the rule changes made stealing easier. This isn't so much a change in team philosophy as a change in how valuable a stolen base attempt is in terms of runs. The goal of the game is to score more runs than the other team, so when evaluating offense it is best to look at how many runs we can expect from each action. To do this we can look at the aptly named run expectancy matrix.

Image credit to Fangraphs, if you want to read more about the matrix click here: https://library.fangraphs.com/misc/re24/

The basic idea of the run expectancy matrix is this: given the number of outs and the position of the runners on base, how many runs would a team be expected to score in the rest of that half inning? For example, using the matrix above, runners on first and second with one out are worth 0.908 expected runs. If the batter strikes out, the run expectancy drops to 0.343 expected runs. That particular matrix is from 2014, so I went and found a more recent one. Credit to user Sunnyveil on Reddit for calculating the values for the 2023 season; they actually created a matrix that includes the count as well. For our purposes we will start with just the matrix values when the count is 0-0.

Runners Zero.Outs One.Out Two.Outs
Empty 0.51 0.27 0.10
1 _ _ 0.91 0.55 0.24
_ 2 _ 1.16 0.72 0.33
1 2 _ 1.46 0.95 0.47
_ _ 3 1.42 0.99 0.38
1 _ 3 1.82 1.21 0.57
_ 2 3 2.01 1.38 0.57
1 2 3 2.25 1.59 0.80

The math for the value of a stolen base is simple: it's the run expectancy after the attempt minus the run expectancy before it. For example, take a runner on first with no outs.

successful attempt =  1.16 - 0.91 = .25 runs
unsuccessful attempt = 0.27 - 0.91 = -0.64 runs
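
The same subtraction can be sketched in Python. This is a minimal illustration rather than the code behind the article: the dictionary layout and the function name `steal_run_values` are assumptions, and only the three matrix cells needed for this example are included.

```python
# Run expectancy at a 0-0 count, 2023 values from the table above.
# Keys are (runners, outs); only the cells needed for this example are listed.
RE = {
    ("1__", 0): 0.91,  # runner on first, no outs
    ("_2_", 0): 1.16,  # runner on second, no outs
    ("___", 1): 0.27,  # bases empty, one out
}


def steal_run_values(before, after_safe, after_caught):
    """Return (gain if safe, loss if caught) as changes in run expectancy."""
    start = RE[before]
    return RE[after_safe] - start, RE[after_caught] - start


gain, loss = steal_run_values(("1__", 0), ("_2_", 0), ("___", 1))
print(f"safe: {gain:+.2f} runs, caught: {loss:+.2f} runs")
```

Keeping the base/out state as the lookup key means any other steal situation only needs its three states added to the dictionary.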

As we can see in this example, there is a 0.89-run difference in expected runs depending on whether the runner is caught, and there is more to lose than to gain: the run value lost by getting caught is more than twice the run value gained by stealing the base. To understand the dip in stolen bases from the peak of the late 1900s, we can look at the expected run value of a stolen base attempt, which weights each outcome's run value by its probability. For success rates I picked three seasons before the rule change: 2000, 2010, and 2022 (the last season before the change).

2000: 2,924/4,247 = 68.9%
2010: 2,959/4,088 = 72.4%
2022: 2,486/3,297 = 75.4%
Post rule change: 7,120/8,947 = 79.6%

Going back to the earlier table of stolen bases by year, the 2000s and 2010s had a similar mean of about 2,700. The seasons I picked out give further context, since you can see the attempts going down. My hypothesis is that the break-even point for a stolen base attempt being a net positive falls at about the 75% mark. Still using the runner-on-first, no-outs example, the expected run value is:

2000: 0.25(.689) + -0.64(.311) = -0.027 runs
2010: 0.25(.724) + -0.64(.276) = 0.004 runs 
2022: 0.25(.754) + -0.64(.246) = 0.031 runs
Post rule change: 0.25(.796) + -0.64(.204) = 0.068 runs
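
The weighting above, and the break-even rate it implies, can be sketched as follows. This is a hedged illustration: the function names are assumptions, and the break-even formula is just the expected value equation set to zero and solved for the success rate, which lands near 72% for this example.

```python
GAIN = 0.25   # run value of a successful steal of second with no outs
LOSS = -0.64  # run value of being caught in the same situation


def expected_run_value(success_rate, gain=GAIN, loss=LOSS):
    """Probability-weighted run value of a single steal attempt."""
    return success_rate * gain + (1 - success_rate) * loss


def break_even_rate(gain=GAIN, loss=LOSS):
    """Success rate at which the expected run value is exactly zero.

    Solves p * gain + (1 - p) * loss = 0 for p.
    """
    return -loss / (gain - loss)


# League-wide success rates quoted above.
league_rates = {"2000": 0.689, "2010": 0.724, "2022": 0.754, "2023-24": 0.796}
for season, rate in league_rates.items():
    print(season, round(expected_run_value(rate), 3))
print("break-even:", round(break_even_rate(), 3))
```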

Looking at the expected value of the stolen base attempt, my hypothesis is supported. The break-even point for a net zero change in run expectancy is a success rate of about 71.9% (0.64/0.89). And like I guessed, to make it clearly worth stealing a team would have to succeed about 74%-75% of the time. This explains the boom in stolen bases since the rule change. At the near-80% success rate now, an attempt is worth roughly double the expected runs it was in 2022, and almost a tenth of a run more than at the 68.9% rate in 2000. I will say this doesn't take into account that in this situation the base runner is stealing second, and the probability of stealing second is different from that of all stolen bases in general. It also uses only one of the possibilities in the matrix. To get a better idea of the full scope we want to calculate the expected value for every stolen base situation in the matrix except stealing home.

Event(Outs) 2000_EV 2010_EV 2022_EV 2023+2024_EV
1st to 2nd(0) -0.0267900 0.0043600 0.0310600 0.0684400
1st to 2nd(1) -0.0228200 -0.0011200 0.0174800 0.0435200
1st to 2nd(2) -0.0126300 -0.0010800 0.0088200 0.0226800
2nd to 3rd(0) -0.0976500 -0.0574000 -0.0229000 0.0254000
2nd to 3rd(1) -0.0067900 0.0243600 0.0510600 0.0884400
2nd to 3rd(2) -0.0681800 -0.0548800 -0.0434800 -0.0275200
1st and 2nd to 2nd and 3rd(0) 0.1488100 0.1939600 0.2326600 0.2868400
1st and 2nd to 2nd and 3rd(1) 0.1034500 0.1402000 0.1717000 0.2158000
1st and 2nd to 2nd and 3rd(2) -0.0772700 -0.0573200 -0.0402200 -0.0162800
1st and 3rd to 2nd and 3rd(0) -0.1272200 -0.0915200 -0.0609200 -0.0180800
1st and 3rd to 2nd and 3rd(1) -0.1410000 -0.1060000 -0.0760000 -0.0340000
1st and 3rd to 2nd and 3rd(2) -0.1772700 -0.1573200 -0.1402200 -0.1162800
EVmean -0.0421133 -0.0136467 0.0107533 0.0449133
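
The table above can be reproduced with a short sketch like the one below. The state encoding and function names are assumptions made for illustration. The "caught" states are also an assumption inferred from the table's values: the stealing runner is out, and on a double steal from first and second the lead runner is the one thrown out.

```python
# 2023 run expectancy at a 0-0 count, indexed by base state, then outs (0, 1, 2).
RE = {
    "___": (0.51, 0.27, 0.10),
    "1__": (0.91, 0.55, 0.24),
    "_2_": (1.16, 0.72, 0.33),
    "12_": (1.46, 0.95, 0.47),
    "__3": (1.42, 0.99, 0.38),
    "1_3": (1.82, 1.21, 0.57),
    "_23": (2.01, 1.38, 0.57),
}

# (label, state before, state if safe, state if caught).
# The caught states are inferred, not stated in the article.
STEALS = [
    ("1st to 2nd", "1__", "_2_", "___"),
    ("2nd to 3rd", "_2_", "__3", "___"),
    ("1st and 2nd to 2nd and 3rd", "12_", "_23", "_2_"),
    ("1st and 3rd to 2nd and 3rd", "1_3", "_23", "__3"),
]


def run_expectancy(state, outs):
    """Expected runs for the rest of the inning; zero once the third out is made."""
    return 0.0 if outs >= 3 else RE[state][outs]


def steal_ev(before, safe, caught, outs, success_rate):
    """Expected change in run expectancy for one attempt."""
    start = run_expectancy(before, outs)
    gain = run_expectancy(safe, outs) - start
    loss = run_expectancy(caught, outs + 1) - start
    return success_rate * gain + (1 - success_rate) * loss


def ev_table(success_rate):
    """Expected value of every steal type and out count at one success rate."""
    return {
        f"{label}({outs})": steal_ev(before, safe, caught, outs, success_rate)
        for label, before, safe, caught in STEALS
        for outs in (0, 1, 2)
    }


table_2000 = ev_table(0.689)
mean_2000 = sum(table_2000.values()) / len(table_2000)
for event, value in table_2000.items():
    print(f"{event:32s} {value: .4f}")
print(f"{'EVmean':32s} {mean_2000: .4f}")
```

Calling `ev_table` with 0.724, 0.754, or 0.796 gives the other three columns.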

This table gives some interesting insights into stolen bases in general, like that stealing with two outs is generally a bad idea for increasing run expectancy. That makes sense because 1) to score you would still need a hit and 2) if you get caught stealing, the inning ends and your run expectancy goes to 0. The last column gives the best information for what I am looking for because it shows a huge difference between stealing at the league average rate in past years and stealing at the rate since the rule change. If the league were still stealing at the 75% clip from 2022, the mean run value gained would be about even, whereas since the rule change teams are on average gaining 0.045 expected runs per stolen base attempt. The big flaw with this data is that the league average success rate may have been 79.6%, but that doesn't mean each individual scenario had that rate. For example, maybe the league average for stealing second was actually 82%, but for stealing third it was 72%. That would change our values a lot. I couldn't find the data for stealing specific bases, so I thought it would be better to build a matrix with the expected run values at each success rate, which gives a good idea of the probability of success needed to make a steal worth attempting.

Event(Outs) 50% ERV 55% ERV 60% ERV 65% ERV 70% ERV 75% ERV 80% ERV 85% ERV 90% ERV
1st to 2nd(0) -0.1950000 -0.1505000 -0.1060 -0.0615000 -0.0170000 0.0275 0.0720000 0.1165000 0.1610
1st to 2nd(1) -0.1400000 -0.1090000 -0.0780 -0.0470000 -0.0160000 0.0150 0.0460000 0.0770000 0.1080
1st to 2nd(2) -0.0750000 -0.0585000 -0.0420 -0.0255000 -0.0090000 0.0075 0.0240000 0.0405000 0.0570
2nd to 3rd(0) -0.3150000 -0.2575000 -0.2000 -0.1425000 -0.0850000 -0.0275 0.0300000 0.0875000 0.1450
2nd to 3rd(1) -0.1750000 -0.1305000 -0.0860 -0.0415000 0.0030000 0.0475 0.0920000 0.1365000 0.1810
2nd to 3rd(2) -0.1400000 -0.1210000 -0.1020 -0.0830000 -0.0640000 -0.0450 -0.0260000 -0.0070000 0.0120
1st and 2nd to 2nd and 3rd(0) -0.0950000 -0.0305000 0.0340 0.0985000 0.1630000 0.2275 0.2920000 0.3565000 0.4210
1st and 2nd to 2nd and 3rd(1) -0.0950000 -0.0425000 0.0100 0.0625000 0.1150000 0.1675 0.2200000 0.2725000 0.3250
1st and 2nd to 2nd and 3rd(2) -0.1850000 -0.1565000 -0.1280 -0.0995000 -0.0710000 -0.0425 -0.0140000 0.0145000 0.0430
1st and 3rd to 2nd and 3rd(0) -0.3200000 -0.2690000 -0.2180 -0.1670000 -0.1160000 -0.0650 -0.0140000 0.0370000 0.0880
1st and 3rd to 2nd and 3rd(1) -0.3300000 -0.2800000 -0.2300 -0.1800000 -0.1300000 -0.0800 -0.0300000 0.0200000 0.0700
1st and 3rd to 2nd and 3rd(2) -0.2850000 -0.2565000 -0.2280 -0.1995000 -0.1710000 -0.1425 -0.1140000 -0.0855000 -0.0570
Mean ERV for SB Percentage -0.1958333 -0.1551667 -0.1145 -0.0738333 -0.0331667 0.0075 0.0481667 0.0888333 0.1295
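
Because each row is linear in the success rate, the break-even rate for every situation can be read off directly rather than estimated from the grid. The sketch below is an illustration under assumptions: the gain and loss pairs are the run expectancy differences implied by the 2023 matrix and the table above, and the function names are hypothetical. Under these numbers the break-even for the average attempt comes out at roughly 74%.

```python
# (gain if safe, loss if caught) in runs for each row of the table above,
# derived from the 2023 0-0 run expectancy matrix. Order matches the table.
GAIN_LOSS = {
    "1st to 2nd(0)": (0.25, -0.64),
    "1st to 2nd(1)": (0.17, -0.45),
    "1st to 2nd(2)": (0.09, -0.24),
    "2nd to 3rd(0)": (0.26, -0.89),
    "2nd to 3rd(1)": (0.27, -0.62),
    "2nd to 3rd(2)": (0.05, -0.33),
    "1st and 2nd to 2nd and 3rd(0)": (0.55, -0.74),
    "1st and 2nd to 2nd and 3rd(1)": (0.43, -0.62),
    "1st and 2nd to 2nd and 3rd(2)": (0.10, -0.47),
    "1st and 3rd to 2nd and 3rd(0)": (0.19, -0.83),
    "1st and 3rd to 2nd and 3rd(1)": (0.17, -0.83),
    "1st and 3rd to 2nd and 3rd(2)": (0.00, -0.57),
}


def erv(success_rate, gain, loss):
    """Expected run value of one attempt at the given success rate."""
    return success_rate * gain + (1 - success_rate) * loss


def break_even(gain, loss):
    """Success rate where the expected run value crosses zero."""
    return -loss / (gain - loss)


def mean_erv(success_rate):
    """Unweighted mean over all twelve situations, like the table's last row."""
    values = [erv(success_rate, g, l) for g, l in GAIN_LOSS.values()]
    return sum(values) / len(values)


mean_gain = sum(g for g, _ in GAIN_LOSS.values()) / len(GAIN_LOSS)
mean_loss = sum(l for _, l in GAIN_LOSS.values()) / len(GAIN_LOSS)
mean_break_even = break_even(mean_gain, mean_loss)

for event, (g, l) in GAIN_LOSS.items():
    print(f"{event:32s} break-even {break_even(g, l):.1%}")
print(f"{'Mean of all situations':32s} break-even {mean_break_even:.1%}")
```

A break-even of 100% in the last row means that steal never adds expected runs under these assumptions, since a success leaves run expectancy unchanged.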

Now we get a good idea of the difference in run value based on your stolen base percentage. Looking at the means, even if you succeed 80% of the time, an attempt nets you less than 1/20 of a run, so why steal at all? The answer is simple: to win games you have to score runs, and it has been estimated that about 10 runs of run differential is worth about one win. Taking this all into account, let's say team A can steal any base at an 80% rate but chooses not to, and they win 80 games. Team B can also steal bases at an 80% rate, and before taking stolen bases into account they had the same run differential as team A. If team B attempted 100 steals, the increase in expected run value would net them about 4.8 more runs than team A. While 5 runs is only about half a win by that rule of thumb, it could potentially swing a game or two, which could be the difference between a playoff appearance and being eliminated. I'm sure Arizona would've loved an extra win last year. Speaking of Arizona, in 2023 they added 20 runs by our rough estimate, and those 2-3 added wins were the difference in making the playoffs, which eventually led to them making the World Series. Even more important, the run matrix is league-wide, so it doesn't perfectly represent every team. Better teams are expected to score more runs, so those 5 extra expected runs on an average team could look more like 10 to 15 on a playoff-caliber team. The one thing to look out for is that stealing more isn't just free runs: the more you steal, the more teams will prepare for it and the less successful you'll be, and most of the value of stealing at an 80% rate is lost if the percentage drops to 75%. It's about finding the sweet spot of maximum expected run value. Using the two seasons before and after the rule change, we can try to visualize this.
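
The team A versus team B arithmetic can be sketched as below. This is a rough illustration that assumes the mean expected run values from the table and the 10-runs-per-win rule of thumb mentioned in the text. The function names are hypothetical.

```python
# Mean expected run value per attempt, taken from the last row of the table.
MEAN_ERV = {0.75: 0.0075, 0.80: 0.0481667}
RUNS_PER_WIN = 10  # rule of thumb cited in the text


def runs_added(attempts, success_rate):
    """Expected runs gained from a number of attempts at a tabulated rate."""
    return attempts * MEAN_ERV[success_rate]


def wins_added(runs, runs_per_win=RUNS_PER_WIN):
    """Convert a run total into wins using the runs-per-win rule of thumb."""
    return runs / runs_per_win


runs_at_80 = runs_added(100, 0.80)
runs_at_75 = runs_added(100, 0.75)
print(f"100 attempts at 80%: {runs_at_80:.1f} runs, {wins_added(runs_at_80):.2f} wins")
print(f"100 attempts at 75%: {runs_at_75:.1f} runs, {wins_added(runs_at_75):.2f} wins")
```

Comparing the two lines shows how sensitive the total is to a five-point drop in success rate.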


The plot shows a positive linear relationship between run value added and total attempts. This makes sense because the more value you gain from stealing the more likely you are to steal, and as long as you have a positive EV per steal, which happens at around that 74% mark, you gain value with every attempt. This doesn't say much about the sweet spot for attempts that doesn't sacrifice percentage, so let's look at a slightly different chart.


Instead of total expected run value, looking at SB% gives a good idea of efficiency versus volume. To maximize value you want to steal as efficiently and as often as possible, so you would expect a positive linear correlation like in the last chart: the better you are at stealing the more you do it, and if you're bad at stealing you steal less. This is somewhat the case before the rule change, but after it the efficiency is pretty even regardless of attempts. Since the change, only 6 of 60 team seasons fall below the 74% threshold above which stealing produces positive expected value.

Looking at the team with the most attempts, the Nationals had 296, almost 100 more than most teams, but created only 3 runs of value. This is our best indication of where the number of attempts may start to hurt your SB%, but it is only one data point. The teams closest to them all had about 250 attempts, but unlike the Nationals they created 10+ runs, and in the Brewers' case 20 runs. Three of the highest run values created came from that group with about 250 attempts, but other teams were able to create similar run value with fewer than 200 attempts. Those teams had SB% in the upper 80s, about as high as possible, which raises the question: would stealing more hurt these teams' percentage, and if it did, would the gain in run value be worth it?

The Reds are a good example to look at because the majority of their steals came from the same players in 2023 and 2024. The Reds attempted 238 steals at 79% in 2023, which produced 11 expected runs, good for 13th most in the data set. In 2024 they attempted 252 steals but improved to 82%, boosting their run value to 16, almost a 50% increase. The Reds stole a lot, then did it again and went up in efficiency.

While this isn't close to a definitive answer to our questions, I think it supports the idea that stealing more won't make you worse at stealing. Looking at the data there's no evidence to reject that either: teams all had similar efficiency regardless of attempts after the rule change.

Again, if you look at the box plots for percentages and attempts, there's very little difference in the number of steals or the percentage. This could mean that the league thinks it has already found a middle ground for stealing efficiency, or that teams have yet to test the true limits. I think we will see a couple of teams push the limits and see how much value stolen bases can really bring.

If we can hypothesize that attempting more steals won't drastically affect percentages, then we can also hypothesize that stealing more will help your team's chances of winning. Since the rule change, 90% of teams have generated positive expected run value per steal, so why not steal as much as possible? Now that teams have had two seasons of data, I think they can see this too, which is why I think the league will break the stolen bases per team record of about 173 set in 1987. This would not only smash the league-wide record of about 3,600 steals set last year, but also break the 4,000-steal barrier. It has never been easier to steal bases and players have never been more athletic. I believe that teams are underutilizing stolen bases as a way to generate more runs, even if they are stealing at record numbers and efficiency. The team I think has utilized this best is the Brewers, and base running in general is why they have been able to consistently outperform expectations.

Our numbers so far are just super rough estimates: using the mean expected value of all stolen bases and each team's mean stolen base percentage is ineffective for precise numbers. Because of this, the last thing I want to look at is a more precise value that each team and player has created. To do this I will take each stolen base attempt and, based on the type of steal and the result, look up the run value that the steal type and result give. I will also compare this to expected value to see which players and teams are adding the most value through stolen bases.
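
One reading of this per-attempt accounting is sketched below. It is only an illustration: the attempt log is made-up data, the team names are placeholders, and the lookup table covers just two no-out steal types using the run expectancy differences from earlier in the article.

```python
# Run value of each result by (steal type, outs), from the 2023 matrix
# differences used earlier. Only two situations are listed for brevity.
RUN_VALUES = {
    ("1st to 2nd", 0): {"SB": 0.25, "CS": -0.64},
    ("2nd to 3rd", 0): {"SB": 0.26, "CS": -0.89},
}

# Hypothetical attempt log: (team, steal type, outs, result). Not real data.
attempts = [
    ("Team A", "1st to 2nd", 0, "SB"),
    ("Team A", "1st to 2nd", 0, "CS"),
    ("Team A", "2nd to 3rd", 0, "SB"),
    ("Team B", "1st to 2nd", 0, "SB"),
]


def total_run_value(log):
    """Sum the run value of every attempt, grouped by team."""
    totals = {}
    for team, steal, outs, result in log:
        value = RUN_VALUES[(steal, outs)][result]
        totals[team] = totals.get(team, 0.0) + value
    return totals


totals = total_run_value(attempts)
for team, value in totals.items():
    print(f"{team}: {value:+.2f} runs")
```

Grouping by player instead of team only requires changing the first field of each record.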

Tm Year 2nd+3rd RV totalRV 2bRV 3bRV SB%
Milwaukee Brewers 2024 28.29 22.90 27.91 0.38 84%
Arizona Diamondbacks 2023 24.93 22.72 21.84 3.09 86%
Cincinnati Reds 2024 22.38 20.26 18.21 4.17 82%
Philadelphia Phillies 2024 20.36 14.70 19.28 1.08 83%
New York Mets 2023 19.57 19.66 16.04 3.53 89%
Los Angeles Dodgers 2024 19.29 18.23 17.95 1.34 86%
Philadelphia Phillies 2023 18.07 18.07 15.28 2.79 84%
Seattle Mariners 2024 17.95 16.80 11.97 5.98 83%
Oakland Athletics 2023 16.49 16.49 12.36 4.13 83%
Chicago Cubs 2024 16.18 15.03 14.39 1.79 83%

The order of this list is similar to the order from the estimate, but the overall values here are much more accurate. The number I want to use is the value gained from stealing second and third combined. Stealing home is rarer, teams aren't explicitly calling for steals of home, and many of those plays are likely pickoffs without intent to steal or runs scored on passed balls, so I think looking at just second and third captures the value from stolen bases better. Finding a more exact value shows that teams are adding up to 2-3 wins from stolen bases alone. What this means for my prediction is that teams who successfully steal at volume could push themselves into a playoff berth or division title they might not have reached otherwise. This only covers half the game, though. What about defense? Preventing stolen bases gives you a boost in run value as well.

Tm 2nd+3rd RV_Def totalRV_Def 2bRV_Def 3bRV_Def CS% SB
Kansas City Royals -3.56 -3.56 -5.53 1.97 33% 58
Detroit Tigers 2.47 2.56 1.02 1.45 26% 96
Seattle Mariners 3.16 2.01 3.60 -0.44 26% 116
Los Angeles Dodgers 3.43 2.28 6.80 -3.37 25% 115
Miami Marlins 4.88 5.06 2.87 2.01 24% 139
Los Angeles Angels 5.40 3.10 5.58 -0.18 24% 107
Oakland Athletics 5.76 3.46 3.38 2.38 25% 114
Philadelphia Phillies 6.99 6.02 5.61 1.38 22% 110
New York Yankees 7.18 4.88 8.47 -1.29 23% 122
Cincinnati Reds 7.32 6.17 6.24 1.08 23% 123

Defensively, the Royals were the only team able to create run value against stolen bases, while the rest of the top 10 kept the value allowed under 7.5 runs. Baseball is a zero-sum game, meaning any value gained on one side is directly lost on the other. So when it's easier to steal bases and harder to stop them, it becomes important to limit the running game as much as possible.

Tm FullRV OffenseRV DefenseRV
Milwaukee Brewers 17.95 28.29 10.34
Kansas City Royals 16.39 12.83 -3.56
Los Angeles Dodgers 15.86 19.29 3.43
Cincinnati Reds 15.06 22.38 7.32
Seattle Mariners 14.79 17.95 3.16
Philadelphia Phillies 13.37 20.36 6.99
Chicago Cubs 6.12 16.18 10.06
Oakland Athletics 2.82 8.58 5.76
Detroit Tigers 1.25 3.72 2.47
Arizona Diamondbacks 0.57 10.94 10.37
Toronto Blue Jays -0.86 7.32 8.18
New York Mets -0.90 12.07 12.97
Miami Marlins -1.06 3.82 4.88
Boston Red Sox -1.11 6.43 7.54
San Diego Padres -1.20 10.68 11.88

Now we want to look at which specific players were the best in 2024. To do this we will look at the same metrics at the player level.

Name 2nd+3rd RV totalRV 2bRV 3bRV SB% SB
Shohei Ohtani 12.03 12.03 10.58 1.45 94% 59
Brice Turang 8.89 7.83 7.44 1.45 89% 50
Maikel Garcia 8.11 8.11 4.47 3.64 95% 37
Víctor Robles 7.99 6.84 4.61 3.38 94% 34
José Ramírez 6.23 5.08 5.30 0.93 85% 41
Elly De La Cruz 6.10 5.04 4.46 1.64 81% 67
Bryson Stott 6.09 6.09 5.83 0.26 91% 32
David Hamilton 5.78 5.78 3.44 2.34 89% 33
Pete Crow-Armstrong 5.51 4.36 4.47 1.04 90% 27
Josh Lowe 5.39 5.48 4.61 0.78 96% 25
Xavier Edwards 4.97 4.97 5.08 -0.11 89% 31
Francisco Lindor 4.74 4.74 3.44 1.30 88% 29
Oneil Cruz 4.62 4.62 5.25 -0.63 96% 22
Lawrence Butler 4.53 4.53 3.75 0.78 100% 18
Christian Yelich 4.40 4.49 3.36 1.04 95% 21
Zack Gelof 4.34 4.34 4.08 0.26 89% 25
Brenton Doyle 4.33 4.33 3.55 0.78 86% 30
Andrés Giménez 4.33 4.33 3.55 0.78 86% 30
Dylan Moore 4.19 4.19 3.41 0.78 84% 32
Richie Palacios 4.11 4.11 4.11 0.00 95% 19
Spencer Steer 4.11 4.20 3.33 0.78 89% 25
Zach McKinstry 4.00 4.00 4.00 0.00 100% 16
Nico Hoerner 3.99 3.99 1.91 2.08 84% 31
Cedric Mullins 3.93 3.93 4.30 -0.37 84% 32
Brandon Nimmo 3.76 3.76 3.50 0.26 100% 15

The list makes sense: we would expect guys like Turang and Ohtani, who stole a ton of bases, but also the guys who stole at a super high clip, like Lawrence Butler and, again, Ohtani. Elly De La Cruz led MLB in stolen bases but is a little further down the list because his success rate was much lower. Still, his ranking 6th shows the value that stealing at a middling efficiency has when you scale it up to volume, which again is why I think teams will steal much more this year. Comparing against Baseball Savant's stolen base run value leaders, we have a fairly comparable list. The only difference in their top 10 is that they had Corbin Carroll instead of Josh Lowe.

One final point I want to make is how I predict this will impact team-building strategy. I don't foresee a drastic change, but I think it will open the door for more contact- and speed-oriented players in the mold of Jake McCarthy or Chandler Simpson. Also, with the introduction of ABS, I think teams will still value framing the most at the catcher position, but pop time will also be at the forefront of catcher development.