The Sacrifice of Flies in WAR

A run scores, an out is made. Which matters more?

DJ Barry

Sac Fly Intro Image

When I first learned that a sacrifice fly does not really help your WAR, and can actually hurt it in certain cases, it struck me as odd.

The batter gets the run home. That feels important. Of course, he also makes an out, and that matters too. Depending on the situation, the team’s expected runs for the rest of the inning can drop even after the run scores. I get that to some degree.

Still, the whole thing did not pass the smell test. So, against the advice of my therapist, I felt like it was my duty to spend an inordinate amount of time trying to get to the bottom of it.

The question is simple on the surface: a run scores, an out is made, which one matters more?

The answer depends on what kind of value we are trying to measure. Run Expectancy asks what the play did to the inning. Win Probability asks what the play did to the game. WAR tries to turn player value into one context-neutral number. The sacrifice fly sits right in the middle of those three ideas.

So I looked at the sacrifice fly three ways. First, through Run Expectancy, which explains why WAR can punish the out. Then through Win Probability, which explains why the run can matter more than the out in certain spots. Finally, through actual players, to see whether the hitters who get the runner home also show up as sacrifice-fly overperformers.

The point is not that every sacrifice fly should automatically help WAR.
The point is that a run-expectancy-heavy framework can miss part of the story when the value of the play depends on the exact game state.

Getting It Down to One Number

Getting it down to one number is the whole challenge of sports analytics.

Every sport has some version of this problem, but baseball is probably the easiest major sport to start with. The sample size is huge, the events are more isolated, and the game is less fluid than basketball, football, hockey, or soccer. That does not make it easy, but it does make baseball a great stepping stone for trying to measure value in other sports.

That is where WAR comes in. Different WAR models handle the details differently, but the basic goal is the same: take what a player does and translate it into runs, then wins. That is useful, but it also means every play has to be converted into run value.

For a sacrifice fly, that gets messy. The run scores, which matters. The batter also makes an out, which matters too. So the question is not just whether the runner scored. The question is how much the entire plate appearance changed the team’s expected runs.

That is where Run Expectancy takes over.

Even With the Run, What’s the Cost of the Out?

Run Expectancy is the big driver here. It asks a simple question: based on the situation, how many runs is the offense expected to score in the rest of the inning?

That is what ties this back to the WAR framework. The play is not only judged by whether the runner scored. It is judged by the run that scored, the out that was made, and the expected runs left after the play.

The basic idea is simple: RE Change = Runs Scored + New RE - Old RE. A sacrifice fly does not automatically become worth one full run just because one run scored. The remaining base-out situation still matters.
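As a quick check of that formula, here is a minimal Python sketch. The function name `re_change` is my own label, not something from a WAR implementation. The inputs are the two 3rd-only rows from the table in this section, with the expected runs left after the play backed out by subtracting the one run that scored from the "Exp Runs post Sac Fly" column.

```python
def re_change(runs_scored, new_re, old_re):
    """Run value of a play: runs scored plus the change in run expectancy."""
    return runs_scored + new_re - old_re


# 3rd only, 1 out: 0.950 expected runs before the plate appearance.
# The table lists 1.110 after the sac fly, which is 1 run scored plus
# 0.110 expected runs remaining.
one_out_delta = re_change(runs_scored=1, new_re=0.110, old_re=0.950)

# 3rd only, 0 outs: 1.430 before, and 1.290 - 1 = 0.290 remaining after.
no_out_delta = re_change(runs_scored=1, new_re=0.290, old_re=1.430)

print(f"3rd only, 1 out:  {one_out_delta:+.3f}")
print(f"3rd only, 0 outs: {no_out_delta:+.3f}")
```

The two results reproduce the table's Delta column for those rows (0.160 and −0.140), which is the whole tradeoff in miniature: same run scored, different cost for the out.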

Read this section this way: Avg. Exp Runs is the expected runs before the plate appearance. Exp Runs post Sac Fly is one run scored plus the expected runs left after the sacrifice fly. Delta is the difference.
Run Expectancy: Average Result vs Sac Fly Result
Average result is the starting expected runs. Sac fly result is one run scored plus expected runs remaining.
| Outs | Scenario | Avg. Exp Runs | Exp Runs post Sac Fly | Delta |
|---|---|---|---|---|
| 0 Outs | Bases Loaded | 2.310 | 2.006 | −0.304 |
| 0 Outs | 2nd and 3rd | 2.020 | 1.751 | −0.269 |
| 0 Outs | 1st and 3rd | 1.820 | 1.560 | −0.260 |
| 0 Outs | 3rd Only | 1.430 | 1.290 | −0.140 |
| 1 Out | Bases Loaded | 1.550 | 1.467 | −0.083 |
| 1 Out | 2nd and 3rd | 1.360 | 1.332 | −0.028 |
| 1 Out | 1st and 3rd | 1.180 | 1.240 | 0.060 |
| 1 Out | 3rd Only | 0.950 | 1.110 | 0.160 |
| All | Overall Average | 1.447 | 1.396 | −0.051 |
Sac fly model assumes a runner on second advances to third 17% of the time.

The table shows the basic tradeoff: the sacrifice fly is not judged against zero, it is judged against the expected value of the starting situation. The chart below uses the same Delta column to show where the sac fly helps the inning math and where the out gives too much value back.

The Overall Average row is the key takeaway. It is a weighted average: across all runner-on-third, fewer-than-two-out states, the average starting situation was worth 1.447 expected runs. The modeled sacrifice fly produced 1.396 total runs when counting the run scored plus the expected runs remaining. That leaves the sacrifice fly at -0.051 runs versus the average result.

What That Means in WAR Terms
Using the actual weighted-average Run Expectancy result from this section.
| Measure | Value | How to Read It |
|---|---|---|
| Sac Fly RE Delta | −0.051 | The weighted average run value from the table above. |
| Runs per WAR Assumption | 10.000 | A simple rule of thumb for converting runs into WAR. |
| Approx WAR Impact | −0.005 | The estimated WAR effect for one modeled sac fly opportunity. |
Runs per WAR varies by season and run environment. Ten runs per WAR is used here as a simple scale estimate.

That WAR number is tiny, which is part of the point. One sacrifice fly is not changing a player’s season by itself. The issue is not the size of one play. The issue is what the framework is choosing to reward or punish when this same type of opportunity shows up over and over.

So yes, I understand the math. A sacrifice fly with nobody out can be a net negative in Run Expectancy because the offense traded an out for one run and gave up the chance at a bigger inning. Fine.

That answers the inning-value question. It does not fully answer the game-value question.

And that is where this gets interesting. If a hitter always gets the runner from third home when the opportunity allows it, are we really sure we want to say the average hitter is more valuable just because the inning math likes the remaining upside better?

To make that question cleaner, I need two fake players.

The first is Zachary Fly. Zac is average in every way, except for one thing: give him a runner on third with fewer than two outs, and he always gets the run home with a sacrifice fly.

The second is Average Joe. Joe is also average in every way, including these runner-on-third chances. He is not bad. He is not special. He is just the normal baseline.

So now the question becomes easier to see: if everything else is equal, would I rather have Zachary Fly or Average Joe?

The Curious Case of Zachary Fly

Zachary Fly started as a hypothetical player for this exact comparison. I wanted one clean question: what happens if a hitter is completely average everywhere else, but he always gets the runner home from third with fewer than two outs?

Across from him is Average Joe. Joe is not a straw man. He is not bad. He is simply the average version of the same hitter in the same situations.

Zac is not chasing the perfect inning. Zac is here to get the run home. Give him a runner on third and fewer than two outs, and he gives you the sacrifice fly. Every time.

So the comparison is not superstar versus scrub. It is Zachary Fly versus Average Joe, with everything else held equal.

These chances are not common, but they are not random either. In this dataset, runner-on-third, fewer-than-two-out chances make up about 5.1% of all plate appearances. For a 600 PA season, that comes out to about 30.6 chances.

Official sacrifice flies are only one piece of that group, roughly 12.9% of these opportunities. The official scoring label is not really the point here. The point is what happens when one hitter keeps trading the out for the run, over and over again.

Season Comparison: Zac vs Average Hitter
Expected runs over a 600 PA season using the observed mix of runner-on-third opportunities
| Outs | Scenario | Share | Opps | Avg Runs | Zac Runs | Run Gap | WAR Gap |
|---|---|---|---|---|---|---|---|
| 0 Outs | Bases Loaded | 7.6% | 2.3 | 5.344 | 4.640 | −0.704 | −0.070 |
| 0 Outs | 2nd and 3rd | 7.0% | 2.1 | 4.333 | 3.755 | −0.577 | −0.058 |
| 0 Outs | 1st and 3rd | 9.2% | 2.8 | 5.114 | 4.383 | −0.731 | −0.073 |
| 0 Outs | 3rd Only | 4.7% | 1.5 | 2.078 | 1.875 | −0.203 | −0.020 |
| 1 Out | Bases Loaded | 17.2% | 5.2 | 8.135 | 7.698 | −0.437 | −0.044 |
| 1 Out | 2nd and 3rd | 16.4% | 5.0 | 6.828 | 6.687 | −0.141 | −0.014 |
| 1 Out | 1st and 3rd | 20.0% | 6.1 | 7.223 | 7.590 | 0.367 | 0.037 |
| 1 Out | 3rd Only | 17.9% | 5.5 | 5.214 | 6.092 | 0.878 | 0.088 |
| All | 600 PA Total | 100.0% | 30.6 | 44.269 | 42.721 | −1.548 | −0.155 |
Estimated WAR uses 10 runs = 1 WAR as a simple rule of thumb. Actual run-to-win conversion changes by run environment.

Over a 600 PA season, the average hitter’s starting situations would be worth about 44.269 expected runs. Zac’s modeled sacrifice fly outcomes would be worth about 42.721 expected runs. That leaves Zac at -1.548 runs compared with the average hitter, or about -0.155 WAR using the 10 runs per WAR shortcut.
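The season comparison is a frequency-weighted sum, and it can be sketched in a few lines of Python. This is a reconstruction under assumptions, not the code behind the table. The names are mine, and the shares are the two-decimal versions listed in the season-weighted WPA summary rather than the one-decimal ones above, so the totals land within rounding of the published figures rather than matching them to the last digit.

```python
CHANCES_PER_600_PA = 30.6  # about 5.1% of 600 plate appearances
RUNS_PER_WAR = 10.0        # the rule-of-thumb conversion used in this article

# scenario: (share of chances, RE before the play, 1 run + RE left after sac fly)
SCENARIOS = {
    "0 out, bases loaded": (0.0756, 2.310, 2.006),
    "0 out, 2nd and 3rd":  (0.0701, 2.020, 1.751),
    "0 out, 1st and 3rd":  (0.0918, 1.820, 1.560),
    "0 out, 3rd only":     (0.0475, 1.430, 1.290),
    "1 out, bases loaded": (0.1715, 1.550, 1.467),
    "1 out, 2nd and 3rd":  (0.1641, 1.360, 1.332),
    "1 out, 1st and 3rd":  (0.2000, 1.180, 1.240),
    "1 out, 3rd only":     (0.1794, 0.950, 1.110),
}


def season_totals(scenarios, chances):
    """Weight each scenario's run value by how many chances it gets in a season."""
    avg_runs = sum(share * chances * before for share, before, _ in scenarios.values())
    zac_runs = sum(share * chances * after for share, _, after in scenarios.values())
    return avg_runs, zac_runs


avg_runs, zac_runs = season_totals(SCENARIOS, CHANCES_PER_600_PA)
run_gap = zac_runs - avg_runs
war_gap = run_gap / RUNS_PER_WAR

print(f"Average hitter: {avg_runs:.3f} expected runs")
print(f"Zac:            {zac_runs:.3f} expected runs")
print(f"Run gap: {run_gap:+.3f}  WAR gap: {war_gap:+.3f}")
```

Changing `RUNS_PER_WAR` is the easiest way to see how sensitive the WAR gap is to the run environment. The run gap itself does not move.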

Put another way, if the average version of this player is a clean 2.00 WAR player, Zac would come out around 2.00 - 0.155 = 1.845 WAR. That is the uncomfortable part. Zac is getting the runner home every time, but because he is doing it through an out that scores a run, the Run Expectancy framework can still price him below average over a full season.

On paper, that makes the sacrifice fly look very conditional. It helps in some spots, especially with one out, but it is not automatically better than the average expected value of the situation. The run matters. The out can give back enough future value to make the play look worse than it feels.

This is where I still have a problem. The math is not broken. It is doing what Run Expectancy tells it to do. I just do not think this fully settles the question. If Zac keeps getting the runner from third home, I am not ready to cut him yet.

Run Expectancy or Win Expectancy?

This is where the sacrifice fly stops being a math problem and starts becoming a value problem.

Run Expectancy is not wrong. It is doing exactly what it is supposed to do. It asks a clean question: given the base-out state, how many runs should this team expect to score before the inning ends?

That is a useful question. It is also not the only question. Baseball is not played in neutral innings. Sometimes the inning ceiling matters. Sometimes the only thing that matters is whether the run got home.

That is why the sacrifice fly is such a good test case. With a runner on third and fewer than two outs, a sacrifice fly can look ordinary, or even slightly negative, through Run Expectancy. The batter gets the run home, but he also makes an out and lowers the rest-of-inning upside.

That makes sense from an inning-value perspective. A team with nobody out may still have a big inning in front of it. Trading that upside for one guaranteed run can look like a step backward. The run scores, but the ceiling drops.

Win Expectancy looks at the play differently. Down one in the ninth, a sacrifice fly that ties the game is not just one run. It is survival. In a tie game late, a sacrifice fly that takes the lead is not just one run. It changes the entire shape of the game.

Run Expectancy vs Win Expectancy
The sacrifice fly can look different depending on whether we are measuring the inning or the game.
| Frame | What It Measures | Why Sac Flies Get Weird |
|---|---|---|
| Run Expectancy | Expected runs left in the inning. | The out lowers the inning ceiling, even when the run scores. |
| Win Expectancy | The team’s chance of winning the game. | The run can tie the game, take the lead, or protect a late lead. |
| WAR Tension | How player value gets translated into runs and wins. | A context-neutral framework may miss some scoreboard value. |

This is why I do not think the sacrifice fly question is really about whether Run Expectancy is broken. It is not. The better question is whether Run Expectancy alone captures the type of value we actually care about.

For most plays, the RE approach makes sense. Over a full season, stripping out context helps keep player value cleaner and more stable. A double should not become a different skill just because it happened in April instead of September.

The sacrifice fly is different because the whole purpose of the play is context. Nobody praises a sacrifice fly because it maximizes the inning. They praise it because the runner scored. The batter accepted the out, changed the scoreboard, and in some spots changed the game.

So that is the split I want to keep in mind from here. Run Expectancy tells us what the play did to the inning. Win Expectancy tells us what the play did to the game. The sacrifice fly sits right between those two ideas.

That does not mean Run Expectancy is useless. It means we need another lens. If RE tells us the inning cost, then Win Probability Added can help show the game value.

The WPA Case for Zachary Fly

The previous section set up the split. Run Expectancy tells us what the play did to the inning. Win Expectancy tells us what the play did to the game.

That distinction matters for Zac. Through Run Expectancy, his guaranteed sacrifice fly can look worse than average because the out lowers the rest-of-inning ceiling. Through Win Probability, the same play can look very different. If the run ties the game, takes the lead, or protects a late lead, the scoreboard may care more than the inning ceiling.

So now I want to give Zac his cleanest argument. Forget the full-season total for a minute. First, where does the sacrifice fly actually help win probability compared with the average result in the same situation?

Read this section as a game-state map. The big number is the sac fly WPA edge compared with the average hitter result in that cell. The small number is how often that cell shows up among runner-on-third, fewer-than-two-out chances.

Sac Fly WPA Edge by Game State

This heat map compares the average hitter result in each cell with the modeled sacrifice fly result. Positive numbers mean Zac’s sac fly is better for win probability than the average result in that spot. Negative numbers mean the average result is better.

Where Zac Looks Best and Worst by WPA
Positive WPA edge means the modeled sac fly beats the average result in that game state.
Spot Share Avg WPA SF WPA SF Edge SF Sample SF Source
Best Zac Spots
9th | Tie | 3rd Only, 1 Out 0.18% 0.13 16.82 16.69 14 Exact cell
9th | Tie | 1st and 3rd, 1 Out 0.22% 0.76 16.63 15.87 21 Exact cell
Extras | Down 1 | 3rd Only, 1 Out 0.15% −1.61 13.51 15.12 17 Exact cell
Extras | Tie | Bases Loaded, 1 Out 0.39% −1.32 13.26 14.58 42 Exact cell
9th | Tie | 2nd and 3rd, 1 Out 0.18% −0.94 13.36 14.31 14 Exact cell
Worst Zac Spots
9th | Down 2 | 1st and 3rd, 0 Outs 0.06% 0.90 −13.97 −14.87 7 Exact cell
9th | Down 2 | Bases Loaded, 1 Out 0.14% −0.33 −13.12 −12.79 16 Exact cell
Extras | Down 3+ | Bases Loaded, 0 Outs 0.01% 6.46 −3.86 −10.32 141 Score + base/out
9th | Down 2 | 1st and 3rd, 1 Out 0.14% −0.32 −10.60 −10.28 14 Exact cell
Extras | Down 2 | 3rd Only, 1 Out 0.06% 1.70 −7.20 −8.90 7 Exact cell
WPA is shown in percentage points. Sac fly source shows whether the estimate came from the exact cell or a broader fallback group.

This is the best argument for Zac. There are real game states where the sacrifice fly is not just acceptable. It is valuable. A sac fly that ties the game late, takes the lead, or protects a narrow margin can beat the average hitter result in a way Run Expectancy does not fully capture.

The warning is that a heat map gives every cell the same amount of space. A ninth-inning tie game and a random early-inning spot look equal on the page, even though they do not happen equally often. So the next step is to stop looking at where Zac can help and ask what it all adds up to over a season.

The WPA case for Zac is real. The season case still has to survive frequency.

What Does the WPA Map Add Up To?

The previous section gave Zac his best case. There are real game states where the sacrifice fly looks much better through Win Probability than it does through Run Expectancy.

The catch, again, is frequency. A heat map gives a ninth-inning tie game the same space on the page as a random early-inning spot, even though the two do not happen equally often.

So this section does the accounting. It takes the WPA edge from each cell and weights it by how often that cell actually appears across about 30.6 runner-on-third, fewer-than-two-out chances in a 600 PA season.

Season math:
Season WPA Gap = Sac Fly WPA Edge × Situation Share × 30.6
Season-Weighted WPA Summary
Where Zac’s season value comes from after weighting each spot by frequency
Rollup Share Opps Avg WPA Zac WPA Zac Gap Win Eq. Edge
Overall
Season Total 100.00% 30.6 4.46 27.82 23.36 0.234 Zac edge
By Situation
Bases Loaded, 0 Outs 7.56% 2.3 0.60 −1.52 −2.12 −0.021 Avg edge
2nd and 3rd, 0 Outs 7.01% 2.1 0.49 −0.50 −0.99 −0.010 Avg edge
1st and 3rd, 0 Outs 9.18% 2.8 0.24 −3.15 −3.39 −0.034 Avg edge
3rd Only, 0 Outs 4.75% 1.5 0.17 −0.21 −0.38 −0.004 Avg edge
Bases Loaded, 1 Out 17.15% 5.2 0.55 3.36 2.81 0.028 Zac edge
2nd and 3rd, 1 Out 16.41% 5.0 0.71 5.23 4.52 0.045 Zac edge
1st and 3rd, 1 Out 20.00% 6.1 1.35 11.08 9.73 0.097 Zac edge
3rd Only, 1 Out 17.94% 5.5 0.36 13.54 13.19 0.132 Zac edge
By Inning Block
1st-3rd 32.15% 9.8 2.42 5.00 2.58 0.026 Zac edge
4th-6th 33.52% 10.3 2.40 6.23 3.83 0.038 Zac edge
7th-8th 22.61% 6.9 0.22 5.48 5.26 0.053 Zac edge
9th 7.92% 2.4 0.01 3.05 3.04 0.030 Zac edge
Extras 3.80% 1.2 −0.60 8.06 8.65 0.087 Zac edge
By Score Block
Down 3+ 13.40% 4.1 −0.30 −9.07 −8.76 −0.088 Avg edge
Down 2 6.88% 2.1 −0.03 −5.35 −5.31 −0.053 Avg edge
Down 1 10.54% 3.2 0.92 6.08 5.16 0.052 Zac edge
Tied 24.08% 7.4 1.52 21.92 20.40 0.204 Zac edge
Up 1 14.68% 4.5 0.93 7.02 6.09 0.061 Zac edge
Up 2 10.00% 3.1 0.81 3.95 3.14 0.031 Zac edge
Up 3+ 20.43% 6.3 0.62 3.27 2.65 0.026 Zac edge
Zac Gap = Zac WPA minus average hitter WPA, weighted by situation frequency. Win Eq. = Zac Gap / 100. This is a win-probability translation, not literal WAR.

After weighting the WPA map by frequency, Zac’s season total comes out to +23.36 WPA points. Since WPA is measured in percentage points here, that translates to about +0.234 wins of win-probability value.
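To see why the highlight cells shrink once frequency is applied, here is a small Python sketch of the season math. The function name is mine. The cell values are copied from the best and worst rows of the game-state table, and the situation gaps from the By Situation rows of the summary. Treat it as an illustration of the weighting, not the full model, which covers every cell.

```python
CHANCES_PER_600_PA = 30.6


def season_wpa_gap(edge_points, share, chances=CHANCES_PER_600_PA):
    """Season WPA Gap = Sac Fly WPA Edge x Situation Share x chances."""
    return edge_points * share * chances


# cell: (SF edge in WPA points, share of runner-on-third chances)
CELLS = {
    "9th | Tie | 3rd Only, 1 Out":        (16.69, 0.0018),
    "9th | Tie | 1st and 3rd, 1 Out":     (15.87, 0.0022),
    "9th | Down 2 | 1st and 3rd, 0 Outs": (-14.87, 0.0006),
    "9th | Down 2 | Bases Loaded, 1 Out": (-12.79, 0.0014),
}

contributions = {
    name: season_wpa_gap(edge, share) for name, (edge, share) in CELLS.items()
}
for name, points in contributions.items():
    print(f"{name}: {points:+.2f} WPA points per season")

# Already-weighted Zac Gap values from the By Situation rows of the summary.
SITUATION_GAPS = [-2.12, -0.99, -3.39, -0.38, 2.81, 4.52, 9.73, 13.19]
season_total = sum(SITUATION_GAPS)
print(f"Season total from situation rows: {season_total:+.2f} WPA points")
```

The best cell on the map is worth more than 16 WPA points per play, but it contributes less than one point over a season because it shows up in fewer than 1 in 500 of these chances. The situation rows sum to the season total within rounding.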

This is not literal WAR. WAR is a broader, mostly context-neutral framework that compares a player to replacement level. This number is narrower. It only tells us what Zac’s sacrifice-fly approach adds or subtracts through Win Probability in these runner-on-third spots.

This is the honest version of the WPA case. Zac has real value in certain scoreboard spots, but the season total depends on how often those spots actually show up. The highlight cells matter. The boring cells still get a vote.

Two Answers to the Same Zac Question

At this point, the article has two answers to the same basic question. Run Expectancy gives one answer. Win Probability gives another.

That does not mean one is right and the other is wrong. They are measuring different things. Run Expectancy is asking what the play did to the inning. Win Probability is asking what the play did to the game.

The point of this section is not to turn WPA into WAR. WPA is not WAR. It is too context-heavy for that. The point is simply to put both answers on a rough win scale so we can compare the size and direction of the argument.

Scale check:
Run Expectancy estimate = run gap / 10
Win Probability estimate = WPA point gap / 100

This is a comparison scale, not official WAR.
Two Answers to the Same Zac Question
Run Expectancy measures the inning. Win Probability measures the game.
| Frame | Question | Unit | Average | Zac | Zac Gap | Rough Win Scale | Edge |
|---|---|---|---|---|---|---|---|
| Run Expectancy | What did Zac do to the inning? | Runs | 44.269 | 42.721 | −1.548 | −0.155 | Average |
| Win Probability | What did Zac do to the game? | WPA Points | 4.46 | 27.82 | +23.36 | 0.234 | Zac |
Rough win scale: RE estimate = run gap / 10. WPA estimate = WPA point gap / 100. This is a comparison scale, not official WAR.

The Run Expectancy answer estimates Zac at -0.155 on the rough win scale. The Win Probability answer estimates Zac at +0.234 on that same rough scale.
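The scale check is simple enough to write down directly. This is a sketch with helper names of my own choosing. Both divisors are the shortcuts stated above, not official conversions.

```python
def re_to_rough_wins(run_gap, runs_per_win=10.0):
    """Run Expectancy estimate = run gap / 10."""
    return run_gap / runs_per_win


def wpa_to_rough_wins(wpa_point_gap):
    """Win Probability estimate = WPA point gap / 100."""
    return wpa_point_gap / 100.0


re_estimate = re_to_rough_wins(-1.548)   # Zac's season run gap
wpa_estimate = wpa_to_rough_wins(23.36)  # Zac's season WPA point gap
spread = wpa_estimate - re_estimate      # distance between the two answers

print(f"RE estimate:  {re_estimate:+.3f}")
print(f"WPA estimate: {wpa_estimate:+.3f}")
print(f"Spread:       {spread:.3f}")
```

The spread is not a correction to either number. It is only a way to see how far apart the inning answer and the game answer sit when both are put on the same rough scale.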

Those are not contradictory results. They are answers to different questions. RE is judging the expected value of the inning. WPA is judging the chance of winning the game.

That is the tension in one table. Zac can look worse through Run Expectancy because the out lowers the inning ceiling. He can look more defensible through Win Probability because some runs matter more to the game than they do to the inning.

This does not prove Zac belongs on the roster. It shows why the sacrifice fly question is bigger than one run scored and one out made. The same play can lose inning value while gaining game value.

How to Read the Player Metrics

Before moving from Zachary Fly and Average Joe to real hitters, it is worth cleaning up the language.

The rest of the article uses a few related but different metrics. They all live around the same play, but they do not answer the same question.

The big distinction:
Getting the runner home is the baseball result. Getting official sacrifice-fly credit is the scoring label. Helping win the game is the context layer.
Player Metric Cheat Sheet
Same opportunity, different value questions
| Metric | Plain English | Question It Answers | How to Read It |
|---|---|---|---|
| Run Expectancy | Average inning value. | Did the play help the expected run total for the inning? | This is why the out matters. A run can score and the play can still lose inning value. |
| Win Probability | Game-value context. | Did the play help the batting team win the game? | This is why inning, score, and leverage matter. One run is not always just one run. |
| RSOE / 100 | Runner Scored Over Expected per 100 chances. | Did the hitter get the runner from third home more often than expected? | This is the broader baseball result. It includes more than official sacrifice flies. |
| SFOE / 100 | Sac Flies Over Expected per 100 chances. | Did the hitter produce more official sacrifice flies than expected? | This is the narrower box-score label. It captures one version of getting the runner home. |
| WPA / 100 | Win Probability Added per 100 chances. | Did those runner-on-third chances add game value? | This keeps the game-state layer attached to the player results. |
| Gap | RSOE / 100 minus SFOE / 100. | Is runner-home value larger than official sac-fly credit? | Positive means the broader result is stronger than the scoring label. Negative means sac-fly credit is stronger than the broader runner-home result. |
All /100 metrics are scaled per 100 true sacrifice-fly opportunities: runner on third with fewer than two outs.

The key is not to treat these as interchangeable. A hitter can be strong at getting the runner home without piling up official sacrifice flies. He can also collect official sacrifice-fly credit without proving that the broader runner-home skill is as strong.

That is why the player section has to be careful. The goal is not to crown the king of sacrifice flies. The goal is to see whether there is any real separation between hitters once we compare the runner-home result, the scoring label, and the game-value layer.

Do Real Hitters Actually Separate?

Zachary Fly is useful because he gives us a clean thought experiment. Real baseball is messier. Players do not get the same opportunities, they do not face the same pitchers, and they do not always need the same type of contact.

So the next question is simple: do actual hitters separate on sacrifice flies over expected, or is this just noise dressed up as a leaderboard?

This section looks at real hitters and compares their clean official sacrifice flies to their expected sacrifice flies. The main rate is SF OE / 100, which means sacrifice flies over expected per 100 runner-on-third opportunities.

How to read this section:
SF OE = clean sacrifice flies minus expected sacrifice flies
SF OE / 100 = sacrifice flies over expected per 100 runner-on-third opportunities
ROE / 100 = runner from third scored above expected per 100 opportunities (the same measure the cheat sheet labels RSOE / 100)

This does not prove skill yet. It only shows whether the player spread is worth testing.

Sac Flies Over Expected Leaderboard

This table shows the top and bottom hitters by SF OE / 100 from 2015–2026, with a minimum of 100 runner-on-third opportunities. I am keeping the table narrow on purpose. The goal is the spread, not every supporting calculation.

Actual Player Results: Sac Flies Over Expected
Clean sacrifice flies compared with expected sacrifice flies
Rank Hitter Years Opp. SF Exp. SF SF OE SF OE / 100 ROE / 100
Top 15: Most SF OE / 100
1 Mountcastle, Ryan 2020–2026 148 36 19.3 16.7 11.3 8.2
2 Smith, Will 2019–2026 201 47 26.7 20.3 10.1 4.9
3 Gurriel, Yuli 2016–2025 211 45 26.9 18.1 8.6 12.2
4 Zimmerman, Ryan 2015–2021 130 27 16.4 10.6 8.2 0.7
5 Gregorius, Didi 2015–2022 195 40 25.0 15.0 7.7 5.4
6 Arraez, Luis 2019–2026 150 31 19.7 11.3 7.6 15.0
7 Merrifield, Whit 2016–2024 219 44 27.7 16.3 7.4 5.2
8 Wendle, Joey 2016–2024 119 24 15.5 8.5 7.1 5.6
9 Aguilar, Jesús 2017–2023 181 36 23.3 12.7 7.0 1.8
10 Suzuki, Kurt 2015–2022 145 28 18.1 9.9 6.8 4.1
11 Vogt, Stephen 2015–2022 119 23 14.9 8.1 6.8 11.1
12 Verdugo, Alex 2018–2025 163 32 21.2 10.8 6.6 0.6
13 Rendon, Anthony 2015–2024 245 47 30.8 16.2 6.6 9.5
14 Arenado, Nolan 2015–2026 385 75 49.8 25.2 6.5 6.2
15 Kipnis, Jason 2015–2020 152 29 19.3 9.7 6.4 −1.3
Bottom 15: Fewest SF OE / 100
1 Gallo, Joey 2015–2024 176 4 22.6 −18.6 −10.6 −15.4
2 Souza Jr., Steven 2015–2022 100 4 12.8 −8.8 −8.8 −5.7
3 Lopez, Nicky 2019–2026 111 5 13.8 −8.8 −7.9 1.4
4 Rodríguez, Julio 2022–2026 140 8 18.5 −10.5 −7.5 −0.9
5 Zunino, Mike 2015–2023 137 8 17.2 −9.2 −6.7 −13.9
6 Alfaro, Jorge 2017–2025 100 6 12.7 −6.7 −6.7 2.9
7 Maybin, Cameron 2015–2021 116 7 14.5 −7.5 −6.4 0.2
8 Anderson, Tim 2016–2025 181 11 22.5 −11.5 −6.3 −7.2
9 Steer, Spencer 2022–2026 138 10 18.5 −8.5 −6.2 −4.2
10 Smoak, Justin 2015–2020 123 8 15.6 −7.6 −6.2 −9.5
11 Maldonado, Martín 2015–2025 150 10 19.0 −9.0 −6.0 −10.7
12 Lux, Gavin 2019–2025 100 7 12.9 −5.9 −5.9 −1.8
13 Yelich, Christian 2015–2026 318 23 41.3 −18.3 −5.8 3.2
14 Sánchez, Jesús 2020–2026 128 11 17.7 −6.7 −5.3 −2.3
15 Mercer, Jordy 2015–2021 133 10 16.8 −6.8 −5.1 1.9
Sorted by SF OE / 100, then total SF OE, then opportunities. ROE / 100 shows runner-from-third scoring over expected per 100 opportunities.

The top hitter by sacrifice flies over expected per 100 opportunities was Ryan Mountcastle, at +11.3 SF OE / 100. Across the full sample, he finished at +16.7 total SF OE.

The lowest hitter by sacrifice flies over expected per 100 opportunities was Joey Gallo, at -10.6 SF OE / 100. Across the full sample, he finished at -18.6 total SF OE.
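The leaderboard columns follow directly from the definitions, so a short Python sketch can rebuild them from the raw counts. The helper names are mine. The `gap` line assumes that ROE / 100 in this table is the same measure as RSOE / 100 in the cheat sheet, which is my reading and not something the table states outright. Inputs are the rounded table values, so results can differ from the published columns by about a tenth.

```python
def sf_over_expected(actual_sf, expected_sf):
    """SF OE = clean sacrifice flies minus expected sacrifice flies."""
    return actual_sf - expected_sf


def per_100(value, opportunities):
    """Scale a counting stat to a rate per 100 runner-on-third opportunities."""
    return 100.0 * value / opportunities


# hitter: (opportunities, clean SF, expected SF, ROE / 100) from the leaderboard
HITTERS = {
    "Ryan Mountcastle": (148, 36, 19.3, 8.2),
    "Luis Arraez":      (150, 31, 19.7, 15.0),
    "Joey Gallo":       (176, 4, 22.6, -15.4),
}

results = {}
for name, (opps, sf, exp_sf, roe_100) in HITTERS.items():
    sf_oe = sf_over_expected(sf, exp_sf)
    sf_oe_100 = per_100(sf_oe, opps)
    # Assumed cheat-sheet "Gap": runner-home rate minus official sac-fly rate.
    gap = roe_100 - sf_oe_100
    results[name] = {"sf_oe": sf_oe, "sf_oe_100": sf_oe_100, "gap": gap}
    print(f"{name}: SF OE {sf_oe:+.1f}, SF OE/100 {sf_oe_100:+.1f}, gap {gap:+.1f}")
```

Arraez is the useful contrast here. His sac-fly rate is well above expected, but his runner-home rate is higher still, so the gap is positive. Mountcastle leads the sac-fly column with a negative gap.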

This is where the article shifts from the model to the players. The leaderboard shows separation, which matters. It tells us there are hitters who finished well above and well below expectation in these spots.

The leaderboard is not the conclusion. It is the setup. The real question is whether this player spread is bigger than what we would expect from random variation.

Signal or Noise: Does Sac Fly Skill Repeat?

The player leaderboard showed separation. That matters, but it is not enough. A leaderboard can show who finished above expectation without proving that the result is repeatable.

So this section tests the next question: if a hitter beats expectation in one sample, does that tell us anything about what he does in another sample?

I split the data into odd and even seasons from 2015 through 2024. That keeps the test balanced and avoids giving one side a partial 2026 season. Then I compare odd-season performance to even-season performance.

Test setup:
Positive WPA Edge = sac fly WPA edge greater than +0.25 WPA points
Negative WPA Edge = sac fly WPA edge less than -0.25 WPA points
Neutral spots are excluded

Main test = even-season result regressed on odd-season result

Odd/Even Regression Test

The table below uses a simple regression: even-season performance as a function of odd-season performance. A positive slope means the odd-season result carried forward. The p-value and R-squared tell us whether that pattern looks meaningful or mostly noisy.
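For readers who want to see the mechanics, here is a minimal pure-Python sketch of that regression. The odd and even lists are made-up toy numbers, included only so the code runs, and are not player data. The last line is a consistency check against the Positive WPA Edge, ROE / 100 row of the table, using the identity that in a one-variable regression the slope's t-stat can be computed from the correlation and the sample size. The p-value is left out because it needs a t distribution. A library routine such as `scipy.stats.linregress` reports it directly.

```python
import math


def simple_regression(x, y):
    """Ordinary least squares of y on x with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    corr = sxy / math.sqrt(sxx * syy)
    return {
        "intercept": intercept,
        "slope": slope,
        "corr": corr,
        "r_squared": corr ** 2,
        "t_stat": t_stat_from_corr(corr, n),
    }


def t_stat_from_corr(corr, n):
    """t-stat of the slope, derived from the correlation and sample size."""
    return corr * math.sqrt((n - 2) / (1 - corr ** 2))


# Hypothetical per-100 rates for five hitters (toy data, not from the article).
odd_seasons = [1.0, 2.0, 3.0, 4.0, 5.0]
even_seasons = [2.0, 1.0, 4.0, 3.0, 5.0]
fit = simple_regression(odd_seasons, even_seasons)
print(fit)

# Table check: Corr. 0.185 with 607 players should give a t-stat near 4.6.
table_t = t_stat_from_corr(0.185, 607)
print(f"t-stat implied by the table row: {table_t:.2f}")
```

The table check is a reminder of how sample size drives these results. A correlation under 0.2 is weak for any one hitter, but across 607 players it is still far from what chance alone would usually produce.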

Signal or Noise: Odd/Even Sac Fly Skill Test
Even-season result regressed on odd-season result, minimum 10 opportunities per split
| Split | Outcome | Players | Intercept | Slope | t-stat | p-value | R² | Corr. | Takeaway |
|---|---|---|---|---|---|---|---|---|---|
| Positive WPA Edge | SF OE / 100 | 607 | −0.088 | 0.038 | 0.924 | 0.3559 | 0.001 | 0.038 | Mostly noise |
| Positive WPA Edge | ROE / 100 | 607 | −0.573 | 0.202 | 4.623 | 0.0000 | 0.034 | 0.185 | Some evidence of signal |
| Negative WPA Edge | SF OE / 100 | 382 | −0.286 | 0.032 | 0.634 | 0.5263 | 0.001 | 0.033 | Mostly noise |
| Negative WPA Edge | ROE / 100 | 382 | 2.534 | 0.203 | 4.165 | 0.0000 | 0.044 | 0.209 | Some evidence of signal |
| Situational Edge | SF OE / 100 | 379 | 0.246 | −0.026 | −0.465 | 0.6419 | 0.001 | −0.024 | Mostly noise |
| Situational Edge | ROE / 100 | 379 | −2.940 | 0.135 | 2.405 | 0.0166 | 0.015 | 0.123 | Some evidence of signal |
SF OE / 100 is clean sacrifice flies over expected. ROE / 100 is runner scored from third over expected. Situational Edge = Positive WPA Edge result minus Negative WPA Edge result.

The main official-sac-fly test is the Situational Edge: SF OE / 100 row. Its slope was -0.026, with a p-value of 0.6419 and an R² of 0.001.

The broader runner-scored backup test is the Situational Edge: ROE / 100 row. Its slope was 0.135, with a p-value of 0.0166 and an R² of 0.015.

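This is the section that keeps the article honest. If the situational-edge rows show a positive and meaningful relationship, then the sacrifice-fly argument has some repeatable signal behind it. If they are flat, then the leaderboard is probably mostly noise. Here the official sac-fly row is flat, while the broader runner-scored row is positive but explains only about 1.5% of the variance.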

This still does not prove that hitters are intentionally choosing a sacrifice fly. The better question is whether the data shows a repeatable signal. Skill and intent are not the same thing.

Does the Runner-Home / WPA Signal Hold Across Time?

The player-level question changed once WPA entered the article. The old question was whether official sacrifice-fly credit repeated across time. That is not the cleanest test anymore.

The better question is whether the broader runner-home signal connects to actual win-probability value. In other words, do hitters who get the runner from third home above expectation also keep producing sacrifice flies that skew positive in WPA?

Time split:
Sample A = 2015-2016, 2019-2020, and 2023-2024
Sample B = 2017-2018, 2021-2022, and 2025-2026

The test is not whether one leaderboard looks good. The test is whether the runner-home / WPA signal stays stable across separate samples.

Stability Map: Signal or Noise?

This chart measures movement instead of just direction. The x-axis is how much a player’s RSOE / 100 changed from Sample A to Sample B. The y-axis is how much his Net WPA SF% changed.

The chart uses the main cutoff of 30 true SF opportunities and 15 official sacrifice flies in each sample. A movement of 25 percentage points in Net WPA SF% is treated as a noise warning. A movement of 35 percentage points is treated as a major noise warning.
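The movement measure and the two thresholds can be sketched in Python as follows. The function names are mine, and treating a move exactly at a threshold as a warning is my assumption. The signal classes in the tables may rest on additional rules that are not spelled out here, so this covers only the two Net WPA SF% thresholds.

```python
NOISE_WARNING_POINTS = 25.0  # percentage points of Net WPA SF% movement
MAJOR_WARNING_POINTS = 35.0


def net_wpa_move(sample_a_pct, sample_b_pct):
    """Absolute movement between samples, in percentage points."""
    return abs(sample_b_pct - sample_a_pct)


def direction_flipped(sample_a_pct, sample_b_pct):
    """True when the two samples have opposite signs."""
    return sample_a_pct * sample_b_pct < 0


def warning_level(move_points):
    """Apply the two thresholds (assumed inclusive at the boundary)."""
    if move_points >= MAJOR_WARNING_POINTS:
        return "major noise warning"
    if move_points >= NOISE_WARNING_POINTS:
        return "noise warning"
    return "below warning threshold"


# Josh Bell's row from the movers table: -35.0% in Sample A, 30.4% in Sample B.
bell_move = net_wpa_move(-35.0, 30.4)
print(f"Bell: {bell_move:.1f} points, {warning_level(bell_move)}, "
      f"flipped={direction_flipped(-35.0, 30.4)}")

# The footnote example: +20% to -10% is a 30 percentage-point move.
example_move = net_wpa_move(20.0, -10.0)
print(f"Example: {example_move:.1f} points, {warning_level(example_move)}")
```

Using the absolute difference is what makes this a stability test and not a direction test. A hitter who goes from strongly positive to mildly positive can still trip the warning.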

Stability Summary
How much did RSOE / 100 and Net WPA SF% move between samples?
Metric Value
Qualified hitters 53
Cutoff used 30+ true SF opps and 15+ official SF in each sample
Sound signal share 24.5%
Noise warning share 35.8%
Major noise warning share 30.2%
Noise or major warning share 66.0%
Avg RSOE movement 4.08
Avg Net WPA SF% movement 21.6%
Median Net WPA SF% movement 20.0%
RSOE direction flipped 30.2%
Net WPA SF% direction flipped 9.4%
Both directions flipped 7.5%
RSOE A/B correlation 0.444
Net WPA SF% A/B correlation 0.161
Net SF WPA / 100 A/B correlation 0.091
Net WPA SF% movement is measured in percentage points. Example: +20% to -10% is a 30 percentage-point move.
Signal Class Summary
Sound signal vs noise warning groups
| Signal Class | Players | Avg RSOE A | Avg RSOE B | Avg RSOE Move | Avg Net WPA SF% A | Avg Net WPA SF% B | Avg Net WPA Move | Avg Net SF WPA / 100 A | Avg Net SF WPA / 100 B | Official SF | Share |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Sound signal | 13 | 3.67 | 3.10 | 1.83 | 37.9% | 39.0% | 6.4% | 10.37 | 14.22 | 586 | 24.5% |
| Noisy but interesting | 5 | 7.33 | 6.61 | 1.69 | 44.4% | 25.3% | 19.2% | 22.16 | 14.33 | 253 | 9.4% |
| Noise warning | 19 | 4.33 | 3.18 | 5.14 | 38.4% | 37.9% | 18.2% | 14.34 | 15.12 | 857 | 35.8% |
| Major noise warning | 16 | 2.73 | 2.23 | 5.40 | 24.7% | 17.7% | 38.9% | 13.38 | 6.59 | 631 | 30.2% |
Signal class is based on movement, not just whether the player was positive or negative in one sample.
Biggest Net WPA SF% Movers
These are the players driving the noise warning
Player | Class | Opps | Official SF | RSOE A | RSOE B | RSOE Move | Net WPA SF% A | Net WPA SF% B | Net WPA Move | Net SF WPA / 100 A | Net SF WPA / 100 B | SF WPA Move
Bell, Josh | Major noise warning | 331 | 43 | 3.27 | 6.43 | 3.15 | −35.0% | 30.4% | 65.4% | −6.09 | 5.17 | 11.26
Ozuna, Marcell | Major noise warning | 336 | 32 | 1.72 | −6.43 | 8.15 | 60.0% | 5.9% | 54.1% | 8.79 | 5.49 | 3.30
Realmuto, J.T. | Major noise warning | 306 | 39 | 1.66 | 3.25 | 1.59 | 76.2% | 22.2% | 54.0% | 17.75 | 13.33 | 4.42
Kepler, Max | Major noise warning | 229 | 35 | 7.77 | −0.81 | 8.58 | 29.4% | −16.7% | 46.1% | 44.42 | −18.96 | 63.38
Riley, Austin | Major noise warning | 208 | 31 | −2.45 | −8.48 | 6.04 | 62.5% | 20.0% | 42.5% | 27.67 | 18.85 | 8.82
Albies, Ozzie | Major noise warning | 228 | 44 | 1.94 | 9.56 | 7.62 | 62.5% | 21.4% | 41.1% | 75.06 | 15.18 | 59.88
Correa, Carlos | Major noise warning | 338 | 47 | 2.20 | 2.80 | 0.59 | 0.0% | 40.7% | 40.7% | −0.91 | 17.78 | 18.69
Rendon, Anthony | Major noise warning | 245 | 48 | 8.15 | 11.72 | 3.57 | 56.7% | 16.7% | 40.0% | 16.45 | 8.49 | 7.95
Longoria, Evan | Major noise warning | 255 | 49 | −0.53 | 5.33 | 5.86 | 11.1% | 50.0% | 38.9% | 33.72 | 21.55 | 12.18
Seager, Corey | Major noise warning | 225 | 33 | 10.08 | 6.91 | 3.17 | 35.3% | 0.0% | 35.3% | 19.44 | −5.05 | 24.49
Donaldson, Josh | Major noise warning | 211 | 34 | 3.22 | 5.11 | 1.89 | 47.1% | 11.8% | 35.3% | −0.08 | 15.51 | 15.59
Gregorius, Didi | Noise warning | 195 | 40 | 3.60 | 6.87 | 3.27 | 13.3% | 48.0% | 34.7% | −4.20 | 15.14 | 19.34
Moustakas, Mike | Noise warning | 193 | 36 | −0.02 | −1.52 | 1.50 | 38.9% | 5.6% | 33.3% | 14.12 | −4.79 | 18.92
Smith, Will | Noise warning | 201 | 47 | 4.76 | 4.98 | 0.23 | 20.8% | 52.2% | 31.3% | 10.00 | 40.47 | 30.47
Crawford, Brandon | Major noise warning | 260 | 46 | 8.11 | −1.08 | 9.19 | 0.0% | 30.4% | 30.4% | −3.69 | 8.38 | 12.08
This table is intentionally skeptical. Large movement does not prove a player is bad. It means the player-level WPA signal is unstable across samples.

How to Read This

This is a better consistency test than the old SFOE version because it tests the article’s actual question. RSOE / 100 measures getting the runner home. Net WPA SF% measures whether the player’s official sacrifice flies skewed toward positive or negative win-probability value.

The key is movement. A small move is fine. A player going from +18% Net WPA SF% to +10% is still basically the same profile. A player going from +30% to -10% moved 40 percentage points, and that is a real noise warning.

The results are a major caution flag. About two-thirds of the qualified hitters (66.0%) landed in a noise-warning or major-noise-warning bucket. The Net WPA SF% sample-to-sample correlation is basically flat at 0.161.

This section does not need the signal to be perfect. It just needs to be honest. If the best runner-home players keep a stable positive WPA profile, the player-level case gets stronger. If Net WPA SF% swings by 25+ percentage points for a lot of players, then the safer conclusion is that WPA timing is real value, but noisy at the player level.
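The sample-to-sample correlations in the Stability Summary are the kind of number a plain Pearson correlation produces: one pair of values per hitter, Sample A against Sample B. Here is a minimal sketch, assuming that is how the A/B correlation was computed. The five pairs are Net WPA SF% values taken from the biggest-movers table purely for illustration. They are a hand-picked, unstable subset, so the result will not match the 0.161 reported for all 53 hitters.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between paired values (for example Sample A vs Sample B)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side never varies")
    return cov / sqrt(var_x * var_y)


# Illustrative subset only: Net WPA SF% in Sample A and Sample B for
# Bell, Ozuna, Realmuto, Kepler, and Riley from the movers table.
sample_a = [-35.0, 60.0, 76.2, 29.4, 62.5]
sample_b = [30.4, 5.9, 22.2, -16.7, 20.0]
print(round(pearson_r(sample_a, sample_b), 3))
```

A value near 1 would mean hitters keep their ranking from one sample to the next. A value near 0 means Sample A tells you very little about Sample B, which is the situation the Net WPA SF% row describes.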

Runner Home When It Matters

This is the player-level version of the WPA question. The league-level chart can tell us what percentage of all sacrifice flies are positive or negative WPA plays. This asks something more useful: which hitters are producing those positive-WPA sacrifice flies?

The x-axis is RSOE / 100, which measures how often a hitter gets the runner from third home above expectation. The y-axis is Positive WPA SF %, which measures the share of that hitter’s official sacrifice flies that increased his team’s win probability.

The question:
Do the best runner-home players also have a higher share of positive-WPA sacrifice flies?

Do the Best Runner-Home Players Create Sac Fly Win Value?

This scatterplot is the direct test. Each point is one hitter. To qualify, a player needed at least 100 true SF opportunities and at least 10 official sacrifice flies from 2015–2026.

The x-axis shows how often the hitter got the runner from third home above expectation. The y-axis shows the net batting-team WPA points he created through official sacrifice flies, scaled per 100 true sacrifice-fly opportunities.

The color shows the direction of the player’s official sacrifice flies. Green means his sac flies skewed positive-WPA. Red means they skewed negative-WPA. Bubble size is official sacrifice flies.

The Direct Answer

The top RSOE group created more Net SF WPA / 100 than the bottom RSOE group, but the separation is modest. The top-minus-bottom gap in Net SF WPA / 100 was 3.28. The top-minus-bottom gap in Net WPA SF% was 2.6 percentage points.

Does RSOE Separate Sac Fly Win Value?
Qualified hitters grouped by RSOE / 100
RSOE Group | Players | Opps | Official SF | Avg RSOE / 100 | Avg SFOE / 100 | Net SF WPA / 100 | WPA+ SF % | WPA- SF % | Net WPA SF % | Avg SF WPA Pts
Top 25% RSOE / 100 | 87 | 15,183 | 2,352 | 8.47 | 2.38 | 13.31 | 62.1% | 27.7% | 34.4% | 0.86
Middle 50% RSOE / 100 | 174 | 32,028 | 4,409 | 1.54 | 0.97 | 11.12 | 61.2% | 28.5% | 32.7% | 0.81
Bottom 25% RSOE / 100 | 87 | 15,103 | 1,774 | −5.56 | −1.12 | 10.04 | 60.4% | 28.6% | 31.8% | 0.85
Net SF WPA / 100 is batting-team WPA points from official sacrifice flies per 100 true SF opportunities. Net WPA SF % is WPA+ SF % minus WPA- SF %.
Section 14 Answer Key
Does runner-home overperformance connect to sac fly win value?
Metric | Value
Qualified hitters | 348
Minimum true SF opps | 100
Minimum official SF | 10
Qualified-player Net SF WPA / 100 | 11.39
Qualified-player WPA+ SF% | 61.3%
Qualified-player WPA- SF% | 28.3%
Qualified-player Net WPA SF% | 33.0%
RSOE vs Net SF WPA / 100 correlation | 0.127
RSOE vs Net SF WPA / 100 R-squared | 0.016
RSOE vs Net WPA SF% correlation | 0.041
Top-minus-bottom Net SF WPA / 100 gap | 3.28
Top-minus-bottom Net WPA SF% gap | 2.6%
Net WPA SF% = Positive WPA SF% minus Negative WPA SF%. Net SF WPA / 100 uses actual WPA points.
Player Examples
Who drives the relationship, and who complicates it?
Player | Opps | Official SF | RSOE / 100 | SFOE / 100 | Net SF WPA / 100 | WPA+ SF % | WPA- SF % | Net WPA SF % | Avg SF WPA Pts
Best RSOE / 100
Ramírez, Harold | 122 | 15 | 19.76 | −0.46 | 35.90 | 73.3% | 13.3% | 60.0% | 2.92
Rutschman, Adley | 104 | 19 | 16.13 | 4.90 | 35.00 | 63.2% | 10.5% | 52.6% | 1.92
Arraez, Luis | 150 | 32 | 14.98 | 8.22 | 11.60 | 59.4% | 25.0% | 34.4% | 0.54
Kwan, Steven | 122 | 19 | 13.40 | 2.12 | 26.39 | 57.9% | 42.1% | 15.8% | 1.69
Kirk, Alejandro | 128 | 23 | 13.26 | 4.39 | 23.05 | 52.2% | 21.7% | 30.4% | 1.28
Highest Net SF WPA / 100
Gattis, Evan | 117 | 20 | 11.54 | 4.39 | 49.49 | 70.0% | 15.0% | 55.0% | 2.90
Albies, Ozzie | 228 | 44 | 6.65 | 6.25 | 38.03 | 65.9% | 29.5% | 36.4% | 1.97
Marsh, Brandon | 112 | 20 | 3.79 | 4.55 | 46.16 | 75.0% | 20.0% | 55.0% | 2.58
Lowe, Brandon | 146 | 21 | 1.33 | 1.09 | 41.51 | 71.4% | 14.3% | 57.1% | 2.89
Laureano, Ramón | 140 | 22 | −1.04 | 2.48 | 47.93 | 72.7% | 13.6% | 59.1% | 3.05
Lowest Net SF WPA / 100
Beltré, Adrian | 152 | 25 | 12.00 | 3.55 | −16.51 | 48.0% | 44.0% | 4.0% | −1.00
India, Jonathan | 121 | 20 | 6.26 | 3.48 | −16.78 | 30.0% | 55.0% | −25.0% | −1.02
Hosmer, Eric | 234 | 20 | 6.11 | −4.09 | −10.73 | 45.0% | 50.0% | −5.0% | −1.26
Turang, Brice | 108 | 12 | 0.55 | −2.44 | −20.93 | 41.7% | 33.3% | 8.3% | −1.88
Bruce, Jay | 154 | 19 | −3.11 | −0.20 | −12.99 | 42.1% | 47.4% | −5.3% | −1.05
Good RSOE, lower SF WPA
García Jr., Luis | 132 | 22 | 12.69 | 3.08 | 5.83 | 45.5% | 27.3% | 18.2% | 0.35
Beltré, Adrian | 152 | 25 | 12.00 | 3.55 | −16.51 | 48.0% | 44.0% | 4.0% | −1.00
Naylor, Josh | 211 | 24 | 11.65 | −2.07 | 9.76 | 75.0% | 25.0% | 50.0% | 0.86
Ruiz, Keibert | 107 | 12 | 11.30 | −2.05 | 7.57 | 66.7% | 16.7% | 50.0% | 0.67
Reynolds, Bryan | 197 | 23 | 10.17 | −1.36 | 5.48 | 43.5% | 52.2% | −8.7% | 0.47
The final group highlights players with positive RSOE / 100 but below-average Net SF WPA / 100.
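The two WPA columns in these tables can be rebuilt from the other columns, which is a useful sanity check on what each one measures. This sketch assumes "Avg SF WPA Pts" is the average WPA points per official sacrifice fly. That reading is my assumption, though it reproduces the table rows to within rounding. The function names are hypothetical.

```python
def net_wpa_sf_pct(wpa_plus_pct: float, wpa_minus_pct: float) -> float:
    """Net WPA SF% = share of positive-WPA sac flies minus share of negative-WPA sac flies."""
    return wpa_plus_pct - wpa_minus_pct


def net_sf_wpa_per_100(avg_sf_wpa_pts: float, official_sf: int, true_sf_opps: int) -> float:
    """WPA points from official sac flies per 100 true SF opportunities.

    Assumption: avg_sf_wpa_pts is the mean WPA points per official sacrifice fly,
    so total points = average * number of official sacrifice flies.
    """
    total_pts = avg_sf_wpa_pts * official_sf
    return 100.0 * total_pts / true_sf_opps


# Harold Ramírez row: 122 opps, 15 official SF, 73.3% WPA+, 13.3% WPA-, 2.92 avg pts.
print(round(net_wpa_sf_pct(73.3, 13.3), 1))         # 60.0, matching the table
print(round(net_sf_wpa_per_100(2.92, 15, 122), 2))  # 35.9, matching the table's 35.90
```

The split matters because the two columns can disagree. Net WPA SF% counts every sacrifice fly the same, while Net SF WPA / 100 weights each one by how far it moved win probability and by how often the hitter had the chance.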

What This Actually Says

The scatterplot is not asking whether every good runner-home player is clutch. It is asking whether the player-level runner-home signal lines up with actual win-probability value from official sacrifice flies.

This version is better than using Positive WPA SF% alone because it uses the size of the WPA movement, not just the direction. A tiny positive sac fly and a huge positive sac fly should not count the same.

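The top RSOE group created more Net SF WPA / 100 than the bottom RSOE group, but the player-level relationship is weak. The correlation between RSOE / 100 and Net SF WPA / 100 was 0.127, with an R² of 0.016, so RSOE / 100 explains under 2% of the variation in Net SF WPA / 100.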
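For readers who want to connect the correlation to the R² figure: with a single predictor in a simple linear fit, R² is just the correlation squared. The sketch below checks that arithmetic and the group gap using the rounded values printed in the tables. The function name is hypothetical.

```python
def r_squared_from_r(r: float) -> float:
    """For a simple one-predictor linear fit, R-squared equals the Pearson r squared."""
    return r * r


# Reported correlation between RSOE / 100 and Net SF WPA / 100.
print(round(r_squared_from_r(0.127), 3))  # 0.016

# Top-minus-bottom gap from the rounded group averages in the table.
# This gives 3.27; the reported 3.28 presumably comes from unrounded inputs.
print(round(13.31 - 10.04, 2))
```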

This is the sharper version of the argument. The official sacrifice-fly label is not enough. The better question is whether a hitter gets the runner home above expectation, and whether those sacrifice flies create actual win-probability value.

The Evidence Stack

At this point, the article has looked at the sacrifice fly from a few different angles. Each one answers a slightly different question.

That is important because the whole debate is not really about one scoring rule. It is about what kind of value we are trying to measure.

The Evidence Stack
Each layer answers a different version of the sacrifice-fly question
Layer | Question | What It Showed | Why It Matters
Run Expectancy | What did the play do to the inning? | The modeled sacrifice fly finished at −0.051 runs versus the average starting situation. | This explains why WAR can be skeptical of the sacrifice fly. The run scores, but the out changes the remaining inning value.
Zachary Fly | What happens if one hitter always gets the runner home this way? | Over a 600 PA season, Zac finished −1.548 runs versus the average hitter, or about −0.155 WAR using the simple 10 runs per WAR scale. | This makes the tradeoff easier to see. Zac can do the thing we want and still look worse through a strict run-expectancy lens.
Win Probability | Did the run matter to winning the game? | The WPA map showed that the same sac fly can look very different depending on inning, score, and base-out state. | This is the missing context. RE measures average run value, while WPA shows why one run is not always just one run.
Player Signal | Is this just random noise? | The player sections tested whether hitters separate from each other instead of assuming the leaderboard is automatically meaningful. | This keeps the article honest. A better value framework still needs to avoid overclaiming small samples or noisy player rankings.
RSOE vs SFOE | Does the box-score label capture the broader result? | Among qualified hitters, RSOE / 100 and SFOE / 100 had a player-level correlation of 0.404. | This separates the baseball result from the scoring label. Getting the runner home is broader than getting official sacrifice-fly credit.
WPA / 100 | Do those player results also carry game value? | WPA adds the game-value layer by asking how much the play changed the team’s chance to win. | This ties the player section back to the main argument. The best version of the question includes runner-home value, official credit, and game value.
This table summarizes the argument before the final WAR interpretation. It does not add a new model.
That is the full stack: RE explains the cost of the out, WPA explains why the run can matter more in certain game states, and RSOE/SFOE separates the actual runner-home result from the official scoring label. WAR lives in the middle of that tension.

So the final question is not whether the sacrifice fly is always good or always bad. It is whether a WAR framework built mostly around average run value can fully capture a play whose value often depends on the exact game state.

What WAR Sees, and What It Misses

This is where the sacrifice fly becomes more than a scoring-rule complaint.

The point is not that every sacrifice fly should automatically help a hitter’s WAR. That would be too simple. Some sacrifice flies are routine. Some come in low-value spots. Some are just ordinary outs where the runner happened to score.

The better point is that Run Expectancy only answers one question: how did this play change the average number of runs expected in the inning?

That is a useful question. It is also not the only question.

Zachary Fly and Average Joe make the tension easier to see. Average Joe represents the normal baseline. He might do more than hit a sacrifice fly. He might single, walk, homer, strike out, or fail completely. Zac is simpler. With a runner on third and fewer than two outs, he banks the run with a sacrifice fly.

Through a strict Run Expectancy lens, Average Joe can look better because he preserves the upside of a bigger inning. Through a Win Probability lens, Zac can look better because in some game states the run matters more than the remaining upside.

That is the whole issue. WAR has good reasons to avoid becoming a pure context stat. The problem is that stripping away context completely can also strip away the reason a play mattered.

How Each Framework Sees the Sacrifice Fly
The same play can look different depending on the value lens
Lens | What It Sees | What It Misses | Why It Matters
Run Expectancy | The average run value of the base-out state before and after the play. | Score, inning, leverage, and whether one specific run changes the game. | The modeled sacrifice fly came out at −0.051 runs versus the average starting situation.
Win Probability | How much the play changed the batting team’s chance to win the game. | It can over-credit context, teammate setup, and game situation. | It explains why the same sacrifice fly can matter very differently depending on inning and score.
Zachary Fly vs Average Joe | The cleanest version of the tradeoff: bank the run or preserve the bigger-inning upside. | It is still a thought experiment, not a direct player valuation model. | Over a 600 PA season, Zac finished −1.548 runs versus Average Joe, or about −0.155 WAR using the simple 10 runs per WAR scale.
RSOE vs SFOE | The difference between getting the runner home and getting official sacrifice-fly credit. | Intent, approach, and whether the official scoring label fully captures the plate appearance. | In the player section, RSOE / 100 and SFOE / 100 had a correlation of 0.404.
WPA / 100 | Whether the player-level results also carry game-value impact. | It is still context-heavy and should not simply replace WAR. | WPA adds the game-value layer by asking how much the play changed the batting team’s chance to win.
WAR | A context-neutral estimate of player value above replacement. | Small situational value that only matters because of score, inning, or leverage. | WAR is useful because it avoids overreacting to context, but that same choice can miss some real game value.
The sacrifice fly is useful as a case study because it separates average run value, game value, player signal, and official scoring credit.
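The Zachary Fly figure in the table above uses the simple 10 runs per WAR scale. As a quick check of that conversion, here is a minimal sketch. The flat 10-run scale is the article's stated simplification, not a full WAR calculation, and the function name is hypothetical.

```python
RUNS_PER_WAR = 10.0  # the article's simple conversion scale


def runs_to_war(runs: float, runs_per_war: float = RUNS_PER_WAR) -> float:
    """Convert a run total to wins using a flat runs-per-WAR scale."""
    return runs / runs_per_war


# Zac's modeled 600 PA season: -1.548 runs versus Average Joe.
print(round(runs_to_war(-1.548), 3))  # -0.155
```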

The Balanced Takeaway

The sacrifice fly does not prove WAR is broken. That would be too strong. WAR is trying to measure player value in a neutral way, and that is the whole reason it is useful.

The issue is the tradeoff. A context-neutral model is cleaner, more stable, and easier to defend. At the same time, it can miss plays where the entire value is tied to the situation.

Run Expectancy is not wrong. It is doing what it is built to do. It measures average run value. The problem is that average run value is not always the same thing as helping your team win that game, in that inning, with that runner on third.

That is why the Zachary Fly versus Average Joe comparison works. Zac is not automatically better. Joe is not automatically better. The answer depends on the question being asked.

If the question is average inning value, Joe can have the stronger case. If the question is banking a specific run in a specific game state, Zac can have the stronger case. WAR mostly lives closer to the first question. Baseball games are sometimes decided by the second.

That is also why RSOE, SFOE, and WPA matter. RSOE asks whether the runner scored more often than expected. SFOE asks whether the player got official sacrifice-fly credit more often than expected. WPA asks whether those events helped win the game. Those three things overlap, but they are not identical.

So the argument is not: give every sacrifice fly WAR credit. The argument is: be careful when a WAR framework leans too heavily on average run value, because some value only appears once the game state is allowed back into the picture.

The sacrifice fly is interesting because it lives in that gap. It is a small play, but it exposes a big measurement problem.

The Sac Fly Is Small. That’s the Point.

After all of this, I do not think the answer is that sacrifice flies are secretly some massive hidden WAR flaw.

They are not. One sacrifice fly is tiny. Even a full season of these chances usually moves the needle by fractions of a win.

That is exactly why I like the play as a case study. The sacrifice fly is small enough to understand, common enough to measure, and weird enough to show where different value systems disagree.

Run Expectancy sees the cost of the out. Win Probability sees the value of the run in context. RSOE sees whether the runner actually came home. SFOE sees whether the hitter got the official sacrifice-fly label. WAR has to decide how much of that context it wants to keep.

That is not an easy decision. A fully context-neutral stat can miss the importance of the moment. A fully context-driven stat can give a player too much credit for the situation around him.

The sacrifice fly sits right between those two problems.

Final Answers
The sacrifice fly is small, but the measurement question is not
Question | Answer | Why
Should every sacrifice fly help WAR? | No. | The out has a real cost, and not every run-scoring out is equally valuable.
Is Run Expectancy wrong? | No. | It answers the average inning-value question well. It just does not answer every value question.
Is Win Probability the full answer? | No. | It captures game context, but it can over-credit the situation around the player.
Does the official sacrifice-fly label tell the whole story? | No. | Getting the runner home is broader than getting official sacrifice-fly credit.
Does Zachary Fly beat Average Joe? | It depends. | Average Joe can win the average run-value argument. Zachary Fly can win the specific game-state argument.
What is the real takeaway? | WAR is useful, but incomplete. | A run-expectancy-heavy framework can miss value that only appears when inning, score, and game state matter.
This is a summary of the article’s argument, not a proposed replacement WAR formula.

A run scores. An out is made. Which matters more?

The answer is not one or the other. The answer depends on the question.

If the question is average run value, the out matters a lot. If the question is game value, the run can matter more. If the question is player skill, we need to be careful and separate real signal from noisy scoring labels.

That is the whole lesson. The sacrifice fly is not a big play. It is a small play that forces the bigger question into the open.

WAR mostly asks what a play is worth on average.
Baseball games are sometimes decided by what a play is worth right now.

That gap is where the sacrifice fly lives.