The Sacrifice of Flies in WAR

When I first learned that a sacrifice fly does not really help your WAR, and can actually hurt it in certain cases, it struck me as odd.

The batter gets the run home. That feels important. Of course, he also makes an out, and that matters too. Depending on the situation, the team’s expected runs for the rest of the inning can drop even after the run scores. I get that to some degree.

Still, the whole thing did not pass the smell test. So, against the advice of my therapist, I felt like it was my duty to spend an inordinate amount of time trying to get to the bottom of it.

The question is simple on the surface: a run scores, an out is made, which one matters more?

The answer depends on what we are measuring. Run Expectancy shows what the play did to the inning. Win Probability shows what it did to the game. WAR tries to turn value into one context-neutral number, and the sacrifice fly sits right between those ideas.

One Number to Rule Them All

Really, that is the goal of sports analytics in general. Players and teams create value in different ways, and one number makes them easier to compare. It gives us a common language for asking who helped more, even when they helped in completely different ways.

WAR is useful because of that, but baseball makes it difficult. Singles, walks, strikeouts, doubles, steals, double plays, home runs, sacrifice flies, and everything else all have to be translated into runs, then into wins.

Most plays are easy enough to understand. A home run is good. A strikeout is bad. A double is better than a single. The exact value can be debated, but the direction is clear.

Sacrifice flies are different because the good and bad happen at the same time. A run scores, but an out is made. The offense gets something real, but it also gives away one of its remaining outs.

WAR cannot just give the hitter full credit for the run. It has to ask the harder question: after accounting for both the run and the out, did the offense actually improve its expected run total?

Even When the Run Scores, the Out Still Counts

Run Expectancy does not care whether a play feels productive. It asks a colder question: after the play happened, how much did the offense’s expected run total change?

That is why the sacrifice fly gets interesting. The run scores, which is good. The batter also makes an out, which is bad. Run Expectancy forces both parts of the trade into the same equation.

The basic idea is simple: RE Change = Runs Scored + New Run Expectancy - Old Run Expectancy. For the average opportunity, I am using the observed delta_run_exp from the play-level data. For the clean sacrifice fly result, I use observed delta_run_exp when it exists. When it does not, I use a base/out Run Expectancy model for the clean sacrifice fly outcome.
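The formula is short enough to write as a helper. This is a minimal sketch: the function name is mine, and the run-expectancy values in the example are hypothetical placeholders, not numbers from the play-level data.

```python
def re_change(runs_scored: float, re_before: float, re_after: float) -> float:
    """RE Change = Runs Scored + New Run Expectancy - Old Run Expectancy."""
    return runs_scored + re_after - re_before


# Hypothetical run-expectancy values, for illustration only.
RE_THIRD_ONE_OUT = 0.95   # runner on third, one out (assumed)
RE_EMPTY_TWO_OUT = 0.10   # bases empty, two outs (assumed)
RE_THIRD_TWO_OUT = 0.36   # runner on third, two outs (assumed)

# Clean sacrifice fly: one run scores, the bases empty, an out is added.
sac_fly = re_change(runs_scored=1, re_before=RE_THIRD_ONE_OUT, re_after=RE_EMPTY_TWO_OUT)

# Strikeout in the same spot: no run, the runner stays, an out is added.
strikeout = re_change(runs_scored=0, re_before=RE_THIRD_ONE_OUT, re_after=RE_THIRD_TWO_OUT)

print(f"sac fly: {sac_fly:+.3f}, strikeout: {strikeout:+.3f}")
```

The same function covers every outcome in the opportunity, which is what lets the average result and the clean sacrifice fly sit on one scale.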

Run Expectancy: Average Result vs Clean Sac Fly
Average result uses observed RE change. Clean sacrifice fly uses observed RE change when available, otherwise a base/out RE fallback.
Scenario	Opp	Clean SF	Clean SF%	Avg RE Change	Clean SF RE Change	SF vs Avg
0 Outs
3rd Only	4,848	599	12.4%	0.011	−0.140	−0.151
1st and 3rd	9,373	1,258	13.4%	0.019	−0.260	−0.279
2nd and 3rd	7,155	825	11.5%	0.025	−0.269	−0.295
Bases Loaded	7,717	998	12.9%	0.016	−0.304	−0.320
1 Out
3rd Only	18,307	2,356	12.9%	0.038	0.160	0.122
1st and 3rd	20,418	2,672	13.1%	0.028	0.060	0.032
2nd and 3rd	16,748	1,926	11.5%	0.038	−0.028	−0.066
Bases Loaded	17,507	2,382	13.6%	0.016	−0.083	−0.099
Overall Average
Overall Average	102,073	13,016	12.8%	—	—	—
Clean SF means official sacrifice fly, runner from third scored, and no sacrifice-fly double play or other double-play weirdness.
Avg RE Change is the average observed delta_run_exp for all outcomes in the same base/out state.
Clean SF RE Change source: Modeled RE fallback: 8 row(s). Modeled fallback assumes a runner on second advances to third 17% of the time.
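The modeled fallback can be sketched roughly like this. Only the 17% advance rate comes from the note above. The run-expectancy values, the state keys, and the function name are hypothetical, and I am assuming the runner on second is the only runner whose position is uncertain after the catch.

```python
# Hypothetical run-expectancy table keyed by (occupied bases, outs).
# These values are illustrative only, not the article's RE matrix.
RE = {
    ("23", 1): 1.40,  # runners on second and third, one out
    ("2", 2): 0.32,   # runner on second, two outs
    ("3", 2): 0.36,   # runner on third, two outs
}

ADVANCE_PROB = 0.17  # from the table note: runner on second takes third 17% of the time


def modeled_sf_re_change(before, stay_state, advance_state,
                         runs=1.0, p_advance=ADVANCE_PROB):
    """Modeled RE change for a clean sac fly when no observed value exists.

    The ending run expectancy is a blend of two end states: the runner on
    second holds (stay_state) or tags up to third (advance_state).
    """
    re_after = (1 - p_advance) * RE[stay_state] + p_advance * RE[advance_state]
    return runs + re_after - RE[before]


change = modeled_sf_re_change(("23", 1), ("2", 2), ("3", 2))
print(f"modeled clean SF RE change: {change:+.4f}")
```

With only a runner on third, there is nothing to blend: the end state is bases empty with one more out, and the formula reduces to the plain RE change.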

The table is the cleanest version of the tradeoff. A clean sacrifice fly is productive, but it still has to beat the average value of the same opportunity. That is why the out matters.

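The overall picture is the punchline. Weighted by how often each base/out state occurred across these true sacrifice-fly opportunities, the average plate appearance changed run expectancy by about +0.027 runs. A successful clean sacrifice fly changed run expectancy by about −0.051 runs. The gap was about −0.077 runs per opportunity.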

This is the part that makes WAR’s treatment more understandable. A sacrifice fly is not getting graded against an empty baseline. It is getting graded against what the offense could have expected from that same situation.

That answers the inning-value question. It does not fully answer the game-value question. To get there, we need to think about what happens when one hitter is almost guaranteed to get the runner home.

The Curious Case of Zachary Fly

Imagine a hitter named Zachary Fly. Every time he comes up with a runner on third and fewer than two outs, he does exactly what the name suggests. He hits a successful clean sacrifice fly. The runner scores. The batter is out. No weird double-play stuff. No scoring loopholes. Just the pure version of the play.

Now compare him with Average Joe. Average Joe does not always hit a sacrifice fly. He gets the average result from the same mix of runner-on-third opportunities. Sometimes that means a hit. Sometimes a walk. Sometimes a strikeout. Sometimes a groundout. Sometimes the runner scores, and sometimes he does not.

Zachary Fly vs Average Joe: Run Expectancy View
Scaled to 30 true runner-on-third, fewer-than-two-out opportunities.
Scenario	Share	Opp / Season	Average Joe RE / Opp	Zac RE / Opp	Zac Edge / Opp	Season Run Edge
0 Outs
3rd Only	4.7%	1.4	0.011	−0.140	−0.151	−0.216
1st and 3rd	9.2%	2.8	0.019	−0.260	−0.279	−0.770
2nd and 3rd	7.0%	2.1	0.025	−0.269	−0.295	−0.620
Bases Loaded	7.6%	2.3	0.016	−0.304	−0.320	−0.726
1 Out
3rd Only	17.9%	5.4	0.038	0.160	0.122	0.654
1st and 3rd	20.0%	6.0	0.028	0.060	0.032	0.195
2nd and 3rd	16.4%	4.9	0.038	−0.028	−0.066	−0.323
Bases Loaded	17.2%	5.1	0.016	−0.083	−0.099	−0.511
Modeled Season Total
Modeled Season Total	100.0%	30.0	0.027	−0.051	−0.077	−2.318
Average Joe uses the average observed RE change from the same base/out opportunity mix.
Zachary Fly uses the clean sacrifice-fly RE value from Section 5 in those same base/out states.
WAR edge uses 10 runs = 1 WAR as a simple rule of thumb.

The table scales the Section 5 result to a modeled season of 30.0 true sacrifice-fly opportunities. The key column is Season Run Edge, which is Zachary Fly’s RE value minus Average Joe’s RE value after accounting for how often each base/out state appears.

Over 30.0 true sacrifice-fly opportunities, Average Joe comes out around +0.800 run-expectancy runs. Zachary Fly comes out around -1.517. That leaves Zac at -2.318 runs compared with the average hitter, or roughly -0.232 WAR using the 10-runs-per-win shortcut.
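The season arithmetic can be rebuilt from the first table. This sketch weights each base/out state by its opportunity count, scales to 30 opportunities, and applies the 10-runs-per-win shortcut. The inputs are the rounded table values, so the results should land close to the figures above without matching every digit. Variable names are mine.

```python
# (scenario, opportunities, avg RE change, clean SF RE change), from the RE table.
ROWS = [
    ("0 out, 3rd only",     4848,  0.011, -0.140),
    ("0 out, 1st and 3rd",  9373,  0.019, -0.260),
    ("0 out, 2nd and 3rd",  7155,  0.025, -0.269),
    ("0 out, bases loaded", 7717,  0.016, -0.304),
    ("1 out, 3rd only",     18307, 0.038,  0.160),
    ("1 out, 1st and 3rd",  20418, 0.028,  0.060),
    ("1 out, 2nd and 3rd",  16748, 0.038, -0.028),
    ("1 out, bases loaded", 17507, 0.016, -0.083),
]

SEASON_OPPS = 30    # modeled season of true sac-fly opportunities
RUNS_PER_WIN = 10   # rule-of-thumb conversion used in the article

total_opps = sum(opps for _, opps, _, _ in ROWS)

# Opportunity-weighted RE change per plate appearance for each hitter.
joe_per_opp = sum(opps * avg for _, opps, avg, _ in ROWS) / total_opps
zac_per_opp = sum(opps * sf for _, opps, _, sf in ROWS) / total_opps

joe_season = joe_per_opp * SEASON_OPPS
zac_season = zac_per_opp * SEASON_OPPS
season_gap = zac_season - joe_season
rough_wins = season_gap / RUNS_PER_WIN

print(f"Average Joe: {joe_season:+.3f} runs")
print(f"Zachary Fly: {zac_season:+.3f} runs")
print(f"Gap: {season_gap:+.3f} runs, about {rough_wins:+.3f} wins")
```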

That is the uncomfortable part. Zac gets the runner home every time, and Run Expectancy still may not crown him. That is not because the run is ignored. It is because the average alternative is not doing nothing. The average alternative includes hits, walks, extra-base damage, and all the ways an inning can keep building.

So the Run Expectancy answer is clear enough: the sacrifice fly is useful, but it is not automatically better than the average opportunity. The run matters. The out can still give back enough future value to make the play look worse than it feels.

Run Expectancy or Win Expectancy?

The sacrifice fly is not a free run. The run scores, but the out still changes the rest of the inning. If the average hitter in that same spot can sometimes keep the line moving with a hit, a walk, or extra-base damage, then Zachary Fly’s guaranteed out is not automatically the best run-value outcome.

But this is where the question starts to shift. Baseball is not played inside a spreadsheet of average innings. It is played with a score, an inning, a leverage spot, and a game that can swing on one run.

That is the part Run Expectancy is not trying to answer. Run Expectancy asks what the play did to the expected run total for the inning. Win Expectancy asks what the play did to the team’s chance of winning the game.

That distinction matters for Zachary Fly. A sacrifice fly in the second inning of a five-run game is one thing. A sacrifice fly in the ninth inning of a tie game is something else entirely. The box score calls them both sacrifice flies. Run Expectancy may treat them similarly. The game does not.

So that is the split I want to keep in mind from here. Run Expectancy tells us what the play did to the inning. Win Expectancy tells us what the play did to the game. The sacrifice fly sits right between those two ideas.

The WPA Case for Zachary Fly

Section 7 set up the split. Run Expectancy tells us what the play did to the inning. Win Expectancy tells us what the play did to the game.

That distinction matters for Zac. Through Run Expectancy, his guaranteed sacrifice fly can look worse than average because the out lowers the rest-of-inning ceiling. Through Win Probability, the same play can look very different. If the run ties the game, takes the lead, or protects a late lead, the scoreboard may care more than the inning ceiling.

So now I want to give Zac his cleanest argument. Forget the full-season total for a minute. First, where does the sacrifice fly actually help win probability compared with the average result in the same situation?

Sac Fly WPA Edge by Game State

This heat map compares the average hitter result in each cell with the modeled sacrifice fly result. Positive numbers mean Zac’s sac fly is better for win probability than the average result in that spot. Negative numbers mean the average result is better.

Where Zac Looks Best and Worst by WPA
Positive WPA edge means the modeled sac fly beats the average result in that game state.
Spot	Share	Avg WPA	SF WPA	SF Edge	SF Sample	SF Source
Best Zac Spots
9th | Tie | 3rd Only, 1 Out	0.18%	0.13	16.82	16.69	14	Exact cell
9th | Tie | 1st and 3rd, 1 Out	0.22%	0.76	16.63	15.87	21	Exact cell
Extras | Down 1 | 3rd Only, 1 Out	0.15%	−1.61	13.51	15.12	17	Exact cell
Extras | Tie | Bases Loaded, 1 Out	0.39%	−1.32	13.57	14.89	41	Exact cell
9th | Tie | 2nd and 3rd, 1 Out	0.18%	−0.94	13.36	14.31	14	Exact cell
Worst Zac Spots
9th | Down 2 | Bases Loaded, 1 Out	0.14%	−0.33	−13.12	−12.79	16	Exact cell
9th | Down 2 | 1st and 3rd, 1 Out	0.14%	−0.32	−10.60	−10.28	14	Exact cell
Extras | Down 3+ | Bases Loaded, 0 Outs	0.01%	6.46	−3.75	−10.21	140	Score + base/out
9th | Down 2 | 2nd and 3rd, 1 Out	0.11%	−1.07	−9.89	−8.83	16	Exact cell
9th | Down 2 | 2nd and 3rd, 0 Outs	0.06%	5.45	−2.16	−7.61	58	Score + base/out
WPA is shown in percentage points. Sac fly source shows whether the estimate came from the exact cell or a broader fallback group.
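A rough sketch of how the exact-cell versus fallback choice might work. The cutoff of 10 sac flies is a hypothetical threshold, since the table does not state one, and I am assuming "Score + base/out" means the inning is dropped from the lookup key. The two example cells reuse values from the table.

```python
MIN_SAMPLE = 10  # hypothetical cutoff; the actual threshold is not stated


def sf_wpa_estimate(cell, exact_cells, fallback_groups, min_sample=MIN_SAMPLE):
    """Return (sac-fly WPA, source label) for an (inning, score, base_out) cell.

    Use the exact cell when it has enough sac flies; otherwise fall back to
    the broader group that ignores the inning (an assumption about the method).
    """
    _inning, score, base_out = cell
    hit = exact_cells.get(cell)
    if hit is not None and hit[1] >= min_sample:
        return hit[0], "Exact cell"
    wpa, _sample = fallback_groups[(score, base_out)]
    return wpa, "Score + base/out"


def sf_edge(avg_wpa, sf_wpa):
    """SF Edge = sac-fly WPA minus the average result's WPA, in points."""
    return sf_wpa - avg_wpa


# Values are (WPA points, sac-fly sample size), taken from the table rows.
exact_cells = {("9th", "Tie", "3rd Only, 1 Out"): (16.82, 14)}
fallback_groups = {("Down 3+", "Bases Loaded, 0 Outs"): (-3.75, 140)}

best_wpa, best_source = sf_wpa_estimate(
    ("9th", "Tie", "3rd Only, 1 Out"), exact_cells, fallback_groups)
worst_wpa, worst_source = sf_wpa_estimate(
    ("Extras", "Down 3+", "Bases Loaded, 0 Outs"), exact_cells, fallback_groups)

best_edge = sf_edge(0.13, best_wpa)
worst_edge = sf_edge(6.46, worst_wpa)
print(best_source, round(best_edge, 2))
print(worst_source, round(worst_edge, 2))
```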

This is the best argument for Zac. There are real game states where the sacrifice fly is not just acceptable. It is valuable. A sac fly that ties the game late, takes the lead, or protects a narrow margin can beat the average hitter result in a way Run Expectancy does not fully capture.

The warning is that a heat map gives every cell the same amount of space. A ninth-inning tie game and a random early-inning spot look equal on the page, even though they do not happen equally often. So the next step is to stop looking at where Zac can help and ask what it all adds up to over a season.

What Does the WPA Map Add Up To?

Section 8 gave Zac his best case. There are real game states where the sacrifice fly looks much better through Win Probability than it does through Run Expectancy.

The problem is that a heat map gives every cell the same amount of space. A ninth-inning tie game and a random early-inning spot look equal on the page, even though they do not happen equally often.

So this section does the accounting. It takes the WPA edge from each cell and weights it by how often that cell actually appears in a modeled season of 30 true sacrifice-fly opportunities (runner on third, fewer than two outs).

Season-Weighted WPA Summary
Where Zac’s season value comes from after weighting each spot by frequency
Rollup	Share	Opps	Avg WPA	Zac WPA	Zac Gap	Win Eq.	Edge
Overall
Season Total	100.00%	30.0	4.37	27.95	23.58	0.236	Zac edge
By Situation
Bases Loaded, 0 Outs	7.56%	2.3	0.59	−1.17	−1.76	−0.018	Avg edge
2nd and 3rd, 0 Outs	7.01%	2.1	0.48	−0.62	−1.10	−0.011	Avg edge
1st and 3rd, 0 Outs	9.18%	2.8	0.24	−3.01	−3.24	−0.032	Avg edge
3rd Only, 0 Outs	4.75%	1.4	0.16	−0.38	−0.54	−0.005	Avg edge
Bases Loaded, 1 Out	17.15%	5.1	0.54	3.58	3.05	0.030	Zac edge
2nd and 3rd, 1 Out	16.41%	4.9	0.70	5.14	4.44	0.044	Zac edge
1st and 3rd, 1 Out	20.00%	6.0	1.32	10.92	9.60	0.096	Zac edge
3rd Only, 1 Out	17.94%	5.4	0.35	13.49	13.14	0.131	Zac edge
By Inning Block
1st-3rd	32.15%	9.6	2.37	5.21	2.83	0.028	Zac edge
4th-6th	33.52%	10.1	2.36	6.78	4.43	0.044	Zac edge
7th-8th	22.61%	6.8	0.21	5.57	5.35	0.054	Zac edge
9th	7.92%	2.4	0.01	2.85	2.83	0.028	Zac edge
Extras	3.80%	1.1	−0.58	7.54	8.13	0.081	Zac edge
By Score Block
Down 3+	13.40%	4.0	−0.30	−8.70	−8.40	−0.084	Avg edge
Down 2	6.88%	2.1	−0.03	−4.80	−4.77	−0.048	Avg edge
Down 1	10.54%	3.2	0.90	5.99	5.09	0.051	Zac edge
Tied	24.08%	7.2	1.49	21.32	19.83	0.198	Zac edge
Up 1	14.68%	4.4	0.91	6.96	6.05	0.060	Zac edge
Up 2	10.00%	3.0	0.79	3.90	3.11	0.031	Zac edge
Up 3+	20.43%	6.1	0.61	3.27	2.66	0.027	Zac edge
Zac Gap = Zac WPA minus average hitter WPA, weighted by situation frequency. Win Eq. = Zac Gap / 100. This is a win-probability translation, not literal WAR.

After weighting the WPA map by frequency, Zac’s season total comes out to 27.95 WPA points against 4.37 for the average result, an edge of +23.58 points. Since WPA is measured in percentage points here, that edge translates to about +0.236 wins of win-probability value.
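One sanity check on the summary table is that each rollup should add back up to the same season gap. The sketch below sums the Zac Gap column three ways, using the rounded table values. The `cell_contribution` helper is my reading of the footnote (edge times share times season opportunities), not a confirmed formula.

```python
# Zac Gap column (WPA points) from each rollup in the summary table.
BY_SITUATION = [-1.76, -1.10, -3.24, -0.54, 3.05, 4.44, 9.60, 13.14]
BY_INNING = [2.83, 4.43, 5.35, 2.83, 8.13]
BY_SCORE = [-8.40, -4.77, 5.09, 19.83, 6.05, 3.11, 2.66]

SEASON_OPPS = 30


def cell_contribution(edge_points, share, season_opps=SEASON_OPPS):
    """Assumed weighting: a cell's season contribution in WPA points."""
    return edge_points * share * season_opps


def win_equivalent(wpa_points):
    """WPA is in percentage points, so 100 points equals one win."""
    return wpa_points / 100


totals = {
    "situation": sum(BY_SITUATION),
    "inning": sum(BY_INNING),
    "score": sum(BY_SCORE),
}
for name, total in totals.items():
    print(f"{name}: {total:.2f} WPA points, {win_equivalent(total):.3f} wins")

# Example: the 9th-inning tie, runner on third, one out cell (0.18% share, +16.69 edge).
print(f"one cell: {cell_contribution(16.69, 0.0018):.3f} WPA points per season")
```

All three totals should sit within rounding distance of the 23.58 shown in the Season Total row. A single dramatic cell contributes less than one point because it comes up so rarely.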

This is not literal WAR. WAR is a broader, mostly context-neutral framework that compares a player to replacement level. This number is narrower. It only tells us what Zac’s sacrifice-fly approach adds or subtracts through Win Probability in these runner-on-third spots.

Two Answers to the Same Zac Question

At this point, the article has two answers to the same basic question. Run Expectancy gives one answer. Win Probability gives another.

That does not mean one is right and the other is wrong. They are measuring different things. Run Expectancy is asking what the play did to the inning. Win Probability is asking what the play did to the game.

The point of this section is not to turn WPA into WAR. WPA is not WAR. It is too context-heavy for that. The point is simply to put both answers on a rough win scale so we can compare the size and direction of the argument.

Two Answers to the Same Zac Question
Run Expectancy measures the inning. Win Probability measures the game.
Frame	Question	Unit	Average	Zac	Zac Gap	Rough Win Scale	Edge
Run Expectancy	What did Zac do to the inning?	Runs	0.800	-1.517	-2.318	−0.232	Average
Win Probability	What did Zac do to the game?	WPA Points	4.37	27.95	+23.58	0.236	Zac
Rough win scale: RE estimate = run gap / 10. WPA estimate = WPA point gap / 100. This is a comparison scale, not official WAR.

The Run Expectancy answer estimates Zac at -0.232 on the rough win scale. The Win Probability answer estimates Zac at +0.236 on that same rough scale.

Those are not contradictory results. They are answers to different questions. RE is judging the expected value of the inning. WPA is judging the chance of winning the game.

That is the tension in one table. Zac can look worse through Run Expectancy because the out lowers the inning ceiling. He can look more defensible through Win Probability because some runs matter more to the game than they do to the inning.

How to Read the Player Metrics

Before moving from Zachary Fly and Average Joe to real hitters, it is worth cleaning up the language.

The rest of the article uses a few related but different metrics. They all live around the same play, but they do not answer the same question.

Player Metric Cheat Sheet
Same opportunity, different value questions
Metric	Plain English	Question It Answers	How to Read It
Run Expectancy	Average inning value.	Did the play help the expected run total for the inning?	This is why the out matters. A run can score and the play can still lose inning value.
Win Probability	Game-value context.	Did the play help the batting team win the game?	This is why inning, score, and leverage matter. One run is not always just one run.
RSOE / 100	Runner Scored Over Expected per 100 chances.	Did the hitter get the runner from third home more often than expected?	This is the broader baseball result. It includes more than official sacrifice flies.
SFOE / 100	Sac Flies Over Expected per 100 chances.	Did the hitter produce more official sacrifice flies than expected?	This is the narrower box-score label. It captures one version of getting the runner home.
WPA / 100	Win Probability Added per 100 chances.	Did those runner-on-third chances add game value?	This keeps the game-state layer attached to the player results.
Gap	RSOE / 100 minus SFOE / 100.	Is runner-home value larger than official sac-fly credit?	Positive means the broader result is stronger than the scoring label. Negative means sac-fly credit is stronger than the broader runner-home result.
All /100 metrics are scaled per 100 true sacrifice-fly opportunities: runner on third with fewer than two outs.

The key is not to treat these as interchangeable. A hitter can be strong at getting the runner home without piling up official sacrifice flies. He can also collect official sacrifice-fly credit without proving that the broader runner-home skill is as strong.

That is why the player section has to be careful. The goal is not to crown the king of sacrifice flies. The goal is to see whether there is any real separation between hitters once we compare the runner-home result, the scoring label, and the game-value layer.

Do Real Hitters Actually Separate?

Zachary Fly is useful because he gives us a clean thought experiment. Real baseball is messier. Players do not get the same opportunities, they do not face the same pitchers, and they do not always need the same type of contact.

So the next question is simple: do actual hitters separate on sacrifice flies over expected, or is this just noise dressed up as a leaderboard?

This section looks at real hitters and compares their clean official sacrifice flies to their expected sacrifice flies. The main rate is SF OE / 100 (the cheat sheet’s SFOE / 100), which means sacrifice flies over expected per 100 true sacrifice-fly opportunities: runner on third with fewer than two outs.

Sac Flies Over Expected Leaderboard

This table shows the top and bottom hitters by SF OE / 100 from 2015–2026, with a minimum of 100 runner-on-third opportunities. I am keeping the table narrow on purpose. The goal is the spread, not every supporting calculation.

Actual Player Results: Sac Flies Over Expected
Clean sacrifice flies compared with expected sacrifice flies
Rank	Hitter	Years	Opp.	SF	Exp. SF	SF OE	SF OE / 100	ROE / 100
Top 15: Most SF OE / 100
1	Mountcastle, Ryan	2020–2026	148	36	19.3	16.7	11.3	8.2
2	Smith, Will	2019–2026	201	47	26.7	20.3	10.1	4.9
3	Gurriel, Yuli	2016–2025	211	45	26.9	18.1	8.6	12.2
4	Zimmerman, Ryan	2015–2021	130	27	16.4	10.6	8.2	0.7
5	Gregorius, Didi	2015–2022	195	40	25.0	15.0	7.7	5.4
6	Arraez, Luis	2019–2026	150	31	19.7	11.3	7.6	15.0
7	Merrifield, Whit	2016–2024	219	44	27.7	16.3	7.4	5.2
8	Wendle, Joey	2016–2024	119	24	15.5	8.5	7.1	5.6
9	Aguilar, Jesús	2017–2023	181	36	23.3	12.7	7.0	1.8
10	Suzuki, Kurt	2015–2022	145	28	18.1	9.9	6.8	4.1
11	Vogt, Stephen	2015–2022	119	23	14.9	8.1	6.8	11.1
12	Verdugo, Alex	2018–2025	163	32	21.2	10.8	6.6	0.6
13	Rendon, Anthony	2015–2024	245	47	30.8	16.2	6.6	9.5
14	Arenado, Nolan	2015–2026	385	75	49.8	25.2	6.5	6.2
15	Kipnis, Jason	2015–2020	152	29	19.3	9.7	6.4	−1.3
Bottom 15: Fewest SF OE / 100
1	Gallo, Joey	2015–2024	176	4	22.6	−18.6	−10.6	−15.4
2	Souza Jr., Steven	2015–2022	100	4	12.8	−8.8	−8.8	−5.7
3	Lopez, Nicky	2019–2026	111	5	13.8	−8.8	−7.9	1.4
4	Rodríguez, Julio	2022–2026	140	8	18.5	−10.5	−7.5	−0.9
5	Zunino, Mike	2015–2023	137	8	17.2	−9.2	−6.7	−13.9
6	Alfaro, Jorge	2017–2025	100	6	12.7	−6.7	−6.7	2.9
7	Maybin, Cameron	2015–2021	116	7	14.5	−7.5	−6.4	0.2
8	Anderson, Tim	2016–2025	181	11	22.5	−11.5	−6.3	−7.2
9	Steer, Spencer	2022–2026	138	10	18.5	−8.5	−6.2	−4.2
10	Smoak, Justin	2015–2020	123	8	15.6	−7.6	−6.2	−9.5
11	Maldonado, Martín	2015–2025	150	10	19.0	−9.0	−6.0	−10.7
12	Lux, Gavin	2019–2025	100	7	12.9	−5.9	−5.9	−1.8
13	Yelich, Christian	2015–2026	318	23	41.3	−18.3	−5.8	3.2
14	Sánchez, Jesús	2020–2026	128	11	17.7	−6.7	−5.3	−2.3
15	Mercer, Jordy	2015–2021	133	10	16.8	−6.8	−5.1	1.9
Sorted by SF OE / 100, then total SF OE, then opportunities. ROE / 100 shows runner-from-third scoring over expected per 100 opportunities, the same measure the cheat sheet calls RSOE / 100.

The top hitter by sacrifice flies over expected per 100 opportunities was Ryan Mountcastle, at +11.3 SF OE / 100. Across the full sample, he finished at +16.7 total SF OE.

The lowest hitter by sacrifice flies over expected per 100 opportunities was Joey Gallo, at −10.6 SF OE / 100. Across the full sample, he finished at −18.6 total SF OE.
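The per-100 rates in the leaderboard follow directly from the other columns. A minimal sketch, with a function name of my own, checked against the Mountcastle and Gallo rows:

```python
def over_expected_per_100(actual: float, expected: float, opportunities: int) -> float:
    """(actual - expected) scaled per 100 true sacrifice-fly opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return (actual - expected) / opportunities * 100


# Leaderboard rows: (SF, Exp. SF, Opp.)
mountcastle = over_expected_per_100(36, 19.3, 148)
gallo = over_expected_per_100(4, 22.6, 176)

print(f"Mountcastle: {mountcastle:+.1f} SF OE / 100")
print(f"Gallo: {gallo:+.1f} SF OE / 100")

# The cheat sheet's Gap is the runner-home rate minus the sac-fly rate.
# Arraez: ROE / 100 of 15.0 against SF OE / 100 of 7.6.
arraez_gap = 15.0 - 7.6
print(f"Arraez gap: {arraez_gap:+.1f}")
```

The same function works for the runner-scored rate: swap in runners scored and expected runners scored over the same opportunities.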

This is where the article shifts from the model to the players. The leaderboard shows separation, which matters. It tells us there are hitters who finished well above and well below expectation in these spots.

Signal or Noise: Does Sac Fly Skill Repeat?

The player leaderboard showed separation. That matters, but it is not enough. A leaderboard can show who finished above expectation without proving that the result is repeatable.

So this section tests the next question: if a hitter beats expectation in one sample, does that tell us anything about what he does in another sample?

I split the data into odd and even seasons from 2015 through 2024. That keeps the test balanced and avoids giving one side a partial 2026 season. Then I compare odd-season performance to even-season performance.

Odd/Even Regression Test

The table below uses a simple regression: even-season performance as a function of odd-season performance. A positive slope means the odd-season result carried forward. The p-value and R-squared tell us whether that pattern looks meaningful or mostly noisy.
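Here is a stdlib-only sketch of the odd/even test. The row layout, the function names, and the sample rows are hypothetical. The split rule and the 10-opportunity minimum come from the text and the table subtitle. The p-value is left out because it needs a t-distribution, which the standard library does not provide.

```python
import math


def split_rates(rows, min_opps=10):
    """Aggregate (player, season, opps, sf, expected_sf) rows into
    {player: (odd-season SF OE / 100, even-season SF OE / 100)}."""
    totals = {}
    for player, season, opps, sf, exp_sf in rows:
        half = "odd" if season % 2 else "even"
        t = totals.setdefault(player, {"odd": [0, 0.0], "even": [0, 0.0]})
        t[half][0] += opps
        t[half][1] += sf - exp_sf
    return {
        player: tuple(100 * t[h][1] / t[h][0] for h in ("odd", "even"))
        for player, t in totals.items()
        if t["odd"][0] >= min_opps and t["even"][0] >= min_opps
    }


def simple_ols(x, y):
    """Least-squares fit of y on x: slope, intercept, correlation, R-squared, t-stat."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    r2 = r * r
    # For one predictor, the slope's t-stat equals r * sqrt((n - 2) / (1 - r^2)).
    t_stat = r * math.sqrt((n - 2) / (1 - r2)) if r2 < 1 else math.inf
    return {"slope": slope, "intercept": my - slope * mx,
            "corr": r, "r2": r2, "t_stat": t_stat}


# Hypothetical rows for illustration only.
rows = [
    ("A", 2015, 20, 4, 2.0), ("A", 2016, 10, 1, 2.0),
    ("B", 2015, 5, 1, 0.5),  ("B", 2016, 30, 3, 3.0),
]
rates = split_rates(rows)  # B is dropped: only 5 odd-season opportunities
fit = simple_ols([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(rates, fit)
```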

Signal or Noise: Odd/Even Sac Fly Skill Test
Even-season result regressed on odd-season result, minimum 10 opportunities per split
Outcome	Players	Intercept	Slope	t-stat	p-value	R²	Corr.	Takeaway
Positive WPA Edge
SF OE / 100	608	−0.082	0.035	0.841	0.4009	0.001	0.034	Mostly noise
ROE / 100	608	−0.526	0.207	4.736	0.0000	0.036	0.189	Some evidence of signal
Negative WPA Edge
SF OE / 100	383	−0.385	0.047	0.939	0.3482	0.002	0.048	Mostly noise
ROE / 100	383	2.594	0.190	3.916	0.0001	0.039	0.197	Some evidence of signal
Situational Edge
SF OE / 100	380	0.329	−0.003	−0.060	0.9522	0.000	−0.003	Mostly noise
ROE / 100	380	−3.028	0.123	2.166	0.0310	0.012	0.111	Some evidence of signal
SF OE / 100 is clean sacrifice flies over expected. ROE / 100 is runner scored from third over expected. Situational Edge = Positive WPA Edge result minus Negative WPA Edge result.

The main official-sac-fly test is the Situational Edge: SF OE / 100 row. Its slope was -0.003, with a p-value of 0.9522 and an R² of 0.000.

The broader runner-scored backup test is the Situational Edge: ROE / 100 row. Its slope was 0.123, with a p-value of 0.0310 and an R² of 0.012.

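This is the section that keeps the article honest. A positive and meaningful relationship in the situational-edge rows would mean the sacrifice-fly argument has some repeatable signal behind it. A flat one would mean the leaderboard is mostly noise. The official sac-fly row is flat: a slope near zero and an R² of 0.000. The broader runner-scored row clears the usual significance bar, but with an R² of 0.012 it explains very little of the variation.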

Does Runner-Home Skill Predict Future Sac Fly Win Value?

This is the cleaner player-level test. Instead of asking whether official sacrifice-fly credit repeats, I use the first sample to sort hitters by runner-home overperformance, then check whether those hitters created more sacrifice-fly win value in the second sample.

That gets much closer to the actual article question. If a hitter is good at getting the runner from third home, does that show up later in the form of real win-probability value from sacrifice flies?

Did Runner-Home Skill Turn Into Future Sac Fly Win Value?

This scatterplot is the out-of-sample test. Each point is one hitter. To qualify, a player needed 20+ true SF opportunities in each sample and 10+ total official sacrifice flies.

The x-axis is Sample A RSOE / 100. That is the earlier runner-home signal. The y-axis is Sample B Net SF WPA / 100. That is the later win-probability value created through official sacrifice flies.

The color shows Sample B Net WPA SF%. Green means the player’s Sample B sacrifice flies skewed positive-WPA. Red means they skewed negative-WPA. Bubble size is Sample B official sacrifice flies.

The Direct Answer

The top Sample A RSOE group created more Sample B Net SF WPA / 100 than the bottom group: 14.56 versus 11.14, a top-minus-bottom gap of 3.42. The top-minus-bottom gap in future Net WPA SF% was 0.1 percentage points (31.0% versus 30.9%).

Does Early RSOE Separate Future Sac Fly Win Value?
Players grouped by Sample A RSOE / 100, then evaluated by Sample B results
Sample A RSOE Group	Players	A Opps	B Opps	B Official SF	A RSOE / 100	B RSOE / 100	B Net SF WPA / 100	B WPA+ SF %	B WPA- SF %	B Net WPA SF %	B Avg SF WPA Pts
Top 25% Sample A RSOE	56	5,361	5,499	773	9.51	3.71	14.56	60.7%	29.6%	31.0%	1.04
Middle 50% Sample A RSOE	113	12,840	10,489	1,469	1.52	0.68	13.23	62.8%	28.3%	34.4%	0.94
Bottom 25% Sample A RSOE	57	4,884	6,245	850	−7.19	−2.06	11.14	60.0%	29.1%	30.9%	0.82
Buckets are based only on Sample A RSOE / 100. Sample B columns are out-of-sample results.
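The bucket comparison can be sketched as follows. The player records are hypothetical, and I am assuming each group's Net SF WPA / 100 is pooled (total sac-fly WPA points over total opportunities) instead of a simple average of player rates. The table does not say which.

```python
def bucket_gap(players, top_share=0.25):
    """Rank by Sample A RSOE / 100, then compare Sample B results for the
    top and bottom buckets. Returns (top, bottom, top minus bottom)."""
    ranked = sorted(players, key=lambda p: p["a_rsoe"], reverse=True)
    k = max(1, round(len(ranked) * top_share))
    top, bottom = ranked[:k], ranked[-k:]

    def pooled_net_sf_wpa_per_100(group):
        # Assumed pooling: group WPA points per 100 group opportunities.
        points = sum(p["b_sf_wpa"] for p in group)
        opps = sum(p["b_opps"] for p in group)
        return 100 * points / opps

    top_rate = pooled_net_sf_wpa_per_100(top)
    bottom_rate = pooled_net_sf_wpa_per_100(bottom)
    return top_rate, bottom_rate, top_rate - bottom_rate


# Hypothetical hitters: Sample A RSOE / 100, Sample B opportunities,
# and Sample B WPA points from official sacrifice flies.
players = [
    {"a_rsoe": 8, "b_opps": 50, "b_sf_wpa": 10.0},
    {"a_rsoe": 7, "b_opps": 50, "b_sf_wpa": 5.0},
    {"a_rsoe": 6, "b_opps": 50, "b_sf_wpa": 6.0},
    {"a_rsoe": 5, "b_opps": 50, "b_sf_wpa": 7.0},
    {"a_rsoe": 4, "b_opps": 50, "b_sf_wpa": 3.0},
    {"a_rsoe": 3, "b_opps": 50, "b_sf_wpa": 8.0},
    {"a_rsoe": 2, "b_opps": 50, "b_sf_wpa": 4.0},
    {"a_rsoe": 1, "b_opps": 50, "b_sf_wpa": 2.0},
]

top_rate, bottom_rate, gap = bucket_gap(players)
print(f"top {top_rate:.2f}, bottom {bottom_rate:.2f}, gap {gap:.2f}")
```

The sorting uses only Sample A. Sample B never influences which bucket a hitter lands in, which is what makes the comparison out-of-sample.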

Section 13 Answer Key
Did earlier runner-home value predict future sac-fly WPA value?
Metric	Value
Qualified hitters	226
Minimum true SF opps	20 per sample
Minimum total official SF	10 total
Sample A RSOE to Sample B Net SF WPA / 100 correlation	0.108
Sample A RSOE to Sample B Net SF WPA / 100 R-squared	0.012
Sample A RSOE to Sample B RSOE correlation	0.305
Sample A RSOE to Sample B Net WPA SF% correlation	0.072
Sample A Net SF WPA to Sample B Net SF WPA correlation	0.112
Top-minus-bottom future Net SF WPA / 100 gap	3.42
Top-minus-bottom future Net WPA SF% gap	0.1%
This is an out-of-sample test: Sample A identifies the runner-home signal, Sample B evaluates the later sac-fly WPA result.

Player Examples
Who supports the signal, and who complicates it?
Player	A Opps	B Opps	Total SF	B SF	A RSOE / 100	B RSOE / 100	B Net SF WPA / 100	B Net WPA SF %	B WPA+ SF %	B WPA- SF %
Best Sample A RSOE / 100
Gurriel Jr., Lourdes	43	167	34	27	26.45	4.36	14.85	25.9%	59.3%	33.3%
Reynolds, Bryan	30	167	23	20	18.88	8.61	8.26	−5.0%	45.0%	50.0%
Blackmon, Charlie	130	111	33	10	18.41	1.28	1.26	10.0%	50.0%	40.0%
Tellez, Rowdy	31	115	22	16	15.63	−2.53	18.00	31.2%	62.5%	31.2%
Cooper, Garrett	30	57	10	8	14.54	−0.25	20.18	62.5%	75.0%	12.5%
Highest Sample B Net SF WPA / 100
Tapia, Raimel	34	45	10	6	6.64	19.75	68.89	100.0%	100.0%	0.0%
Polanco, Gregory	145	30	27	8	5.00	−3.37	142.67	25.0%	50.0%	25.0%
Haniger, Mitch	82	69	22	12	−1.09	−0.55	51.88	66.7%	83.3%	16.7%
Upton, Justin	170	20	23	3	−1.60	−19.20	93.00	100.0%	100.0%	0.0%
Acuña Jr., Ronald	73	90	18	14	−2.09	3.37	62.78	71.4%	85.7%	14.3%
Good early RSOE, weak future SF WPA
Lowrie, Jed	87	37	18	6	9.99	4.56	−42.70	0.0%	50.0%	50.0%
Arenado, Nolan	211	174	75	32	9.19	2.51	−3.22	28.1%	59.4%	31.2%
Gamel, Ben	52	36	11	5	8.49	2.82	−52.22	−20.0%	40.0%	60.0%
Bell, Josh	127	204	43	20	8.36	2.89	−2.75	0.0%	50.0%	50.0%
Rizzo, Anthony	218	79	38	10	8.19	7.84	−0.13	0.0%	40.0%	40.0%
Weak early RSOE, strong future SF WPA
Bote, David	42	30	10	4	−0.19	−0.95	38.33	100.0%	100.0%	0.0%
Haniger, Mitch	82	69	22	12	−1.09	−0.55	51.88	66.7%	83.3%	16.7%
Upton, Justin	170	20	23	3	−1.60	−19.20	93.00	100.0%	100.0%	0.0%
Acuña Jr., Ronald	73	90	18	14	−2.09	3.37	62.78	71.4%	85.7%	14.3%
Hicks, Aaron	127	50	24	5	−3.38	−6.28	40.80	80.0%	80.0%	0.0%
The last two groups are the most important caution groups because they show where early RSOE and future SF WPA value disagree.

What This Actually Says

This is a better test than simply checking whether SFOE repeats. SFOE is the scoring label. This section uses the earlier runner-home signal and asks whether it predicts future sacrifice-fly win value.

If the trend slopes upward, the players who were better at getting the runner home in Sample A also created more sac-fly WPA value in Sample B. If the trend is flat, then runner-home value may still matter, but the WPA timing piece is probably too noisy at the player level.

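That is a weak out-of-sample signal. The top Sample A RSOE group beat the bottom group by 3.42 in Sample B Net SF WPA / 100, but the player-level correlation was only 0.108 (an R² of 0.012), and the gap in Net WPA SF% was just 0.1 percentage points.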

Runner Home When It Matters

This is the player-level version of the WPA question. The league-level chart can tell us what percentage of all sacrifice flies are positive or negative WPA plays. This asks something more useful: which hitters are producing those positive-WPA sacrifice flies?

The x-axis is RSOE / 100, which measures how often a hitter gets the runner from third home above expectation. The y-axis is Positive WPA SF %, which measures the share of that hitter’s official sacrifice flies that increased his team’s win probability.

Do the Best Runner-Home Players Create Sac Fly Win Value?

This scatterplot is the direct test. Each point is one hitter. To qualify, a player needed at least 100 true SF opportunities and at least 10 official sacrifice flies from 2015–2026.

The x-axis shows how often the hitter got the runner from third home above expectation. The y-axis shows the net batting-team WPA points he created through official sacrifice flies, scaled per 100 true sacrifice-fly opportunities.

The color shows the direction of the player’s official sacrifice flies. Green means his sac flies skewed positive-WPA. Red means they skewed negative-WPA. Bubble size is official sacrifice flies.

The Direct Answer

The top RSOE group created more Net SF WPA / 100 than the bottom RSOE group: 14.22 versus 10.24, a top-minus-bottom gap of 3.98. The top-minus-bottom gap in Net WPA SF% was 3.0 percentage points.

Does RSOE Separate Sac Fly Win Value?
Qualified hitters grouped by RSOE / 100
RSOE Group	Players	Opps	Official SF	Avg RSOE / 100	Avg SFOE / 100	Net SF WPA / 100	WPA+ SF %	WPA- SF %	Net WPA SF %	Avg SF WPA Pts
Top 25% RSOE / 100	87	15,183	2,315	8.47	2.14	14.22	62.8%	27.0%	35.8%	0.93
Middle 50% RSOE / 100	174	32,028	4,352	1.54	0.79	11.64	61.8%	28.0%	33.8%	0.86
Bottom 25% RSOE / 100	87	15,103	1,756	−5.56	−1.25	10.24	60.9%	28.0%	32.9%	0.88
Net SF WPA / 100 is batting-team WPA points from official sacrifice flies per 100 true SF opportunities. Net WPA SF % is WPA+ SF % minus WPA- SF %.

Section 14 Answer Key
Does runner-home overperformance connect to sac fly win value?
Metric	Value
Qualified hitters	348
Minimum true SF opps	100
Minimum official SF	10
Qualified-player Net SF WPA / 100	11.93
Qualified-player WPA+ SF%	61.9%
Qualified-player WPA- SF%	27.7%
Qualified-player Net WPA SF%	34.1%
RSOE vs Net SF WPA / 100 correlation	0.147
RSOE vs Net SF WPA / 100 R-squared	0.022
RSOE vs Net WPA SF% correlation	0.042
Top-minus-bottom Net SF WPA / 100 gap	3.98
Top-minus-bottom Net WPA SF% gap	3.0%
Net WPA SF% = Positive WPA SF% minus Negative WPA SF%. Net SF WPA / 100 uses actual WPA points.
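The sac-fly WPA columns all come from the same list of per-play WPA values. A minimal sketch under two assumptions: the WPA values below are hypothetical, and a sacrifice fly with exactly zero WPA counts as neither positive nor negative, which would explain why WPA+ and WPA- shares do not sum to 100% in the tables.

```python
def sac_fly_wpa_summary(sf_wpa_points, opportunities):
    """Summarize a hitter's official sacrifice flies.

    sf_wpa_points: batting-team WPA (percentage points) for each official SF.
    opportunities: true SF opportunities (runner on third, fewer than two outs).
    """
    n = len(sf_wpa_points)
    if n == 0 or opportunities <= 0:
        raise ValueError("need at least one sac fly and one opportunity")
    pos_share = sum(1 for w in sf_wpa_points if w > 0) / n
    neg_share = sum(1 for w in sf_wpa_points if w < 0) / n
    total = sum(sf_wpa_points)
    return {
        "wpa_plus_sf_pct": 100 * pos_share,
        "wpa_minus_sf_pct": 100 * neg_share,
        "net_wpa_sf_pct": 100 * (pos_share - neg_share),  # direction only
        "net_sf_wpa_per_100": 100 * total / opportunities,  # direction and size
        "avg_sf_wpa_pts": total / n,
    }


# Hypothetical hitter: five sac flies across 40 true opportunities.
summary = sac_fly_wpa_summary([3.0, 1.5, -1.0, 0.0, 2.5], opportunities=40)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Net SF WPA / 100 also equals Avg SF WPA Pts times official sacrifice flies per 100 opportunities, and the table rows line up with that within rounding. The Top 25% group works out to 0.93 × 2,315 / 15,183 × 100, about 14.2.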

Player Examples
Who drives the relationship, and who complicates it?
Player	Opps	Official SF	RSOE / 100	SFOE / 100	Net SF WPA / 100	WPA+ SF %	WPA- SF %	Net WPA SF %	Avg SF WPA Pts
Best RSOE / 100
Ramírez, Harold	122	15	19.76	−0.46	35.90	73.3%	13.3%	60.0%	2.92
Rutschman, Adley	104	19	16.13	4.90	35.00	63.2%	10.5%	52.6%	1.92
Arraez, Luis	150	31	14.98	7.56	13.67	61.3%	22.6%	38.7%	0.66
Kwan, Steven	122	19	13.40	2.12	26.39	57.9%	42.1%	15.8%	1.69
Kirk, Alejandro	128	23	13.26	4.39	23.05	52.2%	21.7%	30.4%	1.28
Highest Net SF WPA / 100
Gattis, Evan	117	20	11.54	4.39	49.49	70.0%	15.0%	55.0%	2.90
Marsh, Brandon	112	20	3.79	4.55	46.16	75.0%	20.0%	55.0%	2.58
Lowe, Brandon	146	21	1.33	1.09	41.51	71.4%	14.3%	57.1%	2.89
Laureano, Ramón	140	22	−1.04	2.48	47.93	72.7%	13.6%	59.1%	3.05
Jiménez, Eloy	108	11	−2.62	−2.46	38.43	72.7%	18.2%	54.5%	3.77
Lowest Net SF WPA / 100
Beltré, Adrian	152	23	12.00	2.23	−11.71	52.2%	39.1%	13.0%	−0.77
India, Jonathan	121	19	6.26	2.65	−14.21	31.6%	52.6%	−21.1%	−0.91
Hosmer, Eric	234	20	6.11	−4.09	−10.73	45.0%	50.0%	−5.0%	−1.26
Turang, Brice	108	11	0.55	−3.36	−15.83	45.5%	27.3%	18.2%	−1.55
O’Neill, Tyler	124	19	−5.14	2.38	−10.48	47.4%	42.1%	5.3%	−0.68
Good RSOE, lower SF WPA
García Jr., Luis	132	21	12.69	2.32	7.50	47.6%	23.8%	23.8%	0.47
Beltré, Adrian	152	23	12.00	2.23	−11.71	52.2%	39.1%	13.0%	−0.77
Naylor, Josh	211	24	11.65	−2.07	9.76	75.0%	25.0%	50.0%	0.86
Ruiz, Keibert	107	12	11.30	−2.05	7.57	66.7%	16.7%	50.0%	0.67
Blackmon, Charlie	241	33	10.52	0.98	11.16	54.5%	27.3%	27.3%	0.82
The final group highlights players with positive RSOE / 100 but below-average Net SF WPA / 100.
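The rule behind that final group can be sketched as a simple filter. This assumes "below-average" means below the qualified-player Net SF WPA / 100 of 11.93 from the answer key; the article does not spell out the exact cutoff, so treat that threshold and the function name as assumptions. The rows are copied from the table above.

```python
# Assumed cutoff: qualified-player Net SF WPA / 100 from the answer key.
QUALIFIED_AVG_NET_SF_WPA = 11.93

# (player, RSOE / 100, Net SF WPA / 100), copied from rows in the table above
rows = [
    ("Ramírez, Harold", 19.76, 35.90),
    ("Arraez, Luis", 14.98, 13.67),
    ("García Jr., Luis", 12.69, 7.50),
    ("Beltré, Adrian", 12.00, -11.71),
    ("Naylor, Josh", 11.65, 9.76),
    ("Laureano, Ramón", -1.04, 47.93),
]


def good_rsoe_lower_wpa(rows, avg_net=QUALIFIED_AVG_NET_SF_WPA):
    """Players with positive RSOE / 100 but below-average Net SF WPA / 100."""
    return [name for name, rsoe, net in rows if rsoe > 0 and net < avg_net]


flagged = good_rsoe_lower_wpa(rows)
print(flagged)  # ['García Jr., Luis', 'Beltré, Adrian', 'Naylor, Josh']
```

Ramírez and Arraez drop out because their Net SF WPA / 100 is above the average, and Laureano drops out because his RSOE / 100 is negative.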

What This Actually Says

The scatterplot is not asking whether every good runner-home player is clutch. It is asking whether the player-level runner-home signal lines up with actual win-probability value from official sacrifice flies.

This version is better than using Positive WPA SF% alone because it uses the size of the WPA movement, not just the direction. A tiny positive sac fly and a huge positive sac fly should not count the same.

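The top RSOE group created clearly more Net SF WPA / 100 than the bottom RSOE group. At the player level the link is much looser: the correlation between RSOE / 100 and Net SF WPA / 100 was 0.147, with an R² of 0.022, so RSOE explains only about 2% of the player-to-player variation in Net SF WPA / 100.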
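The reported numbers can be checked against each other with a little arithmetic. This sketch only uses values already stated in the tables; it assumes the R² comes from a simple straight-line fit, in which case it is just the correlation squared.

```python
r = 0.147                   # reported correlation, RSOE / 100 vs Net SF WPA / 100
r_squared = r ** 2          # for a one-variable linear fit, R² equals r squared
top, bottom = 14.22, 10.24  # Net SF WPA / 100 for the top and bottom RSOE groups

print(round(r_squared, 3))     # 0.022
print(round(top - bottom, 2))  # 3.98
```

Both things can be true at once: group averages can separate while individual players scatter widely around the trend.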

The Evidence Stack

At this point, the article has looked at the sacrifice fly from a few different angles. Each one answers a slightly different question.

That is important because the whole debate is not really about one scoring rule. It is about what kind of value we are trying to measure.

The Evidence Stack
Each layer answers a different version of the sacrifice-fly question
Layer	Question	What It Showed	Why It Matters
Run Expectancy	What did the play do to the inning?	The run-expectancy math showed why the out still has a cost, even when the run scores.	This explains why WAR can be skeptical of the sacrifice fly. The run scores, but the out changes the remaining inning value.
Zachary Fly	What happens if one hitter always gets the runner home this way?	Over a season of 30 true SF opportunities, Zac finished −2.318 runs versus the average hitter, or about −0.232 WAR using the simple scale of 10 runs per WAR.	This makes the tradeoff easier to see. Zac can do the thing we want and still look worse through a strict run-expectancy lens.
Win Probability	Did the run matter to winning the game?	The WPA map showed that the same sac fly can look very different depending on inning, score, and base-out state.	This is the missing context. RE measures average run value, while WPA shows why one run is not always just one run.
Player Signal	Is this just random noise?	The player sections tested whether hitters separate from each other instead of assuming the leaderboard is automatically meaningful.	This keeps the article honest. A better value framework still needs to avoid overclaiming small samples or noisy player rankings.
RSOE vs SFOE	Does the box-score label capture the broader result?	Among qualified hitters, RSOE / 100 and SFOE / 100 had a player-level correlation of 0.398.	This separates the baseball result from the scoring label. Getting the runner home is broader than getting official sacrifice-fly credit.
WPA / 100	Do those player results also carry game value?	WPA adds the game-value layer by asking how much the play changed the team’s chance to win.	This ties the player section back to the main argument. The best version of the question includes runner-home value, official credit, and game value.
This table summarizes the argument before the final WAR interpretation. It does not add a new model.
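The Zachary Fly conversion in the table is simple enough to write out. This sketch uses only the figures quoted above; the variable names are illustrative, and the per-opportunity figure is derived here rather than quoted from the article.

```python
RUNS_PER_WIN = 10            # the article's simple runs-per-WAR scale
zac_runs_vs_average = -2.318  # Zac's run total versus the average hitter
opportunities = 30            # true SF opportunities in the hypothetical season

zac_war = zac_runs_vs_average / RUNS_PER_WIN
runs_per_opp = zac_runs_vs_average / opportunities

print(round(zac_war, 3))       # -0.232
print(round(runs_per_opp, 3))  # -0.077
```

That works out to a little under eight hundredths of a run lost per opportunity, which is why the season total stays at a fraction of a win.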

So the final question is not whether the sacrifice fly is always good or always bad. It is whether a WAR framework built mostly around average run value can fully capture a play whose value often depends on the exact game state.

What WAR Sees, and What It Misses

The point is not that every sacrifice fly should automatically help a hitter’s WAR. That would be too simple. Some sacrifice flies are routine. Some come in low-value spots. Some are just ordinary outs where the runner happened to score.

The better point is that Run Expectancy only answers one question: how did this play change the average number of runs expected in the inning?
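To make that question concrete, here is a minimal sketch of the run-expectancy bookkeeping for a sacrifice fly. The run-expectancy values are round, illustrative placeholders and are not the article's table; the state labels and function name are also hypothetical.

```python
# Illustrative run expectancy: average runs scored in the rest of the inning
# from each (bases, outs) state. Placeholder values, NOT the article's table.
RE = {
    ("runner_on_3b", 0): 1.40,
    ("runner_on_3b", 1): 0.95,
    ("bases_empty", 1): 0.28,
    ("bases_empty", 2): 0.10,
}


def re_change(before, after, runs_scored):
    """Run value of a play: runs scored plus the change in run expectancy."""
    return runs_scored + RE[after] - RE[before]


# Sac fly with one out: the run scores, the bases empty, two outs.
one_out = re_change(("runner_on_3b", 1), ("bases_empty", 2), runs_scored=1)

# Sac fly with no outs: the run scores, the bases empty, one out.
no_outs = re_change(("runner_on_3b", 0), ("bases_empty", 1), runs_scored=1)

print(round(one_out, 2), round(no_outs, 2))
```

With these placeholder values the one-out sacrifice fly comes out slightly positive and the no-out version slightly negative. Nothing in the calculation knows the score or the inning, which is exactly the limit of the Run Expectancy lens.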

Zachary Fly and Average Joe make the tension easier to see. Average Joe represents the normal baseline. He might do more than hit a sacrifice fly. He might single, walk, homer, strike out, or fail completely. Zac is simpler. With a runner on third and fewer than two outs, he banks the run with a sacrifice fly.

Through a strict Run Expectancy lens, Average Joe can look better because he preserves the upside of a bigger inning. Through a Win Probability lens, Zac can look better because in some game states the run matters more than the remaining upside.

That is the whole issue. WAR has good reasons to avoid becoming a pure context stat. The problem is that stripping away context completely can also strip away the reason a play mattered.

How Each Framework Sees the Sacrifice Fly
The same play can look different depending on the value lens
Lens	What It Sees	What It Misses	Why It Matters
Run Expectancy	The average run value of the base-out state before and after the play.	Score, inning, leverage, and whether one specific run changes the game.	The sacrifice fly can lose inning value because the out has a real cost.
Win Probability	How much the play changed the batting team’s chance to win the game.	It can over-credit context, teammate setup, and game situation.	It explains why the same sacrifice fly can matter very differently depending on inning and score.
Zachary Fly vs Average Joe	The cleanest version of the tradeoff: bank the run or preserve the bigger-inning upside.	It is still a thought experiment, not a direct player valuation model.	Over a season of 30 true SF opportunities, Zac finished −2.318 runs versus Average Joe, or about −0.232 WAR using the simple scale of 10 runs per WAR.
RSOE vs SFOE	The difference between getting the runner home and getting official sacrifice-fly credit.	Intent, approach, and whether the official scoring label fully captures the plate appearance.	In the player section, RSOE / 100 and SFOE / 100 had a correlation of 0.398.
WPA / 100	Whether the player-level results also carry game-value impact.	It is still context-heavy and should not simply replace WAR.	WPA adds the game-value layer by asking how much the play changed the batting team’s chance to win.
WAR	A context-neutral estimate of player value above replacement.	Small situational value that only matters because of score, inning, or leverage.	WAR is useful because it avoids overreacting to context, but that same choice can miss some real game value.
The sacrifice fly is useful as a case study because it separates average run value, game value, player signal, and official scoring credit.

The Balanced Takeaway

The sacrifice fly does not prove WAR is broken. That would be too strong. WAR is trying to measure player value in a neutral way, and that is the whole reason it is useful.

The issue is the tradeoff. A context-neutral model is cleaner, more stable, and easier to defend. At the same time, it can miss plays where the entire value is tied to the situation.

Run Expectancy is not wrong. It is doing what it is built to do. It measures average run value. The problem is that average run value is not always the same thing as helping your team win that game, in that inning, with that runner on third.

That is why the Zachary Fly versus Average Joe comparison works. Zac is not automatically better. Joe is not automatically better. The answer depends on the question being asked.

If the question is average inning value, Joe can have the stronger case. If the question is banking a specific run in a specific game state, Zac can have the stronger case. WAR mostly lives closer to the first question. Baseball games are often decided by the second.

That is also why RSOE, SFOE, and WPA matter. RSOE asks whether the runner scored more often than expected. SFOE asks whether the player got official sacrifice-fly credit more often than expected. WPA asks whether those events helped win the game. Those three things overlap, but they are not identical.
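The two "over expected" stats share one shape, which can be sketched as follows. This assumes both are actual events minus expected events, scaled per 100 true SF opportunities, with the expected count coming from some league-average model of each situation. That reading is an assumption based on the metric names, and the hitter below is made up.

```python
def over_expected_per_100(actual, expected, opportunities):
    """Events above expectation, scaled to 100 opportunities."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return 100 * (actual - expected) / opportunities


# Made-up hitter with 120 true SF opportunities.
# RSOE counts every time the runner scored; SFOE counts only official sac flies.
rsoe = over_expected_per_100(actual=70, expected=62.4, opportunities=120)
sfoe = over_expected_per_100(actual=18, expected=16.2, opportunities=120)

print(round(rsoe, 2), round(sfoe, 2))
```

The two numbers can diverge because a runner can score from third on plays that never earn the official sacrifice-fly label.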

The sacrifice fly is interesting because it lives in that gap. It is a small play, but it exposes a big measurement problem.

The Sac Fly Is Small. That’s the Point.

After all of this, I do not think the answer is that sacrifice flies are secretly some massive hidden WAR flaw.

They are not. One sacrifice fly is tiny. Even a full season of these chances usually moves the needle by fractions of a win.

That is exactly why I like the play as a case study. The sacrifice fly is small enough to understand, common enough to measure, and weird enough to show where different value systems disagree.

Run Expectancy sees the cost of the out. Win Probability sees the value of the run in context. RSOE sees whether the runner actually came home. SFOE sees whether the hitter got the official sacrifice-fly label. WAR has to decide how much of that context it wants to keep.

That is not an easy decision. A fully context-neutral stat can miss the importance of the moment. A fully context-driven stat can give a player too much credit for the situation around him.

Final Answers
The sacrifice fly is small, but the measurement question is not
Question	Answer	Why
Should every sacrifice fly help WAR?	No.	The out has a real cost, and not every run-scoring out is equally valuable.
Is Run Expectancy wrong?	No.	It answers the average inning-value question well. It just does not answer every value question.
Is Win Probability the full answer?	No.	It captures game context, but it can over-credit the situation around the player.
Does the official sacrifice-fly label tell the whole story?	No.	Getting the runner home is broader than getting official sacrifice-fly credit.
Does Zachary Fly beat Average Joe?	It depends.	Average Joe can win the average run-value argument. Zachary Fly can win the specific game-state argument.
What is the real takeaway?	WAR is useful, but incomplete.	A run-expectancy-heavy framework can miss value that only appears when inning, score, and game state matter.
This is a summary of the article’s argument, not a proposed replacement WAR formula.

If the question is average run value, the out matters a lot. If the question is game value, the run can matter more. If the question is player skill, we need to be careful and separate real signal from noisy scoring labels.

That is the whole lesson. The sacrifice fly is not a big play. It is a small play that forces the bigger question into the open.

DJ Barry
