Breaking Ball Taxonomy: Curveball, Slider, and Sweeper

Introduction

This piece was written by Dave Smith, a quantitative ecologist with the Idaho Department of Fish and Game who uses Bayesian hierarchical models professionally to estimate salmon and steelhead returns to Idaho. He is also a baseball nut.

Pitchers and coaches spend enormous energy developing the sweeper. It shows up in pitch design sessions, Driveline writeups, and player development philosophies across the league. The implicit assumption is that a sweeper is genuinely different from a slider, different enough to justify treating it as a distinct pitch with its own benchmarks, its own development pathway, its own expected outcomes.

That assumption deserves scrutiny.

Statcast introduced the sweeper label publicly in 2022 and has since applied it retroactively to pitches as far back as 2011, although that earliest season contains only 33 sweepers, all from a single pitcher (Bryan Shaw, Diamondbacks). Even with retroactive labeling, the sweeper’s share of breaking balls grew roughly fivefold between 2020 and 2025 (from 4.5% to 24.5%), almost entirely at the expense of the slider.

The plot above shows the share of breaking balls assigned each Statcast label per season from 2020 to 2025. The sweeper share has grown steadily, largely displacing sliders. Knuckle-curves remain a small share throughout, drifting down from 9.4% to about 6%.

That growth raises a real question: is the sweeper actually a different pitch, or is it just a slider with better branding and a development industry built around it?

This piece tries to answer that using Statcast pitch-tracking data from 2020 through 2025. The headline result: after conditioning on movement, sliders and sweepers are statistically indistinguishable on all five pitcher-controlled process outcomes. They get whiffs at the same rate, induce chases at the same rate, generate strikes at the same rate. The sweeper behaves like an extreme region of slider movement space, not a categorically different pitch. Where it does separate from sliders is on contact suppression: sweepers allow lower exit velocity, generate more popups, and give up fewer hard-hit balls. That distinction is real and worth taking seriously. But it is narrower than the current taxonomy implies, and the implications for how we evaluate and develop pitchers are worth thinking through carefully.

One caveat up front: because Statcast’s own classifier uses movement data to assign labels, any analysis comparing labels against movement is partly circular. What we can do is ask whether the movement space has clean boundaries where the labels draw lines, and whether the labels carry any additional information once movement is accounted for.

The Data

Pitch counts by type and season.
Season	CU (n)	CU (%)	SL (n)	SL (%)	ST (n)	ST (%)	KC (n)	KC (%)
2020	16,401	28.0	34,002	58.1	2,630	4.5	5,481	9.4
2021	38,104	25.7	85,810	58.0	12,144	8.2	11,946	8.1
2022	36,052	24.4	79,293	53.6	21,100	14.3	11,449	7.7
2023	31,687	21.4	75,901	51.2	30,907	20.9	9,639	6.5
2024	31,236	21.1	72,402	49.0	35,392	24.0	8,669	5.9
2025	32,967	22.3	69,509	47.1	36,126	24.5	8,967	6.1
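
The percentage columns and the "roughly fivefold" growth figure can be recomputed directly from the counts. The following is a minimal Python sketch, with the counts copied from the table above; the variable and function names are illustrative only.

```python
# Pitch counts per season copied from the table above, ordered (CU, SL, ST, KC).
counts = {
    2020: (16401, 34002, 2630, 5481),
    2021: (38104, 85810, 12144, 11946),
    2022: (36052, 79293, 21100, 11449),
    2023: (31687, 75901, 30907, 9639),
    2024: (31236, 72402, 35392, 8669),
    2025: (32967, 69509, 36126, 8967),
}
LABELS = ("CU", "SL", "ST", "KC")


def shares(season):
    """Return each label's percentage share of breaking balls in one season."""
    row = counts[season]
    total = sum(row)
    return {label: 100 * n / total for label, n in zip(LABELS, row)}


# Ratio of the sweeper share in 2025 to the sweeper share in 2020.
fold_change = shares(2025)["ST"] / shares(2020)["ST"]

for season in counts:
    print(season, {k: round(v, 1) for k, v in shares(season).items()})
print("sweeper fold change, 2020 to 2025:", round(fold_change, 1))
```

The fold change compares shares rather than raw counts, which matters here because the shortened 2020 season has far fewer pitches than the others.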

The analysis uses six movement features: horizontal break, vertical break, spin rate, spin axis, release extension, and arm angle. All features are adjusted for pitcher handedness and residualized against season to strip out year-to-year classifier drift before any analysis. Release speed was left out because it’s so tightly correlated with vertical break (r = -0.81) that it doesn’t add independent information. In the plots and tables throughout this piece we use Statcast’s abbreviations: CU (curveball), SL (slider), ST (sweeper), and KC (knuckle-curve). Knuckle-curves are included in the visualization but combined with curveballs for the outcome models, where they behave almost identically.
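
To make the two preprocessing steps concrete, here is a small sketch on toy data. It assumes that handedness adjustment means mirroring left-handers' horizontal break, and that residualizing against season means subtracting each season's mean (which is what regressing on season indicators produces). The records and field names are hypothetical, not Statcast's schema, and the actual pipeline may differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records; field names are illustrative only.
pitches = [
    {"season": 2023, "throws": "R", "hbreak": -14.0},
    {"season": 2023, "throws": "L", "hbreak": 12.0},
    {"season": 2024, "throws": "R", "hbreak": -16.0},
    {"season": 2024, "throws": "L", "hbreak": 15.0},
]


def adjust_handedness(pitch):
    """Mirror left-handers so horizontal break shares one sign convention."""
    sign = -1.0 if pitch["throws"] == "L" else 1.0
    return sign * pitch["hbreak"]


def residualize_by_season(rows, value_key):
    """Subtract each season's mean, leaving within-season deviations."""
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row[value_key])
    season_mean = {s: mean(v) for s, v in by_season.items()}
    return [row[value_key] - season_mean[row["season"]] for row in rows]


for p in pitches:
    p["hbreak_adj"] = adjust_handedness(p)

residuals = residualize_by_season(pitches, "hbreak_adj")
print(residuals)
```

After this step every season has mean zero on the feature, so a season-wide shift in how pitches are measured or labeled cannot masquerade as a difference between pitch types. An angular feature such as spin axis would need a reflection of the angle rather than a sign flip.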

Stage 1: Does the movement space actually have boundaries?

The first question is the most basic: when you plot every curveball, slider, and sweeper in movement space, do they form distinct clusters or one big blob?

To answer this, we ran a principal components analysis (PCA) on all six movement features. PCA is a way of finding the main axes of variation in a dataset, the directions in which pitches differ from each other the most. We retained all six components because each one explained at least 5% of total variance.
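
For readers who want to see the mechanics, the sketch below runs a PCA on synthetic data with NumPy and applies the 5% retention rule. The data are random stand-ins for the six movement features, and standardizing the features before the decomposition is an assumption of this example, not something stated about the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the six movement features (rows are pitches).
# Real inputs would be the handedness-adjusted, season-residualized features.
n_pitches, n_features = 500, 6
X = rng.normal(size=(n_pitches, n_features))
X[:, 1] += 0.8 * X[:, 0]  # induce correlation so PC1 is non-trivial

# Standardize so features on different scales (rpm vs. inches) weigh equally.
# This step is an assumption of the sketch.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA via singular value decomposition: rows of Vt are the loading vectors.
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
explained = S**2 / np.sum(S**2)   # share of total variance per component
loadings = Vt.T                   # shape (features, components)
scores = Z @ loadings             # each pitch's coordinates on the components

keep = explained >= 0.05          # retention rule described in the text
print("variance explained:", np.round(explained, 3))
print("components kept:", int(keep.sum()))
```

The `loadings` matrix plays the same role as the PC1 and PC2 columns reported in this section: each column says how strongly each feature contributes to that component.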

The first component (PC1) is essentially a curveball-to-sweeper gradient: it runs from high vertical break and spin rate on one end (curveball territory) to high spin axis and horizontal break on the other (sweeper/slider territory). PC2 captures a secondary dimension driven mainly by arm angle and horizontal break.

The table below shows PCA loadings for each movement feature on the first two components. Positive PC1 values point toward the sweeper/slider direction; negative PC1 values point toward the curveball direction.
Feature	PC1	PC2
Horiz. break	0.167	-0.328
Vert. break	-0.447	-0.178
Spin rate	-0.315	0.273
Spin axis	0.467	0.065
Extension	0.069	-0.087
Arm angle	-0.096	-0.385

The plot above shows a 50,000-pitch subsample plotted in movement space. The shaded regions show the densest 50% of each pitch type’s distribution. Slider and sweeper core areas overlap by 26.9%, meaning they’re sharing the same movement real estate. There’s no gap between them.

The biplot tells the story visually, but we can put a number on it. Using kernel density estimation, we measured how much of each pitch type’s movement footprint overlaps with others. The table below shows two overlap metrics: the core area (the densest 50% of each pitch type’s distribution) and the broader 95% range.

How much movement space do pitch types share? Core area = the densest 50% of each pitch type’s distribution; 95% range = the broader footprint. Sliders and sweepers share more than a quarter of their core movement space. Curveballs and knuckle-curves are almost interchangeable, which is why they’re combined in the outcome models below.
Pair	Core area overlap	95% range overlap
SL vs. ST	26.9%	68.4%
CU vs. ST	0.9%	48.2%
CU vs. SL	0.0%	37.2%
CU vs. KC	59.5%	80.4%
KC vs. SL	0.3%	43.0%
KC vs. ST	13.9%	45.5%

Sliders and sweepers share 26.9% of their core movement areas and 68.4% of their broader footprints. That’s a lot of overlap for two pitches that are supposed to be distinct categories. Curveball and sweeper, by contrast, barely touch (0.9% core overlap), which makes sense given how different the physics are. Knuckle-curves and curveballs are nearly interchangeable in movement space (59.5% core overlap), which is why they’re combined for the outcome models below.
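
One way to compute an overlap number like those in the table above is sketched below with NumPy on synthetic two-dimensional data. The overlap definition used here (shared grid cells divided by the union of both regions), the fixed bandwidth, and the grid are all assumptions of this example, so the printed values will not match the table.

```python
import numpy as np


def kde_mass(points, gx, gy, bandwidth):
    """Gaussian kernel density of 2-D points on a grid, normalized to sum to 1."""
    dx = gx[..., None] - points[:, 0]
    dy = gy[..., None] - points[:, 1]
    dens = np.exp(-(dx**2 + dy**2) / (2 * bandwidth**2)).sum(axis=-1)
    return dens / dens.sum()


def density_region(mass, level):
    """Boolean mask of the highest-density cells that together hold `level` mass."""
    order = np.argsort(mass, axis=None)[::-1]       # densest cells first
    cum = np.cumsum(mass.ravel()[order])
    n_cells = int(np.searchsorted(cum, level)) + 1
    mask = np.zeros(mass.size, dtype=bool)
    mask[order[:n_cells]] = True
    return mask.reshape(mass.shape)


def region_overlap(a, b, gx, gy, level=0.5, bandwidth=0.4):
    """Shared cells divided by the union of both regions (assumed definition)."""
    ra = density_region(kde_mass(a, gx, gy, bandwidth), level)
    rb = density_region(kde_mass(b, gx, gy, bandwidth), level)
    return np.logical_and(ra, rb).sum() / np.logical_or(ra, rb).sum()


rng = np.random.default_rng(1)
# Synthetic clouds: two neighbors and one distant group, in arbitrary units.
slider = rng.normal([0.0, 0.0], 1.0, size=(300, 2))
sweeper = rng.normal([1.0, 0.0], 1.0, size=(300, 2))
curve = rng.normal([8.0, 8.0], 1.0, size=(300, 2))

axis = np.linspace(-5, 13, 60)
gx, gy = np.meshgrid(axis, axis)

core_sl_st = region_overlap(slider, sweeper, gx, gy, level=0.5)
wide_sl_st = region_overlap(slider, sweeper, gx, gy, level=0.95)
core_sl_cu = region_overlap(slider, curve, gx, gy, level=0.5)
print("core SL vs ST:", round(float(core_sl_st), 3))
print("95% SL vs ST:", round(float(wide_sl_st), 3))
print("core SL vs CU:", round(float(core_sl_cu), 3))
```

Neighboring clouds produce a partial core overlap while well-separated clouds produce none, which is the same qualitative pattern as the SL vs. ST and CU vs. SL rows.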

Stage 2: Do the labels predict different outcomes for pitchers?

Movement overlap is suggestive, but the real test is outcomes. If sliders and sweepers move through the same space, do they also produce the same results?

The model is straightforward: for each outcome, we ask whether knowing a pitch was called a sweeper rather than a slider adds any predictive information once the pitch’s actual movement is already accounted for. If the sweeper label has independent value, its coefficient should be clearly non-zero. If the label is just a name for a particular region of slider movement space, the coefficient should be indistinguishable from zero.

We tested this on five pitcher-controlled outcomes: whiff rate, chase rate, strike rate, called strike rate, and zone rate. The sweeper is the reference category. Curveballs and knuckle-curves are grouped together (CU+KC), and all models include pitcher-level random effects so we’re not attributing to the label what is really explained by who throws it.
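
Written out, a model of this kind might look like the following for a binary outcome such as whiff or no whiff. The Bernoulli likelihood and logit link are assumptions of this sketch, and priors are omitted because the text does not specify them.

```latex
\begin{aligned}
y_i &\sim \mathrm{Bernoulli}(p_i) \\
\operatorname{logit}(p_i) &= \alpha
  + \beta_{\mathrm{SL}}\,\mathbb{1}[\mathrm{label}_i = \mathrm{SL}]
  + \beta_{\mathrm{CU+KC}}\,\mathbb{1}[\mathrm{label}_i = \mathrm{CU{+}KC}]
  + \sum_{k=1}^{6} \gamma_k\,\mathrm{PC}_{ik}
  + u_{j[i]} \\
u_j &\sim \mathrm{Normal}(0, \sigma_u^2)
\end{aligned}
```

Here $y_i$ is the outcome for pitch $i$, $\mathrm{PC}_{ik}$ is that pitch's score on movement component $k$, and $u_{j[i]}$ is the random effect for the pitcher who threw it. Because the sweeper is the reference category, $\beta_{\mathrm{SL}}$ is the slider vs. sweeper contrast: if the label carries no information beyond movement, its posterior should sit around zero.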

How much does the label matter, above and beyond the pitch’s movement? Each violin shows the range of plausible answers the model produces. Think of it as a probability distribution over effect sizes, where taller and narrower means more certainty. The line is the 95% credible interval (we’re 95% sure the true value falls in that range) and the dot is the median estimate. The dashed line at zero means ‘no difference from sweeper.’ The orange slider vs. sweeper panels all straddle zero, meaning sliders and sweepers are statistically indistinguishable on every outcome once movement is controlled. Curveballs are clearly different from sweepers on all five.

The plot above shows predicted probabilities for each pitch group after controlling for movement and pitcher. Slider and sweeper distributions sit almost exactly on top of each other across all five outcomes. Curveball+knuckle-curve is a genuinely different group.

The answer to the Stage 2 question is no: the label does not predict different outcomes. After conditioning on movement, sliders and sweepers are statistically indistinguishable on all five outcomes. The slider vs. sweeper coefficient on whiff rate is 0.014 (95% credible interval: -0.016 to 0.044). That interval spans both negative and positive values, meaning the model can’t even tell us which pitch type generates more whiffs, let alone by how much. Same story for chase rate, strike rate, called strike rate, and zone rate. Every single one. Once you account for what the pitch actually does in the air, the label adds nothing.
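
The "straddles zero" check is simple to reproduce from posterior draws. The sketch below simulates draws from a normal distribution chosen to mimic the reported whiff-rate coefficient (center 0.014, spread about 0.0153, which is the interval width of 0.060 divided by 3.92). These are not the model's actual draws, and the normal shape is an assumption of the example.

```python
import random
from statistics import median, quantiles

random.seed(42)

# Simulated stand-in for posterior draws of the slider vs. sweeper coefficient.
# NOT the fitted model's draws; parameters mimic the reported summary only.
draws = [random.gauss(0.014, 0.0153) for _ in range(4000)]


def credible_interval_95(samples):
    """Equal-tailed 95% interval: the 2.5th and 97.5th percentiles."""
    cuts = quantiles(samples, n=40)  # cut points every 2.5 percent
    return cuts[0], cuts[-1]


lo, hi = credible_interval_95(draws)
straddles_zero = lo < 0 < hi
prob_positive = sum(d > 0 for d in draws) / len(draws)

print("median:", round(median(draws), 3))
print("95% interval:", (round(lo, 3), round(hi, 3)))
print("interval straddles zero:", straddles_zero)
print("share of draws above zero:", round(prob_positive, 2))
```

The share of draws above zero is a useful companion to the interval: it shows that even the direction of the effect is uncertain, which is the point made in the paragraph above.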

Curveballs are a different story. They generate fewer whiffs and chases than sweepers but more called strikes and more pitches in the zone. That boundary, curveball/knuckle-curve versus the slider/sweeper family, is real and the label captures it reliably. The slider/sweeper boundary does not hold up the same way.

Stage 3: Do the labels predict different contact quality?

Even if sliders and sweepers produce similar strikeout-generating outcomes, maybe contact quality tells a different story. A pitch that batters make contact with differently is doing something meaningfully distinct, even if the whiff rates look the same.

We modeled five contact outcomes: hard hit rate, ground ball rate, fly ball rate, line drive rate, and popup rate, plus exit velocity, using the same framework as Stage 2. Again, all six movement components are controlled for, so we’re isolating any information the label carries beyond the pitch’s physical profile.

The plot above shows unadjusted contact outcome rates by pitch group. Even before controlling for movement, sweepers generate noticeably more popups and fly balls and fewer ground balls than sliders or curveballs.

The plot above shows posterior label contrasts for contact outcomes. Unlike the pitcher outcome results, both slider vs. sweeper (orange) and curveball+knuckle-curve vs. sweeper (blue) clearly exclude zero on most outcomes. Sweepers are associated with more popups, more fly balls, and lower hard hit rates, a distinct contact suppression profile that the label does capture, even where whiff rates look identical.

The plot above shows predicted contact outcome probabilities for each pitch group after controlling for movement and pitcher. On hard hit and ground ball rate, sweepers are clearly lower. On popup rate, sweepers are clearly higher. Sliders sit between sweepers and curveballs on most outcomes.

The plot above shows predicted exit velocity by pitch group after controlling for movement and pitcher. Exit velocity against sweepers is roughly 1.7 mph lower than against sliders and 2.3 mph lower than against curveballs, a meaningful gap that persists after accounting for movement differences.

Here the sweeper label earns its keep. After controlling for movement, sweepers are associated with lower exit velocity (roughly 85.2 mph predicted) compared to sliders (86.9 mph) and curveballs (87.5 mph). They also generate more popups and fewer hard-hit balls. This is a real difference, and it’s not explained by the pitch’s raw movement profile.

One leading candidate for why: seam-shifted wake aerodynamics, a phenomenon where unusual seam orientation creates late, unpredictable movement that disrupts contact even when the gross movement profile looks similar to a slider’s (Smith and Smith 2021; Yin et al. 2025). This would explain why two pitches that look nearly identical in Statcast’s movement coordinates can produce different contact outcomes.

What does this all mean?

Think of the sweeper as the far end of the slider spectrum rather than its own pitch family. It occupies an extreme region of slider movement space, and on the outcomes pitchers control directly, it is statistically indistinguishable from a slider once movement is accounted for. The label draws a line through a continuum and calls one side something new.

What makes the sweeper genuinely interesting is what happens when contact is made. Lower exit velocity, more popups, fewer hard-hit balls. These differences survive movement controls, which means something about the sweeper region of movement space, possibly seam-shifted wake aerodynamics, suppresses hard contact in a way that conventional sliders do not. The sweeper is not just a slider. But it is also not as categorically distinct as the separate label implies. It is a slider pushed to an extreme, with a contact suppression benefit that becomes real at that extreme.

That distinction has concrete consequences.

For pitcher development: The target is the movement profile, not the label. Higher horizontal break and a more tilted spin axis are the physical objectives. Whether Statcast calls the result a sweeper or a slider is irrelevant to whether the pitch generates whiffs. The contact suppression benefit does appear to be real at the extreme end of the movement spectrum, so developing in that direction may genuinely limit damage on contact. But pitchers and coaches should be optimizing toward a region of movement space, not toward a Statcast tag.

For pitch quality models: Stuff+ benchmarks pitches within label-defined buckets (Langin 2021). A pitcher on the slider/sweeper boundary gets evaluated against a different comparison group depending on which side of an arbitrary line Statcast places him on a given day. If both sides of that line produce equivalent whiff rates, called strike rates, and chase rates, the benchmarking is generating noise with real consequences for how pitchers are evaluated and paid.

For research: Any study that treats slider and sweeper as distinct inputs is estimating a difference that does not exist on the pitcher outcome side. The movement profile is the right conditioning variable. The Statcast label is a noisy proxy for it.

References

Smith, A. W., and Smith, B. L. (2021). Using baseball seams to alter a pitch direction: The seam shifted wake. Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology, 235(1), 21-28. https://doi.org/10.1177/1754337120961609

Yin, Y., Aoki, T., Watanabe, S., and Kobayashi, H. (2025). Aerodynamics study on sweeper. Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology. https://doi.org/10.1177/17543371251395349

Langin, C. (2021). Pitch design: What is Stuff+? Quantifying pitches with pitch models. Driveline Baseball. https://www.drivelinebaseball.com/2021/12/what-is-stuff-quantifying-pitches-with-pitch-models/