Construct Validity of the Statcast Breaking Ball Taxonomy: Curveball, Slider, and Sweeper

Abstract

Major League Baseball’s Statcast system assigns discrete pitch-type labels to every pitch via a proprietary classifier. The sweeper (ST) label was introduced publicly in 2022, though pitches have been retroactively classified as ST as far back as 2011. This analysis assesses the construct validity of the Statcast breaking ball taxonomy across three stages, following the framework of Cronbach and Meehl (1955) and Strauss and Smith (2009). Stage 1 assesses convergent validity by mapping the six-dimensional movement space via principal components analysis (PCA). Stage 2 assesses discriminant validity by testing whether pitch type labels predict different pitcher-controlled process outcomes after conditioning on movement via Bayesian hierarchical binomial models. Stage 3 assesses predictive validity by applying the same framework to contact quality outcomes. The central finding is that the slider (SL) and sweeper (ST) labels fail discriminant validity: after conditioning on movement, they are indistinguishable on all five pitcher-controlled process outcomes. The meaningful discriminant boundary in the data is between the curveball/knuckle-curve family and the slider/sweeper family. Partial predictive validity is retained for the ST label on contact suppression outcomes, including exit velocity, popup rate, and fly ball rate, consistent with seam-shifted wake aerodynamics as a potential mechanism.

Introduction

The validity of classification systems is a foundational concern in applied measurement. Cronbach and Meehl (1955) introduced construct validity as the question of whether a measurement instrument captures the theoretical construct it purports to measure. Strauss and Smith (2009) operationalised three subtypes relevant to this analysis: convergent validity (does the measure cluster with theoretically similar constructs?), discriminant validity (does the measure predict different outcomes for theoretically distinct categories?), and predictive validity (does the measure predict criterion outcomes?).

Statcast’s pitch classification system assigns discrete labels to every pitch based on movement, velocity, spin, and arm angle via a proprietary machine learning classifier. The sweeper label, introduced publicly in 2022, grew from 4.5% of breaking balls in 2020 to 24.5% in 2025, while the slider share fell from 58.1% to 47.1% (Table 1). Whether the sweeper represents a genuinely distinct pitch type or a refinement of an existing continuous movement spectrum is an open empirical question with practical implications for pitch development, player evaluation, and Statcast-based research.

An important circularity caveat applies: because Statcast’s classifier uses movement variables as inputs, analyses regressing movement features against labels are partly circular. This analysis characterises the geometry of the movement space and tests whether labels carry incremental predictive information beyond the physics they were partially derived from.

Data

Pitch-level data were pulled directly from Baseball Savant (baseballsavant.mlb.com) via the Statcast search API using httr and readr in R, covering regular-season games from 2020 to 2025. Pitches labelled CU, SL, ST, and KC were retained for analysis. Pitches labelled SV (slurve) were kept for visualisation only and excluded from outcome models.

Table 1. Pitch counts and proportions by type and season. CU = curveball, SL = slider, ST = sweeper, KC = knuckle-curve.
Season	CU (n)	CU (%)	SL (n)	SL (%)	ST (n)	ST (%)	KC (n)	KC (%)
2020	16,401	28.0	34,002	58.1	2,630	4.5	5,481	9.4
2021	38,104	25.7	85,810	58.0	12,144	8.2	11,946	8.1
2022	36,052	24.4	79,293	53.6	21,100	14.3	11,449	7.7
2023	31,687	21.4	75,901	51.2	30,907	20.9	9,639	6.5
2024	31,236	21.1	72,402	49.0	35,392	24.0	8,669	5.9
2025	32,967	22.3	69,509	47.1	36,126	24.5	8,967	6.1

Figure 1. Share of breaking balls assigned each Statcast label per season, 2020-2025. The sweeper share has grown consistently since 2020, largely at the expense of the slider, consistent with reclassification along a pre-existing movement continuum rather than the emergence of a genuinely novel pitch type.
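The percentage columns of Table 1 can be recomputed from the raw counts. The sketch below is illustrative only: the counts are copied from Table 1, while the names COUNTS, label_shares, and shares_by_season are hypothetical and not part of the original analysis, which was carried out in R.

```python
# Pitch counts copied from Table 1 (season -> Statcast label -> count).
COUNTS = {
    2020: {"CU": 16401, "SL": 34002, "ST": 2630, "KC": 5481},
    2021: {"CU": 38104, "SL": 85810, "ST": 12144, "KC": 11946},
    2022: {"CU": 36052, "SL": 79293, "ST": 21100, "KC": 11449},
    2023: {"CU": 31687, "SL": 75901, "ST": 30907, "KC": 9639},
    2024: {"CU": 31236, "SL": 72402, "ST": 35392, "KC": 8669},
    2025: {"CU": 32967, "SL": 69509, "ST": 36126, "KC": 8967},
}


def label_shares(counts):
    """Return each label's percentage share of the season's breaking balls."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


shares_by_season = {season: label_shares(c) for season, c in COUNTS.items()}

for season, shares in sorted(shares_by_season.items()):
    row = ", ".join(f"{label} {pct:.1f}%" for label, pct in shares.items())
    print(season, row)
```

The shares are computed within the four retained labels only, so each season's row sums to 100%.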

Six movement features were standardised and residualised against factor(game_year) via OLS prior to analysis: horizontal break (api_break_x_arm, sign-flipped for left-handed pitchers so that glove-side break is positive regardless of handedness), vertical break (api_break_z_with_gravity), spin rate, spin axis (sign-flipped for left-handed pitchers), release extension, and arm angle. Variable definitions follow Fast (2009) and Nathan (2012). Residualisation is intended to strip secular classifier drift so that movement geometry reflects within-season pitch physics.

Release speed was excluded because of its high collinearity with vertical break (r = -0.81). This is the expected physical relationship between velocity and gravitational drop: faster pitches spend less time in flight, which reduces the effect of gravity on vertical trajectory (Nathan 2012). Knuckle-curves (KC) showed near-identical movement profiles to curveballs (59.5% core area overlap of kernel density estimates; Table 4) and were merged with curveballs for all outcome models. Deviance information criterion (DIC) comparisons indicated that the penalty for combining the two categories was less than 5 DIC units.
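The preprocessing described above can be sketched in a few lines. Because an OLS regression on a single season factor has the season means as its fitted values, residualising reduces to subtracting each season's mean. This is a minimal illustration, not the original R code: the function names and the toy values are hypothetical, and the use of the sample standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev


def flip_for_lefties(values, throws):
    """Negate a feature for left-handed pitchers so glove-side break is positive."""
    return [-v if hand == "L" else v for v, hand in zip(values, throws)]


def standardise(values):
    """Centre to mean 0 and scale to unit sample standard deviation (assumed)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def residualise_by_season(values, seasons):
    """Residuals from OLS on a season factor: each value minus its season mean."""
    groups = defaultdict(list)
    for v, s in zip(values, seasons):
        groups[s].append(v)
    season_means = {s: mean(vs) for s, vs in groups.items()}
    return [v - season_means[s] for v, s in zip(values, seasons)]


# Hypothetical toy data: horizontal break (inches), throwing hand, season.
break_x = [12.0, -10.0, 8.0, -14.0, 6.0, 9.0]
throws = ["R", "L", "R", "L", "R", "R"]
seasons = [2020, 2020, 2020, 2021, 2021, 2021]

adjusted = flip_for_lefties(break_x, throws)
resid = residualise_by_season(standardise(adjusted), seasons)
```

After this step the residuals average zero within every season, so any season-to-season shift in a feature's mean no longer contributes to the movement geometry.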

Stage 1: Convergent Validity

PCA was computed on the six year-residualised movement features using vegan::rda() (Oksanen et al. 2022), and all six principal components were retained as covariates in the outcome models. A broken stick test suggested retaining only the first two components. However, the goal of the PCA stage is not dimensionality reduction per se but the construction of a comprehensive movement-space representation to serve as a conditioning set in the Bayesian models, so every component explaining at least 5% of total variance was retained. This threshold is not a hard inferential criterion; it indicates that each component captures non-negligible independent variation in the movement space. Discarding the lower-order components would risk omitting movement dimensions that explain little overall variance but may still carry predictive information about pitch outcomes. Restricting the covariate set to PC1 and PC2 is a sensitivity analysis worth pursuing in future work.

Convergent validity was assessed by examining whether pitches bearing the same Statcast label cluster in PC space and whether the biplot structure aligns with the physical relationships established by aerodynamic theory. Pairwise movement-space overlap was quantified using bivariate kernel density estimates (MASS::kde2d, 200 x 200 grid) on PC1/PC2 scores. Two overlap metrics are reported: core area overlap (grid cells above 50% of peak density) and 95% range overlap (grid cells above 5% of peak density). Each is computed as the symmetric average of the intersection relative to each label’s density area.
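The overlap metric can be illustrated as follows. This sketch assumes two density surfaces evaluated on the same grid. Analytic Gaussian surfaces stand in for the kernel density estimates, and the function names, the 50 x 50 grid, and the toy centres are all hypothetical. Only the thresholding rule and the symmetric average follow the description above.

```python
import math


def gaussian_grid(cx, cy, sd, n=50, lo=-4.0, hi=4.0):
    """Stand-in density surface: an isotropic Gaussian on an n x n grid."""
    step = (hi - lo) / (n - 1)
    axis = [lo + i * step for i in range(n)]
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sd ** 2)) for x in axis]
        for y in axis
    ]


def cells_above(grid, fraction):
    """Grid cells whose density exceeds the given fraction of the peak density."""
    peak = max(max(row) for row in grid)
    return {
        (r, c)
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value > fraction * peak
    }


def symmetric_overlap(grid_a, grid_b, fraction):
    """Average of shared cells relative to each label's own thresholded area."""
    a, b = cells_above(grid_a, fraction), cells_above(grid_b, fraction)
    shared = len(a & b)
    return 0.5 * (shared / len(a) + shared / len(b))


# Hypothetical labels A and B whose centres sit one unit apart on PC1.
grid_a = gaussian_grid(0.0, 0.0, 1.0)
grid_b = gaussian_grid(1.0, 0.0, 1.0)

core_overlap = symmetric_overlap(grid_a, grid_b, 0.50)   # core area
range_overlap = symmetric_overlap(grid_a, grid_b, 0.05)  # 95% range
```

In this toy case the range overlap exceeds the core overlap, the same qualitative pattern reported for every label pair in Table 4.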

Table 2. Variance explained by each principal component. All six components exceed the 5% retention threshold.
Component	Variance explained (%)	Cumulative (%)
PC1	30.9	30.9
PC2	20.7	51.6
PC3	17.9	69.5
PC4	14.0	83.5
PC5	10.3	93.7
PC6	6.3	100.0

Table 3. PCA loadings (scaled arrows, vegan::rda). PC1 is dominated by spin axis (0.467) and vertical break (-0.447), representing the continuum from curveball-type movement (negative PC1) to sweeper/slider-type movement (positive PC1). PC2 is dominated by arm angle (-0.385) and horizontal break (-0.328).
Feature	PC1	PC2
Horiz. break (adj.)	0.167	-0.328
Vert. break	-0.447	-0.178
Spin rate	-0.315	0.273
Spin axis (adj.)	0.467	0.065
Extension	0.069	-0.087
Arm angle	-0.096	-0.385

Figure 2. PCA of the breaking-ball movement space, 2020-2025 (50,000-pitch subsample). Shaded regions show the 50% kernel density contour (core area) for each label. Slider and sweeper core areas overlap by 26.9%, with 68.4% overlap at the 95% range. No clean cluster boundaries are visible, consistent with a continuous movement spectrum.

Table 4. Pairwise KDE overlap in PC1/PC2 space. Core area = proportion of grid cells above 50% of peak density that are shared (symmetric average); 95% range = same at 5% of peak density threshold. The SL/ST core area overlap of 26.9% and range overlap of 68.4% indicate substantial movement space sharing. KC/CU overlap of 59.5% supports merging those labels in outcome models.
Pair	Core area overlap	95% range overlap
SL vs. ST	26.9%	68.4%
CU vs. ST	0.9%	48.2%
CU vs. SL	0.0%	37.2%
CU vs. KC	59.5%	80.4%
KC vs. SL	0.3%	43.0%
KC vs. ST	13.9%	45.5%

PC1 contrasts high-vertical-break, high-spin-rate pitches (negative scores: curveballs) with high-spin-axis pitches that break more horizontally (positive scores: sweepers and sliders). The relationship between spin axis orientation and the direction of spin-induced deflection establishes spin axis as a meaningful discriminating variable across the CU/KC/SL/ST continuum (Bahill and Baldwin 2007). SL and ST both occupy positive PC1 space with 26.9% core area overlap and 68.4% range overlap, consistent with the movement continuum hypothesis. Although the CU and SL core areas do not overlap (0.0%), every pair of labels overlaps by at least 37% at the 95% range (Table 4), so no clean cluster boundary separates any pair of pitch types. This pattern is consistent with convergent validity holding within label groups, while a discriminant boundary between SL and ST is absent in movement space.

Stage 2: Discriminant Validity

Discriminant validity was assessed by testing whether pitch type labels predict different outcomes relative to the sweeper reference category after conditioning on the full movement space. Five outcomes were modelled at the pitcher x pitch_type x game_year x game_month cell level (minimum 10 pitches per cell): whiff rate, chase rate, called strike rate, any strike rate, and zone rate.
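The cell-level aggregation can be sketched as below. This is a minimal illustration under stated assumptions: the dictionary keys mirror the cell definition in the text, but the function name, the toy rows, and the choice of all pitches as the denominator are hypothetical (the appropriate denominator for an outcome such as whiff rate may differ in the original analysis).

```python
from collections import defaultdict

MIN_PITCHES = 10  # minimum cell size stated in the text


def aggregate_cells(pitches, outcome_key, min_pitches=MIN_PITCHES):
    """Collapse pitch rows into (successes, trials) per pitcher x type x year x month."""
    cells = defaultdict(lambda: [0, 0])
    for p in pitches:
        key = (p["pitcher"], p["pitch_type"], p["game_year"], p["game_month"])
        cells[key][0] += int(p[outcome_key])  # successes, e.g. whiffs
        cells[key][1] += 1                    # trials
    # Drop cells that fall below the minimum pitch count.
    return {k: tuple(v) for k, v in cells.items() if v[1] >= min_pitches}


# Hypothetical toy rows: one 12-pitch sweeper cell and one 5-pitch slider cell.
rows = [
    {"pitcher": 1, "pitch_type": "ST", "game_year": 2024, "game_month": 6, "whiff": i < 4}
    for i in range(12)
] + [
    {"pitcher": 2, "pitch_type": "SL", "game_year": 2024, "game_month": 6, "whiff": i < 1}
    for i in range(5)
]

cells = aggregate_cells(rows, "whiff")
```

Each surviving cell supplies one binomial observation: the number of successes and the number of trials.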

Each outcome was modelled with a binomial likelihood and logistic link:

\[\text{logit}(p_i) = \alpha_{j[i]} + \sum_{k=1}^{6} \beta_{pc_k} \cdot \text{PC}_{ki} + \beta_{cu+kc} \cdot \text{CU+KC}_i + \beta_{sl} \cdot \text{SL}_i\]

\[Y_i \sim \text{Binomial}(p_i,\ n_i), \quad \alpha_j \sim \text{Normal}(\mu, \tau_\alpha)\]

Pitcher intercepts are drawn from a common normal distribution with mean \(\mu\) and precision \(\tau_\alpha\), implementing partial pooling. Distributions are written in the JAGS convention, with the normal parameterised by mean and precision and the binomial by probability and number of trials. Fixed effect priors and the prior on \(\mu\) were N(0, 0.001), i.e. precision 0.001 (variance 1,000); \(\tau_\alpha\) was assigned a Gamma(0.001, 0.001) hyperprior. The derived contrast \(\beta_{cu+kc\ vs\ sl} = \beta_{cu+kc} - \beta_{sl}\) was monitored directly. Models were fit in JAGS via jagsUI (3 chains, 1,000 adaptation, 5,000 burn-in, 10,000 iterations, thinned by 2). All monitored nodes converged (R-hat < 1.1). Discriminant validity for the SL/ST distinction is indicated by 95% credible intervals for \(\beta_{sl}\) that exclude zero.
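To make the model structure concrete, the sketch below evaluates the linear predictor and the binomial log-likelihood for a single cell. It is not the JAGS model itself: the function names and all parameter values are hypothetical, chosen only to show how the sweeper acts as the reference category.

```python
import math


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def linear_predictor(alpha, beta_pc, pcs, beta_cukc, beta_sl, label):
    """alpha + sum_k beta_pc[k] * PC_k + label effect; ST is the reference label."""
    eta = alpha + sum(b * x for b, x in zip(beta_pc, pcs))
    if label == "CU+KC":
        eta += beta_cukc
    elif label == "SL":
        eta += beta_sl
    return eta


def binomial_loglik(y, n, p):
    """Log-probability of y successes in n trials with success probability p."""
    return math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)


# Hypothetical parameter values for one pitcher and one cell.
alpha = -1.0                 # pitcher intercept
beta_pc = [0.1] * 6          # slopes on the six PC scores
pcs = [0.0] * 6              # a cell sitting at the centre of the movement space

p_st = inv_logit(linear_predictor(alpha, beta_pc, pcs, -0.3, 0.014, "ST"))
p_sl = inv_logit(linear_predictor(alpha, beta_pc, pcs, -0.3, 0.014, "SL"))
loglik = binomial_loglik(4, 12, p_st)
```

With identical movement and pitcher, the two labels differ only through \(\beta_{sl}\), which is why a \(\beta_{sl}\) near zero implies nearly identical predicted rates.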

Figure 3. Posterior distributions for pairwise label contrasts across five pitcher-controlled outcomes. Each violin shows the full posterior; line = 95% credible interval; dot = posterior median. The dashed line marks zero (no difference from sweeper). The slider vs. sweeper contrast (orange, centre column) straddles zero on all five outcomes. The curveball+knuckle-curve vs. sweeper contrast (blue, left column) excludes zero on all five.

Figure 4. Posterior predicted probabilities for each pitch group at mean covariate values, marginalised over the pitcher random effect distribution. Slider and sweeper distributions are nearly indistinguishable across all five outcomes. Curveball+knuckle-curve is a clearly distinct group.
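Marginalising over the pitcher random effect distribution, as in Figure 4, can be approximated by Monte Carlo: draw pitcher intercepts from the fitted normal distribution and average the inverse-logit predictions. The sketch below does this for a single set of parameter values, assuming the PC scores are zero at their means; in practice the step would be repeated for every posterior draw. The function names and parameter values are hypothetical.

```python
import math
import random


def inv_logit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def marginal_probability(mu, tau_alpha, label_effect, n_draws=20000, seed=1):
    """Average inv_logit over intercepts drawn from Normal(mu, sd = tau_alpha ** -0.5)."""
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(tau_alpha)  # convert precision to standard deviation
    total = 0.0
    for _ in range(n_draws):
        alpha = rng.gauss(mu, sd)
        total += inv_logit(alpha + label_effect)
    return total / n_draws


# Hypothetical values: hyper-mean -1 on the logit scale, precision 4 (SD 0.5).
p_sweeper = marginal_probability(mu=-1.0, tau_alpha=4.0, label_effect=0.0)
p_slider = marginal_probability(mu=-1.0, tau_alpha=4.0, label_effect=0.014)
```

Because the inverse logit is nonlinear, the marginal probability differs from the probability at the mean intercept, which is why the averaging step is needed.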

\(\beta_{sl}\) straddles zero on all five outcomes (posterior mean and 95% credible interval, on the logit scale): whiff (0.014, -0.016 to 0.044), chase (0.004, -0.025 to 0.031), strike (0.016, -0.003 to 0.035), called strike (-0.009, -0.038 to 0.018), and zone (0.011, -0.011 to 0.033). After conditioning on movement, the slider label carries no credible incremental predictive information on any pitcher-controlled process outcome, so the SL/ST distinction fails discriminant validity. The CU+KC group differs credibly from ST on all five outcomes, supporting the curveball/knuckle-curve vs. slider/sweeper boundary as the discriminantly valid distinction in the data.
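Because the coefficients are on the logit scale, exponentiating them gives odds ratios for sliders relative to sweepers. The sketch below applies that conversion to the reported \(\beta_{sl}\) estimates and checks whether each interval excludes zero. The numbers are those reported above; the names BETA_SL and summarise are hypothetical.

```python
import math

# Reported beta_sl summaries: (posterior mean, lower 95% bound, upper 95% bound).
BETA_SL = {
    "whiff": (0.014, -0.016, 0.044),
    "chase": (0.004, -0.025, 0.031),
    "strike": (0.016, -0.003, 0.035),
    "called strike": (-0.009, -0.038, 0.018),
    "zone": (0.011, -0.011, 0.033),
}


def summarise(mean, lower, upper):
    """Convert a logit-scale coefficient to an odds ratio and test its interval."""
    return {
        "odds_ratio": math.exp(mean),
        "or_interval": (math.exp(lower), math.exp(upper)),
        "excludes_zero": lower > 0.0 or upper < 0.0,
    }


summary = {outcome: summarise(*values) for outcome, values in BETA_SL.items()}

for outcome, s in summary.items():
    lo, hi = s["or_interval"]
    print(f"{outcome}: OR {s['odds_ratio']:.3f} ({lo:.3f} to {hi:.3f}), "
          f"excludes zero: {s['excludes_zero']}")
```

Every odds-ratio interval spans 1, which is the same statement as every logit-scale interval spanning zero.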

Stage 3: Predictive Validity

Predictive validity was assessed using outcomes on balls put in play. Five outcomes were modelled with binomial likelihoods: hard hit rate (exit velocity >= 95 mph), ground ball rate, fly ball rate, line drive rate, and popup rate. Exit velocity was modelled separately with a Gaussian likelihood:

\[\mu_i = \alpha_{j[i]} + \sum_{k=1}^{6} \beta_{pc_k} \cdot \text{PC}_{ki} + \beta_{cu+kc} \cdot \text{CU+KC}_i + \beta_{sl} \cdot \text{SL}_i, \quad Y_i \sim \text{Normal}(\mu_i, \tau_e)\]

The prior on the intercept hyper-mean \(\mu\) was N(86.8, 0.001), centred on the observed mean exit velocity of 86.8 mph. An additional residual precision parameter \(\tau_e\) was estimated with a Gamma(0.001, 0.001) hyperprior. All other model settings were identical to Stage 2.
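Two pieces of the Stage 3 setup can be sketched briefly: the hard-hit indicator and a Gaussian log-density written in terms of precision rather than variance, which is the parameterisation used for the exit velocity model. The exit velocities below are hypothetical toy values, and the function names are illustrative only.

```python
import math

HARD_HIT_MPH = 95.0  # hard-hit threshold stated in the text


def is_hard_hit(exit_velocity_mph):
    """A batted ball counts as hard hit at 95 mph or above."""
    return exit_velocity_mph >= HARD_HIT_MPH


def normal_logpdf_precision(y, mean, tau):
    """Gaussian log-density with precision tau, i.e. variance 1 / tau."""
    return 0.5 * math.log(tau) - 0.5 * math.log(2.0 * math.pi) - 0.5 * tau * (y - mean) ** 2


# Hypothetical exit velocities (mph) for four balls in play.
exit_velocities = [101.2, 88.4, 95.0, 76.9]
hard_hit_rate = sum(is_hard_hit(v) for v in exit_velocities) / len(exit_velocities)

# A precision of 0.001 corresponds to a standard deviation of about 31.6 mph,
# so the prior centred on 86.8 mph is only weakly informative.
prior_sd = 1.0 / math.sqrt(0.001)
log_density_at_mean = normal_logpdf_precision(86.8, 86.8, 1.0)
```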

Figure 6. Posterior label contrasts for contact quality outcomes. Unlike in Stage 2, the slider vs. sweeper contrast (orange) excludes zero on most outcomes, as does the curveball+knuckle-curve vs. sweeper contrast (blue), indicating partial predictive validity for the ST label.

Figure 7. Posterior predicted contact outcome probabilities for each pitch group after conditioning on movement and pitcher. Sweepers show credibly lower hard hit and ground ball rates and higher popup and fly ball rates than sliders or curveballs.

Figure 8. Posterior predicted exit velocity by pitch group after conditioning on movement and pitcher. Sweepers are associated with exit velocities approximately 1.7 mph lower than sliders and 2.3 mph lower than curveballs, a credible difference that persists after movement adjustment.

The divergence between Stage 2 and Stage 3 qualifies the discriminant validity result: a label can fail discriminant validity on pitcher-controlled process outcomes while retaining partial predictive validity on contact quality outcomes. A plausible mechanism is seam-shifted wake aerodynamics, wherein asymmetric seam placement relative to the spin axis generates a lateral force independent of the Magnus effect (Smith and Smith 2021; Yin et al. 2025). This mechanism could suppress hard contact at the extreme horizontal-break end of the movement spectrum even where gross movement coordinates are equivalent to those of conventional sliders.

Discussion

The three-stage construct validity assessment yields a coherent picture. Convergent validity holds within label groups: pitches bearing the same label cluster in movement space, and the biplot structure aligns with aerodynamic theory. Discriminant validity fails for the SL/ST distinction: after conditioning on movement via six PC scores and pitcher-level partial pooling, the slider label carries no incremental predictive information on whiff rate, chase rate, strike rate, called strike rate, or zone rate. The meaningful discriminant boundary in the data is one level up from where Statcast draws it, between the curveball/knuckle-curve family and the slider/sweeper family. Partial predictive validity is retained for the ST label on contact suppression outcomes, suggesting that the sweeper region of movement space captures a genuine contact suppression profile that is not fully explained by the movement coordinates Statcast records.

These findings have several implications. First, analytical frameworks that treat SL and ST as distinct inputs to models of pitcher performance are likely estimating noise on the pitcher-controlled outcome side. Conditioning on the movement profile directly, or using latent movement representations such as the PC scores derived here, is a more defensible analytical strategy. Second, pitch quality models such as Stuff+ benchmark pitches within label-defined buckets: fastballs, breaking balls, and offspeed (Langin 2021). Langin’s framework predates the sweeper label, but the SL/ST distinction falls within the same breaking ball bucket. If sliders and sweepers are functionally indistinguishable on pitcher-controlled outcomes, a pitcher whose pitch sits at the SL/ST boundary is being benchmarked against a potentially arbitrary comparison group depending on which label Statcast assigns. Third, the partial predictive validity retained for the ST label on contact outcomes warrants further investigation, particularly regarding the role of seam-shifted wake aerodynamics in generating contact suppression at the extreme horizontal-break end of the movement spectrum.

Limitations include the absence of batter handedness, pitch count, and fastball shape as covariates. Olubayode (2025) found handedness-dependent SL/ST differences that the current model cannot detect. The dynamic taxonomy, with the ST label emerging mid-study period, represents a methodological challenge for longitudinal research that the year-residualisation only partially addresses.

References

Bahill, A. T., and Baldwin, D. G. (2007). Describing baseball pitch movement with right-hand rules. Computers in Biology and Medicine, 37(7), 1001-1008. https://doi.org/10.1016/j.compbiomed.2006.06.007

Cronbach, L. J., and Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281-302. https://doi.org/10.1037/h0040957

Fast, M. (2009). What the heck is PITCHf/x? In The Hardball Times Baseball Annual 2010, eds. J. Distelheim, B. and B. Jacobs. ACTA Sports: Chicago, 153-158. https://baseball.physics.illinois.edu/FastPFXGuide.pdf

Langin, C. (2021). Pitch design: What is Stuff+? Quantifying pitches with pitch models. Driveline Baseball. https://www.drivelinebaseball.com/2021/12/what-is-stuff-quantifying-pitches-with-pitch-models/

Nathan, A. M. (2012). Determining pitch movement from PITCHf/x data. University of Illinois. http://baseball.physics.illinois.edu/Movement.pdf

Oksanen, J., et al. (2022). vegan: Community ecology package. R package version 2.6-4. https://CRAN.R-project.org/package=vegan

Olubayode, E. (2025). Bayesian analysis of strike likelihood in baseball: Evaluating the impact of pitch dynamics and batter characteristics. Master’s thesis, University of Oklahoma. https://www.proquest.com/openview/7e593c4fa60f7c16b5b366d296bc661b/1?pq-origsite=gscholar&cbl=18750&diss=y

Smith, A. W., and Smith, B. L. (2021). Using baseball seams to alter a pitch direction: The seam shifted wake. Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology, 235(1), 21-28. https://doi.org/10.1177/1754337120961609

Strauss, M. E., and Smith, G. T. (2009). Construct validity: Advances in theory and methodology. Annual Review of Clinical Psychology, 5, 1-25. https://doi.org/10.1146/annurev.clinpsy.032408.153639

Yin, Y., Aoki, T., Watanabe, S., and Kobayashi, H. (2025). Aerodynamics study on sweeper. Proceedings of the Institution of Mechanical Engineers, Part P: Journal of Sports Engineering and Technology. https://doi.org/10.1177/17543371251395349