MLB Strikeouts Project

Author

Alex Peinkofer

My Project

Has the rise in strikeouts across MLB been driven by a fundamental shift in how pitchers attack hitters? That is the question I set out to answer with this analysis. Over the past several years it has become hard to ignore that strikeouts are at an all-time high, and I wanted to dig into whether that trend could be explained by pitchers simply changing what they throw. Using pitch arsenal data from Baseball Savant, I looked at how pitch usage and effectiveness have shifted from 2019 to 2024, specifically whether pitchers have been moving away from fastballs and leaning more on breaking balls and offspeed pitches to generate swings and misses. Whiff rate (Whiff%) is the key metric here. It measures the percentage of swings that result in a miss, and it serves as the bridge between pitch type usage and the strikeouts we see on the scoreboard.
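Because Whiff% carries so much weight in this argument, it helps to pin down the arithmetic. The project itself was done in R; this is a minimal Python sketch, and the function name is hypothetical. The detail that matters is the denominator: whiffs are divided by swings, not by total pitches.

```python
def whiff_rate(whiffs: int, swings: int) -> float:
    """Whiff% = missed swings / total swings, expressed as a percentage."""
    if swings <= 0:
        raise ValueError("whiff rate is undefined without at least one swing")
    return 100.0 * whiffs / swings


# Hypothetical pitch: 120 swings, 36 of them missed.
print(whiff_rate(36, 120))  # 30.0
```

Dividing the same 36 whiffs by total pitches instead would give a swinging-strike rate, a different stat whose denominator also includes pitches the hitter never swung at.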

Importing Libraries and Data, and Data Wrangling

The Baseball Savant data required a few steps before it was ready to analyze. The pitch arsenal statistics were downloaded separately for each season from 2019 to 2024, so the first step was combining all six files into a single dataset, with a season column added to each so rows could be identified by year. The raw data uses abbreviated pitch type codes, such as FF for four-seam fastball and SL for slider, so these were grouped into five broader categories (Fastball, Cutter, Slider, Curveball, and Offspeed) to make trends easier to identify and visualize. Pitch types that were rare or inconsistently tracked across seasons were filtered out entirely. Numeric columns that had been read in as character strings were converted to numbers, and any rows with missing values in key columns were dropped before analysis began.
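The wrangling steps above can be sketched in Python as follows. This is not the author's R code: the column names (`pitch_type`, `whiff_percent`, `k_percent`) and every code-to-category assignment other than FF and SL are assumptions for illustration, and the function names are hypothetical.

```python
import csv
import io

# Hypothetical code-to-category mapping. FF and SL are named in the write-up;
# the remaining assignments are illustrative assumptions, not the author's list.
PITCH_CATEGORIES = {
    "FF": "Fastball", "SI": "Fastball",
    "FC": "Cutter",
    "SL": "Slider", "ST": "Slider",
    "CU": "Curveball", "KC": "Curveball",
    "CH": "Offspeed", "FS": "Offspeed",
}
# Assumed names for the key numeric columns.
KEY_COLUMNS = ("whiff_percent", "k_percent")


def load_season(csv_text: str, season: int) -> list[dict]:
    """Parse one season's CSV, tag rows with the season, categorize, and clean."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        category = PITCH_CATEGORIES.get(row.get("pitch_type"))
        if category is None:  # rare or unmapped pitch type: filter it out
            continue
        try:
            # Text -> number; a blank or missing key value drops the row.
            values = {col: float(row[col]) for col in KEY_COLUMNS}
        except (KeyError, TypeError, ValueError):
            continue
        rows.append({"season": season, "pitch_type": row["pitch_type"],
                     "category": category, **values})
    return rows


def combine_seasons(files: dict[int, str]) -> list[dict]:
    """Stack the cleaned rows from every season into one dataset."""
    combined = []
    for season in sorted(files):
        combined.extend(load_season(files[season], season))
    return combined


demo = {  # made-up CSV snippets
    2019: "pitch_type,whiff_percent,k_percent\nFF,21.5,22.0\nKN,30.1,25.0\n",
    2024: "pitch_type,whiff_percent,k_percent\nSL,33.0,\n",
}
print(combine_seasons(demo))  # only the 2019 FF row survives
```

The KN row is dropped because it has no category, and the 2024 SL row is dropped because its K% value is blank.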

MLB Pitch Arsenal Usage by Category (2019-2024)

This graph shows a decline in fastball usage and an increase in breaking balls, especially sliders. The shift suggests that pitchers believe they can get more outs by throwing more breaking and offspeed pitches rather than heavy doses of fastballs.
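One way the usage shares behind a chart like this could be computed is to pool pitch counts within each season. The write-up does not say exactly how usage was aggregated, so this Python sketch is an assumption throughout: each row is taken to hold a season, a category, and a raw pitch count (the `pitches` field name is hypothetical) for one pitcher's pitch type.

```python
from collections import defaultdict


def usage_share(rows: list[dict]) -> dict[tuple[int, str], float]:
    """Percent of each season's total pitches thrown in each category."""
    per_category = defaultdict(int)
    per_season = defaultdict(int)
    for r in rows:
        per_category[(r["season"], r["category"])] += r["pitches"]
        per_season[r["season"]] += r["pitches"]
    return {
        (season, category): 100.0 * count / per_season[season]
        for (season, category), count in per_category.items()
    }


rows = [  # made-up pitcher-level counts
    {"season": 2019, "category": "Fastball", "pitches": 600},
    {"season": 2019, "category": "Slider", "pitches": 400},
    {"season": 2024, "category": "Fastball", "pitches": 450},
    {"season": 2024, "category": "Slider", "pitches": 550},
]
print(usage_share(rows))
```

Pooling counts weights high-volume pitchers more heavily; averaging each pitcher's own usage percentage instead would give every pitcher equal weight, and the two approaches can produce different shares.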

Average Whiff% by Pitch Category (2019-2024)

This graph shows that whiff rate has gone steadily down, even though strikeouts have risen throughout the league. One possible explanation is that hitters are taking more strikes early in counts and fouling off more pitches, which would let pitchers build strikes without needing whiffs.
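A per-season average by pitch category is a grouped mean, and the same helper works for Whiff% or K%. This minimal Python sketch assumes rows that already carry `season`, `category`, and numeric metric fields; the function name and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def season_category_mean(rows: list[dict], metric: str) -> dict[tuple[int, str], float]:
    """Unweighted mean of `metric` for every (season, category) pair."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["season"], r["category"])].append(r[metric])
    return {key: mean(values) for key, values in sorted(groups.items())}


rows = [  # made-up pitcher-level values
    {"season": 2019, "category": "Fastball", "whiff_percent": 20.0, "k_percent": 21.0},
    {"season": 2019, "category": "Fastball", "whiff_percent": 24.0, "k_percent": 23.0},
    {"season": 2024, "category": "Slider", "whiff_percent": 33.0, "k_percent": 27.5},
]
print(season_category_mean(rows, "whiff_percent"))
print(season_category_mean(rows, "k_percent"))
```

Every row counts equally here. A swing-weighted alternative would sum whiffs and swings within each group before dividing, giving pitchers who induce more swings more influence on the average.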

Distribution of K% by Pitch Category (2019-2024)

This graph compares the distribution of strikeout rate (K%) across pitch categories. Since the previous graph showed that fastballs and cutters have low whiff rates, it makes sense that they also have the two lowest strikeout rates. That being said, the gap is not as wide as I would have expected.
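If the distribution chart is a boxplot (an assumption, since the chart type is not stated), its boxes are drawn from each category's quartiles. In Python, `statistics.quantiles` with `method="inclusive"` uses the same linear interpolation as R's default `quantile()` (type 7). The function name and field names below are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles


def k_percent_quartiles(rows: list[dict]) -> dict[str, tuple[float, float, float]]:
    """(Q1, median, Q3) of K% for each pitch category with at least two values."""
    groups = defaultdict(list)
    for r in rows:
        groups[r["category"]].append(r["k_percent"])
    return {
        category: tuple(quantiles(values, n=4, method="inclusive"))
        for category, values in groups.items()
        if len(values) >= 2  # a single value has no meaningful spread
    }


rows = [{"category": "Curveball", "k_percent": v} for v in (20.0, 25.0, 30.0, 35.0, 40.0)]
rows.append({"category": "Fastball", "k_percent": 22.0})  # made-up values
print(k_percent_quartiles(rows))
```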

Whiff% vs K% by Pitch Type (2019-2024)


This graph shows a strong positive relationship between whiff rate and strikeout rate. Pitches that generate more swings and misses generally produce more strikeouts, with sliders and curveballs tending to have the highest rates.

Average K% by Pitch Category per Season (2019-2024)

This graph compares average strikeout rates by pitch type. We can see that curveballs and sliders consistently produce the highest strikeout rates, while fastballs and cutters generate lower strikeout rates overall.

Secondary Data

To support this analysis, I also collected a secondary dataset by web scraping Baseball Reference’s standard pitching leaderboards using R. Using the polite and rvest packages, I scraped team-level pitching statistics across ten seasons from 2015 to 2024, recording metrics like SO9, ERA, FIP, and WHIP for every MLB team. This data serves as the concluding evidence in the analysis, showing that the pitch arsenal trends identified in the Savant data have translated into a real and measurable rise in strikeouts across the league over time.
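The polite package's approach (check robots.txt, then pause between requests) can be mirrored in Python with the standard library. This is an analogue, not the R code that was used: the URL pattern, the 5-second delay, and the `PoliteFetcher` class are assumptions, and the download function is passed in so the logic can be exercised offline.

```python
import time
import urllib.robotparser
from typing import Callable, Iterable

# Assumed URL pattern for Baseball Reference's season pitching pages.
URL_TEMPLATE = "https://www.baseball-reference.com/leagues/majors/{year}-standard-pitching.shtml"


class PoliteFetcher:
    """Hypothetical helper: honor robots.txt and wait between requests."""

    def __init__(self, robots_lines: Iterable[str], fetch: Callable[[str], str],
                 delay: float = 5.0, agent: str = "*"):
        self.robots = urllib.robotparser.RobotFileParser()
        self.robots.parse(robots_lines)
        self.robots.modified()  # mark the rules as loaded so can_fetch() uses them
        self.fetch = fetch
        self.delay = delay
        self.agent = agent
        self._last_request = None

    def get(self, url: str) -> str:
        if not self.robots.can_fetch(self.agent, url):
            raise PermissionError(f"robots.txt disallows {url}")
        if self._last_request is not None:
            remaining = self.delay - (time.monotonic() - self._last_request)
            if remaining > 0:
                time.sleep(remaining)  # rate-limit consecutive requests
        self._last_request = time.monotonic()
        return self.fetch(url)


def scrape_seasons(fetcher: PoliteFetcher, years: Iterable[int]) -> dict[int, str]:
    """Download one page per season, politely."""
    return {year: fetcher.get(URL_TEMPLATE.format(year=year)) for year in years}


# Offline usage with a stand-in fetch function and no delay.
fetcher = PoliteFetcher(["User-agent: *", "Disallow: /private/"],
                        fetch=lambda url: f"<html>{url}</html>", delay=0.0)
pages = scrape_seasons(fetcher, range(2015, 2025))
print(len(pages))  # 10
```

In real use, `fetch` would perform the HTTP request and the robots.txt lines would come from the site itself.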

Data Wrangling

The data needed a bit of cleanup before it was ready to analyze. Baseball Reference includes some extra rows in its tables, like repeated headers and a league average summary row, so those were removed first. Since rvest reads everything as text, all the stat columns had to be converted to numbers so R could actually work with them. K% and BB% were also calculated manually by dividing strikeout and walk totals by batters faced, since Baseball Reference doesn’t include those directly. Lastly, any rows with missing values in key columns were dropped to keep the analysis clean.
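The cleanup steps and the K%/BB% formulas can be sketched in Python as below. The column labels (`Tm`, `SO`, `BB`, `BF`, and so on) and the `League Average` row label are assumptions about the scraped table, the function name is hypothetical, and the sample numbers are made up.

```python
STAT_COLUMNS = ("SO9", "ERA", "FIP", "WHIP")


def clean_team_rows(raw_rows: list[dict], season: int) -> list[dict]:
    """Drop header/summary rows, convert text to numbers, and add K% and BB%."""
    cleaned = []
    for row in raw_rows:
        team = row.get("Tm", "")
        if team in ("", "Tm", "League Average"):  # repeated header or summary row
            continue
        try:
            so, bb, bf = (float(row[col]) for col in ("SO", "BB", "BF"))
            stats = {col: float(row[col]) for col in STAT_COLUMNS}
        except (KeyError, TypeError, ValueError):  # missing or non-numeric key value
            continue
        if bf <= 0:
            continue
        cleaned.append({"season": season, "Tm": team, **stats,
                        "K%": 100.0 * so / bf,    # strikeouts per batter faced
                        "BB%": 100.0 * bb / bf})  # walks per batter faced
    return cleaned


header = ("Tm", "SO", "BB", "BF", "SO9", "ERA", "FIP", "WHIP")
raw = [  # made-up rows as scraped text
    dict(zip(header, header)),  # repeated header row
    dict(zip(header, ("Team A", "1500", "500", "6000", "9.1", "3.90", "3.80", "1.20"))),
    dict(zip(header, ("League Average", "1450", "520", "6050", "8.7", "4.10", "4.05", "1.28"))),
]
print(clean_team_rows(raw, 2024))
```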

MLB Average Strikeouts Per 9 Innings (2015-2024)

This graph shows the average strikeouts per nine innings (SO9) for each season. SO9 skyrocketed from 2015 to 2020, then dipped slightly through 2024. The 2020 number is worth a closer look, since COVID-19 shortened that season to only 60 games instead of the usual 162. SO9 is a rate stat, so the shorter schedule does not inflate it on its own, but the much smaller sample may help explain why the 2020 value stands out.
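SO9 was scraped directly, but recomputing it shows what the rate measures: SO9 = 9 × SO / IP. One catch when working from Baseball Reference tables is that innings are written in box-score notation, where .1 and .2 mean one and two thirds of an inning, so "1433.2" is 1433 2/3 innings, not 1433.2. A minimal Python sketch with hypothetical function names:

```python
def innings_from_notation(ip: str) -> float:
    """Convert innings notation like "1433.2" (1433 and 2/3 innings) to a float."""
    whole, _, thirds = ip.partition(".")
    outs = int(thirds) if thirds else 0
    if outs not in (0, 1, 2):
        raise ValueError(f"unexpected innings value: {ip!r}")
    return int(whole) + outs / 3


def so9(strikeouts: int, ip: str) -> float:
    """Strikeouts per nine innings."""
    return 9 * strikeouts / innings_from_notation(ip)


print(round(so9(1500, "1433.2"), 2))  # made-up team totals
```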

Conclusion

So what does all of this tell us? The data makes a pretty compelling case that the rise in strikeouts across MLB is not a coincidence; it is the direct result of pitchers fundamentally changing how they attack hitters. Fastball usage has steadily declined while breaking balls and offspeed pitches have taken up a larger share of pitch arsenals, and those pitch types consistently generate more swings and misses than fastballs do. Whiff rate proved to be a strong predictor of strikeout rate, meaning that as pitchers throw more pitches that hitters can’t make contact with, strikeouts naturally follow. The Baseball Reference data ties it all together, showing that team-level SO9 rose sharply from 2015 to 2020 and has dipped only slightly since. What started as a hunch turned out to be backed up pretty clearly by the numbers. Pitchers have figured out that breaking balls get outs, and until hitters find an answer, strikeouts are probably not going anywhere.