For a little background on this exploration: my favorite sport to watch is baseball, and more specifically, MLB. I am an avid St. Louis Cardinals fan and was disappointed when they lost in the wild card round this year. Luckily for me, MLB produces tons of data each game, and I was able to capture some of it and explore it.
This data set comes from Baseball Savant and was trimmed down to show only the game played on September 23, 2022, when the St. Louis Cardinals (STL) faced the Los Angeles Dodgers (LAD). I chose this game because Albert Pujols hit historic career home runs 699 and 700 within the first 4 innings. The Cardinals ended up winning this game 11-0.
The data is recorded at a pitch-by-pitch level and includes detailed Statcast information. Some of the fields explored below are pitch type, release speed, spin rate, the outcome of each pitch, and Delta Run Expectancy.
To learn more about the data set and where it comes from, see the Baseball Savant documentation: https://baseballsavant.mlb.com/csv-docs
One other important note: throughout this data, we will see a pitch type labeled “Other”. These are pitches that Statcast was unable to fit into a given category, whether due to an abnormal spin direction, speed, grip, or some other metric.
One of the first things that I was interested in exploring was the
overall relationship between release speed and pitch type. We often hear
that off-speed pitches are slower, but I wanted to see just how drastic
that difference was in this specific game.
Not surprisingly, the fastball and sinker had the highest release
speeds. These are pitches that fly rather straight, at high speeds,
and with comparatively little movement and spin.
However, it became interesting when looking at the difference between the slider and the changeup. Both of these pitches are thrown in the mid-80s, with the slider showing more variation, but their averages and medians are close. On film, the difference is in the movement: the slider breaks horizontally as it ‘slides’ across the plate, while the changeup simply drops as it reaches home plate.
This game was also interesting because some eephus pitches were thrown (if you have never seen one, I highly suggest you look it up). They are high, arcing pitches with very little speed, and the data backs this up. They hovered around 55 mph, which is a 40 mph difference compared with the 4-seam fastball. This is a big drop for hitters and can really cause some havoc. But because of their low speed, these pitches are not typically used unless position players (non-pitchers) are pitching in a given game.
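To make the mean and median comparison above concrete, here is a minimal sketch of how release speed could be summarized by pitch type. It assumes the rows carry the `pitch_name` and `release_speed` columns from the Baseball Savant CSV; the function name `summarize_speed` is hypothetical, and the sample rows are made up for illustration and are not values from this game.

```python
from collections import defaultdict
from statistics import mean, median

# Made-up sample rows for illustration only; these are NOT values from the game.
sample_pitches = [
    {"pitch_name": "4-Seam Fastball", "release_speed": 94.0},
    {"pitch_name": "4-Seam Fastball", "release_speed": 95.0},
    {"pitch_name": "4-Seam Fastball", "release_speed": 96.0},
    {"pitch_name": "Slider", "release_speed": 82.0},
    {"pitch_name": "Slider", "release_speed": 86.0},
    {"pitch_name": "Slider", "release_speed": 90.0},
    {"pitch_name": "Changeup", "release_speed": 85.0},
    {"pitch_name": "Changeup", "release_speed": 87.0},
    {"pitch_name": "Eephus", "release_speed": 55.0},
    {"pitch_name": "Other", "release_speed": None},  # missing reading
]


def summarize_speed(rows):
    """Return {pitch_name: {"mean", "median", "n"}} for release_speed in mph."""
    speeds_by_pitch = defaultdict(list)
    for row in rows:
        speed = row["release_speed"]
        if speed is not None:  # skip pitches with no recorded speed
            speeds_by_pitch[row["pitch_name"]].append(speed)
    return {
        name: {"mean": mean(speeds), "median": median(speeds), "n": len(speeds)}
        for name, speeds in speeds_by_pitch.items()
    }


summary = summarize_speed(sample_pitches)
# List pitch types from fastest to slowest average release speed.
for name, stats in sorted(summary.items(), key=lambda kv: kv[1]["mean"], reverse=True):
    print(f"{name:<16} mean={stats['mean']:.1f} median={stats['median']:.1f} n={stats['n']}")
```

In this made-up sample the slider and changeup share the same mean even though the slider is more spread out, which is the same kind of pattern described above.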
As I briefly noted above, there can be a strong relationship between
release speed, spin rate, and pitch type, and I wanted to explore these
trends.
Coloring this data by pitch type, we begin to see an interesting story
emerge. The 4-seam fastball has high speed and a middle-of-the-range spin
rate.
As pitches get slower and take on a different rotation pattern, we see a different picture as well. I think the curveball is particularly interesting to look at. We see two distinct groupings of curveballs: one around 78 mph and 2250 rpm, and the other also around 78 mph but at 3200 rpm. What this graph may not show is that those two groupings were thrown by two different pitchers. Pitchers in MLB have different strengths and weaknesses, and they often play to those. The lower-rpm grouping may belong to a pitcher who doesn’t have as much movement on his curveball but has other pitches to back it up.
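One way to check whether the two curveball groupings really split by pitcher is to average speed and spin per pitcher for that one pitch type. The sketch below assumes `player_name` holds the pitcher's name (as in a pitcher-side Baseball Savant export) alongside `release_speed` and `release_spin_rate`. The pitcher names, the function name `spin_by_pitcher`, and all sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only; pitcher names are hypothetical.
sample_pitches = [
    {"player_name": "Pitcher A", "pitch_name": "Curveball",
     "release_speed": 78.0, "release_spin_rate": 2200},
    {"player_name": "Pitcher A", "pitch_name": "Curveball",
     "release_speed": 79.0, "release_spin_rate": 2300},
    {"player_name": "Pitcher B", "pitch_name": "Curveball",
     "release_speed": 77.0, "release_spin_rate": 3150},
    {"player_name": "Pitcher B", "pitch_name": "Curveball",
     "release_speed": 79.0, "release_spin_rate": 3250},
    {"player_name": "Pitcher B", "pitch_name": "4-Seam Fastball",
     "release_speed": 95.0, "release_spin_rate": 2400},
]


def spin_by_pitcher(rows, pitch_name):
    """Average speed (mph) and spin (rpm) per pitcher for one pitch type."""
    grouped = defaultdict(list)
    for row in rows:
        if row["pitch_name"] == pitch_name:
            grouped[row["player_name"]].append(row)
    return {
        pitcher: {
            "mean_speed": mean(r["release_speed"] for r in group),
            "mean_spin": mean(r["release_spin_rate"] for r in group),
            "n": len(group),
        }
        for pitcher, group in grouped.items()
    }


curveballs = spin_by_pitcher(sample_pitches, "Curveball")
for pitcher, stats in curveballs.items():
    print(f"{pitcher}: {stats['mean_speed']:.1f} mph, "
          f"{stats['mean_spin']:.0f} rpm over {stats['n']} pitches")
```

If the per-pitcher averages land near the two clusters seen in the scatter plot, that supports reading the groupings as two different pitchers rather than two versions of the pitch from one arm.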
One of the last pitching topics that I was interested in exploring
was the pitch distribution by team. Teams often face each other with a
game plan, knowing which pitches a given batter is weak against and which
pitches in a pitcher’s arsenal are more likely to get an out.
There are some pretty clear differences in pitch distribution between the
teams. The Dodgers (LAD) relied primarily on the 4-seam fastball, throwing
it more than twice as often as their next most common pitch, the slider.
The Cardinals (STL) had a more even distribution between the 4-seam
fastball and the curveball.
The Cardinals (STL) were also the only team to throw eephus pitches, while the Dodgers were the only team to throw the knuckle curve.
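For anyone who wants to reproduce a distribution like this, here is a rough sketch of counting pitch types per pitching team. The Statcast CSV does not name the pitching team directly, so this assumes it can be derived from `inning_topbot`, `home_team`, and `away_team`, with the home team pitching in the top half of each inning. The helper names and the sample rows are hypothetical and are not the real pitch counts.

```python
from collections import Counter, defaultdict


def make_rows(half, pitch_names):
    """Build made-up rows for one half-inning type (illustration only)."""
    return [
        {"inning_topbot": half, "home_team": "LAD", "away_team": "STL", "pitch_name": p}
        for p in pitch_names
    ]


# Made-up sample: NOT the real pitch counts from the game.
sample_pitches = (
    make_rows("Top", ["4-Seam Fastball", "4-Seam Fastball", "Slider", "Knuckle Curve"])
    + make_rows("Bot", ["4-Seam Fastball", "4-Seam Fastball", "Curveball", "Curveball", "Eephus"])
)


def pitching_team(row):
    """Assumption: the home team is in the field (pitching) in the top of an inning."""
    return row["home_team"] if row["inning_topbot"] == "Top" else row["away_team"]


def pitch_shares(rows):
    """Return {team: {pitch_name: share of that team's pitches}}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[pitching_team(row)][row["pitch_name"]] += 1
    shares = {}
    for team, counter in counts.items():
        total = sum(counter.values())
        shares[team] = {pitch: n / total for pitch, n in counter.most_common()}
    return shares


shares = pitch_shares(sample_pitches)
for team, distribution in shares.items():
    print(team, {pitch: f"{share:.0%}" for pitch, share in distribution.items()})
```

Working in shares rather than raw counts keeps the comparison fair when one team throws more pitches than the other.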
Up to this point, we have been looking primarily at the relationship
between pitch type, speed, and spin rate. But all of these come down to
the main factor that drives the game of baseball: the outcome of the
pitch. The pitcher throws the ball toward the batter in hopes that
the batter will get himself out in one way or another. I was
curious whether spin rate and release speed were correlated with the
outcome of the pitch.
Overall, it is hard to see any specific trend here. The outcomes are scattered widely enough, both horizontally (release speed) and vertically (spin rate), that no particular trend can be determined.
We can pinpoint small trends. For example, a pitch with a low release speed and a low spin rate is rarely a ‘called strike’ and is more likely to be called a ball. Pitches that were swung at (foul, hit_into_play, swinging_strike, foul_tip, swinging_strike_blocked) tend to have a higher release speed and a higher spin rate.
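A simple way to put numbers on that last observation is to split pitches into "swung at" and "taken" and compare the averages. This sketch assumes the Statcast `description`, `release_speed`, and `release_spin_rate` columns and uses the swing outcomes listed above. The function name `compare_swings` and the sample rows are hypothetical, so the output says nothing about the real game.

```python
from statistics import mean

# Outcomes listed above as pitches that were swung at.
SWING_DESCRIPTIONS = {
    "foul", "hit_into_play", "swinging_strike", "foul_tip", "swinging_strike_blocked",
}

# Made-up sample rows for illustration only; these are NOT values from the game.
sample_pitches = [
    {"description": "ball", "release_speed": 80.0, "release_spin_rate": 2000},
    {"description": "called_strike", "release_speed": 90.0, "release_spin_rate": 2300},
    {"description": "foul", "release_speed": 94.0, "release_spin_rate": 2400},
    {"description": "swinging_strike", "release_speed": 86.0, "release_spin_rate": 2600},
    {"description": "hit_into_play", "release_speed": 93.0, "release_spin_rate": 2350},
]


def compare_swings(rows):
    """Average speed (mph) and spin (rpm) for swung-at versus taken pitches."""
    groups = {"swung_at": [], "taken": []}
    for row in rows:
        key = "swung_at" if row["description"] in SWING_DESCRIPTIONS else "taken"
        groups[key].append(row)
    return {
        key: {
            "mean_speed": mean(r["release_speed"] for r in group),
            "mean_spin": mean(r["release_spin_rate"] for r in group),
            "n": len(group),
        }
        for key, group in groups.items()
        if group  # skip a group with no pitches to avoid averaging nothing
    }


comparison = compare_swings(sample_pitches)
for key, stats in comparison.items():
    print(f"{key}: {stats['mean_speed']:.1f} mph, {stats['mean_spin']:.0f} rpm (n={stats['n']})")
```

Averages like these hide the scatter noted above, so they are best read alongside the plot rather than in place of it.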
One of the advanced metrics tracked by Statcast is the Delta Run Expectancy. According to the Baseball Savant documentation linked above, this metric tracks “The change in Run Expectancy before the Pitch and after the Pitch”. To explain that another way, it is how many expected runs were gained or eliminated by throwing that pitch. For instance, if the Delta Run Expectancy is -1, that pitch lowered the expected number of runs by one.
This graph shows the average Delta Run Expectancy for each
pitch type, for each team.
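The averages behind a chart like this can be sketched as a group-by on team and pitch type. The code below assumes the Statcast `delta_run_exp` and `pitch_name` columns plus a `pitching_team` column, which is hypothetical and would have to be derived first because it is not part of the Baseball Savant CSV. The function names and sample values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only; these are NOT values from the game.
# "pitching_team" is a hypothetical, pre-derived column.
sample_pitches = [
    {"pitching_team": "LAD", "pitch_name": "Slider", "delta_run_exp": 0.30},
    {"pitching_team": "LAD", "pitch_name": "Slider", "delta_run_exp": -0.10},
    {"pitching_team": "LAD", "pitch_name": "Changeup", "delta_run_exp": -0.05},
    {"pitching_team": "STL", "pitch_name": "Curveball", "delta_run_exp": -0.04},
    {"pitching_team": "STL", "pitch_name": "Curveball", "delta_run_exp": 0.02},
    {"pitching_team": "STL", "pitch_name": "Eephus", "delta_run_exp": -0.25},
]


def mean_delta_run_exp(rows):
    """Return {(team, pitch_name): average delta_run_exp}, rounded to 3 decimals."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["pitching_team"], row["pitch_name"])].append(row["delta_run_exp"])
    return {key: round(mean(values), 3) for key, values in grouped.items()}


def best_pitch(averages, team):
    """Pitch type with the lowest (most run-saving) average for one team."""
    team_averages = {pitch: avg for (t, pitch), avg in averages.items() if t == team}
    return min(team_averages, key=team_averages.get)


averages = mean_delta_run_exp(sample_pitches)
for (team, pitch), avg in sorted(averages.items()):
    print(f"{team} {pitch:<10} {avg:+.3f}")
print("Best STL pitch in this sample:", best_pitch(averages, "STL"))
```

Negative averages favor the pitching team, so the lowest value marks the pitch that removed the most expected runs per pitch thrown.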
For the Dodgers (LAD), we see that the changeup was the most effective pitch at eliminating runs overall, while the slider gave up the most, at over 0.1 runs created on average.
On the other side, the Cardinals (STL) had a very effective eephus pitch, with an average below -0.2 runs created, while the ‘Other’ category had a high average Delta Run Expectancy of around 0.15 runs created.
If we compare this with the pitch distribution by team, we see that the pitches the Dodgers relied on most, the 4-seam fastball and slider, ended up creating the most runs for the Cardinals overall. For the Cardinals, the top two pitches, the 4-seam fastball and curveball, both ended up preventing runs.
And that turned out to be the difference in the game. The Cardinals won 11-0 after getting 12 hits, compared to the Dodgers’ 7 hits.