In its bare-bones definition, deception is the act of fooling someone. When a person is deceived, their decision-making is typically reversed, and what they expect to happen doesn't happen. Applied to baseball, I built my DeceptionScore metric on two pillars: funk (derived from a unique release) and swing decisions. At the most basic level, a hitter should swing at pitches in the zone and take pitches out of the zone, and I built DeceptionScore around that logic.
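library(dplyr)        # %>%, mutate(), if_else()
library(tidyr)        # drop_na()
library(fastDummies)  # dummy_cols()
df <- read.csv("savantPitching2022.csv")  # 2022 pitch-level data from Baseball Savant
# Keep only the columns needed to quantify deception, then drop rows with missing values
df <- df %>% subset(select = c(pitcher, player_name, p_throws, pitch_type, release_speed, effective_speed, release_pos_x, release_pos_z, release_extension, zone, description))
df <- df %>% drop_na()
# One 0/1 column per outcome; strip the prefix so description_ball becomes ball
df <- dummy_cols(df, select_columns = "description")
names(df) <- sub("description_", "", names(df))
# Flag in-zone pitches (Statcast zones 1-9), swings (foul tips included), whiffs, and balls
df <- df %>%
  mutate(inZone = if_else(zone < 10, 1, 0),
         swing = if_else(bunt_foul_tip == 1 | foul == 1 | foul_bunt == 1 | foul_tip == 1 | hit_into_play == 1 | missed_bunt == 1 | swinging_strike == 1 | swinging_strike_blocked == 1, 1, 0),
         whiff = if_else(swinging_strike == 1 | swinging_strike_blocked == 1 | missed_bunt == 1, 1, 0),
         ball = if_else(ball == 1 | blocked_ball == 1 | pitchout == 1 | hit_by_pitch == 1, 1, 0))
# Drop the raw outcome dummies (and zone) now that the flags exist
df <- df %>% subset(select = -c(swinging_strike, swinging_strike_blocked, missed_bunt, foul_bunt, foul_tip, bunt_foul_tip, pitchout, blocked_ball, hit_by_pitch, foul, hit_into_play, zone))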
I used pitch-level MLB data from the 2022 season, scraped from Baseball Savant. To reduce the dataset's dimensionality, I kept only the columns I thought were needed to quantify deception, then dropped every row with missing values to avoid bad data. The dataset still needed more work, so I turned each “description” value into its own dummy variable, which makes it easy to tell which event occurred on each pitch.
The dummy variables let me mutate the dataset further, adding columns that flag whether a pitch was in the strike zone, a swing, a whiff, or a ball. These columns add detail to the pitch information and set the stage for quantifying deception.
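The event groupings behind these flags can be checked on a few made-up pitches. This is a base-R sketch, not the author's code: the five pitches and the vector names such as `swing_events` are hypothetical. It mirrors the in-zone, swing, whiff, and ball flags from the preparation code and counts foul_tip as a swing, since a foul tip requires the batter to swing.

```r
# Hypothetical pitches (not real 2022 data): Statcast zone plus description
pitches <- data.frame(
  zone = c(5, 13, 2, 14, 11),
  description = c("called_strike", "swinging_strike", "foul", "ball", "hit_into_play"),
  stringsAsFactors = FALSE
)

# Description values grouped into each event (hypothetical helper vectors)
swing_events <- c("bunt_foul_tip", "foul", "foul_bunt", "foul_tip", "hit_into_play",
                  "missed_bunt", "swinging_strike", "swinging_strike_blocked")
whiff_events <- c("swinging_strike", "swinging_strike_blocked", "missed_bunt")
ball_events  <- c("ball", "blocked_ball", "pitchout", "hit_by_pitch")

# Statcast zones 1-9 are inside the strike zone; 11-14 are outside it
pitches$inZone <- as.integer(pitches$zone < 10)
pitches$swing  <- as.integer(pitches$description %in% swing_events)
pitches$whiff  <- as.integer(pitches$description %in% whiff_events)
pitches$ball   <- as.integer(pitches$description %in% ball_events)

print(pitches)
```

Using `%in%` against a lookup vector keeps each event definition in one place, which makes it easier to audit than a long chain of `==` comparisons.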
Finally, I dropped all the unneeded columns before building out my deception metric to make my dataset more space efficient.
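# League-wide averages over every pitch (Statcast release measurements are in feet)
avgRelHeight <- mean(df$release_pos_z)
avgRelSide <- mean(abs(df$release_pos_x))  # abs() puts left- and right-handers on the same side
avgExtension <- mean(df$release_extension)
# releaseRating: squared deviation from average release side and height, plus twice the extension above (or below) average
# trick: a called strike, or a swing at a pitch outside the zone
df <- df %>%
  mutate(releaseRating = (abs(release_pos_x) - avgRelSide)^2 + (release_pos_z - avgRelHeight)^2 +
           (release_extension - avgExtension) * 2,
         trick = if_else(called_strike == 1 | (inZone == 0 & swing == 1), 1, 0))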
The first pillar of DeceptionScore is releaseRating, which measures how unusual a pitcher's release is compared to the big-league average release height, release side, and extension. Because the side and height deviations are squared, a release point that strays far from the league average in either direction raises the rating. Extension matters too: it plays a significant role in increasing a pitch's effective velocity, the velocity the hitter actually perceives. I give value to higher extension because it deceives the hitter about the ball's actual velocity and true traits, making the pitch play up a notch.
Formula: (abs(ReleaseSide) - AvgReleaseSide) ^ 2 + (ReleaseHeight - AvgReleaseHeight) ^ 2 + ((Extension - AvgExtension) * 2), where AvgReleaseSide is the league average of abs(ReleaseSide).
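A worked example helps show how the terms trade off. The sketch below reimplements the releaseRating calculation from the R code above (which adds the extension term) for two hypothetical pitchers. The league averages, release values, and the function name `release_rating` are made-up assumptions in feet, not figures from the 2022 dataset.

```r
# Hypothetical league averages in feet; the article computes these with mean() over all 2022 pitches
avg_side   <- 2.0
avg_height <- 5.8
avg_ext    <- 6.3

# Hypothetical helper matching the article's releaseRating formula
release_rating <- function(side, height, extension) {
  (abs(side) - avg_side)^2 +       # squared deviation in release side (abs() ignores handedness)
    (height - avg_height)^2 +      # squared deviation in release height
    (extension - avg_ext) * 2      # linear term: rewards above-average extension
}

# Two hypothetical pitchers: a conventional lefty and a low, wide sidearmer
over_the_top <- release_rating(side = -1.5, height = 6.3, extension = 6.3)
sidearm      <- release_rating(side = 3.5, height = 3.8, extension = 6.8)

print(c(over_the_top = over_the_top, sidearm = sidearm))
```

The conventional release lands at about 0.5, while the sidearm release scores about 7.25, mostly because its height sits two feet below average and that gap is squared.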
The second pillar of DeceptionScore is the Trick. A trick is an event where the batter's swing decision differs from what the ball's plane and location through the strike zone call for. A trick can be one of two events. The first is a called strike because, in principle, hitters should swing at pitches in the strike zone. The second is a swing at a pitch outside the strike zone because, in principle, hitters shouldn't swing at those pitches.
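The two trick conditions reduce to a single vectorized rule. This base-R sketch uses a hypothetical function name, `is_trick`, and made-up pitches; it applies the same rule as the trick column in the code above: a called strike, or a swing at a pitch outside the zone (Statcast zones 11 to 14).

```r
# Hypothetical helper: 1 if the pitch is a trick, 0 otherwise
is_trick <- function(description, zone, swung) {
  called_strike <- description == "called_strike"  # hittable pitch taken
  chase <- zone >= 10 & swung                       # swing at a pitch outside zones 1-9
  as.integer(called_strike | chase)
}

# Hypothetical pitches; swung is TRUE when the batter offered at the pitch
desc  <- c("called_strike", "swinging_strike", "ball", "foul", "swinging_strike")
zone  <- c(4, 13, 12, 5, 6)
swung <- c(FALSE, TRUE, FALSE, TRUE, TRUE)

tricks <- is_trick(desc, zone, swung)
print(tricks)
print(mean(tricks))  # share of these pitches that were tricks
```

Only the taken strike and the chase out of the zone count; the in-zone whiff on the last pitch is not a trick under this definition, because swinging at a strike is the "correct" decision even when the batter misses.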
# Per-pitcher summary; TrickRate is the share of a pitcher's pitches that were tricks
trickTable <- df %>%
  group_by(player_name) %>%
  summarise(Pitches = n(),
            ReleaseScore = round(mean(releaseRating), 2),
            TrickRate = round(mean(trick), 2),
            DeceptionScore = ReleaseScore + (100 * TrickRate)) %>%
  filter(Pitches > 100) %>%  # minimum-sample cutoff
  arrange(desc(DeceptionScore))
# Interactive, filterable table (requires the DT package)
DT::datatable(trickTable, filter = 'top', options = list(pageLength = 10, autoWidth = TRUE))
To calculate DeceptionScore, I multiplied TrickRate, a proportion between 0 and 1, by 100 to bring it onto a scale comparable to ReleaseScore, then added the two.
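To see the arithmetic end to end, the sketch below repeats the per-pitcher summary in base R on six hypothetical pitches from two made-up pitchers, "A" and "B". The releaseRating and trick values are invented, and a hypothetical minimum of 3 pitches stands in for the article's cutoff of 100 so the filter has something to remove.

```r
# Hypothetical per-pitch values (not real 2022 data)
pitches <- data.frame(
  player_name   = c("A", "A", "A", "A", "B", "B"),
  releaseRating = c(2, 3, 4, 3, 1, 2),
  trick         = c(1, 0, 0, 1, 0, 1)
)

# Per-pitcher summaries, mirroring the summarise() call in the article
groups <- split(pitches, pitches$player_name)
summary_df <- data.frame(
  player_name  = names(groups),
  Pitches      = vapply(groups, nrow, integer(1)),
  ReleaseScore = vapply(groups, function(g) round(mean(g$releaseRating), 2), numeric(1)),
  TrickRate    = vapply(groups, function(g) round(mean(g$trick), 2), numeric(1))
)

# TrickRate is a proportion in [0, 1]; * 100 turns it into a percentage
summary_df$DeceptionScore <- summary_df$ReleaseScore + 100 * summary_df$TrickRate

# Hypothetical minimum pitch count, then highest score first
min_pitches <- 3
ranked <- summary_df[summary_df$Pitches > min_pitches, ]
ranked <- ranked[order(-ranked$DeceptionScore), ]
print(ranked)
```

Pitcher A averages a ReleaseScore of 3 and a TrickRate of 0.5, for a DeceptionScore of 3 + 100 * 0.5 = 53; pitcher B would score 51.5 but is removed by the pitch-count filter. Because both components are rounded before they are combined, DeceptionScore inherits that rounding, just as in the dplyr version.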
These two pillars work together to build DeceptionScore, as each addresses an aspect of what I perceive to be deceptive in a pitcher: a funky release and foolish swing decisions.
The top pitchers by DeceptionScore (more than 100 pitches) are Hoby Milner, Tyler Rogers, Danny Young, Joely Rodríguez, and Aaron Loup. Watching any of them, one can intuitively tell they are funky and deceptive; DeceptionScore now puts a number on that funk on the bump.
DeceptionScore offers a basic way to assign a value to a pitcher's deception. Ideally, it could be extended to account for the timing of a batter's swing decision. However, that is only feasible with Hawk-Eye technology and cameras that identify when a batter can first see the ball out of the pitcher's hand. All in all, DeceptionScore appears to line up with the eye test when watching the top-rated pitchers.