In pro scouting, it is very important to evaluate a player’s struggles. It is important to understand why a player struggles and how he responds to his struggles.
The purpose of this program is to flag players who are struggling and help determine why they are struggling. From a player development perspective, staff can work with struggling players to get them back on track. From a pro scouting perspective, this can be used to evaluate players through struggles.
In reality, this program would flag players across all organizations and levels. However, for demonstration purposes, I will use 2023 MLB data for three random Orioles opponents: the Mets, the Guardians, and the Mariners.
#library(baseballr)
#library(tidyverse)
#library(lubridate)
# Load 2023 Statcast data
load("G:/My Drive/Baseball/R Projects/Data/Statcast2023.RData")
opponents <- c('NYM', 'CLE', 'SEA')
# filter to relevant hitters
team_data <- Statcast2023 %>%
filter((inning_topbot == 'Top' & away_team %in% opponents) | (inning_topbot == 'Bot' & home_team %in% opponents))
The get_slump_dates() function flags dates in which a player can be considered to be in a slump. The specifications for a slump can be changed, but the default specifications are as follows.
get_slump_dates <- function(player, length = 20) {
# Create player-specific data frame
player_data <- team_data %>%
filter(player_name == player)
# Calculate statistics to date for each date in the player data
# Then, calculate the last length days data at each date
over_time <- data.frame()
dates <- unique(player_data$game_date) # list of dates
for (d in dates) { # loop through all dates
# data to date
data_to_date <- filter(player_data, game_type == 'R' & game_date <= d)
############# Set up stat calculations ###############
# Define count labels
walk_count <- sum(data_to_date$events == "walk")
strikeout_count <- sum(data_to_date$events == "strikeout")
hit_by_pitch_count <- sum(data_to_date$events == "hit_by_pitch")
bip_count <- (sum(data_to_date$bb_type == "line_drive"))+
(sum(data_to_date$bb_type == "ground_ball"))+
(sum(data_to_date$bb_type == "fly_ball"))+
(sum(data_to_date$bb_type == "popup"))
home_run_count <- sum(data_to_date$events == 'home_run')
# Define vectors for events
hits_vector <- c('single', 'double', 'triple', 'home_run')
at_bats_vector <- c('field_out', 'strikeout', 'grounded_into_double_play', 'fielders_choice', 'fielders_choice_out', 'triple_play', 'strikeout_double_play', 'double_play', 'field_error', 'force_out', hits_vector)
plate_appearances_vector <- c(at_bats_vector, 'sac_fly', 'walk', 'hit_by_pitch', 'sac_bunt', 'sac_bunt_double_play', 'sac_fly_double_play', 'catcher_interf')
swings_vector <- c('hit_into_play', 'swinging_strike_blocked', 'swinging_strike', 'foul_tip', 'foul')
################# Calculate stats to date #################
# Calculate plate appearances (PA)
PA <- sum(data_to_date$events %in% plate_appearances_vector)
# Calculate BABIP
babip <- (sum(data_to_date$events %in% hits_vector) -
home_run_count) / bip_count
# Calculate wOBA
woba_sum <- sum(data_to_date$woba_value, na.rm = TRUE)
woba_denominator <- sum(data_to_date$events %in% at_bats_vector) +
walk_count +
hit_by_pitch_count +
sum(data_to_date$events == 'sac_fly')
woba <- woba_sum / woba_denominator
################ Calculate the same stats for ##################
################ each date over specified period of time #######
################ e.g. last 20 days #############################
# Create a vector of dates to use for the length day window
last_length_data <- player_data %>%
filter(game_type == 'R' & game_date <= d & game_date >= (d - length))
# Define count labels
walk_count <- sum(last_length_data$events == "walk")
strikeout_count <- sum(last_length_data$events == "strikeout")
hit_by_pitch_count <- sum(last_length_data$events == "hit_by_pitch")
bip_count <- (sum(last_length_data$bb_type == "line_drive"))+
(sum(last_length_data$bb_type == "ground_ball"))+
(sum(last_length_data$bb_type == "fly_ball"))+
(sum(last_length_data$bb_type == "popup"))
home_run_count <- sum(last_length_data$events == 'home_run')
# Calculate plate appearances (PA)
PA_length <- sum(last_length_data$events %in% plate_appearances_vector)
# Calculate BABIP
babip_length <- (sum(last_length_data$events %in% hits_vector) -
home_run_count) / bip_count
# Calculate wOBA
woba_sum <- sum(last_length_data$woba_value, na.rm = TRUE)
woba_denominator <- sum(last_length_data$events %in% at_bats_vector) +
walk_count +
hit_by_pitch_count +
sum(last_length_data$events == 'sac_fly')
woba_length <- woba_sum / woba_denominator
# Create a row for the current player in over_time
new_row <- data.frame(Date = as.Date(d, origin = "1970-01-01"),
Name = player,
PA = PA,
wOBA = woba,
BABIP = babip,
PA_length = PA_length,
wOBA_length = woba_length,
BABIP_length = babip_length)
over_time <- rbind(over_time, new_row)
}
# add a column to identify slumps
# Define a slump as a row where the wOBA_length is less than 80% of total wOBA
# and the BABIP_length is NOT less than 80% of BABIP (i.e. not bad luck)
over_time <- over_time %>%
mutate(slump = ifelse(wOBA_length < wOBA * 0.8 & BABIP_length > BABIP * 0.8, 1, 0))
slump_dates <- filter(over_time, slump == 1)$Date
return(slump_dates)
}
Apply to an example strategy: Francisco Lindor
The resulting list shows the dates when Lindor would be considered to be in a slump.
lindor_dates <- get_slump_dates('Lindor, Francisco', length = 20)
The next step is to explore the data for Lindor’s September slump and look for possible explanations. Compare his tendencies over the slump period with his tendencies over the entire season.
This analysis is meant to describe approach, rather than production. Many metrics can be used, but for demonstration purposes, I have selected the following:
In further development, the analysis of slumping players can be vastly expanded. It can include anything from plate coverage visualizations to spray charts to pitch type heat maps.
# Create function
slump_comps <- function(player, date) {
slump_date <- as.Date(date)
player_data <- filter(team_data, player_name == player)
# slump_data is the slump date and up to 20 days prior
slump_data <- filter(player_data, game_type == 'R' & game_date <= slump_date & game_date >= (slump_date - 20))
# Define vectors for events
hits_vector <- c('single', 'double', 'triple', 'home_run')
at_bats_vector <- c('field_out', 'strikeout', 'grounded_into_double_play', 'fielders_choice', 'fielders_choice_out', 'triple_play', 'strikeout_double_play', 'double_play', 'field_error', 'force_out', hits_vector)
plate_appearances_vector <- c(at_bats_vector, 'sac_fly', 'walk', 'hit_by_pitch', 'sac_bunt', 'sac_bunt_double_play', 'sac_fly_double_play', 'catcher_interf')
swings_vector <- c('hit_into_play', 'swinging_strike_blocked', 'swinging_strike', 'foul_tip', 'foul')
##################### Calculate metrics for slump to #######################
##################### identify potential explanations. #####################
# Calculate pitches per PA
pitches_per_pa_slump <- nrow(slump_data) / sum(slump_data$events %in% plate_appearances_vector)
# Calculate chase rate on breaking balls
balls <- filter(slump_data, true_zone == 'ball')
chase_rate_slump <- sum(balls$description %in% swings_vector) / nrow(balls)
# Calculate pull percentage
bip_count <- (sum(slump_data$bb_type == "line_drive"))+
(sum(slump_data$bb_type == "ground_ball"))+
(sum(slump_data$bb_type == "fly_ball"))+
(sum(slump_data$bb_type == "popup"))
pull_percentage_slump <- sum((slump_data$stand == 'R' &
slump_data$hc_x < 125.42) |
(slump_data$stand == 'L' &
slump_data$hc_x > 125.42), na.rm = TRUE) /
bip_count
# Calculate oppo percentage
oppo_percentage_slump <- sum((slump_data$stand == 'R' &
slump_data$hc_x > 125.42) |
(slump_data$stand == 'L' &
slump_data$hc_x < 125.42), na.rm = TRUE) /
bip_count
# Calculate first pitch swing rate
first_pitch <- filter(slump_data, balls == 0 & strikes == 0)
fp_swing_rate_slump <- sum(first_pitch$description %in% swings_vector) / nrow(first_pitch)
################### Calculate same metrics for season data to compare #########
# Pitches per PA
pitches_per_pa_season <- nrow(player_data) / sum(player_data$events %in% plate_appearances_vector)
# Chase rate BB
balls <- filter(player_data, true_zone == 'ball')
chase_rate_season <- sum(balls$description %in% swings_vector) / nrow(balls)
# Pull percentage
bip_count <- (sum(player_data$bb_type == "line_drive"))+
(sum(player_data$bb_type == "ground_ball"))+
(sum(player_data$bb_type == "fly_ball"))+
(sum(player_data$bb_type == "popup"))
pull_percentage_season <- sum((player_data$stand == 'R' &
player_data$hc_x < 125.42) |
(player_data$stand == 'L' &
player_data$hc_x > 125.42), na.rm = TRUE) /
bip_count
# Oppo Percentage
oppo_percentage_season <- sum((player_data$stand == 'R' &
player_data$hc_x > 125.42) |
(player_data$stand == 'L' &
player_data$hc_x < 125.42), na.rm = TRUE) /
bip_count
# First pitch swing rate
first_pitch <- filter(player_data, balls == 0 & strikes == 0)
fp_swing_rate_season <- sum(first_pitch$description %in% swings_vector) / nrow(first_pitch)
################ Create data frame to compare ############################
comparison <- data.frame(Metric = c('Pitches per PA', 'Chase Rate BB', 'Pull Percentage',
'Oppo Percentage', 'First Pitch Swing Rate'),
Slump = c(round(pitches_per_pa_slump,2), round(chase_rate_slump,2),
round(pull_percentage_slump,2),round(oppo_percentage_slump,2),
round(fp_swing_rate_slump,2)),
Season = c(round(pitches_per_pa_season,2), round(chase_rate_season,2),
round(pull_percentage_season,2),round(oppo_percentage_season,2),
round(fp_swing_rate_season,2)))
return(comparison)
}
slump_comps('Lindor, Francisco', '2023-09-15')
## Metric Slump Season
## 1 Pitches per PA 3.95 4.00
## 2 Chase Rate BB 0.35 0.30
## 3 Pull Percentage 0.77 0.66
## 4 Oppo Percentage 0.23 0.34
## 5 First Pitch Swing Rate 0.32 0.27
This data offers a few possible explanations for Lindor’s slump. He is swinging at more first pitches and not getting quite as deep into ABs as normal. Also, he is pulling the ball significantly more (66% -> 77%).
When evaluating Lindor, pro scouts should look for him to make the necessary adjustments to get back to his normal approach. Look for Lindor to be more patient and willing to go the other way.
In practice, this program could be run on any given day to identify struggling players across baseball or from specific watch lists. While not every slump can be explained by the approach metrics used, this can be used to easily identify and diagnose many slumps.
# Create a function to check which players are in a slump on a given date
check_slumps <- function(player_list, date) {
slumping_players <- c()
for(player in player_list) {
if (as.Date(date) %in% get_slump_dates(player)) {
# add player to slumping_players vector
slumping_players <- c(slumping_players, player)
}
# else do nothing
}
return(slumping_players)
}
Example: The program is run on 07/04/2023
# For demonstration, scan all MLB players from the selected opponents
players <- unique(team_data$player_name)
slump_list <- check_slumps(players, '2023-07-04')
slump_list
## [1] "McNeil, Jeff" "Alonso, Pete" "Alvarez, Francisco"
The players in slump_list are Jeff McNeil, Pete Alonso, and Francisco Alvarez.
Now, the slump_comps() function can be applied to each player.
Jeff McNeil
slump_comps(slump_list[1], '2023-07-04')
## Metric Slump Season
## 1 Pitches per PA 3.26 3.38
## 2 Chase Rate BB 0.36 0.32
## 3 Pull Percentage 0.54 0.58
## 4 Oppo Percentage 0.46 0.42
## 5 First Pitch Swing Rate 0.38 0.42
Pete Alonso
slump_comps(slump_list[2], '2023-07-04')
## Metric Slump Season
## 1 Pitches per PA 3.97 3.94
## 2 Chase Rate BB 0.26 0.29
## 3 Pull Percentage 0.64 0.61
## 4 Oppo Percentage 0.36 0.38
## 5 First Pitch Swing Rate 0.17 0.24
Alvarez, Francisco
slump_comps(slump_list[3], '2023-07-04')
## Metric Slump Season
## 1 Pitches per PA 3.63 3.83
## 2 Chase Rate BB 0.32 0.29
## 3 Pull Percentage 0.53 0.61
## 4 Oppo Percentage 0.47 0.39
## 5 First Pitch Swing Rate 0.43 0.40