Slump Assessment and Development

The purpose of this program is to flag players who are struggling and help determine why they are struggling. Using this tool, player development staff can work with struggling players to get them back on track.

In reality, this program would flag players from across all of the Cubs’ full-season affiliates. However, for demonstration purposes, I will use the the Cubs’ 2023 MLB data.

#library(baseballr)
#library(tidyverse)
#library(lubridate)


# Load 2023 Statcast data
load("G:/My Drive/Baseball/R Projects/Data/Statcast2023.RData")

# filter to Cubs hitters
cubs_data <- Statcast2023 %>%
  filter((inning_topbot == 'Top' & away_team == 'CHC') | (inning_topbot == 'Bot' & home_team == 'CHC'))

Identifying slumps

The get_slump_dates() function flags dates in which a player can be considered to be in a slump. The specifications for a slump can be changed, but the default specifications are as follows.

wOBA over the last 20 games is less than 80% of overall wOBA
BABIP over the last 20 games is not less than 80% of overall BABIP
- i.e. dip in wOBA not a result of bad luck

get_slump_dates <- function(player, length = 20) {

# Create player-specific data frame
player_data <- cubs_data %>%
  filter(player_name == player)

# Calculate statistics to date for each date in the player data
# Then, calculate the last length days data at each date

over_time <- data.frame()
dates <- unique(player_data$game_date) # list of dates

for (d in dates) { # loop through all dates
  
  # data to date
  data_to_date <- filter(player_data, game_type == 'R' & game_date <= d)
  
  
  
  ############# Set up stat calculations ###############
  
  
  
  # Define count labels
  walk_count <- sum(data_to_date$events == "walk")
  strikeout_count <- sum(data_to_date$events == "strikeout")
  hit_by_pitch_count <- sum(data_to_date$events == "hit_by_pitch")
  bip_count <- (sum(data_to_date$bb_type == "line_drive"))+
    (sum(data_to_date$bb_type == "ground_ball"))+
    (sum(data_to_date$bb_type == "fly_ball"))+
    (sum(data_to_date$bb_type == "popup"))
  home_run_count <- sum(data_to_date$events == 'home_run')
  
  # Define vectors for events
  hits_vector <- c('single', 'double', 'triple', 'home_run')
  at_bats_vector <- c('field_out', 'strikeout', 'grounded_into_double_play', 'fielders_choice', 'fielders_choice_out', 'triple_play', 'strikeout_double_play', 'double_play', 'field_error', 'force_out', hits_vector)
  plate_appearances_vector <- c(at_bats_vector, 'sac_fly', 'walk', 'hit_by_pitch', 'sac_bunt', 'sac_bunt_double_play', 'sac_fly_double_play', 'catcher_interf')
  swings_vector <- c('hit_into_play', 'swinging_strike_blocked', 'swinging_strike', 'foul_tip', 'foul')
  
  
  
  ################# Calculate stats to date #################
  
  
  
  # Calculate plate appearances (PA)
  PA <- sum(data_to_date$events %in% plate_appearances_vector)
  
  # Calculate BABIP
  babip <- (sum(data_to_date$events %in% hits_vector) -
              home_run_count) / bip_count
  
  # Calculate wOBA
  woba_sum <- sum(data_to_date$woba_value, na.rm = TRUE)
  woba_denominator <- sum(data_to_date$events %in% at_bats_vector) +
    walk_count +
    hit_by_pitch_count +
    sum(data_to_date$events == 'sac_fly')
  woba <- woba_sum / woba_denominator
  
  
  
  ################ Calculate the same stats for ##################
  ################ each date over specified period of time #######
  ################ e.g. last 20 days #############################
  
  
  # Create a vector of dates to use for the length day window
  last_length_data <- player_data %>%
        filter(game_type == 'R' & game_date <= d & game_date >= (d - length))
  
  # Define count labels
  walk_count <- sum(last_length_data$events == "walk")
  strikeout_count  <- sum(last_length_data$events == "strikeout")
  hit_by_pitch_count <- sum(last_length_data$events == "hit_by_pitch")
  bip_count <- (sum(last_length_data$bb_type == "line_drive"))+
    (sum(last_length_data$bb_type == "ground_ball"))+
    (sum(last_length_data$bb_type == "fly_ball"))+
    (sum(last_length_data$bb_type == "popup"))
  home_run_count <- sum(last_length_data$events == 'home_run')
  
  
  
  
  # Calculate plate appearances (PA)
  PA_length <- sum(last_length_data$events %in% plate_appearances_vector)
  
  # Calculate BABIP
  babip_length <- (sum(last_length_data$events %in% hits_vector) -
              home_run_count) / bip_count
  
  # Calculate wOBA
  woba_sum <- sum(last_length_data$woba_value, na.rm = TRUE)
  woba_denominator <- sum(last_length_data$events %in% at_bats_vector) +
    walk_count +
    hit_by_pitch_count +
    sum(last_length_data$events == 'sac_fly')
  woba_length <- woba_sum / woba_denominator
  
  
  
  # Create a row for the current player in over_time
  new_row <- data.frame(Date = as.Date(d, origin = "1970-01-01"), 
                        Name = player, 
                        PA = PA, 
                        wOBA = woba,
                        BABIP = babip,
                        PA_length = PA_length,
                        wOBA_length = woba_length,
                        BABIP_length = babip_length)
  
  
  over_time <- rbind(over_time, new_row)
}

# add a column to identify slumps
# Define a slump as a row where the wOBA_length is less than 80% of total wOBA
# and the BABIP_length is NOT less than 80% of BABIP (i.e. not bad luck)

over_time <- over_time %>%
  mutate(slump = ifelse(wOBA_length < wOBA * 0.8 & BABIP_length > BABIP * 0.8, 1, 0))


slump_dates <- filter(over_time, slump == 1)$Date

return(slump_dates)

}

The resulting list shows the dates when Ian Happ would be considered to be in a slump.

happ_dates <- get_slump_dates('Happ, Ian', length = 20)

happ_dates

## [1] "2023-06-02" "2023-06-04" "2023-06-05" "2023-06-07" "2023-06-08"

Look for explanations for the slump

The next step is to explore the data for Happ’s June slump and look for possible explanations. Compare his tendencies over the slump period with his tendencies over the entire season.

This analysis is meant to describe approach, rather than production. This creates coachable insights. Many metrics can be used, but for demonstration purposes, I have selected the following:

pitches per PA
chase rate - breaking balls
pull percentage
oppo percentage
first pitch swing percentage

In further development, the analysis of slumping players can be vastly expanded. It can include anything from plate coverage visualizations to spray charts to pitch type heat maps.

# Create function

slump_comps <- function(player, date) { 

slump_date <- as.Date(date)
player_data <- filter(cubs_data, player_name == player)

# slump_data is the slump date and up to 20 days prior

slump_data <- filter(player_data, game_type == 'R' & game_date <= slump_date & game_date >= (slump_date - 20))

# Define vectors for events
  hits_vector <- c('single', 'double', 'triple', 'home_run')
  at_bats_vector <- c('field_out', 'strikeout', 'grounded_into_double_play', 'fielders_choice', 'fielders_choice_out', 'triple_play', 'strikeout_double_play', 'double_play', 'field_error', 'force_out', hits_vector)
  plate_appearances_vector <- c(at_bats_vector, 'sac_fly', 'walk', 'hit_by_pitch', 'sac_bunt', 'sac_bunt_double_play', 'sac_fly_double_play', 'catcher_interf')
  swings_vector <- c('hit_into_play', 'swinging_strike_blocked', 'swinging_strike', 'foul_tip', 'foul')


##################### Calculate metrics for slump to #######################
##################### identify potential explanations. #####################


# Calculate pitches per PA 

pitches_per_pa_slump <- nrow(slump_data) / sum(slump_data$events %in% plate_appearances_vector)

# Calculate chase rate on breaking balls

balls <- filter(slump_data, true_zone == 'ball')

chase_rate_slump <- sum(balls$description %in% swings_vector) / nrow(balls)

# Calculate pull percentage

   bip_count <- (sum(slump_data$bb_type == "line_drive"))+
    (sum(slump_data$bb_type == "ground_ball"))+
    (sum(slump_data$bb_type == "fly_ball"))+
    (sum(slump_data$bb_type == "popup"))
   
  pull_percentage_slump <- sum((slump_data$stand == 'R' & 
                          slump_data$hc_x < 125.42) |
                          (slump_data$stand == 'L' &
                           slump_data$hc_x > 125.42), na.rm = TRUE) /
                    bip_count
  
  # Calculate oppo percentage
  
  oppo_percentage_slump <- sum((slump_data$stand == 'R' & 
                          slump_data$hc_x > 125.42) |
                          (slump_data$stand == 'L' &
                          slump_data$hc_x < 125.42), na.rm = TRUE) /
                    bip_count
  
  # Calculate first pitch swing rate
  
  first_pitch <- filter(slump_data, balls == 0 & strikes == 0)
  
  fp_swing_rate_slump <- sum(first_pitch$description %in% swings_vector) / nrow(first_pitch)



################### Calculate same metrics for season data to compare #############


# Pitches per PA

pitches_per_pa_season <- nrow(player_data) / sum(player_data$events %in% plate_appearances_vector)


# Chase rate BB

balls <- filter(player_data, true_zone == 'ball')
chase_rate_season <- sum(balls$description %in% swings_vector) / nrow(balls)

# Pull percentage

   bip_count <- (sum(player_data$bb_type == "line_drive"))+
    (sum(player_data$bb_type == "ground_ball"))+
    (sum(player_data$bb_type == "fly_ball"))+
    (sum(player_data$bb_type == "popup"))
   
  pull_percentage_season <- sum((player_data$stand == 'R' & 
                          player_data$hc_x < 125.42) |
                          (player_data$stand == 'L' &
                           player_data$hc_x > 125.42), na.rm = TRUE) /
                    bip_count
  
  # Oppo Percentage
  
  oppo_percentage_season <- sum((player_data$stand == 'R' & 
                          player_data$hc_x > 125.42) |
                          (player_data$stand == 'L' &
                          player_data$hc_x < 125.42), na.rm = TRUE) /
                    bip_count
  
  # First pitch swing rate
  
  first_pitch <- filter(player_data, balls == 0 & strikes == 0)
  
  fp_swing_rate_season <- sum(first_pitch$description %in% swings_vector) / nrow(first_pitch)


################  Create data frame to compare ############################


comparison <- data.frame(Metric = c('Pitches per PA', 'Chase Rate BB', 'Pull Percentage',
                                    'Oppo Percentage', 'First Pitch Swing Rate'),
                         Slump = c(round(pitches_per_pa_slump,2), round(chase_rate_slump,2), 
                                   round(pull_percentage_slump,2),round(oppo_percentage_slump,2), 
                                   round(fp_swing_rate_slump,2)),
                         Season = c(round(pitches_per_pa_season,2), round(chase_rate_season,2), 
                                  round(pull_percentage_season,2),round(oppo_percentage_season,2), 
                                  round(fp_swing_rate_season,2)))

  
return(comparison)

}

Application:

slump_comps('Happ, Ian', '2023-06-02')

##                   Metric Slump Season
## 1         Pitches per PA  4.35   4.15
## 2          Chase Rate BB  0.29   0.25
## 3        Pull Percentage  0.76   0.59
## 4        Oppo Percentage  0.24   0.41
## 5 First Pitch Swing Rate  0.32   0.34

This data offers a few possible explanations for Happ’s slump. The most glaring difference between his recent and usual approach is that he is pulling the ball significantly more than normal (59% -> 76%). The most likely recommended development plan would be for coaches to encourage Happ to work on his ability to use all fields in BP. This could help lead him back to normal form.

Scan for slumping players and apply program

In practice, this program could be run on any given day to identify struggling players across the Cubs organization. While not every slump can be explained by the approach metrics used, this can be used to easily identify and diagnose many slumps.

# Create a function to check which players are in a slump on a given date

check_slumps <- function(player_list, date) {
  
  slumping_players <- c()
  
  for(player in player_list) {
    
    if (as.Date(date) %in% get_slump_dates(player)) {
      # add player to slumping_players vector
      slumping_players <- c(slumping_players, player)
    }
    # else do nothing
  }
            
  return(slumping_players)
  
}

Example: The program is run on 06/15/2023

# For demonstration, scan all Cubs MLB players from 2023
players <- unique(cubs_data$player_name)

slump_list <- check_slumps(players, '2023-06-15')

slump_list

## [1] "Barnhart, Tucker"   "Morel, Christopher"

At this point in time, the program identifies Tucker Barnhart and Christopher Morel as slumping.

Now, the slump_comps() function can be applied to each player.

Tucker Barnhart

slump_comps(slump_list[1], '2023-06-15')

##                   Metric Slump Season
## 1         Pitches per PA  4.52   4.09
## 2          Chase Rate BB  0.32   0.29
## 3        Pull Percentage  0.42   0.63
## 4        Oppo Percentage  0.58   0.37
## 5 First Pitch Swing Rate  0.39   0.34

Christopher Morel

slump_comps(slump_list[2], '2023-06-15')

##                   Metric Slump Season
## 1         Pitches per PA  4.02   3.88
## 2          Chase Rate BB  0.21   0.31
## 3        Pull Percentage  0.52   0.72
## 4        Oppo Percentage  0.48   0.28
## 5 First Pitch Swing Rate  0.38   0.42

Slump Assessment and Development

Ben Scartz

2023-12-18

Identifying slumps

Look for explanations for the slump

Application:

Scan for slumping players and apply program