A minor frustration with MLB Statcast data from Baseball Savant is that it does not list the home plate umpire, which makes estimating catchers' pitch framing ability somewhat difficult. According to the Savant FAQ page, the home plate umpire used to be recorded, but that functionality has been deprecated. Thankfully, Umpire Scorecards (@UmpScorecards) makes this data available on its website for free. It’s simple to download the data and then append it to scraped data from Savant. For this article, I used the 2021 season and scraped the data using the pybaseball package, though the statistical analysis is done in R.

As Angel Hernandez proves every week, what is called a ball versus a strike can vary wildly between umpires. A question that interests me is whether certain catchers are luckier than others because they get more favorable umpire assignments. To answer this, I want to compare two models that predict whether a given pitch is called a strike: one that incorporates both catchers and umpires, and one that uses only catchers. Both are mixed effects models in which catchers and umpires are the random effects and characteristics of the pitch (plate location, movement, spin rate, etc.) are the fixed effects. To test whether umpires affect whether a pitch is called a strike, I compare the two models with an ANOVA (a likelihood ratio test). Then, to determine which catchers were the luckiest and unluckiest, I make predictions with both models; the difference between the two predictions is the amount of “luck” a catcher had. For example, if the naive model (no umpire) predicts a 45% probability that a pitch is called a strike and the umpire-included model predicts 47%, the umpire is 2 percentage points more generous than an average umpire, and the catcher was “lucky” to have him.
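
To make the luck definition concrete, here is a minimal sketch in base R on made-up numbers. The data frame `toy`, its column names, and the probabilities are hypothetical stand-ins for the per-pitch predictions of the two models; only the first row mirrors the 45% versus 47% example.

```r
# Hypothetical per-pitch strike probabilities from the two models.
toy <- data.frame(
  catcher  = c("A", "A", "B", "B"),
  no_ump   = c(0.45, 0.80, 0.30, 0.60),  # naive model (catcher only)
  with_ump = c(0.47, 0.81, 0.29, 0.58)   # model that also knows the umpire
)

# Per-pitch luck: how much knowing the umpire shifts the strike probability.
toy$luck <- toy$with_ump - toy$no_ump

# Average luck per catcher, in probability units.
catcher_luck <- tapply(toy$luck, toy$catcher, mean)
print(round(catcher_luck, 3))
```

In this toy data, catcher A comes out slightly lucky and catcher B slightly unlucky; the real analysis does the same thing over every called pitch of the season.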

The data cleaning is fairly simple. I combined the pitch-by-pitch data with the umpire data, matching on game date and home team, to create a final dataset that lists the home plate umpire for each pitch. I then kept only called pitches (balls and called strikes) and filtered the dataset down to just the variables needed.

library(readr)
library(readxl)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v dplyr   1.0.8
## v tibble  3.1.6     v stringr 1.4.0
## v tidyr   1.2.0     v forcats 0.5.1
## v purrr   0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# pbp
statcast_21 <- read_csv("C:/Users/david/Downloads/statcast-21.csv")
## New names:
## * `` -> ...1
## Rows: 686991 Columns: 93
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (16): pitch_type, player_name, events, description, des, game_type, sta...
## dbl  (68): ...1, release_speed, release_pos_x, release_pos_z, batter, pitche...
## lgl   (8): spin_dir, spin_rate_deprecated, break_angle_deprecated, break_len...
## date  (1): game_date
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# ump data
ump_2021 <- read_excel("ump-2021.xlsx")
colnames(ump_2021)[1] <- "game_date"
colnames(ump_2021)[3] <- "home_team"

combin <- left_join(statcast_21, ump_2021, by = c("game_date","home_team"))

# keep only called pitches (i.e., exclude pitches that were swung at)
f_df <- combin %>%
  filter(description == "called_strike" | description == "ball") %>%
  select(type, p_throws, stand, plate_x, plate_z, release_spin_rate, release_extension, pfx_z, pfx_x, fielder_2, Umpire) 

# convert characters to factors
f_df[sapply(f_df, is.character)] <- lapply(f_df[sapply(f_df, is.character)], as.factor)

# check to make sure dataset has proper structures
sapply(f_df, class)
##              type          p_throws             stand           plate_x 
##          "factor"          "factor"          "factor"         "numeric" 
##           plate_z release_spin_rate release_extension             pfx_z 
##         "numeric"         "numeric"         "numeric"         "numeric" 
##             pfx_x         fielder_2            Umpire 
##         "numeric"         "numeric"          "factor"
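
One thing that may be worth checking after a join like the one above is whether it added rows. Joining on date and home team assumes one game per key, which a doubleheader would break. The sketch below uses base R `merge()` as a stand-in for `left_join()`, on made-up data: the team code, date, `pitch_id` column, and umpire names are all hypothetical.

```r
# Hypothetical pitches: three pitches from one home date.
pitches <- data.frame(
  game_date = as.Date(c("2021-07-04", "2021-07-04", "2021-07-04")),
  home_team = c("NYY", "NYY", "NYY"),
  pitch_id  = 1:3
)

# Hypothetical umpire table with a doubleheader: two rows share the same key.
umps <- data.frame(
  game_date = as.Date(c("2021-07-04", "2021-07-04")),
  home_team = c("NYY", "NYY"),
  Umpire    = c("Ump One", "Ump Two")
)

# all.x = TRUE keeps every pitch, like a left join.
joined <- merge(pitches, umps, by = c("game_date", "home_team"), all.x = TRUE)

# If each key matches at most one umpire row, the row count does not change.
rows_added <- nrow(joined) - nrow(pitches)
print(rows_added)

# Keys that appear more than once in the umpire table are the culprits.
key_cols <- c("game_date", "home_team")
dup_keys <- umps[duplicated(umps[, key_cols]), key_cols]
print(dup_keys)
```

If `rows_added` is nonzero on the real data, a game-level identifier (Statcast exports one as `game_pk`) would be a safer join key, assuming the umpire file carries a matching column.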

The fixed effects for both models are pitcher throwing hand, batter handedness, horizontal and vertical plate location of the pitch, spin rate, release extension, and horizontal and vertical break. The two location terms are allowed to interact, as are the two break terms.

The only random effect in the naive model is the catcher ID (fielder_2); the combined model adds a random effect for the umpire.

The ANOVA checks whether adding the umpire random effect is a statistically significant improvement, which it is. The improvement in log-likelihood is small, though: about 7 points, from -220307.4 to -220300.1. This makes sense given that at the Major League level the umpires are all very good and fairly similar to one another, so there is not much variance between them. At the college or high school level, I would expect more variance.
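
To get a feel for how small the umpire variance is, the random-effect standard deviation can be translated from the logit scale into probability. The sketch below takes the umpire standard deviation reported in the `m1` summary (about 0.0286) and asks how much an umpire one or two standard deviations more generous than average would move a pitch that is otherwise a coin flip. The 50% baseline is my assumption, chosen because the logistic curve is steepest there, so shifts for less borderline pitches would be smaller.

```r
# Umpire random-intercept standard deviation on the logit scale,
# copied from the m1 summary output.
sd_ump <- 0.0286

# Baseline: a borderline pitch with a 50% strike probability (an assumption).
base_logit <- qlogis(0.5)

# Shift in strike probability for umpires 1 and 2 SDs more generous than average.
shift_1sd <- plogis(base_logit + sd_ump) - 0.5
shift_2sd <- plogis(base_logit + 2 * sd_ump) - 0.5

print(round(c(one_sd = shift_1sd, two_sd = shift_2sd), 4))
```

That works out to roughly 0.7 and 1.4 percentage points, so even on the most borderline pitch a fairly generous umpire moves the call probability by about one point.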

library(lme4)
## Warning: package 'lme4' was built under R version 4.1.3
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
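# catcher IDs are numeric in Statcast, so convert them to a factor for grouping
f_df$fielder_2 <- as.factor(f_df$fielder_2)

# type is a factor with levels B and S, so both models predict P(called strike)
# naive model: random intercept for catcher only
model1 <- type ~ p_throws + stand + plate_x * plate_z + release_spin_rate + release_extension + pfx_z * pfx_x + (1|fielder_2)
# combined model: adds a random intercept for umpire
model2 <- type ~ p_throws + stand + plate_x * plate_z + release_spin_rate + release_extension + pfx_z * pfx_x + (1|fielder_2) + (1 | Umpire)

# nAGQ = 0 is a faster but less exact fitting method, which helps with ~350k rows
m0 <- glmer(model1, data = f_df,
            family = binomial(),
            nAGQ = 0,
            control=glmerControl(optimizer = "nloptwrap")
)

m1 <- glmer(model2, data = f_df,
            family = binomial(),
            nAGQ = 0,
            control=glmerControl(optimizer = "nloptwrap")
)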

summary(m0)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 0) [glmerMod]
##  Family: binomial  ( logit )
## Formula: type ~ p_throws + stand + plate_x * plate_z + release_spin_rate +  
##     release_extension + pfx_z * pfx_x + (1 | fielder_2)
##    Data: f_df
## Control: glmerControl(optimizer = "nloptwrap")
## 
##       AIC       BIC    logLik  deviance  df.resid 
##  440638.8  440768.1 -220307.4  440614.8    352893 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -576.21   -0.72   -0.63    1.32    2.13 
## 
## Random effects:
##  Groups    Name        Variance Std.Dev.
##  fielder_2 (Intercept) 0.002232 0.04725 
## Number of obs: 352905, groups:  fielder_2, 116
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.520e+00  6.190e-02 -24.554  < 2e-16 ***
## p_throwsR          1.476e-01  1.164e-02  12.676  < 2e-16 ***
## standR             1.140e-01  7.797e-03  14.624  < 2e-16 ***
## plate_x           -6.468e-01  9.429e-03 -68.593  < 2e-16 ***
## plate_z            5.732e-02  3.493e-03  16.410  < 2e-16 ***
## release_spin_rate  2.853e-04  1.205e-05  23.673  < 2e-16 ***
## release_extension -2.130e-02  8.455e-03  -2.519   0.0118 *  
## pfx_z              3.986e-02  5.199e-03   7.668 1.75e-14 ***
## pfx_x              6.982e-02  5.872e-03  11.890  < 2e-16 ***
## plate_x:plate_z    2.502e-01  3.667e-03  68.229  < 2e-16 ***
## pfx_z:pfx_x       -6.449e-02  7.891e-03  -8.173 3.00e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) p_thrR standR plat_x plat_z rls_s_ rls_xt pfx_z  pfx_x 
## p_throwsR   -0.060                                                        
## standR      -0.062  0.096                                                 
## plate_x      0.043 -0.286 -0.128                                          
## plate_z     -0.193 -0.026  0.012  0.051                                   
## rels_spn_rt -0.465 -0.071 -0.028 -0.011 -0.027                            
## reles_xtnsn -0.877 -0.038 -0.016 -0.003  0.104  0.043                     
## pfx_z        0.047  0.078 -0.021 -0.024 -0.263  0.127 -0.153              
## pfx_x        0.136  0.006 -0.001 -0.157 -0.046 -0.237 -0.049  0.193       
## plt_x:plt_z -0.020  0.253  0.020 -0.909 -0.014 -0.001 -0.007  0.000  0.118
## pfx_z:pfx_x -0.160  0.559 -0.046  0.004  0.015  0.125  0.064 -0.023 -0.544
##             plt_:_
## p_throwsR         
## standR            
## plate_x           
## plate_z           
## rels_spn_rt       
## reles_xtnsn       
## pfx_z             
## pfx_x             
## plt_x:plt_z       
## pfx_z:pfx_x -0.019
summary(m1)
## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 0) [glmerMod]
##  Family: binomial  ( logit )
## Formula: type ~ p_throws + stand + plate_x * plate_z + release_spin_rate +  
##     release_extension + pfx_z * pfx_x + (1 | fielder_2) + (1 |      Umpire)
##    Data: f_df
## Control: glmerControl(optimizer = "nloptwrap")
## 
##       AIC       BIC    logLik  deviance  df.resid 
##  440626.1  440766.2 -220300.1  440600.1    352892 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -575.96   -0.72   -0.63    1.32    2.12 
## 
## Random effects:
##  Groups    Name        Variance  Std.Dev.
##  fielder_2 (Intercept) 0.0022004 0.04691 
##  Umpire    (Intercept) 0.0008178 0.02860 
## Number of obs: 352905, groups:  fielder_2, 116; Umpire, 99
## 
## Fixed effects:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.520e+00  6.202e-02 -24.515  < 2e-16 ***
## p_throwsR          1.475e-01  1.165e-02  12.660  < 2e-16 ***
## standR             1.144e-01  7.800e-03  14.663  < 2e-16 ***
## plate_x           -6.468e-01  9.430e-03 -68.587  < 2e-16 ***
## plate_z            5.741e-02  3.494e-03  16.433  < 2e-16 ***
## release_spin_rate  2.852e-04  1.206e-05  23.647  < 2e-16 ***
## release_extension -2.129e-02  8.463e-03  -2.516   0.0119 *  
## pfx_z              3.983e-02  5.200e-03   7.659 1.87e-14 ***
## pfx_x              6.988e-02  5.874e-03  11.898  < 2e-16 ***
## plate_x:plate_z    2.502e-01  3.667e-03  68.228  < 2e-16 ***
## pfx_z:pfx_x       -6.465e-02  7.893e-03  -8.191 2.59e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) p_thrR standR plat_x plat_z rls_s_ rls_xt pfx_z  pfx_x 
## p_throwsR   -0.060                                                        
## standR      -0.062  0.095                                                 
## plate_x      0.042 -0.285 -0.128                                          
## plate_z     -0.192 -0.026  0.012  0.051                                   
## rels_spn_rt -0.464 -0.071 -0.028 -0.011 -0.027                            
## reles_xtnsn -0.876 -0.038 -0.016 -0.003  0.104  0.043                     
## pfx_z        0.047  0.078 -0.021 -0.024 -0.263  0.127 -0.153              
## pfx_x        0.136  0.006 -0.001 -0.157 -0.046 -0.237 -0.048  0.193       
## plt_x:plt_z -0.020  0.253  0.020 -0.909 -0.014  0.000 -0.007  0.000  0.118
## pfx_z:pfx_x -0.159  0.559 -0.046  0.004  0.015  0.126  0.063 -0.023 -0.544
##             plt_:_
## p_throwsR         
## standR            
## plate_x           
## plate_z           
## rels_spn_rt       
## reles_xtnsn       
## pfx_z             
## pfx_x             
## plt_x:plt_z       
## pfx_z:pfx_x -0.019
anova(m0, m1)
## Data: f_df
## Models:
## m0: type ~ p_throws + stand + plate_x * plate_z + release_spin_rate + release_extension + pfx_z * pfx_x + (1 | fielder_2)
## m1: type ~ p_throws + stand + plate_x * plate_z + release_spin_rate + release_extension + pfx_z * pfx_x + (1 | fielder_2) + (1 | Umpire)
##    npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)    
## m0   12 440639 440768 -220307   440615                         
## m1   13 440626 440766 -220300   440600 14.668  1  0.0001282 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

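The ANOVA confirms that adding the umpire random effect is a statistically significant improvement (chi-squared = 14.67 on 1 degree of freedom, p ≈ 0.0001), even though the effect itself is small.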
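
For readers who want to see what `anova()` is doing here, the comparison is a likelihood ratio test: the drop in deviance (twice the gain in log-likelihood) is compared against a chi-squared distribution whose degrees of freedom equal the number of added parameters. A minimal sketch in base R, using the deviances printed in the two model summaries; the variable names are mine.

```r
# Deviances (-2 * log-likelihood) as printed in the two model summaries.
dev_m0 <- 440614.8
dev_m1 <- 440600.1

# Likelihood ratio statistic; it differs slightly from the 14.668 that
# anova() prints because the summary values are rounded.
lr_stat <- dev_m0 - dev_m1

# m1 adds one parameter (the umpire variance), so df = 1.
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)

print(c(lr_stat = lr_stat, p_value = p_value))
```

Because a variance cannot be negative, the null value sits on the boundary of the parameter space, so this p-value is, if anything, conservative.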

Now, to determine the luckiest and unluckiest catchers, I made predictions with both models and took the difference between the two, following the methodology above. Statcast identifies players by ID number rather than name, so to make the leaderboards more readable I downloaded ID data from Razzball and joined it to the Statcast data. The leaderboard data frame summarizes every catcher's average “luck.”

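# predicted strike probability without and with the umpire random effect
f_df$no_ump_pred <- predict(m0, f_df, type = "response")
f_df$ump_pred <- predict(m1, f_df, type = "response")
# positive values mean the umpire made a called strike more likely
f_df$dif <- f_df$ump_pred - f_df$no_ump_pred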

fl <- f_df %>%
  select(fielder_2, dif) %>%
  drop_na() # about 1600 rows need to be dropped. Dataset has ~350k in general so not a big deal


library(readxl)
mlbam_id <- read_excel("mlbam_id.xlsx")
ids <- mlbam_id[,c(2,3)]
colnames(ids)[2] <- "fielder_2"

sapply(fl, class)
## fielder_2       dif 
##  "factor" "numeric"
sapply(ids, class)
##        Name   fielder_2 
## "character"   "numeric"
ids$fielder_2 <- as.factor(ids$fielder_2)

fl <- left_join(fl, ids, by = "fielder_2")

leaderboard <- fl %>%
  group_by(Name) %>%
  summarise(mean_dif = round(mean(dif),5))

Who were the luckiest and unluckiest catchers? Austin Allen was the luckiest, getting on average a boost of about 0.7 percentage points in called-strike probability. Yohel Pozo was the unluckiest, losing about 0.4 percentage points on average. These are small amounts, so the (un)luckiness of any particular catcher is barely noticeable.

unlucky <- leaderboard[order(leaderboard$mean_dif),]
head(unlucky)
## # A tibble: 6 x 2
##   Name         mean_dif
##   <chr>           <dbl>
## 1 Yohel Pozo   -0.0041 
## 2 Taylor Ward  -0.00374
## 3 Joey Bart    -0.00369
## 4 Payton Henry -0.00336
## 5 Tony Wolters -0.0017 
## 6 Ali Sanchez  -0.00166
lucky <- leaderboard[order(-leaderboard$mean_dif),]
head(lucky)
## # A tibble: 6 x 2
##   Name            mean_dif
##   <chr>              <dbl>
## 1 Austin Allen     0.00676
## 2 Tyler Payne      0.00673
## 3 Yermin Mercedes  0.00571
## 4 Jack Kruger      0.0049 
## 5 P.J. Higgins     0.00261
## 6 Taylor Gushue    0.00253

Hopefully this was an interesting look at catcher framing and the effect of umpires on called strikes. While the results ended up not being that interesting, I hope that those who are not familiar with mixed effects models found this to be a useful first tutorial.