How Much Does a Strikeout Cost?

Frank D. Evans – OKCoders Data Analytics – Spring 2018

[ROUGH DRAFT]

Intro
Using the 2016 Lahman Baseball Database, all regular-season pitching records for the 10 seasons from 2007 through 2016 are grouped by team and year, so that each team's pitching bullpen can be compared with that team's outcomes for the same season. Under the admittedly heavy-handed assumption that a primary objective of a pitcher in a given game is to create strikeouts, this analysis isolates the value of a pitcher by their ability to generate strikeouts and asks how much a team “pays” for each strikeout.

Objectives
1. Determine how much a strikeout costs, and which teams got a good or a bad deal.
2. Quantify the relationship between the salary of a pitching bullpen and the win ratio of the team for that year.
3. Use a model to quantify the factors that let a team get a good deal on a pitcher, minimizing cost while maximizing expected strikeouts produced.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9671   29489   40642   42737   52330  111191

By summing the salaries of all players who played in at least 10 games and dividing by the total strikeouts they produced for that team in that season, we obtain an effective “cost per strikeout” for each team-year. Teams paid across a wide band of values over this 10-year period, ranging from about $9.7K to over $111K per strikeout, with a median of $40,642 and a mean of $42,737.
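A minimal sketch of the cost-per-strikeout calculation, written in Python rather than the R used for the draft. It assumes each row is one player's season for one team, already joined with that player's salary; the row layout, the team codes, and all numbers are hypothetical, not Lahman data.

```python
from collections import defaultdict

MIN_GAMES = 10  # players below this many games are excluded, as in the draft

# Hypothetical rows: (team, year, games, salary, strikeouts).
# Team codes and numbers are made up for illustration; they are not Lahman data.
rows = [
    ("AAA", 2010, 32, 4_000_000, 150),
    ("AAA", 2010, 60, 1_000_000, 70),
    ("AAA", 2010, 4, 9_000_000, 10),    # fewer than 10 games: dropped
    ("BBB", 2010, 30, 12_000_000, 200),
]

def cost_per_strikeout(rows, min_games=MIN_GAMES):
    """Total salary / total strikeouts for each (team, year) group."""
    salary = defaultdict(float)
    strikeouts = defaultdict(int)
    for team, year, games, pay, so in rows:
        if games < min_games:
            continue
        salary[(team, year)] += pay
        strikeouts[(team, year)] += so
    # Skip groups with zero strikeouts to avoid dividing by zero.
    return {key: salary[key] / strikeouts[key]
            for key in salary if strikeouts[key] > 0}

for (team, year), cost in sorted(cost_per_strikeout(rows).items()):
    print(team, year, round(cost))
```

The same division reproduces the reported figures from team totals: for the 2013 Philadelphia row in the table of highest-cost teams, 94,734,500 / 852 is about 111,191.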

The value of a strikeout is high: getting more strikeouts has a strong relationship with a higher win ratio for a given season.

## # A tibble: 5 x 6
##   yearID name            player_salary strikeouts cost_strikeout win_ratio
##    <int> <chr>                   <dbl>      <int>          <dbl>     <dbl>
## 1   2013 Philadelphia P…     94734500.        852        111191.     0.451
## 2   2015 Detroit Tigers      82365000.        763        107949.     0.460
## 3   2015 San Francisco …     94895000.        890        106624.     0.519
## 4   2014 Philadelphia P…     98203500.        979        100310.     0.451
## 5   2016 Detroit Tigers      88811481.        948         93683.     0.534

Among the teams that paid the most, only 2 of 5 finished with a winning record (a win ratio above .500), and neither would be considered a great result by a baseball analyst.

## # A tibble: 5 x 6
##   yearID name            player_salary strikeouts cost_strikeout win_ratio
##    <int> <chr>                   <dbl>      <int>          <dbl>     <dbl>
## 1   2008 Pittsburgh Pir…      6982500.        722          9671.     0.414
## 2   2007 Tampa Bay Devi…      8839500.        885          9988.     0.407
## 3   2009 Florida Marlins      9989000.        981         10182.     0.537
## 4   2009 Oakland Athlet…      8050000.        748         10762.     0.463
## 5   2008 Florida Marlins     10106500.        864         11697.     0.522

Win records are strikingly similar for the teams that paid the least (again, 2 of 5 finished above .500), which motivates looking at the entire landscape of teams to characterize the relationship.
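The 2-of-5 comparison can be checked directly from the two tables above. This short sketch copies their win_ratio columns (already rounded to three decimals in the printed tibbles) and counts seasons above .500.

```python
# win_ratio values copied from the two tables above (rounded to 3 decimals)
highest_cost = [0.451, 0.460, 0.519, 0.451, 0.534]  # five highest cost per strikeout
lowest_cost = [0.414, 0.407, 0.537, 0.463, 0.522]   # five lowest cost per strikeout

def summarize(win_ratios):
    """Return (number of winning seasons, mean win ratio)."""
    winning = sum(1 for w in win_ratios if w > 0.5)  # above .500 = winning record
    return winning, sum(win_ratios) / len(win_ratios)

for label, ratios in [("highest cost", highest_cost), ("lowest cost", lowest_cost)]:
    winning, mean = summarize(ratios)
    print(f"{label}: {winning} of {len(ratios)} winning, mean win ratio {mean:.3f}")
```

Both groups come out at 2 winning seasons out of 5, with mean win ratios of about 0.483 for the highest-cost teams and 0.469 for the lowest-cost teams, so the two extremes look much alike.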

The amount paid per strikeout does not appear to have a strong relationship with a team's win record for that season.

## [1] "player_salary"  "w_player_age"   "w_career_age"   "walk_ratio"    
## [5] "cost_strikeout"
##   intercept       RMSE  Rsquared        MAE     RMSESD RsquaredSD       MAESD
## 1      TRUE 0.05474542 0.3276789 0.04511669 0.00236611 0.04788481 0.002164012

Building a model of a team's win ratio from the key features of its pitching bullpen (player_salary, w_player_age, w_career_age, walk_ratio, and cost_strikeout) does not yield a strong relationship: with an R-squared of about 0.33, the model accounts for only about a third of the variation in win ratio from one team-season to the next.
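The columns in these model summaries (intercept, RMSE, Rsquared, MAE and their SD counterparts) match the layout that caret's `train()` prints for a resampled linear model, although the draft does not show its code. Assuming that is the source, each metric is averaged over resamples and the SD columns give its spread. The sketch below shows how one resample's values would be computed, with Rsquared taken as the squared correlation between observed and predicted values (an assumption about how the draft's numbers were produced).

```python
import math

def resample_metrics(observed, predicted):
    """RMSE, R-squared and MAE for one held-out resample.

    R-squared here is the squared Pearson correlation between observed
    and predicted values, assumed to mirror how caret reports Rsquared.
    """
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        rsq = float("nan")  # correlation is undefined for constant values
    else:
        rsq = cov ** 2 / (var_o * var_p)
    return rmse, rsq, mae
```

Averaging these values over all resamples, and taking their standard deviations, would give the RMSE/Rsquared/MAE and RMSESD/RsquaredSD/MAESD columns. RMSE is worth reading alongside R-squared: an RMSE of about 0.055 in win ratio is roughly 9 wins over a 162-game season.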

##   intercept     RMSE Rsquared      MAE   RMSESD RsquaredSD    MAESD
## 1      TRUE 14302.55 0.425792 11099.04 1150.835  0.0629895 796.7282

When modeling what drives a low cost per strikeout (an R-squared of about 0.43 in the model above), the career age of the pitchers (how long they have been playing in MLB) appears to have the best fit. This suggests that pitchers earlier in their careers tend to reduce the cost of a pitching bullpen without reducing the likelihood of contributing to overall strikeouts.

##   intercept     RMSE  Rsquared      MAE  RMSESD RsquaredSD    MAESD
## 1      TRUE 14008.62 0.4313772 10825.21 1054.11 0.08280468 651.5575
##  (Intercept) w_career_age 
##    -4199.370     8322.666

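On its own, this single factor produces a model about as reliable as the previous one (R-squared of about 0.43 in both cases), with each additional unit of w_career_age associated with roughly $8,323 more per strikeout.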
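The fitted line can be read directly from the coefficients above. This sketch plugs a few illustrative w_career_age values into it; it assumes w_career_age is measured in years (the draft does not say how the `w_` weighting is done), and the input values are hypothetical rather than observed teams. A closed-form least-squares helper is included to show how a one-predictor fit of this kind is computed.

```python
# Coefficients copied from the single-predictor model output above.
INTERCEPT = -4199.370
SLOPE = 8322.666  # dollars per strikeout per unit of w_career_age

def predicted_cost_per_strikeout(w_career_age):
    """Cost per strikeout (dollars) implied by the fitted line."""
    return INTERCEPT + SLOPE * w_career_age

def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical career ages (assumed to be in years), not observed teams.
for age in (3, 5, 7):
    print(f"w_career_age={age}: ${predicted_cost_per_strikeout(age):,.0f} per strikeout")
```

This prints about $20,769, $37,414 and $54,059. Under the years assumption, each extra year of career age adds about $8,300 to a strikeout's expected cost. The negative intercept also means the line predicts a negative cost below a w_career_age of roughly 0.5, so it should not be extrapolated far below the observed range.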

A younger bullpen also does not appear to hurt a team's win record, as there is no discernible relationship between career age and the team's win ratio for the season.

Conclusion
Plotting the model built on the career age of the pitchers alone shows the degree to which this relationship helps minimize the cost of a given strikeout. The main conclusion on how to get a good deal is therefore that, when it comes to spending dollars effectively on strikeouts, good scouting of pitchers before they reach the major leagues matters more than buying established MLB pitching talent.