Take 1 Goals: - display understanding of states - display understanding of expected value - applied expected value table to baseball event data
By hand calculation of these three Stolen Base attempts by JT Realmuto:
Repeat the same calculations as in Task 1, but this time using R. For one player over three stolen base attempts, determine the change in expected value between state1 and state2. Over a variety of SBA base states, not just (1,0,0) to (0,1,0)
Read in the csv file that contains the expected value table for each
base state - out state. Use the read_csv
function to read
the file and assign it to the dataframe expect_value
.
expect_value <- read_csv('data/EV_base_out_states.csv')
Take a look at the dataframe, expect_value
that was just
created using the head
function. In RStudio, you can view a
dataframe by clicking on the table data icon on the right, in the
environment pane.
head(expect_value)
## # A tibble: 6 × 6
## base_state_name base_1 base_2 base_3 OUT_STATE EV_state
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 empty 0 0 0 0 0.481
## 2 on 1st 1 0 0 0 0.859
## 3 on 2nd 0 1 0 0 1.1
## 4 1st & 2nd 1 1 0 0 1.44
## 5 on 3rd 0 0 1 0 1.35
## 6 corners 1 0 1 0 1.78
The head
functions shows the column names and the top 6
rows, the default number of rows. There are more rows of data, but only
the 1st 6 are shown by default. You can provide the number of rows to
the head
function; here we provide 9 rows:
head(expect_value, 9)
## # A tibble: 9 × 6
## base_state_name base_1 base_2 base_3 OUT_STATE EV_state
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 empty 0 0 0 0 0.481
## 2 on 1st 1 0 0 0 0.859
## 3 on 2nd 0 1 0 0 1.1
## 4 1st & 2nd 1 1 0 0 1.44
## 5 on 3rd 0 0 1 0 1.35
## 6 corners 1 0 1 0 1.78
## 7 2nd & 3rd 0 1 1 0 1.96
## 8 loaded 1 1 1 0 2.29
## 9 empty 0 0 0 1 0.254
Let’s take time and understand what is in this important table. The
variable base_state_name
is a name label that is provided
to help interpret the next three base_x variables. The variable
base_1
shows if there is runner on 1st base; (1) indicates
a running, or no runner (0). Likewise base_2
and
base_3
are hot-coded variables for second and third base.
OUT_STATE
is the number of outs {0,1,2,3}. Finally,
EV_state
is the expect value, or expected runs, for each
base-out state.
Add a new column to the expected value table; baserunners_tot, the total number of base runners on all three bases for each base state. Use the ‘mutate’ function to create the new column based on the addition of all three bases.
expect_value_A <- expect_value %>%
mutate(baserunners_tot = (base_1+ base_2+ base_3))
Let’s take a quick look at the expect value. Plot the expect values of different out states.
ev_color_palettes <- c('#660510', '#991020', '#dd0000', '#ddaaaa',
'#aaaadd', '#0000dd', '#201099','#100566')
expect_value_A$base_state_name <- factor(expect_value_A$base_state_name,
levels=c('loaded', '2nd & 3rd',
'corners', 'on 3rd', '1st & 2nd',
'on 2nd', 'on 1st', 'empty'))
expect_value_B <- expect_value_A %>%
filter(OUT_STATE<3)
ggplot(expect_value_B) +
scale_color_manual(values=ev_color_palettes, name='Base State') +
geom_point(aes(x=OUT_STATE, y=EV_state, color=base_state_name), size=4) +
scale_x_continuous(name='Out State', limits= c(-0.5, 2.5)) +
scale_y_continuous(name='Expected value (runs)', expand=c(0,0), limits=c(0, 2.5)) +
theme_bw() +
geom_hline(yintercept = 0)
Does that make sense? The expected value is highest with zero outs and decreases with increasing outs. The expected value with 3 outs is zero; the inning is over. Within each out state, the more runners on base, the higher the expected value.
Use the read_csv()
function to read in the event file.
The event file contains stolen base attempts (SBA) and the base-out
states of each SBA. This 1st event file will be an abbreviated set of
SBA’s, hence the ‘short’ in the file name.
event_SBA_short <- read_csv('data/event_SBA_short.csv')
## Rows: 12 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): runner_name, runner_ID, SBA_outcome
## dbl (4): OUT_STATE, base_1, base_2, base_3
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
You can see the column names of the df above. There are 6 columns, the details of variables in this event files are described in the data dictionary.
Let’s take a looks at the table, use the head()
function
again to show the event file:
head(event_SBA_short)
## # A tibble: 6 × 7
## runner_name runner_ID OUT_STATE base_1 base_2 base_3 SBA_outcome
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Bryce Harper harpb003 0 1 0 0 success
## 2 Bryce Harper harpb003 2 0 1 0 success
## 3 Bryce Harper harpb003 2 1 0 0 success
## 4 Cesar Hernandez hernc005 1 0 1 0 fail
## 5 Cesar Hernandez hernc005 1 1 0 0 fail
## 6 Cesar Hernandez hernc005 2 1 0 0 success
Again, the head()
function is showing the first six
rows. There are more base runners in the dataframe beyond those shown in
these first six rows. Use the table()
function to see how
many rows of SBA are provided for each runner in this short event
file.
table(event_SBA_short$runner_name)
##
## Bryce Harper Cesar Hernandez J.T. Realmuto Jean Segura
## 3 3 3 3
Here the table()
function is counting all the rows where
each unique runner_name
appears. This short event files
contains, four different runners, and each runner has 3 observations
(i.e. three SBA).
In Task 1, you calculate the EV change for J.T. Realmuto’s three
sample SBA’s. Let’s filter the dataframe to JT, so that we can now use R
to conduct the same calculation.
Use the filter()
function on the event dataframe to select
only one player: JT Realmuto, and assign that to the new dataframe,
event_SBA_short_Realmuto:
event_SBA_short_Realmuto <- event_SBA_short %>%
filter(runner_ID=='realj001')
Take a look at the new dataframe that was just created with JT Realmuto’s three sample SBA:
head(event_SBA_short_Realmuto)
## # A tibble: 3 × 7
## runner_name runner_ID OUT_STATE base_1 base_2 base_3 SBA_outcome
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 J.T. Realmuto realj001 0 1 0 0 success
## 2 J.T. Realmuto realj001 0 1 0 0 fail
## 3 J.T. Realmuto realj001 2 1 0 0 success
Use the table()
function again to check that the
dataframe only contains JT’s SBA’s:
table(event_SBA_short_Realmuto$runner_name)
##
## J.T. Realmuto
## 3
Use the merge()
function to add the expected value for
each of the 24 base-out states to the JT Realmuto event dataframe, and
assign this to a new dataframe
event_SBA_short_Realmuto_ev
.
Also, let’s rename the column EV_state
to
EV_state_0
to designate that this is the initial EV of the
state before the SBA, at time=0.
event_SBA_short_Realmuto_ev <- merge(event_SBA_short_Realmuto, expect_value,
by=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
rename(EV_state_0= EV_state)
Take a look at this new dataframe, note how the expected value
column, EV_state
, was added for each initial base-out
state. Two EV_state
value are same; the initial state is
the same for these two SBA’s.
head(event_SBA_short_Realmuto_ev)
## OUT_STATE base_1 base_2 base_3 runner_name runner_ID SBA_outcome
## 1 0 1 0 0 J.T. Realmuto realj001 success
## 2 0 1 0 0 J.T. Realmuto realj001 fail
## 3 2 1 0 0 J.T. Realmuto realj001 success
## base_state_name EV_state_0
## 1 on 1st 0.859
## 2 on 1st 0.859
## 3 on 1st 0.224
Now we are ready to consider the state after the outcome of the SBA. We referred to the initial state as State0, we will refer to the state after the SBA as State2.
Determining State2, the base state and out state after a SBA.
State 1 out state: Calculate the change in out state based on outcome of the stolen base attempt. Success, a successful SBA does not change the out state, or fail, a SBA that leads to an out increases the out state buy 1 out.
event_SBA_short_Realmuto_ev_B <- event_SBA_short_Realmuto_ev %>%
mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, (1 + OUT_STATE)))
View the modified dataframe, note how the State2 out-state column has been created.
head(event_SBA_short_Realmuto_ev_B)
## OUT_STATE base_1 base_2 base_3 runner_name runner_ID SBA_outcome
## 1 0 1 0 0 J.T. Realmuto realj001 success
## 2 0 1 0 0 J.T. Realmuto realj001 fail
## 3 2 1 0 0 J.T. Realmuto realj001 success
## base_state_name EV_state_0 S2_OUT_STATE
## 1 on 1st 0.859 0
## 2 on 1st 0.859 1
## 3 on 1st 0.224 2
Read in table with the changes in base state with a SBA.
SBA_state_change <- read_csv('data/SBA_state_change.csv')
## Rows: 8 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SBA_outcome
## dbl (6): base_1, base_2, base_3, S2_base_1, S2_base_2, S2_base_3
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Merge the change in base state at State 2 with the df:
event_SBA_short_Realmuto_ev_C <- event_SBA_short_Realmuto_ev_B %>%
merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))
Take a look at the new columns added to the df:
head(event_SBA_short_Realmuto_ev_C)
## base_1 base_2 base_3 SBA_outcome OUT_STATE runner_name runner_ID
## 1 1 0 0 fail 0 J.T. Realmuto realj001
## 2 1 0 0 success 0 J.T. Realmuto realj001
## 3 1 0 0 success 2 J.T. Realmuto realj001
## base_state_name EV_state_0 S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3
## 1 on 1st 0.859 1 0 0 0
## 2 on 1st 0.859 0 0 1 0
## 3 on 1st 0.224 2 0 1 0
add the expected value for State 2 in the same way as before.
Merge the expected value table with the df, but this time merger using
the State 2 base and out states:
event_SBA_short_Realmuto_ev_D <- merge(event_SBA_short_Realmuto_ev_C, expect_value,
by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))
take a look at the new columns added to the df:
head(event_SBA_short_Realmuto_ev_D)
## S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3 base_1 base_2 base_3 SBA_outcome
## 1 0 0 1 0 1 0 0 success
## 2 1 0 0 0 1 0 0 fail
## 3 2 0 1 0 1 0 0 success
## OUT_STATE runner_name runner_ID base_state_name.x EV_state_0
## 1 0 J.T. Realmuto realj001 on 1st 0.859
## 2 0 J.T. Realmuto realj001 on 1st 0.859
## 3 2 J.T. Realmuto realj001 on 1st 0.224
## base_state_name.y EV_state
## 1 on 2nd 1.100
## 2 empty 0.254
## 3 on 2nd 0.319
that is a getting a little complicated. Let’s use a table:
S2_OUT_STATE | S2_base_1 | S2_base_2 | S2_base_3 | base_1 | base_2 | base_3 | SBA_outcome | OUT_STATE | runner_name | runner_ID | base_state_name.x | EV_state_0 | base_state_name.y | EV_state |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 1 | 0 | 0 | success | 0 | J.T. Realmuto | realj001 | on 1st | 0.859 | on 2nd | 1.100 |
1 | 0 | 0 | 0 | 1 | 0 | 0 | fail | 0 | J.T. Realmuto | realj001 | on 1st | 0.859 | empty | 0.254 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | J.T. Realmuto | realj001 | on 1st | 0.224 | on 2nd | 0.319 |
see how there are now two ‘EV_state’ columns; EV_state.x &
EV_state.y These are the expected values for State 1 and State 2. Let’s
use the ‘rename’ function to rename them to somethings that makes more
sense and is easier to read and remember. Also, now you can calculate
the change in value for each of the three SBA in the data frame. Use the
mutate function to create a new column, and name it ‘change_EV’.
This is the simple subtraction between the expected value of State 1 and
State 2.
event_SBA_short_Realmuto_ev_E <- event_SBA_short_Realmuto_ev_D %>%
rename(
EV_state_1 = EV_state) %>%
mutate(change_EV= EV_state_1- EV_state_0 )
was value created over these three SBA’s? Let’s find out.
Sum the change in value over these three SBA’s.
Remember, these three are non-representative, so this summation does not
represent the players’ actual value creation or destruction.
sum(event_SBA_short_Realmuto_ev_E$change_EV)
## [1] -0.269
A positive value means the runner created value with these three SBA.
A negative value means the runner destroyed value with these thee
SBA.
Here we see a negative number; so in these three SBA’s JT Realmuto
destroyed value. Specially,
{r} sum(event_SBA_short_Realmuto_ev_E$change_EV)
runs of
value.
Remember, these three SBA’s are non-representative, so this summation does not represent the player’s typical, or normal, value creation or destruction. In JT Realmuto’s defense, these three example SBA were not selected randomly; a failed SBA was selected for variety. JT Realmuto is a excellent baserunner; as we can explore in the next section.
One player over all stolen base attempts, determine the change in expected value between state1 and state2. Over all SBA base states, and sum the net season change in value. Did a player X’s stolen base activity create run value or not?
As in Task 2a, use the read_csv()
function to read in
the SBA event file.
event_SBA <- read_csv('data/event_SBA_full.csv')
## Rows: 2532 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): runner_name, runner_ID, team_runner_cd, SBA_outcome, BAT_HAND_CD, C...
## dbl (5): OUT_STATE, base_1, base_2, base_3, BAT_LINEUP_ID
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Now we are working with the full season event file; not the simple 12 rows file that we just explored. Let’s talk a look at his larger dataframe:
head(event_SBA)
## # A tibble: 6 × 13
## runner_…¹ runne…² team_…³ OUT_S…⁴ base_1 base_2 base_3 SBA_o…⁵ BAT_L…⁶ BAT_H…⁷
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 Jose Abr… abrej0… CHA 2 1 0 0 success 4 L
## 2 Ronald A… acunr0… ATL 1 1 0 0 success 2 R
## 3 Ronald A… acunr0… ATL 0 1 0 0 success 2 R
## 4 Ronald A… acunr0… ATL 0 1 0 1 fail 2 L
## 5 Ronald A… acunr0… ATL 2 1 0 1 success 2 L
## 6 Ronald A… acunr0… ATL 2 1 0 1 success 2 L
## # … with 3 more variables: CatcherID <chr>, PIT_ID <chr>, PIT_HAND_CD <chr>,
## # and abbreviated variable names ¹runner_name, ²runner_ID, ³team_runner_cd,
## # ⁴OUT_STATE, ⁵SBA_outcome, ⁶BAT_LINEUP_ID, ⁷BAT_HAND_CD
Most things look similar; there are also a few new columns. There is
a team_runner_cd
variable; this provides the team of the
runner. We will use this later in Task 4.
Let’s find out how many events (stolen base attempts) are in this
file? There are a few ways to determine this. One way is to use the
length()
function, this shows how many rows are in the
column runner_ID
in the dataframe `event_SBA.
length(event_SBA$runner_ID)
## [1] 2532
We can also ask a simple question of this dataframe: Over all players
what was the number success and failed SBA? The sum of these two
potential outcomes should be the same as the above length()
result.
table(event_SBA$SBA_outcome)
##
## fail success
## 579 1953
Now we will repeat the same actions as in Task 1 and 2, but with the full season record for one player. Since you have already read in the expected value table into the global environment, that task doesn’t need to be repeated.
Use the filter()
function on the full season event file
to select one player. This time let’s select Bryce Harper:
event_SBA_full_harper <- event_SBA %>%
filter(runner_ID=='harpb003')
Take a look on the new Harper full season SBA dataframe:
head(event_SBA_full_harper)
## # A tibble: 6 × 13
## runner_…¹ runne…² team_…³ OUT_S…⁴ base_1 base_2 base_3 SBA_o…⁵ BAT_L…⁶ BAT_H…⁷
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 Bryce Ha… harpb0… PHI 0 1 0 0 success 4 R
## 2 Bryce Ha… harpb0… PHI 2 1 0 0 success 4 R
## 3 Bryce Ha… harpb0… PHI 2 1 0 1 success 3 R
## 4 Bryce Ha… harpb0… PHI 2 1 0 0 success 4 R
## 5 Bryce Ha… harpb0… PHI 1 1 0 1 success 3 L
## 6 Bryce Ha… harpb0… PHI 2 1 0 0 success 4 R
## # … with 3 more variables: CatcherID <chr>, PIT_ID <chr>, PIT_HAND_CD <chr>,
## # and abbreviated variable names ¹runner_name, ²runner_ID, ³team_runner_cd,
## # ⁴OUT_STATE, ⁵SBA_outcome, ⁶BAT_LINEUP_ID, ⁷BAT_HAND_CD
Use the table()
function to calculate how many SBA’s
Harper had over the full season.
table(event_SBA_full_harper$runner_name)
##
## Bryce Harper
## 16
The above shows how many SBA attempts this single player had over the
entire season, but what were the outcomes? Use the table()
function again, but this time over the SBA_outcome
column.
table(event_SBA_full_harper$SBA_outcome)
##
## fail success
## 2 14
As you did earlier, use the merge()
function to add the
expected value for each of the 24 base-out states.
event_SBA_full_harper_ev <- merge(event_SBA_full_harper, expect_value,
by=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))
Take a look at this dataframe of a single player. Note how the expected
value column was added for each base-out state
OUT_STATE | base_1 | base_2 | base_3 | runner_name | runner_ID | team_runner_cd | SBA_outcome | BAT_LINEUP_ID | BAT_HAND_CD | CatcherID | PIT_ID | PIT_HAND_CD | base_state_name | EV_state |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | stalj001 | ramiy001 | R | on 1st | 0.859 |
0 | 1 | 1 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | mccab002 | tehej001 | R | 1st & 2nd | 1.437 |
1 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 3 | R | carav001 | darvy001 | R | on 1st | 0.509 |
Determine State 2, base on the outcome of the SBA and determine the
outcome base-out state. Calculate the change in out state based on
outcome of the stolen base attempt.
Success, a successful SBA does not change the out state, or fail, a SBA
that leads to an out increases the out state buy 1 out.
event_SBA_full_harper_ev_B <- event_SBA_full_harper_ev %>%
mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))
View the modified dataframe:
OUT_STATE | base_1 | base_2 | base_3 | runner_name | runner_ID | team_runner_cd | SBA_outcome | BAT_LINEUP_ID | BAT_HAND_CD | CatcherID | PIT_ID | PIT_HAND_CD | base_state_name | EV_state | S2_OUT_STATE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | stalj001 | ramiy001 | R | on 1st | 0.859 | 0 |
0 | 1 | 1 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | mccab002 | tehej001 | R | 1st & 2nd | 1.437 | 0 |
1 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 3 | R | carav001 | darvy001 | R | on 1st | 0.509 | 1 |
1 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | grany001 | housa001 | R | on 1st | 0.509 | 1 |
1 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 5 | R | barnt001 | bowmm001 | R | on 1st | 0.509 | 1 |
1 | 1 | 0 | 1 | Bryce Harper | harpb003 | PHI | success | 3 | L | holab001 | alcas001 | R | corners | 1.130 | 1 |
1 | 1 | 0 | 1 | Bryce Harper | harpb003 | PHI | fail | 4 | R | alfaj002 | richt001 | R | corners | 1.130 | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | mccab002 | sorom001 | R | on 1st | 0.224 | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | mccab002 | soboc001 | R | on 1st | 0.224 | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | ramow001 | matzs001 | L | on 1st | 0.224 | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | castj006 | berrj001 | R | on 1st | 0.224 | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | flowt001 | grees005 | R | on 1st | 0.224 | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | casac001 | grays001 | R | on 1st | 0.224 | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | fail | 3 | R | kellc002 | mcfat001 | L | on 1st | 0.224 | 3 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | alfaj002 | quijj001 | L | on 1st | 0.224 | 2 |
Previously we created the dataframe SBA_state_change
by
reading in the ‘SBA_state_change.csv’ file. Merge the change in base
state at State 1, post SBA, with the df:
event_SBA_full_harper_ev_C <- event_SBA_full_harper_ev_B %>%
merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))
Take a look at the new columns added to the df:
base_1 | base_2 | base_3 | SBA_outcome | OUT_STATE | runner_name | runner_ID | team_runner_cd | BAT_LINEUP_ID | BAT_HAND_CD | CatcherID | PIT_ID | PIT_HAND_CD | base_state_name | EV_state | S2_OUT_STATE | S2_base_1 | S2_base_2 | S2_base_3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | fail | 2 | Bryce Harper | harpb003 | PHI | 3 | R | kellc002 | mcfat001 | L | on 1st | 0.224 | 3 | 0 | 0 | 0 |
1 | 0 | 0 | success | 0 | Bryce Harper | harpb003 | PHI | 4 | R | stalj001 | ramiy001 | R | on 1st | 0.859 | 0 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | ramow001 | matzs001 | L | on 1st | 0.224 | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 3 | R | carav001 | darvy001 | R | on 1st | 0.509 | 1 | 0 | 1 | 0 |
1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 4 | R | grany001 | housa001 | R | on 1st | 0.509 | 1 | 0 | 1 | 0 |
1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 5 | R | barnt001 | bowmm001 | R | on 1st | 0.509 | 1 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | castj006 | berrj001 | R | on 1st | 0.224 | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | sorom001 | R | on 1st | 0.224 | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | soboc001 | R | on 1st | 0.224 | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | alfaj002 | quijj001 | L | on 1st | 0.224 | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | flowt001 | grees005 | R | on 1st | 0.224 | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | casac001 | grays001 | R | on 1st | 0.224 | 2 | 0 | 1 | 0 |
1 | 0 | 1 | fail | 1 | Bryce Harper | harpb003 | PHI | 4 | R | alfaj002 | richt001 | R | corners | 1.130 | 2 | 0 | 0 | 1 |
1 | 0 | 1 | success | 1 | Bryce Harper | harpb003 | PHI | 3 | L | holab001 | alcas001 | R | corners | 1.130 | 1 | 0 | 1 | 1 |
1 | 0 | 1 | success | 2 | Bryce Harper | harpb003 | PHI | 3 | R | casac001 | grays001 | R | corners | 0.478 | 2 | 0 | 1 | 1 |
Add the expected value for State 1 in the same way as before.
Merge the expected value table with the df, but this time merger using
the State 2 base and out states:
event_SBA_full_harper_ev_D <- merge(event_SBA_full_harper_ev_C, expect_value,
by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))
take a look at the new columns added to the df:
S2_OUT_STATE | S2_base_1 | S2_base_2 | S2_base_3 | base_1 | base_2 | base_3 | SBA_outcome | OUT_STATE | runner_name | runner_ID | team_runner_cd | BAT_LINEUP_ID | BAT_HAND_CD | CatcherID | PIT_ID | PIT_HAND_CD | base_state_name.x | EV_state.x | base_state_name.y | EV_state.y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 1 | 0 | 0 | success | 0 | Bryce Harper | harpb003 | PHI | 4 | R | stalj001 | ramiy001 | R | on 1st | 0.859 | on 2nd | 1.100 |
0 | 1 | 0 | 1 | 1 | 1 | 0 | success | 0 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | tehej001 | R | 1st & 2nd | 1.437 | corners | 1.784 |
1 | 0 | 1 | 0 | 1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 4 | R | grany001 | housa001 | R | on 1st | 0.509 | on 2nd | 0.664 |
1 | 0 | 1 | 0 | 1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 5 | R | barnt001 | bowmm001 | R | on 1st | 0.509 | on 2nd | 0.664 |
1 | 0 | 1 | 0 | 1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 3 | R | carav001 | darvy001 | R | on 1st | 0.509 | on 2nd | 0.664 |
1 | 0 | 1 | 1 | 1 | 0 | 1 | success | 1 | Bryce Harper | harpb003 | PHI | 3 | L | holab001 | alcas001 | R | corners | 1.130 | 2nd & 3rd | 1.376 |
2 | 0 | 0 | 1 | 1 | 0 | 1 | fail | 1 | Bryce Harper | harpb003 | PHI | 4 | R | alfaj002 | richt001 | R | corners | 1.130 | on 3rd | 0.353 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | ramow001 | matzs001 | L | on 1st | 0.224 | on 2nd | 0.319 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | sorom001 | R | on 1st | 0.224 | on 2nd | 0.319 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | soboc001 | R | on 1st | 0.224 | on 2nd | 0.319 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | alfaj002 | quijj001 | L | on 1st | 0.224 | on 2nd | 0.319 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | castj006 | berrj001 | R | on 1st | 0.224 | on 2nd | 0.319 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | casac001 | grays001 | R | on 1st | 0.224 | on 2nd | 0.319 |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | flowt001 | grees005 | R | on 1st | 0.224 | on 2nd | 0.319 |
2 | 0 | 1 | 1 | 1 | 0 | 1 | success | 2 | Bryce Harper | harpb003 | PHI | 3 | R | casac001 | grays001 | R | corners | 0.478 | 2nd & 3rd | 0.580 |
See how there are now two ‘EV_state’ columns; EV_state.x &
EV_state.y These are the expected values for State 1 and State 2. Let’s
use the rename()
function to rename the column to
somethings that makes more sense and is easier to read and remember.
Also, now you can calculate the change in value for each of the three
SBA in the data frame. Use the mutate function to create a new column,
and name it ‘change_EV’.
This is the simple subtraction between the expected value of State 1 and
State 2.
event_SBA_full_harper_ev_E <- event_SBA_full_harper_ev_D %>%
rename(EV_state_1 = EV_state.x,
EV_state_2 = EV_state.y) %>%
mutate(change_EV= EV_state_2- EV_state_1 )
was value created over these three SBA’s? Let’s find out.
Sum the change in value over the full season of SBA’s.
sum(event_SBA_full_harper_ev_E$change_EV)
## [1] 1.065
A positive value means the runner created value with their SBA’s over the full season.
Now we are ready to explore a larger question: Did the team create value (or not) by SBA? Over an entire team, over all stolen base attempts, determine the change in expected value between state0 and state1. Over all SBA base states.
The event file that we need is the same as the previous task; the
full season SBA events. Determine the total number of SBA per team over
the full season. Use the table()
function over the
team_runner_cd
column.
table(event_SBA$team_runner_cd)
##
## ANA ARI ATL BAL BOS CHA CHN CIN CLE COL DET HOU KCA LAN MIA MIL MIN NYA NYN OAK
## 75 95 101 88 83 72 54 95 115 82 73 78 101 60 69 96 43 61 74 56
## PHI PIT SDN SEA SFN SLN TBA TEX TOR WAS
## 79 79 87 136 55 114 106 132 53 120
What team attempted the most SBA? Is there much variability across teams?
Similar to the previous tasks, next apply the expected value table to the event file containing all SBA for the season. Also at the Out state change in State 1.
event_SBA_team <- event_SBA %>%
merge(expect_value, by=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))
And complete the State 1 by adding the change in base states:
event_SBA_team_A <- event_SBA_team %>%
merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))
Finally, reapply the expected value to State 1:
event_SBA_team_B <- event_SBA_team_A %>%
merge(expect_value,
by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
rename(EV_state_1 = EV_state.x,
EV_state_2 = EV_state.y,
base_state_name_1 = base_state_name.x,
base_state_name_2 = base_state_name.y) %>%
mutate(delta_EV = EV_state_2-EV_state_1)
Now we have a dataframe the calculates the change in expected value for every stolen base attempt in one season. Let’s see if over the entire league, over the full season, did SBA’s create or destroy value on average. To do this we will sum the change in expected value over all SBA’s and then divide by the number of SBA’s.
sum(event_SBA_team_B$delta_EV)/length(event_SBA_team_B$delta_EV)
## [1] 0.02210269
A positive results indicates that over all stolen base attempts from all players stolen bases attempts create value.
But, our task is to explore the team level change in expected value.
As a reminder, the dataframe contains a column, team_runner_cd, that
identifies the team of the runner for each SBA event.
First we need to change the variable type of
team_runner_cd
.
event_SBA_team_B$team_runner_cd <- as.factor(event_SBA_team_B$team_runner_cd)
Next calculate the net expect value change by team from all players.
event_SBA_team_C <- event_SBA_team_B %>%
group_by(team_runner_cd) %>%
summarise(tot_delta_EV = sum(delta_EV))
Visualize the net expected value change per team from SBA
ggplot(event_SBA_team_C) +
geom_col(aes(y=tot_delta_EV, x=reorder(team_runner_cd, desc(tot_delta_EV)))) +
theme_bw() +
theme(axis.text.x=element_text(angle=90,hjust=1)) +
xlab(NULL) +
ylab('cumulative EV change') +
ggtitle('Team wide change in EV from SBA') +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, color='red')
event_SBA_team_indiv <- event_SBA_team_B %>%
group_by(runner_ID) %>%
mutate(counter=1) %>%
summarise(delta_EV_tot= sum(delta_EV),
SBA_tot= sum(counter))
ggplot(event_SBA_team_indiv) +
geom_hline(yintercept = 0, color='red') +
geom_point(aes(x=SBA_tot, y=delta_EV_tot), alpha=0.5) +
theme_bw() + xlab('SBA per player') +
scale_y_continuous(breaks=seq(-2.5, 5, by=0.5),
name='net change in expected value from SBA')