~by hand, for one player over three stolen base attempts, determine the change in expected value between state1 and state2. Over a variety of SBA base states, not just (1,0,0) to (0,1,0)
read in the csv file that contains the expected value table for each base state - out state. use the ‘read_csv’ function to read the file and assign it to the data frame ‘expect_value’.
expect_value <- read_csv('EV_base_out_states.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## EV_state = col_double(),
## OUT_STATE = col_double(),
## base_1 = col_double(),
## base_2 = col_double(),
## base_3 = col_double(),
## base_state_name = col_character()
## )
And then look at the dataframe using the ‘head’ function. In R Studio, you can view a dataframe by clicking on the table data icon on the right, in the environment pane. Try that too.
head(expect_value)
## # A tibble: 6 × 6
## EV_state OUT_STATE base_1 base_2 base_3 base_state_name
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 0.481 0 0 0 0 empty
## 2 0.859 0 1 0 0 on 1st
## 3 1.1 0 0 1 0 on 2nd
## 4 1.44 0 1 1 0 1st & 2nd
## 5 1.35 0 0 0 1 on 3rd
## 6 1.78 0 1 0 1 corners
the ‘head’ functions shows column names and the top 6 rows.
base_1 shows if there is runner on 1st base (1), or no runner (0).
Add a new column to the expected value table; baserunners_tot, the total number of base runners on all three bases for each base state. Use the ‘mutate’ function to create the new column based on the addition of all three bases.
expect_value_A <- expect_value %>%
mutate(baserunners_tot = (base_1+ base_2+ base_3))
Let’s take a quick look at the expect value. Plot the expect values of different out states.
Does that make sense? The expected value is highest with zero outs and decreases with increasing outs. The expected value with 3 outs is zero; the inning is over. Within each out state, the more runners on base, the higher the expected value.
read in the event file & select a player read in the abbreviated stolen base attempt event file
event_BSA_short <- read_csv('event_SBA_short.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## runner_name = col_character(),
## runner_ID = col_character(),
## OUT_STATE = col_double(),
## base_1 = col_double(),
## base_2 = col_double(),
## base_3 = col_double(),
## SBA_outcome = col_character()
## )
you can see the column names of the df above. There are 6 columns.
Let’s take a looks at the table, use the ‘head’ function to show the event file
head(event_BSA_short)
## # A tibble: 6 × 7
## runner_name runner_ID OUT_STATE base_1 base_2 base_3 SBA_outcome
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Bryce Harper harpb003 0 1 0 0 success
## 2 Bryce Harper harpb003 2 0 1 0 success
## 3 Bryce Harper harpb003 2 1 0 0 success
## 4 Cesar Hernandez hernc005 1 0 1 0 fail
## 5 Cesar Hernandez hernc005 1 1 0 0 fail
## 6 Cesar Hernandez hernc005 2 1 0 0 success
see what runner are available in the event file:
table(event_BSA_short$runner_name)
##
## Bryce Harper Cesar Hernandez J.T. Realmuto Jean Segura
## 3 3 3 3
there are four different runners, and each runner has 3 observations (i.e. three SBA).
filter the event file to select only one player
event_BSA_short_harper <- event_BSA_short %>%
filter(runner_ID=='harpb003')
take a look on the new dataframwe
head(event_BSA_short_harper)
## # A tibble: 3 × 7
## runner_name runner_ID OUT_STATE base_1 base_2 base_3 SBA_outcome
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Bryce Harper harpb003 0 1 0 0 success
## 2 Bryce Harper harpb003 2 0 1 0 success
## 3 Bryce Harper harpb003 2 1 0 0 success
table(event_BSA_short_harper$runner_name)
##
## Bryce Harper
## 3
Use the merge function to add the expected value for each of the 24 base-out states.
event_BSA_short_harper_ev <- merge(event_BSA_short_harper, expect_value,
by=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))
take a look at this dataframe of a single player. Note how the expected value column was added for each base-out state
head(event_BSA_short_harper_ev)
## OUT_STATE base_1 base_2 base_3 runner_name runner_ID SBA_outcome EV_state
## 1 0 1 0 0 Bryce Harper harpb003 success 0.859
## 2 2 0 1 0 Bryce Harper harpb003 success 0.319
## 3 2 1 0 0 Bryce Harper harpb003 success 0.224
## base_state_name
## 1 on 1st
## 2 on 2nd
## 3 on 1st
determing State 2, the base state and out state after a SBA.
State 2 out state: Calculate the change in out state based on outcome of the stolen base attempt. Success, a successful SBA does not change the out state, or fail, a SBA that leads to an out increases the out state buy 1 out.
event_BSA_short_harper_ev_B <- event_BSA_short_harper_ev %>%
mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))
view the modified dataframe
head(event_BSA_short_harper_ev_B)
read in table with the changes in base state with a SBA
SBA_state_change <- read_csv('SBA_state_change.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## base_1 = col_double(),
## base_2 = col_double(),
## base_3 = col_double(),
## SBA_outcome = col_character(),
## S2_base_1 = col_double(),
## S2_base_2 = col_double(),
## S2_base_3 = col_double()
## )
merge the change in base state at State 2 with the df:
event_BSA_short_harper_ev_C <- event_BSA_short_harper_ev_B %>%
merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))
take a look at the new columns added to the df:
head(event_BSA_short_harper_ev_C)
## base_1 base_2 base_3 SBA_outcome OUT_STATE runner_name runner_ID EV_state
## 1 0 1 0 success 2 Bryce Harper harpb003 0.319
## 2 1 0 0 success 0 Bryce Harper harpb003 0.859
## 3 1 0 0 success 2 Bryce Harper harpb003 0.224
## base_state_name S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3
## 1 on 2nd 2 0 0 1
## 2 on 1st 0 0 1 0
## 3 on 1st 2 0 1 0
add the expected value for State 2 in the same way as before.
Merge the expected value table with the df, but this time merger using the State 2 base and out states:
event_BSA_short_harper_ev_D <- merge(event_BSA_short_harper_ev_C, expect_value,
by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))
take a look at the new columns added to the df:
head(event_BSA_short_harper_ev_D)
## S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3 base_1 base_2 base_3 SBA_outcome
## 1 0 0 1 0 1 0 0 success
## 2 2 0 0 1 0 1 0 success
## 3 2 0 1 0 1 0 0 success
## OUT_STATE runner_name runner_ID EV_state.x base_state_name.x EV_state.y
## 1 0 Bryce Harper harpb003 0.859 on 1st 1.100
## 2 2 Bryce Harper harpb003 0.319 on 2nd 0.353
## 3 2 Bryce Harper harpb003 0.224 on 1st 0.319
## base_state_name.y
## 1 on 2nd
## 2 on 3rd
## 3 on 2nd
that is a getting a little complicated. Let’s use a table:
S2_OUT_STATE | S2_base_1 | S2_base_2 | S2_base_3 | base_1 | base_2 | base_3 | SBA_outcome | OUT_STATE | runner_name | runner_ID | EV_state.x | base_state_name.x | EV_state.y | base_state_name.y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 1 | 0 | 0 | success | 0 | Bryce Harper | harpb003 | 0.859 | on 1st | 1.100 | on 2nd |
2 | 0 | 0 | 1 | 0 | 1 | 0 | success | 2 | Bryce Harper | harpb003 | 0.319 | on 2nd | 0.353 | on 3rd |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | 0.224 | on 1st | 0.319 | on 2nd |
see how there are now two ‘EV_state’ columns; EV_state.x & EV_state.y These are the expected values for State 1 and State 2. Let’s use the ‘rename’ function to rename them to somethings that makes more sense and is easier to read and remember. Also, now you can calculate the change in value for each of the three SBA in the data frame. Use the mutate function to create a new column, and name it ‘change_EV’.
This is the simple subtraction between the expeted value of State 1 and State 2.
event_BSA_short_harper_ev_E <- event_BSA_short_harper_ev_D %>%
rename(EV_state_1 = EV_state.x,
EV_state_2 = EV_state.y) %>%
mutate(change_EV= EV_state_2- EV_state_1 )
was value created over these three SBA’s? Let’s find out.
Sum the change in value over these three SBA’s.
Remember, these three are non-representative, so this sumation does not represent the players’ actual value creation or descruction.
sum(event_BSA_short_harper_ev_E$change_EV)
## [1] 0.37
A positive value means the runner created value with these three SBA.
{maybe do it for a 2nd player and compare who created more value} ~by hand for one player over all stolen base attempts, determine the change in expected value between state1 and state2. Over all SBA base states, and sum the net season change in value. Did a player X’s stolen base activity create run value or not?
read in the stolen base attempt event file
event_BSA <- read_csv('event_SBA_full.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## runner_name = col_character(),
## runner_ID = col_character(),
## team_runner_cd = col_character(),
## OUT_STATE = col_double(),
## base_1 = col_double(),
## base_2 = col_double(),
## base_3 = col_double(),
## SBA_outcome = col_character(),
## BAT_LINEUP_ID = col_double(),
## BAT_HAND_CD = col_character(),
## CatcherID = col_character(),
## PIT_ID = col_character(),
## PIT_HAND_CD = col_character()
## )
How many events are in this file? There are a few ways to determine this. One way is to use the ‘length’ function
length(event_BSA$runner_ID)
## [1] 2532
that shows how many SBA attempts this single player had over the entire season. Over all players what was the number sucess and failed SBA?
table(event_BSA$SBA_outcome)
##
## fail success
## 579 1953
head(event_BSA)
## # A tibble: 6 × 13
## runner_name runner_ID team_runner_cd OUT_STATE base_1 base_2 base_3
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Jose Abreu abrej003 CHA 2 1 0 0
## 2 Ronald Acuna acunr001 ATL 1 1 0 0
## 3 Ronald Acuna acunr001 ATL 0 1 0 0
## 4 Ronald Acuna acunr001 ATL 0 1 0 1
## 5 Ronald Acuna acunr001 ATL 2 1 0 1
## 6 Ronald Acuna acunr001 ATL 2 1 0 1
## # … with 6 more variables: SBA_outcome <chr>, BAT_LINEUP_ID <dbl>,
## # BAT_HAND_CD <chr>, CatcherID <chr>, PIT_ID <chr>, PIT_HAND_CD <chr>
show the event file
Now we will repeat the same actions as in Task 1, but with the full record for one player. Since you have already read in the expected value table in the global environment, that task doesn’t need to be repeated.
filter the event file to select only one player
event_BSA_full_harper <- event_BSA %>%
filter(runner_ID=='harpb003')
take a look on the new dataframe
head(event_BSA_full_harper)
## # A tibble: 6 × 13
## runner_name runner_ID team_runner_cd OUT_STATE base_1 base_2 base_3
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Bryce Harper harpb003 PHI 0 1 0 0
## 2 Bryce Harper harpb003 PHI 2 1 0 0
## 3 Bryce Harper harpb003 PHI 2 1 0 1
## 4 Bryce Harper harpb003 PHI 2 1 0 0
## 5 Bryce Harper harpb003 PHI 1 1 0 1
## 6 Bryce Harper harpb003 PHI 2 1 0 0
## # … with 6 more variables: SBA_outcome <chr>, BAT_LINEUP_ID <dbl>,
## # BAT_HAND_CD <chr>, CatcherID <chr>, PIT_ID <chr>, PIT_HAND_CD <chr>
table(event_BSA_full_harper$runner_name)
##
## Bryce Harper
## 16
length(event_BSA_full_harper$runner_ID)
## [1] 16
that shows how many SBA attempts this single player had over the entire season
table(event_BSA_full_harper$SBA_outcome)
##
## fail success
## 2 14
Use the merge function to add the expected value for each of the 24 base-out states.
event_BSA_full_harper_ev <- merge(event_BSA_full_harper, expect_value,
by=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))
take a look at this dataframe of a single player. Note how the expected value column was added for each base-out state
OUT_STATE | base_1 | base_2 | base_3 | runner_name | runner_ID | team_runner_cd | SBA_outcome | BAT_LINEUP_ID | BAT_HAND_CD | CatcherID | PIT_ID | PIT_HAND_CD | EV_state | base_state_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | stalj001 | ramiy001 | R | 0.859 | on 1st |
0 | 1 | 1 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | mccab002 | tehej001 | R | 1.437 | 1st & 2nd |
1 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 3 | R | carav001 | darvy001 | R | 0.509 | on 1st |
determining State 2, the base state and out state after a SBA.
State 2 out state: Calculate the change in out state based on outcome of the stolen base attempt. Success, a successful SBA does not change the out state, or fail, a SBA that leads to an out increases the out state buy 1 out.
event_BSA_full_harper_ev_B <- event_BSA_full_harper_ev %>%
mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))
view the modified dataframe
OUT_STATE | base_1 | base_2 | base_3 | runner_name | runner_ID | team_runner_cd | SBA_outcome | BAT_LINEUP_ID | BAT_HAND_CD | CatcherID | PIT_ID | PIT_HAND_CD | EV_state | base_state_name | S2_OUT_STATE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | stalj001 | ramiy001 | R | 0.859 | on 1st | 0 |
0 | 1 | 1 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | mccab002 | tehej001 | R | 1.437 | 1st & 2nd | 0 |
1 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 3 | R | carav001 | darvy001 | R | 0.509 | on 1st | 1 |
1 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | grany001 | housa001 | R | 0.509 | on 1st | 1 |
1 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 5 | R | barnt001 | bowmm001 | R | 0.509 | on 1st | 1 |
1 | 1 | 0 | 1 | Bryce Harper | harpb003 | PHI | success | 3 | L | holab001 | alcas001 | R | 1.130 | corners | 1 |
1 | 1 | 0 | 1 | Bryce Harper | harpb003 | PHI | fail | 4 | R | alfaj002 | richt001 | R | 1.130 | corners | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | mccab002 | sorom001 | R | 0.224 | on 1st | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | mccab002 | soboc001 | R | 0.224 | on 1st | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | ramow001 | matzs001 | L | 0.224 | on 1st | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | castj006 | berrj001 | R | 0.224 | on 1st | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | flowt001 | grees005 | R | 0.224 | on 1st | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | casac001 | grays001 | R | 0.224 | on 1st | 2 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | fail | 3 | R | kellc002 | mcfat001 | L | 0.224 | on 1st | 3 |
2 | 1 | 0 | 0 | Bryce Harper | harpb003 | PHI | success | 4 | R | alfaj002 | quijj001 | L | 0.224 | on 1st | 2 |
read in table with the changes in base state with a SBA
SBA_state_change <- read_csv('SBA_state_change.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## base_1 = col_double(),
## base_2 = col_double(),
## base_3 = col_double(),
## SBA_outcome = col_character(),
## S2_base_1 = col_double(),
## S2_base_2 = col_double(),
## S2_base_3 = col_double()
## )
merge the change in base state at State 2 with the df:
event_BSA_full_harper_ev_C <- event_BSA_full_harper_ev_B %>%
merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))
take a look at the new columns added to the df:
base_1 | base_2 | base_3 | SBA_outcome | OUT_STATE | runner_name | runner_ID | team_runner_cd | BAT_LINEUP_ID | BAT_HAND_CD | CatcherID | PIT_ID | PIT_HAND_CD | EV_state | base_state_name | S2_OUT_STATE | S2_base_1 | S2_base_2 | S2_base_3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 0 | fail | 2 | Bryce Harper | harpb003 | PHI | 3 | R | kellc002 | mcfat001 | L | 0.224 | on 1st | 3 | 0 | 0 | 0 |
1 | 0 | 0 | success | 0 | Bryce Harper | harpb003 | PHI | 4 | R | stalj001 | ramiy001 | R | 0.859 | on 1st | 0 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | ramow001 | matzs001 | L | 0.224 | on 1st | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 3 | R | carav001 | darvy001 | R | 0.509 | on 1st | 1 | 0 | 1 | 0 |
1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 4 | R | grany001 | housa001 | R | 0.509 | on 1st | 1 | 0 | 1 | 0 |
1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 5 | R | barnt001 | bowmm001 | R | 0.509 | on 1st | 1 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | castj006 | berrj001 | R | 0.224 | on 1st | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | sorom001 | R | 0.224 | on 1st | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | soboc001 | R | 0.224 | on 1st | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | alfaj002 | quijj001 | L | 0.224 | on 1st | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | flowt001 | grees005 | R | 0.224 | on 1st | 2 | 0 | 1 | 0 |
1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | casac001 | grays001 | R | 0.224 | on 1st | 2 | 0 | 1 | 0 |
1 | 0 | 1 | fail | 1 | Bryce Harper | harpb003 | PHI | 4 | R | alfaj002 | richt001 | R | 1.130 | corners | 2 | 0 | 0 | 1 |
1 | 0 | 1 | success | 1 | Bryce Harper | harpb003 | PHI | 3 | L | holab001 | alcas001 | R | 1.130 | corners | 1 | 0 | 1 | 1 |
1 | 0 | 1 | success | 2 | Bryce Harper | harpb003 | PHI | 3 | R | casac001 | grays001 | R | 0.478 | corners | 2 | 0 | 1 | 1 |
add the expected value for State 2 in the same way as before.
Merge the expected value table with the df, but this time merger using the State 2 base and out states:
event_BSA_full_harper_ev_D <- merge(event_BSA_full_harper_ev_C, expect_value,
by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))
take a look at the new columns added to the df:
S2_OUT_STATE | S2_base_1 | S2_base_2 | S2_base_3 | base_1 | base_2 | base_3 | SBA_outcome | OUT_STATE | runner_name | runner_ID | team_runner_cd | BAT_LINEUP_ID | BAT_HAND_CD | CatcherID | PIT_ID | PIT_HAND_CD | EV_state.x | base_state_name.x | EV_state.y | base_state_name.y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 1 | 0 | 0 | success | 0 | Bryce Harper | harpb003 | PHI | 4 | R | stalj001 | ramiy001 | R | 0.859 | on 1st | 1.100 | on 2nd |
0 | 1 | 0 | 1 | 1 | 1 | 0 | success | 0 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | tehej001 | R | 1.437 | 1st & 2nd | 1.784 | corners |
1 | 0 | 1 | 0 | 1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 4 | R | grany001 | housa001 | R | 0.509 | on 1st | 0.664 | on 2nd |
1 | 0 | 1 | 0 | 1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 5 | R | barnt001 | bowmm001 | R | 0.509 | on 1st | 0.664 | on 2nd |
1 | 0 | 1 | 0 | 1 | 0 | 0 | success | 1 | Bryce Harper | harpb003 | PHI | 3 | R | carav001 | darvy001 | R | 0.509 | on 1st | 0.664 | on 2nd |
1 | 0 | 1 | 1 | 1 | 0 | 1 | success | 1 | Bryce Harper | harpb003 | PHI | 3 | L | holab001 | alcas001 | R | 1.130 | corners | 1.376 | 2nd & 3rd |
2 | 0 | 0 | 1 | 1 | 0 | 1 | fail | 1 | Bryce Harper | harpb003 | PHI | 4 | R | alfaj002 | richt001 | R | 1.130 | corners | 0.353 | on 3rd |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | ramow001 | matzs001 | L | 0.224 | on 1st | 0.319 | on 2nd |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | sorom001 | R | 0.224 | on 1st | 0.319 | on 2nd |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | mccab002 | soboc001 | R | 0.224 | on 1st | 0.319 | on 2nd |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | alfaj002 | quijj001 | L | 0.224 | on 1st | 0.319 | on 2nd |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | castj006 | berrj001 | R | 0.224 | on 1st | 0.319 | on 2nd |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | casac001 | grays001 | R | 0.224 | on 1st | 0.319 | on 2nd |
2 | 0 | 1 | 0 | 1 | 0 | 0 | success | 2 | Bryce Harper | harpb003 | PHI | 4 | R | flowt001 | grees005 | R | 0.224 | on 1st | 0.319 | on 2nd |
2 | 0 | 1 | 1 | 1 | 0 | 1 | success | 2 | Bryce Harper | harpb003 | PHI | 3 | R | casac001 | grays001 | R | 0.478 | corners | 0.580 | 2nd & 3rd |
see how there are now two ‘EV_state’ columns; EV_state.x & EV_state.y These are the expected values for State 1 and State 2. Let’s use the ‘rename’ function to rename them to somethings that makes more sense and is easier to read and remember. Also, now you can calculate the change in value for each of the three SBA in the data frame. Use the mutate function to create a new column, and name it ‘change_EV’.
This is the simple subtraction between the expeted value of State 1 and State 2.
event_BSA_full_harper_ev_E <- event_BSA_full_harper_ev_D %>%
rename(EV_state_1 = EV_state.x,
EV_state_2 = EV_state.y) %>%
mutate(change_EV= EV_state_2- EV_state_1 )
was value created over these three SBA’s? Let’s find out.
Sum the change in value over these three SBA’s.
Remember, these three are non-representative, so this sumation does not represent the players’ actual value creation or descruction.
sum(event_BSA_full_harper_ev_E$change_EV)
## [1] 1.065
A positive value means the runner created value with thier SBA over the full season.
Did the team create value (or not) by SBA? Over an entire team, over all stolen base attempts, determine the change in expected value between state1 and state2. Over all SBA base states.
the event file is the same as the previous task; the full season SBA events see the total number of SBA for each team:
table(event_BSA$team_runner_cd)
##
## ANA ARI ATL BAL BOS CHA CHN CIN CLE COL DET HOU KCA LAN MIA MIL MIN NYA NYN OAK
## 75 95 101 88 83 72 54 95 115 82 73 78 101 60 69 96 43 61 74 56
## PHI PIT SDN SEA SFN SLN TBA TEX TOR WAS
## 79 79 87 136 55 114 106 132 53 120
And plot this table using ggplot
ggplot(event_BSA, aes(x=team_runner_cd)) +
geom_bar() +
theme(axis.text.x=element_text(angle=90,hjust=1)) +
ylab('stolen base attempts / team') +
xlab(NULL)
Similar to the previous tasks, next apply the expected value table to the event file containing all SBA for the season. Also at the Out state change in State 2.
event_BSA_team <- event_BSA %>%
merge(expect_value, by=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))
And complete the State 2 by adding the change in base states:
event_BSA_team_A <- event_BSA_team %>%
merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))
Finally, reapply the expected value to State 2:
event_BSA_team_B <- event_BSA_team_A %>%
merge(expect_value,
by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
rename(EV_state_1 = EV_state.x,
EV_state_2 = EV_state.y,
base_state_name_1 = base_state_name.x,
base_state_name_2 = base_state_name.y) %>%
mutate(delta_EV = EV_state_2-EV_state_1)
event_BSA_team_B$team_runner_cd <- as.factor(event_BSA_team_B$team_runner_cd)
hist(event_BSA_team_B$delta_EV, breaks=20)
sum(event_BSA_team_B$delta_EV)
## [1] 55.964
sum(event_BSA_team_B$delta_EV)/length(event_BSA_team_B$delta_EV)
## [1] 0.02210269
event_BSA_team_C <- event_BSA_team_B %>%
group_by(team_runner_cd) %>%
summarise(tot_delta_EV = sum(delta_EV))
ggplot(event_BSA_team_C) +
geom_col(aes(y=tot_delta_EV, x=reorder(team_runner_cd, desc(tot_delta_EV)))) +
theme_bw() +
theme(axis.text.x=element_text(angle=90,hjust=1)) +
xlab(NULL) +
ylab('cumulative EV change') +
ggtitle('Team wide change in EV from SBA') +
theme(plot.title = element_text(hjust = 0.5)) +
geom_hline(yintercept = 0, color='red')
event_BSA_team_indiv <- event_BSA_team_B %>%
group_by(runner_ID) %>%
mutate(counter=1) %>%
summarise(delta_EV_tot= sum(delta_EV),
SBA_tot= sum(counter))
hist(event_BSA_team_indiv$delta_EV_tot, breaks=40)
max(event_BSA_team_indiv$delta_EV_tot)
## [1] 4.574
ggplot(event_BSA_team_indiv) +
geom_hline(yintercept = 0, color='red') +
geom_point(aes(x=SBA_tot, y=delta_EV_tot), alpha=0.5) +
theme_bw() + xlab('SBA per player') +
scale_y_continuous(breaks=seq(-2.5, 5, by=0.5),
name='net change in expected value from SBA')