Task 1: by hand, 1 player, 3 SBA & outcomes

~by hand, for one player over three stolen base attempts, determine the change in expected value between state1 and state2. Over a variety of SBA base states, not just (1,0,0) to (0,1,0)

TASK1a: read EV file

read in the csv file that contains the expected value table for each base state - out state. use the ‘read_csv’ function to read the file and assign it to the data frame ‘expect_value’.

expect_value <- read_csv('EV_base_out_states.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   EV_state = col_double(),
##   OUT_STATE = col_double(),
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   base_state_name = col_character()
## )

And then look at the dataframe using the ‘head’ function. In R Studio, you can view a dataframe by clicking on the table data icon on the right, in the environment pane. Try that too.

head(expect_value)
## # A tibble: 6 × 6
##   EV_state OUT_STATE base_1 base_2 base_3 base_state_name
##      <dbl>     <dbl>  <dbl>  <dbl>  <dbl> <chr>          
## 1    0.481         0      0      0      0 empty          
## 2    0.859         0      1      0      0 on 1st         
## 3    1.1           0      0      1      0 on 2nd         
## 4    1.44          0      1      1      0 1st & 2nd      
## 5    1.35          0      0      0      1 on 3rd         
## 6    1.78          0      1      0      1 corners

the ‘head’ functions shows column names and the top 6 rows.
base_1 shows if there is runner on 1st base (1), or no runner (0).

Add a new column to the expected value table; baserunners_tot, the total number of base runners on all three bases for each base state. Use the ‘mutate’ function to create the new column based on the addition of all three bases.

expect_value_A <- expect_value %>%
  mutate(baserunners_tot = (base_1+ base_2+ base_3))

Let’s take a quick look at the expect value. Plot the expect values of different out states.

Does that make sense? The expected value is highest with zero outs and decreases with increasing outs. The expected value with 3 outs is zero; the inning is over. Within each out state, the more runners on base, the higher the expected value.

TASK1b: read event file

read in the event file & select a player read in the abbreviated stolen base attempt event file

event_BSA_short <- read_csv('event_SBA_short.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   runner_name = col_character(),
##   runner_ID = col_character(),
##   OUT_STATE = col_double(),
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   SBA_outcome = col_character()
## )

you can see the column names of the df above. There are 6 columns.

Let’s take a looks at the table, use the ‘head’ function to show the event file

head(event_BSA_short)
## # A tibble: 6 × 7
##   runner_name     runner_ID OUT_STATE base_1 base_2 base_3 SBA_outcome
##   <chr>           <chr>         <dbl>  <dbl>  <dbl>  <dbl> <chr>      
## 1 Bryce Harper    harpb003          0      1      0      0 success    
## 2 Bryce Harper    harpb003          2      0      1      0 success    
## 3 Bryce Harper    harpb003          2      1      0      0 success    
## 4 Cesar Hernandez hernc005          1      0      1      0 fail       
## 5 Cesar Hernandez hernc005          1      1      0      0 fail       
## 6 Cesar Hernandez hernc005          2      1      0      0 success

see what runner are available in the event file:

table(event_BSA_short$runner_name)
## 
##    Bryce Harper Cesar Hernandez   J.T. Realmuto     Jean Segura 
##               3               3               3               3

there are four different runners, and each runner has 3 observations (i.e. three SBA).

filter the event file to select only one player

event_BSA_short_harper <- event_BSA_short %>%
  filter(runner_ID=='harpb003')

take a look on the new dataframwe

head(event_BSA_short_harper)
## # A tibble: 3 × 7
##   runner_name  runner_ID OUT_STATE base_1 base_2 base_3 SBA_outcome
##   <chr>        <chr>         <dbl>  <dbl>  <dbl>  <dbl> <chr>      
## 1 Bryce Harper harpb003          0      1      0      0 success    
## 2 Bryce Harper harpb003          2      0      1      0 success    
## 3 Bryce Harper harpb003          2      1      0      0 success
table(event_BSA_short_harper$runner_name)
## 
## Bryce Harper 
##            3

Task 1c: apply EV to SBA events

Use the merge function to add the expected value for each of the 24 base-out states.

event_BSA_short_harper_ev <- merge(event_BSA_short_harper, expect_value, 
                                   by=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))

take a look at this dataframe of a single player. Note how the expected value column was added for each base-out state

head(event_BSA_short_harper_ev)
##   OUT_STATE base_1 base_2 base_3  runner_name runner_ID SBA_outcome EV_state
## 1         0      1      0      0 Bryce Harper  harpb003     success    0.859
## 2         2      0      1      0 Bryce Harper  harpb003     success    0.319
## 3         2      1      0      0 Bryce Harper  harpb003     success    0.224
##   base_state_name
## 1          on 1st
## 2          on 2nd
## 3          on 1st

Task 1d: change in state with SBA,

determing State 2, the base state and out state after a SBA.

State 2 out state: Calculate the change in out state based on outcome of the stolen base attempt. Success, a successful SBA does not change the out state, or fail, a SBA that leads to an out increases the out state buy 1 out.

event_BSA_short_harper_ev_B <- event_BSA_short_harper_ev %>%
  mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))

view the modified dataframe

head(event_BSA_short_harper_ev_B)

read in table with the changes in base state with a SBA

SBA_state_change <- read_csv('SBA_state_change.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   SBA_outcome = col_character(),
##   S2_base_1 = col_double(),
##   S2_base_2 = col_double(),
##   S2_base_3 = col_double()
## )

merge the change in base state at State 2 with the df:

event_BSA_short_harper_ev_C <- event_BSA_short_harper_ev_B %>%
  merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))

take a look at the new columns added to the df:

head(event_BSA_short_harper_ev_C)
##   base_1 base_2 base_3 SBA_outcome OUT_STATE  runner_name runner_ID EV_state
## 1      0      1      0     success         2 Bryce Harper  harpb003    0.319
## 2      1      0      0     success         0 Bryce Harper  harpb003    0.859
## 3      1      0      0     success         2 Bryce Harper  harpb003    0.224
##   base_state_name S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3
## 1          on 2nd            2         0         0         1
## 2          on 1st            0         0         1         0
## 3          on 1st            2         0         1         0

add the expected value for State 2 in the same way as before.
Merge the expected value table with the df, but this time merger using the State 2 base and out states:

event_BSA_short_harper_ev_D <- merge(event_BSA_short_harper_ev_C, expect_value, 
        by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
        by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))

take a look at the new columns added to the df:

head(event_BSA_short_harper_ev_D)
##   S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3 base_1 base_2 base_3 SBA_outcome
## 1            0         0         1         0      1      0      0     success
## 2            2         0         0         1      0      1      0     success
## 3            2         0         1         0      1      0      0     success
##   OUT_STATE  runner_name runner_ID EV_state.x base_state_name.x EV_state.y
## 1         0 Bryce Harper  harpb003      0.859            on 1st      1.100
## 2         2 Bryce Harper  harpb003      0.319            on 2nd      0.353
## 3         2 Bryce Harper  harpb003      0.224            on 1st      0.319
##   base_state_name.y
## 1            on 2nd
## 2            on 3rd
## 3            on 2nd

that is a getting a little complicated. Let’s use a table:

the df as a table
S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3 base_1 base_2 base_3 SBA_outcome OUT_STATE runner_name runner_ID EV_state.x base_state_name.x EV_state.y base_state_name.y
0 0 1 0 1 0 0 success 0 Bryce Harper harpb003 0.859 on 1st 1.100 on 2nd
2 0 0 1 0 1 0 success 2 Bryce Harper harpb003 0.319 on 2nd 0.353 on 3rd
2 0 1 0 1 0 0 success 2 Bryce Harper harpb003 0.224 on 1st 0.319 on 2nd

see how there are now two ‘EV_state’ columns; EV_state.x & EV_state.y These are the expected values for State 1 and State 2. Let’s use the ‘rename’ function to rename them to somethings that makes more sense and is easier to read and remember. Also, now you can calculate the change in value for each of the three SBA in the data frame. Use the mutate function to create a new column, and name it ‘change_EV’.
This is the simple subtraction between the expeted value of State 1 and State 2.

event_BSA_short_harper_ev_E <- event_BSA_short_harper_ev_D %>%
  rename(EV_state_1 = EV_state.x, 
         EV_state_2 = EV_state.y) %>%
  mutate(change_EV= EV_state_2- EV_state_1 )

was value created over these three SBA’s? Let’s find out.
Sum the change in value over these three SBA’s.
Remember, these three are non-representative, so this sumation does not represent the players’ actual value creation or descruction.

sum(event_BSA_short_harper_ev_E$change_EV)
## [1] 0.37

A positive value means the runner created value with these three SBA.

Task 2: 1 player, all SBA in a season

{maybe do it for a 2nd player and compare who created more value} ~by hand for one player over all stolen base attempts, determine the change in expected value between state1 and state2. Over all SBA base states, and sum the net season change in value. Did a player X’s stolen base activity create run value or not?

TASK2a: read in the full event file & select a player

read in the stolen base attempt event file

event_BSA <- read_csv('event_SBA_full.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   runner_name = col_character(),
##   runner_ID = col_character(),
##   team_runner_cd = col_character(),
##   OUT_STATE = col_double(),
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   SBA_outcome = col_character(),
##   BAT_LINEUP_ID = col_double(),
##   BAT_HAND_CD = col_character(),
##   CatcherID = col_character(),
##   PIT_ID = col_character(),
##   PIT_HAND_CD = col_character()
## )

How many events are in this file? There are a few ways to determine this. One way is to use the ‘length’ function

length(event_BSA$runner_ID)
## [1] 2532

that shows how many SBA attempts this single player had over the entire season. Over all players what was the number sucess and failed SBA?

table(event_BSA$SBA_outcome)
## 
##    fail success 
##     579    1953
head(event_BSA)
## # A tibble: 6 × 13
##   runner_name  runner_ID team_runner_cd OUT_STATE base_1 base_2 base_3
##   <chr>        <chr>     <chr>              <dbl>  <dbl>  <dbl>  <dbl>
## 1 Jose Abreu   abrej003  CHA                    2      1      0      0
## 2 Ronald Acuna acunr001  ATL                    1      1      0      0
## 3 Ronald Acuna acunr001  ATL                    0      1      0      0
## 4 Ronald Acuna acunr001  ATL                    0      1      0      1
## 5 Ronald Acuna acunr001  ATL                    2      1      0      1
## 6 Ronald Acuna acunr001  ATL                    2      1      0      1
## # … with 6 more variables: SBA_outcome <chr>, BAT_LINEUP_ID <dbl>,
## #   BAT_HAND_CD <chr>, CatcherID <chr>, PIT_ID <chr>, PIT_HAND_CD <chr>

show the event file

Now we will repeat the same actions as in Task 1, but with the full record for one player. Since you have already read in the expected value table in the global environment, that task doesn’t need to be repeated.

filter the event file to select only one player

event_BSA_full_harper <- event_BSA %>%
  filter(runner_ID=='harpb003')

take a look on the new dataframe

head(event_BSA_full_harper)
## # A tibble: 6 × 13
##   runner_name  runner_ID team_runner_cd OUT_STATE base_1 base_2 base_3
##   <chr>        <chr>     <chr>              <dbl>  <dbl>  <dbl>  <dbl>
## 1 Bryce Harper harpb003  PHI                    0      1      0      0
## 2 Bryce Harper harpb003  PHI                    2      1      0      0
## 3 Bryce Harper harpb003  PHI                    2      1      0      1
## 4 Bryce Harper harpb003  PHI                    2      1      0      0
## 5 Bryce Harper harpb003  PHI                    1      1      0      1
## 6 Bryce Harper harpb003  PHI                    2      1      0      0
## # … with 6 more variables: SBA_outcome <chr>, BAT_LINEUP_ID <dbl>,
## #   BAT_HAND_CD <chr>, CatcherID <chr>, PIT_ID <chr>, PIT_HAND_CD <chr>
table(event_BSA_full_harper$runner_name)
## 
## Bryce Harper 
##           16
length(event_BSA_full_harper$runner_ID)
## [1] 16

that shows how many SBA attempts this single player had over the entire season

table(event_BSA_full_harper$SBA_outcome) 
## 
##    fail success 
##       2      14

Task 2b: apply EV to SBA events

Use the merge function to add the expected value for each of the 24 base-out states.

event_BSA_full_harper_ev <- merge(event_BSA_full_harper, expect_value, 
                                   by=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))

take a look at this dataframe of a single player. Note how the expected value column was added for each base-out state

the df as a table
OUT_STATE base_1 base_2 base_3 runner_name runner_ID team_runner_cd SBA_outcome BAT_LINEUP_ID BAT_HAND_CD CatcherID PIT_ID PIT_HAND_CD EV_state base_state_name
0 1 0 0 Bryce Harper harpb003 PHI success 4 R stalj001 ramiy001 R 0.859 on 1st
0 1 1 0 Bryce Harper harpb003 PHI success 4 R mccab002 tehej001 R 1.437 1st & 2nd
1 1 0 0 Bryce Harper harpb003 PHI success 3 R carav001 darvy001 R 0.509 on 1st

Task 2c: change in state with SBA,

determining State 2, the base state and out state after a SBA.

State 2 out state: Calculate the change in out state based on outcome of the stolen base attempt. Success, a successful SBA does not change the out state, or fail, a SBA that leads to an out increases the out state buy 1 out.

event_BSA_full_harper_ev_B <- event_BSA_full_harper_ev %>%
  mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))

view the modified dataframe

the df as a table
OUT_STATE base_1 base_2 base_3 runner_name runner_ID team_runner_cd SBA_outcome BAT_LINEUP_ID BAT_HAND_CD CatcherID PIT_ID PIT_HAND_CD EV_state base_state_name S2_OUT_STATE
0 1 0 0 Bryce Harper harpb003 PHI success 4 R stalj001 ramiy001 R 0.859 on 1st 0
0 1 1 0 Bryce Harper harpb003 PHI success 4 R mccab002 tehej001 R 1.437 1st & 2nd 0
1 1 0 0 Bryce Harper harpb003 PHI success 3 R carav001 darvy001 R 0.509 on 1st 1
1 1 0 0 Bryce Harper harpb003 PHI success 4 R grany001 housa001 R 0.509 on 1st 1
1 1 0 0 Bryce Harper harpb003 PHI success 5 R barnt001 bowmm001 R 0.509 on 1st 1
1 1 0 1 Bryce Harper harpb003 PHI success 3 L holab001 alcas001 R 1.130 corners 1
1 1 0 1 Bryce Harper harpb003 PHI fail 4 R alfaj002 richt001 R 1.130 corners 2
2 1 0 0 Bryce Harper harpb003 PHI success 4 R mccab002 sorom001 R 0.224 on 1st 2
2 1 0 0 Bryce Harper harpb003 PHI success 4 R mccab002 soboc001 R 0.224 on 1st 2
2 1 0 0 Bryce Harper harpb003 PHI success 4 R ramow001 matzs001 L 0.224 on 1st 2
2 1 0 0 Bryce Harper harpb003 PHI success 4 R castj006 berrj001 R 0.224 on 1st 2
2 1 0 0 Bryce Harper harpb003 PHI success 4 R flowt001 grees005 R 0.224 on 1st 2
2 1 0 0 Bryce Harper harpb003 PHI success 4 R casac001 grays001 R 0.224 on 1st 2
2 1 0 0 Bryce Harper harpb003 PHI fail 3 R kellc002 mcfat001 L 0.224 on 1st 3
2 1 0 0 Bryce Harper harpb003 PHI success 4 R alfaj002 quijj001 L 0.224 on 1st 2

read in table with the changes in base state with a SBA

SBA_state_change <- read_csv('SBA_state_change.csv')
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   SBA_outcome = col_character(),
##   S2_base_1 = col_double(),
##   S2_base_2 = col_double(),
##   S2_base_3 = col_double()
## )

merge the change in base state at State 2 with the df:

event_BSA_full_harper_ev_C <- event_BSA_full_harper_ev_B %>%
  merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))

take a look at the new columns added to the df:

the df as a table
base_1 base_2 base_3 SBA_outcome OUT_STATE runner_name runner_ID team_runner_cd BAT_LINEUP_ID BAT_HAND_CD CatcherID PIT_ID PIT_HAND_CD EV_state base_state_name S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3
1 0 0 fail 2 Bryce Harper harpb003 PHI 3 R kellc002 mcfat001 L 0.224 on 1st 3 0 0 0
1 0 0 success 0 Bryce Harper harpb003 PHI 4 R stalj001 ramiy001 R 0.859 on 1st 0 0 1 0
1 0 0 success 2 Bryce Harper harpb003 PHI 4 R ramow001 matzs001 L 0.224 on 1st 2 0 1 0
1 0 0 success 1 Bryce Harper harpb003 PHI 3 R carav001 darvy001 R 0.509 on 1st 1 0 1 0
1 0 0 success 1 Bryce Harper harpb003 PHI 4 R grany001 housa001 R 0.509 on 1st 1 0 1 0
1 0 0 success 1 Bryce Harper harpb003 PHI 5 R barnt001 bowmm001 R 0.509 on 1st 1 0 1 0
1 0 0 success 2 Bryce Harper harpb003 PHI 4 R castj006 berrj001 R 0.224 on 1st 2 0 1 0
1 0 0 success 2 Bryce Harper harpb003 PHI 4 R mccab002 sorom001 R 0.224 on 1st 2 0 1 0
1 0 0 success 2 Bryce Harper harpb003 PHI 4 R mccab002 soboc001 R 0.224 on 1st 2 0 1 0
1 0 0 success 2 Bryce Harper harpb003 PHI 4 R alfaj002 quijj001 L 0.224 on 1st 2 0 1 0
1 0 0 success 2 Bryce Harper harpb003 PHI 4 R flowt001 grees005 R 0.224 on 1st 2 0 1 0
1 0 0 success 2 Bryce Harper harpb003 PHI 4 R casac001 grays001 R 0.224 on 1st 2 0 1 0
1 0 1 fail 1 Bryce Harper harpb003 PHI 4 R alfaj002 richt001 R 1.130 corners 2 0 0 1
1 0 1 success 1 Bryce Harper harpb003 PHI 3 L holab001 alcas001 R 1.130 corners 1 0 1 1
1 0 1 success 2 Bryce Harper harpb003 PHI 3 R casac001 grays001 R 0.478 corners 2 0 1 1

add the expected value for State 2 in the same way as before.
Merge the expected value table with the df, but this time merger using the State 2 base and out states:

event_BSA_full_harper_ev_D <- merge(event_BSA_full_harper_ev_C, expect_value, 
        by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
        by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))

take a look at the new columns added to the df:

the df as a table
S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3 base_1 base_2 base_3 SBA_outcome OUT_STATE runner_name runner_ID team_runner_cd BAT_LINEUP_ID BAT_HAND_CD CatcherID PIT_ID PIT_HAND_CD EV_state.x base_state_name.x EV_state.y base_state_name.y
0 0 1 0 1 0 0 success 0 Bryce Harper harpb003 PHI 4 R stalj001 ramiy001 R 0.859 on 1st 1.100 on 2nd
0 1 0 1 1 1 0 success 0 Bryce Harper harpb003 PHI 4 R mccab002 tehej001 R 1.437 1st & 2nd 1.784 corners
1 0 1 0 1 0 0 success 1 Bryce Harper harpb003 PHI 4 R grany001 housa001 R 0.509 on 1st 0.664 on 2nd
1 0 1 0 1 0 0 success 1 Bryce Harper harpb003 PHI 5 R barnt001 bowmm001 R 0.509 on 1st 0.664 on 2nd
1 0 1 0 1 0 0 success 1 Bryce Harper harpb003 PHI 3 R carav001 darvy001 R 0.509 on 1st 0.664 on 2nd
1 0 1 1 1 0 1 success 1 Bryce Harper harpb003 PHI 3 L holab001 alcas001 R 1.130 corners 1.376 2nd & 3rd
2 0 0 1 1 0 1 fail 1 Bryce Harper harpb003 PHI 4 R alfaj002 richt001 R 1.130 corners 0.353 on 3rd
2 0 1 0 1 0 0 success 2 Bryce Harper harpb003 PHI 4 R ramow001 matzs001 L 0.224 on 1st 0.319 on 2nd
2 0 1 0 1 0 0 success 2 Bryce Harper harpb003 PHI 4 R mccab002 sorom001 R 0.224 on 1st 0.319 on 2nd
2 0 1 0 1 0 0 success 2 Bryce Harper harpb003 PHI 4 R mccab002 soboc001 R 0.224 on 1st 0.319 on 2nd
2 0 1 0 1 0 0 success 2 Bryce Harper harpb003 PHI 4 R alfaj002 quijj001 L 0.224 on 1st 0.319 on 2nd
2 0 1 0 1 0 0 success 2 Bryce Harper harpb003 PHI 4 R castj006 berrj001 R 0.224 on 1st 0.319 on 2nd
2 0 1 0 1 0 0 success 2 Bryce Harper harpb003 PHI 4 R casac001 grays001 R 0.224 on 1st 0.319 on 2nd
2 0 1 0 1 0 0 success 2 Bryce Harper harpb003 PHI 4 R flowt001 grees005 R 0.224 on 1st 0.319 on 2nd
2 0 1 1 1 0 1 success 2 Bryce Harper harpb003 PHI 3 R casac001 grays001 R 0.478 corners 0.580 2nd & 3rd

see how there are now two ‘EV_state’ columns; EV_state.x & EV_state.y These are the expected values for State 1 and State 2. Let’s use the ‘rename’ function to rename them to somethings that makes more sense and is easier to read and remember. Also, now you can calculate the change in value for each of the three SBA in the data frame. Use the mutate function to create a new column, and name it ‘change_EV’.
This is the simple subtraction between the expeted value of State 1 and State 2.

event_BSA_full_harper_ev_E <- event_BSA_full_harper_ev_D %>%
  rename(EV_state_1 = EV_state.x, 
         EV_state_2 = EV_state.y) %>%
  mutate(change_EV= EV_state_2- EV_state_1 )

was value created over these three SBA’s? Let’s find out.
Sum the change in value over these three SBA’s.
Remember, these three are non-representative, so this sumation does not represent the players’ actual value creation or descruction.

sum(event_BSA_full_harper_ev_E$change_EV)
## [1] 1.065

A positive value means the runner created value with thier SBA over the full season.

Task 3: Team level SBA expected value.

Did the team create value (or not) by SBA? Over an entire team, over all stolen base attempts, determine the change in expected value between state1 and state2. Over all SBA base states.

the event file is the same as the previous task; the full season SBA events see the total number of SBA for each team:

table(event_BSA$team_runner_cd)
## 
## ANA ARI ATL BAL BOS CHA CHN CIN CLE COL DET HOU KCA LAN MIA MIL MIN NYA NYN OAK 
##  75  95 101  88  83  72  54  95 115  82  73  78 101  60  69  96  43  61  74  56 
## PHI PIT SDN SEA SFN SLN TBA TEX TOR WAS 
##  79  79  87 136  55 114 106 132  53 120

And plot this table using ggplot

ggplot(event_BSA, aes(x=team_runner_cd)) +
  geom_bar() +
  theme(axis.text.x=element_text(angle=90,hjust=1)) +
  ylab('stolen base attempts / team') +
  xlab(NULL)

Similar to the previous tasks, next apply the expected value table to the event file containing all SBA for the season. Also at the Out state change in State 2.

event_BSA_team <- event_BSA %>% 
  merge(expect_value, by=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
  mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))

And complete the State 2 by adding the change in base states:

event_BSA_team_A <- event_BSA_team %>%
  merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))

Finally, reapply the expected value to State 2:

event_BSA_team_B <- event_BSA_team_A %>%
  merge(expect_value, 
        by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
        by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
  rename(EV_state_1 = EV_state.x, 
         EV_state_2 = EV_state.y,
         base_state_name_1 = base_state_name.x,
         base_state_name_2 = base_state_name.y) %>%
  mutate(delta_EV = EV_state_2-EV_state_1)
event_BSA_team_B$team_runner_cd <- as.factor(event_BSA_team_B$team_runner_cd)
hist(event_BSA_team_B$delta_EV, breaks=20)

sum(event_BSA_team_B$delta_EV)
## [1] 55.964
sum(event_BSA_team_B$delta_EV)/length(event_BSA_team_B$delta_EV)
## [1] 0.02210269
event_BSA_team_C <- event_BSA_team_B %>%
  group_by(team_runner_cd) %>% 
  summarise(tot_delta_EV = sum(delta_EV))
ggplot(event_BSA_team_C) +
  geom_col(aes(y=tot_delta_EV, x=reorder(team_runner_cd, desc(tot_delta_EV)))) +
  theme_bw() +
  theme(axis.text.x=element_text(angle=90,hjust=1)) +
  xlab(NULL) +
  ylab('cumulative EV change') +
  ggtitle('Team wide change in EV from SBA') +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_hline(yintercept = 0, color='red')

extra stuff

event_BSA_team_indiv <- event_BSA_team_B %>% 
  group_by(runner_ID) %>% 
  mutate(counter=1) %>%
  summarise(delta_EV_tot= sum(delta_EV),
            SBA_tot= sum(counter))
hist(event_BSA_team_indiv$delta_EV_tot, breaks=40)

max(event_BSA_team_indiv$delta_EV_tot)
## [1] 4.574
ggplot(event_BSA_team_indiv) +
  geom_hline(yintercept = 0, color='red') +
  geom_point(aes(x=SBA_tot, y=delta_EV_tot), alpha=0.5) +
  theme_bw() + xlab('SBA per player') +
  scale_y_continuous(breaks=seq(-2.5, 5, by=0.5), 
                     name='net change in expected value from SBA')