Task 1: by hand, 1 player, 3 SBA & outcomes

~by hand, for one player over three stolen base attempts, determine the change in expected value between state1 and state2. Over a variety of SBA base states, not just (1,0,0) to (0,1,0)

TASK1a: read EV file

read in the csv file that contains the expected value table for each base state - out state. use the ‘read_csv’ function to read the file and assign it to the data frame ‘expect_value’.

expect_value <- read_csv('EV_base_out_states.csv')

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   EV_state = col_double(),
##   OUT_STATE = col_double(),
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   base_state_name = col_character()
## )

And then look at the dataframe using the ‘head’ function. In R Studio, you can view a dataframe by clicking on the table data icon on the right, in the environment pane. Try that too.

head(expect_value)

## # A tibble: 6 × 6
##   EV_state OUT_STATE base_1 base_2 base_3 base_state_name
##      <dbl>     <dbl>  <dbl>  <dbl>  <dbl> <chr>          
## 1    0.481         0      0      0      0 empty          
## 2    0.859         0      1      0      0 on 1st         
## 3    1.1           0      0      1      0 on 2nd         
## 4    1.44          0      1      1      0 1st & 2nd      
## 5    1.35          0      0      0      1 on 3rd         
## 6    1.78          0      1      0      1 corners

the ‘head’ functions shows column names and the top 6 rows.
base_1 shows if there is runner on 1st base (1), or no runner (0).

Add a new column to the expected value table; baserunners_tot, the total number of base runners on all three bases for each base state. Use the ‘mutate’ function to create the new column based on the addition of all three bases.

expect_value_A <- expect_value %>%
  mutate(baserunners_tot = (base_1+ base_2+ base_3))

Let’s take a quick look at the expect value. Plot the expect values of different out states.

Does that make sense? The expected value is highest with zero outs and decreases with increasing outs. The expected value with 3 outs is zero; the inning is over. Within each out state, the more runners on base, the higher the expected value.

TASK1b: read event file

read in the event file & select a player read in the abbreviated stolen base attempt event file

event_BSA_short <- read_csv('event_SBA_short.csv')

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   runner_name = col_character(),
##   runner_ID = col_character(),
##   OUT_STATE = col_double(),
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   SBA_outcome = col_character()
## )

you can see the column names of the df above. There are 6 columns.

Let’s take a looks at the table, use the ‘head’ function to show the event file

head(event_BSA_short)

## # A tibble: 6 × 7
##   runner_name     runner_ID OUT_STATE base_1 base_2 base_3 SBA_outcome
##   <chr>           <chr>         <dbl>  <dbl>  <dbl>  <dbl> <chr>      
## 1 Bryce Harper    harpb003          0      1      0      0 success    
## 2 Bryce Harper    harpb003          2      0      1      0 success    
## 3 Bryce Harper    harpb003          2      1      0      0 success    
## 4 Cesar Hernandez hernc005          1      0      1      0 fail       
## 5 Cesar Hernandez hernc005          1      1      0      0 fail       
## 6 Cesar Hernandez hernc005          2      1      0      0 success

see what runner are available in the event file:

table(event_BSA_short$runner_name)

## 
##    Bryce Harper Cesar Hernandez   J.T. Realmuto     Jean Segura 
##               3               3               3               3

there are four different runners, and each runner has 3 observations (i.e. three SBA).

filter the event file to select only one player

event_BSA_short_harper <- event_BSA_short %>%
  filter(runner_ID=='harpb003')

take a look on the new dataframwe

head(event_BSA_short_harper)

## # A tibble: 3 × 7
##   runner_name  runner_ID OUT_STATE base_1 base_2 base_3 SBA_outcome
##   <chr>        <chr>         <dbl>  <dbl>  <dbl>  <dbl> <chr>      
## 1 Bryce Harper harpb003          0      1      0      0 success    
## 2 Bryce Harper harpb003          2      0      1      0 success    
## 3 Bryce Harper harpb003          2      1      0      0 success

table(event_BSA_short_harper$runner_name)

## 
## Bryce Harper 
##            3

Task 1c: apply EV to SBA events

Use the merge function to add the expected value for each of the 24 base-out states.

event_BSA_short_harper_ev <- merge(event_BSA_short_harper, expect_value, 
                                   by=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))

take a look at this dataframe of a single player. Note how the expected value column was added for each base-out state

head(event_BSA_short_harper_ev)

##   OUT_STATE base_1 base_2 base_3  runner_name runner_ID SBA_outcome EV_state
## 1         0      1      0      0 Bryce Harper  harpb003     success    0.859
## 2         2      0      1      0 Bryce Harper  harpb003     success    0.319
## 3         2      1      0      0 Bryce Harper  harpb003     success    0.224
##   base_state_name
## 1          on 1st
## 2          on 2nd
## 3          on 1st

Task 1d: change in state with SBA,

determing State 2, the base state and out state after a SBA.

State 2 out state: Calculate the change in out state based on outcome of the stolen base attempt. Success, a successful SBA does not change the out state, or fail, a SBA that leads to an out increases the out state buy 1 out.

event_BSA_short_harper_ev_B <- event_BSA_short_harper_ev %>%
  mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))

view the modified dataframe

head(event_BSA_short_harper_ev_B)

read in table with the changes in base state with a SBA

SBA_state_change <- read_csv('SBA_state_change.csv')

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   SBA_outcome = col_character(),
##   S2_base_1 = col_double(),
##   S2_base_2 = col_double(),
##   S2_base_3 = col_double()
## )

merge the change in base state at State 2 with the df:

event_BSA_short_harper_ev_C <- event_BSA_short_harper_ev_B %>%
  merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))

take a look at the new columns added to the df:

head(event_BSA_short_harper_ev_C)

##   base_1 base_2 base_3 SBA_outcome OUT_STATE  runner_name runner_ID EV_state
## 1      0      1      0     success         2 Bryce Harper  harpb003    0.319
## 2      1      0      0     success         0 Bryce Harper  harpb003    0.859
## 3      1      0      0     success         2 Bryce Harper  harpb003    0.224
##   base_state_name S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3
## 1          on 2nd            2         0         0         1
## 2          on 1st            0         0         1         0
## 3          on 1st            2         0         1         0

add the expected value for State 2 in the same way as before.
Merge the expected value table with the df, but this time merger using the State 2 base and out states:

event_BSA_short_harper_ev_D <- merge(event_BSA_short_harper_ev_C, expect_value, 
        by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
        by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))

take a look at the new columns added to the df:

head(event_BSA_short_harper_ev_D)

##   S2_OUT_STATE S2_base_1 S2_base_2 S2_base_3 base_1 base_2 base_3 SBA_outcome
## 1            0         0         1         0      1      0      0     success
## 2            2         0         0         1      0      1      0     success
## 3            2         0         1         0      1      0      0     success
##   OUT_STATE  runner_name runner_ID EV_state.x base_state_name.x EV_state.y
## 1         0 Bryce Harper  harpb003      0.859            on 1st      1.100
## 2         2 Bryce Harper  harpb003      0.319            on 2nd      0.353
## 3         2 Bryce Harper  harpb003      0.224            on 1st      0.319
##   base_state_name.y
## 1            on 2nd
## 2            on 3rd
## 3            on 2nd

that is a getting a little complicated. Let’s use a table:

the df as a table
S2_OUT_STATE	S2_base_2	S2_base_3	base_1	base_2	SBA_outcome	OUT_STATE	runner_name	runner_ID	EV_state.x	base_state_name.x	EV_state.y	base_state_name.y
0	1	0	1	0	success	0	Bryce Harper	harpb003	0.859	on 1st	1.100	on 2nd
2	0	1	0	1	success	2	Bryce Harper	harpb003	0.319	on 2nd	0.353	on 3rd
2	1	0	1	0	success	2	Bryce Harper	harpb003	0.224	on 1st	0.319	on 2nd

see how there are now two ‘EV_state’ columns; EV_state.x & EV_state.y These are the expected values for State 1 and State 2. Let’s use the ‘rename’ function to rename them to somethings that makes more sense and is easier to read and remember. Also, now you can calculate the change in value for each of the three SBA in the data frame. Use the mutate function to create a new column, and name it ‘change_EV’.
This is the simple subtraction between the expeted value of State 1 and State 2.

event_BSA_short_harper_ev_E <- event_BSA_short_harper_ev_D %>%
  rename(EV_state_1 = EV_state.x, 
         EV_state_2 = EV_state.y) %>%
  mutate(change_EV= EV_state_2- EV_state_1 )

was value created over these three SBA’s? Let’s find out.
Sum the change in value over these three SBA’s.
Remember, these three are non-representative, so this sumation does not represent the players’ actual value creation or descruction.

sum(event_BSA_short_harper_ev_E$change_EV)

## [1] 0.37

A positive value means the runner created value with these three SBA.

Task 2: 1 player, all SBA in a season

{maybe do it for a 2nd player and compare who created more value} ~by hand for one player over all stolen base attempts, determine the change in expected value between state1 and state2. Over all SBA base states, and sum the net season change in value. Did a player X’s stolen base activity create run value or not?

TASK2a: read in the full event file & select a player

read in the stolen base attempt event file

event_BSA <- read_csv('event_SBA_full.csv')

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   runner_name = col_character(),
##   runner_ID = col_character(),
##   team_runner_cd = col_character(),
##   OUT_STATE = col_double(),
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   SBA_outcome = col_character(),
##   BAT_LINEUP_ID = col_double(),
##   BAT_HAND_CD = col_character(),
##   CatcherID = col_character(),
##   PIT_ID = col_character(),
##   PIT_HAND_CD = col_character()
## )

How many events are in this file? There are a few ways to determine this. One way is to use the ‘length’ function

length(event_BSA$runner_ID)

## [1] 2532

that shows how many SBA attempts this single player had over the entire season. Over all players what was the number sucess and failed SBA?

table(event_BSA$SBA_outcome)

## 
##    fail success 
##     579    1953

head(event_BSA)

## # A tibble: 6 × 13
##   runner_name  runner_ID team_runner_cd OUT_STATE base_1 base_2 base_3
##   <chr>        <chr>     <chr>              <dbl>  <dbl>  <dbl>  <dbl>
## 1 Jose Abreu   abrej003  CHA                    2      1      0      0
## 2 Ronald Acuna acunr001  ATL                    1      1      0      0
## 3 Ronald Acuna acunr001  ATL                    0      1      0      0
## 4 Ronald Acuna acunr001  ATL                    0      1      0      1
## 5 Ronald Acuna acunr001  ATL                    2      1      0      1
## 6 Ronald Acuna acunr001  ATL                    2      1      0      1
## # … with 6 more variables: SBA_outcome <chr>, BAT_LINEUP_ID <dbl>,
## #   BAT_HAND_CD <chr>, CatcherID <chr>, PIT_ID <chr>, PIT_HAND_CD <chr>

show the event file

Now we will repeat the same actions as in Task 1, but with the full record for one player. Since you have already read in the expected value table in the global environment, that task doesn’t need to be repeated.

filter the event file to select only one player

event_BSA_full_harper <- event_BSA %>%
  filter(runner_ID=='harpb003')

take a look on the new dataframe

head(event_BSA_full_harper)

## # A tibble: 6 × 13
##   runner_name  runner_ID team_runner_cd OUT_STATE base_1 base_2 base_3
##   <chr>        <chr>     <chr>              <dbl>  <dbl>  <dbl>  <dbl>
## 1 Bryce Harper harpb003  PHI                    0      1      0      0
## 2 Bryce Harper harpb003  PHI                    2      1      0      0
## 3 Bryce Harper harpb003  PHI                    2      1      0      1
## 4 Bryce Harper harpb003  PHI                    2      1      0      0
## 5 Bryce Harper harpb003  PHI                    1      1      0      1
## 6 Bryce Harper harpb003  PHI                    2      1      0      0
## # … with 6 more variables: SBA_outcome <chr>, BAT_LINEUP_ID <dbl>,
## #   BAT_HAND_CD <chr>, CatcherID <chr>, PIT_ID <chr>, PIT_HAND_CD <chr>

table(event_BSA_full_harper$runner_name)

## 
## Bryce Harper 
##           16

length(event_BSA_full_harper$runner_ID)

## [1] 16

that shows how many SBA attempts this single player had over the entire season

table(event_BSA_full_harper$SBA_outcome)

## 
##    fail success 
##       2      14

Task 2b: apply EV to SBA events

Use the merge function to add the expected value for each of the 24 base-out states.

event_BSA_full_harper_ev <- merge(event_BSA_full_harper, expect_value, 
                                   by=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))

take a look at this dataframe of a single player. Note how the expected value column was added for each base-out state

the df as a table
OUT_STATE	base_1	base_2	runner_name	runner_ID	team_runner_cd	SBA_outcome	BAT_LINEUP_ID	BAT_HAND_CD	CatcherID	PIT_ID	PIT_HAND_CD	EV_state	base_state_name
0	1	0	Bryce Harper	harpb003	PHI	success	4	R	stalj001	ramiy001	R	0.859	on 1st
0	1	1	Bryce Harper	harpb003	PHI	success	4	R	mccab002	tehej001	R	1.437	1st & 2nd
1	1	0	Bryce Harper	harpb003	PHI	success	3	R	carav001	darvy001	R	0.509	on 1st

Task 2c: change in state with SBA,

determining State 2, the base state and out state after a SBA.

State 2 out state: Calculate the change in out state based on outcome of the stolen base attempt. Success, a successful SBA does not change the out state, or fail, a SBA that leads to an out increases the out state buy 1 out.

event_BSA_full_harper_ev_B <- event_BSA_full_harper_ev %>%
  mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))

view the modified dataframe

the df as a table
OUT_STATE	base_1	base_2	base_3	runner_name	runner_ID	team_runner_cd	SBA_outcome	BAT_LINEUP_ID	BAT_HAND_CD	CatcherID	PIT_ID	PIT_HAND_CD	EV_state	base_state_name	S2_OUT_STATE
0	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	stalj001	ramiy001	R	0.859	on 1st	0
0	1	1	0	Bryce Harper	harpb003	PHI	success	4	R	mccab002	tehej001	R	1.437	1st & 2nd	0
1	1	0	0	Bryce Harper	harpb003	PHI	success	3	R	carav001	darvy001	R	0.509	on 1st	1
1	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	grany001	housa001	R	0.509	on 1st	1
1	1	0	0	Bryce Harper	harpb003	PHI	success	5	R	barnt001	bowmm001	R	0.509	on 1st	1
1	1	0	1	Bryce Harper	harpb003	PHI	success	3	L	holab001	alcas001	R	1.130	corners	1
1	1	0	1	Bryce Harper	harpb003	PHI	fail	4	R	alfaj002	richt001	R	1.130	corners	2
2	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	mccab002	sorom001	R	0.224	on 1st	2
2	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	mccab002	soboc001	R	0.224	on 1st	2
2	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	ramow001	matzs001	L	0.224	on 1st	2
2	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	castj006	berrj001	R	0.224	on 1st	2
2	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	flowt001	grees005	R	0.224	on 1st	2
2	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	casac001	grays001	R	0.224	on 1st	2
2	1	0	0	Bryce Harper	harpb003	PHI	fail	3	R	kellc002	mcfat001	L	0.224	on 1st	3
2	1	0	0	Bryce Harper	harpb003	PHI	success	4	R	alfaj002	quijj001	L	0.224	on 1st	2

read in table with the changes in base state with a SBA

SBA_state_change <- read_csv('SBA_state_change.csv')

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   base_1 = col_double(),
##   base_2 = col_double(),
##   base_3 = col_double(),
##   SBA_outcome = col_character(),
##   S2_base_1 = col_double(),
##   S2_base_2 = col_double(),
##   S2_base_3 = col_double()
## )

merge the change in base state at State 2 with the df:

event_BSA_full_harper_ev_C <- event_BSA_full_harper_ev_B %>%
  merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))

take a look at the new columns added to the df:

the df as a table
base_1	base_3	SBA_outcome	OUT_STATE	runner_name	runner_ID	team_runner_cd	BAT_LINEUP_ID	BAT_HAND_CD	CatcherID	PIT_ID	PIT_HAND_CD	EV_state	base_state_name	S2_OUT_STATE	S2_base_2	S2_base_3
1	0	fail	2	Bryce Harper	harpb003	PHI	3	R	kellc002	mcfat001	L	0.224	on 1st	3	0	0
1	0	success	0	Bryce Harper	harpb003	PHI	4	R	stalj001	ramiy001	R	0.859	on 1st	0	1	0
1	0	success	2	Bryce Harper	harpb003	PHI	4	R	ramow001	matzs001	L	0.224	on 1st	2	1	0
1	0	success	1	Bryce Harper	harpb003	PHI	3	R	carav001	darvy001	R	0.509	on 1st	1	1	0
1	0	success	1	Bryce Harper	harpb003	PHI	4	R	grany001	housa001	R	0.509	on 1st	1	1	0
1	0	success	1	Bryce Harper	harpb003	PHI	5	R	barnt001	bowmm001	R	0.509	on 1st	1	1	0
1	0	success	2	Bryce Harper	harpb003	PHI	4	R	castj006	berrj001	R	0.224	on 1st	2	1	0
1	0	success	2	Bryce Harper	harpb003	PHI	4	R	mccab002	sorom001	R	0.224	on 1st	2	1	0
1	0	success	2	Bryce Harper	harpb003	PHI	4	R	mccab002	soboc001	R	0.224	on 1st	2	1	0
1	0	success	2	Bryce Harper	harpb003	PHI	4	R	alfaj002	quijj001	L	0.224	on 1st	2	1	0
1	0	success	2	Bryce Harper	harpb003	PHI	4	R	flowt001	grees005	R	0.224	on 1st	2	1	0
1	0	success	2	Bryce Harper	harpb003	PHI	4	R	casac001	grays001	R	0.224	on 1st	2	1	0
1	1	fail	1	Bryce Harper	harpb003	PHI	4	R	alfaj002	richt001	R	1.130	corners	2	0	1
1	1	success	1	Bryce Harper	harpb003	PHI	3	L	holab001	alcas001	R	1.130	corners	1	1	1
1	1	success	2	Bryce Harper	harpb003	PHI	3	R	casac001	grays001	R	0.478	corners	2	1	1

add the expected value for State 2 in the same way as before.
Merge the expected value table with the df, but this time merger using the State 2 base and out states:

event_BSA_full_harper_ev_D <- merge(event_BSA_full_harper_ev_C, expect_value, 
        by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
        by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3'))

take a look at the new columns added to the df:

the df as a table
S2_OUT_STATE	S2_base_1	S2_base_2	S2_base_3	base_1	base_2	base_3	SBA_outcome	OUT_STATE	runner_name	runner_ID	team_runner_cd	BAT_LINEUP_ID	BAT_HAND_CD	CatcherID	PIT_ID	PIT_HAND_CD	EV_state.x	base_state_name.x	EV_state.y	base_state_name.y
0	0	1	0	1	0	0	success	0	Bryce Harper	harpb003	PHI	4	R	stalj001	ramiy001	R	0.859	on 1st	1.100	on 2nd
0	1	0	1	1	1	0	success	0	Bryce Harper	harpb003	PHI	4	R	mccab002	tehej001	R	1.437	1st & 2nd	1.784	corners
1	0	1	0	1	0	0	success	1	Bryce Harper	harpb003	PHI	4	R	grany001	housa001	R	0.509	on 1st	0.664	on 2nd
1	0	1	0	1	0	0	success	1	Bryce Harper	harpb003	PHI	5	R	barnt001	bowmm001	R	0.509	on 1st	0.664	on 2nd
1	0	1	0	1	0	0	success	1	Bryce Harper	harpb003	PHI	3	R	carav001	darvy001	R	0.509	on 1st	0.664	on 2nd
1	0	1	1	1	0	1	success	1	Bryce Harper	harpb003	PHI	3	L	holab001	alcas001	R	1.130	corners	1.376	2nd & 3rd
2	0	0	1	1	0	1	fail	1	Bryce Harper	harpb003	PHI	4	R	alfaj002	richt001	R	1.130	corners	0.353	on 3rd
2	0	1	0	1	0	0	success	2	Bryce Harper	harpb003	PHI	4	R	ramow001	matzs001	L	0.224	on 1st	0.319	on 2nd
2	0	1	0	1	0	0	success	2	Bryce Harper	harpb003	PHI	4	R	mccab002	sorom001	R	0.224	on 1st	0.319	on 2nd
2	0	1	0	1	0	0	success	2	Bryce Harper	harpb003	PHI	4	R	mccab002	soboc001	R	0.224	on 1st	0.319	on 2nd
2	0	1	0	1	0	0	success	2	Bryce Harper	harpb003	PHI	4	R	alfaj002	quijj001	L	0.224	on 1st	0.319	on 2nd
2	0	1	0	1	0	0	success	2	Bryce Harper	harpb003	PHI	4	R	castj006	berrj001	R	0.224	on 1st	0.319	on 2nd
2	0	1	0	1	0	0	success	2	Bryce Harper	harpb003	PHI	4	R	casac001	grays001	R	0.224	on 1st	0.319	on 2nd
2	0	1	0	1	0	0	success	2	Bryce Harper	harpb003	PHI	4	R	flowt001	grees005	R	0.224	on 1st	0.319	on 2nd
2	0	1	1	1	0	1	success	2	Bryce Harper	harpb003	PHI	3	R	casac001	grays001	R	0.478	corners	0.580	2nd & 3rd

see how there are now two ‘EV_state’ columns; EV_state.x & EV_state.y These are the expected values for State 1 and State 2. Let’s use the ‘rename’ function to rename them to somethings that makes more sense and is easier to read and remember. Also, now you can calculate the change in value for each of the three SBA in the data frame. Use the mutate function to create a new column, and name it ‘change_EV’.
This is the simple subtraction between the expeted value of State 1 and State 2.

event_BSA_full_harper_ev_E <- event_BSA_full_harper_ev_D %>%
  rename(EV_state_1 = EV_state.x, 
         EV_state_2 = EV_state.y) %>%
  mutate(change_EV= EV_state_2- EV_state_1 )

was value created over these three SBA’s? Let’s find out.
Sum the change in value over these three SBA’s.
Remember, these three are non-representative, so this sumation does not represent the players’ actual value creation or descruction.

sum(event_BSA_full_harper_ev_E$change_EV)

## [1] 1.065

A positive value means the runner created value with thier SBA over the full season.

Task 3: Team level SBA expected value.

Did the team create value (or not) by SBA? Over an entire team, over all stolen base attempts, determine the change in expected value between state1 and state2. Over all SBA base states.

the event file is the same as the previous task; the full season SBA events see the total number of SBA for each team:

table(event_BSA$team_runner_cd)

## 
## ANA ARI ATL BAL BOS CHA CHN CIN CLE COL DET HOU KCA LAN MIA MIL MIN NYA NYN OAK 
##  75  95 101  88  83  72  54  95 115  82  73  78 101  60  69  96  43  61  74  56 
## PHI PIT SDN SEA SFN SLN TBA TEX TOR WAS 
##  79  79  87 136  55 114 106 132  53 120

And plot this table using ggplot

ggplot(event_BSA, aes(x=team_runner_cd)) +
  geom_bar() +
  theme(axis.text.x=element_text(angle=90,hjust=1)) +
  ylab('stolen base attempts / team') +
  xlab(NULL)

Similar to the previous tasks, next apply the expected value table to the event file containing all SBA for the season. Also at the Out state change in State 2.

event_BSA_team <- event_BSA %>% 
  merge(expect_value, by=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
  mutate(S2_OUT_STATE= ifelse(SBA_outcome=='success', OUT_STATE, 1+OUT_STATE))

And complete the State 2 by adding the change in base states:

event_BSA_team_A <- event_BSA_team %>%
  merge(SBA_state_change, by=c('base_1', 'base_2', 'base_3', 'SBA_outcome'))

Finally, reapply the expected value to State 2:

event_BSA_team_B <- event_BSA_team_A %>%
  merge(expect_value, 
        by.x=c('S2_OUT_STATE', 'S2_base_1', 'S2_base_2', 'S2_base_3'),
        by.y=c('OUT_STATE', 'base_1', 'base_2', 'base_3')) %>%
  rename(EV_state_1 = EV_state.x, 
         EV_state_2 = EV_state.y,
         base_state_name_1 = base_state_name.x,
         base_state_name_2 = base_state_name.y) %>%
  mutate(delta_EV = EV_state_2-EV_state_1)

event_BSA_team_B$team_runner_cd <- as.factor(event_BSA_team_B$team_runner_cd)

hist(event_BSA_team_B$delta_EV, breaks=20)

sum(event_BSA_team_B$delta_EV)

## [1] 55.964

sum(event_BSA_team_B$delta_EV)/length(event_BSA_team_B$delta_EV)

## [1] 0.02210269

event_BSA_team_C <- event_BSA_team_B %>%
  group_by(team_runner_cd) %>% 
  summarise(tot_delta_EV = sum(delta_EV))

ggplot(event_BSA_team_C) +
  geom_col(aes(y=tot_delta_EV, x=reorder(team_runner_cd, desc(tot_delta_EV)))) +
  theme_bw() +
  theme(axis.text.x=element_text(angle=90,hjust=1)) +
  xlab(NULL) +
  ylab('cumulative EV change') +
  ggtitle('Team wide change in EV from SBA') +
  theme(plot.title = element_text(hjust = 0.5)) +
  geom_hline(yintercept = 0, color='red')

extra stuff

event_BSA_team_indiv <- event_BSA_team_B %>% 
  group_by(runner_ID) %>% 
  mutate(counter=1) %>%
  summarise(delta_EV_tot= sum(delta_EV),
            SBA_tot= sum(counter))

hist(event_BSA_team_indiv$delta_EV_tot, breaks=40)

max(event_BSA_team_indiv$delta_EV_tot)

## [1] 4.574

ggplot(event_BSA_team_indiv) +
  geom_hline(yintercept = 0, color='red') +
  geom_point(aes(x=SBA_tot, y=delta_EV_tot), alpha=0.5) +
  theme_bw() + xlab('SBA per player') +
  scale_y_continuous(breaks=seq(-2.5, 5, by=0.5), 
                     name='net change in expected value from SBA')

Stolen Base Decisions & Strategy

Task 1: by hand, 1 player, 3 SBA & outcomes

TASK1a: read EV file

TASK1b: read event file

Task 1c: apply EV to SBA events

Task 1d: change in state with SBA,

Task 2: 1 player, all SBA in a season

TASK2a: read in the full event file & select a player

Task 2b: apply EV to SBA events

Task 2c: change in state with SBA,

Task 3: Team level SBA expected value.

extra stuff