I’ve written about my baseball hall-of-fame prediction model elsewhere (https://fivetwentyone.wordpress.com/2015/02/27/hof-prediction-model-2015/). It uses regularized linear regression to assign a weight to each voter, so that the weighted sum of the public ballots closely matches the results of the non-public ballots. A useful way to think about it is that the model picks out a subset of voters who, taken together, match the thinking of the non-public voters. The underlying assumption is that the year-to-year changes made by these voters will match the changes made by the corresponding non-public voters. The model doesn’t explicitly categorize the groups of voters it picks out, which makes it harder to explain why it gives the output it does, in large part because of the collinearity of the ballots. Here I will dig into some of the model results and try to explain, in part, why it makes the predictions it does.
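To make the idea of voter weights concrete, here is a minimal sketch on synthetic ballots. It assumes a ridge (L2) penalty solved in closed form with base R; the actual model may use a different penalty or fitting routine, and the names `ridge.fit`, `true.weights`, and the toy dimensions are all hypothetical.

```r
# Toy illustration only: synthetic ballots, not the HOF tracker data.
set.seed(521)

n.players <- 40   # rows: player ballots (hypothetical)
n.voters  <- 8    # columns: public voters (hypothetical)

# 0/1 ballot matrix: X[i, j] is 1 if voter j voted for player i
X <- matrix(rbinom(n.players * n.voters, size = 1, prob = 0.5),
            nrow = n.players, ncol = n.voters)

# Pretend the non-public vote share tracks voters 1, 2 and 3, plus a little noise
true.weights <- c(0.5, 0.3, 0.2, rep(0, n.voters - 3))
y <- as.vector(X %*% true.weights) + rnorm(n.players, sd = 0.02)

# Ridge least squares with an unpenalized intercept:
# minimize sum((y - b0 - X w)^2) + lambda * sum(w^2)
ridge.fit <- function(X, y, lambda) {
  X1 <- cbind(1, X)                             # prepend the intercept column
  penalty <- diag(c(0, rep(lambda, ncol(X))))   # no penalty on the intercept
  beta <- solve(t(X1) %*% X1 + penalty, t(X1) %*% y)
  list(intercept = beta[1], weights = as.vector(beta[-1]))
}

fit <- ridge.fit(X, y, lambda = 1)

# Fitted weights, one per voter, and the fitted values they imply
round(fit$weights, 2)
fitted.share <- fit$intercept + as.vector(X %*% fit$weights)
```

The penalty keeps the weights stable when ballots are nearly collinear, which is the situation described above where many voters fill out very similar ballots.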
library(readr)
library(dplyr)
library(ggplot2)
I’ve stored the results of my model for the 2017 hall-of-fame election on GitHub. The output file includes the result for each voter who made their ballot public both this year and last, for each player returning from last year’s ballot. It also includes each voter’s coefficient, or weight, from the linear regression model. Here I read it into a data frame:
hof.model.output.2017 <- read_csv('https://raw.githubusercontent.com/bdilday/hofTracker/master/data/outputs/model.weights.2017.csv')
I’ll also read in the model output for the 2016 hall-of-fame election, for later comparison.
hof.model.output.2016 <- read_csv('https://raw.githubusercontent.com/bdilday/hofTracker/master/data/outputs/model.weights.2016.csv')
Because my model can only use a voter in the fit if they also made their ballot public last year, the number of voters in the model output is smaller than the number in the full HOF tracker database.
number.of.voters <- unique(hof.model.output.2017$voter) %>% length()
print(number.of.voters)
## [1] 164
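The both-years restriction can be sketched on a toy table. This is only an illustration of the filtering logic, assuming a tidy layout with `voter`, `year`, `player`, and `value` columns; the voters and votes below are made up.

```r
library(dplyr)

# Hypothetical tidy ballots: one row per voter, player, and year
toy.ballots <- tibble(
  voter  = c("A", "A", "B", "C", "C", "D"),
  year   = c(2016, 2017, 2017, 2016, 2017, 2016),
  player = "Tim Raines",
  value  = c(1, 1, 1, 0, 1, 1)
)

# Keep only voters with a public ballot in both years
both.years <- toy.ballots %>%
  filter(year %in% c(2016, 2017)) %>%
  group_by(voter) %>%
  filter(n_distinct(year) == 2) %>%
  ungroup()

unique(both.years$voter)
```

In this toy data only voters A and C survive, because B is missing a 2016 ballot and D is missing a 2017 ballot.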
I’ve also generated a “tidy” formatted file giving each voter, player, and vote going back to 2009. This is useful for getting the overall mean of the public votes, without the restriction that a voter also made their vote public last year.
hof.tidy <- read_csv('https://raw.githubusercontent.com/bdilday/hofTracker/master/data/outputs/hof.tracker.tidy.csv')
Here are those mean values for the 2017 ballots as of this writing:
hof.tidy %>%
  filter(year == 2017) %>%
  group_by(player) %>%
  summarise(public.vote.pct = mean(value * 100)) %>%
  arrange(desc(public.vote.pct)) %>%
  print.data.frame(digits = 1)
## player public.vote.pct
## 1 Tim Raines 92.2
## 2 Jeff Bagwell 91.2
## 3 Ivan Rodriguez 80.8
## 4 Vlad Guerrero 73.6
## 5 Trevor Hoffman 72.5
## 6 Edgar Martinez 67.4
## 7 Barry Bonds 64.8
## 8 Roger Clemens 64.2
## 9 Mike Mussina 61.1
## 10 Curt Schilling 53.4
## 11 Lee Smith 29.0
## 12 Manny Ramirez 25.9
## 13 Larry Walker 23.3
## 14 Fred McGriff 16.6
## 15 Jeff Kent 15.0
## 16 Gary Sheffield 12.4
## 17 Billy Wagner 11.4
## 18 Sammy Sosa 9.3
## 19 Jorge Posada 4.7
## 20 Edgar Renteria 0.5
## 21 Jason Varitek 0.5
## 22 Arthur Rhodes 0.0
## 23 Carlos Guillen 0.0
## 24 Casey Blake 0.0
## 25 Derrek Lee 0.0
## 26 Freddy Sanchez 0.0
## 27 J.D. Drew 0.0
## 28 Magglio Ordonez 0.0
## 29 Matt Stairs 0.0
## 30 Melvin Mora 0.0
## 31 Mike Cameron 0.0
## 32 Orlando Cabrera 0.0
## 33 Pat Burrell 0.0
## 34 Tim Wakefield 0.0
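Returning to the prediction model for 2017, here is the sum of the weights over all voters. Every player’s rows carry the same set of voter weights, so the sum is the same for each player and I keep only the first row: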
hof.model.output.2017 %>% group_by(player) %>% summarise(sum.of.weights=sum(weight)) %>% head(1) %>% select(sum.of.weights)
## # A tibble: 1 × 1
## sum.of.weights
## <dbl>
## 1 0.9834448
There is also an intercept in the model, which is about 0.0077.
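As a quick sanity check on what those two numbers imply, here is a small sketch. It assumes the prediction has the form intercept plus the weighted sum of 0/1 ballots; the helper `predicted.share` and the equal per-voter weights are hypothetical and are used only to reproduce the quoted sum.

```r
# Values quoted in the text above
sum.of.weights <- 0.9834448
intercept      <- 0.0077   # approximate, as stated above

# Assumed form of the prediction: intercept + sum(weight * vote)
predicted.share <- function(votes, weights, intercept) {
  intercept + sum(weights * votes)
}

# Hypothetical equal weights that add up to the quoted sum
n.voters <- 164
weights  <- rep(sum.of.weights / n.voters, n.voters)

all.yes <- predicted.share(rep(1, n.voters), weights, intercept)  # every voter says yes
all.no  <- predicted.share(rep(0, n.voters), weights, intercept)  # every voter says no
c(all.yes = all.yes, all.no = all.no)
```

Under that assumption, a player named on every public ballot would be predicted at about 0.991 and a player named on none at about 0.008, so the predictions stay within the range of a vote share.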
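The following table breaks down the contribution to the predictions from each of the four possible groups of voters, namely (yes, no) last year and (yes, no) this year. Here vote1 is last year’s vote and vote2 is this year’s, n is the number of voters in the group, w is the sum of their weights, and frac.bin is the group’s share of the voters. The columns mw and weight.ratio are the same quantity computed two ways: the group’s summed weight divided by its share of the voters. A value above 1 means the group counts for more in the model than its head count alone would suggest.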
hof.model.output.2017 %>%
  group_by(player, vote1, vote2) %>%
  summarise(n = n(), w = sum(weight), mw = mean(weight) * number.of.voters) %>%
  ungroup() %>%
  group_by(player) %>%
  mutate(n.all = sum(n), frac.bin = n / n.all, weight.ratio = w / frac.bin) %>%
  select(-n.all) %>%
  print.data.frame(digits = 3)
## player vote1 vote2 n w mw frac.bin weight.ratio
## 1 Barry Bonds 0 0 57 0.43879 1.262 0.3476 1.262
## 2 Barry Bonds 0 1 21 0.11886 0.928 0.1280 0.928
## 3 Barry Bonds 1 0 1 0.00915 1.501 0.0061 1.501
## 4 Barry Bonds 1 1 85 0.41665 0.804 0.5183 0.804
## 5 Billy Wagner 0 0 142 0.83213 0.961 0.8659 0.961
## 6 Billy Wagner 0 1 9 0.04516 0.823 0.0549 0.823
## 7 Billy Wagner 1 0 4 0.02923 1.198 0.0244 1.198
## 8 Billy Wagner 1 1 9 0.07693 1.402 0.0549 1.402
## 9 Curt Schilling 0 0 52 0.39391 1.242 0.3171 1.242
## 10 Curt Schilling 0 1 13 0.11190 1.412 0.0793 1.412
## 11 Curt Schilling 1 0 23 0.10600 0.756 0.1402 0.756
## 12 Curt Schilling 1 1 76 0.37163 0.802 0.4634 0.802
## 13 Edgar Martinez 0 0 52 0.36500 1.151 0.3171 1.151
## 14 Edgar Martinez 0 1 32 0.21454 1.100 0.1951 1.100
## 15 Edgar Martinez 1 0 1 0.01459 2.392 0.0061 2.392
## 16 Edgar Martinez 1 1 79 0.38932 0.808 0.4817 0.808
## 17 Fred McGriff 0 0 129 0.73759 0.938 0.7866 0.938
## 18 Fred McGriff 0 1 6 0.03330 0.910 0.0366 0.910
## 19 Fred McGriff 1 0 8 0.05396 1.106 0.0488 1.106
## 20 Fred McGriff 1 1 21 0.15859 1.238 0.1280 1.238
## 21 Gary Sheffield 0 0 137 0.85139 1.019 0.8354 1.019
## 22 Gary Sheffield 0 1 6 0.03299 0.902 0.0366 0.902
## 23 Gary Sheffield 1 0 6 0.02644 0.723 0.0366 0.723
## 24 Gary Sheffield 1 1 15 0.07262 0.794 0.0915 0.794
## 25 Jeff Bagwell 0 0 14 0.13317 1.560 0.0854 1.560
## 26 Jeff Bagwell 0 1 16 0.16786 1.721 0.0976 1.721
## 27 Jeff Bagwell 1 1 134 0.68242 0.835 0.8171 0.835
## 28 Jeff Kent 0 0 132 0.79438 0.987 0.8049 0.987
## 29 Jeff Kent 0 1 7 0.03640 0.853 0.0427 0.853
## 30 Jeff Kent 1 0 7 0.03944 0.924 0.0427 0.924
## 31 Jeff Kent 1 1 18 0.11322 1.032 0.1098 1.032
## 32 Larry Walker 0 0 124 0.77514 1.025 0.7561 1.025
## 33 Larry Walker 0 1 14 0.06059 0.710 0.0854 0.710
## 34 Larry Walker 1 0 2 0.00919 0.754 0.0122 0.754
## 35 Larry Walker 1 1 24 0.13852 0.947 0.1463 0.947
## 36 Lee Smith 0 0 113 0.62211 0.903 0.6890 0.903
## 37 Lee Smith 0 1 6 0.01228 0.336 0.0366 0.336
## 38 Lee Smith 1 0 3 0.02233 1.221 0.0183 1.221
## 39 Lee Smith 1 1 42 0.32673 1.276 0.2561 1.276
## 40 Mike Mussina 0 0 54 0.43108 1.309 0.3293 1.309
## 41 Mike Mussina 0 1 23 0.15862 1.131 0.1402 1.131
## 42 Mike Mussina 1 0 8 0.04544 0.932 0.0488 0.932
## 43 Mike Mussina 1 1 79 0.34830 0.723 0.4817 0.723
## 44 Roger Clemens 0 0 58 0.43045 1.217 0.3537 1.217
## 45 Roger Clemens 0 1 22 0.10931 0.815 0.1341 0.815
## 46 Roger Clemens 1 0 1 0.00915 1.501 0.0061 1.501
## 47 Roger Clemens 1 1 83 0.43453 0.859 0.5061 0.859
## 48 Sammy Sosa 0 0 143 0.88361 1.013 0.8720 1.013
## 49 Sammy Sosa 0 1 6 0.04034 1.103 0.0366 1.103
## 50 Sammy Sosa 1 0 4 0.02117 0.868 0.0244 0.868
## 51 Sammy Sosa 1 1 11 0.03832 0.571 0.0671 0.571
## 52 Tim Raines 0 0 14 0.10704 1.254 0.0854 1.254
## 53 Tim Raines 0 1 25 0.21267 1.395 0.1524 1.395
## 54 Tim Raines 1 1 125 0.66373 0.871 0.7622 0.871
## 55 Trevor Hoffman 0 0 34 0.18497 0.892 0.2073 0.892
## 56 Trevor Hoffman 0 1 24 0.12811 0.875 0.1463 0.875
## 57 Trevor Hoffman 1 0 9 0.05997 1.093 0.0549 1.093
## 58 Trevor Hoffman 1 1 97 0.61039 1.032 0.5915 1.032
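One way to read the table is to turn the four groups into a weighted yes-share for each year and take the difference. The sketch below does this for two players, using the w values copied from the table above. It assumes the prediction is a weighted sum of ballots, so the intercept cancels in the difference; the names `flips` and `net.change` are hypothetical.

```r
library(dplyr)

# Summed weights copied from the table above (columns vote1, vote2, w)
flips <- tibble(
  player = rep(c("Barry Bonds", "Curt Schilling"), each = 4),
  vote1  = rep(c(0, 0, 1, 1), times = 2),
  vote2  = rep(c(0, 1, 0, 1), times = 2),
  w      = c(0.43879, 0.11886, 0.00915, 0.41665,
             0.39391, 0.11190, 0.10600, 0.37163)
)

# Weighted yes-share in each year, and the change between them.
# Only the two "flip" groups, (0, 1) and (1, 0), contribute to the change.
net.change <- flips %>%
  group_by(player) %>%
  summarise(last.year = sum(w * vote1),
            this.year = sum(w * vote2),
            change    = this.year - last.year)

net.change
```

For Bonds the change is 0.11886 - 0.00915, or about 0.110. For Schilling it is 0.11190 - 0.10600, or about 0.006, even though more of these voters dropped him (23) than added him (13). The voters who added him carry a weight.ratio of 1.412 against 0.756 for those who dropped him, which is the kind of effect the weight.ratio column is meant to expose.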