HOF prediction model

I’ve written about my baseball hall-of-fame prediction model elsewhere (https://fivetwentyone.wordpress.com/2015/02/27/hof-prediction-model-2015/). It uses regularized linear regression to assign a weight to each voter, so that the weighted sum of the public ballots closely matches the results of the non-public ballots. A useful way of thinking about it is that it picks out a subset of voters who, taken together, match the thinking of the non-public voters. The underlying assumption is that the changes from one year to the next by these voters will match the changes for the corresponding non-public voters. The model doesn’t explicitly categorize the groups of voters it picks out. This makes it harder to explain why the model gives the output it does, in large part because of the collinearity of the ballots. Here I will dig into some of the model results and try to explain, in part, why it makes the predictions it does.
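As a rough illustration of the idea, here is a minimal sketch of a ridge (L2-regularized) least-squares fit in base R. Everything in it is an assumption made for illustration: the ballots are synthetic, the variable names (`ballots`, `nonpublic.pct`, `lambda`) are hypothetical, the intercept is omitted, and the actual model may use a different penalty or fitting routine.

```r
# Toy illustration only: synthetic ballots, not the tracker data.
set.seed(1)
n.players <- 12
n.voters  <- 20   # more voters than players, so a penalty is needed

# ballots[i, j] is 1 if public voter j voted for player i, else 0
ballots <- matrix(rbinom(n.players * n.voters, 1, 0.5), nrow = n.players)

# pretend the non-public result is a blend of voters 1 and 2
true.weights  <- c(0.6, 0.4, rep(0, n.voters - 2))
nonpublic.pct <- as.vector(ballots %*% true.weights)

# ridge solution: minimize ||y - X w||^2 + lambda * ||w||^2
lambda <- 0.1
ridge.weights <- solve(t(ballots) %*% ballots + lambda * diag(n.voters),
                       t(ballots) %*% nonpublic.pct)

# one weight per voter; the weighted sum of ballots approximates the target
fitted.pct <- as.vector(ballots %*% ridge.weights)
round(cbind(target = nonpublic.pct, fitted = fitted.pct), 2)
```

Because many different weight vectors reproduce the target almost equally well when ballots are correlated, the individual weights in a fit like this are not uniquely meaningful, which is the collinearity issue mentioned above.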

Load some libraries

library(readr)
library(dplyr)
library(ggplot2)

Read the model data

I’ve stored the results of my model for the 2017 hall-of-fame election on GitHub. The output file includes the result for each voter who made their ballot public both this year and last, for each player who is returning from last year. It also includes the coefficient, or weight, of the linear regression model. Here I read it into a data frame.

hof.model.output.2017 <- read_csv('https://raw.githubusercontent.com/bdilday/hofTracker/master/data/outputs/model.weights.2017.csv')

I’ll also read in the model output for the 2016 hall-of-fame election for later comparison.

hof.model.output.2016 <- read_csv('https://raw.githubusercontent.com/bdilday/hofTracker/master/data/outputs/model.weights.2016.csv')

Because my model requires a voter to have made their vote public last year as well in order to be used in the fit, the number of voters in the model is smaller than the number in the full HOF tracker database.

number.of.voters <- unique(hof.model.output.2017$voter) %>% length()
print(number.of.voters)
## [1] 164
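The both-years restriction amounts to intersecting the two years' voter lists. Here is a small sketch on made-up data; the data frame `toy.ballots` and its column names `voter` and `year` are hypothetical and are not taken from the tracker files.

```r
# Toy data: which voters published a ballot in which year (assumed layout)
toy.ballots <- data.frame(
  voter = c("A", "A", "B", "C", "C", "D"),
  year  = c(2016, 2017, 2017, 2016, 2017, 2016),
  stringsAsFactors = FALSE
)

voters.2016 <- unique(toy.ballots$voter[toy.ballots$year == 2016])
voters.2017 <- unique(toy.ballots$voter[toy.ballots$year == 2017])

# only voters public in both years can enter the fit
usable.voters <- intersect(voters.2016, voters.2017)
usable.voters
```

In this toy case voters B and D each appear in only one year, so they drop out and two usable voters remain.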

I’ve also generated a “tidy” formatted file giving each voter, player, and vote going back to 2009. This will be useful for getting the overall mean of the public votes, without the restriction that a voter also made their vote public last year.

hof.tidy <- read_csv('https://raw.githubusercontent.com/bdilday/hofTracker/master/data/outputs/hof.tracker.tidy.csv')

Here are those mean values for the 2017 ballots as of this writing:

hof.tidy %>% 
  filter(year == 2017) %>% 
  group_by(player) %>% 
  summarise(public.vote.pct = mean(value * 100)) %>% 
  arrange(desc(public.vote.pct)) %>% 
  print.data.frame(digits = 1)
##             player public.vote.pct
## 1       Tim Raines            92.2
## 2     Jeff Bagwell            91.2
## 3   Ivan Rodriguez            80.8
## 4    Vlad Guerrero            73.6
## 5   Trevor Hoffman            72.5
## 6   Edgar Martinez            67.4
## 7      Barry Bonds            64.8
## 8    Roger Clemens            64.2
## 9     Mike Mussina            61.1
## 10  Curt Schilling            53.4
## 11       Lee Smith            29.0
## 12   Manny Ramirez            25.9
## 13    Larry Walker            23.3
## 14    Fred McGriff            16.6
## 15       Jeff Kent            15.0
## 16  Gary Sheffield            12.4
## 17    Billy Wagner            11.4
## 18      Sammy Sosa             9.3
## 19    Jorge Posada             4.7
## 20  Edgar Renteria             0.5
## 21   Jason Varitek             0.5
## 22   Arthur Rhodes             0.0
## 23  Carlos Guillen             0.0
## 24     Casey Blake             0.0
## 25      Derrek Lee             0.0
## 26  Freddy Sanchez             0.0
## 27       J.D. Drew             0.0
## 28 Magglio Ordonez             0.0
## 29     Matt Stairs             0.0
## 30     Melvin Mora             0.0
## 31    Mike Cameron             0.0
## 32 Orlando Cabrera             0.0
## 33     Pat Burrell             0.0
## 34   Tim Wakefield             0.0

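To go back to the prediction model for 2017, here is the sum of the weights. Every voter in the fit appears once for each player, so the sum is the same for every player and I show only the first row.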

hof.model.output.2017 %>% group_by(player) %>% summarise(sum.of.weights=sum(weight)) %>% head(1) %>% select(sum.of.weights)
## # A tibble: 1 × 1
##   sum.of.weights
##            <dbl>
## 1      0.9834448

There is also an intercept in the model, which is about 0.0077.
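To see what these two numbers imply, here is a short sketch that assumes a prediction has the form intercept plus the summed weights of the voters who voted yes. That form is my reading of the description above, not something stated explicitly, and the variable names are hypothetical.

```r
# Values quoted above; the intercept is approximate
intercept      <- 0.0077
sum.of.weights <- 0.9834448

# Assumed form: prediction = intercept + sum(weight * vote)
# a player named on none of the ballots used in the fit
min.prediction <- intercept
# a player named on every ballot used in the fit
max.prediction <- intercept + sum.of.weights

c(min = min.prediction, max = max.prediction)
```

Under that assumption the predictions stay close to the 0 to 1 range, since the weights sum to nearly 1 and the intercept is small.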

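The following table breaks down the contribution to the predictions from each of the four possible groups of voters, namely (yes, no) last year and (yes, no) this year, given by vote1 and vote2 with 1 meaning yes. For each group, n is the number of voters, w is the sum of their weights, frac.bin is the fraction of all voters in the group, and weight.ratio is w divided by frac.bin.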

hof.model.output.2017 %>% 
  group_by(player, vote1, vote2) %>% 
  summarise(n=n(), w=sum(weight), mw=mean(weight)*number.of.voters) %>% 
  ungroup() %>% 
  group_by(player) %>% 
  mutate(n.all=sum(n), frac.bin=n/n.all, weight.ratio=w/frac.bin) %>% 
  select(-n.all) %>% print.data.frame(digits=3)
##            player vote1 vote2   n       w    mw frac.bin weight.ratio
## 1     Barry Bonds     0     0  57 0.43879 1.262   0.3476        1.262
## 2     Barry Bonds     0     1  21 0.11886 0.928   0.1280        0.928
## 3     Barry Bonds     1     0   1 0.00915 1.501   0.0061        1.501
## 4     Barry Bonds     1     1  85 0.41665 0.804   0.5183        0.804
## 5    Billy Wagner     0     0 142 0.83213 0.961   0.8659        0.961
## 6    Billy Wagner     0     1   9 0.04516 0.823   0.0549        0.823
## 7    Billy Wagner     1     0   4 0.02923 1.198   0.0244        1.198
## 8    Billy Wagner     1     1   9 0.07693 1.402   0.0549        1.402
## 9  Curt Schilling     0     0  52 0.39391 1.242   0.3171        1.242
## 10 Curt Schilling     0     1  13 0.11190 1.412   0.0793        1.412
## 11 Curt Schilling     1     0  23 0.10600 0.756   0.1402        0.756
## 12 Curt Schilling     1     1  76 0.37163 0.802   0.4634        0.802
## 13 Edgar Martinez     0     0  52 0.36500 1.151   0.3171        1.151
## 14 Edgar Martinez     0     1  32 0.21454 1.100   0.1951        1.100
## 15 Edgar Martinez     1     0   1 0.01459 2.392   0.0061        2.392
## 16 Edgar Martinez     1     1  79 0.38932 0.808   0.4817        0.808
## 17   Fred McGriff     0     0 129 0.73759 0.938   0.7866        0.938
## 18   Fred McGriff     0     1   6 0.03330 0.910   0.0366        0.910
## 19   Fred McGriff     1     0   8 0.05396 1.106   0.0488        1.106
## 20   Fred McGriff     1     1  21 0.15859 1.238   0.1280        1.238
## 21 Gary Sheffield     0     0 137 0.85139 1.019   0.8354        1.019
## 22 Gary Sheffield     0     1   6 0.03299 0.902   0.0366        0.902
## 23 Gary Sheffield     1     0   6 0.02644 0.723   0.0366        0.723
## 24 Gary Sheffield     1     1  15 0.07262 0.794   0.0915        0.794
## 25   Jeff Bagwell     0     0  14 0.13317 1.560   0.0854        1.560
## 26   Jeff Bagwell     0     1  16 0.16786 1.721   0.0976        1.721
## 27   Jeff Bagwell     1     1 134 0.68242 0.835   0.8171        0.835
## 28      Jeff Kent     0     0 132 0.79438 0.987   0.8049        0.987
## 29      Jeff Kent     0     1   7 0.03640 0.853   0.0427        0.853
## 30      Jeff Kent     1     0   7 0.03944 0.924   0.0427        0.924
## 31      Jeff Kent     1     1  18 0.11322 1.032   0.1098        1.032
## 32   Larry Walker     0     0 124 0.77514 1.025   0.7561        1.025
## 33   Larry Walker     0     1  14 0.06059 0.710   0.0854        0.710
## 34   Larry Walker     1     0   2 0.00919 0.754   0.0122        0.754
## 35   Larry Walker     1     1  24 0.13852 0.947   0.1463        0.947
## 36      Lee Smith     0     0 113 0.62211 0.903   0.6890        0.903
## 37      Lee Smith     0     1   6 0.01228 0.336   0.0366        0.336
## 38      Lee Smith     1     0   3 0.02233 1.221   0.0183        1.221
## 39      Lee Smith     1     1  42 0.32673 1.276   0.2561        1.276
## 40   Mike Mussina     0     0  54 0.43108 1.309   0.3293        1.309
## 41   Mike Mussina     0     1  23 0.15862 1.131   0.1402        1.131
## 42   Mike Mussina     1     0   8 0.04544 0.932   0.0488        0.932
## 43   Mike Mussina     1     1  79 0.34830 0.723   0.4817        0.723
## 44  Roger Clemens     0     0  58 0.43045 1.217   0.3537        1.217
## 45  Roger Clemens     0     1  22 0.10931 0.815   0.1341        0.815
## 46  Roger Clemens     1     0   1 0.00915 1.501   0.0061        1.501
## 47  Roger Clemens     1     1  83 0.43453 0.859   0.5061        0.859
## 48     Sammy Sosa     0     0 143 0.88361 1.013   0.8720        1.013
## 49     Sammy Sosa     0     1   6 0.04034 1.103   0.0366        1.103
## 50     Sammy Sosa     1     0   4 0.02117 0.868   0.0244        0.868
## 51     Sammy Sosa     1     1  11 0.03832 0.571   0.0671        0.571
## 52     Tim Raines     0     0  14 0.10704 1.254   0.0854        1.254
## 53     Tim Raines     0     1  25 0.21267 1.395   0.1524        1.395
## 54     Tim Raines     1     1 125 0.66373 0.871   0.7622        0.871
## 55 Trevor Hoffman     0     0  34 0.18497 0.892   0.2073        0.892
## 56 Trevor Hoffman     0     1  24 0.12811 0.875   0.1463        0.875
## 57 Trevor Hoffman     1     0   9 0.05997 1.093   0.0549        1.093
## 58 Trevor Hoffman     1     1  97 0.61039 1.032   0.5915        1.032
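The mw and weight.ratio columns are identical in every row. That follows from the definitions: mean weight times the total number of voters is (w / n) * N, which equals w / (n / N). The sketch below checks this using the Barry Bonds rows copied from the table; the vector names are just for illustration.

```r
# Barry Bonds rows from the table above
n <- c(57, 21, 1, 85)                         # voters per (vote1, vote2) group
w <- c(0.43879, 0.11886, 0.00915, 0.41665)    # summed weight per group
N <- sum(n)                                   # 164 voters in the fit

frac.bin     <- n / N          # share of voters in each group
weight.ratio <- w / frac.bin   # share of weight relative to share of voters
mw           <- (w / n) * N    # mean weight scaled by the number of voters

# the two columns are the same quantity computed two ways
all.equal(mw, weight.ratio)
round(weight.ratio, 3)
```

A ratio above 1 means the group carries more weight in the model than its headcount alone would give it, and a ratio below 1 means it carries less.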