I recently stumbled upon the so-called Glicko-2 algorithm when the popular esports site HLTV.org published a newly created ranking of professional Counter-Strike teams. The ranking itself looked sensible, but the algorithm behind it seemed a bit subjective and lacked statistical or scientific justification. Furthermore, I believe that confidence intervals, which were not present in their approach, are very important in this kind of analysis to tell whether differences between teams are significant.
I found that the Glicko-2 algorithm would suit this problem very well and was eager to use it to create my own ranking. Unfortunately, I could find neither the data used by HLTV.org nor an implementation of the Glicko-2 algorithm for R. So I went ahead and wrote a web scraper to get the desired data and implemented a version of Glicko-2 in R myself. The code can be found on GitHub.
Furthermore, I published my results on Reddit and received an overwhelming number of positive reactions, which made me even more eager to improve the algorithm to fit my problem better. I have several ideas, but first I have to implement the underlying Bayesian model as a starting point and then move forward.
The goal of this exercise is to reproduce the results of the NFL example examined in Glickman (2001) in order to make sure that I have succeeded in implementing a correct version of the proposed algorithm (i.e. model) using the RStan library for R. In a further application I am planning to modify the model to cope with score differences in the outcome variable, so that it can differentiate between close and clear victories (most likely using an ordered logit or Poisson distribution instead of a binomial logit).
The following sections describe my approach to implementing the proposed constant variance and stochastic variance models and the subsequent parameter inference on the NFL game data (as used in the paper and extended up to the end of 2015). I will present how I obtained and processed the data, how the models are implemented, and how the inference is carried out. All results will be compared to those described in the paper. In the end I will conclude with what I was able to reproduce and where I failed.
In order to perform the analysis I needed to get the data first. Since I was not able to find resources for the paper, I downloaded data from the publicly available Pro Football Reference (2015) and pre-processed it according to the information given in the paper. After a bit of data processing I believe I was able to recreate the dataset used. The table below shows the first and last 5 rows.
| winner | loser | date | is_home | score | period | play_week |
|---|---|---|---|---|---|---|
| Indianapolis Colts | Arizona Cardinals | 1996-09-01 | 1 | 1 | 1 | 1 |
| Carolina Panthers | Atlanta Falcons | 1996-09-01 | 1 | 1 | 1 | 1 |
| St. Louis Rams | Cincinnati Bengals | 1996-09-01 | 1 | 1 | 1 | 1 |
| Minnesota Vikings | Detroit Lions | 1996-09-01 | 1 | 1 | 1 | 1 |
| Miami Dolphins | New England Patriots | 1996-09-01 | 1 | 1 | 1 | 1 |
| Seattle Seahawks | San Diego Chargers | 2000-11-05 | 1 | 1 | 5 | 10 |
| New Orleans Saints | San Francisco 49ers | 2000-11-05 | 1 | 1 | 5 | 10 |
| Carolina Panthers | St. Louis Rams | 2000-11-05 | -1 | 1 | 5 | 10 |
| Arizona Cardinals | Washington Redskins | 2000-11-05 | 1 | 1 | 5 | 10 |
| Green Bay Packers | Minnesota Vikings | 2000-11-06 | 1 | 1 | 5 | 10 |
The full table contains all regular season games (excluding two draws in 1997) from 1996 up to the tenth play week of 2000. This results in 1109 rows, as described in the paper, so this looks fine. score is the outcome variable for the team under winner (always one in this case, of course, due to the structure of the table), loser contains the losing team, period numbers the periods from 1996 to 2000 (5 periods), and is_home is a dummy variable that is +1 if the team under winner played on its home ground and -1 otherwise. date and play_week contain information on when the games took place.
Matches are played by 31 different teams. The following table shows the first 5 teams in alphabetical order. team_id numbers all teams consecutively.
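The column definitions above can be sketched in base R as follows. This is only an illustration under assumptions: the raw layout (`home_team`, `away_team`, `home_points`, `away_points`) is hypothetical, the teams and points are made up, and the rule that maps January and February games to the previous season is my own guess at a sensible convention, not something taken from the paper. `play_week` is left out.

```r
# Toy input with made-up teams and points (NOT real NFL results).
# The column names of `raw` are hypothetical.
raw <- data.frame(
  date        = as.Date(c("1996-09-01", "1997-11-16", "2000-11-05")),
  home_team   = c("Team A", "Team C", "Team B"),
  away_team   = c("Team B", "Team A", "Team C"),
  home_points = c(21, 10, 14),
  away_points = c(17, 10, 20),
  stringsAsFactors = FALSE
)

# Draws have no winner, so they are dropped.
raw <- raw[raw$home_points != raw$away_points, ]

# Assumption: games played in January or February belong to the previous season.
year   <- as.integer(format(raw$date, "%Y"))
month  <- as.integer(format(raw$date, "%m"))
season <- year - as.integer(month <= 2)

home_won <- raw$home_points > raw$away_points
games <- data.frame(
  winner  = ifelse(home_won, raw$home_team, raw$away_team),
  loser   = ifelse(home_won, raw$away_team, raw$home_team),
  date    = raw$date,
  is_home = ifelse(home_won, 1, -1),  # +1: winner played at home, -1: winner played away
  score   = 1,                        # outcome for the team under `winner`
  period  = season - 1995L,           # 1996 -> 1, ..., 2000 -> 5
  stringsAsFactors = FALSE
)
print(games)
```

Storing every game from the winner's point of view keeps score constant at one, so all the information about who was at home sits in is_home.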
| team_name | team_id |
|---|---|
| Arizona Cardinals | 1 |
| Atlanta Falcons | 2 |
| Baltimore Ravens | 3 |
| Buffalo Bills | 4 |
| Carolina Panthers | 5 |
I conclude that the dataset matches all the information given in the paper, e.g. in terms of the number of matches and the number of distinct teams.
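A small sketch of how the consecutive team_id numbering and the sanity checks could look in base R. The three games are rows from the table shown earlier; the column names `winner_id` and `loser_id` are hypothetical and only used for this illustration.

```r
# Three games taken from the table of games above.
games <- data.frame(
  winner = c("Indianapolis Colts", "Carolina Panthers", "Carolina Panthers"),
  loser  = c("Arizona Cardinals", "Atlanta Falcons", "St. Louis Rams"),
  stringsAsFactors = FALSE
)

# Number all distinct teams consecutively in alphabetical order.
team_names <- sort(unique(c(games$winner, games$loser)))
teams <- data.frame(
  team_name = team_names,
  team_id   = seq_along(team_names),
  stringsAsFactors = FALSE
)

# Replace names by ids so that the games can index into parameter vectors.
games$winner_id <- match(games$winner, teams$team_name)
games$loser_id  <- match(games$loser, teams$team_name)
print(teams)
print(games)

# On the full 1996 to 2000 data, the checks against the paper would read:
# stopifnot(nrow(games) == 1109, nrow(teams) == 31)
```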
This section describes how the models were implemented. For the implementation I used the RStan library in R, which provides an interface to the Stan language. The model is pretty well defined in the paper. Only the priors used are sometimes not explicitly stated. I implemented one version of the constant variance model and one version of the stochastic variance model. The MCMC estimation is carried out using the No-U-Turn Sampler (NUTS).
In the following equations \(\mathcal{N}(\mu,\sigma^2)\) denotes a normal distribution with mean \(\mu\) and standard deviation \(\sigma\) (i.e. variance \(\sigma^2\)). \(\mathcal{LN}(\mu,\sigma^2)\) denotes a log normal distribution with mean \(\mu\) and standard deviation \(\sigma\) on the log scale. \(\mathcal{B}(p)\) denotes a Bernoulli distribution with probability \(p\). \(\mathcal{IG}(\alpha, \beta)\) denotes an inverse gamma distribution with shape \(\alpha\) and scale \(\beta\). \(l()\) denotes the logistic function. Parameters and variables are denoted very similarly to the paper: \(y\) is the binary response vector (score) for the first teams, \(\gamma_f\) is the vector of strengths of the first teams, and \(\gamma_s\) is the vector of strengths of the second teams. \(x\) is the covariate vector for the home advantage, which equals 1 if the first team plays on its home field and -1 otherwise. All of these vectors are of length 1109. \(\rho\) is the autocorrelation parameter, \(\beta\) is the parameter for the home field advantage, \(\omega\) is the initial standard deviation in period zero, and \(\tau\) is the parameter for the change in variances. \(\gamma^{(t)}\) is a vector of length 31 containing the teams' strengths in period \(t \in \{0,\dots,5\}\). Likewise, \(\sigma^{2^{(t)}}\) is a vector of length 31 containing the teams' variances in period \(t \in \{1,\dots,5\}\). The following two subsections present the model equations and the priors used for estimation for both models. These should be exactly the models used in the paper, except for some priors on single parameters that are not explicitly defined there.
For the constant variance model, the model equations and priors used are specified as follows: \[ y \sim \mathcal{B}(l(\gamma_{f} - \gamma_{s} + x \beta)) \] \[ \gamma^{(0)} \sim \mathcal{N}(0, \omega^2) \] \[ \gamma^{(t+1)} \sim \mathcal{N}(\rho \gamma^{(t)}, \sigma^2) \text{ } \forall_{t < 5}\] \[ \beta \sim \mathcal{N}(0, 25^2) \] \[ \omega^2 \sim \mathcal{IG}(4, 2) \]
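To make the constant variance equations concrete, here is a hedged base R sketch that simulates team strengths from the autoregressive process and evaluates the Bernoulli-logit log-likelihood. It is not the Stan model used for the report. All parameter values and the three toy games are illustrative assumptions, and the helper names `linear_predictor` and `log_lik` are hypothetical.

```r
set.seed(1)
n_teams   <- 4
n_periods <- 5
# Illustrative parameter values, not estimates.
rho <- 0.5; sigma <- 0.7; omega <- 0.7; beta <- 0.5

# Row t + 1 of `gamma` holds the strengths in period t, for t = 0, ..., n_periods.
gamma <- matrix(NA_real_, nrow = n_periods + 1, ncol = n_teams)
gamma[1, ] <- rnorm(n_teams, mean = 0, sd = omega)                    # gamma^(0)
for (t in seq_len(n_periods)) {
  gamma[t + 1, ] <- rnorm(n_teams, mean = rho * gamma[t, ], sd = sigma)
}

# gamma_f - gamma_s + x * beta for each game.
linear_predictor <- function(first, second, period, x, gamma, beta) {
  gamma[cbind(period + 1, first)] - gamma[cbind(period + 1, second)] + x * beta
}

# Bernoulli-logit log-likelihood of the outcomes y (1 = first team won).
log_lik <- function(y, first, second, period, x, gamma, beta) {
  eta <- linear_predictor(first, second, period, x, gamma, beta)
  sum(dbinom(y, size = 1, prob = plogis(eta), log = TRUE))
}

# Three toy games: team ids, period and home indicator are made up.
first  <- c(1, 2, 3)
second <- c(2, 3, 4)
period <- c(1, 3, 5)
x      <- c(1, -1, 1)
y  <- rbinom(3, size = 1, prob = plogis(linear_predictor(first, second, period, x, gamma, beta)))
ll <- log_lik(y, first, second, period, x, gamma, beta)
print(ll)
```

In the actual analysis the strengths are of course not simulated but inferred. The sketch only shows how the pieces of the model fit together.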
For the stochastic variance model, the model equations and priors used are specified as follows:
\[ y \sim \mathcal{B}(l(\gamma_{f} - \gamma_{s} + x \beta)) \] \[ \gamma^{(0)} \sim \mathcal{N}(0, \omega^2) \] \[ \gamma^{(t+1)} \sim \mathcal{N}(\rho \gamma^{(t)}, \sigma^{2^{(t+1)}}) \text{ } \forall_{t < 5} \]
\[ \sigma^{2^{(1)}} \sim \mathcal{LN}(\log(\omega^2), \tau^2) \] \[ \sigma^{2^{(t+1)}} \sim \mathcal{LN}(\log(\sigma^{2^{(t)}}), \tau^2) \text{ } \forall_{0 < t < 5} \]
\[ \beta \sim \mathcal{N}(0, 25^2) \] \[ \tau \sim \mathcal{IG}(4, \frac{3}{2}) \] \[ \omega^2 \sim \mathcal{IG}(4, 2) \]
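The difference to the constant variance model is that each team's variance follows a random walk on the log scale. A hedged base R sketch of this generative process, with illustrative parameter values that are assumptions and not estimates:

```r
set.seed(2)
n_teams   <- 3
n_periods <- 5
# Illustrative parameter values, not estimates.
rho <- 0.5; omega <- 0.65; tau <- 0.3

# Row t of `sigma_sq` holds the variances in period t, for t = 1, ..., n_periods.
sigma_sq <- matrix(NA_real_, nrow = n_periods, ncol = n_teams)
sigma_sq[1, ] <- rlnorm(n_teams, meanlog = log(omega^2), sdlog = tau)
for (t in 2:n_periods) {
  sigma_sq[t, ] <- rlnorm(n_teams, meanlog = log(sigma_sq[t - 1, ]), sdlog = tau)
}

# Row t + 1 of `gamma` holds the strengths in period t, for t = 0, ..., n_periods.
gamma <- matrix(NA_real_, nrow = n_periods + 1, ncol = n_teams)
gamma[1, ] <- rnorm(n_teams, mean = 0, sd = omega)
for (t in seq_len(n_periods)) {
  # The strengths of period t use the team specific variances of period t.
  gamma[t + 1, ] <- rnorm(n_teams, mean = rho * gamma[t, ], sd = sqrt(sigma_sq[t, ]))
}
print(round(sigma_sq, 3))
```

Note that `rlnorm()` and `rnorm()` take standard deviations, so \(\tau\) is passed directly while the variances are square rooted.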
For model estimation, iterations are carried out in parallel on 6 CPU cores, where each core computes 25000 iterations on an independent chain (one chain per core). The first 5000 iterations per chain are discarded as warmup. This leads to 120000 post-warmup iterations in total, all of which are saved and used for inference. I did not find any benefit in running more iterations.
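The sampler settings and the resulting number of draws can be written down as below. The wrapper `run_sampler` is a hypothetical helper that is defined but not called here; it assumes a model compiled with RStan and a data list, both of which are placeholders in this sketch. In RStan the `iter` argument counts the warmup iterations as well, which is why the warmup is subtracted.

```r
# Settings as described in the text.
settings <- list(chains = 6, iter = 25000, warmup = 5000, cores = 6)

# Post-warmup draws over all chains.
post_warmup_draws <- settings$chains * (settings$iter - settings$warmup)
print(post_warmup_draws)

# Hypothetical wrapper around rstan::sampling(); `compiled_model` and
# `stan_data` are placeholders for a compiled Stan model and its data list.
run_sampler <- function(compiled_model, stan_data, settings) {
  rstan::sampling(
    object    = compiled_model,
    data      = stan_data,
    chains    = settings$chains,
    iter      = settings$iter,
    warmup    = settings$warmup,
    cores     = settings$cores,
    algorithm = "NUTS"
  )
}
```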
This section presents the results of the model estimations. First the constant variance model is examined briefly for the sake of comparison, and then the stochastic variance model, which I will focus on, is examined extensively.
This section compares summaries of the posterior distributions of the parameters that are neither team nor time specific for both models.
This subsection presents posterior means, medians, and pointwise 95% posterior intervals for the parameters \(\rho\), \(\omega\), \(\sigma\), and \(\beta\) in the constant variance model.
| parameter | mean | 2.5% | 50% | 97.5% |
|---|---|---|---|---|
| rho | 0.5048 | 0.2515 | 0.5080 | 0.7444 |
| omega | 0.7438 | 0.4733 | 0.7112 | 1.2020 |
| sigma | 0.6777 | 0.5065 | 0.6750 | 0.8631 |
| beta | 0.5142 | 0.3797 | 0.5145 | 0.6517 |
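The columns of such a summary table are plain sample statistics of the posterior draws. A minimal sketch, assuming the draws of one parameter are available as a numeric vector (with RStan they could be pulled from the fitted object via `rstan::extract()`). The draws below are random numbers standing in for a real posterior sample, and `summarise_parameter` is a hypothetical helper name.

```r
set.seed(3)
# Stand-in for the posterior draws of a single parameter (NOT real output).
draws <- rnorm(10000, mean = 0.5, sd = 0.07)

# Mean, median and the bounds of the central 95% posterior interval.
summarise_parameter <- function(draws) {
  q <- quantile(draws, probs = c(0.025, 0.5, 0.975), names = FALSE)
  c(mean = mean(draws), `2.5%` = q[1], `50%` = q[2], `97.5%` = q[3])
}

print(round(summarise_parameter(draws), 4))
```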
This subsection presents posterior means, medians, and pointwise 95% posterior intervals for the parameters \(\rho\), \(\tau\), \(\omega\), and \(\beta\) in the stochastic variance model.
| parameter | mean | 2.5% | 50% | 97.5% |
|---|---|---|---|---|
| rho | 0.5293 | 0.2598 | 0.5336 | 0.7745 |
| tau | 0.3317 | 0.1523 | 0.3166 | 0.6080 |
| omega | 0.6493 | 0.5097 | 0.6456 | 0.8151 |
| beta | 0.5148 | 0.3789 | 0.5142 | 0.6513 |
The parameter \(\rho\) is pretty much the same as described in Glickman (2001), but \(\beta\), the effect of playing on the home field, is almost twice as high as presented in the paper. The paper does not present results for \(\tau\) or \(\omega\).
The next table contains information on teams’ merits for period 5 (year 2000) and is structured like Table 1 in Glickman (2001) on p. 678. Displayed are teams’ merits as posterior means of \(\gamma^{(5)}\) and the posterior standard deviations of these parameters for the constant variance model (cv) and the stochastic variance model (sv). The table is ordered by the merits in the constant variance model.
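To get a feeling for the size of \(\beta\), it can be translated to the probability scale. The sketch below takes the posterior mean and the 95% interval bounds of \(\beta\) from the stochastic variance table above and computes the win probability of two equally strong teams (\(\gamma_f = \gamma_s\)). Plugging in the posterior mean is a simplification on my part; the exact posterior mean of the probability would require the individual draws.

```r
# Posterior summary of beta in the stochastic variance model (from the table above).
beta_summary <- c(lower = 0.3789, mean = 0.5148, upper = 0.6513)

# With equal strengths the linear predictor reduces to x * beta.
home_win_prob <- plogis(beta_summary)    # x = +1: first team plays at home
away_win_prob <- plogis(-beta_summary)   # x = -1: first team plays away

print(round(home_win_prob, 3))
print(round(away_win_prob, 3))
```

With these numbers an evenly matched home team wins roughly 63% of the time according to the plug-in estimate.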
| team_name | posterior_mean_cv | posterior_sd_cv | posterior_mean_sv | posterior_sd_sv |
|---|---|---|---|---|
| Tennessee Titans | 1.2371 | 0.5725 | 1.3108 | 0.7003 |
| Minnesota Vikings | 0.8815 | 0.5434 | 0.9464 | 0.6119 |
| Miami Dolphins | 0.8023 | 0.5576 | 0.7623 | 0.5755 |
| Oakland Raiders | 0.7964 | 0.5662 | 0.7806 | 0.6595 |
| Indianapolis Colts | 0.6136 | 0.5289 | 0.6996 | 0.6104 |
| New York Giants | 0.5452 | 0.5545 | 0.4744 | 0.5544 |
| New York Jets | 0.5353 | 0.5376 | 0.5563 | 0.5546 |
| St. Louis Rams | 0.5276 | 0.5494 | 0.5160 | 0.5984 |
| Buffalo Bills | 0.4824 | 0.5425 | 0.4774 | 0.5341 |
| Tampa Bay Buccaneers | 0.3880 | 0.5346 | 0.3642 | 0.5104 |
| Washington Redskins | 0.3611 | 0.5208 | 0.3137 | 0.4857 |
| Kansas City Chiefs | 0.2885 | 0.5422 | 0.2782 | 0.5259 |
| Detroit Lions | 0.2415 | 0.5235 | 0.2059 | 0.5050 |
| Baltimore Ravens | 0.2283 | 0.5328 | 0.1719 | 0.5096 |
| Denver Broncos | 0.0596 | 0.5395 | 0.0499 | 0.5374 |
| Pittsburgh Steelers | 0.0562 | 0.5418 | 0.0249 | 0.5290 |
| Philadelphia Eagles | -0.0104 | 0.5218 | -0.0201 | 0.5253 |
| Jacksonville Jaguars | -0.0733 | 0.5449 | -0.1020 | 0.5871 |
| Green Bay Packers | -0.0766 | 0.5309 | -0.0554 | 0.5176 |
| New Orleans Saints | -0.1252 | 0.5340 | -0.1539 | 0.5457 |
| New England Patriots | -0.3677 | 0.5432 | -0.3108 | 0.5422 |
| Carolina Panthers | -0.3806 | 0.5358 | -0.3850 | 0.5274 |
| Seattle Seahawks | -0.4281 | 0.5341 | -0.3897 | 0.5296 |
| Dallas Cowboys | -0.4726 | 0.5436 | -0.4239 | 0.5377 |
| Chicago Bears | -0.6197 | 0.5384 | -0.5884 | 0.5407 |
| Atlanta Falcons | -0.6458 | 0.5271 | -0.7367 | 0.5908 |
| Arizona Cardinals | -0.7129 | 0.5310 | -0.6937 | 0.5496 |
| San Francisco 49ers | -0.9145 | 0.5363 | -1.0427 | 0.6379 |
| Cincinnati Bengals | -0.9441 | 0.5644 | -1.0066 | 0.6129 |
| San Diego Chargers | -1.1459 | 0.5902 | -1.4030 | 0.8661 |
| Cleveland Browns | -1.1530 | 0.5743 | -1.3033 | 0.6631 |
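How much the ordering differs between the two models can be checked with ranks and a rank correlation. The sketch below uses only the first eight rows of the table above, so it illustrates the idea rather than comparing all 31 teams.

```r
# First eight rows of the table above (posterior means of gamma^(5)).
merits <- data.frame(
  team = c("Tennessee Titans", "Minnesota Vikings", "Miami Dolphins", "Oakland Raiders",
           "Indianapolis Colts", "New York Giants", "New York Jets", "St. Louis Rams"),
  cv = c(1.2371, 0.8815, 0.8023, 0.7964, 0.6136, 0.5452, 0.5353, 0.5276),
  sv = c(1.3108, 0.9464, 0.7623, 0.7806, 0.6996, 0.4744, 0.5563, 0.5160),
  stringsAsFactors = FALSE
)

# Rank 1 is the strongest team within this subset.
merits$rank_cv <- rank(-merits$cv)
merits$rank_sv <- rank(-merits$sv)

# Spearman rank correlation between the two orderings.
spearman <- cor(merits$cv, merits$sv, method = "spearman")
print(merits[, c("team", "rank_cv", "rank_sv")])
print(spearman)
```

Within this subset only neighbouring or nearly neighbouring teams swap places, for example the Miami Dolphins and the Oakland Raiders.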
These results are very similar to the ones presented in the paper, although the variances are in general a bit lower and the team strengths are also lower, both by a factor of approximately 0.8 for the stochastic variance model. This is mainly due to the higher value of \(\beta\), which I find for both models. The ranking itself remains almost unchanged in both cases, and the differences in teams’ merits are similar as well. More or less, it seems to be the same result measured on a different scale because of the higher value of \(\beta\). I move on to graphs that can be directly compared to parts of Figures 1 and 2 in Glickman (2001), showing the merits \(\gamma^{(t)}\) for the Atlanta Falcons and the Seattle Seahawks with pointwise 95% posterior intervals as dotted lines.
Merits over Time in the Stochastic Variance Model
These graphs are again very similar to the ones presented in the paper. The graphs for both models show basically the same pattern. The only difference is that there is more variation in the merits in the stochastic variance model, which is especially visible for the Atlanta Falcons. The next figure shows the variances \(\sigma^{2^{(t)}}\) over time for these two teams in the stochastic variance model.
Variances over Time in the Stochastic Variance Model
Once again this graph is very similar to the one in the paper, although the pattern for the Seattle Seahawks is a little different, especially in 1996. Their change in \(\sigma^2\) over time is marginal, however, so I think this should not be a problem here. The main insights stay the same.
I was able to recreate a dataset matching all the information given in the paper from an external source of data. Furthermore, I could implement both of the presented models and carry out the MCMC simulation in R using RStan. The results in the paper can be reproduced quite closely, except for the value of the parameter \(\beta\), which I find to be almost twice as high as presented in the paper. This in turn shrinks the parameters \(\gamma\) and \(\sigma\) so that the overall level of the probabilities is preserved. The comparison with Table 1 and Figures 1 and 2 makes me feel confident that the estimation procedure itself is correctly implemented. It seems to me that either the dataset differs in the home field variable or there is some misspecification in the definition of the covariate vector \(x\). The constant variance model shows the same size of the parameter \(\beta\), which looks like further evidence for this assumption. Nevertheless, I cannot fully rule out errors in the model specification or estimation that affect both models. The next section presents the results for an extended dataset.
Since my code is written in a rather generic way, it is easy to extend the NFL dataset, which starts in 1996, up to the end of 2015 and run the estimation on this data. Variable and model definitions remain as before, except for the number of games, periods, and teams. The dataset now features 5025 rows (i.e. regular season games) for 20 periods and 32 teams.
The following section is kept short. It first displays the single parameters \(\rho\), \(\tau\), \(\omega\), and \(\beta\) and then the merits of all 32 teams over time, including pointwise 95% posterior intervals, for the stochastic variance model in a similar way as in the previous sections.
| parameter | mean | 2.5% | 50% | 97.5% |
|---|---|---|---|---|
| rho | 0.5849 | 0.4704 | 0.5859 | 0.6945 |
| tau | 0.1666 | 0.1021 | 0.1634 | 0.2522 |
| omega | 0.6005 | 0.5107 | 0.6006 | 0.6924 |
| beta | 0.3900 | 0.3265 | 0.3900 | 0.4531 |
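The gain in precision from the longer time series can be quantified by comparing the widths of the 95% posterior intervals. The sketch below uses the interval bounds from this table and from the stochastic variance table for the 1996 to 2000 data. The column names are hypothetical.

```r
# 2.5% and 97.5% quantiles from the two stochastic variance fits reported above.
intervals <- data.frame(
  parameter  = c("rho", "tau", "omega", "beta"),
  lower_2000 = c(0.2598, 0.1523, 0.5097, 0.3789),
  upper_2000 = c(0.7745, 0.6080, 0.8151, 0.6513),
  lower_2015 = c(0.4704, 0.1021, 0.5107, 0.3265),
  upper_2015 = c(0.6945, 0.2522, 0.6924, 0.4531),
  stringsAsFactors = FALSE
)

intervals$width_2000 <- intervals$upper_2000 - intervals$lower_2000
intervals$width_2015 <- intervals$upper_2015 - intervals$lower_2015

# A ratio below 1 means the interval became narrower with the extended data.
intervals$ratio <- intervals$width_2015 / intervals$width_2000
print(intervals[, c("parameter", "width_2000", "width_2015", "ratio")])
```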
It turns out that, compared with the 1996 to 2000 data, the estimates of \(\beta\), \(\tau\), and \(\omega\) decrease while \(\rho\) increases. Confidence in the estimates increases in general because the model can now use more data than before. The following figure depicts the teams’ merits in alphabetical order from left to right and top to bottom.
Merits over Time in the Stochastic Variance Model (2015 Data)
Once again the posterior intervals become narrower because there is more information in the model. Thank you very much for reading this report!
sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 15.10
##
## locale:
## [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
## [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
## [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.11 rstan_2.8.2 ggplot2_1.0.1 data.table_1.9.4
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.2 MASS_7.3-44 munsell_0.4.2
## [4] colorspace_1.2-6 highr_0.5.1 stringr_0.6.2
## [7] plyr_1.8.1 tools_3.2.3 parallel_3.2.3
## [10] grid_3.2.3 gtable_0.1.2 htmltools_0.3
## [13] tufterhandout_1.2.1 yaml_2.1.13 digest_0.6.8
## [16] gridExtra_2.0.0 reshape2_1.4.1 formatR_1.2.1
## [19] codetools_0.2-14 inline_0.3.14 evaluate_0.8
## [22] rmarkdown_0.9 labeling_0.3 scales_0.2.4
## [25] stats4_3.2.3 chron_2.3-45 proto_0.3-10