Motivation

I recently stumbled upon the so-called Glicko-2 algorithm when the popular esports site HLTV.org published a newly created ranking of professional Counter-Strike teams. The ranking itself looked sensible, but the algorithm used seemed somewhat subjective and lacked statistical justification. Furthermore, I believe that confidence intervals, which were not present in their approach, are very important in this kind of analysis to tell whether differences between teams are significant.

I found that the Glicko-2 algorithm would suit this problem very well and was eager to use it to create my own ranking. Unfortunately, I could find neither the data used by HLTV.org nor an implementation of the Glicko-2 algorithm for R. So I went ahead, wrote a web scraper to obtain the desired data, and implemented a version of Glicko-2 in R myself. The code can be found on GitHub.

Furthermore, I published my results on Reddit and received an overwhelming amount of positive reactions, which made me even more ambitious to improve the algorithm so that it fits my problem better. I have several ideas, but first I have to implement the underlying Bayesian model as a starting point and then move forward.

Introduction

The goal of this exercise is to reproduce the results of the NFL example examined in Glickman (2001) in order to verify that I have succeeded in implementing a correct version of the proposed model using the RStan library for R. In a further application I am planning to modify the model to incorporate score differences in the outcome variable in order to differentiate between close and clear victories (most likely using an ordered logit or Poisson distribution instead of a binomial logit).

The following describes my approach to implementing the proposed constant variance and stochastic variance models and the subsequent parameter inference on the NFL games data (as used in the paper and extended up to the end of 2015). I will present how I obtained and processed the data, how the models are implemented, and how the inference is carried out. All results will be compared to those described in the paper. In the end I will conclude with what I was able to reproduce and where I failed.

Getting the Dataset

In order to perform the analysis I first needed to get the data. Since I was not able to find the resources used for the paper, I downloaded data from the publicly available Pro Football Reference (2015) and pre-processed it according to the information given in the paper. After a bit of data processing I believe I was able to recreate the dataset used. The table below shows the first and last 5 rows.

winner | loser | date | is_home | score | period | play_week
Indianapolis Colts | Arizona Cardinals | 1996-09-01 | 1 | 1 | 1 | 1
Carolina Panthers | Atlanta Falcons | 1996-09-01 | 1 | 1 | 1 | 1
St. Louis Rams | Cincinnati Bengals | 1996-09-01 | 1 | 1 | 1 | 1
Minnesota Vikings | Detroit Lions | 1996-09-01 | 1 | 1 | 1 | 1
Miami Dolphins | New England Patriots | 1996-09-01 | 1 | 1 | 1 | 1
Seattle Seahawks | San Diego Chargers | 2000-11-05 | 1 | 1 | 5 | 10
New Orleans Saints | San Francisco 49ers | 2000-11-05 | 1 | 1 | 5 | 10
Carolina Panthers | St. Louis Rams | 2000-11-05 | -1 | 1 | 5 | 10
Arizona Cardinals | Washington Redskins | 2000-11-05 | 1 | 1 | 5 | 10
Green Bay Packers | Minnesota Vikings | 2000-11-06 | 1 | 1 | 5 | 10

The full table contains all regular season games (excluding two draws in 1997) from 1996 up to the tenth play week of 2000. This results in 1109 rows, as also described in the paper, so this looks fine. score is the outcome variable for the team under winner (always one here, of course, due to the structure of the table), loser contains the losing team, period numbers the periods from 1996 to 2000 (5 periods), and is_home is a dummy variable that equals +1 if the team under winner played on its home ground and -1 otherwise. date and play_week contain information on when the games took place.

Matches are played by 31 different teams. The following table shows the first 5 teams in alphabetical order; team_id numbers all teams consecutively.

team_name | team_id
Arizona Cardinals | 1
Atlanta Falcons | 2
Baltimore Ravens | 3
Buffalo Bills | 4
Carolina Panthers | 5

I conclude that the dataset matches all the information given in the paper, e.g. in terms of the number of matches and the number of distinct teams.
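
To give an idea of how the processed tables feed into the estimation later on, the following sketch builds the data list passed to Stan from the two tables shown above. The object names (games, teams, stan_data) and the exact steps are my own choices to illustrate the idea, not necessarily the code in the repository.

# map team names to the integer ids from the table above;
# `games` and `teams` denote the two tables shown above (my own object names)
games$first_id  <- teams$team_id[match(games$winner, teams$team_name)]
games$second_id <- teams$team_id[match(games$loser,  teams$team_name)]

# list handed over to Stan in the estimation section below
stan_data <- list(
  N      = nrow(games),        # 1109 games
  T      = max(games$period),  # 5 periods
  K      = nrow(teams),        # 31 teams
  first  = games$first_id,
  second = games$second_id,
  period = games$period,
  x      = games$is_home,
  y      = games$score
)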

Model Implementation

This section describes how the models were implemented. For the implementation I used the RStan library in R, which provides an interface to the Stan language. The model is pretty well defined in the paper; only some of the priors used are not stated explicitly. I implemented one version of the constant variance model and one version of the stochastic variance model. The MCMC estimation is carried out using the No-U-Turn Sampler (NUTS).

In the following equations \(\mathcal{N}(\mu,\sigma^2)\) denotes a normal distribution with mean \(\mu\) and standard deviation \(\sigma\). \(\mathcal{LN}(\mu,\sigma^2)\) denotes a log-normal distribution with mean \(\mu\) and standard deviation \(\sigma\) on the log scale. \(\mathcal{B}(p)\) denotes a Bernoulli distribution with probability \(p\). \(\mathcal{IG}(\alpha, \beta)\) denotes an inverse gamma distribution with shape \(\alpha\) and scale \(\beta\). \(l()\) denotes the logistic function. Parameters and variables are denoted very similarly to the paper: \(y\) is the binary response vector (score), \(\gamma_f\) is the vector of strengths of the first teams, and \(\gamma_s\) is the vector of strengths of the second teams. \(x\) is the covariate vector for the home advantage, which equals 1 if the first team plays on its home field and -1 otherwise. All of these vectors are of length 1109. \(\rho\) is the autocorrelation parameter, \(\beta\) is the parameter for the home field advantage, \(\omega\) is the initial standard deviation in period zero, and \(\tau\) is the parameter for the change in variances. \(\gamma^{(t)}\) is a vector of length 31 containing the teams' strengths in period \(t \in \{0,\dots,5\}\). Likewise, \(\sigma^{2^{(t)}}\) is a vector of length 31 containing the teams' variances in period \(t \in \{1,\dots,5\}\). The following two subsections present the model equations and priors used for estimation for both models. These should be exactly the models used in the paper, except for some priors on single parameters that are not explicitly defined there.

Constant Variance Model

The model equations and priors used are specified as follows: \[ y \sim \mathcal{B}(l(\gamma_{f} - \gamma_{s} + x \beta)) \] \[ \gamma^{(0)} \sim \mathcal{N}(0, \omega^2) \] \[ \gamma^{(t+1)} \sim \mathcal{N}(\rho \gamma^{(t)}, \sigma^2) \text{ } \forall_{t < 5}\] \[ \beta \sim \mathcal{N}(0, 25^2) \] \[ \omega^2 \sim \mathcal{IG}(4, 2) \]
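A minimal RStan sketch of this model could look as follows. The variable names and the data layout are my own assumptions (matching the data list sketched above), and parameters whose priors are not stated above are left at Stan's implicit uniform defaults; this is a sketch, not necessarily the exact implementation used for the results below.

library(rstan)

cv_model_code <- "
data {
  int<lower=1> N;                  // number of games
  int<lower=1> T;                  // number of periods
  int<lower=1> K;                  // number of teams
  int<lower=1, upper=K> first[N];  // id of the first team (winner column)
  int<lower=1, upper=K> second[N]; // id of the second team (loser column)
  int<lower=1, upper=T> period[N]; // period in which the game took place
  vector[N] x;                     // home field covariate (+1 / -1)
  int<lower=0, upper=1> y[N];      // outcome for the first team
}
parameters {
  matrix[K, T + 1] gamma;          // team strengths for periods 0..T
  real beta;                       // home field advantage
  real<lower=0, upper=1> rho;      // autocorrelation (implicit uniform prior)
  real<lower=0> omega2;            // initial variance
  real<lower=0> sigma2;            // constant innovation variance (implicit flat prior)
}
model {
  beta ~ normal(0, 25);
  omega2 ~ inv_gamma(4, 2);
  for (k in 1:K) {
    gamma[k, 1] ~ normal(0, sqrt(omega2));
    for (t in 1:T)
      gamma[k, t + 1] ~ normal(rho * gamma[k, t], sqrt(sigma2));
  }
  for (n in 1:N)
    y[n] ~ bernoulli_logit(gamma[first[n], period[n] + 1]
                           - gamma[second[n], period[n] + 1]
                           + x[n] * beta);
}
"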

Stochastic Variance Model

The model equations and priors used are specified as follows:

\[ y \sim \mathcal{B}(l(\gamma_{f} - \gamma_{s} + x \beta)) \] \[ \gamma^{(0)} \sim \mathcal{N}(0, \omega^2) \] \[ \gamma^{(t+1)} \sim \mathcal{N}(\rho \gamma^{(t)}, \sigma^{2^{(t+1)}}) \text{ } \forall_{t < 5} \]

\[ \sigma^{2^{(1)}} \sim \mathcal{LN}(\log(\omega^2), \tau^2) \] \[ \sigma^{2^{(t+1)}} \sim \mathcal{LN}(\log(\sigma^{2^{(t)}}), \tau^2) \text{ } \forall_{0 < t < 5} \]

\[ \beta \sim \mathcal{N}(0, 25^2) \] \[ \tau \sim \mathcal{IG}(4, \frac{3}{2}) \] \[ \omega^2 \sim \mathcal{IG}(4, 2) \]
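The corresponding Stan sketch differs from the constant variance one only in the variance part: \(\sigma^2\) becomes a team- and period-specific matrix that follows a log-normal random walk governed by \(\tau\). Again, naming and layout are my own assumptions, not necessarily the exact implementation.

sv_model_code <- "
data {
  // identical to the constant variance sketch above
  int<lower=1> N;
  int<lower=1> T;
  int<lower=1> K;
  int<lower=1, upper=K> first[N];
  int<lower=1, upper=K> second[N];
  int<lower=1, upper=T> period[N];
  vector[N] x;
  int<lower=0, upper=1> y[N];
}
parameters {
  matrix[K, T + 1] gamma;        // team strengths for periods 0..T
  matrix<lower=0>[K, T] sigma2;  // team- and period-specific variances, periods 1..T
  real beta;                     // home field advantage
  real<lower=0, upper=1> rho;    // autocorrelation (implicit uniform prior)
  real<lower=0> omega2;          // initial variance
  real<lower=0> tau;             // scale of the log-normal random walk
}
model {
  beta ~ normal(0, 25);
  omega2 ~ inv_gamma(4, 2);
  tau ~ inv_gamma(4, 1.5);
  for (k in 1:K) {
    gamma[k, 1] ~ normal(0, sqrt(omega2));
    sigma2[k, 1] ~ lognormal(log(omega2), tau);
    for (t in 2:T)
      sigma2[k, t] ~ lognormal(log(sigma2[k, t - 1]), tau);
    for (t in 1:T)
      gamma[k, t + 1] ~ normal(rho * gamma[k, t], sqrt(sigma2[k, t]));
  }
  for (n in 1:N)
    y[n] ~ bernoulli_logit(gamma[first[n], period[n] + 1]
                           - gamma[second[n], period[n] + 1]
                           + x[n] * beta);
}
"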

Model Estimation

For model estimation the iterations are carried out in parallel on 6 CPU cores, with each core computing 25000 iterations for one independent chain (one chain per core). The first 5000 iterations per chain are discarded as warmup, which leads to 120000 post-warmup iterations in total. All of them are saved and used for inference. I did not find any benefit in running more iterations.
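
A sketch of how the sampling could be launched with these settings (the data list and model strings follow my naming above, and the mc.cores option is standard rstan usage; the actual call may differ):

library(rstan)
options(mc.cores = 6)  # run the chains in parallel on 6 cores

fit_cv <- stan(
  model_code = cv_model_code,  # constant variance sketch above
  data       = stan_data,
  chains     = 6,              # one chain per core
  iter       = 25000,          # iterations per chain
  warmup     = 5000,           # discarded per chain, leaving 6 * 20000 = 120000 draws
  seed       = 1               # arbitrary seed for reproducibility
)
# the stochastic variance model (fit_sv) is fitted analogously with sv_model_code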

Model Estimates and Inference

This section presents the results of the model estimations. First the constant variance model is examined briefly for comparison, and then the focus turns to a more extensive examination of the stochastic variance model.

Comparison of Single Parameters in Both Models

This section compares summaries of the posterior distributions of the parameters that are neither team- nor time-specific for both models.
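
Summaries like the ones in the following tables can be computed directly from the posterior draws, for example as below (parameter names follow my sketches above; the reported \(\omega\) and \(\sigma\) are assumed to be the square roots of the sampled variances):

draws <- rstan::extract(fit_cv, pars = c("rho", "omega2", "sigma2", "beta"))

# posterior mean, 2.5%, 50%, and 97.5% quantiles for one parameter
summarise_draws <- function(x) c(mean = mean(x), quantile(x, c(0.025, 0.5, 0.975)))

rbind(
  rho   = summarise_draws(draws$rho),
  omega = summarise_draws(sqrt(draws$omega2)),  # reported on the standard deviation scale
  sigma = summarise_draws(sqrt(draws$sigma2)),
  beta  = summarise_draws(draws$beta)
)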

Constant Variance Model

This subsection presents posterior means, medians, and pointwise posterior 95% confidence intervals for the parameters \(\rho\), \(\omega\), \(\sigma\), and \(\beta\) in the constant variance model.

parameter | mean | 2.5% | 50% | 97.5%
rho | 0.5048 | 0.2515 | 0.5080 | 0.7444
omega | 0.7438 | 0.4733 | 0.7112 | 1.2020
sigma | 0.6777 | 0.5065 | 0.6750 | 0.8631
beta | 0.5142 | 0.3797 | 0.5145 | 0.6517

Stochastic Variance Model

This subsection presents posterior means, medians, and pointwise posterior 95% confidence intervals for the parameters \(\rho\), \(\tau\), \(\omega\), and \(\beta\).

parameter | mean | 2.5% | 50% | 97.5%
rho | 0.5293 | 0.2598 | 0.5336 | 0.7745
tau | 0.3317 | 0.1523 | 0.3166 | 0.6080
omega | 0.6493 | 0.5097 | 0.6456 | 0.8151
beta | 0.5148 | 0.3789 | 0.5142 | 0.6513

The parameter \(\rho\) is pretty much the same as described in Glickman (2001), but \(\beta\), the advantage of playing on the home field, is almost twice as high as presented in the paper. The paper does not report results for \(\tau\) or \(\omega\).

Comparison of Merits in Both Models

The next table contains the teams' merits for period 5 (year 2000) and is structured like Table 1 in Glickman (2001) on p. 678. Displayed are the teams' merits as posterior means of \(\gamma^{(5)}\) and the posterior standard deviations of these parameters for the constant variance model (cv) and the stochastic variance model (sv). The table is ordered by the merits in the constant variance model; a sketch of how these quantities could be computed from the fitted models follows the table.

team_name | posterior_mean_cv | posterior_sd_cv | posterior_mean_sv | posterior_sd_sv
Tennessee Titans | 1.2371 | 0.5725 | 1.3108 | 0.7003
Minnesota Vikings | 0.8815 | 0.5434 | 0.9464 | 0.6119
Miami Dolphins | 0.8023 | 0.5576 | 0.7623 | 0.5755
Oakland Raiders | 0.7964 | 0.5662 | 0.7806 | 0.6595
Indianapolis Colts | 0.6136 | 0.5289 | 0.6996 | 0.6104
New York Giants | 0.5452 | 0.5545 | 0.4744 | 0.5544
New York Jets | 0.5353 | 0.5376 | 0.5563 | 0.5546
St. Louis Rams | 0.5276 | 0.5494 | 0.5160 | 0.5984
Buffalo Bills | 0.4824 | 0.5425 | 0.4774 | 0.5341
Tampa Bay Buccaneers | 0.3880 | 0.5346 | 0.3642 | 0.5104
Washington Redskins | 0.3611 | 0.5208 | 0.3137 | 0.4857
Kansas City Chiefs | 0.2885 | 0.5422 | 0.2782 | 0.5259
Detroit Lions | 0.2415 | 0.5235 | 0.2059 | 0.5050
Baltimore Ravens | 0.2283 | 0.5328 | 0.1719 | 0.5096
Denver Broncos | 0.0596 | 0.5395 | 0.0499 | 0.5374
Pittsburgh Steelers | 0.0562 | 0.5418 | 0.0249 | 0.5290
Philadelphia Eagles | -0.0104 | 0.5218 | -0.0201 | 0.5253
Jacksonville Jaguars | -0.0733 | 0.5449 | -0.1020 | 0.5871
Green Bay Packers | -0.0766 | 0.5309 | -0.0554 | 0.5176
New Orleans Saints | -0.1252 | 0.5340 | -0.1539 | 0.5457
New England Patriots | -0.3677 | 0.5432 | -0.3108 | 0.5422
Carolina Panthers | -0.3806 | 0.5358 | -0.3850 | 0.5274
Seattle Seahawks | -0.4281 | 0.5341 | -0.3897 | 0.5296
Dallas Cowboys | -0.4726 | 0.5436 | -0.4239 | 0.5377
Chicago Bears | -0.6197 | 0.5384 | -0.5884 | 0.5407
Atlanta Falcons | -0.6458 | 0.5271 | -0.7367 | 0.5908
Arizona Cardinals | -0.7129 | 0.5310 | -0.6937 | 0.5496
San Francisco 49ers | -0.9145 | 0.5363 | -1.0427 | 0.6379
Cincinnati Bengals | -0.9441 | 0.5644 | -1.0066 | 0.6129
San Diego Chargers | -1.1459 | 0.5902 | -1.4030 | 0.8661
Cleveland Browns | -1.1530 | 0.5743 | -1.3033 | 0.6631
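
The merit columns above could be obtained from the two fits roughly as follows (a sketch using the parameter naming from the model section; fit_cv and fit_sv denote the two fitted models):

gamma_cv <- rstan::extract(fit_cv, pars = "gamma")$gamma  # iterations x teams x periods 0..5
gamma_sv <- rstan::extract(fit_sv, pars = "gamma")$gamma

merits <- data.frame(
  team_name         = teams$team_name,
  posterior_mean_cv = apply(gamma_cv[, , 6], 2, mean),  # column 6 holds period 5 (year 2000)
  posterior_sd_cv   = apply(gamma_cv[, , 6], 2, sd),
  posterior_mean_sv = apply(gamma_sv[, , 6], 2, mean),
  posterior_sd_sv   = apply(gamma_sv[, , 6], 2, sd)
)
merits[order(-merits$posterior_mean_cv), ]  # ordered by the constant variance merits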

These results are very similar to the ones presented in the paper, although the variances are in general a bit lower and the teams' strengths are also lower, both by a factor of approximately 0.8 for the stochastic variance model. This mainly results from the higher value of \(\beta\) that I find for both models. The ranking itself remains almost unchanged in both cases, and the differences between teams' merits are similar. More or less it seems to be the same result measured on a different scale, induced by the higher value of \(\beta\). I move on to graphs that can be directly compared to parts of Figures 1 and 2 in Glickman (2001), showing the merits \(\gamma^{(t)}\) of the Atlanta Falcons and the Seattle Seahawks including pointwise 95% confidence intervals as dotted lines.

Constant Variance Model

Stochastic Variance Model

These graphs are again very similar to the ones presented in the paper. The graphs for both models show basically the same pattern; the only difference is that the stochastic variance model shows more variation in the merits, which is especially visible for the Atlanta Falcons. The next figure shows the variances \(\sigma^{2^{(t)}}\) over time for these two teams in the stochastic variance model.

Once again this graph is very similar to the one in the paper, although the pattern for the Seattle Seahawks is a little different, especially in 1996. Their change in \(\sigma^2\) over time is marginal, however, so I do not think this poses a problem here. The main insights stay the same.
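
The trajectory plots above could be produced from the posterior draws along the following lines, here for the merits of the Atlanta Falcons in the stochastic variance model (a sketch under the naming assumptions from the model section; the original plotting code may differ):

library(ggplot2)

gamma_sv <- rstan::extract(fit_sv, pars = "gamma")$gamma  # iterations x teams x periods 0..5
team <- teams$team_id[teams$team_name == "Atlanta Falcons"]

trajectory <- data.frame(
  year  = 1996:2000,                               # periods 1..5
  mean  = apply(gamma_sv[, team, 2:6], 2, mean),
  lower = apply(gamma_sv[, team, 2:6], 2, quantile, probs = 0.025),
  upper = apply(gamma_sv[, team, 2:6], 2, quantile, probs = 0.975)
)

ggplot(trajectory, aes(x = year, y = mean)) +
  geom_line() +
  geom_line(aes(y = lower), linetype = "dotted") +
  geom_line(aes(y = upper), linetype = "dotted") +
  ylab("merit") +
  ggtitle("Atlanta Falcons")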

Reestimation Conclusion

I was able to recreate a dataset matching all the information given in the paper from an external data source. Furthermore, I could implement both of the presented models and carry out the MCMC simulation in R using RStan. The results of the paper can be reproduced quite closely, except for the value of the parameter \(\beta\), which I find to be almost twice as high as presented in the paper. This in turn shrinks the parameters \(\gamma\) and \(\sigma\) so that the overall level of the predicted probabilities is preserved. The comparison with Table 1 and Figures 1 and 2 makes me confident that the estimation procedure itself is correctly implemented. It seems to me that either the dataset differs in the home field variable or that there is some misspecification in the definition of the covariate vector \(x\). The constant variance model shows the same size of the parameter \(\beta\), which looks like further evidence for this assumption. Nevertheless, I cannot fully rule out errors in the model specification or estimation that affect both models. The next section presents the results on an extended dataset.

Extension

Since my code is written in a rather generic way, it is easy to extend the NFL dataset up to the end of 2015 and run the estimation on this data. The dataset still starts with the year 1996 but now extends up to the end of 2015. Variable and model definitions remain as before except for the number of games, periods, and teams. This dataset features 5025 rows (i.e. regular season games) covering 20 periods and 32 teams.

The following section is kept short. It first displays the single parameters \(\rho\), \(\tau\), \(\omega\), and \(\beta\) and then the merits of all 32 teams over time, including pointwise 95% posterior confidence intervals, for the stochastic variance model, in a similar way as in the previous sections.

Stochastic Variance Model (2015 Data)

parameter | mean | 2.5% | 50% | 97.5%
rho | 0.5849 | 0.4704 | 0.5859 | 0.6945
tau | 0.1666 | 0.1021 | 0.1634 | 0.2522
omega | 0.6005 | 0.5107 | 0.6006 | 0.6924
beta | 0.3900 | 0.3265 | 0.3900 | 0.4531

It turns out that the parameters \(\beta\), \(\tau\), and \(\omega\) decrease while \(\rho\) increases. The confidence in the estimates increases in general because the model can now use more data than before. The following figure depicts the teams' merits, in alphabetical order from left to right and top to bottom.

Once again the confidence intervals become narrower because more information enters the model. Thank you very much for reading this report!

References & Resources

Software Used

sessionInfo()
## R version 3.2.3 (2015-12-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 15.10
## 
## locale:
##  [1] LC_CTYPE=en_US.utf8        LC_NUMERIC=C              
##  [3] LC_TIME=en_US.utf8         LC_COLLATE=en_US.utf8     
##  [5] LC_MONETARY=en_US.utf8     LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.11       rstan_2.8.2      ggplot2_1.0.1    data.table_1.9.4
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.2         MASS_7.3-44         munsell_0.4.2      
##  [4] colorspace_1.2-6    highr_0.5.1         stringr_0.6.2      
##  [7] plyr_1.8.1          tools_3.2.3         parallel_3.2.3     
## [10] grid_3.2.3          gtable_0.1.2        htmltools_0.3      
## [13] tufterhandout_1.2.1 yaml_2.1.13         digest_0.6.8       
## [16] gridExtra_2.0.0     reshape2_1.4.1      formatR_1.2.1      
## [19] codetools_0.2-14    inline_0.3.14       evaluate_0.8       
## [22] rmarkdown_0.9       labeling_0.3        scales_0.2.4       
## [25] stats4_3.2.3        chron_2.3-45        proto_0.3-10