ratingsR introduction

Install

ratingsR can be installed directly from GitHub.

devtools::install_github('jalapic/ratingsR')

library(ratingsR)

Colley Method

The colley function calculates Colley ratings for individuals or teams from win-loss data. It ignores ties. The input can be either i) a square matrix of wins and losses (winners in rows and losers in columns, with every individual appearing in both the rows and the columns), or ii) a win-loss dataframe in which the first two columns are the individuals/teams and the 3rd and 4th columns are the goals/points scored by the individuals/teams in the 1st and 2nd columns respectively. Any other columns are ignored.

div3_2012 is an example win-loss matrix from 2012 Division III football. Each number is the total number of wins by the team in the row against the team in the column.

div3_2012

##                     Johns Hopkins Franklin & Marshall Gettysburg Dickinson
## Johns Hopkins       NA            "0"                 "1"        "1"      
## Franklin & Marshall "1"           NA                  "0"        "1"      
## Gettysburg          "0"           "1"                 NA         "0"      
## Dickinson           "0"           "0"                 "1"        NA       
## McDaniel            "0"           "0"                 "0"        "0"      
##                     McDaniel
## Johns Hopkins       "1"     
## Franklin & Marshall "1"     
## Gettysburg          "1"     
## Dickinson           "1"     
## McDaniel            NA

div3_2012_spread gives the result of each game between these teams as a win-loss dataframe. It also includes a fifth column, dif, giving the point spread of each game (t1 minus t2).

div3_2012_spread

##    team1 team2 t1 t2 dif
## 1     FM    JH 14 12   2
## 2     GT    JH 35 49 -14
## 3     DK    JH  0 49 -49
## 4     MC    JH  7 49 -42
## 5     GT    FM 38 31   7
## 6     DK    FM 28 36  -8
## 7     MC    FM 10 35 -25
## 8     DK    GT 23 13  10
## 9     MC    GT  3 35 -32
## 10    MC    DK 31 38  -7

To calculate Colley ratings, we can do the following:

colley(div3_2012)

##       Johns Hopkins Franklin & Marshall          Gettysburg 
##           0.6428571           0.6428571           0.5000000 
##           Dickinson            McDaniel 
##           0.5000000           0.2142857

colley(div3_2012_spread)

##        DK        FM        GT        JH        MC 
## 0.5000000 0.6428571 0.5000000 0.6428571 0.2142857
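
To make the calculation concrete, here is a minimal base R sketch of the standard Colley system C r = b applied to the games in div3_2012_spread. It is not the package's source code. The names games, C, b and ratings are illustrative, and skipping tied games entirely is an assumption about what "ignores ties" means.

```r
# Game results copied from div3_2012_spread (teams and points scored).
games <- data.frame(
  team1 = c("FM", "GT", "DK", "MC", "GT", "DK", "MC", "DK", "MC", "MC"),
  team2 = c("JH", "JH", "JH", "JH", "FM", "FM", "FM", "GT", "GT", "DK"),
  t1    = c(14, 35,  0,  7, 38, 28, 10, 23,  3, 31),
  t2    = c(12, 49, 49, 49, 31, 36, 35, 13, 35, 38),
  stringsAsFactors = FALSE
)

teams <- sort(unique(c(games$team1, games$team2)))
n <- length(teams)

# Colley system C r = b: C starts as 2 * identity, b as a vector of ones.
C <- diag(2, n)
dimnames(C) <- list(teams, teams)
b <- setNames(rep(1, n), teams)

for (k in seq_len(nrow(games))) {
  if (games$t1[k] == games$t2[k]) next  # assumption: tied games are skipped
  i <- games$team1[k]
  j <- games$team2[k]
  # Each game adds 1 to both diagonal entries and subtracts 1 off-diagonal.
  C[i, i] <- C[i, i] + 1
  C[j, j] <- C[j, j] + 1
  C[i, j] <- C[i, j] - 1
  C[j, i] <- C[j, i] - 1
  # The winner gains half a point in b, the loser gives up half a point.
  s <- if (games$t1[k] > games$t2[k]) 0.5 else -0.5
  b[i] <- b[i] + s
  b[j] <- b[j] - s
}

ratings <- setNames(as.vector(solve(C, b)), teams)
round(ratings, 7)
```

Solving this system by hand gives 9/14 for JH and FM, 1/2 for GT and DK, and 3/14 for MC, which agrees with the colley output printed above.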

Colley Method accounting for Spread

The value of wins can be adjusted according to spread using the colley_spread function. The input is a win-loss dataframe as above. Two additional parameters are required: spreadval, the spread threshold at which a win is weighted differently, and adjval, the adjusted weight of such a win. For example, to count a win as 1.5 wins when the spread reaches 7 (the output below is consistent with a margin of exactly 7 also receiving the adjusted weight):

colley_spread(div3_2012_spread, spreadval=7, adjval=1.5)

##        DK        FM        GT        JH        MC 
## 0.7321429 0.8750000 0.7321429 0.9107143 0.3750000
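
The following base R sketch shows one way the spread adjustment can be expressed. It rests on three assumptions inferred from the printed output rather than from the package source: the Colley matrix is left unchanged, a win with a margin at or above spreadval adds adjval / 2 to the winner's entry in b, and every loss still costs the loser 1/2. The variable names are illustrative.

```r
# Game results copied from div3_2012_spread.
games <- data.frame(
  team1 = c("FM", "GT", "DK", "MC", "GT", "DK", "MC", "DK", "MC", "MC"),
  team2 = c("JH", "JH", "JH", "JH", "FM", "FM", "FM", "GT", "GT", "DK"),
  t1    = c(14, 35,  0,  7, 38, 28, 10, 23,  3, 31),
  t2    = c(12, 49, 49, 49, 31, 36, 35, 13, 35, 38),
  stringsAsFactors = FALSE
)
spreadval <- 7    # margin threshold
adjval    <- 1.5  # value of a win at or above the threshold

teams <- sort(unique(c(games$team1, games$team2)))
n <- length(teams)
C <- diag(2, n)
dimnames(C) <- list(teams, teams)
b <- setNames(rep(1, n), teams)

for (k in seq_len(nrow(games))) {
  margin <- abs(games$t1[k] - games$t2[k])
  if (margin == 0) next  # assumption: tied games are skipped
  i <- games$team1[k]
  j <- games$team2[k]
  # The Colley matrix is built exactly as in the unweighted method.
  C[i, i] <- C[i, i] + 1
  C[j, j] <- C[j, j] + 1
  C[i, j] <- C[i, j] - 1
  C[j, i] <- C[j, i] - 1
  winner <- if (games$t1[k] > games$t2[k]) i else j
  loser  <- if (games$t1[k] > games$t2[k]) j else i
  # Assumption: only the winner's credit is scaled, losses keep weight 1.
  win_value <- if (margin >= spreadval) adjval else 1
  b[winner] <- b[winner] + win_value / 2
  b[loser]  <- b[loser] - 0.5
}

ratings <- setNames(as.vector(solve(C, b)), teams)
round(ratings, 7)
```

Under these assumptions the sketch gives 0.9107143 for JH, 0.875 for FM, 0.7321429 for GT and DK, and 0.375 for MC, the same values as the colley_spread output above. Because only wins are scaled, the ratings no longer sum to half the number of teams.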

Colley Method with Ties

The function colley_ties enables calculation of Colley ratings accounting for ties. Ties are considered to be a half-win for each team. The input is a win-loss dataframe as above. When there are no ties, the resulting ratings will be the same as for the colley function. Here is an example using all 380 results from the 2014-15 EPL soccer season:

head(epl2014_15)

##                  team1             team2 t1 t2
## 1    Manchester United      Swansea City  1  2
## 2       Leicester City           Everton  2  2
## 3  Queens Park Rangers         Hull City  0  1
## 4           Stoke City       Aston Villa  0  1
## 5 West Bromwich Albion        Sunderland  2  2
## 6      West Ham United Tottenham Hotspur  0  1

colley_ties(epl2014_15)

##              Arsenal          Aston Villa              Burnley 
##            0.6785714            0.3809524            0.3571429 
##              Chelsea       Crystal Palace              Everton 
##            0.7738095            0.4642857            0.4642857 
##            Hull City       Leicester City            Liverpool 
##            0.3690476            0.4047619            0.5714286 
##      Manchester City    Manchester United     Newcastle United 
##            0.7023810            0.6428571            0.3928571 
##  Queens Park Rangers          Southampton           Stoke City 
##            0.3095238            0.5476190            0.5119048 
##           Sunderland         Swansea City    Tottenham Hotspur 
##            0.4166667            0.5238095            0.5833333 
## West Bromwich Albion      West Ham United 
##            0.4404762            0.4642857

The following plot compares the ratings obtained with and without accounting for ties:

plot(colley_ties(epl2014_15), colley(epl2014_15), xlab = "Colley with Ties", ylab="Basic Colley")
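
As a small illustration of the tie handling, the sketch below treats a tie as half a win and half a loss for each team: the game still counts towards the games played in the Colley matrix, but it leaves the win-loss difference in b unchanged. This reading is an assumption, not the package's code, and the three-team dataset is hypothetical, made up only to keep the arithmetic easy to follow.

```r
# Hypothetical results: A beats B, B and C tie, A beats C.
games <- data.frame(
  team1 = c("A", "B", "A"),
  team2 = c("B", "C", "C"),
  t1    = c(2, 1, 3),
  t2    = c(1, 1, 0),
  stringsAsFactors = FALSE
)

teams <- sort(unique(c(games$team1, games$team2)))
n <- length(teams)
C <- diag(2, n)
dimnames(C) <- list(teams, teams)
b <- setNames(rep(1, n), teams)

for (k in seq_len(nrow(games))) {
  i <- games$team1[k]
  j <- games$team2[k]
  # Every game, tied or not, counts as a game played by both teams.
  C[i, i] <- C[i, i] + 1
  C[j, j] <- C[j, j] + 1
  C[i, j] <- C[i, j] - 1
  C[j, i] <- C[j, i] - 1
  # sign() is 0 for a tie, so a tie adds nothing to either team's b entry.
  s <- 0.5 * sign(games$t1[k] - games$t2[k])
  b[i] <- b[i] + s
  b[j] <- b[j] - s
}

ratings <- setNames(as.vector(solve(C, b)), teams)
ratings
```

For this toy season the ratings are 0.7 for A and 0.4 for both B and C. With no tied games the loop reduces to the basic Colley calculation.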

Colley Method with Weighting Wins by Time

Alternatively, wins can be weighted according to time. This is useful when more recent wins should count for more than wins that occurred earlier. The colley_weight function offers four weighting methods: linear, exp, log, and step. They differ in how much they devalue the earliest wins. The step method requires an additional parameter, ts, the time unit from which wins count twice as much as earlier wins. The input is a dataframe of four columns: the first two are the teams, the 3rd is the winner (1 if team1 won, 0 if team2 won), and the 4th is the time interval in which the game took place. This method does not handle ties.

An example from college basketball:

bball2

##    team1 team2 winner time
## 1   BING  HART      0    1
## 2    UVM   UNH      1    1
## 3     BU    ME      0    1
## 4   ALBY  UMBC      1    1
## 5   STON   UNH      1    4
## 6    UVM  ALBY      1    4
## 7     BU  HART      0    4
## 8     ME  UMBC      1    4
## 9   BING  ALBY      0    6
## 10   UVM    BU      0    7
## 11  BING  STON      0    8
## 12    ME  HART      0    8
## 13  UMBC   UNH      1    8

colley_weight(bball2, 'linear')

##      ALBY      BING        BU      HART        ME      STON      UMBC 
## 0.5069564 0.2996526 0.5623205 0.7294697 0.5132109 0.6272530 0.4962594 
##       UNH       UVM 
## 0.2914655 0.4734119

colley_weight(bball2, 'exp')

##      ALBY      BING        BU      HART        ME      STON      UMBC 
## 0.5148858 0.2296598 0.5847105 0.8936342 0.6357151 0.6296661 0.3908802 
##       UNH       UVM 
## 0.1220974 0.4987508

colley_weight(bball2, 'log')

##      ALBY      BING        BU      HART        ME      STON      UMBC 
## 0.5108379 0.3289859 0.5390191 0.6909780 0.5149819 0.6158528 0.4905430 
##       UNH       UVM 
## 0.3229554 0.4858461

colley_weight(bball2, 'step', ts='5')

##      ALBY      BING        BU      HART        ME      STON      UMBC 
## 0.5637570 0.2465278 0.5665359 0.8387386 0.6096838 0.6297210 0.4140900 
##       UNH       UVM 
## 0.1555496 0.4753964
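
The step weighting can be sketched in base R as a weighted Colley system in which each game contributes its weight, rather than 1, to both the matrix and the right-hand side. The sketch assumes a weight of 1 for games before ts and 2 for games at or after ts. The data contain no game at time 5, so "at" versus "after" cannot be distinguished here. The code is illustrative and is not taken from the package.

```r
# Game results copied from bball2.
bball <- data.frame(
  team1  = c("BING", "UVM", "BU", "ALBY", "STON", "UVM", "BU",
             "ME", "BING", "UVM", "BING", "ME", "UMBC"),
  team2  = c("HART", "UNH", "ME", "UMBC", "UNH", "ALBY", "HART",
             "UMBC", "ALBY", "BU", "STON", "HART", "UNH"),
  winner = c(0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1),
  time   = c(1, 1, 1, 1, 4, 4, 4, 4, 6, 7, 8, 8, 8),
  stringsAsFactors = FALSE
)
ts <- 5
# Assumption: games at or after ts count double, earlier games count once.
w <- ifelse(bball$time >= ts, 2, 1)

teams <- sort(unique(c(bball$team1, bball$team2)))
n <- length(teams)
C <- diag(2, n)
dimnames(C) <- list(teams, teams)
b <- setNames(rep(1, n), teams)

for (k in seq_len(nrow(bball))) {
  i <- bball$team1[k]
  j <- bball$team2[k]
  # A game of weight w counts as w games between the two teams.
  C[i, i] <- C[i, i] + w[k]
  C[j, j] <- C[j, j] + w[k]
  C[i, j] <- C[i, j] - w[k]
  C[j, i] <- C[j, i] - w[k]
  # winner == 1 means team1 won, so team1 gains w / 2 and team2 loses w / 2.
  s <- if (bball$winner[k] == 1) w[k] / 2 else -w[k] / 2
  b[i] <- b[i] + s
  b[j] <- b[j] - s
}

ratings <- setNames(as.vector(solve(C, b)), teams)
round(ratings, 7)
```

Substituting the step ratings printed above into each of the nine equations of this system satisfies them to rounding error, so the sketch should reproduce that output. The other methods would change only how w is computed from the time column; their exact formulas are not given here and are not guessed at.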

Massey Method

Massey ratings can be calculated using the massey function. The Massey method accounts for spread in its calculations. The input is a win-loss dataframe with at least four columns, laid out as for the colley function. Here is an example from the first few games of the 2009 Ivy League football season.

ivyfootball

##        team1     team2 t1 t2
## 1       Penn   Cornell 34  0
## 2       Penn   Harvard 17  7
## 3       Penn Princeton 42  7
## 4    Harvard      Yale 14 10
## 5    Harvard  Columbia 34 14
## 6  Princeton Dartmouth 23 11
## 7  Princeton      Yale 24 17
## 8      Brown      Yale 35 21
## 9      Brown Dartmouth 14  7
## 10  Columbia     Brown 28 14
## 11  Columbia   Cornell 30 20
## 12 Dartmouth   Cornell 20 17

To calculate the ratings, we do the following:

massey_ivy <- massey(ivyfootball)
round(massey_ivy,2)

##     Brown  Columbia   Cornell Dartmouth   Harvard      Penn Princeton 
##     -3.75      0.00    -11.00    -11.25     10.75     25.25     -3.00 
##      Yale 
##     -7.00
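
For readers who want to see where these numbers come from, here is a minimal base R sketch of the standard Massey least-squares system M r = p, where p holds each team's total point differential. It is not the package's implementation. Replacing the last row of M with ones, so that the ratings sum to zero, is the usual textbook device and is assumed here. The variable names are illustrative.

```r
# Game results copied from ivyfootball.
games <- data.frame(
  team1 = c("Penn", "Penn", "Penn", "Harvard", "Harvard", "Princeton",
            "Princeton", "Brown", "Brown", "Columbia", "Columbia", "Dartmouth"),
  team2 = c("Cornell", "Harvard", "Princeton", "Yale", "Columbia", "Dartmouth",
            "Yale", "Yale", "Dartmouth", "Brown", "Cornell", "Cornell"),
  t1    = c(34, 17, 42, 14, 34, 23, 24, 35, 14, 28, 30, 20),
  t2    = c( 0,  7,  7, 10, 14, 11, 17, 21,  7, 14, 20, 17),
  stringsAsFactors = FALSE
)

teams <- sort(unique(c(games$team1, games$team2)))
n <- length(teams)
M <- matrix(0, n, n, dimnames = list(teams, teams))
p <- setNames(rep(0, n), teams)

for (k in seq_len(nrow(games))) {
  i <- games$team1[k]
  j <- games$team2[k]
  d <- games$t1[k] - games$t2[k]  # point differential from team1's side
  # Diagonal: games played. Off-diagonal: minus the games between the pair.
  M[i, i] <- M[i, i] + 1
  M[j, j] <- M[j, j] + 1
  M[i, j] <- M[i, j] - 1
  M[j, i] <- M[j, i] - 1
  p[i] <- p[i] + d
  p[j] <- p[j] - d
}

# M is singular (its rows sum to zero), so the last equation is replaced
# by the constraint that all ratings sum to zero.
M[n, ] <- 1
p[n] <- 0

ratings <- setNames(as.vector(solve(M, p)), teams)
round(ratings, 2)
```

The solution is 25.25 for Penn, 10.75 for Harvard, 0 for Columbia, -3 for Princeton, -3.75 for Brown, -7 for Yale, -11 for Cornell and -11.25 for Dartmouth, matching the massey output above.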

Comparing the Massey and Colley ratings we get:

plot(colley(ivyfootball), massey(ivyfootball), xlab="Colley Rating", ylab="Massey Rating")
