ratingsR can be installed directly from GitHub.
devtools::install_github('jalapic/ratingsR')
library(ratingsR)
The colley function calculates Colley ratings for individuals based on win-loss data. It ignores ties. The input can be either i) a square matrix of wins and losses (winners in rows and losers in columns, with all individuals in rows and columns), or ii) a win-loss dataframe in which the first two columns are the individuals/teams and the 3rd and 4th columns are the goals/points scored by the individuals/teams in the 1st and 2nd columns respectively. Any other columns are ignored.
div3_2012 is an example win-loss matrix from 2012 Division III football. Each entry is the total number of wins by the team in the row against the team in the column.
div3_2012
##                     Johns Hopkins Franklin & Marshall Gettysburg Dickinson
## Johns Hopkins       NA            "0"                 "1"        "1"
## Franklin & Marshall "1"           NA                  "0"        "1"
## Gettysburg          "0"           "1"                 NA         "0"
## Dickinson           "0"           "0"                 "1"        NA
## McDaniel            "0"           "0"                 "0"        "0"
##                     McDaniel
## Johns Hopkins       "1"
## Franklin & Marshall "1"
## Gettysburg          "1"
## Dickinson           "1"
## McDaniel            NA
div3_2012_spread gives the result of each game between these teams. This dataframe also includes a fifth column, dif, giving the point spread for each game (t1 minus t2).
div3_2012_spread
## team1 team2 t1 t2 dif
## 1 FM JH 14 12 2
## 2 GT JH 35 49 -14
## 3 DK JH 0 49 -49
## 4 MC JH 7 49 -42
## 5 GT FM 38 31 7
## 6 DK FM 28 36 -8
## 7 MC FM 10 35 -25
## 8 DK GT 23 13 10
## 9 MC GT 3 35 -32
## 10 MC DK 31 38 -7
To calculate Colley ratings, we can do the following:
colley(div3_2012)
##       Johns Hopkins Franklin & Marshall          Gettysburg
##           0.6428571           0.6428571           0.5000000
##           Dickinson            McDaniel
##           0.5000000           0.2142857
colley(div3_2012_spread)
## DK FM GT JH MC
## 0.5000000 0.6428571 0.5000000 0.6428571 0.2142857
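To make the calculation concrete, here is a minimal base R sketch of the standard Colley system applied to the win counts shown above. The helper name colley_sketch is hypothetical and is not part of ratingsR; the package's own implementation may differ in its details. The diagonal is entered as 0 rather than NA so the matrix can be used in arithmetic.

```r
# Win counts from div3_2012: rows are winners, columns are losers.
teams <- c("Johns Hopkins", "Franklin & Marshall", "Gettysburg",
           "Dickinson", "McDaniel")
wins <- matrix(c(0, 0, 1, 1, 1,
                 1, 0, 0, 1, 1,
                 0, 1, 0, 0, 1,
                 0, 0, 1, 0, 1,
                 0, 0, 0, 0, 0),
               nrow = 5, byrow = TRUE, dimnames = list(teams, teams))

# Hypothetical helper (not a ratingsR function): solve C r = b, where
# C has 2 + games played on the diagonal and minus the number of games
# between each pair off the diagonal, and b = 1 + (wins - losses) / 2.
colley_sketch <- function(wins) {
  games <- wins + t(wins)        # games played between each pair
  w <- rowSums(wins)             # total wins per team
  l <- colSums(wins)             # total losses per team
  C <- -games
  diag(C) <- 2 + rowSums(games)
  b <- 1 + (w - l) / 2
  setNames(as.vector(solve(C, b)), rownames(wins))
}

ratings <- colley_sketch(wins)
round(ratings, 7)
```

Under these assumptions the sketch gives the same values as the colley output above, and the ratings average 0.5, as Colley ratings always do.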
The value of wins can be adjusted according to spread with the Colley method using the colley_spread function. The input is a win-loss dataframe as above. Two additional parameters are required: spreadval, the spread threshold at which a win is weighted differently, and adjval, the adjusted weight of such a win. For example, to count a win as 1.5 wins when the spread is 7 or more (the output below is consistent with a spread of exactly 7 meeting the threshold):
colley_spread(div3_2012_spread, spreadval=7, adjval=1.5)
## DK FM GT JH MC
## 0.7321429 0.8750000 0.7321429 0.9107143 0.3750000
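One way to read this output is sketched below in base R. This is an illustration under stated assumptions, not the package's source: the helper colley_spread_sketch is hypothetical, it assumes that only the winner's win count is reweighted (losses still count as 1), and it assumes that a spread equal to spreadval already qualifies for the adjusted weight.

```r
# The ten games from div3_2012_spread
games <- data.frame(
  team1 = c("FM", "GT", "DK", "MC", "GT", "DK", "MC", "DK", "MC", "MC"),
  team2 = c("JH", "JH", "JH", "JH", "FM", "FM", "FM", "GT", "GT", "DK"),
  t1 = c(14, 35, 0, 7, 38, 28, 10, 23, 3, 31),
  t2 = c(12, 49, 49, 49, 31, 36, 35, 13, 35, 38),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a ratingsR function)
colley_spread_sketch <- function(df, spreadval, adjval) {
  teams <- sort(unique(c(df$team1, df$team2)))
  n <- length(teams)
  C <- diag(2, n)
  dimnames(C) <- list(teams, teams)
  w <- setNames(numeric(n), teams)
  l <- setNames(numeric(n), teams)
  for (k in seq_len(nrow(df))) {
    margin <- df$t1[k] - df$t2[k]
    if (margin == 0) next                      # ties are skipped
    winner <- if (margin > 0) df$team1[k] else df$team2[k]
    loser  <- if (margin > 0) df$team2[k] else df$team1[k]
    C[winner, winner] <- C[winner, winner] + 1
    C[loser, loser]   <- C[loser, loser] + 1
    C[winner, loser]  <- C[winner, loser] - 1
    C[loser, winner]  <- C[loser, winner] - 1
    # Assumption: the threshold is inclusive and only wins are reweighted
    weight <- if (abs(margin) >= spreadval) adjval else 1
    w[winner] <- w[winner] + weight
    l[loser]  <- l[loser] + 1
  }
  b <- 1 + (w - l) / 2
  setNames(as.vector(solve(C, b)), teams)
}

adjusted <- colley_spread_sketch(games, spreadval = 7, adjval = 1.5)
round(adjusted, 7)
```

With these assumptions the sketch matches the colley_spread values shown above, and setting adjval = 1 recovers the basic Colley ratings. Because only wins gain weight, the adjusted ratings no longer average 0.5.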
The function colley_ties enables calculation of Colley ratings accounting for ties. Ties are considered to be a half-win for each team. The input is a win-loss dataframe as above. When there are no ties, the resulting ratings will be the same as for the colley function. Here is an example using all 380 results from the 2014-15 EPL soccer season:
head(epl2014_15)
##                  team1             team2 t1 t2
## 1    Manchester United      Swansea City  1  2
## 2       Leicester City           Everton  2  2
## 3  Queens Park Rangers         Hull City  0  1
## 4           Stoke City       Aston Villa  0  1
## 5 West Bromwich Albion        Sunderland  2  2
## 6      West Ham United Tottenham Hotspur  0  1
colley_ties(epl2014_15)
##              Arsenal          Aston Villa              Burnley
##            0.6785714            0.3809524            0.3571429
##              Chelsea       Crystal Palace              Everton
##            0.7738095            0.4642857            0.4642857
##            Hull City       Leicester City            Liverpool
##            0.3690476            0.4047619            0.5714286
##      Manchester City    Manchester United     Newcastle United
##            0.7023810            0.6428571            0.3928571
##  Queens Park Rangers          Southampton           Stoke City
##            0.3095238            0.5476190            0.5119048
##           Sunderland         Swansea City    Tottenham Hotspur
##            0.4166667            0.5238095            0.5833333
## West Bromwich Albion      West Ham United
##            0.4404762            0.4642857
Here is a comparison of how accounting for ties adjusts the ratings value:
plot(colley_ties(epl2014_15), colley(epl2014_15), xlab = "Colley with Ties", ylab="Basic Colley")
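To see why the two sets of ratings differ, here is a small base R sketch on made-up data (three hypothetical teams A, B and C, not taken from the package). The helper colley_ties_sketch is hypothetical. It assumes that a counted tie adds a game to both teams' schedules and credits each with half a win and half a loss, and that ignoring ties means dropping those games entirely.

```r
# Made-up results: A beats B, A ties C, B beats C
games <- data.frame(
  team1 = c("A", "A", "B"),
  team2 = c("B", "C", "C"),
  t1 = c(2, 1, 1),
  t2 = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a ratingsR function)
colley_ties_sketch <- function(df, count_ties = TRUE) {
  teams <- sort(unique(c(df$team1, df$team2)))
  n <- length(teams)
  C <- diag(2, n)
  dimnames(C) <- list(teams, teams)
  w <- setNames(numeric(n), teams)
  l <- setNames(numeric(n), teams)
  for (k in seq_len(nrow(df))) {
    ta <- df$team1[k]
    tb <- df$team2[k]
    margin <- df$t1[k] - df$t2[k]
    if (margin == 0 && !count_ties) next       # drop tied games entirely
    C[ta, ta] <- C[ta, ta] + 1
    C[tb, tb] <- C[tb, tb] + 1
    C[ta, tb] <- C[ta, tb] - 1
    C[tb, ta] <- C[tb, ta] - 1
    if (margin > 0) {
      w[ta] <- w[ta] + 1; l[tb] <- l[tb] + 1
    } else if (margin < 0) {
      w[tb] <- w[tb] + 1; l[ta] <- l[ta] + 1
    } else {
      # A tie: half a win and half a loss for each team
      w[c(ta, tb)] <- w[c(ta, tb)] + 0.5
      l[c(ta, tb)] <- l[c(ta, tb)] + 0.5
    }
  }
  rhs <- 1 + (w - l) / 2
  setNames(as.vector(solve(C, rhs)), teams)
}

with_ties <- colley_ties_sketch(games, count_ties = TRUE)
without_ties <- colley_ties_sketch(games, count_ties = FALSE)
rbind(with_ties, without_ties)
```

In this toy example, counting the tie pulls A and C toward each other (0.6 and 0.4 instead of 2/3 and 1/3), while B stays at 0.5.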
Alternatively, wins can be weighted according to time. This is useful when recent wins should count for more than earlier ones. There are four weighting methods: linear, exp, log, and step. They differ in how strongly they devalue the earliest wins. The step method requires an additional parameter, ts, the time unit at which wins start to count twice as much as earlier wins. The colley_weight function implements these methods. It requires an input dataframe of four columns: the first two are the teams, the third indicates who won (1 for team1, 0 for team2), and the fourth is the time interval. This method does not handle ties.
An example from college basketball:
bball2
## team1 team2 winner time
## 1 BING HART 0 1
## 2 UVM UNH 1 1
## 3 BU ME 0 1
## 4 ALBY UMBC 1 1
## 5 STON UNH 1 4
## 6 UVM ALBY 1 4
## 7 BU HART 0 4
## 8 ME UMBC 1 4
## 9 BING ALBY 0 6
## 10 UVM BU 0 7
## 11 BING STON 0 8
## 12 ME HART 0 8
## 13 UMBC UNH 1 8
colley_weight(bball2, 'linear')
## ALBY BING BU HART ME STON UMBC
## 0.5069564 0.2996526 0.5623205 0.7294697 0.5132109 0.6272530 0.4962594
## UNH UVM
## 0.2914655 0.4734119
colley_weight(bball2, 'exp')
## ALBY BING BU HART ME STON UMBC
## 0.5148858 0.2296598 0.5847105 0.8936342 0.6357151 0.6296661 0.3908802
## UNH UVM
## 0.1220974 0.4987508
colley_weight(bball2, 'log')
## ALBY BING BU HART ME STON UMBC
## 0.5108379 0.3289859 0.5390191 0.6909780 0.5149819 0.6158528 0.4905430
## UNH UVM
## 0.3229554 0.4858461
colley_weight(bball2, 'step', ts='5')
## ALBY BING BU HART ME STON UMBC
## 0.5637570 0.2465278 0.5665359 0.8387386 0.6096838 0.6297210 0.4140900
## UNH UVM
## 0.1555496 0.4753964
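The exact weighting formulas are not given here, so the following base R sketch only illustrates the general idea of turning game times into weights. Every formula in it is an assumption chosen for illustration, the function time_weights is hypothetical, and ratingsR may well compute its weights differently. In a weighted Colley system, each game would then contribute its weight, rather than 1, to the game counts and to the winner's and loser's tallies.

```r
# Game times from the fourth column of bball2
time <- c(1, 1, 1, 1, 4, 4, 4, 4, 6, 7, 8, 8, 8)

# Hypothetical weight functions, for illustration only;
# these are NOT taken from the ratingsR source.
time_weights <- function(time, method = c("linear", "exp", "log", "step"),
                         ts = NULL) {
  method <- match.arg(method)
  s <- time / max(time)                  # rescale time to (0, 1]
  switch(method,
    linear = s,                          # grows in proportion to time
    exp    = exp(s),                     # grows faster for recent games
    log    = log(1 + s),                 # grows more slowly for recent games
    step   = ifelse(time >= ts, 2, 1)    # assumes the boundary is inclusive
  )
}

w_linear <- time_weights(time, "linear")
w_exp    <- time_weights(time, "exp")
w_log    <- time_weights(time, "log")
w_step   <- time_weights(time, "step", ts = 5)
rbind(time, w_linear, w_exp, w_log, w_step)
```

All four schemes give later games at least as much weight as earlier ones; they differ only in how quickly the weight rises.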
The Massey method can be calculated using the massey function. The Massey method accounts for spread in its calculations. The input is a win-loss dataframe with at least four columns. Here is an example from the first few games of the 2009 Ivy League football season.
ivyfootball
## team1 team2 t1 t2
## 1 Penn Cornell 34 0
## 2 Penn Harvard 17 7
## 3 Penn Princeton 42 7
## 4 Harvard Yale 14 10
## 5 Harvard Columbia 34 14
## 6 Princeton Dartmouth 23 11
## 7 Princeton Yale 24 17
## 8 Brown Yale 35 21
## 9 Brown Dartmouth 14 7
## 10 Columbia Brown 28 14
## 11 Columbia Cornell 30 20
## 12 Dartmouth Cornell 20 17
To calculate the ratings, we do the following:
massey_ivy <- massey(ivyfootball)
round(massey_ivy,2)
## Brown Columbia Cornell Dartmouth Harvard Penn Princeton
## -3.75 0.00 -11.00 -11.25 10.75 25.25 -3.00
## Yale
## -7.00
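For readers who want to see the underlying calculation, here is a minimal base R sketch of the standard Massey least-squares system applied to the games listed above. The helper massey_sketch is hypothetical and not part of ratingsR. It assumes the usual formulation: solve M r = p, where M holds games played on the diagonal and minus the number of meetings off the diagonal, p holds each team's cumulative point differential, and the last row is replaced by a sum-to-zero constraint so the system has a unique solution.

```r
# The twelve games from ivyfootball
ivy <- data.frame(
  team1 = c("Penn", "Penn", "Penn", "Harvard", "Harvard", "Princeton",
            "Princeton", "Brown", "Brown", "Columbia", "Columbia", "Dartmouth"),
  team2 = c("Cornell", "Harvard", "Princeton", "Yale", "Columbia", "Dartmouth",
            "Yale", "Yale", "Dartmouth", "Brown", "Cornell", "Cornell"),
  t1 = c(34, 17, 42, 14, 34, 23, 24, 35, 14, 28, 30, 20),
  t2 = c(0, 7, 7, 10, 14, 11, 17, 21, 7, 14, 20, 17),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a ratingsR function)
massey_sketch <- function(df) {
  teams <- sort(unique(c(df$team1, df$team2)))
  n <- length(teams)
  M <- matrix(0, n, n, dimnames = list(teams, teams))
  p <- setNames(numeric(n), teams)
  for (k in seq_len(nrow(df))) {
    ta <- df$team1[k]
    tb <- df$team2[k]
    d <- df$t1[k] - df$t2[k]       # point differential for this game
    M[ta, ta] <- M[ta, ta] + 1
    M[tb, tb] <- M[tb, tb] + 1
    M[ta, tb] <- M[ta, tb] - 1
    M[tb, ta] <- M[tb, ta] - 1
    p[ta] <- p[ta] + d
    p[tb] <- p[tb] - d
  }
  # M is singular, so replace the last equation with sum(r) = 0
  M[n, ] <- 1
  p[n] <- 0
  setNames(as.vector(solve(M, p)), teams)
}

massey_check <- massey_sketch(ivy)
round(massey_check, 2)
```

Under these assumptions the sketch gives the same rounded ratings as the massey output above. The difference between two teams' ratings can be read as a predicted point spread between them.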
Comparing the Massey and Colley ratings we get:
plot(colley(ivyfootball), massey(ivyfootball), xlab="Colley Rating", ylab="Massey Rating")