The Glicko model was created by Mark E. Glickman as an extension of the Elo model [1]. (The Elo model was developed by Arpad Elo as a method for calculating the skill ratings of chess players. It compares two players or teams and produces a score based on their previous ratings and the result of the match.) After each match, the players or teams exchange rating points based on who won. If a higher-ranked player or team beats a lower-ranked one, they gain only a small share of the other side's points. If a lower-ranked player or team beats a higher-ranked one, they receive far more points, because this is a more valuable and unexpected win; see [2] for more detail.
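To make the point exchange concrete, here is a minimal sketch of a standard Elo update in R; the elo_update() helper and the K-factor of 32 are illustrative choices only and are not part of the Glicko code used later in this post.

elo_update <- function(r_winner, r_loser, k = 32) {
  exp_winner <- 1 / (1 + 10^((r_loser - r_winner) / 400))  # expected score of the eventual winner
  k * (1 - exp_winner)                                     # rating points transferred to the winner
}

elo_update(1800, 1400)  # higher-ranked side wins: roughly 2.9 points change hands
elo_update(1400, 1800)  # upset win: roughly 29.1 points change hands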
Glickman identified a deficiency in the Elo model relating to the reliability of a team's or player's initial rating [1]. He wanted to account for situations where a player may not have competed for an extended period, or where the initial rating of a player was not “trustworthy” [1]. This was done by adding a ratings deviation [1].
The Glicko model uses four different values: an initial value, a deviation value, a c-value and a gamma score (a minimal glicko() call showing how these map onto the PlayerRatings package is sketched after the list below).
Initial value (init): rating vector to initialise a new player [3]
Deviation value (dev): the uncertainty in a rating [1]
C-value (cval): controls the increase in teams’ deviations over time [3]
Gamma score (gamma): indicates any home ground advantage (0 means no advantage) [3]
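To show how these four values map onto the glicko() function from the PlayerRatings package (used later in this post), here is a minimal, hypothetical example; the games data frame and the parameter values are placeholders, not the AFLW data.

library(PlayerRatings)

# Hypothetical match data: one row per game, with the time period (round),
# the two teams, and the result for the first-listed team (1 win, 0.5 draw, 0 loss)
games <- data.frame(round = c(1, 1, 2),
                    home  = c("A", "C", "A"),
                    away  = c("B", "D", "C"),
                    score = c(1, 0, 0.5))

glicko(games,
       init  = c(2200, 300),  # init and dev: the starting rating and its uncertainty
       cval  = 15,            # cval: how quickly deviations grow between rating periods
       gamma = 0)             # gamma: home ground advantage (0 = no advantage)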
We can track teams’ ratings over the season to see how they are progressing at different points in a season or tournament
We can determine their rating at the end of the season and compare it to previous seasons
We can compare different teams’ ratings; this is frequently used in betting and gambling, where people want to estimate which team is more likely to win
The aim of this task is to determine the Glicko ratings of the 14 teams in the Australian Football League Women’s (AFLW) competition at the end of the 2021 home and away season.
First, we need to load a few key libraries that will be used throughout this code.
library(fitzRoy)        # scrape AFLW match data
library(dplyr)          # data manipulation
library(PlayerRatings)  # Glicko ratings
library(tidyr)          # data tidying
library(caret)          # confusion matrices for the parameter tuning
Next, we will scrape some AFLW match data. In this example, we use the fitzRoy package and its get_aflw_match_data() function to retrieve the data set. We are only looking at the 2021 season data, which is why “2021” is used in the brackets below.
AFLW <- get_aflw_match_data(2021)
Now, this data set contains a lot of information. This code will show you the list of the 75 variables contained in the data set.
names(AFLW)
## [1] "match.name" "match.date"
## [3] "match.status" "match.matchId"
## [5] "match.venue" "match.utcStartTime"
## [7] "match.homeTeamId" "match.awayTeamId"
## [9] "match.round" "match.venueLocalStartTime"
## [11] "match.abbr" "match.twitterHashTag"
## [13] "match.homeTeam.name" "match.homeTeam.timeZone"
## [15] "match.homeTeam.teamId" "match.homeTeam.abbr"
## [17] "match.homeTeam.nickname" "match.awayTeam.name"
## [19] "match.awayTeam.timeZone" "match.awayTeam.teamId"
## [21] "match.awayTeam.abbr" "match.awayTeam.nickname"
## [23] "venue.address" "venue.name"
## [25] "venue.state" "venue.timeZone"
## [27] "venue.venueId" "venue.abbreviation"
## [29] "venue.capacity" "venue.groundDimension"
## [31] "venue.latitude" "venue.longitude"
## [33] "round.name" "round.year"
## [35] "round.roundId" "round.abbreviation"
## [37] "round.competitionId" "round.roundNumber"
## [39] "status" "matchId"
## [41] "scoreWorm" "scoreMap"
## [43] "lastUpdated" "homeTeamScore.periodScore"
## [45] "homeTeamScore.rushedBehinds" "homeTeamScore.minutesInFront"
## [47] "homeTeamScore.matchScore.totalScore" "homeTeamScore.matchScore.goals"
## [49] "homeTeamScore.matchScore.behinds" "homeTeamScore.matchScore.superGoals"
## [51] "awayTeamScore.periodScore" "awayTeamScore.rushedBehinds"
## [53] "awayTeamScore.minutesInFront" "awayTeamScore.matchScore.totalScore"
## [55] "awayTeamScore.matchScore.goals" "awayTeamScore.matchScore.behinds"
## [57] "awayTeamScore.matchScore.superGoals" "matchClock.periods"
## [59] "weather.description" "weather.tempInCelsius"
## [61] "weather.weatherType" "homeTeamScoreChart.goals"
## [63] "homeTeamScoreChart.leftBehinds" "homeTeamScoreChart.rightBehinds"
## [65] "homeTeamScoreChart.leftPosters" "homeTeamScoreChart.rightPosters"
## [67] "homeTeamScoreChart.rushedBehinds" "homeTeamScoreChart.touchedBehinds"
## [69] "awayTeamScoreChart.goals" "awayTeamScoreChart.leftBehinds"
## [71] "awayTeamScoreChart.rightBehinds" "awayTeamScoreChart.leftPosters"
## [73] "awayTeamScoreChart.rightPosters" "awayTeamScoreChart.rushedBehinds"
## [75] "awayTeamScoreChart.touchedBehinds"
We do not need all these variables, so we will only select the ones relevant to creating the Glicko model.
AFLW <- AFLW %>% select(match.matchId, match.date, round.abbreviation, match.homeTeam.name, match.awayTeam.name,
homeTeamScore.matchScore.totalScore, awayTeamScore.matchScore.totalScore)
For the Glicko model to work, we need to create a new column (let’s call it “Score”) to represent the result for the home team. We will assign a 1 if the home team won, a 0 if the home team lost and a 0.5 if the game was a draw.
AFLW <- AFLW %>% mutate(
Score = ifelse(homeTeamScore.matchScore.totalScore > awayTeamScore.matchScore.totalScore, 1,
ifelse(homeTeamScore.matchScore.totalScore == awayTeamScore.matchScore.totalScore, 0.5, 0)))
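If you want to verify the coding worked, a quick tabulation of the new column is enough; the exact counts depend on which games are still in the data set at this point (the finals are removed in the next step).

table(AFLW$Score)  # counts of home losses (0), draws (0.5) and home wins (1)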
As we are just looking at the ratings at the end of the 2021 AFLW home and away season, we need to remove the finals rounds. This means that all clubs will have played the same number of games across the season. We also need to change the data type of the round.abbreviation variable: we will remove the “rd” at the beginning and convert the variable to numeric.
AFLW <- AFLW[!(AFLW$round.abbreviation == "QF"| AFLW$round.abbreviation == "PF"| AFLW$round.abbreviation == "GF"),]
AFLW$round.abbreviation <- substring(AFLW$round.abbreviation, 3)
AFLW$round.abbreviation <- as.numeric(AFLW$round.abbreviation)
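It is worth confirming this cleaning step, since the rest of the code assumes numeric round numbers; for the 2021 home and away season this should leave rounds 1 to 9.

unique(AFLW$round.abbreviation)  # the nine home and away rounds, now numeric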
To begin to develop the Glicko ratings, we first need to create a few different variables and values.
The “predictround” vector takes the unique round numbers in the data set and then removes the first round, because round 1 will be used later on to build the initial model.
“model_vars” contains the four columns the glicko() function needs, in order: the round, the home team, the away team and the result.
“p_glicko” is a new column added to the data frame; for now it holds NAs, but this is where the predicted Glicko win probability for each game will be stored.
predictround <- unique(AFLW$round.abbreviation)
predictround <- predictround[-1] # Remove round 1 from the prediction rounds - round 1 is used to build the initial model
model_vars <- c("round.abbreviation",'match.homeTeam.name','match.awayTeam.name','Score')
AFLW[c("p_glicko")] <- NA
We also want to create a gamma value (a home ground advantage metric). For this example we are just going to use a very simple calculation: home team wins divided by the number of games played. This gives a home ground advantage of 0.5873016.
home_ground_ad <- sum(AFLW$Score)/63 #(63 is number of games in the data set)
print(home_ground_ad)
## [1] 0.5873016
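If you would rather not hard-code the number of games, the same value can be computed from the data itself, since each row of AFLW is one game.

home_ground_ad <- sum(AFLW$Score) / nrow(AFLW)  # identical result, without the hard-coded 63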
To determine which values to use for each of the parameters (init, dev and cval; remember we have already calculated the gamma value above), we need to undertake parameter tuning. The function below takes init, dev and cval values, steps through the season round by round using the columns in “model_vars”, and returns the final Glicko model along with a confusion matrix for the predicted tips (p_glicko).
glicko_score <- function(init, dev, cval, data, status = NULL){
  predictround <- unique(data$round.abbreviation)
  model_vars <- c("round.abbreviation", "match.homeTeam.name", "match.awayTeam.name", "Score")
  if (is.null(status)){
    # No existing ratings supplied: fit the initial model on round 1 and predict from round 2 onwards
    predictround <- predictround[-1]
    glickomodel <- glicko(data[data$round.abbreviation == 1, model_vars],
                          init = c(init, dev),
                          cval = cval,
                          gamma = 0.5873016,  # home ground advantage calculated above
                          history = T)
  } else {
    glickomodel <- status
  }
  data[c("p_glicko")] <- NA
  for (r in predictround){
    # Predict the current round using the ratings built from all previous rounds
    pred_round <- subset(data, round.abbreviation == r)
    pred_glicko <- predict(glickomodel,
                           newdata = pred_round[model_vars],
                           gamma = 0.5873016,
                           tng = 1, trat = c(1600, 300))  # rating used for teams with fewer than tng games
    data$p_glicko[data$round.abbreviation == r] <- pred_glicko
    # Update the ratings with the current round's results before moving to the next round
    glickomodel <- glicko(pred_round[model_vars],
                          status = glickomodel$ratings,
                          init = c(init, dev),
                          cval = cval,
                          gamma = 0.5873016,
                          history = T)
  }
  # Convert the predicted probabilities into tips and compare them with the actual results
  data$tip_glicko <- ifelse(data$p_glicko > 0.5, 1, 0)
  cm_glicko <- confusionMatrix(data = factor(data$tip_glicko, levels = c(0, 0.5, 1)),
                               reference = factor(data$Score, levels = c(0, 0.5, 1)))
  return(
    list(glickomodel,
         cm_glicko)
  )
}
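Before running the full tuning grid, it can be worth checking that the function works for a single set of values. The values below are just the glicko() defaults and are only meant to confirm that everything runs.

test_run <- glicko_score(init = 2200, dev = 300, cval = 15, data = AFLW)
test_run[[2]]$overall["Accuracy"]  # tipping accuracy for these parameter values
test_run[[1]]                      # the fitted Glicko model after the final round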
We need to tell the code what values to search through to find the best combination of init, dev and cval. The glicko() defaults are init = c(2200, 300), where 2200 is the rating and 300 is the dev value, and cval = 15. For this example we will trial the values below.
params_glicko <- expand.grid(init = seq(1600, 3000, by = 200),
dev = seq(100, 300, by = 50),
cval = seq(5, 55, by = 5))
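The grid above contains 8 values of init, 5 of dev and 11 of cval, so there are 440 combinations to evaluate.

nrow(params_glicko)
## [1] 440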
We then apply the function to the data set (AFLW). This code calculates the tipping accuracy for every combination of init, dev and cval, from which we can pull out the combinations with the highest accuracy. It may take a few minutes to run depending on what range of values you have used, so do not panic. You can always choose a smaller range of values if your computer is not as powerful.
params_glicko$accuracy <- apply(params_glicko, 1, function (x)
glicko_score(x[1], x[2], x[3], AFLW)[[2]]$overall['Accuracy'] )
Because more than one combination achieves the top accuracy, the first of these was chosen.
subset(params_glicko, accuracy == max(params_glicko$accuracy))
## init dev cval accuracy
## 393 1600 300 50 0.75
## 394 1800 300 50 0.75
## 395 2000 300 50 0.75
## 396 2200 300 50 0.75
## 397 2400 300 50 0.75
## 398 2600 300 50 0.75
## 399 2800 300 50 0.75
## 400 3000 300 50 0.75
## 433 1600 300 55 0.75
## 434 1800 300 55 0.75
## 435 2000 300 55 0.75
## 436 2200 300 55 0.75
## 437 2400 300 55 0.75
## 438 2600 300 55 0.75
## 439 2800 300 55 0.75
## 440 3000 300 55 0.75
From this, we will choose init = 1600, dev = 300 and cval = 50.
Once we have chosen the values for our parameters, we apply them to the glicko() function.
glickomodel <- glicko(AFLW[model_vars],
init = c(1600,300),
cval = 50,
gamma = 0.5873016,
history = T)
These are the final Glicko ratings for the 14 AFLW teams:
print(glickomodel)
##
## Glicko Ratings For 14 Players Playing 63 Games
##
## Player Rating Deviation Games Win Draw Loss Lag
## 1 Melbourne 1977 156.97 9 7 0 2 0
## 2 Brisbane Lions 1927 160.89 9 7 0 2 0
## 3 Adelaide Crows 1897 161.57 9 7 0 2 0
## 4 Kangaroos 1841 155.46 9 6 0 3 0
## 5 Collingwood 1840 165.23 9 7 0 2 0
## 6 Fremantle 1695 163.22 9 6 0 3 0
## 7 Western Bulldogs 1684 156.53 9 5 0 4 0
## 8 Carlton 1603 163.78 9 5 0 4 0
## 9 GWS Giants 1501 162.88 9 4 0 5 0
## 10 St Kilda 1438 163.72 9 3 0 6 0
## 11 Richmond 1436 160.70 9 3 0 6 0
## 12 West Coast Eagles 1280 159.84 9 2 0 7 0
## 13 Geelong Cats 1184 163.89 9 1 0 8 0
## 14 Gold Coast Suns 1051 169.28 9 0 0 9 0
As you can see, Melbourne are ranked first with a rating of 1977, while the Gold Coast Suns finished last with a rating of 1051.
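Because the model was fitted with history = TRUE, you can also look back at how each team's rating changed from round to round, which covers the season-tracking use case mentioned at the start. The sketch below assumes the history component is a three-dimensional array with a "Rating" slice, as described in the PlayerRatings documentation; check the dimension names on your own object if they differ.

dimnames(glickomodel$history)[[3]]  # quantities stored for each rating period
glickomodel$history[, , "Rating"]   # one row per team, one column per round (assuming a "Rating" slice)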
[1]. Glickman M. Welcome to Glicko ratings [Internet]. Glicko.net. 2021. Available from: http://www.glicko.net/glicko.html
[2]. Mittal R. What is an ELO Rating? [Internet]. Medium. 2020. Available from: https://medium.com/purple-theory/what-is-elo-rating-c4eb7a9061e0
[3]. R: The Glicko Rating System [Internet]. Search.r-project.org. Available from: https://search.r-project.org/CRAN/refmans/PlayerRatings/html/glicko.html