1. Introduction

I would recommend checking out “Custom AFL Elo Rating Calculation and Visualisation”, also on RPubs, prior to using this tutorial.

Elo Ratings are a measure of strength of a player or team within a group. Players or teams are given an initial rating of a specified number of points. These points are exchanged with other players or teams in match-ups depending on a win, loss or draw. The number of points exchanged in a match-up is determined by the relative ratings of each player or team, a weighting factor and the outcome of the match-up.
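
To make this exchange concrete, a single Elo update can be sketched in a few lines of R. This is an illustration only; the 'PlayerRatings' package used below performs these calculations for us, and the function name and K value here are purely for demonstration.

# a minimal sketch of one Elo rating update, from the home team's perspective
# ratingH / ratingA: current ratings; s: 1 for a home win, 0.5 for a draw, 0 for a loss; k: weighting factor
eloUpdate <- function(ratingH, ratingA, s, k = 20) {
  expectedH <- 1 / (1 + 10 ^ ((ratingA - ratingH) / 400)) # expected score of the home team
  exchange  <- k * (s - expectedH)                        # points exchanged in the match-up
  c(home = ratingH + exchange, away = ratingA - exchange)
}

# example: a 1500-rated home team beats a 1600-rated away team
eloUpdate(1500, 1600, s = 1)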

The values of these inputs can be tested against existing data to determine which combination gives the most accurate predictions. This process is called hyperparameter tuning.

In this tutorial we will conduct hyperparameter tuning on Elo Ratings in Australian Rules Football.

2. Source the Data

First we will load the required packages. The 'PlayerRatings' package handles the calculation of Elo Ratings, while the 'fitzRoy' package scrapes the AFL data we need. We will also use 'dplyr' and 'tidyr' for data manipulation, 'caret' for the confusion matrices and 'ggplot2' for visualisation.

library(fitzRoy)
library(PlayerRatings)
library(dplyr)
library(tidyr)
library(caret)
library(ggplot2)

We can now import the data and begin to manipulate it into one row per match for testing.

# add desired years of results
importedData <- fetch_player_stats_afltables(season = c(2017:2021))

# the imported data needs to be grouped and summarised by game
seasonStats <- importedData %>%
  group_by(Season, Round, Date, Local.start.time, Venue) %>%
  # a round number variable will assist in creating a week counter for the overall data 
  summarise(roundnumber = ifelse(Round[1] == "EF" | Round[1] == "QF", "FW1",
                          ifelse(Round[1] == "SF", "FW2",
                          ifelse(Round[1] == "PF", "FW3",
                          ifelse(Round[1] == "GF", "FW4", Round[1])))),
            year = Season[1],
            hteam = Home.team[1],
            ateam = Away.team[1],
            hscore = Home.score[1],
            ascore = Away.score[1],
            # we will create a new column to represent the S value. 
            # 1 for a win, 0.5 for a draw and 0 for a loss.
            spoint = ifelse(hscore > ascore, 1, ifelse(hscore == ascore, 0.5, 0)))

# combine roundnumber and year to create unique value for each round, and make sure the data is arranged by date
seasonStats <- seasonStats %>% 
  unite("roundcode", sep = " - ", roundnumber:year, remove = TRUE) %>%
  arrange(Date)

We also need to create a week counter for the Elo function to cycle through when predicting and updating ratings.

# creating a temporary data frame that is grouped by each unique round with a starting date
temp <- seasonStats %>%
  group_by(roundcode) %>%
  summarise(firstgame = Date[1])

# arrange the rounds in date order and assign a sequential week number to each round
temp <- temp %>%
  arrange(firstgame) %>%
  mutate(Week = row_number()) # runs from 1 to the number of rounds (130 here)

# join the temporary data frame to our main one, so that the weeks match the correct round
seasonStats <- left_join(seasonStats, temp, by = "roundcode") %>%
  select(-firstgame)

# check the main data frame
head(seasonStats)
## # A tibble: 6 x 12
## # Groups:   Season, Round, Date, Local.start.time [5]
##   Season Round Date       Local.start.time Venue     roundcode hteam ateam hscore
##    <dbl> <chr> <date>                <int> <chr>     <chr>     <chr> <chr>  <int>
## 1   2017 1     2017-03-23             1920 M.C.G.    1 - 2017  Carl~ Rich~     89
## 2   2017 1     2017-03-24             1950 M.C.G.    1 - 2017  Coll~ West~     86
## 3   2017 1     2017-03-25             1635 Docklands 1 - 2017  St K~ Melb~     90
## 4   2017 1     2017-03-25             1635 S.C.G.    1 - 2017  Sydn~ Port~     82
## 5   2017 1     2017-03-25             1905 Carrara   1 - 2017  Gold~ Bris~     96
## 6   2017 1     2017-03-25             1925 M.C.G.    1 - 2017  Esse~ Hawt~    116
## # ... with 3 more variables: ascore <int>, spoint <dbl>, Week <int>
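
As a quick sanity check, the week counter should now run from 1 up to the number of unique rounds, which is 130 across the 2017 to 2021 seasons.

# number of distinct weeks; this should match the number of rounds (130 here)
n_distinct(seasonStats$Week)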

3. Hyperparameter Tuning

To tune the inputs, we wrap the Elo Rating calculation in a function that accepts an initial rating (init), a K factor weighting (kfac) and a home advantage parameter (gamma). The function steps through the data week by week, predicting each week's matches before updating the ratings with the results, and returns the updated ratings, a confusion matrix of the predictions and the data with predictions attached.

# create a function that takes in the various Elo Rating parameters to test them
eloScore <- function(init, kfac, gamma, data, status = NULL){
  
  # a list of all the weeks in the dataset to cycle through
  predictweeks <- unique(data$Week)
  
  # the relevant variables needed to calculate the Elo Ratings
  modelVars <- c('Week','hteam','ateam','spoint')
  
  # if an existing model isn't passed through
  if (is.null(status)){
    
    # subset all weeks bar Week 1
    predictweeks <- predictweeks[-1]
    
    # create elo calculation model using specified inputs
    eloModel <- elo(data[data$Week == 1, modelVars],
                    init = init,
                    kfac = kfac,
                    gamma = gamma,
                    history = T)
    
  } else {
    
    # assign an existing model 
    eloModel <- status
    
  }
  
  # create a probability variable, and assign it as NA to begin with
  data[c('eloProb')] <- NA
  
  # for loop that loops through the list of weeks
  for (w in predictweeks){
    
    # subset the week
    predWeek  <- subset(data, Week == w)
    
    # predict the subset week's matches
    # trat and tng are arguments that determine ratings of un-rated teams, and a games requirement
    predElo <- predict(eloModel,
                        newdata = predWeek[modelVars],
                        gamma = gamma,
                        tng = 1, trat = 1800)
    
    # assign the predictions to the corresponding week
    data$eloProb[data$Week == w] <- predElo
    
    # create and update the ratings
    eloModel <- elo(predWeek[modelVars],
                    status = eloModel$ratings,
                    init = init,
                    gamma = gamma,
                    kfac = kfac,
                    history = T)
    
  }
  
  # create a variable that indicates what the Elo Model predicts, relative to the home team
  data$tipElo <- ifelse(data$eloProb > 0.5, 1, 0)
  
  # create a confusion matrix of the results
  cmElo <- confusionMatrix(data = factor(data$tipElo, levels = c(0,0.5,1)),
                            reference = factor(data$spoint, levels = c(0,0.5,1)))
  
  return(
    
    # return final list of ratings, and the confusion matrix
    list(eloModel,
         cmElo, 
         data)
  )
  
}
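
Before tuning, it is worth confirming that the function runs end to end by scoring the data once with a single set of inputs. The values below are arbitrary and purely for illustration.

# a single trial run of the function
trialRun <- eloScore(init = 2200, kfac = 30, gamma = 0, data = seasonStats)

# overall prediction accuracy of the trial run
trialRun[[2]]$overall['Accuracy']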

4. Accuracy

Now that the function has been created, the accuracy of different parameter combinations can be tested. Ranges of each input can be passed into the function.

# this is an expanded grid of every combination of our specified inputs (5 x 10 x 6 = 300 combinations)
# NOTE: more combinations will mean a longer run time
# be wary of the length of the data set and the number of combinations of inputs
params <- expand.grid(init = seq(1600, 2400, by = 200),
                      kfac = seq(10, 100, by = 10),
                      gamma = seq(0, 1, by = 0.2))

We'll split the data into training and testing sets.

# we'll use roughly a 70/30 split (weeks 1-91 for training, weeks 92-130 for testing), but other splits can also be used
trainData <- subset(seasonStats, Week <= 91)
testData  <- subset(seasonStats, Week >= 92)

We next want to apply our parameter grid to the training data set, using our custom function.

# this will apply each combination of parameters to our training data and record the prediction accuracy
# NOTE: this may take several minutes
params$accuracy <- apply(params, 1, function (x) 
  eloScore(x[1], x[2], x[3], trainData)[[2]]$overall['Accuracy'])

Let's subset the results to see which combination(s) of input values give the best accuracy on the training data.

# this will subset the most accurate combination of inputs
subset(params, accuracy == max(params$accuracy))
##     init kfac gamma  accuracy
## 216 1600   40   0.8 0.6366237
## 217 1800   40   0.8 0.6366237
## 218 2000   40   0.8 0.6366237
## 219 2200   40   0.8 0.6366237
## 220 2400   40   0.8 0.6366237
## 266 1600   40   1.0 0.6366237
## 267 1800   40   1.0 0.6366237
## 268 2000   40   1.0 0.6366237
## 269 2200   40   1.0 0.6366237
## 270 2400   40   1.0 0.6366237
## 291 1600   90   1.0 0.6366237
## 292 1800   90   1.0 0.6366237
## 293 2000   90   1.0 0.6366237
## 294 2200   90   1.0 0.6366237
## 295 2400   90   1.0 0.6366237
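
Optionally, we can also plot how accuracy varies across the grid to see the influence of each input. This is a rough sketch using ggplot2, which is already loaded.

# training accuracy by K factor, coloured by initial rating, with one panel per gamma value
ggplot(params, aes(x = kfac, y = accuracy, colour = factor(init))) +
  geom_line() +
  facet_wrap(~ gamma) +
  labs(x = "K factor", y = "Training accuracy", colour = "Initial rating") +
  theme_bw()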

Lastly, we can test how our newly found combination of values performs on the testing data. We first run the function over the training data and pass the resulting ratings model in through the status argument, so that the testing matches are predicted using ratings built up over the training period.

# test inputs on testing data
testScore <- eloScore(init = 1600, 
                       kfac = 40, 
                       gamma = 0.8, 
                       data = testData,
                       status = eloScore(1600, 40, 0.8, trainData)[[1]])

Our function returns three objects: the latest ratings, a confusion matrix with accuracy statistics, and a data frame containing game-by-game Elo probabilities and tips.

# the latest ratings
testScore[[1]]
## 
## Elo Ratings For 18 Players Playing 990 Games
## 
##                    Player Rating Games Win Draw Loss Lag
## 1               Melbourne   1800   111  62    1   48   0
## 2           Port Adelaide   1795   110  70    1   39   1
## 3                 Geelong   1753   119  77    1   41   1
## 4        Western Bulldogs   1723   111  59    0   52   0
## 5          Brisbane Lions   1712   111  56    0   55   2
## 6  Greater Western Sydney   1650   116  65    4   47   2
## 7                  Sydney   1635   109  57    0   52   3
## 8                Richmond   1631   117  80    2   35   4
## 9                St Kilda   1605   107  45    1   61   4
## 10               Essendon   1583   108  53    1   54   3
## 11             West Coast   1578   113  69    1   43   4
## 12              Fremantle   1550   105  42    0   63   4
## 13               Hawthorn   1527   107  48    3   56   4
## 14            Collingwood   1504   113  58    2   53   4
## 15                Carlton   1485   105  30    0   75   4
## 16               Adelaide   1453   108  49    1   58   4
## 17             Gold Coast   1424   105  25    1   79   4
## 18        North Melbourne   1393   105  35    1   69   4
# confusion matrix and accuracy statistics
testScore[[2]]
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0 0.5  1
##        0   78   2 61
##        0.5  0   0  0
##        1   49   2 90
## 
## Overall Statistics
##                                           
##                Accuracy : 0.5957          
##                  95% CI : (0.5359, 0.6535)
##     No Information Rate : 0.5355          
##     P-Value [Acc > NIR] : 0.02408         
##                                           
##                   Kappa : 0.2028          
##                                           
##  Mcnemar's Test P-Value : 0.15051         
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 0.5 Class: 1
## Sensitivity            0.6142    0.00000   0.5960
## Specificity            0.5935    1.00000   0.6107
## Pos Pred Value         0.5532        NaN   0.6383
## Neg Pred Value         0.6525    0.98582   0.5674
## Prevalence             0.4504    0.01418   0.5355
## Detection Rate         0.2766    0.00000   0.3191
## Detection Prevalence   0.5000    0.00000   0.5000
## Balanced Accuracy      0.6039    0.50000   0.6034
# data frame of all matches featuring elo probabilities and tips
testScore[[3]]
## # A tibble: 282 x 14
## # Groups:   Season, Round, Date, Local.start.time [276]
##    Season Round Date       Local.start.time Venue  roundcode hteam ateam  hscore
##     <dbl> <chr> <date>                <int> <chr>  <chr>     <chr> <chr>   <int>
##  1   2020 11    2020-08-08             1605 Adela~ 11 - 2020 Port~ Richm~     93
##  2   2020 11    2020-08-08             1940 Gabba  11 - 2020 Bris~ Weste~     96
##  3   2020 11    2020-08-09             1335 Perth~ 11 - 2020 West~ Carlt~     72
##  4   2020 11    2020-08-09             1740 Adela~ 11 - 2020 Melb~ North~     92
##  5   2020 11    2020-08-10             1810 Gabba  11 - 2020 St K~ Geelo~     34
##  6   2020 11    2020-08-10             1840 Perth~ 11 - 2020 Frem~ Hawth~     48
##  7   2020 11    2020-08-11             1805 Adela~ 11 - 2020 Adel~ Colli~     38
##  8   2020 11    2020-08-12             1910 Carra~ 11 - 2020 Gold~ Essen~     73
##  9   2020 12    2020-08-13             1810 Perth~ 12 - 2020 Sydn~ Great~     66
## 10   2020 12    2020-08-14             1950 Carra~ 12 - 2020 Geel~ Port ~     91
## # ... with 272 more rows, and 5 more variables: ascore <int>, spoint <dbl>,
## #   Week <int>, eloProb <dbl>, tipElo <dbl>

5. Visualisation

As a small extension, we can visualise the confusion matrix of our testing data.

# pull out our data with Elo tips and probabilities
eloData <- testScore[[3]]

# create a data frame with the Elo tip and the actual outcome (spoint)
confMatELO <- as.data.frame(table(eloData$tipElo, eloData$spoint))

# optionally remove drawn matches: our Elo tip is never a draw, and Var2 (spoint) holds the actual results
confMatELO <- confMatELO %>%
  filter(Var2 != 0.5)

# plotting the confusion matrix
ggplot(data = confMatELO, mapping = aes(x = Var1, y = Var2)) +
  geom_tile(aes(fill = Freq)) + # generates a plot with rectangles
  geom_text(aes(label = Freq)) + # adds the specific values to the rectangles
  scale_fill_gradient(low = "grey80", high = "darkgreen") + # fill with a custom colour gradient (grey to dark green)
  labs(x = "Predicted Outcome", y = "Actual Outcome", # custom titles, and labels
       title = "Elo Probability Model Confusion Matrix",
       fill = "Count") +
  theme_bw() # change the theme; this is a matter of preference
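
As a further optional sketch, the final ratings themselves can be plotted as a simple bar chart, using the ratings data frame shown earlier.

# pull out the final ratings data frame and plot it, ordered by rating
finalRatings <- testScore[[1]]$ratings

ggplot(finalRatings, aes(x = reorder(Player, Rating), y = Rating)) +
  geom_col(fill = "darkgreen") +
  coord_flip() +
  labs(x = NULL, y = "Elo Rating", title = "Final Elo Ratings") +
  theme_bw()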

These same principles are applicable to most data sets; try them out on your own!