1. Elo Rating Models

An Elo rating system is a method for ranking teams or players in a competition relative to one another. The system was invented by Arpad Elo and was originally used to rank chess players. Nowadays, Elo systems are applied to teams and players in a range of sports.

2. Required Packages

For this demonstration, an Elo rating model will be built for the 2022 AFL regular season (rounds 1 to 23).

# Load required packages for analysis
# (caret must also be installed; it is called below as caret::confusionMatrix)
library(tidyverse)
library(elo)
library(fitzRoy)
library(rsample)

3. Data Import and Cleaning

Data will be sourced from AFL Tables using the fitzRoy package. In order to perform analysis, a little bit of data cleaning and organisation is required first.

# Import and clean data
afl <- fetch_results_afltables(season = 2022)

# Keep regular-season rounds only (rounds 1 to 23)
afl <- afl %>%
  filter(Round.Number <= 23)

# Make a column for the result of each game
afl$Win <- ifelse(afl$Margin > 0, 1, ifelse(afl$Margin < 0, 0, 0.5))

# Select required columns
afl <- afl %>%
  select(Round.Number, Home.Team, Away.Team, Win)

4. Basic Elo

The first step is to build a basic Elo model so that we can inspect its output and understand how it works. For this, we will use an initial rating of 2200 for every team and a k value of 27.

elo_basic <- elo::elo.run(formula = Win ~ Home.Team + Away.Team,
                          data = afl,
                          initial.elos = 2200,
                          k = 27,
                          history = TRUE) %>%
  as.data.frame()


head(elo_basic)
##           team.A        team.B p.A wins.A update.A update.B  elo.A  elo.B
## 1      Melbourne     Footscray 0.5      1     13.5    -13.5 2213.5 2186.5
## 2        Carlton      Richmond 0.5      1     13.5    -13.5 2213.5 2186.5
## 3       St Kilda   Collingwood 0.5      0    -13.5     13.5 2186.5 2213.5
## 4        Geelong      Essendon 0.5      1     13.5    -13.5 2213.5 2186.5
## 5            GWS        Sydney 0.5      0    -13.5     13.5 2186.5 2213.5
## 6 Brisbane Lions Port Adelaide 0.5      1     13.5    -13.5 2213.5 2186.5

p.A is the probability of team A winning. This value changes for each match-up, based on the current ratings of the two teams. wins.A is the actual outcome of the game (1 for a team A win, 0 for a loss, 0.5 for a draw). The update.A and update.B columns give the change applied to each team's Elo rating, and the elo.A and elo.B columns are each team's rating after the match.
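The first row of the table can be reproduced by hand. The sketch below uses the standard Elo expected-score formula with a 400-point scale; the helper names `elo_prob` and `elo_delta` are made up for this illustration and are not functions from the elo package.

```r
# Expected score (win probability) for team A under the standard Elo formula
elo_prob <- function(elo_a, elo_b) {
  1 / (1 + 10^((elo_b - elo_a) / 400))
}

# Rating change for team A after one game; team B receives the negative
elo_delta <- function(wins_a, elo_a, elo_b, k) {
  k * (wins_a - elo_prob(elo_a, elo_b))
}

# Round 1: both teams start on 2200, k = 27, and team A wins
p <- elo_prob(2200, 2200)
d <- elo_delta(1, 2200, 2200, k = 27)

p          # 0.5
d          # 13.5
2200 + d   # 2213.5 (elo.A)
2200 - d   # 2186.5 (elo.B)
```

Because every team starts on the same rating, each round 1 game has p.A = 0.5 and an update of exactly k / 2 in either direction.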

5. Hyperparameter Tuning

To get the best Elo predictions and rankings, it is a good idea to tune the model and pick the parameters that give the highest accuracy. We can do this by writing a function that scores a given pair of parameters and then applying it across a grid of candidate values. We will tune the model using a training/testing split.

# Split data into training and testing sets
# (set a seed first so the random split is reproducible; the value is arbitrary)
set.seed(2022)
split <- initial_split(afl)
train <- training(split)
test <- testing(split)

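# Score one (initial rating, k) combination by its classification accuracy
elo_score <- function(initial_elos, k, data){
  
  # obtain elo ratings; the elo.run argument is initial.elos, and unname()
  # drops the column name that apply() attaches to each value
  elo <- elo::elo.run(formula = Win ~ Home.Team + Away.Team,
                      initial.elos = unname(initial_elos),
                      k = unname(k),
                      data = data) %>%
    as.data.frame()
  
  # predict a home win whenever the home team is the favourite
  data <- data %>% 
    mutate(p.A = elo$p.A) %>% 
    mutate(pred = ifelse(p.A > .5, 1, 0))
  
  # compare predictions with actual results (draws are kept as level 0.5)
  cm <- caret::confusionMatrix(data = factor(data$pred, levels = c(0, 0.5, 1)),
                               reference = factor(data$Win, levels = c(0, 0.5, 1)))
  
  return(list(cm))
  
}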

# Create a grid of candidate initial ratings and k values
params <- expand.grid(init = seq(1000, 3000, by = 50),
                      kfac = seq(10, 50, by = 5))

# Score every parameter combination on the training data
params$accuracy <- apply(X = params,
                         MARGIN = 1,
                         FUN = function(x)
                           elo_score(x[1], x[2], train)[[1]]$overall["Accuracy"])

# Optimal Parameters
best <- subset(params, accuracy == max(params$accuracy))

head(best)
##   init kfac  accuracy
## 1 1000   10 0.6756757
## 2 1050   10 0.6756757
## 3 1100   10 0.6756757
## 4 1150   10 0.6756757
## 5 1200   10 0.6756757
## 6 1250   10 0.6756757
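Every initial rating in the table above gives the same accuracy. This is expected: all teams start on the same value, and Elo win probabilities depend only on the difference between two ratings, so shifting every rating by a constant changes nothing. The sketch below illustrates this with a small hand-rolled Elo loop on made-up results; `run_elo` and the toy fixtures are assumptions for illustration, not part of the elo package or the AFL data.

```r
# Minimal Elo loop: returns the pre-game home win probability for each game
run_elo <- function(home, away, wins, init, k) {
  teams <- unique(c(home, away))
  ratings <- setNames(rep(init, length(teams)), teams)
  p <- numeric(length(wins))
  for (i in seq_along(wins)) {
    h <- home[i]
    a <- away[i]
    p[i] <- 1 / (1 + 10^((ratings[[a]] - ratings[[h]]) / 400))
    delta <- k * (wins[i] - p[i])
    ratings[[h]] <- ratings[[h]] + delta
    ratings[[a]] <- ratings[[a]] - delta
  }
  p
}

# Hypothetical fixtures (1 = home win, 0 = home loss, 0.5 = draw)
home <- c("A", "C", "A", "B", "C", "B")
away <- c("B", "D", "C", "D", "A", "A")
wins <- c(1, 0, 1, 0.5, 0, 1)

p_1000 <- run_elo(home, away, wins, init = 1000, k = 10)
p_3000 <- run_elo(home, away, wins, init = 3000, k = 10)
p_k50  <- run_elo(home, away, wins, init = 1000, k = 50)

isTRUE(all.equal(p_1000, p_3000))  # TRUE: the initial rating has no effect
isTRUE(all.equal(p_1000, p_k50))   # FALSE: k does change the probabilities
```

In practice this means only k needs tuning when every team starts on the same rating; the initial value matters only if teams are given different starting ratings.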

6. Updated Elo Model

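Based on the grid search above, the optimal k value is 10, with an accuracy of 67.57% on the training data. The initial rating makes no difference to accuracy, because all teams start on the same value and Elo probabilities depend only on rating differences, so we simply take the first row (an initial rating of 1000). We will use these parameters to build the final model.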

elo_final <- elo::elo.run(formula = Win ~ Home.Team + Away.Team,
                          data = afl,
                          initial.elos = 1000,
                          k = 10,
                          history = TRUE)
elo_final_df <- elo_final %>% 
  as.data.frame()


head(elo_final_df)
##           team.A        team.B p.A wins.A update.A update.B elo.A elo.B
## 1      Melbourne     Footscray 0.5      1        5       -5  1005   995
## 2        Carlton      Richmond 0.5      1        5       -5  1005   995
## 3       St Kilda   Collingwood 0.5      0       -5        5   995  1005
## 4        Geelong      Essendon 0.5      1        5       -5  1005   995
## 5            GWS        Sydney 0.5      0       -5        5   995  1005
## 6 Brisbane Lions Port Adelaide 0.5      1        5       -5  1005   995

7. Using Elo Model to Make Predictions

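Using the final model, we can make predictions on the testing data and on other future match-ups. Note that elo_final was fitted on the full season, which includes the test games, so the accuracy reported below is an in-sample figure rather than a true out-of-sample estimate.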

test$predictions <- predict(elo_final, test, type = "prob")


# Predicted win or loss based on probability
test$pred.Win <- ifelse(test$predictions > 0.5, 1, ifelse(test$predictions < 0.5, 0, 0.5))

# Label predicted and actual outcomes as Win / Loss / Draw
test <- test %>%
  mutate(pred.win = ifelse(predictions > 0.5, "Win", ifelse(predictions < 0.5, "Loss", "Draw"))) %>%
  mutate(actual.win = ifelse(Win > 0.5, "Win", ifelse(Win < 0.5, "Loss", "Draw")))


head(test)
## # A tibble: 6 x 8
##   Round.Number Home.Team  Away.Team         Win predic~1 pred.~2 pred.~3 actua~4
##          <int> <chr>      <chr>           <dbl>    <dbl>   <dbl> <chr>   <chr>  
## 1            1 Melbourne  Footscray           1    0.547       1 Win     Win    
## 2            1 St Kilda   Collingwood         0    0.433       0 Loss    Loss   
## 3            1 Hawthorn   North Melbourne     1    0.573       1 Win     Win    
## 4            1 Adelaide   Fremantle           0    0.408       0 Loss    Loss   
## 5            2 Gold Coast Melbourne           0    0.425       0 Loss    Loss   
## 6            3 Footscray  Sydney              1    0.451       0 Loss    Win    
## # ... with abbreviated variable names 1: predictions, 2: pred.Win, 3: pred.win,
## #   4: actual.win

Finally, we can use a confusion matrix to compare the predictions with the actual results and determine the accuracy of the Elo predictions. The accuracy of the predictions on the testing data is 69.39%.

# data = predictions, reference = actual results (draws fall outside the levels and become NA)
cm <- caret::confusionMatrix(data = factor(test$pred.win, levels = c("Win", "Loss")),
                             reference = factor(test$actual.win, levels = c("Win", "Loss")))
# Accuracy ----
cm$overall["Accuracy"]
##  Accuracy 
## 0.6938776
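The accuracy figure is simply the share of games where the predicted and actual labels agree. As a rough check that does not need caret, the sketch below computes it in base R for the six test rows printed by head(test) earlier; the vectors are typed in by hand from that output, so the result applies to those six rows only, not to the whole test set.

```r
# pred.win and actual.win for the six rows shown by head(test)
pred   <- c("Win", "Loss", "Win", "Loss", "Loss", "Loss")
actual <- c("Win", "Loss", "Win", "Loss", "Loss", "Win")

# Cross-tabulate predictions against results; the diagonal holds correct calls
lv  <- c("Win", "Loss")
tab <- table(Predicted = factor(pred, levels = lv),
             Actual    = factor(actual, levels = lv))

accuracy <- sum(diag(tab)) / sum(tab)
accuracy  # 5 of the 6 games are predicted correctly
```

The one miss is the round 3 Footscray v Sydney game, where the model gave the home side a 0.451 chance and it won.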