1. Introduction

This document outlines how to calculate Elo ratings in sports. Originally developed to rate chess players based on wins, draws and losses, the Elo rating system has since been applied to other sports because it is simple and effective at rating head-to-head matches.

In this example I will apply the Elo calculation to the 2023 AFL season.

2. Load packages

Load the following packages: elo, fitzRoy, tidymodels, caret and gtools.

## load packages
library(elo)
library(fitzRoy)
library(tidymodels)
library(caret)
library(gtools)

3. Import and explore data

Use the fitzRoy package to import the data. fitzRoy scrapes publicly available AFL data from sources such as AFL Tables. We will import the 2023 AFL season into a dataframe called afl_raw.

The data comes packaged up quite nicely and ready to use; however, it's good practice to explore it first.

You can look at the structure of the data and get summary statistics as some basic exploratory data analysis (EDA).

## import data using FitzRoy package
afl_raw <- fetch_results_afltables(2023)

## look at dataframe structure
str(afl_raw)
## tibble [216 × 16] (S3: tbl_df/tbl/data.frame)
##  $ Game        : num [1:216] 16191 16192 16193 16194 16195 ...
##  $ Date        : Date[1:216], format: "2023-03-16" "2023-03-17" ...
##  $ Round       : chr [1:216] "R1" "R1" "R1" "R1" ...
##  $ Home.Team   : chr [1:216] "Richmond" "Geelong" "North Melbourne" "Port Adelaide" ...
##  $ Home.Goals  : int [1:216] 8 16 12 18 17 9 15 9 10 13 ...
##  $ Home.Behinds: int [1:216] 10 7 15 18 13 7 16 11 7 12 ...
##  $ Home.Points : int [1:216] 58 103 87 126 115 61 106 65 67 90 ...
##  $ Away.Team   : chr [1:216] "Carlton" "Collingwood" "West Coast" "Brisbane Lions" ...
##  $ Away.Goals  : int [1:216] 8 19 12 11 9 16 12 19 7 12 ...
##  $ Away.Behinds: int [1:216] 10 11 10 6 11 14 18 10 10 10 ...
##  $ Away.Points : int [1:216] 58 125 82 72 65 110 90 124 52 82 ...
##  $ Venue       : chr [1:216] "M.C.G." "M.C.G." "Docklands" "Adelaide Oval" ...
##  $ Margin      : int [1:216] 0 -22 5 54 50 -49 16 -59 15 8 ...
##  $ Season      : num [1:216] 2023 2023 2023 2023 2023 ...
##  $ Round.Type  : chr [1:216] "Regular" "Regular" "Regular" "Regular" ...
##  $ Round.Number: int [1:216] 1 1 1 1 1 1 1 1 1 2 ...

## view dataframe summary
summary(afl_raw)
##       Game            Date               Round            Home.Team        
##  Min.   :16191   Min.   :2023-03-16   Length:216         Length:216        
##  1st Qu.:16245   1st Qu.:2023-04-27   Class :character   Class :character  
##  Median :16298   Median :2023-06-09   Mode  :character   Mode  :character  
##  Mean   :16298   Mean   :2023-06-10                                        
##  3rd Qu.:16352   3rd Qu.:2023-07-24                                        
##  Max.   :16406   Max.   :2023-09-30                                        
##    Home.Goals     Home.Behinds    Home.Points      Away.Team        
##  Min.   : 3.00   Min.   : 4.00   Min.   : 31.00   Length:216        
##  1st Qu.:10.00   1st Qu.: 9.00   1st Qu.: 70.00   Class :character  
##  Median :12.00   Median :11.00   Median : 85.00   Mode  :character  
##  Mean   :12.75   Mean   :11.29   Mean   : 87.76                     
##  3rd Qu.:16.00   3rd Qu.:13.00   3rd Qu.:105.00                     
##  Max.   :31.00   Max.   :22.00   Max.   :205.00                     
##    Away.Goals     Away.Behinds    Away.Points        Venue          
##  Min.   : 4.00   Min.   : 2.00   Min.   : 26.00   Length:216        
##  1st Qu.: 9.00   1st Qu.: 8.00   1st Qu.: 63.75   Class :character  
##  Median :11.00   Median :10.00   Median : 77.00   Mode  :character  
##  Mean   :11.47   Mean   :10.17   Mean   : 78.97                     
##  3rd Qu.:13.25   3rd Qu.:12.00   3rd Qu.: 93.00                     
##  Max.   :23.00   Max.   :21.00   Max.   :152.00                     
##      Margin             Season      Round.Type         Round.Number  
##  Min.   :-108.000   Min.   :2023   Length:216         Min.   : 1.00  
##  1st Qu.: -16.250   1st Qu.:2023   Class :character   1st Qu.: 6.75  
##  Median :   5.000   Median :2023   Mode  :character   Median :13.00  
##  Mean   :   8.792   Mean   :2023                      Mean   :13.01  
##  3rd Qu.:  32.000   3rd Qu.:2023                      3rd Qu.:19.25  
##  Max.   : 171.000   Max.   :2023                      Max.   :28.00

4. Manipulate and prepare data

You can manipulate and prepare the data based on what you want to use it for. For this example we will keep the 2023 regular season (rounds 1 to 24) and select the variables needed to rate the teams: the round, the two teams and the result, coded as 1 for a home win, 0.5 for a draw and 0 for a home loss.

We will also store the 2023 AFL finals series (rounds 25 and above) separately, to make predictions on later.

## get required variables and store in new dataframe
afl_23 <- afl_raw %>%
  filter(Round.Number < 25) %>% # filter rounds
  mutate(Result = ifelse(Home.Points > Away.Points, 1, 
                  ifelse(Home.Points == Away.Points, 0.5, 0))) %>% # get the result
  select(Round, Home.Team, Away.Team, Result) # select the variables

## do the same for the finals series and store in a new dataframe
afl_23_finals <- afl_raw %>%
  filter(Round.Number > 24) %>% # filter rounds
  mutate(Result = ifelse(Home.Points > Away.Points, 1, 
                  ifelse(Home.Points == Away.Points, 0.5, 0))) %>% # get the result
  select(Round, Home.Team, Away.Team, Result) # select the variables
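
As a quick sanity check of the result coding, the sketch below applies the same rule to the first three 2023 matches shown in the str() output above, using base R only. The data frame name `first_three` is introduced here purely for illustration, and the second coding via `sign(Margin)` is an alternative that assumes Margin is home points minus away points, which is what the sample rows suggest.

```r
# First three 2023 matches, copied from the str() output above
first_three <- data.frame(
  Home.Team   = c("Richmond", "Geelong", "North Melbourne"),
  Away.Team   = c("Carlton", "Collingwood", "West Coast"),
  Home.Points = c(58, 103, 87),
  Away.Points = c(58, 125, 82),
  Margin      = c(0, -22, 5)
)

# Same rule as above: 1 = home win, 0.5 = draw, 0 = home loss
first_three$Result <- ifelse(first_three$Home.Points > first_three$Away.Points, 1,
                      ifelse(first_three$Home.Points == first_three$Away.Points, 0.5, 0))

# Alternative coding from the margin (assumption: Margin = home - away);
# sign() returns -1, 0 or 1, which maps onto 0, 0.5 and 1
first_three$Result.From.Margin <- (sign(first_three$Margin) + 1) / 2

first_three[, c("Home.Team", "Away.Team", "Result", "Result.From.Margin")]
```

Both columns should agree: a draw for Richmond v Carlton, a home loss for Geelong and a home win for North Melbourne.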

5. Create functions

Create functions that you can call so you won't have to rewrite the code.

## elo function to set k factor and initial elos
elo_score <- function(initial_elos, k, data){
  
  # obtain elo ratings
  # (elo.run() expects the starting ratings in an argument named initial.elos)
  elo <- elo.run(formula = Result ~ Home.Team + Away.Team,
                 initial.elos = initial_elos,
                 k = k,
                 data = data, history = TRUE) 
  return(elo)
}

## elo and confusion matrix function to give us the accuracy score
accuracy <- function(initial_elos, k, data){
  
  # run the elo model and keep the match-by-match output
  # (elo.run() expects the starting ratings in an argument named initial.elos)
  elo <- elo.run(formula = Result ~ Home.Team + Away.Team,
                 initial.elos = initial_elos,
                 k = k,
                 data = data) %>%
    as.data.frame()
  
  # predict a home win when the pre-match home win probability is above 0.5
  data <- data %>% 
    mutate(p.A = elo$p.A,
           pred = ifelse(p.A > .5, 1, 0))
  
  # compare predictions with actual results (a draw, 0.5, is never predicted)
  cm_elo <- caret::confusionMatrix(data = factor(data$pred, levels = c(0, 0.5, 1)),
                                   reference = factor(data$Result, levels = c(0, 0.5, 1)))
  
  return(cm_elo$overall["Accuracy"])
}
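
The accuracy score used here is the share of matches where the predicted class equals the actual result. A minimal base R sketch on made-up numbers (the vectors `p_home` and `result` are hypothetical, not AFL data) shows the calculation, and one consequence of the 0.5 threshold: a draw is never predicted, so every drawn match counts as a miss.

```r
# Hypothetical pre-match home win probabilities and actual results
p_home <- c(0.60, 0.40, 0.55, 0.50)
result <- c(1, 0, 0.5, 1)   # 1 = home win, 0.5 = draw, 0 = home loss

# Same threshold as in accuracy(): predict a home win only if p > 0.5
pred <- ifelse(p_home > 0.5, 1, 0)

# Share of correct predictions
acc <- mean(pred == result)
acc
```

In this toy case two of the four predictions are right. The drawn match and the exact 0.5 probability (which falls on the "home loss" side of the threshold) are both counted as wrong.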

6. Build Elo model

Build an initial Elo model, starting with an initial Elo of 1500 and a k factor of 25. This gives us a baseline, which we will tune further in the next section.

## call elo function
afl_elos <- elo_score(1500, 25, afl_23) %>%
  as.data.frame()
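
To make the role of the two parameters concrete, the standard Elo formulas can be sketched in base R. The helper names `expected_score()` and `update_rating()` are hypothetical and are not part of the elo package; the formulas are the usual logistic Elo expectation on a 400-point scale, which to my understanding is also what `elo.prob()` computes.

```r
# Expected score of team A against team B (standard Elo, 400-point scale)
expected_score <- function(elo_a, elo_b) {
  1 / (1 + 10^((elo_b - elo_a) / 400))
}

# New rating for team A after one match
# result: 1 = win, 0.5 = draw, 0 = loss
update_rating <- function(elo_a, elo_b, result, k) {
  elo_a + k * (result - expected_score(elo_a, elo_b))
}

# Two teams starting on 1500 are rated as a coin flip
expected_score(1500, 1500)

# With k = 25, the winner gains 12.5 points and the loser drops 12.5
update_rating(1500, 1500, result = 1, k = 25)
update_rating(1500, 1500, result = 0, k = 25)
```

A larger k factor makes ratings react faster to each result, while a smaller one keeps them more stable.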

7. Hyperparameter tuning

We will now tune the initial Elo and the k factor, looking for the combination that gives the most accurate predictions.

To do this we split the data into train and test sets, then build a grid of initial Elo scores from 1000 to 3000 in steps of 50 and k factors from 10 to 50 in steps of 5. We then compute the accuracy of every combination on the train set to find the best initial Elo score and k factor.

## split data (set a seed so the random split is reproducible; the value is arbitrary)
set.seed(2023)
afl_split <- initial_split(afl_23)
afl_train <- training(afl_split) # train
afl_test <- testing(afl_split) # test

## Create a grid 
params_elo <- expand.grid(init = seq(1000, 3000, by = 50),
                      kfac = seq(10, 50, by = 5))


## Apply function to train data
params_elo$accuracy <- mapply(accuracy, params_elo$init, params_elo$kfac, MoreArgs = list(data = afl_train))

## view best tuned elo combinations
subset(params_elo, accuracy == max(params_elo$accuracy))
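
One property worth keeping in mind when reading the grid: in the standard Elo formulas the win probability depends only on the difference between two ratings, so giving every team the same starting rating shifts all ratings by a constant without changing any prediction. The sketch below illustrates this with a small hand-written Elo loop in base R. `run_toy_elo()` and the toy matches are hypothetical, and this is not a description of how the elo package is implemented internally.

```r
# Standard Elo expectation for the home team
expected_score <- function(elo_home, elo_away) {
  1 / (1 + 10^((elo_away - elo_home) / 400))
}

# Minimal sequential Elo run; returns the pre-match home win probabilities
run_toy_elo <- function(matches, init, k) {
  teams <- unique(c(matches$home, matches$away))
  elos <- setNames(rep(init, length(teams)), teams)
  p_home <- numeric(nrow(matches))
  for (i in seq_len(nrow(matches))) {
    h <- matches$home[i]
    a <- matches$away[i]
    p_home[i] <- expected_score(elos[[h]], elos[[a]])
    change <- k * (matches$result[i] - p_home[i])
    elos[[h]] <- elos[[h]] + change   # home team moves by +change
    elos[[a]] <- elos[[a]] - change   # away team moves by -change
  }
  p_home
}

# Hypothetical results between three made-up teams
toy <- data.frame(
  home   = c("A", "B", "C", "A", "B"),
  away   = c("B", "C", "A", "C", "A"),
  result = c(1, 0.5, 0, 1, 0),
  stringsAsFactors = FALSE
)

p_1000 <- run_toy_elo(toy, init = 1000, k = 40)
p_3000 <- run_toy_elo(toy, init = 3000, k = 40)

# Probabilities agree up to floating-point error
all.equal(p_1000, p_3000)
```

If the same holds for `elo.run()`, rows of params_elo that share a k factor will tie on accuracy, and the search effectively comes down to the k factor.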

8. Update Elo model

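Select any of the best-performing parameter combinations, use it to update the Elo model, and extract each team's final Elo score. Because every team starts on the same rating, the starting value shifts all ratings by the same amount without changing the win probabilities, so here we keep the starting rating at 1500, which is the scale of the ratings shown in section 9, with a k factor of 40.

## use new parameters to get final elos for each team
final_elos <- final.elos(elo_score(1500, 40, afl_23)) %>%
  as.data.frame() %>% 
  rownames_to_column()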

9. Make predictions

We will now make predictions on the 2023 AFL finals series.

First we will create a dataframe that stores all possible match ups of the 2023 finals series.

## create a new df for all possible finals match ups
finals_matchups <- 
  permutations(n = 8, r = 2,
               v = c("Carlton",
                     "Melbourne",
                     "Sydney",
                     "Collingwood",
                     "St Kilda",
                     "Brisbane Lions",
                     "GWS",
                     "Port Adelaide"), 
               repeats.allowed = FALSE) %>% 
  as.data.frame()
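
Order matters here: each pair of teams appears twice, once with each team as the home side, which is what lets a later join on home and away team find every fixture. A base R sketch of the same idea follows; the names `finals_teams`, `all_pairs` and `ordered_pairs` are introduced only for illustration, and the row order may differ from what `permutations()` returns.

```r
finals_teams <- c("Carlton", "Melbourne", "Sydney", "Collingwood",
                  "St Kilda", "Brisbane Lions", "GWS", "Port Adelaide")

# All combinations of two teams, then drop a team playing itself
all_pairs <- expand.grid(V1 = finals_teams, V2 = finals_teams,
                         stringsAsFactors = FALSE)
ordered_pairs <- all_pairs[all_pairs$V1 != all_pairs$V2, ]

nrow(ordered_pairs)   # 8 * 7 = 56 ordered match ups
```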

We will now add each team's final Elo score to the finals match ups.

## get elo scores for top 8 teams in 2023 finals
top8_elos <- 
  final_elos %>% 
  filter(rowname %in% c("Carlton",
                        "Melbourne",
                        "Sydney",
                        "Collingwood",
                        "St Kilda",
                        "Brisbane Lions",
                        "GWS",
                        "Port Adelaide"))
  
## apply elo scores to match ups
top8_elos <- 
  finals_matchups %>% 
  group_by(V1, V2) %>% # one row per group, so each lookup returns a single value
  mutate(
    Home.Team.Elo = top8_elos$.[top8_elos$rowname == V1],
    Away.Team.Elo = top8_elos$.[top8_elos$rowname == V2]) %>%
  rename(Home.Team = V1,
         Away.Team = V2) %>%
  # convert the two ratings into win probabilities
  mutate(Home.Team.Prob = elo.prob(Home.Team.Elo, Away.Team.Elo), 
         Away.Team.Prob = 1 - Home.Team.Prob)

Finally, we will join the Elo scores and probabilities to the actual finals fixtures to guide our predictions for the finals matches.

## join the elos and probabilities to the finals fixtures
afl_23_finals <- afl_23_finals %>%
  left_join(top8_elos, by = c("Home.Team", "Away.Team")) %>%
  filter(Round %in% c("QF", "EF", "SF", "PF", "GF")) %>%
  select(-"Result")
afl_23_finals
## # A tibble: 9 × 7
##   Round Home.Team      Away.Team      Home.Team.Elo Away.Team.Elo Home.Team.Prob
##   <chr> <chr>          <chr>                  <dbl>         <dbl>          <dbl>
## 1 QF    Collingwood    Melbourne              1633.         1616.          0.525
## 2 EF    Carlton        Sydney                 1587.         1548.          0.556
## 3 EF    St Kilda       GWS                    1510.         1573.          0.410
## 4 QF    Brisbane Lions Port Adelaide          1640.         1620.          0.529
## 5 SF    Melbourne      Carlton                1616.         1587.          0.541
## 6 SF    Port Adelaide  GWS                    1620.         1573.          0.567
## 7 PF    Collingwood    GWS                    1633.         1573.          0.586
## 8 PF    Brisbane Lions Carlton                1640.         1587.          0.575
## 9 GF    Collingwood    Brisbane Lions         1633.         1640.          0.490
## # ℹ 1 more variable: Away.Team.Prob <dbl>
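
To turn these probabilities into tips, one simple rule is to pick the home team whenever its win probability is above 0.5 and the away team otherwise. The base R sketch below applies that rule to the probabilities copied from the table above; the data frame `finals` and the column `Predicted.Winner` are names introduced here for illustration.

```r
# Home win probabilities copied from the table above
finals <- data.frame(
  Round = c("QF", "EF", "EF", "QF", "SF", "SF", "PF", "PF", "GF"),
  Home.Team = c("Collingwood", "Carlton", "St Kilda", "Brisbane Lions",
                "Melbourne", "Port Adelaide", "Collingwood",
                "Brisbane Lions", "Collingwood"),
  Away.Team = c("Melbourne", "Sydney", "GWS", "Port Adelaide", "Carlton",
                "GWS", "GWS", "Carlton", "Brisbane Lions"),
  Home.Team.Prob = c(0.525, 0.556, 0.410, 0.529, 0.541, 0.567, 0.586,
                     0.575, 0.490),
  stringsAsFactors = FALSE
)

# Tip the home team when its probability is above 0.5, otherwise the away team
finals$Predicted.Winner <- ifelse(finals$Home.Team.Prob > 0.5,
                                  finals$Home.Team, finals$Away.Team)

finals[, c("Round", "Home.Team", "Away.Team", "Predicted.Winner")]
```

Under this rule the home side is tipped in seven of the nine matches; the exceptions are St Kilda v GWS and the grand final, where the away team has the higher rating.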