This document outlines how to calculate Elo ratings in sports. Originally developed to rate chess players from wins, draws and losses, the Elo system has since been applied to other sports because it is simple and effective at rating head-to-head matches.
In this example I apply the Elo calculation to the 2023 AFL season.
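Before loading any packages, it can help to see the arithmetic behind a single rating update. The sketch below uses the standard Elo logistic curve on a 400-point scale; the helper names `elo_expected` and `elo_update` are made up for illustration and are not part of the elo package.

```r
# Expected score for team A against team B (standard Elo logistic curve)
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}

# New ratings after one match; `result` is 1 (A wins), 0.5 (draw) or 0 (A loses)
elo_update <- function(rating_a, rating_b, result, k) {
  change <- k * (result - elo_expected(rating_a, rating_b))
  c(a = rating_a + change, b = rating_b - change)
}

elo_expected(1500, 1500)                    # 0.5: evenly matched teams
elo_expected(1700, 1300)                    # about 0.909: a 400-point favourite
elo_update(1500, 1500, result = 1, k = 25)  # a = 1512.5, b = 1487.5
```

The k factor controls how far a single result moves the ratings: a larger k reacts faster to recent form but is noisier.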
Load the following packages: elo, fitzRoy, tidymodels, caret and gtools.
## load packages
library(elo)
library(fitzRoy)
library(tidymodels)
library(caret)
library(gtools)
Import the data using the fitzRoy package, which scrapes publicly available AFL data from sources such as AFL Tables. We will import the 2023 AFL season into a dataframe called afl_raw.
The data comes packaged up nicely and ready to use; however, it's good practice to explore it first.
You can look at the structure of the data and get summary statistics as some basic exploratory data analysis (EDA).
## import data using FitzRoy package
afl_raw <- fetch_results_afltables(2023)
## look at dataframe structure
str(afl_raw)
## tibble [216 × 16] (S3: tbl_df/tbl/data.frame)
## $ Game : num [1:216] 16191 16192 16193 16194 16195 ...
## $ Date : Date[1:216], format: "2023-03-16" "2023-03-17" ...
## $ Round : chr [1:216] "R1" "R1" "R1" "R1" ...
## $ Home.Team : chr [1:216] "Richmond" "Geelong" "North Melbourne" "Port Adelaide" ...
## $ Home.Goals : int [1:216] 8 16 12 18 17 9 15 9 10 13 ...
## $ Home.Behinds: int [1:216] 10 7 15 18 13 7 16 11 7 12 ...
## $ Home.Points : int [1:216] 58 103 87 126 115 61 106 65 67 90 ...
## $ Away.Team : chr [1:216] "Carlton" "Collingwood" "West Coast" "Brisbane Lions" ...
## $ Away.Goals : int [1:216] 8 19 12 11 9 16 12 19 7 12 ...
## $ Away.Behinds: int [1:216] 10 11 10 6 11 14 18 10 10 10 ...
## $ Away.Points : int [1:216] 58 125 82 72 65 110 90 124 52 82 ...
## $ Venue : chr [1:216] "M.C.G." "M.C.G." "Docklands" "Adelaide Oval" ...
## $ Margin : int [1:216] 0 -22 5 54 50 -49 16 -59 15 8 ...
## $ Season : num [1:216] 2023 2023 2023 2023 2023 ...
## $ Round.Type : chr [1:216] "Regular" "Regular" "Regular" "Regular" ...
## $ Round.Number: int [1:216] 1 1 1 1 1 1 1 1 1 2 ...
## view dataframe summary
summary(afl_raw)
## Game Date Round Home.Team
## Min. :16191 Min. :2023-03-16 Length:216 Length:216
## 1st Qu.:16245 1st Qu.:2023-04-27 Class :character Class :character
## Median :16298 Median :2023-06-09 Mode :character Mode :character
## Mean :16298 Mean :2023-06-10
## 3rd Qu.:16352 3rd Qu.:2023-07-24
## Max. :16406 Max. :2023-09-30
## Home.Goals Home.Behinds Home.Points Away.Team
## Min. : 3.00 Min. : 4.00 Min. : 31.00 Length:216
## 1st Qu.:10.00 1st Qu.: 9.00 1st Qu.: 70.00 Class :character
## Median :12.00 Median :11.00 Median : 85.00 Mode :character
## Mean :12.75 Mean :11.29 Mean : 87.76
## 3rd Qu.:16.00 3rd Qu.:13.00 3rd Qu.:105.00
## Max. :31.00 Max. :22.00 Max. :205.00
## Away.Goals Away.Behinds Away.Points Venue
## Min. : 4.00 Min. : 2.00 Min. : 26.00 Length:216
## 1st Qu.: 9.00 1st Qu.: 8.00 1st Qu.: 63.75 Class :character
## Median :11.00 Median :10.00 Median : 77.00 Mode :character
## Mean :11.47 Mean :10.17 Mean : 78.97
## 3rd Qu.:13.25 3rd Qu.:12.00 3rd Qu.: 93.00
## Max. :23.00 Max. :21.00 Max. :152.00
## Margin Season Round.Type Round.Number
## Min. :-108.000 Min. :2023 Length:216 Min. : 1.00
## 1st Qu.: -16.250 1st Qu.:2023 Class :character 1st Qu.: 6.75
## Median : 5.000 Median :2023 Mode :character Median :13.00
## Mean : 8.792 Mean :2023 Mean :13.01
## 3rd Qu.: 32.000 3rd Qu.:2023 3rd Qu.:19.25
## Max. : 171.000 Max. :2023 Max. :28.00
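As one more small EDA step, we can tabulate match outcomes from the Margin column, which in the str() output above is the home score minus the away score. This sketch uses only the first ten margins printed by str() so that it runs on its own; on the full data you would use afl_raw$Margin instead.

```r
# First ten margins as printed by str(afl_raw) above (home minus away)
margin <- c(0, -22, 5, 54, 50, -49, 16, -59, 15, 8)

# Label each match from the home team's point of view
outcome <- ifelse(margin > 0, "Home win",
                  ifelse(margin < 0, "Away win", "Draw"))

table(outcome)    # 3 away wins, 1 draw, 6 home wins
mean(margin > 0)  # 0.6: share of these ten games won by the home team
```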
You can manipulate and prepare the data based on what you want to use it for. For this example we will keep only the 2023 regular season (rounds 1 to 24), encode each result as 1 for a home win, 0.5 for a draw and 0 for an away win, and select the variables that will help us predict future match ups.
We will also filter out the 2023 AFL finals series (rounds after 24) so we can make predictions on it later.
## get required variables for the regular season and store in new dataframe
afl_23 <- afl_raw %>%
  filter(Round.Number < 25) %>% # keep regular season rounds
  mutate(Result = ifelse(Home.Points > Away.Points, 1,
                         ifelse(Home.Points == Away.Points, 0.5, 0))) %>% # get the result
  select(Round, Home.Team, Away.Team, Result) # select the variables
## get the same variables for the finals series and store in new dataframe
afl_23_finals <- afl_raw %>%
  filter(Round.Number > 24) %>% # keep finals rounds
  mutate(Result = ifelse(Home.Points > Away.Points, 1,
                         ifelse(Home.Points == Away.Points, 0.5, 0))) %>% # get the result
  select(Round, Home.Team, Away.Team, Result) # select the variables
Create functions that you can call so you won't have to rewrite the code.
## elo function to set k factor and initial elos
elo_score <- function(initial_elos, k, data){
  # obtain elo ratings; elo.run() names this argument initial.elos (with a dot)
  elo <- elo.run(formula = Result ~ Home.Team + Away.Team,
                 initial.elos = initial_elos,
                 k = k,
                 data = data)
  return(elo)
}
## elo and confusion matrix function to give us the accuracy score
accuracy <- function(initial_elos, k, data){
  # obtain elo ratings, one row per match with the pre-match probability p.A
  elo <- elo.run(formula = Result ~ Home.Team + Away.Team,
                 initial.elos = initial_elos,
                 k = k,
                 data = data) %>%
    as.data.frame()
  # predict a home win when the home team is favoured, otherwise an away win
  data <- data %>%
    mutate(p.A = elo$p.A) %>%
    mutate(pred = ifelse(p.A > .5, 1, 0))
  # compare predictions with actual results (draws are never predicted)
  cm_elo <- caret::confusionMatrix(data = factor(data$pred, levels = c(0, 0.5, 1)),
                                   reference = factor(data$Result, levels = c(0, 0.5, 1)))
  return(cm_elo$overall["Accuracy"])
}
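To make the accuracy calculation concrete, here is a small base R sketch of the same idea on four made-up matches. The probabilities and results are hypothetical values chosen for illustration, not output from the model.

```r
# Hypothetical pre-match home win probabilities and actual results
p_home <- c(0.62, 0.41, 0.55, 0.50)
result <- c(1, 0, 0, 0.5)  # 1 = home win, 0.5 = draw, 0 = away win

# Same rule as in accuracy(): predict a home win only when p > 0.5
pred <- ifelse(p_home > 0.5, 1, 0)

pred                  # 1 0 1 0
mean(pred == result)  # 0.5: two of the four matches are called correctly
```

Because the rule only ever predicts 1 or 0, a drawn match always counts as a miss, and a probability of exactly 0.5 is treated as an away win.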
Build an initial Elo model, starting with an initial Elo of 1500 and a k factor of 25. This gives us a baseline model, which we will tune further.
## call elo function
afl_elos <- elo_score(1500, 25, afl_23) %>%
as.data.frame()
We will now perform hyperparameter tuning on the initial Elo and the k factor, which lets us look for a combination that performs better than the baseline.
To do this we will split the data into train and test sets, then create a dataframe of candidate parameters: initial Elo scores from 1000 to 3000 in steps of 50, and k factors from 10 to 50 in steps of 5. We then compute the accuracy of every combination on the train dataset to find the best initial Elo score and k factor.
## split data
afl_split <- initial_split(afl_23)
afl_train <- training(afl_split) #train
afl_test <- testing(afl_split) #test
## Create a grid
params_elo <- expand.grid(init = seq(1000, 3000, by = 50),
                          kfac = seq(10, 50, by = 5))
## Apply function to train data
params_elo$accuracy <- mapply(accuracy, params_elo$init, params_elo$kfac,
                              MoreArgs = list(data = afl_train))
## view best tuned elo combinations
subset(params_elo, accuracy == max(params_elo$accuracy))
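When reading the tuning results, it is worth knowing how the initial rating behaves when every team starts from the same value. The sketch below is a minimal base R Elo loop (the helper `run_elo` and the three-team fixture are made up for illustration, and this is not how the elo package is implemented) showing that changing the starting value shifts every rating by the same constant and leaves the probabilities untouched.

```r
# Minimal Elo runner: returns pre-match home probabilities and final ratings
run_elo <- function(home, away, result, init, k) {
  teams <- unique(c(home, away))
  ratings <- setNames(rep(init, length(teams)), teams)
  p_home <- numeric(length(result))
  for (i in seq_along(result)) {
    h <- home[i]
    a <- away[i]
    # expected score for the home team before the match
    p_home[i] <- 1 / (1 + 10^((ratings[[a]] - ratings[[h]]) / 400))
    # zero-sum update: what the home team gains, the away team loses
    change <- k * (result[i] - p_home[i])
    ratings[[h]] <- ratings[[h]] + change
    ratings[[a]] <- ratings[[a]] - change
  }
  list(p_home = p_home, ratings = ratings)
}

# Hypothetical fixture between three made-up teams
home   <- c("A", "B", "C", "A", "B", "C")
away   <- c("B", "C", "A", "C", "A", "B")
result <- c(1, 0.5, 0, 1, 1, 0)

low  <- run_elo(home, away, result, init = 1000, k = 40)
high <- run_elo(home, away, result, init = 1500, k = 40)

all.equal(low$p_home, high$p_home)  # TRUE: identical probabilities
high$ratings - low$ratings          # 500 for every team
```

In this setting the accuracy depends only on the k factor, so rows of the grid that share a k factor but differ in init should be expected to tie.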
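Select any of the best-performing combinations to refit the Elo model and obtain the final Elo score for each team. Because every team starts from the same value, the initial Elo only shifts all ratings by a constant and leaves the win probabilities unchanged, so we keep the conventional 1500 and use a k factor of 40.
## use new parameters to get final elos for each team
final_elos <- final.elos(elo_score(1500, 40, afl_23)) %>%
  as.data.frame() %>%
  rownames_to_column()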
We will now make predictions on the 2023 AFL finals series.
First we will create a dataframe that stores all possible match ups of the 2023 finals series.
## create a new df for all possible finals match ups
finals_matchups <-
  permutations(n = 8, r = 2, c("Carlton",
                               "Melbourne",
                               "Sydney",
                               "Collingwood",
                               "St Kilda",
                               "Brisbane Lions",
                               "GWS",
                               "Port Adelaide"),
               repeats.allowed = FALSE) %>%
  as.data.frame()
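As a sanity check on the size of this dataframe, eight teams give 8 × 7 = 56 ordered home and away pairs. The sketch below builds the same set of pairs in base R, as an alternative to gtools for illustration only; the row order will differ from the permutations() output.

```r
teams <- c("Carlton", "Melbourne", "Sydney", "Collingwood",
           "St Kilda", "Brisbane Lions", "GWS", "Port Adelaide")

# Every home/away combination, then drop a team playing itself
pairs <- expand.grid(Home.Team = teams, Away.Team = teams,
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$Home.Team != pairs$Away.Team, ]

nrow(pairs)  # 56
```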
We will now add the teams' final Elo scores to the finals match ups.
## get elo scores for top 8 teams in 2023 finals
top8_elos <-
  final_elos %>%
  filter(rowname %in% c("Carlton",
                        "Melbourne",
                        "Sydney",
                        "Collingwood",
                        "St Kilda",
                        "Brisbane Lions",
                        "GWS",
                        "Port Adelaide"))
## apply elo scores to match ups
top8_elos <-
  finals_matchups %>%
  group_by(V1, V2) %>%
  mutate(
    Home.Team.Elo = top8_elos$.[top8_elos$rowname == V1],
    Away.Team.Elo = top8_elos$.[top8_elos$rowname == V2]) %>%
  rename(Home.Team = V1,
         Away.Team = V2) %>%
  mutate(Home.Team.Prob = elo.prob(Home.Team.Elo, Away.Team.Elo),
         Away.Team.Prob = 1 - Home.Team.Prob)
Finally, we will join the Elo scores and probabilities to the actual finals fixtures to guide our predictions for each finals match.
afl_23_finals <- afl_23_finals %>%
  left_join(top8_elos, by = c("Home.Team", "Away.Team")) %>%
  filter(Round %in% c("QF",
                      "EF",
                      "SF",
                      "PF",
                      "GF")) %>%
  select(-"Result")
afl_23_finals
## # A tibble: 9 × 7
## Round Home.Team Away.Team Home.Team.Elo Away.Team.Elo Home.Team.Prob
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 QF Collingwood Melbourne 1633. 1616. 0.525
## 2 EF Carlton Sydney 1587. 1548. 0.556
## 3 EF St Kilda GWS 1510. 1573. 0.410
## 4 QF Brisbane Lions Port Adelaide 1640. 1620. 0.529
## 5 SF Melbourne Carlton 1616. 1587. 0.541
## 6 SF Port Adelaide GWS 1620. 1573. 0.567
## 7 PF Collingwood GWS 1633. 1573. 0.586
## 8 PF Brisbane Lions Carlton 1640. 1587. 0.575
## 9 GF Collingwood Brisbane Lions 1633. 1640. 0.490
## # ℹ 1 more variable: Away.Team.Prob <dbl>
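To turn these probabilities into a tip for each match, one simple rule is to pick whichever team has a win probability above 0.5. The sketch below retypes the fixtures and the rounded Home.Team.Prob values from the table above so that it runs on its own; the Pick column is an addition for illustration.

```r
# Fixtures and home win probabilities copied from the table above
finals <- data.frame(
  Round = c("QF", "EF", "EF", "QF", "SF", "SF", "PF", "PF", "GF"),
  Home.Team = c("Collingwood", "Carlton", "St Kilda", "Brisbane Lions",
                "Melbourne", "Port Adelaide", "Collingwood",
                "Brisbane Lions", "Collingwood"),
  Away.Team = c("Melbourne", "Sydney", "GWS", "Port Adelaide", "Carlton",
                "GWS", "GWS", "Carlton", "Brisbane Lions"),
  Home.Team.Prob = c(0.525, 0.556, 0.410, 0.529, 0.541,
                     0.567, 0.586, 0.575, 0.490),
  stringsAsFactors = FALSE
)

# Tip the home team when it is favoured, otherwise the away team
finals$Pick <- ifelse(finals$Home.Team.Prob > 0.5,
                      finals$Home.Team, finals$Away.Team)

finals[, c("Round", "Home.Team", "Away.Team", "Pick")]
```

Under this rule the model tips the home team in every match except St Kilda v GWS and the Grand Final, where the away side is the narrow favourite.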