An Elo rating system is a method for ranking teams or players in a competition relative to one another. The system was invented by Arpad Elo and was originally used to rank chess players. Nowadays, Elo systems are applied to teams and players across a range of sports.
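Under the hood, each competitor carries a single rating: before a game, the two ratings imply an expected score for each side, and afterwards each rating moves in proportion to how far the actual result deviated from that expectation. As a minimal sketch of the standard update rule (the base-10 logistic form with a scale of 400, which is also what the elo package uses by default):
# Expected score for team A, given both current ratings
elo_expected <- function(rating_a, rating_b) {
  1 / (1 + 10^((rating_b - rating_a) / 400))
}
# Update rule: k controls how strongly a single result moves the rating
elo_update <- function(rating_a, rating_b, outcome_a, k = 27) {
  rating_a + k * (outcome_a - elo_expected(rating_a, rating_b))
}
# A team beating an equally rated opponent gains k/2 = 13.5 points
elo_update(2200, 2200, outcome_a = 1, k = 27)
## [1] 2213.5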
For this demonstration, an Elo ranking model will be built for the 2022 AFL regular season (rounds 1 to 23).
# Load required packages for analysis
library(tidyverse)
library(elo)
library(fitzRoy)
library(rsample)
Data will be sourced from AFL Tables using the fitzRoy package. Before any analysis can be done, a little data cleaning and organisation is required.
# Import and clean data
afl <- fetch_results_afltables(season = 2022)
# Keep the home-and-away season only (rounds 1 to 23)
afl <- afl %>%
  filter(Round.Number <= 23)
# Result of each game from the home team's perspective: 1 = win, 0 = loss, 0.5 = draw
afl$Win <- ifelse(afl$Margin > 0, 1, ifelse(afl$Margin < 0, 0, 0.5))
# Select required columns
afl <- afl %>%
  select(Round.Number, Home.Team, Away.Team, Win)
The first step is to build a basic Elo model in order to understand how it works and inspect its accuracy. For this, we will use an initial rating of 2200 for every team and a k value of 27.
elo_basic <- elo::elo.run(formula = Win ~ Home.Team + Away.Team,
                          data = afl,
                          initial.elos = 2200,
                          k = 27,
                          history = TRUE) %>%
  as.data.frame()
head(elo_basic)
## team.A team.B p.A wins.A update.A update.B elo.A elo.B
## 1 Melbourne Footscray 0.5 1 13.5 -13.5 2213.5 2186.5
## 2 Carlton Richmond 0.5 1 13.5 -13.5 2213.5 2186.5
## 3 St Kilda Collingwood 0.5 0 -13.5 13.5 2186.5 2213.5
## 4 Geelong Essendon 0.5 1 13.5 -13.5 2213.5 2186.5
## 5 GWS Sydney 0.5 0 -13.5 13.5 2186.5 2213.5
## 6 Brisbane Lions Port Adelaide 0.5 1 13.5 -13.5 2213.5 2186.5
p.A is the pre-match probability of team A (the home team) winning; it changes for each match-up based on the two teams' current ratings. wins.A is the actual outcome of the game. The update columns show the adjustment applied to each team's Elo rating, and the elo columns give each team's rating once the match has been incorporated.
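We can sanity-check the first round by hand: every team starts on 2200, so each opening match-up should carry a pre-match win probability of exactly 0.5, which is what the p.A column shows. The elo package exposes this calculation directly through elo.prob():
# Two equally rated teams should give a 50% win probability
elo::elo.prob(2200, 2200)
## [1] 0.5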
In order to get the best Elo predictions and rankings, it is a good idea to tune the model and pick out the parameters that produce the highest accuracy. We can do this by writing a function that fits a model for a given set of parameters and extracts its accuracy, then evaluating it over a grid of candidates. We will tune using a training/testing split.
# Split data into training and testing sets
split <- initial_split(afl)
train <- training(split)
test <- testing(split)
# Function that fits an Elo model for a given set of parameters
# and returns its confusion matrix
elo_score <- function(initial_elos, k, data){
  # Obtain Elo ratings (note the argument name is initial.elos, with a dot)
  elo <- elo::elo.run(formula = Win ~ Home.Team + Away.Team,
                      initial.elos = initial_elos,
                      k = k,
                      data = data) %>%
    as.data.frame()
  # Turn win probabilities into hard win/loss predictions
  data <- data %>%
    mutate(p.A = elo$p.A) %>%
    mutate(pred = ifelse(p.A > .5, 1, 0))
  # Compare predictions against the actual results
  cm <- caret::confusionMatrix(data = factor(data$pred, levels = c(0, 0.5, 1)),
                               reference = factor(data$Win, levels = c(0, 0.5, 1)))
  return(list(cm))
}
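As a quick check before launching the grid search, the function can be called once with the parameters from the basic model (the exact figure will vary with the random training split):
# Accuracy of a single fit using the basic-model parameters
elo_score(initial_elos = 2200, k = 27, data = train)[[1]]$overall["Accuracy"]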
# Create a grid
params <- expand.grid(init = seq(1000, 3000, by = 50),
                      kfac = seq(10, 50, by = 5))
# Apply the function
params$accuracy <- apply(X = params,
                         MARGIN = 1,
                         FUN = function(x)
                           elo_score(x[1], x[2], train)[[1]]$overall["Accuracy"])
# Optimal Parameters
best <- subset(params, accuracy == max(params$accuracy))
head(best)
## init kfac accuracy
## 1 1000 10 0.6756757
## 2 1050 10 0.6756757
## 3 1100 10 0.6756757
## 4 1150 10 0.6756757
## 5 1200 10 0.6756757
## 6 1250 10 0.6756757
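Before reading off the winner, it is worth looking at the whole accuracy surface rather than just its maximum. A quick tile plot over the grid (ggplot2 is loaded as part of the tidyverse) makes the pattern easy to see:
# Visualise tuning accuracy across the parameter grid
ggplot(params, aes(x = init, y = kfac, fill = accuracy)) +
  geom_tile() +
  labs(x = "Initial Elo rating", y = "k-factor", fill = "Accuracy")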
Based on the above search, the optimal k value is 10, with an accuracy of 67.57% on the training data. Note that many initial values tie for this accuracy, which is expected: when every team starts from the same rating, shifting that shared starting point up or down leaves all rating differences, and therefore all predictions, unchanged. We will take an initial rating of 1000 and k = 10 to build the final model.
elo_final <- elo::elo.run(formula = Win ~ Home.Team + Away.Team,
                          data = afl,
                          initial.elos = 1000,
                          k = 10,
                          history = TRUE)
elo_final_df <- elo_final %>%
  as.data.frame()
head(elo_final_df)
## team.A team.B p.A wins.A update.A update.B elo.A elo.B
## 1 Melbourne Footscray 0.5 1 5 -5 1005 995
## 2 Carlton Richmond 0.5 1 5 -5 1005 995
## 3 St Kilda Collingwood 0.5 0 -5 5 995 1005
## 4 Geelong Essendon 0.5 1 5 -5 1005 995
## 5 GWS Sydney 0.5 0 -5 5 995 1005
## 6 Brisbane Lions Port Adelaide 0.5 1 5 -5 1005 995
Using the final model, we can make predictions on the testing data; the same approach also works for future match-ups, as sketched after the test results below.
test$predictions <- predict(elo_final, newdata = test)
# Predicted win or loss based on probability
test$pred.Win <- ifelse(test$predictions > 0.5, 1, ifelse(test$predictions < 0.5, 0, 0.5))
test <- test %>%
  mutate(pred.win = ifelse(predictions > 0.5, "Win", ifelse(predictions < 0.5, "Loss", "Draw"))) %>%
  mutate(actual.win = ifelse(Win > 0.5, "Win", ifelse(Win < 0.5, "Loss", "Draw")))
head(test)
## # A tibble: 6 x 8
## Round.Number Home.Team Away.Team Win predic~1 pred.~2 pred.~3 actua~4
## <int> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 1 Melbourne Footscray 1 0.547 1 Win Win
## 2 1 St Kilda Collingwood 0 0.433 0 Loss Loss
## 3 1 Hawthorn North Melbourne 1 0.573 1 Win Win
## 4 1 Adelaide Fremantle 0 0.408 0 Loss Loss
## 5 2 Gold Coast Melbourne 0 0.425 0 Loss Loss
## 6 3 Footscray Sydney 1 0.451 0 Loss Win
## # ... with abbreviated variable names 1: predictions, 2: pred.Win, 3: pred.win,
## # 4: actual.win
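The same predict() call also handles fixtures that are not in the data: pass a one-row data frame with the home and away teams and it returns the model's current win probability for the home side. The fixture below is purely illustrative:
# Win probability for a hypothetical future fixture (illustrative teams)
upcoming <- data.frame(Home.Team = "Geelong", Away.Team = "Sydney")
predict(elo_final, newdata = upcoming)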
Finally, we can use a confusion matrix to compare the predictions against the actual results and determine the accuracy of the Elo model. The accuracy of predictions on the testing data is 69.39%.
cm <- caret::confusionMatrix(data = factor(test$pred.win, levels = c("Win", "Loss")),
                             reference = factor(test$actual.win, levels = c("Win", "Loss")))
# Accuracy ----
cm$overall["Accuracy"]
## Accuracy
## 0.6938776
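Since the aim of the exercise is a ranking, we can finish by pulling each team's rating at the end of the season straight from the fitted model object; the elo package provides final.elos() for this, and sorting the result gives a final Elo ladder:
# Final Elo rating for each team after round 23, highest first
sort(final.elos(elo_final), decreasing = TRUE)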