Predicting results of AFL matches is very popular among those with a keen interest in the sport. But how are these predictions actually made? There are many methods that have been used to attempt to forecast results of AFL matches. In this document, I explain how such predictions can be made using an ELO Ratings model.
The following packages are required for the ELO rating and prediction process. If any of them are not installed, install them first with the install.packages function.
library(tidyverse)
library(dplyr)
library(ggplot2)
library(elo)
library(lubridate)
library(fitzRoy)
library(caret)
library(PlayerRatings)
After installing and loading all required packages, we first use the
fitzRoy package to download the data. To construct ELO
Ratings we need information on past matches: the home team and
away team, each team's score, and the result of the match. These are
what the model uses to make predictions on future results.
Below we import three seasons' worth of AFL Home & Away match results, as well as the fixtures for the 2022 finals series.
# 2020 - 2022 Home & Away results (finals excluded)
afl_results <- fitzRoy::fetch_results_afltables(2020:2022) %>%
  filter(Round.Type == "Regular")
# 2022 AFL Finals fixtures (matches from 1 September onwards)
aflfinals_2022 <- fitzRoy::fetch_fixture_footywire(2022) %>%
  filter(Date >= "2022-09-01")
Many parameters could be included in the model; in this case we have chosen Home Ground Advantage (HGA), carryOver, and the k-factor. HGA is a rating bonus given to the team playing at home, as most teams tend to perform better at home. carryOver controls what happens to ratings between seasons: it is passed to regress() as the proportion by which each team's rating is pulled back toward 2200 at the end of a season, so a value of 0 carries ratings over unchanged and a value of 1 (as set below) resets every team to 2200. The k value scales how far ratings move after each match, and is also the maximum number of ELO points a team can gain or lose in a single match.
I recommend experimenting with which parameters to include in your own model, and with their values, to find the combination that gives the highest accuracy.
HGA <- 30
carryOver <- 1
k_val <- 26
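To make the roles of HGA and k concrete, here is a minimal base-R sketch of the standard ELO calculation for a single match. The helper names elo_expected and elo_update are illustrative and not part of the elo package, and the two pre-match ratings are hypothetical values chosen for the example.

```r
# Standard ELO expected score for the home team, with home ground advantage
# added to the home rating before the comparison.
elo_expected <- function(elo_home, elo_away, hga = 0) {
  1 / (1 + 10^((elo_away - (elo_home + hga)) / 400))
}

# Rating change for the home team; the away team moves by the opposite amount.
# `result` is 1 for a home win, 0.5 for a draw, 0 for a home loss.
elo_update <- function(p_home, result, k) {
  k * (result - p_home)
}

HGA   <- 30
k_val <- 26

# Hypothetical pre-match ratings, for illustration only
p_home       <- elo_expected(elo_home = 2250, elo_away = 2200, hga = HGA)
gain_if_win  <- elo_update(p_home, result = 1, k = k_val)
loss_if_loss <- elo_update(p_home, result = 0, k = k_val)

round(c(p_home = p_home, win = gain_if_win, loss = loss_if_loss), 3)
```

Because result - p_home always lies between -1 and 1, no single match can move a rating by more than k points, and a favourite gains less from a win than it loses from a defeat.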
We are now ready to calculate our ELO ratings! The
elo.run function from the elo package is the
method used in this example. Its formula specifies what goes into the
model: the match result from score(), the home team with its HGA
adjustment, the away team, and the between-season regression. The
remaining arguments give the dataframe of reference, the initial rating
and the k value.
eloratings <- elo::elo.run(formula = score(Home.Points, Away.Points) ~
                             adjust(Home.Team, HGA) + Away.Team +
                             regress(Season, 2200, carryOver),
                           data = afl_results,
                           initial.elos = 2200,
                           k = k_val,
                           history = TRUE) %>%
  as.data.frame()
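The regress() term decides how much of a team's rating survives into the next season. Assuming its third argument is the proportion by which each rating is pulled toward the target value (which is how I read the elo package documentation), the effect can be sketched in base R as follows. The helper regress_toward is a hypothetical name used only for this illustration, and the rating of 2300 is made up.

```r
# Pull a rating part of the way toward a target value.
# by = 0 leaves it unchanged, by = 1 resets it to the target.
regress_toward <- function(elo, to, by) {
  elo + by * (to - elo)
}

end_of_season <- 2300  # hypothetical end-of-season rating

# No regression, halfway, and a full reset to 2200
sapply(c(0, 0.5, 1), function(by) regress_toward(end_of_season, to = 2200, by = by))
# 2300 2250 2200
```

Intermediate values keep some memory of the previous season while still allowing for list changes over the off-season, so they are worth including when testing parameter combinations.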
Next, we add a prediction column to the data, to be used when testing accuracy. It is 1 when the model gives the home team more than a 50% chance of winning, and 0 otherwise.
eloratings$pred_elo <- ifelse(eloratings$p.A > 0.5, 1, 0)
Testing accuracy allows us to compare parameter settings and find the best combination. To do this, we first create a confusion matrix from our ‘pred_elo’ and ‘wins.A’ columns, and then extract the Accuracy value from it.
In this case, I found that the best combination of the chosen parameters is the one set above. This will differ depending on the number of parameters and the data set used, so it is best to keep testing different combinations.
cm <- confusionMatrix(data = factor(eloratings$pred_elo, levels = c(0, 0.5, 1)),
                      reference = factor(eloratings$wins.A, levels = c(0, 0.5, 1)))
cm$overall["Accuracy"]
## Accuracy
## 0.6648452
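The accuracy was 0.665, meaning the model picked the correct result in about 66.5% of matches. Drawn matches (wins.A of 0.5) always count as misses, because the prediction column only ever contains 0 or 1.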
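Trying parameter combinations by hand becomes tedious quickly, so one option is to loop over a grid of values. The sketch below does this in base R with a small hand-written ELO loop rather than elo.run, purely so that it runs on its own. The function elo_accuracy, the four teams and their results are all simulated placeholders, not AFL data.

```r
# Run a simple ELO model over matches in order and return prediction accuracy.
# `matches` needs columns home, away (character) and home_win (1, 0.5 or 0).
elo_accuracy <- function(matches, k, hga, initial = 2200) {
  teams   <- unique(c(matches$home, matches$away))
  ratings <- setNames(rep(initial, length(teams)), teams)
  correct <- logical(nrow(matches))
  for (i in seq_len(nrow(matches))) {
    h <- matches$home[i]
    a <- matches$away[i]
    # Prediction uses ratings as they stood before this match
    p <- 1 / (1 + 10^((ratings[[a]] - ratings[[h]] - hga) / 400))
    correct[i] <- as.numeric(p > 0.5) == matches$home_win[i]
    # Update both teams after the result is known
    delta <- k * (matches$home_win[i] - p)
    ratings[[h]] <- ratings[[h]] + delta
    ratings[[a]] <- ratings[[a]] - delta
  }
  mean(correct)
}

# Simulated fixtures and results, for illustration only
set.seed(1)
teams    <- c("A", "B", "C", "D")
strength <- c(A = 0.8, B = 0.6, C = 0.4, D = 0.2)
toy <- data.frame(home = sample(teams, 200, replace = TRUE),
                  stringsAsFactors = FALSE)
toy$away <- vapply(toy$home, function(h) sample(setdiff(teams, h), 1),
                   character(1), USE.NAMES = FALSE)
p_true <- strength[toy$home] / (strength[toy$home] + strength[toy$away])
toy$home_win <- rbinom(nrow(toy), 1, p_true)

# Accuracy for every combination of k and HGA in the grid
grid <- expand.grid(k = c(10, 20, 30, 40), hga = c(0, 15, 30))
grid$accuracy <- mapply(function(k, hga) elo_accuracy(toy, k, hga),
                        grid$k, grid$hga)
grid[order(-grid$accuracy), ]
```

Here mean(correct) plays the same role as the Accuracy value from the confusion matrix. Keep in mind that picking the best row on the same matches used to score it can flatter the model, so the winning combination is best checked on matches that were held back.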
Once we have run the ELO function and tested for accuracy, we can view the results. Here, we view the results in a few different ways using different commands on the same data.
Firstly, we can use as.data.frame to view predicted and
actual results, along with the change in ratings for both the home and
away team for all matches. In this instance, we are previewing the last
6 rows of the dataframe using the ‘tail’ command.
as.data.frame(eloratings) %>% tail()
## team.A team.B p.A wins.A update.A update.B elo.A
## 544 Geelong West Coast 0.8683866 1 3.421948 -3.421948 2338.182
## 545 Essendon Richmond 0.4041567 0 -10.508075 10.508075 2131.622
## 546 Port Adelaide Adelaide 0.5855225 1 10.776415 -10.776415 2193.902
## 547 Hawthorn Footscray 0.4557900 0 -11.850540 11.850540 2141.500
## 548 Carlton Collingwood 0.4237466 0 -11.017412 11.017412 2196.936
## 549 St Kilda Sydney 0.4185649 0 -10.882688 10.882688 2189.730
## elo.B pred_elo
## 544 2033.573 1
## 545 2250.071 0
## 546 2142.332 1
## 547 2226.002 0
## 548 2302.374 0
## 549 2298.591 0
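Next, we can use as.matrix. When it is called on the object returned
by elo.run, it gives each team’s rating over time, with one column per
team. Here eloratings has already been converted to a data frame, so
as.matrix returns the same match table with every value coerced to
text, as the output below shows. Again, we use ‘tail’ to view the last 6
rows.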
as.matrix(eloratings) %>% tail()
## team.A team.B p.A wins.A update.A
## [544,] "Geelong" "West Coast" "0.8683866" "1.0" " 3.4219482"
## [545,] "Essendon" "Richmond" "0.4041567" "0.0" "-10.5080752"
## [546,] "Port Adelaide" "Adelaide" "0.5855225" "1.0" " 10.7764148"
## [547,] "Hawthorn" "Footscray" "0.4557900" "0.0" "-11.8505400"
## [548,] "Carlton" "Collingwood" "0.4237466" "0.0" "-11.0174119"
## [549,] "St Kilda" "Sydney" "0.4185649" "0.0" "-10.8826875"
## update.B elo.A elo.B pred_elo
## [544,] " -3.4219482" "2338.182" "2033.573" "1"
## [545,] " 10.5080752" "2131.622" "2250.071" "0"
## [546,] "-10.7764148" "2193.902" "2142.332" "1"
## [547,] " 11.8505400" "2141.500" "2226.002" "0"
## [548,] " 11.0174119" "2196.936" "2302.374" "0"
## [549,] " 10.8826875" "2189.730" "2298.591" "0"
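To see ratings over time rather than the match table, as.matrix has to be given the elo.run object itself. The sketch below shows this on the small tournament example data set that ships with the elo package, so it runs without downloading AFL results; the object names toy_run and rating_history are just placeholders.

```r
library(elo)

# `tournament` is an example data set bundled with the elo package
data(tournament)

toy_run <- elo.run(score(points.Home, points.Visitor) ~ team.Home + team.Visitor,
                   data = tournament, k = 20)

# One column per team, one row per match: each team's rating after that match
rating_history <- as.matrix(toy_run)
tail(rating_history)
```

The same call on the AFL model would need the version of eloratings that has not been piped through as.data.frame.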
The last component of the data we want to view is the final ELOs of
all 18 teams, in this case at the end of the 2022 regular season. This
can be done using the final.elos command.
*NOTE - final.elos does not work on data frames; it needs the object
returned by elo.run. To get that object, re-run the ELO calculation
with the ‘as.data.frame’ line at the end of the function code removed
(or commented out). See below for how to do this.
eloratings <- elo::elo.run(formula = score(Home.Points, Away.Points) ~
                             adjust(Home.Team, HGA) + Away.Team +
                             regress(Season, 2200, carryOver),
                           data = afl_results,
                           initial.elos = 2200,
                           k = k_val,
                           history = TRUE) #%>%
  #as.data.frame()
final.elos(eloratings)
## Adelaide Brisbane Lions Carlton Collingwood Essendon
## 2142.332 2263.830 2196.936 2302.374 2131.622
## Footscray Fremantle Geelong Gold Coast GWS
## 2226.002 2278.453 2338.182 2176.609 2112.851
## Hawthorn Melbourne North Melbourne Port Adelaide Richmond
## 2141.500 2288.472 2034.968 2193.902 2250.071
## St Kilda Sydney West Coast
## 2189.730 2298.591 2033.573
Finally, once we have decided on parameters, run our ELO model and
are happy with it, we can make predictions on the fixtures data. We do
this using the predict function as seen below. The resulting Prob
column is the model's probability that the home team wins.
aflfinals_2022 <- aflfinals_2022 %>%
  mutate(Prob = predict(eloratings, newdata = aflfinals_2022))
aflfinals_2022
## # A tibble: 9 x 8
## Date Season Season.Game Round Home.Team Away.~1 Venue Prob
## <dttm> <dbl> <int> <dbl> <chr> <chr> <chr> <dbl>
## 1 2022-09-01 19:20:00 2022 199 24 Brisbane Lio~ Richmo~ Gabba 0.563
## 2 2022-09-02 19:50:00 2022 200 24 Melbourne Sydney M.C.~ 0.529
## 3 2022-09-03 16:35:00 2022 201 24 Geelong Collin~ M.C.~ 0.594
## 4 2022-09-03 18:10:00 2022 202 24 Fremantle Footsc~ Pert~ 0.616
## 5 2022-09-09 19:50:00 2022 203 25 Melbourne Brisba~ M.C.~ 0.578
## 6 2022-09-10 19:25:00 2022 204 25 Collingwood Freman~ M.C.~ 0.577
## 7 2022-09-16 19:50:00 2022 205 26 Geelong Brisba~ M.C.~ 0.646
## 8 2022-09-17 16:45:00 2022 206 26 Sydney Collin~ S.C.~ 0.538
## 9 2022-09-24 14:30:00 2022 207 27 Geelong Sydney M.C.~ 0.599
## # ... with abbreviated variable name 1: Away.Team
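As a sanity check, the Prob values can be reproduced by hand from the final ELOs. The sketch below applies the standard ELO formula, with HGA added to the home team's rating, to the two fixtures whose team names are not abbreviated in the output (Melbourne v Sydney and Geelong v Sydney). The ratings are copied from the final.elos output above; the assumption is that predict uses this same formula.

```r
# Ratings copied from the final.elos() output
final <- c(Geelong = 2338.182, Melbourne = 2288.472, Sydney = 2298.591)
HGA   <- 30

home <- c("Melbourne", "Geelong")
away <- c("Sydney", "Sydney")

# Home-team win probability, with HGA added to the home rating
prob_home <- 1 / (1 + 10^((final[away] - final[home] - HGA) / 400))
round(unname(prob_home), 3)
# 0.529 0.599
```

These agree with rows 2 and 9 of the tibble. Note that the later finals are also predicted from end-of-regular-season ratings, since the model has not seen any finals results.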
Overall, this page provides a fairly straightforward demonstration of creating an AFL ELO model and using it to make predictions.
Many modifications can be made to strengthen the model and improve the accuracy of its predictions, for example adding more parameters and tuning their values to find the optimal combination.
The process can also be taken further, e.g. by splitting the data into training and testing sets, so that parameters are tuned on one and accuracy is measured on the other.
Results of this model can also be used as inputs to other predictive models, e.g. a logistic regression model.
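As a rough sketch of those last two ideas together, the example below fits a logistic regression on a pre-match rating difference using earlier seasons and evaluates it on the latest one. All of the data is simulated and the column names (season, elo_diff, home_win) are made up for the illustration; with the real model, the rating difference would have to be built from each team's rating before the match.

```r
set.seed(42)

# Simulated matches, for illustration only: season, pre-match rating
# difference (home minus away, HGA included) and home-win outcome
n <- 600
sim <- data.frame(
  season   = rep(2020:2022, each = n / 3),
  elo_diff = rnorm(n, mean = 30, sd = 100)
)
sim$home_win <- rbinom(n, 1, 1 / (1 + 10^(-sim$elo_diff / 400)))

# Chronological split: fit on earlier seasons, evaluate on the latest one
train <- sim[sim$season < 2022, ]
test  <- sim[sim$season == 2022, ]

fit <- glm(home_win ~ elo_diff, data = train, family = binomial)

# Out-of-sample probabilities and accuracy on the held-out season
test$prob <- predict(fit, newdata = test, type = "response")
test_accuracy <- mean((test$prob > 0.5) == (test$home_win == 1))
test_accuracy
```

Splitting by season rather than at random keeps the test matches strictly later than the training matches, which mirrors how the model would be used to predict upcoming games.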