Overview

Predicting the results of AFL matches is very popular among those with a keen interest in the sport, and many methods have been used to attempt it. But how are these predictions actually made? In this document, I explain how such predictions can be made using an ELO ratings model.

Loading Required Packages

The following packages are required for the ELO rating and prediction process. If any of them are not installed, install them first using the install.packages function.
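As a rough sketch, missing packages can be identified before loading anything. This uses only base R; the variable names are illustrative, and the install call is left commented out so that nothing is installed unintentionally.

```r
# Packages loaded on this page
required <- c("tidyverse", "dplyr", "ggplot2", "elo",
              "lubridate", "fitzRoy", "caret", "PlayerRatings")

# Which of them are not yet installed in the current library paths?
to_install <- setdiff(required, rownames(installed.packages()))

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them
} else {
  message("All required packages are installed.")
}
```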

library(tidyverse)
library(dplyr)
library(ggplot2)
library(elo)
library(lubridate)
library(fitzRoy)
library(caret)
library(PlayerRatings)

Calling Up Data

After installing and loading all required packages, we first use the fitzRoy package to call up chosen data. To construct ELO Ratings we require information on past matches such as home team & away team, as well as team scores and results of the matches in order to make predictions on future results.

Below we import three seasons’ worth of AFL Home & Away match results (2020 to 2022), as well as the fixtures for the 2022 finals series.

# 2020 - 2022 H & A results
afl_results <- fitzRoy::fetch_results_afltables(2020:2022) %>%
  filter(Round.Type == "Regular")

# 2022 AFL Finals fixtures
aflfinals_2022 <- fitzRoy::fetch_fixture_footywire(2022) %>%
  filter(Date >= "2022-09-01") 

Setting ELO parameters

A range of parameters could be included in the model; in this case we have chosen to use Home Ground Advantage (HGA), carryOver and the k-factor. HGA is a rating bonus given to the team playing at home, as most teams tend to perform better at home. carryOver controls how much of a team’s rating survives from one season to the next: it is passed to regress() as the proportion by which every rating is pulled back towards 2200 at the end of a season, so 0 keeps ratings unchanged and 1 (the value used here) resets every team to 2200. The k value scales each rating update, which makes it a cap on the number of ELO points a team can gain or lose per match.

I recommend experimenting with which parameters to include in your own model, as well as with their values, to find the combination that gives the highest accuracy.

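HGA <- 30        # rating points added to the home team
carryOver <- 1   # proportion passed to regress(); 1 resets ratings to 2200 each season
k_val <- 26      # maximum rating change per match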
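To make the three parameters concrete, here is a minimal base R sketch of the standard ELO arithmetic they feed into. The function names are illustrative and not part of any package; the ratings are back-calculated from the last regular-season row of the results table shown later on this page (St Kilda at home to Sydney), so the figures can be compared with that row.

```r
HGA <- 30
carryOver <- 1
k_val <- 26

# Probability that the home team wins, given pre-match ratings
expected_home <- function(elo_home, elo_away, hga = HGA) {
  1 / (1 + 10^((elo_away - (elo_home + hga)) / 400))
}

# Rating points gained by the home team (the away team loses the same amount)
elo_change <- function(result, expected, k = k_val) {
  k * (result - expected)
}

# End-of-season regression towards a target rating
regress_to <- function(elo, to = 2200, by = carryOver) {
  elo + by * (to - elo)
}

# Pre-match ratings = post-match rating minus the update for that match
st_kilda <- 2189.730 + 10.882688
sydney   <- 2298.591 - 10.882688

p_home <- expected_home(st_kilda, sydney)
delta  <- elo_change(result = 0, expected = p_home)  # St Kilda lost

round(p_home, 4)      # 0.4186
round(delta, 2)       # -10.88
regress_to(2338.182)  # 2200, because by = 1 is a full reset
```

Because the update is k multiplied by (result minus expected), and that difference always lies between -1 and 1, no team can move by more than k points in one match.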

Calculating ELO results for all matches

We are now ready to calculate our ELO ratings! This example uses the elo.run function from the elo package. Its formula specifies what is incorporated in the model: the match result from score(Home.Points, Away.Points), the home team with its HGA adjustment, the away team, and the between-season regression. The remaining arguments supply the dataframe of reference, the starting rating of 2200 and the k value.

eloratings <- elo::elo.run(formula = score(Home.Points, Away.Points) ~ 
                             adjust(Home.Team, HGA) + Away.Team + 
                             regress(Season, 2200, carryOver), 
                           data = afl_results, 
                           initial.elos = 2200,
                           k = k_val, 
                           history = TRUE) %>%
  as.data.frame()

Creating a prediction column

Next, we will create and add a prediction column to the data, to be used when testing accuracy.

eloratings$pred_elo <- ifelse(eloratings$p.A > 0.5, 1, 0)

Testing accuracy

Testing accuracy lets us compare parameter settings and find the best combination. To achieve this, we first create a confusion matrix from our ‘pred_elo’ and ‘wins.A’ columns, and then run the second command listed to extract the Accuracy value. The 0.5 level is included because ‘wins.A’ records a drawn match as 0.5.

In this case, I found that the parameter values set above gave the best accuracy. This will differ depending on the number of parameters and the data set used, so it is best to keep testing different combinations.

cm <- confusionMatrix(data = factor(eloratings$pred_elo, levels = c(0, 0.5, 1)),
                      reference = factor(eloratings$wins.A, levels = c(0, 0.5, 1)))

cm$overall["Accuracy"]
##  Accuracy 
## 0.6648452

The accuracy of 0.665 means the model picked the correct result in about 66.5% of matches.
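Testing different combinations can be automated with a small grid search. The sketch below is self-contained base R and runs on simulated matches, not AFL results: the team names, strengths and the run_elo_accuracy helper are all made up for illustration. It scores each (k, HGA) pair by the share of matches in which the favoured team won, the same quantity reported as Accuracy above.

```r
# Simulated match results (hypothetical data)
set.seed(1)
teams <- c("A", "B", "C", "D")
strength <- c(A = 0.3, B = 0.1, C = -0.1, D = -0.3)  # made-up team strengths
n <- 200
home <- sample(teams, n, replace = TRUE)
away <- vapply(home, function(h) sample(setdiff(teams, h), 1), character(1),
               USE.NAMES = FALSE)
p_true <- plogis(0.2 + 4 * unname(strength[home] - strength[away]))
matches <- data.frame(home = home, away = away,
                      home_win = rbinom(n, 1, p_true),
                      stringsAsFactors = FALSE)

# Run a simple ELO model over the matches in order and return its accuracy
run_elo_accuracy <- function(matches, k, hga, start = 2200) {
  team_names <- unique(c(matches$home, matches$away))
  ratings <- setNames(rep(start, length(team_names)), team_names)
  correct <- logical(nrow(matches))
  for (i in seq_len(nrow(matches))) {
    h <- matches$home[i]
    a <- matches$away[i]
    p <- 1 / (1 + 10^((ratings[[a]] - ratings[[h]] - hga) / 400))
    correct[i] <- (p > 0.5) == (matches$home_win[i] == 1)
    delta <- k * (matches$home_win[i] - p)
    ratings[[h]] <- ratings[[h]] + delta
    ratings[[a]] <- ratings[[a]] - delta
  }
  mean(correct)
}

# Try every combination of candidate values
grid <- expand.grid(k = c(10, 20, 26, 40), hga = c(0, 15, 30, 45))
grid$accuracy <- mapply(function(k, hga) run_elo_accuracy(matches, k, hga),
                        grid$k, grid$hga)

head(grid[order(-grid$accuracy), ], 3)  # best three combinations
```

With real data, the same loop could call elo.run instead of the hand-written helper. Accuracy measured on the matches used for tuning tends to be optimistic, so the best row is a starting point rather than a final answer.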

View Results

Once we have run the ELO function and tested for accuracy, we can view the results. Here, we view the results in a few different ways using different commands on the same data.

Method 1 - as.data.frame

Firstly, we can use as.data.frame to view predicted and actual results, along with the change in ratings for both the home and away team for all matches. In this instance, we are previewing the last 6 rows of the dataframe using the ‘tail’ command.

as.data.frame(eloratings) %>% tail()
##            team.A      team.B       p.A wins.A   update.A   update.B    elo.A
## 544       Geelong  West Coast 0.8683866      1   3.421948  -3.421948 2338.182
## 545      Essendon    Richmond 0.4041567      0 -10.508075  10.508075 2131.622
## 546 Port Adelaide    Adelaide 0.5855225      1  10.776415 -10.776415 2193.902
## 547      Hawthorn   Footscray 0.4557900      0 -11.850540  11.850540 2141.500
## 548       Carlton Collingwood 0.4237466      0 -11.017412  11.017412 2196.936
## 549      St Kilda      Sydney 0.4185649      0 -10.882688  10.882688 2189.730
##        elo.B pred_elo
## 544 2033.573        1
## 545 2250.071        0
## 546 2142.332        1
## 547 2226.002        0
## 548 2302.374        0
## 549 2298.591        0

Method 2 - as.matrix

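Next, we can use as.matrix. Because eloratings was converted to a data frame earlier, the call below simply returns the same columns as a character matrix. To display each team’s rating change over time instead, call as.matrix on the object returned by elo.run before it is converted with as.data.frame. Again, we use ‘tail’ to view the last 6 rows.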

as.matrix(eloratings) %>% tail()
##        team.A          team.B        p.A         wins.A update.A     
## [544,] "Geelong"       "West Coast"  "0.8683866" "1.0"  "  3.4219482"
## [545,] "Essendon"      "Richmond"    "0.4041567" "0.0"  "-10.5080752"
## [546,] "Port Adelaide" "Adelaide"    "0.5855225" "1.0"  " 10.7764148"
## [547,] "Hawthorn"      "Footscray"   "0.4557900" "0.0"  "-11.8505400"
## [548,] "Carlton"       "Collingwood" "0.4237466" "0.0"  "-11.0174119"
## [549,] "St Kilda"      "Sydney"      "0.4185649" "0.0"  "-10.8826875"
##        update.B      elo.A      elo.B      pred_elo
## [544,] " -3.4219482" "2338.182" "2033.573" "1"     
## [545,] " 10.5080752" "2131.622" "2250.071" "0"     
## [546,] "-10.7764148" "2193.902" "2142.332" "1"     
## [547,] " 11.8505400" "2141.500" "2226.002" "0"     
## [548,] " 11.0174119" "2196.936" "2302.374" "0"     
## [549,] " 10.8826875" "2189.730" "2298.591" "0"
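For comparison, here is a minimal sketch of as.matrix applied to an elo.run object rather than a data frame. It uses the small `tournament` example data set that ships with the elo package, purely so that it runs without downloading AFL data; the object names are illustrative.

```r
library(elo)

# Example data bundled with the elo package (not AFL data)
data(tournament, package = "elo")

# Keep the elo.run object; do not pipe it into as.data.frame()
fit <- elo.run(score(points.Home, points.Visitor) ~ team.Home + team.Visitor,
               data = tournament, k = 20)

# One row per match, one column per team: ratings after each match
rating_history <- as.matrix(fit)
tail(rating_history)

# final.elos() also works here because `fit` is still an elo.run object
final.elos(fit)
```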

Method 3 - final.elos

The last component of the data we want to view is the final ELOs of all 18 teams, in this case at the end of the 2022 regular season. This can be done using the final.elos command.

*NOTE - final.elos needs the object returned by elo.run and does not work on data frames. Re-run the ELO calculation with the ‘as.data.frame’ line at the end removed (or commented out). See below for how to do this.

eloratings <- elo::elo.run(formula = score(Home.Points, Away.Points) ~ 
                             adjust(Home.Team, HGA) + Away.Team + 
                             regress(Season, 2200, carryOver), 
                           data = afl_results, 
                           initial.elos = 2200,
                           k = k_val, 
                           history = TRUE) # %>%
  # as.data.frame()

final.elos(eloratings)
##        Adelaide  Brisbane Lions         Carlton     Collingwood        Essendon 
##        2142.332        2263.830        2196.936        2302.374        2131.622 
##       Footscray       Fremantle         Geelong      Gold Coast             GWS 
##        2226.002        2278.453        2338.182        2176.609        2112.851 
##        Hawthorn       Melbourne North Melbourne   Port Adelaide        Richmond 
##        2141.500        2288.472        2034.968        2193.902        2250.071 
##        St Kilda          Sydney      West Coast 
##        2189.730        2298.591        2033.573

Making predictions

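Finally, once we have decided on parameters, run our ELO model and are happy with it, we can make predictions on the fixtures data. We do this using the predict function as seen below. predict needs the elo.run object from the previous step rather than the data frame version, and the resulting Prob column is the estimated probability that the home team wins.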

aflfinals_2022 <- aflfinals_2022 %>%
  mutate(Prob = predict(eloratings, newdata = aflfinals_2022))

aflfinals_2022
## # A tibble: 9 x 8
##   Date                Season Season.Game Round Home.Team     Away.~1 Venue  Prob
##   <dttm>               <dbl>       <int> <dbl> <chr>         <chr>   <chr> <dbl>
## 1 2022-09-01 19:20:00   2022         199    24 Brisbane Lio~ Richmo~ Gabba 0.563
## 2 2022-09-02 19:50:00   2022         200    24 Melbourne     Sydney  M.C.~ 0.529
## 3 2022-09-03 16:35:00   2022         201    24 Geelong       Collin~ M.C.~ 0.594
## 4 2022-09-03 18:10:00   2022         202    24 Fremantle     Footsc~ Pert~ 0.616
## 5 2022-09-09 19:50:00   2022         203    25 Melbourne     Brisba~ M.C.~ 0.578
## 6 2022-09-10 19:25:00   2022         204    25 Collingwood   Freman~ M.C.~ 0.577
## 7 2022-09-16 19:50:00   2022         205    26 Geelong       Brisba~ M.C.~ 0.646
## 8 2022-09-17 16:45:00   2022         206    26 Sydney        Collin~ S.C.~ 0.538
## 9 2022-09-24 14:30:00   2022         207    27 Geelong       Sydney  M.C.~ 0.599
## # ... with abbreviated variable name 1: Away.Team
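The probabilities above can be checked by hand. The sketch below assumes that each prediction is the standard ELO expected score computed from the end-of-regular-season ratings listed under Method 3, with HGA added to the home team. expected_home is an illustrative helper, not a function from the elo package. Only the two fixtures whose team names are printed in full are used.

```r
HGA <- 30

# Probability that the home team wins, given the two ratings
expected_home <- function(elo_home, elo_away, hga = HGA) {
  1 / (1 + 10^((elo_away - (elo_home + hga)) / 400))
}

# Final ratings copied from the final.elos() output
final_ratings <- c(Geelong = 2338.182, Melbourne = 2288.472, Sydney = 2298.591)

finals <- data.frame(
  Home.Team = c("Melbourne", "Geelong"),
  Away.Team = c("Sydney", "Sydney"),
  stringsAsFactors = FALSE
)

finals$Prob <- unname(expected_home(final_ratings[finals$Home.Team],
                                    final_ratings[finals$Away.Team]))

# Tip the home team when it is favoured, otherwise the away team
finals$Tip <- ifelse(finals$Prob > 0.5, finals$Home.Team, finals$Away.Team)

round(finals$Prob, 3)  # 0.529 0.599
finals
```

These match the Prob values in rows 2 and 9 of the output. Every fixture is scored with the same end-of-season ratings, so later finals do not take earlier finals results into account.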

Conclusions / Summary

  • Overall, this page provides a fairly straightforward demonstration of creating an AFL ELO model and using it to make predictions.

  • Many modifications can be made to strengthen the ELO formula and improve the accuracy of predictions, for example adding more parameters and tuning parameter values to find the best combination for your model.

  • The process can be taken further, e.g. by splitting the data into training and testing sets so that tuned parameters are checked on matches the model has not seen.

  • Results of this model can also be used as inputs to other predictive models, e.g. a logistic regression model.