Results Dashboard

Web App

Overview

This quick analysis is a simple look at how well the AFL prediction models listed on Squiggle perform relative to the aggregated betting odds provided by Odds Portal.

I’ve always been rather skeptical about the utility of using Elo models to predict the head-to-head outcome of AFL matches, simply because it’s extremely difficult (if not impossible) to consistently beat an efficient market. The reason for this is the Wisdom of the Crowds principle: the aggregate prediction of a large, independent and intellectually diverse population will usually be more accurate than that of any single individual, because the noise and many of the biases in individual predictions cancel each other out. It is therefore unlikely that a ratings system, using the same information available to the public, could consistently outperform the consensus opinion.

Elo Models

We can use Squiggle’s publicly available API to load the historical tipping data from all of the models listed on the site.
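A minimal sketch of how that might look, assuming the tips endpoint of the API (the query format and the get_tips helper below are illustrative rather than the exact code used):

# Pull the historical tips from the Squiggle API, one season at a time
# (the "tips" element name and query format are assumptions based on the public API docs)
library(jsonlite)

get_tips = function(year) {
  url = paste0("https://api.squiggle.com.au/?q=tips;year=", year)
  fromJSON(url)$tips
}

predictions = do.call(rbind, lapply(2017:2020, get_tips))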

Since we are going to compare the predictive performance of these models against the betting data, the sample sizes need to be the same. Hence, we should only analyse models that have predicted every game within a specific time span. The number of predictions made by each model across each season can be seen using a pivot table.
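For example, a simple cross-tabulation of model against season (assuming the loaded data sits in a predictions data frame with source and year columns) gives the counts below.

# Number of tips made by each model in each season
table(predictions$source, predictions$year)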

##                        
##                         2017 2018 2019 2020
##   AFL Gains                0    0  207    0
##   AFL Lab                  0    0  207    9
##   AFL_GO                   0    0    0    9
##   AFLalytics               0    0  207    9
##   Aggregate              207  207  207    9
##   Fat Stats                0    0  207    9
##   Footy Maths Institute  207  207  207    9
##   Graft                  207  207  207    9
##   HPN                      0  207  207    0
##   Live Ladders             0  207  207    9
##   Massey Ratings           0  207  207    9
##   Matter of Stats        207  207  207    9
##   PlusSixOne             207  207  198    9
##   Punters                  0    0  207    9
##   Squiggle               207  207  207    9
##   Stattraction             0  207  207    9
##   Swinburne                0  207  207    9
##   The Arc                207  207  207    9
##   The Flag                 0    0    0    9

From the above table it is clear that most of the models were making predictions by 2018, so we’ll filter the data to only include models that have made continuous predictions from 2018-2020 (a sketch of this step is shown below). However, any combination of years can be analysed through the Shiny app that I made to summarise the results in a more in-depth way.
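As a rough sketch, that filtering step might look something like this (base R, with assumed column names):

# Keep the 2018-2020 seasons, then drop any model that missed a game in that span
predictions = predictions[predictions$year >= 2018, ]
tips.per.model = table(predictions$source)
complete.models = names(tips.per.model)[tips.per.model == max(tips.per.model)]
predictions = predictions[predictions$source %in% complete.models, ]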

In order to determine the accuracy of a prediction, Squiggle uses a measure called bits, which rewards those who were more confident in a tip that was correct and punishes those who were more confident in a tip that ended up being incorrect. I also decided to calculate the Brier score for each prediction, since this is a far more common way of measuring predictive accuracy (although the two measures are very highly correlated, with an R-squared of ~0.98). Since Brier scores are a measure of error (specifically mean squared error), a lower score is better.
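As a sketch, for a single tip with forecast probability p for the tipped team, the two measures could be computed as below (the bits formula follows Squiggle’s description, draws are ignored for simplicity, and the function names are my own):

# p:       forecast probability for the tipped team (between 0.5 and 1)
# correct: 1 if the tipped team won, 0 otherwise
bits_score  = function(p, correct) ifelse(correct == 1, 1 + log2(p), 1 + log2(1 - p))
brier_score = function(p, correct) (p - correct)^2

bits_score(0.75, 1)   # confident and correct: ~0.58 bits gained
bits_score(0.75, 0)   # confident and wrong:   -1 bit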

The same process is then applied to the Odds Portal betting data, sourced from aussportsbetting.com.
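In sketch form, using the hypothetical helpers above and the tip.odds / tip.result columns that appear in the code further down:

# Score the bookmakers' favourite in the same way as the models
# (tip.odds is the implied win probability of the favourite, tip.result is 1/0)
betting.odds$bits  = bits_score(betting.odds$tip.odds, betting.odds$tip.result)
betting.odds$brier = brier_score(betting.odds$tip.odds, betting.odds$tip.result)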

Model Performance

The Brier and bits scores, along with overall tipping accuracy, provide a basic summary of each model’s performance.
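A sketch of how that summary could be produced, assuming per-game brier, bits and correct columns, and that the betting data has been bound on as an additional source:

library(dplyr)

# Average Brier, bits and tipping accuracy for each source, best Brier first
summary.table = group_by(predictions, source) %>%
  summarise(Brier = round(mean(brier), 3),
            Bits = round(mean(bits), 3),
            Accuracy = round(100 * mean(correct), 1))
summary.table[order(summary.table$Brier), ]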

##                   source Brier  Bits Accuracy
## 1           Betting Odds 0.201 0.153     69.0
## 2              Aggregate 0.202 0.149     68.1
## 5           Live Ladders 0.202 0.153     68.6
## 9               Squiggle 0.202 0.151     68.3
## 11             Swinburne 0.203 0.148     67.4
## 7        Matter of Stats 0.204 0.146     66.4
## 12               The Arc 0.205 0.139     67.8
## 3  Footy Maths Institute 0.206 0.136     68.6
## 4                  Graft 0.206 0.136     67.6
## 6         Massey Ratings 0.209 0.125     69.3
## 8             PlusSixOne 0.212 0.116     68.4
## 10          Stattraction 0.212 0.117     66.9

A deeper analysis can also be performed by assessing model performance across the three components that make up the Brier score: reliability (how closely the binned forecast probabilities match the observed win frequencies; lower is better), resolution (how much the observed frequencies in each forecast bin differ from the overall base rate; higher is better) and uncertainty (the inherent variance of the outcomes, which doesn’t depend on the forecaster).
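In symbols, with the forecasts grouped into K bins (n_k games in bin k, average forecast f_k and observed win frequency o_k in that bin) and overall base rate o, the standard decomposition that the code below implements is:

$$\text{Brier} = \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k (f_k - \bar{o}_k)^2}_{\text{Reliability}} - \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k (\bar{o}_k - \bar{o})^2}_{\text{Resolution}} + \underbrace{\bar{o}(1 - \bar{o})}_{\text{Uncertainty}}$$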

library(dplyr)

# Create bins for the forecasted probabilities
predictions$conf.bin = cut(predictions$confidence,breaks = seq(50,100,1),right=FALSE)
betting.odds$conf.bin = cut(100*betting.odds$tip.odds,breaks = seq(50,100,1),right=FALSE)
        
# Calculate the average correct outcomes, forecasted odds & total number of observations
# across each bin
data =  as.data.frame(group_by(predictions,conf.bin,source)%>%
          summarise(Result = mean(correct),Forecast = mean(confidence)/100,Count = n()))
         
data.betting =  as.data.frame(group_by(betting.odds,conf.bin)%>%
          summarise(Result = mean(tip.result),Forecast = mean(tip.odds),Count = n()))
        
# Elo model base rate
Base.Rate = group_by(predictions,source)%>%
                summarise(Base.Rate = mean(correct))
         
# Add the overall base rate value to binned data 
data = left_join(data,Base.Rate,by="source")
         
# Betting Odds base rate 
data.betting$Base.Rate =  mean(betting.odds$tip.result)
         
## Reliability, Resolution & Uncertainty Calculations
# Elo models
data$reliability = data$Count*(data$Forecast - data$Result)^2
data$resolution = data$Count*(data$Result - data$Base.Rate)^2
data$uncertainty = data$Base.Rate*(1-data$Base.Rate)
        
table = group_by(data,source)%>% summarise(Reliability = sum(reliability)/sum(Count), 
                                           Resolution = sum(resolution)/sum(Count),
                                           Uncertainty = mean(uncertainty),
                                           Brier = Reliability-Resolution+Uncertainty)
    
# Betting Odds 
data.betting$reliability = data.betting$Count*(data.betting$Forecast - data.betting$Result)^2
data.betting$resolution = data.betting$Count*(data.betting$Result - data.betting$Base.Rate)^2
data.betting$uncertainty = data.betting$Base.Rate*(1-data.betting$Base.Rate)

table.betting = data.betting %>% summarise(source = "Betting Odds", 
                                           Reliability = sum(reliability)/sum(Count), 
                                           Resolution = sum(resolution)/sum(Count),
                                           Uncertainty = mean(uncertainty),
                                           Brier = Reliability-Resolution+Uncertainty)
      
table.Brier = rbind(table, table.betting)
table.Brier[order(table.Brier$Brier), ]
## # A tibble: 12 x 5
##    source                Reliability Resolution Uncertainty Brier
##    <chr>                       <dbl>      <dbl>       <dbl> <dbl>
##  1 Betting Odds               0.0157     0.0288       0.214 0.201
##  2 Squiggle                   0.0193     0.0339       0.216 0.202
##  3 Live Ladders               0.0159     0.0296       0.216 0.202
##  4 Aggregate                  0.0135     0.0285       0.217 0.202
##  5 Swinburne                  0.0179     0.0346       0.220 0.203
##  6 Matter of Stats            0.0143     0.0326       0.223 0.205
##  7 The Arc                    0.0192     0.0325       0.218 0.205
##  8 Footy Maths Institute      0.0122     0.0219       0.216 0.206
##  9 Graft                      0.0144     0.0270       0.219 0.206
## 10 Massey Ratings             0.0243     0.0280       0.213 0.209
## 11 PlusSixOne                 0.0209     0.0254       0.216 0.212
## 12 Stattraction               0.0216     0.0311       0.221 0.212

From the above table, we can see that the Betting Odds model earns its low Brier score through moderate-to-good performance across all three components (the results are easier to see in the web app), whereas the other models only perform well in one or two of them.
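One way such a comparison might be written (a sketch that simply filters the decomposition table above for models beating the betting odds on both reliability and resolution):

# Models with lower reliability (better) and higher resolution (better) than the betting odds
odds = filter(table.Brier, source == "Betting Odds")
table.Brier %>%
  filter(source != "Betting Odds",
         Reliability < odds$Reliability,
         Resolution > odds$Resolution) %>%
  select(source)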

## # A tibble: 1 x 1
##   source         
##   <chr>          
## 1 Matter of Stats

The Matter of Stats model was the only one that outperformed the betting odds on both reliability and resolution over the 2018-2020 period, which suggests that its relatively poor overall accuracy may come from performing slightly worse than expected in the marginal 50-50 games.

Summary

It is not uncommon for various models to beat the market over any one- or two-year period (as can be seen in the aforementioned web app), but it is clear that no one has been able to significantly beat the betting market over the last three seasons (Massey Ratings were only more accurate by 1 tip out of 423 games). However, this doesn’t necessarily detract from the quality of the models listed on Max Barry’s Squiggle site; it is more a demonstration of the relative efficiency of the head-to-head betting market. The NFL Elo model produced by FiveThirtyEight performs similarly when compared to betting data.

Elo ratings are useful when it’s necessary to have an objective ranking system, as in tennis or chess, but if all you want to do is determine the chances of your team winning on the weekend, you might as well look at the bookmaker odds. This is also why footy tipping is largely a game of luck: the most likely results don’t always come to fruition, so there will always be people who perform better (and worse) than expected. Hence, the only winning strategy is to have a fair amount of fortune.