This quick analysis is a simple look at how well the AFL prediction models sourced by Squiggle perform relative to the aggregated betting odds provided by Odds Portal.
I’ve always been rather skeptical about the utility of using Elo models to predict the head-to-head outcome of AFL matches, simply because it’s extremely difficult (if not impossible) to consistently beat an efficient market. This follows from the Wisdom of the Crowds principle: the aggregate prediction of a large, independent and intellectually diverse population will usually be more accurate than that of any single individual, because the noise and certain biases in individual predictions tend to cancel out. It is therefore unlikely that a ratings system, using the same information available to the public, could consistently outperform the consensus opinion.
We can use Squiggle’s publicly available API to load the historical tipping data from all the models listed on the site.
library(jsonlite)
library(dplyr)
library(data.table)
# Squiggle API
data = fromJSON("https://api.squiggle.com.au/?q=tips")
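# Using na.omit from the data.table package; the tips are converted to a
# data.table first so that the cols argument is honoured (the data.frame
# method ignores it and drops rows with an NA in any column)
# Removing predictions with no outcome (2020 games that haven't been played)
predictions = na.omit(as.data.table(data$tips), cols = "correct") %>%
  data.frame()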
Since we are going to compare the predictive performance of these models against the betting data, the sample size needs to be the same. Hence, we should only analyse models that have predicted every game within a specific time span. The number of predictions made by each model across each season can be seen using a pivot table.
##
##                         2017 2018 2019 2020
##   AFL Gains                0    0  207    0
##   AFL Lab                  0    0  207    9
##   AFL_GO                   0    0    0    9
##   AFLalytics               0    0  207    9
##   Aggregate              207  207  207    9
##   Fat Stats                0    0  207    9
##   Footy Maths Institute  207  207  207    9
##   Graft                  207  207  207    9
##   HPN                      0  207  207    0
##   Live Ladders             0  207  207    9
##   Massey Ratings           0  207  207    9
##   Matter of Stats        207  207  207    9
##   PlusSixOne             207  207  198    9
##   Punters                  0    0  207    9
##   Squiggle               207  207  207    9
##   Stattraction             0  207  207    9
##   Swinburne                0  207  207    9
##   The Arc                207  207  207    9
##   The Flag                 0    0    0    9
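The code behind this table is not shown. A cross-tabulation along the following lines would produce a table of that shape. This is only a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical), assuming the `source` and `year` columns used elsewhere in this post.

```r
# Toy stand-in for the `predictions` data frame (values are made up)
toy = data.frame(
  source = c("Model A", "Model A", "Model A", "Model B", "Model B"),
  year   = c(2018, 2019, 2019, 2019, 2019),
  stringsAsFactors = FALSE
)
# Rows are sources, columns are seasons, cells count the tips made
counts = table(toy$source, toy$year)
counts
```

On the real data the equivalent call would be `table(predictions$source, predictions$year)`.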
From the above table it is clear that most of the models were making predictions by 2018, so we’ll filter the data to only include models that have made continuous predictions from 2018 to 2020. However, any combination of years can be analysed through the Shiny app that I made to summarise the results in more depth.
# Specify start and end dates
start = "2018"
end = "2020"
# Create an index from 1 (earliest year) to N (latest year)
# corresponding to the column number for each year
index = data.frame(t(data.frame(colnum = seq_along(unique(predictions$year)))))
names(index) = unique(predictions$year)
# Keep sources that have provided predictions for all the years within the start and end range
keep = names(which(
  colSums(t(table(predictions$sourceid, predictions$year)[, index[, start]:index[, end]] > 0)) ==
    length(index[, start]:index[, end]))) %>%
  as.integer()
predictions = predictions[predictions$sourceid %in% keep & predictions$year >= as.numeric(start), ]
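The filter above boils down to keeping the sources that have at least one tip in every season of the chosen range. The same idea can be sketched more directly with dplyr. The data frame `toy`, its values and the numeric `start.year`/`end.year` are made up for illustration, and this is not the code used for the results in this post.

```r
library(dplyr)

# Made-up tips: A covers 2018-2020, B starts in 2019, C stops after 2018
toy = data.frame(
  source = c("A", "A", "A", "B", "B", "C"),
  year   = c(2018, 2019, 2020, 2019, 2020, 2018),
  stringsAsFactors = FALSE
)
start.year = 2018
end.year   = 2020

# Count the distinct seasons each source tipped in within the range,
# then keep the sources that appear in every one of them
keep = toy %>%
  filter(year >= start.year, year <= end.year) %>%
  group_by(source) %>%
  summarise(seasons = n_distinct(year)) %>%
  filter(seasons == end.year - start.year + 1) %>%
  pull(source)
keep
```

Like the original filter, this only checks that a source tipped in each season, not that it tipped every game.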
To measure the accuracy of a prediction, Squiggle uses a measure called bits, which rewards tipsters who were more confident in a tip that was correct and punishes those who were more confident in a tip that ended up being incorrect. I also decided to calculate the Brier score for each prediction, since this is a far more common way of measuring predictive accuracy (although the two measures are very highly correlated, with an R-squared of ~0.98). Since Brier scores are a measure of error (specifically mean squared error), a lower score is better.
# Convert variables to numeric form
predictions[,c("err","confidence","margin","bits")] = lapply(predictions[,c("err","confidence","margin","bits")],
                                                             as.numeric)
# Create Brier scores
predictions$brier = (predictions$confidence/100 - predictions$correct)^2
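To make the two measures concrete, here is a small worked sketch. The helper functions `bits_score` and `brier_score` are hypothetical names introduced for illustration. The bits formula mirrors the one applied to the betting odds in this post, and it is an assumption that Squiggle's own `bits` column is computed the same way.

```r
# Hypothetical helpers for illustration only
# p       = probability given to the tipped team (between 0.5 and 1)
# correct = 1 if the tip won, 0 if it lost, 0.5 for a draw
bits_score = function(p, correct) {
  ifelse(correct == 1, 1 + log2(p),
  ifelse(correct == 0, 1 + log2(1 - p),
         1 + 0.5 * log2(p * (1 - p))))
}
brier_score = function(p, correct) (p - correct)^2

# A 50% tip earns 0 bits whatever the result, and a Brier score of 0.25
coin.flip.bits  = bits_score(0.5, c(1, 0))
coin.flip.brier = brier_score(0.5, c(1, 0))

# An 80% tip gains less when right than it loses when wrong
confident.bits  = bits_score(0.8, c(1, 0))
confident.brier = brier_score(0.8, c(1, 0))
```

Both measures punish a confident miss more heavily than they reward a confident hit, which is why they tend to rank models similarly.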
The same process is then applied to the Odds Portal betting data, sourced from aussportsbetting.com.
library(readxl)
library(lubridate)
## betting.odds data: http://www.aussportsbetting.com/data/historical-afl-results-and-odds-data/
betting.odds = read_excel("~/Documents/Docs/afl.xlsx", skip = 1) # Check header row first
betting.odds = betting.odds[,1:14] # Keep relevant columns
# Use the same date range as the Elo models
betting.odds = betting.odds %>% filter(year(Date)>=2018)
# Convert head-to-head prices to implied probabilities that sum to 1
betting.odds$Home.confidence =
  (1/betting.odds$`Home Odds`)/((1/betting.odds$`Home Odds`)+(1/betting.odds$`Away Odds`))
betting.odds$Away.confidence = 1 - betting.odds$Home.confidence
# Implied probability of the favourite, which is treated as the tip
betting.odds$tip.odds = apply(betting.odds[, c("Home.confidence","Away.confidence")], 1, max)
# Result of the tip: 1 = correct, 0 = incorrect, 0.5 = draw
# (if both teams are at exactly 50%, the away team is treated as the tip)
betting.odds$tip.result =
  ifelse((betting.odds$`Home Score` - betting.odds$`Away Score` > 0 & betting.odds$Home.confidence > 0.5) |
         (betting.odds$`Home Score` - betting.odds$`Away Score` < 0 & betting.odds$Home.confidence <= 0.5), 1,
  ifelse(betting.odds$`Home Score` - betting.odds$`Away Score` == 0, 0.5, 0))
# Brier score and bits for each tip
# (a draw scores the average of the bits for a correct and an incorrect tip)
betting.odds$brier = (betting.odds$tip.odds - betting.odds$tip.result)^2
betting.odds$bits = ifelse(betting.odds$tip.result == 1, 1 + log(betting.odds$tip.odds, base = 2),
                    ifelse(betting.odds$tip.result == 0, 1 + log(1 - betting.odds$tip.odds, base = 2),
                           1 + 0.5*log(betting.odds$tip.odds*(1 - betting.odds$tip.odds), base = 2)))
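The conversion from prices to probabilities in the code above is worth a quick worked example. Inverting decimal odds gives probabilities that sum to more than 1 because of the bookmaker's margin, and dividing by their sum rescales them to sum to exactly 1. The prices below are made up, and proportional rescaling is only one of several ways to remove the margin.

```r
# Made-up decimal odds for a single match
home.odds = 1.50
away.odds = 2.60

# Raw inverse odds include the bookmaker's margin, so they sum to more than 1
raw = c(home = 1 / home.odds, away = 1 / away.odds)
overround = sum(raw) - 1

# Rescale proportionally so the two probabilities sum to exactly 1
implied = raw / sum(raw)
implied
```

Here the home team's raw inverse price of about 0.667 shrinks to an implied probability of about 0.634 once the margin is removed.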
The Brier and bits scores, as well as overall accuracy, provide a basic summary of each model’s performance.
# Create summary table to display brier scores, bits and overall accuracy
accuracy.models = as.data.frame(group_by(predictions,source)%>%
summarise(Brier = round(mean(brier),3), Bits = round(mean(bits),3),
Accuracy = round(100*mean(correct),1)))
# Draws counted as correct tips
betting.odds$tip.result[betting.odds$tip.result==0.5] = 1
accuracy.betting = data.frame(betting.odds)%>%
summarise(source = "Betting Odds", Brier = round(mean(brier),3),
Bits = round(mean(bits),3), Accuracy = round(100*mean(tip.result),1))
total.accuracy = rbind(accuracy.betting,accuracy.models)
# Order by brier score
total.accuracy[order(total.accuracy$Brier,decreasing = FALSE),]
##                   source Brier  Bits Accuracy
## 1           Betting Odds 0.201 0.153     69.0
## 2              Aggregate 0.202 0.149     68.1
## 5           Live Ladders 0.202 0.153     68.6
## 9               Squiggle 0.202 0.151     68.3
## 11             Swinburne 0.203 0.148     67.4
## 7        Matter of Stats 0.204 0.146     66.4
## 12               The Arc 0.205 0.139     67.8
## 3  Footy Maths Institute 0.206 0.136     68.6
## 4                  Graft 0.206 0.136     67.6
## 6         Massey Ratings 0.209 0.125     69.3
## 8             PlusSixOne 0.212 0.116     68.4
## 10          Stattraction 0.212 0.117     66.9
A deeper analysis can also be performed by assessing model performance across the three components that make up the Brier score. These components are:
Reliability/Calibration: Measures how closely the forecast probabilities match the observed results within discrete prediction intervals (i.e. do teams with 60% odds win 60% of the time?). A lower score is better.
Resolution: Measures how far the observed success rate in each prediction interval deviates from the overall base rate. A higher score is better, as it generally indicates that the model is making bolder predictions.
Uncertainty: The variance of the outcome being predicted, calculated as base rate × (1 − base rate). Because the outcome here is whether the tip was correct, the base rate is the model’s overall accuracy, so a lower value indicates a more accurate model.
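Written out, the three components combine as follows. This is the standard binned decomposition of the Brier score, using notation introduced here for illustration rather than taken from the original analysis.

```latex
\mathrm{BS}
  = \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k \left(f_k - \bar{o}_k\right)^2}_{\text{Reliability}}
  - \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k \left(\bar{o}_k - \bar{o}\right)^2}_{\text{Resolution}}
  + \underbrace{\bar{o}\left(1 - \bar{o}\right)}_{\text{Uncertainty}}
```

Here $N$ is the total number of predictions, $K$ the number of prediction intervals (bins), $n_k$ the number of predictions in bin $k$, $f_k$ the average forecast probability in that bin, $\bar{o}_k$ the share of correct tips in that bin, and $\bar{o}$ the overall share of correct tips (the base rate). The identity is exact only when every forecast within a bin is identical; with wider bins it holds approximately.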
# Create bins for the forecasted probabilities
predictions$conf.bin = cut(predictions$confidence,breaks = seq(50,100,1),right=FALSE)
betting.odds$conf.bin = cut(100*betting.odds$tip.odds,breaks = seq(50,100,1),right=FALSE)
# Calculate the average correct outcomes, forecasted odds & total number of observations
# across each bin
data = as.data.frame(group_by(predictions,conf.bin,source)%>%
summarise(Result = mean(correct),Forecast = mean(confidence)/100,Count = n()))
data.betting = as.data.frame(group_by(betting.odds,conf.bin)%>%
summarise(Result = mean(tip.result),Forecast = mean(tip.odds),Count = n()))
# Elo model base rate
Base.Rate = group_by(predictions,source)%>%
summarise(Base.Rate = mean(correct))
# Add the overall base rate value to binned data
data = left_join(data,Base.Rate,by="source")
# Betting Odds base rate
data.betting$Base.Rate = mean(betting.odds$tip.result)
## Reliability, Resolution & Uncertainty Calculations
# Elo models
data$reliability = data$Count*(data$Forecast - data$Result)^2
data$resolution = data$Count*(data$Result - data$Base.Rate)^2
data$uncertainty = data$Base.Rate*(1-data$Base.Rate)
table = group_by(data,source)%>% summarise(Reliability = sum(reliability)/sum(Count),
Resolution = sum(resolution)/sum(Count),
Uncertainty = mean(uncertainty),
Brier = Reliability-Resolution+Uncertainty)
# Betting Odds
data.betting$reliability = data.betting$Count*(data.betting$Forecast - data.betting$Result)^2
data.betting$resolution = data.betting$Count*(data.betting$Result - data.betting$Base.Rate)^2
data.betting$uncertainty = data.betting$Base.Rate*(1-data.betting$Base.Rate)
table.betting = data.betting %>% summarise(source = "Betting Odds",
Reliability = sum(reliability)/sum(Count),
Resolution = sum(resolution)/sum(Count),
Uncertainty = mean(uncertainty),
Brier = Reliability-Resolution+Uncertainty)
table.Brier = rbind(table,table.betting)
table.Brier[order(table.Brier$Brier),]
## # A tibble: 12 x 5
##    source                Reliability Resolution Uncertainty Brier
##    <chr>                       <dbl>      <dbl>       <dbl> <dbl>
##  1 Betting Odds               0.0157     0.0288       0.214 0.201
##  2 Squiggle                   0.0193     0.0339       0.216 0.202
##  3 Live Ladders               0.0159     0.0296       0.216 0.202
##  4 Aggregate                  0.0135     0.0285       0.217 0.202
##  5 Swinburne                  0.0179     0.0346       0.220 0.203
##  6 Matter of Stats            0.0143     0.0326       0.223 0.205
##  7 The Arc                    0.0192     0.0325       0.218 0.205
##  8 Footy Maths Institute      0.0122     0.0219       0.216 0.206
##  9 Graft                      0.0144     0.0270       0.219 0.206
## 10 Massey Ratings             0.0243     0.0280       0.213 0.209
## 11 PlusSixOne                 0.0209     0.0254       0.216 0.212
## 12 Stattraction               0.0216     0.0311       0.221 0.212
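As a sanity check on the calculation, the decomposition can be reproduced on a handful of made-up predictions. In this sketch (all values hypothetical) the bins are the forecast values themselves, so the three components recombine into the ordinary Brier score exactly. With the 1-percentage-point bins used in this post, forecasts within a bin can differ slightly, so small differences in the third decimal place between the two tables are to be expected.

```r
# Made-up forecasts (probability given to the tipped team)
# and results (1 = tip correct, 0 = tip incorrect)
forecast = c(0.6, 0.6, 0.6, 0.6, 0.6, 0.8, 0.8, 0.8, 0.8, 0.8)
result   = c(1,   0,   1,   0,   1,   1,   1,   1,   1,   1)

# Bin by forecast value, so every prediction in a bin shares one forecast
n.k = tapply(result, forecast, length)    # number of predictions per bin
o.k = tapply(result, forecast, mean)      # share of correct tips per bin
f.k = tapply(forecast, forecast, mean)    # average forecast per bin
base.rate = mean(result)

reliability = sum(n.k * (f.k - o.k)^2) / sum(n.k)
resolution  = sum(n.k * (o.k - base.rate)^2) / sum(n.k)
uncertainty = base.rate * (1 - base.rate)

# The components recombine into the mean squared error of the forecasts
brier = mean((forecast - result)^2)
all.equal(brier, reliability - resolution + uncertainty)
```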
From the decomposition table above, we can see that the Betting Odds model derives its low Brier score from moderate to good performance across each of the three components (the results are easier to see in the web app), whereas the other models only perform well in one or two of them.
# Betting odds reliability and resolution values
Betting.Odds.Reliability = as.numeric( table.Brier[table.Brier$source=="Betting Odds","Reliability"])
Betting.Odds.Resolution = as.numeric(table.Brier[table.Brier$source=="Betting Odds","Resolution"])
# Models with better (lower) reliability and higher resolution than the betting odds
table.Brier[table.Brier$Reliability < Betting.Odds.Reliability &
table.Brier$Resolution > Betting.Odds.Resolution,1]
## # A tibble: 1 x 1
## source
## <chr>
## 1 Matter of Stats
The Matter of Stats model was the only one that outperformed the betting odds on both reliability and resolution in the 2018-2020 period, indicating that its relatively poor overall accuracy may be due to performing slightly worse than expected when predicting the marginal 50-50 games.
It is not uncommon for various models to beat the market in any one- or two-year period (as can be seen in the aforementioned web app), but it is clear that no one has been able to significantly beat the betting market over the last three seasons (Massey Ratings were only more accurate by 1 tip out of 423 games). However, this doesn’t necessarily detract from the quality of the models listed on Max Barry’s Squiggle site, but is more of a demonstration of the relative efficiency of the head-to-head betting market. The NFL Elo model produced by FiveThirtyEight performs similarly when compared to betting data.
Elo ratings are useful when it’s necessary to have an objective ranking system, as in games such as tennis and chess, but if all you want to do is determine the chances of your team winning on the weekend, you might as well look at the bookmaker odds. This is why footy tipping is largely a game of luck: the most likely results don’t always come to fruition, so there will always be people who perform better (and worse) than the expected outcome. Hence, the only winning strategy is to have a fair amount of fortune.