End-of-season forecasting and Dynamic Pricing
J James Reade
Tinbergen Institute, Erasmus School of Economics, Rotterdam
13/04/2015
Introduction
- Bookmaking industry perhaps as ubiquitous as at any point in living memory?
- Deregulation the main driver, but also technology.
- Bookmakers must predict outcomes of events ahead of time:
- Forecast horizons from seconds to years.
- Additionally must update forecasts as new information revealed.
- Further, competitive pressure requires good forecasting.
- Particular bountiful source of events is sport:
- Football second (?) most popular betting event (after horse racing?).
- We investigate forecasts of end-of-season sports league outcomes.
- Such events satisfy following characteristics:
- Complex hierarchical structure.
- Huge interest.
- Massively varying uncertainty.
- Horizon varying from very long to very short.
- We propose simulation-based method of generating forecasts for such complex hierarchical events (end-of-season outcomes).
- Compare forecasts to those of bookmakers.
- We analyse dynamic pricing of bookmakers:
- Who offers best prices?
- How often do bookmakers update?
- What drives updates when they occur?
- Hugely preliminary work, although highly enjoyable.
- All comments/suggestions very welcome.
Plan of talk
- Review of relevant literature.
- Methodology of end-of-season forecast model.
- Data: For model and from bookmakers.
- Evaluate the model for individual matches.
- Evaluate model for end-of-season outcomes.
- Investigate dynamic pricing behaviour of bookmakers.
Literature
- Economic evaluations of bookmaker activity commonplace:
- E.g. Shin (1992), Vaughan Williams and Paton (1997),
- Analysis of complex hierarchical events such as end-of-season outcomes less so.
- Textbook treatment in, for example, Hyndman and Athanasopoulos (2012).
- Analysis of pricing naturally much more extensive:
- Basic competition models (Cournot, Bertrand) seem likely inadequate.
- Shin (1991), Shin (1992) and Shin (1993) attempt to capture theory of pricing for bookmakers.
- Attempts to explain commonly found pricing biases in bookmaking.
- More generally of interest to understand price movements:
- Why do firms change prices? Competition or new information or…?
Model: Methodology
- Multiple methods exist for modelling football match outcomes:
- Modelling goal outcomes: Bivariate Poission methods, e.g. Karlis and Ntzoufras (2003).
- Direct outcome modelling: Limited dependent variable methods, e.g. Forrest and Simmons (2000).
- Goddard and Asimakopoulos (2004) compares and contrasts methods.
- We adopt direct outcome modelling:
- Not averse to switching, decision made more for practical/familiarity reasons.
- Dependent variable: \[
y_{it} = \left\{\begin{array}{lll}0 && \text{if match $i$ at time $t$ results in away win,}\\0.5 && \text{if match $i$ at time $t$ results in draw,}\\1 && \text{if match $i$ at time $t$ results in home win,}\end{array}\right.
\]
- Explanatory variables (
.D is difference between home and away side, .2 is squared variable):
E.1: Elo prediction for match, based on Elo scores calculated since 1877.
pts1, pts.D pts.D.2: League points.
pld1, pld.D, pld.D.2: League matches played.
gs1, gs.D, gs.D.2: Goals scored.
gd1, gd.D, gd.D.2: Goal difference.
pos1, pos.D, pos.D.2: League position.
form1, form.D, form.D.2: Form (sum of points in last six matches).
- Season fixed effects.
- Use ordered logistic model.
End-of-season: Methodology
- Simulation method to generate probabilities of season outcomes:
- Use estimated model up to start of season.
- Use results up to starting point to update explanatory variables.
- Generate probabilistic forecasts for each match on first matchday.
- Use multinominal distribution to generate outcomes according to forecast probabilities.
- Update table according to outcomes generated.
- Use average goals scored/conceded in draws/home wins/away wins.
- Update Elo scores for teams based on imputed outcomes.
- Generate probabilistic forecasts for each match on next matchday and repeat process until end of season.
- Run sufficiently many replications to generate frequentist probabilities of outcomes.
End-of-season: Example
- Forecasts from yesterday’s matches:
loc <- "/home/readejj/Dropbox/Research/Sport/managerial change Cormac/"
loc2 <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
library(knitr)
dates <- c("2015-01-30","2015-02-06","2015-02-13","2015-02-20","2015-02-27","2015-03-06","2015-03-13","2015-03-20","2015-04-10")
date.1 <- dates[NROW(dates)]
forecast.matches <- read.csv(paste(loc2,"forecasts_",date.1,".csv",sep=""),stringsAsFactors=F)
div <- "English Premier"
matches <- forecast.matches[forecast.matches$division==div,]
matches <- matches[order(matches$date),]
matches$id <- 1:NROW(matches)
par(mar=c(9,4,4,5)+.1)
plot(matches$id,matches$outcome,xaxt="n",xlab="",ylim=range(0,1),
main=paste("Forecasts of Weekend ",div," Matches",sep=""),
ylab="Probability of Outcome",col="white")
lines(matches$id,matches$Ph,col=2,pch=15,type="p")
lines(matches$id,matches$Pd,col=3,pch=16,type="p")
lines(matches$id,matches$Pa,col=4,pch=17,type="p")
legend("topleft",ncol=3,pch=c(15,16,17),col=c(2:4),
legend=c("OL (home)","OL (draw)","OL (away)"),bty="n")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
axis(1,at=matches$id,labels=paste(matches$team1,matches$team2,sep=" v "),las=2,cex.axis=0.65)
for(i in 2:NROW(matches)){
if(matches$date[i]!=matches$date[i-1]) {
lines(rep(c(i-0.5),2),c(0,1),lty=2)
}
}

- Generate outcomes for these matches using multinomial distribution:
for(i in 1:NROW(matches)) {
matches$outcome[i] <- c(1,0.5,0) %*% rmultinom(n = 1, size=1,
prob = matches[i,c("Ph","Pa","Pd")])
}
kable(matches[,c("team1","team2","outcome")])
| 3 |
Swansea |
Everton |
0.0 |
| 4 |
Tottenham |
Aston Villa |
1.0 |
| 5 |
Southampton |
Hull |
1.0 |
| 6 |
West Ham |
Stoke |
0.0 |
| 7 |
West Brom |
Leicester |
1.0 |
| 8 |
Sunderland |
C Palace |
0.5 |
| 9 |
Burnley |
Arsenal |
0.5 |
| 110 |
QPR |
Chelsea |
0.5 |
| 111 |
Man Utd |
Man City |
0.5 |
| 113 |
Liverpool |
Newcastle |
0.5 |
- Update goals scored/conceded with averages, update league tables, Elo scores.
- Generate forecasts for next weekend’s matches using updated data.
- Carry on until end of season and log final positions for teams.
- Repeat sufficiently many times to generate distribution.
Data: Model
res.eng <- read.csv(paste(loc,"res-eng.csv",sep=""),stringsAsFactor=F)
res.eng$goals1 <- as.numeric(res.eng$goals1)
res.eng$goals2 <- as.numeric(res.eng$goals2)
res.eng$tier <- as.numeric(res.eng$tier)
res.eng$season <- as.numeric(res.eng$season)
res.eng$X <- NULL
match.dates.2013 <- res.eng$date[res.eng$season==2013]
match.dates.2013 <- match.dates.2013[duplicated(match.dates.2013)==F]
match.dates.2013 <- match.dates.2013[order(match.dates.2013)]
match.dates.2014 <- res.eng$date[res.eng$season==2014]
match.dates.2014 <- match.dates.2014[duplicated(match.dates.2014)==F]
match.dates.2014 <- match.dates.2014[order(match.dates.2014)]
- Data collected from http://www.soccerbase.com:
- Every English football match between 1888-09-08 and 2015-05-24.
- Calculate league positions/details and Elo scores:
elo.prem <- read.csv(paste(loc2,"elo-prem-2010.csv",sep=""),stringsAsFactor=F)
epl.teams <- res.eng$team1[res.eng$season==2014 & res.eng$tier==1]
epl.teams <- c("Liverpool","Burnley","Stoke","QPR","Newcastle","Arsenal","West Ham","West Brom","Man Utd","Leicester","Chelsea","Aston Villa","Everton","Sunderland","Tottenham","Hull","Swansea","Man City","C Palace","Southampton")
epl.team.cols <- c("red3","purple","pink","lightblue","black","darkred","purple2","darkblue","red","blue","darkblue","purple3","blue2","hotpink","grey10","orange","grey20","skyblue","red4","pink3")
plot(range(as.Date(elo.prem$date),na.rm=T),rep(1,2),type="l",
ylim=range(elo.prem[gsub(" ",".",epl.teams)],na.rm=T),
main="Elo Scores for Current EPL Teams since 2010",ylab="Elo score",xlab="Date")
for(t in 1:NROW(epl.teams)) {
lines(as.Date(elo.prem$date),elo.prem[,gsub(" ",".",epl.teams[t])],type="l",col=epl.team.cols[t])
}

library(MASS)
model.ord <- polr(as.factor(outcome) ~ E.1 + pts1 + pts.D + pts.D.2 + pld1 + pld.D + pld.D.2 +
gs1 + gs.D + gs.D.2 + gd1 + gd.D + gd.D.2 + pos1 + pos.D + pos.D.2 +
form1 + form.D + form.D.2 + season.d,
data=res.eng, method = "logistic")
options(scipen=13)
#summary(model.ord)
kable(summary(model.ord)$coef, digits=3)
##
## Re-fitting to get Hessian
| E.1 |
2.379 |
0.019 |
128.119 |
| pts1 |
0.007 |
0.005 |
1.627 |
| pts.D |
-0.014 |
0.003 |
-4.306 |
| pts.D.2 |
0.000 |
0.000 |
-0.972 |
| pld1 |
-0.012 |
0.006 |
-1.935 |
| pld.D |
0.024 |
0.010 |
2.365 |
| pld.D.2 |
0.007 |
0.004 |
1.575 |
| gs1 |
0.003 |
0.002 |
1.700 |
| gs.D |
0.001 |
0.002 |
0.801 |
| gs.D.2 |
0.000 |
0.000 |
0.046 |
| gd1 |
-0.005 |
0.003 |
-2.087 |
| gd.D |
0.017 |
0.002 |
9.023 |
| gd.D.2 |
0.000 |
0.000 |
-0.010 |
| pos1 |
0.005 |
0.003 |
1.407 |
| pos.D |
0.008 |
0.003 |
2.905 |
| pos.D.2 |
0.000 |
0.000 |
2.502 |
| form1 |
0.007 |
0.004 |
1.974 |
| form.D |
-0.013 |
0.003 |
-4.388 |
| form.D.2 |
-0.001 |
0.000 |
-2.326 |
| season.d |
-0.003 |
0.000 |
-11.728 |
| 0|0.5 |
0.127 |
0.033 |
3.807 |
| 0.5|1 |
1.271 |
0.033 |
38.085 |
Data: Bookmakers
source("/home/readejj/Dropbox/Research/Code/R/betting/clean.data.R")
bks <- c("B3", "SK", "BX", "BY", "FR", "SO", "VC", "PP", "SJ", "EE", "LD", "CE",
"WH", "WN", "SX", "FB", "WA", "TI", "UN", "BW", "RD", "BF", "BD", "MA")
bks.full <- c("Bet365", "SkyBet", "Totesport", "Boyle Sports", "Betfred", "Sportingbet", "Bet Victor", "Paddy Power", "Stan James", "888sport", "Ladbrokes", "Coral",
"William Hill", "Winner", "Spreadex", "Betfair-fixed", "Betway", "Titanbet", "Unibet", "bwin", "32Red", "Betfair-exchange", "Betdaq", "Matchbook")
direc <- c("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/winner")
winner <- data.frame(stringsAsFactors=F)
teams <- c(paste(direc,dir(direc,pattern="*.csv"),sep="/"))
for (team in teams) {
temp <- read.csv(team,stringsAsFactors=F)
temp <- clean.data(temp)
temp$team <- gsub("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/winner/(\\S+).csv","\\1",team)
winner <- rbind(winner,temp)
}
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
winner <- winner[order(winner$team,winner$Date.Time),]
direc <- c("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/relegation")
releg <- data.frame(stringsAsFactors=F)
teams <- c(paste(direc,dir(direc,pattern="*.csv"),sep="/"))
for (team in teams) {
temp <- read.csv(team,stringsAsFactors=F)
temp <- clean.data(temp)
temp$team <- gsub("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/relegation/(\\S+).csv","\\1",team)
releg <- rbind(releg,temp)
}
releg <- releg[order(releg$team,releg$Date.Time),]
- Data collected from http://www.oddschecker.com weekly.
- Odds on individual match outcomes and end-of-season outcomes over 20 bookmakers.
- All files contain price history of bookmakers.
- E.g. EPL winner file dates from 2014-03-18 to 2015-04-08.
- E.g. EPL relegation file dates from 2014-04-08 to 2015-04-08.
- Individual match betting data has shorter time horizons.
Model: Appraisal against individual events
- Model has been used to construct ex ante forecasts in recent weeks.
- Forecasts for all professional domestic English football matches.
- To evaluate quality of forecasts can run Mincer-Zarnowitz test.
- Regression of outcomes \(y_{it}\) on model forecasts \(\widehat{y}_{mit}\) plus constant: \[
y_{it} = \alpha + \beta \widehat{y}_{mit} + e_{it}.
\]
- Unbiased forecasts: \(\alpha=0\) and \(\beta=1\), with “well-behaved” residuals.
all.forecast.outcomes <- data.frame()
for(i in dates) {
temp.0 <- read.csv(paste(loc2,"forecast_outcomes_",i,".csv",sep=""),stringsAsFactors=F)
temp.0$X <-NULL
temp.0$forc.week <- i
temp.1 <- read.csv(paste(loc2,"forecasts_",i,".csv",sep=""))
temp.1$X <-NULL
temp.1 <- temp.1[is.na(temp.1$outcome)==F,]
if(!("Ph" %in% colnames(temp.1))) {
temp.1$Ph <- NA
temp.1$Pd <- NA
temp.1$Pa <- NA
}
if(!("tier" %in% colnames(temp.0))) {
temp.0$tier <- NA
}
temp.2 <- merge(temp.1[,c("match_id","outcome","Ph","Pd","Pa")],
temp.0[,c("match_id","date","division","team1",
"goals1","goals2","team2","outcome",
"season","tier","forc.week")],
by=c("match_id"),suffixes=c(".forc",".final"))
all.forecast.outcomes <- rbind(temp.2[is.na(temp.2$outcome.final)==F,],all.forecast.outcomes)
}
all.forecast.outcomes$outcome.h <- as.numeric(all.forecast.outcomes$outcome.final==1)
mz.h <- lm(outcome.h ~ Ph,data=all.forecast.outcomes)
#summary(mz.h)
kable(summary(mz.h)$coef, digits=3)
| (Intercept) |
0.095 |
0.073 |
1.309 |
0.191 |
| Ph |
0.733 |
0.153 |
4.793 |
0.000 |
all.forecast.outcomes$outcome.d <- as.numeric(all.forecast.outcomes$outcome.final==0.5)
mz.d <- lm(outcome.d ~ Pd,data=all.forecast.outcomes)
#summary(mz.d)
kable(summary(mz.d)$coef, digits=3)
| (Intercept) |
0.061 |
0.155 |
0.396 |
0.692 |
| Pd |
0.714 |
0.598 |
1.194 |
0.233 |
all.forecast.outcomes$outcome.a <- as.numeric(all.forecast.outcomes$outcome.final==0)
mz.a <- lm(outcome.a ~ Pa,data=all.forecast.outcomes)
#summary(mz.a)
kable(summary(mz.a)$coef, digits=3)
| (Intercept) |
0.081 |
0.053 |
1.510 |
0.132 |
| Pa |
0.854 |
0.170 |
5.011 |
0.000 |
calib.h <- aggregate(all.forecast.outcomes$outcome.h,by=list(round(all.forecast.outcomes$Ph,2)),FUN=mean)
plot(calib.h$Group.1,calib.h$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="Calibration plot for home win outcomes")
abline(0,1,lty=3)

calib.d <- aggregate(all.forecast.outcomes$outcome.d,by=list(round(all.forecast.outcomes$Pd,2)),FUN=mean)
plot(calib.d$Group.1,calib.d$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="Calibration plot for draw outcomes")
abline(0,1)

calib.a <- aggregate(all.forecast.outcomes$outcome.a,by=list(round(all.forecast.outcomes$Pa,2)),FUN=mean)
plot(calib.a$Group.1,calib.a$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="Calibration plot for away win outcomes")
abline(0,1)

- Regression and plots suggest quality of model unspectacular.
- Nonetheless important to compare against benchmark of bookmakers.
Comparison: Bookmakers on individual events
- Bookmakers represent benchmark in forecasting match outcomes.
- We can run same Mincer-Zarnowitz test to evaluate quality.
- Regression of outcomes \(y_{it}\) on forecasts \(\widehat{y}_{bit}\) plus constant: \[
y_{it} = \alpha + \beta \widehat{y}_{bit} + e_{it}.
\]
- Unbiased forecasts: \(\alpha=0\) and \(\beta=1\), with “well-behaved” residuals.
- Commonly discovered favourite-longshot bias implies \(\beta>1\).
bk <- data.frame()
dates <- dir("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/",
pattern="\\d{4}-\\d{2}-\\d{2}")
for(d in dates) {
loc0 <- paste("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/",d,"/",sep="")
divs <- dir(loc0)
for(div in divs) {
loc2 <- paste(loc0,div,"/",sep="")
bk.matches <- dir(loc2,pattern=".*?-v-.*?-.*?[.]csv")
for(i in bk.matches) {
temp <- read.csv(paste(loc2,i,sep="/"),stringsAsFactors=F)
if(NROW(temp)>=2) {
temp <- clean.data(temp)
temp$div <- div
temp$match.event <- gsub("[.]csv","",i)
bk <- rbind(bk,temp)
}
}
}
}
bk <- bk[order(bk$match.event,bk$Date.Time),]
bk$mean <- 1/rowMeans(bk[colnames(bk)[nchar(colnames(bk))==2]],na.rm=T)
bk$match.event <- gsub("sheffield-wednesday","sheff wed",bk$match.event)
bk$match.event <- gsub("nottingham-forest","nottm forest",bk$match.event)
bk$match.event <- gsub("middlesbrough","middlesbro",bk$match.event)
bk$match.event <- gsub("port-vale","port vale",bk$match.event)
bk$match.event <- gsub("man-utd","man utd",bk$match.event)
bk$match.event <- gsub("man-city","man city",bk$match.event)
bk$match.event <- gsub("aston-villa","aston villa",bk$match.event)
bk$match.event <- gsub("west-brom","west brom",bk$match.event)
bk$match.event <- gsub("west-ham","west ham",bk$match.event)
bk$match.event <- gsub("crystal-palace","c palace",bk$match.event)
bk$match.event <- gsub("sheffield-utd","sheff utd",bk$match.event)
bk$match.event <- gsub("cambridge-utd","cambridge u",bk$match.event)
bk$match.event <- gsub("dagenham-redbridge","dag & red",bk$match.event)
bk$match.event <- gsub("mk-dons","mk dons",bk$match.event)
bk$match.event <- gsub("notts-county","notts co",bk$match.event)
bk$match.event <- gsub("newport-county","newport co",bk$match.event)
bk$match.event <- gsub("afc-wimbledon","afc w'bledon",bk$match.event)
bk$match.event <- gsub("york-city","york",bk$match.event)
bk$team1 <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\1",bk$match.event)
bk$team2 <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\2",bk$match.event)
bk$match.event <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\3",bk$match.event)
bk.h <- bk[bk$match.event==bk$team1,]
bk.d <- bk[bk$match.event=="draw",]
bk.a <- bk[bk$match.event==bk$team2,]
all.forecast.outcomes$team1 <- tolower(all.forecast.outcomes$team1)
all.forecast.outcomes$team2 <- tolower(all.forecast.outcomes$team2)
all.forecast.outcomes <- all.forecast.outcomes[duplicated(all.forecast.outcomes)==F,]
bk.h <- merge(bk.h,all.forecast.outcomes[,c("match_id","team1","team2","outcome.h","Ph")],
by=c("team1","team2"),all.x=T)
bk.mz.h <- lm(outcome.h ~ mean,data=bk.h)
#summary(bk.mz.h)
kable(summary(bk.mz.h)$coef, digits=3)
| (Intercept) |
0.113 |
0.026 |
4.346 |
0 |
| mean |
0.486 |
0.044 |
11.130 |
0 |
bk.calib.h <- aggregate(bk.h$outcome.h,by=list(round(bk.h$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.h$Group.1,bk.calib.h$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="BK Calibration plot for home win outcomes")
abline(0,1,lty=3)

bk.d <- merge(bk.d,all.forecast.outcomes[,c("match_id","team1","team2","outcome.d","Pd")],
by=c("team1","team2"),all.x=T)
bk.mz.d <- lm(outcome.d ~ mean,data=bk.d)
#summary(bk.mz.d)
kable(summary(bk.mz.d)$coef, digits=3)
| (Intercept) |
0.200 |
0.018 |
11.291 |
0.000 |
| mean |
0.112 |
0.038 |
2.953 |
0.003 |
bk.calib.d <- aggregate(bk.d$outcome.d,by=list(round(bk.d$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.d$Group.1,bk.calib.d$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="BK Calibration plot for draw outcomes")
abline(0,1,lty=3)

bk.a <- merge(bk.a,all.forecast.outcomes[,c("match_id","team1","team2","outcome.a","Pa")],
by=c("team1","team2"),all.x=T)
bk.mz.a <- lm(outcome.a ~ mean,data=bk.a)
#summary(bk.mz.a)
kable(summary(bk.mz.a)$coef, digits=3)
| (Intercept) |
0.278 |
0.020 |
13.756 |
0 |
| mean |
0.220 |
0.039 |
5.639 |
0 |
bk.calib.a <- aggregate(bk.a$outcome.a,by=list(round(bk.a$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.a$Group.1,bk.calib.a$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="BK Calibration plot for away win outcomes")
abline(0,1,lty=3)

bk.h$diff <- bk.h$mean - bk.h$Ph
bk.mod.mz.h <- lm(outcome.h ~ mean + diff,data=bk.h)
#summary(bk.mod.mz.h)
kable(summary(bk.mod.mz.h)$coef, digits=3)
| (Intercept) |
-0.137 |
0.031 |
-4.456 |
0 |
| mean |
1.126 |
0.062 |
18.169 |
0 |
| diff |
-0.943 |
0.067 |
-13.979 |
0 |
bk.d$diff <- bk.d$mean - bk.d$Pd
bk.mod.mz.d <- lm(outcome.d ~ mean + diff,data=bk.d)
#summary(bk.mod.mz.d)
kable(summary(bk.mod.mz.d)$coef, digits=3)
| (Intercept) |
-0.141 |
0.065 |
-2.167 |
0.03 |
| mean |
1.481 |
0.255 |
5.815 |
0.00 |
| diff |
-1.401 |
0.258 |
-5.434 |
0.00 |
bk.a$diff <- bk.a$mean - bk.a$Pa
bk.mod.mz.a <- lm(outcome.a ~ mean + diff,data=bk.a)
#summary(bk.mod.mz.a)
kable(summary(bk.mod.mz.a)$coef, digits=3)
| (Intercept) |
-0.012 |
0.024 |
-0.483 |
0.629 |
| mean |
1.297 |
0.067 |
19.463 |
0.000 |
| diff |
-1.345 |
0.070 |
-19.193 |
0.000 |
- Model provides additional predictive content above bookmakers for explaining outcomes.
- Graphically:
plot(calib.h$Group.1,calib.h$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="Calibration plot for home win outcomes")
lines(bk.calib.h$Group.1,bk.calib.h$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),lty=1,bty="n")

plot(calib.d$Group.1,calib.d$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="Calibration plot for draw outcomes")
lines(bk.calib.d$Group.1,bk.calib.d$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),lty=1,bty="n")

plot(calib.a$Group.1,calib.a$x,xlim=range(0,1),ylim=range(0,1),
ylab="Frequency of events that occurred",xlab="Forecast probability",
main="Calibration plot for away win outcomes")
lines(bk.calib.a$Group.1,bk.calib.a$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),lty=1,bty="n")

Model: End-of-season outcomes 2013
- Given reasonable (relative) forecast performance, we simulate end-of-season outcomes.
- First we consider the 2013–2014 season (outcomes known).
- Forecasts updated for each day matches played (Saturday, Sunday, Monday, etc).
forecast.days.2013 <- data.frame()
dloc <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
days <- dir(dloc,pattern="prem-final-*\\d*-\\d+[.]csv")
for( d in days) {
temp <- read.csv(paste(dloc,d,sep=""),stringsAsFactors=F)
temp <- temp[is.na(temp$Man.Utd)==F,]
temp$day <- as.Date(as.numeric(gsub("prem-final-*\\d*-(\\d+)[.]csv","\\1",d)),origin="1970-01-01")
forecast.days.2013 <- rbind(forecast.days.2013,temp)
}
forc.probs.2013.1 <- aggregate(forecast.days.2013==1,by=list(forecast.days.2013$day),FUN=mean)
forc.probs.2013.rel <- aggregate(forecast.days.2013>=18,by=list(forecast.days.2013$day),FUN=mean)
plot(forc.probs.2013.1$Group.1,forc.probs.2013.1$Chelsea,ylim=range(0,1),type="o",col="blue",
main="Model forecasts for EPL Title 2013-14",
ylab="Probability of Winning EPL",xlab="Date")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Man.City,type="o",col="skyblue")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Man.Utd,type="o",col="red")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Arsenal,type="o",col="darkred")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Liverpool,type="o",col="red2")

plot(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Cardiff,ylim=range(0,1),type="o",col="red",
main="Model forecasts for EPL Relegation 2013-14",
ylab="Probability of Relegation from EPL",xlab="Date")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Norwich,type="o",col="yellow")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Fulham,type="o",col="black")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Aston.Villa,type="o",col="purple")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$West.Brom,type="o",col="darkblue")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$West.Ham,type="o",col="purple3")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Sunderland,type="o",col="pink")

- Ocular econometrics: Appears to track events well.
- Liverpool’s late fall, Sunderland’s late spurt.
- Little really to evaluate against, with only few (20) outcomes.
- Additionally, uncertainty remained great even towards final day.
Model: End-of-season outcomes 2014
forecast.days <- data.frame()
dloc <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
days <- dir(dloc,pattern="prem-final-2014-\\d-(\\d+).csv")
for( d in days) {
temp <- read.csv(paste(dloc,d,sep=""),stringsAsFactors=F)
temp$day <- as.Date(as.numeric(gsub("prem-final-2014-\\d+-(\\d+).csv","\\1",d)),origin="1970-01-01")
forecast.days <- rbind(forecast.days,temp)
}
forc.probs.1 <- aggregate(forecast.days==1,by=list(forecast.days$day),FUN=mean)
forc.probs.rel <- aggregate(forecast.days>=18,by=list(forecast.days$day),FUN=mean)
plot(forc.probs.1$Group.1,forc.probs.1$Chelsea,ylim=range(0,1),type="o",col="blue",
main="Comparison - Model and Bookmakers, Chelsea",
ylab="Probability of Winning EPL",xlab="Date")
lines(forc.probs.1$Group.1,forc.probs.1$Man.City,type="o",col="skyblue")
lines(forc.probs.1$Group.1,forc.probs.1$Man.Utd,type="o",col="red")
lines(forc.probs.1$Group.1,forc.probs.1$Arsenal,type="o",col="darkred")
lines(forc.probs.1$Group.1,forc.probs.1$Liverpool,type="o",col="red2")

plot(forc.probs.rel$Group.1,forc.probs.rel$Leicester,ylim=range(0,1),type="o",col="blue",
main="Comparison - Model and Bookmakers, Leicester",
ylab="Probability of Relegation",xlab="Date")
lines(forc.probs.rel$Group.1,forc.probs.rel$Burnley,type="o",col="purple")
lines(forc.probs.rel$Group.1,forc.probs.rel$QPR,type="o",col="lightblue")
lines(forc.probs.rel$Group.1,forc.probs.rel$Sunderland,type="o",col="pink")

Comparison: Bookmaker end-of-season outcomes
- Bookmaker prices for end-of-season outcomes collected for 2014–2015.
- Enables comparison between the two.
- No outcomes to compare against, nonetheless.
teams.1 <- winner$team[duplicated(winner$team)==F]
team.cols.1 <- c("darkred","blue","red2","skyblue","red","pink","black")
plot(range(winner$Date.Time),rep(0,2),ylim=range(0,1),col="white",ylab="Probability",xlab="",
main="Bookmaker Implied Probabilities for EPL Title",xaxt="n")
axis(1,at=seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),
labels=format(seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),"%b-%Y"),las=2)
for(t in 1:NROW(teams.1)) {
for(b in 1:NROW(bks)) {
lines(winner$Date.Time[winner$team==teams.1[t]],1/winner[winner$team==teams.1[t],bks[b]],col=team.cols.1[t])
}
}
legend("topleft",ncol=1,lty=1,col=team.cols.1,legend=teams.1,bty="n")
for(date in match.dates.2014) {
abline(v=as.Date(date),lty=3)
}
for(date in match.dates.2013[-NROW(match.dates.2013)]) {
abline(v=as.Date(date,origin="1970-01-01"),lty=3,col="grey")
}

teams.r <- releg$team[duplicated(releg$team)==F]
team.cols.r <- c("purple","mediumpurple1","red","blue","orange","darkblue","black","lightblue","pink","hotpink",
"black","darkblue","purple3")
plot(range(releg$Date.Time),rep(0,2),ylim=range(0,1),col="white",ylab="Probability",xlab="",
main="Bookmaker Implied Probabilities for Relegation",xaxt="n")
axis(1,at=seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),
labels=format(seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),"%b-%Y"),las=2)
for(t in 1:NROW(teams)) {
for(b in 1:NROW(bks)) {
lines(releg$Date.Time[releg$team==teams.r[t]],1/releg[releg$team==teams.r[t],bks[b]],col=team.cols.r[t])
}
}
legend("topleft",ncol=1,lty=1,col=team.cols.r,legend=teams.r,bty="n")
for(date in match.dates.2014) {
abline(v=as.Date(date),lty=3)
}

Model vs Bookmakers
plot(forc.probs.1$Group.1,forc.probs.1$Chelsea,ylim=range(0,1),type="o",col="blue",
main="Comparison - Model and Bookmakers, Chelsea",
ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
lines(winner$Date.Time[winner$team=="chelsea"],1/winner[winner$team=="chelsea",bks[b]],col="blue")
}

plot(forc.probs.1$Group.1,forc.probs.1$Man.City,ylim=range(0,1),type="o",col="skyblue",
main="Comparison - Model and Bookmakers, Man City",
ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
lines(winner$Date.Time[winner$team=="man-city"],1/winner[winner$team=="man-city",bks[b]],col="skyblue")
}

plot(forc.probs.1$Group.1,forc.probs.1$Man.Utd,ylim=range(0,1),type="o",col="red",
main="Comparison - Model and Bookmakers, Man United",
ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
lines(winner$Date.Time[winner$team=="man-utd"],1/winner[winner$team=="man-utd",bks[b]],col="red")
}

plot(forc.probs.rel$Group.1,forc.probs.rel$Leicester,ylim=range(0,1),type="o",col="blue",
main="Comparison - Model and Bookmakers, Leicester",
ylab="Probability of Relegation",xlab="Date")
for(b in 1:NROW(bks)) {
lines(releg$Date.Time[releg$team=="leicester"],1/releg[releg$team=="leicester",bks[b]],col="blue")
}

plot(forc.probs.rel$Group.1,forc.probs.rel$Burnley,ylim=range(0,1),type="o",col="purple",
main="Comparison - Model and Bookmakers, Burnley",
ylab="Probability of Relegation",xlab="Date")
for(b in 1:NROW(bks)) {
lines(releg$Date.Time[releg$team=="burnley"],1/releg[releg$team=="burnley",bks[b]],col="purple")
}

Discussion
- Casual comparison suggests little difference between “forecasts”.
- Suggests method effective.
- Further improvements undoubtedly possible in underlying statistical model.
- But…
- Absence of outcomes to compare against.
- Deeply uncertain event being forecast.
Analysis of Bookmakers
- Bookmakers offer identical product often for different price.
- Value of product varies with information, however.
- Why do customers not all go to bookmaker offering best price?
- No information on customer loyalty offers.
- Who offers best price most often?
- What prompts bookmakers to change prices?
- Analysis suggests matches must be most important driver of price changes.
- To what extent do competitive pressures influence updating?
Frequency of Price Changes
- First need to create balanced daily panel:
winner.full <- data.frame(stringsAsFactors=F)
for(t in teams.1) {
temp <- data.frame("Date.Time"=seq(min(winner$Date.Time),max(winner$Date.Time),by="days"),
"team"=t,stringsAsFactors=F)
winner.full <- rbind(winner.full,temp)
}
winner.full <- merge(winner.full,winner,by=c("Date.Time","team"),all.x=T)
winner.full <- winner.full[order(winner.full$team,winner.full$Date.Time),]
for(t in teams.1) {
winner.full[winner.full$team==t,bks] <- na.locf(winner.full[winner.full$team==t,bks],na.rm=F)
}
winner.full$matchday <- as.numeric(winner.full$Date.Time %in% as.Date(c(match.dates.2013,match.dates.2014)))
winner.full <- winner.full[order(winner.full$team,winner.full$Date.Time),]
for(b in bks) {
winner.full[paste(b,1,sep=".")] <- c(-999,winner.full[-NROW(winner.full),b])
}
winner.full$team.1 <- c("na",winner.full$team[-NROW(winner.full)])
for(b in bks) {
winner.full[paste(b,"d",sep=".")] <- as.numeric((winner.full[b] != winner.full[paste(b,1,sep=".")]) & (winner.full$team==winner.full$team.1))
}
- Who offers best price most often?
best.p <- apply(winner.full[,nchar(colnames(winner.full))==2],1,FUN=max,na.rm=T)
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
best.full <- colSums(winner.full[,nchar(colnames(winner.full))==2]==best.p,na.rm=T)
- Betfair more than 25% of time (for EPL winner).
- Paddy Power, Bet Victor and Betfair Sportsbook and Coral most competitive bookmakers.
- Some (Stan James, Titanbet, Betfred) very uncompetitive.
- How frequently do bookmakers change prices?
100*colMeans(winner.full[grep("[.]d",colnames(winner.full))],na.rm=T)
## B3.d SK.d BX.d BY.d FR.d SO.d VC.d PP.d
## 12.64007 10.98592 12.19872 11.98630 11.27717 11.48497 19.10064 12.89675
## SJ.d EE.d LD.d CE.d WH.d WN.d SX.d FB.d
## 12.75294 12.26941 15.63461 12.20811 10.45611 14.50313 11.40351 13.28060
## WA.d TI.d UN.d BW.d RD.d BF.d BD.d MA.d
## 13.25301 16.70468 12.14071 10.25311 14.35986 17.02713 30.60697 29.63585
- Note: 26.3565891% of days in sample are matchdays.
- But is simple existence of matchday what determines price change?
- What if other bookmakers change prices?
- What if that teams loses/wins?
- What if that particular team doesn’t play?
long.winner.full <- read.csv(paste(loc,"long-winner-full.csv",sep=""),stringsAsFactors=F)
bk.reg <- lm(bk.d ~ matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full)
#summary(bk.reg)
kable(summary(bk.reg)$coef, digits=3)
| (Intercept) |
0.017 |
0.002 |
10.139 |
0.000 |
| matchday |
0.058 |
0.005 |
12.819 |
0.000 |
| bk.oth.d |
0.032 |
0.000 |
66.520 |
0.000 |
| bk.oth.d.1 |
0.001 |
0.000 |
3.315 |
0.001 |
| team.play |
0.075 |
0.010 |
7.315 |
0.000 |
| team.win |
-0.004 |
0.011 |
-0.414 |
0.679 |
| team.lose |
0.036 |
0.013 |
2.803 |
0.005 |
- Aggregate results suggest matchdays important:
- Increase likelihood of odds change by 6%.
- If that team playing that day, further 7% increment.
- Actual match outcome less important.
- Competitive pressures matter also:
- Each extra bookie changing odds increases likelihood of change by 3%.
- Endogeneity an issue with other bookmaker changes.
- Do different bookmakers differ?
bk.reg.2 <- lm(bk.d ~ bk + matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full)
#summary(bk.reg.2)
kable(summary(bk.reg.2)$coef, digits=3)
| (Intercept) |
-0.004 |
0.006 |
-0.572 |
0.567 |
| bkBD |
0.191 |
0.009 |
21.987 |
0.000 |
| bkBF |
0.057 |
0.008 |
6.732 |
0.000 |
| bkBW |
-0.020 |
0.009 |
-2.327 |
0.020 |
| bkBX |
-0.014 |
0.009 |
-1.573 |
0.116 |
| bkBY |
-0.002 |
0.009 |
-0.220 |
0.826 |
| bkCE |
0.004 |
0.009 |
0.484 |
0.628 |
| bkEE |
0.001 |
0.009 |
0.086 |
0.931 |
| bkFB |
0.013 |
0.009 |
1.557 |
0.119 |
| bkFR |
-0.015 |
0.009 |
-1.682 |
0.093 |
| bkLD |
0.033 |
0.009 |
3.821 |
0.000 |
| bkMA |
0.154 |
0.009 |
16.570 |
0.000 |
| bkPP |
0.013 |
0.009 |
1.553 |
0.120 |
| bkRD |
0.022 |
0.009 |
2.513 |
0.012 |
| bkSJ |
-0.003 |
0.009 |
-0.356 |
0.722 |
| bkSK |
-0.007 |
0.009 |
-0.839 |
0.402 |
| bkSO |
-0.011 |
0.009 |
-1.237 |
0.216 |
| bkSX |
-0.021 |
0.009 |
-2.323 |
0.020 |
| bkTI |
0.018 |
0.009 |
1.905 |
0.057 |
| bkUN |
-0.001 |
0.009 |
-0.068 |
0.946 |
| bkVC |
0.071 |
0.009 |
8.252 |
0.000 |
| bkWA |
0.014 |
0.009 |
1.579 |
0.114 |
| bkWH |
-0.018 |
0.009 |
-2.097 |
0.036 |
| bkWN |
0.020 |
0.009 |
2.262 |
0.024 |
| matchday |
0.055 |
0.004 |
12.308 |
0.000 |
| bk.oth.d |
0.032 |
0.000 |
68.096 |
0.000 |
| bk.oth.d.1 |
0.001 |
0.000 |
3.557 |
0.000 |
| team.play |
0.074 |
0.010 |
7.255 |
0.000 |
| team.win |
-0.004 |
0.010 |
-0.403 |
0.687 |
| team.lose |
0.035 |
0.013 |
2.762 |
0.006 |
- Simple fixed effects shows different bookmakers have different willingness to change prices.
- Explanatory variables appear unchanged.
bk.reg.3 <- lm(bk.d ~ bk + matchday*bk + bk.oth.d*bk + team.play*bk,data=long.winner.full)
#summary(bk.reg.3)
kable(summary(bk.reg.3)$coef, digits=3)
| (Intercept) |
-0.022 |
0.007 |
-3.004 |
0.003 |
| bkBD |
0.267 |
0.010 |
26.014 |
0.000 |
| bkBF |
0.156 |
0.010 |
15.409 |
0.000 |
| bkBW |
0.016 |
0.010 |
1.575 |
0.115 |
| bkBX |
-0.027 |
0.011 |
-2.482 |
0.013 |
| bkBY |
-0.002 |
0.010 |
-0.149 |
0.881 |
| bkCE |
0.013 |
0.010 |
1.296 |
0.195 |
| bkEE |
-0.008 |
0.010 |
-0.789 |
0.430 |
| bkFB |
0.031 |
0.010 |
3.002 |
0.003 |
| bkFR |
-0.009 |
0.011 |
-0.885 |
0.376 |
| bkLD |
0.018 |
0.010 |
1.760 |
0.078 |
| bkMA |
0.284 |
0.011 |
25.388 |
0.000 |
| bkPP |
0.016 |
0.010 |
1.530 |
0.126 |
| bkRD |
0.031 |
0.010 |
2.960 |
0.003 |
| bkSJ |
0.045 |
0.011 |
4.261 |
0.000 |
| bkSK |
-0.005 |
0.010 |
-0.473 |
0.636 |
| bkSO |
0.035 |
0.011 |
3.353 |
0.001 |
| bkSX |
-0.006 |
0.011 |
-0.543 |
0.587 |
| bkTI |
-0.018 |
0.011 |
-1.564 |
0.118 |
| bkUN |
-0.010 |
0.010 |
-0.931 |
0.352 |
| bkVC |
0.132 |
0.010 |
12.740 |
0.000 |
| bkWA |
-0.003 |
0.010 |
-0.337 |
0.736 |
| bkWH |
0.005 |
0.010 |
0.475 |
0.635 |
| bkWN |
-0.009 |
0.011 |
-0.895 |
0.371 |
| matchday |
0.001 |
0.021 |
0.066 |
0.947 |
| bk.oth.d |
0.042 |
0.002 |
19.399 |
0.000 |
| team.play |
0.106 |
0.029 |
3.695 |
0.000 |
| bkBD:matchday |
0.077 |
0.030 |
2.569 |
0.010 |
| bkBF:matchday |
0.073 |
0.027 |
2.720 |
0.007 |
| bkBW:matchday |
-0.042 |
0.030 |
-1.394 |
0.163 |
| bkBX:matchday |
-0.032 |
0.030 |
-1.066 |
0.286 |
| bkBY:matchday |
0.000 |
0.029 |
0.008 |
0.994 |
| bkCE:matchday |
0.022 |
0.028 |
0.769 |
0.442 |
| bkEE:matchday |
-0.014 |
0.030 |
-0.458 |
0.647 |
| bkFB:matchday |
0.110 |
0.028 |
3.913 |
0.000 |
| bkFR:matchday |
-0.002 |
0.030 |
-0.059 |
0.953 |
| bkLD:matchday |
0.149 |
0.030 |
5.030 |
0.000 |
| bkMA:matchday |
-0.026 |
0.030 |
-0.867 |
0.386 |
| bkPP:matchday |
0.036 |
0.028 |
1.300 |
0.193 |
| bkRD:matchday |
-0.007 |
0.030 |
-0.250 |
0.803 |
| bkSJ:matchday |
0.152 |
0.030 |
5.109 |
0.000 |
| bkSK:matchday |
0.033 |
0.027 |
1.210 |
0.226 |
| bkSO:matchday |
0.133 |
0.030 |
4.454 |
0.000 |
| bkSX:matchday |
0.019 |
0.030 |
0.625 |
0.532 |
| bkTI:matchday |
0.173 |
0.030 |
5.719 |
0.000 |
| bkUN:matchday |
0.010 |
0.030 |
0.322 |
0.747 |
| bkVC:matchday |
0.120 |
0.029 |
4.188 |
0.000 |
| bkWA:matchday |
0.115 |
0.028 |
4.096 |
0.000 |
| bkWH:matchday |
0.007 |
0.030 |
0.247 |
0.805 |
| bkWN:matchday |
0.141 |
0.030 |
4.724 |
0.000 |
| bkBD:bk.oth.d |
-0.023 |
0.003 |
-7.557 |
0.000 |
| bkBF:bk.oth.d |
-0.034 |
0.003 |
-11.575 |
0.000 |
| bkBW:bk.oth.d |
-0.007 |
0.003 |
-2.300 |
0.021 |
| bkBX:bk.oth.d |
0.003 |
0.003 |
1.045 |
0.296 |
| bkBY:bk.oth.d |
-0.003 |
0.003 |
-0.851 |
0.395 |
| bkCE:bk.oth.d |
-0.007 |
0.003 |
-2.505 |
0.012 |
| bkEE:bk.oth.d |
0.003 |
0.003 |
1.014 |
0.311 |
| bkFB:bk.oth.d |
-0.015 |
0.003 |
-5.034 |
0.000 |
| bkFR:bk.oth.d |
-0.003 |
0.003 |
-0.991 |
0.322 |
| bkLD:bk.oth.d |
-0.007 |
0.003 |
-2.400 |
0.016 |
| bkMA:bk.oth.d |
-0.027 |
0.003 |
-8.652 |
0.000 |
| bkPP:bk.oth.d |
-0.001 |
0.003 |
-0.477 |
0.633 |
| bkRD:bk.oth.d |
-0.003 |
0.003 |
-0.913 |
0.361 |
| bkSJ:bk.oth.d |
-0.020 |
0.003 |
-6.445 |
0.000 |
| bkSK:bk.oth.d |
-0.001 |
0.003 |
-0.428 |
0.669 |
| bkSO:bk.oth.d |
-0.019 |
0.003 |
-6.269 |
0.000 |
| bkSX:bk.oth.d |
-0.005 |
0.003 |
-1.481 |
0.139 |
| bkTI:bk.oth.d |
-0.004 |
0.003 |
-1.202 |
0.229 |
| bkUN:bk.oth.d |
0.005 |
0.003 |
1.532 |
0.125 |
| bkVC:bk.oth.d |
-0.025 |
0.003 |
-8.456 |
0.000 |
| bkWA:bk.oth.d |
-0.002 |
0.003 |
-0.675 |
0.500 |
| bkWH:bk.oth.d |
-0.010 |
0.003 |
-3.274 |
0.001 |
| bkWN:bk.oth.d |
0.000 |
0.003 |
0.042 |
0.966 |
| bkBD:team.play |
-0.253 |
0.041 |
-6.174 |
0.000 |
| bkBF:team.play |
-0.186 |
0.041 |
-4.561 |
0.000 |
| bkBW:team.play |
-0.047 |
0.041 |
-1.151 |
0.250 |
| bkBX:team.play |
0.078 |
0.041 |
1.919 |
0.055 |
| bkBY:team.play |
0.089 |
0.041 |
2.202 |
0.028 |
| bkCE:team.play |
0.119 |
0.041 |
2.926 |
0.003 |
| bkEE:team.play |
0.023 |
0.041 |
0.568 |
0.570 |
| bkFB:team.play |
0.043 |
0.041 |
1.055 |
0.291 |
| bkFR:team.play |
0.048 |
0.041 |
1.170 |
0.242 |
| bkLD:team.play |
0.038 |
0.041 |
0.928 |
0.353 |
| bkMA:team.play |
-0.213 |
0.041 |
-5.175 |
0.000 |
| bkPP:team.play |
-0.047 |
0.041 |
-1.150 |
0.250 |
| bkRD:team.play |
0.020 |
0.041 |
0.479 |
0.632 |
| bkSJ:team.play |
-0.200 |
0.041 |
-4.898 |
0.000 |
| bkSK:team.play |
-0.045 |
0.041 |
-1.105 |
0.269 |
| bkSO:team.play |
-0.155 |
0.041 |
-3.786 |
0.000 |
| bkSX:team.play |
-0.047 |
0.041 |
-1.149 |
0.251 |
| bkTI:team.play |
-0.038 |
0.041 |
-0.928 |
0.353 |
| bkUN:team.play |
-0.088 |
0.041 |
-2.162 |
0.031 |
| bkVC:team.play |
-0.102 |
0.041 |
-2.509 |
0.012 |
| bkWA:team.play |
-0.031 |
0.041 |
-0.771 |
0.441 |
| bkWH:team.play |
0.081 |
0.041 |
1.980 |
0.048 |
| bkWN:team.play |
-0.051 |
0.041 |
-1.250 |
0.211 |
- Interacting fixed effects with matchday, no. of other bookies changes and whether the team plays shows considerable differences between bookmakers.
- Matchday alone no longer significant.
- Betfair (fixed and exchange), Ladbrokes, Stan James, Sportingbet, Titanbet, Bet Victor, Winner and Betway influenced by matchday.
- Team playing leads to 10% greater chance of change.
- Small number of bookmakers less likely to update if team actually playing.
- Other bookmakers changing increases likelihood by 4%.
- Half of bookmakers less influenced by competition.
all.bk.coefs <- data.frame("(Intercept)"=NA," matchday"=NA,"bk.oth.d"=NA,"bk.oth.d.1"=NA,
"team.play"=NA,"team.win"=NA,"team.lose"=NA)
all.bk.ts <- data.frame("(Intercept)"=NA," matchday"=NA,"bk.oth.d"=NA,"bk.oth.d.1"=NA,
"team.play"=NA,"team.win"=NA,"team.lose"=NA)
for(b in bks) {
bk.reg.0 <- lm(bk.d ~ matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full[long.winner.full$bk==b,])
# print(summary(bk.reg.0))
all.bk.coefs <- rbind(all.bk.coefs,coefficients(bk.reg.0))
all.bk.ts <- rbind(all.bk.ts,summary(bk.reg.0)$coefficients[,3])
}
hist(all.bk.coefs$X.matchday)

hist(all.bk.coefs$bk.oth.d)

hist(all.bk.coefs$team.play)

Conclusions
- Constructed and evaluated end-of-season forecasts for sports leagues.
- Hierarchical forecast model based on individual match outcomes.
- Evaluated bookmaker forecasts both for matches and end-of-season outcomes.
- Investigated bookmaker pricing behaviour:
- Varying degree of price competition amongst bookmakers.
- Ongoing research, comments welcome…
References
Forrest, D. and Simmons, R. (2000), “Forecasting Sport: The Behaviour and Performance of Football Tipsters”, International Journal of Forecasting, Vol. 16, pp. 317–331.
Goddard, J. and Asimakopoulos, I. (2004), “Modelling football match results and the efficiency of fixed-odds betting”, International Journal of Forecasting, Vol. 23, pp. 51–66.
Hyndman, R. and Athanasopoulos, G. (2012), Forecasting: Principles and Practice, www.otexts.com, available at: https://www.otexts.org/book/fpp.
Karlis, D. and Ntzoufras, I. (2003), “Analysis of Sports Data By Using Bivariate Poisson Models”, The Statistician, Vol. 52 No. 3, pp. 381–393.
Shin, H. (1991), “Optimal Betting Odds Against Insider Traders”, The Economic Journal, Vol. 101 No. 408, pp. 1179–1185.
Shin, H. (1992), “Prices of State Contingent Claims with Insider Traders, and the Favourite-Longshot Bias”, The Economic Journal, Vol. 102 No. 411, pp. 426–435.
Shin, H. (1993), “Measuring the Incidence of Insider Trading in a Market for State-Contingent Claims”, The Economic Journal, Vol. 103 No. 420, pp. 1141–1153.
Vaughan Williams, L. and Paton, D. (1997), “Why Is There a Favourite-Longshot Bias in British Racetrack Betting Markets?”, The Economic Journal, pp. 150–158.