End-of-season forecasting and Dynamic Pricing

J James Reade

Tinbergen Institute, Erasmus School of Economics, Rotterdam
13/04/2015

Introduction

Bookmaking industry perhaps as ubiquitous as at any point in living memory?
- Deregulation the main driver, but also technology.
Bookmakers must predict outcomes of events ahead of time:
- Forecast horizons from seconds to years.
- Additionally must update forecasts as new information revealed.
- Further, competitive pressure requires good forecasting.
Particular bountiful source of events is sport:
- Football second (?) most popular betting event (after horse racing?).
We investigate forecasts of end-of-season sports league outcomes.
Such events satisfy following characteristics:
- Complex hierarchical structure.
- Huge interest.
- Massively varying uncertainty.
- Horizon varying from very long to very short.
We propose simulation-based method of generating forecasts for such complex hierarchical events (end-of-season outcomes).
- Compare forecasts to those of bookmakers.
We analyse dynamic pricing of bookmakers:
- Who offers best prices?
- How often do bookmakers update?
- What drives updates when they occur?
Hugely preliminary work, although highly enjoyable.
- All comments/suggestions very welcome.

Plan of talk

Review of relevant literature.
Methodology of end-of-season forecast model.
Data: For model and from bookmakers.
Evaluate the model for individual matches.
Evaluate model for end-of-season outcomes.
Investigate dynamic pricing behaviour of bookmakers.

Literature

Economic evaluations of bookmaker activity commonplace:
- E.g. Shin (1992), Vaughan Williams and Paton (1997),
Analysis of complex hierarchical events such as end-of-season outcomes less so.
- Textbook treatment in, for example, Hyndman and Athanasopoulos (2012).
Analysis of pricing naturally much more extensive:
- Basic competition models (Cournot, Bertrand) seem likely inadequate.
- Shin (1991), Shin (1992) and Shin (1993) attempt to capture theory of pricing for bookmakers.
  - Attempts to explain commonly found pricing biases in bookmaking.
- More generally of interest to understand price movements:
  - Why do firms change prices? Competition or new information or…?

Model: Methodology

Multiple methods exist for modelling football match outcomes:
- Modelling goal outcomes: Bivariate Poission methods, e.g. Karlis and Ntzoufras (2003).
- Direct outcome modelling: Limited dependent variable methods, e.g. Forrest and Simmons (2000).
- Goddard and Asimakopoulos (2004) compares and contrasts methods.
We adopt direct outcome modelling:
- Not averse to switching, decision made more for practical/familiarity reasons.
Dependent variable: \[ y_{it} = \left\{\begin{array}{lll}0 && \text{if match $i$ at time $t$ results in away win,}\\0.5 && \text{if match $i$ at time $t$ results in draw,}\\1 && \text{if match $i$ at time $t$ results in home win,}\end{array}\right. \]
Explanatory variables (.D is difference between home and away side, .2 is squared variable):
- E.1: Elo prediction for match, based on Elo scores calculated since 1877.
- pts1, pts.D pts.D.2: League points.
- pld1, pld.D, pld.D.2: League matches played.
- gs1, gs.D, gs.D.2: Goals scored.
- gd1, gd.D, gd.D.2: Goal difference.
- pos1, pos.D, pos.D.2: League position.
- form1, form.D, form.D.2: Form (sum of points in last six matches).
- Season fixed effects.
Use ordered logistic model.

End-of-season: Methodology

Simulation method to generate probabilities of season outcomes:
1. Use estimated model up to start of season.
2. Use results up to starting point to update explanatory variables.
3. Generate probabilistic forecasts for each match on first matchday.
  - Use multinominal distribution to generate outcomes according to forecast probabilities.
  - Update table according to outcomes generated.
    - Use average goals scored/conceded in draws/home wins/away wins.
  - Update Elo scores for teams based on imputed outcomes.
4. Generate probabilistic forecasts for each match on next matchday and repeat process until end of season.
5. Run sufficiently many replications to generate frequentist probabilities of outcomes.

End-of-season: Example

Forecasts from yesterday’s matches:

loc <- "/home/readejj/Dropbox/Research/Sport/managerial change Cormac/"
loc2 <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
library(knitr)
dates <- c("2015-01-30","2015-02-06","2015-02-13","2015-02-20","2015-02-27","2015-03-06","2015-03-13","2015-03-20","2015-04-10")
date.1 <- dates[NROW(dates)]
forecast.matches <- read.csv(paste(loc2,"forecasts_",date.1,".csv",sep=""),stringsAsFactors=F)
div <- "English Premier"
matches <- forecast.matches[forecast.matches$division==div,]
matches <- matches[order(matches$date),]
matches$id <- 1:NROW(matches)
par(mar=c(9,4,4,5)+.1)
plot(matches$id,matches$outcome,xaxt="n",xlab="",ylim=range(0,1),
     main=paste("Forecasts of Weekend ",div," Matches",sep=""),
     ylab="Probability of Outcome",col="white")
lines(matches$id,matches$Ph,col=2,pch=15,type="p")
lines(matches$id,matches$Pd,col=3,pch=16,type="p")
lines(matches$id,matches$Pa,col=4,pch=17,type="p")
legend("topleft",ncol=3,pch=c(15,16,17),col=c(2:4),
       legend=c("OL (home)","OL (draw)","OL (away)"),bty="n")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
axis(1,at=matches$id,labels=paste(matches$team1,matches$team2,sep=" v "),las=2,cex.axis=0.65)
for(i in 2:NROW(matches)){
  if(matches$date[i]!=matches$date[i-1]) {
    lines(rep(c(i-0.5),2),c(0,1),lty=2)
  }
}

Generate outcomes for these matches using multinomial distribution:

for(i in 1:NROW(matches)) {
  matches$outcome[i] <- c(1,0.5,0) %*% rmultinom(n = 1, size=1,
                                                 prob = matches[i,c("Ph","Pa","Pd")])  
}
kable(matches[,c("team1","team2","outcome")])

	team1	team2	outcome
3	Swansea	Everton	0.0
4	Tottenham	Aston Villa	1.0
5	Southampton	Hull	1.0
6	West Ham	Stoke	0.0
7	West Brom	Leicester	1.0
8	Sunderland	C Palace	0.5
9	Burnley	Arsenal	0.5
110	QPR	Chelsea	0.5
111	Man Utd	Man City	0.5
113	Liverpool	Newcastle	0.5

Update goals scored/conceded with averages, update league tables, Elo scores.
Generate forecasts for next weekend’s matches using updated data.
Carry on until end of season and log final positions for teams.
Repeat sufficiently many times to generate distribution.

Data: Model

All data files for code available on request, and/or findable in one of three locations (via Dropbox):
1. http://goo.gl/JwJSiD (historical results)
2. http://goo.gl/oESdOV (betting data files)
3. http://goo.gl/b3Sj0Q (everything else)

res.eng <- read.csv(paste(loc,"res-eng.csv",sep=""),stringsAsFactor=F)
res.eng$goals1 <- as.numeric(res.eng$goals1)
res.eng$goals2 <- as.numeric(res.eng$goals2)
res.eng$tier <- as.numeric(res.eng$tier)
res.eng$season <- as.numeric(res.eng$season)
res.eng$X <- NULL

match.dates.2013 <- res.eng$date[res.eng$season==2013]
match.dates.2013 <- match.dates.2013[duplicated(match.dates.2013)==F]
match.dates.2013 <- match.dates.2013[order(match.dates.2013)]
match.dates.2014 <- res.eng$date[res.eng$season==2014]
match.dates.2014 <- match.dates.2014[duplicated(match.dates.2014)==F]
match.dates.2014 <- match.dates.2014[order(match.dates.2014)]

Data collected from http://www.soccerbase.com:
- Every English football match between 1888-09-08 and 2015-05-24.
Calculate league positions/details and Elo scores:

elo.prem <- read.csv(paste(loc2,"elo-prem-2010.csv",sep=""),stringsAsFactor=F)
epl.teams <- res.eng$team1[res.eng$season==2014 & res.eng$tier==1]
epl.teams <- c("Liverpool","Burnley","Stoke","QPR","Newcastle","Arsenal","West Ham","West Brom","Man Utd","Leicester","Chelsea","Aston Villa","Everton","Sunderland","Tottenham","Hull","Swansea","Man City","C Palace","Southampton")
epl.team.cols <- c("red3","purple","pink","lightblue","black","darkred","purple2","darkblue","red","blue","darkblue","purple3","blue2","hotpink","grey10","orange","grey20","skyblue","red4","pink3")
plot(range(as.Date(elo.prem$date),na.rm=T),rep(1,2),type="l",
     ylim=range(elo.prem[gsub(" ",".",epl.teams)],na.rm=T),
     main="Elo Scores for Current EPL Teams since 2010",ylab="Elo score",xlab="Date")
for(t in 1:NROW(epl.teams)) {
  lines(as.Date(elo.prem$date),elo.prem[,gsub(" ",".",epl.teams[t])],type="l",col=epl.team.cols[t])
}

Regression model output:

library(MASS)
model.ord <- polr(as.factor(outcome) ~ E.1 + pts1 + pts.D + pts.D.2 + pld1 + pld.D + pld.D.2 + 
                   gs1 + gs.D + gs.D.2 + gd1 + gd.D + gd.D.2 + pos1 + pos.D + pos.D.2 + 
                   form1 + form.D + form.D.2 + season.d, 
                 data=res.eng, method = "logistic")
options(scipen=13)
#summary(model.ord)
kable(summary(model.ord)$coef, digits=3)

## 
## Re-fitting to get Hessian

	Value	Std. Error	t value
E.1	2.379	0.019	128.119
pts1	0.007	0.005	1.627
pts.D	-0.014	0.003	-4.306
pts.D.2	0.000	0.000	-0.972
pld1	-0.012	0.006	-1.935
pld.D	0.024	0.010	2.365
pld.D.2	0.007	0.004	1.575
gs1	0.003	0.002	1.700
gs.D	0.001	0.002	0.801
gs.D.2	0.000	0.000	0.046
gd1	-0.005	0.003	-2.087
gd.D	0.017	0.002	9.023
gd.D.2	0.000	0.000	-0.010
pos1	0.005	0.003	1.407
pos.D	0.008	0.003	2.905
pos.D.2	0.000	0.000	2.502
form1	0.007	0.004	1.974
form.D	-0.013	0.003	-4.388
form.D.2	-0.001	0.000	-2.326
season.d	-0.003	0.000	-11.728
0\|0.5	0.127	0.033	3.807
0.5\|1	1.271	0.033	38.085

Data: Bookmakers

source("/home/readejj/Dropbox/Research/Code/R/betting/clean.data.R")

bks <- c("B3", "SK", "BX", "BY", "FR", "SO", "VC", "PP", "SJ", "EE", "LD", "CE", 
         "WH", "WN", "SX", "FB", "WA", "TI", "UN", "BW", "RD", "BF", "BD", "MA")
bks.full <- c("Bet365", "SkyBet", "Totesport", "Boyle Sports", "Betfred", "Sportingbet", "Bet Victor", "Paddy Power", "Stan James", "888sport", "Ladbrokes", "Coral", 
         "William Hill", "Winner", "Spreadex", "Betfair-fixed", "Betway", "Titanbet", "Unibet", "bwin", "32Red", "Betfair-exchange", "Betdaq", "Matchbook")

direc <- c("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/winner")
winner <- data.frame(stringsAsFactors=F)
teams <- c(paste(direc,dir(direc,pattern="*.csv"),sep="/"))

for (team in teams) {
  temp <- read.csv(team,stringsAsFactors=F)
  temp <- clean.data(temp)
  temp$team <- gsub("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/winner/(\\S+).csv","\\1",team)
  winner <- rbind(winner,temp)
}

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

winner <- winner[order(winner$team,winner$Date.Time),]

direc <- c("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/relegation")
releg <- data.frame(stringsAsFactors=F)
teams <- c(paste(direc,dir(direc,pattern="*.csv"),sep="/"))

for (team in teams) {
  temp <- read.csv(team,stringsAsFactors=F)
  temp <- clean.data(temp)
  temp$team <- gsub("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/relegation/(\\S+).csv","\\1",team)
  releg <- rbind(releg,temp)
}

releg <- releg[order(releg$team,releg$Date.Time),]

Data collected from http://www.oddschecker.com weekly.
Odds on individual match outcomes and end-of-season outcomes over 20 bookmakers.
All files contain price history of bookmakers.
- E.g. EPL winner file dates from 2014-03-18 to 2015-04-08.
- E.g. EPL relegation file dates from 2014-04-08 to 2015-04-08.
- Individual match betting data has shorter time horizons.

Model: Appraisal against individual events

Model has been used to construct ex ante forecasts in recent weeks.
- Forecasts for all professional domestic English football matches.
To evaluate quality of forecasts can run Mincer-Zarnowitz test.
- Regression of outcomes $y_{it}$ on model forecasts $\widehat{y}_{mit}$ plus constant: \[ y_{it} = \alpha + \beta \widehat{y}_{mit} + e_{it}. \]
Unbiased forecasts: $\alpha=0$ and $\beta=1$, with “well-behaved” residuals.

all.forecast.outcomes <- data.frame()
for(i in dates) {
  temp.0 <- read.csv(paste(loc2,"forecast_outcomes_",i,".csv",sep=""),stringsAsFactors=F)
  temp.0$X <-NULL
  temp.0$forc.week <- i
  temp.1 <- read.csv(paste(loc2,"forecasts_",i,".csv",sep=""))
  temp.1$X <-NULL
  temp.1 <- temp.1[is.na(temp.1$outcome)==F,]
  if(!("Ph" %in% colnames(temp.1))) {
    temp.1$Ph <- NA
    temp.1$Pd <- NA
    temp.1$Pa <- NA
  }
  if(!("tier" %in% colnames(temp.0))) {
    temp.0$tier <- NA
  }
  temp.2 <- merge(temp.1[,c("match_id","outcome","Ph","Pd","Pa")],
                             temp.0[,c("match_id","date","division","team1",
                                       "goals1","goals2","team2","outcome",
                                       "season","tier","forc.week")],
                  by=c("match_id"),suffixes=c(".forc",".final"))
  all.forecast.outcomes <- rbind(temp.2[is.na(temp.2$outcome.final)==F,],all.forecast.outcomes)
}
all.forecast.outcomes$outcome.h <- as.numeric(all.forecast.outcomes$outcome.final==1)
mz.h <- lm(outcome.h ~ Ph,data=all.forecast.outcomes)
#summary(mz.h)
kable(summary(mz.h)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.095	0.073	1.309	0.191
Ph	0.733	0.153	4.793	0.000

all.forecast.outcomes$outcome.d <- as.numeric(all.forecast.outcomes$outcome.final==0.5)
mz.d <- lm(outcome.d ~ Pd,data=all.forecast.outcomes)
#summary(mz.d)
kable(summary(mz.d)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.061	0.155	0.396	0.692
Pd	0.714	0.598	1.194	0.233

all.forecast.outcomes$outcome.a <- as.numeric(all.forecast.outcomes$outcome.final==0)
mz.a <- lm(outcome.a ~ Pa,data=all.forecast.outcomes)
#summary(mz.a)
kable(summary(mz.a)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.081	0.053	1.510	0.132
Pa	0.854	0.170	5.011	0.000

calib.h <- aggregate(all.forecast.outcomes$outcome.h,by=list(round(all.forecast.outcomes$Ph,2)),FUN=mean)
plot(calib.h$Group.1,calib.h$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for home win outcomes")
abline(0,1,lty=3)

calib.d <- aggregate(all.forecast.outcomes$outcome.d,by=list(round(all.forecast.outcomes$Pd,2)),FUN=mean)
plot(calib.d$Group.1,calib.d$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for draw outcomes")
abline(0,1)

calib.a <- aggregate(all.forecast.outcomes$outcome.a,by=list(round(all.forecast.outcomes$Pa,2)),FUN=mean)
plot(calib.a$Group.1,calib.a$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for away win outcomes")
abline(0,1)

Regression and plots suggest quality of model unspectacular.
- Nonetheless important to compare against benchmark of bookmakers.

Comparison: Bookmakers on individual events

Bookmakers represent benchmark in forecasting match outcomes.
We can run same Mincer-Zarnowitz test to evaluate quality.
- Regression of outcomes $y_{it}$ on forecasts $\widehat{y}_{bit}$ plus constant: \[ y_{it} = \alpha + \beta \widehat{y}_{bit} + e_{it}. \]
Unbiased forecasts: $\alpha=0$ and $\beta=1$, with “well-behaved” residuals.
- Commonly discovered favourite-longshot bias implies $\beta>1$.

bk <- data.frame()
dates <- dir("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/",
             pattern="\\d{4}-\\d{2}-\\d{2}")
for(d in dates) {
  loc0 <- paste("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/",d,"/",sep="")
  divs <- dir(loc0)
  for(div in divs) {
    loc2 <- paste(loc0,div,"/",sep="")
    bk.matches <- dir(loc2,pattern=".*?-v-.*?-.*?[.]csv")
    for(i in bk.matches) {
      temp <- read.csv(paste(loc2,i,sep="/"),stringsAsFactors=F)
      if(NROW(temp)>=2) {
        temp <- clean.data(temp)
        temp$div <- div
        temp$match.event <- gsub("[.]csv","",i)
        bk <- rbind(bk,temp)      
      }
    }
  }
}
bk <- bk[order(bk$match.event,bk$Date.Time),]
bk$mean <- 1/rowMeans(bk[colnames(bk)[nchar(colnames(bk))==2]],na.rm=T)
bk$match.event <- gsub("sheffield-wednesday","sheff wed",bk$match.event)
bk$match.event <- gsub("nottingham-forest","nottm forest",bk$match.event)
bk$match.event <- gsub("middlesbrough","middlesbro",bk$match.event)
bk$match.event <- gsub("port-vale","port vale",bk$match.event)
bk$match.event <- gsub("man-utd","man utd",bk$match.event)
bk$match.event <- gsub("man-city","man city",bk$match.event)
bk$match.event <- gsub("aston-villa","aston villa",bk$match.event)
bk$match.event <- gsub("west-brom","west brom",bk$match.event)
bk$match.event <- gsub("west-ham","west ham",bk$match.event)
bk$match.event <- gsub("crystal-palace","c palace",bk$match.event)
bk$match.event <- gsub("sheffield-utd","sheff utd",bk$match.event)
bk$match.event <- gsub("cambridge-utd","cambridge u",bk$match.event)
bk$match.event <- gsub("dagenham-redbridge","dag &amp; red",bk$match.event)
bk$match.event <- gsub("mk-dons","mk dons",bk$match.event)
bk$match.event <- gsub("notts-county","notts co",bk$match.event)
bk$match.event <- gsub("newport-county","newport co",bk$match.event)
bk$match.event <- gsub("afc-wimbledon","afc w'bledon",bk$match.event)
bk$match.event <- gsub("york-city","york",bk$match.event)
bk$team1 <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\1",bk$match.event)
bk$team2 <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\2",bk$match.event)
bk$match.event <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\3",bk$match.event)

bk.h <- bk[bk$match.event==bk$team1,]
bk.d <- bk[bk$match.event=="draw",]
bk.a <- bk[bk$match.event==bk$team2,]

all.forecast.outcomes$team1 <- tolower(all.forecast.outcomes$team1)
all.forecast.outcomes$team2 <- tolower(all.forecast.outcomes$team2)
all.forecast.outcomes <- all.forecast.outcomes[duplicated(all.forecast.outcomes)==F,]

bk.h <- merge(bk.h,all.forecast.outcomes[,c("match_id","team1","team2","outcome.h","Ph")],
              by=c("team1","team2"),all.x=T)
bk.mz.h <- lm(outcome.h ~ mean,data=bk.h)
#summary(bk.mz.h)
kable(summary(bk.mz.h)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.113	0.026	4.346	0
mean	0.486	0.044	11.130	0

bk.calib.h <- aggregate(bk.h$outcome.h,by=list(round(bk.h$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.h$Group.1,bk.calib.h$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="BK Calibration plot for home win outcomes")
abline(0,1,lty=3)

bk.d <- merge(bk.d,all.forecast.outcomes[,c("match_id","team1","team2","outcome.d","Pd")],
              by=c("team1","team2"),all.x=T)
bk.mz.d <- lm(outcome.d ~ mean,data=bk.d)
#summary(bk.mz.d)
kable(summary(bk.mz.d)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.200	0.018	11.291	0.000
mean	0.112	0.038	2.953	0.003

bk.calib.d <- aggregate(bk.d$outcome.d,by=list(round(bk.d$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.d$Group.1,bk.calib.d$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="BK Calibration plot for draw outcomes")
abline(0,1,lty=3)

bk.a <- merge(bk.a,all.forecast.outcomes[,c("match_id","team1","team2","outcome.a","Pa")],
              by=c("team1","team2"),all.x=T)
bk.mz.a <- lm(outcome.a ~ mean,data=bk.a)
#summary(bk.mz.a)
kable(summary(bk.mz.a)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.278	0.020	13.756	0
mean	0.220	0.039	5.639	0

bk.calib.a <- aggregate(bk.a$outcome.a,by=list(round(bk.a$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.a$Group.1,bk.calib.a$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="BK Calibration plot for away win outcomes")
abline(0,1,lty=3)

Bookmakers appear similarly poor:
- Football an intrinsically uncertain event.
Can consider combined regression to evaluate forecasts more directly:
- Add difference between two forecasts to either original model.
- Include both forecasts and reduce collinearity.
Regression of outcomes $y_{it}$ on model forecasts $\widehat{y}_{mit}$, the difference $\widehat{y}_{bit}-\widehat{y}_{mit}$ plus constant: \[ y_i = \alpha + \beta \widehat{y}_{mit} + \gamma(\widehat{y}_{bit}-\widehat{y}_{mit}) + e_i. \]
Anticipate: $\alpha=0$ and $\beta+\gamma=1$ for unbiased joint forecast.
- $\gamma\neq0$ implies model forecasts add information over bookmakers.

bk.h$diff <- bk.h$mean - bk.h$Ph
bk.mod.mz.h <- lm(outcome.h ~ mean + diff,data=bk.h)
#summary(bk.mod.mz.h)
kable(summary(bk.mod.mz.h)$coef, digits=3)

	Estimate	Std. Error	t value
(Intercept)	-0.137	0.031	-4.456
mean	1.126	0.062	18.169
diff	-0.943	0.067	-13.979

bk.d$diff <- bk.d$mean - bk.d$Pd
bk.mod.mz.d <- lm(outcome.d ~ mean + diff,data=bk.d)
#summary(bk.mod.mz.d)
kable(summary(bk.mod.mz.d)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	-0.141	0.065	-2.167	0.03
mean	1.481	0.255	5.815	0.00
diff	-1.401	0.258	-5.434	0.00

bk.a$diff <- bk.a$mean - bk.a$Pa
bk.mod.mz.a <- lm(outcome.a ~ mean + diff,data=bk.a)
#summary(bk.mod.mz.a)
kable(summary(bk.mod.mz.a)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	-0.012	0.024	-0.483	0.629
mean	1.297	0.067	19.463	0.000
diff	-1.345	0.070	-19.193	0.000

Model provides additional predictive content above bookmakers for explaining outcomes.
Graphically:

plot(calib.h$Group.1,calib.h$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for home win outcomes")
lines(bk.calib.h$Group.1,bk.calib.h$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),lty=1,bty="n")

plot(calib.d$Group.1,calib.d$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for draw outcomes")
lines(bk.calib.d$Group.1,bk.calib.d$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),lty=1,bty="n")

plot(calib.a$Group.1,calib.a$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for away win outcomes")
lines(bk.calib.a$Group.1,bk.calib.a$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),lty=1,bty="n")

Model: End-of-season outcomes 2013

Given reasonable (relative) forecast performance, we simulate end-of-season outcomes.
First we consider the 2013–2014 season (outcomes known).
Forecasts updated for each day matches played (Saturday, Sunday, Monday, etc).

forecast.days.2013 <- data.frame()

dloc <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
days <- dir(dloc,pattern="prem-final-*\\d*-\\d+[.]csv")
for( d in days) {
  temp <- read.csv(paste(dloc,d,sep=""),stringsAsFactors=F) 
  temp <- temp[is.na(temp$Man.Utd)==F,]
  temp$day <- as.Date(as.numeric(gsub("prem-final-*\\d*-(\\d+)[.]csv","\\1",d)),origin="1970-01-01")
  forecast.days.2013 <- rbind(forecast.days.2013,temp)
}

forc.probs.2013.1 <- aggregate(forecast.days.2013==1,by=list(forecast.days.2013$day),FUN=mean)
forc.probs.2013.rel <- aggregate(forecast.days.2013>=18,by=list(forecast.days.2013$day),FUN=mean)

plot(forc.probs.2013.1$Group.1,forc.probs.2013.1$Chelsea,ylim=range(0,1),type="o",col="blue",
     main="Model forecasts for EPL Title 2013-14",
     ylab="Probability of Winning EPL",xlab="Date")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Man.City,type="o",col="skyblue")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Man.Utd,type="o",col="red")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Arsenal,type="o",col="darkred")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Liverpool,type="o",col="red2")

plot(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Cardiff,ylim=range(0,1),type="o",col="red",
     main="Model forecasts for EPL Relegation 2013-14",
     ylab="Probability of Relegation from EPL",xlab="Date")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Norwich,type="o",col="yellow")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Fulham,type="o",col="black")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Aston.Villa,type="o",col="purple")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$West.Brom,type="o",col="darkblue")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$West.Ham,type="o",col="purple3")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Sunderland,type="o",col="pink")

Ocular econometrics: Appears to track events well.
- Liverpool’s late fall, Sunderland’s late spurt.
Little really to evaluate against, with only few (20) outcomes.
- Additionally, uncertainty remained great even towards final day.

Model: End-of-season outcomes 2014

forecast.days <- data.frame()

dloc <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
days <- dir(dloc,pattern="prem-final-2014-\\d-(\\d+).csv")
for( d in days) {
  temp <- read.csv(paste(dloc,d,sep=""),stringsAsFactors=F) 
  temp$day <- as.Date(as.numeric(gsub("prem-final-2014-\\d+-(\\d+).csv","\\1",d)),origin="1970-01-01")
  forecast.days <- rbind(forecast.days,temp)
}

forc.probs.1 <- aggregate(forecast.days==1,by=list(forecast.days$day),FUN=mean)
forc.probs.rel <- aggregate(forecast.days>=18,by=list(forecast.days$day),FUN=mean)


plot(forc.probs.1$Group.1,forc.probs.1$Chelsea,ylim=range(0,1),type="o",col="blue",
     main="Comparison - Model and Bookmakers, Chelsea",
     ylab="Probability of Winning EPL",xlab="Date")
lines(forc.probs.1$Group.1,forc.probs.1$Man.City,type="o",col="skyblue")
lines(forc.probs.1$Group.1,forc.probs.1$Man.Utd,type="o",col="red")
lines(forc.probs.1$Group.1,forc.probs.1$Arsenal,type="o",col="darkred")
lines(forc.probs.1$Group.1,forc.probs.1$Liverpool,type="o",col="red2")

plot(forc.probs.rel$Group.1,forc.probs.rel$Leicester,ylim=range(0,1),type="o",col="blue",
     main="Comparison - Model and Bookmakers, Leicester",
     ylab="Probability of Relegation",xlab="Date")
lines(forc.probs.rel$Group.1,forc.probs.rel$Burnley,type="o",col="purple")
lines(forc.probs.rel$Group.1,forc.probs.rel$QPR,type="o",col="lightblue")
lines(forc.probs.rel$Group.1,forc.probs.rel$Sunderland,type="o",col="pink")

Comparison: Bookmaker end-of-season outcomes

Bookmaker prices for end-of-season outcomes collected for 2014–2015.
- Enables comparison between the two.
- No outcomes to compare against, nonetheless.

teams.1 <- winner$team[duplicated(winner$team)==F]
team.cols.1 <- c("darkred","blue","red2","skyblue","red","pink","black")

plot(range(winner$Date.Time),rep(0,2),ylim=range(0,1),col="white",ylab="Probability",xlab="",
     main="Bookmaker Implied Probabilities for EPL Title",xaxt="n")
axis(1,at=seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),
     labels=format(seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),"%b-%Y"),las=2)
for(t in 1:NROW(teams.1)) {
  for(b in 1:NROW(bks)) {
    lines(winner$Date.Time[winner$team==teams.1[t]],1/winner[winner$team==teams.1[t],bks[b]],col=team.cols.1[t])
  }  
}
legend("topleft",ncol=1,lty=1,col=team.cols.1,legend=teams.1,bty="n")

for(date in match.dates.2014) {
  abline(v=as.Date(date),lty=3)
}
for(date in match.dates.2013[-NROW(match.dates.2013)]) {
  abline(v=as.Date(date,origin="1970-01-01"),lty=3,col="grey")
}

teams.r <- releg$team[duplicated(releg$team)==F]
team.cols.r <- c("purple","mediumpurple1","red","blue","orange","darkblue","black","lightblue","pink","hotpink",
               "black","darkblue","purple3")

plot(range(releg$Date.Time),rep(0,2),ylim=range(0,1),col="white",ylab="Probability",xlab="",
     main="Bookmaker Implied Probabilities for Relegation",xaxt="n")
axis(1,at=seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),
     labels=format(seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),"%b-%Y"),las=2)
for(t in 1:NROW(teams)) {
  for(b in 1:NROW(bks)) {
    lines(releg$Date.Time[releg$team==teams.r[t]],1/releg[releg$team==teams.r[t],bks[b]],col=team.cols.r[t])
  }  
}
legend("topleft",ncol=1,lty=1,col=team.cols.r,legend=teams.r,bty="n")

for(date in match.dates.2014) {
  abline(v=as.Date(date),lty=3)
}

Model vs Bookmakers

plot(forc.probs.1$Group.1,forc.probs.1$Chelsea,ylim=range(0,1),type="o",col="blue",
     main="Comparison - Model and Bookmakers, Chelsea",
     ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(winner$Date.Time[winner$team=="chelsea"],1/winner[winner$team=="chelsea",bks[b]],col="blue")
}

plot(forc.probs.1$Group.1,forc.probs.1$Man.City,ylim=range(0,1),type="o",col="skyblue",
     main="Comparison - Model and Bookmakers, Man City",
     ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(winner$Date.Time[winner$team=="man-city"],1/winner[winner$team=="man-city",bks[b]],col="skyblue")
}

plot(forc.probs.1$Group.1,forc.probs.1$Man.Utd,ylim=range(0,1),type="o",col="red",
     main="Comparison - Model and Bookmakers, Man United",
     ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(winner$Date.Time[winner$team=="man-utd"],1/winner[winner$team=="man-utd",bks[b]],col="red")
}

plot(forc.probs.rel$Group.1,forc.probs.rel$Leicester,ylim=range(0,1),type="o",col="blue",
     main="Comparison - Model and Bookmakers, Leicester",
     ylab="Probability of Relegation",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(releg$Date.Time[releg$team=="leicester"],1/releg[releg$team=="leicester",bks[b]],col="blue")
}

plot(forc.probs.rel$Group.1,forc.probs.rel$Burnley,ylim=range(0,1),type="o",col="purple",
     main="Comparison - Model and Bookmakers, Burnley",
     ylab="Probability of Relegation",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(releg$Date.Time[releg$team=="burnley"],1/releg[releg$team=="burnley",bks[b]],col="purple")
}

Discussion

Casual comparison suggests little difference between “forecasts”.
- Suggests method effective.
- Further improvements undoubtedly possible in underlying statistical model.
But…
- Absence of outcomes to compare against.
- Deeply uncertain event being forecast.

Analysis of Bookmakers

Bookmakers offer identical product often for different price.
- Value of product varies with information, however.
Why do customers not all go to bookmaker offering best price?
- No information on customer loyalty offers.

Who offers best price most often?
What prompts bookmakers to change prices?
- Analysis suggests matches must be most important driver of price changes.
- To what extent do competitive pressures influence updating?

Frequency of Price Changes

First need to create balanced daily panel:

winner.full <- data.frame(stringsAsFactors=F)
for(t in teams.1) {
  temp <- data.frame("Date.Time"=seq(min(winner$Date.Time),max(winner$Date.Time),by="days"),
                     "team"=t,stringsAsFactors=F)
  winner.full <- rbind(winner.full,temp)
}
winner.full <- merge(winner.full,winner,by=c("Date.Time","team"),all.x=T)
winner.full <- winner.full[order(winner.full$team,winner.full$Date.Time),]

for(t in teams.1) {
  winner.full[winner.full$team==t,bks] <- na.locf(winner.full[winner.full$team==t,bks],na.rm=F)
}


winner.full$matchday <- as.numeric(winner.full$Date.Time %in% as.Date(c(match.dates.2013,match.dates.2014)))

winner.full <- winner.full[order(winner.full$team,winner.full$Date.Time),]
for(b in bks) {
  winner.full[paste(b,1,sep=".")] <- c(-999,winner.full[-NROW(winner.full),b])
}
winner.full$team.1 <- c("na",winner.full$team[-NROW(winner.full)])
for(b in bks) {
  winner.full[paste(b,"d",sep=".")] <- as.numeric((winner.full[b] != winner.full[paste(b,1,sep=".")]) & (winner.full$team==winner.full$team.1))
}

Who offers best price most often?

best.p <- apply(winner.full[,nchar(colnames(winner.full))==2],1,FUN=max,na.rm=T)

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf

best.full <- colSums(winner.full[,nchar(colnames(winner.full))==2]==best.p,na.rm=T)

Betfair more than 25% of time (for EPL winner).
Paddy Power, Bet Victor and Betfair Sportsbook and Coral most competitive bookmakers.
- Some (Stan James, Titanbet, Betfred) very uncompetitive.
How frequently do bookmakers change prices?

100*colMeans(winner.full[grep("[.]d",colnames(winner.full))],na.rm=T)

##     B3.d     SK.d     BX.d     BY.d     FR.d     SO.d     VC.d     PP.d 
## 12.64007 10.98592 12.19872 11.98630 11.27717 11.48497 19.10064 12.89675 
##     SJ.d     EE.d     LD.d     CE.d     WH.d     WN.d     SX.d     FB.d 
## 12.75294 12.26941 15.63461 12.20811 10.45611 14.50313 11.40351 13.28060 
##     WA.d     TI.d     UN.d     BW.d     RD.d     BF.d     BD.d     MA.d 
## 13.25301 16.70468 12.14071 10.25311 14.35986 17.02713 30.60697 29.63585

Note: 26.3565891% of days in sample are matchdays.
But is simple existence of matchday what determines price change?
- What if other bookmakers change prices?
- What if that teams loses/wins?
- What if that particular team doesn’t play?

long.winner.full <- read.csv(paste(loc,"long-winner-full.csv",sep=""),stringsAsFactors=F)

bk.reg <- lm(bk.d ~ matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full)
#summary(bk.reg)
kable(summary(bk.reg)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	0.017	0.002	10.139	0.000
matchday	0.058	0.005	12.819	0.000
bk.oth.d	0.032	0.000	66.520	0.000
bk.oth.d.1	0.001	0.000	3.315	0.001
team.play	0.075	0.010	7.315	0.000
team.win	-0.004	0.011	-0.414	0.679
team.lose	0.036	0.013	2.803	0.005

Aggregate results suggest matchdays important:
- Increase likelihood of odds change by 6%.
- If that team playing that day, further 7% increment.
- Actual match outcome less important.
Competitive pressures matter also:
- Each extra bookie changing odds increases likelihood of change by 3%.
- Endogeneity an issue with other bookmaker changes.
Do different bookmakers differ?

bk.reg.2 <- lm(bk.d ~ bk + matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full)
#summary(bk.reg.2)
kable(summary(bk.reg.2)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	-0.004	0.006	-0.572	0.567
bkBD	0.191	0.009	21.987	0.000
bkBF	0.057	0.008	6.732	0.000
bkBW	-0.020	0.009	-2.327	0.020
bkBX	-0.014	0.009	-1.573	0.116
bkBY	-0.002	0.009	-0.220	0.826
bkCE	0.004	0.009	0.484	0.628
bkEE	0.001	0.009	0.086	0.931
bkFB	0.013	0.009	1.557	0.119
bkFR	-0.015	0.009	-1.682	0.093
bkLD	0.033	0.009	3.821	0.000
bkMA	0.154	0.009	16.570	0.000
bkPP	0.013	0.009	1.553	0.120
bkRD	0.022	0.009	2.513	0.012
bkSJ	-0.003	0.009	-0.356	0.722
bkSK	-0.007	0.009	-0.839	0.402
bkSO	-0.011	0.009	-1.237	0.216
bkSX	-0.021	0.009	-2.323	0.020
bkTI	0.018	0.009	1.905	0.057
bkUN	-0.001	0.009	-0.068	0.946
bkVC	0.071	0.009	8.252	0.000
bkWA	0.014	0.009	1.579	0.114
bkWH	-0.018	0.009	-2.097	0.036
bkWN	0.020	0.009	2.262	0.024
matchday	0.055	0.004	12.308	0.000
bk.oth.d	0.032	0.000	68.096	0.000
bk.oth.d.1	0.001	0.000	3.557	0.000
team.play	0.074	0.010	7.255	0.000
team.win	-0.004	0.010	-0.403	0.687
team.lose	0.035	0.013	2.762	0.006

Simple fixed effects shows different bookmakers have different willingness to change prices.
- Explanatory variables appear unchanged.

bk.reg.3 <- lm(bk.d ~ bk + matchday*bk + bk.oth.d*bk + team.play*bk,data=long.winner.full)
#summary(bk.reg.3)
kable(summary(bk.reg.3)$coef, digits=3)

	Estimate	Std. Error	t value	Pr(>\|t\|)
(Intercept)	-0.022	0.007	-3.004	0.003
bkBD	0.267	0.010	26.014	0.000
bkBF	0.156	0.010	15.409	0.000
bkBW	0.016	0.010	1.575	0.115
bkBX	-0.027	0.011	-2.482	0.013
bkBY	-0.002	0.010	-0.149	0.881
bkCE	0.013	0.010	1.296	0.195
bkEE	-0.008	0.010	-0.789	0.430
bkFB	0.031	0.010	3.002	0.003
bkFR	-0.009	0.011	-0.885	0.376
bkLD	0.018	0.010	1.760	0.078
bkMA	0.284	0.011	25.388	0.000
bkPP	0.016	0.010	1.530	0.126
bkRD	0.031	0.010	2.960	0.003
bkSJ	0.045	0.011	4.261	0.000
bkSK	-0.005	0.010	-0.473	0.636
bkSO	0.035	0.011	3.353	0.001
bkSX	-0.006	0.011	-0.543	0.587
bkTI	-0.018	0.011	-1.564	0.118
bkUN	-0.010	0.010	-0.931	0.352
bkVC	0.132	0.010	12.740	0.000
bkWA	-0.003	0.010	-0.337	0.736
bkWH	0.005	0.010	0.475	0.635
bkWN	-0.009	0.011	-0.895	0.371
matchday	0.001	0.021	0.066	0.947
bk.oth.d	0.042	0.002	19.399	0.000
team.play	0.106	0.029	3.695	0.000
bkBD:matchday	0.077	0.030	2.569	0.010
bkBF:matchday	0.073	0.027	2.720	0.007
bkBW:matchday	-0.042	0.030	-1.394	0.163
bkBX:matchday	-0.032	0.030	-1.066	0.286
bkBY:matchday	0.000	0.029	0.008	0.994
bkCE:matchday	0.022	0.028	0.769	0.442
bkEE:matchday	-0.014	0.030	-0.458	0.647
bkFB:matchday	0.110	0.028	3.913	0.000
bkFR:matchday	-0.002	0.030	-0.059	0.953
bkLD:matchday	0.149	0.030	5.030	0.000
bkMA:matchday	-0.026	0.030	-0.867	0.386
bkPP:matchday	0.036	0.028	1.300	0.193
bkRD:matchday	-0.007	0.030	-0.250	0.803
bkSJ:matchday	0.152	0.030	5.109	0.000
bkSK:matchday	0.033	0.027	1.210	0.226
bkSO:matchday	0.133	0.030	4.454	0.000
bkSX:matchday	0.019	0.030	0.625	0.532
bkTI:matchday	0.173	0.030	5.719	0.000
bkUN:matchday	0.010	0.030	0.322	0.747
bkVC:matchday	0.120	0.029	4.188	0.000
bkWA:matchday	0.115	0.028	4.096	0.000
bkWH:matchday	0.007	0.030	0.247	0.805
bkWN:matchday	0.141	0.030	4.724	0.000
bkBD:bk.oth.d	-0.023	0.003	-7.557	0.000
bkBF:bk.oth.d	-0.034	0.003	-11.575	0.000
bkBW:bk.oth.d	-0.007	0.003	-2.300	0.021
bkBX:bk.oth.d	0.003	0.003	1.045	0.296
bkBY:bk.oth.d	-0.003	0.003	-0.851	0.395
bkCE:bk.oth.d	-0.007	0.003	-2.505	0.012
bkEE:bk.oth.d	0.003	0.003	1.014	0.311
bkFB:bk.oth.d	-0.015	0.003	-5.034	0.000
bkFR:bk.oth.d	-0.003	0.003	-0.991	0.322
bkLD:bk.oth.d	-0.007	0.003	-2.400	0.016
bkMA:bk.oth.d	-0.027	0.003	-8.652	0.000
bkPP:bk.oth.d	-0.001	0.003	-0.477	0.633
bkRD:bk.oth.d	-0.003	0.003	-0.913	0.361
bkSJ:bk.oth.d	-0.020	0.003	-6.445	0.000
bkSK:bk.oth.d	-0.001	0.003	-0.428	0.669
bkSO:bk.oth.d	-0.019	0.003	-6.269	0.000
bkSX:bk.oth.d	-0.005	0.003	-1.481	0.139
bkTI:bk.oth.d	-0.004	0.003	-1.202	0.229
bkUN:bk.oth.d	0.005	0.003	1.532	0.125
bkVC:bk.oth.d	-0.025	0.003	-8.456	0.000
bkWA:bk.oth.d	-0.002	0.003	-0.675	0.500
bkWH:bk.oth.d	-0.010	0.003	-3.274	0.001
bkWN:bk.oth.d	0.000	0.003	0.042	0.966
bkBD:team.play	-0.253	0.041	-6.174	0.000
bkBF:team.play	-0.186	0.041	-4.561	0.000
bkBW:team.play	-0.047	0.041	-1.151	0.250
bkBX:team.play	0.078	0.041	1.919	0.055
bkBY:team.play	0.089	0.041	2.202	0.028
bkCE:team.play	0.119	0.041	2.926	0.003
bkEE:team.play	0.023	0.041	0.568	0.570
bkFB:team.play	0.043	0.041	1.055	0.291
bkFR:team.play	0.048	0.041	1.170	0.242
bkLD:team.play	0.038	0.041	0.928	0.353
bkMA:team.play	-0.213	0.041	-5.175	0.000
bkPP:team.play	-0.047	0.041	-1.150	0.250
bkRD:team.play	0.020	0.041	0.479	0.632
bkSJ:team.play	-0.200	0.041	-4.898	0.000
bkSK:team.play	-0.045	0.041	-1.105	0.269
bkSO:team.play	-0.155	0.041	-3.786	0.000
bkSX:team.play	-0.047	0.041	-1.149	0.251
bkTI:team.play	-0.038	0.041	-0.928	0.353
bkUN:team.play	-0.088	0.041	-2.162	0.031
bkVC:team.play	-0.102	0.041	-2.509	0.012
bkWA:team.play	-0.031	0.041	-0.771	0.441
bkWH:team.play	0.081	0.041	1.980	0.048
bkWN:team.play	-0.051	0.041	-1.250	0.211

Interacting fixed effects with matchday, no. of other bookies changes and whether the team plays shows considerable differences between bookmakers.
- Matchday alone no longer significant.
  - Betfair (fixed and exchange), Ladbrokes, Stan James, Sportingbet, Titanbet, Bet Victor, Winner and Betway influenced by matchday.
- Team playing leads to 10% greater chance of change.
  - Small number of bookmakers less likely to update if team actually playing.
- Other bookmakers changing increases likelihood by 4%.
  - Half of bookmakers less influenced by competition.

all.bk.coefs <- data.frame("(Intercept)"=NA," matchday"=NA,"bk.oth.d"=NA,"bk.oth.d.1"=NA,
                           "team.play"=NA,"team.win"=NA,"team.lose"=NA)
all.bk.ts <- data.frame("(Intercept)"=NA," matchday"=NA,"bk.oth.d"=NA,"bk.oth.d.1"=NA,
                        "team.play"=NA,"team.win"=NA,"team.lose"=NA)
for(b in bks) {
  bk.reg.0 <- lm(bk.d ~ matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full[long.winner.full$bk==b,])
  #  print(summary(bk.reg.0))
  all.bk.coefs <- rbind(all.bk.coefs,coefficients(bk.reg.0))
  all.bk.ts <- rbind(all.bk.ts,summary(bk.reg.0)$coefficients[,3])
}
hist(all.bk.coefs$X.matchday)

hist(all.bk.coefs$bk.oth.d)

hist(all.bk.coefs$team.play)

Conclusions

Constructed and evaluated end-of-season forecasts for sports leagues.
- Hierarchical forecast model based on individual match outcomes.
Evaluated bookmaker forecasts both for matches and end-of-season outcomes.
Investigated bookmaker pricing behaviour:
- Varying degree of price competition amongst bookmakers.
Ongoing research, comments welcome…

References

Forrest, D. and Simmons, R. (2000), “Forecasting Sport: The Behaviour and Performance of Football Tipsters”, International Journal of Forecasting, Vol. 16, pp. 317–331.

Goddard, J. and Asimakopoulos, I. (2004), “Modelling football match results and the efficiency of fixed-odds betting”, International Journal of Forecasting, Vol. 23, pp. 51–66.

Hyndman, R. and Athanasopoulos, G. (2012), Forecasting: Principles and Practice, www.otexts.com, available at: https://www.otexts.org/book/fpp.

Karlis, D. and Ntzoufras, I. (2003), “Analysis of Sports Data By Using Bivariate Poisson Models”, The Statistician, Vol. 52 No. 3, pp. 381–393.

Shin, H. (1991), “Optimal Betting Odds Against Insider Traders”, The Economic Journal, Vol. 101 No. 408, pp. 1179–1185.

Shin, H. (1992), “Prices of State Contingent Claims with Insider Traders, and the Favourite-Longshot Bias”, The Economic Journal, Vol. 102 No. 411, pp. 426–435.

Shin, H. (1993), “Measuring the Incidence of Insider Trading in a Market for State-Contingent Claims”, The Economic Journal, Vol. 103 No. 420, pp. 1141–1153.

Vaughan Williams, L. and Paton, D. (1997), “Why Is There a Favourite-Longshot Bias in British Racetrack Betting Markets?”, The Economic Journal, pp. 150–158.