End-of-season forecasting and Dynamic Pricing

J James Reade

Tinbergen Institute, Erasmus School of Economics, Rotterdam
13/04/2015

Introduction

Plan of talk

  1. Review of relevant literature.
  2. Methodology of end-of-season forecast model.
  3. Data: For model and from bookmakers.
  4. Evaluate the model for individual matches.
  5. Evaluate the model for end-of-season outcomes.
  6. Investigate dynamic pricing behaviour of bookmakers.

Literature

Model: Methodology

End-of-season: Methodology

End-of-season: Example

  1. Forecasts from yesterday’s matches:
loc <- "/home/readejj/Dropbox/Research/Sport/managerial change Cormac/"
loc2 <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
library(knitr)
dates <- c("2015-01-30","2015-02-06","2015-02-13","2015-02-20","2015-02-27","2015-03-06","2015-03-13","2015-03-20","2015-04-10")
date.1 <- dates[NROW(dates)]
forecast.matches <- read.csv(paste(loc2,"forecasts_",date.1,".csv",sep=""),stringsAsFactors=F)
div <- "English Premier"
matches <- forecast.matches[forecast.matches$division==div,]
matches <- matches[order(matches$date),]
matches$id <- 1:NROW(matches)
par(mar=c(9,4,4,5)+.1)
plot(matches$id,matches$outcome,xaxt="n",xlab="",ylim=range(0,1),
     main=paste("Forecasts of Weekend ",div," Matches",sep=""),
     ylab="Probability of Outcome",col="white")
lines(matches$id,matches$Ph,col=2,pch=15,type="p")
lines(matches$id,matches$Pd,col=3,pch=16,type="p")
lines(matches$id,matches$Pa,col=4,pch=17,type="p")
legend("topleft",ncol=3,pch=c(15,16,17),col=c(2:4),
       legend=c("OL (home)","OL (draw)","OL (away)"),bty="n")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
axis(1,at=matches$id,labels=paste(matches$team1,matches$team2,sep=" v "),las=2,cex.axis=0.65)
for(i in 2:NROW(matches)){
  if(matches$date[i]!=matches$date[i-1]) {
    lines(rep(c(i-0.5),2),c(0,1),lty=2)
  }
}

  2. Generate outcomes for these matches using a multinomial distribution:
for(i in 1:NROW(matches)) {
  # Weights c(1,0.5,0) code home win, draw, away win, so prob must be ordered Ph, Pd, Pa
  matches$outcome[i] <- c(1,0.5,0) %*% rmultinom(n = 1, size=1,
                                                 prob = unlist(matches[i,c("Ph","Pd","Pa")]))
}
kable(matches[,c("team1","team2","outcome")])
     team1        team2        outcome
3    Swansea      Everton          0.0
4    Tottenham    Aston Villa      1.0
5    Southampton  Hull             1.0
6    West Ham     Stoke            0.0
7    West Brom    Leicester        1.0
8    Sunderland   C Palace         0.5
9    Burnley      Arsenal          0.5
110  QPR          Chelsea          0.5
111  Man Utd      Man City         0.5
113  Liverpool    Newcastle        0.5
  3. Update goals scored/conceded with averages, update league tables, Elo scores.
  4. Generate forecasts for next weekend’s matches using updated data.
  5. Carry on until end of season and log final positions for teams.
  6. Repeat sufficiently many times to generate distribution.
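
The steps above can be sketched on a toy league. This is only an illustration under strong simplifying assumptions: the four teams, their points and the fixture probabilities are all hypothetical, and the probabilities are held fixed rather than re-forecast after each round from updated tables and Elo scores as the procedure describes.

```r
# Toy Monte Carlo sketch of the end-of-season procedure.
# Teams, points and probabilities are hypothetical, not the talk's data.
set.seed(1)
teams  <- c("A", "B", "C", "D")
points <- c(A = 10, B = 9, C = 7, D = 4)   # current league points
fixtures <- data.frame(team1 = c("A", "C", "A", "B"),
                       team2 = c("B", "D", "C", "D"),
                       Ph = c(0.45, 0.50, 0.55, 0.60),
                       Pd = c(0.28, 0.27, 0.25, 0.24),
                       Pa = c(0.27, 0.23, 0.20, 0.16),
                       stringsAsFactors = FALSE)

simulate.season <- function(points, fixtures) {
  for (i in seq_len(nrow(fixtures))) {
    # Draw one outcome: 1 = home win, 0.5 = draw, 0 = away win
    p <- unlist(fixtures[i, c("Ph", "Pd", "Pa")])
    outcome <- sum(c(1, 0.5, 0) * rmultinom(1, size = 1, prob = p))
    h <- fixtures$team1[i]
    a <- fixtures$team2[i]
    if (outcome == 1) {
      points[h] <- points[h] + 3
    } else if (outcome == 0) {
      points[a] <- points[a] + 3
    } else {
      points[h] <- points[h] + 1
      points[a] <- points[a] + 1
    }
  }
  # Final league positions (1 = top); ties are broken at random here
  rank(-points, ties.method = "random")
}

n.sims <- 2000
final.pos  <- t(replicate(n.sims, simulate.season(points, fixtures)))
title.prob <- colMeans(final.pos == 1)              # share of runs finishing first
last.prob  <- colMeans(final.pos == length(teams))  # share of runs finishing bottom
```

The columns of `final.pos` play the role of the team columns in the `prem-final-*.csv` files read later in the talk: the share of simulated seasons in which a team finishes in a given position is the forecast probability of that outcome.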

Data: Model

res.eng <- read.csv(paste(loc,"res-eng.csv",sep=""),stringsAsFactors=F)
res.eng$goals1 <- as.numeric(res.eng$goals1)
res.eng$goals2 <- as.numeric(res.eng$goals2)
res.eng$tier <- as.numeric(res.eng$tier)
res.eng$season <- as.numeric(res.eng$season)
res.eng$X <- NULL

match.dates.2013 <- res.eng$date[res.eng$season==2013]
match.dates.2013 <- match.dates.2013[duplicated(match.dates.2013)==F]
match.dates.2013 <- match.dates.2013[order(match.dates.2013)]
match.dates.2014 <- res.eng$date[res.eng$season==2014]
match.dates.2014 <- match.dates.2014[duplicated(match.dates.2014)==F]
match.dates.2014 <- match.dates.2014[order(match.dates.2014)]
elo.prem <- read.csv(paste(loc2,"elo-prem-2010.csv",sep=""),stringsAsFactors=F)
epl.teams <- res.eng$team1[res.eng$season==2014 & res.eng$tier==1]
epl.teams <- c("Liverpool","Burnley","Stoke","QPR","Newcastle","Arsenal","West Ham","West Brom","Man Utd","Leicester","Chelsea","Aston Villa","Everton","Sunderland","Tottenham","Hull","Swansea","Man City","C Palace","Southampton")
epl.team.cols <- c("red3","purple","pink","lightblue","black","darkred","purple2","darkblue","red","blue","darkblue","purple3","blue2","hotpink","grey10","orange","grey20","skyblue","red4","pink3")
plot(range(as.Date(elo.prem$date),na.rm=T),rep(1,2),type="l",
     ylim=range(elo.prem[gsub(" ",".",epl.teams)],na.rm=T),
     main="Elo Scores for Current EPL Teams since 2010",ylab="Elo score",xlab="Date")
for(t in 1:NROW(epl.teams)) {
  lines(as.Date(elo.prem$date),elo.prem[,gsub(" ",".",epl.teams[t])],type="l",col=epl.team.cols[t])
}

library(MASS)
model.ord <- polr(as.factor(outcome) ~ E.1 + pts1 + pts.D + pts.D.2 + pld1 + pld.D + pld.D.2 + 
                   gs1 + gs.D + gs.D.2 + gd1 + gd.D + gd.D.2 + pos1 + pos.D + pos.D.2 + 
                   form1 + form.D + form.D.2 + season.d, 
                 data=res.eng, method = "logistic")
options(scipen=13)
#summary(model.ord)
kable(summary(model.ord)$coef, digits=3)
## 
## Re-fitting to get Hessian
           Value  Std. Error  t value
E.1        2.379       0.019  128.119
pts1       0.007       0.005    1.627
pts.D     -0.014       0.003   -4.306
pts.D.2    0.000       0.000   -0.972
pld1      -0.012       0.006   -1.935
pld.D      0.024       0.010    2.365
pld.D.2    0.007       0.004    1.575
gs1        0.003       0.002    1.700
gs.D       0.001       0.002    0.801
gs.D.2     0.000       0.000    0.046
gd1       -0.005       0.003   -2.087
gd.D       0.017       0.002    9.023
gd.D.2     0.000       0.000   -0.010
pos1       0.005       0.003    1.407
pos.D      0.008       0.003    2.905
pos.D.2    0.000       0.000    2.502
form1      0.007       0.004    1.974
form.D    -0.013       0.003   -4.388
form.D.2  -0.001       0.000   -2.326
season.d  -0.003       0.000  -11.728
0|0.5      0.127       0.033    3.807
0.5|1      1.271       0.033   38.085
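
To show how the two cutpoints in this table turn a linear predictor into the three match probabilities, here is a minimal sketch. It uses the `polr` parameterisation, in which the log-odds of an outcome at or below level k equal the cutpoint minus the linear predictor. The cutpoints are copied from the table. The linear-predictor values `eta` and the helper `ordered.probs` are hypothetical and only for illustration.

```r
# Cutpoints from the coefficient table; outcome levels are 0 (away), 0.5 (draw), 1 (home)
zeta <- c(0.127, 1.271)

# Hypothetical helper: ordered-logit probabilities for linear predictor eta
ordered.probs <- function(eta, zeta) {
  p.le.away <- plogis(zeta[1] - eta)   # P(outcome <= 0)   = P(away win)
  p.le.draw <- plogis(zeta[2] - eta)   # P(outcome <= 0.5) = P(away win or draw)
  cbind(Pa = p.le.away,
        Pd = p.le.draw - p.le.away,
        Ph = 1 - p.le.draw)
}

eta   <- c(0.5, 1.0, 1.5)   # made-up linear predictors, not fitted values
probs <- ordered.probs(eta, zeta)
```

Each row of `probs` sums to one, and a larger `eta` shifts probability from the away win towards the home win. The positive coefficient on `E.1` therefore raises the home-win probability.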

Data: Bookmakers

source("/home/readejj/Dropbox/Research/Code/R/betting/clean.data.R")

bks <- c("B3", "SK", "BX", "BY", "FR", "SO", "VC", "PP", "SJ", "EE", "LD", "CE", 
         "WH", "WN", "SX", "FB", "WA", "TI", "UN", "BW", "RD", "BF", "BD", "MA")
bks.full <- c("Bet365", "SkyBet", "Totesport", "Boyle Sports", "Betfred", "Sportingbet", "Bet Victor", "Paddy Power", "Stan James", "888sport", "Ladbrokes", "Coral", 
         "William Hill", "Winner", "Spreadex", "Betfair-fixed", "Betway", "Titanbet", "Unibet", "bwin", "32Red", "Betfair-exchange", "Betdaq", "Matchbook")

direc <- c("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/winner")
winner <- data.frame(stringsAsFactors=F)
teams <- c(paste(direc,dir(direc,pattern="[.]csv$"),sep="/"))

for (team in teams) {
  temp <- read.csv(team,stringsAsFactors=F)
  temp <- clean.data(temp)
  temp$team <- gsub("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/winner/(\\S+).csv","\\1",team)
  winner <- rbind(winner,temp)
}
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
winner <- winner[order(winner$team,winner$Date.Time),]

direc <- c("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/relegation")
releg <- data.frame(stringsAsFactors=F)
teams <- c(paste(direc,dir(direc,pattern="[.]csv$"),sep="/"))

for (team in teams) {
  temp <- read.csv(team,stringsAsFactors=F)
  temp <- clean.data(temp)
  temp$team <- gsub("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/eos/premier-league/2015-04-08/relegation/(\\S+).csv","\\1",team)
  releg <- rbind(releg,temp)
}

releg <- releg[order(releg$team,releg$Date.Time),]

Model: Appraisal against individual events

all.forecast.outcomes <- data.frame()
for(i in dates) {
  temp.0 <- read.csv(paste(loc2,"forecast_outcomes_",i,".csv",sep=""),stringsAsFactors=F)
  temp.0$X <-NULL
  temp.0$forc.week <- i
  temp.1 <- read.csv(paste(loc2,"forecasts_",i,".csv",sep=""))
  temp.1$X <-NULL
  temp.1 <- temp.1[is.na(temp.1$outcome)==F,]
  if(!("Ph" %in% colnames(temp.1))) {
    temp.1$Ph <- NA
    temp.1$Pd <- NA
    temp.1$Pa <- NA
  }
  if(!("tier" %in% colnames(temp.0))) {
    temp.0$tier <- NA
  }
  temp.2 <- merge(temp.1[,c("match_id","outcome","Ph","Pd","Pa")],
                             temp.0[,c("match_id","date","division","team1",
                                       "goals1","goals2","team2","outcome",
                                       "season","tier","forc.week")],
                  by=c("match_id"),suffixes=c(".forc",".final"))
  all.forecast.outcomes <- rbind(temp.2[is.na(temp.2$outcome.final)==F,],all.forecast.outcomes)
}
all.forecast.outcomes$outcome.h <- as.numeric(all.forecast.outcomes$outcome.final==1)
mz.h <- lm(outcome.h ~ Ph,data=all.forecast.outcomes)
#summary(mz.h)
kable(summary(mz.h)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.095       0.073    1.309     0.191
Ph              0.733       0.153    4.793     0.000
all.forecast.outcomes$outcome.d <- as.numeric(all.forecast.outcomes$outcome.final==0.5)
mz.d <- lm(outcome.d ~ Pd,data=all.forecast.outcomes)
#summary(mz.d)
kable(summary(mz.d)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.061       0.155    0.396     0.692
Pd              0.714       0.598    1.194     0.233
all.forecast.outcomes$outcome.a <- as.numeric(all.forecast.outcomes$outcome.final==0)
mz.a <- lm(outcome.a ~ Pa,data=all.forecast.outcomes)
#summary(mz.a)
kable(summary(mz.a)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.081       0.053    1.510     0.132
Pa              0.854       0.170    5.011     0.000
calib.h <- aggregate(all.forecast.outcomes$outcome.h,by=list(round(all.forecast.outcomes$Ph,2)),FUN=mean)
plot(calib.h$Group.1,calib.h$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for home win outcomes")
abline(0,1,lty=3)

calib.d <- aggregate(all.forecast.outcomes$outcome.d,by=list(round(all.forecast.outcomes$Pd,2)),FUN=mean)
plot(calib.d$Group.1,calib.d$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for draw outcomes")
abline(0,1)

calib.a <- aggregate(all.forecast.outcomes$outcome.a,by=list(round(all.forecast.outcomes$Pa,2)),FUN=mean)
plot(calib.a$Group.1,calib.a$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for away win outcomes")
abline(0,1)
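
The `mz.*` regressions above regress the realised outcome on the forecast probability (the object names suggest Mincer-Zarnowitz regressions). A well-calibrated forecast should give an intercept near 0 and a slope near 1, and the two restrictions can be tested jointly. The sketch below does this with a hand-computed F statistic on simulated data. The data are hypothetical, drawn so that the forecasts are calibrated by construction. Because the outcome is binary the regression errors are heteroskedastic, so the F test is only approximate.

```r
# Joint test of intercept = 0 and slope = 1 on simulated (hypothetical) forecasts
set.seed(42)
n <- 500
p <- runif(n, 0.1, 0.8)   # forecast probabilities
y <- rbinom(n, 1, p)      # outcomes drawn with exactly those probabilities

mz    <- lm(y ~ p)                 # unrestricted: y = a + b*p + e
rss.u <- sum(resid(mz)^2)          # unrestricted residual sum of squares
rss.r <- sum((y - p)^2)            # restricted: a = 0 and b = 1

# Two restrictions, n - 2 residual degrees of freedom
F.stat  <- ((rss.r - rss.u) / 2) / (rss.u / (n - 2))
p.value <- pf(F.stat, df1 = 2, df2 = n - 2, lower.tail = FALSE)
```

A large `p.value` means the data do not reject calibration. Applying the same calculation to `mz.h`, `mz.d` and `mz.a` would need only the outcome and probability columns of `all.forecast.outcomes`.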

Comparison: Bookmakers on individual events

bk <- data.frame()
dates <- dir("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/",
             pattern="\\d{4}-\\d{2}-\\d{2}")
for(d in dates) {
  loc0 <- paste("/home/readejj/Dropbox/Research/Data for Ideas/Betting/football/",d,"/",sep="")
  divs <- dir(loc0)
  for(div in divs) {
    loc2 <- paste(loc0,div,"/",sep="")
    bk.matches <- dir(loc2,pattern=".*?-v-.*?-.*?[.]csv")
    for(i in bk.matches) {
      temp <- read.csv(paste(loc2,i,sep="/"),stringsAsFactors=F)
      if(NROW(temp)>=2) {
        temp <- clean.data(temp)
        temp$div <- div
        temp$match.event <- gsub("[.]csv","",i)
        bk <- rbind(bk,temp)      
      }
    }
  }
}
bk <- bk[order(bk$match.event,bk$Date.Time),]
bk$mean <- 1/rowMeans(bk[colnames(bk)[nchar(colnames(bk))==2]],na.rm=T)
bk$match.event <- gsub("sheffield-wednesday","sheff wed",bk$match.event)
bk$match.event <- gsub("nottingham-forest","nottm forest",bk$match.event)
bk$match.event <- gsub("middlesbrough","middlesbro",bk$match.event)
bk$match.event <- gsub("port-vale","port vale",bk$match.event)
bk$match.event <- gsub("man-utd","man utd",bk$match.event)
bk$match.event <- gsub("man-city","man city",bk$match.event)
bk$match.event <- gsub("aston-villa","aston villa",bk$match.event)
bk$match.event <- gsub("west-brom","west brom",bk$match.event)
bk$match.event <- gsub("west-ham","west ham",bk$match.event)
bk$match.event <- gsub("crystal-palace","c palace",bk$match.event)
bk$match.event <- gsub("sheffield-utd","sheff utd",bk$match.event)
bk$match.event <- gsub("cambridge-utd","cambridge u",bk$match.event)
bk$match.event <- gsub("dagenham-redbridge","dag & red",bk$match.event)
bk$match.event <- gsub("mk-dons","mk dons",bk$match.event)
bk$match.event <- gsub("notts-county","notts co",bk$match.event)
bk$match.event <- gsub("newport-county","newport co",bk$match.event)
bk$match.event <- gsub("afc-wimbledon","afc w'bledon",bk$match.event)
bk$match.event <- gsub("york-city","york",bk$match.event)
bk$team1 <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\1",bk$match.event)
bk$team2 <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\2",bk$match.event)
bk$match.event <- gsub("^(.*?)-v-(.*?)-(.*?)$","\\3",bk$match.event)

bk.h <- bk[bk$match.event==bk$team1,]
bk.d <- bk[bk$match.event=="draw",]
bk.a <- bk[bk$match.event==bk$team2,]
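
For reference, `bk$mean` above is the inverse of the average decimal odds across bookmakers, which gives an implied probability for each outcome. Bookmaker odds normally include a margin, so the implied probabilities for the three outcomes of one match sum to more than one. The sketch below uses made-up odds for a single match to show the calculation and one optional way of removing the margin. The talk's code does not normalise, so that step is an assumption added for illustration.

```r
# Hypothetical decimal odds for one match from three of the bookmakers
odds <- data.frame(outcome = c("home", "draw", "away"),
                   B3 = c(2.10, 3.40, 3.60),
                   WH = c(2.05, 3.30, 3.75),
                   PP = c(2.15, 3.25, 3.50),
                   stringsAsFactors = FALSE)
bk.cols <- c("B3", "WH", "PP")

# As in the talk's code: average the odds across bookmakers, then invert
odds$mean <- 1 / rowMeans(odds[bk.cols], na.rm = TRUE)

# Implied probabilities sum to more than one; the excess is the overround
overround <- sum(odds$mean) - 1

# Optional (not in the talk's code): rescale so the probabilities sum to one
odds$mean.norm <- odds$mean / sum(odds$mean)
```

Whether to normalise matters for the regressions that follow, since an un-normalised probability is systematically too high as a forecast.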

all.forecast.outcomes$team1 <- tolower(all.forecast.outcomes$team1)
all.forecast.outcomes$team2 <- tolower(all.forecast.outcomes$team2)
all.forecast.outcomes <- all.forecast.outcomes[duplicated(all.forecast.outcomes)==F,]

bk.h <- merge(bk.h,all.forecast.outcomes[,c("match_id","team1","team2","outcome.h","Ph")],
              by=c("team1","team2"),all.x=T)
bk.mz.h <- lm(outcome.h ~ mean,data=bk.h)
#summary(bk.mz.h)
kable(summary(bk.mz.h)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.113       0.026    4.346         0
mean            0.486       0.044   11.130         0
bk.calib.h <- aggregate(bk.h$outcome.h,by=list(round(bk.h$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.h$Group.1,bk.calib.h$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="BK Calibration plot for home win outcomes")
abline(0,1,lty=3)

bk.d <- merge(bk.d,all.forecast.outcomes[,c("match_id","team1","team2","outcome.d","Pd")],
              by=c("team1","team2"),all.x=T)
bk.mz.d <- lm(outcome.d ~ mean,data=bk.d)
#summary(bk.mz.d)
kable(summary(bk.mz.d)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.200       0.018   11.291     0.000
mean            0.112       0.038    2.953     0.003
bk.calib.d <- aggregate(bk.d$outcome.d,by=list(round(bk.d$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.d$Group.1,bk.calib.d$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="BK Calibration plot for draw outcomes")
abline(0,1,lty=3)

bk.a <- merge(bk.a,all.forecast.outcomes[,c("match_id","team1","team2","outcome.a","Pa")],
              by=c("team1","team2"),all.x=T)
bk.mz.a <- lm(outcome.a ~ mean,data=bk.a)
#summary(bk.mz.a)
kable(summary(bk.mz.a)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)     0.278       0.020   13.756         0
mean            0.220       0.039    5.639         0
bk.calib.a <- aggregate(bk.a$outcome.a,by=list(round(bk.a$mean,2)),FUN=mean,na.rm=T)
plot(bk.calib.a$Group.1,bk.calib.a$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="BK Calibration plot for away win outcomes")
abline(0,1,lty=3)

bk.h$diff <- bk.h$mean - bk.h$Ph
bk.mod.mz.h <- lm(outcome.h ~ mean + diff,data=bk.h)
#summary(bk.mod.mz.h)
kable(summary(bk.mod.mz.h)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -0.137       0.031   -4.456         0
mean            1.126       0.062   18.169         0
diff           -0.943       0.067  -13.979         0
bk.d$diff <- bk.d$mean - bk.d$Pd
bk.mod.mz.d <- lm(outcome.d ~ mean + diff,data=bk.d)
#summary(bk.mod.mz.d)
kable(summary(bk.mod.mz.d)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -0.141       0.065   -2.167      0.03
mean            1.481       0.255    5.815      0.00
diff           -1.401       0.258   -5.434      0.00
bk.a$diff <- bk.a$mean - bk.a$Pa
bk.mod.mz.a <- lm(outcome.a ~ mean + diff,data=bk.a)
#summary(bk.mod.mz.a)
kable(summary(bk.mod.mz.a)$coef, digits=3)
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)    -0.012       0.024   -0.483     0.629
mean            1.297       0.067   19.463     0.000
diff           -1.345       0.070  -19.193     0.000
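
Since `diff` is defined as `mean` minus the model probability, each of the three regressions above can be rearranged into separate weights on the bookmaker and model forecasts: a + b1·mean + b2·(mean − P) = a + (b1 + b2)·mean − b2·P. The sketch below does that arithmetic with the coefficients copied from the tables. The column names `w.bookmaker` and `w.model` are labels introduced here for illustration.

```r
# Coefficients copied from the three "mean + diff" regression tables
coefs <- data.frame(outcome = c("home", "draw", "away"),
                    b.mean  = c(1.126, 1.481, 1.297),
                    b.diff  = c(-0.943, -1.401, -1.345),
                    stringsAsFactors = FALSE)

# outcome = a + b.mean*mean + b.diff*(mean - P)
#         = a + (b.mean + b.diff)*mean - b.diff*P
coefs$w.bookmaker <- coefs$b.mean + coefs$b.diff   # implied weight on bookmaker mean
coefs$w.model     <- -coefs$b.diff                 # implied weight on model probability
```

These weights should be read with caution. `bk.h`, `bk.d` and `bk.a` appear to hold many odds snapshots per match, so the observations are unlikely to be independent and the reported standard errors may overstate precision.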
plot(calib.h$Group.1,calib.h$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for home win outcomes")
lines(bk.calib.h$Group.1,bk.calib.h$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),pch=1,bty="n")

plot(calib.d$Group.1,calib.d$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for draw outcomes")
lines(bk.calib.d$Group.1,bk.calib.d$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),pch=1,bty="n")

plot(calib.a$Group.1,calib.a$x,xlim=range(0,1),ylim=range(0,1),
     ylab="Frequency of events that occurred",xlab="Forecast probability",
     main="Calibration plot for away win outcomes")
lines(bk.calib.a$Group.1,bk.calib.a$x,col=2,type="p")
abline(0,1,lty=3)
legend("topleft",col=1:2,legend=c("Model","Bookmakers"),pch=1,bty="n")

Model: End-of-season outcomes 2013

forecast.days.2013 <- data.frame()

dloc <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
days <- dir(dloc,pattern="prem-final-*\\d*-\\d+[.]csv")
for( d in days) {
  temp <- read.csv(paste(dloc,d,sep=""),stringsAsFactors=F) 
  temp <- temp[is.na(temp$Man.Utd)==F,]
  temp$day <- as.Date(as.numeric(gsub("prem-final-*\\d*-(\\d+)[.]csv","\\1",d)),origin="1970-01-01")
  forecast.days.2013 <- rbind(forecast.days.2013,temp)
}

forc.probs.2013.1 <- aggregate(forecast.days.2013==1,by=list(forecast.days.2013$day),FUN=mean)
forc.probs.2013.rel <- aggregate(forecast.days.2013>=18,by=list(forecast.days.2013$day),FUN=mean)

plot(forc.probs.2013.1$Group.1,forc.probs.2013.1$Chelsea,ylim=range(0,1),type="o",col="blue",
     main="Model forecasts for EPL Title 2013-14",
     ylab="Probability of Winning EPL",xlab="Date")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Man.City,type="o",col="skyblue")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Man.Utd,type="o",col="red")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Arsenal,type="o",col="darkred")
lines(forc.probs.2013.1$Group.1,forc.probs.2013.1$Liverpool,type="o",col="red2")

plot(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Cardiff,ylim=range(0,1),type="o",col="red",
     main="Model forecasts for EPL Relegation 2013-14",
     ylab="Probability of Relegation from EPL",xlab="Date")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Norwich,type="o",col="yellow")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Fulham,type="o",col="black")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Aston.Villa,type="o",col="purple")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$West.Brom,type="o",col="darkblue")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$West.Ham,type="o",col="purple3")
lines(forc.probs.2013.rel$Group.1,forc.probs.2013.rel$Sunderland,type="o",col="pink")

Model: End-of-season outcomes 2014

forecast.days <- data.frame()

dloc <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
days <- dir(dloc,pattern="prem-final-2014-\\d-(\\d+)[.]csv")
for( d in days) {
  temp <- read.csv(paste(dloc,d,sep=""),stringsAsFactors=F) 
  temp$day <- as.Date(as.numeric(gsub("prem-final-2014-\\d+-(\\d+)[.]csv","\\1",d)),origin="1970-01-01")
  forecast.days <- rbind(forecast.days,temp)
}

forc.probs.1 <- aggregate(forecast.days==1,by=list(forecast.days$day),FUN=mean)
forc.probs.rel <- aggregate(forecast.days>=18,by=list(forecast.days$day),FUN=mean)


plot(forc.probs.1$Group.1,forc.probs.1$Chelsea,ylim=range(0,1),type="o",col="blue",
     main="Model forecasts for EPL Title 2014-15",
     ylab="Probability of Winning EPL",xlab="Date")
lines(forc.probs.1$Group.1,forc.probs.1$Man.City,type="o",col="skyblue")
lines(forc.probs.1$Group.1,forc.probs.1$Man.Utd,type="o",col="red")
lines(forc.probs.1$Group.1,forc.probs.1$Arsenal,type="o",col="darkred")
lines(forc.probs.1$Group.1,forc.probs.1$Liverpool,type="o",col="red2")

plot(forc.probs.rel$Group.1,forc.probs.rel$Leicester,ylim=range(0,1),type="o",col="blue",
     main="Model forecasts for EPL Relegation 2014-15",
     ylab="Probability of Relegation from EPL",xlab="Date")
lines(forc.probs.rel$Group.1,forc.probs.rel$Burnley,type="o",col="purple")
lines(forc.probs.rel$Group.1,forc.probs.rel$QPR,type="o",col="lightblue")
lines(forc.probs.rel$Group.1,forc.probs.rel$Sunderland,type="o",col="pink")

Comparison: Bookmaker end-of-season outcomes

teams.1 <- winner$team[duplicated(winner$team)==F]
team.cols.1 <- c("darkred","blue","red2","skyblue","red","pink","black")

plot(range(winner$Date.Time),rep(0,2),ylim=range(0,1),col="white",ylab="Probability",xlab="",
     main="Bookmaker Implied Probabilities for EPL Title",xaxt="n")
axis(1,at=seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),
     labels=format(seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),"%b-%Y"),las=2)
for(t in 1:NROW(teams.1)) {
  for(b in 1:NROW(bks)) {
    lines(winner$Date.Time[winner$team==teams.1[t]],1/winner[winner$team==teams.1[t],bks[b]],col=team.cols.1[t])
  }  
}
legend("topleft",ncol=1,lty=1,col=team.cols.1,legend=teams.1,bty="n")

for(date in match.dates.2014) {
  abline(v=as.Date(date),lty=3)
}
for(date in match.dates.2013[-NROW(match.dates.2013)]) {
  abline(v=as.Date(date,origin="1970-01-01"),lty=3,col="grey")
}

teams.r <- releg$team[duplicated(releg$team)==F]
team.cols.r <- c("purple","mediumpurple1","red","blue","orange","darkblue","black","lightblue","pink","hotpink",
               "black","darkblue","purple3")

plot(range(releg$Date.Time),rep(0,2),ylim=range(0,1),col="white",ylab="Probability",xlab="",
     main="Bookmaker Implied Probabilities for Relegation",xaxt="n")
axis(1,at=seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),
     labels=format(seq(as.Date("2014-08-01"),as.Date("2015-06-01"),by="months"),"%b-%Y"),las=2)
for(t in 1:NROW(teams.r)) {
  for(b in 1:NROW(bks)) {
    lines(releg$Date.Time[releg$team==teams.r[t]],1/releg[releg$team==teams.r[t],bks[b]],col=team.cols.r[t])
  }  
}
legend("topleft",ncol=1,lty=1,col=team.cols.r,legend=teams.r,bty="n")

for(date in match.dates.2014) {
  abline(v=as.Date(date),lty=3)
}

Model vs Bookmakers

plot(forc.probs.1$Group.1,forc.probs.1$Chelsea,ylim=range(0,1),type="o",col="blue",
     main="Comparison - Model and Bookmakers, Chelsea",
     ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(winner$Date.Time[winner$team=="chelsea"],1/winner[winner$team=="chelsea",bks[b]],col="blue")
}  

plot(forc.probs.1$Group.1,forc.probs.1$Man.City,ylim=range(0,1),type="o",col="skyblue",
     main="Comparison - Model and Bookmakers, Man City",
     ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(winner$Date.Time[winner$team=="man-city"],1/winner[winner$team=="man-city",bks[b]],col="skyblue")
}  

plot(forc.probs.1$Group.1,forc.probs.1$Man.Utd,ylim=range(0,1),type="o",col="red",
     main="Comparison - Model and Bookmakers, Man United",
     ylab="Probability of Winning EPL",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(winner$Date.Time[winner$team=="man-utd"],1/winner[winner$team=="man-utd",bks[b]],col="red")
}  

plot(forc.probs.rel$Group.1,forc.probs.rel$Leicester,ylim=range(0,1),type="o",col="blue",
     main="Comparison - Model and Bookmakers, Leicester",
     ylab="Probability of Relegation",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(releg$Date.Time[releg$team=="leicester"],1/releg[releg$team=="leicester",bks[b]],col="blue")
}  

plot(forc.probs.rel$Group.1,forc.probs.rel$Burnley,ylim=range(0,1),type="o",col="purple",
     main="Comparison - Model and Bookmakers, Burnley",
     ylab="Probability of Relegation",xlab="Date")
for(b in 1:NROW(bks)) {
  lines(releg$Date.Time[releg$team=="burnley"],1/releg[releg$team=="burnley",bks[b]],col="purple")
}  

Discussion

Analysis of Bookmakers

  1. Who offers best price most often?
  2. What prompts bookmakers to change prices?
    • Analysis suggests matches are the most important driver of price changes.
    • To what extent do competitive pressures influence updating?

Frequency of Price Changes

winner.full <- data.frame(stringsAsFactors=F)
for(t in teams.1) {
  temp <- data.frame("Date.Time"=seq(min(winner$Date.Time),max(winner$Date.Time),by="days"),
                     "team"=t,stringsAsFactors=F)
  winner.full <- rbind(winner.full,temp)
}
winner.full <- merge(winner.full,winner,by=c("Date.Time","team"),all.x=T)
winner.full <- winner.full[order(winner.full$team,winner.full$Date.Time),]

for(t in teams.1) {
  winner.full[winner.full$team==t,bks] <- na.locf(winner.full[winner.full$team==t,bks],na.rm=F)
}


winner.full$matchday <- as.numeric(winner.full$Date.Time %in% as.Date(c(match.dates.2013,match.dates.2014)))

winner.full <- winner.full[order(winner.full$team,winner.full$Date.Time),]
for(b in bks) {
  winner.full[paste(b,1,sep=".")] <- c(-999,winner.full[-NROW(winner.full),b])
}
winner.full$team.1 <- c("na",winner.full$team[-NROW(winner.full)])
for(b in bks) {
  winner.full[paste(b,"d",sep=".")] <- as.numeric((winner.full[b] != winner.full[paste(b,1,sep=".")]) & (winner.full$team==winner.full$team.1))
}
best.p <- apply(winner.full[,nchar(colnames(winner.full))==2],1,FUN=max,na.rm=T)
## Warning in FUN(newX[, i], ...): no non-missing arguments to max; returning
## -Inf
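# For each two-letter bookmaker column, count the rows where its value equals best.p
best.full <- colSums(winner.full[,nchar(colnames(winner.full))==2]==best.p,na.rm=TRUE)
# Column means of the ".d" columns, expressed in percent
100*colMeans(winner.full[grep("[.]d",colnames(winner.full))],na.rm=TRUE)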
##     B3.d     SK.d     BX.d     BY.d     FR.d     SO.d     VC.d     PP.d 
## 12.64007 10.98592 12.19872 11.98630 11.27717 11.48497 19.10064 12.89675 
##     SJ.d     EE.d     LD.d     CE.d     WH.d     WN.d     SX.d     FB.d 
## 12.75294 12.26941 15.63461 12.20811 10.45611 14.50313 11.40351 13.28060 
##     WA.d     TI.d     UN.d     BW.d     RD.d     BF.d     BD.d     MA.d 
## 13.25301 16.70468 12.14071 10.25311 14.35986 17.02713 30.60697 29.63585
long.winner.full <- read.csv(paste(loc,"long-winner-full.csv",sep=""),stringsAsFactors=FALSE)

bk.reg <- lm(bk.d ~ matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full)
#summary(bk.reg)
kable(summary(bk.reg)$coef, digits=3)
Term            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)        0.017       0.002   10.139     0.000
matchday           0.058       0.005   12.819     0.000
bk.oth.d           0.032       0.000   66.520     0.000
bk.oth.d.1         0.001       0.000    3.315     0.001
team.play          0.075       0.010    7.315     0.000
team.win          -0.004       0.011   -0.414     0.679
team.lose          0.036       0.013    2.803     0.005
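How such a regression reads can be sketched on simulated data. The sketch assumes that bk.d is a 0/1 indicator for a bookmaker changing its price and that matchday is a 0/1 dummy, neither of which the output above states, so that lm() is a linear probability model and each coefficient is a change in the probability of a price move. All data and effect sizes below are invented for illustration and are not the talk's data.

```r
# Simulated data only: variable names mirror the regression above,
# but the values and effect sizes are hypothetical.
set.seed(1)
n <- 5000
sim <- data.frame(
  matchday = rbinom(n, 1, 0.3),  # assumed: 1 if matches are played that day
  bk.oth.d = rpois(n, 3)         # assumed: number of other bookmakers changing price
)

# True probability of a price change, linear in the regressors
p.change <- 0.02 + 0.05 * sim$matchday + 0.03 * sim$bk.oth.d
sim$bk.d <- rbinom(n, 1, p.change)

# Linear probability model: OLS on the 0/1 outcome
lpm <- lm(bk.d ~ matchday + bk.oth.d, data = sim)
round(summary(lpm)$coef, 3)
```

Under these assumptions the fitted coefficients should land close to the simulated 0.05 and 0.03. Read the same way, the reported 0.058 on matchday would mean a price change is about 5.8 percentage points more likely on a match day. A linear probability model has heteroskedastic errors by construction, so the plain OLS standard errors are only a first approximation.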
bk.reg.2 <- lm(bk.d ~ bk + matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full)
#summary(bk.reg.2)
kable(summary(bk.reg.2)$coef, digits=3)
Term            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)       -0.004       0.006   -0.572     0.567
bkBD               0.191       0.009   21.987     0.000
bkBF               0.057       0.008    6.732     0.000
bkBW              -0.020       0.009   -2.327     0.020
bkBX              -0.014       0.009   -1.573     0.116
bkBY              -0.002       0.009   -0.220     0.826
bkCE               0.004       0.009    0.484     0.628
bkEE               0.001       0.009    0.086     0.931
bkFB               0.013       0.009    1.557     0.119
bkFR              -0.015       0.009   -1.682     0.093
bkLD               0.033       0.009    3.821     0.000
bkMA               0.154       0.009   16.570     0.000
bkPP               0.013       0.009    1.553     0.120
bkRD               0.022       0.009    2.513     0.012
bkSJ              -0.003       0.009   -0.356     0.722
bkSK              -0.007       0.009   -0.839     0.402
bkSO              -0.011       0.009   -1.237     0.216
bkSX              -0.021       0.009   -2.323     0.020
bkTI               0.018       0.009    1.905     0.057
bkUN              -0.001       0.009   -0.068     0.946
bkVC               0.071       0.009    8.252     0.000
bkWA               0.014       0.009    1.579     0.114
bkWH              -0.018       0.009   -2.097     0.036
bkWN               0.020       0.009    2.262     0.024
matchday           0.055       0.004   12.308     0.000
bk.oth.d           0.032       0.000   68.096     0.000
bk.oth.d.1         0.001       0.000    3.557     0.000
team.play          0.074       0.010    7.255     0.000
team.win          -0.004       0.010   -0.403     0.687
team.lose          0.035       0.013    2.762     0.006
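The bkXX rows are bookmaker dummies that lm() creates from the bk variable. B3 has no row, presumably because it sorts first among the bookmaker codes and so serves as the reference level, which means each bkXX coefficient is a difference relative to B3. A minimal sketch on made-up data with three bookmakers shows the mechanics; the values are hypothetical.

```r
# Toy data: three bookmakers, four observations each (values are made up)
toy <- data.frame(
  bk   = rep(c("B3", "BD", "BF"), each = 4),
  bk.d = c(0, 0, 1, 0,   1, 1, 0, 1,   0, 1, 0, 1)
)

# lm() converts the character column to a factor; the first level (B3)
# becomes the baseline and is absorbed into the intercept
fit <- lm(bk.d ~ bk, data = toy)
coef(fit)
# (Intercept) = 0.25 (mean for B3), bkBD = 0.50, bkBF = 0.25

# The dummy coefficients are differences in group means relative to B3
group.means <- tapply(toy$bk.d, toy$bk, mean)
group.means - group.means["B3"]
```

To compare against a different bookmaker, the baseline can be changed with relevel(factor(toy$bk), ref = "BF") before fitting.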
bk.reg.3 <- lm(bk.d ~ bk + matchday*bk + bk.oth.d*bk + team.play*bk,data=long.winner.full)
#summary(bk.reg.3)
kable(summary(bk.reg.3)$coef, digits=3)
Term            Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)       -0.022       0.007   -3.004     0.003
bkBD               0.267       0.010   26.014     0.000
bkBF               0.156       0.010   15.409     0.000
bkBW               0.016       0.010    1.575     0.115
bkBX              -0.027       0.011   -2.482     0.013
bkBY              -0.002       0.010   -0.149     0.881
bkCE               0.013       0.010    1.296     0.195
bkEE              -0.008       0.010   -0.789     0.430
bkFB               0.031       0.010    3.002     0.003
bkFR              -0.009       0.011   -0.885     0.376
bkLD               0.018       0.010    1.760     0.078
bkMA               0.284       0.011   25.388     0.000
bkPP               0.016       0.010    1.530     0.126
bkRD               0.031       0.010    2.960     0.003
bkSJ               0.045       0.011    4.261     0.000
bkSK              -0.005       0.010   -0.473     0.636
bkSO               0.035       0.011    3.353     0.001
bkSX              -0.006       0.011   -0.543     0.587
bkTI              -0.018       0.011   -1.564     0.118
bkUN              -0.010       0.010   -0.931     0.352
bkVC               0.132       0.010   12.740     0.000
bkWA              -0.003       0.010   -0.337     0.736
bkWH               0.005       0.010    0.475     0.635
bkWN              -0.009       0.011   -0.895     0.371
matchday           0.001       0.021    0.066     0.947
bk.oth.d           0.042       0.002   19.399     0.000
team.play          0.106       0.029    3.695     0.000
bkBD:matchday      0.077       0.030    2.569     0.010
bkBF:matchday      0.073       0.027    2.720     0.007
bkBW:matchday     -0.042       0.030   -1.394     0.163
bkBX:matchday     -0.032       0.030   -1.066     0.286
bkBY:matchday      0.000       0.029    0.008     0.994
bkCE:matchday      0.022       0.028    0.769     0.442
bkEE:matchday     -0.014       0.030   -0.458     0.647
bkFB:matchday      0.110       0.028    3.913     0.000
bkFR:matchday     -0.002       0.030   -0.059     0.953
bkLD:matchday      0.149       0.030    5.030     0.000
bkMA:matchday     -0.026       0.030   -0.867     0.386
bkPP:matchday      0.036       0.028    1.300     0.193
bkRD:matchday     -0.007       0.030   -0.250     0.803
bkSJ:matchday      0.152       0.030    5.109     0.000
bkSK:matchday      0.033       0.027    1.210     0.226
bkSO:matchday      0.133       0.030    4.454     0.000
bkSX:matchday      0.019       0.030    0.625     0.532
bkTI:matchday      0.173       0.030    5.719     0.000
bkUN:matchday      0.010       0.030    0.322     0.747
bkVC:matchday      0.120       0.029    4.188     0.000
bkWA:matchday      0.115       0.028    4.096     0.000
bkWH:matchday      0.007       0.030    0.247     0.805
bkWN:matchday      0.141       0.030    4.724     0.000
bkBD:bk.oth.d     -0.023       0.003   -7.557     0.000
bkBF:bk.oth.d     -0.034       0.003  -11.575     0.000
bkBW:bk.oth.d     -0.007       0.003   -2.300     0.021
bkBX:bk.oth.d      0.003       0.003    1.045     0.296
bkBY:bk.oth.d     -0.003       0.003   -0.851     0.395
bkCE:bk.oth.d     -0.007       0.003   -2.505     0.012
bkEE:bk.oth.d      0.003       0.003    1.014     0.311
bkFB:bk.oth.d     -0.015       0.003   -5.034     0.000
bkFR:bk.oth.d     -0.003       0.003   -0.991     0.322
bkLD:bk.oth.d     -0.007       0.003   -2.400     0.016
bkMA:bk.oth.d     -0.027       0.003   -8.652     0.000
bkPP:bk.oth.d     -0.001       0.003   -0.477     0.633
bkRD:bk.oth.d     -0.003       0.003   -0.913     0.361
bkSJ:bk.oth.d     -0.020       0.003   -6.445     0.000
bkSK:bk.oth.d     -0.001       0.003   -0.428     0.669
bkSO:bk.oth.d     -0.019       0.003   -6.269     0.000
bkSX:bk.oth.d     -0.005       0.003   -1.481     0.139
bkTI:bk.oth.d     -0.004       0.003   -1.202     0.229
bkUN:bk.oth.d      0.005       0.003    1.532     0.125
bkVC:bk.oth.d     -0.025       0.003   -8.456     0.000
bkWA:bk.oth.d     -0.002       0.003   -0.675     0.500
bkWH:bk.oth.d     -0.010       0.003   -3.274     0.001
bkWN:bk.oth.d      0.000       0.003    0.042     0.966
bkBD:team.play    -0.253       0.041   -6.174     0.000
bkBF:team.play    -0.186       0.041   -4.561     0.000
bkBW:team.play    -0.047       0.041   -1.151     0.250
bkBX:team.play     0.078       0.041    1.919     0.055
bkBY:team.play     0.089       0.041    2.202     0.028
bkCE:team.play     0.119       0.041    2.926     0.003
bkEE:team.play     0.023       0.041    0.568     0.570
bkFB:team.play     0.043       0.041    1.055     0.291
bkFR:team.play     0.048       0.041    1.170     0.242
bkLD:team.play     0.038       0.041    0.928     0.353
bkMA:team.play    -0.213       0.041   -5.175     0.000
bkPP:team.play    -0.047       0.041   -1.150     0.250
bkRD:team.play     0.020       0.041    0.479     0.632
bkSJ:team.play    -0.200       0.041   -4.898     0.000
bkSK:team.play    -0.045       0.041   -1.105     0.269
bkSO:team.play    -0.155       0.041   -3.786     0.000
bkSX:team.play    -0.047       0.041   -1.149     0.251
bkTI:team.play    -0.038       0.041   -0.928     0.353
bkUN:team.play    -0.088       0.041   -2.162     0.031
bkVC:team.play    -0.102       0.041   -2.509     0.012
bkWA:team.play    -0.031       0.041   -0.771     0.441
bkWH:team.play     0.081       0.041    1.980     0.048
bkWN:team.play    -0.051       0.041   -1.250     0.211
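In the interacted model, a bookmaker's own slope is the main effect plus its interaction term. From the table above, the matchday slope for BD is 0.001 + 0.077 = 0.078 and its team.play slope is 0.106 - 0.253 = -0.147, while the main effects alone describe the omitted baseline bookmaker. The sketch below checks this on simulated data, using a hypothetical regressor x and invented slopes: with a full set of interactions, the summed coefficients reproduce the slope from a regression on that bookmaker's rows alone.

```r
# Simulated data: two bookmakers with different slopes on a made-up regressor x
set.seed(2)
toy <- data.frame(bk = rep(c("B3", "BD"), each = 200), x = rnorm(400))
toy$y <- ifelse(toy$bk == "B3", 0.1, 0.4) * toy$x + rnorm(400, sd = 0.1)

# bk * x expands to bk + x + bk:x
fit <- lm(y ~ bk * x, data = toy)
b <- coef(fit)

slope.B3 <- b[["x"]]                  # baseline bookmaker: main effect only
slope.BD <- b[["x"]] + b[["bkBD:x"]]  # other bookmaker: main effect + interaction

# Same slope as a separate regression on BD's rows
sep.BD <- coef(lm(y ~ x, data = toy[toy$bk == "BD", ]))[["x"]]
c(interacted = slope.BD, separate = sep.BD)
```

The point estimates match exactly, but the standard errors differ, because the pooled model assumes a single error variance across bookmakers.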
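# One row of coefficients and of t-statistics per bookmaker.
# The first row is an NA placeholder that rbind() appends to; hist() ignores it.
all.bk.coefs <- data.frame("(Intercept)"=NA,"matchday"=NA,"bk.oth.d"=NA,"bk.oth.d.1"=NA,
                           "team.play"=NA,"team.win"=NA,"team.lose"=NA)
all.bk.ts <- data.frame("(Intercept)"=NA,"matchday"=NA,"bk.oth.d"=NA,"bk.oth.d.1"=NA,
                        "team.play"=NA,"team.win"=NA,"team.lose"=NA)
for(b in bks) {
  # Same specification as bk.reg, estimated on bookmaker b's rows only
  bk.reg.0 <- lm(bk.d ~ matchday + bk.oth.d + bk.oth.d.1 + team.play + team.win + team.lose,data=long.winner.full[long.winner.full$bk==b,])
  #  print(summary(bk.reg.0))
  # Vectors are bound by position, so the order must match the columns above
  all.bk.coefs <- rbind(all.bk.coefs,coefficients(bk.reg.0))
  all.bk.ts <- rbind(all.bk.ts,summary(bk.reg.0)$coefficients[,3])
}
# Distribution of the matchday coefficient across bookmakers
hist(all.bk.coefs$matchday)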

hist(all.bk.coefs$bk.oth.d)

hist(all.bk.coefs$team.play)
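The per-bookmaker loop above can also be written without a placeholder row by splitting the data by bookmaker and fitting one model per piece. This is a sketch on simulated data with a reduced specification, not the talk's code; the data frame toy and its effect sizes are hypothetical.

```r
# Simulated long-format data: three bookmakers, 100 observations each
set.seed(3)
toy <- data.frame(bk = rep(c("B3", "BD", "BF"), each = 100),
                  matchday = rbinom(300, 1, 0.3))
toy$bk.d <- rbinom(300, 1, 0.1 + 0.2 * toy$matchday)

# Fit the same regression separately for each bookmaker
fits <- lapply(split(toy, toy$bk), function(d) lm(bk.d ~ matchday, data = d))

# Stack coefficients and t-statistics: one row per bookmaker, one column per term
coefs  <- t(sapply(fits, coef))
tstats <- t(sapply(fits, function(f) summary(f)$coefficients[, "t value"]))
round(coefs, 3)
```

Rows are labelled by bookmaker and columns by term, so hist(coefs[, "matchday"]) gives the same kind of plot as above. Note that summary() drops any coefficient that cannot be estimated, in which case the rows would differ in length and would need padding before stacking.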

Conclusions

References

Forrest, D. and Simmons, R. (2000), “Forecasting Sport: The Behaviour and Performance of Football Tipsters”, International Journal of Forecasting, Vol. 16, pp. 317–331.

Goddard, J. and Asimakopoulos, I. (2004), “Forecasting football results and the efficiency of fixed-odds betting”, Journal of Forecasting, Vol. 23, pp. 51–66.

Hyndman, R. and Athanasopoulos, G. (2012), Forecasting: Principles and Practice, www.otexts.com, available at: https://www.otexts.org/book/fpp.

Karlis, D. and Ntzoufras, I. (2003), “Analysis of Sports Data By Using Bivariate Poisson Models”, The Statistician, Vol. 52 No. 3, pp. 381–393.

Shin, H. (1991), “Optimal Betting Odds Against Insider Traders”, The Economic Journal, Vol. 101 No. 408, pp. 1179–1185.

Shin, H. (1992), “Prices of State Contingent Claims with Insider Traders, and the Favourite-Longshot Bias”, The Economic Journal, Vol. 102 No. 411, pp. 426–435.

Shin, H. (1993), “Measuring the Incidence of Insider Trading in a Market for State-Contingent Claims”, The Economic Journal, Vol. 103 No. 420, pp. 1141–1153.

Vaughan Williams, L. and Paton, D. (1997), “Why Is There a Favourite-Longshot Bias in British Racetrack Betting Markets?”, The Economic Journal, Vol. 107 No. 440, pp. 150–158.