This is the 13th in a series of forecasts of football match outcomes, following on from those of recent weeks. The forecasting method is unchanged from previous weeks.
Bookmaker prices are compared with the forecasts, potentially offering betting opportunities should you believe my forecasts to be sufficiently better than those of bookmakers. Naturally, I make no such claim. Moreover, the bookmaker prices were collected at 7pm on Friday, as every week, so they may well have changed since.
The dataset consists of all English matches recorded on http://www.soccerbase.com, going back to 1877 and the very first football matches. In time (i.e., not this week), I will experiment with adjusting the estimation sample size, since it is not necessarily useful to include every match back to 1877 when forecasting matches in 2015. The Elo ranks have been calculated from the very first matches onwards, so historical information is retained, to the extent that it helps determine a team’s current strength, even if the regression model were to omit those early matches.
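To illustrate why early matches still matter, here is a minimal sketch of how Elo ratings are carried forward match by match. It assumes the textbook Elo update with a K-factor of 20, a 400-point logistic scale and no home advantage; the function `elo_update` and the toy `matches` data are hypothetical, and the settings actually used for these forecasts may differ.

```r
# Hypothetical sketch: textbook Elo update (K = 20, 400-point scale, no home advantage).
# result: 1 = home win, 0.5 = draw, 0 = away win
elo_update <- function(r_home, r_away, result, k = 20) {
  # Expected score of the home team under the standard Elo logistic curve
  expected_home <- 1 / (1 + 10^((r_away - r_home) / 400))
  delta <- k * (result - expected_home)
  # Zero-sum: whatever the home team gains, the away team loses
  c(home = r_home + delta, away = r_away - delta)
}

# Toy match history, processed in date order
matches <- data.frame(
  home   = c("A", "B", "A"),
  away   = c("B", "C", "C"),
  result = c(1, 0.5, 0),
  stringsAsFactors = FALSE
)
ratings <- c(A = 1500, B = 1500, C = 1500)  # everyone starts equal

for (i in seq_len(nrow(matches))) {
  h <- matches$home[i]
  a <- matches$away[i]
  updated <- elo_update(ratings[[h]], ratings[[a]], matches$result[i])
  ratings[[h]] <- updated[["home"]]
  ratings[[a]] <- updated[["away"]]
}
ratings
```

Because each update is zero-sum, ratings only redistribute points between teams, and every past result continues to shape the current numbers, however long ago it was played.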
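```r
library(knitr)
library(MASS)  # polr() for the ordered logistic regression
# Author's data-cleaning script; expected to define `dates`, used below
source("/home/readejj/Dropbox/Research/Code/R/betting/clean.data.R")
date.1 <- tail(dates, 1)  # last entry of `dates`
loc0 <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
# Forecasts saved for this date
forecast.matches <- read.csv(paste0("forecasts_", date.1, ".csv"), stringsAsFactors = FALSE)
# Keep rows whose outcome field is not missing
forecast.matches <- forecast.matches[!is.na(forecast.matches$outcome), ]
# Keep matches after 28 May 2015 (string comparison, assuming ISO yyyy-mm-dd dates)
forecast.matches <- forecast.matches[forecast.matches$date > "2015-05-28", ]
#res.eng <- read.csv(paste(loc0,"historical_",date.1,".csv",sep=""))
```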
An ordered logistic regression model is estimated; its output has been reported in each previous week for reference.
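As an illustration of the kind of model involved, the following sketch fits an ordered logit with `polr` from the MASS package to simulated data. The single predictor `elo_diff` (home minus away Elo rating), the simulated data and the outcome ordering away win < draw < home win are assumptions for illustration; the actual regression specification is not reproduced here.

```r
library(MASS)  # polr(): proportional-odds (ordered logit) regression

set.seed(1)
# Simulated data, purely illustrative: a hypothetical Elo difference
# (home minus away) and an ordered result away < draw < home
n <- 2000
elo_diff <- rnorm(n, mean = 0, sd = 100)
latent <- 0.01 * elo_diff + rlogis(n)
outcome <- cut(latent, breaks = c(-Inf, -0.5, 0.5, Inf),
               labels = c("away", "draw", "home"), ordered_result = TRUE)
sim <- data.frame(outcome = outcome, elo_diff = elo_diff)

# Fit the ordered logistic regression
fit <- polr(outcome ~ elo_diff, data = sim, method = "logistic", Hess = TRUE)
summary(fit)

# Forecast probabilities for a match where the home side is 100 Elo points stronger
probs <- predict(fit, newdata = data.frame(elo_diff = 100), type = "probs")
round(probs, 3)
```

With these simulated settings the home win should receive the largest probability; the point is the workflow (fit once, then predict three outcome probabilities per fixture), not the particular numbers.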
All of this is rather overkill for a single forecast: the FA Cup Final between Arsenal and Aston Villa, taking place at Wembley on May 30 (tomorrow as I write this). This match differs a little from the league and other matches forecast previously, in that it must produce a winner on the day. Nonetheless, a draw is possible, and I am not forecasting the outcome of any penalty shoot-out, just the match itself.
```r
simpleplot("English FA Cup")
```
The coloured symbols are forecasts from the ordered logistic regression model: red squares show the probability of a home win, green solid circles the probability of a draw, and blue triangles the probability of an away win. Arsenal are strong favourites, with a 73% probability of victory, compared with only 10% for Aston Villa.
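Under the standard ordered logit parameterisation (the one used by `polr` in MASS), all three plotted probabilities come from a single linear predictor η and two cut-points κ₁ < κ₂. Assuming outcomes are ordered away win, draw, home win, and writing Λ(z) = 1/(1 + e^(−z)) for the logistic distribution function, a sketch of the mapping is:

```latex
\begin{aligned}
P(\text{away win}) &= \Lambda(\kappa_1 - \eta),\\
P(\text{draw})     &= \Lambda(\kappa_2 - \eta) - \Lambda(\kappa_1 - \eta),\\
P(\text{home win}) &= 1 - \Lambda(\kappa_2 - \eta).
\end{aligned}
```

The three probabilities sum to one by construction, and a larger η (a stronger home side) shifts probability away from an away win and towards a home win.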
```r
bkplot(as.Date(date.1)+2,"English FA Cup","fa-cup")
```
The darker, smaller symbols show the range of bookmaker implied probabilities for each outcome. Individual bookmakers are not distinguished; rather than any summary statistic, the full range of prices shown on Oddschecker is reported. A bigger spread of prices might suggest greater uncertainty amongst bookmakers about a particular outcome.
Bookmakers similarly see Arsenal as strong favourites, with implied probabilities (correcting for overround and favourite-longshot bias) of between 60% and 70%. Bookmakers therefore rate Arsenal’s chances slightly lower than my model’s 73%.
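The basic overround correction can be sketched as follows: invert decimal odds to get raw implied probabilities, note that they sum to more than one (the bookmaker’s margin), and rescale each bookmaker’s set to sum to one. The odds below are invented for illustration, not the prices collected for this match, and this simple proportional normalisation does not attempt the favourite-longshot bias adjustment, whose exact form is not given here.

```r
# Illustrative decimal odds (home, draw, away) from three hypothetical bookmakers;
# these are NOT the Oddschecker prices collected for this match.
book_odds <- rbind(
  book1 = c(home = 1.50, draw = 4.20, away = 6.50),
  book2 = c(home = 1.44, draw = 4.50, away = 7.00),
  book3 = c(home = 1.57, draw = 4.00, away = 6.00)
)

raw <- 1 / book_odds                  # naive implied probabilities
overround <- rowSums(raw) - 1         # each bookmaker's margin above a fair book
# Proportional normalisation: divide each row by its own sum
implied <- sweep(raw, 1, rowSums(raw), "/")

# Lowest and highest implied probability per outcome across bookmakers
spread <- apply(implied, 2, range)
round(spread, 3)
```

Reporting the minimum and maximum per outcome, rather than an average, keeps the disagreement between bookmakers visible, which is the kind of spread shown in the plot.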
The overall conclusion is that Arsenal are strong favourites; this naturally does not mean they will win, but it means that they are much more likely than Aston Villa to win.
This is the final forecast for English football in the current 2014–15 season. I plan to continue these forecasts into the new season in August and, in the meantime, to consider whether the method needs amending by evaluating the forecasts already published against bookmaker prices.
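One common way to score probability forecasts of ordered outcomes such as away win, draw and home win is the ranked probability score (RPS): lower is better, and averaging it over matches puts model and bookmaker-implied probabilities on the same footing. This is a sketch of one possible evaluation metric, not necessarily the one that will be used; the function names and example probabilities are hypothetical.

```r
# Ranked probability score for one match.
# probs:   forecast probabilities ordered (away win, draw, home win)
# outcome: index of the observed result (1 = away win, 2 = draw, 3 = home win)
rps <- function(probs, outcome) {
  observed <- as.numeric(seq_along(probs) == outcome)
  cum_diff <- cumsum(probs) - cumsum(observed)
  # The final cumulative difference is always zero, so it is dropped
  sum(cum_diff[-length(probs)]^2) / (length(probs) - 1)
}

# Average RPS over several matches (one row of p per match)
mean_rps <- function(p, y) {
  mean(sapply(seq_along(y), function(i) rps(p[i, ], y[i])))
}

# Hypothetical forecasts for two matches, each row ordered (away, draw, home)
model_probs <- rbind(c(0.20, 0.30, 0.50), c(0.60, 0.25, 0.15))
book_probs  <- rbind(c(0.25, 0.30, 0.45), c(0.50, 0.30, 0.20))
results     <- c(3, 1)  # observed: a home win, then an away win

mean_rps(model_probs, results)
mean_rps(book_probs, results)
```

In this made-up example the first set of forecasts scores lower (better); a real evaluation would of course need the full set of published forecasts and matched bookmaker prices.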