Following on from the forecasts set out in http://rpubs.com/jjreade/forc_300115 on Friday ahead of the weekend’s action, this document is the first of a set of updates on such forecasts, detailing outcomes, and discussing improvements to be implemented.
Firstly, we should consider outcomes; we load up the specific outcomes file for matches forecast over the weekend:
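# Location of the forecast files and the date stamp of this forecast round
loc2 <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
date.1 <- "2015-01-30"
# Outcomes of the forecast matches, and the forecasts themselves
recent.forecast.outcomes <- read.csv(paste0(loc2,"forecast_outcomes_",date.1,".csv"),stringsAsFactors=FALSE)
forecast.matches <- read.csv(paste0(loc2,"forecasts_",date.1,".csv"))
# Keep only matches for which a forecast was made
forecast.matches <- forecast.matches[!is.na(forecast.matches$outcome),]
# Attach forecasts to outcomes; "outcome" becomes outcome.forc (forecast) and outcome.final (result)
forecast.outcomes <- merge(forecast.matches[,c("match_id","outcome")],recent.forecast.outcomes,by="match_id",
suffixes=c(".forc",".final"))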
First, our Premier League forecasts:
prem.matches <- forecast.outcomes[forecast.outcomes$division=="English Premier",]
prem.matches$id <- 1:NROW(prem.matches)
par(mar=c(9,4,4,5)+.1)
plot(prem.matches$id,prem.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
main="Forecasts of Weekend Premier League Matches",
ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(prem.matches$id,prem.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(prem.matches)) {
lines(rep(i,2),c(prem.matches$outcome.forc[i],prem.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=prem.matches$id,labels=paste(prem.matches$team1,prem.matches$team2,sep=" v "),las=2,cex.axis=0.65)
Red circles are outcomes, which are either 0 (away win), 0.5 (draw), or 1 (home win). Blue lines link forecasts to outcomes. Three of the four highest probability forecasts ended up as home wins, yet even such a (relative) “success”, as can be seen, yields what appears to be a reasonably sizeable forecast error. The fourth (relatively) strongly expected home win, Southampton against Swansea, turned out to be an away win, arguably against most expectations.
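The outcome coding described above can be reproduced from the goals columns. The following is a minimal sketch; the function name `match_outcome` is hypothetical, and how the outcome column is actually built upstream is not shown in this document, so this is an assumption about that step rather than a copy of it.

```r
# Hypothetical helper: code a match as 1 (home win), 0.5 (draw) or 0 (away win)
# from the goals scored by the home team (goals1) and the away team (goals2).
match_outcome <- function(goals1, goals2) {
  # sign() gives 1, 0 or -1; shifting and halving maps these to 1, 0.5 and 0.
  # Matches not yet played (NA goals) stay NA.
  (sign(goals1 - goals2) + 1) / 2
}

# Three Premier League results from the weekend:
# Arsenal 5-0 Aston Villa, Chelsea 1-1 Man City, Southampton 0-1 Swansea
match_outcome(c(5, 1, 0), c(0, 1, 1))
```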
Next, our Championship forecasts:
champ.matches <- forecast.outcomes[forecast.outcomes$division=="English Championship",]
champ.matches$id <- 1:NROW(champ.matches)
par(mar=c(9,4,4,5)+.1)
plot(champ.matches$id,champ.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
main="Forecasts of Weekend Championship Matches",
ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(champ.matches$id,champ.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(champ.matches)) {
lines(rep(i,2),c(champ.matches$outcome.forc[i],champ.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=champ.matches$id,labels=paste(champ.matches$team1,champ.matches$team2,sep=" v "),las=2,cex.axis=0.65)
There was a greater range of probabilistic forecasts for the Championship than for the Premier League, with Blackpool and Cardiff only at just above 40% to beat Brighton and Derby, respectively. Blackpool defied those odds to beat Brighton, whilst Derby won at Cardiff, as was expected.
Next, our League One forecasts:
lg1.matches <- forecast.outcomes[forecast.outcomes$division=="English League One",]
lg1.matches$id <- 1:NROW(lg1.matches)
par(mar=c(9,4,4,5)+.1)
plot(lg1.matches$id,lg1.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
main="Forecasts of Weekend League One Matches",
ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(lg1.matches$id,lg1.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(lg1.matches)) {
lines(rep(i,2),c(lg1.matches$outcome.forc[i],lg1.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=lg1.matches$id,labels=paste(lg1.matches$team1,lg1.matches$team2,sep=" v "),las=2,cex.axis=0.65)
Crawley’s win over Preston was the most unexpected result in League One over the weekend. The forecast for Barnsley vs Oldham has not been updated for the weekend’s results; both teams won, which suggests that the forecast is unlikely to be altered much. The least surprising League One result of the weekend is perhaps that between Coventry and Rochdale, which finished as a draw.
Next, our League Two forecasts:
lg2.matches <- forecast.outcomes[forecast.outcomes$division=="English League Two",]
lg2.matches$id <- 1:NROW(lg2.matches)
par(mar=c(9,4,4,5)+.1)
plot(lg2.matches$id,lg2.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
main="Forecasts of Weekend League Two Matches",
ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(lg2.matches$id,lg2.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(lg2.matches)) {
lines(rep(i,2),c(lg2.matches$outcome.forc[i],lg2.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=lg2.matches$id,labels=paste(lg2.matches$team1,lg2.matches$team2,sep=" v "),las=2,cex.axis=0.65)
League Two threw up a number of surprise results, perhaps most notably Hartlepool’s win over Plymouth, but also Oxford’s win at Stevenage and to a lesser extent Tranmere’s win at Exeter.
Next, our Football Conference forecasts:
conf.matches <- forecast.outcomes[forecast.outcomes$division=="Football Conference",]
conf.matches$id <- 1:NROW(conf.matches)
par(mar=c(9,4,4,5)+.1)
plot(conf.matches$id,conf.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
main="Forecasts of Weekend Football Conference Matches",
ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(conf.matches$id,conf.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(conf.matches)) {
lines(rep(i,2),c(conf.matches$outcome.forc[i],conf.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=conf.matches$id,labels=paste(conf.matches$team1,conf.matches$team2,sep=" v "),las=2,cex.axis=0.65)
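The five plotting blocks above differ only in the division name and the title, so they could be wrapped in one function. The sketch below is one way to do that; the function name `plot_division_forecasts` is hypothetical, and the toy data frame only reuses three Premier League rows from the results table so that the example runs on its own.

```r
# Hypothetical helper: plot forecasts against outcomes for one division.
# `data` needs the columns division, team1, team2, outcome.forc, outcome.final.
plot_division_forecasts <- function(data, division, title) {
  matches <- data[data$division == division, ]
  matches$id <- seq_len(nrow(matches))
  old.par <- par(mar = c(9, 4, 4, 5) + 0.1)  # extra bottom margin for labels
  on.exit(par(old.par))                      # restore margins afterwards
  plot(matches$id, matches$outcome.forc, xaxt = "n", xlab = "", ylim = c(0, 1),
       main = title, ylab = "Probability of Outcome")
  abline(h = c(0.5, 0.6, 0.7), lty = c(2, 3, 2))
  # Red circles: outcomes; blue dashed segments: link from forecast to outcome
  points(matches$id, matches$outcome.final, col = 2)
  segments(matches$id, matches$outcome.forc,
           matches$id, matches$outcome.final, lty = 2, col = 4)
  axis(1, at = matches$id,
       labels = paste(matches$team1, matches$team2, sep = " v "),
       las = 2, cex.axis = 0.65)
  invisible(matches)
}

# Toy input: three rows taken from the results table in this document
toy <- data.frame(
  division      = "English Premier",
  team1         = c("Arsenal", "Chelsea", "Southampton"),
  team2         = c("Aston Villa", "Man City", "Swansea"),
  outcome.forc  = c(0.739, 0.627, 0.706),
  outcome.final = c(1, 0.5, 0)
)
plot_division_forecasts(toy, "English Premier",
                        "Forecasts of Weekend Premier League Matches")
```

With the full `forecast.outcomes` data frame, each division would then need a single call with its own division name and title.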
It is also important to evaluate the forecast errors numerically:
forecast.outcomes$error <- forecast.outcomes$outcome.final - forecast.outcomes$outcome.forc
forecast.outcomes$error2 <- forecast.outcomes$error^2
forecast.outcomes$aerror <- abs(forecast.outcomes$error)
summary(forecast.outcomes[forecast.outcomes$tier<=5,c("error","error2","aerror")])
##      error               error2             aerror
##  Min.   :-0.705574   Min.   :0.000075   Min.   :0.008652
##  1st Qu.:-0.491005   1st Qu.:0.052329   1st Qu.:0.228755
##  Median :-0.008652   Median :0.166743   Median :0.408341
##  Mean   :-0.006526   Mean   :0.172350   Mean   :0.372104
##  3rd Qu.: 0.394592   3rd Qu.:0.261508   3rd Qu.:0.511379
##  Max.   : 0.589336   Max.   :0.497834   Max.   :0.705574
##  NA's   :11          NA's   :11         NA's   :11
We can also consider forecast errors by division:
library(knitr)
aggs <- aggregate(forecast.outcomes[forecast.outcomes$tier<=5,c("error","error2","aerror")],
by=list(forecast.outcomes$division[forecast.outcomes$tier<=5]),FUN=mean,na.rm=T)
kable(aggs[c(4,1,2,3,5),])
| | Group.1 | error | error2 | aerror |
|---|---|---|---|---|
| 4 | English Premier | -0.0780767 | 0.1992338 | 0.4146174 |
| 1 | English Championship | -0.0636480 | 0.1705921 | 0.3585123 |
| 2 | English League One | 0.1010544 | 0.1409837 | 0.3329361 |
| 3 | English League Two | 0.0003048 | 0.2034620 | 0.4220871 |
| 5 | Football Conference | -0.0025224 | 0.1373953 | 0.3161291 |
The error column is the mean forecast error, the error2 column is the mean squared forecast error, and the aerror column is the mean absolute forecast error.
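As a check on these definitions, the Premier League row can be recomputed by hand from the ten Premier League forecasts and outcomes listed in the full results table in this document. This is only a sketch: the data frame name `prem` is introduced here for illustration, and because the table reports forecasts rounded to three decimal places the results agree with the row above only up to rounding.

```r
# Premier League forecasts and outcomes, copied from the full results table
# (Chelsea, Liverpool, Hull, C Palace, Man Utd, Stoke, Sunderland, West Brom,
#  Arsenal, Southampton; all at home)
prem <- data.frame(
  outcome.forc  = c(0.627, 0.592, 0.525, 0.604, 0.709,
                    0.693, 0.584, 0.502, 0.739, 0.706),
  outcome.final = c(0.5, 1, 0, 0, 1, 1, 1, 0, 1, 0)
)
prem$error <- prem$outcome.final - prem$outcome.forc

# Mean error, mean squared error and mean absolute error
prem.summary <- c(error  = mean(prem$error),
                  error2 = mean(prem$error^2),
                  aerror = mean(abs(prem$error)))
round(prem.summary, 3)
```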
In terms of simple forecast errors, the summary above suggests there is no obvious bias in forecasts (the mean error, first column, is essentially zero), but that on average errors are larger than might be hoped for; the mean absolute error is 0.37, meaning that on average our forecasts are out by about 37 percentage points. This is not altogether surprising given that the majority of our forecasts lay in the region 40–60%, and naturally many of those matches ended in home or away wins. That the squared errors (error2) and absolute errors (aerror) yield similar information suggests that very few of our forecasts were wildly out (which would be penalised more heavily by a measure of squared errors), but equally this simply reflects the moderate nature of our forecasts at this stage. Forecasts nearer to 80% and 90% would lead to smaller errors for matches that end as expected, but larger ones if matches end in surprise results.
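The trade-off between moderate and strong forecasts can be made concrete with a small table of errors. The sketch below uses illustrative forecast levels (0.5, 0.6, 0.8 and 0.9, chosen here and not taken from the data) and computes the absolute and squared error each would incur under the three possible outcomes.

```r
# Illustrative forecast levels (assumed for this example) and possible outcomes
forecast <- c(0.5, 0.6, 0.8, 0.9)
outcome  <- c(away = 0, draw = 0.5, home = 1)

# Absolute error |outcome - forecast| for every forecast/outcome pair
abs.err <- abs(outer(forecast, outcome, function(f, o) o - f))
dimnames(abs.err) <- list(forecast = as.character(forecast),
                          outcome  = names(outcome))
sq.err <- abs.err^2

abs.err  # rows: forecast level; columns: realised outcome
sq.err
```

A forecast of 0.9 is out by only 0.1 when the home team wins but by 0.9 when it loses, whereas a forecast of 0.5 is out by 0.5 after either a home or an away win.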
The second set of information breaks down errors by division, and gives somewhat conflicting information; based on mean errors, League Two and Football Conference forecasts are the best (their mean errors are closest to zero), but based on squared and absolute errors, League Two forecasts, along with those for the Premier League, are the worst. Although unbiasedness is an important requirement of forecasts, we also hope for forecasts with as small a variance as possible, and the larger squared measures suggest that this has yet to be achieved. The large (relative) positive bias for League One forecasts reflects the relatively large number of home wins in that division, and in particular two unexpected ones (Sheffield United and Crawley).
Finally, we list all the forecasts again with outcomes:
kable(forecast.outcomes[order(forecast.outcomes$date,forecast.outcomes$division),
c("date","division","team1","goals1","goals2","team2",
"outcome.forc","outcome.final","error","error2","aerror")],
digits=3)
| | date | division | team1 | goals1 | goals2 | team2 | outcome.forc | outcome.final | error | error2 | aerror |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 2015-01-30 | English Championship | Bournemouth | 2 | 0 | Watford | 0.628 | 1.0 | 0.372 | 0.138 | 0.372 |
| 59 | 2015-01-31 | Conference North | Boston Utd | 1 | 1 | Stockport | 0.584 | 0.5 | -0.084 | 0.007 | 0.084 |
| 56 | 2015-01-31 | Conference South | Sutton Utd | 2 | 0 | Farnborough | 0.728 | 1.0 | 0.272 | 0.074 | 0.272 |
| 57 | 2015-01-31 | Conference South | Eastbourne | 0 | 1 | Bath City | 0.553 | 0.0 | -0.553 | 0.306 | 0.553 |
| 11 | 2015-01-31 | English Championship | Blackpool | 1 | 0 | Brighton | 0.443 | 1.0 | 0.557 | 0.310 | 0.557 |
| 12 | 2015-01-31 | English Championship | Huddersfield | 1 | 2 | Leeds | 0.597 | 0.0 | -0.597 | 0.356 | 0.597 |
| 13 | 2015-01-31 | English Championship | Nottm Forest | 0 | 1 | Millwall | 0.634 | 0.0 | -0.634 | 0.403 | 0.634 |
| 14 | 2015-01-31 | English Championship | Cardiff | 0 | 2 | Derby | 0.426 | 0.0 | -0.426 | 0.182 | 0.426 |
| 16 | 2015-01-31 | English Championship | Blackburn | 2 | 1 | Fulham | 0.597 | 1.0 | 0.403 | 0.162 | 0.403 |
| 17 | 2015-01-31 | English Championship | Reading | 2 | 0 | Sheff Wed | 0.562 | 1.0 | 0.438 | 0.191 | 0.438 |
| 18 | 2015-01-31 | English Championship | Brentford | 0 | 1 | Middlesbro | 0.491 | 0.0 | -0.491 | 0.241 | 0.491 |
| 19 | 2015-01-31 | English Championship | Charlton | 1 | 1 | Rotherham | 0.565 | 0.5 | -0.065 | 0.004 | 0.065 |
| 20 | 2015-01-31 | English Championship | Ipswich | 0 | 0 | Wigan | 0.729 | 0.5 | -0.229 | 0.052 | 0.229 |
| 21 | 2015-01-31 | English Championship | Birmingham | 0 | 0 | Norwich | 0.509 | 0.5 | -0.009 | 0.000 | 0.009 |
| 22 | 2015-01-31 | English Championship | Bolton | 2 | 2 | Wolves | 0.582 | 0.5 | -0.082 | 0.007 | 0.082 |
| 23 | 2015-01-31 | English League One | Bradford | 1 | 1 | Colchester | 0.685 | 0.5 | -0.185 | 0.034 | 0.185 |
| 24 | 2015-01-31 | English League One | Coventry | 2 | 2 | Rochdale | 0.484 | 0.5 | 0.016 | 0.000 | 0.016 |
| 25 | 2015-01-31 | English League One | Crewe | 0 | 5 | MK Dons | 0.414 | 0.0 | -0.414 | 0.172 | 0.414 |
| 26 | 2015-01-31 | English League One | Sheff Utd | 2 | 0 | Swindon | 0.498 | 1.0 | 0.502 | 0.252 | 0.502 |
| 27 | 2015-01-31 | English League One | Crawley | 2 | 1 | Preston | 0.411 | 1.0 | 0.589 | 0.347 | 0.589 |
| 29 | 2015-01-31 | English League One | Oldham | 3 | 0 | Notts Co | 0.599 | 1.0 | 0.401 | 0.161 | 0.401 |
| 30 | 2015-01-31 | English League One | Chesterfield | 2 | 2 | Doncaster | 0.633 | 0.5 | -0.133 | 0.018 | 0.133 |
| 31 | 2015-01-31 | English League One | Leyton Orient | 1 | 4 | Scunthorpe | 0.533 | 0.0 | -0.533 | 0.284 | 0.533 |
| 33 | 2015-01-31 | English League One | Barnsley | 2 | 1 | Port Vale | 0.589 | 1.0 | 0.411 | 0.169 | 0.411 |
| 34 | 2015-01-31 | English League One | Peterborough | 1 | 0 | Yeovil | 0.609 | 1.0 | 0.391 | 0.153 | 0.391 |
| 35 | 2015-01-31 | English League Two | Southend | 1 | 0 | York | 0.638 | 1.0 | 0.362 | 0.131 | 0.362 |
| 36 | 2015-01-31 | English League Two | Wycombe | 0 | 0 | Portsmouth | 0.697 | 0.5 | -0.197 | 0.039 | 0.197 |
| 37 | 2015-01-31 | English League Two | Dag & Red | 3 | 1 | Cheltenham | 0.605 | 1.0 | 0.395 | 0.156 | 0.395 |
| 38 | 2015-01-31 | English League Two | Exeter | 1 | 2 | Tranmere | 0.576 | 0.0 | -0.576 | 0.332 | 0.576 |
| 39 | 2015-01-31 | English League Two | Burton | 1 | 0 | Bury | 0.643 | 1.0 | 0.357 | 0.127 | 0.357 |
| 40 | 2015-01-31 | English League Two | Carlisle | 2 | 1 | Mansfield | 0.577 | 1.0 | 0.423 | 0.179 | 0.423 |
| 41 | 2015-01-31 | English League Two | Stevenage | 0 | 2 | Oxford | 0.646 | 0.0 | -0.646 | 0.417 | 0.646 |
| 42 | 2015-01-31 | English League Two | Luton | 3 | 2 | Cambridge U | 0.585 | 1.0 | 0.415 | 0.172 | 0.415 |
| 43 | 2015-01-31 | English League Two | Newport Co | 0 | 1 | Shrewsbury | 0.501 | 0.0 | -0.501 | 0.251 | 0.501 |
| 44 | 2015-01-31 | English League Two | Accrington | 1 | 5 | Northampton | 0.541 | 0.0 | -0.541 | 0.293 | 0.541 |
| 45 | 2015-01-31 | English League Two | Morecambe | 1 | 1 | AFC W’bledon | 0.569 | 0.5 | -0.069 | 0.005 | 0.069 |
| 46 | 2015-01-31 | English League Two | Hartlepool | 3 | 2 | Plymouth | 0.417 | 1.0 | 0.583 | 0.339 | 0.583 |
| 3 | 2015-01-31 | English Premier | Chelsea | 1 | 1 | Man City | 0.627 | 0.5 | -0.127 | 0.016 | 0.127 |
| 4 | 2015-01-31 | English Premier | Liverpool | 2 | 0 | West Ham | 0.592 | 1.0 | 0.408 | 0.167 | 0.408 |
| 5 | 2015-01-31 | English Premier | Hull | 0 | 3 | Newcastle | 0.525 | 0.0 | -0.525 | 0.276 | 0.525 |
| 6 | 2015-01-31 | English Premier | C Palace | 0 | 1 | Everton | 0.604 | 0.0 | -0.604 | 0.364 | 0.604 |
| 7 | 2015-01-31 | English Premier | Man Utd | 3 | 1 | Leicester | 0.709 | 1.0 | 0.291 | 0.085 | 0.291 |
| 8 | 2015-01-31 | English Premier | Stoke | 3 | 1 | QPR | 0.693 | 1.0 | 0.307 | 0.094 | 0.307 |
| 9 | 2015-01-31 | English Premier | Sunderland | 2 | 0 | Burnley | 0.584 | 1.0 | 0.416 | 0.173 | 0.416 |
| 10 | 2015-01-31 | English Premier | West Brom | 0 | 3 | Tottenham | 0.502 | 0.0 | -0.502 | 0.252 | 0.502 |
| 62 | 2015-01-31 | Evo-Stik S Premier | Weymouth | 3 | 0 | Histon | 0.681 | 1.0 | 0.319 | 0.102 | 0.319 |
| 48 | 2015-01-31 | Football Conference | Dartford | 2 | 2 | Bristol R | 0.404 | 0.5 | 0.096 | 0.009 | 0.096 |
| 49 | 2015-01-31 | Football Conference | Wrexham | 0 | 0 | Torquay | 0.549 | 0.5 | -0.049 | 0.002 | 0.049 |
| 50 | 2015-01-31 | Football Conference | Forest Green | 1 | 0 | Nuneaton | 0.784 | 1.0 | 0.216 | 0.047 | 0.216 |
| 51 | 2015-01-31 | Football Conference | Lincoln | 1 | 0 | Dover | 0.494 | 1.0 | 0.506 | 0.256 | 0.506 |
| 53 | 2015-01-31 | Football Conference | Woking | 3 | 0 | Alfreton | 0.721 | 1.0 | 0.279 | 0.078 | 0.279 |
| 54 | 2015-01-31 | Football Conference | Welling | 1 | 3 | Chester | 0.555 | 0.0 | -0.555 | 0.308 | 0.555 |
| 55 | 2015-01-31 | Football Conference | Southport | 0 | 1 | Gateshead | 0.511 | 0.0 | -0.511 | 0.262 | 0.511 |
| 61 | 2015-01-31 | Ryman Premier | Kingstonian | 0 | 1 | Maidstone | 0.511 | 0.0 | -0.511 | 0.261 | 0.511 |
| 28 | 2015-02-01 | English League One | Walsall | 1 | 1 | Gillingham | 0.625 | 0.5 | -0.125 | 0.016 | 0.125 |
| 32 | 2015-02-01 | English League One | Bristol C | 2 | 0 | Fleetwood | 0.706 | 1.0 | 0.294 | 0.087 | 0.294 |
| 1 | 2015-02-01 | English Premier | Arsenal | 5 | 0 | Aston Villa | 0.739 | 1.0 | 0.261 | 0.068 | 0.261 |
| 2 | 2015-02-01 | English Premier | Southampton | 0 | 1 | Swansea | 0.706 | 0.0 | -0.706 | 0.498 | 0.706 |
| 58 | 2015-02-03 | Conference North | Stockport | NA | NA | Barrow | 0.592 | NA | NA | NA | NA |
| 65 | 2015-02-03 | English FA Cup | Fulham | NA | NA | Sunderland | 0.467 | NA | NA | NA | NA |
| 66 | 2015-02-03 | English FA Cup | Sheff Utd | NA | NA | Preston | 0.575 | NA | NA | NA | NA |
| 67 | 2015-02-03 | English FA Cup | Man Utd | NA | NA | Cambridge U | 0.857 | NA | NA | NA | NA |
| 63 | 2015-02-03 | English League One | Barnsley | NA | NA | Oldham | 0.574 | NA | NA | NA | NA |
| 69 | 2015-02-03 | FA Trophy | Halifax | NA | NA | Dartford | 0.724 | NA | NA | NA | NA |
| 70 | 2015-02-03 | FA Trophy | Gateshead | NA | NA | Wrexham | 0.645 | NA | NA | NA | NA |
| 71 | 2015-02-03 | FA Trophy | Ebbsfleet | NA | NA | Braintree | 0.418 | NA | NA | NA | NA |
| 47 | 2015-02-03 | Football Conference | Dover | NA | NA | Grimsby | 0.583 | NA | NA | NA | NA |
| 52 | 2015-02-03 | Football Conference | Alfreton | NA | NA | Lincoln | 0.495 | NA | NA | NA | NA |
| 64 | 2015-02-03 | Football Conference | Wrexham | NA | NA | Forest Green | 0.513 | NA | NA | NA | NA |
| 68 | 2015-02-04 | English FA Cup | Bolton | NA | NA | Liverpool | 0.392 | NA | NA | NA | NA |
| 60 | 2015-02-04 | Ryman Premier | Lewes | NA | NA | Canvey Isl. | 0.532 | NA | NA | NA | NA |
One improvement is to estimate using different subsamples of the historical dataset, to determine whether that has any impact. If the data are stationary, then subsample estimation should make little difference, but if there are non-stationarities then it might. Football match outcomes from more than 100 years ago are unlikely to be particularly relevant for outcomes today. The Elo scores calculated reflect all matches since 1877, and hence restricting the estimation sample would not throw away all of the older information.
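A subsample comparison of this kind might look like the sketch below. Everything in it is an assumption made for illustration: the data are simulated and not the historical dataset, the column names `season` and `elo_diff` are hypothetical, the shift in home advantage after 1990 is invented to create a non-stationarity, and a linear regression of the outcome on the Elo difference stands in for whatever forecasting model is actually used.

```r
set.seed(42)
n <- 4000
# Simulated (not real) matches: a season label and a standardised Elo difference
sim <- data.frame(season   = sample(1900:2014, n, replace = TRUE),
                  elo_diff = rnorm(n))

# Invented non-stationarity: home advantage is weaker from 1990 onwards
home.adv <- ifelse(sim$season < 1990, 0.6, 0.1)
latent   <- home.adv + 0.5 * sim$elo_diff + rlogis(n)
# Outcome coded 0 (away win), 0.5 (draw), 1 (home win)
sim$outcome <- ifelse(latent > 0.5, 1, ifelse(latent < -0.5, 0, 0.5))

# Same model, full sample versus recent subsample
fit.full   <- lm(outcome ~ elo_diff, data = sim)
fit.recent <- lm(outcome ~ elo_diff, data = sim[sim$season >= 1990, ])

# Coefficients side by side; a marked gap points to non-stationarity
rbind(full = coef(fit.full), recent = coef(fit.recent))
```

In this simulated case the intercept (the expected outcome for evenly matched teams) is lower in the recent subsample, which is the kind of difference the comparison is designed to reveal.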
A further improvement is to estimate using an alternative statistical model, as this may help expand the range of forecast probabilities. The small range may be a result of teams, particularly in the Premier League, not being particularly different from each other relative to other divisions, and is something to be investigated; an ordered logit/probit model would allow a better delineation between forecasting a home win, draw or away win, via the cut-offs that are estimated.
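An ordered logit of this kind can be fitted in R with `polr()` from the MASS package. The sketch below runs on simulated data only: the variable `elo_diff`, the slope of 0.5 and the cut-offs of -0.6 and 0.4 are all assumptions chosen for the example, not estimates from the football dataset.

```r
library(MASS)  # provides polr(); MASS ships with standard R installations

set.seed(2015)
n <- 2000
# Simulated (not real) data: Elo difference, home minus away, in hundreds of points
elo_diff <- rnorm(n)
# Latent match "margin": assumed slope 0.5 plus logistic noise
latent <- 0.5 * elo_diff + rlogis(n)
# Assumed cut-offs: below -0.6 is an away win, above 0.4 a home win, else a draw
result <- cut(latent, breaks = c(-Inf, -0.6, 0.4, Inf),
              labels = c("away", "draw", "home"), ordered_result = TRUE)
sim <- data.frame(result = result, elo_diff = elo_diff)

# Ordered logit; method = "probit" would give the ordered probit instead
fit <- polr(result ~ elo_diff, data = sim, method = "logistic", Hess = TRUE)
fit$zeta  # estimated cut-offs between away|draw and draw|home

# Separate probabilities for away win, draw and home win for three matches
new.matches <- data.frame(elo_diff = c(-1, 0, 1))
probs <- predict(fit, newdata = new.matches, type = "probs")
# Expected outcome on the 0 / 0.5 / 1 scale used throughout this document
expected.outcome <- probs[, "home"] + 0.5 * probs[, "draw"]
cbind(new.matches, round(probs, 3), expected.outcome = round(expected.outcome, 3))
```

Unlike a single expected outcome, the fitted model returns a separate probability for each of the three results, and these can still be collapsed to the 0 to 1 scale for comparison with the current forecasts.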