This document follows up on the forecasts set out on Friday at http://rpubs.com/jjreade/forc_300115, ahead of the weekend’s action. It is the first of a set of updates on those forecasts, detailing outcomes and discussing improvements to be implemented.

Outcomes

Firstly, we should consider outcomes; we load up the specific outcomes file for matches forecast over the weekend:

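# Folder holding the forecast files, and the date on which the forecasts were made
loc2 <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
date.1 <- "2015-01-30"
# Realised outcomes for the matches that were forecast
recent.forecast.outcomes <- read.csv(paste0(loc2,"forecast_outcomes_",date.1,".csv"),stringsAsFactors=FALSE)
# The forecasts themselves; keep only matches for which a forecast exists
forecast.matches <- read.csv(paste0(loc2,"forecasts_",date.1,".csv"))
forecast.matches <- forecast.matches[!is.na(forecast.matches$outcome),]
# Join forecasts to outcomes: "outcome" becomes outcome.forc (forecast) and outcome.final (result)
forecast.outcomes <- merge(forecast.matches[,c("match_id","outcome")],recent.forecast.outcomes,by=c("match_id"),
                           suffixes=c(".forc",".final"))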

First, our Premier League forecasts:

prem.matches <- forecast.outcomes[forecast.outcomes$division=="English Premier",]
prem.matches$id <- 1:NROW(prem.matches)
par(mar=c(9,4,4,5)+.1)
plot(prem.matches$id,prem.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend Premier League Matches",
     ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(prem.matches$id,prem.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(prem.matches)) {
  lines(rep(i,2),c(prem.matches$outcome.forc[i],prem.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=prem.matches$id,labels=paste(prem.matches$team1,prem.matches$team2,sep=" v "),las=2,cex.axis=0.65)

Red circles are outcomes, coded as 0 (away win), 0.5 (draw) or 1 (home win); black circles are forecasts, and the blue dashed lines link each forecast to its outcome. Three of the four highest-probability forecasts (Arsenal, Manchester United and Stoke) ended up as home wins, yet even such a (relative) “success” yields a reasonably sizeable forecast error, since a forecast of around 70% for a match that ends in a home win still leaves an error of about 0.3. The fourth (relatively) strongly expected home win, Southampton against Swansea, turned out to be an away win, arguably against most expectations.
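
The outcome coding used in these plots can be reproduced from the final score. The following is a minimal sketch, assuming home and away goals are held in columns named goals1 and goals2 (as in the table of results later in this document); the helper name code_outcome is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: code a result as 1 (home win), 0.5 (draw) or 0 (away win).
# Assumes goals1 = home goals and goals2 = away goals; gives NA for unplayed matches.
code_outcome <- function(goals1, goals2) {
  ifelse(goals1 > goals2, 1, ifelse(goals1 == goals2, 0.5, 0))
}

# Three weekend results (Arsenal 5-0 Aston Villa, Chelsea 1-1 Man City,
# Southampton 0-1 Swansea) plus one match that has not yet been played.
code_outcome(c(5, 1, 0, NA), c(0, 1, 1, NA))
```

The forecast error for a match is then this coded outcome minus the forecast probability.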

Next, our Championship forecasts:

champ.matches <- forecast.outcomes[forecast.outcomes$division=="English Championship",]
champ.matches$id <- 1:NROW(champ.matches)
par(mar=c(9,4,4,5)+.1)
plot(champ.matches$id,champ.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend Championship Matches",
     ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(champ.matches$id,champ.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(champ.matches)) {
  lines(rep(i,2),c(champ.matches$outcome.forc[i],champ.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=champ.matches$id,labels=paste(champ.matches$team1,champ.matches$team2,sep=" v "),las=2,cex.axis=0.65)

There was a greater range of probabilistic forecasts for the Championship than for the Premier League, with Blackpool and Cardiff given only just above 40% to beat Brighton and Derby, respectively. Blackpool defied those odds to beat Brighton, whilst Derby won at Cardiff, as was expected.

Next, our League One forecasts:

lg1.matches <- forecast.outcomes[forecast.outcomes$division=="English League One",]
lg1.matches$id <- 1:NROW(lg1.matches)
par(mar=c(9,4,4,5)+.1)
plot(lg1.matches$id,lg1.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend League One Matches",
     ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(lg1.matches$id,lg1.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(lg1.matches)) {
  lines(rep(i,2),c(lg1.matches$outcome.forc[i],lg1.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=lg1.matches$id,labels=paste(lg1.matches$team1,lg1.matches$team2,sep=" v "),las=2,cex.axis=0.65)

Crawley’s win over Preston was the most unexpected result in League One over the weekend. The forecast of Barnsley vs Oldham has not been updated for the weekend’s results; both teams won, which suggests that the forecast is unlikely to be altered much. The least surprising League One result of the weekend is perhaps Coventry against Rochdale, which finished as a draw.

Next, our League Two forecasts:

lg2.matches <- forecast.outcomes[forecast.outcomes$division=="English League Two",]
lg2.matches$id <- 1:NROW(lg2.matches)
par(mar=c(9,4,4,5)+.1)
plot(lg2.matches$id,lg2.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend League Two Matches",
     ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(lg2.matches$id,lg2.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(lg2.matches)) {
  lines(rep(i,2),c(lg2.matches$outcome.forc[i],lg2.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=lg2.matches$id,labels=paste(lg2.matches$team1,lg2.matches$team2,sep=" v "),las=2,cex.axis=0.65)

League Two threw up a number of surprise results, perhaps most notably Hartlepool’s win over Plymouth, but also Oxford’s win at Stevenage and to a lesser extent Tranmere’s win at Exeter.

Next, our Football Conference forecasts:

conf.matches <- forecast.outcomes[forecast.outcomes$division=="Football Conference",]
conf.matches$id <- 1:NROW(conf.matches)
par(mar=c(9,4,4,5)+.1)
plot(conf.matches$id,conf.matches$outcome.forc,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend Football Conference Matches",
     ylab="Probability of Outcome")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
lines(conf.matches$id,conf.matches$outcome.final,col=2,type="p")
for(i in 1:NROW(conf.matches)) {
  lines(rep(i,2),c(conf.matches$outcome.forc[i],conf.matches$outcome.final[i]),type="l",lty=2,col=4)
}
axis(1,at=conf.matches$id,labels=paste(conf.matches$team1,conf.matches$team2,sep=" v "),las=2,cex.axis=0.65)
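
The five plotting blocks above differ only in the division selected and the plot title, so they could be wrapped in a single function. This is a sketch rather than the code used to produce the plots; the function name plot_division and its arguments are hypothetical, and it assumes a data frame with the columns used above (division, team1, team2, outcome.forc, outcome.final).

```r
# Hypothetical helper: plot forecasts against outcomes for one division.
plot_division <- function(outcomes, division, label = division) {
  m <- outcomes[outcomes$division == division, ]
  m$id <- seq_len(nrow(m))
  old.par <- par(mar = c(9, 4, 4, 5) + 0.1)   # wide bottom margin for match labels
  on.exit(par(old.par))                       # restore margins afterwards
  plot(m$id, m$outcome.forc, xaxt = "n", xlab = "", ylim = c(0, 1),
       main = paste("Forecasts of Weekend", label, "Matches"),
       ylab = "Probability of Outcome")
  abline(h = c(0.5, 0.6, 0.7), lty = c(2, 3, 2))   # reference lines
  points(m$id, m$outcome.final, col = 2)           # outcomes in red
  # Blue dashed line from each forecast to its outcome
  segments(m$id, m$outcome.forc, m$id, m$outcome.final, lty = 2, col = 4)
  axis(1, at = m$id, labels = paste(m$team1, m$team2, sep = " v "),
       las = 2, cex.axis = 0.65)
  invisible(m)   # return the plotted subset for inspection
}

# Example call, assuming forecast.outcomes has been built as above:
# plot_division(forecast.outcomes, "English Premier", "Premier League")
```

Using segments() in place of the for loop draws all forecast-to-outcome lines in one vectorised call.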

Tabular Version

It is also important to evaluate the forecast errors numerically.

forecast.outcomes$error <- forecast.outcomes$outcome.final - forecast.outcomes$outcome.forc
forecast.outcomes$error2 <- forecast.outcomes$error^2
forecast.outcomes$aerror <- abs(forecast.outcomes$error)
summary(forecast.outcomes[forecast.outcomes$tier<=5,c("error","error2","aerror")])
##      error               error2             aerror        
##  Min.   :-0.705574   Min.   :0.000075   Min.   :0.008652  
##  1st Qu.:-0.491005   1st Qu.:0.052329   1st Qu.:0.228755  
##  Median :-0.008652   Median :0.166743   Median :0.408341  
##  Mean   :-0.006526   Mean   :0.172350   Mean   :0.372104  
##  3rd Qu.: 0.394592   3rd Qu.:0.261508   3rd Qu.:0.511379  
##  Max.   : 0.589336   Max.   :0.497834   Max.   :0.705574  
##  NA's   :11          NA's   :11         NA's   :11

We can also consider forecast errors by division:

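library(knitr)
# Mean of each error measure by division, for the top five tiers only
aggs <- aggregate(forecast.outcomes[forecast.outcomes$tier<=5,c("error","error2","aerror")],
          by=list(forecast.outcomes$division[forecast.outcomes$tier<=5]),FUN=mean,na.rm=TRUE)
# Reorder rows from alphabetical order into league-pyramid order
kable(aggs[c(4,1,2,3,5),])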
|   | Group.1 | error | error2 | aerror |
|---|---|---|---|---|
| 4 | English Premier | -0.0780767 | 0.1992338 | 0.4146174 |
| 1 | English Championship | -0.0636480 | 0.1705921 | 0.3585123 |
| 2 | English League One | 0.1010544 | 0.1409837 | 0.3329361 |
| 3 | English League Two | 0.0003048 | 0.2034620 | 0.4220871 |
| 5 | Football Conference | -0.0025224 | 0.1373953 | 0.3161291 |

The error column is the mean forecast error, the error2 column is the mean squared forecast error, and the aerror column is the mean absolute forecast error.
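
As a check on how these three measures are built, they can be recomputed by hand for a single division. The sketch below uses the ten Premier League forecasts and outcomes as printed in the results table at the end of this document; because those forecasts are rounded to three decimals, the results should match the Premier League row above only approximately.

```r
# Premier League forecasts and outcomes, copied from the results table
# (forecasts are rounded to three decimals there).
forc  <- c(0.627, 0.592, 0.525, 0.604, 0.709, 0.693, 0.584, 0.502, 0.739, 0.706)
final <- c(0.5, 1, 0, 0, 1, 1, 1, 0, 1, 0)

error <- final - forc                     # outcome minus forecast
measures <- c(error  = mean(error),       # mean forecast error (bias)
              error2 = mean(error^2),     # mean squared forecast error
              aerror = mean(abs(error)))  # mean absolute forecast error
round(measures, 3)
```

Positive and negative errors cancel in the first measure but not in the other two, which is why a division can have a mean error near zero and still have large squared and absolute errors.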

In terms of simple forecast errors, the summary above suggests there is no obvious bias in the forecasts (the mean error, first column, is essentially zero at -0.007), but that errors are on average larger than might be hoped for: the mean absolute error is 0.37, so on the 0 to 1 outcome scale our forecasts are out by about 37 percentage points on average. This is not altogether surprising given that the majority of our forecasts lay in the region 40–60%, and naturally many of those matches ended in home or away wins. That the squared errors (error2) and absolute errors (aerror) yield similar information suggests that very few of our forecasts were wildly out (which a squared-error measure would penalise heavily), but equally this simply reflects the moderate nature of our forecasts at this stage. Forecasts nearer to 80% and 90% would lead to smaller errors for matches that end as expected, but larger ones for matches that end in surprise results.
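
The trade-off between moderate and confident forecasts can be made concrete with a small illustrative calculation (the forecast values below are chosen for illustration and are not taken from the data). For a forecast p of a home win, the error is 1 - p if the home team wins and -p if the away team wins.

```r
# Squared and absolute errors for a home-win forecast p, by eventual result.
p <- c(0.5, 0.6, 0.7, 0.8, 0.9)
errors <- data.frame(
  forecast        = p,
  sq.if.home.win  = (1 - p)^2,   # match ends as expected
  sq.if.away.win  = (0 - p)^2,   # surprise result
  abs.if.home.win = abs(1 - p),
  abs.if.away.win = abs(0 - p)
)
errors
```

A 60% forecast gives a squared error of 0.16 if the home team wins and 0.36 if it loses, whereas a 90% forecast gives 0.01 and 0.81 respectively: confident forecasts are rewarded more when right and penalised far more when wrong.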

The second set of information breaks down errors by division, and gives somewhat conflicting information. Based on mean errors, the League Two and Football Conference forecasts look best, with mean errors very close to zero; yet based on squared and absolute errors, League Two (along with the Premier League) has the worst forecasts. Although unbiasedness is an important requirement of forecasts, we also hope for forecast errors with as small a variance as possible, and the larger squared measures suggest that this has yet to be achieved. The relatively large positive bias for League One forecasts reflects the relatively large number of home wins in that division, and in particular two unexpected ones (Sheffield United and Crawley).
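
The link between bias, variance and the squared-error measure can be illustrated with the standard decomposition: mean squared error equals squared bias plus the (population) variance of the errors. The sketch below applies it to the twelve League One forecasts as printed in the results table at the end of this document; the figures are rounded there, so the values are approximate.

```r
# League One forecasts and outcomes, copied from the results table (rounded).
forc  <- c(0.685, 0.484, 0.414, 0.498, 0.411, 0.599,
           0.633, 0.533, 0.589, 0.609, 0.625, 0.706)
final <- c(0.5, 0.5, 0, 1, 1, 1, 0.5, 0, 1, 1, 0.5, 1)
error <- final - forc

bias     <- mean(error)              # mean forecast error
variance <- mean((error - bias)^2)   # population variance of the errors
mse      <- mean(error^2)            # mean squared forecast error

# mse equals bias^2 + variance (up to floating-point rounding)
c(bias = bias, bias.sq = bias^2, variance = variance, mse = mse)
```

Even for League One, the division with the largest bias, the squared bias is only about 0.01 of a mean squared error of about 0.14, so most of the squared error comes from the spread of the errors rather than from bias.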

Finally, we list all the forecasts again with outcomes:

kable(forecast.outcomes[order(forecast.outcomes$date,forecast.outcomes$division),
                       c("date","division","team1","goals1","goals2","team2",
                         "outcome.forc","outcome.final","error","error2","aerror")],
      digits=3)
|   | date | division | team1 | goals1 | goals2 | team2 | outcome.forc | outcome.final | error | error2 | aerror |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 2015-01-30 | English Championship | Bournemouth | 2 | 0 | Watford | 0.628 | 1.0 | 0.372 | 0.138 | 0.372 |
| 59 | 2015-01-31 | Conference North | Boston Utd | 1 | 1 | Stockport | 0.584 | 0.5 | -0.084 | 0.007 | 0.084 |
| 56 | 2015-01-31 | Conference South | Sutton Utd | 2 | 0 | Farnborough | 0.728 | 1.0 | 0.272 | 0.074 | 0.272 |
| 57 | 2015-01-31 | Conference South | Eastbourne | 0 | 1 | Bath City | 0.553 | 0.0 | -0.553 | 0.306 | 0.553 |
| 11 | 2015-01-31 | English Championship | Blackpool | 1 | 0 | Brighton | 0.443 | 1.0 | 0.557 | 0.310 | 0.557 |
| 12 | 2015-01-31 | English Championship | Huddersfield | 1 | 2 | Leeds | 0.597 | 0.0 | -0.597 | 0.356 | 0.597 |
| 13 | 2015-01-31 | English Championship | Nottm Forest | 0 | 1 | Millwall | 0.634 | 0.0 | -0.634 | 0.403 | 0.634 |
| 14 | 2015-01-31 | English Championship | Cardiff | 0 | 2 | Derby | 0.426 | 0.0 | -0.426 | 0.182 | 0.426 |
| 16 | 2015-01-31 | English Championship | Blackburn | 2 | 1 | Fulham | 0.597 | 1.0 | 0.403 | 0.162 | 0.403 |
| 17 | 2015-01-31 | English Championship | Reading | 2 | 0 | Sheff Wed | 0.562 | 1.0 | 0.438 | 0.191 | 0.438 |
| 18 | 2015-01-31 | English Championship | Brentford | 0 | 1 | Middlesbro | 0.491 | 0.0 | -0.491 | 0.241 | 0.491 |
| 19 | 2015-01-31 | English Championship | Charlton | 1 | 1 | Rotherham | 0.565 | 0.5 | -0.065 | 0.004 | 0.065 |
| 20 | 2015-01-31 | English Championship | Ipswich | 0 | 0 | Wigan | 0.729 | 0.5 | -0.229 | 0.052 | 0.229 |
| 21 | 2015-01-31 | English Championship | Birmingham | 0 | 0 | Norwich | 0.509 | 0.5 | -0.009 | 0.000 | 0.009 |
| 22 | 2015-01-31 | English Championship | Bolton | 2 | 2 | Wolves | 0.582 | 0.5 | -0.082 | 0.007 | 0.082 |
| 23 | 2015-01-31 | English League One | Bradford | 1 | 1 | Colchester | 0.685 | 0.5 | -0.185 | 0.034 | 0.185 |
| 24 | 2015-01-31 | English League One | Coventry | 2 | 2 | Rochdale | 0.484 | 0.5 | 0.016 | 0.000 | 0.016 |
| 25 | 2015-01-31 | English League One | Crewe | 0 | 5 | MK Dons | 0.414 | 0.0 | -0.414 | 0.172 | 0.414 |
| 26 | 2015-01-31 | English League One | Sheff Utd | 2 | 0 | Swindon | 0.498 | 1.0 | 0.502 | 0.252 | 0.502 |
| 27 | 2015-01-31 | English League One | Crawley | 2 | 1 | Preston | 0.411 | 1.0 | 0.589 | 0.347 | 0.589 |
| 29 | 2015-01-31 | English League One | Oldham | 3 | 0 | Notts Co | 0.599 | 1.0 | 0.401 | 0.161 | 0.401 |
| 30 | 2015-01-31 | English League One | Chesterfield | 2 | 2 | Doncaster | 0.633 | 0.5 | -0.133 | 0.018 | 0.133 |
| 31 | 2015-01-31 | English League One | Leyton Orient | 1 | 4 | Scunthorpe | 0.533 | 0.0 | -0.533 | 0.284 | 0.533 |
| 33 | 2015-01-31 | English League One | Barnsley | 2 | 1 | Port Vale | 0.589 | 1.0 | 0.411 | 0.169 | 0.411 |
| 34 | 2015-01-31 | English League One | Peterborough | 1 | 0 | Yeovil | 0.609 | 1.0 | 0.391 | 0.153 | 0.391 |
| 35 | 2015-01-31 | English League Two | Southend | 1 | 0 | York | 0.638 | 1.0 | 0.362 | 0.131 | 0.362 |
| 36 | 2015-01-31 | English League Two | Wycombe | 0 | 0 | Portsmouth | 0.697 | 0.5 | -0.197 | 0.039 | 0.197 |
| 37 | 2015-01-31 | English League Two | Dag & Red | 3 | 1 | Cheltenham | 0.605 | 1.0 | 0.395 | 0.156 | 0.395 |
| 38 | 2015-01-31 | English League Two | Exeter | 1 | 2 | Tranmere | 0.576 | 0.0 | -0.576 | 0.332 | 0.576 |
| 39 | 2015-01-31 | English League Two | Burton | 1 | 0 | Bury | 0.643 | 1.0 | 0.357 | 0.127 | 0.357 |
| 40 | 2015-01-31 | English League Two | Carlisle | 2 | 1 | Mansfield | 0.577 | 1.0 | 0.423 | 0.179 | 0.423 |
| 41 | 2015-01-31 | English League Two | Stevenage | 0 | 2 | Oxford | 0.646 | 0.0 | -0.646 | 0.417 | 0.646 |
| 42 | 2015-01-31 | English League Two | Luton | 3 | 2 | Cambridge U | 0.585 | 1.0 | 0.415 | 0.172 | 0.415 |
| 43 | 2015-01-31 | English League Two | Newport Co | 0 | 1 | Shrewsbury | 0.501 | 0.0 | -0.501 | 0.251 | 0.501 |
| 44 | 2015-01-31 | English League Two | Accrington | 1 | 5 | Northampton | 0.541 | 0.0 | -0.541 | 0.293 | 0.541 |
| 45 | 2015-01-31 | English League Two | Morecambe | 1 | 1 | AFC W’bledon | 0.569 | 0.5 | -0.069 | 0.005 | 0.069 |
| 46 | 2015-01-31 | English League Two | Hartlepool | 3 | 2 | Plymouth | 0.417 | 1.0 | 0.583 | 0.339 | 0.583 |
| 3 | 2015-01-31 | English Premier | Chelsea | 1 | 1 | Man City | 0.627 | 0.5 | -0.127 | 0.016 | 0.127 |
| 4 | 2015-01-31 | English Premier | Liverpool | 2 | 0 | West Ham | 0.592 | 1.0 | 0.408 | 0.167 | 0.408 |
| 5 | 2015-01-31 | English Premier | Hull | 0 | 3 | Newcastle | 0.525 | 0.0 | -0.525 | 0.276 | 0.525 |
| 6 | 2015-01-31 | English Premier | C Palace | 0 | 1 | Everton | 0.604 | 0.0 | -0.604 | 0.364 | 0.604 |
| 7 | 2015-01-31 | English Premier | Man Utd | 3 | 1 | Leicester | 0.709 | 1.0 | 0.291 | 0.085 | 0.291 |
| 8 | 2015-01-31 | English Premier | Stoke | 3 | 1 | QPR | 0.693 | 1.0 | 0.307 | 0.094 | 0.307 |
| 9 | 2015-01-31 | English Premier | Sunderland | 2 | 0 | Burnley | 0.584 | 1.0 | 0.416 | 0.173 | 0.416 |
| 10 | 2015-01-31 | English Premier | West Brom | 0 | 3 | Tottenham | 0.502 | 0.0 | -0.502 | 0.252 | 0.502 |
| 62 | 2015-01-31 | Evo-Stik S Premier | Weymouth | 3 | 0 | Histon | 0.681 | 1.0 | 0.319 | 0.102 | 0.319 |
| 48 | 2015-01-31 | Football Conference | Dartford | 2 | 2 | Bristol R | 0.404 | 0.5 | 0.096 | 0.009 | 0.096 |
| 49 | 2015-01-31 | Football Conference | Wrexham | 0 | 0 | Torquay | 0.549 | 0.5 | -0.049 | 0.002 | 0.049 |
| 50 | 2015-01-31 | Football Conference | Forest Green | 1 | 0 | Nuneaton | 0.784 | 1.0 | 0.216 | 0.047 | 0.216 |
| 51 | 2015-01-31 | Football Conference | Lincoln | 1 | 0 | Dover | 0.494 | 1.0 | 0.506 | 0.256 | 0.506 |
| 53 | 2015-01-31 | Football Conference | Woking | 3 | 0 | Alfreton | 0.721 | 1.0 | 0.279 | 0.078 | 0.279 |
| 54 | 2015-01-31 | Football Conference | Welling | 1 | 3 | Chester | 0.555 | 0.0 | -0.555 | 0.308 | 0.555 |
| 55 | 2015-01-31 | Football Conference | Southport | 0 | 1 | Gateshead | 0.511 | 0.0 | -0.511 | 0.262 | 0.511 |
| 61 | 2015-01-31 | Ryman Premier | Kingstonian | 0 | 1 | Maidstone | 0.511 | 0.0 | -0.511 | 0.261 | 0.511 |
| 28 | 2015-02-01 | English League One | Walsall | 1 | 1 | Gillingham | 0.625 | 0.5 | -0.125 | 0.016 | 0.125 |
| 32 | 2015-02-01 | English League One | Bristol C | 2 | 0 | Fleetwood | 0.706 | 1.0 | 0.294 | 0.087 | 0.294 |
| 1 | 2015-02-01 | English Premier | Arsenal | 5 | 0 | Aston Villa | 0.739 | 1.0 | 0.261 | 0.068 | 0.261 |
| 2 | 2015-02-01 | English Premier | Southampton | 0 | 1 | Swansea | 0.706 | 0.0 | -0.706 | 0.498 | 0.706 |
| 58 | 2015-02-03 | Conference North | Stockport | NA | NA | Barrow | 0.592 | NA | NA | NA | NA |
| 65 | 2015-02-03 | English FA Cup | Fulham | NA | NA | Sunderland | 0.467 | NA | NA | NA | NA |
| 66 | 2015-02-03 | English FA Cup | Sheff Utd | NA | NA | Preston | 0.575 | NA | NA | NA | NA |
| 67 | 2015-02-03 | English FA Cup | Man Utd | NA | NA | Cambridge U | 0.857 | NA | NA | NA | NA |
| 63 | 2015-02-03 | English League One | Barnsley | NA | NA | Oldham | 0.574 | NA | NA | NA | NA |
| 69 | 2015-02-03 | FA Trophy | Halifax | NA | NA | Dartford | 0.724 | NA | NA | NA | NA |
| 70 | 2015-02-03 | FA Trophy | Gateshead | NA | NA | Wrexham | 0.645 | NA | NA | NA | NA |
| 71 | 2015-02-03 | FA Trophy | Ebbsfleet | NA | NA | Braintree | 0.418 | NA | NA | NA | NA |
| 47 | 2015-02-03 | Football Conference | Dover | NA | NA | Grimsby | 0.583 | NA | NA | NA | NA |
| 52 | 2015-02-03 | Football Conference | Alfreton | NA | NA | Lincoln | 0.495 | NA | NA | NA | NA |
| 64 | 2015-02-03 | Football Conference | Wrexham | NA | NA | Forest Green | 0.513 | NA | NA | NA | NA |
| 68 | 2015-02-04 | English FA Cup | Bolton | NA | NA | Liverpool | 0.392 | NA | NA | NA | NA |
| 60 | 2015-02-04 | Ryman Premier | Lewes | NA | NA | Canvey Isl. | 0.532 | NA | NA | NA | NA |

Improvements to be Implemented

  1. Estimate using different subsamples of the historical dataset, to determine whether the choice of sample has any impact. If the data are stationary, then subsample estimation should make no difference, but if there are non-stationarities then it might. Football match outcomes from more than 100 years ago are unlikely to be particularly relevant for outcomes today. The Elo scores calculated reflect all matches since 1877, and hence restricting the estimation sample would not throw away all of the older information.

  2. Estimate using an alternative statistical model, as this may help expand the range of forecast probabilities. The small range may be a result of teams, particularly in the Premiership, differing less from each other than teams in other divisions do, and this is something to be investigated. An ordered logit/probit model, however, would allow a better delineation between forecasting a home win, a draw or an away win, via the cut-offs that are estimated.
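
To give a sense of what the second improvement might look like, the following is a minimal sketch of an ordered logit fitted with polr() from the MASS package. It is not the model used for the forecasts above: the data are simulated purely for illustration, and the predictor elo.diff (home minus away Elo rating, in hundreds of points), the coefficient and the cut-offs used in the simulation are all assumptions.

```r
library(MASS)  # provides polr() for ordered logit/probit models

set.seed(1)
n <- 500
# Simulated, purely illustrative data (not the historical match dataset).
elo.diff <- rnorm(n, mean = 0, sd = 1.5)     # hypothetical Elo difference
latent   <- 0.4 * elo.diff + rlogis(n)       # latent match "strength" index
result   <- cut(latent, breaks = c(-Inf, -0.6, 0.6, Inf),
                labels = c("away", "draw", "home"), ordered_result = TRUE)
sim <- data.frame(result, elo.diff)

# Ordered logit: one slope plus two estimated cut-offs (away|draw, draw|home).
fit <- polr(result ~ elo.diff, data = sim, method = "logistic", Hess = TRUE)

# Forecast probabilities of each outcome for three hypothetical matches.
new.matches <- data.frame(elo.diff = c(-2, 0, 2))
probs <- predict(fit, newdata = new.matches, type = "probs")
round(probs, 3)
```

Each row of probs gives separate probabilities for an away win, a draw and a home win, which sum to one. If a single number on the 0 to 1 scale used above were wanted, one option (an assumption here, not something the original analysis specifies) would be the expected outcome, P(home) + 0.5 × P(draw). Setting method = "probit" would give the ordered probit equivalent.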