Introduction

This is the second in a series of forecasts of football match outcomes, following on from my efforts last week. One of the improvements planned after last week’s forecasts was to model the outcomes with an ordered logit or probit model, which generates individual probabilities for the three events: home win, draw or away win.  That has been done this week, and those forecasts are plotted alongside the linear regression forecasts.

Loading the Data

As with last week, the dataset is all English matches recorded on http://www.soccerbase.com, which goes back to 1877 and the very first football matches.  I will experiment with the estimation sample size, since it is not necessarily useful to have all matches back to 1877 when forecasting matches in 2015.  The Elo ranks have been calculated from the very first matches onwards, so historical information from throughout footballing history is retained to the extent that it is useful in determining a team’s current strength.

library(knitr)  # kable() for the table of forecasts
library(MASS)   # polr() for the ordered logit
date.1 <- "2015-02-06"
wd <- "/home/readejj/Dropbox/Teaching/Reading/ec313/2015/Football-forecasts/"
forecast.matches <- read.csv(paste(wd,"forecasts_",date.1,".csv",sep=""))
# keep only the fixtures that have an OLS forecast
forecast.matches <- forecast.matches[!is.na(forecast.matches$outcome),]
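
The Elo calculation itself is not shown in this post. For readers unfamiliar with it, here is a minimal sketch of the textbook Elo update. The function names `elo_expected` and `elo_update` are hypothetical, and the constants (K = 20, a scale of 400, no home advantage) are illustrative assumptions rather than the values behind these forecasts. The regressor `E.1` used in the models may be something like the expected score computed here, but that too is an assumption.

```r
# Textbook Elo sketch; constants are illustrative assumptions only.
elo_expected <- function(r_home, r_away, scale = 400) {
  # Expected score of the home team, between 0 and 1
  1 / (1 + 10^((r_away - r_home) / scale))
}

elo_update <- function(r_home, r_away, result, k = 20, scale = 400) {
  # result: 1 = home win, 0.5 = draw, 0 = away win
  e_home <- elo_expected(r_home, r_away, scale)
  delta <- k * (result - e_home)
  # Points gained by one team are lost by the other
  c(home = r_home + delta, away = r_away - delta)
}

elo_expected(1500, 1500)   # equally rated teams: expected score 0.5
elo_update(1500, 1500, 1)  # a home win moves 10 points from away to home
```

Because each update starts from the previous rating, a team's rating carries information from all of its earlier matches, which is the sense in which historical information is retained.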

Forecast Model

The linear regression model is estimated here and reported:

res.eng <- read.csv(paste(wd,"historical_",date.1,".csv",sep=""))
model <- lm(outcome ~ E.1 + pts1 + pts.D + pts.D.2 + pld1 + pld.D + pld.D.2 + gs1 + gs.D + gs.D.2 
            + gd1 + gd.D + gd.D.2 
            + pos1 + pos.D + pos.D.2 + form1 + form.D + form.D.2 + tier1 + tier.D + tier.D.2 + season.d,
            data=res.eng)
summary(model)
## 
## Call:
## lm(formula = outcome ~ E.1 + pts1 + pts.D + pts.D.2 + pld1 + 
##     pld.D + pld.D.2 + gs1 + gs.D + gs.D.2 + gd1 + gd.D + gd.D.2 + 
##     pos1 + pos.D + pos.D.2 + form1 + form.D + form.D.2 + tier1 + 
##     tier.D + tier.D.2 + season.d, data = res.eng)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.0177 -0.2933  0.1393  0.3497  0.8447 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.014e-01  7.143e-03  56.203  < 2e-16 ***
## E.1          4.050e-01  1.098e-02  36.892  < 2e-16 ***
## pts1         1.042e-03  4.309e-04   2.418  0.01560 *  
## pts.D       -2.854e-03  3.153e-04  -9.051  < 2e-16 ***
## pts.D.2     -1.376e-05  6.576e-06  -2.093  0.03633 *  
## pld1        -1.722e-03  6.194e-04  -2.780  0.00544 ** 
## pld.D        3.374e-03  7.240e-04   4.661 3.15e-06 ***
## pld.D.2     -4.607e-05  3.132e-05  -1.471  0.14126    
## gs1          5.071e-04  1.737e-04   2.920  0.00350 ** 
## gs.D        -3.181e-05  1.552e-04  -0.205  0.83753    
## gs.D.2      -1.654e-06  4.766e-06  -0.347  0.72861    
## gd1         -6.827e-04  2.446e-04  -2.791  0.00525 ** 
## gd.D         3.427e-03  1.785e-04  19.205  < 2e-16 ***
## gd.D.2      -5.681e-06  2.380e-06  -2.386  0.01701 *  
## pos1         7.940e-04  3.053e-04   2.601  0.00930 ** 
## pos.D       -4.201e-04  2.583e-04  -1.626  0.10388    
## pos.D.2      3.594e-05  1.189e-05   3.022  0.00251 ** 
## form1        7.702e-04  3.574e-04   2.155  0.03116 *  
## form.D      -2.162e-03  3.344e-04  -6.465 1.01e-10 ***
## form.D.2    -7.894e-05  3.044e-05  -2.593  0.00952 ** 
## tier1        1.978e-03  7.782e-04   2.541  0.01105 *  
## tier.D      -5.404e-02  3.172e-03 -17.033  < 2e-16 ***
## tier.D.2    -5.883e-03  1.277e-03  -4.606 4.10e-06 ***
## season.d    -1.104e-03  3.133e-05 -35.230  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4008 on 215647 degrees of freedom
##   (38023 observations deleted due to missingness)
## Multiple R-squared:  0.05678,    Adjusted R-squared:  0.05668 
## F-statistic: 564.4 on 23 and 215647 DF,  p-value: < 2.2e-16

The ordered logistic regression model is:

model.ord <- polr(as.factor(outcome) ~ E.1 + pts1 + pts.D + pts.D.2 + pld1 + pld.D + pld.D.2 + 
                    gs1 + gs.D + gs.D.2 + gd1 + gd.D + gd.D.2 + pos1 + pos.D + pos.D.2 + 
                    form1 + form.D + form.D.2 + tier1 + tier.D + tier.D.2 + season.d, 
                  data=res.eng, method = "logistic")
summary(model.ord)
## 
## Re-fitting to get Hessian
## Call:
## polr(formula = as.factor(outcome) ~ E.1 + pts1 + pts.D + pts.D.2 + 
##     pld1 + pld.D + pld.D.2 + gs1 + gs.D + gs.D.2 + gd1 + gd.D + 
##     gd.D.2 + pos1 + pos.D + pos.D.2 + form1 + form.D + form.D.2 + 
##     tier1 + tier.D + tier.D.2 + season.d, data = res.eng, method = "logistic")
## 
## Coefficients:
##               Value Std. Error  t value
## E.1       1.910e+00  9.480e-03 201.4739
## pts1      4.325e-03  2.069e-03   2.0901
## pts.D    -1.426e-02  1.504e-03  -9.4814
## pts.D.2  -7.848e-05  3.584e-05  -2.1900
## pld1     -9.531e-03  2.977e-03  -3.2015
## pld.D     1.844e-02  3.529e-03   5.2253
## pld.D.2  -2.550e-04  1.607e-04  -1.5866
## gs1       3.699e-03  8.457e-04   4.3746
## gs.D     -6.013e-04  7.530e-04  -0.7986
## gs.D.2   -3.248e-06  2.545e-05  -0.1276
## gd1      -3.809e-03  1.186e-03  -3.2125
## gd.D      1.776e-02  8.716e-04  20.3728
## gd.D.2    6.807e-06  1.638e-05   0.4156
## pos1      3.105e-03  1.446e-03   2.1464
## pos.D    -1.150e-03  1.240e-03  -0.9271
## pos.D.2   1.959e-04  5.902e-05   3.3192
## form1     3.492e-03  1.702e-03   2.0516
## form.D   -9.734e-03  1.416e-03  -6.8765
## form.D.2 -2.751e-04  1.481e-04  -1.8576
## tier1     9.365e-03  3.695e-03   2.5345
## tier.D   -2.827e-01  1.428e-02 -19.7948
## tier.D.2 -8.438e-03  6.902e-03  -1.2226
## season.d -5.399e-03  1.515e-04 -35.6460
## 
## Intercepts:
##       Value    Std. Error t value 
## 0|0.5  -0.1010   0.0170    -5.9491
## 0.5|1   1.0595   0.0170    62.4299
## 
## Residual Deviance: 433812.17 
## AIC: 433862.17 
## (38023 observations deleted due to missingness)
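
To see how the ordered logit turns into three probabilities, the following sketch applies the cumulative-logit parameterisation used by `polr`, in which P(outcome <= k) = plogis(zeta_k - eta). The cut-points are copied from the Intercepts section of the output above; the function name `ordered_logit_probs` is hypothetical, and the linear predictor value `eta = 1` is an arbitrary illustration, not a value for any particular fixture.

```r
# Cut-points (0|0.5 and 0.5|1) copied from the polr output above
zeta <- c(-0.1010, 1.0595)

# Convert a linear predictor eta = x'beta into outcome probabilities
ordered_logit_probs <- function(eta, zeta) {
  p_le_away <- plogis(zeta[1] - eta)  # P(outcome <= 0): away win
  p_le_draw <- plogis(zeta[2] - eta)  # P(outcome <= 0.5): away win or draw
  c(away = p_le_away,
    draw = p_le_draw - p_le_away,
    home = 1 - p_le_draw)
}

# eta = 1 is an arbitrary illustrative value
p <- ordered_logit_probs(eta = 1, zeta = zeta)
round(p, 3)
sum(p)  # the three probabilities sum to 1
```

With the fitted object, `predict(model.ord, newdata = ..., type = "probs")` should return the same three probabilities for each row, provided the new data contain all of the regressors.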

The Forecasts

Premier League

First, our Premier League forecasts:

prem.matches <- forecast.matches[forecast.matches$division=="English Premier",]
prem.matches$id <- 1:NROW(prem.matches)
par(mar=c(9,4,4,5)+.1)
plot(prem.matches$id,prem.matches$outcome,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend Premier League Matches",
     ylab="Probability of Outcome")
lines(prem.matches$id,prem.matches$Ph,col=2,pch=15,type="p")
lines(prem.matches$id,prem.matches$Pd,col=3,pch=16,type="p")
lines(prem.matches$id,prem.matches$Pa,col=4,pch=17,type="p")
legend("topleft",ncol=4,pch=c(1,15,16,17),col=c(1:4),
       legend=c("OLS","OL (home)","OL (draw)","OL (away)"),bty="n")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
abline(h=0.4,lty=3)
axis(1,at=prem.matches$id,labels=paste(prem.matches$team1,prem.matches$team2,sep=" v "),las=2,cex.axis=0.65)

The coloured dots are forecasts from the ordered logistic regression model; the black circles are the forecasts from a simple OLS linear probability model.  Hence the black circles are essentially a probability of a home win occurring (given the ordinal variable defined to capture all three outcomes), whereas the red squares are the probability of a home win, the green solid circles are the probability of a draw, and the blue triangles the probability of an away win.  The home bias in football is notable in that the majority of red squares lie above the blue triangles.  Even in the case of Chelsea at Aston Villa, which on current form appears an away banker, Chelsea are only at just under 60% for the win.  The strong home bankers are Man City (against Hull), Arsenal (against Leicester), Man United (against Burnley) and Chelsea (against Everton), but even the largest of these probabilities (Man City) is just shy of 70%.  Two of the tightest matches appear to be the two derbies taking place this weekend, namely Tottenham vs Arsenal and Everton vs Liverpool; the other very tight match is West Ham vs Man United.

It is worth noting, finally, that there appears to be more variation in the OLS forecasts this week than last week, suggesting that the lack of variation last week was due to a very evenly matched set of fixtures.
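
On the interpretation of the black circles: because the outcome variable is coded 1 for a home win, 0.5 for a draw and 0 for an away win, the OLS fitted value is an estimate of the mean outcome, which equals P(home) + 0.5 × P(draw) rather than P(home) alone. A small sketch of that arithmetic follows; the three probabilities are made-up illustrative numbers, not values from the forecast file.

```r
# Illustrative probabilities only; not taken from the forecast file
p_home <- 0.45
p_draw <- 0.27
p_away <- 0.28

# Mean of an outcome coded 1 (home win), 0.5 (draw), 0 (away win)
expected_outcome <- 1 * p_home + 0.5 * p_draw + 0 * p_away
expected_outcome  # 0.585
```

This may help explain why a black circle tends to sit above the corresponding red square. Assuming the columns are as used in the plotting code, `with(prem.matches, Ph + 0.5 * Pd)` gives the ordered logit quantity that is directly comparable with `prem.matches$outcome`.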

Championship

Next, our Championship forecasts:

champ.matches <- forecast.matches[forecast.matches$division=="English Championship",]
champ.matches$id <- 1:NROW(champ.matches)
par(mar=c(9,4,4,5)+.1)
plot(champ.matches$id,champ.matches$outcome,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend Championship Matches",
     ylab="Probability of Outcome")
lines(champ.matches$id,champ.matches$Ph,col=2,pch=15,type="p")
lines(champ.matches$id,champ.matches$Pd,col=3,pch=16,type="p")
lines(champ.matches$id,champ.matches$Pa,col=4,pch=17,type="p")
legend("topleft",ncol=4,pch=c(1,15,16,17),col=c(1:4),
       legend=c("OLS","OL (home)","OL (draw)","OL (away)"),bty="n")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
axis(1,at=champ.matches$id,labels=paste(champ.matches$team1,champ.matches$team2,sep=" v "),las=2,cex.axis=0.65)

League One

Next, our League One forecasts:

lg1.matches <- forecast.matches[forecast.matches$division=="English League One",]
lg1.matches$id <- 1:NROW(lg1.matches)
par(mar=c(9,4,4,5)+.1)
plot(lg1.matches$id,lg1.matches$outcome,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend League One Matches",
     ylab="Probability of Outcome")
lines(lg1.matches$id,lg1.matches$Ph,col=2,pch=15,type="p")
lines(lg1.matches$id,lg1.matches$Pd,col=3,pch=16,type="p")
lines(lg1.matches$id,lg1.matches$Pa,col=4,pch=17,type="p")
legend("topleft",ncol=4,pch=c(1,15,16,17),col=c(1:4),
       legend=c("OLS","OL (home)","OL (draw)","OL (away)"),bty="n")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
axis(1,at=lg1.matches$id,labels=paste(lg1.matches$team1,lg1.matches$team2,sep=" v "),las=2,cex.axis=0.65)

League Two

Next, our League Two forecasts:

lg2.matches <- forecast.matches[forecast.matches$division=="English League Two",]
lg2.matches$id <- 1:NROW(lg2.matches)
par(mar=c(9,4,4,5)+.1)
plot(lg2.matches$id,lg2.matches$outcome,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend League Two Matches",
     ylab="Probability of Outcome")
lines(lg2.matches$id,lg2.matches$Ph,col=2,pch=15,type="p")
lines(lg2.matches$id,lg2.matches$Pd,col=3,pch=16,type="p")
lines(lg2.matches$id,lg2.matches$Pa,col=4,pch=17,type="p")
legend("topleft",ncol=4,pch=c(1,15,16,17),col=c(1:4),
       legend=c("OLS","OL (home)","OL (draw)","OL (away)"),bty="n")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
axis(1,at=lg2.matches$id,labels=paste(lg2.matches$team1,lg2.matches$team2,sep=" v "),las=2,cex.axis=0.65)

Football Conference

Next, our Football Conference forecasts:

conf.matches <- forecast.matches[forecast.matches$division=="Football Conference",]
conf.matches$id <- 1:NROW(conf.matches)
par(mar=c(9,4,4,5)+.1)
plot(conf.matches$id,conf.matches$outcome,xaxt="n",xlab="",ylim=range(0,1),
     main="Forecasts of Weekend Football Conference Matches",
     ylab="Probability of Outcome")
lines(conf.matches$id,conf.matches$Ph,col=2,pch=15,type="p")
lines(conf.matches$id,conf.matches$Pd,col=3,pch=16,type="p")
lines(conf.matches$id,conf.matches$Pa,col=4,pch=17,type="p")
legend("topleft",ncol=4,pch=c(1,15,16,17),col=c(1:4),
       legend=c("OLS","OL (home)","OL (draw)","OL (away)"),bty="n")
abline(h=0.5,lty=2)
abline(h=0.6,lty=3)
abline(h=0.7,lty=2)
axis(1,at=conf.matches$id,labels=paste(conf.matches$team1,conf.matches$team2,sep=" v "),las=2,cex.axis=0.65)
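
The five plotting blocks above differ only in the division name and the title, so they could be wrapped in one function. The sketch below is one possible way to do that; `plot_division` is a hypothetical helper, it assumes a data frame with the columns used above (`division`, `team1`, `team2`, `outcome`, `Ph`, `Pd`, `Pa`), and the toy data are made up so that the example runs on its own.

```r
# Hypothetical helper wrapping the repeated plotting code
plot_division <- function(matches, division, title) {
  d <- matches[matches$division == division, ]
  d$id <- seq_len(nrow(d))
  old_par <- par(mar = c(9, 4, 4, 5) + 0.1)
  on.exit(par(old_par))  # restore margins when the function exits
  # Black circles: OLS forecasts
  plot(d$id, d$outcome, xaxt = "n", xlab = "", ylim = c(0, 1),
       main = title, ylab = "Probability of Outcome")
  # Coloured symbols: ordered logit probabilities
  points(d$id, d$Ph, col = 2, pch = 15)
  points(d$id, d$Pd, col = 3, pch = 16)
  points(d$id, d$Pa, col = 4, pch = 17)
  legend("topleft", ncol = 4, pch = c(1, 15, 16, 17), col = 1:4,
         legend = c("OLS", "OL (home)", "OL (draw)", "OL (away)"), bty = "n")
  abline(h = c(0.5, 0.7), lty = 2)
  abline(h = 0.6, lty = 3)
  axis(1, at = d$id, labels = paste(d$team1, d$team2, sep = " v "),
       las = 2, cex.axis = 0.65)
  invisible(d)  # return the subset in case it is needed afterwards
}

# Toy data with made-up numbers, purely to show the call
toy <- data.frame(division = "English Premier",
                  team1 = c("Team A", "Team C"),
                  team2 = c("Team B", "Team D"),
                  outcome = c(0.60, 0.45),
                  Ph = c(0.50, 0.33), Pd = c(0.27, 0.28), Pa = c(0.23, 0.39))
plot_division(toy, "English Premier", "Toy example")
```

With the real data, a call such as `plot_division(forecast.matches, "Football Conference", "Forecasts of Weekend Football Conference Matches")` would be expected to reproduce the last plot above.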

List of all forecasts

For transparency, all forecasts are also listed as a table:

kable(forecast.matches[order(forecast.matches$date,forecast.matches$division),
                       c("date","division","team1","outcome","team2")])
date division team1 outcome team2
67 2015-02-07 Conference North Hyde 0.3560532 Boston Utd
68 2015-02-07 Conference North Bradford PA 0.5513738 Gainsborough
69 2015-02-07 Conference North Tamworth 0.6313289 Barrow
76 2015-02-07 Conference North Stockport 0.6472377 Stalybridge
59 2015-02-07 Conference South Hayes & Y 0.5585974 Eastbourne
8 2015-02-07 English Championship Fulham 0.5939926 Birmingham
9 2015-02-07 English Championship Wolves 0.6198190 Reading
10 2015-02-07 English Championship Leeds 0.5201798 Brentford
11 2015-02-07 English Championship Wigan 0.3921368 Bournemouth
12 2015-02-07 English Championship Watford 0.6098818 Blackburn
13 2015-02-07 English Championship Norwich 0.7337136 Blackpool
14 2015-02-07 English Championship Derby 0.7050756 Bolton
15 2015-02-07 English Championship Millwall 0.5600428 Huddersfield
16 2015-02-07 English Championship Middlesbro 0.7588164 Charlton
17 2015-02-07 English Championship Sheff Wed 0.5928734 Cardiff
18 2015-02-07 English Championship Brighton 0.6328668 Nottm Forest
19 2015-02-07 English Championship Rotherham 0.4552989 Ipswich
119 2015-02-07 English Championship Rotherham 0.4552989 Ipswich
20 2015-02-07 English League One Swindon 0.6934015 Barnsley
21 2015-02-07 English League One Yeovil 0.6065801 Crawley
22 2015-02-07 English League One Doncaster 0.5897386 Walsall
23 2015-02-07 English League One MK Dons 0.6025996 Bristol C
24 2015-02-07 English League One Scunthorpe 0.6328965 Oldham
25 2015-02-07 English League One Notts Co 0.4921860 Chesterfield
26 2015-02-07 English League One Port Vale 0.5156276 Bradford
27 2015-02-07 English League One Colchester 0.6171720 Crewe
28 2015-02-07 English League One Rochdale 0.7114202 Leyton Orient
29 2015-02-07 English League One Gillingham 0.5105389 Sheff Utd
30 2015-02-07 English League One Preston 0.6932474 Coventry
31 2015-02-07 English League One Fleetwood 0.5978147 Peterborough
32 2015-02-07 English League Two Bury 0.6154614 Exeter
33 2015-02-07 English League Two Portsmouth 0.6683120 Hartlepool
34 2015-02-07 English League Two AFC W’bledon 0.5797523 Newport Co
35 2015-02-07 English League Two Mansfield 0.5091203 Stevenage
36 2015-02-07 English League Two Cheltenham 0.4474220 Burton
37 2015-02-07 English League Two Plymouth 0.6568156 Accrington
38 2015-02-07 English League Two Shrewsbury 0.6640427 Southend
39 2015-02-07 English League Two Cambridge U 0.5723349 Wycombe
40 2015-02-07 English League Two Tranmere 0.6704065 Carlisle
41 2015-02-07 English League Two York 0.6120021 Dag & Red
42 2015-02-07 English League Two Northampton 0.6236824 Morecambe
43 2015-02-07 English League Two Oxford 0.4923092 Luton
1 2015-02-07 English Premier Leicester 0.5339189 C Palace
2 2015-02-07 English Premier QPR 0.3885049 Southampton
3 2015-02-07 English Premier Man City 0.7729102 Hull
4 2015-02-07 English Premier Aston Villa 0.3271566 Chelsea
5 2015-02-07 English Premier Tottenham 0.5010300 Arsenal
6 2015-02-07 English Premier Everton 0.5023859 Liverpool
7 2015-02-07 English Premier Swansea 0.5910778 Sunderland
113 2015-02-07 FA Trophy Halifax 0.7071926 Dartford
114 2015-02-07 FA Trophy Gateshead 0.6637172 Wrexham
115 2015-02-07 FA Trophy Dover 0.7978782 Bath City
44 2015-02-07 Football Conference Chester 0.6704363 Dartford
45 2015-02-07 Football Conference Forest Green 0.5460007 Grimsby
46 2015-02-07 Football Conference Altrincham 0.6625440 Aldershot
47 2015-02-07 Football Conference Bristol R 0.6438205 Lincoln
48 2015-02-07 Football Conference Alfreton 0.5386176 Southport
49 2015-02-07 Football Conference Eastleigh 0.7126895 Telford
50 2015-02-07 Football Conference Aldershot 0.4751325 Halifax
51 2015-02-07 Football Conference Macclesfield 0.7237967 Welling
52 2015-02-07 Football Conference Nuneaton 0.4363156 Wrexham
53 2015-02-07 Football Conference Gateshead 0.6503344 Kidderminster
54 2015-02-07 Football Conference Torquay 0.6070942 Braintree
55 2015-02-07 Football Conference Barnet 0.6635609 Woking
56 2015-02-07 Football Conference Dover 0.6812986 Altrincham
120 2015-02-08 English Premier West Ham 0.5131791 Man Utd
121 2015-02-08 English Premier Burnley 0.5602065 West Brom
122 2015-02-08 English Premier Newcastle 0.5328664 Stoke
123 2015-02-09 English League One Bradford 0.4656593 MK Dons
131 2015-02-10 English Championship Blackpool 0.3330129 Middlesbro
132 2015-02-10 English Championship Charlton 0.4649354 Norwich
133 2015-02-10 English Championship Brentford 0.5727377 Watford
134 2015-02-10 English Championship Huddersfield 0.5275923 Wolves
135 2015-02-10 English Championship Cardiff 0.5483160 Brighton
136 2015-02-10 English Championship Bolton 0.6119957 Fulham
137 2015-02-10 English Championship Ipswich 0.6979172 Sheff Wed
138 2015-02-10 English Championship Birmingham 0.6469008 Millwall
139 2015-02-10 English Championship Bournemouth 0.5791758 Derby
140 2015-02-10 English Championship Blackburn 0.6549149 Rotherham
141 2015-02-10 English Championship Reading 0.5986062 Leeds
142 2015-02-10 English League One Bristol C 0.7337478 Port Vale
143 2015-02-10 English League One Leyton Orient 0.5671468 Notts Co
144 2015-02-10 English League One Chesterfield 0.5698005 Preston
145 2015-02-10 English League One Crawley 0.4737742 Doncaster
146 2015-02-10 English League One Coventry 0.5194755 Scunthorpe
147 2015-02-10 English League One Oldham 0.4523621 Swindon
148 2015-02-10 English League One Barnsley 0.6056150 Fleetwood
149 2015-02-10 English League One Crewe 0.6143761 Yeovil
150 2015-02-10 English League One Peterborough 0.5740166 Gillingham
151 2015-02-10 English League One Sheff Utd 0.6989185 Colchester
152 2015-02-10 English League One Walsall 0.5409291 Rochdale
153 2015-02-10 English League Two Stevenage 0.5886318 Bury
154 2015-02-10 English League Two Newport Co 0.5875543 Tranmere
155 2015-02-10 English League Two Hartlepool 0.4514268 Northampton
156 2015-02-10 English League Two Dag & Red 0.5749546 Portsmouth
157 2015-02-10 English League Two Carlisle 0.4143856 Shrewsbury
158 2015-02-10 English League Two Wycombe 0.6473508 Plymouth
159 2015-02-10 English League Two Morecambe 0.6489586 Mansfield
160 2015-02-10 English League Two Southend 0.7168300 Cheltenham
161 2015-02-10 English League Two Accrington 0.5296740 Oxford
162 2015-02-10 English League Two Exeter 0.5052105 Cambridge U
163 2015-02-10 English League Two Burton 0.6586635 AFC W’bledon
164 2015-02-10 English League Two Luton 0.6752474 York
127 2015-02-10 English Premier Arsenal 0.7526942 Leicester
128 2015-02-10 English Premier Hull 0.5541056 Aston Villa
129 2015-02-10 English Premier Liverpool 0.6039202 Tottenham
130 2015-02-10 English Premier Sunderland 0.6632512 QPR
178 2015-02-10 Evo-Stik S Premier Histon 0.4524500 Slough
165 2015-02-10 Football Conference Alfreton 0.4160040 Forest Green
166 2015-02-10 Football Conference Southport 0.4883591 Eastleigh
182 2015-02-10 Football Conference Halifax 0.5743726 Gateshead
183 2015-02-10 Football Conference Macclesfield 0.6853697 Altrincham
190 2015-02-11 English Championship Nottm Forest 0.5859345 Wigan
184 2015-02-11 English Premier Man Utd 0.7384119 Burnley
185 2015-02-11 English Premier Southampton 0.6304523 West Ham
186 2015-02-11 English Premier C Palace 0.5958422 Newcastle
187 2015-02-11 English Premier West Brom 0.5245474 Swansea
188 2015-02-11 English Premier Chelsea 0.7327859 Everton
189 2015-02-11 English Premier Stoke 0.4590179 Man City