Smartodds specialise in providing in-depth research and analysis on numerous sporting events all over the globe, and producing world-class bespoke software platforms.
We predict outcomes of professional sports on behalf of our clients.
Started with football, now also working on American football, baseball, basketball, cricket, golf, ice hockey, tennis, and more.
We’re always recruiting. Join us!
Motivation: A Bivariate Weibull Count Model for Forecasting Association Football Scores by Boshnakov et al. (an excellent paper!).
Few natural alternatives to the Poisson distribution for count data, other than the Negative Binomial.
Solution: Build count distribution from Weibull interarrival times between goals.
Fit Poisson distribution to home and away goals (separately).
Compare observed and expected goals.
Perform goodness-of-fit test.
(This is the most common approach in the literature.)
        chiSq  df  pValue
home  13.4194   5  0.0197
away  23.4891   5  0.0003
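The three steps above can be sketched in Python. This is a minimal illustration, not the code behind the table: the function name `poisson_gof`, the binning into 0 to 5 goals plus a pooled 6+ bin (chosen so that the test has 5 degrees of freedom, as in the table), and the use of NumPy and SciPy are all assumptions.

```python
import numpy as np
from scipy import stats


def poisson_gof(goals, max_bin=6):
    """Chi-squared goodness-of-fit test of a single Poisson fit to goal counts.

    Counts are binned as 0, 1, ..., max_bin - 1, and "max_bin or more".
    The binning is an assumption made for this sketch.
    """
    goals = np.asarray(goals)
    n = goals.size
    lam = goals.mean()  # maximum-likelihood estimate of the Poisson rate

    # Observed frequencies, with the last bin collecting the upper tail.
    observed = np.bincount(np.minimum(goals, max_bin), minlength=max_bin + 1)

    # Expected frequencies under Poisson(lam), using the same bins.
    probs = stats.poisson.pmf(np.arange(max_bin), lam)
    probs = np.append(probs, stats.poisson.sf(max_bin - 1, lam))
    expected = n * probs

    chi_sq = ((observed - expected) ** 2 / expected).sum()
    df = (max_bin + 1) - 1 - 1  # bins, minus one, minus one fitted parameter
    p_value = stats.chi2.sf(chi_sq, df)
    return chi_sq, df, p_value
```

Calling `poisson_gof(home_goals)` and `poisson_gof(away_goals)` on arrays of per-match goal counts returns the statistic, degrees of freedom and p-value for each side.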
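Implicit assumption of the test: all matches share the same rates.
\[ \begin{aligned} Y^{(h)}_i &\stackrel{\text{iid}}{\sim} \text{Poisson}(\lambda); \; i=1,\dots,N \\ Y^{(a)}_i &\stackrel{\text{iid}}{\sim} \text{Poisson}(\mu); \; i=1,\dots,N \end{aligned} \]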
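More realistically, the rates differ from match to match (so the counts are independent but no longer identically distributed):
\[ \begin{aligned} Y^{(h)}_i &\stackrel{\text{ind}}{\sim} \text{Poisson}(\lambda_i); \; i=1,\dots,N \\ Y^{(a)}_i &\stackrel{\text{ind}}{\sim} \text{Poisson}(\mu_i); \; i=1,\dots,N \end{aligned} \]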
We want \(\lambda_i\) and \(\mu_i\) to depend on some covariates \(X_i\) and some parameters \(\theta\).
Match-to-match variation in \(\lambda_i\) and \(\mu_i\) can easily look like overdispersion in the aggregated data.
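A short derivation shows why. Treat the rate of a randomly chosen match as a random variable \(\Lambda\) (a symbol introduced here for illustration only), with \(Y \mid \Lambda \sim \text{Poisson}(\Lambda)\). The law of total variance then gives
\[ \text{Var}(Y) = \mathbb{E}[\text{Var}(Y \mid \Lambda)] + \text{Var}(\mathbb{E}[Y \mid \Lambda]) = \mathbb{E}[\Lambda] + \text{Var}(\Lambda) \ge \mathbb{E}[\Lambda] = \mathbb{E}[Y] \]
So the pooled counts have variance above their mean whenever the rates vary, even though every individual match is exactly Poisson.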
Each team has an attack rating (\(\alpha\)) and a defence rating (\(\beta\)).
We also have home advantage (\(\eta\)) and a global mean (\(\gamma\)).
All of these are constant over time.
We set
\[ \begin{aligned} \log(\lambda_i) &= \alpha_{\text{home}(i)} + \beta_{\text{away}(i)} + \gamma + \eta/2 \\ \log(\mu_i) &= \alpha_{\text{away}(i)} + \beta_{\text{home}(i)} + \gamma - \eta/2 \end{aligned} \]
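As a worked example of these two equations, the sketch below computes the expected goals for a single match. The function name `expected_goals` and all numerical values are hypothetical, chosen only to show the arithmetic. Note that under this parameterisation a team's \(\beta\) is added to its opponent's log-rate, so a lower \(\beta\) means a stronger defence.

```python
import math


def expected_goals(alpha_home, beta_home, alpha_away, beta_away, gamma, eta):
    """Return (lambda_i, mu_i): expected home and away goals for one match."""
    log_lam = alpha_home + beta_away + gamma + eta / 2
    log_mu = alpha_away + beta_home + gamma - eta / 2
    return math.exp(log_lam), math.exp(log_mu)


# Hypothetical ratings: a strong home side against an average away side.
lam, mu = expected_goals(alpha_home=0.3, beta_home=-0.2,
                         alpha_away=0.0, beta_away=0.0,
                         gamma=0.3, eta=0.3)
print(f"expected home goals: {lam:.3f}, expected away goals: {mu:.3f}")
```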
Sample \(\alpha\), \(\beta\) from a Gaussian. Assume \(\alpha\) and \(\beta\) are positively correlated.
Set reasonable values for \(\eta\) and \(\gamma\).
Generate 5 seasons’ worth of data using the independent Poisson assumption.
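The simulation recipe above could look roughly like this in Python. It is a sketch under stated assumptions rather than the original code: the league size (20 teams playing a double round-robin), the values of `gamma`, `eta`, the rating standard deviation `sd` and the correlation `rho` are all hypothetical defaults.

```python
import numpy as np


def simulate_league(n_teams=20, n_seasons=5, gamma=0.3, eta=0.3,
                    sd=0.3, rho=0.5, seed=1):
    """Simulate home and away goals from the independent Poisson model."""
    rng = np.random.default_rng(seed)

    # Attack and defence ratings: bivariate Gaussian with correlation rho > 0.
    cov = sd ** 2 * np.array([[1.0, rho], [rho, 1.0]])
    ratings = rng.multivariate_normal([0.0, 0.0], cov, size=n_teams)
    alpha, beta = ratings[:, 0], ratings[:, 1]

    # Double round-robin: every ordered (home, away) pair once per season.
    home, away = np.nonzero(~np.eye(n_teams, dtype=bool))
    home = np.tile(home, n_seasons)
    away = np.tile(away, n_seasons)

    # Ratings are constant over time, so the same rates apply every season.
    lam = np.exp(alpha[home] + beta[away] + gamma + eta / 2)
    mu = np.exp(alpha[away] + beta[home] + gamma - eta / 2)

    # Independent Poisson draws for home and away goals.
    return rng.poisson(lam), rng.poisson(mu)


home_goals, away_goals = simulate_league()
```

The returned arrays can be summarised with `mean()` and `std()` and passed through the same goodness-of-fit test as the real data.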
Summary statistics:
     mean(HG)  sd(HG)  mean(AG)  sd(AG)
EPL     1.564   1.305     1.183   1.150
Sim     1.556   1.330     1.186   1.149
Goodness-of-fit test on the simulated data:
        chiSq  df  pValue
home  24.3305   5  0.0002
away  12.6575   5  0.0268
Don’t rely too much on empirical/aggregated data.
Make sure that plots and tests are relevant to your modelling assumptions.
Answer the question by formulating a model.
Always a good idea to simulate from your model.
Conventional wisdom in women’s football: extreme scorelines are overrepresented (relative to independent Poisson).
Different to men’s football, where low scorelines are overrepresented.
Is this true?
Women (% of matches):
          awayGoals
homeGoals      0      1      2      3      4      5     6+
0           5.68   7.02   4.70   3.29   2.62   1.90   3.24
1           8.58   7.61   4.67   2.37   1.16   0.57   0.82
2           6.85   5.78   2.59   1.29   0.54   0.13   0.15
3           4.89   3.11   1.51   0.52   0.12   0.08   0.03
4           4.00   1.68   0.60   0.25   0.08   0.02   0.02
5           2.62   1.16   0.29   0.08   0.02   0.03   0.00
6+          5.96   1.07   0.29   0.02   0.00   0.00   0.00
Men (% of matches):
          awayGoals
homeGoals      0      1      2      3      4      5     6+
0           9.20   7.87   5.17   2.32   1.15   0.42   0.62
1          11.74  10.64   5.12   1.93   0.85   0.26   0.22
2           8.60   7.64   3.67   1.38   0.41   0.14   0.08
3           4.90   3.44   1.60   0.42   0.17   0.02   0.01
4           2.75   1.63   0.70   0.23   0.06   0.00   0.00
5           1.34   0.65   0.19   0.07   0.00   0.00   0.00
6+          1.63   0.62   0.14   0.01   0.01   0.00   0.00
Difference (women minus men, in percentage points):
          awayGoals
homeGoals      0      1      2      3      4      5     6+
0          -3.52  -0.85  -0.46   0.97   1.47   1.48   2.63
1          -3.16  -3.03  -0.45   0.43   0.31   0.31   0.60
2          -1.74  -1.86  -1.08  -0.09   0.13   0.00   0.07
3          -0.01  -0.33  -0.08   0.10  -0.05   0.06   0.02
4           1.24   0.05  -0.09   0.02   0.02   0.02   0.02
5           1.28   0.51   0.09   0.01   0.02   0.03   0.00
6+          4.33   0.45   0.15   0.00  -0.01   0.00   0.00
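Tables of this kind can be built from match-level results along the following lines. This is a sketch only: the function name `scoreline_table` and the tiny input arrays are hypothetical and are not the data behind the tables above.

```python
import numpy as np


def scoreline_table(home_goals, away_goals, cap=6):
    """Percentage of matches per scoreline, pooling `cap` or more goals.

    Rows index home goals and columns index away goals; the last row and
    column correspond to the pooled "cap or more" category.
    """
    h = np.minimum(np.asarray(home_goals), cap)
    a = np.minimum(np.asarray(away_goals), cap)
    counts = np.zeros((cap + 1, cap + 1))
    np.add.at(counts, (h, a), 1)  # accumulate one count per match
    return 100 * counts / counts.sum()


# Hypothetical results for two small sets of matches.
table_a = scoreline_table([0, 1, 7, 2], [0, 0, 1, 2])
table_b = scoreline_table([1, 1, 0, 2], [0, 1, 0, 2])

# Difference in percentage points, analogous to "women minus men".
difference = table_a - table_b
```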