Sports generate vast amounts of data. Analyzing it is useful not only for coaches trying to understand the relative strengths and weaknesses of teams, but also for avid bettors around the world looking for an edge. Given the popularity of fantasy sports and betting sites, regression can be a valuable tool for predicting outcomes to inform decision making.
Depending on the sport in question, bets can be placed on a variety of events. These events can be simple (team A will beat team B), more granular (the final score between team A and team B will be 3-0) or even more refined (there will be 3 corner kicks in a game, there will be exactly 1 red card, at least 2 goals will be scored in the first half of a game). The more refined the event, the more specific the prediction, and domain knowledge can go a long way in creating and selecting appropriate predictors.
As mentioned above, one of the simplest bets is whether or not a team will win. A team wins by scoring more goals (or points) than the other team, and goals or baskets scored are counts. This suggests working with a count data model such as the Poisson. The Poisson distribution describes the probability of a given number of events occurring within a fixed time period. It is defined by a single parameter, \(\lambda\), which represents the rate of occurrence (in our case, the average number of goals per game) and is both the mean and the variance of the distribution.
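For reference, the Poisson probability of observing exactly \(k\) events is \(P(X = k) = \lambda^k e^{-\lambda} / k!\). The short sketch below checks a hand-written version of this formula against R's built-in `dpois()` and confirms numerically that the mean and variance both equal \(\lambda\). The rate of 1.5 is an arbitrary illustrative value, not one estimated from the data.

```r
# Hand-written Poisson probability mass function: lambda^k * exp(-lambda) / k!
pois_pmf <- function(k, lambda) {
  lambda^k * exp(-lambda) / factorial(k)
}

lambda <- 1.5   # illustrative rate (goals per game), not estimated from data
k <- 0:6        # goal counts of interest

manual  <- pois_pmf(k, lambda)
builtin <- dpois(k, lambda)
all.equal(manual, builtin)   # TRUE: both give the same probabilities

# Over a wide range of counts, the mean and the variance are both (almost exactly) lambda
k_all <- 0:100
probs <- dpois(k_all, lambda)
c(mean = sum(k_all * probs),
  variance = sum((k_all - lambda)^2 * probs))
```

The equality of mean and variance is the property to keep an eye on: real goal data whose variance clearly exceeds its mean is a sign the Poisson may be too restrictive.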
The data comes from https://www.football-data.org/, which provides a wealth of information about football competitions, matches and players. We make a request to its API, parse the response, and iterate through the matches of the 2018-2019 English Premier League (EPL) season to build a data frame with one row per game: the home and away teams and the number of goals each scored.
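The request-and-parse code is not shown here. The sketch below is one possible way to do the parsing in base R, and it rests on assumptions: the endpoint URL, the `X-Auth-Token` header, the `season=2018` parameter and the JSON field names (`homeTeam$name`, `score$fullTime$home` and so on) follow our understanding of the v4 football-data.org layout and may differ from the API version used for this article. `matches_to_df()` is a hypothetical helper, and the network call is left commented out so the parsing can be tried on a small hand-built list.

```r
# Fetching the season (not run: needs an API token and the httr/jsonlite packages).
# URL, header and field names are ASSUMPTIONS based on the v4 API layout.
# resp <- httr::GET(
#   "https://api.football-data.org/v4/competitions/PL/matches?season=2018",
#   httr::add_headers("X-Auth-Token" = "YOUR_TOKEN"))
# matches <- jsonlite::fromJSON(httr::content(resp, as = "text"),
#                               simplifyVector = FALSE)$matches

# Hypothetical helper: turn a list of match records into one row per game
matches_to_df <- function(matches) {
  # keep only completed games, which have a full-time score
  matches <- Filter(function(m) identical(m$status, "FINISHED"), matches)
  data.frame(
    home      = vapply(matches, function(m) m$homeTeam$name, character(1)),
    away      = vapply(matches, function(m) m$awayTeam$name, character(1)),
    homegoals = vapply(matches, function(m) as.integer(m$score$fullTime$home), integer(1)),
    awaygoals = vapply(matches, function(m) as.integer(m$score$fullTime$away), integer(1)),
    stringsAsFactors = FALSE
  )
}

# Hand-built records mimicking the assumed structure (first game in the table
# below, plus a made-up unplayed game that should be filtered out)
example <- list(
  list(status = "FINISHED",
       homeTeam = list(name = "Manchester United FC"),
       awayTeam = list(name = "Leicester City FC"),
       score = list(fullTime = list(home = 2, away = 1))),
  list(status = "SCHEDULED",
       homeTeam = list(name = "Team X"), awayTeam = list(name = "Team Y"),
       score = list(fullTime = list(home = NULL, away = NULL)))
)
matches_to_df(example)
```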
home | away | homegoals | awaygoals |
---|---|---|---|
Manchester United FC | Leicester City FC | 2 | 1 |
Newcastle United FC | Tottenham Hotspur FC | 1 | 2 |
Fulham FC | Crystal Palace FC | 0 | 2 |
Huddersfield Town AFC | Chelsea FC | 0 | 3 |
Watford FC | Brighton & Hove Albion FC | 2 | 0 |
AFC Bournemouth | Cardiff City FC | 2 | 0 |
Wolverhampton Wanderers FC | Everton FC | 2 | 2 |
Liverpool FC | West Ham United FC | 4 | 0 |
Southampton FC | Burnley FC | 0 | 0 |
Arsenal FC | Manchester City FC | 0 | 2 |
Cardiff City FC | Newcastle United FC | 0 | 0 |
Tottenham Hotspur FC | Fulham FC | 3 | 1 |
Everton FC | Southampton FC | 2 | 1 |
Leicester City FC | Wolverhampton Wanderers FC | 2 | 0 |
West Ham United FC | AFC Bournemouth | 1 | 2 |
Chelsea FC | Arsenal FC | 3 | 2 |
Manchester City FC | Huddersfield Town AFC | 6 | 1 |
Burnley FC | Watford FC | 1 | 3 |
Brighton & Hove Albion FC | Manchester United FC | 3 | 2 |
Crystal Palace FC | Liverpool FC | 0 | 2 |
The season means of home and away goals are extracted and used as the \(\lambda\) parameters of two Poisson distributions. Plotting the observed goal counts against these distributions shows that the Poisson is a reasonable fit. An alternative worth considering is the negative binomial distribution, which allows the variance to exceed the mean (overdispersion).
```
## avg_home_goals avg_away_goals
## 1 1.568421 1.252632
```
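The code behind the season means and the fit check is not shown. A minimal sketch of both steps follows, run on the 20 matches listed in the table above rather than on the full season, so its numbers differ from the season means. `observed_vs_expected()` is a hypothetical helper that compares the observed share of games with each goal count against the Poisson probability at the sample mean; the last line is a simple dispersion check, since a variance well above the mean would favor the negative binomial.

```r
# Goals from the 20 matches shown in the table above (not the full season)
homegoals <- c(2, 1, 0, 0, 2, 2, 2, 4, 0, 0, 0, 3, 2, 2, 1, 3, 6, 1, 3, 0)
awaygoals <- c(1, 2, 2, 3, 0, 0, 2, 0, 0, 2, 0, 1, 1, 0, 2, 2, 1, 3, 2, 2)

# Poisson rates = sample means, with the same column names as the `lambdas`
# object used in the draw calculation
lambdas <- data.frame(avg_home_goals = mean(homegoals),
                      avg_away_goals = mean(awaygoals))
lambdas   # 1.7 and 1.3 for these 20 games

# Hypothetical helper: observed share of games with k goals vs Poisson probability
observed_vs_expected <- function(goals) {
  k <- 0:max(goals)
  data.frame(goals    = k,
             observed = as.vector(table(factor(goals, levels = k))) / length(goals),
             expected = dpois(k, mean(goals)))
}
observed_vs_expected(homegoals)

# Dispersion check: for Poisson data the variance should be close to the mean
c(mean = mean(homegoals), variance = var(homegoals))
```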
With this information, we can now estimate the probability of events such as the home team scoring at least two goals or, assuming home and away goals are independent, the probability of a draw.
```
## [1] 0.464808
```
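The code for the at-least-two-goals figure is not shown. One way to reproduce it is with `ppois()`, the Poisson cumulative distribution function, using the season average of home goals printed earlier and the identity \(P(X \ge 2) = 1 - P(X \le 1)\):

```r
lambda_home <- 1.568421   # season average home goals, from the output above

# P(home team scores at least 2) = 1 - P(X <= 1)
p_two_plus <- 1 - ppois(1, lambda_home)

# Equivalent: ask for the upper tail directly
p_two_plus_alt <- ppois(1, lambda_home, lower.tail = FALSE)

round(p_two_plus, 6)   # 0.464808, matching the output above
```

The `lower.tail = FALSE` form is slightly more accurate for very small tail probabilities, although at this scale the two agree.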
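```r
# Goal counts from 0 up to the highest score observed in the season
g <- seq(0, max(epl_data$homegoals, epl_data$awaygoals))
# Probability of each goal count for the home and away sides
home_draw <- dpois(g, lambdas$avg_home_goals)
away_draw <- dpois(g, lambdas$avg_away_goals)
# P(draw) = sum over k of P(home = k) * P(away = k), assuming independence
p_draw <- sum(home_draw * away_draw)
p_draw
```

```
## [1] 0.2481926
```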
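The data is then restructured for modeling: each match contributes two rows, one per team, with the goals that team scored as the target variable and an indicator for whether it played at home. Poisson regression is a generalized linear model (GLM) that relates the expected count to a linear combination of the predictors through a log link.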
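Written out, for team \(i\) playing opponent \(j\), the model fitted in this section assumes

\[
\text{goals}_{ij} \sim \operatorname{Poisson}(\lambda_{ij}), \qquad
\log \lambda_{ij} = \beta_0 + \beta_{\text{home}} \, \text{home}_{ij} + \alpha_i + \delta_j,
\]

where \(\text{home}_{ij}\) is 1 when team \(i\) plays at home and 0 otherwise, \(\alpha_i\) can be read as an attacking effect of team \(i\), and \(\delta_j\) as a defensive effect of opponent \(j\), each measured relative to a reference team. Exponentiating gives

\[
\lambda_{ij} = e^{\beta_0} \, e^{\beta_{\text{home}} \, \text{home}_{ij}} \, e^{\alpha_i} \, e^{\delta_j},
\]

so every coefficient acts multiplicatively on the expected number of goals.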
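```r
# Long format: one row per team per match, with the goals that team scored.
# Home teams get home = 1, away teams get home = 0.
model_data <- rbind(
  data.frame(team = epl_data$home, opponent = epl_data$away, home = 1, goals = epl_data$homegoals),
  data.frame(team = epl_data$away, opponent = epl_data$home, home = 0, goals = epl_data$awaygoals))
# showtable() is the display helper used throughout (definition not shown)
showtable(head(model_data, 20), "")
```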
team | opponent | home | goals |
---|---|---|---|
Manchester United FC | Leicester City FC | 1 | 2 |
Newcastle United FC | Tottenham Hotspur FC | 1 | 1 |
Fulham FC | Crystal Palace FC | 1 | 0 |
Huddersfield Town AFC | Chelsea FC | 1 | 0 |
Watford FC | Brighton & Hove Albion FC | 1 | 2 |
AFC Bournemouth | Cardiff City FC | 1 | 2 |
Wolverhampton Wanderers FC | Everton FC | 1 | 2 |
Liverpool FC | West Ham United FC | 1 | 4 |
Southampton FC | Burnley FC | 1 | 0 |
Arsenal FC | Manchester City FC | 1 | 0 |
Cardiff City FC | Newcastle United FC | 1 | 0 |
Tottenham Hotspur FC | Fulham FC | 1 | 3 |
Everton FC | Southampton FC | 1 | 2 |
Leicester City FC | Wolverhampton Wanderers FC | 1 | 2 |
West Ham United FC | AFC Bournemouth | 1 | 1 |
Chelsea FC | Arsenal FC | 1 | 3 |
Manchester City FC | Huddersfield Town AFC | 1 | 6 |
Burnley FC | Watford FC | 1 | 1 |
Brighton & Hove Albion FC | Manchester United FC | 1 | 3 |
Crystal Palace FC | Liverpool FC | 1 | 0 |
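```r
# Poisson GLM with a log link: expected goals depend on home advantage,
# the scoring team and the opponent
pois_model <- glm(goals ~ home + team + opponent, family = poisson(link = log), data = model_data)
summary(pois_model)
```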
```
##
## Call:
## glm(formula = goals ~ home + team + opponent, family = poisson(link = log),
## data = model_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.22803 -1.02266 -0.07358 0.50507 2.95532
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.309958 0.190911 1.624 0.104466
## home 0.224823 0.061471 3.657 0.000255 ***
## teamNewcastle United FC -0.443771 0.198225 -2.239 0.025174 *
## teamFulham FC -0.623130 0.211966 -2.940 0.003285 **
## teamHuddersfield Town AFC -1.064297 0.246910 -4.310 1.63e-05 ***
## teamWatford FC -0.218885 0.186353 -1.175 0.240164
## teamAFC Bournemouth -0.133506 0.182665 -0.731 0.464851
## teamWolverhampton Wanderers FC -0.333015 0.191725 -1.737 0.082397 .
## teamLiverpool FC 0.283268 0.163386 1.734 0.082965 .
## teamSouthampton FC -0.357939 0.194228 -1.843 0.065346 .
## teamArsenal FC 0.113481 0.170847 0.664 0.506546
## teamCardiff City FC -0.635008 0.211935 -2.996 0.002733 **
## teamTottenham Hotspur FC 0.015482 0.174361 0.089 0.929246
## teamEverton FC -0.193872 0.184394 -1.051 0.293074
## teamLeicester City FC -0.249205 0.187329 -1.330 0.183419
## teamWest Ham United FC -0.222848 0.186341 -1.196 0.231729
## teamChelsea FC -0.046227 0.177056 -0.261 0.794027
## teamManchester City FC 0.349636 0.161204 2.169 0.030090 *
## teamBurnley FC -0.354960 0.194236 -1.827 0.067630 .
## teamBrighton & Hove Albion FC -0.614774 0.209921 -2.929 0.003405 **
## teamCrystal Palace FC -0.244292 0.187344 -1.304 0.192241
## opponentTottenham Hotspur FC -0.192416 0.215842 -0.891 0.372680
## opponentCrystal Palace FC 0.099336 0.199498 0.498 0.618532
## opponentChelsea FC -0.196356 0.215832 -0.910 0.362948
## opponentBrighton & Hove Albion FC 0.207993 0.193860 1.073 0.283316
## opponentCardiff City FC 0.347072 0.188169 1.844 0.065114 .
## opponentEverton FC -0.039719 0.206574 -0.192 0.847527
## opponentWest Ham United FC 0.137465 0.197774 0.695 0.487018
## opponentBurnley FC 0.343316 0.188765 1.819 0.068951 .
## opponentManchester City FC -0.693964 0.253873 -2.734 0.006266 **
## opponentNewcastle United FC -0.008777 0.204340 -0.043 0.965741
## opponentFulham FC 0.507814 0.182377 2.784 0.005362 **
## opponentSouthampton FC 0.298063 0.190554 1.564 0.117773
## opponentWolverhampton Wanderers FC -0.046560 0.206557 -0.225 0.821661
## opponentAFC Bournemouth 0.383428 0.187682 2.043 0.041056 *
## opponentArsenal FC 0.082663 0.201403 0.410 0.681488
## opponentHuddersfield Town AFC 0.432121 0.184554 2.341 0.019210 *
## opponentWatford FC 0.207872 0.194634 1.068 0.285513
## opponentManchester United FC 0.131993 0.198659 0.664 0.506420
## opponentLiverpool FC -0.744472 0.257723 -2.889 0.003869 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 975.98 on 759 degrees of freedom
## Residual deviance: 762.55 on 720 degrees of freedom
## AIC: 2210.2
##
## Number of Fisher Scoring iterations: 5
```
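The deviance figures in this summary also give a rough check of the Poisson assumption, which ties back to the negative binomial alternative mentioned earlier. A minimal sketch using the numbers printed in the summary; with the fitted object itself, `deviance(pois_model) / df.residual(pois_model)` gives the same ratio.

```r
# Residual deviance and degrees of freedom copied from the summary above
resid_dev <- 762.55
resid_df  <- 720

# Dispersion estimate: close to 1 is consistent with the Poisson assumption;
# values well above 1 would suggest overdispersion (e.g. a negative binomial)
resid_dev / resid_df   # about 1.06

# Deviance goodness-of-fit test: a small p-value would indicate poor fit
pchisq(resid_dev, df = resid_df, lower.tail = FALSE)
```

This is only a rough guide: the chi-squared approximation for the deviance is less reliable when expected counts are small, as they are for many football scores.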
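```r
# Coefficient table (estimate, std. error, z value, p-value) as a data frame;
# coef(summary(...)) is more robust than indexing the summary by position
modelresults <- as.data.frame(coef(summary(pois_model)))
colnames(modelresults) <- c('estimate', 'std_err', 'z_val', 'p_val')
showtable(modelresults, "")
```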
term | estimate | std_err | z_val | p_val |
---|---|---|---|---|
(Intercept) | 0.3099584 | 0.1909109 | 1.6235762 | 0.1044663 |
home | 0.2248228 | 0.0614710 | 3.6573827 | 0.0002548 |
teamNewcastle United FC | -0.4437708 | 0.1982247 | -2.2387257 | 0.0251738 |
teamFulham FC | -0.6231297 | 0.2119663 | -2.9397578 | 0.0032847 |
teamHuddersfield Town AFC | -1.0642969 | 0.2469099 | -4.3104660 | 0.0000163 |
teamWatford FC | -0.2188852 | 0.1863525 | -1.1745760 | 0.2401644 |
teamAFC Bournemouth | -0.1335065 | 0.1826649 | -0.7308819 | 0.4648513 |
teamWolverhampton Wanderers FC | -0.3330153 | 0.1917251 | -1.7369417 | 0.0823975 |
teamLiverpool FC | 0.2832683 | 0.1633864 | 1.7337329 | 0.0829655 |
teamSouthampton FC | -0.3579393 | 0.1942277 | -1.8428849 | 0.0653458 |
teamArsenal FC | 0.1134810 | 0.1708471 | 0.6642255 | 0.5065460 |
teamCardiff City FC | -0.6350079 | 0.2119351 | -2.9962372 | 0.0027333 |
teamTottenham Hotspur FC | 0.0154822 | 0.1743605 | 0.0887942 | 0.9292455 |
teamEverton FC | -0.1938720 | 0.1843940 | -1.0514010 | 0.2930745 |
teamLeicester City FC | -0.2492048 | 0.1873294 | -1.3303026 | 0.1834186 |
teamWest Ham United FC | -0.2228478 | 0.1863406 | -1.1959164 | 0.2317292 |
teamChelsea FC | -0.0462267 | 0.1770562 | -0.2610846 | 0.7940273 |
teamManchester City FC | 0.3496364 | 0.1612044 | 2.1689010 | 0.0300902 |
teamBurnley FC | -0.3549598 | 0.1942362 | -1.8274648 | 0.0676299 |
teamBrighton & Hove Albion FC | -0.6147739 | 0.2099211 | -2.9285950 | 0.0034050 |
teamCrystal Palace FC | -0.2442920 | 0.1873435 | -1.3039789 | 0.1922408 |
opponentTottenham Hotspur FC | -0.1924161 | 0.2158425 | -0.8914653 | 0.3726796 |
opponentCrystal Palace FC | 0.0993365 | 0.1994979 | 0.4979325 | 0.6185316 |
opponentChelsea FC | -0.1963557 | 0.2158321 | -0.9097614 | 0.3629484 |
opponentBrighton & Hove Albion FC | 0.2079926 | 0.1938602 | 1.0728997 | 0.2833161 |
opponentCardiff City FC | 0.3470723 | 0.1881687 | 1.8444744 | 0.0651140 |
opponentEverton FC | -0.0397191 | 0.2065741 | -0.1922752 | 0.8475267 |
opponentWest Ham United FC | 0.1374646 | 0.1977741 | 0.6950588 | 0.4870184 |
opponentBurnley FC | 0.3433157 | 0.1887654 | 1.8187428 | 0.0689507 |
opponentManchester City FC | -0.6939639 | 0.2538727 | -2.7335115 | 0.0062663 |
opponentNewcastle United FC | -0.0087766 | 0.2043400 | -0.0429510 | 0.9657406 |
opponentFulham FC | 0.5078137 | 0.1823766 | 2.7844229 | 0.0053623 |
opponentSouthampton FC | 0.2980625 | 0.1905540 | 1.5641891 | 0.1177732 |
opponentWolverhampton Wanderers FC | -0.0465599 | 0.2065567 | -0.2254097 | 0.8216606 |
opponentAFC Bournemouth | 0.3834284 | 0.1876823 | 2.0429656 | 0.0410558 |
opponentArsenal FC | 0.0826626 | 0.2014031 | 0.4104339 | 0.6814877 |
opponentHuddersfield Town AFC | 0.4321208 | 0.1845543 | 2.3414292 | 0.0192101 |
opponentWatford FC | 0.2078724 | 0.1946339 | 1.0680174 | 0.2855127 |
opponentManchester United FC | 0.1319935 | 0.1986590 | 0.6644223 | 0.5064201 |
opponentLiverpool FC | -0.7444725 | 0.2577228 | -2.8886555 | 0.0038689 |
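From the summary of this model, we can draw some interesting conclusions. Team and opponent coefficients are relative to the reference levels, Manchester United FC for `team` and Leicester City FC for `opponent`, which is why those two do not appear in the output.

- `home` is significant with a coefficient of 0.224823. Because the model uses a log link, this means a team playing at home is expected to score about exp(0.2248) ≈ 1.25 times as many goals, all else being equal.
- Manchester City FC (0.350) and Liverpool FC (0.283) have the largest `team` coefficients, i.e. the strongest attacks, while Huddersfield Town AFC (-1.064) has the weakest.
- Liverpool FC (-0.744) and Manchester City FC (-0.694) have the most negative `opponent` coefficients, so teams are expected to score the fewest goals against them.

These findings might be obvious to someone with knowledge of the EPL and of that season in particular. However, we can turn to prediction for further insights.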
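Because of the log link, exponentiating a coefficient turns it into a rate ratio, a multiplicative effect on expected goals. The sketch below does this for four estimates and standard errors copied from the coefficient table. The 95% intervals are Wald intervals (estimate ± 1.96 standard errors), the same kind `confint.default()` computes from a fitted model, e.g. `exp(confint.default(pois_model))`.

```r
# Estimates and standard errors copied from the coefficient table above
est <- c(home                        =  0.2248228,
         `teamManchester City FC`    =  0.3496364,
         `teamHuddersfield Town AFC` = -1.0642969,
         `opponentLiverpool FC`      = -0.7444725)
se  <- c(0.0614710, 0.1612044, 0.2469099, 0.2577228)

z <- qnorm(0.975)   # about 1.96 for a 95% interval

# Rate ratios with Wald confidence limits, back on the goals scale
rate_ratios <- data.frame(rate_ratio = exp(est),
                          lower      = exp(est - z * se),
                          upper      = exp(est + z * se))
round(rate_ratios, 3)
# home: about 1.25, i.e. roughly 25% more expected goals when playing at home
```

An interval lying entirely above (or below) 1 corresponds to a coefficient whose Wald test is significant at the 5% level, which matches the p-values in the table for all four terms.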
We now use the model to predict the number of goals each team scores when Liverpool (at home) play Chelsea.
```r
# Expected goals for each side (type = "response" returns exp(linear predictor)),
# rounded to the nearest whole goal
chelsea <- round(predict(pois_model,
                         data.frame(home = 0, team = "Chelsea FC",
                                    opponent = "Liverpool FC"), type = "response"), 0)
liverpool <- round(predict(pois_model,
                           data.frame(home = 1, team = "Liverpool FC",
                                      opponent = "Chelsea FC"), type = "response"), 0)
print(paste0('Liverpool FC (home) ', liverpool, ' - ', chelsea, ' Chelsea FC (away)'))
```

```
## [1] "Liverpool FC (home) 2 - 1 Chelsea FC (away)"
```
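With `type = "response"`, `predict()` adds up the relevant coefficients on the log scale and exponentiates the result. The sketch below redoes this by hand with coefficients copied from the table above. It then builds a matrix of scoreline probabilities, assuming the two goal counts are independent Poisson variables (the same assumption as the draw calculation). The cap of 10 goals per team is an arbitrary truncation.

```r
# Coefficients copied from the model summary above
b <- c(intercept =  0.3099584,
       home      =  0.2248228,
       team_liv  =  0.2832683,   # teamLiverpool FC
       team_che  = -0.0462267,   # teamChelsea FC
       opp_liv   = -0.7444725,   # opponentLiverpool FC
       opp_che   = -0.1963557)   # opponentChelsea FC

# Expected goals = exp(linear predictor)
lambda_liv <- unname(exp(b["intercept"] + b["home"] + b["team_liv"] + b["opp_che"]))
lambda_che <- unname(exp(b["intercept"] + b["team_che"] + b["opp_liv"]))
round(c(liverpool = lambda_liv, chelsea = lambda_che), 2)   # 1.86 and 0.62

# Scoreline probabilities (rows: Liverpool goals, columns: Chelsea goals),
# assuming independent Poisson goal counts, truncated at 10 goals each
max_goals <- 10
score_probs <- outer(dpois(0:max_goals, lambda_liv), dpois(0:max_goals, lambda_che))
dimnames(score_probs) <- list(liverpool = 0:max_goals, chelsea = 0:max_goals)

p_liverpool_win <- sum(score_probs[lower.tri(score_probs)])  # row goals > column goals
p_draw          <- sum(diag(score_probs))
p_chelsea_win   <- sum(score_probs[upper.tri(score_probs)])
round(c(liverpool = p_liverpool_win, draw = p_draw, chelsea = p_chelsea_win), 3)

# Most likely single scoreline (Liverpool goals, Chelsea goals)
best <- arrayInd(which.max(score_probs), dim(score_probs)) - 1
best
```

Note that the single most likely scoreline here is 1-0 rather than the rounded 2-1: rounding each team's expected goals separately is not the same as finding the most probable result.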
We can also pull out all of Liverpool FC's home games and the goals scored by each side.
home | away | homegoals | awaygoals |
---|---|---|---|
Liverpool FC | West Ham United FC | 4 | 0 |
Liverpool FC | Brighton & Hove Albion FC | 1 | 0 |
Liverpool FC | Southampton FC | 3 | 0 |
Liverpool FC | Manchester City FC | 0 | 0 |
Liverpool FC | Cardiff City FC | 4 | 1 |
Liverpool FC | Fulham FC | 2 | 0 |
Liverpool FC | Everton FC | 1 | 0 |
Liverpool FC | Manchester United FC | 3 | 1 |
Liverpool FC | Newcastle United FC | 4 | 0 |
Liverpool FC | Arsenal FC | 5 | 1 |
Liverpool FC | Crystal Palace FC | 4 | 3 |
Liverpool FC | Leicester City FC | 1 | 1 |
Liverpool FC | AFC Bournemouth | 3 | 0 |
Liverpool FC | Watford FC | 5 | 0 |
Liverpool FC | Burnley FC | 4 | 2 |
Liverpool FC | Tottenham Hotspur FC | 2 | 1 |
Liverpool FC | Chelsea FC | 2 | 0 |
Liverpool FC | Huddersfield Town AFC | 5 | 0 |
Liverpool FC | Wolverhampton Wanderers FC | 2 | 0 |
The data can be manipulated further to compare the actual scores with the predicted scores. This is shown for the home games of Liverpool FC, a team that consistently performs very well, and of Huddersfield Town AFC, a team that finished at the bottom of the table.
home | away | actualscore | predscore |
---|---|---|---|
Liverpool FC | West Ham United FC | 4-0 | 3-1 |
Liverpool FC | Brighton & Hove Albion FC | 1-0 | 3-0 |
Liverpool FC | Southampton FC | 3-0 | 3-0 |
Liverpool FC | Manchester City FC | 0-0 | 1-1 |
Liverpool FC | Cardiff City FC | 4-1 | 3-0 |
Liverpool FC | Fulham FC | 2-0 | 4-0 |
Liverpool FC | Everton FC | 1-0 | 2-1 |
Liverpool FC | Manchester United FC | 3-1 | 3-1 |
Liverpool FC | Newcastle United FC | 4-0 | 2-0 |
Liverpool FC | Arsenal FC | 5-1 | 2-1 |
Liverpool FC | Crystal Palace FC | 4-3 | 3-1 |
Liverpool FC | Leicester City FC | 1-1 | 2-1 |
Liverpool FC | AFC Bournemouth | 3-0 | 3-1 |
Liverpool FC | Watford FC | 5-0 | 3-1 |
Liverpool FC | Burnley FC | 4-2 | 3-0 |
Liverpool FC | Tottenham Hotspur FC | 2-1 | 2-1 |
Liverpool FC | Chelsea FC | 2-0 | 2-1 |
Liverpool FC | Huddersfield Town AFC | 5-0 | 3-0 |
Liverpool FC | Wolverhampton Wanderers FC | 2-0 | 2-0 |
home | away | actualscore | predscore |
---|---|---|---|
Huddersfield Town AFC | Chelsea FC | 0-3 | 0-2 |
Huddersfield Town AFC | Cardiff City FC | 0-0 | 1-1 |
Huddersfield Town AFC | Crystal Palace FC | 0-1 | 1-2 |
Huddersfield Town AFC | Tottenham Hotspur FC | 0-2 | 0-2 |
Huddersfield Town AFC | Liverpool FC | 0-1 | 0-3 |
Huddersfield Town AFC | Fulham FC | 1-0 | 1-1 |
Huddersfield Town AFC | West Ham United FC | 1-1 | 1-2 |
Huddersfield Town AFC | Brighton & Hove Albion FC | 1-2 | 1-1 |
Huddersfield Town AFC | Newcastle United FC | 0-1 | 1-1 |
Huddersfield Town AFC | Southampton FC | 1-3 | 1-1 |
Huddersfield Town AFC | Burnley FC | 1-2 | 1-1 |
Huddersfield Town AFC | Manchester City FC | 0-3 | 0-3 |
Huddersfield Town AFC | Everton FC | 0-1 | 1-2 |
Huddersfield Town AFC | Arsenal FC | 1-2 | 1-2 |
Huddersfield Town AFC | Wolverhampton Wanderers FC | 1-0 | 1-2 |
Huddersfield Town AFC | AFC Bournemouth | 0-2 | 1-2 |
Huddersfield Town AFC | Leicester City FC | 1-4 | 1-2 |
Huddersfield Town AFC | Watford FC | 1-2 | 1-2 |
Huddersfield Town AFC | Manchester United FC | 1-1 | 1-2 |
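The code that builds these comparison tables is not shown. Below is a sketch of one way to do it. `add_pred_scores()` is a hypothetical helper that predicts each side's goals with the fitted model (home side with `home = 1`, away side with `home = 0`), rounds them as in the Liverpool-Chelsea example, and pastes both scorelines into strings. To keep the example self-contained it is demonstrated on a model fitted to simulated results for four made-up teams, not on the EPL data; with the objects above, `add_pred_scores(pois_model, subset(epl_data, home == "Liverpool FC"))` would be the analogous call.

```r
# Hypothetical helper: actual vs predicted scorelines for a set of matches.
# `matches` needs columns home, away, homegoals, awaygoals; `model` must be a
# Poisson GLM fitted with the formula goals ~ home + team + opponent.
add_pred_scores <- function(model, matches) {
  pred_home <- predict(model, data.frame(home = 1, team = matches$home,
                                         opponent = matches$away), type = "response")
  pred_away <- predict(model, data.frame(home = 0, team = matches$away,
                                         opponent = matches$home), type = "response")
  data.frame(home        = matches$home,
             away        = matches$away,
             actualscore = paste(matches$homegoals, matches$awaygoals, sep = "-"),
             predscore   = paste(round(pred_home), round(pred_away), sep = "-"),
             stringsAsFactors = FALSE)
}

# Simulated league (made-up teams and random scores) to exercise the helper
set.seed(42)
teams <- c("Team A", "Team B", "Team C", "Team D")
fixtures <- expand.grid(home = teams, away = teams, stringsAsFactors = FALSE)
fixtures <- fixtures[fixtures$home != fixtures$away, ]
fixtures <- fixtures[rep(seq_len(nrow(fixtures)), 5), ]   # each fixture played 5 times
fixtures$homegoals <- rpois(nrow(fixtures), 1.5)
fixtures$awaygoals <- rpois(nrow(fixtures), 1.2)

# Same long format and model formula as in the main example
sim_long <- rbind(
  data.frame(team = fixtures$home, opponent = fixtures$away, home = 1,
             goals = fixtures$homegoals, stringsAsFactors = FALSE),
  data.frame(team = fixtures$away, opponent = fixtures$home, home = 0,
             goals = fixtures$awaygoals, stringsAsFactors = FALSE))
sim_model <- glm(goals ~ home + team + opponent, family = poisson(link = log),
                 data = sim_long)

add_pred_scores(sim_model, fixtures[fixtures$home == "Team A", ])
```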
This example showed how to apply Poisson regression to sports data to draw some statistical insights and predict game scores. While the insights may be satisfying, there are many variable factors in sports leagues, such as injuries or transfers, that can have a significant impact on a team’s performance. Such complexity is not captured in this model, but the interesting results from this simple model can be a foundation for more robust score prediction exercises.