Poisson Regression

Sports generate vast amounts of data. Analyzing that data is useful not only for coaches trying to understand the relative strengths and weaknesses of teams, but also for avid bettors around the world looking for an edge. Given the popularity of fantasy sports and betting sites, regression can be a valuable tool for predicting outcomes and supporting decision making.

Depending on the sport in question, bets can be placed on a variety of events. These can be simple (team A will beat team B), more granular (the score between team A and team B will be 3-0) or even more refined (there will be 3 corner kicks in a game, there will be exactly 1 red card, at least 2 goals will be scored by any team in the first half of a game). The refined examples can get quite specific, and domain knowledge can go a long way in creating and selecting appropriate predictors.

As mentioned above, one of the simplest bets to place is whether or not a team will win. Winning is determined by scoring more points than the other team, and goals or baskets scored are effectively counts. This suggests working with a count data model such as the Poisson. The Poisson distribution describes the probability of a given number of events occurring within a fixed time period. It is characterized by a single parameter, \(\lambda\), which represents the rate of occurrence (in our case, the average number of goals per game).
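
For reference, the distribution described above can be written out explicitly. The symbols \(X\) (the number of goals) and \(k\) (a particular count) are notation introduced here for illustration:

\[
P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
\]

Both the mean and the variance of this distribution equal \(\lambda\), which is why a single average goal rate is enough to specify it.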

Data

The data is obtained from https://www.football-data.org/, which contains a wealth of information about football competitions, matches and players. We make a request to the API and parse the content. We then iterate through the matches of the 2018-2019 EPL season to build a data frame with one row per game, containing the home and away teams and the number of goals scored by each. The first rows are shown below.

home                        away                        homegoals  awaygoals
Manchester United FC        Leicester City FC           2          1
Newcastle United FC         Tottenham Hotspur FC        1          2
Fulham FC                   Crystal Palace FC           0          2
Huddersfield Town AFC       Chelsea FC                  0          3
Watford FC                  Brighton & Hove Albion FC   2          0
AFC Bournemouth             Cardiff City FC             2          0
Wolverhampton Wanderers FC  Everton FC                  2          2
Liverpool FC                West Ham United FC          4          0
Southampton FC              Burnley FC                  0          0
Arsenal FC                  Manchester City FC          0          2
Cardiff City FC             Newcastle United FC         0          0
Tottenham Hotspur FC        Fulham FC                   3          1
Everton FC                  Southampton FC              2          1
Leicester City FC           Wolverhampton Wanderers FC  2          0
West Ham United FC          AFC Bournemouth             1          2
Chelsea FC                  Arsenal FC                  3          2
Manchester City FC          Huddersfield Town AFC       6          1
Burnley FC                  Watford FC                  1          3
Brighton & Hove Albion FC   Manchester United FC        3          2
Crystal Palace FC           Liverpool FC                0          2

The mean home and away goals over the whole season are computed and used as the rate parameters of two Poisson distributions, one for home goals and one for away goals. Plotting the observed goal counts against these distributions shows that the Poisson is an appropriate fit. If the counts were overdispersed (variance noticeably larger than the mean), the negative binomial would be an alternative distribution to consider.

##   avg_home_goals avg_away_goals
## 1       1.568421       1.252632

With this information, we can now estimate the probability of events such as the home team scoring at least two goals, or a match ending in a draw. The two values below are those probabilities, in that order.

## [1] 0.464808
## [1] 0.2481926
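
The two probabilities above can be checked by hand from the season averages. The following is a minimal Python sketch rather than the original analysis code (the output shown in this article comes from R). The helper `poisson_pmf`, the cut-off `MAX_GOALS` and the assumption that home and away goals are independent are all introduced here for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return lam ** k * exp(-lam) / factorial(k)

# Season averages reported above
avg_home_goals = 1.568421
avg_away_goals = 1.252632

# P(home team scores at least two) = 1 - P(0 goals) - P(1 goal)
p_home_two_plus = 1 - poisson_pmf(0, avg_home_goals) - poisson_pmf(1, avg_home_goals)

# P(draw) = sum over k of P(home = k) * P(away = k), assuming the two
# goal counts are independent. MAX_GOALS is an arbitrary cut-off; the
# probability mass beyond it is negligible for rates this small.
MAX_GOALS = 20
p_draw = sum(
    poisson_pmf(k, avg_home_goals) * poisson_pmf(k, avg_away_goals)
    for k in range(MAX_GOALS + 1)
)

print(round(p_home_two_plus, 6), round(p_draw, 6))
```

Under these assumptions the printed values should agree with the two figures above up to rounding.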

Modeling

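The data is restructured for modeling, with goals as the target variable and one row per team per game: the team, its opponent and a home indicator are the predictors. Poisson regression is a generalized linear model (GLM) in which the expected value of the response is related to the linear predictor through a log link.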
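
Written out, a model with a home indicator, a team effect and an opponent effect takes the following form. The symbols \(\mu\) and \(\beta\) are notation introduced here for illustration and do not come from the original analysis:

\[
\log(\mu) = \beta_0 + \beta_{\text{home}} \cdot \text{home} + \beta_{\text{team}} + \beta_{\text{opponent}}
\]

Here \(\mu\) is the expected number of goals for a team in a given game. Exponentiating both sides shows that each coefficient acts multiplicatively on the expected goal count.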

team                        opponent                    home  goals
Manchester United FC        Leicester City FC           1     2
Newcastle United FC         Tottenham Hotspur FC        1     1
Fulham FC                   Crystal Palace FC           1     0
Huddersfield Town AFC       Chelsea FC                  1     0
Watford FC                  Brighton & Hove Albion FC   1     2
AFC Bournemouth             Cardiff City FC             1     2
Wolverhampton Wanderers FC  Everton FC                  1     2
Liverpool FC                West Ham United FC          1     4
Southampton FC              Burnley FC                  1     0
Arsenal FC                  Manchester City FC          1     0
Cardiff City FC             Newcastle United FC         1     0
Tottenham Hotspur FC        Fulham FC                   1     3
Everton FC                  Southampton FC              1     2
Leicester City FC           Wolverhampton Wanderers FC  1     2
West Ham United FC          AFC Bournemouth             1     1
Chelsea FC                  Arsenal FC                  1     3
Manchester City FC          Huddersfield Town AFC       1     6
Burnley FC                  Watford FC                  1     1
Brighton & Hove Albion FC   Manchester United FC        1     3
Crystal Palace FC           Liverpool FC                1     0
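
The reshaping from one row per match to one row per team per match can be sketched as follows. This is an illustrative Python sketch, not the original R code; the function name `to_long_format` and the use of plain dictionaries are assumptions made for the example, and only two matches from the data above are included.

```python
def to_long_format(matches):
    """Turn one row per match into one row per team per match."""
    rows = []
    for m in matches:
        # The home side's row: home indicator set to 1
        rows.append({"team": m["home"], "opponent": m["away"],
                     "home": 1, "goals": m["homegoals"]})
        # The away side's row: home indicator set to 0
        rows.append({"team": m["away"], "opponent": m["home"],
                     "home": 0, "goals": m["awaygoals"]})
    return rows

# Two matches copied from the match table
matches = [
    {"home": "Manchester United FC", "away": "Leicester City FC",
     "homegoals": 2, "awaygoals": 1},
    {"home": "Newcastle United FC", "away": "Tottenham Hotspur FC",
     "homegoals": 1, "awaygoals": 2},
]

model_rows = to_long_format(matches)
for row in model_rows:
    print(row)
```

Each match contributes two rows, so a 380-game season yields 760 observations, which is consistent with the 759 degrees of freedom reported for the null deviance. The order of the rows does not affect the fitted model.
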
## 
## Call:
## glm(formula = goals ~ home + team + opponent, family = poisson(link = log), 
##     data = model_data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.22803  -1.02266  -0.07358   0.50507   2.95532  
## 
## Coefficients:
##                                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         0.309958   0.190911   1.624 0.104466    
## home                                0.224823   0.061471   3.657 0.000255 ***
## teamNewcastle United FC            -0.443771   0.198225  -2.239 0.025174 *  
## teamFulham FC                      -0.623130   0.211966  -2.940 0.003285 ** 
## teamHuddersfield Town AFC          -1.064297   0.246910  -4.310 1.63e-05 ***
## teamWatford FC                     -0.218885   0.186353  -1.175 0.240164    
## teamAFC Bournemouth                -0.133506   0.182665  -0.731 0.464851    
## teamWolverhampton Wanderers FC     -0.333015   0.191725  -1.737 0.082397 .  
## teamLiverpool FC                    0.283268   0.163386   1.734 0.082965 .  
## teamSouthampton FC                 -0.357939   0.194228  -1.843 0.065346 .  
## teamArsenal FC                      0.113481   0.170847   0.664 0.506546    
## teamCardiff City FC                -0.635008   0.211935  -2.996 0.002733 ** 
## teamTottenham Hotspur FC            0.015482   0.174361   0.089 0.929246    
## teamEverton FC                     -0.193872   0.184394  -1.051 0.293074    
## teamLeicester City FC              -0.249205   0.187329  -1.330 0.183419    
## teamWest Ham United FC             -0.222848   0.186341  -1.196 0.231729    
## teamChelsea FC                     -0.046227   0.177056  -0.261 0.794027    
## teamManchester City FC              0.349636   0.161204   2.169 0.030090 *  
## teamBurnley FC                     -0.354960   0.194236  -1.827 0.067630 .  
## teamBrighton & Hove Albion FC      -0.614774   0.209921  -2.929 0.003405 ** 
## teamCrystal Palace FC              -0.244292   0.187344  -1.304 0.192241    
## opponentTottenham Hotspur FC       -0.192416   0.215842  -0.891 0.372680    
## opponentCrystal Palace FC           0.099336   0.199498   0.498 0.618532    
## opponentChelsea FC                 -0.196356   0.215832  -0.910 0.362948    
## opponentBrighton & Hove Albion FC   0.207993   0.193860   1.073 0.283316    
## opponentCardiff City FC             0.347072   0.188169   1.844 0.065114 .  
## opponentEverton FC                 -0.039719   0.206574  -0.192 0.847527    
## opponentWest Ham United FC          0.137465   0.197774   0.695 0.487018    
## opponentBurnley FC                  0.343316   0.188765   1.819 0.068951 .  
## opponentManchester City FC         -0.693964   0.253873  -2.734 0.006266 ** 
## opponentNewcastle United FC        -0.008777   0.204340  -0.043 0.965741    
## opponentFulham FC                   0.507814   0.182377   2.784 0.005362 ** 
## opponentSouthampton FC              0.298063   0.190554   1.564 0.117773    
## opponentWolverhampton Wanderers FC -0.046560   0.206557  -0.225 0.821661    
## opponentAFC Bournemouth             0.383428   0.187682   2.043 0.041056 *  
## opponentArsenal FC                  0.082663   0.201403   0.410 0.681488    
## opponentHuddersfield Town AFC       0.432121   0.184554   2.341 0.019210 *  
## opponentWatford FC                  0.207872   0.194634   1.068 0.285513    
## opponentManchester United FC        0.131993   0.198659   0.664 0.506420    
## opponentLiverpool FC               -0.744472   0.257723  -2.889 0.003869 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 975.98  on 759  degrees of freedom
## Residual deviance: 762.55  on 720  degrees of freedom
## AIC: 2210.2
## 
## Number of Fisher Scoring iterations: 5

The same coefficients as a table:

term                                 estimate   std_err     z_val      p_val
(Intercept)                          0.3099584  0.1909109   1.6235762  0.1044663
home                                 0.2248228  0.0614710   3.6573827  0.0002548
teamNewcastle United FC             -0.4437708  0.1982247  -2.2387257  0.0251738
teamFulham FC                       -0.6231297  0.2119663  -2.9397578  0.0032847
teamHuddersfield Town AFC           -1.0642969  0.2469099  -4.3104660  0.0000163
teamWatford FC                      -0.2188852  0.1863525  -1.1745760  0.2401644
teamAFC Bournemouth                 -0.1335065  0.1826649  -0.7308819  0.4648513
teamWolverhampton Wanderers FC      -0.3330153  0.1917251  -1.7369417  0.0823975
teamLiverpool FC                     0.2832683  0.1633864   1.7337329  0.0829655
teamSouthampton FC                  -0.3579393  0.1942277  -1.8428849  0.0653458
teamArsenal FC                       0.1134810  0.1708471   0.6642255  0.5065460
teamCardiff City FC                 -0.6350079  0.2119351  -2.9962372  0.0027333
teamTottenham Hotspur FC             0.0154822  0.1743605   0.0887942  0.9292455
teamEverton FC                      -0.1938720  0.1843940  -1.0514010  0.2930745
teamLeicester City FC               -0.2492048  0.1873294  -1.3303026  0.1834186
teamWest Ham United FC              -0.2228478  0.1863406  -1.1959164  0.2317292
teamChelsea FC                      -0.0462267  0.1770562  -0.2610846  0.7940273
teamManchester City FC               0.3496364  0.1612044   2.1689010  0.0300902
teamBurnley FC                      -0.3549598  0.1942362  -1.8274648  0.0676299
teamBrighton & Hove Albion FC       -0.6147739  0.2099211  -2.9285950  0.0034050
teamCrystal Palace FC               -0.2442920  0.1873435  -1.3039789  0.1922408
opponentTottenham Hotspur FC        -0.1924161  0.2158425  -0.8914653  0.3726796
opponentCrystal Palace FC            0.0993365  0.1994979   0.4979325  0.6185316
opponentChelsea FC                  -0.1963557  0.2158321  -0.9097614  0.3629484
opponentBrighton & Hove Albion FC    0.2079926  0.1938602   1.0728997  0.2833161
opponentCardiff City FC              0.3470723  0.1881687   1.8444744  0.0651140
opponentEverton FC                  -0.0397191  0.2065741  -0.1922752  0.8475267
opponentWest Ham United FC           0.1374646  0.1977741   0.6950588  0.4870184
opponentBurnley FC                   0.3433157  0.1887654   1.8187428  0.0689507
opponentManchester City FC          -0.6939639  0.2538727  -2.7335115  0.0062663
opponentNewcastle United FC         -0.0087766  0.2043400  -0.0429510  0.9657406
opponentFulham FC                    0.5078137  0.1823766   2.7844229  0.0053623
opponentSouthampton FC               0.2980625  0.1905540   1.5641891  0.1177732
opponentWolverhampton Wanderers FC  -0.0465599  0.2065567  -0.2254097  0.8216606
opponentAFC Bournemouth              0.3834284  0.1876823   2.0429656  0.0410558
opponentArsenal FC                   0.0826626  0.2014031   0.4104339  0.6814877
opponentHuddersfield Town AFC        0.4321208  0.1845543   2.3414292  0.0192101
opponentWatford FC                   0.2078724  0.1946339   1.0680174  0.2855127
opponentManchester United FC         0.1319935  0.1986590   0.6644223  0.5064201
opponentLiverpool FC                -0.7444725  0.2577228  -2.8886555  0.0038689

From the summary of this model, we can draw some interesting conclusions. Team and opponent coefficients are relative to the reference levels, Manchester United FC and Leicester City FC respectively, which is why those two do not appear in the output:

  • The variable home is significant with a coefficient of 0.224823. Because of the log link this corresponds to a factor of exp(0.224823) ≈ 1.25, so a team playing at home is expected to score about 25% more goals than the same team playing away.
  • Huddersfield, Newcastle, Fulham, Cardiff and Brighton all have significantly negative team coefficients, meaning they are expected to score fewer goals.
    • teamHuddersfield Town AFC is highly significant and has the largest negative coefficient, indicating that it is the least likely team to score. This should not be surprising given that Huddersfield finished last in the 18/19 season.
  • Teams were less likely to score when the opponent was Manchester City FC (winner) or Liverpool FC (runner-up), but more likely to score when playing Fulham, Bournemouth or Huddersfield.
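
Because the model uses a log link, exponentiating a coefficient gives its multiplicative effect on expected goals. The Python sketch below applies this to a handful of estimates copied from the summary; the dictionary names are illustrative and the selection of coefficients is arbitrary.

```python
from math import exp

# Selected estimates copied from the model summary (log scale)
coefficients = {
    "home": 0.224823,
    "teamHuddersfield Town AFC": -1.064297,
    "opponentManchester City FC": -0.693964,
    "opponentLiverpool FC": -0.744472,
    "opponentFulham FC": 0.507814,
}

# exp(coefficient) is the factor by which expected goals are multiplied,
# relative to the reference level of that variable
rate_ratios = {name: exp(beta) for name, beta in coefficients.items()}

for name, ratio in rate_ratios.items():
    print(f"{name}: x{ratio:.2f}")
```

A ratio below 1 means fewer expected goals than the reference level and a ratio above 1 means more, holding the other predictors fixed.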

These findings might be obvious to someone with knowledge of the EPL and of that season in particular. However, we can turn to prediction for further insights.

Predictions

We use the model to predict the number of goals scored by each team in a single fixture: Liverpool (at home) vs Chelsea.

## [1] "Liverpool FC (home) 2 - 1 Chelsea FC (away)"
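
The prediction above can be reconstructed from the fitted coefficients: sum the relevant terms on the log scale and exponentiate. The Python sketch below does this for the Liverpool vs Chelsea fixture. The function `expected_goals` and the dictionaries are hypothetical names, and rounding the expected goals to whole numbers is an assumption about how the score line was produced.

```python
from math import exp

# Coefficients needed for this fixture, copied from the model summary
INTERCEPT = 0.309958
HOME = 0.224823
TEAM = {"Liverpool FC": 0.283268, "Chelsea FC": -0.046227}
OPPONENT = {"Liverpool FC": -0.744472, "Chelsea FC": -0.196356}

def expected_goals(team, opponent, at_home):
    """Expected goals for `team` against `opponent` under the log-linear model."""
    linear_predictor = INTERCEPT + TEAM[team] + OPPONENT[opponent]
    if at_home:
        linear_predictor += HOME
    return exp(linear_predictor)

liverpool = expected_goals("Liverpool FC", "Chelsea FC", at_home=True)
chelsea = expected_goals("Chelsea FC", "Liverpool FC", at_home=False)

print(f"expected goals: Liverpool FC {liverpool:.2f}, Chelsea FC {chelsea:.2f}")
print(f"rounded: Liverpool FC (home) {round(liverpool)} - {round(chelsea)} Chelsea FC (away)")
```

The expected goals come out at roughly 1.86 for Liverpool and 0.62 for Chelsea, which round to the 2 - 1 score line shown above.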

We can pull out all of Liverpool FC's home games, together with the goals scored by each side.

home                        away                        homegoals  awaygoals
Liverpool FC                West Ham United FC          4          0
Liverpool FC                Brighton & Hove Albion FC   1          0
Liverpool FC                Southampton FC              3          0
Liverpool FC                Manchester City FC          0          0
Liverpool FC                Cardiff City FC             4          1
Liverpool FC                Fulham FC                   2          0
Liverpool FC                Everton FC                  1          0
Liverpool FC                Manchester United FC        3          1
Liverpool FC                Newcastle United FC         4          0
Liverpool FC                Arsenal FC                  5          1
Liverpool FC                Crystal Palace FC           4          3
Liverpool FC                Leicester City FC           1          1
Liverpool FC                AFC Bournemouth             3          0
Liverpool FC                Watford FC                  5          0
Liverpool FC                Burnley FC                  4          2
Liverpool FC                Tottenham Hotspur FC        2          1
Liverpool FC                Chelsea FC                  2          0
Liverpool FC                Huddersfield Town AFC       5          0
Liverpool FC                Wolverhampton Wanderers FC  2          0

The data can be manipulated further to compare the actual scores with the predicted scores. This is shown for the home games of Liverpool FC, a team that consistently performs very well, and Huddersfield Town AFC, a team that finished at the bottom of the table.

home                        away                        actualscore  predscore
Liverpool FC                West Ham United FC          4-0          3-1
Liverpool FC                Brighton & Hove Albion FC   1-0          3-0
Liverpool FC                Southampton FC              3-0          3-0
Liverpool FC                Manchester City FC          0-0          1-1
Liverpool FC                Cardiff City FC             4-1          3-0
Liverpool FC                Fulham FC                   2-0          4-0
Liverpool FC                Everton FC                  1-0          2-1
Liverpool FC                Manchester United FC        3-1          3-1
Liverpool FC                Newcastle United FC         4-0          2-0
Liverpool FC                Arsenal FC                  5-1          2-1
Liverpool FC                Crystal Palace FC           4-3          3-1
Liverpool FC                Leicester City FC           1-1          2-1
Liverpool FC                AFC Bournemouth             3-0          3-1
Liverpool FC                Watford FC                  5-0          3-1
Liverpool FC                Burnley FC                  4-2          3-0
Liverpool FC                Tottenham Hotspur FC        2-1          2-1
Liverpool FC                Chelsea FC                  2-0          2-1
Liverpool FC                Huddersfield Town AFC       5-0          3-0
Liverpool FC                Wolverhampton Wanderers FC  2-0          2-0

home                        away                        actualscore  predscore
Huddersfield Town AFC       Chelsea FC                  0-3          0-2
Huddersfield Town AFC       Cardiff City FC             0-0          1-1
Huddersfield Town AFC       Crystal Palace FC           0-1          1-2
Huddersfield Town AFC       Tottenham Hotspur FC        0-2          0-2
Huddersfield Town AFC       Liverpool FC                0-1          0-3
Huddersfield Town AFC       Fulham FC                   1-0          1-1
Huddersfield Town AFC       West Ham United FC          1-1          1-2
Huddersfield Town AFC       Brighton & Hove Albion FC   1-2          1-1
Huddersfield Town AFC       Newcastle United FC         0-1          1-1
Huddersfield Town AFC       Southampton FC              1-3          1-1
Huddersfield Town AFC       Burnley FC                  1-2          1-1
Huddersfield Town AFC       Manchester City FC          0-3          0-3
Huddersfield Town AFC       Everton FC                  0-1          1-2
Huddersfield Town AFC       Arsenal FC                  1-2          1-2
Huddersfield Town AFC       Wolverhampton Wanderers FC  1-0          1-2
Huddersfield Town AFC       AFC Bournemouth             0-2          1-2
Huddersfield Town AFC       Leicester City FC           1-4          1-2
Huddersfield Town AFC       Watford FC                  1-2          1-2
Huddersfield Town AFC       Manchester United FC        1-1          1-2
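
One simple way to summarize a comparison like this is to count how often the predicted score, or at least the predicted result (home win, draw, away win), matches the actual one. The Python sketch below illustrates the idea on four Liverpool FC rows copied from the comparison table (West Ham, Southampton, Manchester City and Leicester). The helper `outcome` is a hypothetical name, and the four rows are only a sample, not an evaluation of the model.

```python
def outcome(score):
    """Return 'H', 'D' or 'A' for a 'home-away' score string such as '3-1'."""
    home, away = (int(part) for part in score.split("-"))
    if home > away:
        return "H"
    if home < away:
        return "A"
    return "D"

# (actualscore, predscore) pairs copied from four Liverpool FC home games
pairs = [("4-0", "3-1"), ("3-0", "3-0"), ("0-0", "1-1"), ("1-1", "2-1")]

# Exact hits require the full score line to match
exact_hits = sum(actual == pred for actual, pred in pairs)
# Outcome hits only require the same result (home win, draw, away win)
outcome_hits = sum(outcome(actual) == outcome(pred) for actual, pred in pairs)

print(f"exact score: {exact_hits}/{len(pairs)}, correct result: {outcome_hits}/{len(pairs)}")
```

Exact scores are much harder to hit than results, so the two counts are worth reporting separately when the same check is run over a full set of fixtures.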

Conclusion

This example showed how to apply Poisson regression to sports data to draw some statistical insights and predict game scores. While the insights may be satisfying, many factors in sports leagues, such as injuries or transfers, can have a significant impact on a team’s performance. Such complexity is not captured in this model, but the results from this simple model can serve as a foundation for more robust score prediction exercises.