The start of the Premier League season marks the return of some of the most exciting football in the world. For me and other Fantasy Premier League (FPL) managers, it also means weekends that begin with fans chanting from the Emirates at 4:30 AM (if you’re on the West Coast of the US like me), all while furiously refreshing the FPL points screen to monitor results from other matches. As in other fantasy sports, analytics have become increasingly prevalent in building the ultimate FPL lineup. Given my obsession with all things Premier League, and because I am both an FPL Draft player and someone who works with data and modeling, I decided to take the plunge into FPL analytics and developed a model that predicts player points by gameweek for the 2023/24 season.
The model is an ordinary least squares (OLS) linear regression, which estimates the value of a dependent variable (in this case, FPL points) from one or more independent variables. The independent variables used in this model are listed in full in the documentation section near the end of this post.
This model repository can be found in the following Google Drive location: https://drive.google.com/drive/folders/1MczPsv4Kffw2A_HayIDiYyFrxeHz97Cv?usp=drive_link
This repository also features a script for a Shiny tool that optimizes your FPL draft team and directly compares the predicted output of two players by position and gameweek. For more details on how the model performs specific calculations, read the user guide stored in the repository.
This model relies heavily on live gameweek data catalogued and provided by GitHub user vaastav, whose own GitHub repository contains weekly performance data for each player going back to the 2016/17 season (1). This includes weekly updates for the 2023/24 season from the Premier League API. Additionally, this model uses the R package fplr (2) to validate data from the Premier League API, apply transfer and injury news, and update live form for players.
As previously noted, the model predicts the points scored by each player in each gameweek. These predictions depend on the opponent, historical matchup data for both the teams and the players, and current season performance data. The predicted points are then adjusted to reflect injuries (including the probability of playing or not playing), transfers, and live form.
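To make the injury adjustment concrete, here is a minimal Python sketch (the original model is written in R). It assumes the adjustment is a simple multiplication of the raw prediction by the probability of playing, which may not be exactly what the model does; the function name, player names, and numbers are all hypothetical.

```python
from typing import Optional


def adjust_for_availability(predicted_points: float,
                            chance_of_playing: Optional[float]) -> float:
    """Scale a raw prediction by the probability that the player features.

    chance_of_playing is a percentage in [0, 100]. None is treated as
    "no injury flag", i.e. fully available (an assumption of this sketch).
    """
    if chance_of_playing is None:
        return predicted_points
    if not 0 <= chance_of_playing <= 100:
        raise ValueError("chance_of_playing must be between 0 and 100")
    return predicted_points * chance_of_playing / 100.0


# Hypothetical players and numbers, for illustration only.
raw_predictions = {"Player A": 6.0, "Player B": 4.0, "Player C": 5.0}
availability = {"Player A": None, "Player B": 75, "Player C": 0}

adjusted = {
    name: adjust_for_availability(points, availability[name])
    for name, points in raw_predictions.items()
}
print(adjusted)
```

Under this assumption, a player flagged as 75% likely to play keeps three quarters of the raw prediction, and a player ruled out drops to zero.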
The accuracy of the model is validated by comparing the model output to week-by-week performance during the 2023/24 season. The model performance statistics as of gameweek 4 can be seen below:
| R^2 | RMSE | Avg. Error |
|---|---|---|
| 0.71 | 0.85 | 0.45 |
The model currently overestimates true weekly point totals by about 0.5 points on average. Visual inspection of the model shows no clear signs of overfitting or underfitting.
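For readers who want to reproduce this kind of validation, the three statistics can be computed as in the Python sketch below. It assumes that "Avg. Error" means the mean signed error (predicted minus actual), which is consistent with the overestimate described above, and that R^2 is computed as one minus the ratio of residual to total sum of squares. The sample numbers are made up and are not the model's results.

```python
import math


def regression_metrics(actual, predicted):
    """Return R^2, RMSE, and mean signed error (predicted - actual)."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    errors = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mean_error": sum(errors) / n,           # positive means overestimation
    }


# Made-up gameweek points, for illustration only.
actual_points = [2, 6, 1, 8, 3]
predicted_points = [3, 5, 2, 8, 4]

metrics = regression_metrics(actual_points, predicted_points)
print(metrics)
```

A positive mean error indicates that predictions run high on average, while RMSE penalizes large individual misses regardless of direction.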
The repository also contains an R Shiny app designed to optimize lineups for FPL draft league managers. The tool allows you to upload the model results, select all the players in your team, select a gameweek, and see the optimal lineup to play with your given roster. I have included my own roster in the examples below to illustrate how the tool can be used:
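The tool's actual optimization logic is described in the repository's user guide. As a rough illustration of the idea only, the Python sketch below picks a starting eleven from a roster by predicted points. It assumes a lineup of exactly one goalkeeper, at least three defenders, at least two midfielders, and at least one forward; the player names and point values are hypothetical.

```python
from collections import defaultdict

# Assumed formation rules for this sketch.
MIN_PER_POSITION = {"GK": 1, "DEF": 3, "MID": 2, "FWD": 1}
LINEUP_SIZE = 11


def pick_lineup(roster):
    """roster: list of (name, position, predicted_points) with unique names."""
    ranked = sorted(roster, key=lambda p: p[2], reverse=True)
    lineup, counts = [], defaultdict(int)

    # Pass 1: satisfy each positional minimum with the best players available.
    for player in ranked:
        position = player[1]
        if counts[position] < MIN_PER_POSITION[position]:
            lineup.append(player)
            counts[position] += 1
    for position, minimum in MIN_PER_POSITION.items():
        if counts[position] < minimum:
            raise ValueError(f"not enough players at {position}")

    # Pass 2: fill the remaining slots with the best remaining outfielders.
    chosen = {player[0] for player in lineup}
    for player in ranked:
        if len(lineup) == LINEUP_SIZE:
            break
        if player[0] in chosen or player[1] == "GK":
            continue
        lineup.append(player)
        chosen.add(player[0])
    if len(lineup) < LINEUP_SIZE:
        raise ValueError("roster is too small to field a full lineup")
    return lineup


# Hypothetical 15-player roster with made-up predictions.
roster = [
    ("GK1", "GK", 4.0), ("GK2", "GK", 3.0),
    ("D1", "DEF", 5.0), ("D2", "DEF", 4.5), ("D3", "DEF", 3.0),
    ("D4", "DEF", 2.5), ("D5", "DEF", 2.0),
    ("M1", "MID", 7.0), ("M2", "MID", 6.0), ("M3", "MID", 4.0),
    ("M4", "MID", 3.5), ("M5", "MID", 1.0),
    ("F1", "FWD", 6.5), ("F2", "FWD", 3.2), ("F3", "FWD", 2.8),
]

lineup = pick_lineup(roster)
total = sum(points for _, _, points in lineup)
print(sorted(name for name, _, _ in lineup), round(total, 1))
```

Because the only constraints here are positional minimums and a single goalkeeper, filling the minimums first and then taking the best remaining outfield players is enough; a tool with extra constraints would need a proper optimizer.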
Additionally, the tool can directly compare players by gameweek for waiver or other transfer decisions. This is done by comparing the following three metrics:
Z-scores for players’ predicted points, overall and by position;
Direct comparison of predicted points.
The screenshot below shows the tool comparing Malo Gusto and Pedro Porro for gameweek 4, with Gusto showing a more favorable matchup and projection.
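For context, a z-score expresses how far a player's predicted points sit from the average, measured in standard deviations. The Python sketch below computes z-scores both across all players and within each position. It uses the population standard deviation, which is an assumption (the tool may use the sample version), and the players and numbers are hypothetical rather than taken from the model.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values using the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]   # no spread, so every score is average
    return [(v - mu) / sigma for v in values]


# Hypothetical players and predicted points, for illustration only.
players = [
    ("Defender A", "DEF", 5.0),
    ("Defender B", "DEF", 3.0),
    ("Midfielder A", "MID", 8.0),
    ("Midfielder B", "MID", 4.0),
]

# Overall z-scores: every player compared against the whole pool.
overall = dict(zip(
    (name for name, _, _ in players),
    z_scores([points for _, _, points in players]),
))

# Positional z-scores: each player compared only against the same position.
by_position_groups = defaultdict(list)
for name, position, points in players:
    by_position_groups[position].append((name, points))

by_position = {}
for position, group in by_position_groups.items():
    scores = z_scores([points for _, points in group])
    for (name, _), score in zip(group, scores):
        by_position[name] = score

print(overall)
print(by_position)
```

In this toy data, Defender A is exactly average overall but a full standard deviation above the other defender, which is why the positional view can matter for waiver decisions.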
I will personally use this tool to assess my week-to-week lineup in the FPL draft league; however, the code could also be adapted for use in public leagues.
The documentation for this model can be found in the previously referenced repository. The model is a simple OLS regression that uses the following independent variables to predict total points for a given gameweek:
Player
Team
Opponent team
Probability of playing by historical player trends
Probability of starting by historical player trends
Goals scored per 90 minutes by historical player trends
Goals conceded per 90 minutes by historical player trends
Assists per 90 minutes by historical player trends
Clean sheets per 90 minutes by historical player trends
Saves per 90 minutes by historical player trends
Yellow cards per 90 minutes by historical player trends
Red cards per 90 minutes by historical player trends
Penalties missed per 90 minutes by historical player trends
Penalties saved per 90 minutes by historical player trends
Fixture difficulty rating, determined by number of points scored
For prediction, the model replaces the previous season's statistics with those for the 2023/24 season.
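The fit-then-predict workflow can be sketched as follows. The original model is written in R and uses the full variable list above; this Python sketch is only an illustration with two per-90 features, made-up data, and hypothetical function names. It fits the regression on previous-season rates by solving the normal equations, then predicts using current-season rates.

```python
def per_90(total, minutes):
    """Convert a season total into a per-90-minutes rate."""
    return 90.0 * total / minutes if minutes > 0 else 0.0


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear features)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [x - factor * y for x, y in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(features, targets):
    """Return [intercept, b1, b2, ...] minimizing the squared error."""
    rows = [[1.0] + list(f) for f in features]   # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, feature_row):
    """Apply fitted coefficients to one row of features."""
    return coefficients[0] + sum(
        c * x for c, x in zip(coefficients[1:], feature_row)
    )


# Made-up previous-season data: (goals, assists, minutes, points per gameweek).
previous_season = [
    (18, 9, 1800, 5.95),
    (9, 9, 2700, 3.10),
    (0, 6, 1800, 1.90),
    (4, 0, 900, 2.60),
    (0, 0, 900, 1.00),
]
train_x = [(per_90(g, m), per_90(a, m)) for g, a, m, _ in previous_season]
train_y = [points for _, _, _, points in previous_season]
coefficients = fit_ols(train_x, train_y)

# For prediction, swap in current-season rates for the same player.
current_goals, current_assists, current_minutes = 2, 1, 360
current_features = (
    per_90(current_goals, current_minutes),
    per_90(current_assists, current_minutes),
)
prediction = predict(coefficients, current_features)
print([round(c, 3) for c in coefficients], round(prediction, 2))
```

The toy targets were constructed to be exactly linear in the two features, so the fit recovers the generating coefficients; real FPL data would of course leave residual error.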
The script also pulls live form, injury, suspension, and transfer values and news from the FPL API using the fplr package to update player availability and predicted points.
Please see the model repository listed above for the original R code and user guide in Word document format.
Future models and predictions could incorporate underlying statistics such as expected goals, expected assists, or expected clean sheets (xG, xA, xCS). However, these statistics may be better suited to understanding player form during the season, since a gap between a player's actual and expected output could indicate whether that player is overperforming or underperforming.
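As a small illustration of that idea, the Python sketch below flags players whose goals differ from their expected goals by more than a chosen margin. The one-goal tolerance, the function name, and the player data are all hypothetical and are not part of the current model.

```python
def classify_form(goals, expected_goals, tolerance=1.0):
    """Label a player by the gap between actual and expected goals."""
    delta = goals - expected_goals
    if delta > tolerance:
        return "overperforming"
    if delta < -tolerance:
        return "underperforming"
    return "in line with expectation"


# Hypothetical players: (name, goals, xG), for illustration only.
sample = [
    ("Forward A", 6, 3.2),
    ("Forward B", 1, 4.1),
    ("Forward C", 3, 2.8),
]

labels = {name: classify_form(goals, xg) for name, goals, xg in sample}
print(labels)
```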