3/25/2021

Data Source

  • The data source will be: https://www.football-data.org/
  • The source provides data on various football (soccer) competitions. Its API exposes information on competitions, matches, players, scores, teams, areas and line-ups, including live updates.
  • Another important data source is the European Soccer Dataset on Kaggle: https://www.kaggle.com/hugomathien/soccer
  • Data.world also hosts several soccer datasets that can be used to combine knowledge from multiple sources: https://data.world/datasets/soccer
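
To make the API-based source more concrete, here is a minimal sketch of turning nested match records into flat rows that can later be joined with the other datasets. The field names (`matches`, `utcDate`, `homeTeam`, `awayTeam`, `score.fullTime.home`, `score.fullTime.away`) and the sample payload are assumptions made for illustration; they should be checked against the football-data.org documentation before use. No network call is made here.

```python
from typing import Any, Dict, List


def flatten_match(match: Dict[str, Any]) -> Dict[str, Any]:
    """Turn one nested match record into a flat row.

    The key names used here are assumed, not taken from the API docs.
    Missing or null fields become None instead of raising an error.
    """
    score = match.get("score") or {}
    full_time = score.get("fullTime") or {}
    home_team = match.get("homeTeam") or {}
    away_team = match.get("awayTeam") or {}
    return {
        "date": match.get("utcDate"),
        "home_team": home_team.get("name"),
        "away_team": away_team.get("name"),
        "home_goals": full_time.get("home"),
        "away_goals": full_time.get("away"),
    }


def flatten_matches(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten every record found under the (assumed) 'matches' key."""
    return [flatten_match(m) for m in payload.get("matches") or []]


# Hypothetical payload with one finished game and one game not yet played.
sample_payload = {
    "matches": [
        {
            "utcDate": "2021-03-20T15:00:00Z",
            "homeTeam": {"name": "Team A"},
            "awayTeam": {"name": "Team B"},
            "score": {"fullTime": {"home": 2, "away": 1}},
        },
        {
            "utcDate": "2021-04-03T15:00:00Z",
            "homeTeam": {"name": "Team B"},
            "awayTeam": {"name": "Team C"},
            "score": {"fullTime": {"home": None, "away": None}},
        },
    ]
}

rows = flatten_matches(sample_payload)
for row in rows:
    print(row)
```

Keeping unplayed games as rows with empty goal fields, rather than dropping them, leaves the decision about how to treat them to the later wrangling step.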

Problem Description

  • Prediction of football game results is one of the most valued and promising subjects in football analytics. The betting industry is largely built on the potential results of games and the assignment of probabilities to each possible outcome of an upcoming game. In this project, quantitative predictions will be made for football games. Depending on the availability of data from the API, these predictions will include:
  • Scores of teams or players,
  • Number of red cards or yellow cards during a game,
  • Number of corners,
  • Prediction of the winning team in a game based on historical data, and more.

Analytics Plan

  • Predicting game results is a challenging subject that has been studied by various researchers and practitioners. In this project, a study to predict game results will be conducted based on historical score and game data from across competitions. The main focus of this study will be game scores. Other potential studies are:
  • Football fans are familiar with the idea that referees can be highly influential in certain games. Subject to the limited availability of referee data, patterns in referees' potential impact on game scores will be explored.
  • Certain football players score more than others. Some of these differences are due to position, while others are seasonal. Based on historical data, the scoring of individual players will be predicted.

Technical Approach to Analytics Plan


- Because the football API provides a rich and sophisticated set of information, extensive data wrangling will be applied in order to extract the right information from it.

- During the analysis, different ML models such as regression, support vector machines and clustering will be utilized. Clustering is expected to be useful for projecting game situations and for visualizing and exploring the data. It is also planned to serve as a front application providing deeper insights into potential applications.

- Deep learning models will predict the game results of individual teams based on the historical dataset.
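
As an illustration of the clustering step mentioned above, here is a minimal k-means sketch in plain Python. The team features (goals scored and goals conceded per game) and their values are hypothetical, and the deterministic "first k points" initialization is a simplification chosen for reproducibility. In practice an established implementation such as scikit-learn's `KMeans` would likely be used instead.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: Sequence[Sequence[float]]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(
    points: Sequence[Sequence[float]], k: int, max_iter: int = 100
) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm with a deterministic start (first k points)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new_centroids.append(mean_point(members) if members else centroids[c])
        if new_centroids == centroids:  # no movement, so the result is stable
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical per-team features: (goals scored per game, goals conceded per game).
team_stats = [(2.4, 0.8), (0.9, 1.9), (2.1, 1.0), (1.0, 1.7), (2.3, 0.9), (0.8, 2.0)]
labels, centroids = kmeans(team_stats, k=2)
print(labels)
print(centroids)
```

With these made-up numbers the teams split into a high-scoring group and a low-scoring group, which is the kind of grouping that could then be plotted for exploration.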

Evaluation

  • Our results will be showcased in various data visualization formats. These will include scatter plots and line graphs that show the relationship between our chosen dependent and independent variables.
  • Confusion matrices, accuracy, precision, recall, ROC-AUC, R^2 and other useful metrics will be used to evaluate the game prediction and score-based regression models.
  • Thank you for listening!
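
For reference, several of the metrics named in the Evaluation section can be sketched in plain Python as follows. The labels (1 = home win, 0 = otherwise) and all sample values are hypothetical, and ROC-AUC is left out because it needs predicted probabilities rather than hard labels. This is a sketch for illustration, not the project's evaluation code.

```python
from typing import Dict, Sequence


def binary_confusion(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, int]:
    """Count true/false positives and negatives for a binary outcome."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1
        elif pred == positive:
            counts["fp"] += 1
        elif truth == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def classification_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision and recall; 0.0 when a denominator is zero."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination for a regression model."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical game-outcome predictions (1 = home win, 0 = otherwise).
actual_outcomes = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_outcomes = [1, 0, 0, 1, 0, 1, 1, 0]
confusion = binary_confusion(actual_outcomes, predicted_outcomes)
metrics = classification_metrics(confusion)

# Hypothetical goals-per-game predictions for the regression model.
actual_goals = [1, 2, 0, 3]
predicted_goals = [1.5, 1.5, 0.5, 2.5]
r2 = r_squared(actual_goals, predicted_goals)

print(confusion, metrics, r2)
```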