2025-10-27

About the Data

  • The dataset contains information on all the soccer players from the 2023/2024 season in the top 5 leagues (English Premier League, Spanish La Liga, German Bundesliga, French Ligue 1, Italian Serie A)
  • The data is sourced from Kaggle
  • There are 37 columns, most of which are statistics such as Goals, Assists, and Minutes Played

Bar Charts (plotly)

Bar Chart Analysis

  • The bar charts show the number of players from each of the 30 most common nationalities. The chart on the left includes the countries where the top 5 leagues are based (England, Germany, Spain, France, Italy), and the chart on the right excludes them.

  • Unsurprisingly, the top countries are the five league host nations, with Spain contributing the most players.

  • Looking at the chart without the top 5 league countries, Brazil has the most players, followed by Argentina, Portugal, Netherlands, and Belgium.
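The two rankings can be sketched in base R as a count of players per nationality, once with and once without the host nations. This is a minimal sketch, not the code behind the charts: the column name `Nation`, the country codes, and the toy rows are hypothetical placeholders rather than values from the Kaggle dataset.

```r
# Hypothetical toy data: one row per player, nationality stored in `Nation`
players <- data.frame(
  Nation = c("ESP", "ESP", "ESP", "BRA", "BRA", "ENG", "ARG", "FRA", "POR")
)

# Assumed codes for the five countries that host the top 5 leagues
host_nations <- c("ENG", "ESP", "GER", "FRA", "ITA")

# Count players per nationality and keep the n most common
top_nationalities <- function(nations, n = 30) {
  counts <- sort(table(nations), decreasing = TRUE)
  head(counts, n)
}

# Left chart: all nationalities; right chart: host nations removed first
with_hosts <- top_nationalities(players$Nation)
without_hosts <- top_nationalities(players$Nation[!players$Nation %in% host_nations])

print(with_hosts)
print(without_hosts)
```

Filtering before counting (rather than dropping bars afterwards) means the right chart still shows 30 nationalities once the host nations are removed.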

3D Scatterplot (plotly)

3D Scatterplot Analysis

  • Here we look at whether Progressive Runs and Progressive Carries are associated with Assists
  • There does not appear to be a strong relationship: some of the top assisters have relatively few progressive runs and carries, while some players with many runs and carries have few assists
  • The color shows the player’s position, and as expected, Midfielders and Forwards generally have more assists
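One way to put a number on "not a strong relationship" is a rank correlation between each progressive stat and assists. This is a hedged sketch of that check, not part of the original analysis: the column names `prog_runs`, `prog_carries`, and `assists` are hypothetical, and the five rows are made-up toy values.

```r
# Hypothetical toy values for five players (not from the dataset)
stats <- data.frame(
  prog_runs    = c(120, 45, 200, 80, 10),
  prog_carries = c(60, 30, 150, 20, 5),
  assists      = c(4, 9, 3, 6, 0)
)

# Spearman rank correlation of each progressive stat with assists;
# ranks make it less sensitive to a handful of extreme players than Pearson
r <- cor(stats[, c("prog_runs", "prog_carries")], stats$assists,
         method = "spearman")
r
```

Values near 0 would support the visual impression from the 3D plot; values near 1 or -1 would indicate a strong monotonic relationship.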

Line Graphs (ggplot)

Line Graph Analysis

  • Forwards and Midfielders have the most goals and assists
  • Defenders have few goals and assists, and goalkeepers have almost none at every age
  • For most positions, goals plus assists (G+A) pick up at around 19 years old, when players start to score and assist more
  • Most positions drop in G+A as players get older, except Forwards, whose G+A keeps rising with age
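Each line in the graph can be thought of as G+A aggregated by position and age. The sketch below assumes the lines show a total (sum) per position and age; whether the original plot used sums or averages is not stated. The column names `Pos`, `Age`, `Gls`, and `Ast` and the toy rows are hypothetical.

```r
# Hypothetical toy rows: position, age, goals and assists per player
players <- data.frame(
  Pos = c("FW", "FW", "MF", "MF", "DF", "GK"),
  Age = c(19, 19, 19, 25, 25, 25),
  Gls = c(5, 3, 2, 4, 1, 0),
  Ast = c(2, 1, 3, 5, 1, 0)
)

# G+A: goals plus assists for each player
players$GA <- players$Gls + players$Ast

# Assumed aggregation: total G+A for each position at each age,
# so each row becomes one point on that position's line
ga_by_age <- aggregate(GA ~ Pos + Age, data = players, FUN = sum)
ga_by_age
```

The resulting data frame could then be drawn with ggplot2 using `aes(x = Age, y = GA, colour = Pos)` and `geom_line()`.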

Scatterplot (ggplot)

Scatterplot Analysis

  • This plot compares xG (Expected Goals) with Goals (actual goals scored) to see how well the xG stat predicts goals.
  • Since xG is continuous and Goals is discrete, a player was counted as performing as expected if the absolute difference between the two was less than 0.5
  • For around 59% of players, xG predicted the number of goals scored to within 0.5
  • Many of the players who performed as expected had 0 xG and 0 goals
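The "performing as expected" rule reduces to one comparison per player. This minimal sketch applies it to made-up toy values; the column names `xG` and `Gls` are assumptions, and the resulting share says nothing about the real 59% figure.

```r
# Hypothetical toy values: expected goals (continuous) and actual goals (integer)
shots <- data.frame(
  xG  = c(0.0, 4.6, 10.2, 2.1, 7.4),
  Gls = c(0,   5,   8,    2,   7)
)

# "As expected" when actual goals are within 0.5 of xG (strict inequality)
shots$as_expected <- abs(shots$Gls - shots$xG) < 0.5

# Share of players whose goals matched xG under this rule
share <- mean(shots$as_expected)
share
```

A player with 0 xG and 0 goals has a difference of exactly 0, so such players always pass the rule, which is why they make up much of the "as expected" group.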

Statistical Analysis

  • This is a 5-number summary of the age of players within each league
  • Ligue 1 has the youngest players overall, though not by much: its median age is 23, compared with 25 or 26 in the other leagues
  • The age distributions across all leagues are fairly similar
## # A tibble: 5 × 6
##   Comp                 min    q1 median    q3   max
##   <chr>              <int> <dbl>  <dbl> <dbl> <int>
## 1 de Bundesliga         16    22     25    28    39
## 2 eng Premier League    15    21     25    28    38
## 3 es La Liga            16    22     26    29    40
## 4 fr Ligue 1            15    20     23    27    39
## 5 it Serie A            15    22     25    28    40
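A summary like the one above can be produced by splitting ages by league and computing the five statistics per group. This is a base-R sketch, not necessarily how the table was generated: the `Age` column name and the toy ages are hypothetical, while the league labels follow the `Comp` values shown in the table.

```r
# Hypothetical toy ages for two leagues (not from the dataset)
players <- data.frame(
  Comp = c(rep("fr Ligue 1", 5), rep("es La Liga", 5)),
  Age  = c(18, 20, 23, 27, 33, 19, 22, 26, 29, 36)
)

# min, first quartile, median, third quartile and max of a numeric vector
five_num <- function(x) {
  c(min    = min(x),
    q1     = quantile(x, 0.25, names = FALSE),
    median = median(x),
    q3     = quantile(x, 0.75, names = FALSE),
    max    = max(x))
}

# One row per league, one column per statistic
summary_by_league <- do.call(rbind, lapply(split(players$Age, players$Comp), five_num))
summary_by_league
```

Base R's `fivenum()` returns Tukey's hinges rather than `quantile()` quartiles, so its lower and upper values can differ slightly from q1 and q3 on small groups.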

End

  • I hope you have learned something from this exploratory data analysis of the top 5 soccer leagues
    • Brazil, Argentina, and Portugal are the most common foreign nationalities in the top 5 leagues
    • Progressive Runs and Carries do not appear to be a strong indicator of Assists
    • As expected, Forwards and Midfielders have the most Goals and Assists
    • xG is within 0.5 of actual Goals for around 59 percent of players
    • Age of players is similar across all 5 leagues