2025-11-02

The Dataset

- This project analyzes the Ballon d’Or 2025 Nominees Dataset, compiled from a season’s worth of football performance statistics for the world’s elite players.

- The dataset covers nominees for the Ballon d’Or award and contains key metrics that quantify actual and expected attacking output across major European leagues.

- The dataset has been cleaned and transformed to focus on variables essential for performance analysis: Goals and Assists, Minutes Played, Expected Goals (\(\mathbf{xG}\)), Expected Assisted Goals (\(\mathbf{xAG}\), referred to below as expected assists), player Position, and geographical (Nation/Continent) details.

Brief Overview

We will explore the relationship between expected performance and actual output among the nominees through various visualizations and statistical modeling. The analysis is structured as follows:

- Top 15 Bar Chart: Identifies the nominees with the highest absolute assist totals, with bars colored by league.

- Distribution Violin Plot: Visualizes the spread and central tendency of Goals per 90 minutes (\(\mathbf{G/90}\)) across the simplified position groups (Forward, Midfielder, Defender, Goalkeeper).

- Interactive Plots (Bubble & 3D): Show the relationship between expected performance (\(\mathbf{xG/90}, \mathbf{xAG/90}\)) and actual output (\(\mathbf{G+A/90}\)) to identify efficient finishers and clusters of creative players.

- Donut Chart: Shows the distribution of nominees by continent to illustrate regional representation.

- Statistical Analysis: Includes a five-number summary comparing the \(\mathbf{G+A/90}\) distribution by position and a linear regression model testing how well the expected metrics (\(\mathbf{xG/90} + \mathbf{xAG/90}\)) predict actual goal involvement (\(\mathbf{G+A/90}\)).

Top 15 Ballon d’Or Nominees by Assists

This horizontal bar chart highlights the top 15 playmakers among the nominees by absolute assist total. Players are ordered from the lowest to the highest assist count, and bars are colored by league to show the competition in which each total was recorded.
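The selection and ordering behind a chart like this can be sketched in base R. The report's own code is not shown, so the data frame below is a made-up stand-in, and the column names `player`, `league`, and `assists` are assumptions rather than the dataset's actual schema.

```r
# Hypothetical stand-in for the nominee table; names and values are illustrative only.
nominees <- data.frame(
  player  = paste("Player", LETTERS[1:20]),
  league  = rep(c("League 1", "League 2", "League 3", "League 4"), times = 5),
  assists = c(3, 12, 7, 9, 1, 15, 4, 8, 11, 2, 6, 10, 5, 13, 0, 14, 16, 17, 18, 19),
  stringsAsFactors = FALSE
)

# Keep the 15 rows with the highest assist totals.
top15 <- head(nominees[order(-nominees$assists), ], 15)

# Sort ascending so a horizontal bar chart draws the smallest total at the
# bottom and the largest at the top.
top15 <- top15[order(top15$assists), ]
top15$player <- factor(top15$player, levels = top15$player)

# One color per league (assumed palette), then a base-graphics horizontal bar chart.
league_cols <- setNames(hcl.colors(4, "Dark 3"), unique(nominees$league))
barplot(
  height    = top15$assists,
  names.arg = as.character(top15$player),
  col       = league_cols[top15$league],
  horiz     = TRUE, las = 1, xlab = "Assists"
)
```

Fixing the factor levels in sorted order is what keeps the bars ranked; without it, most plotting functions fall back to alphabetical order.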

ggplot2 Violin Plot: Goals per 90 by Position

This combined violin and box plot shows the distribution of Goals per 90 minutes for each position group. The shape of each violin reveals how consistent or varied scoring output is within that role. Forwards have the highest median scoring rate, while defenders and goalkeepers contribute less to goal scoring, as expected.
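For reference, a per-90 rate divides a season total by the number of full 90-minute matches played. The sketch below computes G/90 and the per-position median on invented numbers; the column names `goals` and `minutes` are assumptions, and only `position_simple` appears in the report's output.

```r
# Hypothetical season totals; values are illustrative only.
players <- data.frame(
  position_simple = c("FW", "FW", "MF", "MF", "DF", "DF"),
  goals           = c(30, 18, 9, 6, 3, 1),
  minutes         = c(2700, 1800, 2700, 1800, 2700, 900),
  stringsAsFactors = FALSE
)

# Per-90 rate: total divided by the number of 90-minute units played.
per90 <- function(total, minutes) total / (minutes / 90)

players$g90 <- per90(players$goals, players$minutes)

# Median G/90 per position group, the value a box plot marks with its center line.
median_g90 <- tapply(players$g90, players$position_simple, median)
print(round(median_g90, 2))
```

Because the rate divides by minutes, players with very few minutes can produce extreme values, which is one reason to show the full distribution and not only a mean.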

plotly Bubble Plot: xG/90 vs G+A/90

This interactive bubble chart compares expected goals (xG/90) with actual goal involvement (G+A/90) for each player. Bubble size represents minutes played, and color represents the player’s league. Players above the trend line produce more goal involvement than their xG/90 predicts, either by converting chances efficiently or by adding assists, which xG does not capture; players below the line underperform relative to their xG.
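One way to make "above" and "below" the trend line concrete is to fit a least-squares line and look at the sign of each residual. This is a minimal sketch on invented values; the column names `xg90` and `ga90` are assumptions, and the report's plot may draw its trend line differently.

```r
# Hypothetical per-90 values; illustrative only.
bubble <- data.frame(
  player = c("A", "B", "C", "D", "E"),
  xg90   = c(0.2, 0.4, 0.6, 0.8, 1.0),
  ga90   = c(0.5, 0.4, 0.9, 0.8, 1.4),
  stringsAsFactors = FALSE
)

# Least-squares trend line of actual output on expected goals.
trend <- lm(ga90 ~ xg90, data = bubble)

# A positive residual puts the player above the line, a negative one below it.
bubble$residual <- resid(trend)
bubble$status   <- ifelse(bubble$residual > 0, "above trend", "below trend")
print(bubble)
```

Sorting by `residual` would rank players by how far their output departs from what their xG/90 alone suggests.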

plotly 3D Plot: xG/90 × xAG/90 × G+A/90

This 3D scatter plot explores how expected goals (xG/90) and expected assists (xAG/90) combine to produce overall contribution (G+A/90). Each dot represents a player, and the color shows their position. Forwards and attacking midfielders generally cluster in higher xG and xAG zones, while defenders sit near the lower end — highlighting positional specialization.

3D Plot Analysis

The 3D scatter plot visualizes the relationship between expected goals (xG/90), expected assists (xAG/90), and total goal contributions (G+A/90).
Each point represents a nominee, with color indicating their position.

From the plot, we can see that forwards generally occupy the region where both xG/90 and xAG/90 are high, while midfielders fall in moderate ranges, balancing creation and scoring. Defenders cluster toward the lower end, indicating limited offensive involvement.

This visualization highlights how offensive roles produce stronger contributions across all expected and realized metrics, demonstrating clear positional patterns among top players.

plotly Donut Chart: Nominees by Continent

This donut chart shows the distribution of nominees by continent, based on their nationality. It gives a clear picture of regional representation: Europe accounts for the largest share, with South America and Africa also strongly represented.
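The shares behind a chart like this come from mapping each nationality to a continent and tabulating the result. The sketch below uses an invented vector of nations, not the actual nominee list, and the lookup `continent_of` is a hypothetical helper.

```r
# Illustrative nationality labels; this is not the actual nominee list.
nation <- c("France", "Spain", "Brazil", "Morocco", "England", "Argentina",
            "Portugal", "Egypt")

# Hypothetical lookup from nation to continent.
continent_of <- c(
  France = "Europe", Spain = "Europe", England = "Europe", Portugal = "Europe",
  Brazil = "South America", Argentina = "South America",
  Morocco = "Africa", Egypt = "Africa"
)

# Map each nation to its continent, then compute percentage shares.
continent <- unname(continent_of[nation])
share <- round(100 * prop.table(table(continent)), 1)
print(share)
```

A nation missing from the lookup would come back as `NA` and be dropped by `table()` by default, so it is worth checking `anyNA(continent)` before plotting.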

Statistical Analysis

To summarize overall performance, we compared goal involvement per 90 minutes (G+A/90) across positions using a five-number summary and a linear regression model linking the expected metrics (xG/90, xAG/90) to actual output (G+A/90).

## # A tibble: 4 × 6
##   position_simple   Min    Q1 Median    Q3   Max
##   <chr>           <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1 DF               0.11 0.21    0.41  0.44  0.54
## 2 FW               0.53 0.755   0.98  1.28  1.48
## 3 GK               0    0       0     0     0   
## 4 MF               0    0.35    0.61  0.65  1.04
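A table of this shape and a regression of this form can be sketched in base R as follows. The report's own code and regression output are not shown, so the data are invented, the column names `xg90`, `xag90`, and `ga90` are assumptions, and the quantiles use R's default method, which may or may not match the report.

```r
# Hypothetical per-90 values; illustrative only.
perf <- data.frame(
  position_simple = rep(c("FW", "MF", "DF"), each = 5),
  xg90  = c(0.60, 0.70, 0.50, 0.80, 0.90, 0.20, 0.30, 0.25, 0.15, 0.35,
            0.05, 0.10, 0.08, 0.03, 0.12),
  xag90 = c(0.20, 0.30, 0.25, 0.15, 0.35, 0.30, 0.20, 0.25, 0.35, 0.15,
            0.05, 0.08, 0.10, 0.12, 0.03),
  ga90  = c(0.85, 1.05, 0.70, 1.00, 1.30, 0.45, 0.55, 0.50, 0.52, 0.48,
            0.12, 0.15, 0.20, 0.18, 0.10),
  stringsAsFactors = FALSE
)

# Five-number summary (Min, Q1, Median, Q3, Max) of G+A/90 per position group.
five_num <- t(sapply(
  split(perf$ga90, perf$position_simple),
  quantile, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE
))
colnames(five_num) <- c("Min", "Q1", "Median", "Q3", "Max")
print(five_num)

# Linear regression of actual output on both expected metrics.
fit <- lm(ga90 ~ xg90 + xag90, data = perf)
print(coef(fit))
print(summary(fit)$r.squared)
```

In a fit like this, the coefficients show how much G+A/90 changes per unit of each expected metric, and R-squared gives the share of variance in actual output that the two metrics explain together.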

Conclusion

- Forwards and attacking midfielders dominate both scoring and assists per 90 minutes.

- The 3D plot shows positional clusters: offensive roles consistently lead in expected and actual output.

- Regression results confirm that expected metrics (xG/90, xAG/90) align with real performance (G+A/90).

- The continent chart shows that while Europe dominates, strong representation also comes from South America and Africa, emphasizing football’s global talent pool.