Hello and welcome to my data visualization project, where I explore the FIFA world of soccer analytics using data obtained from Kaggle.com: https://www.kaggle.com/datasets/maso0dahmed/football-players-data
In this project, I aim to clarify the relationship between player attributes and both overall player rating and market valuation in professional soccer. Using data visualization techniques, I seek to shed light on the factors that influence how players are perceived and valued in this dynamic sport.
After thoroughly analyzing the dataset, I investigated the connection between players’ overall ratings and their market value. While it is commonly assumed that higher ratings translate to greater value, the visualizations I created reveal subtleties that complicate this straightforward assumption.
This chart shows the relationship between age, player value, and overall rating. It suggests that prime earning years typically fall within the age range of 25 to 35, highlighting a clear link between age and player worth.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Kaggle CSV (adjust the path to wherever the file is saved)
path = 'C:/Users/Korisnik/Desktop/'
filename = 'fifa_players.csv'
df = pd.read_csv(path + filename)
# Select columns of interest and drop rows with missing values in them
selected_columns = ["age", "overall_rating", "value_euro"]
plot_df = df[selected_columns].dropna()
# Create a scatterplot with player value encoded by color
plt.figure(figsize=(10, 6))
sns.scatterplot(x='age', y='overall_rating', hue='value_euro', data=plot_df)
plt.title('Age vs. Overall Rating (Player Value encoded by Color)')
plt.xlabel('Age')
plt.ylabel('Overall Rating')
plt.grid(True)
plt.legend(title='Player Value (Euro)')
plt.show()
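To check the 25 to 35 age window numerically rather than by eye, the median player value per age band can be computed. This is a minimal sketch, assuming the `df` loaded above with its `age` and `value_euro` columns; the helper name `median_value_by_age_band` and the band edges are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd


def median_value_by_age_band(df: pd.DataFrame, edges=(15, 20, 25, 30, 35, 40, 50)) -> pd.Series:
    """Return the median value_euro for each age band.

    Bands are right-inclusive, e.g. (25, 30] covers ages 26-30.
    The default edges are an illustrative assumption.
    """
    data = df[["age", "value_euro"]].dropna()
    bands = pd.cut(data["age"], bins=list(edges))
    # observed=False keeps empty bands in the result (their median is NaN)
    return data.groupby(bands, observed=False)["value_euro"].median()


# Usage with the project's DataFrame:
# print(median_value_by_age_band(df))
```

Medians are used instead of means because a handful of very expensive players would otherwise dominate each band.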
These charts show a mismatch between the two distributions. While overall ratings are spread relatively evenly across the players, player values are highly concentrated: the top values belong to only the top 2% to 3% of the player pool, hence the inconsistency between overall rating and value.
# Create subplots for overall rating and player value histograms
fig, axes = plt.subplots(2, 1, figsize=(10, 10))
# Histogram of overall ratings
sns.histplot(df['overall_rating'], bins=20, kde=True, color='skyblue', ax=axes[0])
axes[0].set_title('Distribution of Overall Ratings')
axes[0].set_xlabel('Overall Rating')
axes[0].set_ylabel('Frequency')
axes[0].grid(True)
# Histogram of player values
sns.histplot(df['value_euro'], bins=20, kde=True, color='salmon', ax=axes[1])
axes[1].set_title('Distribution of Player Values')
axes[1].set_xlabel('Player Value (Euro)')
axes[1].set_ylabel('Frequency')
axes[1].grid(True)
plt.tight_layout()
plt.show()
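The concentration described above can be quantified by measuring what share of the total market value is held by the most valuable players. This is a minimal sketch, assuming the `value_euro` column; the helper name `top_share` and the 3% default (chosen to match the figure in the text) are illustrative assumptions.

```python
import pandas as pd


def top_share(values: pd.Series, frac: float = 0.03) -> float:
    """Return the fraction of the total held by the top `frac` of entries.

    NaN values are dropped; at least one entry is always counted.
    """
    clean = values.dropna().sort_values(ascending=False)
    # Number of entries in the top group, rounded to the nearest whole player
    k = max(1, int(round(len(clean) * frac)))
    return float(clean.iloc[:k].sum() / clean.sum())


# Usage with the project's DataFrame:
# print(f"Top 3% hold {top_share(df['value_euro']):.1%} of total value")
```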
Ranking nationalities by average overall rating, this chart hints at variations in football’s popularity across countries, implying that certain nationalities show a greater affinity for the sport. Because the ranking uses averages, however, nationalities with only a handful of players in the dataset can rank highly on the strength of those few players, so this interpretation should be treated with caution.
# Calculate average overall rating for each nationality
avg_rating_by_nationality = df.groupby('nationality')['overall_rating'].mean().sort_values(ascending=False).head(10)
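# Create a bar plot (hue + legend=False applies the palette without a legend;
# the legend parameter requires seaborn >= 0.13)
plt.figure(figsize=(12, 6))
sns.barplot(x=avg_rating_by_nationality.index, y=avg_rating_by_nationality.values, hue=avg_rating_by_nationality.index, palette='viridis', legend=False)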
plt.title('Top 10 Nationalities by Average Overall Rating')
plt.xlabel('Nationality')
plt.ylabel('Average Overall Rating')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
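Because small groups can dominate a ranking of averages, one option is to require a minimum number of players per nationality before ranking. This is a minimal sketch; the helper name `top_nationalities` and the `min_players=30` threshold are arbitrary illustrative assumptions, not values from the original analysis.

```python
import pandas as pd


def top_nationalities(df: pd.DataFrame, min_players: int = 30, n: int = 10) -> pd.DataFrame:
    """Rank nationalities by mean overall_rating, ignoring small groups."""
    stats = df.groupby("nationality")["overall_rating"].agg(["mean", "count"])
    # Keep only nationalities with enough players for a more stable average
    stats = stats[stats["count"] >= min_players]
    return stats.sort_values("mean", ascending=False).head(n)


# Usage: the result can feed the same bar plot as before, e.g.
# ranked = top_nationalities(df)
# sns.barplot(x=ranked.index, y=ranked["mean"])
```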
Exploring another attribute in the dataset, players’ preferred foot, this visualization examines how ratings vary with this preference. Apart from minor deviations, the distribution of ratings is largely consistent between right- and left-footed players, although right-footed players are considerably more numerous.
# Define overall rating categories: Low (up to 60), Medium (61-80), High (81-100)
overall_rating_categories = ['Low', 'Medium', 'High']
# Count the players in each overall rating category for each preferred foot
rating_counts = df.groupby(['preferred_foot', pd.cut(df['overall_rating'], bins=[0, 60, 80, 100], labels=overall_rating_categories)], observed=False)['name'].count().unstack()
# Plot the stacked bar chart on an explicitly sized axis
# (DataFrame.plot opens its own new figure unless an ax is passed)
fig, ax = plt.subplots(figsize=(10, 6))
rating_counts.plot(kind='bar', stacked=True, color=sns.color_palette('Set3', len(overall_rating_categories)), ax=ax)
plt.title('Distribution of Overall Ratings by Preferred Foot')
plt.xlabel('Preferred Foot')
plt.ylabel('Number of Players')
plt.xticks(rotation=0)
plt.legend(title='Overall Rating')
plt.tight_layout()
plt.show()
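Since right-footed players outnumber left-footed ones, raw counts make the two bars hard to compare. Normalizing each row to proportions shows the rating mix within each group directly. This is a minimal sketch using `pd.crosstab` with `normalize="index"` and the same Low/Medium/High bins as above; the helper name `rating_mix_by_foot` is hypothetical.

```python
import pandas as pd


def rating_mix_by_foot(df: pd.DataFrame) -> pd.DataFrame:
    """Share of Low/Medium/High rated players within each preferred foot."""
    categories = pd.cut(
        df["overall_rating"],
        bins=[0, 60, 80, 100],
        labels=["Low", "Medium", "High"],
    )
    # normalize="index" divides each row by its total, so every row sums to 1
    return pd.crosstab(df["preferred_foot"], categories, normalize="index")


# Usage: rating_mix_by_foot(df).plot(kind="bar", stacked=True)
```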
Ranking the top 50 players by overall rating, this chart reveals that some of the highest-rated players have comparatively low market values, indicating potential discrepancies between perceived skill and market worth.
# Sort the DataFrame by overall rating in descending order
sorted_df = df.sort_values(by='overall_rating', ascending=False)
# Select the top 50 players
top_50_players = sorted_df.head(50)
# Define the positions for the bars
positions = np.arange(len(top_50_players))
# Define the width of each bar
bar_width = 0.35
# Divide player value by 1,000,000 to scale it down
scaled_player_value = top_50_players['value_euro'] / 1000000
# Create a figure and axis object
fig, ax = plt.subplots(figsize=(12, 15))
# Plot the overall rating bars
overall_rating_bars = ax.barh(positions - bar_width/2, top_50_players['overall_rating'], bar_width, color='skyblue', label='Overall Rating')
# Plot the player value bars (scaled down)
value_bars = ax.barh(positions + bar_width/2, scaled_player_value, bar_width, color='lightgreen', label='Player Value (Millions)')
# Add text annotations for overall rating bars
for bar, value in zip(overall_rating_bars, top_50_players['overall_rating']):
    ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2, f'{value:.0f}', va='center', ha='left', fontsize=8)
# Add text annotations for player value bars
for bar, value in zip(value_bars, scaled_player_value):
    ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2, f'{value:.2f}M', va='center', ha='left', fontsize=8)
# Set labels and title (both measures share one x-axis)
ax.set_xlabel('Overall Rating / Player Value (Millions of Euro)')
ax.set_ylabel('Player')
ax.set_title('Top 50 Highest-Rated Players: Overall Rating vs. Player Value')
ax.set_yticks(positions)
ax.set_yticklabels(top_50_players['name'])
ax.invert_yaxis()
ax.legend()
plt.show()
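To list which of the top-rated players are valued low relative to their rating, one can compare each player's rank by rating with their rank by value within the same group. This is a minimal sketch; the helper name `rating_value_gaps` and the `value_rank_gap` column are hypothetical, and tied values share the lowest rank (`method="min"`).

```python
import pandas as pd


def rating_value_gaps(df: pd.DataFrame, top_n: int = 50) -> pd.DataFrame:
    """Top-rated players sorted by how far their value rank lags their rating rank."""
    top = df.nlargest(top_n, "overall_rating")[["name", "overall_rating", "value_euro"]].copy()
    # Rank 1 = highest rating / highest value within this group
    top["rating_rank"] = top["overall_rating"].rank(ascending=False, method="min")
    top["value_rank"] = top["value_euro"].rank(ascending=False, method="min")
    # A large positive gap means the player is valued well below their rating position
    top["value_rank_gap"] = top["value_rank"] - top["rating_rank"]
    return top.sort_values("value_rank_gap", ascending=False)


# Usage: print(rating_value_gaps(df).head(10))
```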
Ranking the top 50 players by market value instead reveals cases where players with lower overall ratings still hold considerable market value. This observation highlights how factors beyond skill level significantly affect a player’s monetary worth.
# Sort the DataFrame by player value in descending order
sorted_df = df.sort_values(by='value_euro', ascending=False)
# Select the top 50 players
top_50_players = sorted_df.head(50)
# Define the positions for the bars
positions = np.arange(len(top_50_players))
# Define the width of each bar
bar_width = 0.35
# Create a figure and axis object
fig, ax = plt.subplots(figsize=(12, 15))
# Plot the overall rating bars
overall_rating_bars = ax.barh(positions - bar_width/2, top_50_players['overall_rating'], bar_width, color='lightcoral', label='Overall Rating')
# Plot the player value bars
value_bars = ax.barh(positions + bar_width/2, top_50_players['value_euro'] / 1000000, bar_width, color='royalblue', label='Player Value (Millions)')
# Add text annotations for overall rating bars
for bar, value in zip(overall_rating_bars, top_50_players['overall_rating']):
    ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2, f'{value:.0f}', va='center', ha='left', fontsize=8)
# Add text annotations for player value bars
for bar, value in zip(value_bars, top_50_players['value_euro'] / 1000000):
    ax.text(bar.get_width(), bar.get_y() + bar.get_height() / 2, f'{value:.2f}M', va='center', ha='left', fontsize=8)
# Set labels and title (both measures share one x-axis)
ax.set_xlabel('Overall Rating / Player Value (Millions of Euro)')
ax.set_ylabel('Player')
ax.set_title('Top 50 Most Valuable Players: Overall Rating vs. Player Value')
ax.set_yticks(positions)
ax.set_yticklabels(top_50_players['name'])
ax.invert_yaxis()
ax.legend()
plt.show()
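The contrast between the two top-50 charts can be summarized by checking which players appear in both lists. This is a minimal sketch; the helper name `top_list_overlap` is hypothetical, and players are matched by the `name` column, which assumes names are unique within these top lists.

```python
import pandas as pd


def top_list_overlap(df: pd.DataFrame, n: int = 50) -> dict:
    """Compare the top-n players by overall_rating with the top-n by value_euro."""
    by_rating = set(df.nlargest(n, "overall_rating")["name"])
    by_value = set(df.nlargest(n, "value_euro")["name"])
    return {
        "in_both": sorted(by_rating & by_value),
        "rating_only": sorted(by_rating - by_value),
        "value_only": sorted(by_value - by_rating),
    }


# Usage: overlap = top_list_overlap(df); print(len(overlap["in_both"]))
```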
The data in these charts indicates that although all the featured players possess exceptional skills, not all of them carry multi-million-euro valuations. Despite their on-field ability, the differences in market value suggest that factors such as publicity, popularity, location, and luck play significant roles in determining a player’s worth. These variables highlight how complex it is to assess a player’s value, since some of these factors are difficult or even impossible to measure accurately.
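One way to put a single number on how loosely rating tracks value is a rank (Spearman) correlation: 1 means value order matches rating order exactly, while lower values leave more room for the non-skill factors discussed above. This is a minimal sketch that computes it as the Pearson correlation of the two columns' ranks, which avoids a SciPy dependency; the helper name `rating_value_spearman` is hypothetical, and the coefficient captures only monotonic association, not causes.

```python
import pandas as pd


def rating_value_spearman(df: pd.DataFrame) -> float:
    """Spearman rank correlation between overall_rating and value_euro."""
    data = df[["overall_rating", "value_euro"]].dropna()
    # Spearman's rho equals the Pearson correlation of the (average-tied) ranks
    ranks = data.rank(method="average")
    return float(ranks["overall_rating"].corr(ranks["value_euro"]))


# Usage: print(f"Spearman rho: {rating_value_spearman(df):.2f}")
```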