import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/ProgramData/Anaconda3/Library/plugins/platforms'
This dataset contains FIFA statistics for every player in the game. The code below imports the libraries needed for the visualizations and loads the dataset into a DataFrame.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
def set_file_path(path):
    return path
file_path = "//apporto.com/dfs/LOYOLA/Users/bfames_loyola/Desktop/Fifa_Data.csv"
df = pd.read_csv(file_path)
# print(df.info())
This visualization limits the data set to only players who:
- Have ‘ST’ (Striker) in their preferred position value
- Have an ‘OVR’ of 68 or higher

This visualization compares the following two statistics:
- Shot Power
- Sprint Speed

The goal of this visualization is to see whether shot power and sprint speed are strongly or weakly correlated.
# First Visualization: (Strikers Only with Overall >= 68) Comparing Shot Power and Sprint Speed
# Making a DataFrame with just players who play Striker ('ST')
strikers_df = df[df['Preferred Positions'].str.contains('ST', na=False)]
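# .copy() so the column assignments below modify an independent DataFrame (avoids SettingWithCopyWarning)
strikers_df = strikers_df[strikers_df["Overall"] >= 68].copy()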
# Cleaning up the data
strikers_df['Sprint speed'] = pd.to_numeric(strikers_df['Sprint speed'], errors='coerce')
strikers_df['Shot power'] = pd.to_numeric(strikers_df['Shot power'], errors='coerce')
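# Drop rows with missing values from the striker subset only, leaving the full dataset untouched
strikers_df = strikers_df.dropna(subset=['Sprint speed', 'Shot power'])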
# Creating the visualization
plt.figure(num="FIFA Striker Analysis: Shot Power vs. Sprint Speed", figsize=(7, 7))
plt.xlim(0, 99)
plt.ylim(0, 99)
plt.xticks(range(0, 100, 10))
plt.yticks(range(0, 100, 10))
plt.grid(True, linestyle="--", alpha=0.6)
plt.scatter(strikers_df['Sprint speed'], strikers_df['Shot power'], color='blue', alpha=0.5)
plt.plot([0, 99], [0, 99], 'r--', alpha=0.3, label='Equal values')
# Labeling Plot
plt.xlabel("Sprint Speed")
plt.ylabel("Shot Power")
plt.title("Comparison of Sprint Speed and Shot Power for Strikers w/ a 68 Overall or Better")
# Creating Legend
plt.legend(loc='upper left')
plt.axis('square')
plt.tight_layout()
plt.show()
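The scatter plot gives a visual impression of the relationship, and a correlation coefficient can put a number on it. The following is a minimal sketch, not part of the original analysis: the helper `shot_speed_correlation` and the sample rows are hypothetical, and it assumes the same `'Sprint speed'` and `'Shot power'` column names used above.

```python
import pandas as pd


def shot_speed_correlation(players: pd.DataFrame) -> float:
    """Return the Pearson correlation between 'Sprint speed' and 'Shot power'."""
    # Coerce both columns to numbers; unparseable entries become NaN and are dropped
    cols = players[["Sprint speed", "Shot power"]].apply(pd.to_numeric, errors="coerce")
    cols = cols.dropna()
    return cols["Sprint speed"].corr(cols["Shot power"])


# Hypothetical sample rows, not taken from Fifa_Data.csv
sample = pd.DataFrame({
    "Sprint speed": [60, 70, 80, 90, "n/a"],
    "Shot power": [65, 70, 85, 88, 75],
})

r = shot_speed_correlation(sample)
print(f"Pearson r = {r:.2f}")
```

With the real data, `shot_speed_correlation(strikers_df)` would return a value between -1 and 1. Values near 0 suggest a weak linear relationship, and values near 1 or -1 a strong one.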
This visualization limits the data set to only players who:
- Have ‘ST’ (Striker) in their preferred position value
- Have an ‘OVR’ of 68 or higher

This visualization shows the distribution of shot power amongst these strikers.

The goal of this visualization is to get a better understanding of where most strikers’ “Shot Power” statistic lands.
# Second Visualization: (Same Striker Dataset) Creating a histogram showing the distribution of Shot Power
plt.figure(num="FIFA Analysis: Distribution of Shot Power Among Strikers", figsize=(7, 4))
plt.hist(strikers_df["Shot power"], bins=20, color="blue", edgecolor="black", alpha=0.7)
plt.xlabel("Shot Power")
plt.ylabel("Number of Players")
plt.title("Distribution of Shot Power Among Strikers with an Overall of 68 or Better")
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.show()
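To state where most values land without reading it off the chart, the most populated bin can be computed directly with `np.histogram`, which is the same binning that `plt.hist` performs. This is a rough sketch: `modal_bin` is a hypothetical helper, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


def modal_bin(values: pd.Series, bins: int = 20) -> tuple[float, float, int]:
    """Return (left edge, right edge, count) of the most populated histogram bin."""
    clean = pd.to_numeric(values, errors="coerce").dropna()
    counts, edges = np.histogram(clean, bins=bins)
    i = int(np.argmax(counts))  # index of the first bin with the highest count
    return float(edges[i]), float(edges[i + 1]), int(counts[i])


# Hypothetical shot power values, not taken from Fifa_Data.csv
sample = pd.Series([50, 62, 70, 71, 72, 73, 74, 85, 90])
left, right, count = modal_bin(sample, bins=4)
print(f"Most common range: {left:.0f} to {right:.0f} ({count} players)")
```

On the real data, `modal_bin(strikers_df["Shot power"])` would use the same 20 bins as the histogram above.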
This visualization has no extra limitations on the data
This visualization shows the age distribution of the players in the dataset as a pie chart, with ages grouped into bins.
# Third Visualization (FIFA Age Distribution)
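# np.inf as the top edge so players older than 40 still fall into the '36+' group instead of becoming NaN
age_groups = pd.cut(df['Age'], bins=[14, 21, 25, 30, 35, np.inf], labels=['15-21', '22-25', '26-30', '31-35', '36+'])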
age_groups.value_counts().plot(kind='pie', autopct='%1.1f%%', figsize=(7, 7))
plt.ylabel("Percentage")
plt.title("Age Distribution in FIFA")
plt.show()
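How `pd.cut` treats the bin edges affects which slice each player lands in. By default each interval includes its right edge, and values outside the outermost edges become NaN. The sketch below compares a top edge of 40 with an open-ended top edge; the ages are hypothetical and chosen only to hit the boundaries.

```python
import numpy as np
import pandas as pd

labels = ['15-21', '22-25', '26-30', '31-35', '36+']

# Hypothetical ages chosen to sit on or beyond the bin edges
ages = pd.Series([16, 21, 22, 30, 36, 40, 44])

# Top edge of 40: the interval is (35, 40], so 44 falls outside every bin
closed = pd.cut(ages, bins=[14, 21, 25, 30, 35, 40], labels=labels)

# Open-ended top edge: the interval is (35, inf], so 44 is labelled '36+'
open_ended = pd.cut(ages, bins=[14, 21, 25, 30, 35, np.inf], labels=labels)

print(pd.DataFrame({"age": ages, "closed": closed, "open_ended": open_ended}))
```

Rows that come back as NaN are silently left out by `value_counts()`, so they would not appear in the pie chart at all.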
This visualization has no further limitations on the data.

This visualization shows how the average overall rating changes with player age.
# Fourth Visualization (Average Overall Rating by Age)
# Creating DF for visualization
age_overall_avg = df.groupby('Age')['Overall'].mean()
# Creating the plot
plt.figure(figsize=(7, 4))
plt.plot(age_overall_avg, marker='o', linestyle='-')
plt.xlabel("Age")
plt.ylabel("Average Rating")
plt.title("Average Overall Rating by Age")
plt.grid(True)
plt.show()
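An average per age can be misleading when only a handful of players share that age, which is likely at the youngest and oldest ends. One possible check is to compute the group size alongside the mean. In this sketch the sample rows and the `min_players` threshold are assumptions for illustration, not values from the original analysis.

```python
import pandas as pd

# Hypothetical rows, not taken from Fifa_Data.csv
sample = pd.DataFrame({
    "Age": [18, 18, 18, 25, 25, 40],
    "Overall": [60, 62, 64, 75, 77, 70],
})

# Mean rating and number of players for each age
summary = sample.groupby("Age")["Overall"].agg(["mean", "count"])

# Keep only ages backed by enough players (threshold is an arbitrary assumption)
min_players = 2
reliable = summary[summary["count"] >= min_players]

print(summary)
print(reliable)
```

Plotting `reliable["mean"]` instead of the raw averages would hide points that rest on very few players.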
This visualization limits the data set to only players who:
- Have ‘ST’ (Striker) in their preferred position value
- Have an ‘OVR’ of 80 or higher

This visualization takes each numerical column and finds the 5 statistics with the highest averages. This indicates which attributes tend to be strongest among highly rated strikers; a high average on its own does not show that a statistic drives the overall rating.
# Fifth Visualization (Top 5 Statistics for 80+ OVR Strikers)
# Creating dataset for strikers with an OVR of 80 or higher
good_strikers_df = strikers_df[strikers_df["Overall"] >= 80]
# Computing the mean of every numeric column, then removing unwanted columns
good_strikers_df_means = good_strikers_df.mean(numeric_only=True)
good_strikers_df_means = good_strikers_df_means.drop(['ID', 'Special', 'Unnamed: 0', 'Overall'])
# print(good_strikers_df_means)
# Getting top 5 columns
sorted_means = good_strikers_df_means.sort_values(ascending=False)
# Making the Plot
sorted_means[:5].plot(kind='bar', figsize=(7, 4), color='skyblue', edgecolor='black')
plt.xlabel("Statistic")
plt.ylabel("Average Value")
plt.title("Top 5 Statistics for Strikers with an 80+ OVR")
plt.xticks(rotation=45)
plt.show()
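The mean, drop, and sort steps can also be wrapped into one small function. The sketch below is an alternative formulation, not the notebook's own code: `top_attributes` is a hypothetical helper, it uses `Series.nlargest` in place of sorting and slicing, and it passes `errors="ignore"` so that a column missing from the data does not raise an error. The sample rows are made up.

```python
import pandas as pd


def top_attributes(players: pd.DataFrame, n: int = 5,
                   exclude=("ID", "Special", "Unnamed: 0", "Overall")) -> pd.Series:
    """Return the n numeric columns with the highest mean, skipping excluded ones."""
    means = players.mean(numeric_only=True)
    # errors="ignore" skips any excluded name that is not present in the data
    means = means.drop(labels=list(exclude), errors="ignore")
    return means.nlargest(n)


# Hypothetical rows, not taken from Fifa_Data.csv
sample = pd.DataFrame({
    "ID": [1, 2],
    "Name": ["A", "B"],
    "Overall": [85, 83],
    "Shot power": [88, 84],
    "Sprint speed": [80, 90],
    "Finishing": [90, 86],
    "Marking": [20, 30],
})

top = top_attributes(sample, n=2)
print(top)
```

Calling `top_attributes(good_strikers_df).plot(kind='bar')` would produce the same kind of bar chart as above.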