PythonRMD

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/ProgramData/Anaconda3/Library/plugins/platforms'

Getting the Data and setting up the visualizations

This dataset contains FIFA statistics for every player in the game. The code below imports the libraries needed for the visualizations and loads the dataset into a DataFrame.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def set_file_path(path):
    return path

file_path = "//apporto.com/dfs/LOYOLA/Users/bfames_loyola/Desktop/Fifa_Data.csv"

df = pd.read_csv(file_path)
# print(df.info())
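Every visualization that follows depends on a handful of column names, so it can help to confirm they exist right after loading. The snippet below is a minimal sketch of such a check. The helper `missing_columns` and the toy `sample_df` are hypothetical and stand in for the real CSV, which is not reproduced here.

```python
import pandas as pd

# Column names used by the visualizations in this document.
REQUIRED_COLUMNS = ["Preferred Positions", "Overall", "Age", "Sprint speed", "Shot power"]


def missing_columns(frame, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the frame."""
    return [name for name in required if name not in frame.columns]


# Toy stand-in for the loaded CSV (made-up values, not taken from the dataset).
sample_df = pd.DataFrame({
    "Preferred Positions": ["ST ", "CB "],
    "Overall": [80, 70],
    "Age": [25, 30],
    "Sprint speed": ["85", "60"],
})

# 'Shot power' was left out of the toy frame on purpose, so it is reported.
print(missing_columns(sample_df))
```

With the real data, calling `missing_columns(df)` right after `pd.read_csv` should return an empty list if the file has the expected layout.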

First Visualization

This visualization limits the data set to only players who:
- Have ‘ST’ (Striker) in their preferred position value
- Have an ‘OVR’ of 68 or higher

This visualization compares the two following statistics:
- Shot Power
- Sprint Speed

The goal of this visualization is to see how strongly shot power and sprint speed are correlated.

# First Visualization: (Strikers Only with Overall >= 68) Comparing Shot Power and Sprint Speed

    # Making a DataFrame with just players who play Striker ('ST')
    # .copy() makes the later column assignments modify an independent DataFrame
strikers_df = df[df['Preferred Positions'].str.contains('ST', na=False)]
strikers_df = strikers_df[strikers_df["Overall"] >= 68].copy()

    # Cleaning up the data: non-numeric values become NaN, then those rows are dropped
strikers_df['Sprint speed'] = pd.to_numeric(strikers_df['Sprint speed'], errors='coerce')
strikers_df['Shot power'] = pd.to_numeric(strikers_df['Shot power'], errors='coerce')
strikers_df = strikers_df.dropna(subset=['Sprint speed', 'Shot power'])

    # Creating the visualization
plt.figure(num="FIFA Striker Analysis: Shot Power vs. Sprint Speed", figsize=(7, 7))
plt.xlim(0, 99)
plt.ylim(0, 99)
plt.xticks(range(0, 100, 10))
plt.yticks(range(0, 100, 10))

plt.grid(True, linestyle="--", alpha=0.6)
plt.scatter(strikers_df['Sprint speed'], strikers_df['Shot power'], color='blue', alpha=0.5)
plt.plot([0, 99], [0, 99], 'r--', alpha=0.3, label='Equal values')
    # Labeling Plot
plt.xlabel("Sprint Speed")
plt.ylabel("Shot Power")
plt.title("Comparison of Sprint Speed and Shot Power for Strikers w/ a 68 Overall or Better")
    # Creating Legend
plt.legend(loc='upper left')
plt.axis('square')

plt.tight_layout()
plt.show()
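The scatter plot gives a visual impression of the relationship. A Pearson correlation coefficient puts a number on it. The snippet below is a minimal sketch using `Series.corr`; the `toy_strikers` frame holds made-up values, so the printed number says nothing about the real dataset.

```python
import pandas as pd

# Made-up values standing in for the cleaned striker data.
toy_strikers = pd.DataFrame({
    "Sprint speed": [60, 70, 75, 80, 90],
    "Shot power": [65, 72, 70, 85, 88],
})

# Series.corr uses the Pearson method by default and returns a value in [-1, 1].
r = toy_strikers["Sprint speed"].corr(toy_strikers["Shot power"])
print(f"Pearson r = {r:.2f}")
```

On the real data, the same call on `strikers_df['Sprint speed']` and `strikers_df['Shot power']` should work once both columns have been converted to numeric. Values near 0 indicate a weak linear relationship and values near 1 or -1 a strong one.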

Second Visualization

This visualization limits the data set to only players who:
- Have ‘ST’ (Striker) in their preferred position value
- Have an ‘OVR’ of 68 or higher

This visualization shows the distribution of shot power amongst these strikers

The goal of this visualization is to get a better understanding of where most strikers’ “Shot Power” statistic lands

# Second Visualization: (Same Striker Dataset) Creating a histogram showing the distribution of Shot Power
    
plt.figure(num="FIFA Analysis: Distribution of Shot Power Among Strikers", figsize=(7, 4))
plt.hist(strikers_df["Shot power"], bins=20, color="blue", edgecolor="black", alpha=0.7)
plt.xlabel("Shot Power")
plt.ylabel("Number of Players")
plt.title("Distribution of Shot Power Among Strikers with an Overall of 68 or Better")
plt.grid(axis="y", linestyle="--", alpha=0.7)
plt.show()
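A histogram shows the shape of the distribution, and a few summary statistics can pin down where most values land. The snippet below is a small sketch using `describe`, `median`, and `quantile`. The `toy_shot_power` values are made up for illustration.

```python
import pandas as pd

# Made-up Shot Power values standing in for strikers_df["Shot power"].
toy_shot_power = pd.Series([55, 62, 68, 70, 71, 74, 77, 80, 83, 90], name="Shot power")

# describe() reports count, mean, std, min, quartiles, and max in one call.
summary = toy_shot_power.describe()
median = toy_shot_power.median()

# The middle 50% of values lies between the first and third quartiles.
q1, q3 = toy_shot_power.quantile([0.25, 0.75])

print(summary)
print(f"Median: {median}, interquartile range: {q1} to {q3}")
```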

Third Visualization

This visualization has no extra limitations on the data

This visualization shows the age distribution of the players in the dataset as a pie chart, with ages grouped into bins.

# Third Visualization (FIFA Age Distribution)

    # The last bin is open-ended (np.inf) so every player older than 35 is counted in '36+'
age_groups = pd.cut(df['Age'], bins=[14, 21, 25, 30, 35, np.inf], labels=['15-21', '22-25', '26-30', '31-35', '36+'])
age_groups.value_counts().plot(kind='pie', autopct='%1.1f%%', figsize=(7, 7))
plt.ylabel("Percentage")
plt.title("Age Distribution in FIFA")
plt.show()
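The binning step is easier to reason about on a few known ages. The snippet below is a sketch of how `pd.cut` assigns values to right-inclusive intervals and what happens to values beyond the last edge. The `toy_ages` values are made up, and the open upper edge (`np.inf`) is an assumption chosen so that the '36+' label covers every older player.

```python
import numpy as np
import pandas as pd

toy_ages = pd.Series([16, 21, 22, 30, 31, 36, 44])
labels = ["15-21", "22-25", "26-30", "31-35", "36+"]

# Intervals are right-inclusive by default: (14, 21], (21, 25], and so on.
open_ended = pd.cut(toy_ages, bins=[14, 21, 25, 30, 35, np.inf], labels=labels)
print(open_ended.tolist())

# With a finite last edge of 40, an age of 44 falls outside every bin and becomes NaN.
capped = pd.cut(toy_ages, bins=[14, 21, 25, 30, 35, 40], labels=labels)
print(capped.isna().sum())
```

Rows that end up as NaN are silently left out of `value_counts()`, so a finite upper edge can shrink the last pie slice without any warning.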

Fourth Visualization

This visualization has no further limitations on the data

This visualization shows how the average overall rating changes with player age.

# Fourth Visualization (Average Overall Rating by Age)
    
    # Creating DF for visualization
age_overall_avg = df.groupby('Age')['Overall'].mean()

    # Creating the plot
plt.figure(figsize=(7, 4))
plt.plot(age_overall_avg, marker='o', linestyle='-')
plt.xlabel("Age")
plt.ylabel("Average Rating")
plt.title("Average Overall Rating by Age")
plt.grid(True)
plt.show()
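An average per age can be misleading when only a few players share that age, which is likely at the youngest and oldest ends. The snippet below is a sketch that computes the mean together with the group size using `agg`. The `toy_players` frame is made up for illustration.

```python
import pandas as pd

# Made-up players: two aged 20, three aged 25, and a single 38-year-old.
toy_players = pd.DataFrame({
    "Age": [20, 20, 25, 25, 25, 38],
    "Overall": [60, 64, 70, 74, 72, 80],
})

# One row per age, with the average rating and the number of players behind it.
by_age = toy_players.groupby("Age")["Overall"].agg(["mean", "count"])
print(by_age)
```

In this toy frame the highest average belongs to age 38, but it rests on a single player, so the `count` column is worth reading alongside the line chart.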

Fifth Visualization

This visualization limits the data set to only players who:
- Have ‘ST’ (Striker) in their preferred position value
- Have an ‘OVR’ of 80 or higher

This visualization averages every numeric column and plots the 5 statistics with the highest averages. High averages show where top strikers tend to be rated strongest. They suggest, but do not prove, which statistics contribute most to a high overall rating.

# Fifth Visualization (Top 5 Statistics for 80+ OVR Strikers) 
    
    # Creating dataset for strikers with an OVR of 80 or higher
good_strikers_df = strikers_df[strikers_df["Overall"] >= 80]

    # Computing column means, & removing columns that are not player statistics
    # errors='ignore' skips any listed column that is missing from the means
good_strikers_df_means = good_strikers_df.mean(numeric_only=True)
good_strikers_df_means = good_strikers_df_means.drop(['ID', 'Special', 'Unnamed: 0', 'Overall'], errors='ignore')
# print(good_strikers_df_means)

    # Getting top 5 columns
sorted_means = good_strikers_df_means.sort_values(ascending=False)

    # Making the Plot
sorted_means[:5].plot(kind='bar', figsize=(7, 4), color='skyblue', edgecolor='black')
plt.xlabel("Columns")
plt.ylabel("Average Value")
plt.title("Top 5 Statistics for Strikers with an 80+ OVR")
plt.xticks(rotation=45)

plt.show()
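Only columns with a numeric dtype take part in `mean(numeric_only=True)`, so any statistic stored as text is silently left out of the ranking. The snippet below is a sketch of one way to coerce every column to numeric before ranking. The `toy_good_strikers` frame, including the 'Finishing' and 'Marking' columns and all values, is an assumption made for illustration.

```python
import pandas as pd

# Made-up data: some statistics are stored as strings, one has an unparseable entry.
toy_good_strikers = pd.DataFrame({
    "Finishing": ["88", "90", "86"],
    "Shot power": [85, 87, 83],
    "Marking": [20, "n/a", 30],
    "Sprint speed": [80, 90, 82],
})

# Convert each column to numbers; values that cannot be parsed become NaN.
numeric = toy_good_strikers.apply(pd.to_numeric, errors="coerce")

# mean() skips NaN by default; nlargest returns the highest averages in descending order.
top = numeric.mean().nlargest(2)
print(top)
```

Without the conversion step, 'Finishing' and 'Marking' would be dropped by `numeric_only=True` and could never appear in the top 5.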