Python Assignment: NBA Performance Data Set (1985-2018)

Intro

For as long as the NBA has been around, there has been a debate over who is the greatest player of all time. Growing up, I was able to witness the greatness of LeBron James, but I never really got to experience the greatness of Michael Jordan. My dad, on the other hand, has watched both athletes play and always argues that Jordan is better. To settle the debate, I decided to use the NBA Players Performance data set, which lets me compare players on their career statistics and decide who the overall greatest player of all time is.

Data set

The data set has 4,685 rows and 25 columns, covering physical characteristics, draft information, and career statistics for NBA players from 1985 through 2018, so it can be used to examine player performance over time. When I first looked through the data, I wanted to focus on the players with the highest career points per game averages. Once I had a list of the points per game leaders, I narrowed the data set to the top 10 players. For those 10 players I used the columns career_AST, career_FG%, career_G, career_PER, career_PTS, career_TRB, name, and position. These columns let me analyze which of the top 10 career points per game scorers is ultimately the greatest of all time.
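The narrowing step described above (sort by career points per game, keep the top rows, keep only the needed columns) can be sketched with `DataFrame.nlargest`, which returns the rows with the largest values in one column. This is a minimal sketch: the four-row table below is a hypothetical stand-in for players.csv, and the names and values are placeholders, not data from the set.

```python
import pandas as pd

# Hypothetical stand-in for players.csv; names and values are placeholders
players = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C', 'Player D'],
    'career_PTS': [22.5, 30.1, 27.0, 25.1],
    'position': ['Center', 'Shooting Guard', 'Small Forward', 'Point Guard'],
})

# Columns kept for the analysis (a subset of the real column list)
keep = ['name', 'career_PTS', 'position']

# Top N scorers by career points per game; N=3 only because the toy table is small
top_n = players.nlargest(3, 'career_PTS')[keep].reset_index(drop=True)
print(top_n)
```

On the real data, `nlargest(10, 'career_PTS')` would play the role of sorting in descending order and calling `head(10)`, assuming `career_PTS` is stored as a number.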

When munging through the data for the first time, I noticed that some of the older players in my top 10 didn't have three-point stats. Because of that, I didn't factor three-point percentage into my decision, so that all the players were compared on the same playing field.
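One way to check which stats are missing before deciding what to compare is to coerce each column to numbers and count the gaps. This is a hedged sketch: the rows are hypothetical, the column name `career_FG3%` and the use of `'-'` as the missing-value marker are assumptions rather than details confirmed from players.csv.

```python
import pandas as pd

# Hypothetical rows; the '-' placeholder for a missing stat is an assumption,
# not confirmed from players.csv
stats = pd.DataFrame({
    'name': ['Player A', 'Player B', 'Player C'],
    'career_FG3%': ['-', '35.2', '-'],
    'career_FG%': ['52.0', '48.1', '49.9'],
})

# Convert text to numbers; anything that cannot be parsed becomes NaN
for col in ['career_FG3%', 'career_FG%']:
    stats[col] = pd.to_numeric(stats[col], errors='coerce')

# Fraction of players missing each stat; only gap-free columns are kept for comparison
missing_share = stats[['career_FG3%', 'career_FG%']].isna().mean()
usable = missing_share[missing_share == 0].index.tolist()
print(missing_share)
print(usable)
```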

Findings

My analysis revealed a lot that helped me decide who the greatest player of all time is. First, to narrow down which players to consider, I looked at the top 10 scorers by career average points per game: Wilt Chamberlain, Michael Jordan, Elgin Baylor, LeBron James, Kevin Durant, Jerry West, Allen Iverson, Bob Pettit, Oscar Robertson, and George Gervin. In career average points per game, Wilt Chamberlain and Michael Jordan lead the pack at 30.1, while George Gervin averaged the fewest at 25.1. Second, I found that while Wilt Chamberlain played only the fourth most games of the top 10, he had the highest field goal percentage at 54.0. LeBron, on the other hand, is tied for second in field goal percentage but has played the most games, at 1,198.

Third, I looked at player efficiency rating (PER), which takes into account both positive and negative results. The formula adds positive stats and subtracts negative ones through a statistical point value system. Michael Jordan led in career PER with 27.90, and Allen Iverson had the lowest with 20.90. Fourth, I split the top 10 players by position and put them in a nested pie chart. Many of these players were shooting guards, while there was only one center, one point guard, and one power forward. Lastly, I wanted to see which players had the highest combined points, rebounds, and assists per game. Wilt Chamberlain had the highest combined average with 57.4, and George Gervin had the lowest with 33.0.

Visualizations

Highest Career PPG Scorers

My first visualization shows how I arrived at the list of candidates for the greatest of all time. I took the top 10 career points per game scorers and plotted them in a vertical bar chart, with a dashed line marking the mean of the top 10. Each bar is colored by whether the player is above the mean, within 1% of it, or below it. This chart shows who was the most productive scorer of all time: Wilt Chamberlain and Michael Jordan were the most prolific scorers, each averaging 30.1 points per game over their careers.

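import os
# Only needed on the author's Anaconda install so Qt can find its platform plugins; adjust or remove elsewhere
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'D:/Anaconda3/Library/plugins/platforms'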

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import matplotlib.patches as mpatches
from matplotlib.ticker import FuncFormatter
warnings.filterwarnings("ignore")

path = "U:/"

filename = path + "players.csv"

df = pd.read_csv(filename, usecols = ['name', 'career_PTS','career_FG%','career_G','career_TRB','career_AST','career_PER','position'])

df = df.sort_values('career_PTS', ascending = False)

df_top10 = df[['name', 'career_PTS','career_FG%','career_G','career_TRB','career_AST','career_PER','position']].sort_values(by=['career_PTS'], ascending=False).head(10)

df_top10.reset_index(inplace = True, drop = True)

df_top10.columns = ['PlayerName' ,'CareerPPG','CareerFG','CareerGames', 'CareerTRB','CareerAST','CareerPER','Position']

mean_career_pts = df_top10['CareerPPG'].mean()

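def pick_colors_according_to_mean_CareerPPG(this_data):
    """Color a bar blue if it is more than 1% above the mean PPG, green if more than 1% below, orange otherwise."""
    colors = []
    avg = this_data.CareerPPG.mean()
    for each in this_data.CareerPPG:
        if each > avg*1.01:
            colors.append('blue')
        elif each < avg*0.99:
            colors.append('green')
        else:
            colors.append('orange')
    return colors


# Take all 10 rows (.loc slicing is inclusive) as an independent copy,
# so converting CareerFG below does not write into a slice of df_top10
bottom = 0
top = 10
d1 = df_top10.loc[bottom:top].copy()
my_colors = pick_colors_according_to_mean_CareerPPG(d1)

d1['CareerFG'] = pd.to_numeric(d1['CareerFG'])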


Above = mpatches.Patch(color='blue', label='Above Average')
At = mpatches.Patch(color='orange', label='Within 1% of the Average')
Below = mpatches.Patch(color='green', label='Below Average')


fig = plt.figure(figsize=(18,16))
fig.suptitle('Top 10 Highest Average Career Points Per Game Scorers (1985-2018)',
             fontsize=18, fontweight='bold')

ax1 = fig.add_subplot(1,1,1)
ax1.bar(d1.PlayerName, d1.CareerPPG, label='CareerPPG', color=my_colors)

# Legend explaining the three bar colors
ax1.legend(handles=[Above, At, Below], fontsize=14)

# Dashed line at the top 10 mean, with its value (2 decimals) printed above the line
ax1.axhline(d1.CareerPPG.mean(), color='black', linestyle='dashed')
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.text(top-1, d1.CareerPPG.mean()+0.5, 'Mean = {:.2f}'.format(d1.CareerPPG.mean()), rotation=0, fontsize=16)
ax1.set_xlabel('Top 10 Player', fontsize=18)
ax1.set_ylabel('Average Career Points Per Game', fontsize=18)
ax1.tick_params(axis='x', labelsize=11)
ax1.tick_params(axis='y', labelsize=14)

    
# Print each bar's value above it, in the same color as the bar
for row_counter, (value_at_row_counter, color) in enumerate(zip(d1.CareerPPG, my_colors)):
    ax1.text(row_counter, value_at_row_counter+0.5, str(value_at_row_counter), color=color, size=14, fontweight='bold',
             ha='center', va='bottom', backgroundcolor='white', rotation=0)

# Leave 10% headroom above the tallest bar for the labels (set once, after the loop)
ax1.set_ylim(0, d1.CareerPPG.max()*1.1)

plt.show()
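The three-way coloring rule used in this chart (more than 1% above the mean, more than 1% below it, otherwise within) can also be written without an explicit loop. This is a sketch using `numpy.select`; the helper name `classify_vs_mean` and the sample values are hypothetical, not taken from players.csv.

```python
import numpy as np

def classify_vs_mean(values, band=0.01):
    """Label each value 'above', 'within', or 'below' relative to the mean,
    using a +/- band (1% by default) around the mean."""
    values = np.asarray(values, dtype=float)
    avg = values.mean()
    conditions = [values > avg * (1 + band), values < avg * (1 - band)]
    return np.select(conditions, ['above', 'below'], default='within')

# Hypothetical PPG values (placeholders)
ppg = [30.0, 27.0, 24.0, 27.1]
print(classify_vs_mean(ppg))
```

The labels could then be mapped to colors with a dictionary such as `{'above': 'blue', 'within': 'orange', 'below': 'green'}`, which keeps the rule in one place instead of repeating it in the color function and the label loop.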

Field Goal Percentage and Games Played Analysis

My second visualization is a dual-axis bar chart comparing field goal percentage with total games played for the same top 10 players. It highlights who shot the best while also playing the most games. It's relatively easy to keep a high field goal percentage over a small number of games; it's more impressive to sustain a high percentage across a long career. Wilt Chamberlain had the highest field goal percentage at 54.0 but played only the fourth most games at 1,045. LeBron James, meanwhile, has played the most games at 1,198 and is tied for second in field goal percentage at 50.4, which is very impressive.

# Convert the stat columns used by later charts to floats
df_top10['CareerFG'] = df_top10['CareerFG'].astype(float)
df_top10['CareerTRB'] = df_top10['CareerTRB'].astype(float)
df_top10['CareerPER'] = df_top10['CareerPER'].astype(float)


def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    """Write each bar's height just above the bar, using a format spec such as '.1f' or ',.0f'."""
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x()+each_bar.get_width()/2, height*1.01, symbol+format(height, place_of_decimals),
                     fontsize=14, color='black', ha='center', va='bottom')

fig = plt.figure(figsize=(18,16))
ax1 = fig.add_subplot(1,1,1)
ax2 = ax1.twinx()
bar_width = 0.4

x_pos = np.arange(len(d1))  # one bar position per player
fg_bars = ax1.bar(x_pos-(0.5*bar_width), d1.CareerFG, bar_width, color= 'blue', edgecolor='black', label = 'Field Goal Percentage')
game_bars = ax2.bar(x_pos+(0.5*bar_width), d1.CareerGames, bar_width, color= 'red', edgecolor='black', label = 'Total Number of Games Played')

ax1.set_xlabel("Player", fontsize=14)
ax1.set_ylabel('Field Goal Percentage', fontsize=18, labelpad=20)
ax2.set_ylabel("Total Number of Games Played", fontsize=18,rotation=270, labelpad=20)
ax1.tick_params(axis='y',labelsize=14)
ax2.tick_params(axis='y',labelsize=14)

plt.title('Field Goal Percentage and Games Played Analysis (1985-2018)\n Top 10 Average Points Per Game Scorers', fontsize=18)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(d1.PlayerName,fontsize=11)

count_color, count_label = ax1.get_legend_handles_labels()
fine_color, fine_label = ax2.get_legend_handles_labels()
legend = ax1.legend(count_color + fine_color, count_label + fine_label, loc='upper left', frameon=True, ncol=1, shadow=True,
                   borderpad=1, fontsize=14)

ax1.set_ylim(0,d1.CareerFG.max()*1.4)

ax2.set_ylim(0,d1.CareerGames.max()*1.2)

autolabel(fg_bars,ax1,'.1f','')
autolabel(game_bars, ax2, ',.0f','')

ax2.yaxis.set_major_formatter(FuncFormatter(lambda x, p: format(int(x), ',')))

plt.show()
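Statements like "tied for second in field goal percentage" and "fourth most games" are ranks with ties. A hedged sketch of how such ranks could be computed with `Series.rank`: the table is hypothetical (placeholder names and values), and `method='min'` is one of several tie rules pandas offers, chosen here because it gives tied players the same best rank.

```python
import pandas as pd

# Hypothetical FG% and games-played values for four players (placeholders)
fg = pd.DataFrame({
    'PlayerName': ['Player A', 'Player B', 'Player C', 'Player D'],
    'CareerFG': [52.0, 49.5, 49.5, 46.0],
    'CareerGames': [1000, 1150, 900, 1100],
}).set_index('PlayerName')

# Highest value gets rank 1; tied players share the lowest rank of their group
ranks = pd.DataFrame({
    'FG_rank': fg['CareerFG'].rank(ascending=False, method='min').astype(int),
    'Games_rank': fg['CareerGames'].rank(ascending=False, method='min').astype(int),
})
print(ranks)
```

Here Player B and Player C both get FG rank 2 (tied for second) and the next player gets rank 4.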

Career Player Efficiency Rating

My third visualization is a donut chart showing which top 10 player finished with the highest career player efficiency rating (PER); each wedge is a player's share of the group's combined PER. PER takes into account positive results, including field goals, free throws, 3-pointers, assists, rebounds, blocks, and steals, and negative results, including missed shots, turnovers, and personal fouls. The formula adds positive stats and subtracts negative ones through a statistical point value system. Michael Jordan led in career PER with 27.90, and Allen Iverson had the lowest with 20.90. This rating matters because it is one of the stats most often cited when debating who the greatest player of all time is.

pie_df = df[['name','career_PER','career_PTS']].sort_values(by=['career_PTS'], ascending=False).head(10)
del pie_df['career_PTS']
pie_df.reset_index(inplace = True, drop = True)
pie_df['career_PER'] = pie_df['career_PER'].astype(float)

number_outside_colors = len(pie_df.name.unique())
outside_color_ref_number = np.arange(number_outside_colors)*2

fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap(outside_color_ref_number)

avg_PER = pie_df.career_PER.mean()

# PER per player and the top 10 total, computed once instead of inside the label function
per_by_player = pie_df.groupby(['name'])['career_PER'].sum()
total_PER = per_by_player.sum()

# Label each wedge with its percentage of the total and the PER value it represents
autopct_fmt = lambda p: '{:.2f}%\n({:.2f} PER)'.format(p, p * total_PER / 100)

# Draw the pie directly on ax
per_by_player.plot(
       kind='pie', ax=ax, radius=1, colors=outer_colors, pctdistance=0.75, labeldistance=1.1,
       wedgeprops={'edgecolor':'w'}, textprops={'fontsize': 10},
       autopct=autopct_fmt, startangle=90)

# A white circle in the center turns the pie into a donut
hole = plt.Circle((0,0), 0.3, fc='white')
ax.add_artist(hole)

ax.yaxis.set_visible(False)
ax.set_title('Top 10 PPG Scorers Career Player Efficiency Rating (1985-2018)', fontsize=15)

# Average PER of the top 10, printed in the donut hole with 2 decimals
ax.text(0, 0, 'Top 10 PER AVG\n{:.2f}'.format(avg_PER), size=10, ha='center', va='center')

ax.axis('equal')

plt.tight_layout()

plt.show()
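PER as defined by John Hollinger also adjusts for team pace and league averages (scaled so the league average is 15), so it is not reproduced here. To illustrate the "adds positive stats and subtracts negative ones" idea, this sketch instead computes Game Score, a simpler linear-weights formula also created by Hollinger; it is a different metric from PER, and the function name and sample stat line are hypothetical.

```python
def game_score(pts, fg, fga, ft, fta, orb, drb, stl, ast, blk, pf, tov):
    """Hollinger's Game Score for one box-score line: a simpler
    linear-weights cousin of PER, not the PER formula itself."""
    return (pts + 0.4 * fg - 0.7 * fga - 0.4 * (fta - ft)
            + 0.7 * orb + 0.3 * drb + stl + 0.7 * ast + 0.7 * blk
            - 0.4 * pf - tov)

# Hypothetical box-score line
gs = game_score(pts=30, fg=12, fga=20, ft=6, fta=8, orb=2, drb=6,
                stl=2, ast=5, blk=1, pf=3, tov=3)
print(round(gs, 1))

# Effect of one extra shot on an otherwise empty line
miss_cost = game_score(0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)  # missed field goal
make_gain = game_score(2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)  # made two-pointer
print(round(miss_cost, 1), round(make_gain, 1))
```

The weights show the point value system at work: a missed shot subtracts 0.7, while a made two-pointer adds 2 points plus 0.4 for the make minus 0.7 for the attempt.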

Player Position Types

My fourth visualization is a nested pie chart of the players' positions: the outer ring shows the share of each position, and the inner ring shows the players within it. In my opinion, this was a very eye-opening visualization. It made me realize that when trying to decide who the greatest player of all time is, we should compare players within their position. For example, we should compare Jordan to someone like Iverson, and LeBron to Durant. This helped me build a case for who the greatest player of all time ultimately is.


# Sort the top 10 by position so players who share a position sit next to each other
pie_df1 = df_top10.sort_values('Position', ascending=False)

# Outer ring: players per position; inner ring: one equal wedge per player.
# groupby(sort=False) keeps the row order of pie_df1, so the two rings line up
position_counts = pie_df1.groupby('Position', sort=False).size()
player_counts = pie_df1.groupby('PlayerName', sort=False).size()

fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap("tab20c")
outer_colors = colormap([0, 4, 8, 12, 16])

# Outer ring: share of the top 10 at each position
position_counts.plot(
       kind='pie', ax=ax, radius=1, colors=outer_colors, pctdistance=0.85, labeldistance=1.1,
       wedgeprops={'edgecolor':'w'}, textprops={'fontsize': 12},
       autopct='%1.1f%%',
       startangle=90)

# Inner ring: one wedge per player, labeled from the index of player_counts
inner_colors = colormap([1, 2, 3, 19, 5, 6, 7, 9, 13, 17])
player_counts.plot(
       kind='pie', ax=ax, radius=0.7, colors=inner_colors, pctdistance=0.4, labeldistance=0.75,
       wedgeprops={'edgecolor':'w'}, textprops={'fontsize': 5.9, 'horizontalalignment': 'center', 'verticalalignment': 'center'},
       autopct='%1.2f%%',
       startangle=90)

ax.yaxis.set_visible(False)
ax.set_title('Career Top 10 PPG Scorers Position Types (1985-2018)', fontsize=16)

ax.axis('equal')

plt.tight_layout()

plt.show()
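Comparing players only within their position, as suggested above, amounts to grouping by position before comparing. A minimal sketch with a hypothetical five-player table: the names, values, and position labels are illustrative placeholders, not read from players.csv, and PER is used as the example metric.

```python
import pandas as pd

# Hypothetical subset; position labels and PER values are illustrative only
top = pd.DataFrame({
    'PlayerName': ['Player A', 'Player B', 'Player C', 'Player D', 'Player E'],
    'Position': ['Shooting Guard', 'Small Forward', 'Shooting Guard', 'Small Forward', 'Center'],
    'CareerPER': [27.0, 26.5, 21.0, 25.0, 26.0],
})

# Which players are compared against each other
by_position = top.groupby('Position')['PlayerName'].apply(list)

# Highest-PER player within each position (idxmax returns a row label per group)
leaders = top.loc[top.groupby('Position')['CareerPER'].idxmax(), ['Position', 'PlayerName']]
print(by_position)
print(leaders)
```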

Combined Points+Rebounds+Assists per Game

In my final visualization, I made a stacked bar chart showing which of the top 10 had the highest combined points, rebounds, and assists (PRA) per game. Because each segment is a career per-game average, the height of each bar is a per-game figure, not a career total. Combining all three stats shows who contributes the most across the floor, and the greatest of all time should be able to do it all. Wilt Chamberlain had the highest combined average at 57.4 PRA, while George Gervin had the lowest at 33.0.

# Combined per-game points + rebounds + assists
df_top10['TotalPRA'] = df_top10['CareerPPG'] + df_top10['CareerTRB'] + df_top10['CareerAST']

# One row per player, one column per stat to stack (set_index returns a new frame)
stacked_df = df_top10[['PlayerName', 'CareerPPG', 'CareerTRB', 'CareerAST']].set_index('PlayerName')

colors = ['lightblue', 'orangered', 'lightgreen']

fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1,1,1)

stacked_df.plot(kind='bar', stacked=True, ax=ax, color=colors)
plt.title('Career Average Points+Rebounds+Assists per Game for Top 10 Players (1985-2018)\n Stacked Bar Plot', fontsize=18)
plt.xlabel('Player', fontsize=18)
plt.ylabel('Points+Rebounds+Assists per Game', fontsize=18, labelpad=10)
plt.xticks(rotation=0, horizontalalignment = 'center', fontsize=12)

plt.yticks(fontsize=14)

# Reverse the legend so it reads top to bottom in the same order as the stack
handles, labels = ax.get_legend_handles_labels()

# Print each segment's value in the middle of the segment
for container in ax.containers:
    ax.bar_label(container, label_type='center', fontsize=14, color='black')

plt.legend(handles[::-1], labels[::-1], loc='best', fontsize=14)

plt.show()
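The highest and lowest combined values quoted above can be read straight from the summed column with `idxmax` and `idxmin`. A minimal sketch with hypothetical per-game averages (placeholders, not values from players.csv); since the inputs are per-game averages, the sum is a per-game figure.

```python
import pandas as pd

# Hypothetical per-game averages (placeholders)
avgs = pd.DataFrame({
    'CareerPPG': [28.0, 25.0, 26.0],
    'CareerTRB': [12.0, 5.0, 8.0],
    'CareerAST': [4.0, 3.0, 6.0],
}, index=['Player A', 'Player B', 'Player C'])

# Points + rebounds + assists per game, summed across each row
avgs['PRA'] = avgs[['CareerPPG', 'CareerTRB', 'CareerAST']].sum(axis=1)

print(avgs['PRA'].idxmax(), avgs['PRA'].max())
print(avgs['PRA'].idxmin(), avgs['PRA'].min())
```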

Conclusion

In conclusion, my analysis of the top 10 career points per game scorers in the NBA data set (1985-2018) gave me a good answer on who I think the best player of all time is. Some of these players excelled in certain categories more than others, but my fourth visualization, of the players by position, changed my mind about who could be deemed the best player. I conclude that the best player of all time shouldn't be chosen in a general setting; instead, players must be compared by position. Each position is played differently and requires different abilities and skills, which makes deciding on a single best overall player very hard, and any of the players in this top 10 could be argued to be the best of all time. Going forward, this means the greatest-of-all-time debate can't be MJ vs. LeBron. Instead, it should be LeBron vs. Durant and MJ vs. Allen Iverson. Comparing players by position gives a better argument for who could be deemed the best of all time.