# add these two lines underneath the chunk where you have included the use_python line.
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/ProgramData/Anaconda3/Library/plugins/platforms'

Introduction

After analyzing the 2020 NBA season, I wanted to take a deeper dive into the NBA and its statistics. In this project, I analyze which colleges have produced the most productive NBA players and how draft year and pick number factor into a player's performance. I found a data set covering every draft pick from 1990 through 2021, with each player's stats up to the end of last season. We will analyze the draft picks and the colleges the players came from to see whether any patterns emerge.

Dataset

The data set contains individual player stats for NBA draft picks from 1990 to 2021. It tracks each player drafted in this time frame along with his college and career statistics such as minutes played, points, rebounds, and more.


nba[['Yrs','TOTPTS','TOTTRB', 'TOTAST', 'TOTMP','MPG', 'FG%']].describe()
##                Yrs        TOTPTS  ...          MPG          FG%
## count  1621.000000   1621.000000  ...  1621.000000  1616.000000
## mean      6.305367   3511.962369  ...    18.040592     0.435562
## std       4.640924   4788.948725  ...     8.730546     0.084803
## min       1.000000      0.000000  ...     0.000000     0.000000
## 25%       2.000000    253.000000  ...    10.800000     0.403000
## 50%       5.000000   1516.000000  ...    17.600000     0.435000
## 75%      10.000000   5081.000000  ...    24.900000     0.473000
## max      22.000000  36559.000000  ...    41.100000     1.000000
## 
## [8 rows x 7 columns]
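
The printed summary elides the middle columns (the `...`), and the `FG%` count (1616) is lower than the other counts (1621), which suggests a few missing values. A minimal sketch of how the full summary and the missing-value counts could be inspected is shown here. The `nba_toy` frame and its numbers are invented stand-ins for the real `nba` data, used only so the snippet runs on its own.

```python
import pandas as pd

# Toy stand-in for the `nba` DataFrame; all values are invented for illustration.
nba_toy = pd.DataFrame({
    'Yrs':    [1, 5, 10],
    'TOTPTS': [100, 1500, 9000],
    'TOTTRB': [50, 600, 3000],
    'TOTAST': [20, 300, 2000],
    'TOTMP':  [400, 6000, 25000],
    'MPG':    [8.0, 18.5, 30.1],
    'FG%':    [0.40, None, 0.47],   # one missing value on purpose
})

cols = ['Yrs', 'TOTPTS', 'TOTTRB', 'TOTAST', 'TOTMP', 'MPG', 'FG%']
summary = nba_toy[cols].describe()

# Temporarily lift the column limit so no columns are replaced by "..."
with pd.option_context('display.max_columns', None, 'display.width', 200):
    print(summary)

# Transposing is another option: one row per statistic column
summary_t = summary.T

# Count missing values per column to explain differing "count" rows
missing = nba_toy[cols].isna().sum()
print(missing)
```

With the real data, the same calls on `nba` would show which column has the missing entries.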

Findings

Using the draft dataset, I found that Kentucky, Duke, and North Carolina have produced some of the most productive NBA players. In the visuals below, we see that these schools lead in total points, minutes, and years played since 1990. I also found that pick number does have an effect on productivity. I will take a deeper look as we go through the visualizations.

Tab 1

This graph shows the schools whose players have accumulated the most minutes per game and years played over the last 30 years. As you can see from the graph below, the leading schools include Kentucky, Duke, and UNC. Note that this graph does not show that these schools have the longest-tenured players; they may simply have had the most players drafted, since the values are summed across players. Further analysis would be needed to separate the two effects.


plt.figure(figsize=(18,10))
plt.stackplot(b.index, b['MPG'],b['Yrs'],labels=['Total Minutes/Game','Total Years'])
#barh, scatter

plt.style.use('seaborn')
plt.title('Sum of Minutes Played per Game from 1990-2021',fontsize = 40)
plt.xlabel('College', fontsize = 30)
plt.ylabel('Sum of Minutes per Game and Years', fontsize = 20)
plt.xticks(fontsize=20, rotation=85)
plt.yticks(fontsize=20)
plt.legend(fontsize=20)
plt.show()
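
One way to follow up on the caveat in Tab 1 (summed totals favor schools with many draftees) is to compare per-player averages with the totals. The sketch below assumes the column names `College`, `Yrs`, and `MPG` from the dataset; the school names and numbers are invented toy values, not results from the real data.

```python
import pandas as pd

# Toy data: invented values, generic school names
toy = pd.DataFrame({
    'College': ['School A'] * 4 + ['School B'] * 2 + ['School C'],
    'Yrs':     [3, 4, 5, 8, 9, 7, 12],
    'MPG':     [10.0, 20.0, 30.0, 20.0, 25.0, 35.0, 28.0],
})

# Per college: number of drafted players, summed years, and per-player means
by_college = (
    toy.groupby('College')
       .agg(players=('Yrs', 'size'),
            total_yrs=('Yrs', 'sum'),
            mean_yrs=('Yrs', 'mean'),
            mean_mpg=('MPG', 'mean'))
       .sort_values('mean_yrs', ascending=False)
)
print(by_college)
```

In this toy example the school with the largest total is not the one with the longest average career, which is exactly the distinction the stacked totals cannot show.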

Tab 2

This graph shows the average points and rebounds per game for each draft class. There appears to be a positive correlation between points and rebounds per game, as the two trends look very similar. The 2005-2010 draft classes appear to have some of the highest averages.

plt.figure(figsize = (14,10))
plt.plot(drftppg['DraftYear'], drftppg['PPG'], color = 'red', marker = 'o', label = 'PPG')
plt.plot(drftrpg['DraftYear'], drftrpg['RPG'], color = 'blue', marker = 'x', label = 'RPG')
plt.title('PPG/RPG average by draft year')
plt.xlabel('Draft Year')
plt.ylabel('Points/Rebounds Per Game Average')
plt.legend()
plt.show()
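
The visual similarity of the two trends can be checked with a correlation coefficient. This is a minimal sketch under the assumption that the plotted series are per-draft-year means of `PPG` and `RPG`; the toy numbers below are invented and only make the snippet runnable.

```python
import pandas as pd

# Toy player-level data: invented values
toy = pd.DataFrame({
    'DraftYear': [2000, 2000, 2001, 2001, 2002, 2002],
    'PPG':       [8.0, 12.0, 6.0, 10.0, 10.0, 14.0],
    'RPG':       [3.0, 5.0, 2.0, 4.0, 4.0, 7.0],
})

# Average per draft class (assumed to mirror how the plotted series were built)
yearly = toy.groupby('DraftYear')[['PPG', 'RPG']].mean()

# Pearson correlation between the two yearly averages (pandas default method)
r = yearly['PPG'].corr(yearly['RPG'])
print(yearly)
print('Pearson r between yearly PPG and RPG:', round(r, 3))
```

A value close to 1 would support the positive relationship suggested by the plot, though with only one point per draft year it should be read cautiously.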

Tab 3

Looking further into the players' colleges, the same three schools have produced the highest-scoring players. This is consistent with the graph in Tab 1.

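# Sum career points per college; groupby already skips rows with a missing College.
# (The earlier nba.dropna() result was overwritten and never used, so it is dropped here.)
x = nba.groupby('College', as_index=False)['TOTPTS'].sum()
# Rank colleges from most to fewest total points and renumber rows from 0
x = x.sort_values('TOTPTS', ascending=False).reset_index(drop=True)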
plt.figure(figsize=(18, 10))
plt.bar(x.loc[0:10, 'College'], x.loc[0:10, 'TOTPTS'], label = 'Total Points')
plt.legend(fontsize= 12)
plt.title('Which colleges produce the best scorers?')
plt.xlabel('College')
plt.ylabel('Total Points')
# Format the y-axis tick labels with thousands separators
current_values = plt.gca().get_yticks()
plt.gca().set_yticklabels(['{:,.0f}'.format(v) for v in current_values])
plt.show()

Tab 4

The number outside each slice indicates the draft pick number. Here I wanted to see whether there were any discrepancies in scoring among the top ten picks. The data shows that picks 1 through 5 have produced higher-scoring players. From a value standpoint, the 9th pick appears to offer some of the highest value in the second half of the top 10.

rebast = nba[["DraftYear", "TOTPTS", "TOTAST", "TOTTRB", "Pk"]]
#rebast.groupby(['DraftYear']).sum().sort_values('DraftYear', ascending = False)

top10 = rebast[rebast['Pk'] < 11]
top10f = top10.filter(['Pk','TOTPTS'])


tot_pts= top10.TOTPTS.sum()

top10f.groupby(['Pk']).sum().plot(
    kind='pie', radius=1, labeldistance = 1.2, y='TOTPTS', legend=None, figsize=(16,16), 
    autopct ='%.1f%%', title= "Total Points Scored by top 10 Picks 1990-2021", normalize=True)
plt.show()
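
The percentages drawn on the pie can also be computed directly, alongside a per-player average for each pick. This is a sketch assuming the `Pk` and `TOTPTS` columns; the toy rows are invented, and the cutoff is lowered to the top 3 picks only to keep the example small (the chart above uses `Pk < 11`).

```python
import pandas as pd

# Toy data: invented values
toy = pd.DataFrame({
    'Pk':     [1, 1, 2, 2, 3, 11],
    'TOTPTS': [20000, 10000, 8000, 4000, 6000, 9000],
})

top = toy[toy['Pk'] <= 3]  # hypothetical cutoff for the toy example

# Total points, number of players, and mean points per player for each pick
per_pick = top.groupby('Pk')['TOTPTS'].agg(total='sum', players='count', mean='mean')

# Share of all points scored by the selected picks, in percent
per_pick['share_pct'] = 100 * per_pick['total'] / per_pick['total'].sum()
print(per_pick)
```

Looking at the mean next to the total helps when picks do not have the same number of players in the data.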

Tab 5

The graph below displays all the first overall picks from 2000 to the present day. Combining points, rebounds, and assists per game, we can see that most number 1 overall picks make a large impact, with a few outliers. 2003 particularly stands out, and rightfully so, as LeBron James was the pick in that draft slot.

points = nba[nba['DraftYear'] >= 2000].set_index("DraftYear").groupby("DraftYear").head(1)[["PPG","RPG","APG"]]
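# Create the figure explicitly and draw on its axes; a separate figure() call
# would leave an empty figure, because pandas opens its own for the bar chart
fig, ax = plt.subplots(figsize=(16, 12), dpi=100)
points.plot.bar(stacked=True, ax=ax)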
plt.xlabel("Year Drafted")
plt.ylabel("PTS/REB/AST")
plt.title("Pts,Reb,Ast of the First Picks")
#plt.figure(figsize=(16,12), dpi=100)
plt.savefig('test.png')
plt.show()
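
Taking the first row of each draft year only returns the first overall pick if the rows are already ordered by pick number within each year. A more explicit alternative, sketched here on invented toy data and assuming the `Pk` column holds the overall pick number, is to filter on `Pk == 1`.

```python
import pandas as pd

# Toy data: invented values, deliberately not sorted by pick within 2000
toy = pd.DataFrame({
    'DraftYear': [2000, 2000, 2001, 2001],
    'Pk':        [2, 1, 1, 2],
    'PPG':       [9.0, 15.0, 20.0, 7.0],
    'RPG':       [4.0, 8.0, 6.0, 3.0],
    'APG':       [2.0, 3.0, 5.0, 1.0],
})

# Select first overall picks explicitly instead of relying on row order
first_picks = (
    toy[(toy['DraftYear'] >= 2000) & (toy['Pk'] == 1)]
       .set_index('DraftYear')[['PPG', 'RPG', 'APG']]
       .sort_index()
)
print(first_picks)
```

Here the 2000 row with `Pk == 2` appears first in the data, so a row-order approach would have picked the wrong player for that year.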

Conclusion

In conclusion, we have consistently seen that the schools producing the best outcomes are Duke, Kentucky, and UNC. These schools rank in the top 3 in both the minutes-played graph and the points-scored graph. I also found that players picked in the top 5 of the draft scored more points than those picked later.