IS460 Python Final

NBA Free Throw Shooting

For this project, I was asked to create five visualizations in Python from a data set of my choosing, with the goal of learning something from the data. I chose a data set on NBA free throw shooting because I am a big sports fan and wanted to work with something I found interesting. The data spans 10 years, from 2006 to 2016, and records details such as the player, whether the free throw was made or missed, and when it happened (the date and the quarter). With almost 620,000 observations of 11 variables, there is plenty of information to go through.

Graph 1

First, I imported the packages I would need and read in the data. I then broke the data down to create my first visualization, a donut chart showing free throws attempted in each quarter. To do this, I grouped the data by quarter and counted the observations, then removed periods 5 through 8, which represent overtimes, as they would clutter the chart and skew the comparison between quarters.

As the chart shows, the quarter does seem to affect how many free throws are shot: the first quarter has fewer attempts than any other, while the fourth has more than each of the previous three. This makes sense, as teams that are trailing late in the game are more likely to foul on purpose as a tactic to get the ball back.


import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter
df = pd.read_csv(r'C:/Users/nrhar/OneDrive/Documents/IS460/free_throws.csv')
df['period'] = df['period'].astype(int)

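# Count free throw attempts in each period (one row per attempt)
x = df.groupby('period')['period'].count().reset_index(name = 'count')

# Periods 5-8 are overtimes; leave them out of the quarter comparison
omit = [5, 6, 7, 8]

# .copy() gives x2 its own data, independent of x, before renaming the columns
x2 = x.loc[ ~x['period'].isin(omit)   ].copy()
x2.columns = ['Quarter', 'Total']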

fig = plt.figure(figsize = (10, 10))
ax = fig.add_subplot(1, 1, 1)

colormap = plt.get_cmap("Accent")
outer_colors = colormap([1, 2, 3, 4])

total_freethrows = x2.Total.sum()

x2.groupby(['Quarter'])['Total'].sum().plot(
        kind = 'pie', radius = 1, colors = outer_colors, pctdistance = 0.85, labeldistance = 1.1,
        wedgeprops = dict(edgecolor='w'), textprops = {'fontsize':18}, 
        autopct = lambda p: '{:.2f}%\n({:.1f}K)'.format(p, (p/100)*total_freethrows/1000), startangle = 90)

hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Free Throws Attempted by Quarter', fontsize=18)

ax.text(0, 0, 'Total Free Throws\n' + str(round(total_freethrows/1000, 2)) + 'K', size = 18, ha='center', va='center')

ax.axis('equal')

plt.tight_layout()

plt.show()
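
As a quick check on the counting step behind the donut chart, here is a minimal sketch on made-up data. The `toy` frame and its values are assumptions for illustration only and are not taken from free_throws.csv; the column names simply mirror the ones used above.

```python
import pandas as pd

# Hypothetical toy data standing in for free_throws.csv (values are made up).
toy = pd.DataFrame({
    "period":    [1, 1, 2, 3, 4, 4, 4, 5],
    "shot_made": [1, 0, 1, 1, 1, 0, 1, 1],
})

# Keep the four regulation quarters only, then count attempts per quarter.
regulation = toy[toy["period"].between(1, 4)]
attempts_by_quarter = (
    regulation.groupby("period").size().rename("Total").rename_axis("Quarter")
)

# Share of regulation attempts taken in each quarter, as a percentage.
share_by_quarter = (attempts_by_quarter / attempts_by_quarter.sum() * 100).round(2)

print(attempts_by_quarter)
print(share_by_quarter)
```

These shares are the same figures the donut chart prints inside each wedge, so a table like this is an easy way to confirm the labels.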

Graph 2

After seeing how attempts varied by quarter, I wanted a better idea of how much this affected actual points scored. More free throws were clearly being shot in the fourth quarter than in the others, but I wanted to find out whether teams were fouling poor shooters and therefore surrendering fewer points than would otherwise be expected. To do this, I created a dual axis bar chart showing both the number of free throws made and the number attempted in each quarter.

The chart shows that the ratio of shots made to shots attempted is fairly even across all quarters, sitting at roughly 75% for all four. This suggests that teams are not fouling effectively late in games: if they were, they would be sending the worst shooters to the line and the percentage in the fourth quarter would be lower. The graph has more to show, however, as the discrepancy in points between the first and fourth quarters is huge, with the fourth producing over 50,000 more free throw points than the first.


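# Free throws made per period, limited to the four regulation quarters
df_made = df.groupby('period')['shot_made'].sum().reset_index(name = 'fts_made')
df_made1 = df_made.loc[ ~df_made['period'].isin(omit)   ]

# Join made and attempted counts on the quarter instead of relying on row order
df_perc = df_made1.merge(x2, left_on = 'period', right_on = 'Quarter')
df_perc.drop('Quarter', axis = 1, inplace = True)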

def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x() + each_bar.get_width()/2, height*1.01, symbol + format(height, place_of_decimals),
                    fontsize = 11, color = 'black', ha = 'center', va = 'bottom')
                    
fig = plt.figure(figsize = (18, 10))
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()
bar_width = 0.4

x_pos = np.arange(4)
made_bars = ax1.bar(x_pos-(0.5*bar_width), df_perc.fts_made, bar_width, color = 'lightgreen',
                    edgecolor = 'black', label = 'Free Throws Made')
attempted_bars = ax2.bar(x_pos+(0.5*bar_width), df_perc.Total, bar_width, color = 'gray',
                    edgecolor = 'black', label = 'Free Throws Attempted')

ax1.set_ylim(ymax = max(df_perc.Total)*1.1)

ax2.set_ylim(ymax = max(df_perc.Total)*1.1)

ax1.set_xlabel('Quarter', fontsize = 18)
ax1.set_ylabel('Free Throws Made', fontsize = 18)
ax2.set_ylabel('Free Throws Attempted', fontsize = 18, rotation = 270, labelpad = 20)
ax1.tick_params(axis = 'y', labelsize = 14, rotation = 0)
ax2.tick_params(axis = 'y', labelsize = 14, rotation = 0)

plt.title('Free Throws Made and Attempted by Quarter', fontsize = 18)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(df_perc.period, fontsize = 15)

autolabel(made_bars, ax1, '.0f', '')
autolabel(attempted_bars, ax2, '.0f', '')

plt.show()
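
The bar chart shows made and attempted counts, but the percentage itself has to be read off by eye. A minimal sketch of computing it directly is below, again on made-up data: the `toy` frame and its values are assumptions, and only the column names `period` and `shot_made` come from the code above.

```python
import pandas as pd

# Hypothetical toy data (values are made up, not from free_throws.csv).
toy = pd.DataFrame({
    "period":    [1, 1, 1, 1, 4, 4, 4, 4, 4],
    "shot_made": [1, 1, 1, 0, 1, 1, 0, 1, 1],
})

# Made and attempted free throws per quarter in a single aggregation.
by_quarter = toy.groupby("period")["shot_made"].agg(fts_made="sum", fts_att="count")

# Free throw percentage per quarter.
by_quarter["ft_pct"] = (by_quarter["fts_made"] / by_quarter["fts_att"] * 100).round(2)

# Extra points from free throws in the fourth quarter compared with the first.
extra_made = by_quarter.loc[4, "fts_made"] - by_quarter.loc[1, "fts_made"]

print(by_quarter)
print(extra_made)
```

Run against the real data, the `ft_pct` column would give the per-quarter percentages directly, and `extra_made` would give the gap in free throw points between the first and fourth quarters.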

Graph 3

After looking at aggregate data in the first two graphs, I wanted to dig deeper and look at free throw shooting on an individual player basis. To do this, I grouped the data by player name, counted the number of free throws each player attempted, and sorted the result in descending order so that the leaders in attempts were at the top. From there, all I had to do was select the first 10 records and plot them in a horizontal bar chart.

Among these top 10 players in free throw attempts, there are no surprises in terms of the names featured, as all were established stars during the period the data covers (2006-2016), with iconic names like LeBron James, Kobe Bryant, and Kevin Durant all near the top. What is surprising, however, is the gap between the top two players and everyone else: LeBron James and Dwight Howard have around 2,000 more attempts than the players directly behind them and almost double the attempts of the rest of the top 10. I would not have expected such a large gap within a top 10 list like this.


df1 = df.groupby('player')['player'].count().reset_index(name = 'total')
df1 = pd.DataFrame(df1)

df1 = df1.sort_values('total', ascending = False)
df1.reset_index(inplace = True, drop = True)

df2 = df1.loc[0:9]

fig = plt.figure(figsize = (18, 12))

ax1 = fig.add_subplot(1, 1, 1)
ax1.barh(df2.player, df2.total, color = 'red')

for row_counter, value_at_row in enumerate(df2.total):
    ax1.text(value_at_row + 2, row_counter, str(value_at_row), color = 'black', size = 12, fontweight = 'bold')
plt.xlim(0, df2.total.max()*1.1)

ax1.set_title('Top 10 Players in Total Free Throws Attempted', size = 20)
ax1.set_xlabel('Free Throws Attempted', fontsize = 15)
ax1.set_ylabel('Player Name', fontsize = 15)
plt.xticks(fontsize = 15)

plt.yticks(fontsize = 15)

plt.show()
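
The group, count, sort, and slice steps used for this chart can also be written more compactly. The sketch below is one possible shorthand, shown on made-up data: the player names and values in `toy` are hypothetical, and `value_counts` is swapped in for the groupby used above.

```python
import pandas as pd

# Hypothetical toy data: one row per free throw attempt (names are made up).
toy = pd.DataFrame({
    "player": ["Player A", "Player B", "Player A", "Player C", "Player A", "Player B"],
})

# value_counts() counts rows per player and sorts in descending order,
# so head(n) returns the n players with the most attempts.
attempts = toy["player"].value_counts()
top_n = attempts.head(2)

print(top_n)
```

With the real data, `head(10)` would give the same ten players as the sorted groupby, apart from how ties in the count are ordered.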

Graph 4

To continue looking at these leaders in free throw attempts, I wanted to see how the quarter affected each player’s attempts individually. A multiple line plot let me look for trends in when these players got to the line, which could support further analysis comparing the players’ play styles with their free throw trends.

This plot shows results similar to the earlier chart of free throws by quarter: attempts are lowest in the first quarter, in the middle for the second and third, and highest in the fourth. What is slightly different is that most players show only a small increase in the fourth rather than a stark one, and two players actually shot more in another quarter than in the fourth. The two players who did have a sizable jump, LeBron James and Dwight Howard, were also the clear leaders in total attempts.


df3 = df.groupby(['player', 'period'])['player'].count().reset_index(name = 'total_by_quarter')
df3 = pd.DataFrame(df3)

df3 = df3.sort_values('total_by_quarter', ascending = False)
df3.reset_index(inplace = True, drop = True)

top10 = df2['player']

df4 = df3.loc[ df3['player'].isin(top10)   ]
df5 = df4.loc[ ~df4['period'].isin(omit)   ]

fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)


mycolors = {'LeBron James':'blue',
            'Dwight Howard':'red',
            'Kevin Durant':'green',
            'Kobe Bryant':'purple',
            'Dwyane Wade':'yellow',
            'Carmelo Anthony':'gray',
            'Dirk Nowitzki':'brown',
            'James Harden':'orange',
            'Russell Westbrook':'teal',
            'Chris Bosh':'black'}

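# Group by the column name (not a one-item list) so each key is a plain string,
# and sort by quarter so each line is drawn from the first quarter to the fourth
for key, grp in df5.groupby('player'):
    grp.sort_values('period').plot(ax = ax, kind = 'line', x = 'period', y = 'total_by_quarter', color = mycolors[key], label = key, marker = '8')

# Show only the four regulation quarters on the x-axis
ax.set_xticks([1, 2, 3, 4])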
    
plt.title('Total Free Throws Attempted by Player by Quarter', fontsize = 18)
ax.set_xlabel('Quarter', fontsize = 18)
ax.set_ylabel('Free Throws Attempted', fontsize = 18)
ax.tick_params(axis = 'x', labelsize = 14, rotation = 0)
ax.tick_params(axis = 'y', labelsize = 14, rotation = 0)
    
plt.show()
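
To check statements like "two players shot more in another quarter than the fourth" without reading them off the lines, the per-player counts can be pivoted into a wide table. This is a minimal sketch on made-up data: the player names and counts in `toy` are hypothetical, and only the column names match the code above.

```python
import pandas as pd

# Hypothetical per-player, per-quarter attempt counts (values are made up).
toy = pd.DataFrame({
    "player": ["Player A", "Player A", "Player A", "Player B", "Player B", "Player B"],
    "period": [4, 1, 2, 2, 4, 1],
    "total_by_quarter": [30, 10, 20, 15, 12, 9],
})

# One row per quarter, one column per player; sort_index keeps quarters in order.
wide = toy.pivot(index="period", columns="player", values="total_by_quarter").sort_index()

# Quarter in which each player attempted the most free throws.
peak_quarter = wide.idxmax()

print(wide)
print(peak_quarter)
```

A wide table like this can also be plotted in one call with `wide.plot()`, which draws one line per player with the quarters already in order.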

Graph 5

To finish the analysis, I looked at how these 10 players shot percentage-wise by quarter: they all took a lot of free throws, but their percentages would obviously vary. To do this, I calculated each player’s free throw percentage by quarter and plotted the results in a stacked bar chart. Each segment is one quarter’s percentage, so the full height of a bar is the sum of the four quarterly percentages rather than a single overall percentage, but it still lets us compare players overall as well as quarter by quarter.

Looking at this chart, it is clear that each player’s percentage is fairly constant from quarter to quarter, as there are no players with one segment significantly larger than another. One thing that jumps out, however, is that Dwight Howard and LeBron James, more so the former, have a much lower free throw percentage than their fellow stars, which may be related to the fact that they shoot more free throws in general and specifically in the fourth. This may mean that the two are being targeted with fouls because of their propensity to miss.


# Made and attempted free throws for each player in each period, in one pass
df8 = df.groupby(['player', 'period'])['shot_made'].agg(fts_made = 'sum', fts_att = 'count').reset_index()
df8['ft_pct'] = round((df8['fts_made']/df8['fts_att'])*100, 2)

df9 = df8.loc[ df8['player'].isin(top10)   ]
df10 = df9.loc[ ~df9['period'].isin(omit)   ]

df10 = df10.pivot(index = 'player', columns = 'period', values = 'ft_pct')

fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)

df10.plot(kind = 'bar', stacked = True, ax = ax)

plt.ylabel('Free Throw Percentage', fontsize = 18)
plt.title('Top 10 Players Free Throw Percentage by Quarter', size = 20)
plt.xticks(rotation = 90, horizontalalignment = 'center', fontsize = 15)

plt.yticks(fontsize = 15)

ax.set_xlabel('Player', fontsize = 20)

plt.show()
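
A stacked bar adds the four quarterly percentages together, so its total height is not a player’s overall free throw percentage. The sketch below shows the difference on made-up data: the player names and values in `toy` are hypothetical, and it assumes `shot_made` is coded as 1 for a make and 0 for a miss, as in the code above.

```python
import pandas as pd

# Hypothetical toy data (values are made up, not from free_throws.csv).
toy = pd.DataFrame({
    "player":    ["Player A"] * 4 + ["Player B"] * 4,
    "period":    [1, 1, 4, 4, 1, 4, 4, 4],
    "shot_made": [1, 1, 1, 0, 0, 1, 0, 0],
})

# Overall percentage: the mean of a 0/1 column is makes divided by attempts.
overall = toy.groupby("player")["shot_made"].mean().mul(100).round(2)

# Percentage per player and quarter, with one column per quarter.
by_quarter = (
    toy.groupby(["player", "period"])["shot_made"].mean().mul(100).round(2).unstack("period")
)

# Height of a stacked bar: the sum of the quarterly percentages.
stacked_height = by_quarter.sum(axis=1)

print(overall)
print(by_quarter)
print(stacked_height)
```

In this toy example Player A shoots 75% overall, while the stacked height is 150, which is why the overall figure is worth computing separately when comparing players.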