College Hockey Commitment Analysis

Loading Packages and Data Below

# piece of code given from the professor
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = '/Users/markgranatire/anaconda3/Library/plugins/platforms'

# import the packages used throughout the notebook, then read in the data and derive the date columns used for reporting

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import matplotlib.patches as mpatches

warnings.filterwarnings("ignore")

filename = 'CollegeHockeyData.csv'

df = pd.read_csv(filename, skiprows = 0)

# following formats the date and splits it into separate columns to report on
df['Date'] = pd.to_datetime(df['Date'], format = '%m/%d/%y')

df['Commit Year'] = df['Date'].dt.year
df['Commit Month'] = df['Date'].dt.month
df['Commit Month Name'] = df['Date'].dt.strftime('%B')
df['Commit Month Name Abbrev'] = df['Date'].dt.strftime('%b')
df['Commit Day Name'] = df['Date'].dt.strftime('%A')
df['Commit Day Name Abbrev'] = df['Date'].dt.strftime('%a')

Introduction to the Data

I’ve compiled a dataset on player commitments to college hockey programs, sourced from https://www.collegecommitments.com/CommitList.aspx?a1bdi=NCAAD1. To get the data, I copied the table on the website into Excel, saved it as CollegeHockeyData.csv, and read that file. The website is updated daily as players announce their commitments to different schools. When the data was pulled, there were 1070 commitments, and I used different visualizations to understand them. The dataset covers every player who has announced a commitment, including the player name, date of commitment, current hockey team, the college they committed to, and starting season.
One challenge I encountered was that the data did not include the conference of each college team unless you filtered the table on the website. To address this, I modified the data so that each college hockey team has a conference attached to it. The conferences are shown in their abbreviated form: Atlantic Hockey Association (AHA), Big Ten Conference (BigTen), Central Collegiate Hockey Association (CCHA), Eastern College Athletic Conference Hockey (ECAC), Hockey East Association (Hockey East), National Collegiate Hockey Conference (NCHC), and a handful of Independent teams not associated with a conference. Additionally, the current team column had that team’s league in parentheses. I split it into the current team and the current league to show which junior league each player is part of. This streamlined the dataset and kept the focus on the essential data about player commitments.
Hockey is different from other sports in that most players play for a junior hockey team before they attend college. Looking at current teams can give colleges insight into which leagues to focus their recruiting on.
Overall, this dataset provides a comprehensive resource for exploring how college hockey teams will look in the seasons to come, junior hockey trends, and college hockey conference dynamics, aligning with my lifelong passion for hockey.
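
The two cleanup steps described above (attaching a conference to each college and splitting the league out of the current team) can be sketched on a few made-up rows. This is only an illustration under stated assumptions: the rows, the `conference_lookup` dictionary, and the `Team`/`League` column names are hypothetical and are not the notebook's actual data or method.

```python
import pandas as pd

# Illustrative rows only; the real file has one row per commitment.
sample = pd.DataFrame({
    'College': ['Maine', 'Michigan', 'Quinnipiac', 'Example College'],
    'Current Team': ['Team A (USHL)', 'Team B (NTDP)', 'Team C (BCHL)', 'Team D (NAHL)'],
})

# Hypothetical lookup table; a full version would list every program.
conference_lookup = {
    'Maine': 'Hockey East',
    'Michigan': 'BigTen',
    'Quinnipiac': 'ECAC',
}

# map() leaves NaN wherever a college is missing from the lookup,
# which makes unmapped programs easy to find.
sample['College Conference'] = sample['College'].map(conference_lookup)
unmapped = sample.loc[sample['College Conference'].isna(), 'College'].tolist()

# Split "Team (League)" into a team column and a league column.
sample[['Team', 'League']] = sample['Current Team'].str.extract(r'(.+)\s\((.+)\)')

print(unmapped)
print(sample[['Team', 'League', 'College Conference']])
```

Checking `unmapped` before plotting is a cheap way to confirm that every college ended up with a conference.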

Findings

First Visualization - Scatter Plot

# created df2 and pulled columns for this visualization
df2 = df.groupby(['DOB', 'Commit Month'])['DOB'].count().reset_index(name='Count')
df2 = pd.DataFrame(df2)

df2['Count_Fives'] = round(df2['Count']/5, 0)

df2['DOB'] = df2['DOB'].astype('int')


# following plots the visualization
plt.figure(figsize=(18, 10))

plt.scatter(df2['Commit Month'], df2['DOB'], marker='8', cmap='viridis', 
            c = df2['Count_Fives'], s = df2['Count_Fives']*30, edgecolors = 'black')

plt.title('Player Commitments by Month', fontsize=18)
plt.xlabel('Month of Commitment', fontsize=14)
plt.ylabel('Birth year of Player(s)', fontsize=14)

# sets up the colorbar to keep track of number of commits
cbar = plt.colorbar()
cbar.set_label('Number of Commitments', rotation=270, fontsize=14, color='black', labelpad=30)

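# ticks sit on the Count_Fives scale (one unit = five commitments)
my_colorbar_ticks = [*range(int(df2['Count_Fives'].min()), int(df2['Count_Fives'].max()) + 1, 2)]
cbar.set_ticks(my_colorbar_ticks)

# labels each tick with the commitment count it stands for, so ticks and labels always match in number
my_colorbar_ticks_labels = [tick * 5 for tick in my_colorbar_ticks]
cbar.set_ticklabels(my_colorbar_ticks_labels)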

# sets the x-axis
my_x_ticks = [*range(df2['Commit Month'].min(), df2['Commit Month'].max()+1, 1)]
plt.xticks(my_x_ticks, fontsize=14, color='black')

# sets the y-axis
my_y_ticks = [*range(df2['DOB'].min(), df2['DOB'].max()+1, 1)]
plt.yticks(my_y_ticks, fontsize=14, color='black')

plt.show()

Insight: This scatter plot shows the number of commitments by month for each birth year. To get this data, I created a data frame that groups by DOB and Commit Month and counts the commitments in each group. The x-axis is the month of the year, and the y-axis is the player’s birth year. The color bar on the right shows the number of commitments; marker size and color both scale with the count, binned in steps of five. The visualization shows which months players committed in and which birth years were most active in each month. Based on the visualization, the majority of commitments happen in the fall. This could be because the NCAA has a set of recruiting dates when players can meet with coaches, teams, and schools, so players might wait until then to choose a school. The two busiest combinations are August for the 2007 birth year and November for the 2006 birth year. Players in those birth years are younger and have only recently become able to verbally commit to schools. Committing this early suggests that these players are talented and have a lot of schools knocking on their doors.
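
The "busiest month" and "busiest birth year and month" readings above can also be checked numerically instead of by eye. The sketch below uses a small invented table with the same `DOB` and `Commit Month` column names; the values are toy data, not the real commitments.

```python
import pandas as pd

# Toy commitments; 'DOB' holds the birth year, as in the notebook.
toy = pd.DataFrame({
    'DOB': [2006, 2006, 2006, 2007, 2007, 2005, 2005],
    'Commit Month': [11, 11, 11, 8, 8, 11, 9],
})

# One row per (birth year, month) pair with its number of commitments.
counts = (toy.groupby(['DOB', 'Commit Month'])
             .size()
             .reset_index(name='Count'))

# Busiest month overall, and the busiest (birth year, month) pair.
busiest_month = counts.groupby('Commit Month')['Count'].sum().idxmax()
busiest_pair = counts.loc[counts['Count'].idxmax(), ['DOB', 'Commit Month']].tolist()

print(counts)
print(busiest_month, busiest_pair)
```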

Second Visualization - Frequency of Commitments

# created df3 in order to get columns for this visualization
df3 = df.groupby(['College', 'Starting Year'])['College'].count().reset_index(name='Count')
df3 = pd.DataFrame(df3)

# creates x df which will be used later on in other visuals
x = df3.groupby(['College']).agg({'Count':['sum', 'mean']}).reset_index()
x.columns = ['College', 'TotalCommits', 'AverCommits']
x['AverCommits'] = round(x['AverCommits'], 2)
x = x.sort_values('TotalCommits', ascending=False)
x.reset_index(inplace=True, drop=True)

# created function in order to set what bars get what color
def pick_colors_according_to_mean_count(this_data):
    colors = []
    avg = this_data.TotalCommits.mean()
    for each in this_data.TotalCommits:
        if each  > avg*1.01:
            colors.append('lightblue')
        elif each  < avg*0.99:
            colors.append('purple')
        else:
            colors.append('black')
    return colors

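# sets the row range for the first plot and the second plot (.loc slices include both ends)
bottom1 = 0
top1 = 35  # label of the last row to show (36 teams); this is a row position, not a commitment count
d1 = x.loc[bottom1:top1]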
my_colors1 = pick_colors_according_to_mean_count(d1)

bottom2 = 0
top2 = 9
d2 = x.loc[bottom2:top2]
my_colors2 = pick_colors_according_to_mean_count(d2)

# sets the legend 
Above = mpatches.Patch(color='lightblue', label='Above Average')
At = mpatches.Patch(color='black', label='Within 1% of the Average')
Below = mpatches.Patch(color='purple', label='Below Average')

# initializes the figure that will be plotted
fig = plt.figure(figsize=(20, 16))
fig.suptitle('Frequency of Commitments by College Teams Over the Next 3 Seasons:\n Top ' 
             + str(top1 + 1) + ' College Teams and Top ' + str(top2 + 1) + ' College Teams\n', fontsize=18, fontweight='bold')

# creates the first bar plot
ax1 = fig.add_subplot(2, 1, 1)
ax1.bar(d1.College, d1.TotalCommits, label = 'Count', color = my_colors1)
ax1.legend(handles=[Above, At, Below],fontsize=14)
ax1.axhline(d1.TotalCommits.mean(), color = 'black', linestyle = 'dashed')
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.axes.xaxis.set_visible(False)
ax1.set_title('Top '+ str(top1 + 1) +' Teams by Total Commitments', size=20)
ax1.text(top1, d1.TotalCommits.mean()+2, 'Mean = ' + str(round(d1.TotalCommits.mean(), 2)), rotation=0, fontsize=14)

# creates the second bar plot
ax2 = fig.add_subplot(2, 1, 2)
ax2.bar(d2.College, d2.TotalCommits, label = 'Count', color = my_colors2)
ax2.legend(handles=[Above, At, Below],fontsize=14)
ax2.axhline(d2.TotalCommits.mean(), color = 'black', linestyle = 'solid')
for i, j in enumerate(d2.TotalCommits):
    ax2.text(i, j + 1.1, str(j), ha='center', fontsize=12, fontweight='bold', va='center', backgroundcolor='white')
ax2.spines['right'].set_visible(False)
ax2.spines['top'].set_visible(False)
ax2.set_xlabel('College Hockey Teams', fontsize=18)
ax2.set_xticklabels(d2.College, fontsize=14, rotation=45, ha='right')
ax2.set_title('Top '+ str(top2+1) +' Teams by Total Commitments', size=20)
ax2.text(top2, d2.TotalCommits.mean()+1, 'Mean = ' + str(round(d2.TotalCommits.mean(), 2)), rotation=0, fontsize=14)

fig.subplots_adjust(hspace=0.5)
plt.tight_layout() 
plt.show()

Insight: This two-panel bar plot shows the top 36 teams by number of commitments in the first panel and the top 10 in the second. I created a data frame that holds each college team and its total count of commitments, sorted in descending order. Given the number of teams in the first bar plot, I decided not to display the team names on its x-axis. The y-axis for both graphs is the number of commitments, and the x-axis in the second bar plot lists the top 10 college teams. The visualization covers commitments for the next three seasons. What separates this bar plot from others is that it marks the mean number of commitments in each panel. To show where teams fall relative to the mean, I colored the bars by whether a team is above average, below average, or within 1% of the mean. Based on the visualization, the University of Maine Black Bears lead all teams with 35 commitments. There is a relatively sharp drop after the first team, but the decline slows afterward. The top bar plot has a mean of around 21 commitments, while the bottom bar plot has a mean of around 25. In the first bar plot, the first 13 teams are above the mean, five teams are within 1% of the mean, and 18 teams fall below it. In the second bar plot, the first three teams are above the mean and seven fall below it. Most teams don’t have as many commitments as Maine or Arizona State, so it would be interesting to see how the mean looks with 64 teams.
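
The above/at/below rule behind the bar colors, and the reason the mean shifts between the two panels, can be shown on a handful of invented totals. This is a vectorized sketch using `np.select` rather than the loop in the notebook; the team labels and numbers are made up.

```python
import numpy as np
import pandas as pd

# Toy totals; the real values come from the grouped commitment counts.
totals = pd.Series([30, 20, 20, 10], index=['A', 'B', 'C', 'D'])
avg = totals.mean()  # 20.0 for these toy values

# Same rule as the plotting helper: a 1% band around the mean counts as "at".
labels = np.select(
    [totals > avg * 1.01, totals < avg * 0.99],
    ['above', 'below'],
    default='at',
)
summary = pd.Series(labels, index=totals.index).value_counts()

# The mean depends on which teams are included: the top 2 alone average higher.
top2_mean = totals.head(2).mean()

print(summary)
print(avg, top2_mean)
```

Because the mean is recomputed for whichever subset is plotted, a team can be above average in one panel and below average in the other.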

Third Visualization - Dual Axis

# creates a function for the labels on each bar
def autolabel(these_bars, this_ax, place_of_decimals, symbol):
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x()+each_bar.get_width()/2, height*1.01, symbol+format(height, place_of_decimals),
                    fontsize=11, color='black', ha='center', va='bottom')

# creates the figure plot                     
fig = plt.figure(figsize=(18, 10))
ax1 = fig.add_subplot(1,1,1)
ax2 = ax1.twinx()
bar_width = 0.4

x_pos = np.arange(len(d2))  # one bar position per team shown
total_commits_bars = ax1.bar(x_pos-(0.5*bar_width), d2.TotalCommits, bar_width, color='gray', edgecolor='black',
                     label='Total Commitments')
aver_commits_bars = ax2.bar(x_pos+(0.5*bar_width), d2.AverCommits, bar_width, color='green', edgecolor='black',
                     label='Average Commitments')
ax1.set_xlabel('College Hockey Teams', fontsize=18)
ax1.set_ylabel('Total Commitments', fontsize=18, labelpad=20)
ax2.set_ylabel('Average Commitment Total', fontsize=18, rotation=270, labelpad=20)
ax1.tick_params(axis='y', labelsize=14)
ax1.tick_params(axis='x', labelsize=14)
ax2.tick_params(axis='y', labelsize=14)

plt.title('Total Commitments and Average Amount of Commitments Analysis\n Top 10 Most Frequently Committed Colleges Over The Next 3 Seasons\n', fontsize=18)

# fixes the tick positions first, then attaches the rotated team labels to them
ax1.set_xticks(x_pos)
ax1.set_xticklabels(d2.College, fontsize=14, rotation=45, ha='right')

# sets the color and label for the legend
total_commits_color, total_commits_label = ax1.get_legend_handles_labels()
avg_commits_color, avg_commits_color_label = ax2.get_legend_handles_labels()

# creates the label that will be displayed for the user
legend = ax1.legend(total_commits_color + avg_commits_color, total_commits_label + avg_commits_color_label, loc='upper center', frameon=True, ncol=1,
                   shadow=True, borderpad=1, fontsize=14)

ax1.set_ylim(0, d2.TotalCommits.max()*1.5)

# creates the label for top of the bars
autolabel(total_commits_bars, ax1, '.0f', '')
autolabel(aver_commits_bars, ax2, '.2f', '')

plt.tight_layout() 
plt.show()

Insight: This dual-axis plot shows the top 10 teams by number of commitments as well as their per-season averages over the next three seasons. I reused the data frame that holds each college team, its total count of commitments, and its average commitments per season, sorted by total in descending order. The left y-axis is the total number of commitments, and the right y-axis is the average number of commitments per starting season. The x-axis lists the top 10 college teams. The gray bars show total commitments across the next three seasons, and the green bars show the average for each team. Based on the visualization, the University of Maine Black Bears lead all teams with 35 commitments, averaging about 12 per season. Something to ponder is whether Maine taking more players than other teams on average means they expect more players to hit the transfer portal or decommit. Most teams in this bar plot average around 7 to 8 commitments per season. To replace seniors and transfers, colleges presumably target 7 to 8 players to fill roster holes. An interesting team to look at is Merrimack College, which averages 11.5 players with 22 commitments. Looking at the data, they currently don’t have a commitment for the 2026-2027 season, so their average is taken over fewer seasons, which skews the data for this visual.
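
The skew described for a team with no commitments in one season comes from how a grouped mean works: it averages only over the seasons that appear in the data. The sketch below contrasts that with an average over all three seasons, counting the missing season as zero. The team names, season labels, and counts are invented for illustration.

```python
import pandas as pd

# Toy per-season counts; 'Team B' has no commitments yet for the last season.
season_counts = pd.DataFrame({
    'College': ['Team A', 'Team A', 'Team A', 'Team B', 'Team B'],
    'Starting Year': ['2024-25', '2025-26', '2026-27', '2024-25', '2025-26'],
    'Count': [12, 12, 12, 12, 10],
})

# Mean over the seasons that appear in the data (what a grouped mean does).
mean_present = season_counts.groupby('College')['Count'].mean()

# Mean over all three seasons, treating a missing season as zero commitments.
wide = season_counts.pivot(index='College', columns='Starting Year', values='Count').fillna(0)
mean_all = wide.mean(axis=1)

print(mean_present)
print(mean_all.round(2))
```

Which of the two averages is more appropriate depends on the question being asked; the second one simply makes teams comparable over the same three seasons.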

Fourth Visualization - Line Plot

# creates the conference data frame with the columns needed for this visual
conference_df = df.groupby(['DOB', 'College Conference'])['College Conference'].count().reset_index(name='Total Commitments')
conference_df['DOB'] = pd.to_datetime(conference_df['DOB'], format='%Y')

# sets up the figure for the visual
fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)

# identifies colors for each conference
my_colors = {'AHA':'purple', 'BigTen':'green', 
            'CCHA':'lightblue', 'ECAC': 'gray', 
            'Hockey East':'gold', 'Independent':'brown', 
            'NCHC':'red'}

# creates a for loop to plot each line for each conference
# (grouping by the column name, not a one-item list, keeps the key a plain string in every pandas version)
for conference, grp in conference_df.groupby('College Conference'):
    grp.plot(ax=ax, kind='line', x='DOB', y='Total Commitments', color=my_colors[conference], label=conference, marker='8')
    
plt.title('Total Commitments by Birth Year per Conference', fontsize=18)
ax.set_xlabel("Players' Birth Year", fontsize=18)
ax.set_ylabel('Total Commitments', fontsize = 18, labelpad=20)
ax.tick_params(axis = 'x', labelsize=14, rotation = 0)
ax.tick_params(axis = 'y', labelsize = 14, rotation = 0)

# grabs the legend handles and labels in plotting order (alphabetical by conference)
handles, labels = ax.get_legend_handles_labels()

# creates a legend for the end user
plt.legend(handles, labels, loc='best', fontsize=14, ncol=1)
    
plt.show()

Insight: This line plot shows the total commitments by birth year for each conference. I created a data frame that holds the college conference, the DOB, and the total count of commitments. The x-axis represents the player’s birth year, and the y-axis represents the total number of commitments. The legend shows which conference each line represents. This line plot compares conferences on total commitments and on how they recruit different birth years. Based on the visualization, every conference dips for the 2007 birth year. There could be many reasons, but the 2007 birth years are only 16 or 17 years old this year, so many of them have not committed yet. The AHA and ECAC lead the two oldest birth years. This could be because many teams in these conferences tend to take older players. Usually, top-end talent begins college early, while older players spend more years in junior hockey, so the conferences with the powerhouse teams have younger rosters. Hockey East leads the 2006 and 2007 birth years, which is no surprise because it has some powerhouse teams.
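
Statements such as "conference X leads birth year Y" can be read straight from a birth year by conference table. The sketch below builds that table on invented rows and picks the leading conference per birth year with `idxmax`; the rows are toy data, only the column names follow the notebook.

```python
import pandas as pd

# Toy rows: one per commitment.
toy = pd.DataFrame({
    'DOB': [2004, 2004, 2004, 2006, 2006, 2006, 2006],
    'College Conference': ['AHA', 'AHA', 'ECAC', 'Hockey East', 'Hockey East', 'NCHC', 'AHA'],
})

# Rows are birth years, columns are conferences, cells are commitment counts.
table = (toy.groupby(['DOB', 'College Conference'])
            .size()
            .unstack(fill_value=0))

# Conference with the most commitments in each birth year.
leaders = table.idxmax(axis=1)

print(table)
print(leaders)
```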

Fifth Visualization - Stacked Bar Chart

# creates the stacked data frame for the visual
stacked_df = df.groupby(['College Conference', 'Starting Year'])['College Conference'].count().reset_index(name='Total Commitments')

# creates the pivot in order to put values where they need to be
stacked_df = stacked_df.pivot(index = 'Starting Year', columns='College Conference', values = 'Total Commitments')

# lists the conferences in alphabetical order; the columns are reversed below so the first conference ends up on top of each stack
conference_order = ['AHA', 'BigTen', 'CCHA', 'ECAC', 'Hockey East', 'Independent', 'NCHC']
stacked_df = stacked_df.reindex(columns=reversed(conference_order))

# sets up the figure for the visual
fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1,1,1)

# plots the data frame for a bar chart
stacked_df.plot(kind = 'bar', stacked=True, ax = ax)

plt.ylabel('Total Commitments', fontsize=18, labelpad=10)
plt.title('Total Commitments per Conference by Starting Season \n Stacked Bar Plot', fontsize = 18)
plt.xticks(rotation=0, horizontalalignment = 'center', fontsize = 14)

plt.yticks(fontsize=14)

ax.set_xlabel('Starting Year', fontsize = 18)

# reverses the legend entries so they read in alphabetical order, matching the stack from top to bottom
handles, labels = ax.get_legend_handles_labels()
handles = handles[::-1]
labels = labels[::-1]

# sets up the legend for the end user
plt.legend(handles, labels, loc='best', fontsize=14, ncol=1)

plt.show()

Insight: This stacked bar plot shows the total commitments by starting season for each conference. I created a data frame that holds the college conference, the starting year, and the total count of commitments, then pivoted it so that the starting year is the index and each conference is a column. Ordering the conferences alphabetically in the legend helps the viewer read the data. The x-axis is the starting season of the commitment, and the y-axis is the total number of commitments. The legend shows which conference each bar segment represents. This stacked bar plot shows the difference between the three starting seasons for committed players and how the conferences stack up against each other. Based on the visualization, the 2026-2027 starting season doesn’t have a lot of commitments yet. Many of those players are relatively young and just beginning the college recruiting process. The Independent teams and the BigTen have the lowest number of commitments. One main reason is that the number of BigTen teams and the number of Independent teams are small compared to other conferences. Something interesting to note is that the AHA has only one commitment for the 2026-2027 season. With the AHA usually not having top-end young players on its rosters, it is no surprise that it does not have many early commitments.
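
The season by conference table behind a stacked bar chart can also be built in one step with `pd.crosstab`, which fills missing combinations with zero. This is an alternative sketch, not the groupby and pivot route used in the notebook, and the rows and season labels are invented.

```python
import pandas as pd

# Toy rows: one per commitment.
toy = pd.DataFrame({
    'Starting Year': ['2024-25', '2024-25', '2025-26', '2025-26', '2026-27'],
    'College Conference': ['AHA', 'NCHC', 'AHA', 'BigTen', 'NCHC'],
})

# Fix the column order explicitly so the stacking order is predictable.
conference_order = ['AHA', 'BigTen', 'NCHC']
stacked = (pd.crosstab(toy['Starting Year'], toy['College Conference'])
             .reindex(columns=conference_order, fill_value=0))

# The total height of each stacked bar is the row sum.
bar_heights = stacked.sum(axis=1)

print(stacked)
print(bar_heights)
```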

Sixth Visualization - Pie Chart

# set up the pie chart data frame for the visual 
pie_chart_df = df.groupby(['College Conference', 'DOB', 'Commit Year', 'Starting Year'])['College Conference'].count().reset_index(name='Total Commitments')

# creates a column of age of commitment from a calculation
pie_chart_df['Age of Commitment'] = pie_chart_df['Commit Year'] - pie_chart_df['DOB']

# gets the aggregate for total commits and age of commitment
pie_chart_df = pie_chart_df.groupby(['College Conference']).agg({'Total Commitments': 'sum', 'Age of Commitment': ['mean']}).reset_index()

# sets up the columns to be used (BUT NOTE: did not use average age of commits to display)
pie_chart_df.columns = ['College Conference', 'TotalCommits', 'AverAgeCommits']

# sets up colors for each conference
number_outside_colors = len(pie_chart_df['College Conference'].unique())
outside_color_ref_number = np.arange(number_outside_colors)*2

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(1,1,1)

# sets up the color map for the outside colors
colormap = plt.get_cmap("tab20")
outer_colors = colormap(outside_color_ref_number)

# creates a variable of all commits to be called later
all_commits = pie_chart_df.TotalCommits.sum()

# sets up the pie chart and displays what is shown in each slice
pie_chart_df.groupby(['College Conference'])['TotalCommits'].sum().plot(
    kind='pie', radius=1, colors = outer_colors, pctdistance = 0.75, labeldistance = 1.1, 
    wedgeprops = dict(edgecolor = 'white'), textprops = {'fontsize': 18}, 
    autopct = lambda p: "{:.2f}%\n({:.0f})".format(p, (p/100)*all_commits), 
    startangle = 90)


hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Commitments by Conference', fontsize=18)

ax.text(0, 0, 'Total Commitments\n' + str(all_commits), size = 14, ha='center', va='center')

ax.axis('equal')

plt.tight_layout()


plt.show()

Insight: This pie chart shows the total commitments by conference. I created a data frame that holds the college conference and the total count of commitments. Each slice represents a conference and is labeled with its percentage and its number of commitments. I used the center of the chart to show the total number of commitments in the data. This visualization shows how the conferences stack up against each other. Independent, CCHA, and BigTen are the bottom three conferences with 91, 116, and 121 commitments, respectively. ECAC leads the way with 233, followed by Hockey East with 227. Much of the difference between conferences results from the number of teams in each conference.
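
Since conference totals depend heavily on how many programs a conference has, one way to put them on equal footing is to divide by the number of distinct colleges in each conference. The sketch below does this on invented rows; `Programs`, `SharePct`, and `CommitsPerProgram` are hypothetical column names introduced only for this example.

```python
import pandas as pd

# Toy rows: one per commitment, with made-up schools and conferences.
toy = pd.DataFrame({
    'College': ['School 1', 'School 1', 'School 2', 'School 3', 'School 3', 'School 3'],
    'College Conference': ['Conf X', 'Conf X', 'Conf X', 'Conf Y', 'Conf Y', 'Conf Y'],
})

by_conf = toy.groupby('College Conference').agg(
    TotalCommits=('College', 'size'),     # commitments in the conference
    Programs=('College', 'nunique'),      # distinct colleges with a commitment
)

# Share of all commitments (the pie slice) and commitments per program.
by_conf['SharePct'] = (100 * by_conf['TotalCommits'] / by_conf['TotalCommits'].sum()).round(2)
by_conf['CommitsPerProgram'] = by_conf['TotalCommits'] / by_conf['Programs']

print(by_conf)
```

In this toy case both conferences have the same slice of the pie, yet one of them averages twice as many commitments per program.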

Seventh Visualization - Nested Pie Chart

# fills missing current teams with a placeholder that still matches the "Team (League)" pattern
df['Current Team'] = df['Current Team'].fillna("Not Available (Not Available)")
df['Current Team'] = df['Current Team'].replace('Not Available', 'Not Available (Not Available)')
df['Current Team'].value_counts()

df[['Current Team', 'Current League']] = df['Current Team'].str.extract(r'(.+)\s\((.+)\)')

# sets up nested pie chart data frame 
nested_pie_chart_df = df.groupby(['College Conference', 'Current Team', 'Current League'])['College Conference'].count().reset_index(name='Total Commitments')
nested_pie_chart_df = nested_pie_chart_df.groupby(['College Conference', 'Current League']).agg({'Total Commitments': 'sum'}).reset_index()
nested_pie_chart_df.sort_values(by=['College Conference', 'Total Commitments'], ascending=[True, False], inplace=True)

# function that sets up top 3 leagues and combines all others
def combine_other_leagues(group):
    # Sort the group by 'Total Commitments' in descending order
    group = group.sort_values(by='Total Commitments', ascending=False)
    # Identify top three leagues
    top_three_leagues = group['Current League'].head(3)
    # Combine commitments for other leagues
    other_commitments = group.loc[~group['Current League'].isin(top_three_leagues), 'Total Commitments'].sum()
    # Create a df with combined commitments to be displayed for the nested pie chart
    combined_row = pd.DataFrame({'College Conference': [group['College Conference'].iloc[0]],
                                 'Current League': ['Other'],
                                 'Total Commitments': [other_commitments]})
    return pd.concat([group[group['Current League'].isin(top_three_leagues)], combined_row])

# creates the combined_df from the nested pie chart data frame by calling the function on each conference
combined_df = nested_pie_chart_df.groupby('College Conference').apply(combine_other_leagues).reset_index(drop=True)
combined_df = combined_df.sort_values(by=['College Conference', 'Total Commitments'], ascending=[True, False])

# sets the inside colors for the nested pie chart (TO NOTE: outside colors identified in other visual)
number_inside_colors = len(nested_pie_chart_df['College Conference'].unique())
all_color_ref_number = np.arange(number_outside_colors + number_inside_colors)

inside_color_ref_number = []
for each in all_color_ref_number:
    if each not in outside_color_ref_number:
        inside_color_ref_number.append(each)

# creates the figure to be displayed
fig = plt.figure(figsize=(15, 15))
ax = fig.add_subplot(1,1,1)

# sets up the color map
colormap = plt.get_cmap("tab20")
outer_colors = colormap(outside_color_ref_number)

# creates all_commits variable
all_commits = combined_df['Total Commitments'].sum()
percent = lambda p: "{:.2f}%\n({:.0f})".format(p, (p/100)*all_commits)

# creates outside pie chart
combined_df.groupby(['College Conference'])['Total Commitments'].sum().plot(
    kind='pie', radius=1, colors = outer_colors, pctdistance = 0.85, labeldistance = 1.1, 
    wedgeprops = dict(edgecolor = 'white'), textprops = {'fontsize': 18}, 
    autopct = percent, 
    startangle = 180)

# creates nested pie chart
inner_colors = colormap(inside_color_ref_number)
combined_df['Total Commitments'].plot(
    kind='pie', radius=0.7, colors = inner_colors, pctdistance = 0.8, labeldistance = 0.55, 
    wedgeprops = dict(edgecolor = 'white'), textprops = {'fontsize': 12},
    labels = combined_df['Current League'],
    autopct = '%1.2f%%', 
    startangle = 180,
    rotatelabels = True)

hole = plt.Circle((0,0), 0.3, fc='white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

ax.yaxis.set_visible(False)
plt.title('Total Commitments by Conference and Junior League', fontsize=18)

ax.text(0, 0, 'Total Commitments\n' + str(all_commits), size = 14, ha='center', va='center')

ax.axis('equal')

plt.tight_layout()


plt.show()

Insight: This nested pie chart shows the total commitments by conference, broken down by the most popular junior hockey leagues. I created a data frame that holds the college conference, current team, current league, and the total count of commitments. Each outer slice represents a conference and is labeled with its percentage and number of commitments. Inside each slice, the nested pie shows the top 3 junior hockey leagues among that conference’s committed players, labeled with percentages. Showing only the top three leagues would have left many commitments unrepresented, so I combined the remaining leagues into Other so that every commitment is accounted for. I used the center of the chart to show the total number of commitments in the data. This visualization shows how the conferences stack up against each other and how each one breaks down by junior hockey league. Based on the data, three junior hockey leagues were consistently in the top 3 across the conferences. The BCHL showed up in all conferences, the USHL showed up in six of the seven, and the NAHL showed up in five. The only other junior leagues or teams that appeared were the NTDP, the OJHL, and the Minnesota High School Hockey League (MN-US). One thing to note is that some of the percentage labels overlap each other. For the BigTen, the NTDP represents 1.78% while the BCHL represents 1.40%.
This visualization is useful on multiple levels. It shows fans that most top players play junior hockey in the BCHL, USHL, and NAHL. It shows parents a path to college hockey if they help their kids get into those leagues. Lastly, it shows colleges which leagues to recruit from so they don’t waste resources traveling to much smaller junior leagues. An interesting junior league/team to look at is the NTDP, the National Team Development Program. The top 16- and 17-year-old hockey players in America get invited to play on that team. The conference that has it in its top 3 is the BigTen. The BigTen has some of the most storied college hockey programs, so it should come as no surprise that it recruits the best of the best.
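
The "top three leagues plus Other" grouping can be sketched for a single conference with `nlargest`. The league abbreviations below are the ones named in this section, but the counts are invented for illustration and this is a simplified stand-in for the notebook's helper function.

```python
import pandas as pd

# Toy commitment counts per junior league for one conference.
leagues = pd.Series({'BCHL': 40, 'USHL': 35, 'NAHL': 20, 'OJHL': 8, 'NTDP': 5, 'MN-US': 2})

# Keep the three largest leagues and fold everything else into 'Other'.
top_three = leagues.nlargest(3)
other_total = leagues.drop(top_three.index).sum()
collapsed = pd.concat([top_three, pd.Series({'Other': other_total})])

print(collapsed)
```

Keeping the `Other` slice means the inner ring still adds up to the conference total, so the inner and outer wedges line up.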

Eighth Visualization - Bump Chart

# creates bump df to be used for visual
bump_df = df.groupby(['College Conference', 'DOB', 'Commit Year', 'Starting Year'])['College Conference'].count().reset_index(name='Total Commitments')
bump_df['Age of Commitment'] = bump_df['Commit Year'] - bump_df['DOB']
bump_df = bump_df.groupby(['College Conference', 'Starting Year']).agg({'Total Commitments': 'sum', 'Age of Commitment': ['mean']}).reset_index()
bump_df.columns = ['College Conference', 'Starting Year', 'TotalCommits', 'AverAgeCommits']

# following three lines to identify ranking for average age commits
bump_df = bump_df.pivot(index = 'College Conference', columns = 'Starting Year', values = 'AverAgeCommits')

bump_df_ranked = bump_df.rank(axis = 0, ascending = True, method='min')

bump_df_ranked = bump_df_ranked.T

# creates figure to be used
fig = plt.figure(figsize=(18, 10))
ax = fig.add_subplot(1,1,1)

# plots the ranked data frame
bump_df_ranked.plot(kind='line', ax=ax, marker='o', markeredgewidth = 2, linewidth=6, 
                   markersize = 44, markerfacecolor = 'white')

ax.invert_yaxis()

num_rows = bump_df_ranked.shape[0]
num_cols = bump_df_ranked.shape[1]

# sets up year_order for x-axis
year_order = ['2024-25', '2025-26', '2026-27']

plt.ylabel('Starting Season Rankings', fontsize=18, labelpad=10)
plt.title('Ranking of Average Age Committed by Starting Season per Conference\n Bump Chart', fontsize = 18, pad = 15)
plt.xticks(np.arange(num_rows), year_order, fontsize=14)

plt.yticks(range(1, num_cols+1, 1), fontsize=14)

ax.set_xlabel('Starting Season', fontsize = 18)

handles, labels = ax.get_legend_handles_labels()
handles = [handles[6], handles[5],handles[4],handles[3],handles[2],handles[1],handles[0]]
labels = [labels[6],labels[5],labels[4],labels[3],labels[2],labels[1],labels[0]]

plt.legend(handles, labels, bbox_to_anchor=(1.01, 1.01), fontsize=14, 
          labelspacing= 1, markerscale = .4, borderpad = 1, handletextpad = 0.8)

# writes the average age inside each marker: i is the season position on the x-axis, j is the conference
for j, conference in enumerate(bump_df_ranked.columns):
    for i, season in enumerate(bump_df_ranked.index):
        this_rank = bump_df_ranked.iloc[i, j]
        ax.text(i, this_rank, str(round(bump_df.iloc[j, i], 2)), ha='center', va='center', fontsize=12)

plt.show()

Insight: This bump chart ranks the conferences within each starting season by average age of commitment. I created a data frame that holds the conference, the starting year, the average age of commitment, and the total count of commitments. The x-axis is the starting season for the commitments, and the y-axis is the ranking, where rank 1 is the youngest average age. The legend shows which conference each line represents, and the average age of commitment is displayed inside the circle on each line. This bump chart shows how conferences rank against each other on average age of commitment by starting season. Based on the visualization, the BigTen and NCHC are ranked 1 and 2 for the upcoming season. Many young players go to those conferences, as that is where the top teams in the country are. Hockey East makes a big jump for the 2025-2026 season, taking over the top spot with an average age of commitment of 17. What is interesting to note about the 2026-2027 season is that the Independent teams take the top spot with an average commitment age of 16.5. Also, four conferences are tied for 2nd place with an average age of 17. The 2026-2027 rankings should be taken with a grain of salt because only a few players have committed for that season so far. This visualization supports the theory that the top conferences are getting younger with better talent. Junior hockey is there to develop players until they are ready to play college hockey. If the top conferences are getting young players in their door, it can be assumed those players have tons of talent.
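
The ties in the rankings follow from `rank(method='min')`, which gives every tied conference the lowest rank in its group and then skips ahead. The sketch below shows this on invented averages; the conference labels and ages are made up, only the ranking call mirrors the notebook.

```python
import pandas as pd

# Toy average commitment ages: rows are conferences, columns are seasons.
avg_age = pd.DataFrame(
    {'2025-26': [17.0, 17.4, 17.9, 17.4], '2026-27': [17.0, 16.5, 17.0, 17.0]},
    index=['Conf A', 'Conf B', 'Conf C', 'Conf D'],
)

# Rank within each season; the youngest average age gets rank 1,
# and tied conferences share the smallest rank of their group.
ranked = avg_age.rank(axis=0, ascending=True, method='min')

print(ranked)
```

In the second toy season, one conference is ranked 1 and the other three all share rank 2, which is the same pattern as several conferences tied for 2nd place.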

Wrap up

In conclusion, these visualizations analyze college hockey commitments from several angles: how college hockey teams will look in the seasons to come, junior hockey trends, and college hockey conference dynamics. While each visualization offers insight into a specific aspect of player commitments, there are opportunities to improve clarity, context, and presentation. With refined visualization techniques and more precise explanations of data trends, these visualizations can be useful tools for fans, management, college hockey teams, junior hockey teams, and NHL front offices. They can inform strategic decision-making and support a deeper understanding of youth, junior, and college hockey.