Introduction

In tennis, there are never-ending debates about who the best players are, and specifically about who is the greatest men’s player of all time on the Association of Tennis Professionals (ATP) tour. Because of the structure of ATP tournaments, from international events to Grand Slams, an upset or unexpected win is always possible. Ahead of the four highly anticipated majors (Wimbledon, the US Open, the Australian Open, and the French Open), there is always debate about who is going to win, and a large percentage of the time the answer you hear is Federer, Nadal, or Djokovic. Opinions differ greatly on who is the best of the three, and it takes a lot to rank in the ATP top ten, let alone the top three. Using data from the ATP Men’s Tour, we can draw some insights from tournament results from 2000 to 2016, including Grand Slam, Masters Series, Masters Cup, and International Series competitions. This report looks at the top-ranked players and asks which of them has the higher winning percentage and the most major titles.

Dataset

This data set was collected from the Association of Tennis Professionals (ATP) Men’s Tour and can be found at https://www.kaggle.com/jordangoblet/atp-tour-20002016. It documents ATP tournaments along with match outcomes, dates, rankings, and other details essential to an ATP tournament. Of the 54 columns, a large portion provided valuable information about rankings, players, and tournaments, while others, though informative, were not relevant to the visualizations. I chose not to use those columns and instead used the data that pertained to the top three players, Roger Federer, Rafael Nadal, and Novak Djokovic, as well as information relevant to the Grand Slam titles. A description of the cleanup can be found in the appendix below.

Findings

import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/ProgramData/Anaconda3/Library/plugins/platforms'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings("ignore")

path = "U:/"

#textfile = path + 'Metadata.txt'

#metadata = pd.read_csv(textfile, sep='\t')

filename = path + 'Data.csv'

tennis_df = pd.read_csv(filename, encoding='latin1')

Winning Probability vs Ranking Difference for Grand Slams and Non Grand Slams

Before diving deeper into ATP men’s tennis, one of the first things to look at is the probability that the better-ranked player wins a match in a Grand Slam tournament versus a tournament that is not a Grand Slam. The Grand Slams are the four most important tournaments in tennis: Wimbledon, the US Open, the Australian Open, and the French Open. They offer the most ranking points and are played as best of five sets rather than best of three. Comparing the two kinds of tournament shows how the winning percentage changes with the ranking difference between the two players, and whether that relationship differs in Grand Slams.


tennis_df.WRank = pd.to_numeric(tennis_df.WRank, errors = 'coerce') 
tennis_df.LRank = pd.to_numeric(tennis_df.LRank, errors = 'coerce')

tennis_df['Difference'] =  tennis_df.LRank - tennis_df.WRank 

tennis_df['Round_10'] = 10*round(np.true_divide(tennis_df.Difference,10))
tennis_df['Round_20'] = 20*round(np.true_divide(tennis_df.Difference,20))

#Number of sets within a match
tennis_df['Total Sets'] = tennis_df.Wsets + tennis_df.Lsets

tennis_df.W3 = tennis_df.W3.fillna(0)
tennis_df.W4 = tennis_df.W4.fillna(0)
tennis_df.W5 = tennis_df.W5.fillna(0)
tennis_df.L3 = tennis_df.L3.fillna(0)
tennis_df.L4 = tennis_df.L4.fillna(0)
tennis_df.L5 = tennis_df.L5.fillna(0)

tennis_df['Sets Difference'] = tennis_df.W1+tennis_df.W2+tennis_df.W3+tennis_df.W4+tennis_df.W5 - (tennis_df.L1+tennis_df.L2+tennis_df.L3+tennis_df.L4+tennis_df.L5)

new_df = tennis_df

df_non_GrandSlam = new_df[~(new_df.Series == 'Grand Slam')]

df_GrandSlam = new_df[new_df.Series == 'Grand Slam']

plt.figure(figsize = (20, 10))
bins = np.arange(10, 200, 10)
GrandSlam_prob = []
non_GrandSlam_prob = []

for value in bins:
     
    pos = value
    neg = -value
    
    pos_wins = len(df_GrandSlam[df_GrandSlam.Round_10 == pos])
    neg_wins = len(df_GrandSlam[df_GrandSlam.Round_10 == neg])
    GrandSlam_prob.append(np.true_divide(pos_wins, pos_wins + neg_wins))
    
    pos_wins = len(df_non_GrandSlam[df_non_GrandSlam.Round_10 == pos])
    neg_wins = len(df_non_GrandSlam[df_non_GrandSlam.Round_10 == neg])
    non_GrandSlam_prob.append(np.true_divide(pos_wins, pos_wins + neg_wins))
    
plt.bar(bins, GrandSlam_prob, width = 9, color = 'red') 
plt.bar(bins, non_GrandSlam_prob, width = 8, color = 'blue')
plt.title('Ranking Difference vs Winning Probability', fontsize = 30)
plt.xlabel('Ranking Difference',fontsize = 15)
plt.ylabel('Winning Probability',fontsize = 15)
plt.xlim([10, 200])
plt.ylim([0.5, 0.9])
plt.legend(['Grand Slams', 'Non Grand Slams'], loc = 1, fontsize = 15)

plt.show() 

Several things can be inferred from the bar chart above. The winning percentage of the better-ranked player increases as the rank difference does, although the trend tends to saturate once the rank difference reaches about one hundred. This pattern appears in both Grand Slam and non Grand Slam tournaments. One can also infer that a favored player has a higher winning percentage than a lower-ranked underdog, particularly in a Grand Slam tournament. Finally, one could conclude that because Grand Slam matches are best of five sets rather than three, non Grand Slam tournaments may offer lower-ranked players a better chance of an upset.
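As a minimal sketch of how the probabilities in the chart are obtained, the snippet below buckets the ranking difference to the nearest ten and takes the share of matches in each bucket that the better-ranked player won. The match values are hypothetical, and the names matches, Bucket, and FavoriteWon are illustrative rather than part of the analysis above.

```python
import pandas as pd

# Toy match records (hypothetical values, not taken from the ATP data)
matches = pd.DataFrame({
    "WRank": [1, 3, 40, 12, 5, 8, 2],
    "LRank": [12, 41, 2, 1, 55, 57, 13],
})

# Positive difference: the better-ranked player won the match
matches["Difference"] = matches["LRank"] - matches["WRank"]

# Bucket by the size of the ranking gap, rounded to the nearest 10
matches["Bucket"] = (10 * (matches["Difference"].abs() / 10).round()).astype(int)
matches["FavoriteWon"] = matches["Difference"] > 0

# Share of matches in each bucket won by the better-ranked player
favorite_win_prob = matches.groupby("Bucket")["FavoriteWon"].mean()
print(favorite_win_prob)
```

Taking the mean of a boolean column gives the same ratio as counting positive and negative differences separately, as the loop above does, but without iterating over the bins.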

Grand Slams Won Since 2000

In the tennis world, certain players win certain tournaments more often than others. Since the four Grand Slam tournaments are the most highly anticipated events of the season, finding out which player has won each of them most often helps explain each player’s standing. The heat map below contrasts the four Grand Slams and the players who have won each tournament.


slams = tennis_df[tennis_df.Series == 'Grand Slam']

sets = slams[['Tournament', 'Series', 'Round', 'Wsets', 'Lsets']]

sets = sets.dropna()

round(sets.groupby('Tournament')['Lsets'].mean(), 3)
## Tournament
## Australian Open    0.685
## French Open        0.663
## US Open            0.679
## Wimbledon          0.688
## Name: Lsets, dtype: float64
round(sets.groupby('Round')['Lsets'].mean(), 3)
## Round
## 1st Round        0.674
## 2nd Round        0.686
## 3rd Round        0.655
## 4th Round        0.691
## Quarterfinals    0.705
## Semifinals       0.750
## The Final        0.776
## Name: Lsets, dtype: float64
wins = slams[['Winner', 'Tournament', 'Round']]
wins = wins[wins.Round == 'The Final']

winners_df = wins.groupby('Winner')['Tournament'].count()
winners_df = winners_df.reset_index()

winners_df = winners_df.sort_values(['Tournament'], ascending = False)

winners_slam_df = wins.groupby(['Winner','Tournament']).count()
winners_slam_df = winners_slam_df.reset_index()
winners_slam_df = winners_slam_df.sort_values(['Winner'], ascending=True)
winners_slam_df.columns = ['Winner','Tournament', 'Count']

winners_slam_df = winners_slam_df.dropna()

tennis_hm_df = pd.pivot_table(winners_slam_df, index='Winner', columns = 'Tournament', values = 'Count')


from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)

comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))

ax = sns.heatmap(tennis_hm_df, linewidth = 0.5, annot = True, cmap = 'coolwarm', fmt=',.0f',
                 square = True, annot_kws={'size':11},
                 cbar_kws = {'format': comma_fmt, 'orientation': 'vertical'})


plt.show()

The heat map above provides additional insight into each player at each tournament. While the bar chart showed the winning probability in Grand Slam versus non Grand Slam tournaments by ranking difference, it did not show how many titles were won, at which tournament, or by which player. The heat map shows the number of titles each player won at each of the four majors. Based on the data, the players with the most overall wins across the four majors are Federer, Nadal, and Djokovic. We can conclude that they are clearly the top players of this period and that each has a “preferred” tournament (the tournaments are held on different surfaces, two of which are the same).
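The table behind the heat map is a player-by-tournament count of finals won. A minimal sketch of that reshaping is shown below, assuming one row per final. The player names and results are placeholders, and pd.crosstab is used here as a compact alternative to the groupby and pivot_table steps in the code above.

```python
import pandas as pd

# One row per Grand Slam final (placeholder winners, not real results)
finals = pd.DataFrame({
    "Winner": ["Player A", "Player A", "Player B", "Player C", "Player B", "Player A"],
    "Tournament": ["Wimbledon", "US Open", "French Open",
                   "Australian Open", "French Open", "Wimbledon"],
})

# Rows are players, columns are tournaments, cells are the number of titles
title_counts = pd.crosstab(finals["Winner"], finals["Tournament"])
print(title_counts)
```

One difference worth noting is that crosstab fills player and tournament pairs with no titles with 0, whereas the pivot table above leaves them empty.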

Ranking of Top 10 Players by Each Surface

A huge debate among tennis fans and players is who the best tennis player in the world is. The four Grand Slam tournaments are played on three different surfaces (grass, clay, and hard), so looking at the top ten players’ rankings overall, as well as their rankings on each surface, gives a better understanding of each player and may reveal their preference for or familiarity with particular surfaces. By graphing the top ten players by their ranking, we can get an insight into who ranks highest among them overall and on each surface. The graph below is a stacked bar chart, colored by court surface.


# The ten players compared in this section
top10_names = ['Federer R.', 'Nadal R.', 'Djokovic N.', 'Murray A.', 'Roddick A.', 'Hewitt L.', 'Tsonga J.W.', 'Ferrer D.', 'Nalbandian D.', 'Berdych T.']

# Keep the matches that one of these players lost
top10_df = tennis_df[tennis_df.Loser.isin(top10_names)]

top10_df = top10_df[['Date', 'Surface', 'Winner', 'Loser', 'WRank', 'LRank']]

# Within that subset: rows won by one of the ten, and rows lost by one of the ten
top10_wins = top10_df[top10_df.Winner.isin(top10_names)]

top10_losses = top10_df[top10_df.Loser.isin(top10_names)]

top10_wins = top10_wins[['Date', 'Surface', 'Winner', 'WRank']]
top10_losses = top10_losses[['Date', 'Surface', 'Loser', 'LRank']]
top10_wins.columns = ['Date', 'Surface', 'Player', 'Rank']
top10_losses.columns = ['Date', 'Surface', 'Player', 'Rank']

top10_df = pd.concat([top10_wins, top10_losses], sort=True)
top10_df['Date'] = pd.to_datetime(top10_df.Date, format='%d/%m/%Y')

top10_df = top10_df.sort_values(['Date'])

top10_players_df = top10_df.groupby(['Surface', 'Player'])['Rank'].sum().reset_index(name='Total Ranking')

top10_players_df = top10_players_df.sort_values(['Total Ranking'], ascending=False)

top10_players_df = top10_players_df.pivot(index='Player', columns='Surface', values='Total Ranking')

surface_order = ['Clay', 'Grass', 'Carpet', 'Hard']
top10_players_df = top10_players_df.reindex(columns=reversed(surface_order))

from matplotlib.ticker import FuncFormatter

fig = plt.figure(figsize=(20, 15))
ax = fig.add_subplot(1, 1, 1)

top10_players_df.plot(kind='bar', stacked=True, ax=ax)

plt.ylabel('Total Ranking', fontsize=20, labelpad=15)
plt.title('Total Ranking by Player and by Surface \n Stacked Bar Plot', fontsize=20)
plt.xticks(rotation=0, horizontalalignment = 'center', fontsize=18)
plt.yticks(fontsize=18)
ax.set_xlabel('Top 10 Players (Based on GS Wins 2000 - 2016)', fontsize=18, labelpad=15)

handles, labels = ax.get_legend_handles_labels()
handles = [handles[3], handles[2], handles[1], handles[0]]
labels  = [labels[3],  labels[2],  labels[1],  labels[0],]
plt.legend(handles, labels, loc='best', fontsize=16)

plt.show()

The graph above offers several insights into each player. Note that “Total Ranking” here is the sum of a player’s ranking positions over the matches included, so a taller bar reflects more matches and numerically larger rank positions, and a smaller rank number is the better ranking. First, of the top ten players, Ferrer has the highest total, followed by Roddick, Djokovic, and Nadal. Second, all four have a large total on hard court, so we can assume that they all play on that surface frequently. Ferrer, Djokovic, and Nadal also have higher totals on clay than Roddick, with Ferrer’s being the highest. Third, these four players have small totals on grass and carpet, while some players with lower overall totals, such as Hewitt and Federer, have higher grass totals than all four. Finally, the most interesting part of this graph is that Federer, who is one of the big three and possibly one of the greatest players of all time, has a lower overall total than Roddick and Ferrer, who are not regarded as top-three players. Given how the total is built, this is consistent with Federer holding rank positions close to No. 1 for much of the period rather than with him being the weaker player.
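The “Total Ranking” plotted above comes from summing the Rank column per surface and player. A small sketch with hypothetical values shows how such a sum behaves next to the mean rank and the match count. The players, the numbers, and the summary table are illustrative assumptions, not results from the ATP data.

```python
import pandas as pd

# Hypothetical per-match ranking positions for two players on one surface
records = pd.DataFrame({
    "Player": ["Player A"] * 3 + ["Player B"] * 6,
    "Surface": ["Hard"] * 9,
    "Rank": [1, 1, 2, 5, 6, 5, 4, 6, 4],
})

# Sum, mean, and number of matches for each player
summary = records.groupby("Player")["Rank"].agg(
    total="sum", mean="mean", matches="count"
)
print(summary)
```

In this toy case Player B has the larger total because he has both more matches and numerically larger rank positions, which is why the mean rank and the match count are useful to read alongside the sum.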

A Look at the Big Three: Federer, Nadal, & Djokovic Ranking

In sports like tennis, players go through an evolution as their career takes off, as they suffer an injury, or after a major loss in a tournament. Most players have several years at the height of their career, followed by a tapering off or a plateau, and eventually retirement. With Federer, Nadal, and Djokovic being the most popular and top-ranked players of this era, I wanted to compare their rank from 2000 through 2016. To do this I created a line chart comparing each player’s rank throughout each tennis season, with the date axis shown at a yearly interval.


# The three players compared in this section
top3_names = ['Federer R.', 'Nadal R.', 'Djokovic N.']

# Keep the matches that one of these players lost
top3_df = tennis_df[tennis_df.Loser.isin(top3_names)]

top3_df = top3_df[['Date', 'Winner', 'Loser', 'WRank', 'LRank']]

# Within that subset: rows won by one of the three, and rows lost by one of the three
top3_wins = top3_df[top3_df.Winner.isin(top3_names)]

top3_losses = top3_df[top3_df.Loser.isin(top3_names)]

top3_wins = top3_wins[['Date', 'Winner', 'WRank']]
top3_losses = top3_losses[['Date', 'Loser', 'LRank']]

top3_wins.columns = ['Date', 'Player', 'Rank']
top3_losses.columns = ['Date', 'Player', 'Rank']

top3_df = pd.concat([top3_wins, top3_losses], sort=False)
top3_df['Date'] = pd.to_datetime(top3_df.Date, format='%d/%m/%Y')

top3_df = top3_df.sort_values(['Date'])
top3_df.Rank = top3_df.Rank.astype(int)

# Remove outlying Ranks

top3_df = top3_df[top3_df.Rank < 100]

top_players_df = top3_df.groupby(['Date', 'Player'])['Rank'].sum().reset_index(name='Total Ranking')

fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)

my_colors = {'Federer R.':'red',
             'Nadal R.':'green',
             'Djokovic N.':'blue'}

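# Group by the column name (not a one-item list) so each key is a plain string that matches my_colors
for key, grp in top_players_df.groupby('Player'):
    grp.plot(ax=ax, kind='line', x='Date', y='Total Ranking', color=my_colors[key], label=key, marker='8')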

plt.title('Total Rankings by Year', fontsize=20)
ax.set_xlabel('Date (Yearly Interval)', fontsize=20)
ax.set_ylabel('Total Ranking', fontsize=20, labelpad=22)
ax.tick_params(axis='x', labelsize=18, rotation=0)
ax.tick_params(axis='y', labelsize=18, rotation=0)

handles, labels = ax.get_legend_handles_labels()
handles = [handles[1], handles[2], handles[0]]
labels  = [labels[1],  labels[2],  labels[0]]
plt.legend(handles, labels, loc='best', fontsize=18, ncol=1)


plt.show()

The line chart above covers the sixteen years from 2000 to 2016 and shows that each player started with a high rank number. Federer started his professional career before both Nadal and Djokovic, and Nadal started before Djokovic. Even though they started at different times, there is a clear pattern among all three: the rank number starts off very high and then falls over the following years. Because a lower rank number is the better ranking, this fall reflects each player climbing toward the top of the rankings as his career matured. Age may still play a role later on: Federer is older than both Nadal and Djokovic, and toward the later years on the chart, between 2014 and 2016, his rank number fluctuates and is actually higher than those of the two younger players. Nadal and Federer also show more fluctuation than Djokovic, whose rank number plateaus at the end at a lower value than both Federer’s and Nadal’s, with no indication that it is about to increase.
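The chart plots one point per match date. If a true yearly interval is wanted, one possible approach (an assumption, not the method used above) is to average each player’s rank within each calendar year. The sketch below uses hypothetical dates and ranks for a single placeholder player.

```python
import pandas as pd

# Hypothetical match-date ranks for one placeholder player
ranks = pd.DataFrame({
    "Date": pd.to_datetime(
        ["03/01/2005", "20/06/2005", "10/01/2006", "15/09/2006"],
        format="%d/%m/%Y",
    ),
    "Player": ["Player A"] * 4,
    "Rank": [40, 20, 10, 4],
})

# Average rank per calendar year and player
yearly = (
    ranks.assign(Year=ranks["Date"].dt.year)
    .groupby(["Year", "Player"])["Rank"]
    .mean()
    .reset_index(name="Mean Rank")
)
print(yearly)
```

Averaging by year smooths out match-to-match noise, at the cost of hiding changes in rank within a season.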

Big Three Comparison on Court Surfaces

While a player’s ranking or number of Grand Slam and non Grand Slam titles says a lot, breaking the data down by performance on the three main surfaces can give more insight than ranking or tournament results alone. In general, a player’s performance on a surface correlates with how often they play on it, but it can also relate to how they perform on the other surfaces. I therefore graphed this data on a radar chart to show which of the big three (Federer, Nadal, and Djokovic) is the most well-rounded player and who is best on each surface.


surface = tennis_df[['Surface', 'Winner', 'Loser']]

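# Copy the slices so that renaming and adding columns does not act on a view of tennis_df
surface_wins = surface[['Surface', 'Winner']].copy()
surface_losses = surface[['Surface', 'Loser']].copy()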
surface_wins.columns = ['Surface', 'Player']
surface_losses.columns = ['Surface', 'Player']

surface_wins['idx'] = range(1, len(surface_wins) + 1)
surface_losses['idx'] = range(1, len(surface_losses) + 1)

surface_wins = surface_wins.groupby(['Surface', 'Player']).count()
surface_wins = surface_wins.reset_index()
surface_wins.columns = ['Surface', 'Player', 'Win_Count']

surface_losses = surface_losses.groupby(['Surface', 'Player']).count()
surface_losses = surface_losses.reset_index()
surface_losses.columns = ['Surface', 'Player', 'Loss_Count']

surface = pd.merge(surface_wins, surface_losses, on=['Surface', 'Player'])

surface['Play_Total'] = surface['Win_Count'] + surface['Loss_Count']

surface['Win_Percentage'] = round(surface['Win_Count'] / surface['Play_Total'], 4)*100

surface = surface[surface.Play_Total > 50]

surface.sort_values(by='Win_Percentage', ascending=False).head(30)
##      Surface          Player  Win_Count  Loss_Count  Play_Total  Win_Percentage
## 612     Clay        Nadal R.        351          35         386           90.93
## 969    Grass      Federer R.        147          20         167           88.02
## 1435    Hard     Djokovic N.        469          82         551           85.12
## 1468    Hard      Federer R.        622         124         746           83.38
## 1129   Grass       Murray A.         90          18         108           83.33
## 944    Grass     Djokovic N.         65          14          79           82.28
## 351     Clay     Djokovic N.        169          39         208           81.25
## 1185   Grass      Roddick A.         82          21         103           79.61
## 1295    Hard       Agassi A.        203          56         259           78.38
## 1131   Grass        Nadal R.         58          17          75           77.33
## 381     Clay      Federer R.        203          60         263           77.19
## 1717    Hard       Murray A.        369         110         479           77.04
## 1022   Grass       Hewitt L.        103          31         134           76.87
## 57    Carpet      Federer R.         46          14          60           76.67
## 1719    Hard        Nadal R.        363         112         475           76.42
## 1802    Hard      Roddick A.        402         136         538           74.72
## 1824    Hard      Sampras P.         73          26          99           73.74
## 1244   Grass     Tsonga J.W.         42          16          58           72.41
## 520     Clay      Kuerten G.        105          40         145           72.41
## 385     Clay    Ferrero J.C.        221          86         307           71.99
## 167   Carpet        Safin M.         46          18          64           71.88
## 621     Clay    Nishikori K.         60          24          84           71.43
## 766     Clay        Thiem D.         59          24          83           71.08
## 874    Grass      Berdych T.         59          24          83           71.08
## 1422    Hard  Del Potro J.M.        220          90         310           70.97
## 316     Clay        Coria G.        129          53         182           70.88
## 1192   Grass     Rusedski G.         41          17          58           70.69
## 384     Clay       Ferrer D.        293         122         415           70.60
## 605     Clay         Moya C.        203          85         288           70.49
## 1550    Hard       Hewitt L.        324         138         462           70.13
surface.Surface.unique()
## array(['Carpet', 'Clay', 'Grass', 'Hard'], dtype=object)

#Who of the Big Three is the best on Various Surfaces
top3_surfaces = surface[(surface.Player.isin(['Federer R.', 'Nadal R.', 'Djokovic N.'])) & (surface.Surface != 'Carpet')]


top3_surfaces = pd.pivot_table(top3_surfaces, values='Win_Percentage', columns=['Surface'], index=['Player'])

top3_surfaces.index.names
## FrozenList(['Player'])
top3_surfaces[top3_surfaces.index == "Federer R."]
## Surface      Clay  Grass   Hard
## Player                         
## Federer R.  77.19  88.02  83.38

#%matplotlib inline
labels = np.array(['Clay', 'Grass', 'Hard'])

Federer = top3_surfaces.loc[top3_surfaces[top3_surfaces.index == "Federer R."].index[0],labels].values
Nadal = top3_surfaces.loc[top3_surfaces[top3_surfaces.index == "Nadal R."].index[0],labels].values
Djokovic = top3_surfaces.loc[top3_surfaces[top3_surfaces.index == "Djokovic N."].index[0],labels].values

top3_wins = pd.DataFrame([Federer, Nadal, Djokovic])
top3_wins.columns = ['Clay', 'Grass', 'Hard']
top3_wins['Player'] = ['Federer R.', 'Nadal R.', 'Djokovic N.']
top3_wins = top3_wins[['Player', 'Clay', 'Grass', 'Hard']]

Federer = np.concatenate((Federer, [Federer[0]]))
Nadal = np.concatenate((Nadal, [Nadal[0]]))
Djokovic = np.concatenate((Djokovic, [Djokovic[0]]))

angles = np.linspace(0,2*np.pi, len(labels), endpoint=False)

angles = np.concatenate((angles,[angles[0]]))


# Draw all three players on a single polar axes. The last angle repeats the first
# to close each polygon, so it is dropped when placing the three surface labels.
fig = plt.figure(figsize=(20, 10))
ax = fig.add_subplot(111, polar=True)

for name, values in [('Federer', Federer), ('Nadal', Nadal), ('Djokovic', Djokovic)]:
    ax.plot(angles, values, 'o-', linewidth=2, label=name)
    ax.fill(angles, values, alpha=0.25)

ax.set_thetagrids(angles[:-1] * 180/np.pi, labels)
ax.grid(True)

l = ax.legend(bbox_to_anchor=(1.1,1))

plt.show()

Looking at the radar chart above, you can see that each player is clearly versatile across the three popular surfaces of clay, grass, and hard court. However, Federer is clearly the strongest of the three on grass (88.02%), and Nadal is clearly the strongest of the three on clay (90.93%). Djokovic, unlike Federer and Nadal, does not stand out as sharply on one surface; he is close to Federer on hard court, but on closer inspection he beats Federer there by about 1.74 percentage points (85.12% versus 83.38%). Looking at the average win percentage across the three surfaces, Djokovic comes out on top as the most complete or well-rounded player, marginally ahead of Federer by 0.02 points and ahead of Nadal by 1.323 points. This suggests that, averaged over grass, clay, and hard court, Djokovic performs slightly better than Nadal and Federer, even though each of the other two leads on his own best surface.
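The averages quoted above can be checked directly from the win percentages in the table earlier in this section. The sketch below re-enters those nine values by hand; the names win_pct, average, and best_per_surface are illustrative.

```python
import pandas as pd

# Win percentages by surface, copied from the table above
win_pct = pd.DataFrame(
    {
        "Clay": [77.19, 90.93, 81.25],
        "Grass": [88.02, 77.33, 82.28],
        "Hard": [83.38, 76.42, 85.12],
    },
    index=["Federer R.", "Nadal R.", "Djokovic N."],
)

# Unweighted average across the three surfaces
average = win_pct.mean(axis=1).round(3)

# Player with the highest win percentage on each surface
best_per_surface = win_pct.idxmax()

print(average)
print(best_per_surface)
```

This is a simple unweighted mean, so each surface counts equally regardless of how many matches a player has on it. Weighting by matches played would be a different and equally reasonable choice.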

Conclusion

From these visualizations, we can learn several things about the top players in the Association of Tennis Professionals, specifically those on the men’s tour. From the bar chart of winning probability in Grand Slam and non Grand Slam tournaments, we can infer that a better-ranked player is more likely to win a match, and that this advantage grows with the ranking difference. We can also see that a select few players have won most of the Grand Slam tournaments. Of the top ten players examined, Federer, Nadal, and Djokovic did not have the highest total ranking across the court surfaces, yet they had the most wins at the four Grand Slam tournaments. Finally, we learned that among the big three, Djokovic was the most well-rounded player across the court surfaces, with the highest average win percentage, and he also finished the period with a lower rank number than Federer and Nadal. While this does not prove who is the greatest player of all time, it does give tennis fans across the globe a better idea of the performance of the three most well-known and loved players of our time.

Appendix

tennis_df
##        ATP        Location  ... Total Sets Sets Difference
## 0        1        Adelaide  ...        2.0             6.0
## 1        1        Adelaide  ...        2.0             6.0
## 2        1        Adelaide  ...        3.0             4.0
## 3        1        Adelaide  ...        2.0             7.0
## 4        1        Adelaide  ...        3.0             1.0
## ...    ...             ...  ...        ...             ...
## 46647   54  St. Petersburg  ...        2.0             8.0
## 46648   54  St. Petersburg  ...        2.0             6.0
## 46649   54  St. Petersburg  ...        2.0             4.0
## 46650   54  St. Petersburg  ...        2.0             5.0
## 46651   54  St. Petersburg  ...        3.0             3.0
## 
## [46652 rows x 59 columns]