Tennis fans never stop debating who the best players are, and in particular who is the greatest men’s player of all time on the Association of Tennis Professionals (ATP) tour. Because of the way ATP tournaments are structured, from the international events to the Grand Slams, an upset or unexpected win is always possible. Before each of the four majors (Wimbledon, the US Open, the Australian Open, and the French Open), there is always debate about who is going to win, and most of the time the names you hear are Federer, Nadal, or Djokovic. Opinions differ greatly on who is the best of the three, since it takes a great deal to rank in the ATP top ten, let alone the top three. Using the data provided by the ATP Men’s Tour, we are able to draw some insights from the results of tournaments between 2000 and 2016, including Grand Slams, Masters Series, Masters Cups, and International Series competitions. This report looks at the top-ranked players and asks which of them has the higher winning percentage and the most major titles.
This data set was collected from the Association of Tennis Professionals (ATP) Men’s Tour and can be found at https://www.kaggle.com/jordangoblet/atp-tour-20002016. It documents the ATP tournaments along with match outcomes, dates, rankings, and everything else that is essential to know about an ATP tournament, but some details were not significant for this analysis. Of the 54 columns, a large portion provided valuable knowledge about rankings, players, and the tournaments held, while others, though informative, were not relevant to the visualizations. I chose not to use those columns and instead used the data that pertained to the top three players, Roger Federer, Rafael Nadal, and Novak Djokovic, as well as the information relevant to Grand Slam titles. A description of the cleanup can be found in the appendix below.
## Findings
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'c:/ProgramData/Anaconda3/Library/plugins/platforms'
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")
path = "U:/"
#textfile = path + 'Metadata.txt'
#metadata = pd.read_csv(textfile, sep='\t')
filename = path + 'Data.csv'
tennis_df = pd.read_csv(filename, encoding='latin1')
One of the first things to look at before diving deeper into the world of ATP men’s tennis is the probability of winning at a Grand Slam versus at a tournament that is not a Grand Slam. The Grand Slams are the four most important tournaments in tennis: Wimbledon, the US Open, the Australian Open, and the French Open. They offer the most ranking points and are played as best of five sets rather than best of three. Comparing the two kinds of tournament helps us understand how the winning percentage changes with the ranking difference between the players, and whether it depends on the tournament being a Grand Slam.
tennis_df.WRank = pd.to_numeric(tennis_df.WRank, errors = 'coerce')
tennis_df.LRank = pd.to_numeric(tennis_df.LRank, errors = 'coerce')
tennis_df['Difference'] = tennis_df.LRank - tennis_df.WRank
tennis_df['Round_10'] = 10*round(np.true_divide(tennis_df.Difference,10))
tennis_df['Round_20'] = 20*round(np.true_divide(tennis_df.Difference,20))
#Number of sets within a match
tennis_df['Total Sets'] = tennis_df.Wsets + tennis_df.Lsets
tennis_df.W3 = tennis_df.W3.fillna(0)
tennis_df.W4 = tennis_df.W4.fillna(0)
tennis_df.W5 = tennis_df.W5.fillna(0)
tennis_df.L3 = tennis_df.L3.fillna(0)
tennis_df.L4 = tennis_df.L4.fillna(0)
tennis_df.L5 = tennis_df.L5.fillna(0)
tennis_df['Sets Difference'] = tennis_df.W1+tennis_df.W2+tennis_df.W3+tennis_df.W4+tennis_df.W5 - (tennis_df.L1+tennis_df.L2+tennis_df.L3+tennis_df.L4+tennis_df.L5)
new_df = tennis_df
df_non_GrandSlam = new_df[~(new_df.Series == 'Grand Slam')]
df_GrandSlam = new_df[new_df.Series == 'Grand Slam']
plt.figure(figsize = (20, 10))
bins = np.arange(10, 200, 10)
GrandSlam_prob = []
non_GrandSlam_prob = []
for value in bins:
    pos = value
    neg = -value
    pos_wins = len(df_GrandSlam[df_GrandSlam.Round_10 == pos])
    neg_wins = len(df_GrandSlam[df_GrandSlam.Round_10 == neg])
    GrandSlam_prob.append(np.true_divide(pos_wins, pos_wins + neg_wins))
    pos_wins = len(df_non_GrandSlam[df_non_GrandSlam.Round_10 == pos])
    neg_wins = len(df_non_GrandSlam[df_non_GrandSlam.Round_10 == neg])
    non_GrandSlam_prob.append(np.true_divide(pos_wins, pos_wins + neg_wins))
plt.bar(bins, GrandSlam_prob, width = 9, color = 'red')
plt.bar(bins, non_GrandSlam_prob, width = 8, color = 'blue')
plt.title('Ranking Difference vs Winning Probability', fontsize = 30)
plt.xlabel('Ranking Difference',fontsize = 15)
plt.ylabel('Winning Probability',fontsize = 15)
plt.xlim([10, 200])
plt.ylim([0.5, 0.9])
plt.legend(['Grand Slams', 'Non Grand Slams'], loc = 1, fontsize = 15)
plt.show()
Several things can be inferred from the above bar chart. We can easily see that the winning percentage increases as the rank difference does. However, this trend tends to saturate once the rank difference reaches one hundred. The same pattern appears in both Grand Slam and non-Grand Slam tournaments. One can also infer that in a Grand Slam tournament the favored player has a higher winning percentage against a lower-ranked underdog than in other tournaments. Finally, because Grand Slam matches are best of five sets rather than three, one can also conclude that non-Grand Slam tournaments may offer lower-ranked players a better chance of an upset.
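The bucketing behind the bar chart can be illustrated without pandas. The following is a minimal sketch, not the report's own code: the helper name `favorite_win_rate` and the sample rank pairs are hypothetical, and it assumes each match is given as a `(winner_rank, loser_rank)` pair. It rounds the rank difference to the nearest multiple of ten and, for each bucket, divides the matches won by the better-ranked player by all matches in that bucket.

```python
from collections import defaultdict


def favorite_win_rate(matches, bucket=10):
    """Share of matches won by the better-ranked player, per rank-difference bucket.

    matches: iterable of (winner_rank, loser_rank) pairs (hypothetical input shape).
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner_rank, loser_rank in matches:
        diff = loser_rank - winner_rank          # positive when the favorite won
        b = bucket * round(abs(diff) / bucket)   # nearest multiple of `bucket`
        if b == 0:
            continue                             # ranks too close to call a favorite
        total[b] += 1
        if diff > 0:
            wins[b] += 1
    return {b: wins[b] / total[b] for b in sorted(total)}


# Made-up rank pairs, for illustration only.
sample = [(1, 12), (3, 11), (15, 4), (2, 21), (30, 9)]
rates = favorite_win_rate(sample)
print(rates)
```

With real data, a bucket containing only a handful of matches gives a noisy estimate, so it may be worth reporting the bucket sizes alongside the rates.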
In the tennis world, certain players win certain tournaments more often than others. Since the four Grand Slam tournaments are the most highly anticipated events of the season, finding out which player has won each of them most often helps explain the players’ rankings. This heat map contrasts the four Grand Slams and the players who have won each tournament.
slams = tennis_df[tennis_df.Series == 'Grand Slam']
sets = slams[['Tournament', 'Series', 'Round', 'Wsets', 'Lsets']]
sets = sets.dropna()
round(sets.groupby('Tournament')['Lsets'].mean(), 3)
## Tournament
## Australian Open 0.685
## French Open 0.663
## US Open 0.679
## Wimbledon 0.688
## Name: Lsets, dtype: float64
round(sets.groupby('Round')['Lsets'].mean(), 3)
## Round
## 1st Round 0.674
## 2nd Round 0.686
## 3rd Round 0.655
## 4th Round 0.691
## Quarterfinals 0.705
## Semifinals 0.750
## The Final 0.776
## Name: Lsets, dtype: float64
wins = slams[['Winner', 'Tournament', 'Round']]
wins = wins[wins.Round == 'The Final']
winners_df = wins.groupby('Winner')['Tournament'].count()
winners_df = winners_df.reset_index()
winners_df = winners_df.sort_values(['Tournament'], ascending = False)
winners_slam_df = wins.groupby(['Winner','Tournament']).count()
winners_slam_df = winners_slam_df.reset_index()
winners_slam_df = winners_slam_df.sort_values(['Winner'], ascending=True)
winners_slam_df.columns = ['Winner','Tournament', 'Count']
winners_slam_df = winners_slam_df.dropna()
tennis_hm_df = pd.pivot_table(winners_slam_df, index='Winner', columns = 'Tournament', values = 'Count')
from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
ax = sns.heatmap(tennis_hm_df, linewidth = 0.5, annot = True, cmap = 'coolwarm', fmt=',.0f',
square = True, annot_kws={'size':11},
cbar_kws = {'format': comma_fmt, 'orientation': 'vertical'})
plt.show()
The above heat map gives us additional insight into each player at each tournament. While the bar graph showed the winning probability at Grand Slam versus non-Grand Slam tournaments in relation to the players’ rankings, it did not show how many titles were won, at which tournament, or by which player. The heat map shows the number of titles won at each of the four majors, which makes the comparison easier to take in at a glance. Based on the data, the players with the most overall wins across the four majors are Federer, Nadal, and Djokovic. We can conclude that they are clearly the top players of this period and that each of them clearly has a “preferred” tournament (the four tournaments are played on three surfaces: the Australian Open and US Open on hard courts, the French Open on clay, and Wimbledon on grass).
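The table behind the heat map is a count of finals per winner and tournament. A minimal sketch of that counting step in plain Python follows; the function name `title_matrix` and the placeholder rows are hypothetical and are not taken from the data set. Unlike the pivot table used for the plot, this version fills missing combinations with 0 instead of leaving them empty.

```python
from collections import Counter


def title_matrix(finals):
    """Count titles per (winner, tournament) and arrange them as nested dicts.

    finals: iterable of (winner, tournament) pairs, one per final (hypothetical shape).
    """
    counts = Counter(finals)
    players = sorted({winner for winner, _ in counts})
    tournaments = sorted({tournament for _, tournament in counts})
    return {
        p: {t: counts.get((p, t), 0) for t in tournaments}
        for p in players
    }


# Placeholder rows, for illustration only.
finals = [
    ("Player A", "Wimbledon"),
    ("Player A", "Wimbledon"),
    ("Player B", "French Open"),
    ("Player A", "US Open"),
]
matrix = title_matrix(finals)
for player, row in matrix.items():
    print(player, row)
```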
A huge debate among tennis fans and players is who the best tennis player in the world is. The four Grand Slam tournaments are played on three different surfaces (grass, clay, and hard court), so looking at the top ten players’ overall ranking, as well as their ranking on each surface, gives a better overall understanding of each player and may reveal their preference for or familiarity with a surface. Graphing the top ten players by their ranking gives an insight into who ranks highest among them overall and who ranks highest on each surface. The graph below is a stacked bar chart, colored by court surface.
#Keep every match in which a top-ten player either won or lost
#(the original second assignment overwrote the first, dropping most wins)
top10_names = ['Federer R.', 'Nadal R.', 'Djokovic N.', 'Murray A.', 'Roddick A.', 'Hewitt L.', 'Tsonga J.W.', 'Ferrer D.', 'Nalbandian D.', 'Berdych T.']
top10_df = tennis_df[tennis_df.Winner.isin(top10_names) | tennis_df.Loser.isin(top10_names)]
top10_df = top10_df[['Date', 'Surface', 'Winner', 'Loser', 'WRank', 'LRank']]
top10_wins = top10_df[top10_df.Winner.isin(['Federer R.', 'Nadal R.', 'Djokovic N.', 'Murray A.', 'Roddick A.', 'Hewitt L.', 'Tsonga J.W.', 'Ferrer D.', 'Nalbandian D.', 'Berdych T.'])]
top10_losses = top10_df[top10_df.Loser.isin(['Federer R.', 'Nadal R.', 'Djokovic N.', 'Murray A.', 'Roddick A.', 'Hewitt L.', 'Tsonga J.W.', 'Ferrer D.', 'Nalbandian D.', 'Berdych T.'])]
top10_wins = top10_wins[['Date', 'Surface', 'Winner', 'WRank']]
top10_losses = top10_losses[['Date', 'Surface', 'Loser', 'LRank']]
top10_wins.columns = ['Date', 'Surface', 'Player', 'Rank']
top10_losses.columns = ['Date', 'Surface', 'Player', 'Rank']
top10_df = pd.concat([top10_wins, top10_losses], sort=True)
top10_df['Date'] = pd.to_datetime(top10_df.Date, format='%d/%m/%Y')
top10_df = top10_df.sort_values(['Date'])
top10_players_df = top10_df.groupby(['Surface', 'Player'])['Rank'].sum().reset_index(name='Total Ranking')
top10_players_df = top10_players_df.sort_values(['Total Ranking'], ascending=False)
top10_players_df = top10_players_df.pivot(index='Player', columns='Surface', values='Total Ranking')
surface_order = ['Clay', 'Grass', 'Carpet', 'Hard']
top10_players_df = top10_players_df.reindex(columns=reversed(surface_order))
from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize=(20, 15))
ax = fig.add_subplot(1, 1, 1)
top10_players_df.plot(kind='bar', stacked=True, ax=ax)
plt.ylabel('Total Ranking', fontsize=20, labelpad=15)
plt.title('Total Ranking by Player and by Surface \n Stacked Bar Plot', fontsize=20)
plt.xticks(rotation=0, horizontalalignment = 'center', fontsize=18)
plt.yticks(fontsize=18)
ax.set_xlabel('Top 10 Players (Based on GS Wins 2000 - 2016)', fontsize=18, labelpad=15)
handles, labels = ax.get_legend_handles_labels()
handles = [handles[3], handles[2], handles[1], handles[0]]
labels = [labels[3], labels[2], labels[1], labels[0],]
plt.legend(handles, labels, loc='best', fontsize=16)
plt.show()
The above graph offers several insights into each player. Keep in mind that Total Ranking is the sum of the rank numbers a player held across all of their matches, so a larger bar reflects more matches played, a numerically higher (that is, worse) rank, or both. First, of the top ten players, Ferrer has the highest total, followed by Roddick, Djokovic, and Nadal. Second, for all four of these players the hard-court segment is large, so we can assume that they all play on that surface frequently. Ferrer, Djokovic, and Nadal also have higher totals on clay than Roddick, with Ferrer being the highest. Third, these four players have small totals on grass and carpet, and some players with lower overall totals, such as Hewitt and Federer, have higher totals on grass than all four of them. Finally, the most interesting part of this graph is that Federer, who is one of the Big Three and possibly one of the greatest players of all time, has a lower overall total than Roddick and Ferrer, neither of whom is recognized as a top-three player. A likely explanation is that Federer held a very low rank number for most of this period, so each of his matches adds little to the sum.
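Because the Total Ranking in the chart is a sum of per-match rank numbers, it grows with the number of matches as well as with the rank itself. The sketch below compares the sum with the mean rank per match. The helper `summarize` and the three placeholder players are hypothetical, and the mean is offered only as one possible alternative measure, not as the report's method.

```python
from statistics import mean


def summarize(ranks_by_player):
    """Return match count, summed rank, and mean rank for each player."""
    return {
        player: {"matches": len(ranks), "total": sum(ranks), "mean": mean(ranks)}
        for player, ranks in ranks_by_player.items()
    }


# Placeholder rank histories, for illustration only.
ranks_by_player = {
    "Player A": [1] * 100,   # ranked 1 in each of 100 matches
    "Player B": [5] * 40,    # ranked 5 in each of 40 matches
    "Player C": [2] * 150,   # ranked 2 in each of 150 matches
}
summary = summarize(ranks_by_player)
for player, stats in summary.items():
    print(player, stats)
```

In this toy input, Player C has the largest total even though Player C is better ranked than Player B in every match, which is why the mean may be the easier number to interpret.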
In many sports, tennis included, players go through an evolution as their career takes off, as they suffer an injury, or as they take a major loss in a tournament. Most players have several years at the height of their career, followed by a tapering off or a plateau, and eventually retirement. With Federer, Nadal, and Djokovic being the most popular and most successful tennis players of this period, I wanted to compare their rank over the course of 2000 to 2016. To do this I created a line chart comparing each player’s total rank throughout each tennis season (yearly interval).
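#Keep every match in which one of the three either won or lost
#(the original second assignment overwrote the first, dropping most wins)
top3_df = tennis_df[tennis_df.Winner.isin(['Federer R.', 'Nadal R.', 'Djokovic N.']) | tennis_df.Loser.isin(['Federer R.', 'Nadal R.', 'Djokovic N.'])]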
top3_df = top3_df[['Date', 'Winner', 'Loser', 'WRank', 'LRank']]
top3_wins = top3_df[top3_df.Winner.isin(['Federer R.', 'Nadal R.', 'Djokovic N.'])]
top3_losses = top3_df[top3_df.Loser.isin(['Federer R.', 'Nadal R.', 'Djokovic N.'])]
top3_wins = top3_wins[['Date', 'Winner', 'WRank']]
top3_losses = top3_losses[['Date', 'Loser', 'LRank']]
top3_wins.columns = ['Date', 'Player', 'Rank']
top3_losses.columns = ['Date', 'Player', 'Rank']
top3_df = pd.concat([top3_wins, top3_losses], sort=False)
top3_df['Date'] = pd.to_datetime(top3_df.Date, format='%d/%m/%Y')
top3_df = top3_df.sort_values(['Date'])
top3_df.Rank = top3_df.Rank.astype(int)
# Remove outlying Ranks
top3_df = top3_df[top3_df.Rank < 100]
top_players_df = top3_df.groupby(['Date', 'Player'])['Rank'].sum().reset_index(name='Total Ranking')
fig = plt.figure(figsize = (18, 10))
ax = fig.add_subplot(1, 1, 1)
my_colors = {'Federer R.':'red',
'Nadal R.':'green',
'Djokovic N.':'blue'}
#Group by the column name (not a one-item list) so that key is a plain string
for key, grp in top_players_df.groupby('Player'):
    grp.plot(ax=ax, kind='line', x='Date', y='Total Ranking', color=my_colors[key], label=key, marker='8')
plt.title('Total Rankings by Year', fontsize=20)
ax.set_xlabel('Date (Yearly Interval)', fontsize=20)
ax.set_ylabel('Total Ranking', fontsize=20, labelpad=22)
ax.tick_params(axis='x', labelsize=18, rotation=0)
ax.tick_params(axis='y', labelsize=18, rotation=0)
handles, labels = ax.get_legend_handles_labels()
handles = [handles[1], handles[2], handles[0]]
labels = [labels[1], labels[2], labels[0]]
plt.legend(handles, labels, loc='best', fontsize=18, ncol=1)
plt.show()
The line chart above covers the sixteen years from 2000 to 2016 and shows that each player starts with high Total Ranking values. You can clearly see that Federer started his professional career before both Nadal and Djokovic, and that Nadal started before Djokovic. There is also a clear pattern among all three players: even though they turned professional at different times, their values start off very high and then decline over the years, which is what we would expect as a new player climbs toward the top of the rankings (a lower rank number is a better rank). Federer is older than both Nadal and Djokovic, and toward the later years of the chart, between 2014 and 2016, his value fluctuates and is actually higher than those of the two younger players, so age could play a part. Nadal and Federer also show more fluctuation than Djokovic, whose value plateaus at the end below both Federer’s and Nadal’s, with no indication that it will increase in the future.
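The chart’s axis is labeled as a yearly interval, while the grouping in the code is by match date. One way to aggregate by calendar year is sketched below in plain Python. The function `mean_rank_by_year` and the placeholder rows are hypothetical. The sketch assumes dates in the same `%d/%m/%Y` format the report parses, and it uses the mean rank per year rather than the sum.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_rank_by_year(rows):
    """Average a player's rank over all matches in each calendar year.

    rows: iterable of (date_string, player, rank) with dates as dd/mm/yyyy.
    """
    buckets = defaultdict(list)
    for date_string, player, rank in rows:
        year = datetime.strptime(date_string, "%d/%m/%Y").year
        buckets[(player, year)].append(rank)
    return {key: mean(ranks) for key, ranks in sorted(buckets.items())}


# Placeholder rows, for illustration only.
rows = [
    ("03/01/2005", "Player A", 2),
    ("20/06/2005", "Player A", 4),
    ("15/01/2006", "Player A", 1),
    ("10/02/2005", "Player B", 10),
]
yearly = mean_rank_by_year(rows)
for (player, year), value in yearly.items():
    print(player, year, value)
```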
While a player’s ranking and their results at Grand Slam and non-Grand Slam tournaments say a lot, breaking the data down by performance on the three main surfaces can give more insight into a player than their ranking, and possibly even more than their performance in a single tournament. In general, a player’s performance on a court surface correlates with what they primarily play on, but it can also correlate with how they perform on the other surfaces. I therefore graphed this data on a radar chart to show which of the Big Three (Federer, Nadal, and Djokovic) is the most well-rounded player and who is the best on each surface.
surface = tennis_df[['Surface', 'Winner', 'Loser']]
#Copy the slices so the column assignments below do not write into a view
surface_wins = surface[['Surface', 'Winner']].copy()
surface_losses = surface[['Surface', 'Loser']].copy()
surface_wins.columns = ['Surface', 'Player']
surface_losses.columns = ['Surface', 'Player']
surface_wins['idx'] = range(1, len(surface_wins) + 1)
surface_losses['idx'] = range(1, len(surface_losses) + 1)
surface_wins = surface_wins.groupby(['Surface', 'Player']).count()
surface_wins = surface_wins.reset_index()
surface_wins.columns = ['Surface', 'Player', 'Win_Count']
surface_losses = surface_losses.groupby(['Surface', 'Player']).count()
surface_losses = surface_losses.reset_index()
surface_losses.columns = ['Surface', 'Player', 'Loss_Count']
surface = pd.merge(surface_wins, surface_losses, on=['Surface', 'Player'])
surface['Play_Total'] = surface['Win_Count'] + surface['Loss_Count']
surface['Win_Percentage'] = round(surface['Win_Count'] / surface['Play_Total'], 4)*100
surface = surface[surface.Play_Total > 50]
surface.sort_values(by='Win_Percentage', ascending=False).head(30)
## Surface Player Win_Count Loss_Count Play_Total Win_Percentage
## 612 Clay Nadal R. 351 35 386 90.93
## 969 Grass Federer R. 147 20 167 88.02
## 1435 Hard Djokovic N. 469 82 551 85.12
## 1468 Hard Federer R. 622 124 746 83.38
## 1129 Grass Murray A. 90 18 108 83.33
## 944 Grass Djokovic N. 65 14 79 82.28
## 351 Clay Djokovic N. 169 39 208 81.25
## 1185 Grass Roddick A. 82 21 103 79.61
## 1295 Hard Agassi A. 203 56 259 78.38
## 1131 Grass Nadal R. 58 17 75 77.33
## 381 Clay Federer R. 203 60 263 77.19
## 1717 Hard Murray A. 369 110 479 77.04
## 1022 Grass Hewitt L. 103 31 134 76.87
## 57 Carpet Federer R. 46 14 60 76.67
## 1719 Hard Nadal R. 363 112 475 76.42
## 1802 Hard Roddick A. 402 136 538 74.72
## 1824 Hard Sampras P. 73 26 99 73.74
## 1244 Grass Tsonga J.W. 42 16 58 72.41
## 520 Clay Kuerten G. 105 40 145 72.41
## 385 Clay Ferrero J.C. 221 86 307 71.99
## 167 Carpet Safin M. 46 18 64 71.88
## 621 Clay Nishikori K. 60 24 84 71.43
## 766 Clay Thiem D. 59 24 83 71.08
## 874 Grass Berdych T. 59 24 83 71.08
## 1422 Hard Del Potro J.M. 220 90 310 70.97
## 316 Clay Coria G. 129 53 182 70.88
## 1192 Grass Rusedski G. 41 17 58 70.69
## 384 Clay Ferrer D. 293 122 415 70.60
## 605 Clay Moya C. 203 85 288 70.49
## 1550 Hard Hewitt L. 324 138 462 70.13
surface.Surface.unique()
## array(['Carpet', 'Clay', 'Grass', 'Hard'], dtype=object)
#Who of the Big Three is the best on Various Surfaces
top3_surfaces = surface[(surface.Player.isin(['Federer R.', 'Nadal R.', 'Djokovic N.'])) & (surface.Surface != 'Carpet')]
top3_surfaces = pd.pivot_table(top3_surfaces, values='Win_Percentage', columns=['Surface'], index=['Player'])
top3_surfaces.index.names
## FrozenList(['Player'])
top3_surfaces[top3_surfaces.index == "Federer R."]
## Surface Clay Grass Hard
## Player
## Federer R. 77.19 88.02 83.38
#%matplotlib inline
labels = np.array(['Clay', 'Grass', 'Hard'])
Federer = top3_surfaces.loc[top3_surfaces[top3_surfaces.index == "Federer R."].index[0],labels].values
Nadal = top3_surfaces.loc[top3_surfaces[top3_surfaces.index == "Nadal R."].index[0],labels].values
Djokovic = top3_surfaces.loc[top3_surfaces[top3_surfaces.index == "Djokovic N."].index[0],labels].values
top3_wins = pd.DataFrame([Federer, Nadal, Djokovic])
top3_wins.columns = ['Clay', 'Grass', 'Hard']
top3_wins['Player'] = ['Federer R.', 'Nadal R.', 'Djokovic N.']
top3_wins = top3_wins[['Player', 'Clay', 'Grass', 'Hard']]
Federer = np.concatenate((Federer, [Federer[0]]))
Nadal = np.concatenate((Nadal, [Nadal[0]]))
Djokovic = np.concatenate((Djokovic, [Djokovic[0]]))
angles = np.linspace(0,2*np.pi, len(labels), endpoint=False)
angles = np.concatenate((angles,[angles[0]]))
#angles repeats its first entry to close each polygon, so it has four values for three labels;
#pass only the three distinct angles to set_thetagrids: 0º = Clay, 120º = Grass, 240º = Hard
fig = plt.figure(figsize=(20, 10))
#Draw all three players on a single polar axes so that one legend lists every player
ax1 = fig.add_subplot(111, polar=True)
for name, values in [('Federer', Federer), ('Nadal', Nadal), ('Djokovic', Djokovic)]:
    ax1.plot(angles, values, 'o-', linewidth=2, label=name)
    ax1.fill(angles, values, alpha=0.25)
ax1.set_thetagrids(angles[:-1] * 180/np.pi, labels)
ax1.grid(True)
l = plt.legend(bbox_to_anchor=(1.1,1))
plt.show()
Looking at the radar chart above, you can see that each player is clearly versatile across the three main surfaces of clay, grass, and hard court. Federer is clearly the strongest of the three on grass (88.02%), and Nadal is clearly the strongest of the three on clay (90.93%). Djokovic, unlike Federer and Nadal, does not stand out as sharply on a single surface. His best surface is hard court, where he is close to Federer but still beats him by about 1.74 percentage points (85.12% versus 83.38%). Averaging the win percentages over the three surfaces, Djokovic comes out on top as the most complete or well-rounded player, marginally ahead of Federer by 0.02 points and ahead of Nadal by 1.323 points. This does not mean Djokovic is the best on every surface, since Nadal leads on clay and Federer on grass, but across grass, clay, and hard court combined he has the highest average.
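The averages quoted above can be reproduced from the win percentages in the surface table. The sketch below copies those nine values by hand and takes an unweighted mean over the three surfaces. The variable names are illustrative, and the equal weighting of surfaces is an assumption, since the three players did not play the same number of matches on each surface.

```python
# Win percentages copied from the surface table above (Clay, Grass, Hard).
win_pct = {
    "Federer R.": {"Clay": 77.19, "Grass": 88.02, "Hard": 83.38},
    "Nadal R.": {"Clay": 90.93, "Grass": 77.33, "Hard": 76.42},
    "Djokovic N.": {"Clay": 81.25, "Grass": 82.28, "Hard": 85.12},
}

# Unweighted mean over the three surfaces for each player.
averages = {p: sum(s.values()) / len(s) for p, s in win_pct.items()}
best_overall = max(averages, key=averages.get)

# Player with the highest win percentage on each individual surface.
best_by_surface = {
    surface: max(win_pct, key=lambda p: win_pct[p][surface])
    for surface in ("Clay", "Grass", "Hard")
}

# Djokovic's margin over Federer on hard court, in percentage points.
hard_gap = win_pct["Djokovic N."]["Hard"] - win_pct["Federer R."]["Hard"]

for player, avg in averages.items():
    print(f"{player}: {avg:.3f}")
print(best_overall, best_by_surface, round(hard_gap, 2))
```

Weighting each surface by the number of matches played would give a different average, so the ordering between Djokovic and Federer, who are only 0.02 points apart, should be read as a near tie.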
From all of these visualizations, we can learn several things about the top tennis players in the Association of Tennis Professionals, specifically those on the men’s tour. From the bar chart of winning probability at Grand Slam and non-Grand Slam tournaments, we can strongly infer that higher-ranked players are more likely to win at a Grand Slam than players who are not highly ranked. We can also infer that a select few players combine strong results across all court surfaces with a large number of Grand Slam titles. From this we concluded that, of the top ten players, Federer, Nadal, and Djokovic did not have the highest total ranking across the court surfaces but did have the most wins at the four Grand Slam tournaments. Finally, we also learned which of the Big Three was the most well-rounded player across the court surfaces: Djokovic, even though his ranking was lower than Federer’s and Nadal’s. While this does not entirely prove who is the greatest player of all time, it does give tennis fans across the globe a better idea of the performance of the three most well-known and loved players of our time.
tennis_df
## ATP Location ... Total Sets Sets Difference
## 0 1 Adelaide ... 2.0 6.0
## 1 1 Adelaide ... 2.0 6.0
## 2 1 Adelaide ... 3.0 4.0
## 3 1 Adelaide ... 2.0 7.0
## 4 1 Adelaide ... 3.0 1.0
## ... ... ... ... ... ...
## 46647 54 St. Petersburg ... 2.0 8.0
## 46648 54 St. Petersburg ... 2.0 6.0
## 46649 54 St. Petersburg ... 2.0 4.0
## 46650 54 St. Petersburg ... 2.0 5.0
## 46651 54 St. Petersburg ... 3.0 3.0
##
## [46652 rows x 59 columns]