Looking for the best tennis player: How can we measure greatness in tennis?
In this project, we used data on ATP (Association of Tennis Professionals) matches, tournaments, and players, available in the JeffSackmann/tennis_atp repository on GitHub. Our aim was to search for the best tennis player. We trace the difficulties in comparing players who dominated tennis in different periods and propose various ways in which one may try to compare them.
We imported the data from the repository on January 1, 2021. At that moment, the latest data was from December 28, 2020, so the whole 2020 season was included. There are 3 main data sets:
data_atp.rds
rankings.rds
players.rds
data_atp.rds was used to produce a fourth data set, data_extra.rds, which extends data_atp with additional features (mostly match statistics that can be extracted from the score, such as the number of sets or tie-breaks won by each opponent). All the pre-processing can be found and reproduced using the data_preparation.R script, appended to the project.
data_atp.rds is a data set of matches. Originally, it contained 49 attributes, e.g. the names of the opponents (winner_name, loser_name), information about the tournament (tourney_name, tourney_level, tourney_date, surface), the round (round), the score (score), the match duration (minutes), and some more detailed information on the players, such as their height (winner_ht) and age at the moment of the match (winner_age), as well as details of the match, e.g. the number of aces (w_ace), double faults (w_df) or break points saved (w_bpSaved) [1]. The complete description of the original data can be found in the matches_data_dictionary.txt file, appended to the project.
The data in the data_atp file spans the years 1968 to 2020. We retrieved it on January 1, 2021, at 21:05 CET. The preprocessing of this data involved mostly formatting dates, renaming the levels associated with tourney levels, and adding the dates of birth of the opponents (which made for 51 attributes in total in this data set). For the latter task we used the players data set, retrieved on January 3, 2021, at 21:54 CET. players contains the players’ names and IDs, their birth dates, dominant hands, and country codes. This data set was not used outside of the preprocessing stage.
Much of the data preparation was devoted to processing the rankings data set, retrieved on January 3, 2021, at 21:40 CET. rankings.rds contains 5 variables: ranking_date, rank, player_id, points and player (name). Players’ names were added based on the IDs, using the players data set. We formatted the dates and restricted the data set to the top 100 players for each ranking update, as the project aimed to investigate only the best players. We found a couple of mistakes in the data set (cases in which a player’s rank for a particular date was an obvious typo) [2]. It is worth mentioning that the rankings data starts on 1973-08-27, and the ranking points are available only from 1990-01-01.
The data_extra file was produced based on the data_atp data set and extends the 51 attributes present in data_atp by another 35, for example w_SetsWon/l_SetsWon (number of sets won in the match), w_SvGmsLost/l_SvGmsLost (number of service games lost in the match) and w_tbWon/l_tbWon (number of tie-breaks won in the match). A full guide to this data set can be found in the data_preprocessing.R file.
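To illustrate how features such as w_SetsWon or w_tbWon can be derived from the score column, here is a minimal sketch in Python. It is not the project's data_preparation.R code; the function name score_features and the assumed score format (space-separated sets such as "7-6(5)", written from the winner's point of view, with tokens like "RET" for retirements) are assumptions made for this example.

```python
import re

# One set token such as "6-4" or "7-6(5)": winner's games, loser's games,
# and optionally a tie-break score in parentheses (assumed format).
SET_TOKEN = re.compile(r"^(\d+)-(\d+)(?:\((\d+)\))?$")


def score_features(score):
    """Count completed sets and tie-breaks won by the winner (w_) and loser (l_)."""
    feats = {"w_SetsWon": 0, "l_SetsWon": 0, "w_tbWon": 0, "l_tbWon": 0}
    for token in score.split():
        match = SET_TOKEN.match(token)
        if match is None:
            continue  # e.g. "RET" or "W/O": not a set result
        w_games, l_games = int(match.group(1)), int(match.group(2))
        tie_break = match.group(3) is not None
        high, low = max(w_games, l_games), min(w_games, l_games)
        # A set counts only if it was played to completion.
        finished = high > low and high >= 6 and (
            high - low >= 2 or tie_break or (high, low) == (7, 6)
        )
        if not finished:
            continue  # a set cut short by a retirement has no winner
        side = "w" if w_games > l_games else "l"
        feats[side + "_SetsWon"] += 1
        if tie_break:
            feats[side + "_tbWon"] += 1
    return feats


print(score_features("6-7(5) 7-6(3) 6-4"))
```

In this sketch a tie-break is counted only when the score carries a parenthesised tie-break result, so a bare "7-6" counts as a set but not as a tie-break; a real pipeline would have to decide how to treat such records.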
data_atp.rds (and data_extra.rds) contains 177,642 records, meaning that we have records of 177,642 professional men’s singles matches, from 444 tournaments (excluding Davis Cup [3]). rankings.rds, after a significant reduction (to the top players only), has 180,886 data points, and the players data set contains 6,433 unique players.
Ranking data seems to be an obvious first choice for finding out which players stand out the most in overall greatness. The source of ranking points is players’ performance in tournaments. Nowadays (since 2009), besides long-lasting fame and a significant financial reward, the most prestigious tournaments, the Grand Slams [4], grant their winners 2,000 ranking points. A runner-up receives 1,200 points, and the semi-finalists 720. The second most important event in the men’s tennis calendar is the ATP Finals, in which the best 8 players of the ending season compete for up to 1,500 additional points. The third tier in the hierarchy is the Masters 1000 tournaments, in which the winners get 1,000 ranking points, runners-up 600, and semi-finalists 360. Masters 1000 events are a major source of ranking points for the best players, as there are (for 2021) 9 such events during the year [5]. There are also tournaments in the ATP 500 and 250 series (500 or 250 ranking points for the winner), but they are played less frequently by the players at the very top of the ranking. As of today, the points are the basis for players’ ranking positions, and their expiry time is one year [6]. Crucially for our analysis, the set of tournaments available to players changes over time. Apart from the 4 immutable Grand Slam tournaments, the makeup and the number of events played at each level vary over the decades. Therefore, rankings provide a way to abstract away from how exactly players built their dominance and to focus only on how they compare to the others who played at the same time, in possibly similar circumstances. In other words, we will be looking here for the players who dominated the rankings to the highest degree.
The graph shows the top 10 players with respect to the time they spent as world number 1. The leader in this category is Novak Djokovic, who was listed on top of the ATP rankings for the first time in 2011. Djokovic’s more than six years as the ranking leader exceed, e.g., the total time that Bjorn Borg spent in the world top 3 (Borg is 9th in that category). However, there are 3 other players who come close to Djokovic’s record, with more than five years each as the world number 1: Federer, Sampras and Connors. Federer, just like Djokovic and Nadal (6th with respect to the orange category), is still an active player, and all three are currently in the top 10 [7], so their records are not a closed book.
Interestingly, though, in the other three categories taken into account, the leaders are different. Roger Federer is the undisputed leader with respect to time spent in the top 3 and in the top 10. With over 18 years in the top 10 and 16 in the top 3 overall, he is over two years ahead of the second player in these two sub-rankings, Rafael Nadal. Federer is also second with respect to consecutive time spent on top of the ranking, where the leader, with over five years, is Jimmy Connors.
Clearly, no single player beats all the others in every ranking record. Federer is either first or second in every category shown, which provides a strong argument for his advantage. However, it may be argued that the overall time spent in the top 3 or top 10 is not as important as the time spent on top of the ranking. In general, it seems that a relatively small group of 6 players stands out from the rest, all categories considered. None of these 6 players, namely Djokovic, Federer, Sampras, Connors, Lendl and Nadal, fell out of the top 10 in any category, and the top 3 in each category is always drawn from these six players.
The players’ domination over their rivals is reflected not only in the time they spent on top of the ATP rankings but also in the way they reached the top and eventually left it. From this viewpoint, Djokovic may be given credit for “often” advancing from the top 10 to the top 3, and then relatively “often” [8] to world number 1. A similar pattern can be seen for Sampras, who, however, spent less time overall in the top 3 and as number 1 than Djokovic. Federer, and especially Nadal, both have great records as players listed in the top 10 and top 3, but they spent comparatively more time than Djokovic at ranking positions other than 1.
There are, however, also some issues connected to the ATP rankings that render the conclusions we would like to draw dubious. As we mentioned, rankings provide a way to see how the players compare, regardless of, e.g., which and even how many tournaments there were at the time of their prime. However, the rankings methodology also changes, and the rankings data is not consistent for the older records. The methodology’s impact may lie in differences in the points awarded to the players for particular results at the tournaments, in additional opportunities for scoring points (like the ATP Finals), or in withdrawing points for particular events (as is the case with the Summer Olympics and Davis Cup from 2016). Also, an evolving set of tournaments played on different surfaces gives players, who usually favor some surfaces and tournaments over others, varying opportunities. First, in the next graph, we will focus on a more straightforward problem: the frequency of rankings updates.
What can easily be concluded is that prior to 1985 the rankings data was updated much less frequently than today. As the black line shows, there were years (in particular 1980-1984) when the rankings were updated only twice a season. This casts doubt on the record streaks as world number 1 achieved during this period. This is not to say that Jimmy Connors’ record streak of 5.43 years [9] should not be trusted, but one has to bear in mind that he needed to defend his leading position much less frequently than more recent players, who are in danger of losing points after every single tournament in which they performed well in the previous year. Additionally, the graph reveals that some players, in particular Lendl and Sampras, although they did not achieve a record streak, enjoyed two periods of domination with a relatively small interruption in between. But for breaks of a couple of months, one would be entirely entitled to say that the years 1986-1991 were dominated by Lendl. Another interesting fact is that in the last 12 years, with the exception of one period of Djokovic’s domination lasting for over 2 years, there were no particularly long periods of one-player dominance.
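Because ranking updates are irregularly spaced, time at number 1 is best measured in days rather than in the number of updates. The sketch below, written in Python for illustration, assumes that each ranking stays valid until the next update; the function name days_at_number_one, the input layout and the toy data (players "A" and "B") are hypothetical and do not come from the project.

```python
from datetime import date


def days_at_number_one(updates, last_day):
    """Total days and longest streak as number 1, per player.

    updates: list of (ranking_date, number_one_player), sorted by date.
    last_day: date on which the final ranking in the list stops counting.
    Assumption: a ranking holds until the next update is published.
    """
    total, longest = {}, {}
    streak_player, streak_days = None, 0
    for i, (day, player) in enumerate(updates):
        next_day = updates[i + 1][0] if i + 1 < len(updates) else last_day
        held = (next_day - day).days
        total[player] = total.get(player, 0) + held
        if player == streak_player:
            streak_days += held  # the same leader continues the streak
        else:
            streak_player, streak_days = player, held  # a new streak starts
        longest[player] = max(longest.get(player, 0), streak_days)
    return total, longest


# Toy data: two sparse updates (half a year apart) followed by weekly ones.
toy_updates = [
    (date(2000, 1, 1), "A"),
    (date(2000, 7, 1), "A"),
    (date(2001, 1, 1), "B"),
    (date(2001, 1, 8), "A"),
]
total, longest = days_at_number_one(toy_updates, last_day=date(2001, 1, 15))
print(total, longest)
```

With sparse updates, a whole year of leadership in the toy data rests on just two ranking lists, which is the reason why streaks from the early 1980s deserve some caution.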
But even the very best players had to start their careers from scratch and climb the rankings. One of the aspects we considered in the quest for the best player was how long it took a player to reach rank 1, counted from his first match recorded in the database.
Measured in days, Roger Federer climbed the rankings the fastest, reaching number 1 after just 1596 days, or slightly less than four and a half years. The second fastest was Pete Sampras, for whom it took just a week more. Third is Rafael Nadal with 1946 days. Of the commonly known “big three”, it took Djokovic the longest to become number 1 in men’s tennis. As the chart helps us place this on the timeline, we can see that this might be because his climbing period overlapped with Rafael Nadal’s. Findings like this make us aware that we also need to consider other players’ development when assessing a player.
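The measure itself is a simple date difference. A minimal Python sketch follows, assuming ranking rows of the form (ranking_date, rank, player) and a known date of the first recorded match; the function name and the sample rows for "Player A" and "Player B" are hypothetical.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # average year length, used only for readability


def days_to_number_one(first_match, ranking_rows, player):
    """Days from a player's first recorded match to his first week at rank 1.

    Returns None if the player never reached rank 1 in ranking_rows.
    """
    dates_on_top = [
        day for day, rank, name in ranking_rows if name == player and rank == 1
    ]
    if not dates_on_top:
        return None
    return (min(dates_on_top) - first_match).days


rows = [
    (date(2001, 1, 1), 5, "Player A"),
    (date(2003, 6, 2), 1, "Player A"),
    (date(2003, 6, 9), 1, "Player A"),
    (date(2003, 6, 2), 2, "Player B"),
]
climb = days_to_number_one(date(2000, 1, 1), rows, "Player A")
print(climb, round(climb / DAYS_PER_YEAR, 2))

# The 1596 days quoted in the text correspond to about 4.37 years.
print(round(1596 / DAYS_PER_YEAR, 2))
```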
Tennis hates a vacuum. Over the years, different players have dominated the courts. At the very top level, wins are highly concentrated among a few players. A good season for one player often means a terrible season for another, who lost finals to the winner despite his own high form and skills. This is why it is essential to keep some perspective when analyzing the best performances.
In the plot above, we can see the development of the best players’ careers, their peaks and troughs, and their relation to other players, as measured by the number of tournament wins. We can see that it is unusual for two or more players to have their peak seasons simultaneously. An excellent example of this is Ivan Lendl, whose best seasons in the 1980s are echoed in the lower number of wins of other great players of the time - Bjorn Borg and Jimmy Connors. Another example is Roger Federer’s best performance in 2005, which overshadowed Rafael Nadal’s number of tournaments won. Another insight from this chart is the pace of career development. We can see patterns of a steady climb, like Andre Agassi’s, a quick rise to prominence, like Ivan Lendl’s, or even sudden falls, like Andy Murray’s after 2015.
As we have pointed out, rankings allow for comparing players against their rivals, but they also have some drawbacks. Looking at the actual tournament wins has its purpose: not only do the wins make up the rankings, but some deficiencies of the rankings, like their dependence on the current ranking methodology (e.g. at one time winning a tournament means getting x points and, after a revision of the rules, y points; at one time a tournament counts, at another it does not), can be fixed by looking at particular events. We shall try to address this issue by looking at the “raw wins” and manually picking what is important. This is what we do in this section.
What we want to show with this graph is the actual number of tournament wins for the players who have collected the most of them, together with the types of wins and their distribution over the span of the career. In fact, Jimmy Connors is the player who won the most professional tournaments in the history of tennis - 110. However, his career was exceptionally long and lasted around 28 years (from the first to the last professional match recorded). Even more interestingly, his wins are concentrated in a period of 18 years, or even 13 years if three late wins are disregarded. Therefore, one may say that his dominance was less lasting than that of Federer, who has been winning for over 18 years now. From the other viewpoint, gathering a record number of wins in a shorter time may be deemed more impressive - but then other players stand out, in particular Bjorn Borg, who collected his 64 wins in just eight years.
The other thing that should be pointed out is the content of the players’ portfolios of wins. The legacy of Connors, McEnroe, Vilas or Nastase consists mostly of wins in “other tour-level events”. Today’s “big three”, that is Djokovic, Federer, and Nadal, have many more Grand Slam wins. The players’ records in particular types of tournaments are discussed in more detail later.
For almost all players, an interesting pattern can be detected: they win their first non-challenger tournament around two years after playing their first professional match. It is also visible for some players that their wins are periodic - in particular for Nadal, who specializes in clay, as clay tournaments are organized in late spring. The periods of players’ prime, as well as their injuries, may be spotted: for instance, Djokovic’s break c. five years ago, after a period of frequent highest-level wins. For most of the retired great players, we can see a period when they no longer enjoyed tournament wins despite being active. However, there are some exceptions, particularly Lendl and Sampras, the latter of whom even won a Grand Slam (US Open 2002) almost at the end of his career. Finally, we may look at the players’ achievements conditional on the chances they had and used. In this case, we consider only the finals a player reached [10]. It may be seen that, e.g., Federer eventually won only c. 65% of the finals he played, which is less than Connors and much less than Nadal and Djokovic (both c. 70%). On the one hand, reaching a final is an achievement, and relatively low ratios here indicate that Federer (or Lendl) might have had even greater records; in general, however, a low finals win ratio should not be counted to a player’s advantage. It is particularly impressive when a player such as Sampras (and, to an extent, also Borg, who however also had a long period of no wins at all) managed to win well over 70% of the finals he competed in.
Every professional sport has its more and less important competitions. The most important tennis events are the Grand Slam tournaments: the four annual events that offer the most ranking points, prize money, and public and media attention. The Grand Slam itinerary consists of the Australian Open, the French Open (also known as Roland Garros), Wimbledon, and the US Open in August–September. The Australian and the United States tournaments are played on hard courts, the French on clay, and Wimbledon on grass. As the surface analysis shows, different players have historically excelled on different court types.
The chord diagram above presents the eight players with the highest number of Grand Slam tournament wins. Starting with Rafael Nadal, we can see his dominance on Roland Garros’ clay, a decent performance at the US Open, and a moderate number of wins at Wimbledon. Second comes Roger Federer, whose single win at the French Open is compensated by a significant number of wins in the other three tournaments, especially Wimbledon. Novak Djokovic, third in line, also shows a strong performance on Wimbledon’s grass and an excellent history of winning the Australian Open. Notable mentions go to Pete Sampras, who dominated Wimbledon for some time while also posting one of the best records at the US Open, and to Bjorn Borg, who was one of the few players to give a stellar performance on both the grass of Wimbledon and the clay of the French Open.
As we have seen, some names appear more often than others in the list of big tournament winners. To further investigate their performance and see how it differs from that of a typical pro player, we took a more in-depth look at the relation between age and the number of Grand Slam matches won.
Blended into the background, we see the performance of each player available in our database. Highlighted, we can see the outstanding players and how much they differ from the rest of the pack. Looking at this step plot, we can see the pace of winning as a function of the player’s age, how early he started winning Grand Slam matches, and the overall number of wins. Taking Bjorn Borg as an example, we can see that his Grand Slam career started exceptionally early, with his first match win at just 17. His winning run continued until the end of his career at the age of just 26, and amounted to 142 match wins and 11 Grand Slam titles. Rafael Nadal, Novak Djokovic, and Pete Sampras began their Grand Slam careers at about the same age, while Federer joined the competition a little later. Whereas Pete Sampras ended his career after turning 30, Djokovic and Nadal are still adding to their totals. Still, both lag behind Roger Federer, who remains unmatched despite his considerable age for a professional player.
Speaking of the opportunities the players had to build their all-time great achievements, it is worth acknowledging that they did not always maintain consistency throughout their entire career span. Many times, players had periods of greatness followed by times of injury or worse form. This can be seen very well on the graph above for Nadal and Agassi, and to an extent also for Federer. Moreover, in the old days of tennis, and for various reasons even today, players may have had varying chances to travel to distant venues and compete in all events in the calendar. For this reason, we will now look not at their entire careers but only at specific seasons, and search for the best ratio of tournaments won to tournaments actually competed in.
It is worth noting that, as we are looking for the best of the best seasons in tennis history, we have limited the query to the most prestigious tournaments, and we do not take into account seasons in which a player competed in fewer than 4 tournaments. By these criteria, Lendl’s 1986 season happens to be the single best in history, with 75% of wins in the most prestigious tournaments (6 out of 8) [11], Roland Garros, Wimbledon and US Open among them [12]. Though Lendl is clearly ahead of the field, Djokovic’s 2015 season must also be acknowledged as one of the best ever, as he then competed in 13 of the most prestigious tournaments and was victorious in 9 of them [13]. What must also not be overlooked is that some players, notably Djokovic and Federer, although they failed to set a record in the tournament win ratio during a season, had more than 1 season qualifying for the all-time best list: Djokovic 2 (2015 and 2011), and Federer 4 (2006, 2004, 2017 and 2005).
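The season ratio can be sketched in a few lines of Python. The example below uses the two seasons quoted in the text (6 titles in 8 tournaments and 9 in 13) plus one hypothetical two-tournament season that the threshold filters out; the function name best_seasons and the row layout (player, season, won_title) are assumptions made for illustration.

```python
from collections import defaultdict


def best_seasons(entries, min_tournaments=4):
    """Rank (player, season) pairs by the share of entered tournaments won.

    entries: iterable of (player, season, won_title), one row per tournament
    a player entered. Seasons with fewer than min_tournaments are dropped.
    """
    played = defaultdict(int)
    won = defaultdict(int)
    for player, season, won_title in entries:
        played[(player, season)] += 1
        won[(player, season)] += int(won_title)
    ratios = [
        (won[key] / played[key], key)
        for key in played
        if played[key] >= min_tournaments
    ]
    return sorted(ratios, reverse=True)


entries = (
    [("Lendl", 1986, True)] * 6 + [("Lendl", 1986, False)] * 2
    + [("Djokovic", 2015, True)] * 9 + [("Djokovic", 2015, False)] * 4
    + [("Hypothetical", 1975, True)] * 2  # 100% ratio, but only 2 events
)
result = best_seasons(entries)
for ratio, (player, season) in result:
    print(player, season, round(ratio, 3))
```

The threshold matters: without it, the two-tournament season with a perfect record would top the list, which is exactly the kind of case the minimum of 4 tournaments is meant to exclude.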
Another aspect of career development worth considering is the performance across best seasons, measured by the total number of wins.
As presented on the graph, we consider the top 10 players in terms of the highest number of wins in their very best season. This bump plot (geom_bump) helps us investigate further the pace of career development. In the upper corner of the plot, we see Guillermo Vilas, whose one-season dominance was followed by a sharp decline in his other best seasons. This stands in stark contrast to Jimmy Connors, who came nowhere near Vilas in his best season but scored more wins over his career through strong performance across all five of his best seasons. Ilie Nastase presents an interesting pattern, with a nearly equally strong performance during his two best seasons followed by a substantial decline in his third.
Temporary dips in form may happen to every great player, so comparing the best seasons of our players has its purpose. However, consistency is also expected from the best players. Therefore, it is important to see their overall performance, and we shall do it at a more fine-grained level than before - by comparing the best players’ overall ratios of match wins.
Several interesting points may be made about the lifetime ratio of matches won to matches played. Today’s “big three,” Djokovic, Nadal, and Federer, clearly lead the field, with Djokovic and Nadal even having a pretty significant advantage over Federer. Nonetheless, one should remember that the big three are still active players, and, as was shown on the 3rd plot, players usually enjoy relatively fewer victories at the end of their careers (in tournaments, but inevitably also in matches). Therefore, we may expect Nadal’s and Djokovic’s numbers to fall, but only if they have already reached their ceiling - and this is, of course, hard to estimate. Although Nadal and Djokovic (and, occasionally, Federer) still play, their number of appearances has already matched or exceeded the numbers for the older great players, so their impressive results up to the current moment of their careers should, of course, not be denied.
Finally, it is worth noting that the set of players listed here is not very different from the list of the best players with respect to rankings statistics and tournament achievements. On the one hand, this is not surprising, as rankings are built on match wins. However, it could also have been the case that some players simply competed more than others (more frequently, or for a longer period, like Connors and Federer) and thereby built a significant portfolio of trophies. At least for the very best old-time players like Borg, Connors, Lendl, McEnroe, and Sampras, this objection does not hold, as they are also on the all-time best list of win ratios.
To be considered one of the best players, one needs to perform well both in Grand Slam and non-Grand Slam tournaments. We drilled down, calculated the ratios of matches won in each category, and investigated the relationship between them using linear regression.
The same names that dominated the previous charts score very high in both ratios. Players like Borg and Nadal performed exceptionally well in Grand Slam matches, with ratios close to 90%. Novak Djokovic takes the lead in non-Grand Slam matches with a win ratio of 82%. Roger Federer also scores very high, with the two ratios closer to each other, whereas Pete Sampras is more oriented towards winning Grand Slam matches. Notable mentions go to players who perform significantly better in one of the categories, like Dennis Ralston in Grand Slam matches and Jose Luis Clerc in non-Grand Slam matches.
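A simple way to relate the two ratios is an ordinary least-squares line, where the residuals show which players do better in Grand Slam matches than their non-Grand Slam ratio would suggest. The Python sketch below assumes the non-Grand Slam ratio is the predictor and the Grand Slam ratio the response (the text does not say which way the regression was set up), and the four data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical win ratios for four players (not taken from the data set).
non_gs_ratio = [0.5, 0.6, 0.7, 0.8]
gs_ratio = [0.5, 0.7, 0.7, 0.9]

slope, intercept = fit_line(non_gs_ratio, gs_ratio)
# Positive residual: better in Grand Slams than the fitted line predicts.
residuals = [
    y - (intercept + slope * x) for x, y in zip(non_gs_ratio, gs_ratio)
]
print(round(slope, 3), round(intercept, 3), [round(r, 3) for r in residuals])
```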
After what has been said, one may actually be tempted to ask a straightforward question: so, how do the greatest players of all time directly compare to each other? A different way to answer this question than looking at various statistics of their careers would be to inspect their head-to-head records. Isn’t it the case that if one player has played another many times, at various points of their careers, in various conditions and circumstances, and usually won, then he can be named the better player?
There is no single player who dominated every top player he played, but there are 3 players who stand out. The first is Rafael Nadal, who has the highest average win ratio against top 3 players he played (77%) and the highest worst-case win ratio: 46% against Nikolay Davydenko (in 11 matches). However, his second case of a win ratio under 50% (more losses than wins) is against Novak Djokovic, who has the 2nd highest worst-case win ratio, 44%, and for whom there was only 1 player who beat him more often than he was beaten by Djokovic - Andy Roddick (5 wins out of 9 matches played). That makes Djokovic the closest to being unbeatable, with 1 exception - Bjorn Borg. Borg, in turn, has the 3rd highest overall average win ratio against his opponents (71.1%), and also has only 1 opponent against whom he lost more often than he won - John Newcombe (1 win for Borg out of 4 matches). As in the case of Roddick and Djokovic, Borg is much younger than Newcombe and met him at the beginning of his career. All the opponents he played later on, he has beaten head-to-head.
Roger Federer must be mentioned as having the second highest average win ratio against his best rivals (73.4%). However, Federer also has quite a few rivals (7) against whom he lost more often than he won - not only players older than him, whom he might have met at the beginning of his career (like Gustavo Kuerten, Yevgeny Kafelnikov and Alex Corretja) and played only a few times, but also currently active players like Dominic Thiem and Alexander Zverev, and, probably most importantly, his greatest rivals - Rafael Nadal (42%) and Novak Djokovic (45%).
All in all, for every great player there is always another against whom he has trouble winning. That is not to say that head-to-head statistics do not distinguish any players over the others.
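The head-to-head statistics discussed above can be sketched as follows in Python. The two records taken from the text are Roddick's 5 wins in 9 matches against Djokovic and Borg's 1 win in 4 matches against Newcombe; "Opponent X" is a made-up rival, and the unweighted mean over opponents is only one possible reading of "average win ratio", so treat it as an assumption.

```python
from collections import defaultdict


def head_to_head(matches):
    """Build {player: {opponent: [wins, played]}} from (winner, loser) pairs."""
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for winner, loser in matches:
        table[winner][loser][0] += 1
        table[winner][loser][1] += 1
        table[loser][winner][1] += 1
    return table


def summary(table, player, min_matches=1):
    """Average and worst-case win ratio of a player over his opponents."""
    ratios = {
        opponent: wins / played
        for opponent, (wins, played) in table[player].items()
        if played >= min_matches
    }
    if not ratios:
        return None
    worst = min(ratios, key=ratios.get)
    return {
        "average": sum(ratios.values()) / len(ratios),  # unweighted mean
        "worst_opponent": worst,
        "worst_ratio": ratios[worst],
    }


matches = (
    [("Andy Roddick", "Novak Djokovic")] * 5
    + [("Novak Djokovic", "Andy Roddick")] * 4
    + [("Novak Djokovic", "Opponent X")] * 3  # hypothetical rival
    + [("Opponent X", "Novak Djokovic")] * 1
    + [("John Newcombe", "Bjorn Borg")] * 3
    + [("Bjorn Borg", "John Newcombe")] * 1
)
table = head_to_head(matches)
djokovic = summary(table, "Novak Djokovic")
borg = summary(table, "Bjorn Borg")
print(djokovic, borg)
```

The min_matches parameter is a hypothetical knob for ignoring pairings with only a handful of matches, such as rivals met a few times early in a career.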
Tennis is one of the few sports in which players must master playing on several significantly different surfaces - clay, grass, carpet, and hard courts. Each of them has its advantages and disadvantages in terms of match play and maintenance. Clay courts favor players who can play defense. The bounce of the ball is also higher and slower than on a hard court. Grass courts are among the fastest types of courts. Bounces tend to be fast and low, rallies are short, and the serve plays a more significant role than on other surfaces. When it comes to hard courts, play is generally faster than on clay because there is little energy absorption. Players can apply many types of spin during play, and the ball tends to bounce high. Carpet is typically a very fast surface, faster than a hard court, with a low bounce. Although high-profile tournaments like those of the ATP tour were commonly played on carpet in the past, the surface has declined in modern tennis. The main reason for this is that matches played on this surface, with their great pace, are not as interesting for the audience. Also, due to its characteristics, the carpet surface created more opportunities for injuries.
Starting with clay, Guillermo Vilas takes first place with 667 overall wins. Despite so many wins, he had a comparably poor record at the French Open, with just a single triumph in 1977, although he did reach the final four times. A similar story can be told about the second place - Manuel Orantes. Although he has won the second most clay-court matches in history (560), he never won the French Open - the most important tournament played on clay. The undisputed greatest clay-court player of all time is Rafael Nadal, who by the end of 2020 had won an incredible 60 clay-court titles, including 13 French Opens and 11 Monte Carlo Masters trophies, even though his 449 wins put him only in third place.
The hard surface is the most balanced among modern surfaces. In its top 10, we can find all of the so-called “big four” players: Federer, Nadal, Djokovic, and Murray. Roger Federer has a greater claim than anyone else to being the best hard-court player of all time. The runner-up is Novak Djokovic with his 612 wins. Andre Agassi, another tennis great who played the game for a long time, retiring at age 36, is in third position.
As for grass, Roger Federer is also considered the greatest player on this surface. Federer produced a run of dominance at Wimbledon - the most important grass tournament - from 2003 through 2009. In second place is Jimmy Connors with 174 wins, who in a way was the Federer of his time. Although Pete Sampras did not make it to the top 10 of total wins on grass, he deserves a notable mention for his seven Wimbledon titles. Through these achievements, he is also considered one of the most significant grass players of all time.
On carpet, something of a relic of the past, we see great players from previous decades. John McEnroe and Jimmy Connors tie for first place, but due to the more significant titles he won, John McEnroe is considered the all-time king of carpet courts.
Taking it all together, we can see that each surface has its own series of legends. A notable finding is that a player should not be assessed through his number of wins alone, as different matches are played for different stakes. We notice some patterns, with individual players such as Federer, Djokovic, and Nadal, and players from the past like Connors, appearing across different rankings.
Although we are here most concerned with finding out which player has been overall the best in tennis history, this can not be accomplished without stepping at a more detailed level, and looking at some of the components of tennis craft. We have tried to choose those elements that represent crucial factors that must characterize great, complete players. For instance, a mastery in serve is undoubtedly essential. Yet, it is not a good measure of dominance if the serves do not translate into a solid advantage in winning service points and sets. Similarly, a ratio of second serves won may be seen as a good indicator of confidence. Yet, it is conditional on losing first serves, which would not be called a quality of the best players. In the following two graphs, we will look at one measure of dominance and one measure of confidence under pressure and present the all-time best players concerning these qualities.
Interestingly, in both aspects, some players that occupy the top 10 places for the best in the ratio of matches won to 0 sets, and the best ratio of tie-breaks wins are not the from the shortlist all-time have seen before. It would look differently if we set higher thresholds for the number of appearances, but the point is not to deny some players access to the elite group but to filter out cases in which a player did not contribute much to sports’ history, playing mostly in low-rank tournaments or playing for a too short period. Of course, setting a threshold is always a bit arbitrary, and in the case of tie-breaks, one must consider that in the old-day of the game (prior c. 1990), tie-breaks were played much less frequently than they are played today.
As far as tie-breaks records are concerned, Connors, who may be considered the best in tie-breaks, played them over 18 times fewer than Federer. What is striking is that the ratio of tie-breaks won by Federer (2nd place) and Djokovic (3rd place) is almost identical, despite vast numbers of times they took part in tie-breaks, and also significant differences between them (Federer’s 698 tie-breaks vs. Djokovic’s 397). Some all-time greats also feature in the list for best tie-breaks players, notably Sampras and Roddick. However, there are also relatively unknown players (Mauricio Hadad, active in the 1990s, highest-ranking 78), and currently active, young players (Felix Auger-Aliassime, born 1998, highest ranking 17).
Our statistics for the most dominant players, that is, the ones who most frequently beat their rivals to 0 in sets, also reveal interesting findings. In this category of dominance, Nadal and Federer set themselves apart from the rest of the field, both in terms of the ratio of such wins and the absolute number of matches. They proved their superiority.
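A rough sketch of how such a dominance ratio could be computed from the score strings is given below, in Python rather than the project's R. Several things here are assumptions made for illustration: the winner's games are listed first in each set, matches whose score contains anything other than plain set scores (retirements, walkovers) are skipped, and the denominator is taken to be all completed matches a player took part in. The function names and the `min_matches` threshold are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: each completed set looks like "6-4" or "7-6(5)", winner's games first.
SET_SCORE = re.compile(r"^(\d+)-(\d+)(?:\(\d+\))?$")


def loser_sets_won(score):
    """Number of sets taken by the match loser, or None for an incomplete/odd score."""
    tokens = score.split()
    if not tokens:
        return None
    sets_for_loser = 0
    for token in tokens:
        m = SET_SCORE.match(token)
        if m is None:
            return None  # e.g. "RET" or "W/O": skip the whole match
        if int(m.group(2)) > int(m.group(1)):
            sets_for_loser += 1
    return sets_for_loser


def straight_set_ratios(matches, min_matches=200):
    """matches: iterable of (winner_name, loser_name, score) tuples.

    Returns {player: wins without dropping a set / completed matches played}
    for players with at least `min_matches` completed matches.
    """
    clean_wins = defaultdict(int)
    played = defaultdict(int)
    for winner, loser, score in matches:
        dropped = loser_sets_won(score)
        if dropped is None:
            continue
        played[winner] += 1
        played[loser] += 1
        if dropped == 0:
            clean_wins[winner] += 1
    return {p: clean_wins[p] / n for p, n in played.items() if n >= min_matches}
```

Dividing by matches won instead of matches played would be an equally defensible definition; the choice changes the ranking, so it should be stated alongside any top-10 list.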
Tennis is a beautiful game with a long tradition and generations of players who have delighted the public with their skills and style of play. Analyzing tennis data is a hard task, due to the complexity of the data and the constant evolution of the game itself. In our project, we wanted to discover who is the greatest player of all time. Unfortunately, a clear answer to this question does not exist. Each generation of players was governed by its own laws, had its own strengths and weaknesses, and had its own plethora of stars who competed with each other. However, during our analysis, a few names came up more often than others: the members of the Big Three (Federer, Nadal, and Djokovic) and prominent names from the past such as Borg, Connors, Lendl, McEnroe, and Sampras. To compare them one-to-one, we have to wait until the current stars have finished their careers, but by then there are sure to be new stars who can outshine the current champions. Therefore, there is nothing left for us to do but cheer for the athletes, enjoy the game, and remember the legends of the past.
Similarly for the loser: loser_ht, loser_age, l_ace, l_df, l_bpSaved.↩︎
All the pre-processing described here can be found in the data_preparation.R file.↩︎
The Davis Cup is the main national competition in tennis. During the year there are multiple tourneys on continental and global levels that are part of the Davis Cup competition. We often treat victories in the Davis Cup separately from individual victories, especially because winning every match in the Davis Cup does not imply that a player won the tournament, as the final result depends on his teammates’ performance.↩︎
There are four: Australian Open, Roland Garros (French Open), Wimbledon, and US Open. This, thankfully, has not changed in the recent history of tennis.↩︎
A calendar year and a season may be equated in tennis, as a season usually starts in January with the Australian Open and ends with the ATP Finals in November.↩︎
Meaning that in normal circumstances, when the calendar of tournaments does not change, the x points a player is awarded for a tournament a stay on his record until the next edition of tournament a.↩︎
As of February 2021.↩︎
“Often” should be understood here as: whenever a player, here Djokovic, is listed in the top 10, he is most probably also listed in the top 3.↩︎
In the presentation we gave on Jan 25, 2021, there was a mistake in counting the streaks for the players, resulting in Roger Federer being portrayed as the all-time best with respect to consecutive time spent as the world's best player. We had then calculated the streaks backwards (if a player is no. 1 at time ranking_date_t, check if he was no. 1 at time ranking_date_{t-1}; if yes, add ranking_date_t - ranking_date_{t-1} to the streak). However, the streak should be calculated, as it is now, in a forward fashion, as a player who becomes no. 1 remains no. 1 until the next ranking.↩︎
In the next section, we will consider all tournaments a player took part in during a particular season. Yet another proposition is to take the ratio of one’s victories to all calendar events.↩︎
We have to admit that there was a mistake in our presentation from January 25, 2021, concerning this graph and the graph for the best individual seasons by match win ratio. The labels for seasons (players’ names and years) were then unfortunately presented in reversed order.↩︎
Winning 4 Grand Slam tournaments in one year is considered the holy grail in tennis. In the modern era of tennis, since 1968, it has been accomplished only once, in 1969 by Rod Laver. Sadly, Laver began his career in the early 1960s, so we do not have a full record of his achievements (and those prior to 1968 are not directly comparable to the later ones, due to major changes in tournaments’ regulations).↩︎
In 2015 Djokovic won 3 of 4 Grand Slam tournaments, 6 of 9 ATP Masters 1000 events, and the ATP Finals. By winning the Australian Open and Roland Garros the next year, he became the only tennis player in the Open Era (apart from Rod Laver) who completed a non-calendar Grand Slam, that is, 4 Grand Slam tournament wins in a row, but not necessarily during one season.↩︎
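The forward streak calculation described in the footnote on no. 1 streaks can be sketched as follows. This is an illustrative Python sketch, not the project's R code. It assumes a list of (ranking date, name of the no. 1 player) pairs, one per published ranking; the function name and the input layout are hypothetical. Each ranking credits the no. 1 player with the days until the next ranking, so the most recent ranking adds nothing.

```python
from datetime import date


def longest_no1_streaks(rankings):
    """rankings: list of (ranking_date, player_name) for the no. 1 spot.

    Returns {player: longest uninterrupted spell as no. 1, in days},
    counted forward: a player who is no. 1 on one ranking date holds the
    spot until the next ranking date.
    """
    rankings = sorted(rankings)
    best = {}
    current_player, current_days = None, 0
    # Pair each ranking with the one that follows it.
    for (day, player), (next_day, _) in zip(rankings, rankings[1:]):
        if player != current_player:
            # A new no. 1 starts a fresh streak.
            current_player, current_days = player, 0
        current_days += (next_day - day).days
        best[player] = max(best.get(player, 0), current_days)
    return best


example = [
    (date(2020, 1, 6), "A"),
    (date(2020, 1, 13), "A"),
    (date(2020, 1, 20), "B"),
    (date(2020, 2, 3), "A"),
    (date(2020, 2, 10), "A"),
]
streaks = longest_no1_streaks(example)
```

In this toy input, player B is credited with the 14 days between January 20 and February 3, whereas the backward rule would give B nothing because B was not no. 1 on the preceding ranking date.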