1. Introduction

We are going to look at some statistics from the Wimbledon tennis championships held over the years. We acquired the data from Wikipedia and assembled a dataset called Tennis, which contains information about the gentlemen's singles finals played at Wimbledon.

2. Loading the dataset

To load the dataset, we read the champions table from the following Wikipedia page:

link_wimbledon = 'https://en.wikipedia.org/wiki/List_of_Wimbledon_gentlemen%27s_singles_champions'

#Reading the champions table from the page, reusing the link defined above
Tennis <- htmltab(doc = link_wimbledon,
                  which = '//*[@id="mw-content-text"]/div/table[4]')

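The following libraries must be loaded before running the code above (htmltab provides htmltab(), and tidyverse provides the %>% pipe, group_by(), count() and arrange()):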

library(htmltab)
library(ggplot2)
library(tidyverse)

3. Variable Description

The Tennis dataset contains the following variables, which are described in Table 1:

Table 1. Variable Description

Variable             Description
-------------------  ------------------------------------------
Year                 The year the match took place
Champion             The winner of the tournament
C_Country            The country the winner was from
Runner-up            The runner-up of the tournament
Country              The country the runner-up was from
Score in the final   The score in the finals of the championship

4. Summary Statistics of the Championships

Let us explore the data to get some statistics on the Wimbledon championships over the years. The data span 1968 to 2022.

4.1 Which player has the most Wimbledon titles?

The initial dataset had two variables with the same name, Country. We therefore first rename one of them so that the explorations below can tell the two apart.

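#Changing country variable of the champion to C_Country
#Tennis$Country returns the first column named Country (the champion's country)
Tennis$C_Country <- Tennis$Country
#Dropping the original champion's Country column (column 2)
Tennis <- Tennis[, -2]
#Reordering to Year, Champion, C_Country, Runner-up, Country, Score in the final
Tennis <- Tennis[, c(1,2,6,4,3,5)]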
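Renaming by position is an alternative that avoids relying on which duplicate `$` happens to return. The sketch below is only an illustration on a hypothetical two-row data frame (`toy`, with placeholder players and country codes), and it assumes the scraped table has the column order Year, Country, Champion, Country, Runner-up, Score in the final.

```r
# Hypothetical toy data with the same duplicated column name as the scraped table
toy <- data.frame(
  Year = c(2001, 2002),
  Country = c("AAA", "BBB"),               # champion's country (assumed position 2)
  Champion = c("Player A", "Player B"),
  Country = c("CCC", "DDD"),               # runner-up's country (assumed position 4)
  `Runner-up` = c("Player C", "Player D"),
  `Score in the final` = c("6-4, 6-4, 6-4", "7-5, 6-3, 6-2"),
  check.names = FALSE,                     # keep the duplicated names as they are
  stringsAsFactors = FALSE
)

# Rename the first Country column by position; the second one keeps its name
names(toy)[2] <- "C_Country"

# With unique names, the columns can be reordered by name instead of by index
toy <- toy[, c("Year", "Champion", "C_Country",
               "Runner-up", "Country", "Score in the final")]
```

Selecting by name in the last step makes the intended column order readable without having to count positions.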
  • First, we find the player with the most Wimbledon titles to date.
#Adding a count variable
Tennis_1 <- Tennis %>%
  group_by(Champion) %>%
  count()

#Arranging them in descending order
Tennis_1 <- Tennis_1 %>%
  arrange(desc(n))

#Plotting the data
ggplot(data = Tennis_1, aes(x = Champion, y = n)) +
  geom_bar(stat = "identity", fill = "pink", col = "black") +
  geom_text(aes(label = n), position = position_stack(vjust = 0.8), size = 3) +
  theme(axis.text.x=element_text(angle = -70, hjust = 0)) +
  ylab("Number of championships")
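One thing to be aware of: ggplot2 draws a character x variable in alphabetical order, so sorting the data frame with arrange() does not by itself change the order of the bars. A minimal sketch of one way to order the bars by count is shown below. It uses a hypothetical toy data frame (`toy`, with placeholder player names) in place of Tennis, and swaps in count(sort = TRUE), reorder() and geom_col() as alternatives to the calls used above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for the Tennis data frame
toy <- data.frame(
  Champion = c("Player A", "Player B", "Player B",
               "Player C", "Player C", "Player C"),
  stringsAsFactors = FALSE
)

# count() with sort = TRUE groups, counts and sorts in a single call
titles <- toy %>% count(Champion, sort = TRUE)

# reorder(Champion, -n) orders the bars from most to fewest titles
p <- ggplot(titles, aes(x = reorder(Champion, -n), y = n)) +
  geom_col(fill = "pink", colour = "black") +
  geom_text(aes(label = n), vjust = -0.5, size = 3) +
  xlab("Champion") +
  ylab("Number of championships")
```

Printing `p` draws the chart. The same pattern should apply to the real data by replacing `toy` with Tennis.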

  • As the graph above shows, the player with the most Wimbledon titles to date is Roger Federer (8), followed by Pete Sampras and Novak Djokovic with 7 titles each. There are 21 different champions in total. The event was not held in 2020 because of the COVID-19 pandemic.

4.2 Which country has the highest number of Champions?

We now look at which country has won the most championships.

#Adding a count variable to country
Tennis_2 <- Tennis %>%
  group_by(C_Country) %>%
  count()

#Arranging them in descending order
Tennis_2 <- Tennis_2 %>%
  arrange(desc(n))

#Plotting the data
ggplot(data = Tennis_2, aes(x = C_Country, y = n)) +
  geom_bar(stat = "identity", fill = "pink", col = "black") +
  geom_text(aes(label = n), position = position_stack(vjust = 0.8), size = 3) +
  theme(axis.text.x=element_text(angle = -70, hjust = 0)) +
  ylab("Number of championships") +
  xlab("Country")

  • As the graph above shows, the USA has the most titles, followed by Switzerland. The top performer from the USA is Pete Sampras. All 8 of Switzerland's titles were won by Roger Federer. Note that the chart counts titles per country, not distinct champions.
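Because one player can win many titles, the number of titles per country and the number of distinct champions per country can differ. The sketch below shows one way to compute both with summarise(). It runs on a hypothetical toy data frame (`toy`, with placeholder players and country codes), not on the real results.

```r
library(dplyr)

# Hypothetical toy data: Player A wins twice, so AAA has 3 titles but 2 champions
toy <- data.frame(
  Champion  = c("Player A", "Player A", "Player B", "Player C"),
  C_Country = c("AAA", "AAA", "AAA", "BBB"),
  stringsAsFactors = FALSE
)

by_country <- toy %>%
  group_by(C_Country) %>%
  summarise(
    titles    = n(),                   # one row per final won
    champions = n_distinct(Champion)   # unique winners from that country
  ) %>%
  arrange(desc(titles))
```

Applied to Tennis, the `titles` column would match the counts plotted above, while `champions` would answer the question of how many different winners each country has produced.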