We are going to look at some statistics of the Wimbledon tennis championships over the years. We acquired the data from Wikipedia and gathered it into a dataset called Tennis, which contains information about the gentlemen's singles finals played at Wimbledon.
The following libraries have to be loaded first:
library(htmltab)
library(ggplot2)
library(tidyverse)
To load the dataset, we scraped the table of champions from the following Wikipedia page:
link_wimbledon <- 'https://en.wikipedia.org/wiki/List_of_Wimbledon_gentlemen%27s_singles_champions'
Tennis <- htmltab(doc = link_wimbledon,
                  which = '//*[@id="mw-content-text"]/div/table[4]')
The Tennis dataset contains the following variables, described below:
| Variable | Description |
|---|---|
| Year | The year the final took place |
| Champion | The winner of the tournament |
| C_Country | The country the winner was from |
| Runner-up | The runner-up of the tournament |
| Country | The country the runner-up was from |
| Score in the final | The score of the final |
Let us explore the data to obtain some statistics about the Wimbledon championships over the years. The data span the years 1968 to 2022.
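A quick way to confirm the span stated above is to take the range of the Year column. The sketch below assumes Year may have been scraped as text, so it is converted to integer first; the year values shown are hypothetical placeholders, not the scraped data. Any entry that fails to convert would become NA and be worth inspecting.

```r
# Hypothetical Year column as it might come out of a scraped HTML table (text, not numbers)
years <- c("1968", "1969", "2021", "2022")

# Convert to integer before taking the range; non-numeric entries would become NA
year_num <- as.integer(years)
stopifnot(!anyNA(year_num))

# First and last year present in the data
span <- range(year_num)
print(span)
```

On the real data, the same two steps would be applied to `Tennis$Year`.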
The initial dataset had two variables with the same name, Country, one for the champion and one for the runner-up. We therefore first rename the champion's country variable to C_Country before carrying out the explorations.
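# `$` returns the first column named Country (column 2, the champion's country),
# so copy it into a new column called C_Country
Tennis$C_Country <- Tennis$Country
# Drop the original champion-country column
Tennis <- Tennis[, -2]
# Reorder to: Year, Champion, C_Country, Runner-up, Country, Score in the final
Tennis <- Tennis[, c(1, 2, 6, 4, 3, 5)]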
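To see why the positional indexing above works, here is a minimal base R sketch on a small hypothetical data frame; the player names, countries and scores are invented placeholders. It assumes the scraped table arrives in the order Year, Country, Champion, Country, Runner-up, Score in the final, which is the order the column indices above imply. Because `$` returns the first column whose name matches, `Country` refers to the champion's country.

```r
# Hypothetical stand-in for the scraped table: two columns share the name "Country"
toy <- data.frame(
  Year = c("2021", "2022"),
  Country = c("Country X", "Country Y"),      # champion's country
  Champion = c("Player A", "Player B"),
  Country = c("Country Z", "Country X"),      # runner-up's country
  `Runner-up` = c("Player C", "Player D"),
  `Score in the final` = c("score 1", "score 2"),
  check.names = FALSE,                        # keep the duplicate and non-syntactic names
  stringsAsFactors = FALSE
)

# `$` picks the first column named "Country", i.e. the champion's country
toy$C_Country <- toy$Country

# Drop the original champion-country column (position 2)
toy <- toy[, -2]

# C_Country is now column 6; move it next to Champion
toy <- toy[, c(1, 2, 6, 4, 3, 5)]

print(names(toy))
```

If the scraped column order differed, the numeric indices would silently pick the wrong columns, so printing `names(Tennis)` after scraping is a cheap check.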
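First, let us see which players have won the most championships.
# Count the number of titles won by each champion
Tennis_1 <- Tennis %>%
  group_by(Champion) %>%
  count()
# Sort the champions from most to fewest titles
Tennis_1 <- Tennis_1 %>%
  arrange(desc(n))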
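For comparison, the same per-champion tally can be sketched in base R with `table()` and `sort()`; the champion names below are hypothetical placeholders, one entry per final.

```r
# Hypothetical vector of champions, one entry per final
champions <- c("Player A", "Player B", "Player A", "Player C", "Player A", "Player B")

# table() counts how often each name occurs (the role of group_by() + count())
titles <- table(champions)

# sort(decreasing = TRUE) puts the most frequent champion first (the role of arrange(desc(n)))
titles <- sort(titles, decreasing = TRUE)

print(titles)
```

With the dplyr verbs already loaded, `count(Tennis, Champion, sort = TRUE)` is a one-line alternative to the two steps above.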
# Plot the number of titles per champion; reorder() sorts the bars by n,
# since ggplot2 would otherwise order a character axis alphabetically
ggplot(data = Tennis_1, aes(x = reorder(Champion, -n), y = n)) +
  geom_col(fill = "pink", col = "black") +
  geom_text(aes(label = n), position = position_stack(vjust = 0.8), size = 3) +
  theme(axis.text.x = element_text(angle = -70, hjust = 0)) +
  xlab("Champion") +
  ylab("Number of championships")
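One caveat about bar order: ggplot2 places a character x variable on a discrete axis ordered by its factor levels, which for plain character data are alphabetical. Sorting the rows with `arrange(desc(n))` therefore does not by itself change the order of the bars, which is why sorting them needs `reorder()` inside `aes()`. The sketch below, on hypothetical counts, compares the two level orders.

```r
# Hypothetical counts, already sorted in descending order of n
counts <- data.frame(
  Champion = c("Player B", "Player C", "Player A"),
  n = c(5, 3, 1),
  stringsAsFactors = FALSE
)

# Default factor levels are sorted alphabetically, ignoring the row order
default_levels <- levels(factor(counts$Champion))

# reorder() sorts levels by the second argument; using -n puts the most titles first
reordered_levels <- levels(reorder(counts$Champion, -counts$n))

print(default_levels)
print(reordered_levels)
```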
We are now going to look at which country's players have won the most championships.
# Count the number of titles won by players from each country
Tennis_2 <- Tennis %>%
  group_by(C_Country) %>%
  count()
# Sort the countries from most to fewest titles
Tennis_2 <- Tennis_2 %>%
  arrange(desc(n))
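Counting rows per C_Country gives the number of titles won by each country's players, which is not the same as the number of different champions: a country with one dominant player can top the title count. If the number of distinct champions is what is wanted, a base R sketch on hypothetical data (placeholder names and countries) is:

```r
# Hypothetical finals: one row per year with the champion and the champion's country
finals <- data.frame(
  Champion  = c("Player A", "Player A", "Player B", "Player C", "Player D"),
  C_Country = c("Country X", "Country X", "Country Y", "Country Y", "Country Z"),
  stringsAsFactors = FALSE
)

# Titles per country: every final counts once
titles_by_country <- table(finals$C_Country)

# Distinct champions per country: each player counts once, however many titles
champions_by_country <- tapply(finals$Champion, finals$C_Country,
                               function(x) length(unique(x)))

print(titles_by_country)
print(champions_by_country)
```

In dplyr terms, `summarise(n_distinct(Champion))` after `group_by(C_Country)` plays the same role as the `tapply()` call.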
# Plot the number of titles per country, with bars sorted by n
ggplot(data = Tennis_2, aes(x = reorder(C_Country, -n), y = n)) +
  geom_col(fill = "pink", col = "black") +
  geom_text(aes(label = n), position = position_stack(vjust = 0.8), size = 3) +
  theme(axis.text.x = element_text(angle = -70, hjust = 0)) +
  ylab("Number of championships") +
  xlab("Country")