This document visualizes some interesting statistics from the Wimbledon Women’s Singles Final over the years.
I used historical data from the Wimbledon website, where the women’s singles final data is stored. The dataset includes the seedings of the champion and runner-up, their nationalities, the length of each match in minutes, and the number of sets. I used the data from 1925 onward, as this was when Wimbledon started to rank its female players. Some years are missing completely, most notably the years of WWII, during which Wimbledon was cancelled.
I thought this data would be worth exploring, and I found many interesting things in it even before creating the visualizations below. Just this past year, in 2024, the two women in the final were seeded 7th and 31st, whereas finalists are typically in the top 5. I also found it interesting that the longest match, played in 2005, lasted 166 minutes.
library(DescTools)
library(readxl)
library(ggplot2)
library(lubridate)
library(dplyr)
library(scales)
library(ggthemes)
library(ggrepel)
library(plotly)
library(RColorBrewer)
df <- read_xlsx("Wimbledon stats 2.xlsx")
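p1 <- ggplot(df, aes(x = factor(RUSeed))) +
# geom_bar() counts rows per seed; geom_histogram() is meant for continuous data
geom_bar(colour = 'black', fill = 'pink') +
labs(title = "Histogram of Wimbledon Runner Up Seedings",
x = "Runner-Up Seed Ranking", y = "Frequency") +
# after_stat(count) replaces the deprecated ..count..; vjust is a fixed setting, not an aesthetic
geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.4)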
p1
This visualization shows the seeding of each Wimbledon runner-up.
Most were seeded in the top 10, and the most common runner-up seed
is second. I thought it was interesting that lower-seeded players
occasionally made it to the final, including 4 players who were not
seeded at all, shown in this visualization as the 0
value.
p2 <- ggplot(df, aes(x = Year, y = Minutes)) +
geom_line(color='black', linewidth=1) +
geom_point(shape=21, size=4, color='darkgreen', fill='darkgreen') +
labs(title="Length of Wimbledon Women's Singles Final", x="Year", y="Minutes", caption="Source: Draws Archive, Ladies Singles: www.wimbledon.com") +
theme_light() +
theme(plot.title = element_text(hjust=0.5)) +
geom_text_repel(aes(label=Minutes), size=3,
color='black',
segment.color='darkgreen',
max.overlaps = Inf,
box.padding=1,
point.padding=1)
p2
This visualization shows the length of each Wimbledon women’s singles
final match. The average match length is around 82 minutes, or
about 1 hour and 22 minutes. Against that average, the shortest match,
at only 30 minutes, and the longest, at 166 minutes (2 hours and 46
minutes), stand out as especially interesting.
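
The summary figures quoted above can be checked with a few lines of R. The sketch below is only an illustration: `toy` is a small made-up data frame standing in for the real spreadsheet, so its numbers are not the Wimbledon values, and the helper `to_hmm()` is a hypothetical name introduced here. The `Year` and `Minutes` column names follow the plotting code.

```r
# Toy stand-in for df; these values are invented for illustration only.
toy <- data.frame(
  Year    = c(2001, 2002, 2003, 2004),
  Minutes = c(60, 90, 120, 50)
)

# Mean, shortest, and longest match length in minutes.
mean_minutes  <- mean(toy$Minutes, na.rm = TRUE)
range_minutes <- range(toy$Minutes, na.rm = TRUE)

# Year in which the longest match was played.
longest_year <- toy$Year[which.max(toy$Minutes)]

# Hypothetical helper: convert a length in minutes to an "h:mm" label.
to_hmm <- function(m) {
  m <- as.integer(round(m))
  sprintf("%d:%02d", m %/% 60L, m %% 60L)
}
```

Running the same calls on `df$Minutes` instead of `toy$Minutes` would reproduce the figures discussed in the text. For reference, `to_hmm(82)` returns `"1:22"` and `to_hmm(166)` returns `"2:46"`.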
country_counts <- table(df$Country)
p3 <- plot_ly(labels = names(country_counts), values = country_counts, type = "pie",
textposition = "outside", textinfo = "label+percent") %>%
layout(title="Nationalities of Wimbledon Women's Singles Champions")
p3
The chart above shows the distribution of champions' nationalities. 59.3% of champions hail from the USA. The second most common nationality is German, at 9.89%. Great Britain is only the fifth most common, at 5.49%.
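
The percentages shown on the pie chart can also be computed directly, which makes them easier to quote and check. This is a minimal sketch under stated assumptions: `toy_country` is an invented vector standing in for `df$Country`, so the shares it produces are illustrative and not the real Wimbledon figures.

```r
# Toy stand-in for df$Country; values invented for illustration only.
toy_country <- c("USA", "USA", "USA", "GER", "GBR")

# Count champions per country, most frequent first.
counts <- sort(table(toy_country), decreasing = TRUE)

# Convert counts to percentage shares, rounded to two decimals.
shares <- round(100 * prop.table(counts), 2)
shares
```

Replacing `toy_country` with `df$Country` (or `df$RUCountry` for the runners-up) would give the shares for the real data.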
RUCountry_counts <- table(df$RUCountry)
plot_ly(df, labels = names(RUCountry_counts), values = RUCountry_counts) %>%
add_pie(hole=0.5) %>%
layout(title = "Nationality of Runners Up in Wimbledon Finals",
annotations = list(text = "Total Country Count: 20", showarrow = FALSE))
This donut chart shows the nationalities of the runners-up in the Wimbledon finals. Again, the USA leads, with 47.3% of the chart. This time, Great Britain ranks third among runners-up on its home turf.
ggplot(df, aes(x = Year)) +
# Map a label (not a colour name) inside aes(); the colours are set in scale_color_manual()
geom_line(aes(y = CSeed, color = "Champion")) +
geom_line(aes(y = RUSeed, color = "Runner-up")) +
scale_color_manual(values = c("Champion" = "orange", "Runner-up" = "hotpink"), name = NULL) +
# The trailing + attaches labs() to the plot instead of printing it on its own
scale_y_reverse() +
labs(title = "Wimbledon Champion & Runner-Up Seeds Over the Years",
x = "Year", y = "Ranking")
The visualization above compares the seedings of the champion and
runner-up each year. In many years, the champion and runner-up are
seeded first and second, respectively, as the seeding predicts. 2024
presents an interesting case, where the champion was seeded 31st and
the runner-up 7th. In other years, outliers exist where an unseeded
player makes it to the final and wins.
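
How often the final went exactly "as seeded" can be counted rather than read off the chart. The sketch below is illustrative only: `toy_seeds` is an invented data frame, and it assumes that unseeded players are coded as 0 in `CSeed` in the same way the text describes for `RUSeed`.

```r
# Toy stand-in for df; seeds invented for illustration only (0 = unseeded, an assumption).
toy_seeds <- data.frame(
  Year   = c(2001, 2002, 2003, 2004),
  CSeed  = c(1, 1, 3, 0),
  RUSeed = c(2, 2, 1, 5)
)

# Finals in which the first seed beat the second seed.
as_seeded       <- toy_seeds$CSeed == 1 & toy_seeds$RUSeed == 2
n_as_seeded     <- sum(as_seeded)
share_as_seeded <- mean(as_seeded)

# Years in which an unseeded player won the title.
unseeded_champion_years <- toy_seeds$Year[toy_seeds$CSeed == 0]
```

Applied to `df`, the same expressions would put a number on "many years" and list the outlier years by name.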
## Conclusion {.unlisted .unnumbered}