Introduction

This document visualizes some interesting statistics from the Wimbledon Women’s Singles Final over the years.

Dataset

I used historical data from the Wimbledon website, where the women’s singles final data is stored. The dataset includes the rankings of the champion and runner-up, their nationalities, the length of each match in minutes, and the number of sets. I used the data from 1925 onward, as this was when Wimbledon started to rank their female players. Some years are missing completely, most notably the years of WWII, during which Wimbledon was cancelled.

Findings

I thought this data would be good to explore, and I found many interesting things about it even before creating the visualizations found below. Just this past year, in 2024, the two women in the final were ranked 7th and 31st, while the athletes who make it to the final are typically in the top 5. I also thought it was interesting that the longest match, played in 2005, lasted 166 minutes.

# Load the packages used for data import, wrangling, and plotting
library(DescTools)
library(readxl)
library(ggplot2)
library(lubridate)
library(dplyr)
library(scales)
library(ggthemes)
library(ggrepel)
library(plotly)
library(RColorBrewer)

# Read the finals data from the Excel workbook
df <- read_xlsx("Wimbledon stats 2.xlsx")

Histogram of Runner-Up Seedings

p1 <- ggplot(df, aes(x = factor(RUSeed))) +
  # Seeds are discrete, so count each one with geom_bar() instead of binning
  geom_bar(colour = "black", fill = "pink") +
  labs(title = "Histogram of Wimbledon Runner Up Seedings",
       x = "Runner-Up Seed Ranking", y = "Frequency") +
  # Print the count above each bar; after_stat() replaces the deprecated ..count..
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.4)
p1

This visualization shows the ranking of each Wimbledon runner-up. Most were ranked in the top 10, and the most common position in the official Wimbledon seedings was second. But I thought it was interesting that players of lower rankings occasionally made it to the final, including 4 players who weren’t even considered for ranking, shown in this visualization as the 0 value.
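
The counts behind a chart like this can also be checked numerically. The following is a minimal sketch in base R, assuming seeds are stored as integers with 0 marking an unseeded player, as in the chart. The vector `ru_seed` holds made-up values for illustration and stands in for `df$RUSeed`.

```r
# Hypothetical seeds, NOT real Wimbledon data; 0 marks an unseeded player
ru_seed <- c(2, 2, 1, 0, 5, 13, 2)

# Frequency of each seed, equivalent to the bar heights in the chart
seed_counts <- table(ru_seed)

# Number of unseeded runners-up
n_unseeded <- sum(ru_seed == 0)

# Share of runners-up seeded 1 through 10 (unseeded players excluded)
share_top10 <- mean(ru_seed >= 1 & ru_seed <= 10)

# Most common seed, returned as a character label
most_common <- names(seed_counts)[which.max(seed_counts)]

seed_counts
n_unseeded
share_top10
most_common
```

Swapping `ru_seed` for `df$RUSeed` would give the figures for the real data, provided the column contains no missing values.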

Line Chart of Match Lengths

p2 <- ggplot(df, aes(x = Year, y = Minutes)) +
  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  geom_line(color = "black", linewidth = 1) +
  geom_point(shape = 21, size = 4, color = "darkgreen", fill = "darkgreen") +
  labs(title = "Length of Wimbledon Women's Singles Final", x = "Year", y = "Minutes",
       caption = "Source: Draws Archive, Ladies Singles: www.wimbledon.com") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5)) +
  # Label every point with its duration, repelled to avoid overlaps
  geom_text_repel(aes(label = Minutes), size = 3,
                  color = "black",
                  segment.color = "darkgreen",
                  max.overlaps = Inf,
                  box.padding = 1,
                  point.padding = 1)
p2

This visualization shows the length of each Wimbledon women’s singles final match. The average match length is around 82 minutes, or roughly 1 hour and 20 minutes. Against that average, the shortest match, only 30 minutes, and the longest, 166 minutes (almost 3 hours), become especially interesting.
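
Summary figures like the average, shortest, and longest match can be computed directly rather than read off the chart. Here is a minimal sketch in base R, assuming the `Year` and `Minutes` columns used in the plot. The data frame `toy` and the helper `to_hm` are hypothetical and exist only so the example runs on its own.

```r
# Toy stand-in for df: hypothetical values, NOT real Wimbledon data
toy <- data.frame(
  Year    = c(2001, 2002, 2003, 2004),
  Minutes = c(60, 90, NA, 120)
)

# Mean match length, ignoring years with no recorded duration
avg_minutes <- mean(toy$Minutes, na.rm = TRUE)

# Rows holding the shortest and longest matches (NA values are skipped)
shortest <- toy[which.min(toy$Minutes), ]
longest  <- toy[which.max(toy$Minutes), ]

# Hypothetical helper: format a duration in minutes as "h:mm"
to_hm <- function(m) {
  m <- round(m)
  sprintf("%d:%02d", as.integer(m %/% 60), as.integer(m %% 60))
}

avg_minutes
to_hm(avg_minutes)
shortest$Year
longest$Year
```

With the real data, replacing `toy` with `df` should reproduce the numbers quoted above, and `to_hm(166)` gives "2:46" for the longest final.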

Pie Chart of Champion Nationalities

# Count how many titles each country has won
country_counts <- table(df$Country)

# textinfo flags must be joined with "+" and no spaces
p3 <- plot_ly(labels = names(country_counts), values = as.vector(country_counts), type = "pie",
              textposition = "outside", textinfo = "label+percent") %>%
  layout(title = "Nationalities of Wimbledon Women's Singles Champions")
p3

The chart above shows the distribution of nationalities among the champions. 59.3% of them hail from the USA. The second-most common nationality is German, at 9.89%. Britain does not appear until fifth place, at 5.49%.
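
The percentages in a pie chart like this come from dividing each country's count by the total. A minimal sketch of that calculation in base R follows. The vector `country` holds made-up values standing in for `df$Country`, so the resulting shares are illustrative only.

```r
# Hypothetical champion nationalities, NOT real Wimbledon data
country <- c("USA", "USA", "USA", "GER", "GBR")

# Count champions per country, then convert the counts to percentages
counts <- table(country)
shares <- round(100 * prop.table(counts), 1)

# Sort from most to least common so the ranking is easy to read
ranked <- sort(shares, decreasing = TRUE)
ranked
```

Running the same steps on `df$Country` would list every nationality with its share, which makes it easy to confirm where a given country falls in the ranking.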

Donut Plot of Wimbledon Runner-Up Nationalities

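# Count how many times each country has produced the runner-up
RUCountry_counts <- table(df$RUCountry)
plot_ly(labels = names(RUCountry_counts), values = as.vector(RUCountry_counts)) %>%
  add_pie(hole = 0.5) %>%
  layout(title = "Nationality of Runners Up in Wimbledon Finals",
         # Derive the country total from the data instead of hard-coding it
         annotations = list(text = paste("Total Country Count:", length(RUCountry_counts)),
                            showarrow = FALSE))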

This donut plot shows the nationalities of the runners-up in the Wimbledon finals. Again, the USA takes the lead, with 47.3% of the chart. This time, Great Britain ranks third among runners-up on its home turf.

Plot of Champion and Runners-Up Seedings

ggplot(df, aes(x = Year)) +
  # Map color to a series label so the legend names each line
  geom_line(aes(y = CSeed, color = "Champion")) +
  geom_line(aes(y = RUSeed, color = "Runner-Up")) +
  scale_color_manual(name = NULL,
                     values = c("Champion" = "orange", "Runner-Up" = "hotpink")) +
  # Reverse the axis so the top seed appears at the top of the chart
  scale_y_reverse() +
  labs(title = "Wimbledon Champion & Runner-Up Seeds Over the Years",
       x = "Year", y = "Ranking")

The visualization above compares the rankings of the champion and runner-up each year. In many years, the champion and runner-up are aptly ranked first and second, respectively. 2024 presents an interesting case, where the champion was ranked 31st and the runner-up 7th. In other years, outliers exist where an unranked player makes it to the final and wins.

Conclusion