# A tibble: 100 × 5
Rank Player Age Tournaments.Played PPT
<dbl> <chr> <dbl> <dbl> <dbl>
1 11 Veronika Kudermetova 29 9 624.
2 2 Taylor Townsend 30 14 585
3 3 Elise Mertens 30 14 581.
4 1 Katerina Siniakova 29 16 547.
5 1 Aryna Sabalenka 28 19 532.
6 4 Jasmine Paolini 30 15 492.
7 4 Sara Errani 39 15 492.
8 6 Gabriela Dabrowski 34 16 433.
9 2 Elena Rybakina 26 23 372
10 3 Iga Swiatek 24 19 366.
# ℹ 90 more rows
Assignment 7
WTA Ranking and Age Analysis
Introduction
The Women’s Tennis Association (WTA) is the organizing body of women’s professional tennis around the world. Players of many ages compete on the tour, and I want to analyze age along with other factors such as rank and points.
I intend to answer my questions using the rankings of WTA tennis players. I used the chromote package in RStudio to web scrape the WTA’s top 50 rankings for singles and doubles players as of May 8th, 2026. This is the data I will use for my analysis. It is suitable because it comes directly from the WTA itself.
Data accessed at:
https://www.wtatennis.com/rankings/singles
https://www.wtatennis.com/rankings/doubles
Data Wrangling
When scraping the tables of WTA rankings for singles and doubles players, I needed to do some data wrangling so that the dataset would be easy to use. For example, I needed to filter out the words “Ranking History” and irrelevant numbers that showed up in some of the columns. I also needed to make sure that the rank and points columns were numeric and to remove an empty column. In my analysis, I will also create a column called points per tournament (points divided by tournaments played) and a column categorizing age.
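The cleaning steps described above could look roughly like the following. This is a minimal base R sketch, not the code used for this report: the data frame `raw`, its values, the column names `Points` and `Empty`, and the choice to count age 30 as “Over 30” are all assumptions made for illustration.

```r
# Illustrative stand-in for the raw scraped table.
# All values and the column names "Points" and "Empty" are made up.
raw <- data.frame(
  Rank = c("1", "2", "3"),
  Player = c("Player A Ranking History",
             "Player B Ranking History",
             "Player C Ranking History"),
  Age = c(28, 31, 24),
  Tournaments.Played = c(18, 20, 14),
  Points = c("9,000", "6,000", "3,500"),
  Empty = NA,
  stringsAsFactors = FALSE
)

clean <- raw

# Strip the "Ranking History" residue from the player names
clean$Player <- trimws(gsub("Ranking History", "", clean$Player, fixed = TRUE))

# Convert rank and points to numeric, removing thousands separators first
clean$Rank <- as.numeric(clean$Rank)
clean$Points <- as.numeric(gsub(",", "", clean$Points, fixed = TRUE))

# Drop any column that contains only missing values
clean <- clean[, colSums(!is.na(clean)) > 0]

# Derived columns: points per tournament and an age category
# (assumption: a player aged exactly 30 is placed in "Over 30")
clean$PPT <- clean$Points / clean$Tournaments.Played
clean$Age.Group <- ifelse(clean$Age >= 30, "Over 30", "Under 30")

print(clean)
```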
Analysis
- Create a table showing each player’s points per tournament.
By calculating the Points Per Tournament (PPT), we can see that lower-ranked players like Veronika Kudermetova (ranked 11th, about 624 PPT) can actually have a higher scoring rate than top-ranked players such as Aryna Sabalenka (ranked 1st, about 532 PPT).
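As a rough illustration of how a table like this can be built, the sketch below computes PPT and sorts by it in base R. The players and point totals are hypothetical placeholders, chosen only so that a lower-ranked player ends up on top.

```r
# Hypothetical players; the values are illustrative, not the scraped data
players <- data.frame(
  Rank = c(1, 2, 11),
  Player = c("Player A", "Player B", "Player C"),
  Tournaments.Played = c(20, 15, 9),
  Points = c(10000, 7200, 5400)
)

# Points per tournament = total points / tournaments played
players$PPT <- players$Points / players$Tournaments.Played

# Sort from highest to lowest PPT and keep the columns of interest
ppt_table <- players[order(-players$PPT),
                     c("Rank", "Player", "Tournaments.Played", "PPT")]

print(ppt_table)
```

Sorting by PPT rather than by rank is what lets a player outside the top 10 appear first.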
- Next, I am going to create a scatter plot to show the relationship between tournaments played and points earned.
There is not a strong positive or negative correlation. However, there are interesting outliers: players who are high on the y-axis and low or average on the x-axis. For example, one player has over 10,000 points after playing only about 19 tournaments. This indicates high efficiency, most likely from winning or reaching the finals of nearly every event entered. Several players have 7,500+ points while playing fewer than 20 tournaments. These are players who prioritize quality over quantity.
There is a heavy density of players between 20 and 32 tournaments played, with point totals mostly staying below 2,500. This group represents players who compete very frequently but likely exit in earlier rounds or play in lower-tier tournaments.
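One way to back up the visual impression is to compute the correlation coefficient alongside the scatter plot. The sketch below uses base R with made-up values; the vectors `tournaments` and `points` are placeholders for the corresponding columns of the scraped data.

```r
# Illustrative values only, not the scraped rankings
tournaments <- c(9, 14, 16, 19, 19, 23, 25, 28, 30, 32)
points <- c(5600, 8200, 8700, 10100, 7000, 8500, 2300, 1900, 2100, 1700)

# Pearson correlation: values near 0 indicate a weak linear relationship,
# values near -1 or 1 a strong one
r <- cor(tournaments, points)
print(r)

# Scatter plot of tournaments played (x) against points earned (y)
plot(tournaments, points,
     xlab = "Tournaments played",
     ylab = "Points",
     main = "Points vs. tournaments played")
```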
- To dive deeper into this, I am going to create a boxplot to show the point distribution by number of tournaments played.
The visualization shows that players with a “Strategic” workload (under 16 tournaments) actually maintain a significantly higher median point total than those playing more frequently. While the group over 16 tournaments contains more players, their points are more heavily concentrated at the lower end of the scale, suggesting that quantity of play does not guarantee a higher rank. The presence of several high-performing outliers in the over 16 tournaments group shows that while some players can sustain elite play across a heavy schedule, the most efficient path to the top appears to be a more selective tournament calendar.
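The grouping behind a boxplot like this can be sketched as follows in base R. The data are hypothetical, and placing players with exactly 16 tournaments in the heavier-workload group is an assumption, since the text only says “under 16” and “over 16”.

```r
# Hypothetical data; not the scraped rankings
players <- data.frame(
  Tournaments.Played = c(9, 14, 15, 12, 19, 23, 25, 28, 30, 32),
  Points = c(5600, 8200, 7400, 3000, 10100, 2500, 2300, 1900, 2100, 1700)
)

# Split players into two workload groups
# (assumption: exactly 16 tournaments counts as "16 or more")
players$Workload <- ifelse(players$Tournaments.Played < 16,
                           "Under 16", "16 or more")

# Median points per workload group
medians <- tapply(players$Points, players$Workload, median)
print(medians)

# Boxplot of points by workload group
boxplot(Points ~ Workload, data = players,
        xlab = "Tournaments played",
        ylab = "Points",
        main = "Points by tournament workload")
```

In this toy data the single 10,100-point player in the “16 or more” group shows up as an outlier while the group median stays low, which is the same pattern described above.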
- Create a histogram of the distribution of age for the Top 50 WTA Singles and Doubles Players.
This histogram shows that the age distribution of the top 50 WTA singles and doubles players is slightly right-skewed, with a significant concentration of athletes in their late 20s and early 30s. The highest frequency occurs around age 30. Younger players appear starting in their late teens, and the counts grow toward age 30. The count drops off sharply after age 34, with only a few outliers competing into their early 40s.
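A histogram of this kind, together with a quick skewness check, might be produced as in the base R sketch below. The `ages` vector is made up for illustration, and the two-year bin width is an arbitrary choice. Comparing the mean with the median is only a rough heuristic: a mean above the median is consistent with a right skew but does not prove it.

```r
# Illustrative ages only, not the scraped data
ages <- c(19, 22, 24, 24, 26, 28, 28, 29, 29,
          30, 30, 30, 31, 32, 34, 39, 41)

# Histogram with two-year bins covering the full age range
h <- hist(ages,
          breaks = seq(16, 44, by = 2),
          xlab = "Age",
          main = "Age distribution of ranked players")

# Rough skewness heuristic: compare mean and median
age_mean <- mean(ages)
age_median <- median(ages)
print(c(mean = age_mean, median = age_median))
```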
- Create a boxplot showing the point distribution by age group.
The “Over 30” group has a higher median point total and a larger interquartile range, suggesting that veteran players in the top 50 tend to hold higher point totals, although the wider range means those totals vary more within the group. In contrast, the “Under 30” group has a lower median but features several outliers reaching up to 10,000 points, representing the dominant stars of the younger generation. Overall, while the younger group contains the highest point earners, the veteran group shows a higher baseline of point accumulation across its middle 50% of players.
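The median and interquartile range behind this comparison can be computed directly, as in the base R sketch below. The data are hypothetical, and treating a player aged exactly 30 as “Over 30” is an assumption.

```r
# Hypothetical data; not the scraped rankings
players <- data.frame(
  Age = c(24, 26, 28, 29, 22, 30, 30, 34, 39, 31),
  Points = c(3600, 8500, 10100, 1500, 1200, 8200, 8100, 6900, 7400, 2000)
)

# Assumption: age 30 itself falls in the "Over 30" group
players$Age.Group <- ifelse(players$Age >= 30, "Over 30", "Under 30")

# Median and interquartile range of points for each age group
group_median <- tapply(players$Points, players$Age.Group, median)
group_iqr <- tapply(players$Points, players$Age.Group, IQR)
print(group_median)
print(group_iqr)

# Boxplot of points by age group
boxplot(Points ~ Age.Group, data = players,
        xlab = "Age group",
        ylab = "Points",
        main = "Points by age group")
```

Reading the median and the IQR side by side helps separate the two claims: the median describes the typical point total in a group, while the IQR describes how spread out the middle 50% of players are.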