Who sets the pace when there’s no clock? In tennis, it’s the players who control the pace of the game, especially during their service games. With no time limit on matches, only score, players must be tactical when considering their pace of play.
This analysis will use data from a 538 article that focuses mostly on how the game has slowed over time, but today we’ll be looking at the player data specifically.
Let’s compare the the time between serves and the following point (the “down time”) for a classic rival pair; Rafael Nadal and Roger Federer.
First we’ll install and load the necessary packages.
tidyverse
conveniently has all of them.
install.packages("tidyverse")
library(dplyr)
library(ggplot2)
We’ll load the serve_times.csv
file from the Github
repo, which contains data on how many second elapse between a given
player’s serve and the start of the next point.
# Load data from Github repo
serve_time <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/tennis-time/serve_times.csv")
# Take an initial look at the data
glimpse(serve_time)
## Rows: 120
## Columns: 7
## $ server <chr> "Nicolas Almagro", "Nicolas Almagro", "Nicol…
## $ seconds_before_next_point <int> 22, 19, 23, 24, 19, 34, 21, 17, 17, 34, 29, …
## $ day <chr> "28-May-15", "28-May-15", "28-May-15", "28-M…
## $ opponent <chr> "Rafael Nadal", "Rafael Nadal", "Rafael Nada…
## $ game_score <chr> "15-30", "15-40", "30-40", "Deuce", "Ad-in",…
## $ set <int> 1, 1, 1, 1, 1, 5, 3, 3, 3, 1, 1, 1, 1, 1, 5,…
## $ game <chr> "0-0", "0-0", "0-0", "0-0", "0-0", "2-4", "0…
Let’s subset the data to get the player’s we’re interested in, Nadal and Federer, by filtering for their service games.
# Create list of players we're interested in
players_of_interest <- c('Rafael Nadal', 'Roger Federer')
poi_serve_times <- serve_time |>
filter(server %in% players_of_interest) |>
select(server, seconds_before_next_point) # Our variables of interest
# Take a quick look to confirm results are as expected
glimpse(poi_serve_times)
## Rows: 22
## Columns: 2
## $ server <chr> "Rafael Nadal", "Rafael Nadal", "Rafael Nada…
## $ seconds_before_next_point <int> 34, 29, 33, 23, 27, 27, 15, 15, 14, 22, 17, …
In this exercise, we consider player
to be our predictor
variable, and seconds_before_next_point
to be the
independent variable.
Let’s visualize the distribution of the data for each player to see if there’s anything interesting.
ggplot(poi_serve_times, aes(x = server, y = seconds_before_next_point, color = server)) +
geom_boxplot() +
labs(x = "Player",
y = "Seconds Before Next Point",
title = "Distribution of Time Between Points, Nadal v. Federer") +
theme(legend.position = 'none')
We can see that for this sample, Nadal’s time between points is
significantly longer than Federer’s, with a median
seconds_before_next_point
of approximately 28 compared to
Federer’s 16.
According to the data available, Federer generally plays faster games than Nadal.
For tennis fans, this may make intuitive sense given Federer’s fluid and forward-moving play style. Nadal is known for playing topspin power shots and aggressive rallying from the baseline, so perhaps a longer time between serves is part of his strategy!
Admittedly, the sample size is small and limited to a few events. Additional data across a wider set of tournaments, surfaces, and players would make this a far more reasonable analysis and conclusion.