Thanks for taking a look at this for me, Brian!
In this document I wanted to describe my workflow for using Catapult (and catapultR) for our submaximal testing (SMT) at the club. We have a heart rate (HR) run protocol, which is a standardised bout of running used to measure exercise HR (HRex) as a proxy of fitness. I won't bore you with the details of why I think it's a good idea, but we tried to summarise everything in our latest paper on the topic. It's really simple to run (see here) and all we need to do is extract the mean 30-sec heart rate at the end of the run (3 minutes). Cue integrated HR vests and catapultR!
First things first: I'll load the packages in and set my catapultR token.
# load packages ------------------------------------------------------------------
library(rstudioapi) # dirname() & getActiveDocumentContext() to auto set wd
library(tidyverse)
library(catapultR)
library(readxl)
library(nplyr)  # nest_mutate()
library(slider) # slide_dbl()
# set token ----------------------------------------------------------------------
# set token wd
setwd(dirname(getActiveDocumentContext()$path))
setwd("../")
# load token
sToken <-
  read_file("token.txt")
# apply token
token <-
  ofCloudCreateToken(sToken = sToken,
                     sRegion = "EMEA")
I start by pulling down all the relevant metadata from the cloud. It's pretty straightforward, and I'm preaching to the choir here, but the documentation on the catapultR web page is a wicked resource! I just had to make a few tweaks to fit what I needed, navigate some errors I was getting, and stick to a tidyverse code style.
One initial stumbling block came when trying to fetch the periods and then filter by my SMT tag list. Some periods have no tags, and this was causing an issue. The workaround was to first filter the periods df to only include rows where the tag list is at least length 1; the subsequent SMT tag filtering then plays ball.
# get cloud metadata ---------------------------------------------------------------
# start date
start <- "2025-07-07"
# players
players <-
  ofCloudGetAthletes(credentials = token) %>%
  as_tibble()
# activities
activities <-
  ofCloudGetActivities(credentials = token,
                       from = as.integer(as_datetime(start)))
# smt periods
periods <-
  # all periods
  map(.x = activities %>% pull(id),
      .f = ~ ofCloudGetPeriods(credentials = token,
                               activity_id = .x)) %>%
  list_rbind() %>%
  # keep only rows where tag_list is non-empty
  filter(map_lgl(.x = tag_list,
                 .f = ~ length(.x) > 0)) %>%
  # smt only
  unnest(cols = tag_list) %>%
  filter(tag_list %in% c("HR Run", "Stride Test", "Std SSG"))
When it comes to getting the players in periods, this was again more fiddly than anticipated, because it only threw up errors after a few weeks of use. player_id is either returned blank or as a list of length 1 (the id), meaning unnest(cols = pip) won't work, because the data in player_id is not of a consistent class (character in some rows, list in others). I found it easier just to remove the column and proceed.
# players in periods
pip <-
  periods %>%
  select(id, tag_list) %>%
  mutate(# get players in period
         pip = map(.x = id,
                   .f = ~ ofCloudGetAthleteDevicesInPeriod(credentials = token,
                                                           period_id = .x)),
         # remove player_id from players nested df
         pip = map(.x = pip,
                   .f = ~ .x %>% select(-player_id))) %>%
  # unnest
  unnest(cols = pip)
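As an aside, rather than dropping the column, you could in theory coerce player_id to a single class before unnesting. I haven't battle-tested this against the live API, so treat it as a sketch (pip_alt is just an illustrative name):
# sketch only: coerce player_id to character instead of dropping it
pip_alt <-
  periods %>%
  select(id, tag_list) %>%
  mutate(pip = map(.x = id,
                   .f = ~ ofCloudGetAthleteDevicesInPeriod(credentials = token,
                                                           period_id = .x)),
         # empty values become NA; length-1 lists become plain character ids
         pip = map(.x = pip,
                   .f = ~ .x %>%
                     mutate(player_id = map_chr(.x = player_id,
                                                .f = ~ if (length(.x) == 0) NA_character_
                                                       else as.character(.x[[1]]))))) %>%
  unnest(cols = pip)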
Next, to get the sensor data. I don't know if this is more cumbersome than it needs to be, but I was getting a lot of double-nested lists from the catapultR functions. Maybe it's because I ran them inside mutate() to add them on as new columns, but that was needed so I could retain tag_list from the initial (outer) data frame (it distinguishes between the types of SMT, but as a variable it isn't returned by any subsequent catapultR function). So, for example, I had to use map(~ pluck(.x, 1)) to extract the first element of .x, which is the inner data frame returned by ofCloudGetPeriodSensorDataEx().
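Here's a toy illustration (made-up list, nothing Catapult-specific) of what that pluck() step is doing; the real pull is below:
# toy example: each outer element is a list whose first element is a df
list(list(tibble(x = 1)),
     list(tibble(x = 2))) %>%
  # extract the inner df from each outer element
  map(.f = ~ pluck(.x, 1)) %>%
  # bind the inner dfs together
  list_rbind()
#> returns a 2-row tibble with x = 1, 2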
# get sensor data ------------------------------------------------------------------
sensor <-
  pip %>%
  # subset pip df to arguments for cloud sensor function
  select(period_id = id,
         athlete_id,
         tag_list) %>%
  # get cloud sensor data & unlist into a df
  mutate(pmap(.l = list(period_id, athlete_id),
              .f = ~ ofCloudGetPeriodSensorDataEx(credentials = token,
                                                  period_id = ..1,
                                                  athlete_id = ..2)) %>%
           # extract inner df
           map(.f = ~ pluck(.x, 1)) %>%
           list_rbind(),
         # convert nested dfs to tibbles
         data = map(.x = data,
                    .f = as_tibble))
Now the sensor data is in R and I can press on with tidying and wrangling it. I tend to save the data frame at this point, in case I ever need to backtrack, but also to avoid making unnecessary calls to the API.
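For completeness, a minimal sketch of that save step (the file names are just what I happen to use; the incremental script below loads the .RData flavour):
# save the raw cloud pull so the API doesn't need to be hit again
write_rds(x = sensor,
          file = "data/1_cloud_data.rds",
          compress = "gz")
# or, as an .RData file
save(sensor, file = "data/1_cloud_data.RData")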
Each time we complete a new test I run a slightly different script. It first reads in my most recent saved data file, finds the last test date, and sets this as the 'from' argument in the activities pull.
# get current data -----------------------------------------------------------------
load(file = "data/1_cloud_data.RData")
# get last test date ---------------------------------------------------------------
last <-
  # pull nested sensor data as a vector
  sensor %>%
  pull(data) %>%
  # summarise each as the first timestamp
  map_dbl(.f = ~
            .x %>%
            pull(ts) %>%
            min(na.rm = T)) %>%
  # get latest start timestamp and format as date
  max(na.rm = T) %>%
  as_datetime() %>%
  date()
# get cloud metadata ---------------------------------------------------------------
# activities
activities <-
  ofCloudGetActivities(credentials = token,
                       from = as.integer(as_datetime(last + 1)))
…and so on. I pull the sensor data down for the new dates only,
then row-bind them to the original and re-save.
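A minimal sketch of that bind-and-re-save step, assuming the fresh pull has landed in a data frame called sensor_new (a name I've made up for illustration):
# stack the new pull underneath the historical data & re-save
sensor <-
  bind_rows(sensor, sensor_new)
save(sensor, file = "data/1_cloud_data.RData")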
The goal of this next section is to tidy the data frame(s) only, without any HR-specific wrangling just yet. I’ve tried to think of it as ‘cleaning up the cloud data’, so the next step can be a fork in the road for SMT-specific wrangling.
I start by adding the test date and time to the main (outer) data frame, which I get from the sensor (nested) data frame.
# add date & time ------------------------------------------------------------------
sensor_tidy <-
  sensor %>%
  # extract first timestamp as date-time
  mutate(start = map_dbl(.x = data,
                         .f = ~
                           .x %>%
                           pull(ts) %>%
                           min(na.rm = T)) %>%
           as_datetime(tz = "GB")) %>%
  # split & format
  separate_wider_delim(cols = start,
                       delim = " ",
                       names = c("date", "time")) %>%
  mutate(date = as_date(date))
After that I bind and tidy up all the player-related info in the
main data frame.
# wrangle main df ------------------------------------------------------------------
# add positions & maxes
sensor_tidy <-
  sensor_tidy %>%
  # join player info
  left_join(y =
              players %>%
              select(jersey,
                     max_v = velocity_max,
                     max_hr = heart_rate_max,
                     pos = position_name),
            by = "jersey") %>%
  # combine names
  unite(col = name,
        athlete_first_name,
        athlete_last_name,
        sep = " ") %>%
  # add position groups
  mutate(grp2 = case_when(pos %in% c("Prop",
                                     "Hooker",
                                     "2nd Row",
                                     "Back Row") ~ "Forwards",
                          .default = "Backs"),
         grp1 = case_when(pos %in% c("Prop",
                                     "Hooker",
                                     "2nd Row") ~ "Tight 5s",
                          pos == "Back Row" ~ "Back Rows",
                          grp2 == "Backs" ~ grp2))
Then for the sensor data. I add numeric seconds, remove any null
(zero) HR data, then add both HR and velocity as percent of their
respective maximums (the latter is more ‘out of interest’ - I don’t
actually use it for anything in this project, yet).
# wrangle sensor df ----------------------------------------------------------------
sensor_tidy <-
  sensor_tidy %>%
  # add sec, remove null hr
  nest_mutate(.nest_data = data,
              s = (row_number() / 10) - 0.1,
              hr_bpm = na_if(x = hr,
                             y = 0)) %>%
  # add hr & vel %
  mutate(# hr %
         data = map2(.x = data,
                     .y = max_hr,
                     .f = ~ mutate(.data = .x,
                                   hr = 100 * (hr_bpm / .y))),
         # vel %
         data = map2(.x = data,
                     .y = max_v,
                     .f = ~ mutate(.data = .x,
                                   v_perc = 100 * (v / .y))))
A final clean-up of the main data frame before saving it.
# clean df -------------------------------------------------------------------------
sensor_tidy <-
  sensor_tidy %>%
  select(date,
         time,
         name,
         jersey,
         pos,
         grp1,
         grp2,
         smt = tag_list,
         sensor = data)
This final part of the workflow focuses on the actual HR data and its use in an HR-run-style SMT. Up to now, the previous parts of the workflow and the datasets created could be used for other styles of SMT, like extracting the vertical component of PlayerLoad during standardised high-intensity strides. But that's a story for another day…
For this one, I start by adding the 30-sec rolling mean of HR (as a % of HR max) to the nested sensor data. This is the simplest, but probably one of the most efficient, ways to smooth the data.
# wrangle sensor df ----------------------------------------------------------------
hr_tidy <-
  sensor_tidy %>%
  # add smoothed hr
  nest_mutate(.nest_data = sensor,
              hr_sm = slide_dbl(.x = hr,
                                .f = mean,
                                .before = 300))
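The .before = 300 is because the sensor data is 10 Hz, so the window covers roughly the previous 30 sec of samples. If the trailing-window behaviour of slide_dbl() is new to you, a quick toy example:
# trailing window: each value is the mean of itself and the 2 values before it
slide_dbl(.x = 1:5,
          .f = mean,
          .before = 2)
#> [1] 1.0 1.5 2.0 3.0 4.0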
Then I can extract the relevant sensor info to the main data frame: the proportion of NA rows in the HR column, the 30-sec rolling HR at 3 minutes, and the mean velocity & acceleration through the entire 3 minutes. These all use custom functions I wrote just to save huge bits of code in the main script. I've put them at the bottom of this write-up, if you want to check them out.
# wrangle main df ------------------------------------------------------------------
hr_tidy <-
  hr_tidy %>%
  mutate(# add run type
         protocol = case_when(jersey == "TCD" ~ "Backs & BR (50 m)",
                              pos %in% c("Prop",
                                         "Hooker",
                                         "2nd Row") ~ "Tight 5 (45 m)",
                              .default = "Backs & BR (50 m)"),
         # count prop NA
         na_hr = map_dbl(.x = sensor,
                         .f = get_prop_na),
         # get HRav @ 3 min
         hr_av = map_dbl(.x = sensor,
                         .f = get_hr_at),
         # get vel & acc av
         v_av = map_dbl(.x = sensor,
                        .f = ~ get_sensor_means(sensor_data = .x,
                                                sensor = v)),
         a_av = map_dbl(.x = sensor,
                        .f = ~ get_sensor_means(sensor_data = .x,
                                                sensor = a)))
At this point I have to inspect any new HR data to see if it needs to be removed or not. Sometimes a player will have a test with no missing HR, but it's visually clear that the vest didn't pick it up properly. plot_hr_sensor() is another custom function I wrote, and I loop all the players from a given test date through it.
# inspect new hr data --------------------------------------------------------------
hr_tidy %>%
  # select date to inspect
  filter(date == "2025-09-16") %>%
  # subset
  select(sensor_data = sensor, name, date) %>%
  # visualise
  pmap(.f = ~ plot_hr_sensor(sensor_data = ..1,
                             name = ..2,
                             date = ..3))
The workflow gets a little messy at this point, because when I spot an HR sensor file that should be removed, the best solution I've got is to log it in an Excel sheet and then import this to filter out the relevant rows. I combine this with an auto-removal process where I filter out any testing occasions where a player has any HR drop-outs.
# import dubious hr ----------------------------------------------------------------
dubious <-
  read_xlsx(path = "data/dubious_hr.xlsx") %>%
  mutate(date = as_date(date))
# log & remove bad hr --------------------------------------------------------------
# log bad hr
bad_hr <-
  hr_tidy %>%
  # join dubious hr checks
  left_join(y = dubious,
            by = c("name", "date")) %>%
  # tag and subset bad hr instances
  mutate(decision = case_when(na_hr != 0 ~ "HR Dropout (auto)",
                              decision == "remove" ~ "Bad Trace")) %>%
  filter(!is.na(decision))
# subset hr_tidy
hr_tidy <-
  hr_tidy %>%
  # join dubious hr checks
  left_join(y = dubious,
            by = c("name", "date")) %>%
  # subset only good hr instances
  filter(na_hr == 0,
         decision %in% c("keep", NA))
Then one last clean-up and export, before this data is finally ready for the real analysis!
# clean df -------------------------------------------------------------------------
hr_tidy <-
  hr_tidy %>%
  select(-c(smt, decision)) %>%
  relocate(protocol,
           .before = sensor)
# save data files ------------------------------------------------------------------
write_rds(x = hr_tidy,
          file = "data/3a_hr_run_tidy.rds",
          compress = "gz")
write_rds(x = bad_hr,
          file = "data/3b_bad_hr.rds",
          compress = "gz")
# hr wrangling ----------------------------------------------------------------------
# get proportion of NAs
get_prop_na <-
  function(sensor_data,
           from = -Inf,
           to = Inf,
           sec = s,
           hr = hr) {
    sensor_data %>%
      filter(between(x = {{ sec }},
                     left = from,
                     right = to)) %>%
      summarise(mean(is.na({{ hr }}))) %>%
      as.numeric() %>%
      round(2) * 100
  }
# get hr at timepoint
get_hr_at <-
  function(sensor_data,
           at = 180,
           sec = s,
           hr = hr_sm) {
    # find max dur
    dur <-
      sensor_data %>%
      summarise(max({{ sec }})) %>%
      as.numeric()
    # extract hr
    sensor_data %>%
      filter(case_when(dur < at ~ {{ sec }} == dur,
                       .default = {{ sec }} == at)) %>%
      pull({{ hr }})
  }
# get sensor means
get_sensor_means <-
  function(sensor_data,
           from = 0,
           to = 180,
           sec = s,
           sensor) {
    sensor_data %>%
      filter(between(x = {{ sec }},
                     left = from,
                     right = to)) %>%
      summarise(mean(abs({{ sensor }}),
                     na.rm = T)) %>%
      as.numeric()
  }
# visualisations --------------------------------------------------------------------
# plot hr function
plot_hr_sensor <-
  function(sensor_data,
           hr_at = 180,
           hlgt_from = NA,
           hlgt_to = NA,
           hlgt_fill = "orange",
           hr_raw = hr,
           hr_sm = hr_sm,
           sm_rm_first = 30,
           sec = s,
           name = NA,
           date = NA) {
    # find max dur
    dur <-
      sensor_data %>%
      summarise(max({{ sec }})) %>%
      as.numeric()
    # title
    title <-
      case_when(!is.na(name) & !is.na(date) ~ str_c(name, ", ", as.character(date)),
                is.na(name) ~ as.character(date),
                is.na(date) ~ name)
    # build plot
    hr_sensor_plot <-
      sensor_data %>%
      # wrangle sensor df
      mutate(# add hr at
             hr_at = case_when({{ sec }} == hr_at ~ {{ hr_sm }},
                               dur < hr_at & {{ sec }} == dur ~ {{ hr_sm }},
                               .default = NA),
             # remove first n sec of rolling mean
             hr_sm = case_when({{ sec }} < sm_rm_first ~ NA,
                               .default = {{ hr_sm }}),
             # convert numeric time to mm:ss
             time = hms::as_hms({{ sec }})) %>%
      # set mappings
      ggplot(mapping = aes(x = time,
                           y = {{ hr_raw }})) +
      # add shaded region for last minute
      annotate(geom = "rect",
               xmin = hms::as_hms(hlgt_from),
               xmax = hms::as_hms(hlgt_to),
               ymin = -Inf,
               ymax = Inf,
               fill = hlgt_fill,
               alpha = 0.5) +
      # add raw & smoothed hr
      geom_line(colour = "grey80") +
      geom_point(mapping = aes(y = {{ hr_sm }}),
                 size = 1) +
      # add labels for timepoints
      ggrepel::geom_label_repel(mapping = aes(y = hr_at,
                                              label = str_c(round(hr_at, 0), "%")),
                                size = 2,
                                nudge_x = -15,
                                nudge_y = -5,
                                direction = "x",
                                colour = "black",
                                fontface = "bold",
                                family = "Chakra Petch") +
      # add timepoint
      geom_point(mapping = aes(y = hr_at),
                 size = 4,
                 shape = 21,
                 fill = "red",
                 colour = "white") +
      scale_x_time(breaks = scales::breaks_width("1 min"),
                   labels = scales::label_time(format = "%M:%S")) +
      # labels
      labs(title = title,
           x = element_blank(),
           y = "HR (% max)") +
      # themes
      theme_minimal() +
      theme(# title
            plot.title.position = "plot",
            plot.title = if (!is.na(title)) element_text(face = "bold") else element_blank(),
            # axes
            axis.title.y = element_text(face = "bold"),
            axis.text.x = element_text(face = "bold",
                                       colour = "black",
                                       size = 7),
            # gridlines
            panel.grid.minor.x = element_line(colour = "grey90",
                                              linetype = "dotted"),
            panel.grid.major.x = element_line(colour = "grey90",
                                              linetype = "dotted"),
            panel.grid.major.y = element_line(colour = "grey90",
                                              linetype = "dotted"),
            panel.grid.minor.y = element_blank(),
            # other
            text = element_text(family = "Chakra Petch"))
    return(hr_sensor_plot)
  }
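Finally, in case it's handy, an example call using the optional highlight arguments to shade the last 30 sec of the run (the player name here is made up):
# shade 2:30-3:00 on one player's trace from the latest test
hr_tidy %>%
  filter(name == "Joe Bloggs",
         date == "2025-09-16") %>%
  pull(sensor) %>%
  pluck(1) %>%
  plot_hr_sensor(hlgt_from = 150,
                 hlgt_to = 180,
                 name = "Joe Bloggs",
                 date = "2025-09-16")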