Overview

Thanks for taking a look at this for me, Brian!

In this document I wanted to describe my workflow for using Catapult (and catapultR) for our submaximal testing (SMT) at the club. We have a heart rate (HR) run protocol, which is a standardised bout of running used to measure exercise HR (HRex) as a proxy of fitness. I won’t bore you with the details of why I think it’s a good idea, but we tried to summarise everything in our latest paper on the topic. It’s really simple to run (see here), and all we need to do is extract the mean 30-sec heart rate at the end of the run (3 minutes). Cue integrated HR vests and catapultR!
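To show where all this is heading, the extraction itself boils down to something like this toy snippet (a sketch only, assuming 10 Hz sensor data sitting in a data frame df with seconds s and heart rate hr as columns):

# HRex = mean HR over the final 30 s of the 3-min run
df %>% 
  filter(between(s, 150, 180)) %>% 
  summarise(hr_ex = mean(hr, na.rm = TRUE))

The rest of this document is about getting from the cloud to that data frame, for every player and every test date.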

Getting Cloud Data

First things first: I’ll load the packages and set my catapultR token.

# load packages-----------------------------------------------------------------

library(rstudioapi)   # dirname() & getActiveDocumentContext() to auto set wd
library(tidyverse)
library(catapultR)
library(readxl)
library(nplyr)        # nest_mutate()
library(slider)       # slide_dbl()

# set token --------------------------------------------------------------------

  # set token wd
  setwd(dirname(getActiveDocumentContext()$path))
  setwd("../")
  
  # load token
  sToken <- 
    read_file("token.txt")
  
  # apply token
  token <- 
    ofCloudCreateToken(sToken = sToken, 
                       sRegion = "EMEA")


I start by pulling down all the relevant metadata from the cloud. It’s pretty straightforward, and I’m preaching to the choir here, but the documentation on the catapultR web page is a wicked resource! I just had to make a few tweaks to fit what I needed, navigate some errors I was getting, and stick to a tidyverse code style. One initial stumbling block came when fetching the periods and then filtering by my SMT tag list: some periods have no tags, and this was causing an issue. The workaround was to first filter the periods df to only include rows where the tag list was at least length 1. The subsequent SMT tag filtering then plays ball.

# get cloud metadata -----------------------------------------------------------

  # start date
  start <- "2025-07-07"

  # players
  players <- 
    ofCloudGetAthletes(credentials = token) %>% 
    as_tibble()
  
  # activities
  activities <- 
    ofCloudGetActivities(credentials = token, 
                         from = as.integer(as_datetime(start)))

  # smt periods
  periods <- 
    
    # all periods
    map(.x = activities %>% pull(id), 
        .f = ~ ofCloudGetPeriods(credentials = token, 
                                 activity_id = .x)) %>% 
    list_rbind() %>% 
    
    # keep only rows where tag_list is non-empty
    filter(map_lgl(.x = tag_list, 
                   .f = ~ length(.x) > 0)) %>%
    
    # smt only
    unnest(cols = tag_list) %>% 
    filter(tag_list %in% c("HR Run", "Stride Test", "Std SSG"))


When it comes to getting the players in periods, this was again more fiddly than initially anticipated, because it only threw up errors after a few weeks of use. player_id is returned either blank or as a list of length 1 (the id), meaning unnest(cols = pip) won’t work: the data in player_id is not of a consistent class (character in some returns, list in others). I found it easier just to remove the column to proceed.
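If you want to see the clash for yourself, here’s a toy reproduction (fake values, not a real API return):

  # two fake per-period returns: player_id is character in one, list in the other
  a <- tibble(athlete_id = "x1", player_id = "")          # blank character
  b <- tibble(athlete_id = "x2", player_id = list("p9"))  # length-1 list
  
  # unnesting now trips over the mixed classes
  tibble(id = 1:2, pip = list(a, b)) %>% 
    unnest(cols = pip)
  # errors along the lines of: can't combine `player_id` <character> and <list>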

  # players in periods
  pip <- 
    periods %>%
    select(id, tag_list) %>%
    mutate(# get players in period
           pip = map(.x = id, 
                     .f = ~ ofCloudGetAthleteDevicesInPeriod(credentials = token, 
                                                             period_id = .x)),
           
           # remove player_id from players nested df
           pip = map(.x = pip, 
                     .f = ~ .x %>% select(-player_id))) %>% 
    
    # unnest
    unnest(cols = pip)


Next, to get the sensor data. I don’t know if this is more cumbersome than needed, but I was getting a lot of double-nested lists from the catapultR functions. Maybe it’s because I ran them inside mutate() to add them on as new columns, but that was needed so I could retain tag_list from the initial (outer) data frame (it distinguishes between the types of SMT, but as a variable it isn’t returned by any subsequent catapultR function). So, for example, I had to use map(~ pluck(.x, 1)) to extract the first element of .x, which is the inner data frame returned by ofCloudGetPeriodSensorDataEx().

# get sensor data --------------------------------------------------------------

sensor <- 
  pip %>% 
  
  # subset pip df to arguments for cloud sensor function
  select(period_id = id, 
         athlete_id, 
         tag_list) %>% 
  
  # get cloud sensor data & unlist into a df
  # (this unnamed mutate expression returns a df, which dplyr
  # splices into the tibble as new columns, incl. the nested 'data')
  mutate(pmap(.l = list(period_id, athlete_id), 
              .f = ~ ofCloudGetPeriodSensorDataEx(credentials = token, 
                                                  period_id = ..1, 
                                                  athlete_id = ..2)) %>% 
           
           # extract inner df
           map(.f = ~ pluck(.x, 1)) %>% 
           list_rbind(), 
         
         # convert nested dfs to tibbles
         data = map(.x = data, 
                    .f = as_tibble))


Now the sensor data is in R and I can press on with tidying and wrangling it. I tend to save the data frame as a .rds file at this point, in case I ever need to backtrack, without making unnecessary calls to the API.
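That checkpoint is a one-liner (a sketch; the file name is just whatever you pick, so long as the update script below reads the same one back in):

# save checkpoint ----------------------------------------------------------------

write_rds(x = sensor, 
          file = "data/1_cloud_data.rds", 
          compress = "gz")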

Each time we complete a new test I run a slightly different script. It first reads in my most recent .rds file, finds the last exported date, and sets this as the ‘from’ argument in the activities pull.

# get current data -------------------------------------------------------------

sensor <- 
  read_rds(file = "data/1_cloud_data.rds")

# get last test date -----------------------------------------------------------

last <-
  
  # pull nested sensor data as a list
  sensor %>%
  pull(data) %>%
  
  # summarise each as the first timestamp
  map_dbl(.f = ~ 
            .x %>%
            pull(ts) %>%
            min(na.rm = T)) %>% 
  
  # get latest start timestamp and format as date
  max(na.rm = T) %>%
  as_datetime() %>%
  date()

# get cloud metadata -----------------------------------------------------------

  # activities
  activities <- 
    ofCloudGetActivities(credentials = token, 
                         from = as.integer(as_datetime(last + 1)))


…and so on. I pull the sensor data down for the new dates only, then row-bind them to the original and re-save.
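The tail of that script is nothing fancy; something like this (a sketch, assuming the fresh pull has landed in a sensor_new data frame built exactly like the original):

# append & re-save ---------------------------------------------------------------

sensor <- 
  bind_rows(sensor, sensor_new)

write_rds(x = sensor, 
          file = "data/1_cloud_data.rds", 
          compress = "gz")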

Tidy Sensor Data

The goal of this next section is to tidy the data frame(s) only, without any HR-specific wrangling just yet. I’ve tried to think of it as ‘cleaning up the cloud data’, so the next step can be a fork in the road for SMT-specific wrangling.

I start by adding the test date and time to the main (outer) data frame, which I get from the sensor (nested) data frame.

# add date & time --------------------------------------------------------------

sensor_tidy <-
  sensor %>%
  
  # extract first timestamp as date-time
  mutate(start = map_dbl(.x = data, 
                         .f = ~ 
                           .x %>% 
                           pull(ts) %>% 
                           min(na.rm = T)) %>%
             as_datetime(tz = "GB")) %>%
  
  # split & format
  separate_wider_delim(cols = start, 
                       delim = " ", 
                       names = c("date", "time")) %>%
  mutate(date = as_date(date))


After that I join in and tidy up all the player-related info in the main data frame.

# wrangle main df --------------------------------------------------------------
  
  # add positions & maxes
  sensor_tidy <- 
    sensor_tidy %>%
    
    # join player info
    left_join(y = 
                players %>%
                select(jersey, 
                       max_v = velocity_max, 
                       max_hr = heart_rate_max, 
                       pos = position_name), 
              by = "jersey") %>% 
    
    # combine names
    unite(col = name, 
          athlete_first_name, 
          athlete_last_name, 
          sep = " ") %>%
    
    # add position groups
    mutate(grp2 = case_when(pos %in% c("Prop", 
                                       "Hooker", 
                                       "2nd Row", 
                                       "Back Row") ~ "Forwards", 
                            .default = "Backs"), 
           grp1 = case_when(pos %in% c("Prop", 
                                       "Hooker", 
                                       "2nd Row") ~ "Tight 5s", 
                            pos == "Back Row" ~ "Back Rows", 
                            grp2 == "Backs" ~ grp2))


Then for the sensor data. I add numeric seconds, remove any null (zero) HR data, then add both HR and velocity as a percent of their respective maximums (the latter is more out of interest; I don’t actually use it for anything in this project, yet).

# wrangle sensor df ------------------------------------------------------------

  sensor_tidy <- 
    sensor_tidy %>%
    
    # add sec, remove null hr
    nest_mutate(.nest_data = data, 
                s = (row_number() / 10) - 0.1, 
                hr_bpm = na_if(x = hr, 
                               y = 0)) %>% 
    
    # add hr & vel %
    mutate(# hr %
           data = map2(.x = data, 
                       .y = max_hr, 
                       .f = ~ mutate(.data = .x, 
                                     hr = 100 * (hr_bpm / .y))), 
           # vel %
           data = map2(.x = data, 
                       .y = max_v, 
                       .f = ~ mutate(.data = .x, 
                                     v_perc = 100 * (v / .y))))


A final clean-up of the main data frame before saving it.

# clean df ---------------------------------------------------------------------

sensor_tidy <- 
  sensor_tidy %>% 
  select(date, 
         time, 
         name, 
         jersey, 
         pos, 
         grp1, 
         grp2, 
         smt = tag_list, 
         sensor = data)

HR-specific Wrangling

This final part of the workflow focuses on the actual HR data and its use in an HR-run-style SMT. Up to now, the previous parts of the workflow and the datasets created could be used for other styles of SMT, like extracting the vertical component of PlayerLoad during standardised high-intensity strides. But that’s a story for another day…

For this one, I start by adding the 30-sec rolling mean of HR (as a % of HR max) to the nested sensor data. This is the simplest, but probably one of the most efficient, ways to smooth the data.

# wrangle sensor df ------------------------------------------------------------

hr_tidy <- 
  sensor_tidy %>%
  
  # add smoothed hr (rolling mean over the previous 30 s: 300 rows at 10 Hz)
  nest_mutate(.nest_data = sensor, 
              hr_sm = slide_dbl(.x = hr, 
                                .f = mean, 
                                .before = 300))


Then I can extract the relevant sensor info to the main data frame: the proportion of NA rows in the HR column, the 30-sec rolling HR at 3 minutes, and the mean velocity & acceleration through the entire 3 minutes. These all use custom functions I wrote just to save huge bits of code in the main script. I’ve put them at the bottom of this write-up, if you want to check them out.

# wrangle main df --------------------------------------------------------------
  
hr_tidy <- 
  hr_tidy %>%
  
  mutate(# add run type
         protocol = case_when(jersey == "TCD" ~ "Backs & BR (50 m)", 
                              pos %in% c("Prop", 
                                         "Hooker", 
                                         "2nd Row") ~ "Tight 5 (45 m)", 
                              .default = "Backs & BR (50 m)"), 

         # count prop NA
         na_hr = map_dbl(.x = sensor, 
                         .f = get_prop_na), 
         
         # get HRav @ 3 min
         hr_av = map_dbl(.x = sensor, 
                         .f = get_hr_at), 
         
         # get vel & acc av
         v_av = map_dbl(.x = sensor, 
                        .f = ~ get_sensor_means(sensor_data = .x, 
                                                sensor = v)), 
         a_av = map_dbl(.x = sensor, 
                        .f = ~ get_sensor_means(sensor_data = .x, 
                                                sensor = a)))


At this point I have to inspect any new HR data to see whether it needs to be removed. Sometimes a player will have a test with no missing HR, but it’s visually clear that the vest didn’t pick it up properly. plot_hr_sensor() is another custom function I wrote, and I loop all the players from a given test date through it.

# inspect new hr data ----------------------------------------------------------

hr_tidy %>%
  
  # select date to inspect
  filter(date == "2025-09-16") %>% 
  
  # subset
  select(sensor_data = sensor, name, date) %>% 

  # visualise
  pmap(.f = ~ plot_hr_sensor(sensor_data = ..1, 
                             name = ..2, 
                             date = ..3))


The workflow gets a little messy at this point, because when I spot an HR sensor file that should be removed, the best solution I’ve got is to log it in an Excel sheet and then import this to filter out the relevant rows. I combine this with an auto-removal process where I filter out any testing occasion where a player has any HR drop-outs.
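For reference, the sheet itself is as bare-bones as it gets; a mocked-up equivalent (fake rows, obviously):

# what dubious_hr.xlsx boils down to: one decision per player-date
tribble(~ name,       ~ date,        ~ decision, 
        "A. Player",  "2025-09-16",  "remove", 
        "B. Player",  "2025-09-16",  "keep")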

# import dubious hr ------------------------------------------------------------

dubious <- 
  read_xlsx(path = "data/dubious_hr.xlsx") %>%
  mutate(date = as_date(date))

# log & remove bad hr ----------------------------------------------------------

  # log bad hr
  bad_hr <- 
    hr_tidy %>%
      
    # join dubious hr checks
    left_join(y = dubious, 
              by = c("name", "date")) %>%
              
    # tag and subset bad hr instances
    mutate(decision = case_when(na_hr != 0 ~ "HR Dropout (auto)", 
                                decision == "remove" ~ "Bad Trace")) %>%
    filter(! is.na(decision))
  
  # subset hr_tidy
  hr_tidy <- 
    hr_tidy %>%
          
    # join dubious hr checks
    left_join(y = dubious, 
              by = c("name", "date")) %>%
    
    # subset only good hr instances
    filter(na_hr == 0, 
           decision %in% c("keep", NA))


Then one last clean-up and export before this data is finally ready for the real analysis!

# clean df ---------------------------------------------------------------------

hr_tidy <- 
  hr_tidy %>%
  select(-c(smt, decision)) %>%
  relocate(protocol, 
           .before = sensor)

# save data files --------------------------------------------------------------

write_rds(x = hr_tidy, 
          file = "data/3a_hr_run_tidy.rds", 
          compress = "gz")
  
write_rds(x = bad_hr, 
          file = "data/3b_bad_hr.rds", 
          compress = "gz")

Appendix: Helper Functions

# hr wrangling -----------------------------------------------------------------
  
  # get proportion of NAs
  get_prop_na <- 
    function(sensor_data, 
             from = -Inf, 
             to = Inf, 
             sec = s, 
             hr = hr) {
    
    sensor_data %>% 
      filter(between(x = {{  sec  }}, 
                     left = from, 
                     right = to)) %>% 
      summarise(mean(is.na({{  hr  }}))) %>% 
      as.numeric() %>% 
      round(2) * 100
    
    }
  
  # get hr at timepoint
  get_hr_at <- 
    function(sensor_data, 
             at = 180, 
             sec = s, 
             hr = hr_sm) {
    
    # find max dur
    dur <- 
      sensor_data %>%
      summarise(max({{  sec  }})) %>%
      as.numeric()
    
    # extract hr
    sensor_data %>% 
      filter(case_when(dur < at ~ {{  sec  }} == dur,
                       .default =  {{  sec  }} == at)) %>% 
      pull({{  hr  }})
    
    }
  
  # get sensor means 
  get_sensor_means <- 
    function(sensor_data, 
             from = 0, 
             to = 180, 
             sec = s, 
             sensor) {
    
    sensor_data %>% 
      filter(between(x = {{  sec  }}, 
                     left = from, 
                     right = to)) %>% 
      summarise(mean(abs({{  sensor  }}), 
                     na.rm = T)) %>% 
      as.numeric()
      
    }
  
# visualisations ---------------------------------------------------------------
  
  # plot hr function
  plot_hr_sensor <- 
    function(sensor_data, 
             hr_at = 180, 
             hlgt_from = NA,
             hlgt_to = NA, 
             hlgt_fill = "orange", 
             hr_raw = hr, 
             hr_sm = hr_sm, 
             sm_rm_first = 30, 
             sec = s, 
             name = NA, 
             date = NA){
    
    # find max dur
    dur <- 
      sensor_data %>%
      summarise(max({{  sec  }})) %>%
      as.numeric()
    
    # title
    title <-
      case_when(!is.na(name) & !is.na(date) ~ str_c(name, ", ", as.character(date)),
                is.na(name) ~ as.character(date),
                is.na(date) ~ name)
      
    # build plot
    hr_sensor_plot <- 
      sensor_data %>%
      
      # wrangle sensor df
      mutate(# add hr at
             hr_at = case_when({{  sec  }} == hr_at ~ {{  hr_sm  }}, 
                               dur < hr_at & {{  sec  }} == dur ~ {{  hr_sm  }}, 
                               .default = NA),
             
             # remove first n sec of rolling mean
             hr_sm = case_when({{  sec  }} < sm_rm_first ~ NA, 
                               .default = {{  hr_sm  }}), 
             
             # convert numeric time to mm:ss
             time = hms::as_hms({{  sec  }})) %>% 
  
      # set mappings
      ggplot(mapping = aes(x = time, 
                           y = {{  hr_raw  }})) + 
      
      # add shaded region (e.g. for the last minute)
      annotate(geom = "rect", 
               xmin = hms::as_hms(hlgt_from), 
               xmax = hms::as_hms(hlgt_to), 
               ymin = -Inf, 
               ymax = Inf, 
               fill = hlgt_fill, 
               alpha = 0.5) + 
      
      # add raw & smoothed hr
      geom_line(colour = "grey80") + 
      geom_point(mapping = aes(y = {{  hr_sm  }}), 
                 size = 1) + 
      
      # add labels for timepoints
      ggrepel::geom_label_repel(mapping = aes(y = hr_at, 
                                              label = str_c(round(hr_at, 0),"%")), 
                                size = 2, 
                                nudge_x = -15, 
                                nudge_y = -5, 
                                direction = "x", 
                                colour = "black", 
                                fontface = "bold", 
                                family = "Chakra Petch") + 

      # add timepoint
      geom_point(mapping = aes(y = hr_at), 
                 size = 4, 
                 shape = 21, 
                 fill = "red", 
                 colour = "white") + 
      
      scale_x_time(breaks = scales::breaks_width("1 min"),
                   labels = scales::label_time(format = "%M:%S")) + 
        
      # labels
      labs(title = title, 
           x = NULL, 
           y = "HR (% max)") + 
      
      # themes
      theme_minimal() + 
      theme(# title
            plot.title.position = "plot",
            plot.title = if(!is.na(title)) element_text(face = "bold") else element_blank(),

            # axes
            axis.title.y = element_text(face = "bold"), 
            axis.text.x = element_text(face = "bold", 
                                       colour = "black", 
                                       size = 7), 
                        
            # gridlines
            panel.grid.minor.x = element_line(colour = "grey90", 
                                              linetype = "dotted"), 
            panel.grid.major.x = element_line(colour = "grey90", 
                                              linetype = "dotted"), 
            panel.grid.major.y = element_line(colour = "grey90", 
                                              linetype = "dotted"), 
            panel.grid.minor.y = element_blank(), 
            
            # other
            text = element_text(family = "Chakra Petch"))
    
    return(hr_sensor_plot)
    
  }