Introduction

In a previous R markdown blog² we described the relationship between power and heart rate during a virtual ride with five intervals. To predict heart rate (HR) we came up with a regression model comprising three predictors: power (in W), elapsed time in seconds (a proxy of fatigue), and gradient (expressed by elevation change meters, a measure of resistance). With two constants, viz. a HR lead of 10 seconds, and a mean power window of 25 seconds, see the equation below as an example.³

lead(heart rate, n = 10) = b0 + b1 * SMA(power, n = 25) + b2 * timestamp - b3 * elevation change meters

Suppose a one hour ride’s fatigue parameter b2 is estimated at 0.005. Model-wise this means that after one hour of cycling, the heart rate increased by 3600 times 0.005, which is 18 bpm, on average under the same power and resistance conditions.

In this markdown paper we shall examine how this fatigue indicator behaves during virtual time trial rides. The fatigue parameter could incorrectly have a negative association. That’s why we’re calling it “HR change” for the time being. Defined as the heart acceleration after your ride, calculated by a linear model with power and resistance as covariates, in other words, taking power and resistance into account. My personal virtual ride data, accessed with VeloViewer, serves as test data.

Five sample virtual time trials

In the aforementioned article we discussed (and R - coded) a dual graphical representation of a virtual interval ride, one expanded with key figures and a compact one with normalized linear trends for HR and power. In this article we merge both graphs and focus on tough virtual time trials. A randomly selected five of these are subsequently presented graphically in the usual manner.

The guideline for interpreting the graphs: in a time trial we want to keep the power as high and constant as possible in order to achieve the best performance. The aim is a flat power line. The regression lines reveal whether this was successful during the ride and the consequences for the heart rate. Here’s the R code to get the b coefficient from both simple regression models:

b_HR_distance <-  round(coef(lm(tcx_df$HR ~ tcx_df$Cum_Distance))[2],2)
b_power_distance <-  round(coef(lm(SMA(tcx_df$Watt,n=5) ~ tcx_df$Cum_Distance))[2],2)

The variability index is also important in this context. Defined as the normalized power divided by the average power. My coding of the definition looks like this:

mean_Watt <- round(mean(tcx_df$Watt),0)
normalized_power <- round(mean((SMA(tcx_df$Watt,n=30))^4,na.rm = TRUE)^(1/4),0)
variability_index <-  round(normalized_power / mean_Watt,2)

The higher this index, the greater the discrepancy between average and normalized power. A high variability index means that the ride had a lot of surges and coasting. For properly executed endurance rides and time trials, we want the index to be as low as possible. On the other hand, a criterium, a training with intervals or chasing KOMs will all show a high value, due to all of the zero and low power values from coasting.
The bottom line is that making a good estimate of how much power you can maintain throughout the race against time is almost always the best strategy.

1. Punta Veleno

Punta Veleno from Castello di Brenzone is a climb in the Italian Veneto region. It is 7.9 kilometers long and covers 1,005 meters of elevation with an average gradient of 12.7%. The top of the climb is at an altitude of 1,135 meters.
The virtual route is slightly longer than the actual climb (see Figure 1). After reaching the top, a power decline set in, which is understandable. Apparently it was all about the climb here. So despite the negative power slope (-2.43), we are pleased with this effort. The heart rate has increased considerably by 24 beats per minute in a short time. The raise is on average 2.52 heartbeats per kilometer (multiple regression model).

Figure 1. Punta Veleno List of figures

2. Passo Gavia

The Passo di Gavia is a mountain pass in the Italian Alps, in the Lombardy region. It is one of the highest pass roads in Europe. The summit is at 2,621 meters.
This ride is more than twice as long as the previous one with more elevation meters. The ascent, on the other hand, is less steep. The power slope and variability index show good values. The heart rate has increased by 29 bpm in one hour and fifteen minutes.

Figure 2. Passo Gavia List of figures

3. Potash Road

The scenic road in eastern Utah follows a towering sandstone cliff. On the the side of the road courses the Colorado River. Apart from the two small climbs at the beginning and end, it is a very flat ride.
Between the tenth and twentieth kilometers the power effort was a bit too much, given the slight decline afterwards (Figure 3). However, it is barely reflected in the regression line. The heart rate has increased by 26 bpm.

Figure 3. Potash Road List of figures

4. Oostende - Blankenberge

A very flat coastal route in Flanders, from Oostende to Blankenberge.
I probably started a little too slowly on this ride, considering the power and HR trends (a power coefficient of 1.18 and an HR change of 38, that is a raise of 1.09 bpm per km according the multiple regression model, respectively 1.22 in the simple regression model).

Figure 4. Oostende - Blankenberge List of figures

5. Gascony prologue

This ride is a 13.24 km short spin with some easy hills. Near the town of Auch, about 160 km southeast of Bordeaux.
A top ride by my standards. But according to my own interpretation of the metrics, I may have started a little too slowly. The power slope indicates 4.3 per kilometer and the heart rate increased by 36 beats per minute in just twenty-two minutes, which amounts to an average raise of 2.34 bpm per km in the multiple regression model (the simple regression model indicates 2.9 bpm).

Figure 5. Gascony prologue List of figures

6. Sample summary data

By multiplying the b coefficient of the multiple regression model by the number of seconds, you get the model-based HR change during the entire ride. If you divide this metric by the number of kilometers, you will see the HR score per km. In addition, the table contains a comparable score, namely the b coefficient of the single regression model (the “b lm HR distance” in red, and a normalized “b lm HR distance z-scores”). The simple regression power-distance b’s are at the bottom of the table. During the climb of the Punta Veleno I lost an average of 2.4 watts per kilometer, while during the Gascony prologue there was an average gain of 4.3 watts.
These five sample ride are very different in terms of distance and elevation.

Table 1a. Summary data virtual time trial samples
	Punta Veleno	Passo Gavia	Potash Road	Oostende - Blankenberge	Gascony prologue
Date	2021-01-10	2020-09-12	2017-12-14	2024-03-18	2020-03-09
Distance (km)	9.514	20.564	51.087	34.962	13.234
Elevation (m)	1,021	1,264	113	97	268
Duration	00:55:53	01:15:19	01:18:24	00:56:48	00:22:05
Seconds	3,353	4,519	4,704	3,408	1,325
mean Speed	10.2	16.4	39.1	36.9	36
mean Cadence	64	71	83	75	78
mean Watt	270	267	242	236	289
mean HR	159	152	151	139	149
Power to heart ratio	1.7	1.76	1.6	1.7	1.94
sd Watt	38	39	36	57	94
sd HR	10	11	10	14	14
normalized power	273	270	246	243	309
variability index	1.01	1.01	1.02	1.03	1.07
b2 multiple lm	0.007044	0.006328	0.005533	0.011223	0.023383
R² multiple lm	0.71	0.88	0.77	0.91	0.85
HR change	24	29	26	38	31
HR change per km	2.52	1.41	0.51	1.09	2.34
b lm HR distance	3.25	1.84	0.54	1.22	2.9
b lm HR distance (z-scores)	0.33	0.17	0.05	0.09	0.2
b lm power distance	-2.43	0.52	-0.12	1.18	4.3
b lm power distance (z-scores)	-0.06	0.01	0	0.02	0.05

To supplement Table 1a, we look up a number of general ride statistics at Strava that we cannot easily reproduce ourselves.

1. Energy (kJ) and calories. Energy or ‘Total Work’ in kilojoules (kJ) is the sum of the watts generated during the ride. There is a close 1–1 ratio between energy and calories expended during a ride.
2. Relative effort. The Relative effort measures how much cardiovascular work went into any activity that has heart rate data. A short and hard activity can require just as much effort as a long and leisurely one, and relative effort makes it so you can compare the two. Not only that, but different activity types are weighted so that your efforts can be compared across sports, and your values are personalized to your own heart rate zones so you can even compare with other people.
3. Training load. Strava calculates the training load by comparing your power during your ride to your FTP and determining how much load you put on your body during the workout. Training load is a way to determine how much rest you need after your workout.
4. Intensity percentage. This is Strava’s way of showing how difficult a ride was as compared to your FTP. They look at your weighted average power⁴ for the ride and compare it to your FTP. If your weighted average power was 250W and your FTP 300W, the Intensity would be 83%. It’s possible to have an Intensity of over 100% if the ride is shorter than an hour, or if your FTP is specified too low.

Strava divides the intensity percentage into five categories:

- Endurance / Recovery Ride – 65% and lower;
- Moderate Ride – 65-80%;
- Tempo Ride – 80-95%;
- Time Trial or Race – 95-105%;
- Short Time Trial or Race – 105% and higher.

The five example rides presented in this article all belong to the last group.

5. “Points in red”. A predecessor of Stava’s relative effort, then called ‘suffer score’, which is still offered by VeloViewer in a modified way. It is based on the amount of time spent in heart rate zones four and five.

Here’s the R code to import the activity data, to filter the examples and to create a table with additional data:

version[['version.string']]

## [1] "R version 4.3.2 (2023-10-31 ucrt)"

library(XML)
library(xml2)
library(magrittr)       # set_colnames
library(TTR)            # MA
library(hms)
library(kableExtra)
library(knitr)
library(tidyverse)

options("scipen" = 100, "digits" = 4)         # print numbers in normal notation

# get all my Strava / VeloViewer virtual rides summary data.

vv_files <- "data/vv_download/"            # Folder VeloViewer files.

# R sometimes works in mysterious ways... The rlang::sym() function takes strings (including those with - sign in name) and turns them into quotes, 
# which are in turn unquoted by !! and used as names in the context of df where they refer to the relevant columns. 

virtual_activities_df <- read_csv(file = paste0(vv_files, "activities.csv"), show_col_types = FALSE,  name_repair = "unique") %>% 
  rename_all(function(x) gsub(" ", "_", x)) %>% 
  filter(Type == 'VirtualRide') %>% 
  rename(Activity_ID = last_col(),
         Speed_kmh = "Speed_km/h",
         W_HR_ratio = "W/HR",
         Intensity_perc = "Intensity_%") %>% 
  mutate(Power_0_150 = Power_0W + !!rlang::sym('Power_0-50W') + !!rlang::sym('Power_50-100W') + !!rlang::sym('Power_100-150W'),
         Power_150_200 = !!rlang::sym('Power_150-200W'), 
         Power_200_250 = !!rlang::sym('Power_200-250W'), 
         Power_250_400 = !!rlang::sym('Power_250-300W') + !!rlang::sym('Power_300-350W') +!!rlang::sym('Power_350-400W'), 
         Power_ge_400  = !!rlang::sym('Power_400-450W') + !!rlang::sym('Power_450W+')) %>% 
  select(Activity_ID, When, Name, Elapsed_Time, Dist_km, Elv_m, Speed_kmh, Pwr_W, Weighted_Avg_Pwr_W, Max_Pwr_W, 
         Cad, Heart, Max_Heart, W_HR_ratio, Energy_kJ, Cal, Training_Load, Intensity_perc, Relative_Effort, Points_in_red,
         Power_0_150,Power_150_200,Power_200_250,Power_250_400,Power_ge_400) %>% 
  mutate(Dist_km = round(Dist_km/1000,3), 
         Speed_kmh=round(Speed_kmh * 3.6,1), 
         When = as.character(When),
         When = paste0(substring(When,1,10), " xx-", substring(When,15,19)),
         Duration = hms::as_hms(Elapsed_Time),
         Elv_m = round(Elv_m,0),
         year = year(When),
         Variability_index = round(Weighted_Avg_Pwr_W / Pwr_W,2),
         Intensity_perc_class = factor(
              ifelse(Intensity_perc < 65, "Endurance/recovery Ride - lower than 65%",
                     ifelse(Intensity_perc >= 65 & Intensity_perc < 80, "Moderate ride - 65-80%",
                            ifelse(Intensity_perc >= 80 & Intensity_perc < 95, "Tempo ride - 80-95%",
                                   ifelse(Intensity_perc >= 95 & Intensity_perc < 105, "Time trial or race - 95-105%",
                                          ifelse(Intensity_perc >= 105, "Short time trial or race - 105% and higher", NA)))))))

# pluck 5 sample virtual rides in the category "Short Time Trial or Race – 105% and higher".

activity_id <- data.frame(c(4599245358,4049850859,1313852876,10987044381,3168488893))
activity_names <- data.frame(c("Punta Veleno", "Passo Gavia", "Potash Road", "Oostende - Blankenberge", "Gascony prologue"))
activity_data <- cbind(activity_id,activity_names)
colnames(activity_data) <- c("Activity_ID", "Ride")
rm(activity_id,activity_names)
activity_data <- activity_data %>% left_join(virtual_activities_df, by = "Activity_ID") %>% select(Ride, Activity_ID, Energy_kJ, Cal, Relative_Effort, Training_Load, Intensity_perc, Intensity_perc_class)

# generate table.

knitr::kable(activity_data, format = "html", escape = FALSE, caption = "Table 1b. Additional figures from Strava") %>% 
  row_spec(1:nrow(activity_data),font_size=10) %>% 
  row_spec(0,bold=TRUE) %>% 
  column_spec(1, bold=TRUE) %>% 
  column_spec(3:6, color = "blue") %>% 
  column_spec(7:8, color = "red") %>%
  kableExtra::kable_styling(full_width = TRUE, font_size = 10)

Table 1b. Additional figures from Strava
Ride	Activity_ID	Energy_kJ	Cal	Relative_Effort	Training_Load	Intensity_perc	Intensity_perc_class
Punta Veleno	4599245358	903.9	1007.9	191	135.07	120.4	Short time trial or race - 105% and higher
Passo Gavia	4049850859	1207.4	1346.2	202	179.38	119.6	Short time trial or race - 105% and higher
Potash Road	1313852876	1137.5	1268.3	159	152.38	108.0	Short time trial or race - 105% and higher
Oostende - Blankenberge	10987044381	804.4	896.9	110	106.78	106.2	Short time trial or race - 105% and higher
Gascony prologue	3168488893	382.4	426.4	54	64.51	132.4	Short time trial or race - 105% and higher

It is noteworthy that the prologue time trial scores lowest on the ‘absolute, quantitative’ variables such as calories burned and training load, but highest on intensity percentage, which can be interpreted as a relative one.

Population virtual time trials

Based on the five examples, we have gained some insight into the figures. Now we can study the intensive time trial group as a whole. This selection consists of approximately 250 efforts out of a total number of virtual rides of more than 1,250, see the output below. That’s about 20 percent.
A number of rides are canceled for various reasons. This is not due to the Strava data, because it is always available via VeloViewer. But to obtain the regression data it is necessary to read all archived ride files. This was not successful in all cases: unfortunately the archive is incomplete and some files are rejected by the import routine, even after adjustments. Apart from all this, we have not taken into account a dozen efforts shorter than 5 km. This leaves more than 200 efforts, including very flat stages and out-of-category climbs, see the output below.

options(width = 2000)

# Function to force equal row number if necessary.

bind_cols_fill <- function(df_list) {
  max_rows <- map_int(df_list, nrow) %>% max()
  map(df_list, function(df) {
    if(nrow(df) == max_rows) return(df)
    first <- names(df)[1] %>% sym()
    df %>% add_row(!!first := rep(NA, max_rows - nrow(df)))
  }) %>% bind_cols()
}

# Function to get parameter p-value classification.

get_p_value_class <- function(fit, i){
  p_class <- as.character(symnum(summary(fit)$coefficients[,"Pr(>|t|)"][[i]], cutpoints = c(0,0.001,0.01,0.05,1), 
                                  symbols = c("< .001", "< .01", "< .05", "n.s.")))
}

# Read all recent tcx files.

tcx_files = list.files(path = 'C:/Users/acmsc/Documents/Strava uploads/', pattern = "tcx$", full.names = TRUE)

variables <-  13
output <- matrix(ncol=variables, nrow=(length(tcx_files)))

for (i in c(1:length(tcx_files))){
  
rouvy_tcx_file <- tcx_files[i]

mark_als_XML <- XML::xmlParse(rouvy_tcx_file)

df1 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:HeartRateBpm", 
                                                     namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(HR = Value) %>% mutate(HR = as.numeric(HR))

df2 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:Time",
                                                     namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(Time = text)

df3 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:AltitudeMeters",
                                                     namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(Altitude = text) %>% mutate(Altitude = as.numeric(Altitude))

df4 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:DistanceMeters",
                                                     namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% filter(row_number()!=1)  %>% rename(Cum_Distance = text) %>% mutate(Cum_Distance = round(as.numeric(Cum_Distance),0)) 

part1_df <- bind_cols_fill(list(df1,df2,df3,df4)) %>% 
  mutate_all(~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))  

part2_df <-  do.call("rbind", xpathApply(mark_als_XML, "//ns3:TPX", function(N) {
  data.frame(
    Watt = as.numeric(xmlValue(N[["Watts"]]))
  )
}))

tcx_df <- cbind(part1_df, part2_df) %>%  
  mutate(Elevation_change_m = Altitude - lag(Altitude)) %>% 
  mutate(Elevation_change_m = if_else(is.na(Elevation_change_m), 0, Elevation_change_m)) %>% 
  mutate(Timestamp = hms::as_hms(row_number())) %>% 
  mutate(Cum_Distance = Cum_Distance / 1000) %>% 
  select(Time, Timestamp, HR, Watt, Elevation_change_m, Cum_Distance)

When <- paste0(substring(min(df2$Time),1,10), " xx-", substring(min(df2$Time),15,19))

# apply linear model.

linearMod <- lm(lead(HR, n = 10) ~ SMA(Watt, n = 25) + Timestamp + Elevation_change_m, data=tcx_df)

b0 <- as.numeric(round(linearMod$coefficients[1],0))
b1 <- as.numeric(round(linearMod$coefficients[2],4))
b2 <- as.numeric(round(linearMod$coefficients[3],6))
b3 <- as.numeric(round(linearMod$coefficients[4],4))
R2 <- as.numeric(round(summary(linearMod)$r.squared,2))
R2_p_value <- as.character(symnum(summary(linearMod)$fstatistic %>% {unname(pf(.[1],.[2],.[3],lower.tail=FALSE))} , cutpoints = c(0,0.001,0.01,0.05,1),
                                  symbols = c("< .001", "< .01", "< .05", "n.s.")))
b0_p_value <- get_p_value_class(linearMod, 1)
b1_p_value <- get_p_value_class(linearMod, 2)
b2_p_value <- get_p_value_class(linearMod, 3)
b3_p_value <- get_p_value_class(linearMod, 4)

HR_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$HR) ~ tcx_df$Cum_Distance))[2],2))
Power_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$Watt) ~ tcx_df$Cum_Distance))[2],2))

output[i,] <- c(When,R2,R2_p_value,b0,b1,b2,b3,b0_p_value,b1_p_value,b2_p_value,b3_p_value,HR_distance_slope,Power_distance_slope)

}

new_virtual_rides_lm_hr_HR_change_df <- data.frame(output) %>%
  set_colnames(c("When","R2","R2_p_value","b0","b1","b2","b3","b0_p_value","b1_p_value","b2_p_value","b3_p_value", "HR_distance_slope","Power_distance_slope")) %>% 
  mutate(hr_HR_change = round(as.numeric(b2) * 3600,0))

# export new virtual activities LM data to csv.
write_csv2(new_virtual_rides_lm_hr_HR_change_df, "data/new_virtual_rides_lm_hr_HR_change.csv")
#csv_check <- read_csv2(file = "data/new_virtual_rides_lm_hr_HR_change.csv", col_names = TRUE)  

rm(bind_cols_fill,tcx_files,rouvy_tcx_file,mark_als_XML,vv_files,tcx_df,df1,df2,df3,df4,linearMod,output,part1_df,part2_df,b0,b1,b2,b3,b0_p_value,b1_p_value,b2_p_value,b3_p_value,i,R2,R2_p_value,When, variables)
rm(HR_distance_slope, Power_distance_slope, get_p_value_class)

old_virtual_rides_lm_hr_HR_change_df <- read_csv2(file = "data/old_virtual_rides_lm_hr_HR_change.csv", col_names = TRUE)  

virtual_rides_lm_hr_HR_change_df <- rbind(old_virtual_rides_lm_hr_HR_change_df,new_virtual_rides_lm_hr_HR_change_df) 

virtual_rides_all_metrics <- virtual_activities_df %>% left_join(virtual_rides_lm_hr_HR_change_df, by = c("When")) 

library(epiDisplay)

epiDisplay::tab1(factor(virtual_rides_all_metrics$Intensity_perc_class, levels = c("Endurance/recovery Ride - lower than 65%","Moderate ride - 65-80%","Tempo ride - 80-95%","Time trial or race - 95-105%","Short time trial or race - 105% and higher")), cum.percent = TRUE, graph = FALSE)

## factor(virtual_rides_all_metrics$Intensity_perc_class, levels = c("Endurance/recovery Ride - lower than 65%",  : 
##      "Moderate ride - 65-80%", "Tempo ride - 80-95%", "Time trial or race - 95-105%",  : 
##      "Short time trial or race - 105% and higher")) : 
##                                            Frequency Percent Cum. percent
## Endurance/recovery Ride - lower than 65%          76     6.0          6.0
## Moderate ride - 65-80%                           159    12.6         18.7
## Tempo ride - 80-95%                              384    30.5         49.2
## Time trial or race - 95-105%                     384    30.5         79.8
## Short time trial or race - 105% and higher       254    20.2        100.0
##   Total                                         1257   100.0        100.0

time_trials <- virtual_rides_all_metrics %>% filter(Intensity_perc_class == "Short time trial or race - 105% and higher") %>% 
  filter(!is.na(hr_HR_change) & Variability_index <= 1.07 & Dist_km > 5 & Dist_km < 66 & Activity_ID != 4337793747  & Activity_ID != 4787832167 & Activity_ID != 3455449898  & Activity_ID != 2206457124) %>% 
  mutate(Duration_class = factor(ifelse(Duration < 3600, "< 1 hour", ifelse(Duration >= 3600 & Duration < 5400, "1-1.5 hour", "> 1.5 hour")), levels = c("< 1 hour","1-1.5 hour", "> 1.5 hour"))) %>% 
  mutate(Elv_m_4_class = factor(ifelse(Elv_m < 150, "< 150m", 
                                ifelse(Elv_m >= 150 & Elv_m < 500, "150-500m",
                                  ifelse(Elv_m >= 500 & Elv_m < 1000, "500-1000m", "> 1000m"))), levels = c("< 150m","150-500m","500-1000m", "> 1000m"))) %>% 
  mutate(Elv_m_3_class = factor(ifelse(Elv_m >= 1000, "> 1000m",
                                  ifelse(Elv_m >= 500 & Elv_m < 1000, "500-1000m", "< 500m")), levels = c("> 1000m","500-1000m", "< 500m"))) %>% 
  mutate(HR_change = as.numeric(b2) * Elapsed_Time)
   
epiDisplay::tab1(time_trials$Elv_m_4_class, cum.percent = TRUE, graph = FALSE)

## time_trials$Elv_m_4_class : 
##           Frequency Percent Cum. percent
## < 150m           13     6.4          6.4
## 150-500m         41    20.1         26.5
## 500-1000m        92    45.1         71.6
## > 1000m          58    28.4        100.0
##   Total         204   100.0        100.0

1. Descriptives

Table 2 gives some measures of dispersion and central tendency. The table includes the median, range, and the first and third quartiles. Where the range is a measure of where the beginning and end are in a set, the interquartile range (IQR) is a measure of where the bulk of the values lie. The IQR specifies where the “middle fifty” is in a data set: between the first and third quartiles.
Apart from the common feature of very high relative intensity, the other variables vary considerably. The indicator that triggered this writing, the HR change, also has a wide range from 4 to 43.
The proportion expained variance in the multipe regression model (R²) range is 0.11 - 0.98. Coefficient b2 of the multiple model is the basis for the HR change (b2 times number of seconds equals HR change). HR_distance_slope and Power_distance_slope are the b1’s of two simple regressions on z-score normalized variables, viz. HR-cumulative distance and power-cumulative distance, to be discussed in paragraph 6.

time_trials_summary_part1 <- time_trials %>% mutate(Elapsed_Time_minutes = Elapsed_Time/60, HR_change_km = round(HR_change / Dist_km,2)) %>% dplyr::select(HR_change,HR_change_km,Elapsed_Time_minutes,Dist_km,Elv_m,Speed_kmh,Pwr_W,Weighted_Avg_Pwr_W,Variability_index,Cad,Heart,Max_Heart,W_HR_ratio,Energy_kJ,Cal,Training_Load,Relative_Effort,Points_in_red)
tts1 <- do.call(cbind, lapply(time_trials_summary_part1, summary))
tts2 <- as.data.frame(t(tts1)) %>% 
  mutate(across(where(is.numeric), ~ if_else(!grepl("W_HR_ratio|Variability_index|HR_change_km", rownames(as.data.frame(t(tts1)))), round(., digits = 0),.))) %>% 
  mutate(across(where(is.numeric), ~ format(.,big.mark = ',', decimal.mark = '.'))) %>% 
  mutate(across(where(is_character), ~ str_replace(., ".000|.00", "")))

time_trials_summary_part2 <- time_trials %>% dplyr::select(R2,b2,HR_distance_slope,Power_distance_slope) %>% 
  mutate_all(~as.numeric(.))
tts3 <- do.call(cbind, lapply(time_trials_summary_part2, summary))
tts4 <- as.data.frame(t(tts3)) %>% 
  mutate(across(where(is.numeric), ~ if_else(!grepl("b2", rownames(as.data.frame(t(tts3)))), round(., digits = 2),.))) %>% 
  mutate(across(where(is.numeric), ~ format(.,big.mark = ',', decimal.mark = '.'))) %>% 
  mutate(across(where(is_character), ~ if_else(!grepl("b2", rownames(as.data.frame(t(tts3)))), str_replace(., "000", ""),.)))

rbind(tts2,tts4) %>% dplyr::select(-Mean) %>% 
  kbl(caption = paste0("Table 2. Median and range of virtual time trial stats. (n = ", nrow(time_trials), ")")) %>% 
  kable_classic(full_width = FALSE) %>% 
  row_spec(1:nrow(rbind(tts2,tts4)), align = "r")

Table 2. Median and range of virtual time trial stats. (n = 204)
	Min.	1st Qu.	Median	3rd Qu.	Max.
HR_change	4	21	26	32	43
HR_change_km	0.070	0.570	0.880	1.202	4.11
Elapsed_Time_minutes	22	57	70	83	144
Dist_km	9	23	31	40	66
Elv_m	0	492	734	1,089	2,044
Speed_kmh	10	20	30	34	47
Pwr_W	224	236	242	250	289
Weighted_Avg_Pwr_W	237	241	247	256	298
Variability_index	1	1.010	1.020	1.040	1.07
Cad	64	73	77	79	88
Heart	135	145	150	153	162
Max_Heart	155	165	168	171	179
W_HR_ratio	1.507	1.623	1.672	1.728	2
Energy_kJ	380	814	1,025	1,.000	2,038
Cal	423	908	1,143	1,338	2,272
Training_Load	52	112	141	168	266
Relative_Effort	37	124	160	202	343
Points_in_red	3	40	82	120	260
R2	0.110	0.650	0.80	0.87	0.98
b2	0.000656	0.004344	0.00635	0.00848	0.02338
HR_distance_slope	0.000	0.060	0.09	0.13	0.38
Power_distance_slope	-0.090	-0.010	0.01	0.02	0.19

2. Correlation matrix

To gain some insight into the relationships between these variables, we run a correlation matrix. When discussing the matrix, I would like to focus on the coefficients of HR change (on the first row). There is a negative relationship with the length/duration variables (elapsed time, distance). And also with metrics that measure the total amount of effort (energy, calories, training load, relative effort). A weak positive relationship between HR change and maximum heart rate can be observed. The conclusion is that on average during the entire ride, the heart rate is most accelerated in shorter efforts and efforts where the total amount of work has remained within limits. In these cases the maximum HR achieved is slightly higher.

cor_data <- time_trials %>% dplyr::select(HR_change,Elapsed_Time,Dist_km,Elv_m,Speed_kmh,Pwr_W,Weighted_Avg_Pwr_W,Variability_index,Cad,Heart,Max_Heart,W_HR_ratio,Energy_kJ,Cal,Training_Load,Relative_Effort,Points_in_red,Intensity_perc)
cor_data_matrix <- Hmisc::rcorr(as.matrix(cor_data))
cor_data_matrix_r <- cor_data_matrix$r
cor_data_matrix_P <- cor_data_matrix$P
corrplot::corrplot(cor_data_matrix_r, p.mat = cor_data_matrix_P, sig.level = 0.05, type = "upper", diag = FALSE, order = "original",
                   insig = 'blank', tl.cex = 0.7, tl.col    = "black", tl.srt = 45,
                   title=paste0('Correlation matrix (p < 0.05, insignificant cells suppressed. (n = ', nrow(time_trials), ')'), mar=c(0,0,1,0),tl.offset = 0.5, cex.main = 0.8, col.main = "black")

Figure 6. Correlation matrix List of figures

3. Training load and HR change

Let’s explore a few HR change relationships further, starting with training load.
Figure 7a is the scatter plot. When you look at the point cloud, do you expect a significant relationship?

ggplot(time_trials, aes(y = Training_Load,  x = as.numeric(HR_change))) +
  geom_point() +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Scatterplot HR change and training load",
       subtitle = paste0(nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr"))

Figure 7a. Scatterplot HR change and training load List of figures

This significant relationship exists according to the correlation matrix (Pearson’s r = -0.33201, p<0.01), see Figure 7b.

library(ggtext)

ggplot(time_trials, aes(y = Training_Load,  x = as.numeric(HR_change))) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ splines::bs(x, 4), se = FALSE, colour="red", linewidth=2) +
  geom_smooth(method = 'lm', se = FALSE, colour="blue", linewidth=1.5) +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Trend HR change and training load (<span style='color: blue;'>linear</span> and <span style='color: red;'>non-linear</span>)",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
  theme(plot.title = element_markdown())

Figure 7b. Relationship HR change and training load List of figures

We can break down the relationship into elevation meters, see Figure 7c. In each of the three elevation categories, on average the lower the training load, the higher the HR change.

ggplot(time_trials, aes(y = Training_Load,  x = as.numeric(HR_change), 
                        shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_color_manual(values=c('red','darkgreen','blue')) +
  scale_shape_manual(values = c(24, 16, 3)) +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Trend HR change and training load, by elevation",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr"))

Figure 7c. Relationship HR change, training load and elevation List of figures

4. Elapsed time and HR change

There is a negative relationship HR change - duration, in each of the three elevation categories.

ggplot(time_trials, aes(y = as.POSIXct(Duration, format = "%H:%M:%S"),  x = as.numeric(HR_change), 
                        shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_color_manual(values=c('red','darkgreen','blue')) +
  scale_shape_manual(values = c(24, 16, 3)) +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Elapsed time",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Relationship HR change and duration, by elevation",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr"))

Figure 8. Relationship HR change, duration and elevation List of figures

5. Energy and HR change

There is a negative relationship HR change - energy, in each of the three elevation categories.
Figures 7c, 8, and 9 are almost copies of each other, indicating the strong relationship between training load, energy and duration in the examined group of efforts that was already demonstrated in the correlation matrix.

ggplot(time_trials, aes(y = Energy_kJ,  x = as.numeric(HR_change), 
                        shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_color_manual(values=c('red','darkgreen','blue')) +
  scale_shape_manual(values = c(24, 16, 3)) +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Energy (kJ)",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Relationship HR change and Energy (kJ), by elevation",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
  scale_y_continuous(labels = scales::comma_format(big.mark = ',', decimal.mark = '.'))

Figure 9. Relationship HR change, energy and elevation List of figures

6. Simple regression b’s

For each ride we performed two simple regressions with distance as the independent variable: the first with power as the dependent variable, the second with HR. In order to compare the two slope angles (the model b1 coefficient), the dependent variables are normalized by z-scores.

HR_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$HR) ~ tcx_df$Cum_Distance))[2],2))
Power_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$Watt) ~ tcx_df$Cum_Distance))[2],2))

A positive slope corresponds to an increase, a negative one to decrease. The heart rate increases in all rides, but the power, which we would like to keep constant during the ride, appears to drop quite often (see Figure 10). Apparently, on quite a number of rides I started too energetically. Fortunately, the median is just greater than zero (0.01).

slope1 <- time_trials %>% dplyr::select(HR_distance_slope) %>% rename(slope = HR_distance_slope) %>% 
  mutate(trend = 'HR distance')

slope2 <- time_trials %>% dplyr::select(Power_distance_slope) %>% rename(slope = Power_distance_slope) %>%  
  mutate(trend = 'Power distance')

slopes <- rbind(slope1,slope2) %>% mutate(slope = as.numeric(slope))

slope_median <- slopes %>% 
  group_by(trend) %>%
  summarise(grp_median = median(slope))

ggplot(slopes, aes(x = slope)) + 
  geom_histogram(aes(color = trend, fill = trend),alpha = 0.3, position = "identity") +
  scale_fill_manual(values = c("red", "darkgreen")) +
  scale_color_manual(values = c("red", "darkgreen")) +
  labs(x="slope", y="count", 
       title = "Slopes (regression b's) of normalized distance-power (W)\n and distance-HR (bpm) trends",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides, 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
  theme(legend.text = element_text(size=7), axis.title=element_text(size=9)) +
  geom_label(aes(0.2, 35, hjust = 0, vjust = 0, label = paste0("median HR slopes: ", slope_median[1,2], "\n",
                                                               "median power slopes: ", slope_median[2,2])), data.frame())

Figure 10a. Histogram regression b’s normalized trends List of figures

Figure 10a is an interesting result of our exploratory research in this article. So let’s enhance this graph a bit…
A density plot visualizes the distribution of data over a continuous interval. This type of chart is a representation of the distribution of a numeric variable that uses a kernel density estimate (KDE) to show the probability density function of the variable. An advantage density plots have over histograms is that they’re better at determining the distribution shape because they’re not affected by the number of bins used. Figure 11 shows the result. A typical example of a compact graph with a clear message: don’t go too fast in the first part of your time trial!

ggplot(slopes, aes(x = slope)) + 
  geom_histogram(aes(y = after_stat(density), color = trend), 
                   fill = "white",position = "identity")+
  geom_density(aes(color = trend), linewidth = 1, adjust = 1, kernel = c("gaussian")) +
  scale_color_manual(values = c("red", "darkgreen")) +
  labs(x="slope", y="density",
       title = "Density plot slopes (regression b's) of normalized distance-power (W)\n and distance-HR (bpm) trends",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides, 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
  theme(legend.text = element_text(size=7), axis.title=element_text(size=9))

Figure 10b. Density plot regression b’s normalized trends List of figures

Concluding remarks

What have I learned while writing this document? First of all, the HR change within the selection of heavy time trials shows a wide variation in scores. The median of the HR change is 26 bpm, a quarter of the scores are below 21 bpm, a quarter above 32 bpm. Apparently the group of selected time trials is less homogeneous than expected (see table 2).
Regarding HR change, the correlation matrix (Figure 6) shows a negative relationship with the length/duration variables (elapsed time, distance). And also with metrics that measure the total amount of effort (energy, calories, training load, relative effort). A weak positive relationship between HR change and maximum heart rate can be observed. The conclusion is that on average during the entire ride, the heart rate is most accelerated in shorter efforts and efforts where the total amount of work has remained within limits. In these cases the maximum HR achieved is slightly higher.

The second conclusion concerns the power trends. In a time trial we want to keep the power as high and constant as possible in order to achieve the best performance. The aim is a flat power line. We especially want to prevent declining power. Given the number of efforts with a negative power slope angle, I am not always successful in this (see Figures 10a-b).

List of figures

Figure 1. Punta Veleno
Figure 2. Passo Gavia
Figure 3. Potash Road
Figure 4. Oostende - Blankenberge
Figure 5. Gascony prologue
Figure 6. Correlation matrix
Figure 7a. Scatterplot HR change and training load
Figure 7b. Relationship HR change and training load
Figure 7c. Relationship HR change, training load and elevation
Figure 8. Relationship HR change, duration and elevation
Figure 9. Relationship HR change, energy and elevation
Figure 10a. Histogram regression b’s normalized trends
Figure 10b. Density plot regression b’s normalized trends

Bibliography

Stack Overflow
https://stackoverflow.com/

Strava
https://www.strava.com

VeloViewer
https://www.veloviewer.com

Footnotes

Geographer, data-analist and cyclist↩︎
The relationship between power and heart rate in a virtual ride with five ascents, and a potentially interesting HR change indicator as a spin-off
https://rpubs.com/JSchakenraad/1187324 ↩︎
A simple moving average (SMA) is calculated by taking the arithmetic mean of a given set of values over a specified period.↩︎
The weighted average power indicator is very similar to normalized power↩︎

The use of regression coefficients as power and heart rate change indicators in virtual time trial rides

Jos Schakenraad¹

published 2024-06-11 (updated 2024-06-11)

Introduction

Five sample virtual time trials

1. Punta Veleno

2. Passo Gavia

3. Potash Road

4. Oostende - Blankenberge

5. Gascony prologue

6. Sample summary data

Population virtual time trials

1. Descriptives

2. Correlation matrix

3. Training load and HR change

4. Elapsed time and HR change

5. Energy and HR change

6. Simple regression b’s

Concluding remarks

List of figures

Bibliography

The use of regression coefficients as power and heart rate change indicators in virtual time trial rides

Jos Schakenraad1

published 2024-06-11 (updated 2024-06-11)

Introduction

Five sample virtual time trials

1. Punta Veleno

2. Passo Gavia

3. Potash Road

4. Oostende - Blankenberge

5. Gascony prologue

6. Sample summary data

Population virtual time trials

1. Descriptives

2. Correlation matrix

3. Training load and HR change

4. Elapsed time and HR change

5. Energy and HR change

6. Simple regression b’s

Concluding remarks

List of figures

Bibliography

Jos Schakenraad¹