Introduction


In a previous R markdown blog2 we described the relationship between power and heart rate during a virtual ride with five intervals. To predict heart rate (HR) we came up with a regression model comprising three predictors: power (in W), elapsed time in seconds (a proxy of fatigue), and gradient (expressed by elevation change meters, a measure of resistance). With two constants, viz. a HR lead of 10 seconds, and a mean power window of 25 seconds, see the equation below as an example.3

lead(heart rate, n = 10) = b0 + b1 * SMA(power, n = 25) + b2 * timestamp - b3 * elevation change meters

Suppose a one hour ride’s fatigue parameter b2 is estimated at 0.005. Model-wise this means that after one hour of cycling, the heart rate increased by 3600 times 0.005, which is 18 bpm, on average under the same power and resistance conditions.

In this markdown paper we shall examine how this fatigue indicator behaves during virtual time trial rides. The fatigue parameter could incorrectly have a negative association. That’s why we’re calling it “HR change” for the time being. Defined as the heart acceleration after your ride, calculated by a linear model with power and resistance as covariates, in other words, taking power and resistance into account. My personal virtual ride data, accessed with VeloViewer, serves as test data.

Five sample virtual time trials


In the aforementioned article we discussed (and R - coded) a dual graphical representation of a virtual interval ride, one expanded with key figures and a compact one with normalized linear trends for HR and power. In this article we merge both graphs and focus on tough virtual time trials. A randomly selected five of these are subsequently presented graphically in the usual manner.

The guideline for interpreting the graphs: in a time trial we want to keep the power as high and constant as possible in order to achieve the best performance. The aim is a flat power line. The regression lines reveal whether this was successful during the ride and the consequences for the heart rate. Here’s the R code to get the b coefficient from both simple regression models:

b_HR_distance <-  round(coef(lm(tcx_df$HR ~ tcx_df$Cum_Distance))[2],2)
b_power_distance <-  round(coef(lm(SMA(tcx_df$Watt,n=5) ~ tcx_df$Cum_Distance))[2],2)

The variability index is also important in this context. Defined as the normalized power divided by the average power. My coding of the definition looks like this:

mean_Watt <- round(mean(tcx_df$Watt),0)
normalized_power <- round(mean((SMA(tcx_df$Watt,n=30))^4,na.rm = TRUE)^(1/4),0)
variability_index <-  round(normalized_power / mean_Watt,2)

The higher this index, the greater the discrepancy between average and normalized power. A high variability index means that the ride had a lot of surges and coasting. For properly executed endurance rides and time trials, we want the index to be as low as possible. On the other hand, a criterium, a training with intervals or chasing KOMs will all show a high value, due to all of the zero and low power values from coasting.
The bottom line is that making a good estimate of how much power you can maintain throughout the race against time is almost always the best strategy.


1. Punta Veleno



Punta Veleno from Castello di Brenzone is a climb in the Italian Veneto region. It is 7.9 kilometers long and covers 1,005 meters of elevation with an average gradient of 12.7%. The top of the climb is at an altitude of 1,135 meters.
The virtual route is slightly longer than the actual climb (see Figure 1). After reaching the top, a power decline set in, which is understandable. Apparently it was all about the climb here. So despite the negative power slope (-2.43), we are pleased with this effort. The heart rate has increased considerably by 24 beats per minute in a short time. The raise is on average 2.52 heartbeats per kilometer (multiple regression model).


Figure 1. Punta Veleno  [List of figures](#figlist)

Figure 1. Punta Veleno List of figures


2. Passo Gavia



The Passo di Gavia is a mountain pass in the Italian Alps, in the Lombardy region. It is one of the highest pass roads in Europe. The summit is at 2,621 meters.
This ride is more than twice as long as the previous one with more elevation meters. The ascent, on the other hand, is less steep. The power slope and variability index show good values. The heart rate has increased by 29 bpm in one hour and fifteen minutes.


Figure 2. Passo Gavia [List of figures](#figlist)

Figure 2. Passo Gavia List of figures


3. Potash Road



The scenic road in eastern Utah follows a towering sandstone cliff. On the the side of the road courses the Colorado River. Apart from the two small climbs at the beginning and end, it is a very flat ride.
Between the tenth and twentieth kilometers the power effort was a bit too much, given the slight decline afterwards (Figure 3). However, it is barely reflected in the regression line. The heart rate has increased by 26 bpm.


Figure 3. Potash Road [List of figures](#figlist)

Figure 3. Potash Road List of figures


4. Oostende - Blankenberge



A very flat coastal route in Flanders, from Oostende to Blankenberge.
I probably started a little too slowly on this ride, considering the power and HR trends (a power coefficient of 1.18 and an HR change of 38, that is a raise of 1.09 bpm per km according the multiple regression model, respectively 1.22 in the simple regression model).


Figure 4. Oostende - Blankenberge [List of figures](#figlist)

Figure 4. Oostende - Blankenberge List of figures


5. Gascony prologue



This ride is a 13.24 km short spin with some easy hills. Near the town of Auch, about 160 km southeast of Bordeaux.
A top ride by my standards. But according to my own interpretation of the metrics, I may have started a little too slowly. The power slope indicates 4.3 per kilometer and the heart rate increased by 36 beats per minute in just twenty-two minutes, which amounts to an average raise of 2.34 bpm per km in the multiple regression model (the simple regression model indicates 2.9 bpm).

Figure 5. Gascony prologue [List of figures](#figlist)

Figure 5. Gascony prologue List of figures


6. Sample summary data



By multiplying the b coefficient of the multiple regression model by the number of seconds, you get the model-based HR change during the entire ride. If you divide this metric by the number of kilometers, you will see the HR score per km. In addition, the table contains a comparable score, namely the b coefficient of the single regression model (the “b lm HR distance” in red, and a normalized “b lm HR distance z-scores”). The simple regression power-distance b’s are at the bottom of the table. During the climb of the Punta Veleno I lost an average of 2.4 watts per kilometer, while during the Gascony prologue there was an average gain of 4.3 watts.
These five sample ride are very different in terms of distance and elevation.

Table 1a. Summary data virtual time trial samples
Punta Veleno Passo Gavia Potash Road Oostende - Blankenberge Gascony prologue
Date 2021-01-10 2020-09-12 2017-12-14 2024-03-18 2020-03-09
Distance (km) 9.514 20.564 51.087 34.962 13.234
Elevation (m) 1,021 1,264 113 97 268
Duration 00:55:53 01:15:19 01:18:24 00:56:48 00:22:05
Seconds 3,353 4,519 4,704 3,408 1,325
mean Speed 10.2 16.4 39.1 36.9 36
mean Cadence 64 71 83 75 78
mean Watt 270 267 242 236 289
mean HR 159 152 151 139 149
Power to heart ratio 1.7 1.76 1.6 1.7 1.94
sd Watt 38 39 36 57 94
sd HR 10 11 10 14 14
normalized power 273 270 246 243 309
variability index 1.01 1.01 1.02 1.03 1.07
b2 multiple lm 0.007044 0.006328 0.005533 0.011223 0.023383
R² multiple lm 0.71 0.88 0.77 0.91 0.85
HR change 24 29 26 38 31
HR change per km 2.52 1.41 0.51 1.09 2.34
b lm HR distance 3.25 1.84 0.54 1.22 2.9
b lm HR distance (z-scores) 0.33 0.17 0.05 0.09 0.2
b lm power distance -2.43 0.52 -0.12 1.18 4.3
b lm power distance (z-scores) -0.06 0.01 0 0.02 0.05



To supplement Table 1a, we look up a number of general ride statistics at Strava that we cannot easily reproduce ourselves.

1. Energy (kJ) and calories. Energy or ‘Total Work’ in kilojoules (kJ) is the sum of the watts generated during the ride. There is a close 1–1 ratio between energy and calories expended during a ride.
2. Relative effort. The Relative effort measures how much cardiovascular work went into any activity that has heart rate data. A short and hard activity can require just as much effort as a long and leisurely one, and relative effort makes it so you can compare the two. Not only that, but different activity types are weighted so that your efforts can be compared across sports, and your values are personalized to your own heart rate zones so you can even compare with other people.
3. Training load. Strava calculates the training load by comparing your power during your ride to your FTP and determining how much load you put on your body during the workout. Training load is a way to determine how much rest you need after your workout.
4. Intensity percentage. This is Strava’s way of showing how difficult a ride was as compared to your FTP. They look at your weighted average power4 for the ride and compare it to your FTP. If your weighted average power was 250W and your FTP 300W, the Intensity would be 83%. It’s possible to have an Intensity of over 100% if the ride is shorter than an hour, or if your FTP is specified too low.

Strava divides the intensity percentage into five categories:

- Endurance / Recovery Ride – 65% and lower;
- Moderate Ride – 65-80%;
- Tempo Ride – 80-95%;
- Time Trial or Race – 95-105%;
- Short Time Trial or Race – 105% and higher.

The five example rides presented in this article all belong to the last group.

5. “Points in red”. A predecessor of Stava’s relative effort, then called ‘suffer score’, which is still offered by VeloViewer in a modified way. It is based on the amount of time spent in heart rate zones four and five.


Here’s the R code to import the activity data, to filter the examples and to create a table with additional data:


version[['version.string']]
## [1] "R version 4.3.2 (2023-10-31 ucrt)"
library(XML)
library(xml2)
library(magrittr)       # set_colnames
library(TTR)            # MA
library(hms)
library(kableExtra)
library(knitr)
library(tidyverse)

options("scipen" = 100, "digits" = 4)         # print numbers in normal notation

# get all my Strava / VeloViewer virtual rides summary data.

vv_files <- "data/vv_download/"            # Folder VeloViewer files.

# R sometimes works in mysterious ways... The rlang::sym() function takes strings (including those with - sign in name) and turns them into quotes, 
# which are in turn unquoted by !! and used as names in the context of df where they refer to the relevant columns. 

virtual_activities_df <- read_csv(file = paste0(vv_files, "activities.csv"), show_col_types = FALSE,  name_repair = "unique") %>% 
  rename_all(function(x) gsub(" ", "_", x)) %>% 
  filter(Type == 'VirtualRide') %>% 
  rename(Activity_ID = last_col(),
         Speed_kmh = "Speed_km/h",
         W_HR_ratio = "W/HR",
         Intensity_perc = "Intensity_%") %>% 
  mutate(Power_0_150 = Power_0W + !!rlang::sym('Power_0-50W') + !!rlang::sym('Power_50-100W') + !!rlang::sym('Power_100-150W'),
         Power_150_200 = !!rlang::sym('Power_150-200W'), 
         Power_200_250 = !!rlang::sym('Power_200-250W'), 
         Power_250_400 = !!rlang::sym('Power_250-300W') + !!rlang::sym('Power_300-350W') +!!rlang::sym('Power_350-400W'), 
         Power_ge_400  = !!rlang::sym('Power_400-450W') + !!rlang::sym('Power_450W+')) %>% 
  select(Activity_ID, When, Name, Elapsed_Time, Dist_km, Elv_m, Speed_kmh, Pwr_W, Weighted_Avg_Pwr_W, Max_Pwr_W, 
         Cad, Heart, Max_Heart, W_HR_ratio, Energy_kJ, Cal, Training_Load, Intensity_perc, Relative_Effort, Points_in_red,
         Power_0_150,Power_150_200,Power_200_250,Power_250_400,Power_ge_400) %>% 
  mutate(Dist_km = round(Dist_km/1000,3), 
         Speed_kmh=round(Speed_kmh * 3.6,1), 
         When = as.character(When),
         When = paste0(substring(When,1,10), " xx-", substring(When,15,19)),
         Duration = hms::as_hms(Elapsed_Time),
         Elv_m = round(Elv_m,0),
         year = year(When),
         Variability_index = round(Weighted_Avg_Pwr_W / Pwr_W,2),
         Intensity_perc_class = factor(
              ifelse(Intensity_perc < 65, "Endurance/recovery Ride - lower than 65%",
                     ifelse(Intensity_perc >= 65 & Intensity_perc < 80, "Moderate ride - 65-80%",
                            ifelse(Intensity_perc >= 80 & Intensity_perc < 95, "Tempo ride - 80-95%",
                                   ifelse(Intensity_perc >= 95 & Intensity_perc < 105, "Time trial or race - 95-105%",
                                          ifelse(Intensity_perc >= 105, "Short time trial or race - 105% and higher", NA)))))))

# pluck 5 sample virtual rides in the category "Short Time Trial or Race – 105% and higher".

activity_id <- data.frame(c(4599245358,4049850859,1313852876,10987044381,3168488893))
activity_names <- data.frame(c("Punta Veleno", "Passo Gavia", "Potash Road", "Oostende - Blankenberge", "Gascony prologue"))
activity_data <- cbind(activity_id,activity_names)
colnames(activity_data) <- c("Activity_ID", "Ride")
rm(activity_id,activity_names)
activity_data <- activity_data %>% left_join(virtual_activities_df, by = "Activity_ID") %>% select(Ride, Activity_ID, Energy_kJ, Cal, Relative_Effort, Training_Load, Intensity_perc, Intensity_perc_class)

# generate table.

knitr::kable(activity_data, format = "html", escape = FALSE, caption = "Table 1b. Additional figures from Strava") %>% 
  row_spec(1:nrow(activity_data),font_size=10) %>% 
  row_spec(0,bold=TRUE) %>% 
  column_spec(1, bold=TRUE) %>% 
  column_spec(3:6, color = "blue") %>% 
  column_spec(7:8, color = "red") %>%
  kableExtra::kable_styling(full_width = TRUE, font_size = 10) 
Table 1b. Additional figures from Strava
Ride Activity_ID Energy_kJ Cal Relative_Effort Training_Load Intensity_perc Intensity_perc_class
Punta Veleno 4599245358 903.9 1007.9 191 135.07 120.4 Short time trial or race - 105% and higher
Passo Gavia 4049850859 1207.4 1346.2 202 179.38 119.6 Short time trial or race - 105% and higher
Potash Road 1313852876 1137.5 1268.3 159 152.38 108.0 Short time trial or race - 105% and higher
Oostende - Blankenberge 10987044381 804.4 896.9 110 106.78 106.2 Short time trial or race - 105% and higher
Gascony prologue 3168488893 382.4 426.4 54 64.51 132.4 Short time trial or race - 105% and higher



It is noteworthy that the prologue time trial scores lowest on the ‘absolute, quantitative’ variables such as calories burned and training load, but highest on intensity percentage, which can be interpreted as a relative one.



Population virtual time trials



Based on the five examples, we have gained some insight into the figures. Now we can study the intensive time trial group as a whole. This selection consists of approximately 250 efforts out of a total number of virtual rides of more than 1,250, see the output below. That’s about 20 percent.
A number of rides are canceled for various reasons. This is not due to the Strava data, because it is always available via VeloViewer. But to obtain the regression data it is necessary to read all archived ride files. This was not successful in all cases: unfortunately the archive is incomplete and some files are rejected by the import routine, even after adjustments. Apart from all this, we have not taken into account a dozen efforts shorter than 5 km. This leaves more than 200 efforts, including very flat stages and out-of-category climbs, see the output below.



options(width = 2000)

# Function to force equal row number if necessary.

bind_cols_fill <- function(df_list) {
  max_rows <- map_int(df_list, nrow) %>% max()
  map(df_list, function(df) {
    if(nrow(df) == max_rows) return(df)
    first <- names(df)[1] %>% sym()
    df %>% add_row(!!first := rep(NA, max_rows - nrow(df)))
  }) %>% bind_cols()
}

# Function to get parameter p-value classification.

get_p_value_class <- function(fit, i){
  p_class <- as.character(symnum(summary(fit)$coefficients[,"Pr(>|t|)"][[i]], cutpoints = c(0,0.001,0.01,0.05,1), 
                                  symbols = c("< .001", "< .01", "< .05", "n.s.")))
}

# Read all recent tcx files.

tcx_files = list.files(path = 'C:/Users/acmsc/Documents/Strava uploads/', pattern = "tcx$", full.names = TRUE)

variables <-  13
output <- matrix(ncol=variables, nrow=(length(tcx_files)))

for (i in c(1:length(tcx_files))){
  
rouvy_tcx_file <- tcx_files[i]

mark_als_XML <- XML::xmlParse(rouvy_tcx_file)

df1 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:HeartRateBpm", 
                                                     namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(HR = Value) %>% mutate(HR = as.numeric(HR))

df2 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:Time",
                                                     namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(Time = text)

df3 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:AltitudeMeters",
                                                     namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(Altitude = text) %>% mutate(Altitude = as.numeric(Altitude))

df4 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:DistanceMeters",
                                                     namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% filter(row_number()!=1)  %>% rename(Cum_Distance = text) %>% mutate(Cum_Distance = round(as.numeric(Cum_Distance),0)) 

part1_df <- bind_cols_fill(list(df1,df2,df3,df4)) %>% 
  mutate_all(~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))  

part2_df <-  do.call("rbind", xpathApply(mark_als_XML, "//ns3:TPX", function(N) {
  data.frame(
    Watt = as.numeric(xmlValue(N[["Watts"]]))
  )
}))

tcx_df <- cbind(part1_df, part2_df) %>%  
  mutate(Elevation_change_m = Altitude - lag(Altitude)) %>% 
  mutate(Elevation_change_m = if_else(is.na(Elevation_change_m), 0, Elevation_change_m)) %>% 
  mutate(Timestamp = hms::as_hms(row_number())) %>% 
  mutate(Cum_Distance = Cum_Distance / 1000) %>% 
  select(Time, Timestamp, HR, Watt, Elevation_change_m, Cum_Distance)

When <- paste0(substring(min(df2$Time),1,10), " xx-", substring(min(df2$Time),15,19))

# apply linear model.

linearMod <- lm(lead(HR, n = 10) ~ SMA(Watt, n = 25) + Timestamp + Elevation_change_m, data=tcx_df)

b0 <- as.numeric(round(linearMod$coefficients[1],0))
b1 <- as.numeric(round(linearMod$coefficients[2],4))
b2 <- as.numeric(round(linearMod$coefficients[3],6))
b3 <- as.numeric(round(linearMod$coefficients[4],4))
R2 <- as.numeric(round(summary(linearMod)$r.squared,2))
R2_p_value <- as.character(symnum(summary(linearMod)$fstatistic %>% {unname(pf(.[1],.[2],.[3],lower.tail=FALSE))} , cutpoints = c(0,0.001,0.01,0.05,1),
                                  symbols = c("< .001", "< .01", "< .05", "n.s.")))
b0_p_value <- get_p_value_class(linearMod, 1)
b1_p_value <- get_p_value_class(linearMod, 2)
b2_p_value <- get_p_value_class(linearMod, 3)
b3_p_value <- get_p_value_class(linearMod, 4)

HR_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$HR) ~ tcx_df$Cum_Distance))[2],2))
Power_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$Watt) ~ tcx_df$Cum_Distance))[2],2))

output[i,] <- c(When,R2,R2_p_value,b0,b1,b2,b3,b0_p_value,b1_p_value,b2_p_value,b3_p_value,HR_distance_slope,Power_distance_slope)

}

new_virtual_rides_lm_hr_HR_change_df <- data.frame(output) %>%
  set_colnames(c("When","R2","R2_p_value","b0","b1","b2","b3","b0_p_value","b1_p_value","b2_p_value","b3_p_value", "HR_distance_slope","Power_distance_slope")) %>% 
  mutate(hr_HR_change = round(as.numeric(b2) * 3600,0))

# export new virtual activities LM data to csv.
write_csv2(new_virtual_rides_lm_hr_HR_change_df, "data/new_virtual_rides_lm_hr_HR_change.csv")
#csv_check <- read_csv2(file = "data/new_virtual_rides_lm_hr_HR_change.csv", col_names = TRUE)  

rm(bind_cols_fill,tcx_files,rouvy_tcx_file,mark_als_XML,vv_files,tcx_df,df1,df2,df3,df4,linearMod,output,part1_df,part2_df,b0,b1,b2,b3,b0_p_value,b1_p_value,b2_p_value,b3_p_value,i,R2,R2_p_value,When, variables)
rm(HR_distance_slope, Power_distance_slope, get_p_value_class)

old_virtual_rides_lm_hr_HR_change_df <- read_csv2(file = "data/old_virtual_rides_lm_hr_HR_change.csv", col_names = TRUE)  

virtual_rides_lm_hr_HR_change_df <- rbind(old_virtual_rides_lm_hr_HR_change_df,new_virtual_rides_lm_hr_HR_change_df) 

virtual_rides_all_metrics <- virtual_activities_df %>% left_join(virtual_rides_lm_hr_HR_change_df, by = c("When")) 

library(epiDisplay)

epiDisplay::tab1(factor(virtual_rides_all_metrics$Intensity_perc_class, levels = c("Endurance/recovery Ride - lower than 65%","Moderate ride - 65-80%","Tempo ride - 80-95%","Time trial or race - 95-105%","Short time trial or race - 105% and higher")), cum.percent = TRUE, graph = FALSE)
## factor(virtual_rides_all_metrics$Intensity_perc_class, levels = c("Endurance/recovery Ride - lower than 65%",  : 
##      "Moderate ride - 65-80%", "Tempo ride - 80-95%", "Time trial or race - 95-105%",  : 
##      "Short time trial or race - 105% and higher")) : 
##                                            Frequency Percent Cum. percent
## Endurance/recovery Ride - lower than 65%          76     6.0          6.0
## Moderate ride - 65-80%                           159    12.6         18.7
## Tempo ride - 80-95%                              384    30.5         49.2
## Time trial or race - 95-105%                     384    30.5         79.8
## Short time trial or race - 105% and higher       254    20.2        100.0
##   Total                                         1257   100.0        100.0
time_trials <- virtual_rides_all_metrics %>% filter(Intensity_perc_class == "Short time trial or race - 105% and higher") %>% 
  filter(!is.na(hr_HR_change) & Variability_index <= 1.07 & Dist_km > 5 & Dist_km < 66 & Activity_ID != 4337793747  & Activity_ID != 4787832167 & Activity_ID != 3455449898  & Activity_ID != 2206457124) %>% 
  mutate(Duration_class = factor(ifelse(Duration < 3600, "< 1 hour", ifelse(Duration >= 3600 & Duration < 5400, "1-1.5 hour", "> 1.5 hour")), levels = c("< 1 hour","1-1.5 hour", "> 1.5 hour"))) %>% 
  mutate(Elv_m_4_class = factor(ifelse(Elv_m < 150, "< 150m", 
                                ifelse(Elv_m >= 150 & Elv_m < 500, "150-500m",
                                  ifelse(Elv_m >= 500 & Elv_m < 1000, "500-1000m", "> 1000m"))), levels = c("< 150m","150-500m","500-1000m", "> 1000m"))) %>% 
  mutate(Elv_m_3_class = factor(ifelse(Elv_m >= 1000, "> 1000m",
                                  ifelse(Elv_m >= 500 & Elv_m < 1000, "500-1000m", "< 500m")), levels = c("> 1000m","500-1000m", "< 500m"))) %>% 
  mutate(HR_change = as.numeric(b2) * Elapsed_Time)
   
epiDisplay::tab1(time_trials$Elv_m_4_class, cum.percent = TRUE, graph = FALSE)
## time_trials$Elv_m_4_class : 
##           Frequency Percent Cum. percent
## < 150m           13     6.4          6.4
## 150-500m         41    20.1         26.5
## 500-1000m        92    45.1         71.6
## > 1000m          58    28.4        100.0
##   Total         204   100.0        100.0



1. Descriptives



Table 2 gives some measures of dispersion and central tendency. The table includes the median, range, and the first and third quartiles. Where the range is a measure of where the beginning and end are in a set, the interquartile range (IQR) is a measure of where the bulk of the values lie. The IQR specifies where the “middle fifty” is in a data set: between the first and third quartiles.
Apart from the common feature of very high relative intensity, the other variables vary considerably. The indicator that triggered this writing, the HR change, also has a wide range from 4 to 43.
The proportion expained variance in the multipe regression model (R²) range is 0.11 - 0.98. Coefficient b2 of the multiple model is the basis for the HR change (b2 times number of seconds equals HR change). HR_distance_slope and Power_distance_slope are the b1’s of two simple regressions on z-score normalized variables, viz. HR-cumulative distance and power-cumulative distance, to be discussed in paragraph 6.

time_trials_summary_part1 <- time_trials %>% mutate(Elapsed_Time_minutes = Elapsed_Time/60, HR_change_km = round(HR_change / Dist_km,2)) %>% dplyr::select(HR_change,HR_change_km,Elapsed_Time_minutes,Dist_km,Elv_m,Speed_kmh,Pwr_W,Weighted_Avg_Pwr_W,Variability_index,Cad,Heart,Max_Heart,W_HR_ratio,Energy_kJ,Cal,Training_Load,Relative_Effort,Points_in_red)
tts1 <- do.call(cbind, lapply(time_trials_summary_part1, summary))
tts2 <- as.data.frame(t(tts1)) %>% 
  mutate(across(where(is.numeric), ~ if_else(!grepl("W_HR_ratio|Variability_index|HR_change_km", rownames(as.data.frame(t(tts1)))), round(., digits = 0),.))) %>% 
  mutate(across(where(is.numeric), ~ format(.,big.mark = ',', decimal.mark = '.'))) %>% 
  mutate(across(where(is_character), ~ str_replace(., ".000|.00", "")))

time_trials_summary_part2 <- time_trials %>% dplyr::select(R2,b2,HR_distance_slope,Power_distance_slope) %>% 
  mutate_all(~as.numeric(.))
tts3 <- do.call(cbind, lapply(time_trials_summary_part2, summary))
tts4 <- as.data.frame(t(tts3)) %>% 
  mutate(across(where(is.numeric), ~ if_else(!grepl("b2", rownames(as.data.frame(t(tts3)))), round(., digits = 2),.))) %>% 
  mutate(across(where(is.numeric), ~ format(.,big.mark = ',', decimal.mark = '.'))) %>% 
  mutate(across(where(is_character), ~ if_else(!grepl("b2", rownames(as.data.frame(t(tts3)))), str_replace(., "000", ""),.)))

rbind(tts2,tts4) %>% dplyr::select(-Mean) %>% 
  kbl(caption = paste0("Table 2. Median and range of virtual time trial stats. (n = ", nrow(time_trials), ")")) %>% 
  kable_classic(full_width = FALSE) %>% 
  row_spec(1:nrow(rbind(tts2,tts4)), align = "r") 
Table 2. Median and range of virtual time trial stats. (n = 204)
Min. 1st Qu. Median 3rd Qu. Max.
HR_change 4 21 26 32 43
HR_change_km 0.070 0.570 0.880 1.202 4.11
Elapsed_Time_minutes 22 57 70 83 144
Dist_km 9 23 31 40 66
Elv_m 0 492 734 1,089 2,044
Speed_kmh 10 20 30 34 47
Pwr_W 224 236 242 250 289
Weighted_Avg_Pwr_W 237 241 247 256 298
Variability_index 1 1.010 1.020 1.040 1.07
Cad 64 73 77 79 88
Heart 135 145 150 153 162
Max_Heart 155 165 168 171 179
W_HR_ratio 1.507 1.623 1.672 1.728 2
Energy_kJ 380 814 1,025 1,.000 2,038
Cal 423 908 1,143 1,338 2,272
Training_Load 52 112 141 168 266
Relative_Effort 37 124 160 202 343
Points_in_red 3 40 82 120 260
R2 0.110 0.650 0.80 0.87 0.98
b2 0.000656 0.004344 0.00635 0.00848 0.02338
HR_distance_slope 0.000 0.060 0.09 0.13 0.38
Power_distance_slope -0.090 -0.010 0.01 0.02 0.19



2. Correlation matrix



To gain some insight into the relationships between these variables, we run a correlation matrix. When discussing the matrix, I would like to focus on the coefficients of HR change (on the first row). There is a negative relationship with the length/duration variables (elapsed time, distance). And also with metrics that measure the total amount of effort (energy, calories, training load, relative effort). A weak positive relationship between HR change and maximum heart rate can be observed. The conclusion is that on average during the entire ride, the heart rate is most accelerated in shorter efforts and efforts where the total amount of work has remained within limits. In these cases the maximum HR achieved is slightly higher.

cor_data <- time_trials %>% dplyr::select(HR_change,Elapsed_Time,Dist_km,Elv_m,Speed_kmh,Pwr_W,Weighted_Avg_Pwr_W,Variability_index,Cad,Heart,Max_Heart,W_HR_ratio,Energy_kJ,Cal,Training_Load,Relative_Effort,Points_in_red,Intensity_perc)
cor_data_matrix <- Hmisc::rcorr(as.matrix(cor_data))
cor_data_matrix_r <- cor_data_matrix$r
cor_data_matrix_P <- cor_data_matrix$P
corrplot::corrplot(cor_data_matrix_r, p.mat = cor_data_matrix_P, sig.level = 0.05, type = "upper", diag = FALSE, order = "original",
                   insig = 'blank', tl.cex = 0.7, tl.col    = "black", tl.srt = 45,
                   title=paste0('Correlation matrix (p < 0.05, insignificant cells suppressed. (n = ', nrow(time_trials), ')'), mar=c(0,0,1,0),tl.offset = 0.5, cex.main = 0.8, col.main = "black")
Figure 6. Correlation matrix [List of figures](#figlist)

Figure 6. Correlation matrix List of figures



3. Training load and HR change


Let’s explore a few HR change relationships further, starting with training load.
Figure 7a is the scatter plot. When you look at the point cloud, do you expect a significant relationship?

ggplot(time_trials, aes(y = Training_Load,  x = as.numeric(HR_change))) +
  geom_point() +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Scatterplot HR change and training load",
       subtitle = paste0(nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr"))
Figure 7a. Scatterplot HR change and training load [List of figures](#figlist)

Figure 7a. Scatterplot HR change and training load List of figures



This significant relationship exists according to the correlation matrix (Pearson’s r = -0.33201, p<0.01), see Figure 7b.

library(ggtext)

ggplot(time_trials, aes(y = Training_Load,  x = as.numeric(HR_change))) +
  geom_point() +
  geom_smooth(method = 'lm', formula = y ~ splines::bs(x, 4), se = FALSE, colour="red", linewidth=2) +
  geom_smooth(method = 'lm', se = FALSE, colour="blue", linewidth=1.5) +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Trend HR change and training load (<span style='color: blue;'>linear</span> and <span style='color: red;'>non-linear</span>)",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
  theme(plot.title = element_markdown())
Figure 7b. Relationship HR change and training load [List of figures](#figlist)

Figure 7b. Relationship HR change and training load List of figures



We can break down the relationship into elevation meters, see Figure 7c. In each of the three elevation categories, on average the lower the training load, the higher the HR change.

ggplot(time_trials, aes(y = Training_Load,  x = as.numeric(HR_change), 
                        shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_color_manual(values=c('red','darkgreen','blue')) +
  scale_shape_manual(values = c(24, 16, 3)) +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Trend HR change and training load, by elevation",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr"))
Figure 7c. Relationship HR change, training load and elevation [List of figures](#figlist)

Figure 7c. Relationship HR change, training load and elevation List of figures



4. Elapsed time and HR change



There is a negative relationship HR change - duration, in each of the three elevation categories.

ggplot(time_trials, aes(y = as.POSIXct(Duration, format = "%H:%M:%S"),  x = as.numeric(HR_change), 
                        shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_color_manual(values=c('red','darkgreen','blue')) +
  scale_shape_manual(values = c(24, 16, 3)) +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Elapsed time",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Relationship HR change and duration, by elevation",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr"))
Figure 8. Relationship HR change, duration and elevation [List of figures](#figlist)

Figure 8. Relationship HR change, duration and elevation List of figures



5. Energy and HR change



There is a negative relationship HR change - energy, in each of the three elevation categories.
Figures 7c, 8, and 9 are almost copies of each other, indicating the strong relationship between training load, energy and duration in the examined group of efforts that was already demonstrated in the correlation matrix.

ggplot(time_trials, aes(y = Energy_kJ,  x = as.numeric(HR_change), 
                        shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  scale_color_manual(values=c('red','darkgreen','blue')) +
  scale_shape_manual(values = c(24, 16, 3)) +
  theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
  labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Energy (kJ)",  color='elevation', fill = 'elevation', shape = 'elevation',
       title = "Relationship HR change and Energy (kJ), by elevation",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
  scale_y_continuous(labels = scales::comma_format(big.mark = ',', decimal.mark = '.'))
Figure 9. Relationship HR change, energy and elevation [List of figures](#figlist)

Figure 9. Relationship HR change, energy and elevation List of figures



6. Simple regression b’s



For each ride we performed two simple regressions with distance as the independent variable: the first with power as the dependent variable, the second with HR. In order to compare the two slope angles (the model b1 coefficient), the dependent variables are normalized by z-scores.

HR_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$HR) ~ tcx_df$Cum_Distance))[2],2))
Power_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$Watt) ~ tcx_df$Cum_Distance))[2],2))

A positive slope corresponds to an increase, a negative one to decrease. The heart rate increases in all rides, but the power, which we would like to keep constant during the ride, appears to drop quite often (see Figure 10). Apparently, on quite a number of rides I started too energetically. Fortunately, the median is just greater than zero (0.01).


slope1 <- time_trials %>% dplyr::select(HR_distance_slope) %>% rename(slope = HR_distance_slope) %>% 
  mutate(trend = 'HR distance')

slope2 <- time_trials %>% dplyr::select(Power_distance_slope) %>% rename(slope = Power_distance_slope) %>%  
  mutate(trend = 'Power distance')

slopes <- rbind(slope1,slope2) %>% mutate(slope = as.numeric(slope))

slope_median <- slopes %>% 
  group_by(trend) %>%
  summarise(grp_median = median(slope))

ggplot(slopes, aes(x = slope)) + 
  geom_histogram(aes(color = trend, fill = trend),alpha = 0.3, position = "identity") +
  scale_fill_manual(values = c("red", "darkgreen")) +
  scale_color_manual(values = c("red", "darkgreen")) +
  labs(x="slope", y="count", 
       title = "Slopes (regression b's) of normalized distance-power (W)\n and distance-HR (bpm) trends",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides, 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
  theme(legend.text = element_text(size=7), axis.title=element_text(size=9)) +
  geom_label(aes(0.2, 35, hjust = 0, vjust = 0, label = paste0("median HR slopes: ", slope_median[1,2], "\n",
                                                               "median power slopes: ", slope_median[2,2])), data.frame())
Figure 10a. Histogram regression b's normalized trends [List of figures](#figlist)

Figure 10a. Histogram regression b’s normalized trends List of figures



Figure 10a is an interesting result of our exploratory research in this article. So let’s enhance this graph a bit…
A density plot visualizes the distribution of data over a continuous interval. This type of chart is a representation of the distribution of a numeric variable that uses a kernel density estimate (KDE) to show the probability density function of the variable. An advantage density plots have over histograms is that they’re better at determining the distribution shape because they’re not affected by the number of bins used. Figure 11 shows the result. A typical example of a compact graph with a clear message: don’t go too fast in the first part of your time trial!

ggplot(slopes, aes(x = slope)) + 
  geom_histogram(aes(y = after_stat(density), color = trend), 
                   fill = "white",position = "identity")+
  geom_density(aes(color = trend), linewidth = 1, adjust = 1, kernel = c("gaussian")) +
  scale_color_manual(values = c("red", "darkgreen")) +
  labs(x="slope", y="density",
       title = "Density plot slopes (regression b's) of normalized distance-power (W)\n and distance-HR (bpm) trends",
       subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides, 2017 - ", year(today())),
       caption  = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
  theme(legend.text = element_text(size=7), axis.title=element_text(size=9))
Figure 10b. Density plot regression b's normalized trends [List of figures](#figlist)

Figure 10b. Density plot regression b’s normalized trends List of figures



Concluding remarks

What have I learned while writing this document? First of all, the HR change within the selection of heavy time trials shows a wide variation in scores. The median of the HR change is 26 bpm, a quarter of the scores are below 21 bpm, a quarter above 32 bpm. Apparently the group of selected time trials is less homogeneous than expected (see table 2).
Regarding HR change, the correlation matrix (Figure 6) shows a negative relationship with the length/duration variables (elapsed time, distance). And also with metrics that measure the total amount of effort (energy, calories, training load, relative effort). A weak positive relationship between HR change and maximum heart rate can be observed. The conclusion is that on average during the entire ride, the heart rate is most accelerated in shorter efforts and efforts where the total amount of work has remained within limits. In these cases the maximum HR achieved is slightly higher.

The second conclusion concerns the power trends. In a time trial we want to keep the power as high and constant as possible in order to achieve the best performance. The aim is a flat power line. We especially want to prevent declining power. Given the number of efforts with a negative power slope angle, I am not always successful in this (see Figures 10a-b).


Bibliography



Stack Overflow
https://stackoverflow.com/

Strava
https://www.strava.com

VeloViewer
https://www.veloviewer.com



Footnotes


  1. Geographer, data-analist and cyclist↩︎

  2. The relationship between power and heart rate in a virtual ride with five ascents, and a potentially interesting HR change indicator as a spin-off
    https://rpubs.com/JSchakenraad/1187324↩︎

  3. A simple moving average (SMA) is calculated by taking the arithmetic mean of a given set of values over a specified period.↩︎

  4. The weighted average power indicator is very similar to normalized power↩︎