In a previous R markdown blog2 we described the relationship between power
and heart rate during a virtual ride with five intervals. To predict
heart rate (HR) we came up with a regression model comprising three
predictors: power (in W), elapsed time in seconds (a proxy of fatigue),
and gradient (expressed by elevation change meters, a measure of
resistance). With two constants, viz. a HR lead of 10 seconds, and a
mean power window of 25 seconds, see the equation below as an example.3
lead(heart rate, n = 10) = b0 + b1 * SMA(power, n = 25) + b2 *
timestamp - b3 * elevation change meters
Suppose a one hour ride’s fatigue parameter b2 is estimated at 0.005.
Model-wise this means that after one hour of cycling, the heart rate
increased by 3600 times 0.005, which is 18 bpm, on average under the
same power and resistance conditions.
In this markdown paper we
shall examine how this fatigue indicator behaves during virtual time
trial rides. The fatigue parameter could incorrectly have a negative
association. That’s why we’re calling it “HR change” for the time being.
Defined as the heart acceleration after your ride, calculated by a
linear model with power and resistance as covariates, in other words,
taking power and resistance into account. My personal virtual ride data,
accessed with VeloViewer, serves as test data.
In the aforementioned article we discussed (and R - coded) a
dual graphical representation of a virtual interval ride, one expanded
with key figures and a compact one with normalized linear trends for HR
and power. In this article we merge both graphs and focus on tough
virtual time trials. A randomly selected five of these are subsequently
presented graphically in the usual manner.
The guideline for
interpreting the graphs: in a time trial we want to keep the power as
high and constant as possible in order to achieve the best performance.
The aim is a flat power line. The regression lines reveal whether this
was successful during the ride and the consequences for the heart rate.
Here’s the R code to get the b coefficient from both simple regression
models:
b_HR_distance <- round(coef(lm(tcx_df$HR ~ tcx_df$Cum_Distance))[2],2)
b_power_distance <- round(coef(lm(SMA(tcx_df$Watt,n=5) ~ tcx_df$Cum_Distance))[2],2)
The variability index is also important in this context. Defined as
the normalized power divided by the average power. My coding of the
definition looks like this:
mean_Watt <- round(mean(tcx_df$Watt),0)
normalized_power <- round(mean((SMA(tcx_df$Watt,n=30))^4,na.rm = TRUE)^(1/4),0)
variability_index <- round(normalized_power / mean_Watt,2)
The higher this index, the greater the discrepancy between average
and normalized power. A high variability index means that the ride had a
lot of surges and coasting. For properly executed endurance rides and
time trials, we want the index to be as low as possible. On the other
hand, a criterium, a training with intervals or chasing KOMs will all
show a high value, due to all of the zero and low power values from
coasting.
The bottom line is that making a good estimate of how much
power you can maintain throughout the race against time is almost always
the best strategy.
Punta Veleno from Castello di Brenzone is a climb in the
Italian Veneto region. It is 7.9 kilometers long and covers 1,005 meters
of elevation with an average gradient of 12.7%. The top of the climb is
at an altitude of 1,135 meters.
The virtual route is slightly longer
than the actual climb (see Figure 1). After reaching the top, a power
decline set in, which is understandable. Apparently it was all about the
climb here. So despite the negative power slope (-2.43), we are pleased
with this effort. The heart rate has increased considerably by 24 beats
per minute in a short time. The raise is on average 2.52 heartbeats per
kilometer (multiple regression model).
Figure 1. Punta Veleno List of figures
The Passo di Gavia is a mountain pass in the Italian Alps,
in the Lombardy region. It is one of the highest pass roads in Europe.
The summit is at 2,621 meters.
This ride is more than twice as long
as the previous one with more elevation meters. The ascent, on the other
hand, is less steep. The power slope and variability index show good
values. The heart rate has increased by 29 bpm in one hour and fifteen
minutes.
Figure 2. Passo Gavia List of figures
The scenic road in eastern Utah follows a towering sandstone
cliff. On the the side of the road courses the Colorado River. Apart
from the two small climbs at the beginning and end, it is a very flat
ride.
Between the tenth and twentieth kilometers the power effort
was a bit too much, given the slight decline afterwards (Figure 3).
However, it is barely reflected in the regression line. The heart rate
has increased by 26 bpm.
Figure 3. Potash Road List of figures
A very flat coastal route in Flanders, from Oostende to
Blankenberge.
I probably started a little too slowly on this ride,
considering the power and HR trends (a power coefficient of 1.18 and an
HR change of 38, that is a raise of 1.09 bpm per km according the
multiple regression model, respectively 1.22 in the simple regression
model).
Figure 4. Oostende - Blankenberge List of figures
Figure 5. Gascony prologue List of figures
| Punta Veleno | Passo Gavia | Potash Road | Oostende - Blankenberge | Gascony prologue | |
|---|---|---|---|---|---|
| Date | 2021-01-10 | 2020-09-12 | 2017-12-14 | 2024-03-18 | 2020-03-09 |
| Distance (km) | 9.514 | 20.564 | 51.087 | 34.962 | 13.234 |
| Elevation (m) | 1,021 | 1,264 | 113 | 97 | 268 |
| Duration | 00:55:53 | 01:15:19 | 01:18:24 | 00:56:48 | 00:22:05 |
| Seconds | 3,353 | 4,519 | 4,704 | 3,408 | 1,325 |
| mean Speed | 10.2 | 16.4 | 39.1 | 36.9 | 36 |
| mean Cadence | 64 | 71 | 83 | 75 | 78 |
| mean Watt | 270 | 267 | 242 | 236 | 289 |
| mean HR | 159 | 152 | 151 | 139 | 149 |
| Power to heart ratio | 1.7 | 1.76 | 1.6 | 1.7 | 1.94 |
| sd Watt | 38 | 39 | 36 | 57 | 94 |
| sd HR | 10 | 11 | 10 | 14 | 14 |
| normalized power | 273 | 270 | 246 | 243 | 309 |
| variability index | 1.01 | 1.01 | 1.02 | 1.03 | 1.07 |
| b2 multiple lm | 0.007044 | 0.006328 | 0.005533 | 0.011223 | 0.023383 |
| R² multiple lm | 0.71 | 0.88 | 0.77 | 0.91 | 0.85 |
| HR change | 24 | 29 | 26 | 38 | 31 |
| HR change per km | 2.52 | 1.41 | 0.51 | 1.09 | 2.34 |
| b lm HR distance | 3.25 | 1.84 | 0.54 | 1.22 | 2.9 |
| b lm HR distance (z-scores) | 0.33 | 0.17 | 0.05 | 0.09 | 0.2 |
| b lm power distance | -2.43 | 0.52 | -0.12 | 1.18 | 4.3 |
| b lm power distance (z-scores) | -0.06 | 0.01 | 0 | 0.02 | 0.05 |
To supplement Table 1a, we look up a number of general ride
statistics at Strava that we cannot easily reproduce ourselves.
1. Energy (kJ) and calories. Energy or ‘Total Work’ in
kilojoules (kJ) is the sum of the watts generated during the ride. There
is a close 1–1 ratio between energy and calories expended during a
ride.
2. Relative effort. The Relative effort
measures how much cardiovascular work went into any activity that has
heart rate data. A short and hard activity can require just as much
effort as a long and leisurely one, and relative effort makes it so you
can compare the two. Not only that, but different activity types are
weighted so that your efforts can be compared across sports, and your
values are personalized to your own heart rate zones so you can even
compare with other people.
3. Training load. Strava
calculates the training load by comparing your power during your ride to
your FTP and determining how much load you put on your body during the
workout. Training load is a way to determine how much rest you need
after your workout.
4. Intensity percentage. This
is Strava’s way of showing how difficult a ride was as compared to your
FTP. They look at your weighted average power4 for the ride and
compare it to your FTP. If your weighted average power was 250W and your
FTP 300W, the Intensity would be 83%. It’s possible to have an Intensity
of over 100% if the ride is shorter than an hour, or if your FTP is
specified too low.
Strava divides the intensity percentage into
five categories:
- Endurance / Recovery Ride – 65% and
lower;
- Moderate Ride – 65-80%;
- Tempo Ride – 80-95%;
-
Time Trial or Race – 95-105%;
- Short Time Trial or Race – 105% and
higher.
The five example rides presented in this article all
belong to the last group.
5. “Points in red”. A
predecessor of Stava’s relative effort, then called ‘suffer score’,
which is still offered by VeloViewer in a modified way. It is based on
the amount of time spent in heart rate zones four and five.
Here’s the R code to import the activity data, to filter the
examples and to create a table with additional data:
version[['version.string']]
## [1] "R version 4.3.2 (2023-10-31 ucrt)"
library(XML)
library(xml2)
library(magrittr) # set_colnames
library(TTR) # MA
library(hms)
library(kableExtra)
library(knitr)
library(tidyverse)
options("scipen" = 100, "digits" = 4) # print numbers in normal notation
# get all my Strava / VeloViewer virtual rides summary data.
vv_files <- "data/vv_download/" # Folder VeloViewer files.
# R sometimes works in mysterious ways... The rlang::sym() function takes strings (including those with - sign in name) and turns them into quotes,
# which are in turn unquoted by !! and used as names in the context of df where they refer to the relevant columns.
virtual_activities_df <- read_csv(file = paste0(vv_files, "activities.csv"), show_col_types = FALSE, name_repair = "unique") %>%
rename_all(function(x) gsub(" ", "_", x)) %>%
filter(Type == 'VirtualRide') %>%
rename(Activity_ID = last_col(),
Speed_kmh = "Speed_km/h",
W_HR_ratio = "W/HR",
Intensity_perc = "Intensity_%") %>%
mutate(Power_0_150 = Power_0W + !!rlang::sym('Power_0-50W') + !!rlang::sym('Power_50-100W') + !!rlang::sym('Power_100-150W'),
Power_150_200 = !!rlang::sym('Power_150-200W'),
Power_200_250 = !!rlang::sym('Power_200-250W'),
Power_250_400 = !!rlang::sym('Power_250-300W') + !!rlang::sym('Power_300-350W') +!!rlang::sym('Power_350-400W'),
Power_ge_400 = !!rlang::sym('Power_400-450W') + !!rlang::sym('Power_450W+')) %>%
select(Activity_ID, When, Name, Elapsed_Time, Dist_km, Elv_m, Speed_kmh, Pwr_W, Weighted_Avg_Pwr_W, Max_Pwr_W,
Cad, Heart, Max_Heart, W_HR_ratio, Energy_kJ, Cal, Training_Load, Intensity_perc, Relative_Effort, Points_in_red,
Power_0_150,Power_150_200,Power_200_250,Power_250_400,Power_ge_400) %>%
mutate(Dist_km = round(Dist_km/1000,3),
Speed_kmh=round(Speed_kmh * 3.6,1),
When = as.character(When),
When = paste0(substring(When,1,10), " xx-", substring(When,15,19)),
Duration = hms::as_hms(Elapsed_Time),
Elv_m = round(Elv_m,0),
year = year(When),
Variability_index = round(Weighted_Avg_Pwr_W / Pwr_W,2),
Intensity_perc_class = factor(
ifelse(Intensity_perc < 65, "Endurance/recovery Ride - lower than 65%",
ifelse(Intensity_perc >= 65 & Intensity_perc < 80, "Moderate ride - 65-80%",
ifelse(Intensity_perc >= 80 & Intensity_perc < 95, "Tempo ride - 80-95%",
ifelse(Intensity_perc >= 95 & Intensity_perc < 105, "Time trial or race - 95-105%",
ifelse(Intensity_perc >= 105, "Short time trial or race - 105% and higher", NA)))))))
# pluck 5 sample virtual rides in the category "Short Time Trial or Race – 105% and higher".
activity_id <- data.frame(c(4599245358,4049850859,1313852876,10987044381,3168488893))
activity_names <- data.frame(c("Punta Veleno", "Passo Gavia", "Potash Road", "Oostende - Blankenberge", "Gascony prologue"))
activity_data <- cbind(activity_id,activity_names)
colnames(activity_data) <- c("Activity_ID", "Ride")
rm(activity_id,activity_names)
activity_data <- activity_data %>% left_join(virtual_activities_df, by = "Activity_ID") %>% select(Ride, Activity_ID, Energy_kJ, Cal, Relative_Effort, Training_Load, Intensity_perc, Intensity_perc_class)
# generate table.
knitr::kable(activity_data, format = "html", escape = FALSE, caption = "Table 1b. Additional figures from Strava") %>%
row_spec(1:nrow(activity_data),font_size=10) %>%
row_spec(0,bold=TRUE) %>%
column_spec(1, bold=TRUE) %>%
column_spec(3:6, color = "blue") %>%
column_spec(7:8, color = "red") %>%
kableExtra::kable_styling(full_width = TRUE, font_size = 10)
| Ride | Activity_ID | Energy_kJ | Cal | Relative_Effort | Training_Load | Intensity_perc | Intensity_perc_class |
|---|---|---|---|---|---|---|---|
| Punta Veleno | 4599245358 | 903.9 | 1007.9 | 191 | 135.07 | 120.4 | Short time trial or race - 105% and higher |
| Passo Gavia | 4049850859 | 1207.4 | 1346.2 | 202 | 179.38 | 119.6 | Short time trial or race - 105% and higher |
| Potash Road | 1313852876 | 1137.5 | 1268.3 | 159 | 152.38 | 108.0 | Short time trial or race - 105% and higher |
| Oostende - Blankenberge | 10987044381 | 804.4 | 896.9 | 110 | 106.78 | 106.2 | Short time trial or race - 105% and higher |
| Gascony prologue | 3168488893 | 382.4 | 426.4 | 54 | 64.51 | 132.4 | Short time trial or race - 105% and higher |
It is noteworthy that the prologue time trial scores lowest
on the ‘absolute, quantitative’ variables such as calories burned and
training load, but highest on intensity percentage, which can be
interpreted as a relative one.
Based on the five examples, we have gained some insight into
the figures. Now we can study the intensive time trial group as a whole.
This selection consists of approximately 250 efforts out of a total
number of virtual rides of more than 1,250, see the output below. That’s
about 20 percent.
A number of rides are canceled for various
reasons. This is not due to the Strava data, because it is always
available via VeloViewer. But to obtain the regression data it is
necessary to read all archived ride files. This was not successful in
all cases: unfortunately the archive is incomplete and some files are
rejected by the import routine, even after adjustments. Apart from all
this, we have not taken into account a dozen efforts shorter than 5 km.
This leaves more than 200 efforts, including very flat stages and
out-of-category climbs, see the output below.
options(width = 2000)
# Function to force equal row number if necessary.
bind_cols_fill <- function(df_list) {
max_rows <- map_int(df_list, nrow) %>% max()
map(df_list, function(df) {
if(nrow(df) == max_rows) return(df)
first <- names(df)[1] %>% sym()
df %>% add_row(!!first := rep(NA, max_rows - nrow(df)))
}) %>% bind_cols()
}
# Function to get parameter p-value classification.
get_p_value_class <- function(fit, i){
p_class <- as.character(symnum(summary(fit)$coefficients[,"Pr(>|t|)"][[i]], cutpoints = c(0,0.001,0.01,0.05,1),
symbols = c("< .001", "< .01", "< .05", "n.s.")))
}
# Read all recent tcx files.
tcx_files = list.files(path = 'C:/Users/acmsc/Documents/Strava uploads/', pattern = "tcx$", full.names = TRUE)
variables <- 13
output <- matrix(ncol=variables, nrow=(length(tcx_files)))
for (i in c(1:length(tcx_files))){
rouvy_tcx_file <- tcx_files[i]
mark_als_XML <- XML::xmlParse(rouvy_tcx_file)
df1 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:HeartRateBpm",
namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(HR = Value) %>% mutate(HR = as.numeric(HR))
df2 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:Time",
namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(Time = text)
df3 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:AltitudeMeters",
namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% rename(Altitude = text) %>% mutate(Altitude = as.numeric(Altitude))
df4 <- xmlToDataFrame(mark_als_XML, nodes=getNodeSet(mark_als_XML, "//nm:DistanceMeters",
namespaces=c(nm="http://www.garmin.com/xmlschemas/TrainingCenterDatabase/v2"))) %>% filter(row_number()!=1) %>% rename(Cum_Distance = text) %>% mutate(Cum_Distance = round(as.numeric(Cum_Distance),0))
part1_df <- bind_cols_fill(list(df1,df2,df3,df4)) %>%
mutate_all(~ifelse(is.na(.x), mean(.x, na.rm = TRUE), .x))
part2_df <- do.call("rbind", xpathApply(mark_als_XML, "//ns3:TPX", function(N) {
data.frame(
Watt = as.numeric(xmlValue(N[["Watts"]]))
)
}))
tcx_df <- cbind(part1_df, part2_df) %>%
mutate(Elevation_change_m = Altitude - lag(Altitude)) %>%
mutate(Elevation_change_m = if_else(is.na(Elevation_change_m), 0, Elevation_change_m)) %>%
mutate(Timestamp = hms::as_hms(row_number())) %>%
mutate(Cum_Distance = Cum_Distance / 1000) %>%
select(Time, Timestamp, HR, Watt, Elevation_change_m, Cum_Distance)
When <- paste0(substring(min(df2$Time),1,10), " xx-", substring(min(df2$Time),15,19))
# apply linear model.
linearMod <- lm(lead(HR, n = 10) ~ SMA(Watt, n = 25) + Timestamp + Elevation_change_m, data=tcx_df)
b0 <- as.numeric(round(linearMod$coefficients[1],0))
b1 <- as.numeric(round(linearMod$coefficients[2],4))
b2 <- as.numeric(round(linearMod$coefficients[3],6))
b3 <- as.numeric(round(linearMod$coefficients[4],4))
R2 <- as.numeric(round(summary(linearMod)$r.squared,2))
R2_p_value <- as.character(symnum(summary(linearMod)$fstatistic %>% {unname(pf(.[1],.[2],.[3],lower.tail=FALSE))} , cutpoints = c(0,0.001,0.01,0.05,1),
symbols = c("< .001", "< .01", "< .05", "n.s.")))
b0_p_value <- get_p_value_class(linearMod, 1)
b1_p_value <- get_p_value_class(linearMod, 2)
b2_p_value <- get_p_value_class(linearMod, 3)
b3_p_value <- get_p_value_class(linearMod, 4)
HR_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$HR) ~ tcx_df$Cum_Distance))[2],2))
Power_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$Watt) ~ tcx_df$Cum_Distance))[2],2))
output[i,] <- c(When,R2,R2_p_value,b0,b1,b2,b3,b0_p_value,b1_p_value,b2_p_value,b3_p_value,HR_distance_slope,Power_distance_slope)
}
new_virtual_rides_lm_hr_HR_change_df <- data.frame(output) %>%
set_colnames(c("When","R2","R2_p_value","b0","b1","b2","b3","b0_p_value","b1_p_value","b2_p_value","b3_p_value", "HR_distance_slope","Power_distance_slope")) %>%
mutate(hr_HR_change = round(as.numeric(b2) * 3600,0))
# export new virtual activities LM data to csv.
write_csv2(new_virtual_rides_lm_hr_HR_change_df, "data/new_virtual_rides_lm_hr_HR_change.csv")
#csv_check <- read_csv2(file = "data/new_virtual_rides_lm_hr_HR_change.csv", col_names = TRUE)
rm(bind_cols_fill,tcx_files,rouvy_tcx_file,mark_als_XML,vv_files,tcx_df,df1,df2,df3,df4,linearMod,output,part1_df,part2_df,b0,b1,b2,b3,b0_p_value,b1_p_value,b2_p_value,b3_p_value,i,R2,R2_p_value,When, variables)
rm(HR_distance_slope, Power_distance_slope, get_p_value_class)
old_virtual_rides_lm_hr_HR_change_df <- read_csv2(file = "data/old_virtual_rides_lm_hr_HR_change.csv", col_names = TRUE)
virtual_rides_lm_hr_HR_change_df <- rbind(old_virtual_rides_lm_hr_HR_change_df,new_virtual_rides_lm_hr_HR_change_df)
virtual_rides_all_metrics <- virtual_activities_df %>% left_join(virtual_rides_lm_hr_HR_change_df, by = c("When"))
library(epiDisplay)
epiDisplay::tab1(factor(virtual_rides_all_metrics$Intensity_perc_class, levels = c("Endurance/recovery Ride - lower than 65%","Moderate ride - 65-80%","Tempo ride - 80-95%","Time trial or race - 95-105%","Short time trial or race - 105% and higher")), cum.percent = TRUE, graph = FALSE)
## factor(virtual_rides_all_metrics$Intensity_perc_class, levels = c("Endurance/recovery Ride - lower than 65%", :
## "Moderate ride - 65-80%", "Tempo ride - 80-95%", "Time trial or race - 95-105%", :
## "Short time trial or race - 105% and higher")) :
## Frequency Percent Cum. percent
## Endurance/recovery Ride - lower than 65% 76 6.0 6.0
## Moderate ride - 65-80% 159 12.6 18.7
## Tempo ride - 80-95% 384 30.5 49.2
## Time trial or race - 95-105% 384 30.5 79.8
## Short time trial or race - 105% and higher 254 20.2 100.0
## Total 1257 100.0 100.0
time_trials <- virtual_rides_all_metrics %>% filter(Intensity_perc_class == "Short time trial or race - 105% and higher") %>%
filter(!is.na(hr_HR_change) & Variability_index <= 1.07 & Dist_km > 5 & Dist_km < 66 & Activity_ID != 4337793747 & Activity_ID != 4787832167 & Activity_ID != 3455449898 & Activity_ID != 2206457124) %>%
mutate(Duration_class = factor(ifelse(Duration < 3600, "< 1 hour", ifelse(Duration >= 3600 & Duration < 5400, "1-1.5 hour", "> 1.5 hour")), levels = c("< 1 hour","1-1.5 hour", "> 1.5 hour"))) %>%
mutate(Elv_m_4_class = factor(ifelse(Elv_m < 150, "< 150m",
ifelse(Elv_m >= 150 & Elv_m < 500, "150-500m",
ifelse(Elv_m >= 500 & Elv_m < 1000, "500-1000m", "> 1000m"))), levels = c("< 150m","150-500m","500-1000m", "> 1000m"))) %>%
mutate(Elv_m_3_class = factor(ifelse(Elv_m >= 1000, "> 1000m",
ifelse(Elv_m >= 500 & Elv_m < 1000, "500-1000m", "< 500m")), levels = c("> 1000m","500-1000m", "< 500m"))) %>%
mutate(HR_change = as.numeric(b2) * Elapsed_Time)
epiDisplay::tab1(time_trials$Elv_m_4_class, cum.percent = TRUE, graph = FALSE)
## time_trials$Elv_m_4_class :
## Frequency Percent Cum. percent
## < 150m 13 6.4 6.4
## 150-500m 41 20.1 26.5
## 500-1000m 92 45.1 71.6
## > 1000m 58 28.4 100.0
## Total 204 100.0 100.0
Table 2 gives some measures of dispersion and central
tendency. The table includes the median, range, and the first and third
quartiles. Where the range is a measure of where the beginning and end
are in a set, the interquartile range (IQR) is a measure of where the
bulk of the values lie. The IQR specifies where the “middle fifty” is in
a data set: between the first and third quartiles.
Apart from the
common feature of very high relative intensity, the other variables vary
considerably. The indicator that triggered this writing, the HR change,
also has a wide range from 4 to 43.
The proportion expained variance
in the multipe regression model (R²) range is 0.11 - 0.98. Coefficient
b2 of the multiple model is the basis for the HR change (b2 times number
of seconds equals HR change). HR_distance_slope and Power_distance_slope
are the b1’s of two simple regressions on z-score normalized variables,
viz. HR-cumulative distance and power-cumulative distance, to be
discussed in paragraph 6.
time_trials_summary_part1 <- time_trials %>% mutate(Elapsed_Time_minutes = Elapsed_Time/60, HR_change_km = round(HR_change / Dist_km,2)) %>% dplyr::select(HR_change,HR_change_km,Elapsed_Time_minutes,Dist_km,Elv_m,Speed_kmh,Pwr_W,Weighted_Avg_Pwr_W,Variability_index,Cad,Heart,Max_Heart,W_HR_ratio,Energy_kJ,Cal,Training_Load,Relative_Effort,Points_in_red)
tts1 <- do.call(cbind, lapply(time_trials_summary_part1, summary))
tts2 <- as.data.frame(t(tts1)) %>%
mutate(across(where(is.numeric), ~ if_else(!grepl("W_HR_ratio|Variability_index|HR_change_km", rownames(as.data.frame(t(tts1)))), round(., digits = 0),.))) %>%
mutate(across(where(is.numeric), ~ format(.,big.mark = ',', decimal.mark = '.'))) %>%
mutate(across(where(is_character), ~ str_replace(., ".000|.00", "")))
time_trials_summary_part2 <- time_trials %>% dplyr::select(R2,b2,HR_distance_slope,Power_distance_slope) %>%
mutate_all(~as.numeric(.))
tts3 <- do.call(cbind, lapply(time_trials_summary_part2, summary))
tts4 <- as.data.frame(t(tts3)) %>%
mutate(across(where(is.numeric), ~ if_else(!grepl("b2", rownames(as.data.frame(t(tts3)))), round(., digits = 2),.))) %>%
mutate(across(where(is.numeric), ~ format(.,big.mark = ',', decimal.mark = '.'))) %>%
mutate(across(where(is_character), ~ if_else(!grepl("b2", rownames(as.data.frame(t(tts3)))), str_replace(., "000", ""),.)))
rbind(tts2,tts4) %>% dplyr::select(-Mean) %>%
kbl(caption = paste0("Table 2. Median and range of virtual time trial stats. (n = ", nrow(time_trials), ")")) %>%
kable_classic(full_width = FALSE) %>%
row_spec(1:nrow(rbind(tts2,tts4)), align = "r")
| Min. | 1st Qu. | Median | 3rd Qu. | Max. | |
|---|---|---|---|---|---|
| HR_change | 4 | 21 | 26 | 32 | 43 |
| HR_change_km | 0.070 | 0.570 | 0.880 | 1.202 | 4.11 |
| Elapsed_Time_minutes | 22 | 57 | 70 | 83 | 144 |
| Dist_km | 9 | 23 | 31 | 40 | 66 |
| Elv_m | 0 | 492 | 734 | 1,089 | 2,044 |
| Speed_kmh | 10 | 20 | 30 | 34 | 47 |
| Pwr_W | 224 | 236 | 242 | 250 | 289 |
| Weighted_Avg_Pwr_W | 237 | 241 | 247 | 256 | 298 |
| Variability_index | 1 | 1.010 | 1.020 | 1.040 | 1.07 |
| Cad | 64 | 73 | 77 | 79 | 88 |
| Heart | 135 | 145 | 150 | 153 | 162 |
| Max_Heart | 155 | 165 | 168 | 171 | 179 |
| W_HR_ratio | 1.507 | 1.623 | 1.672 | 1.728 | 2 |
| Energy_kJ | 380 | 814 | 1,025 | 1,.000 | 2,038 |
| Cal | 423 | 908 | 1,143 | 1,338 | 2,272 |
| Training_Load | 52 | 112 | 141 | 168 | 266 |
| Relative_Effort | 37 | 124 | 160 | 202 | 343 |
| Points_in_red | 3 | 40 | 82 | 120 | 260 |
| R2 | 0.110 | 0.650 | 0.80 | 0.87 | 0.98 |
| b2 | 0.000656 | 0.004344 | 0.00635 | 0.00848 | 0.02338 |
| HR_distance_slope | 0.000 | 0.060 | 0.09 | 0.13 | 0.38 |
| Power_distance_slope | -0.090 | -0.010 | 0.01 | 0.02 | 0.19 |
To gain some insight into the relationships between these
variables, we run a correlation matrix. When discussing the matrix, I
would like to focus on the coefficients of HR change (on the first row).
There is a negative relationship with the length/duration variables
(elapsed time, distance). And also with metrics that measure the total
amount of effort (energy, calories, training load, relative effort). A
weak positive relationship between HR change and maximum heart rate can
be observed. The conclusion is that on average during the entire ride,
the heart rate is most accelerated in shorter efforts and efforts where
the total amount of work has remained within limits. In these cases the
maximum HR achieved is slightly higher.
cor_data <- time_trials %>% dplyr::select(HR_change,Elapsed_Time,Dist_km,Elv_m,Speed_kmh,Pwr_W,Weighted_Avg_Pwr_W,Variability_index,Cad,Heart,Max_Heart,W_HR_ratio,Energy_kJ,Cal,Training_Load,Relative_Effort,Points_in_red,Intensity_perc)
cor_data_matrix <- Hmisc::rcorr(as.matrix(cor_data))
cor_data_matrix_r <- cor_data_matrix$r
cor_data_matrix_P <- cor_data_matrix$P
corrplot::corrplot(cor_data_matrix_r, p.mat = cor_data_matrix_P, sig.level = 0.05, type = "upper", diag = FALSE, order = "original",
insig = 'blank', tl.cex = 0.7, tl.col = "black", tl.srt = 45,
title=paste0('Correlation matrix (p < 0.05, insignificant cells suppressed. (n = ', nrow(time_trials), ')'), mar=c(0,0,1,0),tl.offset = 0.5, cex.main = 0.8, col.main = "black")
Figure 6. Correlation matrix List of figures
Let’s explore a few HR change relationships further, starting
with training load.
Figure 7a is the scatter plot. When you look at
the point cloud, do you expect a significant relationship?
ggplot(time_trials, aes(y = Training_Load, x = as.numeric(HR_change))) +
geom_point() +
theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load", color='elevation', fill = 'elevation', shape = 'elevation',
title = "Scatterplot HR change and training load",
subtitle = paste0(nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
caption = paste0("Devices: Tacx Neo, Wahoo Kickr"))
Figure 7a. Scatterplot HR change and training load List of figures
This significant relationship exists according to the
correlation matrix (Pearson’s r = -0.33201, p<0.01), see Figure 7b.
library(ggtext)
ggplot(time_trials, aes(y = Training_Load, x = as.numeric(HR_change))) +
geom_point() +
geom_smooth(method = 'lm', formula = y ~ splines::bs(x, 4), se = FALSE, colour="red", linewidth=2) +
geom_smooth(method = 'lm', se = FALSE, colour="blue", linewidth=1.5) +
theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load", color='elevation', fill = 'elevation', shape = 'elevation',
title = "Trend HR change and training load (<span style='color: blue;'>linear</span> and <span style='color: red;'>non-linear</span>)",
subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
caption = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
theme(plot.title = element_markdown())
Figure 7b. Relationship HR change and training load List of figures
We can break down the relationship into elevation meters,
see Figure 7c. In each of the three elevation categories, on average the
lower the training load, the higher the HR change.
ggplot(time_trials, aes(y = Training_Load, x = as.numeric(HR_change),
shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
geom_point() +
geom_smooth(method = 'lm', se = FALSE) +
scale_color_manual(values=c('red','darkgreen','blue')) +
scale_shape_manual(values = c(24, 16, 3)) +
theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Training load", color='elevation', fill = 'elevation', shape = 'elevation',
title = "Trend HR change and training load, by elevation",
subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
caption = paste0("Devices: Tacx Neo, Wahoo Kickr"))
Figure 7c. Relationship HR change, training load and elevation List of figures
There is a negative relationship HR change - duration, in
each of the three elevation categories.
ggplot(time_trials, aes(y = as.POSIXct(Duration, format = "%H:%M:%S"), x = as.numeric(HR_change),
shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
geom_point() +
geom_smooth(method = 'lm', se = FALSE) +
scale_color_manual(values=c('red','darkgreen','blue')) +
scale_shape_manual(values = c(24, 16, 3)) +
theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Elapsed time", color='elevation', fill = 'elevation', shape = 'elevation',
title = "Relationship HR change and duration, by elevation",
subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
caption = paste0("Devices: Tacx Neo, Wahoo Kickr"))
Figure 8. Relationship HR change, duration and elevation List of figures
There is a negative relationship HR change - energy, in each
of the three elevation categories.
Figures 7c, 8, and 9 are almost
copies of each other, indicating the strong relationship between
training load, energy and duration in the examined group of efforts that
was already demonstrated in the correlation matrix.
ggplot(time_trials, aes(y = Energy_kJ, x = as.numeric(HR_change),
shape = as.factor(Elv_m_3_class) , color = as.factor(Elv_m_3_class))) +
geom_point() +
geom_smooth(method = 'lm', se = FALSE) +
scale_color_manual(values=c('red','darkgreen','blue')) +
scale_shape_manual(values = c(24, 16, 3)) +
theme(plot.title = element_text(size=17), axis.title=element_text(size=9), legend.position = "right") +
labs(x="HR change (bpm), defined as: LM model b2 coefficient * elapsed time (seconds)", y="Energy (kJ)", color='elevation', fill = 'elevation', shape = 'elevation',
title = "Relationship HR change and Energy (kJ), by elevation",
subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides (with intensity >= 105%), 2017 - ", year(today())),
caption = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
scale_y_continuous(labels = scales::comma_format(big.mark = ',', decimal.mark = '.'))
Figure 9. Relationship HR change, energy and elevation List of figures
For each ride we performed two simple regressions with distance as the independent variable: the first with power as the dependent variable, the second with HR. In order to compare the two slope angles (the model b1 coefficient), the dependent variables are normalized by z-scores.
HR_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$HR) ~ tcx_df$Cum_Distance))[2],2))
Power_distance_slope <- as.numeric(round(coef(lm(scale(tcx_df$Watt) ~ tcx_df$Cum_Distance))[2],2))
A positive slope corresponds to an increase, a negative one to decrease. The heart rate increases in all rides, but the power, which we would like to keep constant during the ride, appears to drop quite often (see Figure 10). Apparently, on quite a number of rides I started too energetically. Fortunately, the median is just greater than zero (0.01).
slope1 <- time_trials %>% dplyr::select(HR_distance_slope) %>% rename(slope = HR_distance_slope) %>%
mutate(trend = 'HR distance')
slope2 <- time_trials %>% dplyr::select(Power_distance_slope) %>% rename(slope = Power_distance_slope) %>%
mutate(trend = 'Power distance')
slopes <- rbind(slope1,slope2) %>% mutate(slope = as.numeric(slope))
slope_median <- slopes %>%
group_by(trend) %>%
summarise(grp_median = median(slope))
ggplot(slopes, aes(x = slope)) +
geom_histogram(aes(color = trend, fill = trend),alpha = 0.3, position = "identity") +
scale_fill_manual(values = c("red", "darkgreen")) +
scale_color_manual(values = c("red", "darkgreen")) +
labs(x="slope", y="count",
title = "Slopes (regression b's) of normalized distance-power (W)\n and distance-HR (bpm) trends",
subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides, 2017 - ", year(today())),
caption = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
theme(legend.text = element_text(size=7), axis.title=element_text(size=9)) +
geom_label(aes(0.2, 35, hjust = 0, vjust = 0, label = paste0("median HR slopes: ", slope_median[1,2], "\n",
"median power slopes: ", slope_median[2,2])), data.frame())
Figure 10a. Histogram regression b’s normalized trends List of figures
Figure 10a is an interesting result of our exploratory
research in this article. So let’s enhance this graph a bit…
A
density plot visualizes the distribution of data over a continuous
interval. This type of chart is a representation of the distribution of
a numeric variable that uses a kernel density estimate (KDE) to show the
probability density function of the variable. An advantage density plots
have over histograms is that they’re better at determining the
distribution shape because they’re not affected by the number of bins
used. Figure 11 shows the result. A typical example of a compact graph
with a clear message: don’t go too fast in the first part of your time
trial!
ggplot(slopes, aes(x = slope)) +
geom_histogram(aes(y = after_stat(density), color = trend),
fill = "white",position = "identity")+
geom_density(aes(color = trend), linewidth = 1, adjust = 1, kernel = c("gaussian")) +
scale_color_manual(values = c("red", "darkgreen")) +
labs(x="slope", y="density",
title = "Density plot slopes (regression b's) of normalized distance-power (W)\n and distance-HR (bpm) trends",
subtitle = paste0(" in ", nrow(time_trials), " virtual time trial rides, 2017 - ", year(today())),
caption = paste0("Devices: Tacx Neo, Wahoo Kickr")) +
theme(legend.text = element_text(size=7), axis.title=element_text(size=9))
Figure 10b. Density plot regression b’s normalized trends List of figures
What have I learned while writing this document? First of all, the HR
change within the selection of heavy time trials shows a wide variation
in scores. The median of the HR change is 26 bpm, a quarter of the
scores are below 21 bpm, a quarter above 32 bpm. Apparently the group of
selected time trials is less homogeneous than expected (see table
2).
Regarding HR change, the correlation matrix (Figure 6) shows a
negative relationship with the length/duration variables (elapsed time,
distance). And also with metrics that measure the total amount of effort
(energy, calories, training load, relative effort). A weak positive
relationship between HR change and maximum heart rate can be observed.
The conclusion is that on average during the entire ride, the heart rate
is most accelerated in shorter efforts and efforts where the total
amount of work has remained within limits. In these cases the maximum HR
achieved is slightly higher.
The second conclusion concerns the
power trends. In a time trial we want to keep the power as high and
constant as possible in order to achieve the best performance. The aim
is a flat power line. We especially want to prevent declining power.
Given the number of efforts with a negative power slope angle, I am not
always successful in this (see Figures 10a-b).
Figure 1. Punta Veleno
Figure 2. Passo Gavia
Figure 3. Potash
Road
Figure 4. Oostende - Blankenberge
Figure 5. Gascony prologue
Figure 6. Correlation matrix
Figure 7a. Scatterplot HR change and training load
Figure 7b.
Relationship HR change and training load
Figure
7c. Relationship HR change, training load and
elevation
Figure 8. Relationship HR change,
duration and elevation
Figure 9. Relationship HR
change, energy and elevation
Figure 10a. Histogram regression b’s normalized trends
Figure
10b. Density plot regression b’s normalized
trends
Stack Overflow
https://stackoverflow.com/
Strava
https://www.strava.com
VeloViewer
https://www.veloviewer.com
Footnotes
Geographer, data-analist and cyclist↩︎
The relationship between power and heart rate in a
virtual ride with five ascents, and a potentially interesting HR change
indicator as a spin-off
https://rpubs.com/JSchakenraad/1187324↩︎
A simple moving average (SMA) is calculated by taking the arithmetic mean of a given set of values over a specified period.↩︎
The weighted average power indicator is very similar to normalized power↩︎