I was wondering how I perform against other skyrunners on ascents and descents. This post is the result, and you can use it to analyse your own performance in mountain running races.

Data scraping

First we need to get the data. Go to the activity that you want to analyse and right click on the table with the splits data to open the browser's inspector. In the panel that pops up on the right side, right click on the tbody id = contents line and click Copy. Then paste it into a notepad and save it in .tsv format. Repeat the same for the other runners.

I load the packages needed and create a function that extracts the data from the raw text files that we just saved.

rm(list=ls())

library(readr)
library(stringr)
library(tidyr)
library(dplyr)
setwd("~/zabawy_z_r/stravaSplits")

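scrape_splits <- function(input) {
  #data extraction: split each line at "<", then keep the text after ">" (the cell value)
  #"nic" = nothing, "dane" = data (Polish)
  splits <- read_delim(input, "<") %>%
    separate(2, c("nic","dane"),">") %>%
    select(dane) 
  #matrix spread: every 3 consecutive values (pace, gap, elev) become one row
  splits_tidy <- matrix(splits[[1]], nrow = 3) %>%
    t()%>%
    data.frame(stringsAsFactors = FALSE)

  names(splits_tidy) <- c("pace","gap","elev")
  return(splits_tidy)
}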
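
The reshaping step inside the function can be tried on its own. Below is a minimal sketch, assuming the extracted cell values arrive as one flat character vector in reading order (pace, gap, elev, then the next split); the values are the first two splits of player 1, and the names cells and wide are only illustrative.

```r
# Flat vector of cell values: pace, gap, elev for split 1, then for split 2.
cells <- c("6:00", "3:49", "95", "10:33", "4:58", "122")

# matrix() fills column by column, so with nrow = 3 each column holds one split.
wide <- matrix(cells, nrow = 3)

# t() turns those columns into rows: one row per split, three columns.
splits_tidy <- data.frame(t(wide), stringsAsFactors = FALSE)
names(splits_tidy) <- c("pace", "gap", "elev")

print(splits_tidy)
```

All columns are still character at this point, which is why elev is wrapped in as.numeric() later in the post.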

Now I specify the number of splits that we want to analyse. Next, I apply our new function to the 2 raw files. In our example, player 1 must have turned off Strava some 300 m later, so I delete this redundant 25th split. We get 2 nice tables.

split_n <- 1:24

player1 <- bind_cols(scrape_splits("ktos_jested.tsv")[-25,], split = split_n)
player2 <- bind_cols(scrape_splits("ktos2_jested.tsv"), split = split_n)

player1

##      pace   gap  elev split
## 1   6:00  3:49    95      1
## 2  10:33  4:58   122      2
## 3   3:53  3:11  -213      3
## 4   9:21  4:30   167      4
## 5   7:52  4:42    97      5
## 6   3:51  3:23  -175      6
## 7   4:07  3:59   -49      7
## 8   6:33  4:17    89      8
## 9   5:10  4:17    16      9
## 10  3:53  3:42   -99     10
## 11  9:25  4:34   139     11
## 12  6:17  4:25    64     12
## 13  5:20  4:30    23     13
## 14  4:54  4:26    -1     14
## 15  6:25  4:27    39     15
## 16  7:18  4:42    83     16
## 17  7:51  5:10   -26     17
## 18  4:07  3:56  -161     18
## 19  4:08  3:59  -127     19
## 20  9:43  4:53   157     20
## 21 11:18  5:25   169     21
## 22  6:03  5:01   -34     22
## 23  5:15  4:07  -224     23
## 24  4:16  3:39  -135     24

player2

##      pace   gap  elev split
## 1   5:41  3:48    92      1
## 2  10:28  5:02   127      2
## 3   4:13  3:29  -213      3
## 4   9:45  4:52   161      4
## 5   8:28  5:11    99      5
## 6   4:14  3:53  -170      6
## 7   4:18  4:19   -46      7
## 8   6:30  4:31    88      8
## 9   4:54  4:29     8      9
## 10  4:03  4:01   -97     10
## 11  9:34  4:46   146     11
## 12  6:20  4:41    65     12
## 13  5:32  4:50    27     13
## 14  4:36  4:35   -23     14
## 15  7:03  4:38    77     15
## 16  6:28  4:53    48     16
## 17  8:31  5:34   -42     17
## 18  4:13  4:15  -157     18
## 19  6:36  5:08   -52     19
## 20  9:47  5:13   149     20
## 21 13:12  6:19   157     21
## 22  4:16  4:18  -124     22
## 23  6:05  4:27  -223     23
## 24  3:29  3:44   -97     24

Do you like sky?

I am joining the data and adding pace in POSIX format. Then I do the first simple plot: pace by runner in each split.

data <- bind_rows(player1 = player1, player2 = player2, .id = "player") %>%
  mutate(pace_posix = as.POSIXct(pace, format = '%M:%S '))
  
library(ggplot2)
ggplot(data, aes(split, pace_posix, col = player))+
  geom_point()
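
A side note on the pace format: as.POSIXct fills in the current date for the missing date parts, which is fine for plotting but awkward for arithmetic. As an alternative, here is a minimal sketch that converts "m:ss" strings to seconds. The helper pace_to_seconds is hypothetical (it is not used elsewhere in this post) and assumes every pace has exactly a minutes and a seconds part.

```r
# Hypothetical helper: convert "m:ss" pace strings to seconds per kilometre.
pace_to_seconds <- function(pace) {
  parts <- strsplit(trimws(pace), ":", fixed = TRUE)
  vapply(parts,
         function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]),
         numeric(1))
}

# Paces of player 1 on the first three splits.
secs <- pace_to_seconds(c("6:00", "10:33", "3:53"))
print(secs)

# Comparing the raw strings can mislead: "10:33" sorts before "9:45" as text,
# although 10:33 is the slower pace.
print("10:33" < "9:45")
print(pace_to_seconds("10:33") < pace_to_seconds("9:45"))
```

Numeric seconds also make it easy to compute time differences between runners on a split.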

Now I repeat the plot, but I sort the splits by elevation, so that it is easier to capture any relationship between elevation and pace.

arranged_data <- data %>%
  arrange(player, as.numeric(elev)) %>%
  group_by(split) %>%
  mutate(mean_elev = mean(as.numeric(elev)))%>%
  ungroup(split) %>%
  arrange(player, mean_elev) %>%
  group_by(player)%>%
  mutate(ordered_splits = split_n) 


ggplot(arranged_data, aes(ordered_splits, pace_posix, col = player))+
  geom_point()
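
The ordering idea can be checked on a small piece of the data. This is a minimal base R sketch, using the elevation values of the first three splits from the tables above; the wide layout with one column per player is my assumption for the illustration, not the shape used in the pipeline.

```r
# Elevation gain per split for both players (first three splits only).
elev <- data.frame(
  split   = c(1, 2, 3),
  player1 = c(95, 122, -213),
  player2 = c(92, 127, -213)
)

# Mean elevation per split across the two players.
elev$mean_elev <- rowMeans(elev[, c("player1", "player2")])

# Sort the splits from the steepest descent to the steepest ascent.
ordered <- elev[order(elev$mean_elev), ]
print(ordered)
```

Averaging over both players gives each split a single position on the x axis, even though the two watches recorded slightly different elevation values.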

On its own, this plot is not very informative. I will classify the splits according to how steep they are, and check where I need to catch up.

#classifying splits
steep_data <- arranged_data %>%
  ungroup() %>%
  mutate(steepness = ifelse(mean_elev > 100, "very steep up", 
                            ifelse(mean_elev > 50, "steep up",
                ifelse(mean_elev > -50, "flat", 
                       ifelse(mean_elev > -100,"steep down",
                              "very steep down"))))) %>%
  #the winner of a split is the player with the lower (faster) pace on that split;
  #comparing within each split keeps the result aligned with the rows sorted by elevation
  #and compares times rather than "m:ss" strings
  group_by(split) %>%
  mutate(split_winner = player[which.min(as.numeric(pace_posix))]) %>%
  ungroup() %>%
  mutate(steepness = factor(steepness, ordered = TRUE, levels = c("very steep down","steep down", "flat", "steep up", "very steep up")))
#plot
steep_data %>%
  filter(player == "player1")%>%
  ggplot(aes(split_winner, fill = split_winner))+
  geom_bar()+
  facet_wrap(~steepness)
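
The nested ifelse calls used for the classification can also be written with base R's cut(). The sketch below is an alternative, not what the code above does; the helper name classify_steepness is hypothetical, and the breaks reproduce the same thresholds (for example, exactly 100 still counts as "steep up").

```r
# Hypothetical helper: bin mean elevation per split into ordered steepness classes.
classify_steepness <- function(mean_elev) {
  cut(mean_elev,
      breaks = c(-Inf, -100, -50, 50, 100, Inf),
      labels = c("very steep down", "steep down", "flat",
                 "steep up", "very steep up"),
      right = TRUE,            # intervals are (lower, upper]
      ordered_result = TRUE)   # keep the classes ordered for plotting
}

classes <- classify_steepness(c(-213, -79, 12, 93.5, 124.5))
print(classes)
```

Because the result is already an ordered factor, the separate factor() call with explicit levels is not needed in this variant.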

It looks like player 1 is in general better than me, and flat splits are my weak point. This is even easier to see if we change the perspective a little.

steep_data %>%
  filter(player == "player1")%>%
  ggplot(aes(steepness, fill = split_winner))+
  geom_bar()+
  facet_wrap(~split_winner)+
  theme(axis.text.x=element_text(angle=45, hjust=1))

If the relationship is not as obvious as in the chart above, we can pass the position = "fill" argument to geom_bar to show the win rates on splits of different steepness. This way I can identify my weaknesses and strengths without regard to player 2’s overall condition.

steep_data %>%
  filter(player == "player1")%>%
  ggplot(aes(split_winner, fill = steepness))+
  geom_bar(position = "fill")+
  theme(axis.text.x=element_text(angle=45, hjust=1))+
  ggtitle("Win rates on different grounds")
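
The shares drawn by position = "fill" can also be printed as numbers. Here is a minimal sketch on the first six splits only: the winners and steepness classes below were worked out by hand from the two tables above (faster pace wins; steepness from the mean elevation of both players), so treat it as an illustration rather than the full result.

```r
# First six splits: winner by pace, steepness class by mean elevation.
first_six <- data.frame(
  split        = 1:6,
  split_winner = c("player2", "player2", "player1",
                   "player1", "player1", "player1"),
  steepness    = c("steep up", "very steep up", "very steep down",
                   "very steep up", "steep up", "very steep down")
)

# Count the wins per player and steepness class.
counts <- table(first_six$split_winner, first_six$steepness)

# margin = 1 normalises each row, i.e. each player's wins sum to 1,
# which is what a filled bar per split_winner shows.
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```

With all 24 splits, the same two calls applied to steep_data (filtered to one player so that each split is counted once) should give the numbers behind the chart.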

Does this analysis mean that I should go and train on asphalt now?

Why not asphalt

No, I shouldn’t. This is why:

The analysis has some limitations, e.g. there may have been big ascents and descents inside the “flat” splits.
Running on asphalt is boring. So, I’m submitting this post and see you later. (Hope not to get lost again.)

How did I scrape splits data from Strava to identify my weak and strong points vs other players in mountain running

Adam Ewert-Krzemieniewski

27 May 2018
