Mapping My Runs

Background

I try to run around 10km most days. Like a lot of runners now, I use a smartphone app to track my runs. There are several available; the one I use is called ‘Runkeeper’ - as seen below:

During the run, the app makes various announcements - for example it announces every km that I have completed, and the average pace during that km. It also tracks the runs and uploads them to my Runkeeper login, where I can download them onto my computer. This allows me to analyse the runs.

Exploring my Run Data

I’m using the R programming language to explore the data (https://cran.r-project.org). It’s fairly easy to use once you start - initially it may be easier to use a menu-driven map package but:

R enables data to be re-formatted when it isn’t in a standard form (like it is here).
It makes the results easy to reproduce or modify (just pass on the code).

I’ll include the R code to demonstrate how I have processed and analysed the data - but this isn’t meant to be an R tutorial - more an explanation of how I carry out my exploration of the run data. R just happens to be a tool that I use.

The data itself consists of several files - one for each run. These can be read into R using the code below - assuming you have a folder containing all of the run files (each one is a gpx file whose name is the date and time of the run).

# Include an R add-on to read the kind of files the runs are stored in
library(rgdal)
# Find all of the gpx files
all_gpx <- dir(pattern=".*gpx$")
# ... and how many are in the list
n <- length(all_gpx)

# Read each one in turn and store the results in a list
run_tracks <- vector(mode = "list", length = length(all_gpx))
for (i in 1:n) run_tracks[[i]] <- readOGR(dsn=all_gpx[i],layer='tracks')

The list run_tracks now has all of the tracks in a single list that R can work with. At this stage there are a mixture of routes for the runs. A lot of them follow similar routes, but there are variants. Firstly, all of the routes will be explored, but then I’ll focus on a specific route, along the Royal Canal and the grounds of Carton House in Maynooth.
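A note for anyone re-running this today: the rgdal package used above was retired from CRAN in 2023, so it may no longer install. Below is a minimal sketch of the same reading step using the sf package instead - this is a swapped-in alternative, not what was used for the maps in this post. The tiny GPX file and the name toy_track are made up so that the example is self-contained.

```r
library(sf)

# Write a made-up GPX track to a temporary file
gpx_file <- tempfile(fileext = ".gpx")
writeLines(c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">',
  '  <trk><name>Toy run</name><trkseg>',
  '    <trkpt lat="53.380" lon="-6.590"/>',
  '    <trkpt lat="53.381" lon="-6.585"/>',
  '    <trkpt lat="53.382" lon="-6.580"/>',
  '  </trkseg></trk>',
  '</gpx>'
), gpx_file)

# Read the 'tracks' layer, as readOGR(..., layer='tracks') does above
toy_track <- st_read(gpx_file, layer = "tracks", quiet = TRUE)
```

With a folder of real files, the same st_read call could be applied to each file name in turn in place of readOGR.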

Creating the First Map

Packages are kind of add-ons to R. There are so many of these - over 10,000 - that it would be impractical in computing terms to load all of them every time R is used, so individual ones are loaded when needed. Below, the leaflet package is used to create a map of each of the routes. Use is also made of the dplyr package that allows R to carry out ‘pipeline’ style data processing.

library(leaflet) # For the interactive maps
library(dplyr)   # For the %>% pipeline operator
n <- length(run_tracks)
leaflet() %>% addTiles() -> all_map
for (i in 1:n) 
  all_map %>% 
    addPolylines(data=run_tracks[[i]],col='indianred') -> all_map
all_map

The map above covers quite a large area, because there are runs recorded in the Dublin area, the North East of England and Dundee in Scotland. These just reflect runs I took while in Maynooth (most of the time) but also while travelling to Dundee and Durham. leaflet automatically chooses a map window that accommodates all of the data - but it is possible to zoom in on individual routes. Here, the colour of the mapped routes (‘indianred’) is chosen to contrast with the backdrop map. Three ‘blobs’ can be seen on the map - try zooming in on one of these (say the one in Dundee) - do this by ‘sliding’ the map until the Dundee ‘blob’ is in the centre, then press the ‘+’ button repeatedly to zoom in. You may find it helps to re-position the red blob to the centre between zooms. Eventually a distinct path (rather than a blob) can be seen - showing a couple of runs that I made while staying in Dundee in 2015. The routes shown are semi-transparent (a default when leaflet draws routes) and are based on latitude and longitude readings continually recorded by the smartphone GPS during my run.

Don’t Get Lost

One of the problems with the last map is that it is possible to get lost zooming and panning - so that one could end up, for example, somewhere in New York at a quite detailed zoom level, but nowhere near the running routes. The setMaxBounds function automatically pulls the map back so that the mapped data is central if you stray away. Here I take the bounding box of the first run - which has latitudes 053°22’35.9”N to 053°23’25.4”N and longitudes 006°36’20.7”W to 006°33’17.3”W.

run_tracks[[1]]@bbox -> llbounds
all_map %>% setMaxBounds(llbounds[1],llbounds[2],llbounds[3],llbounds[4])
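The four values passed to setMaxBounds are, in order, the west longitude, south latitude, east longitude and north latitude. The small sketch below shows why indexing the bounding box with 1 to 4 gives that order. Here toy_bbox is a made-up matrix laid out the way sp stores a bounding box (rows x and y, columns min and max), and dms_to_decimal is a hypothetical helper for converting the degrees, minutes and seconds quoted above.

```r
# Made-up bounding box with the same layout as the bbox slot of an sp object
toy_bbox <- matrix(c(-6.61, 53.37, -6.55, 53.39), nrow = 2,
                   dimnames = list(c("x", "y"), c("min", "max")))

# R stores matrices column by column, so single-index access runs
# x-min, y-min, x-max, y-max - that is lng1, lat1, lng2, lat2
west  <- toy_bbox[1]
south <- toy_bbox[2]
east  <- toy_bbox[3]
north <- toy_bbox[4]

# Hypothetical helper: degrees, minutes and seconds to decimal degrees
dms_to_decimal <- function(deg, min, sec) deg + min / 60 + sec / 3600

# 53 degrees, 22 minutes, 35.9 seconds North, as quoted for the first run
south_first_run <- dms_to_decimal(53, 22, 35.9)
```

Western longitudes are negative in decimal degrees, which is why the toy x values above carry a minus sign.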

Alternative Backdrops - Down in Black and White

By default, leaflet provides a standard OpenStreetMap (OSM - http://openstreetmapdata.com ) map as a backdrop, although other alternatives are possible. Here, addProviderTiles can be used:

leaflet() %>% addProviderTiles('OpenStreetMap.BlackAndWhite') -> bw_map
for (i in 1:n) 
  bw_map %>% 
    addPolylines(data=run_tracks[[i]],col='indianred') %>%
    setMaxBounds(llbounds[1],llbounds[2],llbounds[3],llbounds[4])-> bw_map
bw_map

In this case, the map provider is still OSM - but a near-monochrome map is used, so that the running routes stand out clearly, since they are in colour. I’ve kept the boundary locking switched on here.

Finding Structure in the Data

There are several different runs illustrated here - but clearly, although some runs are more or less along the same route, others are quite different. The following code attempts to identify the distinct run route groupings. It does this via the following steps:

Identify a ‘distance’ between each pair of routes - essentially a measure of how similar or different two routes are.
Find groups of routes with very little difference, and sort the routes into these groups
Consider each group separately

This process is relatively complex, and I won’t go into detail here (beyond the overview above) - but the code to do this is shown below:

library(rgeos) # For the gDistance function.


dists <- matrix(0,n,n) # Set up a distance matrix

run_tracks2 <- run_tracks # Make a copy of the running route list

# Transform it to a projected coordinate system (better for distances) - note
# that EPSG:3857 is Web Mercator rather than Irish National Grid
for (i in 1:n) run_tracks2[[i]] <- spTransform(run_tracks[[i]],CRS('+init=epsg:3857')) 

# Compute the distances
for (i in 1:n) 
  for (j in 1:n)
    if (i < j) dists[i,j] <- gDistance(run_tracks2[[i]],run_tracks2[[j]],hausdorff=TRUE)
  
dists <- dists + t(dists)    
dists <- as.dist(dists)

# Sort the routes into similar groups - here there are 5
htree <- hclust(dists,method='ward.D2')
run_type <- cutree(htree,k=5)
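To make the idea of a ‘distance’ between routes more concrete, here is a small base R sketch on three made-up routes. The hausdorff function below is a simplified stand-in that only compares the recorded points of each route - it illustrates the idea rather than reproducing exactly what gDistance does - and toy_routes, toy_dists and toy_groups are hypothetical names.

```r
# Simplified Hausdorff distance between two routes, each a two-column
# matrix of x/y coordinates (assumed to be in a projected system)
hausdorff <- function(a, b) {
  na <- nrow(a)
  nb <- nrow(b)
  # Distances from every point of a (rows) to every point of b (columns)
  d <- as.matrix(dist(rbind(a, b)))[seq_len(na), na + seq_len(nb), drop = FALSE]
  # Largest 'nearest point' gap, looking in both directions
  max(apply(d, 1, min), apply(d, 2, min))
}

# Three made-up routes: the first two nearly coincide, the third is far away
toy_routes <- list(
  cbind(x = c(0, 1, 2), y = c(0, 0, 0)),
  cbind(x = c(0, 1, 2), y = c(0.1, 0.1, 0.1)),
  cbind(x = c(0, 1, 2), y = c(5, 5, 5))
)

# Fill the upper triangle, then make the matrix symmetric, as in the code above
n_toy <- length(toy_routes)
toy_dists <- matrix(0, n_toy, n_toy)
for (i in seq_len(n_toy))
  for (j in seq_len(n_toy))
    if (i < j) toy_dists[i, j] <- hausdorff(toy_routes[[i]], toy_routes[[j]])
toy_dists <- as.dist(toy_dists + t(toy_dists))

# Cut the clustering tree into two groups
toy_groups <- cutree(hclust(toy_dists, method = "ward.D2"), k = 2)
```

The first two routes end up in one group and the third in another, which is the behaviour wanted from the real grouping.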

The variable run_type now contains the classification. These can be seen on a map, if the different route groups are given different colours.

library(RColorBrewer) # For the brewer.pal function
colours <- brewer.pal(5,'Set1')
leaflet() %>% addProviderTiles('OpenStreetMap.BlackAndWhite') -> my_group_map
for (i in 1:n) 
  my_group_map %>% 
   addPolylines(data=run_tracks[[i]],col=colours[run_type[i]]) -> my_group_map
my_group_map

This identifies the five groups - two are in Maynooth (the red route looping along the Canal and past Carton House, and the blue route leading to Leixlip Confey railway station), the green route is in Howth (along Howth Head), the orange route runs along the Tay Estuary in Dundee, and finally a purple route goes through Houghall Woods in Durham.

Detailed Examination of Carton House Runs

It is also possible to select out one of these routes - here I choose the Carton House runs, and examine these in more detail. Here the boundary is focused on this route (the run_type is 1). The opacity of the routes is set very low, as this allows the accumulation of overlaid runs to build up slowly - and identify some parts of the route where there is variety in my exact choice of road or pathway. In particular, there are two distinct choices over the grounds of Carton House, and I occasionally go along the back street parallel to Maynooth Main Street.

n <- length(run_type)
leaflet() %>% addProviderTiles('OpenStreetMap.BlackAndWhite')-> carton_map
for (i in 1:n) 
  if (run_type[i] == 1) carton_map %>% 
    addPolylines(data=run_tracks[[i]],opacity=0.05,col='indianred') %>%
    setMaxBounds(llbounds[1],llbounds[2],llbounds[3],llbounds[4]) -> carton_map

carton_map

The ‘most typical’ run in the group

Having calculated the ‘distances’ earlier, these can be used to identify how typical each run is. Taking the Carton House runs, the following code looks at the distances between each pair of these, and finds the individual run that is ‘closest’ in total to all the runs in the group - a sort of ‘average’ run. Technically it is the medoid of the group, which is strongly related to the idea of a median route. Here it is added to the previous map, but shown in blue to highlight it.

rt1 <- which(run_type == 1)
d1 <- as.matrix(dists)[rt1,rt1]
m1 <- rt1[which.min(rowSums(d1))]
mr1 <- run_tracks[[m1]]
carton_map %>% 
    addPolylines(data=mr1,opacity=1,col='darkblue')

This overlays the ‘most typical’ route on all of the others, again highlighting variability.
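The ‘closest in total’ idea can be seen on a tiny made-up example in base R. Here six pretend runs are reduced to single positions along a line, and the names toy_pos, toy_type and toy_medoid are hypothetical - the real code works on whole routes, but the selection step is the same.

```r
# Made-up positions of six 'runs' along a line, and a made-up grouping
toy_pos  <- c(0, 1, 50, 2, 4, 10)
toy_type <- c(1, 1, 2, 1, 1, 1)

# All pairwise distances between the runs
toy_d <- as.matrix(dist(toy_pos))

# Keep only the distances among the runs in group 1
in_group <- which(toy_type == 1)
d_group  <- toy_d[in_group, in_group]

# Total distance from each run to the others in its group; the smallest
# total marks the most typical run. Indexing in_group maps the position
# within the group back to the position in the full list.
toy_medoid <- in_group[which.min(rowSums(d_group))]
```

Here the result is run 4 (at position 2), which sits in the middle of the group rather than at either end.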

When do I run?

The file names for the individual runs were based on the time and date of the run. The lubridate package allows the manipulation of date and time information - so that time and date patterns can be observed. Below is some code to create a new data table - with two columns - the timestamp of the run, and the run type. Here I label the run types with a description of the route - this makes the analysis more legible.

library(ggplot2)
library(tibble)
library(lubridate)
library(ggthemes) # For theme_fivethirtyeight
all_gpx %>% sub('\\.gpx','',.) %>% gsub('-','',.) %>% ymd_hm -> run_times
tibble(when=run_times,type=as.factor(c("Carton","Leixlip","Howth","Durham","Dundee")[run_type])) -> run_data
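For readers without lubridate, the same file-name parsing can be sketched in base R. The file names below are made up, and the ‘YYYY-MM-DD-HHMM.gpx’ pattern is an assumption about how the files are named - adjust the format string to suit the real names.

```r
# Made-up file names in an assumed 'YYYY-MM-DD-HHMM.gpx' pattern
toy_files <- c("2015-01-04-0930.gpx", "2015-03-14-1745.gpx")

# Strip the extension, then parse what is left as a date and time
toy_times <- as.POSIXct(sub("\\.gpx$", "", toy_files),
                        format = "%Y-%m-%d-%H%M", tz = "UTC")

# Day of the week as a number, from 0 = Sunday through to 6 = Saturday
toy_wday <- as.POSIXlt(toy_times)$wday
```

Using the numeric day of the week avoids any dependence on the language settings of the computer, which would affect day names.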

Firstly, a plot of the dates of each run type can be produced. To make the plot more readable, days are colour coded.

run_data %>% ggplot(aes(y=type,x=when)) + geom_point(aes(col=wday(when,label=TRUE))) + labs(colour='Day') + theme_fivethirtyeight()

Mostly, the Leixlip runs take place on Sundays. On one particularly enthusiastic Sunday, a Leixlip run plus one along Howth Head took place. A couple of groups of runs on trips to Durham took place (one in the week and another at a weekend) - some weekends no runs took place, I have to admit.

Times of day can also be investigated. It looks like the Durham runs tended to start later than the others. Finally, the one-off Howth Head run took place quite a lot later - mostly because a Leixlip run had already occurred on that day…

# Function to reset all dates to January 1st - really just a trick to compare time of day
# of an event for data spanning several days
date1jan <- function(x,yr) {
  y <- x
  date(y) <- dmy(yr+1010000)
  return(y)
}
## Apply this transform, then draw histograms faceted by route group
run_data %>% mutate(time_of_day=date1jan(when,2015)) -> run_data
run_data %>% ggplot(aes(x=time_of_day)) + 
  geom_histogram(fill='indianred') + facet_wrap(~type,ncol=2) + theme_fivethirtyeight()
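The reason for resetting every date to January 1st is that only the clock time is then left to vary between runs. Another way to get the same comparison, sketched here in base R as an alternative rather than the approach used above, is to convert each timestamp to decimal hours since midnight. The timestamps and the helper name hours_since_midnight are made up.

```r
# Hypothetical helper: clock time of each timestamp as decimal hours
hours_since_midnight <- function(x) {
  lt <- as.POSIXlt(x)
  lt$hour + lt$min / 60 + lt$sec / 3600
}

# Two made-up run start times on different dates
toy_when <- as.POSIXct(c("2015-01-04 09:30:00", "2015-03-14 17:45:00"), tz = "UTC")

# 09:30 becomes 9.5 and 17:45 becomes 17.75, whatever the date
toy_hours <- hours_since_midnight(toy_when)
```

The date-resetting trick has the advantage that the plot axis is labelled with clock times rather than decimal numbers.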

Time of Run as Summer Approaches

A final graph examines the time of day that the run took place, and how this altered with the passing of time from January to May.

run_data %>% mutate(date=date(when)) -> run_data
run_data %>% filter(type=='Carton') %>% 
  ggplot(aes(x=date,y=time_of_day)) + geom_point() + geom_smooth() + theme_fivethirtyeight()

Although there are a few outliers, the general trend is to start earlier as winter moves into spring, and towards summer.

Conclusions

Most running apps (including Runkeeper, the one that I use) offer the ability to record running routes, and to download these for further analysis. Doing this allows me to explore the interaction that I have with local (and other) green space, and in particular how this can vary, and how patterns of my interaction relate to other patterns in my life including time of day, weekdays vs. weekends and breaks from the ‘usual’ rhythm when I travel.