Abstract

The purpose of this study is to extract running data from a .gpx file for subsequent analysis. The file is a Garmin export of activity data recorded, in this case, with a Garmin Forerunner 935 watch. The activity is a trail run in a mountain area, specifically in Bariloche, Argentina. The export file is based on an XML schema.

File structure analysis

We begin the analysis by loading the .gpx file, which was downloaded from the Garmin Connect website as an export of a selected activity. The chosen activity is a trail run on a mountain sector (Campanario hill), selected to include altitude variations. The run took place in the tourist area of Bariloche, in Argentine Patagonia. We load the file and parse it in R:

library(XML)  # provides xmlTreeParse, xmlRoot, xmlValue, xmlAttrs
doc <- xmlTreeParse("20181129-run.gpx", useInternalNodes = TRUE)
top <- xmlRoot(doc)
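To experiment with the parsing step without the original export, a minimal sketch can parse a small GPX string held in memory. The sample content below is hand-written for illustration (it only mimics the structure of a Garmin export, and the names and values in it are assumptions); `asText = TRUE` tells `xmlTreeParse` to treat its first argument as XML text rather than a file name.

```r
library(XML)

# Hypothetical, hand-written GPX fragment (not taken from the real export)
sample_gpx <- '<gpx version="1.1" creator="example">
  <metadata><time>2018-11-29T21:19:50.000Z</time></metadata>
  <trk>
    <name>Example Trail Run</name>
    <type>trail_running</type>
    <trkseg>
      <trkpt lat="-41.0687" lon="-71.4820"><ele>853.6</ele></trkpt>
    </trkseg>
  </trk>
</gpx>'

# Parse the string and take the root node, as done for the real file
sample_doc <- xmlTreeParse(sample_gpx, asText = TRUE, useInternalNodes = TRUE)
sample_top <- xmlRoot(sample_doc)

xmlName(sample_top)   # name of the root node
names(sample_top)     # names of its direct children
```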

At the top of the XML tree is the gpx node, which holds basic information about the .gpx file, the schema used and links to further information.

xmlName(top)
[1] "gpx"

Below that node, the tree splits into two branches:

names(top)
  metadata        trk 
"metadata"      "trk" 

The “metadata” branch holds a link to the Garmin Connect website and the starting time of the activity, which could be used as a unique activity identifier.

names(top[[1]])
  link   time 
"link" "time" 

The second branch, “trk”, holds the relevant information about the activity; the name appears to refer to the “track” data. This is the branch of interest for the analysis, and it has three branches:

names(top[[2]])
    name     type   trkseg 
  "name"   "type" "trkseg" 

The first two branches, “name” and “type”, describe the activity: “name” is a general reference to the activity and its geographical location, and “type” is the type of activity performed. The third branch, “trkseg”, holds the data for each track point (“trkpt”) saved during the activity. If we take a look at the structure of a point:

head(xmlChildren(top[[2]][[3]]),1)
$trkpt
<trkpt lat="-41.06869638897478580474853515625" lon="-71.48198059760034084320068359375">
  <ele>853.5999755859375</ele>
  <time>2018-11-29T21:19:50.000Z</time>
  <extensions>
    <ns3:TrackPointExtension>
      <ns3:atemp>28.0</ns3:atemp>
      <ns3:hr>110</ns3:hr>
      <ns3:cad>67</ns3:cad>
    </ns3:TrackPointExtension>
  </extensions>
</trkpt> 

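We can see that each track point provides the latitude and longitude (as attributes of the trkpt node), the elevation, the timestamp and, inside the extensions node, the ambient temperature, heart rate and cadence.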
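As a minimal sketch of how those fields can be read from a single track point, the example below parses a hand-written `trkpt` string and looks up each value by name. The sample string, the `point` list and the namespace URI bound to the `ns3` prefix are assumptions made so the fragment parses on its own; the real file declares its namespaces on the gpx node.

```r
library(XML)

# Hypothetical single track point; the ns3 namespace URI is an assumption
pt_xml <- '<trkpt xmlns:ns3="http://www.garmin.com/xmlschemas/TrackPointExtension/v1"
       lat="-41.0687" lon="-71.4820">
  <ele>853.6</ele>
  <time>2018-11-29T21:19:50.000Z</time>
  <extensions>
    <ns3:TrackPointExtension>
      <ns3:atemp>28.0</ns3:atemp>
      <ns3:hr>110</ns3:hr>
      <ns3:cad>67</ns3:cad>
    </ns3:TrackPointExtension>
  </extensions>
</trkpt>'

pt <- xmlRoot(xmlTreeParse(pt_xml, asText = TRUE, useInternalNodes = TRUE))

attrs <- xmlAttrs(pt)                                   # named vector with lat and lon
ext   <- pt[["extensions"]][["TrackPointExtension"]]    # node holding atemp, hr, cad

point <- list(
  lat   = as.numeric(attrs[["lat"]]),
  lon   = as.numeric(attrs[["lon"]]),
  ele   = as.numeric(xmlValue(pt[["ele"]])),
  time  = xmlValue(pt[["time"]]),
  atemp = as.numeric(xmlValue(ext[["atemp"]])),
  hr    = as.numeric(xmlValue(ext[["hr"]])),
  cad   = as.numeric(xmlValue(ext[["cad"]]))
)
str(point)
```

Looking children and attributes up by name rather than by position keeps the extraction working even if the order of the elements changes.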

Data extraction and processing

We start by extracting the general information for the activity:

SportName <- xmlSApply(top[[2]][["name"]],xmlValue) #Garmin-assigned name for the exercise, according to geographic place
SportType <- xmlSApply(top[[2]][["type"]],xmlValue) #Garmin-assigned sport classification
SessionTime <- xmlSApply(top[[1]][["time"]],xmlValue) #Starting time (unique identifier)

We now get the information for every track point. Because the watch was configured to record a point every second, the markers overlap heavily when all of them are drawn on a map. We therefore set a “leap” factor and keep only one point out of every “leap” points, which makes the markers on the map easier to read.

TrackPoint <- top[[2]][[3]] #Track points level

#Getting the information from the xml
PointTime <- list()
latitude <- vector()
longitude <- vector()
elevation <- vector()
hr <- vector()
cadence <- vector()
temperature <- vector()
elapsedtime <- list()
Text <- list()

# We collect only part of the points to avoid too much overlapping (clustering is not a visually interesting option)
leap <- 8   # Show points every "leap" points
counter <- 0
j <- 1

for (i in seq_along(xmlChildren(TrackPoint))) {    # iterates over every track point
        counter <- counter + 1
        if ((counter == leap) || (i == 1)) {
              #Time data extraction & transformation           
              Time <- xmlValue(TrackPoint[[i]][["time"]])  ##gets time
              Time <- sub("T"," ",Time)  ## arranges date (takes "T" out)
              Time <- unlist(strsplit(as.character(Time),"[.]"))[[1]]  ##removes decimal seconds
              PosTime <- strptime(Time,"%Y-%m-%d %H:%M:%S", tz = "UTC") ## sets as POSIXlt (the trailing "Z" marks UTC)
              #Creates the list of point times by appending
              if (i == 1) {
                      PointTime <- PosTime
              } else {
                      PointTime <- append(PointTime, PosTime, after = j-1)
              }
              #Elapsed time calculation and list creation:
              if (i == 1) {
                      elapsedtime <- difftime(PointTime[1],PointTime[1], units = "secs")
              } else {
                      elapsedtime <- append(elapsedtime, difftime(PointTime[j],PointTime[1], units = "secs"), after = j-1)
              }
            #Rest of the variables extraction:  
            latitude[j] <- as.numeric(xmlAttrs(TrackPoint[[i]])[["lat"]])  ## gets latitude (by attribute name)
            longitude[j] <- as.numeric(xmlAttrs(TrackPoint[[i]])[["lon"]])   ## gets longitude (by attribute name)
            elevation[j] <- as.numeric(xmlValue(TrackPoint[[i]][["ele"]]))  ##gets elevation
            hr[j] <- as.numeric(xmlValue(TrackPoint[[i]][[3]][[1]][["hr"]]))  ##Heart Rate
            cadence[j] <- as.numeric(xmlValue(TrackPoint[[i]][[3]][[1]][["cad"]])) ##Cadence
            temperature[j] <- as.numeric(xmlValue(TrackPoint[[i]][[3]][[1]][["atemp"]])) ##Ambient Temperature
            
            #Popup text for each point 
                #Time text arrangement
                Hours <- as.integer(elapsedtime[j]/3600)
                Minutes <- as.integer((elapsedtime[j] - Hours*3600)/60)
                Seconds <- elapsedtime[j] - Hours*3600 - Minutes*60
                if (Hours != 0) {
                        PopUpTime <- paste(Hours,"h ",Minutes,"m ",Seconds,"s",sep = "")
                } else if (Minutes != 0) {
                        PopUpTime <- paste(Minutes,"m ",Seconds,"s",sep = "")
                } else {
                        PopUpTime <- paste(Seconds,"s",sep = "")
                }
                #Full text
                Text <- append(Text, paste("Elapsed Time:",PopUpTime, "<br>",
                                       "Elevation:",as.character(format(round(elevation[j], 2), nsmall = 2)),"<br>",
                                       "Heart Rate:",as.character(hr[j]),"<br>",
                                       "Cadence:",as.character(cadence[j]),"<br>",
                                       "Temperature:",as.character(temperature[j])))
            j <- j+1
            counter <- 0
        }
}
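The counter logic in the loop keeps the first point and then every `leap`-th point after it. As a quick check of which indices that selects, here is a small base R sketch. Both `kept_indices` and `kept_by_counter` are hypothetical helpers written for this illustration and are not part of the original script.

```r
# Indices kept by a "first point, then every leap-th point" rule
kept_indices <- function(n_points, leap) {
  seq(1, n_points, by = leap)
}

# Re-implementation of the loop's counter logic, for comparison only
kept_by_counter <- function(n_points, leap) {
  kept <- integer(0)
  counter <- 0
  for (i in seq_len(n_points)) {
    counter <- counter + 1
    if (counter == leap || i == 1) {
      kept <- c(kept, i)
      counter <- 0
    }
  }
  kept
}

kept_indices(20, 8)      # 1 9 17
kept_by_counter(20, 8)   # 1 9 17
```

With the indices known up front, the extraction could also be written as a loop over `kept_indices(...)` alone, without the counter.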

Note that the code also builds, for each point, a text string summarizing its data. This is relevant because clicking a marker on the map opens a popup window showing that information.
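The elapsed-time part of that popup text can also be isolated in a small function, which makes the formatting rule easier to test on its own. This is only a sketch: `format_elapsed` is a hypothetical helper, and it assumes the input is a number of seconds (or a difftime expressed in seconds).

```r
# Format elapsed seconds as "1h 2m 5s", dropping leading units that are zero
format_elapsed <- function(secs) {
  secs    <- as.numeric(secs)        # also accepts a difftime given in seconds
  hours   <- secs %/% 3600
  minutes <- (secs %% 3600) %/% 60
  seconds <- secs %% 60
  if (hours != 0) {
    paste0(hours, "h ", minutes, "m ", seconds, "s")
  } else if (minutes != 0) {
    paste0(minutes, "m ", seconds, "s")
  } else {
    paste0(seconds, "s")
  }
}

format_elapsed(3725)   # "1h 2m 5s"
format_elapsed(125)    # "2m 5s"
format_elapsed(8)      # "8s"
```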

Data visualization

We create a leaflet map with a marker for every extracted track point, positioned by its latitude and longitude and carrying the popup text built from the file:

library(leaflet)  # provides leaflet, addTiles, addMarkers, makeIcon and the %>% pipe
runcoord <- data.frame(lat=latitude,lng=longitude)

#Add a new icon 
runningicon <- makeIcon(iconUrl = "run1.png",
                        iconWidth = 15,
                        iconHeight = 15)
#Leaflet plot
runcoord %>% leaflet(width = "100%") %>% addTiles() %>% addMarkers(icon = runningicon, popup = Text)

As the map shows, a custom icon was added to match the running theme.

An additional step is to visualize the remaining variables, which can be done in a facet plot against the elapsed time:

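library(tidyr)    # pivot_longer
library(ggplot2)  # ggplot, geom_line, facet_grid
library(plotly)   # ggplotly
runtot <- data.frame(elapsedtime,elevation, hr,cadence,temperature) #converts to data frame all data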
runtot <- pivot_longer(runtot,c(elevation, hr,cadence,temperature),names_to = "Variable",values_to = "Value") #arranges in longer format

g <- ggplot(runtot, aes(x = elapsedtime, y = Value)) + geom_line(aes(color = Variable)) +
        facet_grid(rows=vars(Variable), scales = "free_y") +
        labs(x = "Elapsed Time [secs]", y = "Data Field") +
        theme(legend.position = "none") + 
        scale_color_manual(values = c("red","blue","black","darkgreen"))
ggplotly(g)
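To make the reshaping step concrete, the sketch below applies `pivot_longer` to a tiny, made-up data frame (the three rows of values are invented for illustration). Each measurement column becomes a pair of `Variable` and `Value` entries, which is the layout `facet_grid` needs to draw one panel per variable.

```r
library(tidyr)

# Hypothetical three-sample data set in wide format
wide <- data.frame(
  elapsedtime = c(0, 8, 16),
  elevation   = c(853.6, 854.1, 855.0),
  hr          = c(110, 112, 115)
)

# One row per (elapsedtime, variable) pair
long <- pivot_longer(wide, c(elevation, hr),
                     names_to = "Variable", values_to = "Value")
long
```

The three wide rows with two measurement columns become six long rows, with `elapsedtime` repeated for each variable.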