The purpose of this study is to extract running data from a GPX file for subsequent analysis. The file is a Garmin export of an activity recorded, in this case, with a Garmin Forerunner 935 watch. The activity is a trail run in a mountain area, specifically in Bariloche, Argentina. The export file is based on an XML schema.
We begin the analysis by loading the .gpx file, which was downloaded from the Garmin Connect website as an export of a selected activity. The chosen activity is a trail run on a mountain sector (Campanario hill), so that it includes altitude variations. The run took place in the tourist area of Bariloche, in Patagonia, Argentina. We load the file and parse it in R:
library(XML)
doc <- xmlTreeParse("20181129-run.gpx", useInternalNodes = TRUE)
top <- xmlRoot(doc)
At the top of the XML tree is the gpx node, which holds basic information about the .gpx file, the schema used, and links for further information.
xmlName(top)
[1] "gpx"
Below that node, the tree splits into two branches:
names(top)
metadata trk
"metadata" "trk"
The “metadata” branch holds a link to the Garmin Connect website and the starting time of the activity, which could be used as a unique activity identifier.
names(top[[1]])
link time
"link" "time"
The second branch, “trk”, holds the relevant information about the activity; the name appears to refer to the “track” data. This is the branch of interest for the analysis, and it has three branches of its own:
names(top[[2]])
name type trkseg
"name" "type" "trkseg"
The first two branches, “name” and “type”, hold information about the activity: “name” is a general reference to the activity and its geographical location, and “type” is the type of activity performed. The third branch, “trkseg”, holds the data for each track point (“trkpt”) saved during the activity. If we take a look at the point structure:
head(xmlChildren(top[[2]][[3]]),1)
$trkpt
<trkpt lat="-41.06869638897478580474853515625" lon="-71.48198059760034084320068359375">
<ele>853.5999755859375</ele>
<time>2018-11-29T21:19:50.000Z</time>
<extensions>
<ns3:TrackPointExtension>
<ns3:atemp>28.0</ns3:atemp>
<ns3:hr>110</ns3:hr>
<ns3:cad>67</ns3:cad>
</ns3:TrackPointExtension>
</extensions>
</trkpt>
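We can see that each track point provides the latitude and longitude (as attributes of the trkpt node), the elevation, the time stamp and, inside the extensions node, the ambient temperature, heart rate, and cadence.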
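As a minimal sketch of how these fields can be read by name rather than by position, the following parses a one-point fragment modelled on the track point above. The fragment itself, the namespace URIs declared in it, and the names `demo_txt`, `demo_doc`, `pt`, and `pt_fields` are illustrative assumptions, not part of the exported file.

```r
library(XML)

# Illustrative one-point GPX fragment (values shortened from the point shown above).
# The namespace URIs are assumptions; a real file declares its own.
demo_txt <- paste0(
  '<gpx xmlns="http://www.topografix.com/GPX/1/1" ',
  'xmlns:ns3="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">',
  '<trk><trkseg>',
  '<trkpt lat="-41.0687" lon="-71.4820">',
  '<ele>853.6</ele><time>2018-11-29T21:19:50.000Z</time>',
  '<extensions><ns3:TrackPointExtension>',
  '<ns3:atemp>28.0</ns3:atemp><ns3:hr>110</ns3:hr><ns3:cad>67</ns3:cad>',
  '</ns3:TrackPointExtension></extensions>',
  '</trkpt></trkseg></trk></gpx>'
)
demo_doc <- xmlParse(demo_txt, asText = TRUE)
pt <- xmlRoot(demo_doc)[["trk"]][["trkseg"]][["trkpt"]]

# Sensor readings sit two levels down, inside the extensions node
ext <- pt[["extensions"]][["TrackPointExtension"]]

pt_fields <- list(
  lat   = as.numeric(xmlAttrs(pt)[["lat"]]),  # coordinates are attributes
  lon   = as.numeric(xmlAttrs(pt)[["lon"]]),
  ele   = as.numeric(xmlValue(pt[["ele"]])),  # elevation and time are children
  time  = xmlValue(pt[["time"]]),
  atemp = as.numeric(xmlValue(ext[["atemp"]])),
  hr    = as.numeric(xmlValue(ext[["hr"]])),
  cad   = as.numeric(xmlValue(ext[["cad"]]))
)
str(pt_fields)
```

Looking nodes up by name keeps the code working if the order of the child elements changes between exports.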
We start by extracting the general information for the activity:
SportName <- xmlValue(top[[2]][["name"]]) #Garmin-assigned name for the exercise according to geographic place
SportType <- xmlValue(top[[2]][["type"]]) #Garmin-assigned sport classification
SessionTime <- xmlValue(top[[1]][["time"]]) #Starting time (unique identifier)
We now get the information for every track point. Since the watch is configured to record a point every second, consecutive points overlap when drawn on a map. We therefore set a “leap” factor so that only one point in every “leap” points is kept, which makes the markers on the map easier to read.
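As a small illustration of which points a given “leap” value keeps, the indices can be generated directly with `seq()`. The point count `n_points` is a hypothetical value chosen for the example, and `keep` is an illustrative name.

```r
leap <- 8        # keep one point in every "leap" points
n_points <- 30   # hypothetical number of recorded track points

# The first point is always kept, then every leap-th point after it
keep <- seq(1, n_points, by = leap)
keep  # 1 9 17 25
```

These are the same indices selected by a counter that is reset each time a point is kept, with the first point always included.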
TrackPoint <- top[[2]][[3]] #Track points level
#Getting the information from the xml
PointTime <- list()
latitude <- vector()
longitude <- vector()
elevation <- vector()
hr <- vector()
cadence <- vector()
temperature <- vector()
elapsedtime <- list()
Text <- list()
# We collect only part of the points to avoid too much overlapping (clustering is not a visually interesting option)
leap <- 8 # Show points every "leap" points
counter <- 0
j <- 1
for (i in seq_along(xmlChildren(TrackPoint))) { # iterates over all track points
counter <- counter + 1
if ((counter == leap) || (i == 1)) { # keeps the first point and every "leap"-th point after it
#Time data extraction & transformation
Time <- xmlValue(TrackPoint[[i]][["time"]]) ##gets time
Time <- sub("T"," ",Time) ## arranges date (takes "T" out)
Time <- unlist(strsplit(as.character(Time),"[.]"))[[1]] ##removes decimal seconds
PosTime <- strptime(Time,"%Y-%m-%d %H:%M:%S", tz = "UTC") ## sets as POSIXlt (GPX time stamps are in UTC)
#Creates the vector of point times: the first point starts it, later points are appended
if (i == 1) {
PointTime <- PosTime
} else {
PointTime <- append(PointTime, PosTime, after = j-1)
}
#Elapsed time calculation (seconds since the first point):
if (i == 1) {
elapsedtime <- difftime(PointTime[1], PointTime[1], units = "secs")
} else {
elapsedtime <- append(elapsedtime, difftime(PointTime[j], PointTime[1], units = "secs"), after = j-1)
}
#Rest of the variables extraction:
latitude[j] <- as.numeric(xmlAttrs(TrackPoint[[i]])[["lat"]]) ## gets latitude (by attribute name)
longitude[j] <- as.numeric(xmlAttrs(TrackPoint[[i]])[["lon"]]) ## gets longitude (by attribute name)
elevation[j] <- as.numeric(xmlValue(TrackPoint[[i]][["ele"]])) ##gets elevation
hr[j] <- as.numeric(xmlValue(TrackPoint[[i]][[3]][[1]][["hr"]])) ##Heart Rate
cadence[j] <- as.numeric(xmlValue(TrackPoint[[i]][[3]][[1]][["cad"]])) ##Cadence
temperature[j] <- as.numeric(xmlValue(TrackPoint[[i]][[3]][[1]][["atemp"]])) ##Ambient Temperature
#Popup text for each point
#Time text arrangement
Hours <- as.integer(elapsedtime[j]/3600)
Minutes <- as.integer((elapsedtime[j] - Hours*3600)/60)
Seconds <- elapsedtime[j] - Hours*3600 - Minutes*60
if (Hours != 0) {
PopUpTime <- paste(Hours,"h ",Minutes,"m ",Seconds,"s",sep = "")
} else if (Minutes != 0) {
PopUpTime <- paste(Minutes,"m ",Seconds,"s",sep = "")
} else {
PopUpTime <- paste(Seconds,"s",sep = "")
}
#Full text
Text <- append(Text, paste("Elapsed Time:",PopUpTime, "<br>",
"Elevation:",as.character(format(round(elevation[j], 2), nsmall = 2)),"<br>",
"Heart Rate:",as.character(hr[j]),"<br>",
"Cadence:",as.character(cadence[j]),"<br>",
"Temperature:",as.character(temperature[j])))
j <- j+1
counter <- 0
}
}
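As an alternative to the loop, the same kind of fields can be collected in a vectorised way with XPath. This is a sketch under stated assumptions, not the method used above: the three-point track, the namespace URIs, and the names `make_pt`, `demo_txt`, `demo_doc`, `grab`, and `pts` are all hypothetical. It also assumes every track point carries every field; if some points lack a reading, the columns would differ in length and `data.frame()` would fail.

```r
library(XML)

# Hypothetical helper that builds one trkpt element as text
make_pt <- function(lat, lon, ele, time, hr) {
  paste0('<trkpt lat="', lat, '" lon="', lon, '"><ele>', ele, '</ele>',
         '<time>', time, '</time><extensions><ns3:TrackPointExtension>',
         '<ns3:hr>', hr, '</ns3:hr>',
         '</ns3:TrackPointExtension></extensions></trkpt>')
}

# Hypothetical three-point track; the namespace URIs are assumptions
demo_txt <- paste0(
  '<gpx xmlns="http://www.topografix.com/GPX/1/1" ',
  'xmlns:ns3="http://www.garmin.com/xmlschemas/TrackPointExtension/v1">',
  '<trk><trkseg>',
  make_pt(-41.0687, -71.4820, 853.6, "2018-11-29T21:19:50.000Z", 110),
  make_pt(-41.0688, -71.4821, 854.0, "2018-11-29T21:19:51.000Z", 112),
  make_pt(-41.0689, -71.4822, 854.4, "2018-11-29T21:19:53.000Z", 115),
  '</trkseg></trk></gpx>'
)
demo_doc <- xmlParse(demo_txt, asText = TRUE)

# local-name() matches elements whatever their namespace prefix;
# restricting the search to trkpt descendants skips the metadata <time>
pt_path <- "//*[local-name()='trkpt']"
grab <- function(tag) {
  xpathSApply(demo_doc, sprintf("%s//*[local-name()='%s']", pt_path, tag), xmlValue)
}

pts <- data.frame(
  lat  = as.numeric(xpathSApply(demo_doc, pt_path, xmlGetAttr, "lat")),
  lon  = as.numeric(xpathSApply(demo_doc, pt_path, xmlGetAttr, "lon")),
  ele  = as.numeric(grab("ele")),
  time = as.POSIXct(grab("time"), format = "%Y-%m-%dT%H:%M:%OS", tz = "UTC"),
  hr   = as.numeric(grab("hr"))
)

# Seconds elapsed since the first point
pts$elapsed <- as.numeric(difftime(pts$time, pts$time[1], units = "secs"))
pts
```

Subsampling then reduces to indexing the rows, for example `pts[seq(1, nrow(pts), by = 2), ]`.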
It's important to notice that the code also generates, for each point, a text string summarizing that point's data. This is relevant because clicking a marker on the map opens a popup window showing this information.
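The elapsed-time text built inside the loop could also be factored into a small helper, which makes the formatting rule easier to test on its own. The function name `format_elapsed` is hypothetical, and the sketch assumes the input is expressed in seconds.

```r
# Hypothetical helper: turns a number of seconds into "1h 2m 3s"-style text
format_elapsed <- function(secs) {
  secs <- as.numeric(secs)            # also accepts a difftime measured in seconds
  hours   <- secs %/% 3600            # whole hours
  minutes <- (secs %% 3600) %/% 60    # whole minutes left after the hours
  seconds <- secs %% 60               # remaining seconds
  if (hours != 0) {
    paste0(hours, "h ", minutes, "m ", seconds, "s")
  } else if (minutes != 0) {
    paste0(minutes, "m ", seconds, "s")
  } else {
    paste0(seconds, "s")
  }
}

format_elapsed(3723)  # "1h 2m 3s"
format_elapsed(75)    # "1m 15s"
format_elapsed(9)     # "9s"
```

As in the loop, leading units that are zero are dropped so that short elapsed times stay compact in the popup.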
We create a leaflet plot of every trackpoint extracted, given the latitude and longitude with the respective text information gathered from the file:
library(leaflet)
runcoord <- data.frame(lat=latitude,lng=longitude)
#Add a new icon
runningicon <- makeIcon(iconUrl = "run1.png",
iconWidth = 15,
iconHeight = 15
)
#Leaflet plot
runcoord %>% leaflet(width = "100%") %>% addTiles() %>% addMarkers(icon = runningicon, popup = Text)
As can be seen on the map, a customized icon following the running theme was added.
An additional step is to visualize the remaining variables, which can be done with a facet plot against the elapsed time:
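library(tidyr); library(ggplot2); library(plotly)
runtot <- data.frame(elapsedtime = as.numeric(elapsedtime), elevation, hr, cadence, temperature) #converts all data to a data frame (elapsed time as plain seconds)
runtot <- pivot_longer(runtot, c(elevation, hr, cadence, temperature), names_to = "Variable", values_to = "Value") #rearranges into long format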
g <- ggplot(runtot, aes(x = elapsedtime, y = Value)) + geom_line(aes(color = Variable)) +
facet_grid(rows=vars(Variable), scales = "free_y") +
labs(x = "Elapsed Time [secs]", y = "Data Field") +
theme(legend.position = "none") +
scale_color_manual(values = c("red","blue","black","darkgreen"))
ggplotly(g)