library(tidyverse)
library(XML)
library(sf)

Goal

To read files with .gpx extension from GPS software like wikiloc, Strava, Samsung Health and others.

GPX files

GPX, or GPS exchange format, is an XML schema designed as a common GPS data format for software applications wikipedia.

Many geocomputation packages in R can read .gpx files: Gecomputation in R. For instance, the package sf. Nevertheless, part of the metadata could be missing in some cases. For this reason, we can read GPX files directly as XML objects.

XML package

Again, many packages can read xml and help to parse it in R. I picked XML CRAN and learned it using Gaston Sanchez notes.

Data from wikiloc

Let us apply XML navigation to read all information in a wikiloc .gpx file.

file_name = "data/wikiloc/montcau-i-la-mola-des-del-coll-d-estenalles.gpx"
# external pointer is used
doc = htmlTreeParse(file = file_name, useInternalNodes = T, encoding = "UTF-8")

Metadata

There is only one node with this XPath

node = getNodeSet(doc, "/html/body/gpx/metadata")
xmlSize(node)
## [1] 1
node[[1]]
## <metadata>
##   <name>Wikiloc - Montcau i la Mola des del coll d'Estenalles</name>
##   <author>
##     <name>Sergi Boixader</name>
##     <link href="https://www.wikiloc.com/wikiloc/user.do?id=380745"/>
##     <text>Sergi Boixader on Wikiloc</text>
##   </author>
##   <link href="https://www.wikiloc.com/hiking-trails/montcau-i-la-mola-des-del-coll-destenalles-44641450"/>
##   <text>Montcau i la Mola des del coll d'Estenalles on Wikiloc</text>
##   <time>2019-12-21T10:30:40Z</time>
## </metadata>

we can navigate all metadata structure using ‘xmlChildren’ or ‘getNodeSet’.

For instance, to obtain the link to the route in wikiloc, we do the following

node = getNodeSet(doc, "/html/body/gpx/metadata/link")
xmlAttrs(node[[1]])
##                                                                                        href 
## "https://www.wikiloc.com/hiking-trails/montcau-i-la-mola-des-del-coll-destenalles-44641450"

The link to wikiloc page of the author of the route is obtained as

node = getNodeSet(doc, "/html/body/gpx/metadata/author/link")
xmlAttrs(node[[1]])
##                                                href 
## "https://www.wikiloc.com/wikiloc/user.do?id=380745"

and the name of the author

node = getNodeSet(doc, "/html/body/gpx/metadata/author/name")
sapply(node, xmlValue)
## [1] "Sergi Boixader"

Way-points

In a similar way we read all way-points, 7 of them

wpt_node = getNodeSet(doc, "/html/body/gpx/wpt")
xmlSize(wpt_node)
## [1] 7

for instance the first waypoint reads

wpt_node[[1]]
## <wpt lat="41.670660" lon="2.007717">
##   <ele>.0</ele>
##   <time>2019-12-22T04:40:59Z</time>
##   <name>Coll d'Eres</name>
##   <cmt/>
##   <desc/>
## </wpt>

You can see that there 5 nodes in each way-point (ele, time, name, cmt, desc) and the longitude and latitude of the point are the attributes of it

xmlAttrs(wpt_node[[1]])
##         lat         lon 
## "41.670660"  "2.007717"

This can be read as a numeric vector as

sapply(xmlAttrs(wpt_node[[1]]), as.numeric, USE.NAMES = TRUE)
##       lat       lon 
## 41.670660  2.007717

We can create a data frame with the way-points name and coordinates

n_elements = xmlSize(wpt_node)
wpt_df <- data.frame(
  name = character(n_elements), lat = numeric(n_elements), lon = numeric(n_elements))
coordinates = t(sapply(wpt_node, xmlAttrs))
wpt_df$lat = as.numeric(coordinates[, "lat"])
wpt_df$lon = as.numeric(coordinates[, "lon"])

In order to get the name, we need get the children for each node wpt and get the value of the node name. There is a function to run this loop over all nodes identified by a given XPath. In our case, the XPath reads “/html/body/gpx/wpt/name” (note that “//name” could also be used if no other name node exists in nodes different than wpt)

xpathSApply(doc, path = "/html/body/gpx/wpt/name", xmlValue)
## [1] "Coll d'Eres"                    "Coll d'Estenalles. Aparcament" 
## [3] "Coll de Tres Termes"            "Desviació al Montcau"          
## [5] "La Mola. Sant Llorenç del Munt" "Montcau"                       
## [7] "Morral del Drac (cova)"

In order to be sure that the accents are not messed up, it is important to include encoding = “UTF-8” parameter in htmlTreeParse

wpt_df$name = xpathSApply(doc, path = "/html/body/gpx/wpt/name", xmlValue)

We have defined a data frame with all way-points information.

as_tibble(wpt_df)
## # A tibble: 7 x 3
##   name                             lat   lon
##   <chr>                          <dbl> <dbl>
## 1 Coll d'Eres                     41.7  2.01
## 2 Coll d'Estenalles. Aparcament   41.7  1.99
## 3 Coll de Tres Termes             41.7  2.01
## 4 Desviació al Montcau            41.7  2.00
## 5 La Mola. Sant Llorenç del Munt  41.6  2.02
## 6 Montcau                         41.7  2.00
## 7 Morral del Drac (cova)          41.6  2.01

Track points

We now read the track points. In totat, there are 1318 points

track_node = getNodeSet(doc, "/html/body/gpx/trk/trkseg/trkpt")
xmlSize(track_node)
## [1] 1318

For instance, the fifth point reads

track_node[[5]]
## <trkpt lat="41.669064" lon="1.994213">
##   <ele>864.386</ele>
##   <time>2019-12-21T07:05:07Z</time>
## </trkpt>

We can see latitude and longitude are attributes while elevation and time are subnodes

times <- xpathSApply(doc, path = "/html/body/gpx/trk/trkseg/trkpt/time", xmlValue)
elevations <- as.numeric(
  xpathSApply(doc, path = "/html/body/gpx/trk/trkseg/trkpt/ele", xmlValue))
coords <- xpathSApply(doc, path = "//trkpt", xmlAttrs)
n_elements = xmlSize(track_node)
track_df <- data.frame(
  elevation = elevations, time = times,
  lat = as.numeric(coords["lat",]),
  lon = as.numeric(coords["lon",]))

We can finally transform times into datetime POSIX

track_df$time[1:3]
## [1] "2019-12-21T07:04:23Z" "2019-12-21T07:04:32Z" "2019-12-21T07:04:48Z"
track_df$time = strptime(times, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
track_df$time[1:3]
## [1] "2019-12-21 07:04:23 UTC" "2019-12-21 07:04:32 UTC"
## [3] "2019-12-21 07:04:48 UTC"
track_df$time = as.POSIXct(track_df$time)
attr(track_df$time, "tzone") <- "Europe/Paris"
track_df$time[1:3]
## [1] "2019-12-21 08:04:23 CET" "2019-12-21 08:04:32 CET"
## [3] "2019-12-21 08:04:48 CET"

Therefore, we have been able to read all details in a wikiloc .gpx file and import them into different data frames.

Using sf package

sf package reads and manipulate many files with geographic information sf Reference

First, we check the layers. There are 5 of them

file_name = "data/wikiloc/montcau-i-la-mola-des-del-coll-d-estenalles.gpx"
st_layers(file_name)
## Driver: GPX 
## Available layers:
##     layer_name     geometry_type features fields
## 1    waypoints             Point        7     23
## 2       routes       Line String        0     12
## 3       tracks Multi Line String        1     12
## 4 route_points             Point        0     25
## 5 track_points             Point     1318     26

We can import one layer. If a layer has 0 features it means is empty (routes and route_points in our example)

First read the waypoints layer

wikiloc_sf_waypoints = st_read(file_name, layer = "waypoints")
## Reading layer `waypoints' from data source `C:\Users\josep\Documents\R\gpx_files\data\wikiloc\montcau-i-la-mola-des-del-coll-d-estenalles.gpx' using driver `GPX'
## Simple feature collection with 7 features and 23 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1.994181 ymin: 41.64132 xmax: 2.018137 ymax: 41.67499
## geographic CRS: WGS 84

Now the information is in a sf object Simple Feature in R which is equivalent to a data frame plus a column with the geographic information.

For instance, the name field coincides with wpt_df$name

wikiloc_sf_waypoints$name
## [1] "Coll d'Eres"                    "Coll d'Estenalles. Aparcament" 
## [3] "Coll de Tres Termes"            "Desviació al Montcau"          
## [5] "La Mola. Sant Llorenç del Munt" "Montcau"                       
## [7] "Morral del Drac (cova)"

Coordinates are in the geometry column with other interesting information. If we just want to extract the coordinates to replicate the data frame we created previously

st_coordinates(wikiloc_sf_waypoints)
##          X        Y
## 1 2.007717 41.67066
## 2 1.994181 41.66893
## 3 2.010412 41.65647
## 4 1.998319 41.67077
## 5 2.018137 41.64132
## 6 2.004676 41.67499
## 7 2.014919 41.64558

where X is the longitude and y the latitude.

In this way, we can recreate data frame wpt_df, called wpt_df2, as

lon_lat_matrix = st_coordinates(wikiloc_sf_waypoints)
wpt_df2 = data.frame(
  name = wikiloc_sf_waypoints$name,
  lat = lon_lat_matrix[, "Y"],
  lon = lon_lat_matrix[, "X"])

Now we can read the “tracks” layer

wikiloc_sf_tracks = st_read(file_name, layer = "tracks")
## Reading layer `tracks' from data source `C:\Users\josep\Documents\R\gpx_files\data\wikiloc\montcau-i-la-mola-des-del-coll-d-estenalles.gpx' using driver `GPX'
## Simple feature collection with 1 feature and 12 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 1.994153 ymin: 41.64106 xmax: 2.018343 ymax: 41.67506
## geographic CRS: WGS 84

This layer can be use to plot the route as a line. Finally, the layer “track_points” gives teh same information contained in the node with XPath “/html/body/gpx/trk/trkseg/trkpt”

wikiloc_sf_track_points = st_read(file_name, layer = "track_points")
## Reading layer `track_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\wikiloc\montcau-i-la-mola-des-del-coll-d-estenalles.gpx' using driver `GPX'
## Simple feature collection with 1318 features and 26 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1.994153 ymin: 41.64106 xmax: 2.018343 ymax: 41.67506
## geographic CRS: WGS 84

The advantage is that time is already in POSIXct format and elevation is in the field ele. The rest of 26 fields are empty with the exception of the counter field track_seg_point_id that identifies sequentially every point of the tracked route.

Therefore, all geometry information can be obtained faster by using the sf package. The XML reading procedure it is useful when you are interested in non-geographic information.

Data from Strava

file_name = "data/strava/sant_ignasi_turo_de_monts_tibidabo_coll_de_la_vinassa_la_pinya_2_vallvidrera_santa_maria_de_jerusalem.gpx"
# external pointer is used
doc = htmlTreeParse(file = file_name, useInternalNodes = T)

First three nodes are the same for all gpx files: “/html/body/gpx”. For this file, there’s only metadata and track points (“trk”)

node = getNodeSet(doc, "/html/body/gpx")
gpx_info = xmlChildren(node[[1]])
str(gpx_info)
## List of 2
##  $ metadata:Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr> 
##  $ trk     :Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr> 
##  - attr(*, "class")= chr [1:2] "XMLInternalNodeList" "XMLNodeList"

Metadata

In this meadta, only time is shown

xmlChildren(gpx_info$metadata)
## $time
## <time>2020-09-26T05:39:03Z</time> 
## 
## attr(,"class")
## [1] "XMLInternalNodeList" "XMLNodeList"

Track points

We follow the same process as before to identify that there are a total of 2937 track points

track_node = getNodeSet(doc, "/html/body/gpx/trk/trkseg/trkpt")
xmlSize(track_node)
## [1] 2937

The fourth point, for instance, reads

track_node[[4]]
## <trkpt lat="41.4061470" lon="2.1201890">
##   <ele>160.4</ele>
##   <time>2020-09-26T05:39:08Z</time>
##   <extensions>
##     <trackpointextension>
##       <cad>83</cad>
##     </trackpointextension>
##   </extensions>
## </trkpt>

The extension node is specific of the software that generated the file, in this case Strava. It contains information about the cadence. Latitude and longitude are attributes while elevation and time are nodes.

The following data frame contains all inforamtion in the file

times = xpathSApply(doc, path = "/html/body/gpx/trk/trkseg/trkpt/time", xmlValue)
times = strptime(times, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
times = as.POSIXct(times)
attr(times, "tzone") = "Europe/Paris"
elevations = as.numeric(
  xpathSApply(doc, path = "/html/body/gpx/trk/trkseg/trkpt/ele", xmlValue))
cadence = as.numeric(
  xpathSApply(doc, path = "//cad", xmlValue))
coords = xpathSApply(doc, path = "//trkpt", xmlAttrs)
track_df = data.frame(
  elevation = elevations, time = times,
  cadence = cadence,
  lat = as.numeric(coords["lat",]),
  lon = as.numeric(coords["lon",]))

Using sf package

First, check the layers. There are 5 layers as in the previous file

file_name = "data/strava/sant_ignasi_turo_de_monts_tibidabo_coll_de_la_vinassa_la_pinya_2_vallvidrera_santa_maria_de_jerusalem.gpx"
st_layers(file_name)
## Driver: GPX 
## Available layers:
##     layer_name     geometry_type features fields
## 1    waypoints             Point        0     23
## 2       routes       Line String        0     12
## 3       tracks Multi Line String        1     12
## 4 route_points             Point        0     25
## 5 track_points             Point     2937     27

Only tracks and track_point layers have content

First read the tracks layer, that gives the route as a Multi Line String

strava_sf_tracks = st_read(file_name, layer = "tracks")
## Reading layer `tracks' from data source `C:\Users\josep\Documents\R\gpx_files\data\strava\sant_ignasi_turo_de_monts_tibidabo_coll_de_la_vinassa_la_pinya_2_vallvidrera_santa_maria_de_jerusalem.gpx' using driver `GPX'
## Simple feature collection with 1 feature and 12 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 2.101805 ymin: 41.40574 xmax: 2.122151 ymax: 41.42716
## geographic CRS: WGS 84

Detail of track_points can be obtained as

strava_sf_track_points = st_read(file_name, layer = "track_points")
## Reading layer `track_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\strava\sant_ignasi_turo_de_monts_tibidabo_coll_de_la_vinassa_la_pinya_2_vallvidrera_santa_maria_de_jerusalem.gpx' using driver `GPX'
## Simple feature collection with 2937 features and 27 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 2.101805 ymin: 41.40574 xmax: 2.122151 ymax: 41.42716
## geographic CRS: WGS 84

with this method, coordinates elevation and times are read but not cadence, as it is specific to Strava gpx files. Having the route directly in a sf object has some advantages although the first method is required if all information in the file is needed

Data from Instamaps

Instamaps can export the full map in raster format (tif) or vector format (geopackage). Tracks can be saved in GPX format as layers.

file_name = "data/instamaps/itinerari.gpx"
# external pointer is used
doc = htmlTreeParse(file = file_name, useInternalNodes = T)

This file has only metadata and rte (route) information

node = getNodeSet(doc, "/html/body/gpx")
gpx_info = xmlChildren(node[[1]])
str(gpx_info)
## List of 2
##  $ metadata:Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr> 
##  $ rte     :Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr> 
##  - attr(*, "class")= chr [1:2] "XMLInternalNodeList" "XMLNodeList"

Metadata

Metadata contains interesting information about the bounding box for the map

metadata = xmlChildren(gpx_info$metadata)
bounds <- xmlAttrs(metadata$bounds)
bb = as.numeric(bounds)
names(bb) = names(bounds)
bb
##    minlat    minlon    maxlat    maxlon 
## 41.427790  2.133622 41.449960  2.159886

Track points

A total of 152 points are incluede in the route

track_node = getNodeSet(doc, "/html/body/gpx/rte/rtept")
xmlSize(track_node)
## [1] 152

Looking at the first point, we can see that each point contains only latitude and longitude as attributes

track_node[[1]]
## <rtept lat="41.429584" lon="2.14345"/>

The coordinates can be exported into a data frame as

coords <- xpathSApply(doc, path = "//rtept", xmlAttrs)
track_df <- data.frame(
  lat = as.numeric(coords["lat",]),
  lon = as.numeric(coords["lon",]))

Using sf package

First, check the layers. There are 5 layers as in all gpx files

st_layers(file_name)
## Driver: GPX 
## Available layers:
##     layer_name     geometry_type features fields
## 1    waypoints             Point        0     23
## 2       routes       Line String        1     15
## 3       tracks Multi Line String        0     12
## 4 route_points             Point      152     25
## 5 track_points             Point        0     26

Only routes and route_points layers have content

First read the routes layer, that gives the route as a Line String

instamaps_sf_routes = st_read(file_name, layer = "routes")
## Reading layer `routes' from data source `C:\Users\josep\Documents\R\gpx_files\data\instamaps\itinerari.gpx' using driver `GPX'
## Simple feature collection with 1 feature and 15 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 2.133622 ymin: 41.42779 xmax: 2.159886 ymax: 41.44996
## geographic CRS: WGS 84

Detail of route_points can be obtained as

instamaps_sf_route_points = st_read(file_name, layer = "route_points")
## Reading layer `route_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\instamaps\itinerari.gpx' using driver `GPX'
## Simple feature collection with 152 features and 25 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 2.133622 ymin: 41.42779 xmax: 2.159886 ymax: 41.44996
## geographic CRS: WGS 84

All features are empty with the exception of route_point_id that identifies each point in the route sequentially.

Coordinates of route points can be read easily using st_coordinates function

str(st_coordinates(instamaps_sf_route_points))
##  num [1:152, 1:2] 2.14 2.14 2.14 2.14 2.14 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:152] "1" "2" "3" "4" ...
##   ..$ : chr [1:2] "X" "Y"
lon_lat_matrix = st_coordinates(instamaps_sf_route_points)
instamaps_coordinates = data.frame(
  lat = lon_lat_matrix[, "Y"],
  lon = lon_lat_matrix[, "X"])
head(instamaps_coordinates)
##        lat      lon
## 1 41.42958 2.143450
## 2 41.42908 2.142055
## 3 41.42997 2.138590
## 4 41.42779 2.137045
## 5 41.42809 2.136530
## 6 41.42875 2.136530

Data from Vissir

Vissir is the Geoinformation advanced viewer of the Institut Cartogràfic i Geològic de Catalunya.

file_name = "data/vissir/itinerari_fet_amb_vissir.gpx"
# external pointer is used
doc = htmlTreeParse(file = file_name, useInternalNodes = T)
node = getNodeSet(doc, "/html/body/gpx")
gpx_info = xmlChildren(node[[1]])
str(gpx_info)
## List of 2
##  $ metadata:Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr> 
##  $ trk     :Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr> 
##  - attr(*, "class")= chr [1:2] "XMLInternalNodeList" "XMLNodeList"

Metadata

metadata = xmlChildren(gpx_info$metadata)
metadata
## $name
## <name>Vissir3_GPX_Export</name> 
## 
## attr(,"class")
## [1] "XMLInternalNodeList" "XMLNodeList"

Track Points

A total of 88 points in track_points

track_node = getNodeSet(doc, "/html/body/gpx/trk/trkseg/trkpt")
xmlSize(track_node)
## [1] 88
track_node[[1]]
## <trkpt lon="2.136177293750496" lat="41.415971263922714"/>

Only coordinates information is available (with more decimals, to be changed with R option digits options(digits = 15)).

coords <- xpathSApply(doc, path = "//trkpt", xmlAttrs)
track_df <- data.frame(
  lat = as.numeric(coords["lat",]),
  lon = as.numeric(coords["lon",]))

Using sf package

st_layers(file_name)
## Driver: GPX 
## Available layers:
##     layer_name     geometry_type features fields
## 1    waypoints             Point        0     23
## 2       routes       Line String        0     12
## 3       tracks Multi Line String        1     12
## 4 route_points             Point        0     25
## 5 track_points             Point       88     26

Tracks and Track_points layers contain information. We read them as we did for previous cases. As an example, information about the track points

vissir_sf_track_points = st_read(file_name, layer = "track_points")
## Reading layer `track_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\vissir\itinerari_fet_amb_vissir.gpx' using driver `GPX'
## Simple feature collection with 88 features and 26 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 2.124332 ymin: 41.41274 xmax: 2.136177 ymax: 41.42125
## geographic CRS: WGS 84

Data from GPS visualizer

GPSvisualizer provides a simple interface to draw points, routes and polygons on several maps.

WE can use directly sf package function st_read that already has drivers to read .gpx files as we have shown before

file_name = "data/gpsvisualizer/1005062619-75178.gpx"
st_layers(file_name)
## Driver: GPX 
## Available layers:
##     layer_name     geometry_type features fields
## 1    waypoints             Point        0     23
## 2       routes       Line String        0     12
## 3       tracks Multi Line String        1     12
## 4 route_points             Point        0     25
## 5 track_points             Point      113     26
gps_track_points_sf = st_read(file_name, layer = "track_points")
## Reading layer `track_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\gpsvisualizer\1005062619-75178.gpx' using driver `GPX'
## Simple feature collection with 113 features and 26 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 2.129352 ymin: 41.4291 xmax: 2.144051 ymax: 41.44496
## geographic CRS: WGS 84
gps_tracks_sf = st_read(file_name, layer = "tracks")
## Reading layer `tracks' from data source `C:\Users\josep\Documents\R\gpx_files\data\gpsvisualizer\1005062619-75178.gpx' using driver `GPX'
## Simple feature collection with 1 feature and 12 fields
## geometry type:  MULTILINESTRING
## dimension:      XY
## bbox:           xmin: 2.129352 ymin: 41.4291 xmax: 2.144051 ymax: 41.44496
## geographic CRS: WGS 84

Conclusion

We have shown how to read the information in a file with gpx extension in two ways:

  1. As XML file. All information can be extracted including metadata
  2. As sf object. Most information related to points tracked can be read directly into a sf object. Nevertheless, specific information of some software (for instance, cadence in files from Strava) and metadata can not be read.