library(tidyverse)
library(XML)
library(sf)
To read files with .gpx extension from GPS software like wikiloc, Strava, Samsung Health and others.
GPX, or GPS exchange format, is an XML schema designed as a common GPS data format for software applications wikipedia.
Many geocomputation packages in R can read .gpx files: Gecomputation in R. For instance, the package sf. Nevertheless, part of the metadata could be missing in some cases. For this reason, we can read GPX files directly as XML objects.
Again, many packages can read xml and help to parse it in R. I picked XML CRAN and learned it using Gaston Sanchez notes.
Let us apply XML navigation to read all information in a wikiloc .gpx file.
file_name = "data/wikiloc/montcau-i-la-mola-des-del-coll-d-estenalles.gpx"
# external pointer is used
doc = htmlTreeParse(file = file_name, useInternalNodes = T, encoding = "UTF-8")
There is only one node with this XPath
node = getNodeSet(doc, "/html/body/gpx/metadata")
xmlSize(node)
## [1] 1
node[[1]]
## <metadata>
## <name>Wikiloc - Montcau i la Mola des del coll d'Estenalles</name>
## <author>
## <name>Sergi Boixader</name>
## <link href="https://www.wikiloc.com/wikiloc/user.do?id=380745"/>
## <text>Sergi Boixader on Wikiloc</text>
## </author>
## <link href="https://www.wikiloc.com/hiking-trails/montcau-i-la-mola-des-del-coll-destenalles-44641450"/>
## <text>Montcau i la Mola des del coll d'Estenalles on Wikiloc</text>
## <time>2019-12-21T10:30:40Z</time>
## </metadata>
we can navigate all metadata structure using ‘xmlChildren’ or ‘getNodeSet’.
For instance, to obtain the link to the route in wikiloc, we do the following
node = getNodeSet(doc, "/html/body/gpx/metadata/link")
xmlAttrs(node[[1]])
## href
## "https://www.wikiloc.com/hiking-trails/montcau-i-la-mola-des-del-coll-destenalles-44641450"
The link to wikiloc page of the author of the route is obtained as
node = getNodeSet(doc, "/html/body/gpx/metadata/author/link")
xmlAttrs(node[[1]])
## href
## "https://www.wikiloc.com/wikiloc/user.do?id=380745"
and the name of the author
node = getNodeSet(doc, "/html/body/gpx/metadata/author/name")
sapply(node, xmlValue)
## [1] "Sergi Boixader"
In a similar way we read all way-points, 7 of them
wpt_node = getNodeSet(doc, "/html/body/gpx/wpt")
xmlSize(wpt_node)
## [1] 7
for instance the first waypoint reads
wpt_node[[1]]
## <wpt lat="41.670660" lon="2.007717">
## <ele>.0</ele>
## <time>2019-12-22T04:40:59Z</time>
## <name>Coll d'Eres</name>
## <cmt/>
## <desc/>
## </wpt>
You can see that there 5 nodes in each way-point (ele, time, name, cmt, desc) and the longitude and latitude of the point are the attributes of it
xmlAttrs(wpt_node[[1]])
## lat lon
## "41.670660" "2.007717"
This can be read as a numeric vector as
sapply(xmlAttrs(wpt_node[[1]]), as.numeric, USE.NAMES = TRUE)
## lat lon
## 41.670660 2.007717
We can create a data frame with the way-points name and coordinates
n_elements = xmlSize(wpt_node)
wpt_df <- data.frame(
name = character(n_elements), lat = numeric(n_elements), lon = numeric(n_elements))
coordinates = t(sapply(wpt_node, xmlAttrs))
wpt_df$lat = as.numeric(coordinates[, "lat"])
wpt_df$lon = as.numeric(coordinates[, "lon"])
In order to get the name, we need get the children for each node wpt and get the value of the node name. There is a function to run this loop over all nodes identified by a given XPath. In our case, the XPath reads “/html/body/gpx/wpt/name” (note that “//name” could also be used if no other name node exists in nodes different than wpt)
xpathSApply(doc, path = "/html/body/gpx/wpt/name", xmlValue)
## [1] "Coll d'Eres" "Coll d'Estenalles. Aparcament"
## [3] "Coll de Tres Termes" "Desviació al Montcau"
## [5] "La Mola. Sant Llorenç del Munt" "Montcau"
## [7] "Morral del Drac (cova)"
In order to be sure that the accents are not messed up, it is important to include encoding = “UTF-8” parameter in htmlTreeParse
wpt_df$name = xpathSApply(doc, path = "/html/body/gpx/wpt/name", xmlValue)
We have defined a data frame with all way-points information.
as_tibble(wpt_df)
## # A tibble: 7 x 3
## name lat lon
## <chr> <dbl> <dbl>
## 1 Coll d'Eres 41.7 2.01
## 2 Coll d'Estenalles. Aparcament 41.7 1.99
## 3 Coll de Tres Termes 41.7 2.01
## 4 Desviació al Montcau 41.7 2.00
## 5 La Mola. Sant Llorenç del Munt 41.6 2.02
## 6 Montcau 41.7 2.00
## 7 Morral del Drac (cova) 41.6 2.01
We now read the track points. In totat, there are 1318 points
track_node = getNodeSet(doc, "/html/body/gpx/trk/trkseg/trkpt")
xmlSize(track_node)
## [1] 1318
For instance, the fifth point reads
track_node[[5]]
## <trkpt lat="41.669064" lon="1.994213">
## <ele>864.386</ele>
## <time>2019-12-21T07:05:07Z</time>
## </trkpt>
We can see latitude and longitude are attributes while elevation and time are subnodes
times <- xpathSApply(doc, path = "/html/body/gpx/trk/trkseg/trkpt/time", xmlValue)
elevations <- as.numeric(
xpathSApply(doc, path = "/html/body/gpx/trk/trkseg/trkpt/ele", xmlValue))
coords <- xpathSApply(doc, path = "//trkpt", xmlAttrs)
n_elements = xmlSize(track_node)
track_df <- data.frame(
elevation = elevations, time = times,
lat = as.numeric(coords["lat",]),
lon = as.numeric(coords["lon",]))
We can finally transform times into datetime POSIX
track_df$time[1:3]
## [1] "2019-12-21T07:04:23Z" "2019-12-21T07:04:32Z" "2019-12-21T07:04:48Z"
track_df$time = strptime(times, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
track_df$time[1:3]
## [1] "2019-12-21 07:04:23 UTC" "2019-12-21 07:04:32 UTC"
## [3] "2019-12-21 07:04:48 UTC"
track_df$time = as.POSIXct(track_df$time)
attr(track_df$time, "tzone") <- "Europe/Paris"
track_df$time[1:3]
## [1] "2019-12-21 08:04:23 CET" "2019-12-21 08:04:32 CET"
## [3] "2019-12-21 08:04:48 CET"
Therefore, we have been able to read all details in a wikiloc .gpx file and import them into different data frames.
sf package reads and manipulate many files with geographic information sf Reference
First, we check the layers. There are 5 of them
file_name = "data/wikiloc/montcau-i-la-mola-des-del-coll-d-estenalles.gpx"
st_layers(file_name)
## Driver: GPX
## Available layers:
## layer_name geometry_type features fields
## 1 waypoints Point 7 23
## 2 routes Line String 0 12
## 3 tracks Multi Line String 1 12
## 4 route_points Point 0 25
## 5 track_points Point 1318 26
We can import one layer. If a layer has 0 features it means is empty (routes and route_points in our example)
First read the waypoints layer
wikiloc_sf_waypoints = st_read(file_name, layer = "waypoints")
## Reading layer `waypoints' from data source `C:\Users\josep\Documents\R\gpx_files\data\wikiloc\montcau-i-la-mola-des-del-coll-d-estenalles.gpx' using driver `GPX'
## Simple feature collection with 7 features and 23 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1.994181 ymin: 41.64132 xmax: 2.018137 ymax: 41.67499
## geographic CRS: WGS 84
Now the information is in a sf object Simple Feature in R which is equivalent to a data frame plus a column with the geographic information.
For instance, the name field coincides with wpt_df$name
wikiloc_sf_waypoints$name
## [1] "Coll d'Eres" "Coll d'Estenalles. Aparcament"
## [3] "Coll de Tres Termes" "Desviació al Montcau"
## [5] "La Mola. Sant Llorenç del Munt" "Montcau"
## [7] "Morral del Drac (cova)"
Coordinates are in the geometry column with other interesting information. If we just want to extract the coordinates to replicate the data frame we created previously
st_coordinates(wikiloc_sf_waypoints)
## X Y
## 1 2.007717 41.67066
## 2 1.994181 41.66893
## 3 2.010412 41.65647
## 4 1.998319 41.67077
## 5 2.018137 41.64132
## 6 2.004676 41.67499
## 7 2.014919 41.64558
where X is the longitude and y the latitude.
In this way, we can recreate data frame wpt_df, called wpt_df2, as
lon_lat_matrix = st_coordinates(wikiloc_sf_waypoints)
wpt_df2 = data.frame(
name = wikiloc_sf_waypoints$name,
lat = lon_lat_matrix[, "Y"],
lon = lon_lat_matrix[, "X"])
Now we can read the “tracks” layer
wikiloc_sf_tracks = st_read(file_name, layer = "tracks")
## Reading layer `tracks' from data source `C:\Users\josep\Documents\R\gpx_files\data\wikiloc\montcau-i-la-mola-des-del-coll-d-estenalles.gpx' using driver `GPX'
## Simple feature collection with 1 feature and 12 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 1.994153 ymin: 41.64106 xmax: 2.018343 ymax: 41.67506
## geographic CRS: WGS 84
This layer can be use to plot the route as a line. Finally, the layer “track_points” gives teh same information contained in the node with XPath “/html/body/gpx/trk/trkseg/trkpt”
wikiloc_sf_track_points = st_read(file_name, layer = "track_points")
## Reading layer `track_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\wikiloc\montcau-i-la-mola-des-del-coll-d-estenalles.gpx' using driver `GPX'
## Simple feature collection with 1318 features and 26 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 1.994153 ymin: 41.64106 xmax: 2.018343 ymax: 41.67506
## geographic CRS: WGS 84
The advantage is that time is already in POSIXct format and elevation is in the field ele. The rest of 26 fields are empty with the exception of the counter field track_seg_point_id that identifies sequentially every point of the tracked route.
Therefore, all geometry information can be obtained faster by using the sf package. The XML reading procedure it is useful when you are interested in non-geographic information.
file_name = "data/strava/sant_ignasi_turo_de_monts_tibidabo_coll_de_la_vinassa_la_pinya_2_vallvidrera_santa_maria_de_jerusalem.gpx"
# external pointer is used
doc = htmlTreeParse(file = file_name, useInternalNodes = T)
First three nodes are the same for all gpx files: “/html/body/gpx”. For this file, there’s only metadata and track points (“trk”)
node = getNodeSet(doc, "/html/body/gpx")
gpx_info = xmlChildren(node[[1]])
str(gpx_info)
## List of 2
## $ metadata:Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr>
## $ trk :Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr>
## - attr(*, "class")= chr [1:2] "XMLInternalNodeList" "XMLNodeList"
In this meadta, only time is shown
xmlChildren(gpx_info$metadata)
## $time
## <time>2020-09-26T05:39:03Z</time>
##
## attr(,"class")
## [1] "XMLInternalNodeList" "XMLNodeList"
We follow the same process as before to identify that there are a total of 2937 track points
track_node = getNodeSet(doc, "/html/body/gpx/trk/trkseg/trkpt")
xmlSize(track_node)
## [1] 2937
The fourth point, for instance, reads
track_node[[4]]
## <trkpt lat="41.4061470" lon="2.1201890">
## <ele>160.4</ele>
## <time>2020-09-26T05:39:08Z</time>
## <extensions>
## <trackpointextension>
## <cad>83</cad>
## </trackpointextension>
## </extensions>
## </trkpt>
The extension node is specific of the software that generated the file, in this case Strava. It contains information about the cadence. Latitude and longitude are attributes while elevation and time are nodes.
The following data frame contains all inforamtion in the file
times = xpathSApply(doc, path = "/html/body/gpx/trk/trkseg/trkpt/time", xmlValue)
times = strptime(times, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
times = as.POSIXct(times)
attr(times, "tzone") = "Europe/Paris"
elevations = as.numeric(
xpathSApply(doc, path = "/html/body/gpx/trk/trkseg/trkpt/ele", xmlValue))
cadence = as.numeric(
xpathSApply(doc, path = "//cad", xmlValue))
coords = xpathSApply(doc, path = "//trkpt", xmlAttrs)
track_df = data.frame(
elevation = elevations, time = times,
cadence = cadence,
lat = as.numeric(coords["lat",]),
lon = as.numeric(coords["lon",]))
First, check the layers. There are 5 layers as in the previous file
file_name = "data/strava/sant_ignasi_turo_de_monts_tibidabo_coll_de_la_vinassa_la_pinya_2_vallvidrera_santa_maria_de_jerusalem.gpx"
st_layers(file_name)
## Driver: GPX
## Available layers:
## layer_name geometry_type features fields
## 1 waypoints Point 0 23
## 2 routes Line String 0 12
## 3 tracks Multi Line String 1 12
## 4 route_points Point 0 25
## 5 track_points Point 2937 27
Only tracks and track_point layers have content
First read the tracks layer, that gives the route as a Multi Line String
strava_sf_tracks = st_read(file_name, layer = "tracks")
## Reading layer `tracks' from data source `C:\Users\josep\Documents\R\gpx_files\data\strava\sant_ignasi_turo_de_monts_tibidabo_coll_de_la_vinassa_la_pinya_2_vallvidrera_santa_maria_de_jerusalem.gpx' using driver `GPX'
## Simple feature collection with 1 feature and 12 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 2.101805 ymin: 41.40574 xmax: 2.122151 ymax: 41.42716
## geographic CRS: WGS 84
Detail of track_points can be obtained as
strava_sf_track_points = st_read(file_name, layer = "track_points")
## Reading layer `track_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\strava\sant_ignasi_turo_de_monts_tibidabo_coll_de_la_vinassa_la_pinya_2_vallvidrera_santa_maria_de_jerusalem.gpx' using driver `GPX'
## Simple feature collection with 2937 features and 27 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 2.101805 ymin: 41.40574 xmax: 2.122151 ymax: 41.42716
## geographic CRS: WGS 84
with this method, coordinates elevation and times are read but not cadence, as it is specific to Strava gpx files. Having the route directly in a sf object has some advantages although the first method is required if all information in the file is needed
Instamaps can export the full map in raster format (tif) or vector format (geopackage). Tracks can be saved in GPX format as layers.
file_name = "data/instamaps/itinerari.gpx"
# external pointer is used
doc = htmlTreeParse(file = file_name, useInternalNodes = T)
This file has only metadata and rte (route) information
node = getNodeSet(doc, "/html/body/gpx")
gpx_info = xmlChildren(node[[1]])
str(gpx_info)
## List of 2
## $ metadata:Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr>
## $ rte :Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr>
## - attr(*, "class")= chr [1:2] "XMLInternalNodeList" "XMLNodeList"
Metadata contains interesting information about the bounding box for the map
metadata = xmlChildren(gpx_info$metadata)
bounds <- xmlAttrs(metadata$bounds)
bb = as.numeric(bounds)
names(bb) = names(bounds)
bb
## minlat minlon maxlat maxlon
## 41.427790 2.133622 41.449960 2.159886
A total of 152 points are incluede in the route
track_node = getNodeSet(doc, "/html/body/gpx/rte/rtept")
xmlSize(track_node)
## [1] 152
Looking at the first point, we can see that each point contains only latitude and longitude as attributes
track_node[[1]]
## <rtept lat="41.429584" lon="2.14345"/>
The coordinates can be exported into a data frame as
coords <- xpathSApply(doc, path = "//rtept", xmlAttrs)
track_df <- data.frame(
lat = as.numeric(coords["lat",]),
lon = as.numeric(coords["lon",]))
First, check the layers. There are 5 layers as in all gpx files
st_layers(file_name)
## Driver: GPX
## Available layers:
## layer_name geometry_type features fields
## 1 waypoints Point 0 23
## 2 routes Line String 1 15
## 3 tracks Multi Line String 0 12
## 4 route_points Point 152 25
## 5 track_points Point 0 26
Only routes and route_points layers have content
First read the routes layer, that gives the route as a Line String
instamaps_sf_routes = st_read(file_name, layer = "routes")
## Reading layer `routes' from data source `C:\Users\josep\Documents\R\gpx_files\data\instamaps\itinerari.gpx' using driver `GPX'
## Simple feature collection with 1 feature and 15 fields
## geometry type: LINESTRING
## dimension: XY
## bbox: xmin: 2.133622 ymin: 41.42779 xmax: 2.159886 ymax: 41.44996
## geographic CRS: WGS 84
Detail of route_points can be obtained as
instamaps_sf_route_points = st_read(file_name, layer = "route_points")
## Reading layer `route_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\instamaps\itinerari.gpx' using driver `GPX'
## Simple feature collection with 152 features and 25 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 2.133622 ymin: 41.42779 xmax: 2.159886 ymax: 41.44996
## geographic CRS: WGS 84
All features are empty with the exception of route_point_id that identifies each point in the route sequentially.
Coordinates of route points can be read easily using st_coordinates function
str(st_coordinates(instamaps_sf_route_points))
## num [1:152, 1:2] 2.14 2.14 2.14 2.14 2.14 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:152] "1" "2" "3" "4" ...
## ..$ : chr [1:2] "X" "Y"
lon_lat_matrix = st_coordinates(instamaps_sf_route_points)
instamaps_coordinates = data.frame(
lat = lon_lat_matrix[, "Y"],
lon = lon_lat_matrix[, "X"])
head(instamaps_coordinates)
## lat lon
## 1 41.42958 2.143450
## 2 41.42908 2.142055
## 3 41.42997 2.138590
## 4 41.42779 2.137045
## 5 41.42809 2.136530
## 6 41.42875 2.136530
Vissir is the Geoinformation advanced viewer of the Institut Cartogràfic i Geològic de Catalunya.
file_name = "data/vissir/itinerari_fet_amb_vissir.gpx"
# external pointer is used
doc = htmlTreeParse(file = file_name, useInternalNodes = T)
node = getNodeSet(doc, "/html/body/gpx")
gpx_info = xmlChildren(node[[1]])
str(gpx_info)
## List of 2
## $ metadata:Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr>
## $ trk :Classes 'XMLInternalElementNode', 'XMLInternalNode', 'XMLAbstractNode' <externalptr>
## - attr(*, "class")= chr [1:2] "XMLInternalNodeList" "XMLNodeList"
metadata = xmlChildren(gpx_info$metadata)
metadata
## $name
## <name>Vissir3_GPX_Export</name>
##
## attr(,"class")
## [1] "XMLInternalNodeList" "XMLNodeList"
A total of 88 points in track_points
track_node = getNodeSet(doc, "/html/body/gpx/trk/trkseg/trkpt")
xmlSize(track_node)
## [1] 88
track_node[[1]]
## <trkpt lon="2.136177293750496" lat="41.415971263922714"/>
Only coordinates information is available (with more decimals, to be changed with R option digits options(digits = 15)).
coords <- xpathSApply(doc, path = "//trkpt", xmlAttrs)
track_df <- data.frame(
lat = as.numeric(coords["lat",]),
lon = as.numeric(coords["lon",]))
st_layers(file_name)
## Driver: GPX
## Available layers:
## layer_name geometry_type features fields
## 1 waypoints Point 0 23
## 2 routes Line String 0 12
## 3 tracks Multi Line String 1 12
## 4 route_points Point 0 25
## 5 track_points Point 88 26
Tracks and Track_points layers contain information. We read them as we did for previous cases. As an example, information about the track points
vissir_sf_track_points = st_read(file_name, layer = "track_points")
## Reading layer `track_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\vissir\itinerari_fet_amb_vissir.gpx' using driver `GPX'
## Simple feature collection with 88 features and 26 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 2.124332 ymin: 41.41274 xmax: 2.136177 ymax: 41.42125
## geographic CRS: WGS 84
GPSvisualizer provides a simple interface to draw points, routes and polygons on several maps.
WE can use directly sf package function st_read that already has drivers to read .gpx files as we have shown before
file_name = "data/gpsvisualizer/1005062619-75178.gpx"
st_layers(file_name)
## Driver: GPX
## Available layers:
## layer_name geometry_type features fields
## 1 waypoints Point 0 23
## 2 routes Line String 0 12
## 3 tracks Multi Line String 1 12
## 4 route_points Point 0 25
## 5 track_points Point 113 26
gps_track_points_sf = st_read(file_name, layer = "track_points")
## Reading layer `track_points' from data source `C:\Users\josep\Documents\R\gpx_files\data\gpsvisualizer\1005062619-75178.gpx' using driver `GPX'
## Simple feature collection with 113 features and 26 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 2.129352 ymin: 41.4291 xmax: 2.144051 ymax: 41.44496
## geographic CRS: WGS 84
gps_tracks_sf = st_read(file_name, layer = "tracks")
## Reading layer `tracks' from data source `C:\Users\josep\Documents\R\gpx_files\data\gpsvisualizer\1005062619-75178.gpx' using driver `GPX'
## Simple feature collection with 1 feature and 12 fields
## geometry type: MULTILINESTRING
## dimension: XY
## bbox: xmin: 2.129352 ymin: 41.4291 xmax: 2.144051 ymax: 41.44496
## geographic CRS: WGS 84
We have shown how to read the information in a file with gpx extension in two ways: