Analyzing Envirocar trajectory data with R

We can import the envirocar directly into spatial objects by exploiting the GeoJSON format which is supported by OGR, hence R function readOGR in package rgdal. To read from a https connection, we need RCurl.

url = "https://giv-car.uni-muenster.de/stable/rest/tracks/5207d871e4b058cd3d669afe"
require(rgdal)  # readOGR
## Loading required package: rgdal Loading required package: sp rgdal:
## version: 0.8-11, (SVN revision 479M) Geospatial Data Abstraction Library
## extensions to R successfully loaded Loaded GDAL runtime: GDAL 1.10.0,
## released 2013/04/24 Path to GDAL shared files: /usr/share/gdal/1.10 Loaded
## PROJ.4 runtime: Rel. 4.8.0, 6 March 2012, [PJ_VERSION: 480] Path to PROJ.4
## shared files: (autodetected)
layer <- readOGR(url, layer = "OGRGeoJSON")
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/5207d871e4b058cd3d669afe", layer: "OGRGeoJSON"
## with 173 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
class(layer)
## [1] "SpatialPointsDataFrame"
## attr(,"package")
## [1] "sp"
layer[1, ]
##          coordinates id                 time
## 1 (7.59584, 51.9691)  0 2013-08-11T16:56:06Z
##                                                                                                                                                                                                                                                                                                                                                                                   phenomenons
## 1 { "Intake Temperature": { "value": 38.0, "unit": "c" }, "Intake Pressure": { "value": 126.0, "unit": "kPa" }, "CO2": { "value": 0.00587494280504, "unit": "kg\\/h" }, "Calculated MAF": { "value": 26.7454, "unit": "g\\/s" }, "Speed": { "value": 4.0, "unit": "km\\/h" }, "Rpm": { "value": 1339.0, "unit": "u\\/min" }, "Consumption": { "value": 0.002216959549072, "unit": "l\\/h" } }

As we can see, the attribute contains id (point nr) and time (still as factor, not as POSIXt) and the measured values are still in a JSON string that needs to be parsed. For this, we wrote an import function that converts time, parses the JSON into measured values, and adds a units attribute to the sp object:

importEnviroCar = function(file) {
    require(rjson)  # fromJSON
    require(maptools)  # spCbind
    # read data as spatial object:
    layer = readOGR(file, layer = "OGRGeoJSON")
    # convert time from text to POSIXct:
    layer$time = as.POSIXct(layer$time, format = "%Y-%m-%dT%H:%M:%SZ")
    # the third column is JSON, we want it in a table (data.frame) form: 1. form
    # a list of lists
    l1 = lapply(as.character(layer[[3]]), fromJSON)
    # 2. parse the $value elements in the sublist:
    l2 = lapply(l1, function(x) as.data.frame(lapply(x, function(X) X$value)))
    # bind these value elements, row by row, together
    ret = do.call(rbind, l2)
    # read the units:
    units = lapply(l1[1], function(x) as.data.frame(lapply(x, function(X) X$unit)))
    # add a units attribute to layer
    layer[[3]] = NULL
    # add the table as attributes to the spatial object
    if (length(layer) == nrow(ret)) {
        layer = spCbind(layer, ret)
        attr(layer, "units") = units[[1]]
        layer
    } else NULL
}

url = "https://giv-car.uni-muenster.de/stable/rest/tracks/5207d871e4b058cd3d669afe"
sp = importEnviroCar(url)
## Loading required package: rjson Loading required package: maptools
## Checking rgeos availability: TRUE
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/5207d871e4b058cd3d669afe", layer: "OGRGeoJSON"
## with 173 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
sp[1, ]
##          coordinates id                time Intake.Temperature
## 1 (7.59584, 51.9691)  0 2013-08-11 16:56:06                 38
##   Intake.Pressure      CO2 Calculated.MAF Speed  Rpm Consumption
## 1             126 0.005875          26.75     4 1339    0.002217
attr(sp, "units")
##   Intake.Temperature Intake.Pressure  CO2 Calculated.MAF Speed   Rpm
## 1                  c             kPa kg/h            g/s  km/h u/min
##   Consumption
## 1         l/h
library(lattice)
trellis.par.set(sp.theme())
spplot(sp, "Consumption", colorkey = TRUE)

plot of chunk unnamed-chunk-3

Interpolating a point sequence

Here, we interpolate the point measurements to a new set of points, using inverse distance weighting

n = length(sp) * 5  # create 5 points between each observation...
cc = coordinates(sp)
newx = approx(cc[, 1], n = n)$y  # along x,
newy = approx(cc[, 2], n = n)$y  # along y,
crs = CRS(proj4string(sp))
newpts = SpatialPoints(cbind(newx, newy), crs)
# Alternatively: convert to SpatialLines, then use spsample
library(gstat)
# interpolate Consumption values:
idw = idw(Consumption ~ 1, sp, newpts)
## [inverse distance weighted interpolation]

We can plot this sequence by

spplot(idw[1], colorkey = TRUE, scales = list(draw = TRUE), main = "inverse distance interpolated Fuel Consumption values")

plot of chunk unnamed-chunk-5

Here, the new points are evenly distributed between all existing points, meaning that with 5 times as much points, between every two points 4 points are added equidistant. This gives a new point density that resembles the original point density.

Alternatively, we could generate interpolation points

Aggregation of trajectory data: spatial

We can aggregate to a 5 x 5 grid, using an arbitray aggregation function, here max

bb = bbox(sp)
grd = GridTopology(bb[, 1], apply(bb, 1, diff)/5, c(5, 5))
sp.agg = aggregate(sp[-1], SpatialGrid(grd, crs), max)

and show the results, for

spplot(sp.agg[-c(1, 2, 8)], colorkey = TRUE, sp.layout = list("sp.points", sp, 
    col = 3), main = "maximum measured value per grid cell")

plot of chunk unnamed-chunk-7

Aggregation: temporal

When looking at temporal variability, we see that for this trajectory the values are very unequally distrubed, indicating a break in the trajectory

plot(y = sp$Consumption, x = sp$time)
title("Fuel consumption over time")

plot of chunk unnamed-chunk-8

We can also aggregate to time intervals, e.g. 10 min values. For this, we will convert the points to a zoo object (time series)

library(zoo)
## Attaching package: 'zoo'
## 
## The following object is masked from 'package:base':
## 
## as.Date, as.Date.numeric
xx = zoo(sp$Consumption, sp$time)
plot(aggregate(xx, cut(sp$time, "5 min"), mean))
points(xx)
title("10-minute mean Consumption (l/h)")

plot of chunk unnamed-chunk-9

After having played with spatial objects (package sp) and temporal objects (package zoo), we can convert these data into spatio-temporal objects (package spacetime):

library(spacetime)
stidf = STIDF(geometry(sp), sp$time, sp@data)
stplot(geometry(stidf), main = "trajectory points, cut in 6 time slices with equal amount of points")

plot of chunk unnamed-chunk-10

plots the geometry of the spatial points, over time, cutting time in six parts with an equal number of observations.

The next plot adds attribute values as colour, but cuts time in four parts of equal duration:

# stplot(stidf) # breaks because of id and time are invalid type
stplot(stidf[, , "Consumption"], col.regions = bpy.colors(), number = 4, colorkey = T, 
    main = "Consumption, cut into 4 time slices of equal duration")

plot of chunk unnamed-chunk-11

Package spacetime also contains a class for trajectories. Direct conversion assumes that all points form one consecutive trajectory (note that this requires spacetime >= 1.0-10 to work):

sttdf = as(stidf, "STTDF")  # convert to trajectory

If this gives an error, first define the conversion function correctly:

setAs("STIDF", "STTDF", function(from) {
    if (is.null(from$burst)) 
        traj = list(as(from, "STI")) else traj = lapply(split(from, from$burst), function(x) as(x, "STI"))
    STIbox = STI(SpatialPoints(t(bbox(from@sp)), from@sp@proj4string), range(index(from)))
    new("STTDF", new("STT", STIbox, traj = traj), data = from@data)
})

The object can be plotted:

stplot(sttdf, scales = list(draw = TRUE))

plot of chunk unnamed-chunk-14

Adding a background map

Using package ggmap, we can add a google map background (or other background):

bb <- bbox(sttdf@sp)
# expand bb (here, manually):
bb[2, ] = c(51.94, 51.985)
bb[1, ] = c(7.58, 7.67)
library(ggmap)
## Loading required package: ggplot2
map4ao <- get_map(location = as.vector(bb))
## Warning: bounding box given to google - spatial extent only approximate.
## converting bounding box to center/zoom specification. (experimental) Map
## from URL :
## http://maps.googleapis.com/maps/api/staticmap?center=51.9625,7.625&zoom=13&size=%20640x640&scale=%202&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
# maptype = 'satellite', scale=2, zoom=4) Read the attributes to pass them
# to grid.raster
bbMap <- attr(map4ao, "bb")
latCenter <- with(bbMap, ll.lat + ur.lat)/2
lonCenter <- with(bbMap, ll.lon + ur.lon)/2
height <- with(bbMap, ur.lat - ll.lat)
width <- with(bbMap, ur.lon - ll.lon)
## Another component of the sp.layout in spplot
sp.raster <- list("grid.raster", map4ao, x = lonCenter, y = latCenter, width = width, 
    height = height, default.units = "native")
stplot(sttdf, scales = list(draw = TRUE), sp.layout = sp.raster, col = "red", 
    lwd = 2, main = "google map background")

plot of chunk unnamed-chunk-15

The following code allows conversion from SpatialPoints into SpatialLines, and has been added to sp:

setAs("SpatialPoints", "Line", function(from) Line(coordinates(from)))
setAs("SpatialPoints", "Lines", function(from) Lines(as(from, "Line"), "ID"))
setAs("SpatialPoints", "SpatialLines", function(from) SpatialLines(list(as(from, 
    "Lines")), from@proj4string))

Generalizing a trajectory

We can generalize a trajectory using Douglas-Peuker by using gSimplify in rgeos:

sl = as(sp, "SpatialLines")
library(rgeos)
## rgeos version: 0.2-20, (SVN revision (unknown)) GEOS runtime version:
## 3.3.8-CAPI-1.7.8 Polygon checking: TRUE
plot(gSimplify(sl, 5e-04), axes = TRUE)  # WRONG: rgeos takes lat/long as Euclidian coords
plot(sp, add = T, col = "red")
title("generalization in Long/Lat (wrong!)")

plot of chunk unnamed-chunk-17

however, without warning this falsely assumes that coordinates are metric, i.e. in a Euclidian system. They are not:

proj4string(sp)
## [1] "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

How can we resolve this?

Reprojecting data to UTM

Package rgdal contains code to reproject data:

library(rgdal)
utm = CRS("+proj=utm +zone=32 +north +ellps=WGS84 +datum=WGS84 +units=m +no_defs")
slT = spTransform(sl, utm)
5e-04 * 110000  # approximate meters per degree latitude
## [1] 55
plot(gSimplify(slT, 55), axes = TRUE)  # RIGHT: rgeos takes coords Euclidian
plot(spTransform(sp, utm), add = T, col = "red")
title("generalization in UTM zone 32N")

plot of chunk unnamed-chunk-19

A plot of the (projected) simplified trajectory, here represented by a set of selected points, is created by 1. conversion to lines, 2. simplifying the line, 3. conversion to points, 4. matching to the full set of points, and 5. selection:

sl.simplified = gSimplify(slT, 55)  # 2
sp.simplified = as(sl.simplified, "SpatialPoints")  # 3
sp = spTransform(sp, utm)
sel = zerodist2(sp.simplified, sp, 0.001)[, 2]  # 4
sp.selected = sp[sel, ]  # 5
length(sp.selected)
## [1] 16
sp.selected[1:3, ]
##          coordinates id                time Intake.Temperature
## 1  (403540, 5758530)  0 2013-08-11 16:56:06                 38
## 21 (403731, 5758060) 20 2013-08-11 16:58:08                 32
## 32 (404064, 5758120) 31 2013-08-11 16:59:15                 30
##    Intake.Pressure      CO2 Calculated.MAF Speed  Rpm Consumption
## 1              126 0.005875          26.75     4 1339    0.002217
## 21              92 0.005398          24.57    26 1652    0.002037
## 32             112 0.005929          26.99    49 1480    0.002237
spplot(sp.selected, "Consumption", colorkey = TRUE, main = "selected points according to Douglas-Peuker")

plot of chunk unnamed-chunk-20

Further problems regarding generalization

Generalization as done above only retains the spatial path, but ignores time and attributes. One could rather easily implement Douglas-Peuker on higher-dimensional curves e.g. \( (x,y,t) \) or \( (x,y,t,Attr_1,Attr_2,...,Attr_n) \) but then one needs a distance metric that combines space, time and attributes – which one would be good?

Multiple tracks

library(RCurl)
## Loading required package: bitops
url = "https://giv-car.uni-muenster.de/stable/rest/tracks/"
allIDs = fromJSON(getURL(url))
ids = sapply(allIDs[[1]], function(x) x$id)
ids = ids[3:120]
ids = ids[!ids %in% c("522d9877e4b00a043c46b236", "522d9855e4b00a043c46a8b3", 
    "52264446e4b00a043c4376f8", "52274261e4b00a043c44b9b0")]
ids[1:10]
##  [1] "52300956e4b00a043c47730d" "52300954e4b00a043c477086"
##  [3] "52300649e4b00a043c476990" "52300647e4b00a043c476665"
##  [5] "522f421de4b00a043c474c59" "522f4217e4b00a043c47493e"
##  [7] "522ec0e7e4b00a043c46fa51" "522def1ee4b00a043c46f068"
##  [9] "522def13e4b00a043c46ed75" "522dc412e4b00a043c46c9d5"

Read the first 10 tracks, and check which of them are valid (meaning: all attributes are present for all points):

from = 1
to = 10
ids = ids[from:to]
lst = lapply(paste(url, ids, sep = ""), importEnviroCar)
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/52300956e4b00a043c47730d", layer: "OGRGeoJSON"
## with 66 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/52300954e4b00a043c477086", layer: "OGRGeoJSON"
## with 161 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/52300649e4b00a043c476990", layer: "OGRGeoJSON"
## with 134 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/52300647e4b00a043c476665", layer: "OGRGeoJSON"
## with 202 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/522f421de4b00a043c474c59", layer: "OGRGeoJSON"
## with 115 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/522f4217e4b00a043c47493e", layer: "OGRGeoJSON"
## with 198 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/522ec0e7e4b00a043c46fa51", layer: "OGRGeoJSON"
## with 161 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/522def1ee4b00a043c46f068", layer: "OGRGeoJSON"
## with 101 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/522def13e4b00a043c46ed75", layer: "OGRGeoJSON"
## with 188 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
## OGR data source with driver: GeoJSON 
## Source: "https://giv-car.uni-muenster.de/stable/rest/tracks/522dc412e4b00a043c46c9d5", layer: "OGRGeoJSON"
## with 4 features and 3 fields
## Feature type: wkbPoint with 2 dimensions
names(lst) = ids
sel = !sapply(lst, is.null)
sel
## 52300956e4b00a043c47730d 52300954e4b00a043c477086 52300649e4b00a043c476990 
##                    FALSE                    FALSE                     TRUE 
## 52300647e4b00a043c476665 522f421de4b00a043c474c59 522f4217e4b00a043c47493e 
##                    FALSE                     TRUE                    FALSE 
## 522ec0e7e4b00a043c46fa51 522def1ee4b00a043c46f068 522def13e4b00a043c46ed75 
##                     TRUE                     TRUE                    FALSE 
## 522dc412e4b00a043c46c9d5 
##                     TRUE

Next, we'll throw out the invalid ones (a better procedure would only throw out points, not full trajectories)

lst = lst[sel]
length(lst)
## [1] 5

Then, we'll create a STTDF from this, by

lst2 = lapply(lst, function(x) STI(geometry(x), x$time))
attributes = do.call(rbind, lapply(lst, function(x) x@data))
sttdf = STTDF(STT(lst2), attributes)
stplot(sttdf, col = 1:length(lst), scales = list(draw = TRUE), main = "five trajectories")

plot of chunk unnamed-chunk-24

What have we learned?

  1. reading trajectory data is tricky: some trajectories have no attributes for part of the points, some have a different number of attributes; this make joining them difficult (but not impossible)
  2. subsampling, or generalizing the trajectories is easy if we only focus on space (Douglas-Peuker, rgeos::gSimplify), but needs difficult choices if time and/or attributes are taken into account too
  3. densifying by interpolation is easy, but one needs to choose where to put extra points: equally spaced between observation points, equally distributed over space, equally distributed over time?
  4. aggregating trajectory data leads to change of support: new values are now valid for a grid cell, or a polyline, and for a time interval; we haven't addressed colouring

Open questions

  1. Should R take care of the heterogeneity in the data served, or should the data served be consistent?
  2. How do we aggregate over multiple trajectories, e.g. when they have different sampling rates? Simply take the points, or weight by interval length or duration?
  3. How do we assign trajectories to road segments?
  4. How do we aggregate over space AND time, and how do we visualize this?
  5. How do we analyze (e.g., summarize) trajectories, within drivers and between drivers?
  6. How do we compare trajectories, how do we compare drivers?