This project aimed to develop an R package named trajectories for trajectory analysis, built on existing R packages such as sp and spacetime. Since the beginning of the project, progress has been made in three main aspects. First, several functions have been developed to allow coercion between common spatial classes for trajectories. Second, a set of functions was written to analyze trajectory data. Third, a plot() function was created specifically to visualize trajectory data. In addition, two sample trajectory datasets were integrated into the package for demonstration as well as for user testing. The package is managed at r-forge.org and can be viewed and downloaded at this link.
For the rest of this post, I will first briefly introduce the sample datasets and then discuss the trajectories package from the three aspects mentioned above.
To demonstrate the functionality of the package, two sample trajectory datasets are included. The first dataset is named traj_sample. It was created from public GPS trajectories uploaded by user alex18 at OpenStreetMap.org. Five trajectories are available in this sample as five STI objects stored in a single list object in R. The second dataset, named geolife_sample, was extracted from the GeoLife dataset released by Microsoft. It includes trajectories collected from mobile phone users via GPS. A total of 15 trajectories from three users are stored in the dataset as a single STTDF object. The traj_sample dataset was mostly used for testing while developing the methods because of its relatively small size. In contrast, the geolife_sample dataset contains considerably more trajectory data, so it will be used later in this post to demonstrate the package. The full GeoLife dataset is 298.66 MB in size. Users may choose to download the entire dataset for testing from the link on the GeoLife webpage.
The first step is to load the sample dataset geolife_sample from the trajectories package. The code for loading the dataset is shown below:
# Load the package. You will see some compatibility-related warning
# messages. Just ignore them at this point.
library("trajectories")
## Loading required package: sp Loading required package: spacetime Loading
## required package: rgeos rgeos version: 0.2-19, (SVN revision 394) GEOS
## runtime version: 3.3.3-CAPI-1.7.4 Polygon checking: TRUE
## Warning: the specification for class "im" in package 'maptools' seems
## equivalent to one from package 'sp' and is not turning on duplicate class
## definitions for this class Warning: the specification for class "owin" in
## package 'maptools' seems equivalent to one from package 'sp' and is not
## turning on duplicate class definitions for this class Warning: the
## specification for class "ppp" in package 'maptools' seems equivalent to
## one from package 'sp' and is not turning on duplicate class definitions
## for this class Warning: the specification for class "psp" in package
## 'maptools' seems equivalent to one from package 'sp' and is not turning on
## duplicate class definitions for this class Warning: replacing previous
## import '.__C__im' when loading 'maptools' Warning: replacing previous
## import '.__C__owin' when loading 'maptools' Warning: replacing previous
## import '.__C__ppp' when loading 'maptools' Warning: replacing previous
## import '.__C__psp' when loading 'maptools'
## Attaching package: 'trajectories'
##
## The following object is masked from 'package:stats':
##
## aggregate
# Load the geolife sample dataset
data(geolife_sample)
# Make the name shorter
geolife <- geolife_sample
class(geolife)
## [1] "STTDF"
## attr(,"package")
## [1] "spacetime"
slotNames(geolife)
## [1] "data" "traj" "sp" "time" "endTime"
The object geolife has five slots, described below:
sp: the coordinates of the minimum bounding box of the trajectories;
time: the starting and ending time stamps of the trajectories;
endTime: the ending time for the two time stamps in the time slot. Since the events recorded here are trajectory points, which have no duration, endTime should be identical to time;
traj: a list that contains all trajectories. Each trajectory is stored as an STI object;
data: a dataframe that contains the attribute data, such as distance, time, average speed, and elevation, of all points in the trajectories.
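To make the attributes in the data slot more concrete, here is a minimal base-R sketch of how per-segment distance, time lapse, and speed could be derived from raw GPS fixes. It is not the package's implementation: the haversine formula, the helper name haversine_km(), the Earth radius of 6371 km, and the three made-up points are all assumptions of mine for illustration.

```r
# Hypothetical helper: great-circle distance in km between lon/lat points (degrees).
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Three made-up GPS fixes (not taken from geolife_sample).
pts <- data.frame(
  long = c(116.30, 116.31, 116.32),
  lat  = c(39.98, 39.99, 40.00),
  time = as.POSIXct(c("2008-10-23 02:53:10", "2008-10-23 02:54:10",
                      "2008-10-23 02:56:10"), tz = "UTC")
)

n <- nrow(pts)
# Attributes of each segment between consecutive fixes.
seg <- data.frame(
  dist_km  = haversine_km(pts$long[-n], pts$lat[-n], pts$long[-1], pts$lat[-1]),
  lapsed_s = as.numeric(diff(pts$time), units = "secs")
)
seg$speed_kmh <- seg$dist_km / seg$lapsed_s * 3600
print(seg)
```

Summing dist_km and lapsed_s over all segments of a trajectory would give totals of the kind reported by the summary functions later in this post.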
Next, the three functions that allow coercion between common spatial classes for trajectories are listed in the table below:
| Name | Description |
|---|---|
| STItoSTTDF() | Coerces a list of STI objects into STTDF and computes the trajectory attributes such as distance, time, average speed, turning angle, elevation change, etc. |
| STItoSpatialLines() | Coerces an STI object into a SpatialLines object. The time slot of the STI object is discarded. |
| STTDFtoSpatialLines() | Coerces an STTDF object into a SpatialLines object. The time and data slots of the STTDF object are discarded. |
The usage of these three functions is shown below:
sttdf <- STItoSTTDF(geolife@traj)
class(sttdf)
## [1] "STTDF"
## attr(,"package")
## [1] "spacetime"
sl <- STItoSpatialLines(geolife@traj[[1]])
class(sl)
## [1] "SpatialLines"
## attr(,"package")
## [1] "sp"
sl2 <- STTDFtoSpatialLines(geolife)
class(sl2)
## [1] "SpatialLines"
## attr(,"package")
## [1] "sp"
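For readers curious about what such a coercion involves, the sketch below builds a SpatialLines object from an ordered set of points using only constructors from the sp package. It is not the code behind STItoSpatialLines(); the coordinates are made up, and the ID "traj1" is an arbitrary label chosen for the example.

```r
library(sp)

# Made-up, time-ordered coordinates of one trajectory.
xy <- cbind(long = c(116.30, 116.31, 116.32, 116.31),
            lat  = c(39.98, 39.99, 40.00, 40.01))
pts <- SpatialPoints(xy)

# Connect the points in order: Line -> Lines -> SpatialLines.
sl <- SpatialLines(list(Lines(list(Line(coordinates(pts))), ID = "traj1")))

class(sl)
```

As the table notes, the resulting object keeps only the geometry; any time stamps attached to the points are not carried over.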
One of the major goals of trajectory analysis is to extract useful information from trajectory data. To this end, three functions were developed to manipulate and summarize the trajectory data.
The summary() function summarizes the basic properties and statistics of the trajectory data, stored either as an STI object or as an STTDF object.
sti <- geolife@traj[[1]]
summary(sti)
## $class
## [1] "STI"
## attr(,"package")
## [1] "spacetime"
##
## $bbox
## min max
## long 116.29 116.32
## lat 39.98 40.01
##
## $is.projected
## [1] FALSE
##
## $proj4string
## [1] "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
##
## $npoints
## [1] 907
##
## $starting_time
## [1] "2008-10-23 02:53:10 PDT"
##
## $ending_time
## [1] "2008-10-23 11:11:12 PDT"
##
## $duration
## [1] 8.3
##
## attr(,"class")
## [1] "summary.STI"
summary(geolife)
## $class
## [1] "STTDF"
## attr(,"package")
## [1] "spacetime"
##
## $bbox
## min max
## x 116.15 116.39
## y 39.89 40.08
##
## $is.projected
## [1] FALSE
##
## $proj4string
## [1] "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
##
## $ntraj
## [1] 15
##
## $starting_time
## [1] "2008-10-23 02:53:10 PDT"
##
## $ending_time
## [1] "2008-10-28 05:03:42 PDT"
##
## $duration
## [1] 122.2
##
## $total_dist
## [1] 364.3
##
## $time_lapsed
## [1] 37404
##
## $ave_speed
## [1] 0.0097
##
## $ave_elevation
## [1] 334.8
##
## attr(,"class")
## [1] "summary.STTDF"
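As a quick sanity check, the duration and ave_speed values above can be reproduced from the other fields with base R. This is a sketch under my own assumptions: that duration is the elapsed time in hours between starting_time and ending_time, and that ave_speed is total_dist divided by time_lapsed. The time zone is set to UTC only to keep the example reproducible; it does not affect the differences.

```r
# Time stamps copied from the two summaries above.
start_time <- as.POSIXct("2008-10-23 02:53:10", tz = "UTC")
sti_end    <- as.POSIXct("2008-10-23 11:11:12", tz = "UTC")
sttdf_end  <- as.POSIXct("2008-10-28 05:03:42", tz = "UTC")

# Elapsed time in hours for the single trajectory and for the whole dataset.
sti_duration   <- as.numeric(difftime(sti_end, start_time, units = "hours"))
sttdf_duration <- as.numeric(difftime(sttdf_end, start_time, units = "hours"))
round(sti_duration, 1)    # 8.3
round(sttdf_duration, 1)  # 122.2

# Average speed as total distance over total time lapsed.
ave_speed <- 364.3 / 37404
round(ave_speed, 4)       # 0.0097
```

If total_dist is in kilometres and time_lapsed is in seconds, the average speed of 0.0097 would correspond to roughly 35 km/h, but the summary output itself does not state the units.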
The aggregate() function aggregates the trajectory data over various temporal scales, such as hour, day of month, or month of year, and returns a dataframe object containing the summary. In the dataframe, the first column indicates the temporal unit (e.g., hour, day, month). The second column contains the total distance in that unit. The third column contains the total time lapsed (in seconds) in that unit. The fourth column contains the average elevation in that unit. The fifth column contains the total number of points in that unit. The sixth column contains the average speed (km/h) in that unit.
aggregate(geolife, "hour")
## dist timeLapsed elev np speed
## 1 0.0000 0 0 0 NaN
## 2 0.0000 0 0 0 NaN
## 3 0.0000 0 0 0 NaN
## 4 0.0000 0 0 0 NaN
## 5 0.0000 0 0 0 NaN
## 6 0.0000 0 0 0 NaN
## 7 0.0000 0 0 0 NaN
## 8 0.0000 0 0 0 NaN
## 9 0.0000 0 0 0 NaN
## 10 36.4337 3108 703709 3110 42.20
## 11 26.4016 2537 622561 2537 37.46
## 12 14.8010 956 476643 959 55.74
## 13 6.7556 1397 716136 1397 17.41
## 14 6.0347 2038 1334814 2038 10.66
## 15 5.3388 852 1110787 852 22.56
## 16 12.1635 1065 1388155 1066 41.12
## 17 0.5935 119 181018 119 17.95
## 18 0.0000 0 0 0 NaN
## 19 0.0000 0 0 0 NaN
## 20 0.0000 0 0 0 NaN
## 21 0.0000 0 0 0 NaN
## 22 0.0002 1 521 1 0.72
## 23 6.8711 1094 477434 1094 22.61
## 24 0.0000 0 0 0 NaN
In the example above, the hourly statistics of the trajectories are listed in a dataframe, with the statistics for each hour in a row. Similarly, the daily and monthly statistics can be obtained by executing aggregate(geolife, "day") or aggregate(geolife, "month").
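The speed column appears to follow directly from the dist and timeLapsed columns. The check below recomputes it for a few rows copied from the hourly output, assuming that dist is in kilometres (the output itself does not state the unit) and timeLapsed is in seconds.

```r
# Rows for hours 10 to 13, copied from the hourly output.
hourly <- data.frame(
  hour       = c(10, 11, 12, 13),
  dist       = c(36.4337, 26.4016, 14.8010, 6.7556),
  timeLapsed = c(3108, 2537, 956, 1397),
  speed      = c(42.20, 37.46, 55.74, 17.41)
)

# Convert km per second to km/h and compare with the reported column.
hourly$speed_check <- round(hourly$dist / hourly$timeLapsed * 3600, 2)
all.equal(hourly$speed, hourly$speed_check)  # TRUE

# Hours without any points divide zero distance by zero time.
0 / 0  # NaN
```

The last line also explains why hours without any recorded points show NaN rather than 0 in the speed column.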
The crop() function was designed to spatially select trajectories that overlap a given area. This is particularly useful when users want to spatially subset the trajectory data. Currently, the crop() function can subset trajectory data using a SpatialPolygons object. At this point, only trajectories that fall entirely within the polygon are selected.
# Create a SpatialPolygons object
lat_min <- min(geolife@traj[[1]]@sp$lat)
lat_max <- max(geolife@traj[[1]]@sp$lat)
long_min <- min(geolife@traj[[1]]@sp$long)
long_max <- max(geolife@traj[[1]]@sp$long)
xpol <- c(long_min, long_max, long_max, long_min, long_min)
ypol <- c(lat_min, lat_min, lat_max, lat_max, lat_min)
pol <- SpatialPolygons(list(Polygons(list(Polygon(cbind(xpol, ypol))), ID = "x1")))
pol@proj4string <- CRS("+proj=longlat +datum=WGS84")
# Crop the geolife dataset using the polygon created
geolife_cropped <- crop(geolife, pol)
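The selection rule described above (keep a trajectory only if it lies entirely inside the polygon) can be illustrated without any spatial packages when the polygon is an axis-aligned rectangle, as it is in this example. The helper inside_box() and the toy trajectories are hypothetical; the actual crop() function works on SpatialPolygons objects and may treat points on the boundary differently.

```r
# Hypothetical helper: TRUE if every point of a trajectory lies in the rectangle.
inside_box <- function(traj, xmin, xmax, ymin, ymax) {
  all(traj$long >= xmin & traj$long <= xmax &
      traj$lat  >= ymin & traj$lat  <= ymax)
}

# Two made-up trajectories; the second one leaves the rectangle.
trajs <- list(
  a = data.frame(long = c(116.30, 116.31), lat = c(39.99, 40.00)),
  b = data.frame(long = c(116.31, 116.40), lat = c(40.00, 40.05))
)

# Keep only the trajectories that are completely inside the rectangle.
keep <- vapply(trajs, inside_box, logical(1),
               xmin = 116.29, xmax = 116.32, ymin = 39.98, ymax = 40.01)
cropped <- trajs[keep]
names(cropped)
```

A trajectory that only partly overlaps the rectangle, like b here, is dropped as a whole rather than clipped at the boundary.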
Last but not least, the plot() function is introduced here to visualize the results from the previous example. The plot() function coerces either an STI object or an STTDF object into a SpatialLines object and then plots the SpatialLines object using the R graphics device. By default, plot() draws a single STI object in black and draws an STTDF object with a distinct color for each trajectory.
# Plotting an STI object
plot(geolife@traj[[1]])
# Plotting an STTDF object
plot(geolife)
plot(pol, add = TRUE)
# Plotting the trajectories cropped by the polygon
plot(geolife_cropped)
plot(pol, add = TRUE)
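The default coloring scheme can be mimicked with base graphics. The sketch below is not the package's plot() method; plot_trajs() is a hypothetical helper that draws each trajectory in a list of coordinate matrices with its own color from rainbow(), and the coordinates are made up.

```r
# Hypothetical helper: draw each trajectory (a two-column matrix) in its own color.
plot_trajs <- function(trajs, cols = rainbow(length(trajs))) {
  all_xy <- do.call(rbind, trajs)
  # Empty plot sized to hold all trajectories.
  plot(NULL, xlim = range(all_xy[, 1]), ylim = range(all_xy[, 2]),
       xlab = "long", ylab = "lat")
  for (i in seq_along(trajs)) {
    lines(trajs[[i]], col = cols[i])
  }
  invisible(cols)
}

# Two made-up trajectories.
trajs <- list(
  cbind(c(116.30, 116.31, 116.32), c(39.98, 39.99, 40.00)),
  cbind(c(116.30, 116.32, 116.33), c(40.01, 40.00, 39.99))
)
cols <- plot_trajs(trajs)
```

Passing cols = "black" with a single-element list would give the single-trajectory look described above.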
The following aspects can be further developed/improved in future: