The goal of this vignette is to demonstrate the functionalities of the swaRm package for processing collective movement data.
swaRm can be installed from the SwarmLab’s drat repository by running the following lines in your R terminal:
install.packages("drat") # This line is required if drat is not present
# in your R installation.
drat::addRepo("swarm-lab")
install.packages("swaRm")If you want to receive package updates automatically, you can add the drat repository to your R installation permanently by adding the line drat::addRepo("swarm-lab") to your .Rprofile file.
The swaRm package ships with a number of example data files that we will use for demonstrating its functioning.
First, we will load in memory the first of these example data files.
filePath <- system.file("extdata/01.csv", package = "swaRm")
dat <- read.csv(filePath)
head(dat)## date time lon lat
## 1 2015-09-10 7:00:00 15.76468 -22.37957
## 2 2015-09-10 7:00:01 15.76468 -22.37957
## 3 2015-09-10 7:00:04 15.76468 -22.37958
## 4 2015-09-10 7:00:05 15.76468 -22.37958
## 5 2015-09-10 7:00:06 NA -22.37958
## 6 2015-09-10 7:00:07 15.76467 NA
This file contains GPS data, with a date, a timestamp, and longitude/latitude coordinates. Note that the coordinates can also be X/Y cartesian (or projected) coordinates (e.g. as returned by video tracking software, or if you projected your GPS data on a given reference grid).
The first step of the analysis process is to transform this imported data into a standardized trajectory data table that will be usable by all the functions in the swaRm package.
library(swaRm)
traj <- makeTraj(x = dat$lon, y = dat$lat, id = "01",
date = dat$date, time = dat$time,
date.format = "ymd", time.format = "hms",
geo = TRUE) # Set 'geo' to FALSE if working with cartesian
# (i.e. non geographic) coordinates
trajHere makeTraj returns a table containing 3599 observations (each row is an observation) and 4 columns: id, time, lon and lat (or x and y if working with projected data). id is a unique identifier chosen by you to distinguish between different trajectories when merging multiple tables together (we will see this later in this tutorial). time is a standardized timestamp constructed from the dates and times we provided the makeTraj function.
Trajectory tables are R6 objects inheriting from the data.table class thanks to the R6Frame package. As such, they retain all the properties of data.table objects and can be manipulated in almost all the same ways data.table objects can be. This includes the possibility to use functions from the dplyr package as we will see later on. data.table objects behave like regular data.frame objects, but come with additional functionalities and significant gains in processing speed compared to data.frame objects.
Warning - Difference between data.table and trajectory tables.
While trajectory tables can be manipulated pretty much like any data.table (or data.frame) object, there is still a major difference that need to be taken into account until the R6Frame package is completed (version 0.1 only at the time of writing). A trajectory table is essentially a R6 wrapper around a data.table object that is stored in the $data slot of the R6 object. At the moment, this means that you cannot access the columns of the table using the $ sign like you would do with data.table or data.frame objects. This will hopefully be included in the next version of the R6Frame package. For now, you access the columns in two different ways:
traj[["time"]]).$data slot of the trajectory table object, and then use the $ sign to access the desired column (e.g., traj$data$time).This should not affect you in most situations in which you would be manioulating or modifying the data, but it could become a problem when trying to use formula-based functions (e.g., linear regressions, plot). In this case, pass the $data slot to the function instead of the full trajectory table object (e.g. plot(lat ~ lon, data = traj$data, type = "l")).
Errors and missing observations are common problems with tracking data, for instance when working in areas with poor GPS satellite covering. swaRm provides a set of convenience functions to automatically detect and correct some of the most common types of errors. Detection and correction are handled by separate functions to allow users to develop their own correction functions for instance.
These functions are meant to assist the user during the processing of his/her tracking data. However, as every automated method, they are not 100% reliable and a manual inspection of the data is recommended to ensure that all errors were correctly detected and corrected appropriately. It is also recommended that you keep the original tracking data untouched in order to facilitate the comparison between the data sets before and after the correction is applied. Note as well that the order in which you apply these different corrections will have an impact on the final state of your data.
Missing observations are probably the most common type of errors in tracking data. They can have a number of origins (e.g. GPS tag losing satellite connection, animal moving out of the camera field of view, read/write errors, etc.). The findMissing function is here to help you identify this missing data. Note that this function works better with data that was collected at regular time intervals. It will most likely return weird results if it is not the case.
missing <- findMissing(traj)
missing## time type
## 1 2015-09-10 07:00:02 MISSING
## 2 2015-09-10 07:00:03 MISSING
## 3 2015-09-10 07:00:24 NA
## 4 2015-09-10 07:00:34 NA
findMissing returns a data frame with two columns. The time column contains timestamps at which the function believes observations are missing. The type column indicates whether the observation is truly missing (i.e. was never recorded) or if the timestamp of the observation was set to NA during recording or during the importation of the data (e.g. because the date or time info were not formatted properly). You can then decide what to do about this missing data: ignore it, correct it with your own algorithm, or use the automated fixMissing function provided with the swaRm package.
traj01 <- fixMissing(traj)
traj01[1:40, ]## Trajectory table [40 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK
## 3: 01 2015-09-10 07:00:02 NA NA MISSING
## 4: 01 2015-09-10 07:00:03 NA NA MISSING
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK
## 6: 01 2015-09-10 07:00:05 15.76468 -22.37958 OK
## 7: 01 2015-09-10 07:00:06 NA -22.37958 OK
## 8: 01 2015-09-10 07:00:07 15.76467 NA OK
## 9: 01 2015-09-10 07:00:08 15.76467 -22.37959 OK
## 10: 01 2015-09-10 07:00:09 15.76467 -22.37959 OK
## 11: 01 2015-09-10 07:00:09 15.76467 -22.37959 OK
## 12: 01 2015-09-10 07:00:10 15.76467 -22.37959 OK
## 13: 01 2015-09-10 07:00:11 15.76467 -22.37959 OK
## 14: 01 2015-09-10 07:00:12 15.76467 -22.37959 OK
## 15: 01 2015-09-10 07:00:13 15.76467 -22.37959 OK
## 16: 01 2015-09-10 07:00:14 15.76467 -22.37959 OK
## 17: 01 2015-09-10 07:00:15 15.76467 -22.37960 OK
## 18: 01 2015-09-10 07:00:16 15.76466 -22.37960 OK
## 19: 01 2015-09-10 07:00:17 15.76466 -22.37960 OK
## 20: 01 2015-09-10 07:00:18 15.76466 -22.37960 OK
## 21: 01 2015-09-10 07:00:19 15.76466 -22.37959 OK
## 22: 01 2015-09-10 07:00:20 15.76467 -22.37959 OK
## 23: 01 2015-09-10 07:00:21 15.76467 -22.37959 OK
## 24: 01 2015-09-10 07:00:22 15.76467 -22.37959 OK
## 25: 01 2015-09-10 07:00:23 15.76467 -22.37959 OK
## 26: 01 2015-09-10 07:00:24 16.76467 -23.37959 NA
## 27: 01 2015-09-10 07:00:25 15.76467 -22.37959 OK
## 28: 01 2015-09-10 07:00:26 15.76468 -22.37960 OK
## 29: 01 2015-09-10 07:00:27 15.76468 -22.37960 OK
## 30: 01 2015-09-10 07:00:28 15.76468 -22.37960 OK
## 31: 01 2015-09-10 07:00:29 15.76468 -22.37960 OK
## 32: 01 2015-09-10 07:00:30 15.76468 -22.37960 OK
## 33: 01 2015-09-10 07:00:31 15.76468 -22.37960 OK
## 34: 01 2015-09-10 07:00:32 15.76468 -22.37960 OK
## 35: 01 2015-09-10 07:00:33 15.76468 -22.37960 OK
## 36: 01 2015-09-10 07:00:34 15.76469 -22.37960 NA
## 37: 01 2015-09-10 07:00:35 15.76469 -22.37960 OK
## 38: 01 2015-09-10 07:00:36 15.76469 -22.37960 OK
## 39: 01 2015-09-10 07:00:37 15.76469 -22.37960 OK
## 40: 01 2015-09-10 07:00:38 15.76468 -22.37960 OK
## id time lon lat error
Here the function added two new observations at rows 3 and 4 (indicated by the MISSING error tag) and fixed to NA timestamps at rows 26 and 36 (indicated by the NA error tag).
Note that all error correction function will add the extra error column to the trajectory table in order to indicate which observations were corrected. This can be useful for instance to check that the automated correction did work properly.
Duplicated timestamps are a less common type of errors. They can be caused by writing errors for instance. The function findTimeDup is here to help you identify these potential errors.
time_dup <- findTimeDup(traj01)
time_dup## [1] 11
The function fixTimeDup will then attempt to automatically correct the duplicated timestamp. If a timestamps is missing where the duplicated timestamp is, the function will replace the duplicated timestamp with the missing one. Otherwise, the duplicated timestamp will be replaced by NA.
traj01 <- fixTimeDup(traj01)
traj01[1:20, ]## Trajectory table [20 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK
## 3: 01 2015-09-10 07:00:02 NA NA MISSING
## 4: 01 2015-09-10 07:00:03 NA NA MISSING
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK
## 6: 01 2015-09-10 07:00:05 15.76468 -22.37958 OK
## 7: 01 2015-09-10 07:00:06 NA -22.37958 OK
## 8: 01 2015-09-10 07:00:07 15.76467 NA OK
## 9: 01 2015-09-10 07:00:08 15.76467 -22.37959 OK
## 10: 01 2015-09-10 07:00:09 15.76467 -22.37959 OK
## 11: 01 <NA> 15.76467 -22.37959 TIMEDUP
## 12: 01 2015-09-10 07:00:10 15.76467 -22.37959 OK
## 13: 01 2015-09-10 07:00:11 15.76467 -22.37959 OK
## 14: 01 2015-09-10 07:00:12 15.76467 -22.37959 OK
## 15: 01 2015-09-10 07:00:13 15.76467 -22.37959 OK
## 16: 01 2015-09-10 07:00:14 15.76467 -22.37959 OK
## 17: 01 2015-09-10 07:00:15 15.76467 -22.37960 OK
## 18: 01 2015-09-10 07:00:16 15.76466 -22.37960 OK
## 19: 01 2015-09-10 07:00:17 15.76466 -22.37960 OK
## 20: 01 2015-09-10 07:00:18 15.76466 -22.37960 OK
Here the duplicated timestamp corresponds to an already existing observation (hence the <NA> created by the fixTimeDup function in the time column to replace the duplicated timestamp). We will simply remove it here using the filter function in the dplyr package.
library(dplyr)
traj01 <- filter(traj01, -11)
traj01[1:20, ]## Trajectory table [20 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK
## 3: 01 2015-09-10 07:00:02 NA NA MISSING
## 4: 01 2015-09-10 07:00:03 NA NA MISSING
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK
## 6: 01 2015-09-10 07:00:05 15.76468 -22.37958 OK
## 7: 01 2015-09-10 07:00:06 NA -22.37958 OK
## 8: 01 2015-09-10 07:00:07 15.76467 NA OK
## 9: 01 2015-09-10 07:00:08 15.76467 -22.37959 OK
## 10: 01 2015-09-10 07:00:09 15.76467 -22.37959 OK
## 11: 01 2015-09-10 07:00:10 15.76467 -22.37959 OK
## 12: 01 2015-09-10 07:00:11 15.76467 -22.37959 OK
## 13: 01 2015-09-10 07:00:12 15.76467 -22.37959 OK
## 14: 01 2015-09-10 07:00:13 15.76467 -22.37959 OK
## 15: 01 2015-09-10 07:00:14 15.76467 -22.37959 OK
## 16: 01 2015-09-10 07:00:15 15.76467 -22.37960 OK
## 17: 01 2015-09-10 07:00:16 15.76466 -22.37960 OK
## 18: 01 2015-09-10 07:00:17 15.76466 -22.37960 OK
## 19: 01 2015-09-10 07:00:18 15.76466 -22.37960 OK
## 20: 01 2015-09-10 07:00:19 15.76466 -22.37959 OK
Inconsistent locations are another frequent type of errors in tracking data. They correspond to reported locations that are significantly off compared to the normal variability of the tracking systems. They can have multiple origins (e.g. GPS temporarily losing connection to one or more satellites, video recording showing increased noise, etc.) and they can create serious problems during the analysis of the data. For instance, here is a graph of the trajectory that we have created earlier.
library(ggplot2)
ggplot(traj01$data, aes(x = lon, y = lat)) +
geom_path() +
xlab("Longitude") + ylab("Latitude") +
coord_map()This does not look very good because one data point is way out of the actual range of the data. We will use the findLocErr function to identify this point and filter it out of the data when plotting.
loc_err <- findLocErr(traj01)
loc_err## [1] 25
ggplot(filter(traj01, -loc_err)$data, aes(x = lon, y = lat)) +
geom_path() +
xlab("Longitude") + ylab("Latitude") +
coord_map()This is much better!
Instead of filtering out the erroneous point, we can correct it using the fixLocErr function that will estimate its location using linear (default) or spline interpolation.
traj01 <- fixLocErr(traj01)
ggplot(traj01$data, aes(x = lon, y = lat)) +
geom_path() +
xlab("Longitude") + ylab("Latitude") +
coord_map()Interpolation should be used carefully as it will introduce artificial auto-correlation between successive data points. If this is probably fine to correct spurious errors, it might create more serious problem when several successive locations require correction. In this case, it might be better to simply filter out the errors or ignore the corresponding segment in the analyis.
In some occasions, the data can contain timestamps that have no location data associated with. This can occur for instance because a GPS unit could not acquire statellite signals. The fixMissing function seen earlier will also introduce such “NA” locations when adding missing timestamps to the trajectory table. You can discover these missing locations in your data by using the findLocNA function.
loc_NA <- findLocNA(traj01)
loc_NA## [1] 3 4 7 8
You can then automatically replace these missing data points using the fixLocNA function. This function will estimate the location of the missing points using linear (default) or spline interpolation.
traj01 <- fixLocNA(traj01)
traj01[1:20, ]## Trajectory table [20 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK
## 3: 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA
## 4: 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK
## 6: 01 2015-09-10 07:00:05 15.76468 -22.37958 OK
## 7: 01 2015-09-10 07:00:06 15.76467 -22.37958 lOCNA
## 8: 01 2015-09-10 07:00:07 15.76467 -22.37959 lOCNA
## 9: 01 2015-09-10 07:00:08 15.76467 -22.37959 OK
## 10: 01 2015-09-10 07:00:09 15.76467 -22.37959 OK
## 11: 01 2015-09-10 07:00:10 15.76467 -22.37959 OK
## 12: 01 2015-09-10 07:00:11 15.76467 -22.37959 OK
## 13: 01 2015-09-10 07:00:12 15.76467 -22.37959 OK
## 14: 01 2015-09-10 07:00:13 15.76467 -22.37959 OK
## 15: 01 2015-09-10 07:00:14 15.76467 -22.37959 OK
## 16: 01 2015-09-10 07:00:15 15.76467 -22.37960 OK
## 17: 01 2015-09-10 07:00:16 15.76466 -22.37960 OK
## 18: 01 2015-09-10 07:00:17 15.76466 -22.37960 OK
## 19: 01 2015-09-10 07:00:18 15.76466 -22.37960 OK
## 20: 01 2015-09-10 07:00:19 15.76466 -22.37959 OK
As said before, interpolation should be used carefully as it will introduce artificial auto-correlation between successive data points. If this is probably fine to correct spurious errors, it might create more serious problem when several successive locations require correction. In this case, it might be better to simply filter out the errors or ignore the corresponding segment in the analyis.
We have already seen earlier how to represent a trajectory using ggplot2. For geographic data, we can go one step further and use the ggmap package to download a satellite map of the study area and overlay the trajectory directly over it.
library(ggmap)
myLocation <- c(lon = mean(range(traj01[["lon"]])), lat = mean(range(traj01[["lat"]])))
myMap <- get_map(location = myLocation, source = "google",
maptype = "satellite", zoom = 17, scale = 2)
ggmap(myMap) +
geom_path(data = traj01$data, aes(x = lon, y = lat), color = "#D7601C", size = 1) +
xlab("Longitude") + ylab("Latitude") +
xlim(min(traj01[["lon"]]), max(traj01[["lon"]])) +
ylim(min(traj01[["lat"]]), max(traj01[["lat"]])) +
guides(color = FALSE)swaRm contains a number of functions to help you characterize the trajectories that you are working with. We will go over these different functions in the rest of this section. Most of them should be self-explanatory and I will therefore only provide example code. Note that I will be using the mutate function of the dplyr package in order to quickly append the calculated statistics to the trajectory table.
traj01 <- mutate(traj01,
linDist = linDist(x = lon, y = lat, geo = isGeo(traj01)),
cumDist = cumsum(linDist))
traj01## Trajectory table [3600 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error linDist
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK 0.0000000
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK 0.3322063
## 3: 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 4: 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK 0.1969157
## ---
## 3596: 01 2015-09-10 07:59:55 15.76231 -22.37752 OK 0.7561179
## 3597: 01 2015-09-10 07:59:56 15.76230 -22.37752 OK 0.7015594
## 3598: 01 2015-09-10 07:59:57 15.76229 -22.37752 OK 0.7937674
## 3599: 01 2015-09-10 07:59:58 15.76229 -22.37751 OK 0.9354126
## 3600: 01 2015-09-10 07:59:59 15.76228 -22.37751 OK 0.9846149
## cumDist
## 1: 0.0000000
## 2: 0.3322063
## 3: 0.5291220
## 4: 0.7260377
## 5: 0.9229534
## ---
## 3596: 933.9746911
## 3597: 934.6762505
## 3598: 935.4700180
## 3599: 936.4054306
## 3600: 937.3900455
Note that geo parameter of the linDist function is set to TRUE by the isGeo function in order to tell the linDist function that it will be handling geographic data. In this case, it uses the distGeo function from the geosphere package to return the distances in meters. If geo had been set to FALSE instead, the returned distances would have been in the units of the data. The same principle will apply for the functions calculating linear speeds and accelerations below.
traj01 <- mutate(traj01,
linSpeed = linSpeed(x = lon, y = lat, t = time, geo = isGeo(traj01)))
traj01## Trajectory table [3600 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error linDist
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK 0.0000000
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK 0.3322063
## 3: 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 4: 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK 0.1969157
## ---
## 3596: 01 2015-09-10 07:59:55 15.76231 -22.37752 OK 0.7561179
## 3597: 01 2015-09-10 07:59:56 15.76230 -22.37752 OK 0.7015594
## 3598: 01 2015-09-10 07:59:57 15.76229 -22.37752 OK 0.7937674
## 3599: 01 2015-09-10 07:59:58 15.76229 -22.37751 OK 0.9354126
## 3600: 01 2015-09-10 07:59:59 15.76228 -22.37751 OK 0.9846149
## cumDist linSpeed
## 1: 0.0000000 NA
## 2: 0.3322063 0.3322063
## 3: 0.5291220 0.1969157
## 4: 0.7260377 0.1969157
## 5: 0.9229534 0.1969157
## ---
## 3596: 933.9746911 0.7561179
## 3597: 934.6762505 0.7015594
## 3598: 935.4700180 0.7937674
## 3599: 936.4054306 0.9354126
## 3600: 937.3900455 0.9846149
traj01 <- mutate(traj01,
linAcc = linAcc(x = lon, y = lat, t = time, geo = isGeo(traj01)))
traj01## Trajectory table [3600 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error linDist
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK 0.0000000
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK 0.3322063
## 3: 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 4: 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK 0.1969157
## ---
## 3596: 01 2015-09-10 07:59:55 15.76231 -22.37752 OK 0.7561179
## 3597: 01 2015-09-10 07:59:56 15.76230 -22.37752 OK 0.7015594
## 3598: 01 2015-09-10 07:59:57 15.76229 -22.37752 OK 0.7937674
## 3599: 01 2015-09-10 07:59:58 15.76229 -22.37751 OK 0.9354126
## 3600: 01 2015-09-10 07:59:59 15.76228 -22.37751 OK 0.9846149
## cumDist linSpeed linAcc
## 1: 0.0000000 NA NA
## 2: 0.3322063 0.3322063 NA
## 3: 0.5291220 0.1969157 -1.352906e-01
## 4: 0.7260377 0.1969157 -6.040891e-10
## 5: 0.9229534 0.1969157 5.768741e-11
## ---
## 3596: 933.9746911 0.7561179 -1.968536e-01
## 3597: 934.6762505 0.7015594 -5.455845e-02
## 3598: 935.4700180 0.7937674 9.220803e-02
## 3599: 936.4054306 0.9354126 1.416451e-01
## 3600: 937.3900455 0.9846149 4.920234e-02
traj01 <- mutate(traj01,
heading = heading(x = lon, y = lat, geo = isGeo(traj01)))
traj01## Trajectory table [3600 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error linDist
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK 0.0000000
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK 0.3322063
## 3: 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 4: 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK 0.1969157
## ---
## 3596: 01 2015-09-10 07:59:55 15.76231 -22.37752 OK 0.7561179
## 3597: 01 2015-09-10 07:59:56 15.76230 -22.37752 OK 0.7015594
## 3598: 01 2015-09-10 07:59:57 15.76229 -22.37752 OK 0.7937674
## 3599: 01 2015-09-10 07:59:58 15.76229 -22.37751 OK 0.9354126
## 3600: 01 2015-09-10 07:59:59 15.76228 -22.37751 OK 0.9846149
## cumDist linSpeed linAcc heading
## 1: 0.0000000 NA NA 0.0000000
## 2: 0.3322063 0.3322063 NA 3.1415927
## 3: 0.5291220 0.1969157 -1.352906e-01 -2.7854511
## 4: 0.7260377 0.1969157 -6.040891e-10 -2.7854511
## 5: 0.9229534 0.1969157 5.768741e-11 -2.7854511
## ---
## 3596: 933.9746911 0.7561179 -1.968536e-01 -0.7491576
## 3597: 934.6762505 0.7015594 -5.455845e-02 -1.0775073
## 3598: 935.4700180 0.7937674 9.220803e-02 -1.1389831
## 3599: 936.4054306 0.9354126 1.416451e-01 -1.0775073
## 3600: 937.3900455 0.9846149 4.920234e-02 -1.2266459
As before, note that geo parameter of the heading function is set to TRUE by the isGeo function in order to tell the heading function that it will be handling geographic data. In this case, it uses the bearing function from the geosphere package to return the headings in radians. If geo had been set to FALSE instead, the heading would have been calculated using the atan2 function and returned in radians as well. The same principle will apply for the functions calculating angular speeds and accelerations below.
traj01 <- mutate(traj01,
angSpeed = angSpeed(x = lon, y = lat, t = time, geo = isGeo(traj01)))
traj01## Trajectory table [3600 observations]
## Number of tracks: 1
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error linDist
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK 0.0000000
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK 0.3322063
## 3: 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 4: 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK 0.1969157
## ---
## 3596: 01 2015-09-10 07:59:55 15.76231 -22.37752 OK 0.7561179
## 3597: 01 2015-09-10 07:59:56 15.76230 -22.37752 OK 0.7015594
## 3598: 01 2015-09-10 07:59:57 15.76229 -22.37752 OK 0.7937674
## 3599: 01 2015-09-10 07:59:58 15.76229 -22.37751 OK 0.9354126
## 3600: 01 2015-09-10 07:59:59 15.76228 -22.37751 OK 0.9846149
## cumDist linSpeed linAcc heading angSpeed
## 1: 0.0000000 NA NA 0.0000000 NA
## 2: 0.3322063 0.3322063 NA 3.1415927 3.141593e+00
## 3: 0.5291220 0.1969157 -1.352906e-01 -2.7854511 3.561416e-01
## 4: 0.7260377 0.1969157 -6.040891e-10 -2.7854511 -3.288902e-09
## 5: 0.9229534 0.1969157 5.768741e-11 -2.7854511 -4.539102e-09
## ---
## 3596: 933.9746911 0.7561179 -1.968536e-01 -0.7491576 5.870940e-01
## 3597: 934.6762505 0.7015594 -5.455845e-02 -1.0775073 -3.283497e-01
## 3598: 935.4700180 0.7937674 9.220803e-02 -1.1389831 -6.147583e-02
## 3599: 936.4054306 0.9354126 1.416451e-01 -1.0775073 6.147581e-02
## 3600: 937.3900455 0.9846149 4.920234e-02 -1.2266459 -1.491386e-01
traj01 <- mutate(traj01,
angAcc = angAcc(x = lon, y = lat, t = time, geo = isGeo(traj01)))
traj01While you can use swaRm to process and analyze single trajectories, its main goal is to deal with multiple trajectories at once in order to calculate statistics about the collective behavior of animal moving in groups. In this section we will explore the different functions that this package offers to do exactly this.
The first step will be to load all the sample trajectories that are provided with the swaRm package. There are 16 of them that correspond to the trajectories of 16 animals collected using GPS collars. We will also compute all the basic statistics of these trajectories using the functions that we have explored in the previous section. We will use the lapply function to create and process all the trajectory tables separately and then the rbindtt function to combine them all together. This function is a wrapper around the rbindlist function of the data.table package to make it work properly with trajectory tables.
filePaths <- dir(system.file("extdata", package = "swaRm"), full.names = TRUE)
trajs <- lapply(filePaths, function(path) {
dat <- read.csv(path)
makeTraj(x = dat$lon, y = dat$lat, id = gsub(".*/|.csv.*", "\\1", path),
date = dat$date, time = dat$time, date.format = "ymd", time.format = "hms",
geo = TRUE) %>%
fixMissing() %>%
fixTimeDup() %>% filter(!is.na(time)) %>%
fixLocErr() %>%
fixLocNA() %>%
mutate(linDist = linDist(lon, lat, geo = TRUE)) %>%
mutate(cumDist = cumsum(linDist)) %>%
mutate(linSpeed = linSpeed(lon, lat, time, geo = TRUE),
linAcc = linAcc(lon, lat, time, geo = TRUE),
heading = heading(lon, lat, geo = TRUE),
angSpeed = angSpeed(lon, lat, time, geo = TRUE),
angAcc = angAcc(lon, lat, time, geo = TRUE))
}) %>% rbindtt()
trajs## Trajectory table [57600 observations]
## Number of tracks: 16
## Geographic data: TRUE
## Dimensions: 2D
##
## id time lon lat error linDist
## 1: 01 2015-09-10 07:00:00 15.76468 -22.37957 OK 0.0000000
## 2: 01 2015-09-10 07:00:01 15.76468 -22.37957 OK 0.3322063
## 3: 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 4: 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 5: 01 2015-09-10 07:00:04 15.76468 -22.37958 OK 0.1969157
## ---
## 57596: 16 2015-09-10 07:59:55 15.76234 -22.37731 OK 0.9268808
## 57597: 16 2015-09-10 07:59:56 15.76233 -22.37731 OK 1.0534119
## 57598: 16 2015-09-10 07:59:57 15.76233 -22.37731 OK 0.8238941
## 57599: 16 2015-09-10 07:59:58 15.76231 -22.37731 OK 1.0298676
## 57600: 16 2015-09-10 07:59:59 15.76231 -22.37731 OK 0.8238941
## cumDist linSpeed linAcc heading angSpeed
## 1: 0.0000000 NA NA 0.000000 NA
## 2: 0.3322063 0.3322063 NA 3.141593 3.141593e+00
## 3: 0.5291220 0.1969157 -1.352906e-01 -2.785451 3.561416e-01
## 4: 0.7260377 0.1969157 -6.040891e-10 -2.785451 -3.288902e-09
## 5: 0.9229534 0.1969157 5.768741e-11 -2.785451 -4.539102e-09
## ---
## 57596: 929.4328272 0.9268808 9.557838e-02 -1.570796 -1.336042e-01
## 57597: 930.4862390 1.0534119 1.265310e-01 -1.358974 2.118219e-01
## 57598: 931.3101331 0.8238941 -2.295178e-01 -1.570796 -2.118219e-01
## 57599: 932.3400007 1.0298676 2.059735e-01 -1.570796 -6.644542e-09
## 57600: 933.1638948 0.8238941 -2.059735e-01 -1.570796 6.644542e-09
## angAcc
## 1: NA
## 2: NA
## 3: -2.785451e+00
## 4: -3.561416e-01
## 5: -1.250199e-09
## ---
## 57596: -4.606295e-03
## 57597: 3.454261e-01
## 57598: -4.236438e-01
## 57599: 2.118219e-01
## 57600: 1.328908e-08
Note the use of the pipe symbol (%>%) from the dplyr package to create a data processing pipeline. Read more about dplyr and %>% here.
myLocation <- c(lon = mean(range(trajs[["lon"]])), lat = mean(range(trajs[["lat"]])))
myMap <- get_map(location = myLocation, source = "google",
maptype = "satellite", zoom = 17, scale = 2)
ggmap(myMap) +
geom_path(data = trajs$data, aes(x = lon, y = lat, color = id), size = 0.8) +
xlab("Longitude") + ylab("Latitude") +
xlim(min(trajs[["lon"]]), max(trajs[["lon"]])) +
ylim(min(trajs[["lat"]]), max(trajs[["lat"]])) +
guides(color = FALSE)Now that we have loaded and cleaned up the trajectories, we can start working toward extracting the characteristics of the group behavior. swaRm provides a number of functions to compute the most common statistics of group movement. More will be added in the future as the package keeps being developed.
Most of the functions demonstrated below should be self-explanatory and I will therefore only provide example code. Note that I will be using functions of the dplyr package in order to facilitate the processing of the data over the entire trajectory table.
trajSummary <- group_by(trajs, time) %>%
do(centroid(lon, lat, geo = TRUE)) %>%
ungroup()
trajSummary## Trajectory table [3600 observations]
## Number of tracks: 0
## Geographic data: TRUE
## Dimensions: 2D
##
## Source: local data table [3,600 x 3]
##
## time lon lat
## (time) (dbl) (dbl)
## 1 2015-09-10 07:00:00 15.76443 -22.37962
## 2 2015-09-10 07:00:01 15.76443 -22.37962
## 3 2015-09-10 07:00:02 15.76443 -22.37962
## 4 2015-09-10 07:00:03 15.76443 -22.37962
## 5 2015-09-10 07:00:04 15.76444 -22.37962
## 6 2015-09-10 07:00:05 15.76444 -22.37962
## 7 2015-09-10 07:00:06 15.76444 -22.37962
## 8 2015-09-10 07:00:07 15.76444 -22.37962
## 9 2015-09-10 07:00:08 15.76444 -22.37962
## 10 2015-09-10 07:00:09 15.76444 -22.37962
## .. ... ... ...
ggmap(myMap) +
geom_path(data = trajs$data, aes(x = lon, y = lat, group = id), size = 0.8, color = "white") +
geom_path(data = trajSummary$data, aes(x = lon, y = lat), size = 1, color = "red") +
xlab("Longitude") + ylab("Latitude") +
xlim(min(trajs[["lon"]]), max(trajs[["lon"]])) +
ylim(min(trajs[["lat"]]), max(trajs[["lat"]]))trajs <- group_by(trajs, time) %>%
mutate(distToCentroid = dist2centroid(lon, lat, geo = TRUE)) %>%
ungroup()
trajs## Trajectory table [57600 observations]
## Number of tracks: 16
## Geographic data: TRUE
## Dimensions: 2D
##
## Source: local data table [57,600 x 13]
##
## id time lon lat error linDist
## (chr) (time) (dbl) (dbl) (chr) (dbl)
## 1 01 2015-09-10 07:00:00 15.76468 -22.37957 OK 0.0000000
## 2 01 2015-09-10 07:00:01 15.76468 -22.37957 OK 0.3322063
## 3 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 4 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 5 01 2015-09-10 07:00:04 15.76468 -22.37958 OK 0.1969157
## 6 01 2015-09-10 07:00:05 15.76468 -22.37958 OK 0.2442443
## 7 01 2015-09-10 07:00:06 15.76467 -22.37958 lOCNA 0.3267916
## 8 01 2015-09-10 07:00:07 15.76467 -22.37959 lOCNA 0.3267916
## 9 01 2015-09-10 07:00:08 15.76467 -22.37959 OK 0.3267916
## 10 01 2015-09-10 07:00:09 15.76467 -22.37959 OK 0.5400470
## .. ... ... ... ... ... ...
## Variables not shown: cumDist (dbl), linSpeed (dbl), linAcc (dbl), heading
## (dbl), angSpeed (dbl), angAcc (dbl), distToCentroid (dbl)
trajSummary <- group_by(trajs, time) %>%
summarize(distToCentroid = mean(distToCentroid)) %>%
merge(trajSummary, by = "time")
trajSummary## Trajectory table [3600 observations]
## Number of tracks: 0
## Geographic data: TRUE
## Dimensions: 2D
##
## Source: local data table [3,600 x 4]
##
## time distToCentroid lon lat
## (time) (dbl) (dbl) (dbl)
## 1 2015-09-10 07:00:00 29.23587 15.76443 -22.37962
## 2 2015-09-10 07:00:01 29.28079 15.76443 -22.37962
## 3 2015-09-10 07:00:02 29.23897 15.76443 -22.37962
## 4 2015-09-10 07:00:03 29.25190 15.76443 -22.37962
## 5 2015-09-10 07:00:04 29.28317 15.76444 -22.37962
## 6 2015-09-10 07:00:05 29.30297 15.76444 -22.37962
## 7 2015-09-10 07:00:06 29.36472 15.76444 -22.37962
## 8 2015-09-10 07:00:07 29.39793 15.76444 -22.37962
## 9 2015-09-10 07:00:08 29.44143 15.76444 -22.37962
## 10 2015-09-10 07:00:09 29.47574 15.76444 -22.37962
## .. ... ... ... ...
ggplot(trajSummary$data, aes(x = time, y = distToCentroid)) +
geom_line() +
xlab("Time") + ylab("Distance to the group's centroid (m)") +
ylim(0, NA)trajs <- group_by(trajs, time) %>%
mutate(nearNeighbor = nn(lon, lat, id, geo = TRUE),
nearNeighborDist = nnd(lon, lat, geo = TRUE))
trajs## Trajectory table [57600 observations]
## Number of tracks: 16
## Geographic data: TRUE
## Dimensions: 2D
##
## Source: local data table [57,600 x 15]
##
## id time lon lat error linDist
## (chr) (time) (dbl) (dbl) (chr) (dbl)
## 1 01 2015-09-10 07:00:00 15.76468 -22.37957 OK 0.0000000
## 2 01 2015-09-10 07:00:01 15.76468 -22.37957 OK 0.3322063
## 3 01 2015-09-10 07:00:02 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 4 01 2015-09-10 07:00:03 15.76468 -22.37958 MISSING+lOCNA 0.1969157
## 5 01 2015-09-10 07:00:04 15.76468 -22.37958 OK 0.1969157
## 6 01 2015-09-10 07:00:05 15.76468 -22.37958 OK 0.2442443
## 7 01 2015-09-10 07:00:06 15.76467 -22.37958 lOCNA 0.3267916
## 8 01 2015-09-10 07:00:07 15.76467 -22.37959 lOCNA 0.3267916
## 9 01 2015-09-10 07:00:08 15.76467 -22.37959 OK 0.3267916
## 10 01 2015-09-10 07:00:09 15.76467 -22.37959 OK 0.5400470
## .. ... ... ... ... ... ...
## Variables not shown: cumDist (dbl), linSpeed (dbl), linAcc (dbl), heading
## (dbl), angSpeed (dbl), angAcc (dbl), distToCentroid (dbl), nearNeighbor
## (chr), nearNeighborDist (dbl)
trajSummary <- group_by(trajs, time) %>%
summarize(nearNeighborDist = mean(nearNeighborDist)) %>%
merge(trajSummary, by = "time")
trajSummary## Trajectory table [3600 observations]
## Number of tracks: 0
## Geographic data: TRUE
## Dimensions: 2D
##
## Source: local data table [3,600 x 5]
##
## time nearNeighborDist distToCentroid lon lat
## (time) (dbl) (dbl) (dbl) (dbl)
## 1 2015-09-10 07:00:00 8.214665 29.23587 15.76443 -22.37962
## 2 2015-09-10 07:00:01 8.193252 29.28079 15.76443 -22.37962
## 3 2015-09-10 07:00:02 8.094282 29.23897 15.76443 -22.37962
## 4 2015-09-10 07:00:03 8.072977 29.25190 15.76443 -22.37962
## 5 2015-09-10 07:00:04 8.029790 29.28317 15.76444 -22.37962
## 6 2015-09-10 07:00:05 8.020695 29.30297 15.76444 -22.37962
## 7 2015-09-10 07:00:06 8.070632 29.36472 15.76444 -22.37962
## 8 2015-09-10 07:00:07 8.098252 29.39793 15.76444 -22.37962
## 9 2015-09-10 07:00:08 8.092327 29.44143 15.76444 -22.37962
## 10 2015-09-10 07:00:09 8.094541 29.47574 15.76444 -22.37962
## .. ... ... ... ... ...
ggplot(trajSummary$data, aes(x = time, y = nearNeighborDist)) +
geom_line() +
xlab("Time") + ylab("Mean distance to nearest neighbor (m)") +
ylim(0, NA)trajs <- group_by(trajs, time) %>%
mutate(isChull = isChull(lon, lat))
# Plot the convex hull for a randomly chose timestamp
randTime <- trajs[["time"]][sample(1:nrow(trajs), 1)]
subTraj <- filter(trajs, time == randTime)
chullPol <- filter(subTraj, isChull > 0) %>%
arrange(isChull)
myLocation <- c(lon = mean(range(subTraj[["lon"]])), lat = mean(range(subTraj[["lat"]])))
myMap <- get_map(location = myLocation, source = "google",
maptype = "satellite", zoom = 19, scale = 2)
ggmap(myMap) +
geom_polygon(data = chullPol$data, aes(lon, lat), fill = "red", alpha = 0.25) +
geom_point(data = subTraj$data, aes(lon, lat), size = 3, color = "white") +
geom_point(data = subTraj$data, aes(lon, lat, color = id), size = 2) +
xlab("Longitude") + ylab("Latitude") +
xlim(min(subTraj[["lon"]]), max(subTraj[["lon"]])) +
ylim(min(subTraj[["lat"]]), max(subTraj[["lat"]])) +
guides(color = FALSE)trajSummary <- group_by(trajs, time) %>%
summarize(chullPerim = chullPerimeter(lon, lat, geo = TRUE),
chullArea = chullArea(lon, lat, geo = TRUE)) %>%
merge(trajSummary, by = "time")
ggplot(trajSummary$data, aes(x = time, y = chullPerim)) +
geom_line() +
xlab("Time") + ylab("Perimeter of the group's convex hull (m)") +
ylim(0, NA)ggplot(trajSummary$data, aes(x = time, y = chullArea)) +
geom_line() +
xlab("Time") + ylab(bquote("Surface area of the group's convex hull (" * m ^ 2 * ")")) +
ylim(0, NA)The group’s general shape is estimated by fitting a confidence ellipse on either the coordinates of all the group members, or only those belonging to the convex hull of the group (we will use the first method here).
From this ellipse, we can extract two shape parameters:
trajSummary <- group_by(trajs, time) %>%
summarize(sphericity = sphericity(lon, lat)) %>%
merge(trajSummary, by = "time")
ggplot(trajSummary$data, aes(x = time, y = sphericity)) +
geom_line() +
xlab("Time") + ylab("Group's sphericity") +
ylim(0, 1)trajSummary <- group_by(trajs, time) %>%
summarize(stretch = stretch(lon, lat)) %>%
merge(trajSummary, by = "time")
ggplot(trajSummary$data, aes(x = time, y = stretch)) +
geom_line() +
xlab("Time") + ylab("Group's sphericity") +
ylim(-pi / 2, pi / 2)