To download data, copy-paste its URL (seal_gps_data.csv > RAW > copy URL) and use readr::read_csv() to fetch it. For now, we will only download the GPS data.
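For example (a minimal sketch; the URL below is a hypothetical placeholder, use the one you copied from the repository):

my_gps <- readr::read_csv("https://raw.githubusercontent.com/<user>/<repo>/main/seal_gps_data.csv") # hypothetical URL, paste yours here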
Tip
readr is part of the Tidyverse “metapackage”, like ggplot2, dplyr, lubridate, etc. In the following course, we will mostly rely on tidyverse functions because they tend to be more efficient than their base R equivalents (like read.csv).
Tip
It is good practice to always clean your environment when starting a new script. This avoids using too much RAM and accidentally calling objects left in your environment by other scripts.
# clean environment
rm(list = ls())
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 605265 32.4 1378294 73.7 702068 37.5
Vcells 1112040 8.5 8388608 64.0 1927682 14.8
# load packages
library(tidyverse)
library(sf)
Warning: package 'sf' was built under R version 4.5.2
You now have a “tibble” containing GPS data and timestamps for 3 individuals.
Step 2: Cleaning GPS data
Before plotting any map, we need to make sure that our dataset is clean. GPS loggers sometimes generate obviously wrong points (hundreds of km away from the actual animal). A simple way to spot them is to calculate the speed between two consecutive locations. If this speed is higher than the maximum speed of your species, you need to remove those points so they don’t interfere with your analyses.
A good rule of thumb is that pinnipeds rarely exceed a maximum speed of 2.5 m/s. Try writing a script to filter unlikely locations based on speed. To do that, you will need to:
Convert your data frame to a spatial point data frame using sf::st_as_sf() and set the GPS CRS using sf::st_crs() and EPSG:4326. sf is the standard package for handling spatial data in R.
Calculate the duration (seconds) between location T and T-1 using dplyr::mutate() and lag().
Calculate the distance (meters) between location T and T-1 using sf::st_distance().
Compute the speed for each location.
Before checking whether you have any abnormal speeds, remember that your data frame contains 3 different tracks. You will also need to remove the speed, distance and time-interval values for the first GPS point of each seal. Again, use lag() to find the first row of each seal track, and combine it with ifelse() to remove the values.
ifelse statements
ifelse() allows you to change a value based on a statement. Its syntax is ifelse(statement, value when TRUE, value when FALSE). If you refer to vectors or data frame columns in the statement (e.g. df$colA > df$colB), ifelse() will compare every pair of values.
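A quick toy example:

x <- c(1, 5, 10)
ifelse(x > 3, "high", "low") # returns "low" "high" "high"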
Filter the ones with a speed > 2.5 m/s.
Bonus step: do this filter inside a while() loop which will re-compute new speed, distance and time values after filtering locations based on the speed value.
Using lag() and dplyr::mutate()
To code these last 3 points within the Tidyverse framework, you will need lag(), which compares row i of your data frame with row i-1, and dplyr::mutate(), which computes a new column based on existing ones in a data frame.
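The Tidyverse chunk that produced the tibble below is not shown; a minimal reconstruction, assuming sample_data alternates two ids over the values 1 to 10:

sample_data <- tibble(id = rep(c("A", "B"), 5), value = 1:10)

# new_val is the difference between each value and the previous one
sample_data <- sample_data %>%
  mutate(new_val = value - lag(value))
sample_data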
# A tibble: 10 × 3
id value new_val
<chr> <int> <int>
1 A 1 NA
2 B 2 1
3 A 3 1
4 B 4 1
5 A 5 1
6 B 6 1
7 A 7 1
8 B 8 1
9 A 9 1
10 B 10 1
# base R version
sample_data$new_val <- sample_data$value - lag(sample_data$value)
You will notice that Tidyverse functions often use the pipe (%>%). This operator passes the object on its left (here, your data frame) as the first argument of the function on its right. Its keyboard shortcut is Ctrl + Shift + M.
Important
If you get overwhelmed, remember to check the help page of a function (e.g. ?mutate).
Step 2 solution
First we compute the metrics. See how we are:
setting the CRS (projection)
using commands like %>% and mutate() to manipulate our data
using ifelse()
sf_gps <- st_as_sf(my_gps, coords = c("longitude", "latitude"))
st_crs(sf_gps) <- 4326 # set GPS CRS

sf_gps <- sf_gps %>%
  mutate(distance_to_prev = as.numeric(st_distance(geometry, lag(geometry), by_element = TRUE)), # compute distances in meters
         interval_secs = as.numeric(difftime(timestamp, lag(timestamp), units = "secs")), # compute time interval in seconds
         speed = distance_to_prev / interval_secs # compute speed in m/s
  )

# remove speed, time and distance when switching from one individual to another
sf_gps$distance_to_prev <- ifelse(sf_gps$id != lag(sf_gps$id), NA, sf_gps$distance_to_prev)
sf_gps$interval_secs <- ifelse(sf_gps$id != lag(sf_gps$id), NA, sf_gps$interval_secs)
sf_gps$speed <- ifelse(sf_gps$id != lag(sf_gps$id), NA, sf_gps$speed)
We now see that we have 6 points with a speed > 2.5 m/s, reaching up to 290.15 m/s. Seems a bit high for a seal…
summary(sf_gps$speed)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.1127 0.3447 0.6255 0.7319 290.1517 3
# what does the "which" function do?
length(which(sf_gps$speed > 2.5))
[1] 6
Let’s remove these points (note that we use a while() loop here).
while (any(sf_gps$speed > 2.5, na.rm = TRUE)) {
  sf_gps <- sf_gps %>% filter(speed < 2.5) # filter out points with speed > 2.5 m/s

  # compute again speed, time and distance
  sf_gps <- sf_gps %>%
    mutate(distance_to_prev = as.numeric(st_distance(geometry, lag(geometry), by_element = TRUE)),
           interval_secs = as.numeric(difftime(timestamp, lag(timestamp), units = "secs")),
           speed = distance_to_prev / interval_secs)

  sf_gps$distance_to_prev <- ifelse(sf_gps$id != lag(sf_gps$id), NA, sf_gps$distance_to_prev)
  sf_gps$interval_secs <- ifelse(sf_gps$id != lag(sf_gps$id), NA, sf_gps$interval_secs)
  sf_gps$speed <- ifelse(sf_gps$id != lag(sf_gps$id), NA, sf_gps$speed)
}
summary(sf_gps$speed)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.1124 0.3433 0.4615 0.7299 2.4898 3
length(which(sf_gps$speed >2.5))
[1] 0
Step 3: Interpolating tracks with a constant time step
For multiple reasons, GPS data collected by loggers do not have a constant time step between locations. With diving animals, for instance, a logger cannot record a GPS location when underwater because it cannot reach the satellites.
print(summary(sf_gps$interval_secs))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
300 347 424 648 617 15061 3
If we check the time step in our data, we see that there are at least 300 secs (5 minutes) between 2 GPS locations. The average time step is around 10 minutes (648 secs) and goes up to 15061 secs (~4 hours)!
Finding the right cutoff for interpolation can be tricky. Here, we will use the 90th percentile of our time step, which is around 20 minutes.
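A quick way to check that cutoff (a sketch; the exact call is not shown in the original):

quantile(sf_gps$interval_secs, probs = 0.9, na.rm = TRUE) # ~1200 secs, i.e. around 20 minutes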
Our goal is now to regularize our GPS track to have one location every 20 minutes. To do so, use the adehabitatLT R package to:
Transform your data frame into an ltraj object using as.ltraj(). Using sf::st_coordinates() you can extract the x and y coordinates of your tracks.
Interpolate the tracks to have one location every 20 minutes using redisltraj().
Convert back your ltraj object to a sf data frame using adehabitatLT::ld() and sf::st_as_sf().
You want to do this because ltraj objects are specific to adehabitatLT, and are therefore quite annoying to work with if you want to plot maps or attach depth information to your track, as we do here.
Step 3 solution
First we convert to ltraj.
library(adehabitatLT)
Warning: package 'adehabitatLT' was built under R version 4.5.1
Warning: package 'ade4' was built under R version 4.5.1
Warning: package 'adehabitatMA' was built under R version 4.5.1
# Create an ltraj object
ltraj_data <- as.ltraj(xy = st_coordinates(sf_gps), # coordinates (x, y)
                       date = sf_gps$timestamp,     # timestamps
                       id = sf_gps$id,              # animal IDs
                       typeII = TRUE)               # TRUE if timestamps are provided, FALSE otherwise

# Print the ltraj object
print(ltraj_data)
*********** List of class ltraj ***********
Type of the traject: Type II (time recorded)
* Time zone: UTC *
Irregular traject. Variable time lag between two locs
Characteristics of the bursts:
id burst nb.reloc NAs date.begin date.end
1 dewey dewey 1182 0 2017-03-06 21:36:07 2017-03-15 23:31:15
2 huey huey 1061 0 2017-03-09 04:30:09 2017-03-15 23:54:09
3 louie louie 1084 0 2017-03-06 22:54:53 2017-03-15 23:57:33
infolocs provided. The following variables are available:
[1] "pkey"
As you can see from the output, we have a variable time step between our locations. Let’s use redisltraj() to interpolate our tracks (20 minutes × 60 seconds):
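The interpolation chunk itself is not shown; a minimal sketch, assuming the result is converted back to the sf_traj object used in the rest of this tutorial:

# interpolate to one location every 20 minutes (20 * 60 seconds)
ltraj_interp <- redisltraj(ltraj_data, 20 * 60, type = "time")

# convert back to a data frame, then to sf (ld() keeps the x, y, date, dt and id columns)
sf_traj <- ld(ltraj_interp) %>%
  st_as_sf(coords = c("x", "y"), crs = 4326)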
Now that we are sure our GPS dataset is clean, we can finally map it. Because we use sf, we can easily plot our data with ggplot2::geom_sf().
Mapping with ggplot2 and sf
We can also add a shapefile (i.e. mask) of the coastline. You can find it online in the GitHub repository. Download the “Coastline_UTM30” folder and load the .shp file in R using sf::st_read().
Another useful sf function is sf::st_bbox(), which computes the x and y limits of your spatial data. We can then use it inside our ggplot map with coord_sf() to specify the limits of our study area.
When plotting with ggplot2, remember that each line is a layer, and each layer is plotted on top of the previous one.
Finally, the ggspatial package allows you to easily add a scale bar and a north arrow.
r = getOption("repos")
r["CRAN"] = "http://cran.us.r-project.org"
options(repos = r)
install.packages("ggspatial")
package 'ggspatial' successfully unpacked and MD5 sums checked
The downloaded binary packages are in
C:\Users\mrc24pxb\AppData\Local\Temp\Rtmp6PJCD5\downloaded_packages
library(ggspatial)
library(viridis)

bbox <- st_bbox(sf_traj)

# download all files from the folder "Coastline_UTM30" and upload them from the "files" tab; be careful with your path!!
coastline <- st_read("~/MarineTopPredators_MasterCourse/OSX4025_MovementEcology/Practicals/Coastline/Coastline_UTM30.shp", quiet = TRUE)
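The mapping chunk is not reproduced here; a minimal sketch of the map described above (reprojecting the coastline to WGS84 so all layers and limits share the CRS of sf_traj is an assumption, not necessarily the original approach):

# reproject the coastline to match the GPS data (EPSG:4326)
coastline_ll <- st_transform(coastline, 4326)

ggplot() +
  geom_sf(data = coastline_ll, fill = "antiquewhite") + # coastline layer first
  geom_sf(data = sf_traj, aes(col = id)) +              # GPS points on top
  coord_sf(xlim = c(bbox["xmin"], bbox["xmax"]),
           ylim = c(bbox["ymin"], bbox["ymax"])) +      # study area limits
  annotation_scale(location = "br", width_hint = 0.5) + # ggspatial scale bar
  annotation_north_arrow(location = "tr", which_north = "true",
                         style = north_arrow_fancy_orienteering) +
  theme_bw() +
  labs(x = "Longitude", y = "Latitude", col = "Seal ID")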
Next, we load the dive summary data (my_depth). As you can see, this data frame has the same id column as our GPS data. Here, each row summarizes one dive. Some loggers, like the one used here, do not record detailed dive profiles; instead, they summarize each dive with a few key values to save power and memory. Here we see that the logger recorded the timestamps of when the seal started and ended each dive, along with the maximum depth reached during that dive.
When dealing with new data, it is good practice to explore it through plots or with summary(), table(), or dplyr::group_by().
Try to find, for each seal:
The overall maximum depth it reached
Using group_by() and summarise()
You can use dplyr::group_by() and dplyr::summarise(). group_by() groups rows based on the values of one or more columns. After group_by(), summarise() lets you calculate new variables for each group.
See how we create a summary dataset that you can reuse later.
The average duration of its dives
Tip
Remember, you learned how to compute time intervals in Step 2.
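Step 6 solution - maximum depth
The solution chunk for this output is missing from the page; a minimal reconstruction with group_by() and summarise() (the column names match the output below):

my_depth %>%
  group_by(id) %>%                      # one group per seal
  summarise(max_depth = max(max_depth)) # overall maximum depth per seal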
# A tibble: 3 × 2
id max_depth
<chr> <dbl>
1 dewey 39.2
2 huey 26.1
3 louie 28
Step 6 solution - average dive duration
my_depth %>%
  mutate(dive_duration_sec = as.numeric(difftime(dive_end, dive_start, units = "secs"))) %>%
  group_by(id) %>%
  summarise(mean_div_dur_sec = mean(dive_duration_sec))
# A tibble: 3 × 2
id mean_div_dur_sec
<chr> <dbl>
1 dewey 310.
2 huey 216.
3 louie 182.
As I said, plots are also a good way to explore your data. Here, let’s try to plot the dive profiles of our 3 individuals (depth ~ time)!
Code a ggplot() with dive start timestamps on the x-axis and depth on the y-axis. Give a different colour to each individual.
Bonus step: for clarity, you can try using facet_wrap() in your ggplot so that the profiles do not overlap.
Step 6 solution - plot
ggplot() +
  geom_point(data = my_depth, aes(x = dive_start, y = -max_depth, col = id)) +
  facet_wrap(~id) +
  theme_bw() +
  labs(x = "Time", y = "Depth")
3. Merging channels
We now have clean GPS and depth data, but both have a different temporal resolution. We have spatial data every 20 minutes, and depth data for each dive.
To merge these two sources, we need to summarize the higher-resolution one (i.e. dives) to fit the coarser-resolution one (i.e. GPS). To do so, we can take advantage of having IDs and timestamps in both datasets.
In R, a simple way to perform such a task is to use a for() loop combined with dplyr::filter().
for() loops
In biology, we often work with datasets—for example, a list of different species, sample measurements, or genetic sequences. Instead of manually analyzing each data point, we can use a for() loop to repeat the same operation on every item in a dataset.
Example:
species <- c("Adélie Penguin", "Emperor Penguin", "Chinstrap Penguin")
for (s in species) {
  print(s)
}
dplyr::filter() allows you to filter the rows of a data frame based on one or several statements. For instance, we can filter the depth data to only keep dives belonging to Huey with a maximum depth deeper than 20 m.
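A minimal sketch of that filter (assuming lowercase id values, as shown in the ltraj output above):

my_depth %>% filter(id == "huey" & max_depth > 20)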
Here, for each row of sf_traj, we want to add the average dive depth of the seal.
Merging depth and GPS solution
First, we need to initialize our new column with NAs.
sf_traj$mean_dive_depth <- NA
Then we create a for() loop which will go through all the rows of our GPS data frame one by one.
We ask this for() loop to find all the dives that happened between the timestamp of a GPS location and the previous one (i.e. over a 20-minute interval).
Because of that, we won’t be able to compute an average dive depth for the first location of a track. To exclude those from the loop:
We make it span from the second row of the GPS data frame to the last one (2:nrow(sf_traj)).
We compute an average dive depth only if the 2 consecutive GPS locations belong to the same individual.
Finally, during some intervals, no dives were recorded. With our current code, these locations have NA depth values. We change these to 0.
for(i in 2:nrow(sf_traj)){
  if(!is.na(sf_traj$dt[i])){
    # get time limits and id
    t_start <- sf_traj$date[i-1]
    t_end <- sf_traj$date[i]
    my_id <- sf_traj$id[i]

    # filter depth data
    temp_depth <- my_depth %>%
      filter(id == my_id & dive_start >= t_start & dive_start <= t_end) %>%
      filter(dive_end >= t_start & dive_end <= t_end)

    # paste value in the GPS data frame
    sf_traj$mean_dive_depth[i] <- mean(temp_depth$max_depth)
  }
}

# Change NAs to 0 when no dives were recorded.
sf_traj$mean_dive_depth <- ifelse(is.na(sf_traj$mean_dive_depth), 0, sf_traj$mean_dive_depth)
Now that you know how to merge different data streams, adapt the previous script to also compute the total time spent diving over each 20-minute interval.
Get time spent diving solution
sf_traj$mean_dive_depth <- NA
sf_traj$diving_duration_sec <- NA

my_depth <- my_depth %>%
  mutate(dive_duration_sec = as.numeric(difftime(dive_end, dive_start, units = "secs")))

for(i in 2:nrow(sf_traj)){
  if(!is.na(sf_traj$dt[i])){
    # get time limits and id
    t_start <- sf_traj$date[i-1]
    t_end <- sf_traj$date[i]
    my_id <- sf_traj$id[i]

    # filter depth data
    temp_depth <- my_depth %>%
      filter(id == my_id & dive_start >= t_start & dive_start <= t_end) %>%
      filter(dive_end >= t_start & dive_end <= t_end)

    # paste values in the GPS data frame
    sf_traj$mean_dive_depth[i] <- mean(temp_depth$max_depth)
    sf_traj$diving_duration_sec[i] <- sum(temp_depth$dive_duration_sec)
  }
}

# Change NAs to 0 when no dives were recorded.
sf_traj$mean_dive_depth <- ifelse(is.na(sf_traj$mean_dive_depth), 0, sf_traj$mean_dive_depth)
sf_traj$diving_duration_sec <- ifelse(is.na(sf_traj$diving_duration_sec), 0, sf_traj$diving_duration_sec)
We now have our final data frame, with combined depth and GPS data!
Let’s explore it a bit. Try to generate a new map where the size of each point depends on the diving depth.
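A minimal sketch of such a map (a bare-bones version without the coastline layer; the exact aesthetics are an assumption, not the original solution):

ggplot() +
  geom_sf(data = sf_traj, aes(size = mean_dive_depth, col = id), alpha = 0.5) + # point size scales with dive depth
  theme_bw() +
  labs(size = "Mean dive depth (m)", col = "Seal ID")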
From this map, we can already see some patterns in the seals’ diving behaviour. Diving depth seems to increase as they move further from the coast and the breeding site.
Tip
Attaching variables to spatial coordinates is also often used in Hidden Markov Models (HMMs, see tutorial n°3) to classify behaviours.
4. Introduction to spatial analyses
This simple visual analysis is often not enough. A common way to analyze such data is to convert tracks into “rasters”.
Rasters
A raster in R is like a digital image made up of tiny squares (pixels), where each pixel has a value that represents something in a specific geographic area. In biology and ecology, rasters are often used for spatial data such as temperature, vegetation cover, or sea ice concentration over a region. These grids can be stored in raster objects in R and analyzed with the terra package.
library(terra)

# Create a blank raster with 10 rows and 10 columns
r <- rast(nrows = 10, ncols = 10, xmin = 0, xmax = 100, ymin = 0, ymax = 100)

# Fill it with random temperature values
values(r) <- runif(ncell(r), min = -10, max = 30)

# Plot the raster
plot(r, main = "Temperature Map (°C)")
To do so, we first need to change the projection of our data. The UTM projection uses meters as units instead of degrees. We need this to specify the spatial resolution of our grid in meters (you could do it in degrees, it is just less intuitive). Here we use EPSG:25828 (ETRS89 / UTM zone 28N).
sf_traj_UTM <- st_transform(sf_traj, 25828) # UTM projection, change based on location
bbox_UTM <- st_bbox(sf_traj_UTM)
coastline_UTM <- st_transform(coastline, 25828)
Tip
How do we transform to Latitude and Longitude if we instead have UTM (sometimes noted as “Easting and Northing”, or “x and y”)?
Solution - from “Easting and Northing” to “Latitude and Longitude”
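The solution chunk is missing here; a one-line sketch, assuming we return to WGS84 (EPSG:4326) and name the result sf_traj_LatLon as used in the next solution:

# transform back from UTM (easting/northing) to longitude/latitude
sf_traj_LatLon <- st_transform(sf_traj_UTM, 4326)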
How do we get out of the sf package/format and go back to a simple data frame? You might need this step if you want to input your data into track2KBA, for example.
Solution - from sf to simple data frame
# extract geometry from sf and save as coordinates
tracks_coords <- do.call(rbind, st_geometry(sf_traj_LatLon)) %>%
  as_tibble(.name_repair = "minimal") %>%
  setNames(c("Longitude", "Latitude"))

# extract data (without geometry) and attach the coordinates
predicted_Locations_LatLon <- dplyr::select(as.data.frame(sf_traj_LatLon), -geometry)
predicted_Locations_LatLon <- cbind(predicted_Locations_LatLon, tracks_coords)

# specify again your animal ID as a factor
predicted_Locations_LatLon$id <- as.factor(predicted_Locations_LatLon$id)
Then we use terra::rast() to create a raster covering the extent of our GPS locations, with a resolution of 1 km (i.e. 1 km² cells).
# Define a raster grid
rGrid <- rast(ext(sf_traj_UTM), resolution = 1000) # resolution of 1 km
Once created, we can fill it with our values. Let’s say we want to know the average dive depth in each cell.
r_mean <- rasterize(vect(sf_traj_UTM), rGrid, field = "mean_dive_depth", fun = mean)
plot(r_mean)
We now have a raster showing the diving behaviour of our seals in our study area. To map it in a cleaner way, we need to convert it back to a data frame, as ggplot2 doesn’t handle rasters directly.
df_raster <- as.data.frame(r_mean, xy = TRUE, na.rm = TRUE)

ggplot() +
  geom_tile(data = df_raster, aes(x = x, y = y, fill = mean)) +
  geom_sf(data = coastline_UTM, fill = "antiquewhite") + # coastline shapefile
  coord_sf(xlim = c(bbox_UTM["xmin"], bbox_UTM["xmax"]),
           ylim = c(bbox_UTM["ymin"], bbox_UTM["ymax"])) + # study area limits
  annotation_scale(location = "br", width_hint = 0.5) + # scale bar and north arrow
  annotation_north_arrow(location = "tr", which_north = "true",
                         style = north_arrow_fancy_orienteering) +
  scale_fill_viridis(discrete = F) + # viridis is colourblind friendly
  theme_bw() +
  theme(panel.grid.major = element_line(color = gray(.5), linetype = "dashed", linewidth = 0.5),
        panel.background = element_rect(fill = "aliceblue")) +
  labs(x = "Longitude", y = "Latitude", fill = "Average dive depth")
Tip
Once rasterized, spatial data can be easily analyzed in R. We could, for instance, study the effect of the environment on the dive depth of seals by downloading a raster of the same resolution containing sea surface temperature, chlorophyll-a concentration, or any other variable. Such data are freely available at https://data.marine.copernicus.eu/products
Sources
Data sample from “Iorio-Merlo, Virginia et al. (2022), Prey encounters and spatial memory influence use of foraging patches in a marine central place forager, Proceedings of the Royal Society B: Biological Sciences” - https://doi.org/10.1098/rspb.2021.2261