Week 4 Excersices

Author

Claude Widmer

Introduction Code

Load all Modules

require("readr")
require("sf")
require("dplyr")
require("ggplot2")
require("tidyr")
require("gridExtra")
require("SimilarityMeasures")

Load data

wildschwein <- read_delim("data/wildschwein_BE_2056.csv", ",")

# Careful! What Timezone is assumed?
sabi <- wildschwein |>
    st_as_sf(coords = c("E", "N"), crs = 2056, remove = FALSE) |>
    filter(
      TierName == "Sabi", 
      DatetimeUTC >= "2015-07-01", 
      DatetimeUTC < "2015-07-03"
      )
distance_by_element <- function(later, now) {
  as.numeric(
    st_distance(later, now, by_element = TRUE)
  )
}


sabi <- sabi |>
    mutate(
        nMinus2 = distance_by_element(lag(geometry, 2), geometry),  # distance to pos -30 minutes
        nMinus1 = distance_by_element(lag(geometry, 1), geometry),  # distance to pos -15 minutes
        nPlus1  = distance_by_element(geometry, lead(geometry, 1)), # distance to pos +15 mintues
        nPlus2  = distance_by_element(geometry, lead(geometry, 2))  # distance to pos +30 minutes
    )


sabi <- sabi |>
    rowwise() |>
    mutate(
        stepMean = mean(c(nMinus2, nMinus1, nPlus1, nPlus2))
    ) |>
    ungroup()


sabi <- sabi |>
    mutate(static = stepMean < mean(stepMean, na.rm = TRUE))

sabi

Simple feature collection with 192 features and 12 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2569724 ymin: 1204916 xmax: 2570927 ymax: 1205957
Projected CRS: CH1903+ / LV95
# A tibble: 192 × 13
   TierID TierName CollarID DatetimeUTC                E        N
 * <chr>  <chr>       <dbl> <dttm>                 <dbl>    <dbl>
 1 002A   Sabi        12275 2015-06-30 22:00:13 2569972. 1205366.
 2 002A   Sabi        12275 2015-06-30 22:16:06 2569975. 1205637.
 3 002A   Sabi        12275 2015-06-30 22:30:19 2570266. 1205857.
 4 002A   Sabi        12275 2015-06-30 22:45:13 2570208. 1205913.
 5 002A   Sabi        12275 2015-06-30 23:00:10 2570247. 1205731.
 6 002A   Sabi        12275 2015-06-30 23:15:17 2570512. 1205279.
 7 002A   Sabi        12275 2015-06-30 23:30:38 2570684. 1205103.
 8 002A   Sabi        12275 2015-06-30 23:45:16 2570526. 1205051.
 9 002A   Sabi        12275 2015-07-01 00:00:10 2570532. 1205044.
10 002A   Sabi        12275 2015-07-01 00:15:14 2570530. 1205059.
# ℹ 182 more rows
# ℹ 7 more variables: geometry <POINT [m]>, nMinus2 <dbl>, nMinus1 <dbl>,
#   nPlus1 <dbl>, nPlus2 <dbl>, stepMean <dbl>, static <lgl>

sabi_filter <- sabi |>
    filter(!static)

sabi_filter |>
    ggplot(aes(E, N)) +
    geom_point(data = sabi, col = "red") +
    geom_path() +
    geom_point() +
    coord_fixed() +
    theme(legend.position = "bottom")

Exercise A

Task 1: Calculate Distance

Now, you can Step a): Specify a temporal window v and Step b): Measure the distance to every point within v, which you had used with sabi, on on your own movement data or to a different wild boar using different sampling intervals.

Simple feature collection with 191 features and 13 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2568791 ymin: 1205847 xmax: 2570089 ymax: 1206864
Projected CRS: CH1903+ / LV95
# A tibble: 191 × 14
   TierID TierName CollarID DatetimeUTC                E        N
   <chr>  <chr>       <dbl> <dttm>                 <dbl>    <dbl>
 1 018A   Ruth        13974 2015-03-05 23:00:18 2569208. 1206239.
 2 018A   Ruth        13974 2015-03-05 23:15:16 2569110. 1206183.
 3 018A   Ruth        13974 2015-03-05 23:30:29 2569084. 1206206.
 4 018A   Ruth        13974 2015-03-05 23:45:16 2569075. 1206208.
 5 018A   Ruth        13974 2015-03-06 00:00:12 2569079. 1206200.
 6 018A   Ruth        13974 2015-03-06 00:15:08 2569050. 1206209.
 7 018A   Ruth        13974 2015-03-06 00:30:13 2569024. 1206247.
 8 018A   Ruth        13974 2015-03-06 00:45:09 2568908. 1206195.
 9 018A   Ruth        13974 2015-03-06 01:00:13 2568791. 1206086.
10 018A   Ruth        13974 2015-03-06 01:15:09 2568799. 1206075.
# ℹ 181 more rows
# ℹ 8 more variables: geometry <POINT [m]>, nMinus3 <dbl>, nMinus2 <dbl>,
#   nMinus1 <dbl>, nPlus1 <dbl>, nPlus2 <dbl>, nPlus3 <dbl>, stepMean <dbl>

Task 2: Specify and apply threshold d

After calculating the Euclidean distances to positions within the temporal window v in task 1, you can explore these values (we stored them in the column stepMean) using summary statistics (histograms, boxplot, summary()): This way we can define a reasonable threshold value to differentiate between stops and moves. There is no “correct” way of doing this, specifying a threshold always depends on data as well as the question that needs to be answered. In this exercise, use the mean of all stepMean values.

Store the new information (boolean to differentiate between stops (TRUE) and moves (FALSE)) in a new column named static.

-> I will use the 3rd Quartile, because it has many outliers. Hope this is okay :)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
  0.7882   2.0690   5.9740  69.9283 103.0305 526.0221        6

Simple feature collection with 191 features and 14 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2568791 ymin: 1205847 xmax: 2570089 ymax: 1206864
Projected CRS: CH1903+ / LV95
# A tibble: 191 × 15
   TierID TierName CollarID DatetimeUTC                E        N
 * <chr>  <chr>       <dbl> <dttm>                 <dbl>    <dbl>
 1 018A   Ruth        13974 2015-03-05 23:00:18 2569208. 1206239.
 2 018A   Ruth        13974 2015-03-05 23:15:16 2569110. 1206183.
 3 018A   Ruth        13974 2015-03-05 23:30:29 2569084. 1206206.
 4 018A   Ruth        13974 2015-03-05 23:45:16 2569075. 1206208.
 5 018A   Ruth        13974 2015-03-06 00:00:12 2569079. 1206200.
 6 018A   Ruth        13974 2015-03-06 00:15:08 2569050. 1206209.
 7 018A   Ruth        13974 2015-03-06 00:30:13 2569024. 1206247.
 8 018A   Ruth        13974 2015-03-06 00:45:09 2568908. 1206195.
 9 018A   Ruth        13974 2015-03-06 01:00:13 2568791. 1206086.
10 018A   Ruth        13974 2015-03-06 01:15:09 2568799. 1206075.
# ℹ 181 more rows
# ℹ 9 more variables: geometry <POINT [m]>, nMinus3 <dbl>, nMinus2 <dbl>,
#   nMinus1 <dbl>, nPlus1 <dbl>, nPlus2 <dbl>, nPlus3 <dbl>, stepMean <dbl>,
#   static <lgl>

Task 3: Visualize segmented trajectories

Now visualize the segmented trajectory spatially. Just like last week, you can use ggplot with geom_path(), geom_point() and coord_equal(). Assign colour = static within aes() to distinguish between segments with “movement” and without.

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Task 4: Segment-based analysis

In applying Laube and Purves (2011), we’ve come as far as step b) in Figure 16.1. In order to complete the last steps (c and d), we need a unique ID for each segment that we can use as a grouping variable. The following function does just that (it assigns unique IDs based on the column static which you created in Task 2). You will learn about functions next week. For now, just copy the following code chunk into your script and run it.

You can use the newly created function rle_id to assign unique IDs to subtrajectories (as shown below). Visualize the moving segments by colourizing them by segment_ID. Then use segment_ID as a grouping variable to determine the segments duration and remove short segments (e.g. segments with a duration < 5 Minutes)

Exercise B

Task 1: Similarity measures

We will now calculate similarties between trajectories using a new dataset pedestrian.csv (available on moodle). Download an import this dataset as a data.frame or tibble. It it a set of six different but similar trajectories from pedestrians walking on a path.

For this task, explore the trajectories first and get an idea on how the pedestrians moved.

Rows: 289 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl  (3): TrajID, E, N
dttm (1): DatetimeUTC

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Task 2: Calculate similarity

Install the package SimilarityMeasures (install.packages(“SimilarityMeasures”)). Familiarize yourself with this package by skimming through the function descriptions help(package = “SimilarityMeasures”). Now compare trajectory 1 to trajectories 2-6 using different similarity measures from the package. Your options are. DTW, EditDist, Frechet and LCSS.

Before visualizing your results think about the following: Which two trajectories to you percieve to be most similar, which are most dissimilar? Now visualize the results from the computed similarity measures. Which measure reflects your own intuition the closest?

-> 1 and 6 are the closest together. As seen in the attributes above and the plots.

Note: - All functions in the package need matrices as input, with one trajectory per matrix. - LCSStakes very long to compute. The accuracy of the algorithm (pointSpacing = ,pointDistance = and errorMarg =) can be varied to provide faster calculations.

Please see Vlachos, Gunopoulos, and Kollios (2002) for more information.

         DTW      EditDist Frechet  LCSS
TrajID 2 3650.025 45       28.54075 3   
TrajID 3 50785.51 47       2307.844 1   
TrajID 4 5906.787 42       1069.229 11  
TrajID 5 2178.411 28       717.9816 31  
TrajID 6 1152.718 27       38.96272 38