require("readr")
require("sf")
require("dplyr")
require("ggplot2")
require("tidyr")
require("gridExtra")
require("SimilarityMeasures")Week 4 Excersices
Introduction Code
Load all Modules
Load data
wildschwein <- read_delim("data/wildschwein_BE_2056.csv", ",")
# Careful! What Timezone is assumed?
sabi <- wildschwein |>
st_as_sf(coords = c("E", "N"), crs = 2056, remove = FALSE) |>
filter(
TierName == "Sabi",
DatetimeUTC >= "2015-07-01",
DatetimeUTC < "2015-07-03"
)
distance_by_element <- function(later, now) {
as.numeric(
st_distance(later, now, by_element = TRUE)
)
}
sabi <- sabi |>
mutate(
nMinus2 = distance_by_element(lag(geometry, 2), geometry), # distance to pos -30 minutes
nMinus1 = distance_by_element(lag(geometry, 1), geometry), # distance to pos -15 minutes
nPlus1 = distance_by_element(geometry, lead(geometry, 1)), # distance to pos +15 mintues
nPlus2 = distance_by_element(geometry, lead(geometry, 2)) # distance to pos +30 minutes
)
sabi <- sabi |>
rowwise() |>
mutate(
stepMean = mean(c(nMinus2, nMinus1, nPlus1, nPlus2))
) |>
ungroup()
sabi <- sabi |>
mutate(static = stepMean < mean(stepMean, na.rm = TRUE))
sabiSimple feature collection with 192 features and 12 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2569724 ymin: 1204916 xmax: 2570927 ymax: 1205957
Projected CRS: CH1903+ / LV95
# A tibble: 192 × 13
TierID TierName CollarID DatetimeUTC E N
* <chr> <chr> <dbl> <dttm> <dbl> <dbl>
1 002A Sabi 12275 2015-06-30 22:00:13 2569972. 1205366.
2 002A Sabi 12275 2015-06-30 22:16:06 2569975. 1205637.
3 002A Sabi 12275 2015-06-30 22:30:19 2570266. 1205857.
4 002A Sabi 12275 2015-06-30 22:45:13 2570208. 1205913.
5 002A Sabi 12275 2015-06-30 23:00:10 2570247. 1205731.
6 002A Sabi 12275 2015-06-30 23:15:17 2570512. 1205279.
7 002A Sabi 12275 2015-06-30 23:30:38 2570684. 1205103.
8 002A Sabi 12275 2015-06-30 23:45:16 2570526. 1205051.
9 002A Sabi 12275 2015-07-01 00:00:10 2570532. 1205044.
10 002A Sabi 12275 2015-07-01 00:15:14 2570530. 1205059.
# ℹ 182 more rows
# ℹ 7 more variables: geometry <POINT [m]>, nMinus2 <dbl>, nMinus1 <dbl>,
# nPlus1 <dbl>, nPlus2 <dbl>, stepMean <dbl>, static <lgl>
sabi_filter <- sabi |>
filter(!static)
sabi_filter |>
ggplot(aes(E, N)) +
geom_point(data = sabi, col = "red") +
geom_path() +
geom_point() +
coord_fixed() +
theme(legend.position = "bottom")Exercise A
Task 1: Calculate Distance
Now, you can Step a): Specify a temporal window v and Step b): Measure the distance to every point within v, which you had used with sabi, on on your own movement data or to a different wild boar using different sampling intervals.
Simple feature collection with 191 features and 13 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2568791 ymin: 1205847 xmax: 2570089 ymax: 1206864
Projected CRS: CH1903+ / LV95
# A tibble: 191 × 14
TierID TierName CollarID DatetimeUTC E N
<chr> <chr> <dbl> <dttm> <dbl> <dbl>
1 018A Ruth 13974 2015-03-05 23:00:18 2569208. 1206239.
2 018A Ruth 13974 2015-03-05 23:15:16 2569110. 1206183.
3 018A Ruth 13974 2015-03-05 23:30:29 2569084. 1206206.
4 018A Ruth 13974 2015-03-05 23:45:16 2569075. 1206208.
5 018A Ruth 13974 2015-03-06 00:00:12 2569079. 1206200.
6 018A Ruth 13974 2015-03-06 00:15:08 2569050. 1206209.
7 018A Ruth 13974 2015-03-06 00:30:13 2569024. 1206247.
8 018A Ruth 13974 2015-03-06 00:45:09 2568908. 1206195.
9 018A Ruth 13974 2015-03-06 01:00:13 2568791. 1206086.
10 018A Ruth 13974 2015-03-06 01:15:09 2568799. 1206075.
# ℹ 181 more rows
# ℹ 8 more variables: geometry <POINT [m]>, nMinus3 <dbl>, nMinus2 <dbl>,
# nMinus1 <dbl>, nPlus1 <dbl>, nPlus2 <dbl>, nPlus3 <dbl>, stepMean <dbl>
Task 2: Specify and apply threshold d
After calculating the Euclidean distances to positions within the temporal window v in task 1, you can explore these values (we stored them in the column stepMean) using summary statistics (histograms, boxplot, summary()): This way we can define a reasonable threshold value to differentiate between stops and moves. There is no “correct” way of doing this, specifying a threshold always depends on data as well as the question that needs to be answered. In this exercise, use the mean of all stepMean values.
Store the new information (boolean to differentiate between stops (TRUE) and moves (FALSE)) in a new column named static.
-> I will use the 3rd Quartile, because it has many outliers. Hope this is okay :)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.7882 2.0690 5.9740 69.9283 103.0305 526.0221 6
Simple feature collection with 191 features and 14 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2568791 ymin: 1205847 xmax: 2570089 ymax: 1206864
Projected CRS: CH1903+ / LV95
# A tibble: 191 × 15
TierID TierName CollarID DatetimeUTC E N
* <chr> <chr> <dbl> <dttm> <dbl> <dbl>
1 018A Ruth 13974 2015-03-05 23:00:18 2569208. 1206239.
2 018A Ruth 13974 2015-03-05 23:15:16 2569110. 1206183.
3 018A Ruth 13974 2015-03-05 23:30:29 2569084. 1206206.
4 018A Ruth 13974 2015-03-05 23:45:16 2569075. 1206208.
5 018A Ruth 13974 2015-03-06 00:00:12 2569079. 1206200.
6 018A Ruth 13974 2015-03-06 00:15:08 2569050. 1206209.
7 018A Ruth 13974 2015-03-06 00:30:13 2569024. 1206247.
8 018A Ruth 13974 2015-03-06 00:45:09 2568908. 1206195.
9 018A Ruth 13974 2015-03-06 01:00:13 2568791. 1206086.
10 018A Ruth 13974 2015-03-06 01:15:09 2568799. 1206075.
# ℹ 181 more rows
# ℹ 9 more variables: geometry <POINT [m]>, nMinus3 <dbl>, nMinus2 <dbl>,
# nMinus1 <dbl>, nPlus1 <dbl>, nPlus2 <dbl>, nPlus3 <dbl>, stepMean <dbl>,
# static <lgl>
Task 3: Visualize segmented trajectories
Now visualize the segmented trajectory spatially. Just like last week, you can use ggplot with geom_path(), geom_point() and coord_equal(). Assign colour = static within aes() to distinguish between segments with “movement” and without.
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Task 4: Segment-based analysis
In applying Laube and Purves (2011), we’ve come as far as step b) in Figure 16.1. In order to complete the last steps (c and d), we need a unique ID for each segment that we can use as a grouping variable. The following function does just that (it assigns unique IDs based on the column static which you created in Task 2). You will learn about functions next week. For now, just copy the following code chunk into your script and run it.
You can use the newly created function rle_id to assign unique IDs to subtrajectories (as shown below). Visualize the moving segments by colourizing them by segment_ID. Then use segment_ID as a grouping variable to determine the segments duration and remove short segments (e.g. segments with a duration < 5 Minutes)
Exercise B
Task 1: Similarity measures
We will now calculate similarties between trajectories using a new dataset pedestrian.csv (available on moodle). Download an import this dataset as a data.frame or tibble. It it a set of six different but similar trajectories from pedestrians walking on a path.
For this task, explore the trajectories first and get an idea on how the pedestrians moved.
Rows: 289 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (3): TrajID, E, N
dttm (1): DatetimeUTC
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Task 2: Calculate similarity
Install the package SimilarityMeasures (install.packages(“SimilarityMeasures”)). Familiarize yourself with this package by skimming through the function descriptions help(package = “SimilarityMeasures”). Now compare trajectory 1 to trajectories 2-6 using different similarity measures from the package. Your options are. DTW, EditDist, Frechet and LCSS.
Before visualizing your results think about the following: Which two trajectories to you percieve to be most similar, which are most dissimilar? Now visualize the results from the computed similarity measures. Which measure reflects your own intuition the closest?
-> 1 and 6 are the closest together. As seen in the attributes above and the plots.
Note: - All functions in the package need matrices as input, with one trajectory per matrix. - LCSStakes very long to compute. The accuracy of the algorithm (pointSpacing = ,pointDistance = and errorMarg =) can be varied to provide faster calculations.
Please see Vlachos, Gunopoulos, and Kollios (2002) for more information.
DTW EditDist Frechet LCSS
TrajID 2 3650.025 45 28.54075 3
TrajID 3 50785.51 47 2307.844 1
TrajID 4 5906.787 42 1069.229 11
TrajID 5 2178.411 28 717.9816 31
TrajID 6 1152.718 27 38.96272 38