E4

E4

Eine Trajectory beschreibt:

den Weg eines Objekts (z. B. einer Person, eines Autos oder eines Tieres) durch den Raum, aufgenommen über mehrere Zeitpunkte.

Input Segmentation

Static Fixes: whose average Euclidean distance to other fixes inside temporal window v is less than some threshold d.

  1. Specify a temporal windows  for in which to measure Euclidean distances.

  2. Measure the distance from every point to every other point within this temporal window .

  3. Remove “static points”: These are points where the average distance is less than a given threshold. This segments the trajectory into subtrajectories.

  4. Now remove short subtrajectories: These are trajectories with a short duration (whereas “short” is tbd).

Method on wild boar “Sabi Data

library("readr")
library("sf")
Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
library("dplyr")

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library("ggplot2")

Sabi<-read_delim("Data/wildschwein_BE_2056.csv")
Rows: 51246 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (2): TierID, TierName
dbl  (3): CollarID, E, N
dttm (1): DatetimeUTC

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Nun filtern wir nur einige Tage raus und machen darauf einen sf.

sabi <- Sabi |>
    st_as_sf(coords = c("E", "N"), crs = 2056, remove = FALSE) |>
    filter(
      TierName == "Sabi", 
      DatetimeUTC >= "2015-07-01", 
      DatetimeUTC < "2015-07-03"
      )
sabi
Simple feature collection with 192 features and 6 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2569724 ymin: 1204916 xmax: 2570927 ymax: 1205957
Projected CRS: CH1903+ / LV95
# A tibble: 192 × 7
   TierID TierName CollarID DatetimeUTC                E        N
 * <chr>  <chr>       <dbl> <dttm>                 <dbl>    <dbl>
 1 002A   Sabi        12275 2015-06-30 22:00:13 2569972. 1205366.
 2 002A   Sabi        12275 2015-06-30 22:16:06 2569975. 1205637.
 3 002A   Sabi        12275 2015-06-30 22:30:19 2570266. 1205857.
 4 002A   Sabi        12275 2015-06-30 22:45:13 2570208. 1205913.
 5 002A   Sabi        12275 2015-06-30 23:00:10 2570247. 1205731.
 6 002A   Sabi        12275 2015-06-30 23:15:17 2570512. 1205279.
 7 002A   Sabi        12275 2015-06-30 23:30:38 2570684. 1205103.
 8 002A   Sabi        12275 2015-06-30 23:45:16 2570526. 1205051.
 9 002A   Sabi        12275 2015-07-01 00:00:10 2570532. 1205044.
10 002A   Sabi        12275 2015-07-01 00:15:14 2570530. 1205059.
# ℹ 182 more rows
# ℹ 1 more variable: geometry <POINT [m]>
  1. Specify a temporal windows  for in which to measure Euclidean distances.

Das Sampling intervall ist 15min, also wenn man ein temporal window v of 60min nimmt, dann hat man im average 4 fixes included.

Euklidische Distanz:

  1. pos[n-2] to pos[n]

  2. pos[n-1] to pos[n]

  3. pos[n] to pos[n+1]

  4. pos[n] to pos[n+2]

    b. Measure the distance from every point to every other point within this temporal window v. Also die Euklidische Distanz zwischen v (60min) und allen Punkten innerhalb.

We define a temporal window of 60 minutes. Given a 15-minute sampling interval, this corresponds to 4 observations. Therefore, for each focal point we consider two preceding and two following positions (n−2, n−1, n+1, n+2) when computing local movement intensity. It is only about: How many neighbours matter and no distance calculation.

  • v = 4
  • each point is compared with the 2 points before and 2 points after

    position meaning
    n-2 30 min before
    n-1 15 min before
    n current
    n+1 15 min after
    n+2 30 min after
distance_by_element <- function(later, now) {
  as.numeric(
    st_distance(later, now, by_element = TRUE)
  )
}


sabi <- sabi |>
    mutate(
        nMinus2 = distance_by_element(lag(geometry, 2), geometry),  # distance to pos -30 minutes
        nMinus1 = distance_by_element(lag(geometry, 1), geometry),  # distance to pos -15 minutes
        nPlus1  = distance_by_element(geometry, lead(geometry, 1)), # distance to pos +15 mintues
        nPlus2  = distance_by_element(geometry, lead(geometry, 2))  # distance to pos +30 minutes
    )

Wir berechnen nurn die mean distance of nMinus2nMinus1nPlus1nPlus2 for each row. Since we want the mean value per Row, we have to explicitly specify this before mutate() with the function rowwise(). To remove this rowwise-grouping, we end the operation with ungroup().

sabi <- sabi |>
    rowwise() |>
    mutate(
        stepMean = mean(c(nMinus2, nMinus1, nPlus1, nPlus2))
    ) |>
    ungroup()

sabi
Simple feature collection with 192 features and 11 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 2569724 ymin: 1204916 xmax: 2570927 ymax: 1205957
Projected CRS: CH1903+ / LV95
# A tibble: 192 × 12
   TierID TierName CollarID DatetimeUTC                E        N
   <chr>  <chr>       <dbl> <dttm>                 <dbl>    <dbl>
 1 002A   Sabi        12275 2015-06-30 22:00:13 2569972. 1205366.
 2 002A   Sabi        12275 2015-06-30 22:16:06 2569975. 1205637.
 3 002A   Sabi        12275 2015-06-30 22:30:19 2570266. 1205857.
 4 002A   Sabi        12275 2015-06-30 22:45:13 2570208. 1205913.
 5 002A   Sabi        12275 2015-06-30 23:00:10 2570247. 1205731.
 6 002A   Sabi        12275 2015-06-30 23:15:17 2570512. 1205279.
 7 002A   Sabi        12275 2015-06-30 23:30:38 2570684. 1205103.
 8 002A   Sabi        12275 2015-06-30 23:45:16 2570526. 1205051.
 9 002A   Sabi        12275 2015-07-01 00:00:10 2570532. 1205044.
10 002A   Sabi        12275 2015-07-01 00:15:14 2570530. 1205059.
# ℹ 182 more rows
# ℹ 6 more variables: geometry <POINT [m]>, nMinus2 <dbl>, nMinus1 <dbl>,
#   nPlus1 <dbl>, nPlus2 <dbl>, stepMean <dbl>

Note that for the first two positions, we cannot calculate a stepMean since there is no Position n-2 for these positions. This is also true for the last to positions (lacking a position n+2).

c. Remove “static points”: These are points where the average distance is less than a given threshold. This segments the trajectory into subtrajectories.Now remove short subtrajectories: These are trajectories with a short duration (whereas “short” is tbd).

We can now determine if an animal is moving or not by specifying a threshold distance on stepMean. In our example, we use the mean value as a threshold: Positions with distances below this value are considered static.

sabi <- sabi |>
    mutate(static = stepMean < mean(stepMean, na.rm = TRUE))

sabi_filter <- sabi |>
    filter(!static)

sabi_filter |>
    ggplot(aes(E, N)) +
    geom_point(data = sabi, col = "red") +
    geom_path() +
    geom_point() +
    coord_fixed() +
    theme(legend.position = "bottom")

Exercise A: Segmentation

To my own Data:

library("readr")
library("sf")
library("dplyr")
library("ggplot2")

anna<-read_delim("Data/Annadata.csv")
Rows: 34562 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): user_id, weekday, place_name, transport_mode
dbl  (2): lon_x, lat_y
dttm (1): datetime

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Task 1: Calculate Distances

a) ein temporal window v spezifizieren

b) Distanz zu jedem Punkt innerhalb v

library(sf)
library(dplyr)
library(ggplot2)

anna$datetime <- as.POSIXct(anna$datetime)

Lon_x und lat_y sind Koordinaten und müssen ins richtige Koordinatensystem.

df_sf <- anna |>
  st_as_sf(
    coords = c("lon_x", "lat_y"),
    crs = 4326,
    remove = FALSE
  ) |>
  st_transform(2056)

Distanzfunktion wie oben mit dem Sabi Datensatz

distance_by_element <- function(from, to) {
  as.numeric(
    st_distance(from, to, by_element = TRUE)
  )
}

Distanzen innerhalb Fenster v berechnen –> von zwei Punkten zurück, ein Punkt zurück, ein Punkt vorwärts und ein Punkt rückwärts verwenden:

df_sf <- df_sf |>
  arrange(user_id, datetime) |>
  group_by(user_id) |>
  mutate(

    nMinus2 = distance_by_element(
      lag(geometry, 2),
      geometry
    ),

    nMinus1 = distance_by_element(
      lag(geometry, 1),
      geometry
    ),

    nPlus1 = distance_by_element(
      geometry,
      lead(geometry, 1)
    ),

    nPlus2 = distance_by_element(
      geometry,
      lead(geometry, 2)
    )
  )

Mittelwert berechnen der Distanzen:

df_sf <- df_sf |>
  mutate(
    stepMean = rowMeans(
      cbind(nMinus2, nMinus1, nPlus1, nPlus2),
      na.rm = TRUE
    )
  )

Task 2: Specifiy a treshhold d

Dazu nutzt man alle StepMean values: Es kommt 90.814 raus.

threshold_d <- mean(df_sf$stepMean, na.rm = TRUE)

threshold_d
[1] 90.81416
df_sf <- df_sf |>
  mutate(
    static = stepMean < threshold_d
  )

visualisieren:

  • Histogram

  • Boxplot

  • Summary()

ggplot(df_sf, aes(stepMean)) +
  geom_histogram(bins = 30)

ggplot(df_sf, aes(y = stepMean)) +
  geom_boxplot()

summary(df_sf$stepMean)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    0.00    21.24    32.81    90.81    78.50 17006.93 
names(df_sf)
 [1] "user_id"        "datetime"       "weekday"        "place_name"    
 [5] "transport_mode" "lon_x"          "lat_y"          "geometry"      
 [9] "nMinus2"        "nMinus1"        "nPlus1"         "nPlus2"        
[13] "stepMean"       "static"        

Segmentierte Trajectories visualisieren

ggplot(df_sf) +
  geom_path(aes(x = lon_x, y = lat_y)) +
  geom_point(aes(x = lon_x, y = lat_y, colour = static)) +
  coord_equal()

Segment ID s machen:

rle_id <- function(vec) {
  x <- rle(vec)$lengths
  as.factor(rep(seq_along(x), times = x))
}

df_sf <- df_sf |>
  mutate(
    segment_id = rle_id(static)
  )

Visualisieren:

ggplot(filter(df_sf, static == FALSE)) +
  geom_path(
    aes(
      x = lon_x,
      y = lat_y,
      colour = segment_id,
      group = segment_id
    )
  ) +
  coord_equal()

Viel zu viele.. Segmentdauer berechnen und die kurzen entfernen, die zb kürzer als 5min sind.

segment_summary <- df_sf |>
  group_by(segment_id, static) |>
  summarise(
    start = min(datetime),
    end = max(datetime),
    duration_min = as.numeric(
      difftime(end, start, units = "mins")
    ),
    .groups = "drop"
  )

valid_segments <- segment_summary |>
  filter(duration_min >= 5) |>
  pull(segment_id)

df_filtered <- df_sf |>
  filter(segment_id %in% valid_segments)

Methodisch wäre bei unregelmässigen Daten wie meinen eigentlich besser mit Distanz pro Zeit oder ein echtes Zeitfenster…. aber für diese Übung kann man das glaube ich so machen.

Exercise B: Similarity

library(tidyverse)
Warning: package 'tibble' was built under R version 4.5.2
Warning: package 'tidyr' was built under R version 4.5.2
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ lubridate 1.9.4     ✔ tibble    3.3.1
✔ purrr     1.1.0     ✔ tidyr     1.3.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ped <- read.csv("Data/pedestrian.csv")

glimpse(ped)
Rows: 289
Columns: 4
$ TrajID      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ E           <dbl> 2571414, 2571396, 2571373, 2571347, 2571336, 2571321, 2571…
$ N           <dbl> 1205804, 1205791, 1205770, 1205753, 1205744, 1205732, 1205…
$ DatetimeUTC <chr> "2015-03-01T12:01:00Z", "2015-03-01T12:02:00Z", "2015-03-0…
head(ped)
  TrajID       E       N          DatetimeUTC
1      1 2571414 1205804 2015-03-01T12:01:00Z
2      1 2571396 1205791 2015-03-01T12:02:00Z
3      1 2571373 1205770 2015-03-01T12:03:00Z
4      1 2571347 1205753 2015-03-01T12:04:00Z
5      1 2571336 1205744 2015-03-01T12:05:00Z
6      1 2571321 1205732 2015-03-01T12:06:00Z

Variablen: TrajID, E, N, DatetimeUTC

library(ggplot2)
library(tidyverse)
library(SimilarityMeasures)

ggplot(ped, aes(x = E, y = N, color = factor(TrajID))) +
  geom_path() +
  theme_minimal()

ggplot(ped, aes(E, N)) +
  geom_path() +
  facet_wrap(~TrajID) +
  theme_minimal()

Man sieht den Verlauf jeder Person /ID in der Richtung und kann Unterschiede schon erkennen. Im zweiten kann man die Wege trennen und sieht jeden Einzel. Sie gehen sehr ähnlich ausser nr 4 und 5.

Trajectories trennen:

  • traj_matrices[[1]] = Trajektorie 1
  • traj_matrices[[2]] = Trajektorie 2 usw.
traj_list <- split(ped, ped$TrajID)

traj_matrices <- lapply(traj_list, function(df) {
  as.matrix(df[, c("E", "N")])
})

Similarities berechnen:

Ich vergleiche Trajectory 1 mit allen anderen:

  • Kleine Distanz = ähnliche Trajektorie

  • Große LCSS = ähnlich (anders als die anderen!)

  • DTW / Fréchet Methode sind gut für Formähnlichkeit

  • Edit Distance sind empfindlich auf Sampling

  • LCSS sind robust gegen “Noise”

t1 <- traj_matrices[[1]]

DTW:

dtw_vals <- sapply(2:6, function(i) {
  DTW(t1, traj_matrices[[i]])
})

Edit Distance:

edit_vals <- sapply(2:6, function(i) {
  EditDist(t1, traj_matrices[[i]])
})

Freché Distance:

frechet_vals <- sapply(2:6, function(i) {
  Frechet(t1, traj_matrices[[i]])
})

LCSS:

lcss_vals <- sapply(2:6, function(i) {
  LCSS(t1, traj_matrices[[i]],
       pointSpacing = 1,
       pointDistance = 1,
       errorMarg = 1)
})
results <- data.frame(
  Trajectory = 2:6,
  DTW = dtw_vals,
  EditDist = edit_vals,
  Frechet = frechet_vals,
  LCSS = lcss_vals
)

print(results)
  Trajectory       DTW EditDist    Frechet LCSS
1          2  3650.025       45   28.54075    0
2          3 50785.511       47 2307.84366    0
3          4  5906.787       42 1069.22917    0
4          5  2178.411       28  717.98159    1
5          6  1152.718       27   38.96272    2

Visualisieren mit reshape und den Methoden:

  • Mit facets pro Methode
library(reshape2)

Attaching package: 'reshape2'
The following object is masked from 'package:tidyr':

    smiths
res_long <- melt(results, id.vars = "Trajectory",
                 variable.name = "Method",
                 value.name = "Value")

ggplot(res_long, aes(x = factor(Trajectory), y = Value, fill = Method)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_minimal() +
  labs(x = "Trajectory", y = "Distance / Similarity Measure")

ggplot(res_long, aes(x = factor(Trajectory), y = Value, fill = factor(Trajectory))) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_wrap(~Method, scales = "free_y") +
  theme_minimal() +
  labs(
    x = "Trajectory comparison (1 vs others)",
    y = "Value",
    fill = "Trajectory",
    title = "Trajectory Similarity Measures"
  )

Trajectory 3 hat die grösste Distanz zu trajectory 1 in der DTW Methode. Der rest ist sehr ähnlich zu 1.

LCSS: Hier ist das trajectory 6 sehr ähnlich wie Trajectory 1 (je mehr desto ähnlicher)