Week 4 Exercise A: Segmentation

Author

Selina Lepori

Read and Tidy Data

Create a day and a weekday column.

# add weekday column
data <- data |>
  mutate(day = as.Date(time), weekday = weekdays(time))

Reproject the geometry column from degrees to metres to ensure the correct CRS (EPSG:2056), then add the coordinates as separate columns (in metres).

# geometry in metres, not degrees
data <- st_transform(data, crs=2056)

# add coordinates as separate columns (in metres)
coord <- st_coordinates(data)

# join
data_coord <- cbind(data, coord)

There are 28 duplicated values in the time column. These duplicated rows contain no speed value (cmt) and will be removed.

# there are 28 duplicated
data_coord |>
  st_drop_geometry() |> 
  select(track_seg_point_id, time, cmt, weekday) |> 
  filter(duplicated(time)) |>
  knitr::kable(format = "html") |> 
  kable_styling(font_size = 4)
Table 1. Overview of duplicated values in the time column.
track_seg_point_id time cmt weekday
6 2024-04-01 10:29:38 NA Monday
11 2024-04-01 11:09:37 NA Monday
14 2024-04-01 11:29:38 NA Monday
239 2024-04-06 23:26:22 NA Saturday
244 2024-04-07 00:07:06 NA Sunday
251 2024-04-07 01:06:40 NA Sunday
254 2024-04-07 02:26:23 NA Sunday
257 2024-04-07 02:46:24 NA Sunday
260 2024-04-07 03:06:37 NA Sunday
273 2024-04-07 05:04:08 NA Sunday
526 2024-04-10 06:44:25 NA Wednesday
813 2024-04-13 00:41:43 NA Saturday
820 2024-04-13 01:41:41 NA Saturday
829 2024-04-13 03:01:52 NA Saturday
831 2024-04-13 03:21:45 NA Saturday
846 2024-04-13 05:41:43 NA Saturday
852 2024-04-13 06:41:48 NA Saturday
858 2024-04-13 07:21:46 NA Saturday
877 2024-04-13 10:01:43 NA Saturday
1005 2024-04-14 19:26:44 NA Sunday
1012 2024-04-14 20:26:46 NA Sunday
1023 2024-04-14 22:06:42 NA Sunday
1026 2024-04-14 22:26:43 NA Sunday
1029 2024-04-14 22:46:43 NA Sunday
1197 2024-04-28 08:01:53 NA Sunday
1202 2024-04-28 08:41:52 NA Sunday
1205 2024-04-28 09:01:51 NA Sunday
1214 2024-04-28 10:21:52 NA Sunday
# remove the duplicated rows
data_coord <- data_coord |> 
  distinct(time, .keep_all=TRUE)

Select a sample of two days (April 1 and 2) and ensure the correct CRS.

data_sample <- data_coord |>
    st_as_sf(coords = c("X", "Y"), crs = 2056, remove = FALSE) |>
    filter(
      day >= "2024-04-01", 
      day < "2024-04-03"
      )

data_no_geo <- st_drop_geometry(data_sample)

Calculate the time lag to the next fix.
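
difftime_secs() is a helper from the course material and is not defined in this document; a minimal sketch of a definition consistent with its use below:

# assumed helper: time difference between two timestamps, in seconds
difftime_secs <- function(later, now) {
    as.numeric(difftime(later, now, units = "secs"))
}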

data_no_geo <- data_no_geo |>
    arrange(time) |> # make sure order is correct
    mutate(timestamp = difftime_secs(lead(time), time)) # time lag to the next fix, in seconds

Check the time lags.
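
The code that produced Table 2 is not shown; a sketch of how such a summary could be computed from the timestamp column (the column names mirror the table headers, and the 15-minute cut-off follows the discussion below):

data_no_geo |>
    summarise(
        count_na = sum(is.na(timestamp)),
        mean_interval = mean(timestamp, na.rm = TRUE),
        median_interval = median(timestamp, na.rm = TRUE),
        min_interval = min(timestamp, na.rm = TRUE),
        max_interval = max(timestamp, na.rm = TRUE),
        count_timestamp = sum(!is.na(timestamp)),
        count_ts_larger_15 = sum(timestamp > 15 * 60, na.rm = TRUE)
    ) |>
    knitr::kable()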

Table 2. Overview of time lags (values in seconds).
count_na mean_interval median_interval min_interval max_interval count_timestamp count_ts_larger_15
0 1419.659 600 3 43198 92 7
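
The plotting code for Figure 1 is likewise not shown; a possible sketch with ggplot2 (the axis choices and the dashed 15-minute reference line are assumptions):

library(ggplot2)

# sketch: time lag to the next fix over time, with a 15-minute reference line
ggplot(data_no_geo, aes(x = time, y = timestamp)) +
    geom_point() +
    geom_hline(yintercept = 15 * 60, linetype = "dashed") +
    labs(x = "Time", y = "Time lag to next fix (s)")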

Figure 1. Visualisation of (large) time lags.

There are 7 of the 92 time lag values that are larger than 15 minutes. Interestingly, the mean time lag is around 23 minutes, which indicates that the large outliers strongly influence the mean. Therefore, I also calculated the median, which is 10 minutes. Data gaps occur because the GPS does not record any data while the cat is at home. In addition, the cat sometimes goes outside without the GPS tracker, which might have happened in the night from April 1 to April 2 (data gap of around 12 hours).

Task 1: Calculate Distances

Step a): Specify a temporal window v

The sampling interval is 10 minutes. I will use a temporal window v of 40 minutes, which includes 4 fixes (two before and two after the current position). The Euclidean distances are calculated in step b.

Step b): Measure the distance (m) to every point within v

We can use the function distance_by_element from week 2 in combination with lead() and lag() to calculate the Euclidean distance.
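
distance_by_element() is not defined in this document; a minimal sketch of a definition consistent with its use below, assuming it wraps sf::st_distance() with by_element = TRUE:

# assumed helper: element-wise distance between two geometry columns, in metres
distance_by_element <- function(later, now) {
    as.numeric(st_distance(later, now, by_element = TRUE))
}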

data_sample <- data_sample |>
    mutate(
        nMinus2 = distance_by_element(lag(geometry, 2), geometry),  # distance -20 minutes
        nMinus1 = distance_by_element(lag(geometry, 1), geometry),  # distance -10 minutes
        nPlus1  = distance_by_element(geometry, lead(geometry, 1)), # distance +10 minutes
        nPlus2  = distance_by_element(geometry, lead(geometry, 2))  # distance +20 minutes
    )

Now we want to calculate the mean distance of nMinus2, nMinus1, nPlus1, nPlus2 for each row.

data_sample <- data_sample |>
    rowwise() |>
    mutate(
        stepMean = mean(c(nMinus2, nMinus1, nPlus1, nPlus2))
    ) |>
    ungroup()

Task 2: Specify and apply threshold d

Exploratory Data Analysis
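
The code behind Figure 2 is not shown; a minimal sketch of such an EDA with ggplot2 (the binwidth is an assumption):

# boxplot of the mean step length
ggplot(data_sample, aes(y = stepMean)) +
    geom_boxplot() +
    labs(y = "stepMean (m)")

# histogram of the mean step length
ggplot(data_sample, aes(x = stepMean)) +
    geom_histogram(binwidth = 10) +
    labs(x = "stepMean (m)", y = "Count")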

Figure 2. EDA, showing a boxplot and a histogram.

Figure 2 shows that the median stepMean is around 72 m. Most counts lie at the lower end of the stepMean range.

Summary of column stepMean.

summary(data_sample$stepMean)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
  10.99   50.08   71.89   79.14  103.81  191.79       4 
threshold <- 50 # use the first quartile as threshold

# store information of static or not static in a new column as TRUE or FALSE
data_sample <- data_sample |>
    mutate(static = stepMean < threshold)

The summary shows 4 NA values in stepMean, which result from lag() and lead() returning NA at the start and end of the sample. The first quartile of stepMean is around 50 m, which is chosen as the threshold for classifying a fix as «static».

Task 3: Visualise segmented trajectories

data_filter <- data_sample |>
    filter(!static) # filter the ones which are not static
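
The map itself is not reproduced here; a sketch of how Figure 3 could be drawn with ggplot2, using the X/Y columns in metres (the colour choices follow the caption):

# sketch: all fixes of the sample, coloured by the static flag
ggplot(data_sample, aes(X, Y)) +
    geom_path() +
    geom_point(aes(colour = static)) +
    scale_colour_manual(values = c(`FALSE` = "black", `TRUE` = "red")) +
    coord_fixed() +
    labs(colour = "static")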

Figure 3. The trajectory of Jiji. Red dots are static points, black dots are moving points.

Task 4: Segment-based analysis

Function for creating a unique ID for each subtrajectory.

rle_id <- function(vec) {
    x <- rle(vec)$lengths # lengths of consecutive runs of identical values
    as.factor(rep(seq_along(x), times = x)) # repeat the run index once per element
}
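
For illustration, rle_id() assigns a new ID whenever the value changes between consecutive elements:

# three runs -> IDs 1 1 2 2 2 3
rle_id(c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE))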

Create a unique ID for each subtrajectory and keep only the subtrajectories which are not static.

data_sample <- data_sample |>
    mutate(segment_id = rle_id(static))

data_filter2 <- data_sample |>
    filter(!static) # filter the ones which are not static

Then use segment_id as a grouping variable to determine each segment's duration and remove short segments (here, segments with a duration of less than 15 minutes).

data_filter2 <- data_filter2 |>
    arrange(time) |> # make sure order is correct
    group_by(segment_id) |> # group it by segment_id
    mutate(seg_duration = difftime_secs(lead(time), time)) # calculate time difference

Calculate the duration per subtrajectory.

data_filter2 |> 
  st_drop_geometry() |> 
  group_by(segment_id) |> 
  summarise(seg_duration=sum(seg_duration, na.rm=TRUE)) |> 
  mutate(seg_duration_min = seg_duration/60) |> 
  knitr::kable()
Table 3. Overview of segment durations in seconds (seg_duration) and minutes (seg_duration_min).
segment_id seg_duration seg_duration_min
3 1800 30.00000
5 3607 60.11667
7 1244 20.73333
9 624 10.40000
11 18377 306.28333
13 2395 39.91667
15 866 14.43333
17 2984 49.73333
19 7798 129.96667
21 3604 60.06667

Remove the subtrajectories with a duration of less than 15 minutes (subtrajectories 9 and 15).

data_filter2 <- data_filter2 |>
    filter(!segment_id %in% c("9", "15"))

Visualise.
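
Again the plotting code is not shown; a sketch of how Figure 4 could be produced, colouring the remaining moving fixes by segment_id:

# sketch: remaining segments only, coloured by segment ID
ggplot(data_filter2, aes(X, Y, colour = segment_id)) +
    geom_path() +
    geom_point() +
    coord_fixed() +
    labs(colour = "Segment ID")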

Figure 4. The trajectory of Jiji. Included are moving points only, coloured by remaining segment ID.