Week 4 Exercise B: Similarity

Figure 1. Visualisation of the 6 trajectories.

Before visualising your results, think about the following: which two trajectories do you perceive to be the most similar, and which the most dissimilar?

In terms of space, trajectories 1, 2, 3, and 6 should be the most similar. In terms of time, trajectories 1 and 6 should be the closest.

Now visualise the results from the computed similarity measures. Which measure reflects your own intuition the closest?

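# packages: dtw (dtw), SimilarityMeasures (Frechet, EditDist, LCSS),
# reshape2 (melt), readr (read_delim)
library(dtw)
library(SimilarityMeasures)
library(reshape2)
library(readr)

# numeric datetime column (seconds since 1970-01-01 00:00:00 UTC)
data_ped$DatetimeNum <- as.numeric(as.POSIXct(data_ped$DatetimeUTC,
                                              format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))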

# split the data into individual trajectories and create list containing the data
# of the individual trajectories as matrices
trajectories <- lapply(split(data_ped[, c("E", "N", "DatetimeNum")],
                             data_ped$TrajID), as.matrix)
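As a quick sanity check of what this preparation step produces, here is the same conversion and split applied to a tiny made-up data frame (`toy`, `toy_traj` and all values are hypothetical, not taken from the pedestrian data). It shows that each list element is a numeric matrix with the columns E, N and DatetimeNum, and that DatetimeNum counts seconds. Assuming E and N are in metres, this means that in every Euclidean-based measure below a one-second offset in time weighs as much as a one-metre offset in space.

```r
# Hypothetical stand-in for data_ped: two short trajectories with made-up values
toy <- data.frame(
  TrajID = c(1, 1, 2, 2),
  E = c(0, 1, 0, 2),
  N = c(0, 0, 1, 1),
  DatetimeUTC = c("2015-03-01 12:00:00", "2015-03-01 12:00:10",
                  "2015-03-01 12:00:00", "2015-03-01 12:00:05")
)

# Same conversion as above: seconds since 1970-01-01 00:00:00 UTC
toy$DatetimeNum <- as.numeric(as.POSIXct(toy$DatetimeUTC,
                                         format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# Same split as above: one numeric matrix per trajectory, named by TrajID
toy_traj <- lapply(split(toy[, c("E", "N", "DatetimeNum")], toy$TrajID), as.matrix)

names(toy_traj)                         # "1" "2"
dim(toy_traj[["1"]])                    # 2 rows (fixes), 3 columns (E, N, time)
diff(toy_traj[["1"]][, "DatetimeNum"])  # a 10-second step between the two fixes
```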

Calculate DTW

DTW <- numeric(5)

# Compute DTW distances between trajectory "1" and trajectories "2" to "6"
for (i in 2:6) {
  DTW[i-1] <- dtw(trajectories[["1"]], trajectories[[as.character(i)]])$distance
}
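To make explicit what the DTW distance measures, here is a minimal sketch of the dynamic-programming recursion behind it: every point of one trajectory is matched to at least one point of the other, the matching may stretch or compress either trajectory, and the result is the cheapest total matching cost. `dtw_sketch` is a hypothetical helper, not part of the dtw package, and it uses unweighted steps. If I read the dtw documentation correctly, `dtw()` defaults to `step.pattern = symmetric2`, which counts diagonal steps twice, so the numbers will not match `dtw(...)$distance` exactly.

```r
# Hypothetical helper (not the dtw package): plain DTW with a Euclidean
# local cost and unweighted steps.
dtw_sketch <- function(a, b) {
  n <- nrow(a); m <- nrow(b)
  # D[i + 1, j + 1] = cheapest cumulative cost of aligning a[1:i] with b[1:j]
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in 1:n) {
    for (j in 1:m) {
      cost <- sqrt(sum((a[i, ] - b[j, ])^2))    # Euclidean distance of the two points
      D[i + 1, j + 1] <- cost + min(D[i, j + 1],  # advance in a only
                                    D[i + 1, j],  # advance in b only
                                    D[i, j])      # advance in both
    }
  }
  D[n + 1, m + 1]
}

# Made-up 2-D trajectories (columns E, N) for illustration
a       <- cbind(E = c(0, 1, 2),    N = c(0, 0, 0))
b_slow  <- cbind(E = c(0, 0, 1, 2), N = c(0, 0, 0, 0))  # same path, one repeated fix
b_shift <- cbind(E = c(0, 1, 2),    N = c(1, 1, 1))     # same shape, 1 m to the north

dtw_sketch(a, b_slow)   # 0: the repeated fix is absorbed by the warping
dtw_sketch(a, b_shift)  # 3: every matched pair is 1 m apart
```

Because the real matrices also contain DatetimeNum, the same recursion applied to them mixes the spatial and temporal offsets into one cost.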

Calculate Fréchet Distance

FréchetDistance <- numeric(5)

# Compute FDs between trajectory "1" and trajectories "2" to "6"
for (i in 2:6) {
  FréchetDistance[i-1] <- Frechet(trajectories[["1"]], trajectories[[as.character(i)]])
}
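The Fréchet distance is the shortest "leash" that lets two walkers traverse both trajectories forwards without ever being further apart than the leash. The sketch below computes the discrete Fréchet distance, which only looks at the recorded fixes and is an upper bound of the continuous version; as far as I know, `SimilarityMeasures::Frechet` computes the continuous one, so values can differ. `frechet_discrete` is a hypothetical helper. The outlier example shows why the measure reacts so strongly to a single bad fix: the distance is a maximum, not a sum.

```r
# Hypothetical helper: discrete Fréchet distance (not SimilarityMeasures::Frechet)
frechet_discrete <- function(a, b) {
  n <- nrow(a); m <- nrow(b)
  d <- function(i, j) sqrt(sum((a[i, ] - b[j, ])^2))
  # C[i, j] = shortest leash needed to couple a[1:i] with b[1:j]
  C <- matrix(NA_real_, n, m)
  for (i in 1:n) {
    for (j in 1:m) {
      if (i == 1 && j == 1) {
        prev <- 0
      } else if (i == 1) {
        prev <- C[1, j - 1]
      } else if (j == 1) {
        prev <- C[i - 1, 1]
      } else {
        prev <- min(C[i - 1, j], C[i, j - 1], C[i - 1, j - 1])
      }
      C[i, j] <- max(prev, d(i, j))  # the leash must cover the worst pair so far
    }
  }
  C[n, m]
}

# Made-up trajectories for illustration
a     <- cbind(E = c(0, 1, 2), N = c(0, 0, 0))
b     <- cbind(E = c(0, 1, 2), N = c(1, 1, 1))  # parallel, 1 m away
b_out <- cbind(E = c(0, 1, 2), N = c(1, 5, 1))  # as b, but the middle fix is an outlier

frechet_discrete(a, b)      # 1
frechet_discrete(a, b_out)  # 5: one outlier sets the whole distance
```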

Calculate Edit Distance

EditDistance <- numeric(5)

# Compute ED between trajectory "1" and trajectories "2" to "6"
for (i in 2:6) {
  EditDistance[i-1] <- EditDist(trajectories[["1"]], trajectories[[as.character(i)]],
                                pointDistance=2)
}
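Edit distance counts how many insertions, deletions or substitutions are needed to turn one trajectory into the other, where two points count as equal if they lie within a threshold of each other. The sketch below follows the textbook EDR definition with a per-dimension threshold `eps`; my understanding is that `EditDist`'s `pointDistance` plays this role, but the package may differ in details. `edr_sketch` is a hypothetical helper. Note that an outlier costs exactly one edit however far off it is, and that with the real matrices the threshold also applies to the time column (in seconds).

```r
# Hypothetical helper: textbook EDR with a per-dimension matching threshold eps
edr_sketch <- function(a, b, eps) {
  n <- nrow(a); m <- nrow(b)
  # two points "match" if they differ by at most eps in every column
  is_match <- function(i, j) all(abs(a[i, ] - b[j, ]) <= eps)
  # E[i + 1, j + 1] = minimum number of edits turning a[1:i] into b[1:j]
  E <- matrix(0, n + 1, m + 1)
  E[, 1] <- 0:n  # delete all of a[1:i]
  E[1, ] <- 0:m  # insert all of b[1:j]
  for (i in 1:n) {
    for (j in 1:m) {
      subst <- if (is_match(i, j)) 0 else 1
      E[i + 1, j + 1] <- min(E[i, j] + subst,  # match or substitute
                             E[i, j + 1] + 1,  # delete a[i]
                             E[i + 1, j] + 1)  # insert b[j]
    }
  }
  E[n + 1, m + 1]
}

# Made-up trajectories for illustration
a     <- cbind(E = c(0, 1, 2), N = c(0, 0, 0))
b     <- cbind(E = c(0, 1, 2), N = c(1, 1, 1))   # parallel, 1 m away
b_out <- cbind(E = c(0, 1, 2), N = c(1, 5, 1))   # moderate outlier
b_far <- cbind(E = c(0, 1, 2), N = c(1, 50, 1))  # extreme outlier

edr_sketch(a, b, eps = 2)      # 0: all points within the threshold
edr_sketch(a, b_out, eps = 2)  # 1
edr_sketch(a, b_far, eps = 2)  # 1: the outlier's size does not matter
```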

Calculate LCSS

# LCSS <- numeric(5)
# 
# Compute LCSS between trajectory "1" and trajectories "2" to "6"
# for (i in 2:6) {
#   LCSS[i-1] <- LCSS(trajectories[["1"]], trajectories[[as.character(i)]], errorMarg=20,
#                     returnTrans = FALSE)
# }
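LCSS counts how many points of the two trajectories can be paired up in order while staying within a distance threshold, so unlike the three measures above, a larger value means more similar. The sketch below is the basic recursion with a per-dimension threshold `eps` and an optional index window `delta`; `lcss_sketch` is a hypothetical helper. As far as I can tell, `SimilarityMeasures::LCSS` additionally searches over spatial translations of one trajectory (controlled by `errorMarg`), which would explain why it takes so long to compute.

```r
# Hypothetical helper: basic LCSS without the translation search
lcss_sketch <- function(a, b, eps, delta = Inf) {
  n <- nrow(a); m <- nrow(b)
  # L[i + 1, j + 1] = longest common subsequence length of a[1:i] and b[1:j]
  L <- matrix(0, n + 1, m + 1)
  for (i in 1:n) {
    for (j in 1:m) {
      # points pair up if within eps in every column and at most delta indices apart
      is_close <- all(abs(a[i, ] - b[j, ]) <= eps) && abs(i - j) <= delta
      L[i + 1, j + 1] <- if (is_close) L[i, j] + 1 else max(L[i, j + 1], L[i + 1, j])
    }
  }
  L[n + 1, m + 1]
}

# Made-up trajectories for illustration
a     <- cbind(E = c(0, 1, 2), N = c(0, 0, 0))
b     <- cbind(E = c(0, 1, 2), N = c(1, 1, 1))  # parallel, 1 m away
b_out <- cbind(E = c(0, 1, 2), N = c(1, 5, 1))  # middle fix is an outlier

lcss_sketch(a, b, eps = 2)      # 3: every point pairs up
lcss_sketch(a, b_out, eps = 2)  # 2: the outlier is simply left out
```

Dividing the result by `min(nrow(a), nrow(b))` gives a share of matched points, which makes trajectories of different lengths easier to compare.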

Combine the results into one data frame and reshape it to long format.

# df <- as.data.frame(cbind(DTW, FréchetDistance, EditDistance, LCSS,
#                           "comparison"=seq(2,6)))
# 
# df_long <- melt(df, variable.name="method", id="comparison")

Save the results as CSV and read them back in, because LCSS takes a long time to compute.

# write.csv(df_long, "data/similarity.csv", row.names = FALSE)

data_sim <- read_delim("data/similarity.csv", delim = ",")

data_sim$comparison <- as.factor(data_sim$comparison)

Visualise.

Figure 2. Similarity of trajectory 1 to each of trajectories 2-6, computed with the different methods.

FD and DTW are both time sensitive, which is why their plots look rather similar. The highest similarity is between trajectories 1 and 3, which is interesting: judging from Figure 1 alone, I would have expected trajectories 1 and 5/6 to be the most similar, yet they have the lowest values in the FD and DTW comparison. The low similarity between trajectories 1 and 5 might be explained by the sensitivity of both methods to outliers.

EDR (edit distance) and LCSS are not time sensitive, allow for gaps, and are not sensitive to outliers. Still, their plots in Figure 2 look very different. EDR has a high tolerance for noise and for missing points, which might be why all comparisons are equally high. LCSS, in turn, ignores outliers.
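Part of the difficulty in comparing the panels of Figure 2 is that the measures live on different scales and point in different directions: DTW, the Fréchet distance and the edit distance are distances (smaller means more similar), whereas LCSS counts matched points (larger means more similar). One way to put them side by side is to rescale each method to [0, 1] and flip LCSS, as in the sketch below. `to_dissimilarity` and its `similarity_methods` argument are hypothetical, and the demo values are made up. This keeps only the ranking of comparisons within each method and discards the absolute magnitudes.

```r
# Hypothetical helper: per-method min-max rescaling to a common [0, 1]
# dissimilarity scale (0 = most similar comparison within that method)
to_dissimilarity <- function(value, method, similarity_methods = "LCSS") {
  method <- as.character(method)
  out <- numeric(length(value))
  for (m in unique(method)) {
    idx <- method == m
    v <- value[idx]
    rng <- range(v)
    # rescale to [0, 1]; a method with identical values everywhere maps to 0
    s <- if (diff(rng) > 0) (v - rng[1]) / diff(rng) else rep(0, length(v))
    # similarity measures (larger = more similar) are flipped
    out[idx] <- if (m %in% similarity_methods) 1 - s else s
  }
  out
}

# Made-up values for illustration only
demo <- data.frame(method = rep(c("DTW", "LCSS"), each = 3),
                   value  = c(10, 20, 30, 1, 2, 3))
demo$dissim <- to_dissimilarity(demo$value, demo$method)
demo$dissim  # 0.0 0.5 1.0 1.0 0.5 0.0
```

Assuming `data_sim` has the `method` and `value` columns produced by `melt`, the same call would be `data_sim$dissim <- to_dissimilarity(data_sim$value, data_sim$method)`.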