Week 4 Exercise B: Similarity
Before visualising your results, think about the following: which two trajectories do you perceive to be most similar, and which are most dissimilar?
Concerning space, trajectories 1, 2, 3, and 6 will be more similar. Concerning time, trajectories 1 and 6 will be closest.
Now visualise the results from the computed similarity measures. Which measure reflects your own intuition most closely?
# numeric datetime column
data_ped$DatetimeNum <- as.numeric(as.POSIXct(data_ped$DatetimeUTC,
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
# split the data into individual trajectories and create list containing the data
# of the individual trajectories as matrices
trajectories <- lapply(split(data_ped[, c("E", "N", "DatetimeNum")],
  data_ped$TrajID), as.matrix)
Calculate DTW
DTW <- numeric(5)
# Compute DTW distances between trajectory "1" and trajectories "2" to "6"
for (i in 2:6) {
  DTW[i-1] <- dtw(trajectories[["1"]], trajectories[[as.character(i)]])$distance
}
Calculate Fréchet Distance
FréchetDistance <- numeric(5)
# Compute FDs between trajectory "1" and trajectories "2" to "6"
for (i in 2:6) {
  FréchetDistance[i-1] <- Frechet(trajectories[["1"]], trajectories[[as.character(i)]])
}
Calculate Edit Distance
EditDistance <- numeric(5)
# Compute ED between trajectory "1" and trajectories "2" to "6"
for (i in 2:6) {
  EditDistance[i-1] <- EditDist(trajectories[["1"]], trajectories[[as.character(i)]],
    pointDistance = 2)
}
Calculate LCSS
# LCSS <- numeric(5)
#
# Compute LCSS between trajectory "1" and trajectories "2" to "6"
# for (i in 2:6) {
# LCSS[i-1] <- LCSS(trajectories[["1"]], trajectories[[as.character(i)]], errorMarg=20,
# returnTrans = FALSE)
# }
Combine to data frame.
# df <- as.data.frame(cbind(DTW, FréchetDistance, EditDistance, LCSS,
# "comparison"=seq(2,6)))
#
# df_long <- melt(df, variable.name="method", id="comparison")
Save as CSV and read it back in, because LCSS takes a long time to compute.
# write.csv(df_long, "data/similarity_.csv", row.names = FALSE)
data_sim <- read_delim("data/similarity.csv")
data_sim$comparison <- as.factor(data_sim$comparison)
Visualise.
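One possible way to draw the comparison is sketched below with ggplot2. The helper name plot_similarity is hypothetical, and the sketch assumes the data frame has the columns comparison, method and value (the default names produced by melt() above), so adjust it if your CSV differs.

```r
library(ggplot2)

# Hypothetical helper: one panel per similarity measure, one bar per comparison.
# Assumes columns `comparison`, `method` and `value` (melt() defaults).
plot_similarity <- function(df) {
  ggplot(df, aes(x = comparison, y = value, fill = comparison)) +
    geom_col() +
    # free y scales, because the measures are on very different value ranges
    facet_wrap(~ method, scales = "free_y") +
    labs(x = "Trajectory compared with trajectory 1", y = "Value") +
    theme(legend.position = "none")
}
```

Calling plot_similarity(data_sim) would then give one panel per measure, which makes it easier to compare the ranking of the trajectories rather than the raw magnitudes.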
FD and DTW are both time sensitive, which is why their plots look rather similar. The highest similarity is between trajectories 1 and 3. This is interesting: just by looking at Figure 1, I would have suggested that 1 and 5/6 should be most similar, yet they have the lowest values in the actual FD and DTW comparison. However, the low similarity between 1 and 5 might be explained by the sensitivity of both methods to outliers.
EDR and LCSS are not time sensitive, allow for gaps, and are not sensitive to outliers. Still, their plots in Figure 2 look very different. EDR has a high tolerance for noise and for missing points, which might be why all comparisons are equally high. LCSS ignores outliers.
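The difference in outlier sensitivity can be illustrated with a small base R sketch. The functions dtw_dist and edr_dist are simplified textbook versions written for this illustration, not the implementations used by the packages above, and the three toy tracks are made up.

```r
# Euclidean distance between two points given as numeric vectors
pt_dist <- function(p, q) sqrt(sum((p - q)^2))

# Textbook DTW: sum of point distances along the cheapest warping path
dtw_dist <- function(a, b) {
  n <- nrow(a); m <- nrow(b)
  D <- matrix(Inf, n + 1, m + 1)
  D[1, 1] <- 0
  for (i in 1:n) for (j in 1:m) {
    cost <- pt_dist(a[i, ], b[j, ])
    D[i + 1, j + 1] <- cost + min(D[i, j + 1], D[i + 1, j], D[i, j])
  }
  D[n + 1, m + 1]
}

# Textbook EDR: number of edits; two points match if they differ
# by at most eps in every dimension
edr_dist <- function(a, b, eps) {
  n <- nrow(a); m <- nrow(b)
  D <- matrix(0, n + 1, m + 1)
  D[, 1] <- 0:n
  D[1, ] <- 0:m
  for (i in 1:n) for (j in 1:m) {
    sub <- if (all(abs(a[i, ] - b[j, ]) <= eps)) 0 else 1
    D[i + 1, j + 1] <- min(D[i, j] + sub, D[i, j + 1] + 1, D[i + 1, j] + 1)
  }
  D[n + 1, m + 1]
}

# Made-up toy tracks: a straight line, and two copies with one displaced point
base_track <- cbind(x = 0:4, y = 0)
small_outlier <- base_track
small_outlier[3, "y"] <- 1
big_outlier <- base_track
big_outlier[3, "y"] <- 100

dtw_small <- dtw_dist(base_track, small_outlier)
dtw_big   <- dtw_dist(base_track, big_outlier)
edr_small <- edr_dist(base_track, small_outlier, eps = 0.5)
edr_big   <- edr_dist(base_track, big_outlier, eps = 0.5)
```

In this toy case the DTW value grows with the size of the displaced point (1 versus 100), whereas EDR counts it as a single edit in both cases. This is in line with the interpretation above, although real trajectories with different lengths and sampling rates will behave less cleanly.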