This tutorial presents a novel yet accessible approach that combines
cluster analysis of net displacement values with boosted regression
trees to classify migration patterns of Iberian little bustards. The
dataset used consists of satellite-tagged records of little bustards,
available on MoveBank. The tutorial explains the analyses conducted in
the study “Individual variability in migration patterns of Iberian
little bustards”.
First, we need to install and load all necessary packages for data processing, visualization, and modeling.
library(lattice)
library(nlme)
library(mgcv)
## This is mgcv 1.8-41. For overview type 'help("mgcv-package")'.
library(rpart)
library(kml)
## Warning: package 'kml' was built under R version 4.2.3
## Loading required package: clv
## Warning: package 'clv' was built under R version 4.2.3
## Loading required package: cluster
## Loading required package: class
## Loading required package: longitudinalData
## Warning: package 'longitudinalData' was built under R version 4.2.3
## Loading required package: rgl
## Loading required package: misc3d
library(longitudinalData)
library(devEMF)
## Warning: package 'devEMF' was built under R version 4.2.3
library(amt)
## Warning: package 'amt' was built under R version 4.2.3
##
## Attaching package: 'amt'
## The following object is masked from 'package:stats':
##
## filter
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.3
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:nlme':
##
## collapse
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(readxl)
Then, we set the working directory using the function
setwd("path/to/your/directory") and import the data.
head(lb)
## ID timestamp date longitude latitude day utm_x
## 1 LB45_2020 2020-04-01 01:29:00 2020-03-31 -5.444840 38.79970 1 433454.8
## 2 LB45_2020 2020-04-02 01:29:00 2020-04-01 -5.431840 38.81745 2 434526.2
## 3 LB45_2020 2020-04-03 01:29:00 2020-04-02 -5.420350 38.81321 3 435538.3
## 4 LB45_2020 2020-04-04 02:29:00 2020-04-04 -5.419658 38.81427 4 435594.9
## 5 LB45_2020 2020-04-05 00:28:00 2020-04-04 -5.417703 38.81401 5 435765.6
## 6 LB45_2020 2020-04-06 03:29:00 2020-04-06 -5.416944 38.81371 6 435832.6
## utm_y NSD
## 1 207178.1 0
## 2 209182.8 5166604
## 3 208740.9 6783433
## 4 208860.4 7410484
## 5 208836.7 8090911
## 6 208806.0 8304120
The dataset contains GPS tracking data for individual Iberian little
bustards, with each row representing a recorded location per day. Each
bird is identified by a unique ID, which combines the bird’s individual
tag and the tracking year (e.g., lb192286_2019 corresponds
to bird LB45 in the year 2019).
timestamp: The exact date and time of the GPS fix. For
our analysis, we retain only the first location per day.date: The calendar date (in YYYY-MM-DD format).longitude and latitude: The geographic
coordinates of the bird’s location in WGS84 format.utm_x and utm_y: The same coordinates
projected in UTM (in meters).day: The day of the year considering April 1st as day 1
and the beginning of the annual cycle.NSD: Net Squared Displacement, which corresponds to the
squared distance (in meters) from the initial capture location.
length(unique(lb$ID))
## [1] 71
The dataset comprises 71 annual cycles (hereafter tracks) from 46 adult individuals. Each track represents one full year of GPS locations for a given individual with at least 300 or more days. To analyse movement patterns, we use the Net Displacement, which is calculated as the square root of the Net Squared Displacement (NSD).
# Check the range of Net Squared Displacement (NSD) values
range(lb$NSD)
## [1] 0 168472881980
# Convert NSD to Net Displacement (ND) by taking the square root
lb$ND <- sqrt(lb$NSD)
# Check the range of ND (in meters)
range(lb$ND)
## [1] 0.0 410454.5
# Convert ND from meters to kilometers
lb$ND_km <- lb$ND/1000
# Check the final range of Net Displacement in kilometers
range(lb$ND_km)
## [1] 0.0000 410.4545
We generate one plot per bird, showing their Net Displacement (in
kilometres) across the days of the year. Each plot is saved as a
separate .jpg file, named according to the bird’s ID.
Loop through each bird and generate the plots:
# Get a list of all unique bird-year IDs
allbirds <- unique(lb$ID)
# Total number of tracked bird-year combinations
maxnb <- length(allbirds)
# Confirm that there are no missing Net Displacement values
sum(is.na(lb$ND_km)) # Should return 0
## [1] 0
# Create individual Net Displacement plots for each bird-year track
for (i in 1:maxnb) {
bird.i <- allbirds[i]
NetDis <- lb[lb$ID == bird.i, ]
# Set filename for each plot
file_name <- paste0(bird.i, ".jpg")
# Open JPEG device
jpeg(file = file_name)
# Create the plot
plot(NetDis$day, NetDis$ND_km,
xlab = "Day",
ylab = "Net Displacement (km)",
ylim = c(0, max(NetDis$ND_km, na.rm = TRUE)),
las = 1,
main = bird.i)
# Close the graphics device
dev.off()
}
We fit Boosted Regression Trees (BRT) to model the Net Displacement (ND) of each tracked bird over time. BRTs are particularly well-suited for this analysis because they can model complex, non-linear relationships and handle unbalanced ecological data with missing values.
BRT works by repeatedly splitting the data into more homogeneous groups based on a predictor variable (in this case, day). For our analysis, ND is the response variable, while day serves as the primary predictor.
A key advantage of BRT is its ability to handle missing data points.
When data are missing for the primary predictor, BRT can use surrogate
splits as alternative predictors that can ensure the model continues to
function even when some values are absent. This feature makes BRT
particularly robust for working with real-world ecological data that may
contain gaps.
We use the rpart function to fit regression trees for
each bird. This function models the relationship between ND
(ND_km) and day.
There are two key parameters that control how the model builds the trees:
minbucket: This parameter defines the minimum number of
observations required in any terminal node.minsplit: This is the minimum number of observations
required for a split to be attempted.minbucket = 2. For birds with more than 50 observations, we
rely on default settings for minsplit = 20 and
minbucket being one-third of minsplit.
For each bird, we first extract the data subset corresponding to that individual and fit the model based on the number of observations:
for (i in 1:maxnb) {
bird.i <- allbirds[i]
subset <- lb[lb$ID == bird.i,]
# Fit the model based on sample size
if (dim(subset)[1] < 50) {
M <- rpart(ND_km ~ day, method = "anova", minbucket = 2, data = subset)
} else {
M <- rpart(ND_km ~ day, method = "anova", data = subset)
}
# Predict the Net Displacement for the current bird
subset$pred <- predict(M)
# Save the plot for the current bird
file_name <- paste(bird.i, "_rt", ".jpg", sep="")
jpeg(file = file_name)
# Plot the observed and predicted ND
if (max(subset$ND_km) > max(subset$pred)) {
plot(subset$day, subset$ND_km, pch = 21, ylim = c(0, max(subset$ND_km)),
xlab = "Day", ylab = "ND (km)", main = bird.i, las = 1)
I <- order(subset$day)
lines(subset$day[I], subset$pred[I])
dev.off()
}
plot(subset$day, subset$ND_km, pch = 21, ylim = c(0, max(subset$pred)),
xlab = "Day", ylab = "ND (km)", main = bird.i, las = 1)
I <- order(subset$day)
lines(subset$day[I], subset$pred[I])
dev.off()
}
Note: The method anova was chosen because ND is a
continuous numeric variable. Other options, like poisson or
class, would be more appropriate for count or categorical
data, respectively. By using anova, we ensure that the
regression tree builds a model based on continuous data.
The plots for each bird’s ND predictions will be saved in the current
working directory.
To later use these predicted values in a cluster analysis, we first store them in a matrix and then export the results.
# Initialize a matrix to store predicted ND values (rows = days, columns = birds)
profiles <- matrix(nrow = 365, ncol = maxnb + 1)
# The first column will contain Julian days (1 to 365)
# The following columns will contain predicted ND values for each bird
profiles[, 1] <- 1:365
colnames(profiles) <- c("Day", as.character(allbirds))
# Preview the matrix structure (first 6 rows and first 3 columns)
print(profiles[1:6, 1:3])
## Day LB45_2020 LB63_2022
## [1,] 1 NA NA
## [2,] 2 NA NA
## [3,] 3 NA NA
## [4,] 4 NA NA
## [5,] 5 NA NA
## [6,] 6 NA NA
# Create a reference data frame for prediction (one row per day of the year)
Day <- 1:365
MyData <- data.frame(day = Day)
# Loop through each bird, fit a model, predict ND for 365 days, and store the result
for (i in 1:maxnb) {
bird.i <- allbirds[i]
subset <- lb[lb$ID == bird.i, ]
if (nrow(subset) < 50) {
M <- rpart(ND_km ~ day, method = "anova", minbucket = 2, data = subset)
} else {
M <- rpart(ND_km ~ day, method = "anova", data = subset)
}
pred <- predict(M, MyData)
profiles[, i + 1] <- pred
}
# Convert the matrix to a dataframe and save as CSV for later use
lb_BRT <- as.data.frame(profiles)
write.csv2(lb_BRT, file = "lb_BRT.csv")
To avoid using model predictions beyond the period each bird was
actually tracked, we will replace extrapolated values with
NA.
# Create a summary dataframe to store the last recorded day for each bird
z <- data.frame(ID = character(maxnb), maxdays = numeric(maxnb), stringsAsFactors = FALSE)
for (i in 1:maxnb) {
bird.i <- allbirds[i]
NetDis <- lb[lb$ID == bird.i, ]
z[i, "ID"] <- as.character(bird.i)
z[i, "maxdays"] <- max(NetDis$day)
}
# Copy the predictions and mask extrapolated values with NA
lb_BRT_NA <- lb_BRT
for (j in 1:maxnb) {
for (i in 1:365) {
if (lb_BRT$Day[i] > z$maxdays[j]) {
lb_BRT_NA[i, j + 1] <- NA
}
}
}
# Convert the matrix to a dataframe and save as CSV
lb_BRT_NA <- as.matrix(lb_BRT_NA, digits = 2)
write.csv2(lb_BRT_NA, file = "lb_BRT_NA.csv")
After generating the BRT predictions and replacing extrapolated
values with NA, we now prepare the data in a format
suitable for clustering. The goal is to create a matrix where each row
represents an individual bird, and each column represents a day of the
year with the predicted displacement value.
# Transpose the BRT matrix so that birds are rows and days are columns
lb_cluster <- t(lb_BRT_NA)
# Remove the first row (original "Day" column), since it's now a redundant header
lb_cluster <- lb_cluster[-1, ]
# Rename columns as day numbers (1 to 365)
colnames(lb_cluster) <- 1:365
# Convert to dataframe to facilitate clustering analysis
lb_cluster <- as.data.frame(lb_cluster)
print(lb_cluster[1:2, 1:8])
## 1 2 3 4 5 6 7
## LB45_2020 8.0035096 8.0035096 8.0035096 8.0035096 8.0035096 8.0035096 8.0035096
## LB63_2022 0.7401503 0.7401503 0.7401503 0.7401503 0.7401503 0.7401503 0.7401503
## 8
## LB45_2020 8.0035096
## LB63_2022 0.7401503
# Save the final matrix to a CSV file
write.csv2(lb_cluster, file = "lb_BRT_cluster.csv")
To group the migration patterns of GPS-tagged little bustards, we
applied a longitudinal k-means clustering analysis using the
kml and longitudinalData R packages. While
traditional k-means clustering minimizes variance by comparing static
points, it does not consider the temporal structure of movement data.
Instead, longitudinal clustering considers the entire migration
trajectories, its timing, duration, and distance, making it more
suitable for time-series data like ours.
The algorithm partitions individuals into clusters based on their ND trajectories over the year, which allows us to identify common migration strategies. We include all tracks in the analysis, even those that belong to the same individual.
We tested nine different approaches, varying four key parameters:
# Create ClusterLongData object for clustering
clustKml <- cld(lb_cluster, idAll = rownames(lb_cluster), varNames = "ND_km")
# Visualize individual trajectories
plot (clustKml, toPlot = "traj")
# Run clustering for 3 to 8 clusters, with 20 random initializations each
kml(clustKml, nbClusters = 3:8, nbRedrawing = 20, toPlot = "criterion")
## ~ Fast KmL ~
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## S
## 100
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## *
## S
nbClusters: Range of cluster numbers to test (e.g., 3
to 8).nbRedrawing: Number of times the algorithm is re-run
per cluster number, to test stability of the result.toPlot = "criterion": Shows a quality metric (e.g.,
Calinski-Harabasz Index) for each run.The plots that appear while the clustering analysis is running refer to the quality of clustering. The sorting scale uses the Calinski-Harabasz Index to provide a score on the quality of the clustering. As more clusters are created, the statistics of each cluster become less reliable. You should be careful if there is a big drop in quality when moving left-to-right through redrawing of the same k. There is no minimum quality, so this should be used as directional guidance.
The default approach is defined below:
imputationMethod = "copyMean": Interpolates missing
values, then adjusts based on average trajectory shape.distanceName = "euclidean": Uses Euclidean distance
between full trajectories.startingCond = "nearlyAll": Ensures that almost all
individuals are initially assigned to clusters.scale = TRUE: Scales data before clustering.To visualize the result of the clustering, we generate trajectory plots for each number of clusters tested:
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "traj")
# Save to .emf file
file_name <- paste0("lb_kml_default", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "traj")
dev.off()
}
parTraj(col = "clusters"): Colors individual
trajectories by cluster.parMean(type = "l"): Plots mean trajectory per cluster
as a line.Note: The plotted lines represent the mean trajectory of all individuals within each cluster, not the median.
Finally, we extract and export the cluster assignments for each number of clusters:
# Extract cluster membership for each k
default_c3 <- getClusters(clustKml, nbCluster = 3)
default_c4 <- getClusters(clustKml, nbCluster = 4)
default_c5 <- getClusters(clustKml, nbCluster = 5)
default_c6 <- getClusters(clustKml, nbCluster = 6)
default_c7 <- getClusters(clustKml, nbCluster = 7)
default_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_default <- data.frame(allbirds2, default_c3, default_c4, default_c5,
default_c6, default_c7, default_c8)
print(lb_kml_default[1:2, 1:7])
## allbirds2 default_c3 default_c4 default_c5 default_c6 default_c7 default_c8
## 1 LB45_2020 C C D D F G
## 2 LB63_2022 A A A B B B
# Save to file
write.csv2(lb_kml_default, file = "lb_kml_default.csv")
# Run clustering with the approach A
optionA <- parALGO(saveFreq = 100, maxIt = 200, imputationMethod="LOCF", distanceName = "euclidean", power = 2, distance = function(){}, centerMethod = meanNA, startingCond = "nearlyAll", nbCriterion = 100, scale = TRUE)
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionA)
## ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
# Save to .emf file
file_name <- paste0("lb_kml_approachA", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
dev.off()
}
# Extract cluster membership for each k
approachA_c3 <- getClusters(clustKml, nbCluster = 3)
approachA_c4 <- getClusters(clustKml, nbCluster = 4)
approachA_c5 <- getClusters(clustKml, nbCluster = 5)
approachA_c6 <- getClusters(clustKml, nbCluster = 6)
approachA_c7 <- getClusters(clustKml, nbCluster = 7)
approachA_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachA <- data.frame(allbirds2, approachA_c3, approachA_c4, approachA_c5, approachA_c6, approachA_c7, approachA_c8)
print(lb_kml_approachA[1:2, 1:7])
## allbirds2 approachA_c3 approachA_c4 approachA_c5 approachA_c6 approachA_c7
## 1 LB45_2020 C C D C F
## 2 LB63_2022 A A A A B
## approachA_c8
## 1 G
## 2 B
# Save to file
write.csv2(lb_kml_approachA, file = "lb_kml_approachA.csv")
# Run clustering with the approach B
optionB <- parALGO (saveFreq=100,maxIt=200,imputationMethod="copyMean", distanceName="euclidean",power=2,distance=function(x,y){dist(rbind(x,y),method='manhattan',p=2)},centerMethod=meanNA,startingCond="nearlyAll",nbCriterion=100,scale=TRUE)
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionB)
## ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
# Save to .emf file
file_name <- paste0("lb_kml_approachB", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
dev.off()
}
# Extract cluster membership for each k
approachB_c3 <- getClusters(clustKml, nbCluster = 3)
approachB_c4 <- getClusters(clustKml, nbCluster = 4)
approachB_c5 <- getClusters(clustKml, nbCluster = 5)
approachB_c6 <- getClusters(clustKml, nbCluster = 6)
approachB_c7 <- getClusters(clustKml, nbCluster = 7)
approachB_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachB <- data.frame(allbirds2, approachB_c3, approachB_c4, approachB_c5, approachB_c6, approachB_c7, approachB_c8)
print(lb_kml_approachB[1:2, 1:7])
## allbirds2 approachB_c3 approachB_c4 approachB_c5 approachB_c6 approachB_c7
## 1 LB45_2020 C C D E F
## 2 LB63_2022 A A A B B
## approachB_c8
## 1 G
## 2 B
# Save to file
write.csv2(lb_kml_approachB, file = "lb_kml_approachB.csv")
# Run clustering with the approach C
optionC <- parALGO (saveFreq=100,maxIt=200,imputationMethod="copyMean",distanceName="euclidean",power=2,distance=function(){},centerMethod=meanNA,startingCond="nearlyAll",nbCriterion=100,scale=FALSE)
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionC)
## ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
# Save to .emf file
file_name <- paste0("lb_kml_approachC", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
dev.off()
}
# Extract cluster membership for each k
approachC_c3 <- getClusters(clustKml, nbCluster = 3)
approachC_c4 <- getClusters(clustKml, nbCluster = 4)
approachC_c5 <- getClusters(clustKml, nbCluster = 5)
approachC_c6 <- getClusters(clustKml, nbCluster = 6)
approachC_c7 <- getClusters(clustKml, nbCluster = 7)
approachC_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachC <- data.frame(allbirds2, approachC_c3, approachC_c4, approachC_c5, approachC_c6, approachC_c7, approachC_c8)
print(lb_kml_approachC[1:2, 1:7])
## allbirds2 approachC_c3 approachC_c4 approachC_c5 approachC_c6 approachC_c7
## 1 LB45_2020 C C D E E
## 2 LB63_2022 A A A A A
## approachC_c8
## 1 F
## 2 B
# Save to file
write.csv2(lb_kml_approachC, file = "lb_kml_approachC.csv")
# Run clustering with the approach D
optionD <- parALGO (saveFreq=100,maxIt=200,imputationMethod="copyMean.bisector",distanceName="euclidean",power=2,distance=function(){},centerMethod=meanNA,startingCond="nearlyAll",nbCriterion=100,scale=TRUE)
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionD)
## ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
# Save to .emf file
file_name <- paste0("lb_kml_approachD", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
dev.off()
}
# Extract cluster membership for each k
approachD_c3 <- getClusters(clustKml, nbCluster = 3)
approachD_c4 <- getClusters(clustKml, nbCluster = 4)
approachD_c5 <- getClusters(clustKml, nbCluster = 5)
approachD_c6 <- getClusters(clustKml, nbCluster = 6)
approachD_c7 <- getClusters(clustKml, nbCluster = 7)
approachD_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachD <- data.frame(allbirds2, approachD_c3, approachD_c4, approachD_c5, approachD_c6, approachD_c7, approachD_c8)
print(lb_kml_approachD[1:2, 1:7])
## allbirds2 approachD_c3 approachD_c4 approachD_c5 approachD_c6 approachD_c7
## 1 LB45_2020 C C D E E
## 2 LB63_2022 A A A A A
## approachD_c8
## 1 F
## 2 B
# Save to file
write.csv2(lb_kml_approachD, file = "lb_kml_approachD.csv")
# Run clustering with the approach E
optionE <- parALGO(startingCond="randomAll")
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionE)
## ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
# Save to .emf file
file_name <- paste0("lb_kml_approachE", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
dev.off()
}
# Extract cluster membership for each k
approachE_c3 <- getClusters(clustKml, nbCluster = 3)
approachE_c4 <- getClusters(clustKml, nbCluster = 4)
approachE_c5 <- getClusters(clustKml, nbCluster = 5)
approachE_c6 <- getClusters(clustKml, nbCluster = 6)
approachE_c7 <- getClusters(clustKml, nbCluster = 7)
approachE_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachE <- data.frame(allbirds2, approachE_c3, approachE_c4, approachE_c5, approachE_c6, approachE_c7, approachE_c8)
print(lb_kml_approachE[1:2, 1:7])
## allbirds2 approachE_c3 approachE_c4 approachE_c5 approachE_c6 approachE_c7
## 1 LB45_2020 C C D E E
## 2 LB63_2022 A A A A A
## approachE_c8
## 1 F
## 2 B
# Save to file
write.csv2(lb_kml_approachE, file = "lb_kml_approachE.csv")
# Run clustering with the approach F
optionF <- parALGO(startingCond="maxDist")
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionF)
## ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
# Save to .emf file
file_name <- paste0("lb_kml_approachF", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
dev.off()
}
# Extract cluster membership for each k
approachF_c3 <- getClusters(clustKml, nbCluster = 3)
approachF_c4 <- getClusters(clustKml, nbCluster = 4)
approachF_c5 <- getClusters(clustKml, nbCluster = 5)
approachF_c6 <- getClusters(clustKml, nbCluster = 6)
approachF_c7 <- getClusters(clustKml, nbCluster = 7)
approachF_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachF <- data.frame(allbirds2, approachF_c3, approachF_c4, approachF_c5, approachF_c6, approachF_c7, approachF_c8)
print(lb_kml_approachF[1:2, 1:7])
## allbirds2 approachF_c3 approachF_c4 approachF_c5 approachF_c6 approachF_c7
## 1 LB45_2020 C C D E E
## 2 LB63_2022 A A A A A
## approachF_c8
## 1 F
## 2 B
# Save to file
write.csv2(lb_kml_approachF, file = "lb_kml_approachF.csv")
# Run clustering with the approach G
optionG <- parALGO(startingCond="randomK")
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionG)
## ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
# Save to .emf file
file_name <- paste0("lb_kml_approachG", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
dev.off()
}
# Extract cluster membership for each k
approachG_c3 <- getClusters(clustKml, nbCluster = 3)
approachG_c4 <- getClusters(clustKml, nbCluster = 4)
approachG_c5 <- getClusters(clustKml, nbCluster = 5)
approachG_c6 <- getClusters(clustKml, nbCluster = 6)
approachG_c7 <- getClusters(clustKml, nbCluster = 7)
approachG_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachG <- data.frame(allbirds2, approachG_c3, approachG_c4, approachG_c5, approachG_c6, approachG_c7, approachG_c8)
print(lb_kml_approachG[1:2, 1:7])
## allbirds2 approachG_c3 approachG_c4 approachG_c5 approachG_c6 approachG_c7
## 1 LB45_2020 C C D E E
## 2 LB63_2022 A A A A A
## approachG_c8
## 1 F
## 2 B
# Save to file
write.csv2(lb_kml_approachG, file = "lb_kml_approachG.csv")
# Run clustering with the approach H
optionH <- parALGO(startingCond="kmeans++")
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionH)
## ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
# Display in RStudio
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
# Save to .emf file
file_name <- paste0("lb_kml_approachH", k, ".emf")
emf(file = file_name, emfPlus = FALSE)
plot(clustKml, k,
parMean = parMEAN(type = "l"),
parTraj = parTRAJ(col = "clusters"),
toPlot = "none")
dev.off()
}
# Extract cluster membership for each k
approachH_c3 <- getClusters(clustKml, nbCluster = 3)
approachH_c4 <- getClusters(clustKml, nbCluster = 4)
approachH_c5 <- getClusters(clustKml, nbCluster = 5)
approachH_c6 <- getClusters(clustKml, nbCluster = 6)
approachH_c7 <- getClusters(clustKml, nbCluster = 7)
approachH_c8 <- getClusters(clustKml, nbCluster = 8)
# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachH <- data.frame(allbirds2, approachH_c3, approachH_c4, approachH_c5, approachH_c6, approachH_c7, approachH_c8)
print(lb_kml_approachH[1:2, 1:7])
## allbirds2 approachH_c3 approachH_c4 approachH_c5 approachH_c6 approachH_c7
## 1 LB45_2020 C C D E F
## 2 LB63_2022 A A A A A
## approachH_c8
## 1 F
## 2 B
# Save to file
write.csv2(lb_kml_approachH, file = "lb_kml_approachH.csv")