Summary

This tutorial presents a novel yet accessible approach that combines cluster analysis of net displacement values with boosted regression trees to classify migration patterns of Iberian little bustards. The dataset used consists of satellite-tagged records of little bustards, available on MoveBank. The tutorial explains the analyses conducted in the study “Individual variability in migration patterns of Iberian little bustards”.

Contents

  1. Installing Required Packages and Importing Data
  2. Visualizing Movement Patterns of All Tracked Birds
  3. Fitting Boosted Regression Trees (BRT)
  4. Clustering Analysis Based on BRT Values

1. Installing Required Packages and Importing Data

First, we need to install and load all necessary packages for data processing, visualization, and modeling.

library(lattice)
library(nlme)
library(mgcv)
## This is mgcv 1.8-41. For overview type 'help("mgcv-package")'.
library(rpart)
library(kml)
## Warning: package 'kml' was built under R version 4.2.3
## Loading required package: clv
## Warning: package 'clv' was built under R version 4.2.3
## Loading required package: cluster
## Loading required package: class
## Loading required package: longitudinalData
## Warning: package 'longitudinalData' was built under R version 4.2.3
## Loading required package: rgl
## Loading required package: misc3d
library(longitudinalData)
library(devEMF)
## Warning: package 'devEMF' was built under R version 4.2.3
library(amt)
## Warning: package 'amt' was built under R version 4.2.3
## 
## Attaching package: 'amt'
## The following object is masked from 'package:stats':
## 
##     filter
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.3
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:nlme':
## 
##     collapse
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readxl)

Then, we set the working directory using the function setwd("path/to/your/directory") and import the data.

head(lb)
##          ID           timestamp       date longitude latitude day    utm_x
## 1 LB45_2020 2020-04-01 01:29:00 2020-03-31 -5.444840 38.79970   1 433454.8
## 2 LB45_2020 2020-04-02 01:29:00 2020-04-01 -5.431840 38.81745   2 434526.2
## 3 LB45_2020 2020-04-03 01:29:00 2020-04-02 -5.420350 38.81321   3 435538.3
## 4 LB45_2020 2020-04-04 02:29:00 2020-04-04 -5.419658 38.81427   4 435594.9
## 5 LB45_2020 2020-04-05 00:28:00 2020-04-04 -5.417703 38.81401   5 435765.6
## 6 LB45_2020 2020-04-06 03:29:00 2020-04-06 -5.416944 38.81371   6 435832.6
##      utm_y     NSD
## 1 207178.1       0
## 2 209182.8 5166604
## 3 208740.9 6783433
## 4 208860.4 7410484
## 5 208836.7 8090911
## 6 208806.0 8304120

The dataset contains GPS tracking data for individual Iberian little bustards, with each row representing a recorded location per day. Each bird is identified by a unique ID, which combines the bird’s individual tag and the tracking year (e.g., lb192286_2019 corresponds to bird LB45 in the year 2019).

2. Visualizing Movement Patterns of All Tracked Birds

length(unique(lb$ID))
## [1] 71

The dataset comprises 71 annual cycles (hereafter tracks) from 46 adult individuals. Each track represents one full year of GPS locations for a given individual with at least 300 or more days. To analyse movement patterns, we use the Net Displacement, which is calculated as the square root of the Net Squared Displacement (NSD).

# Check the range of Net Squared Displacement (NSD) values
range(lb$NSD)
## [1]            0 168472881980
# Convert NSD to Net Displacement (ND) by taking the square root
lb$ND <- sqrt(lb$NSD)

# Check the range of ND (in meters)

range(lb$ND)
## [1]      0.0 410454.5
# Convert ND from meters to kilometers
lb$ND_km <- lb$ND/1000

# Check the final range of Net Displacement in kilometers
range(lb$ND_km)
## [1]   0.0000 410.4545


Plotting Movement Patterns for All Tracked Birds

We generate one plot per bird, showing their Net Displacement (in kilometres) across the days of the year. Each plot is saved as a separate .jpg file, named according to the bird’s ID.

Loop through each bird and generate the plots:

# Get a list of all unique bird-year IDs
allbirds <- unique(lb$ID)

# Total number of tracked bird-year combinations
maxnb <- length(allbirds)

# Confirm that there are no missing Net Displacement values
sum(is.na(lb$ND_km))  # Should return 0
## [1] 0
# Create individual Net Displacement plots for each bird-year track
for (i in 1:maxnb) {                      
  bird.i <- allbirds[i] 
  NetDis <- lb[lb$ID == bird.i, ]
  
  # Set filename for each plot
  file_name <- paste0(bird.i, ".jpg")
  
  # Open JPEG device
  jpeg(file = file_name)
  
  # Create the plot
  plot(NetDis$day, NetDis$ND_km,
       xlab = "Day",
       ylab = "Net Displacement (km)",
       ylim = c(0, max(NetDis$ND_km, na.rm = TRUE)),
       las = 1,
       main = bird.i)
  
  # Close the graphics device
  dev.off()
}


3. Fitting Boosted Regression Trees (BRT)

We fit Boosted Regression Trees (BRT) to model the Net Displacement (ND) of each tracked bird over time. BRTs are particularly well-suited for this analysis because they can model complex, non-linear relationships and handle unbalanced ecological data with missing values.

BRT works by repeatedly splitting the data into more homogeneous groups based on a predictor variable (in this case, day). For our analysis, ND is the response variable, while day serves as the primary predictor.

A key advantage of BRT is its ability to handle missing data points. When data are missing for the primary predictor, BRT can use surrogate splits as alternative predictors that can ensure the model continues to function even when some values are absent. This feature makes BRT particularly robust for working with real-world ecological data that may contain gaps.

3.1. Setting Up the Model

We use the rpart function to fit regression trees for each bird. This function models the relationship between ND (ND_km) and day.

There are two key parameters that control how the model builds the trees:

  • minbucket: This parameter defines the minimum number of observations required in any terminal node.
  • minsplit: This is the minimum number of observations required for a split to be attempted.
  • For birds with fewer than 50 observations (i.e., small sample sizes), we use a more lenient splitting criterion with minbucket = 2. For birds with more than 50 observations, we rely on default settings for minsplit = 20 and minbucket being one-third of minsplit.

3.2. Fitting the Model for Each Bird

For each bird, we first extract the data subset corresponding to that individual and fit the model based on the number of observations:

for (i in 1:maxnb) {                      
  bird.i <- allbirds[i] 
  subset <- lb[lb$ID == bird.i,]
  
  # Fit the model based on sample size
  if (dim(subset)[1] < 50) {
    M <- rpart(ND_km ~ day, method = "anova", minbucket = 2, data = subset)
  } else {
    M <- rpart(ND_km ~ day, method = "anova", data = subset)
  }
  
  # Predict the Net Displacement for the current bird
  subset$pred <- predict(M)
  
  # Save the plot for the current bird
  file_name <- paste(bird.i, "_rt", ".jpg", sep="")
  jpeg(file = file_name)
  
  # Plot the observed and predicted ND
  if (max(subset$ND_km) > max(subset$pred)) {
    plot(subset$day, subset$ND_km, pch = 21, ylim = c(0, max(subset$ND_km)), 
         xlab = "Day", ylab = "ND (km)", main = bird.i, las = 1)
    I <- order(subset$day)
    lines(subset$day[I], subset$pred[I])
    dev.off()
  }
  
  plot(subset$day, subset$ND_km, pch = 21, ylim = c(0, max(subset$pred)), 
       xlab = "Day", ylab = "ND (km)", main = bird.i, las = 1)
  I <- order(subset$day)
  lines(subset$day[I], subset$pred[I])
  dev.off()
}

Note: The method anova was chosen because ND is a continuous numeric variable. Other options, like poisson or class, would be more appropriate for count or categorical data, respectively. By using anova, we ensure that the regression tree builds a model based on continuous data.

The plots for each bird’s ND predictions will be saved in the current working directory.

3.3. Saving Predicted BRT Profiles and Preparing for Cluster Analysis

To later use these predicted values in a cluster analysis, we first store them in a matrix and then export the results.

# Initialize a matrix to store predicted ND values (rows = days, columns = birds)
profiles <- matrix(nrow = 365, ncol = maxnb + 1)
# The first column will contain Julian days (1 to 365)
# The following columns will contain predicted ND values for each bird
profiles[, 1] <- 1:365
colnames(profiles) <- c("Day", as.character(allbirds))

# Preview the matrix structure (first 6 rows and first 3 columns)
print(profiles[1:6, 1:3])
##      Day LB45_2020 LB63_2022
## [1,]   1        NA        NA
## [2,]   2        NA        NA
## [3,]   3        NA        NA
## [4,]   4        NA        NA
## [5,]   5        NA        NA
## [6,]   6        NA        NA
# Create a reference data frame for prediction (one row per day of the year)
Day <- 1:365
MyData <- data.frame(day = Day)

# Loop through each bird, fit a model, predict ND for 365 days, and store the result
for (i in 1:maxnb) {
  bird.i <- allbirds[i]
  subset <- lb[lb$ID == bird.i, ]
  if (nrow(subset) < 50) {
    M <- rpart(ND_km ~ day, method = "anova", minbucket = 2, data = subset)
  } else {
    M <- rpart(ND_km ~ day, method = "anova", data = subset)
  }
  pred <- predict(M, MyData)
  profiles[, i + 1] <- pred
}

# Convert the matrix to a dataframe and save as CSV for later use
lb_BRT <- as.data.frame(profiles)
write.csv2(lb_BRT, file = "lb_BRT.csv")

To avoid using model predictions beyond the period each bird was actually tracked, we will replace extrapolated values with NA.

# Create a summary dataframe to store the last recorded day for each bird
z <- data.frame(ID = character(maxnb), maxdays = numeric(maxnb), stringsAsFactors = FALSE)

for (i in 1:maxnb) { 
  bird.i <- allbirds[i]
  NetDis <- lb[lb$ID == bird.i, ]
  z[i, "ID"] <- as.character(bird.i)
  z[i, "maxdays"] <- max(NetDis$day)
}

# Copy the predictions and mask extrapolated values with NA
lb_BRT_NA <- lb_BRT
for (j in 1:maxnb) {
  for (i in 1:365) {
    if (lb_BRT$Day[i] > z$maxdays[j]) {
      lb_BRT_NA[i, j + 1] <- NA
    }
  }
}

# Convert the matrix to a dataframe and save as CSV
lb_BRT_NA <- as.matrix(lb_BRT_NA, digits = 2)
write.csv2(lb_BRT_NA, file = "lb_BRT_NA.csv")

After generating the BRT predictions and replacing extrapolated values with NA, we now prepare the data in a format suitable for clustering. The goal is to create a matrix where each row represents an individual bird, and each column represents a day of the year with the predicted displacement value.

# Transpose the BRT matrix so that birds are rows and days are columns
lb_cluster <- t(lb_BRT_NA)

# Remove the first row (original "Day" column), since it's now a redundant header
lb_cluster <- lb_cluster[-1, ]

# Rename columns as day numbers (1 to 365)
colnames(lb_cluster) <- 1:365

# Convert to dataframe to facilitate clustering analysis
lb_cluster <- as.data.frame(lb_cluster)
print(lb_cluster[1:2, 1:8])
##                   1         2         3         4         5         6         7
## LB45_2020 8.0035096 8.0035096 8.0035096 8.0035096 8.0035096 8.0035096 8.0035096
## LB63_2022 0.7401503 0.7401503 0.7401503 0.7401503 0.7401503 0.7401503 0.7401503
##                   8
## LB45_2020 8.0035096
## LB63_2022 0.7401503
# Save the final matrix to a CSV file
write.csv2(lb_cluster, file = "lb_BRT_cluster.csv")


4. Clustering Analysis Based on BRT Values

To group the migration patterns of GPS-tagged little bustards, we applied a longitudinal k-means clustering analysis using the kml and longitudinalData R packages. While traditional k-means clustering minimizes variance by comparing static points, it does not consider the temporal structure of movement data. Instead, longitudinal clustering considers the entire migration trajectories, its timing, duration, and distance, making it more suitable for time-series data like ours.

The algorithm partitions individuals into clusters based on their ND trajectories over the year, which allows us to identify common migration strategies. We include all tracks in the analysis, even those that belong to the same individual.

We tested nine different approaches, varying four key parameters:

4.1 Prepare the Data and Create the Cluster Object

# Create ClusterLongData object for clustering
clustKml <- cld(lb_cluster, idAll = rownames(lb_cluster), varNames = "ND_km")

# Visualize individual trajectories
plot (clustKml, toPlot = "traj")

4.2 Run the Longitudinal k-Means Clustering


Default approach
# Run clustering for 3 to 8 clusters, with 20 random initializations each
kml(clustKml, nbClusters = 3:8, nbRedrawing = 20, toPlot = "criterion")

##  ~ Fast KmL ~
## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## S
## 100

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## *

## S

  • nbClusters: Range of cluster numbers to test (e.g., 3 to 8).
  • nbRedrawing: Number of times the algorithm is re-run per cluster number, to test stability of the result.
  • toPlot = "criterion": Shows a quality metric (e.g., Calinski-Harabasz Index) for each run.

The plots that appear while the clustering analysis is running refer to the quality of clustering. The sorting scale uses the Calinski-Harabasz Index to provide a score on the quality of the clustering. As more clusters are created, the statistics of each cluster become less reliable. You should be careful if there is a big drop in quality when moving left-to-right through redrawing of the same k. There is no minimum quality, so this should be used as directional guidance.

The default approach is defined below:

  • imputationMethod = "copyMean": Interpolates missing values, then adjusts based on average trajectory shape.
  • distanceName = "euclidean": Uses Euclidean distance between full trajectories.
  • startingCond = "nearlyAll": Ensures that almost all individuals are initially assigned to clusters.
  • scale = TRUE: Scales data before clustering.

To visualize the result of the clustering, we generate trajectory plots for each number of clusters tested:

# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "traj")

  # Save to .emf file
  file_name <- paste0("lb_kml_default", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "traj")
  dev.off()
}

  • parTraj(col = "clusters"): Colors individual trajectories by cluster.
  • parMean(type = "l"): Plots mean trajectory per cluster as a line.

Note: The plotted lines represent the mean trajectory of all individuals within each cluster, not the median.

Finally, we extract and export the cluster assignments for each number of clusters:

# Extract cluster membership for each k
default_c3 <- getClusters(clustKml, nbCluster = 3)
default_c4 <- getClusters(clustKml, nbCluster = 4)
default_c5 <- getClusters(clustKml, nbCluster = 5)
default_c6 <- getClusters(clustKml, nbCluster = 6)
default_c7 <- getClusters(clustKml, nbCluster = 7)
default_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_default <- data.frame(allbirds2, default_c3, default_c4, default_c5,
                             default_c6, default_c7, default_c8)
print(lb_kml_default[1:2, 1:7])
##   allbirds2 default_c3 default_c4 default_c5 default_c6 default_c7 default_c8
## 1 LB45_2020          C          C          D          D          F          G
## 2 LB63_2022          A          A          A          B          B          B
# Save to file
write.csv2(lb_kml_default, file = "lb_kml_default.csv")


Approach A
# Run clustering with the approach A
optionA <- parALGO(saveFreq = 100, maxIt = 200, imputationMethod="LOCF", distanceName = "euclidean", power = 2, distance = function(){}, centerMethod = meanNA, startingCond = "nearlyAll", nbCriterion = 100, scale = TRUE)
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionA)
##  ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")

  # Save to .emf file
  file_name <- paste0("lb_kml_approachA", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")
  dev.off()
}
# Extract cluster membership for each k
approachA_c3 <- getClusters(clustKml, nbCluster = 3)
approachA_c4 <- getClusters(clustKml, nbCluster = 4)
approachA_c5 <- getClusters(clustKml, nbCluster = 5)
approachA_c6 <- getClusters(clustKml, nbCluster = 6)
approachA_c7 <- getClusters(clustKml, nbCluster = 7)
approachA_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachA <- data.frame(allbirds2, approachA_c3, approachA_c4, approachA_c5,  approachA_c6, approachA_c7, approachA_c8)
print(lb_kml_approachA[1:2, 1:7])
##   allbirds2 approachA_c3 approachA_c4 approachA_c5 approachA_c6 approachA_c7
## 1 LB45_2020            C            C            D            C            F
## 2 LB63_2022            A            A            A            A            B
##   approachA_c8
## 1            G
## 2            B
# Save to file
write.csv2(lb_kml_approachA, file = "lb_kml_approachA.csv")


Approach B
# Run clustering with the approach B
optionB <- parALGO (saveFreq=100,maxIt=200,imputationMethod="copyMean", distanceName="euclidean",power=2,distance=function(x,y){dist(rbind(x,y),method='manhattan',p=2)},centerMethod=meanNA,startingCond="nearlyAll",nbCriterion=100,scale=TRUE)
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionB)
##  ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")

  # Save to .emf file
  file_name <- paste0("lb_kml_approachB", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")
  dev.off()
}
# Extract cluster membership for each k
approachB_c3 <- getClusters(clustKml, nbCluster = 3)
approachB_c4 <- getClusters(clustKml, nbCluster = 4)
approachB_c5 <- getClusters(clustKml, nbCluster = 5)
approachB_c6 <- getClusters(clustKml, nbCluster = 6)
approachB_c7 <- getClusters(clustKml, nbCluster = 7)
approachB_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachB <- data.frame(allbirds2, approachB_c3, approachB_c4, approachB_c5,  approachB_c6, approachB_c7, approachB_c8)
print(lb_kml_approachB[1:2, 1:7])
##   allbirds2 approachB_c3 approachB_c4 approachB_c5 approachB_c6 approachB_c7
## 1 LB45_2020            C            C            D            E            F
## 2 LB63_2022            A            A            A            B            B
##   approachB_c8
## 1            G
## 2            B
# Save to file
write.csv2(lb_kml_approachB, file = "lb_kml_approachB.csv")


Approach C
# Run clustering with the approach C
optionC <- parALGO (saveFreq=100,maxIt=200,imputationMethod="copyMean",distanceName="euclidean",power=2,distance=function(){},centerMethod=meanNA,startingCond="nearlyAll",nbCriterion=100,scale=FALSE)
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionC)
##  ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")

  # Save to .emf file
  file_name <- paste0("lb_kml_approachC", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")
  dev.off()
}
# Extract cluster membership for each k
approachC_c3 <- getClusters(clustKml, nbCluster = 3)
approachC_c4 <- getClusters(clustKml, nbCluster = 4)
approachC_c5 <- getClusters(clustKml, nbCluster = 5)
approachC_c6 <- getClusters(clustKml, nbCluster = 6)
approachC_c7 <- getClusters(clustKml, nbCluster = 7)
approachC_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachC <- data.frame(allbirds2, approachC_c3, approachC_c4, approachC_c5,  approachC_c6, approachC_c7, approachC_c8)
print(lb_kml_approachC[1:2, 1:7])
##   allbirds2 approachC_c3 approachC_c4 approachC_c5 approachC_c6 approachC_c7
## 1 LB45_2020            C            C            D            E            E
## 2 LB63_2022            A            A            A            A            A
##   approachC_c8
## 1            F
## 2            B
# Save to file
write.csv2(lb_kml_approachC, file = "lb_kml_approachC.csv")


Approach D
# Run clustering with the approach D
optionD <- parALGO (saveFreq=100,maxIt=200,imputationMethod="copyMean.bisector",distanceName="euclidean",power=2,distance=function(){},centerMethod=meanNA,startingCond="nearlyAll",nbCriterion=100,scale=TRUE)
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionD)
##  ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")

  # Save to .emf file
  file_name <- paste0("lb_kml_approachD", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")
  dev.off()
}
# Extract cluster membership for each k
approachD_c3 <- getClusters(clustKml, nbCluster = 3)
approachD_c4 <- getClusters(clustKml, nbCluster = 4)
approachD_c5 <- getClusters(clustKml, nbCluster = 5)
approachD_c6 <- getClusters(clustKml, nbCluster = 6)
approachD_c7 <- getClusters(clustKml, nbCluster = 7)
approachD_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachD <- data.frame(allbirds2, approachD_c3, approachD_c4, approachD_c5,  approachD_c6, approachD_c7, approachD_c8)
print(lb_kml_approachD[1:2, 1:7])
##   allbirds2 approachD_c3 approachD_c4 approachD_c5 approachD_c6 approachD_c7
## 1 LB45_2020            C            C            D            E            E
## 2 LB63_2022            A            A            A            A            A
##   approachD_c8
## 1            F
## 2            B
# Save to file
write.csv2(lb_kml_approachD, file = "lb_kml_approachD.csv")


Approach E
# Run clustering with the approach E
optionE <- parALGO(startingCond="randomAll")
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionE)
##  ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")

  # Save to .emf file
  file_name <- paste0("lb_kml_approachE", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")
  dev.off()
}
# Extract cluster membership for each k
approachE_c3 <- getClusters(clustKml, nbCluster = 3)
approachE_c4 <- getClusters(clustKml, nbCluster = 4)
approachE_c5 <- getClusters(clustKml, nbCluster = 5)
approachE_c6 <- getClusters(clustKml, nbCluster = 6)
approachE_c7 <- getClusters(clustKml, nbCluster = 7)
approachE_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachE <- data.frame(allbirds2, approachE_c3, approachE_c4, approachE_c5,  approachE_c6, approachE_c7, approachE_c8)
print(lb_kml_approachE[1:2, 1:7])
##   allbirds2 approachE_c3 approachE_c4 approachE_c5 approachE_c6 approachE_c7
## 1 LB45_2020            C            C            D            E            E
## 2 LB63_2022            A            A            A            A            A
##   approachE_c8
## 1            F
## 2            B
# Save to file
write.csv2(lb_kml_approachE, file = "lb_kml_approachE.csv")


Approach F
# Run clustering with the approach F
optionF <- parALGO(startingCond="maxDist")
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionF)
##  ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")

  # Save to .emf file
  file_name <- paste0("lb_kml_approachF", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")
  dev.off()
}
# Extract cluster membership for each k
approachF_c3 <- getClusters(clustKml, nbCluster = 3)
approachF_c4 <- getClusters(clustKml, nbCluster = 4)
approachF_c5 <- getClusters(clustKml, nbCluster = 5)
approachF_c6 <- getClusters(clustKml, nbCluster = 6)
approachF_c7 <- getClusters(clustKml, nbCluster = 7)
approachF_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachF <- data.frame(allbirds2, approachF_c3, approachF_c4, approachF_c5,  approachF_c6, approachF_c7, approachF_c8)
print(lb_kml_approachF[1:2, 1:7])
##   allbirds2 approachF_c3 approachF_c4 approachF_c5 approachF_c6 approachF_c7
## 1 LB45_2020            C            C            D            E            E
## 2 LB63_2022            A            A            A            A            A
##   approachF_c8
## 1            F
## 2            B
# Save to file
write.csv2(lb_kml_approachF, file = "lb_kml_approachF.csv")


Approach G
# Run clustering with the approach G
optionG <- parALGO(startingCond="randomK")
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionG)
##  ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")

  # Save to .emf file
  file_name <- paste0("lb_kml_approachG", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")
  dev.off()
}
# Extract cluster membership for each k
approachG_c3 <- getClusters(clustKml, nbCluster = 3)
approachG_c4 <- getClusters(clustKml, nbCluster = 4)
approachG_c5 <- getClusters(clustKml, nbCluster = 5)
approachG_c6 <- getClusters(clustKml, nbCluster = 6)
approachG_c7 <- getClusters(clustKml, nbCluster = 7)
approachG_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachG <- data.frame(allbirds2, approachG_c3, approachG_c4, approachG_c5,  approachG_c6, approachG_c7, approachG_c8)
print(lb_kml_approachG[1:2, 1:7])
##   allbirds2 approachG_c3 approachG_c4 approachG_c5 approachG_c6 approachG_c7
## 1 LB45_2020            C            C            D            E            E
## 2 LB63_2022            A            A            A            A            A
##   approachG_c8
## 1            F
## 2            B
# Save to file
write.csv2(lb_kml_approachG, file = "lb_kml_approachG.csv")


Approach H
# Run clustering with the approach H
optionH <- parALGO(startingCond="kmeans++")
kml(clustKml, nbClusters = 2:8, nbRedrawing = 20, toPlot = "none", parAlgo = optionH)
##  ~ Fast KmL ~
## ***************************************************************************************************S
## 100 ****************************************S
# Generate and display trajectory plots for 3 to 8 clusters
for (k in 3:8) {
  # Display in RStudio
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")

  # Save to .emf file
  file_name <- paste0("lb_kml_approachH", k, ".emf")
  emf(file = file_name, emfPlus = FALSE)
  plot(clustKml, k, 
       parMean = parMEAN(type = "l"), 
       parTraj = parTRAJ(col = "clusters"), 
       toPlot = "none")
  dev.off()
}
# Extract cluster membership for each k
approachH_c3 <- getClusters(clustKml, nbCluster = 3)
approachH_c4 <- getClusters(clustKml, nbCluster = 4)
approachH_c5 <- getClusters(clustKml, nbCluster = 5)
approachH_c6 <- getClusters(clustKml, nbCluster = 6)
approachH_c7 <- getClusters(clustKml, nbCluster = 7)
approachH_c8 <- getClusters(clustKml, nbCluster = 8)

# Combine all cluster assignments into a dataframe
allbirds2 <- row.names(lb_cluster)
lb_kml_approachH <- data.frame(allbirds2, approachH_c3, approachH_c4, approachH_c5,  approachH_c6, approachH_c7, approachH_c8)
print(lb_kml_approachH[1:2, 1:7])
##   allbirds2 approachH_c3 approachH_c4 approachH_c5 approachH_c6 approachH_c7
## 1 LB45_2020            C            C            D            E            F
## 2 LB63_2022            A            A            A            A            A
##   approachH_c8
## 1            F
## 2            B
# Save to file
write.csv2(lb_kml_approachH, file = "lb_kml_approachH.csv")