Clustering Bald Eagle GPS Points Using K-means

A look at four distinct behavioral clusters: Perching, Gliding, Ascending, and Flapping

Background

Bergen et al., 2021

Classify bald eagle GPS points across Iowa using K-means clustering.

Focus:

  • Bald eagle movement variables (KPH, Sn, AGL, |Angle|, Vertical rate, |VR|)

  • Distinguish in-flight vs perching points

  • Identify flight behaviors: ascending, flapping, gliding

Data

  • 2,093,022 GPS points from 57 eagles over 4 years

  • Contains six focal variables associated with GPS data collected from bald eagles in Iowa, USA.


Animal_ID TimeDiff segment_id segment_length LocalTime Latitude Longitude X Y KPH Sn AGL VerticalRate abs_angle absVR
105 5 1 10 2019-06-25 16:44:10 41.63805 -92.88300 509745.4 4609596 0.07 0.52 9.55 -0.99 0.45 0.99
105 6 1 10 2019-06-25 16:44:16 41.63805 -92.88299 509746.0 4609597 0.07 0.17 9.55 0.00 0.21 0.00
105 5 1 10 2019-06-25 16:44:21 41.63806 -92.88298 509746.6 4609597 0.20 0.14 9.55 0.00 0.91 0.00
105 6 1 10 2019-06-25 16:44:27 41.63807 -92.88298 509746.6 4609599 0.16 0.22 9.55 0.00 3.14 0.00
105 5 1 10 2019-06-25 16:44:32 41.63807 -92.88298 509746.6 4609598 0.11 0.08 11.55 0.39 2.69 0.39
105 6 1 10 2019-06-25 16:44:38 41.63808 -92.88299 509746.0 4609599 0.07 0.22 11.55 0.00 0.53 0.00

Distribution Skewness Check

Some variables were positively skewed, so a square-root transformation was used to make their distributions more even before clustering.

Data Sampling


  • Full dataset was too large for clustering (~2.1M points).

  • Took 500 points per eagle to create an even sample.

  • Ensures balanced representation across all 57 eagles.

  • Confirmed the pattern is consistent across all samples.

Checking Random Samples

Multiple random samples show similar patterns, indicating that the sampling is consistent and ready for clustering.

Choosing K: WSS Method

Within Sum of Square

  • No clear “elbow” appeared in the plot.

  • The curve decreased smoothly without a sharp bend.

  • This suggests we need a better method for choosing the number of clusters.

Choosing K: Silhouette Method

Average silhouette width method

  • Bootstrapped average silhouette widths were highest for K = 2 and K = 4.

  • K = 4 showed slightly higher average silhouette width overall.

  • k = 4 is the optimal number of clusters

Cluster Comparisons Across K = 2–7

Sampled and scaled data were clustered at multiple K values to visualize how the behavioral structure changes.

Variables Overview


  • KPH: instantaneous speed

  • Sn: average speed

  • AGL: height above ground

  • Vertical Rate: upward/downward movement

  • Abs_angle: absolute turning angle

Distinct flight behaviors

Identifying differences between perching and in-flight behavior.

  • Perching: Very low speed (both instantaneous and average) and low altitude.

  • Ascending: Strong upward movement (VR) with higher altitude gains(AGL).

  • Flapping: Moderate speeds and mixed movement patterns near the center.

  • Gliding: very high speed (both instantaneous and average).

Behavioral Interpretation of clusters

Flight with different in-flight behaviors

• Mostly gliding and ascending, with a bit of flapping.

  • Gliding: highest speeds (KPH).

  • Ascending has higher altitudes (AGL).

  • Flapping: moderate speeds, occurring closer to the ground.

Summary

  • Yes, the movement variables clearly separate perching from in-flight points.

  • Among points, four distinct behaviors appear:

    • Perching: very low speed/altitude, little movement, large turns

    • Ascending: medium speed, rising altitude, upward movement

    • Flapping: mid speeds, steady altitude, moderate turns

    • Gliding: fastest speeds, high altitude, low turns, slight descent

Appendix

Data

library(readxl)
library(tidyverse)
library(vegan)
library(knitr)
library(kableExtra)

load('eagle_data.Rdata')

eagle_data_table <- eagle_data %>%
  as_tibble()

kable(
  eagle_data_table %>% slice_head(n = 6),
  caption = "",
  align = "c"
)%>%
  kable_styling(font_size = 14, full_width = TRUE) 

Distribution

eagle_data_td <- eagle_data %>%
  mutate(
    KPH = sqrt(KPH),
    Sn = sqrt(Sn),
    AGL = sqrt(AGL),
    abs_angle = sqrt(abs_angle),
    VerticalRate = VerticalRate,
    absVR = sqrt(absVR)
  )

ordered_vars2 <- c("KPH", "Sn", "AGL", "abs_angle", "VerticalRate", "absVR")

df_long2 <- eagle_data_td %>%
  select(all_of(ordered_vars2)) %>%
  pivot_longer(cols = everything(),
               names_to = "Variable", 
               values_to = "Value") %>%
  mutate(Variable = factor(Variable, levels = ordered_vars2))


p1<- ggplot(df_long2, aes(x = Value)) +
  geom_histogram(aes(y = after_stat(density)), 
                 bins = 60, fill = "grey50", color = "grey30") +
  facet_wrap(~ Variable, scales = "free", ncol = 3) +
  labs(y = "Density", x = NULL) +
  theme_classic(base_size = 12)+
  theme(
    strip.text = element_text(size = 10),
    aspect.ratio = 1
  )

p1 

Random samples

create_biplot_clean <- function(data, n_points = 500, title_suffix = "") {

  sampled <- data %>%
    group_by(Animal_ID) %>%
    slice_sample(n = n_points, replace = TRUE) %>%
    ungroup() %>%
    select(KPH:absVR) %>%
    scale()


  eagle_pca <- prcomp(sampled, center = TRUE)

  if (cor(eagle_pca$x[,2], sampled[, "AGL"]) < 0) {
    eagle_pca$x[,2] <- -eagle_pca$x[,2]
    eagle_pca$rotation[,2] <- -eagle_pca$rotation[,2]
  }

  scores <- as.data.frame(eagle_pca$x)
  scores$PC1 <- scores$PC1
  scores$PC2 <- scores$PC2



  ggplot() +
    geom_point(data = scores, aes(PC1, PC2),
               size = 0.2,
               alpha = 0.5,
               color="steelblue") +
    coord_equal(xlim = c(-7, 7), ylim = c(-5, 5)) +
    labs(title = title_suffix, x = "PC1", y = "PC2") +
    theme_classic(base_size = 14)+
    theme(
  panel.spacing = unit(1, "lines"),,
  aspect.ratio = 0.8,
  plot.margin = margin(7, 7, 7, 7),
  strip.text = element_text(size = 10)
)
}


biplot1 <- create_biplot_clean(eagle_data_td, title_suffix = "Sample 1")
biplot2 <- create_biplot_clean(eagle_data_td, title_suffix = "Sample 2")
biplot3 <- create_biplot_clean(eagle_data_td, title_suffix = "Sample 3")
biplot4 <- create_biplot_clean(eagle_data_td, title_suffix = "Sample 4")

all_biplots <- wrap_plots(biplot1, biplot2, biplot3, biplot4, ncol = 2)


saveRDS(all_biplots, "all_biplots.rds")
all_biplots <- readRDS("all_biplots.rds")

all_biplots

WSS Plot

 wss_plot <- fviz_nbclust(
  eagle_sample_scaled,
   FUNcluster = kmeans,
   method = "wss"
 ) +
labs(title = "")+
  theme_classic(base_size = 8)


saveRDS(wss_plot, "wss_plot.rds")
wss_plot <- readRDS("wss_plot.rds")


wss_plot

Silhouette plot

library(cluster)
sample_and_scale <- function(data, n_points = 1000) {
  data %>%
    group_by(Animal_ID) %>%
    slice_sample(n = n_points, replace = TRUE) %>%
    ungroup() %>%
    select(KPH:absVR) %>%
    scale() %>%
    as.data.frame()
}

silhouette_fun <- function(df, kmax = 10) {
  d <- dist(df)
  tibble(
    k = 2:kmax,
    sil = map_dbl(2:kmax, ~{
      km <- kmeans(df, centers = .x)
      ss <- silhouette(km$cluster, d)
      mean(ss[, 3])
    })
  )
}

set.seed(123)
n_runs <- 20


sil_results <- map_df(1:n_runs, ~{
  df_sample <- sample_and_scale(eagle_data)
  silhouette_fun(df_sample) %>% mutate(run = .x)
})

sil_avg <- sil_results %>%
  group_by(k) %>%
  summarize(sil = mean(sil))


saveRDS(sil_results, "sil_results.rds")
saveRDS(sil_avg, "sil_avg.rds")

ggplot() +
  geom_line(
    data = sil_results,
    aes(k, sil, group = run),
    alpha = 0.15
  ) +
  geom_line(
    data = sil_avg,
    aes(k, sil),
    size = 1.4,
    color = "blue"
  ) +
  labs(
    title = "",
    x = "K",
    y = "Average silhouette Width"
  ) +
  theme_classic(base_size = 16)

Cluster Across K=2-7

library(patchwork)
set.seed(123)


k_values <- 2:7
plots <- list()


eagle_pca <- prcomp(eagle_sample_scaled, center = TRUE, scale. = TRUE)

if(cor(eagle_pca$x[,2], eagle_sample_scaled[,"AGL"]) < 0) {
  eagle_pca$x[,2] <- -eagle_pca$x[,2]
  eagle_pca$rotation[,2] <- -eagle_pca$rotation[,2]
}

scores <- as.data.frame(eagle_pca$x)
loadings <- as.data.frame(eagle_pca$rotation)
loadings$var <- rownames(loadings)


arrow_scale <- 3
loadings$PC1 <- loadings$PC1 * arrow_scale
loadings$PC2 <- loadings$PC2 * arrow_scale

for (k in k_values) {

  kmeans_clusters <- kmeans(eagle_sample_scaled,
                            centers = k,
                            iter.max = 50,
                            nstart = 10)
  scores$cluster <- factor(kmeans_clusters$cluster)

  p <- ggplot(scores, aes(x = PC1,
                          y = PC2,
                          color = cluster)) +
    geom_point(size = 0.4,
               alpha = 0.5) +
    scale_x_continuous(limits = c(-4, 7), name = "PC1") +
    scale_y_continuous(limits = c(-2, 5), name = "PC2") +
    ggtitle(paste("K =", k)) +
    guides(color = "none") +
    theme_classic(base_size = 11)+
    theme(
       panel.spacing = unit(1, "lines"),,
       aspect.ratio = 0.8,
       plot.margin = margin(7, 7, 7, 7),
       strip.text = element_text(size = 10)
)

  plots[[k]] <- p
}


all_biplots_2_7 <- plots[[2]] + plots[[3]] + plots[[4]] + plots[[5]] +   plots[[6]] + plots[[7]] + plot_layout(nrow = 2)

saveRDS(all_biplots_2_7, "all_pca_biplots_2_7.rds")

all_biplots_2_7 <- readRDS("all_pca_biplots_2_7.rds")


all_biplots_2_7

K=4 Cluster plot

library(ggrepel)
eagle_pca <- prcomp(eagle_sample_scaled, center = TRUE, scale. = TRUE)

if (cor(eagle_pca$x[,2], eagle_sample_scaled[,"AGL"]) < 0) {
  eagle_pca$x[,2] <- -eagle_pca$x[,2]
  eagle_pca$rotation[,2] <- -eagle_pca$rotation[,2]
}

scores <- as.data.frame(eagle_pca$x)

kmeans_clusters <- kmeans(eagle_sample_scaled, centers = 4, iter.max = 50, nstart = 10)

cluster_order <- order(tapply(eagle_sample_scaled[,1], kmeans_clusters$cluster, mean))
cluster_map <- c(
  "Perching"  = cluster_order[1],
  "Ascending" = cluster_order[2],
  "Flapping"  = cluster_order[3],
  "Gliding"   = cluster_order[4]
)

scores$cluster <- factor(
  kmeans_clusters$cluster,
  levels = unname(cluster_map),
  labels = names(cluster_map)
)

my_cols <- c(
  "Perching"  = "#7CAE00",
  "Ascending" = "#F8766D",
  "Flapping"  = "#00BFC4",
  "Gliding"   = "#C77CFF"
)

loadings <- as.data.frame(eagle_pca$rotation)
loadings$var <- rownames(loadings)

arrow_scale <- 6
loadings$PC1 <- loadings$PC1 * arrow_scale
loadings$PC2 <- loadings$PC2 * arrow_scale

p <- ggplot(scores, aes(x = PC1, y = PC2, color = cluster)) +
  geom_point(size = 1.5,
             alpha = 1) +
  geom_segment(data = loadings,
               aes(x = 0, y = 0, xend = PC1, yend = PC2),
               arrow = arrow(length = unit(0.3, "cm")),
               color = "black",
               size = 1.2) +
  geom_text_repel(data = loadings,
                  aes(x = PC1, y = PC2, label = var),
                  color = "black",
                  size = 10,
                  nudge_x = loadings$PC1 * 0.1,
                  nudge_y = loadings$PC2 * 0.1) +
  scale_color_manual(values = my_cols) +
  coord_equal() +
  scale_x_continuous(limits = c(-4, 7), name = "PC1") +
  scale_y_continuous(limits = c(-2, 5), name = "PC2") +
  ggtitle("K = 4") +
  theme_classic(base_size = 28)+
  theme(legend.position = "none")

saveRDS(p, "k4_cluster_plot.rds")
k4_plot <- readRDS("k4_cluster_plot.rds")
k4_plot

Box plots

centers <- as.data.frame(kmeans_clusters$centers)

cluster_order <- centers %>%
  mutate(cluster = 1:4) %>%
  arrange(KPH) %>%
  pull(cluster)


cluster_map <- c(
  "Perching"  = cluster_order[1],
  "Ascending" = cluster_order[2],
  "Flapping"  = cluster_order[3],
  "Gliding"   = cluster_order[4]
)

clusters_fixed <- factor(
  kmeans_clusters$cluster,
  levels = unname(cluster_map),
  labels = names(cluster_map)
)

cluster_colors <- c(
  "Perching" =  "#7CAE00",
  "Ascending" =  "#F8766D",
  "Flapping" = "#00BFC4",
  "Gliding" ="#C77CFF"
)

eagle_small <- eagle_small %>%
  mutate(Cluster = clusters_fixed)


saveRDS(eagle_pca, "eagle_pca.rds")
saveRDS(kmeans_clusters, "kmeans_clusters.rds")
saveRDS(cluster_order, "cluster_order.rds")
saveRDS(cluster_map, "cluster_map.rds")
saveRDS(cluster_colors, "cluster_colors.rds")
saveRDS(eagle_small, "eagle_small_with_clusters.rds")

eagle_pca <- readRDS("eagle_pca.rds")
kmeans_clusters <- readRDS("kmeans_clusters.rds")
cluster_order <- readRDS("cluster_order.rds")
cluster_map <- readRDS("cluster_map.rds")
cluster_colors <- readRDS("cluster_colors.rds")
eagle_small <- readRDS("eagle_small_with_clusters.rds")


make_box <- function(var_name, y_label){
  ggplot(eagle_small, aes(x = Cluster, y = .data[[var_name]], fill = Cluster)) +
    geom_boxplot(color = "black", outlier.shape = NA) +
    scale_fill_manual(values = cluster_colors) +
    labs(
      title = var_name,
      x = "",
      y = y_label
    ) +
    geom_hline(yintercept = 0, linetype = "dashed") +
    theme_classic(base_size = 12) +
    theme(legend.position = "none")
}

p1 <- make_box("KPH" , "(km/hr)")
p2 <- make_box("AGL", "(m)")
p3 <- make_box("abs_angle","Absolute Turn Angle (radians)")
p4 <- make_box("VerticalRate", "Vertical Rate of Change (m/s)") +
  scale_y_continuous(limits = c(-5, 5))

(p1 | p2) /
(p3 | p4)

Flight with different in-flight behaviors

vars <- c("KPH", "Sn", "AGL", "abs_angle", "VerticalRate")


df <- eagle_data %>%
  drop_na(all_of(vars))

X <- scale(df %>% select(all_of(vars)))

set.seed(123)
km <- kmeans(X, centers = 4)

df$cluster <- factor(km$cluster)


perching_cluster <- df %>%
  group_by(cluster) %>%
  summarise(mean_speed = mean(KPH, na.rm = TRUE)) %>%
  arrange(mean_speed) %>%
  slice(1) %>% pull(cluster)

bird_id <- 107


flight_segment <- df %>%
  filter(Animal_ID == bird_id, cluster != perching_cluster) %>%
  count(segment_id) %>%
  arrange(desc(n))


flight_segment <- flight_segment$segment_id[10]   



bird_df <- df %>%
  filter(Animal_ID == bird_id) %>%
  filter(segment_id == flight_segment) %>% 
  filter(cluster != perching_cluster) %>% 
  arrange(LocalTime) %>%
  mutate(seconds = cumsum(TimeDiff))


remaining_clusters <- sort(unique(bird_df$cluster))

custom_colors <- setNames(
  c("#F8766D", "#00BFC4", "#C77CFF"),
  remaining_clusters
)



p_kph <- ggplot(bird_df, aes(x = seconds, y = KPH)) +
  geom_line(color = "grey70", linewidth = 0.8, alpha = 0.4) +   
  geom_point(aes(color = cluster), size = 2.5, alpha = 0.9) + 
  scale_color_manual(values = custom_colors) +
  theme_classic(base_size = 18) +
  theme(legend.position = "none") +
  labs(
    title = "KPH Over Time",
    x = "Seconds",
    y = "KPH"
  )

p_agl <- ggplot(bird_df, aes(x = seconds, y = AGL, color = cluster)) +
  geom_line(alpha = 0.5) +
  geom_point(size = 2.5) +
  scale_color_manual(values = custom_colors) +
  theme_classic(base_size = 18) +
  theme(legend.position = "none")+
  labs(
    title = paste("AGL Over Time"),
    x = "Seconds",
    y = " Above the ground (m)"
  )


final_plot <- p_kph / p_agl
final_plot