Classifying Bald Eagle Behavior from GPS Data

Background

Bergen et al. (2022), Ecology & Evolution

  • Study focuses on free-flying bald eagles in Iowa tracked with short-interval GPS biologgers (3–10 second fixes)
  • This produces millions of fine-scale movement points, offering rich behavioral information but also major analytical challenges
  • Goal: Use K-means clustering on short-interval GPS data to classify behavior

Research questions:

  1. Can we use the movement variables KPH, Sn, AGL, |Angle|, Vertical rate, and |VR| to distinguish in-flight from perching points?

  2. Of the in-flight points:

    • Are there distinct flight behaviors that can be identified?
    • What are the characteristics of these behaviors?
    • What are some visual examples of flight segments that demonstrate the different types of in-flight behaviors?

Data Snippet

  • 6 movement variables used for clustering:
    • Speed (KPH)
    • Speed between fixes (Sn)
    • Altitude above ground (AGL)
    • Absolute turn angle (|Angle|)
    • Vertical rate (VR)
    • Absolute vertical rate (|VR|)
  • Data include millions of raw GPS points, sampled to 500 points per eagle for PCA/K-means
  Animal_ID TimeDiff segment_id segment_length           LocalTime  KPH   Sn   AGL abs_angle VerticalRate absVR
1       105        5          1             10 2019-06-25 16:44:10 0.07 0.52  9.55      0.45        -0.99  0.99
2       105        6          1             10 2019-06-25 16:44:16 0.07 0.17  9.55      0.21         0.00  0.00
3       105        5          1             10 2019-06-25 16:44:21 0.20 0.14  9.55      0.91         0.00  0.00
4       105        6          1             10 2019-06-25 16:44:27 0.16 0.22  9.55      3.14         0.00  0.00
5       105        5          1             10 2019-06-25 16:44:32 0.11 0.08 11.55      2.69         0.39  0.39
6       105        6          1             10 2019-06-25 16:44:38 0.07 0.22 11.55      0.53         0.00  0.00

Methods - Variable Distributions

  • Movement variables are strongly right-skewed
  • Speed, altitude, and vertical rates contain extreme values
  • Therefore, I performed transformations on the data to help stabilize the variation

Methods - Transformation & Standardization

  • Several movement variables (KPH, Sn, AGL, |Angle|, |VR|) were highly right-skewed
  • To reduce skew, I applied square-root transformations to these variables
    (VerticalRate itself was left untransformed)
  • Because K-means uses Euclidean distance, each variable needs to be on a comparable scale
  • All six movement variables (transformed + VerticalRate) were then
    standardized (mean = 0, sd = 1) before PCA and K-means clustering

Methods - Sampling

  • The full dataset contains millions of GPS points across multiple eagles.
  • To make computation feasible while retaining biological signal:
    • I used a random subsample (~10,000 points) for initial exploratory histograms
    • I used sampling by bird (e.g., 2,000 points per eagle) for clustering evaluation
  • Stratifying by individual ensures that all birds contribute equally
  • This sampling approach is used for:
    • bootstrap silhouette analysis
    • elbow plot
    • final clustering model

Choosing K - Silhouette Plot

Choosing K - WSS Plot

PCA – Cluster Structure (K = 2–7)

  • PCA provides a 2-dimensional representation of the six movement variables
  • Running K-means for K = 2–7 shows how cluster structure changes as K increases
  • Patterns seen here helped determine that K = 4 provides meaningful behavioral separation

PCA – Final Clusters (K = 4)

PCA – Final Clusters (K = 4)

  • Perching (gold): compact cluster at low PC1 with PC2 ≈ 0 → slow, near-ground points
  • Ascending (green): high PC2 near PC1 ≈ 0 → steep upward flights
  • Flapping (blue): spread around the origin → intermediate speeds and mixed movement
  • Gliding (orange): stretched along positive PC1 with PC2 ≈ 0 → fast, level flights
  • Overall, K = 4 captures four distinct movement behaviors in PCA space

PCA Loadings (PC1 & PC2)

Interpreting PCA Loadings

  • PC1: driven mainly by speed (KPH, Sn) and altitude (AGL)
  • PC2: driven mainly by vertical motion (VR, |VR|) and turn angle (|Angle|)
  • Together, PC1 and PC2 summarize the main trade-off between
    • fast / high vs slow / low flight, and
    • strong vertical / turning movement vs level gliding
  • These patterns help explain the cluster layout in the K = 4 PCA plot

Movement Variables by Behavior

Summary

  • Movement variables capture distinct behaviors
    • Speed, altitude, turn angle, and vertical rate together describe how eagles move in 3D space
  • PCA + K-means revealed four main flight behaviors
    • K = 4 clusters aligned with: perching, ascending,flapping and gliding
  • Behavioral patterns are interpretable from the plots
    • PCA/NMDS biplots and boxplots show clear differences in speed, altitude, and vertical rate across behavior

Appendix

Data snippet

options(width = 10000)

load(url("https://www.dropbox.com/scl/fi/cwk63bxp42m4h256xusg7/eagle_data.Rdata?rlkey=boszpwc6a0odvw9xl4yq0w5jq&dl=1"))

(eagle_data %>% as_tibble()) %>% head()

eagle_data |>
  dplyr::select(
    Animal_ID, TimeDiff, segment_id, segment_length,
    LocalTime, KPH, Sn, AGL
  ) |>
  head() |>
  knitr::kable()

Distribution Plot

set.seed(415)

eagle_fig1 <- eagle_data %>%
  dplyr::slice_sample(n = 10000)

cols <- c("KPH", "Sn", "AGL", "abs_angle", "VerticalRate", "absVR")

plot_hist <- function(df, var) {
  ggplot(df, aes(x = .data[[var]])) +
    geom_histogram(
      aes(y = after_stat(density)),
      bins = 60,
      fill = "grey50",
      color = "grey30"
    ) +
      labs(title = var, y = "Density", x = NULL) +
      theme_classic(base_size = 16)
}

p_list <-map(cols, ~plot_hist(eagle_fig1, .x))

(p_list[[1]] | p_list[[2]] | p_list[[3]]) / 
(p_list[[4]] | p_list[[5]] | p_list[[6]]) 

Silhouette Plot

sil_mean <- sil_boot %>%
  group_by(K) %>%
  summarize(mean_sil = mean(silhouette), .groups = "drop")

ggplot() +
  geom_line(data = sil_boot,
            aes(x = K, y = silhouette, group = bootstrap),
            alpha = 0.2) +  # lots of thin grey lines
  geom_line(data = sil_mean,
            aes(x = K, y = mean_sil),
            linewidth = 1.2) +  # thick mean line
  labs(
    title    = "Bootstrapped average silhouette vs K",
    subtitle = "Thin lines = bootstrap samples; thick line = mean",
    y        = "Average silhouette"
  )

WSS Plot

set.seed(123)

vars <- vars_tr

sample_per_bird <- 500     
K_vals <- 2:7               


df1 <- eagle_data_sqrt_trs %>%
  group_by(Animal_ID) %>%
  sample_n(sample_per_bird, replace = TRUE) %>%
  ungroup() %>%
  select(all_of(vars_tr))

X <- scale(df1)

wss <- numeric(length(K_vals))

for (i in seq_along(K_vals)) {
  k <- K_vals[i]
  km <- kmeans(X, centers = k, nstart = 20, iter.max = 100)
  wss[i] <- km$tot.withinss     
}

wss_df <- tibble(
  K   = K_vals,
  WSS = wss
)

K 2-7 Clusters

ggplot(all_scores, aes(PC1, PC2, color = cluster)) +
  geom_point(alpha = 0.25, size = 0.3) +
  facet_wrap(~ K, nrow = 2) +
  labs(
    x = "PC1",
    y = "PC2",
    title = "Bird 106 – PCA biplots for K-means clustering"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    strip.text = element_text(face = "bold")
  ) +
  scale_x_continuous(
  limits = c(-5, 5),
  breaks = c(-5, -2.5, 0, 2.5, 5)
) +
scale_y_continuous(
  limits = c(-5, 5),
  breaks = c(-5, -2.5, 0, 2.5, 5)
)

Biplot (K = 4)

library(factoextra)

behavior <- factor(
  km4$cluster,
  levels = c(2, 1, 4, 3),
  labels = c("Perching", "Ascending", "Flapping", "Gliding")
)

p <- fviz_pca(
  pca_fit,
  axes      = c(1, 2),
  habillage = behavior,
  palette   = c("gold", "forestgreen", "dodgerblue3", "darkorange"),
  label     = "var",
  repel     = TRUE,
  labelsize = 4,
  geom.ind  = "point",
  pointsize = 0.3,
  alpha.ind = 0.15,
  col.var   = "black",
  arrowsize = 0.7
)

p +
  coord_equal(xlim = c(-5, 10), ylim = c(-5, 10)) +
  labs(
    title = "K = 4",
    x = "PC1",
    y = "PC2",
    color = "Flight behavior"
  ) +
  guides(
    shape = "none",
    fill  = "none"
  ) +
  theme_classic(base_size = 16)

Loadings

#| fig-width: 7
#| fig-height: 4


loadings_long <- loadings %>%
  select(variable, PC1, PC2) %>%  
  pivot_longer(
    cols      = starts_with("PC"),
    names_to  = "PC",
    values_to = "loading"
  ) %>%
  group_by(PC) %>%                
  mutate(
    variable = forcats::fct_reorder(
      variable, loading, .desc = TRUE  
    )
  ) %>%
  ungroup()

ggplot(loadings_long,
       aes(x = variable, y = loading)) +
  geom_col(fill = "grey40") +
  facet_wrap(~ PC, ncol = 1) +    
  coord_cartesian(ylim = c(-0.5, 0.5)) +
  labs(
    x = "Variable",
    y = "Loading",
    title = "PC1 and PC2 loadings (transformed, standardized data)"
  ) +
  theme_minimal() +
  theme(
    strip.text = element_text(face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

Boxplots

df_box <- df_106 %>%
  mutate(
    cluster = factor(km4$cluster) 
  ) %>%
  select(
    cluster,
    KPH,
    AGL,
    abs_angle,
    VerticalRate
  )

cluster_box <- df_box %>%
  filter(VerticalRate >= -5, VerticalRate <= 5) %>%
  select(
    cluster,
    KPH,
    AGL,
    abs_angle,
    VerticalRate
  ) %>%
  tidyr::pivot_longer(
    cols      = c(KPH, AGL, abs_angle, VerticalRate),
    names_to  = "variable",
    values_to = "value"
  ) %>%
  mutate(
    variable = factor(variable,
                      levels = c("KPH", "AGL", "abs_angle", "VerticalRate"))
  )
cluster_cols <- c(
  "1" = "gold",
  "2" = "forestgreen",
  "3" = "dodgerblue3",
  "4" = "darkorange"
)

vr_line <- data.frame(variable = "VerticalRate", yint = 0)

ggplot(cluster_box, aes(x = cluster, y = value, fill = cluster)) +
  geom_boxplot(outlier.size = 0.7, alpha = 0.8) +
  geom_hline(
    data = vr_line,
    aes(yintercept = yint),
    linetype = "dashed",
    inherit.aes = FALSE
  ) +
  facet_wrap(~ variable, scales = "free_y", ncol = 2) +
  scale_fill_manual(values = cluster_cols) +
  labs(
    x = "Cluster",
    y = NULL,
    title = "Boxplots of key variables across clusters (untransformed values)"
  ) +
  theme_classic() +
  theme(
    legend.position = "none",
    strip.text      = element_text(face = "bold")
  )