A look at four distinct behavioral clusters: Perching, Gliding, Ascending, and Flapping
Bergen et al., 2021
Classify bald eagle GPS points across Iowa using K-means clustering.
Focus:
Bald eagle movement variables (KPH, Sn, AGL, |Angle|, Vertical rate, |VR|)
Distinguish in-flight vs perching points
Identify flight behaviors: ascending, flapping, gliding
2,093,022 GPS points from 57 eagles over 4 years
Contains six focal variables associated with GPS data collected from bald eagles in Iowa, USA.
| Animal_ID | TimeDiff | segment_id | segment_length | LocalTime | Latitude | Longitude | X | Y | KPH | Sn | AGL | VerticalRate | abs_angle | absVR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 105 | 5 | 1 | 10 | 2019-06-25 16:44:10 | 41.63805 | -92.88300 | 509745.4 | 4609596 | 0.07 | 0.52 | 9.55 | -0.99 | 0.45 | 0.99 |
| 105 | 6 | 1 | 10 | 2019-06-25 16:44:16 | 41.63805 | -92.88299 | 509746.0 | 4609597 | 0.07 | 0.17 | 9.55 | 0.00 | 0.21 | 0.00 |
| 105 | 5 | 1 | 10 | 2019-06-25 16:44:21 | 41.63806 | -92.88298 | 509746.6 | 4609597 | 0.20 | 0.14 | 9.55 | 0.00 | 0.91 | 0.00 |
| 105 | 6 | 1 | 10 | 2019-06-25 16:44:27 | 41.63807 | -92.88298 | 509746.6 | 4609599 | 0.16 | 0.22 | 9.55 | 0.00 | 3.14 | 0.00 |
| 105 | 5 | 1 | 10 | 2019-06-25 16:44:32 | 41.63807 | -92.88298 | 509746.6 | 4609598 | 0.11 | 0.08 | 11.55 | 0.39 | 2.69 | 0.39 |
| 105 | 6 | 1 | 10 | 2019-06-25 16:44:38 | 41.63808 | -92.88299 | 509746.0 | 4609599 | 0.07 | 0.22 | 11.55 | 0.00 | 0.53 | 0.00 |
Some variables were positively skewed, so a square-root transformation was used to make their distributions more even before clustering.
Full dataset was too large for clustering (~2.1M points).
Took 500 points per eagle to create an even sample.
Ensures balanced representation across all 57 eagles.
Confirmed the pattern is consistent across all samples.
Multiple random samples show similar patterns, indicating that the sampling is consistent and ready for clustering.
Within Sum of Square
No clear “elbow” appeared in the plot.
The curve decreased smoothly without a sharp bend.
This suggests we need a better method for choosing the number of clusters.
Average silhouette width method
Bootstrapped average silhouette widths were highest for K = 2 and K = 4.
K = 4 showed slightly higher average silhouette width overall.
k = 4 is the optimal number of clusters
Sampled and scaled data were clustered at multiple K values to visualize how the behavioral structure changes.
KPH: instantaneous speed
Sn: average speed
AGL: height above ground
Vertical Rate: upward/downward movement
Abs_angle: absolute turning angle
Identifying differences between perching and in-flight behavior.
Perching: Very low speed (both instantaneous and average) and low altitude.
Ascending: Strong upward movement (VR) with higher altitude gains(AGL).
Flapping: Moderate speeds and mixed movement patterns near the center.
Gliding: very high speed (both instantaneous and average).
• Mostly gliding and ascending, with a bit of flapping.
Gliding: highest speeds (KPH).
Ascending has higher altitudes (AGL).
Flapping: moderate speeds, occurring closer to the ground.
Yes, the movement variables clearly separate perching from in-flight points.
Among points, four distinct behaviors appear:
Perching: very low speed/altitude, little movement, large turns
Ascending: medium speed, rising altitude, upward movement
Flapping: mid speeds, steady altitude, moderate turns
Gliding: fastest speeds, high altitude, low turns, slight descent
eagle_data_td <- eagle_data %>%
mutate(
KPH = sqrt(KPH),
Sn = sqrt(Sn),
AGL = sqrt(AGL),
abs_angle = sqrt(abs_angle),
VerticalRate = VerticalRate,
absVR = sqrt(absVR)
)
ordered_vars2 <- c("KPH", "Sn", "AGL", "abs_angle", "VerticalRate", "absVR")
df_long2 <- eagle_data_td %>%
select(all_of(ordered_vars2)) %>%
pivot_longer(cols = everything(),
names_to = "Variable",
values_to = "Value") %>%
mutate(Variable = factor(Variable, levels = ordered_vars2))
p1<- ggplot(df_long2, aes(x = Value)) +
geom_histogram(aes(y = after_stat(density)),
bins = 60, fill = "grey50", color = "grey30") +
facet_wrap(~ Variable, scales = "free", ncol = 3) +
labs(y = "Density", x = NULL) +
theme_classic(base_size = 12)+
theme(
strip.text = element_text(size = 10),
aspect.ratio = 1
)
p1 create_biplot_clean <- function(data, n_points = 500, title_suffix = "") {
sampled <- data %>%
group_by(Animal_ID) %>%
slice_sample(n = n_points, replace = TRUE) %>%
ungroup() %>%
select(KPH:absVR) %>%
scale()
eagle_pca <- prcomp(sampled, center = TRUE)
if (cor(eagle_pca$x[,2], sampled[, "AGL"]) < 0) {
eagle_pca$x[,2] <- -eagle_pca$x[,2]
eagle_pca$rotation[,2] <- -eagle_pca$rotation[,2]
}
scores <- as.data.frame(eagle_pca$x)
scores$PC1 <- scores$PC1
scores$PC2 <- scores$PC2
ggplot() +
geom_point(data = scores, aes(PC1, PC2),
size = 0.2,
alpha = 0.5,
color="steelblue") +
coord_equal(xlim = c(-7, 7), ylim = c(-5, 5)) +
labs(title = title_suffix, x = "PC1", y = "PC2") +
theme_classic(base_size = 14)+
theme(
panel.spacing = unit(1, "lines"),,
aspect.ratio = 0.8,
plot.margin = margin(7, 7, 7, 7),
strip.text = element_text(size = 10)
)
}
biplot1 <- create_biplot_clean(eagle_data_td, title_suffix = "Sample 1")
biplot2 <- create_biplot_clean(eagle_data_td, title_suffix = "Sample 2")
biplot3 <- create_biplot_clean(eagle_data_td, title_suffix = "Sample 3")
biplot4 <- create_biplot_clean(eagle_data_td, title_suffix = "Sample 4")
all_biplots <- wrap_plots(biplot1, biplot2, biplot3, biplot4, ncol = 2)
saveRDS(all_biplots, "all_biplots.rds")
all_biplots <- readRDS("all_biplots.rds")
all_biplotslibrary(cluster)
sample_and_scale <- function(data, n_points = 1000) {
data %>%
group_by(Animal_ID) %>%
slice_sample(n = n_points, replace = TRUE) %>%
ungroup() %>%
select(KPH:absVR) %>%
scale() %>%
as.data.frame()
}
silhouette_fun <- function(df, kmax = 10) {
d <- dist(df)
tibble(
k = 2:kmax,
sil = map_dbl(2:kmax, ~{
km <- kmeans(df, centers = .x)
ss <- silhouette(km$cluster, d)
mean(ss[, 3])
})
)
}
set.seed(123)
n_runs <- 20
sil_results <- map_df(1:n_runs, ~{
df_sample <- sample_and_scale(eagle_data)
silhouette_fun(df_sample) %>% mutate(run = .x)
})
sil_avg <- sil_results %>%
group_by(k) %>%
summarize(sil = mean(sil))
saveRDS(sil_results, "sil_results.rds")
saveRDS(sil_avg, "sil_avg.rds")
ggplot() +
geom_line(
data = sil_results,
aes(k, sil, group = run),
alpha = 0.15
) +
geom_line(
data = sil_avg,
aes(k, sil),
size = 1.4,
color = "blue"
) +
labs(
title = "",
x = "K",
y = "Average silhouette Width"
) +
theme_classic(base_size = 16)library(patchwork)
set.seed(123)
k_values <- 2:7
plots <- list()
eagle_pca <- prcomp(eagle_sample_scaled, center = TRUE, scale. = TRUE)
if(cor(eagle_pca$x[,2], eagle_sample_scaled[,"AGL"]) < 0) {
eagle_pca$x[,2] <- -eagle_pca$x[,2]
eagle_pca$rotation[,2] <- -eagle_pca$rotation[,2]
}
scores <- as.data.frame(eagle_pca$x)
loadings <- as.data.frame(eagle_pca$rotation)
loadings$var <- rownames(loadings)
arrow_scale <- 3
loadings$PC1 <- loadings$PC1 * arrow_scale
loadings$PC2 <- loadings$PC2 * arrow_scale
for (k in k_values) {
kmeans_clusters <- kmeans(eagle_sample_scaled,
centers = k,
iter.max = 50,
nstart = 10)
scores$cluster <- factor(kmeans_clusters$cluster)
p <- ggplot(scores, aes(x = PC1,
y = PC2,
color = cluster)) +
geom_point(size = 0.4,
alpha = 0.5) +
scale_x_continuous(limits = c(-4, 7), name = "PC1") +
scale_y_continuous(limits = c(-2, 5), name = "PC2") +
ggtitle(paste("K =", k)) +
guides(color = "none") +
theme_classic(base_size = 11)+
theme(
panel.spacing = unit(1, "lines"),,
aspect.ratio = 0.8,
plot.margin = margin(7, 7, 7, 7),
strip.text = element_text(size = 10)
)
plots[[k]] <- p
}
all_biplots_2_7 <- plots[[2]] + plots[[3]] + plots[[4]] + plots[[5]] + plots[[6]] + plots[[7]] + plot_layout(nrow = 2)
saveRDS(all_biplots_2_7, "all_pca_biplots_2_7.rds")
all_biplots_2_7 <- readRDS("all_pca_biplots_2_7.rds")
all_biplots_2_7library(ggrepel)
eagle_pca <- prcomp(eagle_sample_scaled, center = TRUE, scale. = TRUE)
if (cor(eagle_pca$x[,2], eagle_sample_scaled[,"AGL"]) < 0) {
eagle_pca$x[,2] <- -eagle_pca$x[,2]
eagle_pca$rotation[,2] <- -eagle_pca$rotation[,2]
}
scores <- as.data.frame(eagle_pca$x)
kmeans_clusters <- kmeans(eagle_sample_scaled, centers = 4, iter.max = 50, nstart = 10)
cluster_order <- order(tapply(eagle_sample_scaled[,1], kmeans_clusters$cluster, mean))
cluster_map <- c(
"Perching" = cluster_order[1],
"Ascending" = cluster_order[2],
"Flapping" = cluster_order[3],
"Gliding" = cluster_order[4]
)
scores$cluster <- factor(
kmeans_clusters$cluster,
levels = unname(cluster_map),
labels = names(cluster_map)
)
my_cols <- c(
"Perching" = "#7CAE00",
"Ascending" = "#F8766D",
"Flapping" = "#00BFC4",
"Gliding" = "#C77CFF"
)
loadings <- as.data.frame(eagle_pca$rotation)
loadings$var <- rownames(loadings)
arrow_scale <- 6
loadings$PC1 <- loadings$PC1 * arrow_scale
loadings$PC2 <- loadings$PC2 * arrow_scale
p <- ggplot(scores, aes(x = PC1, y = PC2, color = cluster)) +
geom_point(size = 1.5,
alpha = 1) +
geom_segment(data = loadings,
aes(x = 0, y = 0, xend = PC1, yend = PC2),
arrow = arrow(length = unit(0.3, "cm")),
color = "black",
size = 1.2) +
geom_text_repel(data = loadings,
aes(x = PC1, y = PC2, label = var),
color = "black",
size = 10,
nudge_x = loadings$PC1 * 0.1,
nudge_y = loadings$PC2 * 0.1) +
scale_color_manual(values = my_cols) +
coord_equal() +
scale_x_continuous(limits = c(-4, 7), name = "PC1") +
scale_y_continuous(limits = c(-2, 5), name = "PC2") +
ggtitle("K = 4") +
theme_classic(base_size = 28)+
theme(legend.position = "none")
saveRDS(p, "k4_cluster_plot.rds")
k4_plot <- readRDS("k4_cluster_plot.rds")
k4_plotcenters <- as.data.frame(kmeans_clusters$centers)
cluster_order <- centers %>%
mutate(cluster = 1:4) %>%
arrange(KPH) %>%
pull(cluster)
cluster_map <- c(
"Perching" = cluster_order[1],
"Ascending" = cluster_order[2],
"Flapping" = cluster_order[3],
"Gliding" = cluster_order[4]
)
clusters_fixed <- factor(
kmeans_clusters$cluster,
levels = unname(cluster_map),
labels = names(cluster_map)
)
cluster_colors <- c(
"Perching" = "#7CAE00",
"Ascending" = "#F8766D",
"Flapping" = "#00BFC4",
"Gliding" ="#C77CFF"
)
eagle_small <- eagle_small %>%
mutate(Cluster = clusters_fixed)
saveRDS(eagle_pca, "eagle_pca.rds")
saveRDS(kmeans_clusters, "kmeans_clusters.rds")
saveRDS(cluster_order, "cluster_order.rds")
saveRDS(cluster_map, "cluster_map.rds")
saveRDS(cluster_colors, "cluster_colors.rds")
saveRDS(eagle_small, "eagle_small_with_clusters.rds")
eagle_pca <- readRDS("eagle_pca.rds")
kmeans_clusters <- readRDS("kmeans_clusters.rds")
cluster_order <- readRDS("cluster_order.rds")
cluster_map <- readRDS("cluster_map.rds")
cluster_colors <- readRDS("cluster_colors.rds")
eagle_small <- readRDS("eagle_small_with_clusters.rds")
make_box <- function(var_name, y_label){
ggplot(eagle_small, aes(x = Cluster, y = .data[[var_name]], fill = Cluster)) +
geom_boxplot(color = "black", outlier.shape = NA) +
scale_fill_manual(values = cluster_colors) +
labs(
title = var_name,
x = "",
y = y_label
) +
geom_hline(yintercept = 0, linetype = "dashed") +
theme_classic(base_size = 12) +
theme(legend.position = "none")
}
p1 <- make_box("KPH" , "(km/hr)")
p2 <- make_box("AGL", "(m)")
p3 <- make_box("abs_angle","Absolute Turn Angle (radians)")
p4 <- make_box("VerticalRate", "Vertical Rate of Change (m/s)") +
scale_y_continuous(limits = c(-5, 5))
(p1 | p2) /
(p3 | p4)vars <- c("KPH", "Sn", "AGL", "abs_angle", "VerticalRate")
df <- eagle_data %>%
drop_na(all_of(vars))
X <- scale(df %>% select(all_of(vars)))
set.seed(123)
km <- kmeans(X, centers = 4)
df$cluster <- factor(km$cluster)
perching_cluster <- df %>%
group_by(cluster) %>%
summarise(mean_speed = mean(KPH, na.rm = TRUE)) %>%
arrange(mean_speed) %>%
slice(1) %>% pull(cluster)
bird_id <- 107
flight_segment <- df %>%
filter(Animal_ID == bird_id, cluster != perching_cluster) %>%
count(segment_id) %>%
arrange(desc(n))
flight_segment <- flight_segment$segment_id[10]
bird_df <- df %>%
filter(Animal_ID == bird_id) %>%
filter(segment_id == flight_segment) %>%
filter(cluster != perching_cluster) %>%
arrange(LocalTime) %>%
mutate(seconds = cumsum(TimeDiff))
remaining_clusters <- sort(unique(bird_df$cluster))
custom_colors <- setNames(
c("#F8766D", "#00BFC4", "#C77CFF"),
remaining_clusters
)
p_kph <- ggplot(bird_df, aes(x = seconds, y = KPH)) +
geom_line(color = "grey70", linewidth = 0.8, alpha = 0.4) +
geom_point(aes(color = cluster), size = 2.5, alpha = 0.9) +
scale_color_manual(values = custom_colors) +
theme_classic(base_size = 18) +
theme(legend.position = "none") +
labs(
title = "KPH Over Time",
x = "Seconds",
y = "KPH"
)
p_agl <- ggplot(bird_df, aes(x = seconds, y = AGL, color = cluster)) +
geom_line(alpha = 0.5) +
geom_point(size = 2.5) +
scale_color_manual(values = custom_colors) +
theme_classic(base_size = 18) +
theme(legend.position = "none")+
labs(
title = paste("AGL Over Time"),
x = "Seconds",
y = " Above the ground (m)"
)
final_plot <- p_kph / p_agl
final_plot