Analyzing Bald Eagle Flight Behaviors Using GPS Biologger Data

Walter Weber

Introduction

Background on bald eagle GPS biologger study
Data collected from eagles in Iowa
GPS tracking provides detailed movement patterns

Research Questions

Can movement variables distinguish in-flight from perching points?
For in-flight points:
- Are there distinct flight behaviors?
- What characterizes these behaviors?
- Visual examples of different flight types?

Methods: Data Preparation

Loaded eagle GPS data containing 15 variables
Removed observations with missing values in key movement variables
Square-rooted transformed to account for skewness
Brought data from ~2000000 to ~400000 rows

Exploratory Data Analysis

Strong correlations between speed measures (KPH and Sn: r = 0.83)
Bimodal AGL distribution suggests distinct altitude patterns

Methods: Flight vs. Perching Classification

Used K-means clustering with all six movement variables
Standardized variables to ensure equal contribution to clustering
Evaluated optimal number of clusters using elbow method and silhouette analysis
Two clusters identified: perching (low speed, low altitude variability) and flight (high speed, dynamic movement)
Validated results by examining cluster characteristics

Results: Flight vs. Perching

Successfully classified 31.1% as flight, 68.9% as perching

Methods: Identifying Flight Behaviors

Applied K-means clustering to in-flight points only
Used same six movement variables as flight vs. perching analysis
Evaluated 2-10 clusters using elbow method and silhouette scores
Selected optimal number of clusters: k = 5
Assigned behavior labels to each in-flight observation

Results: Optimal Clusters Selection

Both methods suggest k = 5 is optimal

Results: Flight Behavior Types

Identified 5 distinct flight behavior types

Results: Behavior Characteristics

Behavior Interpretation

Behavior	N	Speed (KPH)	Altitude (m)	Vertical Rate	Turn Angle
1	157786	44.40	315.28	-1.31	0.24
2	92108	76.24	525.47	-1.24	0.14
3	188232	56.38	157.40	-0.22	0.22
4	58936	54.18	429.82	0.50	1.25
5	76724	61.73	394.90	-3.46	0.20

Results: Example Flight Segments

Conclusions

Successfully distinguished flight from perching using movement variables
Identified 5 distinct flight behavior types
Each behavior characterized by unique movement patterns:
- Differences in speed, altitude, and turning behavior
- Consistent with biological expectations from Bergen et al. (2022)
Implications for understanding eagle behavior and ecology
K-means clustering effective for large GPS datasets

References

Bergen, S., Huso, M.M., Duerr, A.E., Braham, M.A., Katzner, T.E., Schmuecker, S., & Miller, T.A. (2022). Classifying behavior from short-interval biologging data: An example with GPS tracking of birds. Ecology and Evolution, 12, e8395.

Appendix: R Code

Setup and Data Loading

# Load all required libraries
library(tidyverse)      # Data manipulation and visualization
library(cluster)        # Clustering algorithms and silhouette analysis
library(factoextra)     # PCA and cluster visualization
library(gridExtra)      # Arrange multiple plots

options(width = 10000)
load('eagle_data.Rdata')

# Convert to tibble for easier manipulation
eagle_data <- eagle_data %>% as_tibble()

# Inspect the data structure
head(eagle_data)
str(eagle_data)
summary(eagle_data)

Data Cleaning

# Check for missing values in key movement variables
# This helps us understand data quality before filtering
eagle_data %>% 
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR) %>%
  summarise(across(everything(), ~sum(is.na(.))))

# Remove observations with missing values in any of the key movement variables
# Complete cases are necessary for clustering algorithms
eagle_clean <- eagle_data %>%
  filter(!is.na(KPH), !is.na(Sn), !is.na(AGL), 
         !is.na(abs_angle), !is.na(VerticalRate), !is.na(absVR))

# Filter to segments with at least 10 consecutive time points
eagle_clean <- eagle_clean %>%
  filter(segment_length >= 10)

#Square-root transform to account for skewness
eagle_clean <- eagle_clean %>%
  mutate(
    KPH_trans = sqrt(KPH),
    Sn_trans = sqrt(Sn),
    AGL_trans = sqrt(AGL),
    abs_angle_trans = sqrt(abs_angle),
    absVR_trans = sqrt(absVR),
    VerticalRate_trans = VerticalRate
  )

# Verify cleaning results
cat("Original observations:", nrow(eagle_data), "\n")
cat("Clean observations:", nrow(eagle_clean), "\n")
cat("Unique eagles:", n_distinct(eagle_clean$Animal_ID), "\n")
cat("Unique segments:", n_distinct(eagle_clean$segment_id), "\n")

Exploratory Data Analysis: Summary Statistics

# Summary statistics of key movement variables
# Provides overview of central tendency and spread
eagle_clean %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR) %>%
  summary()

# Calculate standard deviations
eagle_clean %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR) %>%
  summarise(across(everything(), ~sd(., na.rm = TRUE)))

Exploratory Data Analysis: Correlations

# Correlation matrix of movement variables
cor_matrix <- eagle_clean %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR) %>%
  cor(use = "complete.obs")

print(round(cor_matrix, 2))

# Note the high correlation between KPH and Sn
# Both measure speed, so correlation is expected
cat("\nCorrelation between KPH and Sn:", round(cor_matrix["KPH", "Sn"], 3), "\n")

Exploratory Data Analysis: Visualizations

# Visualize distributions of each variable
# Instantaneous speed (KPH)
p1 <- ggplot(eagle_clean, aes(x = KPH)) + 
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  labs(title = "Distribution of KPH", x = "KPH", y = "Count") +
  theme_minimal()

# Speed between points (Sn)
p2 <- ggplot(eagle_clean, aes(x = Sn)) + 
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  labs(title = "Distribution of Sn", x = "Sn (m/s)", y = "Count") +
  theme_minimal()

# Altitude above ground level
p3 <- ggplot(eagle_clean, aes(x = AGL)) + 
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  labs(title = "Distribution of AGL", x = "AGL (m)", y = "Count") +
  theme_minimal()

# Absolute turn angle
p4 <- ggplot(eagle_clean, aes(x = abs_angle)) + 
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  labs(title = "Distribution of Absolute Angle", x = "Absolute Angle", y = "Count") +
  theme_minimal()

# Vertical rate (rate of altitude change)
p5 <- ggplot(eagle_clean, aes(x = VerticalRate)) + 
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  labs(title = "Distribution of Vertical Rate", x = "Vertical Rate", y = "Count") +
  theme_minimal()

# Absolute vertical rate
p6 <- ggplot(eagle_clean, aes(x = absVR)) + 
  geom_histogram(bins = 50, fill = "steelblue", alpha = 0.7) +
  labs(title = "Distribution of Absolute VR", x = "Absolute VR", y = "Count") +
  theme_minimal()

# Display all histograms in grid layout
grid.arrange(p1, p2, p3, p4, p5, p6, ncol = 3)

Exploratory Data Analysis: Pairs Plot

# Create pairs plot to visualize relationships between variables
# Sample data for easier computations
pairs_data <- eagle_clean %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR) %>%
  slice_sample(n = 5000)

# Create pairwise scatterplots
pairs(pairs_data, pch = 16, col = rgb(0, 0, 1, 0.1), 
      main = "Pairwise Scatterplots of Movement Variables")

Flight vs Perching: Standardize Data

# Select movement features for classification
movement_vars <- eagle_clean %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR)

# scale the variables
# Centers each variable at mean=0 and scales to sd=1
movement_scaled <- scale(movement_vars)

# Verify standardization
colMeans(movement_scaled)  # Should be approximately 0
apply(movement_scaled, 2, sd)  # Should be approximately 1

Flight vs Perching: Elbow Method

# Determine optimal number of clusters using elbow method
# Test k from 1 to 10 clusters
set.seed(123)  # Set seed for reproducibility
wss <- numeric(10)

for (k in 1:10) {
  # Fit k-means with k clusters
  # nstart=10 tries multiple random starting points
  kmeans_model <- kmeans(movement_scaled, centers = k, nstart = 10)
  # Store total within-cluster sum of squares
  wss[k] <- kmeans_model$tot.withinss
}

# Plot elbow curve
plot(1:10, wss, type = "b", pch = 19, 
     xlab = "Number of Clusters", 
     ylab = "Total Within-Cluster Sum of Squares",
     main = "Elbow Method for Optimal k")

Flight vs Perching: Silhouette Analysis

# Compute silhouette scores for different k values
# Silhouette measures how similar points are to their own cluster
# vs. other clusters (ranges from -1 to 1, higher is better)

# Use sample for speed - silhouette is computationally expensive
set.seed(123)
sample_size <- 2000
sample_idx <- sample(1:nrow(movement_scaled), sample_size)
movement_scaled_sample <- movement_scaled[sample_idx, ]

# Calculate silhouette scores for k=2 to k=10
sil_scores <- numeric(9)
for (k in 2:10) {
  kmeans_model <- kmeans(movement_scaled_sample, centers = k, nstart = 10)
  # Calculate silhouette for each point
  sil <- silhouette(kmeans_model$cluster, dist(movement_scaled_sample))
  # Store average silhouette width
  sil_scores[k-1] <- mean(sil[, 3])
}

# Plot silhouette scores
plot(2:10, sil_scores, type = "b", pch = 19,
     xlab = "Number of Clusters",
     ylab = "Average Silhouette Width",
     main = "Silhouette Analysis for Optimal k")

# Report optimal k based on silhouette
# Higher average silhouette width indicates better clustering
cat("Silhouette scores suggest k =", which.max(sil_scores) + 1, "is optimal\n")

Flight vs Perching: K-means Clustering

# Fit final K-means model with k=2 (flight vs. perching)
# Based on both elbow and silhouette analysis
set.seed(123)
kmeans_flight_perch <- kmeans(movement_scaled, centers = 2, nstart = 25)

# Add cluster assignments to data
eagle_clean <- eagle_clean %>%
  mutate(flight_status = kmeans_flight_perch$cluster)

# Examine cluster characteristics to identify which is flight vs. perching
#calculate mean values for each cluster
cluster_summary <- eagle_clean %>%
  group_by(flight_status) %>%
  summarise(
    n = n(),
    mean_KPH = mean(KPH),
    mean_Sn = mean(Sn),
    mean_AGL = mean(AGL),
    mean_abs_angle = mean(abs_angle),
    mean_VR = mean(VerticalRate),
    mean_absVR = mean(absVR)
  )

print(cluster_summary)

Flight vs Perching: Recode Clusters

# Recode clusters based on speed characteristics
# Higher speed cluster = flight, lower speed = perching
if (cluster_summary$mean_KPH[1] > cluster_summary$mean_KPH[2]) {
  eagle_clean <- eagle_clean %>%
    mutate(flight_status = ifelse(flight_status == 1, "flight", "perching"))
} else {
  eagle_clean <- eagle_clean %>%
    mutate(flight_status = ifelse(flight_status == 2, "flight", "perching"))
}

# verify classification
classification_table <- table(eagle_clean$flight_status)
print(classification_table)

# Report percentages
cat("\nPercentage in flight:", 
    round(100 * classification_table["flight"] / sum(classification_table), 1), "%\n")
cat("Percentage perching:", 
    round(100 * classification_table["perching"] / sum(classification_table), 1), "%\n")

Flight vs Perching: Visualization

# Create comparison boxplots for flight vs. perching
# Visualizes differences in movement variables between states

# Speed (KPH) comparison
p1 <- ggplot(eagle_clean, aes(x = flight_status, y = KPH, fill = flight_status)) +
  geom_boxplot() +
  scale_fill_manual(values = c("flight" = "skyblue", "perching" = "coral")) +
  labs(title = "KPH by Status", x = "", y = "KPH") +
  theme_minimal() +
  theme(legend.position = "none")

# Horizontal speed (Sn) comparison
p2 <- ggplot(eagle_clean, aes(x = flight_status, y = Sn, fill = flight_status)) +
  geom_boxplot() +
  scale_fill_manual(values = c("flight" = "skyblue", "perching" = "coral")) +
  labs(title = "Sn by Status", x = "", y = "Sn (m/s)") +
  theme_minimal() +
  theme(legend.position = "none")

# Altitude (AGL) comparison
p3 <- ggplot(eagle_clean, aes(x = flight_status, y = AGL, fill = flight_status)) +
  geom_boxplot() +
  scale_fill_manual(values = c("flight" = "skyblue", "perching" = "coral")) +
  labs(title = "AGL by Status", x = "", y = "AGL (m)") +
  theme_minimal() +
  theme(legend.position = "none")

# Turn angle comparison
p4 <- ggplot(eagle_clean, aes(x = flight_status, y = abs_angle, fill = flight_status)) +
  geom_boxplot() +
  scale_fill_manual(values = c("flight" = "skyblue", "perching" = "coral")) +
  labs(title = "Absolute Angle by Status", x = "", y = "Absolute Angle") +
  theme_minimal() +
  theme(legend.position = "none")

# Display all boxplots in 2x2 grid
grid.arrange(p1, p2, p3, p4, ncol = 2)

Flight Behavior Analysis: Filter to In-Flight Points

# Filter to in-flight points only
#  focus on distinguishing different types of flight behavior
eagle_flight <- eagle_clean %>%
  filter(flight_status == "flight")

cat("In-flight observations:", nrow(eagle_flight), "\n")
cat("In-flight segments:", n_distinct(eagle_flight$segment_id), "\n")

Flight Behavior Analysis: Prepare and Standardize

# Prepare data for behavior clustering
flight_vars <- eagle_flight %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR)

# Standardize the variables
# Important: standardize within flight data, not using original scale
flight_scaled <- scale(flight_vars)

# Verify standardization
colMeans(flight_scaled)
apply(flight_scaled, 2, sd)

Flight Behavior Analysis: Determine Optimal Clusters

# Elbow method for flight behaviors
set.seed(456)  # Different seed for flight analysis
wss_flight <- numeric(10)

for (k in 1:10) {
  kmeans_model <- kmeans(flight_scaled, centers = k, nstart = 10)
  wss_flight[k] <- kmeans_model$tot.withinss
}

# Plot elbow curve
plot(1:10, wss_flight, type = "b", pch = 19,
     xlab = "Number of Clusters",
     ylab = "Total Within-Cluster Sum of Squares",
     main = "Elbow Method for Flight Behaviors")

# Silhouette analysis on sample
set.seed(456)
sample_size_flight <- min(2000, nrow(flight_scaled))
sample_idx_flight <- sample(1:nrow(flight_scaled), sample_size_flight)
flight_scaled_sample <- flight_scaled[sample_idx_flight, ]

sil_scores_flight <- numeric(9)
for (k in 2:10) {
  kmeans_model <- kmeans(flight_scaled_sample, centers = k, nstart = 10)
  sil <- silhouette(kmeans_model$cluster, dist(flight_scaled_sample))
  sil_scores_flight[k-1] <- mean(sil[, 3])
}

# Plot silhouette scores
plot(2:10, sil_scores_flight, type = "b", pch = 19,
     xlab = "Number of Clusters",
     ylab = "Average Silhouette Width",
     main = "Silhouette Analysis for Flight Behaviors")

# Report optimal k
optimal_k <- which.max(sil_scores_flight) + 1
cat("Silhouette scores suggest k =", optimal_k, "flight behaviors\n")

Flight Behavior Analysis: Fit Clustering Model

# Fit K-means model with optimal k
# Using nstart=25 for more robust results
set.seed(456)
kmeans_flight_behaviors <- kmeans(flight_scaled, centers = optimal_k, nstart = 25)

# Add cluster assignments to flight data
eagle_flight <- eagle_flight %>%
  mutate(behavior_type = kmeans_flight_behaviors$cluster)

# Display cluster sizes
cat("Cluster sizes:\n")
print(table(eagle_flight$behavior_type))

# Check for very small or very large clusters
# Imbalanced clusters may need further investigation
prop.table(table(eagle_flight$behavior_type))

Flight Behavior Characteristics: Summary Statistics

# Calculate comprehensive summary statistics by behavior type
behavior_summary <- eagle_flight %>%
  group_by(behavior_type) %>%
  summarise(
    n = n(),
    mean_KPH = mean(KPH),
    sd_KPH = sd(KPH),
    mean_Sn = mean(Sn),
    sd_Sn = sd(Sn),
    mean_AGL = mean(AGL),
    sd_AGL = sd(AGL),
    mean_abs_angle = mean(abs_angle),
    sd_abs_angle = sd(abs_angle),
    mean_VR = mean(VerticalRate),
    sd_VR = sd(VerticalRate),
    mean_absVR = mean(absVR),
    sd_absVR = sd(absVR)
  ) %>%
  arrange(behavior_type)
print(behavior_summary)

# Export
write.csv(behavior_summary, "behavior_summary.csv", row.names = FALSE)

Flight Behavior Characteristics: Visualizations

# Create detailed boxplots comparing all behaviors across key variables

# Speed comparison
p1 <- ggplot(eagle_flight, aes(x = as.factor(behavior_type), 
                                y = KPH, 
                                fill = as.factor(behavior_type))) +
  geom_boxplot() +
  labs(title = "Speed (KPH) by Behavior Type", 
       x = "Behavior Type", 
       y = "KPH") +
  theme_minimal() +
  theme(legend.position = "none")

# Altitude comparison
p2 <- ggplot(eagle_flight, aes(x = as.factor(behavior_type), 
                                y = AGL, 
                                fill = as.factor(behavior_type))) +
  geom_boxplot() +
  labs(title = "Altitude (AGL) by Behavior Type", 
       x = "Behavior Type", 
       y = "AGL (m)") +
  theme_minimal() +
  theme(legend.position = "none")

# Turn angle comparison
p3 <- ggplot(eagle_flight, aes(x = as.factor(behavior_type), 
                                y = abs_angle, 
                                fill = as.factor(behavior_type))) +
  geom_boxplot() +
  labs(title = "Turn Angle by Behavior Type", 
       x = "Behavior Type", 
       y = "Absolute Angle") +
  theme_minimal() +
  theme(legend.position = "none")

# Vertical rate comparison
p4 <- ggplot(eagle_flight, aes(x = as.factor(behavior_type), 
                                y = VerticalRate, 
                                fill = as.factor(behavior_type))) +
  geom_boxplot() +
  labs(title = "Vertical Rate by Behavior Type", 
       x = "Behavior Type", 
       y = "Vertical Rate") +
  theme_minimal() +
  theme(legend.position = "none")

# Display all plots
grid.arrange(p1, p2, p3, p4, ncol = 2)

Flight Behavior: PCA Visualization

# Perform PCA on standardized flight data
# PCA reduces dimensionality while preserving variance
pca_flight <- prcomp(flight_scaled, scale. = FALSE)

# Examine variance explained by each component
summary(pca_flight)

# Create data frame for plotting
pca_df <- data.frame(
  PC1 = pca_flight$x[, 1],
  PC2 = pca_flight$x[, 2],
  behavior = as.factor(eagle_flight$behavior_type)
)

# Plot first two principal components
# Colors represent different behavior types
ggplot(pca_df, aes(x = PC1, y = PC2, color = behavior)) +
  geom_point(alpha = 0.4, size = 1) +
  stat_ellipse(level = 0.68, size = 1) +
  labs(title = "Flight Behavior Clusters (PCA Visualization)",
       x = paste0("PC1 (", round(100 * summary(pca_flight)$importance[2, 1], 1), "%)"),
       y = paste0("PC2 (", round(100 * summary(pca_flight)$importance[2, 2], 1), "%)"),
       color = "Behavior") +
  theme_minimal()

# Examine variable loadings
# Shows which variables contribute most to each PC
print("PC1 loadings:")
print(sort(pca_flight$rotation[, 1], decreasing = TRUE))
print("\nPC2 loadings:")
print(sort(pca_flight$rotation[, 2], decreasing = TRUE))

Example Flight Trajectories

# Select example segments from each behavior type
# Choose segments with sufficient length for visualization
set.seed(789)
example_segments <- eagle_flight %>%
  group_by(behavior_type) %>%
  filter(n() >= 20) %>%  # Only behaviors with enough points
  slice_sample(n = 1) %>%  # One segment per behavior
  pull(segment_id)

# Extract data for example segments
example_data <- eagle_flight %>%
  filter(segment_id %in% example_segments[1:min(4, length(example_segments))])

# Create trajectory plots
if(nrow(example_data) > 0) {
  # Spatial trajectories
  p1 <- ggplot(example_data, aes(x = Longitude, y = Latitude, 
                                  color = as.factor(behavior_type))) +
    geom_path(linewidth = 1) +
    geom_point(alpha = 0.5, size = 2) +
    facet_wrap(~behavior_type, scales = "free") +
    labs(title = "Example Flight Trajectories by Behavior Type",
         x = "Longitude", y = "Latitude",
         color = "Behavior") +
    theme_minimal() +
    theme(legend.position = "bottom")
  
  print(p1)
  
  # Altitude profiles
  p2 <- ggplot(example_data, aes(x = seq_along(Animal_ID), y = AGL, 
                                  color = as.factor(behavior_type))) +
    geom_line(linewidth = 1) +
    facet_wrap(~behavior_type, scales = "free") +
    labs(title = "Altitude Profiles by Behavior Type",
         x = "Time Point", y = "AGL (m)",
         color = "Behavior") +
    theme_minimal() +
    theme(legend.position = "bottom")
  
  print(p2)
}

Validation: Cluster Quality Metrics

# Calculate within-cluster sum of squares for final model
cat("Total within-cluster SS:", kmeans_flight_behaviors$tot.withinss, "\n")
cat("Between-cluster SS:", kmeans_flight_behaviors$betweenss, "\n")
cat("Total SS:", kmeans_flight_behaviors$totss, "\n")

# Calculate proportion of variance explained
var_explained <- kmeans_flight_behaviors$betweenss / kmeans_flight_behaviors$totss
cat("\nProportion of variance explained:", round(var_explained, 3), "\n")

# Silhouette analysis for final model
sil_final <- silhouette(kmeans_flight_behaviors$cluster, 
                        dist(flight_scaled[sample(1:nrow(flight_scaled), 
                                                 min(2000, nrow(flight_scaled))), ]))
cat("Average silhouette width:", round(mean(sil_final[, 3]), 3), "\n")

# Plot silhouette
plot(sil_final, main = "Silhouette Plot for Final Clustering")