Analyzing Bald Eagle Flight Patterns

Jackson Dvorak

Introduction

We are looking for recognizable flight patterns in Bald Eagle movement data from Iowa. We have two main questions to answer:

  • How do we classify a data point as “moving” or “perched”?

  • Among the “moving” data points, are there distinct, recognizable patterns?

Methods

Before writing any code, it is best to write the process down and consider the alternatives.

To consider: 2,000,000 data points is a lot of data to load into R. Plotting all of it on a PCA plot would very likely crash R or take an extremely long time to render, so we need to reduce the data before making any plots.

Process:

  • Use stratified sampling (up to 1,000 points per eagle) to reduce the amount of data being used

  • Create a scree plot and a within-cluster sum of squares (WSS) plot to decide how many principal components and how many clusters to use

Methods (contd.)

  • Generate a PCA plot with arrows for each movement variable

  • After determining the clusters' characteristics, provide an example of a specific segment and how it moves over time.
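
The sampling step can be sketched on made-up data. This is a minimal base-R illustration, not the code used for the results (that is in the Appendix); the toy data frame, the `cap_per_animal` helper, and the cap of 3 rows are all assumptions for illustration.

```r
# Toy stand-in for the eagle data: three animals with very different row counts.
toy <- data.frame(
  Animal_ID = rep(c("A", "B", "C"), times = c(10, 2, 5)),
  KPH       = seq_len(17)
)

# Hypothetical helper: keep at most `cap` randomly chosen rows per animal.
cap_per_animal <- function(data, cap) {
  groups <- split(data, data$Animal_ID)
  sampled <- lapply(groups, function(g) {
    g[sample.int(nrow(g), min(nrow(g), cap)), , drop = FALSE]
  })
  do.call(rbind, sampled)
}

set.seed(123)
toy_sample <- cap_per_animal(toy, cap = 3)
table(toy_sample$Animal_ID)  # A and C are capped at 3 rows; B keeps both of its rows
```

Capping per animal keeps heavily tracked eagles from dominating the PCA and the clustering, while animals with little data still contribute everything they have.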

WSS Plot

Scree Plot

PC1 explains about 43% of the variation in the data, and PC2 explains an additional 20%.
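
For reference, the percentages on a scree plot come from the squared standard deviations of the principal components. A minimal sketch, using R's built-in `USArrests` data as a stand-in (an assumption; the eagle data is not reproduced here):

```r
# Stand-in data: built-in USArrests, scaled the same way as the movement variables.
pca_demo <- prcomp(scale(USArrests))

# Proportion of variance explained by each component.
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
round(100 * var_explained, 1)

# Cumulative share explained by the first two components.
sum(var_explained[1:2])
```

Applied to the eagle PCA, the same calculation on `pca_res$sdev` would give the per-component shares shown on the scree plot.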

PCA

Only one cluster can be perched…

Only one group of points can be classified as “perched,” or not moving. We expect that cluster to have roughly these values:

Very low horizontal and vertical speeds: KPH, Sn, and the vertical rates (VerticalRate, absVR)

The angle is trickier. Assuming that a bird whose wings are straight up has a turn value of 0, we are also looking for low values of abs_angle.

Summary of Clusters

  Cluster      KPH        Sn       AGL VerticalRate abs_angle     absVR
1       1 43.76233 11.214367 124.30971  -0.04904643 0.4762345 0.6094449
2       2 64.80844 17.798377 446.57799  -2.31743988 0.2467720 2.3996064
3       3 14.56755  3.544493  69.29634   0.10695022 1.7115532 0.5868511
4       4 35.45490  8.489519 465.42061   2.07890664 1.6291968 2.0922263
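
The “perched” expectations (low KPH, Sn, and vertical rate) can be checked against the table above by ranking the clusters on those variables. A minimal sketch; the values are copied (rounded) from the table, and the rank-sum rule is an assumption for illustration, not part of the original analysis.

```r
# Cluster means copied (rounded) from the summary table.
cluster_means <- data.frame(
  Cluster = 1:4,
  KPH     = c(43.76, 64.81, 14.57, 35.45),
  Sn      = c(11.21, 17.80, 3.54, 8.49),
  absVR   = c(0.609, 2.400, 0.587, 2.092)
)

# Rank each speed variable (1 = lowest) and add the ranks up;
# the cluster with the smallest total is the slowest overall.
speed_vars <- c("KPH", "Sn", "absVR")
rank_total <- rowSums(sapply(cluster_means[speed_vars], rank))
slowest <- cluster_means$Cluster[which.min(rank_total)]
slowest  # cluster 3 is lowest on all three variables
```

The turn angle is left out of the ranking on purpose, since its expected value for a perched bird is the uncertain part.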

Interpretation

Here is how I would interpret each cluster.

  • Cluster one is low in altitude and in change in altitude, but is moving relatively fast, with a moderate turn angle (above cluster two, well below clusters three and four). The eagle is likely hunting, looking for animals/objects.

  • Cluster two is the fastest cluster, high in the air, with the lowest turn angle and a negative average vertical rate. The eagle is likely cruising, covering a longer distance.

  • Cluster three has the lowest speed and altitude, but a very high turn angle. The eagle is likely stationary, but could also be closely circling a certain object.

  • Cluster four has a middle-of-the-pack speed, but the highest altitude and the highest vertical rate (climbing), while also turning. This eagle is likely both soaring and searching.

Behavior Example

[1] "The segment with the most data points in cluster 2 is 142054."

Appendix

library(dplyr)
library(cluster)
library(ggplot2)
library(ggrepel)
library(FactoMineR)
library(factoextra)
library(tidyr)

# load() returns the names of the objects it restores; use that to grab the data frame
# (assumes the .RData file holds a single data frame)
loaded_names <- load("eagle_data.RData")
df <- get(loaded_names[1])


movement_vars <- c("KPH", "Sn", "AGL", "VerticalRate", "abs_angle", "absVR")

set.seed(123) 

# Sample up to 1,000 rows per animal; animals with fewer than 1,000 rows keep all of their data.
sample_size <- 1000

df_sample <- df %>%
  group_by(Animal_ID) %>%
  group_modify(~ slice_sample(.x, n = min(nrow(.x), sample_size))) %>%
  ungroup()

df_pca <- df_sample %>%
  select(all_of(movement_vars)) %>%
  drop_na()

df_scaled <- scale(df_pca)

# PCA
pca_res <- prcomp(df_scaled, rank. = 2)

pca_scores <- as.data.frame(pca_res$x)


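# Total within-cluster sum of squares for each candidate k.
# The same k_values vector is used for fitting and for the plot axis so the two always match.
k_values <- 1:10
wss <- sapply(k_values, function(k) {
  kmeans(df_scaled, centers = k, nstart = 10, iter.max = 20)$tot.withinss
})

wss_df <- data.frame(
  k = k_values,
  WSS = wss
)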

ggplot(wss_df, aes(k, WSS)) +
  geom_line() + geom_point() +
  labs(title = "Elbow Method: Total WSS vs k",
       x = "Number of clusters (k)",
       y = "Total WSS") +
  theme_minimal()


k <- 4

# Scores
set.seed(123)
km_final <- kmeans(df_scaled, centers = k, nstart = 25)
pca_scores$Cluster <- factor(km_final$cluster)

# Loadings
loadings <- as.data.frame(pca_res$rotation)
loadings$Variable <- rownames(loadings)

# Scree plot
fviz_eig(pca_res) +
  ggtitle("Scree Plot for PCA of Eagle Movement Variables")

# PCA plot
arrow_mult <- 8

ggplot(pca_scores, aes(PC1, PC2, color = Cluster)) +
  geom_point(alpha = 0.3, size = 0.6) +
  geom_segment(
    data = loadings,
    aes(
      x = 0, y = 0,
      xend = PC1 * arrow_mult,
      yend = PC2 * arrow_mult
    ),
    arrow = arrow(length = unit(0.2, "cm")),
    inherit.aes = FALSE
  ) +
  geom_text_repel(
    data = loadings,
    aes(
      x = PC1 * arrow_mult * 1.1,
      y = PC2 * arrow_mult * 1.1,
      label = Variable
    ),
    inherit.aes = FALSE,
    size = 4
  ) +
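  theme_minimal() +
  xlim(-5, 10) + ylim(-2, 10)

# Attach cluster labels. Clustering used only complete cases, so drop the same
# rows here first; otherwise the label vector may not match the row count.
df_sample <- df_sample %>% drop_na(all_of(movement_vars))
df_sample$Cluster <- factor(km_final$cluster)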

cluster_summary <- df_sample %>%
  group_by(Cluster) %>%
  summarise(across(all_of(movement_vars), mean), .groups = "drop")

as.data.frame(cluster_summary)

# Find the segment ID with the most points in cluster 2 to help visualize the data
best_segment <- df_sample %>%
  filter(Cluster == 2) %>%       # keep only cluster 2
  count(segment_id) %>%          # count rows per segment
  arrange(desc(n)) %>%           # sort largest to smallest
  slice(1) %>%                   # take the top one
  pull(segment_id)               # store the ID

print(paste0("The segment with the most data points in cluster 2 is ", best_segment, "."))

# Use the segment found above (142054 in this run) rather than a hard-coded ID
segment_path <- df_sample %>%
  filter(segment_id == best_segment) %>%
  arrange(LocalTime)

# Starting point
x0 <- segment_path$X[1]
y0 <- segment_path$Y[1]

# Compute Euclidean distance from origin
segment_path <- segment_path %>%
  mutate(Distance_from_origin = sqrt((X - x0)^2 + (Y - y0)^2))

# Plot distance from the starting point over time (Euclidean distance)
ggplot(segment_path, aes(x = LocalTime, y = Distance_from_origin)) +
  geom_line() +
  geom_point(size = 1) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = paste0("Segment ", best_segment, ": Distance from Origin Over Time"),
    x = "Time",
    y = "Distance from Origin (meters)"
  ) +
  theme_minimal()