Eagle in-flight behavioral tracking analysis

Background / Introduction

Research Question:

Can movement variables distinguish in-flight points from perching points in eagles using unsupervised machine learning methods?

Methods used to distinguish behaviors:
  • K-Means

  • DBSCAN

  • PCA loadings (variable weights)

Data

  • Eagle behaviors: 6 variables that describe eagle movement

    • KPH, Sn, AGL, |Angle|, Vertical Rate, |Vertical Rate|
    • Variables not used: Latitude/Longitude, Animal ID, Segment ID, Segment length, and TimeDiff.
    • A random sample of 10,000 points out of the 200,000,000 available was used.
    • Extreme values were removed by keeping only the middle 95% of each variable, and the remaining data were then scaled.
             X       Y  KPH   Sn   AGL VerticalRate abs_angle absVR
    1 509745.4 4609596 0.07 0.52  9.55        -0.99      0.45  0.99
    2 509746.0 4609597 0.07 0.17  9.55         0.00      0.21  0.00
    3 509746.6 4609597 0.20 0.14  9.55         0.00      0.91  0.00
    4 509746.6 4609599 0.16 0.22  9.55         0.00      3.14  0.00
    5 509746.6 4609598 0.11 0.08 11.55         0.39      2.69  0.39
    6 509746.0 4609599 0.07 0.22 11.55         0.00      0.53  0.00
[1] 10000
               X       Y   KPH    Sn    AGL VerticalRate abs_angle absVR
1650366 696863.0 4594786 48.33 14.01  85.62         0.84      0.97  0.84
173134  619790.4 4670037 57.37  5.97 402.71         1.46      0.60  1.46
1549395 648924.8 4780856 51.67 13.78 122.83        -0.59      0.05  0.59
906545  502276.4 4582268 36.29  4.56 268.87        -1.37      2.04  1.37
1696883 656150.6 4607804 18.52  8.04 163.90         0.47      1.84  0.47
1902470 619373.7 4746099 61.30 17.81  46.79         1.61      0.11  1.61
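
Scaling matters here because the variables sit on very different numeric ranges. As a minimal sketch, the snippet below takes three rows from the printed sample (KPH, Sn, and AGL only) and checks how much of the squared Euclidean distance between the first two rows comes from AGL, before and after `scale()`. The object names (`pts`, `agl_share_raw`, `agl_share_scaled`) are illustrative and are not part of the project code.

```r
# Three fixes copied from the printed sample (KPH, Sn, AGL columns only)
pts <- rbind(a = c(KPH = 48.33, Sn = 14.01, AGL = 85.62),
             b = c(KPH = 57.37, Sn = 5.97,  AGL = 402.71),
             c = c(KPH = 51.67, Sn = 13.78, AGL = 122.83))

# Share of the squared a-b distance that is due to AGL, on the raw values
raw_d <- as.matrix(dist(pts))
agl_share_raw <- (pts["a", "AGL"] - pts["b", "AGL"])^2 / raw_d["a", "b"]^2

# Same share after centering and scaling each column
z <- scale(pts)
scaled_d <- as.matrix(dist(z))
agl_share_scaled <- (z["a", "AGL"] - z["b", "AGL"])^2 / scaled_d["a", "b"]^2

round(c(raw = agl_share_raw, scaled = agl_share_scaled), 3)
```

On the raw values AGL accounts for nearly all of the distance, so an unscaled K-means fit would mostly be clustering on altitude alone.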

Elbow Method: Within-Cluster Sum of Squares (WSS)

  • The elbow method was used to find the optimal number of clusters (k).

    • The elbow in this graph suggests k = 4 for K-means.
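
To show what the elbow plot is measuring, here is a minimal sketch that computes the total within-cluster sum of squares by hand for k = 1 to 6. It uses simulated data with three well-separated groups as a stand-in for the eagle sample, so the elbow lands at 3 here and not at 4. The names `toy`, `wss`, and `drops` are illustrative.

```r
set.seed(1)

# Simulated stand-in: three compact groups in two already-scaled variables
toy <- rbind(matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(200, mean = 3, sd = 0.3), ncol = 2),
             cbind(rnorm(100, 0, 0.3), rnorm(100, 3, 0.3)))

# Total within-cluster sum of squares for each candidate k
wss <- sapply(1:6, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# How much WSS falls when one more cluster is added
drops <- -diff(wss)

plot(1:6, wss, type = "b",
     xlab = "Number of clusters k", ylab = "Total within-cluster SS")
```

The elbow is the last k that still gives a large drop in WSS. After it, adding clusters only splits groups that were already compact.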

Silhouette Method:

  • The silhouette method was not a good guide to how many clusters should be used.
  • This graph puts the optimal number of clusters at 2.
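
For reference, the average silhouette width can be computed directly with `cluster::silhouette()`. The sketch below does this on simulated three-group data, not on the eagle sample, and the names `toy`, `avg_sil`, and `best_k` are illustrative. The score rewards compact, well-separated clusters, which may be why it can favor a coarse split such as k = 2 on real data where the groups overlap.

```r
library(cluster)  # silhouette()

set.seed(1)

# Simulated stand-in: three compact groups in two already-scaled variables
toy <- rbind(matrix(rnorm(200, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(200, mean = 3, sd = 0.3), ncol = 2),
             cbind(rnorm(100, 0, 0.3), rnorm(100, 3, 0.3)))
d <- dist(toy)

# Average silhouette width for k = 2..6 (it is undefined for k = 1)
avg_sil <- sapply(2:6, function(k) {
  cl <- kmeans(toy, centers = k, nstart = 10)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})
names(avg_sil) <- 2:6

# k with the highest average silhouette width
best_k <- as.integer(names(which.max(avg_sil)))
best_k
```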

K = ?: Which is the right one?

DBSCAN eps = 1.7

DBSCAN eps = 2
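
The eps values on these slides are read off a k-nearest-neighbor distance plot (see `kNNdistplot()` in the appendix). As a minimal base-R sketch of what that curve contains, the code below computes each point's distance to its k-th nearest neighbor on simulated data and sorts the result. It assumes k = minPts - 1, which is my understanding of how the `dbscan` package relates the two. The names `toy`, `knn_dist`, and `sorted_dist` are illustrative.

```r
set.seed(1)

# Simulated stand-in: two compact groups in two already-scaled variables
toy <- rbind(matrix(rnorm(400, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(400, mean = 3, sd = 0.3), ncol = 2))

min_pts <- 10
k <- min_pts - 1  # assumption: k-th neighbor corresponds to minPts - 1

# Full pairwise distance matrix (fine for a few hundred points)
d <- as.matrix(dist(toy))

# Distance from each point to its k-th nearest neighbor;
# position 1 after sorting is the point itself (distance 0)
knn_dist <- apply(d, 1, function(row) sort(row)[k + 1])
sorted_dist <- sort(knn_dist)

plot(sorted_dist, type = "l",
     xlab = "Points sorted by distance", ylab = "k-NN distance")
```

A candidate eps is the height at which this curve bends sharply upward. Points above that height are the ones DBSCAN would tend to label as noise.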

PCA plot with loadings

What's the story?

  • Cluster #1 (Purple): low KPH and low Sn, with lower AGL and VR
    • These birds are probably perching. They sit closer to the ground than the pink and blue clusters, and they are slower than the pink cluster directly opposite.
  • Cluster #2 (Pink): high Sn and KPH
    • Very high speeds. The birds appear to be climbing or gliding, since this cluster sits next to the blue cluster with its high AGL and absolute vertical rates.
  • Cluster #3 (Green): low AGL and low VR
    • These birds are traveling short distances or at low altitude, possibly just flying or gliding at low heights.
  • Cluster #4 (Blue): high AGL and high absVR, plus high VR and Angle
    • These birds are probably ascending, possibly migrating, since they keep climbing higher.
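
One way to back up this kind of reading is to summarize each cluster in the original units, not only on the PCA plot. The sketch below does that with `aggregate()` on simulated data containing a slow, low group and a fast, high group. The values and the names `toy`, `fit`, and `profile` are made up for illustration and are not taken from the eagle data.

```r
set.seed(1)

# Simulated stand-in: a slow/low group and a fast/high group
toy <- data.frame(KPH = c(rnorm(100, 2, 1),  rnorm(100, 50, 8)),
                  AGL = c(rnorm(100, 10, 3), rnorm(100, 200, 40)))

# Cluster on the scaled variables, as in the main analysis
fit <- kmeans(scale(toy), centers = 2, nstart = 10)

# Mean of each variable per cluster, reported in the original units
profile <- aggregate(toy, by = list(cluster = fit$cluster), FUN = mean)
profile
```

A table like this makes statements such as "low KPH and low AGL" checkable against actual cluster means.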

Loading contributions
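
As a rough guide to what a contribution plot shows: each column of a `prcomp` rotation matrix is a unit vector, so the squared loadings in a column sum to 1, and 100 times the squared loading can be read as that variable's percentage contribution to the component. I believe this matches what `fviz_contrib()` reports for a `prcomp` fit, but treat that as an assumption to verify. The sketch uses simulated data in which two variables are strongly correlated; `toy`, `pca`, and `contrib` are illustrative names.

```r
set.seed(1)

# Simulated stand-in: Sn tracks KPH closely, AGL is independent
toy <- data.frame(KPH = rnorm(200), AGL = rnorm(200))
toy$Sn <- toy$KPH + rnorm(200, sd = 0.3)

pca <- prcomp(toy, scale. = TRUE)

# Percentage contribution of each variable to each principal component
contrib <- 100 * pca$rotation^2
round(contrib, 1)
```

Here KPH and Sn together should account for almost all of PC1, because they carry the same signal.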

Conclusion

  • This analysis can identify when the eagles are hunting or soaring. The data also appear to show when they are perched and when they are migrating.

  • A high AGL (Above Ground Level) combined with a high Vertical Rate suggests the birds may be about to start migrating. They also fly very fast when migrating, which shows up as high KPH and high Sn.

  • The blue cluster lies in the opposite direction from the AGL, KPH, and Sn loadings, indicating that these birds are more likely perched and getting ready to hunt.

  • The high absolute angle loading sits right next to the cluster that I believe represents perched birds. That may be where the birds are changing the angle of their heads very quickly.

Appendix

# Appendix: Reproducible R Code

The following R code reproduces the full analysis.  

All chunks are provided with `eval: false` and `echo: true` to show the code without executing it.  

All paths assume the user is working inside the **Project2/** directory.

# Load libraries used in the project
library(cluster)
library(dbscan)
library(factoextra)
library(tidyverse)
library(patchwork)
library(ggrepel)
library(dplyr)    

# -----------------------
# Load and inspect the data

options(width = 10000)  # wide console so printed rows do not wrap
load('eagle_data.Rdata')  # run from inside Project2/
(eagle_data
  %>% select(X:absVR)
  %>% head(6))  # preview the first six rows

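# -------------------------
# Remove outliers, keep only numeric data

# Keep only rows where every variable falls in the middle 95% of its range
# (drops values in the bottom and top 2.5% of any column)
remove_extreme <- function(data, lower = 0.025, upper = 0.975) {
  # 2 x p matrix: row 1 = lower bound, row 2 = upper bound for each column
  bounds <- apply(data, 2, quantile, probs = c(lower, upper))

  # TRUE for rows whose values are all within the bounds
  keep <- apply(data, 1, function(row) {
    all(row >= bounds[1, ] & row <= bounds[2, ])
  })

  data[keep, ]
}
# Columns 8:15 appear to run from X through absVR (see the printed sample),
# so X and Y are still present in numeric_data
numeric_data <- eagle_data[8:15]
numeric_data_clean <- remove_extreme(numeric_data)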


# -------------------------
# Sample and scale the data

set.seed(123)
sample_eagle_data <- numeric_data_clean[sample(1:nrow(numeric_data_clean), 10000), ]

# Scale each variable to mean 0 and standard deviation 1
scaled_eagle_data <- scale(sample_eagle_data)

# Calculate the distance matrix (stored under its own name so the sample is not overwritten)
eagle_dist <- dist(scaled_eagle_data)



#calculate how many clusters 
p1<- fviz_nbclust(scaled_eagle_data,
            kmeans,
             method = "wss") +
  labs(subtitle = "WSS using K-means")
p1

# ----------------------------------------


# silhouette method 

sil1 <- fviz_nbclust(scaled_eagle_data,
                   kmeans,
                   method = 'silhouette') +
 labs(subtitle = "Silhouette using K-means")
sil1

# -------------------------------------------------

# Fit K-means with 1 to 6 centers (the WSS plot suggests 4 to 5)
k1 <- kmeans(scaled_eagle_data, centers = 1, nstart = 10)
k2 <- kmeans(scaled_eagle_data, centers = 2, nstart = 10)
k3 <- kmeans(scaled_eagle_data, centers = 3, nstart = 10)
k4 <- kmeans(scaled_eagle_data, centers = 4, nstart = 10)
k5 <- kmeans(scaled_eagle_data, centers = 5, nstart = 10)
k6 <- kmeans(scaled_eagle_data, centers = 6, nstart = 10)
 
# ---------------------
# Create plots

# Helper: PCA scatter plot of one K-means fit, colored by cluster
plot_kmeans <- function(fit, k, palette) {
  fviz_cluster(list(data = scaled_eagle_data, cluster = fit$cluster),
               geom = "point",
               pointsize = .5,
               alpha = 0.6,
               ellipse = FALSE,
               label = "none",
               palette = palette,
               ggtheme = theme_classic(base_size = 16)) +
    ggtitle(paste("K =", k)) +
    labs(x = "Dim 1", y = "Dim 2") +
    theme(plot.title = element_text(size = 12, hjust = 0.5))
}

# One panel per value of k, using the same colors as the slides
v1 <- plot_kmeans(k1, 1, c("purple"))
v2 <- plot_kmeans(k2, 2, c("purple", "pink"))
v3 <- plot_kmeans(k3, 3, c("purple", "pink", "lightblue"))
v4 <- plot_kmeans(k4, 4, c("purple", "pink", "lightblue", "lightgreen"))
v5 <- plot_kmeans(k5, 5, c("purple", "pink", "lightblue", "lightgreen", "yellow"))
v6 <- plot_kmeans(k6, 6, c("purple", "pink", "blue", "lightgreen", "yellow", "black"))

# Using Patchwork, added all plots together. 

library(patchwork)
(v1 + scale_x_reverse() + v2+ scale_x_reverse() + v3 +scale_x_reverse()) / (v4+scale_x_reverse() + v5 +scale_x_reverse() + v6+scale_x_reverse())

# DBSCAN pca , make sample data as a data frame. 

pca_eagles <- prcomp(scaled_eagle_data)
sample_eagle_df <- as.data.frame(sample_eagle_data)

#| fig.width: 5
#| fig.height: 4
#| out.width: "30%"
#| fig.align: "center"
distplot1 <- kNNdistplot(scaled_eagle_data, minPts = 10)
abline(h = 1.7, col = 'red')

eagle_dbscan1 <- dbscan(scaled_eagle_data, eps = 1.7)


#| fig.width: 5
#| fig.height: 4
#| out.width: "40%"
#| fig.align: "center"
db1 <- fviz_pca(pca_eagles, 
         habillage = factor(eagle_dbscan1$cluster),
         label='var',
         repel = TRUE) + 
      ggtitle('DBSCAN clustering results, epsilon = 1.7') +
      labs(color='Area',shape='Area') 

library(patchwork)
distplot1 / db1


#| fig.width: 5
#| fig.height: 4
#| out.width: "30%"
#| fig.align: "center"
distplot2 <- kNNdistplot(scaled_eagle_data, minPts = 10)
abline(h = 2, col = 'red')


eagle_dbscan2<- dbscan(scaled_eagle_data, eps = 2)

#| fig.width: 5
#| fig.height: 4
#| out.width: "40%"
#| fig.align: "center"
db2 <- fviz_pca(pca_eagles, 
         habillage = factor(eagle_dbscan2$cluster),
         label='var',
         repel = TRUE) + 
      ggtitle('DBSCAN clustering results, epsilon = 2') +
      labs(color='Area',shape='Area') 

library(patchwork)
distplot2 / db2

#PCA plots with loadings

# Run k-means with 4 clusters
k4 <- kmeans(scaled_eagle_data, centers = 4, nstart = 10)
km.clusters_4 <- k4$cluster

# PCA
pca_result <- prcomp(scaled_eagle_data)
pca_data <- as.data.frame(pca_result$x[, 1:2])
pca_data$cluster <- factor(km.clusters_4)

#loadings
loadings <- as.data.frame(pca_result$rotation[, 1:2])
loadings$variable <- rownames(loadings)  
arrow_scale <- 4 


# Variance explained
var_explained <- summary(pca_result)$importance[2, 1:2] * 100

# Plot
loadingpca <- ggplot(pca_data, aes(x = PC1, y = PC2, color = cluster)) +
  geom_point(alpha = 0.6, size = .5) +
  scale_color_manual(values =  c("purple", "pink" ,"lightblue", "lightgreen")) +
    # Reference lines
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
  geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
  # Variable arrows
  scale_x_reverse() + 
  geom_segment(data = loadings,
               aes(x = 0, y = 0, 
                   xend = PC1 * arrow_scale, 
                   yend = PC2 * arrow_scale),
               arrow = arrow(length = unit(0.3, "cm")),
               color = "black",
               linewidth = 0.5,  # `size` for lines is deprecated since ggplot2 3.4.0
               inherit.aes = FALSE) +
  geom_text_repel(data = loadings,  # Automatically avoids overlaps
                  aes(x = PC1 * arrow_scale, 
                      y = PC2 * arrow_scale, 
                      label = variable),
                  color = "black",
                  size = 4,
                  fontface = "bold",
                  inherit.aes = FALSE) +
  labs(x = paste0("PC1 (", round(var_explained[1], 1), "%)"),
       y = paste0("PC2 (", round(var_explained[2], 1), "%)"),
       title = "K=4",
       color = "Cluster") +
  theme_classic(base_size = 16) +
  theme(plot.title = element_text(hjust = 0, face = "bold")) +
    coord_fixed() +
  geom_hline(yintercept = 0, linetype = "dashed", alpha = 0.3) +
  geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.3)

loadingpca 

# -------------------------------
# Loading contributions
loadings1 <- fviz_contrib(pca_result, choice = "var", axes = 1, top = 10) +
  ggtitle("Contribution to PC1")

loadings2 <- fviz_contrib(pca_result, choice = "var", axes = 2, top = 10) +
  ggtitle("Contribution to PC2")

library(patchwork)
loadings1 / loadings2