GPS Tracking Data of Eagles in Iowa

How Do Eagles in Iowa Move Around?

We wanted to see how eagles move around in the “Hawkeye” State!

Background

  • Biologging is the practice of attaching sensors to animals to monitor their heart rate, movement, and other behaviors.

  • In this large-scale observational study, over 2 million data points were collected from 57 eagles over a 4-year period.

Our variables

  • Overall, 9 numerical variables were drawn from this study to chart out GPS points.

  • We focus on six of these variables to analyze the tracking points and answer 2 research questions:

  1. Can we use the movement variables KPH (instantaneous speed), Sn (average speed), AGL (altitude above ground level), abs_angle (absolute angle), VerticalRate (vertical rate), and absVR (absolute vertical rate) to differentiate between in-flight and perching points?
  2. What distinct in-flight behaviors do eagles exhibit, how can we characterize them, and how can representative flight segments illustrate these behaviors?

Data

The first six rows of the data we were given:

# A tibble: 6 × 15
  Animal_ID TimeDiff segment_id segment_length LocalTime           Latitude Longitude       X        Y   KPH    Sn   AGL VerticalRate abs_angle absVR
      <int>    <int>      <dbl>          <int> <dttm>                 <dbl>     <dbl>   <dbl>    <dbl> <dbl> <dbl> <dbl>        <dbl>     <dbl> <dbl>
1       105        5          1             10 2019-06-25 16:44:10     41.6     -92.9 509745. 4609596.  0.07  0.52  9.55        -0.99      0.45  0.99
2       105        6          1             10 2019-06-25 16:44:16     41.6     -92.9 509746. 4609597.  0.07  0.17  9.55         0         0.21  0   
3       105        5          1             10 2019-06-25 16:44:21     41.6     -92.9 509747. 4609597.  0.2   0.14  9.55         0         0.91  0   
4       105        6          1             10 2019-06-25 16:44:27     41.6     -92.9 509747. 4609599.  0.16  0.22  9.55         0         3.14  0   
5       105        5          1             10 2019-06-25 16:44:32     41.6     -92.9 509747. 4609598.  0.11  0.08 11.6          0.39      2.69  0.39
6       105        6          1             10 2019-06-25 16:44:38     41.6     -92.9 509746. 4609599.  0.07  0.22 11.6          0         0.53  0   

Methods

  • Because the data set is so large, I grouped by Animal ID and drew a smaller random sample (about 1,000 rows) to keep computation manageable.
  • We want to find ways to distinguish points from each other in order to identify movement patterns.
  • Clustering methods help pinpoint which points should be grouped together.

Is our data normal?

  • We check whether the focal variables are roughly normal, since strong skew can distort the clustering.
  • We see that abs_angle, absVR, AGL, and Sn are all strongly positively skewed, which is not good.

Fixing our data first

  • To fix that, I applied a square root transformation to AGL, absVR, abs_angle, and Sn to reduce their skew.
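
To show why the square root helps, here is a minimal sketch on simulated data. The variable x and the skewness() helper are assumptions made for illustration; they are not part of the eagle data or of the analysis code in the appendices.

```r
# Illustrative only: simulated right-skewed values, not the eagle data
set.seed(1)
x <- rexp(1000, rate = 0.5)   # hypothetical stand-in for a skewed variable such as AGL

# Moment-based sample skewness (close to 0 for a symmetric distribution)
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

skew_raw  <- skewness(x)
# sqrt() needs non-negative input, so a signed variable such as
# VerticalRate cannot be transformed this way
skew_sqrt <- skewness(sqrt(x))

c(raw = skew_raw, sqrt = skew_sqrt)
```

The transformed values should show a clearly smaller skewness than the raw ones, which is the same effect the histograms in Appendix 2 are meant to show.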

Clustering our data

  • K-means is an effective way to cluster a large number of observations, so I used it to group the points.

1st.) Silhouette

2nd.) WSS
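
As a rough sketch of how these two diagnostics pick k, the example below computes the average silhouette width and the total within-cluster sum of squares (WSS) for several values of k. The toy data are simulated and are an assumption for illustration, not the PCA scores used in this project.

```r
library(cluster)  # for silhouette()

set.seed(42)
# Hypothetical data: two well-separated 2-D blobs, not the eagle data
toy <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),
  matrix(rnorm(200, mean = 6), ncol = 2)
)
d  <- dist(toy)
ks <- 2:6

# Average silhouette width for each k (higher is better)
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(toy, centers = k, nstart = 10)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

# Total within-cluster sum of squares for each k (look for the "elbow")
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

best_k <- ks[which.max(avg_sil)]
data.frame(k = ks, avg_sil = round(avg_sil, 3), wss = round(wss, 1))
```

The silhouette criterion picks the k with the largest average width, while WSS is read by eye for the point where adding clusters stops paying off.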

Clustered PCA biplot k=2

  • The PCA biplot is used to distinguish GPS tracking points where the eagle was moving from those where it was perched, so we used 2 clusters.

PCA biplot of the data, k=2

Takeaways from the PCA Biplot where k=2

  • In-flight data points are likely overrepresented in the first cluster, where the vectors AGL, KPH, Sn, and absVR project far across its observations.

  • Perched data points are likely overrepresented in the second cluster, where the vectors VerticalRate and abs_angle project further across its observations.

  • The strongest variables were KPH, Sn, and AGL.

  • The weakest variable was VerticalRate.

  • The first two principal components explain 69% of the variance in the covariates.
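
For reference, the share of variance explained can be read off a prcomp object as sketched below. The toy matrix is simulated and is an assumption for illustration; it is not the eagle data, so its numbers will not match the figure reported above.

```r
set.seed(7)
# Hypothetical data: 6 columns built from 2 shared latent factors plus noise
n  <- 500
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- cbind(
  f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5), f1 + rnorm(n, sd = 0.5),
  f2 + rnorm(n, sd = 0.5), f2 + rnorm(n, sd = 0.5), rnorm(n)
)

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Each component's share of the total variance
var_share <- pca$sdev^2 / sum(pca$sdev^2)

# Share captured by the two components drawn in a biplot
first_two <- sum(var_share[1:2])

round(var_share, 3)
first_two
```

The same numbers appear in the "Proportion of Variance" and "Cumulative Proportion" rows of summary(pca).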

Clustered PCA biplot k=4

This PCA biplot is used to distinguish among the movements identified as in-flight points (ascending, flapping, and descending) and to determine patterns.

Takeaways from the PCA Biplot where k=4

  • The cluster with the highest KPH and Sn relative to the other clusters was the gliding/descending cluster. It also had a much lower association with VerticalRate and abs_angle.

  • The cluster strong in AGL, absVR, VerticalRate, and abs_angle was the ascending movement. It had a lower association with KPH and Sn.

  • The cluster modestly stronger in KPH and Sn was the flapping movement, which had a lower association with AGL, absVR, VerticalRate, and abs_angle.

  • The first two principal components account for 68.7% of the variance in the covariates.

Looking at In-flight Behaviors

  • We want to view in-flight behavior by randomly selecting 1 eagle and visualizing its KPH over time (11-second time intervals).

  • This lets us deduce which of the four behaviors the eagle exhibits at certain times in its journey.

  • Behaviors that I deduced: descending had the highest KPH, followed by ascending and then flapping. Perching had next to none.
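
One way to back up a ranking like this is to summarize KPH per cluster. The sketch below uses made-up values and arbitrary cluster labels, both assumptions for illustration only; they are not results from the eagle data.

```r
# Made-up values for illustration only; labels and speeds are not from the eagle data
toy <- data.frame(
  cluster = factor(c(1, 1, 2, 2, 3, 3, 4, 4)),
  KPH     = c(0.1, 0.2, 35, 41, 60, 72, 22, 26)
)

# Mean KPH per cluster, ordered from fastest to slowest
mean_kph <- tapply(toy$KPH, toy$cluster, mean)
mean_kph <- mean_kph[order(mean_kph, decreasing = TRUE)]

mean_kph
```

Applied to the sampled data, the same call on the KPH and cluster columns would rank the four clusters by typical speed.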

Appendix 1

#loading libraries
library(cluster)
library(dbscan)
library(factoextra)
library(tidyverse)
library(patchwork)
library(ggrepel)
options(width=10000)

#loading our dataset 
load('eagle_data.Rdata')
#previewing the first six rows as a tibble
eagle_data %>%
  as_tibble() %>%
  head()

Appendix 2

set.seed(123)

#Getting a rough grouped sample ~1000 rows 
sample_1000 <- eagle_data %>%
  group_by(Animal_ID) %>%
  sample_frac(size = 1000 / nrow(eagle_data)) %>%
  ungroup()

#Selecting just the important variables of the dataset
vars <- sample_1000 %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR)

#pivoting those variables to long format so they can be faceted
vars_long <- vars %>%
  pivot_longer(cols = c(Sn, KPH, AGL, VerticalRate, absVR, abs_angle),
               names_to = "variable",
               values_to = "value")

#faceted histogram of each variable
ggplot(vars_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "white") +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Distribution of Selected Variables",
    x = "Value",
    y = "Count"
  )

#square-rooting the positively skewed variables
vars_fixed <- vars %>%
  mutate(
    AGL = sqrt(AGL),
    absVR = sqrt(absVR),
    abs_angle = sqrt(abs_angle),
    Sn = sqrt(Sn)
  )


#pivoting the variables to fit them into a faceted data
vars_long <- vars_fixed %>%
  pivot_longer(cols = c(Sn, KPH, AGL, VerticalRate, absVR, abs_angle),
               names_to = "variable",
               values_to = "value")


#graphing the faceted histogram for normalized values
ggplot(vars_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "white") +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Histograms of Focal Variables after Sq. Root Transformation",
    x = "Value",
    y = "Count"
  )

Appendix 3

#Running PCA on the centered and scaled focal variables
pca_obj <- prcomp(vars_fixed, center = TRUE, scale. = TRUE)
#turn the PC scores into a dataframe so that we can use
#avg. silhouette and WSS to determine the k value
#assumption: scores holds the scores for all PCs (pca_obj$x)
scores <- as.data.frame(pca_obj$x)


set.seed(42)

#Running k-means on the PC scores with k=2;
#kmeans_df$cluster holds the cluster label for each point
kmeans_df <- kmeans(scores, centers = 2, nstart = 10)


#Getting a graphed biplot of the kmeans (after normalization)

kmeans_biplot <- fviz_pca(pca_obj,
      label = 'var',
     habillage = factor(kmeans_df$cluster),
     repel = TRUE) + 
  ggtitle('PCA biplot with k=2 clustering') +
  coord_cartesian(xlim = c(-8,5), ylim = c(-10,5)) +
  guides(color='none', shape ='none')

kmeans_biplot


#Running k-means on the PC scores again, now with k=4
kmeans_df <- kmeans(scores, centers = 4, nstart = 10)
#color coding by cluster for PCA biplot k=4
custom_colors <- c(
  "1" = "turquoise",   # turquoise for cluster 1
  "2" = "purple",      # purple for cluster 2
  "3" = "lightgreen",  # light green for cluster 3
  "4" = "red"          # red for cluster 4
)

#shape coding by cluster for PCA biplot k=4
custom_shapes <- c(
  "1" = 17,   # filled triangle
  "2" = 3,    # plus (+)
  "3" = 15,   # filled square
  "4" = 16    # filled circle
)

#PCA biplot when k=4 for distinguishing in-flight movements
kmeans_biplot <- fviz_pca(pca_obj,
      label = 'var',
     habillage = factor(kmeans_df$cluster),
     repel = TRUE) + 
  ggtitle('PCA biplot of the 4-cluster solution') +
  coord_cartesian(xlim = c(-8,5), ylim = c(-10,5)) + 
  scale_shape_manual(values = custom_shapes) +
  scale_color_manual(values = custom_colors) +
  guides(color='none', shape ='none')

kmeans_biplot

#Assigning each observation to a cluster, this line 
#attaches the k-means labels to the PCA score object 
scores$cluster <- factor(kmeans_df$cluster)

#Attaching the cluster assignments to each sampled observation
sample_1000 <- bind_cols(sample_1000, cluster = scores$cluster)


#Color coded by clusters, this creates a line graph
#with KPH as a response over Time (in seconds)
kph_time <- sample_1000 %>%
  filter(Animal_ID == 109) %>%
  mutate(seconds = cumsum(TimeDiff)) %>%
  ggplot(aes(x = seconds, y = KPH, color = cluster)) +
  geom_point() +
  geom_line(alpha = 0.3) +
  scale_color_manual(values = custom_colors) +
  theme_minimal() +
  labs(x = "Seconds", y = "KPH",
       title = "Instantaneous Speed Over Time for 1 Randomly Picked Bird")

kph_time