GPS Tracking of Eagles in Iowa

How Do Eagles in Iowa Move?

We wanted to see how eagles move around the “Hawkeye” State!

Background

Biologging is the practice of attaching sensors to animals to monitor their heart rate, movement, and other behaviors.
In this large-scale observational study, over 2 million data points were collected from 57 eagles over a 4-year period.

Our variables

Overall, 9 numerical variables were drawn from this study to describe each GPS point.
We focus on six of them to answer 2 research questions:

Can we use the movement variables KPH (instantaneous speed), Sn (average speed), AGL (altitude above ground level), |Angle| (absolute angle), VerticalRate (vertical rate), and |VR| (absolute vertical rate) to differentiate between in-flight and perching points?
What distinct in-flight behaviors do eagles exhibit, how can we characterize them, and how can representative flight segments illustrate these behaviors?

Data

The data we were given

# A tibble: 6 × 15
  Animal_ID TimeDiff segment_id segment_length LocalTime           Latitude Longitude       X        Y   KPH    Sn   AGL VerticalRate abs_angle absVR
      <int>    <int>      <dbl>          <int> <dttm>                 <dbl>     <dbl>   <dbl>    <dbl> <dbl> <dbl> <dbl>        <dbl>     <dbl> <dbl>
1       105        5          1             10 2019-06-25 16:44:10     41.6     -92.9 509745. 4609596.  0.07  0.52  9.55        -0.99      0.45  0.99
2       105        6          1             10 2019-06-25 16:44:16     41.6     -92.9 509746. 4609597.  0.07  0.17  9.55         0         0.21  0   
3       105        5          1             10 2019-06-25 16:44:21     41.6     -92.9 509747. 4609597.  0.2   0.14  9.55         0         0.91  0   
4       105        6          1             10 2019-06-25 16:44:27     41.6     -92.9 509747. 4609599.  0.16  0.22  9.55         0         3.14  0   
5       105        5          1             10 2019-06-25 16:44:32     41.6     -92.9 509747. 4609598.  0.11  0.08 11.6          0.39      2.69  0.39
6       105        6          1             10 2019-06-25 16:44:38     41.6     -92.9 509746. 4609599.  0.07  0.22 11.6          0         0.53  0

Methods

Because the data set is so large, I grouped by Animal_ID and drew a random sample of roughly 1,000 rows to keep the computation manageable.

We want to find ways to distinguish points from each other in order to identify movement patterns.
Clustering methods help pinpoint which points should be grouped together.

Is our data normal?

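We want to check whether each variable is roughly normal, because heavily skewed variables can distort distance-based methods such as PCA and k-means.

We see that abs_angle, absVR, AGL, and Sn are all strongly positively skewed, which is not good.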

Fixing our data first

To fix that, I applied a square root transformation to the four skewed variables (AGL, absVR, abs_angle, and Sn).
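As a rough illustration of why the square root helps, here is a minimal sketch on simulated data (not the eagle data). The `skewness()` helper is my own hypothetical function, written here so the example needs no extra packages. Note that the square root only applies to non-negative values.

```r
# Hypothetical helper: sample skewness (third standardized moment)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(1)
# Simulated right-skewed, non-negative values (a stand-in, not the eagle data)
x <- rexp(5000, rate = 0.2)

skew_raw  <- skewness(x)        # skewness before the transformation
skew_sqrt <- skewness(sqrt(x))  # skewness after the square root

round(c(raw = skew_raw, sqrt = skew_sqrt), 2)
```

A skewness closer to 0 after the transformation means the distribution is more symmetric, which is what the faceted histograms in Appendix 2 check visually.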

Clustering our data

K-means is an effective way to cluster a large number of observations, so I used it to group the points.

1st.) Average silhouette width

2nd.) WSS (within-cluster sum of squares)
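The two criteria can be sketched by hand as follows. This is only an illustration on simulated two-group data, not the code that produced the plots: `toy`, `ks`, and `best_k` are names I made up, and the eagle analysis would use the PC scores instead.

```r
library(cluster)  # for silhouette()

set.seed(42)
# Simulated 2-D data with two well-separated groups (stand-in for the PC scores)
toy <- rbind(
  matrix(rnorm(200, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(200, mean = 4, sd = 0.5), ncol = 2)
)

ks <- 2:6
d  <- dist(toy)  # pairwise distances, reused for every k

# Total within-cluster sum of squares for each k (look for the "elbow")
wss <- sapply(ks, function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# Average silhouette width for each k (higher means better-separated clusters)
avg_sil <- sapply(ks, function(k) {
  km <- kmeans(toy, centers = k, nstart = 10)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})

# The k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
data.frame(k = ks, wss = round(wss, 1), avg_sil = round(avg_sil, 3))
```

WSS always tends to shrink as k grows, so it is read by eye for an elbow, while the silhouette gives a single k to point at. That is why it helps to look at both.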

Clustered PCA biplot k=2

The PCA biplot is used to separate GPS tracking points where the eagle was moving from those where it was perched, so we used 2 clusters.

Takeaways from the PCA Biplot where k=2

In-flight data points are likely over-represented in the first cluster, where the AGL, KPH, Sn, and absVR vectors project mainly across the first cluster's observations.
Perched data points are likely over-represented in the second cluster, where the VerticalRate and abs_angle vectors project further across the second group.
The strongest variables were KPH, Sn, and AGL.
The weakest variable was VerticalRate.
The first two principal components explain 69% of the variance in the covariates.
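For reference, the share of variance explained can be read straight off a `prcomp` object. The sketch below uses simulated columns (the `toy` data frame and its column names are made up), so its percentages say nothing about the eagle data.

```r
set.seed(7)
# Simulated data: three correlated columns plus three noise columns
# (a stand-in for the six movement variables)
n <- 500
latent <- rnorm(n)
toy <- data.frame(
  a = latent + rnorm(n, sd = 0.5),
  b = latent + rnorm(n, sd = 0.5),
  c = latent + rnorm(n, sd = 0.5),
  d = rnorm(n),
  e = rnorm(n),
  f = rnorm(n)
)

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Each PC's variance is its squared standard deviation
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Share of total variance captured by the first two PCs
first_two <- sum(var_explained[1:2])

round(100 * var_explained, 1)
round(100 * first_two, 1)
```

`summary(pca)` reports the same proportions in its "Cumulative Proportion" row.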

Clustered PCA biplot k=4

This PCA biplot uses 4 clusters to separate perching from the in-flight movements (ascending, flapping, and gliding/descending) and to look for patterns.

Takeaways from the PCA Biplot where k=4

The cluster highest in KPH and Sn (variables that were also strong in other clusters) was the gliding/descending cluster. It had a much lower association with VerticalRate and abs_angle.
The cluster strong in AGL, absVR, VerticalRate, and abs_angle was the ascending movement. It had a lower association with KPH and Sn.
The cluster modestly stronger in KPH and Sn was the flapping movement. It had a lower association with AGL, absVR, VerticalRate, and abs_angle.
The first two principal components account for 68.7% of the variance in the covariates.

Looking at In-flight Behaviors

We want to view ‘in-flight’ behavior by randomly selecting 1 eagle and plotting its KPH over time (11-second time intervals).
This lets us deduce which of the four behaviors the eagle exhibits at different times in its journey.
Behaviors that I deduced: descending had the highest KPH, followed by ascending and then flapping. Perching had next to none.
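One way to double-check an ordering like this is to average KPH within each cluster. The sketch below uses made-up numbers and generic cluster labels purely to show the mechanics. With the real data, `toy` would be replaced by the sample that carries the k-means labels.

```r
# Hypothetical toy data: a few speed readings with made-up cluster labels
toy <- data.frame(
  KPH     = c(0.1, 0.2, 35, 40, 22, 25, 15, 18),
  cluster = factor(c(1, 1, 2, 2, 3, 3, 4, 4))
)

# Mean speed per cluster, sorted from fastest to slowest
kph_by_cluster <- aggregate(KPH ~ cluster, data = toy, FUN = mean)
kph_by_cluster <- kph_by_cluster[order(-kph_by_cluster$KPH), ]
kph_by_cluster
```

Reading the sorted table next to the KPH-over-time plot makes it easier to attach a behavior name to each cluster number.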

Appendix 1

#loading libraries
library(cluster)
library(dbscan)
library(factoextra)
library(tidyverse)
library(patchwork)
library(ggrepel)
options(width=10000)

#loading our dataset and previewing the first rows
load('eagle_data.Rdata')
eagle_data %>%
  as_tibble() %>%
  head()

Appendix 2

set.seed(123)

#Getting a rough grouped sample ~1000 rows 
sample_1000 <- eagle_data %>%
  group_by(Animal_ID) %>%
  sample_frac(size = 1000 / nrow(eagle_data)) %>%
  ungroup()

#Selecting just the important variables of the dataset
vars <- sample_1000 %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR)

#pivoting those variables into long format for faceting
vars_long <- vars %>%
  pivot_longer(cols = c(Sn, KPH, AGL, VerticalRate, absVR, abs_angle),
               names_to = "variable",
               values_to = "value")

#faceted histogram of each variable
ggplot(vars_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "white") +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Distribution of Selected Variables",
    x = "Value",
    y = "Count"
  )

#square rooting the four skewed variables
vars_fixed <- vars %>%
  mutate(
    AGL = sqrt(AGL),
    absVR = sqrt(absVR),
    abs_angle = sqrt(abs_angle),
    Sn = sqrt(Sn)
  )


#pivoting the transformed variables into long format for faceting
vars_long <- vars_fixed %>%
  pivot_longer(cols = c(Sn, KPH, AGL, VerticalRate, absVR, abs_angle),
               names_to = "variable",
               values_to = "value")


#graphing the faceted histogram for the transformed values
ggplot(vars_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "white") +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Histograms of Focal Variables after Sq. Root Transformation",
    x = "Value",
    y = "Count"
  )

Appendix 3

#Running PCA on the transformed variables (centered and scaled)
pca_obj <- prcomp(vars_fixed, center = TRUE, scale. = TRUE)

#Turning the PC scores into a data frame so that we can use
#avg. silhouette and WSS to determine the k value
scores <- as.data.frame(pca_obj$x)

set.seed(42)

#Running k-means on the PC scores with k=2;
#kmeans_df$cluster holds the cluster label of each point
kmeans_df <- kmeans(scores, centers = 2, nstart = 10)


#Getting a graphed biplot of the kmeans (after normalization)
kmeans_biplot <- fviz_pca(pca_obj,
                          label = 'var',
                          habillage = factor(kmeans_df$cluster),
                          repel = TRUE) +
  ggtitle('PCA biplot with k=2 clustering') +
  coord_cartesian(xlim = c(-8,5), ylim = c(-10,5)) +
  guides(color='none', shape ='none')

#printing the k=2 biplot before the object is reused below
kmeans_biplot


#Running k-means on the PC scores again, this time with k=4
kmeans_df <- kmeans(scores, centers = 4, nstart = 10)

#color coding by cluster for PCA biplot k=4
custom_colors <- c(
  "1" = "turquoise",   # turquoise for cluster 1
  "2" = "purple",      # purple for cluster 2
  "3" = "lightgreen",  # light green for cluster 3
  "4" = "red"          # red for cluster 4
)

#shape coding by cluster for PCA biplot k=4 
custom_shapes <- c(
  "1" = 17,   # filled triangle
  "2" = 3,    # plus (+)
  "3" = 15,   # filled square
  "4" = 16    # filled circle
)

#PCA biplot when k=4 for distinguishing in-flight movements
kmeans_biplot <- fviz_pca(pca_obj,
      label = 'var',
     habillage = factor(kmeans_df$cluster),
     repel = TRUE) + 
  ggtitle('PCA biplot of the 4-cluster solution') +
  coord_cartesian(xlim = c(-8,5), ylim = c(-10,5)) + 
  scale_shape_manual(values = custom_shapes) +
  scale_color_manual(values = custom_colors) +
  guides(color='none', shape ='none')

kmeans_biplot

#Assigning each observation to a cluster: this line
#attaches the k-means labels to the PCA score data frame
scores$cluster <- factor(kmeans_df$cluster)

#Adding the cluster label of each observation to the sampled data
sample_1000 <- bind_cols(sample_1000, cluster = scores$cluster)


#Color coded by clusters, this creates a line graph
#with KPH as a response over Time (in seconds).
#sample_frac() returns rows in random order, so we sort by LocalTime first
kph_time <- sample_1000 %>%
  filter(Animal_ID == 109) %>%
  arrange(LocalTime) %>%
  mutate(seconds = cumsum(TimeDiff)) %>%
  ggplot(aes(x = seconds, y = KPH, color = cluster)) +
  geom_point() +
  geom_line(alpha = 0.3) +
  scale_color_manual(values = custom_colors) +
  theme_minimal() +
  labs(x = "Seconds", y = "KPH",
       title = "Instantaneous Speed Over Time for 1 Randomly Picked Bird")

kph_time