GPS Tracking Data of Eagles in Iowa

How Do Eagles in Iowa Move?

We wanted to see how eagles move around in the “Hawkeye” State!

Background

Biologging is the use of sensors attached to animals to monitor their movement, heart rate, and other behaviors.
In this large-scale observational study, over 2 million data points were collected from 57 eagles over a 4-year period.

Our variables

Overall, 9 numerical variables were drawn from this study to chart the GPS points.
We work with only six of these variables to determine what the tracking points say about 2 research questions:

Can we use the movement variables KPH, Sn, AGL, |Angle|, Vertical rate, and |VR| to differentiate in-flight points from perching points?
What major takeaways can we draw from analyzing and visualizing the in-flight points?

Data

This is a massive dataset (about 2 million rows), so we may need to subsample it to make it manageable.

# A tibble: 6 × 15
  Animal_ID TimeDiff segment_id segment_length LocalTime           Latitude Longitude       X        Y   KPH    Sn   AGL VerticalRate abs_angle absVR
      <int>    <int>      <dbl>          <int> <dttm>                 <dbl>     <dbl>   <dbl>    <dbl> <dbl> <dbl> <dbl>        <dbl>     <dbl> <dbl>
1       105        5          1             10 2019-06-25 16:44:10     41.6     -92.9 509745. 4609596.  0.07  0.52  9.55        -0.99      0.45  0.99
2       105        6          1             10 2019-06-25 16:44:16     41.6     -92.9 509746. 4609597.  0.07  0.17  9.55         0         0.21  0   
3       105        5          1             10 2019-06-25 16:44:21     41.6     -92.9 509747. 4609597.  0.2   0.14  9.55         0         0.91  0   
4       105        6          1             10 2019-06-25 16:44:27     41.6     -92.9 509747. 4609599.  0.16  0.22  9.55         0         3.14  0   
5       105        5          1             10 2019-06-25 16:44:32     41.6     -92.9 509747. 4609598.  0.11  0.08 11.6          0.39      2.69  0.39
6       105        6          1             10 2019-06-25 16:44:38     41.6     -92.9 509746. 4609599.  0.07  0.22 11.6          0         0.53  0

Methods

We want to shrink the dataset, since 2 million rows is a lot to work with. I grouped by Animal_ID and randomly sampled a smaller subset from each eagle to keep computation manageable.

We want to separate the points into groups before we look at a PCA biplot.
Clustering methods will help pinpoint which points should be grouped together.

Is our data normal?

What does the distribution of each variable look like?

We see that abs_angle, absVR, AGL, and Sn are all strongly positively skewed, which is not ideal.

Fixing our data first

Before we do any work with the data, we want to make sure that it is not heavily skewed.
I used a square root transformation to reduce the skew of the four skewed focal variables (AGL, absVR, abs_angle, and Sn).
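As a rough illustration of why a square root helps, the sketch below compares sample skewness before and after the transformation. It uses simulated right-skewed values rather than the eagle data, and `skewness()` is a small helper written here for the example, not a function from any of the loaded packages.

```r
# Toy illustration only: simulated right-skewed values, not the eagle data
set.seed(1)
x <- rexp(5000, rate = 0.5)   # stand-in for a positively skewed variable

# Hypothetical helper: sample skewness as the third standardized moment
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^1.5
}

skew_raw  <- skewness(x)        # skewness of the raw values
skew_sqrt <- skewness(sqrt(x))  # skewness after the square root

# The transformed values should be noticeably less skewed
c(raw = skew_raw, sqrt = skew_sqrt)
```

A square root only works for non-negative values, so any variable that can go below zero would need a different treatment.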

Clustering our data

K-means is an effective way to cluster large movement datasets like this one.

Plots for choosing k: 1) Silhouette 2) WSS (within-cluster sum of squares)
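The two criteria above can be sketched in a few lines. This is a minimal example on simulated data with two obvious groups, not the eagle sample; it uses base R `kmeans()` and `silhouette()` from the cluster package, and the object names (`toy`, `ks`, `best_k`) are made up for the illustration.

```r
library(cluster)  # provides silhouette()

set.seed(123)
# Simulated stand-in: two well-separated groups in 3 dimensions
toy <- rbind(
  matrix(rnorm(300, mean = 0), ncol = 3),
  matrix(rnorm(300, mean = 5), ncol = 3)
)
toy_scaled <- scale(toy)   # k-means is sensitive to variable scale
d <- dist(toy_scaled)      # pairwise distances, needed for silhouettes

ks <- 2:6
wss <- numeric(length(ks))
avg_sil <- numeric(length(ks))
for (i in seq_along(ks)) {
  km <- kmeans(toy_scaled, centers = ks[i], nstart = 25)
  wss[i] <- km$tot.withinss                                   # elbow criterion
  avg_sil[i] <- mean(silhouette(km$cluster, d)[, "sil_width"]) # silhouette criterion
}

# k with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
data.frame(k = ks, wss = wss, avg_silhouette = avg_sil)
```

WSS always shrinks as k grows, so it is read by looking for an elbow, while the silhouette width is read by looking for its maximum.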

Clustered PCA biplot k=2

This PCA biplot is used to distinguish GPS tracking points that were moving from those that were perched, so we used 2 clusters.

Takeaways from the PCA Biplot where k=2

In-flight data points are likely overrepresented in the first cluster, where the AGL, KPH, Sn, and absVR vectors project mainly across the first group of observations.
Perched data points are likely overrepresented in the second cluster, where the VerticalRate and abs_angle vectors project further across the second group.
The strongest variables were KPH, Sn, and AGL.
The weakest variable was VerticalRate.
The first two principal components explain 67% of the variance in the covariates.
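A minimal sketch of the steps behind a clustered biplot like this one: scale the variables, run k-means with k = 2, run a PCA, and check how much variance the first two components explain. The data are simulated with made-up "perched-like" and "flying-like" values for three of the variables; none of the numbers come from the eagle dataset, and the object names are assumptions for the example.

```r
set.seed(123)
n <- 100
# Simulated stand-in, not real eagle data: slow/low points vs fast/high points
toy <- rbind(
  data.frame(KPH = rnorm(n, 1, 0.5), Sn = rnorm(n, 0.5, 0.2), AGL = rnorm(n, 10, 3)),
  data.frame(KPH = rnorm(n, 40, 8),  Sn = rnorm(n, 10, 2),    AGL = rnorm(n, 150, 40))
)

toy_scaled <- scale(toy)                              # put variables on one scale
km2 <- kmeans(toy_scaled, centers = 2, nstart = 25)   # two clusters
pca <- prcomp(toy_scaled)                             # PCA on the scaled data

# Share of total variance carried by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
sum(var_explained[1:2])   # share explained by the first two components

# Scores on PC1 and PC2 plus cluster labels: the points a biplot would draw
biplot_data <- data.frame(pca$x[, 1:2], cluster = factor(km2$cluster))
head(biplot_data)
```

The variable arrows of a biplot come from `pca$rotation`; variables with longer arrows along PC1 and PC2 contribute more to the separation seen in the plot.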

Clustered PCA biplot k=4

This PCA biplot is used to distinguish among the in-flight movement types (ascending, flapping, and descending) and to look for patterns.

Takeaways from the PCA Biplot where k=4

The cluster highest in KPH and Sn, which were also strong in other clusters, was the gliding/descending cluster. It had a much lower association with VerticalRate and abs_angle.
The cluster strong in AGL, absVR, VerticalRate, and abs_angle was the ascending movement. It had a lower association with KPH and Sn.
The cluster stronger in KPH and Sn was the flapping movement, which had a lower association with AGL, absVR, VerticalRate, and abs_angle.
The biplot accounts for 68.7% of the variance in the covariates.
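One way to attach behavior labels to clusters, alongside reading the biplot, is to compare the cluster centers variable by variable. The sketch below does this on simulated data with four made-up groups and only two variables; the group means are invented for the illustration and are not estimates from the eagle data.

```r
set.seed(42)
n <- 100
# Simulated stand-in with four invented groups; not the eagle data
toy <- data.frame(
  KPH          = c(rnorm(n, 0, 1),   rnorm(n, 60, 5),   rnorm(n, 30, 5),  rnorm(n, 35, 5)),
  VerticalRate = c(rnorm(n, 0, 0.2), rnorm(n, -3, 0.5), rnorm(n, 3, 0.5), rnorm(n, 0, 0.5))
)

km4 <- kmeans(scale(toy), centers = 4, nstart = 25)

# Cluster centers on the scaled variables: positive means above average,
# negative means below average for that variable
centers <- round(km4$centers, 2)
centers

# Index of the cluster with the highest average speed
fastest <- which.max(km4$centers[, "KPH"])
km4$centers[fastest, ]
```

In this toy setup the fastest cluster also has a negative vertical rate, which is the kind of pattern one would read as descending flight.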

Looking at In-flight Behaviors

We want to view ‘in-flight’ behavior by randomly selecting 1 eagle and visualizing its KPH over time (11-second time intervals).
This will allow us to deduce which of the four behaviors are exhibited at certain times during its journey.
Behaviors that I deduced: descending had the highest KPH, followed by ascending and flapping. Perching had next to none.
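The selection and plotting step could look roughly like the sketch below. It uses a simulated data frame with the same column names as the dataset (Animal_ID, LocalTime, KPH); the three animal IDs, the timestamps, and the speeds are all invented, and base R plotting is used here only to keep the example self-contained.

```r
set.seed(7)
# Simulated stand-in: 3 hypothetical eagles, one GPS fix every 11 seconds
start <- as.POSIXct("2019-06-25 16:00:00", tz = "UTC")
toy <- data.frame(
  Animal_ID = rep(c(101, 102, 103), each = 50),
  LocalTime = start + 11 * rep(0:49, times = 3),
  KPH       = abs(rnorm(150, mean = 20, sd = 15))
)

# Randomly pick one eagle and keep only its points, in time order
one_id <- sample(unique(toy$Animal_ID), 1)
one_eagle <- toy[toy$Animal_ID == one_id, ]
one_eagle <- one_eagle[order(one_eagle$LocalTime), ]

# Speed over time for the selected eagle
plot(one_eagle$LocalTime, one_eagle$KPH, type = "l",
     xlab = "Local time", ylab = "KPH",
     main = paste("Eagle", one_id))
```

Coloring the line or points by cluster label would make it easier to line up speed changes with the behaviors deduced above.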

Appendix

# Load libraries
library(cluster)
library(dbscan)
library(factoextra)
library(tidyverse)
library(patchwork)
library(ggrepel)
options(width=10000)

# Load the dataset and preview the first rows
load('eagle_data.Rdata')
eagle_data %>%
  as_tibble() %>%
  head()

Appendix 2

set.seed(123)

# Draw a random sample within each eagle (about 1000 rows in total)
sample_1000 <- eagle_data %>%
  group_by(Animal_ID) %>%
  sample_frac(size = 1000 / nrow(eagle_data)) %>%
  ungroup()

# Keep only the six focal movement variables
vars <- sample_1000 %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR)

# Pivot the variables to long format for faceting
vars_long <- vars %>%
  pivot_longer(cols = c(Sn, KPH, AGL, VerticalRate, absVR, abs_angle),
               names_to = "variable",
               values_to = "value")

# Faceted histogram of each variable
ggplot(vars_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "white") +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Distribution of Selected Variables",
    x = "Value",
    y = "Count"
  )

# Square-root transform the four skewed variables
vars_fixed <- vars %>%
  mutate(
    AGL = sqrt(AGL),
    absVR = sqrt(absVR),
    abs_angle = sqrt(abs_angle),
    Sn = sqrt(Sn)
  )


# Pivot the transformed variables to long format for faceting
vars_long <- vars_fixed %>%
  pivot_longer(cols = c(Sn, KPH, AGL, VerticalRate, absVR, abs_angle),
               names_to = "variable",
               values_to = "value")


# Faceted histogram of the transformed values
ggplot(vars_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "white") +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(
    title = "Histograms of Focal Variables after Sq. Root Transformation",
    x = "Value",
    y = "Count"
  )