We wanted to see how eagles move around in the “Hawkeye State”!
Background
Biologging means attaching sensors to animals to monitor their heart rate, movement, and other behaviors.
In this large-scale observational study, over 2 million data points were collected from 57 eagles over a 4-year period.
Our variables
Overall, 9 numerical variables were drawn from this study to describe the GPS points.
We focus on six of them to analyze the tracking points and answer two research questions:
Can we use the movement variables KPH (instantaneous speed), Sn (average speed), AGL (height above ground level), abs_angle (absolute angle), VerticalRate (vertical rate), and absVR (absolute vertical rate) to differentiate between in-flight and perching points?
What distinct in-flight behaviors do eagles exhibit, how can we characterize them, and how can representative flight segments illustrate these behaviors?
Because the data set is so large, I grouped it by Animal ID and randomly sampled a smaller subset (about 1,000 rows) to keep the computation manageable.
We want to find ways to tell points apart so that we can identify movement patterns.
Clustering methods help pinpoint which points should be grouped together.
Is our data normal?
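We want to check whether each variable is roughly normally distributed, because strongly skewed variables can distort distance-based clustering.
We see that abs_angle, absVR, AGL, and Sn are all strongly positively skewed, which is not good.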
Fixing our data first
To fix that, I applied a square root transformation to reduce the skew in those focal variables.
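As a rough illustration of why the square root helps, the sketch below compares a simple moment-based skewness before and after the transformation. The values are simulated and are not the eagle data, and `skewness` is a helper defined here only for illustration.

```r
# Simulated, positively skewed values standing in for a variable like AGL
# (assumption: these are NOT the eagle data)
set.seed(1)
x <- rexp(1000, rate = 0.1)

# Hypothetical helper: moment-based sample skewness
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

skew_raw  <- skewness(x)        # clearly positive for exponential data
skew_sqrt <- skewness(sqrt(x))  # closer to zero after the transformation

print(c(raw = skew_raw, sqrt = skew_sqrt))
```

The square root is only defined for non-negative values, so a signed variable such as VerticalRate cannot be transformed this way.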
Clustering our data
K-means is an effective way to cluster a large number of observations, so I used it to group the points.
1st.) Silhouette
2nd.) WSS
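One way to choose k with these two criteria is sketched below, using base R `kmeans()` and `silhouette()` from the cluster package. The data are simulated with two well-separated groups, so this illustrates the technique and does not reproduce the plots from the eagle data.

```r
library(cluster)  # provides silhouette()

# Simulated data: two well-separated groups in two dimensions
# (assumption: a stand-in for the scaled movement variables)
set.seed(42)
x <- rbind(
  matrix(rnorm(200, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(200, mean = 5, sd = 0.5), ncol = 2)
)
d <- dist(x)
ks <- 2:6

# Total within-cluster sum of squares (WSS) for each k: look for an "elbow"
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# Average silhouette width for each k: higher is better
avg_sil <- sapply(ks, function(k) {
  cl <- kmeans(x, centers = k, nstart = 10)$cluster
  mean(silhouette(cl, d)[, "sil_width"])
})

# k with the highest average silhouette width
best_k <- ks[which.max(avg_sil)]
print(data.frame(k = ks, wss = wss, avg_sil = avg_sil))
```

WSS always shrinks as k grows, so it is read by looking for the bend in the curve, while the silhouette width has a clear maximum.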
Clustered PCA biplot k=2
This PCA biplot is used to separate GPS tracking points recorded while moving from those recorded while perched, so we used 2 clusters.
Takeaways from the PCA Biplot where k=2
In-flight data points are likely overrepresented in the first cluster, where the AGL, KPH, Sn, and absVR vectors point mainly toward its observations.
Perched data points are likely overrepresented in the second cluster, where the VerticalRate and abs_angle vectors point further toward its observations.
The strongest variables were KPH, Sn, and AGL
The weakest variable was VerticalRate
The first two principal components explain 69% of the variance in the covariates
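The share of variance explained by the first two principal components can be read off a `prcomp` object, as in this sketch. It uses R's built-in `USArrests` data purely as a stand-in, so the number it prints is unrelated to the figure reported above.

```r
# Stand-in data (assumption: built-in USArrests, NOT the eagle data)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Each component's variance is its squared standard deviation
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Share of total variance captured by PC1 and PC2 together
first_two <- sum(var_explained[1:2])
print(round(first_two, 3))
```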
Clustered PCA biplot k=4
This PCA biplot is used to distinguish among the movements identified as in-flight points (ascending, flapping, descending) and to find patterns in them.
Takeaways from the PCA Biplot where k=4
The cluster highest in KPH and Sn relative to the other clusters was the gliding/descending cluster. It also had a much lower association with VerticalRate and abs_angle.
The cluster strong in AGL, absVR, VerticalRate, and abs_angle was the ascending movement. It had a lower association with KPH and Sn.
The cluster modestly stronger in KPH and Sn was the flapping movement. It had a lower association with AGL, absVR, VerticalRate, and abs_angle.
The first two principal components account for 68.7% of the variance in the covariates
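A simple way to characterize clusters like this is to compare per-cluster means of the standardized variables. The sketch below does so in base R on simulated data; the column names mirror two of the variables used here, but the values and the resulting clusters are made up for illustration.

```r
# Simulated data (assumption: NOT the eagle data): a fast, low-angle group
# and a slow, high-angle group
set.seed(7)
sim <- data.frame(
  KPH       = c(rnorm(50, mean = 60, sd = 5), rnorm(50, mean = 10, sd = 5)),
  abs_angle = c(rnorm(50, mean = 5, sd = 2),  rnorm(50, mean = 40, sd = 5))
)

# Standardize so both variables are on a comparable scale, then cluster
z <- as.data.frame(scale(sim))
km <- kmeans(z, centers = 2, nstart = 10)

# Mean of each standardized variable within each cluster:
# positive = above the overall average, negative = below it
profile <- aggregate(z, by = list(cluster = km$cluster), FUN = mean)
print(profile)
```

Reading the table row by row gives the kind of statement made above: a cluster with a positive KPH mean and a negative abs_angle mean is "strong in KPH, lower in abs_angle".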
Looking at In-flight Behaviors
We want to view in-flight behavior by randomly selecting 1 eagle and visualizing its KPH over time (11-second time intervals)
This will allow us to deduce which of the four behaviors are exhibited at certain times of its journey
Behaviors that I deduced: descending had the highest KPH, followed by ascending and flapping. Perching had next to none.
Appendix 1
# Loading libraries
library(cluster)
library(dbscan)
library(factoextra)
library(tidyverse)
library(patchwork)
library(ggrepel)
options(width = 10000)

# Loading our dataset
load('eagle_data.Rdata')
eagle_data %>% as_tibble() %>% head()
Appendix 2
set.seed(123)

# Getting a rough grouped sample of ~1000 rows
sample_1000 <- eagle_data %>%
  group_by(Animal_ID) %>%
  sample_frac(size = 1000 / nrow(eagle_data)) %>%
  ungroup()

# Selecting just the important variables of the dataset
vars <- sample_1000 %>%
  select(KPH, Sn, AGL, abs_angle, VerticalRate, absVR)

# Pivoting those variables into long format
vars_long <- vars %>%
  pivot_longer(cols = c(Sn, KPH, AGL, VerticalRate, absVR, abs_angle),
               names_to = "variable",
               values_to = "value")

# Faceted histogram of each variable
ggplot(vars_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "white") +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(title = "Distribution of Selected Variables",
       x = "Value",
       y = "Count")
# Square rooting the skewed variables
vars_fixed <- vars %>%
  mutate(AGL = sqrt(AGL),
         absVR = sqrt(absVR),
         abs_angle = sqrt(abs_angle),
         Sn = sqrt(Sn))

# Pivoting the variables into long format for faceting
vars_long <- vars_fixed %>%
  pivot_longer(cols = c(Sn, KPH, AGL, VerticalRate, absVR, abs_angle),
               names_to = "variable",
               values_to = "value")

# Graphing the faceted histogram of the transformed values
ggplot(vars_long, aes(x = value)) +
  geom_histogram(bins = 30, color = "white") +
  facet_wrap(~ variable, scales = "free") +
  theme_minimal() +
  labs(title = "Histograms of Focal Variables after Sq. Root Transformation",
       x = "Value",
       y = "Count")
Appendix 3
# Running PCA on the transformed variables (centered and scaled)
pca_obj <- prcomp(vars_fixed, center = TRUE, scale. = TRUE)

# Turning the PC scores into a data frame so that we can use
# avg. silhouette and WSS to determine the k value
# (assumed to be pca_obj$x; the assignment is not shown in the original code)
scores <- as.data.frame(pca_obj$x)

set.seed(42)

# K-means on the PC scores using k=2
kmeans_df <- kmeans(scores, centers = 2, nstart = 10)

# Getting a biplot of the k-means clusters (after normalization)
kmeans_biplot <- fviz_pca(pca_obj,
                          label = 'var',
                          habillage = factor(kmeans_df$cluster),
                          repel = TRUE) +
  ggtitle('PCA biplot with k=2 clustering') +
  coord_cartesian(xlim = c(-8, 5), ylim = c(-10, 5)) +
  guides(color = 'none', shape = 'none')

# K-means on the PC scores again, now with k=4
kmeans_df <- kmeans(scores, centers = 4, nstart = 10)
# Color coding by cluster for PCA biplot k=4
custom_colors <- c(
  "1" = "turquoise",   # turquoise for cluster 1
  "2" = "purple",      # purple for cluster 2
  "3" = "lightgreen",  # light green for cluster 3
  "4" = "red"          # red for cluster 4
)

# Shape coding by cluster for PCA biplot k=4
custom_shapes <- c(
  "1" = 17,  # filled triangle
  "2" = 3,   # plus (+)
  "3" = 15,  # filled square
  "4" = 16   # filled circle
)

# PCA biplot when k=4 for distinguishing in-flight movements
kmeans_biplot <- fviz_pca(pca_obj,
                          label = 'var',
                          habillage = factor(kmeans_df$cluster),
                          repel = TRUE) +
  ggtitle('PCA biplot of the 4-cluster solution') +
  coord_cartesian(xlim = c(-8, 5), ylim = c(-10, 5)) +
  scale_shape_manual(values = custom_shapes) +
  scale_color_manual(values = custom_colors) +
  guides(color = 'none', shape = 'none')
kmeans_biplot

# Attaching the k-means labels to the PCA score object
scores$cluster <- factor(kmeans_df$cluster)

# Aligning the cluster assignments with each observation
sample_1000 <- bind_cols(sample_1000, cluster = scores$cluster)

# Line graph of KPH over time (in seconds), color coded by cluster
kph_time <- sample_1000 %>%
  filter(Animal_ID == 109) %>%
  mutate(seconds = cumsum(TimeDiff)) %>%
  ggplot(aes(x = seconds, y = KPH, color = cluster)) +
  geom_point() +
  geom_line(alpha = 0.3) +
  scale_color_manual(values = custom_colors) +
  theme_minimal() +
  labs(x = "Seconds", y = "KPH",
       title = "Instantaneous Speed Over Time for 1 Randomly Picked Bird")
kph_time