Classifying Eagle Flight Behaviors

Motivation

In this analysis, we aim to better understand the flight behavior of bald eagles by examining GPS-derived movement metrics and identifying patterns of similarity and difference within the data.

Data

The data come from GPS biologgers attached to bald eagles in Iowa (Bergen et al.).

2,093,022 GPS points from 57 eagles over about 4 years
13 variables in the data set, 6 of which are flight variables used in the analysis

Variable	Definition
KPH	Current speed (kilometers per hour)
AGL	Altitude above ground level (meters)
SN	Horizontal distance between consecutive measurements
|Angle|	Absolute value of turn angle (radians)
Vertical Rate	Mean vertical velocity between consecutive measurements
|Vertical Rate|	Absolute value of Vertical Rate

X, Y, and Z location variables are also present, along with the time recorded for each measurement.

Methods

With more than two million unlabeled GPS points, a clustering approach is an appropriate way to find distinct flight patterns in the data.

The steps are:

Prepare the data (scale, correct skew)
Determine the number of clusters that best separates the data into distinct groups
Plot the clusters (using Principal Component Analysis to plot in 2 dimensions)
Examine the characteristics of each variable within each cluster
Determine, for each cluster, the typical flight behavior of eagles based on the variable values in that cluster

Methods - Prepping data

Before the data can be used for clustering, two steps are needed:

The data must be scaled so that all variables are weighted equally

Every variable's distribution (except vertical rate) is heavily skewed, as in the histogram shown. To reduce the skew, we take the square root of those variables
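The two preparation steps can be sketched as follows. This is a minimal illustration, not the analysis code: the function name prepare_features, the synthetic data, and the order of operations (square root first, then standardization, since a square root of standardized values would hit negatives) are assumptions. It also assumes the skewed variables are non-negative.

```python
import numpy as np

def prepare_features(X, skewed_cols):
    """Square-root the skewed columns, then z-score every column."""
    X = np.asarray(X, dtype=float).copy()
    # The square root compresses a long right tail; it needs non-negative input.
    if np.any(X[:, skewed_cols] < 0):
        raise ValueError("square-root transform needs non-negative values")
    X[:, skewed_cols] = np.sqrt(X[:, skewed_cols])
    # Standardize so each variable has mean 0 and standard deviation 1.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unchanged instead of dividing by 0
    return (X - mean) / std

# Synthetic stand-ins (hypothetical values, not the eagle data).
rng = np.random.default_rng(0)
kph = rng.exponential(20.0, size=1000)          # skewed, non-negative
vertical_rate = rng.normal(0.0, 1.0, size=1000) # roughly symmetric, left as is
features = np.column_stack([kph, vertical_rate])
prepared = prepare_features(features, skewed_cols=[0])
```

After this step every column contributes on the same scale to the Euclidean distances that k-means relies on.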

Methods - Determining cluster amount needed

Next, we need to determine the number of clusters that best separates the data into distinct groups

To do this, we will use three approaches:

Elbow plot
Average silhouette width plot (bootstrapped to work around computational limits)
Visual inspection

Methods - Determining cluster amount needed

Elbow plots show the total within-cluster sum of squares (WSS) for different numbers of clusters, with the 'elbow' (the point where adding clusters stops reducing WSS substantially) indicating the optimal cluster count.

There is no obvious elbow here, so this plot is not much help.
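The values behind an elbow plot can be computed as in the sketch below. It uses scikit-learn's KMeans, whose inertia_ attribute is the total within-cluster sum of squares. The helper name wss_by_k and the three-blob demo data are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

def wss_by_k(X, k_values, seed=0):
    """Return {k: total within-cluster sum of squares} for each k."""
    wss = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        wss[k] = float(model.inertia_)  # sum of squared distances to nearest centroid
    return wss

# Synthetic demo: three well-separated blobs (hypothetical, not the eagle data).
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X_demo = np.vstack([c + rng.normal(size=(200, 2)) for c in centers])
wss_demo = wss_by_k(X_demo, range(1, 7))
```

On clean data like this demo, WSS drops sharply up to the true number of groups and flattens afterwards. Real movement data often shows a smooth curve instead, which is when the other two approaches become more useful.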

Methods - Determining cluster amount needed

An average silhouette width plot shows how well data points fit within their assigned clusters for different numbers of clusters. Values range from -1 to 1, with values near 1 indicating the best fit.

Judging by this plot, 4 clusters gives the most distinct grouping.
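Silhouette width requires pairwise distances, which is infeasible on millions of points, so it has to be estimated on samples. The sketch below shows one possible scheme: cluster once, then average the silhouette score over repeated random subsamples. The exact resampling scheme used in this analysis is not stated, so the function name, sample size, number of rounds, and sampling without replacement are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def subsampled_silhouette(X, k, n_rounds=10, sample_size=300, seed=0):
    """Average silhouette width for k clusters, estimated on random subsamples."""
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    size = min(sample_size, len(X))
    scores = []
    for _ in range(n_rounds):
        idx = rng.choice(len(X), size=size, replace=False)
        # silhouette_score needs at least two clusters present in the sample.
        if len(np.unique(labels[idx])) < 2:
            continue
        scores.append(silhouette_score(X[idx], labels[idx]))
    return float(np.mean(scores)) if scores else float("nan")

# Synthetic demo: three well-separated blobs (hypothetical, not the eagle data).
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [8.0, 0.0], [0.0, 8.0]])
X_demo = np.vstack([c + rng.normal(size=(200, 2)) for c in centers])
sil_demo = {k: subsampled_silhouette(X_demo, k) for k in range(2, 6)}
best_k = max(sil_demo, key=sil_demo.get)
```

The k with the highest average width is the candidate cluster count. More rounds or larger samples reduce the noise in the estimate at the cost of run time.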

Methods

Plotting the clusters for several values of k also gives a good sense of the appropriate number of clusters, which appears to be either 3 or 4.

Methods - PCA

Since the data have 6 dimensions, we need to reduce them in order to visualize the clusters properly

To do this, we will use Principal Component Analysis, which reduces dimensionality while retaining most of the variability in the data

We will use K-means for the clustering because it scales well to large amounts of data

Methods - PCA

For visualization and clustering, we focus on the first two principal components, which together explain 67% of the total variance.

So the reduction from 6 dimensions to 2 preserves about two-thirds of the original variance.
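The share of variance explained by the first two components can be computed as in the sketch below, which performs PCA through a singular value decomposition of the centered data. The function name pca_project and the synthetic 6-variable data are assumptions for illustration. The input is assumed to be already scaled.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X onto its leading principal components.

    Returns the component scores and the fraction of total variance
    explained by each retained component.
    """
    Xc = X - X.mean(axis=0)                    # center each variable
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)            # variance share per component
    scores = Xc @ Vt[:n_components].T          # coordinates in PC space
    return scores, explained[:n_components]

# Synthetic demo: 6 correlated variables driven by 2 latent factors
# (hypothetical, not the eagle data).
rng = np.random.default_rng(3)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X_demo = latent @ mixing + 0.3 * rng.normal(size=(500, 6))
scores, ratio = pca_project(X_demo)
total_explained = float(ratio.sum())
```

The two score columns are what gets plotted on the x and y axes, and total_explained plays the role of the 67% figure reported above.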

Results

Perching - Falls at the opposite end from KPH and SN, indicating no movement

Flapping/Low Flying - Fairly low values of every variable, but not at the opposite end from the horizontal movement variables

Soaring - High speed and slow descent, with some association with AGL and |Angle| as well

Ascending - High values of Vertical Rate and |Vertical Rate|

Results

Examining Horizontal Flight Patterns for each cluster over a Flight Segment

Results

Examining Vertical Flight Patterns for each cluster over time

Conclusion

Variable	Perching	Low Flying/Flapping	Soaring	Ascending
KPH	0	Low	High	Varies
AGL	Very Low	Low	Varies	High
SN	0	Low	High	Varies
|Angle|	Varies	Low	High	Very Low
Vertical Rate	0	Low	Negative	High
|Vertical Rate|	0	Low	High	High
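A qualitative summary like the table above can be backed by per-cluster averages of each variable. The sketch below shows one way to compute them. The function name cluster_means and the four-row toy data are hypothetical, and the labels stand in for whatever cluster assignments the clustering produced.

```python
import numpy as np

def cluster_means(X, labels, names):
    """Return {cluster: {variable: mean value}} for each cluster label."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for c in np.unique(labels):
        means = X[labels == c].mean(axis=0)   # column means within cluster c
        summary[int(c)] = {n: float(m) for n, m in zip(names, means)}
    return summary

# Toy demo (hypothetical values, not the eagle data).
toy = [[0.0, 1.0], [0.0, 3.0], [40.0, 100.0], [60.0, 200.0]]
toy_labels = [0, 0, 1, 1]
summary = cluster_means(toy, toy_labels, ["KPH", "AGL"])
```

Computing the means on the original (untransformed) units keeps the summary interpretable, even if the clustering itself ran on transformed and scaled values.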