In this analysis, we aim to better understand the flight behavior of bald eagles by examining GPS-derived movement metrics and identifying patterns of similarity and difference within the data.
The data come from GPS biologgers deployed on bald eagles in Iowa (Bergen et al.).
2,093,022 GPS points from 57 eagles over roughly 4 years
13 variables in the data set, 6 of which are flight variables used in this analysis
| Variable | Definition |
|---|---|
| KPH | Current speed (km/h) |
| AGL | Height above ground level (m) |
| SN | Horizontal distance between consecutive measurements |
| \|Angle\| | Absolute value of the turn angle (radians) |
| Vertical Rate | Mean vertical velocity between consecutive measurements |
| \|Vertical Rate\| | Absolute value of Vertical Rate |
Given the size of the data set, clustering is a practical way to find distinct flight patterns.
The goals are to:
1. Prepare the data (scale the variables and correct skewness)
2. Determine the number of clusters that produces the most distinct groups
3. Plot the clusters in two dimensions using Principal Component Analysis (PCA)
4. Examine the characteristics of each variable within each cluster
5. Infer, for each cluster, the typical flight behavior of the eagles from its variable profile
Before the data can be used for clustering, they must be prepared: each variable is transformed to reduce skewness and then scaled.
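The specific transformations are not listed here. The following is a minimal sketch, assuming a signed log transform (`sign(x) * log1p(|x|)`) to reduce skewness followed by z-score scaling; the column names in `FLIGHT_VARS` are hypothetical, and the synthetic log-normal data only stands in for the real measurements.

```python
import numpy as np

# Hypothetical names for the six flight variables, in column order.
FLIGHT_VARS = ["KPH", "AGL", "SN", "abs_angle", "vertical_rate", "abs_vertical_rate"]


def signed_log1p(x):
    """Compress long tails on both sides while keeping each value's sign."""
    return np.sign(x) * np.log1p(np.abs(x))


def prepare(X):
    """Reduce skewness column by column, then z-score every column."""
    Z = signed_log1p(np.asarray(X, dtype=float))
    mean = Z.mean(axis=0)
    std = Z.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns centered but unscaled
    return (Z - mean) / std


# Synthetic right-skewed data standing in for the six flight variables.
rng = np.random.default_rng(0)
X_raw = rng.lognormal(mean=1.0, sigma=1.0, size=(1000, len(FLIGHT_VARS)))
X_raw[:, 4] *= rng.choice([-1.0, 1.0], size=1000)  # vertical rate can be negative
X_scaled = prepare(X_raw)
print(X_scaled.shape)  # (1000, 6)
```

Scaling matters because K-means relies on Euclidean distance: without it, variables with large numeric ranges (such as AGL in meters) would dominate variables with small ranges (such as |Angle| in radians).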
Next, we determine the number of clusters that produces the most distinct groupings. To do this, we use three approaches:
- Elbow plot
- Average silhouette width plot (bootstrapped to work around the computational cost of the full data set)
- Visual inspection
An elbow plot shows the total within-cluster sum of squares (WSS) for each number of clusters k; the point where the curve bends sharply (the ‘elbow’) suggests a good cluster count.
There is no obvious elbow in this plot, so it does not settle the question.
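A minimal NumPy sketch of how an elbow curve is computed, assuming plain Lloyd's K-means with a single random initialization (libraries typically run several restarts and keep the best). The function names and the synthetic four-blob data are illustrative only; building the full n × k distance array is fine for a sample but not for all 2,093,022 points.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, wss)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment and within-cluster sum of squares (WSS).
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    wss = d2[np.arange(len(X)), labels].sum()
    return centroids, labels, float(wss)


def elbow_curve(X, k_values, seed=0):
    """Map each k to its WSS; plotting k against WSS gives the elbow plot."""
    return {k: kmeans(X, k, seed=seed)[2] for k in k_values}


# Synthetic data with four well-separated groups.
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6]], dtype=float)
X = np.vstack([c + rng.normal(size=(150, 2)) for c in centers])
wss = elbow_curve(X, range(1, 9))
for k, v in wss.items():
    print(k, round(v, 1))
```

WSS always shrinks as k grows, so the plot is read for where the decrease levels off rather than for a minimum; with real, overlapping behaviors that bend can be hard to see.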
An average silhouette width plot shows how well data points fit their assigned clusters for each number of clusters. Silhouette values range from -1 to 1, with values near 1 indicating compact, well-separated clusters.
Judging by this plot, 4 clusters give the most distinct grouping.
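For point i, the silhouette is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) is its mean distance to the members of the nearest other cluster. Computing it needs all pairwise distances (O(n²)), which is why resampling is needed at this scale. The exact bootstrap scheme is not described; this sketch assumes repeated random subsamples drawn without replacement and scored against labels from the full clustering. The function names and data are hypothetical.

```python
import numpy as np


def silhouette_mean(X, labels):
    """Mean silhouette width over all points of X (uses all pairwise distances)."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: s(i) = 0 for a singleton cluster
        a = d[i, own].sum() / (n_own - 1)  # excludes the zero self-distance
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        s[i] = 0.0 if denom == 0 else (b - a) / denom
    return s.mean()


def bootstrap_silhouette(X, labels, sample_size=500, n_boot=20, seed=0):
    """Average the silhouette over random subsamples instead of all points."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_boot):
        idx = rng.choice(len(X), size=sample_size, replace=False)
        scores.append(silhouette_mean(X[idx], labels[idx]))
    return float(np.mean(scores)), float(np.std(scores))


# Two well-separated synthetic groups with known labels.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (1000, 2)), rng.normal(8, 1, (1000, 2))])
labels = np.repeat([0, 1], 1000)
mean_s, sd_s = bootstrap_silhouette(X, labels, sample_size=300, n_boot=10)
print(round(mean_s, 2), round(sd_s, 3))
```

Repeating this for each candidate k (with the labels from that k's clustering) gives the points of the average silhouette width plot, and the spread across subsamples shows how stable each value is.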
Visualizing the clustering for different values of k also helps choose the number of clusters; here it suggests either 3 or 4.
Because the data have 6 dimensions, they must be reduced to two dimensions to be visualized.
To do this, we use Principal Component Analysis, which reduces dimensionality while retaining as much of the variance as possible.
The clustering itself is performed with K-means, chosen because it scales well to large data sets.
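Each K-means iteration costs time proportional to n × k × d, so it grows only linearly with the number of points. One way to label every point without holding an n × k distance matrix in memory is to assign points to the nearest centroid in chunks, as in this sketch; `assign_in_chunks` is a hypothetical helper, and whether the original analysis did anything similar is not stated.

```python
import numpy as np


def assign_in_chunks(X, centroids, chunk_size=100_000):
    """Label each row of X with its nearest centroid, one chunk at a time.

    Peak memory is about chunk_size * k distances instead of n * k.
    """
    labels = np.empty(len(X), dtype=np.int64)
    c_sq = (centroids ** 2).sum(axis=1)
    for start in range(0, len(X), chunk_size):
        block = X[start:start + chunk_size]
        # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2; ||x||^2 is the same for every
        # centroid, so it can be dropped when only the argmin is needed.
        d2 = -2.0 * block @ centroids.T + c_sq
        labels[start:start + chunk_size] = d2.argmin(axis=1)
    return labels


# Hypothetical centroids and points.
rng = np.random.default_rng(3)
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
X = np.vstack([rng.normal(0, 1, (5, 2)), rng.normal(5, 1, (5, 2))])
print(assign_in_chunks(X, centroids, chunk_size=4))
```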
For visualization and clustering, we focus on the first two principal components, which together explain 67% of the total variance.
Reducing the data from six dimensions to two therefore preserves about two-thirds of the original variance.
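A minimal sketch of PCA via the singular value decomposition of the centered data, assuming the input is already scaled; `pca` is a hypothetical helper and the test data are synthetic. Each component's explained-variance ratio is its squared singular value divided by the sum of all squared singular values, so the 67% above corresponds to the first two ratios summing to about 0.67.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via SVD of the centered data.

    Returns (scores, loadings, explained_variance_ratio).
    """
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2  # proportional to the variance along each component
    ratio = var / var.sum()
    loadings = Vt[:n_components].T  # contribution of each variable to each PC
    scores = Xc @ loadings  # coordinates used for the 2-D plot
    return scores, loadings, ratio


# Synthetic 6-variable data driven mostly by two hidden factors.
rng = np.random.default_rng(4)
latent = rng.normal(size=(1000, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(1000, 6))
scores, loadings, ratio = pca(X)
print(scores.shape, ratio[:2].sum().round(3))
```

The loadings are what allow statements such as a cluster lying opposite the KPH and SN directions: each variable can be drawn as an arrow in the plane of the first two components.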
The four clusters can be interpreted as follows:
- Perching: opposite the KPH and SN directions, indicating no movement
- Low flying/flapping: low values on every variable, but not at the opposite end of the horizontal movement variables (KPH and SN)
- Soaring: high speed and slow descent, with some association with AGL and |Angle| as well
- Ascending: high Vertical Rate and |Vertical Rate|
Examining horizontal flight patterns for each cluster over a flight segment
Examining vertical flight patterns for each cluster over time
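To see how behaviors follow one another along a flight segment, the time-ordered cluster labels of one eagle can be collapsed into runs. This is a sketch that assumes the points are sorted by eagle and timestamp and that each point already carries a behavior label; the helper names and the label sequence are hypothetical.

```python
from collections import Counter
from itertools import groupby


def behavior_runs(labels):
    """Collapse time-ordered labels into (label, run_length) pairs."""
    return [(label, sum(1 for _ in group)) for label, group in groupby(labels)]


def transition_counts(labels):
    """Count how often one behavior is directly followed by a different one."""
    runs = behavior_runs(labels)
    return Counter((a, b) for (a, _), (b, _) in zip(runs, runs[1:]))


# Hypothetical label sequence for one eagle's flight segment.
track = ["Perching", "Perching", "Ascending", "Ascending", "Ascending",
         "Soaring", "Soaring", "Low flying/flapping", "Perching"]
print(behavior_runs(track))
print(transition_counts(track))
```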
| Variable | Perching | Low Flying/Flapping | Soaring | Ascending |
|---|---|---|---|---|
| KPH | 0 | Low | High | Varies |
| AGL | Very Low | Low | Varies | High |
| SN | 0 | Low | High | Varies |
| \|Angle\| | Varies | Low | High | Very Low |
| Vertical Rate | 0 | Low | Negative | High |
| \|Vertical Rate\| | 0 | Low | High | High |
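The qualitative entries in the table can be backed by per-cluster summaries of the unscaled variables. A minimal sketch computing the median of each variable within each cluster; `cluster_profiles` and the toy rows are hypothetical, not values from the eagle data.

```python
import numpy as np

VARS = ["KPH", "AGL", "SN", "|Angle|", "Vertical Rate", "|Vertical Rate|"]


def cluster_profiles(X, labels):
    """Return (cluster_ids, medians) with one row of variable medians per cluster."""
    clusters = np.unique(labels)
    medians = np.array([np.median(X[labels == c], axis=0) for c in clusters])
    return clusters, medians


# Toy rows: two "stationary" points and two "fast, descending" points.
X = np.array([
    [0.0, 1.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 3.0, 0.0, 1.5, 0.0, 0.0],
    [40.0, 50.0, 300.0, 0.2, -1.0, 1.0],
    [60.0, 70.0, 500.0, 0.4, -3.0, 3.0],
])
labels = np.array([0, 0, 1, 1])
clusters, medians = cluster_profiles(X, labels)
for c, row in zip(clusters, medians):
    print(c, dict(zip(VARS, row.round(2))))
```

Medians are a reasonable default here because the raw variables are skewed, so a few extreme fixes would pull cluster means away from typical behavior.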