We are looking for flight patterns in Bald Eagle movement data from Iowa. We have two main questions to answer:
How do we classify a data point as “moving” or “perched”?
For “moving” data points, are there distinct, recognizable patterns?
Methods
Before writing any code, it is best to write the process down and consider other possibilities.
To consider: 2,000,000 data points is a LOT of data to load into R. If we visualized all of it on a PCA plot, R would very likely crash or take an extremely long time to render. We need to reduce the data before making any plots.
Process:
Use stratified sampling to reduce the amount of data being used
Create a scree plot and a within-cluster sum of squares (WSS) plot to decide how many principal components and clusters to use
Methods (contd.)
Generate a PCA plot with arrows for each movement variable
After determining each cluster's characteristics, provide an example of a specific segment and how it moves over time.
WSS Table
Scree Plot
PC1 explains about 43% of the variation in the data, while PC2 explains an additional 20%.
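Percentages like these come from the standard deviations that `prcomp` returns. Below is a minimal sketch of the calculation on synthetic data. The `toy` data frame is a stand-in, not the eagle data, and only borrows a few of its column names.

```r
# Synthetic stand-in for the movement variables (NOT the eagle data)
set.seed(1)
toy <- data.frame(
  KPH = rnorm(200),
  AGL = rnorm(200),
  VerticalRate = rnorm(200)
)
toy$Sn <- toy$KPH + rnorm(200, sd = 0.3)  # correlated with KPH on purpose

pca_toy <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance per component = variance of each PC / total variance
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
cum_explained <- cumsum(var_explained)

round(100 * var_explained, 1)  # percent explained by each PC
round(100 * cum_explained, 1)  # running total
```

The scree plot is a bar chart of `var_explained`. The point where the bars flatten out suggests how many components are worth keeping.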
PCA
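The appendix computes the scores and loadings but does not include the plotting step. Below is a minimal sketch of how a score plot with one arrow per variable could be drawn. It uses synthetic data, and `arrow_scale` is an arbitrary choice made only so the arrows are visible. It is not necessarily how the original figure was produced.

```r
library(ggplot2)

# Synthetic stand-in data (NOT the eagle data); column names are illustrative
set.seed(42)
toy <- data.frame(
  KPH = rnorm(300),
  AGL = rnorm(300),
  abs_angle = rnorm(300)
)
toy_scaled <- scale(toy)

pca_toy <- prcomp(toy_scaled)
scores <- as.data.frame(pca_toy$x[, 1:2])
scores$Cluster <- factor(kmeans(toy_scaled, centers = 3, nstart = 10)$cluster)

# One arrow per variable, from the origin to its (PC1, PC2) loading
arrows_df <- as.data.frame(pca_toy$rotation[, 1:2])
arrows_df$Variable <- rownames(arrows_df)
arrow_scale <- 3  # arbitrary stretch so arrows are visible among the scores

p <- ggplot(scores, aes(PC1, PC2)) +
  geom_point(aes(colour = Cluster), alpha = 0.4, size = 1) +
  geom_segment(
    data = arrows_df,
    aes(x = 0, y = 0, xend = PC1 * arrow_scale, yend = PC2 * arrow_scale),
    arrow = arrow(length = unit(0.2, "cm"))
  ) +
  geom_text(
    data = arrows_df,
    aes(x = PC1 * arrow_scale, y = PC2 * arrow_scale, label = Variable),
    vjust = -0.5
  ) +
  labs(title = "PCA scores with variable loadings") +
  theme_minimal()
```

An arrow pointing toward a cluster means that variable tends to be high for points in that cluster. Arrows pointing in similar directions indicate correlated variables.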
Only one cluster can be perched…
Only one group of points can be classified as “perched,” or not moving. We expect that cluster to have roughly these values:
Very low horizontal and vertical velocity: KPH, Sn, and the vertical rate variables (VerticalRate, absVR)
The angle is tricky. If we assume that a bird holding its wings straight up has a turn value of 0, then we are also looking for low values here.
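One way to check which cluster fits this description is to compare the k-means centers on the velocity variables. Below is a minimal sketch on synthetic data. The summed-center score is an illustrative heuristic, not the rule used in this analysis, and `toy` is a stand-in for the eagle data.

```r
# Synthetic stand-in data (NOT the eagle data): a slow group and a fast group
set.seed(7)
toy <- data.frame(
  KPH   = c(rnorm(100, mean = 1, sd = 0.5), rnorm(100, mean = 40, sd = 5)),
  absVR = c(rnorm(100, mean = 0.1, sd = 0.05), rnorm(100, mean = 2, sd = 0.5))
)
toy_scaled <- scale(toy)
km_toy <- kmeans(toy_scaled, centers = 2, nstart = 10)

# Cluster centers are in scaled units: negative = below the overall mean.
# Score each cluster by its summed center across the velocity variables;
# the lowest score is the most "perched-like" candidate.
perch_score <- rowSums(km_toy$centers[, c("KPH", "absVR")])
perched_cluster <- as.integer(names(which.min(perch_score)))

perch_score
perched_cluster
```

On the real data, the same comparison would include Sn and VerticalRate, with the turn angle inspected separately because its expected value for a perched bird is less clear.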
Cluster one is low in altitude and in change in altitude, but is moving relatively fast with a high turn angle. The eagle is likely hunting, looking for animals or objects.
Cluster two is the fastest cluster, high in the air, with the lowest turn angle. The eagle is likely cruising, covering a longer distance.
Cluster three has the lowest speeds and altitude, but a very high turn angle. The eagle is likely stationary, but could also be closely circling a certain object.
Cluster four has middle-of-the-pack speed, but the highest altitude and change in altitude (climbing), while also turning. This eagle is likely both soaring and searching.
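Descriptions such as “fastest” or “highest altitude” are easiest to verify from per-cluster means in the original units. Below is a minimal sketch of two equivalent ways to get them, using synthetic data rather than the eagle data.

```r
# Synthetic stand-in data (NOT the eagle data): low/slow vs high/fast points
set.seed(11)
toy <- data.frame(
  KPH = c(rnorm(50, mean = 5, sd = 1), rnorm(50, mean = 60, sd = 5)),
  AGL = c(rnorm(50, mean = 10, sd = 2), rnorm(50, mean = 300, sd = 30))
)
toy_scaled <- scale(toy)
km_toy <- kmeans(toy_scaled, centers = 2, nstart = 10)

# Option 1: mean of each variable per cluster, in original units
profile <- aggregate(toy, by = list(Cluster = km_toy$cluster), FUN = mean)

# Option 2: undo the scaling on the k-means centers
centers_orig <- sweep(km_toy$centers, 2, attr(toy_scaled, "scaled:scale"), "*")
centers_orig <- sweep(centers_orig, 2, attr(toy_scaled, "scaled:center"), "+")

profile
centers_orig
```

Both give the same numbers, because a k-means center is the mean of its members. Option 1 is handy when other columns need summarizing too.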
Behavior Example
[1] "The segment that has the most amount of data in cluster 2 is 142054 ."
Appendix
library(dplyr)
library(cluster)
library(ggplot2)
library(ggrepel)
library(FactoMineR)
library(factoextra)
library(tidyr)

load("eagle_data.RData")

# Changed the data frame to df.
# ls() returns names in alphabetical order, so this picks the last one.
data_list <- ls()
data_name <- data_list[length(data_list)]
df <- get(data_name)

movement_vars <- c("KPH", "Sn", "AGL", "VerticalRate", "abs_angle", "absVR")

set.seed(123)

# Determine sample size per animal. If an animal has fewer than 1000 points,
# all of its data is used.
sample_size <- 1000
df_sample <- df %>%
  group_by(Animal_ID) %>%
  group_modify(~ slice_sample(.x, n = min(nrow(.x), sample_size))) %>%
  ungroup()

df_pca <- df_sample %>%
  select(all_of(movement_vars)) %>%
  drop_na()
df_scaled <- scale(df_pca)

# PCA
pca_res <- prcomp(df_scaled, rank. = 2)
pca_scores <- as.data.frame(pca_res$x)

# Total within-cluster sum of squares for each candidate k.
# The same k_values vector drives both the k-means runs and the plot axis.
k_values <- 2:6
wss <- sapply(k_values, function(k) {
  kmeans(df_scaled, centers = k, nstart = 10, iter.max = 20)$tot.withinss
})
wss_df <- data.frame(k = k_values, WSS = wss)

ggplot(wss_df, aes(k, WSS)) +
  geom_line() +
  geom_point() +
  labs(title = "Elbow Method: Total WSS vs k",
       x = "Number of clusters (k)",
       y = "Total WSS") +
  theme_minimal()

k <- 4

# Scores
set.seed(123)
km_final <- kmeans(df_scaled, centers = k, nstart = 25)
pca_scores$Cluster <- factor(km_final$cluster)

# Loadings
loadings <- as.data.frame(pca_res$rotation)
loadings$Variable <- rownames(loadings)
fviz_eig(pca_res) +
  ggtitle("Scree Plot for PCA of Eagle Movement Variables")
# This code finds the segment ID with the most points in cluster 2 to help visualize the data.
# df_sample has no Cluster column, so attach the k-means labels first.
# Dropping the same NA rows that were dropped for df_pca keeps rows and labels aligned.
df_clustered <- df_sample %>%
  drop_na(all_of(movement_vars)) %>%
  mutate(Cluster = factor(km_final$cluster))

best_segment <- df_clustered %>%
  filter(Cluster == 2) %>%   # keep only Cluster 2
  count(segment_id) %>%      # count rows per segment
  arrange(desc(n)) %>%       # sort largest to smallest
  slice(1) %>%               # take the top one
  pull(segment_id)           # store the ID

print(paste("The segment that has the most amount of data in cluster 2 is ", best_segment, "."))

segment_path <- df_sample %>%
  filter(segment_id == 142054) %>%
  arrange(LocalTime)

# Starting point
x0 <- segment_path$X[1]
y0 <- segment_path$Y[1]

# Compute Euclidean distance from origin
segment_path <- segment_path %>%
  mutate(Distance_from_origin = sqrt((X - x0)^2 + (Y - y0)^2))

# Plot distance (Euclidean method)
ggplot(segment_path, aes(x = LocalTime, y = Distance_from_origin)) +
  geom_line() +
  scale_y_continuous(labels = scales::comma) +
  geom_point(size = 1) +
  labs(
    title = "Segment 142054: Distance from Origin Over Time",
    x = "Time",
    y = "Distance from Origin (meters)"
  ) +
  theme_minimal()