Project2 on Eagle Data

Overview

GPS biologgers collected data on 57 eagles over the course of 4 years.
The data included many variables; the movement variables were the most important for this analysis.

Movement variables

The movement variables in the data were:

KPH: Instantaneous speed in kilometers per hour (km/h)

Sn: Average speed over time in meters per second (m/s)

AGL: Meters (m) above ground level

|Angle|: Absolute value of turn angle in radians

Vertical rate: Mean vertical velocity between consecutive sample points, calculated as change in altitude / change in time, in meters per second (m/s)

|Vertical rate|: Absolute value of vertical rate in meters per second (m/s)
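
A minimal sketch of how the two vertical rate variables follow from the definition above. The fixes and the column names time_s and altitude_m are made up for illustration and are not the columns of the eagle data.

```r
# Hypothetical GPS fixes for one bird (assumed column names, made-up values)
fixes <- data.frame(
  time_s     = c(0, 10, 20, 30),      # seconds since the first fix
  altitude_m = c(100, 120, 115, 115)  # altitude in meters
)

# vertical rate = change in altitude / change in time between consecutive fixes
vertical_rate <- diff(fixes$altitude_m) / diff(fixes$time_s)

# |vertical rate| drops the sign, so climbing and descending look the same
abs_vertical_rate <- abs(vertical_rate)

vertical_rate
abs_vertical_rate
```

Positive values mean the bird is climbing and negative values mean it is descending.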

Two questions

Question 1: Can we use the movement variables KPH, Sn, AGL, |Angle|, Vertical rate, and |Vertical rate|, to distinguish in-flight from perching points?

Question 2:

Of the in-flight points:

a. Are there distinct flight behaviors that can be identified?

b. What are the characteristics of these behaviors?

c. What are some visual examples of flight segments that demonstrate the different types of in-flight behaviors?

Methods

PCA biplot of all of the eagle data, colored by two k-means clusters: a perching cluster and an in-flight cluster.

Methods continued

The full eagle data set has 2 million rows, which would create long computation times in later parts of the analysis.
One way to solve this is to work with samples of the data.
This analysis takes 6 random samples of 1000 rows each and combines them.
A square root transformation was applied to every movement variable except Vertical rate, which can be negative, to reduce skewness.
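
A small sketch of why a square root transformation helps with right-skewed values. It uses simulated data, not the eagle data, and the helper skewness() is defined here only for illustration.

```r
set.seed(1)

# simulated right-skewed, non-negative values (a stand-in for a speed-like variable)
x <- rexp(1000, rate = 0.1)

# moment-based sample skewness (illustrative helper, not from a package)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw  <- skewness(x)        # skewness before the transformation
skew_sqrt <- skewness(sqrt(x))  # skewness after the transformation

c(raw = skew_raw, sqrt = skew_sqrt)

# sqrt() is undefined for negative inputs, which is why a signed variable
# such as vertical rate has to be left out of the transformation
neg_result <- suppressWarnings(sqrt(-1.5))
neg_result
```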

Methods continued

What is the optimal number of clusters?

This silhouette plot is used to choose the optimal number of clusters overall, and with it the number of in-flight behavior clusters.
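
A rough sketch of what a silhouette plot summarizes: the average silhouette width for each candidate number of clusters, with the largest value marking the suggested number. It uses the built-in iris data as a stand-in for the sampled eagle data and the cluster package; the range of 2 to 6 clusters is an arbitrary choice for the example.

```r
library(cluster)  # provides silhouette()

set.seed(123)
x <- scale(iris[, 1:4])  # stand-in data, scaled like the movement variables
d <- dist(x)             # pairwise distances needed by silhouette()

# average silhouette width for each candidate number of clusters
candidate_k <- 2:6
avg_sil <- sapply(candidate_k, function(k) {
  km <- kmeans(x, centers = k, nstart = 10)
  mean(silhouette(km$cluster, d)[, "sil_width"])
})
names(avg_sil) <- candidate_k

# the candidate with the largest average width is the suggested number of clusters
best_k <- as.integer(names(which.max(avg_sil)))
avg_sil
best_k
```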

Results for Question 1

Can we use the movement variables to distinguish in-flight from perching points? Yes

The speed variables contribute a lot to the first dimension of the plot.

The speed variables contribute little to the second dimension. Instead, vertical rate and absolute angle contribute more.
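
A short sketch of where the contribution percentages come from. For a PCA fitted with prcomp, each column of the rotation matrix has unit length, so the squared loadings of one dimension sum to 1 and can be read as percentages. The built-in iris data is used as a stand-in for the eagle data.

```r
# PCA on stand-in data, centered and scaled
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# contribution of each variable to each dimension, in percent
contrib <- 100 * pca$rotation^2

# contributions to the first two dimensions
round(contrib[, 1:2], 1)

# each dimension's contributions add up to 100
colSums(contrib)
```

A variable with a large percentage in a column is one that the dimension depends on heavily.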

Results for Question 2 part a

Are there distinct flight behaviors that can be identified? Yes

Clusters 1, 2, and 3 are the in-flight points.

Cluster 4 contains the perching points.

Results for Question 2 part a continued

Table of cluster means
cluster  KPH        Sn          AGL        VerticalRate  abs_angle  absVR
1        33.045501  7.8854399   326.08638  1.5168248     1.7508422  1.5573757
2        56.814846  15.6074088  364.36337  -1.9634261    0.2481148  2.0165739
3        37.107753  9.4343892   86.87814   -0.0150772    0.8462942  0.4706384
4        1.543194   0.7681942   16.43931   -0.0772764    1.5264169  0.4936627

Results for Question 2 part b

What are the characteristics of these behaviors?

Cluster 1 (ascending) has high absolute turn angle and high above ground level.
Cluster 2 (gliding) has low absolute turn angle, low (negative) vertical rate, high above ground level, and high speed.
Cluster 3 (flapping) has low above ground level.
Cluster 4 (perching) has high absolute turn angle, low above ground level, and low instantaneous speed.

Results for Question 2 part c

Visual example of a flight segment that shows different in flight behaviors

Conclusion

The movement variables were able to distinguish perching and flight clusters.
The in-flight points could be further divided into distinct flight behaviors.
Speed was important for determining in-flight behavior.
Absolute angle was important in determining perching points.

Appendix

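# load the packages used throughout the appendix
library(tidyverse)  # dplyr, tidyr, tibble, ggplot2, and the %>% pipe
library(factoextra) # fviz_pca_biplot, fviz_nbclust, fviz_contrib
library(patchwork)  # stacks plots with the / operator
library(kableExtra) # kable_styling
# load the data and preview the first rows
options(width = 10000)
load('eagle_data.Rdata')
eagle_data %>%
  as_tibble() %>%
  head()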

movement <- eagle_data %>% select(KPH:absVR) # select the movement variables
movement_scaled <- scale(movement) # center and scale the variables

movement_pca <- prcomp(movement, center = TRUE, scale. = TRUE) # PCA on the centered and scaled movement variables
movement_pca_scores <- as_tibble(movement_pca$x) # PC scores for each observation

Appendix

movement_vars <- c("KPH", "Sn", "AGL", "VerticalRate", "abs_angle", "absVR") # names of the movement variables, for selecting them when needed
sqrt_vars <- c("KPH", "Sn", "AGL", "abs_angle", "absVR") #selects variables where it is safe to use a sqrt transformation

kmeans_movement <- kmeans(movement_scaled, centers = 2) # use k means to make 2 clusters for first pca

fviz_pca_biplot( #create pca plot
  movement_pca, #data
  geom.ind = "point", #point plot
  col.ind = factor(kmeans_movement$cluster), #adds clusters to data
  palette = c("blue", "orange"), #color code clusters
  label = "var", #create labels for arrows
  col.var = "black", #black color arrows
  repel = TRUE #repel the text
) +
  labs(title = "PCA Biplot with K-means Clusters") #title name

Appendix

set.seed(123) #set the seed
# this function makes 6 sets of data that randomly gets 1000 rows of data
samples <- lapply(1:6, function(i) {
  eagle_data %>% sample_n(1000)
})
# combined all 6 data sets
combined_samples <- bind_rows(
  samples,
  .id = "sample_id"   # optional: marks which sample each row came from
)
#select all the movement variables
combined_samples_movement <- combined_samples %>% select(all_of(movement_vars))
combined_transformed <- combined_samples_movement %>% # sqrt transformation for skewness
  mutate(across(all_of(sqrt_vars), sqrt)) %>%
  drop_na() # drop null

combined_scaled <- scale(combined_transformed) # scale
combined_pca <- prcomp(combined_scaled, center = TRUE, scale. = FALSE) # PCA (the data is already scaled)
kmeans_combined4 <- kmeans(combined_scaled, centers = 4) # 4 clusters using k-means
# keep the PC1 and PC2 scores and attach the cluster labels
pca_scores_combined <- as.data.frame(combined_pca$x[, 1:2]) %>%
  mutate(cluster = factor(kmeans_combined4$cluster))

# silhouette plot to find the optimal number of clusters
fviz_nbclust(combined_scaled,
             FUNcluster = kmeans,
             method = 'silhouette')

# creates a data frame of the scores
pca_eagle_clustered <- movement_pca_scores %>%
  as.data.frame() %>%
  mutate(cluster = factor(kmeans_movement$cluster))

Appendix

# takes pc1 and gets the variable contributions in a %
contribution_pc1 <- fviz_contrib(combined_pca, choice = 'var', axes = 1) + 
  theme_classic(base_size = 16) +
  labs(x = 'Variable', 
       title = 'Contribution to first dimension') +
  # add this line to make the x-axis label text smaller
  theme(axis.text.x = element_text(size = 10))

# takes pc2 and gets the variable contributions in a %
contribution_pc2 <- fviz_contrib(combined_pca, choice = 'var', axes = 2) + 
  theme_classic(base_size = 16) +
  labs(x = 'Variable', 
       title = 'Contribution to second dimension') +
  # Add this line to make the x-axis label text smaller
  theme(axis.text.x = element_text(size = 10))

contribution_pc1 / contribution_pc2 #patches the 2 plots together

kmeans_movement4 <- kmeans(movement_scaled, centers = 4) # get 4 clusters using k means of the movement variables

Appendix

# create a data frame with the clusters from the k-means fit
eagle_clustered <- eagle_data %>%
  as.data.frame() %>%
  mutate(cluster = factor(kmeans_movement4$cluster))

#similar as above code
eagle_clustered_combined <- combined_samples %>%
  as.data.frame() %>%
  mutate(cluster = factor(kmeans_combined4$cluster))

# group the sampled rows by cluster and compute the mean of each movement variable
cluster_means_combined <- eagle_clustered_combined %>%
  group_by(cluster) %>%
  summarize(
    # the lambda form avoids passing extra arguments through across(), which newer dplyr versions deprecate
    across(all_of(movement_vars), ~ mean(.x, na.rm = TRUE))
  )
cluster_means_combined

Appendix

# format the cluster means above as a table with kable and kableExtra
knitr::kable(cluster_means_combined, caption = "Table of cluster means") %>%
  kable_styling(
    font_size = 18, # table font size
    full_width = FALSE # keeps the table from stretching to 100% width
  )

#plot of the pca of the combined samples
fviz_pca_biplot(
    combined_pca,
    geom.ind = "point",
    habillage = factor(kmeans_combined4$cluster), # color by cluster
    pointsize = 1, #point size is 1
    pointshape = 19, # points are shape 19
    label = "var", # show variable arrows
    col.var = "black", # color of the variable arrows
    repel = TRUE # avoid overlapping labels
) +
  #color codes the clusters on plot
scale_color_manual(
    values = c(
        "1" = "blue", # blue
        "2" = "darkorange", # orange
        "3" = "darkgreen",  # green
        "4" = "red"  # red
    )
) +
labs(
    title = "PCA Biplot (Combined 6-Sample Dataset)", #title 
    color = "Cluster"
) +
theme_minimal(base_size = 14) # minimal theme

Appendix

boxplot_variables <- c('KPH', 'AGL', 'VerticalRate', 'abs_angle')#variables that were used for the boxplot

# define the colors once to reuse them easily and avoid typos
cluster_colors <- c(
  "blue" = "blue", # map color names to color values
  "darkorange" = "darkorange",
  "darkgreen" = "darkgreen",
  "red" = "red"
)


eagle_clustered <- combined_samples %>%
  as.data.frame() %>%
  mutate(
    cluster = factor(
      kmeans_combined4$cluster,
      levels = c("1", "2", "3", "4"), # original levels
      labels = c("Ascending", "Flapping", "Gliding", "Perching") # new labels; k-means numbering is arbitrary, so check this order against the cluster means
    )
  )

# define the variable labels for facets
variable_labels <- c(
  'abs_angle' = '|Angle|',
  'AGL' = 'Above ground level',
  'VerticalRate' = 'Vertical rate',
  'KPH' = 'KPH'
)

eagle_clustered %>%
  pivot_longer( #pivoting the data
    cols = all_of(boxplot_variables),
    names_to = "variable",
    values_to = "value"
  ) %>%
ggplot(aes(x = cluster, y = value, fill = cluster)) +
  geom_boxplot(outlier.alpha = 0.2) + #create boxplot
  
  # use the labeller argument to apply custom labels to the facets
  facet_wrap(~ variable, scales = "free_y", labeller = as_labeller(variable_labels)) +
  
  # update scale fill manual to map the new labels to the desired colors
  scale_fill_manual(
    values = c(
      "Ascending" = "blue",
      "Flapping" = "darkorange",
      "Gliding" = "darkgreen",
      "Perching" = "red"
    )
  ) + 
  
  theme_minimal() +
  labs(
    title = "Inter-Cluster Differences Across Movement Variables",
    x = "Behavioral Cluster", #changed x axis label to be more descriptive
    y = "Value"
  ) +
  theme(# this is the theme
    strip.text = element_text(size = 12, face = "bold"),
    legend.position = "none"
  )

Appendix

# build a data frame for a single flight segment of one eagle
eagle112 <- eagle_data %>% 
  filter(Animal_ID == '112') %>% # keep animal id 112
  filter(segment_length == '608') # keep the segment of length 608

eagle112_numeric <- eagle112 %>% select(where(is.numeric)) # keep the numeric columns
# these columns became NaN after scaling (they are constant within one segment), so they are removed first
eagle112_numeric_clean <- eagle112_numeric %>% select(-Animal_ID, -segment_id, -segment_length)
eagle112_scaled <- scale(eagle112_numeric_clean) # scale data
kmeans_eagle112 <- kmeans(eagle112_scaled, centers = 4) # 4 clusters using k-means
eagle112$cluster_k4 <- factor(kmeans_eagle112$cluster) # attach the cluster labels

# this is a plot that will plot the eagle 112 flight path for the flight segment
ggplot(eagle112, aes(x = X, y = Y, color = cluster_k4)) + #color are the different clusters
  geom_point(size = 1) + # dots for each GPS fix
  coord_equal() +
  theme_minimal() +
  scale_color_manual( #color code clusters
    values = c(
      "1" = "blue",      # Map color to original cluster '1'
      "2" = "darkorange", # Map color to original cluster '2'
      "3" = "darkgreen",  # Map color to original cluster '3'
      "4" = "red"       # Map color to original cluster '4'
    ),
    labels = c(# label the clusters
      "1" = "perching", # Map original cluster '1' to new label 'perching'
      "2" = "ascending",  # Map original cluster '2' to new label 'ascending'
      "3" = "flapping",  # Map original cluster '3' to new label 'flapping'
      "4" = "gliding"   # Map original cluster '4' to new label 'gliding'
    )
  ) +
  labs(
    title = "Flight Path of Eagle 112 Colored by Movement Cluster", #title
    x = "X Coordinate", #x axis title
    y = "Y Coordinate",# y axis title
    color = "Movement Behavior"
  )