Analyzing Fish Migration Patterns Using K-means Clustering

When I embarked on this project to analyze fish migration patterns with K-means clustering, my main objective was to pinpoint common routes and key gathering spots for fish populations during migration. Analyses like this matter because they can shed light on how environmental conditions shape migration pathways, and they can support conservation efforts.

I started by simulating data representing hypothetical locations (latitude and longitude) for 300 fish observations. This let me visualize where observations concentrate and look for potential clusters in their migration paths. Because the dataset was simulated, I could experiment with the analysis without the constraints of real-world data collection.
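As a sanity check on the method, one option is to simulate data that does contain separate groups and confirm that K-means recovers them. The sketch below is a hypothetical variant, not the dataset used in this post: the three group centers (placed roughly at the middle of the latitude and longitude ranges described later for the blue, red, and green clusters), the spread of 1.5 degrees, and the 100 points per group are all assumptions. It uses base R only.

```r
set.seed(123)

# Hypothetical group centers (longitude, latitude): assumptions for this sketch only.
group_centers <- data.frame(
  Longitude = c(-25, -15, -25),
  Latitude  = c(57.5, 52.5, 45)
)
n_per_group <- 100
true_group <- rep(1:3, each = n_per_group)

# Draw each point around its group's center with an assumed SD of 1.5 degrees.
fish_groups <- data.frame(
  Longitude = rnorm(3 * n_per_group, mean = group_centers$Longitude[true_group], sd = 1.5),
  Latitude  = rnorm(3 * n_per_group, mean = group_centers$Latitude[true_group], sd = 1.5)
)

# Cluster the standardized coordinates, as in the main analysis.
fit <- kmeans(scale(fish_groups), centers = 3, nstart = 25)

# Cross-tabulate the simulated group against the K-means label.
recovery <- table(simulated = true_group, kmeans = fit$cluster)
print(recovery)
```

If the groups are well separated, each row of the table should put nearly all of its 100 points in a single column; the column numbers themselves are arbitrary labels.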

I opted for K-means clustering because it partitions coordinate data into compact groups around centroids, and those groups can be read as candidate migration destinations or routes. It is also simple to run and interpret, which suited the geographic focus of the study.
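For reference, K-means in its standard form looks for the partition of the observations into k clusters C_1, ..., C_k that minimizes the total within-cluster sum of squared Euclidean distances. Here x_i is the standardized (longitude, latitude) pair for observation i and μ_j is the centroid of cluster j. Treating coordinates as Euclidean is an approximation: a degree of longitude spans roughly cos(latitude) times its equatorial length, so at 60°N it covers about half the ground it does at the equator.

```latex
\min_{C_1,\dots,C_k} \sum_{j=1}^{k} \sum_{i \in C_j} \lVert \mathbf{x}_i - \boldsymbol{\mu}_j \rVert^2,
\qquad
\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{i \in C_j} \mathbf{x}_i
```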

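K-means refines the clusters iteratively. In its basic form (Lloyd's algorithm), it assigns each point to its nearest centroid, moves each centroid to the mean coordinates of the points assigned to it, and repeats until the assignments stop changing. The kmeans() function runs this loop internally (by default with the Hartigan-Wong algorithm, a refinement of the same idea), and setting nstart = 25 repeats it from 25 random starting configurations and keeps the solution with the lowest within-cluster sum of squares, which reduces the risk of settling on a poor local optimum.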
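To make the assign-and-update loop concrete, here is a minimal base R sketch of Lloyd's algorithm. It is illustrative only: lloyd_kmeans() is a hypothetical helper written for this example, and kmeans() itself defaults to the Hartigan-Wong algorithm, so its results can differ slightly.

```r
# Hypothetical helper: a bare-bones Lloyd's K-means for illustration only.
lloyd_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k randomly chosen observations as the initial centroids.
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  assignment <- integer(nrow(x))
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance from every point to every centroid.
    d <- sapply(seq_len(k), function(j) colSums((t(x) - centers[j, ])^2))
    new_assignment <- max.col(-d, ties.method = "first")
    if (all(new_assignment == assignment)) break  # assignments stable: converged
    assignment <- new_assignment
    # Update step: move each centroid to the mean of its assigned points.
    for (j in seq_len(k)) {
      if (any(assignment == j)) {
        centers[j, ] <- colMeans(x[assignment == j, , drop = FALSE])
      }
    }
  }
  list(cluster = assignment, centers = centers, iter = iter)
}

# Usage on data simulated the same way as in this post.
set.seed(123)
fish_data <- data.frame(
  Longitude = rnorm(300, mean = -20, sd = 5),
  Latitude  = rnorm(300, mean = 50, sd = 5)
)
scaled_fish_data <- scale(fish_data)
fit <- lloyd_kmeans(scaled_fish_data, k = 3)
print(table(fit$cluster))
```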

The result was a set of cluster centers marking the areas where the simulated observations concentrate. In real tracking data, areas like these could be of high ecological importance, for example as feeding or breeding grounds; here they are candidates to test rather than confirmed findings.
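Because the clustering ran on standardized data, kmeans_result$centers is in standard-deviation units rather than degrees. A minimal sketch for converting the centers back to longitude and latitude, assuming the same simulation and settings as the post's code (rewritten with a base R data.frame instead of a tibble so it runs on its own):

```r
set.seed(123)
fish_data <- data.frame(
  Longitude = rnorm(300, mean = -20, sd = 5),
  Latitude  = rnorm(300, mean = 50, sd = 5)
)
scaled_fish_data <- scale(fish_data)
kmeans_result <- kmeans(scaled_fish_data, centers = 3, nstart = 25)

# kmeans() reports centers in standardized units; reverse scale() by
# multiplying by each column's SD and adding back its mean.
centers_deg <- sweep(kmeans_result$centers, 2, attr(scaled_fish_data, "scaled:scale"), FUN = "*")
centers_deg <- sweep(centers_deg, 2, attr(scaled_fish_data, "scaled:center"), FUN = "+")
print(round(centers_deg, 2))
```

Each converted center equals the mean longitude and latitude of the points in that cluster, so the values can be read directly as map coordinates.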

Using K-means clustering, I could examine the simulated migration patterns both visually and quantitatively. The approach captures a snapshot of where observations concentrate and provides a framework that could be extended to real data in further ecological studies and conservation work.

The scatter plot produced by the code below shows the clustering result on the simulated dataset, with three clusters colored blue, red, and green. One caution before interpreting them: all 300 points were drawn from a single normal distribution centered at 20°W, 50°N, so K-means is partitioning one continuous point cloud rather than separating naturally distinct groups. The ecological readings that follow are therefore hypotheses about what similar clusters could mean in real tracking data, not conclusions drawn from this simulation.

Blue Cluster (Cluster 2): Located primarily between latitudes 55 and 60 and longitudes -30 to -20, this is the northernmost cluster and could correspond to a colder, northern migratory route. I noticed that it had the densest concentration of points. In real data, a dense cluster like this could mark a route preferred by a large share of the population, perhaps because of abundant food or favorable breeding conditions in these northern waters.

Red Cluster (Cluster 1): Spread across latitudes 50 to 55 and longitudes -20 to -10, this cluster sits slightly south and east of the blue cluster. Its points are more spread out than those in the blue cluster, which may reflect a wider range of movement within this middle latitude band, for example a transitional route that fish use differently from season to season.

Green Cluster (Cluster 3): Spanning latitudes 40 to 50 and longitudes -30 to -20, this is the southernmost of the three clusters. Its points are more dispersed than those in the blue cluster, which could reflect a less favored route, for example because of water temperature or lower food availability.
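To put numbers on descriptions such as "densest" and "more dispersed", a per-cluster summary helps. This sketch assumes the post's simulation and clustering settings. The mean distance to the cluster center, in degrees, is one simple dispersion measure chosen for illustration (it treats a degree of longitude and a degree of latitude as equal), not the only option. In the post's plot, cluster numbers map to colors as 1 = red, 2 = blue, 3 = green, following the order of the values given to scale_color_manual().

```r
set.seed(123)
fish_data <- data.frame(
  Longitude = rnorm(300, mean = -20, sd = 5),
  Latitude  = rnorm(300, mean = 50, sd = 5)
)
scaled_fish_data <- scale(fish_data)
kmeans_result <- kmeans(scaled_fish_data, centers = 3, nstart = 25)
fish_data$cluster <- kmeans_result$cluster

# One row per cluster: size, bounding ranges, and average distance to the center.
cluster_summary <- do.call(rbind, lapply(split(fish_data, fish_data$cluster), function(d) {
  center <- c(mean(d$Longitude), mean(d$Latitude))
  data.frame(
    cluster = d$cluster[1],
    n       = nrow(d),
    lon_min = min(d$Longitude), lon_max = max(d$Longitude),
    lat_min = min(d$Latitude),  lat_max = max(d$Latitude),
    # Smaller mean distance = more tightly packed cluster.
    mean_dist = mean(sqrt((d$Longitude - center[1])^2 + (d$Latitude - center[2])^2))
  )
}))
print(cluster_summary, digits = 3)
```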

Examining these clusters helped me form hypotheses about the environmental and ecological factors that could influence fish migration. The plot gives a visual breakdown, and the kmeans() output (cluster sizes, centers, and within-cluster sums of squares) gives a quantitative one. For instance, the dense aggregation in the blue cluster could indicate favorable survival conditions, whereas the dispersion in the green cluster might point to less favorable ones.
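The kmeans() result also supports summary measures of how well the clusters are separated. This sketch computes the share of total variance captured by the partition (betweenss / totss) and the average silhouette width using silhouette() from the cluster package, which the post already loads. It assumes the same simulation and settings as the post's code.

```r
library(cluster)  # provides silhouette()

set.seed(123)
fish_data <- data.frame(
  Longitude = rnorm(300, mean = -20, sd = 5),
  Latitude  = rnorm(300, mean = 50, sd = 5)
)
scaled_fish_data <- scale(fish_data)
kmeans_result <- kmeans(scaled_fish_data, centers = 3, nstart = 25)

# Share of total variance explained by the partition.
variance_explained <- kmeans_result$betweenss / kmeans_result$totss

# Average silhouette width over all points (range -1 to 1).
sil <- silhouette(kmeans_result$cluster, dist(scaled_fish_data))
avg_silhouette <- summary(sil)$avg.width

cat(sprintf("Between/total SS: %.2f\nAverage silhouette width: %.2f\n",
            variance_explained, avg_silhouette))
```

Silhouette values near 1 mean a point sits well inside its own cluster, values near 0 mean it lies close to the boundary with a neighboring cluster, and negative values suggest it may fit a neighboring cluster better.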

This analysis deepened my understanding of how clustering can describe fish migration and highlighted directions for further research and conservation work. The differences between the clusters suggest where environmental factors such as water temperature and food supply would be worth investigating. Moving forward, I can apply the same workflow to real tracking data to inform more detailed ecological studies and, potentially, conservation strategies for critical marine habitats.

# I'm loading the necessary libraries for data manipulation and visualization.
library(tidyverse)  # tibble for data handling, ggplot2 for plotting

library(cluster)    # Optional here: kmeans() comes from base R's stats package; cluster adds tools such as silhouette()

# I'm simulating fish migration data. I'm imagining coordinates for 300 fish observations.
set.seed(123)  # I'm setting a seed for reproducibility.
fish_data <- tibble(
  Longitude = rnorm(300, mean = -20, sd = 5),  # Simulated longitudes around a central migration point.
  Latitude = rnorm(300, mean = 50, sd = 5)     # Simulated latitudes around a central migration point.
)

# I'm standardizing the data because K-means clustering is sensitive to the scale of the data.
scaled_fish_data <- scale(fish_data)

# Performing K-means clustering. I'm initially trying with 3 clusters.
kmeans_result <- kmeans(scaled_fish_data, centers = 3, nstart = 25)  # Using multiple starts to find a good solution.

# I'm visualizing the clusters to interpret the common migration patterns.
ggplot(fish_data, aes(x = Longitude, y = Latitude)) +
  geom_point(aes(color = factor(kmeans_result$cluster)), alpha = 0.6) +  # I'm coloring points based on cluster assignment.
  scale_color_manual(values = c("red", "blue", "green")) +  # I'm using distinct colors for clarity.
  labs(title = "Fish Migration Patterns", x = "Longitude", y = "Latitude", color = "Cluster") +  # Readable legend title.
  theme_minimal()  # I prefer a minimal theme for visual clarity.
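The code above fixes the number of clusters at 3. One common way to check that choice is an elbow plot of the total within-cluster sum of squares against k. A minimal sketch, assuming the same simulated data; the range k = 1 to 8 is an arbitrary choice:

```r
set.seed(123)
fish_data <- data.frame(
  Longitude = rnorm(300, mean = -20, sd = 5),
  Latitude  = rnorm(300, mean = 50, sd = 5)
)
scaled_fish_data <- scale(fish_data)

# Total within-cluster sum of squares for each candidate number of clusters.
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(scaled_fish_data, centers = k, nstart = 25)$tot.withinss
})

# Elbow plot: look for the k after which the curve stops dropping steeply.
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

For data drawn from a single distribution, as here, the curve may decline smoothly without a clear elbow, which is itself a hint that the number of clusters is a modeling choice rather than a property of the data.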

Avery Holloman

2025-01-03