Before integrating the data into the RShiny application, we perform preprocessing after the initial exploratory data analysis (EDA). To enhance performance, all necessary calculations and data transformations are conducted at this stage. The processed data is then exported as .geojson or .csv files, ready for seamless import into the RShiny app.
# Save processed data
st_write(bird_tracks, "02_preprocessing_export/bird_tracks.geojson", delete_dsn = TRUE)
Deleting source `02_preprocessing_export/bird_tracks.geojson' using driver `GeoJSON'
Writing layer `bird_tracks' to data source
`02_preprocessing_export/bird_tracks.geojson' using driver `GeoJSON'
Writing 385 features with 2 fields and geometry type Line String.
Preprocess Cluster Analysis (DBScan)
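For each bird, we apply DBScan clustering with the minimum points (minPts) parameter set to 15, once across all seasons and once per season. The eps parameter is calculated for each run as the mean of the k-nearest neighbor (kNN) distances with k = minPts / 2, so the neighborhood radius adapts to the point density of the data being clustered. Subsets with fewer than minPts points are skipped and keep NA in the cluster columns.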
# Initialize cluster columns
bird_data$cluster_season <- NA
bird_data$cluster_all <- NA

# Set parameters
minPts <- 15
unique_ids <- unique(bird_data$id)

for (bird_id in unique_ids) {
  # --- DBScan for All Seasons ---
  bird_all_data <- bird_data %>% filter(id == bird_id)
  coords_all <- st_coordinates(bird_all_data)
  if (nrow(coords_all) >= minPts) {
    knn_distances_all <- dbscan::kNNdist(coords_all, k = minPts / 2)
    eps_all <- mean(knn_distances_all, na.rm = TRUE)
    db_all <- dbscan(coords_all, eps = eps_all, minPts = minPts)
    bird_data$cluster_all[bird_data$id == bird_id] <- as.factor(db_all$cluster)
  }

  # --- DBScan by Season ---
  for (season_name in c("Breeding Time", "Harvesting Time", "Winter")) {
    bird_season_data <- bird_data %>% filter(id == bird_id, season == season_name)
    coords_season <- st_coordinates(bird_season_data)
    if (nrow(coords_season) >= minPts) {
      knn_distances_season <- dbscan::kNNdist(coords_season, k = minPts / 2)
      eps_season <- mean(knn_distances_season, na.rm = TRUE)
      db_season <- dbscan(coords_season, eps = eps_season, minPts = minPts)
      bird_data$cluster_season[bird_data$id == bird_id & bird_data$season == season_name] <- as.factor(db_season$cluster)
    }
  }
}

# Save the processed data with new cluster columns
st_write(bird_data, "02_preprocessing_export/bird_data.geojson", delete_dsn = TRUE)
Deleting source `02_preprocessing_export/bird_data.geojson' using driver `GeoJSON'
Writing layer `bird_data' to data source
`02_preprocessing_export/bird_data.geojson' using driver `GeoJSON'
Writing 49757 features with 15 fields and geometry type Point.
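The eps heuristic described above can be tried on synthetic coordinates. The following is a minimal sketch, not the project's data: the two point blobs, the single distant outlier, and the names min_pts, coords, and stored are assumptions for illustration, and k is rounded down to an integer here. The last step also shows how labels end up stored when a factor is assigned into an NA-initialized column, which matters for any later step that treats the smallest cluster ID as noise.

```r
library(dbscan)

set.seed(42)
min_pts <- 15

# Synthetic data (assumption): two dense blobs plus one far-away point
blob_a  <- cbind(rnorm(100, 0, 0.1), rnorm(100, 0, 0.1))
blob_b  <- cbind(rnorm(100, 5, 0.1), rnorm(100, 5, 0.1))
outlier <- cbind(100, 100)
coords  <- rbind(blob_a, blob_b, outlier)

# eps = mean distance to the k-th nearest neighbor, with k = floor(minPts / 2)
knn_dist <- kNNdist(coords, k = floor(min_pts / 2))
eps <- mean(knn_dist, na.rm = TRUE)

# Run DBSCAN; cluster 0 is the noise label used by the dbscan package
db <- dbscan(coords, eps = eps, minPts = min_pts)

# Assigning a factor into an NA (logical) vector keeps only the integer
# level codes, so the labels 0, 1, 2 are stored as 1, 2, 3
stored <- rep(NA, nrow(coords))
stored[seq_along(stored)] <- as.factor(db$cluster)

table(original = db$cluster, stored = stored)
```

In this sketch the noise label 0 is stored as 1 only because noise is present. If a run produced no noise points, the smallest stored ID would be a real cluster, so storing db$cluster directly would be one way to keep 0 as an unambiguous noise label.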
Visualize
To check the preprocessing and clustering, we plot the clustering result for a single bird. This visual check helps confirm that DBScan, with the calculated eps values, identifies plausible clusters.
# Visualize Clusters for Specific ID
target_id <- '.458'
print(target_id)
[1] ".458"
bird_data_filtered <- bird_data[bird_data$id == target_id, ]

if (nrow(bird_data_filtered) > 0) {
  tmap_mode("view")
  tm_shape(bird_data_filtered) +
    tm_dots(col = "cluster_all", palette = "Set1", title = "Cluster ID") +
    tm_layout(title = paste("Clusters for ID", target_id))
} else {
  print("No data available for the specified ID")
}
ℹ tmap mode set to "view".
── tmap v3 code detected ───────────────────────────────────────────────────────
[v3->v4] `tm_tm_dots()`: migrate the argument(s) related to the scale of the
visual variable `fill` namely 'palette' (rename to 'values') to fill.scale =
tm_scale(<HERE>).
[v3->v4] `tm_dots()`: use 'fill' for the fill color of polygons/symbols
(instead of 'col'), and 'col' for the outlines (instead of 'border.col').
[tm_dots()] Argument `title` unknown.
[v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
[cols4all] color palettes: use palettes from the R package cols4all. Run
`cols4all::c4a_gui()` to explore them. The old palette name "Set1" is named
"brewer.set1"
Prepare Data for Export of DB-Clusters
DBScan Algorithm
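In this step, we structure the data for export. For each bird and each season (plus all seasons combined), the cluster with the smallest ID is excluded, and every remaining DBScan cluster with more than three points is converted to a convex-hull polygon. The polygons for all birds are collected in a single sf object with the columns id, cluster_id, and season and written to one .geojson file, which keeps the data organized for subsequent analysis.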
Reading layer `bird_data' from data source
`C:\Users\claud\Documents\Geographie-Studium-10. Semester\GEO880\Project\GEO880_Project\Project\02_preprocessing_export\bird_data.geojson'
using driver `GeoJSON'
Simple feature collection with 49757 features and 15 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 9.34394 ymin: 46.42356 xmax: 10.47204 ymax: 46.95712
Geodetic CRS: WGS 84
# Check and set CRS to WGS84 (EPSG:4326)
target_crs <- 4326
if (st_crs(bird_data)$epsg != target_crs) {
  bird_data <- st_transform(bird_data, crs = target_crs)
}

# Initialize lists for storing polygons and attributes
convex_hulls_polygons <- st_sfc(crs = target_crs)
convex_hulls_ids <- c()
convex_hulls_clusters <- c()
convex_hulls_seasons <- c()

# Iterate through each bird
for (id_name in unique(bird_data$id)) {
  # Iterate through each season and "All Seasons"
  for (season_name in c("Breeding Time", "Harvesting Time", "Winter", "All Seasons")) {
    if (season_name == "All Seasons") {
      selected_data <- bird_data[bird_data$id == id_name, ]
      cluster_column <- "cluster_all"
    } else {
      selected_data <- bird_data[bird_data$id == id_name & bird_data$season == season_name, ]
      cluster_column <- "cluster_season"
    }

    # Ensure CRS is consistent
    if (st_crs(selected_data)$epsg != target_crs) {
      selected_data <- st_transform(selected_data, crs = target_crs)
    }

    # Get unique clusters
    clusters <- unique(na.omit(selected_data[[cluster_column]]))

    # Skip if no clusters found
    if (length(clusters) == 0) next

    # Identify the smallest cluster ID and exclude it
    min_cluster <- min(clusters)
    clusters <- clusters[clusters != min_cluster]

    # Process each cluster
    for (cluster_id in clusters) {
      # Filter by cluster
      cluster_data <- selected_data[selected_data[[cluster_column]] == cluster_id, ]

      # Ensure sufficient points for convex hull
      if (nrow(cluster_data) > 3) {
        hull <- st_convex_hull(st_union(cluster_data))

        # Store data
        convex_hulls_polygons <- c(convex_hulls_polygons, hull)
        convex_hulls_ids <- c(convex_hulls_ids, id_name)
        convex_hulls_clusters <- c(convex_hulls_clusters, cluster_id)
        convex_hulls_seasons <- c(convex_hulls_seasons, season_name)
      }
    }
  }
}

# Create sf object with 3 columns: id, cluster_id, season
convex_hulls_sf <- st_as_sf(
  data.frame(
    id = convex_hulls_ids,
    cluster_id = convex_hulls_clusters,
    season = convex_hulls_seasons,
    geometry = convex_hulls_polygons
  ),
  crs = target_crs
)

# Save as GeoJSON
st_write(convex_hulls_sf, "02_preprocessing_export/DB_Scan_polygons.geojson", delete_dsn = TRUE)
Deleting source `02_preprocessing_export/DB_Scan_polygons.geojson' using driver `GeoJSON'
Writing layer `DB_Scan_polygons' to data source
`02_preprocessing_export/DB_Scan_polygons.geojson' using driver `GeoJSON'
Writing 593 features with 3 fields and geometry type Polygon.
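The hull step for a single cluster can be reproduced in isolation. The sketch below uses five made-up points without a CRS (the names pts and cluster_points are assumptions, not objects from the project) to show that st_union() first merges the points into one geometry and st_convex_hull() then returns a single polygon enclosing them.

```r
library(sf)

# Toy cluster (assumption): four corners of a unit square plus its center
pts <- data.frame(
  x = c(0, 1, 1, 0, 0.5),
  y = c(0, 0, 1, 1, 0.5)
)
cluster_points <- st_as_sf(pts, coords = c("x", "y"))

# Merge all points into one MULTIPOINT, then take its convex hull
hull <- st_convex_hull(st_union(cluster_points))

hull_type <- as.character(st_geometry_type(hull))  # geometry type of the hull
hull_area <- as.numeric(st_area(hull))             # planar area, no CRS set
```

Without the st_union() call, st_convex_hull() would be applied to each point separately, which is why the union comes first.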
Calculate Overlap Matrix (All Seasons)
The overlap matrix identifies intersecting DBScan clusters between birds. The function calculate_overlap_matrix() loops over every pair of birds, intersects their cluster polygons with st_intersection(), and fills a symmetric matrix with binary values: 1 if any polygons of the two birds overlap, 0 otherwise. The resulting matrix is exported as a .csv file for further analysis.
# Function to calculate the overlap matrix
calculate_overlap_matrix <- function(DB_Scan_data) {
  # Extract unique bird IDs
  bird_ids <- unique(DB_Scan_data$id)
  n <- length(bird_ids)

  # Create an empty overlap matrix
  overlap_matrix <- matrix(0, nrow = n, ncol = n, dimnames = list(bird_ids, bird_ids))

  # Compare each pair of birds once (upper triangle)
  for (i in 1:(n - 1)) {
    bird1_data <- DB_Scan_data[DB_Scan_data$id == bird_ids[i], ]
    for (j in (i + 1):n) {
      bird2_data <- DB_Scan_data[DB_Scan_data$id == bird_ids[j], ]

      # Compute the intersection
      intersection <- st_intersection(bird1_data, bird2_data)

      # Check whether an intersection exists and mirror the result
      overlap_matrix[i, j] <- ifelse(nrow(intersection) > 0, 1, 0)
      overlap_matrix[j, i] <- overlap_matrix[i, j]
    }
  }
  return(overlap_matrix)
}

# Example call
overlap_matrix <- calculate_overlap_matrix(convex_hulls_sf)
write.csv(overlap_matrix, '02_preprocessing_export/DB_Scan_Matrix.csv')
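Since only a binary flag is needed, the pairwise check could also be built on the predicate st_intersects(), which tests all polygon pairs in one call without constructing intersection geometries. The following is a sketch of that alternative, not the method used above. The helper square() and the toy polygons are hypothetical, and it assumes that the predicate and the st_intersection() check agree for the purpose of a 0/1 flag.

```r
library(sf)

# Hypothetical helper: axis-aligned square with lower-left corner (x0, y0)
square <- function(x0, y0, size = 2) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# Toy hull polygons (assumption): bird A has two clusters, B and C one each
toy_hulls <- st_sf(
  id = c("A", "A", "B", "C"),
  geometry = st_sfc(square(0, 0), square(10, 10), square(1, 1), square(20, 20))
)

# One call yields a logical polygon-by-polygon matrix
hits <- st_intersects(toy_hulls, toy_hulls, sparse = FALSE)

# Collapse polygon-level hits to a bird-by-bird 0/1 matrix
bird_ids <- unique(toy_hulls$id)
overlap <- matrix(0, length(bird_ids), length(bird_ids),
                  dimnames = list(bird_ids, bird_ids))
for (a in bird_ids) {
  for (b in bird_ids) {
    if (a != b) {
      overlap[a, b] <- as.integer(any(hits[toy_hulls$id == a, toy_hulls$id == b]))
    }
  }
}
overlap
```

The remaining loops only index a precomputed logical matrix, so the geometric work happens once rather than once per bird pair.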
Function to find related birds
The find_related_birds() function identifies birds with overlapping DBScan clusters based on the overlap matrix. By inputting a specific bird ID, the function extracts all bird IDs with intersecting clusters (indicated by a value of 1 in the matrix). This provides a quick way to find potential interactions or shared areas among birds.
find_related_birds <- function(matrix_data, bird_id) {
  if (!(bird_id %in% rownames(matrix_data))) {
    stop("The specified bird ID does not exist in the matrix.")
  }

  # Find the row that corresponds to bird_id
  bird_row <- matrix_data[as.character(bird_id), ]

  # Extract the IDs of the related birds (columns with value 1)
  similar_birds <- colnames(matrix_data)[which(bird_row == 1)]
  return(similar_birds)
}

find_related_birds(overlap_matrix, "7934")
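The lookup can be checked on a small hand-made matrix. In this sketch the bird IDs "A", "B", and "C" and the matrix toy_matrix are invented for illustration. The function body repeats the logic described above so that the example runs on its own.

```r
# Same lookup logic as described above, repeated so the sketch is self-contained
find_related_birds <- function(matrix_data, bird_id) {
  if (!(bird_id %in% rownames(matrix_data))) {
    stop("The specified bird ID does not exist in the matrix.")
  }
  bird_row <- matrix_data[as.character(bird_id), ]
  colnames(matrix_data)[which(bird_row == 1)]
}

# Toy overlap matrix (assumption): A and B overlap, C overlaps with no one
toy_ids <- c("A", "B", "C")
toy_matrix <- matrix(
  c(0, 1, 0,
    1, 0, 0,
    0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(toy_ids, toy_ids)
)

related_to_a <- find_related_birds(toy_matrix, "A")  # "B"
related_to_c <- find_related_birds(toy_matrix, "C")  # character(0)
```

When the exported .csv is read back, read.csv() by default rewrites column names that are not valid R names (for example, names starting with a digit get an "X" prefix). Reading it with row.names = 1 and check.names = FALSE, then converting with as.matrix(), is one way to keep row and column IDs identical.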