To identify the main factors influencing soil properties, I used Principal Component Analysis (PCA) to transform the original 18, correlated variables into a smaller set of uncorrelated Principal Components (PCs), which capture the maximum variance in the data (Souza, 2025). I carried out the PCA using the PCA() function from the FactoMineR package in R (Husson et al., 2024). I determined the significance of each principal component by examining its eigenvalue; components with eigenvalues greater than one were considered meaningful. I examined variable loadings and correlation structures to interpret the ecological significance of each axis.
✅ Interpretation: T#The eigenvalues represent the amount of variance captured
by each principal component (PC).The first five components have eigenvalues greater than 1, suggesting they retain meaningful information.
✅ Interpretation: Cumulative variance: PC1 explains 41.8% of the total variance. PC2 adds 24.9%, bringing the cumulative variance to 66.7% #Variable Contributions to Principal Components (Correlation with PCs)
✅ Interpretation: PC1:These properties are strongly affected by fertilisers especially NPK Strong positive correlations: CEC (0.98), Ca (0.98), Mg (0.95), B (0.84), pH (0.89). High values of these components show high use of the fertilisers and lime. Further, the soils showed strong negative correlation: Al (-0.65), Sand (-0.54), ExAc (-0.51), Fe (-0.49). Suggests PC1 represents cation exchange capacity (CEC) and soil fertility.
✅ Interpretation: PC2:High PC2: Probably in conservation areas and unfarmed hilly areas (due to organic matter accumulation).Strongest positive correlations: TN (0.97), SOC (0.94),EC (0.70), Sand (0.62). Negative correlations Clay (-0.69), Na (-0.60) Indicates PC2 captures organic matter and soil texture gradients.
Soil spatial statistics
To test for multivariate differences in overall soil composition among the 16 clusters, I conducted a permutational multivariate analysis of variance (PERMANOVA) with Euclidean distances, based on standardised soil variables (adonis2 function, vegan package, Oksanen et al., 2025). I identified which specific soil variables exhibited significant spatial variation by applying Kruskal-Wallis tests to each variable individually, using a non-parametric approach due to violations of normality. I subsequently tested for spatial differences in the extracted scores of the first two principal components (PC1 and PC2) among clusters using one-way ANOVA.
Show Code
# Shapiro-Wilk test for all raw soil variablessoil_normality <- soil_data %>%pivot_longer(cols =c(pH, SOC, TN, EC, P, Al, B, Ca, Fe, K, Mg, Na, CEC, ExAc, PSI, Clay, Silt, Sand), names_to ="Variable", values_to ="Value") %>%group_by(Variable) %>%shapiro_test(Value)# PERMANOVA Does the overall soil change between clusters?soil_matrix <- soil_data %>%ungroup() %>% dplyr::select(-cluster, -plot)adonis2(soil_matrix ~ cluster, data = soil_data)
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = soil_matrix ~ cluster, data = soil_data)
Df SumOfSqs R2 F Pr(>F)
Model 15 1.9487 0.59707 14.028 0.001 ***
Residual 142 1.3151 0.40293
Total 157 3.2638 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
# Kruskal-Wallis Which specific nutrients or textures are different?soil_kruskal <- soil_data %>%pivot_longer(cols =c(pH, SOC, TN, EC, P, Al, B, Ca, Fe, K, Mg, Na, CEC, ExAc, PSI, Clay, Silt, Sand),names_to ="Variable",values_to ="Value" ) %>%group_by(Variable) %>%kruskal_test(Value ~ cluster) %>%add_significance() %>%filter(p <0.05) soil_kruskal %>%kable(caption ="Significant Soil Property Differences Across Clusters") %>%kable_styling(bootstrap_options =c("striped", "hover")) %>%scroll_box(width ="100%", height ="400px")
Significant Soil Property Differences Across Clusters
Variable
.y.
n
statistic
df
p
method
p.signif
Al
Value
158
97.01431
15
0.00e+00
Kruskal-Wallis
****
B
Value
158
71.90914
15
0.00e+00
Kruskal-Wallis
****
CEC
Value
158
94.06721
15
0.00e+00
Kruskal-Wallis
****
Ca
Value
158
94.12593
15
0.00e+00
Kruskal-Wallis
****
Clay
Value
158
76.87263
15
0.00e+00
Kruskal-Wallis
****
EC
Value
158
47.05928
15
3.60e-05
Kruskal-Wallis
****
ExAc
Value
158
44.76172
15
8.35e-05
Kruskal-Wallis
****
Fe
Value
158
39.87837
15
4.73e-04
Kruskal-Wallis
***
K
Value
158
86.76481
15
0.00e+00
Kruskal-Wallis
****
Mg
Value
158
93.31948
15
0.00e+00
Kruskal-Wallis
****
Na
Value
158
85.33213
15
0.00e+00
Kruskal-Wallis
****
P
Value
158
44.24040
15
1.01e-04
Kruskal-Wallis
***
PSI
Value
158
52.21129
15
5.20e-06
Kruskal-Wallis
****
SOC
Value
158
47.75408
15
2.79e-05
Kruskal-Wallis
****
Sand
Value
158
77.46504
15
0.00e+00
Kruskal-Wallis
****
Silt
Value
158
35.38149
15
2.17e-03
Kruskal-Wallis
**
TN
Value
158
53.98273
15
2.60e-06
Kruskal-Wallis
****
pH
Value
158
105.81762
15
0.00e+00
Kruskal-Wallis
****
Show Code
# PCA Test (PC1 & PC2 acrosee the landscape) ---summary(aov(PC1 ~ Cluster, data = pca_scores))
Df Sum Sq Mean Sq F value Pr(>F)
Cluster 15 733.2 48.88 15.27 <2e-16 ***
Residuals 142 454.6 3.20
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
summary(aov(PC2 ~ Cluster, data = pca_scores))
Df Sum Sq Mean Sq F value Pr(>F)
Cluster 15 237.9 15.859 4.802 1.83e-07 ***
Residuals 142 468.9 3.302
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
✅ Interpretation:
The PERMANOVA revealed that overall soil composition differed significantly among the 16 clusters (F = 14.03, p = 0.001). The cluster grouping explained 59.7% of the multivariate variance in soil properties. Kruskal-Wallis tests on individual soil variables confirmed that all 18 measured variables exhibited statistically significant spatial variation (all p < 0.01).
Plants Data
Rangelands Plants metacommunity
Following, Marcon et al. (2025) I used the entropart R package to create a MetaCommunity object.These objects integrated abundance data for rangeland plant species—including perennial and annual grasses, shrubs, and trees recorded along 28-m transects—as well as avian community data derived from both traditional point counts and BirdNET automated detections. In this study, each metacommunity was defined as the total collection of species recorded at the landscape level across 16 clusters, with each cluster treated as a local community. These metacommunities served as the computational basis for partitioning diversity, as it allows for the simultaneous calculation of local α (alpha), between-site β (beta) and total landscape γ (gamma) diversity while weighting each community according to its sample size (Marcon et al., 2025).
✅ Interpretation: Analysis of the plant community across the 16 clusters revealed significant spatial structuring. The analysis identified a total of 132 species distributed across 16 local communities, with an exceptionally high estimated sample coverage (0.9998), indicating that the survey nearly captured the full extent of the species richness present in the study area.
Rangeland Plants Community Alpha Diversity
I then evaluated the local community structure by calculating theα-diversity for each of the 16 clusters using Hill numbers (q = 0, 1, and 2) via the DivPart function of entropart with the “Best” estimator (Marcon et al., 2025). This provided measures of species richness (q = 0), Shannon-effective diversity (q = 1), and Simpson-effective diversity (q = 2). Diversity profiles were generated using the DivProfile function (entropart package) to visualise the transition from richness to dominance-weighted diversity (Marcon et al., 2025). Additionally, I calculated community evenness as the ratio of Simpson diversity to richness (q2 / q0).
Note: This data frame contains grasses, shrubs and trees collected along the 28 metres transect in each plot
Show Code
#****************************************************************all species in rangeland form# Alpha diversity for all(q=0,1,2)alpha.rangelands.0<-DivPart(q=0, mc.rangelands, Correction ="Best")summary(alpha.rangelands.0)
HCDT diversity partitioning of order 0 of metaCommunity mc.rangelands
Alpha diversity of communities:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16
55 45 50 61 57 50 56 65 47 63 52 50 44 55 48 55
Total alpha diversity of the communities:
[1] 53.3125
Beta diversity of the communities:
None
2.475967
Gamma diversity of the metacommunity:
None
132
To test for significant spatial differences in community composition, a PERMANOVA was performed on the plot-level plant species abundance matrix (n = 160) and at the point station level for birds (n = 96) using Bray–Curtis dissimilarity. Spatial variation in alpha diversity across the 16 clusters was evaluated using one-way ANOVA (stats package).
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = species_matrix ~ cluster, data = rangelands_plots, permutations = 999, method = "bray")
Df SumOfSqs R2 F Pr(>F)
Model 15 6.630 0.21198 2.5824 0.001 ***
Residual 144 24.647 0.78802
Total 159 31.277 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
# Normality of Residuals (Shapiro)plot_diversity_long <- plot_diversity %>%pivot_longer(cols =c(q0, q1, q2), names_to ="q_order", values_to ="value")shapiro_results <- plot_diversity_long %>%group_by(q_order) %>%shapiro_test(value)# Homogeneity of Variance (Levene's Test for q1)levene_q1 <-leveneTest(q1 ~ cluster, data = plot_diversity)print(shapiro_results)
# A tibble: 3 × 4
q_order variable statistic p
<chr> <chr> <dbl> <dbl>
1 q0 value 0.990 0.325
2 q1 value 0.978 0.0109
3 q2 value 0.969 0.00122
Show Code
print(levene_q1)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 15 0.8086 0.6667
144
Show Code
# ANOVA MODELS ---aov_q0 <-aov(q0 ~ cluster, data = plot_diversity)aov_q1 <-aov(q1 ~ cluster, data = plot_diversity)aov_q2 <-aov(q2 ~ cluster, data = plot_diversity)summary(aov_q0)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 976 65.05 2.184 0.00929 **
Residuals 144 4288 29.78
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
summary(aov_q1)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 511 34.07 4.157 2.58e-06 ***
Residuals 144 1180 8.20
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
summary(aov_q2)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 351.5 23.431 4.86 1.37e-07 ***
Residuals 144 694.3 4.821
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
# Run Kruskal-Wallis for all diversity orders to confirm the ANOVA test since q1 and q2 didnt meet the assumption of normalitykruskal_q0 <-kruskal.test(q0 ~ cluster, data = plot_diversity)kruskal_q1 <-kruskal.test(q1 ~ cluster, data = plot_diversity)kruskal_q2 <-kruskal.test(q2 ~ cluster, data = plot_diversity)list(q0 = kruskal_q0, q1 = kruskal_q1, q2 = kruskal_q2)
$q0
Kruskal-Wallis rank sum test
data: q0 by cluster
Kruskal-Wallis chi-squared = 30.776, df = 15, p-value = 0.009411
$q1
Kruskal-Wallis rank sum test
data: q1 by cluster
Kruskal-Wallis chi-squared = 49.428, df = 15, p-value = 1.493e-05
$q2
Kruskal-Wallis rank sum test
data: q2 by cluster
Kruskal-Wallis chi-squared = 54.44, df = 15, p-value = 2.217e-06
Show Code
#ANOVA AND KRUSKAL RESULTS ARE CONSISTENT
✅ Interpretation: One-way ANOVA revealed significant spatial variation in alpha diversity among the 16 clusters across all Hill-number orders (Table 4). Differences were significant for q = 0 (p = 0.009), q = 1 (p < 0.001), and q = 2 (p < 0.001). The strength of the cluster effect increased with q, indicating that dominant‑species diversity varied more strongly across the landscape than the species richness.
Rangeland Plants diversity profiles
Diversity profiles were generated using the DivProfile function (entropart package) to visualise the transition from richness to dominance-weighted diversity (Marcon et al., 2025).
I quantified species turnover (β-diversity) using abundance-based Bray–Curtis dissimilarity via the betapart package(Baselga et al., 2025). The beta diversity analysis, based on the abundance-weighted Bray-Curtis dissimilarity index, provides quantitative evidence of the spatial variation in species composition and abundance structure. The total dissimilarity (β BRAY) is divided into two components: β BRAY.BAL, representing balanced species turnover or replacement, and βBRAY.GRA, representing differences due to species richness and nestedness.
✅ Interpretation: The extremely high total dissimilarity (β.BRAY = 0.988) indicates near-complete differentiation in community structure across clusters. This was overwhelmingly driven by species turnover (β.BRAY.BAL = 0.818), with a minor contribution from nestedness resulting from abundance gradients (β.BRAY.GRA = 0.169). The statistical significance of this compositional differentiation was confirmed by PERMANOVA (F₁₅,₁₄₄ = 2.58, p < 0.001; R² = 0.212), which demonstrated that spatial grouping explained a substantial portion of community variation. These results demonstrate that strong species replacement and pronounced abundance gradients jointly shape landscape-level variation, resulting in a highly heterogeneous rangeland environment despite the shared presence of common dominant taxa.
Rangeland Plants Species rank curves
The species rank abundance curves is important to characterise structural shifts in plant communities across the landscape’s 16 clusters.
I used non-parametric Kruskal-Wallis tests to assess spatial differences in shrub and tree cover and volume across 16 clusters, following a significant deviation from normality in the data (Shapiro-Wilk, p < 0.001). The distributions were strongly right-skewed and contained zero values corresponding to plots without woody vegetation. I defined statistical significance for all models at α = 0.05.
Show Code
#TEST ASSUMPTIONSshrub_norm <-shapiro_test(residuals(lm(volume_ha.sh ~ Cluster, data = shrub_plot_summary)))tree_norm <-shapiro_test(residuals(lm(volume_ha.tr ~ Cluster, data = tree_reps)))shrub_var <-leveneTest(volume_ha.sh ~ Cluster, data = shrub_plot_summary)tree_var <-leveneTest(volume_ha.tr ~ Cluster, data = tree_reps)print(shrub_norm)
# A tibble: 5 × 8
Variable .y. n statistic df p method p.signif
<chr> <chr> <int> <dbl> <int> <dbl> <chr> <chr>
1 cover_percent.sh Value 160 53.8 15 0.00000287 Kruskal-Wall… ****
2 volume_ha.sh Value 160 54.1 15 0.00000254 Kruskal-Wall… ****
3 basal_area_ha.tr Value 160 45.6 15 0.0000616 Kruskal-Wall… ****
4 cover_percent.tr Value 160 45.6 15 0.0000616 Kruskal-Wall… ****
5 volume_ha.tr Value 160 45.2 15 0.0000708 Kruskal-Wall… ****
✅ Interpretation: The Kruskal-Wallis tests revealed statistically significant spatial variation across all five measured woody vegetation metrics: shrub metrics (percentage cover and stem volume per hectare) and tree metrics (basal area per hectare, stem volume per hectare, and percentage cover) (p < 0.001) (Table 5).
Point count
Birds metacommunity and alpha diversity
Same diversity measures as above. Metacommunity, alpha diversity, beta and gamma diverisity.
To test for significant spatial differences in community composition, a PERMANOVA was performed on the plot-level plant species abundance matrix (n = 160) and at the point station level for birds (n = 96) using Bray–Curtis dissimilarity. Spatial variation in alpha diversity across the 16 clusters was evaluated using one-way ANOVA (stats package).
Show Code
# 1. Prepare data at the Station level (Point Counts)birds_stations <- birds.data %>%# pivot_longer assumes your raw columns are the stations/pointspivot_longer(cols =!species, names_to ="station", values_to ="freq") %>%group_by(station, species) %>%summarise(freq =sum(freq, na.rm =TRUE), .groups ="drop") %>%pivot_wider(names_from = species, values_from = freq, values_fill =0) %>%# Assign 16 clusters with 6 stations each (ensure your data rows match 16 * 6 = 96)mutate(cluster =factor(paste0("C", rep(1:16, each =6)), levels =paste0("C", 1:16))) %>%relocate(cluster, station)# 2. Meta-Community & Alpha Diversity at Station Level# Transpose for entropart: species as rows, stations as columnsmc_birds_stations <- birds_stations %>%ungroup() %>% dplyr::select(-cluster) %>%column_to_rownames("station") %>%t() %>%as.data.frame() %>%MetaCommunity()# Hill Numbers (q=0, 1, 2) per stationq0_birds <-DivPart(q=0, mc_birds_stations, Correction ="Best")$CommunityAlphaDiversitiesq1_birds <-DivPart(q=1, mc_birds_stations, Correction ="Best")$CommunityAlphaDiversitiesq2_birds <-DivPart(q=2, mc_birds_stations, Correction ="Best")$CommunityAlphaDiversities# Create diversity dataframe for testingbirds_div_stats <- birds_stations %>% dplyr::select(cluster, station) %>%mutate(q0.b = q0_birds, q1.b = q1_birds, q2.b = q2_birds)# 3. PERMANOVA (Community Composition)# Does the species mix differ significantly between clusters?bird_species_matrix <- birds_stations %>%ungroup() %>% dplyr::select(-cluster, -station)birds_permanova <-adonis2(bird_species_matrix ~ cluster, data = birds_stations, method ="bray", permutations =999)print(birds_permanova)
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = bird_species_matrix ~ cluster, data = birds_stations, permutations = 999, method = "bray")
Df SumOfSqs R2 F Pr(>F)
Model 15 4.5142 0.36384 3.0503 0.001 ***
Residual 80 7.8929 0.63616
Total 95 12.4071 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# A tibble: 3 × 4
q_order variable statistic p
<chr> <chr> <dbl> <dbl>
1 q0.b value 0.984 0.306
2 q1.b value 0.981 0.164
3 q2.b value 0.976 0.0771
Show Code
print(levene_birds)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 15 2.0879 0.01873 *
80
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
# 5. ANOVA and Kruskal-Wallis# Running both to ensure consistency, especially if normality is violatedkruskal_birds <-list(q0 =kruskal.test(q0.b ~ cluster, data = birds_div_stats),q1 =kruskal.test(q1.b ~ cluster, data = birds_div_stats),q2 =kruskal.test(q2.b ~ cluster, data = birds_div_stats))print(kruskal_birds)
$q0
Kruskal-Wallis rank sum test
data: q0.b by cluster
Kruskal-Wallis chi-squared = 28.721, df = 15, p-value = 0.01747
$q1
Kruskal-Wallis rank sum test
data: q1.b by cluster
Kruskal-Wallis chi-squared = 17.637, df = 15, p-value = 0.2823
$q2
Kruskal-Wallis rank sum test
data: q2.b by cluster
Kruskal-Wallis chi-squared = 20.733, df = 15, p-value = 0.1456
Show Code
# ANOVA MODELS ---aov_q0.b <-aov(q0.b ~ cluster, data =birds_div_stats)aov_q1.b <-aov(q1.b ~ cluster, data = birds_div_stats)aov_q2.b <-aov(q2.b ~ cluster, data = birds_div_stats)summary(aov_q0.b)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 1922 128.10 2.64 0.00276 **
Residuals 80 3882 48.53
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
summary(aov_q1.b)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 544.8 36.32 1.469 0.137
Residuals 80 1977.7 24.72
Show Code
summary(aov_q2.b)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 385.7 25.71 1.594 0.0942 .
Residuals 80 1290.8 16.14
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
✅ Interpretation: One-way ANOVA showed that species richness (q = 0) differed significantly among the 16 clusters, whereas Shannon diversity (q = 1) and Simpson diversity (q = 2) did not show significant spatial differences (Table 6).
# SPECIES RANK ABUNDANCE CURVES ---# View the most dominant bird species overallbirds.data.MC %>%arrange(desc(rowSums(across(where(is.numeric))))) %>%print(n =5)
# This tells you if the site is improving or degrading regardless of rain.residuals_cluster_slope <- tile_restrand_results %>% dplyr::select(Tile, Index, SenSlope) %>%pivot_wider(names_from = Index, values_from = SenSlope, names_prefix ="R.S.") %>%rename(Cluster = Tile)head(residuals_cluster_slope)
Remote Sensing Time Series Analysis for environmental monitoring, to evaluate vegetation health and soil characteristics across 16 (tiles) from 2019 to 2025, using monthly satellite-derived data. I focus on; NDVI (Normalized Difference Vegetation Index): Measures vegetation “greenness” or health. SAVI (Soil Adjusted Vegetation Index): Similar to NDVI but corrects for soil brightness in areas with sparse vegetation. BSI (Bare Soil Index): Used to identify areas with little to no vegetation (bare soil). Statistical Trend Analysis Mann-Kendall Test: Determines if there is a statistically significant monotonic upward or downward trend. Sen’s Slope: Estimates the magnitude of that trend. RESTREND Analysis (RESTREND stands for Residual Trend Analysis). It performs a linear regression between the indices and rainfall (CHIRPS data). It calculates the Residuals (the difference between what the vegetation should look like based on rain vs. what it actually looks like). If the residuals show a downward trend, it suggests land degradation is occurring due to human activity or other factors independent of rainfall.
Acoustics
Indices
I calculated a suite of standard acoustic indices using Kaleidoscope Pro software (Version 5.8.1; Wildlife Acoustics, Inc.), which provides specialised tools for processing and analysing environmental sound recordings (Bradfer-Lawrence et al., 2019; Greenhalgh et al., 2021; Hyland et al., 2023). I conducted the acoustic indices analysis using the Non-Bat Analysis Mode with default parameter settings to maintain methodological consistency and ensure replicability of results. I batch-processed the audio recordings, segmenting larger files to improve the granularity of the analysis. The software simultaneously calculated the following indices: the Acoustic Complexity Index (ACI), Acoustic Diversity Index (ADI), Acoustic Evenness Index (AEI), Bioacoustics Index (BI), Normalised Difference Soundscape Index (NDSI), and Entropy Index (H).
To identify avian vocalisations within the acoustic recordings, I employed an automated analysis workflow using BirdNET (v2.4.0; Kahl et al., 2023). BirdNET is a deep neural network developed for the automated detection and classification of bird vocalisations, drawing on extensive global reference libraries (Kahl et al., 2021). The model can identify more than 6,000 bird species worldwide based on diagnostic spectral features in the audio signal (Kahl et al., 2023). To ensure comparability across data types, the BirdNET acoustic detections were grouped and summarised by season, cluster, and species to determine the total detection frequency per species. Subsequently, community diversity metrics were calculated following the same procedures used for the vegetation and manual point-count datasets. This consistent application of Hill numbers (q0, q1, q2) and beta diversity partitioning enabled a direct comparison of community structure and spatial heterogeneity across all three survey methods.
birdnet_processed <- birdnet_data %>%filter(Confidence >0.75) %>%#Higher confidence thresholds (0.7-0.8) will reduce the need for manual verification, while potentially missing some rare or poorly-recorded species. dplyr::select(species =`Common name`, File) %>%separate(File, into =c("Drive", "Project", "Category", "SeasonFolder","ClusterFolder", "SubFolder", "FileName"),sep ="\\\\", remove =FALSE) %>%mutate(cluster =paste0("c", str_extract(ClusterFolder, "\\d+")), season =paste(str_extract(SeasonFolder, "^[A-Za-z]{3}"),str_extract(SeasonFolder, "\\d{2}(?=\\.)|\\d{2}$") ) ) %>% dplyr::select(species, cluster, season)#group_by(species) %>% filter(n() <= 10) %>% ungroup() #Requiring a species to be detected a minimum number of times (e.g., ten detections) before including it in results can help reduce misidentifications. Confidence thresholds around 0.5 provide a balance between precision and recall at the assemblage level, while requiring a minimum number of detections per species reduces false positives from sporadic misidentifications. ##With a threshold of 0.5, the total number of species is 427. Adding a filter to remove the species detected less than 10 times reduces the species to 230.head(birdnet_processed)
# A tibble: 6 × 3
species cluster season
<chr> <chr> <chr>
1 Scarlet-chested Sunbird c1 Oct 25
2 Scarlet-chested Sunbird c1 Oct 25
3 Scarlet-chested Sunbird c1 Oct 25
4 Scarlet-chested Sunbird c1 Oct 25
5 Scarlet-chested Sunbird c1 Oct 25
6 Scarlet-chested Sunbird c1 Oct 25
Show Code
#n_distinct(birdnet_processed$species)#View(distinct(birdnet_processed, species))# Convert the long detection list into a wide Community Matrix for meta analysisbirdnet_matrix <- birdnet_processed %>%group_by(season, species, cluster) %>%summarise(detections =n(), .groups ="drop") %>%pivot_wider(names_from = cluster, values_from = detections, values_fill =0) %>%arrange(season, species)head (birdnet_matrix)
✅ Interpretation: The analysis of acoustic data across the study sites revealed substantial spatial heterogeneity in both BirdNET’s classification performance and the resulting avian community structures. The BirdNET analysis comprised 94,927 detections across 16 communities, each with equal sample coverage weighting (Wi = 0.0625). A total of 284 species were detected. Overall sample coverage was exceptionally high (0.999), indicating that the sampling effort captured nearly all species present in the regional pool. Individual community coverage values ranged from 0.992 to 0.998, consistently demonstrating near-complete inventory at each site. Total Bray–Curtis dissimilarity was 0.866, partitioned into a balanced variation component (turnover = 0.790) and an abundance gradient component (nestedness = 0.076). This pronounced dominance of turnover over nestedness reveals that compositional differences between communities are driven almost entirely by species replacement rather than differences in species richness or abundance of shared species. The results confirm a highly diverse, well-sampled assemblage in which BirdNET performed reliably, with distinct species assemblages characterising each of the 16 recording locations.
Combined dataset
These data set comprises 11 acoustic metrics: Bioacoustic Index (BI), Normalised Difference Soundscape Index (NDSI), Acoustic Diversity Index (ADI), Acoustic Evenness Index (AEI), Acoustic Complexity Index (ACI), Acoustic Entropy (SH), ultrasonic metrics, and BirdNET-derived avian diversity metrics (q0, q1, q2, D). I initially compiled 17 environmental predictors spanning vegetation structural variables (shrub volume per hectare, tree volume per hectare, percentage cover per hectare, tree basal area), rangeland plant species diversity metrics (calculated using Hill numbers: q0, q1, q2, and evenness D), bird community diversity metrics (Hill’s numbers derived from manual point counts), soil health components (PC1 and PC2), and residual slope values of remotely sensed vegetation indices (Normalized Difference Vegetation Index [NDVI], Soil Adjusted Vegetation Index [SAVI], and Bare Soil Index [BSI]).
Show Code
# source https://www.sthda.com/english/wiki/ggally-r-package-extension-to-ggplot2-for-correlation-matrix-and-survival-plots-r-software-and-data-visualization# Create the dataset# https://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/# #https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/154-stepwise-regression-essentials-in-r/#https://www.sthda.com/english/articles/38-regression-model-validation/157-cross-validation-essentials-in-r/#https://www.sthda.com/english/articles/38-regression-model-validation/158-regression-model-accuracy-metrics-r-square-aic-bic-cp-and-more/# https://www.sthda.com/english/articles/40-regression-analysis/168-multiple-linear-regression-in-r/#https://www.sthda.com/english/articles/39-regression-model-diagnostics/160-multicollinearity-essentials-and-vif-in-r/# Combine all data sourcesbig.dataset <-data.frame(soil.pc, alphas.rangelands, shrub.volume.cover, tree.vol.cov.area, residuals_cluster_slope, alphas.birds, alphas.birdnet, acoustic_cluster_means) %>% dplyr::select(-matches("^cluster", ignore.case =TRUE), cluster) %>%relocate(cluster)print(colnames(big.dataset))
I visualised exploratory relationships among variables using scatterplot matrices implemented with the ggpairs function (GGally package, Schloerke et al., 2025) facilitating assessment of distributional properties, non-linear patterns, outliers, and preliminary multicollinearity.
To address high multicollinearity among the three remote sensing indices (Pearson’s |r| > 0.96), I performed a Principal Component Analysis (PCA) using the FactoMineR package (Husson et al., 2026). I retained the first principal component (remote_pc1) as a composite proxy for remotely sensed vegetation and soil health. This primary axis captured 98.4% of the total variance, reducing the environmental predictor set from 18 to 16 variables.
I scaled all environmental and acoustic variables to zero mean and unit variance using the scale() function to ensure comparability of coefficients.
I evaluated multicollinearity among the 16 remaining predictors using the Variance Inflation Factor (VIF) with the vifstep function in the usdmpackage (Naimi, 2023). I applied a threshold of VIF < 3 for variable retention. I first examined VIF within each predictor group separately, then conducted a global VIF analysis on all scaled predictors simultaneously. In addition to VIF-based variable selection, I evaluated the necessity of variable removal by comparing the Global Permutation Test and adjusted R² across successive model iterations. The final model demonstrated an improved adjusted R² (0.299) and a more significant global p-value (0.006) relative to the saturated model. This iterative process yielded a final set of four robust predictors: rangeland plant richness (q0), tree volume per hectare, soil properties principal component 2, and the remote sensing composite (remote_pc1).
1 variables from the 3 input variables have collinearity problem:
q1
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( q2 ~ q0 ): 0.5388536
max correlation ( q2 ~ q0 ): 0.5388536
---------- VIFs of the remained variables --------
Variables VIF
1 q0 1.409172
2 q2 1.409172
3 variables from the 5 input variables have collinearity problem:
TR.A SH.V TR.C
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( TR.V ~ SH.C ): -0.06663559
max correlation ( TR.V ~ SH.C ): -0.06663559
---------- VIFs of the remained variables --------
Variables VIF
1 SH.C 1.00446
2 TR.V 1.00446
2 variables from the 4 input variables have collinearity problem:
q2b q0b
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( D.eveb ~ q1b ): 0.3966637
max correlation ( D.eveb ~ q1b ): 0.3966637
---------- VIFs of the remained variables --------
Variables VIF
1 q1b 1.186721
2 D.eveb 1.186721
No variable from the 2 input variables has collinearity problem.
The linear correlation coefficients ranges between:
min correlation ( soil.pc2 ~ soil.pc1 ): -0.3461851
max correlation ( soil.pc2 ~ soil.pc1 ): -0.3461851
---------- VIFs of the remained variables --------
Variables VIF
1 soil.pc1 1.136162
2 soil.pc2 1.136162
Show Code
# Run VIF on ALL scaled predictors together (global)vif_scaled <-vifstep(scaled_env, th =3)print(vif_scaled)
7 variables from the 15 input variables have collinearity problem:
TR.A q2b SH.V q1b q1 TR.C q2
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( D.eveb ~ TR.V ): 0.02491861
max correlation ( soil.pc2 ~ SH.C ): 0.5142444
---------- VIFs of the remained variables --------
Variables VIF
1 q0 1.786894
2 SH.C 2.603972
3 TR.V 1.712278
4 q0b 2.446289
5 D.eveb 1.782379
6 soil.pc1 1.986627
7 soil.pc2 2.375860
8 remote_pc1 1.338640
Show Code
# Extract variables that survived VIFclean_env_vars <- vif_scaled@results$Variables# Variables to force-removevars_to_remove <-c("D.eveb", "q0b", "SH.C", "soil.pc1")clean_env_vars <-setdiff(clean_env_vars, vars_to_remove)#Environmental predictor variablesfinal_env <- scaled_env[, clean_env_vars, drop =FALSE]head(final_env)
#In addition to VIF-based variable selection, the necessity of variable removal was evaluated by comparing the Global Permutation Test and Adjusted R^2 across successive model iterations. The final model demonstrated an improved Adjusted R^2(0.299) and a more significant global p-value (0.006) relative to the saturated model. These improvements indicate that the excluded variables (e.g., soil.pc1) did not provide independent explanatory power beyond the retained primary predictors such as TR.V, and their removal reduced redundancy while enhancing model parsimony.# Put indices and birdnet results in 1 data frame#Acoustic response variablesclean_ac_vars <-colnames(scaled_acoustic)clean_bird_vars <-colnames(scaled_birdnet)final_acoustic <-cbind(scaled_acoustic[, clean_ac_vars, drop =FALSE], scaled_birdnet[, clean_bird_vars, drop =FALSE])head(final_acoustic)
I fitted individual multiple linear regression models using the lm() function to assess the influence of the four retained environmental predictors on each of the nine acoustic response variables (five acoustic indices and four BirdNET-derived metrics).
Response term estimate std.error
Length:45 Length:45 Min. :-0.82300 Min. :0.1445
Class :character Class :character 1st Qu.:-0.17083 1st Qu.:0.2028
Mode :character Mode :character Median : 0.00000 Median :0.2375
Mean :-0.03429 Mean :0.2285
3rd Qu.: 0.23055 3rd Qu.:0.2598
Max. : 0.57179 Max. :0.3080
statistic p.value
Min. :-4.55617 Min. :0.0008218
1st Qu.:-0.77526 1st Qu.:0.0939884
Median : 0.00000 Median :0.3833199
Mean :-0.08767 Mean :0.4345601
3rd Qu.: 0.95981 3rd Qu.:0.7655387
Max. : 3.19355 Max. :1.0000000
Model diagnostics included residual normality (Shapiro–Wilk test, Q–Q plots), homoscedasticity (Breusch–Pagan test), and identification of influential observations (Cook’s distance threshold 4/n). I verified that all models met the assumptions of linear regression.
Show Code
# MODEL DIAGNOSTICS# ================================================================diag_results <-list()par(mfrow =c(2, 3))for(i in1:ncol(final_acoustic)) { var_name <-colnames(final_acoustic)[i]#model diag_model <-lm(final_acoustic[,i] ~ ., data = final_env)plot(diag_model, which =1, main =paste("Resid vs Fit:", var_name))plot(diag_model, which =2, main =paste("Q-Q Plot:", var_name))plot(diag_model, which =3, main =paste("Scale-Location:", var_name)) cooksd <-cooks.distance(diag_model)plot(cooksd, type ="h", main =paste("Cook's D:", var_name))abline(h =4/length(cooksd), col ="red", lty =2) shapiro_p <-shapiro.test(residuals(diag_model))$p.value bp_p <- lmtest::bptest(diag_model)$p.value max_cook <-max(cooksd)mtext(paste("Shapiro p =", round(shapiro_p, 4)), side =3, line =0, cex =0.7)mtext(paste("BP test p =", round(bp_p, 4)), side =3, line =-1, cex =0.7) diag_results[[var_name]] <-data.frame(Acoustic_Variable = var_name, Shapiro_Wilk_p = shapiro_p,BP_Test_p = bp_p, Max_CooksD = max_cook )}
Show Code
# Combine all stored results into one final data framediagnostic_summary <-do.call(rbind, diag_results)print(diagnostic_summary)
To test the overarching hypothesis that acoustic composition varies along land degradation gradients, I performed Redundancy Analysis (RDA) using the vegan package (Oksanen et al., 2025). The RDA model constrained the acoustic response matrix by the four retained environmental predictors. I assessed the global significance of the RDA model using permutation tests (999 permutations) implemented with the permute package (Simpson et al., 2026) to assess: Global model significance, Significance of constrained axes, Significance of individual environmental predictors. Adjusted R² was calculated to estimate the true explanatory power while accounting for model complexity. Effect sizes were quantified as the proportion of constrained variance explained by each predictor and by each RDA axis.
Show Code
# REDUNDANCY ANALYSIS (RDA)# ================================================================# This tests if the landscape variation explains the acoustic variationrda_model <-rda(final_acoustic ~ ., data =as.data.frame(final_env))print(summary(rda_model))