# --- 1. CORE DATA MANIPULATION & TIDYING ---library(tidyverse) # Essential: dplyr, ggplot2, tidyr, readr, purrr, stringrlibrary(lubridate) # Date/time handling (Year/Month parsing)library(reshape2) # Necessary for melt() in correlation heatmapslibrary(scales) # Advanced axis scaling for plots library(ggplot2)library(dplyr)library(readr)library(stringr)library(patchwork)library(magick)# --- 2. BAYESIAN MODELING ---library(brms) # Bayesian Multilevel Models library(tidybayes) library(broom.mixed) # --- 3. TREND DETECTION & RESTREND ---library(Kendall) # Mann-Kendall trend testslibrary(trend) # sens.slope() calculation# --- 4. SPATIAL & REMOTE SENSING DATA ---library(sf) # Vector data library(terra) # Raster handling library(usdm) # VIF analysis for multicollinearity # --- 5. MULTIVARIATE ANALYSIS---library(vegan) # RDA, PERMANOVA, and Bray-Curtislibrary(FactoMineR) # PCA for environmental variable reductionlibrary(factoextra) # Clean visualization of PCA/Eigenvalueslibrary(betapart) # Beta diversity partitioning (Jaccard/Sorensen)library(entropart) # Hill numbers (DivPart, DivProfile) and MetaCommunitylibrary(iNEXT.3D) # 3D rarefaction and diversity estimationlibrary(iNEXT.beta3D) # Beta diversity rarefaction# --- 6. STATISTICAL DIAGNOSTICS ---library(car) # ANOVA and Levene’s testlibrary(rstatix) # Pipe-friendly statistical testslibrary(lmtest) # Heteroscedasticity testing# --- 7. VISUALIsATION & REPORTING ---library(patchwork) # Combining multiple plots library(GGally) # ggpairs for initial data explorationlibrary(knitr) # kable() for table generationlibrary(kableExtra) # Styling kable tables for PDF/HTML reports
This document contains analyses of data collected from the 10 x 10 km study area. Soil samples were collected from each subplot (100m2) to create a composite sample for every plot (1000m2).
Rangeland plants data were collected along North-South and East-West 28-metres transects cutting across each plot. At every 2 metres along the transect, the nearest annual and perennial grasses, forbs, shrubs and woody species were recorded.
Trees height and diametre at breast height, and shrubs height, length and width and species names were recorded in each subplot in every plot. Point count transects for bird monitoring were 1 km long cutting across each cluster (1km2).
Acoustic recorders were placed at the centre of each cluster.
For the rangeland plants, birds from the point count monitoring and birds from the BirdNet derived avian species, I used the package enropart to calculate the diversity using Hill’s numbers. Vegetation structural variables (shrub volume per hectare, tree volume per hectare, percentage cover per hectare, tree basal area), were calculated within the tidyverse framework. I used Kaleidoscope to compute the acoustic indices.
All variables were aggregated at the cluster level.
To identify the main factors influencing soil properties, I used Principal Component Analysis (PCA) to transform the original 18, correlated variables into a smaller set of uncorrelated Principal Components (PCs), which capture the maximum variance in the data (Souza, 2025). I carried out the PCA using the PCA() function from the FactoMineR package in R (Husson et al., 2024). I determined the significance of each principal component by examining its eigenvalue; components with eigenvalues greater than one were considered meaningful. I examined variable loadings and correlation structures to interpret the ecological significance of each axis.
✅ Interpretation: T#The eigenvalues represent the amount of variance captured
by each principal component (PC).The first five components have eigenvalues greater than 1, suggesting they retain meaningful information.
✅ Interpretation: Cumulative variance: PC1 explains 41.8% of the total variance. PC2 adds 24.9%, bringing the cumulative variance to 66.7% #Variable Contributions to Principal Components (Correlation with PCs)
✅ Interpretation: PC1:These properties are strongly affected by fertilisers especially NPK Strong positive correlations: CEC (0.98), Ca (0.98), Mg (0.95), B (0.84), pH (0.89). High values of these components show high use of the fertilisers and lime. Further, the soils showed strong negative correlation: Al (-0.65), Sand (-0.54), ExAc (-0.51), Fe (-0.49). Suggests PC1 represents cation exchange capacity (CEC) and soil fertility.
✅ Interpretation: PC2:High PC2: Probably in conservation areas and unfarmed hilly areas (due to organic matter accumulation).Strongest positive correlations: TN (0.97), SOC (0.94),EC (0.70), Sand (0.62). Negative correlations Clay (-0.69), Na (-0.60) Indicates PC2 captures organic matter and soil texture gradients.
Soil spatial statistics
To test for multivariate differences in overall soil composition among the 16 clusters, I conducted a permutational multivariate analysis of variance (PERMANOVA) with Euclidean distances, based on standardised soil variables (adonis2 function, vegan package, Oksanen et al., 2025). I identified which specific soil variables exhibited significant spatial variation by applying Kruskal-Wallis tests to each variable individually, using a non-parametric approach due to violations of normality. I subsequently tested for spatial differences in the extracted scores of the first two principal components (PC1 and PC2) among clusters using one-way ANOVA.
Show Code
# Shapiro-Wilk test for all raw soil variablessoil_normality <- soil_data %>%pivot_longer(cols =c(pH, SOC, TN, EC, P, Al, B, Ca, Fe, K, Mg, Na, CEC, ExAc, PSI, Clay, Silt, Sand), names_to ="Variable", values_to ="Value") %>%group_by(Variable) %>%shapiro_test(Value)# PERMANOVA Does the overall soil change between clusters?soil_matrix <- soil_data %>%ungroup() %>% dplyr::select(-cluster, -plot)adonis2(soil_matrix ~ cluster, data = soil_data)
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = soil_matrix ~ cluster, data = soil_data)
Df SumOfSqs R2 F Pr(>F)
Model 15 1.9487 0.59707 14.028 0.001 ***
Residual 142 1.3151 0.40293
Total 157 3.2638 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
# Kruskal-Wallis Which specific nutrients or textures are different?soil_kruskal <- soil_data %>%pivot_longer(cols =c(pH, SOC, TN, EC, P, Al, B, Ca, Fe, K, Mg, Na, CEC, ExAc, PSI, Clay, Silt, Sand),names_to ="Variable",values_to ="Value" ) %>%group_by(Variable) %>%kruskal_test(Value ~ cluster) %>%add_significance() %>%filter(p <0.05) soil_kruskal %>%kable(caption ="Significant Soil Property Differences Across Clusters") %>%kable_styling(bootstrap_options =c("striped", "hover")) %>%scroll_box(width ="100%", height ="400px")
Significant Soil Property Differences Across Clusters
Variable
.y.
n
statistic
df
p
method
p.signif
Al
Value
158
97.01431
15
0.00e+00
Kruskal-Wallis
****
B
Value
158
71.90914
15
0.00e+00
Kruskal-Wallis
****
CEC
Value
158
94.06721
15
0.00e+00
Kruskal-Wallis
****
Ca
Value
158
94.12593
15
0.00e+00
Kruskal-Wallis
****
Clay
Value
158
76.87263
15
0.00e+00
Kruskal-Wallis
****
EC
Value
158
47.05928
15
3.60e-05
Kruskal-Wallis
****
ExAc
Value
158
44.76172
15
8.35e-05
Kruskal-Wallis
****
Fe
Value
158
39.87837
15
4.73e-04
Kruskal-Wallis
***
K
Value
158
86.76481
15
0.00e+00
Kruskal-Wallis
****
Mg
Value
158
93.31948
15
0.00e+00
Kruskal-Wallis
****
Na
Value
158
85.33213
15
0.00e+00
Kruskal-Wallis
****
P
Value
158
44.24040
15
1.01e-04
Kruskal-Wallis
***
PSI
Value
158
52.21129
15
5.20e-06
Kruskal-Wallis
****
SOC
Value
158
47.75408
15
2.79e-05
Kruskal-Wallis
****
Sand
Value
158
77.46504
15
0.00e+00
Kruskal-Wallis
****
Silt
Value
158
35.38149
15
2.17e-03
Kruskal-Wallis
**
TN
Value
158
53.98273
15
2.60e-06
Kruskal-Wallis
****
pH
Value
158
105.81762
15
0.00e+00
Kruskal-Wallis
****
Show Code
# PCA Test (PC1 & PC2 acrosee the landscape) ---summary(aov(PC1 ~ Cluster, data = pca_scores))
Df Sum Sq Mean Sq F value Pr(>F)
Cluster 15 733.2 48.88 15.27 <2e-16 ***
Residuals 142 454.6 3.20
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
summary(aov(PC2 ~ Cluster, data = pca_scores))
Df Sum Sq Mean Sq F value Pr(>F)
Cluster 15 237.9 15.859 4.802 1.83e-07 ***
Residuals 142 468.9 3.302
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
✅ Interpretation:
The PERMANOVA revealed that overall soil composition differed significantly among the 16 clusters (F = 14.03, p = 0.001). The cluster grouping explained 59.7% of the multivariate variance in soil properties. Kruskal-Wallis tests on individual soil variables confirmed that all 18 measured variables exhibited statistically significant spatial variation (all p < 0.01).
Plants Data
Rangeland Plants metacommunity
Following Marcon et al. (2025) I used the entropart R package to create a MetaCommunity object. These objects integrated abundance data for rangeland plant species—including perennial and annual grasses, shrubs, and trees recorded along 28-m transects—as well as avian community data derived from both traditional point counts and BirdNET automated detections. In this study, each metacommunity was defined as the total collection of species recorded at the landscape level across 16 clusters, with each cluster treated as a local community. These metacommunities served as the computational basis for partitioning diversity, as it allows for the simultaneous calculation of local α (alpha), between-site β (beta) and total landscape γ (gamma) diversity while weighting each community according to its sample size (Marcon et al., 2025).
✅ Interpretation: Analysis of the plant community across the 16 clusters revealed significant spatial structuring. The analysis identified a total of 132 species distributed across 16 local communities, with an exceptionally high estimated sample coverage (0.9998), indicating that the survey nearly captured the full extent of the species richness present in the study area.
Rangeland Plants Community Alpha Diversity
I then evaluated the local community structure by calculating theα-diversity for each of the 16 clusters using Hill numbers (q = 0, 1, and 2) via the DivPart function of entropart with the “Best” estimator (Marcon et al., 2025). This provided measures of species richness (q = 0), Shannon-effective diversity (q = 1), and Simpson-effective diversity (q = 2). Diversity profiles were generated using the DivProfile function (entropart package) to visualise the transition from richness to dominance-weighted diversity (Marcon et al., 2025). Additionally, I calculated community evenness as the ratio of Simpson diversity to richness (q2 / q0).
Note: This data frame contains grasses, shrubs and trees collected along the 28 metres transect in each plot
Show Code
#all species in rangeland form# Alpha diversity for all(q=0,1,2)alpha.rangelands.0<-DivPart(q=0, mc.rangelands, Correction ="Best")summary(alpha.rangelands.0)
HCDT diversity partitioning of order 0 of metaCommunity mc.rangelands
Alpha diversity of communities:
C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12 C13 C14 C15 C16
55 45 50 61 57 50 56 65 47 63 52 50 44 55 48 55
Total alpha diversity of the communities:
[1] 53.3125
Beta diversity of the communities:
None
2.475967
Gamma diversity of the metacommunity:
None
132
To test for significant spatial differences in community composition, a PERMANOVA was performed on the plot-level plant species abundance matrix (n = 160) and at the point station level for birds (n = 96) using Bray–Curtis dissimilarity. Spatial variation in alpha diversity across the 16 clusters was evaluated using one-way ANOVA (stats package).
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = species_matrix ~ cluster, data = rangelands_plots, permutations = 999, method = "bray")
Df SumOfSqs R2 F Pr(>F)
Model 15 6.630 0.21198 2.5824 0.001 ***
Residual 144 24.647 0.78802
Total 159 31.277 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
# Normality of Residuals (Shapiro)plot_diversity_long <- plot_diversity %>%pivot_longer(cols =c(q0, q1, q2), names_to ="q_order", values_to ="value")shapiro_results <- plot_diversity_long %>%group_by(q_order) %>%shapiro_test(value)# Homogeneity of Variance (Levene's Test for q1)levene_q1 <-leveneTest(q1 ~ cluster, data = plot_diversity)print(shapiro_results)
# A tibble: 3 × 4
q_order variable statistic p
<chr> <chr> <dbl> <dbl>
1 q0 value 0.990 0.325
2 q1 value 0.978 0.0109
3 q2 value 0.969 0.00122
Show Code
print(levene_q1)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 15 0.8086 0.6667
144
Show Code
# ANOVA MODELS ---aov_q0 <-aov(q0 ~ cluster, data = plot_diversity)aov_q1 <-aov(q1 ~ cluster, data = plot_diversity)aov_q2 <-aov(q2 ~ cluster, data = plot_diversity)summary(aov_q0)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 976 65.05 2.184 0.00929 **
Residuals 144 4288 29.78
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
summary(aov_q1)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 511 34.07 4.157 2.58e-06 ***
Residuals 144 1180 8.20
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
summary(aov_q2)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 351.5 23.431 4.86 1.37e-07 ***
Residuals 144 694.3 4.821
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
# Run Kruskal-Wallis for all diversity orders to confirm the ANOVA test since q1 and q2 didnt meet the assumption of normalitykruskal_q0 <-kruskal.test(q0 ~ cluster, data = plot_diversity)kruskal_q1 <-kruskal.test(q1 ~ cluster, data = plot_diversity)kruskal_q2 <-kruskal.test(q2 ~ cluster, data = plot_diversity)list(q0 = kruskal_q0, q1 = kruskal_q1, q2 = kruskal_q2)
$q0
Kruskal-Wallis rank sum test
data: q0 by cluster
Kruskal-Wallis chi-squared = 30.776, df = 15, p-value = 0.009411
$q1
Kruskal-Wallis rank sum test
data: q1 by cluster
Kruskal-Wallis chi-squared = 49.428, df = 15, p-value = 1.493e-05
$q2
Kruskal-Wallis rank sum test
data: q2 by cluster
Kruskal-Wallis chi-squared = 54.44, df = 15, p-value = 2.217e-06
Show Code
#ANOVA AND KRUSKAL RESULTS ARE CONSISTENT
✅ Interpretation: One-way ANOVA revealed significant spatial variation in alpha diversity among the 16 clusters across all Hill-number orders (Table 4). Differences were significant for q = 0 (p = 0.009), q = 1 (p < 0.001), and q = 2 (p < 0.001). The strength of the cluster effect increased with q, indicating that dominant‑species diversity varied more strongly across the landscape than the species richness.
Rangeland Plants diversity profiles
Diversity profiles were generated using the DivProfile function (entropart package) to visualise the transition from richness to dominance-weighted diversity (Marcon et al., 2025).
I quantified species turnover (β-diversity) using abundance-based Bray–Curtis dissimilarity via the betapart package(Baselga et al., 2025). The beta diversity analysis, based on the abundance-weighted Bray-Curtis dissimilarity index, provides quantitative evidence of the spatial variation in species composition and abundance structure. The total dissimilarity (β BRAY) is divided into two components: β BRAY.BAL, representing balanced species turnover or replacement, and βBRAY.GRA, representing differences due to species richness and nestedness.
✅ Interpretation: The extremely high total dissimilarity (β.BRAY = 0.988) indicates near-complete differentiation in community structure across clusters. This was overwhelmingly driven by species turnover (β.BRAY.BAL = 0.818), with a minor contribution from nestedness resulting from abundance gradients (β.BRAY.GRA = 0.169). The statistical significance of this compositional differentiation was confirmed by PERMANOVA (F₁₅,₁₄₄ = 2.58, p < 0.001; R² = 0.212), which demonstrated that spatial grouping explained a substantial portion of community variation. These results demonstrate that strong species replacement and pronounced abundance gradients jointly shape landscape-level variation, resulting in a highly heterogeneous rangeland environment despite the shared presence of common dominant taxa.
Rangeland Plants Species rank curves
The species rank abundance curves is important to characterise structural shifts in plant communities across the landscape’s 16 clusters.
I used non-parametric Kruskal-Wallis tests to assess spatial differences in shrub and tree cover and volume across 16 clusters, following a significant deviation from normality in the data (Shapiro-Wilk, p < 0.001). The distributions were strongly right-skewed and contained zero values corresponding to plots without woody vegetation. I defined statistical significance for all models at α = 0.05.
Show Code
#TEST ASSUMPTIONSshrub_norm <-shapiro_test(residuals(lm(volume_ha.sh ~ Cluster, data = shrub_plot_summary)))tree_norm <-shapiro_test(residuals(lm(volume_ha.tr ~ Cluster, data = tree_reps)))shrub_var <-leveneTest(volume_ha.sh ~ Cluster, data = shrub_plot_summary)tree_var <-leveneTest(volume_ha.tr ~ Cluster, data = tree_reps)print(shrub_norm)
# A tibble: 5 × 8
Variable .y. n statistic df p method p.signif
<chr> <chr> <int> <dbl> <int> <dbl> <chr> <chr>
1 cover_percent.sh Value 160 53.8 15 0.00000287 Kruskal-Wall… ****
2 volume_ha.sh Value 160 54.1 15 0.00000254 Kruskal-Wall… ****
3 basal_area_ha.tr Value 160 45.6 15 0.0000616 Kruskal-Wall… ****
4 cover_percent.tr Value 160 45.6 15 0.0000616 Kruskal-Wall… ****
5 volume_ha.tr Value 160 45.2 15 0.0000708 Kruskal-Wall… ****
✅ Interpretation: The Kruskal-Wallis tests revealed statistically significant spatial variation across all five measured woody vegetation metrics: shrub metrics (percentage cover and stem volume per hectare) and tree metrics (basal area per hectare, stem volume per hectare, and percentage cover) (p < 0.001) (Table 5).
Point count
Birds metacommunity and alpha diversity
Same diversity measures as above. Metacommunity, alpha diversity, beta and gamma diverisity.
To test for significant spatial differences in community composition, a PERMANOVA was performed on the plot-level plant species abundance matrix (n = 160) and at the point station level for birds (n = 96) using Bray–Curtis dissimilarity. Spatial variation in alpha diversity across the 16 clusters was evaluated using one-way ANOVA (stats package).
Show Code
birds_stations <- birds.data %>%# pivot_longer pivot_longer(cols =!species, names_to ="station", values_to ="freq") %>%group_by(station, species) %>%summarise(freq =sum(freq, na.rm =TRUE), .groups ="drop") %>%pivot_wider(names_from = species, values_from = freq, values_fill =0) %>%# Assign 16 clusters with 6 stations each (ensure your data rows match 16 * 6 = 96)mutate(cluster =factor(paste0("C", rep(1:16, each =6)), levels =paste0("C", 1:16))) %>%relocate(cluster, station)# Meta-Community & Alpha Diversity at Station Level# Transpose for entropart: species as rows, stations as columnsmc_birds_stations <- birds_stations %>%ungroup() %>% dplyr::select(-cluster) %>%column_to_rownames("station") %>%t() %>%as.data.frame() %>%MetaCommunity()# Hill Numbers (q=0, 1, 2) per stationq0_birds <-DivPart(q=0, mc_birds_stations, Correction ="Best")$CommunityAlphaDiversitiesq1_birds <-DivPart(q=1, mc_birds_stations, Correction ="Best")$CommunityAlphaDiversitiesq2_birds <-DivPart(q=2, mc_birds_stations, Correction ="Best")$CommunityAlphaDiversities# Create diversity dataframe for testingbirds_div_stats <- birds_stations %>% dplyr::select(cluster, station) %>%mutate(q0.b = q0_birds, q1.b = q1_birds, q2.b = q2_birds)# PERMANOVA (Community Composition)# Does the species mix differ significantly between clusters?bird_species_matrix <- birds_stations %>%ungroup() %>% dplyr::select(-cluster, -station)birds_permanova <-adonis2(bird_species_matrix ~ cluster, data = birds_stations, method ="bray", permutations =999)print(birds_permanova)
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = bird_species_matrix ~ cluster, data = birds_stations, permutations = 999, method = "bray")
Df SumOfSqs R2 F Pr(>F)
Model 15 4.5142 0.36384 3.0503 0.001 ***
Residual 80 7.8929 0.63616
Total 95 12.4071 1.00000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# A tibble: 3 × 4
q_order variable statistic p
<chr> <chr> <dbl> <dbl>
1 q0.b value 0.984 0.306
2 q1.b value 0.981 0.164
3 q2.b value 0.976 0.0771
Show Code
print(levene_birds)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 15 2.0879 0.01873 *
80
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
# ANOVA and Kruskal-Wallis# Running both to ensure consistency, check if normality is violatedkruskal_birds <-list(q0 =kruskal.test(q0.b ~ cluster, data = birds_div_stats),q1 =kruskal.test(q1.b ~ cluster, data = birds_div_stats),q2 =kruskal.test(q2.b ~ cluster, data = birds_div_stats))print(kruskal_birds)
$q0
Kruskal-Wallis rank sum test
data: q0.b by cluster
Kruskal-Wallis chi-squared = 28.721, df = 15, p-value = 0.01747
$q1
Kruskal-Wallis rank sum test
data: q1.b by cluster
Kruskal-Wallis chi-squared = 17.637, df = 15, p-value = 0.2823
$q2
Kruskal-Wallis rank sum test
data: q2.b by cluster
Kruskal-Wallis chi-squared = 20.733, df = 15, p-value = 0.1456
Show Code
# ANOVA MODELS ---aov_q0.b <-aov(q0.b ~ cluster, data =birds_div_stats)aov_q1.b <-aov(q1.b ~ cluster, data = birds_div_stats)aov_q2.b <-aov(q2.b ~ cluster, data = birds_div_stats)summary(aov_q0.b)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 1922 128.10 2.64 0.00276 **
Residuals 80 3882 48.53
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Show Code
summary(aov_q1.b)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 544.8 36.32 1.469 0.137
Residuals 80 1977.7 24.72
Show Code
summary(aov_q2.b)
Df Sum Sq Mean Sq F value Pr(>F)
cluster 15 385.7 25.71 1.594 0.0942 .
Residuals 80 1290.8 16.14
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
✅ Interpretation: One-way ANOVA showed that species richness (q = 0) differed significantly among the 16 clusters, whereas Shannon diversity (q = 1) and Simpson diversity (q = 2) did not show significant spatial differences (Table 6).
Remote Sensing Time Series Analysis for environmental monitoring, to evaluate vegetation health and soil characteristics across 16 (tiles) from 2019 to 2026, using monthly satellite-derived data. I focus on; NDVI (Normalized Difference Vegetation Index): Measures vegetation “greenness” or health. SAVI (Soil Adjusted Vegetation Index): Similar to NDVI but corrects for soil brightness in areas with sparse vegetation. BSI (Bare Soil Index): Used to identify areas with little to no vegetation (bare soil). Statistical Trend Analysis Mann-Kendall Test: Determines if there is a statistically significant monotonic upward or downward trend. Sen’s Slope: Estimates the magnitude of that trend. RESTREND Analysis (RESTREND stands for Residual Trend Analysis). It performs a linear regression between the indices and rainfall (CHIRPS data). It calculates the Residuals (the difference between what the vegetation should look like based on rain vs. what it actually looks like). If the residuals show a downward trend, it suggests land degradation is occurring due to human activity or other factors independent of rainfall. ##Vegetation Structure
Show Code
ldsf_raw <-read_csv("data/Mara_Vegstructure_2025(in).csv") %>%mutate(vegstructure =str_replace_all(vegstructure, "_", " ")) %>%mutate(vegstructure =str_to_sentence(vegstructure)) %>%mutate(vegstructure =factor(vegstructure, levels =c("Grassland", "Wooded grassland", "Shrubland", "Woodland", "Bushland", "Cropland" )))structure_summary <- ldsf_raw %>%group_by(vegstructure) %>%summarise(Count =n(), .groups ="drop") %>%mutate(Percentage = (Count /sum(Count)) *100) # Bar Chart (Panel A)p1 <-ggplot(structure_summary, aes(x =reorder(vegstructure, -Count), y = Count, fill = vegstructure)) +geom_bar(stat ="identity", color ="white", width =0.7) +geom_text(aes(label =paste0(round(Percentage, 1), "%")), vjust =-0.5, size =3.5, fontface ="bold") +scale_fill_viridis_d(option ="viridis", direction =-1) +theme_minimal() +theme(panel.grid.major.x =element_blank(),legend.position ="none",axis.title =element_text(size =11, family ="calibri"), axis.text =element_text(size =10, color ="black"),axis.line.x =element_line(color ="black") ) +labs(x ="Land Cover Classification", y ="Number of Plots")# Load the QGIS Map (Panel B)qgis_map <-image_read("Veg structure.png")qgis_map_grob <- grid::rasterGrob(qgis_map)# Combine and Save Final Figurecombined_figure <- p1 +wrap_elements(qgis_map_grob) +plot_layout(widths =c(1, 1.2)) +plot_annotation(tag_levels ='A')ggsave("Figure1_Final_Submission.png", combined_figure, width =14, height =6, dpi =300, bg ="white")
Acoustics
Indices
I calculated a suite of standard acoustic indices using Kaleidoscope Pro software (Version 5.8.1; Wildlife Acoustics, Inc.), which provides specialised tools for processing and analysing environmental sound recordings (Bradfer-Lawrence et al., 2019; Greenhalgh et al., 2021; Hyland et al., 2023). I conducted the acoustic indices analysis using the Non-Bat Analysis Mode with default parameter settings to maintain methodological consistency and ensure replicability of results. I batch-processed the audio recordings, segmenting larger files to improve the granularity of the analysis. The software simultaneously calculated the following indices: the Acoustic Complexity Index (ACI), Acoustic Diversity Index (ADI), Acoustic Evenness Index (AEI), Bioacoustics Index (BI), Normalised Difference Soundscape Index (NDSI), and Entropy Index (H).
To identify avian vocalisations within the acoustic recordings, I employed an automated analysis workflow using BirdNET (v2.4.0; Kahl et al., 2023). BirdNET is a deep neural network developed for the automated detection and classification of bird vocalisations, drawing on extensive global reference libraries (Kahl et al., 2021). The model can identify more than 6,000 bird species worldwide based on diagnostic spectral features in the audio signal (Kahl et al., 2023). To ensure comparability across data types, the BirdNET acoustic detections were grouped and summarised by season, cluster, and species to determine the total detection frequency per species. Subsequently, community diversity metrics were calculated following the same procedures used for the vegetation and manual point-count datasets. This consistent application of Hill numbers (q0, q1, q2) and beta diversity partitioning enabled a direct comparison of community structure and spatial heterogeneity across all three survey methods.
birdnet_processed <- birdnet_data %>%filter(Confidence >0.75) %>%#Higher confidence thresholds (0.7-0.8) will reduce the need for manual verification, while potentially missing some rare or poorly-recorded species. dplyr::select(species =`Common name`, File) %>%separate(File, into =c("Drive", "Project", "Category", "SeasonFolder","ClusterFolder", "SubFolder", "FileName"),sep ="\\\\", remove =FALSE) %>%mutate(cluster =paste0("c", str_extract(ClusterFolder, "\\d+")), season =paste(str_extract(SeasonFolder, "^[A-Za-z]{3}"),str_extract(SeasonFolder, "\\d{2}(?=\\.)|\\d{2}$") ) ) %>% dplyr::select(species, cluster, season)#group_by(species) %>% filter(n() <= 10) %>% ungroup() #Requiring a species to be detected a minimum number of times (e.g., ten detections) before including it in results can help reduce misidentifications. Confidence thresholds around 0.5 provide a balance between precision and recall at the assemblage level, while requiring a minimum number of detections per species reduces false positives from sporadic misidentifications. ##With a threshold of 0.5, the total number of species is 427. Adding a filter to remove the species detected less than 10 times reduces the species to 230.head(birdnet_processed)
# A tibble: 6 × 3
species cluster season
<chr> <chr> <chr>
1 Scarlet-chested Sunbird c1 Oct 25
2 Scarlet-chested Sunbird c1 Oct 25
3 Scarlet-chested Sunbird c1 Oct 25
4 Scarlet-chested Sunbird c1 Oct 25
5 Scarlet-chested Sunbird c1 Oct 25
6 Scarlet-chested Sunbird c1 Oct 25
Show Code
#n_distinct(birdnet_processed$species)#View(distinct(birdnet_processed, species))# Convert the long detection list into a wide Community Matrix for meta analysisbirdnet_matrix <- birdnet_processed %>%group_by(season, species, cluster) %>%summarise(detections =n(), .groups ="drop") %>%pivot_wider(names_from = cluster, values_from = detections, values_fill =0) %>%arrange(season, species)head (birdnet_matrix)
✅ Interpretation: The analysis of acoustic data across the study sites revealed substantial spatial heterogeneity in both BirdNET’s classification performance and the resulting avian community structures. The BirdNET analysis comprised 94,927 detections across 16 communities, each with equal sample coverage weighting (Wi = 0.0625). A total of 284 species were detected. Overall sample coverage was exceptionally high (0.999), indicating that the sampling effort captured nearly all species present in the regional pool. Individual community coverage values ranged from 0.992 to 0.998, consistently demonstrating near-complete inventory at each site. Total Bray–Curtis dissimilarity was 0.866, partitioned into a balanced variation component (turnover = 0.790) and an abundance gradient component (nestedness = 0.076). This pronounced dominance of turnover over nestedness reveals that compositional differences between communities are driven almost entirely by species replacement rather than differences in species richness or abundance of shared species. The results confirm a highly diverse, well-sampled assemblage in which BirdNET performed reliably, with distinct species assemblages characterising each of the 16 recording locations.
Combined dataset
These data set comprises 11 acoustic metrics: Bioacoustic Index (BI), Normalised Difference Soundscape Index (NDSI), Acoustic Diversity Index (ADI), Acoustic Evenness Index (AEI), Acoustic Complexity Index (ACI), Acoustic Entropy (SH), ultrasonic metrics, and BirdNET-derived avian diversity metrics (q0, q1, q2, D). I initially compiled 17 environmental predictors spanning vegetation structural variables (shrub volume per hectare, tree volume per hectare, percentage cover per hectare, tree basal area), rangeland plant species diversity metrics (calculated using Hill numbers: q0, q1, q2, and evenness D), bird community diversity metrics (Hill’s numbers derived from manual point counts), soil health components (PC1 and PC2), and residual slope values of remotely sensed vegetation indices (Normalized Difference Vegetation Index [NDVI], Soil Adjusted Vegetation Index [SAVI], and Bare Soil Index [BSI]).
Show Code
# source https://www.sthda.com/english/wiki/ggally-r-package-extension-to-ggplot2-for-correlation-matrix-and-survival-plots-r-software-and-data-visualization# Create the dataset# https://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/# #https://www.sthda.com/english/articles/37-model-selection-essentials-in-r/154-stepwise-regression-essentials-in-r/#https://www.sthda.com/english/articles/38-regression-model-validation/157-cross-validation-essentials-in-r/#https://www.sthda.com/english/articles/38-regression-model-validation/158-regression-model-accuracy-metrics-r-square-aic-bic-cp-and-more/# https://www.sthda.com/english/articles/40-regression-analysis/168-multiple-linear-regression-in-r/#https://www.sthda.com/english/articles/39-regression-model-diagnostics/160-multicollinearity-essentials-and-vif-in-r/# Combine all data sourcesbig.dataset <-data.frame(soil.pc, alphas.rangelands, shrub.volume.cover, tree.vol.cov.area, spatial_trends_csv, alphas.birds, alphas.birdnet, acoustic_cluster_means) %>% dplyr::select(-matches("^cluster", ignore.case =TRUE), cluster) %>%relocate(cluster)print(colnames(big.dataset))
I visualised exploratory relationships among variables using scatterplot matrices implemented with the ggpairs function (GGally package, Schloerke et al., 2025) facilitating assessment of distributional properties, non-linear patterns, outliers, and preliminary multicollinearity.
Prior to analysis, I standardised all predictor and response variables to zero mean and unit variance using the scale() function to ensure comparability of coefficients and to meet assumptions of multivariate analyses. I applied scaling separately to each group using the scale() function, which centres each variable by subtracting its mean and divides by its standard deviation, resulting in variables with mean = 0 and standard deviation = 1.
I evaluated multicollinearity among the 16 remaining predictors using the Variance Inflation Factor (VIF) with the vifstep function in the usdmpackage (Naimi, 2023). I grouped variables a priori into ecological meaningful categories to guide variable selection. I applied a threshold of VIF < 3 for variable retention. Within each group, I sequentially removed variables with VIF > 3 using a stepwise procedure (vifstep) until all remaining variables met the threshold (Zuur et al., 2010). For the remote sensing variables, which exhibited extreme multicollinearity (VIF > 10 for all three indices), I conducted principal component analysis (PCA) using prcomp() with scaling to reduce dimensionality. We retained the first principal component (remote_pc1), which explained 97.1% of the variance, for subsequent analyses.
After within-group variable selection, I combined all retained predictors into a single environmental dataset. I reassessed global multicollinearity across all variables using car::vif() (Fox & Weisberg, 2019) and applied a second VIF stepwise selection (threshold = 3) to remove any remaining collinear variables across groups. This process yielded a final set of environmental predictors with acceptable multicollinearity: rangeland plant richness (q0), tree volume per hectare, soil properties principal component 1, and the remote sensing composite (remote_pc1).
Show Code
# VIF: INDIVIDUAL GROUP PROCESSING# --- GROUP 3: RANGELAND DIVERSITY ---vif_range <-vifstep(rangelands_sc, th =3)rangelands_final <-exclude(rangelands_sc, vif_range)colnames( rangelands_final)
# If any cross-group VIFs are still > 3, remove themenv.final_vif <-vifstep(env.selected, th =3)env.final_vars <-exclude(env.selected, env.final_vif)env.final_vars <- env.final_vars %>% dplyr::select( -SH.C, -D.eveb)colnames(env.final_vars)
[1] "q0" "TR.V" "remote_pc1" "soil.pc1"
##RDA To test the overarching hypothesis that environmental gradients explain variation in both acoustic patterns and bird communities, I performed redundancy analysis (RDA) using the vegan package (Oksanen et al., 2025). I constructed a combined response matrix comprising all five acoustic indices and four BirdNET metrics, reflecting the integrated nature of soundscape ecology where acoustic patterns and bird communities represent different facets of the same ecological system. I used the final set of environmental predictors selected through the VIF procedure as explanatory variables. I assessed model significance using permutation tests with 999 permutations under the reduced model (Legendre & Legendre, 2012). I calculated adjusted R² to account for the number of predictors relative to sample size (Peres-Neto et al., 2006) and evaluated axis-specific significance using forward tests with 999 permutations.
To provide context and facilitate comparison with univariate analyses, I also conducted separate RDA analyses for acoustic indices and BirdNET metrics individually. I constructed two separate RDA models:
Acoustic model: Scaled acoustic indices (NDSI., ADI., ACI., SH., BI.) as response variables
I used the final set of environmental predictors selected through the VIF procedure as explanatory variables in both models. I assessed model significance using permutation tests with 999 permutations under the reduced model (Legendre & Legendre, 2012). I calculated adjusted R² values to account for the number of predictors relative to sample size (Peres-Neto et al., 2006). I evaluated axis-specific significance using forward tests with 999 permutations to identify which constrained ordination axes explained significant portions of the variance.
Permutation test for rda under reduced model
Permutation: free
Number of permutations: 999
Model: rda(formula = birdnet_sc ~ q0 + TR.V + remote_pc1 + soil.pc1, data = env.final_vars)
Df Variance F Pr(>F)
Model 4 1.5651 1.7676 0.164
Residual 11 2.4349
To complement the multivariate analyses and identify specific environmental drivers of individual acoustic and BirdNET metrics, I fitted separate multiple linear regression models using the lm() function. I modeled each of the nine response variables (five acoustic indices and four BirdNET-derived metrics) against the four retained environmental predictors. I assessed model assumptions through residual diagnostics, including normality of residuals (Shapiro-Wilk test), homoscedasticity (Breusch-Pagan test), and identification of influential observations (Cook’s distance threshold 4/n). I evaluated model fit using R², adjusted R², Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC).
Show Code
# REGRESSION MODELS WITH VIF-SELECTED VARIABLES#================================================================#("Model: Y ~ q0 + TR.V + soil.pc1 + remote_pc1"+)model_data <-cbind(birdnet_sc, env.final_vars, acoustics_sc)colnames(model_data)
Response term estimate std.error
Length:45 Length:45 Min. :-0.87175 Min. :0.1183
Class :character Class :character 1st Qu.:-0.20773 1st Qu.:0.1962
Mode :character Mode :character Median : 0.00000 Median :0.2319
Mean :-0.06112 Mean :0.2222
3rd Qu.: 0.09374 3rd Qu.:0.2619
Max. : 0.74039 Max. :0.2958
statistic p.value
Min. :-4.0830 Min. :0.001811
1st Qu.:-0.8151 1st Qu.:0.106888
Median : 0.0000 Median :0.550316
Mean :-0.2127 Mean :0.509151
3rd Qu.: 0.4865 3rd Qu.:0.972109
Max. : 3.2947 Max. :1.000000
# 4. Final Fit Table (Rounded)final_fit_summary <- fit_table %>%mutate(across(where(is.numeric), ~round(., 3)))cat("\n--- MODEL FIT SUMMARY (R2 & AIC) ---\n")