Intermediary report JRC

Author

Beillouin Damien

Objective of the study

The general objective of the proposed meta-analysis is to explore the relationship between the share of landscape features at different spatial scales and the abundance and richness of pollinators (namely bees, bumblebees, hoverflies and butterflies). We aim to characterize the shape of this relationship and to determine whether thresholds can be established (for instance, the share of landscape features beyond which no further significant increase in pollinators can be expected).

Based on the objective of the study, several hypotheses can be proposed to guide the meta-analysis:

Threshold Effect of Landscape Feature Share on Pollinator Abundance and Richness:

Hypothesis: There exists a threshold of landscape feature share (e.g., flower strips, hedgerows, woodland, etc.) beyond which increases in pollinator abundance and richness plateau or show diminishing returns. This would suggest that once a certain percentage of landscape features are present, additional landscape features do not significantly contribute to pollinator diversity or abundance.

Spatial Scale Influence on Pollinator Response:

Hypothesis: The effect of landscape features on pollinator abundance and richness varies depending on the spatial scale (e.g., farm-level vs. landscape-level). Smaller-scale features may have a stronger impact on local populations, while larger-scale landscape features may have a more pronounced effect on species richness due to increased connectivity and habitat availability.

Species-Specific Response to Landscape Features:

Hypothesis: Different pollinator species (bees, bumblebees, hoverflies, butterflies) respond differently to the presence of landscape features. For example, bees and bumblebees may be more sensitive to floral resources, while hoverflies and butterflies may be more influenced by habitat structure and diversity.
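
To make the threshold hypothesis concrete, one option is a linear-plateau ("broken-stick") response: richness rises linearly with landscape feature share up to a breakpoint and stays flat beyond it. The sketch below only illustrates that idea. The function `fit_plateau`, the grid-search approach, and the numbers are assumptions made for the example; they are not the method or the data used in this report.

```python
from statistics import mean

def fit_plateau(x, y, candidates):
    """Grid-search a linear-plateau model: y = a + b * min(x, t).

    For each candidate threshold t, fit a and b by ordinary least squares
    on the clipped predictor min(x, t) and keep the t with the lowest
    sum of squared errors. Returns (t, a, b, sse), or None if no fit is possible.
    """
    best = None
    for t in candidates:
        z = [min(xi, t) for xi in x]
        z_bar, y_bar = mean(z), mean(y)
        szz = sum((zi - z_bar) ** 2 for zi in z)
        if szz == 0:
            continue  # every point clipped to the same value: slope undefined
        b = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / szz
        a = y_bar - b * z_bar
        sse = sum((yi - (a + b * zi)) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[3]:
            best = (t, a, b, sse)
    return best

# Illustrative data only (not from the database): richness rises until
# about 20 % landscape feature cover, then levels off.
share = [0, 5, 10, 15, 20, 25, 30, 35, 40]
richness = [2.0, 4.0, 6.0, 8.0, 10.0, 10.0, 10.0, 10.0, 10.0]

threshold, intercept, slope, sse = fit_plateau(share, richness, range(5, 45, 5))
print(threshold, round(intercept, 2), round(slope, 2))
```

With real data the candidate thresholds would have to be chosen from the observed range of cover values, and the uncertainty around the estimated breakpoint would need to be assessed before any threshold is reported.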

Overview of Methods

Search strings: TODO (Ana, Irene, Tulie)

First screening: TODO (Ana, Irene, Tulie)

Second Screening and Data Extraction

Following the initial screening, 65 papers were deemed suitable for detailed analysis. A thorough review of each paper, including supplementary materials, was conducted based on the following inclusion criteria:

Pollinator Species Counts: Presence of count data for pollinator species (specifically bees, flies, and butterflies) for each transect or aggregated at the seasonal level across multiple transect rounds or other sampling methods. Studies providing averaged data across multiple plots were directly accepted.
Landscape Element Information: Availability of data on landscape elements, particularly surface areas within various landscape features at different buffer radii. Priority was given to elements such as woody areas, grasslands, croplands, water bodies, and the sum of semi-natural habitats. Localized information (i.e., plot or farm-level data, such as surface area or proportion of semi-natural habitat) was also extracted when available.
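
When a study reports the surface area of a landscape element within a circular buffer rather than a percentage, the two can be related through the buffer area. The following is a minimal sketch, assuming circular buffers and areas expressed in square metres; `percent_cover` is a hypothetical helper and the input values are made up.

```python
import math

def percent_cover(area_m2, radius_m):
    """Share (%) of a circular buffer occupied by a landscape element."""
    if radius_m <= 0:
        raise ValueError("radius_m must be positive")
    buffer_area = math.pi * radius_m ** 2
    if not 0 <= area_m2 <= buffer_area:
        raise ValueError("area_m2 must lie between 0 and the buffer area")
    return 100.0 * area_m2 / buffer_area

# Hypothetical example: 5 ha (50,000 m2) of woody habitat within a 600 m buffer.
woody_pct = percent_cover(area_m2=50_000, radius_m=600)
print(round(woody_pct, 2))
```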

The analysis of these 65 papers yielded the following results:

Distribution of Data Extraction Results After the Second Screening.

Due to the limited availability of fine-grained data in supplementary materials and associated data papers, a strategy was implemented to contact authors directly, requesting comprehensive data access in exchange for co-authorship. Each email was personalized, explicitly identifying the relevant data and/or figures that were incomplete within the respective papers.

Of the 21 emails sent, 2 were undeliverable due to outdated contact information. Nine authors responded positively and offered to provide their data. The remaining authors have not yet responded or require follow-up.

This strategy has significantly expanded the geographical coverage and enriched the database by integrating resources from various institutions:

University of Oviedo, Spain
Kazimierz Wielki University, Poland
California Department of Pesticide Regulation, USA
INRAE Avignon
Facultad de Agronomía, Universidad de Buenos Aires
Instituto Multidisciplinario de Biología Vegetal, UNC, CONICET, FCEFyN, Argentina
Department of Land Resources and Environmental Sciences, Montana State University, Bozeman, MT
Centre for Environmental and Climate Research, Lund University, Sweden

-> See details of the mails: https://docs.google.com/document/d/1jJIcQoO81IjK8yhV4bvorLfqR_D5kMH6vwgQX5n3gps/edit?tab=t.0

Additional papers to consider

Although not initially required, I extended the literature search using snowballing to identify additional relevant articles that might contribute to our analysis. This method allowed me to trace key references from selected papers and uncover potentially useful studies. So far, I have identified nine potential articles, listed below, that could be considered for inclusion.

- Brosi, Berry J., et al. “The effects of forest fragmentation on bee communities in tropical countryside.” Journal of Applied Ecology 45.3 (2008): 773-783.
- Bos, Merijn M., et al. “Insect diversity responses to forest conversion and agroforestry management.” Stability of Tropical Rainforest Margins: Linking Ecological, Economic and Social Constraints of Land Use and Conservation (2007): 277-294.
- Blaauw, Brett R., and Rufus Isaacs. “Flower plantings increase wild bee abundance and the pollination services provided to a pollination-dependent crop.” Journal of Applied Ecology 51.4 (2014): 890-898.
- Jha, S. and Vandermeer, J.H., 2009. Contrasting bee foraging in response to resource scale and local habitat management. Oikos, 118(8), pp.1174-1180. (if we consider canopy cover as a proxy for woody proportion)
- Baños-Picón, L., Torres, F., Tormos, J., Gayubo, S.F. and Asís, J.D., 2013. Comparison of two Mediterranean crop systems: Polycrop favours trap-nesting solitary bees over monocrop. Basic and Applied Ecology, 14(3), pp.255-262.
- Tylianakis, Jason M., et al. “Spatial scale of observation affects α, β and γ diversity of cavity-nesting bees and wasps across a tropical land-use gradient.” Journal of Biogeography 33.7 (2006): 1295-1304. (maybe?)
- Kremen, Claire, Neal M. Williams, and Robbin W. Thorp. “Crop pollination from native bees at risk from agricultural intensification.” Proceedings of the National Academy of Sciences 99.26 (2002): 16812-16816. (but rough characterization of the landscape)
- Pollination services from field-scale agricultural diversification may be context-dependent
- Perfecto, Ivette, et al. “Conservation of biodiversity in coffee agroecosystems: a tri-taxa comparison in southern Mexico.” Biodiversity & Conservation 12 (2003): 1239-1252. (Butterflies: previously rejected as “not on target pollinators.” Should we keep or reject it?)

Database structure

Following collaborative discussions with Ana, Irene, and Talie, a strategic restructuring of the project database has been implemented to optimize data entry, ensure data integrity, and align with project requirements. The primary objective was to simplify and secure data entry without compromising information richness.

Specifically, we have enhanced the capture of landscape element percentage data. We will now prioritize recording the following information, when available: proportion of woody areas, grassland, water elements, cropland, and the total proportion of semi-natural habitats.

Regarding pollinator species, we have confirmed that common bees will be excluded from the dataset. Where possible, we will record the individual genera of bees, butterflies, and hoverflies. This level of detail is needed for coherent future analyses, as some articles focus on specific genera while others consider broader groups.

Key Improvements Implemented:

Modular Database Structure: Metadata and quantitative/contextual data have been segregated into distinct sheets (DB_meta_data and DB_database), enhancing user-friendliness and clarity.
Integrated Calculation Functionality: Basic calculations have been embedded directly within the Google Sheets to minimize data transfer errors between R and Excel, particularly addressing issues arising from generic formulas when dealing with varying plot repetitions.
Enhanced Visual Accessibility: Color-coding has been introduced to streamline data entry, making the process more intuitive and visually accessible.
Optimized Identifier Placement: Article and plot identifiers have been repositioned to the beginning of rows, facilitating efficient data navigation.
Platform Migration and Performance Optimization: The database has been migrated from SharePoint to Google Sheets to resolve performance issues associated with slow data access and processing.
Standardized Data Orientation: Quantitative data (e.g., richness and abundance) has been restructured into columns, aligning with established scientific publication conventions and improving data readability.
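
As an illustration of the row layout described above (identifiers first, one column per quantitative variable), a single database row could be represented and checked as follows. This is only a sketch: apart from `pol_taxa_name`, which appears in the model call later in this report, the field names and the `validate` helper are hypothetical and do not reproduce the actual column names of the DB_database sheet.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical column names for the landscape element percentages.
PERCENT_FIELDS = ("pct_woody", "pct_grassland", "pct_water",
                  "pct_cropland", "pct_semi_natural")

@dataclass
class PlotRecord:
    # Identifiers come first, mirroring their position at the start of each row.
    article_id: str
    plot_id: str
    pol_taxa_name: str
    abundance: Optional[float] = None
    richness: Optional[float] = None
    pct_woody: Optional[float] = None
    pct_grassland: Optional[float] = None
    pct_water: Optional[float] = None
    pct_cropland: Optional[float] = None
    pct_semi_natural: Optional[float] = None

def validate(record: PlotRecord) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name in PERCENT_FIELDS:
        value = getattr(record, name)
        if value is not None and not 0 <= value <= 100:
            problems.append(f"{name} out of range: {value}")
    for name in ("abundance", "richness"):
        value = getattr(record, name)
        if value is not None and value < 0:
            problems.append(f"{name} is negative: {value}")
    return problems

# Made-up rows for illustration.
good = PlotRecord("A001", "A001_P01", "bombus", abundance=14, richness=3, pct_woody=12.5)
bad = PlotRecord("A002", "A002_P01", "andrena", pct_cropland=130.0)
print(validate(good))
print(validate(bad))
```

Missing values are kept as `None` rather than zero, so that an unreported landscape element is not mistaken for an absent one.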

Implementation Details:

The refined database structure can be accessed at: https://docs.google.com/spreadsheets/d/1clyZc0lTWoq_ubkC-ME_DaMieg5MyGpAAkgoGNNu44o/edit?gid=1871607765#gid=1871607765

Specifically, please refer to the “DB_meta_data” and “DB_database” sheets.

Integration of Environmental Data: Soil and Climate (and landscape features?)

We have integrated high-resolution environmental datasets from SoilGrids and WorldClim to enhance the accuracy of our analyses.

SoilGrids (Hengl et al., 2017): Provides global data on soil properties (organic carbon, pH, density, texture) at multiple depths, based on machine learning models. This data allows for the understanding of soil variability.
WorldClim (Fick & Hijmans, 2017): Offers climatic data (temperature, precipitation) from weather stations and climate models. This data is essential for analyzing regional climate variations.

Potential additional data:

- Global Land Cover (GLC) 30m: 30 m resolution land cover data, usable with the raster R package, to calculate landscape metrics (diversity, fragmentation).
- Landscape Dynamics and Fragmentation Database (LDI): landscape fragmentation metrics, to be verified for global coverage.
- landscapemetrics R package: allows the calculation of over 60 landscape metrics.
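
As an example of the kind of landscape diversity metric mentioned above, the Shannon diversity index can be computed directly from the cover proportions already recorded in the database, H = -Σ p_i ln(p_i). The sketch below is a hand-written illustration with made-up values; it does not call the landscapemetrics package, and `shannon_diversity` is a hypothetical helper.

```python
import math

def shannon_diversity(cover):
    """Shannon index H = -sum(p * ln p) over the non-zero cover classes."""
    total = sum(cover.values())
    if total <= 0:
        raise ValueError("at least one cover class must be present")
    h = 0.0
    for share in cover.values():
        p = share / total  # renormalise so that the shares sum to 1
        if p > 0:
            h -= p * math.log(p)
    return h

# Made-up example: four cover classes with equal shares.
even_cover = {"woody": 25.0, "grassland": 25.0, "water": 25.0, "cropland": 25.0}
print(round(shannon_diversity(even_cover), 3))
```

The index is zero when a single class covers the whole buffer and reaches its maximum, ln(number of classes), when all classes have equal shares.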

The integration of these data into our database enables meta-regression analyses to assess the impact of soil and climate factors on study outcomes. This allows us to account for environmental heterogeneity and improve the reliability of our results. We can thus analyze the interaction between environmental factors and agricultural practices.

Result: Database

The database currently comprises 4,564 entries, each documenting pollinator abundance and richness. When available in the original studies, certain entries further refine these variables at the genus level, enhancing the granularity of the dataset.

Geographical analysis

The dataset spans 1,701 experimental sites, though its geographic coverage remains relatively constrained, as a significant proportion of the data originates from national monitoring programs primarily designed for pest population tracking rather than comprehensive pollinator assessments.

[1] 1701

Geographical distribution of the studies (not all studies provided GPS coordinates).

Most of the studies are located in biomes such as temperate rainforests and woodland-shrubland areas.

Temporal analysis

The data is relatively recent, partly due to the limited availability of older datasets and the low response rate from authors of older articles. Indeed, older studies rarely provide their underlying data in supplementary materials or data papers.

Type of pollinators

For some articles, we have details on the following genera:

 [1] "agapostemon"                 "andrena"                    
 [3] "apis"                        "augochlora"                 
 [5] "augochlorella"               "bombus"                     
 [7] "ceratina"                    "colletes"                   
 [9] "halictus"                    "heriades"                   
[11] "holcopasites"                "hoplitis"                   
[13] "hylaeus"                     "lasioglossum"               
[15] "osmia"                       "sphecodes"                  
[17] "exoneura sub-genus exoneura" "homalictus"                 
[19] "lipotriches"                 "anthidiellum"               
[21] "anthidium"                   "anthophora"                 
[23] "dasypoda"                    "epeolus"                    
[25] "eucera"                      "megachile"                  
[27] "melitta"                     "nomada"                     
[29] "panurgus"                    "rophites"                   
[31] "rophitoides"                 "xylocopa"                   
[33] "chelostoma"                  "epeoloides"                 
[35] "macropis"

Experimental Design

Landscape features

Landscape features were characterized across five buffer zones (<600 m, 600-1000 m, …). The data distribution is heterogeneous, with fewer observations available for the larger buffers. In addition, more information is available for arable land cover than for other landscape elements.

Example of the distribution of data on the percentage of semi-natural elements within a buffer of less than 600 meters

Quick and Exploratory Analysis of the Relationship Between Abundance/richness and Landscape Features

For Abundance:

This preliminary analysis provides a rapid and exploratory assessment of how species abundance relates to different landscape features. At this stage, the abundance values are not standardized by transect duration or size, meaning that the results should be interpreted with caution. This analysis serves as an initial exploration to identify potential patterns and guide further, more rigorous investigations.

For Richness:

This initial analysis explores the relationship between species richness and various landscape features in a rapid and exploratory manner. At this stage, richness values are not yet standardized by sampling effort, so the results should be interpreted with caution. The goal is to identify potential patterns that could inform more in-depth analyses and refined methodological approaches.
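
The standardization by sampling effort that is still missing could, for transect counts, take the form of a density per unit area and unit time. The sketch below assumes that transect length, width and observation duration are available for each record. The reference units (100 m2 and 10 minutes) and the function name `standardise_count` are arbitrary choices made for the illustration.

```python
def standardise_count(count, length_m, width_m, duration_min):
    """Individuals per 100 m2 of transect per 10 minutes of observation."""
    if length_m <= 0 or width_m <= 0 or duration_min <= 0:
        raise ValueError("transect dimensions and duration must be positive")
    area_units = (length_m * width_m) / 100.0  # number of 100 m2 units sampled
    time_units = duration_min / 10.0           # number of 10-minute units observed
    return count / area_units / time_units

# Made-up example: 12 individuals on a 50 m x 2 m transect observed for 15 minutes.
rate = standardise_count(count=12, length_m=50, width_m=2, duration_min=15)
print(rate)
```

Other sampling methods (pan traps, for example) would need their own effort measure, so this conversion only applies to records where transect dimensions and duration are reported.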

Preliminary Application of Random Forest for Pollinator Prediction

This analysis represents an initial exploration of the relationship between landscape features and pollinator abundance and richness. At this stage, the dataset is incomplete, and several preprocessing steps remain necessary, including outlier detection, error correction, and variable normalization. The goal here is not to produce definitive results but to begin testing the feasibility of predictive modeling approaches.

Why Use a Random Forest Model?

The relationship between landscape features and pollinator abundance/richness is complex and influenced by multiple interacting factors, including spatial scale, habitat diversity, and species-specific preferences. Traditional statistical models, such as linear regressions, may struggle to capture such nonlinear dependencies and interactions.

Random Forest (RF) is a robust machine learning method particularly suited for this type of ecological analysis because:

It handles complex interactions between multiple predictors without requiring explicit specification.
It identifies nonlinear relationships, which is crucial given the potential threshold effects in pollinator responses to landscape features.
It provides variable importance measures, helping to determine which landscape variables are most influential in shaping pollinator abundance and richness.

By applying RF, we can evaluate how well different landscape attributes predict pollinator abundance and richness while accounting for potential nonlinearities and hierarchical effects across spatial scales.

How Does This Contribute to the Study’s Hypotheses?

Threshold Effect Hypothesis: The model can detect diminishing returns in pollinator richness beyond a certain share of landscape features. If a plateau in predictions is observed, it supports the idea that additional habitat features may not significantly enhance pollinator diversity.
Spatial Scale Influence Hypothesis: By testing the predictive power of variables at different scales (e.g., local vs. landscape-level coverage), we can determine whether pollinators respond more strongly to small-scale habitat patches or broader landscape connectivity.
Species-Specific Response Hypothesis: By running models separately for different taxa (bees, hoverflies, butterflies), we can assess whether certain pollinators are more dependent on specific landscape attributes, helping refine conservation and land management strategies.


Call:
 randomForest(formula = Richness_raw ~ LF_cover_per_r600 + LF_typology +      pol_taxa_name, data = train_data, ntree = 500) 
               Type of random forest: regression
                     Number of trees: 500
No. of variables tried at each split: 1

          Mean of squared residuals: 65.18229
                    % Var explained: 3.7

Root Mean Squared Error (RMSE):  12.28895

R-squared:  0.05492683
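
For reference, the two performance measures reported above can be computed from observed and predicted values as shown in the sketch below. In the randomForest summary, "Mean of squared residuals" and "% Var explained" are based on out-of-bag predictions; the square root of 65.18 is about 8.07, so the RMSE of 12.29 was presumably computed on a separate set of observations. The report does not state how the R-squared was obtained; the sketch assumes the definition 1 - SSE/SST, and the numbers are made up.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst

# Made-up richness values for illustration.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [4.0, 4.0, 8.0, 8.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```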

Next, we examine the Partial Dependence Plots (PDPs), which illustrate the functional relationship between habitat characteristics and predicted pollinator richness. Each plot shows how the model's average prediction changes with one landscape feature while averaging over the observed values of the other variables. Given the low share of variance explained by the current model (3.7 %), these curves should be read as exploratory.

Preparation of a new explainer is initiated
  -> model label       :  random forest 
  -> data              :  2796  rows  5  cols 
  -> data              :  tibble converted into a data.frame 
  -> target variable   :  2796  values 
  -> predict function  :  yhat.workflow  will be used (  default  )
  -> predicted values  :  No value for predict function target column. (  default  )
  -> model_info        :  package tidymodels , ver. 1.2.0 , task regression (  default  ) 
  -> predicted values  :  numerical, min =  6.302748 , mean =  8.351877 , max =  27.58878  
  -> residual function :  difference between y and yhat (  default  )
  -> residuals         :  numerical, min =  -15.83564 , mean =  0.002312741 , max =  218.7422  
  A new explainer has been created!
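
The principle behind a partial dependence curve can be sketched in a few lines: for each value on a grid, the feature of interest is set to that value in every row, and the model predictions are averaged. The code below is a generic illustration and not the explainer used above. `toy_predict` is a made-up stand-in for the fitted model, and only the column names `LF_cover_per_r600` and `pol_taxa_name` are taken from the model call in this report.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every row with the feature of interest overwritten.
        modified = [{**row, feature: value} for row in rows]
        curve.append((value, mean(predict(row) for row in modified)))
    return curve

def toy_predict(row):
    """Made-up model: richness rises with cover up to 20 %, then plateaus."""
    bonus = 2.0 if row["pol_taxa_name"] == "bees" else 0.0
    return 5.0 + 0.2 * min(row["LF_cover_per_r600"], 20) + bonus

# Made-up rows for illustration.
rows = [
    {"LF_cover_per_r600": 5, "pol_taxa_name": "bees"},
    {"LF_cover_per_r600": 30, "pol_taxa_name": "butterflies"},
]
curve = partial_dependence(toy_predict, rows, "LF_cover_per_r600", [0, 10, 20, 40])
for value, prediction in curve:
    print(value, prediction)
```

A flat segment at the upper end of such a curve is what the threshold hypothesis would predict, provided the model fits the data well enough for the curve to be meaningful.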

Conclusion and Next Steps

The database structure has been established, and key environmental datasets have been integrated. However, several tasks remain to finalize the dataset and proceed with analyses. Given the time already allocated to the initial phases (approximately two-thirds of the total effort), prioritization is necessary to ensure efficient completion of the remaining work.

Remaining Tasks

- Finalizing Data Collection

    * Integrate additional studies and metadata.

    * Process author responses and incorporate submitted datasets.

- Data Verification and Standardization

    *  Check for consistency and completeness of extracted variables.

    *  Harmonize classifications and ensure comparability across studies.

- Spatial Data Integration

    * Incorporate additional landscape metrics where relevant.

    * Ensure alignment between environmental datasets (soil, climate, landscape).

- Refining Hypotheses and Analytical Framework

    * Verify that the available data support planned hypotheses.

    * Assess methodological choices for integrating environmental covariates in meta-regression.

    * Plan sensitivity analyses to evaluate the impact of data limitations.

These steps will ensure that the dataset is complete and standardized before moving to the analytical phase.
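
One simple form the planned sensitivity analyses could take is a leave-one-study-out check, in which a summary statistic is recomputed with each study removed in turn to see whether any single study dominates the result. The sketch below uses an unweighted mean and made-up values purely for illustration. A real meta-analysis would use weighted effect sizes, and `leave_one_study_out` is a hypothetical helper.

```python
from statistics import mean

def leave_one_study_out(values_by_study):
    """Recompute the overall mean with each study removed in turn."""
    if len(values_by_study) < 2:
        raise ValueError("at least two studies are needed")
    results = {}
    for left_out in values_by_study:
        remaining = [
            value
            for study, values in values_by_study.items()
            if study != left_out
            for value in values
        ]
        results[left_out] = mean(remaining)
    return results

# Made-up richness values grouped by study identifier.
values_by_study = {"A": [1.0, 3.0], "B": [2.0], "C": [10.0]}
loo = leave_one_study_out(values_by_study)
for study, value in loo.items():
    print(study, round(value, 2))
```

In this made-up example, removing study C changes the mean far more than removing A or B, which is the kind of influence such a check is meant to reveal.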