1Thesis introduction and expected achievements

Mycorrhizal fungi form partnerships with most terrestrial plants, playing a vital role in supporting plant growth and regulating the cycling of nutrients and carbon in forest ecosystems. Two major types of these partnerships, arbuscular mycorrhizae (AM) and ectomycorrhizae (EcM), are associated with distinct groups of trees, each influencing how forests function, how diverse they are, and how they respond to environmental changes. Despite their importance, our understanding of how these mycorrhizal associations shape tree diversity and ecosystem productivity across different landscapes and environmental gradients remains limited, mainly because current research is constrained by small-scale field studies and a lack of comprehensive, spatial data. To address these challenges, it is essential to integrate advanced observational technologies that enable the simultaneous studies of both aboveground and belowground ecological processes.In my thesis, I am going to provide a fine-scale map of mycorrhizal associations in US forests based on satellite observations, and then study on the impacts of mycorrhizal associations on forest productivity, diversity, and their ability to mitigate climate change.

Mycorrhizal fungi symbioses with plant roots, which are typically barried by the soil, How can you monitor them from satellite images?

This is a frequently raised question. Instead of observing mycorrhizae directly, I focus on tree-level proxies. Existing research, based on species-level comparisons and localized experiments, indicates that trees associated with AM and EcM fungi differ significantly in their resource-use strategies, exhibiting distinct functional traits and phenological patterns.In Chapter 1, I will provide continental-scale remote sensing evidence for this aboveground-belowground coupling, establishing a theoretical foundation for subsequent monitoring. Chapter 2 will focus on quantifying the relationships between mycorrhizal associations and optical remote sensing metrics, utilizing machine learning to retrieve the mycorrhizal composition of forests across the United States. In Chapters 3 and 4, I will investigate how shifts in forest mycorrhizal composition feedback to climate change by influencing carbon sequestration and water-energy cycles. Upon completion, this research will yield the following outcomes:

  1. An algorithmic framework for monitoring forest mycorrhizal composition based on satellite remote sensing and phenological indicators.
  2. A high-resolution distribution map of mycorrhizal associations across the Conterminous United States (CONUS).
  3. Forest ecosystem management recommendations informed by mycorrhizal composition.

2Key words

mycorrhizal associations, plant functional traits, ecosystem functioning, plant and soil feedbacks, climate mitigation, forest management

3Different findings in literature review tools

The search function in CONNECTED PAPERS at my PC seems to be broken, so that CONNECTED PAPERS is not available for me and I focus on RESEARCH RABBITS.

In my experience, CONNECTED PAPERS tends to offer more up-to-date literature, whereas RESEARCH RABBITS appears to lag, with its most recent entries dating back to 2023—nearly three years ago. Although time filters are available, they often require a paid subscription.

My search consistently points to the seminal 2013 paper, ‘The Mycorrhizal-Associated Nutrient Economy: A New Framework for Predicting Carbon-Nutrient Couplings in Temperate Forests.’ This foundational work established the theoretical framework for the relationship between mycorrhizal associations and plant traits, guiding the vast majority of subsequent studies. Two pivotal 2019 publications, ‘Global Imprint of Mycorrhizal Fungi on Whole-Plant Nutrient Economics’ and ‘Climatic Controls of Decomposition Drive the Global Biogeography of Forest-Tree Symbioses’, serve as critical bridges. These studies expanded the original framework by providing global-scale validations across a broader range of species, despite their coarse resolution. Furthermore, the 2020 review, ‘How Mycorrhizal Associations Drive Plant Population and Community Biology,’ offers a comprehensive synthesis by extensively citing previous milestones, thereby facilitating future research. These are essential, classic works that demand my repeated study and deep reflection throughout the writing of my thesis.

Figure1 -Searching results from RESEARCH RABBITS.
Figure1 -Searching results from RESEARCH RABBITS.

4Hypothesis

Figure2 -Schematic diagram illustrating differences between arbuscular mycorrhizal (AM) and ectomycorrhizal (EcM) trees in foliar traits, structural characteristics, belowground processes, and the strength of aboveground–belowground carbon and nutrient linkages.
Figure2 -Schematic diagram illustrating differences between arbuscular mycorrhizal (AM) and ectomycorrhizal (EcM) trees in foliar traits, structural characteristics, belowground processes, and the strength of aboveground–belowground carbon and nutrient linkages.
Figure3 -Schematic diagram illustrating two main hypotheses regarding the impact of mycorrhizal associations on tree species diversity: the competitive exclusion hypothesis (H1) and the niche partitioning and coexistence hypothesis (H2).
Figure3 -Schematic diagram illustrating two main hypotheses regarding the impact of mycorrhizal associations on tree species diversity: the competitive exclusion hypothesis (H1) and the niche partitioning and coexistence hypothesis (H2).

5Data

Data Source Data Type Data Access or Collection
Satellite data time-series spectural reflectance Google Earth Engine, NASA
Field collected data plot-level mycorrhizal associations, single canopy species records US Forest Inventory data (FIA), National Ecological Observatory Network (NEON)
environmental data climate conditions, soil propertities, topography TerraClimate dataset, WorldClim dataset, SoilGrids dataset, NASA SRTM digital elevation model

DATA MANAGEMENT PLAN (DMP)

YOU HAVE SELECTED OPTION A:

Data is freely available on the internet, in libraries or archives. DMP and Dataset submission are not needed. Primary supervisor approval will be sought.

The URL of your dataset retrieved online:

Satellite time-series images can be downloaded from Google Earth Engine (https://earthengine.google.com). US FIA plot data are public at https://research.fs.usda.gov/programs/fia, and single canopy species records for NEON sites can be obtained at https://zenodo.org/records/10926344. Climate conditions including temperature, precipitation, radiation can be obtained from Terraclimate dataset (https://www.climatologylab.org/terraclimate.html) and WorldClim dataset (https://worldclim.org/data/worldclim21.html). Soil properties are available from SoilGrids dataset (https://isric.org/explore/soilgrids). elevation and slope can be downloaded and calculated from SRTM dataset at NASA wetsite (https://www.earthdata.nasa.gov/data/instruments/srtm).

6Code and analysis

Analysis 1: linear mixed effects model (R pakages: lme4)

Linear mixed effects model allows me to quantify the effects of mycorrhizal associations and other covariates on leaf functional traits considering random effects including ecoregions or biomes. By comparing the standardized coefficient, I can assess the relationship between aboveground leaf features and belowground mycorrhizal associations.

example code:

library(lme4)
## Loading required package: Matrix
library(forestplot)
## Warning: package 'forestplot' was built under R version 4.5.3
## Loading required package: grid
## Loading required package: checkmate
## Warning: package 'checkmate' was built under R version 4.5.3
## Loading required package: abind
library(ggplot2)


set.seed(123)  


n_studies <- 10
n_per_study <- 20
n_total <- n_studies * n_per_study


data <- data.frame(
  study_id = factor(rep(1:n_studies, each = n_per_study)),
  treatment = factor(rep(c("Drug", "Placebo"), times = n_total/2)),
  age = runif(n_total, 30, 70),
  baseline = rnorm(n_total, 100, 15),
  outcome = NA
)


for(i in 1:n_studies) {
  
  random_intercept <- rnorm(1, 0, 5)
  
  
  treatment_effect <- ifelse(data$treatment[data$study_id == i] == "Drug", -10, 0)
  
 
  data$outcome[data$study_id == i] <- 100 + 
    random_intercept + 
    treatment_effect + 
    -0.5 * (data$age[data$study_id == i] - 50) + 
    0.3 * (data$baseline[data$study_id == i] - 100) + 
    rnorm(n_per_study, 0, 8)
}


model <- lmer(outcome ~ treatment + age + baseline + (1 | study_id), 
              data = data)


summary(model)
## Linear mixed model fit by REML ['lmerMod']
## Formula: outcome ~ treatment + age + baseline + (1 | study_id)
##    Data: data
## 
## REML criterion at convergence: 1427.6
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -2.66470 -0.62794  0.08804  0.67744  2.75903 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  study_id (Intercept) 31.42    5.606   
##  Residual             65.73    8.107   
## Number of obs: 200, groups:  study_id, 10
## 
## Fixed effects:
##                  Estimate Std. Error t value
## (Intercept)      89.73840    5.47954  16.377
## treatmentPlacebo  8.67181    1.15424   7.513
## age              -0.48401    0.05440  -8.897
## baseline          0.27525    0.04086   6.736
## 
## Correlation of Fixed Effects:
##             (Intr) trtmnP age   
## tretmntPlcb -0.211              
## age         -0.551  0.072       
## baseline    -0.786  0.094  0.060
fixed_effects <- summary(model)$coefficients



forest_data <- data.frame(
  Variable = rownames(fixed_effects),
  Estimate = fixed_effects[, 1],  
  StdError = fixed_effects[, 2],   
  CI_lower = fixed_effects[, 1] - 1.96 * fixed_effects[, 2],
  CI_upper = fixed_effects[, 1] + 1.96 * fixed_effects[, 2]
)


if(ncol(fixed_effects) >= 4) {
  forest_data$P_value <- fixed_effects[, 4]  
} else {
  
  forest_data$P_value <- 2 * (1 - pnorm(abs(forest_data$Estimate / forest_data$StdError)))
}


forest_data$CI_text <- sprintf("%.2f (%.2f, %.2f)", 
                                forest_data$Estimate, 
                                forest_data$CI_lower, 
                                forest_data$CI_upper)

label_text <- cbind(
  forest_data$Variable,
  forest_data$CI_text
)

forestplot(
  labeltext = label_text,
  mean = forest_data$Estimate,
  lower = forest_data$CI_lower,
  upper = forest_data$CI_upper,
  xlab = "Estimate (95% Confidence Interval)",
  title = "Forest Plot of Linear Mixed Effects Model",
  txt_gp = fpTxtGp(label = gpar(cex = 0.8),
                   xlab = gpar(cex = 0.9),
                   title = gpar(cex = 1.1)),
  col = fpColors(box = "royalblue", line = "darkblue", zero = "gray50"),
  zero = 0,
  boxsize = 0.2,
  lwd.ci = 2,
  ci.vertices = TRUE,
  lineheight = "auto"
)

Analysis 2: piecewise structual equation model (R pakages: piecewiseSEM)

Structural equation model is a good way to estimate the mycorrhizal associations on forest ecosystem function via different pathways. By comparing the pathway effects, I can figure out to what extent the ecosystem function (e.g. productivity) is affected by mycorrhizal associations through each pathway.

example code:

if (!require("piecewiseSEM")) install.packages("piecewiseSEM")
## Loading required package: piecewiseSEM
## 
##   This is piecewiseSEM version 2.3.0.2.
## 
## 
##   Questions or bugs can be addressed to <jslefche@gmail.com>.
if (!require("lme4")) install.packages("lme4")

library(piecewiseSEM)
library(lme4)

set.seed(123)

n <- 100

soil_nitrogen <- rnorm(n, 50, 10)
soil_moisture <- rnorm(n, 60, 15)
elevation <- rnorm(n, 500, 100)

plant_biomass <- 0.5 * soil_nitrogen + 0.3 * soil_moisture + rnorm(n, 0, 5)
plant_diversity <- -0.4 * scale(elevation) + 0.2 * plant_biomass + rnorm(n, 0, 2)
herbivore_abundance <- 0.6 * plant_biomass + 0.3 * plant_diversity + rnorm(n, 0, 3)
predator_abundance <- 0.7 * herbivore_abundance - 0.2 * scale(elevation) + rnorm(n, 0, 2)

data <- data.frame(
  soil_nitrogen = soil_nitrogen,
  soil_moisture = soil_moisture,
  elevation = elevation,
  plant_biomass = plant_biomass,
  plant_diversity = plant_diversity,
  herbivore_abundance = herbivore_abundance,
  predator_abundance = predator_abundance
)

model <- psem(
  lm(plant_biomass ~ soil_nitrogen + soil_moisture, data = data),
  lm(plant_diversity ~ elevation + plant_biomass, data = data),
  lm(herbivore_abundance ~ plant_biomass + plant_diversity, data = data),
  lm(predator_abundance ~ herbivore_abundance + elevation, data = data)
)
## Warning: the 'nobars' function has moved to the reformulas package. Please update your imports, or ask an upstream package maintainter to do so.
## This warning is displayed once per session.
summary(model)
##   |                                                                              |                                                                      |   0%  |                                                                              |=======                                                               |  10%  |                                                                              |==============                                                        |  20%  |                                                                              |=====================                                                 |  30%  |                                                                              |============================                                          |  40%  |                                                                              |===================================                                   |  50%  |                                                                              |==========================================                            |  60%  |                                                                              |=================================================                     |  70%  |                                                                              |========================================================              |  80%  |                                                                              |===============================================================       |  90%  |                                                                              |======================================================================| 100%
## 
## Structural Equation Model of model 
## 
## Call:
##   plant_biomass ~ soil_nitrogen + soil_moisture
##   plant_diversity ~ elevation + plant_biomass
##   herbivore_abundance ~ plant_biomass + plant_diversity
##   predator_abundance ~ herbivore_abundance + elevation
## 
##     AIC
##  1971.832
## 
## ---
## Tests of directed separation:
## 
##                               Independ.Claim Test.Type DF Crit.Value P.Value 
##        plant_diversity ~ soil_nitrogen + ...      coef 96    -1.1498  0.2531 
##    herbivore_abundance ~ soil_nitrogen + ...      coef 96    -0.3172  0.7518 
##     predator_abundance ~ soil_nitrogen + ...      coef 96    -0.1522  0.8793 
##        plant_diversity ~ soil_moisture + ...      coef 96    -0.2772  0.7822 
##    herbivore_abundance ~ soil_moisture + ...      coef 96     1.6648  0.0992 
##     predator_abundance ~ soil_moisture + ...      coef 96     1.2008  0.2328 
##              plant_biomass ~ elevation + ...      coef 96    -0.5113  0.6103 
##        herbivore_abundance ~ elevation + ...      coef 96     0.6502  0.5171 
##     predator_abundance ~ plant_biomass + ...      coef 94     0.0980  0.9221 
##   predator_abundance ~ plant_diversity + ...      coef 95     0.1561  0.8763 
## 
## --
## Global goodness-of-fit:
## 
## Chi-Squared = 7.582 with P-value = 0.67 and on 10 degrees of freedom
## Fisher's C = 14.336 with P-value = 0.813 and on 20 degrees of freedom
## 
## ---
## Coefficients:
## 
##              Response           Predictor Estimate Std.Error DF Crit.Value
##         plant_biomass       soil_nitrogen   0.4761    0.0577 97     8.2459
##         plant_biomass       soil_moisture   0.3150    0.0363 97     8.6679
##       plant_diversity           elevation  -0.0051    0.0021 97    -2.4402
##       plant_diversity       plant_biomass   0.1518    0.0246 97     6.1727
##   herbivore_abundance       plant_biomass   0.5468    0.0412 97    13.2660
##   herbivore_abundance     plant_diversity   0.5762    0.1402 97     4.1089
##    predator_abundance herbivore_abundance   0.6835    0.0353 97    19.3638
##    predator_abundance           elevation  -0.0020    0.0022 97    -0.9122
##   P.Value Std.Estimate    
##    0.0000       0.5404 ***
##    0.0000       0.5681 ***
##    0.0165      -0.2041   *
##    0.0000       0.5163 ***
##    0.0000       0.7411 ***
##    0.0001       0.2295 ***
##    0.0000       0.8883 ***
##    0.3639      -0.0418    
## 
##   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
## 
## ---
## Individual R-squared:
## 
##              Response method R.squared
##         plant_biomass   none      0.58
##       plant_diversity   none      0.33
##   herbivore_abundance   none      0.78
##    predator_abundance   none      0.80
plot(model)

Analysis 3: explainable machine learning (R packages: randomforest)

Machine learning methods can capture the complicate and non-linear relationship between the predict variables and response variable, providing a robust way to mapping the response variable.