Assignment - Meta-Analysis

Sat Nov 15 20:52:22 2025

Background

In the Meeting of Litters project, we collected leaf litter from 33 global change experiments in arctic, subarctic and alpine ecosystems throughout the globe. We had the following global change experiments in our project:

Experiment	Amount
Warming	24 experiments
Nutrient enrichment	16 experiments
CO2 enrichment	2 experiments
UV-B irradiation	4 experiments
Watering	4 experiments
Shading	3 experiments
Ambient / Control	33 experiments

All litter was processed in our lab in Amsterdam and samples were analysed for chemical compounds (litter carbon, litter nitrogen, litter phenols) and litter bags were prepared. All litter bags were incubated in a common environment: a litter bed in Northern Sweden.

Aims

In this assignment, we will:

Evaluate the assumptions of the meta-analysis and;
Run a meta-analysis on parts of the dataset.

While the current set-up is not a classic set-up for a meta-analysis, because it is based on materials collected from experiments described in various publications instead of re-evaluating data from publications, we may still evaluate whether it complies to the assumptions and principles of a meta-analysis.

1. Does this set-up accurately deal with the criticisms that might be raised for running a meta-analysis?

In order words: is the set-up appropriate for a meta-analysis based on the three criteria outlined in the previous assignments? Please provide arguments.

The three criteria referred to are 1) publication bias, 2) selection bias, and 3) heterogeneity from combining incomparable studies.

Publication bias
Instead of relying on published literature where significant or positive results are more likely to appear in journals, the researchers directly collected leaf litter samples from all participating experiments regardless of their original findings. This means that experiments showing null results or negative effects are equally represented in the dataset, avoiding the “file drawer problem” - where unpublished negative results remain hidden.

Selection bias
The researchers made efforts to include all global change experiments in arctic, subarctic and alpine ecosystems by approaching all such experiments to participate. This systematic inclusion criterion reduces arbitrary selection. However, there is still a potential for a selection bias, given geographic gaps that could introduce bias: there hardly information from Siberia and no data from the Southern Hemisphere. These geographic limitations mean that the results may not fully capture the global diversity of alpine and subarctic ecosystem responses. These potential biases must be explicitly accounted for when interpreting and generalizing the findings.

Running the meta-analysis
Traditional meta-analyses face the “apples to oranges” problem of different protocols, measurement techniques and environmental conditions making direct comparisons problematic. The Meeting of Litters project tackles this through a common garden approach: all litter samples were analyzed using standardized protocols in the same Amsterdam laboratory, and all decomposition measurements occurred in a single common environment in Northern Sweden. This is a major advantage that may remove an important source of between-study heterogeneity. Like other meta-analyses, the project still averages many responses across different species, functional types and original experimental conditions, which must be done carefully by directly linking treatment to ambient conditions for each experiment. This will be achieved by choosing an appropriate effect size and running the model properly. The common protocol substantially reduces methodological variation, so if no consistent pattern emerges across global change treatments, we can more confidently conclude there is genuinely no generic response—important information for climate models—rather than suspecting methodological differences obscured real patterns. However, assessing heterogeneity remains crucial before expressing such conclusions, as remaining variability must still be evaluated even with standardized protocols.

There is no time to fully explore the complete dataset, therefore we selected a subset of the data available for the purpose of this assignment; see the data file. Indicated are:
a) Site at which the litter was collected;
b) Species;
c) Plant functional Type (PFT) or growth form to which a species belongs (e.g. tree, shrub, herb);
d) The means, standard deviation and number of observations for a number of trait-treatment combinations.

Selected traits are %C of the litter, %N of the litter, C/N ratio of the litter and the measured decomposition constant k at the common litter bed (i.e. a measure of how quickly did the litter decompose). These traits may be considered potential response variables. As treatments, i.e. explanatory variables, we selected only the ambient, warming and fertilization treatments to allow interpretation within the context of this assignment. The headers of the data file indicate the combination of trait, measure (i.e. mean value, standard deviation or the number of observations available) and treatment (i.e. ambient, fertilization treatment or warming treatment) and is assumed to be self-explanatory.

2. Phrase the research question and hypothesis that you will try to answer with this particular dataset.

There are various research questions possible, either focusing on C, N, C/N or on the decomposition constant k, on the effects of warming vs fertilization or on the modification of responses due to different Plant functional Types or combinations of those. I chose: “Does fertilization consistently affect %N in the litter?”, which is one of the simpler questions. The hypothesis is that fertilization should increase %N, assuming that N is the limiting factor for growth.

Research Question: Does warming consistently affect the decomposition constant k across different plant functional types in arctic, subarctic and alpine ecosystems?

Hypothesis: Warming should increase decomposition rates (higher k values) because elevated temperatures accelerate microbial activity and enzymatic processes that break down organic matter. However, the magnitude of this effect may vary across plant functional types, with recalcitrant tissues from woody species (shrubs, trees) showing weaker responses compared to more labile herbaceous litter, due to differences in chemical composition and structural complexity that create varying degrees of temperature sensitivity in decomposition processes.

3. Read the meta-analysis text file into R and explore your data

In addition to the various examples of explorations from previous assumptions, you may also consider the following as a way to summarize and visualise patterns quickly:

# loading data
litter <- read.table("datafile-meta-analysis.txt", sep="\t", header=TRUE)

# overview of data
dim(litter)

## [1] 84 39

# head(litter)
# tail(litter)
summary(litter)

##    Sitecode         Speciescode            PFT              Nmean_amb     
##  Length:84          Length:84          Length:84          Min.   :0.3875  
##  Class :character   Class :character   Class :character   1st Qu.:0.7655  
##  Mode  :character   Mode  :character   Mode  :character   Median :0.9348  
##                                                           Mean   :0.9809  
##                                                           3rd Qu.:1.1187  
##                                                           Max.   :1.9039  
##                                                                           
##    Nstdev_amb         Nobs_amb        Cmean_amb       Cstdev_amb    
##  Min.   :0.00000   Min.   : 1.000   Min.   :37.81   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.: 1.000   1st Qu.:45.17   1st Qu.:0.0000  
##  Median :0.05410   Median : 3.500   Median :46.90   Median :0.2916  
##  Mean   :0.07636   Mean   : 2.988   Mean   :47.44   Mean   :0.4664  
##  3rd Qu.:0.13261   3rd Qu.: 4.000   3rd Qu.:49.99   3rd Qu.:0.7627  
##  Max.   :0.40812   Max.   :11.000   Max.   :56.06   Max.   :3.0003  
##                                                                     
##     Cobs_amb        CNmean_amb      CNstdev_amb       CNobs_amb     
##  Min.   : 1.000   Min.   : 23.38   Min.   : 0.000   Min.   : 1.000  
##  1st Qu.: 1.000   1st Qu.: 42.84   1st Qu.: 0.000   1st Qu.: 1.000  
##  Median : 3.500   Median : 51.60   Median : 2.603   Median : 3.500  
##  Mean   : 2.988   Mean   : 54.11   Mean   : 4.216   Mean   : 2.988  
##  3rd Qu.: 4.000   3rd Qu.: 62.92   3rd Qu.: 7.066   3rd Qu.: 4.000  
##  Max.   :11.000   Max.   :113.94   Max.   :18.257   Max.   :11.000  
##                                                                     
##    kmean_amb       kstdev_amb        kobs_amb       Nmean_fert    
##  Min.   :29.09   Min.   : 0.000   Min.   :1.000   Min.   :0.6000  
##  1st Qu.:46.54   1st Qu.: 1.041   1st Qu.:2.000   1st Qu.:0.9959  
##  Median :56.01   Median : 4.623   Median :4.000   Median :1.2970  
##  Mean   :60.31   Mean   : 4.549   Mean   :3.548   Mean   :1.3239  
##  3rd Qu.:68.34   3rd Qu.: 6.593   3rd Qu.:4.250   3rd Qu.:1.6050  
##  Max.   :99.58   Max.   :17.075   Max.   :7.000   Max.   :2.2967  
##                                                   NA's   :41      
##   Nstdev_fert       Nobs_fert        Cmean_fert     Cstdev_fert    
##  Min.   :0.0000   Min.   : 1.000   Min.   :40.51   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.: 1.500   1st Qu.:45.73   1st Qu.:0.0000  
##  Median :0.1371   Median : 4.000   Median :48.94   Median :0.4363  
##  Mean   :0.1656   Mean   : 3.442   Mean   :48.45   Mean   :0.4318  
##  3rd Qu.:0.2181   3rd Qu.: 4.000   3rd Qu.:50.39   3rd Qu.:0.6071  
##  Max.   :0.9703   Max.   :11.000   Max.   :56.71   Max.   :1.9815  
##  NA's   :41       NA's   :41       NA's   :41      NA's   :41      
##    Cobs_fert       CNmean_fert     CNstdev_fert      CNobs_fert    
##  Min.   : 1.000   Min.   :21.45   Min.   : 0.000   Min.   : 1.000  
##  1st Qu.: 1.000   1st Qu.:32.46   1st Qu.: 0.000   1st Qu.: 1.000  
##  Median : 4.000   Median :37.21   Median : 3.756   Median : 4.000  
##  Mean   : 3.419   Mean   :41.16   Mean   : 4.984   Mean   : 3.419  
##  3rd Qu.: 4.000   3rd Qu.:48.59   3rd Qu.: 8.089   3rd Qu.: 4.000  
##  Max.   :11.000   Max.   :76.58   Max.   :15.685   Max.   :11.000  
##  NA's   :41       NA's   :41      NA's   :41       NA's   :41      
##    kmean_fert     kstdev_fert       kobs_fert       Nmean_warm    
##  Min.   :30.18   Min.   : 0.000   Min.   :1.000   Min.   :0.3775  
##  1st Qu.:47.80   1st Qu.: 1.060   1st Qu.:2.000   1st Qu.:0.6278  
##  Median :57.84   Median : 5.040   Median :4.000   Median :0.7514  
##  Mean   :57.12   Mean   : 4.868   Mean   :3.395   Mean   :0.8345  
##  3rd Qu.:64.25   3rd Qu.: 7.093   3rd Qu.:4.000   3rd Qu.:0.9593  
##  Max.   :93.40   Max.   :18.053   Max.   :6.000   Max.   :1.9127  
##  NA's   :41      NA's   :41       NA's   :41      NA's   :21      
##   Nstdev_warm        Nobs_warm        Cmean_warm     Cstdev_warm    
##  Min.   :0.00000   Min.   : 1.000   Min.   :39.82   Min.   :0.0000  
##  1st Qu.:0.00000   1st Qu.: 1.000   1st Qu.:45.05   1st Qu.:0.0000  
##  Median :0.05475   Median : 3.000   Median :47.13   Median :0.2941  
##  Mean   :0.08799   Mean   : 2.794   Mean   :47.79   Mean   :0.4257  
##  3rd Qu.:0.12534   3rd Qu.: 4.000   3rd Qu.:50.31   3rd Qu.:0.7323  
##  Max.   :0.44695   Max.   :11.000   Max.   :55.47   Max.   :1.8114  
##  NA's   :21        NA's   :21       NA's   :21      NA's   :21      
##    Cobs_warm       CNmean_warm      CNstdev_warm      CNobs_warm    
##  Min.   : 1.000   Min.   : 25.32   Min.   : 0.000   Min.   : 1.000  
##  1st Qu.: 1.000   1st Qu.: 52.05   1st Qu.: 0.000   1st Qu.: 1.000  
##  Median : 3.000   Median : 64.42   Median : 4.005   Median : 3.000  
##  Mean   : 2.778   Mean   : 64.74   Mean   : 5.919   Mean   : 2.778  
##  3rd Qu.: 4.000   3rd Qu.: 76.73   3rd Qu.: 9.523   3rd Qu.: 4.000  
##  Max.   :11.000   Max.   :114.91   Max.   :26.622   Max.   :11.000  
##  NA's   :21       NA's   :21       NA's   :21       NA's   :21      
##    kmean_warm     kstdev_warm       kobs_warm    
##  Min.   :25.79   Min.   : 0.000   Min.   :1.000  
##  1st Qu.:44.15   1st Qu.: 0.000   1st Qu.:1.000  
##  Median :56.62   Median : 2.864   Median :2.000  
##  Mean   :60.51   Mean   : 3.535   Mean   :2.829  
##  3rd Qu.:71.42   3rd Qu.: 4.862   3rd Qu.:4.000  
##  Max.   :99.85   Max.   :17.095   Max.   :7.000  
##  NA's   :14      NA's   :14       NA's   :14

# checking for missing variables
sum(is.na(litter)) # there appear to be 753 NAs present...

## [1] 723

# to be able run the scatterplot matrix as designed above
library(car) 
scatterplotMatrix(~kmean_amb + kmean_warm + kmean_fert + CNmean_amb + CNmean_warm + CNmean_fert, 
                  diagonal="histogram", data=litter)

# this scatter plot analyses only two responses! decomposition constant k and %N in the litter
# and only two treatments! warming and fertilization; for other combinations, statement must be adapted

Formal meta-analysis can be run with the package ‘metafor’ of R (basically, the only other software that allows this is called ‘metawin’, a specialised software tool). Install the ‘metafor’ package and activate the package.

# calling metafor package
library(metafor)

INITIAL DATA EXPLORATION:
The dataset contains 84 observations across 39 variables representing litter traits under ambient, warming, and fertilization treatments. A total of 723 missing values are present, disproportionately concentrated in fertilization treatment variables (41 NAs for most fertilization measures), indicating that fertilization treatments were not applied at all experimental sites. Warming treatment data are more complete with only 14-21 NAs, providing robust coverage for the primary research question. Ambient treatment data appear nearly complete, establishing reliable baseline comparisons.
Summary statistics reveal that decomposition rates under ambient conditions (kmean_amb: median 56.01, mean 60.31) and warming (kmean_warm: median 56.62, mean 60.51) are remarkably similar, suggesting minimal overall warming effect. However, substantial variation exists within both treatments (kmean_amb range: 29.09-99.58; kmean_warm range: 25.79-99.85), indicating considerable heterogeneity in decomposition responses across species and sites. Substrate quality metrics show that C/N ratios increase under warming (CNmean_amb median 51.60 vs. CNmean_warm median 64.42), suggesting warming may alter litter chemistry in ways that could offset direct temperature effects on decomposition. Nitrogen content shows slight decline under warming (Nmean_amb median 0.9348 vs. Nmean_warm median 0.7514), consistent with the C/N ratio increase and potentially reflecting nutrient resorption or altered allocation patterns under warming conditions.
The scatterplot matrix reveals several critical patterns for evaluating the hypothesis. The kmean_amb versus kmean_warm relationship shows points clustering near the diagonal line with moderate scatter, indicating that warming neither consistently increases nor decreases decomposition rates* across the dataset. Some observations fall above the diagonal (warming increases k) while others fall below (warming decreases k), demonstrating response heterogeneity that likely reflects variation across plant functional types or site conditions. The relationship between C/N ratios and decomposition rates is clearly negative across all treatments—litters with higher C/N ratios (more recalcitrant) decompose more slowly, consistent with expectations for woody versus herbaceous tissue differences.
Notably, the warming treatment shifts many observations toward higher C/N values, potentially counteracting direct temperature-stimulation of decomposition through substrate quality degradation. The kmean_fert versus kmean_amb comparison shows similar scatter patterns to the warming comparison, suggesting that both global change treatments produce variable responses rather than consistent directional shifts. Visual inspection suggests no strong consistent warming effect across the entire dataset, though substantial scatter indicates that responses may differ systematically by functional type—directly addressing the hypothesis that recalcitrant woody tissues and labile herbaceous tissues show differential temperature sensitivity.

4. Which effect size metric do you consider most appropriate or feasible, given the number of replicates, to evaluate differences in means?

The first step of any meta-analysis is to calculate the appropriate effect size. Replicates are the number of independent measurements or observations taken for each experimental unit under the same conditions. Think of lnRR, hedges’d or other measures that are used a metric for meta-analyses. If in doubt, you may check: https://bookdown.org/robcrystalornelas/meta-analysis_of_ecological_data/choosing-an-effect-size.html

Higher replication gives more reliable estimates of means and standard deviations, which directly affects:

Hedge’s g calculation - uses sample sizes to correct for small-sample bias
Variance of the effect size - more replicates = lower uncertainty
Statistical power - more replicates = better ability to detect true effects

# checking how many complete cases there are for warming vs ambient comparison
complete <- sum(complete.cases(litter[, c("kmean_amb", "kstdev_amb", "kobs_amb", 
                                                 "kmean_warm", "kstdev_warm", "kobs_warm")]))
complete

## [1] 70

# checking the distribution of sample sizes (replicates)
par(mfrow=c(1,2))
summary(litter$kobs_amb)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   4.000   3.548   4.250   7.000

hist(litter$kobs_amb, main="Distribution of ambient replicates", xlab="n")
summary(litter$kobs_warm)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.000   2.000   2.829   4.000   7.000      14

hist(litter$kobs_warm, main="Distribution of warming replicates", xlab="n")

CHOICE OF EFFECT SIZE METRIC:
Given the replication structure of the dataset, the log response ratio (lnRR) is the clearly appropriate effect size metric for this analysis. Of the 84 total observations, only 70 complete cases are available with both warming and ambient decomposition measurements, and the replication distribution is heavily skewed toward single observations, with approximately 36% of ambient measurements and 46% of warming measurements having only n=1. This asymmetry in data quality is further evident in the median replication values, where warming treatments have a median of only 2 replicates compared to 4 for ambient conditions.
Using Hedge’s g would require excluding all observations where either ambient or warming treatment has n=1 (since standard deviation cannot be calculated), resulting in the loss of approximately 40-50% of the dataset and leaving only 35-40 observations with calculable effect sizes — a severe reduction in statistical power that would compromise the ability to detect patterns across plant functional types.
In contrast, lnRR requires only mean values for calculation and can therefore retain all 70 complete cases, maximizing sample size for evaluating whether warming effects on decomposition are consistent or vary systematically across functional types as hypothesized. The cost of losing 30-35 observations far outweighs any benefit from the precision weighting that Hedge’s g would provide, particularly when the research question focuses on detecting directional consistency rather than precise effect magnitude estimation.

5. Calculate the effect size for each site-species combination indicating the difference in your response variable for a selected treatment compared to ambient conditions.

So, calculate your effect size for either C, N, C/N or decomposition constant k identifying the effect of (one of) your selected treatment(s) compared to ambient conditions. Do these calculations only for those treatments and traits relevant to your research question.
Perform the calculations with “escalc” (= effect size calculator) in metafor, in which YOUR_METHOD most likely equals “ROM” or “SMD”. In this case, Response ratios refers to “ROM” and Hedge’s g to “SMD”; see also “metaforRdocumentation.pdf” for a more extensive discussion on possible metrics.

# subsetting dataset to columns of interest for better overview
effect <- subset(litter, select=c("Sitecode", "Speciescode", "PFT", "kmean_amb", "kstdev_amb", "kobs_amb", 
                                  "kmean_warm", "kstdev_warm", "kobs_warm"))
# or: effect <- litter[, c("Sitecode", "Speciescode", "PFT", "kmean_amb", "kstdev_amb", "kobs_amb", "kmean_warm", "kstdev_warm", "kobs_warm")]

# calculating effect size (overwrites sub-dataset with two additional columns yi and vi)
effect <- escalc(measure="ROM", m1i=kmean_warm, m2i= kmean_amb, sd1i=kstdev_warm, sd2i=kstdev_amb,
                 n1i=kobs_warm, n2i=kobs_amb, data=effect)

# first calculates the effect size estimate itself, which gets stored in a column called yi (essentially the natural logarithm of the ratio of the two means for each study)
# then calculates the variance of your effect size estimate, which gets stored in a column called vi (this is why we need the standard deviations)

The output of this analysis is a data frame to which the following elements have been added to your original data file:

yi value of the effect size or outcome measure
vi corresponding (estimated) sampling variance

which can be seen/analysed with e.g.:

# getting summary of effect sizes and their variances
summary(effect$yi)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
## -0.580087 -0.051294  0.002413  0.007411  0.074368  0.370870        14

summary(effect$vi)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
## 0.0000000 0.0004258 0.0034663 0.0074997 0.0067957 0.0786346        14

# visualizing distribution of effect sizes and their variances across all observations
par(mfrow=c(1,2))
hist(effect$yi, main="Distribution of effect sizes", xlab="n") # distribution of warming effects
hist(effect$vi, main="Distribution of effect size variances", xlab="n") # more of a diagnostic check

# redirect user to data table in R: View(effect)

# saving effect object (with effect sizes, variance and raw variables) to "output.txt" in working directory
write.table(effect, "METAoutput.txt", sep="\t", row.names=FALSE) # row.names prevents R from adding extra column with row numbers

Presumably, depending your question, you might have to repeat this for several trait-treatment combinations. If you do so, you will have to rename vi and yi to avoid they get overwritten. When would you actually do this?
1. If you want to compare multiple effects (e.g., does warming affect chemistry AND decomposition?)
2. If your research question is broader than just one trait-treatment combo
3. If you want to build a comprehensive picture of global change effects

CALCULATION OF EFFECT SIZE:
Independent of the effect size metric chosen, the calculations produce yi (effect size) and vi (variance) values for each site-species-PFT combination, with actual values varying based on the metric and treatment selected. Using natural log of response ratio (lnRR) for warming effects on decomposition constant k, the yi values show highly variable responses ranging from -0.580 to 0.371, with the distribution centered approximately around zero.
This indicates that warming does not consistently increase or decrease decomposition rates across the dataset — some species-site combinations show accelerated decomposition under warming (positive yi) while others show reduced rates (negative yi), suggesting substantial heterogeneity in warming responses that may relate to differences in plant functional types, substrate quality, or site conditions.
The histogram reveals an approximately symmetric distribution of effect sizes, though not perfectly normal, with most observations clustered near zero and a slight positive skew. The variance distribution is heavily right-skewed with most vi values concentrated near zero, indicating that the majority of effect size estimates have relatively high precision, though a few observations show considerably higher uncertainty due to small sample sizes or high variability in decomposition measurements.

The “effect” data frame contains various rows in which the variance equals zero…

6. Why are those values zero and what do you decide to do with those species-site combinations for which this happens? Provide arguments.

There are multiple solutions to dealing with this! The simplest would be excluding studies with only one observation, thereby removing zero variances altogether, though this discards nearly half the dataset in this case. More appropriate alternatives in practice include attributing the maximum observed variance to single-observation studies to downweight their influence on the overall effect estimate, or using imputation methods to estimate missing variances based on the dataset’s variance structure across similar observations.

You can eliminate effect sizes with zero variance by making another subset:

# subsetting to remove studies with only one observation
subeffect <- subset(effect, vi>0)

# getting new summary of effect sizes and their variances
summary(subeffect$yi)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -0.265988 -0.034843  0.009227  0.021822  0.074517  0.370870

summary(subeffect$vi)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 2.134e-06 1.608e-03 4.658e-03 8.898e-03 8.718e-03 7.863e-02

# visualizing new distribution of effect sizes and their variances
par(mfrow=c(1,2))
hist(subeffect$yi, main="Distribution of effect sizes (adjusted)", xlab="n") # distribution of warming effects
hist(subeffect$vi, main="Distribution of effect size variances (adjusted)", xlab="n")

Answer: When only one replicate exists for a treatment, standard deviation cannot be calculated, resulting in zero variance (vi=0) for the effect size estimate. This creates a critical methodological problem because observations with zero variance are treated as completely certain in meta-analysis, receiving infinite weight in subsequent models despite actually representing highly uncertain measurements based on single data points. This artificial certainty introduces substantial bias into effect size estimation and heterogeneity assessment. Two solutions exist: omitting observations with zero variance, or imputing missing variances using relationships between variance and other dataset characteristics (for instance, variance often correlates strongly with means in ecological data).
For this analysis, the more straightforward approach was chosen by subsetting to retain only observations where vi>0, which automatically excluded all single-replicate observations. This reduced the dataset from 70 to 59 observations with calculable variances. The adjusted effect size distribution (yi) remains centered near zero with moderate spread (range: -0.265 to 0.371), continuing to show no consistent directional warming effect across observations, while the adjusted variance distribution remains right-skewed but with improved reliability since all retained observations now have properly estimated uncertainty. Retaining zero-variance observations would have artificially inflated the precision of the overall effect estimate and created strong bias in diagnostic plots like funnel plots, where zero-variance observations would cluster unrealistically at the apex, masking true patterns of heterogeneity and potential publication bias.

7. Define and run the appropriate rma.uni model for your research question.

To analyse patterns in the effect size measure, we apply “rma.uni”. This function allows to internally call “escalc”. In this case, however, we will use the outputs of the effect size calculations directly, i.e. the vi and yi variables (or a subset of the vi and yi data, if needed).

There are various methods to analyze patterns in effect size:
- rma.uni (univariate random/mixed-effects model): Use when each observation has one independent effect size with its variance; standard approach for most meta-analyses where observations are independent (i.e. most common)
- rma.mv (multivariate/multilevel random-effects model): Use when observations are non-independent due to multiple effect sizes per study, hierarchical nesting structures (e.g. species within sites, sites within regions), or multiple correlated outcomes measured on the same experimental units
- rma.glmm (generalized linear mixed-effects model): Use for non-normal data such as binary outcomes or count data, particularly when calculating effect sizes like odds ratios directly from raw contingency tables
- rma.peto (Peto’s method): Specialized method for odds ratios in medical meta-analyses with rare events; less commonly used in ecological research

See the documentation “metafor Rdocumentation.pdf” for rma.uni to select and formulate the most appropriate model given your question and the current experimental design. Consider whether and which variables to include and how!

Available methods in metafor:
- REML (default): Restricted Maximum Likelihood - standard for random-effects models
- ML: Maximum Likelihood
- DL: DerSimonian-Laird (older, simpler method)
- FE: Fixed-effects model
- Several others (EB, HS, SJ, etc.)

Note: If there is a relationship between the observed outcomes and the chosen predictor, then this usually implies asymmetry in the funnel plot and may be an indication of publication bias.

# overall random-effects model (no moderators), tests if there's a consistent overall warming effect
output <- rma.uni(yi, vi, data=subeffect, method="REML") # standard for random-effects models
# R automatically uses method="REML" behind the scenes --> made explicit for better understanding

# plotting meta-analysis outcomes (including forest and funnel plots)
plot.rma.uni(output)

# evaluating funnel plot asymmetry with Egger's regression test (or similar), which mathematically tests whether there's a relationship between effect size and study precision
regtest(output)

## 
## Regression Test for Funnel Plot Asymmetry
## 
## Model:     mixed-effects meta-regression model
## Predictor: standard error
## 
## Test for Funnel Plot Asymmetry: z = -0.3033, p = 0.7617
## Limit Estimate (as sei -> 0):   b =  0.0294 (CI: -0.0122, 0.0711)

The plot.rma.uni function produces four diagnostic plots that serve distinct analytical purposes. The forest plot displays individual effect sizes for each observation with confidence intervals alongside the overall pooled effect estimate, revealing the distribution and consistency of warming responses across species-site combinations. The funnel plot tests for publication bias by plotting effect sizes against their standard errors, where a symmetric inverted funnel shape indicates absence of bias.
The radial (Galbraith) plot provides an alternative visualization of the relationship between effect sizes and precision, facilitating identification of outlier observations that deviate substantially from the overall pattern. The standardized residuals plot evaluates model fit by showing which observations have residuals exceeding ±2 standard deviations, indicating potential influential or problematic data points warranting closer examination.

CONCLUSION ON PUBLICATION BIAS:
The funnel plot appears reasonably symmetric with effect sizes distributed evenly on both sides of the pooled estimate across varying levels of precision. The formal Egger’s regression test confirms this visual assessment, showing no significant funnel plot asymmetry (z=-0.3033, P=0.7617). This indicates no evidence of publication bias or small-study effects in the dataset.
However, this interpretation requires important contextualization: the Meeting of Litters project fundamentally differs from traditional meta-analyses by collecting actual samples from all participating experiments rather than relying on published literature, inherently eliminating conventional publication bias. Any observed asymmetry would therefore more likely reflect true heterogeneity in warming responses across functional types or sites rather than selective reporting of significant results. The funnel plot does reveal several observations with notably higher standard errors (lower precision), reflecting the dataset’s variable replication structure, but these are distributed symmetrically rather than clustering asymmetrically.

9. Summarize the statistical results with a Baujat plot

This shows the estimated model coefficients, corresponding standard error (se), test statistic(zval), p-value(pval), lower bound of the confidence interval(ci.lb) and upper bound of the confidence interval(ci.ub). The first summary also provides estimates on heterogeneity, which is an important asset of your meta-analysis.

Baujat plot = identifies observations that disproportionately contribute to overall heterogeneity (x-axis) and substantially influence the pooled effect estimate (y-axis)

# summarizing results again
output

## 
## Random-Effects Model (k = 59; tau^2 estimator: REML)
## 
## tau^2 (estimated amount of total heterogeneity): 0.0050 (SE = 0.0015)
## tau (square root of estimated tau^2 value):      0.0706
## I^2 (total heterogeneity / total variability):   95.74%
## H^2 (total variability / sampling variability):  23.49
## 
## Test for Heterogeneity:
## Q(df = 58) = 672.8547, p-val < .0001
## 
## Model Results:
## 
## estimate      se    zval    pval   ci.lb   ci.ub    
##   0.0242  0.0123  1.9647  0.0495  0.0001  0.0484  * 
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# creating Baujat plot to evaluate the influence of particular data entries
baujat(output)

CONCLUSION ON BAUJAT PLOT:
The Baujat plot shows that observations 38, 7, and 16 emerge as particularly influential, showing both high contributions to heterogeneity and considerable influence on the overall result. Observation 38 stands out most prominently, positioned far from other observations in terms of overall influence. These represent specific species-site-PFT combinations where warming effects deviate substantially from the typical pattern. Ideally, these observations would warrant detailed investigation by returning to original field data and laboratory records to verify measurement accuracy, experimental protocol adherence, and data entry correctness. However, given that this is a standardized common-garden experiment with consistent methodology, exclusion would only be justified if clear protocol violations or measurement errors were documented, not simply because these observations show extreme responses. These influential observations reinforce that warming effects are genuinely heterogeneous across the dataset, with certain species-site combinations responding dramatically differently than others. This biological variability is precisely what the hypothesis about differential responses across functional types predicts, suggesting these observations may represent ecologically meaningful variation (e.g. particularly sensitive or resistant functional types) rather than data quality issues.

10. Evaluate your choice of the model and interpret the pattern, including the role and interpretation of the intercept in your analysis.

INTERPRETATION OF RANDOM-EFFECTS MODEL:
The random-effects model is the appropriate choice for this analysis because the observations represent different species from different sites under varying environmental conditions, making it biologically implausible that all observations share a single true warming effect. The model assumes that each observation estimates its own true effect size, and these true effects are themselves drawn from a distribution, since ecologically, different species and sites may respond differently to warming. The model results reveal a small but statistically significant overall positive warming effect on decomposition rates (intercept=0.024, P=0.0495), indicating that warming increases decomposition by approximately 2.4% on average across all observations. In this simple model, the intercept represents the overall mean effect size across all species, sites and functional types. However, the extremely high heterogeneity (I²=95.74%, Q=672.85, P<0.0001) demonstrates that this simple model is inadequate for describing the data structure. The I² value indicates that 95.74% of observed variation reflects true differences in warming effects rather than sampling error, while the highly significant Q-test confirms substantial heterogeneity beyond what random chance would produce. This massive heterogeneity reveals that the overall positive effect masks enormous variation, as some observations show strong positive warming responses while others show negative responses or no effect at all. The model is clearly too simplistic because it treats all observations as estimating variations of the same underlying effect, while ignoring systematic sources of variation such as plant functional type or site characteristics. The forest plot confirms this, showing individual effect sizes scattered widely across both positive and negative values with no clear central tendency (moderating variables are missing!).

11. Create a model with the mods-statement within rma.uni that provides a better description of the patterns (i.e. deals better with the heterogeneity).

There might be different populations within the data that, when grouped together, obscure the significant and/or magnitude of the true global effect. In this case, it would be sensible to include Sitecode (to better account for variation within and between sites) and/or PFT as moderators.

# first creating a model with a moderator for different plant species
outputPFT <- rma.uni(yi, vi, mods=~PFT, data=subeffect, method="REML") # PFT as moderator (modeled by)
coef.rma(outputPFT) # only gives the final coefficients of the meta-analysis

##      intrcpt    PFTEVERGR      PFTFORB     PFTGRASS     PFTSEDGE 
##  0.019388396 -0.020438907 -0.005396829 -0.045343732  0.097138049

# then creating a model with a moderator for different test sites
outputSite <- rma.uni(yi, vi, mods=~Sitecode, data=subeffect, method="REML") # PFT as moderator (modeled by)
coef.rma(outputSite)

##           intrcpt   SitecodeAuokulu  SitecodeDARTpadd   SitecodeFinse_1 
##      0.0220514343      0.1244466313      0.0779561658     -0.2880395893 
##   SitecodeFinse_2  SitecodeKilphigh  SitecodeLatheath  SitecodeLatmeadw 
##      0.0701466283     -0.1506693638     -0.0287048693      0.0570729431 
##  SitecodeLatsedge    SitecodePaddus      SitecodeRuby SitecodeSlahtta96 
##     -0.0353866380      0.0001403612     -0.0120831890     -0.0570303799 
##  SitecodeStorwood   SitecodeTib_HGM  SitecodeTib_HGSH   SitecodeTib_LGM 
##     -0.0352989350     -0.0023566424      0.0126408335     -0.0106161996 
##  SitecodeTib_LGSH    SitecodeToolik  SitecodeWyomgraz  SitecodeWyomungr 
##      0.0113823529     -0.0689070079     -0.0153201773      0.0015121698

# and finally creating a model with moderators for both different species and different test sites
outputPFTSite <- rma.uni(yi, vi, mods=~PFT+Sitecode, data=subeffect, method="REML") # PFT as moderator (modeled by)
coef.rma(outputPFTSite) # to evaluate differential responses taking both into account

##           intrcpt         PFTEVERGR           PFTFORB          PFTGRASS 
##       0.049855490      -0.015844464       0.002918136      -0.046722904 
##          PFTSEDGE   SitecodeAuokulu  SitecodeDARTpadd   SitecodeFinse_1 
##       0.124765197       0.077355813       0.048980206      -0.269120741 
##   SitecodeFinse_2  SitecodeKilphigh  SitecodeLatheath  SitecodeLatmeadw 
##       0.040510986      -0.181336785      -0.022271883      -0.018025598 
##  SitecodeLatsedge    SitecodePaddus      SitecodeRuby SitecodeSlahtta96 
##      -0.187955891      -0.064969793      -0.077415346      -0.068989971 
##  SitecodeStorwood   SitecodeTib_HGM  SitecodeTib_HGSH   SitecodeTib_LGM 
##      -0.055346578      -0.045584813      -0.040021106      -0.050317240 
##  SitecodeTib_LGSH    SitecodeToolik  SitecodeWyomgraz  SitecodeWyomungr 
##      -0.035004244      -0.096711063       0.003598671       0.020431018

VALUE AND INTERPRETATION OF MIXED-EFFECTS MODEL:
To assess whether moderator variables better explain the observed heterogeneity, three models were compared: PFT alone, Sitecode alone, and PFT and Sitecode combined.

Model performance is evaluated through: Test of Moderators (QM, indicating whether moderators significantly predict effect sizes), residual heterogeneity (I² and QE, showing unexplained variation), and variance explained (R²).

Plant Functional Type (PFT) as moderator: This model shows that PFT is a significant predictor of warming responses (QM=17.96, P=0.0013), explaining a moderate 33.61% of the observed heterogeneity. Sedges show a significantly stronger positive warming response (estimate=0.097, P=0.0044) compared to deciduous species (the reference category), while evergreens, forbs and grasses do not differ significantly. This partially supports the hypothesis that tissue type matters, though not in the expected woody-versus-herbaceous dichotomy. Still, residual heterogeneity remains extremely high (I²=93.48%, QE significant), so it is insufficient to explain response variation.
Site as moderator: The site model reveals substantial geographic variation in warming responses, with coefficients ranging from -0.288 (Finse_1, strong negative response) to +0.124 (Auokulu, positive response). This demonstrates that location substantially influences decomposition responses, reflecting differences in climate context, soil conditions, and moisture regimes across sites. Site also accounts for spatial pseudoreplication—observations from the same site respond more similarly because they share unmeasured environmental conditions.
PFT + Site combined as moderators: This combined model provides the most comprehensive approach, simultaneously accounting for biological differences (functional type) and environmental context (site). Both moderators are treated as random factors, appropriately recognizing that while generic patterns are the focus, site-level variation must be accounted for to avoid pseudoreplication. The combination reduces residual heterogeneity beyond either moderator alone, though significant residual heterogeneity persists (QE remains significant), indicating that additional unmeasured factors—such as specific climate variables, detailed litter chemistry, or microbial community composition—continue to drive variation in warming responses.

12. What do you conclude about your research question?

Research question: Does warming consistently affect the decomposition constant k across different plant functional types in arctic, subarctic and alpine ecosystems?

CONCLUSION AND TAKEAWAYS:
Answer: No, warming does not consistently affect decomposition rates across plant functional types or sites. While there is a small overall positive effect (approximately 2.4% increase in decomposition under warming), this average masks enormous heterogeneity in responses. Plant functional type partially explains this variation, with sedges showing significantly stronger positive responses to warming than other functional types, but the majority of variation (66% after accounting for PFT) remains unexplained. Site location also contributes to response variation, reflecting differences in environmental context. The combined model accounting for both PFT and site provides the best description of the data, yet substantial residual heterogeneity persists, indicating that warming effects depend on multiple interacting factors beyond functional type classification.
Limitations: The high residual heterogeneity even in the best model suggests that _several important factors remain unaccounted for_, including specific climate variables* (temperature range, moisture regime, precipitation patterns), detailed litter chemistry beyond C/N ratios (lignin content, phenolic compounds), microbial community composition and potential interactions between warming and other global change factors. Geographic representation is also incomplete, with minimal data from Siberia and no Southern Hemisphere observations, constraining generalizability.