Introduction

Sources of heterogeneity

Methods

Terminology

Many of the concepts here overlap, and unfortunately so does the terminology. To keep things clear, we adopt the following (debatable) definitions to distinguish two drivers of “biomass”:

  • Productivity – Potential aboveground annual net primary productivity (aANPP) as determined by regional factors such as rainfall.
  • Standing crop – Realized aboveground herbaceous biomass that either falls below aANPP due to offtake or local limitations on available soil moisture, or exceeds aANPP due to the accumulation of plant material over several seasons without removal.

I’ve yet to go through and make these terms consistent; let’s agree on them ourselves first!

Simulated grassland standing crop

These data represent the distribution within the 95% confidence interval of aboveground perennial herbaceous biomass across US National Grasslands. Details on how these data were obtained are provided in Supplemental Information.

Biomass class   N per group   Mean standing crop   SD     CV
VL              190           553                  78.8   0.14
L               190           739                  42.3   0.06
M               189           869                  33.6   0.04
H               188           993                  39.4   0.04
VH              190           1194                 81.7   0.07

Non-spatial simulations

To compare how CV is affected by variability in the two moments used to calculate it, the mean and the standard deviation, each Biomass class from above was treated as a different productivity scenario. First, 10 samples were randomly drawn from each productivity scenario, and the mean and standard deviation of these draws were used to simulate 25 random observations, for which CV was calculated. Then, four additional simulations were performed in which one moment, either the mean or the standard deviation, was fixed at the value from the highest (VH) or lowest (VL) productivity class, while the other moment reflected the true value from the productivity class (Unique).

This process was repeated over 1000 iterations, and the median CV and 95% confidence intervals for each productivity/moment scenario were calculated.
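The procedure above can be sketched in base R. This is a minimal sketch, not the original analysis code: the class moments are copied from the table above, the initial 10 samples are drawn with rnorm as a stand-in for sampling the class data, and the names sim_cv, run_scenario, fix_mean, and fix_sd are hypothetical.

```r
# Class moments copied from the table of simulated standing crop
classes <- data.frame(
  class = c("VL", "L", "M", "H", "VH"),
  mean  = c(553, 739, 869, 993, 1194),
  sd    = c(78.8, 42.3, 33.6, 39.4, 81.7)
)

# CV of n_obs normal draws with the given moments
sim_cv <- function(mu, sigma, n_obs = 25) {
  x <- rnorm(n_obs, mean = mu, sd = sigma)
  sd(x) / mean(x)
}

# Median CV and 2.5% / 97.5% quantiles over n_iter iterations.
# fix_mean / fix_sd override one moment; NULL keeps the class's own value.
run_scenario <- function(mu, sigma, fix_mean = NULL, fix_sd = NULL,
                         n_iter = 1000, n_draw = 10) {
  cvs <- replicate(n_iter, {
    draws <- rnorm(n_draw, mean = mu, sd = sigma)  # assumed stand-in for class data
    m <- if (is.null(fix_mean)) mean(draws) else fix_mean
    s <- if (is.null(fix_sd)) sd(draws) else fix_sd
    sim_cv(m, s)
  })
  quantile(cvs, probs = c(0.025, 0.5, 0.975), names = FALSE)
}

set.seed(1)
col_names <- c("lower", "median", "upper")

# "Unique" scenario: both moments come from the class itself
unique_cv <- t(mapply(run_scenario, classes$mean, classes$sd))
dimnames(unique_cv) <- list(classes$class, col_names)

# One fixed-moment scenario: every class uses the VH standard deviation
high_sd_cv <- t(mapply(run_scenario, classes$mean, classes$sd,
                       MoreArgs = list(fix_sd = 81.7)))
dimnames(high_sd_cv) <- list(classes$class, col_names)
```

The remaining fixed-moment scenarios follow the same pattern by passing a different fix_mean or fix_sd value.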

Spatially-explicit simulations

Four landscape scenarios were created to simulate the full factorial combination of two potential sources of heterogeneity:

  • Top row simulates 3 patches created by spatially-discrete disturbance (right), or not (left)
  • Bottom row simulates 3 patches created by inherent variability in productivity driven by differences in ecological site
Hypothetical landscapes with four spatial heterogeneity scenarios.

Spatial CV comparison

Consider the top row of the hypothetical landscapes: one landscape has three spatially-discrete patches (Heterogeneous) and the other has no patch structure (Homogeneous). Each cell is assigned to one of the Low, Moderate, or High categories of the simulated grassland data. Categories are assigned by patch in the Heterogeneous landscape and randomly across cells in the Homogeneous landscape. Standing crop values are random draws from a normal distribution defined by the unique mean of the cell's category and the pooled standard deviation of all three categories (recall from above that the SD of these categories ranged only from 34 to 42).
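A minimal base-R sketch of this comparison follows. The grid size (30 x 30 cells), the equal-weight pooling of the three SDs, and the transect design (6 transects per third of the landscape, each the mean of 10 cells) are assumptions made for illustration; transect_cv and draw_biomass are hypothetical helper names.

```r
set.seed(42)

# Means and SDs of the Low, Moderate, and High classes from the table above
class_mean <- c(L = 739, M = 869, H = 993)
class_sd   <- c(L = 42.3, M = 33.6, H = 39.4)
pooled_sd  <- sqrt(mean(class_sd^2))  # equal-weight pooling (assumption)

n_side  <- 30                                   # hypothetical grid dimension
col_idx <- rep(seq_len(n_side), each = n_side)  # column index of each cell
zone    <- ceiling(col_idx / (n_side / 3))      # thirds of the landscape, 1 to 3

hetero_class <- names(class_mean)[zone]  # categories assigned by patch
homog_class  <- sample(hetero_class)     # same categories, shuffled across cells

# Draw standing crop from the category mean and the pooled SD
draw_biomass <- function(cls) {
  rnorm(length(cls), mean = class_mean[cls], sd = pooled_sd)
}
hetero <- draw_biomass(hetero_class)
homog  <- draw_biomass(homog_class)

# CV across 18 transect means: 6 transects in each zone,
# each transect the mean of n_cells randomly chosen cells
transect_cv <- function(biomass, n_transects = 6, n_cells = 10) {
  means <- unlist(lapply(1:3, function(z) {
    replicate(n_transects, mean(sample(biomass[zone == z], n_cells)))
  }))
  sd(means) / mean(means)
}

cv_hetero <- transect_cv(hetero)
cv_homog  <- transect_cv(homog)
```

The CV is computed across transect means rather than across all cells because both landscapes contain the same mix of cell values; only the spatial arrangement differs.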

Variance partitioning

We used the lmer function from the lme4 package to fit random-effect regression models, from which variance estimates for each term were extracted with lme4::VarCorr and expressed as standard deviations, \(\sqrt{variance}\).
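As an illustration of the extraction step, the sketch below fits an intercept-only model with patch as a random effect to a small simulated landscape. The model formula, the data layout (3 patches x 6 transects), and the SD of 38.6 are assumptions for this example, not the models actually used here.

```r
library(lme4)

set.seed(7)
# Hypothetical landscape: 3 patches x 6 transects, patch means from the L, M, H classes
dat <- data.frame(
  patch   = factor(rep(c("low", "moderate", "high"), each = 6)),
  biomass = rnorm(18, mean = rep(c(739, 869, 993), each = 6), sd = 38.6)
)

# Intercept-only model with patch as a random effect (assumed structure)
fit <- lmer(biomass ~ 1 + (1 | patch), data = dat)

# One row per variance component: vcov is the variance, sdcor is sqrt(variance)
vc <- as.data.frame(VarCorr(fit))
vc[, c("grp", "vcov", "sdcor")]
```

The sdcor value in the patch row is the quantity that would be compared across simulated landscapes.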

Simulations were designed to test first a single source of spatial heterogeneity (patchy disturbance), and then a scenario in which landscapes under both disturbance scenarios (heterogeneous and homogeneous) also have underlying heterogeneity (productivity differences due to soil variability). All simulations incorporated a productivity gradient, ranging from 50% to 150% of the observed mean aboveground herbaceous biomass across US National Grasslands, and a variability gradient, in which the standard deviation of assigned biomass values in simulated landscapes ranged from 50% to 250% of the observed value.

When simulations incorporated inherent variability, a third gradient was added that varied the deviation of the low-productivity and high-productivity sites from the mean, from 0 (no productivity differences among sites) to 0.4 (low-productivity site at 60% of the mean value, high-productivity site at 140%).

Simulations consisted of a full factorial combination of the gradients, each combination iterated 1000 times.
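The factorial design can be laid out with expand.grid, as in the sketch below. The gradient endpoints come from the text above; the step sizes and the column names are assumptions, since the number of levels per gradient is not stated.

```r
obs_mean <- 867  # observed mean standing crop (see Supplemental Information)

# Endpoints from the text; step sizes are assumed for illustration
gradients <- expand.grid(
  mean_factor = seq(0.5, 1.5, by = 0.25),  # productivity: 50% to 150% of observed mean
  sd_factor   = seq(0.5, 2.5, by = 0.5),   # variability: 50% to 250% of observed SD
  site_dev    = seq(0, 0.4, by = 0.1)      # site deviation: 0 to 0.4
)

# Site means implied by each combination
gradients$scenario_mean <- obs_mean * gradients$mean_factor
gradients$low_site  <- gradients$scenario_mean * (1 - gradients$site_dev)
gradients$high_site <- gradients$scenario_mean * (1 + gradients$site_dev)

n_combos <- nrow(gradients)  # each combination would then be iterated 1000 times
```

With these assumed steps, each gradient has five levels.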

Results

Comparing CV across different means

Non-spatial example

Only the center panel calculates CV from the actual mean and SD of each group. The other panels use either the highest or the lowest value from among all five groups.

From left to right, the CV-mean biomass relationship is always curved because groups at the tails of the normal distribution are necessarily wider and have larger SD.

On the top and bottom, CV decreases roughly linearly as mean biomass increases: because CV = SD/mean, higher means necessarily have lower CV when SD is constant.
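The constant-SD case can be checked directly, without simulation, because CV is simply SD divided by the mean. The short sketch below uses the class means from the table of simulated standing crop and holds SD at the VH value; the object names are illustrative.

```r
# Class means from the table of simulated standing crop
class_mean <- c(VL = 553, L = 739, M = 869, H = 993, VH = 1194)
sd_fixed   <- 81.7  # SD of the VH class, held constant across classes

# With SD fixed, CV = SD / mean falls as the mean rises
cv_fixed <- sd_fixed / class_mean
round(cv_fixed, 3)
```

Strictly, CV is inversely proportional to the mean when SD is fixed, so the decline is only approximately linear over this range of means.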

Variability in the coefficient of variation (CV) calculated from five different datasets, with moments from Table 1, meant to simulate a productivity gradient (a gradient in mean aboveground biomass values). Points are medians, and bars span the 95% confidence intervals, for 1000 iterations of each. Each group on the left has its own SD, but calculations are based on the highest mean (VH). Likewise, on the right, each group has its own SD but calculations are based on the lowest mean (VL). Similarly, each group on the top has its own mean, but calculations are based on the highest SD, while on the bottom each group has its own mean but calculations are based on the lowest SD.

A spatially-explicit example

These graphs show just how dependent CV is on the mean. Colors represent different levels of mean standing crop, within which each heterogeneous landscape has a patch lower and a patch higher than the mean. Although the CV does qualitatively discern the two landscape types by picking up the high degree of variability within patches at all levels of productivity, CV in homogeneous landscapes demonstrates dependence on the mean across a range of biomass production.

In creating these data, I couldn’t decide whether to maintain variance as mean increased–homoscedasticity, which is an assumption of linear regression–or allow the differences between the classes to get farther apart as the mean increased–heteroscedasticity, which is how a lot of field data collected along gradients tend to turn out–so I did both ¯\_(ツ)_/¯.

Landscape-level coefficient of variation plotted against mean aboveground biomass in two landscape scenarios. CV and biomass means calculated from the mean value of 6 transects placed in each of three patches (n = 18 transects per landscape).

These graphs show the difference in the data behind the heteroscedastic and homoscedastic scenarios. This would go into supplemental information for a paper:

Distributions of the above data. Note the gap between peaks increasing as mean biomass increases in the top row, but remaining constant in the bottom row.

Variance partitioning

Single-source heterogeneity

Evidence that variance partitioning is robust to differences in mean, but sensitive to differences in variability. The Biomass variability factor refers to the deviation from mean biomass assigned to low-biomass and high-biomass patches in the landscape. Points represent the median variance assigned to the patch term in variance partitioning models in 1000 landscape simulations. Bars span the 95% confidence intervals.

Two sources of heterogeneity

Site var factor refers to the degree of divergence between low- and high-productivity sites: 0 means no difference in productivity among sites; 0.4 indicates that each differs from the mean by 40% (below and above, respectively).

Shaded data in the background are the patch term, which is constant across the site scenarios. They are the same as the graph above. This is because each term’s variance is calculated independently; they are not relative to each other. Think Type II sums of squares.

Takeaways:

  • As inherent heterogeneity increases (greater differences in biomass among sites), measured contrast due to that term increases.
  • There is consistently greater variability in highly-productive scenarios than in low-productivity scenarios.
  • Contrast tends to decline in very-high-variability scenarios, when the range of potential variation is widest. This decline is most evident in homogeneous landscapes, where contrast is more sensitive to biomass variability than in heterogeneous landscapes.

Supplemental Information

Data simulation details

To get a realistic range of perennial herbaceous standing crop, herbaceous vegetative biomass data were extracted from the Rangeland Analysis Platform for 77 fire perimeters across the US National Grasslands. Data were extracted from 1 yr before the fire until 10 yr after to capture a broad range of potential aboveground vegetative biomass.

745 observations, representing the 95% confidence interval of the extracted data, roughly follow a normal distribution:

The means and distributions of the simulated groups match the actual data well:

The mean (867) and standard deviation (257.1) of these data were used to generate the simulated dataset, within which five productivity classes were assigned by quintile.

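library(tibble)  # provides tibble()
set.seed(5678)   # fix the seed so the simulated dataset is reproducible
# 1000 draws from a normal distribution with the observed mean and SD
sim_dat <- tibble(PerProd = rnorm(1000, mean = 866.9, sd = 257.1))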
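The quintile assignment described above might look like the following base-R sketch. It re-simulates the values so that it runs on its own; the object names and class labels are illustrative. No trimming to the central 95% is applied here, so class sizes and moments will differ somewhat from the table of simulated standing crop.

```r
set.seed(5678)
per_prod <- rnorm(1000, mean = 866.9, sd = 257.1)

# Break points at the 0th, 20th, ..., 100th percentiles
breaks <- quantile(per_prod, probs = seq(0, 1, by = 0.2))

# Assign each value to a productivity class by quintile
prod_class <- cut(per_prod, breaks = breaks,
                  labels = c("VL", "L", "M", "H", "VH"),
                  include.lowest = TRUE)

# Per-class summary: n, mean, SD, and CV
class_summary <- data.frame(
  n    = as.vector(table(prod_class)),
  mean = as.vector(tapply(per_prod, prod_class, mean)),
  sd   = as.vector(tapply(per_prod, prod_class, sd)),
  row.names = levels(prod_class)
)
class_summary$cv <- class_summary$sd / class_summary$mean
class_summary
```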