Forests support biodiversity through the supply of food (fruits, seeds, nuts, leaves, cambium, wood, detritus) and a 3-D habitat structure (ground cover, microhabitats, protection from predators on the ground and in the air). This diversity of resources (food and habitat) is probably critical to the diversity of all other organisms.

In addition to biodiversity, forests sequester CO2 from the atmosphere in the form of living wood and buried soil organic matter. They regulate temperature by intercepting incoming solar radiation and through through evaporative cooling during photosynthesis.

The big questions about biodiversity include the following:

At Duke Forest we’ll gather data that address these questions. We’ll learn about distribution and abundance of tree species, the food resources they produce, and the vertebrate consumers.

Some Field trip logistics

To bring: - Clip board (or Excel sheets on ipad) - Cruz-all - Diameter tapes - Binoculars - Smart phone with iNaturalist

Activities will include the following:

In the vignette that follows we’ll explore existing data from the site we’ll visit in the field.

Some inventory data

Here I load libraries, source code, and specify data:

source( 'clarkFunctions2024.R' )
library( dplyr )
library( mastif )
library( stringi )
library( maps )
library( car )
dataFile <- 'mastifDataNC.rdata'
plot     <- 'DUKEBW'

Next, I load a file that contains forest data for North Carolina–I’ve clipped from our global data. In this block of code I load the file plot the locations from the data.frame plotData. I use the map function to plot the locations:

load( dataFile, verbose = T )
xlim   <- range( plotData$lon ) + .5*c(-1, 1 )
ylim   <- range( plotData$lat ) + .5*c(-1, 1 )
map( 'state', xlim = xlim, ylim = ylim )                 # draw map
points( plotData$lon, plotData$lat, cex = .2, col = 2 )  # add data points
## Loading objects:
##   plotData
##   taxa
##   inputs

Here are the plots with long-term data (data includes ST for seed traps):

plotData$plot[ grep( 'ST', plotData$data ) ]
##  [1] "CALLHDWD"  "CALLPIPA"  "CWT118"    "CWT218"    "CWT318"    "CWT427"   
##  [7] "CWT527"    "CWTLG"     "CWTUG"     "DUKEBW"    "DUKEEE"    "DUKEEW"   
## [13] "DUKEFACE1" "DUKEFACE2" "DUKEFACE3" "DUKEFACE4" "DUKEFACE5" "DUKEFACE6"
## [19] "DUKEHW"    "GRSMCD"    "GRSMND"    "GRSMPK"    "MARSF"     "MARSP"

The field-trip plot is in this list, DUKEBW.

Sample effort

The problem with this map is that all of the observations are piled on top of each other. Some of these points correspond to 1 tree-year. Others correspond to 10,000. The map does not show how sample effort varies. In this case, sample effort means the number of tree-years observed within each 0.1 x 0.1 degree of lon/lat. Here I use the function obsGrid, which is sourced from clarkFunctions2024.R, to shade locations by the number of tree-years they represent. I use the function symbols to draw squares of the size and color needed to show effort:

effort <- obsGrid( inputs$treeData, plotData ) # aggregate observations to 0.1 degree
map( 'state', xlim = xlim, ylim = ylim )                 
symbols( effort$lon, effort$lat, 
         squares = rep(.11, nrow(effort) ), 
         fg = effort$cols, bg = effort$cols, inches = F, add = T )

Clearly, the effort is uneven. This is typical of environmental data. This distribution affects what we can estimate.

The list effort is a data.frame. A few lines show the structure:

lon lat nobs cols
-80.5 33.5 32 #FC9F6A
-80.4 33.5 1 #FDBB84
-80.8 33.8 63 #FC9661
-78.4 34.0 3 #FCB17B
-78.3 34.0 1 #FDBB84

The column nobs is the number of observations in the grid cell. The column cols is the color assigned to that grid cell. The hex format has # followed by 2 values each for red, green, and blue.

Taxa

The list inputs holds several objects:

summary( inputs )
##              Length Class      Mode     
## priorTable     9    data.frame list     
## randomEffect   2    -none-     list     
## seedData     247    data.frame list     
## seedNames      2    -none-     character
## seedTraits   334    -none-     numeric  
## specNames    111    -none-     character
## treeData      28    data.frame list     
## xytrap         4    data.frame list     
## xytree         4    data.frame list     
## yearEffect     2    -none-     list

Several of these are data.frames. The treeData starts like this, with one row per tree-year

plot tree species year diam shade cropCount cropFraction
GRSMCD 1.01 abiesFraseri 2011 15.9 3 NA NA
GRSMCD 1.01 abiesFraseri 2012 16.4 3 NA NA
GRSMCD 1.01 abiesFraseri 2013 16.8 3 NA NA
GRSMCD 1.01 abiesFraseri 2014 17.3 3 NA NA
GRSMCD 10.01 abiesFraseri 2011 3.4 3 NA NA

This is a large data.frame with nrow(inputs$treeData) = 796728 tree-years. The plot column is used to align this tree-year data with plot data in plotData:

plot lon lat UTMx UTMy UTMzone data ecoRegWWF
BCEF30 -82.64043 35.46287 351149.3 3925612 17 CC Appalachian-Blue Ridge forests
BCEF31 -82.62934 35.50521 352232.9 3930292 17 CC Appalachian-Blue Ridge forests
BCEFBB -82.64627 35.48923 350650.3 3928333 17 CC Appalachian-Blue Ridge forests
BCEFBP -82.59549 35.50025 355277.0 3929481 17 CC Appalachian-Blue Ridge forests
BCEFBT -82.64707 35.47600 350553.2 3926867 17 CC Appalachian-Blue Ridge forests

The UTM columns translate lon/lat to meters. The data column indicates if the site has long-term inventory data with seed traps (ST), crop counts (CC), or both (ST_CC). ecoRegWWF is the WWF ecoRegion.

For example, if I want to extract plot information for the first five tree-years in inputs$treeData I can use the match function:

mm <- match( inputs$treeData$plot, plotData$plot )
plotData[ mm[1:5], ]

These lines are identical, because they come from the same plot in inputs$treeData.

Here are some columns of inputs$treeData:

object mode description
plot character plot name, e.g., DUKEBW
tree character unique within a plot
species code8 format camelCase, 8 genus and 8 species characters
year numeric observation year
diam numeric cm
shade numeric 1 (full sun) to 5 (full shade)
cropCount numeric no. fruits counted
cropFraction numeric fraction of crop represented by cropCount

Many of the tree-years do not include cropCount (or other variables). Missing observations is a common feature of long-term data sets. They are represented by NA.

The species that occur in this data set are here:

inputs$specNames
##   [1] "abiesFraseri"     "acerBarbatum"     "acerFloridan"    
##   [4] "acerNegundo"      "acerPensylva"     "acerRubrum"      
##   [7] "acerSaccharu"     "acerSpicatum"     "aesculusFlava"   
##  [10] "aesculusPavia"    "aesculusSylvatic" "ailanthuAltissim"
##  [13] "amelanchArborea"  "amelanchLaevis"   "asiminaTriloba"  
##  [16] "betulaAlleghan"   "betulaLenta"      "carpinusCarolini"
##  [19] "caryaAlba"        "caryaIllinoin"    "caryaOvalis-g"   
##  [22] "caryaOvata"       "caryaPallida"     "castaneaDentata-"
##  [25] "castaneaPumila"   "celtisLaevigat"   "cercisCanadens"  
##  [28] "chamaecyThyoides" "chionantVirginic" "cornusFlorida"   
##  [31] "corylusAmerican"  "diospyroVirginia" "fagusGrandifo"   
##  [34] "frangulaCarolini" "fraxinusAmerican" "fraxinusPennsylv"
##  [37] "gleditsiTriacant" "gordoniaLasianth" "halesiaCarolina" 
##  [40] "ilexDecidua"      "ilexMontana"      "ilexOpaca"       
##  [43] "ilexVerticil"     "ilexVomitori"     "juglansNigra"    
##  [46] "juniperuVirginia" "linderaBenzoin"   "liquidamStyracif"
##  [49] "liriodenTulipife" "magnoliaAcuminat" "magnoliaFraseri" 
##  [52] "magnoliaGrandifl" "magnoliaMacrophy" "magnoliaTripetal"
##  [55] "magnoliaVirginia" "morusRubra"       "nyssaBiflora"    
##  [58] "nyssaSylvatic"    "ostryaVirginia"   "oxydendrArboreum"
##  [61] "paulowniTomentos" "perseaBorbonia"   "perseaPalustri"  
##  [64] "piceaRubens"      "pinusEchinata"    "pinusElliotti"   
##  [67] "pinusPalustri"    "pinusPungens"     "pinusRigida"     
##  [70] "pinusSerotina"    "pinusStrobus"     "pinusTaeda"      
##  [73] "pinusVirginia"    "platanusOccident" "prunusCarolini"  
##  [76] "prunusPensylva"   "prunusSerotina"   "quercusAlba"     
##  [79] "quercusBicolor"   "quercusCoccinea"  "quercusFalcata"  
##  [82] "quercusHemispha"  "quercusIncana"    "quercusLaevis"   
##  [85] "quercusLaurifol"  "quercusLyrata"    "quercusMacrocar" 
##  [88] "quercusMargaret"  "quercusMariland"  "quercusMichauxi" 
##  [91] "quercusMontana"   "quercusNigra"     "quercusPhellos"  
##  [94] "quercusRubra"     "quercusStellata"  "quercusVelutina" 
##  [97] "quercusVirginia"  "robiniaPseudoac"  "sassafraAlbidum" 
## [100] "sorbusAmerican"   "taxodiumAscenden" "taxodiumDistichu"
## [103] "tiliaAmerican"    "tiliaHeteroph"    "tsugaCanadens"   
## [106] "ulmusAlata"       "ulmusAmerican"    "ulmusRubra"      
## [109] "viburnumPrunifol" "viburnumRafinesq" "viburnumRufidulu"

These names could be unfamilar. Some addition insight is gained from the data.frame taxa. Here are the top rows from taxa:

genus family leaves fruit pollination dispersal gmPerSeed seedsPerFruit
abiesFraseri Abies Pinaceae needle_evergreen samara wind anemochory 0.0100 180.5
acerBarbatum Acer Sapindaceae broad_deciduous samara animal anemochory 0.0300 1.0
acerFloridan Acer Sapindaceae broad_deciduous samara animal anemochory 0.0683 1.0
acerNegundo Acer Sapindaceae broad_deciduous samara animal anemochory 0.0450 1.0
acerPensylva Acer Sapindaceae broad_deciduous samara animal anemochory 0.0385 1.0
acerRubrum Acer Sapindaceae broad_deciduous samara animal anemochory 0.0210 1.0
acerSaccharu Acer Sapindaceae broad_deciduous samara animal anemochory 0.0736 1.0
acerSpicatum Acer Sapindaceae broad_deciduous samara animal anemochory 0.0503 1.0
aesculusFlava Aesculus Sapindaceae broad_deciduous capsule animal barochory 4.5000 1.0
aesculusPavia Aesculus Sapindaceae broad_deciduous capsule animal barochory 6.5000 1.0

Many of these columns are self-explanatory. However, don’t worry about your knowledge of tree species.

Examine a long-term plot

Again, we will visit the long-term plot named DUKEBW (Duke Forest, Blackwood). I use the function extractPlot to examine long-term data from this site:

load( dataFile, verbose = T )
## Loading objects:
##   plotData
##   taxa
##   inputs
tmp  <- extractPlot( inputs, plot, taxa, plotData ) # plot specified at top of this doc
dataList <- tmp$inputs
plotData <- tmp$plotData
taxa     <- tmp$taxa

The treeData starts like this, with one row per tree-year

plot tree species year diam shade cropCount cropFraction
59157 DUKEBW 10000 acerRubrum 2000 11.1 5 NA NA
60100 DUKEBW 10000 acerRubrum 2001 11.1 5 NA NA
61122 DUKEBW 10000 acerRubrum 2002 11.1 5 NA NA
62107 DUKEBW 10000 acerRubrum 2003 11.1 5 NA NA
6396 DUKEBW 10000 acerRubrum 2004 11.1 5 NA NA

The species on this plot and their traits are in the new version of taxa. Here is a table of tree-years by year:

table( dataList$treeData$year )
## 
##  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012 
##  9841  9841  9928  9940 10122 10117 10256 10253  9690 10119 11494 11425 10661 
##  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023 
## 10660 11578 11558 10085 10041 10470 10470  9227  9930  9905    18

The plots were not sampled every year. I’ve filled in the missing years by interpolating diameters.

Spatial distribution

To see maps from two different years I can use the function mastMap:

mapList <- dataList
mapList$mapPlot <- plot
mapList$mapYears <- c(2000, 2020)
mapList$treeScale <- .3
mastMap( mapList ) 
## Loading objects:
##   dem
## Loading objects:
##   dem

In these maps trees of each species is given a different color. I have not included a color legend because it would be too long. The colors still give a sense of the diversity. There is a road separating the SW and NE sections of the plot. There is power line cutting through the southern end of the plot. If you look closely you can see that some of the large trees present in 2000 are no longer there.

To get a better idea of the diversity, here’s one genus:

genus <- 'quercus'
mapList$specNames <- dataList$specNames[ startsWith( dataList$specNames , genus ) ]
mapList$seedNames <- dataList$seedNames[ startsWith( dataList$seedNames , genus ) ]
mapList$mapYears <- 2020
mapList$treeScale <- .4
mapList$LEGEND <- TRUE
mastMap( mapList ) 
## Loading objects:
##   dem

The squares are seed traps, scaled to the number of seeds collected at that location.

Here is another genus:

genus <- 'pinus'
mapList$specNames <- dataList$specNames[ startsWith( dataList$specNames , genus ) ]
mapList$seedNames <- dataList$seedNames[ startsWith( dataList$seedNames , genus ) ]
mastMap( mapList ) 

Another example:

genus <- 'carya'
mapList$specNames <- dataList$specNames[ startsWith( dataList$specNames , genus ) ]
mapList$seedNames <- dataList$seedNames[ startsWith( dataList$seedNames , genus ) ]
mastMap( mapList ) 

Here are some species from different genera:

specs <- c('juglansNigra', 'nyssaSylvatic' )
mapList$specNames <- specs
mapList$seedNames <- specs
mastMap( mapList ) 

Size distribution

There are some contrasts in distribution on this map. They also appear to differ in size distribution. Here is a histogram of diameters for the pines:

treeData <- dataList$treeData  # less typing!
pines <- rownames( taxa )[ taxa$genus == 'Pinus' ]
xlim <- c(0, 80)
wp   <- which( treeData$species %in% pines )
hist( treeData$diam[wp], xlim = xlim, nclass = 20, main = 'Pines' )

Here are the oaks:

oaks <- rownames( taxa )[ taxa$genus == 'Quercus' ]
wq   <- which( treeData$species %in% oaks )
hist( treeData$diam[wq], xlim = xlim, nclass = 20, main = 'Oaks' )

Here is sweet gum:

par( mfrow = c(1,3), bty = 'n', omi = c(.5,.5,.1,.1) )
hist( treeData$diam[wp], xlim = xlim, nclass = 20, main = 'Pines', xlab = '', ylab = '' )
hist( treeData$diam[wq], xlim = xlim, nclass = 20, main = 'Oaks', xlab = '', ylab = '' )
ws   <- which( treeData$species == 'liquidamStyracif' )
hist( treeData$diam[ws], xlim = xlim, nclass = 20, main = 'Sweet gum', xlab = '', ylab = '' )
mtext( 'Diameter (cm)', 1, outer = T )
mtext( 'Tree years', 2, outer = T )

Dominance

Which species dominant the stand? The data.frame treeData has a row for each tree-year, but I want one row per tree. So I find the maximum diameter for each tree, which should be the most recent:

maxDiam <- tapply( treeData$diam, treeData$tree, max, na.rm = T )
mm      <- match( names(maxDiam), treeData$tree )
trees   <- data.frame( maxDiam, treeData[mm,c('species','shade')] )

The rownames of trees is the tree number.

Here is species dominance by numbers:

stems <- table( trees$species )
stems <- stems[ order(stems, decreasing = T ) ]
plot( stems, type = 's', log = 'xy', ylim = c(1, max(stems) ), xlab = 'Rank', ylab = 'Number' )

#label 12 most abundant
i <- 1:12
text( i, .9*stems[i], names(stems)[i], srt = 90, pos = 2 )

Here is species dominance by basal area:

trees$BA <- pi*(trees$maxDiam/2)^2
tba      <- tapply(trees$BA, trees$species, sum, na.rm = T )
tba      <- tba[ order(tba, decreasing = T ) ]
plot( tba, type = 's', log = 'xy', ylim = c(1, max(tba) ), xlab = 'Rank', ylab = 'Basal Area' )
text( i, .9*tba[i], names(tba)[i], srt = 90, pos = 2 )

Exercise 1: Which could be the most important tree species in this stand? Based on what criteria?

  1. How do species “partition” the environment in space?
  2. What menvironmental variables might explain spatial variation in composition?

Exercise 2: Repeat this exercise with another nearby plot, DUKEEW. Does this added information change the previous answers?

Fecundity estimates

After the field trip we’ll estimate seed production using a Bayesian hierarchical model called mastif (Clark et al. 2019, Qiu et al. 2023). We want to know which species have the largest reproductive effort and how it varies over time. Model fitting is done one genus at a time. This is because many of the seeds recovered from seed traps can be identified to genus level but not to species level. The estimation of seed produced by species is detailed in Clark et al. (2019).

We can see some fruits, nuts, and seeds in trees, but not most of them. We can observe them on the ground, but how do we determine which seeds came from which tree?

mastif uses seed traps and crop counts on trees to estimate seed productivity and dispersion. Attributes of individual trees and their local environments could explain their differences in fecundity. Inference requires information on locations of trees and seed traps and predictors ( covariates and factors ) that could explain source strength.

The `mastif` model admits two data types, crop counts (above) and seed traps (below). The model for maturation and seed production of each tree-year links to observations through a data model. In the case of seed traps, this data model describes seed transport.

The mastif model admits two data types, crop counts (above) and seed traps (below). The model for maturation and seed production of each tree-year links to observations through a data model. In the case of seed traps, this data model describes seed transport.

Crop counts enter through the long-term inventory data (treeData) or through the mastif project on iNaturalist. To add observations to mastif you can download the iNaturalist app on a smart phone. In addition to a smart phone, you need a diameter tape. A simple summary is here.

Seed traps accumulate seeds that can be counted used to estimate seed production through a dispersal kernel. They enter through inventory plots in seedData with locations in xytrap.

Many trees are not reproductively mature, and reproductive status is often unknown. For individuals of unknown maturation status (reproductively mature or not), the maturation status must be estimated together with the conditional fecundity (seed production given the mature state). Currently, maturation status depends only on tree diameter (the variable diam).

Predictors of fecundity include individual traits, site characteristics, climate, synchronicity with other trees, and lag effects. For studies that involve multiple plots and years, there can be effects of climate between sites and through time. There may be synchronicity between individuals both within and between plots.

The model incorporates two types of redistribution. The first type is a redistribution from discrete species space to a different seed-type space. Not all seed types can be definitely assigned to species. The contribution of trees of each species to multiple seed types must be estimated. As an example, a plot may include trees of these species in the same genus: pinusTaeda, pinusEchinata. The seeds recovered in traps from the same genus could include any combination of the folling types: pinusTaeda, pinusEchinata, pinusUNKN. The last seed type, identified only to the genus Pinus, could dominate seed counts. The seed production by species must estimate how they are distributed across the observed seed types. This redistribution is estimated as a specNames-to-seedNames matrix.

The second redistribution comes from source trees to seed traps, or seed dispersal. Within a plot-year there is dependence between all seed traps induced by a redistribution kernel, which describes how seeds produced by trees are dispersed in space. Because ‘seed shadows’ of trees overlap, seeds are assigned to trees only in a probabilistic sense. The capacity to obtain useful estimates depends on the number of trees, the number of seed traps, the dispersal distances of seeds, the taxonomic specificity of seed identification, and the number of species contributing to a given seed type.

In addition to predictors, the process model for fecundity can involve random tree effects, random group effects ( e.g., species-location ), year effects, and autoregressive lags limited only by the duration of data. Dispersal estimates can include species and site differences (Clark et al. 2013).

mastif package

A full description of mastif is here. The figure below summarizes the main elements. The function mastSim is used to simulate a data set. This allows us to check on data prediction and parameter recovery.

Inputs and outputs (grey) for three main functions (yellow) in the `mastif` package.

Inputs and outputs (grey) for three main functions (yellow) in the mastif package.

The function mastif fits the model to data held in four data.frames called treeData (tree-year by variables), seedData (seedTrap-year by seed counts), xytree (tree locations), and xytrap (seed-trap locations). The function mastif generates output as a list.

The function mastPlot generates plots of diagnostics.

The model requires a minimum of six inputs directly as arguments to the function mastif. There are two formulas, two character vectors, and four data.frames:

Table 2. Required in the list inputs to the function mastif

object mode components
formulaFec formula fecundity model
formulaRep formula maturation model (currently fixed at ~ diam
specNames character vector names in column species of treeData to include in model
seedNames character vector names of seed types to include in model: match column names in seedData. Seeds identified only to genus are genusUNKN
treeData data.frame tree-year by variables, includes columns plot, tree, year, diam, repr
seedData data.frame trap-year by seed types, includes columns plot, trap, year, one for each seedNames
xytree data.frame tree by location, includes columns plot, tree, x, y
xytrap data.frame trap by location, includes columns plot, trap, x, y

I introduce additional inputs in the sections that follow, but start with this basic model.

Here is code to extract species from one genus

load( dataFile, verbose = F )
genus <- 'fraxinus'
tmp   <- extractGenus( inputs, genus = genus, taxa, plotData )
dataList <- tmp$inputs
plotData <- tmp$plotData
taxa     <- tmp$taxa

Here is the trait table that has been trimmed to members of this genus on the plot:

genus family leaves fruit pollination dispersal gmPerSeed seedsPerFruit
fraxinusAmerican fraxinus Oleaceae broad_deciduous samara wind anemochory 0.0454 1
fraxinusPennsylv fraxinus Oleaceae broad_deciduous samara wind anemochory 0.0317 1

Add predictors to treeData

I want to estimate the effects of environment on seed production, so I need to add those predictors. The treeData data.frame already contains some tree-year variables, including diam and shade. Here I add plot variables to treeData from the data.frame plotData using the match function:

plotVars <- c( 'ecoRegWWF', 'slope', 'aspect', 'cec30', 'n30', 'ph', 'tpi' )
mm       <- match( dataList$treeData$plot, plotData$plot )
dataList$treeData <- cbind( dataList$treeData, plotData[mm, plotVars ] )

Here is the top of the data.frame treeData in the list dataList:

plot tree species diam shade tempAnom defAnom lateFreezeAnom tempSite defSite slope aspect cec30 n30 ph tpi
CWT427 439 fraxinusAmerican 44.7 5 -1.006 292 -3004.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 -0.748 -447 -1193.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 -0.548 81 43530.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 -1.148 -28 -169.8 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 -0.981 -152 -15720.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 0.619 110 -1012.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 0.194 238 -15040.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 -0.431 311 20970.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 -0.198 236 -8444.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51
CWT427 439 fraxinusAmerican 44.7 5 0.069 210 5462.0 11.48 -130 18.88 26.76 332.15 5.75 5.1 -51

You’ll see that some predictors are plot (e.g., tempSite, defSite, cec), plot-year (e.g., tempAnom, defAnom), tree (e.g., species) or tree-year (e.g., diam). If I were using only one plot for this analysis, I could not include any plot-level predictors; they would not vary across trees on the same plot.

Year effects

There can be year effects to allow for “masting”, the tendency for individuals of the same species to produce large seed crops quasi-synchronously. There can also be detectable effects of climate variation, which enters the model as annual anomalies. In this example, I include defAnom, which is the annual anomaly in moisture deficit. Positive values for defAnom are wet years, and vice versa. In this case the yearEffect represents variation that is shared across individuals that is not explained by defAnom. The year effects tend to be “regional”, but not global. For example, the mast years for a species in New England generally do not match those in North Carolina. Here I define yearEffect by ecoRegWWF and species (quercusAlba can differ from quercusRubra).

dataList$yearEffect <- list( groups = c( 'species', 'ecoRegWWF' ) )

Random individual effects

Individuals can vary widely in seed production, so we use random tree effects, or treeID. These are specified as a random intercept:

dataList$randomEffect <- list( randGroups = 'treeID', formulaRand = as.formula( ~ 1 ) )

I try a combination of tree-year and plot-year variables for conditional fecundity. Here are some formulas and a specification to generate seed-rain prediction for the plot = DUKEBW for the mapYears = c(2018:2021):

# assign formulas
dataList$formulaFec <- formulaFec <- as.formula( ~ diam + shade + defAnom + tpi )
dataList$formulaRep <- as.formula( ~ diam )

# plot-years to predict seed rain
mapYears <- c(2000:2021)
dataList$predList <- list( mapMeters = 10, plots = plot, years = mapYears )

Model fitting requires specification of the number of MCMC iterations ng and the burnin. Here is an analysis using the dataList with a small number of iterations. Still, it will take some time due to the large numbers of tree years and seed traps.

output   <- mastif( inputs = dataList, ng = 2000, burnin = 1000 )
mastPlot( output )

Here are some plots from mastPlot:

The fitted model predicts the data from latent state estimates (c) and from predictors (a).

The fitted model predicts the data from latent state estimates (c) and from predictors (a).

Recall that fecundity is not observed, but rather estimated. Therefore I cannot compare predicted fecundity with observed fecundity. However, I can compare the seed counts (per m2) with seed counts predicted from the estimated fecundity, or fecundity estimate –> seed traps. This is part (c) of the figure above. If the estimates of fecundity and seed dispersal are confident, then this plot would tend to the 1:1 line.

I can also compare seed counts to the prediction from the full fitted model, or predictors –> fecundity –> seed traps (part a in the figure). If the predictors explain much of variation in fecundity, and dispersal is confidently estimated by the model, then this plot will tend to the 1:1 line. The differences between these plots can indicate that a breakdown in fit comes at the stage of estimating fecundity from the predictors–the predictors may not explain much variation in fecundity, even if there are confident estimates of fecundity.

Estimated year effects for two species in two ecoregions.

Estimated year effects for two species in two ecoregions.

In the plot above, the year effects are grouped by species and ecoregion. There are two species here, and the coverage differs by year and ecoregion. This is not a strong masting genus, and the year effects are noisy, not much different from zero.

Effect of predictors on fecundity, including shade class, annual deficit anomaly (negative is wet), and the topographic position index (negative is low).

Effect of predictors on fecundity, including shade class, annual deficit anomaly (negative is wet), and the topographic position index (negative is low).

Shade generally has a negative effect on fecundity. We’ve known this to be true for growth, but is effects on fecundity of trees in the wild was not known.

The negative effects of defAnom means that both species have reduced seed production in years with high moisture deficit. In other words, moist years are good.

The negative effects of tpi, or topographic position index, on fraxinusAmerican means that seed production is higher in the moist bottomlands. The second species, fraxinusPennsylv has a smaller sample size in the data set; it does not show a tpi effect.

Predicted maturation probability and fecundity by diameter for two species.

Predicted maturation probability and fecundity by diameter for two species.

The model predicts maturation and fecundity for each species. It depends on the distribution of data, shown as histograms in the right panels.

Here is the prediction surface for seed supply to the forest floor by all Fraxinus trees on the plot:

output$mapPlot   <- plot
output$mapYears  <- mapYears
output$trapScale <- .8 
output$PREDICT <- T
mastMap( output )
Predicted seed rain across the DUKEBW plot differs by year. Circles show tree diameter. Boxes are scaled to seed counts in traps. The `+` symbol indicates a seed trap that collected no seed in that year.

Predicted seed rain across the DUKEBW plot differs by year. Circles show tree diameter. Boxes are scaled to seed counts in traps. The + symbol indicates a seed trap that collected no seed in that year.

Note how the seed supply to the forest floor changes from one year to the next. If this were a strong masting species, the yearly contrasts would be much larger. Even so, there are areas receiving abundant seed and others not at all.

A volatile species

Seed production of the genus Quercus can vary widely from year to year. Many consumers benefit from high mast years. Because there are many Quercus species, and they may not be synchronized across species, it’s possible that there may be some compensation, in the sense that a poor year for one species may coincide with good years in others. Consumers capable of consuming the fruits of more than one tree species may respond to the sum.

I fitted the model to the genus Quercus and summarize the annual variation by species and ecoregion. Values have a predictive coefficient of variation greater than 1 are omitted, so there are some straight lines connecting years across these poorly estimated years. I represent the variation within a species-ecoregion in two ways.

The estimated year effects (+/- 1 SE) add to the effects of other annual effects in the model, such as climate anomalies.

The estimated year effects (+/- 1 SE) add to the effects of other annual effects in the model, such as climate anomalies.

The estimated year effects in the model amplify the effects of other predictors that contribute inter-annual variation. For this reason, the year effects can be viewed as variation shared between individuals beyond the effects other predictors in the model. Wide standard errors for the most recent years are due to fact that many samples are not yet analyzed.

The predicted inter-annual variation in seed production per cm2 of the tree's basal area; it attempts to adjust for size differences.

The predicted inter-annual variation in seed production per cm2 of the tree’s basal area; it attempts to adjust for size differences.

The interannual variation per basal area shows wide variation across trees, because synchronicity in most years is weak. Unlike the year effect estimates, these estimates incorporate all variation, including unexplained variation.

Predicted seed rain of *Quercus* across the DUKEBW plot. Circles are trees scaled to diameter. Squares are seed traps scale to seed recovery. Prediction surfaces follow the scale at bottom.Predicted seed rain of *Quercus* across the DUKEBW plot. Circles are trees scaled to diameter. Squares are seed traps scale to seed recovery. Prediction surfaces follow the scale at bottom.

Predicted seed rain of Quercus across the DUKEBW plot. Circles are trees scaled to diameter. Squares are seed traps scale to seed recovery. Prediction surfaces follow the scale at bottom.

The variation is spatio-temporal. For example, at the DUKEBW plot shows the tendency for most seed to accumulate in the southwestern part of the plot, which is high elevation and dominated by oaks. However, the maps can shift between years based on the interannual variation within and between species.

Citations

Clark, JS, C Nunes, and B Tomasek. 2019. Masting as an unreliable resource: spatio-temporal host diversity merged with consumer movement, storage, and diet. Ecological Monographs, 89:e01381.

Journe, V., et al., 2022. Globally, tree fecundity exceeds productivity gradients. Ecology Letters, DOI: 10.1111/ele.14012.

Qiu T., et al., 2022. Limits to reproduction and seed size-number tradeoffs that shape forest dominance and future recovery. Nature Communications, 13:2381.