Getting started with pecan

Note

This vignette was drafted with AI assistance and has not yet been comprehensively reviewed.

devtools::load_all("..")   # load development version of pecan
library(future)

This vignette walks through the three main functions — pecan_strata(), pecan_fit(), and pecan_plot() — using the bundled synthetic datasets. It assumes familiarity with graphical lasso and the hglasso coupling model.

Example data

Two datasets are included:

pecan_sim: 80 samples, 4 genera, 12 species, 28 ASVs. Single stratum.
pecan_sim_strat: 200 samples, 6 genera, 12 species, 24 ASVs, split into "stratum1" and "stratum2" (100 each) with orthogonal network topologies. This dataset uses a 6-genus taxonomy — with only 4 genera the CLR covariance matrix is always rank-deficient and the graphical lasso cannot resolve genus-level edges cleanly.

Each dataset is a named list with three elements that map directly to the inputs of pecan_strata():

data("pecan_sim")
names(pecan_sim)

[1] "reads"       "annotations" "meta"

# reads: samples × ASVs; ASV columns are opaque IDs (asv1, asv2, ...)
pecan_sim$reads[1:4, 1:6]

# A tibble: 4 × 6
  sample.id  asv1  asv2  asv3  asv4  asv5
  <chr>     <int> <int> <int> <int> <int>
1 sample.1      2     9    32    78     6
2 sample.2     34   240  1100   461    41
3 sample.3      8   100   427   560    33
4 sample.4      2    27    85   480    29

# annotations: maps each ASV ID to its species and genus
head(pecan_sim$annotations)

# A tibble: 6 × 5
  asv.id taxon     subdivision species   genus 
  <chr>  <chr>     <chr>       <chr>     <chr> 
1 asv1   SpeciesA1 Simulated   SpeciesA1 GenusA
2 asv2   SpeciesA1 Simulated   SpeciesA1 GenusA
3 asv3   SpeciesA1 Simulated   SpeciesA1 GenusA
4 asv4   SpeciesA2 Simulated   SpeciesA2 GenusA
5 asv5   SpeciesA2 Simulated   SpeciesA2 GenusA
6 asv6   SpeciesA3 Simulated   SpeciesA3 GenusA

# meta: one row per sample; no stratification variable here
pecan_sim$meta

# A tibble: 80 × 1
   sample.id
   <chr>    
 1 sample.1 
 2 sample.2 
 3 sample.3 
 4 sample.4 
 5 sample.5 
 6 sample.6 
 7 sample.7 
 8 sample.8 
 9 sample.9 
10 sample.10
# ℹ 70 more rows

ASV column names are intentionally opaque (asv1, asv2, …), as in real sequencing data. The taxonomy is carried entirely in annotations.

pecan_sim_strat has the same three-element structure, but meta includes a stratum column whose values will be passed to stratum_col:

data("pecan_sim_strat")
names(pecan_sim_strat)

[1] "reads"       "annotations" "meta"

# reads: 200 samples × 24 ASVs
pecan_sim_strat$reads[1:4, 1:6]

# A tibble: 4 × 6
  sample.id  asv1  asv2  asv3  asv4  asv5
  <chr>     <int> <int> <int> <int> <int>
1 sample.1    125   554    54   241   109
2 sample.2     49   197    22    74   982
3 sample.3     66   329    91   432    22
4 sample.4     43   258    56   258    35

# annotations: 6-genus taxonomy (SpA1, SpA2, … → GenusA through GenusF)
head(pecan_sim_strat$annotations)

# A tibble: 6 × 5
  asv.id taxon subdivision species genus 
  <chr>  <chr> <chr>       <chr>   <chr> 
1 asv1   SpA1  Simulated   SpA1    GenusA
2 asv2   SpA1  Simulated   SpA1    GenusA
3 asv3   SpA2  Simulated   SpA2    GenusA
4 asv4   SpA2  Simulated   SpA2    GenusA
5 asv5   SpB1  Simulated   SpB1    GenusB
6 asv6   SpB1  Simulated   SpB1    GenusB

# meta: includes the stratification column
pecan_sim_strat$meta

# A tibble: 200 × 2
   sample.id stratum 
   <chr>     <chr>   
 1 sample.1  stratum1
 2 sample.2  stratum1
 3 sample.3  stratum1
 4 sample.4  stratum1
 5 sample.5  stratum1
 6 sample.6  stratum1
 7 sample.7  stratum1
 8 sample.8  stratum1
 9 sample.9  stratum1
10 sample.10 stratum1
# ℹ 190 more rows

The stratum column in meta is the only structural difference from the base case. Everything else — the three-element list, the opaque ASV IDs, the annotations format — is identical.

`pecan_strata()`

pecan_strata() aggregates ASV counts to the requested taxonomy levels, optionally applies a prevalence filter, and (when stratum_col is supplied) splits samples into strata. The output is a pecan_strata object passed directly to pecan_fit().

st <- pecan_strata(
  reads_asv_tbl       = pecan_sim$reads,
  annotations_asv_tbl = pecan_sim$annotations,
  meta_tbl            = pecan_sim$meta,
  stratum_col         = NULL,         # NULL → single stratum named "all"
  levels              = c("species", "genus"),
  mode                = "constrained",
  min_prevalence      = 0.1
)
st

pecan_strata
  mode      : constrained (cascade: adjacent )
  levels    : genus > species 
  stratum   : <none>

levels must be supplied in fine → coarse order. pecan_fit() uses the first element as the fine level (L) and the second as the coarse level (H). These names must correspond to columns in annotations_asv_tbl.

mode controls which ASVs are used to build the coarser level:

"constrained" (recommended): the genus-level count table is built only from ASVs that are annotated at both species and genus level. This ensures the two levels share a consistent set of reads, which is important for coupling to work as intended.
"independent": each level is aggregated from any ASV annotated at that level. The two levels may include different reads, which can weaken or mislead the coupling signal.

min_prevalence drops features present in fewer than that fraction of samples before stratification, applying a consistent filter across all strata. A value of 0.10–0.20 is typical; use lower values when community composition is highly variable.

Object structure

A pecan_strata object is a named list with class "pecan_strata". Its core slot is strata: a list keyed by level name (here "species" and "genus"), each containing two named lists of the same length — one entry per stratum:

pecan_strata
└── strata
    ├── species
    │   ├── strata_reads        named list of count tibbles (samples × features)
    │   └── strata_annotations  named list of annotation tibbles
    └── genus
        ├── strata_reads
        └── strata_annotations

Without a stratification variable there is one entry named "all" in each list. The count tibble has a leading sample.id column followed by one column per feature at that level:

# Count matrix at species level (80 samples × 12 species + sample.id)
dim(st$strata$species$strata_reads$all)

[1] 80 13

st$strata$species$strata_reads$all[1:3, 1:5]

# A tibble: 3 × 5
  sample.id SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
  <chr>         <int>     <int>     <int>     <int>
1 sample.1         43        84       335       328
2 sample.10       799       137       663       412
3 sample.11       163       106       483       590

# Count matrix at genus level (80 samples × 4 genera + sample.id)
dim(st$strata$genus$strata_reads$all)

[1] 80  5

st$strata$genus$strata_reads$all[1:3, ]

# A tibble: 3 × 5
  sample.id GenusA GenusB GenusC GenusD
  <chr>      <int>  <int>  <int>  <int>
1 sample.1     462   1008   1237   2293
2 sample.10   1599   1581   1138    682
3 sample.11    752    912   2037   1299

The annotation tibble maps each feature at a given level to its parent at the coarser level — this is what pecan_fit() uses to build the cross-level coupling term:

st$strata$species$strata_annotations$all

# A tibble: 12 × 2
   species   genus 
   <chr>     <chr> 
 1 SpeciesA1 GenusA
 2 SpeciesA2 GenusA
 3 SpeciesA3 GenusA
 4 SpeciesB1 GenusB
 5 SpeciesB2 GenusB
 6 SpeciesB3 GenusB
 7 SpeciesC1 GenusC
 8 SpeciesC2 GenusC
 9 SpeciesC3 GenusC
10 SpeciesD1 GenusD
11 SpeciesD2 GenusD
12 SpeciesD3 GenusD

Adding a stratification variable

Supplying stratum_col causes pecan_strata() to split the samples before aggregation. Each level’s strata_reads and strata_annotations lists then have one entry per stratum instead of the single "all" entry:

data("pecan_sim_strat")

st_strat <- pecan_strata(
  reads_asv_tbl       = pecan_sim_strat$reads,
  annotations_asv_tbl = pecan_sim_strat$annotations,
  meta_tbl            = pecan_sim_strat$meta,
  stratum_col         = "stratum",
  levels              = c("species", "genus"),
  mode                = "constrained",
  min_prevalence      = 0.1
)
st_strat

pecan_strata
  mode      : constrained (cascade: adjacent )
  levels    : genus > species 
  stratum   : stratum 
  strata    : stratum1, stratum2

The print summary now lists the stratum names. The internal structure is identical to the single-stratum case — two strata keys replace the one "all" key:

# Keys present in strata_reads at each level
names(st_strat$strata$species$strata_reads)

[1] "stratum1" "stratum2"

# Dimensions: 100 samples × 12 species (+sample.id) per stratum
sapply(st_strat$strata$species$strata_reads, dim)

     stratum1 stratum2
[1,]      100      100
[2,]       13       13

pecan_fit() iterates over these stratum keys automatically — no further preparation is needed.

`pecan_fit()`

pecan_fit() dispatches over strata and levels. The method argument selects a solver; the control list passes all solver and stability-selection parameters.

list_solvers()

[1] "glasso"  "hglasso"

Solvers

"glasso" fits one graphical lasso per level independently via huge::huge(). It has no coupling across levels. Use it as a baseline or when you want uncoupled per-level networks.

"hglasso" fits both levels jointly using ADMM with a quadratic coupling penalty. Coupling makes the two problems aware of each other: genus-level evidence can promote or suppress species-level edges, and vice versa. This is the primary solver for coupled inference.

Lambda path parameters

Both solvers build a log-spaced regularisation path from λ_max down to lambda_min_ratio × λ_max. These parameters apply to both solvers:

Parameter	Default	Role
`nlambda`	30	Number of path steps
`lambda_min_ratio`	0.05	Lower end of path relative to λ_max
`keep_frac`	0.75	Fraction of path used (trims the densest end)
`min_lams_keep`	10	Minimum steps regardless of `keep_frac`

Heuristics. Start with lambda_min_ratio = 0.08–0.12. If the fitted network is too dense, increase it; if too sparse, decrease it. nlambda = 20–30 is sufficient for most datasets; increase only if selection probabilities look jagged.

Lambda selection (single-path fits)

When stability = FALSE, pecan_fit() traverses the full regularisation path and must choose one lambda to report as the primary result. By default it uses extended BIC (eBIC; Foygel & Drton 2010), which balances the Gaussian negative log-likelihood against an edge-count penalty calibrated to both sample size and number of features:

\[\text{eBIC}_\gamma = n\bigl[\text{tr}(S\,\hat\Omega) - \log\det\hat\Omega\bigr] + |E|\bigl[\log n + 4\gamma\log p\bigr]\]

eBIC is summed across all levels before minimisation, so jointly-fit solvers (hglasso) pick a single consistent step and are not left with level-specific choices that may be contradictory. The full path arrays are always retained in results alongside the selected solution, so any step can be inspected or used downstream.

Control parameter	Default	Role
`select`	`"ebic"`	Selection method: `"ebic"` or `"sparsest"`
`ebic_gamma`	0.5	eBIC sparsity preference ∈ [0, 1]; 0 = BIC, higher favours sparser models

ebic_gamma = 0.5 is a widely-used default. Increase it toward 1 for smaller n/p ratios where false-positive edges are more likely; decrease toward 0 to allow denser networks when data are plentiful. Use select = "sparsest" to recover the old behaviour (pick the most regularised step regardless of data fit), or as a quick sanity check: if eBIC and sparsest disagree substantially it is a signal that the path range needs widening.

Glasso fit

stability = FALSE traverses the full regularisation path and returns the eBIC-selected network alongside all path arrays. With method = "glasso" the two levels are fit independently via huge::huge():

fit_glasso <- pecan_fit(
  st,
  method    = "glasso",
  stability = FALSE,
  control   = list(
    nlambda          = 20L,
    lambda_min_ratio = 0.10,
    keep_frac        = 0.80
  )
)
fit_glasso

pecan_fit
  method    : glasso (single fit) 
  mode      : constrained 
  stratum   : <none> 
  levels    : species, genus 
  precision : yes (prec_mx) 
  selection : eBIC 
  path      : 16 steps (adj_arr / prec_arr in results)
  results   : 2 rows (level x stratum)
# A tibble: 2 × 4
  level   stratum n_edges lambda_opt
  <chr>   <chr>     <int>      <dbl>
1 species all          11      0.217
2 genus   all           4      0.217

Stability selection is also available with method = "glasso" by setting stability = TRUE; see the Stability-selection fit section for details.

Output structure. Every pecan_fit object is a named list with the following top-level slots:

Slot	Contents
`method`	Solver used (`"glasso"` or `"hglasso"`)
`stability`	Whether stability selection was run
`mode`	`"constrained"` or `"independent"`
`stratum_var`	Name of the stratification column (`NULL` if none)
`levels`	Level names in fine → coarse order
`results`	Tibble — one row per level × stratum
`details`	List — per-stratum diagnostics

The results tibble (single-path fits). The primary output. For stability = FALSE it has eleven columns: summary scalars plus path arrays as list columns.

Column	Type	Contents
`level`	chr	Level name
`stratum`	chr	Stratum name
`nodes`	list	Annotation tibble for this level × stratum
`adj_mx`	list	p × p logical adjacency at the eBIC-selected lambda
`prec_mx`	list	p × p precision matrix at the selected lambda
`n_edges`	int	Edge count at the selected step
`adj_arr`	list	p × p × L logical array — adjacency at every path step
`prec_arr`	list	p × p × L numeric array — precision at every path step
`lambda`	list	Length-L numeric vector of lambda values
`lambda_opt`	dbl	The selected lambda value (eBIC minimum or last step)
`ebic_scores`	list	Length-L per-level eBIC scores (`NULL` if eBIC unavailable)

# Primary summary: selected solution per level
fit_glasso$results[, c("level", "stratum", "n_edges", "lambda_opt")]

# A tibble: 2 × 4
  level   stratum n_edges lambda_opt
  <chr>   <chr>     <int>      <dbl>
1 species all          11      0.217
2 genus   all           4      0.217

The path arrays let you inspect any step without re-fitting. For example, to see how the genus-level network changes across the full regularisation path:

r <- fit_glasso$results
# adj_arr is a p × p × L array (step index in the 3rd dimension)
cat("adj_arr dims (genus):", dim(r$adj_arr[[2]]), "\n")

adj_arr dims (genus): 4 4 16

# Edge count at each step
apply(r$adj_arr[[2]], 3, function(a) { diag(a) <- FALSE; sum(a) / 2 })

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 
 2  3  4  4  4  4  4  4  5  5  6  6  6  6  6  6

The eBIC scores show the goodness-of-fit tradeoff at each step:

r <- fit_glasso$results
data.frame(
  step       = seq_along(r$lambda[[2]]),
  lambda     = round(r$lambda[[2]], 4),
  edges      = apply(r$adj_arr[[2]], 3, function(a) { diag(a)<-FALSE; sum(a)/2 }),
  ebic_genus = round(r$ebic_scores[[2]], 1)
)

   step lambda edges ebic_genus
1     1 0.4497     2      341.5
2     2 0.3984     3      338.6
3     3 0.3529     4      335.6
4     4 0.3126     4      325.4
5     5 0.2769     4      316.2
6     6 0.2453     4      307.9
7     7 0.2173     4      300.5
8     8 0.1925     4      294.0
9     9 0.1705     5      295.2
10   10 0.1511     5      288.0
11   11 0.1338     6      285.6
12   12 0.1186     6      275.9
13   13 0.1050     6      266.2
14   14 0.0930     6      256.5
15   15 0.0824     6      246.8
16   16 0.0730     6      237.1

The eBIC minimum (marked by lambda_opt) balances fit improvement against the log-edge penalty. Steps to the left are over-regularised; steps to the right overfit.

The details list (single-path fits). details is keyed by stratum and contains three summary fields per stratum (the full path arrays live in results):

Field	Description
`l_opt`	Index of the selected path step
`lambda_opt`	Lambda value at the selected step
`combined_ebic`	Length-L combined eBIC (summed across levels); `NULL` if eBIC was unavailable

d <- fit_glasso$details$all
cat("Selected step:", d$l_opt, "\n")

Selected step: 7

cat("Selected lambda:", round(d$lambda_opt, 4), "\n")

Selected lambda: 0.2173

cat("Combined eBIC at selected step:",
    round(d$combined_ebic[d$l_opt], 1), "\n")

Combined eBIC at selected step: 1334.2

Coupling parameters (`"hglasso"` only)

Parameter	Default	Role
`rho_couple`	1.0	Total coupling strength
`alpha`	0.5	Directional balance (0 = species→genus only, 1 = genus→species only)
`coupling_mode`	`"signal_only"`	How coupling is applied
`rho_suppress`	NULL	Suppression multiplier in `signal_and_suppress` mode

rho_couple sets how strongly the two levels attract each other. A value of 0 recovers uncoupled glasso at both levels. A value of 1–2 is a reasonable starting range; above 3 the coupling tends to dominate the data.

alpha = 0.5 gives symmetric coupling. Set alpha closer to 1 to have genus-level edges guide species-level selection more strongly; closer to 0 for the reverse. Symmetric is appropriate when you have no reason to prefer one direction.

coupling_mode selects the coupling mechanism:

"signal_only": a uniform penalty reduction is applied to all species pairs that have cross-level support. Edges without support are treated symmetrically (no extra penalty). This is the safer starting point.
"signal_and_suppress": a smooth per-edge weight h ∈ [0, 1] simultaneously boosts edges with cross-level support (h → 1: threshold reduced) and penalises edges without it (h → 0: threshold inflated). The neutral point (uncoupled baseline) always sits at h = 0.5. Use this when the genus-level network is dense enough to provide reliable suppression signal; it produces sparser, more interpretable species-level networks than signal_only when n/p is low.

rho_suppress (only active in signal_and_suppress mode) multiplies the natural suppression scale. NULL (default) uses natural scaling. Values > 1 amplify suppression; values < 1 soften it.

ADMM convergence parameters

These rarely need changing from defaults. rho_admm_L applies to the fine level (species); rho_admm_H to the coarse level (genus).

Parameter	Default	Role
`rho_admm_L` / `rho_admm_H`	1.0	ADMM step size per level
`max_outer_iter`	10	Outer alternating minimization steps
`tol_outer`	1e-3	Outer convergence tolerance
`max_iter`	2000	Max inner ADMM iterations per outer step
`tol_abs` / `tol_rel`	1e-4	Inner ADMM convergence tolerances

Heuristics. For exploratory single-path fits, reduce max_outer_iter to 5 and max_iter to 500 for speed. For final fits, use the defaults. If the solver is not converging (unusual), try increasing rho_admm_L / rho_admm_H toward 2–3.

Coupled fit

With method = "hglasso" the two levels are fit jointly. Setting stability = FALSE traverses the full path with the coupling penalty active and selects the eBIC-optimal lambda:

ctl_single <- list(
  nlambda          = 20L,
  lambda_min_ratio = 0.10,
  keep_frac        = 0.80,
  rho_couple       = 1.5,
  alpha            = 0.5,
  coupling_mode    = "signal_and_suppress",
  max_outer_iter   = 5L,
  max_iter         = 500L
)

fit_single <- pecan_fit(
  st,
  method    = "hglasso",
  stability = FALSE,
  control   = ctl_single
)
fit_single

pecan_fit
  method    : hglasso (single fit) 
  mode      : constrained 
  stratum   : <none> 
  levels    : species, genus 
  precision : yes (prec_mx) 
  selection : eBIC 
  path      : 16 steps (adj_arr / prec_arr in results)
  results   : 2 rows (level x stratum)
# A tibble: 2 × 4
  level   stratum n_edges lambda_opt
  <chr>   <chr>     <int>      <dbl>
1 species all           0      0.277
2 genus   all           0      0.122

Both prec_arr and prec_mx are populated because hglasso’s ADMM solver returns Θ at every step. The path arrays are in results, structured identically to the glasso case:

r <- fit_single$results
cat("adj_arr dims (species):", dim(r$adj_arr[[1]]), "\n")    # p × p × L

adj_arr dims (species): 12 12 16

cat("prec_arr dims (species):", dim(r$prec_arr[[1]]), "\n")  # p × p × L

prec_arr dims (species): 12 12 16

# Precision matrix at the eBIC-selected step
round(r$prec_mx[[1]][1:4, 1:4], 3)

          SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
SpeciesA1     1.023     0.000      0.00     0.000
SpeciesA2     0.000     0.919      0.00     0.000
SpeciesA3     0.000     0.000      1.25     0.000
SpeciesB1     0.000     0.000      0.00     1.139

The combined eBIC path (summed across both levels) that drove the selection is in details$all$combined_ebic; plot it against the lambda path to inspect the tradeoff curve.

Note

For hglasso, eBIC is evaluated on the correlation scale: each precision matrix is rescaled as $\tilde\Omega_{ij} = \sigma_i\sigma_j\Omega_{ij}$ (where $\sigma_i$ are feature standard deviations from the CLR data) so that all levels are on a common, scale-invariant basis. Because the ADMM coupling term perturbs the optimality conditions away from the standard graphical lasso, the eBIC here is an approximation. It reliably identifies models that are clearly over- or under-regularised, but for precise lambda calibration stability selection (stability = TRUE) remains the principled choice.

Stability selection parameters

When stability = TRUE, stability_select() wraps the solver with Meinshausen-Bühlmann subsampling. Only edges whose selection probability exceeds a calibrated threshold π are retained.

Parameter	Default	Role
`B`	50	Number of subsamples
`sub_frac`	0.80	Fraction of samples per subsample
`min_gap`	2	Ensures m ≤ n − min_gap
`target_ev`	15	Target expected false positives
`target_fdr`	NULL	Alternative: target FDR bound (use one or the other)
`seed`	1	RNG seed
`parallel`	TRUE	Use `furrr` with the active `future` plan

Heuristics. B = 50 is standard for production; use B = 25 during development. target_ev = 10–20 balances sensitivity and specificity; lower values give fewer, more reliable edges. sub_frac = 0.8 works well for n ≥ 40; reduce slightly for smaller n. Always set a seed for reproducibility.

Stability-selection fit

stability = TRUE wraps the solver with subsampling. Edges are selected based on selection probability, not a single fit’s adjacency matrix. Set a future plan before calling if parallel = TRUE.

plan(multisession(workers = 4L))

ctl_stab <- list(
  B                = 25L,          # increase to 50 for production
  target_ev        = 15,
  seed             = 7492L,
  parallel         = TRUE,
  nlambda          = 30L,
  lambda_min_ratio = 0.10,
  keep_frac        = 0.75,
  rho_couple       = 1.5,
  alpha            = 0.5,
  coupling_mode    = "signal_and_suppress",
  max_outer_iter   = 8L,
  max_iter         = 1000L
)

fit_stab <- pecan_fit(
  st,
  method    = "hglasso",
  stability = TRUE,
  control   = ctl_stab
)
fit_stab

pecan_fit
  method    : hglasso (+ stability selection) 
  mode      : constrained 
  stratum   : <none> 
  levels    : species, genus 
  precision : available (prec_mx column) 
  results   : 2 rows (level x stratum)
# A tibble: 2 × 3
  level   stratum n_edges
  <chr>   <chr>     <int>
1 species all          25
2 genus   all           4

The details slot contains the full stability output, keyed by stratum and then by level. The most useful per-level diagnostics are:

pi_thr: the calibrated selection probability threshold. An edge is retained if its maximum selection probability across the lambda path meets or exceeds this value. Higher thresholds mean stricter selection.
ev_bound: the expected number of falsely selected edges at pi_thr, derived from the MB bound. This is the quantity controlled by target_ev.
fdr_bound: the implied FDR bound at pi_thr, i.e. ev_bound / max(R, 1). Useful for comparing networks of different sizes.
R: the number of selected edges at pi_thr.
subsample_m: actual subsample size used (≈ n × sub_frac, subject to min_gap).

s <- fit_stab$details$all
cat("--- species ---\n")

--- species ---

cat("Threshold (π):             ", s$species$pi_thr, "\n")

Threshold (π):              0.95

cat("Selected edges (R):        ", s$species$R, "\n")

Selected edges (R):         25

cat("Expected false positives:  ", round(s$species$ev_bound, 1), "\n")

Expected false positives:   29

cat("FDR bound:                 ", round(s$species$fdr_bound, 3), "\n")

FDR bound:                  1.159

cat("Subsample size (m):        ", s$species$subsample_m, "\n")

Subsample size (m):         64

cat("\n--- genus ---\n")


--- genus ---

cat("Threshold (π):             ", s$genus$pi_thr, "\n")

Threshold (π):              0.59

cat("Selected edges (R):        ", s$genus$R, "\n")

Selected edges (R):         4

cat("Expected false positives:  ", round(s$genus$ev_bound, 1), "\n")

Expected false positives:   14.5

cat("FDR bound:                 ", round(s$genus$fdr_bound, 3), "\n")

FDR bound:                  3.63

Interpreting these numbers. The species threshold π = 0.95 means an edge must have appeared in at least 95% of the 25 subsamples to be retained — a very conservative bar. At this threshold, 25 out of 66 possible species pairs are selected, and the MB bound guarantees at most 29 of those are expected to be false positives (FDR bound ≤ 1.16). With only 25 subsamples the bound is loose; the calibration pushed π to the top of the search grid because target_ev = 15 was not achievable at this sample size — a common outcome with small B and modest n/p. Increasing to B = 50 tightens the bound substantially.

The genus level is better calibrated: π = 0.59 and ev_bound = 14.5 (just at the target of 15), with 4 edges selected from 6 possible pairs. The FDR bound 3.63 exceeds 1 — technically valid but uninformative — because R is very small. When R is small, even one false positive implies a high FDR; this is a known limitation of the MB framework for dense regularisation at coarse levels.

Rule of thumb: a π ≥ 0.75 with ev_bound ≤ target_ev and FDR bound ≤ 0.20 is a comfortable regime. Values outside that range, as here, indicate either more subsamples or a stronger regularisation path are needed.

The results tibble (stability fits). When stability = TRUE, the results tibble has the same six core columns as always (level, stratum, nodes, adj_mx, prec_mx, n_edges) but does not contain path arrays — those columns (adj_arr, prec_arr, lambda, lambda_opt, ebic_scores) are only present for single-path fits. The selection probability history and calibration diagnostics live in details.

The details list (stability fits). Keyed by stratum, then level. The path arrays from single-path fits (adj_arr, prec_arr, lambda) are replaced by calibration outputs and the full selection probability history:

Field	Description
`adj_mx`	Logical adjacency matrix (same as in `results`)
`p_hat_max`	p × p matrix of max selection probability across the lambda path
`p_hat_array`	p × p × L array of selection probabilities at each lambda step
`pi_thr`	Calibrated threshold
`R`	Number of selected edges
`ev_bound`	Expected false positives at `pi_thr` (MB bound)
`fdr_bound`	`ev_bound / max(R, 1)`
`prec_mx`	Precision matrix refitted at the optimal lambda
`lambda`	Lambda path vector (length L)
`B`, `sub_frac`, `subsample_m`	Subsampling parameters used

p_hat_max is the most useful field for downstream work — it shows how close every edge came to being selected, regardless of whether it crossed the threshold:

round(fit_stab$details$all$species$p_hat_max, 2)[1:5, 1:5]

          SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1 SpeciesB2
SpeciesA1      0.00      0.72      1.00      0.20      0.04
SpeciesA2      0.72      0.00      1.00      0.08      0.04
SpeciesA3      1.00      1.00      0.00      0.00      0.08
SpeciesB1      0.20      0.08      0.00      0.00      0.80
SpeciesB2      0.04      0.04      0.08      0.80      0.00

prec_mx is the precision matrix Θ refitted on the full dataset at the optimal lambda. For hglasso stability fits it is also available via results$prec_mx:

round(fit_stab$results$prec_mx[[1]][1:4, 1:4], 3)

          SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
SpeciesA1     1.467     0.000     0.200     0.000
SpeciesA2     0.000     1.296     0.194     0.000
SpeciesA3     0.200     0.194     1.414     0.000
SpeciesB1     0.000     0.000     0.000     1.228

Stratified fit

Passing a stratified pecan_strata object to pecan_fit() requires no additional arguments — the function iterates over strata automatically and applies the same control list to each:

fit_strat <- pecan_fit(
  st_strat,
  method    = "hglasso",
  stability = TRUE,
  control   = ctl_stab
)
fit_strat

pecan_fit
  method    : hglasso (+ stability selection) 
  mode      : constrained 
  stratum   : stratum 
  levels    : species, genus 
  precision : available (prec_mx column) 
  results   : 4 rows (level x stratum)
# A tibble: 4 × 3
  level   stratum  n_edges
  <chr>   <chr>      <int>
1 species stratum1      34
2 genus   stratum1      10
3 species stratum2      21
4 genus   stratum2      10

The results tibble now has one row per level × stratum (four rows instead of two), and stratum_var records which metadata column drove the split:

fit_strat$results[, c("level", "stratum", "n_edges")]

# A tibble: 4 × 3
  level   stratum  n_edges
  <chr>   <chr>      <int>
1 species stratum1      34
2 genus   stratum1      10
3 species stratum2      21
4 genus   stratum2      10

cat("stratum_var:", fit_strat$stratum_var, "\n")

stratum_var: stratum

The details list gains one key per stratum, each containing the same per-level sub-list as the single-stratum case:

# Top-level keys: one per stratum
names(fit_strat$details)

[1] "stratum1" "stratum2"

# Each stratum has the same per-level structure
names(fit_strat$details$stratum1)

[1] "species" "genus"

`pecan_plot()`

pecan_plot() produces a combined heatmap + network layout. The layout adapts automatically to whether the fit has one stratum or many.

Key arguments:

Argument	Default	Effect
`edge_color`	`"steelblue"`	Colour of inferred edges
`edge_alpha`	0.8	Transparency of edges
`node_size`	1.5	Node point size
`drop_isolates`	TRUE	Hide unconnected nodes
`show_implied`	FALSE	Draw implied coarse-level edges
`implied_color` / `implied_lty`	`"grey50"` / `"dashed"`	Style of implied edges
`heatmap`	TRUE	Show heatmap (single-stratum) or faceted heatmap (multi-stratum)
`heatmap_ncol`	1	Columns in multi-stratum heatmap facet

Single-stratum

For a single-stratum fit, the default layout places the coarse-level heatmap on the left and the two network panels stacked on the right. The heatmap fill encodes the number of cross-group fine-level edges supporting each coarse pair; a solid border marks a direct coarse-level edge. Dashed borders (show_implied = TRUE) mark pairs linked by fine-level edges that did not themselves reach the selection threshold at the coarse level.

pecan_plot(
  fit_stab,
  edge_color    = "#0072B2",
  show_implied  = TRUE,
  implied_lty   = "dashed",
  implied_alpha = 0.4,
  implied_color = "#D55E00"
)

Multi-stratum

For multi-stratum fits, heatmap = TRUE (default) returns a faceted heatmap — one panel per stratum — so genus-level patterns can be compared across conditions on a common axis. Pass heatmap = FALSE for the classic level × stratum network grid.

pecan_plot(
  fit_strat,
  edge_color    = "#0072B2",
  show_implied  = TRUE,
  implied_color = "#D55E00",
  heatmap_ncol  = 2L
)

pecan_plot(
  fit_strat,
  heatmap       = FALSE,
  edge_color    = "#0072B2",
  show_implied  = TRUE,
  implied_color = "#D55E00"
)

The two strata have fully orthogonal genus-level topologies: "stratum1" has three active genus pairs — A–D cooperating, B–E competing, and C–F cooperating; "stratum2" reassigns every genus to a different partner — A–F cooperating, B–C competing, and D–E cooperating. The species-level networks mirror this: the cross-genus species edges are in completely different positions, and the within-genus correlated pairs differ too (SpA1–SpA2 and SpD1–SpD2 are correlated in stratum1; SpA1–SpA2 and SpF1–SpF2 in stratum2). The heatmap facets should show dark tiles in non-overlapping positions across the two panels.

Example data

pecan_strata()

Object structure

Adding a stratification variable

pecan_fit()

Solvers

Lambda path parameters

Lambda selection (single-path fits)

Glasso fit

Coupling parameters ("hglasso" only)

ADMM convergence parameters

Coupled fit

Stability selection parameters

Stability-selection fit

Stratified fit

pecan_plot()

Single-stratum

Multi-stratum

`pecan_strata()`

`pecan_fit()`

Coupling parameters (`"hglasso"` only)

`pecan_plot()`