Getting started with pecan

Note

This vignette was drafted with AI assistance and has not yet been comprehensively reviewed.

devtools::load_all("..")   # load development version of pecan
library(future)

This vignette walks through the three main functions — pecan_strata(), pecan_fit(), and pecan_plot() — using the bundled synthetic datasets. It assumes familiarity with graphical lasso and the hglasso coupling model.

Example data

Two datasets are included:

  • pecan_sim: 80 samples, 4 genera, 12 species, 28 ASVs. Single stratum.
  • pecan_sim_strat: 200 samples, 6 genera, 12 species, 24 ASVs, split into "stratum1" and "stratum2" (100 each) with orthogonal network topologies. This dataset uses a 6-genus taxonomy — with only 4 genera the CLR covariance matrix is always rank-deficient and the graphical lasso cannot resolve genus-level edges cleanly.

Each dataset is a named list with three elements that map directly to the inputs of pecan_strata():

data("pecan_sim")
names(pecan_sim)
[1] "reads"       "annotations" "meta"       
# reads: samples × ASVs; ASV columns are opaque IDs (asv1, asv2, ...)
pecan_sim$reads[1:4, 1:6]
# A tibble: 4 × 6
  sample.id  asv1  asv2  asv3  asv4  asv5
  <chr>     <int> <int> <int> <int> <int>
1 sample.1      2     9    32    78     6
2 sample.2     34   240  1100   461    41
3 sample.3      8   100   427   560    33
4 sample.4      2    27    85   480    29
# annotations: maps each ASV ID to its species and genus
head(pecan_sim$annotations)
# A tibble: 6 × 5
  asv.id taxon     subdivision species   genus 
  <chr>  <chr>     <chr>       <chr>     <chr> 
1 asv1   SpeciesA1 Simulated   SpeciesA1 GenusA
2 asv2   SpeciesA1 Simulated   SpeciesA1 GenusA
3 asv3   SpeciesA1 Simulated   SpeciesA1 GenusA
4 asv4   SpeciesA2 Simulated   SpeciesA2 GenusA
5 asv5   SpeciesA2 Simulated   SpeciesA2 GenusA
6 asv6   SpeciesA3 Simulated   SpeciesA3 GenusA
# meta: one row per sample; no stratification variable here
pecan_sim$meta
# A tibble: 80 × 1
   sample.id
   <chr>    
 1 sample.1 
 2 sample.2 
 3 sample.3 
 4 sample.4 
 5 sample.5 
 6 sample.6 
 7 sample.7 
 8 sample.8 
 9 sample.9 
10 sample.10
# ℹ 70 more rows

ASV column names are intentionally opaque (asv1, asv2, …), as in real sequencing data. The taxonomy is carried entirely in annotations.

pecan_sim_strat has the same three-element structure, but meta includes a stratum column whose values will be passed to stratum_col:

data("pecan_sim_strat")
names(pecan_sim_strat)
[1] "reads"       "annotations" "meta"       
# reads: 200 samples × 24 ASVs
pecan_sim_strat$reads[1:4, 1:6]
# A tibble: 4 × 6
  sample.id  asv1  asv2  asv3  asv4  asv5
  <chr>     <int> <int> <int> <int> <int>
1 sample.1    125   554    54   241   109
2 sample.2     49   197    22    74   982
3 sample.3     66   329    91   432    22
4 sample.4     43   258    56   258    35
# annotations: 6-genus taxonomy (SpA1, SpA2, … → GenusA through GenusF)
head(pecan_sim_strat$annotations)
# A tibble: 6 × 5
  asv.id taxon subdivision species genus 
  <chr>  <chr> <chr>       <chr>   <chr> 
1 asv1   SpA1  Simulated   SpA1    GenusA
2 asv2   SpA1  Simulated   SpA1    GenusA
3 asv3   SpA2  Simulated   SpA2    GenusA
4 asv4   SpA2  Simulated   SpA2    GenusA
5 asv5   SpB1  Simulated   SpB1    GenusB
6 asv6   SpB1  Simulated   SpB1    GenusB
# meta: includes the stratification column
pecan_sim_strat$meta
# A tibble: 200 × 2
   sample.id stratum 
   <chr>     <chr>   
 1 sample.1  stratum1
 2 sample.2  stratum1
 3 sample.3  stratum1
 4 sample.4  stratum1
 5 sample.5  stratum1
 6 sample.6  stratum1
 7 sample.7  stratum1
 8 sample.8  stratum1
 9 sample.9  stratum1
10 sample.10 stratum1
# ℹ 190 more rows

The stratum column in meta is the only structural difference from the base case. Everything else — the three-element list, the opaque ASV IDs, the annotations format — is identical.

pecan_strata()

pecan_strata() aggregates ASV counts to the requested taxonomy levels, optionally applies a prevalence filter, and (when stratum_col is supplied) splits samples into strata. The output is a pecan_strata object passed directly to pecan_fit().

st <- pecan_strata(
  reads_asv_tbl       = pecan_sim$reads,
  annotations_asv_tbl = pecan_sim$annotations,
  meta_tbl            = pecan_sim$meta,
  stratum_col         = NULL,         # NULL → single stratum named "all"
  levels              = c("species", "genus"),
  mode                = "constrained",
  min_prevalence      = 0.1
)
st
pecan_strata
  mode      : constrained (cascade: adjacent )
  levels    : genus > species 
  stratum   : <none> 

levels must be supplied in fine → coarse order. pecan_fit() uses the first element as the fine level (L) and the second as the coarse level (H). These names must correspond to columns in annotations_asv_tbl.

mode controls which ASVs are used to build the coarser level:

  • "constrained" (recommended): the genus-level count table is built only from ASVs that are annotated at both species and genus level. This ensures the two levels share a consistent set of reads, which is important for coupling to work as intended.
  • "independent": each level is aggregated from any ASV annotated at that level. The two levels may include different reads, which can weaken or mislead the coupling signal.

min_prevalence drops features present in fewer than that fraction of samples before stratification, applying a consistent filter across all strata. A value of 0.10–0.20 is typical; use lower values when community composition is highly variable.

Object structure

A pecan_strata object is a named list with class "pecan_strata". Its core slot is strata: a list keyed by level name (here "species" and "genus"), each containing two named lists of the same length — one entry per stratum:

pecan_strata
└── strata
    ├── species
    │   ├── strata_reads        named list of count tibbles (samples × features)
    │   └── strata_annotations  named list of annotation tibbles
    └── genus
        ├── strata_reads
        └── strata_annotations

Without a stratification variable there is one entry named "all" in each list. The count tibble has a leading sample.id column followed by one column per feature at that level:

# Count matrix at species level (80 samples × 12 species + sample.id)
dim(st$strata$species$strata_reads$all)
[1] 80 13
st$strata$species$strata_reads$all[1:3, 1:5]
# A tibble: 3 × 5
  sample.id SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
  <chr>         <int>     <int>     <int>     <int>
1 sample.1         43        84       335       328
2 sample.10       799       137       663       412
3 sample.11       163       106       483       590
# Count matrix at genus level (80 samples × 4 genera + sample.id)
dim(st$strata$genus$strata_reads$all)
[1] 80  5
st$strata$genus$strata_reads$all[1:3, ]
# A tibble: 3 × 5
  sample.id GenusA GenusB GenusC GenusD
  <chr>      <int>  <int>  <int>  <int>
1 sample.1     462   1008   1237   2293
2 sample.10   1599   1581   1138    682
3 sample.11    752    912   2037   1299

The annotation tibble maps each feature at a given level to its parent at the coarser level — this is what pecan_fit() uses to build the cross-level coupling term:

st$strata$species$strata_annotations$all
# A tibble: 12 × 2
   species   genus 
   <chr>     <chr> 
 1 SpeciesA1 GenusA
 2 SpeciesA2 GenusA
 3 SpeciesA3 GenusA
 4 SpeciesB1 GenusB
 5 SpeciesB2 GenusB
 6 SpeciesB3 GenusB
 7 SpeciesC1 GenusC
 8 SpeciesC2 GenusC
 9 SpeciesC3 GenusC
10 SpeciesD1 GenusD
11 SpeciesD2 GenusD
12 SpeciesD3 GenusD

Adding a stratification variable

Supplying stratum_col causes pecan_strata() to split the samples before aggregation. Each level’s strata_reads and strata_annotations lists then have one entry per stratum instead of the single "all" entry:

data("pecan_sim_strat")

st_strat <- pecan_strata(
  reads_asv_tbl       = pecan_sim_strat$reads,
  annotations_asv_tbl = pecan_sim_strat$annotations,
  meta_tbl            = pecan_sim_strat$meta,
  stratum_col         = "stratum",
  levels              = c("species", "genus"),
  mode                = "constrained",
  min_prevalence      = 0.1
)
st_strat
pecan_strata
  mode      : constrained (cascade: adjacent )
  levels    : genus > species 
  stratum   : stratum 
  strata    : stratum1, stratum2 

The print summary now lists the stratum names. The internal structure is identical to the single-stratum case — two strata keys replace the one "all" key:

# Keys present in strata_reads at each level
names(st_strat$strata$species$strata_reads)
[1] "stratum1" "stratum2"
# Dimensions: 100 samples × 12 species (+sample.id) per stratum
sapply(st_strat$strata$species$strata_reads, dim)
     stratum1 stratum2
[1,]      100      100
[2,]       13       13

pecan_fit() iterates over these stratum keys automatically — no further preparation is needed.

pecan_fit()

pecan_fit() dispatches over strata and levels. The method argument selects a solver; the control list passes all solver and stability-selection parameters.

list_solvers()
[1] "glasso"  "hglasso"

Solvers

"glasso" fits one graphical lasso per level independently via huge::huge(). It has no coupling across levels. Use it as a baseline or when you want uncoupled per-level networks.

"hglasso" fits both levels jointly using ADMM with a quadratic coupling penalty. Coupling makes the two problems aware of each other: genus-level evidence can promote or suppress species-level edges, and vice versa. This is the primary solver for coupled inference.

Lambda path parameters

Both solvers build a log-spaced regularisation path from λ_max down to lambda_min_ratio × λ_max. These parameters apply to both solvers:

Parameter Default Role
nlambda 30 Number of path steps
lambda_min_ratio 0.05 Lower end of path relative to λ_max
keep_frac 0.75 Fraction of path used (trims the densest end)
min_lams_keep 10 Minimum steps regardless of keep_frac

Heuristics. Start with lambda_min_ratio = 0.080.12. If the fitted network is too dense, increase it; if too sparse, decrease it. nlambda = 2030 is sufficient for most datasets; increase only if selection probabilities look jagged.

Lambda selection (single-path fits)

When stability = FALSE, pecan_fit() traverses the full regularisation path and must choose one lambda to report as the primary result. By default it uses extended BIC (eBIC; Foygel & Drton 2010), which balances the Gaussian negative log-likelihood against an edge-count penalty calibrated to both sample size and number of features:

\[\text{eBIC}_\gamma = n\bigl[\text{tr}(S\,\hat\Omega) - \log\det\hat\Omega\bigr] + |E|\bigl[\log n + 4\gamma\log p\bigr]\]

eBIC is summed across all levels before minimisation, so jointly-fit solvers (hglasso) pick a single consistent step and are not left with level-specific choices that may be contradictory. The full path arrays are always retained in results alongside the selected solution, so any step can be inspected or used downstream.

Control parameter Default Role
select "ebic" Selection method: "ebic" or "sparsest"
ebic_gamma 0.5 eBIC sparsity preference ∈ [0, 1]; 0 = BIC, higher favours sparser models

ebic_gamma = 0.5 is a widely-used default. Increase it toward 1 for smaller n/p ratios where false-positive edges are more likely; decrease toward 0 to allow denser networks when data are plentiful. Use select = "sparsest" to recover the old behaviour (pick the most regularised step regardless of data fit), or as a quick sanity check: if eBIC and sparsest disagree substantially it is a signal that the path range needs widening.

Glasso fit

stability = FALSE traverses the full regularisation path and returns the eBIC-selected network alongside all path arrays. With method = "glasso" the two levels are fit independently via huge::huge():

fit_glasso <- pecan_fit(
  st,
  method    = "glasso",
  stability = FALSE,
  control   = list(
    nlambda          = 20L,
    lambda_min_ratio = 0.10,
    keep_frac        = 0.80
  )
)
fit_glasso
pecan_fit
  method    : glasso (single fit) 
  mode      : constrained 
  stratum   : <none> 
  levels    : species, genus 
  precision : yes (prec_mx) 
  selection : eBIC 
  path      : 16 steps (adj_arr / prec_arr in results)
  results   : 2 rows (level x stratum)
# A tibble: 2 × 4
  level   stratum n_edges lambda_opt
  <chr>   <chr>     <int>      <dbl>
1 species all          11      0.217
2 genus   all           4      0.217

Stability selection is also available with method = "glasso" by setting stability = TRUE; see the Stability-selection fit section for details.

Output structure. Every pecan_fit object is a named list with the following top-level slots:

Slot Contents
method Solver used ("glasso" or "hglasso")
stability Whether stability selection was run
mode "constrained" or "independent"
stratum_var Name of the stratification column (NULL if none)
levels Level names in fine → coarse order
results Tibble — one row per level × stratum
details List — per-stratum diagnostics

The results tibble (single-path fits). The primary output. For stability = FALSE it has eleven columns: summary scalars plus path arrays as list columns.

Column Type Contents
level chr Level name
stratum chr Stratum name
nodes list Annotation tibble for this level × stratum
adj_mx list p × p logical adjacency at the eBIC-selected lambda
prec_mx list p × p precision matrix at the selected lambda
n_edges int Edge count at the selected step
adj_arr list p × p × L logical array — adjacency at every path step
prec_arr list p × p × L numeric array — precision at every path step
lambda list Length-L numeric vector of lambda values
lambda_opt dbl The selected lambda value (eBIC minimum or last step)
ebic_scores list Length-L per-level eBIC scores (NULL if eBIC unavailable)
# Primary summary: selected solution per level
fit_glasso$results[, c("level", "stratum", "n_edges", "lambda_opt")]
# A tibble: 2 × 4
  level   stratum n_edges lambda_opt
  <chr>   <chr>     <int>      <dbl>
1 species all          11      0.217
2 genus   all           4      0.217

The path arrays let you inspect any step without re-fitting. For example, to see how the genus-level network changes across the full regularisation path:

r <- fit_glasso$results
# adj_arr is a p × p × L array (step index in the 3rd dimension)
cat("adj_arr dims (genus):", dim(r$adj_arr[[2]]), "\n")
adj_arr dims (genus): 4 4 16 
# Edge count at each step
apply(r$adj_arr[[2]], 3, function(a) { diag(a) <- FALSE; sum(a) / 2 })
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 
 2  3  4  4  4  4  4  4  5  5  6  6  6  6  6  6 

The eBIC scores show the goodness-of-fit tradeoff at each step:

r <- fit_glasso$results
data.frame(
  step       = seq_along(r$lambda[[2]]),
  lambda     = round(r$lambda[[2]], 4),
  edges      = apply(r$adj_arr[[2]], 3, function(a) { diag(a)<-FALSE; sum(a)/2 }),
  ebic_genus = round(r$ebic_scores[[2]], 1)
)
   step lambda edges ebic_genus
1     1 0.4497     2      341.5
2     2 0.3984     3      338.6
3     3 0.3529     4      335.6
4     4 0.3126     4      325.4
5     5 0.2769     4      316.2
6     6 0.2453     4      307.9
7     7 0.2173     4      300.5
8     8 0.1925     4      294.0
9     9 0.1705     5      295.2
10   10 0.1511     5      288.0
11   11 0.1338     6      285.6
12   12 0.1186     6      275.9
13   13 0.1050     6      266.2
14   14 0.0930     6      256.5
15   15 0.0824     6      246.8
16   16 0.0730     6      237.1

The eBIC minimum (marked by lambda_opt) balances fit improvement against the log-edge penalty. Steps to the left are over-regularised; steps to the right overfit.

The details list (single-path fits). details is keyed by stratum and contains three summary fields per stratum (the full path arrays live in results):

Field Description
l_opt Index of the selected path step
lambda_opt Lambda value at the selected step
combined_ebic Length-L combined eBIC (summed across levels); NULL if eBIC was unavailable
d <- fit_glasso$details$all
cat("Selected step:", d$l_opt, "\n")
Selected step: 7 
cat("Selected lambda:", round(d$lambda_opt, 4), "\n")
Selected lambda: 0.2173 
cat("Combined eBIC at selected step:",
    round(d$combined_ebic[d$l_opt], 1), "\n")
Combined eBIC at selected step: 1334.2 

Coupling parameters ("hglasso" only)

Parameter Default Role
rho_couple 1.0 Total coupling strength
alpha 0.5 Directional balance (0 = species→genus only, 1 = genus→species only)
coupling_mode "signal_only" How coupling is applied
rho_suppress NULL Suppression multiplier in signal_and_suppress mode

rho_couple sets how strongly the two levels attract each other. A value of 0 recovers uncoupled glasso at both levels. A value of 1–2 is a reasonable starting range; above 3 the coupling tends to dominate the data.

alpha = 0.5 gives symmetric coupling. Set alpha closer to 1 to have genus-level edges guide species-level selection more strongly; closer to 0 for the reverse. Symmetric is appropriate when you have no reason to prefer one direction.

coupling_mode selects the coupling mechanism:

  • "signal_only": a uniform penalty reduction is applied to all species pairs that have cross-level support. Edges without support are treated symmetrically (no extra penalty). This is the safer starting point.
  • "signal_and_suppress": a smooth per-edge weight h ∈ [0, 1] simultaneously boosts edges with cross-level support (h → 1: threshold reduced) and penalises edges without it (h → 0: threshold inflated). The neutral point (uncoupled baseline) always sits at h = 0.5. Use this when the genus-level network is dense enough to provide reliable suppression signal; it produces sparser, more interpretable species-level networks than signal_only when n/p is low.

rho_suppress (only active in signal_and_suppress mode) multiplies the natural suppression scale. NULL (default) uses natural scaling. Values > 1 amplify suppression; values < 1 soften it.

ADMM convergence parameters

These rarely need changing from defaults. rho_admm_L applies to the fine level (species); rho_admm_H to the coarse level (genus).

Parameter Default Role
rho_admm_L / rho_admm_H 1.0 ADMM step size per level
max_outer_iter 10 Outer alternating minimization steps
tol_outer 1e-3 Outer convergence tolerance
max_iter 2000 Max inner ADMM iterations per outer step
tol_abs / tol_rel 1e-4 Inner ADMM convergence tolerances

Heuristics. For exploratory single-path fits, reduce max_outer_iter to 5 and max_iter to 500 for speed. For final fits, use the defaults. If the solver is not converging (unusual), try increasing rho_admm_L / rho_admm_H toward 2–3.

Coupled fit

With method = "hglasso" the two levels are fit jointly. Setting stability = FALSE traverses the full path with the coupling penalty active and selects the eBIC-optimal lambda:

ctl_single <- list(
  nlambda          = 20L,
  lambda_min_ratio = 0.10,
  keep_frac        = 0.80,
  rho_couple       = 1.5,
  alpha            = 0.5,
  coupling_mode    = "signal_and_suppress",
  max_outer_iter   = 5L,
  max_iter         = 500L
)

fit_single <- pecan_fit(
  st,
  method    = "hglasso",
  stability = FALSE,
  control   = ctl_single
)
fit_single
pecan_fit
  method    : hglasso (single fit) 
  mode      : constrained 
  stratum   : <none> 
  levels    : species, genus 
  precision : yes (prec_mx) 
  selection : eBIC 
  path      : 16 steps (adj_arr / prec_arr in results)
  results   : 2 rows (level x stratum)
# A tibble: 2 × 4
  level   stratum n_edges lambda_opt
  <chr>   <chr>     <int>      <dbl>
1 species all           0      0.277
2 genus   all           0      0.122

Both prec_arr and prec_mx are populated because hglasso’s ADMM solver returns Θ at every step. The path arrays are in results, structured identically to the glasso case:

r <- fit_single$results
cat("adj_arr dims (species):", dim(r$adj_arr[[1]]), "\n")    # p × p × L
adj_arr dims (species): 12 12 16 
cat("prec_arr dims (species):", dim(r$prec_arr[[1]]), "\n")  # p × p × L
prec_arr dims (species): 12 12 16 
# Precision matrix at the eBIC-selected step
round(r$prec_mx[[1]][1:4, 1:4], 3)
          SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
SpeciesA1     1.023     0.000      0.00     0.000
SpeciesA2     0.000     0.919      0.00     0.000
SpeciesA3     0.000     0.000      1.25     0.000
SpeciesB1     0.000     0.000      0.00     1.139

The combined eBIC path (summed across both levels) that drove the selection is in details$all$combined_ebic; plot it against the lambda path to inspect the tradeoff curve.

Note

For hglasso, eBIC is evaluated on the correlation scale: each precision matrix is rescaled as \(\tilde\Omega_{ij} = \sigma_i\sigma_j\Omega_{ij}\) (where \(\sigma_i\) are feature standard deviations from the CLR data) so that all levels are on a common, scale-invariant basis. Because the ADMM coupling term perturbs the optimality conditions away from the standard graphical lasso, the eBIC here is an approximation. It reliably identifies models that are clearly over- or under-regularised, but for precise lambda calibration stability selection (stability = TRUE) remains the principled choice.

Stability selection parameters

When stability = TRUE, stability_select() wraps the solver with Meinshausen-Bühlmann subsampling. Only edges whose selection probability exceeds a calibrated threshold π are retained.

Parameter Default Role
B 50 Number of subsamples
sub_frac 0.80 Fraction of samples per subsample
min_gap 2 Ensures m ≤ n − min_gap
target_ev 15 Target expected false positives
target_fdr NULL Alternative: target FDR bound (use one or the other)
seed 1 RNG seed
parallel TRUE Use furrr with the active future plan

Heuristics. B = 50 is standard for production; use B = 25 during development. target_ev = 1020 balances sensitivity and specificity; lower values give fewer, more reliable edges. sub_frac = 0.8 works well for n ≥ 40; reduce slightly for smaller n. Always set a seed for reproducibility.

Stability-selection fit

stability = TRUE wraps the solver with subsampling. Edges are selected based on selection probability, not a single fit’s adjacency matrix. Set a future plan before calling if parallel = TRUE.

plan(multisession(workers = 4L))

ctl_stab <- list(
  B                = 25L,          # increase to 50 for production
  target_ev        = 15,
  seed             = 7492L,
  parallel         = TRUE,
  nlambda          = 30L,
  lambda_min_ratio = 0.10,
  keep_frac        = 0.75,
  rho_couple       = 1.5,
  alpha            = 0.5,
  coupling_mode    = "signal_and_suppress",
  max_outer_iter   = 8L,
  max_iter         = 1000L
)

fit_stab <- pecan_fit(
  st,
  method    = "hglasso",
  stability = TRUE,
  control   = ctl_stab
)
fit_stab
pecan_fit
  method    : hglasso (+ stability selection) 
  mode      : constrained 
  stratum   : <none> 
  levels    : species, genus 
  precision : available (prec_mx column) 
  results   : 2 rows (level x stratum)
# A tibble: 2 × 3
  level   stratum n_edges
  <chr>   <chr>     <int>
1 species all          25
2 genus   all           4

The details slot contains the full stability output, keyed by stratum and then by level. The most useful per-level diagnostics are:

  • pi_thr: the calibrated selection probability threshold. An edge is retained if its maximum selection probability across the lambda path meets or exceeds this value. Higher thresholds mean stricter selection.
  • ev_bound: the expected number of falsely selected edges at pi_thr, derived from the MB bound. This is the quantity controlled by target_ev.
  • fdr_bound: the implied FDR bound at pi_thr, i.e. ev_bound / max(R, 1). Useful for comparing networks of different sizes.
  • R: the number of selected edges at pi_thr.
  • subsample_m: actual subsample size used (≈ n × sub_frac, subject to min_gap).
s <- fit_stab$details$all
cat("--- species ---\n")
--- species ---
cat("Threshold (π):             ", s$species$pi_thr, "\n")
Threshold (π):              0.95 
cat("Selected edges (R):        ", s$species$R, "\n")
Selected edges (R):         25 
cat("Expected false positives:  ", round(s$species$ev_bound, 1), "\n")
Expected false positives:   29 
cat("FDR bound:                 ", round(s$species$fdr_bound, 3), "\n")
FDR bound:                  1.159 
cat("Subsample size (m):        ", s$species$subsample_m, "\n")
Subsample size (m):         64 
cat("\n--- genus ---\n")

--- genus ---
cat("Threshold (π):             ", s$genus$pi_thr, "\n")
Threshold (π):              0.59 
cat("Selected edges (R):        ", s$genus$R, "\n")
Selected edges (R):         4 
cat("Expected false positives:  ", round(s$genus$ev_bound, 1), "\n")
Expected false positives:   14.5 
cat("FDR bound:                 ", round(s$genus$fdr_bound, 3), "\n")
FDR bound:                  3.63 

Interpreting these numbers. The species threshold π = 0.95 means an edge must have appeared in at least 95% of the 25 subsamples to be retained — a very conservative bar. At this threshold, 25 out of 66 possible species pairs are selected, and the MB bound guarantees at most 29 of those are expected to be false positives (FDR bound ≤ 1.16). With only 25 subsamples the bound is loose; the calibration pushed π to the top of the search grid because target_ev = 15 was not achievable at this sample size — a common outcome with small B and modest n/p. Increasing to B = 50 tightens the bound substantially.

The genus level is better calibrated: π = 0.59 and ev_bound = 14.5 (just at the target of 15), with 4 edges selected from 6 possible pairs. The FDR bound 3.63 exceeds 1 — technically valid but uninformative — because R is very small. When R is small, even one false positive implies a high FDR; this is a known limitation of the MB framework for dense regularisation at coarse levels.

Rule of thumb: a π ≥ 0.75 with ev_bound ≤ target_ev and FDR bound ≤ 0.20 is a comfortable regime. Values outside that range, as here, indicate either more subsamples or a stronger regularisation path are needed.

The results tibble (stability fits). When stability = TRUE, the results tibble has the same six core columns as always (level, stratum, nodes, adj_mx, prec_mx, n_edges) but does not contain path arrays — those columns (adj_arr, prec_arr, lambda, lambda_opt, ebic_scores) are only present for single-path fits. The selection probability history and calibration diagnostics live in details.

The details list (stability fits). Keyed by stratum, then level. The path arrays from single-path fits (adj_arr, prec_arr, lambda) are replaced by calibration outputs and the full selection probability history:

Field Description
adj_mx Logical adjacency matrix (same as in results)
p_hat_max p × p matrix of max selection probability across the lambda path
p_hat_array p × p × L array of selection probabilities at each lambda step
pi_thr Calibrated threshold
R Number of selected edges
ev_bound Expected false positives at pi_thr (MB bound)
fdr_bound ev_bound / max(R, 1)
prec_mx Precision matrix refitted at the optimal lambda
lambda Lambda path vector (length L)
B, sub_frac, subsample_m Subsampling parameters used

p_hat_max is the most useful field for downstream work — it shows how close every edge came to being selected, regardless of whether it crossed the threshold:

round(fit_stab$details$all$species$p_hat_max, 2)[1:5, 1:5]
          SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1 SpeciesB2
SpeciesA1      0.00      0.72      1.00      0.20      0.04
SpeciesA2      0.72      0.00      1.00      0.08      0.04
SpeciesA3      1.00      1.00      0.00      0.00      0.08
SpeciesB1      0.20      0.08      0.00      0.00      0.80
SpeciesB2      0.04      0.04      0.08      0.80      0.00

prec_mx is the precision matrix Θ refitted on the full dataset at the optimal lambda. For hglasso stability fits it is also available via results$prec_mx:

round(fit_stab$results$prec_mx[[1]][1:4, 1:4], 3)
          SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
SpeciesA1     1.467     0.000     0.200     0.000
SpeciesA2     0.000     1.296     0.194     0.000
SpeciesA3     0.200     0.194     1.414     0.000
SpeciesB1     0.000     0.000     0.000     1.228

Stratified fit

Passing a stratified pecan_strata object to pecan_fit() requires no additional arguments — the function iterates over strata automatically and applies the same control list to each:

fit_strat <- pecan_fit(
  st_strat,
  method    = "hglasso",
  stability = TRUE,
  control   = ctl_stab
)
fit_strat
pecan_fit
  method    : hglasso (+ stability selection) 
  mode      : constrained 
  stratum   : stratum 
  levels    : species, genus 
  precision : available (prec_mx column) 
  results   : 4 rows (level x stratum)
# A tibble: 4 × 3
  level   stratum  n_edges
  <chr>   <chr>      <int>
1 species stratum1      34
2 genus   stratum1      10
3 species stratum2      21
4 genus   stratum2      10

The results tibble now has one row per level × stratum (four rows instead of two), and stratum_var records which metadata column drove the split:

fit_strat$results[, c("level", "stratum", "n_edges")]
# A tibble: 4 × 3
  level   stratum  n_edges
  <chr>   <chr>      <int>
1 species stratum1      34
2 genus   stratum1      10
3 species stratum2      21
4 genus   stratum2      10
cat("stratum_var:", fit_strat$stratum_var, "\n")
stratum_var: stratum 

The details list gains one key per stratum, each containing the same per-level sub-list as the single-stratum case:

# Top-level keys: one per stratum
names(fit_strat$details)
[1] "stratum1" "stratum2"
# Each stratum has the same per-level structure
names(fit_strat$details$stratum1)
[1] "species" "genus"  

pecan_plot()

pecan_plot() produces a combined heatmap + network layout. The layout adapts automatically to whether the fit has one stratum or many.

Key arguments:

Argument Default Effect
edge_color "steelblue" Colour of inferred edges
edge_alpha 0.8 Transparency of edges
node_size 1.5 Node point size
drop_isolates TRUE Hide unconnected nodes
show_implied FALSE Draw implied coarse-level edges
implied_color / implied_lty "grey50" / "dashed" Style of implied edges
heatmap TRUE Show heatmap (single-stratum) or faceted heatmap (multi-stratum)
heatmap_ncol 1 Columns in multi-stratum heatmap facet

Single-stratum

For a single-stratum fit, the default layout places the coarse-level heatmap on the left and the two network panels stacked on the right. The heatmap fill encodes the number of cross-group fine-level edges supporting each coarse pair; a solid border marks a direct coarse-level edge. Dashed borders (show_implied = TRUE) mark pairs linked by fine-level edges that did not themselves reach the selection threshold at the coarse level.

pecan_plot(
  fit_stab,
  edge_color    = "#0072B2",
  show_implied  = TRUE,
  implied_lty   = "dashed",
  implied_alpha = 0.4,
  implied_color = "#D55E00"
)
Figure 1

Multi-stratum

For multi-stratum fits, heatmap = TRUE (default) returns a faceted heatmap — one panel per stratum — so genus-level patterns can be compared across conditions on a common axis. Pass heatmap = FALSE for the classic level × stratum network grid.

pecan_plot(
  fit_strat,
  edge_color    = "#0072B2",
  show_implied  = TRUE,
  implied_color = "#D55E00",
  heatmap_ncol  = 2L
)
Figure 2
pecan_plot(
  fit_strat,
  heatmap       = FALSE,
  edge_color    = "#0072B2",
  show_implied  = TRUE,
  implied_color = "#D55E00"
)
Figure 3

The two strata have fully orthogonal genus-level topologies: "stratum1" has three active genus pairs — A–D cooperating, B–E competing, and C–F cooperating; "stratum2" reassigns every genus to a different partner — A–F cooperating, B–C competing, and D–E cooperating. The species-level networks mirror this: the cross-genus species edges are in completely different positions, and the within-genus correlated pairs differ too (SpA1–SpA2 and SpD1–SpD2 are correlated in stratum1; SpA1–SpA2 and SpF1–SpF2 in stratum2). The heatmap facets should show dark tiles in non-overlapping positions across the two panels.