devtools::load_all("..") # load development version of pecan
library(future)Getting started with pecan
This vignette was drafted with AI assistance and has not yet been comprehensively reviewed.
This vignette walks through the three main functions — pecan_strata(), pecan_fit(), and pecan_plot() — using the bundled synthetic datasets. It assumes familiarity with graphical lasso and the hglasso coupling model.
Example data
Two datasets are included:
pecan_sim: 80 samples, 4 genera, 12 species, 28 ASVs. Single stratum.pecan_sim_strat: 200 samples, 6 genera, 12 species, 24 ASVs, split into"stratum1"and"stratum2"(100 each) with orthogonal network topologies. This dataset uses a 6-genus taxonomy — with only 4 genera the CLR covariance matrix is always rank-deficient and the graphical lasso cannot resolve genus-level edges cleanly.
Each dataset is a named list with three elements that map directly to the inputs of pecan_strata():
data("pecan_sim")
names(pecan_sim)[1] "reads" "annotations" "meta"
# reads: samples × ASVs; ASV columns are opaque IDs (asv1, asv2, ...)
pecan_sim$reads[1:4, 1:6]# A tibble: 4 × 6
sample.id asv1 asv2 asv3 asv4 asv5
<chr> <int> <int> <int> <int> <int>
1 sample.1 2 9 32 78 6
2 sample.2 34 240 1100 461 41
3 sample.3 8 100 427 560 33
4 sample.4 2 27 85 480 29
# annotations: maps each ASV ID to its species and genus
head(pecan_sim$annotations)# A tibble: 6 × 5
asv.id taxon subdivision species genus
<chr> <chr> <chr> <chr> <chr>
1 asv1 SpeciesA1 Simulated SpeciesA1 GenusA
2 asv2 SpeciesA1 Simulated SpeciesA1 GenusA
3 asv3 SpeciesA1 Simulated SpeciesA1 GenusA
4 asv4 SpeciesA2 Simulated SpeciesA2 GenusA
5 asv5 SpeciesA2 Simulated SpeciesA2 GenusA
6 asv6 SpeciesA3 Simulated SpeciesA3 GenusA
# meta: one row per sample; no stratification variable here
pecan_sim$meta# A tibble: 80 × 1
sample.id
<chr>
1 sample.1
2 sample.2
3 sample.3
4 sample.4
5 sample.5
6 sample.6
7 sample.7
8 sample.8
9 sample.9
10 sample.10
# ℹ 70 more rows
ASV column names are intentionally opaque (asv1, asv2, …), as in real sequencing data. The taxonomy is carried entirely in annotations.
pecan_sim_strat has the same three-element structure, but meta includes a stratum column whose values will be passed to stratum_col:
data("pecan_sim_strat")
names(pecan_sim_strat)[1] "reads" "annotations" "meta"
# reads: 200 samples × 24 ASVs
pecan_sim_strat$reads[1:4, 1:6]# A tibble: 4 × 6
sample.id asv1 asv2 asv3 asv4 asv5
<chr> <int> <int> <int> <int> <int>
1 sample.1 125 554 54 241 109
2 sample.2 49 197 22 74 982
3 sample.3 66 329 91 432 22
4 sample.4 43 258 56 258 35
# annotations: 6-genus taxonomy (SpA1, SpA2, … → GenusA through GenusF)
head(pecan_sim_strat$annotations)# A tibble: 6 × 5
asv.id taxon subdivision species genus
<chr> <chr> <chr> <chr> <chr>
1 asv1 SpA1 Simulated SpA1 GenusA
2 asv2 SpA1 Simulated SpA1 GenusA
3 asv3 SpA2 Simulated SpA2 GenusA
4 asv4 SpA2 Simulated SpA2 GenusA
5 asv5 SpB1 Simulated SpB1 GenusB
6 asv6 SpB1 Simulated SpB1 GenusB
# meta: includes the stratification column
pecan_sim_strat$meta# A tibble: 200 × 2
sample.id stratum
<chr> <chr>
1 sample.1 stratum1
2 sample.2 stratum1
3 sample.3 stratum1
4 sample.4 stratum1
5 sample.5 stratum1
6 sample.6 stratum1
7 sample.7 stratum1
8 sample.8 stratum1
9 sample.9 stratum1
10 sample.10 stratum1
# ℹ 190 more rows
The stratum column in meta is the only structural difference from the base case. Everything else — the three-element list, the opaque ASV IDs, the annotations format — is identical.
pecan_strata()
pecan_strata() aggregates ASV counts to the requested taxonomy levels, optionally applies a prevalence filter, and (when stratum_col is supplied) splits samples into strata. The output is a pecan_strata object passed directly to pecan_fit().
st <- pecan_strata(
reads_asv_tbl = pecan_sim$reads,
annotations_asv_tbl = pecan_sim$annotations,
meta_tbl = pecan_sim$meta,
stratum_col = NULL, # NULL → single stratum named "all"
levels = c("species", "genus"),
mode = "constrained",
min_prevalence = 0.1
)
stpecan_strata
mode : constrained (cascade: adjacent )
levels : genus > species
stratum : <none>
levels must be supplied in fine → coarse order. pecan_fit() uses the first element as the fine level (L) and the second as the coarse level (H). These names must correspond to columns in annotations_asv_tbl.
mode controls which ASVs are used to build the coarser level:
"constrained"(recommended): the genus-level count table is built only from ASVs that are annotated at both species and genus level. This ensures the two levels share a consistent set of reads, which is important for coupling to work as intended."independent": each level is aggregated from any ASV annotated at that level. The two levels may include different reads, which can weaken or mislead the coupling signal.
min_prevalence drops features present in fewer than that fraction of samples before stratification, applying a consistent filter across all strata. A value of 0.10–0.20 is typical; use lower values when community composition is highly variable.
Object structure
A pecan_strata object is a named list with class "pecan_strata". Its core slot is strata: a list keyed by level name (here "species" and "genus"), each containing two named lists of the same length — one entry per stratum:
pecan_strata
└── strata
├── species
│ ├── strata_reads named list of count tibbles (samples × features)
│ └── strata_annotations named list of annotation tibbles
└── genus
├── strata_reads
└── strata_annotations
Without a stratification variable there is one entry named "all" in each list. The count tibble has a leading sample.id column followed by one column per feature at that level:
# Count matrix at species level (80 samples × 12 species + sample.id)
dim(st$strata$species$strata_reads$all)[1] 80 13
st$strata$species$strata_reads$all[1:3, 1:5]# A tibble: 3 × 5
sample.id SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
<chr> <int> <int> <int> <int>
1 sample.1 43 84 335 328
2 sample.10 799 137 663 412
3 sample.11 163 106 483 590
# Count matrix at genus level (80 samples × 4 genera + sample.id)
dim(st$strata$genus$strata_reads$all)[1] 80 5
st$strata$genus$strata_reads$all[1:3, ]# A tibble: 3 × 5
sample.id GenusA GenusB GenusC GenusD
<chr> <int> <int> <int> <int>
1 sample.1 462 1008 1237 2293
2 sample.10 1599 1581 1138 682
3 sample.11 752 912 2037 1299
The annotation tibble maps each feature at a given level to its parent at the coarser level — this is what pecan_fit() uses to build the cross-level coupling term:
st$strata$species$strata_annotations$all# A tibble: 12 × 2
species genus
<chr> <chr>
1 SpeciesA1 GenusA
2 SpeciesA2 GenusA
3 SpeciesA3 GenusA
4 SpeciesB1 GenusB
5 SpeciesB2 GenusB
6 SpeciesB3 GenusB
7 SpeciesC1 GenusC
8 SpeciesC2 GenusC
9 SpeciesC3 GenusC
10 SpeciesD1 GenusD
11 SpeciesD2 GenusD
12 SpeciesD3 GenusD
Adding a stratification variable
Supplying stratum_col causes pecan_strata() to split the samples before aggregation. Each level’s strata_reads and strata_annotations lists then have one entry per stratum instead of the single "all" entry:
data("pecan_sim_strat")
st_strat <- pecan_strata(
reads_asv_tbl = pecan_sim_strat$reads,
annotations_asv_tbl = pecan_sim_strat$annotations,
meta_tbl = pecan_sim_strat$meta,
stratum_col = "stratum",
levels = c("species", "genus"),
mode = "constrained",
min_prevalence = 0.1
)
st_stratpecan_strata
mode : constrained (cascade: adjacent )
levels : genus > species
stratum : stratum
strata : stratum1, stratum2
The print summary now lists the stratum names. The internal structure is identical to the single-stratum case — two strata keys replace the one "all" key:
# Keys present in strata_reads at each level
names(st_strat$strata$species$strata_reads)[1] "stratum1" "stratum2"
# Dimensions: 100 samples × 12 species (+sample.id) per stratum
sapply(st_strat$strata$species$strata_reads, dim) stratum1 stratum2
[1,] 100 100
[2,] 13 13
pecan_fit() iterates over these stratum keys automatically — no further preparation is needed.
pecan_fit()
pecan_fit() dispatches over strata and levels. The method argument selects a solver; the control list passes all solver and stability-selection parameters.
list_solvers()[1] "glasso" "hglasso"
Solvers
"glasso" fits one graphical lasso per level independently via huge::huge(). It has no coupling across levels. Use it as a baseline or when you want uncoupled per-level networks.
"hglasso" fits both levels jointly using ADMM with a quadratic coupling penalty. Coupling makes the two problems aware of each other: genus-level evidence can promote or suppress species-level edges, and vice versa. This is the primary solver for coupled inference.
Lambda path parameters
Both solvers build a log-spaced regularisation path from λ_max down to lambda_min_ratio × λ_max. These parameters apply to both solvers:
| Parameter | Default | Role |
|---|---|---|
nlambda |
30 | Number of path steps |
lambda_min_ratio |
0.05 | Lower end of path relative to λ_max |
keep_frac |
0.75 | Fraction of path used (trims the densest end) |
min_lams_keep |
10 | Minimum steps regardless of keep_frac |
Heuristics. Start with lambda_min_ratio = 0.08–0.12. If the fitted network is too dense, increase it; if too sparse, decrease it. nlambda = 20–30 is sufficient for most datasets; increase only if selection probabilities look jagged.
Lambda selection (single-path fits)
When stability = FALSE, pecan_fit() traverses the full regularisation path and must choose one lambda to report as the primary result. By default it uses extended BIC (eBIC; Foygel & Drton 2010), which balances the Gaussian negative log-likelihood against an edge-count penalty calibrated to both sample size and number of features:
\[\text{eBIC}_\gamma = n\bigl[\text{tr}(S\,\hat\Omega) - \log\det\hat\Omega\bigr] + |E|\bigl[\log n + 4\gamma\log p\bigr]\]
eBIC is summed across all levels before minimisation, so jointly-fit solvers (hglasso) pick a single consistent step and are not left with level-specific choices that may be contradictory. The full path arrays are always retained in results alongside the selected solution, so any step can be inspected or used downstream.
| Control parameter | Default | Role |
|---|---|---|
select |
"ebic" |
Selection method: "ebic" or "sparsest" |
ebic_gamma |
0.5 | eBIC sparsity preference ∈ [0, 1]; 0 = BIC, higher favours sparser models |
ebic_gamma = 0.5 is a widely-used default. Increase it toward 1 for smaller n/p ratios where false-positive edges are more likely; decrease toward 0 to allow denser networks when data are plentiful. Use select = "sparsest" to recover the old behaviour (pick the most regularised step regardless of data fit), or as a quick sanity check: if eBIC and sparsest disagree substantially it is a signal that the path range needs widening.
Glasso fit
stability = FALSE traverses the full regularisation path and returns the eBIC-selected network alongside all path arrays. With method = "glasso" the two levels are fit independently via huge::huge():
fit_glasso <- pecan_fit(
st,
method = "glasso",
stability = FALSE,
control = list(
nlambda = 20L,
lambda_min_ratio = 0.10,
keep_frac = 0.80
)
)
fit_glassopecan_fit
method : glasso (single fit)
mode : constrained
stratum : <none>
levels : species, genus
precision : yes (prec_mx)
selection : eBIC
path : 16 steps (adj_arr / prec_arr in results)
results : 2 rows (level x stratum)
# A tibble: 2 × 4
level stratum n_edges lambda_opt
<chr> <chr> <int> <dbl>
1 species all 11 0.217
2 genus all 4 0.217
Stability selection is also available with method = "glasso" by setting stability = TRUE; see the Stability-selection fit section for details.
Output structure. Every pecan_fit object is a named list with the following top-level slots:
| Slot | Contents |
|---|---|
method |
Solver used ("glasso" or "hglasso") |
stability |
Whether stability selection was run |
mode |
"constrained" or "independent" |
stratum_var |
Name of the stratification column (NULL if none) |
levels |
Level names in fine → coarse order |
results |
Tibble — one row per level × stratum |
details |
List — per-stratum diagnostics |
The results tibble (single-path fits). The primary output. For stability = FALSE it has eleven columns: summary scalars plus path arrays as list columns.
| Column | Type | Contents |
|---|---|---|
level |
chr | Level name |
stratum |
chr | Stratum name |
nodes |
list | Annotation tibble for this level × stratum |
adj_mx |
list | p × p logical adjacency at the eBIC-selected lambda |
prec_mx |
list | p × p precision matrix at the selected lambda |
n_edges |
int | Edge count at the selected step |
adj_arr |
list | p × p × L logical array — adjacency at every path step |
prec_arr |
list | p × p × L numeric array — precision at every path step |
lambda |
list | Length-L numeric vector of lambda values |
lambda_opt |
dbl | The selected lambda value (eBIC minimum or last step) |
ebic_scores |
list | Length-L per-level eBIC scores (NULL if eBIC unavailable) |
# Primary summary: selected solution per level
fit_glasso$results[, c("level", "stratum", "n_edges", "lambda_opt")]# A tibble: 2 × 4
level stratum n_edges lambda_opt
<chr> <chr> <int> <dbl>
1 species all 11 0.217
2 genus all 4 0.217
The path arrays let you inspect any step without re-fitting. For example, to see how the genus-level network changes across the full regularisation path:
r <- fit_glasso$results
# adj_arr is a p × p × L array (step index in the 3rd dimension)
cat("adj_arr dims (genus):", dim(r$adj_arr[[2]]), "\n")adj_arr dims (genus): 4 4 16
# Edge count at each step
apply(r$adj_arr[[2]], 3, function(a) { diag(a) <- FALSE; sum(a) / 2 }) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
2 3 4 4 4 4 4 4 5 5 6 6 6 6 6 6
The eBIC scores show the goodness-of-fit tradeoff at each step:
r <- fit_glasso$results
data.frame(
step = seq_along(r$lambda[[2]]),
lambda = round(r$lambda[[2]], 4),
edges = apply(r$adj_arr[[2]], 3, function(a) { diag(a)<-FALSE; sum(a)/2 }),
ebic_genus = round(r$ebic_scores[[2]], 1)
) step lambda edges ebic_genus
1 1 0.4497 2 341.5
2 2 0.3984 3 338.6
3 3 0.3529 4 335.6
4 4 0.3126 4 325.4
5 5 0.2769 4 316.2
6 6 0.2453 4 307.9
7 7 0.2173 4 300.5
8 8 0.1925 4 294.0
9 9 0.1705 5 295.2
10 10 0.1511 5 288.0
11 11 0.1338 6 285.6
12 12 0.1186 6 275.9
13 13 0.1050 6 266.2
14 14 0.0930 6 256.5
15 15 0.0824 6 246.8
16 16 0.0730 6 237.1
The eBIC minimum (marked by lambda_opt) balances fit improvement against the log-edge penalty. Steps to the left are over-regularised; steps to the right overfit.
The details list (single-path fits). details is keyed by stratum and contains three summary fields per stratum (the full path arrays live in results):
| Field | Description |
|---|---|
l_opt |
Index of the selected path step |
lambda_opt |
Lambda value at the selected step |
combined_ebic |
Length-L combined eBIC (summed across levels); NULL if eBIC was unavailable |
d <- fit_glasso$details$all
cat("Selected step:", d$l_opt, "\n")Selected step: 7
cat("Selected lambda:", round(d$lambda_opt, 4), "\n")Selected lambda: 0.2173
cat("Combined eBIC at selected step:",
round(d$combined_ebic[d$l_opt], 1), "\n")Combined eBIC at selected step: 1334.2
Coupling parameters ("hglasso" only)
| Parameter | Default | Role |
|---|---|---|
rho_couple |
1.0 | Total coupling strength |
alpha |
0.5 | Directional balance (0 = species→genus only, 1 = genus→species only) |
coupling_mode |
"signal_only" |
How coupling is applied |
rho_suppress |
NULL | Suppression multiplier in signal_and_suppress mode |
rho_couple sets how strongly the two levels attract each other. A value of 0 recovers uncoupled glasso at both levels. A value of 1–2 is a reasonable starting range; above 3 the coupling tends to dominate the data.
alpha = 0.5 gives symmetric coupling. Set alpha closer to 1 to have genus-level edges guide species-level selection more strongly; closer to 0 for the reverse. Symmetric is appropriate when you have no reason to prefer one direction.
coupling_mode selects the coupling mechanism:
"signal_only": a uniform penalty reduction is applied to all species pairs that have cross-level support. Edges without support are treated symmetrically (no extra penalty). This is the safer starting point."signal_and_suppress": a smooth per-edge weight h ∈ [0, 1] simultaneously boosts edges with cross-level support (h → 1: threshold reduced) and penalises edges without it (h → 0: threshold inflated). The neutral point (uncoupled baseline) always sits at h = 0.5. Use this when the genus-level network is dense enough to provide reliable suppression signal; it produces sparser, more interpretable species-level networks thansignal_onlywhen n/p is low.
rho_suppress (only active in signal_and_suppress mode) multiplies the natural suppression scale. NULL (default) uses natural scaling. Values > 1 amplify suppression; values < 1 soften it.
ADMM convergence parameters
These rarely need changing from defaults. rho_admm_L applies to the fine level (species); rho_admm_H to the coarse level (genus).
| Parameter | Default | Role |
|---|---|---|
rho_admm_L / rho_admm_H |
1.0 | ADMM step size per level |
max_outer_iter |
10 | Outer alternating minimization steps |
tol_outer |
1e-3 | Outer convergence tolerance |
max_iter |
2000 | Max inner ADMM iterations per outer step |
tol_abs / tol_rel |
1e-4 | Inner ADMM convergence tolerances |
Heuristics. For exploratory single-path fits, reduce max_outer_iter to 5 and max_iter to 500 for speed. For final fits, use the defaults. If the solver is not converging (unusual), try increasing rho_admm_L / rho_admm_H toward 2–3.
Coupled fit
With method = "hglasso" the two levels are fit jointly. Setting stability = FALSE traverses the full path with the coupling penalty active and selects the eBIC-optimal lambda:
ctl_single <- list(
nlambda = 20L,
lambda_min_ratio = 0.10,
keep_frac = 0.80,
rho_couple = 1.5,
alpha = 0.5,
coupling_mode = "signal_and_suppress",
max_outer_iter = 5L,
max_iter = 500L
)
fit_single <- pecan_fit(
st,
method = "hglasso",
stability = FALSE,
control = ctl_single
)
fit_singlepecan_fit
method : hglasso (single fit)
mode : constrained
stratum : <none>
levels : species, genus
precision : yes (prec_mx)
selection : eBIC
path : 16 steps (adj_arr / prec_arr in results)
results : 2 rows (level x stratum)
# A tibble: 2 × 4
level stratum n_edges lambda_opt
<chr> <chr> <int> <dbl>
1 species all 0 0.277
2 genus all 0 0.122
Both prec_arr and prec_mx are populated because hglasso’s ADMM solver returns Θ at every step. The path arrays are in results, structured identically to the glasso case:
r <- fit_single$results
cat("adj_arr dims (species):", dim(r$adj_arr[[1]]), "\n") # p × p × Ladj_arr dims (species): 12 12 16
cat("prec_arr dims (species):", dim(r$prec_arr[[1]]), "\n") # p × p × Lprec_arr dims (species): 12 12 16
# Precision matrix at the eBIC-selected step
round(r$prec_mx[[1]][1:4, 1:4], 3) SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
SpeciesA1 1.023 0.000 0.00 0.000
SpeciesA2 0.000 0.919 0.00 0.000
SpeciesA3 0.000 0.000 1.25 0.000
SpeciesB1 0.000 0.000 0.00 1.139
The combined eBIC path (summed across both levels) that drove the selection is in details$all$combined_ebic; plot it against the lambda path to inspect the tradeoff curve.
For hglasso, eBIC is evaluated on the correlation scale: each precision matrix is rescaled as \(\tilde\Omega_{ij} = \sigma_i\sigma_j\Omega_{ij}\) (where \(\sigma_i\) are feature standard deviations from the CLR data) so that all levels are on a common, scale-invariant basis. Because the ADMM coupling term perturbs the optimality conditions away from the standard graphical lasso, the eBIC here is an approximation. It reliably identifies models that are clearly over- or under-regularised, but for precise lambda calibration stability selection (stability = TRUE) remains the principled choice.
Stability selection parameters
When stability = TRUE, stability_select() wraps the solver with Meinshausen-Bühlmann subsampling. Only edges whose selection probability exceeds a calibrated threshold π are retained.
| Parameter | Default | Role |
|---|---|---|
B |
50 | Number of subsamples |
sub_frac |
0.80 | Fraction of samples per subsample |
min_gap |
2 | Ensures m ≤ n − min_gap |
target_ev |
15 | Target expected false positives |
target_fdr |
NULL | Alternative: target FDR bound (use one or the other) |
seed |
1 | RNG seed |
parallel |
TRUE | Use furrr with the active future plan |
Heuristics. B = 50 is standard for production; use B = 25 during development. target_ev = 10–20 balances sensitivity and specificity; lower values give fewer, more reliable edges. sub_frac = 0.8 works well for n ≥ 40; reduce slightly for smaller n. Always set a seed for reproducibility.
Stability-selection fit
stability = TRUE wraps the solver with subsampling. Edges are selected based on selection probability, not a single fit’s adjacency matrix. Set a future plan before calling if parallel = TRUE.
plan(multisession(workers = 4L))
ctl_stab <- list(
B = 25L, # increase to 50 for production
target_ev = 15,
seed = 7492L,
parallel = TRUE,
nlambda = 30L,
lambda_min_ratio = 0.10,
keep_frac = 0.75,
rho_couple = 1.5,
alpha = 0.5,
coupling_mode = "signal_and_suppress",
max_outer_iter = 8L,
max_iter = 1000L
)
fit_stab <- pecan_fit(
st,
method = "hglasso",
stability = TRUE,
control = ctl_stab
)
fit_stabpecan_fit
method : hglasso (+ stability selection)
mode : constrained
stratum : <none>
levels : species, genus
precision : available (prec_mx column)
results : 2 rows (level x stratum)
# A tibble: 2 × 3
level stratum n_edges
<chr> <chr> <int>
1 species all 25
2 genus all 4
The details slot contains the full stability output, keyed by stratum and then by level. The most useful per-level diagnostics are:
pi_thr: the calibrated selection probability threshold. An edge is retained if its maximum selection probability across the lambda path meets or exceeds this value. Higher thresholds mean stricter selection.ev_bound: the expected number of falsely selected edges atpi_thr, derived from the MB bound. This is the quantity controlled bytarget_ev.fdr_bound: the implied FDR bound atpi_thr, i.e.ev_bound / max(R, 1). Useful for comparing networks of different sizes.R: the number of selected edges atpi_thr.subsample_m: actual subsample size used (≈n × sub_frac, subject tomin_gap).
s <- fit_stab$details$all
cat("--- species ---\n")--- species ---
cat("Threshold (π): ", s$species$pi_thr, "\n")Threshold (π): 0.95
cat("Selected edges (R): ", s$species$R, "\n")Selected edges (R): 25
cat("Expected false positives: ", round(s$species$ev_bound, 1), "\n")Expected false positives: 29
cat("FDR bound: ", round(s$species$fdr_bound, 3), "\n")FDR bound: 1.159
cat("Subsample size (m): ", s$species$subsample_m, "\n")Subsample size (m): 64
cat("\n--- genus ---\n")
--- genus ---
cat("Threshold (π): ", s$genus$pi_thr, "\n")Threshold (π): 0.59
cat("Selected edges (R): ", s$genus$R, "\n")Selected edges (R): 4
cat("Expected false positives: ", round(s$genus$ev_bound, 1), "\n")Expected false positives: 14.5
cat("FDR bound: ", round(s$genus$fdr_bound, 3), "\n")FDR bound: 3.63
Interpreting these numbers. The species threshold π = 0.95 means an edge must have appeared in at least 95% of the 25 subsamples to be retained — a very conservative bar. At this threshold, 25 out of 66 possible species pairs are selected, and the MB bound guarantees at most 29 of those are expected to be false positives (FDR bound ≤ 1.16). With only 25 subsamples the bound is loose; the calibration pushed π to the top of the search grid because target_ev = 15 was not achievable at this sample size — a common outcome with small B and modest n/p. Increasing to B = 50 tightens the bound substantially.
The genus level is better calibrated: π = 0.59 and ev_bound = 14.5 (just at the target of 15), with 4 edges selected from 6 possible pairs. The FDR bound 3.63 exceeds 1 — technically valid but uninformative — because R is very small. When R is small, even one false positive implies a high FDR; this is a known limitation of the MB framework for dense regularisation at coarse levels.
Rule of thumb: a π ≥ 0.75 with ev_bound ≤ target_ev and FDR bound ≤ 0.20 is a comfortable regime. Values outside that range, as here, indicate either more subsamples or a stronger regularisation path are needed.
The results tibble (stability fits). When stability = TRUE, the results tibble has the same six core columns as always (level, stratum, nodes, adj_mx, prec_mx, n_edges) but does not contain path arrays — those columns (adj_arr, prec_arr, lambda, lambda_opt, ebic_scores) are only present for single-path fits. The selection probability history and calibration diagnostics live in details.
The details list (stability fits). Keyed by stratum, then level. The path arrays from single-path fits (adj_arr, prec_arr, lambda) are replaced by calibration outputs and the full selection probability history:
| Field | Description |
|---|---|
adj_mx |
Logical adjacency matrix (same as in results) |
p_hat_max |
p × p matrix of max selection probability across the lambda path |
p_hat_array |
p × p × L array of selection probabilities at each lambda step |
pi_thr |
Calibrated threshold |
R |
Number of selected edges |
ev_bound |
Expected false positives at pi_thr (MB bound) |
fdr_bound |
ev_bound / max(R, 1) |
prec_mx |
Precision matrix refitted at the optimal lambda |
lambda |
Lambda path vector (length L) |
B, sub_frac, subsample_m |
Subsampling parameters used |
p_hat_max is the most useful field for downstream work — it shows how close every edge came to being selected, regardless of whether it crossed the threshold:
round(fit_stab$details$all$species$p_hat_max, 2)[1:5, 1:5] SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1 SpeciesB2
SpeciesA1 0.00 0.72 1.00 0.20 0.04
SpeciesA2 0.72 0.00 1.00 0.08 0.04
SpeciesA3 1.00 1.00 0.00 0.00 0.08
SpeciesB1 0.20 0.08 0.00 0.00 0.80
SpeciesB2 0.04 0.04 0.08 0.80 0.00
prec_mx is the precision matrix Θ refitted on the full dataset at the optimal lambda. For hglasso stability fits it is also available via results$prec_mx:
round(fit_stab$results$prec_mx[[1]][1:4, 1:4], 3) SpeciesA1 SpeciesA2 SpeciesA3 SpeciesB1
SpeciesA1 1.467 0.000 0.200 0.000
SpeciesA2 0.000 1.296 0.194 0.000
SpeciesA3 0.200 0.194 1.414 0.000
SpeciesB1 0.000 0.000 0.000 1.228
Stratified fit
Passing a stratified pecan_strata object to pecan_fit() requires no additional arguments — the function iterates over strata automatically and applies the same control list to each:
fit_strat <- pecan_fit(
st_strat,
method = "hglasso",
stability = TRUE,
control = ctl_stab
)
fit_stratpecan_fit
method : hglasso (+ stability selection)
mode : constrained
stratum : stratum
levels : species, genus
precision : available (prec_mx column)
results : 4 rows (level x stratum)
# A tibble: 4 × 3
level stratum n_edges
<chr> <chr> <int>
1 species stratum1 34
2 genus stratum1 10
3 species stratum2 21
4 genus stratum2 10
The results tibble now has one row per level × stratum (four rows instead of two), and stratum_var records which metadata column drove the split:
fit_strat$results[, c("level", "stratum", "n_edges")]# A tibble: 4 × 3
level stratum n_edges
<chr> <chr> <int>
1 species stratum1 34
2 genus stratum1 10
3 species stratum2 21
4 genus stratum2 10
cat("stratum_var:", fit_strat$stratum_var, "\n")stratum_var: stratum
The details list gains one key per stratum, each containing the same per-level sub-list as the single-stratum case:
# Top-level keys: one per stratum
names(fit_strat$details)[1] "stratum1" "stratum2"
# Each stratum has the same per-level structure
names(fit_strat$details$stratum1)[1] "species" "genus"
pecan_plot()
pecan_plot() produces a combined heatmap + network layout. The layout adapts automatically to whether the fit has one stratum or many.
Key arguments:
| Argument | Default | Effect |
|---|---|---|
edge_color |
"steelblue" |
Colour of inferred edges |
edge_alpha |
0.8 | Transparency of edges |
node_size |
1.5 | Node point size |
drop_isolates |
TRUE | Hide unconnected nodes |
show_implied |
FALSE | Draw implied coarse-level edges |
implied_color / implied_lty |
"grey50" / "dashed" |
Style of implied edges |
heatmap |
TRUE | Show heatmap (single-stratum) or faceted heatmap (multi-stratum) |
heatmap_ncol |
1 | Columns in multi-stratum heatmap facet |
Single-stratum
For a single-stratum fit, the default layout places the coarse-level heatmap on the left and the two network panels stacked on the right. The heatmap fill encodes the number of cross-group fine-level edges supporting each coarse pair; a solid border marks a direct coarse-level edge. Dashed borders (show_implied = TRUE) mark pairs linked by fine-level edges that did not themselves reach the selection threshold at the coarse level.
pecan_plot(
fit_stab,
edge_color = "#0072B2",
show_implied = TRUE,
implied_lty = "dashed",
implied_alpha = 0.4,
implied_color = "#D55E00"
)Multi-stratum
For multi-stratum fits, heatmap = TRUE (default) returns a faceted heatmap — one panel per stratum — so genus-level patterns can be compared across conditions on a common axis. Pass heatmap = FALSE for the classic level × stratum network grid.
pecan_plot(
fit_strat,
edge_color = "#0072B2",
show_implied = TRUE,
implied_color = "#D55E00",
heatmap_ncol = 2L
)pecan_plot(
fit_strat,
heatmap = FALSE,
edge_color = "#0072B2",
show_implied = TRUE,
implied_color = "#D55E00"
)The two strata have fully orthogonal genus-level topologies: "stratum1" has three active genus pairs — A–D cooperating, B–E competing, and C–F cooperating; "stratum2" reassigns every genus to a different partner — A–F cooperating, B–C competing, and D–E cooperating. The species-level networks mirror this: the cross-genus species edges are in completely different positions, and the within-genus correlated pairs differ too (SpA1–SpA2 and SpD1–SpD2 are correlated in stratum1; SpA1–SpA2 and SpF1–SpF2 in stratum2). The heatmap facets should show dark tiles in non-overlapping positions across the two panels.