This document compares two approaches to ensuring reproducibility in analytical projects: the folder-based approach and the bcgovpond approach.
Both approaches aim to make past results reproducible and auditable. The difference lies in how much of that responsibility is carried by people and process versus encoded directly into the system.
Under the folder-based approach:
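- Package versions are pinned with `renv`.
- Analysis scripts read from the `data/current/` directory.
- When new data arrives, the files in `data/current/` are moved to `data/past/`, and the new files take their place in `data/current/`.

To recreate or audit a past result, the user:

1. Checks out the relevant commit.
2. Runs `renv::restore()`.
3. Moves the matching files from `data/past/` back into `data/current/`.
4. Reruns the analysis.

This approach relies on consistent naming conventions, careful file movement, and institutional knowledge.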
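The file-staging part of a folder-based audit can be sketched in R. This is a minimal illustration rather than a prescribed procedure: the helper `stage_past_data()`, the example file name, and the script name are all hypothetical, and the `renv::restore()` call is left as a comment so the sketch runs without a project lockfile.

```r
# Hypothetical helper: stage archived files for an audit run.
# Assumes the caller already knows which files in `past_dir` belong to the
# result being audited; nothing in the folder layout records that choice.
stage_past_data <- function(files,
                            past_dir = "data/past",
                            current_dir = "data/current") {
  src <- file.path(past_dir, files)
  absent <- files[!file.exists(src)]
  if (length(absent) > 0) {
    stop("Not found in ", past_dir, ": ", paste(absent, collapse = ", "))
  }

  # Clear the mutable directory so stale files cannot leak into the rerun.
  dir.create(current_dir, recursive = TRUE, showWarnings = FALSE)
  unlink(list.files(current_dir, full.names = TRUE), recursive = TRUE)

  # Copy rather than move (a choice made for this sketch), so the archive
  # itself is left untouched.
  ok <- file.copy(src, current_dir, overwrite = TRUE)
  if (!all(ok)) stop("Failed to copy some files into ", current_dir)
  invisible(file.path(current_dir, files))
}

# Typical audit sequence, after checking out the relevant commit:
# renv::restore()                          # reinstall recorded package versions
# stage_past_data("survey_2023-01.csv")    # hypothetical file name
# source("analysis.R")                     # hypothetical script name
```

Even with a helper like this, the choice of which files to stage is still made by a person at audit time, which is where the reliance on naming conventions and institutional knowledge comes in.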
Under the bcgovpond approach:
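- Package versions are pinned with `renv`.
- Raw data files are preserved in `data_store/data_pond/`.
- Metadata about those files is kept in `data_index/meta/`.
- Views that map logical data names to specific files are kept in `data_index/views/`.

Analysis scripts never refer to mutable directories. Instead, they resolve logical data names through versioned views.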
To recreate or audit a past result, the user:
1. Checks out the relevant commit.
2. Runs `renv::restore()`.
3. Reruns the analysis.

The mapping from logical datasets to physical files is fully defined by the checked-out commit.
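The idea of resolving a logical data name through a versioned view can be sketched as follows. This is not bcgovpond's actual API or on-disk format: the function `resolve_view()`, the one-line `.txt` view files, and the dataset name are assumptions invented for this example.

```r
# Hypothetical resolver: NOT the bcgovpond API, only an illustration of the idea.
# Assumes each view is a one-line text file, <views_dir>/<name>.txt, whose
# content is the name of an immutable file under <pond_dir>.
resolve_view <- function(name,
                         views_dir = "data_index/views",
                         pond_dir = "data_store/data_pond") {
  view_file <- file.path(views_dir, paste0(name, ".txt"))
  if (!file.exists(view_file)) {
    stop("No view named '", name, "' in ", views_dir)
  }

  target <- trimws(readLines(view_file, n = 1, warn = FALSE))
  if (length(target) != 1 || !nzchar(target)) {
    stop("View '", name, "' is empty")
  }

  path <- file.path(pond_dir, target)
  if (!file.exists(path)) {
    stop("View '", name, "' points to a missing file: ", path)
  }
  path
}

# In an analysis script (hypothetical dataset name):
# survey <- read.csv(resolve_view("survey"))
```

Under these assumptions the view files are committed alongside the code, so checking out an older commit changes what `resolve_view("survey")` returns without any data files being moved.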
| Aspect | Folder-based approach | bcgovpond approach |
|---|---|---|
| Code versioning | Explicit (git) | Explicit (git) |
| Package versions | Explicit (renv) | Explicit (renv) |
| Raw data preservation | Explicit (`data/past/`) | Explicit (`data_store/data_pond/`) |
| Data selection | Implicit, procedural | Explicit, versioned |
| Mutable runtime state | `data/current/` | None |
| File movement during audits | Required | None |
| Audit trail | Reconstructed | Native |
The folder-based approach requires users to remember and correctly execute a sequence of manual steps to reproduce results. Errors in file selection or movement can silently invalidate a reproduction attempt.
The bcgovpond approach eliminates these steps by encoding data selection directly in versioned metadata. Reproducibility does not depend on remembering procedural details.
The folder-based approach works well when users are careful and unhurried. Under time pressure, however, mutable directories and manual file movement increase the risk of subtle mistakes.
bcgovpond reduces this risk by removing mutable shared state and file shuffling from the reproduction process.
With the folder-based approach, explaining how a result was produced often requires reconstructing the sequence of actions that led to it.
With bcgovpond, the explanation is structural: the git commit itself defines the code, the environment, and the data selection used.
The folder-based approach may be adequate when:
The bcgovpond approach provides clear benefits when:
Both approaches support reproducibility in principle. The key difference is where reproducibility lives: in people and process under the folder-based approach, and in the system itself under bcgovpond.
In practice, bcgovpond reduces cognitive load, lowers audit risk, and improves long-term defensibility by making the correct workflow the easiest one to follow.