Lecture 10: Data & Coding Best Practice + Transparency

Marc Cadotte

Why Standards Matter

  • Reproducibility ≠ optional — it’s core to science
  • Transparent workflows → trust + collaboration
  • Your future self will thank you

What Transparency Looks Like

  • Show how you got your results, not just the results
  • Share code + data + documentation
  • FAIR principles
    Findable | Accessible | Interoperable | Reusable

Data Hygiene & Organization

  • Separate raw data vs processed outputs
  • Use consistent file/folder naming
  • Avoid spaces → snake_case or kebab-case
  • Backups, mirrors, cloud storage
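
The naming rule above can be enforced with a small helper. This is a minimal sketch, not a standard tool: the function name `to_snake_case` and the sample file name are made up for illustration. It assumes you want snake_case, so kebab-case names are converted as well.

```python
import re


def to_snake_case(file_name: str) -> str:
    """Return file_name lowercased, with separator runs collapsed to underscores."""
    stem, dot, extension = file_name.rpartition(".")
    if not dot:  # no extension present, so the whole name is the stem
        stem, extension = file_name, ""
    # Replace every run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", stem).strip("_").lower()
    return f"{cleaned}.{extension.lower()}" if extension else cleaned


# Hypothetical file name, for illustration only.
print(to_snake_case("Field Data 2023 (final).CSV"))  # field_data_2023_final.csv
```

Apply a helper like this to copies in the processed folder only, so the raw files stay exactly as they were received.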

Structuring Data for Analysis

  • Tidy format → rows = observations | columns = variables
  • No merged cells, hidden formatting, or embedded metadata
  • Create a metadata file with:
    • units
    • methods
    • variable definitions
    • provenance
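
As a rough illustration of the tidy layout and the metadata file, the sketch below reshapes a small wide table into one row per observation and writes a matching metadata record. All names and values here (`wide_rows`, `species_counts_to_tidy`, the `sp_` prefix, the sites and counts) are invented for the example.

```python
import json

# Hypothetical wide table: one row per site, one column per species count.
wide_rows = [
    {"site": "A", "sp_oak": 3, "sp_maple": 5},
    {"site": "B", "sp_oak": 0, "sp_maple": 2},
]


def species_counts_to_tidy(rows, id_column="site", value_prefix="sp_"):
    """Reshape wide rows into one row per (site, species) observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(value_prefix):
                tidy.append({
                    id_column: row[id_column],
                    "species": column[len(value_prefix):],
                    "count": value,
                })
    return tidy


tidy_rows = species_counts_to_tidy(wide_rows)

# Metadata lives next to the data, not inside the table itself.
# Every entry below is placeholder text for illustration.
metadata = {
    "variables": {
        "site": {"definition": "sampling site identifier", "units": None},
        "species": {"definition": "species common name", "units": None},
        "count": {"definition": "individuals counted per site", "units": "individuals"},
    },
    "methods": "placeholder: describe the sampling protocol here",
    "provenance": "placeholder: describe who collected the data and when",
}

for row in tidy_rows:
    print(row)
print(json.dumps(metadata, indent=2))
```

In a real project the metadata would be saved as its own file (for example JSON or plain text) alongside the tidy table.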

Good Coding Style

  • Clear object + function names (avoid x1, tmp2, df3)
  • Comment why, not what
  • Break code into reusable functions
  • Keep scripts modular instead of one long file
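
A short sketch of these style points together: descriptive names, a comment that explains why, and small functions that can be reused. The function names and the record layout are hypothetical.

```python
from statistics import mean


def mean_excluding_missing(measurements):
    """Return the mean of the non-missing values, or None if all are missing."""
    # Missing readings are stored as None. They are dropped rather than
    # treated as zero, because zeros would bias the mean downward.
    observed = [value for value in measurements if value is not None]
    return mean(observed) if observed else None


def summarize_by_group(records, group_key, value_key):
    """Return {group: mean of value_key} for a list of dict records."""
    grouped = {}
    for record in records:
        grouped.setdefault(record[group_key], []).append(record[value_key])
    return {group: mean_excluding_missing(values) for group, values in grouped.items()}


# Hypothetical records, for illustration only.
plots = [
    {"treatment": "control", "height_cm": 10.0},
    {"treatment": "control", "height_cm": None},
    {"treatment": "warmed", "height_cm": 14.0},
    {"treatment": "warmed", "height_cm": 16.0},
]
print(summarize_by_group(plots, "treatment", "height_cm"))
```

Each function does one job, so `mean_excluding_missing` can be tested and reused without the grouping logic.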

Reproducible Pipelines

  • Use automation tools to avoid manual re-runs
  • R Markdown / Quarto for integrated code + text
  • targets, make, or Snakemake for pipelines (drake is superseded by targets)
  • Record session info + set random seeds
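
The last bullet can be sketched in a few lines. The example uses Python's standard library. In R the equivalent calls are `set.seed()` and `sessionInfo()`. The function names `run_analysis` and `session_info` and the seed value are placeholders.

```python
import json
import platform
import random
import sys


def run_analysis(seed):
    """Stand-in for a stochastic analysis step; returns three random draws."""
    # A dedicated generator keeps the seed local to this analysis.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]


def session_info(seed):
    """Collect the details someone would need to rerun the analysis."""
    return {
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
        "seed": seed,
    }


SEED = 20240101  # arbitrary illustrative value
first_run = run_analysis(SEED)
second_run = run_analysis(SEED)
print(first_run == second_run)  # same seed, same draws
print(json.dumps(session_info(SEED), indent=2))
```

Saving the `session_info` output next to the results records which environment and seed produced them.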

Documentation Essentials

  • Include a README at project root:
    • goals, structure, environment
    • where data came from, how to run analysis
  • Create data dictionaries / codebooks
  • Use roxygen2 for function documentation in R
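
A data dictionary is more useful when the data can be checked against it. The sketch below assumes a dictionary kept as a plain Python mapping. The column names, types, and rows are invented for the example, and `check_against_dictionary` is a hypothetical helper, not part of any package.

```python
# Hypothetical data dictionary: column name -> expected type and meaning.
DATA_DICTIONARY = {
    "site": {"type": str, "description": "sampling site identifier"},
    "biomass_g": {"type": float, "description": "dry biomass, in grams"},
}


def check_against_dictionary(rows, dictionary):
    """Return a list of readable problems; an empty list means the rows conform."""
    problems = []
    for index, row in enumerate(rows):
        for column in sorted(set(dictionary) - set(row)):
            problems.append(f"row {index}: missing column '{column}'")
        for column in sorted(set(row) - set(dictionary)):
            problems.append(f"row {index}: undocumented column '{column}'")
        for column, spec in dictionary.items():
            if column in row and not isinstance(row[column], spec["type"]):
                problems.append(
                    f"row {index}: '{column}' should be {spec['type'].__name__}"
                )
    return problems


# Illustrative rows: the second one breaks the dictionary in two ways.
rows = [
    {"site": "A", "biomass_g": 12.5},
    {"site": "B", "biomass_g": "n/a", "notes": "wet sample"},
]
for problem in check_against_dictionary(rows, DATA_DICTIONARY):
    print(problem)
```

Running a check like this whenever data is imported catches columns that drift away from their documented definitions.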

Version Control with Git

  • Track changes + avoid files like “vXX_final_FINAL2.R”
  • Meaningful commit messages:
    🟢 Good: "add PCA visualization + clean data import"
    🔴 Bad: "fix stuff"
  • Use branches + pull requests for collaboration

Transparent Analytics

  • Show model assumptions, choices, diagnostics
  • Avoid hiding scripts — share reproducible code
  • Consider pre-registration for confirmatory work

Data Sharing & Ethics

  • Licenses → CC-BY for data and text; MIT, GPL etc. for code
  • Sensitive data needs controlled access
  • Use repositories:
    • Zenodo, Dryad, OSF, GitHub releases

Final Takeaways

  • Document early, document always
  • Automate everything you can
  • Reproducibility isn’t extra work — it is the work
  • The goal: research that anyone can run, inspect, and trust