Lecture 10: Data & Coding Best Practice + Transparency
Why Standards Matter
- Reproducibility ≠ optional — it’s core to science
- Transparent workflows → trust + collaboration
- Your future self will thank you
What Transparency Looks Like
- Show how you got your results, not just results
- Share code + data + documentation
- FAIR principles: Findable | Accessible | Interoperable | Reusable
Data Hygiene & Organization
- Separate raw data vs processed outputs
- Use consistent file/folder naming
- Avoid spaces in names → use snake_case or kebab-case
- Backups, mirrors, cloud storage
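The naming rule above can be applied automatically. This is a minimal Python sketch, not a prescribed tool: the helper name `to_snake_case` and the example file name are made up for illustration.

```python
import re
from pathlib import Path


def to_snake_case(filename: str) -> str:
    """Return a lowercase, space-free version of a file name, keeping its extension."""
    path = Path(filename)
    stem = path.stem.strip().lower()
    # Collapse every run of characters other than lowercase letters and digits
    # into a single underscore, then trim underscores left at either end.
    stem = re.sub(r"[^a-z0-9]+", "_", stem).strip("_")
    return stem + path.suffix.lower()


print(to_snake_case("Field Data (June 2024).CSV"))  # prints field_data_june_2024.csv
```

Renaming the files themselves is left out on purpose; check the proposed names before touching anything on disk.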
Structuring Data for Analysis
- Tidy format → rows = observations | columns = variables
- No merged cells, hidden formatting, or embedded metadata
- Create a metadata file with:
- units
- methods
- variable definitions
- provenance
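To make the tidy layout and the separate metadata file concrete, here is a small Python sketch using only the standard library. The table, the column names (`site`, `year`, `biomass_kg`), and the metadata text are all hypothetical placeholders.

```python
import json

# Hypothetical wide table: one row per site, one column per year.
wide_rows = [
    {"site": "A", "2022": 4.1, "2023": 4.6},
    {"site": "B", "2022": 3.8, "2023": 3.5},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows so each output row is a single observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, id_column="site", variable_name="year", value_name="biomass_kg")

# Metadata lives in its own structure (and file), not in header cells or comments.
metadata = {
    "variables": {
        "site": {"definition": "sampling site code", "units": None},
        "year": {"definition": "calendar year of sampling", "units": None},
        "biomass_kg": {"definition": "dry biomass per plot", "units": "kg"},
    },
    "methods": "placeholder: describe the sampling protocol here",
    "provenance": "placeholder: say where the raw data came from",
}
metadata_json = json.dumps(metadata, indent=2)  # could be saved as e.g. metadata.json
```

Two sites × two years become four rows, one per observation, and the year moves out of the column headers into a variable of its own.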
Good Coding Style
- Clear object + function names (avoid x1, tmp2, df3)
- Comment why, not what
- Break code into reusable functions
- Keep scripts modular instead of one long file
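A short Python sketch of these style points: descriptive names, a comment that explains why, and small functions that can be reused. The sensor scenario and the -999 dropout code are invented for the example.

```python
from statistics import mean


def remove_sensor_dropouts(readings, dropout_code=-999.0):
    """Return readings without the logger's placeholder value."""
    # Why: the (hypothetical) logger writes -999 when the probe disconnects,
    # and leaving those values in would drag every summary statistic down.
    return [r for r in readings if r != dropout_code]


def mean_temperature_c(readings):
    """Average temperature in degrees Celsius after cleaning."""
    return mean(remove_sensor_dropouts(readings))


daily_mean = mean_temperature_c([12.5, -999.0, 13.5])
```

Compare with a version that calls the list `x1` and comments `# remove values`: the code would run the same, but nobody could tell why -999 is special.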
Reproducible Pipelines
- Use automation tools to avoid manual re-runs
- R Markdown / Quarto for integrated code + text
- targets (successor to drake), make, or Snakemake for pipelines
- Record session info + set random seeds
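Recording session info and fixing a seed might look like this in Python; in R the rough equivalents are `sessionInfo()` and `set.seed()`. The function name `session_info`, the seed value, and the choice of packages to report are assumptions for the sketch.

```python
import platform
import random
import sys
from importlib import metadata as importlib_metadata


def session_info(packages=()):
    """Collect interpreter, OS, and package versions for the run log."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = importlib_metadata.version(name)
        except importlib_metadata.PackageNotFoundError:
            info["packages"][name] = "not installed"
    return info


SEED = 2024  # arbitrary; what matters is that it is fixed and recorded
random.seed(SEED)
first_draw = random.random()

random.seed(SEED)
second_draw = random.random()  # same seed, so the same value as first_draw

run_log = {"seed": SEED, **session_info(packages=["pip"])}
```

Saving `run_log` next to the results means a later reader can see which seed and which software versions produced them.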
Documentation Essentials
- Include a README at project root:
- goals, structure, environment
- where data came from, how to run analysis
- Create data dictionaries / codebooks
- Use roxygen2 for function documentation in R
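A data dictionary can start as a generated skeleton that you then fill in by hand. The Python sketch below assumes the data is a list of row dictionaries; `build_codebook` and the column names are hypothetical. The docstring plays the role that roxygen2 comments play in R.

```python
def build_codebook(rows):
    """Draft one codebook entry per column from a list of row dictionaries.

    Each entry records the value types seen and the number of missing
    values (None), and leaves definition and units as TODO placeholders.
    """
    codebook = {}
    for row in rows:
        for column, value in row.items():
            entry = codebook.setdefault(
                column,
                {"types": set(), "n_missing": 0, "definition": "TODO", "units": "TODO"},
            )
            if value is None:
                entry["n_missing"] += 1
            else:
                entry["types"].add(type(value).__name__)
    return codebook


rows = [
    {"plot_id": "P1", "height_cm": 12.4},
    {"plot_id": "P2", "height_cm": None},
]
codebook = build_codebook(rows)
```

The TODO fields are the important part: only the person who collected the data can write the definitions and units.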
Version Control with Git
- Track changes + stop saving copies like “vXX_final_FINAL2.R”
- Meaningful commit messages:
🟢 Good: "add PCA visualization + clean data import"
🔴 Bad: "fix stuff"
- Use branches + pull requests for collaboration
Transparent Analytics
- Show model assumptions, choices, diagnostics
- Avoid hiding scripts — share reproducible code
- Consider pre-registration for confirmatory work
Data Sharing & Ethics
- Licenses → CC-BY for data + text; MIT, GPL etc. for code
- Sensitive data needs controlled access
- Use repositories:
- Zenodo, Dryad, OSF, GitHub releases
Final Takeaways
- Document early, document always
- Automate everything you can
- Reproducibility isn’t extra work — it is the work
- The goal: research that anyone can run, inspect, and trust