- What is reproducible research?
- Why do we care?
- Why do reproducibility questions arise?
- The cost of reproducibility
- Reproducibility and statistics
- Current status of reproducibility
- What can we do?
Science is the systematic enterprise of gathering knowledge about the universe and organizing and condensing that knowledge into testable laws and theories.
The success and credibility of science are anchored in the willingness of scientists to expose their ideas and results to independent testing and replication by other scientists.
Reproducible research is the ultimate standard for strengthening scientific evidence by independent:
Two pages from Galilei's Sidereus Nuncius (“The Starry Messenger” or “The Herald of the Stars”), Venice, 1610. http://journals.plos.org/ploscompbiol/article?id=info:doi/10.1371/journal.pcbi.1003542
High-throughput biology generates volumes of data
Data-generating technologies are increasingly used to make clinical recommendations and treatment decisions
A problem may be overlooked, get published, and end up in clinical trials
https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0917-0
http://www.nytimes.com/2011/07/08/health/research/08genes.html
Published
...
[3,] 1881_at
[4,] 31321_at
[5,] 31725_s_at
[6,] 32307_r_at
...
     Published    Replicated
...
[3,] 1881_at      1882_g_at
[4,] 31321_at     31322_at
[5,] 31725_s_at   31726_at
[6,] 32307_r_at   32308_r_at
...
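The shift between the published and replicated probe lists is consistent with an off-by-one indexing error. A minimal sketch in Python of how such a shift can arise, assuming a hypothetical annotation table in which each published probe is immediately followed by the replicated one. The `annotation` order, the header row, and the `selected_rows` indices are invented for illustration and are not taken from the original analysis:

```python
# Hypothetical annotation table: a header row followed by probe IDs.
# The order is assumed for illustration only.
annotation = [
    "ProbeID",  # header row, position 0
    "1881_at", "1882_g_at",
    "31321_at", "31322_at",
    "31725_s_at", "31726_at",
    "32307_r_at", "32308_r_at",
]

# Rows selected by the analysis, counted from 1 with the header excluded
# (hypothetical indices).
selected_rows = [2, 4, 6, 8]

# Intended lookup: data row k sits at position k because the header
# occupies position 0.
intended = [annotation[k] for k in selected_rows]

# Off-by-one lookup: the header is forgotten, so every index points one
# row too early.
shifted = [annotation[k - 1] for k in selected_rows]

for bad, good in zip(shifted, intended):
    print(f"{bad:12s} reported, {good:12s} intended")
```

Every probe in the shifted list is a valid identifier, so nothing fails visibly; only a comparison against an independent rerun exposes the error.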
http://www.nationalacademies.org/hmd/reports/2012/evolution-of-translational-omics.aspx
http://science.sciencemag.org/content/335/6076/1554.long
http://www.nature.com/nature/journal/v502/n7471/full/nature12564.html
Data/metadata used to develop test should be made publicly available
The computer code and fully specified computational procedures used for development of the candidate omics-based test should be made sustainably available
"… the computer code … will encompass all of the steps of computational analysis, including all data preprocessing steps … All aspects of the analysis need to be transparently reported"
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165
Human beings do not have very many natural defenses. We are not all that fast, and we are not all that strong. We do not have claws or fangs or body armor. We cannot spit venom. We cannot camouflage ourselves. And we cannot fly. Instead, we survive by means of our wits. Our minds are quick. We are wired to detect patterns and respond to opportunities and threats without much hesitation.
https://www.amazon.com/Signal-Noise-Many-Predictions-Fail-but/dp/0143125087
http://journals.plos.org/plosmedicine/article?id=10.1371/journal.pmed.0020124
http://marginalrevolution.com/marginalrevolution/2005/09/why_most_publis.html
Remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses - the more likely it is that you are looking at noise
Bigger samples are better
Small effects are to be distrusted
Multiple sources and types of evidence are desirable
Evaluate literatures not individual papers
Trust empirical papers which test other people's theories more than empirical papers which test the author's theory
As an editor or referee, don't reject papers that fail to reject the null
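The first point in the list above (more hypotheses tested means more noise) can be made concrete with a short calculation. A minimal sketch, assuming independent tests in which every null hypothesis is true and a threshold of 0.05; the function names and the seed are illustrative choices:

```python
import random


def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive among m independent true-null tests."""
    return 1.0 - (1.0 - alpha) ** m


def count_false_positives(m: int, alpha: float, seed: int) -> int:
    """Simulate m true-null tests; under the null, p-values are uniform on [0, 1]."""
    rng = random.Random(seed)
    return sum(rng.random() < alpha for _ in range(m))


# The chance of at least one spurious "finding" grows quickly with m.
for m in (1, 10, 100):
    print(m, round(familywise_error(0.05, m), 3))

# Number of "significant" results among 1000 tests where nothing is going on.
print(count_false_positives(m=1000, alpha=0.05, seed=1))
```

With 10 tests the chance of at least one false positive is already about 40%, which is why unselected hypothesis testing produces so many findings that fail to replicate.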
Large-scale collaborative research
Adoption of replication culture
Registration (of studies, protocols, analysis codes, datasets, raw data, and results)
Sharing (of data, protocols, materials, software)
More appropriate statistical methods
Standardization of definitions and analyses
More stringent thresholds for claiming "breakthroughs"
…
http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108
http://www.nature.com/news/statisticians-issue-warning-over-misuse-of-p-values-1.19503
P-values can indicate how incompatible the data are with a specified statistical model
P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone
Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold
Proper inference requires full reporting and transparency
A p-value, or statistical significance, does not measure the size of an effect or the importance of a result
By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis
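The point that a p-value does not measure the size of an effect can be illustrated numerically. A minimal sketch, assuming a one-sample z-test with known standard deviation; the effect sizes and sample sizes are made up for illustration:

```python
from math import erfc, sqrt


def two_sided_p(mean_diff: float, sd: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test of H0: mean = 0, with known sd."""
    z = mean_diff / (sd / sqrt(n))
    # erfc keeps precision in the far tail, where 1 - cdf would round to 0.
    return erfc(abs(z) / sqrt(2.0))


# A negligible effect (0.01 sd) measured on a very large sample.
tiny_effect_big_n = two_sided_p(mean_diff=0.01, sd=1.0, n=1_000_000)

# A substantial effect (0.5 sd) measured on a small sample.
large_effect_small_n = two_sided_p(mean_diff=0.5, sd=1.0, n=10)

print(f"tiny effect, n = 1,000,000: p = {tiny_effect_big_n:.2e}")
print(f"large effect, n = 10:       p = {large_effect_small_n:.3f}")
```

The negligible effect is overwhelmingly "significant" while the substantial one does not pass 0.05, so the p-value reflects sample size as much as effect size.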
http://www.nature.com/nature/journal/v483/n7391/full/483531a.html
http://www.nature.com/news/policy-nih-plans-to-enhance-reproducibility-1.14586
https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_policies.html
https://gds.nih.gov/03policy2.html
https://grants.nih.gov/grants/policy/model_organism/
https://www.nsf.gov/bfa/dias/policy/rcr.jsp
https://web.stanford.edu/~vcs/papers/RoundtableDeclaration2010.pdf
http://scitation.aip.org/content/aip/journal/cise/11/1/10.1109/MCSE.2009.19
http://www.equator-network.org/ - over 300 reporting guidelines
Empirical reproducibility
Computational reproducibility
Statistical reproducibility
The most important thing is the mindset, from the very start, that the end product will be reproducible.
http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-5-80
Problems
With point-and-click, there’s no way to record/save the steps that generated the (copy/pasted) results
Data files are kept separately from the analysis code, and from reports
After modifications of one of the files, it becomes unclear which version corresponds exactly to the reported results
Every time something changes, you have to regenerate the figures/results/reports by hand – very time consuming
Explicitly import text data as text, numeric as numeric, etc.
One thing in a cell (avoid comments, color coding, etc.)
Choose descriptive variable names. Use "_" instead of spaces. Be consistent
Save the data as CSV, comma-separated values
Avoid calculations in the data file
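The data-entry rules above pay off at import time. A minimal sketch of reading a CSV file with an explicit type for every column, using only the Python standard library; the column names and values are hypothetical:

```python
import csv
import io

# Hypothetical file contents; in practice this would be opened from disk.
raw = io.StringIO(
    "sample_id,zip_code,age_years,weight_kg\n"
    "S001,02139,34,70.5\n"
    "S002,23298,51,82.0\n"
)

# One explicit type per column: identifiers stay text, measurements become numbers.
column_types = {
    "sample_id": str,
    "zip_code": str,   # kept as text so the leading zero survives
    "age_years": int,
    "weight_kg": float,
}

rows = []
for record in csv.DictReader(raw):
    rows.append({name: column_types[name](value) for name, value in record.items()})

print(rows[0])
```

A value that does not match its declared type (for example a comment typed into a numeric cell) raises an error at import instead of silently corrupting the analysis.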
The report is automated via code
Data is attached to the well-documented code
History of any changes should be preserved
The final report should be self-sufficient and reproducible with a single command
Unix is a family of operating systems and environments that exploits the power of linguistic abstractions to perform tasks
Unix users spend a lot of time at the command line
In Unix, a command is worth a thousand mouse clicks
Make is a tool which controls the workflow of generating target/result files from the dependencies/source files
Automates/documents a workflow
Intelligently handles the dependencies among data files, code
Accounts for the updates in data, code
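The rule Make applies to each target can be sketched in a few lines: rebuild when the target is missing or older than any of its dependencies. This is a simplified Python illustration of that timestamp check, not Make itself; the file names and the helper functions are hypothetical:

```python
import os
import tempfile


def needs_rebuild(target: str, dependencies: list[str]) -> bool:
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_time = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_time for dep in dependencies)


def touch(path: str, mtime: float) -> None:
    """Create the file if needed and set its modification time."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as workdir:
    data = os.path.join(workdir, "data.csv")
    script = os.path.join(workdir, "analysis.py")
    report = os.path.join(workdir, "report.html")

    touch(data, 1_000_000_000.0)
    touch(script, 1_000_000_000.0)
    before_build = needs_rebuild(report, [data, script])  # report does not exist yet

    touch(report, 1_000_000_100.0)
    after_build = needs_rebuild(report, [data, script])  # report is newer than its inputs

    touch(data, 1_000_000_200.0)
    after_data_change = needs_rebuild(report, [data, script])  # data changed after the report

print(before_build, after_build, after_data_change)
```

Because the check is driven by the dependency list, the same list also documents which files each result was generated from.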
R – free/open source programming language
Runs on Windows, Mac, and Linux
Extensible with a very large collection of actively developed packages
Excellent graphics & report-creating capabilities
A report containing a stream of text and code chunks
Each code chunk loads data, computes results, shows figures
Each text chunk explains how the code chunks work
The resulting report is human- and machine-readable
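The idea of a report made of text and code chunks can be sketched with a toy "weave" step that runs each code chunk and places its output next to it. This is a minimal Python illustration under simplifying assumptions (chunks delimited by a line of three backticks, output captured from `print`), not how any particular tool is implemented; the `source` document and the `weave` function are hypothetical:

```python
import contextlib
import io

FENCE = "`" * 3  # chunk delimiter: a line consisting of three backticks

# Hypothetical source document: text chunks around one code chunk.
source = "\n".join([
    "The mean of the measurements is computed below.",
    FENCE,
    "values = [2, 4, 9]",
    "print(sum(values) / len(values))",
    FENCE,
    "The report is regenerated whenever the code or data change.",
])


def weave(document: str) -> str:
    """Run each fenced code chunk and insert its printed output after it."""
    namespace = {}  # shared by all chunks, like a single session
    output, chunk, in_code = [], [], False
    for line in document.splitlines():
        if line.strip() == FENCE:
            output.append(line)
            if in_code:  # closing fence: execute the collected chunk
                buffer = io.StringIO()
                with contextlib.redirect_stdout(buffer):
                    exec("\n".join(chunk), namespace)
                output.append("Output: " + buffer.getvalue().strip())
                chunk = []
            in_code = not in_code
        else:
            output.append(line)
            if in_code:
                chunk.append(line)
    return "\n".join(output)


report = weave(source)
print(report)
```

The number in the report is never typed by hand: it is recomputed from the code every time the document is woven.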
HTML - HyperText Markup Language, used to create web pages. Developed in 1993
LaTeX – a typesetting system for production of technical/scientific documentation, PDF output. First released in the mid-1980s; the current LaTeX2e version dates from 1994
Sweave – a tool that allows embedding of the R code in LaTeX documents, PDF output. Developed in 2002
reStructuredText - a markup syntax that plays well with Python and Sphinx documentation. Developed in 2002
Markdown – a lightweight markup language for plain text formatting syntax. Easily converted to HTML. Developed in 2004
knitr - an R package for dynamic report generation from documents written in R Markdown
Supports RMarkdown, LaTeX, MathJax. PDF, HTML, DOCX output
Developed in 2012 by Yihui Xie
Code chunks separated by ```. Inline code allowed
Graphics: code-generated and external images
Tables: code-generated and in RMarkdown format
Caching long code chunks
Code chunks and results output fully customizable
Literate programming in other languages
Git – a version control system; GitHub – a hosting service for Git repositories
Each project stored in its own repository
Keep history of changes – track what you did
Ability to go back if something breaks
Branch out, go creative, then merge or revert the changes
Collaborate through merging changes from multiple people
IPython/Jupyter Notebooks
Research environments
Docker - an envelope (or container) for the whole project environment, similar to a lightweight virtual machine
Are the tables and figures reproducible from the code and data?
Does the code actually do what you think it does?
In addition to what was done, is it clear why it was done? (e.g., how parameter settings were chosen?)
Is your code scalable to accommodate more data/methods?
The results cannot be reproduced
The results do not seem to be reproducible
Reproducibility requires extreme effort
Reproducibility requires considerable effort
Easy reproducibility, but requires some proprietary software packages (MATLAB, SAS, etc.)
The results can be easily reproduced by an independent researcher with at most 15 min of user effort, requiring only standard, freely available tools (C compiler, R, Python, etc.)
Begin with the final product in mind
Use literate programming (self-documenting code)
Keep history of changes via code versioning and sharing
Get basic statistics right
Set stringent cutoffs, correct p-values for multiple testing
Be critical, consider batch effects, visualize, do sanity checks, use random controls, cross-validation
Follow reporting guidelines
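The advice above to correct p-values for multiple testing can be sketched with two common adjustments, Bonferroni and Benjamini-Hochberg. These are chosen here as familiar examples, not as the only appropriate methods, and the p-values are made up for illustration:

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]

bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)

for raw, b, q in zip(p_values, bonf, bh):
    print(f"raw = {raw:.3f}  Bonferroni = {b:.3f}  BH = {q:.5f}")
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected proportion of false positives among the reported findings.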
BIOS 692 “Reproducible Research Tools”
References https://mdozmorov.github.io/BIOS692/pages/references.html
This presentation on GitHub:
https://github.com/mdozmorov/presentations
Mikhail Dozmorov, Ph.D.
Assistant professor, Department of Biostatistics, VCU